# Q1. Install MLflow

```bash
conda create -n mlflow-env python=3.10
conda activate mlflow-env
pip install mlflow
mlflow --version # mlflow, version 2.22.0
```
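You can also confirm the install from Python. This is a minimal sketch that uses only the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Inside mlflow-env this should match what `mlflow --version` reports
    print(installed_version("mlflow"))
```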

# Q2. Download and preprocess the data
```bash
python preprocess_data.py --raw_data_path /workspaces/mlops-zoomcamp/02-experiment-tracking/TAXI_DATA_FOLDER --dest_path ./output
# There are 4 files in the output folder
```
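To inspect the preprocessing output from Python, the sketch below lists the files in the destination folder and loads one of them. It assumes the script writes pickled objects (for example `(X, y)` tuples) into `--dest_path`. `list_outputs` and `load_pickle` are illustrative helper names, not part of the homework scripts.

```python
import pickle
from pathlib import Path
from typing import Any, List


def list_outputs(dest_path: str) -> List[str]:
    """Return the sorted names of the regular files (not subfolders) in the output folder."""
    return sorted(p.name for p in Path(dest_path).iterdir() if p.is_file())


def load_pickle(filename: str) -> Any:
    """Load a single pickled object, e.g. an assumed (X, y) tuple."""
    with open(filename, "rb") as f_in:
        return pickle.load(f_in)
```

For example, `len(list_outputs("./output"))` should match the file count noted above.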

# Q3. Train a model with autolog
1. Added `mlflow.autolog()` to `train.py` so that parameters, metrics, and the model are logged automatically.
2. Wrapped the training code in `mlflow.start_run()` so that each execution is recorded as one run:
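```python
import mlflow
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import root_mean_squared_error

mlflow.autolog()  # enable autologging before any training code runs

# X_train, y_train, X_val, y_val are loaded earlier in train.py from --data_path
with mlflow.start_run():
    rf = RandomForestRegressor(max_depth=10, random_state=0)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_val)

    rmse = root_mean_squared_error(y_val, y_pred)
```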
Then:
```bash
python train.py --data_path ./output
# Autologged parameter: min_samples_split = 2 (the scikit-learn default)
```
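For reference, `root_mean_squared_error` computes RMSE = sqrt(mean((y - ŷ)²)). The following dependency-free sketch implements the same formula for illustration only. The script itself uses scikit-learn's implementation.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: square root of the mean of the squared residuals."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Residuals 1, 0, -2 -> squares 1, 0, 4 -> mean 5/3 -> sqrt(5/3) ≈ 1.291
print(rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
```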

# Q4. Launch the tracking server locally
```bash
# --backend-store-uri: where to store experiment metadata (a SQLite file here)
# --default-artifact-root: where to store artifacts (models, plots, etc.)
mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root ./artifacts
```
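Once the server is running, training scripts send their runs to it by pointing MLflow at the server's URL. By default, `mlflow server` listens on `127.0.0.1:5000`. The sketch below builds the same command as an argument list, which avoids shell line-continuation pitfalls, and derives the matching tracking URI. The helpers `server_command` and `tracking_uri` are hypothetical names.

```python
import shlex
from typing import List


def server_command(backend_store_uri: str = "sqlite:///mlflow.db",
                   artifact_root: str = "./artifacts",
                   host: str = "127.0.0.1",
                   port: int = 5000) -> List[str]:
    """Build the `mlflow server` argv; the defaults mirror the command above."""
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_store_uri,
        "--default-artifact-root", artifact_root,
        "--host", host,
        "--port", str(port),
    ]


def tracking_uri(host: str = "127.0.0.1", port: int = 5000) -> str:
    """Return the URI a client would pass to mlflow.set_tracking_uri()."""
    return f"http://{host}:{port}"


print(shlex.join(server_command()))
print(tracking_uri())
```

To launch the server from Python, pass the list to `subprocess.run(server_command())`. In the training scripts, calling `mlflow.set_tracking_uri(tracking_uri())` before `mlflow.set_experiment(...)` should send runs to this server rather than the local `./mlruns` folder.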

# Q5. Tune model hyperparameters
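1. Modified the `objective` function in `hpo.py` so that each trial logs its parameters and validation RMSE to MLflow:
    ```python
    import mlflow
    from hyperopt import STATUS_OK
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import root_mean_squared_error

    # X_train, y_train, X_val, y_val come from the enclosing scope in hpo.py
    def objective(params):
        # Each hyperopt trial is recorded as its own MLflow run
        with mlflow.start_run(nested=True):
            mlflow.set_tag("andy", "theitadatadude")
            rf = RandomForestRegressor(**params)
            rf.fit(X_train, y_train)
            y_pred = rf.predict(X_val)
            rmse = root_mean_squared_error(y_val, y_pred)

            # Log the sampled hyperparameters and the resulting validation RMSE
            mlflow.log_params(params)
            mlflow.log_metric("rmse", rmse)

            # hyperopt minimizes "loss", so the RMSE is returned there
            return {"loss": rmse, "status": STATUS_OK}
    ```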
Then:
```bash
python hpo.py
# Best validation RMSE across all trials: 5.335419588556921
```
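hyperopt's `fmin` minimizes the `loss` value returned by `objective`, so the answer to this question is the smallest RMSE across all trials. The sketch below performs that selection without any dependencies, operating on hyperopt-style result dicts. `best_result` is a hypothetical helper. The constant mirrors `hyperopt.STATUS_OK`, whose value is the string `"ok"`.

```python
from typing import Dict, List

STATUS_OK = "ok"  # same value as hyperopt.STATUS_OK


def best_result(results: List[Dict]) -> Dict:
    """Return the successful trial result with the lowest loss (the RMSE here)."""
    ok = [r for r in results if r.get("status") == STATUS_OK]
    if not ok:
        raise ValueError("no successful trials")
    return min(ok, key=lambda r: r["loss"])
```

MLflow can perform the same lookup on the logged runs. For the active experiment, `mlflow.search_runs(order_by=["metrics.rmse ASC"])` returns a pandas DataFrame with the lowest-RMSE run first.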


# Q6. Promote the best model to the model registry

