In [1]:
!pip install mlflow

Collecting mlflow
  Downloading mlflow-2.22.0-py3-none-any.whl.metadata (30 kB)
Collecting mlflow-skinny==2.22.0 (from mlflow)
  Downloading mlflow_skinny-2.22.0-py3-none-any.whl.metadata (31 kB)
Collecting alembic!=1.10.0,<2 (from mlflow)
  Downloading alembic-1.16.1-py3-none-any.whl.metadata (7.3 kB)
Collecting docker<8,>=4.0.0 (from mlflow)
  Downloading docker-7.1.0-py3-none-any.whl.metadata (3.8 kB)
Collecting graphene<4 (from mlflow)
  Downloading graphene-3.4.3-py2.py3-none-any.whl.metadata (6.9 kB)
Collecting gunicorn<24 (from mlflow)
  Downloading gunicorn-23.0.0-py3-none-any.whl.metadata (4.4 kB)
Collecting databricks-sdk<1,>=0.20.0 (from mlflow-skinny==2.22.0->mlflow)
  Downloading databricks_sdk-0.55.0-py3-none-any.whl.metadata (39 kB)
Collecting fastapi<1 (from mlflow-skinny==2.22.0->mlflow)
  Downloading fastapi-0.115.12-py3-none-any.whl.metadata (27 kB)
Collecting opentelemetry-api<3,>=1.9.0 (from mlflow-skinny==2.22.0->mlflow)
  Downloading opentelemetry_api-1.33.1-py3-

### Q1. Install MLflow

In [1]:
!mlflow --version

mlflow, version 2.22.0


Answer: The MLflow version is 2.22.0
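The same check can be made from inside the notebook kernel, which confirms that the interpreter running the cells sees the installed package. This is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and not part of MLflow.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string when MLflow is installed in this environment, else None.
print(installed_version("mlflow"))
```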

### Q2. Download and preprocess the data

In [3]:
!mkdir -p /content/input/

In [4]:
!python preprocess_data.py --raw_data_path /content/input/ --dest_path ./output

In [5]:
!ls ./output | wc -l

4


Answer: There are 4 files in the output folder
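The count can also be taken from Python, which is handy when the check needs to live in a script rather than a shell cell. This is a sketch under the assumption that only regular files matter; `ls | wc -l` also counts subfolders, so the two can differ if the folder contains directories. The helper name `count_files` is illustrative.

```python
from pathlib import Path


def count_files(folder: str) -> int:
    """Count regular files directly inside `folder` (subfolders are not descended into)."""
    return sum(1 for entry in Path(folder).iterdir() if entry.is_file())


# Only runs the check when the folder from the cell above exists.
if Path("./output").is_dir():
    print(count_files("./output"))
```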

### Q3. Train a model with autolog

In [6]:
!python train.py --data_path ./output

2025/05/27 23:07:09 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2025/05/27 23:07:09 INFO mlflow.store.db.utils: Updating database tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 451aebb31d03, add metric step
INFO  [alembic.runtime.migration] Running upgrade 451aebb31d03 -> 90e64c465722, migrate user column to tags
INFO  [alembic.runtime.migration] Running upgrade 90e64c465722 -> 181f10493468, allow nulls for metric values
INFO  [alembic.runtime.migration] Running upgrade 181f10493468 -> df50e92ffc5e, Add Experiment Tags Table
INFO  [alembic.runtime.migration] Running upgrade df50e92ffc5e -> 7ac759974ad8, Update run tags with larger limit
INFO  [alembic.runtime.migration] Running upgrade 7ac759974ad8 -> 89d4b8295536, create latest metrics table
INFO  [89d4b8295536_create_latest_metrics_table_py] Migration complete!
INFO  

Answer: The value for min_samples_split is 10
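One way to double-check a parameter that autolog recorded is to read it back from the SQLite backend store. The sketch below rests on two assumptions that are not confirmed by the log: that the store is a file named `mlflow.db` (the name used in Q4), and that parameters live in a `params` table with `run_uuid`, `key`, and `value` columns. This is an internal schema detail that may change between MLflow versions, so treat it as a quick inspection aid rather than a supported API. The helper name `logged_param` is illustrative.

```python
import sqlite3


def logged_param(conn: sqlite3.Connection, key: str) -> list:
    """Return (run_uuid, value) pairs for every run that logged `key`.

    Assumes a `params` table with `run_uuid`, `key`, and `value` columns.
    """
    rows = conn.execute(
        "SELECT run_uuid, value FROM params WHERE key = ? ORDER BY run_uuid",
        (key,),
    )
    return [(run_uuid, value) for run_uuid, value in rows]


# Hypothetical usage against the store created by train.py:
# conn = sqlite3.connect("mlflow.db")
# print(logged_param(conn, "min_samples_split"))
# conn.close()
```

Parameter values are stored as strings, so compare against `"10"` rather than `10`.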

### Q4. Launch the tracking server locally

In [9]:
!mlflow ui --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./artifacts

[2025-05-27 23:33:41 +0000] [8103] [INFO] Starting gunicorn 23.0.0
[2025-05-27 23:33:41 +0000] [8103] [INFO] Listening at: http://127.0.0.1:5000 (8103)
[2025-05-27 23:33:41 +0000] [8103] [INFO] Using worker: sync
[2025-05-27 23:33:41 +0000] [8106] [INFO] Booting worker with pid: 8106
[2025-05-27 23:33:42 +0000] [8109] [INFO] Booting worker with pid: 8109
[2025-05-27 23:33:42 +0000] [8110] [INFO] Booting worker with pid: 8110
[2025-05-27 23:33:42 +0000] [8111] [INFO] Booting worker with pid: 8111

Aborted!
[2025-05-27 23:35:25 +0000] [8103] [INFO] Handling signal: int
[2025-05-27 23:35:25 +0000] [8110] [INFO] Worker exiting (pid: 8110)
[2025-05-27 23:35:25 +0000] [8109] [INFO] Worker exiting (pid: 8109)
[2025-05-27 23:35:25 +0000] [8106] [INFO] Worker exiting (pid: 8106)
[2025-05-27 23:35:25 +0000] [8111] [INFO] Worker exiting (pid: 8111)
[2025-05-27 23:35:27 +0000] [8103] [INFO] Shutting down: Master


Answer: Besides --backend-store-uri, the other flag needed to configure the server is --default-artifact-root
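Running `mlflow ui` in a notebook cell blocks the kernel until the cell is interrupted, which is what the `Aborted!` line in the log reflects. A possible workaround is to start the server as a background process so later cells can still run. The sketch below only reuses the two flags shown in the cell above; the helper names `mlflow_ui_command` and `start_server` are illustrative.

```python
import subprocess


def mlflow_ui_command(backend_store_uri: str, artifact_root: str) -> list:
    """Build the argument list for the `mlflow ui` command used in the cell above."""
    return [
        "mlflow", "ui",
        "--backend-store-uri", backend_store_uri,
        "--default-artifact-root", artifact_root,
    ]


def start_server(backend_store_uri: str, artifact_root: str) -> subprocess.Popen:
    """Start the server without blocking; the caller is responsible for stopping it."""
    return subprocess.Popen(mlflow_ui_command(backend_store_uri, artifact_root))


# Hypothetical usage:
# server = start_server("sqlite:///mlflow.db", "./artifacts")
# ...run the cells that need the server...
# server.terminate()
```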

### Q5. Tune the hyperparameters of the model

In [8]:
!python hpo.py

2025/05/27 23:27:13 INFO mlflow.tracking.fluent: Experiment with name 'random-forest-hyperopt' does not exist. Creating a new experiment.
100% 15/15 [01:48<00:00,  7.21s/trial, best loss: 5.335419588556921]


Answer: The best RMSE on the validation set is 5.335
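The `best loss` in the progress bar is read here as the validation RMSE, which assumes that `hpo.py` (not shown) returns RMSE as its objective. For reference, a minimal standard-library sketch of how RMSE is computed from true and predicted values:

```python
import math


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Every prediction is off by exactly 3, so the RMSE is 3.0.
print(rmse([10.0, 20.0], [13.0, 17.0]))
```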

### Q6. Promote the best model to the model registry

In [10]:
!python register_model.py

2025/05/27 23:35:44 INFO mlflow.tracking.fluent: Experiment with name 'random-forest-best-models' does not exist. Creating a new experiment.
5.567882575275988
5.5809693519389
5.585552964251819
5.5851835740578695
5.591280102110505
runs://d21ee04a27a14e5c8ede319eb4451ed4/model
Successfully registered model 'green_taxi_2023'.
Traceback (most recent call last):
  File "/content/register_model.py", line 93, in <module>
    run_register_model()
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1442, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1363, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1226, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 794

Answer: The best RMSE on the test set is 5.567
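The five numbers printed before the URI appear to be the test RMSEs of the candidate runs (an assumption, since `register_model.py` is not shown), and the answer is the smallest of them. The printed URI also uses `runs://`, whereas MLflow documents the form `runs:/<run_id>/<artifact_path>`; whether that is related to the truncated traceback cannot be determined from the log. The sketch below picks the lowest value and builds a URI in the documented form. The helper name `runs_uri` is illustrative.

```python
# Test RMSE values copied from the log above.
test_rmses = [
    5.567882575275988,
    5.5809693519389,
    5.585552964251819,
    5.5851835740578695,
    5.591280102110505,
]


def runs_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build a model URI in the documented `runs:/<run_id>/<artifact_path>` form."""
    return f"runs:/{run_id}/{artifact_path}"


best_rmse = min(test_rmses)
print(best_rmse)  # 5.567882575275988
print(runs_uri("d21ee04a27a14e5c8ede319eb4451ed4"))  # runs:/d21ee04a27a14e5c8ede319eb4451ed4/model
```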