## Common setups for an `MLflow Tracking` environment
![](https://mlflow.org/docs/latest/_images/tracking-setup-overview.png)

### Scenario 2
```python
# set mlflow tracking uri
import mlflow
mlflow.set_tracking_uri('sqlite:///mlflow.db')
```

### Scenario 3
Remote MLflow server

```python
import mlflow

mlflow.set_tracking_uri('url/remote/server')
```
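Hard-coding the server address makes a notebook harder to share. A minimal sketch, assuming the address is supplied through the `MLFLOW_TRACKING_URI` environment variable (which MLflow itself also reads) and falling back to the local SQLite file from Scenario 2. The helper name `resolve_tracking_uri` is hypothetical:

```python
import os


def resolve_tracking_uri(default="sqlite:///mlflow.db"):
    """Return the tracking URI from the environment, or a local default."""
    # MLFLOW_TRACKING_URI is the variable MLflow consults when no URI is set in code.
    uri = os.environ.get("MLFLOW_TRACKING_URI", "").strip()
    return uri or default


tracking_uri = resolve_tracking_uri()
print(tracking_uri)
# The result can then be passed to mlflow.set_tracking_uri(tracking_uri).
```

With this approach the same notebook can point at a local store or a remote server without code changes.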


## `MLflow`: Benefits
* The `Tracking server` can be easily deployed to the cloud
* Share experiments with other Data Scientists
* Collaborate with others to build and deploy models
* Give more visibility to the Data Science team's efforts

## `MLflow`: Issues when running shared remote servers
* Security:
    * Restrict access to the server (for example, through a VPN)
* Isolation:
    * Define a standard for naming experiments and models, plus a default set of tags.
    * Restrict access to the artifacts
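
One way to make a naming standard enforceable is to generate names and tags from code instead of typing them by hand. The sketch below is only an illustration: the `team.project.stage` pattern, the helper names, and the `owner` tag are assumptions, not an MLflow or DagsHub convention.

```python
import re


def experiment_name(team, project, stage):
    """Build a name such as 'ds-team.nyc-taxi.dev' from lowercase slugs."""
    parts = [team, project, stage]
    slugs = [re.sub(r"[^a-z0-9]+", "-", p.lower()).strip("-") for p in parts]
    if not all(slugs):
        raise ValueError("team, project and stage must each contain letters or digits")
    return ".".join(slugs)


def default_tags(owner, model_family, feature_set_version):
    """Tags every run is expected to carry; values are kept as strings."""
    return {
        "owner": owner,
        "model_family": model_family,
        "feature_set_version": str(feature_set_version),
    }


name = experiment_name("DS Team", "NYC Taxi", "dev")
tags = default_tags("alice", "xgboost", 1)  # "alice" is a hypothetical owner
print(name)  # ds-team.nyc-taxi.dev
print(tags)
```

The resulting values could then be passed to `mlflow.set_experiment(name)` and `mlflow.set_tags(tags)`.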

## `MLflow`: Limitations
* **Authentication and users:** The open source version of `MLflow` does not enforce any authentication by default (recent releases ship an optional, experimental basic-auth mode)
* **Data versioning**
    * To ensure full reproducibility, we need to version the data used to train the model.
    * `MLflow` does not provide a solution for this, but there are ways to mitigate it
* **Model and data monitoring:** We will see the right tool for this purpose later
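
A lightweight mitigation for the data-versioning gap is to record a fingerprint of each training file alongside the run. The sketch below uses only the standard library. The helper name `file_sha256` and the tag name in the comment are assumptions, and this is not a replacement for a dedicated data-versioning tool.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Possible usage inside an active run (tag name is hypothetical):
#   mlflow.set_tag("train_data_sha256", file_sha256("../data/green_tripdata_2024-01.parquet"))
```

Two runs that carry the same digest were trained on byte-identical files, which makes silent data changes easier to spot.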

# DagsHub
 <div style="text-align:center">
    <img src="https://user-images.githubusercontent.com/611655/181510038-e38f4001-c304-411e-8f45-f71554eb9763.png" alt="DagsHub Logo">
</div>

## Introduction:
DagsHub is a platform often described as the "GitHub for machine learning". It lets data scientists and machine learning developers manage and collaborate on their projects efficiently, with reproducibility and version control.

## Key features:
1. **Version control**: Tracks changes to data, code, and models, keeping a complete history of your machine learning project.
2. **Collaboration**: Lets several users work on the same project while preserving the version history.
3. **Data versioning**: Tracks dataset versions, which makes it easier to reproduce experiments and share datasets.
4. **Reproducibility**: Helps ensure that experiments can be replicated with the same code, data, and environment.
5. **Web interface**: Offers a web interface for organizing and managing machine learning projects.
6. **Public and private repositories**: Offers both public and private repositories, so projects can be shared openly or kept restricted.
7. **Experiment tracking**: Records the details of machine learning experiments, which makes it easier to analyze and compare results.
8. **Integration**: Integrates with popular open source tools and formats, such as Jupyter notebooks and Git.
9. **Project organization**: Provides tools to keep your machine learning project structured and well documented.


## Dagshub

1. Create an account [here](https://dagshub.com/user/sign_up). It can be linked to your GitHub account.
2. Change the password.
3. Create a first repository.

## Activity
Let's prepare the working environment for the next class:

1. Create a repository on `Github` named `nyc-taxi-time-prediction`
2. Link the repository to our `Dagshub` account
3. Clone the `Github` repository locally
4. Create a virtual environment
5. Create an `experiments` branch
6. Create an `experiments` directory in the project root
7. Create a `jupyter-notebook` inside that directory named `model_experiments.ipynb`

![ml flow cheatsheet](images/mlflow-cheatsheet.png)


## Let's reuse the code we have already written

```bash
pip install mlflow==2.16.1 dagshub==0.3.35 jupyter==1.1.1 xgboost==2.1.1 hyperopt==0.2.7
```

Copy the dataset into a `data` folder

In [1]:
# Create the directory if it doesn't exist
!mkdir -p ../data

# Download files using curl
!curl -o ../data/green_tripdata_2024-01.parquet https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2024-01.parquet
!curl -o ../data/green_tripdata_2024-02.parquet https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2024-02.parquet



Import the required libraries and define a function to load the data

In [2]:
import pickle
import pandas as pd
from sklearn.metrics import root_mean_squared_error
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import Lasso, Ridge, LinearRegression

In [3]:
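def read_dataframe(filename):
    # Load one month of green-taxi trips
    df = pd.read_parquet(filename)

    # Trip duration in minutes
    df['duration'] = df.lpep_dropoff_datetime - df.lpep_pickup_datetime
    df['duration'] = df.duration.dt.total_seconds() / 60

    # Keep trips between 1 and 60 minutes; copy to avoid pandas' SettingWithCopyWarning
    df = df[(df.duration >= 1) & (df.duration <= 60)].copy()

    # Treat location IDs as categorical (string) features
    categorical = ['PULocationID', 'DOLocationID']
    df[categorical] = df[categorical].astype(str)

    return df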
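
The duration filter in `read_dataframe` can be illustrated without pandas. The sketch below uses made-up pickup and dropoff timestamps (not rows from the dataset) and a hypothetical helper `duration_minutes` to show which trips survive the 1 to 60 minute rule:

```python
from datetime import datetime


def duration_minutes(pickup, dropoff):
    """Trip duration in minutes, as a float."""
    return (dropoff - pickup).total_seconds() / 60


# Made-up trips for illustration only
trips = [
    (datetime(2024, 1, 1, 8, 0, 0), datetime(2024, 1, 1, 8, 0, 30)),  # 0.5 minutes
    (datetime(2024, 1, 1, 9, 0, 0), datetime(2024, 1, 1, 9, 25, 0)),  # 25 minutes
    (datetime(2024, 1, 1, 10, 0, 0), datetime(2024, 1, 1, 12, 0, 0)),  # 120 minutes
]

durations = [duration_minutes(pickup, dropoff) for pickup, dropoff in trips]
kept = [d for d in durations if 1 <= d <= 60]
print(kept)  # [25.0]
```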

In [4]:
df_train = read_dataframe('../data/green_tripdata_2024-01.parquet')
df_val = read_dataframe('../data/green_tripdata_2024-02.parquet')

Feature Engineering

In [5]:
df_train['PU_DO'] = df_train['PULocationID'] + '_' + df_train['DOLocationID']
df_val['PU_DO'] = df_val['PULocationID'] + '_' + df_val['DOLocationID']

One Hot Encoding

In [6]:
categorical = ['PU_DO']  # combined feature; replaces 'PULocationID' and 'DOLocationID'
numerical = ['trip_distance']
dv = DictVectorizer()

train_dicts = df_train[categorical + numerical].to_dict(orient='records')
X_train = dv.fit_transform(train_dicts)

val_dicts = df_val[categorical + numerical].to_dict(orient='records')
X_val = dv.transform(val_dicts)
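
To see what this encoding does to each record, here is a pure-Python illustration of the idea. It is not scikit-learn's implementation (`DictVectorizer` returns a sparse matrix and handles more cases), and the function name `one_hot_records` and the sample records are made up:

```python
def one_hot_records(records):
    """Encode dict records: strings become 'key=value' indicators, numbers pass through."""

    def feature(key, value):
        return f"{key}={value}" if isinstance(value, str) else key

    # One column per distinct feature name, in sorted order
    names = sorted({feature(k, v) for rec in records for k, v in rec.items()})
    index = {name: i for i, name in enumerate(names)}

    rows = []
    for rec in records:
        row = [0.0] * len(names)
        for k, v in rec.items():
            row[index[feature(k, v)]] = 1.0 if isinstance(v, str) else float(v)
        rows.append(row)
    return names, rows


names, rows = one_hot_records([
    {"PU_DO": "74_75", "trip_distance": 1.2},
    {"PU_DO": "43_151", "trip_distance": 3.4},
])
print(names)  # ['PU_DO=43_151', 'PU_DO=74_75', 'trip_distance']
print(rows)   # [[0.0, 1.0, 1.2], [1.0, 0.0, 3.4]]
```

Each distinct pickup/dropoff pair gets its own column, which is why the real feature matrix is wide and sparse.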

In [7]:
target = 'duration'
y_train = df_train[target].values
y_val = df_val[target].values


Define the `tracking URI` and the experiment name

In [8]:
import dagshub
import mlflow


dagshub.init(url="https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction", mlflow=True)

MLFLOW_TRACKING_URI = mlflow.get_tracking_uri()

print(MLFLOW_TRACKING_URI)

mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
mlflow.set_experiment(experiment_name="nyc-taxi-experiment")

2024/09/20 17:39:20 INFO mlflow.tracking.fluent: Experiment with name 'nyc-taxi-experiment' does not exist. Creating a new experiment.


https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow


<Experiment: artifact_location='mlflow-artifacts:/5237519f0cf74c8e961551526c9bca87', creation_time=1726875667929, experiment_id='0', last_update_time=1726875667929, lifecycle_stage='active', name='nyc-taxi-experiment', tags={}>

Define the datasets as `mlflow` objects so they can be tracked

In [10]:
training_dataset = mlflow.data.from_numpy(X_train.data, targets=y_train, name="green_tripdata_2024-01")
validation_dataset = mlflow.data.from_numpy(X_val.data, targets=y_val, name="green_tripdata_2024-02")

### Upload the datasets to the storage provided by `dagshub`

Now let's train an `xgboost` model


In [11]:
import xgboost as xgb
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from hyperopt.pyll import scope
import pathlib

Define the datasets to work with.

In [12]:
train = xgb.DMatrix(X_train, label=y_train)
valid = xgb.DMatrix(X_val, label=y_val)

Define the objective function

In [13]:
def objective(params):
    with mlflow.start_run(nested=True):
         
        # Tag model
        mlflow.set_tag("model_family", "xgboost")
        
        # Log parameters
        mlflow.log_params(params)
        
        # Train model
        booster = xgb.train(
            params=params,
            dtrain=train,
            num_boost_round=100,
            evals=[(valid, 'validation')],
            early_stopping_rounds=10
        )
        
        # Log xgboost model with artifact_path
        mlflow.xgboost.log_model(booster, artifact_path="model")
         
        # Predict on the validation dataset
        y_pred = booster.predict(valid)
        
        # Calculate metric
        rmse = root_mean_squared_error(y_val, y_pred)
        
        # Log performance metric
        mlflow.log_metric("rmse", rmse)

    return {'loss': rmse, 'status': STATUS_OK}
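
The loss returned to `hyperopt` is the root mean squared error on the validation set. As a reminder of what that metric computes, here is a standard-library sketch with made-up numbers. The function name `rmse_from_lists` is hypothetical, and in the notebook the metric comes from scikit-learn's `root_mean_squared_error`:

```python
import math


def rmse_from_lists(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up durations in minutes: errors are 2, 2 and 3, so RMSE = sqrt(17 / 3)
print(rmse_from_lists([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```

Because the target is trip duration in minutes, the RMSE is also expressed in minutes.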

Define the search space and run the optimization

In [14]:
mlflow.xgboost.autolog()

with mlflow.start_run(run_name="Xgboost Hyper-parameter Optimization", nested=True):
    search_space = {
        'max_depth': scope.int(hp.quniform('max_depth', 4, 100, 1)),
        'learning_rate': hp.loguniform('learning_rate', -3, 0),
        'reg_alpha': hp.loguniform('reg_alpha', -5, -1),
        'reg_lambda': hp.loguniform('reg_lambda', -6, -1),
        'min_child_weight': hp.loguniform('min_child_weight', -1, 3),
        'objective': 'reg:squarederror',
        'seed': 42
    }
    
    best_params = fmin(
        fn=objective,
        space=search_space,
        algo=tpe.suggest,
        max_evals=10,
        trials=Trials()
    )
    best_params["max_depth"] = int(best_params["max_depth"])
    best_params["seed"] = 42
    best_params["objective"] = "reg:squarederror"
    
    mlflow.log_params(best_params)

    # Log tags
    mlflow.set_tags(
        tags={
            "project": "NYC Taxi Time Prediction Project",
            "optimizer_engine": "hyper-opt",
            "model_family": "xgboost",
            "feature_set_version": 1,
        }
    )

    # Log a fit model instance
    booster = xgb.train(
        params=best_params,
        dtrain=train,
        num_boost_round=100,
        evals=[(valid, 'validation')],
        early_stopping_rounds=10
    )
        
    y_pred = booster.predict(valid)
    
    rmse = root_mean_squared_error(y_val, y_pred)
    mlflow.log_metric("rmse", rmse)
    
    pathlib.Path("models").mkdir(exist_ok=True)
    with open("models/preprocessor.b", "wb") as f_out:
        pickle.dump(dv, f_out)
        
    mlflow.log_artifact("models/preprocessor.b", artifact_path="preprocessor")



[0]	validation-rmse:8.68150                           
[1]	validation-rmse:8.29335                           
[2]	validation-rmse:7.94396                           
[3]	validation-rmse:7.62973                           
[4]	validation-rmse:7.34841                           
[5]	validation-rmse:7.09659                           
[6]	validation-rmse:6.87133                           
[7]	validation-rmse:6.67186                           
[8]	validation-rmse:6.49442                           
[9]	validation-rmse:6.33748                           
[10]	validation-rmse:6.19897                          
[11]	validation-rmse:6.07709                          
[12]	validation-rmse:5.97006                          
[13]	validation-rmse:5.87631                          
[14]	validation-rmse:5.79472                          
[15]	validation-rmse:5.72227                          
[16]	validation-rmse:5.65937                          
[17]	validation-rmse:5.60413                          








2024/09/20 17:40:20 INFO mlflow.tracking._tracking_service.client: 🏃 View run delightful-snail-496 at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0/runs/89c6c70f33744277ba035fff47e85ebd.

2024/09/20 17:40:20 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0.



[0]	validation-rmse:8.48552                                                   
[1]	validation-rmse:7.95651                                                   
[2]	validation-rmse:7.51214                                                   
[3]	validation-rmse:7.14119                                                   
[4]	validation-rmse:6.83346                                                   
[5]	validation-rmse:6.57986                                                   
[6]	validation-rmse:6.37155                                                   
[7]	validation-rmse:6.20109                                                   
[8]	validation-rmse:6.06311                                                   
[9]	validation-rmse:5.95139                                                   
[10]	validation-rmse:5.86073                                                  
[11]	validation-rmse:5.78671                                                  
[12]	validation-rmse:5.72749                        








2024/09/20 17:40:38 INFO mlflow.tracking._tracking_service.client: 🏃 View run unleashed-ant-690 at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0/runs/00c651b6b5584b25b9c504de75443ef2.

2024/09/20 17:40:38 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0.



[0]	validation-rmse:5.54166                                                   
[1]	validation-rmse:5.38415                                                   
[2]	validation-rmse:5.37531                                                   
[3]	validation-rmse:5.36934                                                   
[4]	validation-rmse:5.34003                                                   
[5]	validation-rmse:5.33459                                                   
[6]	validation-rmse:5.33490                                                   
[7]	validation-rmse:5.33371                                                   
[8]	validation-rmse:5.33702                                                   
[9]	validation-rmse:5.33582                                                   
[10]	validation-rmse:5.33558                                                  
[11]	validation-rmse:5.33542                                                  
[12]	validation-rmse:5.34038                        








2024/09/20 17:41:00 INFO mlflow.tracking._tracking_service.client: 🏃 View run melodic-bird-740 at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0/runs/a8834da010664b9a9e1af84dc97c0070.

2024/09/20 17:41:00 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0.



[0]	validation-rmse:8.35052                                                   
[1]	validation-rmse:7.72501                                                   
[2]	validation-rmse:7.21644                                                   
[3]	validation-rmse:6.80687                                                   
[4]	validation-rmse:6.47965                                                   
[5]	validation-rmse:6.21848                                                   
[6]	validation-rmse:6.01475                                                   
[7]	validation-rmse:5.85464                                                   
[8]	validation-rmse:5.72905                                                   
[9]	validation-rmse:5.63259                                                   
[10]	validation-rmse:5.55734                                                  
[11]	validation-rmse:5.49856                                                  
[12]	validation-rmse:5.45313                        








2024/09/20 17:41:19 INFO mlflow.tracking._tracking_service.client: 🏃 View run stately-grub-933 at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0/runs/f789d8f541bf4aaaaab99c9bf4749b99.

2024/09/20 17:41:19 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0.



[0]	validation-rmse:8.48195                                                   
[1]	validation-rmse:7.94486                                                   
[2]	validation-rmse:7.48928                                                   
[3]	validation-rmse:7.10583                                                   
[4]	validation-rmse:6.78497                                                   
[5]	validation-rmse:6.51733                                                   
[6]	validation-rmse:6.29446                                                   
[7]	validation-rmse:6.11103                                                   
[8]	validation-rmse:5.96060                                                   
[9]	validation-rmse:5.83678                                                   
[10]	validation-rmse:5.73675                                                  
[11]	validation-rmse:5.65467                                                  
[12]	validation-rmse:5.58844                        








2024/09/20 17:41:39 INFO mlflow.tracking._tracking_service.client: 🏃 View run blushing-gull-648 at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0/runs/5ef4f6cf215a4c1eab9c514e5a39ce3e.

2024/09/20 17:41:39 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0.



[0]	validation-rmse:8.45906                                                   
[1]	validation-rmse:7.90683                                                   
[2]	validation-rmse:7.44232                                                   
[3]	validation-rmse:7.04946                                                   
[4]	validation-rmse:6.72569                                                   
[5]	validation-rmse:6.45603                                                   
[6]	validation-rmse:6.23369                                                   
[7]	validation-rmse:6.05134                                                   
[8]	validation-rmse:5.90093                                                   
[9]	validation-rmse:5.77724                                                   
[10]	validation-rmse:5.67888                                                  
[11]	validation-rmse:5.59611                                                  
[12]	validation-rmse:5.53276                        








2024/09/20 17:41:59 INFO mlflow.tracking._tracking_service.client: 🏃 View run popular-hen-897 at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0/runs/fe9d63fe7dbe4eea8acaad6727064a7b.

2024/09/20 17:41:59 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0.



[0]	validation-rmse:8.75534                                                    
[1]	validation-rmse:8.43045                                                    
[2]	validation-rmse:8.13197                                                    
[3]	validation-rmse:7.86120                                                    
[4]	validation-rmse:7.60753                                                    
[5]	validation-rmse:7.38429                                                    
[6]	validation-rmse:7.18092                                                    
[7]	validation-rmse:6.98869                                                    
[8]	validation-rmse:6.82314                                                    
[9]	validation-rmse:6.66402                                                    
[10]	validation-rmse:6.52773                                                   
[11]	validation-rmse:6.40121                                                   
[12]	validation-rmse:6.29266            








2024/09/20 17:42:33 INFO mlflow.tracking._tracking_service.client: 🏃 View run grandiose-squid-391 at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0/runs/40a185b95533463197af8107ff06e323.

2024/09/20 17:42:33 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0.



[0]	validation-rmse:6.75324                                                    
[1]	validation-rmse:5.78564                                                    
[2]	validation-rmse:5.42288                                                    
[3]	validation-rmse:5.29956                                                    
[4]	validation-rmse:5.24499                                                    
[5]	validation-rmse:5.22471                                                    
[6]	validation-rmse:5.21311                                                    
[7]	validation-rmse:5.20818                                                    
[8]	validation-rmse:5.20525                                                    
[9]	validation-rmse:5.20486                                                    
[10]	validation-rmse:5.20501                                                   
[11]	validation-rmse:5.20761                                                   
[12]	validation-rmse:5.20776            








2024/09/20 17:42:53 INFO mlflow.tracking._tracking_service.client: 🏃 View run delicate-kite-138 at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0/runs/dc720c8264b74b76ab4c268dc5fc8734.

2024/09/20 17:42:53 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0.



[0]	validation-rmse:7.44514                                                    
[1]	validation-rmse:6.45081                                                    
[2]	validation-rmse:5.88557                                                    
[3]	validation-rmse:5.58264                                                    
[4]	validation-rmse:5.42093                                                    
[5]	validation-rmse:5.34011                                                    
[6]	validation-rmse:5.29821                                                    
[7]	validation-rmse:5.27601                                                    
[8]	validation-rmse:5.25838                                                    
[9]	validation-rmse:5.24960                                                    
[10]	validation-rmse:5.24246                                                   
[11]	validation-rmse:5.23915                                                   
[12]	validation-rmse:5.23670            








2024/09/20 17:43:13 INFO mlflow.tracking._tracking_service.client: 🏃 View run thoughtful-wasp-769 at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0/runs/156e37cb1b974c19828016e078f55470.

2024/09/20 17:43:13 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0.



[0]	validation-rmse:8.72825                                                    
[1]	validation-rmse:8.37854                                                    
[2]	validation-rmse:8.05963                                                    
[3]	validation-rmse:7.76929                                                    
[4]	validation-rmse:7.50604                                                    
[5]	validation-rmse:7.26752                                                    
[6]	validation-rmse:7.05239                                                    
[7]	validation-rmse:6.85824                                                    
[8]	validation-rmse:6.68385                                                    
[9]	validation-rmse:6.52628                                                    
[10]	validation-rmse:6.38445                                                   
[11]	validation-rmse:6.25741                                                   
[12]	validation-rmse:6.14479            








2024/09/20 17:43:40 INFO mlflow.tracking._tracking_service.client: 🏃 View run kindly-newt-556 at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0/runs/6f59e11a6c28490da461e7618a5c2aa7.

2024/09/20 17:43:40 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0.



100%|██████████| 10/10 [03:51<00:00, 23.12s/trial, best loss: 5.170179180318672]
[0]	validation-rmse:8.45906
[1]	validation-rmse:7.90683
[2]	validation-rmse:7.44232
[3]	validation-rmse:7.04946
[4]	validation-rmse:6.72569
[5]	validation-rmse:6.45603
[6]	validation-rmse:6.23369
[7]	validation-rmse:6.05134
[8]	validation-rmse:5.90093
[9]	validation-rmse:5.77724
[10]	validation-rmse:5.67888
[11]	validation-rmse:5.59611
[12]	validation-rmse:5.53276
[13]	validation-rmse:5.47803
[14]	validation-rmse:5.43155
[15]	validation-rmse:5.39714
[16]	validation-rmse:5.36570
[17]	validation-rmse:5.34019
[18]	validation-rmse:5.31548
[19]	validation-rmse:5.29718
[20]	validation-rmse:5.28166
[21]	validation-rmse:5.26923
[22]	validation-rmse:5.25649
[23]	validation-rmse:5.24526
[24]	validation-rmse:5.23784
[25]	validation-rmse:5.23178
[26]	validation-rmse:5.22446
[27]	validation-rmse:5.21904
[28]	validation-rmse:5.21373
[29]	validation-rmse:5.20998
[30]	validation-rmse:5.20627
[31]	validation-rmse:5.20264

2024/09/20 17:43:54 INFO mlflow.tracking._tracking_service.client: 🏃 View run Xgboost Hyper-parameter Optimization at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0/runs/5baa2192ccec4413beca0c71a70c3afb.
2024/09/20 17:43:54 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://dagshub.com/RenataOrzc/nyc-taxi-time-prediction.mlflow/#/experiments/0.


In [15]:
best_params

{'learning_rate': 0.11103884392800116,
 'max_depth': 38,
 'min_child_weight': 1.1367697094265403,
 'reg_alpha': 0.03836724066880147,
 'reg_lambda': 0.2274553022073802,
 'seed': 42,
 'objective': 'reg:squarederror'}
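
The bounds passed to `hp.loguniform` in the search space are in log space: the function draws `exp(uniform(low, high))`. A short sketch that converts the bounds used above into the actual value ranges (the dictionary names are only for illustration):

```python
import math

# (low, high) arguments passed to hp.loguniform in the search space
log_bounds = {
    "learning_rate": (-3, 0),
    "reg_alpha": (-5, -1),
    "reg_lambda": (-6, -1),
    "min_child_weight": (-1, 3),
}

# The sampled value lies between exp(low) and exp(high)
value_ranges = {
    name: (math.exp(low), math.exp(high)) for name, (low, high) in log_bounds.items()
}

for name, (low, high) in value_ranges.items():
    print(f"{name}: {low:.4f} to {high:.4f}")
```

For example, `learning_rate` is sampled between roughly 0.05 and 1, which is consistent with the value of about 0.111 found in `best_params`.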

Now let's register the best model in the `model registry` and use it to make predictions

In [17]:
run_id = input("Enter the run_id: ")
run_uri = f"runs:/{run_id}/model"

result = mlflow.register_model(
    model_uri=run_uri,
    name="nyc-taxi-model"
)

Registered model 'nyc-taxi-model' already exists. Creating a new version of this model...
2024/09/20 17:49:56 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: nyc-taxi-model, version 1
Created version '1' of model 'nyc-taxi-model'.


In [18]:
from datetime import datetime
from mlflow import MlflowClient

client = MlflowClient(tracking_uri=MLFLOW_TRACKING_URI)
client.update_registered_model(
    name="nyc-taxi-model",
    description="Model registry for the NYC Taxi Time Prediction Project",
)

new_alias = "champion"
date = datetime.today()
model_version = "1"

# create "champion" alias for version 1 of model "nyc-taxi-model"
client.set_registered_model_alias(
    name="nyc-taxi-model",
    alias=new_alias,
    version=model_version
)

client.update_model_version(
    name="nyc-taxi-model",
    version=model_version,
    description=f"The model version {model_version} was transitioned to {new_alias} on {date}",
)

<ModelVersion: aliases=['champion'], creation_timestamp=1726876303588, current_stage='None', description='The model version 1 was transitioned to champion on 2024-09-20 17:50:11.383018', last_updated_timestamp=1726876319076, name='nyc-taxi-model', run_id='00c651b6b5584b25b9c504de75443ef2', run_link='', source='mlflow-artifacts:/5237519f0cf74c8e961551526c9bca87/00c651b6b5584b25b9c504de75443ef2/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>

In [19]:
import mlflow.pyfunc

model_name = "nyc-taxi-model"
alias = "champion"

model_uri = f"models:/{model_name}@{alias}"

champion_version = mlflow.pyfunc.load_model(
    model_uri=model_uri
)

champion_version.predict(X_val)

Downloading artifacts: 100%|██████████| 5/5 [00:00<00:00,  5.89it/s]


array([18.948471 , 28.315218 ,  6.8515067, ..., 32.208088 , 13.600061 ,
       20.087597 ], dtype=float32)
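
The notebook uses two URI forms: `runs:/<run_id>/model` to register a model and `models:/<name>@<alias>` to load it. MLflow also accepts `models:/<name>/<version>`. A small sketch of a helper that builds the registry forms; the function name `model_uri_for` is hypothetical and not part of MLflow:

```python
def model_uri_for(name, *, version=None, alias=None):
    """Build a Model Registry URI from exactly one of version or alias."""
    if (version is None) == (alias is None):
        raise ValueError("pass exactly one of version or alias")
    if alias is not None:
        return f"models:/{name}@{alias}"
    return f"models:/{name}/{version}"


print(model_uri_for("nyc-taxi-model", alias="champion"))  # models:/nyc-taxi-model@champion
print(model_uri_for("nyc-taxi-model", version=1))         # models:/nyc-taxi-model/1
```

Loading by alias keeps downstream code unchanged when a different version is later given the `champion` alias.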

## Homework and in-class activity.

1. Merge the branch we worked on into main.
2. Create a new branch named `feat: tarea 5`. (Git does not accept spaces or colons in branch names, so in practice a variant such as `feat/tarea-5` is needed.)
3. Create a new `jupyter-notebook` named `challenger-experiments.ipynb` in the branch created in the previous step
4. Create two `parent experiments`, one with a `Gradient Boost` regressor and one with a `Random Forest` regressor, each with `child experiments` for the hyperparameter search. You may use any library you are comfortable with: `hyperopt`, `optuna`, `scikit-learn` (Grid Search, Random Search, Halving Search, etc.)
5. Register the model with the best metric from those experiments in the `model registry`, under the previously created `nyc-taxi-model`.
6. Assign it the alias `challenger`
7. Download the March 2024 dataset into the `data` folder
8. Store it in the `storage` available through `mlflow`
9. Use that dataset to evaluate the models with the `champion` and `challenger` aliases
10. Compute the metric for each model
11. Decide whether the new `challenger` model should be promoted to `champion`. Use the criteria that you, as a Data Scientist, consider relevant and justify your answer.
12. Open a `PR` with the changes from the `feat: tarea 5` branch into `main`.
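
For the promotion decision, one possible criterion is a minimum relative improvement in RMSE on the held-out month. The sketch below is only an illustration: the 1% threshold, the function name `should_promote`, and the RMSE values are assumptions, and other criteria (latency, model size, stability across months) may matter just as much.

```python
def should_promote(champion_rmse, challenger_rmse, min_relative_gain=0.01):
    """Promote only if the challenger lowers RMSE by at least min_relative_gain."""
    if champion_rmse <= 0:
        raise ValueError("champion_rmse must be positive")
    relative_gain = (champion_rmse - challenger_rmse) / champion_rmse
    return relative_gain >= min_relative_gain


# Hypothetical RMSE values in minutes, for illustration only
print(should_promote(5.20, 5.10))  # True: about a 1.9% improvement
print(should_promote(5.20, 5.19))  # False: about a 0.2% improvement
```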



There will be two deliveries, split as follows:

1. **In-class work, today, Tuesday, September 17, 2024.** For this delivery, make a commit with the message `feat: entrega trabajo en clase` containing the progress made in class.

2. **Homework: Friday, September 20, 2024 at 19:55.** This delivery must contain everything described above