#### 1. Experiment vs Run

1. An experiment in MLflow is a named collection of runs; it groups related runs together. Each run represents a single execution of a machine learning workflow or training process.

2. A run encapsulates all the details of that particular execution, including the code, parameters, metrics, and artifacts produced during the run.

    ```py
    import mlflow
    import mlflow.sklearn
    ```

    ```py
    mlflow.set_experiment("NAME OF EXPERIMENT")  # Declare the experiment in the main block
    ```

#### 2. Using MLflow in an ML Model

- We start the run inside this experiment using a `with` context manager, right before instantiating the model.

    ```py
    my_exp = mlflow.set_experiment("sklearn-experiment")
    with mlflow.start_run(experiment_id=my_exp.experiment_id, run_name="RUN_NAME"):
        # Train model
        ...
        ...
    ```
    If you do not use the context manager, call `mlflow.end_run()` explicitly to end the current run.

- To log hyperparameters, metrics, and the model:
    ```py
    mlflow.log_params({"n_estimators": n_estimators, "max_depth": max_depth})
    mlflow.log_metrics({
        "f1_score": f1,
        "Accuracy": accuracy,
        "Precisions": precisions
    })
    # First argument is the model, second is the artifact path
    mlflow.sklearn.log_model(rf_model, "model")
    ```

- MLflow provides a user-friendly UI to explore, interact with, and analyze ML experiments:
    ```bash
    $ mlflow ui
    ```

### 2. MLflow Advanced Functionality

1. Changing the tracking location to a server, a filesystem path, or any other accessible location.

    ```py
    # Change the URI before setting the experiment
    mlflow.set_tracking_uri(r"./model_tracking")

    my_exp = mlflow.set_experiment("Experiment_Name")
    mlflow.start_run(experiment_id=my_exp.experiment_id, run_name="RUN_NAME")
    ```
    Also, to view these experiments and runs in the MLflow UI, use the following command:

    ```bash
    mlflow ui --backend-store-uri ./model_tracking
    ```
2. Starting a run with a specific `run_id`. If we do not specify the `run_id`, the `start_run()` method creates a new run. As a result, all of our metrics, params, and artifacts are logged under that new `run_id`, even if the `run_name` is the same. When we do specify the `run_id`, specifying the `experiment_id` becomes redundant because each run is associated with exactly one experiment.

3. Launching multiple runs, for: a. hyperparameter tuning, b. model checkpointing, c. systematic experimentation.

    a. Find the optimal set of hyperparameters: define a range of values and iterate through them.
    ```py
    hyper_values = [0.01, 0.1, 0.5, 1.0]
    my_exp = mlflow.set_experiment("new_experiment")
    for lr in hyper_values:
        with mlflow.start_run(experiment_id=my_exp.experiment_id):
            mlflow.log_param("learning_rate", lr)
    ```
    b. Save Model Checkpoints at various intervals.
    ```py
    my_exp = mlflow.set_experiment("new_experiment")
    for epoch in range(100):
        with mlflow.start_run(experiment_id=my_exp.experiment_id):
            # log parameters
            mlflow.log_param("epoch", epoch)

            ...
            ...

            mlflow.sklearn.log_model(model, "model")
    ```
    c. Test different features or algorithms.
    ```py
    my_exp = mlflow.set_experiment("experiment_name")
    for feature_set in ["basic", "extended"]:
        for algorithm in ["svm", "random_forest"]:
            with mlflow.start_run(experiment_id=my_exp.experiment_id):
                mlflow.log_param("feature_set", feature_set)
                mlflow.log_param("algorithm", algorithm)
    ```

4. MLflow autologging: automatically logs various information about a run, including metrics, parameters (specified and default), the model signature, artifacts, and datasets.

    ```py
    from sklearn.ensemble import RandomForestClassifier

    my_exp = mlflow.set_experiment("mlflow_autolog")
    with mlflow.start_run(experiment_id=my_exp.experiment_id):
        mlflow.autolog()  # enable autologging before fitting the model
        rf_model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth)
        rf_model.fit(X_train, y_train)
        predictions = rf_model.predict(X_test)
        # performance() is assumed to be a helper defined elsewhere
        f1, accuracy, precision = performance(y_test, predictions)
    ```


5. MLflow tracking server. It has two components: storage and communication.
    - Storage, backend store: experiment ID, run name, run ID, params, metrics. Can be backed by a database such as SQLite or PostgreSQL.
    - By default, it logs to the `./mlruns` directory.
    - Remote artifact store: trained models, input data, output files, etc. Uses the `./mlartifacts` directory.

    ```bash
    $ mlflow ui --backend-store-uri sqlite:///my.db \
        --default-artifact-root ./artifacts-store \
        --host 127.0.0.1 \
        --port 5005
    ```

    ```py
    # set_tracking_uri() returns None, so there is nothing to assign
    mlflow.set_tracking_uri(uri="http://127.0.0.1:5005")
    my_exp = mlflow.set_experiment("mlflow_tracking_server")
    ```

```py
import sqlite3
import pandas as pd

# Connect to the SQLite database
conn = sqlite3.connect('my.db')

# Create a cursor object to execute SQL queries
cursor = conn.cursor()

cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
table_names = cursor.fetchall()

for table in table_names:
    print(table)
```

```text
('experiments',)
('alembic_version',)
('experiment_tags',)
('registered_models',)
('runs',)
('registered_model_tags',)
('model_versions',)
('latest_metrics',)
('metrics',)
('registered_model_aliases',)
('inputs',)
('input_tags',)
('params',)
('trace_info',)
('trace_tags',)
('trace_request_metadata',)
('tags',)
('datasets',)
('logged_models',)
('logged_model_metrics',)
('logged_model_params',)
('logged_model_tags',)
('model_version_tags',)
```


```py
cursor.execute("SELECT * FROM metrics")
rows = cursor.fetchall()
column_name = [desc[0] for desc in cursor.description]
pd.DataFrame(rows, columns=column_name)
```

|   | key | value | timestamp | run_uuid | step | is_nan |
|---|-----|-------|-----------|----------|------|--------|
| 0 | training_precision_score | 0.958627 | 1753113554784 | 78d1fc4855c54a08a4cca2c6d0cd0aea | 0 | 0 |
| 1 | training_recall_score | 0.9575 | 1753113554784 | 78d1fc4855c54a08a4cca2c6d0cd0aea | 0 | 0 |
| 2 | training_f1_score | 0.957463 | 1753113554784 | 78d1fc4855c54a08a4cca2c6d0cd0aea | 0 | 0 |
| 3 | training_accuracy_score | 0.9575 | 1753113554784 | 78d1fc4855c54a08a4cca2c6d0cd0aea | 0 | 0 |
| 4 | training_log_loss | 0.179907 | 1753113554784 | 78d1fc4855c54a08a4cca2c6d0cd0aea | 0 | 0 |
| 5 | training_roc_auc | 0.995568 | 1753113554784 | 78d1fc4855c54a08a4cca2c6d0cd0aea | 0 | 0 |
| 6 | training_score | 0.9575 | 1753113555932 | 78d1fc4855c54a08a4cca2c6d0cd0aea | 0 | 0 |


```py
cursor.execute("SELECT * FROM params")
rows = cursor.fetchall()
column_name = [desc[0] for desc in cursor.description]
pd.DataFrame(rows, columns=column_name)
```

|   | key | value | run_uuid |
|---|-----|-------|----------|
| 0 | bootstrap | True | 78d1fc4855c54a08a4cca2c6d0cd0aea |
| 1 | ccp_alpha | 0.0 | 78d1fc4855c54a08a4cca2c6d0cd0aea |
| 2 | class_weight |  | 78d1fc4855c54a08a4cca2c6d0cd0aea |
| 3 | criterion | gini | 78d1fc4855c54a08a4cca2c6d0cd0aea |
| 4 | max_depth | 5 | 78d1fc4855c54a08a4cca2c6d0cd0aea |
| 5 | max_features | sqrt | 78d1fc4855c54a08a4cca2c6d0cd0aea |
| 6 | max_leaf_nodes |  | 78d1fc4855c54a08a4cca2c6d0cd0aea |
| 7 | max_samples |  | 78d1fc4855c54a08a4cca2c6d0cd0aea |
| 8 | min_impurity_decrease | 0.0 | 78d1fc4855c54a08a4cca2c6d0cd0aea |
| 9 | min_samples_leaf | 1 | 78d1fc4855c54a08a4cca2c6d0cd0aea |
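Because `params` and `metrics` are both stored in long format keyed by `run_uuid`, a per-run summary can be assembled with two filtered queries. The sketch below does not touch `my.db`; it builds an in-memory stand-in with only the columns shown in the outputs above and a few of the same rows. The simplified table definitions and the `run_summary` helper are assumptions for illustration, not MLflow's actual schema or API.

```python
import sqlite3

# In-memory stand-in for the backend store (simplified, hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE params  (key TEXT, value TEXT, run_uuid TEXT);
    CREATE TABLE metrics (key TEXT, value REAL, timestamp INTEGER,
                          run_uuid TEXT, step INTEGER, is_nan INTEGER);
""")

run_id = "78d1fc4855c54a08a4cca2c6d0cd0aea"
conn.executemany(
    "INSERT INTO params VALUES (?, ?, ?)",
    [("max_depth", "5", run_id), ("criterion", "gini", run_id)],
)
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?, ?, 0, 0)",
    [
        ("training_f1_score", 0.957463, 1753113554784, run_id),
        ("training_accuracy_score", 0.9575, 1753113554784, run_id),
    ],
)


def run_summary(conn, run_uuid):
    """Collect one run's params and metrics into plain dictionaries."""
    params = dict(conn.execute(
        "SELECT key, value FROM params WHERE run_uuid = ?", (run_uuid,)))
    metrics = dict(conn.execute(
        "SELECT key, value FROM metrics WHERE run_uuid = ?", (run_uuid,)))
    return {"params": params, "metrics": metrics}


summary = run_summary(conn, run_id)
print(summary)
```

Using `?` placeholders instead of string formatting keeps the queries safe if the run ID comes from user input. For everyday analysis, `mlflow.search_runs` is usually a more stable route than querying the database tables directly.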
