# Table of Contents

* [Introduction](#1)
* [Workspace Preparation](#2)
* [Data Preparation](#3)
* [Getting x_train, x_test, y_train, y_test](#4)
* [MLflow Workspace Preparation and Use](#5)
* [Conclusions](#6)
* [References](#7)

# Introduction

### Notebook Scenario

- A company active in Big Data and Data Science wants to hire data scientists from among the people who successfully pass courses that the company runs. Many people sign up for its training. The company wants to know which of these candidates really want to work for the company after the training and which are looking for a new job, because this helps reduce cost and time and improves the quality of the training, the course planning, and the categorization of candidates. Information on demographics, education, and experience is available from the candidates' signup and enrollment.

- This dataset is designed to help understand the factors that lead a person to leave their current job, and it also serves HR research. Using one or more models built on current credentials, demographics, and experience data, **you will predict the probability that a candidate will look for a new job or will work for the company, and interpret the factors that affect the employee's decision.**

- The full dataset is split into training and test sets.

- The target is not included in the test set, but a data file with the test target values is provided for related tasks.

- A sample submission is also provided, keyed by the enrollee ID in the test set, with the columns: enrollee_id, target.


# Workspace preparation

In [1]:
#%matplotlib inline
#%config Completer.use_jedi=False

In [2]:
from pathlib import Path
from urllib.parse import urlparse

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                             roc_auc_score, classification_report)
import mlflow
import mlflow.sklearn

# Data Preparation 

In [3]:
data = pd.read_csv(Path("../data/aug_train.csv"))
targets = data[["target"]]
data.drop(["enrollee_id", "target"], inplace=True, axis=1)

In [4]:
categorical_features = []
numerical_features = []

for column in data.columns:
    dtype = str(data[column].dtype)
    if dtype in ["float64", "int64"]:
        numerical_features.append(column)
    else:
        categorical_features.append(column)
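
The loop above treats only `float64` and `int64` columns as numerical, so a column stored as, for example, `int32` would end up in the categorical list. A minimal sketch of an alternative based on pandas' `select_dtypes`, shown on a small made-up frame (the toy values and the `toy_*` names are illustrative assumptions, not part of the notebook):

```python
import pandas as pd

# Toy frame standing in for the training data (values are made up).
toy = pd.DataFrame(
    {
        "city": ["city_103", "city_40", "city_21"],
        "city_development_index": [0.920, 0.776, 0.624],
        "training_hours": pd.Series([36, 47, 83], dtype="int32"),
    }
)

# Any numeric dtype counts as numerical; everything else is treated as categorical.
toy_numerical = toy.select_dtypes(include="number").columns.tolist()
toy_categorical = toy.select_dtypes(exclude="number").columns.tolist()

print(toy_numerical)
print(toy_categorical)
```

Here the `int32` column is still classified as numerical, which the string comparison against `"int64"` would miss.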

In [5]:
for categorical_feature in categorical_features:
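    # Assign the result back: calling fillna(..., inplace=True) on a selected column
    # is unreliable in recent pandas versions and may leave `data` unchanged.
    data[categorical_feature] = data[categorical_feature].fillna("missing")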

In [6]:
for categorical_feature in categorical_features:
    le = LabelEncoder()
    data[categorical_feature] = le.fit_transform(data[categorical_feature])
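
The cell above fits one `LabelEncoder` per column on the full data before the split, so a category that first appears at prediction time would raise an error. As a hedged alternative (not what this notebook does), scikit-learn's `OrdinalEncoder` can encode several columns at once and map unseen categories to a sentinel value. The toy frames below are made up for illustration:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Made-up rows; "Other" appears only in the test frame.
toy_train = pd.DataFrame(
    {"gender": ["Male", "Female", "missing"], "company_type": ["Pvt Ltd", "NGO", "Pvt Ltd"]}
)
toy_test = pd.DataFrame({"gender": ["Other"], "company_type": ["NGO"]})

# Unseen categories are encoded as -1 instead of raising an error.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
encoded_train = encoder.fit_transform(toy_train)
encoded_test = encoder.transform(toy_test)

print(encoded_train)
print(encoded_test)
```

Both encoders assign integers in sorted category order, so the codes carry no meaningful ranking; that matters for linear models such as logistic regression.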

## Getting x_train, x_test, y_train, y_test

In [7]:
x_train, x_test, y_train, y_test = train_test_split(data.values, 
                                                    targets.values.ravel(), 
                                                    test_size=0.3, 
                                                    random_state=2021,
                                                    stratify=targets.values)

In [8]:
print(x_train.shape, x_test.shape)

(13410, 12) (5748, 12)


In [9]:
print(y_train.shape, y_test.shape)

(13410,) (5748,)
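
Because the split uses `stratify`, the share of positive targets should be nearly the same in both parts. A small sketch of that check on synthetic labels (the 3:1 class ratio and the `toy_*` names are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 750 negatives and 250 positives, with random features.
rng = np.random.default_rng(2021)
toy_y = np.array([0] * 750 + [1] * 250)
toy_x = rng.normal(size=(1000, 3))

_, _, toy_y_train, toy_y_test = train_test_split(
    toy_x, toy_y, test_size=0.3, random_state=2021, stratify=toy_y
)

# With stratification, both shares stay close to the overall positive share (0.25).
print(toy_y_train.mean(), toy_y_test.mean())
```

In the notebook, the same check would be `y_train.mean()` and `y_test.mean()`.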


# MLflow Workspace Preparation and Use

In [10]:
mlflow_uri_from_aws = input("What is the URL of the EC2 instance? ")
# Example: "http://3.226.165.98:5000/"

In [11]:
mlflow.set_tracking_uri(mlflow_uri_from_aws)

In [12]:
tracking_uri = mlflow.get_tracking_uri()

In [13]:
tracking_uri

'http://3.226.165.98:5000/'
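
Since the URL is typed in by hand, it can help to validate it before passing it to `mlflow.set_tracking_uri`. A minimal sketch using `urlparse` from the standard library; the helper name `is_valid_tracking_uri` is hypothetical, and it only covers the remote HTTP(S) server case used here (MLflow also accepts other URI forms, such as local paths):

```python
from urllib.parse import urlparse


def is_valid_tracking_uri(uri: str) -> bool:
    """Return True when the URI has an http(s) scheme and a host part."""
    parsed = urlparse(uri.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_valid_tracking_uri("http://3.226.165.98:5000/"))  # True
print(is_valid_tracking_uri("3.226.165.98:5000"))          # False: scheme is missing
```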

## Part1: Checking MLflow Default Experiment and Runs

In [14]:
import mlflow

# Case sensitive name
experiment = mlflow.get_experiment_by_name("Default")
print("Experiment_id: {}".format(experiment.experiment_id))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Tags: {}".format(experiment.tags))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))
print("Creation timestamp: {}".format(experiment.creation_time))


Experiment_id: 0
Artifact Location: s3://mlflow-artifact-store-awscday/0
Tags: {}
Lifecycle_stage: active
Creation timestamp: 1689034898116


In [15]:
from mlflow import MlflowClient
from mlflow.entities import ViewType

run = MlflowClient().search_runs(
    experiment_ids=["2"],
    filter_string="",
    run_view_type=ViewType.ACTIVE_ONLY,
    max_results=1,
    order_by=["metrics.accuracy DESC"],
)

print(run)

[]


## Part2: Understanding MLflow Functions with Scenarios

In [16]:
experiment_name_formlflow = input("What is the name of your experiment? ")
"01B3_MFlow_LR_train"


'01B3_MFlow_LR_train'

In [17]:
print(experiment_name_formlflow)

01B3_MFlow_LR_train


In [18]:
from mlflow import MlflowClient

client = MlflowClient()

# experiment_name_formlflow = "01Basic_MFlow_LR_train"
# Create the experiment; its name must be unique and is case sensitive.
# create_experiment raises an error if the name already exists.
experiment_id = client.create_experiment(
    experiment_name_formlflow
)

# Fetch experiment metadata information
experiment = client.get_experiment(experiment_id)
print("Name: {}".format(experiment.name))
print("Experiment_id: {}".format(experiment.experiment_id))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Tags: {}".format(experiment.tags))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))

Name: 01B3_MFlow_LR_train
Experiment_id: 22
Artifact Location: s3://mlflow-artifact-store-awscday/22
Tags: {}
Lifecycle_stage: active
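
`create_experiment` fails when the name is already taken, so re-running the cell above raises an error. A hedged sketch of a get-or-create helper; the function name `get_or_create_experiment` is hypothetical, and it relies only on the `get_experiment_by_name` and `create_experiment` methods of `MlflowClient`:

```python
def get_or_create_experiment(client, name: str) -> str:
    """Return the ID of the experiment called `name`, creating it if needed.

    `client` is expected to be an mlflow.MlflowClient (or any object that
    exposes get_experiment_by_name and create_experiment).
    """
    experiment = client.get_experiment_by_name(name)
    if experiment is not None:
        return experiment.experiment_id
    return client.create_experiment(name)


# Possible usage in this notebook:
#     experiment_id = get_or_create_experiment(MlflowClient(), experiment_name_formlflow)
```

`mlflow.set_experiment`, used later in this notebook, offers similar behavior but also changes the active experiment.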


### Experiment 1, Smaller Group of Parameters and Metrics

In [19]:
with mlflow.start_run(experiment_id=experiment_id):
    class_weight = "balanced"
    max_iter = 1000

    logistic_regression = LogisticRegression(class_weight=class_weight, max_iter=max_iter)
    logistic_regression.fit(x_train, y_train)

    y_pred = logistic_regression.predict(x_test)

    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    auc = roc_auc_score(y_test, y_pred)
    
    mlflow.log_param("class_weight", class_weight)
    mlflow.log_param("max_iter", max_iter)
    
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("precision", precision)
    mlflow.log_metric("recall", recall)
    mlflow.log_metric("f1", f1)
    mlflow.log_metric("auc", auc) 
    
    mlflow.sklearn.log_model(logistic_regression, "model")
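
The `auc` metric above is computed from the hard class labels returned by `predict`. With 0/1 predictions the ROC curve has a single operating point, so that value equals balanced accuracy rather than a threshold-independent AUC. A sketch of the difference on synthetic data (the dataset and the `toy_*`/`t*` names are assumptions for illustration, not the notebook's data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary problem.
toy_x, toy_y = make_classification(
    n_samples=500, n_features=6, weights=[0.75, 0.25], random_state=2021
)
tx_train, tx_test, ty_train, ty_test = train_test_split(
    toy_x, toy_y, test_size=0.3, random_state=2021, stratify=toy_y
)

model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(tx_train, ty_train)
labels = model.predict(tx_test)
scores = model.predict_proba(tx_test)[:, 1]  # probability of the positive class

auc_from_labels = roc_auc_score(ty_test, labels)
auc_from_scores = roc_auc_score(ty_test, scores)

# The first two numbers coincide; the third uses the full ranking of scores.
print(auc_from_labels, balanced_accuracy_score(ty_test, labels))
print(auc_from_scores)
```

In the notebook, the score-based variant would be `roc_auc_score(y_test, logistic_regression.predict_proba(x_test)[:, 1])`.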



#### Using nested=True in Experiment1

In [20]:
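# Note: nested=True only creates a child run when a parent run is already active.
# With no active parent, as here, this starts an ordinary top-level run.
with mlflow.start_run(experiment_id=experiment_id, nested=True):
    # tracking run parameters, ecosystem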
    mlflow.log_param("compute", 'local')
    mlflow.log_param("dataset", 'kaggle-dataset')
    mlflow.log_param("dataset_version", '1.0')
    mlflow.log_param("algo", 'Logistic Regression')

    class_weight = "balanced"
    max_iter = 1000

    logistic_regression = LogisticRegression(class_weight=class_weight, max_iter=max_iter)
    logistic_regression.fit(x_train, y_train)

    y_pred = logistic_regression.predict(x_test)

    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    auc = roc_auc_score(y_test, y_pred)
    
    mlflow.log_param("class_weight", class_weight)
    mlflow.log_param("max_iter", max_iter)
    
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("precision", precision)
    mlflow.log_metric("recall", recall)
    mlflow.log_metric("f1", f1)
    mlflow.log_metric("auc", auc) 
    
    mlflow.sklearn.log_model(logistic_regression, "model")

### Experiment 2, Bigger Group of Parameters and Metrics

#### Adjusting metadata

In [21]:
#Train dataset URL
dataset_source_url="https://www.kaggle.com/datasets/arashnic/hr-analytics-job-change-of-data-scientists?resource=download&select=aug_train.csv"

#### Using MlflowClient() to create experiment

In [22]:
client = MlflowClient()

experiment_name_formlflow = "02B3_MFlow_LR_train"
# Create the experiment; its name must be unique and is case sensitive.
experiment_id = client.create_experiment(
    experiment_name_formlflow
)

#### Running the experiment with mlflow.start_run

In [23]:
with mlflow.start_run(experiment_id=experiment_id):

    # tracking run parameters, ecosystem
    mlflow.log_param("compute", 'local')
    mlflow.log_param("dataset", 'kaggle-dataset')
    mlflow.log_param("dataset_version", '1.0')
    mlflow.log_param("dataset_path", '../data/aug_train.csv')
    mlflow.log_param("dataset_url", dataset_source_url)
    mlflow.log_param("algo", 'Logistic Regression')

    class_weight = "balanced"
    max_iter = 1000

    logistic_regression = LogisticRegression(class_weight=class_weight, max_iter=max_iter)
    logistic_regression.fit(x_train, y_train)

    y_pred = logistic_regression.predict(x_test)

    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    auc = roc_auc_score(y_test, y_pred)
    
    mlflow.log_param("class_weight", class_weight)
    mlflow.log_param("max_iter", max_iter)
    
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("precision", precision)
    mlflow.log_metric("recall", recall)
    mlflow.log_metric("f1", f1)
    mlflow.log_metric("auc", auc) 
    
    mlflow.sklearn.log_model(logistic_regression, "model")
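
The same metric and logging calls are repeated in every run of this notebook. One way to shorten them, sketched here under the assumption that a small helper is acceptable (the name `classification_metrics` is hypothetical), is to collect the metrics in a dictionary and log them with `mlflow.log_metrics`:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def classification_metrics(y_true, y_pred) -> dict:
    """Collect the notebook's label-based metrics in one dictionary."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }


# Tiny made-up example: one positive is missed, nothing is falsely flagged.
toy_metrics = classification_metrics([0, 1, 1, 0], [0, 1, 0, 0])
print(toy_metrics)

# Inside an active run, the dictionaries could be logged in one call each:
#     mlflow.log_metrics(classification_metrics(y_test, y_pred))
#     mlflow.log_params({"class_weight": class_weight, "max_iter": max_iter})
```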

In [24]:
# Fetch experiment metadata information
experiment = client.get_experiment(experiment_id)
print("Name: {}".format(experiment.name))
print("Experiment_id: {}".format(experiment.experiment_id))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Tags: {}".format(experiment.tags))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))
print("Creation timestamp: {}".format(experiment.creation_time))

Name: 02B3_MFlow_LR_train
Experiment_id: 23
Artifact Location: s3://mlflow-artifact-store-awscday/23
Tags: {}
Lifecycle_stage: active
Creation timestamp: 1689264796535
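
The creation timestamp is expressed in milliseconds since the Unix epoch. A short sketch for turning it into a readable UTC datetime (the helper name `ms_to_utc` is hypothetical):

```python
from datetime import datetime, timezone


def ms_to_utc(timestamp_ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    seconds, millis = divmod(timestamp_ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=millis * 1000)


print(ms_to_utc(1689264796535).isoformat())  # 2023-07-13T16:13:16.535000+00:00
```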


In [27]:
from mlflow.entities import ViewType

run = MlflowClient().search_runs(
    experiment_ids=[experiment_id],
    filter_string="",
    run_view_type=ViewType.ACTIVE_ONLY,
    order_by=["metrics.accuracy DESC"],
)

print(len(run))
print(run[0])

1
<Run: data=<RunData: metrics={'accuracy': 0.7352122477383438,
 'auc': 0.7208662878564284,
 'f1': 0.5658870507701085,
 'precision': 0.4785335262904004,
 'recall': 0.6922540125610607}, params={'algo': 'Logistic Regression',
 'class_weight': 'balanced',
 'compute': 'local',
 'dataset': 'kaggle-dataset',
 'dataset_path': '../data/aug_train.csv"',
 'dataset_url': 'https://www.kaggle.com/datasets/arashnic/hr-analytics-job-change-of-data-scientists?resource=download&select=aug_train.csv',
 'dataset_version': '1.0',
 'max_iter': '1000'}, tags={'mlflow.log-model.history': '[{"run_id": "a44df8eaac0f4248a23b24b6bae2aa0f", '
                             '"artifact_path": "model", "utc_time_created": '
                             '"2023-07-13 16:13:25.602876", "flavors": '
                             '{"python_function": {"model_path": "model.pkl", '
                             '"predict_fn": "predict", "loader_module": '
                             '"mlflow.sklearn", "python_version": "3.8.3

#### Re-running the experiment after mlflow.set_experiment and logging the aug_train data as an artifact

In [60]:
experiment_name_formlflow="02B3_MFlow_LR_train"
#Set an experiment name, which must be unique and case sensitive
experiment_id = mlflow.set_experiment(
    experiment_name_formlflow
)

In [61]:
print(type(experiment_id),experiment_id.experiment_id)

<class 'mlflow.entities.experiment.Experiment'> 23


In [62]:
experiment_id=experiment_id.experiment_id
# Fetch experiment metadata information
experiment = client.get_experiment(experiment_id)
print("Name: {}".format(experiment.name))
print("Experiment_id: {}".format(experiment.experiment_id))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Tags: {}".format(experiment.tags))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))

Name: 02B3_MFlow_LR_train
Experiment_id: 23
Artifact Location: s3://mlflow-artifact-store-awscday/23
Tags: {}
Lifecycle_stage: active


In [64]:
# Re-run over experiment and logging train_data as artifact

import os
#Log data as artifact
import tempfile

data = pd.read_csv(Path("../data/aug_train.csv"))
targets = data[["target"]]
data.drop(["enrollee_id", "target"], inplace=True, axis=1)

with mlflow.start_run(experiment_id=experiment_id):

    # tracking run parameters, ecosystem
    mlflow.log_param("compute", 'local')
    mlflow.log_param("dataset", 'kaggle-dataset')
    mlflow.log_param("dataset_version", '1.0')
    mlflow.log_param("dataset_path", '../data/aug_train.csv')
    mlflow.log_param("dataset_url", dataset_source_url)
    mlflow.log_param("algo", 'Logistic Regression')

    class_weight = "balanced"
    max_iter = 1000

    logistic_regression = LogisticRegression(class_weight=class_weight, max_iter=max_iter)
    logistic_regression.fit(x_train, y_train)

    y_pred = logistic_regression.predict(x_test)

    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    auc = roc_auc_score(y_test, y_pred)
    
    mlflow.log_param("class_weight", class_weight)
    mlflow.log_param("max_iter", max_iter)
    
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("precision", precision)
    mlflow.log_metric("recall", recall)
    mlflow.log_metric("f1", f1)
    mlflow.log_metric("auc", auc) 
    
    mlflow.sklearn.log_model(logistic_regression, "model")

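    # Write the frame to a temporary directory and log that directory's contents as run artifacts
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'aug_train.csv')
        print(path)
        data.to_csv(path, index=False)  # index=False avoids writing the row index as an extra column
        mlflow.log_artifacts(tmp)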

/var/folders/65/1pl9_5wj56j1j9mtndbywpr80000gn/T/tmpwk860smm/aug_train.csv


### Experiment 3: Using Datasets

In [48]:
import mlflow.data
import pandas as pd
from mlflow.data.pandas_dataset import PandasDataset

In [49]:
experiment_name_formlflow="02B3_MFlow_LR_train"
#Set an experiment name, which must be unique and case sensitive
experiment_id = mlflow.set_experiment(
    experiment_name_formlflow
)

In [50]:
print(type(experiment_id),experiment_id.experiment_id)

<class 'mlflow.entities.experiment.Experiment'> 23


In [51]:
experiment_id=experiment_id.experiment_id
# Fetch experiment metadata information
experiment = client.get_experiment(experiment_id)
print("Name: {}".format(experiment.name))
print("Experiment_id: {}".format(experiment.experiment_id))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Tags: {}".format(experiment.tags))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))

Name: 02B3_MFlow_LR_train
Experiment_id: 23
Artifact Location: s3://mlflow-artifact-store-awscday/23
Tags: {}
Lifecycle_stage: active


In [52]:
#Train dataset URL
dataset_source_url="https://www.kaggle.com/datasets/arashnic/hr-analytics-job-change-of-data-scientists?resource=download&select=aug_train.csv"

#Using mlflow.data.pandas_dataset, to log_input dataset instead of artifact
dataset = mlflow.data.pandas_dataset.from_pandas(data, source=dataset_source_url)

with mlflow.start_run(experiment_id=experiment_id):
    
    # Log the dataset to the MLflow Run. Specify the "training" context to indicate that the
    # dataset is used for model training
    mlflow.log_input(dataset, context="training")

    # tracking run parameters, ecosystem
    mlflow.log_param("compute", 'local')
    mlflow.log_param("dataset", 'kaggle-dataset')
    mlflow.log_param("dataset_version", '1.0')
    mlflow.log_param("dataset_url", dataset_source_url)
    mlflow.log_param("algo", 'Logistic Regression')

    class_weight = "balanced"
    max_iter = 1000

    logistic_regression = LogisticRegression(class_weight=class_weight, max_iter=max_iter)
    logistic_regression.fit(x_train, y_train)

    y_pred = logistic_regression.predict(x_test)

    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    auc = roc_auc_score(y_test, y_pred)
    
    mlflow.log_param("class_weight", class_weight)
    mlflow.log_param("max_iter", max_iter)
    
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("precision", precision)
    mlflow.log_metric("recall", recall)
    mlflow.log_metric("f1", f1)
    mlflow.log_metric("auc", auc) 

    mlflow.sklearn.log_model(logistic_regression, "model")

    # Retrieve the run, including dataset information
    run = mlflow.get_run(mlflow.last_active_run().info.run_id)  



In [53]:
#Getting dataset information
dataset_info = run.inputs.dataset_inputs[0].dataset
print(f"Dataset name: {dataset_info.name}")
print(f"Dataset digest: {dataset_info.digest}")
print(f"Dataset profile: {dataset_info.profile}")
print(f"Dataset schema: {dataset_info.schema}")

Dataset name: dataset
Dataset digest: b5ad9a5d
Dataset profile: {"num_rows": 19158, "num_elements": 229896}
Dataset schema: {"mlflow_colspec": [{"type": "string", "name": "city"}, {"type": "double", "name": "city_development_index"}, {"type": "string", "name": "gender"}, {"type": "string", "name": "relevent_experience"}, {"type": "string", "name": "enrolled_university"}, {"type": "string", "name": "education_level"}, {"type": "string", "name": "major_discipline"}, {"type": "string", "name": "experience"}, {"type": "string", "name": "company_size"}, {"type": "string", "name": "company_type"}, {"type": "string", "name": "last_new_job"}, {"type": "long", "name": "training_hours"}]}
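
The schema and profile are printed above as JSON text, so they can be parsed with the standard `json` module. A sketch that counts column types and cross-checks the profile, using the values copied from the output above (assuming the attributes are JSON strings, as the printed form suggests):

```python
import json
from collections import Counter

# Values copied from the output above.
schema_json = """
{"mlflow_colspec": [
  {"type": "string", "name": "city"},
  {"type": "double", "name": "city_development_index"},
  {"type": "string", "name": "gender"},
  {"type": "string", "name": "relevent_experience"},
  {"type": "string", "name": "enrolled_university"},
  {"type": "string", "name": "education_level"},
  {"type": "string", "name": "major_discipline"},
  {"type": "string", "name": "experience"},
  {"type": "string", "name": "company_size"},
  {"type": "string", "name": "company_type"},
  {"type": "string", "name": "last_new_job"},
  {"type": "long", "name": "training_hours"}
]}
"""
profile_json = '{"num_rows": 19158, "num_elements": 229896}'

columns = json.loads(schema_json)["mlflow_colspec"]
profile = json.loads(profile_json)

# How many columns of each logical type were logged.
type_counts = Counter(column["type"] for column in columns)
print(dict(type_counts))

# For a rectangular frame, num_elements equals rows x columns.
print(profile["num_rows"] * len(columns) == profile["num_elements"])
```

In the notebook, `dataset_info.schema` and `dataset_info.profile` would take the place of the two literal strings.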


In [54]:
#Checking last active run
run = mlflow.last_active_run()
run.info.run_id

'd5927bf6e30d472595068813b30ffc80'

In [58]:
# Load the dataset's source, which downloads the content from the source URL to the local
# filesystem
#dataset_source = mlflow.data.get_source(dataset_info)
#data_loaded = dataset_source.load()

# Conclusions

# References

# Other key functions and Info

In [248]:
def assert_experiment_names_equal(experiments, expected_names):
    actual_names = [e.name for e in experiments if e.name != "Default"]
    assert actual_names == expected_names, (actual_names, expected_names)

search_name = "02B_MFlow_LR_train"
# Search for experiments whose name matches search_name exactly
experiments = mlflow.search_experiments(filter_string=f"name = '{search_name}'")
# assert_experiment_names_equal(experiments, [search_name])

print(f'%% Experiments by the name: {search_name} are: {len(experiments)}')
print('%% Making loop over experiments list \n')
for element in experiments:
    print("Name: {}".format(element.name))
    print("Experiment_id: {}".format(element.experiment_id))
    print("Artifact Location: {}".format(element.artifact_location))
    print("Tags: {}".format(element.tags))
    print("Lifecycle_stage: {}".format(element.lifecycle_stage))
    print("Creation timestamp: {}".format(element.creation_time))
 

%% Experiments by the name: 02B_MFlow_LR_train are: 0
%% Making loop over experiments list 



In [249]:
import mlflow

# Specify the name of the experiment you want to check
experiment_name = "02B2_MFlow_LR_train"

# Get the experiment by name
experiment = mlflow.get_experiment_by_name(experiment_name)

# Check if the experiment exists and if it is active
if experiment is not None and experiment.lifecycle_stage == "active":
    print("The experiment is active.")
else:
    print("The experiment is either not found or not active.")

The experiment is active.


In [243]:
# # Construct a Pandas DataFrame using red wine quality data from a web URL
# dataset_source_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
# df = pd.read_csv(dataset_source_url, sep=";")  # this file is semicolon-separated
# # Construct an MLflow PandasDataset from the Pandas DataFrame, and specify the web URL
# # as the source
# dataset = mlflow.data.pandas_dataset.from_pandas(df, source=dataset_source_url)

# with mlflow.start_run(experiment_id=experiment_id):
#     # Log the dataset to the MLflow Run. Specify the "training" context to indicate that the
#     # dataset is used for model training
#     mlflow.log_input(dataset, context="training")

# # Retrieve the run, including dataset information
# run = mlflow.get_run(mlflow.last_active_run().info.run_id)
# dataset_info = run.inputs.dataset_inputs[0].dataset
# print(f"Dataset name: {dataset_info.name}")
# print(f"Dataset digest: {dataset_info.digest}")
# print(f"Dataset profile: {dataset_info.profile}")
# print(f"Dataset schema: {dataset_info.schema}")


In [245]:
# import mlflow

# from sklearn.model_selection import train_test_split
# from sklearn.datasets import load_diabetes
# from sklearn.ensemble import RandomForestRegressor

# mlflow.autolog()

# db = load_diabetes()
# X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

# # Create and train models.
# rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
# rf.fit(X_train, y_train)

# # Use the model to make predictions on the test dataset.
# predictions = rf.predict(X_test)
# autolog_run = mlflow.last_active_run()

# mlflow.end_run()

In [246]:
#client = mlflow.MlflowClient()
#data = client.get_run(mlflow.active_run().info.run_id).data

In [None]:
# experiment_name_formlflow="01B3_MFlow_LR_train"
# #Set an experiment name, which must be unique and case sensitive
# experiment_id = mlflow.set_experiment(
#     experiment_name_formlflow
# )

# print(type(experiment_id),experiment_id.experiment_id)

# from mlflow import MlflowClient

# # Create an experiment with a name that is unique and case sensitive.
# client = MlflowClient()

# experiment_id=experiment_id.experiment_id
# # Fetch experiment metadata information
# experiment = client.get_experiment(experiment_id)
# print("Name: {}".format(experiment.name))
# print("Experiment_id: {}".format(experiment.experiment_id))
# print("Artifact Location: {}".format(experiment.artifact_location))
# print("Tags: {}".format(experiment.tags))
# print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))


# import json
# import plotly.express as px
# import mlflow
# import requests
    
# ### prepare sample files to log
# # test data
# df = px.data.iris()

# # sample CSV file
# df.to_csv("1_data_sample.csv")

# # sample pandas HTML file
# df.to_html("2_data_sample.html")

# # sample image
# r = requests.get("https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png")
# with open("3_image_sample.png", 'wb') as f:
#     f.write(r.content)
    
# # sample gif
# r = requests.get("https://media1.giphy.com/media/bU3YVJAAXckCI/giphy.gif")
# with open("4_gif_sample.gif", 'wb') as f:
#     f.write(r.content)

# # sample plotly plot - HTML
# fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species", marginal_y="rug", marginal_x="histogram")
# fig.write_html("5_plot_sample.html")

# # sample geojson
# with open("6_map_sample.geojson", "w+") as f:
#     data = requests.get("https://gist.githubusercontent.com/wavded/1200773/raw/e122cf709898c09758aecfef349964a8d73a83f3/sample.json").json()
#     f.write(json.dumps(data))
    
# ### log files to mlflow experiment
# with mlflow.start_run(experiment_id=experiment_id, run_name="file_display") as run:
    
#     mlflow.log_param("parameter","test")
#     mlflow.log_metric("the_answer",42.0)
    
#     mlflow.log_artifact("./1_data_sample.csv")
#     mlflow.log_artifact("./2_data_sample.html")
#     mlflow.log_artifact("./3_image_sample.png")
#     mlflow.log_artifact("./4_gif_sample.gif")
#     mlflow.log_artifact("./5_plot_sample.html")
#     mlflow.log_artifact("./6_map_sample.geojson")