In [24]:
# run this to shorten the data import from the files
import os
cwd = os.path.dirname(os.getcwd())+'/'
path_data = os.path.join(os.path.dirname(os.getcwd()), 'datasets/')


In [25]:
import pandas as pd
from sklearn.model_selection import train_test_split as tts
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LinearRegression

data = pd.read_csv(path_data+'50_Startups.csv')

encoder = LabelEncoder()

X = data.iloc[:, :-1].copy()  # copy so the State assignment below does not write into a slice of data
X['State'] = encoder.fit_transform(X['State'])
y = data.iloc[:,-1]

X_train, X_test, y_train, y_test = tts(X,y, test_size=0.3)



In [26]:
import warnings

# Ignore the FutureWarning and UserWarning warnings
warnings.simplefilter("ignore", FutureWarning)
warnings.simplefilter("ignore", UserWarning)

In [27]:
# exercise 01

"""
Package a machine learning model

In this exercise, you will train a LinearRegression model from scikit-learn to predict the profit of a Unicorn Company.

You will use MLflow's built-in scikit-learn Flavor to package the model. You will use the Flavor's auto logging function to automatically log metrics, parameters and the model to MLflow Tracking when the estimator's fit() method is called.
"""

# Instructions

"""


    Import the sklearn Flavor from the mlflow module.
    Set the Experiment to "Sklearn Model".
    Use auto logging from the flavor to package your model.

"""

# solution

# Import Scikit-learn flavor
import mlflow.sklearn

# Set the experiment to "Sklearn Model"
mlflow.set_experiment("Sklearn Model")

# Set Auto logging for Scikit-learn flavor 
mlflow.sklearn.autolog()

lr = LinearRegression()
lr.fit(X_train, y_train)

# Get a prediction from test data
print(lr.predict(X_test.iloc[[5]]))


#----------------------------------#

# Conclusion

"""
Awesome! Packaging an ML model using MLflow is simple. It's even simpler if you use a built-in flavor that supports autologging.
"""

2024/05/07 08:25:45 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '79411bbb14d24e4887cd50676525ac90', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow


[58446.4113765]


# Storage Format

MLflow uses a specific storage format in order to standardize the way models are packaged. Which of the following artifacts can be found in the directory structure for MLflow's storage format? 

### Possible Answers

        MLmodel{Answer}


        model.pkl{Answer}


        mlflow_version.txt


        python_env.yaml{Answer}
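
As a quick check of the storage layout, the following minimal sketch reports which expected files are missing from a model directory. The file names are the ones that appear in this notebook (`MLmodel`, `model.pkl`, `conda.yaml`, `python_env.yaml`). The helper name `missing_artifacts` is hypothetical and is not part of MLflow.

```python
from pathlib import Path

# File names taken from the scikit-learn flavor output shown in this notebook.
EXPECTED_FILES = ("MLmodel", "model.pkl", "conda.yaml", "python_env.yaml")


def missing_artifacts(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For a complete model directory, such as one written by `mlflow.sklearn.save_model()`, the function should return an empty list.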

In [39]:
# exercise 02

"""
What's in an MLmodel file?

You learned that the MLmodel file is used to define specific information about how our models can be loaded and integrated with existing ML tools.

An MLmodel file has been set to a variable called "mlmodel". Use print(mlmodel) in the IPython Shell to view the contents of the file.

Which of the following is information found within the MLmodel file?
"""

# Instructions

"""
Possible answers:
    
    Cloud provider, hardware, ML libraries
    
    Python environment, model path, Flavors, Python version {Answer}
    
    Model version, model path, model author, Python version
"""

# solution

# The with block closes the file automatically, so no explicit close() is needed
with open('mlruns/562345919476994573/9dddc9eaff20448ea2e2fa462cea2ce6/artifacts/model/MLmodel') as file:
    print(file.read())

#----------------------------------#

# Conclusion

"""
Great! MLmodel provides information about Python, the path to the model and Flavors used in order to load the model.
"""

artifact_path: model
flavors:
  python_function:
    env:
      conda: conda.yaml
      virtualenv: python_env.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    predict_fn: predict
    python_version: 3.12.3
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.4.2
mlflow_version: 2.12.1
model_size_bytes: 609
model_uuid: eded09afb08245108c43e46e1059c029
run_id: 9dddc9eaff20448ea2e2fa462cea2ce6
signature:
  inputs: '[{"type": "double", "name": "R&D Spend", "required": true}, {"type": "double",
    "name": "Administration", "required": true}, {"type": "double", "name": "Marketing
    Spend", "required": true}, {"type": "long", "name": "State", "required": true}]'
  outputs: '[{"type": "tensor", "tensor-spec": {"dtype": "float64", "shape": [-1]}}]'
  params: null
utc_time_created: '2024-05-07 11:19:38.433178'
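
The flavors listed in an MLmodel file can also be read programmatically. The sketch below is a deliberately simple line-based reader that assumes the two-space indentation seen in the output above. A real YAML parser would be more robust. The function name `flavor_names` is hypothetical and not part of MLflow.

```python
def flavor_names(mlmodel_text):
    """Return the keys nested directly under the top-level 'flavors:' entry."""
    names = []
    in_flavors = False
    for line in mlmodel_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        indent = len(line) - len(line.lstrip(" "))
        if indent == 0:
            # A new top-level key starts; track whether it is the flavors block.
            in_flavors = line.strip() == "flavors:"
        elif in_flavors and indent == 2 and line.rstrip().endswith(":"):
            names.append(line.strip()[:-1])
    return names
```

Applied to the file printed above, this should yield the `python_function` and `sklearn` flavors.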




In [29]:
# exercise 03

"""
Saving and loading a model

With the Model API, models can be shared between developers who may not have access to the same MLflow Tracking server by using a local filesystem.

In this exercise, you will train a new LinearRegression model from an existing one using the Unicorn dataset. First, you will load an existing model from the local filesystem. Then you will train a new model from the existing model and save it back to the local filesystem.

The existing model has been saved to the local filesystem in a directory called "lr_local_v1". The mlflow module will be imported.
"""

# Instructions

"""

    Load the model from the local filesystem directory "lr_local_v1" using scikit-learn library from the MLflow module.

    Using the scikit-learn library from the mlflow module, save the model locally to a directory called "lr_local_v2".

"""

# solution

# Load model from local filesystem
model = mlflow.sklearn.load_model("lr_local_v1")

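# Training Data (df is assumed to be the Unicorn DataFrame preloaded by the exercise)
from sklearn.model_selection import train_test_split

X = df[["R&D Spend", "Administration", "Marketing Spend", "State"]]
y = df[["Profit"]]
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=0)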
# Train Model
model.fit(X_train, y_train)

# Save model to local filesystem
mlflow.sklearn.save_model(model, "lr_local_v2")

#----------------------------------#

# Conclusion

"""
Good! The save_model() function is used when saving a model locally and can provide a way for developers to share models.
"""

'\n\n'

In [30]:
# exercise 04

"""
Logging and loading a model

The Model API provides a way to interact with our models by logging and loading them directly from MLflow Tracking in a standardized manner. Being able to interact with models is crucial during the ML lifecycle for the Model Engineering and Model Evaluation steps.

In this exercise you will create a Linear Regression model from scikit-learn using the Unicorn dataset. This model will be logged to MLflow Tracking and then loaded using the run_id used to log the artifact.

First, you will log the model using the scikit-learn library from the MLflow module. Then you will load the model from MLflow Tracking using the run_id.

The model will be trained and have the name lr_model.

lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

The mlflow module will be imported.
"""

# Instructions

"""


    Log the model to MLflow Tracking under the artifact path of "lr_tracking".

    Create a variable called run that is set to the last run.

    Create another variable called run_id that is set to the run_id of the run variable.

    Load the model using the run_id and the artifact path used to log the model.

"""

# solution

# Log model to MLflow Tracking
mlflow.sklearn.log_model(lr_model, "lr_tracking")

# Get the last run
run = mlflow.last_active_run()

# Get the run_id of the above run
run_id = run.info.run_id

# Load model from MLflow Tracking
model = mlflow.sklearn.load_model(f"runs:/{run_id}/lr_tracking")


#----------------------------------#

# Conclusion

"""
Great Job! The log_model() function saves a model to MLflow Tracking, where it can later be loaded by its run_id.
"""

'\n\n'

In [31]:
# exercise 05

"""
Creating a custom Python Class

MLflow supports custom models in order to cover a wide variety of use cases. To create a custom model, users define a Python Class that inherits from the mlflow.pyfunc.PythonModel Class. The PythonModel Class provides methods for custom inference logic and artifact dependencies.

In this exercise, you will create a new Python Class for a custom model that loads a specific model and then decodes labels after inference. The mlflow module will be imported.
"""

# Instructions

"""

    Create a Python Class with the name CustomPredict.

    Define the load_context() method used for loading artifacts within a custom Class.

    Define the predict() method for defining custom inference.

"""

# solution

# Create Python Class
class CustomPredict(mlflow.pyfunc.PythonModel):
    # Set method for loading model
    def load_context(self, context):
        self.model = mlflow.sklearn.load_model("./lr_model/")
    # Set method for custom inference     
    def predict(self, context, model_input):
        predictions = self.model.predict(model_input)
        decoded_predictions = []  
        for prediction in predictions:
            if prediction == 0:
                decoded_predictions.append("female")
            else:
                decoded_predictions.append("male")
        return decoded_predictions

#----------------------------------#

# Conclusion

"""
Well done! Python Classes provide a blueprint for objects and allow for access to methods in order to manipulate them.
"""

'\n\n'
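
The decoding step inside `predict()` can be tried on its own, without MLflow or a trained model. This is a minimal sketch of that one step only. The function name `decode_predictions` is hypothetical, and the input list stands in for the classifier's output.

```python
def decode_predictions(predictions):
    """Map encoded labels to strings: 0 becomes "female", anything else "male"."""
    return ["female" if prediction == 0 else "male" for prediction in predictions]


# Illustrative encoded labels, standing in for self.model.predict(model_input)
print(decode_predictions([0, 1, 1]))  # ['female', 'male', 'male']
```

Keeping the mapping in a small function like this makes it easy to check before wrapping it in a `PythonModel` subclass.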

In [32]:
# exercise 06

"""
Custom scikit-learn model

In this exercise you are going to create a custom model using MLflow's pyfunc flavor. Using the insurance_charges dataset, the labels must be changed from female to 0 and male to 1 for classification during training. When using the model, the strings of female or male must be returned instead of 0 or 1.

The custom model is a Classification model based on LogisticRegression and will use a Class called CustomPredict. CustomPredict adds an additional step in the predict method that sets your labels of 0 and 1 back to female and male when the model receives input. You will be using the pyfunc flavor for logging and loading your model.

The insurance_charges dataset will be preprocessed and the model will be trained using:

lr_model = LogisticRegression().fit(X_train, y_train)

The MLflow module will be imported.
"""

# Instructions

"""

    Use MLflow's pyfunc flavor to log the custom model.

    Set pyfunc python_model argument to use the Custom Class CustomPredict().

    Load the custom model using pyfunc.

"""

# solution

# Log the pyfunc model 
mlflow.pyfunc.log_model(
    artifact_path="lr_pyfunc",
    # Set model to use CustomPredict Class
    python_model=CustomPredict(),
    artifacts={"lr_model": "lr_model"}
)

run = mlflow.last_active_run()
run_id = run.info.run_id

# Load the model in python_function format
loaded_model = mlflow.pyfunc.load_model(f"runs:/{run_id}/lr_pyfunc")


#----------------------------------#

# Conclusion

"""
Nice job! Custom Python models allow users to cover a much wider range of use cases and provide more flexibility.
"""

'\n\n'

In [33]:
# exercise 07

"""
Scikit-learn flavor and evaluation

In this exercise you will train a classification model and evaluate its performance. The model uses the Insurance Charges dataset to classify whether the charges were for a female or male.

You will start by logging your model to MLflow Tracking using the scikit-learn flavor and finish by evaluating it using an eval_data dataset.

The evaluation dataset is created as eval_data and the model is trained with the name lr_class. The eval_data consists of X_test and y_test, since the data was split using the train_test_split() function from sklearn.

# Model
lr_class = LogisticRegression()
lr_class.fit(X_train, y_train)

The mlflow module is imported.
"""

# Instructions

"""

    Log the lr_class model using scikit-learn "built-in" flavor.

    Call the evaluate() function from mlflow module.

    Evaluate the eval_data dataset and target the "sex" column.

"""

# solution

# Eval Data (copy so that X_test itself is not modified)
eval_data = X_test.copy()
eval_data["sex"] = y_test
# Log the lr_class model using Scikit-Learn Flavor
mlflow.sklearn.log_model(lr_class, "model")

# Get run id
run = mlflow.last_active_run()
run_id = run.info.run_id

# Evaluate the logged model with eval_data data
mlflow.evaluate(f"runs:/{run_id}/model", 
        data=eval_data, 
        targets="sex",
        model_type="classifier"
)

#----------------------------------#

# Conclusion

"""
Great job! Evaluating your model plays an integral part in understanding model performance.
"""

'\n\n'
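
To make the idea of classifier evaluation concrete, the sketch below computes accuracy, precision and recall by hand for binary labels. It is only an illustration of what such metrics mean and is not how `mlflow.evaluate()` is implemented. The function name `binary_metrics` and the sample labels are made up for this example.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when there are no positive predictions/labels
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels: 1 = male, 0 = female
metrics = binary_metrics([1, 0, 1, 1], [1, 0, 0, 1])
print(metrics["accuracy"])  # 0.75
```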

# Serving a model

Model deployment is another important step of the ML Lifecycle. The MLflow command line interface includes a command for serving models. Models can be deployed with MLflow from the local filesystem, from MLflow Tracking, and from several cloud providers such as AWS S3.

Which of the following commands serves a model from MLflow Tracking using its run_id?

### Possible Answers


    mlflow serve models -m runs:/7de9bbe306224c2c9842beb11357d084/model
    
    
    mlflow models serve -m runs:/7de9bbe306224c2c9842beb11357d084/model {Answer}
    
    
    mlflow models serve -m run_id:/7de9bbe306224c2c9842beb11357d084/model
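
The serve command can also be assembled in Python, for example to launch it from a script. The sketch below only builds the `runs:/<run_id>/<artifact_path>` URI and the argument list; it does not start a server. The helper names `model_uri` and `serve_command` are hypothetical, and port 5000 is assumed.

```python
def model_uri(run_id, artifact_path="model"):
    """Build a runs:/ URI pointing at a model logged under artifact_path."""
    return f"runs:/{run_id}/{artifact_path}"


def serve_command(run_id, artifact_path="model", port=5000):
    """Return the argument list for `mlflow models serve` (not executed here)."""
    return ["mlflow", "models", "serve", "-m", model_uri(run_id, artifact_path), "-p", str(port)]


print(" ".join(serve_command("7de9bbe306224c2c9842beb11357d084")))
```

The resulting list could be passed to `subprocess.run()` in an environment where the MLflow CLI is installed.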

# Score from a served model

Once a model has been served with the mlflow models serve command, which of the following curl commands would be used to retrieve a score for a JSON input using the dataframe_split type?

### Possible Answers
Select one answer

    curl -d '{"dataframe_split": {"columns": ["x"], "data": [[10]]}}' -H 'Content-Type: application/json' -X POST localhost:5000/ping
    
    
    curl -d '{"column": [10]"}' -H 'Content-Type: application/json' -X POST localhost:5000/inference
    
    
    curl -d '{"dataframe_split": {"columns": ["x"], "data": [[10]]}}' -H 'Content-Type: application/json' -X POST localhost:5000/invocations {Answer}

**Correct! Your input uses JSON and dataframe_split orientation as well as the correct invocations endpoint.**
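
The same request can be sent from Python with only the standard library. This is a minimal sketch that assumes a model is already being served at `localhost:5000`. The helper names `build_payload` and `score` are hypothetical, and `score()` is defined but not called here because it needs a running server.

```python
import json
import urllib.request


def build_payload(columns, rows):
    """Serialize input rows in the dataframe_split orientation."""
    return json.dumps({"dataframe_split": {"columns": columns, "data": rows}})


def score(payload, url="http://localhost:5000/invocations"):
    """POST the payload to the invocations endpoint and return the parsed JSON reply."""
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_payload(["x"], [[10]]))
```

The printed payload matches the body passed to `curl -d` in the correct answer above.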