Listing my MLOps learnings
This repository contains my notes from this Udemy course.
-
85 percent of trained ML models don't reach production, and 55% of companies don't deploy a single model.
-
An ideal ML life cycle
-
Research shows that companies using AI increased their profit margins by 3% to 15%.
-
DevOps applied to Machine Learning is known as MLOps. Model creation must be scalable, collaborative and reproducible; the principles, tools and techniques that make it so are known as MLOps.
-
MLOps process:
-
DevOps applied to Machine Learning is known as MLOps. DevOps applied to Data is known as DataOps.
-
Roles in MLOps
- Challenges addressed by MLOps
-
Data and Artifact versioning
-
Model Tracking: degradation of performance due to data drift.
-
Feature Generation: MLOps allows us to reuse methods.
- Parts of MLOPS
- MLOps Tools
- Some data labelling tools:
- Some Feature Engineering Tools:
- Some Hyperparameter Optimization Tools:
-
FastAPI can be used for serving ML models.
-
Streamlit is useful for POC.
-
MLOps stages:
- Some tools to use
- ML projects can be structured in one of 3 ways.
- Cookiecutter is a tool to structure our ML projects and folders. It should be installed GLOBALLY on the computer (not in a virtual environment).
pip install cookiecutter
cookiecutter https://github.com/khuyentran1401/data-science-template
cookiecutter https://github.com/MuhammedBuyukkinaci/Clean-Data-Science-Project-Template.git
-
Poetry allows us to manage dependencies and versions. Poetry is an alternative to pip.
- Poetry separates main dependencies and sub-dependencies into two separate files, whereas pip stores all dependencies in a single file (requirements.txt).
- Poetry creates readable dependency files.
- Poetry removes all sub-dependencies when removing a library.
- Poetry avoids installing new libraries that conflict with existing libraries.
- Poetry packages a project with a few lines of code.
- All the dependencies of the project are specified in pyproject.toml.
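A minimal pyproject.toml for a Poetry-managed project might look like the sketch below; the project name, authors and version constraints are illustrative, not from the course.

```toml
[tool.poetry]
name = "my-ml-project"
version = "0.1.0"
description = "Example MLOps project"
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.9"
scikit-learn = "^1.3"

[tool.poetry.group.dev.dependencies]
pytest = "^7.4"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```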
# To install poetry on your machine (for Linux and Mac)
curl -sSL https://install.python-poetry.org | python3 -
# To generate a project
poetry new <project_name>
# To install dependencies
poetry install
# To add a new pypi library
poetry add <library_name>
# To delete a library
poetry remove <library_name>
# To show installed libraries
poetry show
# To show sub dependencies
poetry show --tree
# Link an existing environment (venv, conda, etc.) to poetry
poetry env use /path/to/python
-
Hydra manages configuration files. It makes project management easier.
- Configuration information shouldn't be mixed with main code.
- It is easier to modify things in a configuration file.
- YAML is a common language for a configuration file.
- An example config file and its usage via hydra
- We can modify hydra parameters via CLI without modifying config file.
-
Hydra logging is super useful.
-
To use Hydra, we must add config as an argument to a function.
import hydra
from pipeline2 import pipeline2

@hydra.main(config_name='preprocessing')
def run_training(config):
    match_pipe = pipeline2(config)

if __name__ == '__main__':
    run_training()
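The config referenced above could be a YAML file like the following hypothetical preprocessing.yaml; every key and value here is illustrative, not from the course. Parameters can then be overridden from the CLI without touching the file, e.g. `python train.py model.n_estimators=200`.

```yaml
# preprocessing.yaml -- a hypothetical Hydra config (all keys/values are illustrative)
data:
  raw_path: data/raw/train.csv
  test_size: 0.2
model:
  n_estimators: 100
  max_depth: 5
```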
- Pre-commit plugins automate code review and formatting. To install them, use `pip install pre-commit`. After installing pre-commit, fill out `.pre-commit-config.yaml` and run `pre-commit install` to install the git hooks. Then, the configured checks run before each commit to the local repository; the commit will not complete until the problems are solved. The `--no-verify` flag can be appended to `git commit`; it doesn't force you to correct the mistakes detected by pre-commit.
- Formatter: black
- PEP8 Checker: flake8
- Sort imports: isort
- Check for docstrings: interrogate
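A `.pre-commit-config.yaml` wiring up some of the tools above might look like this sketch; the pinned `rev` versions are illustrative and should be updated to current releases.

```yaml
# .pre-commit-config.yaml -- a sketch (revs are illustrative)
repos:
  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
```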
- Black and Flake8
# pip install black
black file_name_01.py
# pip install flake8
flake8 temp.py
# pip install isort
isort file_name.py
# pip install interrogate
interrogate -vv file_name.py
-
DVC is used for version control of model training data.
-
pdoc is used to automatically create documentation for projects.
pip install pdoc3
pdoc --http localhost:8080 temp.py
- Makefile creates short and readable commands for configuration tasks. We can use Makefile to automate tasks such as setting up the environment.
-
A solution design is available here.
-
MLOps stages:
- What AutoML does:
- PyCaret is an open-source, low-code ML library. It has been developed in Python and reduces the time needed to create a model to minutes.
- PyCaret incorporates these libraries:
- Pandas Profiling allows us to develop an exhaustive analysis of data.
- An example of PyCaret setup function:
- Tukey-Anscombe Plot && Normal QQ Plot
- Scale-Location Plot && Residuals & Leverage
- MLOps Tracking Server and Model Registry
- MLFlow UI for different runs
- Different Components of MLFlow
- We can log parameters, metrics and models in MLFlow.
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import ElasticNet
from urllib.parse import urlparse

alpha = 0.5

def rmse_compute(true, preds):
    return np.sqrt(np.mean((np.array(true) - np.array(preds)) ** 2))

# Placeholders: replace with a real train/test split
X_train, y_train, X_test, y_test = None, None, None, None

with mlflow.start_run():
    lr = ElasticNet(alpha=alpha)
    lr.fit(X_train, y_train)
    y_test_preds = lr.predict(X_test)
    rmse = rmse_compute(y_test, y_test_preds)
    mlflow.log_param('alpha', alpha)
    mlflow.log_metric('rmse', rmse)
    tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme
    if tracking_url_type_store != 'file':
        # A remote tracking server also supports the model registry
        mlflow.sklearn.log_model(lr, 'model', registered_model_name='ElasticNetWineModel')
    else:
        mlflow.sklearn.log_model(lr, 'model')
- We can register models into MLFlow via PyCaret.
#pass log_experiment = True, experiment_name = 'diamond'
s = setup(data, target = 'Precio', transform_target = True, log_experiment = True, experiment_name = 'diamond')
- Shap is a Python library about model interpretability.
- A prediction for a single record
- We can use SHAP with PyCaret.
- We aren't just deploying a model (a pickle file). We are also deploying a pipeline (composed of preprocessing, feature engineering, etc.).
-
There are 2 different ways to deploy a model in a production environment:
- Through API
- Through Applications(mobile/web)
- An API is an intermediary between 2 different applications that communicate with each other. If we want our applications to be available to other developers, creating an API as an intermediate connector is convenient. Developers send HTTP requests to consume this service. We can think of an API as an abstraction of our application. Thanks to the API, users don't need to code or install the dependencies.
- HTTP verbs and Status Codes
-
FastAPI is a framework for creating robust & high-performance APIs for production environments. Compared to Flask, which is a development framework, FastAPI has the following advantages:
- Using asyncio
- Implementing Pydantic for data validation
- FastAPI enforces the schema on input data and detects data types at runtime.
- FastAPI uses Swagger UI to create automatic documentation.
- FastAPI has better security and authentication features.
-
FastAPI documentation UI
-
FastAPI apps are served with the uvicorn library (an ASGI server).
-
A basic usage of FastAPI
import uvicorn
from fastapi import FastAPI

app = FastAPI()

@app.get('/')
def home():
    return {'Hello': 'World'}

# @app.post("/")
# def home_post():
#     return {"Hello": "POST"}

# query parameters
@app.get("/employee")
def employee_by_department(department: str):
    return {"department": department}

# path parameters
@app.get("/employee/{id}")
def employee_by_id(id: int):
    return {"id": id}

if __name__ == '__main__':
    uvicorn.run("hellow_world_fastapi:app")
- Pydantic usage in FastAPI
- PyCaret is able to create a FastAPI app automatically.
-
Gradio is a web-application framework for deploying our ML models. It provides a UI suitable for business users.
-
An example demo for gradio app
- PyCaret has a function to create gradio apps easily.
-
Flask is a web development framework in Python.
- It is easy to use.
- It is flexible.
- Allows testing.
-
An example code snippet for Flask
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return {"Hello": "World"}

if __name__ == '__main__':
    app.run()
- A ML Project Deployment Pipeline
-
A Dockerfile can be thought of as a recipe to cook a meal.
-
Physical Machine vs Virtualization vs Container Deployment vs Kubernetes
- We can create a docker image via the `create_docker` function of PyCaret.
-
If we are using a paid Azure account, we can register our Docker images on Azure Registry.
-
Azure Blob Storage is similar to Amazon S3. To use Azure Blob Storage, the Azure SDK is needed.
-
There are many more ML job listings than data science job listings.
-
MLOps is the process of automating machine learning using DevOps methodologies.
-
Data drift is a phenomenon in which the data distribution changes between training and inference.
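A crude way to see drift in practice is to compare a feature's distribution at training time against its distribution at serving time. The sketch below uses synthetic data and a simple standardized mean-difference score; the 0.5 threshold is illustrative, and real monitoring tools use more robust statistics (e.g. KS tests, PSI).

```python
import numpy as np

rng = np.random.default_rng(42)
train = rng.normal(loc=0.0, scale=1.0, size=10_000)  # training-time feature values
live = rng.normal(loc=0.8, scale=1.0, size=10_000)   # serving-time values (shifted)

def mean_shift(reference, current):
    """Standardized difference of means: a crude, illustrative drift signal."""
    return abs(current.mean() - reference.mean()) / reference.std()

print(mean_shift(train, live) > 0.5)   # True: the serving data has drifted
print(mean_shift(train, train) == 0)   # True: no drift against itself
```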
-
New Relic, Datadog and Stackdriver are performance monitoring tools.
-
An example Makefile is below. Run `make install`, `make lint` or `make test`.
install:
	pip install --upgrade pip &&\
		pip install -r requirements.txt
lint:
	pylint --disable=R,C hello.py
test:
	python -m pytest -vv --cov=hello test_hello.py
-
A data lake is a place where we can process data without transferring it elsewhere.
-
MLOps is possible after DevOps (Jenkins), data automation (Airflow) and platform automation (AWS SageMaker) are in place.
-
Building reusable ML pipelines is crucial and related to versioning.
-
MLOps is a combination of Data, DevOps, Models and Business in equal parts.
-
The future means more ML on edge devices.
-
The Coral Project is a platform that helps build local (on-device) inferencing that captures the essence of edge deployments: fast, close to the user, and offline.
-
Azure Percept is a Microsoft solution similar to The Coral Project.
-
GitHub Actions workflows can be located under .github/workflows/ as main.yaml.
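A sketch of such a main.yaml is below, reusing the Makefile targets shown earlier in these notes; the action versions and Python version are illustrative.

```yaml
# .github/workflows/main.yaml -- a sketch CI workflow (versions are illustrative)
name: CI
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: make install
      - run: make lint
      - run: make test
```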
-
An example overview of CI/CD
-
Kubernetes is a good way to implement Blue-Green Deployment.
-
Blue-green deployment is a deployment strategy. There are 2 environments: the current one is blue and the new version is green. The app runs in the blue environment. We install the new version in the green environment and carry out tests. If everything goes well, we direct the traffic to the green environment.
-
Canary deployment is a deployment strategy. We progressively move traffic from the old environment to the new environment. If something unexpected happens in the new environment, we roll back to the previous environment; thus, less traffic is affected by mistakes. If everything goes well, the traffic directed to the new environment is progressively increased up to 100%.
- KaizenML is about automating everything about the machine learning process and improving it.
-
Software for training machine learning models could turn into something like the Linux kernel, free and ubiquitous.
-
AutoML is just a technique, like continuous integration (CI); it automates trivial tasks.
-
DevOps + KaizenML = MLOps. KaizenML includes building Feature Stores (a registry of high-quality machine learning inputs) and the ability to monitor data for drift and to register and serve out ML models.
-
Uber's Michelangelo is ML-as-a-Service. Databricks has a feature store solution too.
-
Apple has a machine learning framework called Core ML. It is under Xcode > Open Developer Tool > Create ML.
-
TF Hub is a hub that stores various pretrained ML models.
-
SageMaker Autopilot is AWS's complete solution for AutoML and MLOps.
-
Ludwig is a Python library that builds ML solutions declaratively. We define a YAML file and then run Ludwig programmatically via its API or via the CLI. It is part of the Linux Foundation.
-
FlaML is an AutoML solution. It has a design that accounts for cost-effective hyperparameter optimization.
-
TPOT is an AutoML solution: a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
-
PyCaret is an AutoML tool.
-
auto-sklearn is an AutoML tool.
-
H2O AutoML is an AutoML tool.
-
ELI5 and SHAP are 2 popular open-source model explainability frameworks.
-
Hugo is a tool, written in Go, for building static websites.
-
AWS DeepLens is a hardware solution (a deep learning-enabled video camera).
-
It would be a good practice to have a cli.py to get predictions.
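A minimal cli.py sketch using the standard library's argparse is below; the `predict` function is a hypothetical stand-in for loading a real model, and the flag names are illustrative.

```python
import argparse
import json

def predict(features):
    # Hypothetical stand-in for loading a trained model and predicting;
    # here we simply sum the inputs for illustration.
    return sum(features)

def main(argv=None):
    parser = argparse.ArgumentParser(description="Get a prediction from the command line.")
    parser.add_argument("--features", type=float, nargs="+", required=True)
    args = parser.parse_args(argv)
    prediction = predict(args.features)
    print(json.dumps({"prediction": prediction}))
    return prediction

if __name__ == "__main__":
    main()  # e.g. python cli.py --features 1 2 3
```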
-
GitHub Actions is an alternative to Bitbucket Pipelines and Jenkins. Its cloud-native alternative on AWS is AWS CodeBuild.
-
GitHub Container Registry is an alternative to Amazon ECR (Elastic Container Registry).
-
ONNX is a tool for ML interoperability. It is the product of a collaboration between Facebook and Microsoft.
-
Models from some libraries can be converted to ONNX format:
- An example script to convert a pytorch model to ONNX format
import torch
import torchvision
dummy_tensor = torch.randn(8, 3, 200, 200)
model = torchvision.models.resnet18(pretrained=True)
input_names = [ "input_%d" % i for i in range(12) ]
output_names = [ "output_1" ]
torch.onnx.export(
model,
dummy_tensor,
"resnet18.onnx",
input_names=input_names,
output_names=output_names,
opset_version=7,
verbose=True,
)
- ONNX has a special format called ORT that minimizes the build size of the model.
-
Both a requirements.txt file and a setup.py file can install dependencies for a Python project. However, only a setup.py file can package a project for distribution.
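A minimal setup.py sketch is below; the project name, version and dependency list are illustrative, not from the course.

```python
# setup.py -- a minimal packaging sketch (name/version/deps are illustrative)
from setuptools import setup, find_packages

setup(
    name="my-ml-project",
    version="0.1.0",
    packages=find_packages(),
    install_requires=["scikit-learn"],
)
```

With this file in place, `pip install .` installs the project and `python -m build` (or `python setup.py sdist`) packages it for distribution.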
-
Command-line tool development can be useful when a task needs a quick, scriptable solution.
-
Porting a Python model into a production language like C++ or Java is challenging, and often results in reduced performance compared to the original, trained model.
-
Fairlearn is a Python package to mitigate observed unfairness. It is a library to detect bias across gender, race, religion, etc.
-
InterpretML is a Python package for ML interpretability.