Skip to content

AntonisCSt/Mlops_project_semicon

Repository files navigation

Semiconductor quality prediction and pipeline

A MLops Zoomcamp project

About the project

Manufacturing process feature selection and categorization. The project focuses on MLops best practices.

The projects contains semicoductor sensor data and classifies the end product as Pass or Fail. There are ~580 sensor features that are used. Can we make a classification model that uses the best featuers to make Pass/Fail prediction?

About the data

data source: https://www.kaggle.com/datasets/paresh2047/uci-semcom?resource=download

  • Data Set Characteristics: Multivariate
  • Number of Instances: 1567
  • Area: Computer
  • Attribute Characteristics: Real
  • Number of Attributes: 591
  • Date Donated: 2008-11-19
  • Associated Tasks: Classification, Causal-Discovery
  • Missing Values? Yes

MLops project solution and architecture

Screenshot

Tasks and dates

Task 1) Add experiment tracking and set-up registery server (local) artifacts in s3 with MLflow

  • Experiment tracking (DONE)
  • Model registery (DONE)
  • Connect to S3 (DONE)

Task 2) Convert notebook into a pipeline

  • Define pipeline (DONE)
  • write individual scirpts of pipeline (DONE)
  • create a scikit learn pipeline to have preprocess and model together (DONE)
  • finish predict (DONE)
  • Use and Test model (DONE)

Task 3) Add orchestration with Prefect

  • Added Prefect flow in train.py (DONE) (12/08/22)
  • Full deploy prefect

Task 4) Add Monitoring

  • Add Monitoring service (DONE) (19/08/22)
  • Connect mlflow to monitoring image (DONE) (23/08/22)

Task 4) AWS model deploy also connect kinesis and lambda function

  • Predict using kinesis streams

Task 5) Best practices --> Create tests,linting and pre-commit hooks

  • Add tests: Prefect tasks (DONE) (15/08/22) Integration tests (DONE) (20/08/22)
  • Add pre-commit hooks (Done) (20/08/22)
  • Add linting (DONE) (20/08/22)
  • Add Makefile (DONE) (22/08/22)
  • Add CI/CD (DONE) (22/08/22)

Taks 6) Final touches

  • Make Readme.md nicer (DONE) (03/09/22)
  • Write project description (DONE) (03/09/22)
  • Write Instructions (DONE) (03/09/22)

Instructions

How to make the app run:

requirments:

  1. docker installed
  2. Having your AWS credential added in docker compose file and model exists in S3.

for 2) you can also use the model.pkl. Release line 46 in prediction_service\app.py. Put "#" anything related to S3.

docker compose -f docker-compose.yml up --build

to test it run:

bash build_test_shut.sh

then you can run:

python .\send_data.py

finally:

python ./prefect_monitoring/prefect_monitoring.py

This will create an html file with the report

For the data drift you can check the Grafana container on port 3000 :)

How can you contribure:

Currently this project focuses on MLops. It is weak on the actual ML-pipeline.

  1. A good idea is to apply L1 regularization in a feature selection step.

  2. Try other classification models and grid search.

  3. Use Docker compose or any otehr method to automatically push to ECS.

  4. Use Kinesis and through a lambda function send stream data to the ECS.

Other Insructions:

1) Install enviroment

use:

$pipenv install

2) Create a S3 bucket for mlflow

in train.py and main_notebook.ipynb

mlflow.create_experiment("semicon-sensor-clf","[your S3 bucket]")

(or use a local file)

3) Run prediction locally using FastAPI and uvicorn server (dev)

run: predict.py

go to : http://localhost:8001/docs

press :"try it out"

use example: from test_one_input.txt (it should give output as "0")

4) Testing flow and other functions

run: pytest

5) Run app and monitoring service

docker compose -f docker-compose.yml up --build python .\send_data.py

6) Running Makefile

If you are on windows and want to run a Makefile go to: https://chocolatey.org/install and follow the instructions

run this in gitbash:

make build

It will run needed tests and then build image and run the container

after this you can run:

python send_data.py

python ./prefect_monitoring/prefect_monitoring.py

This will create an html file with the report

Usefull commands:

pipenv:

$pipenv install $pipenv install --dev [library] $pipenv --venv

mlflow:

mlflow server --backend-store-uri=sqlite:///mlflow.db --default-artifact-root=s3://mlflow-semicon-clf/

use:

mlflow.set_tracking_uri("sqlite:///mlflow.db") #mlflow.set_experiment("testing-mlflow") mlflow.create_experiment("semicon-sensor-clf","s3://mlflow-semicon-clf/") mlflow.set_experiment("semicon-sensor-clf")

linting and black:

pylint --recursive=y train.py, predict.py, ./prefect_monitoring/prefect_monitoring.py, ./prediction_service/app.py

black --skip-string-normalization --diff train.py, predict.py, ./prefect_monitoring/prefect_monitoring.py

black --skip-string-normalization train.py, predict.py, ./prefect_monitoring/prefect_monitoring.py, ./prediction_service/app.py

git:

pre-commit

prefect:

prefect orion start

aws ecs: docker compose --project-name semicontest -f docker-compose.yml up --build

About

A project from the ml_ops Zoomcamp (DataTalks) using Semiconductor data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published