GitHub - AntonisCSt/Mlops_project_semicon: A project from the ml_ops Zoomcamp (DataTalks) using Semiconductor data

Semiconductor quality prediction and pipeline

A MLops Zoomcamp project

About the project

Manufacturing process feature selection and categorization. The project focuses on MLops best practices.

The projects contains semicoductor sensor data and classifies the end product as Pass or Fail. There are ~580 sensor features that are used. Can we make a classification model that uses the best featuers to make Pass/Fail prediction?

About the data

data source: https://www.kaggle.com/datasets/paresh2047/uci-semcom?resource=download

Data Set Characteristics: Multivariate
Number of Instances: 1567
Area: Computer
Attribute Characteristics: Real
Number of Attributes: 591
Date Donated: 2008-11-19
Associated Tasks: Classification, Causal-Discovery
Missing Values? Yes

MLops project solution and architecture

Tasks and dates

Task 1) Add experiment tracking and set-up registery server (local) artifacts in s3 with MLflow

Experiment tracking (DONE)
Model registery (DONE)
Connect to S3 (DONE)

Task 2) Convert notebook into a pipeline

Define pipeline (DONE)
write individual scirpts of pipeline (DONE)
create a scikit learn pipeline to have preprocess and model together (DONE)
finish predict (DONE)
Use and Test model (DONE)

Task 3) Add orchestration with Prefect

Added Prefect flow in train.py (DONE) (12/08/22)
Full deploy prefect

Task 4) Add Monitoring

Add Monitoring service (DONE) (19/08/22)
Connect mlflow to monitoring image (DONE) (23/08/22)

Task 4) AWS model deploy also connect kinesis and lambda function

Predict using kinesis streams

Task 5) Best practices --> Create tests,linting and pre-commit hooks

Add tests: Prefect tasks (DONE) (15/08/22) Integration tests (DONE) (20/08/22)
Add pre-commit hooks (Done) (20/08/22)
Add linting (DONE) (20/08/22)
Add Makefile (DONE) (22/08/22)
Add CI/CD (DONE) (22/08/22)

Taks 6) Final touches

Make Readme.md nicer (DONE) (03/09/22)
Write project description (DONE) (03/09/22)
Write Instructions (DONE) (03/09/22)

Instructions

How to make the app run:

requirments:

docker installed
Having your AWS credential added in docker compose file and model exists in S3.

for 2) you can also use the model.pkl. Release line 46 in prediction_service\app.py. Put "#" anything related to S3.

docker compose -f docker-compose.yml up --build

to test it run:

bash build_test_shut.sh

then you can run:

python .\send_data.py

finally:

python ./prefect_monitoring/prefect_monitoring.py

This will create an html file with the report

For the data drift you can check the Grafana container on port 3000 :)

How can you contribure:

Currently this project focuses on MLops. It is weak on the actual ML-pipeline.

A good idea is to apply L1 regularization in a feature selection step.
Try other classification models and grid search.
Use Docker compose or any otehr method to automatically push to ECS.
Use Kinesis and through a lambda function send stream data to the ECS.

Other Insructions:

1) Install enviroment

use:

$pipenv install

2) Create a S3 bucket for mlflow

in train.py and main_notebook.ipynb

mlflow.create_experiment("semicon-sensor-clf","[your S3 bucket]")

(or use a local file)

3) Run prediction locally using FastAPI and uvicorn server (dev)

run: predict.py

go to : http://localhost:8001/docs

press :"try it out"

use example: from test_one_input.txt (it should give output as "0")

4) Testing flow and other functions

run: pytest

5) Run app and monitoring service

docker compose -f docker-compose.yml up --build python .\send_data.py

6) Running Makefile

If you are on windows and want to run a Makefile go to: https://chocolatey.org/install and follow the instructions

run this in gitbash:

make build

It will run needed tests and then build image and run the container

after this you can run:

python send_data.py

python ./prefect_monitoring/prefect_monitoring.py

This will create an html file with the report

Usefull commands:

pipenv:

$pipenv install $pipenv install --dev [library] $pipenv --venv

mlflow:

mlflow server --backend-store-uri=sqlite:///mlflow.db --default-artifact-root=s3://mlflow-semicon-clf/

use:

mlflow.set_tracking_uri("sqlite:///mlflow.db") #mlflow.set_experiment("testing-mlflow") mlflow.create_experiment("semicon-sensor-clf","s3://mlflow-semicon-clf/") mlflow.set_experiment("semicon-sensor-clf")

linting and black:

pylint --recursive=y train.py, predict.py, ./prefect_monitoring/prefect_monitoring.py, ./prediction_service/app.py

black --skip-string-normalization --diff train.py, predict.py, ./prefect_monitoring/prefect_monitoring.py

black --skip-string-normalization train.py, predict.py, ./prefect_monitoring/prefect_monitoring.py, ./prediction_service/app.py

git:

pre-commit

prefect:

prefect orion start

aws ecs: docker compose --project-name semicontest -f docker-compose.yml up --build

Name		Name	Last commit message	Last commit date
Latest commit History 48 Commits
.github/workflows		.github/workflows
data		data
evidently_service		evidently_service
integration_tests		integration_tests
prediction_service		prediction_service
prefect_monitoring		prefect_monitoring
project_pictures		project_pictures
tests		tests
.pre-commit-config.yaml		.pre-commit-config.yaml
Dockerfile		Dockerfile
Makefile		Makefile
Pipfile		Pipfile
Pipfile.lock		Pipfile.lock
README.md		README.md
build_test_shut.sh		build_test_shut.sh
clean_mongo.py		clean_mongo.py
docker-compose.yml		docker-compose.yml
main_notebook.ipynb		main_notebook.ipynb
predict.py		predict.py
pyproject.toml		pyproject.toml
send_data.py		send_data.py
target.csv		target.csv
test_one_input.txt		test_one_input.txt
train.py		train.py

AntonisCSt/Mlops_project_semicon

Folders and files

Latest commit

History

Repository files navigation