MLOps Project Car Prices Prediction

This project has been developed as part of the MLOps Zoomcamp course provided by DataTalks.Club.

The dataset was downloaded from Kaggle, and a preliminary data analysis (see the notebooks folder) was performed to gain insights for the subsequent project development.

Below you can find some instructions to understand the project content. Feel free to ⭐ and clone this repo 😉

Tech Stack

Project Structure

The project has been structured with the following folders and files:

.github: contains the CI/CD files (GitHub Actions)
data: dataset and test sample for testing the model
integration_tests: prediction integration test with docker-compose
lambda: tests of the Lambda handler, with and without Docker
model: full pipeline from preprocessing to prediction and monitoring using MLflow, Prefect, Grafana, Adminer, and docker-compose
notebooks: EDA and modeling performed at the beginning of the project to establish a baseline
tests: unit tests
terraform: IaC for the stream-based pipeline infrastructure in AWS, using Terraform
Makefile: set of execution tasks
pyproject.toml: linting and formatting
setup.py: project installation module
requirements.txt: project requirements

Project Description

The dataset, obtained from Kaggle, contains various columns with car details and prices. To prepare the data for modeling, an Exploratory Data Analysis (EDA) was conducted to preprocess the numerical and categorical features and to choose suitable scalers and encoders for the preprocessing pipeline. A grid search was then performed to select the best regression models: RandomForestRegressor and GradientBoostingRegressor were the top performers, each achieving an R² of approximately 0.9.
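R² compares the squared errors of the model with those of a baseline that always predicts the mean price, so a value of about 0.9 means the model explains roughly 90% of the price variance. A minimal, dependency-free sketch of the formula (the price values below are made up for illustration; in practice scikit-learn's `sklearn.metrics.r2_score` computes the same quantity):

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared errors of the "always predict the mean" baseline
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    prices = [12000.0, 15500.0, 9800.0, 21000.0]      # hypothetical true prices
    predicted = [12500.0, 15000.0, 10500.0, 20000.0]  # hypothetical model output
    print(round(r2_score(prices, predicted), 3))
```

A perfect model scores 1.0, and a model no better than the mean baseline scores 0.0.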

Afterward, the models were tested, registered, and deployed using MLflow, Prefect, and Flask, and model monitoring was set up with Grafana and a database administered through Adminer. Subsequently, the project infrastructure was provisioned with Terraform, using AWS services such as Kinesis Streams (producer and consumer), Lambda (serving API), an S3 bucket (model artifacts), and ECR (image registry).
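In this setup, the Lambda function (serving API) consumes car records from the Kinesis stream. The sketch below shows what such a handler could look like. It assumes each record carries a JSON payload of the hypothetical form `{"car_id": ..., "features": {...}}` and uses a made-up `predict_price` function in place of the trained model that the real project loads from its artifacts. The only part taken from AWS itself is the event shape: Kinesis delivers each payload base64-encoded under `record["kinesis"]["data"]`.

```python
import base64
import json
from typing import Any, Dict, List


def predict_price(features: Dict[str, Any]) -> float:
    """Hypothetical stand-in for the trained regressor (not the project's model)."""
    return 10000.0 + 1000.0 * (float(features.get("year", 2000)) - 2000.0)


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, Any]]]:
    """Decode every Kinesis record in the event and return one prediction per car."""
    predictions = []
    for record in event.get("Records", []):
        # Kinesis delivers the payload base64-encoded under record["kinesis"]["data"]
        raw = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        car = json.loads(raw)  # hypothetical schema: {"car_id": ..., "features": {...}}
        predictions.append(
            {"car_id": car.get("car_id"), "predicted_price": predict_price(car["features"])}
        )
    return {"predictions": predictions}


def encode_record(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build a fake Kinesis record for local testing."""
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


if __name__ == "__main__":
    event = {"Records": [encode_record({"car_id": 1, "features": {"year": 2015}})]}
    print(lambda_handler(event, None))
```

Keeping the decoding separate from the model call makes the handler easy to exercise locally, with or without Docker, by passing it a hand-built event as in the `__main__` block.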

Finally, to streamline the development process, a fully automated CI/CD pipeline was created using GitHub Actions.

Project Set Up

The Python version used for this project is Python 3.9.

Clone the repo (or download it as a zip file):

```
git clone https://github.com/benitomartin/mlops-car-prices.git
```

Create the virtual environment named main-env using Conda with Python version 3.9:
```
conda create -n main-env python=3.9
conda activate main-env
```
Install setuptools and wheel:
```
conda install setuptools wheel
```
Execute the setup.py script to install the project and the dependencies listed in requirements.txt:
```
pip install .
# or use the Makefile target
make install
```

Each project folder contains a README.md file with instructions on how to run its code. I highly recommend creating a separate virtual environment for each one. Additionally, note that an AWS account, credentials, and proper policies with full access to EC2, S3, ECR, Lambda, and Kinesis are required for the projects to work correctly. Make sure to configure the appropriate credentials to interact with AWS services.
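Before running the Terraform or Lambda parts, a quick pre-flight check can show whether any credentials are configured at all. This hypothetical helper is not part of the project. It only looks at the standard `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` environment variables and the shared credentials file (`~/.aws/credentials`, or the path in `AWS_SHARED_CREDENTIALS_FILE`). It does not cover other sources such as SSO or instance roles, and it does not validate the keys or the IAM policies.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def aws_credentials_source(
    env: Optional[Mapping[str, str]] = None, home: Optional[Path] = None
) -> Optional[str]:
    """Return where AWS credentials appear to come from, or None if none are found.

    Rough pre-flight check only: it does not validate the keys or IAM policies.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    # 1. Static keys exported as environment variables
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"
    # 2. Shared credentials file (default location, or an explicit override)
    creds_file = Path(env.get("AWS_SHARED_CREDENTIALS_FILE", home / ".aws" / "credentials"))
    if creds_file.is_file():
        return str(creds_file)
    return None


if __name__ == "__main__":
    source = aws_credentials_source()
    print(f"AWS credentials found in: {source}" if source else "No AWS credentials found")
```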

Project Best Practices

The following best practices were implemented:

✅ Problem description: The project is well described and it's clear and understandable
✅ Cloud: The project is developed on the cloud and IaC tools are used for provisioning the infrastructure
✅ Experiment tracking and model registry: Both experiment tracking and model registry are used
✅ Workflow orchestration: Fully deployed workflow
✅ Model deployment: The model deployment code is containerized and can be deployed to the cloud
✅ Model monitoring: Basic model monitoring that calculates and reports metrics
✅ Reproducibility: Instructions are clear, it's easy to run the code, and it works. The versions for all the dependencies are specified.
✅ Best practices:
- There are unit tests
- There is an integration test
- Linter and code formatting are used
- There is a Makefile
- There is a CI/CD pipeline
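The monitoring component calculates and reports metrics. As an illustration only, here is a dependency-free sketch of per-batch metrics of the kind that could be written to the monitoring database and plotted in Grafana. The metric names, the single monitored feature, and the 0.2 drift threshold are hypothetical choices, not taken from the project.

```python
from statistics import mean
from typing import Dict, Optional, Sequence


def batch_metrics(
    predictions: Sequence[float],
    feature_values: Sequence[Optional[float]],
    reference_mean: float,
    drift_threshold: float = 0.2,  # hypothetical threshold
) -> Dict[str, float]:
    """Compute simple monitoring metrics for one batch of predictions."""
    if not predictions:
        raise ValueError("empty batch")
    if reference_mean == 0:
        raise ValueError("reference_mean must be non-zero")
    current_mean = mean(predictions)
    # Relative shift of the batch mean prediction vs. a reference (e.g. training) mean
    shift = (current_mean - reference_mean) / reference_mean
    missing = sum(1 for value in feature_values if value is None)
    return {
        "n_predictions": float(len(predictions)),
        "mean_prediction": float(current_mean),
        "mean_shift": shift,
        # Share of rows where the monitored feature is missing
        "missing_share": missing / len(feature_values) if feature_values else 0.0,
        # 1.0 when the relative shift exceeds the threshold, else 0.0
        "drift_flag": 1.0 if abs(shift) > drift_threshold else 0.0,
    }


if __name__ == "__main__":
    metrics = batch_metrics([21000.0, 23000.0, 22000.0], [2015.0, None, 2018.0, 2020.0], 20000.0)
    for name, value in metrics.items():
        print(f"{name}: {value}")
```

Returning plain floats keeps every metric in one numeric column type, which is convenient when each batch is stored as a row and charted over time.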
