# 🚀 END-TO-END ML PROJECT

**(Production-Grade | Docker | CI/CD | AWS | Azure)**


## 🔰 PART 0: PROJECT FOUNDATION (VERY IMPORTANT)

### 1️⃣ `setup.py` – Project Packaging

* Converts project into a **Python package**
* Enables clean imports across modules
* Makes the project installable with `pip install -e .`, which keeps imports stable as the codebase grows toward production

```python
from setuptools import setup, find_packages

setup(
    name="mlproject",
    version="0.0.1",
    packages=find_packages()
)
```

**Interview line:**

> “I used `setup.py` to package the ML project and enable modular imports.”



### 2️⃣ `__init__.py`

* Marks folders as **Python packages**
* Enables structured imports

```python
from src.components.data_ingestion import DataIngestion
```

**Interview line:**

> “`__init__.py` ensures folders behave as importable Python packages.”



### 3️⃣ `src/` Folder Design

* Separates **business logic** from deployment files
* Keeps dependencies flowing one way (pipelines import components, not the reverse), which helps avoid circular imports
* Industry-standard ML layout


### 4️⃣ `logger.py`

* Centralized logging (instead of `print`)
* Logs pipeline steps, errors, model metrics

**Interview line:**

> “Logging makes the system debuggable in production.”
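
A minimal sketch of what a centralized `logger.py` might look like, using only the standard `logging` module. The function name `configure_logging`, the timestamped file name, and the format string are assumptions for illustration, not the project's actual code.

```python
import logging
import os
import tempfile
from datetime import datetime


def configure_logging(log_dir: str) -> str:
    """Route log records to a timestamped file under log_dir and return its path."""
    os.makedirs(log_dir, exist_ok=True)
    log_path = os.path.join(log_dir, f"{datetime.now():%Y_%m_%d_%H_%M_%S}.log")
    logging.basicConfig(
        filename=log_path,
        format="[%(asctime)s] %(lineno)d %(name)s - %(levelname)s - %(message)s",
        level=logging.INFO,
        force=True,  # replace handlers configured earlier (Python 3.8+)
    )
    return log_path


# A temporary directory keeps this sketch free of side effects; a real
# project would more likely log to a fixed "logs/" folder.
LOG_FILE = configure_logging(os.path.join(tempfile.mkdtemp(), "logs"))
logging.info("Data ingestion started")
```

Other modules can then call `logging.info(...)` directly once this module has been imported, so every pipeline step writes to the same file.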



### 5️⃣ `exception.py`

* Custom exceptions with:

  * File name
  * Line number
  * Meaningful message

**Interview line:**

> “Custom exceptions improve traceability across pipelines.”
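
One possible shape for such an exception class is sketched below. It reads the active traceback with `sys.exc_info()` to recover the file name and line number. The class name `CustomException`, its attributes, and the helper `risky_division` are hypothetical.

```python
import sys


class CustomException(Exception):
    """Wraps an error with the file name and line number where it was raised."""

    def __init__(self, error: Exception):
        # sys.exc_info() describes the exception currently being handled,
        # so this class is meant to be constructed inside an `except` block.
        _, _, tb = sys.exc_info()
        self.file_name, self.line_number = "<unknown>", 0
        while tb is not None:
            # Walk to the innermost frame, where the error actually occurred.
            self.file_name = tb.tb_frame.f_code.co_filename
            self.line_number = tb.tb_lineno
            tb = tb.tb_next
        super().__init__(
            f"Error in [{self.file_name}] at line [{self.line_number}]: {error}"
        )


def risky_division(a: float, b: float) -> float:
    return a / b


try:
    try:
        risky_division(1, 0)
    except Exception as exc:
        # Chain with `from` so the original traceback is preserved.
        raise CustomException(exc) from exc
except CustomException as wrapped:
    caught = wrapped
    print(caught)
```

The printed message names the file and line of the failing division rather than the line where the wrapper was raised, which is the traceability benefit described above.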



### 6️⃣ `requirements.txt`

* Lists dependencies so the environment can be recreated; pin exact versions for truly reproducible builds
* Used by Docker & CI/CD


### 7️⃣ `.gitignore`

* Prevents pushing:

  * secrets
  * models
  * cache files



## 🧩 PART 1: PROJECT STRUCTURE

```
project-root/
│
├── artifacts/
│   ├── model.pkl
│   └── preprocessor.pkl
│
├── src/
│   ├── components/
│   │   ├── data_ingestion.py
│   │   ├── data_transformation.py
│   │   └── model_trainer.py
│   │
│   ├── pipeline/
│   │   ├── train_pipeline.py
│   │   └── predict_pipeline.py
│   │
│   ├── logger.py
│   ├── exception.py
│   └── utils.py
│
├── templates/
│   ├── index.html
│   └── home.html
│
├── application.py
├── Dockerfile
├── requirements.txt
├── setup.py
├── .github/workflows/main.yml
└── README.md
```


Create a new repository, sync it to GitHub, and install `requirements.txt`. Once the installation finishes, an `.egg-info` folder appears:

The `.egg-info` folder stores metadata about the package, such as its version and dependencies. It is created when the package is installed in editable mode (`pip install -e .`, or a `-e .` line in `requirements.txt`), which lets you change the code without reinstalling it.


![image.png](attachment:image.png)

The exception and logger code was then written, and both were checked:
![image-2.png](attachment:image-2.png)
![image-3.png](attachment:image-3.png)

---

## 🧩 PART 2: TRAINING PIPELINE

### 🔹 Data Ingestion

* Reads dataset
* Splits train/test
* Stores raw data
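
The three ingestion steps can be sketched roughly as follows, assuming pandas and scikit-learn are installed. The function name `ingest`, the CSV file names, the 80/20 split, and the toy DataFrame are all illustrative assumptions; the real step would read the project's dataset instead.

```python
import os
import tempfile

import pandas as pd
from sklearn.model_selection import train_test_split


def ingest(df: pd.DataFrame, artifacts_dir: str, test_size: float = 0.2) -> dict:
    """Store the raw data, split it, and store the train/test sets as CSV files."""
    os.makedirs(artifacts_dir, exist_ok=True)
    paths = {
        name: os.path.join(artifacts_dir, f"{name}.csv")
        for name in ("raw", "train", "test")
    }
    df.to_csv(paths["raw"], index=False)
    # A fixed random_state makes the split repeatable between runs.
    train_df, test_df = train_test_split(df, test_size=test_size, random_state=42)
    train_df.to_csv(paths["train"], index=False)
    test_df.to_csv(paths["test"], index=False)
    return paths


# Toy stand-in for the dataset (hypothetical columns).
toy_df = pd.DataFrame({"hours": range(10), "score": [2 * h + 1 for h in range(10)]})
paths = ingest(toy_df, os.path.join(tempfile.mkdtemp(), "artifacts"))
```

Returning the file paths lets the next step (data transformation) pick up the train and test files without hard-coding their locations.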

### 🔹 Data Transformation

* Numerical → scaling
* Categorical → encoding
* Uses `ColumnTransformer`
* Saves `preprocessor.pkl`
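
A minimal sketch of this step, assuming scikit-learn and pandas. The choice of `StandardScaler` and `OneHotEncoder`, the helper name `build_preprocessor`, and the toy columns are assumptions; the notes only say that numerical columns are scaled and categorical columns are encoded.

```python
import os
import pickle
import tempfile

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(numeric_cols, categorical_cols) -> ColumnTransformer:
    """Scale numeric columns and one-hot encode categorical columns."""
    return ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), numeric_cols),
            # handle_unknown="ignore" encodes unseen categories as all zeros
            # instead of raising an error at prediction time.
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ]
    )


# Toy training data with hypothetical columns.
train_df = pd.DataFrame({"hours": [1.0, 2.0, 3.0, 4.0], "city": ["a", "b", "a", "b"]})
preprocessor = build_preprocessor(["hours"], ["city"])
train_features = preprocessor.fit_transform(train_df)  # fit on training data only

# Persist the fitted transformer so inference reuses the same statistics.
preprocessor_path = os.path.join(tempfile.mkdtemp(), "preprocessor.pkl")
with open(preprocessor_path, "wb") as fh:
    pickle.dump(preprocessor, fh)
```

Fitting on the training split only, then calling just `transform` on test or live data, is what keeps test-set statistics from leaking into training.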

### 🔹 Model Trainer

* Trains multiple models:

  * Linear Regression
  * Decision Tree
  * Random Forest
  * Gradient Boosting
  * KNN
* Uses **GridSearchCV**
* Selects best model using **R² score**
* Saves `model.pkl`
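
The selection loop might look roughly like the sketch below, assuming scikit-learn and NumPy. Only three of the five model families are included to keep it quick, and the parameter grids, the synthetic data, the helper name `select_best_model`, and the choice to compare models on held-out test R² are all assumptions.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Each candidate pairs an estimator with a small hyperparameter grid.
candidates = {
    "Linear Regression": (LinearRegression(), {"fit_intercept": [True, False]}),
    "Decision Tree": (DecisionTreeRegressor(random_state=0), {"max_depth": [2, 4]}),
    "Random Forest": (RandomForestRegressor(random_state=0), {"n_estimators": [10, 30]}),
}


def select_best_model(X_train, y_train, X_test, y_test, candidates):
    """Tune every candidate with GridSearchCV and keep the best test R²."""
    scores, fitted = {}, {}
    for name, (estimator, grid) in candidates.items():
        search = GridSearchCV(estimator, grid, cv=3, scoring="r2")
        search.fit(X_train, y_train)
        fitted[name] = search.best_estimator_
        scores[name] = r2_score(y_test, search.predict(X_test))
    best_name = max(scores, key=scores.get)
    return best_name, fitted[best_name], scores


# Synthetic, nearly linear data so the sketch runs without a dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=120)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

best_name, best_model, scores = select_best_model(
    X_train, y_train, X_test, y_test, candidates
)

model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as fh:
    pickle.dump(best_model, fh)
```

A stricter variant would pick the winner on cross-validation score alone and reserve the test set for a single final report.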

**Artifacts Generated**

```
model.pkl
preprocessor.pkl
```

Executed the data ingestion step; the logs are created.
Note: the DataFrame `df` can be loaded from any other data source instead, such as MongoDB or an API:

![image.png](attachment:image.png)


---

## 🧩 PART 3: PREDICTION PIPELINE

### 🔹 CustomData Class

* Converts HTML inputs → DataFrame

### 🔹 PredictPipeline

* Loads saved model & preprocessor
* Applies transformation
* Returns prediction
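
The two classes could be sketched as below, assuming pandas and scikit-learn. The form fields `hours` and `attendance`, the method name `to_dataframe`, and the toy artifacts created at the bottom (a stand-in for the training pipeline) are hypothetical.

```python
import os
import pickle
import tempfile

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


class CustomData:
    """Holds one form submission; the two fields are hypothetical."""

    def __init__(self, hours: float, attendance: float):
        self.hours = hours
        self.attendance = attendance

    def to_dataframe(self) -> pd.DataFrame:
        # One row, using the same column names as at training time.
        return pd.DataFrame([{"hours": self.hours, "attendance": self.attendance}])


class PredictPipeline:
    def __init__(self, artifacts_dir: str):
        self.artifacts_dir = artifacts_dir

    def _load(self, file_name: str):
        with open(os.path.join(self.artifacts_dir, file_name), "rb") as fh:
            return pickle.load(fh)

    def predict(self, features: pd.DataFrame):
        preprocessor = self._load("preprocessor.pkl")
        model = self._load("model.pkl")
        # Transform only (never fit) at inference time.
        return model.predict(preprocessor.transform(features))


# Stand-in for the training pipeline: fit and save toy artifacts.
artifacts_dir = tempfile.mkdtemp()
train_X = pd.DataFrame(
    {"hours": [1.0, 2.0, 3.0, 4.0], "attendance": [0.0, 1.0, 0.0, 1.0]}
)
train_y = 2 * train_X["hours"] + train_X["attendance"]
scaler = StandardScaler().fit(train_X)
model = LinearRegression().fit(scaler.transform(train_X), train_y)
for name, obj in (("preprocessor.pkl", scaler), ("model.pkl", model)):
    with open(os.path.join(artifacts_dir, name), "wb") as fh:
        pickle.dump(obj, fh)

prediction = PredictPipeline(artifacts_dir).predict(
    CustomData(hours=5.0, attendance=1.0).to_dataframe()
)
```

Loading the pickles on every call keeps the sketch simple; a long-running service would more likely load them once at startup.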

### 🔹 Flask App (`application.py`)

* `/` → Home page
* `/predictdata` → POST request
* Displays prediction
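
A stripped-down sketch of the two routes, assuming Flask is installed. It returns plain strings instead of rendering `index.html` / `home.html`, and `predict_score` with its single `hours` form field is a hypothetical stand-in for the real `PredictPipeline`.

```python
from flask import Flask, request

app = Flask(__name__)


def predict_score(hours: float) -> float:
    # Hypothetical stand-in for PredictPipeline.predict().
    return 2.0 * hours + 1.0


@app.route("/")
def home():
    # The real app would render a template here.
    return "Home page"


@app.route("/predictdata", methods=["POST"])
def predict_datapoint():
    # Form values arrive as strings, so convert before predicting.
    hours = float(request.form["hours"])
    return f"Prediction: {predict_score(hours)}"
```

To serve it, `application.py` would typically end with `app.run(host="0.0.0.0", port=8080)`; binding to `0.0.0.0` is what makes the app reachable from outside a container.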

**Same pipeline works for:**

* Web UI
* API
* Postman


## 🧩 PART 4: DOCKERIZATION

### 🔹 Dockerfile

```dockerfile
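# Python 3.8 and Debian buster are both end-of-life; consider a newer base tag
FROM python:3.8-slim-buster
WORKDIR /app
COPY . /app
# --no-cache-dir keeps pip's download cache out of the image
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python3", "application.py"]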
```

### 🔹 Why Docker?

* Deployment independent of the host setup: the image bundles the app with its dependencies
* Same behavior locally & cloud
* Easy scaling



## 🧩 PART 5: CI/CD USING GITHUB ACTIONS

### 🔹 Workflow (`main.yml`)

**Triggered on push to `main`**

#### Stages:

1. **Continuous Integration**

   * Code checkout
   * Linting / tests
2. **Build & Push**

   * Build Docker image
   * Push to registry
3. **Continuous Deployment**

   * Pull image
   * Deploy automatically


## 🧩 PART 6: AWS DEPLOYMENT (PRIVATE)

### 🔹 AWS Services

* **IAM** – Secure access
* **ECR** – Private Docker registry
* **EC2** – Application server

### 🔹 Flow

```
GitHub → GitHub Actions → ECR → EC2
```

### 🔹 Key Points

* Self-hosted GitHub runner on EC2
* Port **8080** exposed
* Fully automated deployment



## 🧩 PART 7: AZURE DEPLOYMENT

### 🔹 Azure Services

* **Azure Container Registry (ACR)**
* **Azure Web App for Containers**

### 🔹 Flow

```
GitHub → GitHub Actions → ACR → Azure Web App
```

### 🔹 Highlights

* Private Docker image
* Continuous deployment enabled
* Auto-scaling support

---

## 🧠 INTERVIEW QUESTIONS (MOST IMPORTANT)

### 1️⃣ Explain your project end-to-end

> Built modular ML pipelines, Dockerized the app, automated CI/CD, and deployed on AWS & Azure.



### 2️⃣ Why `setup.py`?

> To package the project and enable clean modular imports.



### 3️⃣ Why separate training & prediction pipelines?

> Training is batch-heavy; prediction must be fast and reusable.



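### 4️⃣ Why save model & preprocessor?

> So inference reuses the exact transformations and model fitted on the training data, with nothing refit at prediction time. Fitting the preprocessor on training data only is what avoids data leakage.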



### 5️⃣ Why Docker?

> Environment consistency and scalable deployments.



### 6️⃣ Docker Hub vs ECR vs ACR?

* Docker Hub → Public by default (private repositories are also available)
* ECR / ACR → Private registries integrated with AWS / Azure access control



### 7️⃣ What is CI/CD in your project?

> CI validates code, CD builds Docker images and deploys automatically.



### 8️⃣ Why self-hosted runner?

> To deploy directly to EC2 without manual SSH.



### 9️⃣ How would you scale this?

> Load balancer, auto-scaling, Kubernetes, ECS / AKS.



### 🔟 How do you monitor in production?

> Logs, cloud monitoring, health checks, model drift tracking.



## 🎯 FINAL ONE-LINE INTERVIEW SUMMARY

> “This is a production-grade ML system with modular pipelines, Dockerized deployment, CI/CD automation, and cloud-native deployment on AWS and Azure.”

---