# **Project Flow**

___

## **Workflow**

1. constants
2. config_entity
3. artifact_entity
4. component code
5. pipeline
6. app.py / demo.py

___

### **Template file**

1. Create a `template.py` file
    - it is the project template: it defines the folder structure for the project
    - execute the template once; it creates all the listed files and folders under the root directory of the project
    - a complex project needs a consistent project structure, and the `template.py` file provides it
    - the same template can be reused for future projects

```
project_name/
│
├── __init__.py
│
├── components/
│   ├── __init__.py
│   ├── data_ingestion.py
│   ├── data_validation.py
│   ├── data_transformation.py
│   ├── model_trainer.py
│   ├── model_evaluation.py
│   └── model_pusher.py
│
├── configuration/
│   ├── __init__.py
│   ├── mongo_db_connection.py
│   └── aws_connection.py
│
├── cloud_storage/
│   ├── __init__.py
│   └── aws_storage.py
│
├── data_access/
│   ├── __init__.py
│   └── proj1_data.py
│
├── constants/
│   └── __init__.py
│
├── entity/
│   ├── __init__.py
│   ├── config_entity.py
│   ├── artifact_entity.py
│   ├── estimator.py
│   └── s3_estimator.py
│
├── exception/
│   └── __init__.py
│
├── logger/
│   └── __init__.py
│
├── pipline/
│   ├── __init__.py
│   ├── training_pipeline.py
│   └── prediction_pipeline.py
│
├── utils/
│   ├── __init__.py
│   └── main_utils.py
│
├── config/
│   ├── model.yaml
│   └── schema.yaml
│
├── app.py
├── demo.py
├── requirements.txt
├── setup.py
├── pyproject.toml
├── Dockerfile
└── .dockerignore
```
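
A minimal sketch of what such a `template.py` could look like, assuming the package folder is called `src` and listing only part of the tree above. The helper names `build_file_list` and `create_structure` are made up for illustration; the real template may be organised differently.

```python
from pathlib import Path


def build_file_list(project_name):
    """Return a subset of the files from the tree above, relative to the project root."""
    return [
        f"{project_name}/__init__.py",
        f"{project_name}/components/__init__.py",
        f"{project_name}/components/data_ingestion.py",
        f"{project_name}/components/data_validation.py",
        f"{project_name}/configuration/__init__.py",
        f"{project_name}/configuration/mongo_db_connection.py",
        f"{project_name}/constants/__init__.py",
        f"{project_name}/entity/__init__.py",
        f"{project_name}/entity/config_entity.py",
        f"{project_name}/entity/artifact_entity.py",
        f"{project_name}/utils/__init__.py",
        f"{project_name}/utils/main_utils.py",
        "config/model.yaml",
        "config/schema.yaml",
        "app.py",
        "demo.py",
        "requirements.txt",
        "setup.py",
        "pyproject.toml",
        "Dockerfile",
        ".dockerignore",
        # extend with the remaining entries from the tree as needed
    ]


def create_structure(root, file_paths):
    """Create every missing file (and its parent folders); return the paths created."""
    created = []
    for relative_path in file_paths:
        target = Path(root) / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        if not target.exists():
            target.touch()  # existing files are left untouched
            created.append(target)
    return created
```

Calling `create_structure(".", build_file_list("src"))` from the project root would create the empty files. Because existing files are skipped, running it a second time changes nothing.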


___

### **Setup and toml files**

2. write the code in the `setup.py` and `pyproject.toml` files so that local packages can be imported

- when we install the requirements in the venv, the third-party packages get installed
- but we also have a `src` folder with many files inside, and that local package needs to be installed as well
- for that, we add `-e .` at the end of the `requirements.txt` file
- what this does:
    - pip installs the current directory in editable mode, which runs `setup.py`
    - inside the `setup()` call, `packages=find_packages()` discovers every folder that contains an `__init__.py`
    - this picks up the packages in `src` (we have set `name="src"`) ---> local package
    - so `setup.py` is what installs the local packages properly into the environment
- similarly, `pyproject.toml` works along with `setup.py`:
    - the toml file stores the corresponding metadata, configuration, etc.
    - this helps prevent environment errors
- now, whatever functions or modules we define in the `src` folder can easily be imported from outside the `src` folder

___

### **Virtual Environment setup**

3. Create a virtual environment and activate it 
    - commands:
        - `conda create -n Mlops-Porject-1 python=3.10 -y`
        - `conda activate Mlops-Porject-1`
    - add the required modules to `requirements.txt`
    - install the requirements:
        - `pip install -r requirements.txt`
    - run `pip list` in the terminal to make sure the local packages are installed
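
Besides `pip list`, the install can also be checked from Python. A small sketch, assuming the local package is named `src` as described above; `is_importable` is a hypothetical helper, not part of the project.

```python
import importlib.util


def is_importable(package_name):
    """Return True if Python can locate the given top-level package or module."""
    return importlib.util.find_spec(package_name) is not None
```

Inside the activated environment, `is_importable("src")` should return `True` once `pip install -r requirements.txt` (with `-e .` at the end) has run.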

___

### **MongoDB Setup**
> Atlas ---> Organisation ---> Project ---> Cluster ---> data

4. MongoDB setup
    - sign in to MongoDB Atlas
    - create an organisation
    - create a project: just give it a name, then next ---> next ---> create
    - on the "create a cluster" screen, hit "create"
    - select the M0 tier, keeping the other settings as default
    - hit "create deployment"
    - set up the username and password, then create a DB user
    - go to "network access" and add the IP address `0.0.0.0/0` ---> so that the database can be accessed from anywhere
    - go back to project ---> "get connection string" ---> "drivers" ---> {Driver: Python, Version: 3.6 or later} ---> copy and save the connection string with your password ---> done
    - create a folder "notebook" and add the dataset (`data.csv`) to it
    - create a file "mongoDB_demo.ipynb"
    - select kernel ---> python kernel ---> select the `Mlops-Porject-1` kernel (the conda environment created above)
    - push your data to the MongoDB database from the python notebook
    - go to MongoDB Atlas ---> Database ---> browse collections ---> see your data in key-value format

connection string format: `mongodb+srv://Achyuth_mlops_user:<password>@cluster0.kyrojcr.mongodb.net/?appName=Cluster0`

![image.png](attachment:image.png)

![image.png](attachment:image.png)

^ Data successfully uploaded to the MongoDB database
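
The notebook step that pushes the CSV to MongoDB could look roughly like the sketch below. It is not the actual notebook code: the database and collection names are hypothetical, the CSV is read with the standard library (so every value arrives as a string, whereas the notebook may use pandas), and the connection string is assumed to be in the `MONGODB_URL` environment variable.

```python
import csv
import os

DATABASE_NAME = "Proj1"         # hypothetical name, pick your own
COLLECTION_NAME = "Proj1-Data"  # hypothetical name, pick your own


def csv_to_records(csv_path):
    """Read a CSV file into a list of dicts (one key-value document per row)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def push_records(collection, records):
    """Insert the records and return how many documents were written."""
    if not records:
        return 0  # pymongo's insert_many rejects an empty list
    result = collection.insert_many(records)
    return len(result.inserted_ids)


def upload_csv(csv_path, connection_url=None):
    """Connect to the cluster and upload the whole CSV file."""
    from pymongo import MongoClient  # imported here so the helpers above work without pymongo

    url = connection_url or os.environ["MONGODB_URL"]
    client = MongoClient(url)
    try:
        collection = client[DATABASE_NAME][COLLECTION_NAME]
        return push_records(collection, csv_to_records(csv_path))
    finally:
        client.close()
```

For example, `upload_csv("notebook/data.csv")` would return the number of inserted documents, which can be compared against what Atlas shows for the collection.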

___

### **Logging, exception, EDA, feature engineering notebooks**

5. setting up logging and exceptions
    - write a logger file and test it in `demo.py`
    - write an exception file and test it in `demo.py`
    - the EDA and feature engineering notebooks also need to be added
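
A rough sketch of what the logger and exception modules might contain, using only the standard library. The names `configure_logger` and `ProjectException`, the log format and the `logs` folder are assumptions for illustration; the project's own files may differ.

```python
import logging
import os
from datetime import datetime


def configure_logger(log_dir="logs", name="project_logger"):
    """Return a logger that writes to a timestamped file and to the console."""
    os.makedirs(log_dir, exist_ok=True)
    log_file = os.path.join(log_dir, f"{datetime.now():%m_%d_%Y_%H_%M_%S}.log")
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        formatter = logging.Formatter(
            "[ %(asctime)s ] %(name)s - %(levelname)s - %(message)s"
        )
        for handler in (logging.FileHandler(log_file), logging.StreamHandler()):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger


class ProjectException(Exception):
    """Wrap another exception and record the file and line where it was raised."""

    def __init__(self, error):
        tb = error.__traceback__
        # Walk to the innermost frame, where the error actually occurred.
        while tb is not None and tb.tb_next is not None:
            tb = tb.tb_next
        if tb is None:
            location = "unknown location"
        else:
            location = f"[{tb.tb_frame.f_code.co_filename}] at line [{tb.tb_lineno}]"
        self.error_message = f"Error in {location}: {error}"
        super().__init__(self.error_message)
```

In `demo.py`, a quick test could log one message with `configure_logger().info("test")` and then wrap a deliberate `1 / 0` in `try` / `except Exception as e: raise ProjectException(e) from e` to see the formatted message.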

___

### **Data Ingestion**

6. before the `data_ingestion` component:
    - declare variables in the `constants.__init__.py` file
    - add code to the `configuration.mongo_db_connection.py` file and define the functions for the MongoDB connection
    - now create a new directory `data_access` ---> then create a new file inside it named `proj1_data.py`, which uses `mongo_db_connection.py` to connect with the DB, fetch the data in key-value format and transform it into a dataframe
    - then add code to the `entity.config_entity.py` file up to the **DataIngestionConfig** class
    - then add code to the `entity.artifact_entity.py` file up to the **DataIngestionArtifact** class
    - then add code to the `components.data_ingestion.py` file
    - add the code to the training pipeline
        - for demo:
            - set up the MongoDB connection url first (see the next step)
            - run `demo.py`
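
The constants ---> config entity ---> artifact entity ---> component chain for this step can be sketched as below. This is only an illustration under assumptions: every constant value, field name and the `run_data_ingestion` function are hypothetical, and the records are plain dicts written with the `csv` module rather than a dataframe.

```python
import csv
import os
import random
from dataclasses import dataclass

# Would live in constants/__init__.py; values are illustrative.
ARTIFACT_DIR = "artifact"
DATA_INGESTION_DIR_NAME = "data_ingestion"
TRAIN_FILE_NAME = "train.csv"
TEST_FILE_NAME = "test.csv"
DATA_INGESTION_TRAIN_TEST_SPLIT_RATIO = 0.25


@dataclass
class DataIngestionConfig:
    """Input settings of the component (entity/config_entity.py)."""
    artifact_dir: str = ARTIFACT_DIR
    train_test_split_ratio: float = DATA_INGESTION_TRAIN_TEST_SPLIT_RATIO

    @property
    def ingestion_dir(self):
        return os.path.join(self.artifact_dir, DATA_INGESTION_DIR_NAME)

    @property
    def training_file_path(self):
        return os.path.join(self.ingestion_dir, TRAIN_FILE_NAME)

    @property
    def testing_file_path(self):
        return os.path.join(self.ingestion_dir, TEST_FILE_NAME)


@dataclass
class DataIngestionArtifact:
    """Output of the component (entity/artifact_entity.py)."""
    trained_file_path: str
    test_file_path: str


def _write_csv(path, records):
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)


def run_data_ingestion(records, config, seed=42):
    """Shuffle the records, split them into train/test files, return the artifact."""
    if len(records) < 2:
        raise ValueError("need at least two records to split")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, int(len(shuffled) * config.train_test_split_ratio))
    test_rows, train_rows = shuffled[:test_size], shuffled[test_size:]
    os.makedirs(config.ingestion_dir, exist_ok=True)
    _write_csv(config.training_file_path, train_rows)
    _write_csv(config.testing_file_path, test_rows)
    return DataIngestionArtifact(config.training_file_path, config.testing_file_path)
```

The point of the pattern is that each component takes a config object in and hands an artifact object out, so the pipeline can pass the artifact of one step to the next step without hard-coded paths.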

7. to set up the connection url on windows:
    - open a powershell terminal and run:
        - set (for the current session only): `$env:MONGODB_URL = "mongodb+srv://Achyuth_mlops_user:<password>@cluster0.kyrojcr.mongodb.net/?appName=Cluster0"`
        - check: `echo $env:MONGODB_URL`
    - OR MANUALLY:
        - open the environment variable settings and add a new variable:
            - NAME: MONGODB_URL, Value = `<url>`
- also add the "artifact" dir to the `.gitignore` file
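
On the Python side, the connection url set above can be read back from the environment. A minimal sketch; `get_mongodb_url` is a hypothetical helper, and failing early with a clear message is a design choice made here, not something the project is known to do.

```python
import os

MONGODB_URL_KEY = "MONGODB_URL"


def get_mongodb_url(env=None):
    """Return the connection url from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    url = env.get(MONGODB_URL_KEY)
    if not url:
        raise RuntimeError(f"Environment variable {MONGODB_URL_KEY} is not set")
    return url
```

Reading the url this way keeps the password out of the source code and out of version control.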

___

### **Data Validation, Data Transformation, Model Trainer**

8. complete the work on the `utils.main_utils.py` and `config.schema.yaml` files
    - add the complete information about the dataset, for the data validation step
9. work on the `Data Validation` component, similar to the **Data Ingestion** step
10. similarly, work on `Data Transformation`
    - add _`estimator.py`_ to the entity folder
11. similarly, work on the `Model Trainer` component
    - add a class to _`estimator.py`_ in the entity folder
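
As an illustration of how the schema file can drive data validation, here is a small sketch of a column check. The `SCHEMA` dict stands in for the parsed `schema.yaml`; its layout and column names are invented for the example, and `validate_columns` is a hypothetical helper.

```python
# Stand-in for the parsed schema.yaml; columns and types are illustrative only.
SCHEMA = {
    "columns": {"id": "int", "age": "int", "response": "int"},
}


def validate_columns(actual_columns, schema):
    """Compare the dataset's columns with the schema; return (is_valid, message)."""
    expected = list(schema["columns"])
    actual = list(actual_columns)
    missing = [name for name in expected if name not in actual]
    unexpected = [name for name in actual if name not in expected]
    problems = []
    if missing:
        problems.append(f"missing columns: {missing}")
    if unexpected:
        problems.append(f"unexpected columns: {unexpected}")
    return (not problems, "; ".join(problems))
```

With a dataframe, the call would be `validate_columns(df.columns, SCHEMA)`. The message can then be stored in the validation artifact so that later steps know why validation failed.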

___

### **Model Evaluation and Model Pusher**

12. Before moving ahead with this component (model evaluation), some AWS services setup is needed:
    - log in to the AWS console
    - keep the region as us-east-1
    - go to IAM ---> create a new user (name: firstproj)
    - attach policy ---> select AdministratorAccess ---> next ---> create user
    - go to the user ---> security credentials ---> access keys ---> create access key
    - select CLI ---> agree to condition ---> next ---> create access key ---> download the csv file
    - set env variables with the values from the csv file, using the methods below:
        - set env vars from a powershell terminal:
            - `$env:AWS_ACCESS_KEY_ID="<AWS_ACCESS_KEY_ID>"`
            - `$env:AWS_SECRET_ACCESS_KEY="<AWS_SECRET_ACCESS_KEY>"`
        - check env vars from a powershell terminal:
            - `echo $env:AWS_ACCESS_KEY_ID`
            - `echo $env:AWS_SECRET_ACCESS_KEY`
    - now, add the access key, secret key and region name to the `constants.__init__.py` file
    - add code to the `src.configuration.aws_connection.py` file ---> to work with AWS S3 services
    - ensure the info below is in the `constants.__init__.py` file:
        - `MODEL_EVALUATION_CHANGED_THRESHOLD_SCORE: float = 0.02`
        - `MODEL_BUCKET_NAME = "my-model-mlopsproj"`
        - `MODEL_PUSHER_S3_KEY = "model-registry"`
    - go to S3 service >> Create bucket >> Region: us-east-1 >> General purpose >>
        Bucket Name: "achyuth-mlops-project" >> uncheck "Block all public access" and acknowledge >>
        hit Create Bucket
        (note: `MODEL_BUCKET_NAME` above should be set to the name of the bucket you actually create)
    - now, inside `src.cloud_storage.aws_storage.py`, add the code for the configuration needed to pull
        and push the model from/to the AWS S3 bucket
    - inside the "entity" dir we will have an `s3_estimator.py` file containing all the functions to pull/push
        data from/to the S3 bucket
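
A sketch of how the connection code might pick up the credentials set above and build an S3 client. The function names are hypothetical, and the actual `aws_connection.py` may be structured differently (for example as a class); `boto3.client` with these keyword arguments is the standard boto3 call.

```python
import os

REGION_NAME = "us-east-1"  # region chosen in the steps above


def load_aws_settings(env=None):
    """Collect the S3 client settings from environment variables."""
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {missing}")
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "region_name": REGION_NAME,
    }


def make_s3_client(settings):
    """Create a boto3 S3 client from the collected settings."""
    import boto3  # imported lazily; requires `pip install boto3`

    return boto3.client("s3", **settings)
```

`make_s3_client(load_aws_settings())` would then give a client that the storage and estimator code can share.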

13. Now we will start our work on the "Model Evaluation" and "Model Pusher" components.
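
One plausible way the `MODEL_EVALUATION_CHANGED_THRESHOLD_SCORE` constant feeds into model evaluation is sketched below. This is an assumption about the decision rule, not the project's confirmed logic: the new model is accepted only if it beats the model currently in S3 by more than the threshold, and it is accepted outright when no model exists yet. `is_model_accepted` is a hypothetical name.

```python
MODEL_EVALUATION_CHANGED_THRESHOLD_SCORE: float = 0.02


def is_model_accepted(new_score, production_score=None,
                      threshold=MODEL_EVALUATION_CHANGED_THRESHOLD_SCORE):
    """Decide whether the newly trained model should replace the one in S3."""
    if production_score is None:
        return True  # assumed behaviour: nothing in the bucket yet, so push the new model
    return (new_score - production_score) > threshold
```

The model pusher would then upload the new model to the bucket only when this check returns `True`.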


![image.png](attachment:image.png)

___