# **Project Flow**

___

## **Workflow**

1. constants
2. config_entity
3. artifact_entity
4. component code
5. pipeline
6. app.py / demo.py
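
The layering in this workflow can be sketched in a few lines. This is only an illustration of the pattern for one stage: the constant values, the dataclass fields and the `run_training_pipeline` helper are assumptions made for the example, not the project's actual code.

```python
from dataclasses import dataclass
from pathlib import Path

# 1. constants (hypothetical values; the real ones live in the constants package)
ARTIFACT_DIR = "artifact"
DATA_INGESTION_DIR_NAME = "data_ingestion"
RAW_FILE_NAME = "data.csv"


# 2. config_entity: tells a component where to read from and write to
@dataclass(frozen=True)
class DataIngestionConfig:
    artifact_dir: str = ARTIFACT_DIR

    @property
    def raw_file_path(self) -> Path:
        return Path(self.artifact_dir) / DATA_INGESTION_DIR_NAME / RAW_FILE_NAME


# 3. artifact_entity: what a component hands over to the next stage
@dataclass(frozen=True)
class DataIngestionArtifact:
    raw_file_path: Path
    row_count: int


# 4. component: does the work, driven only by its config
class DataIngestion:
    def __init__(self, config: DataIngestionConfig) -> None:
        self.config = config

    def run(self, rows: list[str]) -> DataIngestionArtifact:
        path = self.config.raw_file_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("\n".join(rows), encoding="utf-8")
        return DataIngestionArtifact(raw_file_path=path, row_count=len(rows))


# 5. pipeline: wires configs and components together
def run_training_pipeline(rows: list[str], artifact_dir: str = ARTIFACT_DIR) -> DataIngestionArtifact:
    config = DataIngestionConfig(artifact_dir=artifact_dir)
    return DataIngestion(config).run(rows)
```

Step 6 (`app.py` / `demo.py`) would then only need to call the pipeline function, for example `run_training_pipeline(["a,b", "1,2"])`.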

___

### **Template file**

1. Create a `template.py`
    - it serves as the project template and defines the folder structure for the project
    - execute it once (`python template.py`); it creates all the listed files and folders in the root directory of the project
    - a complex project needs a consistent project structure, and `template.py` provides it
    - the same template can be reused for future projects

```
project_name/
│
├── __init__.py
│
├── components/
│   ├── __init__.py
│   ├── data_ingestion.py
│   ├── data_validation.py
│   ├── data_transformation.py
│   ├── model_trainer.py
│   ├── model_evaluation.py
│   └── model_pusher.py
│
├── configuration/
│   ├── __init__.py
│   ├── mongo_db_connection.py
│   └── aws_connection.py
│
├── cloud_storage/
│   ├── __init__.py
│   └── aws_storage.py
│
├── data_access/
│   ├── __init__.py
│   └── proj1_data.py
│
├── constants/
│   └── __init__.py
│
├── entity/
│   ├── __init__.py
│   ├── config_entity.py
│   ├── artifact_entity.py
│   ├── estimator.py
│   └── s3_estimator.py
│
├── exception/
│   └── __init__.py
│
├── logger/
│   └── __init__.py
│
├── pipline/
│   ├── __init__.py
│   ├── training_pipeline.py
│   └── prediction_pipeline.py
│
├── utils/
│   ├── __init__.py
│   └── main_utils.py
│
├── config/
│   ├── model.yaml
│   └── schema.yaml
│
├── app.py
├── demo.py
├── requirements.txt
├── setup.py
├── pyproject.toml
├── Dockerfile
└── .dockerignore
```
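
A minimal sketch of what such a `template.py` might contain is shown below. The file list is a hypothetical subset of the tree above, and the package folder name `src` is an assumption taken from the setup section; the function name `create_template` is made up for the example.

```python
from pathlib import Path

# Hypothetical subset of the tree above; extend the list to cover every file.
FILES = [
    "{name}/__init__.py",
    "{name}/components/__init__.py",
    "{name}/components/data_ingestion.py",
    "{name}/constants/__init__.py",
    "{name}/entity/config_entity.py",
    "config/schema.yaml",
    "app.py",
    "requirements.txt",
]


def create_template(root: Path, project_name: str = "src") -> list[Path]:
    """Create missing files (and their parent folders); never overwrite existing ones."""
    created = []
    for template in FILES:
        path = root / template.format(name=project_name)
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```

Calling `create_template(Path("."))` from the project root would create the files; running it a second time creates nothing, so it is safe to re-run after extending the list.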


___

### **Setup and toml files**

2. write the code in `setup.py` and `pyproject.toml` so that the local packages can be imported

- when we install the requirements in the venv, the third-party packages get installed
- but we also have a `src` folder with many files inside, and these local packages need to be installed as well
- for that we add `-e .` at the end of the `requirements.txt` file
- what this does:
    - pip installs the current directory in editable mode, which runs `setup.py`
    - it calls the `setup` function
        - inside the `setup` function, `packages = find_packages()` collects every folder that contains an `__init__.py`
        - it finds the packages in `src` (we have set `name = 'src'`) ---> local package
        - so `setup.py` is what installs the local packages into the environment
- similarly, `pyproject.toml` works along with `setup.py`:
    - the toml file stores the corresponding metadata, configuration, etc.
    - this helps prevent environment errors

- now, whatever functions or modules we define in the `src` folder can easily be imported from outside the `src` folder
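
To make the package-discovery step concrete, here is a standard-library approximation of what `find_packages()` does. It is a simplified stand-in written for illustration, not the setuptools implementation; the function name `discover_packages` is hypothetical.

```python
import os
from pathlib import Path


def discover_packages(root: Path) -> list[str]:
    """Return dotted names of directories under `root` that contain __init__.py.

    Simplified stand-in for setuptools.find_packages(): a directory without
    __init__.py is skipped together with everything below it.
    """
    packages = []
    for dirpath, dirnames, filenames in os.walk(root):
        current = Path(dirpath)
        if current == root:
            continue  # the root itself is not a package, only its children
        if "__init__.py" not in filenames:
            dirnames.clear()  # do not descend into non-packages
            continue
        packages.append(".".join(current.relative_to(root).parts))
    return sorted(packages)
```

This is why every folder inside `src` carries an `__init__.py`: without it the folder would not be picked up as a package.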

___

### **Virtual Environment setup**

3. Create a virtual environment and activate it
    - commands:
        - `conda create -n Mlops-Porject-1 python=3.10 -y`
        - `conda activate Mlops-Porject-1`
    - add the required modules to `requirements.txt`
    - install the requirements from `requirements.txt`:
        - `pip install -r requirements.txt`
    - run `pip list` in the terminal to make sure the local packages are installed

___

### **MongoDB Setup**
> Atlas ---> Organisation ---> Project ---> Cluster ---> Data

4. MongoDB setup
    - sign in to MongoDB Atlas
    - create an organisation
    - create a project: give it a name, then click "Next" ---> "Next" ---> "Create"
    - on the "Create a cluster" screen, hit "Create"
    - select the M0 tier, keeping the other settings as default
    - hit "Create Deployment"
    - set up the username and password, then create a DB user
    - go to "Network Access" and add the IP address `0.0.0.0/0` so that we can access the database from anywhere
    - go back to the project ---> "Get connection string" ---> "Drivers" ---> {Driver: Python, Version: 3.6 or later} ---> copy the connection string and save it with your password filled in
    - create a folder "notebook" and add `data.csv` to it
    - create a file "mongoDB_demo.ipynb"
    - select kernel ---> Python kernel ---> select the `Mlops-Porject-1` kernel (the conda environment created above)
    - push your data to the MongoDB database from the notebook
    - go to MongoDB Atlas ---> Database ---> "Browse Collections" ---> see your data in key-value format

Connection string format: `mongodb+srv://Achyuth_mlops_user:<password>@cluster0.kyrojcr.mongodb.net/?appName=Cluster0`

![image.png](attachment:image.png)

![image.png](attachment:image.png)

^ Data successfully uploaded to the MongoDB database
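
The notebook step that pushes the data can be sketched roughly as follows. This is a sketch under assumptions: the helper names are made up, and `collection` is expected to be any object exposing `insert_many()`, such as a pymongo collection.

```python
import csv
from pathlib import Path


def csv_to_records(csv_path: Path) -> list[dict]:
    """Read a CSV file into a list of dicts, one per row (values stay strings)."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def push_records(collection, records: list[dict]) -> int:
    """Insert records into a collection-like object and return how many were inserted."""
    if not records:
        return 0  # insert_many() rejects an empty list, so skip the call
    result = collection.insert_many(records)
    return len(result.inserted_ids)
```

With pymongo installed, the collection would typically come from `pymongo.MongoClient(url)[db_name][collection_name]`, where `url` is the connection string above and `db_name` / `collection_name` are placeholders you choose.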

___

### **Logging, exception, EDA, feature engineering notebooks**

5. setting up logging and exceptions
    - write a logger file and test it in `demo.py`
    - write an exception file and test it in `demo.py`
    - the EDA and feature engineering notebooks also need to be added
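
A minimal sketch of what the logger and exception modules could look like is given below. The names `get_logger` and `ProjectException` and the log format are assumptions for illustration; the project's own files may differ.

```python
import logging
import sys


def get_logger(name: str = "project") -> logging.Logger:
    """Return a logger that writes timestamped messages to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("[%(asctime)s] %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


class ProjectException(Exception):
    """Wrap an error with the file name and line number taken from the active traceback."""

    def __init__(self, error: Exception) -> None:
        _, _, tb = sys.exc_info()
        if tb is not None:
            location = f"{tb.tb_frame.f_code.co_filename}:{tb.tb_lineno}"
        else:
            location = "unknown location"
        self.location = location
        super().__init__(f"{error} (at {location})")
```

In `demo.py` this could be tested by logging one message with `get_logger().info("test")` and by raising `ProjectException(err)` inside an `except Exception as err:` block.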

___

### **Data Ingestion**

6. before the `data_ingestion` component:
    - declare variables within the `constants.__init__.py` file
    - add code to the `configuration.mongo_db_connection.py` file and define the functions for the MongoDB connection
    - create a new directory `data_access` ---> then create a new file inside it named `proj1_data.py` that uses `mongo_db_connection.py` to connect to the DB, fetch the data in key-value format and transform it into a dataframe
    - then add code to the `entity.config_entity.py` file up to the **DataIngestionConfig** class
    - then add code to the `entity.artifact_entity.py` file up to the **DataIngestionArtifact** class
    - then add code to the `components.data_ingestion.py` file
    - add the code to the training pipeline
        - for the demo:
            - set up the MongoDB connection URL first (step 7)
            - run `demo.py`

7. to set up the connection URL on Windows:
    - open a PowerShell terminal and run:
        - set: `$env:MONGODB_URL = "mongodb+srv://Achyuth_mlops_user:<password>@cluster0.kyrojcr.mongodb.net/?appName=Cluster0"`
        - check: `echo $env:MONGODB_URL`
    - OR MANUALLY:
        - open the environment variable settings and add a new variable:
            - NAME: MONGODB_URL, Value = `<url>`
- also add the "artifact" dir to the `.gitignore` file
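
On the Python side, the connection code can read this variable and fail early when it is missing. The sketch below assumes the variable name `MONGODB_URL` from step 7; the helper name `get_mongodb_url` is hypothetical.

```python
import os

MONGODB_URL_KEY = "MONGODB_URL"  # name of the environment variable set in step 7


def get_mongodb_url(env=None) -> str:
    """Return the connection URL from the environment or fail with a clear message."""
    env = os.environ if env is None else env
    url = env.get(MONGODB_URL_KEY)
    if not url:
        raise RuntimeError(f"Environment variable {MONGODB_URL_KEY} is not set.")
    return url
```

Reading the URL from the environment keeps the password out of the source code and out of version control.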

___

### **Data Validation, Data Transformation, Model Trainer**

8. complete the work on the `utils.main_utils.py` and `config.schema.yaml` files
    - add the complete description of the dataset, which the data validation step needs
9. work on the `Data Validation` component, following the same steps as for **Data Ingestion**
10. similarly, work on `Data Transformation`
    - add _`estimator.py`_ to the entity folder
11. similarly, work on the `Model Trainer` component
    - add the class to _`estimator.py`_ in the entity folder
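
As an illustration of how the schema file can drive data validation, the sketch below compares a dataset's column names against a schema. The `SCHEMA` dict is a hypothetical example of what `schema.yaml` might look like once loaded, and `validate_columns` is a made-up helper name.

```python
def validate_columns(columns: list[str], schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the columns match the schema."""
    expected = schema["columns"]
    problems = []
    missing = [name for name in expected if name not in columns]
    extra = [name for name in columns if name not in expected]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    return problems


# Hypothetical schema, shaped as it might look after loading schema.yaml.
SCHEMA = {"columns": ["id", "age", "target"]}
```

The validation component could store the returned problems in its artifact so that later stages run only when the list is empty.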

___

### **Model Evaluation and Model Pusher**

12. Before moving ahead with this component (model evaluation), some AWS services need to be set up:
    - log in to the AWS console
    - keep the region as us-east-1
    - go to IAM ---> create a new user (name: firstproj)
    - attach policy ---> select AdministratorAccess ---> next ---> create user
    - go to the user ---> security credentials ---> access keys ---> create access key
    - select CLI ---> agree to the condition ---> next ---> create access key ---> download the csv file
    - set env variables with the values from the csv file, using the methods below:
        - Set env var from PowerShell terminal:
            - `$env:AWS_ACCESS_KEY_ID="<AWS_ACCESS_KEY_ID>"`
            - `$env:AWS_SECRET_ACCESS_KEY="<AWS_SECRET_ACCESS_KEY>"`
        - Check env var from PowerShell terminal:
            - `echo $env:AWS_ACCESS_KEY_ID`
            - `echo $env:AWS_SECRET_ACCESS_KEY`
    - now, add the access key, secret key and region name to the `constants.__init__.py` file
    - add code to the `src.configuration.aws_connection.py` file ---> to work with AWS S3 services
    - ensure the info below is in the `constants.__init__.py` file:
        - MODEL_EVALUATION_CHANGED_THRESHOLD_SCORE: float = 0.02
        - MODEL_BUCKET_NAME = "my-model-mlopsproj"
        - MODEL_PUSHER_S3_KEY = "model-registry"
    - go to the S3 service ---> Create bucket ---> Region: us-east-1 ---> General purpose ---> Bucket Name: "achyuth-mlops-project" ---> uncheck "Block all public access" and acknowledge ---> hit Create Bucket
        - the bucket name must be the same as the value of `MODEL_BUCKET_NAME`
    - now add code to `src.cloud_storage.aws_storage.py` for the configuration needed to pull and push the model from the AWS S3 bucket
    - inside the "entity" dir we will have an `s3_estimator.py` file containing all the functions to pull/push data from the S3 bucket

13. Now we will start our work on the "Model Evaluation" and "Model Pusher" components.
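
One plausible way the threshold constant is used during model evaluation is sketched below. The comparison rule is an assumption, since the notes do not spell it out, and `should_accept_model` is a hypothetical helper name.

```python
from typing import Optional

MODEL_EVALUATION_CHANGED_THRESHOLD_SCORE: float = 0.02


def should_accept_model(
    new_score: float,
    production_score: Optional[float],
    threshold: float = MODEL_EVALUATION_CHANGED_THRESHOLD_SCORE,
) -> bool:
    """Accept the new model if there is no production model yet,
    or if it beats the production score by more than `threshold`."""
    if production_score is None:
        return True
    return (new_score - production_score) > threshold
```

Under this rule the model pusher would upload to the S3 bucket only when the function returns `True`, so small fluctuations in the score do not replace the production model.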


![image.png](attachment:image.png)

___

### **Prediction Pipeline and App**

14. create a code structure for `prediction_pipeline.py` 
15. setup the `app.py`
16. add **static** and **template** directory to the project

___

### **CI/CD Pipeline**

17. set up the `Dockerfile` and `.dockerignore` file
18. set up the `.github/workflows` dir and the `aws.yaml` file within it
19. go to the AWS management console and:
    - create a new IAM user
    - go inside the user ---> security credentials ---> access keys ---> create access key ---> CLI ---> check agreement ---> next ---> create access key ---> download csv
    - **IGNORE IF CREATED ALREADY**
20. now create one **ECR** repo to store the Docker image
    - AWS Console ---> go to ECR ---> region: us-east-1 ---> hit create repository ---> repo name: **vehicleproj** ---> hit create repository ---> copy and keep the URI
21. Now create an EC2 Ubuntu server ---> AWS console ---> EC2 ---> Launch Instance ---> name: vehicledata-machine ---> Image: Ubuntu ---> AMI: Ubuntu Server 24.04 (free tier) ---> Instance type: t2.medium (chargeable, ~3.5 Rs/hr) ---> create new key pair (name: proj1key) ---> allow HTTPS and HTTP traffic ---> storage: 30 GB ---> Launch
    - go to the instance ---> click on "Connect" ---> Connect using EC2 Instance Connect
    - Connect (a terminal will be launched)

22. Open the EC2 terminal and install Docker on the EC2 machine:
    ```bash
    sudo apt-get update -y
    sudo apt-get upgrade -y
    # Required because Docker isn't preinstalled on the EC2 server (check with: docker --version)
    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh get-docker.sh
    # Let the ubuntu user run docker without sudo
    sudo usermod -aG docker ubuntu
    newgrp docker
    ```

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

23. The next step is to connect GitHub with EC2 (self-hosted runner):
    - select your project on GitHub ---> Settings ---> Actions ---> Runners ---> New self-hosted runner
    - select OS (Linux)
    - run all the "Download" commands on the EC2 server, step by step
    - run the first "Configure" command (hit enter instead of setting a runner group; runner name: self-hosted)
    - enter any additional label (hit enter to skip), then the name of the work folder (again hit enter)
    - run the second "Configure" command (`./run.sh`) and the runner will get connected to GitHub
    - to cross-check, go back to GitHub and click on Runners; you will see the runner state as "idle"
    - if you press Ctrl+C on the EC2 server, the runner shuts down; restart it with `./run.sh`

24. Set up your GitHub secrets (GitHub project ---> Settings ---> Secrets and variables ---> Actions ---> New repository secret):
    - AWS_ACCESS_KEY_ID
    - AWS_SECRET_ACCESS_KEY
    - AWS_DEFAULT_REGION
    - ECR_REPO

25. The CI/CD pipeline will be triggered at the next commit and push.
26. Now we need to open port 5000 of our EC2 instance:
    - go to the instance ---> Security ---> Security Groups ---> Edit inbound rules ---> Add rule ---> type: Custom TCP ---> Port range: 5000 ---> 0.0.0.0/0 ---> Save rules
27. Now paste the public IP address followed by `:5000` into the browser's address bar and your app will be launched.
28. You can also run model training on the `/training` route.
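
If the app does not load in the browser, a quick way to tell a closed port from an application error is to test the TCP connection directly. The helper below is a small sketch using only the standard library; the function name `is_port_open` is made up for the example.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open("<public-ip>", 5000)` with the instance's public IP address should return `True` once the inbound rule is saved and the container is running.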

___


### **Running the self-hosted runner as a service**

When started with `./run.sh`, the self-hosted runner runs in the foreground of the terminal. It therefore stops whenever the terminal session is closed or the EC2 instance restarts. To keep it running persistently in the background and have it start automatically, set it up as a service.

The steps below do that on the EC2 instance:

### 1. Navigate to the Runner Directory

First, make sure you’re in the `actions-runner` directory on your EC2 instance:

```bash
cd ~/actions-runner
```

### 2. Set Up the Runner as a Service

To run the self-hosted GitHub runner as a background service, GitHub provides a built-in script to install it as a service.

Run the following command in the `actions-runner` directory:

```bash
sudo ./svc.sh install
```

This command will install the runner as a service, making it easier to start, stop, or restart it.

### 3. Start the Service

After installing the service, start it using the following command:

```bash
sudo ./svc.sh start
```

### 4. Check the Status

You can check the status of the runner service to ensure it’s running:

```bash
sudo ./svc.sh status
```

### 5. Enable the Service to Start on Boot (Optional)

To ensure that the runner service starts automatically when the EC2 instance reboots, you can use the following command:

```bash
sudo systemctl enable actions.runner.<your-runner-name>.service
```

### Additional Notes

- **Replace `<your-runner-name>`**: When checking the status or enabling the service, replace `<your-runner-name>` with the specific name of your runner service. You can find the name of the service by running `systemctl list-units | grep actions.runner`.

- **Re-register the Runner**: If you’re still experiencing issues, there may have been an issue during setup. You might need to re-register the runner. To do this, stop the service, remove the runner, and start the configuration process again with a new registration token from your GitHub repository's settings.

### Example Commands

```bash
# Stop and remove the service (if needed)
sudo ./svc.sh stop
sudo ./svc.sh uninstall

# Re-configure the runner
./config.sh --url https://github.com/vikashishere/MLOps-Test-Proj --token <NEW_TOKEN>

# Re-install and start the service
sudo ./svc.sh install
sudo ./svc.sh start
```

After following these steps, the runner should be persistently running as a background service on your EC2 instance and ready to accept jobs.

___

![image.png](attachment:image.png)

___
