# Architecture Document

## LC50 Prediction Project Architecture

### Overview

This project predicts LC50 values (the concentration of a substance that is lethal to 50% of a test population) using QSAR (quantitative structure-activity relationship) models. This document describes the system's components, how they interact, and how data flows from collection through to a served prediction.

### Components

1. **Data Collection and Preprocessing**
   - **Objective:** Collect toxicity data from the ECOTOX database and ECHA (European Chemicals Agency), then clean and preprocess it for model training.
   - **Tools:** Python (pandas for data manipulation, scikit-learn for preprocessing).

2. **Model Training**
   - **Objective:** Train three regression models on the preprocessed data: Random Forest, Support Vector Regression (SVR), and Gradient Boosting Regression (GBR).
   - **Tools:** scikit-learn for model training and optimization.

3. **Ensemble Model**
   - **Objective:** Combine predictions from individual models to improve accuracy.
   - **Technique:** Simple averaging of predictions from Random Forest, SVR, and GBR models.

4. **Flask API**
   - **Objective:** Provide an interface for users to input molecular descriptors and receive LC50 predictions.
   - **Tools:** Flask framework for web application development.

5. **Azure App Service**
   - **Objective:** Host the Flask application for scalable deployment.
   - **Configuration:** An Azure App Service web app, with continuous deployment from GitHub.

6. **Logging and Monitoring**
   - **Objective:** Capture application events and monitor performance.
   - **Tools:** Python's standard `logging` module for application-specific logs; Azure Application Insights for real-time monitoring.
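
The simple-averaging step in component 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `AveragingEnsemble` and `ConstantModel` are hypothetical names, and the only thing assumed about each member model is that it exposes a scikit-learn-style `predict(X)` method returning one value per input row.

```python
from statistics import fmean


class AveragingEnsemble:
    """Average the predictions of several already-fitted regressors."""

    def __init__(self, models):
        if not models:
            raise ValueError("at least one model is required")
        self.models = list(models)

    def predict(self, X):
        # One list of predictions per model, each with len(X) entries.
        per_model = [list(model.predict(X)) for model in self.models]
        # Transpose so each tuple holds every model's prediction for one row,
        # then take the unweighted mean of that tuple.
        return [fmean(row) for row in zip(*per_model)]


class ConstantModel:
    """Stand-in for a fitted RF/SVR/GBR model (illustration only)."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value for _ in X]


if __name__ == "__main__":
    ensemble = AveragingEnsemble(
        [ConstantModel(1.0), ConstantModel(2.0), ConstantModel(6.0)]
    )
    print(ensemble.predict([[0.1, 0.2], [0.3, 0.4]]))  # [3.0, 3.0]
```

Because the class only relies on `predict`, the stand-ins could be swapped for fitted scikit-learn regressors. scikit-learn's `VotingRegressor` provides the same unweighted averaging when the member models are fitted together.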

### Architecture Diagram

```
+-----------------------------------+
|            User Interface         |
|    (HTML/CSS, Flask Templates)    |
+-----------------------------------+
                  |
                  v
+-----------------------------------+
|          Flask Application        |
|         (API Endpoint)            |
+-----------------------------------+
                  |
                  v
+-----------------------------------+
|    Machine Learning Models        |
|     (Random Forest, SVR, GBR)     |
+-----------------------------------+
                  |
                  v
+-----------------------------------+
|    Data Collection & Preprocessing|
|     (ECOTOX Database, ECHA)       |
+-----------------------------------+
```

### Interaction Flow

1. **User Interaction:**
   - User enters molecular descriptors via the User Interface.
   - Submits form data to Flask API endpoint (`/predict`).

2. **Flask API Processing:**
   - Receives form data and sends it to the Machine Learning Models.
   - Retrieves predictions from the Ensemble Model.

3. **Machine Learning Models:**
   - Receive input data.
   - Each model (Random Forest, SVR, GBR) generates predictions independently.

4. **Ensemble Model:**
   - Combines predictions from individual models (RF, SVR, GBR) using averaging.

5. **Data Collection & Preprocessing (offline):**
   - Runs ahead of model training rather than on each prediction request.
   - Collects data from the ECOTOX database and ECHA.
   - Cleans, preprocesses, and prepares data for model training.
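
Steps 1 to 4 of the flow above might look roughly like this in Flask. It is a sketch only: the descriptor names in `FEATURES` are hypothetical (this document does not list the real features), and `StubModel` stands in for fitted models that a real deployment would load at startup.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical descriptor names; the real feature list is not given here.
FEATURES = ["logp", "mol_weight"]


class StubModel:
    """Placeholder for a fitted regressor (illustration only)."""

    def __init__(self, offset):
        self.offset = offset

    def predict(self, X):
        return [sum(row) + self.offset for row in X]


# Stand-ins for the Random Forest, SVR, and GBR models.
MODELS = [StubModel(0.0), StubModel(1.0), StubModel(2.0)]


@app.route("/predict", methods=["POST"])
def predict():
    # Step 1: read the submitted descriptors from the form.
    try:
        row = [float(request.form[name]) for name in FEATURES]
    except (KeyError, ValueError):
        return jsonify(error="missing or non-numeric descriptor"), 400

    # Steps 2-3: each model predicts independently on the same input row.
    predictions = [model.predict([row])[0] for model in MODELS]

    # Step 4: combine the predictions by simple averaging.
    lc50 = sum(predictions) / len(predictions)
    return jsonify(lc50=lc50)
```

The endpoint can be exercised without a running server through Flask's test client, for example `app.test_client().post("/predict", data={"logp": "1.0", "mol_weight": "2.0"})`.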

### Non-Functional Requirements Addressed

- **Performance:** Keep Flask API response times and model prediction latency low.
- **Scalability:** Host on Azure so that capacity can be scaled with demand.
- **Security:** Validate user inputs and reject malformed or malicious data.
- **Monitoring:** Use Azure Application Insights to monitor application health and performance.
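
As one possible reading of the security and monitoring points, the sketch below validates submitted descriptors and logs rejections with Python's standard `logging` module. The function name, the logger name `lc50.api`, and the descriptor names are assumptions made for illustration; the project's real validation rules are not specified in this document.

```python
import logging
import math

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("lc50.api")  # hypothetical logger name

# Hypothetical descriptor names; replace with the model's real features.
REQUIRED_FEATURES = ("logp", "mol_weight")


def validate_descriptors(form):
    """Return (values, errors) for a mapping of submitted form fields."""
    values, errors = {}, []
    for name in REQUIRED_FEATURES:
        raw = form.get(name)
        if raw is None or str(raw).strip() == "":
            errors.append(f"{name}: missing")
            continue
        try:
            number = float(raw)
        except (TypeError, ValueError):
            errors.append(f"{name}: not a number")
            continue
        # float() accepts "nan" and "inf", which a model should never see.
        if not math.isfinite(number):
            errors.append(f"{name}: must be finite")
            continue
        values[name] = number
    if errors:
        logger.warning("rejected input: %s", "; ".join(errors))
    return values, errors
```

Returning the error list instead of raising lets the caller decide how to respond, for instance with an HTTP 400 and the messages in the body.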

### Deployment Strategy

- Deploy the Flask application to Azure App Service.
- Set up continuous deployment from the GitHub repository to Azure.
- Configure environment variables and application settings in the Azure portal.
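
Azure App Service exposes application settings to the running process as environment variables, so the application can read its configuration from `os.environ`. The sketch below shows one way to do that; the variable names `MODEL_DIR`, `LOG_LEVEL`, and `APP_DEBUG`, and their defaults, are hypothetical and not taken from the project.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    model_dir: str
    log_level: str
    debug: bool


def load_settings(env=None):
    """Build Settings from environment variables, with local-development defaults."""
    # Accepting a mapping makes the function easy to test without
    # touching the real process environment.
    env = os.environ if env is None else env
    return Settings(
        model_dir=env.get("MODEL_DIR", "models"),        # hypothetical name
        log_level=env.get("LOG_LEVEL", "INFO").upper(),  # hypothetical name
        debug=env.get("APP_DEBUG", "0") == "1",          # hypothetical name
    )


if __name__ == "__main__":
    print(load_settings())
```

Keeping defaults in code means the same module runs locally without any setup, while values set in the portal take precedence in the deployed app.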

### Conclusion

This document describes the structure, key components, and interactions of the LC50 prediction project. It is intended as a reference for developing, deploying, and maintaining the application.

---