# Backend Engineering Take-Home Challenge

Submitted by: Crystal Nguyen

This API lets you trigger an ETL (Extract, Transform, Load) process and retrieve the results from a PostgreSQL (`postgres`) database. It uses Flask as the web framework.

## System Requirements

The application is containerized with Docker and requires a Docker environment with Docker Engine installed. It is built on the following:

- Python 3.10
- Docker base image: `python:3.10-slim`
- PostgreSQL 15.3

## Input Requirements

The application reads three CSV files from the `data` directory:

- `users.csv`: Contains user data with the following columns: `user_id`, `name`, `email`, `signup_date`.

- `user_experiments.csv`: Contains experiment data with the following columns: `experiment_id`, `user_id`, `experiment_compound_ids`, `experiment_run_time`. The `experiment_compound_ids` column contains a semicolon-separated list of compound IDs.


- `compounds.csv`: Contains compound data with the following columns: `compound_id`, `compound_name`, `compound_structure`.
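
As a minimal sketch of how the semicolon-separated `experiment_compound_ids` column could be parsed, the snippet below reads experiment rows with Python's `csv` module. The function names, the sample rows, and the assumption that IDs are integers and run times are numeric are illustrative; they are not taken from the project code.

```python
import csv
import io


def parse_compound_ids(raw: str) -> list[int]:
    """Split a semicolon-separated string such as "1;2;3" into integer IDs."""
    # Assumption: compound IDs are integers; empty fragments are skipped.
    return [int(part) for part in raw.split(";") if part.strip()]


def read_experiments(csv_text: str) -> list[dict]:
    """Parse the content of user_experiments.csv into typed dictionaries."""
    # skipinitialspace tolerates a space after each comma.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = []
    for row in reader:
        rows.append({
            "experiment_id": int(row["experiment_id"]),
            "user_id": int(row["user_id"]),
            "compound_ids": parse_compound_ids(row["experiment_compound_ids"]),
            # Assumption: run time is a plain number (units unspecified).
            "experiment_run_time": float(row["experiment_run_time"]),
        })
    return rows


# Hypothetical sample data, used only to illustrate the format.
SAMPLE = """experiment_id,user_id,experiment_compound_ids,experiment_run_time
1,1,1;2,10
2,1,2;3,20
"""
experiments = read_experiments(SAMPLE)
```
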

## Building and Running the Application

1. Clone the repository: `git clone <repository_url>`
2. Change into the project directory: `cd <project_directory>`
3. Build the Docker image: `docker build -t eikon_app:1.0 .`
4. Run the Docker container: `docker run -d -p 5000:5000 --name eikon_app_1.0_container eikon_app:1.0`
5. Access the application at: `http://localhost:5000`

### Step 1: Build and Run the Docker Container

1. Make sure you have Docker installed on your machine.
2. Open a terminal or command prompt.
3. Navigate to the project directory.

#### Option 1: Using the provided shell script

4. Run the `run_app.sh` shell script to build and run the Docker container:

```bash
./run_app.sh
```

#### Option 2: Manually build and run the Docker container

4. Build the Docker image by running the following command:
```bash
docker build -t eikon_app:1.0 .
```

5. Run the Docker container using the following command:
```bash
docker run -d -p 5000:5000 --name eikon_app_1.0_container eikon_app:1.0
```
   
   

To stop and remove the Docker container:

```bash
docker stop eikon_app_1.0_container
```

```bash
docker rm eikon_app_1.0_container
```

### Step 2: Make a Curl Request to the API Endpoint
1. Open a new terminal or command prompt.

#### Option 1: Using the provided shell script
2. Run the `make_request.sh` shell script to make a curl request to the API endpoint: `./make_request.sh`

#### Option 2: Manually make a curl request
2. Use the following curl command to make a request to the API endpoint:
```bash
curl -s http://localhost:5000/api-endpoint
```

Replace **api-endpoint** with the endpoint you want to access: `trigger-etl` or `etl-results`.

## app.py - API Endpoints
The endpoints are defined in `app.py`. When the application runs locally, the base URL is `http://127.0.0.1:5000`.

### Trigger ETL

- Endpoint: `/trigger-etl`
- Method: GET

This endpoint triggers the ETL process by calling the `etl()` function. It returns a JSON response indicating the status of the ETL process and any relevant messages.

### ETL Results

- Endpoint: `/etl-results`
- Method: GET

This endpoint retrieves the results of the ETL process from the database and returns them as a JSON response.
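
For callers who prefer Python over curl, the two endpoints can be chained with the standard library. This is a sketch under assumptions: the helper names are hypothetical, the base URL assumes the container publishes port 5000 as in the `docker run` command, and the shape of the JSON responses is not specified here, so the code returns them unmodified.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: the container publishes port 5000 on localhost.
BASE_URL = "http://localhost:5000"


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Build the full URL for an endpoint such as "/trigger-etl"."""
    return urljoin(base_url + "/", path.lstrip("/"))


def get_json(path: str, timeout: float = 30.0):
    """Send a GET request and decode the JSON body."""
    with urlopen(endpoint_url(path), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_etl_and_fetch_results():
    """Trigger the ETL process, then fetch its results."""
    status = get_json("/trigger-etl")
    results = get_json("/etl-results")
    return status, results


# Usage (requires the container to be running):
#   status, results = run_etl_and_fetch_results()
#   print(status)
#   print(results)
```
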

### Database Structure

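The Postgres database used by the application stores the ETL results with the following columns:

- `user_id`: Identifier of the user.
- `name`: User's name.
- `email`: User's email address.
- `signup_date`: Date the user signed up.
- `experiment_count`: Total number of experiments the user ran.
- `avg_experiment_run_time`: Average run time of the user's experiments.
- `compound_id`: The compound the user experimented with most often.
- `compound_name`: Name of that compound.
- `compound_structure`: Structure of that compound.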
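
To make the derived columns concrete, here is a rough sketch of how `experiment_count`, `avg_experiment_run_time`, and the most common `compound_id` could be computed per user from already-parsed experiment rows. The function name, the input shape, and the sample rows are assumptions for illustration; the project's `etl()` function may compute these differently.

```python
from collections import Counter, defaultdict


def summarize_users(experiments):
    """Aggregate per-user metrics from experiment rows.

    Each row is assumed to be a dict with the keys "user_id",
    "compound_ids" (a list) and "experiment_run_time" (a number).
    """
    run_times = defaultdict(list)
    compound_counts = defaultdict(Counter)
    for exp in experiments:
        run_times[exp["user_id"]].append(exp["experiment_run_time"])
        compound_counts[exp["user_id"]].update(exp["compound_ids"])

    summary = {}
    for user_id, times in run_times.items():
        # On ties, Counter.most_common keeps the compound seen first.
        most_common = compound_counts[user_id].most_common(1)
        summary[user_id] = {
            "experiment_count": len(times),
            "avg_experiment_run_time": sum(times) / len(times),
            "compound_id": most_common[0][0] if most_common else None,
        }
    return summary


# Hypothetical rows, used only to illustrate the calculation.
rows = [
    {"user_id": 1, "compound_ids": [1, 2], "experiment_run_time": 10.0},
    {"user_id": 1, "compound_ids": [2, 3], "experiment_run_time": 20.0},
    {"user_id": 2, "compound_ids": [3], "experiment_run_time": 30.0},
]
summary = summarize_users(rows)
```

The `compound_name` and `compound_structure` columns would then come from looking up the selected `compound_id` in `compounds.csv`.
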

## Assumptions and Known Issues

The ETL Process API is functional, but it has a few known issues and areas that can be improved in future iterations:

- Issue 1: The error handling mechanism can be further improved to provide more detailed and informative error messages to the API consumers.
- Improvement 1: Currently, the API only supports GET requests for triggering the ETL process and retrieving results. Adding support for other HTTP methods like POST, PUT, or DELETE can provide more flexibility in controlling the ETL process.
- Improvement 2: Enhancing the logging capabilities of the API can help in better monitoring and troubleshooting of the ETL process. Integration with a centralized logging system or implementing log rotation can be considered.
- Improvement 3: Asynchronous or time-based processing. If the ETL process needs to run periodically or under certain conditions, it could be triggered automatically by a scheduled task or job scheduler at predefined intervals or when specific events occur. If it is intended to be user-triggered, an API endpoint (as provided here) or a user interface gives users control over when it runs.
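
One possible shape for the scheduled variant is a background thread that calls the ETL function at a fixed interval. This is only a sketch: `run_periodically` is a hypothetical helper, the task passed in stands in for the project's `etl()` function, and a production setup might use cron or a dedicated job scheduler instead.

```python
import logging
import threading


def run_periodically(task, interval_seconds, stop_event):
    """Call task() immediately, then every interval_seconds, until stop_event is set."""

    def loop():
        while not stop_event.is_set():
            try:
                task()
            except Exception:
                # Keep the loop alive so one failed run does not stop future runs.
                logging.exception("ETL run failed; retrying at the next interval")
            # wait() returns early once stop_event is set.
            stop_event.wait(interval_seconds)

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


# Usage, assuming etl is the function behind /trigger-etl:
#   stop = threading.Event()
#   worker = run_periodically(etl, 3600, stop)  # run hourly
#   ...
#   stop.set()  # request shutdown
```
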

### Assumptions

- The ETL process assumes a specific data source format and schema. Ensure that the data provided for the ETL process adheres to the expected structure and format.
- The API assumes a stable and accessible database connection with the required credentials and permissions. Ensure that the database is properly set up and accessible before running the API.
- Additional assumptions are noted in an email to Aaron.
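
The schema assumption above could be checked before the ETL runs. The sketch below compares each file's header row against the columns listed under Input Requirements; the `missing_columns` helper and the sample header are hypothetical and not part of the project code.

```python
import csv
import io

# Expected columns, as listed in the Input Requirements section.
EXPECTED_COLUMNS = {
    "users.csv": ["user_id", "name", "email", "signup_date"],
    "user_experiments.csv": [
        "experiment_id",
        "user_id",
        "experiment_compound_ids",
        "experiment_run_time",
    ],
    "compounds.csv": ["compound_id", "compound_name", "compound_structure"],
}


def missing_columns(filename: str, csv_text: str) -> list[str]:
    """Return the expected columns that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = [column.strip() for column in next(reader, [])]
    return [name for name in EXPECTED_COLUMNS[filename] if name not in header]


# Hypothetical header that lacks the signup_date column.
problems = missing_columns("users.csv", "user_id, name, email\n")
```

An empty list means the header matches; a non-empty list names the columns to report before any rows are loaded.
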


### Questions

- units
- If you have any questions or need clarifications regarding the functionality, usage, or implementation details of the ETL Process API, please don't hesitate to reach out by opening an issue or starting a discussion in the project repository.
- We welcome feedback, suggestions, and contributions from the community. If you have ideas for enhancements or improvements, or if you encounter any issues, please let us know.

