# Data load

#### 0. Imports

In [2]:
import pandas as pd
import sys
sys.path.append("..")

from src.support.data_extraction import TickersFetcher
from src.support.data_transformation import TickerExtender, TechnicalIndicators
from src.support.data_load import MongoDBHandler

from src.support.utils import load_tickers_to_include

# 1. Introduction to this notebook

The goal of this notebook is to show the process followed to load data to MongoDB.

## 1.1 Database choice
For this project, the database of choice is MongoDB. It is preferred over relational databases because the project is still in the prototyping phase, where the schema flexibility of NoSQL is more useful. Among NoSQL providers, MongoDB Atlas is chosen for its free tier and easy API integration.

## 1.2 Data subject to upload
Given the nature of this project, several kinds of data could be uploaded to the database:
1. Transformed data. Typical ETL applies here.
2. Predictions data. For predictions to be tracked and also used by other services.
3. Trade operations data. For monitoring of the trading activity.

A caveat applies to case 1 (transformed data): although the load step of the ETL is operational, it is deactivated in practice because of the 0.5 GB storage limit of the MongoDB Atlas free tier. Loading this data is not even necessary, except for dataset versioning, for which a feature store (out of scope for this project) would be more appropriate. API availability and the speed of the transformation mean the data can be regenerated at any time in about 20 seconds by running the ETL code. Given that the data would be used for prediction just once a week, or at most once a day, the costs of storing gigabytes of data outweigh the benefits.
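The storage constraint can be sanity-checked with a rough estimate. The sketch below is illustrative only: the function names `estimate_size_bytes` and `fits_free_tier`, the sample record, and the row count are hypothetical and not taken from the project. JSON-encoded length is used as a crude proxy for stored document size; the real BSON size and index overhead in MongoDB will differ.

```python
import json

# 0.5 GB free-tier limit mentioned in the text above
FREE_TIER_LIMIT_BYTES = int(0.5 * 1024**3)


def estimate_size_bytes(sample_records, total_rows):
    """Extrapolate a total size from the average JSON-encoded size of a sample."""
    if not sample_records:
        raise ValueError("sample_records must not be empty")
    sizes = [len(json.dumps(record).encode("utf-8")) for record in sample_records]
    average_size = sum(sizes) / len(sizes)
    return int(average_size * total_rows)


def fits_free_tier(sample_records, total_rows, limit_bytes=FREE_TIER_LIMIT_BYTES):
    """Return True if the estimated dataset size stays within the storage limit."""
    return estimate_size_bytes(sample_records, total_rows) <= limit_bytes


# Hypothetical record shape and row count, for illustration only
sample = [{"ticker": "AAA", "date": "2024-01-02", "close": 101.5, "rsi_14": 55.2}]
print(fits_free_tier(sample, total_rows=10_000_000))
```

An estimate like this only indicates the order of magnitude; it says nothing about how many rows the actual ETL produces.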

# 2. Data load process

Data load is handled by `src/support/data_load.py`, where the `MongoDBHandler` class defines the methods for uploading data to MongoDB.

The scripts that use these methods are:
- `src/data_etl_pipeline.py`. The load code is commented out as explained above.
- `src/deployment/predict.py`. Loads the predictions output by the model into the 'stocks_predictions' collection.
- `src/trading/trading.py`. Interacts with the database to monitor trades in the 'trades' collection.
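
The interface of `MongoDBHandler` is not shown in this notebook, so the following is only a minimal sketch of what loading predictions into the 'stocks_predictions' collection might look like with `pymongo`. The helpers `get_collection` and `load_predictions`, the `loaded_at` field, and the record fields in the usage comment are assumptions made for illustration, not the project's actual code.

```python
from datetime import datetime, timezone


def get_collection(uri, db_name, collection_name):
    """Return a pymongo collection handle (requires the pymongo package)."""
    # Imported lazily so the rest of the sketch runs without pymongo installed
    from pymongo import MongoClient

    client = MongoClient(uri)
    return client[db_name][collection_name]


def load_predictions(collection, predictions):
    """Insert prediction records, stamping each with a load time.

    `collection` is any object exposing pymongo's `insert_many` method.
    Returns the number of documents inserted.
    """
    if not predictions:
        # insert_many rejects an empty list, so skip the call entirely
        return 0
    loaded_at = datetime.now(timezone.utc)
    documents = [{**record, "loaded_at": loaded_at} for record in predictions]
    result = collection.insert_many(documents)
    return len(result.inserted_ids)


# Hypothetical usage (connection string, database name and fields are placeholders):
# collection = get_collection("<Atlas connection string>", "<database name>", "stocks_predictions")
# load_predictions(collection, [{"ticker": "AAA", "prediction": 0.01}])
```

Passing the collection in as an argument keeps the insert logic independent of the connection setup, which makes it easy to exercise with a stand-in object before pointing it at a real Atlas cluster.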