## Uploading CSV Data to MongoDB

In this project, we upload CSV data to MongoDB. The data is located in the `data` folder, which is organized roughly as follows:

```
.
└── data/
    ├── class_A/
    │   ├── gyro/
    │   │   ├── class_A_01.csv
    │   │   └── class_A_02.csv
    │   └── accelerometer/
    │       ├── class_A_01.csv
    │       └── class_A_02.csv
    ├── class_B/
    │   ├── gyro/
    │   │   ├── class_B_01.csv
    │   │   └── class_B_02.csv
    │   └── accelerometer/
    │       ├── class_B_01.csv
    │       └── class_B_02.csv
    └── class ...
```

To upload this data to MongoDB, we will write a Python script that reads in each CSV file, transforms the data as necessary, and uploads it to the database. We will use the PyMongo library to interact with the database.

Here are the high-level steps we will take:

1. Iterate through the class folders in the `data` directory.
2. For each class folder, iterate through the `gyro` and `accelerometer` subfolders.
3. Read in the CSV files using the `pandas` library.
4. Transform the data as necessary, such as converting timestamps or normalizing values.
5. Upload the data to the corresponding MongoDB collection using PyMongo.

By following these steps, we will be able to upload all of our CSV data to MongoDB and prepare it for further analysis.
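
Steps 1 and 2 can be sketched with the standard library alone. The helper below is a hypothetical illustration, not part of the project: the name `paired_sensor_files` and the choice to pair recordings by identical file name across the two sensor folders are assumptions based on the directory tree shown earlier.

```python
from pathlib import Path


def _csv_by_name(folder):
    """Map file name -> path for the CSV files in folder (empty if it is missing)."""
    return {f.name: f for f in folder.glob("*.csv")} if folder.is_dir() else {}


def paired_sensor_files(data_dir, sensors=("gyro", "accelerometer")):
    """Yield (label, {sensor: path}) for recordings present in every sensor folder."""
    class_dirs = sorted(p for p in Path(data_dir).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        per_sensor = {s: _csv_by_name(class_dir / s) for s in sensors}
        # Keep only the file names that exist for every sensor.
        common = set.intersection(*(set(files) for files in per_sensor.values()))
        for name in sorted(common):
            yield class_dir.name, {s: per_sensor[s][name] for s in sensors}
```

Pairing by name rather than by position means a recording that is missing from one sensor folder is skipped instead of shifting every later pair.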


## Load configuration

In [5]:
# import libraries for YAML handling and filesystem access
import yaml
import os

In [6]:
config_path = "config.yml"

# safe_load parses plain YAML without constructing arbitrary Python objects
with open(config_path) as file:
    config = yaml.safe_load(file)

## MongoDB database instantiation

The MongoDB connection URI, the database name, and the collection name are read from the configuration file:

```
# DB Connection with the uri (host)
client: "mongodb://localhost:27017/"

# db name
db: "handwriting_classifier"

# db collection
col: "sensor_readings"
```
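
Before connecting, it can help to confirm that the loaded configuration actually contains these three keys. The following is a minimal sketch; `missing_config_keys` and `REQUIRED_KEYS` are hypothetical names, and the example dictionary stands in for the result of loading `config.yml`.

```python
REQUIRED_KEYS = ("client", "db", "col")


def missing_config_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the loaded config."""
    return [key for key in required if not config.get(key)]


# A dict shaped like the YAML above (stand-in for the loaded config).
example_config = {
    "client": "mongodb://localhost:27017/",
    "db": "handwriting_classifier",
    "col": "sensor_readings",
}

missing = missing_config_keys(example_config)
if missing:
    raise KeyError(f"config.yml is missing: {', '.join(missing)}")
```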

In [7]:
# import library for handling the MongoDB client
import pymongo
# import library for retrieving datetime
from datetime import datetime

### Create the database

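Connect to the MongoDB server and get handles to the database and the collection. MongoDB creates both lazily, when the first document is inserted.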

In [8]:
client = pymongo.MongoClient(config["client"])
db = client[config["db"]]
col = db[config["col"]]

## Create the data collection

Upload the gathered data to the MongoDB collection. The data directory structure should be as follows:

```
.
└── data/
    ├── α/
    │   ├── gyro/
    │   │   ├── α_01.csv
    │   │   └── α_02.csv
    │   └── accelerometer/
    │       ├── α_01.csv
    │       └── α_02.csv
    ├── β/
    │   ├── gyro/
    │   │   ├── β_01.csv
    │   │   └── β_02.csv
    │   └── accelerometer/
    │       ├── β_01.csv
    │       └── β_02.csv
    └── class ...
```

In [9]:
# import library for handling the CSV data and transformations
import pandas as pd

Get data path:

In [10]:
data_path = "../data"
print(data_path)

../data


List the class folders in the data path:

In [11]:
classes_folders_list = [f for f in os.listdir(data_path) if os.path.isdir(os.path.join(data_path, f))]
print(classes_folders_list)

['α', 'β', 'γ', 'δ', 'ε', 'ζ', 'η', 'θ']


Each document in the MongoDB database should have the following schema:

```json
{
  "data": {
    "acc_x": ["array", "of", "values"],
    "acc_y": ["array", "of", "values"],
    "acc_z": ["array", "of", "values"],
    "gyr_x": ["array", "of", "values"],
    "gyr_y": ["array", "of", "values"],
    "gyr_z": ["array", "of", "values"]
  },
  "label": "The label of the instance",
  "datetime": "MongoDB datetime object (it can be generated with the datetime.datetime.now() function)"
}
```
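
As an illustration of this schema, the sketch below assembles one document from six lists of values and checks it first. The helper `build_document` is hypothetical, and the requirement that all six arrays have the same length is an assumption, not something the schema states.

```python
from datetime import datetime

SENSOR_KEYS = ("acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z")


def build_document(data, label, timestamp=None):
    """Assemble one document in the schema above after checking the sensor arrays."""
    missing = [key for key in SENSOR_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing sensor columns: {missing}")
    # Assumption: every axis holds the same number of samples.
    if len({len(data[key]) for key in SENSOR_KEYS}) != 1:
        raise ValueError("sensor arrays must all have the same length")
    return {
        "data": {key: list(data[key]) for key in SENSOR_KEYS},
        "label": label,
        # PyMongo stores Python datetime objects as BSON dates.
        "datetime": timestamp or datetime.now(),
    }
```

Passing a real `datetime` object rather than a formatted string keeps the field usable for date range queries in MongoDB.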

In [11]:
# project-local helper (utils.py) used below to combine the accelerometer and gyroscope frames
from utils import df_rebase

## Upload the data to MongoDB

In [12]:
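for item in classes_folders_list:
    gyro_path = os.path.join(data_path, item, "gyro")
    # the folder name must match the data on disk (the tree above shows "accelerometer")
    accel_path = os.path.join(data_path, item, "accel")

    # os.listdir returns names in arbitrary order, so sort both lists before pairing them with zip
    gyro_files = sorted(f for f in os.listdir(gyro_path) if os.path.isfile(os.path.join(gyro_path, f)))
    accel_files = sorted(f for f in os.listdir(accel_path) if os.path.isfile(os.path.join(accel_path, f)))

    for gyro_file, accel_file in zip(gyro_files, accel_files):
        gyro_df = pd.read_csv(os.path.join(gyro_path, gyro_file))
        accel_df = pd.read_csv(os.path.join(accel_path, accel_file))

        # combine both sensor frames using the column order and renames from the config
        final_df = df_rebase(accel_df, gyro_df, config["order"], config["rename"])

        # one list of values per column, matching the "data" field of the schema
        data_dict = final_df.to_dict('list')

        doc = {
            'data': data_dict,
            'label': item,
            # timestamp of the first gyro sample, stored as a string with "T" replaced by a space
            'datetime': gyro_df['time (03:00)'][0].replace("T", " ")
        }

        # insert data to MongoDB
        col.insert_one(doc)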
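
For larger datasets, one round trip per document can become slow. A possible alternative, sketched here under the assumption that the documents are first collected into a list, is to send them in batches with PyMongo's `insert_many`. The helper name `insert_in_batches` and the default batch size are hypothetical.

```python
def insert_in_batches(collection, docs, batch_size=100):
    """Insert docs via collection.insert_many in chunks; return how many were sent."""
    inserted = 0
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            collection.insert_many(batch)
            inserted += len(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        collection.insert_many(batch)
        inserted += len(batch)
    return inserted
```

In the loop above, this would mean appending each `doc` to a list instead of calling `col.insert_one(doc)`, then calling `insert_in_batches(col, docs)` once at the end.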