## Uploading CSV Data to MongoDB

In this project, we upload CSV data to MongoDB. The data is located in the `data` folder, which is organized roughly as follows:

```
.
└── data/
    ├── class_A/
    │   ├── gyro/
    │   │   ├── class_A_01.csv
    │   │   └── class_A_02.csv
    │   └── accelerometer/
    │       ├── class_A_01.csv
    │       └── class_A_02.csv
    ├── class_B/
    │   ├── gyro/
    │   │   ├── class_B_01.csv
    │   │   └── class_B_02.csv
    │   └── accelerometer/
    │       ├── class_B_01.csv
    │       └── class_B_02.csv
    └── class ...
```

To upload this data to MongoDB, we will write a Python script that reads in each CSV file, transforms the data as necessary, and uploads it to the database. We will use the PyMongo library to interact with the database.

Here are the high-level steps we will take:

1. Iterate through the class folders in the `data` directory.
2. For each class folder, iterate through the `gyro` and `accelerometer` subfolders.
3. Read each CSV file into a DataFrame with pandas.
4. Transform the data as necessary, such as converting timestamps or normalizing values.
5. Upload the data to the corresponding MongoDB collection using PyMongo.

By following these steps, we will be able to upload all of our CSV data to MongoDB and prepare it for further analysis.
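
Steps 1 and 2 can be sketched with `pathlib`. This is only an illustration under stated assumptions: the helper name `paired_sensor_files` is hypothetical, and the subfolder names are parameters because they depend on how the recordings were exported (the tree above shows `accelerometer`, while the upload loop later in this notebook reads from `accel`). Pairing by file name assumes that a gyroscope file and its accelerometer counterpart share the same name, as they do in the tree above.

```python
from pathlib import Path


def paired_sensor_files(data_dir, gyro_dir="gyro", accel_dir="accelerometer"):
    """Yield (label, gyro_csv, accel_csv) for files present in both subfolders.

    Hypothetical helper: the folder names are assumptions, pass your own.
    """
    class_dirs = sorted(p for p in Path(data_dir).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        gyro_files = {f.name: f for f in (class_dir / gyro_dir).glob("*.csv")}
        accel_files = {f.name: f for f in (class_dir / accel_dir).glob("*.csv")}
        # Only names found in both folders form a complete recording.
        for name in sorted(gyro_files.keys() & accel_files.keys()):
            yield class_dir.name, gyro_files[name], accel_files[name]
```

Matching on the file name rather than on list position means a missing or extra file in one subfolder cannot silently shift the pairs.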


## Load configuration

In [1]:
# import library for yaml handling
import yaml
import os

In [2]:
config_path = "config.yml"

with open(config_path) as file:
    # safe_load only builds plain Python types, which is all this config needs
    config = yaml.safe_load(file)
print(config)

{'client': 'mongodb://localhost:27017/', 'db': 'aiot_course', 'col': 'sensor_readings', 'order': ['x-axis (g)', 'y-axis (g)', 'z-axis (g)', 'x-axis (deg/s)', 'y-axis (deg/s)', 'z-axis (deg/s)'], 'rename': ['acc_x', 'acc_y', 'acc_z', 'gyr_x', 'gyr_y', 'gyr_z'], 'data_path': 'PATH TO THE DATASET', 'single_instance_path': 'PATH TO INSTANCE', 'sliding_window': {'ws': 30, 'overlap': 15, 'w_type': 'hann', 'w_center': True, 'print_stats': False}, 'x_number': 2, 'filter': {'order': 5, 'wn': 0.1, 'type': 'lowpass'}, 'PCA': {'n_comp': None}, 'classifier': {'SVC': {'C': None, 'kernel': 'rbf', 'gamma': None}}, 'fine_tune': {'param_grid': [{'C': [1, 10, 100, 1000], 'kernel': ['linear']}, {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']}], 'cv': 5, 'verbose': 1}, 'fit': {'epochs': None, 'batch': None, 'verbose': 'auto'}}
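
Before connecting, it can help to confirm that the loaded configuration contains what the upload needs. The sketch below is not part of the project code: `check_config` and `REQUIRED_KEYS` are hypothetical names, and the choice of required keys is an assumption based on the keys this notebook actually reads (`client`, `db`, `col`, `order`, `rename`).

```python
REQUIRED_KEYS = ("client", "db", "col", "order", "rename")


def check_config(config):
    """Validate the loaded config and return the column-renaming map.

    Hypothetical helper; raises ValueError if the upload cannot proceed.
    """
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing configuration keys: {missing}")
    if len(config["order"]) != len(config["rename"]):
        raise ValueError("'order' and 'rename' must have the same length")
    # Map each source CSV column to the name used in the stored document.
    return dict(zip(config["order"], config["rename"]))
```

With the configuration printed above, `check_config(config)` would map `'x-axis (g)'` to `'acc_x'`, `'x-axis (deg/s)'` to `'gyr_x'`, and so on for the remaining axes.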


## MongoDB database instantiation

The relevant information for the MongoDB client connection, the database name, and collection name is located in the configuration file.

```
# DB Connection with the uri (host)
client: "mongodb://localhost:27017/"

# db name
db: "aiot_course"

# db collection
col: "NAME YOUR COLLECTION"
```

In [3]:
# import library for handling the MongoDB client
import pymongo
# import library for retrieving datetime
from datetime import datetime

### Create the database

To create a database in MongoDB, start by creating a `MongoClient` object with a connection URL that points to the correct IP address, then select the database by the name you want it to have.

MongoDB will create the database if it does not exist, and make a connection to it.

In [4]:
client = pymongo.MongoClient(config["client"])

In [5]:
db = client[config["db"]]

### Instantiate the collection

To create a collection in MongoDB, use the database object and specify the name of the collection you want to create.

MongoDB will create the collection if it does not exist.

In [6]:
col = db[config["col"]]

Note that MongoDB creates the collection lazily: it does not appear in the database until the first document has been inserted.
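
This lazy creation can be checked from Python. The sketch below uses PyMongo's `Database.list_collection_names()`; the helper name `collection_exists` is hypothetical, and it takes the database object as a parameter so it works with the `db` created above.

```python
def collection_exists(db, name):
    """Return True if the server already lists a collection with this name.

    Hypothetical helper; `db` is expected to be a pymongo Database object.
    """
    return name in db.list_collection_names()
```

On a fresh database, `collection_exists(db, config["col"])` should stay `False` until the first `insert_one` call has run.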

## Create the data collection

Next, upload the gathered data to the MongoDB collection. The data directory structure should be as follows:

```
.
└── data/
    ├── class_A/
    │   ├── data_A_01.csv
    │   ├── data_A_02.csv
    │   └── ...
    ├── class_B/
    │   ├── data_B_01.csv
    │   ├── data_B_02.csv
    │   └── ...
    └── class ...
```

In [7]:
# import libraries for handling the CSV data and transformations
import pandas as pd
import json

Get data path:

In [8]:
data_path = "../data"
print(data_path)

../data


List the class folders in the data path:

In [9]:
classes_folders_list = [f for f in os.listdir(data_path) if os.path.isdir(os.path.join(data_path, f))]
print(classes_folders_list)

['α', 'β', 'γ', 'δ']


In [10]:
# print files in folder
folder_path = os.path.join(data_path, classes_folders_list[0])
files_in_folder = [f for f in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, f))]
print(files_in_folder)

[]


Each document in the MongoDB database should have the following schema:

```json
{
  "data": {
    "acc_x": ["array", "of", "values"],
    "acc_y": ["array", "of", "values"],
    "acc_z": ["array", "of", "values"],
    "gyr_x": ["array", "of", "values"],
    "gyr_y": ["array", "of", "values"],
    "gyr_z": ["array", "of", "values"]
  },
  "label": "The label of the instance",
  "datetime": "MongoDB datetime object (it can be generated with the datetime.datetime.now() function)"
}
```

Depending on whether you use only the gyroscope or both the accelerometer and the gyroscope, the sensor keys must be named and ordered as follows:

* for gyroscope: `gyr_x`, `gyr_y`, `gyr_z` for the three axes
* for accelerometer and gyroscope: `acc_x`, `acc_y`, `acc_z`, `gyr_x`, `gyr_y`, `gyr_z` for the six axes

**Note: Every document must follow the schema above. The later steps (data engineering, plotting, etc.) rely on it.**
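
Since the later steps depend on this schema, a document can be checked before it is inserted. The following is a minimal sketch, not part of the project's `utils` module: `schema_problems` is a hypothetical helper, and two of its checks go beyond what the schema states explicitly (that the label is a string, and that all sensor arrays have the same length).

```python
from datetime import datetime

ACC_KEYS = ["acc_x", "acc_y", "acc_z"]
GYR_KEYS = ["gyr_x", "gyr_y", "gyr_z"]


def schema_problems(doc):
    """Return a list of schema violations for one document (empty if none)."""
    data = doc.get("data")
    if not isinstance(data, dict):
        return ["'data' must be a dictionary of sensor arrays"]
    problems = []
    # Dicts keep insertion order, so this checks both naming and ordering.
    if list(data) not in (GYR_KEYS, ACC_KEYS + GYR_KEYS):
        problems.append(f"unexpected sensor keys or order: {list(data)}")
    if not all(isinstance(values, list) for values in data.values()):
        problems.append("every sensor value must be a list")
    elif len({len(values) for values in data.values()}) > 1:
        # Assumption: all axes come from the same rows, so lengths must match.
        problems.append("sensor arrays differ in length")
    if not isinstance(doc.get("label"), str):
        # Assumption: labels are plain strings such as the class folder name.
        problems.append("'label' must be a string")
    if not isinstance(doc.get("datetime"), datetime):
        problems.append("'datetime' must be a datetime object")
    return problems
```

A document would then only be inserted when `schema_problems(doc)` returns an empty list.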

In [11]:
from utils import df_rebase, df_rebase_acc

## Provide the code to upload the data to MongoDB

In [12]:
for item in classes_folders_list:
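    gyro_path = os.path.join(data_path, item, "gyro")
    accel_path = os.path.join(data_path, item, "accel")

    # os.listdir() returns names in arbitrary order, so sort both listings
    # to make the gyro/accel pairing in zip() below deterministic
    gyro_files = sorted(f for f in os.listdir(gyro_path) if os.path.isfile(os.path.join(gyro_path, f)))
    accel_files = sorted(f for f in os.listdir(accel_path) if os.path.isfile(os.path.join(accel_path, f)))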

    # for accel_file in accel_files:
    #     accel_df = pd.read_csv(accel_path + "/" + accel_file)
    #     final_df = accel_df[config["order"]]
    #     final_df = final_df.rename(columns= dict(zip(config["order"], config["rename"])))  # rename the columns

    #     data_dict = final_df.to_dict('list')
    #     data_dict = {k: v for k, v in data_dict.items()}
    #     doc = {
    #         'data': data_dict,
    #         'label': item,
    #         'datetime': accel_df['time (03:00)'][0].replace("T", " ")
    #     }
    #     col.insert_one(doc)


    for gyro_file, accel_file in zip(gyro_files, accel_files):
        gyro_df = pd.read_csv(os.path.join(gyro_path, gyro_file))
        accel_df = pd.read_csv(os.path.join(accel_path, accel_file))

        # combine both sensor frames with the project's df_rebase helper
        final_df = df_rebase(accel_df, gyro_df, config["order"], config["rename"])


        # print(final_df.columns)

        # convert the frame to {column name: list of values}
        data_dict = final_df.to_dict('list')

        doc = {
            'data': data_dict,
            'label': item,
            'datetime': gyro_df['time (03:00)'][0].replace("T", " ")
        }

        # print doc as json
        # print(json.dumps(doc, indent=4))


        # insert data to MongoDB
        col.insert_one(doc)
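
The schema asks for a datetime object, while the loop above stores the timestamp as a string. A minimal sketch of converting it first is shown below. It assumes that the `time (03:00)` column holds ISO 8601 strings; the sample value in the docstring is invented for illustration, and `parse_timestamp` is a hypothetical helper name.

```python
from datetime import datetime


def parse_timestamp(raw):
    """Convert an ISO 8601 string, e.g. '2023-03-15T12:34:56.789', to datetime.

    Assumption: the CSV timestamps follow this format; adjust if they do not.
    """
    return datetime.fromisoformat(raw)
```

Under that assumption, the document could use `'datetime': parse_timestamp(gyro_df['time (03:00)'][0])`, which PyMongo stores as a native date value rather than as text.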