
Orestis Antonis Makris
1084516

# MongoDB Handling

After installing the MongoDB server on your machine, you can use this notebook to set up and populate the database.

Specifically, this step uses Python's `pymongo` library to interact with the MongoDB server.

**Important Note: Be sure that the MongoDB server is up and running as a service in the background.**

For example, on macOS, to run MongoDB (i.e. the `mongod` process) as a service, run:

* `brew services start mongodb-community`

To stop `mongod` when it is running as a macOS service, run:

* `brew services stop mongodb-community`

To install MongoDB on your system, follow the instructions here:

* https://www.mongodb.com/docs/manual/administration/install-community/


**Note:** You can modify any of the processes below; however, you have to explain your reasoning.

In [27]:
# import library for various processes with the OS
import os

## Load configuration

In [28]:
# import library for yaml handling
import yaml

In [29]:
config_path = os.path.join(os.getcwd(), "config.yml")

with open(config_path) as file:
    # safe_load is sufficient for a plain key/value configuration file
    config = yaml.safe_load(file)

## MongoDB database instantiation

The MongoDB connection URI, the database name, and the collection name are stored in the configuration file:

```
# DB Connection with the uri (host)
client: "mongodb://localhost:27017/"

# db name
db: "aiot_course"

# db collection
col: "NAME YOUR COLLECTION"
```
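Because the template ships with a placeholder collection name, it can help to check the loaded configuration before connecting. The following is a minimal sketch, assuming `config` is the plain dictionary produced by the YAML loader; `check_config`, `REQUIRED_KEYS`, and `PLACEHOLDER` are illustrative names, not part of the course code.

```python
REQUIRED_KEYS = ("client", "db", "col")  # keys shown in the configuration excerpt
PLACEHOLDER = "NAME YOUR COLLECTION"     # template value that should be replaced


def check_config(config, required=REQUIRED_KEYS):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key in required:
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty key: {key!r}")
    if config.get("col") == PLACEHOLDER:
        problems.append("'col' still holds the template placeholder")
    return problems


# Example with the template values from the excerpt
example = {
    "client": "mongodb://localhost:27017/",
    "db": "aiot_course",
    "col": PLACEHOLDER,
}
print(check_config(example))
```

The upload cell later in the notebook also reads `config["order"]` and `config["rename"]`; these hold lists rather than strings, so they would need a separate check.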

In [30]:
# import library for handling the MongoDB client
import pymongo
# import library for retrieving datetime
from datetime import datetime

### Create the database

To create a database in MongoDB, first create a `MongoClient` object with a connection URI that contains the correct IP address (or hostname) and port, then select the database by name on that client.

MongoDB creates the database if it does not already exist, but only once the first document is stored in it.

In [31]:
client = pymongo.MongoClient(config["client"])

In [32]:
db = client[config["db"]]
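`pymongo.MongoClient` connects lazily, so creating the client object does not by itself confirm that the server is reachable. A minimal sketch of an explicit check follows; `server_is_up` is an illustrative helper name, and the broad `except` is a simplification chosen to keep the sketch free of extra imports.

```python
def server_is_up(client):
    """Return True if the server answers a ping, False otherwise."""
    try:
        # "ping" is a cheap admin command that forces a round trip to the server.
        client.admin.command("ping")
    except Exception:
        # pymongo reports an unreachable server with ConnectionFailure
        # subclasses; catching Exception is a simplification for this sketch.
        return False
    return True


# Usage with the notebook's objects. A short timeout avoids waiting for the
# default 30-second server selection timeout when the service is not running:
# client = pymongo.MongoClient(config["client"], serverSelectionTimeoutMS=2000)
# print(server_is_up(client))
```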

### Instantiate the collection

To create a collection in MongoDB, use the database object and specify the name of the collection you want to create.

MongoDB will create the collection if it does not exist.

In [33]:
col = db[config["col"]]

Note that the collection will not appear in MongoDB until the first document is inserted.
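One way to observe this behaviour is to ask the server which collections it currently lists. The sketch below assumes `db` is the database object created earlier; `collection_exists` is an illustrative helper name.

```python
def collection_exists(db, name):
    """Return True if the server already lists a collection with this name."""
    return name in db.list_collection_names()


# Usage with the notebook's objects:
# print(collection_exists(db, config["col"]))  # before the first insert
# ... insert documents ...
# print(collection_exists(db, config["col"]))  # after the first insert
```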

## Create the data collection

This step uploads the gathered data to the MongoDB collection. The data directory should be structured as follows:

```
.
└── merged_data/
    ├── class_A/
    │   ├── data_A_01.csv
    │   ├── data_A_02.csv
    │   └── ...
    ├── class_B/
    │   ├── data_B_01.csv
    │   ├── data_B_02.csv
    │   └── ...
    └── class ...
```

In [34]:
# import library for handling the csv data and transformations
import pandas as pd

Get data path:

In [35]:
data_path = os.path.join(os.getcwd(), "merged_data")
print(data_path)

c:\Users\orest\Downloads\IoT-Course-AIoT-project_Orestis_Makris_1084516\merged_data


List all class folders in the data path:

In [36]:
classes_folders_list = [f for f in os.listdir(data_path) if os.path.isdir(os.path.join(data_path, f))]
print(classes_folders_list)

['.vscode', 'Anti_Clock_Wise', 'Clock_Wise', 'Gun_Shot', 'Left_Horizontal_Scroll', 'Right_Horizontal_Scroll']


In [37]:
# print the files in the first listed folder
folder_path = os.path.join(data_path, classes_folders_list[0])
files_in_folder = [f for f in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, f))]
print(files_in_folder)

[]
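The empty list above belongs to `.vscode`, the first entry of `classes_folders_list`, which is an editor settings folder rather than a class folder. A minimal sketch of a listing that skips hidden directories follows, assuming that hidden names start with a dot; `list_class_folders` is an illustrative name, and the demo runs on a temporary directory so that it does not depend on the real data.

```python
import os
import tempfile


def list_class_folders(data_path):
    """Return sorted sub-directory names of data_path, skipping hidden ones."""
    return sorted(
        name
        for name in os.listdir(data_path)
        if os.path.isdir(os.path.join(data_path, name)) and not name.startswith(".")
    )


# Self-contained demo on a throwaway directory (folder names are examples).
with tempfile.TemporaryDirectory() as tmp:
    for name in (".vscode", "Clock_Wise", "Gun_Shot"):
        os.mkdir(os.path.join(tmp, name))
    demo_folders = list_class_folders(tmp)
    print(demo_folders)
```

With the notebook's variables, the call would be `list_class_folders(data_path)`.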


Each document in the MongoDB database should have the following schema:

```json
{
  "data": {
    "acc_x": ["array", "of", "values"],
    "acc_y": ["array", "of", "values"],
    "acc_z": ["array", "of", "values"]
  },
  "label": "The label of the instance",
  "datetime": "MongoDB datetime object (it can be generated with the datetime.datetime.now() function)"
}
```

Similarly, if you are using a gyroscope, or both an accelerometer and a gyroscope, name and order the sensor keys as follows:

* for gyroscope: `gyr_x`, `gyr_y`, `gyr_z` for the three axes
* for accelerometer and gyroscope: `acc_x`, `acc_y`, `acc_z`, `gyr_x`, `gyr_y`, `gyr_z` for the six axes

**Note: Every document must follow the schema above, because the later steps (data engineering, plotting, etc.) depend on it.**
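Since the schema is mandatory, a small check before inserting can catch mistakes early. The following is a minimal sketch of such a check, not part of the course code; `schema_problems` and the constant names are illustrative. Note that the `rename` list printed by the upload cell further down lists the gyroscope axes first, which a strict order check would report; passing `check_order=False` compares the key names only.

```python
from datetime import datetime

ACC_KEYS = ["acc_x", "acc_y", "acc_z"]
GYR_KEYS = ["gyr_x", "gyr_y", "gyr_z"]
ALLOWED_LAYOUTS = (ACC_KEYS, GYR_KEYS, ACC_KEYS + GYR_KEYS)


def schema_problems(document, check_order=True):
    """Return a list of schema violations; an empty list means the document conforms."""
    problems = []
    data = document.get("data")
    if not isinstance(data, dict):
        problems.append("'data' must be a dict of sensor arrays")
    else:
        keys = list(data)
        if check_order:
            layout_ok = keys in ALLOWED_LAYOUTS
        else:
            layout_ok = any(sorted(keys) == sorted(layout) for layout in ALLOWED_LAYOUTS)
        if not layout_ok:
            problems.append(f"unexpected sensor keys or order: {keys}")
        if not all(isinstance(values, list) for values in data.values()):
            problems.append("every sensor entry must be a list")
    if not isinstance(document.get("label"), str):
        problems.append("'label' must be a string")
    if not isinstance(document.get("datetime"), datetime):
        problems.append("'datetime' must be a datetime object")
    return problems


# Example document with made-up values
example = {
    "data": {"acc_x": [0.1, 0.2], "acc_y": [0.0, 0.1], "acc_z": [0.9, 1.0]},
    "label": "Clock_Wise",
    "datetime": datetime.now(),
}
print(schema_problems(example))
```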

In [38]:
from utils import df_rebase
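The source of `df_rebase` is not shown in this notebook. Judging only from how the upload cell calls it, `df_rebase(df, config["order"], config["rename"])`, and from the `order` and `rename` lists it prints, the helper appears to select the sensor columns and rename them. The following is a hypothetical plain-Python sketch of that idea, working on a dict of column lists (the shape returned by `df.to_dict(orient='list')`) rather than on a DataFrame; `rebase_columns` is an illustrative name, not the project's function.

```python
def rebase_columns(columns, order, rename):
    """Select the columns named in `order` and rename them pairwise to `rename`."""
    if len(order) != len(rename):
        raise ValueError("order and rename must have the same length")
    missing = [name for name in order if name not in columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    # Dicts preserve insertion order, so the result follows `order`.
    return {new: list(columns[old]) for old, new in zip(order, rename)}


# Two rows taken from the first columns of the sample output in this notebook
raw = {
    "elapsed (s)": [8.55, 8.56],
    "gyro_x": [-28.902, -35.335],
    "accel_x": [-0.191, -0.117],
}
rebased = rebase_columns(raw, ["gyro_x", "accel_x"], ["gyr_x", "acc_x"])
print(rebased)
```

Columns that are not named in `order`, such as `elapsed (s)`, are dropped in this sketch; whether the real helper does the same is an assumption.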

## Provide the code to upload the data to MongoDB

In [39]:
# Process each class folder
for class_folder in classes_folders_list:
    folder_path = os.path.join(data_path, class_folder)
    
    files_in_folder = [f for f in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, f))]
   
    for file_name in files_in_folder:
        file_path = os.path.join(folder_path, file_name)
        df = pd.read_csv(file_path)
        print(df)
        print(config["order"])
        print(config["rename"])
        # Rebase dataframe according to the specified order and rename columns
        df = df_rebase(df, config["order"], config["rename"])
        print(df)
        # Convert dataframe to dictionary for MongoDB
        data = df.to_dict(orient='list')
    
        # Create document according to the specified schema
        label = class_folder
        timestamp = datetime.now()
        document = {"data": data, "label": label, "datetime": timestamp}
        
        # Insert document into MongoDB
        col.insert_one(document)

print("Data upload to MongoDB completed.")

    elapsed (s)  gyro_x   gyro_y  gyro_z  accel_x  accel_y  accel_z
0          8.55 -28.902 -108.293  95.152   -0.191   -0.077    0.877
1          8.56 -35.335 -127.622  90.884   -0.117   -0.128    0.842
2          8.57 -43.049 -130.915  81.250   -0.129   -0.136    0.876
3          8.58 -49.146 -125.366  87.256   -0.252   -0.129    0.947
4          8.59 -51.220 -117.378  96.616   -0.371   -0.115    0.992
..          ...     ...      ...     ...      ...      ...      ...
90         9.45 -11.098  -64.360  -4.726   -0.603   -0.163    0.817
91         9.46 -11.128 -118.841  -2.012   -0.668   -0.183    0.813
92         9.47 -18.537 -179.512   4.024   -0.599   -0.205    0.870
93         9.48 -25.000 -217.866  12.896   -0.448   -0.199    0.891
94         9.49 -24.665 -225.915  21.189   -0.253   -0.153    0.927

[95 rows x 7 columns]
['gyro_x', 'gyro_y', 'gyro_z', 'accel_x', 'accel_y', 'accel_z']
['gyr_x', 'gyr_y', 'gyr_z', 'acc_x', 'acc_y', 'acc_z']
Initial columns: ['elapsed (s)', 'gyro_x',