# MongoDB Handling

After installing the MongoDB server on your machine, you can use this notebook to carry out the initial steps with the database: connecting to the server, creating the database and collection, and uploading the data.

Specifically, in this step, we use Python's `pymongo` library to interact with the MongoDB server.

**Important Note: Make sure that the MongoDB server is up and running as a background service.**

For example, on macOS (with Homebrew), to run MongoDB (i.e. the `mongod` process) as a service, run:

* `brew services start mongodb-community`

To stop a mongod running as a macOS service, use the following command as needed:

* `brew services stop mongodb-community`

To install MongoDB on your system, follow the instructions here:

* https://www.mongodb.com/docs/manual/administration/install-community/

**Note:** You can modify any of the processes below; however, you have to explain your reasoning.

In [1]:
# import library for various processes with the OS
import os

In [2]:
# import library for handling the MongoDB client
import pymongo
# import library for retrieving datetime
from datetime import datetime

## MongoDB database instantiation

The MongoDB connection URI, the database name, and the collection name are stored in the configuration file (`config.yml`):

```
# DB Connection with the uri (host)
client: "mongodb://localhost:27017/"

# db name
db: "aiot_course"

# db collection
col: "NAME YOUR COLLECTION"
```
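Since every cell below reads these three keys, it can help to check the loaded configuration early. A minimal sketch, assuming the file has been parsed into a plain dictionary (e.g. with `yaml.safe_load`); the helper `check_config` and the example collection name `walking` used in the tests are hypothetical.

```python
from __future__ import annotations

# Keys that the rest of the notebook reads from config.yml
REQUIRED_KEYS = ("client", "db", "col")
PLACEHOLDER_COLLECTION = "NAME YOUR COLLECTION"


def check_config(config: dict) -> list[str]:
    """Return a list of problems found in the loaded configuration (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty key: {key!r}")
    # The template value must be replaced with a real collection name
    if config.get("col") == PLACEHOLDER_COLLECTION:
        problems.append("replace the placeholder collection name in config.yml")
    uri = config.get("client")
    if isinstance(uri, str) and uri and not uri.startswith(("mongodb://", "mongodb+srv://")):
        problems.append("client URI should start with 'mongodb://' or 'mongodb+srv://'")
    return problems
```

An empty list means the configuration looks usable; otherwise, each string names one problem to fix in `config.yml`.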

## Load configuration

In [3]:
# import library for yaml handling
import yaml

In [4]:
config_path = os.path.join(os.getcwd(), "config.yml")

# safe_load parses plain YAML data without constructing arbitrary Python objects
with open(config_path) as file:
    config = yaml.safe_load(file)

### Create the database

To create a database in MongoDB, first create a `MongoClient` object with a connection URI that contains the correct IP address (host) and port, then access the database by the name you want to give it.

MongoDB creates the database lazily: `client[...]` returns a handle immediately, but the database itself is only created when the first document is stored in it.

In [5]:
client = pymongo.MongoClient(config["client"])

In [6]:
db = client[config["db"]]

### Instantiate the collection

To create a collection in MongoDB, use the database object and specify the name of the collection you want to create.

Likewise, MongoDB creates the collection only when the first document is inserted into it.

In [7]:
col = db[config["col"]]

Consequently, the collection (and the database) will not be listed in MongoDB until you insert the first document!
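To confirm that the collection has actually been created, you can ask the server for the existing collection names. A minimal sketch with a hypothetical helper `collection_exists`; `db` is assumed to be a `pymongo` `Database` object such as the one created above, and `list_collection_names()` needs a running server.

```python
def collection_exists(db, col_name: str) -> bool:
    """Return True once the collection exists on the server."""
    # list_collection_names() queries the server, so a collection that has
    # only been referenced (db[col_name]) but never written to is absent
    return col_name in db.list_collection_names()
```

It should return `False` right after `col = db[config["col"]]` and `True` after the first successful insert.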

## Create the data collection

Upload the gathered data to the MongoDB collection. The data directory should be structured as follows, with one folder per class:

```
.
└── data/
    ├── class_A/
    │   ├── data_A_01.csv
    │   ├── data_A_02.csv
    │   └── ...
    ├── class_B/
    │   ├── data_B_01.csv
    │   ├── data_B_02.csv
    │   └── ...
    └── class ...
```
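The structure above can be traversed with the standard library alone. A minimal sketch, assuming each class folder name is the label of the recordings it contains; `iter_labelled_csvs` is a hypothetical helper that skips stray files under `data/` and non-CSV files inside the class folders.

```python
import os
from typing import Iterator, Tuple


def iter_labelled_csvs(data_dir: str) -> Iterator[Tuple[str, str]]:
    """Yield (label, csv_path) pairs, using each class folder name as the label."""
    for class_name in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip files placed directly under data/
        for file_name in sorted(os.listdir(class_dir)):
            if file_name.lower().endswith(".csv"):
                yield class_name, os.path.join(class_dir, file_name)
```

For example, `list(iter_labelled_csvs(data_path))` would start with `("class_A", ".../class_A/data_A_01.csv")` for the layout shown above.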

In [8]:
# import libraries for handling the csv data and transformations
import pandas as pd
import json

Get data path:

In [9]:
data_path = os.path.join(os.getcwd(), "data")
print(data_path)

C:\Users\user\miniconda3\envs\myenv\data


List all class folders in the data path:

In [10]:
classes_folders_list = [f for f in os.listdir(data_path) if os.path.isdir(os.path.join(data_path, f))]
print(classes_folders_list)

['class_A', 'class_B']


In [12]:
# print files in folder
folder_path = os.path.join(data_path, classes_folders_list[0])
files_in_folder = [f for f in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, f))]
print(files_in_folder)

['data_A_01.csv', 'data_A_02.csv', 'data_A_03.csv', 'data_A_04.csv', 'data_A_05.csv', 'data_A_06.csv', 'data_A_07.csv', 'data_A_08.csv', 'data_A_09.csv']


Each document in the MongoDB database should have the following schema:

```json
{
  "data": {
    "acc_x": ["array", "of", "values"],
    "acc_y": ["array", "of", "values"],
    "acc_z": ["array", "of", "values"]
  },
  "label": "The label of the instance",
  "datetime": "MongoDB datetime object (it can be generated with the datetime.datetime.now() function)"
}
```

Accordingly, if you are using the gyroscope, or both the accelerometer and the gyroscope, define the sensor keys with the following names and in the following order:

* for gyroscope: `gyr_x`, `gyr_y`, `gyr_z` for the three axes
* for accelerometer and gyroscope: `acc_x`, `acc_y`, `acc_z`, `gyr_x`, `gyr_y`, `gyr_z` for the six axes
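One way to respect the key order above is to build every document from a fixed list of keys per sensor setup. A minimal sketch: the setup names `acc`, `gyr`, and `acc_gyr` and the helper `build_document` are hypothetical. It uses a timezone-aware `datetime.now(timezone.utc)`, which `pymongo` stores as a BSON date (in UTC).

```python
from datetime import datetime, timezone

# Key order per sensor setup, following the naming listed above
SENSOR_KEYS = {
    "acc": ["acc_x", "acc_y", "acc_z"],
    "gyr": ["gyr_x", "gyr_y", "gyr_z"],
    "acc_gyr": ["acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z"],
}


def build_document(label: str, values: dict, setup: str = "acc") -> dict:
    """Assemble one MongoDB document in the required schema."""
    keys = SENSOR_KEYS[setup]  # raises KeyError for an unknown setup name
    missing = [k for k in keys if k not in values]
    if missing:
        raise KeyError(f"missing sensor keys for setup {setup!r}: {missing}")
    return {
        # Dicts keep insertion order, and pymongo encodes fields in that order
        "data": {k: [float(v) for v in values[k]] for k in keys},
        "label": label,
        "datetime": datetime.now(timezone.utc),
    }
```

Whatever order the input dictionary uses, the `data` keys of the returned document follow the prescribed order.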

**Note: Be careful, every document must follow the aforementioned schema, because the later processes (data engineering, plotting, etc.) rely on it.**
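Before uploading, a quick check can catch documents that deviate from this schema. A minimal sketch with a hypothetical `validate_document` helper; the requirement that all axes have the same number of samples is an added assumption, not stated in the schema above.

```python
from datetime import datetime

# Allowed sensor layouts, in the prescribed key order
ALLOWED_KEY_SETS = [
    ["acc_x", "acc_y", "acc_z"],
    ["gyr_x", "gyr_y", "gyr_z"],
    ["acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z"],
]


def validate_document(doc: dict) -> list:
    """Return a list of schema violations for one document (empty if valid)."""
    errors = []
    if set(doc) != {"data", "label", "datetime"}:
        errors.append(f"top-level keys must be data, label, datetime; got {sorted(doc)}")
    data = doc.get("data")
    if not isinstance(data, dict) or list(data) not in ALLOWED_KEY_SETS:
        errors.append("data keys must match one of the allowed sensor layouts, in order")
    elif not all(isinstance(v, list) for v in data.values()):
        errors.append("each axis must be a list of values")
    else:
        # Assumption: every axis holds one value per sample
        lengths = {len(v) for v in data.values()}
        if len(lengths) > 1:
            errors.append(f"all axes must have the same number of samples; got {sorted(lengths)}")
    label = doc.get("label")
    if not isinstance(label, str) or not label:
        errors.append("label must be a non-empty string")
    if not isinstance(doc.get("datetime"), datetime):
        errors.append("datetime must be a datetime.datetime object")
    return errors
```

Running it on each transformed document before `insert_one` keeps malformed documents out of the collection.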

In [13]:
from utilsF import df_rebase

## Provide the code to upload the data to MongoDB

In [14]:
import os

# Get the current working directory
current_path = os.getcwd()

# Print the current working directory
print("Current Path:", current_path)


Current Path: C:\Users\user\miniconda3\envs\myenv


In [15]:
import os

import pandas as pd
import pymongo
import yaml
from pymongo.errors import PyMongoError


def load_config(config_path):
    """Load the YAML configuration (client URI, database and collection names)."""
    with open(config_path, 'r') as file:
        config = yaml.safe_load(file)
    return config


def read_and_transform_data(file_path):
    """Read one CSV recording and map it to the required document schema."""
    data = pd.read_csv(file_path)
    # The class folder name (e.g. "class_A") is the label of the instance
    label = os.path.basename(os.path.dirname(file_path))

    # Timestamp of the first sample; ensure the epoch unit is correct ('ms' for milliseconds, 's' for seconds)
    start_time = pd.to_datetime(data['Epoch'].iloc[0], unit='ms').to_pydatetime()

    # Transpose the accelerometer data into one array per axis
    transformed_data = {
        "data": {
            "acc_x": data['X'].tolist(),
            "acc_y": data['Y'].tolist(),
            "acc_z": data['Z'].tolist()
        },
        "label": label,
        "datetime": start_time  # a single datetime object, as the schema requires
    }
    return transformed_data


def insert_data_to_mongodb(data, collection):
    """Insert one document into the given collection and report its ID."""
    try:
        result = collection.insert_one(data)
        print(f"Data inserted successfully with ID: {result.inserted_id}")
    except PyMongoError as ex:
        print(f"An error occurred: {ex}")


def process_files(directory_path, collection):
    """Transform and upload every CSV file in one class folder."""
    file_paths = [os.path.join(directory_path, f) for f in sorted(os.listdir(directory_path)) if f.endswith('.csv')]
    if not file_paths:
        print("No CSV files found in the directory.")
    for file_path in file_paths:
        print(f"Processing file: {file_path}")
        data = read_and_transform_data(file_path)
        insert_data_to_mongodb(data, collection)


def main():
    config_path = 'config.yml'  # Adjust this path to your actual configuration file
    root_directory_path = 'C:/Users/user/miniconda3/envs/myenv/data'  # Adjust this path to the root data directory
    config = load_config(config_path)

    # Open one client for the whole upload instead of one per document
    client = pymongo.MongoClient(config['client'])
    collection = client[config['db']][config['col']]

    # Process every class folder found in the data directory
    class_directories = sorted(
        d for d in os.listdir(root_directory_path)
        if os.path.isdir(os.path.join(root_directory_path, d))
    )
    try:
        for class_dir in class_directories:
            directory_path = os.path.join(root_directory_path, class_dir)
            process_files(directory_path, collection)
    finally:
        client.close()


if __name__ == "__main__":
    main()


Processing file: C:/Users/user/miniconda3/envs/myenv/data\class_A\data_A_01.csv
Data inserted successfully with ID: 666c3855f1ba8959ad191495
Processing file: C:/Users/user/miniconda3/envs/myenv/data\class_A\data_A_02.csv
Data inserted successfully with ID: 666c3855f1ba8959ad191497
Processing file: C:/Users/user/miniconda3/envs/myenv/data\class_A\data_A_03.csv
Data inserted successfully with ID: 666c3855f1ba8959ad191499
Processing file: C:/Users/user/miniconda3/envs/myenv/data\class_A\data_A_04.csv
Data inserted successfully with ID: 666c3855f1ba8959ad19149b
Processing file: C:/Users/user/miniconda3/envs/myenv/data\class_A\data_A_05.csv
Data inserted successfully with ID: 666c3855f1ba8959ad19149d
Processing file: C:/Users/user/miniconda3/envs/myenv/data\class_A\data_A_06.csv
Data inserted successfully with ID: 666c3855f1ba8959ad19149f
Processing file: C:/Users/user/miniconda3/envs/myenv/data\class_A\data_A_07.csv
Data inserted successfully with ID: 666c3855f1ba8959ad1914a1
Processing fi