In [9]:
!pip install pymongo
!pip install pandas



### Import

This code cell imports necessary Python libraries and modules for the project. It includes libraries for MongoDB interaction (`pymongo`), numerical operations (`numpy`), time-related functions (`time`), system-related operations (`sys`), and data manipulation (`pandas`).

In [10]:
from pymongo import MongoClient
import numpy as np
import time
import sys
import pandas as pd

### MongoDB Connection

This code cell connects to a MongoDB cluster using a connection URI (`mongo_URI`), selects the "sim-bridge" database, and accesses the "PRJ-7" collection within it.

In [11]:
mongo_URI = "mongodb+srv://monitor:kundrovejmamka@xerxes.57jmr.mongodb.net/alfa?retryWrites=true&w=majority"
cluster = MongoClient(mongo_URI)

db = cluster["sim-bridge"]
col = db["PRJ-7"]
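
The connection string above embeds the credentials directly in the notebook. As a minimal sketch, and assuming the URI is instead supplied through an environment variable (the name `MONGO_URI` and the helper `get_mongo_uri` are hypothetical, not part of the original notebook), the lookup could be written as:

```python
import os


def get_mongo_uri(env_var="MONGO_URI"):
    """Return the MongoDB URI stored in the given environment variable.

    The variable name is an assumption made for this illustration.
    """
    uri = os.environ.get(env_var)
    if not uri:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return uri


# Usage with the notebook's existing names (needs pymongo and a reachable cluster):
# cluster = MongoClient(get_mongo_uri())
# db = cluster["sim-bridge"]
# col = db["PRJ-7"]
```

The rest of the notebook would stay unchanged, since only the way `mongo_URI` is obtained differs.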

### Data Retrieval and DataFrame Creation

This code cell retrieves every document from the MongoDB collection; `col.find()` is called without a filter, so no condition on the "measurements" or "meta" fields is applied at this stage. The retrieved documents are appended to a list (`document_list`), and a Pandas DataFrame (`df2`) is then created from this list for further analysis and manipulation.


In [12]:
# Retrieve all documents from the collection (no filter is applied)
documents = col.find()

# Initialize an empty list to store the documents
document_list = []

# Iterate through the cursor and store documents in the list
for document in documents:
    document_list.append(document)

# Create a DataFrame from the list of documents
df2 = pd.DataFrame(document_list)
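
The cell above calls `col.find()` without a filter, so rows lacking "measurements" are removed only later with `dropna`. If server-side filtering were preferred, a query document using MongoDB's `$exists` operator could be passed to `find()`. The sketch below shows such a query and checks it against made-up sample documents with a small local stand-in (`matches_exists` is a hypothetical helper for illustration, not a pymongo function):

```python
# Query selecting only documents that contain a "measurements" field
query = {"measurements": {"$exists": True}}


def matches_exists(document, query):
    """Local stand-in for MongoDB's $exists check on top-level fields."""
    return all(
        (field in document) == condition["$exists"]
        for field, condition in query.items()
    )


# Made-up documents, used only to illustrate the effect of the query
sample_documents = [
    {"_id": 1, "measurements": {"7": {"pv0": 1.0, "pv1": 0.5}}},
    {"_id": 2, "meta": {"uuid": "abc"}},
]

selected = [doc for doc in sample_documents if matches_exists(doc, query)]
print([doc["_id"] for doc in selected])

# Against the real collection, the same query would be passed to find():
# df2 = pd.DataFrame(list(col.find(query)))
```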

## Pickle

### DataFrame Pickling

In this code snippet, `df2` is copied into a new DataFrame `df`. The `to_pickle` method then serializes `df` and saves it as 'df.pickle' so that it can be reloaded later without querying the database again.


In [134]:
df = df2.copy()

# Pickle the DataFrame to a file (e.g., 'df.pickle')
df.to_pickle('df.pickle')

### Loading Pickled DataFrame

In this code snippet, the `pd.read_pickle` function is used to load a previously pickled DataFrame from the 'df.pickle' file.


In [135]:
# Load the pickled DataFrame
df = pd.read_pickle('df.pickle')

In [136]:
import numpy as np

# Extract "uuid" values from dictionaries in the "meta" column
uuids = [entry.get("uuid", None) for entry in df["meta"]]

# Filter out None values and find unique UUIDs
unique_uuids = np.unique([uuid for uuid in uuids if uuid is not None])

# unique_uuids now contains all unique UUIDs in the DataFrame

(unique_uuids)

array(['31357529339808', '44332625541024',
       '8e08d85e-05fe-466a-9810-20018f643c92',
       '902ddc4c-2131-40b8-a0f8-0d7fd033f9c5',
       'aa9fecb1-3220-40f4-9557-8555126eb533'], dtype='<U36')

### Removing Rows with NaN in "measurements" Column

In this operation, rows in the DataFrame are filtered to remove those where the "measurements" column contains NaN (missing) values. This effectively eliminates rows without data in the "measurements" column from the DataFrame.


In [137]:
# Drop rows with NaN values in the "measurements" column
df.dropna(subset=["measurements"], inplace=True)

### Bridge Data Classification

In this code section, the dataset is divided into two separate DataFrames based on the sensors' names associated with each bridge.

1. **Sensor Definitions**: First, the sensors for Bridge M022 and Bridge M023 are defined using predefined lists `sensors_M022` and `sensors_M023`, respectively.

2. **Classification Function**: The `classify_bridge` function is defined to classify each document in the DataFrame into one of the bridges based on the presence of sensors. It checks which sensors are present in the "measurements" column for each document and assigns the document to "M022" or "M023" accordingly. Because the M022 sensors are checked first, a document containing sensors from both lists is assigned to "M022". In cases where sensors don't match either bridge, the document is classified as "Unknown."

3. **Applying Classification**: The classification function is applied to each row in the DataFrame, creating a new column called 'Bridge' that indicates the bridge classification for each document.

4. **Creating Separate DataFrames**: Finally, two separate DataFrames, `df_M022` and `df_M023`, are created by filtering the original DataFrame based on the 'Bridge' column. These DataFrames contain the data specific to Bridge M022 and Bridge M023, respectively.

This code effectively organizes the data into separate DataFrames for analysis and classification based on the sensors associated with each bridge.


In [138]:
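# Inspect the sensor keys of the last two documents (the result is not displayed,
# because this is not the last expression in the cell)
df.iloc[-1]["measurements"].keys(), df.iloc[-2]["measurements"].keys()

# Define the sensors for each bridge
sensors_M022 = ['7', '8', '9', '13', '14', '15', '16', '19', '21', '28']
sensors_M023 = ['4', '5', '6', '10', '11', '12', '17', '18', '20']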

# Function to classify a document to a bridge based on available sensors


def classify_bridge(row):
    sensors_present = list(row["measurements"].keys())
    if any(sensor in sensors_present for sensor in sensors_M022):
        return "M022"
    elif any(sensor in sensors_present for sensor in sensors_M023):
        return "M023"
    else:
        return "Unknown"  # Handle cases where sensors don't match either bridge


# Apply the classification function to create a new 'Bridge' column
df["Bridge"] = df.apply(classify_bridge, axis=1)

# Create separate DataFrames for M022 and M023
df_M022 = df[df["Bridge"] == "M022"]
df_M023 = df[df["Bridge"] == "M023"]
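
To make the order of the checks concrete, the sketch below applies the same rule to a few made-up "measurements" dictionaries. The function `classify_measurements` is a hypothetical variant of `classify_bridge` that takes the dictionary directly instead of a DataFrame row; the sample data is invented for illustration.

```python
sensors_M022 = ['7', '8', '9', '13', '14', '15', '16', '19', '21', '28']
sensors_M023 = ['4', '5', '6', '10', '11', '12', '17', '18', '20']


def classify_measurements(measurements):
    """Apply the same rule as classify_bridge: M022 is tested before M023."""
    sensors_present = list(measurements.keys())
    if any(sensor in sensors_present for sensor in sensors_M022):
        return "M022"
    elif any(sensor in sensors_present for sensor in sensors_M023):
        return "M023"
    return "Unknown"


# Made-up measurement dictionaries (sensor contents omitted for brevity)
samples = {
    "only_M022": {"7": {}, "21": {}},
    "only_M023": {"4": {}, "20": {}},
    "both": {"7": {}, "4": {}},
    "neither": {"99": {}},
}

labels = {name: classify_measurements(m) for name, m in samples.items()}
print(labels)
```

The "both" sample is labelled "M022" because the first matching branch wins, which may matter if a document ever mixes sensors from the two bridges.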

# Data Transformation for Bridge M022 and M023

## Bridge M022
### Create a Copy of df_M022
- A copy of the DataFrame `df_M022` is created to ensure modifications do not affect the original DataFrame.

### Define LVDT Sensors for M022
- The list `sensors_M022` is redefined to contain only the names of the LVDT sensors for Bridge M022; temperature sensor "21" is excluded and handled separately.

### Iterate Through LVDT Sensors (M022)
- The code iterates through the LVDT sensors defined for Bridge M022 (`sensors_M022`).
- For each LVDT sensor, it creates two new columns, `d_{sensor}.pv0` and `d_{sensor}.pv1`, in the DataFrame `df_M022`.
- The values for these columns are extracted from the "measurements" data for each sensor, specifically "pv0" and "pv1".
- If the sensor data is not present in a row, the corresponding columns are filled with `None`.

### Create Columns for Temperature Sensor "21" (M022)
- Two new columns, `t_21.pv1` and `t_21.pv2`, are created in the DataFrame `df_M022` to store data from temperature sensor "21".
- The values for these columns are extracted from the "measurements" data for sensor "21", specifically "pv1" and "pv2".
- If the sensor data is not present in a row, the corresponding columns are filled with `None`.

## Bridge M023
### Create a Copy of df_M023
- A copy of the DataFrame `df_M023` is created to ensure modifications do not affect the original DataFrame.

### Define LVDT Sensors for M023
- The list `sensors_M023` is redefined to contain only the names of the LVDT sensors for Bridge M023; temperature sensor "20" is excluded and handled separately.

### Iterate Through LVDT Sensors (M023)
- The code iterates through the LVDT sensors defined for Bridge M023 (`sensors_M023`).
- For each LVDT sensor, it creates two new columns, `d_{sensor}.pv0` and `d_{sensor}.pv1`, in the DataFrame `df_M023`.
- The values for these columns are extracted from the "measurements" data for each sensor, specifically "pv0" and "pv1".
- If the sensor data is not present in a row, the corresponding columns are filled with `None`.

### Create Columns for Temperature Sensor "20" (M023)
- Two new columns, `t_20.pv1` and `t_20.pv2`, are created in the DataFrame `df_M023` to store data from temperature sensor "20".
- The values for these columns are extracted from the "measurements" data for sensor "20", specifically "pv1" and "pv2".
- If the sensor data is not present in a row, the corresponding columns are filled with `None`.

This code transforms the original DataFrames `df_M022` and `df_M023` by adding columns to store sensor data from LVDT sensors and temperature sensors for both bridges.


In [139]:
# Create a copy of df_M022
df_M022 = df_M022.copy()

# Define the sensors for Bridge M022 (LVDT sensors)
sensors_M022 = ['7', '8', '9', '13', '14', '15', '16', '19', '28']

# Iterate through the LVDT sensors and create columns for pv0 and pv1
for sensor in sensors_M022:
    df_M022[f"d_{sensor}.pv0"] = df_M022.apply(
        lambda row: row["measurements"][sensor]["pv0"] if sensor in row["measurements"] else None, axis=1)
    df_M022[f"d_{sensor}.pv1"] = df_M022.apply(
        lambda row: row["measurements"][sensor]["pv1"] if sensor in row["measurements"] else None, axis=1)

# Create columns for temperature sensor "21" (pv1 and pv2)
sensor = '21'
df_M022[f"t_{sensor}.pv1"] = df_M022.apply(
    lambda row: row["measurements"][sensor]["pv1"] if sensor in row["measurements"] else None, axis=1)
df_M022[f"t_{sensor}.pv2"] = df_M022.apply(
    lambda row: row["measurements"][sensor]["pv2"] if sensor in row["measurements"] else None, axis=1)


# Create a copy of df_M023
df_M023 = df_M023.copy()

# Define the sensors for Bridge M023 (LVDT sensors)
sensors_M023 = ['4', '5', '6', '10', '11', '12', '17', '18']

# Iterate through the LVDT sensors and create columns for pv0 and pv1
for sensor in sensors_M023:
    df_M023[f"d_{sensor}.pv0"] = df_M023.apply(
        lambda row: row["measurements"][sensor]["pv0"] if sensor in row["measurements"] else None, axis=1)
    df_M023[f"d_{sensor}.pv1"] = df_M023.apply(
        lambda row: row["measurements"][sensor]["pv1"] if sensor in row["measurements"] else None, axis=1)

# Create columns for temperature sensor "20" (pv1 and pv2)
sensor = '20'
df_M023[f"t_{sensor}.pv1"] = df_M023.apply(
    lambda row: row["measurements"][sensor]["pv1"] if sensor in row["measurements"] else None, axis=1)
df_M023[f"t_{sensor}.pv2"] = df_M023.apply(
    lambda row: row["measurements"][sensor]["pv2"] if sensor in row["measurements"] else None, axis=1)
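
The same lookup pattern is repeated in every lambda above. As a sketch only, it could be factored into a small helper; `extract_value` is a hypothetical name and the sample rows are made up for illustration.

```python
def extract_value(measurements, sensor, key):
    """Return measurements[sensor][key], or None when the sensor is absent."""
    if sensor in measurements:
        return measurements[sensor][key]
    return None


# Made-up "measurements" dictionaries for two rows
rows = [
    {"7": {"pv0": 10.0, "pv1": 4.0}, "21": {"pv1": 20.5, "pv2": 19.0}},
    {"21": {"pv1": 21.0, "pv2": 19.5}},  # sensor "7" is missing in this row
]

d_7_pv0 = [extract_value(m, "7", "pv0") for m in rows]
t_21_pv2 = [extract_value(m, "21", "pv2") for m in rows]
print(d_7_pv0, t_21_pv2)

# In the notebook, the helper could replace each lambda, for example:
# df_M022[f"d_{sensor}.pv0"] = df_M022["measurements"].map(
#     lambda m: extract_value(m, sensor, "pv0"))
```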

## Data Transformation for Bridge M022 and M023

### Common Steps for Both Bridges
- Replace the 'time' column with its nested server epoch value (`time['server']['epoch']`).
- Drop the 'measurements' column.

### Bridge M022
- Create a 'battery' column to store battery voltage.
- Drop the 'meta' and 'Bridge' columns.

### Bridge M023
- Create a 'battery' column to store battery voltage.
- Drop the 'meta' and 'Bridge' columns.

These steps simplify time representation, create a 'battery' column, and remove unnecessary columns in both bridges' data.


In [140]:
# Replace the 'time' column with the 'epoch' value for Bridge M022
df_M022['time'] = df_M022['time'].apply(lambda x: x['server']['epoch'])

# Replace the 'time' column with the 'epoch' value for Bridge M023
df_M023['time'] = df_M023['time'].apply(lambda x: x['server']['epoch'])

# Drop the 'measurements' column from both DataFrames
df_M022.drop(columns=['measurements'], inplace=True)
df_M023.drop(columns=['measurements'], inplace=True)

# Create the 'battery' column in Bridge M022
df_M022['battery'] = df_M022['meta'].apply(
    lambda x: x['power']['battery']['V'])

# Create the 'battery' column in Bridge M023
df_M023['battery'] = df_M023['meta'].apply(
    lambda x: x['power']['battery']['V'])

# Drop the 'meta' and 'Bridge' columns from both DataFrames
df_M022.drop(columns=['meta', 'Bridge'], inplace=True)
df_M023.drop(columns=['meta', 'Bridge'], inplace=True)

### LVDT Sensors
- New columns ('d_{sensor}') represent the difference between 'pv0' and 'pv1' for each LVDT sensor in both bridges.

### Temperature Sensors
- New columns ('t_{sensor}') represent the difference between 'pv1' and 'pv2' for each temperature sensor in both bridges.

These transformations simplify sensor data analysis.


In [141]:
# Create new columns for the difference between pv0 and pv1 for each LVDT sensor in Bridge M022
for sensor in ['7', '8', '9', '13', '14', '15', '16', '19', '28']:
    df_M022[f'd_{sensor}'] = df_M022[f'd_{sensor}.pv0'] - \
        df_M022[f'd_{sensor}.pv1']

# Create new columns for the difference between pv0 and pv1 for each LVDT sensor in Bridge M023
for sensor in ['4', '5', '6', '10', '11', '12', '17', '18']:
    df_M023[f'd_{sensor}'] = df_M023[f'd_{sensor}.pv0'] - \
        df_M023[f'd_{sensor}.pv1']
    
# Create new columns for the difference between pv1 and pv2 for each temperature sensor in Bridge M022
for sensor in ['21']:
    df_M022[f't_{sensor}'] = df_M022[f't_{sensor}.pv1'] - \
        df_M022[f't_{sensor}.pv2']

# Create new columns for the difference between pv1 and pv2 for each temperature sensor in Bridge M023
for sensor in ['20']:
    df_M023[f't_{sensor}'] = df_M023[f't_{sensor}.pv1'] - \
        df_M023[f't_{sensor}.pv2']
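
As a small worked example of the subtraction, the sketch below builds a toy frame with made-up values for one LVDT sensor. It also shows that a missing reading in either input column propagates to the difference as NaN.

```python
import pandas as pd

# Made-up readings; the None stands for a row where the sensor was absent
toy = pd.DataFrame({
    "d_7.pv0": [1.5, 2.0, None],
    "d_7.pv1": [0.5, 0.25, 1.0],
})

# Same operation as in the cell above: difference between pv0 and pv1
toy["d_7"] = toy["d_7.pv0"] - toy["d_7.pv1"]
print(toy)
```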

In [142]:
df_M022.columns, df_M023.columns

(Index(['_id', 'time', 'd_7.pv0', 'd_7.pv1', 'd_8.pv0', 'd_8.pv1', 'd_9.pv0',
        'd_9.pv1', 'd_13.pv0', 'd_13.pv1', 'd_14.pv0', 'd_14.pv1', 'd_15.pv0',
        'd_15.pv1', 'd_16.pv0', 'd_16.pv1', 'd_19.pv0', 'd_19.pv1', 'd_28.pv0',
        'd_28.pv1', 't_21.pv1', 't_21.pv2', 'battery', 'd_7', 'd_8', 'd_9',
        'd_13', 'd_14', 'd_15', 'd_16', 'd_19', 'd_28', 't_21'],
       dtype='object'),
 Index(['_id', 'time', 'd_4.pv0', 'd_4.pv1', 'd_5.pv0', 'd_5.pv1', 'd_6.pv0',
        'd_6.pv1', 'd_10.pv0', 'd_10.pv1', 'd_11.pv0', 'd_11.pv1', 'd_12.pv0',
        'd_12.pv1', 'd_17.pv0', 'd_17.pv1', 'd_18.pv0', 'd_18.pv1', 't_20.pv1',
        't_20.pv2', 'battery', 'd_4', 'd_5', 'd_6', 'd_10', 'd_11', 'd_12',
        'd_17', 'd_18', 't_20'],
       dtype='object'))

In [152]:
# Remove rows from df_M022 with time below 1691539200 (2023-08-09 00:00:00 UTC)
df_M022 = df_M022[df_M022['time'] >= 1691539200]

# Remove rows from df_M023 with time below 1691539200 (2023-08-09 00:00:00 UTC)
df_M023 = df_M023[df_M023['time'] >= 1691539200]
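
The threshold is a Unix epoch in seconds. To check which calendar date it corresponds to, it can be converted with the standard library; this is a minimal sketch, and `epoch_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds):
    """Convert a Unix epoch in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


cutoff = epoch_to_utc(1691539200)
print(cutoff.isoformat())  # 2023-08-09T00:00:00+00:00
```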

Export to CSV:

In [153]:
# Export df_M022 to a CSV file
df_M022.to_csv('df_M022.csv', index=False)

# Export df_M023 to a CSV file
df_M023.to_csv('df_M023.csv', index=False)

Pickling (Saving) the DataFrames:

In [149]:
# Save (pickle) df_M022 to a file
df_M022.to_pickle('df_M022.pickle')

# Save (pickle) df_M023 to a file
df_M023.to_pickle('df_M023.pickle')

Loading the Pickled DataFrames:

In [150]:
import pandas as pd

# Load df_M022 from the pickle file
df_M022 = pd.read_pickle('df_M022.pickle')

# Load df_M023 from the pickle file
df_M023 = pd.read_pickle('df_M023.pickle')

In [146]:
df_M023.shape, df_M022.shape

((14682, 30), (9636, 33))

# Visualisation

Loading the Pickled DataFrames:

In [161]:
import pandas as pd

# Load df_M022 from the pickle file
df_M022 = pd.read_pickle('df_M022.pickle')

# Load df_M023 from the pickle file
df_M023 = pd.read_pickle('df_M023.pickle')

In [162]:
import numpy as np

# Convert the 'time' column to pandas datetime if not already for both DataFrames
df_M022['datetime'] = pd.to_datetime(df_M022['time'], unit='s')
df_M023['datetime'] = pd.to_datetime(df_M023['time'], unit='s')

In [163]:
# Check data types of all columns in df_M022
print(df_M022.dtypes)

# Check data types of all columns in df_M023
print(df_M023.dtypes)

_id                 object
time               float64
d_7.pv0            float64
d_7.pv1            float64
d_8.pv0            float64
d_8.pv1            float64
d_9.pv0            float64
d_9.pv1            float64
d_13.pv0           float64
d_13.pv1           float64
d_14.pv0           float64
d_14.pv1           float64
d_15.pv0           float64
d_15.pv1           float64
d_16.pv0           float64
d_16.pv1           float64
d_19.pv0           float64
d_19.pv1           float64
d_28.pv0           float64
d_28.pv1           float64
t_21.pv1           float64
t_21.pv2           float64
battery            float64
d_7                float64
d_8                float64
d_9                float64
d_13               float64
d_14               float64
d_15               float64
d_16               float64
d_19               float64
d_28               float64
t_21               float64
datetime    datetime64[ns]
dtype: object
_id                 object
time               float64
d_4.pv0       

In [165]:

# Resample the data to daily bins and aggregate using the mean
# (replace 'mean' with another aggregation method if needed)
daily_df_M022 = df_M022.resample('D', on='datetime').mean()
daily_df_M023 = df_M023.resample('D', on='datetime').mean()

TypeError: agg function failed [how->mean,dtype->object]
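
The `TypeError` above is consistent with `mean` being applied to a non-numeric column: the `dtypes` output lists `_id` as `object`. As a minimal sketch of one possible workaround, and assuming `_id` is not needed in the daily aggregate, that column can be dropped before resampling. The toy frame below is made up for illustration.

```python
import pandas as pd

# Toy frame with an object-typed identifier column, like '_id' in the notebook
toy = pd.DataFrame({
    "_id": ["a", "b", "c"],
    "datetime": pd.to_datetime(
        ["2023-08-09 00:00", "2023-08-09 12:00", "2023-08-10 06:00"]
    ),
    "d_7": [1.0, 3.0, 5.0],
})

# Drop the non-numeric column, then resample to daily bins and take the mean
daily_toy = toy.drop(columns=["_id"]).resample("D", on="datetime").mean()
print(daily_toy)
```

Applied to the notebook's frames, this would read `df_M022.drop(columns=['_id']).resample('D', on='datetime').mean()`, and likewise for `df_M023`. Recent pandas versions also accept `numeric_only=True` in `mean`, which may be an alternative depending on the installed version.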