# Merging Exercise Recordings with User Data

This notebook loads two JSON files:

1. **Exercise Recordings** (`exercise_recordings_2025-04-13_12-38-11.json`): Contains exercise sessions including fields such as `userEmail`, `exerciseName`, heart rate data, and accelerometer data.
2. **Users** (`users_2025-04-13_12-38-11.json`): Contains user information, including a unique `email` field.

The objective is to combine these two datasets with an inner merge, matching `userEmail` in the exercise recordings against `email` in the user records. This enriches each exercise recording with the details of the user who performed it.
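
As a minimal sketch of what this inner merge does, the toy frames below stand in for the two files. All values, and the `name` column, are made up for illustration and are not taken from the real data.

```python
import pandas as pd

# Toy stand-ins for the two datasets (all values are illustrative assumptions).
toy_recordings = pd.DataFrame({
    "userEmail": ["a@example.com", "a@example.com", "c@example.com"],
    "exerciseName": ["Bicep Curls", "Squats", "Bicep Curls"],
})
toy_users = pd.DataFrame({
    "email": ["a@example.com", "b@example.com"],
    "name": ["Alice", "Bob"],
})

# Inner merge: keep only recordings whose userEmail has a matching email.
toy_merged = pd.merge(
    toy_recordings, toy_users,
    left_on="userEmail", right_on="email", how="inner",
)
print(toy_merged)
```

The recording for `c@example.com` is dropped because no user has that email, and the user `b@example.com` is dropped because it has no recordings.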

In [1]:
import pandas as pd
import json

# Define the file paths
recordings_file = 'exercise_recordings_2025-04-13_12-38-11.json'
users_file = 'users_2025-04-13_12-38-11.json'

# Load the exercise recordings file
# First try newline-delimited JSON (one JSON record per line)
try:
    recordings_df = pd.read_json(recordings_file, lines=True)
except ValueError:
    # Fall back to parsing the file as a single JSON document (e.g. one array)
    recordings_df = pd.read_json(recordings_file)

# Load the users file the same way
try:
    users_df = pd.read_json(users_file, lines=True)
except ValueError:
    users_df = pd.read_json(users_file)

print('Exercise Recordings DataFrame:')
print(recordings_df.head())

print('\nUsers DataFrame:')
print(users_df.head())

Exercise Recordings DataFrame:
                                    _id  userEmail exerciseName  channel  \
0  {'$oid': '67cb210abc0d51d7519accb2'}          0  Bicep Curls        2   
1  {'$oid': '67cb223fbc0d51d7519accb6'}          0  Bicep Curls        1   
2  {'$oid': '67cecf44c4f5e290fedf2cfd'}          1  Bicep Curls        2   
3  {'$oid': '67ced05ac4f5e290fedf2cff'}          1  Bicep Curls        3   
4  {'$oid': '67ced159c4f5e290fedf2d03'}          1  Bicep Curls        3   

                                   hrDuringRecording  \
0  [{'time': 1741365504085.0, 'hr': 78}, {'time':...   
1  [{'time': 1741365582699.0, 'hr': 77}, {'time':...   
2  [{'time': 1741606485373.0, 'hr': 79}, {'time':...   
3  [{'time': 1741606762751.0002, 'hr': 79}, {'tim...   
4  [{'time': 1741607017764.0, 'hr': 79}, {'time':...   

                                     hrPostRecording  \
0                                                 []   
1  [{'time': 1741365664413.9998, 'hr': 77}, {'tim...   
2  [{'t
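
The output above shows that `_id` holds `{'$oid': ...}` dictionaries and that the heart rate columns hold lists of `{'time', 'hr'}` dictionaries. As a hedged sketch of how such nested columns could be flattened, the example below uses two made-up rows; the names `sample`, `recording_id`, `mean_hr`, and the helper function are hypothetical and do not appear in the notebook.

```python
import pandas as pd

# Two made-up rows shaped like the output above (values are illustrative).
sample = pd.DataFrame({
    "_id": [{"$oid": "aaa"}, {"$oid": "bbb"}],
    "hrDuringRecording": [
        [{"time": 1000.0, "hr": 78}, {"time": 2000.0, "hr": 80}],
        [],  # a recording with no heart-rate samples
    ],
})

def mean_hr(samples):
    """Return the mean of the 'hr' values, or None for an empty list."""
    values = [s["hr"] for s in samples]
    return sum(values) / len(values) if values else None

# Pull the id string out of the {'$oid': ...} wrapper.
sample["recording_id"] = sample["_id"].apply(lambda d: d["$oid"])
# Reduce each list of samples to a single summary number.
sample["mean_hr"] = sample["hrDuringRecording"].apply(mean_hr)

print(sample[["recording_id", "mean_hr"]])
```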

## Merging the DataFrames

We merge the recordings DataFrame with the users DataFrame where `recordings_df.userEmail` equals `users_df.email`. Because this is an inner join, only recordings with a matching user appear in the final output; recordings without a matching user, and users without recordings, are dropped.

In [None]:
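# Cast both key columns to strings so the join keys share a dtype
recordings_df['userEmail'] = recordings_df['userEmail'].astype(str)
users_df['email'] = users_df['email'].astype(str)

# Inner join: keep only recordings whose userEmail matches a user's email
merged_df = pd.merge(recordings_df, users_df, left_on='userEmail', right_on='email', how='inner')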

print('Merged DataFrame:')
print(merged_df.head())

SyntaxError: unmatched ')' (1131653129.py, line 4)
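
Since an inner join drops unmatched rows silently, it can help to check how many recordings would be lost. The sketch below is one possible check on toy data, not part of the original notebook: it uses the `indicator` and `validate` parameters of `pd.merge`, and the frames and the `age` column are illustrative assumptions.

```python
import pandas as pd

# Toy stand-ins for recordings_df and users_df (values are illustrative).
toy_recordings = pd.DataFrame({
    "userEmail": ["0", "0", "1", "7"],
    "exerciseName": ["Bicep Curls"] * 4,
})
toy_users = pd.DataFrame({"email": ["0", "1", "2"], "age": [30, 41, 25]})

# A left merge with indicator=True adds a `_merge` column showing whether each
# recording found a user; validate raises MergeError if `email` is not unique.
check = pd.merge(
    toy_recordings, toy_users,
    left_on="userEmail", right_on="email",
    how="left", indicator=True, validate="many_to_one",
)

# Recordings that an inner join would silently drop.
unmatched = check[check["_merge"] == "left_only"]
print(unmatched[["userEmail", "exerciseName"]])
```

Here only the recording with `userEmail` equal to `"7"` is reported, because no toy user has that key.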

## Saving or Exporting the Merged Data

If further analysis is needed, you can save the merged DataFrame to a new JSON or CSV file.

In [None]:
# Export the merged DataFrame to a CSV file
merged_df.to_csv('/mnt/data/merged_output.csv', index=False)

# Alternatively, save as JSON (newline delimited)
merged_df.to_json('/mnt/data/merged_output.json', orient='records', lines=True)
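
To confirm that a newline-delimited export can be read back, a round trip on a small frame is a quick check. This is a sketch under stated assumptions: the toy frame stands in for `merged_df`, and a temporary directory replaces the `/mnt/data/` path used above.

```python
import os
import tempfile

import pandas as pd

# Toy frame standing in for merged_df (values are illustrative).
toy = pd.DataFrame({
    "userEmail": ["a@example.com", "b@example.com"],
    "exerciseName": ["Bicep Curls", "Squats"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "merged_output.json")
    # Newline-delimited JSON: one record per line.
    toy.to_json(out_path, orient="records", lines=True)
    # Read the file back with the same lines=True setting.
    restored = pd.read_json(out_path, lines=True)

print(restored)
```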

## Summary

The notebook demonstrates how to read two JSON files (one containing exercise recordings and the other with user details), convert them into pandas DataFrames, and merge them based on matching user identifiers. This provides a consolidated dataset that can be used for further analysis.