# DataFrame Merge Pipeline

## Overview
This Jupyter notebook merges multiple CSV files of motion analysis data into a single DataFrame, drops duplicate entries by `video_id`, and exports the result as one consolidated CSV file.

### Key Features
- Merges multiple CSV files into a single DataFrame
- Removes duplicate video entries
- Maintains data consistency
- Exports consolidated results

### Prerequisites
- Python 3 with `pandas` installed; `glob` is part of the standard library


In [None]:
import pandas as pd
import glob



### Process Flow
1. Locate all CSV files in target directory
2. Read and concatenate all files
3. Remove duplicate video entries
4. Reset index for clean output
5. Export to single CSV file
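
The five steps above can be sketched as one reusable function. This is a minimal sketch rather than the notebook's own code: the function name `merge_csv_files` and its parameters (`input_dir`, `output_name`, `key`) are hypothetical, and it assumes every input file has a `video_id` column. It also skips a previous export when listing files, which the steps above do not mention.

```python
from pathlib import Path

import pandas as pd


def merge_csv_files(input_dir, output_name="motion_data.csv", key="video_id"):
    """Merge every CSV in input_dir, drop duplicate keys, and export the result."""
    input_dir = Path(input_dir)
    output_path = input_dir / output_name

    # 1. Locate all CSV files, skipping a previous export if one exists.
    #    Sorting keeps the file order stable between runs.
    csv_files = sorted(p for p in input_dir.glob("*.csv") if p.name != output_name)
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found in {input_dir}")

    # 2. Read and concatenate all files
    merged = pd.concat([pd.read_csv(p) for p in csv_files], ignore_index=True)

    # 3. Remove duplicate entries, keeping the first occurrence of each key
    unique = merged.drop_duplicates(subset=key)

    # 4. Reset the index for clean output
    unique = unique.reset_index(drop=True)

    # 5. Export to a single CSV file
    unique.to_csv(output_path, index=False)
    return unique
```

Calling `merge_csv_files(".")` would run the whole flow against the current directory.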

### Usage
Run the notebook from the directory that contains the input CSV files. It writes a single consolidated `motion_data.csv` file with one row per unique `video_id`.

### Note
- Input: Multiple CSV files with motion analysis data
- Output: Single CSV file (`motion_data.csv`)
- Deduplication is based on the `video_id` column; the first occurrence of each `video_id` is kept
- The columns of the input files are preserved; only duplicate rows are dropped
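
To illustrate the deduplication rule, here is a small sketch on toy data. The `video_id` values and numbers are made up for illustration; only the column names come from this notebook. It shows that `drop_duplicates` keeps the first row it sees for each `video_id` and discards later ones.

```python
import pandas as pd

# Toy data (hypothetical values): "abc" appears twice with different numbers
merged_df = pd.DataFrame(
    {
        "video_id": ["abc", "xyz", "abc"],
        "avg_shots_per_min": [1.5, 2.0, 9.9],
    }
)

# keep="first" is the pandas default, spelled out here for clarity
unique_df = merged_df.drop_duplicates(subset="video_id", keep="first")
unique_df = unique_df.reset_index(drop=True)

# Count how many rows were dropped as duplicates
n_dropped = len(merged_df) - len(unique_df)
print(unique_df)
print(f"Dropped {n_dropped} duplicate row(s)")
```

Because the first occurrence wins, the order in which files are read decides which copy of a duplicated video survives.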

In [None]:
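# Collect all CSV files in the current directory, skipping this pipeline's own
# output so that re-running the notebook does not re-ingest a previous export.
# Sorting fixes the file order, which decides which duplicate row is kept.
csv_files = sorted(f for f in glob.glob("*.csv") if f != "motion_data.csv")

# Read and concatenate all CSV files into one DataFrame
df_list = [pd.read_csv(file) for file in csv_files]
merged_df = pd.concat(df_list, ignore_index=True)

# Remove duplicates based on the 'video_id' column (first occurrence is kept)
unique_df = merged_df.drop_duplicates(subset='video_id')

# Reset the index so rows are numbered consecutively from 0
unique_df = unique_df.reset_index(drop=True)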

In [None]:
# Display the resulting DataFrame
unique_df

Unnamed: 0,video_id,file_name,motion_magnitude_mean,motion_magnitude_std,motion_magnitude_volatility,motion_direction_mean,motion_direction_std,motion_direction_volatility,spatial_entropy_mean,spatial_entropy_std,spatial_entropy_volatility,info_entropy_mean,info_entropy_std,info_entropy_volatility,avg_shots_per_min
0,P1reIZUmmTo,P1reIZUmmTo,1.047124,2.736799,1.941134,3.075574,0.660509,1.088754,5.601186,0.137032,0.142869,0.460189,0.759635,0.586817,0.749980
1,XlDGJ0xoUuQ,XlDGJ0xoUuQ,7.267014,4.326668,6.122454,3.074731,0.485987,0.690260,5.891461,2.260568,2.373351,3.502699,1.166152,1.654355,3.974217
2,vcEcn1h0G18,vcEcn1h0G18,10.160639,3.429074,4.445736,3.177627,0.343019,0.487674,6.255020,0.867853,1.254895,4.281098,0.754295,0.898561,38.512266
3,UbxUSsFXYo4,UbxUSsFXYo4,8.932534,4.123500,4.289419,3.220799,0.384036,0.640398,6.862274,0.823084,0.910774,4.149479,0.956405,0.880392,18.348998
4,ZXNQ_aUJu7A,ZXNQ_aUJu7A,10.596617,3.188511,3.854998,3.088211,0.359673,0.456376,6.038011,0.484015,0.530182,4.333926,0.303339,0.291729,5.815508
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15583,vpN9qC2ch1M,vpN9qC2ch1M,1.450960,3.885558,4.950634,2.949092,0.440514,0.352467,5.567377,0.960263,0.509048,0.619131,1.487786,1.974099,0.903501
15584,9lzL4v7okwI,9lzL4v7okwI,10.407893,3.066508,3.334389,3.141752,0.450786,0.702986,7.015267,0.464155,0.612523,4.400715,0.782966,0.815884,21.367290
15585,BoFxslm6DKY,BoFxslm6DKY,13.733934,2.421945,3.279414,3.153612,0.565776,0.840430,7.203560,0.570628,0.601249,4.955595,0.271914,0.350803,15.972485
15586,pYJsLJe8alQ,pYJsLJe8alQ,12.618681,3.079915,3.362808,3.145084,0.421527,0.621169,6.986616,0.551609,0.536225,4.737476,0.558853,0.523007,22.126477


In [None]:
# Step 5: Export the unique DataFrame to a CSV file
unique_df.to_csv('motion_data.csv', index=False)
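
After exporting, it can be worth reloading the file to confirm that each `video_id` appears only once. The helper below is a minimal sketch of such a check; the name `check_export` and its parameters are hypothetical and not part of the original notebook.

```python
import pandas as pd


def check_export(path="motion_data.csv", key="video_id"):
    """Reload the exported CSV and confirm each key appears exactly once."""
    exported = pd.read_csv(path)
    if key not in exported.columns:
        raise KeyError(f"Column {key!r} not found in {path}")

    # duplicated() marks every repeat of a key after its first occurrence
    n_duplicates = int(exported[key].duplicated().sum())
    if n_duplicates:
        raise ValueError(f"{n_duplicates} duplicate {key} value(s) found in {path}")

    # Return the row count so the caller can compare it with the DataFrame
    return len(exported)
```

Running `check_export()` right after the export cell should return the same number of rows as `len(unique_df)`.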