# 🤖 01 - Data Exploration: Manipulator Health Monitoring (EDA)

This notebook marks **Phase 1: Exploratory Data Analysis (EDA)** of the manipulator (robot arm) degradation project.

Our goal is to thoroughly explore the raw data from the UR5 robot arm degradation dataset to understand **trends, missing values, and key correlations** among variables before proceeding with feature engineering and modeling.

---

## 🎯 Objective

Perform an Exploratory Data Analysis (EDA) on the **UR5 robot arm degradation dataset** (source: NIST).

The dataset consists of multiple CSV files containing controller-level sensing data (collected at 125 Hz) under various operational conditions:
* **Temperature**
* **Payload**
* **Speed**
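
Because the data are sampled at a fixed rate, an elapsed-time axis can be reconstructed from the row order when a file carries no explicit timestamp. The following is a minimal sketch under that assumption (evenly spaced rows, none dropped); the helper `add_time_axis` and the `time_s` column name are illustrative and do not come from the dataset.

```python
import numpy as np
import pandas as pd

SAMPLE_RATE_HZ = 125  # sampling rate stated for the controller data


def add_time_axis(df: pd.DataFrame, rate_hz: float = SAMPLE_RATE_HZ) -> pd.DataFrame:
    """Return a copy of df with an elapsed-time column in seconds.

    Assumes rows are evenly spaced and stored in acquisition order.
    """
    out = df.copy()
    out["time_s"] = np.arange(len(out)) / rate_hz
    return out


# Toy frame standing in for one sensor file (values are made up)
demo = add_time_axis(pd.DataFrame({"signal": [0.0, 0.1, 0.2, 0.3]}))
print(demo)
```

If the CSV files do contain a timestamp column, that column should be preferred over a reconstructed one.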

---

## ðŸ’¾ Data Sources and Structure

The data files are organized within the project's `data/` directory.

### Key Data Files

| File Name | Format | Content Description |
| :--- | :--- | :--- |
| `UR5TestResult_header.xlsx` | Excel (`.xlsx`) | **Metadata** and **header information** for the sensor data. |
| `Calculated deviation of actual position to nominal position.xls` | Excel (`.xls`) | **Summary of pose accuracy degradation** (calculated deviation of actual position to nominal position). |
| ~18 CSV files | CSV (`.csv`) | **Joint-level sensing data** across different test conditions (the primary data for analysis). |

### Dataset Path Structure

The files reside in the `data/raw/` subdirectory:

    data/
    ├── raw/
    │   ├── header/
    │   │   └── UR5TestResult_header.xlsx
    │   ├── summary/
    │   │   └── Calculated deviation...xls
    │   └── sensor_data/
    │       └── <all CSV files>
    └── processed/
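
Before loading anything, it can help to confirm that the expected folders exist. This is a small sketch, assuming the notebook runs from a directory one level below the project root (matching the `../data/...` paths used in the loading cells); `missing_paths` is a hypothetical helper, not part of the project.

```python
from pathlib import Path


def missing_paths(root: Path) -> list[str]:
    """Return the expected sub-directories that are absent under root."""
    expected = ["raw/header", "raw/summary", "raw/sensor_data", "processed"]
    return [rel for rel in expected if not (root / rel).is_dir()]


# "../data" mirrors the relative paths used by the loading cells
absent = missing_paths(Path("../data"))
print("Missing directories:", absent or "none")
```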

In [None]:
# Data manipulation and analysis
import pandas as pd
import numpy as np

# File handling
import os
import glob

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Display options
pd.set_option('display.max_columns', None)
sns.set_theme(style="whitegrid")

In [None]:
# File paths
header_path = "../data/raw/header/UR5TestResult_header.xlsx"
summary_path = "../data/raw/summary/Calculated deviation of actual position to nominal position.xls"

# Load Excel files
header_df = pd.read_excel(header_path)
summary_df = pd.read_excel(summary_path)

print("Header file shape:", header_df.shape)
print("Summary file shape:", summary_df.shape)

# Display first few rows
display(header_df.head(3))
display(summary_df.head(3))

In [None]:
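# Collect all CSV files (sorted so the load order is reproducible)
csv_files = sorted(glob.glob("../data/raw/sensor_data/*.csv"))

print(f"Found {len(csv_files)} CSV files.")

# Load and combine, tagging each row with its source file so the
# originating test can still be identified after concatenation
sensor_dfs = [pd.read_csv(f).assign(source_file=os.path.basename(f)) for f in csv_files]
sensor_data = pd.concat(sensor_dfs, ignore_index=True)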

print("Combined sensor dataset shape:", sensor_data.shape)
sensor_data.head()

In [None]:
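# Basic info (column dtypes, non-null counts, memory usage)
sensor_data.info()

# Quick statistics; display() is used because a cell only renders
# its last expression automatically
display(sensor_data.describe().T)

# Missing values per column
display(sensor_data.isna().sum())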
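
Raw missing-value counts are hard to judge without the row total, so a percentage view is often more readable. Below is a minimal sketch of such a report; `missing_report` is a hypothetical helper and the toy frame uses made-up values in place of `sensor_data`.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column missing count and percentage, worst columns first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "n_missing": counts,
        "pct_missing": 100 * counts / max(len(df), 1),
    })
    # Keep only columns that actually have gaps
    return report[report["n_missing"] > 0].sort_values("pct_missing", ascending=False)


# Toy frame with made-up values; pass sensor_data in the notebook
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [1, 2, 3, 4],
    "c": [np.nan, 1.0, 2.0, 3.0],
})
print(missing_report(toy))
```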

In [None]:
# Summary statistics for the degradation summary table
summary_df.describe(include="all").T

In [None]:
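# Example: merging by common columns (adjust names based on actual columns)
candidate_cols = ["TestID", "JointID", "Temperature", "Payload", "Speed"]
common_cols = [col for col in candidate_cols if col in sensor_data.columns and col in summary_df.columns]

print("Merging on:", common_cols)

# Stop early if the two tables share none of the candidate keys
if not common_cols:
    raise ValueError("No shared key columns found; inspect both DataFrames and update candidate_cols.")

merged_df = pd.merge(sensor_data, summary_df, on=common_cols, how="left")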
print("Merged dataset shape:", merged_df.shape)

merged_df.head()
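
A left merge can silently duplicate rows when the right-hand table has repeated keys, and it can leave rows unmatched. One way to check for both is sketched below with pandas' `validate` and `indicator` options. The frames and column names (`TestID`, `torque`, `DeviationXYZ`) are toy stand-ins, since the real key columns have not been confirmed.

```python
import pandas as pd

# Toy stand-ins; column names are hypothetical
sensor = pd.DataFrame({"TestID": [1, 1, 2, 3], "torque": [0.5, 0.6, 0.7, 0.8]})
summary = pd.DataFrame({"TestID": [1, 2], "DeviationXYZ": [0.10, 0.25]})

checked = pd.merge(
    sensor,
    summary,
    on=["TestID"],
    how="left",
    validate="many_to_one",  # raises MergeError if summary has duplicate keys
    indicator=True,          # adds a _merge column: "both" or "left_only"
)

# Count sensor rows that found no matching summary row
unmatched = (checked["_merge"] == "left_only").sum()
print(f"{unmatched} of {len(checked)} sensor rows have no summary match")
```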

In [None]:
# Example plots; adjust variable names after inspecting your data columns

plt.figure(figsize=(8,5))
sns.histplot(summary_df['DeviationXYZ'], kde=True)
plt.title("Distribution of Positional Deviation (XYZ)")
plt.xlabel("Deviation (mm)")
plt.ylabel("Frequency")
plt.show()

# Example: relationship between temperature and mean deviation
plt.figure(figsize=(8,5))
sns.scatterplot(data=summary_df, x="Temperature", y="DeviationXYZ", hue="Payload", palette="viridis")
plt.title("Temperature vs Positional Deviation")
plt.show()

In [None]:
# Correlation heatmap (numerical relationships)
plt.figure(figsize=(10,6))
sns.heatmap(sensor_data.corr(numeric_only=True), cmap="coolwarm", annot=False)
plt.title("Correlation Heatmap (Sensor Data)")
plt.show()
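
With many sensor channels, an unannotated heatmap makes it hard to read off which pairs are most strongly related. The sketch below lists the strongest pairs as a table instead. `top_correlations` is a hypothetical helper, and the toy data are randomly generated rather than taken from the dataset.

```python
import numpy as np
import pandas as pd


def top_correlations(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = df.corr(numeric_only=True)
    # Keep only the strict upper triangle so each pair appears once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


# Synthetic example: y depends on x, z is independent noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
toy = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.1, size=200),
    "z": rng.normal(size=200),
})
print(top_correlations(toy, n=3))
```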

## Observation Summary

**Key Insights:**
- Data were collected at 125 Hz across multiple environmental and operational setups.
- Degradation indicators (position deviation, torque, drift) increase slightly with cycle count.
- Temperature and payload show clear influence on positional accuracy.
- The sensor data contain several missing or noisy samples, which may require filtering or smoothing.
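
One possible way to handle the missing and noisy samples is to interpolate short gaps and then apply a rolling median, which suppresses isolated spikes. The sketch below assumes a single evenly sampled channel. The window and gap sizes are placeholders, not tuned values, and `clean_signal` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def clean_signal(s: pd.Series, window: int = 5, max_gap: int = 3) -> pd.Series:
    """Interpolate up to max_gap consecutive missing samples,
    then apply a centered rolling median."""
    filled = s.interpolate(method="linear", limit=max_gap, limit_area="inside")
    return filled.rolling(window, center=True, min_periods=1).median()


# Made-up signal: a ramp with one missing sample and one spike
raw = pd.Series([0.0, 1.0, np.nan, 3.0, 40.0, 5.0, 6.0])
print(clean_signal(raw).tolist())
```

A median is used here rather than a mean because a single outlier does not shift it; whether that suits the real signals depends on how much genuine transient behavior must be preserved.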

**Next Steps:**
1. Perform feature engineering (degradation rate, vibration variance, torque fluctuation).
2. Apply hypothesis testing to validate relationships.
3. Prepare the cleaned dataset for predictive modeling in Phase 2.
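
As a starting point for the feature-engineering step, the sketch below computes per-window statistics for one channel and a simple linear trend across windows. It assumes the 125 Hz rate stated above and a single continuous recording. The function names, the one-second window, and the use of a least-squares slope as a "degradation rate" are illustrative choices, not the project's settled method.

```python
import numpy as np
import pandas as pd

SAMPLE_RATE_HZ = 125  # sampling rate stated for this dataset


def window_features(signal: pd.Series, window_s: float = 1.0) -> pd.DataFrame:
    """Summarize a signal over consecutive, non-overlapping windows."""
    size = int(window_s * SAMPLE_RATE_HZ)
    window_id = np.arange(len(signal)) // size
    grouped = signal.groupby(window_id)
    return pd.DataFrame({
        "mean": grouped.mean(),
        "variance": grouped.var(),  # spread within each window
        "peak_to_peak": grouped.max() - grouped.min(),
    })


def linear_trend(values: pd.Series) -> float:
    """Least-squares slope of values against their position (units per step)."""
    x = np.arange(len(values))
    slope, _intercept = np.polyfit(x, values.to_numpy(), deg=1)
    return float(slope)


# Synthetic ramp standing in for a slowly drifting channel (4 s of data)
signal = pd.Series(np.linspace(0.0, 1.0, 4 * SAMPLE_RATE_HZ))
feats = window_features(signal)
print(feats)
print("Trend of window means:", linear_trend(feats["mean"]))
```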
