# Iteration 1 - Data Understanding

## 1. Import Libraries

Load the necessary libraries for data analysis.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

*Libraries used:*
- **pandas** for reading and processing CSV files.
- **matplotlib** for data visualization.

## 2. Load Data

Read the provided CSV file and display the first few rows.

In [4]:
# Read the CSV file
file_path = "Data/csv/20240606081620_blade1_10m.csv"
df = pd.read_csv(file_path)

In [5]:
# Display the first 5 rows
df.head()

| | Time | Left | Right |
|---|---|---|---|
| 0 | 0 sec | 0.0 | 0.0 |
| 1 | 2.2676e-05 sec | 0.0 | 0.0 |
| 2 | 4.5351e-05 sec | 0.0 | 0.0 |
| 3 | 6.8027e-05 sec | 0.0 | 0.0 |
| 4 | 9.0703e-05 sec | 0.0 | 0.0 |


*Code review:*
- Reads the dataset from a CSV file.
- Displays the first five rows to understand the data structure.
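
The first rows show that `Time` is stored as text with a ` sec` suffix. A minimal sketch of converting it to float seconds follows. The helper name `parse_time_seconds`, the new column `Time_s`, and the tiny sample frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def parse_time_seconds(time_column: pd.Series) -> pd.Series:
    """Convert strings such as '2.2676e-05 sec' to float seconds."""
    cleaned = time_column.str.replace("sec", "", regex=False).str.strip()
    return cleaned.astype(float)


# Tiny illustrative frame mirroring the first rows shown by df.head()
sample = pd.DataFrame({
    "Time": ["0 sec", "2.2676e-05 sec", "4.5351e-05 sec"],
    "Left": [0.0, 0.0, 0.0],
    "Right": [0.0, 0.0, 0.0],
})

# Keep the original column and add a numeric one (hypothetical name)
sample["Time_s"] = parse_time_seconds(sample["Time"])
print(sample.dtypes)
```

On the full dataset the same call would be `df["Time_s"] = parse_time_seconds(df["Time"])`, assuming every row follows the format seen in the first five.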

## 3. Display Data Information

Get a summary of the dataset to understand data types and missing values.

In [6]:
# Display dataset information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 841592 entries, 0 to 841591
Data columns (total 3 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   Time    841592 non-null  object 
 1   Left    841592 non-null  float64
 2   Right   841592 non-null  float64
dtypes: float64(2), object(1)
memory usage: 19.3+ MB


*Code review:*
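- Shows the row count (841,592), the data type of each column, and the non-null counts.
- `Time` is stored as `object` (strings with a `sec` suffix), so it must be converted to a numeric type before it can serve as a time axis.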
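
The row count from `df.info()` and the time step visible in `df.head()` can be combined into a rough estimate of the sampling rate and recording length. The sketch below uses only the five timestamps printed earlier. The function name `estimate_sample_rate` is an assumption, and the resulting figures are inferences from the printed output, not values stated in the source.

```python
import numpy as np


def estimate_sample_rate(time_s: np.ndarray) -> float:
    """Estimate the sampling rate in Hz from the median timestamp spacing."""
    step = np.median(np.diff(time_s))
    return 1.0 / step


# Timestamps (in seconds) copied from the df.head() output
time_s = np.array([0.0, 2.2676e-05, 4.5351e-05, 6.8027e-05, 9.0703e-05])
rate_hz = estimate_sample_rate(time_s)

n_rows = 841_592  # row count reported by df.info()
duration_s = n_rows / rate_hz

print(f"Estimated rate: {rate_hz:.0f} Hz, duration: {duration_s:.1f} s")
```

A step of about 2.2676e-05 s corresponds to roughly 44.1 kHz, a common audio sampling rate. Treat this as a sanity check and not as a confirmed property of the recording.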

### **3.1 Check Descriptive Statistics**

View basic statistical metrics of the dataset.

In [7]:
# Display descriptive statistics
df.describe()

| | Left | Right |
|---|---|---|
| count | 841592.0 | 841592.0 |
| mean | -9.383008e-07 | 3.074363e-07 |
| std | 0.04057442 | 0.03902616 |
| min | -0.9256264 | -0.8904691 |
| 25% | -0.002685629 | -0.002227851 |
| 50% | 0.0 | 0.0 |
| 75% | 0.002685629 | 0.002227851 |
| max | 1.0 | 0.938139 |


*Code review:*
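- Provides the count, mean, standard deviation, min, max, and quartiles for the numerical columns. `Time` is omitted because it is not numeric.
- Both channels have a mean close to zero and amplitudes within roughly [-1, 1].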

### **3.2 Check for Missing Values**

Count the number of missing values in the dataset.

In [9]:
# Check for missing values
df.isnull().sum()

Time     0
Left     0
Right    0
dtype: int64

*Code review:*
- Confirms that none of the three columns contains missing values.
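
For larger or messier files it can help to see missing values as a percentage as well as a count. The sketch below is one way to do that. The helper `missing_report` and the sample frame with deliberately injected gaps are illustrative assumptions and do not reflect the actual dataset, which has no missing values.

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages."""
    counts = frame.isnull().sum()
    percent = 100.0 * counts / len(frame)
    return pd.DataFrame({"missing": counts, "percent": percent})


# Illustrative frame with gaps injected on purpose
sample = pd.DataFrame({
    "Time": ["0 sec", "2.2676e-05 sec", None, "6.8027e-05 sec"],
    "Left": [0.0, np.nan, 0.0, 0.0],
    "Right": [0.0, 0.0, 0.0, 0.0],
})

report = missing_report(sample)
print(report)
```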

## 4. Data Visualization

Analyze the data patterns using plots.

### **4.1 Plot Audio Signal (Left & Right Channels)**

Create a plot to visualize the audio signal over time.

In [11]:
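# Plot audio signals for left and right channels
# 'Time' holds strings such as "2.2676e-05 sec"; convert them to float seconds first.
# Plotting the raw strings would make matplotlib treat each of the 841,592 values
# as a separate category, which is extremely slow.
time_s = df['Time'].str.replace(' sec', '', regex=False).astype(float)
plt.figure(figsize=(12, 5))
plt.plot(time_s, df['Left'], label='Left Channel', alpha=0.7)
plt.plot(time_s, df['Right'], label='Right Channel', alpha=0.7)
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.title('Waveform of Left and Right Channels')
plt.legend()
plt.show()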

*Code review:*
- Plots the audio signal over time to observe any patterns or anomalies.
- Both **Left Channel** and **Right Channel** are shown for comparison.
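
With more than 800,000 samples per channel, drawing every point is slow and most points overlap on screen. One common workaround, sketched here as an assumption and not as the notebook's own method, is to reduce each channel to a min/max envelope before plotting. The function name `minmax_envelope`, the bin count of 2000, and the synthetic signal are all illustrative.

```python
import numpy as np


def minmax_envelope(signal: np.ndarray, n_bins: int):
    """Split the signal into n_bins equal chunks and return each chunk's min and max.

    Samples that do not fill a complete chunk at the end are dropped.
    """
    if n_bins <= 0 or n_bins > len(signal):
        raise ValueError("n_bins must be between 1 and len(signal)")
    usable = (len(signal) // n_bins) * n_bins
    chunks = signal[:usable].reshape(n_bins, -1)
    return chunks.min(axis=1), chunks.max(axis=1)


# Synthetic stand-in with the same length as the dataset (not the real audio)
rng = np.random.default_rng(0)
signal = rng.uniform(-1.0, 1.0, size=841_592)

low, high = minmax_envelope(signal, n_bins=2000)
print(len(low), len(high))
```

The two arrays can then be drawn with `plt.fill_between`, which keeps the peaks visible while plotting only a few thousand points per channel.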

### **4.2 Histogram of Amplitude Distribution**

Visualize the amplitude distribution for each channel.

In [None]:
# Create a histogram of amplitude
plt.figure(figsize=(12, 5))
plt.hist(df['Left'], bins=50, alpha=0.5, label='Left Channel', color='blue')
plt.hist(df['Right'], bins=50, alpha=0.5, label='Right Channel', color='red')
plt.xlabel('Amplitude')
plt.ylabel('Frequency')
plt.title('Amplitude Distribution')
plt.legend()
plt.show()

*Code review:*
- Displays the amplitude distribution for both **left** and **right** channels.
- Helps identify noise or dominant signals.
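
When `plt.hist` is called once per channel with `bins=50`, each call derives its bin edges from that channel's own range, so the bars of the two histograms do not line up exactly. A minimal sketch of computing both histograms on shared edges is shown below. The helper `shared_histograms` and the synthetic normal data are assumptions for illustration only.

```python
import numpy as np


def shared_histograms(left: np.ndarray, right: np.ndarray, bins: int = 50):
    """Histogram both channels on identical bin edges so counts are comparable."""
    lowest = min(left.min(), right.min())
    highest = max(left.max(), right.max())
    edges = np.linspace(lowest, highest, bins + 1)
    left_counts, _ = np.histogram(left, bins=edges)
    right_counts, _ = np.histogram(right, bins=edges)
    return edges, left_counts, right_counts


# Synthetic stand-in data (not the real recording)
rng = np.random.default_rng(0)
left = rng.normal(0.0, 0.04, size=10_000)
right = rng.normal(0.0, 0.04, size=10_000)

edges, left_counts, right_counts = shared_histograms(left, right)
print(len(edges), left_counts.sum(), right_counts.sum())
```

The same `edges` array can be passed as `bins=edges` to both `plt.hist` calls. Because the quartiles reported earlier sit very close to zero while the extremes reach about ±1, a logarithmic y-axis (`plt.yscale('log')`) may make the sparse tails easier to see.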

## 5. Preliminary Findings

From this initial analysis, we can conclude:
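- **Data Structure:** The dataset contains 841,592 rows and three columns (`Time`, `Left`, `Right`). `Time` is stored as text with a `sec` suffix, while `Left` and `Right` are `float64`.
- **Data Condition:** No column contains missing values (`.isnull().sum()` returns 0 for all three).
- **Signal Patterns:** Both channels are centered on zero, with amplitudes spanning roughly -0.93 to 1.0 (Left) and -0.89 to 0.94 (Right). Half of all samples lie within about ±0.003 of zero. The waveform plot is the place to examine how amplitude varies over time.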