# 📘 CWRU Bearing Dataset: First Exploration

**🎯 Learning Objectives**

In this notebook, you will:
- Load a sample CWRU bearing dataset file in Python.
- Inspect and understand its contents.
- Compute and visualize basic signal statistics.
- Plot a vibration signal in both time and frequency domains.

In [None]:
# ---- Import Required Libraries ----

import os
import scipy.io as sio
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from urllib.parse import urljoin
import urllib.request

In [None]:
# ---- Dataset Source Configuration ----
# Purpose: make the notebook run on Colab or locally,
# and keep datasets persistent on Google Drive when on Colab.

# Detect environment
try:
    import google.colab
    ON_COLAB = True
except ImportError:
    ON_COLAB = False

# Set paths
if ON_COLAB:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    COURSE_PATH = "/content/drive/MyDrive/Industrial_ML_Course"
else:
    COURSE_PATH = r"D:\Industrial_ML_Course"  # adjust if needed

DATASET_PATH = os.path.join(COURSE_PATH, "datasets", "CWRU")  # avoids mixed separators on Windows
NOTEBOOK_PATH = os.path.join(COURSE_PATH, "notebooks")

os.makedirs(DATASET_PATH, exist_ok=True)
os.makedirs(NOTEBOOK_PATH, exist_ok=True)

print("Environment:", "Colab" if ON_COLAB else "Local")
print("Dataset Path:", DATASET_PATH)

In [None]:
# ---- Download or Load Dataset ----

# Example file to load
file_name = "105.mat"

# Official CWRU dataset hosting (can replace with your own mirror/repo)
dataset_url = "https://engineering.case.edu/sites/default/files/"
file_url = urljoin(dataset_url, file_name)
file_path = os.path.join(DATASET_PATH, file_name)

# Download if missing
if not os.path.exists(file_path):
    print(f"Downloading {file_name} ...")
    urllib.request.urlretrieve(file_url, file_path)
    print("Download completed!")
else:
    print(f"{file_name} already exists in {DATASET_PATH}")

# Load the .mat file
data = sio.loadmat(file_path)
print("Dataset keys:", list(data.keys()))
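
The dictionary returned by `sio.loadmat` also contains metadata entries whose names start with a double underscore (`__header__`, `__version__`, `__globals__`). A minimal sketch of how the actual data variables could be listed with their shapes is given below. The helper name `list_mat_variables` and the contents of `mock_data` are hypothetical stand-ins, so the block runs without the real file.

```python
import numpy as np

def list_mat_variables(mat_dict):
    """Return {name: shape} for the data variables in a loadmat-style dict."""
    return {
        name: np.shape(value)
        for name, value in mat_dict.items()
        if not name.startswith("__")  # skip __header__, __version__, __globals__
    }

# Hypothetical stand-in for the dict returned by sio.loadmat (values are made up)
mock_data = {
    "__header__": b"MATLAB 5.0 MAT-file",
    "__version__": "1.0",
    "__globals__": [],
    "X105_DE_time": np.zeros((1000, 1)),
    "X105RPM": np.array([[1797]]),
}

variables = list_mat_variables(mock_data)
for name, shape in variables.items():
    print(f"{name}: shape {shape}")
```

With the real file, `list_mat_variables(data)` would be called on the loaded dictionary instead of `mock_data`.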

In [None]:
# ---- Extract Vibration Signal ----
# Look for a key containing "DE_time" (Drive End sensor)

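signal_key = None
for key in data.keys():
    if "DE_time" in key:
        signal_key = key
        break

# Fail with a clear message instead of an obscure KeyError on data[None]
if signal_key is None:
    raise KeyError(f"No 'DE_time' variable found in {file_name}")

signal = data[signal_key].squeeze()  # (N, 1) column vector -> 1-D array of length N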
print("Signal shape:", signal.shape)
print("First 10 samples:", signal[:10])

In [None]:
# ---- Basic Signal Statistics ----

print("Mean:", np.mean(signal))
print("Standard Deviation:", np.std(signal))
print("RMS (Root Mean Square):", np.sqrt(np.mean(signal**2)))
print("Min:", np.min(signal))
print("Max:", np.max(signal))

# Histogram
plt.figure(figsize=(8, 4))
plt.hist(signal, bins=50, color="skyblue", edgecolor="black")
plt.title(f"Amplitude Distribution of {file_name}")
plt.xlabel("Amplitude")
plt.ylabel("Count")
plt.show()
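
Beyond mean, standard deviation and RMS, a few other descriptors are often computed for vibration signals because they react to impulsive content. The sketch below is one possible way to compute peak-to-peak amplitude, crest factor and (non-excess) kurtosis with NumPy. The function name `impulsiveness_stats` and the synthetic sine are assumptions made for illustration, not part of the dataset.

```python
import numpy as np

def impulsiveness_stats(x):
    """Peak-to-peak, crest factor and kurtosis of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x**2))
    centered = x - np.mean(x)
    return {
        "peak_to_peak": np.max(x) - np.min(x),
        "crest_factor": np.max(np.abs(x)) / rms,  # peak / RMS
        "kurtosis": np.mean(centered**4) / np.mean(centered**2) ** 2,
    }

# Synthetic check signal: 1 second of a 50 Hz sine (assumed 12 kHz sampling)
fs_demo = 12000
t = np.arange(fs_demo) / fs_demo
demo_signal = np.sin(2 * np.pi * 50 * t)

stats = impulsiveness_stats(demo_signal)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

A pure sine has a crest factor of about 1.414 and a kurtosis of 1.5, so values well above these on a real recording point to sharper, more impulsive peaks.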

In [None]:
# ---- Visualize Signal in Time Domain ----

plt.figure(figsize=(12, 4))
plt.plot(signal, color="steelblue")
plt.title(f"Vibration Signal from {file_name}")
plt.xlabel("Sample Index")
plt.ylabel("Amplitude")
plt.grid(True)
plt.show()

# Zoomed-in segment
plt.figure(figsize=(12, 4))
plt.plot(signal[:1000], color="darkorange")
plt.title(f"Zoomed Signal (First 1000 Samples) - {file_name}")
plt.xlabel("Sample Index")
plt.ylabel("Amplitude")
plt.grid(True)
plt.show()
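
The plots above use the sample index on the x-axis. If the sampling rate is known, the index can be converted to seconds. The sketch below assumes a 12 kHz sampling rate (`fs_assumed` is an illustrative name) and shows how long the 1000-sample zoom window lasts.

```python
import numpy as np

fs_assumed = 12000   # Hz, assumed sampling rate
n_samples = 1000     # length of the zoomed segment

# Time stamp of each sample in seconds: sample k occurs at k / fs
time_s = np.arange(n_samples) / fs_assumed
duration_ms = n_samples / fs_assumed * 1000

print(f"Sample spacing: {time_s[1] * 1e6:.1f} microseconds")
print(f"Segment duration: {duration_ms:.2f} ms")
```

Passing `time_s` as the first argument to `plt.plot` (and relabeling the axis as "Time [s]") would give the same plot with a physical time axis.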

In [None]:
# ---- Rolling Statistics (using Pandas) ----

signal_series = pd.Series(signal)
rolling_mean = signal_series.rolling(window=200).mean()

plt.figure(figsize=(12, 4))
plt.plot(signal_series[:3000], label="Original", alpha=0.7)
plt.plot(rolling_mean[:3000], label="Rolling Mean (200 samples)", linewidth=2)
plt.title("Rolling Mean Demonstration")
plt.xlabel("Sample Index")
plt.ylabel("Amplitude")
plt.legend()
plt.grid(True)
plt.show()
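
For a vibration signal that oscillates around zero, the rolling mean stays close to zero, so a rolling RMS is often a more informative measure of local intensity. The following is a minimal NumPy sketch; `rolling_rms` is a hypothetical helper and the test signal is synthetic noise whose amplitude jumps halfway through.

```python
import numpy as np

def rolling_rms(x, window):
    """RMS over a sliding window; output has len(x) - window + 1 points."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(window) / window          # moving-average kernel
    return np.sqrt(np.convolve(x**2, kernel, mode="valid"))

# Synthetic signal: quiet noise followed by louder noise
rng = np.random.default_rng(0)
quiet = 0.1 * rng.standard_normal(2000)
loud = 1.0 * rng.standard_normal(2000)
demo = np.concatenate([quiet, loud])

envelope = rolling_rms(demo, window=200)
print("Output length:", len(envelope))
print("Mean rolling RMS, first 1000 points:", envelope[:1000].mean())
print("Mean rolling RMS, last 1000 points:", envelope[-1000:].mean())
```

Unlike `Series.rolling(...)`, which keeps the original length and fills the first `window - 1` entries with NaN, `mode="valid"` simply drops those incomplete windows.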

In [None]:
# ---- Visualize Signal in Frequency Domain ----
from scipy.fft import fft, fftfreq

N = len(signal)

fs = 12000  # Sampling rate in Hz (12 kHz for this file; some CWRU recordings use 48 kHz)
T = 1 / fs

yf = fft(signal)
xf = fftfreq(N, T)[:N // 2]

plt.figure(figsize=(12, 4))
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]), color="purple")
plt.title(f"Frequency Spectrum from {file_name} (fs={fs} Hz)")
plt.xlabel("Frequency [Hz]")
plt.ylabel("Amplitude")
plt.grid(True)
plt.show()
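
A quick way to check the `2.0/N` scaling and the frequency axis is to run the same computation on a signal whose spectrum is known in advance. The sketch below builds a synthetic two-tone signal (the tone frequencies and amplitudes are arbitrary choices) and uses NumPy's `rfft`, which returns only the non-negative frequencies of a real signal.

```python
import numpy as np

fs_demo = 12000                 # Hz, assumed sampling rate
n = fs_demo                     # one second of data
t = np.arange(n) / fs_demo

# Two sine tones: amplitude 1.0 at 100 Hz and amplitude 0.5 at 1500 Hz
tone = 1.0 * np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)

spectrum = np.fft.rfft(tone)
freqs = np.fft.rfftfreq(n, d=1 / fs_demo)
amplitude = 2.0 / n * np.abs(spectrum)   # single-sided amplitude scaling

peak_freq = freqs[np.argmax(amplitude)]
resolution = fs_demo / n                 # spacing between frequency bins in Hz

print(f"Largest peak at {peak_freq:.1f} Hz")
print(f"Frequency resolution: {resolution:.2f} Hz")
```

The recovered peaks should sit at the tone frequencies with heights close to the tone amplitudes. The resolution `fs / N` also shows why longer recordings give finer frequency bins.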

**🚀 Explore More!**

Try these additional tasks:

1. Load another file (e.g., "106.mat") and compare its time and frequency plots with "105.mat".
2. Compute and compare statistics (mean, std, RMS) between two signals.
3. Plot histograms of two signals side by side.
4. Plot a rolling standard deviation to see local variations in vibration intensity.
5. Recreate the FFT but use a log scale on the x-axis to highlight low-frequency behavior.
6. Overlay two signals (from healthy and faulty bearings) and discuss visible differences.
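
As a possible starting point for task 2, the sketch below collects mean, standard deviation and RMS in a small helper and prints them side by side. The helper name `basic_stats` and both signals are illustrative: the signals are synthetic noise, not CWRU recordings, and would be replaced by two loaded files.

```python
import numpy as np

def basic_stats(x):
    """Mean, standard deviation and RMS of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    return {
        "mean": np.mean(x),
        "std": np.std(x),
        "rms": np.sqrt(np.mean(x**2)),
    }

# Synthetic placeholders for two vibration signals
rng = np.random.default_rng(42)
signal_a = 0.1 * rng.standard_normal(5000)
signal_b = 0.3 * rng.standard_normal(5000)

stats_a = basic_stats(signal_a)
stats_b = basic_stats(signal_b)

print(f"{'stat':<6}{'signal A':>12}{'signal B':>12}")
for name in stats_a:
    print(f"{name:<6}{stats_a[name]:>12.4f}{stats_b[name]:>12.4f}")
```

Because RMS² equals mean² plus std², the RMS and standard deviation nearly coincide whenever the mean is close to zero, which is typical for accelerometer data.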