# Python Assignment: 1D CNN for Time Series Signal Classification

This assignment challenges you to build a Convolutional Neural Network (CNN) for classifying different types of signals embedded in time-series data. While CNNs are widely known for image processing, their ability to detect local patterns makes them powerful for 1D sequence data as well. You will generate synthetic signals, preprocess them, design a 1D CNN, train it, and evaluate its performance in distinguishing between various signal types.
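
To make "detecting local patterns" concrete, here is a minimal NumPy sketch (not part of the graded work) of what a single 1D convolutional filter computes: it slides a short kernel along the signal and responds most strongly where the local shape matches the kernel. The toy signal and the hand-picked kernel are illustrative assumptions; in a trained `Conv1D` layer the kernel weights are learned from data.

```python
import numpy as np

# Toy signal (illustrative only): flat at 0, then a step up to 1 at index 5.
signal = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

# Hand-picked "rising edge" kernel; a trained Conv1D layer would learn these weights.
kernel = np.array([-1.0, 0.0, 1.0])

# Deep-learning "convolution" is cross-correlation: the kernel slides without being flipped.
# mode="valid" keeps only positions where the kernel fully overlaps the signal.
response = np.correlate(signal, kernel, mode="valid")

print(response)           # [0. 0. 0. 1. 1. 0. 0. 0.]
print(response.argmax())  # 3: the first window that straddles the step
```

The response is zero on the flat regions and nonzero only around the step, which is why a stack of such filters can pick out edges, pulses, and oscillations regardless of where they occur in the sequence.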

## Part 1: Data Generation and Preprocessing (35 points)

We'll create a synthetic dataset of fixed-length time-series signals belonging to several distinct classes. This controlled environment will allow you to clearly see the CNN's ability to learn patterns.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import warnings

warnings.filterwarnings('ignore') # Suppress warnings for cleaner output
np.random.seed(42) # for reproducibility
tf.random.set_seed(42)

# 1.1 Define Signal Parameters
#    - `signal_length`: Number of time steps in each signal (e.g., 100-200).
#    - `n_samples_per_class`: Number of signals to generate for each class.
#    - `n_classes`: Number of distinct signal types.

signal_length = 150
n_samples_per_class = 500
n_classes = 4 # Sine, Square, Triangle, Random Pulse

# 1.2 Generate Synthetic Time-Series Signals
#    Create a function `generate_signal(signal_type, length, noise_level)` that returns a 1D numpy array.
#    Implement at least `n_classes` distinct signal types:
#    - **Class 0 (Sine Wave):** A sine wave with a fixed frequency and amplitude.
#    - **Class 1 (Square Wave):** A square wave pattern.
#    - **Class 2 (Triangle Wave):** A triangular wave pattern.
#    - **Class 3 (Random Pulse):** Mostly noise, but with a brief, sudden positive or negative pulse at a random location.
#    Add Gaussian noise to every signal; `noise_level` is its standard deviation (the generation loop uses the default of 0.5).

def generate_signal(signal_type: int, length: int, noise_level: float = 0.5) -> np.ndarray:
    """Return one noisy 1D signal of `length` samples for class `signal_type` (0-3)."""
    t = np.linspace(0, 2 * np.pi, length)
    noise = np.random.normal(0, noise_level, length)

    if signal_type == 0: # Sine Wave
        return 10 * np.sin(t * 3) + noise
    elif signal_type == 1: # Square Wave
        return 8 * np.sign(np.sin(t * 5)) + noise
    elif signal_type == 2: # Triangle Wave
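        phase = t / (2 * np.pi) # One period across the window, phase in [0, 1]
        # The factor 4 scales |phase - round(phase)| from [0, 0.5] to [0, 2], so the wave spans [-7, 7]
        return 7 * (4 * np.abs(phase - np.floor(phase + 0.5)) - 1) + noise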
    elif signal_type == 3: # Random Pulse
        signal = np.random.normal(0, noise_level * 2, length) # Base noise
        pulse_start = np.random.randint(length // 4, length * 3 // 4)
        pulse_strength = np.random.uniform(15, 25) * np.random.choice([-1, 1])
        pulse_width = np.random.randint(5, 15)
        signal[pulse_start:pulse_start + pulse_width] += pulse_strength
        return signal
    else:
        raise ValueError("Invalid signal_type")

X = []
y = []
for class_id in range(n_classes):
    for _ in range(n_samples_per_class):
        X.append(generate_signal(class_id, signal_length))
        y.append(class_id)

X = np.array(X)
y = np.array(y)

print(f"Generated X shape: {X.shape}")
print(f"Generated y shape: {y.shape}")

# 1.3 Visualize Sample Signals
#    Plot a few sample signals from each class to ensure they are distinct and visually discernible.

plt.figure(figsize=(15, 8))
for i in range(n_classes):
    plt.subplot(n_classes, 1, i + 1)
    # Find an index for the current class
    idx = np.where(y == i)[0][0]
    plt.plot(X[idx])
    plt.title(f'Sample Signal - Class {i}')
    plt.ylabel('Amplitude')
    if i == n_classes - 1:
        plt.xlabel('Time Step')
    plt.grid(True)
plt.tight_layout()
plt.suptitle('Sample Signals for Each Class', y=1.02, fontsize=16)
plt.show()

# 1.4 Data Reshaping and One-Hot Encoding
#    - **Reshaping X:** For a 1D CNN, the input shape needs to be `(samples, timesteps, features)`. Here, `features` will be 1.
#    - **One-Hot Encoding y:** Convert integer labels to one-hot encoded vectors.

print("\n--- Preprocessing Data for CNN ---")

# Reshape X for CNN input
X = X.reshape(-1, signal_length, 1)
print(f"Reshaped X for CNN input: {X.shape}")

# One-hot encode y labels
encoder = OneHotEncoder(sparse_output=False)
y_ohe = encoder.fit_transform(y.reshape(-1, 1))
print(f"One-Hot Encoded y labels: {y_ohe.shape}")

# 1.5 Train-Validation-Test Split
#    Split the data into training, validation, and test sets. (e.g., 70% train, 15% validation, 15% test).

X_train, X_temp, y_train, y_temp = train_test_split(X, y_ohe, test_size=0.3, random_state=42, stratify=y_ohe)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp)

print(f"Train set: {X_train.shape}, {y_train.shape}")
print(f"Validation set: {X_val.shape}, {y_val.shape}")
print(f"Test set: {X_test.shape}, {y_test.shape}")


## Part 2: Building the 1D CNN Model (25 points)

You will design a Sequential 1D CNN model suitable for your time-series classification task.
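
Before building the model, it helps to work out how each layer changes the length of the sequence. The sketch below does that arithmetic for the two-block layout suggested in the template (kernel size 5, pool size 2, 64 filters in the last convolution), assuming the Keras defaults of `padding='valid'` and stride 1 for `Conv1D`, and `strides=pool_size` for `MaxPooling1D`. The helper names `conv1d_out_len` and `maxpool1d_out_len` are hypothetical and exist only for this illustration.

```python
def conv1d_out_len(length: int, kernel_size: int, stride: int = 1) -> int:
    """Output length of a Conv1D layer with padding='valid'."""
    return (length - kernel_size) // stride + 1


def maxpool1d_out_len(length: int, pool_size: int) -> int:
    """Output length of a MaxPooling1D layer with padding='valid' and strides=pool_size."""
    return (length - pool_size) // pool_size + 1


signal_length = 150  # same value as in Part 1

n = conv1d_out_len(signal_length, kernel_size=5)  # 150 -> 146
n = maxpool1d_out_len(n, pool_size=2)             # 146 -> 73
n = conv1d_out_len(n, kernel_size=5)              # 73 -> 69
n = maxpool1d_out_len(n, pool_size=2)             # 69 -> 34
flattened = n * 64                                # 34 time steps x 64 filters

print(n, flattened)  # 34 2176
```

You can check these numbers against the output shapes reported by `model.summary()` once your model is built.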

In [None]:
# 2.1 Define the CNN Architecture
#    Create a `tf.keras.Sequential` model with the following layers:
#    - One or more `Conv1D` layers:
#        - Choose `filters` (e.g., 32, 64).
#        - Choose `kernel_size` (e.g., 3, 5, 7) - this determines the length of the 1D convolution window.
#        - Use `relu` activation.
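#        - The model must be told the input shape `(signal_length, 1)`: pass `input_shape=(signal_length, 1)` to the first `Conv1D` layer,
#          or start the model with `keras.Input(shape=(signal_length, 1))` (recent Keras versions warn when `input_shape` is passed to a layer).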
#    - One or more `MaxPooling1D` layers (e.g., `pool_size=2`).
#    - A `Flatten` layer to convert the `(batch, steps, filters)` output of the convolutional layers into `(batch, steps * filters)` for the dense layers.
#    - One or more `Dense` hidden layers with `relu` activation.
#    - The final `Dense` output layer with `n_classes` neurons and `softmax` activation.

print("\n--- Building 1D CNN Model ---")
# TODO: Build the Sequential model
# model = keras.Sequential([
#     layers.Conv1D(filters=32, kernel_size=5, activation='relu', input_shape=(signal_length, 1)),
#     layers.MaxPooling1D(pool_size=2),
#     layers.Conv1D(filters=64, kernel_size=5, activation='relu'),
#     layers.MaxPooling1D(pool_size=2),
#     layers.Flatten(),
#     layers.Dense(100, activation='relu'),
#     layers.Dense(n_classes, activation='softmax')
# ])

# 2.2 Compile the Model
#    Configure the model for training:
#    - `optimizer`: Choose `'adam'`.
#    - `loss`: Use `'categorical_crossentropy'` (since labels are one-hot encoded).
#    - `metrics`: Monitor `['accuracy']`.

# TODO: Compile the model
# model.compile(optimizer='adam',
#               loss='categorical_crossentropy',
#               metrics=['accuracy'])

# 2.3 Display Model Summary
#    Print the model summary to review the layers, output shapes, and parameter counts.

model.summary()


## Part 3: Training the CNN (20 points)

Train your CNN model on the training data, utilizing the validation set to monitor for overfitting.
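
Monitoring the validation set is what makes early stopping possible. The sketch below is a simplified pure-Python illustration of the patience rule, not Keras's actual implementation: training stops once the validation loss has failed to improve for `patience` consecutive epochs, and the best epoch is the one whose weights `restore_best_weights=True` would bring back. The function name `early_stopping_epoch` and the loss values are made up for this example.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Stops at the first epoch where the validation loss has not improved
    for `patience` epochs in a row; otherwise runs through every epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # Strict improvement: remember it and reset the waiting period.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs: stop here.
            return epoch, best_epoch
    return len(val_losses), best_epoch


# Illustrative validation losses: improvement until epoch 4, then a slow rise.
val_losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.41, 0.43, 0.44, 0.45, 0.50]

print(early_stopping_epoch(val_losses, patience=3))  # (7, 4)
```

With `patience=3`, training halts at epoch 7 and the weights from epoch 4 are the ones worth keeping. A larger `patience` tolerates noisier validation curves at the cost of extra epochs.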

In [None]:
# 3.1 Train the Model
#    Use `model.fit()` to train the CNN.
#    - `epochs`: Choose a sufficient number (e.g., 20-50).
#    - `batch_size`: Use a common batch size (e.g., 32, 64).
#    - `validation_data`: Pass `(X_val, y_val)` to monitor validation performance.
#    - (Optional) Add an `EarlyStopping` callback to limit overfitting; with `restore_best_weights=True` it restores the weights from the best epoch.
#    Store the returned `history` object.

epochs = 30
batch_size = 64

# Optional: Early Stopping Callback
# early_stopping = keras.callbacks.EarlyStopping(
#     monitor='val_loss', patience=5, restore_best_weights=True
# )

print(f"\n--- Training CNN Model for {epochs} epochs with batch size {batch_size} ---")
# TODO: Train the model
# history = model.fit(X_train, y_train,
#                     epochs=epochs,
#                     batch_size=batch_size,
#                     validation_data=(X_val, y_val),
#                     callbacks=[early_stopping], # Requires the early_stopping block above; delete this line if not using it
#                     verbose=1)

print("Training complete.")


## Part 4: Model Evaluation and Interpretation (15 points)

Assess your trained model's ability to classify unseen signals and understand its performance characteristics.
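
The evaluation steps in this part all start from the same conversion: one-hot labels and softmax outputs are turned into integer class IDs with `argmax`, and every metric is derived from those. The NumPy sketch below shows the idea on a tiny, made-up example with 5 samples and 3 classes (the arrays are hypothetical and unrelated to your dataset); it follows the same row = true class, column = predicted class convention as `sklearn.metrics.confusion_matrix`.

```python
import numpy as np

# Hypothetical one-hot labels and softmax outputs (5 samples, 3 classes).
y_true_ohe = np.array([[1, 0, 0],
                       [0, 1, 0],
                       [0, 0, 1],
                       [0, 1, 0],
                       [1, 0, 0]])
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.6, 0.3],   # true class 2, predicted class 1
                   [0.3, 0.5, 0.2],
                   [0.6, 0.3, 0.1]])

# Convert both to integer class IDs.
y_true = y_true_ohe.argmax(axis=1)
y_pred = y_prob.argmax(axis=1)

# Count (true, predicted) pairs: rows are true classes, columns are predictions.
n_classes = y_true_ohe.shape[1]
cm = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)

accuracy = np.trace(cm) / cm.sum()       # correct predictions lie on the diagonal
recall = cm.diagonal() / cm.sum(axis=1)  # per-class fraction of true samples recovered

print(cm)
print(accuracy)  # 0.8
print(recall)    # [1. 1. 0.]
```

Overall accuracy alone hides the fact that class 2 is never recognized here; the per-class view is what the classification report and confusion matrix in section 4.4 provide.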

In [None]:
# 4.1 Evaluate on Test Data
#    Use `model.evaluate()` on your `X_test` and `y_test`.

print("\n--- Evaluating Model on Test Data ---")
# TODO: Evaluate the model
# test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=2)

print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")

# 4.2 Plot Training History
#    Plot the training and validation loss over epochs.
#    Plot the training and validation accuracy over epochs.

history_dict = history.history
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
accuracy_values = history_dict['accuracy']
val_accuracy_values = history_dict['val_accuracy']

epochs_trained = len(loss_values)
epochs_range = range(1, epochs_trained + 1)

plt.figure(figsize=(14, 6))

plt.subplot(1, 2, 1)
# TODO: Plot training and validation loss
# plt.plot(epochs_range, loss_values, 'bo', label='Training loss')
# plt.plot(epochs_range, val_loss_values, 'b', label='Validation loss')
# plt.title('Training and Validation Loss')
# plt.xlabel('Epochs')
# plt.ylabel('Loss')
# plt.legend()

plt.subplot(1, 2, 2)
# TODO: Plot training and validation accuracy
# plt.plot(epochs_range, accuracy_values, 'bo', label='Training accuracy')
# plt.plot(epochs_range, val_accuracy_values, 'b', label='Validation accuracy')
# plt.title('Training and Validation Accuracy')
# plt.xlabel('Epochs')
# plt.ylabel('Accuracy')
# plt.legend()

plt.tight_layout()
plt.show()

# 4.3 Make Predictions and Visualize Samples (Tougher Aspect)
#    Make predictions on a few sample signals from the test set.
#    Display the predicted class and the true class. Highlight misclassifications.

print("\n--- Making Predictions and Visualizing Samples ---")
# Get class labels (0, 1, 2, 3)
true_labels_int = np.argmax(y_test, axis=1)
predicted_probabilities = model.predict(X_test)
predicted_labels_int = np.argmax(predicted_probabilities, axis=1)

num_samples_to_show = 8
plt.figure(figsize=(16, 12))
for i in range(num_samples_to_show):
    plt.subplot(num_samples_to_show, 1, i + 1)
    plt.plot(X_test[i].flatten()) # Flatten back to 1D for plotting
    plt.xticks([])
    plt.yticks([])

    true_class = true_labels_int[i]
    predicted_class = predicted_labels_int[i]

    color = 'green' if true_class == predicted_class else 'red'
    plt.title(f'True Class: {true_class}, Predicted Class: {predicted_class}', color=color)
    plt.grid(True)
plt.suptitle('Sample Test Predictions (Green=Correct, Red=Incorrect)', y=1.005, fontsize=16)
plt.tight_layout(rect=[0, 0.03, 1, 0.98])
plt.show()


# 4.4 Generate Classification Report and Confusion Matrix
#    Provide a detailed breakdown of performance per class.

print("\n--- Classification Report ---")
# TODO: Print classification report
# print(classification_report(true_labels_int, predicted_labels_int, target_names=[f'Class {i}' for i in range(n_classes)]))

print("\n--- Confusion Matrix ---")
# TODO: Plot confusion matrix
# cm = confusion_matrix(true_labels_int, predicted_labels_int)
# plt.figure(figsize=(8, 6))
# sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
#             xticklabels=[f'Class {i}' for i in range(n_classes)],
#             yticklabels=[f'Class {i}' for i in range(n_classes)])
# plt.xlabel('Predicted Label')
# plt.ylabel('True Label')
# plt.title('Confusion Matrix')
# plt.show()


## Part 5: Reflection and Further Exploration (5 points)

Answer the following questions based on your understanding and observations from this assignment.

### Your Answers to Reflection Questions:

1.  **How do `Conv1D` layers help in identifying patterns within time-series data? Explain the role of `kernel_size` in this context.**

    _(Your answer here)_

2.  **What is the purpose of `MaxPooling1D` layers in a 1D CNN, and how do they contribute to the model's efficiency and robustness?**

    _(Your answer here)_

3.  **Discuss the advantages and disadvantages of using a CNN for time-series classification compared to traditional methods like hand-crafted features + SVM/Random Forest, or other deep learning models like LSTMs.**

    * **Advantages:** _(Your answer here)_
    * **Disadvantages:** _(Your answer here)_

4.  **If you had much longer time-series signals (e.g., thousands of time steps) or signals with multiple features per time step (e.g., x, y, z accelerometer data), how would you adapt the `input_shape` and potentially the network architecture?**

    _(Your answer here)_


## Deliverables:

1.  This completed Jupyter Notebook (`cnn_time_series_classification_assignment.ipynb`) with all code cells executed and reflection questions answered.
2.  Ensure all plots are clearly visible and well-labeled within the notebook.