# Loading Features and Clustering
The content of this tutorial was developed by Team 3 as part of an engineering project management class. It shows how to load the features extracted in the previous tutorial and set them up as activations. We then use the activations in two algorithms, XGBoost and Random Forest, and proceed to visualize the clusters using PCA plots.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Set Up The Activations From The Extracted Features

This cell imports the libraries used in the rest of the tutorial, including PyTorch, scikit-learn, and matplotlib, for data manipulation, neural network modeling, visualization, and clustering. It then loads pre-computed features (referred to as "checkpoints") taken from different layers of various convolutional neural network (CNN) models. Each checkpoint is a dictionary holding the features and their corresponding labels, which are stored in separate variables (activations1, activations2, activations3, labels1, labels2, labels3) for the data analysis, feature manipulation, and clustering steps that follow.

In [None]:
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision.datasets import ImageFolder
from torchvision.transforms import ToTensor
from torchvision import transforms
import matplotlib.pyplot as plt
import csv
import time
import numpy as np
from scipy.signal import resample_poly
import os
import pandas as pd
import math
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import random_split
from sklearn.cluster import KMeans

# Each "checkpoint" holds features from a different layer of a different CNN model.
# Change the file paths to match where your .pth files are stored (for example, a folder on the mounted Drive).
checkpoint1 = torch.load('activations1.pth')
checkpoint2 = torch.load('activations2.pth')
checkpoint3 = torch.load('activations3.pth')

activations1 = checkpoint1['activations1']
labels1 = checkpoint1['labels1']

activations2 = checkpoint2['activations2']
labels2 = checkpoint2['labels2']

activations3 = checkpoint3['activations3']
labels3 = checkpoint3['labels3']

# You can use the clustering algorithms with features from other neural networks.
# This tutorial helps you begin to understand the objective of the cloudd-rf project.

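If `torch.load` cannot find a checkpoint, it raises `FileNotFoundError`. A minimal sketch of a check to run before loading is shown here. The helper name `find_missing` and its `data_dir` argument are hypothetical and not part of the original notebook, and the folder should be whatever location actually holds your `.pth` files.

```python
from pathlib import Path

# File names taken from the loading cell above.
CHECKPOINT_FILES = ["activations1.pth", "activations2.pth", "activations3.pth"]

def find_missing(data_dir, filenames=CHECKPOINT_FILES):
    """Return the checkpoint paths under data_dir that do not exist (hypothetical helper)."""
    folder = Path(data_dir)
    return [str(folder / name) for name in filenames if not (folder / name).is_file()]

# Assumption: the checkpoints sit in the current working directory.
missing = find_missing(".")
if missing:
    print("Missing checkpoint files:", missing)
else:
    print("All checkpoint files found.")
```

If any paths are reported as missing, adjust the folder passed to `find_missing` and the paths passed to `torch.load` so they point at the same location.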

# Use Activations in XGBoost Cluster Algorithm

XGBoost (eXtreme Gradient Boosting) is a machine learning algorithm for supervised learning tasks, particularly regression and classification; here it is used to classify the activations into the five classes listed in word_map. It builds decision trees sequentially, where each new tree corrects the errors made by the trees before it. XGBoost combines gradient boosting with several enhancements, such as regularization, parallel processing, and built-in handling of missing values. Each new tree is fit using the gradients of a chosen loss function with respect to the current predictions, so adding it reduces the loss. XGBoost is known for high predictive accuracy, scalability, and efficiency, which makes it a popular choice in data science competitions and real-world applications.
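The idea of trees that are added one after another to correct earlier errors can be sketched with plain scikit-learn regression trees. This is only an illustration of gradient boosting with a squared-error loss on synthetic data, not XGBoost's actual implementation (which adds regularization and second-order gradient information). The data, learning rate, and tree depth are assumptions chosen for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))               # synthetic 1-D inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)    # noisy target

learning_rate = 0.3
prediction = np.full_like(y, y.mean())              # start from a constant model
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(20):
    residual = y - prediction                       # negative gradient of the squared error
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residual)                           # the new tree models what is still wrong
    prediction += learning_rate * tree.predict(X)   # take a small step toward the target
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"training MSE: {mse_history[0]:.4f} -> {mse_history[-1]:.4f}")
```

The training error shrinks as trees are added, which is the behavior the `num_rounds` setting controls in the XGBoost cell below.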

In [None]:
# Replace activations2 below with another team's activations to try different features.
word_map = { 0: '2-ASK', 1: 'BPSK', 2: 'Cnst T', 3: 'P-FMCW', 4: 'N-FMCW'}
label = ['2-ASK', 'BPSK', 'Cnst T', 'P-FMCW', 'N-FMCW']

import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Zero-pad activations3 along axis 1 so that it matches the axis-1 size of activations2.
# Assumes activations3.shape[1] <= activations2.shape[1] and that any remaining axes already match.
new = activations2.new_zeros((activations3.shape[0],) + tuple(activations2.shape[1:]))
new[:, :activations3.shape[1]] = activations3
activations3 = new


# Combine activations and labels from different checkpoints
all_activations = np.concatenate([activations3, activations2], axis=0)
all_labels = np.concatenate([labels3, labels2], axis=0)

# Flatten activations if needed
flat_activations = all_activations.reshape(all_activations.shape[0], -1)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(flat_activations, all_labels, test_size=0.2, random_state=42)

# Convert data to DMatrix format for XGBoost
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Set XGBoost parameters
params = {
    'objective': 'multi:softmax',  # for multiclass classification
    'num_class': len(label),       # number of classes
    'eval_metric': 'mlogloss'      # use cross-entropy loss for multiclass
}

# Train the XGBoost model
num_rounds = 100
xgb_model = xgb.train(params, dtrain, num_rounds)

# Make predictions on the test set
y_pred = xgb_model.predict(dtest)

# Convert predictions to integers
y_pred = y_pred.astype(int)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {100*accuracy} %")
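
Overall accuracy hides which classes are confused with each other. As a sketch, a confusion matrix and the per-class recall can be computed with scikit-learn. The label arrays here are small made-up stand-ins so that the block runs on its own; in the notebook you would pass the real `y_test` and `y_pred` instead.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

word_map = {0: '2-ASK', 1: 'BPSK', 2: 'Cnst T', 3: 'P-FMCW', 4: 'N-FMCW'}

# Made-up labels for illustration only; replace with y_test and y_pred.
demo_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
demo_pred = np.array([0, 0, 1, 2, 2, 2, 3, 4, 4, 4])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(demo_true, demo_pred, labels=list(word_map))
per_class_recall = cm.diagonal() / cm.sum(axis=1)

for idx, name in word_map.items():
    print(f"{name:>7}: {100 * per_class_recall[idx]:.1f} % of samples correctly classified")
```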


# Use Activations in Random Forest Cluster Algorithm

Random Forest is an ensemble learning technique used for both classification and regression tasks. It builds multiple decision trees during training and combines their predictions, which is typically more accurate and robust than a single tree. Each tree is trained on a bootstrap sample of the training data (rows drawn with replacement) and considers only a random subset of features at each split; this randomness decorrelates the trees and reduces overfitting. During prediction, each tree independently predicts the target variable, and the final prediction aggregates the predictions from all trees (e.g., by taking the majority vote for classification or the average for regression). Random Forest is known for its simplicity, scalability, and ability to handle high-dimensional data and nonlinear relationships, making it a popular choice for a wide range of machine learning applications.
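
The train-on-random-subsets-then-vote procedure described above can be sketched by hand with individual decision trees. This is an illustration on synthetic data and not a reproduction of `RandomForestClassifier`; for instance, scikit-learn averages the trees' predicted class probabilities instead of counting hard votes. The dataset, tree count, and variable names are assumptions for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_fit, y_fit, X_eval, y_eval = X[:200], y[:200], X[200:], y[200:]

trees = []
for seed in range(25):
    # Bootstrap sample: draw training rows with replacement.
    rows = rng.integers(0, len(X_fit), size=len(X_fit))
    # max_features="sqrt" makes each split consider a random subset of features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=seed)
    trees.append(tree.fit(X_fit[rows], y_fit[rows]))

# Each tree votes; the most common class per sample wins.
votes = np.stack([t.predict(X_eval) for t in trees])   # shape: (n_trees, n_samples)
majority = np.array([np.bincount(col, minlength=3).argmax() for col in votes.T])

print(f"ensemble accuracy on held-out samples: {np.mean(majority == y_eval):.2f}")
```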

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Replace activations2 below with another team's activations to try different features.
word_map = {0: '2-ASK', 1: 'BPSK', 2: 'Cnst T', 3: 'P-FMCW', 4: 'N-FMCW'}
label = ['2-ASK', 'BPSK', 'Cnst T', 'P-FMCW', 'N-FMCW']

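# Flatten both sets of activations to 2-D (samples x features).
# This relies on activations3 having been padded to match activations2 in the XGBoost cell above.
flat_activations = activations3.reshape(activations3.shape[0], -1)
flat_activations2 = activations2.reshape(activations2.shape[0], -1)

# Combine activations and labels from different checkpoints
all_activations = np.concatenate([flat_activations, flat_activations2], axis=0)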
all_labels = np.concatenate([labels3, labels2], axis=0)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(all_activations, all_labels, test_size=0.2, random_state=42)

# Initialize Random Forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the Random Forest model
rf_classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = rf_classifier.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {100*accuracy} %")
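
The introduction mentions clustering and PCA plots, and `KMeans` is already imported at the top of the notebook. A minimal sketch of that step follows, assuming the activations have been flattened to a 2-D array of shape (samples, features). Synthetic blobs stand in for the real activations, and the function name `plot_clusters` and the choice of five clusters (one per entry in `word_map`) are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in for flattened activations: 500 samples, 64 features, 5 groups.
demo_activations, _ = make_blobs(n_samples=500, n_features=64, centers=5, random_state=0)

# Cluster in the full feature space, then project to 2-D only for plotting.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(demo_activations)
points_2d = PCA(n_components=2).fit_transform(demo_activations)

def plot_clusters(points, ids):
    """Scatter the 2-D PCA projection, colored by cluster assignment."""
    import matplotlib.pyplot as plt
    plt.scatter(points[:, 0], points[:, 1], c=ids, cmap="tab10", s=8)
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.title("K-means clusters in PCA space")
    plt.show()

print(points_2d.shape, np.bincount(cluster_ids))
```

Calling `plot_clusters(points_2d, cluster_ids)` in a notebook cell draws the scatter plot. To try it on the real data, pass the flattened activations in place of `demo_activations`.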