# Neural Network: GPU Notebook
### In-Progress

## Notebook Summary

## Table of Contents

- [Notebook Setup](#Notebook-Setup)
- [Read in Parquet](#Read-in-Parquet)
- [MLP Using PyTorch](#Encode-Features)
- [Results](#Results)
- [Save as Pickle](#Save-as_Pickle)

## Notebook Setup

Significant functions used can be found in [assignment_3_tools.py](./assignment_3_tools.py)

### PyTorch Installation

PyTorch allows for GPU acceleration. If a cuda compatible GPU is avaiable you can install cuda and pytorch. Below is the method used for installing cuda. Make sure to install the appropriate pytorch for the cuda version installed.

```bash
# Install Cuda
sudo apt-get install nvidia-cuda-toolkit

# Install PyTorch
conda install pytorch torchvision torchaudio pytorch-cuda -c pytorch -c nvidia
```

In [1]:
import os
import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader, Subset
from sklearn.model_selection import KFold
from assignment_3_tools import dict_to_parquet, pickle_to_dict, parquet_extract

In [29]:
# Cuda, Python, and PyTorch Versions
! nvcc --version
! python3 --version
print(f"PyTorch {torch.__version__}")

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
Python 3.10.12
PyTorch 2.3.0


In [30]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Using cuda device


## Read in Parquet

In [7]:
extract_pq = ['X_train','y_train','X_test','y_test']
pq_jar = parquet_extract("../../Data/GoogleDrive/Encoded_Data", extract_pq)

X_train = pq_jar['X_train']
y_train = pq_jar['y_train']
X_test = pq_jar['X_test']
y_test = pq_jar['y_test']

## MLP Using PyTorch

In [9]:
# Assuming X and y are your input features and labels, respectively
# Replace these with your actual data
X = np.array(X_train)  # Example NumPy array input
y = np.array(y_train).squeeze()  # Example NumPy array target labels

# Convert to tensors
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.long)

# Combine into a dataset
dataset = TensorDataset(X_tensor, y_tensor)

# Define the MLP model
class SimpleMLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(SimpleMLP, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
            nn.Softmax(dim=1)
        )

    def forward(self, x):
        return self.model(x)

# Initialize cross-validation parameters
k_folds = 5
kf = KFold(n_splits=k_folds, shuffle=True)

# Training loop with cross-validation
input_dim = X.shape[1]
hidden_dim = 64
output_dim = 2  # Adjust to your class count
num_epochs = 10
batch_size = 32
fold_results = {}

# Start cross-validation
for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    print(f'Fold {fold + 1}/{k_folds}')
    
    # Create training and validation data loaders for this fold
    train_subset = Subset(dataset, train_idx)
    val_subset = Subset(dataset, val_idx)
    train_loader = DataLoader(train_subset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_subset, batch_size=batch_size, shuffle=False)
    
    # Initialize model, loss function, and optimizer
    model = SimpleMLP(input_dim, hidden_dim, output_dim)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    
    # Training for each epoch
    for epoch in range(num_epochs):
        model.train()
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
    
    # Evaluate on the validation set
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    # Calculate accuracy
    accuracy = correct / total * 100
    fold_results[fold + 1] = accuracy
    print(f'Accuracy for Fold {fold + 1}: {accuracy:.2f}%')

# Display overall results
print("\nCross-validation results:")
for fold, acc in fold_results.items():
    print(f"Fold {fold}: {acc:.2f}%")

Fold 1/5


RuntimeError: Expected floating point type for target with class probabilities, got Long