# ASL Neural Network Pipeline Notebook

This notebook contains all the steps necessary to train a neural network for the ASL Neural Network App project located at [this repository](https://github.com/TWilliamsA7/asl-neural-app/tree/main). Utility functions can also be found in the above repository under the src directory.

1. Setup: Configuration & Authentication
2. Environment: Initialization & Imports
3. Data: Acquisition & Preprocessing
4. Data: Loading & Splitting
5. Model: Architecture
6. Model: Training
7. Model: Evaluation

## Setup: Configuration & Authentication

This section sets up the authentication and configuration needed for the Colab environment.

In [1]:
# Import necessary modules for setup

from google.colab import userdata, auth
import os
import sys

### Create GitHub connection via Colab secrets

In [2]:
# Define repository details
USERNAME = "TWilliamsA7"
REPO_NAME = "asl-neural-app.git"
BRANCH_NAME = "main"

# Get PAT (Personal Access Token) stored in Colab Secrets
PAT = userdata.get("GITHUB_PAT")
if not PAT:
    raise ValueError("GITHUB_PAT secret not found!")

# Construct authenticated URL for accessing the repository
AUTHENTICATED_URL = f"https://{PAT}@github.com/{USERNAME}/{REPO_NAME}"
REPO_FOLDER = REPO_NAME.replace(".git", "")

# Set global Git configuration
!git config --global user.email "twilliamsa776@gmail.com"
!git config --global user.name "{USERNAME}"

print("Setup github connection and authenticated url successfully!")

Setup github connection and authenticated url successfully!
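
Because the PAT is embedded in `AUTHENTICATED_URL`, printing or logging that URL would expose the token. A minimal sketch of a redaction helper follows; `redact_url` is a hypothetical name used for illustration and is not part of the project repository.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Replace any credentials embedded in a URL with '***' (hypothetical helper)."""
    parts = urlsplit(url)
    if "@" not in parts.netloc:
        # No embedded credentials, nothing to hide
        return url
    # Keep only the host part that follows the last '@'
    host = parts.netloc.rsplit("@", 1)[1]
    return urlunsplit((parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment))
```

Calling `print(redact_url(AUTHENTICATED_URL))` would then show the repository address without the token.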


### Google Cloud Authentication

In [3]:
print("--- GCS Authentication ---")

auth.authenticate_user()

print("Google Cloud authentication complete.")

--- GCS Authentication ---
Google Cloud authentication complete.


## Environment: Initialization & Imports

### Clone GitHub Repository

In [4]:
# Clean up any existing clone (optional, but good for reliable restarts)
if os.path.isdir(REPO_FOLDER):
    print(f"Removing old {REPO_FOLDER} folder...")
    !rm -rf {REPO_FOLDER}

# Clone the repository using the authenticated URL
print(f"Cloning repository: {REPO_NAME}...")
!git clone {AUTHENTICATED_URL}

# Change directory into the cloned repository
%cd {REPO_FOLDER}
print(f"Current working directory: {os.getcwd()}")

Removing old asl-neural-app folder...
Cloning repository: asl-neural-app.git...
Cloning into 'asl-neural-app'...
remote: Enumerating objects: 92, done.
remote: Counting objects: 100% (92/92), done.
remote: Compressing objects: 100% (59/59), done.
remote: Total 92 (delta 39), reused 80 (delta 31), pack-reused 0 (from 0)
Receiving objects: 100% (92/92), 19.53 KiB | 9.76 MiB/s, done.
Resolving deltas: 100% (39/39), done.
/content/asl-neural-app
Current working directory: /content/asl-neural-app


### Install Dependencies

- Requires manually uploading a kaggle.json file

In [8]:
print("Upgrading pip, setuptools, and wheel...")
!pip install --upgrade pip setuptools wheel -q

print("Using preinstalled numpy and tensorflow dependencies")

print("Installing remaining project dependencies from requirements.txt...")
!pip install -r requirements.txt -q

print("Dependencies installed successfully.")

# Create the config directory where the Kaggle CLI looks for kaggle.json
!mkdir -p ~/.kaggle

# --- IMPORTANT: Manually upload kaggle.json to the ~/.kaggle folder now ---
print("\n--- MANUAL STEP REQUIRED ---")
print("1. Click the Folder icon (left sidebar).")
print("2. Navigate to the root folder (click the / symbol).")
print("3. Navigate to the hidden folder: .kaggle")
print("4. Upload your 'kaggle.json' file into the .kaggle folder.")
print("Proceed only after kaggle.json is uploaded.")

Upgrading pip, setuptools, and wheel...
Using preinstalled numpy and tensorflow dependencies
Installing remaining project dependencies from requirements.txt...
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ydf 0.13.0 requires protobuf<7.0.0,>=5.29.1, but you have protobuf 4.25.8 which is incompatible.
grpcio-status 1.71.2 requires protobuf<6.0dev,>=5.26.1, but you have protobuf 4.25.8 which is incompatible.
opentelemetry-proto 1.37.0 requires protobuf<7.0,>=5.0, but you have protobuf 4.25.8 which is incompatible.
Dependencies installed successfully.

--- MANUAL STEP REQUIRED ---
1. Click the Folder icon (left sidebar).
2. Navigate to the root folder (click the / symbol).
3. Navigate to the hidden folder: .kaggle
4. Upload your 'kaggle.json' file into the .kaggle folder.
Proceed only after kaggle.json is uploaded.
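
The manual upload could in principle be replaced by writing kaggle.json from values stored in Colab Secrets. The following is a sketch under assumptions: `write_kaggle_config` is a hypothetical helper, and the secret names mentioned afterwards are placeholders that would have to exist in your own Colab Secrets. It relies on the standard kaggle.json layout with `username` and `key` fields.

```python
import json
import os
from pathlib import Path


def write_kaggle_config(username: str, key: str, kaggle_dir=None) -> Path:
    """Write kaggle.json with owner-only permissions (hypothetical helper)."""
    if not username or not key:
        raise ValueError("Both a Kaggle username and an API key are required.")

    # Default to ~/.kaggle (in Colab, ~ resolves to /root)
    target_dir = Path(kaggle_dir) if kaggle_dir else Path.home() / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)

    config_path = target_dir / "kaggle.json"
    config_path.write_text(json.dumps({"username": username, "key": key}))

    # Restrict the file to its owner, since it contains an API key
    os.chmod(config_path, 0o600)
    return config_path
```

In Colab this might be called as `write_kaggle_config(userdata.get("KAGGLE_USERNAME"), userdata.get("KAGGLE_KEY"))`, assuming secrets with those names have been created.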


### Add the src directory to the import path for utility functions

In [9]:
sys.path.append('src')
print("Setup Complete. Colab environment is ready.")

Setup Complete. Colab environment is ready.
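
`sys.path.append('src')` depends on the current working directory and adds a duplicate entry each time the cell is rerun. A hedged alternative, using a hypothetical helper named `add_to_sys_path`, resolves the folder to an absolute path and adds it only once.

```python
import os
import sys


def add_to_sys_path(folder: str) -> str:
    """Add folder to sys.path as an absolute path, skipping duplicates."""
    absolute = os.path.abspath(folder)
    if absolute not in sys.path:
        sys.path.append(absolute)
    return absolute
```

Run from inside the cloned repository, `add_to_sys_path('src')` would keep the utility modules importable even if a later cell changes the working directory.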


## Data: Acquisition & Preprocessing

### Include necessary imports

In [10]:
import numpy as np

# Re-import in case earlier cells were not run
import os
import sys

# Ensure src accessibility
sys.path.append('src')

# Import utility functions
from data_utils import extract_keypoints



### Setup directories and constants

In [15]:
KAGGLE_DATASET_ID = "grassknoted/asl-alphabet"
DESTINATION_PATH = "sample_data"
PROCESSED_OUTPUT_DIR = 'processed_data'
DATA_ROOT_FOLDER_NAME = 'asl_alphabet_train'

os.makedirs(DESTINATION_PATH, exist_ok=True)
os.makedirs(PROCESSED_OUTPUT_DIR, exist_ok=True)

### Download Data via Kaggle API

In [17]:
print(f"Downloading dataset: {KAGGLE_DATASET_ID}")
!kaggle datasets download -d {KAGGLE_DATASET_ID} -p {DESTINATION_PATH} --unzip

# Define the exact root path to the image subfolders (A, B, C, etc.)
DATA_ROOT = os.path.join(DESTINATION_PATH, DATA_ROOT_FOLDER_NAME)
print(f"Image data root set to: {DATA_ROOT}")

Downloading dataset: grassknoted/asl-alphabet
Traceback (most recent call last):
  File "/usr/local/bin/kaggle", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/kaggle/cli.py", line 68, in main
    out = args.func(**command_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/kaggle/api/kaggle_api_extended.py", line 1741, in dataset_download_cli
    with self.build_kaggle_client() as kaggle:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/kaggle/api/kaggle_api_extended.py", line 688, in build_kaggle_client
    username=self.config_values['username'],
             ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'username'
Image data root set to: sample_data/asl_alphabet_train
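
The `KeyError: 'username'` in the traceback above most likely means the Kaggle CLI did not find usable credentials, for example because kaggle.json was missing from `~/.kaggle` or lacked a `username` field. Note that the final line is printed regardless, so the data root path is set even though nothing was downloaded. A small pre-flight check can surface the problem before the download is attempted. This is a sketch; `missing_kaggle_fields` is a hypothetical helper, not part of the Kaggle package.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("username", "key")


def missing_kaggle_fields(config_path=None) -> list:
    """Return the required fields absent from kaggle.json.

    All fields are reported as missing if the file cannot be read or parsed.
    """
    path = Path(config_path) if config_path else Path.home() / ".kaggle" / "kaggle.json"
    try:
        config = json.loads(path.read_text())
    except (OSError, ValueError):
        # File is absent, unreadable, or not valid JSON
        return list(REQUIRED_FIELDS)
    if not isinstance(config, dict):
        return list(REQUIRED_FIELDS)
    return [field for field in REQUIRED_FIELDS if not config.get(field)]
```

Raising an error when `missing_kaggle_fields()` returns a non-empty list would stop the notebook before the download cell instead of leaving `DATA_ROOT` pointing at a folder that does not exist.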


### Feature Extraction and Array Storage

In [None]:
import cv2  # Used below for the BGR-to-RGB conversion; not imported in earlier cells

GCS_BUCKET_NAME = "gs://asl-keypoint-data-storage-2025"
GCS_DESTINATION_FOLDER = "processed_features_v1"

def create_and_save_features():
    X_keypoints, X_cnn, y_labels = [], [], []

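    # Collect class folders first so label indices stay contiguous
    # (hidden entries and stray files are filtered out before numbering)
    class_names = [
        name for name in sorted(os.listdir(DATA_ROOT))
        if os.path.isdir(os.path.join(DATA_ROOT, name)) and not name.startswith('.')
    ]

    # Iterate through all class folders
    for label_index, class_name in enumerate(class_names):
        class_path = os.path.join(DATA_ROOT, class_name)

        print(f"Processing Class: {class_name} (Label: {label_index})")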

        for image_name in os.listdir(class_path):
            image_path = os.path.join(class_path, image_name)

            # Use the imported modular function
            keypoints, resized_img = extract_keypoints(image_path)

            if keypoints is not None:
                # Store keypoints
                X_keypoints.append(keypoints)
                # Normalize and store CNN image data
                X_cnn.append(cv2.cvtColor(resized_img, cv2.COLOR_BGR2RGB) / 255.0)
                y_labels.append(label_index)

    # Convert to final NumPy arrays
    X_keypoints_array = np.array(X_keypoints, dtype=np.float32)
    X_cnn_array = np.array(X_cnn, dtype=np.float32)
    y_labels_array = np.array(y_labels, dtype=np.int32)

    TEMP_DIR = 'temp_feature_dump'
    os.makedirs(TEMP_DIR, exist_ok=True)

    # Save the processed arrays to the temporary local folder before upload
    np.save(os.path.join(TEMP_DIR, 'X_keypoints.npy'), X_keypoints_array)
    np.save(os.path.join(TEMP_DIR, 'X_cnn_images.npy'), X_cnn_array)
    np.save(os.path.join(TEMP_DIR, 'y_labels.npy'), y_labels_array)

    # Source is the local temp directory. Destination is the GCS path.
    GCS_PATH = f"{GCS_BUCKET_NAME}/{GCS_DESTINATION_FOLDER}"
    print(f"\nUploading processed features to {GCS_PATH}...")

    # The -m flag runs the command multi-threaded (faster) and -r copies the directory recursively
    !gsutil -m cp -r {TEMP_DIR} {GCS_PATH}

    print("\nUpload to GCS complete. Features are ready for training notebook.")

    print(f"\nSuccessfully processed {len(X_keypoints)} samples.")
    print(f"Keypoints Shape: {X_keypoints_array.shape}, Images Shape: {X_cnn_array.shape}")
    print(f"All processed arrays saved locally to '{TEMP_DIR}' and uploaded to {GCS_PATH}.")

# --- EXECUTION ---
create_and_save_features()
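
The integer labels stored in `y_labels.npy` are derived from the alphabetical order of the class folders, so any consumer of the arrays needs the same mapping to decode predictions. One way to make that mapping explicit is sketched below; `build_label_map` is a hypothetical helper and is not taken from the project's `data_utils` module. It filters out hidden entries and stray files before numbering, so the labels are contiguous.

```python
import os


def build_label_map(data_root: str) -> dict:
    """Map each visible class folder name to a contiguous integer label."""
    class_names = sorted(
        name
        for name in os.listdir(data_root)
        if not name.startswith(".") and os.path.isdir(os.path.join(data_root, name))
    )
    return {name: index for index, name in enumerate(class_names)}
```

Saving this dictionary (for example as JSON) next to the arrays would let training and inference code recover class names from label indices.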