### Step 1: Setup for Local Training

This notebook is configured for local training on an NVIDIA GPU.

**Prerequisites:**
- NVIDIA GPU with CUDA support
- Python 3.11+ installed
- Virtual environment activated (if using venv)
- Dataset downloaded to `data/` directory

In [1]:
# Check GPU availability
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("⚠️ WARNING: CUDA not available. Training will be slow on CPU.")

PyTorch version: 2.3.0+cu121
CUDA available: True
CUDA version: 12.1
GPU device: NVIDIA GeForce RTX 5070 Ti Laptop GPU
GPU memory: 12.82 GB


NVIDIA GeForce RTX 5070 Ti Laptop GPU with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
If you want to use the NVIDIA GeForce RTX 5070 Ti Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
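
The output above shows that `torch.cuda.is_available()` can report `True` even when the installed PyTorch build has no kernels for the GPU's architecture. A minimal sketch of an explicit check follows. The helper name `is_arch_supported` is hypothetical, and the same-major-version rule is a simplifying assumption, not necessarily the exact rule PyTorch applies internally.

```python
import re


def is_arch_supported(capability, arch_list):
    """Return True if any compiled sm_XX target shares the GPU's major version."""
    major, _minor = capability
    compiled = []
    for arch in arch_list:
        match = re.match(r"sm_(\d+)", arch)
        if match:
            compiled.append(int(match.group(1)))
    # Simplified rule (assumption): targets with the same major version count as compatible
    return any(sm // 10 == major for sm in compiled)


# Architectures listed in the warning above
compiled_archs = ["sm_50", "sm_60", "sm_61", "sm_70", "sm_75", "sm_80", "sm_86", "sm_90"]
print(is_arch_supported((12, 0), compiled_archs))  # sm_120 -> False
print(is_arch_supported((8, 6), compiled_archs))   # sm_86 -> True

try:
    import torch
except ImportError:
    torch = None  # lets the sketch run on machines without PyTorch

if torch is not None and torch.cuda.is_available():
    capability = torch.cuda.get_device_capability(0)
    if not is_arch_supported(capability, torch.cuda.get_arch_list()):
        print(f"sm_{capability[0]}{capability[1]} is not covered by this PyTorch build")
```

If the check fails, training on the GPU is likely to error out at the first CUDA kernel launch, so it is worth resolving the mismatch before Step 4.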



### Step 2: Navigate to Project Directory

Make sure you're in the project root directory. If running from a different location, adjust the path.

In [2]:
from pathlib import Path

# Get current working directory
project_root = Path.cwd()
print(f"Current directory: {project_root}")

# Verify we're in the right place (should have emg2qwerty folder)
if (project_root / "emg2qwerty").exists():
    print("✓ Found emg2qwerty package")
else:
    print("⚠️ Warning: emg2qwerty package not found. Make sure you're in the project root.")

# Verify dataset directory exists
if (project_root / "data").exists():
    print("✓ Found data directory")
else:
    print("⚠️ Warning: data directory not found. Dataset should be in 'data/' folder.")

Current directory: c:\Users\junji\Documents\C147_final\emg2qwerty
✓ Found emg2qwerty package
✓ Found data directory
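
If the notebook is launched from a subdirectory, the checks above will fail. One possible way to adjust the path is to walk upward until the package folder is found. This is a sketch; `find_project_root` is a hypothetical helper, not part of emg2qwerty.

```python
import os
from pathlib import Path


def find_project_root(start, marker="emg2qwerty"):
    """Walk upward from `start` and return the first directory containing `marker`."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    return None


root = find_project_root(Path.cwd())
if root is None:
    print("Project root not found; set the path manually.")
else:
    print(f"Project root: {root}")
    # os.chdir(root)  # uncomment to switch the notebook's working directory
```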


### Step 3: Install/Verify Required Packages

**Note:** If using a virtual environment, make sure it's activated before running this cell.

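The install commands in this cell are commented out, so by default it only checks that the key packages import and prints their versions. Uncomment the relevant line to install or update packages.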

In [3]:
# Install required packages
# Uncomment the line below if you need to install/update packages
# !pip install -r requirements.txt

# Or if using conda:
# !conda env update -f environment.yml

# Verify key packages
try:
    import pytorch_lightning as pl
    import torch
    import hydra
    print("✓ Key packages imported successfully")
    print(f"  - PyTorch Lightning: {pl.__version__}")
    print(f"  - PyTorch: {torch.__version__}")
    print(f"  - Hydra: {hydra.__version__}")
except ImportError as e:
    print(f"⚠️ Missing package: {e}")
    print("Run: pip install -r requirements.txt")


A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.2.6 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "c:\Users\junji\Documents\C147_final\emg2qwerty\venv\Lib\site-packages\ipykernel_launcher.py", line 18, in <module>
    app.launch_new_instance()
  File "c:\Users\junji\Documents\C147_final\emg2qwerty\venv\Lib\site-packages\traitlets\config\application.py", line 1075, in launch_instance
    app.start()
  File "c:\Users\junji\Documents\C147_final\emg2qwerty\venv\Lib\site-packages\ipykernel\kernelapp.py", line 758,

✓ Key packages imported successfully
  - PyTorch Lightning: 1.8.6
  - PyTorch: 2.3.0+cu121
  - Hydra: 1.3.2
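
The NumPy message above means the environment has NumPy 2.x installed alongside at least one module compiled against NumPy 1.x. A small sketch for detecting this case before importing anything heavy is shown below. `needs_numpy_pin` is a hypothetical helper, and the suggested pin simply repeats the `numpy<2` advice from the warning itself.

```python
from importlib import metadata


def needs_numpy_pin(version_string):
    """True when NumPy is 2.x or newer, the case the warning above describes."""
    major = int(version_string.split(".")[0])
    return major >= 2


print(needs_numpy_pin("2.2.6"))   # version from the warning -> True
print(needs_numpy_pin("1.26.4"))  # a 1.x release -> False

try:
    installed = metadata.version("numpy")
except metadata.PackageNotFoundError:
    installed = None  # NumPy is not installed in this environment

if installed is not None and needs_numpy_pin(installed):
    print(f'NumPy {installed} detected; consider: pip install "numpy<2"')
```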


### Step 4: Start Your Experiments!

**Important:**
- Make sure the dataset is in the `data/` directory
- Logs will be saved to `logs/YYYY-MM-DD/HH-MM-SS/`
- Checkpoints are saved to `logs/.../checkpoints/`
- TensorBoard logs are in `logs/.../logs/tensorboard/`

**Monitor Training:**
- View real-time progress in TensorBoard: `tensorboard --logdir logs/`
- Or check the console output below

#### Training

**Configuration:**
- GPU: Automatically detected (configured in `config/base.yaml`)
- Checkpoints: Saved to `logs/YYYY-MM-DD/HH-MM-SS/checkpoints/`
- Best model: Automatically saved based on `val/CER`
- Logs: TensorBoard logs in `logs/.../logs/tensorboard/`

**Training Options:**
- Single-user baseline: Use command below
- Custom epochs: Add `trainer.max_epochs=50` (default is 150)
- Custom batch size: Add `batch_size=16` (default is 32)

In [10]:
!python -m emg2qwerty.train \
user="single_user" \
trainer.max_epochs=50 \
batch_size=16

[2026-02-22 16:04:36,688][__main__][INFO] - 
Config:
user: single_user
dataset:
  train:
  - user: 89335547
    session: 2021-06-03-1622765527-keystrokes-dca-study@1-0efbe614-9ae6-4131-9192-4398359b4f5f
  - user: 89335547
    session: 2021-06-02-1622681518-keystrokes-dca-study@1-0efbe614-9ae6-4131-9192-4398359b4f5f
  - user: 89335547
    session: 2021-06-04-1622863166-keystrokes-dca-study@1-0efbe614-9ae6-4131-9192-4398359b4f5f
  - user: 89335547
    session: 2021-07-22-1627003020-keystrokes-dca-study@1-0efbe614-9ae6-4131-9192-4398359b4f5f
  - user: 89335547
    session: 2021-07-21-1626916256-keystrokes-dca-study@1-0efbe614-9ae6-4131-9192-4398359b4f5f
  - user: 89335547
    session: 2021-07-22-1627004019-keystrokes-dca-study@1-0efbe614-9ae6-4131-9192-4398359b4f5f
  - user: 89335547
    session: 2021-06-05-1622885888-keystrokes-dca-study@1-0efbe614-9ae6-4131-9192-4398359b4f5f
  - user: 89335547
    session: 2021-06-02-1622679967-keystrokes-dca-study@1-0efbe614-9ae6-4131-9192-4398359b4f5f

Global seed set to 1501
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
NVIDIA GeForce RTX 5070 Ti Laptop GPU with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
If you want to use the NVIDIA GeForce RTX 5070 Ti Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

Missing logger folder: c:\Users\junji\Documents\C147_final\emg2qwerty\logs\2026-02-22\16-04-36/logs\tensorboard
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
  if not hasattr(numpy, tp_name):
  if not hasattr(numpy, tp_name):
  "lr_options": generate_power_seq(LEARNING_RATE_CIFAR, 11),
  contrastive_task: Union[FeatureMapContrastiveTask] = FeatureMapContrastiveTask("01, 02, 11"),
  self.nce_loss = AmdimNCELoss(tclip)
  return _target_(*args, *
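
The overrides passed to the training command above can also be assembled in Python, which makes it easier to vary epochs or batch size across runs. This is a sketch only: `build_train_command` is a hypothetical helper, and it assumes the `key=value` override names shown under Training Options.

```python
import subprocess
import sys


def build_train_command(user="single_user", max_epochs=None, batch_size=None):
    """Build the argument list for `python -m emg2qwerty.train` with optional overrides."""
    cmd = [sys.executable, "-m", "emg2qwerty.train", f"user={user}"]
    if max_epochs is not None:
        cmd.append(f"trainer.max_epochs={max_epochs}")
    if batch_size is not None:
        cmd.append(f"batch_size={batch_size}")
    return cmd


cmd = build_train_command(max_epochs=50, batch_size=16)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to launch training
```

Passing the arguments as a list avoids shell-specific line continuation, which differs between Windows and Unix shells.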

#### Testing/Evaluation

**Testing Options:**
- Test on best checkpoint: Use the checkpoint path from training logs
- Test on specific checkpoint: Provide full path to `.ckpt` file
- Test with different decoders: `decoder=ctc_greedy` or `decoder=ctc_beam`

**Checkpoint Path Format:**
- Best model: `logs/YYYY-MM-DD/HH-MM-SS/checkpoints/epoch=X-val_CER=Y.Y.ckpt`
- Last model: `logs/YYYY-MM-DD/HH-MM-SS/checkpoints/last.ckpt`
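
Because the best-model filename encodes the validation CER, the best checkpoint in a run can be picked by parsing filenames and not by modification time. The sketch below assumes filenames follow the `epoch=X-val_CER=Y.Y.ckpt` format above. `best_checkpoint` is a hypothetical helper, and the `logs/run/` paths and the second CER value are made up for illustration.

```python
import re
from pathlib import Path

# Matches names such as "epoch=3-val_CER=43.8.ckpt"
CKPT_PATTERN = re.compile(r"epoch=(\d+)-val_CER=(\d+(?:\.\d+)?)\.ckpt$")


def best_checkpoint(paths):
    """Return the path with the lowest val_CER in its filename, or None."""
    scored = []
    for path in paths:
        match = CKPT_PATTERN.search(Path(path).name)
        if match:
            scored.append((float(match.group(2)), Path(path)))
    if not scored:
        return None
    # Lower character error rate is better
    return min(scored, key=lambda item: item[0])[1]


example = [
    "logs/run/checkpoints/epoch=3-val_CER=43.8.ckpt",
    "logs/run/checkpoints/epoch=7-val_CER=31.25.ckpt",  # illustrative value
    "logs/run/checkpoints/last.ckpt",                   # ignored: no CER in name
]
print(best_checkpoint(example))
```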

In [5]:
# Testing/Evaluation on trained model
# Replace the checkpoint path with your actual checkpoint path

# Example: Test on best checkpoint (greedy decoding)
# !python -m emg2qwerty.train \
#   user="single_user" \
#   checkpoint="logs/2024-01-15/14-30-45/checkpoints/epoch=3-val_CER=43.8.ckpt" \
#   train=False \
#   decoder=ctc_greedy

# Example: Test on last checkpoint (beam search decoding)
# !python -m emg2qwerty.train \
#   user="single_user" \
#   checkpoint="logs/2024-01-15/14-30-45/checkpoints/last.ckpt" \
#   train=False \
#   decoder=ctc_beam

# Example: Find the most recently modified checkpoint and print a test command
from pathlib import Path

# Search every run directory under logs/ for checkpoint files
checkpoint_dir = Path("logs")
checkpoints = list(checkpoint_dir.glob("**/checkpoints/*.ckpt"))
if checkpoints:
    latest_ckpt = max(checkpoints, key=lambda p: p.stat().st_mtime)
    print(f"Latest checkpoint found: {latest_ckpt}")
    print("\nTo test, run the following in a new cell:")
    print("!python -m emg2qwerty.train \\")
    print('  user="single_user" \\')
    print(f'  checkpoint="{latest_ckpt}" \\')
    print('  train=False \\')
    print('  decoder=ctc_greedy')
else:
    print("No checkpoints found. Train a model first.")

No checkpoints found. Train a model first.


#### Monitor Training with TensorBoard

Start TensorBoard in a separate terminal to monitor training in real time.

In [6]:
# Start TensorBoard to monitor training
# Run this in a separate terminal (not in notebook):
# tensorboard --logdir logs/

# Or, to run it in the background from the notebook:
import subprocess
from pathlib import Path

log_dir = Path("logs")
if log_dir.exists():
    print("To view TensorBoard, run in a separate terminal:")
    print(f"  tensorboard --logdir {log_dir.absolute()}")
    print("\nThen open: http://localhost:6006")
    print("\nOr uncomment below to start TensorBoard in background:")
    # Uncomment to start TensorBoard (will run in background)
    # subprocess.Popen(["tensorboard", "--logdir", str(log_dir.absolute())], 
    #                  stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    # print("TensorBoard started! Open http://localhost:6006")
else:
    print("No logs directory found yet. Start training first.")

No logs directory found yet. Start training first.
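
To point TensorBoard at a single run and not the whole `logs/` tree, the newest run directory can be located from the `logs/YYYY-MM-DD/HH-MM-SS/` layout described in Step 4. This is a sketch; `latest_run_dir` is a hypothetical helper that relies on the zero-padded names sorting chronologically.

```python
import re
from pathlib import Path

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
TIME_RE = re.compile(r"\d{2}-\d{2}-\d{2}")


def latest_run_dir(log_root):
    """Return the newest <log_root>/YYYY-MM-DD/HH-MM-SS directory by name, or None."""
    log_root = Path(log_root)
    if not log_root.is_dir():
        return None
    runs = [
        p for p in log_root.glob("*/*")
        if p.is_dir() and DATE_RE.fullmatch(p.parent.name) and TIME_RE.fullmatch(p.name)
    ]
    # Zero-padded date and time strings sort in chronological order
    return max(runs, key=lambda p: (p.parent.name, p.name), default=None)


run = latest_run_dir("logs")
if run is None:
    print("No run directories found yet.")
else:
    print(f"tensorboard --logdir {run}")
```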
