# CMSE 381 Final Project Template

**INSTRUCTIONS**: This is a template to help organize your project.  All projects should include the 5 major sections below (you do not need to use this template file).  If you use this file, complete your work below and remove content in parentheses. Also, remove this current cell.  

#### CMSE 381 Final Project
### &#9989; Group members: Mehrshad, Rithvik
### &#9989; Section_002
#### &#9989; 11/29/25

# ___PROJECT TITLE HERE___

## Background and Motivation

_(Provide context for the problem.  **Clearly state the question(s) you set
out to answer.**)_

## Methodology
_(How did you go about answering your question(s)? You should write some code here to demonstrate what the data is like and how, in principle, your method works. You can leave the variations related to specific results to the results section.)_

In [1]:
# ------------------------------------------------
# Import modules and find all CSV files
# ------------------------------------------------

import pandas as pd
import numpy as np
import glob
import os

# Path to folder containing the 193 CSV files
data_path = "Freiwald_Tsao_faceviews_AM_data_csv"

# list of all CSV files in that folder
csv_files = glob.glob(os.path.join(data_path, "*.csv"))

print("Number of CSV files found:", len(csv_files))
print("First 5 file names:")
for f in csv_files[:5]:
    print("  ", f)

Number of CSV files found: 193
First 5 file names:
   Freiwald_Tsao_faceviews_AM_data_csv/raster_data_bert_am_site070.csv
   Freiwald_Tsao_faceviews_AM_data_csv/raster_data_lupo_am_site181.csv
   Freiwald_Tsao_faceviews_AM_data_csv/raster_data_bert_am_site138.csv
   Freiwald_Tsao_faceviews_AM_data_csv/raster_data_bert_am_site110.csv
   Freiwald_Tsao_faceviews_AM_data_csv/raster_data_bert_am_site105.csv


### Data
_(Describe the data you are using. What variables are you using? What do they mean? Why did you choose them?)_

In [2]:
# ------------------------------------------------
# Find the neuron file with the maximum number of trials
# ------------------------------------------------

trial_counts = {}

# Count number of rows (trials) for each file
for f in csv_files:
    full_df = pd.read_csv(f)              # load full file to get shape
    trial_counts[f] = full_df.shape[0]    # number of rows = trials

# Convert to sorted list (descending by number of trials)
sorted_trials = sorted(trial_counts.items(), key=lambda x: x[1], reverse=True)

print("Top 5 files with the most trials:")
for f, count in sorted_trials[:5]:
    print(f"{f}  -->  {count} trials")

# Choose the file with maximum trials
best_file = sorted_trials[0][0]

print("\nSelected file with highest number of trials:")
print(best_file)

# Load this file as the example_file for Cell 2+
example_file = pd.read_csv(best_file)

Top 5 files with the most trials:
Freiwald_Tsao_faceviews_AM_data_csv/raster_data_bert_am_site185.csv  -->  2685 trials
Freiwald_Tsao_faceviews_AM_data_csv/raster_data_bert_am_site186.csv  -->  2685 trials
Freiwald_Tsao_faceviews_AM_data_csv/raster_data_bert_am_site265.csv  -->  2431 trials
Freiwald_Tsao_faceviews_AM_data_csv/raster_data_lupo_am_site221.csv  -->  2312 trials
Freiwald_Tsao_faceviews_AM_data_csv/raster_data_lupo_am_site223.csv  -->  2312 trials

Selected file with highest number of trials:
Freiwald_Tsao_faceviews_AM_data_csv/raster_data_bert_am_site185.csv
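
The loop above parses every CSV in full just to read its row count. As a lighter alternative, here is a minimal sketch that counts lines instead. The helper name `count_trials` is hypothetical, and the sketch assumes each file has exactly one header row and no quoted fields that span several lines. The demo file contents are made up.

```python
import os
import tempfile


def count_trials(path):
    """Count data rows in a CSV by counting non-empty lines after the header.

    Assumes one header row and no quoted fields that span several lines.
    """
    with open(path, "r", newline="") as handle:
        next(handle, None)  # skip the header row
        return sum(1 for line in handle if line.strip())


# Small self-check on a throwaway file (made-up contents, not the real data).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.csv")
    with open(demo_path, "w") as handle:
        handle.write("labels.person,time.1_2\n1,0\n2,1\n3,0\n")
    demo_count = count_trials(demo_path)

print("Rows counted:", demo_count)
```

If the assumptions hold for these files, `count_trials(f)` could stand in for `pd.read_csv(f).shape[0]` inside the loop, but it is worth checking the two agree on a few files first.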


In [3]:
# ------------------------------------------------
# Inspect the selected neuron file (most trials)
# ------------------------------------------------

# Load the selected neuron file
example_file = pd.read_csv(best_file)

print("Inspecting:", best_file)
print("Shape (rows = trials, columns = labels + time bins):", example_file.shape)

# Show first 5 rows
display(example_file.head())

# Identify label columns
label_cols = [c for c in example_file.columns 
              if c.startswith("site_info") or c.startswith("labels")]

# Identify time-bin columns (these hold neural spikes)
time_cols = [c for c in example_file.columns if c.startswith("time")]

print("\nNumber of label columns:", len(label_cols))
print("Label columns:", label_cols)

print("\nNumber of time-bin columns:", len(time_cols))
print("First 5 time columns:", time_cols[:5])
print("Last 5 time columns:", time_cols[-5:])

Inspecting: Freiwald_Tsao_faceviews_AM_data_csv/raster_data_bert_am_site185.csv
Shape (rows = trials, columns = labels + time bins): (2685, 806)


Unnamed: 0,site_info.monkey,site_info.region,labels.stimID,labels.person,labels.orientation,labels.orient_person_combo,time.1_2,time.2_3,time.3_4,time.4_5,...,time.791_792,time.792_793,time.793_794,time.794_795,time.795_796,time.796_797,time.797_798,time.798_799,time.799_800,time.800_801
0,bert,am,1,1,front,front 1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,bert,am,1,1,front,front 1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,bert,am,1,1,front,front 1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,bert,am,1,1,front,front 1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,bert,am,1,1,front,front 1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0



Number of label columns: 6
Label columns: ['site_info.monkey', 'site_info.region', 'labels.stimID', 'labels.person', 'labels.orientation', 'labels.orient_person_combo']

Number of time-bin columns: 800
First 5 time columns: ['time.1_2', 'time.2_3', 'time.3_4', 'time.4_5', 'time.5_6']
Last 5 time columns: ['time.796_797', 'time.797_798', 'time.798_799', 'time.799_800', 'time.800_801']
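
The time-bin column names appear to encode a start and an end value. A minimal sketch for checking the bin width from the names alone follows. The helper `parse_time_bin` is hypothetical, and it assumes the two numbers in each name are the bin's start and end; the column names themselves do not state the units.

```python
def parse_time_bin(column_name):
    """Split a name like 'time.1_2' into integer (start, end) bin edges."""
    prefix, _, edges = column_name.partition(".")
    if prefix != "time":
        raise ValueError(f"not a time-bin column: {column_name!r}")
    start, end = edges.split("_")
    return int(start), int(end)


# Column names copied from the output above.
sample_cols = ["time.1_2", "time.2_3", "time.799_800", "time.800_801"]
bin_edges = [parse_time_bin(c) for c in sample_cols]
bin_widths = {end - start for start, end in bin_edges}

print(bin_edges)
print("Distinct bin widths:", bin_widths)
```

Applying the same helper to the full `time_cols` list would show whether every bin has the same width and whether the columns are stored in increasing time order.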


In [4]:
# ------------------------------------------------
# Extract labels for this neuron
# ------------------------------------------------

# Identity labels
y_identity = example_file["labels.person"].values

# Orientation labels 
y_orientation_str = example_file["labels.orientation"].values

# Convert orientation strings to integer category codes
orientation_categories = pd.Categorical(y_orientation_str)
y_orientation = orientation_categories.codes  # integer encoding

# Print shapes
print("Identity labels shape:", y_identity.shape)
print("Orientation labels (string) shape:", y_orientation_str.shape)
print("Orientation labels (coded) shape:", y_orientation.shape)

# Unique identity values
print("\nUnique identities present:", np.unique(y_identity))
print("Number of unique identities:", len(np.unique(y_identity)))

# Unique orientations
print("\nUnique orientations (string):", orientation_categories.categories)
print("Number of unique orientations:", len(orientation_categories.categories))

# Orientation code -> label mapping
print("\nOrientation code mapping:")
for code, label in enumerate(orientation_categories.categories):
    print(f"  {code} -> {label}")

Identity labels shape: (2685,)
Orientation labels (string) shape: (2685,)
Orientation labels (coded) shape: (2685,)

Unique identities present: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25]
Number of unique identities: 25

Unique orientations (string): Index(['back', 'down', 'front', 'left 3/4', 'left profile', 'right 3/4',
       'right profile', 'up'],
      dtype='object')
Number of unique orientations: 8

Orientation code mapping:
  0 -> back
  1 -> down
  2 -> front
  3 -> left 3/4
  4 -> left profile
  5 -> right 3/4
  6 -> right profile
  7 -> up
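
The integer codes above follow the sorted order of the label strings. A minimal pure-Python sketch of that encoding is shown here. The helper `encode_labels` is hypothetical; it imitates, rather than reproduces, what `pd.Categorical` does for plain string labels.

```python
def encode_labels(labels):
    """Map each distinct label to an integer code, in sorted label order."""
    categories = sorted(set(labels))
    code_of = {label: code for code, label in enumerate(categories)}
    codes = [code_of[label] for label in labels]
    return codes, categories


# Toy labels using orientation names from the output above.
toy_labels = ["front", "up", "back", "front", "left 3/4"]
toy_codes, toy_categories = encode_labels(toy_labels)

print(toy_categories)
print(toy_codes)

# Decoding inverts the mapping: look each code up in the category list.
decoded = [toy_categories[c] for c in toy_codes]
print(decoded)
```

Note that the codes depend on which labels are present: in this toy example `front` gets code 1, not 2, because only four orientations appear. Codes are therefore comparable across files only if every file contains the same set of orientations.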


In [5]:
# ------------------------------------------------
# Extract spike-count features for this neuron
# ------------------------------------------------

# first 200 ms of data
time_cols_0_200 = time_cols[:200]

print("Number of time bins used:", len(time_cols_0_200))
print("First 5 selected time bins:", time_cols_0_200[:5])
print("Last 5 selected time bins:", time_cols_0_200[-5:])

# Compute spike counts per trial
X_neuron = example_file[time_cols_0_200].sum(axis=1).values

print("\nFeature vector X_neuron shape:", X_neuron.shape)
print("First 10 spike counts:", X_neuron[:10])
print("Min/Max spike count:", X_neuron.min(), X_neuron.max())

Number of time bins used: 200
First 5 selected time bins: ['time.1_2', 'time.2_3', 'time.3_4', 'time.4_5', 'time.5_6']
Last 5 selected time bins: ['time.196_197', 'time.197_198', 'time.198_199', 'time.199_200', 'time.200_201']

Feature vector X_neuron shape: (2685,)
First 10 spike counts: [4 6 7 5 4 6 5 6 6 5]
Min/Max spike count: 0 9
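
The feature for each trial is the sum of the per-bin values over the chosen window. A minimal pure-Python sketch of that computation is given below, assuming each bin stores the number of spikes in that bin. The helper `spike_count` and the toy spike trains are made up for illustration.

```python
def spike_count(spike_train, start_bin, stop_bin):
    """Sum spikes in bins start_bin..stop_bin-1 (0-based, stop exclusive)."""
    return sum(spike_train[start_bin:stop_bin])


# Made-up spike trains (1 = spike in that bin); not taken from the data set.
toy_trials = [
    [0, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0, 0, 0],
]

# Count spikes in the first four bins of every trial.
toy_counts = [spike_count(trial, 0, 4) for trial in toy_trials]
print(toy_counts)
```

Summing over a window discards spike timing within the window, so two trials with the same count but different spike times produce identical features.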


In [6]:
# ------------------------------------------------
# Reshape feature vector to 2D matrix
# ------------------------------------------------

# Reshape into (n_samples, n_features)
X = X_neuron.reshape(-1, 1)

print("X shape:", X.shape)
print("y_identity shape:", y_identity.shape)
print("y_orientation shape:", y_orientation.shape)

# preview
print("\nFirst 10 rows of X:")
print(X[:10])

X shape: (2685, 1)
y_identity shape: (2685,)
y_orientation shape: (2685,)

First 10 rows of X:
[[4]
 [6]
 [7]
 [5]
 [4]
 [6]
 [5]
 [6]
 [6]
 [5]]


### Models for classification _(if applicable)_
_(What models will you be using for classification? Why did you choose to use them? What questions would you answer with them? How would you evaluate each model? What cross-validation method did you use?)_

In [None]:
# you may add some code here to show how the model works in principle
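
Whatever classifier is chosen, its accuracy needs a reference point. One common reference, sketched here under stated assumptions and not necessarily the evaluation used in this project, is the accuracy of always predicting the most frequent class. The helper `majority_baseline_accuracy` is hypothetical and the toy labels are made up. The class counts of 8 and 25 match the numbers of orientations and identities printed earlier, but whether the classes are balanced in the data is not shown.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Made-up labels for illustration; not drawn from the data set.
toy_labels = ["front", "front", "front", "back", "up", "up"]
baseline = majority_baseline_accuracy(toy_labels)
print(f"Majority-class baseline: {baseline:.2f}")

# With perfectly balanced classes the baseline equals 1 / n_classes.
balanced_8 = majority_baseline_accuracy(list(range(8)) * 10)
balanced_25 = majority_baseline_accuracy(list(range(25)) * 10)
print(f"Balanced 8 classes:  {balanced_8:.3f}")
print(f"Balanced 25 classes: {balanced_25:.3f}")
```

A model that does not beat this baseline on held-out trials has not learned anything useful from the spike counts.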

### Models for regression _(if applicable)_
_(What models will you be using for regression? Why did you choose to use them? What questions would you answer with them? How would you evaluate each model? What cross-validation method did you use?)_

In [None]:
# you may add some code here to show how the model works in principle

### Other methods used _(if applicable)_

_(If this is a preprocessing step to prepare your data for regression or classification models, you should put this subsection before your explanation for the regression or classification models.)_

_(What other methods did you use? Why did you choose to use them? What questions would you answer with them? How would you evaluate the results? What cross-validation method did you use, when applicable?)_

In [None]:
# you may add some code here to show how the method works in principle


## Results

_(What did you find when you carried out your methods? Some of your code related to
presenting results/figures/data may be replicated from the methods section or may only be present in
this section. All of the plots that you plan on using for your presentation should be present in this
section)_

### Classification results
_(What are you trying to do here?)_

In [None]:
# how did you do it

_(How do you interpret what you see?)_

_(What are you doing next?)_

In [None]:
# how did you do it (etc. etc.)

### Regression results
_(What are you trying to do here?)_

In [None]:
# how did you do it

_(How do you interpret what you see?)_

_(What are you doing next?)_

In [None]:
# how did you do it (etc. etc.)

### Other results
_(What are you trying to do here?)_

In [None]:
# how did you do it

_(How do you interpret what you see?)_

_(What are you doing next?)_

In [None]:
# how did you do it (etc. etc.)

## Discussion and Conclusion

_(What did you learn from your results? What obstacles did you run into? What would you do differently next time? Clearly provide quantitative answers to your question(s). At least one of your questions should be answered with numbers. That is, it is not sufficient to answer "yes" or "no"; instead, say something quantitative, such as "variable 1 increased roughly 10% for every 1-year increase in variable 2.")_

### Discussion on the classification results

### Discussion on the regression results

### Discussion on the other results

### Conclusion and future steps

## Author contribution

_(Please describe the contribution of each member of the group.)_

## References

_(List the source(s) for any data and/or literature cited in your project. Ideally, this should be formatted using a formal citation format (MLA, APA, or other; your choice!). Multiple free online citation generators are available, such as <a href="http://www.easybib.com/style">http://www.easybib.com/style</a>. **Important:** if you use **any** code that you find on the internet for your project, you **must** cite it or you risk losing most/all of the points for your project.)_