### fennec-ml Example  
This example demonstrates the use of the data utilities in the fennec-ml library.

#### Setup
>1. Make sure data_utils_example.ipynb (this file) is in its own folder. This example program creates all other files and folders it needs.
>2. Install fennec-ml with:  
```pip install fennec-ml```  
fennec-ml will install all other dependencies needed


>>NOTE: If you have already installed fennec-ml, make sure it is up to date by running:  
```pip install --upgrade fennec-ml```
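
If you are not sure whether the upgrade took effect, the standard library can report the installed version. This is a minimal sketch: `installed_version` is a hypothetical helper, and it assumes the package is registered under the distribution name `fennec-ml` used in the pip commands above.

```python
# Sketch: report which fennec-ml version pip installed (standard library only).
# installed_version is a hypothetical helper, not part of fennec-ml.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "fennec-ml") -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


v = installed_version()
if v is None:
    print("fennec-ml is not installed; run: pip install fennec-ml")
else:
    print(f"fennec-ml {v} is installed")
```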


#### **Make example files**  

##### YOU DON'T NEED TO UNDERSTAND THE FOLLOWING CODE; IT JUST PERFORMS THE SETUP DESCRIBED BELOW

**Create excel files** <br>
The following code will make 5 example Excel files named according to the 2024-2025 naming convention. A file will be made for each CG category: AA, BB, CC, DD, and EE. <br>
>- Each of the five Excel files contains a single worksheet named "sheet 1". The worksheet has five columns labeled "col 1", "col 2", "col 3", "col 4", and "col 5", and contains 100 rows. <br>
>- Every cell in a worksheet contains the same number, which is specific to that file: the file with label AA contains only 1s, BB contains only 2s, CC contains only 3s, DD contains only 4s, and EE contains only 5s. <br>
>- **Please go look at them if you need to**

The Excel files will be stored in ```testing_data\raw_data``` <br><br>
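
The CG category each file belongs to is encoded in its name. Below is a minimal sketch of reading it back, assuming (from the example file names only) that the CG category is always the second underscore-separated field; `parse_cg_label` and `VALID_LABELS` are hypothetical names, not part of fennec-ml.

```python
# Sketch: pull the CG category out of a file name that follows the
# 2024-2025 convention used above, e.g. "clip123B45_AA_L_0.xlsx".
# ASSUMPTION: the CG label is always the second "_"-separated field.
# parse_cg_label is a hypothetical helper, not part of fennec-ml.
from pathlib import Path

VALID_LABELS = {"AA", "BB", "CC", "DD", "EE"}


def parse_cg_label(filename: str) -> str:
    """Return the CG category (e.g. "AA") encoded in a file name."""
    stem = Path(filename).stem   # drop the extension
    fields = stem.split("_")     # e.g. ["clip123B45", "AA", "L", "0"]
    if len(fields) < 2 or fields[1] not in VALID_LABELS:
        raise ValueError(f"unexpected file name: {filename!r}")
    return fields[1]


for name in ["clip123B45_AA_L_0.xlsx", "clip123B45_EE_L_0.csv"]:
    print(name, "->", parse_cg_label(name))
```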
**Create Vars of Interest**<br>
A ```vars_of_interest.json``` file containing the following JSON will be created in the main folder: <br>
>{<br>"sheet 1": ["col 1", "col 2", "col 3", "col 4", "col 5"]<br>} <br>
- This will direct data_cleaner() to return every column in sheet 1
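
`vars_of_interest.json` maps each worksheet name to a list of column names. The sketch below loads and sanity-checks a file of that shape; `load_vars_of_interest` is a hypothetical helper, and it does not show how `data_cleaner()` actually reads the file.

```python
# Sketch: load vars_of_interest.json and check that it maps each worksheet
# name to a list of column names. load_vars_of_interest is a hypothetical
# helper; it does not show how fennec-ml's data_cleaner() reads the file.
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Union


def load_vars_of_interest(path: Union[str, Path]) -> Dict[str, List[str]]:
    """Read the JSON file and validate its {sheet: [columns]} structure."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError("top level of vars_of_interest.json must be an object")
    for sheet, cols in data.items():
        if not isinstance(cols, list) or not all(isinstance(c, str) for c in cols):
            raise ValueError(f"sheet {sheet!r} must map to a list of column names")
    return data


# Round-trip the same structure the setup cell writes, using a temporary folder.
demo = {"sheet 1": ["col 1", "col 2", "col 3", "col 4", "col 5"]}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "vars_of_interest.json"
    demo_path.write_text(json.dumps(demo, indent=4), encoding="utf-8")
    loaded = load_vars_of_interest(demo_path)

for sheet, cols in loaded.items():
    print(f"{sheet}: {cols}")
```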


In [1]:
# import libraries to make example files
import os
import json
import numpy as np
import pandas as pd

# --- FOLDER CONFIG ---
data_dir = os.path.join(os.getcwd(), "testing_data")
excel_dir = os.path.join(data_dir, "raw_data")
csv_dir = os.path.join(data_dir, "proccessed_data")

# Create the folders if they don't exist
for folder in (data_dir, excel_dir, csv_dir):
    os.makedirs(folder, exist_ok=True)

# Clear the folder if files already exist
for f in os.listdir(excel_dir):
    file_path = os.path.join(excel_dir, f)
    if os.path.isfile(file_path):
        os.remove(file_path)

# Map labels to numbers
label_map = {
    "AA": 1,
    "BB": 2,
    "CC": 3,
    "DD": 4,
    "EE": 5
}

# Generate the Excel files
for label, num in label_map.items():
    # Create the data: 100 rows × 5 columns filled with the same number
    data = np.full((100, 5), num)
    df = pd.DataFrame(data, columns=[f"col {i+1}" for i in range(5)])

    # File name following the 2024-2025 naming convention
    filename = f"clip123B45_{label}_L_0.xlsx"
    filepath = os.path.join(excel_dir, filename)

    # Save to Excel with a single sheet called "sheet 1"
    df.to_excel(filepath, sheet_name="sheet 1", index=False)

    print(f"Created: {filename}")

# --- MAKE VARS OF INTEREST ---
base_dir = os.getcwd()
json_path = os.path.join(base_dir, "vars_of_interest.json")

# Data to write
vars_of_interest = {
    "sheet 1": ["col 1", "col 2", "col 3", "col 4", "col 5"]
}

# Write to JSON file
with open(json_path, "w") as f:
    json.dump(vars_of_interest, f, indent=4)

print(f"Created {json_path}")

Created: clip123B45_AA_L_0.xlsx
Created: clip123B45_BB_L_0.xlsx
Created: clip123B45_CC_L_0.xlsx
Created: clip123B45_DD_L_0.xlsx
Created: clip123B45_EE_L_0.xlsx
Created /Users/willsstoddard/Documents/Development/Python/FENNEC/FENNEC-25_26/vars_of_interest.json


#### Using fennec-ml <br>
We will use fennec-ml to process the data created above: converting the .xlsx files to .csv files, scaling the .csv data, extracting the characterization labels, and segmenting the final data into training, validation, and testing sets.<br><br>
**Output**
>The final result of our work will be a dictionary with 3 keys: *Training_Set*, *Validation_Set*, and *Testing_Set*
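>>The value of each set is a dict with the following keys: *sets* and *labels* <br>
>>- *sets* : list of data segments (in this example, each is a 2D array with `timesteps` rows and one column per variable)
>>- *labels* : list of labels, where `labels[i]` is the label of `sets[i]`<br>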

Usage example:
```python
dataset_dict['Training_Set']['sets']
```
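
Because `sets[i]` and `labels[i]` line up index by index, a split can be walked with `zip`. The sketch below counts how many segments of each label land in each split. `summarize_splits` is a hypothetical helper and `example` is made-up data with the same structure; in this notebook you would pass `dataset_dict` instead.

```python
# Sketch: count how many segments of each label landed in each split.
# summarize_splits is a hypothetical helper; `example` is toy data that
# only mimics the {split: {"sets": [...], "labels": [...]}} structure.
from collections import Counter


def summarize_splits(dataset_dict):
    """Return {split_name: Counter(label -> number of segments)}."""
    summary = {}
    for split_name, split in dataset_dict.items():
        sets, labels = split["sets"], split["labels"]
        # sets[i] is expected to be the segment whose label is labels[i]
        if len(sets) != len(labels):
            raise ValueError(f"{split_name}: {len(sets)} sets but {len(labels)} labels")
        summary[split_name] = Counter(labels)
    return summary


example = {
    "Training_Set": {"sets": [[[0.0]], [[0.7]], [[0.0]]], "labels": ["CC", "DD", "CC"]},
    "Validation_Set": {"sets": [[[1.4]]], "labels": ["EE"]},
    "Testing_Set": {"sets": [[[-1.4]]], "labels": ["AA"]},
}
for name, counts in summarize_splits(example).items():
    print(name, dict(counts))
```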


In [3]:
# --- IMPORT FENNEC-ML --- 
import fennec_ml as fn # note the underscore

# --- EXCEL TO CSV ---
print("Using folder_cleaner() to convert from raw excel files to useful .csv's.\nfolder_cleaner() output:")
fn.folder_cleaner(excel_dir, csv_dir, overwrite=True)

# --- SCALING AND LABELS ---
scaled_data = fn.standardize(csv_dir)
# scaled_data = fn.normalize(csv_dir)  # alternative scaling option
labels = fn.get_1D_CG_labels(csv_dir)

# --- SEGMENTING AND SPLITTING ---
print("\nUsing segment_and_split() to cut data into training/validation/testing sets\nsegment_and_split() output:")
# using the default 70%/15%/15% train/val/test split
dataset_dict = fn.segment_and_split(scaled_data, labels, timesteps=10)

Using folder_cleaner() to convert from raw excel files to useful .csv's.
folder_cleaner() output:
clip123B45_BB_L_0.xlsx processed and saved to /Users/willsstoddard/Documents/Development/Python/FENNEC/FENNEC-25_26/testing_data/proccessed_data as clip123B45_BB_L_0.csv
clip123B45_EE_L_0.xlsx processed and saved to /Users/willsstoddard/Documents/Development/Python/FENNEC/FENNEC-25_26/testing_data/proccessed_data as clip123B45_EE_L_0.csv
clip123B45_AA_L_0.xlsx processed and saved to /Users/willsstoddard/Documents/Development/Python/FENNEC/FENNEC-25_26/testing_data/proccessed_data as clip123B45_AA_L_0.csv
clip123B45_CC_L_0.xlsx processed and saved to /Users/willsstoddard/Documents/Development/Python/FENNEC/FENNEC-25_26/testing_data/proccessed_data as clip123B45_CC_L_0.csv
clip123B45_DD_L_0.xlsx processed and saved to /Users/willsstoddard/Documents/Development/Python/FENNEC/FENNEC-25_26/testing_data/proccessed_data as clip123B45_DD_L_0.csv

Using segment_and_split() to cut data into training
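
Judging from the verification output, `segment_and_split()` produces segments of `timesteps` rows each and divides them among the three sets. The sketch below illustrates that idea in plain Python under stated assumptions (non-overlapping windows, one shuffle over all windows, a 70/15/15 cut with the remainder going to testing). It is not fennec-ml's implementation, and `segment` and `segment_and_split_sketch` are hypothetical names.

```python
# Sketch of the segment-then-split idea; NOT fennec-ml's implementation.
# ASSUMPTIONS (not confirmed by the library): each file is cut into
# non-overlapping windows of `timesteps` consecutive rows (leftover rows
# are dropped), all windows are shuffled together, and the shuffled list
# is divided 70/15/15. segment and segment_and_split_sketch are hypothetical.
import random


def segment(rows, timesteps):
    """Cut a list of rows into non-overlapping windows of `timesteps` rows."""
    n_windows = len(rows) // timesteps
    return [rows[i * timesteps:(i + 1) * timesteps] for i in range(n_windows)]


def segment_and_split_sketch(files, labels, timesteps, fractions=(0.70, 0.15, 0.15), seed=0):
    """files[i] is a list of rows (one per timestep) whose label is labels[i]."""
    # Pair every window with the label of the file it came from.
    pairs = [(window, label)
             for rows, label in zip(files, labels)
             for window in segment(rows, timesteps)]
    random.Random(seed).shuffle(pairs)  # mix labels across the splits

    n_train = int(len(pairs) * fractions[0])
    n_val = int(len(pairs) * fractions[1])
    chunks = {
        "Training_Set": pairs[:n_train],
        "Validation_Set": pairs[n_train:n_train + n_val],
        "Testing_Set": pairs[n_train + n_val:],  # remainder goes to testing
    }
    return {name: {"sets": [w for w, _ in chunk], "labels": [lab for _, lab in chunk]}
            for name, chunk in chunks.items()}


# Toy data shaped like the example files: 5 files x 100 rows x 5 columns.
toy_labels = ["AA", "BB", "CC", "DD", "EE"]
toy_files = [[[float(k)] * 5 for _ in range(100)] for k in range(1, 6)]

toy_splits = segment_and_split_sketch(toy_files, toy_labels, timesteps=10)
for name, split in toy_splits.items():
    print(name, len(split["sets"]), "segments")
```

With 100 rows per file and `timesteps=10`, this sketch yields 10 segments per file, 50 in total.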

#### Verify Results  
To verify the results, the code below first prints the classification key: the scaled value that each label's data should contain.

It then prints the first 5 labels in Training_Set along with their corresponding datasets so you can check them against the key.
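
The classification key printed by the verification cell can be reproduced by hand. This sketch assumes `standardize()` applies a z-score, z = (x - mean) / std, with the mean and the *population* standard deviation computed over all files; that assumption matches the printed key, but it is inferred from the output rather than taken from fennec-ml's code. Because every column holds the same values, computing the statistics per column or globally gives the same result here.

```python
# Sketch: reproduce the expected classification key by hand.
# ASSUMPTION: standardize() computes z = (x - mean) / std using the mean and
# the population standard deviation over every file. This is inferred from
# the printed key and is not fennec-ml's actual code.
from statistics import fmean, pstdev

label_map = {"AA": 1, "BB": 2, "CC": 3, "DD": 4, "EE": 5}

# Each file holds 100 x 5 copies of its number; pool all values together.
pooled = [v for v in label_map.values() for _ in range(100 * 5)]
mu, sigma = fmean(pooled), pstdev(pooled)

expected_key = {label: (v - mu) / sigma for label, v in label_map.items()}
for label, z in expected_key.items():
    print(f"{label} should correspond to {z:.6f}")
```

A sample standard deviation (dividing by n - 1) would instead give about -1.4139 for AA, so the printed -1.414213562373095 points to the population version.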

In [6]:
# Verification
print("Classification Key:")
for label, data in zip(labels, scaled_data):
    print(f"{label} corresponds to {data[1][1]}")

print("\nFirst 5 training sets")
training = dataset_dict['Training_Set']
for i in range(5):
    print(f"\nTraining_Set Label: {training['labels'][i]}")
    print(f"Training_Set Dataset: \n{training['sets'][i]}")

Classification Key:
AA corresponds to -1.414213562373095
BB corresponds to -0.7071067811865475
CC corresponds to 0.0
DD corresponds to 0.7071067811865475
EE corresponds to 1.414213562373095

First 5 training sets

Training_Set Label: CC
Training_Set Dataset: 
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]

Training_Set Label: DD
Training_Set Dataset: 
[[0.70710678 0.70710678 0.70710678 0.70710678 0.70710678]
 [0.70710678 0.70710678 0.70710678 0.70710678 0.70710678]
 [0.70710678 0.70710678 0.70710678 0.70710678 0.70710678]
 [0.70710678 0.70710678 0.70710678 0.70710678 0.70710678]
 [0.70710678 0.70710678 0.70710678 0.70710678 0.70710678]
 [0.70710678 0.70710678 0.70710678 0.70710678 0.70710678]
 [0.70710678 0.70710678 0.70710678 0.70710678 0.70710678]
 [0.70710678 0.70710678 0.70710678 0.70710678 0.70710678]
 [0.70710678 0.70710678 0.70710678 0.70710678 0.