# Week 1 (Sep 1 - 7): Literature Review and Scope Definition

After my kickoff meeting with Professor Rachlin, I began reading and documenting relevant papers in three fields: Neural Network Hyperparameter Tuning, Multi-Objective Optimization, and Evolutionary Deep Learning. These works provided the necessary background to identify my data science thesis research area: Multi-Objective Hyperparameter Optimization (MOHPO) of Neural Network Architecture using Evolutionary Algorithms.

Rather than relying on manual experimentation, which currently functions as a “black art” with no standardized protocol, my thesis explores using rational agents to iteratively evolve and optimize neural networks. At best, this research may help establish standardized protocols for neural network configuration; at worst, it can illuminate how configuration choices correlate with performance metrics.

These agents not only adjust high-level hyperparameters, such as learning rates or activation functions, but also actively modify the structure of the network itself, including the number of layers, types of layers, and the interconnections between them. By framing the problem as a multi-objective optimization task, we aim to balance competing goals such as accuracy, complexity, and training efficiency. This enables ML practitioners to understand and make informed tradeoffs when designing or tuning their models.

# Week 2 (Sep 8 - 14): Evo Framework and Strategy

This week, Professor Rachlin provided the latest code for the Evo Framework, and I did a complete deep dive into the code base. First, I went line by line, handwriting the code to understand the relationships among the Profile class, the Environment class, the use of decorators, and the TA Assignment implementation. I then ran the Evo Framework code in its current state to make sure I fully understood its operational flow.

Once I grasped how to define agents, objectives, solutions, and the environment, I created schematics, UML diagrams, and pseudocode to map out Python scripts for the MOHPO (Multi-Objective Hyperparameter Optimization) problem. Reading further papers, I began brainstorming potential objective functions (e.g., training time, memory size, number of neurons, number of layers) and agents (e.g., change layer type, change model optimizer, change neurons per layer).

As part of this phase, I began asking more targeted implementation questions. For example, how should the direction of each objective (maximization vs. minimization) be expressed within the Evo framework? Converting accuracy into its complement (1 - accuracy) or its inverse would allow every metric to be minimized consistently. I also considered how the neural network architecture should be represented and evolved, weighing data structures such as dictionaries and class objects. After deliberating with Professor Rachlin, we concluded that Solution would be a class object with attributes for the trained model, its configuration, and its performance metrics. This naturally led me to examine which accuracy metrics should be prioritized. For example, in domains like healthcare diagnostics, false negatives (Type II errors) can be significantly more dangerous than false positives, and such domain-specific consequences need to inform the multi-objective optimization goals.

Storage and model management are also emerging as key implementation concerns, since this project will run locally on my personal computer. I am exploring whether models should be stored and retrieved via serialized files (e.g., using pickle) or whether model objects should be managed entirely in memory by the system.
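The decisions above could be sketched roughly as follows. This is only an illustration, not the Evo framework's actual API: the field names on `Solution` and the objective names `error_rate` and `layer_count` are assumptions made for the example, and the trained model is left as a placeholder so the sketch runs without TensorFlow.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Solution:
    """Hypothetical container for one candidate network."""
    config: Dict[str, Any]                                   # hyperparameters / architecture
    metrics: Dict[str, float] = field(default_factory=dict)  # filled in after evaluation
    model: Optional[Any] = None                              # trained model object in the real project


def error_rate(sol: Solution) -> float:
    """Complement of accuracy, so a 'maximize' metric becomes a 'minimize' objective."""
    return 1.0 - sol.metrics["accuracy"]


def layer_count(sol: Solution) -> int:
    """A complexity objective that is already a minimization target."""
    return len(sol.config["layers"])


# Made-up example values, for illustration only.
example = Solution(
    config={"layers": [8, 4, 1], "optimizer": "adam"},
    metrics={"accuracy": 0.9},
)
objectives = {"error_rate": error_rate(example), "layer_count": layer_count(example)}
```

Because the objectives only read stored metrics and configuration, scoring a solution does not require rebuilding or retraining the network.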

# Week 3 (Sep 15 - 21): Keras Intro and Evo Integration

This week, I focused on learning how to use Keras for building neural networks and integrating it into the existing evolutionary framework. I followed the first three chapters of Deep Learning with Python, which helped me understand the basics of writing Keras code. Since memory and performance were concerns from last week, I explored the three main options for running Keras models: on a local CPU/GPU, on Google Colab, or on AWS EC2. To avoid a slow iteration cycle and a steep initial learning curve, I chose to work in Colab for now due to its friendlier interface compared to AWS and faster runtime than my local setup. However, I am planning to transition to AWS EC2 around Week 7 or 8.

After reviewing the TA Assignment implementation, I began drafting my main function in comments (or pseudocode). I needed to register objective functions and agents, load a dataset, and initialize a few sample solutions to form the first generation. For the dataset, I settled on using the Iris dataset because of its simplicity. I created a data-loading function that uses Seaborn to load the dataset, splits it into train/test sets, and standardizes the data (including encoding categorical variables) into NumPy arrays to ensure compatibility with TensorFlow models.

Next, I defined the objectives I wanted to minimize: error rate (1 - accuracy), number of layers, average number of nodes per layer, and training time. These metrics were straightforward to incorporate into the codebase for a first iteration just to map out the workflow as simply as possible. I also created agents, including adding/removing layers, shrinking/growing layers, changing activation functions for specific layers, and changing the model’s optimizer. To ensure that changes had real effects, I removed the current configuration from the list of available options.
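A minimal sketch of how such agents might look, assuming each architecture is a plain dictionary. The option lists, the dictionary layout, and every function name here are hypothetical; the real agents operate on Solution objects inside the Evo framework. The helper `pick_different` shows the idea of removing the current configuration from the available options so that every mutation is a real change.

```python
import copy
import random

ACTIVATIONS = ["relu", "tanh", "sigmoid", "elu"]  # assumed option list
OPTIMIZERS = ["adam", "sgd", "rmsprop"]           # assumed option list


def pick_different(options, current, rng):
    """Choose a new option, excluding the current one so the change has an effect."""
    return rng.choice([opt for opt in options if opt != current])


def change_activation(config, rng):
    """Agent: swap the activation of one randomly chosen layer."""
    new = copy.deepcopy(config)
    layer = rng.choice(new["layers"])
    layer["activation"] = pick_different(ACTIVATIONS, layer["activation"], rng)
    return new


def change_optimizer(config, rng):
    """Agent: switch to a different optimizer."""
    new = copy.deepcopy(config)
    new["optimizer"] = pick_different(OPTIMIZERS, config["optimizer"], rng)
    return new


def add_layer(config, rng, min_units=3, max_units=10):
    """Agent: append a layer with a random width."""
    new = copy.deepcopy(config)
    new["layers"].append(
        {"units": rng.randint(min_units, max_units), "activation": rng.choice(ACTIVATIONS)}
    )
    return new


def remove_layer(config, rng):
    """Agent: drop one layer, but never the last remaining one."""
    new = copy.deepcopy(config)
    if len(new["layers"]) > 1:
        new["layers"].pop(rng.randrange(len(new["layers"])))
    return new


rng = random.Random(0)  # seeded so the example is repeatable
base = {
    "optimizer": "adam",
    "layers": [{"units": 5, "activation": "relu"} for _ in range(3)],
}
mutated = change_activation(base, rng)
```

Each agent returns a modified copy rather than editing its input, so the parent solution stays intact in the population.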

I initialized a starting population of 10 solutions, each consisting of 3 layers with 3 to 10 nodes per layer. Since I couldn't modify Professor Rachlin’s Evolutionary Framework, I adjusted my agent functions to accept a list of solutions even though only one solution is typically expected. I also wrote functions to build, train, and evaluate models, using the defined hyperparameters to guide network construction. The Solution object stores the hyperparameters and performance metrics as a dictionary and the trained model as a TensorFlow object. I believed that this was the best data structure to represent the design of a neural network architecture.

Despite getting the code to run, the first working iteration resulted in an empty solution set. I suspected this might be related to how Pareto-optimal solutions were being calculated, so I experimented by increasing the number of layers to see if added complexity would improve outcomes but that didn’t resolve the issue. Eventually, I realized the root cause: when working in Colab, file and directory references behave differently. After the first 10 solutions were created, the function call terminated early, and no solutions were retained in the population.

Another issue I identified was that the constraints file had a maximum time limit, but my time measurements were in nanoseconds, causing all models to exceed the limit and be discarded. This highlighted the importance of being attentive to even the smallest implementation details. I’ve since adjusted this.
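The unit mismatch can be illustrated with a small sketch. It assumes the timings came from something like `time.perf_counter_ns()` and that the limit is expressed in seconds; the 60-second value and all names below are placeholders rather than the project's real settings.

```python
import time

MAX_TRAIN_SECONDS = 60.0  # assumed limit; the real value lives in the constraints file


def ns_to_seconds(elapsed_ns: int) -> float:
    """Convert a nanosecond duration to seconds."""
    return elapsed_ns / 1_000_000_000


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed time in seconds)."""
    start = time.perf_counter_ns()
    result = fn(*args, **kwargs)
    return result, ns_to_seconds(time.perf_counter_ns() - start)


def within_time_limit(elapsed_seconds: float, limit: float = MAX_TRAIN_SECONDS) -> bool:
    """Constraint check; both arguments must be in seconds."""
    return elapsed_seconds <= limit


raw_ns = 2_500_000_000                                  # a 2.5 second run, measured in nanoseconds
passes_when_converted = within_time_limit(ns_to_seconds(raw_ns))
passes_when_raw = within_time_limit(raw_ns)             # the bug: nanoseconds compared to a seconds limit
result, seconds = timed(sum, [1, 2, 3])
```

Comparing the raw nanosecond count against a limit in seconds makes even a 2.5 second run look like billions of seconds, which is why every model was discarded.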

Lastly, I removed 'softmax' as an activation option for now, since it requires a specific number of output units, which adds unnecessary complexity at this stage. My goal remains to define the minimal viable neural network that can be evolved and evaluated effectively. As expected with simple models and a very small search space, initial performance was not strong, but the overall pipeline seemed to work well. A set of Pareto-optimal solutions was generated, but many of them were no more accurate than uniform random guessing (⅓ accuracy on a classification task with 3 categories).


* choosing the number of neurons in a layer
* should I be storing the models as .pkl files and loading them? what happens if self.model is not None? or should the computer be handling all of it in memory in real time?
* we may need to enhance the Evo framework to say whether each objective is maximized or minimized. for now, convert accuracy to error rate (just to get it to work)
* Deep Learning with Python textbook, approach with a sense of curiosity
* the script I write to build the neural network could be driven by hyperparameters
* make each objective function a simple lookup of a stored metric, so we don't have to rebuild the neural network every time we want to compute an objective score
* state variables of the object (private) and methods like train_model(input_data), build_model() (builds the model object, also a state variable)
* understand OOD down
* do all of the testing as part of the solution object, so it is all done in advance
* take that solution, run each objective function
* how do i want to present the design of the neural net arch?
* no. of layers, no. of neurons, backpropagation, number of epochs
* integrate filtering solutions
* create and build neural network based on random selection
* automate generation of hyperparameter sets
* ensure that the current Evo framework produces a set of Pareto optimal solutions
* agents tweak the architecture and the links in the neural network itself; a multi-objective approach to evolve competing architectures

# Week 4 (Sep 22 - 28): Domain Focus and Code Optimization

This week initiated a major restructuring of the Multi-Objective Hyperparameter Optimization (MOHPO) framework. The project shifted from the generic Iris dataset to the Wisconsin Breast Cancer (Diagnostic) dataset from the UCI ML repository, fulfilling the requirement to integrate biomedical data into the research, which aligns with Professor Rachlin's interests in bioinformatics. This transition necessitated critical technical corrections: the loss function was updated from sparse_categorical_crossentropy to binary_crossentropy to correctly handle the binary classification output, and the final layer configuration was fixed to use a single unit with a sigmoid activation.

Two key framework optimizations were implemented for better performance and rigor. First, the complexity objective, average_node_per_layer, was abandoned as ineffective because it ignored network depth; it was replaced with the total number of neurons in the network (node_count), which is a more faithful measure of complexity. Second, the code was refactored to remove the redundant copy of the dataset stored on every Solution instance, which significantly reduces memory overhead and prepares the system to scale. To keep all progress trackable, a persistent record was established: every architecture generated is now appended to an all_solutions.jsonl file. The JSON Lines format makes each write a cheap, append-only operation, and it allows later analysis of how many configurations can be generated in a short time span (e.g., aiming for 1000 architectures per hour).
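A small sketch of both ideas, under stated assumptions: the function names, the record fields, and the two example architectures are hypothetical, and the demo writes to a temporary directory rather than the project's real all_solutions.jsonl. It shows why averaging hides depth and how a JSON Lines log can be appended to and read back.

```python
import json
import os
import tempfile
from typing import Dict, List


def average_nodes_per_layer(units: List[int]) -> float:
    """Old objective: mean layer width, which ignores how many layers there are."""
    return sum(units) / len(units)


def node_count(units: List[int]) -> int:
    """New objective: total number of neurons across all layers."""
    return sum(units)


def append_solution(record: Dict, path: str) -> None:
    """Append one architecture as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def read_solutions(path: str) -> List[Dict]:
    """Read every logged architecture back into a list of dictionaries."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


shallow = [8, 8]             # two layers of 8 neurons
deep = [8, 8, 8, 8, 8, 8]    # six layers of 8 neurons

log_path = os.path.join(tempfile.mkdtemp(), "all_solutions.jsonl")
for units in (shallow, deep):
    append_solution({"units": units, "node_count": node_count(units)}, log_path)
rows = read_solutions(log_path)
```

Both example networks average 8 nodes per layer, yet one has 16 neurons and the other 48, so only `node_count` separates them.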

# Week 5 (Sep 29 - Oct 5):  Advanced Performance Evaluation

I missed the meeting with Professor Rachlin and prepared a report of the work I have done so far. My computer also broke down this week. Despite this, I extended the performance metrics to include Recall (Sensitivity), Specificity, Precision, and F1 Score. This addition allows direct examination of the trade-off between Type I and Type II errors, a critical consideration in medical diagnostics, where Sensitivity addresses the risk of false negatives (missing cancer).

The evolution process was restructured into a two-stage filtering mechanism that incorporates a dedicated Validation Set to guard against overfitting. The evolutionary phase proceeds as before, producing the initial Pareto-optimal set using Test Set metrics. Upon completion, this Pareto set goes through the final stage: the models are re-evaluated on the separate Validation Set, and a final dominance check is run on the new validation metrics. This filters the initial set down to the final, most generalizable solutions, which are saved separately to a pickle file.

The archive of all solutions in the jsonl file, combined with the detailed metrics for the final set, enables a dashboard showing trade-offs between key objectives (e.g., complexity versus Sensitivity), directly addressing the core research goal of finding the simplest neural network that still maintains high, reliable performance.
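The two-stage filter can be sketched as below. This is a generic Pareto dominance check written for illustration, not the Evo framework's own implementation; the solution names and scores are made up, and every objective is assumed to be minimized, with each score given as (error rate, node count).

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Names of the solutions that no other solution dominates."""
    front = []
    for name, score in scores.items():
        dominated = any(
            dominates(other, score)
            for other_name, other in scores.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Stage 1: objectives measured on the test set (illustrative numbers).
test_scores = {"A": (0.05, 40), "B": (0.08, 12), "C": (0.10, 30), "D": (0.04, 90)}
stage_one = pareto_front(test_scores)

# Stage 2: re-score only the survivors on the validation set and filter again.
validation_scores = {"A": (0.09, 40), "B": (0.08, 12), "D": (0.05, 90)}
stage_two = pareto_front({name: validation_scores[name] for name in stage_one})
```

In this made-up example, C is dropped in the first stage because B beats it on both objectives, and A is dropped in the second stage because its validation error rises enough for B to dominate it.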

* remove sol.data from each solution instance, should only be one
* include validation set to filter out the final pareto optimal solutions
    * keep the evolution as is. once you have finished, get the models from the solutions, validate them on the validation set, get the final scores and THEN show what the output is (maybe store all these metrics in metrics, but only focus on certain ones for objectives)
* keep track of the number of different architectures you are able to generate per hour (10 is too few, 1000 is useful)
* write a report of what I have done so far. frame my thinking. organize my thoughts. reusable for the final report.

# Week 6 (Oct 6 - 12): 

???

# Week 7 (Oct 13 - 19): 

* overall accuracy, f1, sensitivity, specificity
* add epochs as a hyperparameter
* include addressing overfitting (looking for similar performance between test and train sets)
* consider different types of layers: add `layers.Dropout(0.5)` dropout layers (applied to the layer that comes before it) pg 151
* include other accuracy metrics (recall, precision, sensitivity, etc)
* 10 papers a week in the field of Multi-Objective Optimization, Neural Network Hyperparameter Tuning, Evolutionary Computing
* how accurate is the network, given that there are different measurements of accuracy
* false positive, false negative (type I is fun, type II you're screwed in health diagnostics)
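The metrics listed above all follow from the four confusion-matrix counts. Here is a plain-Python sketch for a binary task where 1 is the positive class; the function names and example labels are made up for illustration, and in the project itself a library such as scikit-learn or Keras would normally compute these.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            counts["tp" if pred == 1 else "fn"] += 1
        else:
            counts["fp" if pred == 1 else "tn"] += 1
    return counts


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a class is absent."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: Dict[str, int]) -> Dict[str, float]:
    recall = safe_div(c["tp"], c["tp"] + c["fn"])        # sensitivity: penalizes false negatives (Type II)
    specificity = safe_div(c["tn"], c["tn"] + c["fp"])   # penalizes false positives (Type I)
    precision = safe_div(c["tp"], c["tp"] + c["fp"])
    f1 = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(c["tp"] + c["tn"], sum(c.values()))
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # illustrative labels, not real data
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(counts)
```

Keeping the raw counts alongside the derived metrics makes it easy to check by hand that the reported numbers are consistent.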




# Week 8 (Oct 20 - 26):  


* Manually do 10 neural networks and populate the table and experiment with different datasets
* read in research area and write a background of what's been done in this research area.
* read Rachlin Evo paper for DT
* Rachlin's paper of using Evo Comp to evolve Decision Trees
* 10 papers a week in the field of Multi-Objective Optimization, Neural Network Hyperparameter Tuning, Evolutionary Computing
* conduct a literature review of the field
* look for key words, formulas, nuggets and areas of confusion for each paper
* write summaries of every paper i read to be able to explain the problem thoroughly in the abstract
* demonstrate that I can apply the Evo framework in the context of the problem (refer to TA assignment for an example)
    * TA assignment and Traveling Salesman Problem
* read in research area and write a background of what's been done in this research area
    * a taxonomy of evolutionary NAS techniques (genetic algorithms, neuroevolution, reinforcement-based NAS hybrids, etc.)
    * a summary matrix (Model × Search Method × Dataset × Application domain × Year)
    * a gap analysis showing underexplored combinations (e.g., GANs + evolutionary NAS for human data synthesis)
    * recommendations for novel directions that are underrepresented in the literature



# Week 9 (Oct 27 - Nov 2): 

For weeks I have been having issues getting the Evo Framework to run locally on my device. Leveraging AI to support debugging efforts, I learned that my previous Python environment was x86_64-based, which caused compatibility issues with TensorFlow and JAX due to AVX instruction requirements. I then installed an ARM-native version of Python 3.11 using Homebrew and created a new virtual environment (venv-arm) using this ARM Python. Within this new environment, I updated the requirements.txt to include ARM-compatible versions of the libraries used in the workflow, replacing tensorflow with tensorflow-macos to ensure proper Apple Silicon support. Finally, I installed all dependencies into the ARM virtual environment, verified that TensorFlow, Keras, and scikit-learn imported correctly, and confirmed that the code now runs locally without CPU architecture errors.

The hardest part right now is figuring out **what problem I should be solving** with this dataset. Since Professor Rachlin has a particular interest in **astronomy** and **biomedical** applications, I initially explored Jamaican datasets that could relate to those themes. I already had `exoplanets.csv` from a previous assignment, and I remembered that Professor Rachlin had mentioned using **classification algorithms for time-series analysis of variable stars**. That got me interested in time-series data as a whole. To explore Jamaica-specific options, I downloaded over **400 datasets** from [data.gov.jm](https://data.gov.jm/) and [statinja.gov.jm](https://statinja.gov.jm). I thought that analyzing **news outliers** to identify emerging topics might lead to something interesting. I also became fascinated by whether there could be meaningful data related to **Hurricane Melissa** and how it might connect to climate or even astronomy-related research. I then remembered that the Jamaica AI Association had just published a paper, “Towards Robust Speech Recognition for Jamaican Patois Music Transcription”.

While reviewing the [**Keras Documentation**](https://keras.io/examples/audio/), I looked for models that could align with my dataset and research focus.

Vocal Track Separation (Encoder–Decoder Architecture) wasn’t applicable because it requires a dataset of music tracks with isolated vocals, drums, bass, and other instruments, and my dataset only includes full mixed tracks.

Automatic Speech Recognition (ASR) with Transformers converts audio directly to text using sequence-to-sequence architectures. This aligns with my interest in **Generative AI** and could potentially convert **Jamaican Patois** into text. However, my dataset contains **music in the background**, whereas ASR examples typically use clean speech. Additionally, my **text labels are not always accurate**, which could cause poor model performance.

ASR with a Connectionist Temporal Classification model (CNN + RNN + CTC loss) doesn’t need perfectly aligned text and audio, which fits my dataset better. This could help transcribe **Patois lyrics**, filling a gap for platforms like Apple Music that often lack lyrics or lyric timing for Jamaican music. It also offers more of a research contribution to **Evolutionary Neural Architecture Search (NAS)**, since **RNNs are underexplored** in that space. The downside is that even though CTC can handle alignment issues, it still depends on reasonably correct transcripts, and mine aren’t fully reliable.

Music Generation with Transformer models learns patterns from **MIDI files** (digital sheet music) to generate new music. I don’t have MIDI files, but my dataset includes raw **1D waveform vectors** (each 661,500 data points). I successfully adapted the “visualize audio” section from the Keras documentation to my data, which worked well. This opens the possibility of **music generation grounded in a low-resource language** like Patois. However, some generated examples I’ve heard from similar models sounded unnatural or unsettling, and I’d prefer to avoid replicating that tone in my own project. Still, the idea of **AI-generated Jamaican music** remains very intriguing, especially since the Jamaican population is very amused by AI-generated videos that emulate our dialect.

If I do go the route of music generation, it would also be valuable to the Evo-NAS research area, since GANs are underexplored there as well. Given the density of my audio data, **MelGAN-based spectrogram inversion** might be a better approach for generating music-like sounds. I could also divide each 30-second clip into **10-second segments**, giving me more samples and reducing the computational load.
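Splitting the clips is simple arithmetic on the waveform length. The sketch below assumes a sample rate of 22,050 Hz, which is implied by 661,500 samples over 30 seconds rather than stated anywhere in these notes; the function name and the silent stand-in waveform are illustrative.

```python
from typing import List, Sequence

CLIP_SAMPLES = 661_500                       # samples per clip, from the dataset
CLIP_SECONDS = 30
SAMPLE_RATE = CLIP_SAMPLES // CLIP_SECONDS   # 22,050 Hz, inferred from the two values above


def split_clip(waveform: Sequence[float], sample_rate: int, segment_seconds: int) -> List[Sequence[float]]:
    """Cut a waveform into equal, non-overlapping segments; a short trailing remainder is dropped."""
    step = sample_rate * segment_seconds
    return [waveform[i:i + step] for i in range(0, len(waveform) - step + 1, step)]


clip = [0.0] * CLIP_SAMPLES                  # silent stand-in for a real waveform
segments = split_clip(clip, SAMPLE_RATE, 10)
```

Each 30-second clip yields three segments of 220,500 samples, tripling the number of training examples while shrinking each input to a third of its length.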

While exploring related work, I discovered the [**JukeBox Dataset**](https://iprobe.cse.msu.edu/dataset_detail.php?id=8&?title=JukeBox:_A_Speaker_Recognition_Dataset_with_Multi-lingual_Singing_Voice_Audio), which includes **multilingual singing voice recordings**. This dataset could support experiments in **speaker or language recognition through music**, allowing me to study how models distinguish languages or accents. It also opens questions about **biases in speech and language recognition** across different linguistic and cultural groups. This new direction feels very promising. I could run multiple iterations using **songs from different regions or languages** and measure how accurately the model identifies them. Analyzing **misclassifications** could also reveal interesting insights, such as which languages are confused with others, and why. For this direction I would be especially interested in incorporating Speaker Recognition, Audio Classification with STFT Spectrograms, and English speaker accent recognition using Transfer Learning. This pivot also sidesteps both my inaccurate transcription problem and my discomfort with generating audio that sounds off-putting.


I am currently testing with this dataset, although it is structured differently from mine: https://huggingface.co/datasets/AaronZ345/GTSinger/tree/main/English/EN-Alto-1/Breathy/god%20is%20a%20girl



Right now I have three directories: english, chinese, and patois. English and Chinese are formatted similarly, so I will focus on those first. The goal is to build a single dataframe. Reading one .wav file with soundfile (`import soundfile as sf`, then `audio, sr = sf.read("chinese/0131.wav")`), previewing it with `display(visualize_audio(audio, sampling_rate=48000, seconds=30))`, and converting it with `a = pd.DataFrame(audio)` gives `a.shape` of `(330240, 1)`. Another file gives `(344160, 1)` because the clips differ in length. I want the finished dataframe to have shape `(x + 3, 201)`, where each column is a file, x is the number of waveform samples in the longest file, and the three extra rows hold the sample rate, the audio duration, and the language.
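One way the table described above could be assembled is sketched here in plain Python, so it runs without soundfile or pandas. The function name `build_columns`, the second file name, and the silent stand-in waveforms are all hypothetical; in practice each waveform and sample rate would come from `sf.read`. Shorter clips are padded with NaN up to the longest clip, and the three metadata values are appended at the end of each column.

```python
import math
from typing import Dict, List, Tuple

Clip = Tuple[List[float], int, str]  # (waveform samples, sample rate, language)


def build_columns(clips: Dict[str, Clip]) -> Dict[str, list]:
    """Return one equal-length column per file: padded waveform plus three metadata rows."""
    longest = max(len(wave) for wave, _, _ in clips.values())
    columns = {}
    for name, (wave, rate, language) in clips.items():
        padding = [math.nan] * (longest - len(wave))
        duration = len(wave) / rate                      # seconds
        columns[name] = list(wave) + padding + [rate, duration, language]
    return columns


# Stand-in data using the two lengths mentioned in the notes; the second file name is made up.
clips = {
    "chinese/0131.wav": ([0.0] * 330_240, 48_000, "chinese"),
    "chinese/0132.wav": ([0.0] * 344_160, 48_000, "chinese"),
}
columns = build_columns(clips)
first = columns["chinese/0131.wav"]
```

Passing `columns` to `pandas.DataFrame` would give one column per file with x + 3 rows. Because each column then mixes numbers with a language string, pandas would store it with object dtype, so keeping the three metadata values in a separate table may be a more convenient layout for modelling.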













OMGGG STUDY CREOLE LANGUAGES!!! 
* Redware peoples, Ostionoid person
* Arawak, Taino, Maroons (Windward, Leeward, Karmahaly)
* West Africa, Sierra Leone, Ghana, Nigeria, Central Africa: Akan, Ashanti, Yoruba, Ibo, Ibibio, Twi, Ashanti
* Northern India (Uttar Pradesh, Bihar, the Central Provinces, Punjab and the North West Frontiers)
* Hong Kong and from the Kwang Tung Province in southeast China
* Palestine and Lebanon
* Irish, Spanish, English, Jews (Portuguese), Welsh, Scottish, (Italian?)
* Ethiopia, Madagascar



* be sure to specify how many examples of each class there are, so accuracy can be interpreted against the class balance
* is this just a resource allocation problem?
* understand the key properties of the data (EDA)



### CHOOSE METRICS (hyp x performance)

* keep metrics meaningful to the dataset (gain subject matter expertise on the data)

In neural network classification, the type of accuracy metric you choose depends on the structure of your output layer and the nature of your prediction task. In my model, when the task involves binary classification, the output layer uses a sigmoid activation, which produces a single probability between 0 and 1 for each example. In this case, BinaryAccuracy is appropriate, because it first thresholds the predicted probability (commonly at 0.5) to decide whether each sample is classified as 0 or 1, and then computes the proportion of correct classifications. Mathematically, it checks whether each predicted label matches the true label after thresholding.


For multi-class classification, my model instead uses a softmax activation, which outputs a probability distribution across multiple classes. In that scenario, either CategoricalAccuracy or SparseCategoricalAccuracy is appropriate, depending on how the true labels are formatted. If the labels are one-hot encoded (a vector where only one entry is 1), CategoricalAccuracy compares the index of the maximum predicted probability to the index of the true class. If the labels are stored as integers (e.g., 0, 1, 2, …), SparseCategoricalAccuracy performs the same comparison but without requiring one-hot encoding.


By contrast, Accuracy in Keras is a generic metric that simply checks equality between predicted and true labels—it doesn’t perform thresholding or class selection—so it’s best suited when predictions are already discrete. Choosing the correct accuracy metric ensures that the model’s reported performance correctly reflects the underlying decision process implied by the activation function: sigmoid for binary probabilities, softmax for multi-class distributions.
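To make the distinction concrete, here is a plain-Python re-implementation of what each metric computes. It is written for illustration and does not call Keras; the function names mirror the Keras metric names but are local to this sketch, and the example labels and probabilities are made up.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value in a row."""
    return max(range(len(values)), key=values.__getitem__)


def binary_accuracy(y_true: Sequence[int], probs: Sequence[float], threshold: float = 0.5) -> float:
    """Threshold sigmoid outputs, then compare with the 0/1 labels."""
    preds = [1 if p > threshold else 0 for p in probs]
    return sum(t == p for t, p in zip(y_true, preds)) / len(y_true)


def sparse_categorical_accuracy(labels: Sequence[int], prob_rows: Sequence[Sequence[float]]) -> float:
    """Integer labels compared with the argmax of each softmax row."""
    return sum(t == argmax(row) for t, row in zip(labels, prob_rows)) / len(labels)


def categorical_accuracy(one_hot_rows: Sequence[Sequence[int]], prob_rows: Sequence[Sequence[float]]) -> float:
    """One-hot labels: compare the argmax of the label with the argmax of the prediction."""
    return sum(argmax(t) == argmax(row) for t, row in zip(one_hot_rows, prob_rows)) / len(prob_rows)


# Binary task with a single sigmoid output per example.
binary_score = binary_accuracy([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])

# Three-class task with softmax outputs.
softmax_out = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
sparse_score = sparse_categorical_accuracy([0, 2, 1], softmax_out)
one_hot_score = categorical_accuracy([[1, 0, 0], [0, 0, 1], [0, 1, 0]], softmax_out)
```

The sparse and one-hot versions give the same score on the same data, since they differ only in how the true labels are encoded.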


https://keras.io/examples/audio/uk_ireland_accent_recognition/#confusion-matrix

* explore the principles more deeply
* what's the difference between neural network architecture, hyperparameters, and model parameters?
* keep all parameters regarding training (epochs, batch size (fixed))
* choosing the number of neurons in a layer, explore the principles more deeply
* store accuracy metrics over splits to look at overfitting
* false positives and negatives have some societal impact and relevance. go through
* what parameters should be considered:
    * specific standardized accuracy. minimize the complexity of network, total number of nodes.


### PRETTY PRESENTATION
* confusion matrix is PERFECT
* show the diminishing returns (or a plateau)
* repeat for different random samples for another table to see if you are getting consistent results across simulations (generate as many as possible)
* class materials on Evo framework and deep learning
* visualize FP TP FN TN in confusion matrix (metrics should be consistent) for validation to ensure metrics are correct
* create some artifacts that can be re-used and back my conclusions, so that my work could be reproduced
* expand output table to include entire population
* consolidate all the work done so far so Rachlin has a clear overview of everything
* show dashboard showing tradeoffs, so what's the relationship between the objectives (maybe can have pareto optimal solutions vs all solutions) --> is there a strong positive correlation between increasing accuracy and increasing complexity?
* getting the Pareto optimal set of solutions and evaluating the different trade-offs (also fundamental trade off between accuracy and complexity. simplest neural network that still has good accuracy)
* manually check that the metrics are correct


### WRITE-UPS
* giving protocol into neural network.
    * Built “scaling laws”: Basically, a formula that predicts how much better performance gets if you make the model bigger or add more data. This helps others decide what size model or dataset they’ll need without guessing
* read in research area and write a background of what's been done in this research area.
* read Rachlin Evo paper for DT
* Rachlin's paper of using Evo Comp to evolve Decision Trees
* give Rachlin daily updates on progress

* becoming an expert in Keras.
* might get an account on research cluster.



# Week 10 (Nov 3 - 9): 


*  evolutionary computing
* Track the generation that it was in

evo com in ML

multisequence alignment

classification algorithm

time series analysis

variable stars

decision trees

neural networks

optimisation algorithms

* becoming an expert in Keras
* evo deep learning book continuation
* connect with past mentees to get their perspective on how to impress him
* removing 'softmax' since it requires a specific number of units.. should I include it down the line?
* a couple books on O'Reilly that I should take a closer look at
* sit and go through the textbook he recommended
* extremely valuable. opportunity to become an expert
* agents are dumb, just run at random
* if you can optimize a 5 layer network, you can get a 5 figure salary, if you optimize a 100 layer network you can get a 7 figure salary
* evo deep learning book: migenetic algorithms paper
* should there be some level of informed search, where we check whether a lineage is evolving to get better or to get worse?
* can we do it where we remove non-dominated solutions and then just pick a neighbor, i.e., tweak it by choosing among the neighbors we have NOT yet visited for that node?

# Week 11 (Nov 10 - 16):
* create a visualization (use qr to scan) for people to see this project at work in an animation
* use cursor to create a frontend that allows people to understand the project



# Week 12 (Nov 17 - 23): 

* giving protocol into neural network.
* might get an account on research cluster.
* CNN for image analysis and image net competitions

# Week 13 (Nov 24 - 30): 
* put together something that demonstrates how time together

# Week 14 (Dec 1 - 7): 
* secure letter of recommendation

# Week 15 (Dec 8 - 14): 
* compare keras and pytorch