# Project Analysis: Composer Classification

### Dataset Overview
 - There are 194 files in the **training set**
 - There are 35 files in the **test set**

MIDI files are unusual in that they do not store audio recordings; instead, they represent the score of the music. A musical piece can be written for multiple instruments with different timings, and a MIDI file is organized into layers of instruments, each note carrying a timing, a pitch, and a velocity. The file is intended to be played back by a MIDI synthesizer, which reads each instrument track and renders the sound using a specified *“instrument”*.

Each MIDI file can be represented as a *Piano Roll*, which flattens all instruments in the file into a unified score, as if the entire orchestration were played on a single piano. This representation trades away some fidelity but lends itself well to ML applications. A piano roll can be visualized as a 128 px × N image, where N is the length of the musical piece sampled at a frequency `fs`. Below is an example piano roll of Bach's 3rd Cello Suite. Looking at the fingerprint of Bach, it is tempting to imagine that if you looked at enough of these images you could learn to distinguish different composers via pattern recognition. As a contrast, compare the fingerprint of *Mars, the Bringer of War* from The Planets Suite for Orchestra, op. 32, composed by Gustav Holst. This piece is light years apart compositionally from the Baroque and Romantic era composers in our positive class set.

| Bach | Holst |
|------|-------|
|![Example Piano Roll](../resources/piano_roll.png "Bach's 3rd Cello Suite")|![Mars Piano Roll](../resources/mars.png "Holst - Mars")|


These two pieces clearly have different fingerprints, but is that enough to build a robust classifier?
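
To make the piano roll representation concrete, here is a minimal, dependency-free sketch of how notes from several instruments could be flattened into a 128 × N grid. The `(pitch, velocity, start, end)` tuple layout, the `piano_roll` helper, and the choice to keep the louder velocity when notes overlap are assumptions made for illustration, not the project's actual code. In practice a library call such as `pretty_midi.PrettyMIDI(path).get_piano_roll(fs=100)` produces a comparable array directly.

```python
# Hypothetical note layout for this sketch: (pitch, velocity, start_sec, end_sec).
# Notes from every instrument are pooled into one list before flattening.

def piano_roll(notes, fs=100, n_pitches=128):
    """Flatten notes into an n_pitches x N grid of velocities, N = duration * fs."""
    if not notes:
        return [[] for _ in range(n_pitches)]
    end_time = max(end for _, _, _, end in notes)
    n_frames = int(round(end_time * fs))
    roll = [[0] * n_frames for _ in range(n_pitches)]
    for pitch, velocity, start, end in notes:
        first = int(round(start * fs))
        last = min(int(round(end * fs)), n_frames)
        for frame in range(first, last):
            # Assumption: overlapping notes on one pitch keep the louder velocity.
            roll[pitch][frame] = max(roll[pitch][frame], velocity)
    return roll


# Two made-up notes standing in for two instrument tracks.
notes = [
    (48, 80, 0.0, 0.5),    # C3, e.g. a cello line
    (60, 100, 0.25, 1.0),  # C4, e.g. a violin line
]
roll = piano_roll(notes, fs=100)
print(len(roll), len(roll[0]))  # rows are pitches, columns are time frames
```

Each row is one of the 128 MIDI pitches and each column is one time step of `1/fs` seconds, which is what allows the roll to be treated as an image.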

### Challenges / Assumptions
The task, as outlined, is 1-vs-All classification from a 30-second MIDI file of *live captured audio*, and it presents several challenges. The assumption is that each MIDI file is a live recording of a composition that is being classified in real time.
1.  Either musicians are using MIDI instruments, or a process exists to capture the audio and generate MIDI files
2.  There is a limited amount of data, and no counterexamples are given for the negative class
    1.   This could be augmented with an external data source, but that data will come from a different distribution than the process in *1.*, as with the Holst piece.
    2.   If we augment the data with random MIDI files from other Baroque/Romantic era composers (as opposed to modern classical such as Holst, or modern pop music such as Taylor Swift), it is possible our model will learn to detect a different distribution rather than the composer, giving misleading performance characteristics at development time. Clarification with the client is needed to determine the best path forward.
3. In the training set there is a large class imbalance for Beethoven. It is unclear whether this is the intended inference-time distribution, and this will affect pre-training data processing and system configuration
4. The client made no mention of the pipeline that the requested system should be part of. This requirement needs clarification with the client
5. Classical ML methods are preferred, which presents a challenge mainly around feature engineering: features must be manually extracted from the dataset, and this will be very difficult. We can create features, but domain expertise is likely needed if classical ML is a must-have for this project. We need to understand the challenge for the client and the best path forward. It can be argued that CNNs or LSTMs are classical in comparison to LLMs. TBD
6. The specific challenge in detecting a composer, compared with the **audio search** model (*Shazam*) for detecting pre-recorded music, is that pre-recorded music has a [sonic fingerprint](https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf), whereas in a live recording the score is interpreted by the conductor and each musician. This further complicates the feature engineering. Others have attempted composer classification directly from the score via the **kern** format ([Classifying Musical Scores by Composer: A machine learning approach](https://cs229.stanford.edu/proj2008/LebarChangYu-ClassifyingMusicalScoresByComposer.pdf)), but this is not specifically what the client has asked for. To clarify the difficulty of this task: if the Shazam model uses audio fingerprinting, this task attempts to recognize a composer by reading a low-fidelity, partial *smudged fingerprint*... An interesting challenge. Again, we need to better understand the challenge the client is trying to solve, along with its success factors and constraints, so that we don't overpromise on this project.
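
For the class imbalance raised in item 3, one common mitigation is to weight classes inversely to their frequency, using the same `n_samples / (n_classes * count)` heuristic that scikit-learn applies for `class_weight="balanced"`. The sketch below is illustrative only: the labels and counts are made-up placeholders, not the actual training set distribution, and whether reweighting is appropriate depends on the inference-time distribution that still needs to be clarified with the client.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Placeholder labels and counts, NOT the real dataset distribution.
labels = ["composer_a"] * 6 + ["composer_b"] * 2 + ["composer_c"] * 2
weights = balanced_class_weights(labels)
for label, weight in sorted(weights.items()):
    print(f"{label}: {weight:.3f}")
```

The over-represented class receives a weight below 1 and the rarer classes a weight above 1, so each class contributes equally to the total loss.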


### EDA and Initial Modeling
Initial modeling was done by manually engineering features and applying K-means clustering and Gaussian Mixture Models, with and without kernel tricks. The training set was further split into a validation set stratified on each composition. For each piece, nine 30-second slices were extracted at a sampling rate of 100 Hz (one frame every 1/100 s), resulting in a 128x3000 frame per slice. The slices were sampled randomly from the piece; however, the first 30 seconds and last 30 seconds were always included for each piece.
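
The slicing scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the analysis: the helper names `slice_starts` and `to_frames` are hypothetical, interior slices are drawn uniformly and may overlap, and pieces shorter than one slice are rejected.

```python
import random


def slice_starts(duration_s, n_slices=9, slice_s=30.0, seed=None):
    """Return sorted start times (seconds) of n_slices windows of slice_s seconds.

    The first and last windows of the piece are always included; the remaining
    windows are drawn uniformly at random (assumption: overlap is allowed).
    """
    if duration_s < slice_s:
        raise ValueError("piece is shorter than one slice")
    rng = random.Random(seed)
    last_start = duration_s - slice_s
    starts = [0.0, last_start]
    starts += [rng.uniform(0.0, last_start) for _ in range(n_slices - 2)]
    return sorted(starts)


def to_frames(start_s, slice_s=30.0, fs=100):
    """Convert a start time into (first, last) column indices of the piano roll."""
    first = int(round(start_s * fs))
    return first, first + int(round(slice_s * fs))


# A made-up 4-minute piece: 9 windows, each 30 s * 100 Hz = 3000 columns wide.
starts = slice_starts(duration_s=240.0, seed=0)
windows = [to_frames(s) for s in starts]
print(windows[0], windows[-1])
```

Each `(first, last)` pair selects a 128x3000 block of columns from the full piano roll of the piece.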




### Recommendations

1. Clarify project constraints and success factors with the client. We need to understand the client's end goal in order to be successful.
2. The project as currently defined, with only the provided data and the current constraints, is likely to be unsuccessful and risks SFL's brand reputation through client dissatisfaction with performance. If we must deliver a model with the data provided under the assumptions and guidelines given, I would recommend passing on this project.



### References
| Project/Reference | License | Source | Usage Description |
|-------------------|---------|--------|-------------------|
|Pretty Midi | MIT | https://github.com/craffel/pretty-midi | MIDI file manipulation and data extraction|
|Tensorflow | Apache 2.0 | https://github.com/tensorflow/tensorflow/blob/master/LICENSE | Modeling |
|SKLearn | BSD 3-Clause | https://github.com/scikit-learn/scikit-learn?tab=BSD-3-Clause-1-ov-file | Modeling |



### Literature Review
| Title | Description | Source  |
|-------|-------------|---------|
|The Classification of musical scores by composer |Research into the base problem presented by the client, inspiration for *Piano Roll* representation used in these models |https://cs230.stanford.edu/projects_fall_2018/reports/12441334.pdf|
|Classifying Musical Scores by Composer |Classical Methods reference |https://cs229.stanford.edu/proj2008/LebarChangYu-ClassifyingMusicalScoresByComposer.pdf|
|Classifying Musical Scores by Composer|Citation Graph |https://www.connectedpapers.com/main/50080fb82064b952d0450e9dced8c9536a129d35/Classifying-Musical-Scores-by-Composer-%3A-A-machine-learning-approach/graph|
|Music Genre Classification Using MIDI and Audio Features | Citation Graph |https://www.connectedpapers.com/main/ded47f6f02fd24e752fd8853f2702b43db277904/Music-Genre-Classification-Using-MIDI-and-Audio-Features/graph|
|An Industrial-Strength Audio Search Algorithm | Shazam method  |https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf|
