# Understanding Collected MIDI Data

- The 32 Goldberg Variations MIDI files (J.S. Bach) are available at http://www.jsbach.net/midi/midi_goldbergvariations.html
- The Beethoven Piano Sonatas are available at https://www.suzumidi.com/eng/beethov1.htm

1. **Select Music**: 
    - You choose a collection of simple classical piano pieces available in MIDI format.
  
2. **Convert to Text**: 
    - You write a Python script using a library like `pretty_midi` to read MIDI files and extract note and duration information.
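    - For example, a C4 note (MIDI pitch 60) held for 0.25 seconds would be converted to `(60, 0.25)`. The script below measures durations in seconds, not in note values such as quarter notes.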

```python
# Example code snippet for converting a single MIDI file to text
import pretty_midi

# Load MIDI file
midi_data = pretty_midi.PrettyMIDI('example_song.mid')

# Extract notes from the first instrument track
notes = midi_data.instruments[0].notes

# Convert notes to text format
text_notes = [(note.pitch, note.end - note.start) for note in notes]
```

In [23]:
import pretty_midi

In [24]:
# Load MIDI file - The Beatles, "While My Guitar Gently Weeps" (piano accompaniment)
midi_data = pretty_midi.PrettyMIDI('../data/external/The_Beatles_-_While_My_Guitar_Gently_Weeps_Piano_Accompaniment.mid')

In [25]:
for index, instrument in enumerate(midi_data.instruments):
    print(f"Instrument {index}: {instrument.name}")

Instrument 0: 
Instrument 1: 


In [26]:
# Extract notes from the first instrument track
notes = midi_data.instruments[0].notes
notes[0:10]

[Note(start=0.000000, end=0.355208, pitch=64, velocity=80),
 Note(start=0.000000, end=0.355208, pitch=69, velocity=80),
 Note(start=0.375000, end=0.492708, pitch=69, velocity=80),
 Note(start=0.500000, end=0.855208, pitch=69, velocity=80),
 Note(start=0.875000, end=0.992708, pitch=69, velocity=80),
 Note(start=1.000000, end=1.355208, pitch=69, velocity=80),
 Note(start=1.375000, end=1.730208, pitch=69, velocity=80),
 Note(start=1.750000, end=1.986458, pitch=69, velocity=80),
 Note(start=2.000000, end=2.355208, pitch=64, velocity=80),
 Note(start=2.000000, end=2.355208, pitch=69, velocity=80)]

In [27]:
# Extract notes from the second instrument track
notes = midi_data.instruments[1].notes
notes[0:10]

[Note(start=0.000000, end=0.355208, pitch=57, velocity=80),
 Note(start=1.750000, end=1.986458, pitch=57, velocity=80),
 Note(start=2.000000, end=2.355208, pitch=55, velocity=80),
 Note(start=3.750000, end=3.986458, pitch=55, velocity=80),
 Note(start=4.000000, end=4.355208, pitch=54, velocity=80),
 Note(start=5.750000, end=5.986458, pitch=54, velocity=80),
 Note(start=6.000000, end=7.423958, pitch=53, velocity=80),
 Note(start=7.750000, end=7.986458, pitch=53, velocity=80),
 Note(start=8.000000, end=8.711458, pitch=57, velocity=80),
 Note(start=9.500000, end=9.973958, pitch=57, velocity=80)]

Let's break down the code and output step by step:

### Code:

```python
notes = midi_data.instruments[0].notes
notes[0:10]
```

### Explanation:

1. **`midi_data.instruments[0]`**:
   - `midi_data` is an object representing the loaded MIDI file.
   - `instruments` is a list of all the instrument tracks in the MIDI file.
   - `midi_data.instruments[0]` accesses the first instrument track in the MIDI file.

2. **`.notes`**:
   - Each instrument track in a MIDI file contains a sequence of notes.
   - `.notes` is an attribute of the instrument object that returns a list of all the notes in that instrument track.

3. **`notes[0:10]`**:
   - This slices the list of notes to retrieve the first 10 notes from the instrument track.

### Output:

```
[Note(start=0.000000, end=0.355208, pitch=64, velocity=80),
 ...
 Note(start=2.000000, end=2.355208, pitch=69, velocity=80)]
```

### Explanation:

Each `Note` object in the output represents a musical note and has the following properties:

1. **`start` and `end`**:
   - These represent the start and end times of the note, in seconds. The difference between the end and start times gives the duration of the note.
   - For instance, the first note starts at `0.000000` seconds and ends at `0.355208` seconds, making its duration `0.355208` seconds.

2. **`pitch`**:
   - This is the MIDI note number, an integer that identifies the note rather than its frequency in hertz.
   - In MIDI, middle C (C4) is 60, and the number goes up or down by one for each half step (semitone). For example, a pitch of 64 corresponds to E4 and a pitch of 69 to A4.

3. **`velocity`**:
   - This represents how hard the note is struck.
   - In the context of a piano, a higher velocity means the key was pressed harder, resulting in a louder sound. In MIDI, velocity values for sounding notes range from 1 (softest) to 127 (loudest); a note-on with velocity 0 is conventionally treated as a note-off. Here, all the notes have a velocity of 80, which is a moderately loud level.
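As a rough illustration of the pitch numbering described above, the sketch below maps a MIDI note number to a note name and to a frequency. It assumes the C4 = 60 naming convention and A4 = 440 Hz equal-temperament tuning. The helper names `note_name` and `note_hz` are hypothetical; `pretty_midi` ships its own `note_number_to_name` and `note_number_to_hz` functions that could be used instead.

```python
# Pitch classes within one octave, using sharps only (an assumption for display).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(number):
    """Return a name such as 'C4' for a MIDI note number (C4 = 60 convention)."""
    octave = number // 12 - 1
    return f"{NOTE_NAMES[number % 12]}{octave}"


def note_hz(number):
    """Return the equal-temperament frequency in Hz, assuming A4 (69) = 440 Hz."""
    return 440.0 * 2 ** ((number - 69) / 12)


for n in (60, 64, 67, 69):
    print(n, note_name(n), round(note_hz(n), 2))
```

Each semitone multiplies the frequency by the twelfth root of two, which is why the note number is a more convenient unit than hertz for text representations.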

The second code block and output follow the same logic but for the second instrument track (`midi_data.instruments[1]`). This track likely represents a different part of the piano piece, possibly the left hand if the first track represents the right hand.

Together, the two cells show the first 10 notes of each instrument track, with every note characterized by its start time, end time, pitch, and velocity.

## Pitch-Duration Simplification

In [38]:
# Extract notes from the first instrument track
notes = midi_data.instruments[0].notes
# Convert notes to text format
text_notes_rh = [(note.pitch, note.end - note.start) for note in notes]
text_notes_rh

[(64, 0.35520833333333335),
 (69, 0.35520833333333335),
 (69, 0.1177083333333333),
 (69, 0.35520833333333335),
 (69, 0.1177083333333333),
 (69, 0.35520833333333335),
 (69, 0.35520833333333335),
 (69, 0.23645833333333321),
 (64, 0.3552083333333331),
 (69, 0.3552083333333331),
 (69, 0.1177083333333333),
 (69, 0.3552083333333331),
 (69, 0.1177083333333333),
 (69, 0.3552083333333331),
 (69, 0.3552083333333331),
 (69, 0.23645833333333321),
 (62, 0.35520833333333357),
 (69, 0.35520833333333357),
 (69, 0.11770833333333286),
 (69, 0.35520833333333357),
 (69, 0.11770833333333286),
 (69, 0.35520833333333357),
 (71, 0.35520833333333357),
 (72, 0.23645833333333321),
 (65, 1.4239583333333332),
 (69, 1.4239583333333332),
 (74, 1.4239583333333332),
 (72, 0.23645833333333321),
 (74, 0.23645833333333321),
 (69, 0.7114583333333329),
 (72, 0.7114583333333329),
 (76, 0.7114583333333329),
 (72, 0.7114583333333329),
 (69, 0.4739583333333339),
 (62, 0.7114583333333329),
 (67, 0.7114583333333329),
 (74, 0.711

In [39]:
# Extract notes from the second instrument track
notes = midi_data.instruments[1].notes
# Convert notes to text format
text_notes_lh = [(note.pitch, note.end - note.start) for note in notes]
text_notes_lh

[(57, 0.35520833333333335),
 (57, 0.23645833333333321),
 (55, 0.3552083333333331),
 (55, 0.23645833333333321),
 (54, 0.35520833333333357),
 (54, 0.23645833333333321),
 (53, 1.4239583333333332),
 (53, 0.23645833333333321),
 (57, 0.7114583333333329),
 (57, 0.4739583333333339),
 (55, 0.7114583333333329),
 (55, 0.4739583333333339),
 (54, 0.7114583333333329),
 (54, 0.4739583333333339),
 (52, 0.4739583333333339),
 (52, 0.23645833333333321),
 (52, 0.23645833333333321),
 (57, 0.47395833333333215),
 (55, 0.47395833333333215),
 (54, 0.47395833333333215),
 (53, 0.47395833333333215),
 (53, 0.23645833333333144),
 (57, 0.47395833333333215),
 (57, 0.23645833333333144),
 (55, 0.47395833333333215),
 (55, 0.23645833333333144),
 (57, 0.23645833333333144),
 (59, 0.23645833333333144),
 (52, 0.47395833333333215),
 (57, 0.4739583333333357),
 (55, 0.4739583333333357),
 (54, 0.4739583333333357),
 (53, 0.4739583333333357),
 (53, 0.23645833333333144),
 (57, 0.4739583333333357),
 (57, 0.4739583333333357),
 (55, 0


### Original Code:

The original code extracts the first 10 notes from the first instrument track and displays them as they are:

```python
# Extract notes from the first instrument track
notes = midi_data.instruments[0].notes
notes[0:10]
```

The output is a list of `Note` objects, each with properties like `start`, `end`, `pitch`, and `velocity`.

### Transformation:

The goal is to simplify the representation of each note by only considering its pitch and duration (calculated as the difference between the `end` and `start` times).

To achieve this, we use a list comprehension:

```python
# Convert notes to text format
text_notes = [(note.pitch, note.end - note.start) for note in notes]
```

Here's what's happening:

1. **`note.pitch`**: This extracts the pitch of the note as a MIDI note number (an integer where 60 is middle C), not as a frequency in hertz.

2. **`note.end - note.start`**: This calculates the duration of the note by subtracting its start time from its end time.

The result is a list of tuples, where each tuple represents a note by its pitch and duration.

### Final Code:

Combining everything, the final code looks like this:

```python
# Extract notes from the first instrument track
notes = midi_data.instruments[0].notes
# Convert notes to text format
text_notes = [(note.pitch, note.end - note.start) for note in notes]
text_notes[0:10]
```

The output is a simplified representation of the first 10 notes, showing
only their pitch and duration. This format is more concise and can be
more easily processed or analyzed, especially if you're not interested
in other properties like velocity or exact start/end times.
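The raw durations carry floating-point noise (for example `0.35520833333333335` versus `0.3552083333333331`), which makes otherwise identical notes look different in a text representation. One possible cleanup, sketched below, converts seconds to beats and snaps each duration to a grid. It assumes a constant tempo of 120 BPM (the value this file reports) and a sixteenth-note grid of 0.25 beats. The function names and the grid size are illustrative choices, not part of `pretty_midi`.

```python
def seconds_to_beats(duration_s, bpm=120.0):
    """Convert a duration in seconds to quarter-note beats at a constant tempo."""
    return duration_s * bpm / 60.0


def quantize(pairs, bpm=120.0, grid=0.25):
    """Snap each (pitch, seconds) pair to the nearest multiple of `grid` beats.

    Durations shorter than half a grid step are kept at one step so that
    no note collapses to zero length.
    """
    quantized = []
    for pitch, duration_s in pairs:
        steps = max(1, round(seconds_to_beats(duration_s, bpm) / grid))
        quantized.append((pitch, steps * grid))
    return quantized


sample = [
    (64, 0.35520833333333335),
    (69, 0.1177083333333333),
    (65, 1.4239583333333332),
]
print(quantize(sample))
```

Quantizing discards small timing deviations, so it suits score-like MIDI files better than recordings of expressive performances.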

The simplified representation focuses only
on the pitch and duration of each note, and in doing so, it omits the
velocity information. 

Velocity in MIDI terms represents the "strength" or "intensity" with
which a note is played. On a piano, it corresponds to how hard a key is
pressed, which affects the loudness and timbre of the note. By omitting
velocity, you lose information about the dynamics and expressiveness of
the performance.

## Pitch-Duration-Velocity Simplification

If you want to retain the velocity information along with the pitch and
duration, you can modify the list comprehension to include it:

```python
text_notes_with_velocity = [(note.pitch, note.end - note.start, note.velocity) for note in notes]
```

This will give you a list of tuples where each tuple represents a note
by its pitch, duration, and velocity.
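To turn such tuples into actual text, one option is to join the fields of each note with a separator and the notes with spaces. The sketch below is only one possible encoding: the `pitch:duration:velocity` layout, the three-decimal rounding, and both function names are assumptions made for illustration.

```python
def notes_to_text(triples, ndigits=3):
    """Serialize (pitch, duration, velocity) tuples as 'pitch:duration:velocity' tokens."""
    return " ".join(f"{p}:{round(d, ndigits)}:{v}" for p, d, v in triples)


def text_to_notes(text):
    """Parse the token string produced by notes_to_text back into tuples."""
    notes = []
    for token in text.split():
        pitch, duration, velocity = token.split(":")
        notes.append((int(pitch), float(duration), int(velocity)))
    return notes


triples = [(57, 0.35520833333333335, 80), (57, 0.23645833333333321, 80)]
encoded = notes_to_text(triples)
print(encoded)
print(text_to_notes(encoded))
```

The round trip is lossy only in the duration field, where the rounding removes digits beyond the chosen precision.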

In [30]:
text_notes_with_velocity = [(note.pitch, note.end - note.start, note.velocity) for note in notes]
text_notes_with_velocity

[(57, 0.35520833333333335, 80),
 (57, 0.23645833333333321, 80),
 (55, 0.3552083333333331, 80),
 (55, 0.23645833333333321, 80),
 (54, 0.35520833333333357, 80),
 (54, 0.23645833333333321, 80),
 (53, 1.4239583333333332, 80),
 (53, 0.23645833333333321, 80),
 (57, 0.7114583333333329, 80),
 (57, 0.4739583333333339, 80),
 (55, 0.7114583333333329, 80),
 (55, 0.4739583333333339, 80),
 (54, 0.7114583333333329, 80),
 (54, 0.4739583333333339, 80),
 (52, 0.4739583333333339, 80),
 (52, 0.23645833333333321, 80),
 (52, 0.23645833333333321, 80),
 (57, 0.47395833333333215, 80),
 (55, 0.47395833333333215, 80),
 (54, 0.47395833333333215, 80),
 (53, 0.47395833333333215, 80),
 (53, 0.23645833333333144, 80),
 (57, 0.47395833333333215, 80),
 (57, 0.23645833333333144, 80),
 (55, 0.47395833333333215, 80),
 (55, 0.23645833333333144, 80),
 (57, 0.23645833333333144, 80),
 (59, 0.23645833333333144, 80),
 (52, 0.47395833333333215, 80),
 (57, 0.4739583333333357, 80),
 (55, 0.4739583333333357, 80),
 (54, 0.47395833333

## Pedal Information

Pedal information is crucial for capturing the nuances of a piano
performance, especially in classical music. In MIDI files, pedal data is
typically represented as control change events rather than note events.
The sustain pedal, which is the most commonly used pedal on the piano,
corresponds to MIDI control number 64.

In the `pretty_midi` library, control change events can be accessed
through the `control_changes` attribute of an instrument.

Here's how you can extract sustain pedal data from a MIDI file using
`pretty_midi`:

```python
# Extract control changes from the first instrument track
control_changes = midi_data.instruments[0].control_changes

# Filter for sustain pedal events (MIDI control number 64)
sustain_pedal_events = [cc for cc in control_changes if cc.number == 64]
```

Each item in `sustain_pedal_events` will have a `time`, `number`, and
`value` attribute:

- `time`: The time at which the control change event occurs.
- `number`: The control number (should be 64 for sustain pedal events).
- `value`: The value of the control change. For the sustain pedal, a
  value of 0 typically means the pedal is released, while a value of 127
  means the pedal is fully depressed. Values in between can represent
  partial pedal presses, though in practice, many MIDI recordings only
  use 0 and 127.

If you want a comprehensive representation of a piano performance in a
MIDI file, you'd ideally want to consider note events (with pitch,
duration, and velocity) and control change events (especially the
sustain pedal) together. This will give you a more complete picture of
both the notes being played and the expressive elements of the
performance, such as dynamics and pedaling.
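Raw pedal events are easier to combine with notes once they are paired into "pedal down" intervals. The sketch below is a minimal version of that pairing, assuming the common convention that values of 64 and above mean the pedal is down. The `CC` named tuple is a stand-in with the same `number`, `value`, and `time` fields as the control change objects described above, and `pedal_intervals` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for a control change event; only the three fields used here.
CC = namedtuple("CC", ["number", "value", "time"])


def pedal_intervals(events, end_time=None, threshold=64):
    """Pair sustain pedal (CC 64) presses and releases into (start, end) tuples.

    Repeated presses or releases are ignored. A pedal still held at the end
    is closed at `end_time` when that is provided.
    """
    intervals = []
    down_at = None
    for cc in sorted(events, key=lambda e: e.time):
        if cc.number != 64:
            continue
        pressed = cc.value >= threshold
        if pressed and down_at is None:
            down_at = cc.time
        elif not pressed and down_at is not None:
            intervals.append((down_at, cc.time))
            down_at = None
    if down_at is not None and end_time is not None:
        intervals.append((down_at, end_time))
    return intervals


events = [
    CC(64, 127, 0.002083),
    CC(64, 0, 2.001042),
    CC(64, 127, 2.002083),
    CC(64, 0, 4.001042),
]
print(pedal_intervals(events))
```

With real data, the list returned by filtering `control_changes` for number 64 could be passed in place of `events`, since only the three attributes are read.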

In [37]:
# Extract control changes from the first instrument track
control_changes = midi_data.instruments[0].control_changes

# Filter for sustain pedal events (MIDI control number 64)
sustain_pedal_events = [cc for cc in control_changes if cc.number == 64]
sustain_pedal_events

[ControlChange(number=64, value=127, time=0.002083),
 ControlChange(number=64, value=0, time=2.001042),
 ControlChange(number=64, value=127, time=2.002083),
 ControlChange(number=64, value=0, time=4.001042),
 ControlChange(number=64, value=127, time=4.002083),
 ControlChange(number=64, value=0, time=6.001042),
 ControlChange(number=64, value=127, time=7.502083),
 ControlChange(number=64, value=0, time=10.001042),
 ControlChange(number=64, value=0, time=10.001042),
 ControlChange(number=64, value=0, time=10.001042),
 ControlChange(number=64, value=127, time=10.002083),
 ControlChange(number=64, value=127, time=10.002083),
 ControlChange(number=64, value=127, time=10.002083),
 ControlChange(number=64, value=0, time=12.001042),
 ControlChange(number=64, value=127, time=12.002083),
 ControlChange(number=64, value=0, time=14.501042),
 ControlChange(number=64, value=0, time=14.501042),
 ControlChange(number=64, value=127, time=14.502083),
 ControlChange(number=64, value=127, time=14.502083),

## Other information included in a MIDI file

A MIDI file, especially one representing a piano performance, can
contain a wealth of information beyond just the notes and pedal data.
Here's a breakdown of some of the other types of information you might
find in such a MIDI file:

1. **Meta Events**:
   - **Track Name**: A name or title for a particular track or the
     entire piece.
   - **Lyrics**: If the piece has lyrics, they can be embedded in sync
     with the music.
   - **Markers**: Points of interest or significance within the track.
   - **Tempo Changes**: Changes in the playback speed throughout the
     piece.
   - **Time Signature**: Information about the number of beats per
     measure and which note gets the beat.
   - **Key Signature**: Information about the key of a section of music
     (e.g., C Major, D Minor).

2. **Control Changes**:
   - **Modulation (Control number 1)**: Often associated with vibrato or
     other expressive effects.
   - **Expression (Control number 11)**: Can adjust the volume of a note
     apart from its velocity, allowing for crescendos and decrescendos
     during a sustained note.
   - **Soft Pedal (Control number 67)**: Represents the una corda pedal
     on a piano, which softens the sound.
   - **Sostenuto Pedal (Control number 66)**: Holds certain notes while
     allowing others to be released.

3. **Program Changes**: MIDI allows for different instrument sounds
   (called patches). A program change event switches to a different
   instrument sound. For a solo piano piece, you might not see this, but
   in a multi-instrument MIDI file, program changes are common.

4. **Pitch Bend**: Represents a change in pitch, similar to sliding
   between notes or bending a string on a guitar.

5. **Aftertouch**:
   - **Channel Aftertouch**: A pressure value for the entire channel,
     often related to the pressure applied to keys after they've been
     struck.
   - **Polyphonic Aftertouch**: A pressure value for individual notes.

6. **System Exclusive (SysEx) Messages**: These are
   manufacturer-specific messages that can be used for various purposes,
   such as configuring a specific piece of MIDI hardware.

7. **End of Track**: Marks the end of a track.

8. **MIDI Channels**: MIDI files can have up to 16 channels, which can
   be thought of as separate "lines" or "voices" in the music. In a solo
   piano piece, you might only see one or two channels, but in ensemble
   pieces, different instruments or parts might be assigned to different
   channels.

9. **Other Meta Events**: Text events, copyright notices,
   sequencer-specific metadata, and more.

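When working with the `pretty_midi` library in Python, a subset of these
events is exposed directly: notes, pitch bends, and control changes for
each instrument, plus tempo, time signature, key signature, and lyric
events for the file as a whole. Other event types, such as aftertouch and
SysEx messages, are generally not surfaced and may require a lower-level
MIDI parser. Depending on the complexity of the MIDI file and the
intentions of its creator, not all of these data types may be present,
but they represent the range of possibilities in the MIDI standard.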

## Tempo Changes

In [32]:
tempo_changes = midi_data.get_tempo_changes()
print(tempo_changes)



(array([0.]), array([120.]))


## Time Signature

In [33]:
time_signatures = midi_data.time_signature_changes
for ts in time_signatures:
    print(ts)

4/4 at 0.00 seconds


## File Length

In [34]:
print("Length in seconds:", midi_data.get_end_time())


Length in seconds: 178.0


## File Resolution

In [35]:
print("Resolution:", midi_data.resolution)


Resolution: 480
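The resolution is the number of ticks per quarter note, so together with the tempo it fixes how long one tick lasts. The sketch below works through that arithmetic for the values reported in this notebook (120 BPM, resolution 480). It assumes a single constant tempo; files with tempo changes need a piecewise conversion, which `pretty_midi` handles through its `tick_to_time` and `time_to_tick` methods. The function names here are hypothetical.

```python
def ticks_to_seconds(ticks, bpm=120.0, resolution=480):
    """Convert MIDI ticks to seconds, assuming one constant tempo."""
    seconds_per_tick = 60.0 / (bpm * resolution)
    return ticks * seconds_per_tick


def seconds_to_ticks(seconds, bpm=120.0, resolution=480):
    """Convert seconds to the nearest whole tick, assuming one constant tempo."""
    return round(seconds * bpm * resolution / 60.0)


print(ticks_to_seconds(480))                  # one quarter note
print(seconds_to_ticks(0.35520833333333335))  # a duration from the note lists
```

Under these assumptions one tick lasts about 1.04 milliseconds, which is consistent with the first sustain pedal event sitting roughly two ticks after time zero.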


To explore all the information in your MIDI file using the `pretty_midi` library, you'll want to access and inspect various attributes and methods provided by the library. Here's a step-by-step guide to help you explore the contents of your MIDI file:

1. **Load the MIDI File**:
   First, you'll need to load your MIDI file into a `PrettyMIDI` object.

   ```python
   import pretty_midi

   midi_data = pretty_midi.PrettyMIDI('path_to_your_midi_file.mid')
   ```

2. **Inspect the Instruments**:
   The `instruments` attribute contains a list of all the instrument tracks in the MIDI file.

   ```python
   for instrument in midi_data.instruments:
       # pretty_midi Instrument objects have no `channel` attribute; `is_drum` is shown instead
       print(f"Instrument {instrument.program} ({instrument.name}), Drum: {instrument.is_drum}")
   ```

3. **Notes, Pitch Bends, and Control Changes for Each Instrument**:
   For each instrument, you can inspect the notes, pitch bends, and control changes.

   ```python
   for instrument in midi_data.instruments:
       print(f"Notes for {instrument.name}:")
       for note in instrument.notes[:10]:  # Displaying the first 10 notes as an example
           print(note)

       print(f"Pitch Bends for {instrument.name}:")
       for bend in instrument.pitch_bends[:10]:  # Displaying the first 10 pitch bends as an example
           print(bend)

       print(f"Control Changes for {instrument.name}:")
       for control_change in instrument.control_changes[:10]:  # Displaying the first 10 control changes as an example
           print(control_change)
   ```

4. **Meta Information**:
   - **Tempo Changes**:
     ```python
     tempo_changes = midi_data.get_tempo_changes()
     print(tempo_changes)
     ```

   - **Time Signature Changes**:
     ```python
     time_signatures = midi_data.time_signature_changes
     for ts in time_signatures:
         print(ts)
     ```

   - **Key Signature Changes**:
     ```python
     key_signatures = midi_data.key_signature_changes
     for ks in key_signatures:
         print(ks)
     ```

   - **Lyrics**:
     ```python
     for lyric in midi_data.lyrics:
         print(lyric.text, "at time", lyric.time)
     ```

5. **Other Information**:
   - **File Resolution**:
     ```python
     print("Resolution:", midi_data.resolution)
     ```

   - **File Length**:
     ```python
     print("Length in seconds:", midi_data.get_end_time())
     ```

This is a basic overview to help you start exploring the contents of your MIDI file. Depending on the complexity and richness of the MIDI file, there might be more or less data to inspect. The `pretty_midi` library provides a comprehensive set of tools to inspect and manipulate MIDI data, so you can delve deeper based on your needs.

In [36]:
# Sample MIDI data
right_hand = text_notes_rh

left_hand = text_notes_lh

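# Assuming a tempo of 60 quarter-note beats per minute and a time signature of 3/8,
# so each bar holds three eighth notes of 0.5 seconds each.
# Note: the file loaded above reports 120 BPM and 4/4, so these values are placeholders.
seconds_per_bar = 3 * 0.5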

# Define the chunks based on the Description column
chunks = [
    (1, 17, "First Subject in F minor"),
    (17, 36, "Connecting Episode"),
    (36, 62, "Second Subject in A flat major"),
    (62, 66, "Coda"),
    (66, 136, "Development Section")
]

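def chunk_midi_data(data, chunks):
    # Approximates each note's onset by summing the durations of the notes before it.
    # This ignores rests and treats chord notes as sequential, so the section
    # boundaries are only rough for polyphonic tracks.
    chunked_data = {}
    current_time = 0.0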
    
    for note, duration in data:
        for start_bar, end_bar, description in chunks:
            start_time = (start_bar - 1) * seconds_per_bar
            end_time = end_bar * seconds_per_bar
            
            if start_time <= current_time < end_time:
                if description not in chunked_data:
                    chunked_data[description] = []
                chunked_data[description].append((note, duration))
                break
        
        current_time += duration
    
    return chunked_data

chunked_right_hand = chunk_midi_data(right_hand, chunks)
chunked_left_hand = chunk_midi_data(left_hand, chunks)

print("Chunked Right Hand:", chunked_right_hand)
print("Chunked Left Hand:", chunked_left_hand)


Chunked Right Hand: {'First Subject in F minor': [(64, 0.35520833333333335), (69, 0.35520833333333335), (69, 0.1177083333333333), (69, 0.35520833333333335), (69, 0.1177083333333333), (69, 0.35520833333333335), (69, 0.35520833333333335), (69, 0.23645833333333321), (64, 0.3552083333333331), (69, 0.3552083333333331), (69, 0.1177083333333333), (69, 0.3552083333333331), (69, 0.1177083333333333), (69, 0.3552083333333331), (69, 0.3552083333333331), (69, 0.23645833333333321), (62, 0.35520833333333357), (69, 0.35520833333333357), (69, 0.11770833333333286), (69, 0.35520833333333357), (69, 0.11770833333333286), (69, 0.35520833333333357), (71, 0.35520833333333357), (72, 0.23645833333333321), (65, 1.4239583333333332), (69, 1.4239583333333332), (74, 1.4239583333333332), (72, 0.23645833333333321), (74, 0.23645833333333321), (69, 0.7114583333333329), (72, 0.7114583333333329), (76, 0.7114583333333329), (72, 0.7114583333333329), (69, 0.4739583333333339), (62, 0.7114583333333329), (67, 0.7114583333333329
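
Because the chunking cell above accumulates durations to estimate onsets, chords and rests shift the section boundaries. An alternative, sketched here under stated assumptions, assigns each note to a bar from its actual `start` time. The `Note` named tuple is a stand-in for the note objects shown earlier, the section labels are placeholders, bar ranges are treated as half-open (`start_bar` inclusive, `end_bar` exclusive), and `seconds_per_bar=2.0` corresponds to 4/4 at a constant 120 BPM.

```python
from collections import namedtuple

# Stand-in with the same fields as the Note objects printed earlier.
Note = namedtuple("Note", ["start", "end", "pitch", "velocity"])


def chunk_by_start_time(notes, chunks, seconds_per_bar):
    """Group (pitch, duration) pairs by the section containing each note's onset bar."""
    chunked = {description: [] for _, _, description in chunks}
    for note in notes:
        bar = int(note.start // seconds_per_bar) + 1  # bars are numbered from 1
        for start_bar, end_bar, description in chunks:
            if start_bar <= bar < end_bar:
                chunked[description].append((note.pitch, note.end - note.start))
                break
    return chunked


notes = [
    Note(0.0, 0.355208, 64, 80),
    Note(0.0, 0.355208, 69, 80),  # same onset as the previous note (a chord)
    Note(2.0, 2.355208, 64, 80),
    Note(8.0, 8.711458, 57, 80),  # bar 5, outside every section below
]
sections = [(1, 2, "Section A"), (2, 5, "Section B")]  # placeholder labels
result = chunk_by_start_time(notes, sections, seconds_per_bar=2.0)
print(result)
```

Notes that fall outside every listed bar range are dropped silently in this sketch, so the section table needs to cover the whole piece if nothing should be lost.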