
# Melody Representation in Python (Notebook)


## 1. Minimal, ordered representation
A melody is an **ordered sequence** of note events. We use a Python **list** of note **strings** to preserve order.


In [None]:

# A single melody: C D E F G
melody = ["C", "D", "E", "F", "G"]
melody
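
As a quick illustration of why order matters, the sketch below compares a list with a set built from the same notes. The names `reversed_melody` and `arpeggio` are introduced here for illustration only.

```python
melody = ["C", "D", "E", "F", "G"]
reversed_melody = melody[::-1]          # same notes, opposite direction

# Lists compare element by element, so these are different melodies
print(melody == reversed_melody)

# Sets ignore order, so the two melodies become indistinguishable
print(set(melody) == set(reversed_melody))

# Sets also drop repeated notes, which a melody needs to keep
arpeggio = ["C", "E", "G", "E", "C"]
print(len(arpeggio), len(set(arpeggio)))
```

A list keeps both the position and the repetition of every note, which is why it is used throughout this notebook.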



## 2. Adding rhythm and octave (optional upgrade)
If you need octave or rhythm information, use tuples `(pitch, duration)`, where `pitch` is in scientific pitch notation (e.g. `"C4"`) and `duration` is in beats.


In [None]:

# pitch as scientific pitch notation, duration in beats
melody_rich = [("C4", 1.0), ("D4", 1.0), ("E4", 1.0), ("F4", 1.0), ("G4", 2.0)]
melody_rich
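
Once notes carry octave and duration, quantities such as the total length in beats or a numeric pitch are easy to derive. The following is a minimal sketch, not part of the original notebook: `pitch_to_midi` is a hypothetical helper, it assumes the common convention that C4 corresponds to MIDI note 60, and it only understands sharps (no flats).

```python
import re

# Semitone offset of each pitch class from C (sharps only, for this sketch)
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
SEMITONE = {name: i for i, name in enumerate(PITCH_CLASSES)}

def pitch_to_midi(pitch):
    """Convert a name like 'C4' or 'F#3' to a MIDI note number (assumes C4 == 60)."""
    match = re.fullmatch(r"([A-G]#?)(-?\d+)", pitch)
    if match is None or match.group(1) not in SEMITONE:
        raise ValueError(f"Unrecognized pitch: {pitch!r}")
    name, octave = match.group(1), int(match.group(2))
    return 12 * (octave + 1) + SEMITONE[name]

melody_rich = [("C4", 1.0), ("D4", 1.0), ("E4", 1.0), ("F4", 1.0), ("G4", 2.0)]

# Total length of the melody in beats
total_beats = sum(duration for _, duration in melody_rich)

# Numeric pitches, which make octave-aware intervals possible
midi_pitches = [pitch_to_midi(pitch) for pitch, _ in melody_rich]

print(total_beats)    # 6.0
print(midi_pitches)   # [60, 62, 64, 65, 67]
```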



## 3. Grouping several melodies
Use a list of melodies for a simple corpus, or a dict if you want to attach names or other metadata.


In [None]:

# Option A: Corpus as a list of melodies (each melody is a list of note strings)
corpus = [
    ["C", "D", "E", "F", "G"],       # melody 1
    ["G", "F", "E", "D", "C"],       # melody 2
    ["C", "E", "G", "E", "C"]        # melody 3
]

# Option B: Named melodies (dictionary)
songs = {
    "ascending": ["C", "D", "E", "F", "G"],
    "descending": ["G", "F", "E", "D", "C"],
    "arpeggio": ["C", "E", "G", "E", "C"]
}

corpus, songs
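
The two options convert into each other easily. The sketch below recovers a plain corpus from the named form; it relies on dicts preserving insertion order (guaranteed since Python 3.7). The names `corpus_from_songs` and `names` are illustrative.

```python
songs = {
    "ascending": ["C", "D", "E", "F", "G"],
    "descending": ["G", "F", "E", "D", "C"],
    "arpeggio": ["C", "E", "G", "E", "C"],
}

# Option B -> Option A: keep the melodies, drop the names
corpus_from_songs = list(songs.values())

# The names stay aligned with the melodies by position
names = list(songs)

# Iterate over (name, melody) pairs when both are needed
for name, notes in songs.items():
    print(f"{name:<10} {len(notes)} notes: {' '.join(notes)}")
```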



## 4. Flattening a collection of melodies
Sometimes you need a single, long sequence of notes (e.g., for statistics or training). Three common approaches:
- `itertools.chain.from_iterable`
- Repeated `list.extend(...)`
- `sum(corpus, [])` (simple, but each `+` copies the accumulated list, so the cost grows quadratically with the total number of notes)


In [None]:

from itertools import chain

# Using itertools (efficient)
all_notes_chain = list(chain.from_iterable(corpus))

# Using extend in a loop
all_notes_extend = []
for m in corpus:
    all_notes_extend.extend(m)

# Using sum (okay for small datasets)
all_notes_sum = sum(corpus, [])

print("itertools.chain:", all_notes_chain)
print("extend loop    :", all_notes_extend)
print("sum(corpus, []):", all_notes_sum)
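
When the flat sequence is only needed for statistics, the intermediate list can be skipped, because `chain.from_iterable` returns a lazy iterator. A small sketch using `collections.Counter`; the variable name `note_counts` is illustrative.

```python
from collections import Counter
from itertools import chain

corpus = [
    ["C", "D", "E", "F", "G"],
    ["G", "F", "E", "D", "C"],
    ["C", "E", "G", "E", "C"],
]

# Counter consumes the iterator directly; no flat list is built
note_counts = Counter(chain.from_iterable(corpus))

print(note_counts)
print("Total notes:", sum(note_counts.values()))
```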



## 5. Simple feature extraction
To analyze or train models, you might compute:
- **Pitch n-grams** (common note patterns)
- **Intervals** (melodic motion; transposition-invariant when computed from full pitches, octave included)
- (Optionally) rhythm patterns, key/scale degrees, phrase boundaries

Below is a toy example mapping pitch classes to integers and extracting intervals and n-grams.


In [None]:

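# Simple pitch-to-number map for interval features (pitch classes only; no octaves).
# Without octaves, a step across the octave boundary (e.g. B to C) comes out as -11, not +1.
scale_order = ["C","C#","D","D#","E","F","F#","G","G#","A","A#","B"]
to_num = {p: i for i, p in enumerate(scale_order)}

def intervals(melody):
    # Toy-demo filter: notes missing from to_num (e.g. flats such as "Bb") are skipped,
    # so the notes on either side of a skipped note are treated as adjacent
    nums = [to_num[p] for p in melody if p in to_num]
    # Signed semitone difference between consecutive notes
    return [nums[i+1] - nums[i] for i in range(len(nums)-1)]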

def ngrams(seq, n=3):
    return [tuple(seq[i:i+n]) for i in range(len(seq)-n+1)]

# Build a flat corpus and compute some features
from itertools import chain
all_notes = list(chain.from_iterable(corpus))
# Note: n-grams over the flat list include patterns that straddle two melodies
tri_grams = ngrams(all_notes, n=3)
melody_intervals = [intervals(m) for m in corpus]

print("All notes (flat):", all_notes)
print("Example 3-grams:", tri_grams[:8])
print("Intervals per melody:", melody_intervals)
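
N-grams taken over a flattened corpus include patterns that span the end of one melody and the start of the next. If those are unwanted, one option is to extract n-grams per melody and pool the counts. The sketch below assumes that approach; `trigram_counts` and `flat_trigrams` are illustrative names.

```python
from collections import Counter
from itertools import chain

corpus = [
    ["C", "D", "E", "F", "G"],
    ["G", "F", "E", "D", "C"],
    ["C", "E", "G", "E", "C"],
]

def ngrams(seq, n=3):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

# Per-melody extraction: no n-gram crosses a melody boundary
trigram_counts = Counter()
for melody in corpus:
    trigram_counts.update(ngrams(melody, n=3))

# Flat extraction for comparison: includes boundary-crossing patterns
flat_trigrams = ngrams(list(chain.from_iterable(corpus)), n=3)

print("Per-melody trigrams:", sum(trigram_counts.values()))   # 9
print("Flat trigrams      :", len(flat_trigrams))             # 13
print("Boundary artifact present in flat list:", ("F", "G", "G") in flat_trigrams)
```

The four extra trigrams in the flat version, such as `("F", "G", "G")`, never occur inside any single melody.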



## 6. Building a dataset incrementally
Start with an empty list and add each melody's notes with `extend` (or `+=`). Unlike `append`, which would add each melody as a single nested element, this keeps the result flat.


In [None]:

all_notes_incremental = []
for melody_i in corpus:
    all_notes_incremental.extend(melody_i)   # or: all_notes_incremental += melody_i
all_notes_incremental
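
The difference between `append` and `extend` is easy to get wrong, so here is a short side-by-side sketch. The names `nested` and `flat` are illustrative.

```python
corpus = [
    ["C", "D", "E", "F", "G"],
    ["G", "F", "E", "D", "C"],
    ["C", "E", "G", "E", "C"],
]

nested = []
flat = []
for melody_i in corpus:
    nested.append(melody_i)   # adds the whole melody as one element
    flat.extend(melody_i)     # adds each note of the melody individually

print(len(nested))   # 3 (one entry per melody)
print(len(flat))     # 15 (one entry per note)
```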



## 7. Preferred training data shape & justification
- **Representation:** `List[List[Note]]` where `Note` is a string (`"C"`) or a tuple (`("C4", 1.0)`) if rhythm/octave matter.
- **Why lists?** Lists preserve order, which is paramount for melody. They are easy to slice, batch, and combine.
- **Flattening:** Use `itertools.chain.from_iterable` (or `extend`) when a single sequence is needed for n-gram statistics or sequence modeling.
- **Features to extract:** pitch n-grams, intervals, rhythm patterns, phrase lengths; optionally key/scale degrees.
- **Extensibility:** Start simple (strings), then upgrade notes to tuples or dicts when you need more attributes. The outer list-of-lists shape stays the same.
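
One possible way to make the upgrade path concrete is a `typing.NamedTuple`, which adds field names while still behaving like the plain `(pitch, duration)` tuple. This is a sketch under assumptions: the `Note` class, the `Melody` and `Corpus` aliases, and the `upgrade` helper are all hypothetical names, and the default octave of 4 and duration of 1.0 beat are arbitrary choices.

```python
from typing import List, NamedTuple

class Note(NamedTuple):
    pitch: str             # scientific pitch notation, e.g. "C4"
    duration: float = 1.0  # in beats (assumed default)

Melody = List[Note]
Corpus = List[Melody]

def upgrade(melody_strings: List[str], octave: int = 4) -> Melody:
    """Turn bare pitch names into Note records (hypothetical helper)."""
    return [Note(f"{name}{octave}") for name in melody_strings]

corpus = [
    ["C", "D", "E", "F", "G"],
    ["G", "F", "E", "D", "C"],
    ["C", "E", "G", "E", "C"],
]

# The outer shape is unchanged: still a list of melodies
corpus_rich: Corpus = [upgrade(m) for m in corpus]

first = corpus_rich[0][0]
print(first.pitch, first.duration)   # access by field name
pitch, duration = first              # or unpack like a plain tuple
```

Code that indexes or unpacks `(pitch, duration)` tuples keeps working, since a `NamedTuple` instance compares equal to the corresponding plain tuple.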
