## Problem Definition

Classifying music by genre helps organize and navigate large, diverse catalogs, which matters for both practical and academic purposes. Practically, it improves the user experience on streaming platforms by enabling personalized recommendations and efficient content discovery. It also helps music organizations automate categorization, improve accessibility, and support music-related business models.

## Using Random Forests

A Random Forest is a reasonable choice for musical genre classification because it copes well with high-dimensional data and can capture complex, non-linear relationships between features such as tempo and melody. By averaging many decision trees, each trained on a bootstrap sample of the data, it overfits less than a single tree and generalizes better to new songs. It is also fairly tolerant of noisy features (missing values may still need imputation, depending on the implementation) and it reports feature importances, which helps show which attributes drive the predictions.
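To make the feature-importance point concrete, here is a minimal sketch on toy data. The dataset comes from scikit-learn's `make_classification`, not from this notebook, and the names `X_toy`, `y_toy`, and `forest` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data (an assumption for illustration): 5 features, of which only the
# first 2 carry signal. shuffle=False keeps the informative columns first.
X_toy, y_toy = make_classification(
    n_samples=500,
    n_features=5,
    n_informative=2,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_toy, y_toy)

# feature_importances_ is normalized so that the scores sum to 1
importances = forest.feature_importances_
for i, score in enumerate(importances):
    print(f"feature_{i}: {score:.3f}")
```

On data like this, the informative columns should receive most of the importance, while the pure-noise columns receive comparatively little.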

In [37]:
pip install imbalanced-learn



In [38]:
# libraries
import numpy as np
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

## Musical Genre Sorter

In [39]:
# generate a synthetic dataset for the example
np.random.seed(42)

# generate samples
n_samples = 1000


In [40]:
# 0: Electric Guitar (0: No, 1: Yes)
electric_guitar = np.random.randint(0, 2, size=n_samples)  # 0: No electric guitar, 1: Yes electric guitar

# 1: Acoustic Guitar (0: No, 1: Yes)
acoustic_guitar = np.random.randint(0, 2, size=n_samples)  # 0: No acoustic guitar, 1: Yes acoustic guitar

# 2: Guitar Type (0: Fender, 1: Gibson, 2: PRS, 3: Jackson)
guitar_type = np.random.randint(0, 4, size=n_samples)  # 0: Fender, 1: Gibson, 2: PRS, 3: Jackson (guitar types)

# 3: Synthesizers (0: No, 1: Yes)
synthesizers = np.random.randint(0, 2, size=n_samples)  # 0: No synthesizer, 1: Yes synthesizer

# 4: Bass (0: No, 1: Yes)
bass = np.random.randint(0, 2, size=n_samples)  # 0: No bass, 1: Yes bass

# 5: Percussion (0: No, 1: Yes)
percussion = np.random.randint(0, 2, size=n_samples)  # 0: No percussion, 1: Yes percussion

# 6: Vocals (0: No, 1: Yes)
vocals = np.random.randint(0, 2, size=n_samples)  # 0: No vocals, 1: Yes vocals

# 7: Tempo (BPM, range from 40 to 179; the upper bound of randint is exclusive)
tempo = np.random.randint(40, 180, size=n_samples)  # Tempo in BPM (40 to 179)

# 8: Rhythm (Rhythm categories)
# 0: Shuffle, 1: Boom Bap, 2: Salsa, 3: Jazz, 4: Blues, 5: Soft Rhythm, 6: Funk, 7: 4x4 Dance, 8: EDM, 9: Country, 10: Reggae, 11: Latin, 12: Experimental
rhythm = np.random.randint(0, 13, size=n_samples)  # 0: Shuffle, 1: Boom Bap, 2: Salsa, 3: Jazz, etc.

# 9: Song Structure (0: Standard, 1: Freeform)
song_structure = np.random.randint(0, 2, size=n_samples)  # 0: Standard structure, 1: Freeform structure

# 10: Harmony (0: Low, 1: Medium, 2: High)
harmony = np.random.randint(0, 3, size=n_samples)  # 0: Low harmony, 1: Medium harmony, 2: High harmony

# 11: Duration (in seconds, range from 120 to 299; the upper bound of randint is exclusive)
duration = np.random.randint(120, 300, size=n_samples)  # Duration of the song in seconds (120 to 299 seconds)

# 12: Energy (value between 0 and 1)
energy = np.random.random(size=n_samples)  # Energy of the song, value between 0 and 1

# 13: Key (Key, value between 0 and 11)
key = np.random.randint(0, 12, size=n_samples)  # Key, value between 0 and 11 (for the 12 musical notes)

# 14: Saxophone (0: No, 1: Yes)
saxophone = np.random.randint(0, 2, size=n_samples)  # 0: No saxophone, 1: Yes saxophone

# 15: Drums (0: No, 1: Yes)
drums = np.random.randint(0, 2, size=n_samples)  # 0: No drums, 1: Yes drums

# 16: Piano (0: No, 1: Yes)
piano = np.random.randint(0, 2, size=n_samples)  # 0: No piano, 1: Yes piano

# 17: Violin (0: No, 1: Yes)
violin = np.random.randint(0, 2, size=n_samples)  # 0: No violin, 1: Yes violin

# 18: Trumpet (0: No, 1: Yes)
trumpet = np.random.randint(0, 2, size=n_samples)  # 0: No trumpet, 1: Yes trumpet

# 19: Flute (0: No, 1: Yes)
flute = np.random.randint(0, 2, size=n_samples)  # 0: No flute, 1: Yes flute

# 20: Electric Bass (0: No, 1: Yes)
electric_bass = np.random.randint(0, 2, size=n_samples)  # 0: No electric bass, 1: Yes electric bass

# 21: Harmonica (0: No, 1: Yes)
harmonica = np.random.randint(0, 2, size=n_samples)  # 0: No harmonica, 1: Yes harmonica

# 22: Organ (0: No, 1: Yes)
organ = np.random.randint(0, 2, size=n_samples)  # 0: No organ, 1: Yes organ

# 23: Pedal Steel (0: No, 1: Yes)
pedal_steel = np.random.randint(0, 2, size=n_samples)  # 0: No pedal steel, 1: Yes pedal steel

# 24: Banjo (0: No, 1: Yes)
banjo = np.random.randint(0, 2, size=n_samples)  # 0: No banjo, 1: Yes banjo

# 25: Keyboard (0: No, 1: Yes)
keyboard = np.random.randint(0, 2, size=n_samples)  # 0: No keyboard, 1: Yes keyboard
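The repeated yes/no columns above can also be generated in a loop. The sketch below is an alternative, not the notebook's method: it assumes a local `np.random.default_rng` generator instead of the global seed, and the names `rng`, `n_rows`, `instrument_names`, and `flags` are hypothetical. Because it uses a different generator, it will not reproduce the values drawn above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # hypothetical local generator
n_rows = 1000

# a subset of the instrument columns, for illustration only
instrument_names = ["saxophone", "drums", "piano", "violin", "trumpet", "flute"]

# integers(0, 2) draws from {0, 1}: the upper bound is exclusive by default
flags = pd.DataFrame(
    {name: rng.integers(0, 2, size=n_rows) for name in instrument_names}
)

# endpoint=True makes the upper bound inclusive, so 180 BPM can actually occur
flags["tempo"] = rng.integers(40, 180, size=n_rows, endpoint=True)

print(flags.head())
```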



In [41]:
# generate genre column (21 genres, integer from 0 to 20, matching genre_labels below)
genre = np.random.randint(0, 21, size=n_samples)

# map genres to each label
genre_labels = [
    "pop",           # 0
    "rock",          # 1
    "jazz",          # 2
    "hip-hop",       # 3
    "salsa",         # 4
    "blues",         # 5
    "metal",         # 6
    "disco",         # 7
    "folk",          # 8
    "funk",          # 9
    "reggae",        # 10
    "country",       # 11
    "experimental",  # 12
    "punk",          # 13
    "latin",         # 14
    "r&b",           # 15
    "soul",          # 16
    "indie",         # 17
    "house",         # 18
    "trance",        # 19
    "edm"            # 20
]
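One way to keep the random ids and the label list in step is to derive the upper bound from the list itself. This is a small sketch with a shortened, illustrative label list; `labels`, `ids`, and `names` are assumed names, not part of the notebook.

```python
import numpy as np

labels = ["pop", "rock", "jazz"]  # shortened, illustrative label list

rng = np.random.default_rng(0)

# the exclusive upper bound is len(labels), so every id is a valid index
ids = rng.integers(0, len(labels), size=10)

# map each integer id back to its genre name
names = [labels[i] for i in ids]
print(names)
```

Adding or removing a label then changes the id range automatically, which avoids an off-by-one between the two.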


In [42]:
genre_rules = {
    0: {
        'name': 'pop',
        'conditions': [
            ('electric_guitar', 1, 'vocals', 1, 'rhythm', 0, 'song_structure', 0),  # Pop often features electric guitar with vocals and simple rhythm/song structure
            ('acoustic_guitar', 1, 'vocals', 1, 'rhythm', 0, 'song_structure', 0),  # Acoustic guitar can also be common with vocals
        ],
    },
    1: {
        'name': 'rock',
        'conditions': [
            ('electric_guitar', 1, 'guitar_type', [1, 0], 'percussion', 1, 'rhythm', [0, 4]),  # Rock includes electric guitars and percussion, with varied rhythm
        ],
    },
    2: {
        'name': 'jazz',
        'conditions': [
            ('electric_guitar', 1, 'guitar_type', [1, 2], 'rhythm', 3, 'tempo', 120, 'harmony', 1),  # Jazz often features electric guitars (PRS/Gibson), a jazz rhythm (3), and intricate harmony
            ('vocals', 1, 'bass', 1, 'saxophone', 1, 'trumpet', 1, 'piano', 1),  # Typical jazz instrumentation includes saxophone, trumpet, and piano
        ],
    },
    3: {
        'name': 'hip-hop',
        'conditions': [
            ('synthesizers', 1, 'percussion', 1, 'rhythm', 1),  # Hip-hop often includes synthesizers and percussion with rhythmic beats
        ],
    },
    4: {
        'name': 'salsa',
        'conditions': [
            ('percussion', 1, 'bass', 1, 'rhythm', 2),  # Salsa relies heavily on percussion, bass, and its rhythmic style
        ],
    },
    5: {
        'name': 'blues',
        'conditions': [
            ('electric_guitar', 1, 'guitar_type', [1, 2, 0], 'rhythm', 4),  # Blues often uses electric guitar and slower rhythm
            ('vocals', 1, 'bass', 1, 'harmonica', 1),  # Blues frequently includes harmonica and bass with vocals
        ],
    },
    6: {
        'name': 'metal',
        'conditions': [
            ('electric_guitar', 1, 'guitar_type', [0, 3, 2], 'tempo', 140, 'harmony', 0),  # Metal features high tempo and low harmony with powerful guitar types
            ('vocals', 1, 'bass', 1, 'drums', 1),  # Metal includes strong vocals, bass, and drums
        ],
    },
    7: {
        'name': 'disco',
        'conditions': [
            ('synthesizers', 1, 'tempo', 120, 'song_structure', 1),  # Disco often uses a higher tempo and freeform structure
            ('bass', 1, 'drums', 1),  # Disco is known for a strong bassline and prominent drum beats
        ],
    },
    8: {
        'name': 'folk',
        'conditions': [
            ('acoustic_guitar', 1, 'bass', 1, 'tempo', [90, 120], 'song_structure', 0),  # Folk music often features acoustic guitar, bass, and moderate tempo
            ('vocals', 1, 'banjo', 1),  # Banjo is a key instrument in folk music alongside vocals
        ],
    },
    9: {
        'name': 'funk',
        'conditions': [
            ('synthesizers', 1, 'rhythm', 6, 'song_structure', 1),  # Funk is driven by a funk rhythm (6) and synthesizers with freeform structure
            ('bass', 1, 'drums', 1),  # Funk also features bass and drums as the core of its groove
        ],
    },
    10: {
        'name': 'reggae',
        'conditions': [
            ('bass', 1, 'rhythm', 10, 'tempo', [70, 100]),  # Reggae typically has a laid-back tempo and rhythm style
            ('vocals', 1, 'drums', 1, 'keyboard', 1),  # Reggae often includes drums and keyboard along with vocals
        ],
    },
    11: {
        'name': 'country',
        'conditions': [
            ('acoustic_guitar', 1, 'bass', 1, 'rhythm', 9, 'song_structure', 0),  # Country features acoustic guitar with a steady country rhythm (9)
            ('vocals', 1, 'pedal_steel', 1),  # Pedal steel guitar is often used in country music
        ],
    },
    12: {
        'name': 'experimental',
        'conditions': [
            ('synthesizers', 1, 'song_structure', 1, 'rhythm', 12),  # Experimental genres feature unique song structures and rhythm styles
            ('vocals', 1, 'percussion', 1),  # Experimental genres often experiment with unique percussion alongside vocals
        ],
    },
    13: {
        'name': 'punk',
        'conditions': [
            ('electric_guitar', 1, 'guitar_type', [0, 1], 'tempo', 140, 'rhythm', 0),  # Punk has high tempo and fast rhythms, often using simple guitar types
            ('bass', 1, 'drums', 1),  # Punk relies on bass and drums to create energetic and straightforward tracks
        ],
    },
    14: {
        'name': 'latin',
        'conditions': [
            ('electric_guitar', 1, 'acoustic_guitar', 1, 'percussion', 1, 'bass', 1, 'rhythm', 11),  # Latin music blends guitars, percussion, and bass with a Latin rhythm (11)
            ('vocals', 1, 'trumpet', 1, 'saxophone', 1),  # Brass instruments like trumpet and saxophone are key in Latin music
        ],
    },
    15: {
        'name': 'r&b',
        'conditions': [
            ('synthesizers', 1, 'vocals', 1, 'rhythm', 5, 'song_structure', 0),  # R&B often uses soft rhythms, synths, and structured songs
            ('bass', 1, 'drums', 1),  # R&B features strong bass and drums as the foundation
        ],
    },
    16: {
        'name': 'soul',
        'conditions': [
            ('vocals', 1, 'bass', 1, 'percussion', 1, 'tempo', [0, 90]),  # Soul music typically features slow tempos and rich instrumentation
            ('organ', 1, 'saxophone', 1),  # Organ and saxophone are staples in soul music
        ],
    },
    17: {
        'name': 'indie',
        'conditions': [
            ('electric_guitar', 1, 'synthesizers', 1, 'vocals', 1, 'song_structure', 1),  # Indie music is known for a mix of guitars, synths, and freeform song structures
            ('bass', 1, 'drums', 1),  # Bass and drums are key to indie rhythms
        ],
    },
    18: {
        'name': 'house',
        'conditions': [
            ('synthesizers', 1, 'rhythm', 7, 'tempo', 120),  # House music features a steady 4x4 rhythm with a driving tempo
            ('bass', 1, 'drums', 1),  # Bass and drums create the foundation of house beats
        ],
    },
    19: {
        'name': 'trance',
        'conditions': [
            ('synthesizers', 1, 'tempo', 130, 'song_structure', 1),  # Trance music has fast tempos with freeform song structures
            ('bass', 1, 'drums', 1),  # Bass and drums are the driving force in trance music
        ],
    },
    20: {
        'name': 'edm',
        'conditions': [
            ('synthesizers', 1, 'rhythm', 7, 'tempo', 120, 'song_structure', 1),  # EDM features dance rhythms with a freeform structure
            ('bass', 1, 'drums', 1),  # Bass and drums are essential in EDM tracks
        ],
    }
}
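Because each condition is a flat tuple of alternating column names and expected values, a typo in a column name or a missing value is easy to overlook. The following sketch shows one possible sanity check; `validate_rules`, `example_rules`, and `columns` are hypothetical names, and the example rules are shortened stand-ins for the dictionary above.

```python
def validate_rules(rules, known_columns):
    """Return a list of human-readable problems found in a rule dictionary."""
    problems = []
    for genre_id, rule in rules.items():
        for n, condition in enumerate(rule["conditions"]):
            # a flat tuple must hold (column, expected) pairs
            if len(condition) % 2 != 0:
                problems.append(f"genre {genre_id}, condition {n}: odd number of items")
                continue
            # every even position must name an existing column
            for col in condition[::2]:
                if col not in known_columns:
                    problems.append(
                        f"genre {genre_id}, condition {n}: unknown column {col!r}"
                    )
    return problems


example_rules = {
    0: {"name": "pop", "conditions": [("vocals", 1, "rhythm", 0)]},
    # 'percusion' is a deliberate typo to show what the check reports
    1: {"name": "rock", "conditions": [("electric_guitar", 1, "percusion", 1)]},
}
columns = {"vocals", "rhythm", "electric_guitar", "percussion"}

problems = validate_rules(example_rules, columns)
print(problems)
```

In the notebook, the same check could be run against `genre_rules` and the DataFrame's column names once both exist.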



In [43]:
# Create a dataframe with the data
data = pd.DataFrame({
    'tempo': tempo,
    'duration': duration,
    'energy': energy,
    'key': key,
    'electric_guitar': electric_guitar,
    'acoustic_guitar': acoustic_guitar,
    'guitar_type': guitar_type,
    'synthesizers': synthesizers,
    'percussion': percussion,
    'bass': bass,
    'vocals': vocals,
    'rhythm': rhythm,
    'harmony': harmony,
    'song_structure': song_structure,
    'saxophone': saxophone,
    'drums': drums,
    'piano': piano,
    'violin': violin,
    'trumpet': trumpet,
    'flute': flute,
    'electric_bass': electric_bass,
    'harmonica': harmonica,
    'banjo': banjo,
    'pedal_steel': pedal_steel,
    'organ': organ,
    'keyboard': keyboard,
    'genre': genre
})

In [44]:
# Helper function to check if conditions are met
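def check_conditions(row, conditions):
    """Return True if a row satisfies every (column, expected) pair in a flat condition tuple.

    A list means "any of these values"; anything else is compared for exact
    equality, so ('tempo', [90, 120]) matches only 90 or 120, not the range between.
    """
    for col, value in zip(conditions[::2], conditions[1::2]):
        if isinstance(value, list):  # list: the row value must be one of the listed values
            if row[col] not in value:
                return False
        elif row[col] != value:  # scalar: the row value must match exactly
            return False
    return True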
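Several rule comments describe tempo ranges ("moderate tempo", "slow tempos"), while the helper only tests exact values or membership in a list. A range-aware variant could look like the sketch below. The convention that a tuple means an inclusive `(low, high)` range is an assumption introduced here, not part of the notebook, and `matches`, `song`, and `folk_like` are hypothetical names.

```python
import pandas as pd


def matches(row, condition):
    """Check a flat (column, expected, ...) tuple against one row.

    Assumed convention: a tuple is an inclusive (low, high) range, a list is
    a set of allowed values, and anything else means exact equality.
    """
    for col, expected in zip(condition[::2], condition[1::2]):
        actual = row[col]
        if isinstance(expected, tuple):
            low, high = expected
            if not (low <= actual <= high):
                return False
        elif isinstance(expected, list):
            if actual not in expected:
                return False
        elif actual != expected:
            return False
    return True


song = pd.Series({"acoustic_guitar": 1, "tempo": 104, "guitar_type": 2})
folk_like = ("acoustic_guitar", 1, "tempo", (90, 120))

print(matches(song, folk_like))                # True: 104 lies within 90..120
print(matches(song, ("guitar_type", [0, 1])))  # False: 2 is not an allowed value
```

Adopting this would mean rewriting range-style entries in the rules as tuples, for example `(90, 120)` instead of `[90, 120]`.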

In [45]:
# show the first few rows of the dataset
print(data.head())

# define the features (X)
X = data[['tempo', 'duration', 'energy', 'key', 'electric_guitar', 'acoustic_guitar', 'guitar_type', 'synthesizers', 'percussion', 'bass', 'vocals', 'rhythm', 'song_structure']]

# start from the randomly drawn genre and overwrite it wherever a rule matches;
# when several rules match a row, the rule with the highest genre id wins
rule_genre = data['genre'].tolist()

for idx, row in data.iterrows():
    for genre_id, rule in genre_rules.items():
        if any(check_conditions(row, condition) for condition in rule['conditions']):
            rule_genre[idx] = genre_id

# define the target variable (y) from the rule-based labels
data['genre'] = rule_genre
y = data['genre']

# split the dataset into training and test sets (80% training, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# scale the features on the training data and then transform the test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# balance the training data only; SMOTE interpolates between nearest neighbours,
# so it runs on the scaled features, and k_neighbors must stay below the size
# of the rarest class
min_class_count = int(y_train.value_counts().min())
smote = SMOTE(random_state=42, k_neighbors=max(1, min(5, min_class_count - 1)))
X_train_res, y_train_res = smote.fit_resample(X_train_scaled, y_train)

# create and train the random forest classifier model
model = RandomForestClassifier(
    n_estimators=200,           # Number of trees in the forest
    max_depth=20,               # Maximum depth of trees (limits overfitting)
    min_samples_split=5,        # Minimum number of samples required to split a node
    min_samples_leaf=2,         # Minimum number of samples required at a leaf node
    max_features='sqrt',        # Number of features considered at each split
    class_weight='balanced',    # Adjust class weights to handle imbalanced classes
    random_state=42
)

model.fit(X_train_res, y_train_res)

# make predictions with the scaled test set (same transformation as the training data)
y_pred = model.predict(X_test_scaled)

# evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print(f"Confusion Matrix:\n{conf_matrix}")
print(f"Classification Report:\n{class_report}")

# Map the predicted genre index to the genre name
predicted_genres = [genre_labels[i] for i in y_pred]


   tempo  duration    energy  key  electric_guitar  acoustic_guitar  \
0    127       168  0.672327    9                0                1   
1    112       217  0.531201    6                1                0   
2     76       149  0.098873   10                0                0   
3    158       191  0.812180    3                0                0   
4     96       222  0.866023    5                0                0   

   guitar_type  synthesizers  percussion  bass  ...  violin  trumpet  flute  \
0            2             1           0     1  ...       1        1      1   
1            3             1           0     0  ...       0        0      1   
2            1             1           0     0  ...       0        0      0   
3            3             1           0     0  ...       1        1      0   
4            3             0           1     1  ...       1        0      0   

   electric_bass  harmonica  banjo  pedal_steel  organ  keyboard  genre  
0              0        