<a href="https://colab.research.google.com/github/MaidenTaief/Music_Final/blob/main/cluster_Labels.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Data Loading and Initial Cluster Description Generation

This section loads a dataset of musical attributes and performs an initial analysis by grouping the songs by their pre-assigned cluster. We compute the mean of features such as acousticness, danceability, and energy within each cluster to understand the defining characteristics of the music grouped together.

Steps Performed
- Data Loading: The dataset is loaded from a specified directory. This dataset contains pre-clustered song data along with various musical features.
- Feature Analysis: We calculate the mean values of nine key musical features for each cluster to identify their central tendencies.
- Normalization: Each cluster's feature means are divided by that cluster's total across all nine features, giving each feature's share of the total. Features with the largest shares are treated as the dominant ones in describing the cluster.
- Descriptive Labeling: Each cluster is then assigned a descriptive label based on its three most dominant features. The label is built from a predefined map that translates technical feature names into more intuitive descriptions (e.g., 'tempo' becomes 'Varied Tempo').
- Data Verification: The first few rows of the updated dataset are displayed to verify that the descriptive labels have been correctly attached.
- Saving the Data: Finally, the dataset with the new cluster descriptions is saved back to the drive for further analysis or sharing.
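
The normalization step above divides each cluster's feature means by that cluster's row total. A minimal sketch of that calculation on a toy table follows; the two clusters and their values are hypothetical, chosen only to mimic the features' typical scales (tempo in BPM, loudness in negative dB, energy between 0 and 1).

```python
import pandas as pd

# Hypothetical cluster means (not taken from the dataset): three features
# on very different scales.
toy_means = pd.DataFrame(
    {"energy": [0.80, 0.20], "loudness": [-6.0, -18.0], "tempo": [130.0, 100.0]},
    index=pd.Index([0, 1], name="cluster"),
)

# Equivalent to the notebook's row-wise apply: each value divided by its row total.
shares = toy_means.div(toy_means.sum(axis=1), axis=0)
print(shares.round(3))

# The feature with the largest share in each cluster.
top_feature = shares.idxmax(axis=1)
print(top_feature)
```

In this toy table, tempo takes the largest share in both clusters simply because it is measured on a larger scale, which is one reason to rescale features before ranking them.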

In [None]:
import pandas as pd

# Load the song data with clustering information from my Drive
data_directory = '/content/drive/My Drive/DATASET'
data_path = f'{data_directory}/data_final_cleaned.csv'
data_final_cleaned = pd.read_csv(data_path)

# Display the first few entries to confirm the data is loaded correctly
print("Initial Data:")
print(data_final_cleaned.head())

# Calculate the mean of selected features for each cluster to understand their characteristics
features = ['acousticness', 'danceability', 'energy', 'instrumentalness',
            'liveness', 'loudness', 'speechiness', 'tempo', 'valence']
cluster_means = data_final_cleaned.groupby('cluster')[features].mean()
print("\nCluster Means:")
print(cluster_means)

# Determine the relative importance of each feature within the clusters
feature_importance = cluster_means.apply(lambda x: x / x.sum(), axis=1)
print("\nFeature Importance:")
print(feature_importance)

# Define intuitive descriptions for features
feature_names_map = {
    'tempo': 'Varied Tempo', 'energy': 'Vibrant', 'valence': 'Joyful',
    'danceability': 'Danceable', 'speechiness': 'Lyrical', 'acousticness': 'Mellow',
    'instrumentalness': 'Instrument-rich', 'liveness': 'Live', 'loudness': 'Loud'
}

# Build a label for each cluster from its three largest feature shares
cluster_descriptions = {}
# Iterate with `row` so the `features` list defined above is not overwritten
for cluster, row in feature_importance.iterrows():
    top_features = row.sort_values(ascending=False).head(3).index
    description = ' / '.join(feature_names_map[feature] for feature in top_features)
    cluster_descriptions[cluster] = description

# Attach new descriptive labels to the dataset
data_final_cleaned['Cluster_Description'] = data_final_cleaned['cluster'].map(cluster_descriptions)
print("\nUpdated Data with Descriptive Labels:")
print(data_final_cleaned.head())

# Save the enhanced dataset with cluster descriptions
output_file_path = f'{data_directory}/data_final_with_descriptive_labels.csv'
data_final_cleaned.to_csv(output_file_path, index=False)


Initial Data:
   valence  year  acousticness  \
0   0.0594  1921         0.982   
1   0.9630  1921         0.732   
2   0.0394  1921         0.961   
3   0.1650  1921         0.967   
4   0.2530  1921         0.957   

                                             artists  danceability  \
0  ['Sergei Rachmaninoff', 'James Levine', 'Berli...         0.279   
1                                     ['Dennis Day']         0.819   
2  ['KHP Kridhamardawa Karaton Ngayogyakarta Hadi...         0.328   
3                                   ['Frank Parker']         0.275   
4                                     ['Phil Regan']         0.418   

   duration_ms  energy  explicit                      id  instrumentalness  \
0       831667   0.211         0  4BJqT0PrAfrxzMOxytFOIz          0.878000   
1       180533   0.341         0  7xPhfUan2yNtyFG0cUWkt8          0.000000   
2       500062   0.166         0  1o6I8BglA6ylDMrIELygv1          0.913000   
3       210000   0.309         0  3ftBPsC5vPBKxY

### Enhancing Cluster Descriptions with Normalized Feature Importance

We further refine our understanding of each music cluster by employing Z-score normalization on the cluster feature means. This statistical method helps in assessing the relative importance of each feature within the clusters, facilitating the creation of more nuanced and informative descriptions for each cluster.

Steps Performed
- Feature Selection: We define a set of musical features that are relevant to our analysis. These include acousticness, danceability, energy, and several others.
- Cluster Mean Calculation: For each cluster, we calculate the mean of these features to understand the average attribute values that characterize each cluster.
- Normalization: We apply Z-score normalization to these means, feature by feature. Because the scaler is fit on the cluster means rather than on individual songs, each value shows how many standard deviations a cluster's mean lies from the average of all cluster means for that feature.
- Feature Importance: The absolute values of the Z-scores are taken as feature importance, irrespective of direction (positive or negative). A feature therefore ranks highly when a cluster's mean is unusually high or unusually low relative to the other clusters.
- Descriptive Label Generation: Using the three features with the largest absolute Z-scores in each cluster, we map each feature to a more intuitive description using a predefined dictionary. These descriptions are then combined to form a descriptive label for each cluster.
- Data Update and Verification: We update our dataset with these new descriptions and print them out to verify that they have been applied correctly.
- Data Saving: The dataset with updated, more descriptive labels is saved back to the drive. This enhanced dataset is now ready for further analysis or presentation.
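
The normalization and ranking steps above can be sketched with pandas alone. This is a minimal illustration on hypothetical cluster means, not the notebook's data; the Z-score here uses the population standard deviation (`ddof=0`), which is what scikit-learn's `StandardScaler` computes.

```python
import pandas as pd

# Hypothetical cluster means for three clusters (illustrative values only).
toy_means = pd.DataFrame(
    {
        "energy": [0.80, 0.50, 0.20],
        "loudness": [-6.0, -10.0, -18.0],
        "tempo": [130.0, 118.0, 100.0],
        "valence": [0.40, 0.70, 0.30],
    },
    index=pd.Index([0, 1, 2], name="cluster"),
)

# Column-wise Z-score across clusters, using the population std (ddof=0).
z_scores = (toy_means - toy_means.mean()) / toy_means.std(ddof=0)

# Rank features within each cluster by absolute Z-score and keep the top two.
top_two = {
    int(cluster): row.abs().sort_values(ascending=False).head(2).index.tolist()
    for cluster, row in z_scores.iterrows()
}
print(z_scores.round(2))
print(top_two)
```

After this rescaling every feature has mean 0 and unit spread across the clusters, so no feature ranks first merely because of its units.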

In [None]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Define features and calculate means for each cluster
features = ['acousticness', 'danceability', 'energy', 'instrumentalness', 'liveness', 'loudness', 'speechiness', 'tempo', 'valence']
cluster_means = data_final_cleaned.groupby('cluster')[features].mean()

# Apply Z-score normalization to each feature (column) across the cluster means
scaler = StandardScaler()
scaled_features = scaler.fit_transform(cluster_means)
scaled_cluster_means = pd.DataFrame(scaled_features, columns=features, index=cluster_means.index)
print("\nScaled Cluster Means:")
print(scaled_cluster_means)

# Use the absolute Z-score as feature importance (direction is ignored)
feature_importance = scaled_cluster_means.abs()
print("\nAbsolute Feature Importance:")
print(feature_importance)

# Map feature names to intuitive descriptions
feature_names_map = {
    'tempo': 'Varied Tempo', 'energy': 'Vibrant', 'valence': 'Joyful',
    'danceability': 'Danceable', 'speechiness': 'Lyrical', 'acousticness': 'Mellow',
    'instrumentalness': 'Instrument-rich', 'liveness': 'Live', 'loudness': 'Loud'
}

# Generate and assign descriptive labels from the three largest absolute Z-scores
cluster_descriptions = {
    index: ' / '.join(
        feature_names_map[feat]
        for feat in row.sort_values(ascending=False).head(3).index
    )
    for index, row in feature_importance.iterrows()
}
data_final_cleaned['Cluster_Description'] = data_final_cleaned['cluster'].map(cluster_descriptions)

print("\nNew Cluster Descriptions:")
for cluster, description in cluster_descriptions.items():
    print(f"Cluster {cluster}: {description}")

# Save the updated dataset with enhanced descriptions
output_file_path = f'{data_directory}/data_final_with_descriptive_labels.csv'
data_final_cleaned.to_csv(output_file_path, index=False)



Scaled Cluster Means:
         acousticness  danceability    energy  instrumentalness  liveness  \
cluster                                                                     
0           -1.406342     -0.620789  1.502680         -0.363104  1.077273   
1           -0.770327      1.251741  0.893577         -0.461865 -1.157650   
2           -0.002285      1.135181 -0.759838         -0.622969  1.304326   
3            0.923834     -0.567222 -0.701617         -0.544590 -0.455389   
4            1.255119     -1.198911 -0.934802          1.992528 -0.768559   

         loudness  speechiness     tempo   valence  
cluster                                             
0        1.302155    -0.462288  1.891706 -0.235693  
1        0.969387    -0.454837  0.099058  1.794578  
2       -1.074498     1.998642 -0.577413  0.223138  
3       -0.101693    -0.539825 -0.507362 -0.683229  
4       -1.095351    -0.541693 -0.905989 -1.098793  

Absolute Feature Importance:
         acousticness  danceability 

- Cluster 0: "Varied Tempo / Vibrant / Mellow"
Genres: Indie, Alternative
Rationale: The description suggests a mix of energetic and calm qualities, which is typical of Indie and Alternative music. These genres often span a broad spectrum of tempo and mood, making them a fitting choice for a cluster characterized by both vibrancy and mellowness.
- Cluster 1: "Joyful / Danceable / Live"
Genres: Pop, Dance
Rationale: This cluster clearly features music that is upbeat and suitable for dancing, which aligns perfectly with Pop and Dance genres. These genres are known for their catchy, upbeat melodies that aim to evoke happiness and encourage dancing.
- Cluster 2: "Lyrical / Live / Danceable"
Genres: Hip-Hop, Rap
Rationale: The emphasis on 'Lyrical' suggests a focus on the words used in the songs, which is a central element of Hip-Hop and Rap. Additionally, these genres often feature live performances that showcase freestyle rap and danceable beats.
- Cluster 3: "Mellow / Vibrant / Joyful"
Genres: Classical, Jazz
Rationale: Classical and Jazz music can be both mellow and vibrant, offering rich and complex compositions that bring joy. The combination of these characteristics fits well with the attributes of this cluster.
- Cluster 4: "Instrument-rich / Mellow / Danceable"
Genres: Rock, Metal
Rationale: Rock and Metal music are often characterized by their rich instrumentation. Metal is not always mellow, but the genre does include slower, more mellow tracks, and Rock often mixes mellow tracks with upbeat, danceable ones.
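
The labels above can be cross-checked against the "Scaled Cluster Means" output printed earlier. The sketch below copies those Z-scores, rebuilds each label from the three largest absolute values, and also reports whether each top feature is high or low relative to the other clusters. The high/low annotation is an addition for illustration and is not part of the original notebook.

```python
import pandas as pd

features = ["acousticness", "danceability", "energy", "instrumentalness",
            "liveness", "loudness", "speechiness", "tempo", "valence"]

# Z-scores copied from the "Scaled Cluster Means" output above.
scaled = pd.DataFrame(
    [
        [-1.406342, -0.620789, 1.502680, -0.363104, 1.077273, 1.302155, -0.462288, 1.891706, -0.235693],
        [-0.770327, 1.251741, 0.893577, -0.461865, -1.157650, 0.969387, -0.454837, 0.099058, 1.794578],
        [-0.002285, 1.135181, -0.759838, -0.622969, 1.304326, -1.074498, 1.998642, -0.577413, 0.223138],
        [0.923834, -0.567222, -0.701617, -0.544590, -0.455389, -0.101693, -0.539825, -0.507362, -0.683229],
        [1.255119, -1.198911, -0.934802, 1.992528, -0.768559, -1.095351, -0.541693, -0.905989, -1.098793],
    ],
    columns=features,
    index=pd.Index(range(5), name="cluster"),
)

feature_names_map = {
    "tempo": "Varied Tempo", "energy": "Vibrant", "valence": "Joyful",
    "danceability": "Danceable", "speechiness": "Lyrical", "acousticness": "Mellow",
    "instrumentalness": "Instrument-rich", "liveness": "Live", "loudness": "Loud",
}

labels = {}
directions = {}
for cluster, row in scaled.iterrows():
    # Three features with the largest absolute Z-score, as in the notebook.
    top = row.abs().sort_values(ascending=False).head(3).index
    labels[int(cluster)] = " / ".join(feature_names_map[f] for f in top)
    # Sign of each top feature: above or below the average of the cluster means.
    directions[int(cluster)] = {f: ("high" if row[f] > 0 else "low") for f in top}
    print(f"Cluster {cluster}: {labels[int(cluster)]}  {directions[int(cluster)]}")
```

Reading the direction alongside each label can help when judging the genre rationales: for example, Cluster 0's "Mellow" comes from an acousticness Z-score of -1.41, meaning lower acousticness than the other clusters.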

In [None]:
import pandas as pd

# Load the dataset that already contains basic cluster descriptions
data_directory = '/content/drive/My Drive/DATASET'
data_path = f'{data_directory}/data_final_with_descriptive_labels.csv'
data_final_cleaned = pd.read_csv(data_path)

# Existing descriptions and genre tags mapped to each cluster
base_cluster_descriptions = {
    0: "Varied Tempo / Vibrant / Mellow",
    1: "Joyful / Danceable / Live",
    2: "Lyrical / Live / Danceable",
    3: "Mellow / Vibrant / Joyful",
    4: "Instrument-rich / Mellow / Danceable",
}

genre_tags = {
    0: ['Indie', 'Alternative'],
    1: ['Pop', 'Dance'],
    2: ['Hip-Hop', 'Rap'],
    3: ['Classical', 'Jazz'],
    4: ['Rock', 'Metal']
}

# Combine base descriptions with genre tags
for cluster, tags in genre_tags.items():
    full_description = f"{base_cluster_descriptions[cluster]} / {' / '.join(tags)}"
    base_cluster_descriptions[cluster] = full_description

# Update the cluster descriptions in the dataset
data_final_cleaned['Cluster_Description'] = data_final_cleaned['cluster'].apply(lambda x: base_cluster_descriptions[x])

# Display the first few updated entries to verify the changes
print(data_final_cleaned[['name', 'cluster', 'Cluster_Description']].head(5))

# Save the enhanced dataset with genre tags included
output_file_path = f'{data_directory}/data_final_with_genre_tags.csv'
data_final_cleaned.to_csv(output_file_path, index=False)


                                                name  cluster  \
0  Piano Concerto No. 3 in D Minor, Op. 30: III. ...        4   
1                            Clancy Lowered the Boom        1   
2                                          Gati Bali        4   
3                                          Danny Boy        3   
4                        When Irish Eyes Are Smiling        3   

                                 Cluster_Description  
0  Instrument-rich / Mellow / Danceable / Rock / ...  
1            Joyful / Danceable / Live / Pop / Dance  
2  Instrument-rich / Mellow / Danceable / Rock / ...  
3       Mellow / Vibrant / Joyful / Classical / Jazz  
4       Mellow / Vibrant / Joyful / Classical / Jazz  


In [None]:
import pandas as pd

# Set the directory where your dataset is stored
data_directory = '/content/drive/My Drive/DATASET'

# Load the dataset that includes descriptive tags for each cluster
dataset_path = f'{data_directory}/data_final_with_genre_tags.csv'
data_final_cleaned = pd.read_csv(dataset_path)

# Sample five entries from each cluster to inspect the variety and accuracy of the assigned tags.
# DataFrameGroupBy.sample keeps the 'cluster' column, which groupby().apply() may drop in newer pandas versions.
samples_from_each_cluster = data_final_cleaned.groupby('cluster').sample(n=5)

# Reset the index for a clean, easy-to-read DataFrame
samples_from_each_cluster = samples_from_each_cluster.reset_index(drop=True)

# Display the sampled data, focusing on song names, artists, cluster numbers, and their descriptions
print(samples_from_each_cluster[['name', 'artists', 'cluster', 'Cluster_Description']])


                                                 name  \
0                                        Green & Gold   
1                       Santa Looked a Lot Like Daddy   
2   I Can't Give You Anything But Love - Live At T...   
3                  Look Back in Anger - 2017 Remaster   
4                                  Found a Job - Live   
5                              Munchies for Your Bass   
6                              You're Knockin' Me Out   
7                              Don't Back Down (Mono)   
8                                 Now I Got the Blues   
9   Standing On The Corner (with Ray Ellis & His O...   
10                Le mystère de l'assassinat du frère   
11                    Часть 192.2 - Триумфальная арка   
12                                      Bocca di rosa   
13                Часть 134.3 - По ком звонит колокол   
14                     Часть 66.2 - Триумфальная арка   
15  The Christmas Song (Chestnuts Roasting On an O...   
16                             