## STEP 7 : Model Deployment 

### Import Necessary Libraries

In [23]:
import os
import numpy as np
import pandas as pd
import ipywidgets as widgets
from sklearn.preprocessing import StandardScaler
from scipy.spatial.distance import cdist
from IPython.display import display, clear_output, HTML
from sklearn.metrics import precision_score, recall_score, ndcg_score

import warnings
warnings.filterwarnings("ignore")

### 7.1 Load datasets from file paths

In [24]:
# Paths to datasets
data_path = os.path.join('..', 'Dataset', 'data_cleaned_clustering.csv')
genre_data_path = os.path.join('..', 'Dataset', 'genre_data_cleaned_clustering.csv')

# Check if files exist and load them
if os.path.exists(data_path) and os.path.exists(genre_data_path):
    data_original = pd.read_csv(data_path)
    genre_data_original = pd.read_csv(genre_data_path)
    print("Info: Data and genre data successfully loaded.")
else:
    print("Attention: One or both files are not found in the specified directory.")
    raise FileNotFoundError("Dataset files are missing. Please check the paths.")

# Create copies of the datasets for manipulation
data = data_original.copy()
genre_data = genre_data_original.copy()

Info: Data and genre data successfully loaded.



### 7.2 Types of Recommendation Systems

### 7.2.1. Collaborative Filtering

Collaborative Filtering helps suggest songs you might like by looking at what other people who like the same music listen to.

#### 7.2.1.1 User-based Collaborative Filtering
- **What it does**: It’s like getting song recommendations from a friend who has similar taste in music. The system finds other users who like what you like and suggests songs they enjoy.
- **Why it’s good**: It’s personalized — you get suggestions that fit your own music taste.

#### 7.2.1.2 Item-based Collaborative Filtering
- **What it does**: If you like a specific song, this method finds other songs that fans of that song also like and recommends them to you.
- **Why it’s good**: It’s reliable because it doesn’t depend much on changing user preferences; it focuses on the songs themselves.

**Note**: Both methods need a lot of data about which songs people listen to in order to work well.
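
As a rough illustration of item-based collaborative filtering, the sketch below compares songs by who listened to them. The play-count matrix and the function names `item_similarity` and `similar_songs` are hypothetical and are not part of this project's datasets, which contain no user interaction data.

```python
import numpy as np

# Hypothetical play-count matrix: rows are users, columns are songs.
plays = np.array([
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [0, 0, 4, 5],
], dtype=float)

def item_similarity(matrix):
    """Cosine similarity between the song columns of a user-by-song matrix."""
    norms = np.linalg.norm(matrix, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for songs nobody played
    normalized = matrix / norms
    return normalized.T @ normalized

def similar_songs(song, matrix, top_n=2):
    """Indices of the songs whose listeners overlap most with `song`."""
    scores = item_similarity(matrix)[song].copy()
    scores[song] = -np.inf  # never recommend the song itself
    return np.argsort(scores)[::-1][:top_n]

# Songs 0 and 1 share the same listeners, so song 1 ranks first for song 0.
print(similar_songs(0, plays))
```

With only four users the similarities are easy to check by hand; a real system would need far more interaction data, which is the limitation the note above describes.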

### 7.2.2. Content-Based Filtering

This method recommends songs by looking at the features of the songs themselves.

- **Example**: If you like songs with lots of guitar or a specific beat, the system will find other songs with similar music features and recommend them to you.
- **Why it’s useful**: It gives you songs that match the musical styles you like, without needing to know what others like.
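
One possible way to turn this idea into code is to average the features of the songs a listener likes into a taste profile and rank the remaining songs by how closely they match it. The feature values and the function name `recommend_from_likes` below are made up for illustration.

```python
import numpy as np

# Hypothetical feature table: columns are [acousticness, danceability, energy].
features = np.array([
    [0.90, 0.30, 0.20],  # song 0: quiet acoustic track
    [0.85, 0.35, 0.25],  # song 1: similar acoustic track
    [0.10, 0.90, 0.80],  # song 2: dance track
    [0.15, 0.85, 0.90],  # song 3: another dance track
    [0.80, 0.40, 0.30],  # song 4: mellow track
])

def recommend_from_likes(liked, features, top_n=2):
    """Rank songs by cosine similarity to the mean feature vector of the liked songs."""
    profile = features[liked].mean(axis=0)
    sims = features @ profile / (
        np.linalg.norm(features, axis=1) * np.linalg.norm(profile)
    )
    sims[liked] = -np.inf  # do not recommend songs the listener already likes
    return np.argsort(sims)[::-1][:top_n]

# A listener who likes the two acoustic tracks gets the mellow track first.
print(recommend_from_likes([0, 1], features))
```

No information about other listeners is needed, only the features of the songs themselves.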

### 7.2.3. Hybrid Models

Hybrid models mix both collaborative and content-based methods to give you better song recommendations.

- **How it works**: These models use both what you like and what songs are like to make suggestions. This way, even if there’s not a lot of data on what you’ve listened to, you can still get good recommendations.
- **Why it’s beneficial**: It helps fill in the gaps when there’s limited information about user preferences or new songs, making sure you still get recommendations that you are likely to enjoy.
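
A simple way to picture a hybrid model is a weighted blend of the two score lists, where the weight on the collaborative scores grows with the amount of listening history. This is only a sketch under assumed inputs: `hybrid_scores`, `full_trust_at`, and the example scores are hypothetical.

```python
import numpy as np

def hybrid_scores(content_scores, collab_scores, n_interactions, full_trust_at=20):
    """Blend two score vectors, trusting collaborative scores more as history grows."""
    content_scores = np.asarray(content_scores, dtype=float)
    collab_scores = np.asarray(collab_scores, dtype=float)
    # alpha rises linearly from 0 (no history) to 1 (full_trust_at interactions or more)
    alpha = min(n_interactions / full_trust_at, 1.0)
    return alpha * collab_scores + (1.0 - alpha) * content_scores

content = [0.9, 0.2, 0.6]  # hypothetical content-based scores for three songs
collab = [0.1, 0.8, 0.7]   # hypothetical collaborative scores for the same songs

print(hybrid_scores(content, collab, n_interactions=0))   # new user: content scores only
print(hybrid_scores(content, collab, n_interactions=10))  # equal mix of both
print(hybrid_scores(content, collab, n_interactions=20))  # collaborative scores only
```

A new user with no history falls back entirely on content-based scores, which is how the blend fills the gap described above.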


## 7.3 Choosing the Best Model for Music Recommendation

After analyzing our datasets, `genre_data_cleaned.csv` and `data_cleaned.csv`, we need to decide on the most suitable recommendation model based on the characteristics of each dataset.

### 7.3.1 Dataset Overview

#### 1. Genre Data (`genre_data_cleaned.csv`)
This dataset provides extensive descriptive metadata about songs, which includes:

- **Audio Features**: Metrics such as acousticness, danceability, energy, and instrumentalness.
- **Categorical Data**: Attributes like genre, popularity category, and mode category, which describe the emotional and stylistic characteristics of the songs.
- **Artist and Song Information**: Details on the artists and individual tracks.

**Appropriate Model**: Given the rich content-based features, this dataset is ideal for a **Content-Based Filtering** recommendation system.

#### 2. Data Cleaned (`data_cleaned.csv`)
This dataset includes:

- **Song Features**: Contains attributes similar to the genre dataset, such as acousticness, danceability, and energy.
- **Lack of User-Specific Interaction Data**: It does not contain user-item interactions like ratings or play history, which are crucial for collaborative filtering.

**Appropriate Model**: The absence of user interaction data makes this dataset less suited for collaborative filtering and more appropriate for a **Content-Based Filtering** approach.

#### Conclusion

**Model Choice**
- The extensive details in the `data_cleaned.csv` make it ideal for analyzing and recommending songs based on their inherent characteristics.
- We will proceed with a **Content-Based Filtering** model using the rich metadata available in our datasets.

**Technique**
- We will implement **Cosine Similarity** to identify and recommend songs with similar features, aiming to improve user satisfaction by aligning recommendations closely with their preferences.
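
For reference, cosine similarity between two feature vectors $\mathbf{a}$ and $\mathbf{b}$ depends only on the angle between them, and the matching cosine distance is one minus that value. The symbols below are generic and not tied to specific dataset columns.

$$
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert},
\qquad
d(\mathbf{a}, \mathbf{b}) = 1 - \cos(\mathbf{a}, \mathbf{b})
$$

As a small worked case, $\mathbf{a} = (1, 0)$ and $\mathbf{b} = (1, 1)$ give $\cos(\mathbf{a}, \mathbf{b}) = 1/\sqrt{2} \approx 0.707$ and $d(\mathbf{a}, \mathbf{b}) \approx 0.293$. Identical directions give a distance of 0 and opposite directions give a distance of 2.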


## 7.4 Music Recommendation System

**Objective**

The recommendation system provides an interactive user interface that suggests songs based on the audio features of the song the user selects.

**Overview**

**User Interface**

Users select a song, artist, or genre from a dropdown to receive recommendations. Three dropdowns are provided, one per tab:
- All Songs: Displays a list of all songs.
- By Artist: Displays songs filtered by the artist.
- By Genre: Displays songs filtered by genre.


In [25]:
# Draw a reproducible 10% random sample of the dataset (frac=0.1) and reset its index
data_cleaned_shuffled = data.sample(frac=0.1, random_state=42).reset_index(drop=True)

# Relevant numeric columns for recommendations
number_cols = [
    'valence', 'year', 'acousticness', 'danceability', 'duration_min', 
    'energy', 'explicit', 'instrumentalness', 'key', 'liveness', 
    'loudness_scaled', 'mode', 'popularity', 'speechiness', 'tempo'
]

# Initialize and fit the scaler
scaler = StandardScaler()
scaler.fit(data_cleaned_shuffled[number_cols])

# Dropdowns for selecting songs, artists, or genres
all_songs_dropdown = widgets.Dropdown(
    description='🎵 Select Song:',
    layout={'width': '90%'},
    style={'description_width': 'initial'}
)
artist_dropdown = widgets.Dropdown(
    description='🎤 Choose Artist:',
    layout={'width': '90%'},
    style={'description_width': 'initial'}
)
genre_dropdown = widgets.Dropdown(
    description='🎧 Select Genre:',
    layout={'width': '90%'},
    style={'description_width': 'initial'}
)

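# Populate dropdowns
# Caveat: song options store row positions in data_cleaned_shuffled, whereas artist and genre
# options store positions in their own unique-value lists. display_recommendations treats
# every selected value as a song row position, so artist and genre selections are not
# resolved to songs by that artist or in that genre.
all_songs_dropdown.options = [(f"{row['name']} ({row['year']}) - {', '.join(eval(row['artists']))} 🎶", index) for index, row in data_cleaned_shuffled.iterrows()]
artist_dropdown.options = [(artist, index) for index, artist in enumerate(data_cleaned_shuffled['artists'].apply(lambda x: ', '.join(eval(x))).unique())]
genre_dropdown.options = [(genre, index) for index, genre in enumerate(genre_data['genres'].unique())]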

# Outputs for displaying recommendations and metrics
output = widgets.Output(layout={'border': '1px solid #ccc', 'padding': '10px', 'margin-top': '10px', 'background-color': '#f9f9f9'})
metrics_output = widgets.Output(layout={'border': '1px solid #ccc', 'padding': '10px', 'margin-top': '10px', 'background-color': '#f9f9f9'})

# Function to recommend songs
def recommend_songs(song_index, data, genre_data, n_songs=5):
    
    # Retrieve the selected song as a one-row DataFrame (keeps the feature names for the scaler)
    song_row = data.iloc[[song_index]]
    song_id = song_row['id'].iloc[0]
    
    # Scale the selected song and all songs in the dataset using the pre-fitted scaler
    song_features = scaler.transform(song_row[number_cols])
    data_features = scaler.transform(data[number_cols])
    
    # Compute cosine distances between the selected song and every song in the dataset
    distances = cdist(song_features, data_features, 'cosine')[0]
    
    # Rank songs from closest to farthest, drop every row that shares the selected
    # song's id, then keep the `n_songs` nearest neighbours
    order = np.argsort(distances)
    order = order[data['id'].to_numpy()[order] != song_id][:n_songs]
    
    # Copy the recommendations to avoid modifying the original DataFrame
    recommended_songs = data.iloc[order].copy()
    
    # Map cluster labels to genre names using the genre_data mapping
    genre_map = genre_data.set_index('cluster')['genres'].to_dict()
    recommended_songs['genre'] = recommended_songs['cluster_label'].map(genre_map)
    
    # Return the recommendations together with their matching distances
    return recommended_songs, distances[order]

# Function to calculate metrics for the recommendation system
def calculate_metrics(recommended, actual_cluster):
    
    # Ground truth: a recommendation is relevant if it shares the selected song's cluster
    relevant_items = recommended['cluster_label'] == actual_cluster
    
    # Predictions: every returned song is labeled as "recommended" (all ones)
    recommended_labels = np.ones(len(recommended))
    
    # Precision: the percentage of relevant recommendations among all recommendations
    precision = precision_score(relevant_items, recommended_labels) * 100
    
    # Recall: only the returned songs are scored and all of them are predicted positive,
    # so this is 100% whenever at least one recommendation is relevant and 0% otherwise.
    # Relevant songs that were never recommended are not counted.
    recall = recall_score(relevant_items, recommended_labels) * 100
    
    # NDCG (Normalized Discounted Cumulative Gain): the predicted scores are all equal,
    # so the value reflects how many recommendations are relevant, not their order
    ndcg = ndcg_score([relevant_items], [recommended_labels]) * 100
    
    # Return the calculated metrics as percentages
    return precision, recall, ndcg


def display_metrics(precision, recall, ndcg, average_distance, alignment):
    with metrics_output:
        clear_output()
        color = "#2e7d32" if precision >= 80 and recall >= 80 else "#ff8f00" if precision >= 60 else "#d32f2f"
        overall_assessment = (
            "Exceptional Performance" if precision >= 80 and recall >= 80 else
            "Adequate but Needs Improvement" if precision >= 60 else
            "Significant Improvement Needed"
        )
        metrics_html = f"""
        <div style="background-color: #e8f5e9; padding: 15px; border-radius: 10px; font-family: Arial, sans-serif;">
            <h3 style="color: {color};">Recommendation Metrics 📊</h3>
            <ul style="list-style-type: none; padding: 0;">
                <li><strong>📏 Average Cosine Distance:</strong> {average_distance:.2f}</li>
                <li><strong>🔄 Cluster Alignment:</strong> {alignment * 100:.2f}%</li>
                <li><strong>🎯 Precision:</strong> {precision:.2f}%</li>
                <li><strong>📈 Recall:</strong> {recall:.2f}%</li>
                <li><strong>⭐ NDCG:</strong> {ndcg:.2f}%</li>
            </ul>
            <p style="margin-top: 10px; font-size: 1.2em;">{overall_assessment}</p>
        </div>
        """
        display(HTML(metrics_html))

# Function to display recommendations recursively
def display_recommendations(change):
    song_index = change.new
    display_recommendations_recursive(song_index)

def display_recommendations_recursive(song_index):
    recommendations, distances = recommend_songs(song_index, data_cleaned_shuffled, genre_data, 5)
    
    with output:
        clear_output()
        if recommendations.empty:
            display(HTML("<strong>No Recommendations Found</strong>"))
            return
        
        display(HTML("<h4 style='color: #1976d2;'>🎧 Recommendations:</h4>"))
        
        recommendations_html = ""
        for _, row in recommendations.iterrows():
            recommendations_html += f"""
            <div style="border: 1px solid #ccc; padding: 10px; margin-bottom: 10px; border-radius: 5px; background-color: #f1f1f1;">
                <strong>🎶 {row['name']} ({row['year']})</strong><br>
                <em>Artists:</em> {', '.join(eval(row['artists']))}<br>
                <em>Genre:</em> {row['genre']}<br>
                <em>Cluster:</em> {row['cluster_label']}
            </div>
            """
        display(HTML(recommendations_html))
        
        # Dropdown for selecting a song from the recommendations
        recommendations_dropdown = widgets.Dropdown(
            options=[(f"{row['name']} ({row['year']}) - {', '.join(eval(row['artists']))} 🎧", idx) 
                     for idx, row in recommendations.iterrows()],
            description='Further Recommendations:',
            layout={'width': '90%'},
            style={'description_width': 'initial'}
        )
        display(recommendations_dropdown)
        
        # Observe further recommendations
        recommendations_dropdown.observe(lambda change: display_recommendations_recursive(change.new), names='value')
    
    with metrics_output:
        average_distance = np.mean(distances)
        input_cluster = data_cleaned_shuffled.iloc[song_index]['cluster_label']
        alignment = np.mean(recommendations['cluster_label'] == input_cluster)
        precision, recall, ndcg = calculate_metrics(recommendations, input_cluster)
        display_metrics(precision, recall, ndcg, average_distance, alignment)

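# Sync dropdown selections across tabs
def update_other_tabs(change):
    song_index = change.new
    selected_song = data_cleaned_shuffled.iloc[song_index]
    artist = ', '.join(eval(selected_song['artists']))
    genre = genre_data.loc[genre_data['cluster'] == selected_song['cluster_label'], 'genres'].iloc[0]
    # Detach the recommendation handler while syncing; otherwise these assignments would
    # re-trigger display_recommendations with an artist or genre option value
    artist_dropdown.unobserve(display_recommendations, names='value')
    genre_dropdown.unobserve(display_recommendations, names='value')
    try:
        artist_dropdown.value = next((val for key, val in artist_dropdown.options if key == artist), None)
        genre_dropdown.value = next((val for key, val in genre_dropdown.options if key == genre), None)
    finally:
        artist_dropdown.observe(display_recommendations, names='value')
        genre_dropdown.observe(display_recommendations, names='value')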

# Link dropdowns to the recommendation function
all_songs_dropdown.observe(display_recommendations, names='value')
all_songs_dropdown.observe(update_other_tabs, names='value')
artist_dropdown.observe(display_recommendations, names='value')
genre_dropdown.observe(display_recommendations, names='value')

# Display UI components
tab = widgets.Tab([all_songs_dropdown, artist_dropdown, genre_dropdown])
tab.set_title(0, '🎶 All Songs')
tab.set_title(1, '🎤 By Artist')
tab.set_title(2, '🎧 By Genre')

display(widgets.VBox([
    widgets.HTML(value="<h2 style='color: #1976d2;'>Music Recommendation System 🎵</h2>"),
    tab,
    output,
    metrics_output
]))



## 7.5 Summary of Algorithms Used in the Music Recommendation System

This recommendation system utilizes a combination of machine learning techniques and metrics to suggest songs to users. Below is a summary of the algorithms and methods used:

---

### 1. Data Standardization
- **Algorithm:** `StandardScaler` from `sklearn`
- **Purpose:**
  - Standardizes the numerical features to have a mean of 0 and a standard deviation of 1.
  - Ensures that all features contribute equally to the similarity computation.
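
The effect of standardization can be seen on a tiny example. The two columns below (tempo and danceability) and their values are hypothetical, and `demo_scaler` is a separate object so it does not interfere with the notebook's own `scaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical columns: tempo in BPM and danceability on a 0 to 1 scale.
X = np.array([
    [60.0, 0.2],
    [120.0, 0.5],
    [180.0, 0.8],
])

demo_scaler = StandardScaler().fit(X)
X_scaled = demo_scaler.transform(X)

# After scaling, each column has mean 0 and standard deviation 1,
# so tempo no longer dominates because of its larger raw range.
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))

# A new song must be transformed with the already-fitted scaler, not a new one.
new_song = demo_scaler.transform([[150.0, 0.65]])
print(new_song)
```

Both features of the new song land at the same scaled value because each sits the same number of standard deviations above its column mean.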

---

### 2. Song Similarity Calculation
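- **Algorithm:** **Cosine Similarity**
  - Implemented using `scipy.spatial.distance.cdist` with the `'cosine'` metric, which returns the cosine distance (1 minus the cosine similarity).
- **Purpose:**
  - Computes the cosine distance between the scaled feature vector of the selected song and those of all other songs.
  - Finds the most similar songs to the user’s selection by sorting distances in ascending order.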
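
The relationship between cosine distance and ranking can be checked on a few hand-picked vectors. The `query` and `catalog` arrays below are illustrative only.

```python
import numpy as np
from scipy.spatial.distance import cdist

query = np.array([[1.0, 0.0]])
catalog = np.array([
    [2.0, 0.0],   # same direction as the query
    [1.0, 1.0],   # 45 degrees away
    [0.0, 3.0],   # orthogonal
    [-1.0, 0.0],  # opposite direction
])

# cdist returns a (1, 4) matrix; row 0 holds the distances from the query.
distances = cdist(query, catalog, 'cosine')[0]
similarities = 1.0 - distances

# Ascending distance order puts the most similar vectors first.
ranking = np.argsort(distances)

print(np.round(distances, 3))
print(ranking)
```

Vector length does not matter here: `[2.0, 0.0]` is twice as long as the query but still has a distance of 0 because it points in the same direction.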

---

### 3. Recommendation Filtering
- **Method:** 
  - Filters out the selected song from the recommendations to avoid duplication.
  - Maps song clusters to genres using a predefined cluster-to-genre mapping.
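
The cluster-to-genre mapping step can be sketched with two small stand-in tables. `genre_demo` and `songs_demo` are hypothetical and only mimic the `cluster`, `genres`, and `cluster_label` columns used in the notebook.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's genre_data and recommendations.
genre_demo = pd.DataFrame({'cluster': [0, 1, 2], 'genres': ['rock', 'jazz', 'pop']})
songs_demo = pd.DataFrame({'name': ['A', 'B', 'C'], 'cluster_label': [2, 0, 5]})

# Build a {cluster: genre} dictionary and apply it to each song's cluster label.
genre_map = genre_demo.set_index('cluster')['genres'].to_dict()
songs_demo['genre'] = songs_demo['cluster_label'].map(genre_map)

print(songs_demo)
```

Song `C` belongs to cluster 5, which has no entry in the mapping, so its genre comes out as `NaN`. When several genres share one cluster, the dictionary keeps only one of them per cluster.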

---

### 4. Evaluation Metrics
- **Algorithms:** 
  - `precision_score` and `recall_score` from `sklearn`
  - `ndcg_score` (Normalized Discounted Cumulative Gain)
- **Purpose:**
  - Evaluates the quality of recommendations based on:
    - **Precision:** Percentage of recommended items that are relevant.
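    - **Recall:** Percentage of relevant items that are recommended. As implemented, only the returned songs are scored and all of them are labeled as recommended, so recall is 100% whenever at least one recommendation is relevant.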
    - **NDCG:** Evaluates the ranking quality of recommendations.
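
For comparison, the sketch below computes the conventional top-k versions of these metrics, where recall is measured against every relevant item in the catalog and NDCG depends on the position of each hit. The function names and the example ids are hypothetical, and this is not the notebook's `calculate_metrics`.

```python
import numpy as np

def precision_recall_at_k(recommended_ids, relevant_ids):
    """Precision and recall of a ranked list against the full set of relevant items."""
    relevant = set(relevant_ids)
    hits = sum(1 for item in recommended_ids if item in relevant)
    precision = hits / len(recommended_ids) if recommended_ids else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def ndcg_at_k(recommended_ids, relevant_ids):
    """Binary-relevance NDCG: hits near the top of the list count more."""
    relevant = set(relevant_ids)
    gains = np.array([1.0 if item in relevant else 0.0 for item in recommended_ids])
    discounts = 1.0 / np.log2(np.arange(2, len(gains) + 2))
    dcg = float(np.sum(gains * discounts))
    # The ideal list places as many relevant items as possible at the top.
    ideal_hits = min(len(relevant), len(gains))
    idcg = float(np.sum(discounts[:ideal_hits]))
    return dcg / idcg if idcg > 0 else 0.0

recommended = ['a', 'x', 'b', 'y', 'c']     # ranked recommendations
relevant = ['a', 'b', 'c', 'd', 'e', 'f']   # all relevant items in the catalog

precision, recall = precision_recall_at_k(recommended, relevant)
print(precision, recall)                    # 3 of 5 recommended, 3 of 6 relevant
print(ndcg_at_k(recommended, relevant))
```

Here recall is 0.5 rather than 1.0 because three relevant items were never recommended, and moving the hits toward the top of the list would raise NDCG.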

---

### 5. Recursive Recommendations
- **Method:**
  - Allows users to select a recommended song for further recommendations.
  - Dynamically updates recommendations and metrics in real time, enabling exploratory navigation through the recommendation space.


## 7.6 Future Scope

**Real-Time Processing**
- Build user profiles, integrate real-time feedback (likes/dislikes), and apply collaborative filtering for more accurate, personalized suggestions.

**Advanced Models**
- Implement hybrid models combining collaborative, content-based, and context-aware filtering for better recommendations.

**Enhanced Features**
- Incorporate additional audio features, sentiment analysis of lyrics, and temporal trends for better recommendations.
