# Album Cover Music Recommendation System

## Step 1: Project Setup and Data Loading

A CSV file is required for this project, containing at least the following columns:

- **`name`**: The title of the song. The extracted feature table later stores it as `song_name`.
- **`img`**: A URL pointing to the album cover image.

The file (`spotify_dataset.csv` in the code below) should be placed in the working directory. This dataset is used to analyze album cover images and generate music recommendations.

### Required Libraries

The following libraries are necessary for the implementation:

- **`pandas`** and **`numpy`**: For data manipulation.
- **`scikit-learn`**: For preprocessing and similarity calculations.
- **`tensorflow`** and **`keras`**: For deep learning-based feature extraction.
- **`opencv-python-headless`**: For image processing.
- **`pillow`**: For handling image operations.
- **`matplotlib`** and **`seaborn`**: For data visualization.
- **`requests`**: For downloading album cover images.

### Loading the Dataset

After ensuring the required libraries are installed, the dataset is loaded into a DataFrame. Basic checks can then be performed to confirm the structure and contents of the data.
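
The basic checks mentioned above can be sketched as a small column check. This is only an illustration: the helper name `missing_columns` is hypothetical, and the required columns (`name` and `img`) are inferred from the code used later in this notebook.

```python
def missing_columns(columns, required=("name", "img")):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [col for col in required if col not in present]


# Stand-in for list(df.columns); the real DataFrame is loaded in the next cell.
example_columns = ["name", "artist", "spotify_id", "preview", "img"]
print(missing_columns(example_columns))      # nothing missing here
print(missing_columns(["artist", "img"]))    # 'name' is reported as missing
```

With the real data, `missing_columns(df.columns)` should return an empty list before any feature extraction starts.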


In [1]:
# Install required libraries
!pip install pandas numpy scikit-learn tensorflow keras opencv-python-headless pillow matplotlib seaborn requests

# Import essential libraries
import pandas as pd
import numpy as np
import os
import cv2
import matplotlib.pyplot as plt
import seaborn as sns

# Deep learning and image processing
from tensorflow.keras.applications import ResNet50V2, VGG16
from tensorflow.keras.preprocessing.image import img_to_array, load_img
from tensorflow.keras.applications.resnet_v2 import preprocess_input as resnet_preprocess
from tensorflow.keras.applications.vgg16 import preprocess_input as vgg_preprocess

# Machine learning
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

# Load the dataset
df = pd.read_csv('spotify_dataset.csv').dropna()

# Check dataset
print(df.head())
print("\nTotal number of songs:", len(df))

                               name      artist              spotify_id  \
0            Mood (feat. iann dior)    24kGoldn  3tjFYV6RSFtuktYl3ZtYcq   
2                          Dynamite         BTS  0t1kP63rueHleOhQkYSXFY   
6                             Hawái      Maluma  4uoR6qeWeuL4Qeu2qJzkuG   
8  Savage Love (Laxed - Siren Beat)   Jawsh 685  1xQ6trAsedVPCdbtDAmk0c   
9         Head & Heart (feat. MNEK)  Joel Corry  6cx06DFPPHchuUAcTxznu9   

                                             preview  \
0  https://p.scdn.co/mp3-preview/45cb08fdb67744ab...   
2  https://p.scdn.co/mp3-preview/a707728846c105f4...   
6  https://p.scdn.co/mp3-preview/07037d916c2e4ea2...   
8  https://p.scdn.co/mp3-preview/d709526bcdffa668...   
9  https://p.scdn.co/mp3-preview/a860b3fe04f48053...   

                                                 img  danceability  energy  \
0  https://i.scdn.co/image/ab67616d0000b273ff8c98...         0.700   0.722   
2  https://i.scdn.co/image/ab67616d0000b273755995...    

## Step 2: Image Download and Preprocessing

To analyze album covers, it is necessary to download images from the provided URLs and prepare them for feature extraction. This involves implementing functions for downloading images and applying preprocessing steps to make them suitable for analysis.

### Key Functions

1. **Image Download**  
   A function to download images from the URLs provided in the dataset. The images are saved locally for subsequent processing.

2. **Preprocessing**  
   The downloaded images undergo preprocessing to ensure they meet the input requirements of the feature extraction methods. This may include resizing and color space conversion.
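
The preprocessing step can be illustrated with a NumPy-only sketch. It is not the notebook's own method (the functions in Step 3 rely on `load_img` and OpenCV); the helper names `to_rgb` and `resize_nearest` are hypothetical, and nearest-neighbour sampling is chosen only to keep the example self-contained.

```python
import numpy as np


def to_rgb(image):
    """Return an (H, W, 3) array from a grayscale, RGB, or RGBA image."""
    if image.ndim == 2:                       # grayscale: repeat the single channel
        return np.stack([image] * 3, axis=-1)
    return image[..., :3]                     # RGB stays as-is, RGBA drops alpha


def resize_nearest(image, size=(224, 224)):
    """Resize with nearest-neighbour sampling (no interpolation)."""
    height, width = image.shape[:2]
    rows = np.arange(size[0]) * height // size[0]   # source row for each output row
    cols = np.arange(size[1]) * width // size[1]    # source column for each output column
    return image[rows[:, None], cols]


# Synthetic 640x640 grayscale "cover" standing in for a downloaded image.
cover = np.random.default_rng(0).integers(0, 256, size=(640, 640), dtype=np.uint8)
prepared = resize_nearest(to_rgb(cover))
print(prepared.shape, prepared.dtype)
```

The 224x224 target matches the input size used for the CNN models later in the notebook.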


In [2]:
def download_image(url, filename):
    """
    Download image from URL and save locally
    """
    import requests
    try:
        response = requests.get(url, timeout=10)  # Avoid hanging on unresponsive hosts
        if response.status_code == 200:
            with open(filename, 'wb') as f:
                f.write(response.content)
            return True
    except Exception as e:
        print(f"Error downloading {url}: {e}")
    return False

## Step 3: Feature Extraction

Various features are extracted from the album cover images to enable effective recommendation generation. The features include:

### 1. Color Histograms
   - Color histograms capture the distribution of colors in an image, providing a representation of the color composition. This is done by converting the image to the HSV color space and computing the histogram for the hue and saturation channels.

### 2. CNN Features
   - Convolutional Neural Networks (CNNs), here **ResNet50V2** and **VGG16**, are used to extract high-level image features. Both models are pre-trained on ImageNet and applied without their classification head, so each album cover is represented by a globally average-pooled feature vector that can be compared across songs.

### 3. Font Features (Optional)
   - Font features involve analyzing any text on the album cover. This can be done using **pytesseract**, an OCR tool, to extract text and derive basic text-based features (e.g., text length, word count). This step is optional and requires the installation of **pytesseract**.
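
To make the color histogram feature concrete, the following NumPy-only sketch computes a joint hue/saturation histogram on a synthetic HSV image. It is meant to mirror the idea behind the `cv2.calcHist` call used below, not to replace it. The helper name `hs_histogram` is hypothetical, and scaling to unit L2 norm is an assumption about what the OpenCV normalization does with its default arguments.

```python
import numpy as np


def hs_histogram(hsv_image, bins=(180, 256)):
    """Joint hue/saturation histogram, flattened and scaled to unit L2 norm."""
    hue = hsv_image[..., 0].ravel()
    sat = hsv_image[..., 1].ravel()
    # OpenCV stores 8-bit hue in [0, 180) and saturation in [0, 256).
    hist, _, _ = np.histogram2d(hue, sat, bins=bins, range=[[0, 180], [0, 256]])
    norm = np.linalg.norm(hist)
    if norm > 0:
        hist = hist / norm
    return hist.ravel()


# Synthetic HSV image: hue in [0, 180), saturation and value in [0, 256).
rng = np.random.default_rng(0)
hsv = np.stack(
    [
        rng.integers(0, 180, size=(64, 64)),
        rng.integers(0, 256, size=(64, 64)),
        rng.integers(0, 256, size=(64, 64)),
    ],
    axis=-1,
)
features = hs_histogram(hsv)
print(features.shape)
```

With 180 hue bins and 256 saturation bins, each cover yields a vector of 46,080 values. Fewer bins would give a more compact and less sparse descriptor.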


In [3]:
def extract_color_histogram(image_path):
    """
    Extract color histogram features from an image
    """
    image = cv2.imread(image_path)
    if image is None:
        return None

    # Convert to HSV color space
    hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

    # Compute histogram
    hist = cv2.calcHist([hsv_image], [0, 1], None, [180, 256], [0, 180, 0, 256])
    cv2.normalize(hist, hist)
    return hist.flatten()


def extract_cnn_features_sequential(image_path, model_name='resnet'):
    """
    Extract deep features using pre-trained CNN models sequentially to avoid memory overload.
    """
    # Resize image
    img = load_img(image_path, target_size=(224, 224))
    img_array = img_to_array(img)

    # Select and preprocess model
    if model_name.lower() == 'resnet':
        model = ResNet50V2(weights='imagenet', include_top=False, pooling='avg')
        preprocessed_img = resnet_preprocess(img_array)
    else:
        model = VGG16(weights='imagenet', include_top=False, pooling='avg')
        preprocessed_img = vgg_preprocess(img_array)

    # Expand dimensions and predict
    preprocessed_img = np.expand_dims(preprocessed_img, axis=0)
    features = model.predict(preprocessed_img)

    # Free up memory by clearing the Keras session
    from tensorflow.keras import backend as K
    K.clear_session()

    return features.flatten()


def extract_font_features(image_path):
    """
    Basic font feature extraction
    """
    import pytesseract

    # Read image
    image = cv2.imread(image_path)
    if image is None:
        raise ValueError(f"Could not read image: {image_path}")

    # Convert to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Perform text extraction
    text = pytesseract.image_to_string(gray)

    # Basic text analysis features
    return {
        'text_length': len(text),
        'word_count': len(text.split()),
        'unique_chars': len(set(text))
    }


def extract_album_cover_features_sequential(image_url):
    """
    Comprehensive feature extraction for album covers, processed sequentially.
    """
    # Create temp directory if not exists
    os.makedirs('temp_covers', exist_ok=True)

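    # Generate a filename that is stable across runs (built-in hash() is salted per process)
    import hashlib
    filename = f"temp_covers/{hashlib.md5(image_url.encode('utf-8')).hexdigest()}.jpg"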

    # Download image
    if not download_image(image_url, filename):
        return None

    # Extract features sequentially to avoid memory overload
    features = {
        'color_hist': extract_color_histogram(filename)
    }

    # Process ResNet features first
    features['resnet_features'] = extract_cnn_features_sequential(filename, model_name='resnet')

    # Process VGG features next
    features['vgg_features'] = extract_cnn_features_sequential(filename, model_name='vgg')

    # Optional font features (requires pytesseract)
    try:
        features['font_features'] = extract_font_features(filename)
    except ImportError:
        print("Pytesseract not installed. Skipping font features.")

    return features

import pickle

def process_dataset_features(dataframe, output_file='album_features.pkl', techniques=None):
    """
    Process features for the entire dataset, with options to select specific techniques.
    """
    # Default techniques to run if none are provided
    if techniques is None:
        techniques = ['color_hist', 'resnet', 'vgg', 'font']

    features_list = []
    total_songs = len(dataframe)

    # Check if file already exists, and load features if so.
    # Note: the cached file is returned as-is, whatever `techniques` is; delete it to recompute.
    if os.path.exists(output_file):
        print(f"Loading existing features from {output_file}...")
        with open(output_file, 'rb') as f:
            features_list = pickle.load(f)
        return pd.DataFrame(features_list)

    import hashlib  # md5 gives filenames that are stable across runs; built-in hash() is salted per process

    # Create temp directory if not exists
    os.makedirs('temp_covers', exist_ok=True)

    # Loop through each song and extract selected features.
    # enumerate() supplies a running position, because the DataFrame index has gaps after dropna().
    for position, (_, row) in enumerate(dataframe.iterrows()):
        # Print progress every 10 songs or at the last song
        if position % 10 == 0 or position == total_songs - 1:
            print(f"Processing song {position + 1} of {total_songs}")

        url = row['img']
        features = {}

        # Generate unique filename
        filename = f"temp_covers/{hashlib.md5(url.encode('utf-8')).hexdigest()}.jpg"

        # Download image
        if not download_image(url, filename):
            continue  # Skip if download fails

        if 'color_hist' in techniques:
            hist = extract_color_histogram(filename)
            if hist is None:
                continue  # Skip images that OpenCV cannot read
            features['color_hist'] = hist
        if 'resnet' in techniques:
            features['resnet_features'] = extract_cnn_features_sequential(filename, 'resnet')
        if 'vgg' in techniques:
            features['vgg_features'] = extract_cnn_features_sequential(filename, 'vgg')
        if 'font' in techniques:
            try:
                features['font_features'] = extract_font_features(filename)
            except Exception as e:
                print(f"Error extracting font features for {url}: {e}")

        if features:  # Only append if features were extracted
            features['song_name'] = row['name']  # Include song name
            features_list.append(features)

    # Save features list to a file
    with open(output_file, 'wb') as f:
        pickle.dump(features_list, f)
        print(f"Features saved to {output_file}")

    return pd.DataFrame(features_list)

## Step 4: Similarity Calculation and Recommendations

To recommend songs based on album cover similarity, cosine similarity is used to measure how similar the features of one album cover are to another. The following two functions are implemented for recommendation generation:

### 1. recommend_songs_by_cover
   - This function recommends songs that have similar album covers to a given target song. It calculates the cosine similarity between the features of the target song and all other songs in the dataset, then returns the top N most similar songs based on their album cover features.

### 2. create_playlist_recommendation
   - This function generates song recommendations based on the collective characteristics of a playlist. The average features of the songs in the playlist are calculated, and then cosine similarity is used to find songs that are most similar to this average. The top N recommended songs are returned.
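
The ranking logic described above can be sketched on toy vectors with plain NumPy. This is a simplified illustration under stated assumptions, not the notebook's implementation (which uses scikit-learn's `cosine_similarity`). The helper names `cosine_scores` and `top_n` and the four toy vectors are hypothetical.

```python
import numpy as np


def cosine_scores(query, matrix):
    """Cosine similarity between one query vector and every row of `matrix`."""
    query = np.asarray(query, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    denom = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    denom = np.where(denom == 0, 1.0, denom)    # avoid division by zero
    return matrix @ query / denom


def top_n(scores, exclude, n):
    """Indices of the n highest scores, skipping the excluded positions."""
    scores = np.array(scores, dtype=float)      # work on a copy
    scores[np.asarray(list(exclude), dtype=int)] = -np.inf
    order = np.argsort(scores)[::-1]            # best score first
    return [int(i) for i in order[:n] if np.isfinite(scores[i])]


# Toy feature vectors standing in for album cover features.
covers = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]])
playlist_rows = [0, 1]                          # rows that form the playlist

centroid = covers[playlist_rows].mean(axis=0)   # average playlist features
scores = cosine_scores(centroid, covers)
print(top_n(scores, exclude=playlist_rows, n=2))  # → [3, 2]
```

For a single target song, the query is that song's own feature vector instead of the playlist average, and only the target row is excluded.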


In [4]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

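def recommend_songs_by_cover(target_song, features_df, top_n=5, available_features=None):
    """
    Recommend songs based on album cover similarity, considering only the features available.
    """
    # Fall back to every supported feature type present in the table when none are specified
    if available_features is None:
        column_for = {'color_hist': 'color_hist', 'resnet': 'resnet_features', 'vgg': 'vgg_features'}
        available_features = [name for name, column in column_for.items() if column in features_df.columns]

    print("Feature dataframe")
    print(features_df.head())
    print("Available features")
    print(available_features)

    # Get target song features
    matches = features_df[features_df['song_name'] == target_song]
    if matches.empty:
        print(f"Song '{target_song}' not found in the feature table.")
        return None
    target_features = matches.iloc[0]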

    similarities = {}

    # Compute similarities for available features
    if 'color_hist' in available_features and 'color_hist' in target_features:
        similarities['color_similarity'] = cosine_similarity(
            target_features['color_hist'].reshape(1, -1),
            features_df['color_hist'].tolist()
        )[0]

    if 'resnet' in available_features and 'resnet_features' in target_features:
        similarities['resnet_similarity'] = cosine_similarity(
            target_features['resnet_features'].reshape(1, -1),
            features_df['resnet_features'].tolist()
        )[0]

    if 'vgg' in available_features and 'vgg_features' in target_features:
        similarities['vgg_similarity'] = cosine_similarity(
            target_features['vgg_features'].reshape(1, -1),
            features_df['vgg_features'].tolist()
        )[0]

    # Aggregate similarities
    if similarities:
        aggregate_similarity = np.mean(list(similarities.values()), axis=0)

        # Remove target song from recommendations
        aggregate_similarity[features_df['song_name'] == target_song] = -1

        # Get top N recommendations
        top_indices = aggregate_similarity.argsort()[-top_n:][::-1]

        return features_df.iloc[top_indices]
    else:
        print("No available features for similarity calculation.")
        return None


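def create_playlist_recommendation(playlist, features_df, top_n=10, available_features=None):
    """
    Generate recommendations based on playlist, considering only the features available.
    """
    # Fall back to every supported feature type present in the table when none are specified
    if available_features is None:
        column_for = {'color_hist': 'color_hist', 'resnet': 'resnet_features', 'vgg': 'vgg_features'}
        available_features = [name for name, column in column_for.items() if column in features_df.columns]

    # Select the playlist songs that exist in the feature table
    playlist_features = features_df[features_df['song_name'].isin(playlist)]
    if playlist_features.empty:
        print("None of the playlist songs were found in the feature table.")
        return None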

    avg_features = {}

    # Compute average for each available feature
    if 'color_hist' in available_features:
        avg_features['color_hist'] = np.mean(playlist_features['color_hist'].tolist(), axis=0)

    if 'resnet' in available_features:
        avg_features['resnet_features'] = np.mean(playlist_features['resnet_features'].tolist(), axis=0)

    if 'vgg' in available_features:
        avg_features['vgg_features'] = np.mean(playlist_features['vgg_features'].tolist(), axis=0)

    similarities = {}

    # Compute similarities for available features
    if 'color_hist' in available_features and 'color_hist' in avg_features:
        similarities['color_similarity'] = cosine_similarity(
            avg_features['color_hist'].reshape(1, -1),
            features_df['color_hist'].tolist()
        )[0]

    if 'resnet' in available_features and 'resnet_features' in avg_features:
        similarities['resnet_similarity'] = cosine_similarity(
            avg_features['resnet_features'].reshape(1, -1),
            features_df['resnet_features'].tolist()
        )[0]

    if 'vgg' in available_features and 'vgg_features' in avg_features:
        similarities['vgg_similarity'] = cosine_similarity(
            avg_features['vgg_features'].reshape(1, -1),
            features_df['vgg_features'].tolist()
        )[0]

    # Aggregate similarities
    if similarities:
        aggregate_similarity = np.mean(list(similarities.values()), axis=0)

        # Remove playlist songs from recommendations
        aggregate_similarity[features_df['song_name'].isin(playlist)] = -1

        # Get top N recommendations
        top_indices = aggregate_similarity.argsort()[-top_n:][::-1]

        return features_df.iloc[top_indices]
    else:
        print("No available features for similarity calculation.")
        return None


## Step 5: Example Usage

To test the recommendation system, the following examples demonstrate how to generate song recommendations based on album cover similarity:

### 1. Process Dataset Features
   - Extract features for all the album covers in the dataset using the `process_dataset_features` function. This will generate the feature matrix required for similarity calculations.

### 2. Recommend Songs for a Single Song
   - Use the `recommend_songs_by_cover` function to get song recommendations based on a specific song. The name of the target song is provided, and the function will return the top N most similar songs.

### 3. Recommend Songs Based on a Playlist
   - Use the `create_playlist_recommendation` function to recommend songs based on a playlist. A playlist consisting of song names is provided, and the function will return songs similar to the collective characteristics of the playlist.


In [5]:
# Example usage
# Specify which features you want to use for the recommendation process
available_features = ['color_hist']  # Modify as needed: 'color_hist', 'resnet', 'vgg', or combinations ('font' can be extracted but is not used for similarity)

# Process entire dataset features, passing the desired techniques
dataset_features = process_dataset_features(df, techniques=available_features)
print("Processed Dataset features using", available_features)

# Recommendation for a single song
target_song = 'Shape of You'
song_recommendations = recommend_songs_by_cover(target_song, dataset_features, available_features=available_features)
print(f"Recommendations for '{target_song}':")
print(song_recommendations)

# Playlist-based recommendation
playlist = ['Shape of You', 'Counting Stars', 'Havana']
playlist_recommendations = create_playlist_recommendation(playlist, dataset_features, available_features=available_features)
print("\nRecommendations for playlist:", playlist)
print(playlist_recommendations)


Loading existing features from album_features.pkl...
Processed Dataset features using ['color_hist']
Feature dataframe
                                          color_hist  \
0  [0.08430748, 0.0, 0.0012140276, 0.0004046759, ...   
1  [0.99841475, 0.0, 0.0012881887, 0.0005991576, ...   
2  [0.046410248, 0.0, 8.2433835e-05, 4.1216917e-0...   
3  [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...   
4  [0.22954293, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...   

                          song_name  
0            Mood (feat. iann dior)  
1                          Dynamite  
2                             Hawái  
3  Savage Love (Laxed - Siren Beat)  
4         Head & Heart (feat. MNEK)  
Available features
['color_hist']
Recommendations for 'Shape of You':
                                             color_hist            song_name
282   [0.736289, 0.0, 0.0, 0.0, 0.0, 0.0, 4.1025745e...          Galway Girl
101   [0.736289, 0.0, 0.0, 0.0, 0.0, 0.0, 4.1025745e...              Happier
2452  [0.736289

## Step 6: Visualization (Optional)

To further explore and analyze the album cover similarities, visualization techniques can be applied. For example:

### 1. Similarity Heatmaps
   - A heatmap of the pairwise cosine similarity matrix between album cover features makes it easy to spot groups of songs with similar covers.

### 2. t-SNE Plots
   - t-SNE (t-Distributed Stochastic Neighbor Embedding) is a technique for dimensionality reduction that can be used to project the high-dimensional feature vectors into a 2D space. A t-SNE plot can help visualize how songs with similar album covers are grouped together.

These visualizations can help to better understand the relationships between songs and the effectiveness of the recommendation system.
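
The cell below covers only the heatmap. A t-SNE projection could be sketched as follows, using scikit-learn's `TSNE` on synthetic vectors. The helper name `project_features_2d`, the perplexity value, and the two synthetic clusters are assumptions made for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_features_2d(feature_matrix, perplexity=5.0, seed=0):
    """Project row vectors to 2D with t-SNE for plotting."""
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(np.asarray(feature_matrix, dtype=np.float32))


# Two synthetic clusters standing in for album cover feature vectors.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 16)),
    rng.normal(1.0, 0.1, size=(20, 16)),
])
embedding = project_features_2d(features)
print(embedding.shape)
```

With real data, the input would be a stacked feature column, for example `np.stack(features_df['color_hist'].tolist())`, and the two embedding columns can be passed to `plt.scatter`. The perplexity must be smaller than the number of samples.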


In [None]:
def visualize_album_cover_similarities(features_df):
    """
    Create a heatmap of album cover feature similarities
    """
    # Compute similarity matrix for color histograms
    color_similarity_matrix = cosine_similarity(
        features_df['color_hist'].tolist(),
        features_df['color_hist'].tolist()
    )

    # Plotting the heatmap
    plt.figure(figsize=(10, 8))
    sns.heatmap(color_similarity_matrix, annot=False, cmap='viridis', xticklabels=False, yticklabels=False)
    plt.title('Album Cover Color Similarity Heatmap')
    plt.show()

# Visualize similarities
visualize_album_cover_similarities(dataset_features)