# Quantum Machine Learning System for Spotify Music Recommendation


Mentor:
Prof. Dr. Gerhard Hellstern
Duale Hochschule Baden-Württemberg

Team members:
Roshani Vijayan


### Spotify music recommendations:

This Jupyter notebook builds a quantum machine learning model that recommends Spotify tracks based on song features and preference labels. The steps are:

1. **Library Installation and Imports**:
    - The notebook installs and imports necessary libraries, including TensorFlow, TensorFlow Quantum, and Cirq.

2. **Loading Data**:
    - The dataset is loaded from a CSV file (Source: https://github.com/orzanai/Moodify/blob/main/Datasets/1200_song_mapped.csv). The file contains features of Spotify tracks and corresponding labels indicating user preferences.

3. **Data Preparation**:
    - Features are selected for model training.
    - Data is split into training and testing sets.

4. **Data Standardization**:
    - The features are standardized using `StandardScaler`.

5. **Dimensionality Reduction**:
    - Principal Component Analysis (PCA) reduces the standardized features to four components (`N_DIM = 4`), so that each component can be encoded on its own qubit.

6. **Scaling for Quantum Processing**:
    - Data is scaled to the range \([-π, π]\) for compatibility with quantum processing.

7. **Label Preparation**:
    - Labels are converted to NumPy arrays.

8. **Quantum Data Encoding**:
    - The data is converted into quantum circuits using the `convert_to_circuits` function, where each component sets the angle of an `rx` rotation gate on its own qubit.

9. **Conversion to TensorFlow Quantum Tensors**:
    - The circuits are converted to TensorFlow Quantum tensors for use in the model.

10. **Quantum Model Definition**:
    - A quantum model is defined using TensorFlow Quantum. It consists of a single `tfq.layers.PQC` layer whose circuit applies parameterized `rx` and `ry` rotation gates to each qubit.

11. **Model Compilation and Training**:
    - The quantum model is compiled with an Adam optimizer and binary cross-entropy loss function.
    - The model is trained on the training data and validated on the test data.

12. **Predictions**:
    - The model makes predictions on `prediction_circuits`, which in this notebook are built from the same samples as the test set.
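
The preprocessing steps above can be sketched with plain NumPy. This is an illustrative stand-in, not the notebook's code (which uses scikit-learn's `StandardScaler`, `PCA` and `MinMaxScaler`). The helper names `standardize`, `pca_axes` and `scale_to_angles`, the random toy data, and the choice to fit every step on the training split only are assumptions of this sketch.

```python
import numpy as np

def standardize(train, other):
    """Zero-mean, unit-variance scaling using training statistics only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (train - mean) / std, (other - mean) / std

def pca_axes(train, n_components):
    """Return the top principal axes of already-centred training data."""
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(train, full_matrices=False)
    return vt[:n_components]

def scale_to_angles(train, other):
    """Min-max scale each column to [-pi, pi] using the training range."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)

    def to_angle(x):
        return (x - lo) / span * 2 * np.pi - np.pi

    # Held-out values outside the training range are clipped.
    return to_angle(train), np.clip(to_angle(other), -np.pi, np.pi)

rng = np.random.default_rng(0)
train = rng.normal(size=(80, 10))   # toy stand-in for 10 audio features
test = rng.normal(size=(20, 10))

train_std, test_std = standardize(train, test)
axes = pca_axes(train_std, n_components=4)
train_pca, test_pca = train_std @ axes.T, test_std @ axes.T
train_angles, test_angles = scale_to_angles(train_pca, test_pca)
print(train_angles.shape, test_angles.shape)
```

Each row of `train_angles` then holds four rotation angles, one per qubit.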


### Quantum Machine Learning Algorithm in the Code

The code uses a hybrid quantum-classical machine learning approach, specifically leveraging quantum circuits within a neural network framework. This involves the following components:

1. **Quantum Circuits**: Each data sample is encoded into a quantum circuit.
2. **Parameterized Quantum Circuits (PQCs)**: These circuits have trainable parameters that can be optimized during the training process.
3. **Quantum Layers in TensorFlow Quantum**: The quantum circuits are integrated into a TensorFlow Keras model using TensorFlow Quantum, which allows quantum circuits to be used as layers in a neural network.

#### Key Components and Steps

1. **Quantum Circuit Encoding**:
   - Each feature of the input data is encoded as the angle of an `rx` rotation gate on its own qubit (the `ry` gates appear only in the trainable circuit).
   - The quantum circuit for each sample is constructed with these gates.

2. **Parameterized Quantum Circuit (PQC)**:
   - A PQC is created where each qubit's rotation is parameterized by a symbol, allowing these parameters to be trained.
   - The quantum layer (`tfq.layers.PQC`) takes these circuits and the measurement operator (in this case, the Z operator on the first qubit) to output the expectation value, which serves as the model's prediction.

3. **Hybrid Quantum-Classical Model**:
   - The PQCs are integrated into a Keras model as a layer.
   - The model is compiled and trained using classical optimization techniques (e.g., Adam optimizer) on classical hardware.
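
To make the readout concrete, the following sketch simulates a single qubit with NumPy instead of TensorFlow Quantum. It assumes the data rotation `rx(x)` acts first, followed by the trainable `rx(a)` and `ry(b)`, and that Z is measured on that same qubit. Because the circuit described above contains only single-qubit rotations, the rotations on the other qubits cannot change this expectation value. The function name `readout` and the example angles are illustrative.

```python
import numpy as np

def rx(theta):
    """Matrix of a rotation by theta about the X axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def ry(theta):
    """Matrix of a rotation by theta about the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

Z = np.array([[1, 0], [0, -1]], dtype=complex)

def readout(x, a, b):
    """<Z> after rx(x) (data), then rx(a) and ry(b) (trainable)."""
    state = np.array([1, 0], dtype=complex)      # start in |0>
    state = ry(b) @ rx(a) @ rx(x) @ state        # matrices apply right to left
    return float(np.real(np.conj(state) @ Z @ state))

x, a, b = 0.7, 0.3, -1.1                         # example angles
simulated = readout(x, a, b)
closed_form = np.cos(b) * np.cos(x + a)          # same quantity in closed form
print(simulated, closed_form)
```

The two printed numbers agree, which shows that for this gate sequence the output is `cos(b) * cos(x + a)` and always lies between -1 and 1.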



### Prediction and Recommendation Process

1. **Data Preparation**:
   - Features from the dataset are standardized and reduced in dimensionality using PCA.
   - The features are scaled to fit the range \([-π, π]\) for compatibility with quantum circuits.

2. **Quantum Circuit Construction**:
   - The preprocessed features are converted into quantum circuits where each feature value determines the angle of rotation gates applied to qubits.

3. **Model Training**:
   - The quantum circuits representing the training data are fed into the PQC model.
   - The model is trained on these circuits, optimizing the parameters of the quantum gates to minimize the binary cross-entropy loss.

4. **Making Predictions**:
   - After training, the model predicts the labels for the test data and the prediction data (which could be new songs to recommend).

5. **Interpreting Predictions**:
   - The output of the model is a set of continuous values between -1 and 1.
   - These values represent the model's confidence in the recommendation, with values closer to 1 indicating strong recommendations and values closer to -1 indicating weak or no recommendations.
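
Binary cross-entropy is defined for probabilities in [0, 1], whereas the readout lies in [-1, 1]. One common way to reconcile the two, shown here as an assumption rather than something the notebook does, is the affine map p = (1 + z) / 2. The function names and the example scores and labels below are hypothetical.

```python
import numpy as np

def expectation_to_probability(z):
    """Map an expectation value in [-1, 1] to a probability in [0, 1]."""
    return (np.asarray(z, dtype=float) + 1.0) / 2.0

def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean binary cross-entropy for 0/1 labels and probabilities."""
    probs = np.clip(probs, eps, 1.0 - eps)       # keep log() finite
    labels = np.asarray(labels, dtype=float)
    return float(-np.mean(labels * np.log(probs)
                          + (1.0 - labels) * np.log(1.0 - probs)))

scores = np.array([0.78, -0.60, 0.17, -0.94])    # example expectation values
labels = np.array([1, 0, 1, 0])                  # hypothetical 0/1 targets
probs = expectation_to_probability(scores)
loss = binary_cross_entropy(labels, probs)
print(probs)
print(loss)
```

With this mapping a score of 0 corresponds to a probability of 0.5, so a decision threshold of 0.5 on the probability is the same as a threshold of 0 on the raw score.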



### Detailed Breakdown of How the Code Recommends Songs

1. **Preprocessing**:
   - The input data (features of songs) is cleaned and standardized.
   - Dimensionality reduction (PCA) ensures that the data is suitable for quantum processing.
   - The data is scaled to the appropriate range for quantum gate rotations.

2. **Quantum Circuit Encoding**:
   - Each song's features are converted into a quantum circuit, where each feature controls the rotation of a qubit.

3. **Quantum Model Training**:
   - The encoded circuits are input into a PQC within a neural network.
   - The model learns to distinguish between songs that should be recommended and those that should not based on training labels.

4. **Prediction**:
   - For new or test songs, the same preprocessing and encoding steps are applied.
   - The trained quantum model predicts the recommendation score for each song.

5. **Recommendation**:
   - The predicted scores are analyzed.
   - Songs with high positive scores are considered strong recommendations.
   - These songs can then be suggested to the user based on their preference and listening history.



In [None]:
# install TensorFlow Quantum and its dependencies:
!pip install tensorflow tensorflow-quantum
!pip install cirq

Collecting tensorflow-quantum
  Downloading tensorflow_quantum-0.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.8 MB)
Collecting cirq-core==1.3.0 (from tensorflow-quantum)
  Downloading cirq_core-1.3.0-py3-none-any.whl (1.8 MB)
Collecting cirq-google==1.3.0 (from tensorflow-quantum)
  Downloading cirq_google-1.3.0-py3-none-any.whl (598 kB)
Collecting sympy==1.12 (from tensorflow-quantum)
  Downloading sympy-1.12-py3-none-any.whl (5.7 MB)
Collecting duet~=0.2.8 (from cirq-core

In [None]:
# Verify Installation: Verify if TensorFlow Quantum is installed correctly by importing it:
try:
    import tensorflow as tf
    import tensorflow_quantum as tfq
    import cirq
    import sympy
    print("TensorFlow Quantum is installed.")
except ImportError as e:
    print(f"An error occurred: {e}")

TensorFlow Quantum is installed.


In [None]:
# Mount Google Drive with Data and other files

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers
import tensorflow_quantum as tfq
import cirq
import sympy

In [None]:
# Load the Data:
data_path = '/content/drive/MyDrive/Colab_Notebooks/GIRQ_Project/1200_song_mapped.csv'
data = pd.read_csv(data_path)
data.head()

Unnamed: 0.1,Unnamed: 0,track,artist,duration (ms),popularity,uri,danceability,energy,key,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time signature,labels
0,0,1999,Prince,379266,68,spotify:track:2H7PHVdQ3mXqEHXcvclTB0,0.866,0.73,5,-8.201,0.0767,0.137,0.0,0.0843,0.625,118.523,4,1
1,1,23,Blonde Redhead,318800,43,spotify:track:4HIwL9ii9CcXpTOTzMq0MP,0.381,0.832,8,-5.069,0.0492,0.0189,0.196,0.153,0.166,120.255,4,0
2,2,9 Crimes,Damien Rice,217946,60,spotify:track:5GZEeowhvSieFDiR8fQ2im,0.346,0.139,0,-15.326,0.0321,0.913,7.7e-05,0.0934,0.116,136.168,4,0
3,3,99 Luftballons,Nena,233000,2,spotify:track:6HA97v4wEGQ5TUClRM0XLc,0.466,0.438,4,-12.858,0.0608,0.089,6e-06,0.113,0.587,193.1,4,1
4,4,A Boy Brushed Red Living In Black And White,Underoath,268000,60,spotify:track:47IWLfIKOKhFnz1FUEUIkE,0.419,0.932,1,-3.604,0.106,0.00171,0.0,0.137,0.445,169.881,4,2


In [None]:
# Data Pre-Processing

# Drop unnecessary columns
data = data.drop(columns=['Unnamed: 0', 'track', 'artist', 'uri'])

# Feature Selection and Data Cleaning
features = data[['danceability', 'energy', 'key', 'loudness', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo']].values
labels = data['labels'].values

# Handle missing values (if any)
features = pd.DataFrame(features).fillna(0).values

data.head()

Unnamed: 0,duration (ms),popularity,danceability,energy,key,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time signature,labels
0,379266,68,0.866,0.73,5,-8.201,0.0767,0.137,0.0,0.0843,0.625,118.523,4,1
1,318800,43,0.381,0.832,8,-5.069,0.0492,0.0189,0.196,0.153,0.166,120.255,4,0
2,217946,60,0.346,0.139,0,-15.326,0.0321,0.913,7.7e-05,0.0934,0.116,136.168,4,0
3,233000,2,0.466,0.438,4,-12.858,0.0608,0.089,6e-06,0.113,0.587,193.1,4,1
4,268000,60,0.419,0.932,1,-3.604,0.106,0.00171,0.0,0.137,0.445,169.881,4,2


In [None]:
# Split the Data:
train_data, test_data, train_labels, test_labels = train_test_split(features, labels, test_size=0.2, random_state=42)
prediction_data = test_data

In [None]:
# Data Standardization:
scaler = StandardScaler()
train_data = scaler.fit_transform(train_data)
test_data = scaler.transform(test_data)
prediction_data = scaler.transform(prediction_data)

In [None]:
# Dimensionality Reduction:
N_DIM = 4
pca = PCA(n_components=N_DIM)
train_data = pca.fit_transform(train_data)
test_data = pca.transform(test_data)
prediction_data = pca.transform(prediction_data)

In [None]:
# Scale the Data for Quantum Processing:
# Fit the scaler on the training set only, so that no test statistics leak into training.
minmax_scaler = MinMaxScaler((-np.pi, np.pi))
train_data = minmax_scaler.fit_transform(train_data)

# Clip held-out samples that fall outside the training range to keep angles in [-pi, pi].
test_data = np.clip(minmax_scaler.transform(test_data), -np.pi, np.pi)
prediction_data = np.clip(minmax_scaler.transform(prediction_data), -np.pi, np.pi)

In [None]:
# Prepare Labels:
train_labels = np.array(train_labels)
test_labels = np.array(test_labels)
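
The rows printed by `data.head()` include the label values 0, 1 and 2, so `labels` may hold more than two classes, while binary cross-entropy expects 0/1 targets. A minimal sketch of one-vs-rest binarization follows. Treating class 1 as the positive class is purely an assumption, since the source does not say which label means "recommend"; `POSITIVE_CLASS` and `binarize_labels` are hypothetical names.

```python
import numpy as np

POSITIVE_CLASS = 1  # hypothetical: the source does not say which label means "recommend"

def binarize_labels(labels, positive_class=POSITIVE_CLASS):
    """Return 1.0 where the label equals positive_class, else 0.0."""
    return (np.asarray(labels) == positive_class).astype(np.float32)

raw = np.array([1, 0, 0, 1, 2])      # labels of the five rows shown by data.head()
binary = binarize_labels(raw)
print(np.unique(raw))                # distinct classes present in the sample
print(binary)
```

Checking `np.unique(labels)` on the full column before training shows whether such a step is needed at all.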

In [None]:
# Quantum Data Encoding:
def convert_to_circuits(data):
    qubits = cirq.GridQubit.rect(1, N_DIM)
    circuits = []
    for sample in data:
        circuit = cirq.Circuit()
        for i, qubit in enumerate(qubits):
            circuit.append(cirq.rx(sample[i]).on(qubit))
        circuits.append(circuit)
    return circuits

train_circuits = convert_to_circuits(train_data)
test_circuits = convert_to_circuits(test_data)
prediction_circuits = convert_to_circuits(prediction_data)

In [None]:
# Convert Circuits to TensorFlow Quantum Tensors:
train_circuits = tfq.convert_to_tensor(train_circuits)
test_circuits = tfq.convert_to_tensor(test_circuits)
prediction_circuits = tfq.convert_to_tensor(prediction_circuits)

In [None]:
# Define Quantum Model:
def create_quantum_model():
    # Trainable circuit: one rx and one ry rotation per qubit, each with its own symbol
    qubits = cirq.GridQubit.rect(1, N_DIM)
    circuit = cirq.Circuit()
    for i, qubit in enumerate(qubits):
        circuit.append(cirq.rx(sympy.Symbol(f'rx_{i}')).on(qubit))
        circuit.append(cirq.ry(sympy.Symbol(f'ry_{i}')).on(qubit))
    # Measure the Z operator on the first qubit; its expectation value is the model output
    readout = cirq.Z(qubits[0])

    # Keras model: serialized circuits in, PQC expectation value out
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(), dtype=tf.dtypes.string),
        tfq.layers.PQC(circuit, readout)
    ])
    return model

quantum_model = create_quantum_model()


In [None]:
# Compile and Train the Model:
quantum_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01), loss='binary_crossentropy', metrics=['accuracy'])
quantum_model.fit(train_circuits, train_labels, epochs=10, batch_size=32, validation_data=(test_circuits, test_labels))


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7bb85a2498d0>

In [None]:
# Make Predictions:
predictions = quantum_model.predict(prediction_circuits)
print(predictions)

[[ 7.76097357e-01]
 [-6.02479577e-01]
 [-7.16827273e-01]
 [ 1.67500392e-01]
 [-1.80070102e-01]
 [ 4.67581391e-01]
 [-3.21377404e-02]
 [-5.60704112e-01]
 [ 6.68997943e-01]
 [-9.39764977e-01]
 [-8.49646270e-01]
 [-9.99736667e-01]
 [ 6.06708527e-01]
 [-9.99430954e-01]
 [ 2.46149644e-01]
 [-9.99308765e-01]
 [ 6.22351468e-01]
 [ 1.05385199e-01]
 [-9.87720549e-01]
 [ 3.91667187e-01]
 [ 1.35202527e-01]
 [ 4.16154712e-01]
 [-6.68250501e-01]
 [-2.20336452e-01]
 [-1.98708493e-02]
 [-9.95960474e-01]
 [-8.36290717e-01]
 [-9.36443448e-01]
 [-2.99627632e-01]
 [-3.11470270e-01]
 [ 2.66182899e-01]
 [-6.51200950e-01]
 [ 1.49473548e-04]
 [-3.10974926e-01]
 [ 7.10237741e-01]
 [-1.76215276e-01]
 [ 6.80985868e-01]
 [-8.90222788e-01]
 [ 2.70534754e-01]
 [-1.37266964e-02]
 [ 1.48584038e-01]
 [-9.68246818e-01]
 [ 6.13052130e-01]
 [-7.91369557e-01]
 [ 1.58322334e-01]
 [ 6.08213305e-01]
 [-4.94534552e-01]
 [ 5.64802468e-01]
 [ 3.54889750e-01]
 [ 5.05545020e-01]
 [ 2.20164776e-01]
 [ 5.09767234e-01]
 [ 2.4500151

The model returns one continuous score per input sample. Each score is the expectation value of the Z operator on the first qubit, so it lies between -1 and 1 and can be used to rate each sample.

Here's a step-by-step explanation of the output and how it might be used in a recommender system like Spotify:

1. **Prediction Values**: The prediction values returned by the model are continuous scores that can be read as the strength of recommendation for each item. Positive values close to 1 suggest a strong recommendation, while negative values indicate weak or no recommendation.

2. **Input Data Points**: Each value corresponds to a different input data point. In this notebook each data point is one song from the held-out set; in a full recommender system the inputs could be user-song pairs, where each score indicates how likely the model thinks a user would enjoy a particular song.

3. **Recommending Spotify Tracks**:
    - **Filtering Positive Scores**: Recommendations are typically made by keeping only the most positive scores. For instance, scores above a certain threshold (e.g., 0.8) could be considered strong recommendations.
    - **Ranking**: The positive scores can be sorted in descending order to rank the recommendations, presenting the highest-scoring tracks first.
    - **Contextualization**: The actual track recommendations would be contextualized by mapping these scores back to specific songs in Spotify's catalog.
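
The filtering, ranking and mapping steps can be sketched as follows. The function `recommend`, its default `threshold` and `top_k` values, and the placeholder track names are assumptions made for illustration; the scores are the first five values of the printed output, rounded.

```python
import numpy as np

def recommend(scores, track_names, threshold=0.5, top_k=3):
    """Return up to top_k (track, score) pairs with score above threshold, best first."""
    scores = np.asarray(scores, dtype=float).ravel()   # model output has shape (n, 1)
    keep = np.flatnonzero(scores > threshold)          # indices passing the threshold
    order = keep[np.argsort(scores[keep])[::-1]]       # kept indices, highest score first
    return [(track_names[i], float(scores[i])) for i in order[:top_k]]

# Placeholder names stand in for the tracks that belong to each score.
scores = np.array([[0.776], [-0.602], [-0.717], [0.168], [-0.180]])
tracks = ["track_a", "track_b", "track_c", "track_d", "track_e"]
result = recommend(scores, tracks, threshold=0.1)
print(result)
# → [('track_a', 0.776), ('track_d', 0.168)]
```

In practice the names would come from the `track` and `artist` columns of the rows that produced each score, so those columns need to be kept alongside the features when the data is split.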


### Interpreting the Predictions

The prediction array contains continuous values that represent the model's recommendation scores for the test data:

- **Positive Values**: High positive values close to 1 indicate a strong recommendation.
- **Negative Values**: Values closer to -1 indicate weak or no recommendation.
- **Thresholding**: To make practical recommendations, the system can apply a threshold to the predictions (e.g., treat only predictions above 0.8 as strong recommendations).

### How it Recommends Spotify Tracks

1. **Preprocessing**:
    - Features of Spotify tracks (like danceability, energy, etc.) are processed and scaled.
    - These features are transformed into quantum circuits.

2. **Model Predictions**:
    - The quantum model predicts the suitability of each track based on the input features.

3. **Filtering and Ranking**:
    - Predictions are filtered based on a threshold to select strong recommendations.
    - The selected tracks are ranked by their prediction scores.

### Applications:

If a user wants to find new music similar to what they already like, the system:
1. Takes the features of tracks they enjoy.
2. Encodes these features into quantum circuits.
3. Uses the trained quantum model to predict new tracks the user might enjoy.
4. Returns the top recommended tracks based on the prediction scores.

This process enables the creation of a personalized recommendation system leveraging quantum machine learning techniques.