In [38]:
# %% [markdown]
# # Precompute and Cache Music Data for Song Recommendation
#
# In this notebook, we will:
# 1. Fetch music data (top artists, tracks, lyrics) from Spotify and Lyrics.ovh.
# 2. Generate a detailed song description using Google Gemini.
# 3. Compute embeddings for each song description using a Sentence Transformer.
# 4. Cache the results offline to avoid heavy processing at runtime.
#
# **Note:** Supply your own Spotify client ID and secret and your own Gemini API key in the credentials cell below, and avoid committing real keys to version control.

# %%
import json
import requests
import base64
import google.generativeai as genai
from sentence_transformers import SentenceTransformer
import pickle
import time



In [39]:
%pip install google-generativeai

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Defaulting to user installation because normal site-packages is not writeable

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49m/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [40]:
%pip install sentence-transformers



In [41]:
%pip install tf-keras

Collecting numpy<2.2.0,>=1.26.0 (from tensorflow<2.20,>=2.19->tf-keras)
  Using cached numpy-2.0.2-cp39-cp39-macosx_14_0_arm64.whl.metadata (60 kB)
Using cached numpy-2.0.2-cp39-cp39-macosx_14_0_arm64.whl (5.3 MB)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.24.0
    Uninstalling numpy-1.24.0:
      Successfully uninstalled numpy-1.24.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
albumentations 1.4.20 requires scipy>=1.10.0, but you have scipy 1.9.1 which is incompatible.
contourpy 1.2.0 requires numpy<2.0,>=1.20, but you have numpy 2.0.2 which is incompatible.
matplotlib 3.8.2 requires numpy<2,>=1.21, but you have numpy 2.0.2 which is incompatible.
pandas 2.1.4 requires numpy<2,>=1.22.4; python_version < "3.11", 

In [42]:
%pip install numpy==1.24.0


Collecting numpy==1.24.0
  Using cached numpy-1.24.0-cp39-cp39-macosx_11_0_arm64.whl.metadata (5.6 kB)
Using cached numpy-1.24.0-cp39-cp39-macosx_11_0_arm64.whl (13.9 MB)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 2.0.2
    Uninstalling numpy-2.0.2:
      Successfully uninstalled numpy-2.0.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
albucore 0.0.19 requires numpy>=1.24.4, but you have numpy 1.24.0 which is incompatible.
albumentations 1.4.20 requires numpy>=1.24.4, but you have numpy 1.24.0 which is incompatible.
albumentations 1.4.20 requires scipy>=1.10.0, but you have scipy 1.9.1 which is incompatible.
seaborn 0.13.2 requires numpy!=1.24.0,>=1.20, but you have numpy 1.24.0 which is incompatible.
segmentation-

In [43]:
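import os

# API credentials, read from environment variables so that secrets are not stored in the notebook.
# The variable names below are this notebook's own convention; set them before starting the kernel.
CLIENT_ID = os.environ.get('SPOTIFY_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('SPOTIFY_CLIENT_SECRET', '')
GEMINI_API_KEY = os.environ.get('GEMINI_API_KEY', '').strip()  # stray whitespace would make the key invalid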

# Configure Google Gemini API
genai.configure(api_key=GEMINI_API_KEY)

# Load the Sentence Transformer model once (this might take some time on the first run)
st_model = SentenceTransformer('all-mpnet-base-v2')

In [58]:



def get_access_token():
    url = 'https://accounts.spotify.com/api/token'
    headers = {
        'Authorization': 'Basic ' + base64.b64encode(f'{CLIENT_ID}:{CLIENT_SECRET}'.encode()).decode()
    }
    data = {'grant_type': 'client_credentials'}
    response = requests.post(url, headers=headers, data=data)
    try:
        result = response.json()
    except json.JSONDecodeError:
        print("Error decoding Spotify token response:", response.text)
        return None
    if response.status_code == 200 and result:
        return result.get('access_token')
    else:
        print(f"Could not authenticate: {result}")
        return None
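# %% [markdown]
# Spotify client-credentials tokens expire: the token response carries an `expires_in` field in seconds (3600 in Spotify's documentation), so a long precompute run can outlive its token. A minimal sketch of reusing a token until shortly before it expires, assuming a hypothetical `fetch` callable that returns `(access_token, expires_in)`. `TokenCache` and `refresh_margin` are illustrative names, not part of any library; `get_access_token` as written returns only the token, so it would need a small change to also return `result['expires_in']`.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Hypothetical helper: caches an access token and refreshes it shortly before expiry."""

    def __init__(self, fetch: Callable[[], Tuple[str, int]],
                 refresh_margin: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch              # returns (access_token, expires_in_seconds)
        self._margin = refresh_margin    # refresh this many seconds before expiry
        self._clock = clock              # injectable clock, handy for testing
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Fetch a new token on first use or once we are inside the refresh margin
        if self._token is None or now >= self._expires_at - self._margin:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```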

def search_top_artists(token, limit):
    url = "https://api.spotify.com/v1/search"
    headers = {'Authorization': f'Bearer {token}'}
    params = {'q': 'genre:pop', 'type': 'artist', 'limit': limit}
    response = requests.get(url, headers=headers, params=params)
    try:
        result = response.json()
    except json.JSONDecodeError:
        print("Error decoding artists response:", response.text)
        return []
    if response.status_code == 200 and result:
        return result.get('artists', {}).get('items', [])
    else:
        print(f"Error fetching artists: {response.status_code} - {result}")
        return []

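def get_top_tracks(token, artist_id, country='US', limit=50):
    # This endpoint returns at most 10 tracks, so `limit` can only trim that list further
    url = f"https://api.spotify.com/v1/artists/{artist_id}/top-tracks"
    headers = {'Authorization': f'Bearer {token}'}
    params = {'market': country}  # the Web API names this parameter `market` (ISO 3166-1 alpha-2 code)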
    response = requests.get(url, headers=headers, params=params)
    try:
        result = response.json()
    except json.JSONDecodeError:
        print("Error decoding top tracks response:", response.text)
        return []
    if response.status_code == 200 and result:
        return result.get('tracks', [])[:limit]
    else:
        print(f"Error fetching top tracks: {response.status_code} - {result}")
        return []

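from urllib.parse import quote

def get_lyrics(artist, title):
    # Percent-encode each path segment so characters such as '/', '?' or '#' in a name don't break the URL
    url = f"https://api.lyrics.ovh/v1/{quote(artist, safe='')}/{quote(title, safe='')}"
    try:
        # lyrics.ovh may be slow or unresponsive; a timeout keeps one request from stalling the whole run
        response = requests.get(url, timeout=15)
    except requests.RequestException as exc:
        print("Lyrics request failed for", artist, title, ":", exc)
        return 'Lyrics not found.'
    try:
        result = response.json()
    except json.JSONDecodeError:
        print("Error decoding lyrics response for", artist, title, ":", response.text)
        return 'Lyrics not found.'
    if response.status_code == 200 and result:
        return result.get('lyrics', 'Lyrics not found.')
    else:
        return 'Lyrics not found.'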
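# %% [markdown]
# In the run output below, lyrics.ovh returned nothing for every track shown, and many Spotify titles carry suffixes such as `(feat. ...)`, `(with ...)` or `- From "..."`. One plausible cause, not verified here, is that the lyrics service indexes the bare song title. A minimal sketch of stripping those suffixes before the lookup, so that `get_lyrics(artist_name, clean_title(track_name))` is called while the original `track_name` is still stored in the cache. `clean_title` is a hypothetical helper, and its patterns only cover the suffix styles seen in this notebook's output.

```python
import re

# Suffix styles observed in this notebook's Spotify titles (an assumption, not an exhaustive list)
_SUFFIX_PATTERNS = [
    r"\s*\((?:feat\.?|ft\.?|with|from)\b[^)]*\)",  # "(feat. X)", "(with X)", '(From "Film")'
    r"\s*-\s*from\b.*$",                           # ' - From "Film"' at the end of the title
]


def clean_title(title: str) -> str:
    """Hypothetical helper: drop featuring/soundtrack suffixes from a track title."""
    cleaned = title
    for pattern in _SUFFIX_PATTERNS:
        cleaned = re.sub(pattern, "", cleaned, flags=re.IGNORECASE)
    return cleaned.strip()


# Titles taken from the run output in this notebook
for raw in ['Aaj Ki Raat (From "Stree 2")', 'Timeless (feat Playboi Carti)',
            'Tainu Khabar Nahi - From "Munjya"', 'São Paulo (feat. Anitta)']:
    print(f"{raw!r} -> {clean_title(raw)!r}")
```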

def generate_song_description(artist, track, lyrics):
    """
    Uses Google Gemini to generate a detailed description of a song from its lyrics.
    Returns None if the model produced no usable text (e.g. the response was blocked).
    """
    prompt = f"""
    You are a music expert who analyzes song lyrics to describe a song.
    Artist Name: {artist}
    Track Name: {track}
    Lyrics:
    {lyrics}
    Give a detailed description of the song: its nature and vibe, where the song was used and where it was shot,
    the environment and places where it could be sung, whom it could be sung for, the audience it reaches,
    and how it affects listeners' minds.
    """
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(prompt)
    try:
        # `response.text` returns the generated text; it raises ValueError when the
        # response contains no text parts (for example when it was blocked)
        return response.text
    except ValueError:
        return None
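# %% [markdown]
# When lyrics are missing, `precompute_song_data` still calls the model with the placeholder `'Lyrics not found.'` as the lyrics, and the sample output at the end of this notebook shows Gemini then writing a speculative description from the title alone. A minimal sketch of a prompt builder that tells the model explicitly when lyrics are unavailable and asks it to label speculation; `build_song_prompt` and its wording are hypothetical, and its result could be passed to `model.generate_content(...)` in place of the inline f-string.

```python
from typing import Optional

MISSING_LYRICS = 'Lyrics not found.'  # placeholder returned by get_lyrics in this notebook


def build_song_prompt(artist: str, track: str, lyrics: Optional[str]) -> str:
    """Hypothetical helper: build the description prompt, flagging missing lyrics explicitly."""
    has_lyrics = bool(lyrics) and lyrics != MISSING_LYRICS
    lines = [
        "You are a music expert who describes songs.",
        f"Artist Name: {artist}",
        f"Track Name: {track}",
    ]
    if has_lyrics:
        lines += ["Lyrics:", lyrics]
    else:
        # Ask the model to label speculation instead of silently guessing
        lines.append("Lyrics are unavailable. Base the description on the artist and title only, "
                     "and state clearly which details are speculative.")
    lines.append("Describe the song's mood, likely settings and occasions, and intended audience.")
    return "\n".join(lines)
```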


def compute_embedding(text):
    return st_model.encode(text)
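# %% [markdown]
# The embeddings from `compute_embedding` are what a recommender would compare at runtime. A minimal sketch of ranking cached songs by cosine similarity to a query embedding, assuming entries shaped like the `song_info` dictionaries built in `precompute_song_data` (each with an `'embedding'` NumPy vector). `recommend` is a hypothetical name; the query vector would come from `compute_embedding(query_text)` with the same model. sentence-transformers also provides `util.cos_sim` and an `encode(..., normalize_embeddings=True)` option that would make the manual normalisation unnecessary.

```python
import numpy as np


def recommend(query_embedding, song_data, top_k=5):
    """Hypothetical helper: return the top_k (score, song) pairs by cosine similarity."""
    if not song_data:
        return []
    # astype makes a copy, so normalising in place below does not touch the cached arrays
    matrix = np.stack([song['embedding'] for song in song_data]).astype(np.float32)
    query = np.asarray(query_embedding, dtype=np.float32)
    # Cosine similarity = dot product of L2-normalised vectors (epsilon avoids division by zero)
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True) + 1e-12
    query = query / (np.linalg.norm(query) + 1e-12)
    scores = matrix @ query
    order = np.argsort(-scores)[:top_k]
    return [(float(scores[i]), song_data[i]) for i in order]
```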


In [59]:

# %% [markdown]
# ## Precompute Music Data
#
# The cell below fetches top artists and tracks from Spotify, retrieves lyrics,
# generates a detailed description for each track, computes embeddings, and stores the results.

# %%
def precompute_song_data():
    # Fetch Spotify access token
    token = get_access_token()
    if not token:
        print("Failed to get Spotify access token.")
        return []
    
    print("Fetching top artists...")
    artists = search_top_artists(token, limit=50)
    
    song_data = []  # List to hold precomputed song info
    
    for artist in artists:
        artist_name = artist['name']
        artist_id = artist['id']
        print(f"Processing artist: {artist_name}")
        
        # Get top tracks for this artist
        tracks = get_top_tracks(token, artist_id, limit=5)
        for track in tracks:
            track_name = track['name']
            print(f"  Track: {track_name}")
            
            # Retrieve lyrics
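            lyrics = get_lyrics(artist_name, track_name)
            if not lyrics or lyrics == 'Lyrics not found.':
                # Tracks without lyrics are kept: Gemini then describes them from the artist and title only.
                # Add `continue` here to skip such tracks instead.
                print("    No lyrics found; generating a description from the artist and title only.")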
            
            # Generate song description using Gemini (this call can be slow)
            description = generate_song_description(artist_name, track_name, lyrics)
            if not description:
                print("    No description generated; skipping track.")
                continue
            
            # Compute embedding for the song description
            embedding = compute_embedding(description)
            
            # Store data in a dictionary
            song_info = {
                'artist': artist_name,
                'track': track_name,
                'lyrics': lyrics[:500] + "..." if len(lyrics) > 500 else lyrics,
                'description': description,
                'embedding': embedding  # This is a NumPy array
            }
            song_data.append(song_info)
            
            # Optional: Add a small delay to respect API rate limits
            time.sleep(1)
    
    return song_data
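# %% [markdown]
# The fixed `time.sleep(1)` spaces out requests but does not recover from a failed call: an exception raised by the Gemini client (for example on a rate-limit error) propagates out of `precompute_song_data` and aborts the run. A minimal sketch of retrying a call with exponential backoff and jitter, e.g. `call_with_backoff(lambda: generate_song_description(artist_name, track_name, lyrics))`. `call_with_backoff` is a hypothetical helper, and which exceptions are worth retrying is an assumption left to the caller via `retry_on`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
                      max_delay: float = 30.0,
                      retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Hypothetical helper: call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # base_delay, 2x, 4x, ... capped at max_delay, with random jitter to spread retries out
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
    raise ValueError("max_attempts must be at least 1")
```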



In [60]:
# Precompute and cache the song data
song_data = precompute_song_data()

# Save the precomputed data to a pickle file
with open('song_data.pkl', 'wb') as f:
    pickle.dump(song_data, f)

print(f"Precomputed data for {len(song_data)} songs saved to 'song_data.pkl'.")
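# %% [markdown]
# `pickle.dump` writes straight into `song_data.pkl`, so an interrupted write can leave a truncated cache, and a failed run (for example, no Spotify token) overwrites a good cache with an empty list. A minimal sketch that writes to a temporary file in the same directory and swaps it in with `os.replace`, skipping empty results, so the save step could read `save_cache(song_data, 'song_data.pkl')`. `save_cache` and `load_cache` are hypothetical names. Since `pickle.load` can execute arbitrary code, only load cache files this notebook produced.

```python
import os
import pickle
import tempfile


def save_cache(data, path):
    """Hypothetical helper: atomically pickle `data` to `path`; refuses to overwrite with an empty result."""
    if not data:
        print(f"Nothing to save; keeping existing {path!r}.")
        return False
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so os.replace is a same-filesystem rename
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(data, f)
        os.replace(tmp_path, path)  # atomic swap: readers see the old or the new file, never a partial one
    except BaseException:
        os.remove(tmp_path)
        raise
    return True


def load_cache(path):
    """Load a cache written by save_cache; returns [] if the file does not exist yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return []
```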


Fetching top artists...
Processing artist: Sachin-Jigar
  Track: Apna Bana Le
    No lyrics found; skipping track.,still trying to get description
  Track: Aaj Ki Raat (From "Stree 2")
    No lyrics found; skipping track.,still trying to get description
  Track: Tum Se (From "Teri Baaton Mein Aisa Uljha Jiya")
    No lyrics found; skipping track.,still trying to get description
  Track: Tainu Khabar Nahi - From "Munjya"
    No lyrics found; skipping track.,still trying to get description
  Track: Aayi Nai (From "Stree 2")
    No lyrics found; skipping track.,still trying to get description
Processing artist: The Weeknd
  Track: Timeless (feat Playboi Carti)
    No lyrics found; skipping track.,still trying to get description
  Track: One Of The Girls (with JENNIE, Lily Rose Depp)
    No lyrics found; skipping track.,still trying to get description
  Track: Cry For Me
    No lyrics found; skipping track.,still trying to get description
  Track: São Paulo (feat. Anitta)
    No lyrics fou

In [61]:
for i, song in enumerate(song_data, start=1):
    print(f"Song #{i}")
    print(f"  Artist: {song['artist']}")
    print(f"  Track: {song['track']}")
    print(f"  Description: {song['description']}")
    print(f"  Embedding shape: {song['embedding'].shape}")
    print()

Song #1
  Artist: Sachin-Jigar
  Track: Apna Bana Le
  Description: Since the lyrics for "Apna Bana Le" by Sachin-Jigar are unavailable, I can only offer a speculative description based on the title and the composers' known style.  Sachin-Jigar are known for creating upbeat, catchy, and often romantic Bollywood music.  Therefore, I will construct a likely scenario:

**Nature of the Song:**  "Apna Bana Le" (Make Me Yours) suggests a romantic song, likely a playful and slightly assertive declaration of love. The vibe is probably upbeat and energetic, possibly with a mix of traditional and modern Indian musical elements. It might incorporate a blend of pop, Indi-pop, or even a touch of folk depending on the specific film or project.

**Vibe:**  Energetic, romantic, playful, slightly flirtatious, hopeful.  Think vibrant colors, sunshine, and a feeling of youthful exuberance.

**Place Where the Song Was Used/Shot:**  The song would likely be placed within a Bollywood film or a similar India