# **Welcome to the Spotify Project!**

After reading the Gnod Project Introduction, you are now ready to develop the product that Jane, the CTO of Gnod, asked you to create. You will present it on Friday, where it will be live-tested to assess how good your song recommendations are!

---

## Instructions

### DAY 1:
**Objective:**
- Install necessary libraries and create a Spotify developer account (if not done).
- Create a DataFrame with the top 100 songs and respective artists by scraping the website [Billboard Hot 100](https://www.billboard.com/charts/hot-100/).
  - The DataFrame should have 2 columns: `song_title`, `artist`.

- Create a Python program that takes user input and checks whether it is present in your scraped DataFrame.
  - If it is, it recommends a random song from the DataFrame.
  - If it’s not, it prints: `Sorry, your song is not popular`.
  - *Important note*: At this stage, do not use the audio features DataFrame; it is built on Day 2 and used from Day 3 onward.
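
The Day 1 logic can be sketched as follows. This is a minimal sketch, assuming a small hand-made DataFrame in place of the scraped chart; `recommend_from_chart` and its `seed` parameter are hypothetical names used only for illustration. Excluding the song the user typed from the random pick is a design choice of this sketch, not a requirement of the brief.

```python
import pandas as pd

# Toy stand-in for the scraped chart (the real DataFrame comes from Billboard).
chart = pd.DataFrame({
    "song_title": ["A Bar Song (Tipsy)", "Espresso", "Lose Control"],
    "artist": ["Shaboozey", "Sabrina Carpenter", "Teddy Swims"],
})


def recommend_from_chart(user_song, chart, seed=None):
    """Return a recommendation message for user_song (hypothetical helper)."""
    wanted = user_song.strip().lower()
    titles = chart["song_title"].str.lower()
    if not titles.eq(wanted).any():
        return "Sorry, your song is not popular"
    # Prefer a song other than the one the user typed, when one exists.
    others = chart[titles != wanted]
    pool = others if not others.empty else chart
    pick = pool.sample(n=1, random_state=seed).iloc[0]
    return f"{pick['song_title']} by {pick['artist']}"
```

Passing a fixed `seed` makes the pick reproducible, which helps when testing; leave it as `None` for a genuinely random recommendation.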


### DAY 2:
**Objective:**
- Using the `Spotipy` library, create a DataFrame storing the `audio_features` of at least 1000 songs.
  - [**Audio features explanation**](https://developer.spotify.com/documentation/web-api/reference/get-audio-features)
  - The more diverse your playlist, the better the end result will be!

### DAY 3:
**Objective:**
- Using one of the unsupervised learning algorithms that we have covered, build a model from the audio features DataFrame you created on Day 2.
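
One possible shape for the Day 3 model is sketched below. It assumes K-Means from scikit-learn as the algorithm (the brief does not say which one to use), a tiny made-up sample instead of the real audio features, and an arbitrary `n_clusters=2`; all of these are illustrative choices.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Tiny made-up sample; the real input is the audio features DataFrame.
features = pd.DataFrame({
    "danceability": [0.62, 0.69, 0.35, 0.30, 0.80, 0.78],
    "energy":       [0.84, 0.52, 0.30, 0.25, 0.90, 0.88],
    "acousticness": [0.01, 0.06, 0.93, 0.95, 0.02, 0.03],
    "tempo":        [122.0, 116.7, 115.3, 80.0, 128.0, 126.0],
})

# Scale first: tempo is on a much larger scale than the 0-1 features,
# and K-Means relies on Euclidean distance.
scaler = StandardScaler()
scaled = scaler.fit_transform(features)

# n_clusters=2 is an arbitrary choice for this toy sample.
model = KMeans(n_clusters=2, n_init=10, random_state=42)
features["cluster"] = model.fit_predict(scaled)
```

On real data the number of clusters would need to be chosen deliberately, for example by comparing inertia or silhouette scores across several values.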

### DAY 4:
**Objective:**
- Finalize the project: Your final program should check if a song is present in your scraped `billboard_hot100` DataFrame.
  - If it is, it should recommend a random song from that DataFrame.
  - If not, it should recommend a song based on musical similarity.
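
The two branches of the final program can be sketched like this. It is a minimal sketch, assuming the catalogue already carries a `cluster` label per song; the data, the column names `song` and `cluster`, and the helper `recommend` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: a chart and an already clustered catalogue.
hot = pd.DataFrame({"song_title": ["Espresso", "Lose Control"]})
catalogue = pd.DataFrame({
    "song": ["In the End", "Zombie", "In Da Club", "Numb"],
    "cluster": [0, 0, 1, 0],
})


def recommend(song, hot, catalogue, seed=None):
    """Return (source, title), or None when no recommendation is possible."""
    key = song.strip().lower()

    # Branch 1: the song is on the chart, so pick a random chart song.
    if hot["song_title"].str.lower().eq(key).any():
        pick = hot.sample(n=1, random_state=seed).iloc[0]
        return ("hot", pick["song_title"])

    # Branch 2: look the song up in the clustered catalogue.
    match = catalogue[catalogue["song"].str.lower() == key]
    if match.empty:
        return None
    cluster = match.iloc[0]["cluster"]
    same = catalogue[(catalogue["cluster"] == cluster)
                     & (catalogue["song"].str.lower() != key)]
    if same.empty:
        return None
    pick = same.sample(n=1, random_state=seed).iloc[0]
    return ("similar", pick["song"])
```

In the real project, a song missing from the catalogue would presumably need its audio features fetched and passed through the trained model to obtain a cluster; this sketch simply returns `None` in that case.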

---

## **Libraries Import & Settings**

In [201]:
import random

import pandas as pd
import requests
import spotipy
from bs4 import BeautifulSoup
from fuzzywuzzy import process
from spotipy.oauth2 import SpotifyClientCredentials
from textblob import TextBlob

# Local module holding the Spotify client_id & client_secret
import config1

In [203]:
pd.set_option('display.max_columns', None)
# pd.reset_option('display.max_rows')

## **Billboard Hot 100 hits**

In [134]:
url = "https://www.billboard.com/charts/hot-100/"
response = requests.get(url, timeout=10)

# Stop here on an HTTP error; otherwise html_content would be undefined below
response.raise_for_status()
print("Page fetched successfully!")
html_content = response.content

soup = BeautifulSoup(html_content, "html.parser")

# Scrape song titles
songs = [song.get_text(strip=True) for song in soup.select("li.o-chart-results-list__item h3")]

# Scrape artist names
artists = [artist.get_text(strip=True) for artist in soup.select("li.o-chart-results-list__item h3 + span")]

# Derive ranks from list order (not stored: the DataFrame keeps only the two required columns)
ranks = range(1, len(songs) + 1)

# Combine data into a DataFrame
billboard_hot_100 = pd.DataFrame({
    "song_title": songs,
    "artist": artists
})

billboard_hot_100

Page fetched successfully!


| | song_title | artist |
|---|---|---|
| 0 | A Bar Song (Tipsy) | Shaboozey |
| 1 | Die With A Smile | Lady Gaga & Bruno Mars |
| 2 | Birds Of A Feather | Billie Eilish |
| 3 | Espresso | Sabrina Carpenter |
| 4 | Lose Control | Teddy Swims |
| ... | ... | ... |
| 95 | Hollon | GloRilla |
| 96 | Lonely Road | mgk & Jelly Roll |
| 97 | Change Me | BigXthaPlug |
| 98 | Him All Along | Gunna |
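
Because titles and artists are scraped as two separate lists, a silent mismatch in length would misalign the chart (and `pd.DataFrame` raises a `ValueError` when its columns differ in length). The check below is a minimal sketch of guarding against that; `pair_chart_entries` is a hypothetical helper, and the two sample entries are taken from the output above.

```python
def pair_chart_entries(songs, artists):
    """Pair titles with artists, failing loudly if the scrape is misaligned."""
    if len(songs) != len(artists):
        raise ValueError(
            f"Scraped {len(songs)} titles but {len(artists)} artists; "
            "the page layout or the selectors may have changed."
        )
    # Rank is simply the 1-based position in the scraped order.
    return [
        {"rank": rank, "song_title": title, "artist": artist}
        for rank, (title, artist) in enumerate(zip(songs, artists), start=1)
    ]


rows = pair_chart_entries(
    ["A Bar Song (Tipsy)", "Die With A Smile"],
    ["Shaboozey", "Lady Gaga & Bruno Mars"],
)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(rows)` if the rank column is wanted.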


In [135]:
# Absolute path to the project's csv_files folder (adjust for your machine)
file_path = '/Users/mbouch17/Desktop/IronHack/Labs &  Project/spotify-song-recommendation/csv_files/billboard_hot_100.csv'

# Save to CSV
billboard_hot_100.to_csv(file_path, index=False)

## **iTunes DE Top 100 songs**

In [137]:
# URL of the PopVortex Germany Top Songs page
url2 = "https://www.popvortex.com/music/germany/top-songs.php"

# Fetch the page content
response = requests.get(url2)
response.raise_for_status()  # Ensure the request was successful

# Parse the HTML content using BeautifulSoup
soup2 = BeautifulSoup(response.text, "html.parser")

# Locate song titles and artists based on the updated structure
titles = [title.text.strip() for title in soup2.select("cite.title")]
artists = [artist.text.strip() for artist in soup2.select("em.artist")]

# Create a DataFrame
itunes_de_100 = pd.DataFrame({
    "song_title": titles,
    "artist": artists
})

# Save to CSV
itunes_de_100.to_csv("/Users/mbouch17/Desktop/IronHack/Labs &  Project/spotify-song-recommendation/csv_files/iTunes_DE_100.csv", index=False)
itunes_de_100

| | song_title | artist |
|---|---|---|
| 0 | Bad Dreams | Teddy Swims |
| 1 | The Emptiness Machine | LINKIN PARK |
| 2 | BIRDS OF A FEATHER | Billie Eilish |
| 3 | Now Or Never | Pitbull & Bon Jovi |
| 4 | APT. | ROSÉ & Bruno Mars |
| ... | ... | ... |
| 95 | In Da Club | 50 Cent |
| 96 | Na Le (Phaxe Remix) | Omiki |
| 97 | In the End | LINKIN PARK |
| 98 | Perfect Soul | Spiritbox |


## **Merge WW & DE dataframe | 'ww_de'**

In [139]:
# Add origin columns to each DataFrame
billboard_hot_100["origin"] = "WW"
itunes_de_100["origin"] = "DE"

# Normalize song titles for case-insensitive matching
billboard_hot_100["song_title_normalized"] = billboard_hot_100["song_title"].str.lower()
itunes_de_100["song_title_normalized"] = itunes_de_100["song_title"].str.lower()

# Merge the two DataFrames on normalized song titles
# (note: songs that share a title but have different artists are treated as one song)
merged_df = pd.merge(
    billboard_hot_100,
    itunes_de_100,
    on="song_title_normalized",
    suffixes=('_billboard', '_itunes'),
    how="outer"
)

# Determine the origin column
def determine_origin(row):
    if pd.notnull(row["origin_billboard"]) and pd.notnull(row["origin_itunes"]):
        return "WW_DE"
    elif pd.notnull(row["origin_billboard"]):
        return "WW"
    elif pd.notnull(row["origin_itunes"]):
        return "DE"
    return None

merged_df["origin"] = merged_df.apply(determine_origin, axis=1)

# Create the final DataFrame with only the required columns
# (note: str.title() re-cases the lowercased titles, e.g. "2am" becomes "2Am")
ww_de = pd.DataFrame({
    "song_title": merged_df["song_title_normalized"].str.title(),
    "artist": merged_df["artist_billboard"].combine_first(merged_df["artist_itunes"]),
    "origin": merged_df["origin"]
})

ww_de

| | song_title | artist | origin |
|---|---|---|---|
| 0 | 231 - Und Der Dreiäugige Schakal (Inhaltsangabe) | Die drei ??? | DE |
| 1 | 25 | Rod Wave | WW |
| 2 | 28 | Zach Bryan | WW |
| 3 | 2Am | BigXthaPlug | WW |
| 4 | A Bar Song (Tipsy) | Shaboozey | WW_DE |
| ... | ... | ... | ... |
| 183 | World Gone Wild (Feat. Sam Martin) | Robin Schulz & CYRIL | DE |
| 184 | Wunder | AYLIVA & Apache 207 | DE |
| 185 | You Look Like You Love Me | Ella Langley Featuring Riley Green | WW |
| 186 | Zombie | The Cranberries | DE |
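
An alternative way to derive the origin label is pandas' `indicator=True` option, which records where each merged row came from and removes the need for a row-wise function. The sketch below uses made-up two-row frames and hypothetical names (`ww`, `de`, `key`, `origin_labels`); it also keeps the original title casing instead of re-casing with `str.title()`.

```python
import pandas as pd

# Made-up miniature versions of the two charts.
ww = pd.DataFrame({
    "song_title": ["Birds Of A Feather", "2AM"],
    "artist": ["Billie Eilish", "BigXthaPlug"],
})
de = pd.DataFrame({
    "song_title": ["BIRDS OF A FEATHER", "Wunder"],
    "artist": ["Billie Eilish", "AYLIVA & Apache 207"],
})

# Case-insensitive join key, kept separate from the displayed title.
for frame in (ww, de):
    frame["key"] = frame["song_title"].str.lower()

merged = pd.merge(ww, de, on="key", how="outer",
                  suffixes=("_ww", "_de"), indicator=True)

# The _merge column holds "left_only", "right_only" or "both".
origin_labels = {"both": "WW_DE", "left_only": "WW", "right_only": "DE"}
combined = pd.DataFrame({
    "song_title": merged["song_title_ww"].combine_first(merged["song_title_de"]),
    "artist": merged["artist_ww"].combine_first(merged["artist_de"]),
    "origin": merged["_merge"].astype(str).map(origin_labels),
})
```

Joining on title and artist together would further avoid merging unrelated songs that happen to share a title, at the cost of missing matches when the two sites spell the artist differently.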


In [140]:
# Absolute path to the project's csv_files folder (adjust for your machine)
file_path = '/Users/mbouch17/Desktop/IronHack/Labs &  Project/spotify-song-recommendation/csv_files/ww_de.csv'

# Save to CSV
ww_de.to_csv(file_path, index=False)

## **User Input**

In [142]:
# Function to recommend a song
def recommend_song(user_input):
    # Normalize input: lowercase and strip surrounding whitespace
    user_input = user_input.strip().lower()

    # Use spelling correction to handle minor typos
    # (note: this can also alter correctly spelled titles and artist names)
    user_input = str(TextBlob(user_input).correct())

    # Use fuzzywuzzy to find the closest song title and the closest artist
    titles = ww_de['song_title'].tolist()
    artists = ww_de['artist'].tolist()
    matched_title, score_title = process.extractOne(user_input, titles)
    matched_artist, score_artist = process.extractOne(user_input, artists)

    # Reject the input unless one of the two scores reaches the threshold (80 out of 100)
    if max(score_title, score_artist) < 80:
        print("Sorry, no popular match found for your song or artist.")
        return True  # Continue processing

    # Keep only the better of the two matches, so that a weak match on the
    # other column cannot be displayed by mistake
    if score_title >= score_artist:
        matched_song_info = ww_de[ww_de['song_title'] == matched_title]
    else:
        matched_song_info = ww_de[ww_de['artist'] == matched_artist]

    match = matched_song_info.iloc[0]
    print(f"\nFound a match: {match['song_title']} by {match['artist']}")

    # Ask for confirmation before recommending a random song
    while True:
        confirm = input("Would you like a random song recommendation from the list? (yes/no): ").strip().lower()

        if confirm == 'yes':
            # Recommend a random song
            random_song = ww_de.sample(n=1).iloc[0]
            print(f"How about this one: {random_song['song_title']} by {random_song['artist']}")
            return True  # Continue to menu
        elif confirm == 'no':
            print("Okay, no recommendations at the moment.")
            return False  # Indicate to stop
        else:
            print("Invalid input. Please enter 'yes' or 'no'.")

# Main program
def main():
    while True:
        user_input = input("Enter a song title or artist: ").strip()
        
        if user_input:
            continue_recommendation = recommend_song(user_input)
            if not continue_recommendation:
                break  # Exit the main loop if user says 'no'

            while True:
                # Ask user for the next action
                choice = input("\nWould you like to:\n(1) Get another recommendation based on the same artist/song\n(2) Type a new artist/song\n(3) Exit\nEnter 1, 2, or 3: ").strip()

                if choice == '1':
                    # Get another recommendation and return to this menu
                    continue_recommendation = recommend_song(user_input)
                    if not continue_recommendation:
                        return  # Exit the entire program
                elif choice == '2':
                    # Ask for a new artist/song
                    break
                elif choice == '3':
                    print("Thank you for using the song recommendation system!")
                    return  # Exit the entire program
                else:
                    print("Invalid input. Please enter 1, 2, or 3.")
        else:
            print("Please enter a valid song title or artist.")

# Start the program
if __name__ == "__main__":
    main()


Enter a song title or artist:  25

Found a match: 25 by Rod Wave
Would you like a random song recommendation from the list? (yes/no):  no
Okay, no recommendations at the moment.
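
The thresholded matching used above can be illustrated without third-party packages. The sketch below swaps in `difflib.SequenceMatcher` from the standard library purely for illustration; its scores run from 0 to 1 and are not the same as fuzzywuzzy's 0 to 100 scores, and `best_match` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def best_match(query, candidates, threshold=0.8):
    """Return (candidate, score) for the closest candidate, or None below threshold."""
    query = query.strip().lower()
    scored = [
        (candidate, SequenceMatcher(None, query, candidate.lower()).ratio())
        for candidate in candidates
    ]
    if not scored:
        return None
    candidate, score = max(scored, key=lambda pair: pair[1])
    return (candidate, score) if score >= threshold else None


titles = ["Zombie", "Wunder", "In the End"]
```

Here `best_match("zombi", titles)` resolves to "Zombie", while an unrelated string such as "xyz" falls below the threshold and yields `None`.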


## **Spotify**

### Connect API

In [None]:
# client_id = '<your-client-id>'
# client_secret = '<your-client-secret>'

In [159]:
# Initialize Spotipy with the app credentials stored in config1
# (assumes config1 defines `client_id` and `client_secret`)
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=config1.client_id,
                                                           client_secret=config1.client_secret))



# Quick test query to confirm that the connection works
results = sp.search(q="daddy cool", limit=5, market="GB")
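
Another option for keeping credentials out of the notebook is to read them from environment variables. The sketch below assumes the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET`, which Spotipy also looks up itself when no arguments are passed to `SpotifyClientCredentials`; `load_spotify_credentials` is a hypothetical helper.

```python
import os


def load_spotify_credentials(env=os.environ):
    """Read the client ID and secret from environment variables."""
    try:
        return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        # Name the missing variable without echoing any secret value.
        raise RuntimeError(
            f"Missing environment variable: {missing.args[0]}"
        ) from None
```

Failing early with a clear message is easier to diagnose than an `invalid_client` error returned by the API later on.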

### Fetch [**audio features**](https://developer.spotify.com/documentation/web-api/reference/get-audio-features) of 1000 songs

In [207]:
# Function to fetch track IDs based on search queries
# def get_track_ids(query, limit=50, offset=0):
#     results = sp.search(q=query, type='track', limit=limit, offset=offset)
#     return [track['id'] for track in results['tracks']['items']]

# Function to fetch audio features for a list of track IDs
# def get_audio_features(track_ids):
#     features = sp.audio_features(track_ids)
#     return [f for f in features if f]  # Filter out None results

# Generate a diverse list of queries
# queries = ['love', 'dance', 'happy', 'rock', 'pop', 'jazz', 'indie', 'classical', 
#          'hip hop', 'chill', 'blues', 'reggae', 'party', 'electronic', 'latin']

# Fetch and process data
# all_features = []
# seen_track_ids = set()  # To ensure unique tracks

# for query in queries:
#     for offset in range(0, 1000, 50):  # Iterate with offsets to get more tracks
#         track_ids = get_track_ids(query, limit=50, offset=offset)
#         unique_track_ids = [t for t in track_ids if t not in seen_track_ids]  # Filter unique IDs
#         seen_track_ids.update(unique_track_ids)
#
#         features = get_audio_features(unique_track_ids)  # Fetch audio features
#         all_features.extend(features)  # Append to the master list
#
#         if len(all_features) >= 1000:  # Stop once we have 1000+ songs
#             break
#
#     if len(all_features) >= 1000:  # Stop outer loop if we hit the target
#         break

# Create a DataFrame from the features
# df_audio_features= pd.DataFrame(all_features)

# Save to CSV
# df_audio_features.to_csv("/Users/mbouch17/Desktop/IronHack/Labs &  Project/spotify-song-recommendation/csv_files/df_audio_features.csv", index=False)

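# Load the features saved by the fetch above (now commented out)
df_audio_features = pd.read_csv("/Users/mbouch17/Desktop/IronHack/Labs &  Project/spotify-song-recommendation/csv_files/df_audio_features.csv")

df_audio_features.head()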


| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.619 | 0.843 | 5 | -5.348 | 0 | 0.0284 | 0.00948 | 0.000118 | 0.164 | 0.746 | 122.064 | audio_features | 7hR22TOX3RorxJPcsz5Wbo | spotify:track:7hR22TOX3RorxJPcsz5Wbo | https://api.spotify.com/v1/tracks/7hR22TOX3Ror... | https://api.spotify.com/v1/audio-analysis/7hR2... | 204829 | 4 |
| 1 | 0.688 | 0.518 | 5 | -4.285 | 1 | 0.0283 | 0.0642 | 0.0 | 0.1 | 0.314 | 116.714 | audio_features | 0W4NhJhcqKCqEP2GIpDCDq | spotify:track:0W4NhJhcqKCqEP2GIpDCDq | https://api.spotify.com/v1/tracks/0W4NhJhcqKCq... | https://api.spotify.com/v1/audio-analysis/0W4N... | 255333 | 4 |
| 2 | 0.559 | 0.871 | 5 | -5.338 | 0 | 0.0397 | 8e-06 | 0.00545 | 0.096 | 0.572 | 104.97 | audio_features | 6dBUzqjtbnIa1TwYbyw5CM | spotify:track:6dBUzqjtbnIa1TwYbyw5CM | https://api.spotify.com/v1/tracks/6dBUzqjtbnIa... | https://api.spotify.com/v1/audio-analysis/6dBU... | 213920 | 4 |
| 3 | 0.67 | 0.634 | 11 | -6.471 | 1 | 0.0326 | 0.0124 | 0.0 | 0.0946 | 0.497 | 124.926 | audio_features | 2XHzzp1j4IfTNp1FTn7YFg | spotify:track:2XHzzp1j4IfTNp1FTn7YFg | https://api.spotify.com/v1/tracks/2XHzzp1j4IfT... | https://api.spotify.com/v1/audio-analysis/2XHz... | 255053 | 4 |
| 4 | 0.351 | 0.296 | 4 | -10.109 | 0 | 0.0333 | 0.934 | 0.0 | 0.095 | 0.12 | 115.284 | audio_features | 0u2P5u6lvoDfwTYjAADbn4 | spotify:track:0u2P5u6lvoDfwTYjAADbn4 | https://api.spotify.com/v1/tracks/0u2P5u6lvoDf... | https://api.spotify.com/v1/audio-analysis/0u2P... | 200186 | 4 |


In [216]:
# Function to fetch artist and song details
def get_artist_and_song(track_ids):
    track_details = sp.tracks(track_ids)['tracks']
    data = []
    for track in track_details:
        if track:  # Ensure the track is valid
            song_name = track['name']
            artist_name = ", ".join([artist['name'] for artist in track['artists']])  # Join multiple artists
            data.append({'id': track['id'], 'song': song_name, 'artist': artist_name})
    return pd.DataFrame(data)

# Split the track IDs into batches of 50 (Spotify API limit for batch queries)
batch_size = 50
track_id_batches = [df_audio_features['id'].iloc[i:i + batch_size].tolist() 
                    for i in range(0, len(df_audio_features), batch_size)]

# Fetch artist and song details for all batches
artist_song_data = pd.concat([get_artist_and_song(batch) for batch in track_id_batches], ignore_index=True)

# Merge the artist and song data back into the main DataFrame
df_audio_features = df_audio_features.merge(artist_song_data, on='id', how='left')

# Display the updated DataFrame
print(df_audio_features.head())

# Save the updated DataFrame to CSV
df_audio_features.to_csv("/Users/mbouch17/Desktop/IronHack/Labs &  Project/spotify-song-recommendation/csv_files/df_audio_features_with_artist_song.csv", index=False)


SpotifyOauthError: error: invalid_client, error_description: Invalid client
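
The batching step in the cell above can be isolated and checked without calling the API. The sketch below is a minimal version of that idea; `chunked` is a hypothetical helper and the IDs are placeholders, not real tracks.

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


track_ids = [f"id{n}" for n in range(120)]  # placeholder IDs, not real tracks
batches = list(chunked(track_ids, 50))
```

With 120 placeholder IDs and a batch size of 50, this produces batches of 50, 50 and 20 items, with every ID appearing exactly once and in its original order.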

In [189]:
df_audio_features.shape

(1000, 18)

In [211]:
# Check for duplicate IDs
duplicates = df_audio_features[df_audio_features['id'].duplicated()]

# Display duplicate rows
print(duplicates)

Empty DataFrame
Columns: [danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo, type, id, uri, track_href, analysis_url, duration_ms, time_signature]
Index: []
