# Welcome to the Spotify Project!

After reading the Gnod Project Introduction, you are now ready to develop the product that Jane, the CTO of Gnod, asked you to create. You will present it on Friday, where it will be live-tested to assess how good your song recommendations are!

---

## Instructions

### DAY 1:
**Objective:**
- Install necessary libraries and create a Spotify developer account (if not done).
- Create a DataFrame with the top 100 songs and respective artists by scraping the website [Billboard Hot 100](https://www.billboard.com/charts/hot-100/).
  - The DataFrame should have 2 columns: `song_title`, `artist`.

- Create a Python program that takes user input and checks whether it is present in your scraped DataFrame.
  - If it is, it recommends a random song from the DataFrame.
  - If it’s not, it prints: `Sorry, your song is not popular`.
  - *Important note*: For now, you will not use the audio features DataFrame (it is built on Day 2).
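
The Day 1 behaviour can be sketched in a few lines. This is only a minimal illustration, not the final program: the three-row `hot100` DataFrame stands in for the scraped chart, and the function name `day1_recommend` and its `seed` parameter are assumptions made for the example.

```python
import pandas as pd

# Hypothetical stand-in for the scraped chart (the real one has 100 rows)
hot100 = pd.DataFrame({
    "song_title": ["Espresso", "Lose Control", "Die With A Smile"],
    "artist": ["Sabrina Carpenter", "Teddy Swims", "Lady Gaga & Bruno Mars"],
})

def day1_recommend(title, chart, seed=None):
    """Return a random chart song if `title` is on the chart, else the fallback message."""
    wanted = title.strip().lower()
    # Case-insensitive exact match against the song_title column
    is_hit = chart["song_title"].str.lower().eq(wanted).any()
    if not is_hit:
        return "Sorry, your song is not popular"
    pick = chart.sample(n=1, random_state=seed).iloc[0]
    return f"{pick['song_title']} by {pick['artist']}"

print(day1_recommend("espresso", hot100))
print(day1_recommend("Unknown Song", hot100))
```

Passing a fixed `seed` makes the random pick reproducible, which is handy while testing.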


### DAY 2:
**Objective:**
- Using the `Spotipy` library, create a DataFrame storing the `audio_features` of at least 1000 songs.
  - The more diverse your playlist, the better the end result will be!
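
One way to collect features for many tracks is to request them in batches. The sketch below assumes `sp` is an already authenticated `spotipy.Spotify` client and uses its `audio_features` method. The helper name `fetch_audio_features` and the default batch size of 100 (the per-request limit, to the best of my knowledge) are assumptions for illustration.

```python
import pandas as pd

def fetch_audio_features(sp, track_ids, batch_size=100):
    """Collect audio features for `track_ids` in batches and return a DataFrame.

    `sp` is assumed to be an authenticated spotipy.Spotify client.
    """
    rows = []
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        features = sp.audio_features(batch) or []
        # Tracks without available features come back as None; skip them
        rows.extend(f for f in features if f is not None)
    return pd.DataFrame(rows)

# Possible usage (credentials are placeholders, not real values):
# import spotipy
# from spotipy.oauth2 import SpotifyClientCredentials
# sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
#     client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET"))
# audio_df = fetch_audio_features(sp, list_of_track_ids)
```

Keeping the client as a parameter makes the helper easy to test with a stub object before spending real API calls.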

### DAY 3:
**Objective:**
- Using one of the unsupervised learning algorithms that we have covered, build a model on the audio features DataFrame you created on Day 2.
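
As one possible reading of this step, the sketch below clusters songs with scikit-learn's `KMeans`. The choice of K-Means, the number of clusters, and the toy values are assumptions; the column names follow Spotify's audio feature fields. Features are standardized first because `tempo` lives on a much larger scale than the 0 to 1 features.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; the real DataFrame would hold 1000+ songs
features = pd.DataFrame({
    "danceability": [0.90, 0.85, 0.88, 0.20, 0.25, 0.22],
    "energy":       [0.80, 0.82, 0.79, 0.30, 0.28, 0.35],
    "tempo":        [128.0, 124.0, 126.0, 70.0, 72.0, 68.0],
})

# Put all features on a comparable scale before measuring distances
scaler = StandardScaler()
scaled = scaler.fit_transform(features)

# n_clusters=2 is arbitrary here; a real project would tune it
model = KMeans(n_clusters=2, n_init=10, random_state=42)
features["cluster"] = model.fit_predict(scaled)

print(features)
```

Keep the fitted `scaler` and `model` around: a new song must go through the same scaler before `model.predict` can assign it to a cluster.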

### DAY 4:
**Objective:**
- Finalize the project: Your final program should check if a song is present in your scraped `billboard_hot100` DataFrame.
  - If it is, it should recommend a random song from that DataFrame.
  - If not, it should recommend a song based on musical similarity.
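
The two branches above could be wired together roughly as follows. This is a sketch under stated assumptions: `catalog` is a DataFrame that already carries a `cluster` column, and `cluster_of` is a hypothetical callable that maps a title to a cluster label (for example via a Spotify lookup plus the fitted model) or returns `None` when the song cannot be found. The cluster labels in the toy data are invented.

```python
import pandas as pd

def final_recommend(title, chart, catalog, cluster_of, seed=None):
    """Return (song_title, artist) for a recommendation, or None if nothing fits."""
    wanted = title.strip().lower()
    if chart["song_title"].str.lower().eq(wanted).any():
        # Hot song: recommend a random song from the chart
        pool = chart
    else:
        # Otherwise: recommend a song from the same cluster
        cluster = cluster_of(title)
        if cluster is None:
            return None
        pool = catalog[catalog["cluster"] == cluster]
        if pool.empty:
            return None
    pick = pool.sample(n=1, random_state=seed).iloc[0]
    return pick["song_title"], pick["artist"]

# Toy data with made-up cluster labels
chart = pd.DataFrame({"song_title": ["Espresso"], "artist": ["Sabrina Carpenter"]})
catalog = pd.DataFrame({
    "song_title": ["Zombie", "Like a Prayer", "Follow Me"],
    "artist": ["The Cranberries", "Madonna", "Amanda Lear"],
    "cluster": [0, 1, 1],
})

def toy_cluster_of(title):
    # Hypothetical lookup standing in for "fetch features, then model.predict"
    return {"some rock song": 0}.get(title.strip().lower())

print(final_recommend("Espresso", chart, catalog, toy_cluster_of))
print(final_recommend("Some Rock Song", chart, catalog, toy_cluster_of))
```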

---

## Libraries Import & Settings

In [36]:
import random

import pandas as pd
import requests
from bs4 import BeautifulSoup
from fuzzywuzzy import process
from textblob import TextBlob

In [37]:
# pd.set_option('display.max_rows', None)
pd.reset_option('display.max_rows')

## Billboard Hot 100 hits

In [39]:
url = "https://www.billboard.com/charts/hot-100/"
response = requests.get(url)

# Stop on an HTTP error instead of continuing with an undefined page
response.raise_for_status()
print("Page fetched successfully!")
html_content = response.content

soup = BeautifulSoup(html_content, "html.parser")

# Scrape song titles
songs = [song.get_text(strip=True) for song in soup.select("li.o-chart-results-list__item h3")]

# Scrape artist names
artists = [artist.get_text(strip=True) for artist in soup.select("li.o-chart-results-list__item h3 + span")]

# Ranks follow chart order (1-based); they are not stored in the DataFrame below
ranks = range(1, len(songs) + 1)

# Combine data into a DataFrame
billboard_hot_100 = pd.DataFrame({
    "song_title": songs,
    "artist": artists
})

billboard_hot_100

Page fetched successfully!


| | song_title | artist |
|---|---|---|
| 0 | A Bar Song (Tipsy) | Shaboozey |
| 1 | Die With A Smile | Lady Gaga & Bruno Mars |
| 2 | Birds Of A Feather | Billie Eilish |
| 3 | Espresso | Sabrina Carpenter |
| 4 | Lose Control | Teddy Swims |
| ... | ... | ... |
| 95 | Hollon | GloRilla |
| 96 | Lonely Road | mgk & Jelly Roll |
| 97 | Change Me | BigXthaPlug |
| 98 | Him All Along | Gunna |
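
Selecting titles and artists in two separate passes only works if both lists end up the same length. A more defensive variant, sketched below, walks each chart row and reads both fields from the same element. The selectors are the ones used above, while `sample_html` is a simplified, hypothetical stand-in for the real page.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Simplified, hypothetical markup; the live page is far more complex
sample_html = """
<ul>
  <li class="o-chart-results-list__item"><h3> Espresso </h3><span>Sabrina Carpenter</span></li>
  <li class="o-chart-results-list__item"><span>1</span></li>
  <li class="o-chart-results-list__item"><h3>Lose Control</h3><span>Teddy Swims</span></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

rows = []
for item in soup.select("li.o-chart-results-list__item"):
    title = item.select_one("h3")
    artist = item.select_one("h3 + span")
    # Skip list items that do not hold a complete title/artist pair
    if title is None or artist is None:
        continue
    rows.append({
        "song_title": title.get_text(strip=True),
        "artist": artist.get_text(strip=True),
    })

chart = pd.DataFrame(rows, columns=["song_title", "artist"])
print(chart)
```

Because each row is built from a single list item, a missing artist can never shift the remaining artists onto the wrong songs.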


In [40]:
# Absolute path to the output CSV (adjust to your machine)
file_path = '/Users/mbouch17/Desktop/IronHack/Labs &  Project/spotify-song-recommendation/csv_files/billboard_hot_100.csv'

# Save to CSV
billboard_hot_100.to_csv(file_path, index=False)

## iTunes DE Top 100 songs

In [42]:
# URL of the PopVortex Germany Top Songs page
url2 = "https://www.popvortex.com/music/germany/top-songs.php"

# Fetch the page content
response = requests.get(url2)
response.raise_for_status()  # Ensure the request was successful

# Parse the HTML content using BeautifulSoup
soup2 = BeautifulSoup(response.text, "html.parser")

# Locate song titles and artists based on the updated structure
titles = [title.text.strip() for title in soup2.select("cite.title")]
artists = [artist.text.strip() for artist in soup2.select("em.artist")]

# Create a DataFrame
itunes_de_100 = pd.DataFrame({
    "song_title": titles,
    "artist": artists
})

# Save to CSV
itunes_de_100.to_csv("/Users/mbouch17/Desktop/IronHack/Labs &  Project/spotify-song-recommendation/csv_files/iTunes_DE_100.csv", index=False)
itunes_de_100

| | song_title | artist |
|---|---|---|
| 0 | The Emptiness Machine | LINKIN PARK |
| 1 | Bad Dreams | Teddy Swims |
| 2 | BIRDS OF A FEATHER | Billie Eilish |
| 3 | APT. | ROSÉ & Bruno Mars |
| 4 | Now Or Never | Pitbull & Bon Jovi |
| ... | ... | ... |
| 95 | Cut the Bridge | LINKIN PARK |
| 96 | Follow Me | Amanda Lear |
| 97 | Like a Prayer | Madonna |
| 98 | Zombie | The Cranberries |


## Merge WW & DE DataFrames | `ww_de`

In [44]:
# Add origin columns to each DataFrame
billboard_hot_100["origin"] = "WW"
itunes_de_100["origin"] = "DE"

# Normalize song titles for case-insensitive matching
billboard_hot_100["song_title_normalized"] = billboard_hot_100["song_title"].str.lower()
itunes_de_100["song_title_normalized"] = itunes_de_100["song_title"].str.lower()

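# Merge the two DataFrames on normalized song titles
# (outer join keeps songs that appear in only one chart; note that matching
# on title alone also merges different songs that happen to share a title)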
merged_df = pd.merge(
    billboard_hot_100,
    itunes_de_100,
    on="song_title_normalized",
    suffixes=('_billboard', '_itunes'),
    how="outer"
)

# Determine the origin column
def determine_origin(row):
    if pd.notnull(row["origin_billboard"]) and pd.notnull(row["origin_itunes"]):
        return "WW_DE"
    elif pd.notnull(row["origin_billboard"]):
        return "WW"
    elif pd.notnull(row["origin_itunes"]):
        return "DE"
    return None

merged_df["origin"] = merged_df.apply(determine_origin, axis=1)

# Create the final DataFrame with only the required columns
ww_de = pd.DataFrame({
    "song_title": merged_df["song_title_normalized"].str.title(),
    "artist": merged_df["artist_billboard"].combine_first(merged_df["artist_itunes"]),
    "origin": merged_df["origin"]
})

ww_de

| | song_title | artist | origin |
|---|---|---|---|
| 0 | 25 | Rod Wave | WW |
| 1 | 28 | Zach Bryan | WW |
| 2 | 2Am | BigXthaPlug | WW |
| 3 | A Bar Song (Tipsy) | Shaboozey | WW_DE |
| 4 | A Lot More Free | Max McNown | DE |
| ... | ... | ... | ... |
| 182 | World Gone Wild (Feat. Sam Martin) | Robin Schulz & CYRIL | DE |
| 183 | Wunder | AYLIVA & Apache 207 | DE |
| 184 | You Look Like You Love Me | Ella Langley Featuring Riley Green | WW |
| 185 | Zombie | The Cranberries | DE |
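
The origin label can also be derived from pandas' built-in merge indicator rather than a row-wise `apply`. The sketch below uses two tiny, hypothetical DataFrames. `indicator=True` adds a `_merge` column with the values `left_only`, `right_only`, and `both`, which are then mapped to the labels used above. It also keeps the original title spelling instead of re-casing it with `str.title()`.

```python
import pandas as pd

# Tiny hypothetical charts
ww = pd.DataFrame({"song_title": ["Espresso", "Birds Of A Feather"],
                   "artist": ["Sabrina Carpenter", "Billie Eilish"]})
de = pd.DataFrame({"song_title": ["BIRDS OF A FEATHER", "Zombie"],
                   "artist": ["Billie Eilish", "The Cranberries"]})

# Case-insensitive join key
for df in (ww, de):
    df["key"] = df["song_title"].str.lower()

merged = ww.merge(de, on="key", how="outer",
                  suffixes=("_ww", "_de"), indicator=True)

# Translate the merge indicator into the origin labels
origin_labels = {"left_only": "WW", "right_only": "DE", "both": "WW_DE"}
merged["origin"] = merged["_merge"].astype(str).map(origin_labels)

result = pd.DataFrame({
    # Prefer the first chart's spelling, fall back to the second
    "song_title": merged["song_title_ww"].combine_first(merged["song_title_de"]),
    "artist": merged["artist_ww"].combine_first(merged["artist_de"]),
    "origin": merged["origin"],
})
print(result)
```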


In [57]:
# Absolute path to the output CSV (adjust to your machine)
file_path = '/Users/mbouch17/Desktop/IronHack/Labs &  Project/spotify-song-recommendation/csv_files/ww_de.csv'

# Save to CSV
ww_de.to_csv(file_path, index=False)

## User Input

In [47]:
# Function to recommend song
def recommend_song(user_input):
    # Normalize input to lowercase and remove extra spaces
    user_input = user_input.strip().lower()

    # Use spelling correction to handle minor typos
    user_input = str(TextBlob(user_input).correct())

    # Use fuzzywuzzy to match the input with song_title or artist
    titles = ww_de['song_title'].tolist()
    artists = ww_de['artist'].tolist()

    # Try to match the input with song titles and artists
    matched_title, score_title = process.extractOne(user_input, titles)
    matched_artist, score_artist = process.extractOne(user_input, artists)

    # Set a threshold for acceptable match scores (e.g., 80%)
    if score_title >= 80 or score_artist >= 80:
        # Show the row for whichever field matched best, so a weak match
        # on the other field cannot pull in an unrelated song
        if score_title >= score_artist:
            matched_song_info = ww_de[ww_de['song_title'] == matched_title]
        else:
            matched_song_info = ww_de[ww_de['artist'] == matched_artist]

        print(f"\nFound a match: {matched_song_info.iloc[0]['song_title']} by {matched_song_info.iloc[0]['artist']}")
        
        # Ask for confirmation before recommending a random song
        while True:
            confirm = input("Would you like a random song recommendation from the list? (yes/no): ").strip().lower()

            if confirm == 'yes':
                # Recommend a random song
                random_song = ww_de.sample(n=1).iloc[0]
                print(f"How about this one: {random_song['song_title']} by {random_song['artist']}")
                return True  # Continue to menu
            elif confirm == 'no':
                print("Okay, no recommendations at the moment.")
                return False  # Indicate to stop
            else:
                print("Invalid input. Please enter 'yes' or 'no'.")
    else:
        print("Sorry, no popular match found for your song or artist.")
    return True  # Continue processing

# Main program
def main():
    while True:
        user_input = input("Enter a song title or artist: ").strip()
        
        if user_input:
            continue_recommendation = recommend_song(user_input)
            if not continue_recommendation:
                break  # Exit the main loop if user says 'no'

            while True:
                # Ask user for the next action
                choice = input("\nWould you like to:\n(1) Get another recommendation based on the same artist/song\n(2) Type a new artist/song\n(3) Exit\nEnter 1, 2, or 3: ").strip()

                if choice == '1':
                    # Get another recommendation and return to this menu
                    continue_recommendation = recommend_song(user_input)
                    if not continue_recommendation:
                        return  # Exit the entire program
                elif choice == '2':
                    # Ask for a new artist/song
                    break
                elif choice == '3':
                    print("Thank you for using the song recommendation system!")
                    return  # Exit the entire program
                else:
                    print("Invalid input. Please enter 1, 2, or 3.")
        else:
            print("Please enter a valid song title or artist.")

# Start the program
if __name__ == "__main__":
    main()


Enter a song title or artist:  ferfefw


Sorry, no popular match found for your song or artist.



Would you like to:
(1) Get another recommendation based on the same artist/song
(2) Type a new artist/song
(3) Exit
Enter 1, 2, or 3:  3


Thank you for using the song recommendation system!
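
The threshold-based matching can be explored without any third-party dependency. The sketch below swaps `fuzzywuzzy` for the standard library's `difflib.SequenceMatcher`, so the scores will not be identical to those above. The function names `best_match` and `match_song` and the three-song list are assumptions for illustration.

```python
from difflib import SequenceMatcher

def best_match(query, candidates):
    """Return (candidate, score 0-100) for the closest candidate, ignoring case."""
    q = query.strip().lower()
    scored = [(c, round(100 * SequenceMatcher(None, q, c.lower()).ratio()))
              for c in candidates]
    return max(scored, key=lambda pair: pair[1])

def match_song(query, songs, threshold=80):
    """Match `query` against titles and artists; return a (title, artist) tuple or None."""
    titles = [title for title, _ in songs]
    artists = [artist for _, artist in songs]
    title, score_title = best_match(query, titles)
    artist, score_artist = best_match(query, artists)
    if max(score_title, score_artist) < threshold:
        return None
    # Use whichever field matched better
    if score_title >= score_artist:
        return next(s for s in songs if s[0] == title)
    return next(s for s in songs if s[1] == artist)

songs = [
    ("Zombie", "The Cranberries"),
    ("Lose Control", "Teddy Swims"),
    ("Espresso", "Sabrina Carpenter"),
]

print(match_song("zombi", songs))
print(match_song("teddy swims", songs))
print(match_song("ferfefw", songs))
```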
