## Multi-API Data Extraction & Integration for the Music Industry

**Introduction**

This project shows how integrating several music-industry APIs (Spotify, YouTube, Genius, Ticketmaster, and Discogs) can support data-driven decisions tied to specific industry objectives. It aggregates data from these sources into unified, customizable datasets built for focused analysis. The extracted data includes artist metadata, streaming and engagement metrics, audience sentiment, and event information, which can feed trend forecasting, audience segmentation, and fan-engagement strategy. Because extraction runs through API calls, the datasets can be refreshed automatically, which makes the approach scalable for ongoing monitoring of artist performance and audience trends.

**Objective**

The objective of this project was to build automated workflows that extract structured data frames from multiple music-industry APIs (Spotify, YouTube, Genius, Ticketmaster, and Discogs). The pipeline makes collection efficient and repeatable for information such as artist profiles, engagement metrics, audience demographics, and event details. The resulting data frames are designed to plug directly into analytical models, so that insights for strategic decisions stay up to date.

In [1]:
from dotenv import load_dotenv
import os
import base64
import requests
import lyricsgenius as lg
import pandas as pd
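import time
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError  # Raised by YouTube API calls; caught in the video category cell
from difflib import get_close_matches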

![Spotify Logo](Assets/spotify.png)

In [2]:
# Load the .env file
load_dotenv("Data_api.env")

True

In [3]:
# Access environment variables
client_id = os.getenv("CLIENT_ID")
client_secret = os.getenv("CLIENT_SECRET")
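
`os.getenv` returns `None` for a missing key, so a typo in `Data_api.env` only surfaces later as an authentication failure. A minimal sketch of an up-front check follows. The helper name `require_env` is hypothetical, and the example runs against an in-memory mapping rather than the real environment.

```python
import os


def require_env(names, env=None):
    """Return {name: value} for every requested name, or raise if any is missing or empty.

    `require_env` is an illustrative helper, not part of python-dotenv.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}


# Example with an in-memory mapping standing in for the loaded .env file
example_env = {"CLIENT_ID": "abc", "CLIENT_SECRET": "xyz"}
creds = require_env(["CLIENT_ID", "CLIENT_SECRET"], env=example_env)
```

In the notebook, the same call without the `env` argument would check the variables loaded by `load_dotenv`.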

In [4]:
def get_access_token(client_id, client_secret):
    """Request an app-only access token via the client credentials flow."""
    auth_url = 'https://accounts.spotify.com/api/token'
    # The credentials travel in the Basic auth header, so they are not repeated in the body
    credentials = base64.b64encode(f'{client_id}:{client_secret}'.encode()).decode()
    auth_header = {'Authorization': f'Basic {credentials}'}
    auth_data = {'grant_type': 'client_credentials'}

    auth_response = requests.post(auth_url, data=auth_data, headers=auth_header)
    auth_response.raise_for_status()  # Fail early on bad credentials instead of a KeyError below
    return auth_response.json()['access_token']

artist_name = 'The Beatles'
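
The token endpoint's JSON also carries an `expires_in` field (in seconds), so a long-running notebook session will eventually need a fresh token. The sketch below shows one way to cache and refresh it. `TokenCache` and the `fetch_token` callable are hypothetical: `fetch_token` is assumed to return the full JSON payload, unlike `get_access_token` above, which returns only the token string. A fake fetcher and a fake clock stand in for the network.

```python
import time


class TokenCache:
    """Cache an access token and refetch it shortly before it expires (illustrative sketch)."""

    def __init__(self, fetch_token, clock=time.monotonic, margin=60):
        self._fetch_token = fetch_token  # Callable returning the token endpoint's JSON as a dict
        self._clock = clock              # Injected so the example can control time
        self._margin = margin            # Refresh this many seconds before the stated expiry
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            self._expires_at = now + payload.get("expires_in", 3600) - self._margin
        return self._token


# Offline demonstration with a fake fetcher and a controllable clock
calls = []

def fake_fetch():
    calls.append(1)
    return {"access_token": f"token-{len(calls)}", "expires_in": 3600}

fake_now = [0.0]
cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()    # Fetches a token
second = cache.get()   # Served from the cache
fake_now[0] = 3600.0   # Jump past the expiry
third = cache.get()    # Fetches a new token
```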

In [5]:
def get_artist_id(artist_name, access_token):
    search_url = 'https://api.spotify.com/v1/search'
    headers = {'Authorization': f'Bearer {access_token}'}
    params = {
        'q': artist_name,
        'type': 'artist',
        'limit': 1
    }
    response = requests.get(search_url, headers=headers, params=params)
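    if response.status_code == 200:
        items = response.json()['artists']['items']
        if not items:  # The search succeeded but matched no artist
            print(f"No artist found for '{artist_name}'")
            return None
        return items[0]['id']
    else:
        print("Failed to get artist ID:", response.status_code)
        return None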

# Step 1: Get the access token
access_token = get_access_token(client_id, client_secret)

# Step 2: Get artist ID for The Beatles using the generated access token
artist_id = get_artist_id('The Beatles', access_token)
print("The Beatles' Artist ID:", artist_id)

The Beatles' Artist ID: 3WrFJ7ztbogyGnTHbHJFl2
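
`get_artist_id` trusts the first search hit. When a name is ambiguous, one option is to request several hits and keep the one whose name is closest to the query, using `get_close_matches` from the imports cell. This is a sketch under assumptions: `pick_best_match` is a hypothetical helper, and the candidate dictionaries only mimic the `name` and `id` fields of the search items.

```python
from difflib import get_close_matches


def pick_best_match(query, candidates, cutoff=0.6):
    """Return the candidate whose 'name' is closest to `query`, or None if nothing is close."""
    by_name = {}
    for candidate in candidates:
        # Compare case-insensitively; keep the first candidate for a repeated name
        by_name.setdefault(candidate["name"].lower(), candidate)
    matches = get_close_matches(query.lower(), list(by_name), n=1, cutoff=cutoff)
    return by_name[matches[0]] if matches else None


# Made-up candidates shaped like search results
candidates = [
    {"name": "Beatles Tribute Band", "id": "hypothetical-id-1"},
    {"name": "The Beatles", "id": "hypothetical-id-2"},
]
best = pick_best_match("The Beatles", candidates)
```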


In [6]:
# Table 1. Artist Data
def get_artist_data(artist_name, access_token):
    artist_id = get_artist_id(artist_name, access_token)
    if artist_id:
        artist_url = f'https://api.spotify.com/v1/artists/{artist_id}'
        headers = {'Authorization': f'Bearer {access_token}'}
        response = requests.get(artist_url, headers=headers).json()
        artist_data = {
            'Artist ID': response['id'],
            'Name': response['name'],
            'Genres': response['genres'],
            'Popularity': response['popularity'],
            'Followers': response['followers']['total']
        }
        return pd.DataFrame([artist_data])
    else:
        return pd.DataFrame()

# Example usage for artist data
artist_df = get_artist_data(artist_name, access_token)
print("\033[1;34m\nArtist DataFrame\033[0m")
artist_df.head()

Artist DataFrame


Unnamed: 0,Artist ID,Name,Genres,Popularity,Followers
0,3WrFJ7ztbogyGnTHbHJFl2,The Beatles,"[british invasion, classic rock, merseybeat, p...",86,28745842


In [7]:
# Table 2. Album Data for the Artist
def get_album_data(artist_id, access_token, max_albums=10):
    album_url = f'https://api.spotify.com/v1/artists/{artist_id}/albums'
    headers = {'Authorization': f'Bearer {access_token}'}
    album_params = {
        'limit': max_albums,
        'album_type': 'album'
    }
    response = requests.get(album_url, headers=headers, params=album_params).json()
    album_data = [{
        'Album ID': album['id'],
        'Album Name': album['name'],
        'Release Date': album['release_date'],
        'Total Tracks': album['total_tracks'],
        'Album Type': album['album_type']
    } for album in response['items']]
    return pd.DataFrame(album_data)

# Example usage for albums

album_df = get_album_data(artist_id, access_token)
print("\033[1;34m\nAlbum DataFrame\033[0m")
album_df.head()

Album DataFrame


Unnamed: 0,Album ID,Album Name,Release Date,Total Tracks,Album Type
0,2AlPRfYeskAMxhJS00xjeP,The Beatles 1967 – 1970 (2023 Edition),2023-11-10,37,album
1,39Ti6Be9Ak2d6YbxlQo0Ba,The Beatles 1962 – 1966 (2023 Edition),2023-11-10,38,album
2,7C221PnWhYGv8Tc0xSbfdc,Revolver (Super Deluxe),2022-10-28,63,album
3,6emgUTDksZyhhWmtjM9FCs,Get Back (Rooftop Performance),2022-01-28,10,album
4,1BdxbYp1FaNejpDgtDo25V,Let It Be (Super Deluxe),2021-10-15,57,album


In [8]:
# Table 3. Track Data
# Function to get up to `album_limit` albums for the artist
def get_artist_albums(artist_id, access_token, album_limit=5):
    albums_url = f'https://api.spotify.com/v1/artists/{artist_id}/albums'
    headers = {'Authorization': f'Bearer {access_token}'}
    albums = []
    page_size = min(album_limit, 50)  # Spotify accepts at most 50 items per page
    params = {'limit': page_size, 'offset': 0}
    while len(albums) < album_limit:
        response = requests.get(albums_url, headers=headers, params=params)
        if response.status_code == 200:
            data = response.json()
            albums.extend(data['items'])
            if data['next'] is None:  # No further pages
                break
            params['offset'] += page_size  # Move to the next page
            time.sleep(0.5)  # To avoid rate limiting
        else:
            print(f"Failed to get albums for artist ID {artist_id}: {response.status_code}")
            break
    return albums[:album_limit]  # Return only up to the specified limit

# Helper function to get track popularity
def get_track_popularity(track_id, access_token):
    track_url = f'https://api.spotify.com/v1/tracks/{track_id}'
    headers = {'Authorization': f'Bearer {access_token}'}
    response = requests.get(track_url, headers=headers)
    if response.status_code == 200:
        return response.json().get('popularity', None)
    else:
        print(f"Failed to get popularity for track ID {track_id}: {response.status_code}")
        return None

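# Helper function to count playlist search hits for a track name.
# This is a rough proxy: the search matches on the name only (any artist) and the
# count is capped at `limit`, so it is not the true number of playlists containing the track.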
def get_playlist_count_for_track(track_name, access_token, limit=50):
    search_url = 'https://api.spotify.com/v1/search'
    headers = {'Authorization': f'Bearer {access_token}'}
    params = {
        'q': track_name,
        'type': 'playlist',
        'limit': limit
    }
    response = requests.get(search_url, headers=headers, params=params)
    if response.status_code == 200:
        return len(response.json()['playlists']['items'])
    else:
        print(f"Failed to search playlists for track '{track_name}': {response.status_code}")
        return 0

# Function to get all track data for an album, including playlist count and popularity
def get_track_data(album_id, access_token, track_limit):
    track_url = f'https://api.spotify.com/v1/albums/{album_id}/tracks'
    headers = {'Authorization': f'Bearer {access_token}'}
    response = requests.get(track_url, headers=headers)
    if response.status_code == 200:
        tracks = response.json()['items'][:track_limit]  # Limit number of tracks per album
        track_data = []
        for track in tracks:
            popularity = get_track_popularity(track['id'], access_token)
            playlist_count = get_playlist_count_for_track(track['name'], access_token)
            track_data.append({
                'Track ID': track['id'],
                'Track Name': track['name'],
                'Duration (ms)': track['duration_ms'],
                'Explicit': track['explicit'],
                'Track Number': track['track_number'],
                'Popularity': popularity,
                'Playlist Count': playlist_count,
                'Preview URL': track.get('preview_url')
            })
        return track_data
    else:
        print(f"Failed to get track data for album ID {album_id}: {response.status_code}")
        return []

# Main function to gather all tracks by The Beatles with a track limit
def get_all_beatles_tracks(artist_id, access_token, total_track_limit=20, album_limit=5):
    # Get all albums by The Beatles
    albums = get_artist_albums(artist_id, access_token, album_limit)
    
    # Get tracks from each album, up to the total track limit
    all_tracks = []
    for album in albums:
        if len(all_tracks) >= total_track_limit:
            break
        album_tracks = get_track_data(album['id'], access_token, track_limit=total_track_limit - len(all_tracks))
        all_tracks.extend(album_tracks)
    
    # Convert list of track data to DataFrame
    tracks_df = pd.DataFrame(all_tracks[:total_track_limit])  # Limit total number of tracks in DataFrame
    return tracks_df

# Example usage
# Use the artist_id and access_token you obtained earlier
beatles_tracks_df = get_all_beatles_tracks(artist_id, access_token, total_track_limit=20, album_limit=3)
print("\033[1;34m\nTrack DataFrame for Limited Beatles Tracks\033[0m")
beatles_tracks_df.head()

Track DataFrame for Limited Beatles Tracks


Unnamed: 0,Track ID,Track Name,Duration (ms),Explicit,Track Number,Popularity,Playlist Count,Preview URL
0,4JaTNsbucUxF3FtKqt3IY3,Strawberry Fields Forever - 2015 Mix,252093,False,1,43,50,
1,5M8s9X2vuW1tv52uv4meCt,Penny Lane - 2017 Mix,180946,False,2,42,49,
2,5GuGzaaKHoOR3amXgvT2Pq,Sgt. Pepper's Lonely Hearts Club Band - 2017 Mix,122120,False,3,41,50,
3,0Q6YQlQHdFd6NfXjcHtijM,With A Little Help From My Friends - 2017 Mix,164373,False,4,41,50,
4,4GX8I8c7gMZn7mZFM9QAs0,Lucy In The Sky With Diamonds - 2017 Mix,210453,False,5,41,50,
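
The track cell issues two extra requests per track, so it can run into rate limits even with the fixed `time.sleep(0.5)` between album pages. Spotify answers a rate-limited request with status 429 and a `Retry-After` header. A minimal sketch of a retry wrapper that honors the header follows. `get_with_retry` and `FakeResponse` are hypothetical names, and the fake response stands in for a `requests` response so the example runs offline.

```python
import time


def get_with_retry(send, max_attempts=3, default_wait=1.0, sleep=time.sleep):
    """Call `send()` until it returns a non-429 response or attempts run out.

    `send` is any zero-argument callable returning an object with
    `.status_code` and `.headers` (for example a lambda around requests.get).
    """
    attempt = 0
    while True:
        attempt += 1
        response = send()
        if response.status_code != 429 or attempt >= max_attempts:
            return response
        try:
            wait = float(response.headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait  # Header present but not a plain number of seconds
        sleep(wait)


class FakeResponse:
    """Stand-in for a requests.Response, used only in this demonstration."""

    def __init__(self, status_code, headers=None):
        self.status_code = status_code
        self.headers = headers or {}


# First call is rate limited, second call succeeds
queue = [FakeResponse(429, {"Retry-After": "2"}), FakeResponse(200)]
waits = []  # Records the waits instead of actually sleeping
result = get_with_retry(lambda: queue.pop(0), sleep=waits.append)
```

With `requests`, the call would look like `get_with_retry(lambda: requests.get(track_url, headers=headers))`.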


In [9]:
# Table 4. Playlist Data
# Function to search for playlists with "The Beatles" in the title or description
def search_playlists(artist_name, access_token, limit=10):
    search_url = 'https://api.spotify.com/v1/search'
    headers = {'Authorization': f'Bearer {access_token}'}
    params = {
        'q': artist_name,
        'type': 'playlist',
        'limit': limit
    }
    response = requests.get(search_url, headers=headers, params=params)
    if response.status_code == 200:
        return response.json()['playlists']['items']
    else:
        print("Failed to search playlists:", response.status_code)
        return None

# Function to retrieve playlist details for each playlist found
def get_playlist_data(playlist_id, access_token):
    playlist_url = f'https://api.spotify.com/v1/playlists/{playlist_id}'
    headers = {'Authorization': f'Bearer {access_token}'}
    response = requests.get(playlist_url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        playlist_data = {
            'Playlist ID': data['id'],
            'Name': data['name'],
            'Owner': data['owner']['display_name'],
            'Description': data['description'],
            'Total Tracks': data['tracks']['total'],
            'Followers': data['followers']['total'],
            'Collaborative': data['collaborative'],
            'Public': data['public']
        }
        return playlist_data
    else:
        print(f"Failed to get playlist details for {playlist_id}:", response.status_code)
        return None

# Search for playlists related to "The Beatles"
artist_name = "The Beatles"
playlists = search_playlists(artist_name, access_token, limit=10)  # Adjust limit as needed

# Retrieve details for each playlist and create a DataFrame
playlist_data_list = []  # Defined up front so the DataFrame step works even if the search fails
for playlist in playlists or []:
    if playlist is None:  # Defensive: skip null entries in the search results
        continue
    playlist_data = get_playlist_data(playlist['id'], access_token)
    if playlist_data:
        playlist_data_list.append(playlist_data)

# Convert list of playlist data to a DataFrame
playlist_df = pd.DataFrame(playlist_data_list)
print("\033[1;34m\nPlaylist DataFrame\033[0m")
playlist_df.head()

Playlist DataFrame


Unnamed: 0,Playlist ID,Name,Owner,Description,Total Tracks,Followers,Collaborative,Public
0,2CKnioFobYQRK3EEMf9Jr6,The Beatles Greatest Hits,Tristan Rioseco,,168,230043,False,True
1,37i9dQZF1DZ06evO2iBPiw,This Is The Beatles,Spotify,"This is The Beatles. The essential tracks, all...",51,3613410,False,True
2,5oP2KIDIIMMLimsO5OMH4e,THE BEATLES - GREATEST HITS [OF ALL TIMES],allmächtig,Look At My Other Playlist Where Are You All Th...,54,17658,False,True
3,37i9dQZF1DXaKIA8E7WcJj,All Out 60s,Spotify,The biggest songs of the 1960s. Cover: The Bea...,150,3151700,False,True
4,21Gs9V9IfA9cw16tlznsHn,The Beatles: Best Of The Best,Best Of The Best,A selection of the biggest and best tracks fro...,61,378555,False,True


In [10]:
# Table 5: Audio Features of Tracks

# Function to get track details (including name) for a single track
def get_track_details(track_id, access_token):
    track_url = f'https://api.spotify.com/v1/tracks/{track_id}'
    headers = {'Authorization': f'Bearer {access_token}'}
    response = requests.get(track_url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        return data['name']  # Returning only the track name
    else:
        print(f"Failed to get track details for track ID {track_id}: {response.status_code}")
        return None

# Function to get audio features for a single track, including track name
def get_audio_features(track_id, access_token):
    # Get the track name
    track_name = get_track_details(track_id, access_token)
    
    # Get the audio features
    audio_features_url = f'https://api.spotify.com/v1/audio-features/{track_id}'
    headers = {'Authorization': f'Bearer {access_token}'}
    response = requests.get(audio_features_url, headers=headers)
    if response.status_code == 200 and track_name:
        data = response.json()
        audio_data = {
            'Track ID': data['id'],
            'Track Name': track_name,
            'Danceability': data['danceability'],
            'Energy': data['energy'],
            'Key': data['key'],
            'Loudness': data['loudness'],
            'Speechiness': data['speechiness'],
            'Acousticness': data['acousticness'],
            'Instrumentalness': data['instrumentalness'],
            'Liveness': data['liveness'],
            'Valence': data['valence'],
            'Tempo': data['tempo'],
            'Duration (ms)': data['duration_ms']
        }
        return audio_data
    else:
        print(f"Failed to get audio features for track ID {track_id}: {response.status_code}")
        return None

# Retrieve audio features for all tracks in a list of track IDs
def get_all_audio_features(tracks_df, access_token):
    all_audio_features = []
    for track_id in tracks_df['Track ID']:
        audio_features = get_audio_features(track_id, access_token)
        if audio_features:
            all_audio_features.append(audio_features)
    
    # Convert list of audio features to a DataFrame
    audio_features_df = pd.DataFrame(all_audio_features)
    return audio_features_df

# Example usage
# `beatles_tracks_df` is the track DataFrame returned by get_all_beatles_tracks above
all_audio_features_df = get_all_audio_features(beatles_tracks_df, access_token)
print("\033[1;34m\nAudio Features DataFrame for All Tracks\033[0m")
all_audio_features_df.head()

Audio Features DataFrame for All Tracks


Unnamed: 0,Track ID,Track Name,Danceability,Energy,Key,Loudness,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Duration (ms)
0,4JaTNsbucUxF3FtKqt3IY3,Strawberry Fields Forever - 2015 Mix,0.389,0.665,10,-8.727,0.292,0.269,0.000636,0.0918,0.22,96.705,252093
1,5M8s9X2vuW1tv52uv4meCt,Penny Lane - 2017 Mix,0.631,0.506,9,-8.369,0.0402,0.0898,0.00314,0.123,0.677,113.063,180947
2,5GuGzaaKHoOR3amXgvT2Pq,Sgt. Pepper's Lonely Hearts Club Band - 2017 Mix,0.36,0.837,7,-7.866,0.113,0.0378,0.0354,0.894,0.472,95.369,122120
3,0Q6YQlQHdFd6NfXjcHtijM,With A Little Help From My Friends - 2017 Mix,0.678,0.595,4,-7.9,0.0282,0.13,0.0,0.256,0.707,112.303,164373
4,4GX8I8c7gMZn7mZFM9QAs0,Lucy In The Sky With Diamonds - 2017 Mix,0.247,0.537,2,-6.921,0.0442,0.13,0.0,0.127,0.497,187.405,210453
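
The `Key` column holds integers in pitch-class notation (0 = C, 1 = C#/Db, and so on, with -1 meaning no key was detected), which is hard to read in a table. A small sketch of a lookup that turns these integers into note names follows. `key_name` and `PITCH_CLASSES` are hypothetical helpers, and ASCII `#`/`b` are used in place of the sharp and flat symbols.

```python
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def key_name(key):
    """Map a pitch-class integer (0-11) to a note name; return None for -1 or other values."""
    if key is None or not 0 <= key <= 11:
        return None
    return PITCH_CLASSES[key]


# The five Key values from the table above
names = [key_name(k) for k in [10, 9, 7, 4, 2]]
```

On the DataFrame itself, this could be applied as `all_audio_features_df['Key'].map(key_name)`.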


<img src="genius.png" alt="Genius Logo" width="400"/>

In [11]:
# Read the Genius access token from the environment (the .env file was loaded earlier)
genius_access_token = os.getenv("GENIUS_ACCESS_TOKEN")

In [12]:
# Set up headers for Genius API
headers = {
    'Authorization': f'Bearer {genius_access_token}'
}


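# 1. Function to Get Artist Information
# Note: this redefines the Spotify `get_artist_data` from the earlier cell, so rerun
# that cell before calling the Spotify version again.
# The artist returned is the primary artist of the top search hit.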
def get_artist_data(artist_name):
    search_url = "https://api.genius.com/search"
    params = {'q': artist_name}
    response = requests.get(search_url, headers=headers, params=params)
    if response.status_code == 200:
        hits = response.json()['response']['hits']
        if hits:
            artist_info = hits[0]['result']['primary_artist']
            artist_data = {
                'Artist ID': artist_info['id'],
                'Name': artist_info['name'],
                'URL': artist_info['url'],
                'Followers Count': artist_info.get('followers_count', 'N/A'),
                'Image URL': artist_info.get('image_url')
            }
            return pd.DataFrame([artist_data])
        else:
            print(f"No data found for artist '{artist_name}'.")
            return pd.DataFrame()
    else:
        print(f"Failed to retrieve artist data: {response.status_code}")
        return pd.DataFrame()

# Example usage for artist data
artist_df = get_artist_data("The Beatles")
print("\033[1;34m\nArtist DataFrame\033[0m")
artist_df

Artist DataFrame


Unnamed: 0,Artist ID,Name,URL,Followers Count,Image URL
0,586,The Beatles,https://genius.com/artists/The-beatles,,https://images.genius.com/04ceb4450ad13e7a794e...


In [13]:
# 2. Function to Get All Songs for an Artist

genius = lg.Genius(genius_access_token, timeout=15)

# Function to Fetch Song Data with Lyrics for an Artist
def fetch_artist_lyrics(artist_name, max_songs=100):
    print(f"Fetching songs for artist: \033[1;34m{artist_name}\033[0m")

    
    # Search for the artist and get up to `max_songs` of their songs
    artist = genius.search_artist(artist_name, max_songs=max_songs, sort="popularity")
    if not artist:
        print(f"Artist '{artist_name}' not found on Genius.")
        return pd.DataFrame()

    # Prepare list to hold song data
    song_data = []

    # Loop through each song and collect details
    for song in artist.songs:
        track_name = song.title
        lyrics = song.lyrics
        
        # Fields from the Genius API that are more reliably available
        song_id = song.id  # Unique Genius song ID
        url = song.url  # URL to the song's page on Genius
        full_title = song.full_title  # Full title with artist name
        primary_artist = song.primary_artist.name if song.primary_artist else artist_name
        path = song.path  # Relative path on Genius website

        print(f"Fetched lyrics for: {track_name}")
        
        # Append data to list
        song_data.append({
            'Track Name': track_name,
            'Full Title': full_title,
            'Lyrics': lyrics,
            'Genius Song ID': song_id,
            'URL': url,
            'Primary Artist': primary_artist,
            'Path': path
        })

        # No delay is needed here: `search_artist` already downloaded the lyrics,
        # and this loop only reads attributes of the returned song objects

    # Convert to DataFrame
    df = pd.DataFrame(song_data)
    return df

# Example usage for fetching song data
artist_name = "The Beatles"  # Replace with the artist's name you want
song_df = fetch_artist_lyrics(artist_name, max_songs=5)
print("\033[1;34m\nSong DataFrame\033[0m")
song_df

Fetching songs for artist: The Beatles
Searching for songs by The Beatles...

Song 1: "Yesterday"
Song 2: "Let It Be"
Song 3: "In My Life"
Song 4: "Hey Jude"
Song 5: "Come Together"

Reached user-specified song limit (5).
Done. Found 5 songs.
Fetched lyrics for: Yesterday
Fetched lyrics for: Let It Be
Fetched lyrics for: In My Life
Fetched lyrics for: Hey Jude
Fetched lyrics for: Come Together
Song DataFrame


Unnamed: 0,Track Name,Full Title,Lyrics,Genius Song ID,URL,Primary Artist,Path
0,Yesterday,Yesterday by The Beatles,109 ContributorsTranslationsTürkçeEspañolPortu...,2236,https://genius.com/The-beatles-yesterday-lyrics,The Beatles,/The-beatles-yesterday-lyrics
1,Let It Be,Let It Be by The Beatles,148 ContributorsTranslationsEspañolDeutschLet ...,1575,https://genius.com/The-beatles-let-it-be-lyrics,The Beatles,/The-beatles-let-it-be-lyrics
2,In My Life,In My Life by The Beatles,74 ContributorsTranslationsEspañolDeutschIn My...,71861,https://genius.com/The-beatles-in-my-life-lyrics,The Beatles,/The-beatles-in-my-life-lyrics
3,Hey Jude,Hey Jude by The Beatles,133 ContributorsTranslationsEspañolHey Jude Ly...,82381,https://genius.com/The-beatles-hey-jude-lyrics,The Beatles,/The-beatles-hey-jude-lyrics
4,Come Together,Come Together by The Beatles,175 ContributorsTranslationsEspañolPortuguêsDe...,56218,https://genius.com/The-beatles-come-together-l...,The Beatles,/The-beatles-come-together-lyrics


In [14]:
# Specify the song title
song_title = "Come Together"  # Replace with the actual song title you want to search

# Filter the DataFrame for the specific song and print its lyrics
song_row = song_df[song_df["Track Name"] == song_title]
if song_row.empty:
    raise ValueError(f"'{song_title}' is not in song_df")

lyrics = song_row["Lyrics"].iloc[0]  # Take the lyrics from the first matching row

print(f"\033[1;34m{song_title} Lyrics:\033[0m")
print(f"\n{lyrics}\n")

Come Together Lyrics:

175 ContributorsTranslationsEspañolPortuguêsDeutschCome Together Lyrics[Intro]
Shoot me
Shoot me
Shoot me
Shoot me

[Verse 1]
Here come old flat-top, he come grooving up slowly
He got ju-ju eyeball, he one holy roller
He got hair down to his knee
Got to be a joker, he just do what he please

[Interlude]
Shoot me
Shoot me
Shoot me
Shoot me

[Verse 2]
He wear no shoeshine, he got toe-jam football
He got monkey finger, he shoot Coca-Cola
He say, "I know you, you know me"
One thing I can tell you is you got to be free

[Chorus]
Come together, right now
Over me
You might also like[Interlude]
Shoot me
Shoot me
Shoot me
Shoot me

[Verse 3]
He bag production, he got walrus gumboot
He got Ono sideboard, he one spinal cracker
He got feet down below his knee
Hold you in his armchair, you can feel his disease

[Chorus]
Come together, right now
Over me

[Interlude]
Shoot me
Right
Come, come, come, come, come

[Verse 4]
He roller-coaster, he got early warnin'
He got
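
The scraped text above starts with page furniture (a contributor count, translation links, and the "<title> Lyrics" heading) and contains an inline "You might also like" widget label. If the lyrics feed a text analysis, it may help to strip these first. The sketch below is one possible cleanup, not a feature of `lyricsgenius`: `clean_genius_lyrics` is a hypothetical helper, the patterns are inferred from the output shown here, and the example text is made up.

```python
import re


def clean_genius_lyrics(raw, title):
    """Remove the scraped page header and the 'You might also like' label from lyrics text."""
    # Drop everything on the first line up to and including "<title> Lyrics"
    header = r"\A.*?" + re.escape(title) + r" Lyrics"
    text = re.sub(header, "", raw, count=1)
    # Drop the recommendation widget label that the scraper picks up mid-song
    text = text.replace("You might also like", "")
    return text.strip()


# Made-up text with the same layout as the scraped output
raw = (
    "12 ContributorsTranslationsEspañolExample Song Lyrics[Verse 1]\n"
    "first line\n"
    "You might also like[Chorus]\n"
    "second line"
)
cleaned = clean_genius_lyrics(raw, "Example Song")
```

Passing the track title keeps the header pattern from cutting into lyrics that happen to contain the word "Lyrics".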

<img src="Assets/youtube.png" alt="YouTube Logo" width="400"/>

In [15]:
# Load environment variables
load_dotenv("Data_api.env")
API_KEY = os.getenv("YOUTUBE_KEY")


CHANNEL_USERNAME = '@TheBeatles'

# Initialize the YouTube API client
youtube = build('youtube', 'v3', developerKey=API_KEY)

In [16]:
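# Step 1: Get the channel ID by searching for the channel handle
# (each search().list call costs 100 quota units, so reuse the result where possible)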
def get_channel_id(username):
    response = youtube.search().list(
        part='snippet',
        q=username,
        type='channel',
        maxResults=1
    ).execute()
    
    if 'items' in response and response['items']:
        channel_id = response['items'][0]['snippet']['channelId']
        return channel_id
    else:
        print("Could not find the channel.")
        return None

# Example usage
channel_id = get_channel_id(CHANNEL_USERNAME)
print("\033[1;34mChannel ID:\033[0m", channel_id)  # Print in bold and blue


Channel ID: UCc4K7bAqpdBP8jh1j9XZAww


In [17]:
# Step 2: Get the Video Data
def get_all_videos_from_channel(channel_id, max_videos=100):
    video_data = []
    next_page_token = None

    # Step 2a: Retrieve video IDs from the channel, one page (up to 50) at a time
    while len(video_data) < max_videos:
        response = youtube.search().list(
            part='id',
            channelId=channel_id,
            maxResults=50,
            pageToken=next_page_token,
            type='video'
        ).execute()

        video_ids = [item['id']['videoId'] for item in response.get('items', [])]
        if not video_ids:
            break

        # Step 2b: Retrieve details for this page of videos in a single request
        details_response = youtube.videos().list(
            part='snippet,statistics,contentDetails',
            id=','.join(video_ids)
        ).execute()

        for video_info in details_response['items']:
            stats = video_info['statistics']
            views = int(stats.get('viewCount', 0))
            likes = int(stats.get('likeCount', 0))
            comments = int(stats.get('commentCount', 0))
            video_data.append({
                'video_id': video_info['id'],
                'title': video_info['snippet']['title'],
                'published_at': video_info['snippet']['publishedAt'],
                'tags': video_info['snippet'].get('tags', []),
                'category_id': video_info['snippet'].get('categoryId', 'Unknown'),
                'view_count': views,
                'like_count': likes,
                'comment_count': comments,
                # Guard against a zero or missing view count
                'like_to_view_ratio': likes / views if views else 0.0,
                'comment_to_view_ratio': comments / views if views else 0.0,
                'duration': video_info['contentDetails']['duration']
            })

        next_page_token = response.get('nextPageToken')
        if not next_page_token:
            break

    if not video_data:
        return pd.DataFrame()

    # Keep at most `max_videos` rows and sort by view count in descending order
    video_df = pd.DataFrame(video_data[:max_videos]).sort_values(by='view_count', ascending=False)
    return video_df

# Example usage
channel_id = get_channel_id(CHANNEL_USERNAME)
video_df = get_all_videos_from_channel(channel_id)
print("\033[1;34m\nVideo DataFrame (Sorted by Most Popular)\033[0m")
video_df

Video DataFrame (Sorted by Most Popular)


Unnamed: 0,video_id,title,published_at,tags,category_id,view_count,like_count,comment_count,like_to_view_ratio,comment_to_view_ratio,duration
24,8UQK-UcRezE,Strawberry Fields Forever - Restored HD Video,2015-09-15T13:04:50Z,"[Beatles, The Beatles, TheBeatles1, 1s, Song, ...",10,29434666,125890,3605,0.004277,0.000122,PT1M23S
59,vefJAtG-ZKI,Yellow Submarine Original Trailer - 1968 (Beat...,2012-03-20T15:08:19Z,"[Beatles, The Beatles, George Harrison, Ringo ...",10,12199321,79219,3071,0.006494,0.000252,PT3M43S
75,Tb83rbm0IVI,The Beatles: Get Back | Official Trailer | Dis...,2021-10-13T13:00:34Z,"[The Beatles, Documentary, John Lennon, Paul M...",10,5247581,140350,7412,0.026746,0.001412,PT3M58S
32,385eTo76OzA,Watch The Beatles rehearsing “Don’t Let Me Dow...,2021-11-24T19:10:41Z,[],10,3195153,51059,2400,0.015980,0.000751,PT1M59S
0,KgFTpwB_uII,Now And Then – The Last Beatles Song (Short Fi...,2023-10-26T12:51:56Z,[],10,2853562,43161,1894,0.015125,0.000664,PT35S
...,...,...,...,...,...,...,...,...,...,...,...
81,-evKmkjLryU,"Listen To The New Mix Of ""For No One"" From The...",2022-12-12T11:45:36Z,[],10,48342,2989,18,0.061830,0.000372,PT15S
72,2iBcSaELAxs,TV Premiere of The Beatles: Eight Days A Week,2018-01-02T18:03:54Z,"[Beatles, The Beatles, George Harrison, Ringo ...",10,30965,948,38,0.030615,0.001227,PT51S
86,feAltgyZa4Q,‘That’s The Way God Planned It’ by Billy Prest...,2024-08-11T21:22:05Z,[],10,24692,1818,28,0.073627,0.001134,PT16S
69,BTxuuyDNZC4,ARENA: Paul Gambaccini: It's a graft of Engl...,2012-10-08T09:39:52Z,"[ARENA2, Paul, the beatles, Beatles, The Beatl...",10,22560,307,13,0.013608,0.000576,PT42S
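
The `duration` column holds ISO 8601 durations such as `PT1M23S`, which cannot be sorted or averaged as they are. A minimal sketch of a converter to seconds follows. `iso8601_duration_to_seconds` is a hypothetical helper that handles only the day, hour, minute, and second designators, which is assumed to be enough for video lengths.

```python
import re

# Optional days, then an optional time part with hours, minutes, and seconds
_DURATION_RE = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def iso8601_duration_to_seconds(value):
    """Convert a duration like 'PT3M43S' to a whole number of seconds."""
    match = _DURATION_RE.match(value)
    if not match:
        raise ValueError(f"Unsupported duration format: {value!r}")
    days, hours, minutes, seconds = (int(part) if part else 0 for part in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Durations taken from the table above
examples = {d: iso8601_duration_to_seconds(d) for d in ["PT1M23S", "PT3M43S", "PT35S"]}
```

On the DataFrame, `video_df['duration'].map(iso8601_duration_to_seconds)` would produce a numeric column.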


In [18]:
# Step 3: Playlists DataFrame
def get_channel_playlists(channel_id, max_results=10):
    playlist_data = []
    request = youtube.playlists().list(
        part="snippet,contentDetails",
        channelId=channel_id,
        maxResults=max_results
    ).execute()
    
    for item in request.get('items', []):
        playlist_data.append({
            'Playlist ID': item['id'],
            'Title': item['snippet']['title'],
            'Description': item['snippet'].get('description'),
            'Published Date': item['snippet'].get('publishedAt'),
            'Item Count': item['contentDetails'].get('itemCount')
        })
    
    return pd.DataFrame(playlist_data)

# Example usage for Playlists DataFrame
channel_id = get_channel_id(CHANNEL_USERNAME)
playlists_df = get_channel_playlists(channel_id)
print("\033[1;34m\nPlaylists DataFrame\033[0m")
playlists_df

Playlists DataFrame


Unnamed: 0,Playlist ID,Title,Description,Published Date,Item Count
0,PL0jp-uZ7a4g_fh5IQ2IwEMPs2CYdmp7O2,Beatles '64,,2024-11-14T12:43:24Z,1
1,PL0jp-uZ7a4g-y5uBeFPjElILtDpCdICKD,Beatles American Tour,,2024-08-19T14:50:46Z,6
2,PL0jp-uZ7a4g-ZCB7OfAwVsZWHr865HK26,The Beatles - Let It Be (2024),,2024-04-29T22:11:43Z,7
3,PL0jp-uZ7a4g8OKKCTAZdVhXfJEgxkm1DU,The Beatles - Red & Blue Albums (2023 Edition),Remixed and Expanded: 'Red’ has 12 additional ...,2023-11-10T05:43:01Z,92
4,PL0jp-uZ7a4g_wUzbgU_US9V7h-HR1TyIh,Now And Then,,2023-10-29T10:18:05Z,7
5,PL0jp-uZ7a4g-Bo8JcrE2B4eqLnZRlT0X7,REVOLVER 2022,,2022-09-07T13:41:23Z,54
6,PL0jp-uZ7a4g8yAYvmdbQ2Uh8ZEgqrV0Qf,Let It Be Anniversary & Get Back Documentary,,2021-09-21T17:37:31Z,32
7,PL0jp-uZ7a4g9y2OkrZ2GkUqnpcw8gbuEo,Fans Around The World,,2019-12-09T20:37:43Z,0
8,PL0jp-uZ7a4g8LIe-l4bku8Irbsnev4IWW,Abbey Road Anniversary,,2019-08-14T15:47:51Z,12
9,PL0jp-uZ7a4g_oUvMt1-fTMZSYyDJqNSva,The Beatles - LOVE | Cirque Du Soleil (Officia...,,2019-06-03T14:31:12Z,17


In [None]:
# Step 4: Comment Data for Videos
from googleapiclient.errors import HttpError

def get_video_comments_data(video_ids, max_comments_per_video=100):
    comments_data = []
    for video_id in video_ids:
        next_page_token = None
        video_comment_count = 0  # Counted per video so one busy video does not block the rest
        try:
            while video_comment_count < max_comments_per_video:
                response = youtube.commentThreads().list(
                    part="snippet",
                    videoId=video_id,
                    maxResults=min(100, max_comments_per_video - video_comment_count),
                    pageToken=next_page_token
                ).execute()

                for item in response.get('items', []):
                    comment_info = item['snippet']['topLevelComment']['snippet']
                    comments_data.append({
                        'Comment ID': item['id'],
                        'Video ID': video_id,
                        'Author Display Name': comment_info.get('authorDisplayName'),
                        'Comment Text': comment_info.get('textDisplay'),
                        'Like Count': comment_info.get('likeCount'),
                        'Published Date': comment_info.get('publishedAt'),
                        'Updated Date': comment_info.get('updatedAt')
                    })
                    video_comment_count += 1

                next_page_token = response.get('nextPageToken')
                if not next_page_token:
                    break
        except HttpError as e:
            # Raised, for example, when comments are disabled on a video
            print(f"Skipping video ID {video_id}: {e}")

    return pd.DataFrame(comments_data)

# Example usage for Comments DataFrame
video_ids = video_df['video_id'].tolist()  # IDs from the Video DataFrame built in Step 2
comments_df = get_video_comments_data(video_ids)
print("\033[1;34m\nComments DataFrame\033[0m")
print(comments_df)


**This cell was not run because comments are disabled for this page.**
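
The video cells in this section pass IDs to `youtube.videos().list`, which accepts several IDs joined by commas in its `id` parameter. When working from a long list of IDs, splitting it into fixed-size batches keeps the number of requests (and the quota spent) low. The sketch below shows the batching step only. `chunked` is a hypothetical helper, and the batch size of 50 is an assumption based on the page size used elsewhere in this notebook.

```python
def chunked(items, size=50):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Made-up IDs standing in for real video IDs
example_ids = [f"vid{i}" for i in range(120)]
batches = chunked(example_ids)
id_params = [",".join(batch) for batch in batches]  # One `id=` value per request
```

Each entry of `id_params` could then be passed as `id=...` in a single `videos().list` call.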

In [19]:
# Step 5: Detailed Category Data for each video on the channel
def get_video_category_for_videos(video_ids):
    video_categories = []
    
    # Retrieve video details for each video
    for video_id in video_ids:
        try:
            response = youtube.videos().list(
                part="snippet",
                id=video_id
            ).execute()
            
            # Get details from the video response
            if 'items' in response and response['items']:
                video_info = response['items'][0]
                category_id = video_info['snippet'].get('categoryId', 'Unknown')
                title = video_info['snippet'].get('title', 'Unknown Title')
                channel_title = video_info['snippet'].get('channelTitle', 'Unknown Channel')
                description = video_info['snippet'].get('description', 'No description available')
                published_at = video_info['snippet'].get('publishedAt', 'Unknown Date')
                
                # Add all details to video data
                video_categories.append({
                    'Video ID': video_id,
                    'Video Title': title,
                    'Category ID': category_id,
                    'Channel Title': channel_title,
                    'Description': description,
                    'Published Date': published_at
                })
        except HttpError as e:
            print(f"An error occurred for video ID: {video_id} - {e}")
    
    # Create a DataFrame with video details including category, title, channel, description, and published date
    return pd.DataFrame(video_categories)

# Retrieve video IDs if not defined
if 'video_ids' not in globals():
    # Assuming `channel_id` is already defined
    video_df = get_all_videos_from_channel(channel_id)
    video_ids = video_df['video_id'].tolist()  # Extract video IDs from the DataFrame

# Example usage for all the Beatles video categories and titles
category_df = get_video_category_for_videos(video_ids)
print("\033[1;34m\nVideo Category DataFrame for The Beatles' Videos\033[0m")
category_df.head()

Video Category DataFrame for The Beatles' Videos


Unnamed: 0,Video ID,Video Title,Category ID,Channel Title,Description,Published Date
0,8UQK-UcRezE,Strawberry Fields Forever - Restored HD Video,10,The Beatles,THE BEATLES’ VIDEOS AND TOP HITS COME TOGETHER...,2015-09-15T13:04:50Z
1,vefJAtG-ZKI,Yellow Submarine Original Trailer - 1968 (Beat...,10,The Beatles,Yellow Submarine Feature Film Restored For May...,2012-03-20T15:08:19Z
2,Tb83rbm0IVI,The Beatles: Get Back | Official Trailer | Dis...,10,The Beatles,The official trailer for #TheBeatlesGetBack is...,2021-10-13T13:00:34Z
3,385eTo76OzA,Watch The Beatles rehearsing “Don’t Let Me Dow...,10,The Beatles,Watch The Beatles rehearsing “Don’t Let Me Dow...,2021-11-24T19:10:41Z
4,KgFTpwB_uII,Now And Then – The Last Beatles Song (Short Fi...,10,The Beatles,Now and Then's eventful journey to fruition to...,2023-10-26T12:51:56Z
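
The `Category ID` column above holds numeric IDs (10 for every row shown). A minimal sketch of turning those IDs into readable names, assuming a category listing shaped like the response of `youtube.videoCategories().list(part="snippet", regionCode="US")`. The helper name `build_category_lookup` and the sample payload are illustrative and not part of the original notebook:

```python
def build_category_lookup(categories_response):
    """Map category IDs to titles from a videoCategories-style response dict."""
    lookup = {}
    for item in categories_response.get("items", []):
        title = item.get("snippet", {}).get("title")
        if title:
            lookup[item["id"]] = title
    return lookup


# Hand-written sample in the shape of a videoCategories response (not real API output)
sample_response = {
    "items": [
        {"id": "10", "snippet": {"title": "Music"}},
        {"id": "24", "snippet": {"title": "Entertainment"}},
    ]
}

category_names = build_category_lookup(sample_response)
print(category_names.get("10", "Unknown"))
```

With a real response in hand, the lookup could be applied with something like `category_df['Category ID'].map(category_names)`.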


<img src="Assets/ticketmaster.png" alt="Ticketmaster Logo" width="400"/>

In [20]:
# Read the Ticketmaster API key from the environment (loaded from the .env file above).
# Keep the key out of the notebook itself so it is not committed to version control.
master_api_key = os.getenv("MASTER_KEY")

# Define base URL and API Key
BASE_URL = "https://app.ticketmaster.com/discovery/v2/"
params = {"apikey": master_api_key}

In [21]:
def search_events_by_keyword(keyword, size=10):
    search_url = f"{BASE_URL}events"
    search_params = {
        "apikey": master_api_key,
        "keyword": keyword,
        "size": size  # Number of results to return
    }
    response = requests.get(search_url, params=search_params)
    
    if response.status_code == 200:
        events_data = response.json()
        
        # Check if events are available
        if '_embedded' not in events_data or 'events' not in events_data['_embedded']:
            print("No events found for this keyword.")
            return pd.DataFrame(columns=["Event Name", "Event Type", "Date", "Time", "Event URL", "Venue Name", "City", "State", "Country", "Min Price", "Max Price", "Currency", "Genre", "Image URL"])
        
        # Extracting event info
        events_list = []
        for event in events_data['_embedded']['events']:
            event_name = event.get('name', 'Unknown Event')
            event_type = event.get('type', 'N/A')
            event_date = event['dates']['start'].get('localDate', 'N/A')
            event_time = event['dates']['start'].get('localTime', 'N/A')
            event_url = event.get('url', 'N/A')
            
            # Genre information
            # Fall back to 'N/A' when classifications or the genre entry are missing
            classifications = event.get('classifications') or [{}]
            genre = classifications[0].get('genre', {}).get('name', 'N/A')
            
            # Venue information
            venue_info = event['_embedded']['venues'][0] if '_embedded' in event and 'venues' in event['_embedded'] else {}
            venue_name = venue_info.get('name', 'Unknown Venue')
            city = venue_info.get('city', {}).get('name', 'N/A')
            state = venue_info.get('state', {}).get('name', 'N/A') if 'state' in venue_info else 'N/A'
            country = venue_info.get('country', {}).get('name', 'N/A')

            # Ticket price information
            price_ranges = event.get('priceRanges', [])
            min_price = price_ranges[0].get('min', 'N/A') if price_ranges else 'N/A'
            max_price = price_ranges[0].get('max', 'N/A') if price_ranges else 'N/A'
            currency = price_ranges[0].get('currency', 'N/A') if price_ranges else 'N/A'
            
            # Image URL (using the first image if available)
            image_url = event['images'][0].get('url', 'N/A') if 'images' in event and event['images'] else 'N/A'
            
            events_list.append({
                "Event Name": event_name,
                "Event Type": event_type,
                "Date": event_date,
                "Time": event_time,
                "Event URL": event_url,
                "Venue Name": venue_name,
                "City": city,
                "State": state,
                "Country": country,
                "Min Price": min_price,
                "Max Price": max_price,
                "Currency": currency,
                "Genre": genre,
                "Image URL": image_url
            })
        
        return pd.DataFrame(events_list)
    
    else:
        print(f"Failed to fetch data. Status code: {response.status_code}")
        return pd.DataFrame(columns=["Event Name", "Event Type", "Date", "Time", "Event URL", "Venue Name", "City", "State", "Country", "Min Price", "Max Price", "Currency", "Genre", "Image URL"])

# Example usage: Search for events related to "The Beatles"
events_df = search_events_by_keyword("The Beatles", size=10)

print("\033[1;34mEvents related to 'The Beatles':\033[0m")
events_df.head()


Events related to 'The Beatles':


Unnamed: 0,Event Name,Event Type,Date,Time,Event URL,Venue Name,City,State,Country,Min Price,Max Price,Currency,Genre,Image URL
0,The Mersey Beatles,event,2024-11-14,19:00:00,https://www.ticketmaster.ie/the-mersey-beatles...,3Olympia Theatre,Dublin,Dublin 2,Ireland,33.2,41.05,EUR,Rock,https://s1.ticketm.net/dam/a/411/0bda42a5-c60a...
1,The Mersey Beatles,event,2024-11-15,19:30:00,https://www.ticketmaster.ie/the-mersey-beatles...,Gleneagle INEC Arena,Co. Kerry,Co. Kerry,Ireland,33.75,33.75,EUR,Rock,https://s1.ticketm.net/dam/a/411/0bda42a5-c60a...
2,The Beatles Show,event,2024-11-15,18:00:00,https://www.moshtix.com.au/v2/event/the-beatle...,Shearwater Resort,Shearwater,Tasmania,Australia,,,,Variety,https://s1.ticketm.net/dam/c/360/89e64469-b2d6...
3,The Mersey Beatles,event,2024-11-16,20:00:00,https://www.ticketmaster.ie/the-mersey-beatles...,"TF Royal, Castlebar",Co. Mayo,Co. Mayo,Ireland,33.65,33.65,EUR,Rock,https://s1.ticketm.net/dam/a/411/0bda42a5-c60a...
4,Imagine The Beatles,event,2024-11-16,19:30:00,https://www.ticketweb.uk/event/imagine-the-bea...,Hertford Corn Exchange,Hertford,,Great Britain,,,,Rock,https://s1.ticketm.net/dam/c/fbc/b293c0ad-c904...
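
The event parser above mixes direct indexing with chained `.get()` calls, and several rows come back with missing prices or states. A small sketch of one way to walk nested JSON uniformly; `dig` is a hypothetical helper, and the sample event is hand-written rather than real API output:

```python
def dig(data, *path, default="N/A"):
    """Follow keys and list indexes through nested JSON, returning default on any miss."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return default if current is None else current


# Hand-written event fragment for illustration (not real API output)
sample_event = {
    "name": "Example Show",
    "classifications": [{"genre": {"name": "Rock"}}],
    "_embedded": {"venues": [{"name": "Example Hall", "city": {"name": "Dublin"}}]},
}

print(dig(sample_event, "classifications", 0, "genre", "name"))
print(dig(sample_event, "priceRanges", 0, "min"))
```

The first call finds the genre; the second falls back to the default because the sample has no `priceRanges` entry.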


In [22]:
def search_events(city=None, classification_name=None, size=10):
    search_url = f"{BASE_URL}events"
    search_params = {
        "apikey": master_api_key,
        "size": size  # Number of results to return
    }
    if city:
        search_params["city"] = city
    if classification_name:
        search_params["classificationName"] = classification_name
    
    response = requests.get(search_url, params=search_params)
    
    if response.status_code == 200:
        events_data = response.json()
        
        if '_embedded' not in events_data or 'events' not in events_data['_embedded']:
            print("No events found.")
            return {"_embedded": {"events": []}}
        
        return events_data
    
    else:
        print(f"Failed to fetch data. Status code: {response.status_code}")
        return {"_embedded": {"events": []}}

def get_event_classification_info(events_data):
    classification_list = []
    for event in events_data['_embedded']['events']:
        if 'classifications' in event and event['classifications']:  # Check if classifications data is present
            for classification in event['classifications']:
                genre = classification.get('genre', {}).get('name', 'N/A')
                segment = classification.get('segment', {}).get('name', 'N/A')
                subGenre = classification.get('subGenre', {}).get('name', 'N/A')
                event_type = classification.get('type', {}).get('name', 'N/A')

                classification_list.append({
                    "Event Name": event.get('name', 'Unknown Event'),
                    "Genre": genre,
                    "Segment": segment,
                    "SubGenre": subGenre,
                    "Type": event_type
                })
    
    # Return a DataFrame, even if empty, with the specified columns
    return pd.DataFrame(classification_list) if classification_list else pd.DataFrame(columns=["Event Name", "Genre", "Segment", "SubGenre", "Type"])

# Fetch music events in Los Angeles
events_data = search_events(city="Los Angeles", classification_name="music", size=10)

# Extract classification information
classification_df = get_event_classification_info(events_data)

# Display result
print("\033[1;34mClassification Information for Music Events in Los Angeles:\033[0m")
classification_df.head()

Classification Information for Music Events in Los Angeles:


Unnamed: 0,Event Name,Genre,Segment,SubGenre,Type
0,Movements with Citizen,Rock,Music,Pop,Undefined
1,Los Temerarios USA Tour,Latin,Music,Latin,Undefined
2,Gillian Welch & David Rawlings,Rock,Music,Pop,Undefined
3,Los Temerarios USA Tour,Latin,Music,Latin,Undefined
4,Sabrina Carpenter w/ Declan Mckenna,Pop,Music,Pop Rock,


In [23]:
def get_venue_accessibility_info(events_data):
    accessibility_list = []
    unique_venues = set()  # Set to track unique venue names
    
    for event in events_data.get('_embedded', {}).get('events', []):
        # Check if 'venues' data is embedded in the event
        venue_info = event.get('_embedded', {}).get('venues', [{}])[0]
        venue_name = venue_info.get('name', 'Unknown Venue')
        general_info = venue_info.get('generalInfo', {})
        # Venue-level accessible seating text, when Ticketmaster provides it
        accessible_seating = venue_info.get('accessibleSeatingDetail', 'N/A')
        city = venue_info.get('city', {}).get('name', '')

        # Ensure uniqueness by venue name
        if venue_name not in unique_venues:
            accessibility_list.append({
                "Venue Name": venue_name,
                "City": city,
                "General Info": general_info.get('generalRule', 'N/A'),
                "Child Info": general_info.get('childRule', 'N/A'),
                "Accessible Seating": accessible_seating
            })
            unique_venues.add(venue_name)  # Add to the set to avoid duplicates
    
    # Return DataFrame
    return pd.DataFrame(accessibility_list) if accessibility_list else pd.DataFrame(columns=["Venue Name", "City", "General Info", "Child Info", "Accessible Seating"])

# Example usage: Search for events in Los Angeles and get accessibility info for all unique venues
events_data = search_events(city="Los Angeles", size=200)  # Adjust size as needed to get more events
accessibility_info_df = get_venue_accessibility_info(events_data)

# Display result
print("\033[1;34mVenue Accessibility Information for All Unique Venues in Los Angeles:\033[0m")
accessibility_info_df

Venue Accessibility Information for All Unique Venues in Los Angeles:


Unnamed: 0,Venue Name,City,General Info,Child Info,Accessible Seating
0,Crypto.com Arena,Los Angeles,"No Bottles, Cans, Or Coolers. No Smoking In Ar...","Some events require all attendees, regardless ...",
1,Pauley Pavilion-UCLA,Los Angeles,,,
2,Galen Center,Los Angeles,,,
3,Dodger Stadium,Los Angeles,,,
4,BMO Stadium,Los Angeles,"We aim to provide an enjoyable, yet exciting e...",Children two and under will be admitted free b...,
5,The Torch at LA Coliseum,Los Angeles,,,
6,The Wiltern,Los Angeles,For more information on our venue policies and...,All patrons must have a ticket regardless of a...,
7,Orpheum Theatre,Los Angeles,Video and still photography not permitted. Out...,Attendance for children may be restricted for ...,
8,Greek Theatre,Los Angeles,"No cameras (any type), no recording devices, n...",,
9,Hollywood Pantages Theatre,Los Angeles,No smoking in the theatre; patrons may go outs...,EVERYONE NEEDS A TICKET; NO BABIES IN ARMS. Pl...,
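
`search_events` returns a single page of results, so the venue list above only covers the events in that page. A sketch of paging through results, assuming each response carries the `page` block (`number`, `totalPages`) that the Discovery API returns alongside `_embedded`. Here `fetch_page` is a hypothetical callable that would wrap `requests.get` with a `page` parameter, and the fake pages are hand-written:

```python
def collect_events(fetch_page, max_pages=5):
    """Gather events across pages until the last page or max_pages is reached."""
    events = []
    for page_number in range(max_pages):
        data = fetch_page(page_number)
        events.extend(data.get("_embedded", {}).get("events", []))
        total_pages = data.get("page", {}).get("totalPages", 1)
        if page_number + 1 >= total_pages:
            break
    return events


# Stand-in for the network call: two fake pages in the Discovery API response shape
def fake_fetch_page(page_number):
    pages = [
        {"_embedded": {"events": [{"name": "A"}, {"name": "B"}]},
         "page": {"number": 0, "totalPages": 2}},
        {"_embedded": {"events": [{"name": "C"}]},
         "page": {"number": 1, "totalPages": 2}},
    ]
    return pages[page_number]


all_events = collect_events(fake_fetch_page)
print([event["name"] for event in all_events])
```

The `max_pages` cap keeps the number of requests bounded, which matters when an API key has a daily quota.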


<img src="Assets/discogs.png" alt="Discogs Logo" width="400"/>

In [24]:
# Load environment variables
discogs_consumer_key = os.getenv('DISCOGS_CONSUMER_KEY')
discogs_consumer_secret = os.getenv('DISCOGS_CONSUMER_SECRET')

# Parameters for Discogs API
params = {
    "key": discogs_consumer_key,
    "secret": discogs_consumer_secret
}

In [25]:
# Step 1: Search for The Beatles and get their artist_id
def search_artist(artist_name):
    search_url = "https://api.discogs.com/database/search"
    params = {
        "q": artist_name,
        "type": "artist",
        "key": discogs_consumer_key,
        "secret": discogs_consumer_secret
    }
    
    response = requests.get(search_url, params=params)
    
    if response.status_code == 200:
        results = response.json().get("results", [])
        for result in results:
            if result["title"].lower() == artist_name.lower():
                artist_id = result["id"]
                # Only print the title in blue and bold
                print(f"\033[1;34mArtist ID for '{artist_name}'\033[0m: {artist_id}")
                return artist_id  # Return the artist ID so it doesn't get printed again
        print(f"Could not find artist '{artist_name}'.")
    else:
        print(f"Error: {response.status_code} - {response.text}")
    return None

# Example usage
artist_name = "The Beatles"
artist_id = search_artist(artist_name)  # This will only print the artist ID once

Artist ID for 'The Beatles': 82730


In [26]:
# Step 2: Get Artist Information
def get_artist_information(artist_id):
    url = f"https://api.discogs.com/artists/{artist_id}"
    response = requests.get(url)
    
    if response.status_code == 200:
        artist_info = response.json()
        artist_data = {
            "Artist Name": artist_info.get("name"),
            "Profile": artist_info.get("profile"),
            "Members": [member["name"] for member in artist_info.get("members", [])] if "members" in artist_info else "N/A",
            "Genres": artist_info.get("genres", []),
            "Styles": artist_info.get("styles", []),
            "URLs": artist_info.get("urls", [])
        }
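        return pd.DataFrame([artist_data])  # Return the DataFrame for use outside the function

    # On failure, report the status and return an empty DataFrame instead of None
    print(f"Error fetching artist information (Status Code: {response.status_code})")
    return pd.DataFrame()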

artist_df = get_artist_information(artist_id)

print("\033[1;34mArtist Information for The Beatles:\033[0m")
artist_df

Artist Information for The Beatles:


Unnamed: 0,Artist Name,Profile,Members,Genres,Styles,URLs
0,The Beatles,"Emerging from Liverpool, England in 1960, the ...","[Paul McCartney, John Lennon, George Harrison,...",[],[],"[https://www.thebeatles.com/, https://www.brit..."


In [27]:
# Step 3: Get Artist Releases 

# Function to get Release Rating
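def get_release_rating(release_url):
    # Pass the Discogs credentials explicitly rather than relying on the global `params`,
    # which is also assigned in the Ticketmaster section
    auth_params = {"key": discogs_consumer_key, "secret": discogs_consumer_secret}
    release_details = requests.get(release_url, params=auth_params).json()
    # Extract rating data if available
    rating = release_details.get('community', {}).get('rating', {}).get('average', 'No rating available')
    return rating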


def get_artist_releases(artist_id, num_releases=5):
    releases_url = f"https://api.discogs.com/artists/{artist_id}/releases"
    releases_data = []
    page = 1
    while True:
        release_params = {
            "page": page,
            "per_page": num_releases  # Limit to the number of releases you want to fetch
        }
        response = requests.get(releases_url, params=release_params)
        if response.status_code == 200:
            releases = response.json().get("releases", [])
            for release in releases:
                # Get the rating for the release
                rating = get_release_rating(release.get("resource_url"))
                
                # Append release data along with rating
                releases_data.append({
                    "Title": release.get("title"),
                    "Format": release.get("format"),
                    "Year": release.get("year"),
                    "Label": release.get("label"),
                    "Type": release.get("type"),
                    "Resource URL": release.get("resource_url"),
                    "User Rating": rating  # Add the rating as a new column
                })
            
            # Break if we've collected the desired number of releases
            if len(releases_data) >= num_releases:
                break
            page += 1
        else:
            print(f"Error fetching releases (Status Code: {response.status_code})")
            break
    
    releases_df = pd.DataFrame(releases_data)  # Create DataFrame
    return releases_df  # Return the releases DataFrame

# Get artist releases (limit to first 5 releases)
releases_df = get_artist_releases(artist_id, num_releases=5)

# Print releases information with ratings
print("\033[1;34mReleases Information for The Beatles:\033[0m")
releases_df

Releases Information for The Beatles:


Unnamed: 0,Title,Format,Year,Label,Type,Resource URL,User Rating
0,Hello Little Girl / Till There Was You,"Acetate, 10"", Mono",1962,Personal Recording Department,release,https://api.discogs.com/releases/9833515,4.67
1,My Bonnie,,1962,,master,https://api.discogs.com/masters/119930,No rating available
2,Love Me Do,,1962,,master,https://api.discogs.com/masters/1154826,No rating available
3,Please Please Me,"7"", EP",1963,Parlophone,release,https://api.discogs.com/releases/5503823,4.88
4,Boys / Love Me Do,"7"", Single, Jukebox, Ora",1963,Odeon,release,https://api.discogs.com/releases/6531153,4.75
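
`get_artist_releases` issues one extra request per release to fetch its rating, so larger values of `num_releases` can run into Discogs rate limiting. A hedged sketch of a pause check based on the `X-Discogs-Ratelimit-Remaining` response header; the helper name, threshold, and pause length are assumptions, and the header values below are hand-written:

```python
import time


def pause_if_rate_limited(headers, min_remaining=2, pause_seconds=60, sleep=time.sleep):
    """Sleep when the remaining-request header drops to the threshold; return True if paused."""
    remaining = headers.get("X-Discogs-Ratelimit-Remaining")
    if remaining is None:
        return False
    if int(remaining) <= min_remaining:
        sleep(pause_seconds)
        return True
    return False


# Illustration with hand-written header values; a list's append stands in for time.sleep
naps = []
print(pause_if_rate_limited({"X-Discogs-Ratelimit-Remaining": "1"}, sleep=naps.append))
print(pause_if_rate_limited({"X-Discogs-Ratelimit-Remaining": "40"}, sleep=naps.append))
```

In the notebook this could be called with `response.headers` after each `requests.get` to Discogs.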


In [28]:
# Step 4: Get Artist Information with Label Data
def get_artist_label_info(artist_id, num_releases=5):
    url = f"https://api.discogs.com/artists/{artist_id}/releases"
    labels = []
    page = 1

    while len(labels) < num_releases:
        response = requests.get(url, params={"page": page, "per_page": min(num_releases - len(labels), 50)})
        if response.status_code != 200:
            print(f"Error fetching releases (Status Code: {response.status_code})")
            break

        for release in response.json().get("releases", []):
            release_details = requests.get(release["resource_url"]).json()
            # Take the first label if multiple; fall back to {} when the list is missing or empty
            label_info = (release_details.get("labels") or [{}])[0]

            labels.append({
                "Release Title": release.get("title", "Unknown Title"),
                "Label Name": label_info.get("name", "Unknown Label"),
                "Catalog Number": label_info.get("catno", "N/A"),
                "Year": release_details.get("year", "Unknown Year")
            })
            if len(labels) >= num_releases:
                break

        page += 1

    return pd.DataFrame(labels, columns=["Release Title", "Label Name", "Catalog Number", "Year"])

# Get label info for The Beatles (limit to first 10 releases)
labels_df = get_artist_label_info(artist_id, num_releases=10)

# Print label information for multiple releases
print("\033[1;34mLabel Information for The Beatles (First 10 Releases):\033[0m")
labels_df

Label Information for The Beatles (First 10 Releases):


Unnamed: 0,Release Title,Label Name,Catalog Number,Year
0,Hello Little Girl / Till There Was You,Personal Recording Department,none,1962
1,My Bonnie,Unknown Label,,1961
2,Love Me Do,Unknown Label,,1962
3,Please Please Me,Parlophone,LMEP 1173,1963
4,Boys / Love Me Do,Odeon,SO 10108,1963
5,Please Mr Postman,Odeon,25006,1963
6,Till There Was You,Odeon,EPO-3,1963
7,Please Please Me,Odeon,25078,1963
8,What Goes On,Dick James Music Ltd.,none,1963
9,P.S. I Love You,Parlophone,PAL-60273,1963


## Multi-API Integration

<img src="Assets/multi_api.png" alt="Multi-API Integration" width="600"/>

In [29]:
def get_popular_artists_in_la(size=100):
    # Search for music events in Los Angeles
    events_data = search_events(city="Los Angeles", classification_name="music", size=size)
    
    # Extract artist names from events
    artist_names = []
    for event in events_data.get('_embedded', {}).get('events', []):
        for attraction in event.get('_embedded', {}).get('attractions', []):
            artist_name = attraction.get('name', 'Unknown Artist')
            if artist_name not in artist_names:
                artist_names.append(artist_name)
    
    return artist_names[:size]  # Limit to specified number of artists

# Example usage to get popular artist names in Los Angeles
popular_artists_la = get_popular_artists_in_la(size=5)
print("\033[1;34mPopular Artists Performing in Los Angeles:\033[0m", popular_artists_la)


Popular Artists Performing in Los Angeles: ['Movements', 'Citizen', 'Scowl', 'Downward', 'Los Temerarios']


In [30]:
def get_discogs_artist_id(artist_name):
    search_url = "https://api.discogs.com/database/search"
    search_params = {
        "q": artist_name,
        "type": "artist",
        "key": discogs_consumer_key,
        "secret": discogs_consumer_secret
    }
    response = requests.get(search_url, params=search_params)
    
    if response.status_code == 200:
        results = response.json().get('results', [])
        
        # First, look for exact match
        exact_matches = [result for result in results if result.get('title', '').lower() == artist_name.lower()]
        if exact_matches:
            return exact_matches[0].get('id')  # Return ID of the first exact match
        
        # If no exact match, try fuzzy matching
        titles = [result.get('title') for result in results]
        close_matches = get_close_matches(artist_name, titles, n=1, cutoff=0.6)
        if close_matches:
            matched_title = close_matches[0]
            matched_result = next((result for result in results if result.get('title') == matched_title), None)
            if matched_result:
                return matched_result.get('id')  # Return ID of the best fuzzy match

    print(f"No results found for {artist_name}")
    return None

# Example usage with artist names
artist_ids = {artist: get_discogs_artist_id(artist) for artist in popular_artists_la}
print("\033[1;34mArtist IDs:\033[0m", artist_ids)

Artist IDs: {'Movements': 1386947, 'Citizen': 7829, 'Scowl': 1021223, 'Downward': 6137569, 'Los Temerarios': 2161789}
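
Name-based matching can pick the wrong artist when several Discogs entries share a name. Discogs disambiguates such entries with a numeric suffix, as in `Citizen (3)`, which an exact string comparison never matches. A sketch of stripping that suffix before comparing, with a stricter cutoff than the 0.6 used above; both helper names and the candidate titles are illustrative:

```python
import re
from difflib import get_close_matches


def strip_discogs_suffix(title):
    """Remove a trailing numeric disambiguator such as ' (3)' from a Discogs artist title."""
    return re.sub(r"\s*\(\d+\)$", "", title).strip()


def best_title_match(artist_name, titles, cutoff=0.8):
    """Return the original title whose cleaned, lowercased form best matches artist_name."""
    cleaned = {}
    for title in titles:
        # Keep the first title seen for each cleaned name
        cleaned.setdefault(strip_discogs_suffix(title).lower(), title)
    matches = get_close_matches(artist_name.lower(), list(cleaned), n=1, cutoff=cutoff)
    return cleaned[matches[0]] if matches else None


# Hand-written candidate titles for illustration (not real search results)
candidates = ["Citizen (3)", "Citizen Cope", "Citizens!"]
print(best_title_match("Citizen", candidates))
```

Even with this cleanup, a name match does not confirm identity, so the returned IDs are worth spot-checking against genre or profile text before the datasets are joined.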


In [31]:
# Step 1: Get popular artists in Los Angeles (Ticketmaster API)
def get_popular_artists_in_la(size=10):
    search_url = "https://app.ticketmaster.com/discovery/v2/events"
    search_params = {
        "apikey": master_api_key,
        "city": "Los Angeles",
        "classificationName": "music",
        "size": size
    }
    response = requests.get(search_url, params=search_params)
    
    artist_names = []
    if response.status_code == 200:
        events_data = response.json().get('_embedded', {}).get('events', [])
        for event in events_data:
            for attraction in event.get('_embedded', {}).get('attractions', []):
                artist_name = attraction.get('name', 'Unknown Artist')
                if artist_name not in artist_names:
                    artist_names.append(artist_name)
    
    return artist_names[:size]

# Step 2: Get Discogs artist ID and additional info (Discogs API)
def get_discogs_artist_info(artist_name):
    search_url = "https://api.discogs.com/database/search"
    search_params = {
        "q": artist_name,
        "type": "artist",
        "key": discogs_consumer_key,
        "secret": discogs_consumer_secret
    }
    response = requests.get(search_url, params=search_params)
    
    if response.status_code == 200:
        results = response.json().get('results', [])
        
        # Exact match check first
        for result in results:
            if result.get("title", "").strip().lower() == artist_name.lower():
                return fetch_discogs_artist_details(result["id"])
        
        # Fuzzy matching if no exact match is found
        titles = [result.get('title') for result in results]
        close_matches = get_close_matches(artist_name, titles, n=1, cutoff=0.6)
        if close_matches:
            matched_title = close_matches[0]
            matched_result = next((result for result in results if result.get('title') == matched_title), None)
            if matched_result:
                return fetch_discogs_artist_details(matched_result.get("id"))
    
    print(f"No results found for {artist_name}")
    return {"Discogs ID": None, "Profile (Discogs)": "N/A"}

# Fetch Discogs artist details and full profile
def fetch_discogs_artist_details(artist_id):
    artist_url = f"https://api.discogs.com/artists/{artist_id}"
    response = requests.get(artist_url)
    
    if response.status_code == 200:
        artist_data = response.json()
        return {
            "Discogs ID": artist_id,
            "Profile (Discogs)": artist_data.get('profile', 'N/A')
        }
    return {"Discogs ID": artist_id, "Profile (Discogs)": "N/A"}

# Step 3: Get Spotify access token
def get_spotify_access_token(client_id, client_secret):
    auth_url = 'https://accounts.spotify.com/api/token'
    auth_header = {
        'Authorization': f'Basic {base64.b64encode((client_id + ":" + client_secret).encode()).decode()}'
    }
    auth_data = {
        'grant_type': 'client_credentials'
    }
    auth_response = requests.post(auth_url, data=auth_data, headers=auth_header)
    return auth_response.json().get('access_token')

# Step 4: Get Spotify artist information
def get_spotify_artist_info(artist_name, access_token):
    search_url = "https://api.spotify.com/v1/search"
    headers = {
        "Authorization": f"Bearer {access_token}"
    }
    params = {
        "q": artist_name,
        "type": "artist",
        "limit": 1
    }
    response = requests.get(search_url, headers=headers, params=params)
    
    if response.status_code == 200:
        artist_data = response.json().get('artists', {}).get('items', [])
        if artist_data:
            artist = artist_data[0]
            return {
                "Genre (Spotify)": ", ".join(artist.get('genres', [])),
                "Popularity (Spotify)": artist.get('popularity'),
                "Followers (Spotify)": artist.get('followers', {}).get('total')
            }
    print(f"Spotify data not found for {artist_name}")
    return {"Genre (Spotify)": "N/A", "Popularity (Spotify)": "N/A", "Followers (Spotify)": "N/A"}

# Step 5: Get additional Ticketmaster event information
def get_ticketmaster_event_info(artist_name, city="Los Angeles"):
    search_url = "https://app.ticketmaster.com/discovery/v2/events"
    params = {
        "apikey": master_api_key,
        "keyword": artist_name,
        "city": city,
        "size": 1  # Only get the first event as an example
    }
    response = requests.get(search_url, params=params)
    
    if response.status_code == 200:
        events_data = response.json().get('_embedded', {}).get('events', [])
        if events_data:
            event = events_data[0]
            venue_info = event['_embedded']['venues'][0] if '_embedded' in event and 'venues' in event['_embedded'] else {}
            return {
                "Event Date (Ticketmaster)": event['dates']['start'].get('localDate', 'N/A'),
                "Venue (Ticketmaster)": venue_info.get('name', 'N/A'),
                "Location (Ticketmaster)": f"{venue_info.get('city', {}).get('name', 'N/A')}, {venue_info.get('state', {}).get('name', 'N/A')}",
                "Ticket Availability (Ticketmaster)": event.get('ticketLimit', {}).get('info', 'N/A')
            }
    print(f"No event data found on Ticketmaster for {artist_name}")
    return {
        "Event Date (Ticketmaster)": "N/A",
        "Venue (Ticketmaster)": "N/A",
        "Location (Ticketmaster)": "N/A",
        "Ticket Availability (Ticketmaster)": "N/A"
    }

# Step 6: Combine data from all sources
def get_combined_artist_info(size=5):
    popular_artists = get_popular_artists_in_la(size)
    spotify_token = get_spotify_access_token(client_id, client_secret)
    
    combined_data = []
    for artist_name in popular_artists:
        discogs_info = get_discogs_artist_info(artist_name)
        spotify_info = get_spotify_artist_info(artist_name, spotify_token)
        ticketmaster_info = get_ticketmaster_event_info(artist_name)
        
        combined_data.append({
            "Artist Name": artist_name,
            **discogs_info,
            **spotify_info,
            **ticketmaster_info
        })
    
    return pd.DataFrame(combined_data)

# Fetch and display combined data
artist_info_df = get_combined_artist_info(size=5)
print("\033[1;34mCombined Artist Information from Ticketmaster, Discogs, and Spotify:\033[0m")
artist_info_df


Combined Artist Information from Ticketmaster, Discogs, and Spotify:


Unnamed: 0,Artist Name,Discogs ID,Profile (Discogs),Genre (Spotify),Popularity (Spotify),Followers (Spotify),Event Date (Ticketmaster),Venue (Ticketmaster),Location (Ticketmaster),Ticket Availability (Ticketmaster)
0,Movements,1386947,,"alternative emo, dreamo, emo, midwest emo, pop...",59,287889,2025-04-06,The Torch at LA Coliseum,"Los Angeles, California",There is an overall 8 ticket limit for this ev...
1,Citizen,7829,,gymcore,63,471139,2025-04-06,The Torch at LA Coliseum,"Los Angeles, California",There is an overall 8 ticket limit for this ev...
2,Scowl,1021223,,"california hardcore, modern hardcore",47,77130,2025-04-06,The Torch at LA Coliseum,"Los Angeles, California",There is an overall 8 ticket limit for this ev...
3,Downward,6137569,,"diy emo, grungegaze, tulsa indie",30,13115,2025-04-06,The Torch at LA Coliseum,"Los Angeles, California",There is an overall 8 ticket limit for this ev...
4,Los Temerarios,2161789,,"grupera, musica mexicana",80,5627575,2024-12-07,BMO Stadium,"Los Angeles, California",There is an overall 8 ticket limit for this ev...
