**Assignment: A MongoDB JSON Document Database using Spotify API**

#Batch Processing using MongoDB and the Spotify API

Batch data processing involves collecting, storing, and processing data in large groups or "batches" rather than handling data in real-time as it comes in. This approach allows for more efficient handling of data, especially when dealing with large volumes of information that don’t need to be processed immediately. In this assignment, you will work with both MongoDB, a NoSQL database, and the Spotify API to implement batch processing techniques that involve collecting and analyzing music-related data.

These are the steps you will need to follow:
*   [1 - Install dependencies](#1)
*   [2 - Create an Atlas Client on MongoDB](#2)
*   [3 - Create an Spotify APP](#3)
*   [4 - Connect to your app using Spotify's SDK](#4)
*   [5 - Get new releases data from Spotify API](#5)
*   [6 - Explore your MongoDB collection](#6)
*   [7 - Get all albums from the featured Artists](#7)
*   [8 - Create New MongoDB collection](#8)
*   [9 - Explore your data!](#9)
*   [10 - Create an iteractive map using Folium!](#10)

**IMPORTANT!!!!**
## During the course of this assignment, you will encounter the word `None` in several places. Each time you see `None` replace it with the appropriate variable, method, string, or value for that specific code snippet—unless the `None` is used as a return value to indicate the absence of a result. In this case, `None` is intentionally returned to signify that that a result could not be obtained.

# **Name of Students:**

Yuvraj Lal - 3107753

Harjoban Singh Jawanda - 3108304

<a name='1'></a>
#1 - Install spotify sdk and pymongo in your Google Colab env

In [13]:
!pip install spotipy pymongo --upgrade



<a name='2'></a>
#2 - Create an Atlas Client on MongoDB

To sign up for a free MongoDB account, go to https://mongodb.com, then create a new free account. Once your account is set up, you will be taken to the screen to create your cluster. Use the default settings for their free Atlas cluster (MO, as they refer to it) and click Create Cluster to get started. This will take you to the Clusters page so you can begin creating your new cluster, which takes several minutes.

###Create your Database User and whitelist your IP address
Next, in the Atlas tab Security Quickstart, you will need to complete additional steps to get up and running:

*	Add your username and password, then click Create User—This enables you to log into your cluster.
*	Keep My Local Environment—This means adding your network IP addresses to the IP Access List. This can be modified at any time.
*	Click on Add My Current IP Address—This is a security measure that ensures only the IP addresses you verify are allowed to interact with your cluster. To connect to this cluster from multiple locations (school, home, work, etc.), you will need to whitelist each IP address from which you intend to connect.
Finally, click on Finish and Close.

###Connect to your Cluster

Go to Databases. Click Connect to continue. Connecting to a MongoDB Atlas database from Python requires a connection string. To get your connection string, click **Connect Your Application**. In **Select your driver and version**, choose Python 3.6 or later. Your connection string will appear below in **Add your connection string into your application code**. Click COPY to copy the string. Paste this string into the keys.py file as mongo_connection_string’s value. Replace “<PASSWORD>” in the connection string with your password, and replace the database name “myFirstDatabase” with “mySpotifyDatabase”,” which will be the database name in this assignment. At the bottom of the Connect to YourClusterName, click Close. You are now ready to interact with your Atlas cluster.


In [14]:
MONGO_STRING = "mongodb+srv://yuvilal1:test123@midterm1.dwdii.mongodb.net/?retryWrites=true&w=majority&appName=Midterm1"

In [15]:
from pymongo import MongoClient
#START YOUR CODE HERE
atlas_client = MongoClient(MONGO_STRING)   #Pass your cluster connection string to the client method
#END YOUR CODE HERE

In [17]:
#START YOUR CODE HERE
database = atlas_client["Spotify"]                #Create a database object and name it for your atlas_client
featured_albums_collection.drop()
featured_albums_collection = database["Albums"]      #Select a name for your collection
#END YOUR CODE HERE

<a name='3'></a>
#3 - Create a Spotify APP

To get access to Spotify's API resources, you need to create a Spotify account if you don't already have one. A trial account will be enough to complete this lab.

1. Go to https://developer.spotify.com/, create an account and log in.
2. Click on the account name in the right-top corner and then click on **Dashboard**.
3. Create a new APP using the following details:
   - App name: You can choose the name, make sure you select only use an alphanumeric string without special characters
   - App description: `DBMS test API application`
   - Website: leave empty
   - Redirect URIs: `http://localhost:6000`
   - API to use: select `Web API`
4. Click on **Save** button. If you get an error message saying that your account is not ready, you can log out, wait for a few minutes and then repeat again steps 2-4.
5. In the App Home page click on **Settings** and reveal `Client ID` and `Client secret`. Make sure you copy those and save them in a separated file!


Here's the link to [the Spotify API documentation](https://developer.spotify.com/documentation/web-api/tutorials/getting-started) that you can refer to while you're working on this assignment.

ClientID = a681cef0e6c04ca4989d6661304c8a0f

ClientSecret = e1d2249870614e3292f89f36a28cb4df

<a name='4'></a>
#4 - Create a Spotify ClientCredential object using the SDK

The Spotipy SDK is a Python client for interacting with Spotify’s Web API. It provides a range of functions to access and manage data related to artists, albums, tracks, playlists, and user profiles. Here’s an overview of some key capabilities that you will explore in this assignment:





*   **Accessing Artist Information**: With Spotipy, you can retrieve detailed information about artists, including their name, genres, popularity score, and followers. The SDK also allows access to an artist's top tracks and related artists, which can help students explore music trends and build up artist profiles for batch storage in MongoDB.

*   **Track and Album Metadata**: Spotipy enables access to metadata for tracks and albums, such as track name, album name, release date, and track popularity. Additionally, you can retrieve audio features like tempo, danceability, and energy, which provide in-depth details about the music and are valuable for data analysis.

*   **Searching for Content**: Using Spotipy’s search functionality, you can query the Spotify catalog by keywords for artists, albums, playlists, or tracks. This can be instrumental in batch processing, as users can search for multiple artists or songs and gather relevant data in one go.

*   **User Profile and Playlist Management**: Spotipy also supports accessing Spotify user profiles and playlists, though this is less relevant for the assignment. However, this feature could provide additional context or personalization if students wanted to explore user-based music preferences.


*   **Authorization and Access Control**: Spotipy handles authorization with Spotify’s OAuth, ensuring that only authenticated requests are made. This allows students to securely access data and manage the rate limits associated with the Spotify API.

In [18]:
import spotipy
import pandas as pd
from spotipy.oauth2 import SpotifyClientCredentials

The first step in working with an API is understanding its authentication process. For Spotify, this involves using a Client ID and Client Secret generated by the Spotify app to obtain an access token. The access token is a string containing the credentials and permissions required to access specific resources. For more information, refer to the Spotify [API documentation](https://developer.spotify.com/documentation/web-api/concepts/access-token).

Since each API is designed with unique features, it’s essential to review its documentation thoroughly to access data responsibly. Throughout this lab, you’ll find links to documentation; it’s recommended to review these during and after the session as needed.

Now, let’s create variables to store the client_id and client_secret values.

In [19]:
CLIENT_ID = 'a681cef0e6c04ca4989d6661304c8a0f'     #Include your client ID here
CLIENT_SECRET = 'e1d2249870614e3292f89f36a28cb4df' #Include your client Secret here

In [20]:
credentials = SpotifyClientCredentials(
        client_id=CLIENT_ID,
        client_secret=CLIENT_SECRET
    )

spotify = spotipy.Spotify(client_credentials_manager=credentials, language='en')  #You can change this if you want to get data from a different lenguage

When working with the Spotify API, you'll receive a temporary access token, with its validity period specified in the `expires_in` field (in seconds). Once this token expires, any subsequent requests will fail and return an error with a status code of 401, indicating that the request is unauthorized.

For each API request you send to Spotify, you need to include the access token in the request’s authorization header. The get_auth_header function is provided to streamline this process. It takes the access token as input and returns a properly formatted authorization header, which you can then include in your API requests.

**If you get an 401 response, please make sure to create your access token again by executing the code below!**

In [21]:
credentials.get_access_token()

  credentials.get_access_token()


{'access_token': 'BQDCOtJ8lUgGNFsUPrlrAjakbndKDcRWG5pEzActrrneTxYKEF6q-KlIjIB7L2vmeIy4BkP0J6zlGrZriE39ccyvvvVaS9TqQbFh1AmHsY0CkN9IhuQ',
 'token_type': 'Bearer',
 'expires_in': 3600,
 'expires_at': 1732593957}

The above token contains the expiration (in seconds) of the token. Once the token expires, you will need to create a new one.

<a name='5'></a>
#5 - Get new releases data from Spotify API

Select one of the following country codes to fetch data from Spotify based on the country of your choice:

* AU: Australia
* AT: Austria
* BE: Belgium
* BO: Bolivia
* BR: Brazil
* BG: Bulgaria
* CA: Canada
* CL: Chile
* CO: Colombia
* CR: Costa Rica
* CY: Cyprus
* DO: Dominican Republic
* FI: Finland
* FR: France
* DE: Germany
* GT: Guatemala
* HN: Honduras
* HK: Hong Kong
* IE: Ireland
* IT: Italy
* JP: Japan
* LV: Latvia
* LU: Luxembourg
* MY: Malaysia
* MT: Malta
* MX: Mexico
* MC: Monaco
* NL: Netherlands
* NZ: New Zealand
* NI: Nicaragua
* PY: Paraguay
* PE: Peru
* PH: Philippines
* PL: Poland
* PT: Portugal
* SG: Singapore
* ES: Spain
* SK: Slovakia
* SE: Sweden
* CH: Switzerland
* TW: Taiwan
* TR: Turkey
* GB: United Kingdom
* US: United States
* UY: Uruguay


**Your task**:
*   Select one country from the list above for which you will retrieve data from the Spotify API.
*   Use the limit parameter to specify the number of records you want to retrieve. Default: 20. Minimum: 1. Maximum: 50


In [22]:
#START YOUR CODE HERE
COUNTRY_CODE = 'CA'
LIMIT = 20
#END YOUR CODE HERE

Now, let's use the token to perform a request to access the first resource, which is the [new_releases](https://spotipy.readthedocs.io/en/2.22.1/?highlight=featured_playlists#spotipy.client.Spotify.new_releases).

**Your tasks**:


1.   Look at the link above and make the correct call to the Spotify end-point and get the new_releases and store them in the `featured_albums`.
2.   Loop through the response and store each record (album) into your MongoDB collection you created above. HINT: You should explore the `featured_albums` response to understand how it is structured, also check the [mongodb doc](https://www.mongodb.com/docs/manual/reference/method/db.collection.insertOne/?msockid=2c010af6d0b963db3ebe1e3ed1496248)



In [24]:
#START YOUR CODE HERE
featured_albums = spotify.new_releases(country=COUNTRY_CODE, limit=LIMIT)
#featured_albums
for album in featured_albums['albums']['items']:
    featured_albums_collection.insert_one(album)

#END YOUR CODE HERE

print (album)
print (len(album))


{'album_type': 'single', 'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/16oZKvXb6WkQlVAjwo2Wbg'}, 'href': 'https://api.spotify.com/v1/artists/16oZKvXb6WkQlVAjwo2Wbg', 'id': '16oZKvXb6WkQlVAjwo2Wbg', 'name': 'The Lumineers', 'type': 'artist', 'uri': 'spotify:artist:16oZKvXb6WkQlVAjwo2Wbg'}], 'available_markets': ['AR', 'AU', 'AT', 'BE', 'BO', 'BR', 'BG', 'CA', 'CL', 'CO', 'CR', 'CY', 'CZ', 'DK', 'DO', 'DE', 'EC', 'EE', 'SV', 'FI', 'FR', 'GR', 'GT', 'HN', 'HK', 'HU', 'IS', 'IE', 'IT', 'LV', 'LT', 'LU', 'MY', 'MT', 'MX', 'NL', 'NZ', 'NI', 'NO', 'PA', 'PY', 'PE', 'PH', 'PL', 'PT', 'SG', 'SK', 'ES', 'SE', 'CH', 'TW', 'TR', 'UY', 'US', 'GB', 'AD', 'LI', 'MC', 'ID', 'JP', 'TH', 'VN', 'RO', 'IL', 'ZA', 'SA', 'AE', 'BH', 'QA', 'OM', 'KW', 'EG', 'MA', 'DZ', 'TN', 'LB', 'JO', 'PS', 'IN', 'BY', 'KZ', 'MD', 'UA', 'AL', 'BA', 'HR', 'ME', 'MK', 'RS', 'SI', 'KR', 'BD', 'PK', 'LK', 'GH', 'KE', 'NG', 'TZ', 'UG', 'AG', 'AM', 'BS', 'BB', 'BZ', 'BT', 'BW', 'BF', 'CV', 'CW', 'DM'

<a name='6'></a>
#6 - Explore your MongoDB collection

This script will connect to the MongoDB collection, query for specific fields (artist ID, name, and URI), and load the data into a list.

**Your tasks:**


1.   Check the following [link](https://www.mongodb.com/docs/manual/reference/method/db.collection.find/) to explore how to use the find() method to query specific fields from your collection.  Ensure that your query retrieves only the artists id, name and uri from your collection. Make sure to read the documentation.
2.   Once you get the data from your query, you should create a pandas DataFrame with the results. You should find a way to combine all records into the `artists_data` dictionary.



In [25]:
artists_data = []
#START YOUR CODE HERE
for album in featured_albums_collection.find({},
    {   "artists.id": 1,
        "artists.name": 1,
        "artists.uri": 1,
        "_id": 0}):
    for artist in album.get("artists", []):
        artist_info = {
            "artist_id": artist["id"],
            "artist_name": artist["name"],
            "artist_uri": artist["uri"]
        }
        artists_data.append(artist_info)
    #END YOUR CODE HERE
artists_df = pd.DataFrame(artists_data)
artists_df

Unnamed: 0,artist_id,artist_name,artist_uri
0,06HL4z0CvFAxyc27GXpf02,Taylor Swift,spotify:artist:06HL4z0CvFAxyc27GXpf02
1,1w5Kfo2jwwIPruYS2UWh56,Pearl Jam,spotify:artist:1w5Kfo2jwwIPruYS2UWh56
2,5Vuvs6Py2JRU7WiFDVsI7J,Lucky Daye,spotify:artist:5Vuvs6Py2JRU7WiFDVsI7J
3,540vIaP2JwjQb9dm3aArA4,DJ Snake,spotify:artist:540vIaP2JwjQb9dm3aArA4
4,12GqGscKJx3aE4t07u7eVZ,Peso Pluma,spotify:artist:12GqGscKJx3aE4t07u7eVZ
5,75JvBeqW4BJ4xgnbMAq6MN,Anne Wilson,spotify:artist:75JvBeqW4BJ4xgnbMAq6MN
6,20qISvAhX20dpIbOOzGK3q,Nas,spotify:artist:20qISvAhX20dpIbOOzGK3q
7,6GEykX11lQqp92UVOQQCC7,DJ Premier,spotify:artist:6GEykX11lQqp92UVOQQCC7
8,2FXC3k01G6Gw61bmprjgqS,Hozier,spotify:artist:2FXC3k01G6Gw61bmprjgqS
9,7A0awCXkE1FtSU8B0qwOJQ,Jamie xx,spotify:artist:7A0awCXkE1FtSU8B0qwOJQ


<a name='7'></a>
#7 - Get all albums from the featured Artists

When we used the `new_releases` method, we actually queried all new relases (based on the parameters you picked) from the Spotify API. This allowed us to save multiple documents (each beign a single released) in an object called `artists_data`. Now your job is to retrieve every single album from the list of artists you got from the `new_releases` method.

Your tasks:


1.   Loop through the `artists_data` object and get the `artist_uri` for each artist. This value will be required for you to call the `artist_albums` method and get all `albums` from that `artist_uri`. Learn more about [artist_albums](https://spotipy.readthedocs.io/en/2.22.1/?highlight=featured_playlists#spotipy.client.Spotify.artist_albums) and [artist_uri](https://spotipy.readthedocs.io/en/2.22.1/?highlight=featured_playlists#ids-uris-and-urls) by clicking the links
2.   You should create a temporal variable to store the results from the `artist_albums` method. Furthermore, you should only store the `items` key from the results inside a variable called `albums`
3.   We want to create a new list with all the different albums that an artist has, to do this, you will first add a new key named `artist_name` that will contain the `artist_name` that you got from the `artist_albums` method.
4.   Join each album inside the `artists_albums` list.
5.   Spotify API works with something called "pagination". Pagination means that within the string response from the API, there will be another set of results contained in the `next` key. This allows us to create consecutive requests from the same element. Your job is to use the `next` [method](https://spotipy.readthedocs.io/en/2.22.1/?highlight=featured_playlists#spotipy.client.Spotify.next) to get the next albums from a given artist. Do not forget to include the `artist_name` just as you did in step 3.



In [27]:
artists_albums = []

#START YOUR CODE HERE
for artist in artists_data:
    results = spotify.artist_albums(artist['artist_uri'], album_type='album')
    #print(results)
    albums = results['items']
# print(albums)
# Add artist's name to each album in the initial results
    for album in albums:
        album['artist_name'] = artist['artist_name']
        artists_albums.append(album)

    # Loop through paginated results, adding artist's name
    while results['next']:
        results = spotify.next(results)
        for album in results['items']:
            album['artist_name'] = artist['artist_name']
            artists_albums.append(album)
    #END YOUR CODE HERE

print(f"Total albums retrieved: {len(artists_albums)}")

Total albums retrieved: 424


<a name='8'></a>
#8 - Create New MongoDB collection

Now that you have your new object with all artists' albums, you will need to create a new Collection in your MongoDB cluster. Use the data you created above to store those in a new MongoDB collection.

Remember to look at this [documentation](https://www.mongodb.com/docs/manual/tutorial/insert-documents/#:~:text=Collection.-,insertOne()%20inserts%20a%20single%20document%20into%20a%20collection.,value%20to%20the%20new%20document) to learn more about MongoDB. Also, DO NOT FORGET to go and verify that the data is in your MongoDB cluster.

In [31]:
#START YOUR CODE HERE
database = atlas_client["Spotify"]       #Select the name of your database
albums_collection.drop()
albums_collection = database["artist_albums"]  #Select the name of your collection

for album in artists_albums:
    albums_collection.insert_one(album)     #Insert the data into MongoDB
#END YOUR CODE HERE

In [32]:
#To check if the data is added correctly
for album in albums_collection.find().limit(5):
    print(album)


{'_id': ObjectId('67453c4da77c19742702e153'), 'album_type': 'album', 'total_tracks': 31, 'available_markets': ['AR', 'AU', 'AT', 'BE', 'BO', 'BR', 'BG', 'CA', 'CL', 'CO', 'CR', 'CY', 'CZ', 'DK', 'DO', 'DE', 'EC', 'EE', 'SV', 'FI', 'FR', 'GR', 'GT', 'HN', 'HK', 'HU', 'IS', 'IE', 'IT', 'LV', 'LT', 'LU', 'MY', 'MT', 'MX', 'NL', 'NZ', 'NI', 'NO', 'PA', 'PY', 'PE', 'PH', 'PL', 'PT', 'SG', 'SK', 'ES', 'SE', 'CH', 'TW', 'TR', 'UY', 'US', 'GB', 'AD', 'LI', 'MC', 'ID', 'JP', 'TH', 'VN', 'RO', 'IL', 'ZA', 'SA', 'AE', 'BH', 'QA', 'OM', 'KW', 'EG', 'MA', 'DZ', 'TN', 'LB', 'JO', 'PS', 'IN', 'KZ', 'MD', 'UA', 'AL', 'BA', 'HR', 'ME', 'MK', 'RS', 'SI', 'KR', 'BD', 'PK', 'LK', 'GH', 'KE', 'NG', 'TZ', 'UG', 'AG', 'AM', 'BS', 'BB', 'BZ', 'BT', 'BW', 'BF', 'CV', 'CW', 'DM', 'FJ', 'GM', 'GE', 'GD', 'GW', 'GY', 'HT', 'JM', 'KI', 'LS', 'LR', 'MW', 'MV', 'ML', 'MH', 'FM', 'NA', 'NR', 'NE', 'PW', 'PG', 'WS', 'SM', 'ST', 'SN', 'SC', 'SL', 'SB', 'KN', 'LC', 'VC', 'SR', 'TL', 'TO', 'TT', 'TV', 'VU', 'AZ', 'BN', '

<a name='9'></a>
#9 - Explore your data!
You have now collected all albums from artists with new releases. Your next task is to explore and analyze this data using Python and MongoDB.

Answer the following questions based on the data in your collection:


1.   How many albums are stored in the collection?
2.   Which artist has the most albums in the collection?
3.   Which artist has the least albums in the collection?
4.   What is the average number of tracks per album? (*Include the Artist Name*)
5.   How many albums are available in each market?
6.   What is the release date of the oldest album? (*Include the Artist Name*)
7.   What are the top 5 albums with the most tracks? (*Include the Artist Name*)
8.   Which albums are available in more than 60 markets? (*Include the Artist Name*)
9.   How many albums does each artist have, and what is the average number of tracks per album for each artist?
10.  Which albums have the word "Deluxe" in their title? (*Include the Artist Name*)

For your reference, here are the MongoDB commands that will be useful for these tasks:

[Aggregate](https://www.mongodb.com/docs/manual/reference/command/aggregate/)

[Find](https://www.mongodb.com/docs/manual/reference/command/find/)


In [33]:
# Question 1:How many albums are stored in the collection? -- 424
counter = 0
for album in albums_collection.find():
    counter += 1

print (counter)

424


In [34]:
# Question 2:Which artist has the most albums in the collection? -- Artist: Taylor Swift, Albums: 58

DataAggregationPipeline = [
    {
        "$group": {"_id": "$artist_name", "album_count": {"$sum": 1}  }
    },
    {
        "$sort": {"album_count": -1}
    },
    {
        "$limit": 10
    }
]


most_albums_artist = albums_collection.aggregate(DataAggregationPipeline)


for artist in most_albums_artist:
    print(f"Artist: {artist['_id']}, Albums: {artist['album_count']}")

Artist: Taylor Swift, Albums: 58
Artist: Nas, Albums: 50
Artist: Pearl Jam, Albums: 38
Artist: Tee Grizzley, Albums: 26
Artist: Eyedress, Albums: 22
Artist: Olamide, Albums: 22
Artist: Rauw Alejandro, Albums: 16
Artist: DJ Premier, Albums: 14
Artist: The Lumineers, Albums: 14
Artist: Nicky Jam, Albums: 14


In [35]:
# Question 3:Which artist has the least albums in the collection? -- Artist: Elvie Shane, Albums: 4

LeastAlbumsPipeline = [
    { '$group': {"_id": "$artist_name", "album_count": {"$sum" : 1} } },

    { "$sort" : {"album_count": 1} },

    { "$limit" : 10}
]

least_played_artist = albums_collection.aggregate(LeastAlbumsPipeline)

for artist in least_played_artist:
  print(f"Artist: {artist['_id']}, Albums: {artist['album_count']}")


Artist: Elvie Shane, Albums: 4
Artist: Zak Abel, Albums: 6
Artist: Cigarettes After Sex, Albums: 6
Artist: Wyatt Flores, Albums: 6
Artist: DJ Snake, Albums: 6
Artist: Jamie xx, Albums: 6
Artist: Nile Rodgers, Albums: 6
Artist: Anne Wilson, Albums: 8
Artist: Trueno, Albums: 8
Artist: Tourist, Albums: 10


In [37]:
# Question 4: What is the average number of tracks per album? (Include the Artist Name) -- Artist: DJ Snake, Average Tracks: 21

AverageTracksPipeline = [
    { "$group" : {"_id": "$artist_name", "average_tracks": {"$avg": "$total_tracks"} } },
    { "$sort" : {"average_tracks": -1} }
]

average_tracks_per_album = albums_collection.aggregate(AverageTracksPipeline)

print("Average Number of Tracks per Album (by Artist):")
for artist in average_tracks_per_album:
    print(f"Artist: {artist['_id']}, Average Tracks: {artist['average_tracks']:.0f}")

Average Number of Tracks per Album (by Artist):
Artist: DJ Snake, Average Tracks: 21
Artist: Taylor Swift, Average Tracks: 20
Artist: Honey Dijon, Average Tracks: 18
Artist: Eyedress, Average Tracks: 17
Artist: Anne Wilson, Average Tracks: 17
Artist: Hozier, Average Tracks: 17
Artist: Nicky Jam, Average Tracks: 16
Artist: Tee Grizzley, Average Tracks: 16
Artist: Olamide, Average Tracks: 15
Artist: The Lumineers, Average Tracks: 15
Artist: Lucky Daye, Average Tracks: 15
Artist: DJ Premier, Average Tracks: 15
Artist: Pearl Jam, Average Tracks: 14
Artist: Nas, Average Tracks: 14
Artist: Kygo, Average Tracks: 14
Artist: Rauw Alejandro, Average Tracks: 14
Artist: Elvie Shane, Average Tracks: 14
Artist: Peso Pluma, Average Tracks: 14
Artist: Trueno, Average Tracks: 12
Artist: Jamie xx, Average Tracks: 12
Artist: Fontaines D.C., Average Tracks: 12
Artist: CKay, Average Tracks: 11
Artist: Tourist, Average Tracks: 10
Artist: Zak Abel, Average Tracks: 10
Artist: Cigarettes After Sex, Average Tra

In [38]:
for doc in albums_collection.find({}, {"total_tracks": 1, "artist_name": 1, "_id": 0}):
    print(doc)

{'total_tracks': 31, 'artist_name': 'Taylor Swift'}
{'total_tracks': 16, 'artist_name': 'Taylor Swift'}
{'total_tracks': 22, 'artist_name': 'Taylor Swift'}
{'total_tracks': 21, 'artist_name': 'Taylor Swift'}
{'total_tracks': 22, 'artist_name': 'Taylor Swift'}
{'total_tracks': 23, 'artist_name': 'Taylor Swift'}
{'total_tracks': 20, 'artist_name': 'Taylor Swift'}
{'total_tracks': 13, 'artist_name': 'Taylor Swift'}
{'total_tracks': 30, 'artist_name': 'Taylor Swift'}
{'total_tracks': 26, 'artist_name': 'Taylor Swift'}
{'total_tracks': 17, 'artist_name': 'Taylor Swift'}
{'total_tracks': 15, 'artist_name': 'Taylor Swift'}
{'total_tracks': 34, 'artist_name': 'Taylor Swift'}
{'total_tracks': 17, 'artist_name': 'Taylor Swift'}
{'total_tracks': 16, 'artist_name': 'Taylor Swift'}
{'total_tracks': 18, 'artist_name': 'Taylor Swift'}
{'total_tracks': 15, 'artist_name': 'Taylor Swift'}
{'total_tracks': 46, 'artist_name': 'Taylor Swift'}
{'total_tracks': 19, 'artist_name': 'Taylor Swift'}
{'total_trac

In [39]:
# Question 5: How many albums are available in each market

AvailableMarketsPipeline = [

        {"$unwind" : "$available_markets"},
        {"$group" : {"_id": "$available_markets", "album_count": {"$sum": 1} } },
        {"$sort" : {"album_count": -1} }

]

album_market_count = albums_collection.aggregate(AvailableMarketsPipeline)

for market in album_market_count:
    print(f"Market: {market['_id']}, Album Count: {market['album_count']}")

Market: US, Album Count: 424
Market: CA, Album Count: 422
Market: MX, Album Count: 384
Market: MO, Album Count: 380
Market: GT, Album Count: 380
Market: ID, Album Count: 380
Market: QA, Album Count: 380
Market: KW, Album Count: 380
Market: TZ, Album Count: 380
Market: JO, Album Count: 380
Market: GM, Album Count: 380
Market: EE, Album Count: 380
Market: PH, Album Count: 380
Market: GW, Album Count: 380
Market: CY, Album Count: 380
Market: AD, Album Count: 380
Market: MA, Album Count: 380
Market: CH, Album Count: 380
Market: GR, Album Count: 380
Market: TH, Album Count: 380
Market: HU, Album Count: 380
Market: NO, Album Count: 380
Market: SA, Album Count: 380
Market: LV, Album Count: 380
Market: LR, Album Count: 380
Market: LK, Album Count: 380
Market: SC, Album Count: 380
Market: AM, Album Count: 380
Market: LT, Album Count: 380
Market: PK, Album Count: 380
Market: BG, Album Count: 380
Market: CW, Album Count: 380
Market: MT, Album Count: 380
Market: KR, Album Count: 380
Market: MW, Al

In [40]:
# Question 6: What is the release date of the oldest album? (Include the Artist Name) -- Artist: Nile Rodgers, Album: Adventures In The Land Of The Good Groove, Release Date: 1983

OldestAlbumPipeline = [
    {"$sort" : {"release_date": 1} },
    {"$limit" : 1}

]

oldest_album = albums_collection.aggregate(OldestAlbumPipeline)

for album in oldest_album:
  artist_name = album["artists"][0]["name"]
  print(f"Artist: {artist_name}, Album: {album['name']}, Release Date: {album['release_date']}")

Artist: Nile Rodgers, Album: Adventures In The Land Of The Good Groove, Release Date: 1983


In [41]:
# Question 7: What are the top 5 albums with the most tracks? (Include the Artist Name)

MostTracksPipeline = [
    # Group by album ID to keep track of the albums
    {
        "$group": {
            "_id": "$name",
            "album_name": {"$first": "$name"},
            "artist_name": {"$first": "$artists"},
            "total_tracks": {"$first": "$total_tracks"}
        }
    },
    { "$sort" : {"total_tracks": -1} },
    { "$limit" : 5 }
]

most_tracks = albums_collection.aggregate(MostTracksPipeline)

print("Top 5 Albums with the Most Tracks:")
for album in most_tracks:
    artist_name = album["artist_name"][0]["name"]  # Assuming the first artist is the main artist
    album_name = album["album_name"]
    total_tracks = album["total_tracks"]

    print(f"Artist: {artist_name}, Album: {album_name}, Tracks: {total_tracks}")


Top 5 Albums with the Most Tracks:
Artist: Taylor Swift, Album: reputation Stadium Tour Surprise Song Playlist, Tracks: 46
Artist: Honey Dijon, Album: DJ-Kicks: Honey Dijon, Tracks: 35
Artist: Taylor Swift, Album: folklore: the long pond studio sessions (from the Disney+ special) [deluxe edition], Tracks: 34
Artist: Eyedress, Album: Vampire in Beverly Hills, Tracks: 34
Artist: DJ Snake, Album: Carte Blanche (Deluxe), Tracks: 32


In [42]:
AlbumsInManyMarketsPipeline = [

    {
        "$unwind": "$artists"
        },

    {
        "$unwind": "$available_markets"
        },

    {
        "$group":{
                  "_id":"$name",
                  "artist_name":{"$first":"$artist.name"},
                  "market_count":{"$sum":1}
        }
    },

    {
        "$match":{"market_count":{"$gt":60}}},

    {
        "$project":{
                  "_id":0,
                 "artist_name":1,
                 "album_name":"$_id",
                 "market_count":1
                  }
    }
]

album_with_more_than_60_markets = albums_collection.aggregate(AlbumsInManyMarketsPipeline)

print("Albums Available in More Than 60 Markets:")

for album in album_with_more_than_60_markets:

    print(f"Artist: {album['artist_name']}, Album: {album['album_name']}, Markets: {album['market_count']}")



Albums Available in More Than 60 Markets:
Artist: None, Album: Let's Play Two (Live / Original Motion Picture Soundtrack), Markets: 368
Artist: None, Album: YBNL, Markets: 370
Artist: None, Album: Skinty Fia go deo, Markets: 366
Artist: None, Album: Midnights (The Til Dawn Edition), Markets: 366
Artist: None, Album: Tha Blaqprint, Markets: 740
Artist: None, Album: My Jesus (Anniversary Deluxe), Markets: 366
Artist: None, Album: KYGO (The Remixes), Markets: 370
Artist: None, Album: Salon De La Fama, Markets: 370
Artist: None, Album: The Smartest, Markets: 370
Artist: None, Album: BIEN O MAL, Markets: 370
Artist: None, Album: Cry, Markets: 370
Artist: None, Album: Painted (Deluxe Edition), Markets: 370
Artist: None, Album: NASIR, Markets: 358
Artist: None, Album: Love Over Fear, Markets: 370
Artist: None, Album: B-Movie Matinee, Markets: 344
Artist: None, Album: Carpe Diem, Markets: 370
Artist: None, Album: Damascus, Markets: 370
Artist: None, Album: The Lumineers, Markets: 370
Artist: N

In [43]:
# Question 9 How many albums does each artist have, and what is the average number of tracks per album for each artist?

AlbumsperArtistPipeline = [
    {
        "$group": {
            "_id": "$artist_name",
            "album_count": {"$sum": 1},
            "average_tracks": {"$avg": "$total_tracks"}
        }
    },

    { "$sort" : {"album_count": -1} }
]

albums_per_artist = albums_collection.aggregate(AlbumsperArtistPipeline)

print("Albums per Artist and Average Tracks per Album:")
for artist in albums_per_artist:
    print(f"Artist: {artist['_id']}, Albums: {artist['album_count']}, Average Tracks: {artist['average_tracks']:.0f}")



Albums per Artist and Average Tracks per Album:
Artist: Taylor Swift, Albums: 58, Average Tracks: 20
Artist: Nas, Albums: 50, Average Tracks: 14
Artist: Pearl Jam, Albums: 38, Average Tracks: 14
Artist: Tee Grizzley, Albums: 26, Average Tracks: 16
Artist: Olamide, Albums: 22, Average Tracks: 15
Artist: Eyedress, Albums: 22, Average Tracks: 17
Artist: Rauw Alejandro, Albums: 16, Average Tracks: 14
Artist: DJ Premier, Albums: 14, Average Tracks: 15
Artist: The Lumineers, Albums: 14, Average Tracks: 15
Artist: Nicky Jam, Albums: 14, Average Tracks: 16
Artist: Kygo, Albums: 14, Average Tracks: 14
Artist: Lucky Daye, Albums: 12, Average Tracks: 15
Artist: Honey Dijon, Albums: 12, Average Tracks: 18
Artist: Fontaines D.C., Albums: 12, Average Tracks: 12
Artist: Peso Pluma, Albums: 12, Average Tracks: 14
Artist: Hozier, Albums: 12, Average Tracks: 17
Artist: Tourist, Albums: 10, Average Tracks: 10
Artist: CKay, Albums: 10, Average Tracks: 11
Artist: Anne Wilson, Albums: 8, Average Tracks: 17


In [44]:
# Question 10 Which albums have the word "Deluxe" in their title? (Include the Artist Name)

DeluxeAlbumsPipeline = [
    {
        "$match": {
            "name": {"$regex": "Deluxe", "$options": "i"}
        }

    },
    {
        "$project" : {
            "_id" : 0, # To exclude Id
            "artist_name" : "$artists.name", # include artist name
            "album_name" : "$name" # include album name
        }
    }
]
Word = albums_collection.aggregate(DeluxeAlbumsPipeline)

print("Albums with the Word 'Deluxe':")
for album in Word:
    print(f"Artist: {album['artist_name']}, Album: {album['album_name']}")

Albums with the Word 'Deluxe':
Artist: ['Taylor Swift'], Album: 1989 (Taylor's Version) [Deluxe]
Artist: ['Taylor Swift'], Album: evermore (deluxe version)
Artist: ['Taylor Swift'], Album: folklore: the long pond studio sessions (from the Disney+ special) [deluxe edition]
Artist: ['Taylor Swift'], Album: folklore (deluxe version)
Artist: ['Taylor Swift'], Album: 1989 (Deluxe Edition)
Artist: ['Taylor Swift'], Album: Red (Deluxe Edition)
Artist: ['Taylor Swift'], Album: Speak Now (Deluxe Edition)
Artist: ['Lucky Daye'], Album: Candydrip (Deluxe)
Artist: ['Lucky Daye'], Album: Painted (Deluxe Edition)
Artist: ['DJ Snake'], Album: Carte Blanche (Deluxe)
Artist: ['Anne Wilson'], Album: My Jesus (Anniversary Deluxe)
Artist: ['Nas'], Album: Life Is Good (Deluxe)
Artist: ['CKay'], Album: Sad Romance (Deluxe)
Artist: ['Tee Grizzley'], Album: Tee’s Coney Island (Deluxe)
Artist: ['The Lumineers'], Album: Cleopatra (Deluxe Edition)
Artist: ['The Lumineers'], Album: The Lumineers (Deluxe Edition)


<a name='10'></a>
#10 - Create an interactive map using Folium!

Folium is a Python library that simplifies the creation of interactive, visually appealing maps. It acts as a wrapper for the Leaflet.js JavaScript library, allowing users to create maps in Python without needing to write JavaScript. [Learn More](https://python-visualization.github.io/folium/latest/)

Here are a few key concepts about this library:

* **Map Initialization**: Folium provides a Map class that lets users set a central location and zoom level, initializing a map on which they can place markers or other geographic elements.
* **Adding Markers**: Folium’s Marker class lets students place icons on the map at specific locations, which can contain popups with details (like artist names and album information). This feature is key for visualizing different locations where an artist’s album is available.
* **Customization and Interactivity**: The library supports customizing marker icons, colors, and popups, making the map both interactive and visually informative. Users can click on markers to view additional information about the album or artist, which makes exploring data on a map engaging.

In this assignment, Folium will allow you to see where each album is available. Each marker represents a market (country) in which the artist’s album has been released, making it easy to see the global spread and reach of the album. By adding artist and album information to the markers, you can visually assess which artists and albums have the widest distribution.

**Geopy** is a Python library that enables geocoding—converting addresses or location names (like country codes) into latitude and longitude coordinates, which can then be used for plotting on a map.

Here’s how we will be using it:

* **Geocoding Services**: Geopy can connect to multiple geocoding providers (like Nominatim, Google Maps, etc.) to look up geographic data. When given a country code, Geopy queries the provider to retrieve the corresponding latitude and longitude.
* **Caching and Rate Limiting**: Geopy includes rate limits to prevent users from overwhelming the service with requests. This is particularly helpful in this assignment, because many albums might be available in multiple countries! And we will provide 1 request per country per album.

In [46]:
import folium
import time
from pymongo import MongoClient
from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut

Your tasks:


1.   Because your collection can have many albums and many markets you will need to first create a new query using pymongo to get specific records from your collection that will be used for your map. Your task is to create a variable to store the TOP 10 albums order by the lastest `release_date`.
2.   We will send request to the Nominatim geocoding provider you will need to give a name to the geolocator.
3.   We have defined a function called `get_coordinates` that will take as an input a `country_code` and will return a tuple with the coordinates of that country. We want to reduce having multiple API requests for countries that we already asked the coordinates for (this is because multiple albums may have the same market). Your task is to check if the `country_code` passed to the `get_coordinates` was already provided by checking if it is within our `coordinates_cache` dictionary.
4.   If the country code has not been retrieved before, you will need to pass it to the geocode method to get the coordinates.
5.   Create a tuple with the `latitude` and `longitude` values
6.   In step 1 you got a variable with a list of albums that you need to map using Folium. You should loop and get the following information for each album: `artist_name`, `name` and `available_markets`.
7.   Because each `available_market` for each record can contain multiple countries, you will need to loop through each country and call the `get_coordinates` function you just created and pass the current market
8.   Now you are ready to create a folium Marker. This market should have the `artist_name`, `album_name` and `market`. You can change the colors and icon if you like. Learn more about this [here](https://python-visualization.github.io/folium/latest/reference.html#folium.map.Marker)



 NOTE: The way that we have constructed the map only allows a single record to be shown in the MAP. Please avoid creating a query that returns multiple records as only the last record will be mapped using the code below.




In [None]:
# Initialize a Folium map centered globally
map_ = folium.Map(location=[20, 0], zoom_start=2)
coordinates_cache = {}

#START YOUR CODE HERE
geolocator = Nominatim(user_agent="album")   #Name your geolocator

latest_albums = albums_collection.find().sort("release_date", -1).limit(10)        #Create your query using pymongo here

def get_coordinates(country_code):
    if country_code in coordinates_cache:
        return coordinates_cache[country_code]

    try:
        location = geolocator.geocode(country_code, timeout=10)
        if location:
            coords = (location.latitude, location.longitude)
            coordinates_cache[country_code] = coords
            return coords
        else:
            return None
    except GeocoderTimedOut:
        return None

# Loop through each album and add markers for available markets
for album in latest_albums:
    artist_name = album["artists"][0]["name"]
    album_name = album["name"]
    available_markets = album["available_markets"]

    for market in available_markets:
        coords = get_coordinates(market)
        if coords:
            # Add a marker with a popup showing artist and album information
            folium.Marker(
                location=coords,
                popup=f"Artist: {artist_name}<br>Album: {album_name}<br>Market: {market}",
                icon=folium.Icon(color="blue", icon="music")
            ).add_to(map_)
        time.sleep(1)
#END YOUR CODE HERE
map_

# Video Submission

In your **5-minute** (maximum) video, ensure you:

1. **Explain your understanding of the assignment** and the process for each of the 10 steps:
   - **Clear Overview:** Provide a comprehensive overview covering each of the 10 steps in the assignment, such as setting up MongoDB, accessing the Spotify API, and querying data.
   - **Thoughtful Reflections:** Highlight the challenges you encountered and how you solved them. Discuss any key choices you made (e.g., structuring MongoDB queries), showcasing your thought process and approach.
   - **Depth of Understanding:** Demonstrate a strong understanding of each step, reflecting on both the successes and obstacles you faced.

2. **Provide a detailed explanation of the insights you gained** from querying the data:
   - **Learning Outcomes:** Clearly articulate what you learned from the data queries, connecting these insights to specific questions you aimed to answer (e.g., identifying which artist has the most albums or finding the oldest album).
   - **Contributions to Understanding:** Explain how these insights contribute to a deeper understanding of the data and your overall assignment, emphasizing the value of your analysis.

3. **Explain the interactive map** you created with **Folium**:
   - **Insights from the Map:** Describe the insights the map helped you uncover, including any geographical patterns or trends in the data.
   - **Enhancing Analysis:** Discuss how the interactive map enhanced your analysis and contributed to your overall findings, demonstrating its relevance to your assignment.
