# Music Data Analysis with AWS Bedrock

## FMA (Free Music Archive) Data Model

<pre>
+-----------------+ 1       n +-----------------+ 1       n +-----------------+
|                 |<----------|                 |<----------|                 |
|     Genres      |           |     Tracks      |           |     Albums      |
|                 |           |                 |           |                 |
+-----------------+           +-----------------+           +-----------------+
| genre_id  (PK)  |           | genre_id (FK)   |           | album_id  (PK)  |
| genre_title     |           | track_id  (PK)  |           | album_title     |
| ...             |           | album_id (FK)   |<----------| artist_name (FK)|
+-----------------+           | artist_id (FK)  |           | ...             |
                              | track_title     |           +-----------------+
                              | ...             |
                              +-----------------+
                                      |
                                      | 1       n
                                      V
                               +-----------------+
                               |                 |
                               |     Artists     |
                               |                 |
                               +-----------------+
                               | artist_id  (PK) |
                               | artist_name     |
                               | ...             |
                               +-----------------+
</pre>


## FMA Data Model Tables

#### Genres (genres.csv): 
This table contains information about different music genres. It has five columns: genre_id, genre_color, genre_handle, genre_parent_id, and genre_title. The primary key for this table is genre_id, which is a unique identifier for each genre.
#### Tracks (raw_tracks.csv):
This table contains detailed information about tracks. It includes columns like track_id, album_id, artist_id, track_genres, etc. The primary key for this table is track_id. This table has foreign keys album_id and artist_id which link to the Albums and Artists tables respectively. The track_genres column contains a list of genres associated with each track, which can be linked to the genre_id in the Genres table.
#### Albums (albums.csv): 
This table contains information about different music albums. It has columns like album_id, album_title, artist_name, etc. The primary key for this table is album_id, and it has a foreign key artist_name linking to the Artists table.
#### Artists (artists.csv): 
This table contains information about different artists. It has columns like artist_id, artist_name, artist_website, etc. The primary key for this table is artist_id.
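
The relationships described above can be sketched with a few in-memory rows. This is only an illustration: the column names come from the table descriptions, while the miniature DataFrames and the `joined` name are made up for the example and are not loaded from the FMA files.

```python
import pandas as pd

# Hypothetical miniature versions of the four tables (illustrative rows only).
genres = pd.DataFrame({"genre_id": [21, 15], "genre_title": ["Hip-Hop", "Electronic"]})
artists = pd.DataFrame({"artist_id": [1], "artist_name": ["AWOL"]})
albums = pd.DataFrame({"album_id": [1], "album_title": ["AWOL - A Way Of Life"],
                       "artist_name": ["AWOL"]})
tracks = pd.DataFrame({
    "track_id": [2, 3],
    "album_id": [1, 1],
    "artist_id": [1, 1],
    "genre_id": [21, 21],
    "track_title": ["Food", "Electric Ave"],
})

# Follow each foreign key from Tracks to its parent table.
joined = (
    tracks
    .merge(genres, on="genre_id", how="left")
    .merge(artists, on="artist_id", how="left")
    .merge(albums[["album_id", "album_title"]], on="album_id", how="left")
)
print(joined[["track_title", "genre_title", "artist_name", "album_title"]])
```

A left join keeps every track even when a parent row is missing, which matters for tracks that have no album or genre recorded.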

### Import Libraries and Load the Data from S3

In [7]:
# pip install boto3 pandas

In [90]:
import boto3
import json
import pandas as pd
import ast

# Initialize boto3 client for S3
s3 = boto3.client('s3', region_name='us-east-1')
bucket_name='fma-data-analysis'
raw_files_prefix='raw/'

In [91]:
def load_csv_from_s3(bucket_name, file_name):
    print(file_name)
    obj = s3.get_object(Bucket=bucket_name, Key=file_name)
    df = pd.read_csv(obj['Body'])
    return df


In [92]:
# Load each CSV file into a DataFrame
tracks = load_csv_from_s3(bucket_name, raw_files_prefix+'raw_tracks.csv')
genres = load_csv_from_s3(bucket_name, raw_files_prefix+'raw_genres.csv')
# features = load_csv_from_s3(bucket_name, raw_files_prefix+'features.csv')
# echonest = load_csv_from_s3(bucket_name, raw_files_prefix+'echonest.csv')

raw/raw_tracks.csv
raw/raw_genres.csv


In [148]:
tracks.head(2)

Unnamed: 0,track_id,album_id,album_title,album_url,artist_id,artist_name,artist_url,artist_website,license_image_file,license_image_file_large,...,track_instrumental,track_interest,track_language_code,track_listens,track_lyricist,track_number,track_publisher,track_title,track_url,genre_titles
0,2,1.0,AWOL - A Way Of Life,http://freemusicarchive.org/music/AWOL/AWOL_-_...,1,AWOL,http://freemusicarchive.org/music/AWOL/,http://www.AzillionRecords.blogspot.com,http://i.creativecommons.org/l/by-nc-sa/3.0/us...,http://fma-files.s3.amazonaws.com/resources/im...,...,0,4656,en,1293,,3,,Food,http://freemusicarchive.org/music/AWOL/AWOL_-_...,[Hip-Hop]
1,3,1.0,AWOL - A Way Of Life,http://freemusicarchive.org/music/AWOL/AWOL_-_...,1,AWOL,http://freemusicarchive.org/music/AWOL/,http://www.AzillionRecords.blogspot.com,http://i.creativecommons.org/l/by-nc-sa/3.0/us...,http://fma-files.s3.amazonaws.com/resources/im...,...,0,1470,en,514,,4,,Electric Ave,http://freemusicarchive.org/music/AWOL/AWOL_-_...,[Hip-Hop]


## Example 1: Generating a music artist bio using the genre of their songs

In [7]:
def extract_genre_id(track_genres):
    if pd.isna(track_genres):
        return None
    genre_list = ast.literal_eval(track_genres)
    if genre_list:
        return int(genre_list[0]['genre_id'])
    else:
        return None

In [8]:
tracks['genre_id'] = tracks['track_genres'].apply(extract_genre_id)
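
To make the parsing step concrete, here is a small sketch of what `ast.literal_eval` does with a `track_genres` value. The `raw` string is a hypothetical cell value shaped like the data (a stringified list of dictionaries); the exact keys in the real file are assumed, apart from `genre_id`, which the function above relies on.

```python
import ast

# Hypothetical cell value, shaped like the strings stored in track_genres.
raw = "[{'genre_id': '21', 'genre_title': 'Hip-Hop'}, {'genre_id': '15', 'genre_title': 'Electronic'}]"

# literal_eval parses Python literals only, so it is safer than eval() on file contents.
genre_list = ast.literal_eval(raw)

first_genre_id = int(genre_list[0]["genre_id"])            # what extract_genre_id keeps
all_genre_ids = [int(g["genre_id"]) for g in genre_list]   # everything in the cell
print(first_genre_id, all_genre_ids)
```

Note that keeping only the first entry discards any additional genres attached to a track.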


In [10]:
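data = tracks.merge(genres, on='genre_id', how='left')  # assumed step: `data` is not defined in the cells shown; the output matches this left join
data.head(2)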


Unnamed: 0,track_id,album_id,album_title,album_url,artist_id,artist_name,artist_url,artist_website,license_image_file,license_image_file_large,...,track_lyricist,track_number,track_publisher,track_title,track_url,genre_id,genre_color,genre_handle,genre_parent_id,genre_title
0,2,1.0,AWOL - A Way Of Life,http://freemusicarchive.org/music/AWOL/AWOL_-_...,1,AWOL,http://freemusicarchive.org/music/AWOL/,http://www.AzillionRecords.blogspot.com,http://i.creativecommons.org/l/by-nc-sa/3.0/us...,http://fma-files.s3.amazonaws.com/resources/im...,...,,3,,Food,http://freemusicarchive.org/music/AWOL/AWOL_-_...,21.0,#CC0000,Hip-Hop,,Hip-Hop
1,3,1.0,AWOL - A Way Of Life,http://freemusicarchive.org/music/AWOL/AWOL_-_...,1,AWOL,http://freemusicarchive.org/music/AWOL/,http://www.AzillionRecords.blogspot.com,http://i.creativecommons.org/l/by-nc-sa/3.0/us...,http://fma-files.s3.amazonaws.com/resources/im...,...,,4,,Electric Ave,http://freemusicarchive.org/music/AWOL/AWOL_-_...,21.0,#CC0000,Hip-Hop,,Hip-Hop


In [69]:
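# Build one prompt per row; a plain f-string would embed the entire Series in a single string.
data['prompt'] = data.apply(lambda row: f"\n\nHuman: Write a captivating bio for the electronic music artist {row['artist_name']}, who is known for their {row['genre_title']} music.\n\nAssistant:", axis=1)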

In [94]:
from io import StringIO
import sys
import textwrap

def print_ww(*args, width: int = 100, **kwargs):
    """Like print(), but wraps output to `width` characters (default 100)"""
    buffer = StringIO()
    try:
        _stdout = sys.stdout
        sys.stdout = buffer
        print(*args, **kwargs)
        output = buffer.getvalue()
    finally:
        sys.stdout = _stdout
    for line in output.splitlines():
        print("\n".join(textwrap.wrap(line, width=width)))

### Initializing Bedrock client

In [95]:
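import os
import sys

import boto3
# Assumption: `bedrock` is the helper module from the Amazon Bedrock workshop's `utils` package,
# which provides get_bedrock_client(); it is not part of boto3.
from utils import bedrock

boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None)
)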


Create new client
  Using region: us-east-1
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-east-1.amazonaws.com)


In [18]:
# %pip install --no-build-isolation --force-reinstall \
#     "boto3>=1.28.57" \
#     "awscli>=1.29.57" \
#     "botocore>=1.31.57"

In [141]:


genre_dict = genres.set_index('genre_id')['genre_title'].to_dict()

def extract_genre_titles(track_genres):
    if pd.isna(track_genres):
        return None
    genre_list = ast.literal_eval(track_genres)
    genre_titles = []
    for genre in genre_list:
        genre_id = int(genre['genre_id'])
        if genre_id in genre_dict:
            genre_titles.append(genre_dict[genre_id])
    return genre_titles

tracks['genre_titles'] = tracks['track_genres'].apply(extract_genre_titles)



In [111]:
tracks_with_multiple_genres = tracks[tracks['genre_titles'].apply(lambda x: len(x) > 1 if x is not None else False)]

tracks_with_multiple_genres.head(5)


Unnamed: 0,track_id,album_id,album_title,album_url,artist_id,artist_name,artist_url,artist_website,license_image_file,license_image_file_large,...,track_interest,track_language_code,track_listens,track_lyricist,track_number,track_publisher,track_title,track_url,genre_hierarchy,genre_titles
4,20,4.0,Niris,http://freemusicarchive.org/music/Chris_and_Ni...,4,Nicky Cook,http://freemusicarchive.org/music/Chris_and_Ni...,,http://i.creativecommons.org/l/by-nc-nd/3.0/88...,http://fma-files.s3.amazonaws.com/resources/im...,...,978,en,361,,3,,Spiritual Level,http://freemusicarchive.org/music/Chris_and_Ni...,"[Experimental Pop, Pop, Singer-Songwriter, Folk]","[Experimental Pop, Singer-Songwriter]"
5,26,4.0,Niris,http://freemusicarchive.org/music/Chris_and_Ni...,4,Nicky Cook,http://freemusicarchive.org/music/Chris_and_Ni...,,http://i.creativecommons.org/l/by-nc-nd/3.0/88...,http://fma-files.s3.amazonaws.com/resources/im...,...,1060,en,193,,4,,Where is your Love?,http://freemusicarchive.org/music/Chris_and_Ni...,"[Experimental Pop, Pop, Singer-Songwriter, Folk]","[Experimental Pop, Singer-Songwriter]"
6,30,4.0,Niris,http://freemusicarchive.org/music/Chris_and_Ni...,4,Nicky Cook,http://freemusicarchive.org/music/Chris_and_Ni...,,http://i.creativecommons.org/l/by-nc-nd/3.0/88...,http://fma-files.s3.amazonaws.com/resources/im...,...,718,en,612,,5,,Too Happy,http://freemusicarchive.org/music/Chris_and_Ni...,"[Experimental Pop, Pop, Singer-Songwriter, Folk]","[Experimental Pop, Singer-Songwriter]"
7,46,4.0,Niris,http://freemusicarchive.org/music/Chris_and_Ni...,4,Nicky Cook,http://freemusicarchive.org/music/Chris_and_Ni...,,http://i.creativecommons.org/l/by-nc-nd/3.0/88...,http://fma-files.s3.amazonaws.com/resources/im...,...,252,en,171,,8,,Yosemite,http://freemusicarchive.org/music/Chris_and_Ni...,"[Experimental Pop, Pop, Singer-Songwriter, Folk]","[Experimental Pop, Singer-Songwriter]"
8,48,4.0,Niris,http://freemusicarchive.org/music/Chris_and_Ni...,4,Nicky Cook,http://freemusicarchive.org/music/Chris_and_Ni...,,http://i.creativecommons.org/l/by-nc-nd/3.0/88...,http://fma-files.s3.amazonaws.com/resources/im...,...,247,en,173,,9,,Light of Light,http://freemusicarchive.org/music/Chris_and_Ni...,"[Experimental Pop, Pop, Singer-Songwriter, Folk]","[Experimental Pop, Singer-Songwriter]"


In [118]:
# Group tracks by artist and combine genre titles
grouped_tracks = tracks_with_multiple_genres.groupby('artist_name')['genre_titles'].sum().reset_index()
grouped_tracks_subset = grouped_tracks.head(5)
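
Summing a column of lists concatenates them, so each artist ends up with one long list that usually contains duplicates. The sketch below mirrors that behaviour with plain Python lists and shows one way to drop duplicates while keeping a stable order (a `set`, by contrast, does not guarantee any order). The `per_track` values are hypothetical.

```python
# Hypothetical per-track genre lists for a single artist (illustrative values).
per_track = [
    ["Experimental Pop", "Singer-Songwriter"],
    ["Experimental Pop", "Singer-Songwriter"],
    ["Folk", "Experimental Pop"],
]

# Summing lists concatenates them, as the per-artist .sum() does for list-valued cells.
combined = sum(per_track, [])

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_in_order = list(dict.fromkeys(combined))
print(", ".join(unique_in_order))
```

A stable order makes the generated prompt reproducible between runs, which helps when comparing model outputs.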

### Invoking the Anthropic Claude V2 model

* <b>prompt:</b> This is the initial string of text that the model uses as a starting point to generate further text. It sets the context for the generated text.
* <b>max_tokens_to_sample:</b> This parameter specifies the maximum number of tokens to generate in the response. A token can be as short as one character or as long as one word. This parameter is used to control the length of the generated text.
* <b>temperature:</b> This parameter controls the randomness of the model's responses. A lower value (closer to 0) makes the model's outputs more deterministic and focused, while a higher value increases the randomness and diversity of the output.
* <b>top_k:</b> This parameter is used during the sampling process to restrict the pool of tokens that the model considers for the next token in the sequence. The model samples only from the top_k most probable tokens, which can help control the randomness of the output.
* <b>top_p:</b> Also known as nucleus sampling, this parameter further controls the randomness of the model's output. The model samples only from the smallest set of most probable tokens whose cumulative probability reaches top_p, cutting off the least probable options. Lower values reduce randomness.
* <b>stop_sequences:</b> This parameter allows you to specify sequences of tokens that, when encountered, signal the model to stop generating further tokens. This can be useful to control where the generated text ends.
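
The way temperature, top_k, and top_p interact can be illustrated with a toy sampler. This is a sketch of the general technique only, not Claude's or Bedrock's actual implementation: the function name `sample_next_token`, the filtering order, and the example logits are all assumptions made for illustration.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=250, top_p=0.5, rng=random):
    """Toy sampler: temperature scaling, then top_k, then top_p filtering."""
    # Temperature: divide logits before the softmax; values below 1 sharpen the
    # distribution, values above 1 flatten it. (temperature must be > 0 here.)
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    max_logit = max(scaled.values())
    exps = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = sorted(((tok, e / total) for tok, e in exps.items()),
                   key=lambda item: item[1], reverse=True)

    # top_k: keep only the k most probable tokens.
    probs = probs[:top_k]

    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # random.choices treats weights as relative, so no explicit renormalisation is needed.
    tokens = [tok for tok, _ in kept]
    weights = [p for _, p in kept]
    return rng.choices(tokens, weights=weights, k=1)[0], tokens

# Hypothetical next-token scores.
logits = {"rock": 2.0, "jazz": 1.0, "folk": 0.5, "punk": -1.0}
token, candidates = sample_next_token(logits, temperature=1.0, top_k=3, top_p=0.5)
print(token, candidates)
```

With these example scores the most probable token alone already exceeds a cumulative probability of 0.5, so a low top_p can make sampling nearly deterministic even when temperature is 1.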

In [None]:
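# Assumption: `textgen_llm` is a LangChain Bedrock LLM wrapper created elsewhere; this cell was not executed.
num_tokens = textgen_llm.get_num_tokens(prompt)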


In [142]:

modelId = 'anthropic.claude-v2' 
for index, row in grouped_tracks_subset.iterrows():
    artist_name = row['artist_name']
    genre_titles = ', '.join(set(row['genre_titles']))  # Use set to remove duplicates
    prompt_data = f"\n\nHuman: Write a captivating bio for the electronic music artist {artist_name}, who is known for their {genre_titles} music.\n\nAssistant:"
    print(prompt_data)

    # Claude request body syntax
    body = json.dumps({
                    "prompt": prompt_data,
                    "max_tokens_to_sample":4096,
                    "temperature":1,
                    "top_k":250,
                    "top_p":0.5,
                    "stop_sequences": ["\n\nHuman:"]
                  }) 
    response = boto3_bedrock.invoke_model(body=body, modelId=modelId, accept='application/json', contentType='application/json')
    response_body = json.loads(response.get('body').read())
    print_ww(response_body.get('completion'))





Human: Write a captivating bio for the electronic music artist "Blue" Gene Tyranny, who is known for their Avant-Garde, Experimental, Composed Music music.

Assistant:
 Here is a draft captivating bio for the electronic music artist "Blue" Gene Tyranny:

Pushing the boundaries of avant-garde electronic music since the 1970s, "Blue" Gene Tyranny has
carved out a unique space in the experimental and composed music worlds. Tyranny's innovative
compositions blend electronic textures with acoustic instruments, fusing classical sensibilities
with cutting-edge techniques.

Whether collaborating with giants like Robert Ashley or creating solo works, Tyranny's music
inhabits a realm between pure sound and structure. Hypnotic yet cerebral, Tyranny's compositions
reveal hidden patterns and meanings upon repeated listens. Yet they always retain an air of mystery
and discovery.

A true pioneer in the electronic arts, Tyranny helped develop the very first sampling keyboard, the
SampleMaster. This 
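
The same request body is built by hand in the loop above and again in the later examples. One possible way to avoid the repetition is a small helper like the sketch below. `build_claude_v2_body` is a hypothetical name, not part of boto3 or Bedrock; the defaults simply copy the values used in this notebook, and the prompt normalisation reflects the Human/Assistant turn format of Claude's text-completion prompts.

```python
import json

def build_claude_v2_body(prompt, max_tokens_to_sample=4096, temperature=1,
                         top_k=250, top_p=0.5):
    """Serialise a Claude v2 text-completion request body (hypothetical helper)."""
    # Make sure the prompt opens with a Human turn...
    stripped = prompt.lstrip()
    if stripped.startswith("Human:"):
        prompt = "\n\n" + stripped
    else:
        prompt = "\n\nHuman: " + stripped
    # ...and ends with an open Assistant turn.
    if not prompt.rstrip().endswith("Assistant:"):
        prompt = prompt.rstrip() + "\n\nAssistant:"
    return json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": max_tokens_to_sample,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
        "stop_sequences": ["\n\nHuman:"],
    })

body = build_claude_v2_body("Write a short bio for a hypothetical artist.")
print(json.loads(body)["prompt"])
```

The returned string can be passed as the `body` argument of `invoke_model` in the same way as the hand-written bodies.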

## Example 2: Summarizing the most popular music genres

In [122]:
tracks_sample = load_csv_from_s3(bucket_name, raw_files_prefix+'tracks_sample.csv')

raw/tracks_sample.csv


In [131]:
# Select relevant columns, then convert the DataFrame to a string
tracks_df = tracks_sample[['track_id', 'album_id', 'artist_name', 'track_genres']]
tracks_sample_str = tracks_df.to_string(index=False)


In [132]:
# Claude's text-completion prompt format starts with a "\n\nHuman:" turn
prompt = f"\n\nHuman: {tracks_sample_str}\n\nGenerate a summary of the most popular music genres in this data.\n\nAssistant:"

In [133]:
body = json.dumps({
                "prompt": prompt,
                "max_tokens_to_sample":4096,
                "temperature":1,
                "top_k":250,
                "top_p":0.5,
                "stop_sequences": ["\n\nHuman:"]
              }) 


response = boto3_bedrock.invoke_model(body=body, modelId=modelId, accept='application/json', contentType='application/json')
response_body = json.loads(response.get('body').read())
print_ww(response_body.get('completion'))



 Here is a summary of the most popular music genres in the data:

- The most common genre is 'Rock', with 192 track entries.

- The second most common is 'Electronic', with 144 entries.

- 'Folk' is third with 139 entries.

- 'Hip-Hop' has 97 entries.

- 'Punk' has 91 entries.

- 'Pop' has 87 entries.

- 'Jazz' has 84 entries.

- 'Avant-Garde' has 67 entries.

- 'Experimental' has 64 entries.

- 'Lo-Fi' has 62 entries.

So in summary, the most popular genres in descending order are:

1. Rock
2. Electronic
3. Folk
4. Hip-Hop
5. Punk
6. Pop
7. Jazz
8. Avant-Garde
9. Experimental
10. Lo-Fi

This gives a good overview of the most common genres present in the music data. Rock is clearly the
most prevalent, followed by electronic and folk styles.
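
Language models may miscount when asked to tally rows in a long text table, so it can be worth cross-checking a summary like this against a deterministic count. The following is a minimal sketch under stated assumptions: the `rows` values are hypothetical `track_genres` strings, each entry is assumed to carry a `genre_title` key, and missing values are represented as `None` (in a DataFrame they would typically be NaN and need `pd.isna`).

```python
import ast
from collections import Counter

# Hypothetical track_genres strings (illustrative values only).
rows = [
    "[{'genre_id': '12', 'genre_title': 'Rock'}]",
    "[{'genre_id': '12', 'genre_title': 'Rock'}, {'genre_id': '25', 'genre_title': 'Punk'}]",
    "[{'genre_id': '15', 'genre_title': 'Electronic'}]",
    None,  # a track without genre information
]

counts = Counter()
for raw in rows:
    if raw is None:
        continue  # skip tracks with no genres
    for genre in ast.literal_eval(raw):
        counts[genre["genre_title"]] += 1

print(counts.most_common(3))
```

Comparing these counts with the figures in the generated summary gives a quick sanity check before the summary is reused elsewhere.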


## Example 3: Creating playlists of songs in different genres

In [136]:
# Select relevant columns, then convert the DataFrame to a string
tracks_df = tracks_sample[['track_title', 'album_title', 'artist_name', 'track_genres']]
tracks_sample_str = tracks_df.to_string(index=False)

# Claude's text-completion prompt format starts with a "\n\nHuman:" turn
prompt = f"\n\nHuman: {tracks_sample_str}\n\nCreate playlists of 30 songs in Rock and Electronic Genre.\n\nAssistant:"

In [137]:
body = json.dumps({
                "prompt": prompt,
                "max_tokens_to_sample":4096,
                "temperature":1,
                "top_k":250,
                "top_p":0.5,
                "stop_sequences": ["\n\nHuman:"]
              }) 


response = boto3_bedrock.invoke_model(body=body, modelId=modelId, accept='application/json', contentType='application/json')
response_body = json.loads(response.get('body').read())
print_ww(response_body.get('completion'))


 Here are two 30 song playlists, one for the Rock genre and one for the Electronic genre:

Rock Playlist:

track_title                                                   album_title
artist_name
Referents                                             Open Relationship
Bird Names
A New Map for Kissing                                              On Opaque Things
Bird Names
Ghosts                                      Wooden Lake Sexual Diner
Bird Names
The Indefinite Time Yet to Come                                              On Opaque Things
Bird Names
Smoovebiz                                      Wooden Lake Sexual Diner
Bird Names
Nobody Loves Me                                      Wooden Lake Sexual Diner
Bird Names
It's Becoming a Stranger                                              On Opaque Things
Bird Names
New Life                                             Open Relationship
Bird Names
Masters Of Enjoyment                                             Open Relationship
Bird Name