# Session 11: Practice

In this practice we will work with my Spotify data. I downloaded my listening history from Spotify as a file called `history.json`, which contains every song I have listened to in the last year.

The first thing we are going to do is to load the data and take a look at it.

Since the data is a JSON file, we are going to use the `json` library to load it.

```python
import json

with open('history.json') as f:
    spotify = json.load(f)
```

This will load the data into the variable `spotify`. The data is a list of dictionaries, where each dictionary represents a song I have listened to.

In [1]:
import json

with open('history.json') as f:
    spotify = json.load(f)

spotify[:2]

[{'endTime': '2023-11-14 08:13',
  'artistName': 'Eminem',
  'trackName': 'Stan',
  'msPlayed': 11093},
 {'endTime': '2023-11-14 08:13',
  'artistName': 'The Sugarcubes',
  'trackName': 'Birthday',
  'msPlayed': 128615}]
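
If `history.json` is not at hand, the same list-of-dictionaries structure can be reproduced from a JSON string. The following is a minimal sketch that uses only the two records printed above; the names `raw` and `sample` are illustrative and not part of the original notebook.

```python
import json

# The two records shown above, written as a JSON string
# (an illustrative stand-in for the contents of history.json).
raw = '''
[
  {"endTime": "2023-11-14 08:13", "artistName": "Eminem",
   "trackName": "Stan", "msPlayed": 11093},
  {"endTime": "2023-11-14 08:13", "artistName": "The Sugarcubes",
   "trackName": "Birthday", "msPlayed": 128615}
]
'''

# json.loads parses a string; json.load (used above) parses an open file.
sample = json.loads(raw)

print(type(sample))             # the top level is a list...
print(type(sample[0]))          # ...and each element is a dictionary
print(sample[0]['trackName'])   # fields are read by key
```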

## Exercise 1

Take a look at the data. How many songs have I listened to in the last year?

In [2]:
len(spotify)

14044

## Exercise 2

What is the average reproduction time, in minutes, of the songs I have listened to?

In [3]:
total_reproduction = 0

for song in spotify:
    total_reproduction += song['msPlayed']

print(total_reproduction / len(spotify) / 60 / 1000)

2.476529684088104
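
The same average can also be written without an explicit accumulator loop. This is a sketch on a two-record sample taken from the output of the first cell; `MS_PER_MINUTE` and `sample` are names introduced here for illustration.

```python
from statistics import mean

# Two records from the first cell's output; only msPlayed matters here.
sample = [
    {'trackName': 'Stan', 'msPlayed': 11093},
    {'trackName': 'Birthday', 'msPlayed': 128615},
]

MS_PER_MINUTE = 60 * 1000

# mean() accepts any iterable, so a generator expression is enough.
average_ms = mean(song['msPlayed'] for song in sample)
average_minutes = average_ms / MS_PER_MINUTE

print(round(average_minutes, 3))
```

Naming the conversion factor makes it harder to mix up the milliseconds-to-minutes step than chaining `/ 60 / 1000` inline.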



## Exercise 3

Create a function that receives the `spotify` data and returns the artist I've listened to the most times.

Also, create another function that does the same for the song I've listened to the most times.

In [4]:
def most_listened_artist(spotify):
    artists = {}

    for song in spotify:
        artist = song['artistName']
        if artist in artists:
            artists[artist] += 1
        else:
            artists[artist] = 1
    
    most_played = max(artists, key = artists.get)

    return most_played, artists[most_played]

most_listened_artist(spotify)

('Mujeres', 705)

In [10]:
def most_played_object(spotify):
    # Ranks tracks by total play time in ms, not by number of plays.
    track_play_time = {}

    for song in spotify:
        time = song['msPlayed']
        name = song['trackName']
        if name in track_play_time:
            track_play_time[name] += time
        else:
            track_play_time[name] = time
    
    most_played = max(track_play_time, key = track_play_time.get)
    return most_played, track_play_time[most_played]

most_played_object(spotify)

('Un Gesto Brillante', 11365071)

In [12]:
def most_occurred_song(spotify):
    songs = {}

    for song in spotify:
        song_name = song['trackName']
        if song_name in songs:
            songs[song_name] += 1
        else:
            songs[song_name] = 1
    
    most_played = max(songs, key = songs.get)

    return most_played, songs[most_played]

most_occurred_song(spotify)

('Unknown Track', 310)
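
The counting pattern used in these functions (a dictionary that is incremented per key, followed by `max`) is what `collections.Counter` provides out of the box. The sketch below assumes records in the same shape as the export; the helper name `most_common_value` and the `Artist A` / `Song X` entries are hypothetical.

```python
from collections import Counter

# Hypothetical sample in the same shape as the Spotify export.
sample = [
    {'artistName': 'Artist A', 'trackName': 'Song X', 'msPlayed': 200000},
    {'artistName': 'Artist A', 'trackName': 'Song X', 'msPlayed': 180000},
    {'artistName': 'Eminem', 'trackName': 'Stan', 'msPlayed': 11093},
]

def most_common_value(records, field):
    """Return (value, count) for the most frequent value of `field`."""
    counts = Counter(record[field] for record in records)
    # most_common(1) returns a one-element list of (value, count) pairs.
    return counts.most_common(1)[0]

print(most_common_value(sample, 'artistName'))
print(most_common_value(sample, 'trackName'))
```

Counting by `trackName` alone merges different songs that share a title; keying on the pair `(artistName, trackName)` is one way to keep them apart.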

## Exercise 4

Create a function that receives the `spotify` data and returns the total reproduction time, in hours, of the songs I have listened to.

In [8]:
def total_reproduction(spotify):
    total_reproduction_time = 0
    for song in spotify:
        total_reproduction_time += song['msPlayed']
    
    return f'The total reproduction time is {total_reproduction_time / 1000 / 60 / 60}'

print(total_reproduction(spotify))

The total reproduction time is 579.6730480555556
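
Returning a number instead of a formatted string keeps the result usable in later calculations, and the unit can be added when printing. A minimal sketch, with a hypothetical function name `total_hours` and made-up play times:

```python
def total_hours(records):
    """Total play time in hours, returned as a float."""
    total_ms = sum(record['msPlayed'] for record in records)
    return total_ms / (1000 * 60 * 60)

# Hypothetical sample: half an hour plus an hour and a half.
sample = [{'msPlayed': 1_800_000}, {'msPlayed': 5_400_000}]

hours = total_hours(sample)
print(f'The total reproduction time is {hours:.2f} hours')
```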


## Exercise 5

Regarding reproduction time, is my most listened artist the one with the most reproduction time?

What about the song?

In [9]:
# Total play time (ms) per artist. The dictionary gets its own name so it does not
# overwrite the `most_listened_artist` function from Exercise 3.
artist_play_time = {}

for song in spotify:
    artist = song['artistName']
    time = song['msPlayed']
    if artist in artist_play_time:
        artist_play_time[artist] += time
    else:
        artist_play_time[artist] = time

most = max(artist_play_time, key = artist_play_time.get)

print(most, artist_play_time[most])

Mujeres 99030565


In [10]:
most_listened_song = {}

for song in spotify:
    track = song['trackName']
    time = song['msPlayed']
    if track in most_listened_song:
        most_listened_song[track] += time
    else:
        most_listened_song[track] = time

most = max(most_listened_song, key = most_listened_song.get)

print(most, most_listened_song[most])

Un Gesto Brillante 11365071
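
Comparing these outputs with Exercise 3: Mujeres leads both by number of plays (705) and by play time, while for songs the most frequent entry is 'Unknown Track' (310 plays) but the most play time goes to 'Un Gesto Brillante'. The comparison can be made explicit by computing both rankings side by side. This is a sketch on hypothetical data; `top_by_plays` and `top_by_time` are illustrative names.

```python
from collections import Counter, defaultdict

# Hypothetical sample: Artist A has more plays, Artist B has more play time.
sample = [
    {'artistName': 'Artist A', 'msPlayed': 10000},
    {'artistName': 'Artist A', 'msPlayed': 10000},
    {'artistName': 'Artist A', 'msPlayed': 10000},
    {'artistName': 'Artist B', 'msPlayed': 200000},
]

def top_by_plays(records, field):
    """Value of `field` that appears in the most records."""
    return Counter(r[field] for r in records).most_common(1)[0][0]

def top_by_time(records, field):
    """Value of `field` with the largest summed msPlayed."""
    play_time = defaultdict(int)   # missing keys start at 0
    for r in records:
        play_time[r[field]] += r['msPlayed']
    return max(play_time, key=play_time.get)

by_plays = top_by_plays(sample, 'artistName')
by_time = top_by_time(sample, 'artistName')
print(by_plays, by_time, by_plays == by_time)
```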


## Exercise 6

Using the `datetime` library, create new fields in each of the dictionaries in the `spotify` data that represent the following:

* `date`: the date when the song was listened to.
* `time`: the time when the song was listened to.
* `day_of_week`: the day of the week when the song was listened to.
* `hour`: the hour when the song was listened to.
* `month`: the month when the song was listened to.

In [None]:
# Reference 
from datetime import datetime

str_datetime = '2023-11-14 08:13'
dt = datetime.strptime(str_datetime, '%Y-%m-%d %H:%M')

date = dt.date()
time = dt.time()
day_of_week = dt.weekday()
hour = dt.hour
month = dt.month

print(date)
print(time)
print(day_of_week)
print(hour)
print(month)

In [14]:
from datetime import datetime

for song in spotify:
    dt = datetime.strptime(song['endTime'], '%Y-%m-%d %H:%M')
    song['date'] = dt.date()
    song['time'] = dt.time()
    song['day_of_week'] = dt.weekday()
    song['hour'] = dt.hour
    song['month'] = dt.month

spotify[:2]

[{'endTime': '2023-11-14 08:13',
  'artistName': 'Eminem',
  'trackName': 'Stan',
  'msPlayed': 11093,
  'date': datetime.date(2023, 11, 14),
  'time': datetime.time(8, 13),
  'day_of_week': 1,
  'hour': 8,
  'month': 11},
 {'endTime': '2023-11-14 08:13',
  'artistName': 'The Sugarcubes',
  'trackName': 'Birthday',
  'msPlayed': 128615,
  'date': datetime.date(2023, 11, 14),
  'time': datetime.time(8, 13),
  'day_of_week': 1,
  'hour': 8,
  'month': 11}]
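
The `day_of_week` value stored above is an integer: `weekday()` counts from Monday = 0 to Sunday = 6, so the `1` in the output corresponds to a Tuesday. A small sketch of turning the index into a name; the `DAY_NAMES` list is an assumption introduced here so the result does not depend on the system locale.

```python
from datetime import datetime

# weekday() counts from Monday = 0 to Sunday = 6.
DAY_NAMES = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

dt = datetime.strptime('2023-11-14 08:13', '%Y-%m-%d %H:%M')

day_index = dt.weekday()
print(day_index, DAY_NAMES[day_index])

# isoweekday() is the 1-based variant: Monday = 1 ... Sunday = 7.
print(dt.isoweekday())
```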

## Exercise 7

Create a function that receives the `spotify` data and returns a dictionary containing the day of the week as key, and the average reproduction time of the songs listened to on that day as value.

In [20]:
day_of_week_average = {}
unique_days = set(song['day_of_week'] for song in spotify)
print(unique_days)

for day in unique_days:
    # Collect the play times (ms) of every song listened to on this weekday.
    times = []

    for song in spotify:
        if song['day_of_week'] == day:
            times.append(song['msPlayed'])

    average = sum(times) / len(times)
    day_of_week_average[day] = average

print(day_of_week_average)

{0, 1, 2, 3, 4, 5, 6}
{0: 155393.9657064472, 1: 148166.1306049822, 2: 152668.17224880384, 3: 149653.29365079364, 4: 138931.63335377068, 5: 144044.99075215784, 6: 143172.49061032865}
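
The exercise asks for a function, and the grouping can be done in a single pass over the data instead of one pass per weekday. The sketch below is one possible shape, not the notebook's own solution: `average_by` and its `value_field` parameter are hypothetical names, and taking the field name as an argument lets the same helper group by `month` or `hour` as well.

```python
from collections import defaultdict

def average_by(records, field, value_field='msPlayed'):
    """Average of `value_field` for each distinct value of `field`."""
    totals = defaultdict(int)   # sum of values per group
    counts = defaultdict(int)   # number of records per group
    for record in records:
        key = record[field]
        totals[key] += record[value_field]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

# Hypothetical sample with the fields created in Exercise 6.
sample = [
    {'day_of_week': 0, 'month': 11, 'msPlayed': 100000},
    {'day_of_week': 0, 'month': 11, 'msPlayed': 200000},
    {'day_of_week': 1, 'month': 12, 'msPlayed': 60000},
]

print(average_by(sample, 'day_of_week'))
print(average_by(sample, 'month'))
```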


## Exercise 8

Create a function that receives the `spotify` data and returns a dictionary containing the month as key, and the average reproduction time of the songs listened to on that month as value.

In [23]:
def month_average(spotify):
    months_average_dict = {}
    unique_months = set(song['month'] for song in spotify)

    for month in unique_months:
        times = []

        for song in spotify:
            if song['month'] == month:
                times.append(song['msPlayed'])
        
        average = sum(times) / len(times)
        months_average_dict[month] = average
    
    return months_average_dict

print(month_average(spotify))

{1: 152086.27578475335, 2: 172693.3589059373, 3: 138529.38774002955, 4: 151025.22015503875, 5: 167525.50705882354, 6: 138003.29032258064, 7: 158848.08542713567, 8: 110486.351160444, 9: 125787.73454545454, 10: 147484.56053184046, 11: 155254.23768308922, 12: 164800.02191558442}


## Exercise 9

What is the artist with the longest song I've listened to?

What is the song with the longest reproduction time?

In [25]:
for song in spotify:
    song['song_name_length'] = len(song['trackName'])

max(spotify, key = lambda x: x['song_name_length'])

{'endTime': '2024-07-30 09:58',
 'artistName': 'Sidonie',
 'trackName': 'Fascinados (feat. Joan Manuel Serrat, Leiva, Vetusta Morla, Iván Ferreiro, Loquillo, Zahara, Dani Martin, Albert Pla, Mikel (Izal), Noni (Lori Meyers), Santi Balmes, Xoel López, Anni B Sweet, Jeanette, Carlos Sadness, Nina (Morgan), Juan Alberto (Niños Mutantes), Miri Ros, Javiera Mena, Jorge Martí (La Habitación Roja), Rafa Val (Viva Suecia), Marc (Dorian), Alondra Bentley, Abraham Boba, Carlangas (Novedades Carminha), La Bien Querida, Martí Perarnau IV (Mucho), Nita (Fuel Fandango) & Shuarma (Elefantes))',
 'msPlayed': 6911,
 'date': datetime.date(2024, 7, 30),
 'time': datetime.time(9, 58),
 'day_of_week': 1,
 'hour': 9,
 'month': 7,
 'song_name_length': 511}

In [28]:
# Find the record with the longest single play once, then print its fields.
longest_play = max(spotify, key = lambda x: x['msPlayed'])
print(longest_play['trackName'], longest_play['msPlayed'])

Change of the Guard 735960
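
Note that the first cell measures the length of the track name, while `msPlayed` records how long a single play lasted; the fields shown in this export do not include the full duration of a track, so "longest song" can only be approximated by the longest play. A sketch of reading the artist from that record and formatting the time, where `format_ms` is a hypothetical helper:

```python
from operator import itemgetter

# The two records shown at the top of the notebook.
sample = [
    {'artistName': 'Eminem', 'trackName': 'Stan', 'msPlayed': 11093},
    {'artistName': 'The Sugarcubes', 'trackName': 'Birthday', 'msPlayed': 128615},
]

def format_ms(ms):
    """Format milliseconds as M:SS, dropping the fractional second."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f'{minutes}:{seconds:02d}'

# itemgetter('msPlayed') is equivalent to lambda x: x['msPlayed'].
longest = max(sample, key=itemgetter('msPlayed'))
print(longest['artistName'], longest['trackName'], format_ms(longest['msPlayed']))

# The longest play reported above, 735960 ms, in minutes and seconds.
print(format_ms(735960))
```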


## Exercise 10

Create a function that returns all the songs I've listened to for less than 5 seconds.

In [29]:
def songs_under_5(spotify):
    songs_under_5_list = []
    for song in spotify:
        seconds = song['msPlayed'] / 1000

        if seconds < 5:
            songs_under_5_list.append(song['trackName'])
    return songs_under_5_list

print(songs_under_5(spotify))

['Bang', 'Middle Of My Mind', 'Nunca Estás a la Altura', 'Olímpicos', 'Del Montón', 'Que Hace una Chica Como Tu en un Sitio Como Este', 'Azul Casi Luz', 'You Only Live Once', 'Tu nuevo grupo favorito', 'Call It Fate, Call It Karma', 'Instant Crush (feat. Julian Casablancas)', 'Azul Casi Luz', 'Interesantes', 'Niña de Hielo', 'Sopa fría', 'Moving', 'Puede Ser', 'Por la boca vive el pez', 'Por Ti', 'Destrangis in the Night', 'Destrangis in the Night', 'Por la boca vive el pez', 'Después (con Gualberto y Bebe)', 'La Revolución Sexual', 'Let Me Out', 'Tarde o temprano', 'Lo Que Pasa Es Que Me Cuelgo', 'Tal Como Eres', 'Marinade', 'Live Jam in Polynesia', 'Monkey Throw Monkey', 'Shalala', 'Charmed', 'White Winter Hymnal', 'Our Mutual Friend', 'Leaving Today', 'Tonight We Fly', 'Azul Casi Luz', 'Niña de Hielo', 'Del Montón', 'Playa', 'Pacific Theme', 'Teardrop', 'Weather Storm', 'Modern Man', 'Ready to Start', "Alameda's Blues", 'La grifa', 'Porselana Teeth', "Alameda's Blues", 'Todo El Mund
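
The output above contains repeated titles (for example 'Azul Casi Luz'), because every short play is listed. If only the distinct titles are of interest, the list can be deduplicated while keeping the order of first appearance. A sketch with a hypothetical `songs_under` helper and made-up play times (except 'Stan', taken from the first cell):

```python
def songs_under(records, seconds=5):
    """Track names of all plays shorter than `seconds`."""
    limit_ms = seconds * 1000
    return [r['trackName'] for r in records if r['msPlayed'] < limit_ms]

sample = [
    {'trackName': 'Azul Casi Luz', 'msPlayed': 1200},   # hypothetical play time
    {'trackName': 'Stan', 'msPlayed': 11093},
    {'trackName': 'Azul Casi Luz', 'msPlayed': 800},    # hypothetical play time
    {'trackName': 'Bang', 'msPlayed': 4999},            # hypothetical play time
]

skipped = songs_under(sample)

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_skipped = list(dict.fromkeys(skipped))

print(skipped)
print(unique_skipped)
```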

## Exercise 11

What's the hour of the day when I listen to music the most?

In [43]:
hour_average = {}
unique_hours = set(song['hour'] for song in spotify)

for hour in unique_hours:
    # Play times (ms) of every song whose play ended in this hour.
    times = []

    for song in spotify:
        if song['hour'] == hour:
            times.append(song['msPlayed'])

    # Note: this ranks hours by average play time per song, not by total listening time.
    average = sum(times) / len(times)
    hour_average[hour] = average

max_hour = max(hour_average, key = hour_average.get)
print(max_hour, hour_average[max_hour])

19 169546.72972972973
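
The cell above ranks hours by the average `msPlayed` per play. If "the most" is read as the hour with the most total listening time, the values should be summed rather than averaged, and the two readings can disagree. A sketch on hypothetical data, with `busiest_hour` as an illustrative name:

```python
from collections import defaultdict

def busiest_hour(records):
    """Return (hour, total_ms) for the hour with the most total play time."""
    total_by_hour = defaultdict(int)
    for record in records:
        total_by_hour[record['hour']] += record['msPlayed']
    hour = max(total_by_hour, key=total_by_hour.get)
    return hour, total_by_hour[hour]

# Hypothetical sample: hour 19 has the highest average (one long play),
# but hour 8 has the highest total (three medium plays).
sample = [
    {'hour': 8, 'msPlayed': 100000},
    {'hour': 8, 'msPlayed': 120000},
    {'hour': 8, 'msPlayed': 90000},
    {'hour': 19, 'msPlayed': 250000},
]

print(busiest_hour(sample))
```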


## Exercise 12

Create two functions:

- `dates_played_artist`: receives an artist's name and returns a list with all the dates in which I've listened to that artist.
- `span_artist`: receives an artist's name and returns the span of time between the first and the last time I've listened to that artist, using `dates_played_artist`.

Create a dictionary with the artist's name as key and the span of time as value, in days.

In [53]:
def dates_played_artist(artist_name, spotify):
    dates_played = []

    for song in spotify:
        if song['artistName'] == artist_name:
            dates_played.append(song['date'])

    return dates_played

print(dates_played_artist('Eminem', spotify))

[datetime.date(2023, 11, 14), datetime.date(2023, 11, 14), datetime.date(2023, 11, 14), datetime.date(2023, 11, 14), datetime.date(2023, 11, 14)]


In [55]:
def span_artist(artist_name, spotify):
    dates = dates_played_artist(artist_name, spotify)
    max_date = max(dates)
    min_date = min(dates)
    span = max_date - min_date
    return span

print(span_artist('Mujeres', spotify))

354 days, 0:00:00
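
The last part of the exercise, a dictionary from artist name to span in days, can be built by calling `span_artist` for every artist, but that rescans the whole list once per artist. The sketch below is an alternative under the assumption that each record already has the `date` field from Exercise 6; it tracks the first and last date per artist in one pass. The name `artist_spans_in_days` and the `Artist A` records are hypothetical.

```python
from datetime import date

def artist_spans_in_days(records):
    """Map each artist to the days between their first and last play."""
    first_seen = {}
    last_seen = {}
    for record in records:
        artist = record['artistName']
        day = record['date']
        if artist not in first_seen or day < first_seen[artist]:
            first_seen[artist] = day
        if artist not in last_seen or day > last_seen[artist]:
            last_seen[artist] = day
    # Subtracting two dates gives a timedelta; .days is its whole-day count.
    return {artist: (last_seen[artist] - first_seen[artist]).days
            for artist in first_seen}

sample = [
    {'artistName': 'Eminem', 'date': date(2023, 11, 14)},
    {'artistName': 'Eminem', 'date': date(2023, 11, 14)},
    {'artistName': 'Artist A', 'date': date(2024, 1, 1)},    # hypothetical
    {'artistName': 'Artist A', 'date': date(2024, 1, 31)},   # hypothetical
]

print(artist_spans_in_days(sample))
```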


## Exercise 13

For each artist, what's the ratio of songs that I've listened to for more than 10 s to the total number of songs I've listened to from that artist?

For example, if I've listened to 10 songs from artist A, and 3 of them have been listened to for more than 10 s, the ratio is 0.3.

In [64]:
unique_artists = set(song['artistName'] for song in spotify)
artist_song_ratio = {}
count_10 = 0
total = 0

for artist in unique_artists:
    for song in spotify:
        if song['artistName'] == artist and (song['msPlayed'] / 1000) > 10:
            count_10 += 1
        else:
            continue
    artist_song_ratio[artist] = count_10

    for song in spotify:
        if song['artistName'] == artist:
            total += 1
        else:
            continue
    artist_song_ratio[artist] = artist_song_ratio[artist] / total



print(artist_song_ratio)            

print(artist_song_ratio['Mad People'])

{'Haddaway': 1.0, 'Biznaga': 0.8461538461538461, 'Giggs': 0.8571428571428571, 'Bugseed': 0.8666666666666667, 'Cleo Sol': 0.875, 'Cecilio G.': 0.8235294117647058, 'Whitney': 0.7666666666666667, 'Will Smith': 0.7741935483870968, 'Chico Buarque': 0.8108108108108109, 'Lighthouse Family': 0.8157894736842105, 'Frank Sinatra': 0.803030303030303, 'Lykke Li': 0.7857142857142857, 'The Pioneers': 0.7922077922077922, 'Andrés Calamaro': 0.7816091954022989, 'India Martinez': 0.7727272727272727, 'Leif Vollebekk': 0.7752808988764045, 'Wolf Alice': 0.776595744680851, 'The Aggrovators': 0.7878787878787878, 'Sharon Van Etten': 0.7941176470588235, 'Michael Kiwanuka': 0.794392523364486, 'marián': 0.7981651376146789, 'Ondatrópica': 0.8, 'Sweeps': 0.8018018018018018, 'Cabaret Voltaire': 0.8035714285714286, 'Reality': 0.7964601769911505, 'Lluis Llach': 0.7982456140350878, 'The Kinks': 0.7931034482758621, 'FERNANDO ALFARO': 0.8106060606060606, 'Eminem': 0.8175182481751825, 'Rockers All Stars': 0.82312925170068
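
In the cell above, `count_10` and `total` are initialised once, outside the loop over artists, and never reset. Each stored value is therefore a running ratio over all artists processed so far, not that artist's own ratio. A sketch that keeps separate counters per artist and needs only one pass over the data; `ratio_over_threshold` and the sample records are hypothetical.

```python
from collections import defaultdict

def ratio_over_threshold(records, seconds=10):
    """For each artist, the share of plays longer than `seconds`."""
    threshold_ms = seconds * 1000
    total = defaultdict(int)   # plays per artist
    over = defaultdict(int)    # plays longer than the threshold per artist
    for record in records:
        artist = record['artistName']
        total[artist] += 1
        if record['msPlayed'] > threshold_ms:
            over[artist] += 1
    return {artist: over[artist] / total[artist] for artist in total}

# Hypothetical sample: Artist A has 1 of 4 plays over 10 s, Artist B has 2 of 2.
sample = [
    {'artistName': 'Artist A', 'msPlayed': 3000},
    {'artistName': 'Artist A', 'msPlayed': 4000},
    {'artistName': 'Artist A', 'msPlayed': 9000},
    {'artistName': 'Artist A', 'msPlayed': 150000},
    {'artistName': 'Artist B', 'msPlayed': 120000},
    {'artistName': 'Artist B', 'msPlayed': 95000},
]

print(ratio_over_threshold(sample))
```

Because the counters are keyed by artist, the order in which artists appear no longer affects any individual ratio.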