# Lecture 3.1: Data Reading

# 1. Reading Data (round one)

# The Dataset

<img src = "Spotify Top 50.png">

**Summary**

"Syntactic sugar": methods that make code more concise but are not strictly necessary for the library to function.

In this notebook: .head(), .tail(), .sample(), .loc[], .iloc[]

The main distinction between the two indexers:
- loc[] gets rows with particular labels
- iloc[] gets rows at integer positions

**Selection operators:**

loc[] selects items by label **(df.loc[row_labels, column_labels])**
- Argument1 = rows
- Argument2 = columns

iloc[] selects items by integer position **(df.iloc[row_integers, column_integers])**
- Argument1 = rows
- Argument2 = columns

For both loc[] and iloc[], each argument can be:
- a list
- a slice [:] (a loc[] slice includes the right-hand endpoint; an iloc[] slice excludes it, like a Python list slice)
- a single value

**loc[] is usually the safer choice: it selects by label, so the code still returns the same rows even if the row order is shuffled.**

[] takes only one argument, which may be:
- slice of row numbers
- list of column labels
- a single column label
- [] is more common than loc in practice
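
The selection rules summarized above can be sketched on a small table. The data and the names `toy_df`, `by_label`, and so on are hypothetical and chosen only for illustration; the index labels are deliberately not 0, 1, 2, ... so that labels and positions differ.

```python
import pandas as pd

# Hypothetical toy data; the index labels are deliberately not 0..n-1.
toy_df = pd.DataFrame(
    {"song": ["a", "b", "c", "d"], "position": [1, 2, 3, 4]},
    index=[10, 20, 30, 40],
)

# loc slices by label and includes the end label (30).
by_label = toy_df.loc[10:30, ["song"]]

# iloc slices by integer position and excludes the end position (3).
by_position = toy_df.iloc[0:3, [0]]

# After shuffling the rows, a label still finds the same row.
shuffled = toy_df.sample(frac=1, random_state=0)
same_row = shuffled.loc[20, "song"]

# Plain [] takes a single argument.
one_column = toy_df["song"]                  # single label: a Series
two_columns = toy_df[["song", "position"]]   # list of labels: a DataFrame
first_two_rows = toy_df[0:2]                 # slice: rows by position
```

Both `by_label` and `by_position` contain the rows labelled 10, 20, and 30, even though the two slices are written with different end values.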

In [16]:
# importing the zipfile module
from zipfile import ZipFile

filename = "/Users/christopherreid/My Drive (christopherreid@arizona.edu)/Classes/6. Summer 2023/CSC 380 - Principles of Data Science/Lecture Slides/Lecture 3.1/spotify-top-50-playlist-songs-anxods.zip"

# loading the temp.zip and creating a zip object
with ZipFile(filename, 'r') as zObject:
    
    # Extracting all the members of zip
    # into specific location
    zObject.extractall('spotify-top-50')
    
    

In [20]:
global_50_path = '/Users/christopherreid/My Drive (christopherreid@arizona.edu)/Classes/6. Summer 2023/CSC 380 - Principles of Data Science/Lecture Slides/Lecture 3.1/spotify-top-50/data/spotify-streaming-top-50-world.csv'

# 1. Reading Data (using csv)

In [21]:
import csv

global_50_list = []

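with open(global_50_path, newline='') as csvfile:
    # Note: delimiter=' ' splits each line on spaces rather than commas,
    # which is why the rows in the output below are broken at spaces.
    reader = csv.reader(csvfile, delimiter= ' ', quotechar= '|')
    for row in reader:
        global_50_list.append(row)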

In [22]:
global_50_list[0:2]

[['date,position,song,artist,popularity,duration_ms,album_type,total_tracks,release_date,is_explicit,album_cover_url'],
 ['2023-05-18,1,Ella',
  'Baila',
  'Sola,Eslabon',
  'Armado,89,165671,album,16,2023-04-28,False,https://i.scdn.co/image/ab67616d0000b273dfddf1cb31b85a6d28b7d91f']]

**Who is the top artist of the day?**

In [23]:
date = '2023-05-27'

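for row in global_50_list[1:]:
    # Because of the space delimiter, row[0] holds 'date,position,<first word of song>'
    if date + ",1," in row[0]:
        # row[1] is the second space-separated token, which is not necessarily the artist
        print(row[1])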

Baila
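
The output above is a fragment of the song title rather than an artist, because the reader split each line on spaces. A minimal sketch of the same lookup with a comma-delimited reader follows. It uses `csv.DictReader` on two rows copied from the dataset preview and held in memory, so the names `sample_csv`, `rows`, and `top_artists` are illustrative assumptions, not part of the original notebook.

```python
import csv
import io

# Two rows copied from the dataset preview, held in memory so the sketch is self-contained.
sample_csv = io.StringIO(
    "date,position,song,artist\n"
    "2023-05-18,1,Ella Baila Sola,Eslabon Armado\n"
    "2023-05-18,2,un x100to,Grupo Frontera & Bad Bunny\n"
)

# The default delimiter is a comma, so each field lands in its own column,
# and DictReader uses the header row as the keys.
rows = list(csv.DictReader(sample_csv))

target_date = "2023-05-18"
top_artists = [
    row["artist"]
    for row in rows
    if row["date"] == target_date and row["position"] == "1"  # csv yields strings
]
print(top_artists)
```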


# Pandas

In [24]:
!pip install pandas



In [25]:
import pandas as pd

In [26]:
# read_csv() function: reads CSV file data into a DataFrame

global_50_df = pd.read_csv(global_50_path)

In [27]:
# to_csv() method: writes the DataFrame to a new CSV file

#global_50_df.to_csv('temp_file.csv', index=False)

In [28]:
# df = dataframe, table version of the data

global_50_df

Unnamed: 0,date,position,song,artist,popularity,duration_ms,album_type,total_tracks,release_date,is_explicit,album_cover_url
0,2023-05-18,1,Ella Baila Sola,Eslabon Armado,89,165671,album,16,2023-04-28,False,https://i.scdn.co/image/ab67616d0000b273dfddf1...
1,2023-05-18,2,un x100to,Grupo Frontera & Bad Bunny,99,194563,single,1,2023-04-17,False,https://i.scdn.co/image/ab67616d0000b273716c0b...
2,2023-05-18,3,La Bebe - Remix,Yng Lvcas & Peso Pluma,99,234352,single,2,2023-03-17,True,https://i.scdn.co/image/ab67616d0000b273a04be3...
3,2023-05-18,4,Cupid - Twin Ver.,FIFTY FIFTY,97,174253,single,3,2023-02-24,False,https://i.scdn.co/image/ab67616d0000b27337c0b3...
4,2023-05-18,5,Flowers,Miley Cyrus,91,200600,album,13,2023-03-10,False,https://i.scdn.co/image/ab67616d0000b27358039b...
...,...,...,...,...,...,...,...,...,...,...,...
2045,2023-06-27,46,Another Love,Tom Odell,93,244360,album,15,2013-06-24,True,https://i.scdn.co/image/ab67616d0000b2731917a0...
2046,2023-06-27,47,Popular (with Playboi Carti & Madonna) - The I...,The Weeknd & Playboi Carti & Madonna,91,215466,single,1,2023-06-02,True,https://i.scdn.co/image/ab67616d0000b273eac1ac...
2047,2023-06-27,48,Barbie World (with Aqua) [From Barbie The Album],Nicki Minaj & Ice Spice & Aqua,80,109750,single,1,2023-06-23,True,https://i.scdn.co/image/ab67616d0000b2737e8f93...
2048,2023-06-27,49,Heat Waves,Glass Animals,93,238805,album,16,2020-08-07,False,https://i.scdn.co/image/ab67616d0000b273712701...


In [29]:
# columns attribute: holds all column headers

global_50_df.columns

Index(['date', 'position', 'song', 'artist', 'popularity', 'duration_ms',
       'album_type', 'total_tracks', 'release_date', 'is_explicit',
       'album_cover_url'],
      dtype='object')

**Who is the top artist of the day?**

In [30]:
# global_50_df['date']                        -> the date column (one value per row)
# global_50_df['date'] == date                 -> boolean Series: True where the row's date matches
# global_50_df[global_50_df['date'] == date]  -> only the rows with a matching date

# Selects the top song (position 1) for the given date

global_50_df[ (global_50_df['date'] == date) & (global_50_df['position'] == 1) ]

Unnamed: 0,date,position,song,artist,popularity,duration_ms,album_type,total_tracks,release_date,is_explicit,album_cover_url
450,2023-05-27,1,Ella Baila Sola,Eslabon Armado,91,165671,album,16,2023-04-28,False,https://i.scdn.co/image/ab67616d0000b273dfddf1...
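
The filtering steps described in the comments above can be sketched on a three-row table. The rows are taken from the dataset previews; the names `chart_df`, `is_date`, and `is_top` are hypothetical.

```python
import pandas as pd

# Three rows taken from the dataset previews.
chart_df = pd.DataFrame(
    {
        "date": ["2023-05-18", "2023-05-18", "2023-06-05"],
        "position": [1, 2, 1],
        "artist": ["Eslabon Armado", "Grupo Frontera & Bad Bunny", "Eslabon Armado"],
    }
)

# Each comparison yields a boolean Series with one entry per row.
is_date = chart_df["date"] == "2023-05-18"
is_top = chart_df["position"] == 1

# & combines the masks element-wise. When the comparisons are written inline,
# each needs its own parentheses because & binds more tightly than ==.
top_of_day = chart_df[is_date & is_top]
top_artist = top_of_day["artist"].iloc[0]
```

Naming the masks first is a readability choice; the one-line form used in the cell above is equivalent.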


# 2.2 Create a dataframe

Format: pd.DataFrame(data, index, columns)
- Each is a separate argument

In [31]:
# DataFrame() method: creates a dataframe

pd.DataFrame(['csc380', 'csv'])

Unnamed: 0,0
0,csc380
1,csv


In [32]:
# Adding rows (INCORRECT): the second positional argument is the index, so this sets row labels instead of adding a row

pd.DataFrame(['csc380', 'csv'], ['csc380', 'pandas'])

Unnamed: 0,0
csc380,csc380
pandas,csv


In [33]:
# Adding rows (CORRECT): pass a single list that contains one list per row

pd.DataFrame([ ['csc380', 'csv'], ['csc380', 'pandas'] ])

Unnamed: 0,0,1
0,csc380,csv
1,csc380,pandas


In [18]:
# Setting Columns

pd.DataFrame( [['csc380', 'csv'], ['csc380', 'pandas']], columns=['class', 'method'] )

Unnamed: 0,class,method
0,csc380,csv
1,csc380,pandas


In [19]:
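# Dictionary: each dict is one row, and its keys become the column names
# NaN marks a missing value (for example, a key that is absent from one of the dicts)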

pd.DataFrame([{'class': 'csc380', 'method': 'csv'}, {'class': 'csc380', 'method':'pandas'}])

Unnamed: 0,class,method
0,csc380,csv
1,csc380,pandas
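
The NaN case mentioned in the comment does not show up in the output above, because both dictionaries have the same keys. A small sketch, assuming a hypothetical second record that lacks the `method` key:

```python
import pandas as pd

# Hypothetical records; the second one has no 'method' key.
records = [{"class": "csc380", "method": "csv"}, {"class": "csc380"}]
records_df = pd.DataFrame(records)

# The missing entry is filled with NaN, which isna() detects.
missing_mask = records_df["method"].isna()
```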


In [20]:
pd.DataFrame(['csv','pandas'],['C', 'V'], ['CSC380'])

Unnamed: 0,CSC380
C,csv
V,pandas
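
The three positional arguments in the call above are data, index, and columns, in that order. Writing them as keyword arguments makes the roles explicit; this is a sketch with illustrative variable names, not a change to the original cell.

```python
import pandas as pd

# Same values as the cell above, with the three arguments named.
named_df = pd.DataFrame(data=["csv", "pandas"], index=["C", "V"], columns=["CSC380"])

# The positional form from the notebook, for comparison.
positional_df = pd.DataFrame(["csv", "pandas"], ["C", "V"], ["CSC380"])
```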


In [21]:
# reset_index() method: moves the current index into a column and resets the index to integers starting at 0

sample_df = pd.DataFrame(['csv','pandas'],['C', 'V'], ['CSC380']).reset_index()
sample_df

Unnamed: 0,index,CSC380
0,C,csv
1,V,pandas


In [22]:
# drop() method: drops a column from the dataframe

sample_df.drop('index', axis= 'columns', inplace=True)

In [23]:
sample_df

Unnamed: 0,CSC380
0,csv
1,pandas


In [24]:
# Adds column to dataframe

sample_df['random_count'] = [89, 52]
sample_df

Unnamed: 0,CSC380,random_count
0,csv,89
1,pandas,52


In [25]:
# Subtracts from each row in the dataframe

sample_df['random_count'] = sample_df['random_count'] - 10
sample_df

Unnamed: 0,CSC380,random_count
0,csv,79
1,pandas,42


In [26]:
# apply() method: on a Series, applies a function to each element
# lambda: an anonymous (unnamed) function
# upper() method: converts a string to uppercase

sample_df['CSC380'] = sample_df['CSC380'].apply(lambda x : x.upper())

In [27]:
sample_df

Unnamed: 0,CSC380,random_count
0,CSV,79
1,PANDAS,42
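
For string columns, pandas also offers the `.str` accessor, which gives the same result here without an explicit lambda. A short sketch that rebuilds the small table from the cells above (the name `sample` is hypothetical):

```python
import pandas as pd

sample = pd.DataFrame({"CSC380": ["csv", "pandas"], "random_count": [79, 42]})

# Series.apply calls the function once per element.
via_apply = sample["CSC380"].apply(lambda x: x.upper())

# The .str accessor performs the same operation on every element.
via_str = sample["CSC380"].str.upper()
```

`apply` remains the more general tool, since it accepts any function, while `.str` covers common string operations.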


# 2.3 Working with a DataFrame

In [28]:
global_50_df[2:5]

Unnamed: 0,date,position,song,artist,popularity,duration_ms,album_type,total_tracks,release_date,is_explicit,album_cover_url
2,2023-05-18,3,La Bebe - Remix,Yng Lvcas & Peso Pluma,99,234352,single,2,2023-03-17,True,https://i.scdn.co/image/ab67616d0000b273a04be3...
3,2023-05-18,4,Cupid - Twin Ver.,FIFTY FIFTY,97,174253,single,3,2023-02-24,False,https://i.scdn.co/image/ab67616d0000b27337c0b3...
4,2023-05-18,5,Flowers,Miley Cyrus,91,200600,album,13,2023-03-10,False,https://i.scdn.co/image/ab67616d0000b27358039b...


In [29]:
# head() method: returns the first rows of the table (5 by default)

global_50_df.head(2)

Unnamed: 0,date,position,song,artist,popularity,duration_ms,album_type,total_tracks,release_date,is_explicit,album_cover_url
0,2023-05-18,1,Ella Baila Sola,Eslabon Armado,89,165671,album,16,2023-04-28,False,https://i.scdn.co/image/ab67616d0000b273dfddf1...
1,2023-05-18,2,un x100to,Grupo Frontera & Bad Bunny,99,194563,single,1,2023-04-17,False,https://i.scdn.co/image/ab67616d0000b273716c0b...


In [30]:
# tail() method: returns the last rows of the table (5 by default)

global_50_df.tail(2)

Unnamed: 0,date,position,song,artist,popularity,duration_ms,album_type,total_tracks,release_date,is_explicit,album_cover_url
2048,2023-06-27,49,Heat Waves,Glass Animals,93,238805,album,16,2020-08-07,False,https://i.scdn.co/image/ab67616d0000b273712701...
2049,2023-06-27,50,Yandel 150,Yandel,90,216148,album,17,2023-01-13,True,https://i.scdn.co/image/ab67616d0000b273b2aec0...


In [31]:
# sample() method: returns the specified number of randomly chosen rows as a DataFrame

global_50_df.sample(2)

Unnamed: 0,date,position,song,artist,popularity,duration_ms,album_type,total_tracks,release_date,is_explicit,album_cover_url
1995,2023-06-26,46,La Bachata,Manuel Turizo,88,162637,album,15,2023-03-17,False,https://i.scdn.co/image/ab67616d0000b2734dd995...
1559,2023-06-18,10,Daylight,David Kushner,98,212953,single,1,2023-04-14,False,https://i.scdn.co/image/ab67616d0000b27395ca6a...


In [32]:
# iloc[] indexer: slices by integer position, like list slicing (end position 19 is excluded)

global_50_df.iloc[15:19]

Unnamed: 0,date,position,song,artist,popularity,duration_ms,album_type,total_tracks,release_date,is_explicit,album_cover_url
15,2023-05-18,16,BESO,ROSALÍA & Rauw Alejandro,96,194543,single,3,2023-03-24,False,https://i.scdn.co/image/ab67616d0000b2734d6cf0...
16,2023-05-18,17,Die For You (with Ariana Grande) - Remix,The Weeknd,88,232857,album,21,2023-03-14,False,https://i.scdn.co/image/ab67616d0000b2738ad8f5...
17,2023-05-18,18,PRC,Peso Pluma & Natanael Cano,96,184066,single,1,2023-01-23,True,https://i.scdn.co/image/ab67616d0000b2737be314...
18,2023-05-18,19,Calm Down (with Selena Gomez),Rema,78,239317,album,22,2023-04-27,False,https://i.scdn.co/image/ab67616d0000b273963265...


In [33]:
# loc[] indexer: slices by index label (end label 19 is included)

global_50_df.loc[15:19]

Unnamed: 0,date,position,song,artist,popularity,duration_ms,album_type,total_tracks,release_date,is_explicit,album_cover_url
15,2023-05-18,16,BESO,ROSALÍA & Rauw Alejandro,96,194543,single,3,2023-03-24,False,https://i.scdn.co/image/ab67616d0000b2734d6cf0...
16,2023-05-18,17,Die For You (with Ariana Grande) - Remix,The Weeknd,88,232857,album,21,2023-03-14,False,https://i.scdn.co/image/ab67616d0000b2738ad8f5...
17,2023-05-18,18,PRC,Peso Pluma & Natanael Cano,96,184066,single,1,2023-01-23,True,https://i.scdn.co/image/ab67616d0000b2737be314...
18,2023-05-18,19,Calm Down (with Selena Gomez),Rema,78,239317,album,22,2023-04-27,False,https://i.scdn.co/image/ab67616d0000b273963265...
19,2023-05-18,20,Yandel 150,Yandel,90,216148,album,17,2023-01-13,True,https://i.scdn.co/image/ab67616d0000b273b2aec0...


In [34]:
global_50_subset_df = global_50_df.sample(10)

In [35]:
global_50_subset_df

Unnamed: 0,date,position,song,artist,popularity,duration_ms,album_type,total_tracks,release_date,is_explicit,album_cover_url
346,2023-05-24,47,Like Crazy,Jimin,93,212241,single,6,2023-03-24,False,https://i.scdn.co/image/ab67616d0000b2732b4607...
989,2023-06-06,40,Los del Espacio,LIT killah,79,338000,single,1,2023-06-01,False,https://i.scdn.co/image/ab67616d0000b27352c8b9...
1898,2023-06-24,49,Heat Waves,Glass Animals,93,238805,album,16,2020-08-07,False,https://i.scdn.co/image/ab67616d0000b273712701...
1433,2023-06-15,34,Last Night,Morgan Wallen,90,163854,album,36,2023-03-03,True,https://i.scdn.co/image/ab67616d0000b273705079...
1111,2023-06-09,12,Kill Bill,SZA,94,153946,album,23,2022-12-08,False,https://i.scdn.co/image/ab67616d0000b2730c471c...
374,2023-05-25,25,I Wanna Be Yours,Arctic Monkeys,94,183956,album,12,2013-09-09,False,https://i.scdn.co/image/ab67616d0000b2734ae1c4...
1769,2023-06-22,20,I Wanna Be Yours,Arctic Monkeys,95,183956,album,12,2013-09-09,False,https://i.scdn.co/image/ab67616d0000b2734ae1c4...
224,2023-05-22,25,TQM,Fuerza Regida,75,158965,single,1,2023-05-19,True,https://i.scdn.co/image/ab67616d0000b273832ea5...
57,2023-05-19,8,As It Was,Harry Styles,92,167303,album,13,2022-05-20,False,https://i.scdn.co/image/ab67616d0000b2732e8ed7...
2034,2023-06-27,35,Die For You (with Ariana Grande) - Remix,The Weeknd,88,232857,album,21,2023-03-14,False,https://i.scdn.co/image/ab67616d0000b2738ad8f5...


In [36]:
global_50_subset_df[3:5]

Unnamed: 0,date,position,song,artist,popularity,duration_ms,album_type,total_tracks,release_date,is_explicit,album_cover_url
1433,2023-06-15,34,Last Night,Morgan Wallen,90,163854,album,36,2023-03-03,True,https://i.scdn.co/image/ab67616d0000b273705079...
1111,2023-06-09,12,Kill Bill,SZA,94,153946,album,23,2022-12-08,False,https://i.scdn.co/image/ab67616d0000b2730c471c...


In [37]:
# iloc: slices by integer position, regardless of the index labels

global_50_subset_df.iloc[3:5]

Unnamed: 0,date,position,song,artist,popularity,duration_ms,album_type,total_tracks,release_date,is_explicit,album_cover_url
1433,2023-06-15,34,Last Night,Morgan Wallen,90,163854,album,36,2023-03-03,True,https://i.scdn.co/image/ab67616d0000b273705079...
1111,2023-06-09,12,Kill Bill,SZA,94,153946,album,23,2022-12-08,False,https://i.scdn.co/image/ab67616d0000b2730c471c...


In [38]:
# loc: selects by index label; raises a KeyError when the requested labels are not in the index

global_50_subset_df.loc[[1143, 1389]]

KeyError: "None of [Index([1143, 1389], dtype='int64')] are in the [index]"
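
One way to avoid this KeyError when some labels may be absent is to keep only the labels that actually exist before calling loc. The sketch below uses a hypothetical three-row subset (`subset_df`) and `Index.intersection`; it is one possible approach, not the notebook's method.

```python
import pandas as pd

# Hypothetical subset whose index labels are not contiguous.
subset_df = pd.DataFrame({"song": ["x", "y", "z"]}, index=[346, 989, 1898])

wanted = [989, 1143]  # 1143 is not in the index

# Index.intersection keeps only the labels that exist, so loc does not raise.
present = subset_df.index.intersection(wanted)
found = subset_df.loc[present]

# Asking loc for a list that contains a missing label raises KeyError.
try:
    subset_df.loc[wanted]
    raised = False
except KeyError:
    raised = True
```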

In [39]:
global_50_df.iloc[5:15]['song']

5                                   Daylight
6                                  Kill Bill
7                                     Tattoo
8                                  As It Was
9                                        TQG
10                               Cha Cha Cha
11                                Classy 101
12                                 Acróstico
13    Creepin' (with The Weeknd & 21 Savage)
14          See You Again (feat. Kali Uchis)
Name: song, dtype: object

In [40]:
# DataFrame vs Series: 2-dimensional data vs 1-dimensional data
# to_frame() method: converts a Series to a DataFrame

global_50_df.iloc[5:15]['song'].to_frame()

Unnamed: 0,song
5,Daylight
6,Kill Bill
7,Tattoo
8,As It Was
9,TQG
10,Cha Cha Cha
11,Classy 101
12,Acróstico
13,Creepin' (with The Weeknd & 21 Savage)
14,See You Again (feat. Kali Uchis)


In [41]:
# Two labels without a list are read as the single tuple key ('song', 'artist'), which raises a KeyError

global_50_df.iloc[5:15]['song','artist']

KeyError: ('song', 'artist')

In [42]:
# Pass multiple column labels as a single list

global_50_df.iloc[5:15][['song','artist']]

Unnamed: 0,song,artist
5,Daylight,David Kushner
6,Kill Bill,SZA
7,Tattoo,Loreen
8,As It Was,Harry Styles
9,TQG,KAROL G
10,Cha Cha Cha,Käärijä
11,Classy 101,Feid & Young Miko
12,Acróstico,Shakira
13,Creepin' (with The Weeknd & 21 Savage),Metro Boomin
14,See You Again (feat. Kali Uchis),"Tyler, The Creator"
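
The difference between a single label and a list of labels can be checked directly from the type of the result. The sketch below uses three rows from the output above plus a `position` column; `songs_df` and the other names are hypothetical.

```python
import pandas as pd

songs_df = pd.DataFrame(
    {
        "song": ["Daylight", "Kill Bill", "Tattoo"],
        "artist": ["David Kushner", "SZA", "Loreen"],
        "position": [6, 7, 8],
    }
)

one_col = songs_df["song"]               # a single label gives a Series
one_col_frame = songs_df[["song"]]       # a one-element list gives a DataFrame
two_cols = songs_df[["song", "artist"]]  # a list of labels gives a DataFrame

# Rows and columns can also be picked in one step with iloc;
# column positions 0 and 1 correspond to 'song' and 'artist' here.
rows_and_cols = songs_df.iloc[0:2, [0, 1]]
```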


# 2.4 Conditionals

**Q: What are the top 50 artists of June 5th 2023?**

In [43]:
date = '2023-06-05'
global_50_df[global_50_df['date'] == date]

Unnamed: 0,date,position,song,artist,popularity,duration_ms,album_type,total_tracks,release_date,is_explicit,album_cover_url
900,2023-06-05,1,Ella Baila Sola,Eslabon Armado,92,165671,album,16,2023-04-28,False,https://i.scdn.co/image/ab67616d0000b273dfddf1...
901,2023-06-05,2,"Peso Pluma: Bzrp Music Sessions, Vol. 55",Bizarrap & Peso Pluma,89,188361,single,1,2023-06-01,True,https://i.scdn.co/image/ab67616d0000b273155830...
902,2023-06-05,3,WHERE SHE GOES,Bad Bunny,96,231704,single,1,2023-05-18,True,https://i.scdn.co/image/ab67616d0000b273ab5c9c...
903,2023-06-05,4,La Bebe - Remix,Yng Lvcas & Peso Pluma,99,234352,single,2,2023-03-17,True,https://i.scdn.co/image/ab67616d0000b273a04be3...
904,2023-06-05,5,un x100to,Grupo Frontera & Bad Bunny,100,194563,single,1,2023-04-17,False,https://i.scdn.co/image/ab67616d0000b273716c0b...
905,2023-06-05,6,Flowers,Miley Cyrus,91,200600,album,13,2023-03-10,False,https://i.scdn.co/image/ab67616d0000b27358039b...
906,2023-06-05,7,Cupid - Twin Ver.,FIFTY FIFTY,98,174253,single,3,2023-02-24,False,https://i.scdn.co/image/ab67616d0000b27337c0b3...
907,2023-06-05,8,TQM,Fuerza Regida,91,158965,single,1,2023-05-19,True,https://i.scdn.co/image/ab67616d0000b273832ea5...
908,2023-06-05,9,As It Was,Harry Styles,92,167303,album,13,2022-05-20,False,https://i.scdn.co/image/ab67616d0000b2732e8ed7...
909,2023-06-05,10,Daylight,David Kushner,97,212953,single,1,2023-04-14,False,https://i.scdn.co/image/ab67616d0000b27395ca6a...


**Q: What are the top 5 artists of June 5th 2023?**

In [44]:
global_50_df[ (global_50_df['date'] == date) & (global_50_df['position'] <= 5) ]

Unnamed: 0,date,position,song,artist,popularity,duration_ms,album_type,total_tracks,release_date,is_explicit,album_cover_url
900,2023-06-05,1,Ella Baila Sola,Eslabon Armado,92,165671,album,16,2023-04-28,False,https://i.scdn.co/image/ab67616d0000b273dfddf1...
901,2023-06-05,2,"Peso Pluma: Bzrp Music Sessions, Vol. 55",Bizarrap & Peso Pluma,89,188361,single,1,2023-06-01,True,https://i.scdn.co/image/ab67616d0000b273155830...
902,2023-06-05,3,WHERE SHE GOES,Bad Bunny,96,231704,single,1,2023-05-18,True,https://i.scdn.co/image/ab67616d0000b273ab5c9c...
903,2023-06-05,4,La Bebe - Remix,Yng Lvcas & Peso Pluma,99,234352,single,2,2023-03-17,True,https://i.scdn.co/image/ab67616d0000b273a04be3...
904,2023-06-05,5,un x100to,Grupo Frontera & Bad Bunny,100,194563,single,1,2023-04-17,False,https://i.scdn.co/image/ab67616d0000b273716c0b...


In [45]:
# shape attribute: (number of rows, number of columns) as a tuple

global_50_df.shape

(2050, 11)

**Q: How many artists have entered the top 50 Spotify playlist?**
(Over the total lifetime covered by the dataset)

In [46]:
# to_list() method: converts a Series (here, one column of the DataFrame) into a Python list

global_50_df['artist'].to_list()

['Eslabon Armado',
 'Grupo Frontera & Bad Bunny',
 'Yng Lvcas & Peso Pluma',
 'FIFTY FIFTY',
 'Miley Cyrus',
 'David Kushner',
 'SZA',
 'Loreen',
 'Harry Styles',
 'KAROL G',
 'Käärijä',
 'Feid & Young Miko',
 'Shakira',
 'Metro Boomin',
 'Tyler, The Creator',
 'ROSALÍA & Rauw Alejandro',
 'The Weeknd',
 'Peso Pluma & Natanael Cano',
 'Rema',
 'Yandel',
 'Junior H & Peso Pluma',
 'PinkPantheress & Ice Spice',
 'Taylor Swift',
 'The Weeknd',
 'Arctic Monkeys',
 'Lil Durk & J. Cole',
 'Morgan Wallen',
 'David Guetta & Bebe Rexha',
 'Natanael Cano & Peso Pluma & Gabito Ballesteros',
 'd4vd',
 'Miley Cyrus',
 'JVKE',
 'Stephen Sanchez & Em Beihold',
 'Jimin',
 'Kali Uchis',
 'Taylor Swift',
 'Bizarrap & Shakira',
 'Sam Smith',
 'Tom Odell',
 'The Weeknd',
 'Peso Pluma',
 'Alessandra',
 'Ozuna',
 'Libianca',
 'OneRepublic',
 'Manuel Turizo',
 'Eminem',
 'Yahritza Y Su Esencia & Grupo Frontera',
 'JISOO',
 'Chino Pacas',
 'Eslabon Armado',
 'Grupo Frontera & Bad Bunny',
 'FIFTY FIFTY',
 'Yng

In [47]:
# unique() method: returns an array of the distinct values in a Series; it is available on a Series, not on a DataFrame

top_50_artists_of_all_time = global_50_df['artist'].unique()

In [48]:
top_50_artists_of_all_time

array(['Eslabon Armado', 'Grupo Frontera & Bad Bunny',
       'Yng Lvcas & Peso Pluma', 'FIFTY FIFTY', 'Miley Cyrus',
       'David Kushner', 'SZA', 'Loreen', 'Harry Styles', 'KAROL G',
       'Käärijä', 'Feid & Young Miko', 'Shakira', 'Metro Boomin',
       'Tyler, The Creator', 'ROSALÍA & Rauw Alejandro', 'The Weeknd',
       'Peso Pluma & Natanael Cano', 'Rema', 'Yandel',
       'Junior H & Peso Pluma', 'PinkPantheress & Ice Spice',
       'Taylor Swift', 'Arctic Monkeys', 'Lil Durk & J. Cole',
       'Morgan Wallen', 'David Guetta & Bebe Rexha',
       'Natanael Cano & Peso Pluma & Gabito Ballesteros', 'd4vd', 'JVKE',
       'Stephen Sanchez & Em Beihold', 'Jimin', 'Kali Uchis',
       'Bizarrap & Shakira', 'Sam Smith', 'Tom Odell', 'Peso Pluma',
       'Alessandra', 'Ozuna', 'Libianca', 'OneRepublic', 'Manuel Turizo',
       'Eminem', 'Yahritza Y Su Esencia & Grupo Frontera', 'JISOO',
       'Chino Pacas', 'Coldplay', 'Fast & Furious: The Fast Saga',
       'Lil Mabu', 'Bad Bunny'

In [49]:
# len(): returns the number of elements in the array

len(top_50_artists_of_all_time)

79
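
The count of distinct values can also be obtained in one call with `nunique()`. A minimal sketch on a hypothetical five-entry column:

```python
import pandas as pd

# Hypothetical artist column with repeats.
artists = pd.Series(["SZA", "Loreen", "SZA", "Yandel", "Loreen"], name="artist")

distinct = artists.unique()     # distinct values, in order of first appearance
n_distinct = artists.nunique()  # number of distinct values (missing values are not counted by default)
```

Note that both approaches treat a collaboration such as 'Grupo Frontera & Bad Bunny' as its own artist string, as the notebook does.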

**Q: How long are the songs usually?**

Evaluate: Average & Median

In [50]:
# columns attribute: returns an Index of the column headings

global_50_df.columns

Index(['date', 'position', 'song', 'artist', 'popularity', 'duration_ms',
       'album_type', 'total_tracks', 'release_date', 'is_explicit',
       'album_cover_url'],
      dtype='object')

In [51]:
# Attribute access: global_50_df.artist is equivalent to global_50_df['artist']

global_50_df.artist

0                             Eslabon Armado
1                 Grupo Frontera & Bad Bunny
2                     Yng Lvcas & Peso Pluma
3                                FIFTY FIFTY
4                                Miley Cyrus
                        ...                 
2045                               Tom Odell
2046    The Weeknd & Playboi Carti & Madonna
2047          Nicki Minaj & Ice Spice & Aqua
2048                           Glass Animals
2049                                  Yandel
Name: artist, Length: 2050, dtype: object

In [52]:
# duration_ms: the column of song lengths in milliseconds, selected by attribute access
# mean() method: calculates the mean of the column

global_50_df.duration_ms.mean()

198069.54926829267

In [53]:
# median() method: calculates the median of a specific dataset

global_50_df.duration_ms.median()

194563.0

**Q: Who are the artists with explicit music that were in the top 50 Spotify playlists?**

In [54]:
global_50_df[global_50_df['is_explicit']]['artist'].unique()

array(['Yng Lvcas & Peso Pluma', 'KAROL G', 'Feid & Young Miko',
       'Metro Boomin', 'Tyler, The Creator', 'Peso Pluma & Natanael Cano',
       'Yandel', 'Lil Durk & J. Cole', 'Morgan Wallen',
       'David Guetta & Bebe Rexha',
       'Natanael Cano & Peso Pluma & Gabito Ballesteros', 'Tom Odell',
       'The Weeknd', 'Eminem', 'Chino Pacas', 'Lil Mabu', 'Bad Bunny',
       'Post Malone', 'Eladio Carrion', 'Fuerza Regida',
       'Beyoncé & Kendrick Lamar', 'Taylor Swift', 'Lil Durk',
       'Peso Pluma', 'Bizarrap & Peso Pluma', 'Dave & Central Cee',
       'The Weeknd & Playboi Carti & Madonna', 'Halsey & SUGA',
       'Sky Rompiendo & Feid & Myke Towers', 'Saiko & Feid & Quevedo',
       'Doja Cat', 'Bizarrap & Rauw Alejandro',
       'Nicki Minaj & Ice Spice & Aqua', 'Young Thug'], dtype=object)

**Q: Of the artists with explicit music in the top 50, how many spots did they take?**

In [55]:
# value_counts() method: counts how many times each distinct value occurs

global_50_df[global_50_df['is_explicit']]['artist'].value_counts()

artist
Yng Lvcas & Peso Pluma                             41
Yandel                                             41
KAROL G                                            41
David Guetta & Bebe Rexha                          41
Morgan Wallen                                      41
The Weeknd                                         41
Tyler, The Creator                                 41
Metro Boomin                                       41
Feid & Young Miko                                  41
Fuerza Regida                                      40
Peso Pluma & Natanael Cano                         38
Bad Bunny                                          38
Tom Odell                                          32
Bizarrap & Peso Pluma                              25
Dave & Central Cee                                 24
Natanael Cano & Peso Pluma & Gabito Ballesteros    22
Lil Durk                                           22
Sky Rompiendo & Feid & Myke Towers                 18
The Weeknd & Playboi 
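
The pattern in the cell above (filter with a boolean column, then count) can be sketched on a few rows. The table `explicit_df` is hypothetical and only illustrates the mechanics.

```python
import pandas as pd

# Hypothetical rows for illustration.
explicit_df = pd.DataFrame(
    {
        "artist": ["Yandel", "SZA", "Yandel", "Bad Bunny"],
        "is_explicit": [True, False, True, True],
    }
)

# A boolean column can be used directly as a mask, with no comparison needed.
explicit_rows = explicit_df[explicit_df["is_explicit"]]

# value_counts() returns a Series indexed by the distinct values, sorted by count.
explicit_counts = explicit_rows["artist"].value_counts()
```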

**Q: Mean duration of songs by each artist?**

In [56]:
# groupby() method: A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

global_50_df[['artist','duration_ms']].groupby('artist').mean()

Unnamed: 0_level_0,duration_ms
artist,Unnamed: 1_level_1
Alessandra,147979.0
Arctic Monkeys,183956.0
BTS,229953.0
Bad Bunny,231704.0
Beyoncé & Kendrick Lamar,260962.0
...,...
Yahritza Y Su Esencia & Grupo Frontera,160517.0
Yandel,216148.0
Yng Lvcas & Peso Pluma,234352.0
Young Thug,206799.5


**Q: Which artists have the shortest and longest mean song duration?**

In [57]:
# sort_values() method: sorts rows by the values in the specified column (ascending by default)

global_50_df[['artist','duration_ms']].groupby('artist').mean().sort_values(by='duration_ms')

Unnamed: 0_level_0,duration_ms
artist,Unnamed: 1_level_1
Lil Mabu,88304.0
DENNIS & MC Kevin o Chris,92093.0
Nicki Minaj & Ice Spice & Aqua,109750.0
Chino Pacas,112087.0
PinkPantheress & Ice Spice,131013.0
...,...
Coldplay,266773.0
Doja Cat,277043.0
Tina Turner,280020.0
Saiko & Feid & Quevedo,288000.0


**Q: Top & bottom 5?**

In [58]:
global_50_df[['artist','duration_ms']].groupby('artist').mean().sort_values(by='duration_ms').head(5)

Unnamed: 0_level_0,duration_ms
artist,Unnamed: 1_level_1
Lil Mabu,88304.0
DENNIS & MC Kevin o Chris,92093.0
Nicki Minaj & Ice Spice & Aqua,109750.0
Chino Pacas,112087.0
PinkPantheress & Ice Spice,131013.0


In [59]:
global_50_df[['artist','duration_ms']].groupby('artist').mean().sort_values(by='duration_ms').tail(5)

Unnamed: 0_level_0,duration_ms
artist,Unnamed: 1_level_1
Coldplay,266773.0
Doja Cat,277043.0
Tina Turner,280020.0
Saiko & Feid & Quevedo,288000.0
LIT killah,338000.0
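
As an alternative to sorting and then taking head() or tail(), a Series offers `nsmallest()` and `nlargest()`. The sketch below uses a hypothetical four-row table (`tracks_df`) and selects the `duration_ms` column after grouping, so the result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical tracks; artist "A" has two songs.
tracks_df = pd.DataFrame(
    {
        "artist": ["A", "A", "B", "C"],
        "duration_ms": [100, 300, 150, 400],
    }
)

# Split by artist, take the mean of each group, and combine into one Series.
mean_by_artist = tracks_df.groupby("artist")["duration_ms"].mean()

shortest = mean_by_artist.nsmallest(2)  # the two smallest means, ascending
longest = mean_by_artist.nlargest(2)    # the two largest means, descending
```

Unlike tail(), `nlargest()` lists the largest value first.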


**Q: Who are the artists whose name starts with 'Lil'?**

In [60]:
# str.startswith() method: checks whether each string in the Series starts with the given prefix; returns a boolean Series

global_50_df[global_50_df['artist'].str.startswith('Lil')]['artist'].unique()

array(['Lil Durk & J. Cole', 'Lil Mabu', 'Lil Durk'], dtype=object)

**Q: How many spots did the above artists take?**

In [61]:
# tolist() method: converts the NumPy array returned by unique() into a Python list (a Series also provides tolist() and to_list())

lil_artists = global_50_df[global_50_df['artist'].str.startswith('Lil')]['artist'].unique().tolist()

In [62]:
# isin() method: on a Series, returns a boolean Series showing whether each element is contained in the given values

global_50_df[global_50_df['artist'].isin(lil_artists)].shape

(33, 11)
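
Because the list of names was itself built from the startswith() mask, filtering with isin() and filtering with the mask directly select the same rows. A sketch on a hypothetical four-row column:

```python
import pandas as pd

names_df = pd.DataFrame({"artist": ["Lil Durk", "SZA", "Lil Mabu", "Lil Durk"]})

# Boolean Series: True where the artist name starts with 'Lil'.
starts_with_lil = names_df["artist"].str.startswith("Lil")

# Route 1: collect the distinct names, then test membership with isin().
lil_names = names_df[starts_with_lil]["artist"].unique().tolist()
via_isin = names_df[names_df["artist"].isin(lil_names)]

# Route 2: use the mask directly.
via_mask = names_df[starts_with_lil]
```

isin() is the more general tool when the list of values comes from somewhere other than the same column.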

**Q: Unique songs by artists whose name starts with 'Lil'?**

In [63]:
lil_artists_songs_df = global_50_df[global_50_df['artist'].isin(lil_artists)][['artist', 'song']]
lil_artists_songs_df

Unnamed: 0,artist,song
25,Lil Durk & J. Cole,All My Life (feat. J. Cole)
77,Lil Durk & J. Cole,All My Life (feat. J. Cole)
126,Lil Durk & J. Cole,All My Life (feat. J. Cole)
149,Lil Mabu,MATHEMATICAL DISRESPECT
176,Lil Durk & J. Cole,All My Life (feat. J. Cole)
230,Lil Durk & J. Cole,All My Life (feat. J. Cole)
281,Lil Durk & J. Cole,All My Life (feat. J. Cole)
328,Lil Durk & J. Cole,All My Life (feat. J. Cole)
375,Lil Durk & J. Cole,All My Life (feat. J. Cole)
427,Lil Durk & J. Cole,All My Life (feat. J. Cole)


In [1]:
# duplicated() method: returns a boolean Series marking each row that repeats an earlier row

lil_artists_songs_df[['artist', 'song']].duplicated()

NameError: name 'lil_artists_songs_df' is not defined

In [2]:
lil_artists_songs_df[lil_artists_songs_df[['artist', 'song']].duplicated()!= True]

NameError: name 'lil_artists_songs_df' is not defined

In [3]:
# Alternate method to remove duplicates
# drop_duplicates() method: removes duplicate rows

lil_artists_songs_df.drop_duplicates()

NameError: name 'lil_artists_songs_df' is not defined
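
The NameError outputs above suggest these cells were run in a fresh kernel, before `lil_artists_songs_df` was defined, so the intended results are not shown. The sketch below reproduces the two approaches on a hypothetical three-row table (`lil_songs`) built from values in the earlier output; it is not the notebook's actual output.

```python
import pandas as pd

# Hypothetical three-row stand-in for lil_artists_songs_df.
lil_songs = pd.DataFrame(
    {
        "artist": ["Lil Durk & J. Cole", "Lil Durk & J. Cole", "Lil Mabu"],
        "song": [
            "All My Life (feat. J. Cole)",
            "All My Life (feat. J. Cole)",
            "MATHEMATICAL DISRESPECT",
        ],
    }
)

# duplicated() marks every row that repeats an earlier row (the first occurrence is False).
dup_mask = lil_songs.duplicated()

# Approach 1: keep the rows that are not marked as duplicates.
kept_by_mask = lil_songs[~dup_mask]

# Approach 2: drop_duplicates() does the same in one call.
kept_by_drop = lil_songs.drop_duplicates()
```

Here `~dup_mask` is the idiomatic form of the `!= True` comparison used in the cell above.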