## Cleaning the Pitchfork Review data

1) Create a data frame of the 2017 reviews only. The first 1,260 reviews (1,261 lines including the header) cover Dec 6th 2016 to Dec 6th 2017.

2) Make album the index

In [1]:
import pandas as pd
import json

p4k_reviews_fname = 'p4kreviews-utf8.csv'

# Note: the original p4kreviews file was latin-1 encoded. This reads a UTF-8 copy of it;
# the commented-out line reads the latin-1 original (point p4k_reviews_fname at it first).
# p4k_df_init = pd.read_csv(p4k_reviews_fname, nrows=1260, encoding='latin1', index_col=0)

# Read the first 1260 rows into a pandas DataFrame
p4k_df_init = pd.read_csv(p4k_reviews_fname, nrows=1260, encoding='utf-8', index_col=0)
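Hard-coding `nrows=1260` ties the cut to the row order of this particular file. A sketch of an alternative, assuming the `date` column always uses the "December 6 2017" format seen in the outputs: parse the dates and keep only rows inside the window. `filter_review_window` and the sample rows are hypothetical, not part of the original notebook.

```python
import pandas as pd

def filter_review_window(df, start="2016-12-06", end="2017-12-06", date_col="date"):
    """Keep rows whose date string (e.g. 'December 6 2017') falls in [start, end]."""
    # %B = full month name; %d also accepts unpadded days such as '6'
    dates = pd.to_datetime(df[date_col], format="%B %d %Y")
    # between() is inclusive on both ends by default
    mask = dates.between(pd.Timestamp(start), pd.Timestamp(end))
    return df[mask]

# Hypothetical rows shaped like the review data
sample = pd.DataFrame({
    "album": ["Post Self", "Zoovier", "Old Record"],
    "date": ["December 4 2017", "December 6 2016", "November 30 2016"],
})
window = filter_review_window(sample)
print(window["album"].tolist())  # ['Post Self', 'Zoovier']
```

Filtering on parsed dates keeps working if the source file gains or loses rows, at the cost of failing loudly if any date string deviates from the assumed format.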



In [2]:
p4k_df_init.tail()

|  | album | artist | best | date | genre | review | score |
|---|---|---|---|---|---|---|---|
| 1256 | Harlequin | Alex Izenberg | 0 | December 7 2016 | Pop/R&B | A modernist spin on the ’70s singer-songwriter... | 6.3 |
| 1257 | “Awaken, My Love!” | Childish Gambino | 0 | December 6 2016 | Rap | On Donald Glover’s latest project as Childish ... | 7.2 |
| 1258 | Zoovier | Fetty Wap | 0 | December 6 2016 | Rap | Fetty Wap’s new tape doesn't quite answer the ... | 6.4 |
| 1259 | The Weight of These Wings | Miranda Lambert | 0 | December 6 2016 | Folk/Country | Miranda Lambert’s double album arrives in the ... | 7.8 |
| 1260 | love and noir. | Denitia and Sene | 0 | December 6 2016 | Pop/R&B | This Brooklyn duo traffics in spare, low-lit R... | 6.6 |


In [3]:
p4k_df_init.loc[17]

album                                Loüm / Go Be Forgotten
artist                                             Krallice
best                                                      0
date                                        December 2 2017
genre                                                 Metal
review    1 / 2 Albums New York City’s most consistent m...
score                                                   7.6
Name: 17, dtype: object

## Set NaN in album field to empty string or drop rows

Some album names are blank, and pandas reads them as NaN. We could set these to empty strings, but here we drop those rows instead (four rows, going by the counts before and after).

In [4]:
p4k_df_init = p4k_df_init.dropna(subset=['album'])

# reindex since we dropped some rows
p4k_df_init = p4k_df_init.reset_index(drop=True)
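The two options in the heading behave differently. A small sketch on a hypothetical toy frame: `fillna` keeps every row, while `dropna(subset=...)` removes the incomplete row, and `reset_index(drop=True)` then renumbers the remaining rows from 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing album name
toy = pd.DataFrame({
    "album": ["No Shame", np.nan, "Post Self"],
    "artist": ["Hopsin", "Unknown", "Godflesh"],
})

# Option 1: keep the row, replace NaN in 'album' with an empty string
filled = toy.fillna({"album": ""})

# Option 2 (what this notebook does): drop the row, then renumber labels 0..n-1
dropped = toy.dropna(subset=["album"]).reset_index(drop=True)

print(filled["album"].tolist())    # ['No Shame', '', 'Post Self']
print(dropped.index.tolist())      # [0, 1]
print(dropped["artist"].tolist())  # ['Hopsin', 'Godflesh']
```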

In [5]:
p4k_df_init.describe()

|  | best | score |
|---|---|---|
| count | 1256.0 | 1256.0 |
| mean | 0.082006 | 7.338694 |
| std | 0.274484 | 0.939424 |
| min | 0.0 | 2.8 |
| 25% | 0.0 | 6.8 |
| 50% | 0.0 | 7.4 |
| 75% | 0.0 | 7.825 |
| max | 1.0 | 10.0 |


## Set index to 'Album' name

In [6]:
p4k_df_init_album = p4k_df_init.set_index('album')
p4k_df_init_album.head(25)

| album | artist | best | date | genre | review | score |
|---|---|---|---|---|---|---|
| A.M./Being There | Wilco | 1 | December 6 2017 | Rock | Best new reissue 1 / 2 Albums Newly reissued a... | 7.0 |
| No Shame | Hopsin | 0 | December 6 2017 | Rap | On his corrosive fifth album, the rapper takes... | 3.5 |
| Material Control | Glassjaw | 0 | December 6 2017 | Rock | On their first album in 15 years, the Long Isl... | 6.6 |
| Weighing of the Heart | Nabihah Iqbal | 0 | December 6 2017 | Pop/R&B | On her debut LP, British producer Nabihah Iqba... | 7.7 |
| The Visitor | Neil Young / Promise of the Real | 0 | December 5 2017 | Rock | While still pointedly political, Neil Young’s ... | 6.7 |
| Perfect Angel | Minnie Riperton | 1 | December 5 2017 | Pop/R&B | Best new reissue A deluxe reissue of Minnie Ri... | 9.0 |
| Everyday Is Christmas | Sia | 0 | December 5 2017 | Pop/R&B | Sia’s shiny Christmas album feels inconsistent... | 5.8 |
| Zaytown Sorority Class of 2017 | Zaytoven | 0 | December 5 2017 | Rap | The prolific Atlanta producer enlists 17 women... | 6.2 |
| Songs of Experience | U2 | 0 | December 4 2017 | Rock | Years in the making, U2’s 14th studio album fi... | 5.3 |
| Post Self | Godflesh | 0 | December 4 2017 | Metal | The new LP from pioneering industrial band God... | 8.1 |
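Album titles are not guaranteed to be unique (the Spotify log later in this notebook includes generic titles such as "I" and "II"), so `.loc` on an album-name index can return several rows instead of one. A quick check, sketched on a hypothetical toy frame ("Artist A" and "Artist B" are placeholders), is `index.is_unique`; indexing on an (album, artist) pair is one possible workaround.

```python
import pandas as pd

# Hypothetical toy frame: two different releases share the title "II"
toy = pd.DataFrame({
    "album": ["II", "Melodrama", "II"],
    "artist": ["Artist A", "Lorde", "Artist B"],
})
by_album = toy.set_index("album")

print(by_album.index.is_unique)                # False
print(by_album.loc["II"]["artist"].tolist())   # ['Artist A', 'Artist B']

# A MultiIndex on (album, artist) is more likely to be unique
by_pair = toy.set_index(["album", "artist"])
print(by_pair.index.is_unique)                 # True
```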


## Save to CSV

Encode as utf-8-sig (UTF-8 with a byte-order mark); plain utf-8 would also work.

In [7]:
csv_save_filename = "p4kreviews-2017-utf8sig.csv"
p4k_df_init.to_csv(csv_save_filename, encoding="utf-8-sig")
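The only difference between `utf-8` and `utf-8-sig` on write is a three-byte byte-order mark (BOM) at the start of the file, which some spreadsheet programs, notably Excel, use to recognize UTF-8. A minimal sketch using a temporary file and a hypothetical one-column frame of titles taken from the Spotify log:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"album": ["Adiós", "OUÏ"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")
    df.to_csv(path, encoding="utf-8-sig")

    with open(path, "rb") as fh:
        raw = fh.read()
    # utf-8-sig prepends the BOM bytes EF BB BF
    print(raw[:3] == b"\xef\xbb\xbf")  # True

    # Reading back with utf-8-sig strips the BOM; accented titles survive
    back = pd.read_csv(path, index_col=0, encoding="utf-8-sig")
    print(back["album"].tolist())  # ['Adiós', 'OUÏ']
```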

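## Read this CSV back in as our DF to make sure things look right

This verifies that the UTF-8 encoding survived the round trip. Because rows were dropped and the index was reset before saving, label 17 now points to a different review (Miguel) than it did in the original frame (Krallice).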

In [8]:
p4k_reviews_fname = 'p4kreviews-2017-utf8sig.csv'

p4k_df_2017 = pd.read_csv(p4k_reviews_fname, index_col = 0)

p4k_df_2017.tail()

|  | album | artist | best | date | genre | review | score |
|---|---|---|---|---|---|---|---|
| 1251 | Harlequin | Alex Izenberg | 0 | December 7 2016 | Pop/R&B | A modernist spin on the ’70s singer-songwriter... | 6.3 |
| 1252 | “Awaken, My Love!” | Childish Gambino | 0 | December 6 2016 | Rap | On Donald Glover’s latest project as Childish ... | 7.2 |
| 1253 | Zoovier | Fetty Wap | 0 | December 6 2016 | Rap | Fetty Wap’s new tape doesn't quite answer the ... | 6.4 |
| 1254 | The Weight of These Wings | Miranda Lambert | 0 | December 6 2016 | Folk/Country | Miranda Lambert’s double album arrives in the ... | 7.8 |
| 1255 | love and noir. | Denitia and Sene | 0 | December 6 2016 | Pop/R&B | This Brooklyn duo traffics in spare, low-lit R... | 6.6 |


In [9]:
p4k_df_2017.loc[17]

album                                         War & Leisure
artist                                               Miguel
best                                                      0
date                                        December 1 2017
genre                                               Pop/R&B
review    Miguel’s fourth album has a kinetic sexual and...
score                                                   8.1
Name: 17, dtype: object

In [10]:
p4k_df_2017.head(20)

|  | album | artist | best | date | genre | review | score |
|---|---|---|---|---|---|---|---|
| 0 | A.M./Being There | Wilco | 1 | December 6 2017 | Rock | Best new reissue 1 / 2 Albums Newly reissued a... | 7.0 |
| 1 | No Shame | Hopsin | 0 | December 6 2017 | Rap | On his corrosive fifth album, the rapper takes... | 3.5 |
| 2 | Material Control | Glassjaw | 0 | December 6 2017 | Rock | On their first album in 15 years, the Long Isl... | 6.6 |
| 3 | Weighing of the Heart | Nabihah Iqbal | 0 | December 6 2017 | Pop/R&B | On her debut LP, British producer Nabihah Iqba... | 7.7 |
| 4 | The Visitor | Neil Young / Promise of the Real | 0 | December 5 2017 | Rock | While still pointedly political, Neil Young’s ... | 6.7 |
| 5 | Perfect Angel | Minnie Riperton | 1 | December 5 2017 | Pop/R&B | Best new reissue A deluxe reissue of Minnie Ri... | 9.0 |
| 6 | Everyday Is Christmas | Sia | 0 | December 5 2017 | Pop/R&B | Sia’s shiny Christmas album feels inconsistent... | 5.8 |
| 7 | Zaytown Sorority Class of 2017 | Zaytoven | 0 | December 5 2017 | Rap | The prolific Atlanta producer enlists 17 women... | 6.2 |
| 8 | Songs of Experience | U2 | 0 | December 4 2017 | Rock | Years in the making, U2’s 14th studio album fi... | 5.3 |
| 9 | Post Self | Godflesh | 0 | December 4 2017 | Metal | The new LP from pioneering industrial band God... | 8.1 |
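Eyeballing `tail()` and `head()` catches obvious damage; `pandas.testing.assert_frame_equal` checks every value, dtype, and label at once. A sketch on a hypothetical two-row frame built from rows shown above (the real check would compare `p4k_df_init` with `p4k_df_2017`):

```python
import os
import tempfile

import pandas as pd
from pandas.testing import assert_frame_equal

original = pd.DataFrame({
    "album": ["“Awaken, My Love!”", "love and noir."],
    "artist": ["Childish Gambino", "Denitia and Sene"],
    "score": [7.2, 6.6],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "roundtrip.csv")
    original.to_csv(path, encoding="utf-8-sig")
    reloaded = pd.read_csv(path, index_col=0, encoding="utf-8-sig")

# Raises AssertionError describing the first mismatch if anything changed;
# a RangeIndex and an equivalent int64 index are treated as equal by default
assert_frame_equal(original, reloaded)
print("round trip OK")
```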


## See if we can get album names and artist names

In [11]:
type(p4k_df_2017.loc[58].album)

str

In [12]:
p4k_df_2017.loc[58]

album                                       Summer Megalith
artist                                             Caracara
best                                                      0
date                                       November 17 2017
genre                                                  Rock
review    The debut album from Caracara—co-produced by f...
score                                                   7.7
Name: 58, dtype: object

In [31]:
album_name_lst = list(p4k_df_2017.album)

In [32]:
artist_name_lst = list(p4k_df_2017.artist)

## Generate DataFrame CSV for Spotify Data on P4K 2017 Albums

Be VERY careful running the code below. It is prone to timing out, and rerunning it can overwrite data you want to keep.
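One common mitigation for timeouts is retrying with exponential backoff. This is a generic sketch; `with_retries` and the `flaky` stand-in are hypothetical and are not part of `search_Album_get_Tracks_functions`, whose internals are not shown here.

```python
import time

def with_retries(func, *args, attempts=3, base_delay=1.0, retry_on=(Exception,), **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)  # base_delay * 1, 2, 4, ...

# Stand-in that fails twice with a simulated timeout, then succeeds
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # ok
```

Wrapping each individual request is cheaper than rerunning a whole cell such as `Find_Albums(...)`, but that would require changes inside the helper module.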

In [33]:
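# Local helper module (not shown here) providing Find_Albums, albumSongs,
# Add_Track_AudioFeatures, create_df_dict and combine_albums
from search_Album_get_Tracks_functions import *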

In [35]:
# Search for each (album, artist) pair; misses are printed as "No album found"
Found_Albums = Find_Albums(album_name_lst, artist_name_lst)
# print(json.dumps(Found_Albums, indent=4, ensure_ascii=False))  # pretty-print as JSON

No album found: Name: A.M./Being There, Artist: Wilco
No album found: Name: The Visitor, Artist: Neil Young / Promise of the Real
No album found: Name: Zaytown Sorority Class of 2017, Artist: Zaytoven
No album found: Name: Friday on Elm Street, Artist: Fabolous / Jadakiss
No album found: Name: Loüm / Go Be Forgotten, Artist: Krallice
No album found: Name: Fete Des Morts AKA Dia De Los Muertos EP, Artist: Mach-Hommy
No album found: Name: ?? ????, ???? “When You Have Won, You Have Lost”, Artist: HARAM
No album found: Name: The Fall - Singles 1978-2016, Artist: The Fall
No album found: Name: Bill Brewster Presents Tribal Rites, Artist: Bill Brewster
No album found: Name: Contributors, Artist: Contributors
No album found: Name: Ascending a Mountain of Heavy Light, Artist: The Body & Full of Hell
No album found: Name: Music for Nine Postcards, Artist: Hiroshi Yoshimura
No album found: Name: Tauhid/Jewels of Thought/Deaf Dumb Blind (Summun Bukmun Umyun), Artist: Pharoah Sanders
No album foun

No album found: Name: Spots y Escupitajo, Artist: Elysia Crampton
No album found: Name: Empirical House, Artist: Ricardo Villalobos
No album found: Name: Sufi La EP, Artist: Swet Shop Boys
No album found: Name: Bookhead EP, Artist: JJ DOOM
No album found: Name: Niggative Approach, Artist: Obnox
No album found: Name: Truth, Liberty & Soul - Live in NYC: The Complete 1982 NPR Jazz Alive! Recording, Artist: Jaco Pastorius
No album found: Name: And the Birds Flew Overhead, Artist: Mary Lattimore / Elysse Thebner Miller
No album found: Name: T-Wayne, Artist: T-Pain / Lil Wayne
No album found: Name: Close but Not Quite EP, Artist: Everything Is Recorded
No album found: Name: Ends With And / The Dirt of Luck / The Magic City / No Guitars, Artist: Helium
No album found: Name: Singles: Original Motion Picture Soundtrack-Deluxe Edition, Artist: Various Artists
No album found: Name: The King & I, Artist: Faith Evans / The Notorious B.I.G.
No album found: Name: Rubbish of the Floodwaters EP, Artis

No album found: Name: Merry Christmas Lil’ Mama, Artist: Chance the Rapper / Jeremih
No album found: Name: Not the Actual Events EP, Artist: Nine Inch Nails
No album found: Name: Swiss Radio Days Vol. 41 - Zurich 1961, Artist: Ray Charles
No album found: Name: Jackie OST, Artist: Mica Levi
No album found: Name: Slight Freedom, Artist: Jeff Parker
No album found: Name: Chronology, Artist: Qasim Naqvi
No album found: Name: Bob Dylan: The 1966 Live Recordings, Artist: Bob Dylan
No album found: Name: Melo de Melo EP, Artist: Ricardo Villalobos / Umho
No album found: Name: Equality Now EP, Artist: Peder Mannerfelt
No album found: Name: The Early Years 1965-1972, Artist: Pink Floyd
No album found: Name: Hologram İmparatorluğu, Artist: Gaye Su Akyol
No album found: Name: Zero Gravity EP, Artist: Kodie Shane
No album found: Name: 1017 vs. the World, Artist: Gucci Mane / Lil Uzi Vert
No album found: Name: How to Disappear in America, Artist: Young Male
No album found: Name: Zoovier, Artist: Fet
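Many failed lookups share patterns: several artists joined with " / ", or titles ending in "EP". A hypothetical cleanup step is sketched below; `simplify_query` is not part of the original code, and whether it improves match rates is untested here. It builds a query using the `album:` and `artist:` field filters that Spotify's search endpoint documents for its `q` parameter.

```python
import re

def simplify_query(album, artist):
    """Hypothetical cleanup for album/artist pairs that failed to match."""
    # Keep only the first credited artist: "Neil Young / Promise of the Real" -> "Neil Young"
    primary_artist = artist.split(" / ")[0].strip()
    # Drop a trailing " EP": "Sufi La EP" -> "Sufi La"
    album_clean = re.sub(r"\s+EP$", "", album).strip()
    # Field-filtered search query string
    return f"album:{album_clean} artist:{primary_artist}"

print(simplify_query("The Visitor", "Neil Young / Promise of the Real"))
# album:The Visitor artist:Neil Young
print(simplify_query("Sufi La EP", "Swet Shop Boys"))
# album:Sufi La artist:Swet Shop Boys
```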

In [36]:
# Fetch the track list for every album found and store it in Found_Albums
for album_uri_key in Found_Albums:  # keys are album URIs
    albumSongs(album_uri_key, Found_Albums)
    print(f"Album {Found_Albums[album_uri_key]['album_name']} songs has been added to Found_Albums dictionary")

Album No Shame songs has been added to Found_Albums dictionary
Album Material Control songs has been added to Found_Albums dictionary
Album Weighing of the Heart songs has been added to Found_Albums dictionary
Album Perfect Angel songs has been added to Found_Albums dictionary
Album Everyday Is Christmas songs has been added to Found_Albums dictionary
Album Songs of Experience songs has been added to Found_Albums dictionary
Album Post Self songs has been added to Found_Albums dictionary
Album cybersex songs has been added to Found_Albums dictionary
Album Endless Computer songs has been added to Found_Albums dictionary
Album Metal Machine Music songs has been added to Found_Albums dictionary
Album Master of Puppets songs has been added to Found_Albums dictionary
Album Oblivion songs has been added to Found_Albums dictionary
Album War & Leisure songs has been added to Found_Albums dictionary
Album Kick songs has been added to Found_Albums dictionary
Album Polygondwanaland songs has been 

Album Tokyo Flashback songs has been added to Found_Albums dictionary
Album Losing songs has been added to Found_Albums dictionary
Album Going Grey songs has been added to Found_Albums dictionary
Album Death Revenge songs has been added to Found_Albums dictionary
Album Glasshouse songs has been added to Found_Albums dictionary
Album Anthology: Movie Themes 1974-1998 songs has been added to Found_Albums dictionary
Album Musik songs has been added to Found_Albums dictionary
Album Thawing Dawn songs has been added to Found_Albums dictionary
Album The Queen Is Dead songs has been added to Found_Albums dictionary
Album The Saga Continues songs has been added to Found_Albums dictionary
Album Sis-boom-bah! songs has been added to Found_Albums dictionary
Album Beginning to Fall in Line Before Me, So Decorously, the Nature of All That Must Be Transformed songs has been added to Found_Albums dictionary
Album ken songs has been added to Found_Albums dictionary
Album Reaching for Indigo songs has 

Album Native Invader songs has been added to Found_Albums dictionary
Album CCCLX songs has been added to Found_Albums dictionary
Album The Gradual Progression songs has been added to Found_Albums dictionary
Album Hippopotamus songs has been added to Found_Albums dictionary
Album Hitchhiker songs has been added to Found_Albums dictionary
Album 1992 Deluxe songs has been added to Found_Albums dictionary
Album A Moment Apart songs has been added to Found_Albums dictionary
Album In Search of Lost Time songs has been added to Found_Albums dictionary
Album Love What Survives songs has been added to Found_Albums dictionary
Album The Hanged Man songs has been added to Found_Albums dictionary
Album Outrage! Is Now songs has been added to Found_Albums dictionary
Album Thx songs has been added to Found_Albums dictionary
Album MAKANDA at the End of Space, the Beginning of Time songs has been added to Found_Albums dictionary
Album Red songs has been added to Found_Albums dictionary
Album How Did I 

Album Content songs has been added to Found_Albums dictionary
Album Contempt songs has been added to Found_Albums dictionary
Album Tchornobog songs has been added to Found_Albums dictionary
Album Golden songs has been added to Found_Albums dictionary
Album fantasii songs has been added to Found_Albums dictionary
Album Good for You songs has been added to Found_Albums dictionary
Album Sounds from the Other Side songs has been added to Found_Albums dictionary
Album What Do You Think About the Car? songs has been added to Found_Albums dictionary
Album The Autobiography songs has been added to Found_Albums dictionary
Album Lack songs has been added to Found_Albums dictionary
Album Ceilings songs has been added to Found_Albums dictionary
Album American Water songs has been added to Found_Albums dictionary
Album New Facts Emerge songs has been added to Found_Albums dictionary
Album Patterns for Resonant Space songs has been added to Found_Albums dictionary
Album Sudan Archives songs has been

Album The Fifth State of Consciousness songs has been added to Found_Albums dictionary
Album Hudson songs has been added to Found_Albums dictionary
Album OUÏ songs has been added to Found_Albums dictionary
Album Boomiverse songs has been added to Found_Albums dictionary
Album Weather Diaries songs has been added to Found_Albums dictionary
Album Witness songs has been added to Found_Albums dictionary
Album House and Land songs has been added to Found_Albums dictionary
Album The Writing's on the Wall songs has been added to Found_Albums dictionary
Album The Nashville Sound songs has been added to Found_Albums dictionary
Album Adiós songs has been added to Found_Albums dictionary
Album Symbolic Use of Light songs has been added to Found_Albums dictionary
Album Melodrama songs has been added to Found_Albums dictionary
Album Pretty Girls Like Trap Music songs has been added to Found_Albums dictionary
Album Rose Colored Corner songs has been added to Found_Albums dictionary
Album Trouble Mak

Album Slowdive songs has been added to Found_Albums dictionary
Album Satan’s graffiti or God’s art? songs has been added to Found_Albums dictionary
Album Joan Shelley songs has been added to Found_Albums dictionary
Album I songs has been added to Found_Albums dictionary
Album Mixtape IV songs has been added to Found_Albums dictionary
Album Dookie songs has been added to Found_Albums dictionary
Album Compassion songs has been added to Found_Albums dictionary
Album No Shape songs has been added to Found_Albums dictionary
Album In Spades songs has been added to Found_Albums dictionary
Album All Blue songs has been added to Found_Albums dictionary
Album This Old Dog songs has been added to Found_Albums dictionary
Album The Weather songs has been added to Found_Albums dictionary
Album HOWSLA songs has been added to Found_Albums dictionary
Album Carrie & Lowell Live songs has been added to Found_Albums dictionary
Album 9 songs has been added to Found_Albums dictionary
Album Grafts songs has 

Album Undertow songs has been added to Found_Albums dictionary
Album Build Music songs has been added to Found_Albums dictionary
Album Contact songs has been added to Found_Albums dictionary
Album Silver Eye songs has been added to Found_Albums dictionary
Album You Only Live 2wice songs has been added to Found_Albums dictionary
Album The Ride songs has been added to Found_Albums dictionary
Album Sand songs has been added to Found_Albums dictionary
Album Kelly Lee Owens songs has been added to Found_Albums dictionary
Album Star Stuff songs has been added to Found_Albums dictionary
Album Hands in Our Names songs has been added to Found_Albums dictionary
Album Number 1 Angel songs has been added to Found_Albums dictionary
Album Sorcerer songs has been added to Found_Albums dictionary
Album Feel Infinite songs has been added to Found_Albums dictionary
Album It's a Myth songs has been added to Found_Albums dictionary
Album II songs has been added to Found_Albums dictionary
Album Where Are W

Album Unfold songs has been added to Found_Albums dictionary
Album A Pink Sunset For No One songs has been added to Found_Albums dictionary
Album Soul Sick songs has been added to Found_Albums dictionary
Album Created in the Image of Suffering songs has been added to Found_Albums dictionary
Album Selectors 002 songs has been added to Found_Albums dictionary
Album Alice songs has been added to Found_Albums dictionary
Album Now That the Light Is Fading songs has been added to Found_Albums dictionary
Album Daylight Ghosts songs has been added to Found_Albums dictionary
Album Heba songs has been added to Found_Albums dictionary
Album Prisoner songs has been added to Found_Albums dictionary
Album Wildly Idle (Humble Before the Void) songs has been added to Found_Albums dictionary
Album Being You Is Great, I Wish I Could Be You More Often songs has been added to Found_Albums dictionary
Album Undying Color songs has been added to Found_Albums dictionary
Album DROGAS Light songs has been added

Album Mezzanine songs has been added to Found_Albums dictionary
Album Prelapsarian songs has been added to Found_Albums dictionary
Album All of Them Naturals songs has been added to Found_Albums dictionary
Album New Start songs has been added to Found_Albums dictionary
Album Insecure (Music From the HBO Original Series) songs has been added to Found_Albums dictionary
Album Stillness in Wonderland songs has been added to Found_Albums dictionary
Album Tehillim songs has been added to Found_Albums dictionary
Album Reflection songs has been added to Found_Albums dictionary
Album Filthy America… It’s Beautiful songs has been added to Found_Albums dictionary
Album Clear Sounds/Perfetta songs has been added to Found_Albums dictionary
Album Run the Jewels 3 songs has been added to Found_Albums dictionary
Album Nadir songs has been added to Found_Albums dictionary
Album ///// Effectual songs has been added to Found_Albums dictionary
Album Love You to Death songs has been added to Found_Albums d

In [37]:
# Deep-copy Found_Albums so the original survives if a later step fails or corrupts the data
import copy

Found_Albums_backup = copy.deepcopy(Found_Albums)
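A deep copy matters here because the later steps appear to modify the nested per-album entries of `Found_Albums` in place. A shallow copy would share those nested objects with the original. A minimal illustration with a hypothetical key:

```python
import copy

albums = {"uri:1": {"album_name": "No Shame", "tracks": []}}

shallow = dict(albums)          # new outer dict, same inner dicts
deep = copy.deepcopy(albums)    # fully independent copy

albums["uri:1"]["tracks"].append("track-a")  # mutate a nested list in place

print(shallow["uri:1"]["tracks"])  # ['track-a']  (shared with the original)
print(deep["uri:1"]["tracks"])     # []           (unaffected)
```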

In [38]:
# Uncomment to print Found_Albums with track info:
# print(json.dumps(Found_Albums, indent=4, ensure_ascii=False))

# Add Spotify audio features to every track in Found_Albums (prints progress as it goes)
Add_Track_AudioFeatures(Found_Albums)
# print(json.dumps(Found_Albums, indent=4, ensure_ascii=False))

5 playlists completed
Loop #: 5
Elapsed Time: 12.143371105194092 seconds
10 playlists completed
Loop #: 10
Elapsed Time: 26.125358819961548 seconds
15 playlists completed
Loop #: 15
Elapsed Time: 45.9360728263855 seconds
20 playlists completed
Loop #: 20
Elapsed Time: 59.7347047328949 seconds
25 playlists completed
Loop #: 25
Elapsed Time: 69.74460911750793 seconds
30 playlists completed
Loop #: 30
Elapsed Time: 80.70419383049011 seconds
35 playlists completed
Loop #: 35
Elapsed Time: 97.10355472564697 seconds
40 playlists completed
Loop #: 40
Elapsed Time: 109.0631947517395 seconds
45 playlists completed
Loop #: 45
Elapsed Time: 119.25600051879883 seconds
50 playlists completed
Loop #: 50
Elapsed Time: 127.88742470741272 seconds
55 playlists completed
Loop #: 55
Elapsed Time: 137.69802737236023 seconds
60 playlists completed
Loop #: 60
Elapsed Time: 150.1778757572174 seconds
65 playlists completed
Loop #: 65
Elapsed Time: 161.25378155708313 seconds
70 playlists completed
Loop #: 70
El

545 playlists completed
Loop #: 545
Elapsed Time: 1402.639867067337 seconds
550 playlists completed
Loop #: 550
Elapsed Time: 1417.683729171753 seconds
555 playlists completed
Loop #: 555
Elapsed Time: 1427.4213309288025 seconds
560 playlists completed
Loop #: 560
Elapsed Time: 1439.67596077919 seconds
565 playlists completed
Loop #: 565
Elapsed Time: 1448.5513417720795 seconds
570 playlists completed
Loop #: 570
Elapsed Time: 1465.9805879592896 seconds
575 playlists completed
Loop #: 575
Elapsed Time: 1475.3559818267822 seconds
580 playlists completed
Loop #: 580
Elapsed Time: 1489.0386326313019 seconds
585 playlists completed
Loop #: 585
Elapsed Time: 1508.8492028713226 seconds
590 playlists completed
Loop #: 590
Elapsed Time: 1521.9340415000916 seconds
595 playlists completed
Loop #: 595
Elapsed Time: 1534.0341851711273 seconds
600 playlists completed
Loop #: 600
Elapsed Time: 1547.5361504554749 seconds
605 playlists completed
Loop #: 605
Elapsed Time: 1557.6016414165497 seconds
610
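The log shows roughly 2.5 seconds per album. Whether `Add_Track_AudioFeatures` already batches requests is not visible in this notebook; if it does not, grouping track IDs into batches can cut the number of calls, since Spotify's "Get Several Tracks' Audio Features" endpoint accepts up to 100 IDs per request (check the current Web API reference; Spotify restricted audio-features access for newly registered apps in November 2024). A generic chunking helper, with placeholder IDs:

```python
def chunked(items, size=100):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

track_ids = [f"id{i}" for i in range(250)]  # placeholder IDs
batches = list(chunked(track_ids))
print([len(b) for b in batches])  # [100, 100, 50]
```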

In [26]:
print(len(Found_Albums))

1009


In [39]:
Found_Albums_backup = copy.deepcopy(Found_Albums)

In [40]:
# Build a track-level DataFrame from Found_Albums
dic_df = create_df_dict(Found_Albums)
df = pd.DataFrame.from_dict(dic_df)

# ## Remove duplicates
# Spotify returns some tracks more than once; keep only the most popular copy of each.
# Caution: deduplicating on 'name_track' alone also drops distinct tracks that share
# a title (e.g. on different albums).
print(len(df))
final_df = df.sort_values('popularity', ascending=False).drop_duplicates('name_track').sort_index()
print(len(final_df))

# ## Save tracks to CSV
csv_save_filename = "sp-p4k-2017-tracks-test.csv"
final_df.to_csv(csv_save_filename, encoding='utf-8-sig')

# Combine the tracks by album
combined_albums_df = combine_albums(final_df)

# Save combined albums to CSV
csv_save_filename = "sp-p4k-2017-albums-test.csv"
combined_albums_df.to_csv(csv_save_filename, encoding="utf-8-sig")

12298
11854
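Deduplicating on the track title alone treats every track named, say, "Intro" as the same song. A sketch comparing that with deduplicating on (title, artist); the `artist` column name and all values are hypothetical, since the real schema comes from `create_df_dict`.

```python
import pandas as pd

# Hypothetical track table; only name_track and popularity match the notebook
tracks = pd.DataFrame({
    "name_track": ["Intro", "Intro", "Intro", "Weather"],
    "artist": ["Artist A", "Artist A", "Artist B", "Artist C"],
    "popularity": [40, 55, 30, 20],
})

def keep_most_popular(df, subset):
    # Most popular copy first, keep it, then restore the original row order
    return (df.sort_values("popularity", ascending=False)
              .drop_duplicates(subset=subset)
              .sort_index())

by_name = keep_most_popular(tracks, ["name_track"])
by_name_artist = keep_most_popular(tracks, ["name_track", "artist"])

print(by_name.index.tolist())         # [1, 3]
print(by_name_artist.index.tolist())  # [1, 2, 3]
```

Keying on title plus artist keeps Artist B's unrelated "Intro" while still collapsing Artist A's duplicate.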
