# Clean Kworb.net Spotify Most Streamed Artists
The code below takes the CSV file "kworb_spotify_top_artists.csv" that was originally created in "kworb_artists_scraper.ipynb". It strips the commas from the numeric columns and converts them to ints to make visualisation easier. It also fills missing entries with 0: many artists are not featured on any tracks, or are never credited as "lead artist", so we can safely assume those entries should be 0.
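As a rough illustration of the cleaning steps described above, here is a minimal sketch on a made-up three-row table. The data and the names `toy` and `numeric_cols` are hypothetical and only stand in for the real file.

```python
import pandas as pd

# Toy data, made up for illustration (not taken from the real CSV)
toy = pd.DataFrame({
    "Artist": ["A", "B", "C"],
    "total_streams": ["1,234,567", "89,000", "42"],
    "featured_streams": ["1,000", None, "7"],
})

numeric_cols = ["total_streams", "featured_streams"]

for col in numeric_cols:
    # Strip thousands separators, then parse; missing or unparseable values become NaN
    toy[col] = pd.to_numeric(
        toy[col].astype(str).str.replace(",", "", regex=False), errors="coerce"
    )

# Treat missing values as 0 and store everything as 64-bit integers
toy[numeric_cols] = toy[numeric_cols].fillna(0).astype("int64")

print(toy)
```

Artist "B" has no featured streams in this toy table, so its entry ends up as 0 rather than NaN.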

In [28]:
import pandas as pd

In [30]:
# Read in CSV file of top Spotify artists
df = pd.read_csv('../../Data/Raw/kworb_spotify_top_artists.csv')
df.head()

| Unnamed: 0 | Artist | total_streams | streams_as_lead | solo_streams | featured_streams | daily_streams | tracks_total | tracks_as_lead | tracks_solo | tracks_as_feature |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Drake | 112592066779 | 76886517261 | 42430139565 | 35705549518 | 50091662 | 513 | 314 | 197 | 199 |
| 1 | Taylor Swift | 102222940841 | 98993346012 | 90320824607 | 3229594829 | 49907494 | 593 | 579 | 519 | 14 |
| 2 | Bad Bunny | 97253221301 | 61472589366 | 34896852568 | 35780631935 | 63464912 | 269 | 144 | 87 | 125 |
| 3 | The Weeknd | 79246015185 | 63772467221 | 42714560989 | 15473547964 | 44626887 | 319 | 241 | 169 | 78 |
| 4 | Justin Bieber | 59964949358 | 36504644580 | 22040079971 | 23460304778 | 22204379 | 289 | 206 | 111 | 83 |


Make a list of the columns that need to be cleaned (the ones containing numbers), strip the commas and convert them to numeric values, then fill any missing values with 0.

In [38]:
# List of columns to clean
columns_to_clean = [
    'total_streams', 'streams_as_lead', 'solo_streams', 'featured_streams',
    'daily_streams', 'tracks_total', 'tracks_as_lead', 'tracks_solo', 'tracks_as_feature'
]

# Strip commas and convert to numeric (unparseable values become NaN)
for col in columns_to_clean:
    df[col] = pd.to_numeric(df[col].astype(str).str.replace(',', '', regex=False), errors='coerce')

# Fill missing values with 0 (some artists aren't featured in any songs);
# int64 is explicit because the stream counts exceed the 32-bit integer range
df[columns_to_clean] = df[columns_to_clean].fillna(0).astype('int64')

print(df.head())

          Artist  total_streams  streams_as_lead  solo_streams  \
0          Drake   112592066779      76886517261   42430139565   
1   Taylor Swift   102222940841      98993346012   90320824607   
2      Bad Bunny    97253221301      61472589366   34896852568   
3     The Weeknd    79246015185      63772467221   42714560989   
4  Justin Bieber    59964949358      36504644580   22040079971   

   featured_streams  daily_streams  tracks_total  tracks_as_lead  tracks_solo  \
0       35705549518       50091662           513             314          197   
1        3229594829       49907494           593             579          519   
2       35780631935       63464912           269             144           87   
3       15473547964       44626887           319             241          169   
4       23460304778       22204379           289             206          111   

   tracks_as_feature  
0                199  
1                 14  
2                125  
3                 78  
4                 83  
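
One way to sanity-check the cleaned values: in the five rows shown in this notebook, `total_streams` equals `streams_as_lead + featured_streams`, and `tracks_total` equals `tracks_as_lead + tracks_as_feature`. Assuming that relationship holds for the whole file (the source does not state this), a sketch like the following could flag rows where cleaning went wrong. The helper `find_inconsistent_rows` is a hypothetical name, and the sample rows are typed in by hand from the output above.

```python
import pandas as pd

# The five rows shown in this notebook, typed in by hand for a self-contained example
sample = pd.DataFrame({
    "Artist": ["Drake", "Taylor Swift", "Bad Bunny", "The Weeknd", "Justin Bieber"],
    "total_streams": [112592066779, 102222940841, 97253221301, 79246015185, 59964949358],
    "streams_as_lead": [76886517261, 98993346012, 61472589366, 63772467221, 36504644580],
    "featured_streams": [35705549518, 3229594829, 35780631935, 15473547964, 23460304778],
    "tracks_total": [513, 593, 269, 319, 289],
    "tracks_as_lead": [314, 579, 144, 241, 206],
    "tracks_as_feature": [199, 14, 125, 78, 83],
})


def find_inconsistent_rows(df):
    """Return rows where lead + featured does not add up to the total (hypothetical helper)."""
    streams_ok = df["streams_as_lead"] + df["featured_streams"] == df["total_streams"]
    tracks_ok = df["tracks_as_lead"] + df["tracks_as_feature"] == df["tracks_total"]
    return df[~(streams_ok & tracks_ok)]


bad_rows = find_inconsistent_rows(sample)
print(f"{len(bad_rows)} inconsistent rows")
```

On the full DataFrame this would be called as `find_inconsistent_rows(df)`. Any rows it returns would be worth inspecting by hand before trusting the 0-fill.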

Save the file to the processed data folder.

In [45]:
# Save the DataFrame to CSV
df.to_csv('../../Data/Processed/kworb_spotify_artists.csv', index=False)
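
The `index=False` argument matters here. By default `to_csv` writes the row index as an unlabeled first column, and `read_csv` then names that column `Unnamed: 0` when the file is loaded again. The sketch below shows the difference on made-up data, using in-memory buffers instead of real files. Whether this is how the `Unnamed: 0` column in the raw file came about is an assumption.

```python
import io

import pandas as pd

# Made-up data for illustration only
toy = pd.DataFrame({"Artist": ["A", "B"], "total_streams": [10, 20]})

# Default behaviour: the index is written as an unlabeled first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the real columns are written
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)     # ['Unnamed: 0', 'Artist', 'total_streams']
print(cols_without_index)  # ['Artist', 'total_streams']
```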