# Scraping to Plotting Lab

### Introduction

Ok, now it's time to work through the whole process on our own, from scraping data to plotting it.

### Collecting our Data

Let's gather a list of popular songs.  From Wikipedia, we can gather the list of the most-streamed songs on Spotify.  It's located at the following URL.

In [None]:
url = 'https://en.wikipedia.org/wiki/List_of_most-streamed_songs_on_Spotify'

Ok, now let's use pandas to find the list of tables from this url.  Store the list of tables to the variable `tables`.

In [10]:
import pandas as pd

tables = pd.read_html(url)
tables[3]

Unnamed: 0,Song,Artist(s),Weeks at No. 1,Average streams (millions),Date published,Date achieved,Ref.
0,"""Seven""",Jungkook featuring Latto,8,63.9,"July 14, 2023","July 20, 2023",[140]
1,"""Lala""",Myke Towers,1,39.4,"March 23, 2023","July 13, 2023",[141]
2,"""Vampire""",Olivia Rodrigo,1,52.0,"June 30, 2023","July 6, 2023",[142]
3,"""Un x100to""",Grupo Frontera and Bad Bunny,2,53.2,17 April 2023,"May 4, 2023",[143]
4,"""Ella Baila Sola""",Eslabon Armado and Peso Pluma,9[E],44.6,17 March 2023,20 April 2023,[144]
...,...,...,...,...,...,...,...
63,"""Despacito (remix)""",Luis Fonsi and Daddy Yankee featuring Justin B...,14,45.9,17 April 2017,27 April 2017,[203]
64,"""Humble""",Kendrick Lamar,1,41.4,30 March 2017,20 April 2017,[204]
65,"""Shape of You""",Ed Sheeran,14,51.9,6 January 2017,12 January 2017,[205]
66,"""Starboy""",The Weeknd featuring Daft Punk,2,25.5,21 September 2016,29 December 2016,[206]


Now that we have our list of tables, find the element that has the large table of songs, and store it as the variable `songs_table`.

In [11]:
songs_table = tables[3]
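The index `3` depends on how the page happened to be laid out when it was scraped. As a minimal sketch of an alternative, assuming the songs table is the first one that has a `Song` column, we can scan the list of tables for that column instead. The helper name `find_table_with_column` and the sample dataframes below are hypothetical stand-ins, so the sketch runs without a network request.

```python
import pandas as pd


def find_table_with_column(tables, column):
    """Return the first dataframe in `tables` that has a column named `column`."""
    for table in tables:
        if column in table.columns:
            return table
    raise ValueError(f"no table has a column named {column!r}")


# Hypothetical stand-in for the list returned by pd.read_html(url).
sample_tables = [
    pd.DataFrame({'Rank': [1, 2]}),
    pd.DataFrame({
        'Song': ['"Seven"', '"Lala"'],
        'Artist(s)': ['Jungkook featuring Latto', 'Myke Towers'],
    }),
]

sample_songs_table = find_table_with_column(sample_tables, 'Song')
```

With the real data, the call would look like `find_table_with_column(tables, 'Song')`, though it is worth confirming by eye that the table it returns is the one you expect.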

Once stored, we can convert our table, which is a pandas dataframe, to a list of dictionaries.  We can do this with the line `songs_table.to_dict('records')`.  Assign the result to the variable `songs`.

In [15]:
songs = songs_table.to_dict('records')
songs[:2]

[{'Song': '"Seven"',
  'Artist(s)': 'Jungkook featuring Latto',
  'Weeks at No. 1': '8',
  'Average streams (millions)': '63.9',
  'Date published': 'July 14, 2023',
  'Date achieved': 'July 20, 2023',
  'Ref.': '[140]'},
 {'Song': '"Lala"',
  'Artist(s)': 'Myke Towers',
  'Weeks at No. 1': '1',
  'Average streams (millions)': '39.4',
  'Date published': 'March 23, 2023',
  'Date achieved': 'July 13, 2023',
  'Ref.': '[141]'}]
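To see what `to_dict('records')` does in isolation, here is a small sketch on a hand-built dataframe. The names `sample_table` and `sample_records` are made up for illustration, and the two rows are copied from the output above.

```python
import pandas as pd

# A tiny dataframe with two of the columns from the songs table.
sample_table = pd.DataFrame({
    'Song': ['"Seven"', '"Lala"'],
    'Average streams (millions)': ['63.9', '39.4'],
})

# Each row becomes one dictionary, keyed by column name.
sample_records = sample_table.to_dict('records')
first_record = sample_records[0]
```

Because each record is a plain dictionary, we can read a value with the column name as the key, for example `first_record['Song']`.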

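Check that your result has the same structure as the commented-out data in the cell.  The Wikipedia page changes over time, so the specific songs, values, and even the column names may differ from the comments.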

In [16]:
songs[:2]

# [{'Song': '"Flowers"',
#   'Artist(s)': 'Miley Cyrus',
#   'Weeksat No. 1[152]': '1',
#   'Average streams(millions)': '96.0',
#   'Date published': '13 January 2023',
#   'Date achieved': '19 January 2023',
#   'Ref.': '[153]'},
#  {'Song': '"Kill Bill"',
#   'Artist(s)': 'SZA',
#   'Weeksat No. 1[152]': '2',
#   'Average streams(millions)': '44.4',
#   'Date published': '9 December 2022',
#   'Date achieved': '5 January 2023',
#   'Ref.': '[154]'}]

[{'Song': '"Seven"',
  'Artist(s)': 'Jungkook featuring Latto',
  'Weeks at No. 1': '8',
  'Average streams (millions)': '63.9',
  'Date published': 'July 14, 2023',
  'Date achieved': 'July 20, 2023',
  'Ref.': '[140]'},
 {'Song': '"Lala"',
  'Artist(s)': 'Myke Towers',
  'Weeks at No. 1': '1',
  'Average streams (millions)': '39.4',
  'Date published': 'March 23, 2023',
  'Date achieved': 'July 13, 2023',
  'Ref.': '[141]'}]

### Converting our list of dictionaries

Ok, so now we have a list of dictionaries, and we would like to have two lists: one with the name of each top song, and another with the number of streams for each song.

First, use a for loop to create a list of the songs.  Store that as the variable `song_names`.

In [20]:
song_names = []

for song in songs:
  song_name = song['Song']
  song_names.append(song_name)



In [21]:
song_names[:3]

# ['"Flowers"', '"Kill Bill"', '"Anti-Hero"']

['"Seven"', '"Lala"', '"Vampire"']
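The same list can also be built with a list comprehension, which folds the loop and the `append` into one expression. This is only a sketch of an equivalent approach; `sample_songs` is a hypothetical two-row stand-in for `songs`, and stripping the quotation marks is an optional extra step that the lab does not ask for.

```python
# Hypothetical stand-in for the `songs` list of dictionaries.
sample_songs = [
    {'Song': '"Seven"', 'Average streams (millions)': '63.9'},
    {'Song': '"Lala"', 'Average streams (millions)': '39.4'},
]

# Equivalent to the for loop: pull the 'Song' value out of each dictionary.
sample_song_names = [song['Song'] for song in sample_songs]

# Optional: remove the literal double quotes that wrap each title.
clean_song_names = [name.strip('"') for name in sample_song_names]
```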

Next we need a list of the number of streams.  Store it as the variable `streams`.

In [24]:
streams = []
# Each element of `songs` is one song's dictionary, so name the loop variable `song`.
for song in songs:
  stream_count = song['Average streams (millions)']
  streams.append(stream_count)

In [25]:
streams[:3]
# ['96.0', '44.4', '64.0']

['63.9', '39.4', '52.0']
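Notice that the stream counts are strings rather than numbers. If we want to sort them or do arithmetic on them, one possible approach is to convert each one to a float. The sketch below assumes that a value might carry a bracketed footnote marker, like the `9[E]` that appears in the weeks column above. The helper name `to_millions` and the value `'44.6[E]'` are made up for illustration.

```python
import re


def to_millions(text):
    """Parse a string such as '63.9' or '44.6[E]' into a float."""
    # Drop any bracketed footnote markers, then surrounding whitespace.
    cleaned = re.sub(r'\[.*?\]', '', text).strip()
    return float(cleaned)


# Hypothetical sample; the last value has an invented footnote marker.
sample_streams = ['63.9', '39.4', '44.6[E]']
sample_stream_values = [to_millions(s) for s in sample_streams]
```

With the real data, this would be `[to_millions(s) for s in streams]`. The conversion raises a `ValueError` if a cell holds something that is not a number after cleaning, which is a useful signal that the table has changed.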

### Plotting our Data

Ok, now it's time for plotly. We start by importing our `plotly.graph_objects` library in such a way that we can reference it as `go`.

In [26]:
import plotly.graph_objects as go

Now we want to create a figure and place a trace inside of it.  Change the trace so that it plots the correct information.  Remember, our two lists are `song_names` and `streams`.

In [28]:
import plotly.graph_objects as go

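# Plot the stream counts on the y-axis and show each song's name on hover.
# Pass the lists directly rather than wrapping them in another list.
scatter = go.Scatter(y = streams, mode = 'markers', hovertext = song_names)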

go.Figure(scatter)

### Summary

Well, that's a job well done.  If you started these lessons without ever having coded before, please go give yourself a well deserved reward.  You earned it.

And if you'd like to keep going, check out one of our other courses or workshops.  You won't be disappointed :)