# Working with Pandas and Plotting

To get started, let's import the Pandas module:

In [1]:
import pandas as pd

We are going to read data from a CSV file stored on GitHub (a code-hosting site). To load it with Pandas, we pass the file's URL to `pd.read_csv` in the cell below.

In [2]:
url = 'https://raw.githubusercontent.com/ag12s/CreateWithCodeModules/main/Modeling%20Pollution%20with%20Python/spotify.csv'

spotify_info = pd.read_csv(url)

This dataset contains the number of times certain songs were played on given dates.

In [3]:
spotify_info

Unnamed: 0,Date,Shape of You,Despacito,Something Just Like This,HUMBLE.,Unforgettable
0,2017-01-06,12287078,,,,
1,2017-01-07,13190270,,,,
2,2017-01-08,13099919,,,,
3,2017-01-09,14506351,,,,
4,2017-01-10,14275628,,,,
...,...,...,...,...,...,...
361,2018-01-05,4492978,3450315.0,2408365.0,2685857.0,2869783.0
362,2018-01-06,4416476,3394284.0,2188035.0,2559044.0,2743748.0
363,2018-01-07,4009104,3020789.0,1908129.0,2350985.0,2441045.0
364,2018-01-08,4135505,2755266.0,2023251.0,2523265.0,2622693.0


You'll notice that not every song was played each day! We can use one of the Pandas methods we learned last week, like `dropna()`, to get rid of rows that do not have data for every song. Or, we can use something like `fillna()` to replace the missing data with the number 0.
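As a rough illustration of the difference between the two methods, here is a minimal sketch on a small made-up table. The DataFrame `toy` and its column names are assumptions for this example, not part of the Spotify data.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration), shaped like the Spotify table
toy = pd.DataFrame({
    "Date": ["2017-01-06", "2017-01-07", "2017-01-08"],
    "Song A": [100, 120, 110],
    "Song B": [np.nan, 50.0, 60.0],
})

# dropna() returns a new DataFrame without the rows that contain any NaN
complete_rows = toy.dropna()

# fillna(0) returns a new DataFrame with every NaN replaced by 0
zero_filled = toy.fillna(0)

print(len(toy), len(complete_rows))
print(zero_filled["Song B"].tolist())
```

Neither call changes `toy` itself; to keep the result, assign it to a variable.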

Let's see what data type each column has:

In [7]:
spotify_info.dtypes

Date                         object
Shape of You                  int64
Despacito                   float64
Something Just Like This    float64
HUMBLE.                     float64
Unforgettable               float64
dtype: object

The 'Date' column is stored as an `object`, which here means the dates are plain text strings. Pandas has a dedicated date-and-time type, shown as `datetime64[ns]`, and we can convert this column to that type. To do that, we pass the column to the function `pd.to_datetime`:

In [8]:
spotify_info['Date'] = pd.to_datetime(spotify_info['Date'])

spotify_info.head()

Unnamed: 0,Date,Shape of You,Despacito,Something Just Like This,HUMBLE.,Unforgettable
0,2017-01-06,12287078,,,,
1,2017-01-07,13190270,,,,
2,2017-01-08,13099919,,,,
3,2017-01-09,14506351,,,,
4,2017-01-10,14275628,,,,


Now, if we check our data types, we will see the `datetime64[ns]` type for the 'Date' column:

In [9]:
spotify_info.dtypes

Date                        datetime64[ns]
Shape of You                         int64
Despacito                          float64
Something Just Like This           float64
HUMBLE.                            float64
Unforgettable                      float64
dtype: object

This was an important thing to do because it allows us to actually use the dates listed; we can **sort** them in any given order, **group** by periods of time (weeks, months, years, etc.), or **slice** the DataFrame according to specific dates.
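Each of these three operations can be sketched on a small made-up table. The DataFrame `toy` and the 'Plays' column are assumptions for illustration only; this is one possible approach, not the only one.

```python
import pandas as pd

# Toy data (made up for illustration) with a proper datetime column
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2017-02-01", "2017-01-06", "2017-01-20", "2017-02-15"]),
    "Plays": [40, 10, 20, 30],
})

# Sort: newest dates first
newest_first = toy.sort_values("Date", ascending=False)

# Group: total plays per calendar month (year and month together)
monthly_total = toy.groupby(toy["Date"].dt.to_period("M"))["Plays"].sum()

# Slice: keep only the rows that fall in January 2017
january = toy[(toy["Date"] >= "2017-01-01") & (toy["Date"] < "2017-02-01")]

print(newest_first)
print(monthly_total)
print(january)
```

The `.dt` accessor used for grouping is only available on datetime columns, so it would not work on the original text dates.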

To get the minimum and maximum values of each column, we can use Pandas' `min()` and `max()` methods. We may be particularly interested in the start and end dates in the 'Date' column:

In [13]:
spotify_info.min()

Date                        2017-01-06 00:00:00
Shape of You                            3497682
Despacito                                275178
Something Just Like This                 215591
HUMBLE.                             1.63849e+06
Unforgettable                            308838
dtype: object

In [11]:
spotify_info.max()

Date                        2018-01-09 00:00:00
Shape of You                           19764745
Despacito                           2.32182e+07
Something Just Like This            9.73693e+06
HUMBLE.                              1.3145e+07
Unforgettable                       7.48316e+06
dtype: object
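If only the date range is of interest, the same methods can be called on a single column instead of the whole DataFrame. A minimal sketch, using a made-up table `toy` (an assumption for this example):

```python
import pandas as pd

# Toy data (made up for illustration); the dates are deliberately out of order
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2017-01-08", "2017-01-06", "2017-01-07"]),
    "Plays": [30, 10, 20],
})

first_day = toy["Date"].min()   # earliest date in the column
last_day = toy["Date"].max()    # latest date in the column
span = last_day - first_day     # subtracting two dates gives a Timedelta

print(first_day.date(), last_day.date(), span.days)
```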

### Exercise 1

We want to look at data for days when all 5 songs were played. Use `dropna()` to update `spotify_info` to only hold rows without `NaN` entries.

Then, use `sort_values()` to make sure the 'Date' column is in chronological order.

Lastly, ensure our indices are in ascending numeric order with `reset_index()`.

In [None]:
# 1. Use dropna()


In [None]:
# 2. Use sort_values()


In [None]:
# 3. Use reset_index()


### Exercise 2

Let's visualize our data! Plot the number of times each song was played per day.

We want the Date on the x-axis and the number of plays on the y-axis. Make a separate line for each song, but all in the same plot.

In [None]:
# Plot the DataFrame contents here!


### Exercise 3

Let's get some experience working with a different axis of the DataFrame. In the last exercise, we could use the default `axis=0` direction. Now, let's look across the rows with `axis=1`!
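To see what the two directions mean, here is a small sketch that applies `mean()` along each axis. The DataFrame `toy` and its values are made up for illustration.

```python
import pandas as pd

# Toy data (made up for illustration): one row per day, one column per song
toy = pd.DataFrame(
    {"Song A": [10, 20, 30], "Song B": [30, 40, 50]},
    index=pd.to_datetime(["2017-01-06", "2017-01-07", "2017-01-08"]),
)

# axis=0 works down each column: one result per song
per_song = toy.mean(axis=0)

# axis=1 works across each row: one result per day
per_day = toy.mean(axis=1)

print(per_song)
print(per_day)
```

With `axis=0` the result is labelled by column name; with `axis=1` it is labelled by the row index.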

For each day, plot the average number of plays across all of the songs combined.

Keep in mind -- there are several ways you can go about this!

In [None]:
# Plot the average number of song plays per day here!
