# Learn Data Science with Spotify
Welcome! This tutorial is a complement to the textbook for the edX course Data 8.1x, Data Science: Computational Thinking with Python. Since the textbook uses a custom `Table` class instead of a standard Pandas DataFrame, I thought it would be fun to show how to do the same operations in Pandas. We'll use your own Spotify listening history as a source of data, so you might learn something about your music tastes at the same time.

You can also follow it independently as a simple introduction to working with Pandas and learn some basic concepts of data science.

This is a Jupyter notebook, another standard data science tool. You can find tutorials online, but the basics are simple: cells contain text or code and can be edited. To run a code cell, select it and press the play button or Shift+Enter. If a cell fails to run, it's usually because you haven't run an earlier cell that it depends on. For this reason, it's best to run all cells in order.

Before we get started, we need to do a bit of setup. First, let's import pandas and numpy, two standard data science packages for Python. Run the following cell (and all future code cells as you come to them):

In [1]:
import pandas as pd
import numpy as np

Next, we need to get your music history from Spotify. When you run the following cell, Spotify will ask you to give our app permission to access your top artists (based on your listening history) and the artists you follow. If you agree, the app will download both lists and save them in CSV format. If you prefer to skip this step, you can use the data from my own listening history instead.

Note that you only need to run the following cell once to create the CSV file, or again if you want to update the file with your latest history.

If you prefer the app to only use your followed artists or your top artists, you can modify the cell below with a keyword argument. For example:

- `user_followed_csv(top=False)` will only get your followed artists
- `user_followed_csv(followed=False)` will only get your top artists

In [2]:
from dataspot import user_followed_csv
user_followed_csv()

Configuration Succesful
Configuration Succesful


Now that we have our CSV file, let's load it into Pandas as a DataFrame:

In [3]:
with open('data/user_artists.csv', 'r') as csv_file:
    user_artists = pd.read_csv(csv_file, index_col='name')

Good! We should now have everything we need to start following the textbook. We'll pick things up at [chapter 3.4](https://www.inferentialthinking.com/chapters/03/4/Introduction_to_Tables.html), which introduces the concept of tables. The equivalent in Pandas is called a DataFrame. Let's see what it looks like in a Jupyter notebook by running the following cell:

In [8]:
user_artists

name,followers,popularity,total_genres,genre_0,genre_1,genre_2,genre_3,genre_4,genre_5,genre_6,...,genre_12,genre_13,genre_14,genre_15,genre_16,genre_17,genre_18,genre_19,id,uri
Kamaal Williams,58152,46,4,indie jazz,indie soul,neo r&b,uk contemporary jazz,,,,...,,,,,,,,,01mXk9IDlVczWwZvVHAiIS,spotify:artist:01mXk9IDlVczWwZvVHAiIS
Madison McFerrin,38397,47,5,a cappella,alternative r&b,indie jazz,indie soul,neo r&b,,,...,,,,,,,,,02zPEtdzUWnPToEVLRiQ7e,spotify:artist:02zPEtdzUWnPToEVLRiQ7e
Scott Walker,94071,49,11,art pop,art rock,baroque pop,brill building pop,dance rock,experimental,experimental rock,...,,,,,,,,,04tBaW21jyUfeP5iqiKBVq,spotify:artist:04tBaW21jyUfeP5iqiKBVq
Jerico,168,2,1,doujin,,,,,,,...,,,,,,,,,050aWtsntLl4HdCJSoCNDa,spotify:artist:050aWtsntLl4HdCJSoCNDa
CocoRosie,297601,50,4,art pop,folktronica,freak folk,new weird america,,,,...,,,,,,,,,05fo024EFotg9songSENOZ,spotify:artist:05fo024EFotg9songSENOZ
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Beatrice Dillon,11929,39,10,art pop,chamber psych,deconstructed club,electra,experimental synth,float house,fluxwork,...,,,,,,,,,14H1XUmtWYzRHCQDkoee97,spotify:artist:14H1XUmtWYzRHCQDkoee97
Pauline Anna Strom,4316,31,2,electra,fourth world,,,,,,...,,,,,,,,,1N5oRpOIshVJwICjXqkHPW,spotify:artist:1N5oRpOIshVJwICjXqkHPW
Free Nationals,121125,64,2,alternative r&b,indie soul,,,,,,...,,,,,,,,,4596e2d3KmYzAeVenjCxfj,spotify:artist:4596e2d3KmYzAeVenjCxfj
BADBADNOTGOOD,475899,63,6,alternative hip hop,canadian modern jazz,escape room,funk,hip hop,indie soul,,...,,,,,,,,,65dGLGjkw3UbddUg2GKQoZ,spotify:artist:65dGLGjkw3UbddUg2GKQoZ


Unless you barely listen to Spotify, the notebook probably can't show all the data at once. It should display the first five and last five rows of artists, as well as the first ten and last ten columns of data.
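This truncation is controlled by Pandas' display options. As a sketch (the exact defaults can vary with your Pandas version and environment), you can inspect them and temporarily lift the column limit with `pd.option_context`:

```python
import pandas as pd

# Options that control when Pandas truncates a table with "..."
print("max rows:", pd.get_option("display.max_rows"))
print("max columns:", pd.get_option("display.max_columns"))

# Inside the with-block only, None means "no limit": every column is shown
with pd.option_context("display.max_columns", None):
    inside = pd.get_option("display.max_columns")
    # Displaying user_artists here would show all of its columns

# The previous setting is restored automatically after the block
after = pd.get_option("display.max_columns")
```

To change the setting for the rest of the notebook instead, you could call `pd.set_option("display.max_columns", None)` once.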

If you don't recognize the artists, it's possible the app failed to download your data from Spotify and you are using the default data from my own listening history. Make sure you ran the cell with the command `user_followed_csv()` (after executing all previous code cells in order). If that cell failed to execute properly, I'm sorry. You'll have to make do with the default data.

You might notice that the `name` column is in bold. That's because it's the index column, which we specified with the kwarg `index_col` when loading the CSV file. This will be useful later.
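To see what `index_col` does without the Spotify download, here is a small sketch that reads a few rows copied from the table above out of an in-memory string (`io.StringIO` stands in for the real `data/user_artists.csv` file, and only two data columns are kept):

```python
import io

import pandas as pd

# A tiny stand-in for data/user_artists.csv, using rows from the output above
csv_text = """name,followers,popularity
Kamaal Williams,58152,46
Madison McFerrin,38397,47
Jerico,168,2
"""

# index_col="name" turns the name column into the row labels (the index)
artists = pd.read_csv(io.StringIO(csv_text), index_col="name")

print(artists.index.name)                  # name
print(list(artists.columns))               # ['followers', 'popularity']
print(artists.loc["Jerico", "followers"])  # rows can be looked up by name: 168
```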

The Pandas `head` method is similar to the Table `show` method in the textbook. By default, it displays the first five rows:

In [9]:
user_artists.head()

name,followers,popularity,total_genres,genre_0,genre_1,genre_2,genre_3,genre_4,genre_5,genre_6,...,genre_12,genre_13,genre_14,genre_15,genre_16,genre_17,genre_18,genre_19,id,uri
Kamaal Williams,58152,46,4,indie jazz,indie soul,neo r&b,uk contemporary jazz,,,,...,,,,,,,,,01mXk9IDlVczWwZvVHAiIS,spotify:artist:01mXk9IDlVczWwZvVHAiIS
Madison McFerrin,38397,47,5,a cappella,alternative r&b,indie jazz,indie soul,neo r&b,,,...,,,,,,,,,02zPEtdzUWnPToEVLRiQ7e,spotify:artist:02zPEtdzUWnPToEVLRiQ7e
Scott Walker,94071,49,11,art pop,art rock,baroque pop,brill building pop,dance rock,experimental,experimental rock,...,,,,,,,,,04tBaW21jyUfeP5iqiKBVq,spotify:artist:04tBaW21jyUfeP5iqiKBVq
Jerico,168,2,1,doujin,,,,,,,...,,,,,,,,,050aWtsntLl4HdCJSoCNDa,spotify:artist:050aWtsntLl4HdCJSoCNDa
CocoRosie,297601,50,4,art pop,folktronica,freak folk,new weird america,,,,...,,,,,,,,,05fo024EFotg9songSENOZ,spotify:artist:05fo024EFotg9songSENOZ


If you give it an integer, it will display that many rows:

In [10]:
user_artists.head(2)

name,followers,popularity,total_genres,genre_0,genre_1,genre_2,genre_3,genre_4,genre_5,genre_6,...,genre_12,genre_13,genre_14,genre_15,genre_16,genre_17,genre_18,genre_19,id,uri
Kamaal Williams,58152,46,4,indie jazz,indie soul,neo r&b,uk contemporary jazz,,,,...,,,,,,,,,01mXk9IDlVczWwZvVHAiIS,spotify:artist:01mXk9IDlVczWwZvVHAiIS
Madison McFerrin,38397,47,5,a cappella,alternative r&b,indie jazz,indie soul,neo r&b,,,...,,,,,,,,,02zPEtdzUWnPToEVLRiQ7e,spotify:artist:02zPEtdzUWnPToEVLRiQ7e


Pandas also offers a `tail` method to show the last rows instead of the first:

In [12]:
user_artists.tail(8)

name,followers,popularity,total_genres,genre_0,genre_1,genre_2,genre_3,genre_4,genre_5,genre_6,...,genre_12,genre_13,genre_14,genre_15,genre_16,genre_17,genre_18,genre_19,id,uri
Actress,63596,48,20,ambient,art pop,bass music,chamber psych,chillwave,deconstructed club,dub techno,...,fourth world,future garage,hauntology,intelligent dance music,microhouse,ninja,outsider house,wonky,3bg5rmICvmA8dmYVAdKGYH,spotify:artist:3bg5rmICvmA8dmYVAdKGYH
James Gilchrist,631,38,2,choral,classical tenor,,,,,,...,,,,,,,,,53h0zu3PaL8vYCgCepxdBA,spotify:artist:53h0zu3PaL8vYCgCepxdBA
Main Source,47293,41,5,alternative hip hop,hardcore hip hop,hip hop,queens hip hop,turntablism,,,...,,,,,,,,,0zi2OowIfzNqUQiuUVyGLs,spotify:artist:0zi2OowIfzNqUQiuUVyGLs
Beatrice Dillon,11929,39,10,art pop,chamber psych,deconstructed club,electra,experimental synth,float house,fluxwork,...,,,,,,,,,14H1XUmtWYzRHCQDkoee97,spotify:artist:14H1XUmtWYzRHCQDkoee97
Pauline Anna Strom,4316,31,2,electra,fourth world,,,,,,...,,,,,,,,,1N5oRpOIshVJwICjXqkHPW,spotify:artist:1N5oRpOIshVJwICjXqkHPW
Free Nationals,121125,64,2,alternative r&b,indie soul,,,,,,...,,,,,,,,,4596e2d3KmYzAeVenjCxfj,spotify:artist:4596e2d3KmYzAeVenjCxfj
BADBADNOTGOOD,475899,63,6,alternative hip hop,canadian modern jazz,escape room,funk,hip hop,indie soul,,...,,,,,,,,,65dGLGjkw3UbddUg2GKQoZ,spotify:artist:65dGLGjkw3UbddUg2GKQoZ
King Geedorah,93293,55,3,alternative hip hop,hardcore hip hop,hip hop,,,,,...,,,,,,,,,77AKJs9SJqxHXbPgtJPKRa,spotify:artist:77AKJs9SJqxHXbPgtJPKRa
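Both `head` and `tail` take the number of rows to show. A minimal sketch on a hand-built five-row sample (values copied from the output above, with only two columns kept):

```python
import pandas as pd

# Small sample built from rows shown above (followers, popularity)
sample = pd.DataFrame(
    {"followers": [58152, 38397, 94071, 168, 297601],
     "popularity": [46, 47, 49, 2, 50]},
    index=pd.Index(["Kamaal Williams", "Madison McFerrin", "Scott Walker",
                    "Jerico", "CocoRosie"], name="name"),
)

first_two = sample.head(2)  # the first 2 rows
last_two = sample.tail(2)   # the last 2 rows

print(list(first_two.index))  # ['Kamaal Williams', 'Madison McFerrin']
print(list(last_two.index))   # ['Jerico', 'CocoRosie']
```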


You can select a single column by name using square brackets, without changing the original DataFrame. This is similar to the Table `select` method in the textbook, although selecting a single column this way gives you a Pandas Series rather than a new table:

In [13]:
user_artists['followers']

name
Kamaal Williams        58152
Madison McFerrin       38397
Scott Walker           94071
Jerico                   168
CocoRosie             297601
                       ...  
Beatrice Dillon        11929
Pauline Anna Strom      4316
Free Nationals        121125
BADBADNOTGOOD         475899
King Geedorah          93293
Name: followers, Length: 715, dtype: int64

You'll notice Pandas gives you some information about the column you selected: its name, its length (the number of rows), and the data type (`dtype`) shared by all of its entries.

The original DataFrame is unchanged:

In [14]:
user_artists

name,followers,popularity,total_genres,genre_0,genre_1,genre_2,genre_3,genre_4,genre_5,genre_6,...,genre_12,genre_13,genre_14,genre_15,genre_16,genre_17,genre_18,genre_19,id,uri
Kamaal Williams,58152,46,4,indie jazz,indie soul,neo r&b,uk contemporary jazz,,,,...,,,,,,,,,01mXk9IDlVczWwZvVHAiIS,spotify:artist:01mXk9IDlVczWwZvVHAiIS
Madison McFerrin,38397,47,5,a cappella,alternative r&b,indie jazz,indie soul,neo r&b,,,...,,,,,,,,,02zPEtdzUWnPToEVLRiQ7e,spotify:artist:02zPEtdzUWnPToEVLRiQ7e
Scott Walker,94071,49,11,art pop,art rock,baroque pop,brill building pop,dance rock,experimental,experimental rock,...,,,,,,,,,04tBaW21jyUfeP5iqiKBVq,spotify:artist:04tBaW21jyUfeP5iqiKBVq
Jerico,168,2,1,doujin,,,,,,,...,,,,,,,,,050aWtsntLl4HdCJSoCNDa,spotify:artist:050aWtsntLl4HdCJSoCNDa
CocoRosie,297601,50,4,art pop,folktronica,freak folk,new weird america,,,,...,,,,,,,,,05fo024EFotg9songSENOZ,spotify:artist:05fo024EFotg9songSENOZ
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Beatrice Dillon,11929,39,10,art pop,chamber psych,deconstructed club,electra,experimental synth,float house,fluxwork,...,,,,,,,,,14H1XUmtWYzRHCQDkoee97,spotify:artist:14H1XUmtWYzRHCQDkoee97
Pauline Anna Strom,4316,31,2,electra,fourth world,,,,,,...,,,,,,,,,1N5oRpOIshVJwICjXqkHPW,spotify:artist:1N5oRpOIshVJwICjXqkHPW
Free Nationals,121125,64,2,alternative r&b,indie soul,,,,,,...,,,,,,,,,4596e2d3KmYzAeVenjCxfj,spotify:artist:4596e2d3KmYzAeVenjCxfj
BADBADNOTGOOD,475899,63,6,alternative hip hop,canadian modern jazz,escape room,funk,hip hop,indie soul,,...,,,,,,,,,65dGLGjkw3UbddUg2GKQoZ,spotify:artist:65dGLGjkw3UbddUg2GKQoZ
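The column selected with single brackets earlier is a Pandas Series, and the details printed below it are also available as attributes. A small sketch, using a hand-built three-row sample rather than your full data:

```python
import pandas as pd

sample = pd.DataFrame(
    {"followers": [58152, 38397, 168], "popularity": [46, 47, 2]},
    index=pd.Index(["Kamaal Williams", "Madison McFerrin", "Jerico"], name="name"),
)

followers = sample["followers"]  # single brackets -> a Series

print(type(followers).__name__)  # Series
print(followers.name)            # followers
print(len(followers))            # 3
print(followers.dtype)           # int64
```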


To select multiple columns, you have to pass a list of column names. Watch out for the double brackets here, which indicate the list:

In [16]:
user_artists[['followers', 'popularity']]

name,followers,popularity
Kamaal Williams,58152,46
Madison McFerrin,38397,47
Scott Walker,94071,49
Jerico,168,2
CocoRosie,297601,50
...,...,...
Beatrice Dillon,11929,39
Pauline Anna Strom,4316,31
Free Nationals,121125,64
BADBADNOTGOOD,475899,63
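Passing a list always gives back a DataFrame, even when the list holds a single column name, and the columns come back in the order you list them. A quick check on a tiny sample:

```python
import pandas as pd

sample = pd.DataFrame(
    {"followers": [58152, 38397], "popularity": [46, 47], "total_genres": [4, 5]},
    index=pd.Index(["Kamaal Williams", "Madison McFerrin"], name="name"),
)

one_col = sample[["followers"]]                 # a list with one name -> DataFrame
two_cols = sample[["popularity", "followers"]]  # columns follow the list order

print(type(one_col).__name__)  # DataFrame
print(list(two_cols.columns))  # ['popularity', 'followers']
```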


We can also drop columns we're not interested in for the moment with the `drop` method, which is similar to the textbook method of the same name. The simplest syntax is to pass a list of column names to the `columns` kwarg:

In [20]:
user_artists.drop(columns=['followers'])

name,popularity,total_genres,genre_0,genre_1,genre_2,genre_3,genre_4,genre_5,genre_6,genre_7,...,genre_12,genre_13,genre_14,genre_15,genre_16,genre_17,genre_18,genre_19,id,uri
Kamaal Williams,46,4,indie jazz,indie soul,neo r&b,uk contemporary jazz,,,,,...,,,,,,,,,01mXk9IDlVczWwZvVHAiIS,spotify:artist:01mXk9IDlVczWwZvVHAiIS
Madison McFerrin,47,5,a cappella,alternative r&b,indie jazz,indie soul,neo r&b,,,,...,,,,,,,,,02zPEtdzUWnPToEVLRiQ7e,spotify:artist:02zPEtdzUWnPToEVLRiQ7e
Scott Walker,49,11,art pop,art rock,baroque pop,brill building pop,dance rock,experimental,experimental rock,freak folk,...,,,,,,,,,04tBaW21jyUfeP5iqiKBVq,spotify:artist:04tBaW21jyUfeP5iqiKBVq
Jerico,2,1,doujin,,,,,,,,...,,,,,,,,,050aWtsntLl4HdCJSoCNDa,spotify:artist:050aWtsntLl4HdCJSoCNDa
CocoRosie,50,4,art pop,folktronica,freak folk,new weird america,,,,,...,,,,,,,,,05fo024EFotg9songSENOZ,spotify:artist:05fo024EFotg9songSENOZ
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Beatrice Dillon,39,10,art pop,chamber psych,deconstructed club,electra,experimental synth,float house,fluxwork,fourth world,...,,,,,,,,,14H1XUmtWYzRHCQDkoee97,spotify:artist:14H1XUmtWYzRHCQDkoee97
Pauline Anna Strom,31,2,electra,fourth world,,,,,,,...,,,,,,,,,1N5oRpOIshVJwICjXqkHPW,spotify:artist:1N5oRpOIshVJwICjXqkHPW
Free Nationals,64,2,alternative r&b,indie soul,,,,,,,...,,,,,,,,,4596e2d3KmYzAeVenjCxfj,spotify:artist:4596e2d3KmYzAeVenjCxfj
BADBADNOTGOOD,63,6,alternative hip hop,canadian modern jazz,escape room,funk,hip hop,indie soul,,,...,,,,,,,,,65dGLGjkw3UbddUg2GKQoZ,spotify:artist:65dGLGjkw3UbddUg2GKQoZ
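`drop` returns a new DataFrame and leaves the original alone. The sketch below, on a tiny sample, also shows an equivalent form that passes the labels positionally together with `axis=1` (meaning columns):

```python
import pandas as pd

sample = pd.DataFrame(
    {"followers": [58152, 38397], "popularity": [46, 47], "total_genres": [4, 5]},
    index=pd.Index(["Kamaal Williams", "Madison McFerrin"], name="name"),
)

without_followers = sample.drop(columns=["followers"])
# Equivalent form: column labels plus axis=1
same_thing = sample.drop(["followers"], axis=1)

print(list(without_followers.columns))       # ['popularity', 'total_genres']
print(without_followers.equals(same_thing))  # True
print(list(sample.columns))                  # the original keeps all three columns
```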


Again, none of these methods modify the original DataFrame. If we want to work on a modified version of the DataFrame, we have to assign the result to a variable. For example, if we wanted to work on music genres, we could save a DataFrame that only contained that information:

In [24]:
music_genres = user_artists.drop(columns=['followers','popularity'])

Now, we can refer to this new DataFrame anytime we want:

In [25]:
music_genres.head()

name,total_genres,genre_0,genre_1,genre_2,genre_3,genre_4,genre_5,genre_6,genre_7,genre_8,...,genre_12,genre_13,genre_14,genre_15,genre_16,genre_17,genre_18,genre_19,id,uri
Kamaal Williams,4,indie jazz,indie soul,neo r&b,uk contemporary jazz,,,,,,...,,,,,,,,,01mXk9IDlVczWwZvVHAiIS,spotify:artist:01mXk9IDlVczWwZvVHAiIS
Madison McFerrin,5,a cappella,alternative r&b,indie jazz,indie soul,neo r&b,,,,,...,,,,,,,,,02zPEtdzUWnPToEVLRiQ7e,spotify:artist:02zPEtdzUWnPToEVLRiQ7e
Scott Walker,11,art pop,art rock,baroque pop,brill building pop,dance rock,experimental,experimental rock,freak folk,melancholia,...,,,,,,,,,04tBaW21jyUfeP5iqiKBVq,spotify:artist:04tBaW21jyUfeP5iqiKBVq
Jerico,1,doujin,,,,,,,,,...,,,,,,,,,050aWtsntLl4HdCJSoCNDa,spotify:artist:050aWtsntLl4HdCJSoCNDa
CocoRosie,4,art pop,folktronica,freak folk,new weird america,,,,,,...,,,,,,,,,05fo024EFotg9songSENOZ,spotify:artist:05fo024EFotg9songSENOZ


The original table remains unchanged. Let's use it to create a simple table to study the popularity of your favorite artists:

In [28]:
artist_popularity = user_artists[['popularity','followers']]
artist_popularity

name,popularity,followers
Kamaal Williams,46,58152
Madison McFerrin,47,38397
Scott Walker,49,94071
Jerico,2,168
CocoRosie,50,297601
...,...,...
Beatrice Dillon,39,11929
Pauline Anna Strom,31,4316
Free Nationals,64,121125
BADBADNOTGOOD,63,475899


Of course, this table would be much more interesting if it were sorted! We can do this with the `sort_values` method, which is equivalent to the `sort` method in the textbook:

In [31]:
artist_popularity.sort_values('popularity')

name,popularity,followers
Geneviève et Mathieu,0,10
Rozalind MacPhail,0,112
Nennen,0,6
Fanacanta,0,58
Jacobus et Maleco,0,49
...,...,...
Madvillain,73,335179
Yoko Ono,73,80878
Marvin Gaye,77,3789117
MF DOOM,80,813464
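For comparison with the textbook's `sort` method, here is `sort_values` on a four-row sample (values taken from the outputs above); it sorts in ascending order by default:

```python
import pandas as pd

popularity = pd.DataFrame(
    {"popularity": [46, 2, 80, 49], "followers": [58152, 168, 813464, 94071]},
    index=pd.Index(["Kamaal Williams", "Jerico", "MF DOOM", "Scott Walker"],
                   name="name"),
)

by_popularity = popularity.sort_values("popularity")  # lowest popularity first

print(list(by_popularity.index))
# ['Jerico', 'Kamaal Williams', 'Scott Walker', 'MF DOOM']
```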


Pandas has another sort function, `sort_index`, which always sorts according to the index column we specified when creating the DataFrame:

In [32]:
artist_popularity.sort_index()

name,popularity,followers
30/70,36,18228
75 Dollar Bill,19,5331
A Tribe Called Quest,70,1368562
ADULT.,33,25752
Aavikko,24,3436
...,...,...
rRoxymore,27,5143
serpentwithfeet,46,64867
upsammy,34,5885
Âme,51,92613
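You may notice that `ADULT.` comes before `Aavikko` and that `Âme` comes last. By default, strings are compared character by character by Unicode code point, so digits come before uppercase letters, uppercase before lowercase, and accented letters such as `Â` after all of them. If you want a case-insensitive order, `sort_index` accepts a `key` function (available in Pandas 1.1 and later). A sketch on a few of the names above:

```python
import pandas as pd

names = ["upsammy", "ADULT.", "Aavikko", "Âme", "30/70"]
sample = pd.DataFrame({"popularity": [34, 33, 24, 51, 36]},
                      index=pd.Index(names, name="name"))

# Default: plain string comparison by Unicode code point
default_order = list(sample.sort_index().index)
print(default_order)  # ['30/70', 'ADULT.', 'Aavikko', 'upsammy', 'Âme']

# key= transforms the index before comparing; lowercasing it makes the
# sort case-insensitive (accented letters still sort after plain ones)
insensitive = list(sample.sort_index(key=lambda idx: idx.str.lower()).index)
print(insensitive)    # ['30/70', 'Aavikko', 'ADULT.', 'upsammy', 'Âme']
```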


Let's sort by popularity again, but this time, we'll put the most popular artists at the top of the list. In the textbook, the kwarg for this is `descending=True`, but in Pandas, we'll use `ascending=False` instead.

In [34]:
artist_popularity.sort_values('popularity', ascending=False)

name,popularity,followers
Kanye West,91,13101096
MF DOOM,80,813464
Marvin Gaye,77,3789117
Yoko Ono,73,80878
Madvillain,73,335179
...,...,...
Nuclear Winter Garden,0,33
Jacob Yates,0,51
Gut und Irmler,0,174
Fanacanta,0,58


You may notice that some of your artists have the same popularity but a different number of followers. We can break these ties by sorting on multiple columns: rows are ordered by popularity first, and by number of followers wherever the popularity is equal:

In [35]:
artist_popularity.sort_values(['popularity','followers'], ascending=False)

name,popularity,followers
Kanye West,91,13101096
MF DOOM,80,813464
Marvin Gaye,77,3789117
Madvillain,73,335179
Yoko Ono,73,80878
...,...,...
Nuclear Winter Garden,0,33
Sealy Sikes,0,20
Geneviève et Mathieu,0,10
Nennen,0,6
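`ascending` can also take one value per sort column. As a sketch on three rows from the output above, this sorts by popularity from highest to lowest but breaks ties by listing the artist with fewer followers first:

```python
import pandas as pd

popularity = pd.DataFrame(
    {"popularity": [73, 73, 80], "followers": [80878, 335179, 813464]},
    index=pd.Index(["Yoko Ono", "Madvillain", "MF DOOM"], name="name"),
)

# One bool per sort column: popularity high-to-low, then followers low-to-high
mixed = popularity.sort_values(["popularity", "followers"],
                               ascending=[False, True])

print(list(mixed.index))  # ['MF DOOM', 'Yoko Ono', 'Madvillain']
```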


Again, the `sort_values` method doesn't change the original DataFrame. We can assign the result to a variable, as with the previous methods, but we can also use the `inplace` kwarg to sort the original DataFrame directly. The same kwarg works with the `sort_index` method.

In [54]:
artist_popularity.sort_values(['popularity','followers'], ascending=False, inplace=True)
artist_popularity

name,popularity,followers
Kanye West,91,13101096
MF DOOM,80,813464
Marvin Gaye,77,3789117
Madvillain,73,335179
Yoko Ono,73,80878
...,...,...
Nuclear Winter Garden,0,33
Sealy Sikes,0,20
Geneviève et Mathieu,0,10
Nennen,0,6
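One pitfall: with `inplace=True`, `sort_values` modifies the DataFrame and returns `None`, so assigning its result would replace your table with `None`. A sketch of both styles on a small sample:

```python
import pandas as pd

popularity = pd.DataFrame(
    {"popularity": [46, 80, 2]},
    index=pd.Index(["Kamaal Williams", "MF DOOM", "Jerico"], name="name"),
)

# Style 1: keep the sorted copy under a name; popularity itself is unchanged
sorted_copy = popularity.sort_values("popularity", ascending=False)

# Style 2: sort popularity itself; the method then returns None,
# so don't write `popularity = popularity.sort_values(..., inplace=True)`
result = popularity.sort_values("popularity", ascending=False, inplace=True)

print(result)                          # None
print(list(popularity.index))          # ['MF DOOM', 'Kamaal Williams', 'Jerico']
print(popularity.equals(sorted_copy))  # True
```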


There are different ways to filter the data according to certain values, similar to the `where` method in the textbook. One of the simplest is called boolean indexing: we write a condition on a column, which produces True or False for each row, and pass it to the DataFrame in square brackets. Only the rows where the condition is True are returned:

In [61]:
artist_popularity[artist_popularity['followers'] > 50000]

name,popularity,followers
Kanye West,91,13101096
MF DOOM,80,813464
Marvin Gaye,77,3789117
Madvillain,73,335179
Yoko Ono,73,80878
...,...,...
Omar Souleyman,41,58900
Marissa Nadler,41,56493
Cibo Matto,41,56247
Sons Of Kemet,39,69720
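Conditions can be combined with `&` (and) and `|` (or), with each condition wrapped in parentheses. `DataFrame.query` is another way to write the same filter as a string. A sketch on a few rows from the outputs above:

```python
import pandas as pd

artists = pd.DataFrame(
    {"popularity": [91, 41, 39, 46, 2],
     "followers": [13101096, 58900, 69720, 58152, 168]},
    index=pd.Index(["Kanye West", "Omar Souleyman", "Sons Of Kemet",
                    "Kamaal Williams", "Jerico"], name="name"),
)

# Each comparison yields a boolean Series. The parentheses are required
# because & binds more tightly than comparison operators such as > and <.
mask = (artists["followers"] > 50000) & (artists["popularity"] < 45)
filtered = artists[mask]
print(list(filtered.index))  # ['Omar Souleyman', 'Sons Of Kemet']

# The same filter written as a query string
same = artists.query("followers > 50000 and popularity < 45")
print(same.equals(filtered))  # True
```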
