October 11, 2023

Follow-up to **Charlie's question** today: "Can we use `nsmallest` for dataframes?"

There *is* an `nsmallest()` method for data frames! Let's check it out.

In class we used `nsmallest()` for `pandas.Series` like this:

In [1]:
import pandas as pd

# read in the Palmer penguins data
penguins = pd.read_csv('https://bit.ly/palmer-penguins-csv')

In [2]:
# make a series with the 10 smallest values of body mass
smallest = penguins.body_mass_g.nsmallest(10)
smallest

314    2700.0
58     2850.0
64     2850.0
54     2900.0
98     2900.0
116    2900.0
298    2900.0
104    2925.0
47     2975.0
44     3000.0
Name: body_mass_g, dtype: float64

In class **Sofia also noticed** the values already come out sorted. Great observation! Looking at [the documentation](https://pandas.pydata.org/docs/reference/api/pandas.Series.nsmallest.html), we can see the return values are always sorted in *increasing* order:

![](nsmallest-series.png)
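To see the sorted return order on something small, here is a minimal sketch. The `toy_mass` series and its values are made up for illustration (they are not the penguins data). It also compares the result against sorting the whole series and keeping the first rows, which gives the same result for this tie-free example.

```python
import pandas as pd

# hypothetical toy series standing in for penguins.body_mass_g
toy_mass = pd.Series([3200.0, 2700.0, 4100.0, 2850.0, 3000.0],
                     index=["a", "b", "c", "d", "e"])

# the 3 smallest values, returned in increasing order
smallest_toy = toy_mass.nsmallest(3)

# sorting everything first and keeping the top 3 gives the same result here
sorted_head = toy_mass.sort_values().head(3)

print(smallest_toy)
print(smallest_toy.equals(sorted_head))
```

Notice the index labels travel with the values, which is what lets us use `.index` to select rows afterwards.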


Then we used the `smallest` series to select those rows from the `penguins` data frame:

In [3]:
# as a one-liner: the index of the 10 smallest body masses goes inside .loc[]
penguins.loc[penguins.body_mass_g.nsmallest(10).index]

Unnamed: 0,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
314,Chinstrap,Dream,46.9,16.6,192.0,2700.0,female,2008
58,Adelie,Biscoe,36.5,16.6,181.0,2850.0,female,2008
64,Adelie,Biscoe,36.4,17.1,184.0,2850.0,female,2008
54,Adelie,Biscoe,34.5,18.1,187.0,2900.0,female,2008
98,Adelie,Dream,33.1,16.1,178.0,2900.0,female,2008
116,Adelie,Torgersen,38.6,17.0,188.0,2900.0,female,2009
298,Chinstrap,Dream,43.2,16.6,187.0,2900.0,female,2007
104,Adelie,Biscoe,37.9,18.6,193.0,2925.0,female,2009
47,Adelie,Dream,37.5,18.9,179.0,2975.0,,2007
44,Adelie,Dream,37.0,16.9,185.0,3000.0,female,2007


## `nsmallest()` for `pandas.DataFrame`

If we don't need to store the smallest values for later use, we can call the **`nsmallest()` method for `pandas.DataFrame`** directly. [From the documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nsmallest.html):

![](nsmallest-dataframe.png)

In [4]:
# select the rows with the 10 smallest values in the body_mass_g column
penguins.nsmallest(10, 'body_mass_g')

Unnamed: 0,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
314,Chinstrap,Dream,46.9,16.6,192.0,2700.0,female,2008
58,Adelie,Biscoe,36.5,16.6,181.0,2850.0,female,2008
64,Adelie,Biscoe,36.4,17.1,184.0,2850.0,female,2008
54,Adelie,Biscoe,34.5,18.1,187.0,2900.0,female,2008
98,Adelie,Dream,33.1,16.1,178.0,2900.0,female,2008
116,Adelie,Torgersen,38.6,17.0,188.0,2900.0,female,2009
298,Chinstrap,Dream,43.2,16.6,187.0,2900.0,female,2007
104,Adelie,Biscoe,37.9,18.6,193.0,2925.0,female,2009
47,Adelie,Dream,37.5,18.9,179.0,2975.0,,2007
44,Adelie,Dream,37.0,16.9,185.0,3000.0,female,2007


And we can even use multiple columns to select data:

In [5]:
# select the 10 rows with the smallest body_mass_g, using bill_depth_mm to break ties
penguins.nsmallest(10, ['body_mass_g', 'bill_depth_mm'])

Unnamed: 0,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
314,Chinstrap,Dream,46.9,16.6,192.0,2700.0,female,2008
58,Adelie,Biscoe,36.5,16.6,181.0,2850.0,female,2008
64,Adelie,Biscoe,36.4,17.1,184.0,2850.0,female,2008
98,Adelie,Dream,33.1,16.1,178.0,2900.0,female,2008
298,Chinstrap,Dream,43.2,16.6,187.0,2900.0,female,2007
116,Adelie,Torgersen,38.6,17.0,188.0,2900.0,female,2009
54,Adelie,Biscoe,34.5,18.1,187.0,2900.0,female,2008
104,Adelie,Biscoe,37.9,18.6,193.0,2925.0,female,2009
47,Adelie,Dream,37.5,18.9,179.0,2975.0,,2007
144,Adelie,Dream,37.3,16.8,192.0,3000.0,female,2009


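What's going on there? It first orders the rows (ascending) by the values in `body_mass_g`; rows that tie on `body_mass_g` are then ordered (ascending) by the values in `bill_depth_mm`. That is why the rows with body mass 2900.0 now appear in order of bill depth, and why row 144 (bill depth 16.8) takes the last spot instead of row 44 (bill depth 16.9).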
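The tie-breaking is easier to see on a tiny example. This is only a sketch: the `toy` data frame and its values are invented for illustration, with three rows sharing the same `body_mass_g`.

```python
import pandas as pd

# hypothetical toy data: rows 0, 2 and 3 tie on body_mass_g
toy = pd.DataFrame({
    "body_mass_g":   [2900.0, 2700.0, 2900.0, 2900.0, 3000.0],
    "bill_depth_mm": [18.1,   16.6,   16.1,   17.0,   16.9],
})

# one column: ties are kept in order of appearance (the default keep='first')
by_mass_only = toy.nsmallest(3, "body_mass_g")

# two columns: ties in body_mass_g are broken by the smallest bill_depth_mm
by_mass_then_depth = toy.nsmallest(3, ["body_mass_g", "bill_depth_mm"])

print(by_mass_only)
print(by_mass_then_depth)
```

With one column, row 0 makes the cut because it appears first among the tied rows; with two columns, rows 2 and 3 are chosen instead because they have the smaller bill depths.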

`pandas.Series` and `pandas.DataFrame` often share similar methods, so it's worth looking up both of them!
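As one more example of a shared method, `nlargest()` also exists for both objects. The sketch below uses a made-up `toy` data frame (not the penguins data) just to show the two calls side by side.

```python
import pandas as pd

# hypothetical toy data for illustration
toy = pd.DataFrame({
    "body_mass_g":   [3200.0, 2700.0, 4100.0, 2850.0],
    "bill_depth_mm": [17.5,   16.6,   19.0,   17.1],
})

# Series version: returns the 2 largest values of one column
largest_values = toy.body_mass_g.nlargest(2)

# DataFrame version: returns the full rows for the 2 largest values
largest_rows = toy.nlargest(2, "body_mass_g")

print(largest_values)
print(largest_rows)
```

Both results share the same index labels; the data frame version simply brings the other columns along.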

**Great questions in class today!!**