# Dunder Data Challenge 005 - Keeping Values Within Interquartile Range

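In this challenge, you are given a table of closing stock prices for 10 different stocks with data going back as far as 1999. For each stock, find the first and third quartiles, which bound the [interquartile range (IQR)][1]. Strictly speaking, the IQR is the difference between these two quartiles; here, "within the IQR" means between the first and third quartiles, inclusive. Return a DataFrame that satisfies the following conditions: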

* Keep values as they are if they are within the IQR
* For values lower than the first quartile, make them equal to the exact value of the first quartile
* For values higher than the third quartile, make them equal to the exact value of the third quartile

[1]: https://en.wikipedia.org/wiki/Interquartile_range

In [1]:
import pandas as pd
stocks = pd.read_csv('../data/stocks10.csv', index_col='date', parse_dates=['date'])
stocks.head()

date,MSFT,AAPL,SLB,AMZN,TSLA,XOM,WMT,T,FB,V
1999-10-25,29.84,2.32,17.02,82.75,,21.45,38.99,16.78,,
1999-10-26,29.82,2.34,16.65,81.25,,20.89,37.11,17.28,,
1999-10-27,29.33,2.38,16.52,75.94,,20.8,36.94,18.27,,
1999-10-28,29.01,2.43,16.59,71.0,,21.19,38.85,19.79,,
1999-10-29,29.88,2.5,17.21,70.62,,21.47,39.25,20.0,,


### Challenge

There is a straightforward solution that completes this challenge in a single line of readable code. Can you find it?

## Solution

We begin by finding the first and third quartiles of each stock using the `quantile` method. This is an **aggregation**, which by default returns a single value for each column. Set the first parameter, `q`, to a float between 0 and 1 to choose the quantile. Below, we create two variables to hold the first and third quartiles (also known as the 25th and 75th percentiles) and output them to the screen.

In [2]:
lower = stocks.quantile(.25)
upper = stocks.quantile(.75)

In [3]:
lower

MSFT    19.1500
AAPL     3.9100
SLB     25.6200
AMZN    40.4600
TSLA    33.9375
XOM     32.6200
WMT     37.6200
T       14.5000
FB      62.3000
V       19.4750
Name: 0.25, dtype: float64

In [4]:
upper

MSFT     39.2600
AAPL     90.5900
SLB      66.2900
AMZN    362.7000
TSLA    260.4700
XOM      71.8100
WMT      65.1500
T        26.2300
FB      162.3050
V        80.3375
Name: 0.75, dtype: float64
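To make the behavior of `quantile` concrete, here is a minimal sketch on a made-up two-column DataFrame. The name `toy` and its values are hypothetical and are not taken from the stocks data. It assumes pandas' default linear interpolation and shows that missing values are skipped, which matters here because TSLA, FB, and V have no prices in 1999. Subtracting the two results gives the IQR itself.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the stocks table); column 'b' has a missing value
toy = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0, 5.0],
                    'b': [10.0, np.nan, 30.0, 40.0, 50.0]})

q1 = toy.quantile(.25)  # first quartile of each column, NaN skipped
q3 = toy.quantile(.75)  # third quartile of each column, NaN skipped
iqr = q3 - q1           # the interquartile range is the difference

print(q1)   # a: 2.0, b: 25.0
print(q3)   # a: 4.0, b: 42.5
print(iqr)  # a: 2.0, b: 17.5
```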

We now use the `clip` method, which trims values in a DataFrame at the given thresholds. Its first two parameters, `lower` and `upper`, can each be either a single value or a sequence of values. We set each parameter to the Series containing the appropriate quartile. Because the bounds are sequences, we use the `axis` parameter to tell pandas which direction to align them. We align with the columns, so each stock is clipped by its own quartiles.

In [5]:
stocks_final = stocks.clip(lower, upper, axis='columns')
stocks_final.head()

date,MSFT,AAPL,SLB,AMZN,TSLA,XOM,WMT,T,FB,V
1999-10-25,29.84,3.91,25.62,82.75,,32.62,38.99,16.78,,
1999-10-26,29.82,3.91,25.62,81.25,,32.62,37.62,17.28,,
1999-10-27,29.33,3.91,25.62,75.94,,32.62,37.62,18.27,,
1999-10-28,29.01,3.91,25.62,71.0,,32.62,38.85,19.79,,
1999-10-29,29.88,3.91,25.62,70.62,,32.62,39.25,20.0,,
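The column-wise alignment is easier to see on a small example. The sketch below uses a hypothetical DataFrame `toy` and hand-picked bounds (none of these values come from the stocks data) to show that each column is clipped by its own pair of bounds and that missing values stay missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data and bounds, chosen by hand for illustration
toy = pd.DataFrame({'a': [1.0, 5.0, 9.0],
                    'b': [10.0, np.nan, 90.0]})
lower = pd.Series({'a': 2.0, 'b': 20.0})
upper = pd.Series({'a': 8.0, 'b': 80.0})

# The index labels of lower/upper are matched to the column labels of toy
clipped = toy.clip(lower, upper, axis='columns')

print(clipped)
# column 'a' becomes 2.0, 5.0, 8.0
# column 'b' becomes 20.0, NaN, 80.0
```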


### Verify correctness

Let's verify the result by taking the min and max of each column. If the clipping worked, each column's minimum should equal its first quartile and its maximum should equal its third quartile.

In [7]:
stocks_final.agg(['min', 'max'])

,MSFT,AAPL,SLB,AMZN,TSLA,XOM,WMT,T,FB,V
min,19.15,3.91,25.62,40.46,33.9375,32.62,37.62,14.5,62.3,19.475
max,39.26,90.59,66.29,362.7,260.47,71.81,65.15,26.23,162.305,80.3375
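Comparing the min and max confirms the bounds, but it does not show that values already between the quartiles were left alone. One possible stricter check is sketched below on hypothetical toy data. The names `toy`, `in_bounds`, and `inside` are illustrative only, and this is an optional extra rather than part of the original solution.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the stocks table)
toy = pd.DataFrame({'a': [1.0, 5.0, 9.0, 6.0],
                    'b': [10.0, np.nan, 90.0, 50.0]})
lower, upper = toy.quantile(.25), toy.quantile(.75)
clipped = toy.clip(lower, upper, axis='columns')

# Check 1: every non-missing clipped value lies between the quartiles
in_bounds = clipped.ge(lower, axis='columns') & clipped.le(upper, axis='columns')
all_in_bounds = bool((in_bounds | clipped.isna()).all().all())

# Check 2: values that were already between the quartiles are unchanged
inside = toy.ge(lower, axis='columns') & toy.le(upper, axis='columns')
unchanged = clipped.where(inside).equals(toy.where(inside))

print(all_in_bounds, unchanged)
```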


### One line

Using one line of code, we can pass the Series containing the quartiles directly to the `clip` method.

In [8]:
stocks.clip(stocks.quantile(.25), stocks.quantile(.75), axis='columns').head()

date,MSFT,AAPL,SLB,AMZN,TSLA,XOM,WMT,T,FB,V
1999-10-25,29.84,3.91,25.62,82.75,,32.62,38.99,16.78,,
1999-10-26,29.82,3.91,25.62,81.25,,32.62,37.62,17.28,,
1999-10-27,29.33,3.91,25.62,75.94,,32.62,37.62,18.27,,
1999-10-28,29.01,3.91,25.62,71.0,,32.62,38.85,19.79,,
1999-10-29,29.88,3.91,25.62,70.62,,32.62,39.25,20.0,,


### Unpacking trickery

This is just for fun: you can pass the `quantile` method a list to return multiple quantiles for each column.

In [9]:
stocks.quantile([.25, .75])

,MSFT,AAPL,SLB,AMZN,TSLA,XOM,WMT,T,FB,V
0.25,19.15,3.91,25.62,40.46,33.9375,32.62,37.62,14.5,62.3,19.475
0.75,39.26,90.59,66.29,362.7,260.47,71.81,65.15,26.23,162.305,80.3375


Iterating over a pandas DataFrame yields its column names, whereas iterating over a two-dimensional NumPy array yields its rows. We can use this to unpack the two rows of quantiles as the first two arguments of the `clip` method, after using the `values` attribute to get the NumPy array from the DataFrame.

In [10]:
stocks.clip(*stocks.quantile([.25, .75]).values, axis='columns').head(3)

date,MSFT,AAPL,SLB,AMZN,TSLA,XOM,WMT,T,FB,V
1999-10-25,29.84,3.91,25.62,82.75,,32.62,38.99,16.78,,
1999-10-26,29.82,3.91,25.62,81.25,,32.62,37.62,17.28,,
1999-10-27,29.33,3.91,25.62,75.94,,32.62,37.62,18.27,,
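The difference in iteration order can be checked directly. The following sketch uses a hypothetical DataFrame `toy` (not the stocks data) to show what each kind of iteration yields and that the unpacked call matches the version that passes the two quantile rows explicitly.

```python
import pandas as pd

# Hypothetical toy data (not the stocks table)
toy = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0, 5.0],
                    'b': [10.0, 20.0, 30.0, 40.0, 50.0]})
quartiles = toy.quantile([.25, .75])

# Iterating over the DataFrame gives column labels
print(list(quartiles))  # ['a', 'b']

# Iterating over (or unpacking) the underlying array gives rows
low, high = quartiles.values
print(low)   # first-quartile row: 2.0 and 20.0
print(high)  # third-quartile row: 4.0 and 40.0

# Star-unpacking passes those two rows as the lower and upper bounds
clipped = toy.clip(*quartiles.values, axis='columns')
explicit = toy.clip(quartiles.loc[.25], quartiles.loc[.75], axis='columns')
print(clipped.equals(explicit))  # True
```

Note that the array carries no column labels, so this form relies on the quantile columns being in the same order as the DataFrame's columns.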


# Become a pandas expert

If you are looking to completely master the pandas library and become a trusted expert for doing data science work, check out my book [Master Data Analysis with Python][2]. It comes with over 300 exercises with detailed solutions covering the pandas library in depth.

[2]: https://www.dunderdata.com/master-data-analysis-with-python