## Aggregating DataFrames

#### Dropping duplicates

Removing duplicates is essential for accurate counts, because you often don't want to count the same thing more than once. In this exercise, you'll create new DataFrames from the unique values in `sales`.
`sales` is available and `pandas` is imported as `pd`.
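
To make the point about counts concrete, here is a minimal sketch on a small, made-up DataFrame. The name `toy_sales` and all of its values are hypothetical and are not taken from `sales`. Counting store types on the raw rows counts each store once per department, while dropping duplicate `store`/`type` pairs first counts each store once.

```python
import pandas as pd

# Hypothetical toy data: each store appears once per department,
# so the store's type is repeated on every row.
toy_sales = pd.DataFrame({
    "store": [1, 1, 1, 2, 2],
    "type": ["A", "A", "A", "B", "B"],
    "department": [1, 2, 3, 1, 2],
})

# Counting types on the raw rows counts each store once per department.
raw_counts = toy_sales["type"].value_counts()

# Keeping one row per store/type pair first gives one count per store.
unique_stores = toy_sales.drop_duplicates(subset=["store", "type"])
store_counts = unique_stores["type"].value_counts()

print(raw_counts)
print(store_counts)
```

In this toy case the raw count reports three type-A rows even though only one type-A store exists.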

### Instructions

* Remove rows of `sales` with duplicate pairs of `store` and `type`, save the result as `store_types`, and print the head.
* Remove rows of `sales` with duplicate pairs of `store` and `department`, save the result as `store_depts`, and print the head.
* Subset the rows that are holiday weeks using the `is_holiday` column, and drop the duplicate `date`s, saving as `holiday_dates`.
* Select the `date` column of `holiday_dates`, and print.
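
The first two instructions both drop rows that repeat a pair of column values. A minimal sketch of that pattern, assuming a hypothetical stand-in frame `toy_sales` with invented values, might look like this. `drop_duplicates` keeps the first occurrence of each pair by default.

```python
import pandas as pd

# Hypothetical stand-in for `sales`; the values are made up for illustration.
toy_sales = pd.DataFrame({
    "store": [1, 1, 1, 2, 2],
    "type": ["A", "A", "A", "B", "B"],
    "department": [1, 1, 2, 1, 1],
    "weekly_sales": [100.0, 110.0, 50.0, 80.0, 90.0],
})

# One row per store/type pair; the first occurrence of each pair is kept.
toy_store_types = toy_sales.drop_duplicates(subset=["store", "type"])
print(toy_store_types.head())

# One row per store/department pair.
toy_store_depts = toy_sales.drop_duplicates(subset=["store", "department"])
print(toy_store_depts.head())
```

The kept rows retain their original index labels, which is why the outputs in this notebook show gaps in the index.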

In [1]:
# importing pandas
import pandas as pd

# importing sales dataset
sales = pd.read_csv("../datasets/sales_subset.csv")
sales.head()

Unnamed: 0.1,Unnamed: 0,store,type,department,date,weekly_sales,is_holiday,temperature_c,fuel_price_usd_per_l,unemployment
0,0,1,A,1,2010-02-05,24924.5,False,5.727778,0.679451,8.106
1,1,1,A,1,2010-03-05,21827.9,False,8.055556,0.693452,8.106
2,2,1,A,1,2010-04-02,57258.43,False,16.816667,0.718284,7.808
3,3,1,A,1,2010-05-07,17413.94,False,22.527778,0.748928,7.808
4,4,1,A,1,2010-06-04,17558.09,False,27.05,0.714586,7.808


In [3]:
# Drop duplicate store/type combinations
store_types = sales.drop_duplicates(subset=["store", "type"])
store_types

Unnamed: 0.1,Unnamed: 0,store,type,department,date,weekly_sales,is_holiday,temperature_c,fuel_price_usd_per_l,unemployment
0,0,1,A,1,2010-02-05,24924.5,False,5.727778,0.679451,8.106
901,901,2,A,1,2010-02-05,35034.06,False,4.55,0.679451,8.324
1798,1798,4,A,1,2010-02-05,38724.42,False,6.533333,0.686319,8.623
2699,2699,6,A,1,2010-02-05,25619.0,False,4.683333,0.679451,7.259
3593,3593,10,B,1,2010-02-05,40212.84,False,12.411111,0.782478,9.765
4495,4495,13,A,1,2010-02-05,46761.9,False,-0.261111,0.704283,8.316
5408,5408,14,A,1,2010-02-05,32842.31,False,-2.605556,0.735455,8.992
6293,6293,19,A,1,2010-02-05,21500.58,False,-6.133333,0.780365,8.35
7199,7199,20,A,1,2010-02-05,46021.21,False,-3.377778,0.735455,8.187
8109,8109,27,A,1,2010-02-05,32313.79,False,-2.672222,0.780365,8.237


In [4]:
# Drop duplicate store/department combinations
store_depts = sales.drop_duplicates(subset=["store", "department"])
store_depts

Unnamed: 0.1,Unnamed: 0,store,type,department,date,weekly_sales,is_holiday,temperature_c,fuel_price_usd_per_l,unemployment
0,0,1,A,1,2010-02-05,24924.50,False,5.727778,0.679451,8.106
12,12,1,A,2,2010-02-05,50605.27,False,5.727778,0.679451,8.106
24,24,1,A,3,2010-02-05,13740.12,False,5.727778,0.679451,8.106
36,36,1,A,4,2010-02-05,39954.04,False,5.727778,0.679451,8.106
48,48,1,A,5,2010-02-05,32229.38,False,5.727778,0.679451,8.106
...,...,...,...,...,...,...,...,...,...,...
10715,10715,39,A,95,2010-02-05,88385.24,False,6.833333,0.679451,8.554
10727,10727,39,A,96,2010-02-05,21450.05,False,6.833333,0.679451,8.554
10739,10739,39,A,97,2010-02-05,21162.05,False,6.833333,0.679451,8.554
10751,10751,39,A,98,2010-02-05,9023.09,False,6.833333,0.679451,8.554


In [6]:
# Subset the rows where is_holiday is True and drop duplicate dates
holiday_dates = sales[sales["is_holiday"]].drop_duplicates(subset="date")
holiday_dates

Unnamed: 0.1,Unnamed: 0,store,type,department,date,weekly_sales,is_holiday,temperature_c,fuel_price_usd_per_l,unemployment
498,498,1,A,45,2010-09-10,11.47,True,25.938889,0.677602,7.787
691,691,1,A,77,2011-11-25,1431.0,True,15.633333,0.854861,7.866
2315,2315,4,A,47,2010-02-12,498.0,True,-1.755556,0.679715,8.623
6735,6735,19,A,39,2012-09-07,13.41,True,22.333333,1.076766,8.193
6810,6810,19,A,47,2010-12-31,-449.0,True,-1.861111,0.881278,8.067
6815,6815,19,A,47,2012-02-10,15.0,True,0.338889,1.010723,7.943
6820,6820,19,A,48,2011-09-09,197.0,True,20.155556,1.038197,7.806
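
The last instruction, selecting the `date` column of `holiday_dates` and printing it, can be sketched as follows. This is an illustration on a hypothetical frame `toy_sales` with invented values, not the output for the real `sales` data.

```python
import pandas as pd

# Hypothetical toy data; values are made up for illustration.
toy_sales = pd.DataFrame({
    "store": [1, 1, 2, 2],
    "date": ["2010-02-05", "2010-02-12", "2010-02-12", "2010-09-10"],
    "is_holiday": [False, True, True, True],
})

# Boolean mask: keep only the rows where is_holiday is True.
holiday_rows = toy_sales[toy_sales["is_holiday"]]

# Keep the first row for each distinct date.
toy_holiday_dates = holiday_rows.drop_duplicates(subset="date")

# Select the date column as a Series and print it.
dates = toy_holiday_dates["date"]
print(dates)
```

Applied to the real data, the same final step would be `print(holiday_dates["date"])`.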


In [6]:
# See the columns you calculated
print(sales_1_1[["date", "weekly_sales", "cum_weekly_sales", "cum_max_sales"]])

             date  weekly_sales  cum_weekly_sales  cum_max_sales
0      2010-02-05      24924.50      2.492450e+04       24924.50
6437   2010-02-05      38597.52      1.629610e+08       38597.52
1249   2010-02-05       3840.21      2.668539e+07       38597.52
6449   2010-02-05      17590.59      1.633879e+08       38597.52
6461   2010-02-05       4929.87      1.635954e+08       38597.52
...           ...           ...               ...            ...
3592   2012-10-05        440.00      8.543040e+07      293966.05
8108   2012-10-05        660.00      2.028157e+08      293966.05
10773  2012-10-05        915.00      2.568947e+08      293966.05
6257   2012-10-12          3.00      1.583335e+08      293966.05
3384   2012-10-26        -21.63      7.879133e+07      293966.05

[10774 rows x 4 columns]
