## Aggregating DataFrames

#### Counting categorical variables

Counting is a great way to get an overview of your data and to spot curiosities that you might not notice otherwise. In this exercise, you'll count the number of each type of store and the number of each department number using the DataFrames you created in the previous exercise:

In [None]:
# Drop duplicate store/type combinations
store_types = sales.drop_duplicates(subset=["store", "type"])

# Drop duplicate store/department combinations
store_depts = sales.drop_duplicates(subset=["store", "department"])

The `store_types` and `store_depts` DataFrames you created in the last exercise are available, and `pandas` is imported as `pd`.
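To see why the duplicates are dropped before counting, here is a minimal sketch on a made-up four-row table. The `toy_sales` name and its values are illustrative assumptions, not taken from `sales_subset.csv`. Counting `type` on the raw rows counts records, while counting after `drop_duplicates` counts stores.

```python
import pandas as pd

# Hypothetical toy data: store 1 has three records, store 2 has one
toy_sales = pd.DataFrame({
    "store": [1, 1, 1, 2],
    "type": ["A", "A", "A", "B"],
})

# Counting on the raw rows counts records, not stores
row_counts = toy_sales["type"].value_counts()

# Keep one row per store/type pair, then count stores
toy_store_types = toy_sales.drop_duplicates(subset=["store", "type"])
store_counts = toy_store_types["type"].value_counts()

print(row_counts)
print(store_counts)
```

In this toy table, type `A` has three records but only one store, so the two counts differ.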

### Instructions

* Count the number of stores of each store `type` in `store_types`.
* Count the proportion of stores of each store `type` in `store_types`.
* Count the number of different `department`s in `store_depts`, sorting the counts in descending order.
* Count the proportion of different `department`s in `store_depts`, sorting the proportions in descending order.
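The instructions above all rely on `Series.value_counts()`. As a rough sketch on a hypothetical toy Series (the `types` name and its values are made up for illustration), `sort=True` orders the result from most to least frequent, and `normalize=True` returns proportions instead of raw counts:

```python
import pandas as pd

# Hypothetical toy column of store types
types = pd.Series(["A", "A", "A", "B"], name="type")

# Counts per category, most frequent first (sort=True is the default)
counts = types.value_counts(sort=True)

# Proportions per category; these sum to 1
props = types.value_counts(normalize=True)

print(counts)
print(props)
```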

In [1]:
# importing pandas
import pandas as pd

# importing sales dataset
sales = pd.read_csv("../datasets/sales_subset.csv")
sales.head()

Unnamed: 0.1,Unnamed: 0,store,type,department,date,weekly_sales,is_holiday,temperature_c,fuel_price_usd_per_l,unemployment
0,0,1,A,1,2010-02-05,24924.5,False,5.727778,0.679451,8.106
1,1,1,A,1,2010-03-05,21827.9,False,8.055556,0.693452,8.106
2,2,1,A,1,2010-04-02,57258.43,False,16.816667,0.718284,7.808
3,3,1,A,1,2010-05-07,17413.94,False,22.527778,0.748928,7.808
4,4,1,A,1,2010-06-04,17558.09,False,27.05,0.714586,7.808


In [2]:
# Drop duplicate store/type combinations
store_types = sales.drop_duplicates(subset=["store", "type"])
store_types

Unnamed: 0.1,Unnamed: 0,store,type,department,date,weekly_sales,is_holiday,temperature_c,fuel_price_usd_per_l,unemployment
0,0,1,A,1,2010-02-05,24924.5,False,5.727778,0.679451,8.106
901,901,2,A,1,2010-02-05,35034.06,False,4.55,0.679451,8.324
1798,1798,4,A,1,2010-02-05,38724.42,False,6.533333,0.686319,8.623
2699,2699,6,A,1,2010-02-05,25619.0,False,4.683333,0.679451,7.259
3593,3593,10,B,1,2010-02-05,40212.84,False,12.411111,0.782478,9.765
4495,4495,13,A,1,2010-02-05,46761.9,False,-0.261111,0.704283,8.316
5408,5408,14,A,1,2010-02-05,32842.31,False,-2.605556,0.735455,8.992
6293,6293,19,A,1,2010-02-05,21500.58,False,-6.133333,0.780365,8.35
7199,7199,20,A,1,2010-02-05,46021.21,False,-3.377778,0.735455,8.187
8109,8109,27,A,1,2010-02-05,32313.79,False,-2.672222,0.780365,8.237


In [3]:
# Drop duplicate store/department combinations
store_depts = sales.drop_duplicates(subset=["store", "department"])
store_depts

Unnamed: 0.1,Unnamed: 0,store,type,department,date,weekly_sales,is_holiday,temperature_c,fuel_price_usd_per_l,unemployment
0,0,1,A,1,2010-02-05,24924.50,False,5.727778,0.679451,8.106
12,12,1,A,2,2010-02-05,50605.27,False,5.727778,0.679451,8.106
24,24,1,A,3,2010-02-05,13740.12,False,5.727778,0.679451,8.106
36,36,1,A,4,2010-02-05,39954.04,False,5.727778,0.679451,8.106
48,48,1,A,5,2010-02-05,32229.38,False,5.727778,0.679451,8.106
...,...,...,...,...,...,...,...,...,...,...
10715,10715,39,A,95,2010-02-05,88385.24,False,6.833333,0.679451,8.554
10727,10727,39,A,96,2010-02-05,21450.05,False,6.833333,0.679451,8.554
10739,10739,39,A,97,2010-02-05,21162.05,False,6.833333,0.679451,8.554
10751,10751,39,A,98,2010-02-05,9023.09,False,6.833333,0.679451,8.554


In [7]:
# Count the number of stores of each type
store_counts = store_types["type"].value_counts()
store_counts

A    11
B     1
Name: type, dtype: int64

In [9]:
# Get the proportion of stores of each type
store_props = store_types["type"].value_counts(normalize=True)
store_props

A    0.916667
B    0.083333
Name: type, dtype: float64

In [11]:
# Count the number of each department number, sorted in descending order (sort=True is the default)
dept_counts_sorted = store_depts["department"].value_counts(sort=True)
dept_counts_sorted

1     12
55    12
72    12
71    12
67    12
      ..
37    10
48     8
50     6
39     4
43     2
Name: department, Length: 80, dtype: int64

In [13]:
# Get the proportion of departments of each number and sort
dept_props_sorted = store_depts["department"].value_counts(sort=True, normalize=True)
dept_props_sorted

1     0.012917
55    0.012917
72    0.012917
71    0.012917
67    0.012917
        ...   
37    0.010764
48    0.008611
50    0.006459
39    0.004306
43    0.002153
Name: department, Length: 80, dtype: float64
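The proportions in the last output are consistent with each count divided by the total number of rows. The sketch below checks that relationship on a hypothetical toy Series (`depts` and its values are made up for illustration), comparing a manual division against `normalize=True`:

```python
import pandas as pd

# Hypothetical toy column of department numbers
depts = pd.Series([1, 1, 1, 2, 2, 3], name="department")

# Raw counts per department, most frequent first
counts = depts.value_counts()

# Proportions computed by hand: each count over the total
manual_props = counts / counts.sum()

# Proportions computed by pandas
props = depts.value_counts(normalize=True)

# Largest absolute difference between the two (aligned on department number)
max_diff = (manual_props.sort_index() - props.sort_index()).abs().max()
print(max_diff)
```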