# Counting values

## Setup

In [1]:
import pandas as pd

## Creation

Creation of an example DataFrame (starting from a dictionary of dictionaries):

In [2]:
data = {
    "Capital": {
        "Spain": "Madrid",
        "Belgium": "Brussels",
        "France": "Paris",
        "Italy": "Roma",
        "Germany": "Berlin",
        "Portugal": "Lisbon",
        "Norway": "Oslo",
        "Greece": "Athens",
    },
    "Population": {
        "Spain": 46733038,
        "Belgium": 11449656,
        "France": 67076000,
        "Italy": 60390560,
        "Germany": 83122889,
        "Portugal": 10295909,
        "Norway": 5391369,
        "Greece": 10718565,
    },
    "Monarch": {
        "Spain": "Felipe VI",
        "Belgium": "Philippe",
        "Norway": "Harald V",
    },
    "Area": {
        "Spain": 505990,
        "Belgium": 30688,
        "France": 640679,
        "Italy": 301340,
        "Germany": None,
        "Portugal": 92212,
        "Norway": 385207,
        "Greece": 131957,
    },
    "Currency": {
        "Spain": "EUR",
        "Belgium": "EUR",
        "France": "EUR",
        "Italy": "EUR",
        "Germany": "EUR",
        "Portugal": None,
        "Norway": "NOK",
        "Greece": "EUR",
    },
    "Formation": {
        "Spain": "1715-06-09",
        "Belgium": "1830-10-04",
        "France": "1792-09-22",
        "Italy": None,
        "Germany": None,
        "Portugal": None,
        "Norway": None,
        "Greece": None,
    },
}

In [3]:
# For now, let's forget about these steps:
df = pd.DataFrame(data)
df["Density"] = df["Population"] / df["Area"]
df["Capital"] = df["Capital"].astype("string")
df["Monarch"] = df["Monarch"].astype("string")
df["Area"] = df["Area"].astype("Int64")
df["Currency"] = df["Currency"].astype("category")
df["Formation"] = df["Formation"].astype("datetime64[ns]")

Apple stock data, taken from the [`matplotlib` sample datasets](https://github.com/matplotlib/sample_data/blob/master/aapl.csv):

In [4]:
# For now, let's forget about these steps:
apple = pd.read_csv("AAPL.csv")
apple["Date"] = apple["Date"].astype("datetime64[ns]")
apple["Open"] = apple["Open"] // 10 * 10
apple["Close"] = apple["Close"] // 10 * 10
apple = apple.set_index("Date")
apple = apple.sort_index()
apple.at["1984-09-07", "Open"] = None
apple.loc["1984-09-10":"1984-09-11", "High"] = None
apple.loc["1984-09-10":"1984-09-12", "Low"] = None

## Demo 1: Counting values

In [5]:
df

| | Capital | Population | Monarch | Area | Currency | Formation | Density |
|---|---|---|---|---|---|---|---|
| Spain | Madrid | 46733038 | Felipe VI | 505990.0 | EUR | 1715-06-09 | 92.359608 |
| Belgium | Brussels | 11449656 | Philippe | 30688.0 | EUR | 1830-10-04 | 373.098801 |
| France | Paris | 67076000 | | 640679.0 | EUR | 1792-09-22 | 104.695175 |
| Italy | Roma | 60390560 | | 301340.0 | EUR | NaT | 200.406717 |
| Germany | Berlin | 83122889 | | | EUR | NaT | |
| Portugal | Lisbon | 10295909 | | 92212.0 | | NaT | 111.654763 |
| Norway | Oslo | 5391369 | Harald V | 385207.0 | NOK | NaT | 13.996031 |
| Greece | Athens | 10718565 | | 131957.0 | EUR | NaT | 81.227711 |


Count values in a column (missing values are excluded by default):

In [6]:
df["Currency"].value_counts()

EUR    6
NOK    1
Name: Currency, dtype: int64

Count values in a column, including missing values:

In [7]:
df["Currency"].value_counts(dropna=False)

EUR    6
NOK    1
NaN    1
Name: Currency, dtype: int64
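To make the effect of `dropna` concrete, here is a minimal sketch on a small, hypothetical Series (not the DataFrame used in this notebook). It assumes only that pandas is installed, and compares the totals of both variants with the length of the Series:

```python
import pandas as pd

# Hypothetical toy data with one missing entry (not the notebook's DataFrame)
currency = pd.Series(["EUR", "EUR", None, "NOK", "EUR"], name="Currency")

counts = currency.value_counts()                  # missing values are skipped
counts_all = currency.value_counts(dropna=False)  # missing values get their own row
missing = int(currency.isna().sum())              # direct count of missing entries

print(counts.sum())      # number of non-missing entries
print(counts_all.sum())  # total number of entries, equal to len(currency)
print(missing)           # difference between the two totals
```

The difference between the two totals is the number of missing entries, which is a quick way to check whether a column has gaps.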

Count values in a column, and show results in ascending order:

In [8]:
df["Currency"].value_counts(ascending=True)

NOK    1
EUR    6
Name: Currency, dtype: int64
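As a small illustration of the `ascending` flag, the sketch below uses a hypothetical Series with three distinct values and no ties, so the resulting order is unambiguous. The data is made up for the example:

```python
import pandas as pd

# Hypothetical toy data: EUR appears 3 times, GBP twice, NOK once
currency = pd.Series(["EUR", "EUR", "NOK", "EUR", "GBP", "GBP"])

descending = currency.value_counts()               # default: most frequent first
ascending = currency.value_counts(ascending=True)  # least frequent first

print(list(descending.index))
print(list(ascending.index))
```

Leaving `ascending` at its default of `False` gives the most frequent value on top, which is usually what you want when looking for the dominant category.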

Count values in a column, and normalize the results so that they are returned as fractions of the non-missing total:

In [9]:
df["Currency"].value_counts(normalize=True)

EUR    0.857143
NOK    0.142857
Name: Currency, dtype: float64
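The fractions above are 6/7 and 1/7 because the missing value is left out of the denominator. The following sketch, on a hypothetical Series with the same counts as the "Currency" column (6 EUR, 1 NOK, 1 missing), shows how combining `normalize=True` with `dropna=False` changes the denominator:

```python
import pandas as pd

# Hypothetical toy Series with the same counts as the "Currency" column
currency = pd.Series(["EUR"] * 6 + ["NOK", None])

# Fractions of the 7 non-missing entries
shares = currency.value_counts(normalize=True)

# Fractions of all 8 entries; the missing value gets its own share
shares_all = currency.value_counts(normalize=True, dropna=False)

print(shares)
print(shares_all)
```

In both cases the fractions add up to 1, but they answer different questions: the share among known values versus the share among all rows.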

## Exercise 1

In [10]:
apple.head()

| Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| 1984-09-07 | | 26.87 | 26.25 | 20.0 | 2981600 | 3.02 |
| 1984-09-10 | 20.0 | | | 20.0 | 2346400 | 3.01 |
| 1984-09-11 | 20.0 | | | 20.0 | 5444000 | 3.07 |
| 1984-09-12 | 20.0 | 27.0 | | 20.0 | 4773600 | 2.98 |
| 1984-09-13 | 20.0 | 27.62 | 27.5 | 20.0 | 7429600 | 3.14 |


Count values in the "Open" column:

In [12]:
apple.Open.value_counts()

20.0     1260
30.0     1192
40.0     1097
10.0      818
50.0      509
60.0      346
70.0      186
80.0      112
90.0       88
120.0      83
130.0      62
170.0      58
100.0      56
110.0      53
180.0      49
160.0      35
140.0      29
150.0      28
190.0      18
200.0       1
Name: Open, dtype: int64

Count values in the "Open" column, including missing values:

In [13]:
apple.Open.value_counts(dropna=False)

20.0     1260
30.0     1192
40.0     1097
10.0      818
50.0      509
60.0      346
70.0      186
80.0      112
90.0       88
120.0      83
130.0      62
170.0      58
100.0      56
110.0      53
180.0      49
160.0      35
140.0      29
150.0      28
190.0      18
200.0       1
NaN         1
Name: Open, dtype: int64

Count values in the "Open" column, and show results in descending order:

In [14]:
apple.Open.value_counts(ascending=False)

20.0     1260
30.0     1192
40.0     1097
10.0      818
50.0      509
60.0      346
70.0      186
80.0      112
90.0       88
120.0      83
130.0      62
170.0      58
100.0      56
110.0      53
180.0      49
160.0      35
140.0      29
150.0      28
190.0      18
200.0       1
Name: Open, dtype: int64
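With numeric buckets such as these, it can also be handy to order the result by the bucket value rather than by the count. One possible way, sketched here on hypothetical data rather than on the Apple dataset, is to chain `sort_index()` after `value_counts()`:

```python
import pandas as pd

# Hypothetical opening prices, already rounded down to multiples of 10
opens = pd.Series([20.0, 30.0, 20.0, 10.0, 30.0, 20.0])

by_count = opens.value_counts()                # ordered by frequency
by_bucket = opens.value_counts().sort_index()  # ordered by bucket value

print(by_count)
print(by_bucket)
```

The counts are the same in both results; only the row order differs.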

Count values in the "Open" column, and normalize the results:

In [16]:
apple.Open.value_counts(normalize=True, ascending=False)

20.0     0.207237
30.0     0.196053
40.0     0.180428
10.0     0.134539
50.0     0.083717
60.0     0.056908
70.0     0.030592
80.0     0.018421
90.0     0.014474
120.0    0.013651
130.0    0.010197
170.0    0.009539
100.0    0.009211
110.0    0.008717
180.0    0.008059
160.0    0.005757
140.0    0.004770
150.0    0.004605
190.0    0.002961
200.0    0.000164
Name: Open, dtype: float64