In [1]:
import pandas as pd

# Value Counts & Frequency Analysis

`value_counts()` answers one of the most common questions in EDA:  

*“How often does each value appear?”*

In [4]:
# Sample DataFrame
df = pd.DataFrame({
    "Student": ["Onkar", "Amit", "Sara", "Rohit", "Neha", "Karan", "Amit"],
    "Department": ["IT", "IT", "HR", "IT", "HR", "Finance", "IT"],
    "Grade": ["A", "B", "A", "A", "B", "A", "B"]
})

df

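|   | Student | Department | Grade |
|---|---------|------------|-------|
| 0 | Onkar   | IT         | A     |
| 1 | Amit    | IT         | B     |
| 2 | Sara    | HR         | A     |
| 3 | Rohit   | IT         | A     |
| 4 | Neha    | HR         | B     |
| 5 | Karan   | Finance    | A     |
| 6 | Amit    | IT         | B     |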


## 1. Basic `value_counts()`
Count how many times each value occurs in a column.

In [8]:
df["Department"].value_counts()

Department
IT         4
HR         2
Finance    1
Name: count, dtype: int64

The result is a Series, where:  
    **Index** - the unique values in the specified column.  
    **Values** - how many times each value occurs.

### **By default, the counts in the resulting Series are sorted in descending order.**
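The default ordering can be changed through the `ascending` parameter of `value_counts()`. The sketch below rebuilds the Department column as a standalone Series (the name `departments` is only for illustration) and compares both orders.

```python
import pandas as pd

# Same values as the Department column in the sample DataFrame
departments = pd.Series(["IT", "IT", "HR", "IT", "HR", "Finance", "IT"])

# Default: most frequent value first
counts_desc = departments.value_counts()

# ascending=True: least frequent value first
counts_asc = departments.value_counts(ascending=True)

# The index holds the unique values, the values hold the counts
print(counts_desc.index.tolist())  # ['IT', 'HR', 'Finance']
print(counts_desc.tolist())        # [4, 2, 1]
print(counts_asc.index.tolist())   # ['Finance', 'HR', 'IT']
```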

## 2. Sort order by index

In [9]:
df["Department"].value_counts().sort_index()

Department
Finance    1
HR         2
IT         4
Name: count, dtype: int64

## 3. Include missing values

By default, `.value_counts()` excludes NaN values. To include them, pass `dropna=False`.

In [10]:
df["Department"].value_counts(dropna=False)

Department
IT         4
HR         2
Finance    1
Name: count, dtype: int64

### NaN is counted as its own row when present (this sample has no missing values, so the output matches the default)
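Because the sample DataFrame contains no missing values, the effect of `dropna=False` is easier to see on data that does. The following is a small sketch using a made-up Series (`dept_with_gaps` is a hypothetical name, not part of the notebook's data).

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing entries
dept_with_gaps = pd.Series(["IT", "IT", "IT", np.nan, np.nan, "HR"])

# Default behaviour: NaN is left out of the result
without_nan = dept_with_gaps.value_counts()

# dropna=False: NaN appears as its own row
with_nan = dept_with_gaps.value_counts(dropna=False)

print(without_nan.sum())  # 4 non-missing values counted
print(with_nan.sum())     # 6, all rows counted
print(with_nan)
```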

## 4. Normalize (Percentages / Proportions)
Get proportions instead of counts.

In [11]:
df["Department"].value_counts(normalize=True)

Department
IT         0.571429
HR         0.285714
Finance    0.142857
Name: proportion, dtype: float64

In [12]:
# To convert it into percentages
df["Department"].value_counts(normalize=True) * 100

Department
IT         57.142857
HR         28.571429
Finance    14.285714
Name: proportion, dtype: float64
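For reporting, the percentages are often rounded and shown next to the raw counts. One possible way to do that, assuming the same Department values as above (the names `summary`, `count` and `percent` are illustrative choices):

```python
import pandas as pd

departments = pd.Series(["IT", "IT", "HR", "IT", "HR", "Finance", "IT"])

counts = departments.value_counts()

# Proportions scaled to percentages and rounded to one decimal place
percent = departments.value_counts(normalize=True).mul(100).round(1)

# Combine both into one small summary table (aligned on the index)
summary = pd.DataFrame({"count": counts, "percent": percent})
print(summary)
```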

## 5. Value Counts for Multiple Columns (using `groupby`)

In [19]:
df.groupby("Department")["Grade"].value_counts()

Department  Grade
Finance     A        1
HR          A        1
            B        1
IT          A        2
            B        2
Name: count, dtype: int64

In [23]:
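# Turn the MultiIndex Series into a flat DataFrame with a "count" column
grade_counts = df.groupby("Department")["Grade"].value_counts()
grade_counts = grade_counts.reset_index()
grade_counts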

|   | Department | Grade | count |
|---|------------|-------|-------|
| 0 | Finance    | A     | 1     |
| 1 | HR         | A     | 1     |
| 2 | HR         | B     | 1     |
| 3 | IT         | A     | 2     |
| 4 | IT         | B     | 2     |
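The same pair counts can also be obtained without `groupby`, by calling `value_counts()` on the DataFrame with a list of columns. This is a sketch of that alternative, not the approach used in the cells above; note that it sorts by count rather than by department.

```python
import pandas as pd

df = pd.DataFrame({
    "Student": ["Onkar", "Amit", "Sara", "Rohit", "Neha", "Karan", "Amit"],
    "Department": ["IT", "IT", "HR", "IT", "HR", "Finance", "IT"],
    "Grade": ["A", "B", "A", "A", "B", "A", "B"],
})

# Count each (Department, Grade) combination that occurs in the data
pair_counts = df.value_counts(["Department", "Grade"])

print(pair_counts[("IT", "A")])  # 2
print(pair_counts.sum())         # 7, one per row of df
```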


## 6. Value Counts vs Crosstab
`.value_counts()` -> one variable  
`pd.crosstab()` -> two variables

In [24]:
df["Grade"].value_counts()

Grade
A    4
B    3
Name: count, dtype: int64

In [25]:
pd.crosstab(df["Department"], df["Grade"])

| Department \ Grade | A | B |
|--------------------|---|---|
| Finance            | 1 | 0 |
| HR                 | 1 | 1 |
| IT                 | 2 | 2 |
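`pd.crosstab()` also accepts `margins` and `normalize`, which add totals or turn the counts into shares. A short sketch on the same sample data (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "IT", "HR", "Finance", "IT"],
    "Grade": ["A", "B", "A", "A", "B", "A", "B"],
})

# margins=True adds an "All" row and column holding the totals
with_totals = pd.crosstab(df["Department"], df["Grade"], margins=True)

# normalize="index" makes each row sum to 1 (grade share within a department)
row_share = pd.crosstab(df["Department"], df["Grade"], normalize="index")

print(with_totals)
print(row_share)
```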


## 7. Apply `.value_counts()` on entire DataFrame

In [30]:
df.apply(pd.Series.value_counts)

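|         | Student | Department | Grade |
|---------|---------|------------|-------|
| A       | NaN     | NaN        | 4.0   |
| Amit    | 2.0     | NaN        | NaN   |
| B       | NaN     | NaN        | 3.0   |
| Finance | NaN     | 1.0        | NaN   |
| HR      | NaN     | 2.0        | NaN   |
| IT      | NaN     | 4.0        | NaN   |
| Karan   | 1.0     | NaN        | NaN   |
| Neha    | 1.0     | NaN        | NaN   |
| Onkar   | 1.0     | NaN        | NaN   |
| Rohit   | 1.0     | NaN        | NaN   |
| Sara    | 1.0     | NaN        | NaN   |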
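The NaN cells mark values that never occur in a given column, and they are also the reason the counts show up as floats. If a table of whole numbers is preferred, one option is to fill the gaps with 0 and cast back to integers, as sketched here:

```python
import pandas as pd

df = pd.DataFrame({
    "Student": ["Onkar", "Amit", "Sara", "Rohit", "Neha", "Karan", "Amit"],
    "Department": ["IT", "IT", "HR", "IT", "HR", "Finance", "IT"],
    "Grade": ["A", "B", "A", "A", "B", "A", "B"],
})

# Count values per column, then replace "never occurs" (NaN) with 0
all_counts = df.apply(pd.Series.value_counts).fillna(0).astype(int)

print(all_counts.loc["Amit"])  # Student 2, Department 0, Grade 0
```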



---

### Summary

`.value_counts()`

1. Value counts for a column -> `df["col"].value_counts()`
2. Value counts including NaN -> pass `dropna=False` (by default, `.value_counts()` drops NaN values)
3. For proportions -> pass `normalize=True`
4. For percentages -> multiply the proportions by 100: `df["col"].value_counts(normalize=True) * 100`
5. `value_counts` with `groupby` -> counts of col2 values within each col1 group, e.g. `df.groupby("col1")["col2"].value_counts()`
6. `value_counts` vs `crosstab` -> **value_counts**: one variable, **crosstab**: two variables
7. Apply `value_counts` to the entire DataFrame -> `df.apply(pd.Series.value_counts)`