# 🧮 Using `value_counts()` in Pandas
**Author:** Hamna Munir  
**Repository:** Python-Libraries-for-AI-ML  
**Topic:** 06_Value_Counts_Function

`value_counts()` is a Pandas method that **counts how often each unique value occurs** in a Series. It is one of the most commonly used methods for **data exploration, analysis, and feature engineering**.

---

## 📘 Why Use `value_counts()`?
- Quickly summarize categorical data.
- Identify the frequency of each unique value.
- Useful for **EDA (Exploratory Data Analysis)**.
- Helps in preprocessing data for ML models.
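As one illustration of the preprocessing use case, counts can be mapped back onto a column to build a frequency-encoded feature. This is only a minimal sketch: the `cities` Series and the `city_freq` name are hypothetical and not part of this notebook's sample data.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
cities = pd.Series(['NY', 'LA', 'Chicago', 'NY', 'LA', 'NY'], name='City')

# Count how often each category appears
counts = cities.value_counts()

# Replace each category with its count (frequency encoding)
city_freq = cities.map(counts)

print(city_freq.tolist())
```

Each city label is replaced by the number of rows that share it, which gives a numeric column a model can consume.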

## Importing Pandas and Creating a Sample DataFrame
We will use a sample DataFrame to demonstrate `value_counts()`.

In [1]:
import pandas as pd

data = {
    'Name': ['Ali', 'Sara', 'Umar', 'Zoya', 'Omar', 'Ali'],
    'Age': [22, 25, 28, 30, 26, 22],
    'City': ['NY', 'LA', 'Chicago', 'Boston', 'NY', 'LA']
}

df = pd.DataFrame(data)
print("Sample DataFrame:\n", df)

Sample DataFrame:
   Name  Age     City
0   Ali   22       NY
1  Sara   25       LA
2  Umar   28  Chicago
3  Zoya   30   Boston
4  Omar   26       NY
5   Ali   22       LA


## 🧩 Counting Unique Values in a Series
You can use `value_counts()` on a column (Series) to count **how many times each unique value appears**.

### Example: Count occurrences of each Name

In [2]:
# Count unique values in 'Name' column
name_counts = df['Name'].value_counts()
print("Name Counts:\n", name_counts)

Name Counts:
Ali     2
Sara    1
Umar    1
Zoya    1
Omar    1
Name: Name, dtype: int64


## 🧩 Counting Values in Other Columns
You can also count values for other columns, like City.

In [3]:
# Count values in 'City' column
city_counts = df['City'].value_counts()
print("City Counts:\n", city_counts)

City Counts:
NY         2
LA         2
Chicago    1
Boston     1
Name: City, dtype: int64
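For a numeric column such as Age, counting exact values is often less informative than counting ranges. A hedged sketch using the `bins` parameter of `Series.value_counts()` follows; the standalone `ages` Series simply repeats the Age values from the sample data, and the choice of three bins is an arbitrary assumption.

```python
import pandas as pd

# Same ages as the sample DataFrame, rebuilt here so the block runs on its own
ages = pd.Series([22, 25, 28, 30, 26, 22], name='Age')

# Group the ages into 3 equal-width intervals and count values per interval;
# sort=False keeps the intervals in ascending bin order
age_bins = ages.value_counts(bins=3, sort=False)

print(age_bins)
```

The result is indexed by intervals rather than by individual ages, so it reads like a small histogram table.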


## 🧩 Including Missing Values
By default, `value_counts()` **excludes NaN values**. Pass `dropna=False` to count them as a separate entry.

### Example:

In [4]:
# Adding a NaN value to demonstrate dropna=False
import numpy as np
df.loc[6] = ['Hira', 27, np.nan]

city_counts_nan = df['City'].value_counts(dropna=False)
print("City Counts Including NaN:\n", city_counts_nan)

City Counts Including NaN:
NY         2
LA         2
Chicago    1
Boston     1
NaN        1
Name: City, dtype: int64
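A quick way to see the effect of `dropna` is to compare the totals of both results. The sketch below uses a small hypothetical `cities` Series (not the notebook's DataFrame) so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing entry
cities = pd.Series(['NY', 'LA', np.nan, 'NY'])

default_counts = cities.value_counts()           # NaN is excluded
full_counts = cities.value_counts(dropna=False)  # NaN is counted as its own entry

print(default_counts.sum())  # matches cities.count(), the non-missing rows
print(full_counts.sum())     # matches len(cities), all rows
```

If the two totals differ, the difference is the number of missing values in the column.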


## 🧩 Normalizing Counts
- `normalize=True` returns each value's share of the total (**proportions/fractions** that sum to 1) instead of raw counts.

### Example:

In [5]:
# Normalize to get proportions
name_counts_norm = df['Name'].value_counts(normalize=True)
print("Normalized Name Counts:\n", name_counts_norm)

Normalized Name Counts:
Ali     0.285714
Sara    0.142857
Umar    0.142857
Zoya    0.142857
Omar    0.142857
Hira    0.142857
Name: Name, dtype: float64
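Proportions are often easier to read as percentages. One possible way to convert them is sketched below; the `names` Series is a hypothetical sample, and rounding to one decimal place is an arbitrary choice.

```python
import pandas as pd

# Hypothetical sample, used only for this illustration
names = pd.Series(['Ali', 'Sara', 'Umar', 'Ali'])

# Fractions of the total, which sum to 1
proportions = names.value_counts(normalize=True)

# Scale to percentages and round for display
percentages = (proportions * 100).round(1)

print(percentages)
```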


## 🧩 Sorting the Counts
- By default, `value_counts()` sorts the result by count in **descending order**.
- Pass `ascending=True` to list the least frequent values first, or chain `.sort_index()` to order by the values themselves.

### Example:

In [6]:
# Sort counts by index
sorted_counts = df['Name'].value_counts().sort_index()
print("Counts sorted by index:\n", sorted_counts)

Counts sorted by index:
Ali     2
Hira    1
Omar    1
Sara    1
Umar    1
Zoya    1
Name: Name, dtype: int64
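The `ascending=True` option mentioned above can be sketched the same way. The `cities` Series here is hypothetical. The example also shows `sort=False`, which skips sorting by count; the row order it returns may differ between Pandas versions, so it should not be relied on.

```python
import pandas as pd

# Hypothetical sample, used only for this illustration
cities = pd.Series(['LA', 'NY', 'Boston', 'NY', 'NY', 'LA'])

# Least frequent values first
least_common_first = cities.value_counts(ascending=True)

# No sorting by count at all
unsorted_counts = cities.value_counts(sort=False)

print(least_common_first)
print(unsorted_counts)
```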


## 📝 Summary
- `value_counts()` provides **frequency counts** of unique values in a Series.
- Parameters:
  - `dropna` → include/exclude NaN values
  - `normalize` → convert counts to proportions
  - `ascending` → sort counts in ascending instead of descending order
- Results are **sorted** by count in descending order by default; chain `.sort_index()` to order by value instead.
- Essential for **exploratory data analysis** and understanding categorical features.

**Next:** Handling Missing Values → `07_Handling_Missing_Values.ipynb`