# Technique: 08 Concept Hierarchies

### What is this?
A concept hierarchy organizes the values of an attribute into levels of generality. We replace low-level, specific values (like a city) with high-level, general groups (like a province).

### Why use it?
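1. **Data Reduction**: Replacing many specific values with a few general ones reduces the number of distinct values (and, after aggregation, the number of rows), so the data becomes smaller and easier to manage.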
2. **Better Analysis**: It helps managers see big trends (e.g., "Total Sales in Ontario") instead of looking at every small sale.
3. **Drill-Down and Roll-Up**: You can view data in multiple "granularities" (levels of detail).
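A minimal sketch of a roll-up, assuming a small hypothetical table of city-level sales (the cities' sales figures below are invented for illustration): grouping by the higher level sums the detailed rows into one total per province, which is both the "big trend" view and a form of data reduction.

```python
import pandas as pd

# Hypothetical city-level sales records (illustrative data only)
sales = pd.DataFrame({
    "City": ["Kitchener", "Waterloo", "Toronto", "Montreal", "Quebec City"],
    "Province": ["Ontario", "Ontario", "Ontario", "Quebec", "Quebec"],
    "Sales": [120.0, 80.0, 300.0, 150.0, 50.0],
})

# Roll-up: aggregate the detailed rows to the Province level
province_totals = sales.groupby("Province", as_index=False)["Sales"].sum()
print(province_totals)

# Data reduction: 5 city-level rows collapse into 2 province-level rows
print(len(sales), "->", len(province_totals))
```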

### Examples:
* **Street < City < Province < Country**
* **{Kitchener, Waterloo} < Waterloo Region**
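One way to encode a multi-level hierarchy like the ones above is a chain of child-to-parent links. This is only a sketch, not the notebook's method: the `parent_of` dictionary and the `ancestors` helper are hypothetical names, and the chain simply extends the Kitchener/Waterloo example up to the province and country levels.

```python
# Hypothetical child -> parent links for a City < Region < Province < Country hierarchy
parent_of = {
    "Kitchener": "Waterloo Region",
    "Waterloo": "Waterloo Region",
    "Waterloo Region": "Ontario",
    "Ontario": "Canada",
}


def ancestors(node: str) -> list[str]:
    """Walk up the hierarchy from `node`, returning each higher level in order."""
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node)
    return chain


print(ancestors("Kitchener"))  # ['Waterloo Region', 'Ontario', 'Canada']
```

Rolling a value up to any level is then a matter of picking the right position in the returned chain; the top of the hierarchy (here `"Canada"`) has no parent and returns an empty list.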

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from data_generator import generate_dtt_dataset, GLOBAL_SEED

# Initialize Dataset
df = generate_dtt_dataset()
print(f'Dataset loaded with Global Seed: {GLOBAL_SEED}')
df.head()

Dataset loaded with Global Seed: 888


| | Age | Annual_Salary | Household_Size | Education_Level | Region | Cluster_Feature_1 | Cluster_Feature_2 | Transaction_Amount |
|---|---|---|---|---|---|---|---|---|
| 0 | 47 | 293814.560245 | 2.219505 | Master | North | -0.666995 | 8.207288 | 3.257499 |
| 1 | 63 | 293814.560245 | 1.967882 | High School | East | -1.843954 | -8.553721 | 68.050678 |
| 2 | 55 | 293814.560245 | 1.82875 | PhD | East | 6.498745 | -7.157678 | 8.111841 |
| 3 | 36 | 293814.560245 | 1.772328 | Bachelor | East | -1.25746 | -8.568788 | 27.766274 |
| 4 | 42 | 293814.560245 | 3.174114 | Master | North | 8.144192 | -6.575686 | 13.867245 |


In [2]:
# Define our Hierarchy (The "Roll-Up" map)
# This is often done by a "Domain Expert"
location_map = {
    'North': 'Northern Territory',
    'South': 'Southern Territory',
    'East': 'Coastal Region',
    'West': 'Coastal Region'
}

# Look at the original "Region" column
print("Original Regions:")
print(df['Region'].unique())

Original Regions:
<StringArray>
['North', 'East', 'South', 'West']
Length: 4, dtype: str
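Because `location_map` is many-to-one (East and West both roll up to Coastal Region), inverting it gives the drill-down direction: each higher-level group mapped to the detailed regions beneath it. A minimal sketch; `children_of` is a hypothetical name, not something defined in the notebook.

```python
from collections import defaultdict

# Same roll-up map as in the cell above
location_map = {
    "North": "Northern Territory",
    "South": "Southern Territory",
    "East": "Coastal Region",
    "West": "Coastal Region",
}

# Invert child -> parent into parent -> [children] for drill-down lookups
children_of = defaultdict(list)
for child, parent in location_map.items():
    children_of[parent].append(child)

print(dict(children_of))
# {'Northern Territory': ['North'], 'Southern Territory': ['South'], 'Coastal Region': ['East', 'West']}
```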


## Applying the Transformation
We use the pandas `Series.map()` method with the `location_map` dictionary to replace each specific region with its higher-level group. This is **"Level 1"** of our hierarchy.

In [3]:
# 1. Create the new high-level column
df['Higher_Level_Region'] = df['Region'].map(location_map)

# 2. Check the results
print("Detail Level (Region) vs. High Level (Territory/Region):")
print(df[['Region', 'Higher_Level_Region']].head(10))

# 3. See the reduction
print("\nUnique values in Detail Level:", df['Region'].nunique())
print("Unique values in High Level:", df['Higher_Level_Region'].nunique())

Detail Level (Region) vs. High Level (Territory/Region):
  Region Higher_Level_Region
0  North  Northern Territory
1   East      Coastal Region
2   East      Coastal Region
3   East      Coastal Region
4  North  Northern Territory
5  South  Southern Territory
6   East      Coastal Region
7   West      Coastal Region
8  North  Northern Territory
9   West      Coastal Region

Unique values in Detail Level: 4
Unique values in High Level: 3
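When `Series.map()` is given a dictionary, any value without a matching key comes back as NaN, so a region that appears in the data but not in the map would silently lose its higher-level label. A small coverage check, sketched with a hypothetical extra region `'Central'` that the map does not cover:

```python
import pandas as pd

location_map = {
    "North": "Northern Territory",
    "South": "Southern Territory",
    "East": "Coastal Region",
    "West": "Coastal Region",
}

# Hypothetical data containing a region the map does not cover
regions = pd.Series(["North", "East", "Central", "West"])
rolled_up = regions.map(location_map)

# Values with no entry in the map come back as NaN; list them for review
unmapped = regions[rolled_up.isna()].unique().tolist()
print(unmapped)  # ['Central']
```

Running a check like this before aggregating helps ensure every detailed value actually has a parent in the hierarchy.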


## Numeric to Categorical Hierarchy
We can also build a hierarchy for numeric attributes such as **Age**, rolling up specific ages into groups: **Youth**, **Middle_Aged**, or **Senior**.

In [4]:
# 1. Define the levels
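# pd.cut uses right-closed intervals by default:
# (20, 35] = Youth, (35, 55] = Middle_Aged, (55, 65] = Senior
# Ages <= 20 or > 65 fall outside the bins and become NaN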
bins = [20, 35, 55, 65]
labels = ['Youth', 'Middle_Aged', 'Senior']

df['Age_Hierarchy'] = pd.cut(df['Age'], bins=bins, labels=labels)

# 2. Look at the data
print("Age Hierarchy Summary:")
print(df[['Age', 'Age_Hierarchy']].head(10))

Age Hierarchy Summary:
   Age Age_Hierarchy
0   47   Middle_Aged
1   63        Senior
2   55   Middle_Aged
3   36   Middle_Aged
4   42   Middle_Aged
5   45   Middle_Aged
6   49   Middle_Aged
7   53   Middle_Aged
8   64        Senior
9   53   Middle_Aged
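The bin edges are easy to misread, so it can help to test them directly. With the default `right=True`, an age of exactly 55 lands in Middle_Aged (as row 2 above shows), and ages at or below 20 or above 65 get no label at all. A quick check with made-up boundary ages:

```python
import pandas as pd

bins = [20, 35, 55, 65]
labels = ["Youth", "Middle_Aged", "Senior"]

# Hypothetical ages chosen to sit on and around the bin edges
ages = pd.Series([20, 21, 35, 36, 55, 56, 65, 66])
groups = pd.cut(ages, bins=bins, labels=labels)

# 20 and 66 are outside (20, 65] and come back as NaN
print(pd.DataFrame({"Age": ages, "Group": groups}))
```

If age 20 should count as Youth, `pd.cut(..., include_lowest=True)` closes the first interval on the left; `right=False` switches every interval to left-closed instead.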


### Summary from Lecture Slides:
* **Roll-Up**: Moving from specific details (City) to a general summary (Province).
* **Drill-Down**: Moving from a summary back down to the details.
* **Feature Engineering**: Concept hierarchies are a foundation for feature engineering, producing more general features (such as age groups) for Machine Learning.
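Drill-down reverses the roll-up: start from one high-level group and regroup its rows by the more detailed level. A minimal sketch with a hypothetical table whose columns mirror the notebook's `Region` / `Higher_Level_Region` / `Transaction_Amount` names (the amounts are invented):

```python
import pandas as pd

# Hypothetical transactions already labelled at both hierarchy levels
df = pd.DataFrame({
    "Region": ["North", "East", "West", "East", "South", "West"],
    "Higher_Level_Region": ["Northern Territory", "Coastal Region", "Coastal Region",
                            "Coastal Region", "Southern Territory", "Coastal Region"],
    "Transaction_Amount": [10.0, 20.0, 5.0, 15.0, 8.0, 25.0],
})

# Roll-up: one total per high-level group
rolled_up = df.groupby("Higher_Level_Region")["Transaction_Amount"].sum()
print(rolled_up)

# Drill-down: pick one high-level group and split it back into its detailed regions
coastal = df[df["Higher_Level_Region"] == "Coastal Region"]
drilled_down = coastal.groupby("Region")["Transaction_Amount"].sum()
print(drilled_down)
```

The drilled-down totals (East and West) add back up to the Coastal Region total, which is a useful sanity check that the hierarchy map and the aggregation agree.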

Rolling up the data makes it much easier to see the "Big Picture" in your analysis!