# What is a Unit of Observation?

Understanding the context behind numbers

## Problem

### Situation 1 - Income

Imagine:

1. You want to study **“income.”**
2. You collect numbers, but nothing says what each row is: Row 1 might be an individual, Row 2 a family, Row 3 a household.

You can calculate averages, sums, percentages… but then a question arises:  

> What do these numbers really represent?  
> Who or what is each number describing?  

🔺 At this point, you realize there’s something missing that makes the numbers meaningful.

### Situation 2 - School Performance

Imagine:

1. You want to analyze **“school performance.”**
2. Rows could represent students, classes, or entire schools. 

You try to summarize performance and draw conclusions. Suddenly you wonder:  

> Are we comparing individuals, groups, or institutions?  
> Does the measure reflect the level you intended?  

🔺 The lack of clarity makes any conclusion questionable.

### 🔹 **Final question:**  

> If we have numbers but don’t know **who or what they are measured for**, **what gives them comparative meaning?**

## Solving the Problem

### Python Libraries

```python
import pandas as pd
```

### Situation 1: Income

#### Problem Demonstration

We start with a small dataset **without a clearly defined unit**:

```python
data_income = {
    "record": ["A", "B", "C", "D", "E"],
    "income": [50000, 60000, 55000, 45000, 70000]
}

df_income = pd.DataFrame(data_income)
df_income
```

|   | record | income |
|---|--------|--------|
| 0 | A | 50000 |
| 1 | B | 60000 |
| 2 | C | 55000 |
| 3 | D | 45000 |
| 4 | E | 70000 |


**Observation**: We have numbers, but we don’t know if they represent individuals, households, or families. Any summary (mean, sum) is ambiguous, because we cannot say what it is a mean or sum *of*.
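To see why the reading matters, here is a small sketch using the same five incomes plus a **hypothetical** `persons` column (not in the original data) that pretends records B and E are households of 2 and 3 people. Both summaries are arithmetically correct; they differ only because they assume different units.

```python
import pandas as pd

# Same incomes as above; "persons" is HYPOTHETICAL:
# we pretend records B and E are households of 2 and 3 people.
df = pd.DataFrame({
    "record": ["A", "B", "C", "D", "E"],
    "income": [50000, 60000, 55000, 45000, 70000],
    "persons": [1, 2, 1, 1, 3],
})

# Reading 1: each row is one unit, whatever that unit is
mean_per_record = df["income"].mean()

# Reading 2: the unit is the person, so divide total income by total people
income_per_person = df["income"].sum() / df["persons"].sum()

print("Mean per record:", mean_per_record)      # 56000.0
print("Income per person:", income_per_person)  # 35000.0
```

Neither number is wrong; each answers a different question, which is why the unit has to be stated before a summary is reported.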

#### Solution: Define the Unit

Let’s say we define “unit = individual”. Now every row represents a single person’s income.

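```python
# Declare the unit of observation: every row is one individual
df_income["unit"] = "individual"

# Reorder columns so the unit sits next to the record identifier
cols_income = ["record", "unit", "income"]
df_income = df_income[cols_income]

df_income
```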

|   | record | unit | income |
|---|--------|------|--------|
| 0 | A | individual | 50000 |
| 1 | B | individual | 60000 |
| 2 | C | individual | 55000 |
| 3 | D | individual | 45000 |
| 4 | E | individual | 70000 |
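Storing the unit as a column also makes it checkable. The sketch below assumes a **hypothetical** helper, `require_single_unit` (not part of pandas or the original notebook), that refuses to produce a summary when rows mix units.

```python
import pandas as pd

def require_single_unit(df: pd.DataFrame, unit_col: str = "unit") -> str:
    """HYPOTHETICAL helper: return the only unit label in `unit_col`, or raise if rows mix units."""
    units = df[unit_col].unique()
    if len(units) != 1:
        raise ValueError(f"Rows mix units of observation: {list(units)}")
    return units[0]

df_ok = pd.DataFrame({
    "record": ["A", "B", "C"],
    "unit": ["individual"] * 3,
    "income": [50000, 60000, 55000],
})
# Same data, but one row is (hypothetically) relabelled as a household
df_mixed = df_ok.assign(unit=["individual", "household", "individual"])

unit = require_single_unit(df_ok)
print(f"Average income per {unit}:", df_ok["income"].mean())

try:
    require_single_unit(df_mixed)
    refused = False
except ValueError as err:
    refused = True
    print("Refused:", err)
```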


Now you can compute meaningful summaries, for example:

```python
# Average income
average_income = df_income["income"].mean()
print("Average income per individual:", average_income)

# Total income
total_income = df_income["income"].sum()
print("Total income for all individuals:", total_income)
```

```text
Average income per individual: 56000.0
Total income for all individuals: 280000
```


### Situation 2: School Performance

#### Problem Demonstration

We start with a small dataset **without a clearly defined unit**:

```python
# Dataset without defined unit
data_school_performance = {
    "record": ["A", "B", "C", "D"],
    "performance": [85, 90, 78, 92]
}

df_school_performance = pd.DataFrame(data_school_performance)
df_school_performance
```

|   | record | performance |
|---|--------|-------------|
| 0 | A | 85 |
| 1 | B | 90 |
| 2 | C | 78 |
| 3 | D | 92 |


**Observation**: We have numbers (grades or scores), but we don’t know if they represent individual students, classes, or entire schools. Any summary could be misleading.
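A minimal sketch of how the choice of unit changes the answer, using **hypothetical** student scores (not the dataset above) in two classes of unequal size:

```python
import pandas as pd

# HYPOTHETICAL student-level scores: class X has 3 students, class Y has 1
students = pd.DataFrame({
    "student": ["s1", "s2", "s3", "s4"],
    "class": ["X", "X", "X", "Y"],
    "performance": [80, 90, 100, 60],
})

# Unit = student: every student counts once
mean_per_student = students["performance"].mean()

# Unit = class: average within each class first, then average the classes
class_means = students.groupby("class")["performance"].mean()
mean_per_class = class_means.mean()

print("Mean over students:", mean_per_student)  # 82.5
print("Mean over classes:", mean_per_class)     # 75.0
```

Averaging class averages gives the single student in class Y as much weight as the three students in class X, so the two means differ even though the underlying scores are identical.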

#### Solution: Define the Unit

Let’s define “unit = student”. Each row now represents a single student’s performance.

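```python
# Declare the unit of observation: every row is one student
df_school_performance["unit"] = "student"

# Reorder columns so the unit sits next to the record identifier
cols = ["record", "unit", "performance"]
df_school_performance = df_school_performance[cols]

df_school_performance
```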

|   | record | unit | performance |
|---|--------|------|-------------|
| 0 | A | student | 85 |
| 1 | B | student | 90 |
| 2 | C | student | 78 |
| 3 | D | student | 92 |


Now you can compute meaningful summaries, for example:

```python
# Average performance
average_school_performance = df_school_performance["performance"].mean()
print("Average performance per student:", average_school_performance)

# Maximum and minimum
max_school_performance = df_school_performance["performance"].max()
min_school_performance = df_school_performance["performance"].min()
print(f"Highest performance: {max_school_performance}, Lowest performance: {min_school_performance}")
```

```text
Average performance per student: 86.25
Highest performance: 92, Lowest performance: 78
```
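These summaries are per student. If a later question concerns classes instead, one option is to make the change of unit an explicit aggregation step. The sketch below adds a **hypothetical** `class` column that the original dataset does not have:

```python
import pandas as pd

# Student-level data from above plus a HYPOTHETICAL class label
students = pd.DataFrame({
    "record": ["A", "B", "C", "D"],
    "unit": ["student"] * 4,
    "class": ["X", "X", "Y", "Y"],
    "performance": [85, 90, 78, 92],
})

# Aggregate to one row per class, so the unit of observation becomes "class"
classes = (
    students.groupby("class", as_index=False)
    .agg(mean_performance=("performance", "mean"),
         n_students=("performance", "count"))
    .assign(unit="class")
)
print(classes)
```

Relabelling `unit` as `"class"` on the aggregated table keeps the label honest: each row now describes a class, and `n_students` records how many students stand behind each average.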


## Conclusions

### Units define context

Numbers are meaningless without knowing who or what they describe.

### Units give meaning to summaries

Averages, sums, and percentages are interpretable only per defined unit (e.g., per student, per individual).
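When a table does mix units, one way to keep summaries interpretable is to compute them separately for each unit rather than pooling everything into one number. A short sketch with **hypothetical** records:

```python
import pandas as pd

# HYPOTHETICAL records that mix two units of observation
df = pd.DataFrame({
    "record": ["A", "B", "C", "D"],
    "unit": ["individual", "household", "individual", "household"],
    "income": [50000, 90000, 40000, 110000],
})

# One summary per unit; groupby sorts the unit labels alphabetically
summary = df.groupby("unit")["income"].agg(["count", "mean"])
print(summary)
```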

### Assign units before analysis

This ensures all calculations, visualizations, and comparisons are consistent and meaningful.

## 🔹 Key takeaway:

Always ask: “Who or what does this number describe?”