# Student DataFrame with Income Binning (INR)

This notebook creates a sample dataset of 20 students with several attributes. It includes an `annual_income_inr` feature (in Indian Rupees) and bins it into three categories: Poor, Middle Class, and Rich.

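- Bins are left-closed, right-open (`right=False` in `pd.cut`), so each lower edge belongs to its own bin and each upper edge to the next one: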
  - Poor: [0, 300,000)
  - Middle Class: [300,000, 1,000,000)
  - Rich: [1,000,000, ∞)
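The effect of the interval convention is easiest to see at the bin edges. The sketch below is illustrative only: the four probe incomes are made up for the comparison and are not part of the student dataset. It contrasts `right=False` with the pandas default `right=True`.

```python
import pandas as pd

# Hypothetical probe values sitting on or just below the bin edges
incomes = pd.Series([299_999, 300_000, 999_999, 1_000_000])
bins = [0, 300_000, 1_000_000, float("inf")]
labels = ["Poor", "Middle Class", "Rich"]

# right=False: intervals are [low, high), so an edge value joins the upper bin
left_closed = pd.cut(incomes, bins=bins, labels=labels, right=False)

# right=True (pandas default): intervals are (low, high], so an edge value stays in the lower bin
right_closed = pd.cut(incomes, bins=bins, labels=labels, right=True)

comparison = pd.DataFrame(
    {"income": incomes, "right=False": left_closed, "right=True": right_closed}
)
print(comparison)
# With right=False, 300_000 is "Middle Class" and 1_000_000 is "Rich";
# with right=True, 300_000 is "Poor" and 1_000_000 is "Middle Class".
```

Only the two exact edge values change category between the two settings, which is why the choice matters for the boundary students in the dataset.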

The dataset below is hand-written, with no random generation, so every run of the notebook produces identical output.
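One way to check that claim, sketched here under assumptions: `build_students` is a hypothetical helper holding a two-row stand-in for the real records, and `pandas.util.hash_pandas_object` is used as a compact fingerprint. Neither is part of the notebook itself.

```python
import pandas as pd
from pandas.util import hash_pandas_object


def build_students() -> pd.DataFrame:
    """Hypothetical stand-in: two literal records instead of the full 20."""
    return pd.DataFrame(
        [
            {"student_id": 1, "name": "Aarav", "annual_income_inr": 280_000},
            {"student_id": 2, "name": "Vivaan", "annual_income_inr": 450_000},
        ]
    )


first, second = build_students(), build_students()

# Literal data means two independent builds match value for value
same_values = first.equals(second)

# Summing the per-row hashes gives a single number to compare between builds
fingerprint_first = int(hash_pandas_object(first, index=True).sum())
fingerprint_second = int(hash_pandas_object(second, index=True).sum())

print(same_values, fingerprint_first == fingerprint_second)
```

If the data were generated randomly without a fixed seed, both checks would be expected to fail between builds.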


In [4]:
# Deterministic student dataset and income binning (INR) — identical output every run
import pandas as pd
from IPython.display import display

# Hand-crafted 20-row dataset (no randomness)
students = [
    {"student_id": 1,  "name": "Aarav",   "age": 15, "class": "X",  "section": "A", "math_score": 88, "science_score": 79, "english_score": 91, "attendance_pct": 95.2, "extracurriculars": 1, "annual_income_inr": 280_000},  # Poor
    {"student_id": 2,  "name": "Vivaan",  "age": 16, "class": "XI", "section": "B", "math_score": 76, "science_score": 83, "english_score": 72, "attendance_pct": 88.5, "extracurriculars": 2, "annual_income_inr": 450_000},  # Middle
    {"student_id": 3,  "name": "Aditya",  "age": 17, "class": "XI", "section": "C", "math_score": 92, "science_score": 90, "english_score": 86, "attendance_pct": 96.1, "extracurriculars": 3, "annual_income_inr": 1_200_000},  # Rich
    {"student_id": 4,  "name": "Vihaan",  "age": 14, "class": "IX", "section": "D", "math_score": 69, "science_score": 74, "english_score": 78, "attendance_pct": 82.4, "extracurriculars": 0, "annual_income_inr": 310_000},  # Middle
    {"student_id": 5,  "name": "Arjun",   "age": 18, "class": "XII","section": "E", "math_score": 85, "science_score": 88, "english_score": 80, "attendance_pct": 91.7, "extracurriculars": 1, "annual_income_inr": 980_000},  # Middle
    {"student_id": 6,  "name": "Sai",     "age": 15, "class": "X",  "section": "A", "math_score": 58, "science_score": 64, "english_score": 71, "attendance_pct": 76.9, "extracurriculars": 0, "annual_income_inr": 150_000},  # Poor
    {"student_id": 7,  "name": "Krishna", "age": 16, "class": "XI", "section": "B", "math_score": 73, "science_score": 70, "english_score": 75, "attendance_pct": 89.2, "extracurriculars": 2, "annual_income_inr": 650_000},  # Middle
    {"student_id": 8,  "name": "Ishan",   "age": 17, "class": "XI", "section": "C", "math_score": 90, "science_score": 87, "english_score": 84, "attendance_pct": 93.3, "extracurriculars": 3, "annual_income_inr": 220_000},  # Poor
    {"student_id": 9,  "name": "Rohan",   "age": 18, "class": "XII","section": "D", "math_score": 67, "science_score": 72, "english_score": 69, "attendance_pct": 81.5, "extracurriculars": 1, "annual_income_inr": 300_000},  # Middle (boundary)
    {"student_id": 10, "name": "Karthik", "age": 14, "class": "IX", "section": "E", "math_score": 82, "science_score": 77, "english_score": 85, "attendance_pct": 92.8, "extracurriculars": 2, "annual_income_inr": 1_050_000},  # Rich
    {"student_id": 11, "name": "Ananya",  "age": 15, "class": "X",  "section": "A", "math_score": 81, "science_score": 86, "english_score": 88, "attendance_pct": 97.0, "extracurriculars": 1, "annual_income_inr": 260_000},  # Poor
    {"student_id": 12, "name": "Diya",    "age": 16, "class": "XI", "section": "B", "math_score": 74, "science_score": 79, "english_score": 83, "attendance_pct": 85.6, "extracurriculars": 1, "annual_income_inr": 750_000},  # Middle
    {"student_id": 13, "name": "Ishita",  "age": 17, "class": "XI", "section": "C", "math_score": 95, "science_score": 93, "english_score": 89, "attendance_pct": 98.4, "extracurriculars": 3, "annual_income_inr": 980_000},  # Middle
    {"student_id": 14, "name": "Meera",   "age": 18, "class": "XII","section": "D", "math_score": 62, "science_score": 68, "english_score": 74, "attendance_pct": 79.3, "extracurriculars": 0, "annual_income_inr": 2_050_000},  # Rich
    {"student_id": 15, "name": "Saanvi",  "age": 14, "class": "IX", "section": "E", "math_score": 70, "science_score": 73, "english_score": 77, "attendance_pct": 87.1, "extracurriculars": 1, "annual_income_inr": 295_000},  # Poor
    {"student_id": 16, "name": "Aisha",   "age": 15, "class": "X",  "section": "A", "math_score": 84, "science_score": 82, "english_score": 90, "attendance_pct": 94.6, "extracurriculars": 2, "annual_income_inr": 350_000},  # Middle
    {"student_id": 17, "name": "Riya",    "age": 16, "class": "XI", "section": "B", "math_score": 77, "science_score": 80, "english_score": 79, "attendance_pct": 90.4, "extracurriculars": 2, "annual_income_inr": 890_000},  # Middle
    {"student_id": 18, "name": "Priya",   "age": 17, "class": "XI", "section": "C", "math_score": 59, "science_score": 63, "english_score": 68, "attendance_pct": 75.8, "extracurriculars": 0, "annual_income_inr": 180_000},  # Poor
    {"student_id": 19, "name": "Tanvi",   "age": 18, "class": "XII","section": "D", "math_score": 87, "science_score": 89, "english_score": 92, "attendance_pct": 95.5, "extracurriculars": 3, "annual_income_inr": 1_000_000},  # Rich (boundary)
    {"student_id": 20, "name": "Kavya",   "age": 14, "class": "IX", "section": "E", "math_score": 66, "science_score": 71, "english_score": 73, "attendance_pct": 83.9, "extracurriculars": 1, "annual_income_inr": 240_000},  # Poor
]

# Build the DataFrame from the list of records
df = pd.DataFrame(students)

# Preview the first 10 rows
df.head(10)

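|  | student_id | name | age | class | section | math_score | science_score | english_score | attendance_pct | extracurriculars | annual_income_inr |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Aarav | 15 | X | A | 88 | 79 | 91 | 95.2 | 1 | 280000 |
| 1 | 2 | Vivaan | 16 | XI | B | 76 | 83 | 72 | 88.5 | 2 | 450000 |
| 2 | 3 | Aditya | 17 | XI | C | 92 | 90 | 86 | 96.1 | 3 | 1200000 |
| 3 | 4 | Vihaan | 14 | IX | D | 69 | 74 | 78 | 82.4 | 0 | 310000 |
| 4 | 5 | Arjun | 18 | XII | E | 85 | 88 | 80 | 91.7 | 1 | 980000 |
| 5 | 6 | Sai | 15 | X | A | 58 | 64 | 71 | 76.9 | 0 | 150000 |
| 6 | 7 | Krishna | 16 | XI | B | 73 | 70 | 75 | 89.2 | 2 | 650000 |
| 7 | 8 | Ishan | 17 | XI | C | 90 | 87 | 84 | 93.3 | 3 | 220000 |
| 8 | 9 | Rohan | 18 | XII | D | 67 | 72 | 69 | 81.5 | 1 | 300000 |
| 9 | 10 | Karthik | 14 | IX | E | 82 | 77 | 85 | 92.8 | 2 | 1050000 |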


In [6]:
# Format incomes as ₹-prefixed strings with thousands separators for readability
df["annual_income_inr_str"] = df["annual_income_inr"].map(lambda x: f"₹{x:,.0f}")

# Bin incomes into categories (left-closed, right-open)
bins = [0, 300_000, 1_000_000, float("inf")]
labels = ["Poor", "Middle Class", "Rich"]
df["income_bracket"] = pd.cut(df["annual_income_inr"], bins=bins, labels=labels, right=False)

# Print the count per bracket, then display a 10-row preview of the key columns
print("Counts by income bracket:\n", df["income_bracket"].value_counts(dropna=False))
print("\nSample rows:")
cols = [
    "student_id",
    "name",
    "age",
    "class",
    "section",
    "annual_income_inr",
    "annual_income_inr_str",
    "income_bracket",
]
display(df[cols].head(10))

Counts by income bracket:
 income_bracket
Middle Class    9
Poor            7
Rich            4
Name: count, dtype: int64

Sample rows:


|  | student_id | name | age | class | section | annual_income_inr | annual_income_inr_str | income_bracket |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Aarav | 15 | X | A | 280000 | ₹280,000 | Poor |
| 1 | 2 | Vivaan | 16 | XI | B | 450000 | ₹450,000 | Middle Class |
| 2 | 3 | Aditya | 17 | XI | C | 1200000 | ₹1,200,000 | Rich |
| 3 | 4 | Vihaan | 14 | IX | D | 310000 | ₹310,000 | Middle Class |
| 4 | 5 | Arjun | 18 | XII | E | 980000 | ₹980,000 | Middle Class |
| 5 | 6 | Sai | 15 | X | A | 150000 | ₹150,000 | Poor |
| 6 | 7 | Krishna | 16 | XI | B | 650000 | ₹650,000 | Middle Class |
| 7 | 8 | Ishan | 17 | XI | C | 220000 | ₹220,000 | Poor |
| 8 | 9 | Rohan | 18 | XII | D | 300000 | ₹300,000 | Middle Class |
| 9 | 10 | Karthik | 14 | IX | E | 1050000 | ₹1,050,000 | Rich |
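The bracket counts above are listed by frequency, not in Poor, Middle Class, Rich order. By default `pd.cut` returns an ordered categorical whose categories follow the bin order, so bin order can be restored and order-aware comparisons are possible. The sketch below uses a four-value subset of the incomes as a stand-in for the full column.

```python
import pandas as pd

# Small stand-in for df["annual_income_inr"] (values taken from the dataset)
incomes = pd.Series([280_000, 450_000, 1_200_000, 300_000], name="annual_income_inr")
bins = [0, 300_000, 1_000_000, float("inf")]
labels = ["Poor", "Middle Class", "Rich"]
brackets = pd.cut(incomes, bins=bins, labels=labels, right=False)

# The categories are ordered: Poor < Middle Class < Rich
print(brackets.cat.ordered)

# Order-aware filter: incomes in the Middle Class bracket or higher
at_least_middle = incomes[brackets >= "Middle Class"]
print(at_least_middle.tolist())

# value_counts() sorts by frequency; sort_index() puts the result back in bin order
counts_in_bin_order = brackets.value_counts().sort_index()
print(counts_in_bin_order)
```

Applied to the notebook's `df["income_bracket"]`, the same `sort_index()` call would list the counts as Poor, Middle Class, Rich.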


## Example: Age to Age_Group binning

We bin ages into groups using left-closed, right-open intervals (right=False):
- Child: [0, 13)
- Teen: [13, 20)
- Adult: [20, 60)
- Senior: [60, ∞)
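As a quick check of these edges, the sketch below bins a set of hypothetical boundary ages (not the example dataset used in the next cell), plus one invalid value to show what happens outside the first bin.

```python
import pandas as pd

age_bins = [0, 13, 20, 60, float("inf")]
age_labels = ["Child", "Teen", "Adult", "Senior"]

# Hypothetical edge ages, plus -1, which lies below the first bin edge
edge_ages = pd.Series([0, 12, 13, 19, 20, 59, 60, -1])
groups = pd.cut(edge_ages, bins=age_bins, labels=age_labels, right=False)

for age, group in zip(edge_ages, groups):
    print(age, group)

# 13, 20 and 60 each open a new group because the lower edge is inclusive;
# -1 falls outside every interval and is returned as NaN.
```

A NaN in the result is a useful signal that an input value lies outside the declared bins and may need cleaning before analysis.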


In [7]:
import pandas as pd

# Example dataset
df_age = pd.DataFrame({
    'Age': [5, 12, 17, 24, 37, 45, 59, 80]
})

# Define age bins and labels (left-closed, right-open)
age_bins = [0, 13, 20, 60, float('inf')]
age_labels = ['Child', 'Teen', 'Adult', 'Senior']

# Create Age_Group column
# Note: right=False makes intervals like [0,13), [13,20), [20,60), [60,∞)
df_age['Age_Group'] = pd.cut(df_age['Age'], bins=age_bins, labels=age_labels, right=False)

print('Age bin counts:\n', df_age['Age_Group'].value_counts(dropna=False))
df_age

Age bin counts:
 Age_Group
Adult     4
Child     2
Teen      1
Senior    1
Name: count, dtype: int64


|  | Age | Age_Group |
|---|---|---|
| 0 | 5 | Child |
| 1 | 12 | Child |
| 2 | 17 | Teen |
| 3 | 24 | Adult |
| 4 | 37 | Adult |
| 5 | 45 | Adult |
| 6 | 59 | Adult |
| 7 | 80 | Senior |
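To confirm which side of each interval is closed, `pd.cut` can be called without `labels`, in which case each age is tagged with the interval it falls into. This is a small sketch on the same ages; the variable names are illustrative.

```python
import pandas as pd

ages = pd.Series([5, 12, 17, 24, 37, 45, 59, 80])
age_bins = [0, 13, 20, 60, float("inf")]

# Without labels, the categories are the intervals themselves
intervals = pd.cut(ages, bins=age_bins, right=False)

# "left" means the lower edge is included and the upper edge is excluded
closed_side = intervals.cat.categories.closed
print(closed_side)

# Counts per interval, listed in bin order
interval_counts = intervals.value_counts().sort_index()
print(interval_counts)
```

The counts per interval should match the labeled counts above: 2, 1, 4, and 1 for the four bins in order.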
