# Analysis and Visualization of the NEET Population (15-24 years)

**Authors:** Nina & Ligia

**Data Source:** `people_14_25_2023_2024_fullvars.csv`

**Objective:** This notebook analyzes PNAD Contínua microdata for Q4 2024, estimates the NEET (not in education, employment, or training) population aged 15-24, breaks it down by gender, and generates a chart to visualize the results.

**Variables:**

- `ano`: interview year (e.g., 2023, 2024, 2025). Source: IBGE.
- `trimestre`: reference quarter (1 to 4). Source: IBGE.
- `id_uf`: numeric code of the Federative Unit (state) (11=RO … 53=DF). For the abbreviation or name, join with `br_bd_diretorios_brasil.uf`. Source: Base dos Dados.
- `V1022`: dwelling location (urban/rural): 1=Urban, 2=Rural. Source: IBGE (FTP docs).
- `V2007`: sex: 1=Male, 2=Female. Source: IBGE.
- `V2009`: age in completed years. Source: IBGE.
- `V2010`: race/color: 1=White, 2=Black, 3=Asian (Yellow), 4=Brown (Pardo), 5=Indigenous, 9=Ignored/Not declared. Source: IBGE.
- `V3002`: attends school/course? 1=Yes, 2=No (basis for the education component of NEET). Source: IBGE.
- `VD4002`: labor force status in the reference week (derived): 1=Employed, 2=Unemployed, 3=Out of the labor force (basis for the employment component of NEET: not employed). Source: IBGE (FTP docs).
- `V4032`: contributes to a social security institute for this job? 1=Yes, 2=No (asked of employed persons; "not applicable" if not employed). Source: IBGE (FTP docs).
- `VD4019`: usual earnings from all jobs (derived): nominal monthly income, in currency units. Source: IBGE (FTP docs).
- `V1028`: sample weight for households and persons, with corrections and post-stratification. This is the weight the notebook uses (as `Weight_V1028`) to expand the sample to population estimates. Note: `V1032` is the corresponding final weight in the annual PNAD Contínua releases; for quarterly data such as Q4 2024, `V1028` (with its replicate weights for variance estimation) is the usual choice. Source: IBGE.
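
The later cells rely on label columns such as `Sex_label` and `School_label`, which do not appear in the list above and were presumably created during pre-processing. The sketch below shows one way such labels could be derived from the numeric codes; the toy data and the mapping step are assumptions for illustration, not the authors' actual pre-processing.

```python
import pandas as pd

# Code-to-label maps taken from the variable descriptions above.
SEX_LABELS = {1: "Male", 2: "Female"}
SCHOOL_LABELS = {1: "Yes", 2: "No"}

# Toy rows standing in for the raw microdata (values are made up).
raw = pd.DataFrame({"V2007": [1, 2, 2], "V3002": [2, 1, 2]})

# Series.map leaves any code that is missing from the dictionary as NaN.
raw["Sex_label"] = raw["V2007"].map(SEX_LABELS)
raw["School_label"] = raw["V3002"].map(SCHOOL_LABELS)
print(raw)
```

Keeping the maps in one place makes it easy to check the labels against the IBGE codebook before filtering on them.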

### 1. Import Libraries

First, we import the necessary libraries for the analysis: `pandas` for data manipulation, `matplotlib` for creating charts, and `os` for checking that the input file exists.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import os

print("Libraries imported successfully!")

### 2. Load the Data

Now, let's load our pre-processed CSV file into a pandas DataFrame. The script checks if the file exists before attempting to load it.

In [None]:
input_filename = 'people_14_25_2023_2024_fullvars.csv'

# Stop here if the file is missing, because every later cell depends on `df`.
if not os.path.exists(input_filename):
    raise FileNotFoundError(
        f"The input file '{input_filename}' was not found. "
        "Please make sure the file is in the same folder as this notebook."
    )

print(f"Loading data from '{input_filename}'...")
# If the file is corrupted, pandas raises its own descriptive error here.
df = pd.read_csv(input_filename)
print("Data loaded successfully!")
# Display the first few rows for verification
display(df.head())
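
Because the following cells use renamed columns (`Year`, `Quarter`, `Age`, and the `*_label` columns) rather than the raw IBGE names, a quick check right after loading can catch a mismatched file early. This is a minimal sketch; `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names introduced here, and the toy frame stands in for `df`.

```python
import pandas as pd

# Column names used by the later cells of this notebook.
REQUIRED_COLUMNS = [
    "Year", "Quarter", "Age", "School_label",
    "Occupation_label", "Sex_label", "Weight_V1028",
]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `frame`."""
    return [col for col in required if col not in frame.columns]

# Toy frame lacking two of the expected columns.
toy = pd.DataFrame(columns=["Year", "Quarter", "Age", "School_label", "Sex_label"])
print(missing_columns(toy))
```

In the notebook, calling `missing_columns(df)` and stopping when the list is non-empty would give a clearer message than a `KeyError` several cells later.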

### 3. Filter Data for Analysis

Let's focus our analysis on the relevant data:
1.  **Period:** Q4 2024.
2.  **Age:** Young people between 15 and 24 years old.

In [None]:
print("Filtering data for Q4 2024 and age between 15-24 years...")

# Filter by the specific year and quarter
df_periodo = df[(df['Year'] == 2024) & (df['Quarter'] == 4)].copy()

# Filter by the target age group
df_idade = df_periodo[(df_periodo['Age'] >= 15) & (df_periodo['Age'] <= 24)].copy()

if df_idade.empty:
    print("\nError: No data found for young people aged 15-24 in Q4 2024.")
else:
    print(f"Filter applied. Found {len(df_idade)} observations of young people in the period.")
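
The same selection can also be written as a single boolean mask. The sketch below uses a made-up toy frame to show that `Series.between` keeps both endpoints (15 and 24) by default, which matches the `>=` / `<=` comparisons used above.

```python
import pandas as pd

# Toy rows: only the second and third satisfy all three conditions.
toy = pd.DataFrame({
    "Year": [2024, 2024, 2024, 2023, 2024],
    "Quarter": [4, 4, 4, 4, 3],
    "Age": [14, 15, 24, 20, 20],
})

# Series.between includes both endpoints by default.
mask = (toy["Year"] == 2024) & (toy["Quarter"] == 4) & toy["Age"].between(15, 24)
selected = toy[mask]
print(selected)
```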

### 4. Identify NEET Population and Calculate Totals

In this step, we apply the NEET definition to select individuals who are not attending school and are not employed (that is, either unemployed or out of the labor force). Then, we sum the sample weights (`Weight_V1028`) to expand the sample to population estimates and print the results.

In [None]:
print("Identifying the NEET population and calculating totals...")

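# Identify NEET individuals: not attending school and not employed.
# Note: `!=` also keeps rows whose Occupation_label is missing (NaN).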
df_neet = df_idade[
    (df_idade['School_label'] == 'No') &
    (df_idade['Occupation_label'] != 'Occupied')
].copy()

# Calculate totals using the sample weights ('Weight_V1028')
total_jovens = df_idade['Weight_V1028'].sum()
total_neet = df_neet['Weight_V1028'].sum()
homens_neet = df_neet[df_neet['Sex_label'] == 'Male']['Weight_V1028'].sum()
mulheres_neet = df_neet[df_neet['Sex_label'] == 'Female']['Weight_V1028'].sum()

# Calculate percentage
percentagem_neet = (total_neet / total_jovens) * 100 if total_jovens > 0 else 0

print("\n--- Results for Q4 2024 (15-24 years old) ---")
print(f"Total Youth Population: {total_jovens:,.0f}")
print(f"Total NEET Youth: {total_neet:,.0f} ({percentagem_neet:.1f}%)")
print(f"  - NEET Men: {homens_neet:,.0f}")
print(f"  - NEET Women: {mulheres_neet:,.0f}")
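
The totals above give NEET counts by gender; a weighted NEET rate within each gender follows from the same ingredients. The sketch below is one possible way to compute it, shown on made-up toy data. The helper `weighted_neet_rate_by_sex` and the label value `'Unoccupied'` are hypothetical; the column names are the ones used in this notebook.

```python
import pandas as pd

def weighted_neet_rate_by_sex(frame, weight_col="Weight_V1028"):
    """Weighted NEET share (in %) within each Sex_label group."""
    is_neet = (frame["School_label"] == "No") & (frame["Occupation_label"] != "Occupied")
    # Keep the weight for NEET rows and zero it out for everyone else.
    neet_weight = frame[weight_col].where(is_neet, 0.0)
    totals = frame.groupby("Sex_label")[weight_col].sum()
    neet = neet_weight.groupby(frame["Sex_label"]).sum()
    return (neet / totals * 100).rename("neet_rate_pct")

# Toy data: weights and labels are invented for illustration.
toy = pd.DataFrame({
    "Sex_label": ["Male", "Male", "Female", "Female"],
    "School_label": ["No", "Yes", "No", "No"],
    "Occupation_label": ["Occupied", "Unoccupied", "Unoccupied", "Occupied"],
    "Weight_V1028": [100.0, 300.0, 150.0, 50.0],
})
rates = weighted_neet_rate_by_sex(toy)
print(rates)
```

In the notebook this would be called as `weighted_neet_rate_by_sex(df_idade)`, so the denominator is all young people of each gender rather than only the NEET subset.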

### 5. Generate Chart

Finally, we create a bar chart to visualize the distribution of the NEET population, including the total and the breakdown by gender.

In [None]:
print("\nGenerating the chart...")

# Data for the chart
categories = ['Total NEET', 'Men', 'Women']
valores = [total_neet, homens_neet, mulheres_neet]
cores = ['#003f5c', '#58508d', '#bc5090']

fig, ax = plt.subplots(figsize=(10, 7))

# Create the bars
bars = ax.bar(categories, valores, color=cores)

# Add title and labels
ax.set_title('Analysis of the NEET Population (15-24 years) - Q4 2024', fontsize=16, pad=20)
ax.set_ylabel('Estimated Population', fontsize=12)
ax.tick_params(axis='x', labelsize=12)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.yaxis.grid(True, linestyle='--', alpha=0.6)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, pos: f"{x:,.0f}"))

# Add data labels on top of the bars
for bar in bars:
    yval = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2.0, yval + (max(valores) * 0.01), f'{yval:,.0f}', ha='center', va='bottom', fontsize=11)

# Add summary text in the band reserved at the top of the figure
summary_text = (
    f"Total Youth Population (15-24 years) in the period: {total_jovens:,.0f}\n"
    f"NEET Youth represent {percentagem_neet:.1f}% of the total"
)
fig.text(0.5, 0.9, summary_text, ha='center', fontsize=12, style='italic', color='gray')

plt.tight_layout(rect=[0, 0, 1, 0.9]) # Adjust layout to make space for text
plt.show()