<a href="https://colab.research.google.com/github/Pedro1Guevara/inegi_enoe_2024/blob/main/proyectoinegi.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Initial Setup and Data Loading

## Description
This section establishes the foundation for analysis:
- Installs required libraries (pandas, numpy, plotly)
- Defines state code to name mapping
- Sets up initial data loading and cleaning

## Main Functions
- `get_state_name`: Converts state numeric codes to names
- `load_and_process_enoe_data`: Loads and processes ENOE data from GitHub
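  - Coerces non-numeric answers to 0 and drops dwellings with no inhabitants
  - Copies the quarterly weight column `fac_tri` into `peso_trimestral` for use as the sample weight
  - Adds the state name and prepares the household variables for analysis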

## Processed Data
- `habitantes` (inhabitants): Number of people per dwelling, from `p1`
- `comparten_gasto` (shared expenses): Shared expenses indicator, from `p2`
- `num_hogares` (household count): Number of households in the dwelling, from `p3`
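
The cleaning step can be illustrated on a small made-up frame. This is a minimal sketch, not the notebook's loader: the values in `raw` are invented for illustration, and only the coercion and filtering pattern mirrors what `load_and_process_enoe_data` does.

```python
import pandas as pd

# Toy stand-in for the raw survey columns (values are made up for illustration)
raw = pd.DataFrame({
    "p1": ["4", "2", " ", "abc", "0"],   # inhabitants, with a blank and a junk entry
    "p2": ["1", "2", "1", "1", "1"],     # the notebook treats 1 as "shares expenses"
})

clean = raw.copy()
# Entries that cannot be parsed become NaN and are then replaced by 0
clean["habitantes"] = pd.to_numeric(clean["p1"], errors="coerce").fillna(0)
clean["comparten_gasto"] = pd.to_numeric(clean["p2"], errors="coerce").fillna(0)

# Rows with zero inhabitants are treated as invalid and dropped
clean = clean[clean["habitantes"] > 0]

print(clean[["habitantes", "comparten_gasto"]])
```

Only the first two rows survive: the blank, the junk string, and the literal zero all end up as 0 inhabitants and are filtered out.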

In [19]:
# Install required packages
!pip install pandas numpy plotly

import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

def get_state_name(code):
    """Map state codes to names"""
    estados = {
        1: 'Aguascalientes', 2: 'Baja California', 3: 'Baja California Sur',
        4: 'Campeche', 5: 'Coahuila', 6: 'Colima', 7: 'Chiapas', 8: 'Chihuahua',
        9: 'Ciudad de México', 10: 'Durango', 11: 'Guanajuato', 12: 'Guerrero',
        13: 'Hidalgo', 14: 'Jalisco', 15: 'México', 16: 'Michoacán',
        17: 'Morelos', 18: 'Nayarit', 19: 'Nuevo León', 20: 'Oaxaca',
        21: 'Puebla', 22: 'Querétaro', 23: 'Quintana Roo', 24: 'San Luis Potosí',
        25: 'Sinaloa', 26: 'Sonora', 27: 'Tabasco', 28: 'Tamaulipas',
        29: 'Tlaxcala', 30: 'Veracruz', 31: 'Yucatán', 32: 'Zacatecas'
    }
    return estados.get(code, 'No especificado')

def load_and_process_enoe_data():
    """Load and process ENOE survey data"""
    # Load data from GitHub
    df = pd.read_csv('https://raw.githubusercontent.com/Pedro1Guevara/inegi_enoe_2024/master/conjunto_de_datos_viv_enoe_2024_3t.csv')
    # Data dictionary, loaded for reference only (not used further in this notebook)
    dict_df = pd.read_csv('https://raw.githubusercontent.com/Pedro1Guevara/inegi_enoe_2024/master/diccionario_datos_viv_enoe_2024_3t.csv')

    # Add state names
    df['estado'] = df['ent'].map(get_state_name)

    # Process household questions
    df['habitantes'] = pd.to_numeric(df['p1'], errors='coerce').fillna(0)
    df['comparten_gasto'] = pd.to_numeric(df['p2'], errors='coerce').fillna(0)
    df['num_hogares'] = pd.to_numeric(df['p3'], errors='coerce').fillna(0)
    df['peso_trimestral'] = df['fac_tri']

    # Remove invalid values
    df = df[df['habitantes'] > 0]

    return df



# Analysis and Visualization Functions

## Description
This section contains the core statistical analysis and visualization functions.

## Main Functions
- `analyze_state_metrics`:
  - Calculates aggregated metrics by state
  - Uses sample weights for the inhabitant and household averages; the shared-expenses percentage is an unweighted share of sampled dwellings
  - Generates comparable indicators across states

- `create_comparison_charts`:
  - Creates interactive visualizations
  - Compares selected state with others
  - Shows distributions of inhabitants and shared expenses

## Visualizations
- Upper plot: Inhabitants per dwelling
- Lower plot: Percentage of households sharing expenses
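
The weighted aggregation can be sketched on toy data. The frame below is invented, and `weighted_metrics` and `comparten_gasto_pct` are hypothetical names used only for this illustration. Note that this sketch also weights the shared-expenses share, which is a variation on the notebook's `analyze_state_metrics` (that function takes a plain, unweighted mean for that column).

```python
import numpy as np
import pandas as pd

# Made-up records: two states, with very different weights inside state "A"
toy = pd.DataFrame({
    "estado": ["A", "A", "B", "B"],
    "habitantes": [2, 4, 3, 5],
    "comparten_gasto": [1, 2, 1, 1],
    "peso_trimestral": [100, 300, 50, 50],
})

def weighted_metrics(group):
    """Weighted mean of inhabitants and weighted share answering 1 (hypothetical helper)."""
    w = group["peso_trimestral"]
    shares = (group["comparten_gasto"] == 1).astype(float)
    return pd.Series({
        "habitantes": np.average(group["habitantes"], weights=w),
        "comparten_gasto_pct": np.average(shares, weights=w) * 100,
    })

# One row per state; built with a dict comprehension to keep the grouping explicit
result = pd.DataFrame(
    {estado: weighted_metrics(g) for estado, g in toy.groupby("estado")}
).T

print(result)
```

For state `A` the weighted mean is 3.5 inhabitants (an unweighted mean would give 3.0), which shows how strongly the weights can shift an estimate.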

In [20]:
def analyze_state_metrics(df):
    """Calculate comprehensive state metrics"""
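    # Weighted means use the quarterly weight of the rows in each state group
    metrics = df.groupby('estado').agg({
        'habitantes': lambda x: np.average(x, weights=df.loc[x.index, 'peso_trimestral']),
        'num_hogares': lambda x: np.average(x, weights=df.loc[x.index, 'peso_trimestral']),
        # Unweighted share of sampled dwellings answering 1 (shares expenses)
        'comparten_gasto': lambda x: (x == 1).mean() * 100,
        'peso_trimestral': 'sum',   # sum of weights per state
        'v_sel': 'count'            # number of records with a non-null v_sel
    }).round(2)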

    return metrics

def create_comparison_charts(metrics, selected_state):
    """Create comparative bar charts for selected state"""
    # Prepare data for comparison
    comp_data = metrics.copy()
    comp_data['color'] = 'Otros Estados'
    comp_data.loc[selected_state, 'color'] = 'Estado Seleccionado'

    # Create figure with subplots
    fig = make_subplots(
        rows=2, cols=1,
        subplot_titles=(
            'Habitantes por Vivienda',
            'Porcentaje que Comparte Gastos'
        ),
        vertical_spacing=0.15
    )

    # Add habitantes bar chart
    fig.add_trace(
        go.Bar(
            x=comp_data.index,
            y=comp_data['habitantes'],
            marker_color=['red' if x == selected_state else 'lightgray' for x in comp_data.index],
            name='Habitantes',
            text=comp_data['habitantes'].round(2),
            textposition='auto',
        ),
        row=1, col=1
    )

    # Add gastos bar chart
    fig.add_trace(
        go.Bar(
            x=comp_data.index,
            y=comp_data['comparten_gasto'],
            marker_color=['red' if x == selected_state else 'lightgray' for x in comp_data.index],
            name='% Comparten Gastos',
            text=comp_data['comparten_gasto'].round(1),
            textposition='auto',
        ),
        row=2, col=1
    )

    # Update layout
    fig.update_layout(
        height=800,
        showlegend=False,
        title_text=f"Comparación de {selected_state} con otros Estados"
    )

    # Update axes
    fig.update_xaxes(tickangle=45)
    fig.update_yaxes(title_text="Habitantes", row=1, col=1)
    fig.update_yaxes(title_text="Porcentaje", row=2, col=1)

    return fig

# Insights Generation and Comparative Analysis

## Description
This section generates detailed comparative analyses for the selected state.

## Analysis Features
- National rankings of selected state
- Comparisons with national averages, computed as the unweighted mean of the state-level values
- Identification of similar states

## Analyzed Metrics
- Ranking position in inhabitants per dwelling
- Ranking position in shared expenses
- Differences from national averages (percent for inhabitants, percentage points for shared expenses)
- States with similar characteristics
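
The ranking and similarity logic can be sketched on a small made-up series. The state labels and values below are invented; the calls (`rank`, `mean`, `nsmallest`) are the same pandas methods that `print_state_insights` relies on.

```python
import pandas as pd

# Made-up inhabitants-per-dwelling values for four hypothetical states
habitantes = pd.Series({"A": 3.9, "B": 3.2, "C": 3.5, "D": 3.25}, name="habitantes")
selected = "B"

# Rank 1 is the state with the highest value
rank = habitantes.rank(ascending=False)

# Percent difference from the simple mean of the state values
mean_of_states = habitantes.mean()
pct_diff = (habitantes[selected] - mean_of_states) / mean_of_states * 100

# Closest states by absolute gap, excluding the selected state itself
gaps = (habitantes - habitantes[selected]).abs().drop(selected)
closest = gaps.nsmallest(2)

print(f"Rank of {selected}: {rank[selected]:.0f}")
print(f"Difference from mean: {pct_diff:.1f}%")
print(closest)
```

With its default settings, `rank` assigns tied values the average of their positions, so states that share the same rounded value also share a rank. Passing `method="min"` to `rank` is one way to give tied states the best position of the group instead.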

In [21]:
def print_state_insights(metrics, selected_state):
    """Print insights for selected state"""
    print(f"\n=== ANÁLISIS COMPARATIVO: {selected_state.upper()} ===\n")

    # Get state metrics
    state_metrics = metrics.loc[selected_state]

    # Calculate rankings
    hab_rank = metrics['habitantes'].rank(ascending=False)
    gastos_rank = metrics['comparten_gasto'].rank(ascending=False)

    print("POSICIÓN EN RANKINGS NACIONALES:")
    print(f"- Habitantes por vivienda: {hab_rank[selected_state]:.0f}° lugar")
    print(f"- Porcentaje que comparte gastos: {gastos_rank[selected_state]:.0f}° lugar")

    # Compare with national averages
    print("\nCOMPARACIÓN CON PROMEDIOS NACIONALES:")

    hab_diff = (state_metrics['habitantes'] - metrics['habitantes'].mean()) / metrics['habitantes'].mean() * 100
    print(f"- Habitantes por vivienda: {state_metrics['habitantes']:.2f}")
    print(f"  {abs(hab_diff):.1f}% {'por encima' if hab_diff > 0 else 'por debajo'} del promedio nacional")

    gastos_diff = state_metrics['comparten_gasto'] - metrics['comparten_gasto'].mean()
    print(f"- Porcentaje que comparte gastos: {state_metrics['comparten_gasto']:.1f}%")
    print(f"  {abs(gastos_diff):.1f} puntos porcentuales {'por encima' if gastos_diff > 0 else 'por debajo'} "
          "del promedio nacional")

    # Find similar states
    hab_diff = abs(metrics['habitantes'] - state_metrics['habitantes'])
    similar_hab = hab_diff[hab_diff.index != selected_state].nsmallest(3)

    print("\nESTADOS MÁS SIMILARES EN HABITANTES POR VIVIENDA:")
    for estado, diff in similar_hab.items():
        print(f"- {estado}: {metrics.loc[estado, 'habitantes']:.2f} habitantes por vivienda "
              f"(diferencia de {diff:.2f})")

# Analysis Execution

## Description
The main cell runs the complete analysis from data loading to printed insights.

## Process
1. ENOE data loading
2. State metrics calculation
3. State selection interface
4. Visualization generation
5. Insights presentation

## Results
- Comparative interactive visualizations
- Detailed statistical analysis
- State-specific insights
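
The state selection step can be separated from `input()` so that it is easy to test or to drive without a prompt. This is a minimal sketch under that assumption: `pick_state` is a hypothetical helper, not a function from the notebook, and the three state names are only sample options.

```python
def pick_state(raw_text, options):
    """Return the option for a 1-based menu number, or None if the text is invalid."""
    try:
        number = int(raw_text)
    except ValueError:
        return None          # not a number at all
    if 1 <= number <= len(options):
        return options[number - 1]
    return None              # number outside the menu range

# Sample menu, sorted the same way the notebook sorts its state list
estados = sorted(["Sonora", "Colima", "Jalisco"])

print(pick_state("2", estados))    # Jalisco
print(pick_state("9", estados))    # None
print(pick_state("dos", estados))  # None
```

An interactive loop can then keep calling `pick_state(input(...), estados_sorted)` until it returns something other than `None`.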

In [22]:
# Load and process the data
print("Cargando datos de ENOE...")
df = load_and_process_enoe_data()

# Calculate metrics
metrics = analyze_state_metrics(df)

# Show available states
estados_sorted = sorted(metrics.index)
print("\nSeleccione un estado para análisis detallado:")
for i, estado in enumerate(estados_sorted, 1):
    print(f"{i:2d}. {estado}")

# Get user input; the valid range follows the number of states actually present
n_estados = len(estados_sorted)
while True:
    try:
        selection = int(input(f"\nIngrese el número del estado (1-{n_estados}): "))
        if 1 <= selection <= n_estados:
            selected_state = estados_sorted[selection - 1]
            break
        else:
            print(f"Número fuera de rango. Por favor, ingrese un número entre 1 y {n_estados}.")
    except ValueError:
        print("Por favor, ingrese un número válido.")

# Create and show the comparison charts
comp_fig = create_comparison_charts(metrics, selected_state)
comp_fig.show()

# Show insights
print_state_insights(metrics, selected_state)

Cargando datos de ENOE...

Seleccione un estado para análisis detallado:
 1. Aguascalientes
 2. Baja California
 3. Baja California Sur
 4. Campeche
 5. Chiapas
 6. Chihuahua
 7. Ciudad de México
 8. Coahuila
 9. Colima
10. Durango
11. Guanajuato
12. Guerrero
13. Hidalgo
14. Jalisco
15. Michoacán
16. Morelos
17. México
18. Nayarit
19. Nuevo León
20. Oaxaca
21. Puebla
22. Querétaro
23. Quintana Roo
24. San Luis Potosí
25. Sinaloa
26. Sonora
27. Tabasco
28. Tamaulipas
29. Tlaxcala
30. Veracruz
31. Yucatán
32. Zacatecas

Ingrese el número del estado (1-32): 2



=== ANÁLISIS COMPARATIVO: BAJA CALIFORNIA ===

POSICIÓN EN RANKINGS NACIONALES:
- Habitantes por vivienda: 27° lugar
- Porcentaje que comparte gastos: 25° lugar

COMPARACIÓN CON PROMEDIOS NACIONALES:
- Habitantes por vivienda: 3.20
  4.9% por debajo del promedio nacional
- Porcentaje que comparte gastos: 84.9%
  1.4 puntos porcentuales por debajo del promedio nacional

ESTADOS MÁS SIMILARES EN HABITANTES POR VIVIENDA:
- Quintana Roo: 3.20 habitantes por vivienda (diferencia de 0.00)
- Tamaulipas: 3.20 habitantes por vivienda (diferencia de 0.00)
- Chihuahua: 3.22 habitantes por vivienda (diferencia de 0.02)
