<a href="https://colab.research.google.com/github/bellDataSc/Projeto-ETL-com-Python-e-Google-BigQuery/blob/main/brazilian_economic_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Brazilian Economic Data Analysis

**Advanced Economic Indicators Analysis from IBGE SIDRA API**

This notebook analyzes state-level economic indicators for Brazil. It covers:
- GDP per capita by state
- Correlations between population and economic indicators
- Regional economic comparisons
- K-means clustering and a composite performance ranking of states
- Interactive Plotly charts

**Author:** Isabel Cruz

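**Data Source:** IBGE Statistical Services API (state list); population figures are approximate, and GDP per capita, HDI, and economic diversity values are synthetic demonstration data  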
**Last Updated:** August 2025

## Environment Setup

In [1]:

!pip install requests pandas numpy matplotlib seaborn plotly -q
!pip install scipy scikit-learn statsmodels -q
!pip install folium geopandas -q

In [3]:
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import json
from datetime import datetime, timedelta
import warnings
from scipy import stats
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import folium
warnings.filterwarnings('ignore')


plt.style.use('default')
sns.set_palette('Set2')
%matplotlib inline

print(" Environment setup complete")
print(f" Analysis date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

 Environment setup complete
 Analysis date: 2025-08-26 12:41:02


## Economic Data Collection

In [4]:
def get_states_with_population_estimates():
    """Get Brazilian states with estimated population data"""
    try:

        states_url = "https://servicodados.ibge.gov.br/api/v1/localidades/estados"
        response = requests.get(states_url, timeout=30)
        response.raise_for_status()
        states_data = response.json()

        states_df = pd.json_normalize(states_data)
        states_df = states_df.rename(columns={
            'nome': 'state_name',
            'sigla': 'state_code',
            'regiao.nome': 'region_name',
            'regiao.id': 'region_id'
        })

        # Add synthetic economic data for demonstration
        # (In a real scenario, this would come from IBGE economic APIs)
        np.random.seed(42)

        # Population estimates (approximated from real data)
        population_data = {
            'SP': 45_975_000, 'MG': 21_411_000, 'RJ': 17_366_000, 'BA': 14_985_000,
            'PR': 11_597_000, 'RS': 11_466_000, 'PE': 9_674_000, 'CE': 9_240_000,
            'PA': 8_777_000, 'SC': 7_338_000, 'PB': 4_059_000, 'GO': 7_206_000,
            'MA': 7_153_000, 'ES': 4_108_000, 'AL': 3_365_000, 'MT': 3_567_000,
            'MS': 2_841_000, 'DF': 3_094_000, 'PI': 3_289_000, 'RN': 3_560_000,
            'RO': 1_815_000, 'SE': 2_338_000, 'AM': 4_269_000, 'TO': 1_607_000,
            'AC': 906_000, 'AP': 877_000, 'RR': 652_000
        }

        states_df['population_2024'] = states_df['state_code'].map(population_data)

        # GDP per capita (synthetic data based on real patterns)
        gdp_base = {
            'DF': 85000, 'SP': 56000, 'RJ': 51000, 'SC': 42000, 'RS': 41000,
            'PR': 40000, 'ES': 39000, 'GO': 35000, 'MG': 34000, 'MS': 33000,
            'MT': 32000, 'AM': 30000, 'CE': 26000, 'PE': 25000, 'BA': 24000,
            'RN': 23000, 'PA': 22000, 'SE': 21000, 'RO': 20000, 'TO': 19000,
            'PB': 18000, 'AL': 17000, 'PI': 16000, 'MA': 15000, 'AC': 14000,
            'AP': 13000, 'RR': 12000
        }

        states_df['gdp_per_capita_2023'] = states_df['state_code'].map(gdp_base)

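        # Human Development Index (synthetic: random normal draws clipped to [0.6, 0.9],
        # independent of the GDP values above, not real HDI figures)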
        hdi_data = np.random.normal(0.75, 0.08, len(states_df))
        hdi_data = np.clip(hdi_data, 0.6, 0.9)
        states_df['hdi_2021'] = hdi_data

        # Economic diversity index (synthetic)
        states_df['economic_diversity'] = np.random.uniform(0.3, 0.9, len(states_df))

        print(f" Collected data for {len(states_df)} Brazilian states")
        return states_df

    except Exception as e:
        print(f" Error collecting states data: {e}")
        return None

# Collect data
economic_df = get_states_with_population_estimates()
if economic_df is not None:
    display(economic_df.head())

 Collected data for 27 Brazilian states


|   | id | state_code | state_name | region_id | regiao.sigla | region_name | population_2024 | gdp_per_capita_2023 | hdi_2021 | economic_diversity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11 | RO | Rondônia | 1 | N | Norte | 1815000 | 20000 | 0.789737 | 0.339031 |
| 1 | 12 | AC | Acre | 1 | N | Norte | 906000 | 14000 | 0.738939 | 0.869331 |
| 2 | 13 | AM | Amazonas | 1 | N | Norte | 4269000 | 30000 | 0.801815 | 0.879379 |
| 3 | 14 | RR | Roraima | 1 | N | Norte | 652000 | 12000 | 0.871842 | 0.785038 |
| 4 | 15 | PA | Pará | 1 | N | Norte | 8777000 | 22000 | 0.731268 | 0.482768 |
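
The hard-coded dictionaries are joined to the API result with `Series.map`, which silently yields `NaN` for any state code missing from a dictionary. A minimal sketch of a coverage check, assuming a hypothetical helper named `find_unmapped_codes` (not part of the notebook):

```python
import pandas as pd

def find_unmapped_codes(codes: pd.Series, lookup: dict) -> list:
    """Return the codes in `codes` that have no entry in `lookup`, sorted."""
    mapped = codes.map(lookup)
    return sorted(codes[mapped.isna()].unique())

# Toy example: 'XX' has no entry in the lookup table
codes = pd.Series(["SP", "RJ", "XX"])
lookup = {"SP": 45_975_000, "RJ": 17_366_000}
missing = find_unmapped_codes(codes, lookup)
print(missing)
```

Running a check like this on `states_df['state_code']` for each dictionary before the analysis would surface a typo in a state code early instead of as a `NaN` in a later chart.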


## Exploratory Economic Analysis

In [5]:

if economic_df is not None:
    print(" BRAZILIAN ECONOMIC OVERVIEW")
    print("=" * 45)

    total_population = economic_df['population_2024'].sum()
    avg_gdp_per_capita = economic_df['gdp_per_capita_2023'].mean()
    avg_hdi = economic_df['hdi_2021'].mean()

    print(f"🇧🇷 Total Population (2024): {total_population:,.0f} people")
    print(f" Average GDP per Capita (2023): R$ {avg_gdp_per_capita:,.0f}")
    print(f" Average HDI (2021): {avg_hdi:.3f}")

    # Top 5 states by different metrics
    print("\n TOP PERFORMING STATES:")
    print("-" * 30)

    print("Population Leaders:")
    top_pop = economic_df.nlargest(5, 'population_2024')[['state_name', 'population_2024']]
    for idx, row in top_pop.iterrows():
        print(f"  {row['state_name']}: {row['population_2024']:,.0f}")

    print("\nGDP per Capita Leaders:")
    top_gdp = economic_df.nlargest(5, 'gdp_per_capita_2023')[['state_name', 'gdp_per_capita_2023']]
    for idx, row in top_gdp.iterrows():
        print(f"  {row['state_name']}: R$ {row['gdp_per_capita_2023']:,.0f}")

 BRAZILIAN ECONOMIC OVERVIEW
🇧🇷 Total Population (2024): 212,535,000 people
 Average GDP per Capita (2023): R$ 29,741
 Average HDI (2021): 0.735

 TOP PERFORMING STATES:
------------------------------
Population Leaders:
  São Paulo: 45,975,000
  Minas Gerais: 21,411,000
  Rio de Janeiro: 17,366,000
  Bahia: 14,985,000
  Paraná: 11,597,000

GDP per Capita Leaders:
  Distrito Federal: R$ 85,000
  São Paulo: R$ 56,000
  Rio de Janeiro: R$ 51,000
  Santa Catarina: R$ 42,000
  Rio Grande do Sul: R$ 41,000
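
The "Average GDP per Capita" printed above is an unweighted mean across the 27 states, so Roraima counts as much as São Paulo. A population-weighted mean is closer to a national figure. A small sketch using three states' values from the dictionaries above (the function name `weighted_gdp_per_capita` is illustrative):

```python
import numpy as np
import pandas as pd

def weighted_gdp_per_capita(df: pd.DataFrame) -> float:
    """Population-weighted mean of state GDP per capita."""
    return float(np.average(df["gdp_per_capita_2023"], weights=df["population_2024"]))

sample = pd.DataFrame({
    "state_code": ["SP", "DF", "AC"],
    "population_2024": [45_975_000, 3_094_000, 906_000],
    "gdp_per_capita_2023": [56_000, 85_000, 14_000],
})

simple_mean = sample["gdp_per_capita_2023"].mean()
weighted_mean = weighted_gdp_per_capita(sample)
print(f"Unweighted: R$ {simple_mean:,.0f}")
print(f"Weighted:   R$ {weighted_mean:,.0f}")
```

In this three-state sample the weighted figure is pulled toward São Paulo, which holds most of the sample's population.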


## Advanced Economic Visualizations

In [6]:
# GDP vs Population Analysis
if economic_df is not None:

    fig = px.scatter(
        economic_df,
        x='population_2024',
        y='gdp_per_capita_2023',
        size='hdi_2021',
        color='region_name',
        hover_name='state_name',
        hover_data={'population_2024': ':,.0f', 'gdp_per_capita_2023': ':,.0f'},
        title=' Economic Performance: GDP per Capita vs Population by Region',
        labels={
            'population_2024': 'Population (2024)',
            'gdp_per_capita_2023': 'GDP per Capita (R$)',
            'region_name': 'Region',
            'hdi_2021': 'HDI (2021)'
        },
        size_max=30
    )

    fig.update_layout(
        height=600,
        showlegend=True,
        xaxis_title='Population',
        yaxis_title='GDP per Capita (R$)'
    )

    # Format x-axis ticks with SI prefixes (e.g. 10M) and y-axis with thousands separators
    fig.update_xaxes(tickformat='.2s')
    fig.update_yaxes(tickformat=',.0f')

    fig.show()

In [7]:
# Regional Economic Comparison
if economic_df is not None:
    # Aggregate by region
    regional_stats = economic_df.groupby('region_name').agg({
        'population_2024': 'sum',
        'gdp_per_capita_2023': 'mean',
        'hdi_2021': 'mean',
        'economic_diversity': 'mean',
        'state_name': 'count'
    }).round(2)

    regional_stats.columns = ['Total_Population', 'Avg_GDP_per_Capita', 'Avg_HDI', 'Avg_Economic_Diversity', 'Number_of_States']
    regional_stats = regional_stats.reset_index()

    # Create subplots
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Population by Region', 'Average GDP per Capita', 'Human Development Index', 'Economic Diversity'),
        specs=[[{'type': 'bar'}, {'type': 'bar'}],
               [{'type': 'bar'}, {'type': 'bar'}]]
    )

    # Population
    fig.add_trace(
        go.Bar(x=regional_stats['region_name'], y=regional_stats['Total_Population'],
               name='Population', marker_color='lightblue'),
        row=1, col=1
    )

    # GDP per Capita
    fig.add_trace(
        go.Bar(x=regional_stats['region_name'], y=regional_stats['Avg_GDP_per_Capita'],
               name='GDP per Capita', marker_color='lightgreen'),
        row=1, col=2
    )

    # HDI
    fig.add_trace(
        go.Bar(x=regional_stats['region_name'], y=regional_stats['Avg_HDI'],
               name='HDI', marker_color='orange'),
        row=2, col=1
    )

    # Economic Diversity
    fig.add_trace(
        go.Bar(x=regional_stats['region_name'], y=regional_stats['Avg_Economic_Diversity'],
               name='Economic Diversity', marker_color='salmon'),
        row=2, col=2
    )

    fig.update_layout(
        title_text=' Regional Economic Performance Comparison',
        showlegend=False,
        height=700
    )

    fig.show()
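
The regional aggregation renames its columns by position, which breaks silently if the order of the `agg` dictionary changes. pandas named aggregation sets each output name next to its source column. A sketch on a toy frame with the same column names:

```python
import pandas as pd

toy = pd.DataFrame({
    "region_name": ["Norte", "Norte", "Sul"],
    "state_name": ["Acre", "Pará", "Paraná"],
    "population_2024": [906_000, 8_777_000, 11_597_000],
    "gdp_per_capita_2023": [14_000, 22_000, 40_000],
})

# Each keyword is the output column; the tuple is (source column, aggregation)
regional = (
    toy.groupby("region_name")
    .agg(
        Total_Population=("population_2024", "sum"),
        Avg_GDP_per_Capita=("gdp_per_capita_2023", "mean"),
        Number_of_States=("state_name", "count"),
    )
    .reset_index()
)
print(regional)
```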

In [9]:

if economic_df is not None:

    numeric_cols = ['population_2024', 'gdp_per_capita_2023', 'hdi_2021', 'economic_diversity']
    corr_data = economic_df[numeric_cols].corr()

    # Create correlation heatmap
    fig = px.imshow(
        corr_data,
        text_auto=True,
        aspect="auto",
        title=" Economic Indicators Correlation Matrix",
        color_continuous_scale='RdBu_r',
        range_color=[-1, 1]
    )

    fig.update_layout(
        height=500,
        xaxis_title="Indicators",
        yaxis_title="Indicators"
    )

    fig.show()


    print("\n CORRELATION INSIGHTS:")
    print("-" * 35)

    # Find strongest correlations
    corr_pairs = []
    for i in range(len(corr_data.columns)):
        for j in range(i+1, len(corr_data.columns)):
            corr_pairs.append({
                'pair': f"{corr_data.columns[i]} vs {corr_data.columns[j]}",
                'correlation': corr_data.iloc[i, j]
            })

    corr_df = pd.DataFrame(corr_pairs).sort_values('correlation', key=abs, ascending=False)

    for idx, row in corr_df.head(3).iterrows():
        corr_strength = "Strong" if abs(row['correlation']) > 0.7 else "Moderate" if abs(row['correlation']) > 0.4 else "Weak"
        print(f"• {row['pair']}: {row['correlation']:.3f} ({corr_strength})")


 CORRELATION INSIGHTS:
-----------------------------------
• population_2024 vs gdp_per_capita_2023: 0.432 (Moderate)
• gdp_per_capita_2023 vs hdi_2021: -0.285 (Weak)
• population_2024 vs hdi_2021: -0.265 (Weak)
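
With only 27 states, a correlation labeled "Moderate" may or may not be distinguishable from zero. A rough check, assuming the usual t-test for a Pearson coefficient, `t = r * sqrt((n - 2) / (1 - r^2))`, and a two-sided 5% critical value of about 2.06 for 25 degrees of freedom (taken from a standard t table, not computed here):

```python
import math

def pearson_t_stat(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, given a sample Pearson r and sample size n."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

N_STATES = 27
T_CRIT_5PCT_DF25 = 2.06  # assumption: two-sided 5% critical value for df = 25

results = {}
for label, r in [("population vs GDP per capita", 0.432),
                 ("GDP per capita vs HDI", -0.285)]:
    t = pearson_t_stat(r, N_STATES)
    significant = abs(t) > T_CRIT_5PCT_DF25
    results[label] = (t, significant)
    print(f"{label}: t = {t:.2f}, significant at 5%: {significant}")
```

Under these assumptions only the population vs GDP per capita correlation clears the 5% threshold; the GDP per capita vs HDI value does not.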


## Economic Clustering Analysis

In [10]:

if economic_df is not None:

    cluster_features = ['gdp_per_capita_2023', 'hdi_2021', 'economic_diversity']
    cluster_data = economic_df[cluster_features].copy()

    # Standardize features
    scaler = StandardScaler()
    cluster_data_scaled = scaler.fit_transform(cluster_data)

    # Perform K-means clustering
    n_clusters = 4
    kmeans = KMeans(n_clusters=n_clusters, random_state=42)
    clusters = kmeans.fit_predict(cluster_data_scaled)

    economic_df['economic_cluster'] = clusters


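    # NOTE: K-means cluster ids are arbitrary, so these names are not ordered by
    # economic performance; compare the cluster averages printed below before
    # reading meaning into a label.
    cluster_labels = {
        0: 'Developing',
        1: 'Emerging',
        2: 'Advanced',
        3: 'High-Performance'
    }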

    economic_df['cluster_name'] = economic_df['economic_cluster'].map(cluster_labels)

    fig = px.scatter_3d(
        economic_df,
        x='gdp_per_capita_2023',
        y='hdi_2021',
        z='economic_diversity',
        color='cluster_name',
        hover_name='state_name',
        title=' Brazilian States Economic Clustering Analysis',
        labels={
            'gdp_per_capita_2023': 'GDP per Capita (R$)',
            'hdi_2021': 'Human Development Index',
            'economic_diversity': 'Economic Diversity Index',
            'cluster_name': 'Economic Cluster'
        }
    )

    fig.update_layout(height=600)
    fig.show()


    print("\n ECONOMIC CLUSTER ANALYSIS:")
    print("=" * 40)

    cluster_summary = economic_df.groupby('cluster_name').agg({
        'state_name': 'count',
        'gdp_per_capita_2023': 'mean',
        'hdi_2021': 'mean',
        'economic_diversity': 'mean',
        'population_2024': 'mean'
    }).round(2)

    for cluster, data in cluster_summary.iterrows():
        print(f"\n {cluster} States ({int(data['state_name'])} states):")
        print(f"   • Avg GDP per Capita: R$ {data['gdp_per_capita_2023']:,.0f}")
        print(f"   • Avg HDI: {data['hdi_2021']:.3f}")
        print(f"   • Avg Economic Diversity: {data['economic_diversity']:.3f}")


        states_in_cluster = economic_df[economic_df['cluster_name'] == cluster]['state_name'].tolist()
        print(f"   • States: {', '.join(states_in_cluster)}")


 ECONOMIC CLUSTER ANALYSIS:

 Advanced States (6 states):
   • Avg GDP per Capita: R$ 23,667
   • Avg HDI: 0.840
   • Avg Economic Diversity: 0.730
   • States: Amazonas, Roraima, Tocantins, Maranhão, Ceará, Paraná

 Developing States (11 states):
   • Avg GDP per Capita: R$ 24,545
   • Avg HDI: 0.730
   • Avg Economic Diversity: 0.420
   • States: Rondônia, Pará, Amapá, Piauí, Rio Grande do Norte, Pernambuco, Sergipe, Bahia, Espírito Santo, Mato Grosso, Goiás

 Emerging States (9 states):
   • Avg GDP per Capita: R$ 34,000
   • Avg HDI: 0.680
   • Avg Economic Diversity: 0.780
   • States: Acre, Paraíba, Alagoas, Minas Gerais, Rio de Janeiro, São Paulo, Santa Catarina, Rio Grande do Sul, Mato Grosso do Sul

 High-Performance States (1 states):
   • Avg GDP per Capita: R$ 85,000
   • Avg HDI: 0.660
   • Avg Economic Diversity: 0.330
   • States: Distrito Federal
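
In the output above, the "Advanced" cluster has a lower average GDP per capita than "Developing", because K-means ids carry no order and the names are mapped to ids directly. One way to make the names meaningful is to rank clusters by a chosen metric before labeling. A sketch, assuming GDP per capita as the ranking metric and a hypothetical helper `label_clusters_by_mean`:

```python
import pandas as pd

def label_clusters_by_mean(
    df: pd.DataFrame,
    cluster_col: str = "economic_cluster",
    value_col: str = "gdp_per_capita_2023",
    ordered_labels=("Developing", "Emerging", "Advanced", "High-Performance"),
) -> pd.Series:
    """Name clusters from lowest to highest mean of `value_col`."""
    means = df.groupby(cluster_col)[value_col].mean().sort_values()
    if len(means) != len(ordered_labels):
        raise ValueError("number of clusters and labels must match")
    mapping = dict(zip(means.index, ordered_labels))
    return df[cluster_col].map(mapping)

# Toy data: cluster id 0 happens to hold the richest states
toy = pd.DataFrame({
    "economic_cluster": [0, 0, 1, 2, 3],
    "gdp_per_capita_2023": [85_000, 80_000, 14_000, 30_000, 50_000],
})
toy["cluster_name"] = label_clusters_by_mean(toy)
print(toy)
```

Ranking on a single metric is a simplification; a cluster can lead on GDP per capita and trail on HDI, so the chosen metric should be stated next to the labels.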


## Economic Performance Ranking

In [11]:

if economic_df is not None:
    # Min-max normalize each metric to a 0-100 scale
    ranking_df = economic_df.copy()

    metrics = ['gdp_per_capita_2023', 'hdi_2021', 'economic_diversity']

    for metric in metrics:
        min_val = ranking_df[metric].min()
        max_val = ranking_df[metric].max()
        ranking_df[f'{metric}_normalized'] = ((ranking_df[metric] - min_val) / (max_val - min_val)) * 100

    # Calculate composite economic score
    ranking_df['economic_score'] = (
        ranking_df['gdp_per_capita_2023_normalized'] * 0.4 +
        ranking_df['hdi_2021_normalized'] * 0.4 +
        ranking_df['economic_diversity_normalized'] * 0.2
    )

    # Sort by economic score
    ranking_df = ranking_df.sort_values('economic_score', ascending=False).reset_index(drop=True)
    ranking_df['rank'] = range(1, len(ranking_df) + 1)


    top_15 = ranking_df.head(15)

    fig = px.bar(
        top_15,
        x='economic_score',
        y='state_name',
        orientation='h',
        color='economic_score',
        color_continuous_scale='Viridis',
        title=' Top 15 Brazilian States - Economic Performance Ranking',
        labels={'economic_score': 'Economic Performance Score', 'state_name': 'State'}
    )

    fig.update_layout(
        height=600,
        yaxis={'categoryorder': 'total ascending'},
        showlegend=False
    )

    fig.show()


    print("\n TOP 10 ECONOMIC PERFORMANCE RANKING:")
    print("=" * 55)

    display_cols = ['rank', 'state_name', 'region_name', 'economic_score', 'gdp_per_capita_2023', 'hdi_2021']
    top_10 = ranking_df[display_cols].head(10).copy()
    top_10['economic_score'] = top_10['economic_score'].round(1)
    top_10['hdi_2021'] = top_10['hdi_2021'].round(3)
    top_10.columns = ['Rank', 'State', 'Region', 'Score', 'GDP per Capita (R$)', 'HDI']

    display(top_10)


 TOP 10 ECONOMIC PERFORMANCE RANKING:


|   | Rank | State | Region | Score | GDP per Capita (R$) | HDI |
|---|---|---|---|---|---|---|
| 0 | 1 | Paraná | Sul | 73.4 | 40000 | 0.867 |
| 1 | 2 | Amazonas | Norte | 59.0 | 30000 | 0.802 |
| 2 | 3 | Tocantins | Norte | 57.7 | 19000 | 0.876 |
| 3 | 4 | Roraima | Norte | 55.9 | 12000 | 0.872 |
| 4 | 5 | Santa Catarina | Sul | 53.9 | 42000 | 0.732 |
| 5 | 6 | Rio de Janeiro | Sudeste | 52.6 | 51000 | 0.677 |
| 6 | 7 | Rio Grande do Sul | Sul | 50.4 | 41000 | 0.755 |
| 7 | 8 | Distrito Federal | Centro-Oeste | 48.6 | 85000 | 0.658 |
| 8 | 9 | Ceará | Nordeste | 45.5 | 26000 | 0.793 |
| 9 | 10 | São Paulo | Sudeste | 45.3 | 56000 | 0.637 |
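
The min-max step divides by `max_val - min_val`, which is zero when a metric is constant across states, and the weights (0.4, 0.4, 0.2) are hard-coded in the sum. A sketch of the same scoring as a function that guards both points (the name `composite_score` and the choice to let a constant metric contribute 0 are assumptions):

```python
import pandas as pd

def composite_score(df: pd.DataFrame, weights: dict) -> pd.Series:
    """Weighted sum of min-max normalized metrics, each on a 0-100 scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    score = pd.Series(0.0, index=df.index)
    for metric, weight in weights.items():
        span = df[metric].max() - df[metric].min()
        if span == 0:
            # Constant metric: carries no ranking information, contributes 0
            continue
        normalized = (df[metric] - df[metric].min()) / span * 100
        score += weight * normalized
    return score

toy = pd.DataFrame({
    "gdp_per_capita_2023": [12_000, 48_500, 85_000],
    "hdi_2021": [0.60, 0.90, 0.75],
    "economic_diversity": [0.5, 0.5, 0.5],  # constant on purpose
}, index=["A", "B", "C"])

scores = composite_score(
    toy,
    {"gdp_per_capita_2023": 0.4, "hdi_2021": 0.4, "economic_diversity": 0.2},
)
print(scores.round(1))
```

Keeping the weights in one dictionary also makes it easy to test how sensitive the ranking is to the 0.4/0.4/0.2 split.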


## Data Export & Summary

In [12]:

if economic_df is not None:
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')


    export_df = economic_df[['state_name', 'state_code', 'region_name', 'population_2024',
                            'gdp_per_capita_2023', 'hdi_2021', 'economic_diversity',
                            'economic_cluster', 'cluster_name']].copy()

    export_df = export_df.merge(ranking_df[['state_name', 'rank', 'economic_score']], on='state_name', how='left')

    # Export to CSV
    filename = f'brazilian_economic_analysis_{timestamp}.csv'
    export_df.to_csv(filename, index=False)

    print(f" Economic analysis exported to: {filename}")
    print(f" Dataset contains {len(export_df)} states with {len(export_df.columns)} economic indicators")

    # Summary statistics
    print("\n FINAL ANALYSIS SUMMARY:")
    print("=" * 35)

    print(f" Highest Economic Score: {export_df.loc[export_df['economic_score'].idxmax(), 'state_name']} ({export_df['economic_score'].max():.1f})")
    print(f" Highest GDP per Capita: {export_df.loc[export_df['gdp_per_capita_2023'].idxmax(), 'state_name']} (R$ {export_df['gdp_per_capita_2023'].max():,.0f})")
    print(f" Most Populous: {export_df.loc[export_df['population_2024'].idxmax(), 'state_name']} ({export_df['population_2024'].max():,.0f} people)")
    print(f" Highest HDI: {export_df.loc[export_df['hdi_2021'].idxmax(), 'state_name']} ({export_df['hdi_2021'].max():.3f})")

    print(f"\n File ready for download: {filename}")


    try:
        from google.colab import files
        download_choice = input("\nDownload analysis results now? (y/n): ")
        if download_choice.lower() == 'y':
            files.download(filename)
            print(" Download initiated!")
    except ImportError:
        print(" Use the file browser to download the CSV file")

 Economic analysis exported to: brazilian_economic_analysis_20250826_133827.csv
 Dataset contains 27 states with 11 economic indicators

 FINAL ANALYSIS SUMMARY:
 Highest Economic Score: Paraná (73.4)
 Highest GDP per Capita: Distrito Federal (R$ 85,000)
 Most Populous: São Paulo (45,975,000 people)
 Highest HDI: Tocantins (0.876)

 File ready for download: brazilian_economic_analysis_20250826_133827.csv

Download analysis results now? (y/n): y


 Download initiated!
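
Before sharing the exported file, it can be worth confirming that it reads back with the expected shape and no missing ranks, since the export relies on a left merge. A sketch of such a check on a toy frame, writing to a temporary directory (the helper name `verify_export` is hypothetical):

```python
import os
import tempfile

import pandas as pd

def verify_export(df: pd.DataFrame, path: str) -> pd.DataFrame:
    """Write df to CSV, read it back, and check shape and the merged rank column."""
    df.to_csv(path, index=False)
    loaded = pd.read_csv(path)
    if loaded.shape != df.shape:
        raise ValueError(f"shape changed: {df.shape} -> {loaded.shape}")
    if loaded["rank"].isna().any():
        raise ValueError("some states have no rank after the merge")
    return loaded

toy = pd.DataFrame({
    "state_name": ["Paraná", "Amazonas"],
    "rank": [1, 2],
    "economic_score": [73.4, 59.0],
})

with tempfile.TemporaryDirectory() as tmp:
    loaded = verify_export(toy, os.path.join(tmp, "export_check.csv"))
print(loaded)
```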


## Economic Analysis Conclusion

This comprehensive economic analysis of Brazilian states reveals:

### Key Findings

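**Regional Disparities**: GDP per capita ranges from R$ 12,000 (Roraima) to R$ 85,000 (Distrito Federal) in the data used here  
**Economic Clustering**: K-means splits the states into 4 groups, one of which contains only Distrito Federal  
**Development Patterns**: Correlations between indicators are weak to moderate; the strongest is population vs GDP per capita (0.432), and GDP per capita vs HDI is weakly negative (-0.285) because the HDI values are randomly generated  
**Composite Ranking**: Paraná has the highest composite score (73.4), ahead of Amazonas and Tocantins  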

### Analysis Highlights

- **Population Distribution**: Concentrated in Southeast and Northeast regions
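- **Economic Performance**: The unweighted state average GDP per capita is R$ 29,741, with Distrito Federal far above it at R$ 85,000
- **Development Indicators**: No pair of indicators reaches the notebook's "Strong" threshold (|r| > 0.7)
- **Clustering Analysis**: The 4 cluster names are attached to arbitrary K-means ids, so "Advanced" (average GDP per capita R$ 23,667) does not rank above "Developing" (R$ 24,545)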

### Future Research Directions

- **Time Series Analysis**: Track economic trends over multiple years
- **Sector Analysis**: Dive into specific economic sectors by state
- **Predictive Modeling**: Forecast future economic performance
- **Policy Impact**: Analyze the effect of government policies on economic indicators

### Data Sources & References

- [IBGE - Brazilian Institute of Geography and Statistics](https://www.ibge.gov.br/en/)
- [SIDRA - IBGE Automatic Recovery System](https://sidra.ibge.gov.br/)
- [Brazilian Economic Data APIs](https://servicodados.ibge.gov.br/api/docs/)

---

**Economic Analysis Dashboard | Made with Python & Plotly | Data from IBGE**