# Population by subprefecture

Several of our indicators are computed in proportion to each subprefecture's population, so we first need that population. We take the 2022 Demographic Census (Censo Demográfico 2022) data published by IBGE, which is aggregated by district, and reaggregate it to the subprefecture level.
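Computing an indicator in proportion to population usually means expressing a count as a rate per fixed number of inhabitants. Below is a minimal sketch. It assumes a hypothetical `df_counts` table with one count per subprefecture. The subprefecture labels and numbers are illustrative, not census data.

```python
import pandas as pd

# Hypothetical inputs (illustrative values only): a count of some item per
# subprefecture and the population of each subprefecture.
df_counts = pd.DataFrame({"subprefeitura": ["A", "B"], "count": [30, 12]})
df_population = pd.DataFrame({"subprefeitura": ["A", "B"], "population": [150_000, 40_000]})


def rate_per(counts: pd.DataFrame, population: pd.DataFrame, per: int = 100_000) -> pd.DataFrame:
    """Join counts with population and express each count per `per` inhabitants."""
    # validate="one_to_one" raises MergeError if a subprefecture is duplicated.
    merged = counts.merge(population, on="subprefeitura", how="inner", validate="one_to_one")
    merged["rate"] = merged["count"] * per / merged["population"]
    return merged


rates = rate_per(df_counts, df_population)
print(rates[["subprefeitura", "rate"]])
```

Multiplying before dividing keeps the arithmetic exact for these integer inputs. The `validate` argument guards against a subprefecture appearing twice in either table.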

```python
import pandas as pd
import geopandas as gpd
from os import path, makedirs

# Project helpers (from this repository's `core` package) for querying Geosampa
from core.downloads.geosampa import get_capabilities, get_features
```

# Getting the data

## Getting the 2022 Demographic Census data

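```python
# District-level demographic aggregates from the 2022 Census (IBGE FTP)
url_pop = 'https://ftp.ibge.gov.br/Censos/Censo_Demografico_2022/Agregados_por_Setores_Censitarios/Agregados_por_Distrito_csv/Agregados_por_distritos_demografia_BR.zip'

# pandas infers ZIP compression from the ".zip" extension (the archive must hold
# a single data file); dtype=str keeps codes as text so .str methods work on them
df_pop = pd.read_csv(
    url_pop,
    encoding='latin1',
    sep=';',
    dtype=str
)

df_pop.head()
```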

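|  | CD_DIST | NM_DIST | V01006 | V01007 | V01008 | V01009 | V01010 | V01011 | V01012 | V01013 | ... | V01032 | V01033 | V01034 | V01035 | V01036 | V01037 | V01038 | V01039 | V01040 | V01041 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 110001505 | Alta Floresta D'Oeste | 16699 | 8267 | 8432 | 592 | 598 | 589 | 645 | 612 | ... | 1166 | 1144 | 1250 | 1191 | 1211 | 2485 | 2487 | 2028 | 1479 | 1053 |
| 1 | 110001515 | Filadélfia d'Oeste | 551 | 299 | 252 | 23 | 27 | 25 | 19 | 19 | ... | 39 | 46 | 44 | 44 | 33 | 79 | 84 | 65 | 48 | 19 |
| 2 | 110001520 | Izidolândia | 532 | 287 | 245 | 25 | 18 | 17 | 25 | 12 | ... | 44 | 42 | 42 | 34 | 44 | 85 | 78 | 62 | 37 | 22 |
| 3 | 110001525 | Nova Gease d'Oeste | 1071 | 562 | 509 | 35 | 32 | 43 | 57 | 27 | ... | 90 | 79 | 104 | 73 | 92 | 126 | 120 | 130 | 82 | 54 |
| 4 | 110001530 | Rolim de Moura do Guaporé | 801 | 442 | 359 | 42 | 31 | 45 | 45 | 31 | ... | 55 | 86 | 81 | 51 | 58 | 129 | 116 | 85 | 43 | 23 |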
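`pd.read_csv` can open the ZIP URL directly. With the default `compression='infer'`, pandas detects ZIP compression from the `.zip` extension and reads the archive's contents, provided it holds a single file. The sketch below reproduces this offline with an in-memory archive. The member name `demografia.csv` is made up. The three-column row is copied from the São Paulo output in this notebook.

```python
import io
import zipfile

import pandas as pd

# Build an in-memory ZIP holding one semicolon-separated, Latin-1 encoded CSV.
csv_text = "CD_DIST;NM_DIST;V01006\n355030801;Água Rasa;85788\n"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("demografia.csv", csv_text.encode("latin1"))
buffer.seek(0)

# A file-like object has no extension to infer from, so compression is given
# explicitly here; with the real URL, pandas infers it from ".zip".
df_sample = pd.read_csv(buffer, compression="zip", encoding="latin1", sep=";", dtype=str)
print(df_sample)
```

Reading with the wrong encoding (for example UTF-8 here) would either fail or garble accented names such as "Água Rasa". That is why `encoding='latin1'` matters.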


First, we filter the data down to the municipality of São Paulo (IBGE code 3550308).

```python
# District codes start with the 7-digit municipality code (3550308 = São Paulo)
df_pop = df_pop[df_pop['CD_DIST'].str.startswith('3550308')]
df_pop
```

|  | CD_DIST | NM_DIST | V01006 | V01007 | V01008 | V01009 | V01010 | V01011 | V01012 | V01013 | ... | V01032 | V01033 | V01034 | V01035 | V01036 | V01037 | V01038 | V01039 | V01040 | V01041 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7250 | 355030801 | Água Rasa | 85788 | 39627 | 46161 | 1772 | 2027 | 2056 | 2050 | 2393 | ... | 3966 | 3974 | 4022 | 4765 | 5504 | 13078 | 14034 | 11887 | 10700 | 10401 |
| 7251 | 355030802 | Alto de Pinheiros | 37359 | 16939 | 20420 | 592 | 803 | 823 | 882 | 948 | ... | 1572 | 1691 | 1745 | 1862 | 2181 | 4834 | 5671 | 5290 | 4897 | 6314 |
| 7252 | 355030803 | Anhanguera | 75331 | 36610 | 38721 | 2274 | 2648 | 2510 | 2831 | 3262 | ... | 5178 | 4997 | 5584 | 6416 | 6256 | 11708 | 11423 | 10157 | 5740 | 3248 |
| 7253 | 355030804 | Aricanduva | 89574 | 42131 | 47443 | 2216 | 2466 | 2553 | 2884 | 3279 | ... | 4850 | 4964 | 5693 | 6384 | 6205 | 12924 | 13701 | 12001 | 9693 | 8734 |
| 7254 | 355030805 | Artur Alvim | 95566 | 43812 | 51754 | 2201 | 2724 | 2659 | 3015 | 3508 | ... | 5342 | 5321 | 5849 | 6913 | 6882 | 14024 | 14694 | 12522 | 10239 | 9447 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7341 | 355030892 | Vila Medeiros | 114939 | 53445 | 61494 | 2915 | 3219 | 3286 | 3420 | 4416 | ... | 6357 | 6422 | 6887 | 8526 | 8261 | 17593 | 16401 | 14488 | 12815 | 11436 |
| 7342 | 355030893 | Vila Prudente | 105690 | 49066 | 56624 | 2331 | 2607 | 2596 | 2642 | 3318 | ... | 5116 | 5109 | 5289 | 6563 | 7950 | 18936 | 16565 | 13414 | 11348 | 10638 |
| 7343 | 355030894 | Vila Sônia | 123743 | 57982 | 65761 | 3110 | 3680 | 3575 | 3658 | 4333 | ... | 7336 | 7131 | 7318 | 8595 | 9194 | 20167 | 19999 | 15282 | 11561 | 10804 |
| 7344 | 355030895 | São Domingos | 88884 | 41886 | 46998 | 2238 | 2581 | 2600 | 2865 | 3135 | ... | 5084 | 5151 | 5537 | 6143 | 6385 | 13251 | 13733 | 11996 | 8986 | 8096 |
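The filter works because of how IBGE codes are built. A 9-digit district code begins with the 7-digit municipality code (3550308 for São Paulo), and the last 2 digits number the district within the municipality. The municipality code in turn begins with the 2-digit state code (35). Below is a small helper that splits a code into these parts. It is hypothetical and not part of the project code. A vectorized equivalent of the filter follows it.

```python
import pandas as pd


def split_district_code(cd_dist: str) -> tuple[str, str, str]:
    """Split a 9-digit IBGE district code into (state, municipality, district)."""
    if len(cd_dist) != 9 or not cd_dist.isdigit():
        raise ValueError(f"expected a 9-digit code, got {cd_dist!r}")
    return cd_dist[:2], cd_dist[:7], cd_dist[7:]


print(split_district_code("355030801"))  # ('35', '3550308', '01')

# Comparing the 7-character prefix is equivalent to str.startswith('3550308').
codes = pd.Series(["110001505", "355030801", "355030895"])
is_sp = codes.str[:7] == "3550308"
print(is_sp.tolist())  # [False, True, True]
```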


Next, we keep only the columns of interest: the district identifiers and the population by age group. The age-group variables are:

| Theme | Variable | Description |
|-------|----------|-------------|
| Demographics | V01031 | 0 to 4 years |
| Demographics | V01032 | 5 to 9 years |
| Demographics | V01033 | 10 to 14 years |
| Demographics | V01034 | 15 to 19 years |
| Demographics | V01035 | 20 to 24 years |
| Demographics | V01036 | 25 to 29 years |
| Demographics | V01037 | 30 to 39 years |
| Demographics | V01038 | 40 to 49 years |
| Demographics | V01039 | 50 to 59 years |
| Demographics | V01040 | 60 to 69 years |
| Demographics | V01041 | 70 years or older |
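The eleven bands do not overlap and cover all ages, so one would expect them to add up to V01006. For every row shown in the outputs here, V01006 equals V01007 + V01008, so it is presumably the total resident count. Summing the bands for two districts copied from the outputs shows the match is not always exact. The gap is 0 for Água Rasa and 42 for Alto de Pinheiros, so it is worth measuring before relying on either figure. Below is a sketch of that check.

```python
import pandas as pd

# Age-band columns listed in the table above.
AGE_COLS = ["V01031", "V01032", "V01033", "V01034", "V01035", "V01036",
            "V01037", "V01038", "V01039", "V01040", "V01041"]

# Two districts copied from this notebook's outputs, as text like the raw CSV.
rows = pd.DataFrame({
    "NM_DIST": ["Água Rasa", "Alto de Pinheiros"],
    "V01006": ["85788", "37359"],
    "V01031": ["3457", "1260"], "V01032": ["3966", "1572"],
    "V01033": ["3974", "1691"], "V01034": ["4022", "1745"],
    "V01035": ["4765", "1862"], "V01036": ["5504", "2181"],
    "V01037": ["13078", "4834"], "V01038": ["14034", "5671"],
    "V01039": ["11887", "5290"], "V01040": ["10700", "4897"],
    "V01041": ["10401", "6314"],
})

numeric = rows[["V01006"] + AGE_COLS].astype(int)
# Difference between V01006 and the sum of the age bands, per district.
rows["gap"] = numeric["V01006"] - numeric[AGE_COLS].sum(axis=1)
print(rows[["NM_DIST", "gap"]])
```

On the full table, this check must run before the column selection below, because that step drops V01006.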

```python
# District identifiers plus the eleven age-group columns
cols_needed = ['CD_DIST', 'NM_DIST', 'V01031', 'V01032', 'V01033',
               'V01034', 'V01035', 'V01036', 'V01037', 'V01038',
               'V01039', 'V01040', 'V01041']

df_pop = df_pop[cols_needed]
df_pop
```

|  | CD_DIST | NM_DIST | V01031 | V01032 | V01033 | V01034 | V01035 | V01036 | V01037 | V01038 | V01039 | V01040 | V01041 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7250 | 355030801 | Água Rasa | 3457 | 3966 | 3974 | 4022 | 4765 | 5504 | 13078 | 14034 | 11887 | 10700 | 10401 |
| 7251 | 355030802 | Alto de Pinheiros | 1260 | 1572 | 1691 | 1745 | 1862 | 2181 | 4834 | 5671 | 5290 | 4897 | 6314 |
| 7252 | 355030803 | Anhanguera | 4572 | 5178 | 4997 | 5584 | 6416 | 6256 | 11708 | 11423 | 10157 | 5740 | 3248 |
| 7253 | 355030804 | Aricanduva | 4413 | 4850 | 4964 | 5693 | 6384 | 6205 | 12924 | 13701 | 12001 | 9693 | 8734 |
| 7254 | 355030805 | Artur Alvim | 4333 | 5342 | 5321 | 5849 | 6913 | 6882 | 14024 | 14694 | 12522 | 10239 | 9447 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7341 | 355030892 | Vila Medeiros | 5748 | 6357 | 6422 | 6887 | 8526 | 8261 | 17593 | 16401 | 14488 | 12815 | 11436 |
| 7342 | 355030893 | Vila Prudente | 4724 | 5116 | 5109 | 5289 | 6563 | 7950 | 18936 | 16565 | 13414 | 11348 | 10638 |
| 7343 | 355030894 | Vila Sônia | 6314 | 7336 | 7131 | 7318 | 8595 | 9194 | 20167 | 19999 | 15282 | 11561 | 10804 |
| 7344 | 355030895 | São Domingos | 4506 | 5084 | 5151 | 5537 | 6143 | 6385 | 13251 | 13733 | 11996 | 8986 | 8096 |


Finally, we convert the population columns to integers.

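```python
# Convert every age-group column (all but the first two) to int;
# astype raises ValueError if any value cannot be parsed as an integer
df_pop = df_pop.astype({c: 'int' for c in cols_needed[2:]})
df_pop
```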

|  | CD_DIST | NM_DIST | V01031 | V01032 | V01033 | V01034 | V01035 | V01036 | V01037 | V01038 | V01039 | V01040 | V01041 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7250 | 355030801 | Água Rasa | 3457 | 3966 | 3974 | 4022 | 4765 | 5504 | 13078 | 14034 | 11887 | 10700 | 10401 |
| 7251 | 355030802 | Alto de Pinheiros | 1260 | 1572 | 1691 | 1745 | 1862 | 2181 | 4834 | 5671 | 5290 | 4897 | 6314 |
| 7252 | 355030803 | Anhanguera | 4572 | 5178 | 4997 | 5584 | 6416 | 6256 | 11708 | 11423 | 10157 | 5740 | 3248 |
| 7253 | 355030804 | Aricanduva | 4413 | 4850 | 4964 | 5693 | 6384 | 6205 | 12924 | 13701 | 12001 | 9693 | 8734 |
| 7254 | 355030805 | Artur Alvim | 4333 | 5342 | 5321 | 5849 | 6913 | 6882 | 14024 | 14694 | 12522 | 10239 | 9447 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7341 | 355030892 | Vila Medeiros | 5748 | 6357 | 6422 | 6887 | 8526 | 8261 | 17593 | 16401 | 14488 | 12815 | 11436 |
| 7342 | 355030893 | Vila Prudente | 4724 | 5116 | 5109 | 5289 | 6563 | 7950 | 18936 | 16565 | 13414 | 11348 | 10638 |
| 7343 | 355030894 | Vila Sônia | 6314 | 7336 | 7131 | 7318 | 8595 | 9194 | 20167 | 19999 | 15282 | 11561 | 10804 |
| 7344 | 355030895 | São Domingos | 4506 | 5084 | 5151 | 5537 | 6143 | 6385 | 13251 | 13733 | 11996 | 8986 | 8096 |


Since the conversion raised no error, every value in the population columns was a valid integer.
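This reasoning relies on `astype(int)` failing loudly. On a string that is not an integer, it raises `ValueError` instead of skipping the value. When a conversion does fail, `pd.to_numeric(errors='coerce')` helps locate the culprits. Below is a sketch on a hypothetical series; the `'n/a'` entry is made up for illustration.

```python
import pandas as pd

# Hypothetical column with one value that is not an integer.
s = pd.Series(["3457", "3966", "n/a"])

conversion_failed = False
try:
    s.astype(int)
except ValueError as exc:
    # astype(int) aborts on the first unparseable string instead of skipping it.
    conversion_failed = True
    print("conversion failed:", exc)

# errors="coerce" turns unparseable values into NaN, exposing the bad entries.
parsed = pd.to_numeric(s, errors="coerce")
bad = s[parsed.isna()]
print(bad.tolist())  # ['n/a']
```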

## Districts and subprefectures

We also need to know which subprefecture each district belongs to. For this, we use the data published on Geosampa, the City of São Paulo's geographic data portal.
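Once each district is paired with its subprefecture, the reaggregation is a merge followed by a group-wise sum. Below is a minimal sketch. It assumes a hypothetical lookup table `df_map` with columns `CD_DIST` and `subprefeitura`; the actual Geosampa layer and its column names are not shown here. The district codes and the two age bands come from the outputs above, while the labels `SUB_1` and `SUB_2` are placeholders.

```python
import pandas as pd

# District-level counts (two age bands copied from the outputs above).
df_dist = pd.DataFrame({
    "CD_DIST": ["355030801", "355030802", "355030803"],
    "V01031": [3457, 1260, 4572],
    "V01041": [10401, 6314, 3248],
})

# Hypothetical district -> subprefecture lookup; labels are placeholders.
df_map = pd.DataFrame({
    "CD_DIST": ["355030801", "355030802", "355030803"],
    "subprefeitura": ["SUB_1", "SUB_2", "SUB_1"],
})

# validate="many_to_one" raises if a district appears twice in the lookup;
# indicator=True adds a "_merge" column that flags districts with no match.
merged = df_dist.merge(df_map, on="CD_DIST", how="left",
                       validate="many_to_one", indicator=True)
unmatched = merged.loc[merged["_merge"] != "both", "CD_DIST"]
assert unmatched.empty, f"districts without subprefecture: {unmatched.tolist()}"

# Sum each age band over the districts of each subprefecture.
df_sub = merged.groupby("subprefeitura", as_index=False)[["V01031", "V01041"]].sum()
print(df_sub)
```

A left merge with an explicit check for unmatched rows keeps any district missing from the lookup from silently dropping out of the totals.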