To estimate how many people live near each dam in the country, we will use two data sources: the Statistical Grid and the Census Tracts, both produced by the Brazilian Institute of Geography and Statistics (IBGE).

The statistical grid is the highest-resolution dataset available for measuring the spatial distribution of the population. It is divided into squares of 200 m x 200 m (urban areas) or 1 km x 1 km (rural areas), each with an estimate of the population living there.

<div>
    <img src=notebook-assets/grid.png width="600">
</div>

Census tracts differ in shape and size, following the geopolitical divisions of Brazil. However, they also offer plenty of information about the population that lives inside them.

<img src="notebook-assets/tracts.jpg"><br>

We want to make the best of both worlds: that is, we want to use the statistical grid to measure not only population counts in a very small area, but also its demographics. To do so, we will use a technique called [areal weighted interpolation](https://dges.carleton.ca/CUOSGwiki/index.php/Areal_Interpolation_in_Python_Using_Tobler), which assumes that the characteristics of the population are homogeneous inside each census tract.
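To make the homogeneity assumption concrete, here is a minimal pure-Python sketch of the idea. It is not the code used in this notebook: the tracts and the grid cell are hypothetical axis-aligned rectangles with made-up counts, and the helper names are invented for illustration. Each tract hands over a share of its count equal to the share of its area that falls inside the cell.

```python
# Rectangles are (xmin, ymin, xmax, ymax) tuples. All shapes and counts
# below are hypothetical and exist only to illustrate the technique.

def area(rect):
    """Area of a rectangle; degenerate (non-overlapping) rectangles count as 0."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)

def intersection(a, b):
    """Bounding coordinates of the overlap between two rectangles."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

def interpolate_count(tracts, cell):
    """Estimate a count for `cell` from a list of (rectangle, count) tracts."""
    total = 0.0
    for rect, count in tracts:
        # Share of the tract's area that lies inside the cell
        share = area(intersection(rect, cell)) / area(rect)
        total += count * share
    return total

tracts = [
    ((0, 0, 4, 4), 160.0),  # hypothetical tract A: 16 area units, 160 residents
    ((4, 0, 8, 4), 80.0),   # hypothetical tract B: 16 area units, 80 residents
]
cell = (3, 0, 5, 2)  # a grid cell straddling both tracts

estimate = interpolate_count(tracts, cell)
print(estimate)
```

The cell covers 2 of the 16 area units of each tract, so it receives one eighth of each count. If the residents of tract A were in fact clustered far from the cell, the estimate would be off, which is exactly the limitation of assuming homogeneity.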

One important caveat: since there has been no Census in the country since 2010, both datasets reflect a population pattern that is now 12 years old. They are still the most up-to-date data available until the next Census, which is set to start in 2022.

#### Importing packages

In [2]:
import geopandas as gpd
import pandas as pd
import tobler

#### Reading census tract data

The census tract data was already parsed in notebooks `1`, `2` and `3` of this repository. We will simply read it in.

In [3]:
TRACTS = gpd.read_feather("../data/brazil/censo/combined/combined.feather")

#### Reading statistical grid data
The statistical grid, for all its granularity, is too big and too heavy to handle at once. To make up for this, IBGE split it into many smaller files, laid out as shown in the following image:

<img src='notebook-assets/articulacao.jpg'> 

Instead of reading it all into memory, we will define a function that reads one of those files at a time and call it as needed.

In [4]:
def read_grid(id_):
    '''
    Reads the population grid file with the
    specified id into a GeoDataFrame
    ---
    Parameters:
    
    id_ -> A numerical id for the grid file that should be read
    '''
    
    gdf = gpd.read_file(f"zip://../data/brazil/grade/grade_id{id_}.zip")
        
    return gdf
        

Now we can call upon tobler to derive detailed population metrics for each one of the squares, using the simple area weighted interpolation method we described above. 

In [5]:
def area_interpolation(source_df, target_df, extensive_variables=None, intensive_variables=None):
    '''
    A simple wrapper function for tobler's area weighted
    interpolation.
    ---
    Parameters:
    
    source_df -> The gdf with the original data, from which the data will be estimated
    target_df -> The gdf with the new, data-less polygons
    extensive_variables -> Variables that will be derived. Notice that extensive variables
    are EXCLUSIVELY values that depend on the size of the sample, such as population counts.
    intensive_variables -> Variables that will be derived. Notice that intensive variables
    are EXCLUSIVELY values that don't depend on sample size, such as population density.
    '''
    
    interp = tobler.area_weighted.area_interpolate(source_df=source_df,
                                                   target_df=target_df,
                                                   extensive_variables=extensive_variables,
                                                   intensive_variables=intensive_variables)

    # Return the interpolated GeoDataFrame so callers can use the result
    return interp
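The distinction between extensive and intensive variables matters because the two are combined differently. The sketch below is a pure-Python illustration, not tobler's implementation: the rectangles, values and function names are hypothetical. It assumes that an extensive variable is split according to the share of each source polygon that falls in the target, while an intensive variable is averaged using the overlap areas as weights.

```python
# Rectangles are (xmin, ymin, xmax, ymax). All values are hypothetical.

def rect_area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def overlap_area(a, b):
    """Area shared by two axis-aligned rectangles (0 if they do not touch)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)

def interpolate_extensive(sources, target):
    """Counts: each source gives away the fraction of itself inside the target."""
    return sum(value * overlap_area(rect, target) / rect_area(rect)
               for rect, value in sources)

def interpolate_intensive(sources, target):
    """Rates or densities: average of source values weighted by overlap area."""
    overlaps = [(overlap_area(rect, target), value) for rect, value in sources]
    covered = sum(a for a, _ in overlaps)
    if covered == 0:
        return None  # the target does not intersect any source
    return sum(a * value for a, value in overlaps) / covered

# Two hypothetical tracts of 4 area units each
population = [((0, 0, 2, 2), 400.0), ((2, 0, 4, 2), 80.0)]   # extensive
density = [((0, 0, 2, 2), 100.0), ((2, 0, 4, 2), 20.0)]      # intensive

target = (1, 0, 3, 2)  # covers half of each tract

pop_estimate = interpolate_extensive(population, target)
density_estimate = interpolate_intensive(density, target)
print(pop_estimate, density_estimate)
```

In this toy case the two results agree with each other: the estimated population divided by the target's area equals the estimated density. Passing a count as an intensive variable, or a density as an extensive one, would break that consistency.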

But first, we need to make sure that the data is in a projected, equal-area CRS rather than a geographic one, so that the area calculations are correct. The two functions below will help with that.

In [None]:
def crs_to_area(gdf):
    '''
    Converts the CRS of the geodataframe
    to the Brazilian standard for equal
    area calculations.
    ---
    Parameters:
    
    gdf -> A geodataframe
    '''
    return gdf.to_crs(
    '''
    PROJCS["Brasil_Albers_Equal_Area",
    GEOGCS["GCS_WGS_1984",
    DATUM["D_SIRGAS_2000",
    SPHEROID["GRS_1980",6378137.0,298.257222101]],
    PRIMEM["Greenwich",0.0],
    UNIT["Degree",0.017453292519943295]],
    PROJECTION["Albers"],
    PARAMETER["false_easting",5000000.0],
    PARAMETER["false_northing",10000000.0],
    PARAMETER["central_meridian",-54.0],
    PARAMETER["standard_parallel_1",-2.0],
    PARAMETER["standard_parallel_2",-22.0],
    PARAMETER["latitude_of_origin",-12.0],UNIT["Meter",1.0]]
    '''
)

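def crs_to_coords(gdf):
    '''
    Converts the CRS of the geodataframe
    to the Brazilian standard geographic
    (latitude/longitude) CRS.
    ---
    Parameters:
    
    gdf -> A geodataframe
    '''
    # Assumption: the Brazilian standard geographic CRS is SIRGAS 2000 (EPSG:4674)
    return gdf.to_crs("EPSG:4674")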
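To see why areas should not be computed directly on latitude/longitude coordinates, consider a rough sketch that treats the Earth as a sphere (an approximation; the radius and the latitudes below are illustrative choices, not values taken from the datasets). A cell measuring one degree by one degree has the same "area" in degrees everywhere, but its real surface shrinks as it moves away from the equator.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation

def cell_area_km2(lat_south, lat_north, lon_width_deg):
    """Surface area of a latitude/longitude cell on a sphere."""
    lon_width = math.radians(lon_width_deg)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * lon_width * band

# Two 1 x 1 degree cells, one near the equator and one near Brazil's far south
near_equator = cell_area_km2(-1.0, 0.0, 1.0)
far_south = cell_area_km2(-33.0, -32.0, 1.0)

print(round(near_equator), round(far_south))
print(round(far_south / near_equator, 2))
```

The southern cell covers noticeably less ground than the equatorial one, even though both span one square degree. Area weights computed in degrees would therefore be distorted across a country that spans as many latitudes as Brazil, which is why the data is reprojected to an equal-area CRS before interpolating.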


In [8]:
def main():




In [11]:
interp.total_residents.max()

1145.0354192382656