# StatsCan shapefile processing
*April 22, 2022*

This notebook takes StatsCan census tract shapefiles and processes them into GeoJSON files, one per census metropolitan area (CMA) or census agglomeration (CA), for import into Datawrapper. First, we import geopandas and pandas, plus the standard-library `warnings` module to suppress some noisy warning messages.

In [245]:
import geopandas
import pandas as pd
import warnings
warnings.filterwarnings("ignore")  # hide warning messages to keep the notebook output tidy

Now we read in the latest (2021) StatsCan census tract boundary file and reproject it to EPSG:4326 (WGS 84 longitude/latitude), the coordinate system Datawrapper expects for uploaded maps.

In [None]:
tracts = (geopandas
          .read_file("https://www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/boundary-limites/files-fichiers/lct_000b21a_e.zip")
          .to_crs("EPSG:4326")
          )
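
Reprojecting matters because GeoJSON (RFC 7946) stores coordinates as longitude/latitude pairs in degrees. As a quick sanity check, the sketch below walks a GeoJSON geometry and confirms that every position falls inside the valid longitude and latitude ranges. It is plain Python; `iter_positions` and `looks_like_wgs84` are hypothetical helpers, not part of geopandas, and the test geometries are made-up examples.

```python
from typing import Any, Iterator, Tuple


def iter_positions(coords: Any) -> Iterator[Tuple[float, float]]:
    """Yield every (x, y) position from a nested GeoJSON coordinate array."""
    # A position is a list of numbers; anything else is a list of positions, rings or polygons
    if coords and isinstance(coords[0], (int, float)):
        yield float(coords[0]), float(coords[1])
    else:
        for part in coords:
            yield from iter_positions(part)


def looks_like_wgs84(geometry: dict) -> bool:
    """Rough check that a GeoJSON geometry uses longitude/latitude degrees."""
    return all(
        -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0
        for lon, lat in iter_positions(geometry["coordinates"])
    )
```

A range check is only a heuristic: it reliably flags metre-based projected coordinates, whose values run into the thousands or millions, but it cannot prove which CRS a file uses. In the notebook itself, `tracts.crs` is the authoritative place to look.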

Next, we read in two pre-prepared lookup tables: one that maps PRUIDs to provinces, and one that maps the names of CMAs and CAs to their DGUIDs.

In [231]:
# Read every column as text so codes such as "001" keep their leading zeros
province_list = pd.read_csv("./data/provinces.csv", dtype=str).set_index("PRUID")
cma_list = pd.read_csv("./data/cmas.csv", dtype=str).set_index("ID")
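
The DGUIDs in these tables follow a fixed layout: a four-digit vintage (`2021`), a one-letter type (`S` for statistical geographies), a four-digit schema (`0507` for census tracts), then the geography's own ID. For a census tract that ID is the CTUID, whose first three digits are the CMA/CA code, which is why the export loop further down can match tracts on `"2021S0507"` plus a three-character code. A minimal parsing sketch follows; `DGUID`, `parse_dguid` and `tract_cma_code` are hypothetical names, and the test uses DGUIDs from the sample rows shown below.

```python
from typing import NamedTuple


class DGUID(NamedTuple):
    vintage: str   # e.g. "2021"
    geo_type: str  # "S" for statistical geographies such as census tracts
    schema: str    # "0507" for census tracts
    geo_id: str    # the geography's own unique ID, e.g. a CTUID


def parse_dguid(dguid: str) -> DGUID:
    """Split a DGUID into its fixed-width parts (4 + 1 + 4 characters, then the rest)."""
    if len(dguid) <= 9:
        raise ValueError(f"DGUID too short: {dguid!r}")
    return DGUID(dguid[:4], dguid[4], dguid[5:9], dguid[9:])


def tract_cma_code(dguid: str) -> str:
    """Return the three-digit CMA/CA code of a census tract DGUID."""
    parts = parse_dguid(dguid)
    if parts.schema != "0507":
        raise ValueError(f"not a census tract DGUID: {dguid!r}")
    # A CTUID is the CMA/CA code followed by the tract name, e.g. 537 + 0001.08
    return parts.geo_id[:3]
```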

Let's take a peek at the tracts table.

In [233]:
tracts.head()

|   | CTUID | DGUID | CTNAME | LANDAREA | PRUID | geometry |
|---|---|---|---|---|---|---|
| 0 | 5370001.08 | 2021S05075370001.08 | 1.08 | 1.6383 | 35 | POLYGON ((-79.85362 43.19320, -79.85380 43.192... |
| 1 | 10002.0 | 2021S05070010002.00 | 2.0 | 1.9638 | 10 | POLYGON ((-52.72050 47.55154, -52.71877 47.550... |
| 2 | 5370001.09 | 2021S05075370001.09 | 1.09 | 1.9699 | 35 | POLYGON ((-79.85586 43.18791, -79.85592 43.187... |
| 3 | 5370120.02 | 2021S05075370120.02 | 120.02 | 76.965 | 35 | POLYGON ((-79.94562 43.16920, -79.94638 43.167... |
| 4 | 10006.0 | 2021S05070010006.00 | 6.0 | 1.0467 | 10 | POLYGON ((-52.71107 47.56251, -52.71143 47.562... |

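Before exporting, we simplify the polygons. Datawrapper has an upload size limit of 2MB, so we use `.simplify()` to drop vertices and bring the file size down to an acceptable level. Because the data is now in EPSG:4326, the tolerance is measured in degrees: 0.0001° is roughly 11 m of latitude.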

In [105]:
simple_tracts = tracts.copy()

# Tolerance is in the layer's units, i.e. degrees after the reprojection above
simple_tracts["geometry"] = tracts["geometry"].simplify(tolerance=0.0001)

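Now we loop over every CMA and CA in our list. The last three characters of each ID are the CMA/CA code, and every census tract DGUID starts with `2021S0507` followed by that code, so we select the tracts whose DGUID begins with that prefix. If any tracts match (only CMAs and the larger CAs are divided into census tracts), we write them out as a GeoJSON file named after the CMA or CA.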

In [232]:
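from pathlib import Path

out_dir = Path("./data/cities")
out_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if needed

# One row per ID, so each CMA/CA is exported once
cma_names = cma_list.loc[~cma_list.index.duplicated(), "NAME"]

for cma_id, cma_name in cma_names.items():
    # Filename-friendly name: trimmed, lowercase, no spaces
    slug = cma_name.strip().lower().replace(" ", "")
    cma_code = cma_id[-3:]  # last three characters are the CMA/CA code

    # Census tract DGUIDs start with "2021S0507" followed by the CMA/CA code
    data = simple_tracts.loc[simple_tracts["DGUID"].str.startswith("2021S0507" + cma_code)]

    # Only CMAs and larger CAs have census tracts, so skip codes with no matches
    if len(data) > 0:
        data.to_file(out_dir / f"tracts-{slug}.geojson", driver="GeoJSON")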
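
The loop above rescans every tract once per CMA. An alternative is to bucket the tract DGUIDs by CMA/CA code in a single pass. The plain-Python sketch below does that and reuses the loop's filename rule; `group_by_cma`, `slugify` and `TRACT_PREFIX` are hypothetical names, and the DGUIDs in the test come from the sample rows shown earlier.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# 2021 vintage, statistical geography, census tract schema
TRACT_PREFIX = "2021S0507"


def group_by_cma(dguids: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket census tract DGUIDs by the three-digit CMA/CA code after the prefix."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for dguid in dguids:
        if dguid.startswith(TRACT_PREFIX):
            code = dguid[len(TRACT_PREFIX):len(TRACT_PREFIX) + 3]
            groups[code].append(dguid)
    return dict(groups)


def slugify(name: str) -> str:
    """Same filename rule as the loop: trim, lowercase, drop spaces."""
    return name.strip().lower().replace(" ", "")
```

In pandas the same idea should translate to grouping on a slice of the DGUID, e.g. `simple_tracts.groupby(simple_tracts["DGUID"].str[9:12])`, and then looking each code up in `cma_list`.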
    

That's it. The repo's `data/cities` folder should now be populated with usable GeoJSON files of census tracts, one per CMA or CA, ready for Datawrapper maps based on the 2021 census boundaries.
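
Since the point of simplifying was to stay under Datawrapper's 2MB upload limit, it is worth checking the output before uploading. A minimal sketch using only `pathlib`; `oversized_geojson` is a hypothetical helper, and reading "2MB" as 2 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path
from typing import List, Tuple

# Assumption: "2MB" means 2 * 1024 * 1024 bytes; adjust if Datawrapper counts differently
MAX_BYTES = 2 * 1024 * 1024


def oversized_geojson(folder: Path, limit: int = MAX_BYTES) -> List[Tuple[str, int]]:
    """List (filename, size in bytes) for every .geojson file in `folder` above `limit`."""
    too_big = []
    for path in sorted(folder.glob("*.geojson")):
        size = path.stat().st_size
        if size > limit:
            too_big.append((path.name, size))
    return too_big
```

Calling `oversized_geojson(Path("./data/cities"))` after the export loop should return an empty list; any file it reports is a candidate for a larger `simplify` tolerance.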

\-30\-