<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Read-in-census-blocks" data-toc-modified-id="Read-in-census-blocks-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Read in census blocks</a></span></li><li><span><a href="#Minor-preprocessing" data-toc-modified-id="Minor-preprocessing-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Minor preprocessing</a></span></li><li><span><a href="#Save" data-toc-modified-id="Save-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Save</a></span></li></ul></div>

**Description**: Reads and processes the shapefile of 2010 census blocks and
saves the result as a pickled geopandas GeoDataFrame.

---

In [1]:
import pickle
from pathlib import Path

import geopandas as gpd

In [2]:
data_path = Path('../../data')

# Read in census blocks

In [3]:
blocks_path = data_path / 'raw/Boundaries - Census Blocks - 2010'
blocks = gpd.read_file(str(blocks_path))

# Minor preprocessing

Convert the numeric identifier columns, which are read in as strings, to integers (string -> int).

In [4]:
num_cols = [
    'statefp10', 'countyfp10', 'tractce10', 'geoid10', 'blockce10',
    'tract_bloc'
]
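# Cast to 64-bit integers explicitly: 'geoid10' has 15 digits and would not
# fit in a 32-bit integer on platforms where plain `int` maps to int32.
blocks[num_cols] = blocks[num_cols].astype('int64')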
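
As a minimal sketch of what this conversion does, the following applies the same cast to a small toy table. The values are made up for illustration and plain pandas stands in for the GeoDataFrame. Note that the cast drops leading zeros, so a county code such as '031' becomes 31.

```python
import pandas as pd

# Toy stand-in for the census-block attribute table (values are made up).
toy = pd.DataFrame({
    'countyfp10': ['031', '031'],
    'geoid10': ['170310101001000', '170310101001001'],
})

# Same string -> integer cast as above, with an explicit 64-bit dtype.
toy_int = toy.astype('int64')

print(toy_int.dtypes)
print(toy_int)
```

If the zero-padded codes are needed later (for example to join against other census tables keyed by string), it may be worth keeping a string copy of those columns.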

The following drops 46 duplicate rows. These rows are identical in every
column, including geometry. The `drop_duplicates` method does not work on the shapely Polygon geometry column, so the
duplicates are identified by 'tract_bloc' alone, which is sufficient here (this was tested).

In [5]:
blocks = blocks.drop_duplicates(subset=['tract_bloc'])
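
One way such a test could look is sketched below. It is not necessarily the check that was run for this notebook. The helper `key_duplicates_are_full_duplicates` is hypothetical, and the toy table uses a bytes column (`geometry_wkb`, also a made-up name) as a hashable stand-in for the geometry.

```python
import pandas as pd


def key_duplicates_are_full_duplicates(df, key):
    """Return True if all rows sharing `key` are identical in every column.

    Hypothetical helper: every column of `df` must be hashable.
    """
    n_distinct_rows = len(df.drop_duplicates())
    n_distinct_keys = len(df.drop_duplicates(subset=[key]))
    # Equal counts mean each key value maps to exactly one distinct row.
    return n_distinct_rows == n_distinct_keys


# Toy data (made up): the first two rows are exact duplicates.
toy = pd.DataFrame({
    'tract_bloc': [1001, 1001, 1002],
    'geometry_wkb': [b'\x01', b'\x01', b'\x02'],
})
same = key_duplicates_are_full_duplicates(toy, 'tract_bloc')

# Counter-example: same key, different geometry.
toy_bad = toy.copy()
toy_bad.loc[1, 'geometry_wkb'] = b'\x03'
different = key_duplicates_are_full_duplicates(toy_bad, 'tract_bloc')

print(same, different)
```

For the real data, the geometry column would first have to be converted to a hashable representation such as WKB bytes. Recent geopandas versions provide `GeoSeries.to_wkb()` for this, though whether it is available depends on the installed version.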

# Save

In [6]:
with (data_path / 'interim/blocks.pkl').open('wb') as f:
    pickle.dump(blocks, f)
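
To read the result back in a later notebook, the pickle can be loaded the same way it was written. The sketch below shows the round trip with two hypothetical helpers, `save_pickle` and `load_pickle`, a temporary directory, and a plain dictionary standing in for the GeoDataFrame.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, path):
    """Write `obj` to `path`, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open('wb') as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Load a pickled object; only use this on files you trust."""
    with path.open('rb') as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / 'interim' / 'demo.pkl'
    original = {'tract_bloc': [1001, 1002]}  # stand-in for `blocks`
    save_pickle(original, demo_path)
    restored = load_pickle(demo_path)

print(restored == original)
```

Loading a pickled GeoDataFrame additionally assumes that compatible versions of the libraries used to write it are installed.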