Scripts used to set up census-postgres on an Amazon EC2 instance.
00_connect_ebs_volume.sh
01_set_up_postgresql.sh remove superfluous sudo. closes #1. Thanks @pnorman Jun 24, 2014
02_download_acs_2007_1yr.sh
02_download_acs_2007_3yr.sh Update download scripts to include installing aria2 Sep 15, 2016
02_download_acs_2008_1yr.sh
02_download_acs_2008_3yr.sh Update download scripts to include installing aria2 Sep 15, 2016
02_download_acs_2009_1yr.sh Update download scripts to include installing aria2 Sep 15, 2016
02_download_acs_2009_3yr.sh Update download scripts to include installing aria2 Sep 15, 2016
02_download_acs_2009_5yr.sh
02_download_acs_2010_1yr.sh
02_download_acs_2010_3yr.sh Update download scripts to include installing aria2 Sep 15, 2016
02_download_acs_2010_5yr.sh Update download scripts to include installing aria2 Sep 15, 2016
02_download_acs_2011_1yr.sh Update download scripts to include installing aria2 Sep 15, 2016
02_download_acs_2011_3yr.sh Update download scripts to include installing aria2 Sep 15, 2016
02_download_acs_2011_5yr.sh Update download scripts to include installing aria2 Sep 15, 2016
02_download_acs_2012_1yr.sh Update download scripts to include installing aria2 Sep 15, 2016
02_download_acs_2012_3yr.sh Update download scripts to include installing aria2 Sep 15, 2016
02_download_acs_2012_5yr.sh
02_download_acs_2013_1yr.sh
02_download_acs_2013_3yr.sh Update download scripts to include installing aria2 Sep 15, 2016
02_download_acs_2013_5yr.sh Update download scripts to include installing aria2 Sep 15, 2016
02_download_acs_2014_1yr.sh
02_download_acs_2014_5yr.sh Update download scripts to include installing aria2 Sep 15, 2016
02_download_acs_2015_1yr.sh Add a rename step for the post-download step of tabulation metadata Sep 16, 2016
02_download_acs_2015_5yr.sh Fixing URLs for 2015/2016 5yr downloads Sep 14, 2018
02_download_acs_2016_1yr.sh Add import scripts for 2016 1yr + TIGER 2016 Sep 16, 2017
02_download_acs_2016_5yr.sh Fixing URLs for 2015/2016 5yr downloads Sep 14, 2018
02_download_acs_2017_1yr.sh Fixing URLs for 2015/2016 5yr downloads Sep 14, 2018
02_download_acs_2017_5yr.sh Trying to handle the differences in the 2017 5-year release Dec 21, 2018
03_import_acs_2007_1yr.sh Switching back to sudo postgres for the files that use COPY. Dec 19, 2013
03_import_acs_2007_3yr.sh
03_import_acs_2008_1yr.sh Switching back to sudo postgres for the files that use COPY. Dec 19, 2013
03_import_acs_2008_3yr.sh Switching back to sudo postgres for the files that use COPY. Dec 19, 2013
03_import_acs_2009_1yr.sh
03_import_acs_2009_3yr.sh Switching back to sudo postgres for the files that use COPY. Dec 19, 2013
03_import_acs_2009_5yr.sh Switching back to sudo postgres for the files that use COPY. Dec 19, 2013
03_import_acs_2010_1yr.sh Switching back to sudo postgres for the files that use COPY. Dec 19, 2013
03_import_acs_2010_3yr.sh Switching back to sudo postgres for the files that use COPY. Dec 19, 2013
03_import_acs_2010_5yr.sh Switching back to sudo postgres for the files that use COPY. Dec 19, 2013
03_import_acs_2011_1yr.sh
03_import_acs_2011_3yr.sh Switching back to sudo postgres for the files that use COPY. Dec 19, 2013
03_import_acs_2011_5yr.sh Switching back to sudo postgres for the files that use COPY. Dec 19, 2013
03_import_acs_2012_1yr.sh
03_import_acs_2012_3yr.sh Switching back to sudo postgres for the files that use COPY. Dec 19, 2013
03_import_acs_2012_5yr.sh Switching back to sudo postgres for the files that use COPY. Dec 19, 2013
03_import_acs_2013_1yr.sh Adding 2013 1yr scripts. Sep 18, 2014
03_import_acs_2013_3yr.sh
03_import_acs_2013_5yr.sh Add 2013 5yr download scripts Dec 6, 2014
03_import_acs_2014_1yr.sh Don't sudo for 2014 1yr import process Dec 12, 2015
03_import_acs_2014_5yr.sh acs2014_5yr has 121 sequences, not 165. Dec 11, 2015
03_import_acs_2015_1yr.sh Add 2015 1yr scripts for downloading Sep 15, 2016
03_import_acs_2015_5yr.sh Census stopped packaging the 5yr files deep in a directory hierarchy Dec 8, 2016
03_import_acs_2016_1yr.sh
03_import_acs_2016_5yr.sh Geoheader files moved, too Dec 24, 2017
03_import_acs_2017_1yr.sh
03_import_acs_2017_5yr.sh Trying to handle the differences in the 2017 5-year release Dec 21, 2018
05_create_ebs_snapshot.sh Make all the scripts +x. Apr 15, 2013
11_set_up_postgis.sh
12_download_tiger_2012.sh legislative districts redrawn, ACS 2012 data is aligned with new dist… Jul 2, 2014
12_download_tiger_2013.sh Add tiger 2013 release scripts Dec 12, 2014
12_download_tiger_2014.sh Add 2014 tiger download/import/index as well Dec 4, 2015
12_download_tiger_2015.sh Add the TIGER 2015 download, import, and index scripts Sep 16, 2016
12_download_tiger_2016.sh it's aitsn, not aits Jan 3, 2018
12_download_tiger_2017.sh Add download/import/index steps for 2017 1yr + tiger Sep 14, 2018
13_import_tiger_2012.sh be explicit about geom column name. some versions of shp2pgsql use di… Jul 2, 2014
13_import_tiger_2013.sh Switch to PGHOST instead of sudo for 2013 tiger import script Dec 13, 2015
13_import_tiger_2014.sh
13_import_tiger_2015.sh
13_import_tiger_2016.sh it's aitsn, not aits Jan 3, 2018
13_import_tiger_2017.sh
13_index_tiger_2012.sql index sumlevel on census_name_lookup Jul 2, 2016
13_index_tiger_2013.sql index sumlevel on census_name_lookup Jul 2, 2016
13_index_tiger_2014.sql index sumlevel on census_name_lookup Jul 2, 2016
13_index_tiger_2015.sql There's no acs2015_5yr yet Sep 17, 2016
13_index_tiger_2016.sql Pull population in from 2016 acs Jan 3, 2018
13_index_tiger_2017.sql Add download/import/index steps for 2017 1yr + tiger Sep 14, 2018
LICENSE add MIT license. Fixes #18 Oct 24, 2018
README.md Update README.md Sep 20, 2016
aiannh_table_creation.sql update for 2017 cf #12 censusreporter/censusreporter#34 Oct 24, 2018


census-postgres scripts

A set of scripts to make it easier to set up census-postgres on an Amazon EC2 instance.

Use Existing Data

If you just want to use American Community Survey data on your own PostgreSQL machine, it's recommended to use the dumps that Census Reporter provides. Read more about those dumps in our Tumblr post.

Use the next section only if you want to go through the process of rebuilding these data dumps from scratch.

From Scratch

These are the steps I follow when I want to start from scratch and load all ACS releases into the database.

  1. Launch a c1.xlarge instance using the most recent Ubuntu 14.04 image, making sure to map all four ephemeral (instance store) volumes to block devices during the setup walkthrough. If you have the aws command line tool installed and configured, this command should do it by requesting a spot instance. Fill in the AMI ID, and note that --dry-run only checks whether the request would be allowed without launching anything, so remove it once that check passes:

    aws ec2 request-spot-instances --dry-run \
        --spot-price 1.5 \
        --instance-count 1 \
        --launch-specification '{
            "InstanceType": "c1.xlarge",
            "ImageId": "ami-xxxxxxx",
            "BlockDeviceMappings": [
                {"VirtualName": "ephemeral0", "DeviceName": "/dev/sdb"},
                {"VirtualName": "ephemeral1", "DeviceName": "/dev/sdc"},
                {"VirtualName": "ephemeral2", "DeviceName": "/dev/sdd"},
                {"VirtualName": "ephemeral3", "DeviceName": "/dev/sde"}
            ]
        }'
  2. Connect to it and immediately launch screen, so that long-running jobs survive a dropped SSH connection.
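If you would rather not hand-edit the launch specification, here is a minimal sketch that generates the same JSON from the shell. The helper names ephemeral_mappings and launch_spec are hypothetical (not part of census-postgres-scripts), and the ephemeralN to /dev/sdX mapping simply follows the command in step 1.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of census-postgres-scripts): build the
# --launch-specification JSON for request-spot-instances from the shell.

# Emit a BlockDeviceMappings array for COUNT instance store volumes,
# mapping ephemeral0 -> /dev/sdb, ephemeral1 -> /dev/sdc, and so on.
ephemeral_mappings() {
  local count=$1 letters=bcdefghijklmnopqrstuvwxyz sep="" i
  printf '['
  for ((i = 0; i < count; i++)); do
    printf '%s{"VirtualName": "ephemeral%d", "DeviceName": "/dev/sd%s"}' \
      "$sep" "$i" "${letters:i:1}"
    sep=", "
  done
  printf ']\n'
}

# Emit a complete launch specification.
# Usage: launch_spec INSTANCE_TYPE AMI_ID VOLUME_COUNT
launch_spec() {
  printf '{"InstanceType": "%s", "ImageId": "%s", "BlockDeviceMappings": %s}\n' \
    "$1" "$2" "$(ephemeral_mappings "$3")"
}

# Example (fill in a real AMI ID; drop --dry-run to actually submit):
# aws ec2 request-spot-instances --dry-run --spot-price 1.5 --instance-count 1 \
#   --launch-specification "$(launch_spec c1.xlarge ami-xxxxxxx 4)"
```

Building the JSON with printf keeps all the quoting in one place. If you switch instance types, set the volume count to the number of instance store volumes that type actually provides.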

Set up disk

(From this link)

sudo su -
apt-get update
apt-get -y install mdadm xfsprogs
umount /mnt
yes | mdadm --create /dev/md0 --level=0 -c256 --raid-devices=4 /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
echo 'DEVICE /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde' > /etc/mdadm/mdadm.conf
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
blockdev --setra 65536 /dev/md0
mkfs.xfs -f /dev/md0
mkdir -p /mnt && mount -t xfs -o noatime /dev/md0 /mnt
exit
cd /mnt

After this, df -h should show about 1.7TB of disk available on /dev/md0, mounted at /mnt.
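Before putting PostgreSQL on the array, it can help to confirm that /mnt really is backed by /dev/md0 and not by the original ephemeral mount. This is a hypothetical sketch; the function names are illustrative, and the defaults assume the /mnt and /dev/md0 names used above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: confirm that a path sits on the expected block device.

# Print the device backing the filesystem that contains PATH.
mount_source_of() {
  # df -P prints a header line, then one unwrapped line for the filesystem.
  df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Print the total size of that filesystem in whole GiB.
size_gib_of() {
  # With -k, column 2 is the size in 1024-byte blocks.
  df -P -k "$1" | awk 'NR == 2 { printf "%d\n", $2 / 1024 / 1024 }'
}

# Succeed if PATH (default /mnt) is backed by DEVICE (default /dev/md0).
check_raid_mount() {
  local path=${1:-/mnt} expected=${2:-/dev/md0} src
  src=$(mount_source_of "$path")
  if [ "$src" != "$expected" ]; then
    echo "WARNING: $path is backed by $src, not $expected" >&2
    return 1
  fi
  echo "$path is on $expected ($(size_gib_of "$path") GiB)"
}
```

Run check_raid_mount with no arguments right after the mount; a non-zero exit status means /mnt is not on the RAID array yet.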

Set up postgres

Install PostgreSQL 9.1 and move its data directory onto the /dev/md0 RAID array created above, which is mounted at /mnt.

sudo su -
apt-get install -y postgresql-9.1 postgresql-9.1-postgis
/etc/init.d/postgresql stop
mkdir /mnt/postgresql
sed -i "s/data_directory = '\/var\/lib\/postgresql\/9.1\/main'/data_directory = '\/mnt\/postgresql\/9.1\/main'/" /etc/postgresql/9.1/main/postgresql.conf
mv /var/lib/postgresql/9.1 /mnt/postgresql/
chown -R postgres:postgres /mnt/postgresql
/etc/init.d/postgresql start
exit
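The escaped sed one-liner above is easy to mistype. Here is a minimal sketch of the same edit using | as the sed delimiter, plus a reader to double-check the result. The function names are hypothetical, and the sketch assumes GNU sed (for -i), as shipped with Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: rewrite and read data_directory in a postgresql.conf.
# Using "|" as the sed delimiter avoids escaping every "/" in the paths.

# Usage: set_data_directory CONF OLD_DIR NEW_DIR
set_data_directory() {
  local conf=$1 old_dir=$2 new_dir=$3
  sed -i "s|data_directory = '${old_dir}'|data_directory = '${new_dir}'|" "$conf"
}

# Usage: get_data_directory CONF  (prints the configured path, if any)
get_data_directory() {
  sed -n "s|^data_directory = '\([^']*\)'.*|\1|p" "$1"
}

# Example, run as root in place of the escaped sed command:
# set_data_directory /etc/postgresql/9.1/main/postgresql.conf \
#   /var/lib/postgresql/9.1/main /mnt/postgresql/9.1/main
```

After PostgreSQL restarts, sudo -u postgres psql -c 'SHOW data_directory;' should report /mnt/postgresql/9.1/main.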

Create a user and database for census data.

sudo -u postgres psql -c "CREATE ROLE census WITH NOSUPERUSER LOGIN UNENCRYPTED PASSWORD 'censuspassword';"
sudo -u postgres psql -c "CREATE DATABASE census WITH OWNER census;"
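If you choose a real password instead of censuspassword, it has to be quoted as an SQL string literal. A minimal sketch, assuming standard_conforming_strings is on (the default since PostgreSQL 9.1), so doubling single quotes is the only escaping needed; sql_literal and CENSUS_PASSWORD are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: quote a value as an SQL string literal by doubling
# any single quotes (assumes standard_conforming_strings = on).
sql_literal() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Example with a password taken from the environment:
# sudo -u postgres psql -c "CREATE ROLE census WITH NOSUPERUSER LOGIN UNENCRYPTED PASSWORD $(sql_literal "$CENSUS_PASSWORD");"
```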

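Make logins passwordless by storing the password in the ubuntu user's .pgpass file. libpq ignores this file unless only its owner can access it, hence the chmod.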

echo "localhost:5432:census:census:censuspassword" > /home/ubuntu/.pgpass
chmod 0600 /home/ubuntu/.pgpass

After this you should be able to connect to your empty PostgreSQL database:

psql -h localhost -U census -d census
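The echo line above overwrites any other entries already in .pgpass. Here is a minimal sketch of an idempotent alternative. add_pgpass_entry and pgpass_escape are hypothetical helpers, and the escaping rule (a backslash before any : or \ inside a field) comes from the libpq password file format.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: add a .pgpass entry without clobbering the file.
# Each line is hostname:port:database:username:password; a literal ":" or "\"
# inside a field must be escaped with a backslash.

pgpass_escape() {
  printf '%s' "$1" | sed 's/[\\:]/\\&/g'
}

# Usage: add_pgpass_entry HOST PORT DB USER PASSWORD [FILE]
add_pgpass_entry() {
  local file=${6:-$HOME/.pgpass} entry field
  local -a parts=()
  for field in "$1" "$2" "$3" "$4" "$5"; do
    parts+=("$(pgpass_escape "$field")")
  done
  entry=$(IFS=:; printf '%s' "${parts[*]}")
  touch "$file"
  # libpq ignores .pgpass if group or others have any access to it.
  chmod 0600 "$file"
  # Append only if this exact line is not already present.
  grep -qxF -- "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
}

# Example matching the entry above:
# add_pgpass_entry localhost 5432 census census censuspassword
```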

Download Census Bureau ACS data

Download the raw data from the Census Bureau to prepare for loading it into the database you just created. We'll use aria2 to download these files as quickly as possible.

sudo mkdir -p /mnt/tmp
sudo chown ubuntu /mnt/tmp
sudo apt-get install -y aria2 git
cd /home/ubuntu
git clone https://github.com/censusreporter/census-postgres-scripts.git
git clone https://github.com/censusreporter/census-postgres.git
cd census-postgres-scripts

With this set up, we can use the scripts I wrote to download the data from the Census Bureau in a relatively consistent manner. It probably makes sense to run at least some of these in parallel across several screen sessions. If a recent ACS release doesn't show up on this list, follow the new release checklist to create a download script for it before continuing.

./02_download_acs_2007_1yr.sh
./02_download_acs_2007_3yr.sh
./02_download_acs_2008_1yr.sh
./02_download_acs_2008_3yr.sh
./02_download_acs_2009_1yr.sh
./02_download_acs_2009_3yr.sh
./02_download_acs_2009_5yr.sh
./02_download_acs_2010_1yr.sh
./02_download_acs_2010_3yr.sh
./02_download_acs_2010_5yr.sh
./02_download_acs_2011_1yr.sh
./02_download_acs_2011_3yr.sh
./02_download_acs_2011_5yr.sh
./02_download_acs_2012_1yr.sh
./02_download_acs_2012_3yr.sh
./02_download_acs_2012_5yr.sh

An hour or two and 279GB later you should have a directory at /mnt/tmp full of raw, expanded Census Bureau ACS data.
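Instead of juggling several screen sessions by hand, the download scripts can be run a few at a time from a single session. This sketch swaps in xargs -P for that; run_in_parallel and the LOGDIR variable are hypothetical names, and the right level of concurrency depends on the instance's bandwidth and disk throughput.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run scripts concurrently, at most JOBS at a time,
# each writing its output to its own log file.
# Usage: run_in_parallel JOBS SCRIPT...   (logs go to $LOGDIR, default /mnt/tmp/logs)
run_in_parallel() {
  local jobs=$1
  shift
  local logdir=${LOGDIR:-/mnt/tmp/logs}
  mkdir -p "$logdir"
  # GNU xargs exits with status 123 if any script exits with status 1-125.
  printf '%s\n' "$@" |
    xargs -P "$jobs" -I {} sh -c '"$1" > "$2/$(basename "$1").log" 2>&1' _ {} "$logdir"
}

# Example, from /home/ubuntu/census-postgres-scripts inside one screen session:
# run_in_parallel 4 ./02_download_acs_2007_1yr.sh ./02_download_acs_2007_3yr.sh \
#   ./02_download_acs_2008_1yr.sh ./02_download_acs_2008_3yr.sh
```

Keeping one log per script makes it easy to spot which release failed without scrolling back through interleaved output.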

Importing ACS Data

Once we have the ACS data downloaded, it's time to load it into PostgreSQL. Again, since each release is slightly different, there are several scripts I hand-crafted to do this import in a consistent way. If a recent ACS release doesn't show up on this list, follow the new release checklist to create an import script for it before continuing.

./03_import_acs_2007_1yr.sh
./03_import_acs_2007_3yr.sh
./03_import_acs_2008_1yr.sh
./03_import_acs_2008_3yr.sh
./03_import_acs_2009_1yr.sh
./03_import_acs_2009_3yr.sh
./03_import_acs_2009_5yr.sh
./03_import_acs_2010_1yr.sh
./03_import_acs_2010_3yr.sh
./03_import_acs_2010_5yr.sh
./03_import_acs_2011_1yr.sh
./03_import_acs_2011_3yr.sh
./03_import_acs_2011_5yr.sh
./03_import_acs_2012_1yr.sh
./03_import_acs_2012_3yr.sh
./03_import_acs_2012_5yr.sh
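To check that every release actually landed in the database, you can count the tables in each release's schema. This sketch rests on an assumption: that each import script loads into a schema named after its release (for example acs2012_5yr, the style of name that appears in the commit messages above), which is not confirmed here. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report how many tables each release's schema holds.
# ASSUMPTION: 03_import_acs_2012_5yr.sh loads into a schema named acs2012_5yr.

schema_for_script() {
  local name
  name=$(basename "$1" .sh)   # 03_import_acs_2012_5yr
  name=${name#03_import_}     # acs_2012_5yr
  printf '%s\n' "${name/_/}"  # acs2012_5yr (first "_" removed)
}

# Usage: count_tables SCRIPT...   (relies on ~/.pgpass for the census role)
count_tables() {
  local script schema
  for script in "$@"; do
    schema=$(schema_for_script "$script")
    printf '%s\t' "$schema"
    # pg_tables lists every table, regardless of which role created it.
    psql -h localhost -U census -d census -tAc \
      "SELECT count(*) FROM pg_tables WHERE schemaname = '$schema'"
  done
}

# Example: a count of 0 suggests that release's import did not run or failed.
# count_tables ./03_import_acs_2012_1yr.sh ./03_import_acs_2012_3yr.sh ./03_import_acs_2012_5yr.sh
```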

Importing ACS Metadata

We just imported the estimate/error values for the ACS data. We also rely on tables that describe what the various Census tables and columns mean and how they relate to each other. We'll load that information here.

cd /home/ubuntu
git clone https://github.com/censusreporter/census-table-metadata.git
cd census-table-metadata
psql -U census -d census -h localhost -f census_metadata.sql
sudo -u postgres psql -d census -f census_metadata_load.sql
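By default, psql -f reports a failed statement and carries on with the rest of the file, which can leave the metadata half-loaded. A minimal wrapper that sets psql's ON_ERROR_STOP variable makes the load halt at the first error, with psql exiting with status 3. run_sql_file and the PSQL override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run an SQL file with psql, stopping at the first error.
# Usage: run_sql_file FILE [extra psql args...]   ($PSQL overrides the psql binary)
run_sql_file() {
  local file=$1
  shift
  "${PSQL:-psql}" -v ON_ERROR_STOP=1 "$@" -f "$file"
}

# Examples mirroring the two commands above (sudo cannot call a shell
# function, so the second one spells out the psql flags directly):
# run_sql_file census_metadata.sql -U census -d census -h localhost
# sudo -u postgres psql -v ON_ERROR_STOP=1 -d census -f census_metadata_load.sql
```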

Setting up PostGIS and Importing TIGER Geodata

The geodata part of our APIs comes from the Census Bureau's TIGER 2012 dataset. Let's download and import it.

cd /home/ubuntu/census-postgres-scripts
./11_set_up_postgis.sh
./12_download_tiger_2012.sh
./13_import_tiger_2012.sh
psql -U census -d census -h localhost -f 13_index_tiger_2012.sql
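The same three steps repeat for every TIGER year that has scripts in this repository. Here is a minimal sketch that chains them and stops at the first failure. load_tiger_year is a hypothetical helper, and it assumes you run it from /home/ubuntu/census-postgres-scripts after ./11_set_up_postgis.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download, import, and index one TIGER year,
# stopping at the first step that fails.
# Usage: load_tiger_year YEAR   ($PSQL overrides the psql binary)
load_tiger_year() {
  local year=$1
  ./"12_download_tiger_${year}.sh" &&
    ./"13_import_tiger_${year}.sh" &&
    "${PSQL:-psql}" -v ON_ERROR_STOP=1 -U census -d census -h localhost \
      -f "13_index_tiger_${year}.sql"
}

# Example:
# load_tiger_year 2012
```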

New ACS Release Actions

What to do when the Census Bureau releases a new set of ACS data.