A collection of WPRDC-relevant tools and data analyses.
- WPRDC Property Dashboard - The Regional Data Center's Property Dashboard integrates data from multiple data sources and provides it all in one place. We're happy to announce that the beta version of the tool is now live, and we built it in-house.
- WPRDC's Parcels n'at - Parcels n'at allows you to pull a variety of data by neighborhood, municipality, or even for a user-defined area using some of the embedded drawing tools.
Data analyses by dataset
- conorotompkins/pittsburgh_311 - Conor Tompkins' R-based analysis of Pittsburgh's 311 data.
- The Use of 311 Requests as a Measure of Neighborhood Conditions in the City of Pittsburgh - An article reporting on the analysis of Pittsburgh's 311 data, by UCSUR's own Don Musa, published in UCSUR's PEQ (Pittsburgh Economic Quarterly).
- WPRDC/Jupyter-notebooks-by-dataset/air-quality-exploration - A demonstration of how to dig into the Allegheny County air-quality data to pull out and plot particular measurements.
- eleanortutt/codeforpgh-20180613 - Eleanor Tutt's demonstration of how to load data into Pandas dataframes and how to make choropleth maps.
- WPRDC/Jupyter-notebooks-by-dataset/Crash-Data-Analysis - WPRDC data analysis of Allegheny County crash data, demonstrating a few methods for loading data from CKAN and manipulating data.
- conorotompkins/allegheny_crashes - Conor Tompkins' R-based analysis of Allegheny County crash data and the accompanying blog post, showing some interesting visualizations.
- conorotompkins/pgh-crime - Conor Tompkins' R-based analysis of Pittsburgh crime data.
- ZacharyGoldstein/pgh-juvenile-arrests - Zach Goldstein's analysis of Pittsburgh arrests data for a WESA article on juvenile arrests. You can also view the Jupyter notebook with plots.
- nrfulton/pittdoggos - Nathan Fulton's Python scripts for analyzing county (but not city) dog-license data and a simple Web page for searching dogs by name.
- conorotompkins/allegheny_overdoses - Conor Tompkins' R-based analysis of the County Fatal Accidental Overdoses data.
- conorotompkins/healthy_ride - Conor Tompkins' R-based analysis of Healthy Ride data, now updated to include time-series forecasting.
- sdl60660/pittsburgh-steps - Sam Learner's series of visualizations of the Pittsburgh Steps dataset, which you can scroll through to learn about the many public staircases in Pittsburgh. Made with D3, Svelte, and Mapbox.
- Fair Housing Project - A class project by Tara Schroth, Stephen Vandrak, Gloria Givler, and Annie Goodwin for Professor Amin Rahimian's Data for Social Good course at the University of Pittsburgh. Demonstrates joining, cleaning, and analyzing tables using Pandas dataframes. Also demonstrates geocoding (using geopy) and mapping, and uses the Community Assets dataset for Allegheny County.
- The following is not an analysis specifically of the WPRDC dataset, but it's based partially on the source of that dataset (the EPA's Toxics Release Inventory). From the "Data is Plural" newsletter:
ProPublica published what it’s calling "the most detailed map of cancer-causing industrial air pollution in the U.S.," along with an investigation based on the map's revelations. In a methodology article, reporters explain how they analyzed billions of rows of data from the Environmental Protection Agency's Risk-Screening Environmental Indicators model, which "takes a variety of inputs, including emissions data, weather modeling, and facility specific information, and puts out estimated concentrations of toxic chemicals in the air around industrial facilities." The EPA publishes the model's output as bulk downloads, in an online dashboard, and in other formats.
Examples demonstrating different skills
Learning Python/Jupyter notebooks/data analysis
- WPRDC/urban-informatics-and-visualization - A WPRDC fork of Paul Waddell's course materials for Urban Informatics and Visualization. These Jupyter notebooks cover Python/Jupyter fundamentals, cleaning/manipulating/analyzing/visualizing/mapping data, and using Web APIs for getting and posting data.
Making bar charts
Making SQL queries on WPRDC data
Pulling WPRDC data through the CKAN API
Transforming from a long-format data table to a wide-format data table
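As a rough illustration of the long-to-wide transformation, here is a minimal sketch using only the Python standard library. The table, its field names, and the `long_to_wide` helper are all hypothetical and chosen for illustration; with Pandas, `DataFrame.pivot` is the usual tool for the same job.

```python
from collections import defaultdict

# Hypothetical long-format rows: one record per (neighborhood, year) pair.
long_rows = [
    {"neighborhood": "Oakland", "year": 2019, "requests": 120},
    {"neighborhood": "Oakland", "year": 2020, "requests": 95},
    {"neighborhood": "Shadyside", "year": 2019, "requests": 80},
    {"neighborhood": "Shadyside", "year": 2020, "requests": 110},
]


def long_to_wide(rows, index, column, value):
    """Return one record per distinct `index` value, with one key per distinct `column` value."""
    wide = defaultdict(dict)
    for row in rows:
        # Each distinct value of `column` becomes its own key in the wide record.
        wide[row[index]][row[column]] = row[value]
    # Flatten into a list of records, keeping the index value under its own key.
    return [{index: key, **cells} for key, cells in wide.items()]


wide_rows = long_to_wide(long_rows, "neighborhood", "year", "requests")
for record in wide_rows:
    print(record)
```

Each neighborhood ends up as a single record with one entry per year, which is the shape most charting and mapping tools expect.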
- Tutorial on using R to analyze pothole data - Material from a workshop run by Conor Tompkins, using Pittsburgh's 311 data on potholes to teach the basics of using R (including manipulating data and making charts and maps).
Principal component analysis in R
- Principal Component Analysis in R - A blog post explaining the application of principal component analysis to Pittsburgh's 311 data.
Time-series forecasting in R
- Forecasting Healthy Ride ridership - A blog post describing the use of the prophet package to extract seasonality features and predict the variation in Healthy Ride bike-ride counts.
CKAN API usage under R + debugging a broken SQL query
- Using the CKAN API wrapper + converting string fields to integers in SQL queries - Addressing a common pitfall when running SQL queries, this R script shows how to convert a string field to an integer and then use it in the WHERE clause of a SQL query. This also gives a simple example of using the ckanr wrapper package to more easily use the CKAN API.
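The pitfall is that text fields compare lexicographically, so a condition like "greater than 9" on a string field excludes "10". The linked script is in R; the sketch below shows the same idea in Python and is not taken from that script. It assumes the portal exposes CKAN's `datastore_search_sql` action at the URL shown, and the resource ID and field name are placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint: CKAN's datastore_search_sql action on the WPRDC portal.
API_URL = "https://data.wprdc.org/api/3/action/datastore_search_sql"


def build_query_url(resource_id, field, minimum):
    """Build a query URL whose SQL casts a text field to an integer before comparing."""
    sql = (
        f'SELECT * FROM "{resource_id}" '
        f'WHERE CAST("{field}" AS INTEGER) >= {int(minimum)}'
    )
    # urlencode percent-escapes the quotes, spaces, and operators in the SQL.
    return API_URL + "?" + urlencode({"sql": sql})


# Placeholder resource ID and field name, for illustration only.
url = build_query_url("00000000-0000-0000-0000-000000000000", "year", 2015)
print(url)
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) should return JSON in which, under standard CKAN behavior, the matching rows sit under `result` and then `records`.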
Handling Census data + network analysis in R
- Analyzing commuter patterns in Allegheny County - Conor Tompkins describes how to use R to manipulate Census data about commuting to study and map the most common starting and ending points for travelling between work and home, revealing that a huge number of people commute to downtown Pittsburgh with sizable numbers travelling to work in Oakland, Findlay Township, Moon Township, and Robinson Township.
Useful tools and code libraries
General tools for working with data
- saulpw/visidata - A terminal spreadsheet multitool for discovering and arranging data. (It's basically a Swiss Army chainsaw for manipulating tabular data.)
- Downstream - An online tool for downloading any tabular data on the WPRDC data portal (even very large tables) as a CSV or TSV file. Small- to medium-sized tables may also be downloaded as Excel files.
Code for dealing with Census data
- ljwolf/cenpy - A Python library for exploring and querying the US Census API and returning Pandas DataFrames.
- datamade/census - A Python wrapper for the U.S. Census API.
- datadesk/census-error-analyzer - Given two Census values and the corresponding margins of error, this Python library can do an analysis to determine whether there is a statistically significant difference between them.
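The significance check described for census-error-analyzer can be sketched by hand. The version below does not use that library's API; it applies the standard z-test for two estimates, assuming the margins of error are published at the 90% confidence level (as American Community Survey margins are). The function name and sample numbers are hypothetical.

```python
import math

# Critical value for the 90% confidence level assumed for the published margins of error.
Z_90 = 1.645


def is_significant(est_a, moe_a, est_b, moe_b, z_crit=Z_90):
    """Return (significant, z) for the difference between two estimates."""
    # Convert each 90% margin of error back to a standard error.
    se_a = moe_a / Z_90
    se_b = moe_b / Z_90
    # z-score of the difference, treating the two estimates as independent.
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(z) > z_crit, z


# Hypothetical estimates and margins of error for two areas.
significant, z = is_significant(5000, 300, 4500, 250)
print(significant, round(z, 2))
```

A difference counts as significant only when the absolute z-score exceeds the critical value, so two estimates with widely overlapping margins will usually fail the test.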
Code for geospatial manipulation
- mggg/maup - "The geospatial toolkit for redistricting data", a Python package designed to facilitate conversion between spatial regions used for elections (e.g., precincts) and spatial regions used by the Census to collect demographic information (e.g., blocks).
Code/service for calculating routes between locations
- Jupyter notebook examples of how to use the API of the OpenRouteService routing calculator, which currently has a generous free tier, providing not just routing/directions, but maps, geocoding, isochrones, and elevation information.
Other relevant data
- See our Other Useful Datasets GitHub repo for links to non-WPRDC datasets that could be helpful when working with WPRDC datasets.
Please contribute things that could be useful to others using WPRDC data, including scripts for data cleaning or analyzing data, Jupyter notebooks for particular datasets, and tools for manipulating and visualizing the data.