SQLite database exercises for analyzing Fatal Encounters (police officer involved homicides) and Census data
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
data
examples
.gitignore
README.md
bootstrap.sh
clean-fe-headers.csv
fatal-encounters-and-census-2010.sqlite
fetch.sh
schemas.sql
wrangle.sh

README.md

Studying police-involved homicides and census data with SQLite

This repo contains data from the Fatal Encounters project and Census 2010 Gazetteer -- and the code to bootstrap the data into a SQLite database.

The purpose is to provide an example of how to use SQL joins and aggregations to efficiently summarize and analyze real-world data. Even when the data is as "messy" (I prefer "complicated") as it is from the FE project, because of the manual, decentralized, and crowdsourced data-collection process.

But we can still use SQL to quickly do interesting and wide-scale explorations that would be very difficult with spreadsheet/pivot tables alone. Particularly, the joining of FE data with an entirely different, and more formal dataset: the Census 2010 Gazetteer.

Example: Of the American cities with greater than 100,000 population, which of these cities have the highest count of police-involved homicides, from 2014-2016, when adjusted for population?

Here's the SQL and resulting table, which joins an aggregation of the FE data with Census 2010 city population estimates:

city state population incidents incidents_per_100k
Waco TX 124805 11 8.81
Orlando FL 238300 17 7.13
Kansas City KS 145786 10 6.86
Flint MI 102434 7 6.83
San Bernardino CA 209924 14 6.67
St. Louis MO 319294 21 6.58
Midland TX 111147 7 6.3
Topeka KS 127473 8 6.28
Stockton CA 291707 18 6.17
Knoxville TN 178874 11 6.15
Birmingham AL 212237 13 6.13
Tulsa OK 391906 24 6.12
Beaumont TX 118296 7 5.92
Bakersfield CA 347483 20 5.76
Baton Rouge LA 229493 13 5.66
Pueblo CO 106595 6 5.63
Huntsville AL 180105 10 5.55
Modesto CA 201165 11 5.47
Kansas City MO 459787 25 5.44
Cincinnati OH 296943 16 5.39

(Note: "incidents" are events as recorded by the Fatal Encounters project. It does not necessarily mean "homicides", nevermind police-committed homicides. If you check out the "Related Reading" section at the bottom of this page, you can learn more than you ever wanted to know about how difficult it is to count and classify deaths.)

Contrast that with the (simpler) SQL and table for top 20 cities by total count of incidents (not adjusted for city population), which doesn't use a SQL JOIN to compare the incident total with Census population data:

city state incidents
Houston TX 94
Los Angeles CA 69
Chicago IL 66
Phoenix AZ 48
San Antonio TX 47
Dallas TX 41
Las Vegas NV 30
Detroit MI 29
Oklahoma City OK 28
Albuquerque NM 26
Kansas City MO 25
Tulsa OK 24
San Francisco CA 23
Indianapolis IN 23
Omaha NE 22
Austin TX 22
Washington DC 21
St. Louis MO 21
Tucson AZ 20
Bakersfield CA 20

The data

You can download the SQLite database here: fatal-encounters-and-census-2010.sqlite

The data/ folder contains copies of the source data files. Look at bootstrap.sh and schemas.sql if you're interested in how it was wrangled (requires csvkit and bash and gnu tols)

Sources

Special thanks to D. Brian Burghart and the many volunteers who continue to keep the Fatal Encounters project updated.

Fatal Encounters

The longest-running crowdsourced project to track police-involved homicides in the U.S., with volunteer-researched data as far back as 2000:

Fatal Encounters homepage

The Google Spreadsheet

Census Gazetteer for 2010

The Gazetteer provides an easy-to-download flat (CSV) file of basic Census data by geographical regions. Very useful when all you care about is population by county, zip code, state, etc.

https://www.census.gov/geo/maps-data/data/gazetteer2010.html

Files:

SQL examples

TK: writing a few in the examples folder

But here's a few obvious ones:

Incident count by year
SELECT
  STRFTIME('%Y', date) as year,
  COUNT(*) AS incidentcount
FROM 
  fatal_encounters
GROUP BY year
ORDER BY year ASC;
year incidentcount
2000 817
2001 922
2002 986
2003 1057
2004 1041
2005 1153
2006 1263
2007 1255
2008 1210
2009 1242
2010 1268
2011 1393
2012 1465
2013 1780
2014 1715
2015 1590
2016 1578
2017 1459
Incident count by race
SELECT
  race,
  COUNT(*) AS incidentcount
FROM 
  fatal_encounters
GROUP BY race
ORDER BY incidentcount DESC;
race incidentcount
Race unspecified 9239
European-American/White 6294
African-American/Black 4451
Hispanic/Latino 2667
Asian/Pacific Islander 307
Native American/Alaskan 200
Middle Eastern 36

Top 20 cities by incident total from 2014-2016
SELECT
  city,
  state,
  COUNT(*) AS incidents
FROM fatal_encounters
WHERE
  STRFTIME('%Y', date) IN ('2014', '2015', '2016')
GROUP BY city, state
ORDER BY incidents DESC
LIMIT 20;
city state incidents
Houston TX 94
Los Angeles CA 69
Chicago IL 66
Phoenix AZ 48
San Antonio TX 47
Dallas TX 41
Las Vegas NV 30
Detroit MI 29
Oklahoma City OK 28
Albuquerque NM 26
Kansas City MO 25
Tulsa OK 24
San Francisco CA 23
Indianapolis IN 23
Omaha NE 22
Austin TX 22
Washington DC 21
St. Louis MO 21
Tucson AZ 20
Bakersfield CA 20

Top 20 cities by 2014-2016 incidents per 100k population
WITH tx AS (
  SELECT 
    city
    , state
    , COUNT(*) AS incidents
  FROM fatal_encounters
  WHERE
      STRFTIME('%Y', date) BETWEEN '2014' AND '2016'
  GROUP BY 
    city, state
)

SELECT 
  tx.city
  , tx.state
  , ty.pop10 AS population
  , tx.incidents
  , ROUND(100000.0 * incidents / ty.pop10, 2 ) 
           AS incidents_per_100k
FROM 
  tx
INNER JOIN
    census_places AS ty
    ON
       (tx.city || ' ' || 'city') = ty.name
       AND tx.state = ty.usps
WHERE 
  population > 100000
ORDER BY 
  incidents_per_100k DESC
LIMIT 20;
city state population incidents incidents_per_100k
Waco TX 124805 11 8.81
Orlando FL 238300 17 7.13
Kansas City KS 145786 10 6.86
Flint MI 102434 7 6.83
San Bernardino CA 209924 14 6.67
St. Louis MO 319294 21 6.58
Midland TX 111147 7 6.3
Topeka KS 127473 8 6.28
Stockton CA 291707 18 6.17
Knoxville TN 178874 11 6.15
Birmingham AL 212237 13 6.13
Tulsa OK 391906 24 6.12
Beaumont TX 118296 7 5.92
Bakersfield CA 347483 20 5.76
Baton Rouge LA 229493 13 5.66
Pueblo CO 106595 6 5.63
Huntsville AL 180105 10 5.55
Modesto CA 201165 11 5.47
Kansas City MO 459787 25 5.44
Cincinnati OH 296943 16 5.39

Related reading

GQ Magazine: Meet the Man Who Spends 10 Hours a Day Tracking Police Shootings

(emphasis added)

In 2012, I was the editor of a newspaper. I was driving home from work and I came across this scene of chaos. There were more police cars there than I had ever seen. I realized that either a cop had just killed somebody, or somebody killed a cop.

It was a Friday night and I just came home and I sat down, and I just tried to figure out how often that happens. So I looked at the FBI, and they have a number that they release every year called “justifiable homicides.”

It said that there were about 400 killings a year of people by police. But I just wanted to drill down into that data. And I went straight to Florida, because Florida has a bit of a reputation [of police shootings.] And I figured that those numbers would be high. And yet, according to the FBI, there had never been a person killed by police in the state of Florida.

I realized, at that moment, that the whole thing was kind of a lie.

Reddit IAMA: I’m D. Brian Burghart, a journalist who was offended by the government’s lack of statistics on police-involved deaths, so I started the Fatal Encounters website. AMA!

And of course, there's the important projects run by news organizations, which have also released their data. For this repo so far, I've only included Fatal Encounters because of its much larger scope. These other datasets have (purposeful) limitations and qualities that make them worth studying on their own. Though the joining of their records to Census data is largely the same, given their similar use of recording city and state of each incident:

You can also see a compilation of projects and stories here:

http://2015.padjo.org/briefs/tracking-police-involved-homicides/