How to Find Stories in Data
by Dan Nguyen
Presentation for the Associated Collegiate Press 2017 San Francisco Midwinter Convention.
- Getting involved
- Lists, resources, news for data journalism
- Datasets, databases, and data portals
- How-tos and Reflections
- Stories and projects
- Examples of student work
Get involved with crowdsourced data journalism
Beyond learning how to use a spreadsheet and using it every day for all of your note and record taking (journalism related or not): if you want to do data journalism, join a crowdsourced project. Help them build data for the public, and get first-hand experience of how the nitty-gritty of data collection becomes the fuel for accountability stories.
Data entry is always dull, but it is always necessary. Might as well get into it by doing data entry for a great journalistic purpose.
A few ongoing, nationwide projects:
- Fatal Encounters - A project started by an independent journalist who recognized long before Ferguson how pitiful the official record-keeping was for police shootings.
- Documenting Hate - ProPublica's initiative to collect and count hate crimes and bias incidents and to create a national dataset.
- TrumpWorld - BuzzFeed has logged more than 1,500 of the Trump Administration's business and personal connections. Use their spreadsheet and help them find more connections. (GitHub repo)
- OpenElections - Believe it or not, America's elections do not produce a convenient, centralized source of data. This project aims to create the "most comprehensive election results data in human history".
Blogs, feeds, chats, and lists to follow to learn about data journalism and engage with the community:
MuckRock - This FOIA-filing service is a free, invaluable repository of public records requests (and their responses) and expert knowledge.
Source: An OpenNews project The best place to find both thoughtful essays and deep, technical writeups from journalist-developers, engineers, designers, and advocates.
NICAR-L The mailing list of the National Institute for Computer-Assisted Reporting, and probably the most active mailing list in journalism. It has a mix of investigative journalists and people who are focused purely on the visualization and data science side.
News Nerdery "A Slack channel/international meta organization to foster news nerd collaboration and knowledge sharing."
Sunlight Foundation The Sunlight Foundation is a national, nonpartisan, nonprofit organization that uses technology, open data, policy analysis and journalism to make our government and politics more accountable and transparent to all.
Philip Meyer Journalism Award A contest from The National Institute for Computer-Assisted Reporting that recognizes the best journalism done using social research methods.
The Art and Science of Data-Driven Journalism An exhaustive Tow Center report on the state of data journalism, based on interviews with its top practitioners.
The Data Journalism Handbook A free, open source reference book for anyone interested in the emerging field of data journalism.
IRE Awards The IRE Awards is the annual contest of Investigative Reporters and Editors Inc. recognizing the best in investigative reporting by print, broadcast and online media.
Pulitzer Prize for Public Service Widely considered the most prestigious of the Pulitzer Prizes, awarded for the best accountability reporting to publications big and small.
Data is Plural Newsletter A weekly newsletter of useful/curious datasets, curated by BuzzFeed Data Editor Jeremy Singer-Vine.
Databases, datasets, and data portals
If only finding data were the hard part of data journalism. But knowing what data exists, and being able to at least "touch" (i.e., download) it, is a great first step.
OpenDataNetwork A search tool for Socrata city data portals. One of the best ways to find and download structured, spreadsheet-ready data of public interest.
MuckRock Not just a site for making record requests, but a home for public data and documents of high interest to journalists and activists.
College Scorecard Data from the government to help the public understand the costs and performance of every university.
Guidestar IRS Form 990s are a treasure trove of financial and contact data for American non-profits.
Chronicle Title IX Tracker The federal investigations into alleged mishandling of sexual violence reports by colleges. A great example of a data project built through public records requests.
Data.gov The home of the U.S. Government’s open data. A great place for at least learning what exists.
Transparent California The salary for every public California employee, including university employees, over several years.
Google Trends An accessible interface to query how the world queries.
fivethirtyeight/data Data and code behind the stories and interactives at FiveThirtyEight.
BuzzFeedNews/everything An index of all our open-source data, analysis, libraries, tools, and guides.
r/datasets A relatively niche subreddit with people who know where to find data, and even more people who describe questions they hope to explain with data.
State Secrets: Open records laws across the nation With state and local government secrecy on the rise in many U.S. jurisdictions, this database offers a view of state open records and open meetings laws, and provides information about how to get what you are looking for, as well as ensure that government is operating in the sunlight.
FEC Campaign Finance A great example of accessible, important, interesting, and voluminous government data.
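Once a dataset is downloaded, "touching" it can be as simple as counting its rows and checking for blanks before opening a spreadsheet. Here is a minimal Python sketch of that first look. The sample rows and the column names (agency, year, amount) are made up for illustration and do not come from any of the portals above; swap in the text of a real CSV export.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a CSV downloaded from a data portal.
# The column names and values are invented for illustration only.
SAMPLE_CSV = """agency,year,amount
Parks,2015,1200
Police,2015,5300
Parks,2016,1500
Police,2016,
"""

def summarize(csv_text):
    """Return the row count, rows per agency, and number of blank amounts."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    per_agency = Counter(row["agency"] for row in rows)
    blank_amounts = sum(1 for row in rows if not row["amount"].strip())
    return len(rows), per_agency, blank_amounts

row_count, per_agency, blank_amounts = summarize(SAMPLE_CSV)
print(row_count, dict(per_agency), blank_amounts)
```

Knowing how many records there are, how they break down, and where the gaps sit is often enough to decide whether a dataset is worth a closer look.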
How-tos and Reflections
What I love most about data-driven journalism is how its stories can be done in the open, dependent not on privileged access but on a reporter's willingness to work through the data and details. Consequently, the "How I did this" write-ups are not just inspirational, but of high practical value.
How the Sun Sentinel reported its Pulitzer Prize winning coverage of off-duty cops We suspected plenty of other cops were routinely speeding, but how could we document it?
David Fahrenthold tells the behind-the-scenes story of his year covering Trump A reporter reveals how he investigated Trump’s claims on his donations to charity.
Paul Kiel and Karen Weise Discuss the Stars and Slackers of the Bailout ProPublica's Paul Kiel and Karen Weise discuss the expiration of the bailout and the effect it has had on the nation’s economy.
A spreadsheet’s star turn: ‘Spotlight’ gave data geeks a moment of glory – 3 to read It’s not often that a spreadsheet has an important role in a movie. But a spreadsheet does indeed get its big-screen debut in the movie Spotlight, which recently won the Oscars for Best Picture and…
Spotlight, the movie: A personal view – 3 to read Lessons learned from survivors of sexual abuse, the strange intoxication of Hollywood & the power of investigative journalism
How NPR made its ‘Arrested Development’ graphic: ‘We like to build useful stuff’ “I have friends who are much more into it than I am,” says reporter who catalogued every in-joke.
What Ethan Swan Learned From Tracking Every Tattoo in the NBA Ethan Swan and I couldn’t see the players’ tattoos from Section 217 of the Barclays Center in Brooklyn, but Swan still knew who was inked and who wasn’t. LeBron James? Obviously. Chris …
About the AJC’s investigation of doctor misconduct Learn more about the national investigation and the Atlanta Journal-Constitution journalists who examined the handling of physician sexual misconduct in all 50 states and how this investigation was done
Meet the Man Who Spends 10 Hours a Day Tracking Police Shootings How many people have been killed by police in America? No one really knows. One man in Reno, Nevada, is on a quest to find out.
The DIY Effort to Count Who Police Kill Police-tracking sites put their official counterparts to shame. Can the DOJ learn from them?
Behind the Story: Tracking problem police officers in Florida http://ire.org/blog/on-the-road/2011/12/20/behind-story-tracking-police/
A Smarter Way to Count the People Killed by Cops Over the past 24 years, there have been a combined 55 fatal civilian shootings at the hands of British and Welsh police officers. Cops in the United States topped that figure within the first 24 days…
How Reuters investigated the preventable deaths of drug-addicted babies http://ire.org/blog/ire-news/2016/04/05/behind-story-how-reuters-investigated-preventable-/
How the Los Angeles Times turned an anonymous tip into a front-page story No such records exist. That’s the message Paige St. John received when she requested audit records on the Los Angeles County Probation Department’s GPS monitoring program.
Decoding the N.F.L. Database to Find 100 Missing Concussions The N.F.L. logged 887 concussions from 1996 to 2001, and they served as the backbone for 13 research papers on head injuries.
How I Investigated Uber Surge Pricing in D.C. The data and processes that show some D.C. neighborhoods wait far longer for Uber service.
How The Chicago Reporter Made 'Settling for Misconduct' For hundreds of police lawsuits, we needed good data management, strong collaborations, and a bigger chart.
How We Made "Failure Factories" And why we kicked off our investigative series with a stand-alone graphic
Inside the Wall Street Journal's Prediction Calculator How a black box graphic fueled unexpected engagement with readers
A college journalist's guide to public records For the savvy college journalist looking to get an exclusive scoop, public records can be the perfect secret weapon. Here's how you can wield it.
The Stories of Everyday Lives, Hidden in Reams of Data Data journalists use data to tell stories that help readers make better choices and live better lives.
How we identified the nation's worst charities Our reporters zeroed in on charities that consistently kept less than 33 cents of every dollar donated.
Homicide Watch: An Interview Homicide Watch is one of those projects that stays in your head. If you tell or edit or assemble stories for a living, it’s also likely to change the way you see the narratives you’re making. Founder Laura Amico is joined here by Chris Amico, the project’s technology lead, in a discussion about Homicide Watch and its implications for the evolution of journalism.
Newtown Sparked a Revolution in Data Collection That Could Actually Reduce Death by Guns in America Five days after the shooting at Sandy Hook Elementary School, President Barack Obama delivered remarks in which he stressed that, this time, the nation ...
Announcing PolitiFact The site is a simple, old newspaper concept that’s been fundamentally redesigned for the web. We’ve taken the political “truth squad” story, where a reporter takes a campaign commercial or a stump speech, fact checks it and writes a story. We’ve taken that concept, blown it apart into its fundamental pieces, and reassembled it into a data-driven website covering the 2008 presidential election.
Research chat: Sarah Cohen of the New York Times on the state of data journalism and what reporters need to know - Journalist's Resource 2014 conversation with a leading practitioner of data journalism. Cohen is editor of computer-assisted reporting at the New York Times and board president of Investigative Reporters & Editors.
A Big Article About Wee Things The wee things that we see as part of graphics, maps, visualizations (wee things in space) as well as the wee things we experience as part of interactions, navigation, and usability (wee things in time)
NYT’s Sarah Cohen will make you realize how much better your public records game could be Cohen recently gave California fellows a master class in how to approach public records. In her talk, Cohen stressed the level of pre-reporting that needs to be done before filing a request. Here are a few key takeaways.
Bulletproofing the Data Project stabile - Computer-Assisted Reporting class, Stabile investigative reporting program, Spring 2014
Deadly Force: How This Series Was Put Together http://www.washingtonpost.com/wp-srv/local/longterm/dcpolice/deadlyforce/police1_method.htm
Ex-Googler says she exposed company-wide pay inequality with crowdsourced spreadsheet When thousands of Google employees organized to share their salaries internally — highlighting troubling patterns in the way people were paid — Google got angry, according to a former Google engineer
How one Washington Post reporter uses pen and paper to make his tracking of Trump get noticed I think I knew there was going to be a lot of futility to the process. I was looking for a way to make the futility look interesting and give people something to follow.
We’ve Stopped Talking About Domestic Violence And The NFL In early September, the story of Ray Rice assaulting his then-fiancée in an Atlantic City casino exploded in the media, sparking a debate about how to prevent — and respond to — domesti…
Journalism and the Scientific Tradition If you are a journalist, or thinking of becoming one, you may have already noticed this: They are raising the ante on what it takes to be a journalist.
Learn how PolitiFact does its work Editor’s note: We often get questions about how we select claims to check and how we make our rulings. So a couple of times a year, we publish this overview of our procedures and the principles for Truth-O-Meter rulings. PolitiFact is a fact-checking website that rates the accuracy of claims by elected officials and others who speak up in American politics. PolitiFact is run by editors and reporters from the Tampa Bay Times, an independent newspaper in Florida, as is PunditFact, a site devoted to fact-checking pundits. The PolitiFact state sites are run by news organizations that have partnered with ...
MaryJo Webster's training materials for data journalism A huge list of tutorials and guides on Excel, database, and other highly useful data skills and tools.
The Quartz Guide to Bad Data An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.
The Voices of Patient Harm More than 1 million patients suffer harm each year in U.S. health care facilities. Often, their harm isn’t acknowledged even as they live with the consequences. ProPublica set out to capture their stories. Here is what we learned.
Projects and Stories
A very truncated list of projects to note (and copy), and stories about and from data.
Death to ‘Data Journalism’ And long live “fact journalism.”
The NFL’s Uneven History Of Punishing Domestic Violence The elevator doors open and he drops her. She falls to her knees, and then to the floor, but her feet prevent the doors from closing. The man is holding the woman’s purse as he tries to move her un…
What the Fox Knows FiveThirtyEight is a data journalism organization. Let me explain what we mean by that, and why we think the intersection of data and journalism is so important. If you’re a casual reader of …
Final Forms What death certificates can tell us, and what they can’t.
Scott Klein on the Forgotten History of Visualization in News "Being a history nerd, I started wondering how far our history goes, and was very surprised, indeed, about how far I could go," said Scott Klein of ProPublica. "It turns out data journalism goes so far back [that] it actually predates newspapers."
Spotlight Church abuse report: Church allowed abuse by priest for years - The Boston Globe Why did it take a succession of three cardinals and many bishops 34 years to place children out of John J. Geoghan’s reach?
Epitaphs for Lost Officers What began as an experiment for an 18-year-old kid who dreamed of becoming a police officer has evolved into a research tool for academics and a teaching and training resource for hundreds of police departments across the country. Prince George's County is among the local jurisdictions that use the site to teach recruits about the perils of policing.
Workers’ Comp Benefits: How Much is a Limb Worth? Depending on where you work, your compensation for the same injury could be drastically different than in other states. Compare them all here.
Homicide Watch D.C. Homicide Watch D.C. is a community-oriented news site that aims to provide clear information about homicides and the tools necessary to record, report and share our experiences and losses within the District of Columbia.
Documenting Hate - ProPublica Nobody knows how many hate crimes and bias incidents take place each year in America. Help us track them.
Till death do us part: A Post and Courier Special Report More than 300 women were shot, stabbed, strangled, beaten, bludgeoned or burned to death over the past decade by men in South Carolina, dying at a rate of one every 12 days while the state does little to stem the carnage from domestic abuse.
Explore the AJC's investigation of physician sexual misconduct List of articles and multimedia from The Atlanta Journal-Constitution’s national investigation of doctor sexual misconduct cases and how they are handled and tolerated by a broken system
City rape statistics, investigations draw concern One of the best examples of how even bad data, when systematically scrutinized, can expose the most obfuscated truths.
AJC investigation: Atlanta School Test Cheating Scandal In-depth coverage of suspicious student test scores in Atlanta, Georgia and across the nation by the AJC
Previously, On Arrested Development NPR's slightly obsessive guide to the running gags on Arrested Development, updated for season 4.
Police shootings 2016 database Since 2015, The Post has created a database cataloging every fatal shooting nationwide by a police officer in the line of duty.
About – Electionland We have created a pop-up newsroom staffed by about 700 journalists and journalism students. It will find and authenticate social media posts, and sift through Google Trends data, SMS and WhatsApp messages, and reports from the national nonpartisan election monitoring group Election Protection. The newsroom will write stories and pass story leads to hundreds of local reporters.
Help Us Map TrumpWorld We logged more than 1,500 people and organizations connected to the incoming administration. Now we want your help to understand them and to add more.
Centinela Valley schools chief amassed $663,000 in compensation in 2013 Documents obtained by the Daily Breeze from the Los Angeles County Office of Education show that although Jose Fernandez had a base pay of $271,000 in the 2013 calendar year, his other benefits amounted to nearly $400,000.
The Real Story Of 2016 On Friday at noon, a Category 5 political cyclone that few journalists saw coming will deposit Donald Trump atop the Capitol Building, where he’ll be sworn in as the 45th president of the United St…
Student data journalism
Students ostensibly have the same access to public data and records as professionals, and thus the same potential for high-impact work. A few examples of student work, both individual and collaborative, that took an empirical approach to journalism:
News21: America’s Weed Rush Explore America's Weed Rush data
Driving with suspended license top crime in Menlo Park, many lose cars - Peninsula Press Menlo Park police citations and vehicle impounds for driving with a suspended license nearly tripled from 2008 to 2014, and many impounded cars are never recovered by owners.
Officer Down Memorial Page Chris Cosgriff created the Officer Down Memorial Page from his freshman dorm room, after reading about police officers killed in the line of duty.
Are Traffic Stops Prone to Racial Bias? An attempt to find out confronts a frayed patchwork of data across the country.
Human Trafficking Hidden in Dozens of Maryland Communities While Authorities Struggle to Fight It Traffickers find vulnerable young women, seduce them with promises of security, then force them into the sex trade.
Robin Hood in Reverse How universities force working-class students to pay thousands of dollars in hidden fees to athletic departments awash in red ink