Analysis of the US pipeline incidents since Jan 2010

US pipeline incidents

From 2010 to October 2017

A Jupyter notebook to analyze trends in pipeline incidents in the US from 2010 to October 2017. Different aspects of the incidents are considered to answer these five questions:

  1. How common are spills?
  2. What are their spatial and temporal distributions?
  3. What is their scale in terms of volume and cost?
  4. What are the main causes of spills?
  5. Which places are at higher risk?
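Questions 1 and 4 reduce to simple aggregations over the incident table. The sketch below is a minimal pandas version. It assumes a cleaned DataFrame with hypothetical date and cause columns, and the notebook's real column names and logic may differ.

```python
import pandas as pd


def incidents_per_year(df: pd.DataFrame) -> pd.Series:
    """Count incidents per calendar year (question 1).

    Assumes a hypothetical 'date' column holding datetimes.
    """
    return df.groupby(df["date"].dt.year).size()


def cause_shares(df: pd.DataFrame) -> pd.Series:
    """Fraction of incidents attributed to each cause (question 4).

    Assumes a hypothetical 'cause' column; the result is sorted largest first.
    """
    return df["cause"].value_counts(normalize=True)


if __name__ == "__main__":
    # Toy data for illustration only, not real PHMSA records.
    toy = pd.DataFrame(
        {
            "date": pd.to_datetime(
                ["2010-03-01", "2010-07-15", "2011-01-20", "2011-05-02", "2011-09-30"]
            ),
            "cause": [
                "CORROSION",
                "EQUIPMENT FAILURE",
                "CORROSION",
                "CORROSION",
                "EXCAVATION DAMAGE",
            ],
        }
    )
    print(incidents_per_year(toy))
    print(cause_shares(toy))
```

With normalize=True, value_counts returns fractions that sum to 1 over incidents with a recorded cause. Rows with a missing cause are excluded by default.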


How it works

At the moment, all the code lives in the pipeline.ipynb notebook, which performs the following main steps:

  • Checks when the local dataset was last updated. If it is more than number_of_days days old, the notebook downloads the latest dataset from the server and replaces the local copy.
  • Extracts the required columns from the dataset, cleans the values (both text and NaN entries) and converts units to more convenient ones. Finally, it saves the cleaned dataset locally. It also exports a JSON file summarizing the data, which will be used to build a website that displays the summary with the D3 library.
  • Plots multiple figures showing temporal and spatial trends in spills and their financial and environmental damage.
  • Some figures are exported to be used in the README file and the final report.
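The update check in the first step above could look roughly like the sketch below. It is one plausible reading that treats the local file's modification time as the "last update" date. The notebook might instead look at dates inside the data. The file path and download URL are hypothetical placeholders, not the project's actual values.

```python
import os
import time
import urllib.request
from typing import Optional

# Hypothetical placeholders: the notebook's real file name and URL may differ.
DATA_PATH = "data/incidents.xlsx"
DATA_URL = "https://example.com/phmsa/flagged_incidents.xlsx"
SECONDS_PER_DAY = 24 * 60 * 60


def is_stale(path: str, number_of_days: int, now: Optional[float] = None) -> bool:
    """Return True if the file is missing or older than number_of_days days."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    age_days = (now - os.path.getmtime(path)) / SECONDS_PER_DAY
    return age_days > number_of_days


def refresh_dataset(path: str, url: str, number_of_days: int) -> bool:
    """Download url to path when the local copy is stale; return True if downloaded."""
    if not is_stale(path, number_of_days):
        return False
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    tmp_path = path + ".tmp"
    urllib.request.urlretrieve(url, tmp_path)
    # Swap in the new file only after the download has completed.
    os.replace(tmp_path, path)
    return True
```

Downloading to a temporary file and then calling os.replace means an interrupted download leaves the previous copy untouched.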
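The cleaning and summary step above can be sketched with pandas as follows. The raw column names in COLUMNS, the target units (cubic metres and million USD), and the summary fields are all assumptions for illustration. The PHMSA export and the notebook may use different ones.

```python
import json

import pandas as pd

# Hypothetical raw column names mapped to short names used in the analysis.
COLUMNS = {
    "LOCAL_DATETIME": "date",
    "ONSHORE_STATE_ABBREVIATION": "state",
    "CAUSE": "cause",
    "UNINTENTIONAL_RELEASE_BBLS": "volume_bbl",
    "TOTAL_COST_CURRENT": "cost_usd",
}
M3_PER_BARREL = 0.158987294928  # 1 US oil barrel = 42 US gallons


def clean_incidents(raw: pd.DataFrame) -> pd.DataFrame:
    """Select, rename, clean, and convert the columns used in the analysis."""
    df = raw[list(COLUMNS)].rename(columns=COLUMNS).copy()
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    # Normalise free text: strip whitespace, upper-case, treat blanks as missing.
    for col in ("state", "cause"):
        df[col] = df[col].astype("string").str.strip().str.upper().replace("", pd.NA)
    # Coerce numbers stored as text; unparseable entries become NaN.
    for col in ("volume_bbl", "cost_usd"):
        df[col] = pd.to_numeric(df[col], errors="coerce")
    df["volume_m3"] = df["volume_bbl"] * M3_PER_BARREL
    df["cost_musd"] = df["cost_usd"] / 1e6
    # Drop rows whose date could not be parsed.
    return df.dropna(subset=["date"])


def export_summary(df: pd.DataFrame, path: str) -> dict:
    """Write a small JSON summary (e.g. for a D3 page) and return it."""
    summary = {
        "n_incidents": int(len(df)),
        "total_volume_m3": float(df["volume_m3"].sum()),  # NaN values are skipped
        "total_cost_musd": float(df["cost_musd"].sum()),
        "incidents_by_state": {
            str(k): int(v) for k, v in df["state"].value_counts().items()
        },
    }
    with open(path, "w") as fh:
        json.dump(summary, fh, indent=2)
    return summary
```

The cleaned table can then be saved with clean_incidents(raw).to_csv(path, index=False). A compact JSON file like the one export_summary writes can be loaded directly by a D3 page.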
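For the plotting and export steps above, here is a minimal matplotlib sketch of one temporal-trend figure: incident counts and total cost per year on twin y-axes, saved to a file. It assumes hypothetical date and cost_musd columns and does not reproduce the notebook's actual figures.

```python
import matplotlib

# Non-interactive backend so the sketch runs as a script; omit inside Jupyter.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd


def plot_yearly_trend(df: pd.DataFrame, out_path: str) -> pd.DataFrame:
    """Plot incidents and total cost per year, save the figure, return the table."""
    yearly = df.groupby(df["date"].dt.year).agg(
        incidents=("date", "count"),  # dates are non-missing after cleaning
        cost_musd=("cost_musd", "sum"),
    )
    fig, ax_count = plt.subplots(figsize=(8, 4))
    ax_count.bar(yearly.index, yearly["incidents"], color="tab:blue")
    ax_count.set_xlabel("Year")
    ax_count.set_ylabel("Number of incidents")
    # Second y-axis so counts and costs keep their own scales.
    ax_cost = ax_count.twinx()
    ax_cost.plot(yearly.index, yearly["cost_musd"], color="tab:red", marker="o")
    ax_cost.set_ylabel("Total cost (million USD)")
    fig.tight_layout()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
    return yearly
```

Returning the aggregated table alongside the saved image makes it easy to reuse the same numbers in a report.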

Required libraries

  • numpy
  • pandas
  • matplotlib
  • plotly


The dataset contains the 'Flagged Incidents' data from the PHMSA Pipeline Safety website.


Mahdi Sadjadi -


This repository is also published as a blog post.


This project is licensed under the MIT License; see the license file for details. The dataset is downloaded from PHMSA.