Python 3.x notebooks about real-world data cleaning and visualization
A set of IPython notebooks on data wrangling and visualization for Stanford Computational Journalism, using the standard library as well as pandas and matplotlib.
These notebooks are walkthroughs of common matplotlib and pandas techniques.
Recipes for extracting (but not cleaning/wrangling) machine-readable data from raw sources.
The results can be found in the extracted subdirectories in data.
- NASA plaintext data files
- California Dept. of Education Excel spreadsheets - extracting SAT scores and student poverty data from workbooks of varying designs.
- California Dept. of Education fixed-width fields - scraping an HTML table to get the field boundaries for the CDE's legacy data format for school performance.
- Texas Dept. of Justice; Death row inmates - HTML scraping with lxml
- Texas Dept. of Justice; Executions - HTML crawling with Beautiful Soup (in progress)
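
To illustrate the fixed-width recipe listed above, here is a minimal sketch of slicing one record once the field boundaries are known. It uses only the standard library. The field names, offsets, and sample record are hypothetical stand-ins, not the CDE's actual layout, and the step of scraping the boundaries from the HTML table is assumed to have already happened.

```python
# Hypothetical field layout: (name, start, end), 0-based and end-exclusive.
# In the real recipe these boundaries would come from the scraped HTML table.
FIELDS = [
    ("cds_code", 0, 14),
    ("school_name", 14, 34),
    ("score", 34, 38),
]


def parse_fixed_width(line, fields=FIELDS):
    """Slice one fixed-width record into a dict of stripped strings."""
    return {name: line[start:end].strip() for name, start, end in fields}


# A made-up 38-character record that matches the layout above.
sample = "01100170109835" + "Example High".ljust(20) + " 812"
record = parse_fixed_width(sample)
print(record)
```

Keeping the layout as plain data means the same function can be reused for any file whose boundaries are published as a table.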
- Pivot Tables with Pandas - an exploration of San Francisco crime data.
- Visualizing the relationship between SAT performance and percentage of students eligible for free-or-reduced-price lunch.
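
As a rough idea of what the pivot table walkthrough covers, the sketch below counts incidents by category and district with `DataFrame.pivot_table`. The rows and column names (`incident_id`, `category`, `district`) are invented for illustration and are not taken from the San Francisco dataset or the notebook itself.

```python
import pandas as pd

# Made-up rows standing in for incident records; not real crime data.
incidents = pd.DataFrame({
    "incident_id": [1, 2, 3, 4, 5],
    "category": ["THEFT", "THEFT", "ASSAULT", "THEFT", "ASSAULT"],
    "district": ["MISSION", "RICHMOND", "MISSION", "MISSION", "RICHMOND"],
})

# One row per category, one column per district, cells hold incident counts.
counts = incidents.pivot_table(
    index="category",
    columns="district",
    values="incident_id",
    aggfunc="count",
    fill_value=0,
)
print(counts)
```

The same call shape works with other aggregations, such as `"sum"` or `"mean"`, when the values column holds a numeric measure rather than an identifier.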
- Matplotlib homepage
- How to make beautiful data visualizations in Python with matplotlib
- Subplots in matplotlib
- 10 Minutes to pandas
- Pandas cookbook
- An Introduction to Pandas, via Michael Hansen
- 12 Useful Pandas Techniques in Python for Data Manipulation
- Things in Pandas I Wish I'd Known Earlier
- Useful Pandas Snippets gist cheatsheet
- Intro to pandas data structures
- Quandl's Numpy/Scipy/Pandas cheat sheet
- Dataconomy's 14 best Python Pandas features
- Brandon Rhodes - Pandas From The Ground Up - PyCon 2015 (Video)