Let's start doing our data analysis outside of a spreadsheet program, and learn Python and Pandas along the way.
Don't get me wrong, I use spreadsheets, but not for data analysis.
Also, there are some notes from people who I've talked to during the conference in the
.md file, and GitHub will render the document on the website (like this
README.md file you are reading now).
Material for Pandas Tutorial at PyData Carolinas 2016
PyData Carolinas 2016
September 14-16, 2016
Hosted by IBM Emerging Technologies
Research Triangle Park, NC
IBM RTP Activity Center, 3039 East Cornwallis Road, Building 400, Research Triangle, NC 27709
Covered in the tutorial
- Pandas DataFrame basics
- Data assembly
- Missing Data
Not covered in the tutorial
The easiest way to get everything you need for the tutorial is to install
You can download and install it here: https://www.continuum.io/downloads
I will be using the Python 3 version during the tutorial.
I actually ended up using Python 2 because I had a last-minute computer change.
Install seaborn for plotting
conda install seaborn
- Gapminder: https://github.com/jennybc/gapminder/raw/master/inst/gapminder.tsv
- Survey: From the Software Carpentry SQL lesson
- Ebola: www.github.com/cmrivers/ebola
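
As a rough sketch of how one of these datasets might be loaded, the snippet below reads a tab-separated file into a Pandas DataFrame. The helper name `load_tsv` and the tiny in-memory sample are illustrative assumptions, not part of the tutorial material. The sample only mimics a few Gapminder-style columns so the code runs without a network connection.

```python
import io

import pandas as pd

# URL taken from the dataset list above
GAPMINDER_URL = "https://github.com/jennybc/gapminder/raw/master/inst/gapminder.tsv"


def load_tsv(source):
    """Read a tab-separated file (path, URL, or file-like object) into a DataFrame.

    Hypothetical helper for illustration only.
    """
    return pd.read_csv(source, sep="\t")


# Small made-up sample (assumed column names) so the sketch runs offline
sample = io.StringIO(
    "country\tyear\tlifeExp\n"
    "Afghanistan\t1952\t28.801\n"
    "Afghanistan\t1957\t30.332\n"
)

df = load_tsv(sample)
print(df.shape)   # number of rows and columns
print(df.dtypes)  # inferred type of each column
```

With a network connection, `load_tsv(GAPMINDER_URL)` should read the full file the same way, since `pd.read_csv` accepts URLs as well as local paths.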