A basic pandas data analysis tutorial
There is an associated blog post to go with this repo. If you're not super familiar with data analysis or how to set up Python, there are more instructions in that post.
You can find it here: Leave Excel and learn data analysis with pandas
How to use this repo
Assuming you've come from the blog post, you should already have all the instructions. If you've landed here separately, all you need to do is clone the repo and then boot up Jupyter Toolbox for Data Anlaysis.ipynb with Jupyter Notebook.
All the paths and data used in the tutorial should already be in place.
What is the point of this repo
The goal of this repo is to help you move from Excel to Python. Anything you don't know is scary, and for many people that's doubly true for code. But moving from Excel to Python has a huge number of benefits. Your work becomes:
- repeatable: It's code, so you can easily re-run it and reuse the same pieces of work in a different situation
- understandable: If you perform a piece of analysis in code, the code will show you what has been done. While not perfect, this is still better than Excel, where all you see is the end product
- scalable: Excel struggles with large amounts of data. Python copes far better. Eventually you'll hit a limit, but it's much further off.
- more powerful: Things that are trivial in Python can be exceptionally hard in Excel (for example using text fields in the value part of a pivot chart)
But unlike Excel, Python doesn't have lots of nice shiny buttons. The goal of this workbook is to show you how to do all the common actions you take in Excel in Python, as well as how to work with data.
We'll be relying heavily on pandas, which provides lots of useful functions for data analysis.
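
To give a flavour of the "more powerful" point above, here is a minimal sketch of a pivot that puts text in the values area. The data, the column names and the choice to join names with commas are all made up for illustration; none of it comes from the tutorial notebook.

```python
import pandas as pd

# Hypothetical data, invented purely for this example
df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q1"],
    "account_manager": ["Asha", "Ben", "Carla", "Dev"],
})

# Pivot with a text column as the values.
# Instead of summing or counting, join the names that land in each cell.
pivot = df.pivot_table(
    index="region",
    columns="quarter",
    values="account_manager",
    aggfunc=lambda names: ", ".join(names),
)

print(pivot)
```

Cells with no matching rows (South in Q2 here) come back as missing values. If you would rather see an empty string, you could follow up with `pivot.fillna("")`.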