# Data Wrangling with Pandas

So you have learned how to import data, clean it up and save it as a CSV. Time to get our hands dirty and start digging deeper.

## Starting out

Time to load in our clean data and take a look to make sure everything is as it should be. We're going to import a couple of new libraries in this lesson so we can visualise our data.

`import pandas as pd`

`import matplotlib.pyplot as plt`

`%matplotlib inline`

`df = pd.read_csv('', parse_dates=['AcceptedDate'])`

`df.head()`

**Ah, we seem to have saved the index column from our previous dataframe. Let's delete this**

`del df['Unnamed: 0']`

**Ok let's check our data and make sure we have the correct data types and it's the right shape**
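
Here's a minimal sketch of that check on a toy dataframe. The `sample` name and its values are made up for illustration; only the column names come from the lesson. On the real data you would call the same attributes on `df`.

```python
import pandas as pd

# Toy stand-in for the donations data; the values are invented for illustration.
sample = pd.DataFrame({
    "DonorName_clean": ["Donor A", "Donor B", "Donor A"],
    "Value": [1000.0, 250.0, 500.0],
    "AcceptedDate": pd.to_datetime(["2015-01-10", "2016-07-01", "2017-03-15"]),
})

print(sample.dtypes)  # one dtype per column; AcceptedDate should be a datetime
print(sample.shape)   # (number of rows, number of columns)
```

If `AcceptedDate` shows up as `object` rather than a datetime type, the `parse_dates` argument probably didn't take effect.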

## Counts, summing and percentage shares

**Let's see how many donations there are in total**

`df['DonorName_clean'].count()`

**Ok, let's see what the total value of donations is**

`df['Value'].sum()`

**Ok let's grab some basic descriptive stats and see what we've got**

`df.describe(include='all')`

**Hmm, let's see if any donor gave more than once**

`df['DonorName_clean'].value_counts()`
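
To keep only the repeat donors, one option is to filter the counts. This is a sketch on invented names, not the lesson's data; `sample`, `counts` and `repeat_donors` are hypothetical names.

```python
import pandas as pd

# Toy data: six donations from three made-up donors.
sample = pd.DataFrame({
    "DonorName_clean": ["Donor A", "Donor B", "Donor A", "Donor C", "Donor A", "Donor B"],
})

# value_counts() returns one row per donor, largest count first.
counts = sample["DonorName_clean"].value_counts()

# Keep only the donors who appear more than once.
repeat_donors = counts[counts > 1]
print(repeat_donors)
```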

**Which entity/individual received the most?**

`df.groupby('RegulatedEntityName')['Value'].sum().sort_values(ascending=False)`

**Let's see who has the largest share of the donations**

`total_donations = df['Value'].sum()`

`total_per_receiver = df.groupby('RegulatedEntityName')['Value'].sum().sort_values()`

`percentage_share = (total_per_receiver/total_donations)*100`

**Ok who got the most money? We sorted from smallest to largest, so .head() will give us the least. Let's use .tail() to see who got the most**
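
Here's the whole percentage-share calculation as a sketch on toy data. The entity names and values are invented; the column names and variable names follow the lesson.

```python
import pandas as pd

# Toy data: the entity names and values are invented for illustration.
sample = pd.DataFrame({
    "RegulatedEntityName": ["Party A", "Party B", "Party A", "Party C"],
    "Value": [500.0, 300.0, 100.0, 100.0],
})

total_donations = sample["Value"].sum()
total_per_receiver = sample.groupby("RegulatedEntityName")["Value"].sum().sort_values()
percentage_share = (total_per_receiver / total_donations) * 100

# sort_values() is ascending by default, so the largest shares sit at the bottom.
print(percentage_share.tail(2))
```

On the real data the last line would be `percentage_share.tail()`, which shows the five largest shares.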

## Digging deeper

**Let's see in which year the most money was donated**

`df.groupby('YEAR')['Value'].sum()`
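
The line above assumes the cleaned data already has a `YEAR` column. If yours doesn't, a sketch like this one derives it from `AcceptedDate` and then picks out the biggest year. The values are invented and `per_year` and `top_year` are hypothetical names.

```python
import pandas as pd

# Toy data with made-up dates and values.
sample = pd.DataFrame({
    "AcceptedDate": pd.to_datetime(["2015-03-01", "2015-11-20", "2016-02-14", "2017-01-05"]),
    "Value": [100.0, 50.0, 400.0, 25.0],
})

# .dt.year pulls the calendar year out of a datetime column.
sample["YEAR"] = sample["AcceptedDate"].dt.year

per_year = sample.groupby("YEAR")["Value"].sum()

# idxmax() returns the index label (here, the year) with the largest total.
top_year = per_year.idxmax()
print(per_year)
print(top_year)
```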

**Our data runs from 1 January 2015, before the 2015 general election, right up until April 2017. So let's investigate whether more money was donated before or after the Brexit referendum on 23 June 2016. To do this we're going to create a new column with a boolean value**

`df['post_brexit'] = df['AcceptedDate'] > '2016-06-23'`

**Now we're going to group the data, summing the donations by whether they came before or after the Brexit referendum**

`brexit_graph = df.groupby('post_brexit')['Value'].sum()`
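
Here's a sketch of the flag-then-group idea on toy data. It compares against an explicit `pd.Timestamp` so the day and month can't be misread. The values and the `referendum` and `totals` names are assumptions for illustration.

```python
import pandas as pd

# Toy data: two donations up to the referendum date, two after it.
sample = pd.DataFrame({
    "AcceptedDate": pd.to_datetime(["2015-05-01", "2016-06-23", "2016-06-24", "2017-04-01"]),
    "Value": [100.0, 200.0, 300.0, 400.0],
})

referendum = pd.Timestamp(year=2016, month=6, day=23)

# True for donations accepted after referendum day, False otherwise.
sample["post_brexit"] = sample["AcceptedDate"] > referendum

# One total for the False group and one for the True group.
totals = sample.groupby("post_brexit")["Value"].sum()
print(totals)
```

Whether a donation accepted on referendum day itself counts as "before" or "after" is a judgment call; this sketch counts it as before.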

**Time to plot our work**

`brexit_graph.plot(kind="bar")`

**Which individual donated the most money?**

`individuals = df[df['DonorStatus'] == "individual"]`

`individuals.groupby('DonorName_clean')['Value'].sum().sort_values(ascending=False)`

**Hmmm, one of those names looks very similar to a former prime minister we all know. Let's slice our data and see who he's been giving money to**

`blair = df[df['DonorName_clean'] == 'Anthony Blair']`
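
Once you have a slice for one donor, grouping it by recipient shows where the money went. This is a sketch with invented donors and parties; `one_donor` and `recipients` are hypothetical names.

```python
import pandas as pd

# Toy data: made-up donors giving to made-up recipients.
sample = pd.DataFrame({
    "DonorName_clean": ["Donor A", "Donor B", "Donor A", "Donor A"],
    "RegulatedEntityName": ["Party X", "Party X", "Party Y", "Party X"],
    "Value": [100.0, 999.0, 50.0, 25.0],
})

# Boolean mask: keep only the rows for a single donor.
one_donor = sample[sample["DonorName_clean"] == "Donor A"]

# Total per recipient for that donor, largest first.
recipients = one_donor.groupby("RegulatedEntityName")["Value"].sum().sort_values(ascending=False)
print(recipients)
```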


**Finally, let's investigate which type of donor donates the most money per year. To do this we're going to use a groupby again**


`df.groupby(['DonorStatus', 'YEAR'])['Value'].sum().unstack().sort_values(2015, ascending=False).head(10)`

**Now we need to assign our results to a variable so we can plot them**

`most = df.groupby(['DonorStatus', 'YEAR'])['Value'].sum().unstack().sort_values(2015, ascending=False).head(10)`
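
If the `unstack()` step feels like magic, this sketch on toy data shows what it does: the two-level groupby result is reshaped so each donor type is a row and each year is a column. The donor types and values are invented, and `by_status` and `ranked` are hypothetical names.

```python
import pandas as pd

# Toy data: made-up donor types and values across two years.
sample = pd.DataFrame({
    "DonorStatus": ["Individual", "Company", "Individual", "Company", "Trade Union"],
    "YEAR": [2015, 2015, 2016, 2016, 2015],
    "Value": [100.0, 300.0, 150.0, 50.0, 200.0],
})

# Grouping by two columns gives a Series with a two-level index;
# unstack() moves the inner level (YEAR) into the columns.
by_status = sample.groupby(["DonorStatus", "YEAR"])["Value"].sum().unstack()

# Rank the donor types by their 2015 total, largest first.
ranked = by_status.sort_values(2015, ascending=False)
print(ranked)
```

A donor type with no donations in a given year ends up with `NaN` in that column, as "Trade Union" does for 2016 here.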

**Finally let's plot our results**

`most.plot(kind="bar")`

## Wrapping up

**Ok that's all folks!**

Today you got a small taste of what is possible with Pandas. If you want to learn more, check out some of the links below.

* dataquest.io - https://www.dataquest.io/blog/pandas-python-tutorial/
* 10 minutes to pandas - http://pandas.pydata.org/pandas-docs/stable/10min.html
* 19 essential snippets in pandas - https://jeffdelaney.me/blog/useful-snippets-in-pandas/