In [1]:
import requests
import json
import pandas as pd
import lxml

# Recap task from last week
**Saved here: https://github.com/AnneCarr/monday-code-club/blob/master/CodeClub-14-Aug-dict.ipynb**

- Get JSON data from an external source into a Dataframe

# This week
We are going to learn how to create and use functions

Have a look at this tutorial for more info on Python functions
https://www.datacamp.com/community/tutorials/functions-python-tutorial#gs.6QALerI

## First task

Using the techniques we learned last week, we will
 - create functions that let us repeat the task of getting JSON data from external sources
 - use a for loop to call the function for each source and append the data from several sources into one dataframe
 - group the data to learn
  - which petition had the most signatures
  - which constituency has the most active voters
  - what % of each constituency's electorate signed a petition
  - any other interesting insights you can find and chart based on what we've learned so far
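
The grouping questions in the list above can be tried out on a small made-up dataframe before the real data is loaded. This is only a sketch: the column names match the petition data used later in this notebook, but the petition names "Petition A" and "Petition B" and some of the counts are invented for illustration.

```python
import pandas as pd

# Made-up rows that mimic the shape of the petition data (values are invented)
toy = pd.DataFrame({
    "name": ["Edinburgh East", "Edinburgh South", "Edinburgh East", "Edinburgh South"],
    "signature_count": [13, 7, 20, 5],
    "petition_name": ["Petition A", "Petition A", "Petition B", "Petition B"],
})

# Total signatures per petition, then the petition with the largest total
per_petition = toy.groupby("petition_name")["signature_count"].sum()
top_petition = per_petition.idxmax()

# Total signatures per constituency across all petitions
per_constituency = toy.groupby("name")["signature_count"].sum()
top_constituency = per_constituency.idxmax()

print(per_petition)
print(top_petition, top_constituency)
```

groupby collects the rows that share a value, sum adds up each group, and idxmax returns the label of the largest total.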


We are going to create a function using two different techniques.

In the first one we are going to
 - pass the URL for the JSON data into the function
 - build a dataframe from the specific set of JSON data we want
 - add a column for the name of the petition
 - return the completed dataframe

In [2]:
# list of petition urls we are going to use
petitions_list = ['https://petition.parliament.uk/petitions/175433.json',
                'https://petition.parliament.uk/petitions/166847.json',
                'https://petition.parliament.uk/petitions/165672.json',
                  'https://petition.parliament.uk/petitions/187027.json']

We name the function get_petitions_url.

We pass one variable into the function: in this case, the URL of the JSON data we will be using.


In [3]:
def get_petitions_url(url):
    # get the JSON data using requests and parse it into a Python dictionary
    petitions_data = requests.get(url)
    petitions_dict = petitions_data.json()
    # navigate to the data we need and create a dataframe from it
    const = pd.DataFrame.from_dict(petitions_dict['data']['attributes']['signatures_by_constituency'])
    # add a new column for the name of the petition
    const['petition_name'] = petitions_dict['data']['attributes']['action']
    # the output of the function is the completed dataframe
    return const
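
To see what the function does without calling the live API, here is a rough sketch that runs the same steps on a hand-written dictionary. The nested keys are the ones used in the function above; the dictionary itself is a cut-down stand-in (the real response has many more fields) and "Example petition" is an invented title.

```python
import pandas as pd

# Hand-written stand-in for the parsed JSON; the real response has many more fields
sample = {
    "data": {
        "attributes": {
            "action": "Example petition",
            "signatures_by_constituency": [
                {"name": "Edinburgh East", "ons_code": "S14000022", "signature_count": 13},
                {"name": "Edinburgh South", "ons_code": "S14000024", "signature_count": 7},
            ],
        }
    }
}

attributes = sample["data"]["attributes"]

# Each dict in the list becomes one row; the keys become the columns
const = pd.DataFrame(attributes["signatures_by_constituency"])

# Assigning a single string fills every row of the new column
const["petition_name"] = attributes["action"]

print(const)
```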

We use this for loop to loop through the list of petition URLs.

We then call the get_petitions_url function, passing in each URL from the list.

On each pass through the for loop, the data returned by the function is added to the rows that make up petitions_df.

In [4]:
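# collect one dataframe per petition, then combine them in a single step
# (DataFrame.append was removed in pandas 2.0; pd.concat works in older and newer versions)
frames = []
for petition in petitions_list:
    frames.append(get_petitions_url(petition))
petitions_df = pd.concat(frames)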

petitions_df.head()

Unnamed: 0,mp,name,ons_code,signature_count,petition_name
0,Tommy Sheppard MP,Edinburgh East,S14000022,13,Stop business rate rises before they wreck ind...
1,Deidre Brock MP,Edinburgh North and Leith,S14000023,34,Stop business rate rises before they wreck ind...
2,Ian Murray MP,Edinburgh South,S14000024,7,Stop business rate rises before they wreck ind...
3,Joanna Cherry QC MP,Edinburgh South West,S14000025,12,Stop business rate rises before they wreck ind...
4,Christine Jardine MP,Edinburgh West,S14000026,10,Stop business rate rises before they wreck ind...
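
A small sketch of how stacking dataframes behaves, using pd.concat (the function newer pandas versions provide for this job). The two toy frames and the counts in the second one are invented for illustration. The point to notice is that each frame keeps its own row labels unless ignore_index=True is passed.

```python
import pandas as pd

# Two toy frames standing in for the data returned for two petitions
first = pd.DataFrame({"name": ["Edinburgh East", "Edinburgh South"], "signature_count": [13, 7]})
second = pd.DataFrame({"name": ["Edinburgh East", "Edinburgh South"], "signature_count": [20, 5]})

# Collect the frames in a list and combine them once at the end
frames = [first, second]

kept_index = pd.concat(frames)                      # row labels repeat for each frame
fresh_index = pd.concat(frames, ignore_index=True)  # row labels are renumbered from 0

print(list(kept_index.index))
print(list(fresh_index.index))
```

Repeated row labels are harmless for grouping and merging, but they matter if you later select rows by label with .loc.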


In this option we pass in the JSON data we need and create the dataframe directly from the data

We will pass in two variables in this case 
 - one for the constituency data
 - one for the name of the petition
 
We return the completed dataframe 

In [5]:
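def get_petitions_dict(const_data, name_data):
    # build a dataframe from the constituency data passed in
    const_df = pd.DataFrame.from_dict(const_data)
    # add a new column for the name of the petition
    const_df['petition_name'] = name_data

    return const_df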

In this for loop we loop through petitions_list as before, but this time we store the specific data we need in variables before calling the function.

The constituency data is stored in 'constituancy' and passed into the function, where it is used to build the dataframe.
The name of the petition is stored in 'name_petition' and is also passed into the function.

On each pass through the for loop the dataframe is returned and added to petitions_df.

In [6]:
frames = []

for petition in petitions_list:
    petitions_data = requests.get(petition)
    petitions_dict = petitions_data.json()
    # pull out the two pieces of data the function needs
    constituancy = petitions_dict['data']['attributes']['signatures_by_constituency']
    name_petition = petitions_dict['data']['attributes']['action']
    frames.append(get_petitions_dict(constituancy, name_petition))

# DataFrame.append was removed in pandas 2.0, so combine the frames with pd.concat
petitions_df = pd.concat(frames)

petitions_df.head()

Unnamed: 0,mp,name,ons_code,signature_count,petition_name
0,Tommy Sheppard MP,Edinburgh East,S14000022,13,Stop business rate rises before they wreck ind...
1,Deidre Brock MP,Edinburgh North and Leith,S14000023,34,Stop business rate rises before they wreck ind...
2,Ian Murray MP,Edinburgh South,S14000024,7,Stop business rate rises before they wreck ind...
3,Joanna Cherry QC MP,Edinburgh South West,S14000025,12,Stop business rate rises before they wreck ind...
4,Christine Jardine MP,Edinburgh West,S14000026,10,Stop business rate rises before they wreck ind...


As we did last week, we will add the county, electorate and region data to our completed dataframe

In [7]:
counties = requests.get("https://en.wikipedia.org/wiki/List_of_United_Kingdom_Parliament_constituencies").text

In [8]:
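from io import StringIO
# read_html returns a list of tables; newer pandas expects HTML text wrapped in StringIO
lsDf = pd.read_html(StringIO(counties))
df = lsDf[1]  # index 1 picks the constituency table shown below
df.head()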

Unnamed: 0,0,1,2,3,4,5,6
0,Constituency,Electorate 2000,Electorate 2010,Electorate 2017,Largest ceremonial county or council area (Sco...,Country of the UK,Region
1,Aldershot,66499,71908,76205,Hampshire,England,South East
2,Aldridge-Brownhills,58695,59506,60363,West Midlands,England,West Midlands
3,Altrincham and Sale West,69605,72008,73220,Greater Manchester,England,North West
4,Amber Valley,66406,69538,68065,Derbyshire,England,East Midlands


In [9]:
new_header = df.iloc[0]

The first row holds the column names, so we saved it above as new_header. Now get the rest of the data, less the header row

In [10]:
df = df[1:]  # take the data less the header row

Rename the columns with your header row

In [11]:
df = df.rename(columns = new_header) 

Now we can use it like a normal dataframe

In [12]:
df.head()

Unnamed: 0,Constituency,Electorate 2000,Electorate 2010,Electorate 2017,Largest ceremonial county or council area (Scotland),Country of the UK,Region
1,Aldershot,66499,71908,76205,Hampshire,England,South East
2,Aldridge-Brownhills,58695,59506,60363,West Midlands,England,West Midlands
3,Altrincham and Sale West,69605,72008,73220,Greater Manchester,England,North West
4,Amber Valley,66406,69538,68065,Derbyshire,England,East Midlands
5,Arundel and South Downs,71203,76697,80766,West Sussex,England,South East
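
The header steps above can be tried on a tiny table to see what each line does. This is a sketch on toy data: the two constituency rows are copied from the output above, and the names raw, header, body, tidy and electorate are only for illustration. It also assumes the electorate figures come back as text rather than numbers, which is likely when the header row was read as data.

```python
import pandas as pd

# Toy table whose first row holds the column titles
raw = pd.DataFrame([
    ["Constituency", "Electorate 2017", "Region"],
    ["Aldershot", "76205", "South East"],
    ["Amber Valley", "68065", "East Midlands"],
])

header = raw.iloc[0]                 # first row as a Series: 0 -> "Constituency", ...
body = raw[1:]                       # everything except the header row
tidy = body.rename(columns=header)   # map the old labels 0, 1, 2 to the titles

# Text values need converting before any arithmetic can be done on them
electorate = pd.to_numeric(tidy["Electorate 2017"])

print(tidy)
print(electorate.sum())
```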


In [13]:
petitions_df.head()

Unnamed: 0,mp,name,ons_code,signature_count,petition_name
0,Tommy Sheppard MP,Edinburgh East,S14000022,13,Stop business rate rises before they wreck ind...
1,Deidre Brock MP,Edinburgh North and Leith,S14000023,34,Stop business rate rises before they wreck ind...
2,Ian Murray MP,Edinburgh South,S14000024,7,Stop business rate rises before they wreck ind...
3,Joanna Cherry QC MP,Edinburgh South West,S14000025,12,Stop business rate rises before they wreck ind...
4,Christine Jardine MP,Edinburgh West,S14000026,10,Stop business rate rises before they wreck ind...


In [14]:
petitions_df = petitions_df.rename(index=str, columns={"name": "Constituency"})
petitions_df.head()

Unnamed: 0,mp,Constituency,ons_code,signature_count,petition_name
0,Tommy Sheppard MP,Edinburgh East,S14000022,13,Stop business rate rises before they wreck ind...
1,Deidre Brock MP,Edinburgh North and Leith,S14000023,34,Stop business rate rises before they wreck ind...
2,Ian Murray MP,Edinburgh South,S14000024,7,Stop business rate rises before they wreck ind...
3,Joanna Cherry QC MP,Edinburgh South West,S14000025,12,Stop business rate rises before they wreck ind...
4,Christine Jardine MP,Edinburgh West,S14000026,10,Stop business rate rises before they wreck ind...
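
Now that both tables have a Constituency column, it can be worth checking whether every name will find a match before relying on the joined data. The sketch below uses two toy tables; "Made-up Town" is an invented name added to show what an unmatched row looks like.

```python
import pandas as pd

# Toy petition rows; "Made-up Town" is invented and has no entry in the lookup table
petitions = pd.DataFrame({
    "Constituency": ["Edinburgh East", "Edinburgh South", "Made-up Town"],
    "signature_count": [13, 7, 1],
})
lookup = pd.DataFrame({
    "Constituency": ["Edinburgh East", "Edinburgh South"],
    "Country of the UK": ["Scotland", "Scotland"],
})

# indicator=True adds a _merge column saying where each row was found
checked = pd.merge(petitions, lookup, how="left", on="Constituency", indicator=True)

# Rows marked left_only had no match in the lookup table
unmatched = checked.loc[checked["_merge"] == "left_only", "Constituency"].tolist()
print(unmatched)
```

With a left join, unmatched rows are kept, but the columns from the second table are left empty for them.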


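Left join the petition data to the constituency table on the Constituency column, so every petition row is kept and the county, electorate and region columns are added where the names match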

In [15]:
dfRegion = pd.merge(petitions_df, df, how="left", on="Constituency")
dfRegion

Unnamed: 0,mp,Constituency,ons_code,signature_count,petition_name,Electorate 2000,Electorate 2010,Electorate 2017,Largest ceremonial county or council area (Scotland),Country of the UK,Region
0,Tommy Sheppard MP,Edinburgh East,S14000022,13,Stop business rate rises before they wreck ind...,74505,60594,67141,City of Edinburgh,Scotland,
1,Deidre Brock MP,Edinburgh North and Leith,S14000023,34,Stop business rate rises before they wreck ind...,74762,69580,80910,City of Edinburgh,Scotland,
2,Ian Murray MP,Edinburgh South,S14000024,7,Stop business rate rises before they wreck ind...,68884,59285,65801,City of Edinburgh,Scotland,
3,Joanna Cherry QC MP,Edinburgh South West,S14000025,12,Stop business rate rises before they wreck ind...,75787,66262,72149,City of Edinburgh,Scotland,
4,Christine Jardine MP,Edinburgh West,S14000026,10,Stop business rate rises before they wreck ind...,70603,65526,71717,City of Edinburgh,Scotland,
5,David Duguid MP,Banff and Buchan,S14000007,2,Stop business rate rises before they wreck ind...,65970,65183,68609,Aberdeenshire,Scotland,
6,Alison Thewliss MP,Glasgow Central,S14000029,5,Stop business rate rises before they wreck ind...,70378,67521,70945,Glasgow City,Scotland,
7,Patrick Grady MP,Glasgow North,S14000031,2,Stop business rate rises before they wreck ind...,63729,54620,60169,Glasgow City,Scotland,
8,Mr Paul J Sweeney MP,Glasgow North East,S14000032,1,Stop business rate rises before they wreck ind...,70899,64171,66678,Glasgow City,Scotland,
9,Stewart Malcolm McDonald MP,Glasgow South,S14000034,5,Stop business rate rises before they wreck ind...,74482,69122,74051,Glasgow City,Scotland,
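
One way to approach the percentage question from the task list is sketched below. It uses two rows copied from the merged output and assumes the electorate values are stored as text, so they are converted first. The column name pct_signed is made up for this example.

```python
import pandas as pd

# Two rows copied from the merged output; electorate is assumed to be stored as text
merged = pd.DataFrame({
    "Constituency": ["Edinburgh East", "Edinburgh South"],
    "signature_count": [13, 7],
    "Electorate 2017": ["67141", "65801"],
})

# Convert the electorate to numbers; anything that cannot be parsed becomes NaN
electorate = pd.to_numeric(merged["Electorate 2017"], errors="coerce")

# Signatures as a percentage of the 2017 electorate
merged["pct_signed"] = merged["signature_count"] / electorate * 100

print(merged.sort_values("pct_signed", ascending=False))
```

The same calculation should work on the full merged dataframe, and the result can be grouped by petition or constituency like any other column.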


From the resulting dataframe find:
 - which petition had the highest number of signatures?
 - which constituency is most likely to sign a petition?
 - create a bar chart with the 