<a href="https://colab.research.google.com/github/digitaldickinson/Colaboratory/blob/master/Grabbing_JSON_data_from_Parliament_petitions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Grabbing JSON data from Parliament petitions (no, not that one!)
We are going to start with two libraries for this one. 
 - Requests -  a library to access and process web page requests and store the results. 
 - Pandas - to manage and analyse the data

In [0]:
import requests
import pandas as pd

We are going to use the petition "[People found with a knife to get 10 years and using a knife 25 years in prison](https://petition.parliament.uk/petitions/233926)". The petitions website lets you see the raw data for the petition in JSON format. JSON, or JavaScript Object Notation, is a way of storing data that is (so they claim) human as well as machine readable. In practice this means it's stored as text, but text with a structure. Here's an example of a few records from the petition:

`
{"name":"Glasgow North East","ons_code":"S14000032","mp":"Mr Paul Sweeney MP","signature_count":14},{"name":"Glasgow North West","ons_code":"S14000033","mp":"Carol Monaghan MP","signature_count":23},{"name":"Glasgow South","ons_code":"S14000034","mp":"Stewart Malcolm McDonald MP","signature_count":12}`


The text in quotes before each colon is the **name** of the data and the content after the colon is the **value**, e.g. `"mp":"Mr Paul Sweeney MP"`.
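
A quick way to see the names and values in action is to parse the sample above with Python's built-in `json` module. This is only a sketch: the sample is a fragment from the middle of a longer list, so square brackets are added here (my assumption) to make it valid JSON on its own.

```python
import json

# The three records quoted above, wrapped in [ ] so they form a complete JSON list.
sample = (
    '[{"name":"Glasgow North East","ons_code":"S14000032",'
    '"mp":"Mr Paul Sweeney MP","signature_count":14},'
    '{"name":"Glasgow North West","ons_code":"S14000033",'
    '"mp":"Carol Monaghan MP","signature_count":23},'
    '{"name":"Glasgow South","ons_code":"S14000034",'
    '"mp":"Stewart Malcolm McDonald MP","signature_count":12}]'
)

records = json.loads(sample)  # text -> Python list of dictionaries

first = records[0]
print(first["mp"])               # look up a value by its name
print(first["signature_count"])  # numbers arrive as Python ints, not text
```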

To access the JSON data we can use the link [https://petition.parliament.uk/petitions/233926.json](https://petition.parliament.uk/petitions/233926.json) (note it's the same link with `.json` added).
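
Because the JSON address is just the petition page address with `.json` on the end, you could build it from the petition number. A minimal sketch: the function name `petition_json_url` is made up for this example and isn't part of any library.

```python
def petition_json_url(petition_id):
    """Return the JSON address for a petition number (hypothetical helper)."""
    return f"https://petition.parliament.uk/petitions/{petition_id}.json"

# The petition number used in this notebook.
url = petition_json_url(233926)
print(url)
```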

We'll grab that using the get function from the requests library. 

In [0]:
response = requests.get('https://petition.parliament.uk/petitions/233926.json')
response.text

'{"links":{"self":"https://petition.parliament.uk/petitions/233926.json"},"data":{"type":"petition","id":233926,"attributes":{"action":"People found with a knife to get 10 years and using a knife 25 years in prison.","background":"People are scared of the amount of knife crime with apparently very little deterent to stop people carrying knifes.","additional_details":"","state":"open","signature_count":107575,"created_at":"2018-11-26T19:42:52.146Z","updated_at":"2019-03-26T14:56:35.126Z","rejected_at":null,"opened_at":"2018-12-03T11:13:14.938Z","closed_at":null,"moderation_threshold_reached_at":"2018-11-26T20:49:46.815Z","response_threshold_reached_at":"2019-02-28T01:21:25.389Z","government_response_at":"2019-03-14T16:47:38.458Z","debate_threshold_reached_at":"2019-03-13T13:20:55.518Z","scheduled_debate_date":"2019-03-25","debate_outcome_at":null,"creator_name":"John Perrins","rejection":null,"government_response":{"responded_on":"2019-03-14","summary":"Conviction of a knife or offensiv

All of the page is now stored as an object called `response`. We can see the content by using the `.text` attribute (you'll need to scroll across to see it all). So we know it's working, but we need to convert the raw text into something with some structure.

In [0]:
petition_data = response.json()
petition_data

{'data': {'attributes': {'action': 'Revoke Article 50 and remain in the EU.',
   'additional_details': '',
   'background': "The government repeatedly claims exiting the EU is 'the will of the people'. We need to put a stop to this claim by proving the strength of public support now, for remaining in the EU. A People's Vote may not happen - so vote now.",
   'closed_at': None,
   'created_at': '2019-02-14T12:14:59.326Z',
   'creator_name': 'Margaret Anne Georgiadou',
   'debate': None,
   'debate_outcome_at': None,
   'debate_threshold_reached_at': '2019-03-20T20:33:35.184Z',
   'government_response': None,
   'government_response_at': None,
   'moderation_threshold_reached_at': '2019-02-14T14:57:53.747Z',
   'opened_at': '2019-02-20T10:25:02.393Z',
   'rejected_at': None,
   'rejection': None,
   'response_threshold_reached_at': '2019-03-18T13:26:30.257Z',
   'scheduled_debate_date': None,
   'signature_count': 5735629,
   'signatures_by_constituency': [{'mp': 'Tommy Sheppard MP',
   

The `.json()` method converts the text into a 'dictionary', which we have stored as `petition_data`. A dictionary is a container of data that is indexed or, in other words, you can look up data by referencing where it is in the dictionary. It's a big dictionary! But hopefully you can see the structure and the keyname:value data more clearly, e.g. `'creator_name': 'John Perrins'`. You can also start to see there is a more complex hierarchy of structure here. We start with a keyname `data`. Inside that there is a key called `attributes`, and within that there are a number of other names and values.
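
To get a feel for that hierarchy without the full download, here is a cut-down, hand-written dictionary with the same shape. The values are copied from the raw text shown earlier and everything else has been left out, so treat it as an illustration and not the real response.

```python
# A hand-made miniature of the petition dictionary (illustration only).
mini = {
    "data": {
        "type": "petition",
        "id": 233926,
        "attributes": {
            "state": "open",
            "signature_count": 107575,
            "creator_name": "John Perrins",
        },
    }
}

# Each level down is another dictionary with its own keys.
print(list(mini.keys()))
print(list(mini["data"].keys()))
print(list(mini["data"]["attributes"].keys()))

# Step down one key at a time to reach a value.
state = mini["data"]["attributes"]["state"]
print(state)
```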

It means we can work our way through the hierarchy to find what we want.   The example below shows how we can find the name of the person using code.

In [0]:
petition_data["data"]["attributes"]["creator_name"]

'Margaret Anne Georgiadou'

How would you change the code above to get the total number of signatures for the petition?

The code below does the same kind of thing, but digs a bit deeper into the `signatures_by_constituency` part of the data. If you look at the JSON you'll see that, unlike the `creator_name` value, there is more than one bit of data here: it's a list with one entry per constituency. That's where the `[34]` comes in. It effectively says:

  go to `data` and then down to `attributes` and then down to `signatures_by_constituency`, pick the entry at position 34 in that list, and then show me the value for `mp`. Try changing the number, and remember Python starts counting these things at 0, not 1, so position 34 is actually the 35th entry.

In [0]:
petition_data["data"]["attributes"]["signatures_by_constituency"][34]["mp"]


'Gavin Newlands MP'
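
If you want to experiment with list positions without the full data, here is a small sketch using the three Glasgow records from the sample near the top (with the `ons_code` left out to keep it short). `enumerate` is a Python built-in that hands back each position alongside the item, which makes the counting from 0 easy to see.

```python
# Three records copied from the sample earlier in this notebook.
constituencies = [
    {"name": "Glasgow North East", "mp": "Mr Paul Sweeney MP", "signature_count": 14},
    {"name": "Glasgow North West", "mp": "Carol Monaghan MP", "signature_count": 23},
    {"name": "Glasgow South", "mp": "Stewart Malcolm McDonald MP", "signature_count": 12},
]

# Positions start at 0, so [1] is the second record.
second_mp = constituencies[1]["mp"]
print(second_mp)

# Print every position next to its MP.
for position, record in enumerate(constituencies):
    print(position, record["mp"])

# A negative index counts back from the end of the list.
last_mp = constituencies[-1]["mp"]
print(last_mp)
```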

One of the interesting things to explore around petitions is where people sign from. We can grab that data using the same principle as before. This time I've used the results to make a new variable called `country_data` to hold that information.

In [0]:
country_data = petition_data['data']['attributes']['signatures_by_country']
country_data

[{'code': 'AF', 'name': 'Afghanistan', 'signature_count': 22},
 {'code': 'AL', 'name': 'Albania', 'signature_count': 21},
 {'code': 'DZ', 'name': 'Algeria', 'signature_count': 13},
 {'code': 'AS', 'name': 'American Samoa', 'signature_count': 4},
 {'code': 'AD', 'name': 'Andorra', 'signature_count': 43},
 {'code': 'AO', 'name': 'Angola', 'signature_count': 9},
 {'code': 'AI', 'name': 'Anguilla', 'signature_count': 15},
 {'code': 'AG', 'name': 'Antigua and Barbuda', 'signature_count': 23},
 {'code': 'AR', 'name': 'Argentina', 'signature_count': 163},
 {'code': 'AM', 'name': 'Armenia', 'signature_count': 7},
 {'code': 'AW', 'name': 'Aruba', 'signature_count': 5},
 {'code': 'AU', 'name': 'Australia', 'signature_count': 18346},
 {'code': 'AT', 'name': 'Austria', 'signature_count': 2654},
 {'code': 'AZ', 'name': 'Azerbaijan', 'signature_count': 14},
 {'code': 'BH', 'name': 'Bahrain', 'signature_count': 145},
 {'code': 'BD', 'name': 'Bangladesh', 'signature_count': 23},
 {'code': 'BB', 'name'

Now that we have that data in a new variable, we can use the pandas library to do some quick analysis. The first thing we need to do is convert the data into a pandas **data frame** called `country_df`.

In [0]:
country_df = pd.DataFrame(country_data)
country_df

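| | code | name | signature_count |
|---|---|---|---|
| 0 | AF | Afghanistan | 22 |
| 1 | AL | Albania | 21 |
| 2 | DZ | Algeria | 13 |
| 3 | AS | American Samoa | 4 |
| 4 | AD | Andorra | 43 |
| 5 | AO | Angola | 9 |
| 6 | AI | Anguilla | 15 |
| 7 | AG | Antigua and Barbuda | 23 |
| 8 | AR | Argentina | 163 |
| 9 | AM | Armenia | 7 |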


You can see the data is now neatly organised as a data frame. But it's organised in alphabetical order. It would be more useful to organise by the number of signatures so we could see where most people signed up from. Pandas has an easy function to do that.

In [0]:
country_df.sort_values(by=['signature_count'], ascending=False)

| | code | name | signature_count |
|---|---|---|---|
| 219 | GB | United Kingdom | 5499546 |
| 71 | FR | France | 42677 |
| 187 | ES | Spain | 24206 |
| 220 | US | United States | 23422 |
| 76 | DE | Germany | 19591 |
| 11 | AU | Australia | 18346 |
| 38 | CA | Canada | 10437 |
| 144 | NL | Netherlands | 9446 |
| 96 | IE | Ireland | 9181 |
| 146 | NZ | New Zealand | 6802 |


No surprises really. A reasonable guess is that places with large expat communities produce more signatures. Let's do the same thing but with the constituency data.

In [0]:
constituency_data = petition_data['data']['attributes']['signatures_by_constituency']
constituency_df = pd.DataFrame(constituency_data)
constituency_df.sort_values(by=['signature_count'], ascending=False)


| | mp | name | ons_code | signature_count |
|---|---|---|---|---|
| 292 | Julia Lopez MP | Hornchurch and Upminster | E14000751 | 4412 |
| 459 | Andrew Rosindell MP | Romford | E14000900 | 3613 |
| 295 | Bridget Phillipson MP | Houghton and Sunderland South | E14000754 | 3086 |
| 544 | Julie Elliott MP | Sunderland Central | E14000982 | 2672 |
| 191 | Jon Cruddas MP | Dagenham and Rainham | E14000657 | 2383 |
| 588 | Mrs Sharon Hodgson MP | Washington and Sunderland West | E14001020 | 1920 |
| 559 | Jackie Doyle-Price MP | Thurrock | E14000995 | 1715 |
| 56 | Rt Hon Dame Margaret Hodge MP | Barking | E14000540 | 1609 |
| 116 | Alex Burghart MP | Brentwood and Ongar | E14000594 | 1429 |
| 493 | Stephen Metcalfe MP | South Basildon and East Thurrock | E14000933 | 1239 |
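
The country and constituency steps follow the same pattern: pick a list out of the dictionary, turn it into a data frame, and sort it. One way to avoid retyping that is a small helper. This is a sketch, not part of pandas: the name `top_signatures` and its `n` parameter are invented here, and it assumes every record has a `signature_count` key, as the data above does.

```python
import pandas as pd

def top_signatures(records, n=10):
    """Return the n records with the most signatures as a data frame.

    Hypothetical helper: assumes each record has a 'signature_count' key.
    """
    df = pd.DataFrame(records)
    return df.sort_values(by="signature_count", ascending=False).head(n)

# A few rows copied from the country output above, to try it out.
sample_countries = [
    {"code": "AF", "name": "Afghanistan", "signature_count": 22},
    {"code": "AU", "name": "Australia", "signature_count": 18346},
    {"code": "AT", "name": "Austria", "signature_count": 2654},
    {"code": "AR", "name": "Argentina", "signature_count": 163},
]

top_two = top_signatures(sample_countries, n=2)
print(top_two)
```

With the real data you could call `top_signatures(country_data)` or `top_signatures(constituency_data)` in the same way.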
