# Looking into the Parlament API

- [Service](http://ws-old.parlament.ch/)
- [Documentation](https://www.parlament.ch/centers/documents/de/kurzdokumentation-webservices-d.pdf)

### What I will be gathering
1. The ID numbers of the affairs, as far back as possible
2. The decisions taken on these affairs
3. How the individual politicians voted

### Libraries I need

In [1]:
import json
import urllib.request
import pandas as pd
import time

### Gathering the id number of the affairs

In [2]:
affair_ids_global = []

# The endpoint is paginated; 100 pages is an arbitrary upper bound
for page in range(0, 100):

    try:
        #Opening the JSON pages one by one, max 50 entries at a time
        request = urllib.request.Request(
            'http://ws-old.parlament.ch/votes/affairs?pageNumber=' + str(page) + '&format=json',
            headers={'User-Agent': 'Mozilla'})

        connection = urllib.request.urlopen(request)
        js = connection.read()

        affair_ids = json.loads(js.decode("utf-8"))

        #Pulling out the elements I need and adding them to the larger list
        for item in affair_ids:
            affair_ids_global.append({'ID': item['id'], 'Titel': item['title']})

    except urllib.error.HTTPError:
        #Some pages raise an HTTPError; skip them
        continue
    

In [3]:
len(affair_ids_global)

4098
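
The fixed range of 100 pages could probably be replaced by paging until the service stops announcing further pages. The following is only a sketch under assumptions: it supposes that every item in a response carries a boolean `hasMorePages` field (the sessions response further down has such a column, but that the affairs endpoint behaves the same way is not confirmed here) and that page numbering starts at 1. The names `fetch_page` and `collect_all_pages` are hypothetical helpers, not part of the web service.

```python
import json
import urllib.request


def fetch_page(page_number):
    """Fetch one page of affairs from the web service (needs network access)."""
    url = ('http://ws-old.parlament.ch/votes/affairs?pageNumber='
           + str(page_number) + '&format=json')
    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla'})
    with urllib.request.urlopen(request) as connection:
        return json.loads(connection.read().decode('utf-8'))


def collect_all_pages(fetch=fetch_page, first_page=1, max_pages=1000):
    """Collect ID and title from each page until no further page is announced."""
    collected = []
    for page_number in range(first_page, first_page + max_pages):
        items = fetch(page_number)
        if not items:
            break
        for item in items:
            collected.append({'ID': item['id'], 'Titel': item['title']})
        # Assumption: the hasMorePages flag is repeated on every item of a page.
        # If the field is missing, the loop stops after this page.
        if not items[-1].get('hasMorePages', False):
            break
    return collected
```

Passing the fetch function as a parameter keeps the paging logic testable without hitting the service; `collect_all_pages()` with the defaults would perform the real requests.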

### Make a list of the values

In [4]:
#Making a list of the key values
id_list = [d['ID'] for d in affair_ids_global]

### Getting the votes of each councillor

In [5]:
#Iterating through all of the votes
councillor_list = []

#Fetching the detail JSON of each affair, one request per ID
for affair_number in id_list:
    url = 'http://ws-old.parlament.ch/votes/affairs/'+ str(affair_number) + '?format=json'
    print(url)
    
    # Some IDs cannot be found, for instance 20000001, and I have
    # not worked out why. The request is therefore wrapped in
    # try/except so that the loop carries on.
    
    try:
        request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla'})

        connection = urllib.request.urlopen(request)
        js = connection.read()

        affair_details = json.loads(js.decode("utf-8"))

        #Getting the title
        affair_overall_title = affair_details['title']

        #Getting the results of each section of the vote on the affairs
        for item in affair_details['affairVotes']:
            date = item['date']
            submission_text = item['submissionText']
            division_text = item['divisionText']
            reg_number = item['registrationNumber']
    
            #Iterating through every one of the councillor entries
            for councillor in item['councillorVotes']:
                decision = councillor['decision']
                first_name = councillor['firstName']
                second_name = councillor['lastName']
                number = councillor['number']
        
                councillor_dict = {'Datum': date,
                               'Submision Text': submission_text,
                               'Division Text': division_text,
                               'Reg. Number': reg_number,
                               'Entscheid': decision,
                               'Vorname': first_name,
                               'Nachname': second_name,
                               'ID Nr.': number}
        
                councillor_list.append(councillor_dict)
            
    except urllib.error.HTTPError:
        # e.g. "HTTP Error 404: Not Found": skip affairs that cannot be fetched
        continue

http://ws-old.parlament.ch/votes/affairs/19910411?format=json
http://ws-old.parlament.ch/votes/affairs/19950085?format=json
http://ws-old.parlament.ch/votes/affairs/19970419?format=json
http://ws-old.parlament.ch/votes/affairs/19980406?format=json
http://ws-old.parlament.ch/votes/affairs/19980451?format=json
http://ws-old.parlament.ch/votes/affairs/19990451?format=json
http://ws-old.parlament.ch/votes/affairs/20000001?format=json
http://ws-old.parlament.ch/votes/affairs/20000079?format=json
http://ws-old.parlament.ch/votes/affairs/20000405?format=json
http://ws-old.parlament.ch/votes/affairs/20000419?format=json
http://ws-old.parlament.ch/votes/affairs/20000421?format=json
http://ws-old.parlament.ch/votes/affairs/20000431?format=json
http://ws-old.parlament.ch/votes/affairs/20000436?format=json
http://ws-old.parlament.ch/votes/affairs/20000456?format=json
http://ws-old.parlament.ch/votes/affairs/20000459?format=json
http://ws-old.parlament.ch/votes/affairs/20000461?format=json
http://w
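
The nested loops over `affairVotes` and `councillorVotes` can also be expressed with `pd.json_normalize`, which flattens nested records in one call. This is a sketch, assuming the JSON has the shape used in the loop above; `flatten_affair` is a hypothetical helper name, and `errors='ignore'` is chosen so that a vote without a `divisionText` key yields a missing value instead of an exception.

```python
import pandas as pd


def flatten_affair(affair_details):
    """Return one row per councillor vote from the JSON of a single affair."""
    flat = pd.json_normalize(
        affair_details,
        # Walk into every vote, then into every councillor entry of that vote
        record_path=['affairVotes', 'councillorVotes'],
        # Carry the vote-level fields along onto each councillor row
        meta=[['affairVotes', 'date'],
              ['affairVotes', 'submissionText'],
              ['affairVotes', 'divisionText'],
              ['affairVotes', 'registrationNumber']],
        errors='ignore',
    )
    # Same column names as in the notebook (including its spelling)
    columns = {'affairVotes.date': 'Datum',
               'affairVotes.submissionText': 'Submision Text',
               'affairVotes.divisionText': 'Division Text',
               'affairVotes.registrationNumber': 'Reg. Number',
               'decision': 'Entscheid',
               'firstName': 'Vorname',
               'lastName': 'Nachname',
               'number': 'ID Nr.'}
    return flat.rename(columns=columns)[list(columns.values())]
```

The per-affair frames could then be gathered in a list and combined once with `pd.concat`, instead of appending dictionaries row by row.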

### Creating df with the councillor list

In [6]:
df = pd.DataFrame(councillor_list)

In [11]:
df.to_csv('data_api/parl_vots_2000-2016.csv')

In [34]:
df['Datum'] = pd.to_datetime(df['Datum']) 

In [35]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2164757 entries, 0 to 2164756
Data columns (total 8 columns):
Datum             datetime64[ns]
Division Text     object
Entscheid         object
ID Nr.            int64
Nachname          object
Reg. Number       int64
Submision Text    object
Vorname           object
dtypes: datetime64[ns](1), int64(2), object(5)
memory usage: 132.1+ MB


### Pulling in sessions 

In [15]:
url = 'http://ws-old.parlament.ch/sessions?format=json'
request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla'})
connection = urllib.request.urlopen(request)
js = connection.read()
sessions = json.loads(js.decode("utf-8"))

In [18]:
#Creating df for sessions
df_sessions = pd.DataFrame(sessions)

In [95]:
df_sessions.head()

| | code | from | name | to |
|---|---|---|---|---|
| 0 | 5006 | 2016-11-28 | Wintersession 2016 | 2016-12-16 |
| 1 | 5005 | 2016-09-12 | Herbstsession 2016 | 2016-09-30 |
| 2 | 5004 | 2016-05-30 | Sommersession 2016 | 2016-06-17 |
| 3 | 5003 | 2016-04-25 | Sondersession April 2016 | 2016-04-27 |
| 4 | 5002 | 2016-02-29 | Frühjahrssession 2016 | 2016-03-18 |


In [21]:
#Deleting the columns I don't need
del df_sessions['hasMorePages']
del df_sessions['id']
del df_sessions['updated']

In [29]:
#Overwrite the date column
df_sessions['from'] = pd.to_datetime(df_sessions['from'])  
df_sessions['to'] = pd.to_datetime(df_sessions['to']) 

In [56]:
#Creating lists
code_list = list(df_sessions['code'])
name_list = list(df_sessions['name'])
from_list = list(df_sessions['from'])
to_list = list(df_sessions['to'])

In [59]:
#Creating empty df
starting_dict = [{'Datum': 'XX-XX-XX',
                 'Division Text': 'XXXXXX',
                 'Entscheid': 'XXXXX',
                 'ID Nr.': 1111,
                 'Nachname': 'XXXXX',
                 'Reg. Number': 11111,
                 'Submision Text': 'XXXXX',
                 'Vorname': 'XXXXX',
                 'Session': 'XXXXX',
                 'Code': 1111}]

In [64]:
df_created = pd.DataFrame(starting_dict)

In [66]:
#Creating new df with the session information
for code, name, from_, to in zip(code_list, name_list, from_list, to_list):
    mask = (df['Datum'] > from_) & (df['Datum'] <= to)
    # .copy() avoids pandas' SettingWithCopyWarning when adding columns
    df_new = df.loc[mask].copy()
    df_new['Session'] = name
    df_new['Code'] = code

    df_created = pd.concat([df_created, df_new])


In [77]:
#Creating new code column
def first_two(number):
    number = str(number)
    return number[:2]

In [78]:
#Creating new column
df_created['S-Code'] = df_created['Code'].apply(first_two)
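
The session-tagging loop in cell `In [66]` can also be written without a placeholder row and without repeated `pd.concat` calls by using `pd.merge_asof`. A minimal sketch, assuming `Datum`, `from` and `to` are all timezone-naive datetime columns and that sessions do not overlap; `attach_sessions` is a hypothetical helper name. It keeps the loop's boundaries (`Datum > from` and `Datum <= to`).

```python
import pandas as pd


def attach_sessions(votes, sessions):
    """Tag each vote with the session whose date range contains it."""
    votes_sorted = votes.sort_values('Datum')
    sessions_sorted = sessions.sort_values('from')
    # For each vote, find the latest session that started strictly before it
    merged = pd.merge_asof(votes_sorted, sessions_sorted,
                           left_on='Datum', right_on='from',
                           direction='backward', allow_exact_matches=False)
    # Keep only the votes that do not lie after the end of that session;
    # votes without any matching session have a missing 'to' and drop out
    inside = merged['Datum'] <= merged['to']
    tagged = merged[inside].rename(columns={'name': 'Session', 'code': 'Code'})
    # Unmatched rows turned the code column into floats; restore integers
    tagged = tagged.astype({'Code': 'int64'})
    return tagged.drop(columns=['from', 'to'])
```

Called as `attach_sessions(df, df_sessions)`, this would return only the votes that fall inside a session, each with a `Session` and a `Code` column.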

### Selecting df_50 and df_49
For the 49th legislature, only the first year of sessions is selected.

In [91]:
#Selecting df_50
df_50 = df_created[df_created['S-Code'] == '50']

In [84]:
#Deleting the placeholder XXXX row (index label 0)
df_created = df_created.reindex(df_created.index.drop(0)) 

In [88]:
#Creating date format
df_created['Datum'] = pd.to_datetime(df_created['Datum']) 

In [89]:
#Selecting df_49
mask = (df_created['Datum'] > '2011-12-23') & (df_created['Datum'] <= '2012-12-14')
df_49 = df_created.loc[mask]

In [93]:
df_49.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 189332 entries, 248583 to 1503272
Data columns (total 12 columns):
Code               189332 non-null object
Datum              189332 non-null datetime64[ns]
Division Text      188535 non-null object
Entscheid          189332 non-null object
ID Nr.             189332 non-null int64
Nachname           189332 non-null object
Reg. Number        189332 non-null int64
Session            189332 non-null object
Submision Text     189332 non-null object
Submission Text    0 non-null object
Vorname            189332 non-null object
S-Code             189332 non-null object
dtypes: datetime64[ns](1), int64(2), object(9)
memory usage: 18.8+ MB


In [94]:
df_50.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 226134 entries, 1031679 to 2083356
Data columns (total 12 columns):
Code               226134 non-null object
Datum              226134 non-null datetime64[ns]
Division Text      155153 non-null object
Entscheid          226134 non-null object
ID Nr.             226134 non-null int64
Nachname           226134 non-null object
Reg. Number        226134 non-null int64
Session            226134 non-null object
Submision Text     226134 non-null object
Submission Text    0 non-null object
Vorname            226134 non-null object
S-Code             226134 non-null object
dtypes: datetime64[ns](1), int64(2), object(9)
memory usage: 22.4+ MB


### Creating Majority Vote Column

In [150]:
reg_list = list(set(list(df_50['Reg. Number'])))

In [151]:
#Creating empty df
starting_dict = [{'Datum': 'XX-XX-XX',
                 'Division Text': 'XXXXXX',
                 'Entscheid': 'XXXXX',
                 'ID Nr.': 1111,
                 'Nachname': 'XXXXX',
                 'Reg. Number': 11111,
                 'Submision Text': 'XXXXX',
                 'Vorname': 'XXXXX',
                 'Session': 'XXXXX',
                 'Code': 1111,
                 'S-Code': 11}]

In [152]:
df_50_majority = pd.DataFrame(starting_dict)

In [153]:
# Creating a df with the majority vote of each registration number

for item in reg_list:
    # .copy() avoids pandas' SettingWithCopyWarning when adding the column
    df_new = df_50[df_50['Reg. Number'] == item].copy()

    #Most frequent decision; value_counts() sorts in descending order
    df_new['Majority Vote'] = df_new['Entscheid'].value_counts().index[0]

    df_50_majority = pd.concat([df_50_majority, df_new])


In [154]:
#Why do some votes have abstentions (EH, "Enthaltung") as their majority vote?
df_50_majority['Majority Vote'].value_counts() #The abstentions need special handling

Yes    166342
No      59392
EH        400
Name: Majority Vote, dtype: int64

In [155]:
#Both have 200 rows, which is good: it means these are votes of the National Council (NR), which has 200 seats
df_50_majority[df_50_majority['Majority Vote'] == 'EH']['Reg. Number'].value_counts()

14163    200
13994    200
Name: Reg. Number, dtype: int64

In [156]:
#So the outcome of the vote with Reg. Number 14163 was actually "Yes"
df_50_majority[df_50_majority['Reg. Number'] == 14163]['Entscheid'].value_counts()

EH     109
Yes     74
No      13
ES       2
NT       1
P        1
Name: Entscheid, dtype: int64

In [157]:
#And the outcome of the vote with Reg. Number 13994 was actually "No"
df_50_majority[df_50_majority['Reg. Number'] == 13994]['Entscheid'].value_counts()

EH     109
No      83
Yes      5
NT       1
ES       1
P        1
Name: Entscheid, dtype: int64

In [160]:
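#That means I need to change the majority from EH to 'Yes' for 14163
#and from EH to 'No' for 13994.

#First I need to rename the columns though, as attribute access (df.reg) can't deal with spaces in column names
df_50_majority = df_50_majority.rename(columns={
    'Code': 'code', 'Datum': 'datum', 'Division Text': 'div_text',
    'Entscheid': 'entscheid', 'ID Nr.': 'id', 'Majority Vote': 'majority',
    'Nachname': 'nachname', 'Reg. Number': 'reg', 'S-Code': 's-code',
    'Session': 'session', 'Submision Text': 'sub_text',
    'Submission Text': 'st', 'Vorname': 'vorname'})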

In [164]:
#now I can manipulate the values in the cells
df_50_majority.loc[df_50_majority.reg == 14163, 'majority'] = "Yes"
df_50_majority.loc[df_50_majority.reg == 13994, 'majority'] = "No"
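
The manual correction of the two abstention-dominated votes could be avoided by counting only `Yes` and `No` when working out the majority, and the per-vote loop could be replaced by a `groupby`. This is a sketch under a stated assumption, namely that only `Yes` and `No` decide the outcome of a vote; `add_majority_vote` is a hypothetical helper name, and the default column names are those used before the renaming.

```python
import pandas as pd


def add_majority_vote(votes, vote_col='Entscheid', group_col='Reg. Number'):
    """Add a 'Majority Vote' column based on the Yes/No decisions only."""
    # Assumption: abstentions (EH) and absences (ES, NT, P) do not decide a vote
    yes_no = votes[votes[vote_col].isin(['Yes', 'No'])]
    # Most frequent Yes/No decision per registration number
    winners = yes_no.groupby(group_col)[vote_col].agg(
        lambda decisions: decisions.value_counts().index[0])
    result = votes.copy()
    # Look up the winning decision for the registration number of each row
    result['Majority Vote'] = result[group_col].map(winners)
    return result
```

With the renamed columns, the call would be `add_majority_vote(df_50_majority, vote_col='entscheid', group_col='reg')`. A vote without any `Yes` or `No` would receive a missing value in `Majority Vote`.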

In [166]:
#Dropping the placeholder XXXX row (index label 0)
df_50_majority = df_50_majority.reindex(df_50_majority.index.drop(0)) 

In [130]:
reg_list = list(set(list(df_49['Reg. Number'])))

In [131]:
df_49_majority = pd.DataFrame(starting_dict)

In [132]:
for item in reg_list:
    # .copy() avoids pandas' SettingWithCopyWarning when adding the column
    df_new = df_49[df_49['Reg. Number'] == item].copy()

    #Most frequent decision; value_counts() sorts in descending order
    df_new['Majority Vote'] = df_new['Entscheid'].value_counts().index[0]

    df_49_majority = pd.concat([df_49_majority, df_new])


In [138]:
#Dropping the placeholder XXXX row (index label 0)
df_49_majority = df_49_majority.reindex(df_49_majority.index.drop(0)) 

In [139]:
df_49_majority['Majority Vote'].value_counts() 

Yes    147246
No      42086
Name: Majority Vote, dtype: int64

In [167]:
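#Renaming the columns of df_49_majority so the names are consistent with df_50_majority
df_49_majority = df_49_majority.rename(columns={
    'Code': 'code', 'Datum': 'datum', 'Division Text': 'div_text', 'Entscheid': 'entscheid',
    'ID Nr.': 'id', 'Majority Vote': 'majority', 'Nachname': 'nachname', 'Reg. Number': 'reg',
    'S-Code': 's-code', 'Session': 'session', 'Submision Text': 'sub_text',
    'Submission Text': 'st', 'Vorname': 'vorname'})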

In [168]:
df_49_majority.to_csv('data_api/df_49_Leg_erstes_jahr.csv')

In [169]:
df_50_majority.to_csv('data_api/df_50_Leg_erstes_jahr.csv')