# CONGRESS VOTES DATA
2021-10-11

# VOTES
https://voteview.com/articles/data_help_votes
### Fields:

congress: Integer 1+. The number of the congress that this member's row refers to. e.g. 115 for the 115th Congress (2017-2019) <br />

chamber: House, Senate, or President. The chamber in which the member served. <br />

rollnumber: Integer 1+. Starts from 1 in the first rollcall of each congress. Excludes quorum calls and vacated votes. <br />

icpsr: Integer 1-99999. This is an ID code which identifies the member in question. In general, each member receives a single ICPSR identifier applicable to their entire career. A small number of members have received more than one: this can occur for members who have switched parties, as well as for members who subsequently became president. Creating a new identifier allows a new NOMINATE estimate to be produced for separate appearances of a member in different roles. <br />

cast_code: Integer 0-9. Indicator of how the member voted. <br />

prob: Estimated probability, based on NOMINATE, of the member making the vote as recorded. <br />

### Cast Codes
cast_code	Description <br />
0	Not a member of the chamber when this vote was taken <br />
1	Yea <br />
2	Paired Yea <br />
3	Announced Yea <br />
4	Announced Nay <br />
5	Paired Nay <br />
6	Nay <br />
7	Present (some Congresses) <br />
8	Present (some Congresses) <br />
9	Not Voting (Abstention) <br />

# Member Ideology Data

https://voteview.com/articles/data_help_members


### Biographical Fields: 
congress: Integer 1+. The number of the congress that this member's row refers to. e.g. 115 for the 115th Congress (2017-2019) <br />

chamber: House, Senate, or President. The chamber in which the member served. <br />

icpsr: Integer 1-99999. This is an ID code which identifies the member in question. In general, each member receives a single ICPSR identifier applicable to their entire career. A small number of members have received more than one: this can occur for members who have switched parties, as well as for members who subsequently became president. Creating a new identifier allows a new NOMINATE estimate to be produced for separate appearances of a member in different roles. <br />

state_icpsr: Integer 0-99. Identifier for the state represented by the member. <br />

district_code: Integer 0-99. Identifier for the district that the member represents within their state (e.g. 3 for the Alabama 3rd Congressional District). Senate members are given district_code 0. Members who represent historical "at-large" districts are assigned 99, 98, or 1 in various circumstances. <br />

state_abbrev: String. Two-character postal abbreviation for state (e.g. MO for Missouri). <br />

party_code: Integer 1-9999. Identifying code for the member's party. Please see documentation for Party Data for more information about which party_code identifiers refer to which parties. <br />

occupancy: Integer 0+. ICPSR occupancy code. This item is considered legacy or incomplete information and has not been verified. In general, members receive 0 if they are the only occupant, 1 if they are the first occupant, 2 if they are the second occupant, etc. <br />

last_means: Integer 1-5. ICPSR Attain-Office Code. This is an indicator that reflects the member's last means of attaining office. This item is considered legacy or incomplete information and has not been verified. Members received 1 if they were elected in a general election, 2 if elected by special election, 3 if directly elected by a state legislature, and 5 if appointed. <br />

bioname: String. Name of the member, surname first. For most members, agrees with the Biographical Directory of Congress. <br />

bioguide_id: String. Member identifier in the Biographical Directory of Congress. <br />

born: Integer. Year of member's birth. <br />

died: Integer. Year of member's death. <br />


## Party Codes
https://voteview.com/articles/data_help_parties

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx

In [2]:
file_vote = 'HSall_votes.csv'
file_members = 'HSall_members.csv'

In [3]:
df_vote = pd.read_csv(file_vote)
df_members = pd.read_csv(file_members)

In [4]:
df_vote.head()
df_members.head()

Unnamed: 0,congress,chamber,icpsr,state_icpsr,district_code,state_abbrev,party_code,occupancy,last_means,bioname,...,died,nominate_dim1,nominate_dim2,nominate_log_likelihood,nominate_geo_mean_probability,nominate_number_of_votes,nominate_number_of_errors,conditional,nokken_poole_dim1,nokken_poole_dim2
0,1,President,99869,99,0.0,USA,5000,,,"WASHINGTON, George",...,,,,,,,,,,
1,1,House,4766,1,98.0,CT,5000,0.0,1.0,"HUNTINGTON, Benjamin",...,1800.0,0.639,0.304,-29.0467,0.708,84.0,12.0,,0.649,0.229
2,1,House,8457,1,98.0,CT,5000,0.0,1.0,"SHERMAN, Roger",...,1793.0,0.589,0.307,-40.5958,0.684,107.0,18.0,,0.614,0.298
3,1,House,9062,1,98.0,CT,5000,0.0,1.0,"STURGES, Jonathan",...,1819.0,0.531,0.448,-25.87361,0.724,80.0,13.0,,0.573,0.529
4,1,House,9489,1,98.0,CT,5000,0.0,1.0,"TRUMBULL, Jonathan, Jr.",...,1809.0,0.692,0.246,-30.47113,0.75,106.0,11.0,,0.749,0.166


In [5]:
df_vote.head()

Unnamed: 0,congress,chamber,rollnumber,icpsr,cast_code,prob
0,1,House,1,154.0,6,61.1
1,1,House,1,259.0,9,99.6
2,1,House,1,379.0,1,100.0
3,1,House,1,649.0,1,59.2
4,1,House,1,786.0,1,97.7


To do :

- cast_code converter to Yes/No
- party_code dict
- icpsr to party_code

- define republican / democrat vote
- extract meaningful congress data (past 1970s)
- get exact bill time stamp


Final output:
Adjacency matrix of: MP x Rep/Dem vote x time?

How should time be treated? Is every bill separated by the same interval, or should the actual timing of each event be used?
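One way to approach the "define republican / democrat vote" item is to take each party's majority position on every roll call. The sketch below is only an illustration on made-up toy data: the column name `vote` (1 = yea, 0 = nay, 2 = no position), the majority rule and the tie handling are all assumptions, not something fixed by the notes above.

```python
import pandas as pd

# Toy votes, already reduced to 1 = yea, 0 = nay, 2 = no position (made-up values).
toy = pd.DataFrame({
    "rollnumber": [1, 1, 1, 1, 1, 1],
    "icpsr":      [10, 11, 12, 20, 21, 22],
    "party_code": [100, 100, 100, 200, 200, 200],  # 100 = Democrat, 200 = Republican
    "vote":       [1, 1, 0, 0, 0, 2],
})

# Keep only yea/nay votes, then compute the share of yeas per roll call and party.
cast = toy[toy["vote"].isin([0, 1])]
yea_share = cast.groupby(["rollnumber", "party_code"])["vote"].mean()

# Assumption: the "party vote" is yea when more than half of the party's
# yea/nay votes are yeas; an exact tie counts as nay under this rule.
party_vote = (yea_share > 0.5).astype(int).unstack("party_code")
print(party_vote)
```

The result has one row per roll call and one column per party code, which could then be compared with each member's own vote.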

In [6]:
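# number of vote records, in millions
len(df_vote)/10**6

def congress2year(n):
    # first year of the n-th Congress (the 1st Congress began in 1789)
    return 2*n + 1787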

In [7]:
congress_limit = 85
congress2year(congress_limit)

1957
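Going the other way can be handy when choosing `congress_limit` from a calendar year. The following is a small sketch: `year2congress` is a hypothetical helper that is not part of the notebook, and it assumes whole calendar years are attributed to a single Congress.

```python
def congress2year(n):
    """First year of the n-th Congress (same formula as in the notebook)."""
    return 2 * n + 1787


def year2congress(year):
    """Hypothetical inverse of congress2year.

    Assumption: a whole calendar year is attributed to one Congress, ignoring
    the first days of January in odd years that still belong to the previous one.
    """
    return (year - 1787) // 2


print(congress2year(85))    # 1957
print(year2congress(1957))  # 85
print(year2congress(1958))  # 85
print(year2congress(1959))  # 86
```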

In [8]:
# keep only rows from Congress n_congress + 1 onwards (here: the 86th, starting in 1959)
def congress_lim(df, n_congress):
    return df.loc[df['congress'] > n_congress]

df = congress_lim(df_vote, congress_limit)

In [9]:
# keep only the df_members columns needed for the merge
df_members_light = df_members[['congress', 'chamber', 'icpsr', 'party_code']]
df_members_light = congress_lim(df_members_light, congress_limit)


In [10]:
df_members_light.head()

Unnamed: 0,congress,chamber,icpsr,party_code
32419,86,President,99901,200
32420,86,House,195,100
32421,86,House,937,100
32422,86,House,2909,100
32423,86,House,3754,100


In [11]:
result_m = pd.merge(
                df, df_members_light, on=['congress','icpsr','chamber'], how='inner', sort=False
                )
print(f'Is missing {len(df)-len(result_m)}')

Is missing 49000
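To see which vote rows fail to find a member, a left merge with `indicator=True` can be used instead of only comparing lengths. This is a minimal sketch on made-up toy frames standing in for `df` and `df_members_light`; the values, including the unmatched `icpsr` 999, are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for `df` (votes) and `df_members_light`; all values are made up.
votes = pd.DataFrame({
    "congress": [86, 86, 86],
    "chamber": ["House", "House", "Senate"],
    "rollnumber": [1, 1, 1],
    "icpsr": [2, 13, 999],
    "cast_code": [1, 6, 1],
})
members = pd.DataFrame({
    "congress": [86, 86],
    "chamber": ["House", "House"],
    "icpsr": [2, 13],
    "party_code": [100, 200],
})

# A left merge keeps every vote row; `indicator=True` adds a `_merge` column
# telling whether a matching member row was found.
check = votes.merge(members, on=["congress", "icpsr", "chamber"],
                    how="left", indicator=True)
unmatched = check[check["_merge"] == "left_only"]

print(len(unmatched))
print(unmatched[["congress", "chamber", "icpsr"]])
```

Grouping the unmatched rows by `congress` and `chamber` would be one way to check whether the missing rows cluster in a particular period.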


In [12]:
result_m.head()

Unnamed: 0,congress,chamber,rollnumber,icpsr,cast_code,prob,party_code
0,86,House,1,2.0,1,98.4,100
1,86,House,2,2.0,6,98.8,100
2,86,House,3,2.0,1,98.6,100
3,86,House,4,2.0,1,93.4,100
4,86,House,5,2.0,6,87.5,100


In [13]:
df_rollcalls = pd.read_csv('HSall_rollcalls.csv')
df_parties = pd.read_csv('HSall_parties.csv')

df_rollcalls = congress_lim(df_rollcalls,congress_limit)
df_parties = congress_lim(df_parties,congress_limit)



In [14]:
# keep only the roll-call columns needed to attach a date to each vote
df_rollcalls_light = df_rollcalls[['congress', 'chamber', 'rollnumber', 'date']]
df_rollcalls_light.head()

Unnamed: 0,congress,chamber,rollnumber,date
54683,86,House,1,1959-01-07
54684,86,House,2,1959-02-04
54685,86,House,3,1959-02-04
54686,86,House,4,1959-02-05
54687,86,House,5,1959-03-11
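These first few dates already bear on the question of whether roll calls are evenly spaced. A quick sketch, using only the five dates shown above, computes the gap in days between consecutive roll calls:

```python
import pandas as pd

# The five roll-call dates shown in the output above.
dates = pd.to_datetime(pd.Series([
    "1959-01-07", "1959-02-04", "1959-02-04", "1959-02-05", "1959-03-11",
]))

# Days elapsed since the previous roll call; the first row has no predecessor
# and is dropped before converting to whole days.
gap_days = dates.diff().dropna().dt.days
print(gap_days.tolist())  # [28, 0, 1, 34]
```

The gaps range from 0 to 34 days in this sample, so treating every roll call as one equal time step would discard real timing information.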


In [15]:
# keep only the party columns needed to attach a party name
df_parties_light = df_parties[['congress', 'chamber', 'party_code', 'party_name']]
df_parties_light.head()

Unnamed: 0,congress,chamber,party_code,party_name
641,86,President,200,Republican
642,86,House,100,Democrat
643,86,House,200,Republican
644,86,House,329,Ind. Democrat
645,86,Senate,100,Democrat


In [16]:
result_p = pd.merge(
                result_m, df_parties_light, on=['congress','party_code','chamber'], how='inner', sort=False
                )
    
print(f'Is missing {len(result_m)-len(result_p)}')

Is missing 0


In [17]:
result_r = pd.merge(
                result_p, df_rollcalls_light, on=['congress','rollnumber','chamber'], how='inner', sort=False
                )

print(f'Is missing {len(result_p)-len(result_r)}')

Is missing 0
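Comparing lengths detects dropped rows, but an inner merge can also duplicate rows when the lookup table repeats a key, and the two effects can cancel out. As a hedged sketch, the `validate` argument of `pd.merge` can guard against this; the toy frames below are made up and deliberately contain a duplicated key.

```python
import pandas as pd

left = pd.DataFrame({"congress": [86, 86],
                     "chamber": ["House", "House"],
                     "party_code": [100, 200]})
# Made-up lookup table with an accidental duplicate key (86, House, 100).
right = pd.DataFrame({"congress": [86, 86, 86],
                      "chamber": ["House", "House", "House"],
                      "party_code": [100, 100, 200],
                      "party_name": ["Democrat", "Democrat", "Republican"]})

keys = ["congress", "party_code", "chamber"]
try:
    # many_to_one: each key may repeat on the left but must be unique on the right.
    pd.merge(left, right, on=keys, how="inner", validate="many_to_one")
    duplicate_keys_found = False
except pd.errors.MergeError:
    duplicate_keys_found = True

print(duplicate_keys_found)  # True
```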


In [18]:
result_r

Unnamed: 0,congress,chamber,rollnumber,icpsr,cast_code,prob,party_code,party_name,date
0,86,House,1,2.0,1,98.4,100,Democrat,1959-01-07
1,86,House,1,13.0,1,99.5,100,Democrat,1959-01-07
2,86,House,1,46.0,1,99.5,100,Democrat,1959-01-07
3,86,House,1,62.0,1,100.0,100,Democrat,1959-01-07
4,86,House,1,82.0,1,99.9,100,Democrat,1959-01-07
...,...,...,...,...,...,...,...,...,...
15875208,117,Senate,402,42103.0,1,100.0,100,Democrat,2021-09-30
15875209,117,Senate,402,42104.0,1,100.0,100,Democrat,2021-09-30
15875210,117,Senate,402,42105.0,1,100.0,100,Democrat,2021-09-30
15875211,117,Senate,402,29147.0,1,100.0,328,Independent,2021-09-30


In [19]:
result_r['cast_code'].unique()

array([1, 7, 6, 5, 2, 3, 4, 9, 8])

In [21]:
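def cast_code_to_bool(i):
    # collapse the Voteview cast codes into three classes (not a true boolean)
    if i in [4,5,6]:    # Announced Nay, Paired Nay, Nay: (yes, no) = (0, 1)
        return 0
    elif i in [1,2,3]:  # Yea, Paired Yea, Announced Yea: (yes, no) = (1, 0)
        return 1
    else:               # present, not voting, not a member: (0, 0), no links
        return 2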

In [22]:
result_r['cast_code'] = result_r['cast_code'].apply(cast_code_to_bool)
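With roughly 16 million rows, a Python-level `apply` can be slow. One possible alternative, sketched here on a small made-up Series rather than on `result_r`, is a dictionary lookup with `Series.map`; `CAST_MAP` is a hypothetical name and encodes the same grouping as `cast_code_to_bool`.

```python
import pandas as pd

# Same grouping as cast_code_to_bool: 1-3 -> 1 (yea), 4-6 -> 0 (nay), anything else -> 2.
CAST_MAP = {1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 0}

codes = pd.Series([1, 7, 6, 5, 2, 3, 4, 9, 8])  # the cast codes found in result_r

# Codes missing from the dict become NaN, which is then filled with 2.
converted = codes.map(CAST_MAP).fillna(2).astype(int)
print(converted.tolist())  # [1, 2, 0, 0, 1, 1, 0, 2, 2]
```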


In [23]:
result_r = result_r.drop(['prob'], axis=1)

In [24]:
result_r

Unnamed: 0,congress,chamber,rollnumber,icpsr,cast_code,party_code,party_name,date
0,86,House,1,2.0,1,100,Democrat,1959-01-07
1,86,House,1,13.0,1,100,Democrat,1959-01-07
2,86,House,1,46.0,1,100,Democrat,1959-01-07
3,86,House,1,62.0,1,100,Democrat,1959-01-07
4,86,House,1,82.0,1,100,Democrat,1959-01-07
...,...,...,...,...,...,...,...,...
15875208,117,Senate,402,42103.0,1,100,Democrat,2021-09-30
15875209,117,Senate,402,42104.0,1,100,Democrat,2021-09-30
15875210,117,Senate,402,42105.0,1,100,Democrat,2021-09-30
15875211,117,Senate,402,29147.0,1,328,Independent,2021-09-30
