In [1]:
import json
import numpy as np
import pandas as pd
import psycopg2
import us

%load_ext jupyternotify


# Pharmaceutical lobbying

We will use lobbying data from the [National Institute on Money in Politics](https://www.followthemoney.org/) to determine the extent of pharmaceutical industry state legislative lobbying from 2008 through 2017.

## Create tables for analysis

First, connect to the database.

In [2]:
with open("config.json") as f:
    conf = json.load(f)

In [3]:
conn_str = "host={} dbname={} user={} password={}".format(conf["host"], conf["database"], conf["user"], conf["password"])

In [4]:
conn = psycopg2.connect(conn_str)
conn.autocommit = True # Allow the notebook to commit transactions (like creating a table) to the connected database.
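
The notebook does not show `config.json` itself. Judging only from the keys read in cell 3, it presumably has the shape sketched below. The values are placeholders and `build_conn_str` is a hypothetical helper, not part of the original notebook.

```python
import json


def build_conn_str(conf):
    """Build a libpq-style connection string from a config dict (hypothetical helper)."""
    return "host={host} dbname={database} user={user} password={password}".format(**conf)


# Assumed layout of config.json; every value here is a placeholder.
example_conf = json.loads("""
{
    "host": "localhost",
    "database": "lobbying",
    "user": "analyst",
    "password": "secret"
}
""")

conn_str_example = build_conn_str(example_conf)
print(conn_str_example)
```

Keeping credentials in a separate file like this keeps them out of the notebook, provided the file itself is not committed to version control.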

Query the database to create a new table containing only lobbying records from 2008 onward.

In [5]:
%%notify
lobbying_08_17 = pd.read_sql("""CREATE TABLE IF NOT EXISTS lobbying_08_17 AS
SELECT *
FROM lobbying
WHERE year >= 2008;

GRANT ALL ON TABLE lobbying_08_17 TO redash;


SELECT *
FROM lobbying_08_17;""", con=conn)


Now we'll query the database to create a new table of pharmaceutical lobbying.

In [6]:
%%notify
registrations = pd.read_sql("""CREATE TABLE IF NOT EXISTS pharma_lobbying AS
SELECT *
FROM lobbying_08_17
WHERE year >= 2008
  AND catcodebusiness = 'Pharmaceutical manufacturing';

GRANT ALL ON TABLE pharma_lobbying TO redash;


SELECT *
FROM pharma_lobbying;""", con=conn)


The data reflects lobbyist registration forms. Depending on a state's lobbying disclosure rules, a lobbyist may file several registration forms listing the same client in the same year. To count these client-lobbyist relationships accurately, we need to create a table grouped on state, year, lobbyist and client.

In [7]:
%%notify
lobbyist_client_relationships = pd.read_sql("""CREATE TABLE IF NOT EXISTS lobbyist_client_relationships AS
SELECT jurisdiction,
       year,
       client,
       lobbyist,
       count(*) as number_of_registrations
FROM pharma_lobbying
GROUP BY jurisdiction,
         year,
         client,
         lobbyist;

GRANT ALL ON TABLE lobbyist_client_relationships TO redash;


SELECT *
FROM lobbyist_client_relationships;""", con=conn)
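
To see what the grouping does, here is a minimal pandas sketch of the same idea on toy data. The rows, client names and lobbyist names are invented for illustration and do not come from the lobbying database.

```python
import pandas as pd

# Toy stand-in for pharma_lobbying: made-up rows, not real data.
toy_registrations = pd.DataFrame({
    "jurisdiction": ["AK", "AK", "AK", "AL"],
    "year": [2008, 2008, 2008, 2008],
    "client": ["Acme Pharma", "Acme Pharma", "Acme Pharma", "Acme Pharma"],
    "lobbyist": ["Doe, Jane", "Doe, Jane", "Roe, Richard", "Doe, Jane"],
})

# One row per (state, year, client, lobbyist); duplicate filings collapse
# into a single relationship with a count of how many forms were filed.
toy_relationships = (
    toy_registrations
    .groupby(["jurisdiction", "year", "client", "lobbyist"])
    .size()
    .reset_index(name="number_of_registrations")
)
print(toy_relationships)
```

The four toy filings collapse into three relationships, because Jane Doe's two Alaska filings for the same client count as one relationship with `number_of_registrations` equal to 2.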


## Import and format state legislative data

One way to measure the influence of pharmaceutical industry lobbying on state legislatures is to compare the number of lobbyists to the number of legislators. To do so, we will use data from the [National Conference of State Legislatures](http://www.ncsl.org/).

In [8]:
# `sheet_name` replaces the deprecated `sheetname` keyword.
legislators = pd.read_excel("data/ncsl_partisan_composition_2000_2018.xlsx", sheet_name="2018", header=1, names=["state", "legislators"], usecols=[0, 1], nrows=50)
legislators



,state,legislators
0,Alabama,140
1,Alaska*,60
2,Arizona,90
3,Arkansas,135
4,California,120
5,Colorado,100
6,Connecticut*,187
7,Delaware,62
8,Florida,160
9,Georgia,236


In [9]:
# Strip the footnote asterisks; regex=False treats "*" as a literal character.
legislators["state"] = legislators["state"].str.replace("*", "", regex=False).str.strip()

We need to add state abbreviations so that we can merge this data with the lobbying data later. To do so, we will use the [`us` package](https://github.com/unitedstates/python-us).

In [10]:
states = pd.DataFrame.from_dict(us.states.mapping("abbr", "name"), orient="index", columns=["state"])
states.reset_index(inplace=True)
states.rename(columns={"index": "abbreviation"}, inplace=True)
states

,abbreviation,state
0,AL,Alabama
1,AK,Alaska
2,AS,American Samoa
3,AZ,Arizona
4,AR,Arkansas
5,CA,California
6,CO,Colorado
7,CT,Connecticut
8,DK,Dakota
9,DE,Delaware


In [11]:
legislators = legislators.merge(states, on="state", how="inner")
legislators

,state,legislators,abbreviation
0,Alabama,140,AL
1,Alaska,60,AK
2,Arizona,90,AZ
3,Arkansas,135,AR
4,California,120,CA
5,Colorado,100,CO
6,Connecticut,187,CT
7,Delaware,62,DE
8,Florida,160,FL
9,Georgia,236,GA
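
An inner merge silently drops any state whose name fails to match, for example because of a leftover footnote marker. One way to check for that, sketched here on toy data, is a left merge with `indicator=True`. The toy lookup table deliberately omits Connecticut to show what a mismatch looks like; the frame and variable names are illustrative only.

```python
import pandas as pd

toy_legislators = pd.DataFrame({
    "state": ["Alabama", "Alaska*", "Connecticut*"],
    "legislators": [140, 60, 187],
})
# Remove footnote asterisks literally (not as a regex) and trim whitespace.
toy_legislators["state"] = (
    toy_legislators["state"].str.replace("*", "", regex=False).str.strip()
)

# Deliberately incomplete lookup table, to trigger one unmatched row.
toy_states = pd.DataFrame({
    "abbreviation": ["AL", "AK"],
    "state": ["Alabama", "Alaska"],
})

# indicator=True adds a "_merge" column saying where each row came from.
checked = toy_legislators.merge(toy_states, on="state", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "state"].tolist()
print(unmatched)
```

An empty `unmatched` list would indicate that every state found an abbreviation and the inner merge lost nothing.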


## Analyze the data

How many lobbyist-client relationships representing pharmaceutical interests were there in each state and year, as measured by annual registrations? And how does that number compare to the number of legislators in each state?

In [12]:
lobbying_by_year_state = lobbyist_client_relationships.groupby(["year", "jurisdiction"]).agg({"client": len})
lobbying_by_year_state.reset_index(inplace=True)
lobbying_by_year_state.rename(columns={"jurisdiction": "state", "client": "registrations"}, inplace=True)
# And now join this with the legislators dataframe.
lobbying_by_year_state = lobbying_by_year_state.merge(legislators, left_on="state", right_on="abbreviation", how="outer")
lobbying_by_year_state.drop("state_x", axis=1, inplace=True)
lobbying_by_year_state.rename(columns={"state_y": "state"}, inplace=True)
lobbying_by_year_state.head()

,year,registrations,state,legislators,abbreviation
0,2008,22,Alaska,60,AK
1,2009,15,Alaska,60,AK
2,2010,11,Alaska,60,AK
3,2011,13,Alaska,60,AK
4,2012,8,Alaska,60,AK
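
The outer merge in the cell above keeps states on either side that have no partner. A small sketch with invented relationships shows the effect: a state with legislators but no lobbying rows survives the merge with missing `year` and `registrations` values. The client and lobbyist names are made up.

```python
import pandas as pd

# Invented relationships for one state only.
toy_relationships = pd.DataFrame({
    "year": [2008, 2008, 2009],
    "jurisdiction": ["AK", "AK", "AK"],
    "client": ["Acme Pharma", "Beta Labs", "Acme Pharma"],
    "lobbyist": ["Doe, Jane", "Doe, Jane", "Doe, Jane"],
})
toy_legislators = pd.DataFrame({
    "state": ["Alaska", "Delaware"],
    "legislators": [60, 62],
    "abbreviation": ["AK", "DE"],
})

# Count relationships per year and state.
counts = (
    toy_relationships
    .groupby(["year", "jurisdiction"])
    .size()
    .reset_index(name="registrations")
)

# Outer merge: Delaware has no lobbying rows but is still kept.
merged = counts.merge(
    toy_legislators, left_on="jurisdiction", right_on="abbreviation", how="outer"
)
print(merged)
```

Note that a state missing from only some years does not get a placeholder row this way, since the merge key is the state alone.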


New York added a senator in 2013, so we need to subtract one legislator from the state's total for 2012 and earlier.

In [13]:
lobbying_by_year_state.loc[(lobbying_by_year_state["abbreviation"] == "NY") & (lobbying_by_year_state["year"] <= 2012), "legislators"] = 212
# Calculate the legislators-to-registrations ratio.
lobbying_by_year_state["legislators_to_registrations"] = lobbying_by_year_state["legislators"] / lobbying_by_year_state["registrations"]
lobbying_by_year_state.head()

,year,registrations,state,legislators,abbreviation,legislators_to_registrations
0,2008,22,Alaska,60,AK,2.727273
1,2009,15,Alaska,60,AK,4.0
2,2010,11,Alaska,60,AK,5.454545
3,2011,13,Alaska,60,AK,4.615385
4,2012,8,Alaska,60,AK,7.5
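
As a worked illustration of the adjustment and the ratio, here is a small sketch on toy rows. The registration counts are invented; only the legislator totals (212 and 213 for New York, 63 for Nevada) come from this notebook. Replacing zero registrations with `NaN` before dividing is an assumption of this sketch, chosen so the ratio is missing rather than infinite.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "year": [2012, 2013, 2014],
    "abbreviation": ["NY", "NY", "NV"],
    "registrations": [100, 100, 0],   # invented counts
    "legislators": [213, 213, 63],
})

# New York had one fewer legislator through 2012.
toy.loc[(toy["abbreviation"] == "NY") & (toy["year"] <= 2012), "legislators"] = 212

# Treat zero registrations as missing so the ratio is NaN instead of inf.
toy["legislators_to_registrations"] = (
    toy["legislators"] / toy["registrations"].replace(0, np.nan)
)
print(toy)
```

With these toy numbers the New York ratios are 2.12 for 2012 and 2.13 for 2013, and the zero-registration row has no ratio.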


In [14]:
# Passing "sum" as a string avoids the warning newer pandas versions raise for the builtin.
lobbying_by_year_state.groupby("year").agg({"registrations": "sum", "legislators": "sum"})

year,registrations,legislators
2008,2708,7382
2009,2497,7382
2010,2692,7382
2011,3051,7382
2012,2520,7382
2013,3259,7383
2014,3320,7320
2015,3725,7383
2016,3454,7320
2017,3162,7383


We know the number of legislators should stay constant, with the exception of New York's count going up by one after 2012. So why is the total number of legislators in 2014 and 2016 lower than in other years?

In [15]:
lobbying_by_year_state.groupby(["abbreviation"])["abbreviation"].size()

abbreviation
AK    10
AL    10
AR    10
AZ    10
CA    10
CO    10
CT    10
DE    10
FL    10
GA    10
HI    10
IA    10
ID    10
IL    10
IN    10
KS    10
KY    10
LA    10
MA    10
MD    10
ME    10
MI    10
MN    10
MO    10
MS    10
MT    10
NC    10
ND    10
NE    10
NH    10
NJ    10
NM    10
NV     8
NY    10
OH    10
OK    10
OR    10
PA    10
RI    10
SC    10
SD    10
TN    10
TX    10
UT    10
VA    10
VT    10
WA    10
WI    10
WV    10
WY    10
Name: abbreviation, dtype: int64
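
The row counts show which state is short, but not which years are missing. A minimal sketch of one way to list the missing years per state, using toy data with invented state-year rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "abbreviation": ["AK", "AK", "AK", "NV", "NV"],
    "year": [2013, 2014, 2015, 2013, 2015],
})

# Every year that appears anywhere in the data.
all_years = set(toy["year"].tolist())

# For each state, the years present elsewhere but absent for that state.
missing = {}
for state, group in toy.groupby("abbreviation"):
    gap = sorted(all_years - set(group["year"].tolist()))
    if gap:
        missing[state] = gap

print(missing)  # {'NV': [2014]}
```

Applied to `lobbying_by_year_state`, the same loop would report the gaps behind any state with fewer than ten rows.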

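Nevada has only eight rows, which means the data contains no pharmaceutical lobbyist registrations for the state in two years. The shortfall of 63 legislators in 2014 and 2016 matches Nevada's 63 legislators, so those are the missing years. Let's create a dataframe with zero-registration rows for Nevada and concatenate it to the existing dataframe.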

In [16]:
# np.nan is used because the np.NaN alias was removed in NumPy 2.0.
missing_nv = pd.DataFrame({"year": [2014, 2016],
                           "registrations": [0, 0],
                           "state": ["Nevada", "Nevada"],
                           "legislators": [63, 63],
                           "abbreviation": ["NV", "NV"],
                           "legislators_to_registrations": [np.nan, np.nan]},
                          index=[0, 1])
frames = [lobbying_by_year_state, missing_nv]
lobbying_by_year_state = pd.concat(frames)
lobbying_by_year_state.groupby("year").agg({"registrations": "sum", "legislators": "sum"})

year,registrations,legislators
2008,2708,7382
2009,2497,7382
2010,2692,7382
2011,3051,7382
2012,2520,7382
2013,3259,7383
2014,3320,7383
2015,3725,7383
2016,3454,7383
2017,3162,7383
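
Building the missing rows by hand works for two rows. A more general alternative, sketched here on toy data and not what the notebook does, is to reindex the counts against every year and state combination so that absent pairs are filled with zero. The registration counts are invented.

```python
import pandas as pd

toy = pd.DataFrame({
    "year": [2013, 2014, 2015, 2013, 2015],
    "abbreviation": ["AK", "AK", "AK", "NV", "NV"],
    "registrations": [5, 4, 6, 2, 3],   # invented counts
})

# Every (year, state) pair that should exist.
full_index = pd.MultiIndex.from_product(
    [sorted(toy["year"].unique()), sorted(toy["abbreviation"].unique())],
    names=["year", "abbreviation"],
)

# Reindexing adds the absent pairs and fills their registrations with 0.
filled = (
    toy.set_index(["year", "abbreviation"])
    .reindex(full_index, fill_value=0)
    .reset_index()
)
print(filled)
```

In this approach the legislator counts would be merged in after the reindex, so that the filled rows pick up their state's total as well.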


Perfect. Finally, export the data to Excel for visualization and further analysis.

In [17]:
lobbying_by_year_state.to_excel("data/pharma_lobbying.xlsx", index=False, sheet_name="Pharmaceutical Lobbying")