In [1]:
import json
import numpy as np
import pandas as pd
import psycopg2

%load_ext jupyternotify


# Pharmaceutical lobbying

We will use lobbying data from the [National Institute on Money in Politics](https://www.followthemoney.org/) to measure how extensively the pharmaceutical industry lobbied state legislatures from 2008 through 2017.

## Create tables for analysis

First, connect to the database.

In [2]:
with open("config.json") as f:
    conf = json.load(f)

In [3]:
conn_str = "host={} dbname={} user={} password={}".format(conf["host"], conf["database"], conf["user"], conf["password"])

In [4]:
conn = psycopg2.connect(conn_str)
conn.autocommit = True # Allow the notebook to commit transactions (like creating a table) to the connected database.
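A connection string assembled with `str.format` can break if a value (for example the password) contains spaces or quotes, because libpq-style `key=value` strings require such values to be quoted. One possible alternative, sketched below, is to pass the settings as keyword arguments instead. The helper name `build_connect_kwargs` and the sample values are hypothetical; only the config keys (`host`, `database`, `user`, `password`) come from the notebook.

```python
def build_connect_kwargs(conf):
    """Map the notebook's config keys onto libpq-style keyword names."""
    return {
        "host": conf["host"],
        "dbname": conf["database"],  # the config file calls this "database"
        "user": conf["user"],
        "password": conf["password"],
    }


# Hypothetical config values, for illustration only.
example_conf = {
    "host": "localhost",
    "database": "lobbying",
    "user": "analyst",
    "password": "pass word with spaces",
}

connect_kwargs = build_connect_kwargs(example_conf)
print(sorted(connect_kwargs))

# With a live database, the dictionary would be used like this:
# conn = psycopg2.connect(**connect_kwargs)
```

Because each setting is passed separately, no manual quoting of the values is needed.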

Query the database to create a new table containing only lobbying records from 2008 onward.

In [5]:
%%notify
lobbying_08_17 = pd.read_sql("""CREATE TABLE IF NOT EXISTS lobbying_08_17 AS
SELECT *
FROM lobbying
WHERE year >= 2008;

GRANT ALL ON TABLE lobbying_08_17 TO redash;


SELECT *
FROM lobbying_08_17;""", con=conn)

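One thing to keep in mind with `CREATE TABLE IF NOT EXISTS ... AS SELECT` is that re-running the cell does not refresh the table once it exists. The sketch below illustrates this with an in-memory SQLite database standing in for the PostgreSQL one. The rows are made up, and SQLite is an assumption used only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lobbying (year INTEGER, client TEXT)")
conn.executemany(
    "INSERT INTO lobbying VALUES (?, ?)",
    [(2006, "Client A"), (2008, "Client B"), (2012, "Client C")],
)

create_sql = """
CREATE TABLE IF NOT EXISTS lobbying_08_17 AS
SELECT * FROM lobbying WHERE year >= 2008
"""

# First run: the filtered table is created from the source rows.
conn.execute(create_sql)
first_count = conn.execute("SELECT count(*) FROM lobbying_08_17").fetchone()[0]

# A new source row arrives, then the same statement is run again.
conn.execute("INSERT INTO lobbying VALUES (2015, 'Client D')")
conn.execute(create_sql)  # no-op: the table already exists
second_count = conn.execute("SELECT count(*) FROM lobbying_08_17").fetchone()[0]

print(first_count, second_count)
```

Both counts are the same, so if the source `lobbying` table changes, the derived table would need to be dropped and recreated to pick up the new rows.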

Now we'll query the database to create a new table of pharmaceutical lobbying.

In [7]:
%%notify
registrations = pd.read_sql("""CREATE TABLE IF NOT EXISTS pharma_lobbying AS
SELECT *
FROM lobbying_08_17
WHERE year >= 2008
  AND catcodebusiness = 'Pharmaceutical manufacturing';

GRANT ALL ON TABLE pharma_lobbying TO redash;


SELECT *
FROM pharma_lobbying;""", con=conn)


The data reflect lobbyist registration forms. Depending on a state's lobbying disclosure rules, a lobbyist may file several registration forms listing the same client in the same year. To count these client-lobbyist relationships accurately, we need to create a table grouped by state, year, client and lobbyist.

In [13]:
%%notify
lobbyist_client_relationships = pd.read_sql("""CREATE TABLE IF NOT EXISTS lobbyist_client_relationships AS
SELECT jurisdiction,
       year,
       client,
       lobbyist,
       count(*) as number_of_registrations
FROM pharma_lobbying
GROUP BY jurisdiction,
         year,
         client,
         lobbyist;

GRANT ALL ON TABLE lobbyist_client_relationships TO redash;


SELECT *
FROM lobbyist_client_relationships;""", con=conn)

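To make the effect of the grouping concrete, here is a minimal sketch using an in-memory SQLite database and made-up rows (the client and lobbyist names are hypothetical). It shows a duplicate filing collapsing into a single relationship row, with `number_of_registrations` recording how many forms were filed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pharma_lobbying "
    "(jurisdiction TEXT, year INTEGER, client TEXT, lobbyist TEXT)"
)
rows = [
    ("CA", 2008, "Client A", "Lobbyist X"),
    ("CA", 2008, "Client A", "Lobbyist X"),  # duplicate filing, same year
    ("CA", 2008, "Client A", "Lobbyist Y"),
    ("CA", 2009, "Client A", "Lobbyist X"),
]
conn.executemany("INSERT INTO pharma_lobbying VALUES (?, ?, ?, ?)", rows)

# One output row per (state, year, client, lobbyist) combination.
relationships = conn.execute(
    """
    SELECT jurisdiction, year, client, lobbyist,
           count(*) AS number_of_registrations
    FROM pharma_lobbying
    GROUP BY jurisdiction, year, client, lobbyist
    ORDER BY year, lobbyist
    """
).fetchall()

for row in relationships:
    print(row)
```

Four registration forms become three relationships: the two identical 2008 filings count once, with `number_of_registrations` equal to 2.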

## Import and format state legislative data

## Analyze the data

How many lobbyist-client relationships representing pharmaceutical interests were there in each state and year, as measured by annual registrations?

In [30]:
# Each row is one lobbyist-client pair for a given state and year,
# so counting rows per (year, jurisdiction) group counts relationships.
lobbying_by_year_state = lobbyist_client_relationships.groupby(["year", "jurisdiction"]).agg({"client": len})
lobbying_by_year_state.reset_index(inplace=True)
lobbying_by_year_state.rename(columns={"jurisdiction": "state", "client": "registrations"}, inplace=True)
lobbying_by_year_state.head()

|   | year | state | registrations |
|---|------|-------|---------------|
| 0 | 2008 | AK | 22 |
| 1 | 2008 | AL | 31 |
| 2 | 2008 | AR | 36 |
| 3 | 2008 | AZ | 48 |
| 4 | 2008 | CA | 144 |
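As a quick cross-check of the counting logic, the same per-year, per-state tally can be sketched with the standard library alone. The rows below are hypothetical and are not taken from the dataset. Each tuple stands for one row of `lobbyist_client_relationships`.

```python
from collections import Counter

# Hypothetical (jurisdiction, year, client, lobbyist) relationship rows.
relationships = [
    ("AK", 2008, "Client A", "Lobbyist X"),
    ("AK", 2008, "Client B", "Lobbyist X"),
    ("AL", 2008, "Client A", "Lobbyist Y"),
    ("AK", 2009, "Client A", "Lobbyist X"),
]

# Count one relationship per row, keyed by (year, state).
counts = Counter(
    (year, state) for state, year, _client, _lobbyist in relationships
)

for (year, state), n in sorted(counts.items()):
    print(year, state, n)
```

Running a small check like this on a handful of rows can help confirm that the grouped counts mean what the column name suggests.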
