# Pharmaceutical lobbying

What is the extent of state-level lobbying by pharmaceutical interests?

## Import and process the data

In [1]:
import numpy as np
import pandas as pd

In [2]:
lobbying = pd.read_csv("data/ASAYLobClientsWCoding.txt", sep="\t", dtype={"ClientEID": object, "LobbyistEID": object, "Affiliate": object})
lobbying.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3165825 entries, 0 to 3165824
Data columns (total 10 columns):
Jurisdiction       object
Year               int64
ClientEID          object
Client             object
CatCodeGroup       object
CatCodeIndustry    object
CatCodeBusiness    object
LobbyistEID        object
Lobbyist           object
Affiliate          object
dtypes: int64(1), object(9)
memory usage: 241.5+ MB


In [3]:
lobbying.to_csv("data/nimp_lobbying.csv")

In [3]:
lobbying.head()

Unnamed: 0,Jurisdiction,Year,ClientEID,Client,CatCodeGroup,CatCodeIndustry,CatCodeBusiness,LobbyistEID,Lobbyist,Affiliate
0,NJ,2013,20317593,"GANNON, RICHARD",,,,20317593,"GANNON, RICHARD",
1,NJ,2013,27607512,180-TURNING LIVES AROUND,,,,20317593,"GANNON, RICHARD",
2,NJ,2013,27420350,"EMMONS, WILLIAM",,,,27420350,"EMMONS, WILLIAM",
3,NJ,2013,27535869,"FERNANDEZ, EDWARD",,,,27420350,"EMMONS, WILLIAM",
4,NJ,2013,27606652,PALISADES SAFETY & INSURANCE ASSOCIATION,,,,27420350,"EMMONS, WILLIAM",


In [4]:
lobbying.groupby("Year")["Jurisdiction"].nunique()

Year
2000     2
2001     2
2002    10
2003    12
2004    12
2005    17
2006    50
2007    50
2008    50
2009    50
2010    50
2011    50
2012    50
2013    50
2014    50
2015    50
2016    50
2017    34
Name: Jurisdiction, dtype: int64
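
The year range could also be derived from these coverage counts rather than read off by eye. Here is a minimal sketch on made-up rows, assuming "full coverage" means a year reaches the highest jurisdiction count seen in any year. The names `toy`, `coverage`, `full_years` and `filtered` are illustrative and not part of the notebook.

```python
import pandas as pd

# Toy stand-in for the lobbying table (made-up rows, not real data).
toy = pd.DataFrame({
    "Year": [2005, 2006, 2006, 2007, 2007, 2008],
    "Jurisdiction": ["NJ", "NJ", "NY", "NJ", "NY", "NY"],
})

# Number of distinct jurisdictions reporting in each year.
coverage = toy.groupby("Year")["Jurisdiction"].nunique()

# Years whose coverage equals the best coverage observed in any year.
full_years = coverage[coverage == coverage.max()].index

# Keep only rows from those years.
filtered = toy[toy["Year"].isin(full_years)].reset_index(drop=True)
print(list(full_years), len(filtered))
```

Unlike a hard-coded range, this does not require the full-coverage years to be contiguous, so the selected years are worth checking before filtering.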

Filter to just 2006-2016.

In [5]:
lobbying = lobbying[(lobbying["Year"] >= 2006) & (lobbying["Year"] <= 2016)]
lobbying.reset_index(drop=True, inplace=True)
lobbying.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2713150 entries, 0 to 2713149
Data columns (total 10 columns):
Jurisdiction       object
Year               int64
ClientEID          object
Client             object
CatCodeGroup       object
CatCodeIndustry    object
CatCodeBusiness    object
LobbyistEID        object
Lobbyist           object
Affiliate          object
dtypes: int64(1), object(9)
memory usage: 207.0+ MB


In [6]:
lobbying.head()

Unnamed: 0,Jurisdiction,Year,ClientEID,Client,CatCodeGroup,CatCodeIndustry,CatCodeBusiness,LobbyistEID,Lobbyist,Affiliate
0,NJ,2013,20317593,"GANNON, RICHARD",,,,20317593,"GANNON, RICHARD",
1,NJ,2013,27607512,180-TURNING LIVES AROUND,,,,20317593,"GANNON, RICHARD",
2,NJ,2013,27420350,"EMMONS, WILLIAM",,,,27420350,"EMMONS, WILLIAM",
3,NJ,2013,27535869,"FERNANDEZ, EDWARD",,,,27420350,"EMMONS, WILLIAM",
4,NJ,2013,27606652,PALISADES SAFETY & INSURANCE ASSOCIATION,,,,27420350,"EMMONS, WILLIAM",


How many records contain "Uncoded" category group code information and what are the category business codes under which they fall?

In [7]:
lobbying[lobbying["CatCodeGroup"] == "Uncoded"].groupby("CatCodeBusiness").size()

CatCodeBusiness
Employer listed but category unknown                   687
Generic occupation - impossible to assign category      66
Homemakers, students & other non-income earners        108
No employer listed or discovered                      1295
Uncoded                                                654
dtype: int64

And how many contain "NaN" values?

In [8]:
lobbying[lobbying["CatCodeGroup"].isnull()]

Unnamed: 0,Jurisdiction,Year,ClientEID,Client,CatCodeGroup,CatCodeIndustry,CatCodeBusiness,LobbyistEID,Lobbyist,Affiliate
0,NJ,2013,20317593,"GANNON, RICHARD",,,,20317593,"GANNON, RICHARD",
1,NJ,2013,27607512,180-TURNING LIVES AROUND,,,,20317593,"GANNON, RICHARD",
2,NJ,2013,27420350,"EMMONS, WILLIAM",,,,27420350,"EMMONS, WILLIAM",
3,NJ,2013,27535869,"FERNANDEZ, EDWARD",,,,27420350,"EMMONS, WILLIAM",
4,NJ,2013,27606652,PALISADES SAFETY & INSURANCE ASSOCIATION,,,,27420350,"EMMONS, WILLIAM",
5,NJ,2013,27420350,"EMMONS, WILLIAM",,,,27535869,"FERNANDEZ, EDWARD",
6,NJ,2013,27535869,"FERNANDEZ, EDWARD",,,,27535869,"FERNANDEZ, EDWARD",
7,NJ,2013,27606652,PALISADES SAFETY & INSURANCE ASSOCIATION,,,,27535869,"FERNANDEZ, EDWARD",
8,NJ,2013,28229120,"HART, DENNIS",,,,28229120,"HART, DENNIS",
10,NJ,2013,28229121,"FRIEDLANDER, EZRA",,,,28229121,"FRIEDLANDER, EZRA",


OK. 2,810 records have an "Uncoded" category group code and another 1,579,636 records have a "NaN" value. Those figures represent a combined 58.3 percent of the 2,713,150 records from 2006 through 2016.
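
The combined share can be computed directly instead of by hand. This is a small sketch on made-up rows. The toy frame and the variable names `uncoded`, `missing` and `share` are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy stand-in: one "Uncoded" row, two missing rows, two coded rows.
toy = pd.DataFrame({
    "CatCodeGroup": ["Uncoded", np.nan, "Health", np.nan, "Labor"],
})

# Rows explicitly labelled "Uncoded".
uncoded = (toy["CatCodeGroup"] == "Uncoded").sum()

# Rows with no category group code at all.
missing = toy["CatCodeGroup"].isnull().sum()

# Combined share of all records.
share = (uncoded + missing) / len(toy)
print(f"{uncoded} uncoded, {missing} missing, {share:.1%} of {len(toy)} records")
```

With the figures reported above, the same arithmetic is (2,810 + 1,579,636) / 2,713,150, or about 0.583.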

## Analysis

Where does the pharmaceutical industry rank in terms of how many lobbyists it has hired?

We're interested in lobbyist registrations by year: that is, how many contracts exist between pharmaceutical clients and lobbyists in each state and within each year? Because some states require lobbyists to register multiple times a year, we need to group by year, state, client and lobbyist in order to avoid duplicates.
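
One way to sketch that deduplication is `drop_duplicates` on the four key columns. The rows below are made up, and `keys` and `registrations` are illustrative names, not necessarily the approach the analysis settles on.

```python
import pandas as pd

# Toy rows: the first two are the same client-lobbyist pair registered twice in 2013.
toy = pd.DataFrame({
    "Year": [2013, 2013, 2013, 2014],
    "Jurisdiction": ["NJ", "NJ", "NJ", "NJ"],
    "Client": ["ACME PHARMA", "ACME PHARMA", "ACME PHARMA", "ACME PHARMA"],
    "Lobbyist": ["DOE, JANE", "DOE, JANE", "ROE, JOHN", "DOE, JANE"],
})

# One row per unique year, state, client and lobbyist.
keys = ["Year", "Jurisdiction", "Client", "Lobbyist"]
registrations = toy.drop_duplicates(subset=keys).reset_index(drop=True)
print(len(toy), "rows ->", len(registrations), "registrations")
```

The repeated 2013 registration collapses to one row, while the same pairing in 2014 still counts as a separate registration.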

In [49]:
# groupby drops NaN keys, so records with no CatCodeBusiness are excluded.
registrations = lobbying.groupby(["Year", "Jurisdiction", "CatCodeBusiness", "Client", "Lobbyist"]).size().reset_index(name="records")
registrations.head()

In [31]:
# Collapse to one row per unique relationship; size() avoids the slow string sum.
cleaned = lobbying.groupby(["Year", "Jurisdiction", "CatCodeBusiness", "ClientEID", "LobbyistEID"]).size()
cleaned = cleaned.reset_index(name="records")
# Count the unique relationships within each business category.
cleaned_grouped_by_business_code = cleaned.groupby("CatCodeBusiness").size()
cleaned_grouped_by_business_code = cleaned_grouped_by_business_code.reset_index(name="relationships")
cleaned_grouped_by_business_code.sort_values("relationships", ascending=False)

In [40]:
lobbying[lobbying["CatCodeBusiness"].isnull()]

Unnamed: 0,Jurisdiction,Year,ClientEID,Client,CatCodeGroup,CatCodeIndustry,CatCodeBusiness,LobbyistEID,Lobbyist,Affiliate
0,NJ,2013,20317593,"GANNON, RICHARD",,,,20317593,"GANNON, RICHARD",
1,NJ,2013,27607512,180-TURNING LIVES AROUND,,,,20317593,"GANNON, RICHARD",
2,NJ,2013,27420350,"EMMONS, WILLIAM",,,,27420350,"EMMONS, WILLIAM",
3,NJ,2013,27535869,"FERNANDEZ, EDWARD",,,,27420350,"EMMONS, WILLIAM",
4,NJ,2013,27606652,PALISADES SAFETY & INSURANCE ASSOCIATION,,,,27420350,"EMMONS, WILLIAM",
5,NJ,2013,27420350,"EMMONS, WILLIAM",,,,27535869,"FERNANDEZ, EDWARD",
6,NJ,2013,27535869,"FERNANDEZ, EDWARD",,,,27535869,"FERNANDEZ, EDWARD",
7,NJ,2013,27606652,PALISADES SAFETY & INSURANCE ASSOCIATION,,,,27535869,"FERNANDEZ, EDWARD",
8,NJ,2013,28229120,"HART, DENNIS",,,,28229120,"HART, DENNIS",
10,NJ,2013,28229121,"FRIEDLANDER, EZRA",,,,28229121,"FRIEDLANDER, EZRA",


In [36]:
business_categories = lobbying.groupby("CatCodeBusiness").size()
business_categories = business_categories.reset_index(name="records")
business_categories["rank"] = business_categories["records"].rank(method="min", ascending=False).astype(int)
business_categories.sort_values("records", ascending=False)

Unnamed: 0,CatCodeBusiness,records
0,AIDS treatment & testing,185
1,"Abortion policy, pro-choice",694
2,"Abortion policy, pro-life",504
3,Accident & health insurance,18927
4,Accountants,8088
5,"Actors, actresses & others in the live theater...",2363
6,Adhesives & sealants,115
7,Advertising & public relations services,3017
8,Agricultural chemicals (fertilizers & pesticides),1588
9,Agricultural labor unions,35


In terms of the number of lobbyist-client relationships, pharmaceutical manufacturing ranks third with 37,687 records and medical supplies manufacturing & sales ranks 77th with 4,199 records.
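
For reference, `rank(method="min", ascending=False)` gives tied categories the same rank and then skips the next position. This is a small sketch with made-up category names and counts:

```python
import pandas as pd

# Made-up record counts per category; B and C are tied.
records = pd.Series(
    [500, 300, 300, 100],
    index=["Category A", "Category B", "Category C", "Category D"],
    name="records",
)

# Largest count gets rank 1; ties share the lowest rank in their group.
ranks = records.rank(method="min", ascending=False).astype(int)

# Look up the rank of specific categories by name.
print(ranks.loc[["Category A", "Category C"]])
```

Here the two tied categories both rank second and the last category ranks fourth, so a stated rank such as "77th" can be shared by more than one category.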

How many clients from the pharmaceutical manufacturing or medical supplies manufacturing & sales business categories hired lobbyists between 2006 and 2016?

In [22]:
medical_companies = lobbying[lobbying["CatCodeBusiness"].isin(["Pharmaceutical manufacturing", "Medical supplies manufacturing & sales"])].groupby("Client").size()
medical_companies = medical_companies.reset_index(name="records")
medical_companies["rank"] = medical_companies["records"].rank(method="min", ascending=False).astype(int)
medical_companies.sort_values("records", ascending=False).head()

Unnamed: 0,Client,records,rank
289,PHARMACEUTICAL RESEARCH & MANUFACTURERS ASSOCI...,2917,1
287,PFIZER,2770,2
26,ASTRAZENECA,1682,3
316,SANOFI-AVENTIS,1467,4
112,ELI LILLY & CO,1440,5


In [23]:
medical_companies["Client"].nunique()

385

OK. So, it looks like 385 clients from the pharmaceutical manufacturing or medical supplies manufacturing & sales business categories hired lobbyists between 2006 and 2016.

How many lobbyists did these clients hire between 2006 and 2016?

In [24]:
medical_lobbyists = lobbying[lobbying["CatCodeBusiness"].isin(["Pharmaceutical manufacturing", "Medical supplies manufacturing & sales"])].groupby("Lobbyist").size()
medical_lobbyists = medical_lobbyists.reset_index(name="records")
medical_lobbyists["rank"] = medical_lobbyists["records"].rank(method="min", ascending=False).astype(int)
medical_lobbyists.sort_values("records", ascending=False).head()

Unnamed: 0,Lobbyist,records,rank
5571,"SETZEPFANDT, SCOTT",254,1
2100,"GALLO, GEOFFREY A",149,2
1153,"COFFEE, SHERRI D",124,3
6360,"VOJTECH, JULIE",120,4
6256,"TURNER, JAMES",113,5


In [25]:
medical_lobbyists["Lobbyist"].nunique()

6776

OK. So, it looks like these clients hired 6,776 lobbyists between 2006 and 2016.

How does the number of lobbyists vary from year to year between 2006 and 2016?

In [28]:
lobbyists_by_year = lobbying[lobbying["CatCodeBusiness"].isin(["Pharmaceutical manufacturing", "Medical supplies manufacturing & sales"])].groupby("Year")["Lobbyist"].nunique()
lobbyists_by_year = lobbyists_by_year.reset_index(name="lobbyists")
lobbyists_by_year["rank"] = lobbyists_by_year["lobbyists"].rank(method="min", ascending=False).astype(int)
lobbyists_by_year.sort_values("Year", ascending=False)

Unnamed: 0,Year,lobbyists,rank
10,2016,2152,2
9,2015,2269,1
8,2014,2064,3
7,2013,2061,4
6,2012,1692,11
5,2011,2048,5
4,2010,1897,8
3,2009,1747,10
2,2008,1935,7
1,2007,1812,9


In [None]:
# state, year, client, lobbyist = individual registrations
# state, year, lobbyist = individual lobbyists
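
The two counting units noted above could be sketched as follows. The rows are made up and the variable names are illustrative. The sketch assumes that a lobbyist active in two states in the same year counts once per state, as the state-year-lobbyist key implies.

```python
import pandas as pd

# Toy rows: one lobbyist, two clients in NJ (one registered twice), one client in NY.
toy = pd.DataFrame({
    "Jurisdiction": ["NJ", "NJ", "NJ", "NY"],
    "Year": [2013, 2013, 2013, 2013],
    "Client": ["ACME PHARMA", "BETA MEDICAL", "ACME PHARMA", "ACME PHARMA"],
    "Lobbyist": ["DOE, JANE", "DOE, JANE", "DOE, JANE", "DOE, JANE"],
})

# Individual registrations: unique state, year, client and lobbyist.
registrations = toy.drop_duplicates(subset=["Jurisdiction", "Year", "Client", "Lobbyist"])

# Individual lobbyists: unique state, year and lobbyist.
individual_lobbyists = toy.drop_duplicates(subset=["Jurisdiction", "Year", "Lobbyist"])

# Yearly totals for each unit.
registrations_by_year = registrations.groupby("Year").size()
individual_lobbyists_by_year = individual_lobbyists.groupby("Year").size()
print(registrations_by_year.loc[2013], individual_lobbyists_by_year.loc[2013])
```

In this toy case the four rows yield three registrations but only two individual lobbyists, which shows how far the two measures can diverge.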