# Step 5: Analyzing the Results

## 5.1: Importing Required Packages, Especially the Data Visualization Packages, Altair and Data

In [19]:
import pandas as pd
import altair as alt
from vega_datasets import data
import requests

## 5.2: Import the bills CSV into a pandas dataframe

In [20]:
bills_analysis = pd.read_csv('bills.csv')

## 5.3: Taking a first glance at the data
I initially ran some basic functions for viewing and getting to know the data, including `.head()`, `.tail()`, and `.info()`. This is also a helpful way to check that the CSV loaded correctly and that all ten Congresses (106-115) are accounted for.

In [21]:
bills_analysis.head()

Unnamed: 0,congress,bill_number,url,word_count
0,106,4,https://www.congress.gov/bill/106th-congress/h...,277.0
1,106,5,https://www.congress.gov/bill/106th-congress/h...,714.0
2,106,15,https://www.congress.gov/bill/106th-congress/h...,1042.0
3,106,20,https://www.congress.gov/bill/106th-congress/h...,485.0
4,106,34,https://www.congress.gov/bill/106th-congress/h...,292.0


In [22]:
bills_analysis.tail()

Unnamed: 0,congress,bill_number,url,word_count
2766,115,7243,https://www.congress.gov/bill/115th-congress/h...,147.0
2767,115,7279,https://www.congress.gov/bill/115th-congress/h...,1697.0
2768,115,7318,https://www.congress.gov/bill/115th-congress/h...,315.0
2769,115,7319,https://www.congress.gov/bill/115th-congress/h...,267.0
2770,115,7327,https://www.congress.gov/bill/115th-congress/h...,8519.0


In [56]:
bills_analysis.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2771 entries, 0 to 2770
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   congress     2771 non-null   int64  
 1   bill_number  2771 non-null   int64  
 2   url          2771 non-null   object 
 3   word_count   2769 non-null   float64
dtypes: float64(1), int64(2), object(1)
memory usage: 86.7+ KB
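
The `.info()` output above reports 2769 non-null `word_count` values out of 2771 rows, so two bills have no word count. Below is a minimal sketch of how these first-glance checks could be made explicit. It uses a small hypothetical stand-in for `bills.csv` (the rows and the name `sample` are illustrative, not the real data):

```python
import pandas as pd

# Hypothetical stand-in for bills.csv; the real file has 2,771 rows.
sample = pd.DataFrame({
    'congress': [106, 106, 107, 115],
    'bill_number': [4, 5, 12, 7327],
    'word_count': [277.0, None, 1042.0, 8519.0],
})

# Which congresses appear in the data?
congresses_present = sorted(sample['congress'].unique())

# How many rows are missing a word count, and which ones are they?
missing_count = int(sample['word_count'].isna().sum())
missing_rows = sample[sample['word_count'].isna()]

print(congresses_present)
print(missing_count)
print(missing_rows)
```

pandas skips missing values by default in `.mean()`, `.median()`, `.max()`, and `.min()`, so rows like these do not break the later calculations, but they are still counted by `len()`.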


### What are the longest bills?
After taking a first glance, one of the first questions that I want to explore is: what are the longest bills across the last 20 years? To get a first start at answering this, I make a sorted copy of the `bills_analysis` dataframe, ordering all bills by word count in descending order (from highest to lowest). I'll print this dataframe using the `.head()` function to specify that I want to look at the 20 longest bills:

In [193]:
longest_bills = bills_analysis.sort_values(by=['word_count'], ascending=False).head(20)
longest_bills

Unnamed: 0,congress,bill_number,url,word_count
2344,114,2029,https://www.congress.gov/bill/114th-congress/h...,414321.0
2569,115,1625,https://www.congress.gov/bill/115th-congress/h...,413312.0
1730,111,3590,https://www.congress.gov/bill/111th-congress/h...,401393.0
1755,111,4173,https://www.congress.gov/bill/111th-congress/h...,378068.0
990,109,3,https://www.congress.gov/bill/109th-congress/h...,345809.0
2498,115,244,https://www.congress.gov/bill/115th-congress/h...,343679.0
2709,115,5515,https://www.congress.gov/bill/115th-congress/h...,339181.0
2066,113,83,https://www.congress.gov/bill/113th-congress/h...,327149.0
2615,115,2810,https://www.congress.gov/bill/115th-congress/h...,321588.0
299,106,4577,https://www.congress.gov/bill/106th-congress/h...,319963.0


I'm noticing in this table that the more recent congresses, like the 114th and 115th, seem to be overrepresented in the top 10 longest bills, while the less recent congresses are more likely to be seen in the bottom 10. This seems to be pointing in the direction of my hypothesis. I'll want to look deeper into word counts by congress across the whole dataset to get a better sense of whether this observation will continue to hold up.

## 5.4: Looking at mean, median, maximum, and minimum bill word counts by congress
To get a more complete picture of how bill length has changed across the 10 congressional sessions, it will be necessary to subset `bills_analysis` and analyze the bill length for each congress. I will then use the `.mean()`, `.median()`, `.max()` and `.min()` functions to find the average, median, maximum, and minimum bill length, respectively, for each 2-year congressional session from 1999-2019.

### Subsetting the `bills_analysis` dataframe for each Congress

In [112]:
# For the 106th congress
bills_106 = bills_analysis[
    (bills_analysis['congress'] == 106)
].copy()

# For the 107th congress
bills_107 = bills_analysis[
    (bills_analysis['congress'] == 107)
].copy()

# For the 108th congress
bills_108 = bills_analysis[
    (bills_analysis['congress'] == 108)
].copy()

# For the 109th congress
bills_109 = bills_analysis[
    (bills_analysis['congress'] == 109)
].copy()

# For the 110th congress
bills_110 = bills_analysis[
    (bills_analysis['congress'] == 110)
].copy()

# For the 111th congress
bills_111 = bills_analysis[
    (bills_analysis['congress'] == 111)
].copy()

# For the 112th congress
bills_112 = bills_analysis[
    (bills_analysis['congress'] == 112)
].copy()

# For the 113th congress
bills_113 = bills_analysis[
    (bills_analysis['congress'] == 113)
].copy()

# For the 114th congress
bills_114 = bills_analysis[
    (bills_analysis['congress'] == 114)
].copy()

# For the 115th congress
bills_115 = bills_analysis[
    (bills_analysis['congress'] == 115)
].copy()


### Finding the word count mean, median, max, and min for all congresses
While I'm certain there is a way to use string interpolation and loops to write this more efficiently, it was helpful for me to see these calculations clearly laid out for each congress.

In [126]:
# Finding the mean, median, min, and max word count for the 106th congress
mean_106 = bills_106['word_count'].mean()
median_106 = bills_106['word_count'].median()
max_106 = bills_106['word_count'].max()
min_106 = bills_106['word_count'].min()

# Finding the mean, median, min, and max word count for the 107th congress
mean_107 = bills_107['word_count'].mean()
median_107 = bills_107['word_count'].median()
max_107 = bills_107['word_count'].max()
min_107 = bills_107['word_count'].min()

# Finding the mean, median, min, and max word count for the 108th congress
mean_108 = bills_108['word_count'].mean()
median_108 = bills_108['word_count'].median()
max_108 = bills_108['word_count'].max()
min_108 = bills_108['word_count'].min()

# Finding the mean, median, min, and max word count for the 109th congress
mean_109 = bills_109['word_count'].mean()
median_109 = bills_109['word_count'].median()
max_109 = bills_109['word_count'].max()
min_109 = bills_109['word_count'].min()

# Finding the mean, median, min, and max word count for the 110th congress
mean_110 = bills_110['word_count'].mean()
median_110 = bills_110['word_count'].median()
max_110 = bills_110['word_count'].max()
min_110 = bills_110['word_count'].min()

# Finding the mean, median, min, and max word count for the 111th congress
mean_111 = bills_111['word_count'].mean()
median_111 = bills_111['word_count'].median()
max_111 = bills_111['word_count'].max()
min_111 = bills_111['word_count'].min()

# Finding the mean, median, min, and max word count for the 112th congress
mean_112 = bills_112['word_count'].mean()
median_112 = bills_112['word_count'].median()
max_112 = bills_112['word_count'].max()
min_112 = bills_112['word_count'].min()

# Finding the mean, median, min, and max word count for the 113th congress
mean_113 = bills_113['word_count'].mean()
median_113 = bills_113['word_count'].median()
max_113 = bills_113['word_count'].max()
min_113 = bills_113['word_count'].min()

# Finding the mean, median, min, and max word count for the 114th congress
mean_114 = bills_114['word_count'].mean()
median_114 = bills_114['word_count'].median()
max_114 = bills_114['word_count'].max()
min_114 = bills_114['word_count'].min()

# Finding the mean, median, min, and max word count for the 115th congress
mean_115 = bills_115['word_count'].mean()
median_115 = bills_115['word_count'].median()
max_115 = bills_115['word_count'].max()
min_115 = bills_115['word_count'].min()
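
For readers who want the more compact version hinted at above, here is a minimal sketch of one common alternative: grouping by congress and aggregating in a single step. It is not the approach used in this notebook, and the dataframe `bills` and the name `summary_sketch` are hypothetical stand-ins with illustrative values.

```python
import pandas as pd

# Hypothetical miniature of bills_analysis (illustrative values only).
bills = pd.DataFrame({
    'congress': [106, 106, 106, 107, 107],
    'word_count': [277.0, 714.0, 1042.0, 485.0, 292.0],
})

# One row per congress, one column per statistic.
summary_sketch = (
    bills.groupby('congress')['word_count']
    .agg(['mean', 'median', 'max', 'min'])
    .reset_index()
)

print(summary_sketch)
```

The result has the same columns as the summary tables built by hand in this section (`congress`, `mean`, `median`, `max`, `min`), so it could feed the same charts.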

### Storing results for earliest congress (the 106th) into a dataframe 
Now that all the calculations are performed for each congress and their results stored in distinct variables, I can create a new dataframe that stores the first set of calculations together, specifying the column names. To this dataframe I will eventually append the summary tables that I create for each additional congress, producing a complete summary table with all this information in one place.

In [127]:
#Creating a dataframe to store results for the 106th congress
summary_106 = pd.DataFrame({'congress' : [106], 'mean' : [mean_106], 'median' : [median_106], 'max' : [max_106], 'min' : [min_106]})

In [199]:
#Printing the 106th congress summary table to make sure it's looking right 
summary_106

Unnamed: 0,congress,mean,median,max,min
0,106,7217.940217,603.0,319963.0,140.0


### Creating summary dataframes for each congress
Now I'll create a separate summary dataframe for each congress, structured in the same way as the first one that I made for the 106th congress.

Once again, I may want to return to this code to use string interpolation and loops to condense it, but I wanted to create an individual summary table for each congress initially, to make it easier for me to follow and easier to access or visualize the data for any given congress. Since I am still a beginner in Python, this way was the least confusing for me, although it is by no means the most terse. As I advance, I may come back later and condense these.

In [134]:
summary_107 = pd.DataFrame({'congress' : [107], 'mean' : [mean_107], 'median' : [median_107], 'max' : [max_107], 'min' : [min_107]})
summary_108 = pd.DataFrame({'congress' : [108], 'mean' : [mean_108], 'median' : [median_108], 'max' : [max_108], 'min' : [min_108]})
summary_109 = pd.DataFrame({'congress' : [109], 'mean' : [mean_109], 'median' : [median_109], 'max' : [max_109], 'min' : [min_109]})
summary_110 = pd.DataFrame({'congress' : [110], 'mean' : [mean_110], 'median' : [median_110], 'max' : [max_110], 'min' : [min_110]})
summary_111 = pd.DataFrame({'congress' : [111], 'mean' : [mean_111], 'median' : [median_111], 'max' : [max_111], 'min' : [min_111]})
summary_112 = pd.DataFrame({'congress' : [112], 'mean' : [mean_112], 'median' : [median_112], 'max' : [max_112], 'min' : [min_112]})
summary_113 = pd.DataFrame({'congress' : [113], 'mean' : [mean_113], 'median' : [median_113], 'max' : [max_113], 'min' : [min_113]})
summary_114 = pd.DataFrame({'congress' : [114], 'mean' : [mean_114], 'median' : [median_114], 'max' : [max_114], 'min' : [min_114]})
summary_115 = pd.DataFrame({'congress' : [115], 'mean' : [mean_115], 'median' : [median_115], 'max' : [max_115], 'min' : [min_115]})

### Using `.append()` to combine the summary dataframes for all congresses 
This dataframe will form the most important summary table from which I will draw my analysis. It brings together the mean, median, maximum, and minimum calculations for all 10 congressional sessions from 1999 to 2019. Since all the individual summary dataframes have the same structure and column headers, I can use `.append()` to add each dataframe on to the first one I created for the 106th. I'll name this table simply `summary`.

In [135]:
summary = summary_106.append([summary_107, summary_108, summary_109, summary_110, summary_111, summary_112, summary_113, summary_114, summary_115], ignore_index=True, verify_integrity=False, sort=False)

Let's take a quick look at `summary` by printing it out, to make sure the `.append()` worked and that it is complete:

In [136]:
summary

Unnamed: 0,congress,mean,median,max,min
0,106,7217.940217,603.0,319963.0,140.0
1,107,7164.340278,529.5,288629.0,139.0
2,108,6628.347305,476.0,316256.0,136.0
3,109,8053.734177,384.5,345809.0,80.0
4,110,6797.598684,249.0,295171.0,71.0
5,111,12144.592885,449.0,401393.0,114.0
6,112,8503.331658,538.0,299409.0,121.0
7,113,8405.509615,418.5,327149.0,125.0
8,114,6438.516279,486.0,414321.0,154.0
9,115,9865.859155,397.0,413312.0,141.0
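
One caveat for anyone re-running this notebook: `DataFrame.append()` was deprecated in pandas 1.4 and removed in pandas 2.0, so the `.append()` cell raises an `AttributeError` on current versions. Below is a minimal sketch of the same combination using `pd.concat`, shown with just the first two summary rows from the table above (the name `summary_sketch` is hypothetical):

```python
import pandas as pd

# First two per-congress summaries, with values copied from the table above.
summary_106 = pd.DataFrame({'congress': [106], 'mean': [7217.940217],
                            'median': [603.0], 'max': [319963.0], 'min': [140.0]})
summary_107 = pd.DataFrame({'congress': [107], 'mean': [7164.340278],
                            'median': [529.5], 'max': [288629.0], 'min': [139.0]})

# pd.concat takes a list of dataframes; ignore_index renumbers the rows 0..n-1.
summary_sketch = pd.concat([summary_106, summary_107], ignore_index=True)

print(summary_sketch)
```

Extending the list passed to `pd.concat` with the remaining eight dataframes would reproduce the full `summary` table.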


## 5.5: Using Altair to visualize key indicators of word count trends 
Now comes the fun part. With this rich summary table in place, I can analyze all four of these important indicators, namely mean, median, minimum, and maximum word count, to attempt to answer my research question (for the last 20 years at least). I'll start by creating a chart that looks at average word count for all bills in each congress, from earliest to most recent. As a reminder, the 106th congress began in 1999 and the 115th congress ended at the very start of January 2019, so 2018 is its last full year. The red line I add to this chart represents the mean of the means: the unweighted average of the ten per-congress means, which stands at just over 8,100 words. That is close to, but not the same as, the average across all individual bills, which I calculate later in this section.
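
As a quick aid for translating congress numbers into calendar years, here is a minimal sketch. It relies on the fact that the 1st Congress convened in 1789 and each Congress lasts two years. The helper name `congress_years` is hypothetical, and the returned end year is the year in which the term expires (in early January for modern congresses).

```python
def congress_years(number):
    """Return (start_year, end_year) for a numbered Congress (hypothetical helper)."""
    start = 1789 + 2 * (number - 1)
    return start, start + 2


print(congress_years(106))  # first congress in this dataset
print(congress_years(115))  # last congress in this dataset
```

Because each term expires in the first days of January, the last full calendar year of the 115th congress is 2018, which is why the chart titles below read 1999-2018.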

#### Graphing the mean bill word count by congress

In [220]:
bar = alt.Chart(summary).mark_bar().encode(
    x='congress',
    y='mean'
)
rule = alt.Chart(summary).mark_rule(color='red').encode(
    y='mean(mean):Q'
)

(bar + rule).properties(
    title='Average bill wordcount over 10 Congresses, 1999-2018'
)

It seems that, on average, the mean length of bills over this 20-year period has indeed been trending upward, peaking in the 111th congress. The average bill length has grown by about one-third between the earliest and latest congresses in this dataset. Of course, there is some variation in the dataset (the 114th congress has the lowest mean of all ten), but the red mean line suggests that the trend is upward, with four out of five of the more recent congresses coming in above the line and all five of the less recent congresses coming in below it.
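
To make the red line concrete, here is a minimal sketch that recomputes it from the per-congress means in the `summary` table. The encoding `mean(mean):Q` averages the ten rows of `summary`, so each congress counts equally regardless of how many bills it passed. The variable names below are hypothetical.

```python
# Per-congress mean word counts, copied from the summary table above.
means = {
    106: 7217.940217, 107: 7164.340278, 108: 6628.347305, 109: 8053.734177,
    110: 6797.598684, 111: 12144.592885, 112: 8503.331658, 113: 8405.509615,
    114: 6438.516279, 115: 9865.859155,
}

# The red rule: an unweighted average of the ten per-congress means.
mean_of_means = sum(means.values()) / len(means)

# Congresses whose mean sits above the red line.
above_line = [congress for congress, m in means.items() if m > mean_of_means]

print(round(mean_of_means, 2))  # 8121.98
print(above_line)               # [111, 112, 113, 115]
```

The distinction matters at the margin: the 109th congress (about 8,054 words) falls below this line, while the average across all individual bills would be weighted by how many bills each congress passed.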

#### Graphing the median bill word count by congress

In [226]:
alt.Chart(summary).mark_bar().encode(
    x='congress',
    y='median'
).properties(
    title='Median bill wordcount over 10 Congresses, 1999-2018'
)

In the median bill wordcount, we actually observe a downward trend. Even though bills on average are getting longer, the bill that falls in the middle of the distribution for each congress is getting shorter. There could be many reasons for this, which we'll explore in the analysis section below.
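
A mean that rises while the median falls is what a few very large values can produce. Here is a minimal sketch with hypothetical word counts (five short bills and one omnibus-sized outlier, not a real congress) showing how far a single huge bill can pull the mean away from the median:

```python
from statistics import mean, median

# Hypothetical word counts: five short bills plus one omnibus-sized outlier.
word_counts = [277, 714, 1042, 485, 292, 414321]

average = mean(word_counts)
middle = median(word_counts)

print(round(average, 2))  # 69521.83
print(middle)             # 599.5
```

The median barely registers the outlier, while the mean is dominated by it.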

#### Graphing the maximum word count by congress

In [225]:
alt.Chart(summary).mark_bar().encode(
    x='congress',
    y='max'
).properties(
    title='Maximum bill wordcount over 10 Congresses, 1999-2018'
)

Perhaps more than even the mean and median charts above, this graph of the longest bills by congressional session illustrates a fairly consistent upward trend, which we would expect to see given the above findings that means are getting higher while the median is getting lower. Something at the top had to be pulling up that average, and we can see there are some mongo bills being put together in Congress as of late, with the largest one topping 414,000 words!

#### Graphing the minimum word count by congress

In [228]:
alt.Chart(summary).mark_bar().encode(
    x='congress',
    y='min'
).properties(
    title='Minimum bill wordcount over 10 congresses, 1999-2018'
)

The smallest bills becoming law in Congress appear to have fluctuated a bit over the last twenty years, starting off high around the start of the 21st century, then dipping in the mid-to-late 2000s (the 109th and 110th congresses), only to rise again over the following 10 years (and 5 congresses). It would be interesting to look into what was going on during the late 2000s that could have influenced this. One hypothesis would be the Great Recession, which was just starting to happen at the end of the 110th Congress. Could it be that the jump in minimum length signals the rapid expansion in legislative action taken the next year to help prop up the economy?

### How would these trends compare to the overall mean, median, max, and min across the entire 20 year period?
It seemed like it could be useful to compare the overall mean, median, minimum and maximum bill word counts with the same word counts by congressional session that we just looked at above. I'll use the same approach to do that now: 

In [231]:
#Finding the mean, median, min, and max word count overall
overall_mean = bills_analysis['word_count'].mean()
overall_median = bills_analysis['word_count'].median()
overall_max = bills_analysis['word_count'].max()
overall_min = bills_analysis['word_count'].min()

#Creating a summary dataframe with the mean, median, min and max for all Congresses
summary_overall = pd.DataFrame({'congress' : ['all'], 'mean' : [overall_mean], 'median' : [overall_median], 'max' : [overall_max], 'min' : [overall_min]})

#### Printing the summary overall table

In [230]:
summary_overall

Unnamed: 0,congress,mean,median,max,min
0,all,8033.270495,437.0,414321.0,71.0


#### Appending the overall summary table to the by-congress summary table for easy comparison

In [233]:
summary_with_overall = summary.append([summary_overall])
summary_with_overall

Unnamed: 0,congress,mean,median,max,min
0,106,7217.940217,603.0,319963.0,140.0
1,107,7164.340278,529.5,288629.0,139.0
2,108,6628.347305,476.0,316256.0,136.0
3,109,8053.734177,384.5,345809.0,80.0
4,110,6797.598684,249.0,295171.0,71.0
5,111,12144.592885,449.0,401393.0,114.0
6,112,8503.331658,538.0,299409.0,121.0
7,113,8405.509615,418.5,327149.0,125.0
8,114,6438.516279,486.0,414321.0,154.0
9,115,9865.859155,397.0,413312.0,141.0
0,all,8033.270495,437.0,414321.0,71.0


It seems that this comparison, too, reinforces the takeaways that we've gleaned from the above four charts.

### Using a scatter plot to get a better sense of the distribution of individual bills across each congress

In [237]:
alt.Chart(bills_analysis).mark_circle(size=60).encode(
    alt.X('congress',
        scale=alt.Scale(zero=False)
    ),
    y='word_count',
    tooltip=['bill_number', 'congress', 'word_count']
).interactive()


This scatter plot continues to reinforce the general trend of the biggest passed bills getting bigger, while the majority of passed bills remain concentrated in the lower half of the distribution. This sheds a bit more light on why our median word count for passed bills was found to be falling.

## 5.6: Finding the total number of bills passed for each congressional session
The other key indicator I was interested in for this story was the total number of bills passing in each congress, as it's really the other side of this issue. My initial hypothesis had been that bills that were becoming law were overall becoming more concentrated, which meant longer on average and fewer in number. Here I explore this second piece by subsetting our bill lists by congress and using our familiar `len()` function to count how many each congress passed.

In [244]:
# Subsetting all passed bills for each congressional session with a unique identifier
total_106 = len(bills_analysis[bills_analysis['congress'] == 106])
total_107 = len(bills_analysis[bills_analysis['congress'] == 107])
total_108 = len(bills_analysis[bills_analysis['congress'] == 108])
total_109 = len(bills_analysis[bills_analysis['congress'] == 109])
total_110 = len(bills_analysis[bills_analysis['congress'] == 110])
total_111 = len(bills_analysis[bills_analysis['congress'] == 111])
total_112 = len(bills_analysis[bills_analysis['congress'] == 112])
total_113 = len(bills_analysis[bills_analysis['congress'] == 113])
total_114 = len(bills_analysis[bills_analysis['congress'] == 114])
total_115 = len(bills_analysis[bills_analysis['congress'] == 115])

In [245]:
# Creating individual dataframes for each congress and using .append() again to combine them into one dataframe that I can analyze.
total_106 = pd.DataFrame({'congress' : [106], 'total-bill-count' : [total_106]})
total_107 = pd.DataFrame({'congress' : [107], 'total-bill-count' : [total_107]})
total_108 = pd.DataFrame({'congress' : [108], 'total-bill-count' : [total_108]})
total_109 = pd.DataFrame({'congress' : [109], 'total-bill-count' : [total_109]})
total_110 = pd.DataFrame({'congress' : [110], 'total-bill-count' : [total_110]})
total_111 = pd.DataFrame({'congress' : [111], 'total-bill-count' : [total_111]})
total_112 = pd.DataFrame({'congress' : [112], 'total-bill-count' : [total_112]})
total_113 = pd.DataFrame({'congress' : [113], 'total-bill-count' : [total_113]})
total_114 = pd.DataFrame({'congress' : [114], 'total-bill-count' : [total_114]})
total_115 = pd.DataFrame({'congress' : [115], 'total-bill-count' : [total_115]})

In [241]:
# Creating the combined data frame with totals by congress
totals_by_congress = total_106.append([total_107, total_108, total_109, total_110, total_111, total_112, total_113, total_114, total_115], ignore_index=True, verify_integrity=False, sort=False)

In [246]:
# Printing the new dataframe with total bill count by congress to make sure everything worked properly:
totals_by_congress

Unnamed: 0,congress,total-bill-count
0,106,368
1,107,288
2,108,334
3,109,316
4,110,304
5,111,254
6,112,199
7,113,209
8,114,215
9,115,284
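
The same per-congress totals can also be produced in one step. Here is a minimal sketch using `groupby(...).size()` on a hypothetical miniature dataframe (`bills` and `totals_sketch` are illustrative names, and the rows are not the real data):

```python
import pandas as pd

# Hypothetical miniature of bills_analysis: one row per bill that became law.
bills = pd.DataFrame({'congress': [106, 106, 107, 106, 107, 108]})

# Count the rows in each congress and name the count column like the table above.
totals_sketch = (
    bills.groupby('congress')
    .size()
    .reset_index(name='total-bill-count')
)

print(totals_sketch)
```

Like `len()`, `.size()` counts every row, including bills whose `word_count` is missing.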


### Visualizing the total bill counts by congress dataframe with Altair

In [247]:
alt.Chart(totals_by_congress).mark_bar().encode(
    x='congress',
    y='total-bill-count'
).properties(
    title='Number of bills that became law, by congress (1999-2018)'
)

The downward trend in number of bills passing through each congress appears to be surprisingly consistent, with really only the most recent congress in the dataset popping up as an outlier. This makes me especially curious to know what the numbers were for the 116th Congress.

## 5.7: Key Takeaways and Recommendations for Future Research

Surveying the findings of this research, this project points to four major takeaways:

### 1. On average, bills have gotten longer over the last 20 years, though the median word count has actually declined

This analysis of nearly two decades of bills that have passed through Congress and gone on to become public law seems, overall, to confirm the high-level hypothesis: that laws are getting longer in word count and fewer in number. And while the median length of a passed bill has declined, this may actually reinforce the observation that larger, once-a-year bills like the National Defense Authorization Act and the annual appropriations bills are being used to hold (and perhaps hide) more and more legislation, especially the controversial stuff, while smaller bills decline in popularity.

### 2. The longest bills are getting consistently longer

The trend line in word count is unquestionably up since the turn of the century on maximum size bills. This trend was the strongest and most consistent. This dataset at least suggests that concerns about omnibus bills may not be misplaced: they appear to be increasingly popular vehicles for legislators and their corporate funders hoping to attach their provision or law adjustment to a massive piece of legislation that holds too many necessities not to be passed. A deeper analysis of the longest bills in particular and how their composition has changed over time could shed more light on this question.

### 3. The shortest bills have also gotten longer overall

Also not so surprising, the rising tide of verbosity seems to be lifting all legislative ships, including the smallest bills that are enacted into law, though the change here is slight: the minimum fell from 140 words in the 106th congress to 71 in the 110th before recovering to 141 in the 115th. While smaller bills have gotten less attention among commentators, they may deserve a second look. As our distribution graph revealed, these comparatively minuscule bills still outnumber the behemoths.

### 4. As bills have gotten longer, they have also reduced in number

This analysis seems to have vindicated both pieces of the hypothesis of this project: that bills are not only getting longer, but fewer and fewer of them are being passed each year. This seems to affirm the narrative of legislative concentration laid out above, though it's impossible to infer a causal relationship here. Political science researchers might consider creative approaches, such as natural experiments or models, to start getting at this question.
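
As one small descriptive step toward that question, here is a minimal sketch that computes a Pearson correlation between the number of bills passed and the mean word count, using the ten per-congress values from the two summary tables in this section. The helper `pearson` is a hypothetical name written out by hand here. With only ten data points, the coefficient is a rough description of how the two series move together and says nothing about causation.

```python
from math import sqrt

# Values copied from the summary tables in this section (106th to 115th congress).
bill_counts = [368, 288, 334, 316, 304, 254, 199, 209, 215, 284]
mean_words = [7217.940217, 7164.340278, 6628.347305, 8053.734177, 6797.598684,
              12144.592885, 8503.331658, 8405.509615, 6438.516279, 9865.859155]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


r = pearson(bill_counts, mean_words)
print(round(r, 2))
```

A negative value means that congresses passing fewer bills tended to have longer bills on average; the closer the value is to zero, the weaker that tendency.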

### Recommended Next Steps
This analysis offers just an initial glance at the wider question of what's happening in the American democratic system. While it has clarified the general contours of what's happening with bill size and number, there are countless next steps that could be taken to deepen the analysis further. I'll offer three top-line recommendations here:

1. *Expand the dataset:* While the last 20 years is an excellent start and captures most recent trends, it would add even greater credibility to these findings if the analysis were to be extended to cover, say, the last 40 or 50 years. With machine readable bill text available back to 1995 and the 116th Congress dataset likely to be completed soon, adding another 5 years will be easy. Extending to before 1995 will prove more difficult, since only scanned copies of printed bills are currently available for bills passed before then. New scan-to-text technology may help solve this problem.

2. *Look at bill length and number trends through eras of Democratic, Republican, and mixed federal control:* Partisan gridlock is frequently blamed as the reason why things don't get done in Washington. Examining this dataset through a partisan lens could offer some valuable insights on just how true this hypothesis is. 

3. *Dive deep into the big bills and Congressional practices surrounding their passage:* With omnibus bills approaching unreadable length, democracy advocates have started to call attention to behemoth legislating as a form of democratic subversion. Members of congress in both parties also frequently bemoan the common and insane conundrum of having a 400,000 word bill plopped on their desk, accompanied by a demand from party leadership to be prepared to vote for it within mere days. Even as voter suppression and other forms of anti-democratic behaviour are on the rise outside the halls of Congress, it seems congressional processes inside also deserve more attention from researchers, citizens, and pro-democracy advocates. A closer analysis of these increasingly important omnibus bills and the practices surrounding them could serve as a valuable next step.