# Stock Trades by Members of the US House of Representatives

This project uses public data about the stock trades made by members of the US House of Representatives. This data is collected and maintained by Timothy Carambat as part of the [House Stock Watcher](https://housestockwatcher.com/) project. The project describes itself as follows:

> With recent and ongoing investigations of incumbent congressional members being investigated for potentially violating the STOCK act. This website compiles this publicly available information in a format that is easier to digest than the original PDF source.
>
> Members of Congress must report periodic reports of their asset transactions. This website is purely for an informative purpose and aid in transparency.
>
> This site does not manipulate or censor any of the information from the original source. All data is transcribed by our community of contributors, which you can join for free by going to our transcription tool. Our moderation team takes great care in ensuring the accuracy of the information.
>
> This site is built and maintained by Timothy Carambat and supported with our contributors.

In [1]:
# Initialize Otter
import otter
grader = otter.Notebook("lab.ipynb")

In [2]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import glob
import time
import requests
import bs4
import lxml

Some interesting questions to consider for this data set include:

- Is there a difference in stock trading behavior between political parties? For example:
    - does one party trade more often?
    - does one party make larger trades?
    - do the two parties invest in different stocks or sectors? For instance, do Democrats invest in Tesla more than Republicans?
- Which congresspeople have made the most trades?
- Which companies are most traded by congresspeople?
- Is there evidence of insider trading? For example, Boeing stock dropped sharply in February 2020. Was there a suspiciously high number of sales of Boeing before the drop?
- When are stocks bought and sold? Is there a day of the week that is most common? Or a month of the year?

### Getting the Data

The full data set of stock trade disclosures is available as a CSV or as JSON at https://housestockwatcher.com/api.

This data set does not, however, contain the political affiliation of the congresspeople. If you wish to investigate a question that relies on having this information, you'll need to find another dataset that contains it and perform a merge. *Hint*: Kaggle is a useful source of data sets.
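A minimal sketch of such a merge on toy data, assuming a trades table keyed by the full representative string and an affiliation table keyed by a lower-case "first last" name (the tables, names, and the `normalize` helper are hypothetical). Passing `indicator=True` to `DataFrame.merge` adds a `_merge` column, which makes it easy to list the names that failed to match and still need manual fixes.

```python
import pandas as pd

# Toy stand-ins for the trades data and a party-affiliation table (hypothetical values)
trades = pd.DataFrame({
    "representative": ["Hon. Jane Doe", "Hon. John Q. Public", "Hon. Ann Smith"],
    "amount": ["$1,001 - $15,000", "$15,001 - $50,000", "$1,001 - $15,000"],
})
parties = pd.DataFrame({
    "name": ["jane doe", "john public"],
    "party": ["Democratic", "Republican"],
})

def normalize(name: str) -> str:
    """Hypothetical helper: drop the honorific and middle names, then lower-case.
    'Hon. John Q. Public' -> 'john public'."""
    parts = name.replace("Hon. ", "").split()
    return f"{parts[0]} {parts[-1]}".lower()

trades["key"] = trades["representative"].map(normalize)

# A left merge keeps every trade; indicator=True adds a '_merge' column that is
# 'left_only' for trades with no match in the affiliation table
merged = trades.merge(parties, left_on="key", right_on="name", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "key"].unique()
print(unmatched)  # names that still need a manual fix
```

Looping on "merge, list `left_only` names, patch the spellings" is one way to drive the unmatched count down to zero before filling the remaining gaps by hand.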

In [3]:
r = requests.get('https://house-stock-watcher-data.s3-us-west-2.amazonaws.com/data/all_transactions.json')
df = pd.DataFrame(r.json())

In [4]:
sort_date = df.sort_values(by='disclosure_year', ascending = False)
sort_date.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 15710 entries, 10717 to 15709
Data columns (total 12 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   disclosure_year         15710 non-null  int64 
 1   disclosure_date         15710 non-null  object
 2   transaction_date        15710 non-null  object
 3   owner                   9667 non-null   object
 4   ticker                  15710 non-null  object
 5   asset_description       15706 non-null  object
 6   type                    15710 non-null  object
 7   amount                  15710 non-null  object
 8   representative          15710 non-null  object
 9   district                15710 non-null  object
 10  ptr_link                15710 non-null  object
 11  cap_gains_over_200_usd  15710 non-null  bool  
dtypes: bool(1), int64(1), object(10)
memory usage: 1.5+ MB


In [5]:
sort_date

Unnamed: 0,disclosure_year,disclosure_date,transaction_date,owner,ticker,asset_description,type,amount,representative,district,ptr_link,cap_gains_over_200_usd
10717,2022,10/11/2022,2022-09-12,self,TSLA,Tesla Inc,sale_partial,"$1,001 - $15,000",Hon. Kathy Manning,NC06,https://disclosures-clerk.house.gov/public_dis...,False
7530,2022,06/19/2022,2022-05-18,self,--,Orange CTY CA REV UTX,purchase,"$100,001 - $250,000",Hon. Scott H. Peters,CA52,https://disclosures-clerk.house.gov/public_dis...,False
7576,2022,09/02/2022,2021-03-04,self,WM,Waste Management Inc,purchase,"$1,001 - $15,000",Hon. Carol Devine Miller,WV03,https://disclosures-clerk.house.gov/public_dis...,False
7575,2022,09/02/2022,2021-03-04,self,VTRA,Viatris Inc,sale_full,"$1,001 - $15,000",Hon. Carol Devine Miller,WV03,https://disclosures-clerk.house.gov/public_dis...,False
7574,2022,09/02/2022,2021-07-28,self,VIAC,ViacomCBS Inc - Class B,purchase,"$15,001 - $50,000",Hon. Carol Devine Miller,WV03,https://disclosures-clerk.house.gov/public_dis...,False
...,...,...,...,...,...,...,...,...,...,...,...,...
7377,2020,08/07/2020,2020-05-28,self,CRM,Salesforce.com Inc,purchase,"$1,001 - $15,000",Hon. William R. Timmons,SC04,https://disclosures-clerk.house.gov/public_dis...,False
7378,2020,08/07/2020,2020-06-29,self,SFST,"Southern First Bancshares, Inc.",sale_full,"$1,001 - $15,000",Hon. William R. Timmons,SC04,https://disclosures-clerk.house.gov/public_dis...,False
7379,2020,08/07/2020,2020-05-28,self,SBUX,Starbucks Corporation,sale_full,"$1,001 - $15,000",Hon. William R. Timmons,SC04,https://disclosures-clerk.house.gov/public_dis...,True
7380,2020,08/07/2020,2020-05-28,self,WMT,Walmart Inc.,purchase,"$1,001 - $15,000",Hon. William R. Timmons,SC04,https://disclosures-clerk.house.gov/public_dis...,False


In [6]:
sort_date['owner'].unique()

array(['self', None, 'joint', 'dependent', '--'], dtype=object)

In [7]:
affiliation = pd.read_csv("house_members_116.csv")
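# .copy() makes affiliate an independent DataFrame, so later column assignments do not raise SettingWithCopyWarning
affiliate = affiliation[['name', 'current_party']].copy()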
affiliate

Unnamed: 0,name,current_party
0,ralph-abraham,Republican
1,alma-adams,Democratic
2,robert-aderholt,Republican
3,pete-aguilar,Democratic
4,rick-allen,Republican
...,...,...
438,ron-wright,Republican
439,john-yarmuth,Democratic
440,ted-yoho,Republican
441,don-young,Republican


In [8]:
affiliate['name'] = affiliate['name'].str.replace('-', ' ')



In [9]:
def clean_name(name):
    # Strip honorifics and stray prefixes that appear in the disclosures
    for prefix in ['Hon. ', 'None ', 'Mr. ', 'Mrs. ', 'W. ']:
        name = name.replace(prefix, '')
    # Keep only the first and last tokens so middle names and initials do not block a match
    parts = name.split(' ')
    return (parts[0] + ' ' + parts[-1]).lower()
affiliate['name'] = affiliate['name'].transform(clean_name)
affiliate



Unnamed: 0,name,current_party
0,ralph abraham,Republican
1,alma adams,Democratic
2,robert aderholt,Republican
3,pete aguilar,Democratic
4,rick allen,Republican
...,...,...
438,ron wright,Republican
439,john yarmuth,Democratic
440,ted yoho,Republican
441,don young,Republican


In [10]:
'nancy pelosi' in affiliate['name'].values

True

In [11]:
'nancy pelosi'.replace('Hon. ', '')

'nancy pelosi'

In [12]:
'daniel crenshaw' in affiliate['name'].values

False

In [13]:
# Map names in the affiliation table to the spelling produced by clean_name on the disclosures
replace_values = {
    'rick allen': 'richard allen',
    'cynthia axne': 'cindy axne',
    'dan crenshaw': 'daniel crenshaw',
    'gregory murphy': 'greg murphy',
    'w steube': 'greg steube',
    'jim costa': 'james costa',
    'ro khanna': 'rohit khanna',
    'mike gallagher': 'michael gallagher',
    'k conaway': 'k. conaway',
    'a mceachin': 'aston mceachin',
    'ken buck': 'kenneth buck',
    'jim banks': 'james banks',
    'j hill': 'james hill',
    'raul grijalva': 'raúl grijalva',
    'mario balart': 'mario diaz-balart',
    'jim hagedorn': 'james hagedorn',
    'raja krishnamoorthi': 's. krishnamoorthi',
    'wm clay': 'wm. clay',
    'tom halleran': "tom o'halleran",
}

In [14]:
sort_date['cleaned_name'] = sort_date['representative'].transform(clean_name)

In [15]:
affiliate['name'] = affiliate['name'].replace(replace_values)
'w steube' in affiliate['name'].values



False

In [16]:
with_party = sort_date.merge(affiliate, left_on = 'cleaned_name', right_on = 'name', how='left')


In [17]:
with_party[with_party['representative'].str.contains('V. Taylor')]

Unnamed: 0,disclosure_year,disclosure_date,transaction_date,owner,ticker,asset_description,type,amount,representative,district,ptr_link,cap_gains_over_200_usd,cleaned_name,name,current_party
887,2022,01/14/2022,2021-12-17,,RNGR,Ranger Energy Services Inc Class A,exchange,"$15,001 - $50,000",Hon. Nicholas V. Taylor,TX03,https://disclosures-clerk.house.gov/public_dis...,False,nicholas taylor,,


In [18]:
nans = with_party[with_party['name'].isna()]
uni = nans['cleaned_name'].unique()
uni

array(['kathy manning', 'scott franklin', 'maria salazar',
       'christopher jacobs', 'marjorie greene', 'bill pascrell',
       'august pfluger', 'patrick fallon', 'pete sessions',
       'victoria spartz', 'andrew garbarino', 'nicholas taylor',
       'david cawthorn', 'jake auchincloss', 'deborah ross',
       'marie newman', 'blake moore', 'michael garcia',
       'diana harshbarger', 'sara jacobs', 'peter meijer',
       'ashley arenholz', 'felix moore', 'stephanie bice', 'neal facs'],
      dtype=object)

In [19]:
'tj cox' in affiliate['name'].values

True

In [20]:
d = 'Democratic'
r = 'Republican'
mannul_fill_party = {
                    'kathy manning': d, 'scott franklin': r,
                    'maria salazar': r, 'christopher jacobs': r,
                    'marjorie greene': r, 'bill pascrell': d,
                     'august pfluger': r, 'patrick fallon': r,
                     'pete sessions': r, 'victoria spartz': r,
                     'andrew garbarino': r,'david cawthorn': r,
                     'jake auchincloss': d, 'deborah ross': d,
                     'marie newman': d, 'blake moore': r,
                     'michael garcia': r, 'diana harshbarger': r,
                     'sara jacobs': d, 'peter meijer': r,
                     'ashley arenholz':r,'felix moore': r,
                     'stephanie bice': r, 'neal facs': r
                    }
nicholas_taylors = {"Hon. Nicholas Van Taylor": r,
                   "Hon. Nicholas V. Taylor": d}

In [21]:
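def mannul_fill(row):
    # Use the hand-filled party when the cleaned name is in the manual table
    if row['cleaned_name'] in mannul_fill_party:
        return mannul_fill_party[row['cleaned_name']]
    # Otherwise match on the full representative string (the Nicholas Taylor spellings)
    elif row['representative'] in nicholas_taylors:
        return nicholas_taylors[row['representative']]
    # Fall back to the party found by the merge
    else:
        return row['current_party']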

In [22]:
full_party = with_party.apply(mannul_fill, axis=1)
full_party

0        Democratic
1        Democratic
2        Republican
3        Republican
4        Republican
            ...    
15705    Republican
15706    Republican
15707    Republican
15708    Republican
15709    Republican
Length: 15710, dtype: object

In [23]:
with_party['current_party'] = full_party

In [24]:
with_party['current_party'].isna().sum()

0

In [27]:
# Replace None in the owner column with NaN
with_party["owner"] = with_party["owner"].fillna(value=np.nan)
with_party

Unnamed: 0,disclosure_year,disclosure_date,transaction_date,owner,ticker,asset_description,type,amount,representative,district,ptr_link,cap_gains_over_200_usd,cleaned_name,name,current_party
0,2022,10/11/2022,2022-09-12,self,TSLA,Tesla Inc,sale_partial,"$1,001 - $15,000",Hon. Kathy Manning,NC06,https://disclosures-clerk.house.gov/public_dis...,False,kathy manning,,Democratic
1,2022,06/19/2022,2022-05-18,self,--,Orange CTY CA REV UTX,purchase,"$100,001 - $250,000",Hon. Scott H. Peters,CA52,https://disclosures-clerk.house.gov/public_dis...,False,scott peters,scott peters,Democratic
2,2022,09/02/2022,2021-03-04,self,WM,Waste Management Inc,purchase,"$1,001 - $15,000",Hon. Carol Devine Miller,WV03,https://disclosures-clerk.house.gov/public_dis...,False,carol miller,carol miller,Republican
3,2022,09/02/2022,2021-03-04,self,VTRA,Viatris Inc,sale_full,"$1,001 - $15,000",Hon. Carol Devine Miller,WV03,https://disclosures-clerk.house.gov/public_dis...,False,carol miller,carol miller,Republican
4,2022,09/02/2022,2021-07-28,self,VIAC,ViacomCBS Inc - Class B,purchase,"$15,001 - $50,000",Hon. Carol Devine Miller,WV03,https://disclosures-clerk.house.gov/public_dis...,False,carol miller,carol miller,Republican
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15705,2020,08/07/2020,2020-05-28,self,CRM,Salesforce.com Inc,purchase,"$1,001 - $15,000",Hon. William R. Timmons,SC04,https://disclosures-clerk.house.gov/public_dis...,False,william timmons,william timmons,Republican
15706,2020,08/07/2020,2020-06-29,self,SFST,"Southern First Bancshares, Inc.",sale_full,"$1,001 - $15,000",Hon. William R. Timmons,SC04,https://disclosures-clerk.house.gov/public_dis...,False,william timmons,william timmons,Republican
15707,2020,08/07/2020,2020-05-28,self,SBUX,Starbucks Corporation,sale_full,"$1,001 - $15,000",Hon. William R. Timmons,SC04,https://disclosures-clerk.house.gov/public_dis...,True,william timmons,william timmons,Republican
15708,2020,08/07/2020,2020-05-28,self,WMT,Walmart Inc.,purchase,"$1,001 - $15,000",Hon. William R. Timmons,SC04,https://disclosures-clerk.house.gov/public_dis...,False,william timmons,william timmons,Republican


In [28]:
# Replace None in the asset_description column with NaN
with_party["asset_description"] = with_party["asset_description"].fillna(value=np.nan)

In [34]:
# Print the columns that still contain the placeholder "--" (nothing prints once In [33] has replaced them)
for i in with_party.columns:
    if "--" in with_party[i].value_counts().index:
        print(i) 

In [33]:
# Replace every "--" placeholder in the owner and ticker columns with NaN
with_party["owner"] = with_party["owner"].replace("--",np.nan)
with_party["ticker"] = with_party["ticker"].replace("--",np.nan)


In [35]:
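# Midpoint of a reported trade range, e.g. ['$1,001', '-', '$15,000'] -> 8000.5
def avg_amount(x):
    low = float(x[0][1:].replace(",", ""))    # drop the leading '$' and the thousands separators
    high = float(x[-1][1:].replace(",", ""))
    return (low + high) / 2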


### Cleaning and EDA

- Clean the data.
    - Certain fields have "missing" data that isn't labeled as missing. For example, there are fields with the value "--". Do some exploration to find those values and convert them to null values.
    - You may also want to clean up the date columns to enable time-series exploration.
- Understand the data in ways relevant to your question using univariate and bivariate analysis of the data as well as aggregations.
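A minimal sketch of both cleaning steps on toy rows shaped like this data (the values are made up): it finds the columns that use "--" as a placeholder, converts those entries to nulls, and parses the two date columns, which in the output above use different formats (mm/dd/yyyy for `disclosure_date`, yyyy-mm-dd for `transaction_date`). `errors="coerce"` turns any malformed date into `NaT` instead of raising; the derived `weekday` and `month` columns are one way to approach the day-of-week and month questions.

```python
import numpy as np
import pandas as pd

# Toy rows shaped like the disclosures data (values are made up for illustration)
toy = pd.DataFrame({
    "owner": ["self", "--", None],
    "ticker": ["TSLA", "--", "WM"],
    "disclosure_date": ["10/11/2022", "06/19/2022", "09/02/2022"],
    "transaction_date": ["2022-09-12", "2022-05-18", "2021-03-04"],
})

# 1. Find every column that uses "--" as a stand-in for a missing value
placeholder_cols = [c for c in toy.columns if (toy[c] == "--").any()]

# 2. Convert those placeholders to real nulls
toy[placeholder_cols] = toy[placeholder_cols].replace("--", np.nan)

# 3. Parse the date columns; each one uses its own format
toy["disclosure_date"] = pd.to_datetime(toy["disclosure_date"], format="%m/%d/%Y", errors="coerce")
toy["transaction_date"] = pd.to_datetime(toy["transaction_date"], format="%Y-%m-%d", errors="coerce")

# Derived fields for time-series questions (most common trading weekday or month)
toy["weekday"] = toy["transaction_date"].dt.day_name()
toy["month"] = toy["transaction_date"].dt.month
print(toy)
```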


### Assessment of Missingness

- Assess the missingness per the requirements in `project03.ipynb`

### Hypothesis Test / Permutation Test
Find a hypothesis test or permutation test to perform. You can use the questions at the top of the notebook for inspiration.

# Summary of Findings

### Introduction
TODO

### Cleaning and EDA
TODO

### Assessment of Missingness
TODO

### Hypothesis Test
TODO

# Code

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format = 'retina'  # Higher resolution figures

### Cleaning and EDA

In [None]:
# TODO

### Assessment of Missingness

### We investigate the missingness of ticker
We first hypothesize that the missingness of `ticker` is NMAR (not missing at random), because it may depend on the missing ticker itself: many stocks have been delisted in recent years, and a delisted stock's ticker is removed from stock databases, so it would show up as missing here. One way to check this would be to collect data on stocks that have been delisted in recent years and compare it with the trades whose ticker is missing.


### Now we need to test whether the missingness of ticker is MCAR
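A sketch of the permutation test for a numeric column as a reusable function (the name `diff_in_means_pvalue` is hypothetical). It shuffles a copy of the boolean labels rather than the DataFrame column, so the original `ticker_missing` values are left untouched, and it uses the absolute difference in group means as the test statistic.

```python
import numpy as np
import pandas as pd

def diff_in_means_pvalue(df, group_col, value_col, n_repetitions=500, seed=None):
    """Hypothetical helper. Permutation test: does the mean of value_col differ between
    the two groups defined by the boolean column group_col?
    Returns (observed statistic, p-value)."""
    rng = np.random.default_rng(seed)
    labels = df[group_col].to_numpy().astype(bool)
    values = df[value_col].to_numpy(dtype=float)

    def stat(lab):
        # Absolute difference between the two group means
        return abs(values[lab].mean() - values[~lab].mean())

    observed = stat(labels)
    # Shuffle copies of the labels; the DataFrame itself is never modified
    simulated = np.array([stat(rng.permutation(labels)) for _ in range(n_repetitions)])
    # p-value: share of shuffled statistics at least as large as the observed one
    return observed, np.mean(simulated >= observed)
```

With the notebook's data, `diff_in_means_pvalue(with_party, "ticker_missing", "avg_amount")` would run the same kind of test once `avg_amount` has been computed.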

In [None]:
with_party = with_party.assign(ticker_missing = with_party["ticker"].isna())


In [58]:
with_party

Unnamed: 0,disclosure_year,disclosure_date,transaction_date,owner,ticker,asset_description,type,amount,representative,district,ptr_link,cap_gains_over_200_usd,cleaned_name,name,current_party,ticker_missing,avg_amount
0,2022,10/11/2022,2022-09-12,self,TSLA,Tesla Inc,sale_partial,"$1,001 - $15,000",Hon. Kathy Manning,NC06,https://disclosures-clerk.house.gov/public_dis...,False,kathy manning,,Democratic,False,8000.5
1,2022,06/19/2022,2022-05-18,self,,Orange CTY CA REV UTX,purchase,"$100,001 - $250,000",Hon. Scott H. Peters,CA52,https://disclosures-clerk.house.gov/public_dis...,False,scott peters,scott peters,Democratic,False,175000.5
2,2022,09/02/2022,2021-03-04,self,WM,Waste Management Inc,purchase,"$1,001 - $15,000",Hon. Carol Devine Miller,WV03,https://disclosures-clerk.house.gov/public_dis...,False,carol miller,carol miller,Republican,False,8000.5
3,2022,09/02/2022,2021-03-04,self,VTRA,Viatris Inc,sale_full,"$1,001 - $15,000",Hon. Carol Devine Miller,WV03,https://disclosures-clerk.house.gov/public_dis...,False,carol miller,carol miller,Republican,False,8000.5
4,2022,09/02/2022,2021-07-28,self,VIAC,ViacomCBS Inc - Class B,purchase,"$15,001 - $50,000",Hon. Carol Devine Miller,WV03,https://disclosures-clerk.house.gov/public_dis...,False,carol miller,carol miller,Republican,False,32500.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15705,2020,08/07/2020,2020-05-28,self,CRM,Salesforce.com Inc,purchase,"$1,001 - $15,000",Hon. William R. Timmons,SC04,https://disclosures-clerk.house.gov/public_dis...,False,william timmons,william timmons,Republican,False,8000.5
15706,2020,08/07/2020,2020-06-29,self,SFST,"Southern First Bancshares, Inc.",sale_full,"$1,001 - $15,000",Hon. William R. Timmons,SC04,https://disclosures-clerk.house.gov/public_dis...,False,william timmons,william timmons,Republican,False,8000.5
15707,2020,08/07/2020,2020-05-28,self,SBUX,Starbucks Corporation,sale_full,"$1,001 - $15,000",Hon. William R. Timmons,SC04,https://disclosures-clerk.house.gov/public_dis...,True,william timmons,william timmons,Republican,False,8000.5
15708,2020,08/07/2020,2020-05-28,self,WMT,Walmart Inc.,purchase,"$1,001 - $15,000",Hon. William R. Timmons,SC04,https://disclosures-clerk.house.gov/public_dis...,False,william timmons,william timmons,Republican,False,8000.5


In [87]:
dic = {}  # p-value of each TVD permutation test, keyed by column name

In [91]:
# Columns to test with the TVD permutation test: the original data columns,
# minus 'amount' (index 7), which is tested separately as a numeric column
tvd_tester = with_party.columns.to_list()[0:-5]
tvd_tester.pop(7)
tvd_tester

['disclosure_year',
 'disclosure_date',
 'transaction_date',
 'owner',
 'ticker',
 'asset_description',
 'type',
 'representative',
 'district',
 'ptr_link',
 'cap_gains_over_200_usd']

In [92]:
#test the p-value
for i in tvd_tester:
    tvd_permutation_test(with_party,i)

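Every p-value in `dic` is greater than 0.05, so for each of these categorical columns we fail to reject the null hypothesis that the missingness of `ticker` is independent of that column. Failing to reject does not prove independence, and a p-value of exactly 1.0 for every column is worth double-checking: it is what results when the simulated TVDs are computed on a larger scale (raw counts) than the observed TVD (proportions).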

In [89]:
dic

{'disclosure_year': 1.0,
 'disclosure_date': 1.0,
 'transaction_date': 1.0,
 'owner': 1.0,
 'ticker': 1.0,
 'asset_description': 1.0,
 'type': 1.0,
 'representative': 1.0,
 'district': 1.0,
 'ptr_link': 1.0,
 'cap_gains_over_200_usd': 1.0}

In [93]:
# the function which used to calculate the TVD of ticker missing and other columns and calculate the p-value
def tvd_permutation_test(df,x):
    with_party = df.copy()
    presentative_dis =with_party.pivot_table(index=x, columns='ticker_missing', aggfunc='size').fillna(0)
    presentative_dis = presentative_dis / presentative_dis.sum()
    obs_pre = presentative_dis.diff(axis=1).iloc[:, -1].abs().sum()/2
    
    
    n_repetitions = 500
    tvds = []
    for _ in range(n_repetitions):
        # Shuffling ticker missing and assigning back to the DataFrame
        with_party['ticker_missing'] = np.random.permutation(with_party['ticker_missing'])

        # Computing and storing TVD
        presentative_dis =with_party.pivot_table(index=x, columns='ticker_missing', aggfunc='size').fillna(0)

        tvd = presentative_dis.diff(axis=1).iloc[:, -1].abs().sum()/2
        tvds.append(tvd)
    pval_r = np.mean(np.array(tvds) >= obs_pre)
    dic[x]=pval_r
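`tvd_permutation_test` measures how different the distribution of a column is between rows with and without a missing ticker, using the total variation distance (TVD): half the sum of the absolute differences between two distributions. A toy check of the formula, with made-up proportions for the `owner` column (the `tvd` helper is hypothetical):

```python
import pandas as pd

def tvd(dist_a, dist_b):
    """Hypothetical helper: total variation distance between two probability
    distributions indexed by the same categories."""
    return (dist_a - dist_b).abs().sum() / 2

# Made-up distributions of 'owner' within each ticker_missing group (each sums to 1)
missing = pd.Series({"self": 0.5, "joint": 0.3, "dependent": 0.2})
not_missing = pd.Series({"self": 0.7, "joint": 0.2, "dependent": 0.1})

print(tvd(missing, not_missing))  # approximately 0.2: (0.2 + 0.1 + 0.1) / 2
```

Both inputs must be proportions that sum to 1. If one side were raw counts, the TVD would be on a different scale, and comparing observed against simulated values would no longer be meaningful.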

In [None]:
# Rewrite the open-ended amount ranges so every value has the form "$low - $high"
with_party["amount"] = with_party["amount"].replace({"$1,001 -":"$0 - $1,001",
                             "$1,000,000 +":"$1,000,000 - $5,000,000",
                             "$50,000,000 +":"$50,000,000 - $50,000,000"})

In [None]:
# Midpoint of each trade's reported amount range
with_party["avg_amount"] = with_party["amount"].str.split().apply(avg_amount)
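`avg_amount` parses one split string at a time. A vectorized alternative, sketched on toy strings and assuming every value already has the "$low - $high" form produced by the cleaning step above, extracts both bounds with a regular expression and averages them; it should give the same midpoints (for example 8000.5 for "$1,001 - $15,000").

```python
import pandas as pd

# Toy amount strings in the "$low - $high" format
amount = pd.Series(["$1,001 - $15,000", "$15,001 - $50,000", "$1,000,000 - $5,000,000"])

# Capture the two dollar figures, strip the thousands separators, and convert to float
bounds = (
    amount.str.extract(r"\$([\d,]+)\s*-\s*\$([\d,]+)")
    .apply(lambda col: col.str.replace(",", "", regex=False).astype(float))
)
midpoint = bounds.mean(axis=1)  # (low + high) / 2 for each row
print(midpoint.tolist())
```

Rows that do not match the pattern come back as NaN rather than raising, which makes malformed amounts easy to find with `midpoint.isna()`.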

In [100]:
# Observed statistic: absolute difference in mean avg_amount between trades with and without a missing ticker
obs_avg_amount = abs(with_party.groupby("ticker_missing")["avg_amount"].mean().diff().loc[True])
obs_avg_amount

44392.378332611734

In [101]:
# p-value for amount 
n_repetitions = 500
mean_diff = []
for _ in range(n_repetitions):
    
    # Shuffling genders and assigning back to the DataFrame
    with_party['ticker_missing'] = np.random.permutation(with_party['ticker_missing'])
    
    # Computing and storing TVD
    stat = abs(with_party.groupby("ticker_missing")["avg_amount"].mean().diff().loc[True])
    
    
    mean_diff.append(stat)

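Since the p-value for the trade amount is below 0.05, we reject the null hypothesis that the missingness of `ticker` is unrelated to `avg_amount`. The missingness of `ticker` appears to depend on the trade amount, so it is not MCAR.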

In [102]:
# p-value: share of simulated differences at least as large as the observed difference
pval = np.mean(np.array(mean_diff) >= obs_avg_amount)
pval

0.03

### Hypothesis Test / Permutation Test

## Null hypothesis:
Party and trade amount are not related: the trades of Democratic members look like a random sample of all trades, and their higher average trade amount is due to chance alone.

## Alternative hypothesis:
Party and trade amount are related: the higher average trade amount of Democratic members is not due to chance alone.

## Test statistic:
The mean trade amount (`avg_amount`) of trades made by Democratic members.

## Significance level:
We use a significance level of 5%, a conventional choice that caps the probability of a false positive (rejecting a true null hypothesis) at 5%.

## Result:
Since the p-value is 0, which is smaller than 0.05, we reject the null hypothesis.
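The test asks whether the Democratic trades look like a random sample, drawn without replacement, of the same size from all trades. A sketch packaging that simulation into a function (the name `group_mean_pvalue` is hypothetical); it takes a boolean mask for the group and returns the observed mean and a one-sided p-value.

```python
import numpy as np
import pandas as pd

def group_mean_pvalue(df, group_mask, value_col, n_repetitions=10_000, seed=None):
    """Hypothetical helper. One-sided test: is the mean of value_col for the rows in
    group_mask larger than the mean of a random sample (without replacement) of the
    same size drawn from all rows? Returns (observed mean, p-value)."""
    rng = np.random.default_rng(seed)
    values = df[value_col].to_numpy(dtype=float)
    mask = np.asarray(group_mask, dtype=bool)
    n = int(mask.sum())
    observed = values[mask].mean()
    # Means of random samples of the same size, simulated under the null hypothesis
    simulated = np.array([
        rng.choice(values, size=n, replace=False).mean() for _ in range(n_repetitions)
    ])
    # p-value: share of simulated means at least as large as the observed mean
    return observed, np.mean(simulated >= observed)
```

In the notebook, the call would look like `group_mean_pvalue(with_party, with_party["current_party"] == "Democratic", "avg_amount")`. Comparing against the group's own observed mean, rather than a statistic from another test, keeps the observed and simulated values on the same footing.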

In [None]:
# Number of trades and mean trade amount for each party
party_table = with_party.groupby("current_party")["avg_amount"].agg(['mean', 'count'])

In [None]:
# Number of trades (not members) made by Democratic representatives
democra_count = party_table.loc["Democratic", "count"]

The table below shows the number of trades and the mean trade amount for each party.

In [None]:
party_table

In [56]:
# Observed value: mean trade amount of trades made by Democratic members
obs_de_amount = party_table.loc["Democratic", "mean"]
obs_de_amount

62229.16729380657

In [55]:
# Simulate the test statistic under the null: mean avg_amount of random samples of trades with the same size as the Democratic group
num_reps = 10000
averages = []
for i in np.arange(num_reps):
    random_sample = with_party.sample(int(democra_count))
    new_average = random_sample['avg_amount'].mean()
    averages.append(new_average)

In [None]:
# p-value: share of simulated means at least as large as the observed Democratic mean
np.mean(np.array(averages) >= obs_de_amount)

The histogram below shows the distribution of simulated sample means under the null hypothesis, with the observed value marked by the red line.

In [None]:
pd.Series(averages).plot(kind='hist',
                         density=True,
                         bins=30,
                         ec='w',
                         title=f'Average Trade Amount in Random Samples of Size {int(democra_count)}');
# Observed Democratic mean trade amount
plt.axvline(x=obs_de_amount, color='red', linewidth=2);