# Data

In [46]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import wrds

### Get S&P Data

In [3]:
conn=wrds.Connection()

WRDS recommends setting up a .pgpass file.
Created .pgpass file successfully.
You can create this file yourself at any time with the create_pgpass_file() function.
Loading library list...
Done


In [32]:
# Daily returns for each stock on the days it was an S&P 500 constituent, 2000-2023
sp500 = conn.raw_sql("""
                        select a.*, b.date, b.ret
                        from crsp.dsp500list as a
                        join crsp.dsf as b
                          on a.permno = b.permno
                        where b.date >= a.start and b.date <= a.ending
                        and b.date >= '2000-01-01'
                        and b.date <= '2023-12-31'
                        order by b.date;
                        """, date_cols=['start', 'ending', 'date'])

In [38]:
sp500 = sp500.drop(columns=['start', 'ending'])
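
The query above keeps a daily return only when its date falls inside that stock's index-membership window, and the `start` and `ending` columns are dropped afterwards. As a rough illustration of the same filter in pandas, here is a minimal sketch on made-up toy data. The permnos, dates, and returns below are hypothetical stand-ins, not CRSP values, and the frame names `members`, `daily`, and `toy_sp500` are assumptions introduced only for this example.

```python
import pandas as pd

# Hypothetical membership windows (stand-in for crsp.dsp500list)
members = pd.DataFrame({
    "permno": [1, 2],
    "start": pd.to_datetime(["2000-01-01", "2000-01-04"]),
    "ending": pd.to_datetime(["2000-01-04", "2000-12-31"]),
})

# Hypothetical daily returns (stand-in for crsp.dsf)
daily = pd.DataFrame({
    "permno": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2000-01-03", "2000-01-04", "2000-01-05", "2000-01-03", "2000-01-04"]
    ),
    "ret": [0.01, -0.02, 0.03, 0.005, -0.004],
})

# Join on permno, then keep rows whose date lies in [start, ending] (both ends inclusive)
merged = members.merge(daily, on="permno")
in_window = merged["date"].between(merged["start"], merged["ending"])

# Keep only the columns the notebook retains after dropping start/ending
toy_sp500 = (
    merged.loc[in_window, ["permno", "date", "ret"]]
    .sort_values("date")
    .reset_index(drop=True)
)
print(toy_sp500)
```

Doing the filter in SQL, as the notebook does, avoids pulling the full daily file into memory. The pandas version is only meant to make the window logic easy to inspect.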

In [39]:
sp500

,permno,date,ret
0,64936,2000-01-03,-0.028662
1,24205,2000-01-03,-0.036496
2,60441,2000-01-03,-0.028926
3,45751,2000-01-03,-0.011757
4,76887,2000-01-03,-0.042553
...,...,...,...
28546,87445,2023-12-29,0.004683
28547,21792,2023-12-29,0.000350
28548,13356,2023-12-29,0.002258
28549,58819,2023-12-29,-0.000390


In [40]:
sp500.to_pickle('data/sp500.pkl')

In [41]:
sp500.isna().sum()

permno      0
date        0
ret       101
dtype: int64

In [44]:
(sp500[sp500['ret'].isna()])['permno'].nunique()

98
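
The two counts above show 101 missing returns spread over 98 distinct permnos, so nearly every affected stock has exactly one missing day. One way to tabulate this per stock is sketched below on hypothetical toy data. The frame `toy` and its values are assumptions for illustration, not the CRSP data.

```python
import numpy as np
import pandas as pd

# Hypothetical returns with a few gaps
toy = pd.DataFrame({
    "permno": [10, 10, 20, 20, 30],
    "ret": [np.nan, 0.01, np.nan, np.nan, 0.02],
})

# Number of missing returns for each permno that has at least one
missing_per_permno = toy.loc[toy["ret"].isna()].groupby("permno").size()

n_missing = int(toy["ret"].isna().sum())        # total missing returns
n_permnos = int(missing_per_permno.shape[0])    # distinct permnos affected

print(missing_per_permno)
print(n_missing, n_permnos)
```

Applied to the real frame, the same grouping would show which permnos account for more than one missing return.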

### Process Transcript Data

##### Scrape Relevant Texts from URLs

In [55]:
fomc_urls = pd.read_csv('data/fomc statement times pre 2011 with urls.csv')
fomc_urls = fomc_urls.drop(columns=['Statement Time', 'Press Conference'])
fomc_urls = fomc_urls.rename(columns={'Statement Date': 'date', 'URL':'url'})
fomc_urls['date'] = pd.to_datetime(fomc_urls['date'])
fomc_urls

,date,url
0,1999-05-18,https://www.federalreserve.gov/boarddocs/press...
1,1999-06-30,https://www.federalreserve.gov/boarddocs/press...
2,1999-08-24,https://www.federalreserve.gov/boarddocs/press...
3,1999-10-05,https://www.federalreserve.gov/boarddocs/press...
4,1999-11-16,https://www.federalreserve.gov/boarddocs/press...
...,...,...
89,2010-08-10,https://www.federalreserve.gov/newsevents/pres...
90,2010-09-21,https://www.federalreserve.gov/newsevents/pres...
91,2010-11-03,https://www.federalreserve.gov/newsevents/pres...
92,2010-12-14,https://www.federalreserve.gov/newsevents/pres...


In [98]:
fomc_statements = []

for index, row in fomc_urls.iterrows():
    date = row['date']
    url = row['url']
    
    # A timeout keeps one unresponsive page from stalling the whole loop
    response = requests.get(url, timeout=30)

    # Request was successful
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')
        content_div = soup.find('div', class_='col-xs-12 col-sm-8 col-md-8')

        # Dates in range 01-31-2006 - 01-26-2011: the statement text sits in the
        # <p> tags inside the div with class "col-xs-12 col-sm-8 col-md-8"
        if content_div:
            paragraphs = content_div.find_all('p')
            statement_text = '\n'.join([p.get_text(strip=True) for p in paragraphs])

        # Dates in range 05-18-1999 - 12-13-2005: the statement text sits in the first <p>
        else:
            first_p = soup.find('p')
            # Fall back to the full page text if the page has no <p> tag
            source = first_p if first_p is not None else soup
            statement_text = source.get_text(separator='\n', strip=True)

        # Add statement
        fomc_statements.append({'date': date, 'statement': statement_text})
    else:
        print(f"Failed to retrieve URL at index {index}: {url}")
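
The loop above handles two page layouts: newer pages keep the statement in `<p>` tags inside a specific `div`, while older pages keep it in the first `<p>`. To check that branching without hitting the network, the parsing step can be pulled into a small function and run on inline HTML. This is a sketch only: `extract_statement` is a hypothetical helper, and the two HTML snippets are invented to mimic the layouts, not copied from federalreserve.gov.

```python
from bs4 import BeautifulSoup


def extract_statement(html):
    """Return the statement text from one page, or None if no <p> is found."""
    soup = BeautifulSoup(html, "html.parser")

    # Newer layout: paragraphs inside the content div
    content_div = soup.find("div", class_="col-xs-12 col-sm-8 col-md-8")
    if content_div:
        paragraphs = content_div.find_all("p")
        return "\n".join(p.get_text(strip=True) for p in paragraphs)

    # Older layout: everything sits in the first <p>
    first_p = soup.find("p")
    if first_p is None:
        return None
    return first_p.get_text(separator="\n", strip=True)


# Invented snippets that imitate the two layouts
new_html = '<div class="col-xs-12 col-sm-8 col-md-8"><p>First.</p><p>Second.</p></div>'
old_html = "<html><body><p>For immediate release<br>Body text.</p></body></html>"

new_text = extract_statement(new_html)
old_text = extract_statement(old_html)
print(repr(new_text))
print(repr(old_text))
```

Returning `None` for a page with no `<p>` tag is a design choice for this sketch. It lets the caller decide whether to skip the page or log it.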

In [95]:
fomc_statements = pd.DataFrame(fomc_statements)

print("Statements Format 1 (05-18-1999 - 12-13-2005):")
display(fomc_statements['statement'].iloc[0])

print("Statements Format 2 (01-31-2006 - 01-26-2011):")
display(fomc_statements['statement'].iloc[-1])

Statements Format 1 (05-18-1999 - 12-13-2005):


"For immediate release\nThe Federal Reserve released the following statement after today's Federal Open Market Committee meeting:\nWhile the FOMC did not take action today to alter the stance of monetary policy, the Committee was concerned about the potential for a buildup of inflationary imbalances that could undermine the favorable performance of the economy and therefore adopted a directive that is tilted toward the possibility of a  firming in the stance of monetary policy.  Trend increases in costs and core prices have generally remained quite subdued.  But domestic financial markets have recovered and foreign economic prospects have improved since the easing of monetary policy last fall.  Against the background of already-tight domestic labor markets and ongoing strength in demand in excess of productivity gains, the Committee recognizes the need to be alert to developments over coming months that might indicate that financial conditions may no longer be consistent with containin

Statements Format 2 (01-31-2006 - 01-26-2011):


'Information received since the Federal Open Market Committee met in December confirms that the economic recovery is continuing, though at a rate that has been insufficient to bring about a significant improvement in labor market conditions. Growth in household spending picked up late last year, but remains constrained by high unemployment, modest income growth, lower housing wealth, and tight credit. Business spending on equipment and software is rising, while investment in nonresidential structures is still weak. Employers remain reluctant to add to payrolls. The housing sector continues to be depressed. Although commodity prices have risen, longer-term inflation expectations have remained stable, and measures of underlying inflation have been trending downward.\nConsistent with its statutory mandate, the Committee seeks to foster maximum employment and price stability. Currently, the unemployment rate is elevated, and measures of underlying inflation are somewhat low, relative to le