<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Read-in-data" data-toc-modified-id="Read-in-data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Read in data</a></span></li><li><span><a href="#Count-theorists-in-each-speech" data-toc-modified-id="Count-theorists-in-each-speech-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Count theorists in each speech</a></span></li><li><span><a href="#Create-annual-time-series" data-toc-modified-id="Create-annual-time-series-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Create annual time series</a></span></li></ul></div>

***This notebook takes the preprocessed text of the UK parliamentary speeches and counts how often the names of several social theorists appear, producing one annual time series per theorist.***

In [1]:
import pandas as pd
import os
import re  # used below for the regex name counts

In [2]:
os.getcwd()

'/Volumes/GoogleDrive/My Drive/02_Stanford/00_Researching/16_SocialScientization/-03_HM/00_replication/01_scripts'

In [7]:
DIR = os.path.dirname(os.getcwd()) + "/"
DVS = DIR + "00_data/00_dvs/"
IVS = DIR + "00_data/01_ivs/"

## Read in data

In [6]:
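# NOTE: DATA is not defined in any cell shown in this notebook; it must be set
# to the directory that contains uk_terms.csv before this cell is run.
uk = pd.read_csv(DATA + "uk_terms.csv")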

In [4]:
uk.head()

|  | date | speaker | speech | chamber | year | ndigits | length |
|---|---|---|---|---|---|---|---|
| 0 | 1803-11-22 | The Speaker | acquaint obedi command attend peer hear gracio... | lower | 1803 | 1 | 22 |
| 1 | 1803-11-22 | Lord Hawkesbury | move walsingham appoint chairman privileg | upper | 1803 | 0 | 5 |
| 2 | 1803-11-22 | The Lord Chancellor | second took opportun pai handsom compliment ta... | upper | 1803 | 0 | 24 |
| 3 | 1803-11-22 | Lord Walsingham | rose observ habit trespass attent take valuabl... | upper | 1803 | 0 | 67 |
| 4 | 1803-11-22 | The Earl of Limerick | rose second address fulli coincid sentiment ex... | upper | 1803 | 12 | 698 |


In [6]:
uk.shape

(1097460, 7)

## Count theorists in each speech

In [46]:
# Copy the year column so that adding columns below does not raise SettingWithCopyWarning
theorists = uk[['year']].copy()

In [47]:
# Count occurrences of "adam smith" in each speech (raw string keeps \s as a regex escape)
theorists['smith'] = uk['speech'].apply(lambda x: len(re.findall(r"adam\ssmith", x)))


In [49]:
theorists['smith'].sum()

835
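
The per-speech counting step above can also be written with pandas' vectorised string methods. The following is a minimal sketch, assuming a small hypothetical `speeches` Series in place of `uk['speech']`; it compares the `apply` plus `re.findall` approach used in this notebook with `Series.str.count`, which counts regex matches per element.

```python
import re

import pandas as pd

# Hypothetical stand-in for uk['speech'] (lower-cased, preprocessed text)
speeches = pd.Series([
    "adam smith wrote wealth nation",
    "honour member quot adam smith adam smith",
    "noth relev here",
])

pattern = r"adam\ssmith"

# Approach used in the notebook: one re.findall call per speech
counts_apply = speeches.apply(lambda text: len(re.findall(pattern, text)))

# Vectorised alternative: Series.str.count counts regex matches per element
counts_str = speeches.str.count(pattern)

print(counts_apply.tolist(), counts_str.tolist())
```

Both approaches count non-overlapping matches, so they should agree on any input; the choice between them is mostly a matter of readability.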

In [50]:
theorists['malthus'] = uk['speech'].apply(lambda x: len(re.findall("malthus", x)))


In [51]:
theorists['owen'] = uk['speech'].apply(lambda x: len(re.findall(r"robert\sowen", x)))


In [52]:
theorists['spencer'] = uk['speech'].apply(lambda x: len(re.findall(r"herbert\sspencer", x)))


In [53]:
theorists['galton'] = uk['speech'].apply(lambda x: len(re.findall("(galton)", x)))


In [54]:
theorists['pearson'] = uk['speech'].apply(lambda x: len(re.findall("(pearson)", x)))

In [55]:
theorists['bentham'] = uk['speech'].apply(lambda x: len(re.findall("(bentham)", x)))

In [56]:
theorists

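|  | year | smith | malthus | owen | spencer | galton | pearson | bentham |
|---|---|---|---|---|---|---|---|---|
| 0 | 1803 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1803 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1803 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 1803 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1803 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1097455 | 1925 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1097456 | 1925 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1097457 | 1925 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1097458 | 1925 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |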
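
The seven counting cells above repeat the same operation with a different pattern each time. As a sketch only, the repetition could be folded into a loop over a dictionary of patterns. The `speeches` frame and the `patterns` and `counts` names are hypothetical; the regexes themselves are a subset of those used above.

```python
import pandas as pd

# Hypothetical three-row stand-in for the uk DataFrame
speeches = pd.DataFrame({
    "year": [1803, 1803, 1804],
    "speech": ["adam smith malthus", "robert owen bentham bentham", "noth relev"],
})

# Column name -> regex, copied from the cells above (subset for brevity)
patterns = {
    "smith": r"adam\ssmith",
    "malthus": r"malthus",
    "owen": r"robert\sowen",
    "bentham": r"bentham",
}

# Start from a copy of the year column, then add one count column per theorist
counts = speeches[["year"]].copy()
for name, pattern in patterns.items():
    counts[name] = speeches["speech"].str.count(pattern)

print(counts)
```

Keeping the patterns in one place makes it easier to add a theorist or tighten a pattern. Note that single-surname patterns such as `pearson` match any speech token containing that string, not only references to the theorist.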


## Create annual time series

In [57]:
theorists_yr = pd.DataFrame(theorists['year'].unique(), columns=['year'])

In [62]:
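# Per-theorist lists of annual totals, filled in year order below
guys = {'smith':[], 'malthus':[], 'owen':[], 'spencer':[], 'galton':[], 'pearson':[], 'bentham':[]}
# groupby iterates over years in sorted order, so sort the year column to match
theorists_yr = pd.DataFrame(sorted(theorists['year'].unique()), columns=['year'])
for year, df in theorists.groupby('year'):
    for guy in guys:
        guys[guy].append(df[guy].sum())  # total mentions of this theorist in this year
for guy in guys:
    theorists_yr[guy] = guys[guy]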
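
The loop above sums each theorist column within each year. A possible shorter equivalent, shown here as a sketch on hypothetical toy data rather than as the notebook's method, is a single `groupby(...).sum()` call, which also keeps each total attached to its own year without relying on row order.

```python
import pandas as pd

# Hypothetical per-speech counts standing in for the theorists DataFrame
theorists = pd.DataFrame({
    "year":    [1803, 1803, 1804, 1804, 1804],
    "smith":   [0, 1, 2, 0, 0],
    "bentham": [1, 0, 0, 0, 3],
})

# One row per year, one summed column per theorist; as_index=False keeps
# 'year' as an ordinary column, matching the layout produced by the loop
theorists_yr = theorists.groupby("year", as_index=False).sum()

print(theorists_yr)
```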

In [63]:
theorists_yr

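|  | year | smith | malthus | owen | spencer | galton | pearson | bentham |
|---|---|---|---|---|---|---|---|---|
| 0 | 1803 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1804 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 1805 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | 1806 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1807 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 106 | 1911 | 12 | 0 | 0 | 6 | 1 | 2 | 2 |
| 107 | 1912 | 7 | 1 | 0 | 0 | 0 | 14 | 5 |
| 108 | 1913 | 1 | 0 | 0 | 4 | 0 | 8 | 4 |
| 109 | 1914 | 5 | 0 | 0 | 0 | 0 | 4 | 2 |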


In [64]:
theorists_yr.to_stata(IVS+"theorists.dta", write_index=False)
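
After exporting, it can be worth reading the `.dta` file back to confirm that the column names and values survived the round trip. The sketch below assumes a hypothetical two-year excerpt and writes to a temporary directory instead of the `IVS` folder used above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-year excerpt standing in for theorists_yr
theorists_yr = pd.DataFrame({"year": [1803, 1804], "smith": [0, 1], "bentham": [0, 0]})

# Write to a temporary directory rather than the notebook's IVS folder
out_path = os.path.join(tempfile.mkdtemp(), "theorists.dta")
theorists_yr.to_stata(out_path, write_index=False)

# Read the file back and compare column names and values
check = pd.read_stata(out_path)
same_columns = list(check.columns) == list(theorists_yr.columns)
same_values = all(
    check[col].tolist() == theorists_yr[col].tolist() for col in theorists_yr.columns
)

print(same_columns, same_values)
```

Integer columns may come back with a narrower dtype than they were written with, which is why the comparison is on values rather than dtypes.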