# Step 2: Pattern Refinement


1. [Import model](#import)
2. [Identify representative documents (Table 5)](#identify)

This code produces the output used in Table 5 and walks through the second step of the computational grounded theory workflow: guided deep reading.

<a id='import'></a>
## Import Model

First, load the saved Structural Topic Model (STM) in R, then build a pandas DataFrame from the desired model.

In [2]:
#enable use of R in a Python kernel
%load_ext rpy2.ipython

In [3]:
%%R
library(stm)

#load the saved STM
load("../input_data/stm_1234.RData")




In [5]:
%%R -o df_all 
#The above outputs the R variable df_all for use in Python cells below

#merge theta onto original dataset
meta$ID <- seq.int(nrow(meta))
theta <- data.frame(mod.40$theta)
theta$ID <- seq.int(nrow(theta))
#join on the shared ID column (stated explicitly rather than left to merge's default)
df_all <- merge(meta, theta, by="ID")
df_all['ID'] <- NULL
df_all['X'] <- NULL

In [6]:
%%R
terms40 <- labelTopics(mod.40, n=20)
terms40$prob

      [,1]       [,2]        [,3]       [,4]       [,5]        [,6]     
 [1,] "club"     "year"      "member"   "miss"     "boy"       "social" 
 [2,] "abort"    "women"     "law"      "doctor"   "medic"     "hospit" 
 [3,] "women"    "articl"    "read"     "art"      "woman"     "book"   
 [4,] "children" "mother"    "child"    "famili"   "work"      "women"  
 [5,] "can"      "put"       "will"     "way"      "tire"      "wear"   
 [6,] "get"      "dont"      "say"      "know"     "thing"     "like"   
 [7,] "vietnam"  "vietnames" "peopl"    "war"      "american"  "south"  
 [8,] "sanger"   "one"       "will"     "public"   "birth"     "inform" 
 [9,] "women"    "liber"     "work"     "cwlu"     "union"     "chicago"
[10,] "women"    "gonorrhea" "doctor"   "infect"   "can"       "pain"   
[11,] "class"    "work"      "hullhous" "miss"     "museum"    "danc"   
[12,] "didnt"    "time"      "said"     "come"     "knew"      "came"   
[13,] "steinem"  "new"       "festiv"   "york"     

In [7]:
#Read df_all from R into a Pandas dataframe
import pandas
df = pandas.DataFrame(df_all)
df

Unnamed: 0,doc,city,publication,date,word_count,org,identifier,wave,text_string,X1,...,X31,X32,X33,X34,X35,X36,X37,X38,X39,X40
1,notessecondyear_70.txt,nyc,notessecondyear,1969,553,redstockings,1,2,1 1 1 1 1 10 11 2 2 2 2 3 3 3 4 5 6 7 8 9 A An...,0.000108,...,0.000030,0.000563,0.000168,0.000012,0.000080,1.601908e-07,0.000004,0.000547,4.344782e-04,0.001872
2,chicago.cwlu_womankind.1971.11.06.txt,chicago,cwlu_womankind,1971,890,cwlu,2,2,411 93 Actually Alice American American Any As...,0.000683,...,0.001261,0.072608,0.000659,0.000051,0.003934,2.370884e-05,0.000765,0.000813,8.932289e-05,0.000268
3,nyc.masses_1916.04.21.txt,nyc,masses,1916,425,heterodoxy,3,1,All Anarchist Anarchist And Birth Birth Birth ...,0.000013,...,0.000140,0.000385,0.036311,0.001572,0.000022,7.744428e-03,0.000628,0.001633,2.317906e-03,0.002181
4,nyc.redstockings.1973.mainardi.marriagequestio...,nyc,redstockings,1973,972,redstockings,4,2,1968 1968 50s 60s Although Although American A...,0.000106,...,0.000667,0.461181,0.000885,0.000008,0.000254,1.166657e-05,0.000083,0.023737,1.385348e-03,0.004373
5,chicago.cwlu_womankind.1972.01.01.txt,chicago,cwlu_womankind,1972,39,cwlu,5,2,1972 5 Ghots I January Womankind a bind by cro...,0.000028,...,0.001579,0.000261,0.024764,0.002650,0.012088,6.019012e-01,0.011103,0.000128,3.216998e-03,0.000334
6,notesfirstyear_30.txt,nyc,notesfirstyear,1968,442,redstockings,6,2,12 12 15 1868 1868 1968 28 A AUNT All Anybody ...,0.001169,...,0.014021,0.004732,0.003569,0.001689,0.019514,1.164017e-04,0.011646,0.006362,1.552764e-02,0.009689
7,chicago.cwlu_womankind.1972.05.14.txt,chicago,cwlu_womankind,1972,976,cwlu,7,2,1 1970 2 2 3 4 4 5 6 7 8 A AT Also Also Amer A...,0.000011,...,0.000075,0.000291,0.002064,0.000260,0.000511,3.852390e-04,0.001004,0.000943,4.548404e-04,0.001381
8,nyc.redstockings.1973.sarachild.programforcons...,nyc,redstockings,1973,785,redstockings,8,2,1 2 3 A A A APPENDIX And CONSCIOUSNESSRAISING ...,0.000096,...,0.000022,0.000557,0.000149,0.000010,0.000067,1.310724e-07,0.000004,0.000586,3.692345e-04,0.001492
9,chicago.cwlu_womankind.1972.11.11.txt,chicago,cwlu_womankind,1972,985,cwlu,9,2,1867 1972 A AND ARTICLES Adopt Affiar All Amaz...,0.000189,...,0.049130,0.008029,0.004333,0.000305,0.079622,1.280873e-03,0.000064,0.011200,3.911868e-02,0.011208
10,chicago.cwlu_womankind.1972.03.20.txt,chicago,cwlu_womankind,1972,369,cwlu,10,2,Above CWLU CWLU CWLU Chicago Chicago Discus Li...,0.000399,...,0.000106,0.001322,0.008745,0.000343,0.031367,6.535829e-04,0.001451,0.000028,3.536808e-05,0.001559
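
Each row of the STM `theta` matrix holds one document's topic proportions, so the `X1` to `X40` columns of a merged row should sum to roughly 1. A minimal sketch of that sanity check follows, assuming the topic columns still carry their `X<number>` names (that is, it runs before any renaming). The helper names `topic_columns` and `rows_off_simplex`, the tolerance, and the toy data are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def topic_columns(df, prefix="X"):
    """Return columns named like X1, X2, ... (the theta columns from the merge)."""
    return [c for c in df.columns
            if c.startswith(prefix) and c[len(prefix):].isdigit()]


def rows_off_simplex(df, cols, tol=1e-6):
    """Return index labels of rows whose topic weights do not sum to 1 within tol."""
    totals = df[cols].sum(axis=1)
    return totals[(totals - 1).abs() > tol].index.tolist()


# Toy data with made-up weights; the second row deliberately sums to 0.7.
toy = pd.DataFrame({
    "doc": ["a.txt", "b.txt"],
    "X1": [0.2, 0.5],
    "X2": [0.3, 0.1],
    "X3": [0.5, 0.1],
})

cols = topic_columns(toy)
bad_rows = rows_off_simplex(toy, cols)
print(cols)
print(bad_rows)
```

On the real data, `rows_off_simplex(df, topic_columns(df))` returning an empty list would suggest the merge kept every topic column aligned with its document.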


In [8]:
########################################################
########################################################
#####rename top 12 topics to match labels in Table 4####
########################################################
########################################################

#Hull House Social Activities = X1
#Public Institutions = X28
#Hull House Practical Activities = X27
#Sanger and Birth Control = X8
#Women's Lives = X26
#Women's Resistance = X21
#Anti-War = X7
#Liberation School = X9
#Women's Sexual Health = X10
#Forms of Resistance = X25
#Movement Theory = X14
#Movement History = X39

#########################################################
#########################################################


df.rename(columns={
    'X1': 'Hull House Social Activities',
    'X28': 'Public Institutions',
    'X27': 'Hull House Practical Activities',
    'X8': 'Sanger and Birth Control',
    'X26': "Women's Lives",
    'X21': "Women's Resistance",
    'X7': 'Anti-War',
    'X9': 'Liberation School',
    'X10': "Women's Sexual Health",
    'X25': 'Forms of Resistance',
    'X14': 'Movement Theory',
    'X39': 'Movement History',
}, inplace=True)
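
`DataFrame.rename` ignores mapping keys that match no column unless `errors='raise'` is passed, so a mistyped topic number would leave that topic unlabeled without any warning. The sketch below shows one way to check a mapping before applying it. The helper `missing_columns` and the toy column list are hypothetical and only stand in for `df.columns`.

```python
def missing_columns(mapping, columns):
    """Return the keys of `mapping` that are not present in `columns`."""
    present = set(columns)
    return [old for old in mapping if old not in present]


# Toy column list standing in for df.columns (hypothetical subset).
columns = ["doc", "X1", "X7", "X39"]

# "X93" is a deliberate typo for "X39" to show what the check catches.
labels = {
    "X1": "Hull House Social Activities",
    "X7": "Anti-War",
    "X93": "Movement History",
}

missing = missing_columns(labels, columns)
print(missing)  # → ['X93']
```

With the real dataframe, the call would be `missing_columns(mapping, df.columns)`, run before the rename.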

<a id='identify'></a>
## Identify Documents

You can now sort the dataframe by a topic's weight to identify the documents that best represent that topic. Below I do this for Topic 39, the "Movement History" topic, and for Topic 14, the "Movement Theory" topic, but the same approach works for any topic.

In [11]:
#Output used in Table 5
#Documents with the highest weight for Topic 39, the 'Movement History' topic.
df[['doc', 'text_string', 'Movement History', 'Anti-War']].sort_values(by='Movement History', ascending=False)[:10]

Unnamed: 0,doc,text_string,Movement History,Anti-War
558,nyc.redstockings.1973.sarachild.powerofhistory...,1 100 1972 1972 2 ARCS Alice Although An And A...,0.96703,0.000313
1005,nyc.redstockings.1973.sarachild.powerofhistory...,100 1959 Anthony Anthony Anthony Anthony Antho...,0.963597,0.000114
130,nyc.redstockings.1973.sarachild.powerofhistory...,1 1928 1959 1965 1969 19th 19th 19th 19th 20th...,0.947335,0.000125
492,nyc.redstockings.1973.sarachild.powerofhistory...,110 1848 1850 1870 1871 1902 1920 19th 19th 41...,0.940308,0.000348
630,nyc.redstockings.1973.sarachild.powerofhistory...,19th Ages All Anthony Anthony Anthony Apparent...,0.915081,2.9e-05
371,nyc.redstockings.1973.sarachild.powerofhistory...,1881 19th 19th 19th 2 29 3 4 Aithough American...,0.864953,0.000108
102,nyc.redstockings.1973.sarachild.powerofhistory...,Along And At At Because Because Both But Fires...,0.864582,0.000166
109,nyc.redstockings.1973.sarachild.powerofhistory...,19th 19th 19th 2506 75 94704 9Some All Also An...,0.816335,9.6e-05
31,nyc.redstockings.1973.sarachild.powerofhistory...,17th 196Os A Although And And And Anne Bradstr...,0.756516,0.000252
54,nyc.redstockings.1973.sarachild.powerofhistory...,1 1948 1972 2 2 3 7 772 969 A A AND America Am...,0.705227,0.000726


In [17]:
#Documents with the highest weight for Topic 14, the 'Movement Theory' topic.
df[['doc', 'text_string', 'Movement Theory']].sort_values(by='Movement Theory', ascending=False)[:10]

Unnamed: 0,doc,text_string,Movement Theory
89,nyc.redstockings.1973.leon.dirtytricks-02.txt,155 1959 1959 1962 40 81362 A A A After All Al...,0.992999
899,nyc.redstockings.1973.steinemandcia-4.txt,1 1 111 154 1953 1956 1958 1959 1959 1961 1961...,0.990822
497,nyc.redstockings.1973.leon.dirtytricks-01.txt,10009 1284 18861 1967 1975 31059 765Ql 9959 A ...,0.960695
212,nyc.redstockings.1973.steinemandcia-3.txt,011 011I 04 0517 0559 0colr 1 1 1 1 1 1 1 1 1 ...,0.956721
533,notesfirstyear_2.txt,1 10003 10014 11th 1968 212 212 25 317 6913795...,0.947379
1003,nyc.redstockings.1973.leon.dirtytricks-03.txt,15year 1953 1957 195960 195962 1965 1965 1967 ...,0.926908
46,nyc.redstockings.1973.steinemandcia-2.txt,1 1950s 1954 195658 1957 1959 195962 1967 1967...,0.923025
731,nyc.redstockings.1973.leon.dirtytricks-10.txt,10017 112367 12000 163 1971 1974 1974 20000 21...,0.776272
312,nyc.redstockings.1973.schultz.finnishnotebook-...,171 2116167 2f2lI67 7 75 9 CIA CIA ClA Dennis ...,0.725497
44,nyc.redstockings.1973.schultz.finnishnotebook-...,13 170 1962 1962 1962 1975 5 7 A A A According...,0.719436
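
The two cells above repeat the same select-and-sort pattern, which could be wrapped in a small helper so that any topic can be inspected with one call. This is a sketch under assumptions: the function name `top_documents` and its parameters are illustrative, and the toy dataframe uses made-up weights in place of the real `df`.

```python
import pandas as pd


def top_documents(df, topic, n=10, extra_cols=()):
    """Return the n rows with the highest weight for `topic`."""
    cols = ["doc", "text_string", topic, *extra_cols]
    return df[cols].sort_values(by=topic, ascending=False).head(n)


# Toy data with made-up weights; the real df comes from the cells above.
toy = pd.DataFrame({
    "doc": ["a.txt", "b.txt", "c.txt"],
    "text_string": ["A An", "All And", "Also At"],
    "Movement History": [0.10, 0.95, 0.40],
    "Anti-War": [0.30, 0.01, 0.20],
})

top = top_documents(toy, "Movement History", n=2, extra_cols=["Anti-War"])
print(top["doc"].tolist())  # → ['b.txt', 'c.txt']
```

On the real data, `top_documents(df, 'Movement History', extra_cols=['Anti-War'])` would select the same columns and ordering as the Table 5 cell.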


In [18]:
df

Unnamed: 0,doc,city,publication,date,word_count,org,identifier,wave,text_string,Hull House Social Activities,...,X31,X32,X33,X34,X35,X36,X37,X38,Movement History,X40
1,notessecondyear_70.txt,nyc,notessecondyear,1969,553,redstockings,1,2,1 1 1 1 1 10 11 2 2 2 2 3 3 3 4 5 6 7 8 9 A An...,0.000009,...,9.839842e-07,1.977760e-04,0.000535,0.003818,1.121387e-05,2.214182e-08,0.000038,2.896903e-03,1.724483e-04,2.217323e-07
2,chicago.cwlu_womankind.1971.11.06.txt,chicago,cwlu_womankind,1971,890,cwlu,2,2,411 93 Actually Alice American American Any As...,0.005051,...,1.174310e-06,9.718309e-04,0.000185,0.000362,3.533782e-04,1.145776e-03,0.307760,8.900101e-04,9.877170e-03,4.152077e-05
3,nyc.masses_1916.04.21.txt,nyc,masses,1916,425,heterodoxy,3,1,All Anarchist Anarchist And Birth Birth Birth ...,0.000005,...,1.599000e-02,1.085408e-04,0.103392,0.000093,7.041272e-05,1.025539e-05,0.000181,1.981448e-04,1.036758e-03,1.956936e-05
4,nyc.redstockings.1973.mainardi.marriagequestio...,nyc,redstockings,1973,972,redstockings,4,2,1968 1968 50s 60s Although Although American A...,0.000610,...,5.621154e-05,4.041277e-03,0.001754,0.003146,5.095332e-06,4.110163e-06,0.000657,5.475914e-03,3.231665e-03,3.159558e-05
5,chicago.cwlu_womankind.1972.01.01.txt,chicago,cwlu_womankind,1972,39,cwlu,5,2,1972 5 Ghots I January Womankind a bind by cro...,0.000417,...,6.662805e-05,5.592332e-04,0.015263,0.000041,1.111954e-02,4.302147e-03,0.396293,3.225515e-03,1.853908e-03,7.546477e-04
6,notesfirstyear_30.txt,nyc,notesfirstyear,1968,442,redstockings,6,2,12 12 15 1868 1868 1968 28 A AUNT All Anybody ...,0.000198,...,1.427645e-04,3.033551e-02,0.017309,0.002739,6.336636e-05,6.437441e-05,0.030969,9.519957e-03,3.588646e-02,5.097306e-04
7,chicago.cwlu_womankind.1972.05.14.txt,chicago,cwlu_womankind,1972,976,cwlu,7,2,1 1970 2 2 3 4 4 5 6 7 8 A AT Also Also Amer A...,0.002649,...,1.073067e-05,2.910664e-05,0.000167,0.000471,8.218961e-02,4.336913e-01,0.000287,7.205336e-03,2.120207e-04,4.333721e-05
8,nyc.redstockings.1973.sarachild.programforcons...,nyc,redstockings,1973,785,redstockings,8,2,1 2 3 A A A APPENDIX And CONSCIOUSNESSRAISING ...,0.000005,...,1.564634e-06,1.118471e-04,0.000506,0.001910,2.077238e-05,2.165445e-07,0.000150,2.214874e-03,1.769736e-02,1.491251e-06
9,chicago.cwlu_womankind.1972.11.11.txt,chicago,cwlu_womankind,1972,985,cwlu,9,2,1867 1972 A AND ARTICLES Adopt Affiar All Amaz...,0.000512,...,1.620817e-05,1.483889e-02,0.044229,0.000432,2.356348e-04,8.004961e-06,0.598019,6.536029e-03,4.789977e-04,3.526054e-05
10,chicago.cwlu_womankind.1972.03.20.txt,chicago,cwlu_womankind,1972,369,cwlu,10,2,Above CWLU CWLU CWLU Chicago Chicago Discus Li...,0.000227,...,2.225532e-05,3.113088e-04,0.002405,0.000890,5.669615e-04,3.215436e-05,0.000443,3.157324e-03,3.696998e-04,2.198869e-05
