# Clean GSS Variables used in Analysis

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import thinkstats2
import thinkplot

import utils
from utils import decorate

In [2]:
def read_gss(dirname):
    """Reads GSS files from the given directory.
    
    dirname: string
    
    returns: DataFrame
    """
    dct = utils.read_stata_dict(dirname + '/GSS.dct')
    gss = dct.read_fixed_width(dirname + '/GSS.dat.gz',
                             compression='gzip')
    return gss
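`utils.read_stata_dict` is a helper from this project and its internals are not shown here. As a minimal sketch, assuming it parses column positions out of `GSS.dct` and then hands them to pandas, the fixed-width read itself might look like the following. `COLSPECS` and `read_fixed_width` are hypothetical names, and the three-column layout is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical column layout: (name, start, end) with 0-based, half-open
# character positions, as a Stata dictionary file might describe them.
COLSPECS = [('year', 0, 4), ('age', 4, 6), ('sex', 6, 7)]


def read_fixed_width(source, colspecs=COLSPECS, **options):
    """Reads a fixed-width file given (name, start, end) column specs.

    Extra keyword options (e.g. compression='gzip') go to pd.read_fwf.
    """
    names = [name for name, _, _ in colspecs]
    spans = [(start, end) for _, start, end in colspecs]
    return pd.read_fwf(source, colspecs=spans, names=names, **options)


# Two made-up records: 1972, age 23, sex 1 and 1972, age 45, sex 2.
sample = io.StringIO('1972231\n1972452\n')
df = read_fixed_width(sample)
print(df)
```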

In [3]:
# Read in the GSS variables.
gss = read_gss('gss_vars')

## Clean the Data

In [4]:
gss.head()

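|   | ballot | fund | other | denom | relig | cappun | libhomo | colhomo | spkhomo | natfare | ... | natrace | nateduc | natdrug | natcity | natheal | natenvir | natspac | polviews | partyid | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1972 |
| 1 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1972 |
| 2 | 0 | 2 | 0 | 28 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1972 |
| 3 | 0 | 9 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1972 |
| 4 | 0 | 2 | 0 | 28 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1972 |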


In [5]:
gss.columns

Index(['ballot', 'fund', 'other', 'denom', 'relig', 'cappun', 'libhomo',
       'colhomo', 'spkhomo', 'natfare', 'attend', 'reliten', 'relig16',
       'cohort', 'realrinc', 'marhomo', 'homosex', 'hapmar', 'raclive',
       'fund16', 'oth16', 'denom16', 'nataid', 'natarms', 'res16', 'race',
       'sex', 'educ', 'age', 'divorce', 'marital', 'wrkstat', 'id_', 'reg16',
       'region', 'size', 'natrace', 'nateduc', 'natdrug', 'natcity', 'natheal',
       'natenvir', 'natspac', 'polviews', 'partyid', 'year'],
      dtype='object')

The `nataid` variable records whether the respondent thinks the country spends too little, about the right amount, or too much on foreign aid. We replace the No answer (9) and Not applicable (0) codes with NaN and keep Don't know (8).

Code | Label
--- | ---
1   | Too little
2   | About right
3   | Too much
8   | Don't know
9   | No answer
0   | Not applicable

In [6]:
gss['nataid'] = gss['nataid'].replace([0, 9], np.nan)

In [7]:
gss['nataid'].value_counts().sort_index()

1.0     2355
2.0     8477
3.0    24317
8.0     1891
Name: nataid, dtype: int64
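The same replace-with-NaN step recurs for most variables in this notebook, with only the missing codes changing. One way to keep those codes in one place, sketched here under the assumption that a per-variable dictionary suits the workflow, is a mapping from column name to missing codes. `MISSING_CODES` and `clean_missing` are hypothetical names, not part of `utils`, and only a few variables are listed.

```python
import numpy as np
import pandas as pd

# Missing-value codes per variable, taken from the code tables in this notebook.
# Only a few variables are listed here for illustration.
MISSING_CODES = {
    'nataid': [0, 9],
    'natarms': [0, 9],
    'educ': [98, 99],
    'partyid': [8, 9],
}


def clean_missing(df, missing_codes=MISSING_CODES):
    """Returns a copy of df with each variable's missing codes replaced by NaN."""
    df = df.copy()
    for column, codes in missing_codes.items():
        if column in df:  # skip variables that are not in this DataFrame
            df[column] = df[column].replace(codes, np.nan)
    return df


raw = pd.DataFrame({'nataid': [1, 0, 9, 3], 'educ': [12, 98, 16, 99]})
cleaned = clean_missing(raw)
print(cleaned)
```

Returning a copy leaves the raw data untouched, which makes it easy to compare counts before and after cleaning.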

The `natarms` variable records whether the respondent thinks the country spends too little, about the right amount, or too much on the military, armaments, and defense.

Code | Label
--- | ---
1   | Too little
2   | About right
3   | Too much
8   | Don't know
9   | No answer
0   | Not applicable

In [8]:
gss['natarms'] = gss['natarms'].replace([0, 9], np.nan)

In [9]:
gss['natarms'].value_counts().sort_index()

1.0     8781
2.0    15030
3.0    11385
8.0     1829
Name: natarms, dtype: int64

The `natrace` variable records whether the respondent thinks the country spends too little, about the right amount, or too much on "improving the conditions of blacks".

Code | Label
--- | ---
1   | Too little
2   | About right
3   | Too much
8   | Don't know
9   | No answer
0   | Not applicable

In [10]:
gss['natrace'] = gss['natrace'].replace([0, 9], np.nan)

In [11]:
gss['natrace'].value_counts().sort_index()

1.0    12141
2.0    15164
3.0     6506
8.0     3113
Name: natrace, dtype: int64

The `res16` variable records the type and size of the place where the respondent lived at age 16. We replace the No answer (9) and Not applicable (0) codes with NaN; the Don't know code (8) is kept and accounts for only 42 responses.

Code | Label
--- | ---
1   | Country, nonfarm
2   | Farm
3   | Town, less than 50,000
4   | 50,000 to 250,000
5   | Big-city suburb
6   | City, more than 250,000
8   | Don't know
9   | No answer
0   | Not applicable

In [12]:
gss['res16'] = gss['res16'].replace([0, 9], np.nan)

In [13]:
gss['res16'].value_counts().sort_index()

1.0     6935
2.0     9440
3.0    20076
4.0     9805
5.0     7078
6.0     9851
8.0       42
Name: res16, dtype: int64
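The `value_counts` calls in this notebook drop NaN by default, so the recoded missing values disappear from the output. Passing `dropna=False` shows them as well. A small sketch on made-up data, using res16-style codes, confirms that code 8 (Don't know) survives a `[0, 9]` replacement while 0 and 9 become NaN.

```python
import numpy as np
import pandas as pd

# Toy column using res16-style codes (0 = not applicable, 8 = don't know, 9 = no answer).
res16 = pd.Series([1, 3, 0, 8, 9, 3])
res16 = res16.replace([0, 9], np.nan)

# dropna=False keeps a row for NaN, so the recoded missing values stay visible.
counts = res16.value_counts(dropna=False).sort_index()
print(counts)
```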

The `race` variable records the respondent's race. It has no missing-value codes in this extract, so no cleaning is needed.

Code | Label
--- | ---
1   | White
2   | Black
3   | Other

In [14]:
gss['race'].value_counts().sort_index()

1    52033
2     9187
3     3594
Name: race, dtype: int64

The `sex` variable records the respondent's sex.

Code | Label
--- | ---
1   | Male
2   | Female

In [15]:
gss['sex'].value_counts().sort_index()

1    28614
2    36200
Name: sex, dtype: int64

The `educ` variable records the respondent's education in years of schooling. We replace Don't know (98) and No answer (99) with NaN.

In [16]:
gss['educ'] = gss['educ'].replace([98, 99], np.nan)

In [17]:
gss['educ'].value_counts().sort_index()

0.0       165
1.0        47
2.0       152
3.0       257
4.0       319
5.0       402
6.0       828
7.0       879
8.0      2724
9.0      2083
10.0     2880
11.0     3743
12.0    19663
13.0     5360
14.0     7160
15.0     2910
16.0     8355
17.0     1967
18.0     2384
19.0      920
20.0     1439
Name: educ, dtype: int64

The `age` variable records the respondent's age in years. We replace Don't know (98) and No answer (99) with NaN.

In [18]:
gss['age'] = gss['age'].replace([98, 99], np.nan)

In [19]:
gss['age'].value_counts().sort_index()

18.0     241
19.0     861
20.0     885
21.0    1014
22.0    1082
23.0    1238
24.0    1213
25.0    1378
26.0    1336
27.0    1391
28.0    1432
29.0    1327
30.0    1450
31.0    1340
32.0    1431
33.0    1375
34.0    1422
35.0    1383
36.0    1362
37.0    1360
38.0    1350
39.0    1236
40.0    1287
41.0    1222
42.0    1201
43.0    1240
44.0    1174
45.0    1091
46.0    1123
47.0    1104
        ... 
60.0     940
61.0     842
62.0     865
63.0     843
64.0     736
65.0     819
66.0     749
67.0     800
68.0     737
69.0     689
70.0     719
71.0     624
72.0     616
73.0     552
74.0     593
75.0     504
76.0     507
77.0     477
78.0     433
79.0     376
80.0     337
81.0     329
82.0     279
83.0     260
84.0     226
85.0     197
86.0     186
87.0     148
88.0     121
89.0     364
Name: age, Length: 72, dtype: int64
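The spike at 89 in the output above (364 respondents, versus 121 at 88) is consistent with top-coding: the GSS codebook labels this value "89 or older", so it is a lower bound rather than an exact age. A minimal sketch of flagging those rows on made-up values, so that an analysis can treat them separately; `TOP_CODE` is a name introduced here for illustration.

```python
import pandas as pd

# Assumption: in the GSS, age 89 means "89 or older".
TOP_CODE = 89

ages = pd.Series([25.0, 67.0, 89.0, None, 89.0])
is_top_coded = ages == TOP_CODE  # NaN compares as False
print(is_top_coded.sum())  # number of top-coded respondents
```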

The `cohort` variable records the respondent's year of birth. The code 9999 marks a missing value and is replaced with NaN.

In [20]:
gss['cohort'] = gss['cohort'].replace(9999, np.nan)
gss['cohort'].value_counts().sort_index()

1883.0      2
1884.0      3
1885.0      7
1886.0      4
1887.0     13
1888.0     10
1889.0     21
1890.0     27
1891.0     27
1892.0     31
1893.0     34
1894.0     61
1895.0     55
1896.0     73
1897.0     72
1898.0    105
1899.0    123
1900.0    130
1901.0    138
1902.0    163
1903.0    173
1904.0    182
1905.0    219
1906.0    207
1907.0    263
1908.0    246
1909.0    330
1910.0    285
1911.0    318
1912.0    348
         ... 
1971.0    689
1972.0    679
1973.0    640
1974.0    638
1975.0    570
1976.0    565
1977.0    576
1978.0    498
1979.0    547
1980.0    488
1981.0    468
1982.0    398
1983.0    365
1984.0    376
1985.0    352
1986.0    292
1987.0    268
1988.0    232
1989.0    245
1990.0    179
1991.0    180
1992.0    123
1993.0    136
1994.0    120
1995.0     87
1996.0     72
1997.0     61
1998.0     22
1999.0     26
2000.0     22
Name: cohort, Length: 118, dtype: int64
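The `cohort` values should line up with the `year` and `age` columns. A quick consistency check, sketched under the assumption that birth year is roughly survey year minus age (it can be off by one depending on when in the year the respondent was born and interviewed), looks at the largest gap. The three rows below are made up.

```python
import pandas as pd

# Made-up rows; the last one has missing age and cohort.
df = pd.DataFrame({
    'year': [1972, 1972, 2016],
    'age': [23.0, 45.0, None],
    'cohort': [1949.0, 1926.0, None],
})

gap = (df['cohort'] - (df['year'] - df['age'])).abs()
# Rows where age or cohort is missing produce NaN, which .max() skips.
print(gap.max())
```

On the real data, a maximum gap larger than 1 would point to rows worth inspecting.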

The `divorce` variable records whether the respondent has ever been divorced or legally separated. We replace Don't know, No answer, and Not applicable with NaN and recode No (2) as 0, so the variable becomes a 0/1 indicator that can be used directly in a regression.

Code | Label
--- | ---
1   | Yes
2   | No
8   | Don't know
9   | No answer
0   | Not applicable

In [21]:
gss['divorce'] = gss['divorce'].replace([0, 8, 9], np.nan)
# Recode No (2) as 0 so that divorce is a 0/1 indicator.
gss['divorce'] = gss['divorce'].replace(2, 0)

In [22]:
gss['divorce'].value_counts().sort_index()

0.0    31806
1.0     8286
Name: divorce, dtype: int64
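One practical benefit of the 0/1 coding is that the mean of the column is the share of Yes answers among valid responses, because pandas skips NaN when averaging. A small sketch on made-up codes:

```python
import numpy as np
import pandas as pd

# Toy divorce column with the original codes: 1 = yes, 2 = no, 0/8/9 = missing.
divorce = pd.Series([1, 2, 2, 0, 8, 1, 9, 2])
divorce = divorce.replace([0, 8, 9], np.nan).replace(2, 0)

# With a 0/1 coding, the mean is the share answering yes; NaN is skipped.
share_yes = divorce.mean()
print(share_yes)
```

Applied to the counts above, the same calculation gives 8286 / 40092, about 0.21.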

The `marital` variable refers to the marital status of the respondent.

Code | Label
--- | ---
1   | Married
2   | Widowed
3   | Divorced
4   | Separated
5   | Never married
9   | No answer

In [23]:
gss['marital'] = gss['marital'].replace([9], np.nan)

In [24]:
gss['marital'].value_counts().sort_index()

1.0    34129
2.0     6200
3.0     8379
4.0     2242
5.0    13837
Name: marital, dtype: int64
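Numeric codes are convenient for storage, but tables and plots are easier to read with the labels attached. A sketch, assuming a mapping written by hand from the code table above (`MARITAL_LABELS` is a hypothetical name), converts the codes to a pandas `Categorical`. Passing `categories=` fixes the order and keeps categories with zero count in the output.

```python
import pandas as pd

# Labels copied from the marital code table above.
MARITAL_LABELS = {
    1: 'Married',
    2: 'Widowed',
    3: 'Divorced',
    4: 'Separated',
    5: 'Never married',
}

marital = pd.Series([1.0, 5.0, None, 3.0])
labeled = pd.Categorical(
    marital.map(MARITAL_LABELS),  # NaN stays NaN
    categories=list(MARITAL_LABELS.values()),
)
counts = pd.Series(labeled).value_counts(sort=False)
print(counts)
```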

The `wrkstat` variable refers to the employment status of the respondent.

Code | Label
--- | ---
1   | Working fulltime
2   | Working parttime
3   | Temp not working
4   | Unempl, laid off
5   | Retired
6   | School
7   | Keeping house
8  | Other
9   | No answer

In [25]:
gss['wrkstat'] = gss['wrkstat'].replace([9], np.nan)

In [26]:
gss['wrkstat'].value_counts().sort_index()

1.0    31892
2.0     6719
3.0     1363
4.0     2179
5.0     9121
6.0     1998
7.0    10176
8.0     1345
Name: wrkstat, dtype: int64

The `nateduc` variable records whether the respondent thinks the country spends too little, about the right amount, or too much on improving the nation's education system.

Code | Label
--- | ---
1   | Too little
2   | About right
3   | Too much
8   | Don't know
9   | No answer
0   | Not applicable

In [27]:
gss['nateduc'] = gss['nateduc'].replace([0, 9], np.nan)

In [28]:
gss['nateduc'].value_counts().sort_index()

1.0    23423
2.0    10183
3.0     2491
8.0      946
Name: nateduc, dtype: int64

The `natdrug` variable records whether the respondent thinks the country spends too little, about the right amount, or too much on dealing with drug addiction.

Code | Label
--- | ---
1   | Too little
2   | About right
3   | Too much
8   | Don't know
9   | No answer
0   | Not applicable

In [29]:
gss['natdrug'] = gss['natdrug'].replace([0, 9], np.nan)

In [30]:
gss['natdrug'].value_counts().sort_index()

1.0    22005
2.0    10179
3.0     2994
8.0     1819
Name: natdrug, dtype: int64

The `natcity` variable records whether the respondent thinks the country spends too little, about the right amount, or too much on solving the problems of big cities.

Code | Label
--- | ---
1   | Too little
2   | About right
3   | Too much
8   | Don't know
9   | No answer
0   | Not applicable

In [31]:
gss['natcity'] = gss['natcity'].replace([0, 9], np.nan)

In [32]:
gss['natcity'].value_counts().sort_index()

1.0    16642
2.0    10761
3.0     5216
8.0     4388
Name: natcity, dtype: int64

The `natheal` variable records whether the respondent thinks the country spends too little, about the right amount, or too much on improving and protecting the nation's health.

Code | Label
--- | ---
1   | Too little
2   | About right
3   | Too much
8   | Don't know
9   | No answer
0   | Not applicable

In [33]:
gss['natheal'] = gss['natheal'].replace([0, 9], np.nan)

In [34]:
gss['natheal'].value_counts().sort_index()

1.0    23779
2.0     9786
3.0     2317
8.0     1161
Name: natheal, dtype: int64

The `natenvir` variable records whether the respondent thinks the country spends too little, about the right amount, or too much on improving and protecting the environment.

Code | Label
--- | ---
1   | Too little
2   | About right
3   | Too much
8   | Don't know
9   | No answer
0   | Not applicable

In [35]:
gss['natenvir'] = gss['natenvir'].replace([0, 9], np.nan)

In [36]:
gss['natenvir'].value_counts().sort_index()

1.0    21658
2.0    10626
3.0     3132
8.0     1631
Name: natenvir, dtype: int64

The `natspac` variable records whether the respondent thinks the country spends too little, about the right amount, or too much on the space exploration program.

Code | Label
--- | ---
1   | Too little
2   | About right
3   | Too much
8   | Don't know
9   | No answer
0   | Not applicable

In [37]:
gss['natspac'] = gss['natspac'].replace([0, 9], np.nan)

In [38]:
gss['natspac'].value_counts().sort_index()

1.0     4750
2.0    14362
3.0    15584
8.0     2386
Name: natspac, dtype: int64

The `polviews` variable records where respondents place themselves on a 7-point scale from extremely liberal to extremely conservative.

Code | Label
--- | ---
1   | Extremely liberal
2   | Liberal
3   | Slightly liberal
4   | Moderate
5   | Slightly conservative
6   | Conservative
7   | Extremely conservative
8   | Don't know
9   | No answer
0   | Not applicable

In [39]:
gss['polviews'] = gss['polviews'].replace([0, 9], np.nan)

In [40]:
gss['polviews'].value_counts().sort_index()

1.0     1682
2.0     6514
3.0     7010
4.0    21370
5.0     8690
6.0     8230
7.0     1832
8.0     2326
Name: polviews, dtype: int64
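The cleaning above keeps code 8 (Don't know), which sits numerically just past 7 (Extremely conservative). Any statistic that treats `polviews` as a number would therefore count Don't know answers as very conservative. A minimal sketch of masking the code before averaging, on made-up values:

```python
import numpy as np
import pandas as pd

# Toy polviews column after the [0, 9] cleaning: 1-7 is the scale, 8 is "Don't know".
polviews = pd.Series([1.0, 4.0, 7.0, 8.0, np.nan])

naive_mean = polviews.mean()                       # treats 8 as a scale point
scale_mean = polviews.where(polviews <= 7).mean()  # keeps only codes 1-7
print(naive_mean, scale_mean)
```

The same caveat applies to the `nat*` spending variables, which also keep code 8.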

The `partyid` variable records the respondent's party identification, on a scale from strong Democrat (0) to strong Republican (6), with 7 for other parties. We replace Don't know (8) and No answer (9) with NaN.

Code | Label
--- | ---
0   | Strong democrat
1   | Not strong democrat
2   | Independent, near democrat
3   | Independent
4   | Independent, near republican
5   | Not strong republican
6   | Strong republican
7   | Other party
8   | Don't know
9   | No answer

In [41]:
gss['partyid'] = gss['partyid'].replace([8, 9], np.nan)

In [42]:
gss['partyid'].value_counts().sort_index()

0.0    10378
1.0    13294
2.0     7792
3.0     9888
4.0     5721
5.0     9933
6.0     6318
7.0     1072
Name: partyid, dtype: int64
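For some analyses the seven party codes may be more detail than needed. The sketch below collapses them into broader groups with `Series.map`; the grouping itself (`PARTY_GROUPS`, a hypothetical name) is an illustrative assumption, and in particular it counts leaners (codes 2 and 4) as Independents, which is only one defensible choice.

```python
import pandas as pd

# Hypothetical grouping of the 0-7 partyid codes into four broader groups.
# Leaners (codes 2 and 4) are counted as Independents here.
PARTY_GROUPS = {
    0: 'Democrat', 1: 'Democrat',
    2: 'Independent', 3: 'Independent', 4: 'Independent',
    5: 'Republican', 6: 'Republican',
    7: 'Other',
}

partyid = pd.Series([0.0, 2.0, 6.0, 7.0, None, 5.0])
party = partyid.map(PARTY_GROUPS)  # NaN stays NaN
print(party.value_counts())
```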

The `ballot` variable indicates the ballot used for each interview. We replace the 'Not applicable' code with NaNs.

Code | Label
--- | ---
1 | ballot a
2 | ballot b
3 | ballot c
4 | ballot d
0 | not applicable

In [43]:
print(gss['ballot'].value_counts())

0    21875
2    13917
3    13798
1    13706
4     1518
Name: ballot, dtype: int64


In [44]:
gss['ballot'] = gss['ballot'].replace([0], np.nan)

The `spkhomo` variable indicates whether respondents think a homosexual man should be allowed to make a speech in their community.

Code | Label
--- | ---
1 | Allowed
2 | Not allowed
8 | Don't know
9 | No answer
0 | Not applicable

In [45]:
gss['spkhomo'].value_counts()

1    30002
0    25042
2     8621
8     1024
9      125
Name: spkhomo, dtype: int64

In [46]:
gss['spkhomo'] = gss['spkhomo'].replace([0, 9], np.nan)

The `colhomo` variable indicates whether respondents think a homosexual man should be allowed to teach in a college or university.

Code | Label
--- | ---
4 | Allowed
5 | Not allowed
8 | Don't know
9 | No answer
0 | Not applicable

In [47]:
gss['colhomo'].value_counts()

4    27010
0    25042
5    11358
8     1272
9      132
Name: colhomo, dtype: int64

In [48]:
gss['colhomo'] = gss['colhomo'].replace([0, 9], np.nan)

The `libhomo` variable indicates respondents' answers to the question "If some people in your community suggested that a book he \[a homosexual man\] wrote in favor of homosexuality should be taken out of your public library, would you favor removing this book, or not?"

Code | Label
--- | ---
1 | Remove
2 | Not remove
8 | Don't know
9 | No answer
0 | Not applicable

In [49]:
gss['libhomo'].value_counts()

2    26505
0    25042
1    12073
8     1065
9      129
Name: libhomo, dtype: int64

In [50]:
gss['libhomo'] = gss['libhomo'].replace([0, 9], np.nan)
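The three items above use different codes for the same idea: `spkhomo` and `colhomo` code the tolerant answer as 1 and 4, while `libhomo` asks about removing a book, so its tolerant answer is 2 (Not remove). A sketch that puts them on one 0/1 scale, assuming Don't know (8) should be treated as missing for this purpose; `TOLERANT` is a hypothetical name and the rows are made up.

```python
import numpy as np
import pandas as pd

# Map each item's codes to a common 1 = tolerant, 0 = not tolerant scale.
# Codes not listed (e.g. 8, Don't know) become NaN under .map().
TOLERANT = {
    'spkhomo': {1: 1, 2: 0},  # 1 Allowed, 2 Not allowed
    'colhomo': {4: 1, 5: 0},  # 4 Allowed, 5 Not allowed
    'libhomo': {2: 1, 1: 0},  # 2 Not remove, 1 Remove
}

df = pd.DataFrame({
    'spkhomo': [1.0, 2.0, 8.0],
    'colhomo': [4.0, 5.0, np.nan],
    'libhomo': [2.0, 1.0, 2.0],
})
tolerance = pd.DataFrame(
    {col: df[col].map(codes) for col, codes in TOLERANT.items()}
)
print(tolerance)
```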

The `cappun` variable indicates whether respondents favor or oppose the death penalty for persons convicted of murder.

Code | Label
--- | ---
1 | Favor
2 | Oppose
8 | Don't know
9 | No answer
0 | Not applicable

In [51]:
gss['cappun'].value_counts()

1    37946
2    15604
0     7500
8     3410
9      354
Name: cappun, dtype: int64

In [52]:
gss['cappun'] = gss['cappun'].replace([0, 9], np.nan)

The `marhomo` variable indicates whether respondents think homosexual couples should have the right to marry.

Code | Label
--- | ---
1 | Strongly agree
2 | Agree
3 | Neither agree nor disagree
4 | Disagree
5 | Strongly disagree
8 | Can't choose
9 | No answer
0 | Not applicable

In [53]:
gss['marhomo'].value_counts()

0    51062
5     3635
1     3093
2     2966
4     2086
3     1684
8      202
9       86
Name: marhomo, dtype: int64

In [54]:
gss['marhomo'] = gss['marhomo'].replace([0, 9], np.nan)

The `relig` variable indicates religious preference.

Code | Label
--- | ---
1 | Protestant
2 | Catholic
3 | Jewish
4 | None
5 | Other
6 | Buddhism
7 | Hinduism
8 | Other eastern
9 | Moslem/Islam
10 | Orthodox Christian
11 | Christian
12 | Native American
13 | Inter-nondenominational
98 | Don't know
99 | No answer

In [55]:
gss['relig'].value_counts()

1     37117
2     15674
4      7797
3      1285
5      1086
11      791
99      258
6       198
9       153
13      136
10      118
7       100
8        39
12       31
98       31
Name: relig, dtype: int64

In [56]:
gss['relig'] = gss['relig'].replace([99], np.nan)

The `denom` variable indicates the specific religious denomination.

Code | Label
--- | ---
10 | Am baptist asso
11 | Am bapt ch in usa
12 | Nat bapt conv of am
13 | Nat bapt con usa
14 | Southern baptist
15 | Other baptists
18 | Baptist-dk which
20 | Afr meth episcopal
21 | Afr meth ep zion
22 | United methodist
23 | Other methodist
28 | Methodist-dk which
30 | Am lutheran
31 | Luth ch in america
32 | Lutheran-mo synod
33 | Wi evan luth synod
34 | Other lutheran
35 | Evangelical luth
38 | Lutheran-dk which
40 | Presbyterian c in us
41 | United pres ch in us
42 | Other presbyterian
43 | Presbyterian, merged
48 | Presbyterian-dk which
50 | Episcopal
60 | Other
70 | No denomination
98 | Don't know
99 | No answer
0 | Not applicable

In [57]:
gss['denom'].value_counts()

0     26640
60     8048
18     6415
70     3955
14     3889
22     2881
28     2403
38     1738
50     1397
48     1190
15      836
32      687
10      675
30      571
41      443
40      347
99      340
11      332
35      302
31      211
20      206
12      200
43      186
42      170
33      157
34      155
23      149
13      128
21       86
98       77
Name: denom, dtype: int64

In [58]:
gss['denom'] = gss['denom'].replace([0, 99], np.nan)

The `other` variable records the specific denomination of respondents whose denomination is not one of the `denom` categories.

Code | Label
--- | ---
2	| Evangelical Congregational
3	| Ind Bible, Bible, Bible Fellowship
5	| Church of Prophecy
6	| New Testament Christian
7	| Church of God, Saint & Christ
8	| Moravian
9	| Christian & Missionary Alliance
10	| Advent Christian
11	| Spiritualist
12	| Assembly of God
13	| Free Methodist
14	| Apostolic Faith
15	| African Methodist
16	| Free Will Baptist
17	| Eden Evangelist
18	| Holiness (Nazarene)
19	| Baptist (Northern)
20	| Brethren Church, Brethren
21	| Witness Holiness
22	| Brethren, Plymouth
23	| United Brethren, United Brethren in Christ
24	| Independent
25	| Christian Disciples
26	| Christ in Christian Union
27	| Open Bible
28	| Christian Catholic
29	| Christ Church Unity
30	| Christ Adelphians
31	| Christian; Central Christian
32	| Christian Reform
33	| Christian Scientist
34	| Church of Christ, Evangelical
35	| Church of Christ
36	| Churches of God(Except with Christ and Holiness)
37	| Church of God in Christ
38	| Church of God in Christ Holiness
39	| Church of the Living God
40	| Congregationalist, 1st Congreg
41	| Community Church
42	| Covenant
43	| Dutch Reform
44	| Disciples of Christ
45	| Evangelical, Evangelist
46	| Evangelical Reformed
47	| Evangelist Free Church
48	| First Church
49	| First Christian Disciples of Christ
50	| First Reformed
51	| First Christian
52	| Full Gospel
53	| Four Square Gospel
54	| Friends
55	| Holy Roller
56	| Holiness; Church of Holiness
57	| Pilgrim Holiness
58	| Jehovah's Witnesses
61	| LDS--Reorganized
63	| Mennonite
64	| Mormon
65	| Nazarene
66	| Pentecostal Assembly of God
67	| Pentecostal Church of God
68	| Pentecostal
69	| Pentecostal Holiness, Holiness Pentecostal
70	| Quaker
71	| Reformed
72	| Reformed United Church of Christ
73	| Reformed Church of Christ
74	| Religious Science
75	| Mind Science
76	| Salvation Army
77	| 7th Day Adventist
78	| Sanctified, Sanctification
79	| United Holiness
80	| Unitarian, Universalist
81	| United Church of Christ
82	| United Church, Unity Church
83	| Wesleyan
84	| Wesleyan Methodist--Pilgrim
85	| Zion Union
86	| Zion Union Apostolic
87	| Zion Union Apostolic--Reformed
88	| Disciples of God
89	| Grace Reformed
90	| Holiness Church of God
91	| Evangelical Covenant
92	| Mission Covenant
93	| Missionary Baptist
94	| Swedish Mission
95	| Unity
96	| United Church of Christianity
97	| Other Fundamentalist
98	| Federated Church
99	| American Reform
100	| Grace Brethren
102	| Charismatic
103	| Pentecostal Apostolic
104	| House of Prayer
105	| Latvian Lutheran
107	| Apostolic Christian
108	| Christ Cathedral of Truth
109	| Bible Missionary
110	| Calvary Bible
111	| Amish
112	| Evangelical Methodist
113	| Worldwide Church of God
114	| Church Universal and Triumphant
115	| Mennonite Brethren
116	| Church of the First Born
117	| Missionary Church
118	| The Way Ministry
119	| United Church of Canada
120	| Evangelical United Brethren
121	| The Church of God of Prophecy
122	| Chapel of Faith
123	| Polish National Church
124	| Faith Gospel Tabernacle
125	| Christian Calvary Chapel
127	| Church of Daniel's Band
128	| Christian Tabernacle
129	| Living Word
130	| True Light Church of Christ
132	| Brother of Christ
133	| Primitive Baptist
134	| Independent Fundamental Church of America
135	| Chinese Gospel Church
136	| New Age Spirituality
137	| New Song
138	| Apostolic Church
141	| New Birth Christian
143	| Assyrian Evangelist Church
144	| Spirit of Christ
145	| Church of Jesus Christ of the Restoration
146	| Laotian Christian
150	| Zwinglian
151	| World Overcomer Outreach Ministry
152	| Course in Miracles
153	| Unity of the Brethren
154	| Spirit Filled
155	| Christian Union
157	| Community of Christ
158	| New Hope Christian Fellowship
159	| Community Christian Fellowship
166	| United Christian
167	| Sanctuary
168	| Rain on Us Deliverance Ministries
169	| The Word Church
170	| Cornerstone Church
171	| Life Sanctuary
172	| Word of Faith Church
173	| Harvest Church
174	| Shephard's Chapel
175	| Greater New Testament Church
176	| Vineyard Church
177	| Real Life Ministries
178	| Cathedral of Joy
179	| Great Faith Ministries
180	| Shield of Faith Ministries
181	| Born Again
182	| Alliance
185	| Journeys
186	| National Progressive Baptist
187	| New Apostolic
188	| Metropolitan Community
191	| Faith Covenant
196	| Empowerment Temple
197	| Grace Independent Baptist Church
198	| New Life
201	| Pathways Christian Church
205	| Assembly of Christ
206	| The Amana Church
207	| The Legacy Church
208	| Calvary
210	| Ethiopian Evangelical Church
211	| Disciple of Jesus
212	| Scandinavian Church
213	| Hebrew Roots
214	| Hebrew Israelites
215	| Armenian Apostolic Church
998	| Don't know
999	| No answer
0	| Not applicable

In [59]:
gss['other'] = gss['other'].replace([0, 999], np.nan)

The `fund` variable indicates how fundamentalist a respondent is.

Code | Label
--- | ---
1	| Fundamentalist
2	| Moderate
3	| Liberal
9	| Na-excluded

In [60]:
gss['fund'].value_counts()

2    26352
1    19063
3    16856
9     2543
Name: fund, dtype: int64

In [61]:
gss['fund'] = gss['fund'].replace([9], np.nan)

The `attend` variable indicates how often a respondent attends religious services.

Code | Label
--- | ---
0 | Never
1 | Less than once a year
2 | Once a year
3 | Several times a year
4 | Once a month
5 | 2-3 times a month
6 | Nearly every week
7 | Every week
8 | More than once a week
9 | Don't know / No answer

In [62]:
gss['attend'].value_counts()

7    12686
0    11528
2     8498
3     8003
5     5713
8     4884
1     4844
4     4552
6     3511
9      595
Name: attend, dtype: int64

In [63]:
gss['attend'] = gss['attend'].replace([9], np.nan)

The `reliten` variable indicates the strength of religious affiliation.

Code | Label
--- | ---
1   | Strong
2   | Not very strong
3   | Somewhat strong
4   | No religion
8   | Don't know
9   | No answer
0   | Not applicable

In [64]:
gss['reliten'].value_counts()

2    23738
1    22652
4     7629
3     5736
0     3134
9     1589
8      336
Name: reliten, dtype: int64

In [65]:
gss['reliten'] = gss['reliten'].replace([9, 0], np.nan)
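Note that the `reliten` codes are not in order of strength: 2 (Not very strong) sits between 1 (Strong) and 3 (Somewhat strong). If the variable is used as an ordinal measure, it may help to remap it first. The mapping below (`RELITEN_STRENGTH`, a hypothetical name) is one possible monotonic 0-3 scale, assuming No religion (4) counts as the weakest level and Don't know (8) as missing.

```python
import pandas as pd

# Hypothetical monotonic scale: higher = stronger affiliation.
RELITEN_STRENGTH = {1: 3, 3: 2, 2: 1, 4: 0}

reliten = pd.Series([1.0, 2.0, 3.0, 4.0, 8.0, None])
strength = reliten.map(RELITEN_STRENGTH)  # 8 (Don't know) and NaN map to NaN
print(strength.tolist())
```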

The `realrinc` variable records the respondent's income in constant 1986 dollars. No cleaning is needed for this variable.

The `natfare` variable records whether the respondent thinks the government spends too little, about the right amount, or too much on welfare.

Code | Label
--- | ---
1   | Too little
2   | About right
3   | Too much
8   | Don't know
9   | No answer
0   | Not applicable

In [66]:
gss['natfare'].value_counts()

0    27662
3    16980
2    11144
1     7376
8     1525
9      127
Name: natfare, dtype: int64

In [67]:
gss['natfare'] = gss['natfare'].replace([0, 9], np.nan)

The `homosex` variable indicates whether the respondent thinks sexual relations between two adults of the same sex are wrong, and how wrong.

Code | Label
--- | ---
1   | Always wrong
2   | Almost always wrong
3   | Sometimes wrong
4   | Not wrong at all
5   | Other
8   | Don't know
9   | No answer
0   | Not applicable

In [68]:
gss['homosex'].value_counts()

0    25042
1    23469
4     9880
3     2554
2     1756
8     1743
9      288
5       82
Name: homosex, dtype: int64

In [69]:
gss['homosex'] = gss['homosex'].replace([0, 9], np.nan)

The `hapmar` variable indicates how happy a respondent is in their marriage.

Code | Label
--- | ---
1   | Very happy
2   | Pretty happy
3   | Not too happy
8   | Don't know
9   | No answer
0   | Not applicable

In [70]:
gss['hapmar'].value_counts()

0    34043
1    19335
2    10303
3      920
9      162
8       51
Name: hapmar, dtype: int64

In [71]:
gss['hapmar'] = gss['hapmar'].replace([0, 9], np.nan)

The `raclive` variable indicates whether any people of the opposite race (white or black) live in the respondent's neighborhood.

Code | Label
--- | ---
1   | Yes
2   | No
8   | Don't know
9   | No answer
0   | Not applicable

In [72]:
gss['raclive'].value_counts()

1    34000
2    22567
0     5470
8     2509
9      268
Name: raclive, dtype: int64

In [73]:
gss['raclive'] = gss['raclive'].replace([0, 9], np.nan)

The `fund16` variable indicates how fundamentalist the religion the respondent was raised in was (at age 16).

Code | Label
--- | ---
1   | Fundamentalist
2   | Moderate
3   | Liberal
9   | Na-excluded
0   | Not applicable

In [74]:
gss['fund16'].value_counts()

2    27132
1    19472
3    13158
0     3131
9     1921
Name: fund16, dtype: int64

In [75]:
gss['fund16'] = gss['fund16'].replace([0, 9], np.nan)

The `oth16` variable records the specific denomination the respondent was raised in when it is not one of the `denom16` categories.

Code | Label
--- | ---
2	| Evangelical Congregational
3	| Ind Bible, Bible, Bible Fellowship
5	| Church of Prophecy
6	| New Testament Christian
7	| Church of God, Saint & Christ
8	| Moravian
9	| Christian & Missionary Alliance
10	| Advent Christian
11	| Spiritualist
12	| Assembly of God
13	| Free Methodist
14	| Apostolic Faith
15	| African Methodist
16	| Free Will Baptist
17	| Eden Evangelist
18	| Holiness (Nazarene)
19	| Baptist (Northern)
20	| Brethren Church, Brethren
21	| Witness Holiness
22	| Brethren, Plymouth
23	| United Brethren, United Brethren in Christ
24	| Independent
25	| Christian Disciples
26	| Christ in Christian Union
27	| Open Bible
28	| Christian Catholic
29	| Christ Church Unity
30	| Christ Adelphians
31	| Christian; Central Christian
32	| Christian Reform
33	| Christian Scientist
34	| Church of Christ, Evangelical
35	| Church of Christ
36	| Churches of God(Except with Christ and Holiness)
37	| Church of God in Christ
38	| Church of God in Christ Holiness
39	| Church of the Living God
40	| Congregationalist, 1st Congreg
41	| Community Church
42	| Covenant
43	| Dutch Reform
44	| Disciples of Christ
45	| Evangelical, Evangelist
46	| Evangelical Reformed
47	| Evangelist Free Church
48	| First Church
49	| First Christian Disciples of Christ
50	| First Reformed
51	| First Christian
52	| Full Gospel
53	| Four Square Gospel
54	| Friends
55	| Holy Roller
56	| Holiness; Church of Holiness
57	| Pilgrim Holiness
58	| Jehovah's Witnesses
61	| LDS--Reorganized
63	| Mennonite
64	| Mormon
65	| Nazarene
66	| Pentecostal Assembly of God
67	| Pentecostal Church of God
68	| Pentecostal
69	| Pentecostal Holiness, Holiness Pentecostal
70	| Quaker
71	| Reformed
72	| Reformed United Church of Christ
73	| Reformed Church of Christ
74	| Religious Science
75	| Mind Science
76	| Salvation Army
77	| 7th Day Adventist
78	| Sanctified, Sanctification
79	| United Holiness
80	| Unitarian, Universalist
81	| United Church of Christ
82	| United Church, Unity Church
83	| Wesleyan
84	| Wesleyan Methodist--Pilgrim
85	| Zion Union
86	| Zion Union Apostolic
87	| Zion Union Apostolic--Reformed
88	| Disciples of God
89	| Grace Reformed
90	| Holiness Church of God
91	| Evangelical Covenant
92	| Mission Covenant
93	| Missionary Baptist
94	| Swedish Mission
95	| Unity
96	| United Church of Christianity
97	| Other Fundamentalist
98	| Federated Church
99	| American Reform
100	| Grace Brethren
102	| Charismatic
103	| Pentecostal Apostolic
104	| House of Prayer
105	| Latvian Lutheran
107	| Apostolic Christian
108	| Christ Cathedral of Truth
109	| Bible Missionary
110	| Calvary Bible
111	| Amish
112	| Evangelical Methodist
113	| Worldwide Church of God
114	| Church Universal and Triumphant
115	| Mennonite Brethren
116	| Church of the First Born
117	| Missionary Church
118	| The Way Ministry
119	| United Church of Canada
120	| Evangelical United Brethren
121	| The Church of God of Prophecy
122	| Chapel of Faith
123	| Polish National Church
124	| Faith Gospel Tabernacle
125	| Christian Calvary Chapel
127	| Church of Daniel's Band
128	| Christian Tabernacle
129	| Living Word
130	| True Light Church of Christ
132	| Brother of Christ
133	| Primitive Baptist
134	| Independent Fundamental Church of America
135	| Chinese Gospel Church
136	| New Age Spirituality
137	| New Song
138	| Apostolic Church
141	| New Birth Christian
143	| Assyrian Evangelist Church
144	| Spirit of Christ
145	| Church of Jesus Christ of the Restoration
146	| Laotian Christian
150	| Zwinglian
151	| World Overcomer Outreach Ministry
152	| Course in Miracles
153	| Unity of the Brethren
154	| Spirit Filled
155	| Christian Union
157	| Community of Christ
158	| New Hope Christian Fellowship
159	| Community Christian Fellowship
166	| United Christian
167	| Sanctuary
168	| Rain on Us Deliverance Ministries
169	| The Word Church
170	| Cornerstone Church
171	| Life Sanctuary
172	| Word of Faith Church
173	| Harvest Church
174	| Shephard's Chapel
175	| Greater New Testament Church
176	| Vineyard Church
177	| Real Life Ministries
178	| Cathedral of Joy
179	| Great Faith Ministries
180	| Shield of Faith Ministries
181	| Born Again
182	| Alliance
185	| Journeys
186	| National Progressive Baptist
187	| New Apostolic
188	| Metropolitan Community
191	| Faith Covenant
196	| Empowerment Temple
197	| Grace Independent Baptist Church
198	| New Life
201	| Pathways Christian Church
205	| Assembly of Christ
206	| The Amana Church
207	| The Legacy Church
208	| Calvary
210	| Ethiopian Evangelical Church
211	| Disciple of Jesus
212	| Scandinavian Church
213	| Hebrew Roots
214	| Hebrew Israelites
215	| Armenian Apostolic Church
998	| Don't know
999	| No answer
0	| Not applicable

In [76]:
gss['oth16'] = gss['oth16'].replace([999, 0], np.nan)

The `denom16` variable indicates the denomination the respondent was raised in.

Code | Label
--- | ---
10 | Am baptist asso
11 | Am bapt ch in usa
12 | Nat bapt conv of am
13 | Nat bapt con usa
14 | Southern baptist
15| Other baptists
18 | Baptist-dk which
20 | Afr meth episcopal
22 | United methodist
23 | Other methodist
28 | Methodist-dk which
30 | Am lutheran
31 | Luth ch in america
32 | Lutheran-mo synod
33 | Wi evan luth synod
34 | Other lutheran
35 | Evangelical luth
38 | Lutheran-dk which
40 | Presbyterian c in us
41 | United pres ch in us
42 | Other presbyterian
43 | Presbyterian, merged
48 | Presbyterian-dk which
50 | Episcopal
60 | Other
70 | No denomination
98 | Don't know
99 | No answer
0 | Not applicable

In [77]:
gss['denom16'] = gss['denom16'].replace([99, 0], np.nan)

The `relig16` variable indicates the religion the respondent was raised in.

Code | Label
--- | ---
1 | Protestant
2 | Catholic
3 | Jewish
4 | None
5 | Other
6 | Buddhism
7 | Hinduism
8 | Other eastern
9 | Moslem/Islam
10 | Orthodox Christian
11 | Christian
12 | Native American
13 | Inter-nondenominational
98 | Don't know
99 | No answer
0 | Not applicable

In [78]:
gss['relig16'].value_counts()

1     37042
2     17974
4      3421
0      3131
3      1266
5       648
11      417
99      249
9       161
10      138
6       128
7       114
98       58
13       31
12       24
8        12
Name: relig16, dtype: int64

In [79]:
gss['relig16'] = gss['relig16'].replace([0, 99], np.nan)

In [80]:
# Save the cleaned data; to_hdf requires the optional PyTables package (tables).
dest = 'gss.hdf5'
gss.to_hdf(dest, key='gss')