### Importing OS and CSV modules

In [21]:
# build file paths with os.path.join
import os

# read CSV files
import csv

### Importing CSV file
To import your codebook, first make sure it is saved as a CSV file.

In `data_dir`, replace the path with the directory that contains your CSV file.

In `csv_file`, replace "codebook_file.csv" with the name of your CSV file.

In [22]:
# file directory
data_dir = "/Users/fernanr1/Google Drive/Python Workspace/Codebook Converter - 11-15-17/"

# file name
csv_file = os.path.join(data_dir, "codebook_file.csv")
type(csv_file)


str
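
Before reading the file, it can help to confirm that the joined path actually points at a file. A minimal sketch, written in Python 3 syntax (the notebook itself runs Python 2); the temporary directory and the header text written to the file are assumptions for illustration, not the real codebook:

```python
import os
import tempfile

# Stand-in for data_dir: a throwaway directory (hypothetical, for illustration only).
data_dir = tempfile.mkdtemp()
csv_file = os.path.join(data_dir, "codebook_file.csv")

# Nothing has been written yet, so the path does not point at a file.
exists_before = os.path.isfile(csv_file)

# Write a one-line placeholder file so the check succeeds.
with open(csv_file, "w") as f:
    f.write("variable,item,response options\n")

exists_after = os.path.isfile(csv_file)
print(exists_before, exists_after)
```

If `os.path.isfile(csv_file)` is false for your own path, check `data_dir` and the file name before moving on.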

Checking that we can read the first column of our CSV file

In [23]:
with open(csv_file, 'r') as f:
    reader = csv.reader(f, delimiter = ',') # saving contents to variable 'reader'
    for line in reader:
        print line[0] # reading the first column

variable
coursetype
pre_status
post_status
pre_ma3
pre_pav6
pre_ma5
pre_int
pre_eng
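
For readers on Python 3, the same first-column check can be sketched as follows. The in-memory `sample` text is a hypothetical stand-in for the codebook file; its rows are copied from this notebook's output, and the header names for the second and third columns are assumed:

```python
import csv
import io

# Hypothetical stand-in for the codebook file (doubled quotes are CSV escaping).
sample = io.StringIO(
    'variable,item,response options\n'
    'coursetype,Course type,"1 ""Online"" 0 ""Face-to-face"""\n'
    'pre_int,Are you an international student?,"1 ""Yes""  0 ""No"""\n'
)

reader = csv.reader(sample, delimiter=",")
# First field of every row, header row included.
first_column = [line[0] for line in reader]
print(first_column)
```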


Getting a list of all of the response options

In [24]:
with open(csv_file, 'r') as f:
    reader = csv.reader(f, delimiter = ',')
    header = reader.next()
    for line in reader:
        responseopt = line[2]
        print responseopt
            
    

1 "Online" 0 "Face-to-face"
1 "Participated" 0 "Did not participate"
1 "Participated" 0 "Did not participate"
1 "Not at all true" 2 "Not true" 3 "Somewhat true" 4 "True" 5 "Very true"
1 "Not at all true" 2 "Not true" 3 "Somewhat true" 4 "True" 5 "Very true"
1 "Not at all true" 2 "Not true" 3 "Somewhat true" 4 "True" 5 "Very true"
1 "Yes"  0 "No"
1 "Yes"  0 "No"


In [25]:
# Getting a list of unique response options
# These will be the values for the key:value pairs
responses = []

with open(csv_file, 'r') as f:
    reader = csv.reader(f, delimiter = ',')
    header = reader.next()
    for line in reader:
        if line[2] not in responses:
            responses.append(line[2])
            
print responses

['1 "Online" 0 "Face-to-face"', '1 "Participated" 0 "Did not participate"', '1 "Not at all true" 2 "Not true" 3 "Somewhat true" 4 "True" 5 "Very true"', '1 "Yes"  0 "No"']
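
In Python 3, `reader.next()` is spelled `next(reader)`, and `dict.fromkeys` offers a compact way to drop duplicates while keeping first-appearance order (dictionaries preserve insertion order from Python 3.7 on). A small sketch under those assumptions, again using a hypothetical in-memory sample with rows taken from this notebook's output:

```python
import csv
import io

# Hypothetical stand-in for the codebook file; two rows share the same options.
sample = io.StringIO(
    'variable,item,response options\n'
    'coursetype,Course type,"1 ""Online"" 0 ""Face-to-face"""\n'
    'pre_status,Pre-survey participation status,"1 ""Participated"" 0 ""Did not participate"""\n'
    'post_status,Post-survey participation status,"1 ""Participated"" 0 ""Did not participate"""\n'
)

reader = csv.reader(sample, delimiter=",")
header = next(reader)  # skip the header row

# dict.fromkeys keeps only the first occurrence of each response-option string.
responses = list(dict.fromkeys(line[2] for line in reader))
print(responses)
```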


In [26]:
# Generating 100 labelids which will serve as the keys for the key:value pair
labelid = []
for labelnum in range(1, 101):
    labelid.append("labelname" + str(labelnum))
print(labelid[:10])

['labelname1', 'labelname2', 'labelname3', 'labelname4', 'labelname5', 'labelname6', 'labelname7', 'labelname8', 'labelname9', 'labelname10']


Creating label dictionary by combining the label id (key) and responses (values)

In [27]:
labeldict = dict(zip(labelid, responses))
print labeldict


{'labelname4': '1 "Yes"  0 "No"', 'labelname3': '1 "Not at all true" 2 "Not true" 3 "Somewhat true" 4 "True" 5 "Very true"', 'labelname2': '1 "Participated" 0 "Did not participate"', 'labelname1': '1 "Online" 0 "Face-to-face"'}
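
Because `zip` stops at the shorter of its two inputs, only as many label ids are used as there are unique responses. An alternative sketch (Python 3 syntax) numbers the responses directly with `enumerate`, so no fixed pool of 100 ids is needed; the reverse lookup `label_for` is a hypothetical helper added here for illustration:

```python
# Two response-option strings taken from the output above.
responses = [
    '1 "Online" 0 "Face-to-face"',
    '1 "Participated" 0 "Did not participate"',
]

# Number each unique response starting at 1: labelname1, labelname2, ...
labeldict = {
    "labelname%d" % i: resp for i, resp in enumerate(responses, start=1)
}

# Reverse mapping (hypothetical helper): response string -> label name.
label_for = {resp: name for name, resp in labeldict.items()}
print(labeldict)
```

With `label_for`, finding the label name for a row becomes a single dictionary lookup instead of a loop over `labeldict.items()`.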


## Converting codebook to STATA syntax

We see from the above output that our top row has 3 columns.

1) Variable, 2) Item, and  3) Response Options

<i><b>Variable</b></i> contains the variable name of our dataset [index 0]

<i><b>Item</b></i> contains the question item [index 1]

<i><b>Response Options</b></i> contains the response options for each question item [index 2]
<br><br><br>
We will use these three pieces of information to create STATA syntax that does the following:

1) Define the response options using the `response options` column.

2) Label the variable names and values using the `variable` and `item` columns.
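
The two steps above can be sketched as one small function. This is an illustration in Python 3 syntax rather than the notebook's own code: `stata_syntax` and its tuple-based `rows` input are hypothetical names, and the sample rows are copied from the codebook output. It relies on Stata's `label values varname lblname` form, which attaches a defined label to a named variable:

```python
def stata_syntax(rows):
    """Build Stata labeling commands from (variable, item, options) tuples."""
    # Assign one label name per unique response-option string, in order seen.
    label_for = {}
    for _, _, options in rows:
        if options not in label_for:
            label_for[options] = "labelname%d" % (len(label_for) + 1)

    lines = []
    # Step 1: define each unique set of response options.
    for options, name in label_for.items():
        lines.append("label define %s %s" % (name, options))
    # Step 2: label each variable and attach its value label.
    for variable, item, options in rows:
        lines.append('label variable %s "%s"' % (variable, item))
        lines.append("label values %s %s" % (variable, label_for[options]))
    return lines


rows = [
    ("coursetype", "Course type", '1 "Online" 0 "Face-to-face"'),
    ("pre_int", "Are you an international student?", '1 "Yes"  0 "No"'),
    ("pre_eng", "Is English your native language", '1 "Yes"  0 "No"'),
]
for line in stata_syntax(rows):
    print(line)
```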


In [34]:
# 1) Defining the response options
print "*** DEFINING LABELS FOR EACH UNIQUE RESPONSE OPTION ***"
print ""
for x in labeldict:
    print "label define", x, labeldict[x]
    print ""
print ""
print ""

*** DEFINING LABELS FOR EACH UNIQUE RESPONSE OPTION ***

label define labelname4 1 "Yes"  0 "No"

label define labelname3 1 "Not at all true" 2 "Not true" 3 "Somewhat true" 4 "True" 5 "Very true"

label define labelname2 1 "Participated" 0 "Did not participate"

label define labelname1 1 "Online" 0 "Face-to-face"





In [32]:
# 2) Labeling the variables (example)

# Look up the key whose value matches a given response-option string.
# `keylist` is the key and `valuelist` is the value of each key-value pair.
# The .items() method lets the for-loop traverse the dictionary pair by pair.
# If `valuelist` equals the text in line[2], print the key `keylist`.
# Note: `line` still holds the last row read by an earlier loop.

for keylist, valuelist in labeldict.items():
    if valuelist == str(line[2]):
        print "label values", keylist


label values labelname4



In [30]:
# now stringing it together so the right label names print out
print "*** DATA LABELING ***"
print ""
with open(csv_file, 'r') as f:
    reader = csv.reader(f, delimiter = ',')
    header = reader.next()
    for line in reader:
        print "*", line[0], "data label"
        print "label variable", line[0], '"%s"' % line[1]
        for keylist, valuelist in labeldict.items():
            if valuelist == str(line[2]):
                # STATA's syntax is `label values varname lblname`
                print "label values", line[0], keylist
        print ""

*** DATA LABELING ***

* coursetype data label
label variable coursetype "Course type"
label values coursetype labelname1

* pre_status data label
label variable pre_status "Pre-survey participation status"
label values pre_status labelname2

* post_status data label
label variable post_status "Post-survey participation status"
label values post_status labelname2

* pre_ma3 data label
label variable pre_ma3 "I like class work best when it really makes me think"
label values pre_ma3 labelname3

* pre_pav6 data label
label variable pre_pav6 "One reason I would not participate in class is to avoid looking stupid"
label values pre_pav6 labelname3

* pre_ma5 data label
label variable pre_ma5 "An important reason I do my class work is because I enjoy it"
label values pre_ma5 labelname3

* pre_int data label
label variable pre_int "Are you an international student?"
label values pre_int labelname4

* pre_eng data label
label variable pre_eng "Is English your native language"
label values pre_eng labelname4

