# Basic CSV: Extracting Column to File
This notebook extracts a column of text comments from a CSV file. It assumes you have **Zapped Gremlins**.

## Finding the file

First we see what files we have.

In [2]:
%ls *.csv

AdamSavageComments.csv   MockInterviewCorpus.csv
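The `%ls` magic only works inside IPython or Jupyter. As a minimal sketch of a portable alternative, assuming the CSV files sit in the current working directory, the standard library's `pathlib` can produce the same listing (the variable name `csv_files` is just for illustration):

```python
from pathlib import Path

# Collect the names of all CSV files in the current working directory.
csv_files = sorted(p.name for p in Path(".").glob("*.csv"))
print(csv_files)
```

This returns an ordinary Python list, so the file names can be looped over later in the notebook.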


## Importing the appropriate column

Now we import column 4 (the 5th column, because Python counts from 0) of the data.

In [3]:
import csv
comments = []
with open('AdamSavageComments.csv', 'r', newline='') as file: # The with block closes the file automatically after reading
    data = csv.reader(file)
    for row in data:
        comments.append(row[4]) # This puts all the data from column index 4 (the 5th column) into a list

len(comments)

585

In [4]:
comments[:4]

['commentText',
 'Adam avoids question.',
 "Adam Savage's stock went way down in my book. Sometimes it's best not to get to know celebrities or what they think... 3:36 Ahh, he's bordering and battling with white guilt. We just saw it's mental evolution. Compare the contradiction 1:45 and 4:04.",
 'not sure what side Adam is talking about... sounds more like the SJW and not GG...']
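Notice that the first item is the header, `'commentText'`. As a hedged alternative to counting column positions, `csv.DictReader` treats the first row as field names, so the column can be selected by name and the header never ends up in the list. The sketch below runs on a small in-memory sample; only the `commentText` header comes from the output above, and the `author` column and its values are invented for illustration.

```python
import csv
import io

# Hypothetical stand-in for the real file. Only the 'commentText' header
# is taken from the notebook output; the 'author' column is invented.
sample = io.StringIO(
    'author,commentText\n'
    'user1,Adam avoids question.\n'
    'user2,"A comment, with a comma."\n'
)

reader = csv.DictReader(sample)  # the first row supplies the field names
comments_by_name = [row["commentText"] for row in reader]
print(comments_by_name)
```

To try this on the real data, replace `sample` with a file opened by `open('AdamSavageComments.csv', newline='')`.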

### Importing with conditions

We can also make decisions based on other columns. In this case we check whether column 5 (the 6th column) is a decimal number (in the header row it is not) **and** whether it is greater than 3. This pulls the comments liked by more than three people.

In [8]:
import csv
comments = []
with open('AdamSavageComments.csv', 'r', newline='') as file: # The with block closes the file automatically after reading
    data = csv.reader(file)
    for row in data:
        if row[5].isdecimal() and int(row[5]) > 3:
            comments.append(row[4]) # This keeps only the comments that pass the test

len(comments)

174

Now we can check what comments we got.

In [6]:
print(comments[0:2]) # Here we check the list

['i waited until the end to down vote this video.', '0:35 - "I don\'t understand the anger..." You don\'t even understand the premise.']
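The `isdecimal()` test filters out the header row as a side effect. A minimal sketch of a more explicit variant skips the header with `next()` and also guards against rows that have fewer than six columns. The sample data, the `likes` header, and the list name `liked` are assumptions made for illustration, not taken from the real file.

```python
import csv
import io

# Hypothetical sample: column index 4 holds the comment, index 5 a like count.
sample = io.StringIO(
    "a,b,c,d,commentText,likes\n"
    "1,2,3,4,First comment,10\n"
    "1,2,3,4,Second comment,2\n"
    "1,2,3,4,Third comment,\n"
)

liked = []
reader = csv.reader(sample)
next(reader, None)  # skip the header row explicitly
for row in reader:
    # Guard against short rows and against empty or non-numeric like counts.
    if len(row) > 5 and row[5].isdecimal() and int(row[5]) > 3:
        liked.append(row[4])

print(liked)
```

Only the first sample row has more than three likes, so only its comment is kept.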


## Selecting Comments

We can further process the comments, selecting only those we want.

In [7]:
selectedComms = []

for comment in comments:
    if "book" in comment or "stock" in comment: # Here we check if the words we want are in the comment.
        selectedComms.append(comment)
        
len(selectedComms)

1
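The test above is case-sensitive, so a comment containing "Book" would be missed, and it matches substrings, so "book" also matches "Facebook". The sketch below shows one way to make the check case-insensitive and to keep the keywords in a list. The sample comments and the names `sample_comments`, `keywords`, and `selected` are hypothetical.

```python
# Hypothetical sample comments standing in for the `comments` list.
sample_comments = [
    "His stock went way down in my book.",
    "Adam avoids question.",
    "Book recommendations, anyone?",
]
keywords = ["book", "stock"]

# Keep a comment if any keyword appears in its lowercased text.
selected = [
    comment
    for comment in sample_comments
    if any(word in comment.lower() for word in keywords)
]
print(len(selected))
```

This still matches substrings; matching whole words only would need the text to be split into tokens first.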

## Convert list to string

If we want to use our text tools, we need to join the items in the list into a single string.

In [9]:
theWholeText = ""

for comment in comments[1:]: # [1:] skips the first item (the header row if comments is the unfiltered list)
    theWholeText = theWholeText + "\n\n" + comment
    
print(theWholeText[0:100]) # We check by printing first 100 characters of the string



0:35 - "I don't understand the anger..." You don't even understand the premise.

He doesnt know th


Here we concatenate the comments into a single text, which we can then save to a file or search.
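Building the string in a loop leaves two blank lines at the very start of the text, as the output shows. As a minimal sketch, `str.join` puts the separator only between items, so there is nothing extra at the start. The sample list and the name `whole_text` are illustrative stand-ins.

```python
# Hypothetical sample standing in for the `comments` list.
sample_comments = ["First comment.", "Second comment.", "Third comment."]

# join places "\n\n" between items only, so the text has no leading blank lines.
whole_text = "\n\n".join(sample_comments)
print(whole_text)
```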

## Saving file
Now we save the text to a file.

In [20]:
with open("FullText.txt", "w") as myfile: # The "w" mode overwrites the file if it already exists
    myfile.write(theWholeText)
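Comments often contain accented letters or emoji, and the default text encoding differs between systems. A hedged sketch of a safer pattern passes `encoding="utf-8"` explicitly and reads the file back to confirm the round trip. It writes to a temporary directory so that it does not touch `FullText.txt`; the file name `FullTextCopy.txt` and the sample text are invented for illustration.

```python
import os
import tempfile

text = "First comment.\n\nSecond comment with an accent: café."

# Write to a temporary directory so the sketch does not overwrite FullText.txt.
path = os.path.join(tempfile.mkdtemp(), "FullTextCopy.txt")
with open(path, "w", encoding="utf-8") as out_file:
    out_file.write(text)

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, "r", encoding="utf-8") as in_file:
    read_back = in_file.read()

print(read_back == text)
```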

In [34]:
%ls *.txt

FullText.txt                performanceConcordance.txt
Hume Enquiry.txt            theWritingStory.txt
StoryOfWriting.txt          truthConcordance.txt
bigdata.txt


In [14]:
# theWholeText[0:300]

"\n\nRT @rickkytheG: https://t.co/aq0wC7bOuE     Responding to feminists who oppose prostitution #GamerGate #opskynet #notyourshield\n\nhttps://t.co/aq0wC7bOuE     Responding to feminists who oppose prostitution #GamerGate #opskynet #notyourshield\n\nRT @cringe_channel: So, what isn't #gamergate's fault to"

---
# Exercise

Download the MockInterviewData.csv file. It has interview data from 4 interviews, with 2 questions per interview. Write a notebook that can:

* Extract the answers by people with graduate degrees (MA or PhD)
* Calculate the top high-frequency words.

**Optional**
Can you also get the high-frequency words from those with just BAs and compare them to the words from those with advanced degrees?