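### Sample text analysis using spaCy

spaCy is a Python library for natural language processing. It splits text into tokens and can tag, parse, and label them, which makes it useful for linguistic analyses.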

To install spaCy and its small English-language model, run these commands in your virtual environment:
- `pip3 install spacy`
- `python3 -m spacy download en_core_web_sm`

We will read the `text.txt` file in our `data` folder. It contains a sample article about a very special [cat](https://www.buzzfeednews.com/article/juliareinstein/this-thicc-lazy-high-maintenance-incredibly-well-hydrated/).

In [None]:
import spacy
import pandas as pd

In [None]:
# Load the small English pipeline: tokenizer, tagger, parser and NER
# (the "sm" model does not ship with full word vectors)
nlp = spacy.load('en_core_web_sm')

# Open the text file read-only and read its contents into a string
# (assumes the file is UTF-8 encoded)
with open("../data/text.txt", "r", encoding="utf-8") as f:
    text = f.read()
len(text)  # number of characters in the string, including spaces

Now let's turn the string into a spaCy document by passing it through the pipeline.

In [None]:
doc = nlp(text)
len(doc)  # number of tokens in the document

The document behaves like a sequence of tokens: words, punctuation marks, and so on. To get the text of each token we can use its `.text` attribute.

In [None]:
for token in doc:
    print(token.text)
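
Punctuation marks and whitespace are tokens too, so a plain loop over the document returns more than words. Here is a minimal sketch of keeping only word-like tokens with the `is_punct` and `is_space` token attributes. It assumes a blank English pipeline (`spacy.blank("en")`, tokenizer only) and a made-up sample sentence instead of the article; the names `nlp_blank`, `sample_doc` and `words` are illustrative.

```python
import spacy

# A blank pipeline contains only the tokenizer, so no model download is needed.
nlp_blank = spacy.blank("en")

# Hypothetical sample sentence, not taken from text.txt
sample_doc = nlp_blank("The cat sat. The cat slept!")

# Keep tokens that are neither punctuation nor whitespace
words = [
    token.text
    for token in sample_doc
    if not token.is_punct and not token.is_space
]
print(words)
```

The same filter could be applied inside the loop that builds the word list, if punctuation should not be counted as words.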

Now we can count some words by:
- turning the words into a list
- turning that list into a pandas data frame
- counting the values

In [None]:
rows = []
for token in doc:
    rows.append(token.text)

In [None]:
print(rows)

In [None]:
word_dataframe = pd.DataFrame(rows)
word_dataframe.columns = ['word']
word_dataframe.head()

In [None]:
word_count = word_dataframe['word'].value_counts().reset_index()
word_count.head()
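
The column names produced by `value_counts().reset_index()` depend on the pandas version: pandas 2.x yields `word` and `count`, while older releases yield `index` and `word`. One possible way to get the same names everywhere is to set them explicitly. The sketch below uses a made-up token list (`sample_rows`) in place of the real one.

```python
import pandas as pd

# Hypothetical token list standing in for the tokens of the article
sample_rows = ["cat", "the", "cat", "water", "cat", "the"]
sample_frame = pd.DataFrame({"word": sample_rows})

# Name the index and the count column explicitly so the result
# has the same column names regardless of the pandas version.
sample_count = (
    sample_frame["word"]
    .value_counts()
    .rename_axis("word")
    .reset_index(name="count")
)
print(sample_count)
```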

In [None]:
word_count_alt = word_dataframe.groupby('word').agg({"word":"count"})
word_count_alt.head()
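
Unlike `value_counts()`, a `groupby` result is indexed by the word and ordered alphabetically rather than by frequency. The sketch below shows this on a made-up token list and cross-checks the counts against `collections.Counter`. It uses `groupby(...).size()` instead of the `agg` call above; the two give the same counts here because the sample has no missing values.

```python
from collections import Counter

import pandas as pd

# Hypothetical token list standing in for the tokens of the article
sample_rows = ["cat", "the", "cat", "water", "cat", "the"]
sample_frame = pd.DataFrame({"word": sample_rows})

# size() counts the rows in each group; the result is ordered by word
grouped = sample_frame.groupby("word").size()

# Sort by count, most frequent first, to mirror value_counts()
grouped_sorted = grouped.sort_values(ascending=False)

# Cross-check against a plain-Python count of the same list
assert grouped.to_dict() == dict(Counter(sample_rows))
print(grouped_sorted)
```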

In [None]:
word_count.to_csv('../output/word_count.csv', index=False)
word_count_alt.to_csv('../output/word_count2.csv')