# DSFB Assignment 4

In this assignment, you will begin to work with text data and natural language processing. You will analyze aspects of th DonorsChoose.org program. Aspects of this project were first posed as a Kaggle challenge and the data comes from [Kaggle DonorsChoose.org Application Screening challenge](https://www.kaggle.com/c/donorschoose-application-screening/data). We have changed the nature of what you need to do in this assignment (so it does not track what was done in the Kaggle Challenge), but nevertheless using or referring to the Kaggle Challenge repository is not allowed for the assignment.

###  DonorsChoose.org  
  
Founded in 2000 by a high school teacher in the Bronx, DonorsChoose.org empowers public school teachers from across the country to request much-needed materials and experiences for their students. At any given time, there are thousands of classroom requests that can be brought to life with a gift of any amount. DonorsChoose.org receives hundreds of thousands of project proposals each year for classroom projects in need of funding. Right now, a large number of volunteers is needed to manually screen each submission before it's approved to be posted on the DonorsChoose.org website. In this assignment, you will analyze the text of the essays and requirements from each proposal.

<img src="https://cached.imagescaler.hbpl.co.uk/resize/scaleWidth/580/cached.offlinehbpl.hbpl.co.uk/news/NST/C8B9CC1D-03B0-9B80-4CFE78B5B539240F.jpg" width="500" height="500" align="center"/>

Image source: https://cached.imagescaler.hbpl.co.uk/resize/scaleWidth/580/cached.offlinehbpl.hbpl.co.uk/news/NST/C8B9CC1D-03B0-9B80-4CFE78B5B539240F.jpg

### Data

As you will see, this dataset includes many different kinds of features with structured and unstructured data. The dataset consists of application materials (see *application_data.csv*) and resources requested (see *resource_data.csv*). The application materials (see *application_data.csv*) contain the following features.

| Feature name  | Description  |
|----------------|--------------|
| id  | Unique id of the project application    |
| teacher_id    | id of the teacher submitting the application  |
| teacher_prefix    | title of the teacher's name (Ms., Mr., etc.)    |
| school_state    | US state of the teacher's school    |
| project_submitted_datetime    | application submission timestamp    |
| project_grade_category    | school grade levels (PreK-2, 3-5, 6-8, and 9-12)   |
| project_subject_categories   | category of the project (e.g., "Music & The Arts")    |
| project_subject_subcategories    | sub-category of the project (e.g., "Visual Arts")    |
| project_title    | title of the project    |
| project_essay_1    | first essay*   |
| project_essay_2    | second essay*    |
| project_essay_3    | third essay*   |
| project_essay_4    | fourth essay*  |
| project_resource_summary    | summary of the resources needed for the project    |
| teacher_number_of_previously_posted_projects   | number of previously posted applications by the submitting teacher    |
| project_is_approved    | whether DonorsChoose proposal was accepted (0="rejected", 1="accepted"); train.csv only    |


\*Note: Prior to May 17, 2016, the prompts for the essays were as follows:

  * project_essay_1: "Introduce us to your classroom"  

  * project_essay_2: "Tell us more about your students"  

  * project_essay_3: "Describe how your students will use the materials you're requesting"  

  * project_essay_4: "Close by sharing why your project will make a difference"  

Starting on May 17, 2016, the number of essays was reduced from 4 to 2, and the prompts for the first 2 essays were changed to the following:

  * project_essay_1: "Describe your students: What makes your students special? Specific details about their background, your neighborhood, and your school are all helpful."  

  * project_essay_2: "About your project: How will these materials make a difference in your students' learning and improve their school lives?"  

For all projects with project_submitted_datetime of 2016-05-17 and later, the values of project_essay_3 and project_essay_4 will be missing (i.e. NaN).


### Special NLP Libraries

We will use several new libraries for this assignment - so be sure to first install those on your machine by with `pip` in a terminal:

    pip install --user -U nltk
    pip install -U gensim
    pip install -U spacy
    pip install -U pyldavis

## IMPORTS

In [1]:
# Standard imports
import numpy  as np
import pandas as pd

import itertools
import random
import math  
import copy

from pprint import pprint  # nicer printing

# Gensim
import gensim
import gensim.corpora as corpora
from gensim.utils import simple_preprocess
from gensim.models import CoherenceModel

# Other NLP
import re
import spacy
import nltk
from nltk.corpus import stopwords

# General Plotting
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import matplotlib.patches as patches
%matplotlib inline  
import seaborn as sns
sns.set(style="white")

# Special Plotting
import pyLDAvis
import pyLDAvis.gensim  # don't skip this

# ignore some warnings 
import warnings
warnings.filterwarnings('ignore')

# Set the maximum number of rows displayed by pandas
pd.options.display.max_rows = 1000

# Set some CONSTANTS that will be used later
SEED    = 41  # base to generate a random number
SCORE   = 'roc_auc'
FIGSIZE = (16, 10)

# PART 1: Prep

**PROBLEM**: To use a particular model in the `spacy` package, you need to manually download and install that particular model. You will need to run the following code from a terminal: `python -m spacy download en_core_web_sm`. Rather than doing that manually from bash in a separate terminal program, do it inline below using a "magic" command in jupyter. HINT: Use *!* followed by a bash command in a cell to run a bash command.

In [2]:
# Download en_core_web_sm for spacy
!python -m spacy download en_core_web_sm


You should consider upgrading via the 'pip install --upgrade pip' command.[0m
[38;5;2m✔ Download and installation successful[0m
You can now load the model via spacy.load('en_core_web_sm')


**PROBLEM**: To confirm that `spacy` is working (and `en_core_web_sm` is installed on your computer), you should be able to use `spacy.load()` to build a `Language` object to perform some basic nlp. Do that below:

In [2]:
# Test use of spacy by using the spacy.load() function
spacy.load('en_core_web_sm')


<spacy.lang.en.English at 0x108d7cc88>

**PROBLEM**: Use nltk.download() to download a list of raw stopwords. (see NLTK documentation)

In [3]:
# Download NLTK stopwords
nltk.download('stopwords')


[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/philippspiess/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

**PROBLEM**: Use the `stopwords` object from `nltk` to build a list of English stopwords. 

In [4]:
# Get English Stopwords from NLTK
stop_words = stopwords.words('english')


**PROBLEM**: Extend your `stop_words` list with some additional stopwords that you believe should be ignored in this particular context.

In [5]:
# Extend the stop word list 
setAdditions = {"students", "class",'classroom','edu'} 
#These could also be included: 'school', 'learning', 'book','study', 'learn','read','many'
stop_words = set(list(stop_words) + list(setAdditions))


### Download the Data

Unlike other projects, this project includes a training set too big for GitHub. Through the terminal lab of Jupyter lab, download the data using the *wget* command, unzip it using the *zip* command and check that it's in the root directory of the project. 

Locations : 

    Applications dataset: https://storage.googleapis.com/dsfm/application/application_data.csv.zip
    Resources dataset: https://storage.googleapis.com/dsfm/application/resource_data.csv.zip
    
Hint: Use *wget* and *unzip* commands. Use *!* followed by a bash command in a cell to run a bash command.

**PROBLEM**: wget the data

In [7]:
# wget the data
!wget https://storage.googleapis.com/dsfm/application/application_data.csv.zip
!wget https://storage.googleapis.com/dsfm/application/resource_data.csv.zip 


--2019-11-17 18:54:31--  https://storage.googleapis.com/dsfm/application/application_data.csv.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 2a00:1450:400a:801::2010, 172.217.168.16
Connecting to storage.googleapis.com (storage.googleapis.com)|2a00:1450:400a:801::2010|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 113169833 (108M) [application/zip]
Saving to: ‘application_data.csv.zip.2’


2019-11-17 18:54:54 (4.68 MB/s) - ‘application_data.csv.zip.2’ saved [113169833/113169833]

--2019-11-17 18:54:55--  https://storage.googleapis.com/dsfm/application/resource_data.csv.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 2a00:1450:400a:801::2010, 172.217.168.16
Connecting to storage.googleapis.com (storage.googleapis.com)|2a00:1450:400a:801::2010|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 42396552 (40M) [application/zip]
Saving to: ‘resource_data.csv.zip.1’


2019-11-17 18:55:03 (4.79 MB/s) - ‘resou

**PROBLEM**: unzip the data

In [None]:
# unzip the data
!unzip application_data.csv.zip
!unzip resource_data.csv.zip 


# PART 2: Load Data

**PROBLEM**: Load `application_data.csv` and investigate it a bit.

In [18]:
# Load applications
application_data = pd.read_csv('application_data.csv')
print(application_data.shape)
print(application_data.sample(n=1))

(182080, 16)
             id                        teacher_id teacher_prefix school_state  \
177593  p004759  18a806c803fc047c974f2c908e15e7f5           Mrs.           SC   

       project_submitted_datetime project_grade_category  \
177593        2016-07-18 09:52:44          Grades PreK-2   

       project_subject_categories project_subject_subcategories  \
177593        Literacy & Language                      Literacy   

                 project_title  \
177593  Time To Get Organized!   

                                          project_essay_1  \
177593  In North Charleston, my school is labeled as \...   

                                          project_essay_2 project_essay_3  \
177593  My students are separated into four literacy g...             NaN   

       project_essay_4                           project_resource_summary  \
177593             NaN  My students need help getting organized with a...   

        teacher_number_of_previously_posted_projects  project_is_a

**PROBLEM**: Load `resource_data.csv` and investigate it a bit.

In [26]:
# Load resources
resource_data = pd.read_csv('resource_data.csv')
print(resource_data.shape)
print(resource_data.sample(5))

(1541272, 4)
              id                                        description  quantity  \
994960   p060085  Elmers Washable No-Run School Glue, 4 oz, 1 Bo...         6   
1102090  p185543                       If You Give a Mouse a Cookie         1   
1436320  p172858  Wausau Paper Astrobrights Colored Paper, 8.5" ...         1   
790297   p016253  Staples 5.5 Quart Plastic Locking Lid Containe...         1   
749012   p115521             SKLZ Mini Practice Baseballs - 12 Pack         5   

         price  
994960    7.95  
1102090  12.40  
1436320  10.22  
790297   23.88  
749012    6.99  


**PROBLEM**: Some of the essays are NA. Replace NAs with empty strings.

In [20]:
# Replace NA values in essay columns with ''
application_data=application_data.fillna('')


**PROBLEM**: To simplify matters, combine all essays into just one feature called "essays"

In [21]:
# Combine essays
application_data['essays'] = application_data[['project_essay_1', 'project_essay_2','project_essay_3','project_essay_4']].astype(str).apply(''.join, axis=1)
application_data=application_data.drop(['project_essay_1', 'project_essay_2','project_essay_3','project_essay_4'], axis=1)
print(application_data.columns)
print(application_data.shape)

Index(['id', 'teacher_id', 'teacher_prefix', 'school_state',
       'project_submitted_datetime', 'project_grade_category',
       'project_subject_categories', 'project_subject_subcategories',
       'project_title', 'project_resource_summary',
       'teacher_number_of_previously_posted_projects', 'project_is_approved',
       'essays'],
      dtype='object')
(182080, 13)


**PROBLEM**: Merge the resources and application datasets on the *id* feature.

In [29]:
# Merge two datasets
data = pd.merge(application_data, resource_data, on='id')

# Check the data to confirm it worked
print(data.columns)
print(data.shape)


Index(['id', 'teacher_id', 'teacher_prefix', 'school_state',
       'project_submitted_datetime', 'project_grade_category',
       'project_subject_categories', 'project_subject_subcategories',
       'project_title', 'project_resource_summary',
       'teacher_number_of_previously_posted_projects', 'project_is_approved',
       'essays', 'description', 'quantity', 'price'],
      dtype='object')
(1081830, 16)


**PROBLEM**: Keep the following data for additional analysis (the id and the text features): `id`, `school_state`, `project_subject_categories`, `project_subject_subcategories`, `essays`, `description`

In [11]:
FEATURE_NAMES = ['school_state', 'project_subject_categories', 'project_subject_subcategories', 'essays', 'description']

In [12]:
# Keep the Text Features
FEATURE_NAMES.append('id')
merged = data[FEATURE_NAMES]
FEATURE_NAMES.remove('id')
print(merged.columns)


Index(['school_state', 'project_subject_categories',
       'project_subject_subcategories', 'essays', 'description', 'id'],
      dtype='object')


# PART 3: Preprocess Text

Make an independent copy of the data so we can restart here when testing...

In [13]:
data = copy.copy(merged)  # when "merged" is the pandas dataframe
print(data.columns)

Index(['school_state', 'project_subject_categories',
       'project_subject_subcategories', 'essays', 'description', 'id'],
      dtype='object')


**PROBLEM**: Define a custom function `clean_punctuation()` to remove some punctuation from your text data. You don't have to do absolutely everything one might want to do - just show that you can do it. Start with each some easy operations with `str.replace()`.

In [14]:
# Define a custom function to clean punctuation from given text
def clean_punctuation(txt):
    txt = str(txt)
    txt = txt.replace(".", "")
    txt = txt.replace("!", "")
    txt = txt.replace(",", "")
    txt = txt.replace(";", "")
    txt = txt.lower()
    return txt


**PROBLEM**: Use the `apply()` function from pandas to _apply_ that function down the `essays` column of your data.

In [15]:
# Apply your function to clean the essays column
for feature in FEATURE_NAMES:
    data[feature] = data[feature].apply(lambda txt : clean_punctuation(txt)) 
data.head()

Unnamed: 0,school_state,project_subject_categories,project_subject_subcategories,essays,description,id
0,nv,literacy & language,literacy,most of my kindergarten students come from low...,apple - ipod nano� 16gb mp3 player (8th genera...,p036502
1,nv,literacy & language,literacy,most of my kindergarten students come from low...,apple - ipod nano� 16gb mp3 player (8th genera...,p036502
2,ga,music & the arts health & sports,performing arts team sports,our elementary school is a culturally rich sch...,reebok girls' fashion dance graphic t-shirt - ...,p039565
3,ut,math & science literacy & language,applied sciences literature & writing,hello\r\nmy name is mrs brotherton i teach 5th...,3doodler start full edu bundle,p233823
4,nc,health & sports,health & wellness,my students are the greatest students but are ...,ball pg 4'' poly set of 6 colors,p185307


**PROBLEM**: Define **another** custom function called `clean_re()` to clean your text data using regular expressions. Do at least two "cleanings" (i.e., show that you can use the `re` library).

In [16]:
# Define a custom function to clean some given text
def clean_re(txt):
    txt=str(txt)
    txt = re.sub(r'[^\w\s]','',txt)
    return txt

In [17]:
# Apply clean_re() to all features
for feature in FEATURE_NAMES:
    data[feature] = data[feature].apply(lambda txt : clean_re(txt)) 
data.head()



Unnamed: 0,school_state,project_subject_categories,project_subject_subcategories,essays,description,id
0,nv,literacy language,literacy,most of my kindergarten students come from low...,apple ipod nano 16gb mp3 player 8th generatio...,p036502
1,nv,literacy language,literacy,most of my kindergarten students come from low...,apple ipod nano 16gb mp3 player 8th generatio...,p036502
2,ga,music the arts health sports,performing arts team sports,our elementary school is a culturally rich sch...,reebok girls fashion dance graphic tshirt dd ...,p039565
3,ut,math science literacy language,applied sciences literature writing,hellornmy name is mrs brotherton i teach 5th g...,3doodler start full edu bundle,p233823
4,nc,health sports,health wellness,my students are the greatest students but are ...,ball pg 4 poly set of 6 colors,p185307


**PROBLEM**: Remove stopwords. (Hint: use stopwords from nltk's `stopwords()` plus any additions you'd like to make. Then, again, define a custom function and then apply it to all features.)

In [18]:
# Define custom function to remove stopwords
def remove_stopWords(txt):
    words = txt.split(' ')
    wordsFiltered = []
    for word in words:
        if word not in stop_words:
            wordsFiltered.append(word)
    txt = " ".join(wordsFiltered)
    return txt


In [19]:
# Apply function to remove stopwords  
for feature in FEATURE_NAMES:
    data[feature] = data[feature].apply(lambda txt : remove_stopWords(txt)) 
data.head()

Unnamed: 0,school_state,project_subject_categories,project_subject_subcategories,essays,description,id
0,nv,literacy language,literacy,kindergarten come lowincome households conside...,apple ipod nano 16gb mp3 player 8th generatio...,p036502
1,nv,literacy language,literacy,kindergarten come lowincome households conside...,apple ipod nano 16gb mp3 player 8th generatio...,p036502
2,ga,music arts health sports,performing arts team sports,elementary school culturally rich school diver...,reebok girls fashion dance graphic tshirt dd ...,p039565
3,ut,math science literacy language,applied sciences literature writing,hellornmy name mrs brotherton teach 5th grade ...,3doodler start full bundle,p233823
4,nc,health sports,health wellness,greatest socially economically disadvantaged ...,ball pg 4 poly set 6 colors,p185307


**PROBLEM**: Now use Gensim’s `simple_preprocess()` function to tokenize and clean up your text data. TIP: `simple_preprocess()` returns a list of words, so we want to wrap it with a function that joins the list back together into a string.

In [20]:
# Define custom function to wrap simple_preprocess() from gensim
def wrap(txt):
    words = simple_preprocess(txt)
    txt = " ".join(words)
    return txt


In [21]:
# Apply simple_preprocess() to all features
for feature in FEATURE_NAMES:
    data[feature] = data[feature].apply(lambda txt : wrap(txt)) 
data.head()

Unnamed: 0,school_state,project_subject_categories,project_subject_subcategories,essays,description,id
0,nv,literacy language,literacy,kindergarten come lowincome households conside...,apple ipod nano gb mp player th generation lat...,p036502
1,nv,literacy language,literacy,kindergarten come lowincome households conside...,apple ipod nano gb mp player th generation lat...,p036502
2,ga,music arts health sports,performing arts team sports,elementary school culturally rich school diver...,reebok girls fashion dance graphic tshirt dd d...,p039565
3,ut,math science literacy language,applied sciences literature writing,hellornmy name mrs brotherton teach th grade a...,doodler start full bundle,p233823
4,nc,health sports,health wellness,greatest socially economically disadvantaged i...,ball pg poly set colors,p185307


**PROBLEM**: Lemmatize the text. (Hint: Define a custom function and then apply it to all features.)

In [22]:
# Write a lemmatization function based on nltk.stem.WordNetLemmatizer()
nltk.download('wordnet')
def Lemmatization(txt):
    lemmatizer = nltk.stem.WordNetLemmatizer()
    txt=str(txt)
    words = txt.split(' ')
    wordsLemma = []
    for word in words:
        word = lemmatizer.lemmatize(word)
        wordsLemma.append(word)
    return wordsLemma


[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/philippspiess/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [23]:
# Apply lemmatize_text() to all features  
for feature in FEATURE_NAMES:
    data[feature] = data[feature].apply(lambda txt : Lemmatization(txt)) 
data.head()


Unnamed: 0,school_state,project_subject_categories,project_subject_subcategories,essays,description,id
0,[nv],"[literacy, language]",[literacy],"[kindergarten, come, lowincome, household, con...","[apple, ipod, nano, gb, mp, player, th, genera...",p036502
1,[nv],"[literacy, language]",[literacy],"[kindergarten, come, lowincome, household, con...","[apple, ipod, nano, gb, mp, player, th, genera...",p036502
2,[ga],"[music, art, health, sport]","[performing, art, team, sport]","[elementary, school, culturally, rich, school,...","[reebok, girl, fashion, dance, graphic, tshirt...",p039565
3,[ut],"[math, science, literacy, language]","[applied, science, literature, writing]","[hellornmy, name, mr, brotherton, teach, th, g...","[doodler, start, full, bundle]",p233823
4,[nc],"[health, sport]","[health, wellness]","[greatest, socially, economically, disadvantag...","[ball, pg, poly, set, color]",p185307


**PROBLEM**: What happened to the data in the pandas dataframe>

ANSWER: It was converted from long text into a list of individual words

# PART 4:  Make an LDA topic model for the ESSAYS.

Define an LDA topic model for the `essays`. Compute the "Coherence score." Visually inspect the topic model by inspecting the top keywords from each model. Gensim provides functions for all of these tasks.  

In [24]:
data_essays = data['essays'].tolist()

id2word = corpora.Dictionary(data_essays)
texts = data_essays
corpus = [id2word.doc2bow(text) for text in texts]

print(id2word)

k = [2,4,10]
best_K =2
best_coherence=-100
for K in k:
    model = gensim.models.ldamodel.LdaModel(corpus, K, id2word)
    cm = CoherenceModel(model=model, corpus=corpus, coherence='u_mass')
    coherence = cm.get_coherence()  # get coherence value
    print('K: ', K, '     coherence: ', coherence)
    if coherence>best_coherence:
        best_K = K
        best_coherence = coherence
        lda_model = model
        
pprint(lda_model.print_topics())
print('Best K: ', best_K, '      Best coherence Score: ', best_coherence)

Dictionary(188025 unique tokens: ['activity', 'alongside', 'always', 'around', 'atrisk']...)
K:  2      coherence:  -0.9029530240468011
K:  4      coherence:  -0.9767504846041583
K:  10      coherence:  -1.3412588173843978
[(0,
  '0.021*"school" + 0.013*"learning" + 0.010*"need" + 0.010*"help" + '
  '0.009*"learn" + 0.007*"many" + 0.007*"work" + 0.007*"use" + '
  '0.006*"material" + 0.006*"skill"'),
 (1,
  '0.042*"book" + 0.031*"reading" + 0.018*"read" + 0.016*"school" + '
  '0.012*"love" + 0.010*"help" + 0.010*"library" + 0.009*"need" + 0.009*"many" '
  '+ 0.008*"learn"')]
Best K:  2       Best coherence Score:  -0.9029530240468011


If you use gensim and the following three variables, then you can visualize topics & keywords with the code below.

    lda_model:    this is an LDA model generated by gensim.models.ldamodel.LdaModel()
    id2word:      this is the dictionary term IDs from corpora.Dictionary()
    corpus:       this is the collection of "documents"


In [25]:
# Visualize topics-keywords
pyLDAvis.enable_notebook()
vis = pyLDAvis.gensim.prepare(lda_model, corpus, id2word)
vis

# PART 5:  Make an LDA topic model for the DESCRIPTIONS.

Using the same K (and any other hyperparameters from Part 4), recompute a model for Descriptions. Compare the two sets of results. Do they vary? How? Why? Explain what you find. 

In [26]:
data_description = data['description'].tolist()

id2word = corpora.Dictionary(data_description)
texts = data_description
corpus = [id2word.doc2bow(text) for text in texts]

print(id2word)

lda_model = gensim.models.ldamodel.LdaModel(corpus, best_K, id2word)
cm = CoherenceModel(model=lda_model, corpus=corpus, coherence='u_mass')
coherence = cm.get_coherence()  # get coherence value
    
pprint(lda_model.print_topics())
print('Best K: ', best_K, '    Best coherence Score: ', coherence)

Dictionary(70678 unique tokens: ['apple', 'blue', 'gb', 'generation', 'ipod']...)
[(0,
  '0.022*"pack" + 0.018*"black" + 0.017*"gb" + 0.015*"color" + 0.011*"inch" + '
  '0.011*"ipad" + 0.009*"kid" + 0.009*"mini" + 0.008*"blue" + 0.008*"paper"'),
 (1,
  '0.036*"set" + 0.018*"book" + 0.009*"ball" + 0.008*"kit" + 0.007*"game" + '
  '0.006*"kid" + 0.006*"level" + 0.005*"gr" + 0.005*"learning" + '
  '0.005*"balance"')]
Best K:  2     Best coherence Score:  -5.685597133881545


Answer: 

First of all, the coherence score of "essays" is much better then the one of "Description" for a K = 2 topics.

When we look at the topics of "essays" and if we take aside the terms refering to school which 
were not filtered by the stopword function, we see that overall the topics refer to "need" or "love" showing 
that maybe the essays can be divided in two approaches which involve saying how much funding 
is needed or how much the students would love the project.

When we look at the topics of "description", we can see that the first topic may refer to the aspect of resource requested
the second does not really make sense. This explains the variation in coherence score.
Probably the "description" requires much more topics in order to have meaningful information.

Also, we have to take into account the fact that when the two datasets were merged, some essays were replicated and
this might also have an impact on the coherence of the topics model.