Jupyter Notebook Shortcuts:
- **A**: Insert cell --**ABOVE**--
- **B**: Insert cell --**BELOW**--
- **M**: Change cell to --**MARKDOWN**--
- **Y**: Change cell to --**CODE**--
    
- **Shift + Tab** shows the docstring (**documentation**) for the object you have just typed in a code cell. Keep pressing the shortcut to cycle through a few documentation modes.
- **Ctrl + Shift + -** splits the current cell in two at the cursor position.
- **Esc + O** Toggle cell output.
- **Esc + F** Find and replace on your code but not the outputs.

[MORE SHORTCUTS](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/)

### ----------------------------------------------------------------------------------------------------------------------------------------------------

## Data analysis

#### Steps:
1. Apply TF-IDF
2. Try Wikipedia linking
3. Try linking with WordNet
4. Try Bag of Words
5. Try other algorithms? 
6. Define a clear dictionary with words for each category
7. Other Classification algorithms?
8. Try finding n-grams
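
For step 8, a minimal sketch of n-gram extraction over an already tokenized text. The helper `ngrams` and the sample tokens are made up for illustration and are not part of the notebook.

```python
def ngrams(tokens, n):
    """Return all contiguous windows of n tokens as tuples."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Invented sample tokens, only for demonstration.
tokens = ["the", "chain", "rule", "for", "derivatives"]
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)
```

A text shorter than `n` tokens simply yields an empty list.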

---

### TODO next:
- get the _tokPOStag.txt files
- read and save every line as key-value pair or list of 2 elements
- compare the second element of each line, i.e. the POS, and match it with the WordNet POS tags
- process stemming correctly
- [lemmatize](https://stackoverflow.com/questions/25534214/nltk-wordnet-lemmatizer-shouldnt-it-lemmatize-all-inflections-of-a-word)
- [find n-grams](https://stackoverflow.com/questions/17531684/n-grams-in-python-four-five-six-grams)
- Perform [Bag of Words](https://pythonprogramminglanguage.com/bag-of-words/)
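
The POS-related items in this list could be sketched roughly as below. Everything about the file layout is an assumption: the sketch supposes one `token,tag` pair per line with Penn Treebank tags, and the helper names `parse_pos_lines` and `to_wordnet_pos` are hypothetical. The single-letter codes are the ones WordNet uses for noun, verb, adjective and adverb.

```python
# Two-letter Penn Treebank prefixes (assumed tag set) mapped to WordNet POS codes.
WORDNET_POS = {"NN": "n", "VB": "v", "JJ": "a", "RB": "r"}

def to_wordnet_pos(treebank_tag):
    """Return the WordNet POS code for a tag, or None if there is no counterpart."""
    return WORDNET_POS.get(treebank_tag[:2].upper())

def parse_pos_lines(lines, sep=","):
    """Turn 'token<sep>tag' lines into (token, tag) pairs, skipping malformed lines."""
    pairs = []
    for line in lines:
        parts = line.strip().split(sep)
        if len(parts) == 2 and all(parts):
            pairs.append((parts[0], parts[1]))
    return pairs

# Invented sample lines; the real _tokPOStag.txt layout may differ.
sample = ["derivatives,NNS", "compute,VB", "quickly,RB", "the,DT", "broken line"]
pairs = parse_pos_lines(sample)
tagged = [(token, to_wordnet_pos(tag)) for token, tag in pairs]
print(tagged)
```

Tokens whose tag has no WordNet counterpart (determiners, for example) come back with `None` and can be skipped or lemmatized with a default POS.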

------

#### Offtopic
- [Python theory](http://xahlee.info/python/python_basics.html)
- [Text classification](https://gallery.azure.ai/Experiment/Text-Classification-Step-2-of-5-text-preprocessing-2)
- [Preprocessing steps](https://www.kdnuggets.com/2018/03/text-data-preprocessing-walkthrough-python.html)

In [55]:
import os
import sys
import os.path
import string
import time
import pathlib
from unidecode import unidecode
import pprint
from tabulate import tabulate
import unittest

import scipy
import numpy
import sklearn
import math
from textblob import TextBlob as tb
import nltk
from nltk.corpus import wordnet as wn
from beautifultable import BeautifulTable
#nltk.download('punkt')
#nltk.download('wordnet')

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import euclidean_distances
import operator

## TF-IDF implementation
#### ---------------

TF-IDF is implemented to check whether the terms extracted from the LOs have anything in common with the terms extracted by manual MOOC analysis, and to compare which of the two methods gives better results in the classification part.

Below is the main TF-IDF implementation without any text provided to it yet.

##### Term frequency
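\\( tf(t,d) = 0.5 + 0.5 \cdot \frac{f_{t,d}}{\max_{t' \in d} f_{t',d}} \\) 

##### Inverse document frequency
\\( idf(t,D) = \log \frac{N}{|\lbrace d \in D : t \in d \rbrace|} \\), where \\( N \\) is the number of documents in \\( D \\)

##### Computing tf-idf
\\( tfidf(t,d,D) = tf(t,d) \cdot idf(t,D) \\)

The code below uses a simpler variant: plain relative frequency for tf, and 1 added to the document count in the idf denominator.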
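
The formulas above, read as augmented term frequency (the term's count divided by the count of the most frequent term in the document) and logarithmic inverse document frequency, can be checked on a toy corpus. This is only a sketch: the function names and the three tiny documents are invented for illustration.

```python
import math
from collections import Counter

def tf_augmented(term, doc_tokens):
    """0.5 + 0.5 * (count of term / count of the most frequent term in the document)."""
    counts = Counter(doc_tokens)
    return 0.5 + 0.5 * counts[term] / max(counts.values())

def idf_plain(term, corpus):
    """log(N / number of documents containing the term); fails if no document contains it."""
    n_containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / n_containing)

# Invented toy corpus of already tokenized documents.
corpus = [
    ["limit", "function", "limit"],
    ["function", "derivative"],
    ["integral", "function"],
]
tf_limit = tf_augmented("limit", corpus[0])    # most frequent term in its document
idf_limit = idf_plain("limit", corpus)         # appears in 1 of 3 documents
idf_function = idf_plain("function", corpus)   # appears in every document
print(tf_limit, idf_limit, idf_function)
```

A term that occurs in every document gets an idf of zero, so its tf-idf is zero regardless of how often it occurs.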

In [2]:
# doc is the text (a TextBlob) in which to look for the term
def tf(term, doc):
    # ratio of the term's count to the total word count of the document
    return doc.words.count(term) / len(doc.words)

def docsWithTermIn(term, doclist):
    return sum(1 for doc in doclist if term in doc.words)

def idf(term, doclist):
    return math.log(len(doclist) / (1 + docsWithTermIn(term, doclist)))

def tfidf(term,doc,doclist):
    return tf(term, doc) * idf(term,doclist)
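
To see how these functions behave without loading any course data, here is a small sketch. `ToyDoc` is a hypothetical stand-in for a TextBlob that only exposes a `.words` list, and the four sentences are invented. The function bodies repeat the definitions from the cell above so the sketch runs on its own.

```python
import math

class ToyDoc:
    """Hypothetical stand-in for a TextBlob: only provides a .words list."""
    def __init__(self, text):
        self.words = text.lower().split()

def tf(term, doc):
    return doc.words.count(term) / len(doc.words)

def docs_with_term(term, doclist):
    return sum(1 for doc in doclist if term in doc.words)

def idf(term, doclist):
    return math.log(len(doclist) / (1 + docs_with_term(term, doclist)))

def tfidf(term, doc, doclist):
    return tf(term, doc) * idf(term, doclist)

docs = [ToyDoc("limit of a function"), ToyDoc("derivative of a function"),
        ToyDoc("integral of a function"), ToyDoc("volume of a sphere")]

score_limit = tfidf("limit", docs[0], docs)        # term found in one document
score_function = tfidf("function", docs[0], docs)  # term found in three of four documents
score_of = tfidf("of", docs[0], docs)              # term found in every document
print(score_limit, score_function, score_of)
```

Because of the `1 +` in the denominator, a term that appears in every document gets a slightly negative score, and a term that appears in all but one document scores exactly zero. That is worth keeping in mind when reading the top-N lists.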

### Running TF-IDF with data

#### TODO: Fix the input, it takes strings, and not files right now

In [110]:
# traverse each folder and sub-folder
# create an array of files to add each file in it
# if the file is TXT, add to the array
# create a String array of documents with the file of the array with files 
# so we can store the contents of each inside
# read each line of each file and save to the strings
# perform algorithms on the documents

# -------------------------------------------------------------------------------------------------------

# WINDOWS
#path = r"C:\Users\ani\Desktop\Course data Thesis\one file"
path = r"C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed"

# LINUX
#path = "/media/sf_Shared_Folder/TEST/one file" # FEW TEST FILES
#path = "/media/sf_Shared_Folder/TEST/RAW"   # TEST DATA PATH
#path = "/media/sf_Shared_Folder/Coursera Downloads PreProcessed"   # REAL DATA PATH

docnames = []
counter = 0

# DOCUMENT LIST CONSISTS OF TEXTBLOB files. All input files need to be converted to TEXTBLOB 
# and then saved in this list in order for TF-IDF to work
doclist = []

for root, subdirs, files in os.walk(path):

    for curFile in os.listdir(root):

        filePath = os.path.join(root, curFile)

        if os.path.isdir(filePath):
            pass

        else:
            # check for file extension and if not TXT, continue and disregard the current file
            if not filePath.endswith(".txt"):
                pass
            #elif filePath.endswith("_lemmatized.txt"):
            
            #### PROCESS FIRST WITH BoW to output the FullLemTerm FILES
            elif filePath.endswith("_FullLemTerm.txt"):
                counter += 1
                # IMPORTANT ENCODING! UTF8 DOESN'T WORK, so the files are read as ISO-8859-1
                # the with-block closes the file even if reading fails
                with open(filePath, 'r', encoding = "ISO-8859-1") as inFile:
                    fcontentTBlob = tb(inFile.read())
                docnames.append(filePath)
                #print(fcontentTBlob)
                doclist.append(fcontentTBlob)
            else:
                pass

print("Total number of files in docnames[]:", len(docnames))
print("Total number of files in doclist[]:", len(doclist))

Total number of files in docnames[]: 543
Total number of files in doclist[]: 543
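
The traversal above could also be written with `pathlib`. The sketch below assumes the same file suffix and encoding as the notebook, returns plain strings instead of TextBlob objects, and uses a hypothetical helper name, `collect_documents`. The demo files are created in a temporary folder and their names are invented.

```python
import tempfile
from pathlib import Path

def collect_documents(root, suffix="_FullLemTerm.txt", encoding="ISO-8859-1"):
    """Return (paths, texts) for every file under root whose name ends with suffix."""
    paths = sorted(p for p in Path(root).rglob("*" + suffix) if p.is_file())
    texts = [p.read_text(encoding=encoding) for p in paths]
    return paths, texts

# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    week = Path(tmp) / "course" / "week1"
    week.mkdir(parents=True)
    (week / "01_intro.en_FullLemTerm.txt").write_text("limit function", encoding="ISO-8859-1")
    (week / "01_intro.en_lemmatized.txt").write_text("ignored", encoding="ISO-8859-1")
    doc_paths, doc_texts = collect_documents(tmp)
    found_names = [p.name for p in doc_paths]
    print(len(doc_paths), found_names)
```

Keeping paths and texts in two parallel lists mirrors `docnames` and `doclist`, so the TF-IDF loop could index both with the same `i`.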


In [None]:
# ------------------------------------ TF-IDF --------------------------------------------------------

# arrays to hold the terms found in text and also a custom list to test domain-specific terms
exportedList = []
ownList = {"data management","database","example","iot","lifecycle","bloom","filter","integrity",
           "java","pattern","design pattern","svm","Support vector machine","knn","k-nearest neighbors","machine learning"}

table = BeautifulTable()
table.column_headers = ["TERM", "TF-IDF"]

topNwords = 15

for i, doc in enumerate(doclist):
    oFileName = str(docnames[i].split(".en")[0]+".en")+"_tfidf.txt"
    #print(oFileName)
    
    with open(oFileName, "w") as oFile:
        print("\nTop {} terms in document {} | {}".format(topNwords, i + 1, docnames[i]))
        scores = {term: tfidf(term, doc, doclist) for term in doc.words}
        sortedTerms = sorted(scores.items(),key=lambda x: x[1], reverse=True)
    
        for term, score in sortedTerms[:topNwords]:
            #print(table.append_row([term, round(score, 5)]))
            #print("\tTERM: {} \t|\t TF-IDF: {}".format(term, round(score, 5)))
            scoreStr = str(round(score, 5))
            saveStr = str(term+","+scoreStr)
            oFile.write(saveStr+"\n")
            exportedList.append(term)
            #print tabulate([term, round(score, 5)], headers=['tTERM', 'TF-IDF'])
        


Top 15 terms in document 1 | C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\01_welcome-to-calculus-one\01_introduction-to-calculus-one\01_why-is-calculus-going-to-be-so-much-fun.en_FullLemTerm.txt

Top 15 terms in document 2 | C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\01_welcome-to-calculus-one\01_introduction-to-calculus-one\03_how-is-this-course-structured.en_FullLemTerm.txt

Top 15 terms in document 3 | C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\02_functions-and-limits\01_introduction\01_how-do-we-get-started-with-calculus.en_FullLemTerm.txt

Top 15 terms in document 4 | C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\02_functions-and-limits\02_functions-whats-a-function\01_what-is-a-function.en_FullLemTerm.txt

Top 15 terms in document 5 | C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Courser
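
Each call to `tfidf` in the loop above rescans every document to count how many contain the term, which gets slow for several hundred files. One possible speed-up is to count document frequencies once up front. The sketch below works on plain token lists rather than TextBlob objects, keeps the notebook's `log(N / (1 + df))` form, and uses hypothetical helper names and invented sample data.

```python
import math
from collections import Counter

def document_frequencies(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df

def top_terms(tokens, df, n_docs, top_n=15):
    """Score each distinct term with tf * log(N / (1 + df)) and return the best top_n."""
    counts = Counter(tokens)
    total = len(tokens)
    scores = {t: (c / total) * math.log(n_docs / (1 + df[t])) for t, c in counts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Invented toy documents, already tokenized.
docs = [
    ["limit", "limit", "function", "of"],
    ["derivative", "function", "of"],
    ["integral", "function", "of"],
    ["sphere", "volume", "of"],
]
df = document_frequencies(docs)
best = top_terms(docs[0], df, len(docs), top_n=2)
print(best)
```

Scoring only the distinct terms of a document also avoids recomputing the same score for every repeated word.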

In [16]:
# ----------------------------------------- NLTK, WORDNET -------------------------------------------
print("\n\n------- EXPORTED TERMS in WORDNET ----------") 
for word in exportedList:
    if not wn.synsets(word):
        print("\n", word, ": NO SYNSETS\n")
    else:
        print("\n", word)
        for ss in wn.synsets(word):
            print("- ",ss.name()," | ",ss.definition())

print("\n\n------- CUSTOM TERMS in WORDNET (also domain specific) ----------")    
for word in ownList:
    if not wn.synsets(word):
        print("\n", word, ": NO SYNSETS\n")
    else:
        print("\n", word)
        for ss in wn.synsets(word):
            print("- ",ss.name()," | ",ss.definition())
    



------- EXPORTED TERMS in WORDNET ----------

 least
-  least.n.01  |  something that is of no importance
-  least.a.01  |  the superlative of `little' that can be used with mass nouns and is usually preceded by `the'; a quantifier meaning smallest in amount or extent or degree
-  least.r.01  |  used to form the superlative

 man
-  man.n.01  |  an adult person who is male (as opposed to a woman)
-  serviceman.n.01  |  someone who serves in the armed forces; a member of a military force
-  man.n.03  |  the generic use of the word to refer to any human being
-  homo.n.02  |  any living or extinct member of the family Hominidae characterized by superior intelligence, articulate speech, and erect carriage
-  man.n.05  |  a male subordinate
-  man.n.06  |  an adult male person who has a manly character (virile and courageous competent)
-  valet.n.01  |  a manservant who acts as a personal attendant to his employer
-  man.n.08  |  a male person who plays a significant role (husband or lov

### Bag of Words, and all the rest

In [62]:
# Algorithms

# BoW (bag of words)
def bagOfWords(iFilePath,iPOSfPath,choice):
    
    from sklearn.feature_extraction.text import CountVectorizer
    vectorizer = CountVectorizer()
    data_corpus = []  
    
    if iFilePath.endswith("_lemmatized.txt"):
        print("[Bag of words: ]\t" + iFilePath+"\n")

        baseName = iFilePath.split(".en", 1)[0]
        OFName = baseName + ".en_FullLemTerm.txt" 
        
        try:
            iLemmaFile = open(iFilePath, 'r', encoding = "ISO-8859-1")  # open lemma file in read mode
            LemmafileCont = iLemmaFile.read().split()   # read file content and save it into the string variable

            text = ""
            
            with open(OFName, "w") as oFile:
                for line in LemmafileCont:
                    fullTerm = findRealTerm(line,iPOSfPath)
                    #print("FOUND..", fullTerm)
                    text += fullTerm+" "

                    if(choice == 0):
                        pass
                    elif(choice == 1):
                        oFile.write(fullTerm+"\n")
                    else:
                        print("Invalid output file option.. 0 - NO file, 1 - SAVE file")
                        break
                
                #continue
            data_corpus.append(text)
            
            #print(data_corpus)
            vector = vectorizer.fit_transform(data_corpus).todense() 
            #print(vector.toarray())
            #print(vectorizer.get_feature_names())  

            #array = vector.toarray()
            #featureNames = vectorizer.get_feature_names()
            
            vocDictionary = vectorizer.vocabulary_
            sorted_vocDictionary = sorted(vocDictionary.items(), key=operator.itemgetter(1), reverse=True)
            #print(sorted_vocDictionary)

            """
            for f in vector:
                print("Euclidean distances: \n")
                print(euclidean_distances(vector[0], f))
            """
            
        finally:
            iLemmaFile.close()              
        
    else:
        pass
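
One thing to keep in mind: `CountVectorizer.vocabulary_` maps each term to its column index in the matrix, not to its count, so sorting it by value orders terms by index. If the goal is a list of the most frequent terms, a count-based approach may be closer to the intent. The sketch below swaps in `collections.Counter` for `CountVectorizer`, so it is a different technique from the one in the function above. `bag_of_words` is a hypothetical helper and the sample text is invented.

```python
from collections import Counter

def bag_of_words(text):
    """Lower-case the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())

bow = bag_of_words("Limit function limit derivative function limit")
most_common = bow.most_common(2)
print(most_common)
```

With scikit-learn, the equivalent counts would come from the matrix returned by `fit_transform`, with column positions looked up through `vocabulary_`.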

#### ----------------------------------------------------------------------------------------------------------------------------------------

In [36]:
# Match a lemma (taken from the _lemmatized.txt file) to the first full word with that lemma in the POS file
def findRealTerm(lemmaIn, POSfilePath):   
    # open the POS file at POSfilePath (the file ending in "_stemmedbyPOS.txt")
    # split each entry on "," into fields and look in field [2] for the first match of the current lemma
    # when found, take field [0], which is the full word, and return that word
    # if nothing matches, return "NOT FOUND: <lemma>"

    #print("Searching for term")
    res = ""
    try:
        iPOSFile = open(POSfilePath, 'r', encoding = "ISO-8859-1")  # open POS file in read mode
        posfileCont = iPOSFile.read().split()   # read file content and save it into the string variable
        #print(lemmaIn)
        for line in posfileCont:
            line = line.split(",")

            word = line[0]
            lemma = line[2]

            if lemma == lemmaIn:
                #print("WORD: ", word, " || LEMMA: ", lemma)
                res = word
                break
    
    finally:
        iPOSFile.close()
        
    if res == "":
        res = "NOT FOUND: "+lemmaIn
        
    return res
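
`findRealTerm` reopens and rescans the POS file for every single lemma. An alternative is to build a lookup table once per file and query it many times. The sketch below assumes the layout the function already relies on: comma-separated fields with the full word at index 0 and the lemma at index 2. The middle field, the helper name `build_lemma_index` and the sample rows are invented for illustration.

```python
def build_lemma_index(rows):
    """Map each lemma to the first full word seen for it (fields: word, <other>, lemma)."""
    index = {}
    for row in rows:
        fields = row.strip().split(",")
        if len(fields) < 3:
            continue  # skip rows without a lemma column
        word, lemma = fields[0], fields[2]
        index.setdefault(lemma, word)  # keep the first match, like the break in findRealTerm
    return index

# Invented sample rows; the real POS files may hold different values.
rows = ["derivatives,NNS,derivative", "derivative,NN,derivative",
        "computing,VBG,compute", "bad_row"]
lemma_index = build_lemma_index(rows)
full_term = lemma_index.get("derivative", "NOT FOUND: derivative")
missing = lemma_index.get("integral", "NOT FOUND: integral")
print(full_term, missing)
```

Skipping short rows also avoids the `IndexError` that `line[2]` would raise on an entry with fewer than three fields.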

#### ----------------------------------------------------------------------------------------------------------------------------------------

In [69]:
path = r"C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed"
#path = "/media/sf_Shared_Folder/TEST/one file"

# LINUX
#path = "/media/sf_Shared_Folder/TEST/one file" # FEW TEST FILES
#path = "/media/sf_Shared_Folder/TEST/RAW"   # TEST DATA PATH
#path = "/media/sf_Shared_Folder/Coursera Downloads PreProcessed"   # REAL DATA PATH

counter = 0
POSfiles = []

# --- Collecting the POS files
for root, subdirs, files in os.walk(path):

    for curFile in os.listdir(root):

        curFilePath = os.path.join(root, curFile)

        if os.path.isdir(curFilePath):
            pass

        else:
            # create a list of POS files so that it can be sent along with BoW to look for the right file and terms
            if curFilePath.endswith("_stemmedbyPOS.txt"):
                # only the file name is needed, so the file is not opened (avoids leaving handles open)
                baseName = os.path.basename(curFilePath.split(".en", 1)[0])
                curFilePOS = baseName+".en_stemmedbyPOS.txt"
                POSfiles.append(curFilePOS)
            else:
                pass

# --------------------------------------------------------------------------------
            
# --- processing Lemmatized files with Algos
for root, subdirs, files in os.walk(path):

    for curFile in os.listdir(root):

        curFilePath = os.path.join(root, curFile)

        if os.path.isdir(curFilePath):
            pass

        else:         
            # check for file extension and if not TXT, continue and disregard the current file
            if curFilePath.endswith("_lemmatized.txt"): 
                counter += 1
                try:
                    # need this only to extract the file path and send it to the algorithm later. Send path, !not file!
                    tempFile = open(curFilePath, 'r', encoding = "ISO-8859-1")                    

                    baseName = tempFile.name.split(".en", 1)[0]
                    POSfilePath = baseName+".en_stemmedbyPOS.txt"

                    if os.path.basename(POSfilePath) in POSfiles:
                        print("\n\n[POS file for BoW: ]\t" + POSfilePath)
                        
                        # ---------- bag of words processing: ------------
                        # <Get lemmatized file, look for the 1st lemma in the respective POS file and take the full term>
                        # the last argument controls whether an output file is saved: 0 - NO, 1 - YES
                        try:
                            bagOfWords(curFilePath,POSfilePath,1)  
                        except ValueError: 
                            print("CANNOT PROCESS (ValueError - only SWs in file) ",curFilePath)
                            pass
                finally:
                    tempFile.close()
            else:
                pass
#print("Total number of POS Files[]:", len(POSfiles))
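
The cell above pairs each `_lemmatized.txt` file with its `_stemmedbyPOS.txt` sibling by comparing base names against a list, so two files with the same name in different folders cannot be told apart. Checking for the sibling path directly is one way around that. This is a sketch with a hypothetical helper, `pos_path_for`, and invented file names in a temporary folder. It splits only the file name on `.en`, so a folder name containing `.en` would not interfere.

```python
import os
import tempfile

def pos_path_for(lemma_path):
    """Derive the sibling POS path by replacing everything after the first '.en' in the file name."""
    folder, name = os.path.split(lemma_path)
    base = name.split(".en", 1)[0]
    return os.path.join(folder, base + ".en_stemmedbyPOS.txt")

with tempfile.TemporaryDirectory() as tmp:
    lemma_file = os.path.join(tmp, "01_intro.en_lemmatized.txt")
    pos_file = os.path.join(tmp, "01_intro.en_stemmedbyPOS.txt")
    for name in (lemma_file, pos_file):
        with open(name, "w", encoding="ISO-8859-1") as handle:
            handle.write("placeholder\n")
    derived = pos_path_for(lemma_file)
    has_pos = os.path.isfile(derived)
    orphan = os.path.join(tmp, "02_other.en_lemmatized.txt")
    orphan_has_pos = os.path.isfile(pos_path_for(orphan))
    print(has_pos, orphan_has_pos)
```

With this check the first `os.walk` pass that fills `POSfiles` would no longer be needed.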



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\01_welcome-to-calculus-one\01_introduction-to-calculus-one\01_why-is-calculus-going-to-be-so-much-fun.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\01_welcome-to-calculus-one\01_introduction-to-calculus-one\01_why-is-calculus-going-to-be-so-much-fun.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\01_welcome-to-calculus-one\01_introduction-to-calculus-one\03_how-is-this-course-structured.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\01_welcome-to-calculus-one\01_introduction-to-calculus-one\03_how-is-this-course-structured.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Down



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\03_the-end-of-limits\03_infinity-how-can-i-work-with-that\01_why-is-there-an-x-so-that-f-x-x.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\03_the-end-of-limits\03_infinity-how-can-i-work-with-that\01_why-is-there-an-x-so-that-f-x-x.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\03_the-end-of-limits\03_infinity-how-can-i-work-with-that\02_what-does-lim-f-x-infinity-mean.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\03_the-end-of-limits\03_infinity-how-can-i-work-with-that\02_what-does-lim-f-x-infinity-mean.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcess



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\04_the-beginning-of-derivatives\04_how-do-differentiability-and-continuity-relate\02_what-is-the-derivative-of-a-constant-multiple-of-f-x.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\04_the-beginning-of-derivatives\04_how-do-differentiability-and-continuity-relate\02_what-is-the-derivative-of-a-constant-multiple-of-f-x.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\04_the-beginning-of-derivatives\05_how-do-i-find-the-derivative\01_why-is-the-derivative-of-x-2-equal-to-2x.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\04_the-beginning-of-derivatives\05_how-do-i-find-the-derivative\01_why-is-the-derivative-of-x-2-equal-to-2x.e



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\06_chain-rule\01_intro\01_is-there-anything-more-to-learn-about-derivatives.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\06_chain-rule\01_intro\01_is-there-anything-more-to-learn-about-derivatives.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\06_chain-rule\02_what-is-the-chain-rule\01_what-is-the-chain-rule.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\06_chain-rule\02_what-is-the-chain-rule\01_what-is-the-chain-rule.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\06_chain-rule\02_what-is-the-chain-rule\02_what-is-the-derivative-of-1-2x-5



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\08_derivatives-in-the-real-world\01_intro\01_why-would-i-ever-want-to-take-derivatives.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\08_derivatives-in-the-real-world\01_intro\01_why-would-i-ever-want-to-take-derivatives.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\08_derivatives-in-the-real-world\02_how-can-derivatives-help-with-limits\01_how-can-derivatives-help-us-to-compute-limits.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\08_derivatives-in-the-real-world\02_how-can-derivatives-help-with-limits\01_how-can-derivatives-help-us-to-compute-limits.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data 



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\10_linear-approximation\02_what-happens-if-i-repeat-linear-approximation\02_why-is-log-3-base-2-approximately-19-12.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\10_linear-approximation\02_what-happens-if-i-repeat-linear-approximation\02_why-is-log-3-base-2-approximately-19-12.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\10_linear-approximation\03_what-does-dx-mean-by-itself\01_what-does-dx-mean-by-itself.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\10_linear-approximation\03_what-does-dx-mean-by-itself\01_what-does-dx-mean-by-itself.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESS



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\11_antidifferentiation\05_why-would-anybody-want-to-do-this\01_knowing-my-velocity-what-is-my-position.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\11_antidifferentiation\05_why-would-anybody-want-to-do-this\01_knowing-my-velocity-what-is-my-position.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\11_antidifferentiation\05_why-would-anybody-want-to-do-this\02_knowing-my-acceleration-what-is-my-position.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\11_antidifferentiation\05_why-would-anybody-want-to-do-this\02_knowing-my-acceleration-what-is-my-position.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course dat



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\13_fundamental-theorem-of-calculus\01_introduction\01_what-is-the-big-deal-about-the-fundamental-theorem-of-calculus.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\13_fundamental-theorem-of-calculus\01_introduction\01_what-is-the-big-deal-about-the-fundamental-theorem-of-calculus.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\13_fundamental-theorem-of-calculus\02_what-is-the-fundamental-theorem-of-calculus\01_what-is-the-fundamental-theorem-of-calculus.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\13_fundamental-theorem-of-calculus\02_what-is-the-fundamental-theorem-of-calculus\01_what-is-the-fundamental-theorem-of-calculus.e



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\14_substitution-rule\03_what-are-some-tricks-for-doing-substitutions\03_what-is-the-integral-of-x-x-1-1-3-dx.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\14_substitution-rule\03_what-are-some-tricks-for-doing-substitutions\03_what-is-the-integral-of-x-x-1-1-3-dx.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\14_substitution-rule\03_what-are-some-tricks-for-doing-substitutions\04_what-is-the-integral-of-dx-1-cos-x.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\14_substitution-rule\03_what-are-some-tricks-for-doing-substitutions\04_what-is-the-integral-of-dx-1-cos-x.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Deskto



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\16_applications-of-integration\03_how-can-i-calculate-volume\02_what-is-the-volume-of-a-sphere.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\16_applications-of-integration\03_how-can-i-calculate-volume\02_what-is-the-volume-of-a-sphere.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\16_applications-of-integration\03_how-can-i-calculate-volume\03_how-do-washers-help-to-compute-the-volume-of-a-solid-of-revolution.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\calculus1\16_applications-of-integration\03_how-can-i-calculate-volume\03_how-do-washers-help-to-compute-the-volume-of-a-solid-of-revolution.en_lemmatized.txt



[POS file for BoW: ]



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\data-management\02_data-management-planning\01_data-management-planning\04_data-management-planning-tools.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\data-management\02_data-management-planning\01_data-management-planning\04_data-management-planning-tools.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\data-management\03_working-with-data\01_organizing-data\01_good-file-management-in-research.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\data-management\03_working-with-data\01_organizing-data\01_good-file-management-in-research.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\data-manageme



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\data-management\04_sharing-data\02_enabling-sharing\03_intellectual-property-and-data-ownership.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\data-management\04_sharing-data\02_enabling-sharing\03_intellectual-property-and-data-ownership.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\data-management\04_sharing-data\02_enabling-sharing\04_access.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\data-management\04_sharing-data\02_enabling-sharing\04_access.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\data-management\04_sharing-data\03_supplementary-videos\01_what-are-the-benefits-of-sharing-d



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\digitalmedia\01_sonic-painter\01_week-1-sonic-painter\16_1-6-sonic-painter.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\digitalmedia\01_sonic-painter\01_week-1-sonic-painter\16_1-6-sonic-painter.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\digitalmedia\01_sonic-painter\01_week-1-sonic-painter\18_1-7-outro.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\digitalmedia\01_sonic-painter\01_week-1-sonic-painter\18_1-7-outro.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\digitalmedia\01_sonic-painter\02_week-1-additional-lectures\01_additional-lecture-introduction-to-programming.en_stemmedbyPOS



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\digitalmedia\04_angrydroids\01_week-4-angrydroids\06_4-2-forces.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\digitalmedia\04_angrydroids\01_week-4-angrydroids\06_4-2-forces.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\digitalmedia\04_angrydroids\01_week-4-angrydroids\08_4-3-preparing-and-playing-sound-fx.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\digitalmedia\04_angrydroids\01_week-4-angrydroids\08_4-3-preparing-and-playing-sound-fx.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\digitalmedia\04_angrydroids\01_week-4-angrydroids\10_4-4-integrating-audio-and-physics.en_stemmedbyPOS.tx



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\embedded-operating-system\02_processing-elements-of-an-embedded-system\01_cpus-and-fpgas\02_main-features-of-embedded-processors.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\embedded-operating-system\02_processing-elements-of-an-embedded-system\01_cpus-and-fpgas\02_main-features-of-embedded-processors.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\embedded-operating-system\02_processing-elements-of-an-embedded-system\02_use-cases\01_use-cases-of-micro-controller-platforms.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\embedded-operating-system\02_processing-elements-of-an-embedded-system\02_use-cases\01_use-cases-of-micro-controller-platforms.en_lemmatized.txt







[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\interactive-computer-graphics\01_graphical-user-interfaces\02_lecture-and-practice-quiz\01_1-1-scrolling-interface.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\interactive-computer-graphics\01_graphical-user-interfaces\02_lecture-and-practice-quiz\01_1-1-scrolling-interface.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\interactive-computer-graphics\01_graphical-user-interfaces\02_lecture-and-practice-quiz\03_1-2-desktop-icons.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\interactive-computer-graphics\01_graphical-user-interfaces\02_lecture-and-practice-quiz\03_1-2-desktop-icons.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PRO



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\interactive-computer-graphics\04_deformation-and-animation\01_lecture-and-practice-quiz\09_4-5-motion-database.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\interactive-computer-graphics\04_deformation-and-animation\01_lecture-and-practice-quiz\09_4-5-motion-database.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\interactive-computer-graphics\05_fabrication\01_lecture-and-practice-quiz\01_5-1-plush-toys.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\interactive-computer-graphics\05_fabrication\01_lecture-and-practice-quiz\01_5-1-plush-toys.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\int



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\iot-connectivity-security\01_introduction-to-web-connectivity-security\02_introduction-to-cps-web-connectivity\02_introduction-to-web-connectivity.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\iot-connectivity-security\01_introduction-to-web-connectivity-security\02_introduction-to-cps-web-connectivity\02_introduction-to-web-connectivity.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\iot-connectivity-security\01_introduction-to-web-connectivity-security\03_layers-and-protocols\01_application-layer-protocols.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\iot-connectivity-security\01_introduction-to-web-connectivity-security\03_layers-and-protocols\01_application-lay



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\iot-connectivity-security\05_cryptography\01_introduction-to-cryptography\01_introduction-to-cryptography.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\iot-connectivity-security\05_cryptography\01_introduction-to-cryptography\01_introduction-to-cryptography.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\iot-connectivity-security\05_cryptography\02_keys-ciphers-and-security\01_symmetric-key-ciphers-and-wireless-lan-security.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\iot-connectivity-security\05_cryptography\02_keys-ciphers-and-security\01_symmetric-key-ciphers-and-wireless-lan-security.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Th



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\mobile-robot\02_mobile-robots\01_week-2\04_sensors.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\mobile-robot\02_mobile-robots\01_week-2\04_sensors.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\mobile-robot\02_mobile-robots\01_week-2\05_behavior-based-robotics.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\mobile-robot\02_mobile-robots\01_week-2\05_behavior-based-robotics.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\mobile-robot\02_mobile-robots\01_week-2\07_the-grits-simulator.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreP



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\mobile-robot\04_control-design\01_week-4\09_glue-lecture-4.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\mobile-robot\04_control-design\01_week-4\09_glue-lecture-4.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\mobile-robot\04_control-design\01_week-4\10_programming-simulation-lecture-4.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\mobile-robot\04_control-design\01_week-4\10_programming-simulation-lecture-4.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\mobile-robot\05_hybrid-systems\01_week-5\01_switches-everywhere.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data The



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\mobile-robot\07_putting-it-all-together\01_week-7\01_approximations-and-abstractions.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\mobile-robot\07_putting-it-all-together\01_week-7\01_approximations-and-abstractions.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\mobile-robot\07_putting-it-all-together\01_week-7\02_a-layered-architecture.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\mobile-robot\07_putting-it-all-together\01_week-7\02_a-layered-architecture.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\mobile-robot\07_putting-it-all-together\01_week-7\03_differential-drive-trackers.en_ste



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\02_nuclear-physics\02_2-2-nuclear-size-and-spin\01_2-2-nuclear-size-and-spin.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\02_nuclear-physics\02_2-2-nuclear-size-and-spin\01_2-2-nuclear-size-and-spin.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\02_nuclear-physics\03_2-3-models-of-nuclear-structure\01_2-3-models-of-nuclear-structure.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\02_nuclear-physics\03_2-3-models-of-nuclear-structure\01_2-3-models-of-nuclear-structure.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\03_accelerators-and-detectors\07_3-7-ionisation-detectors\01_3-7-ionisation-detectors.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\03_accelerators-and-detectors\07_3-7-ionisation-detectors\01_3-7-ionisation-detectors.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\03_accelerators-and-detectors\08_3-8-semiconductor-detectors\01_3-8-semiconductor-detectors.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\03_accelerators-and-detectors\08_3-8-semiconductor-detectors\01_3-8-semiconductor-detectors.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Download



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\06_electro-weak-interactions\01_6-1-particles-and-antiparticles\01_6-1-particles-and-antiparticles.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\06_electro-weak-interactions\01_6-1-particles-and-antiparticles\01_6-1-particles-and-antiparticles.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\06_electro-weak-interactions\02_6-2-the-discrete-transformations-c-p-and-t\01_6-2-the-discrete-transformations-c-p-and-t.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\06_electro-weak-interactions\02_6-2-the-discrete-transformations-c-p-and-t\01_6-2-the-discrete-transformations-c-p-and-t.en_lemmatized.txt



[POS



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\08_dark-matter-and-dark-energy\03_8-3-dark-energy\01_8-3-dark-energy.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\08_dark-matter-and-dark-energy\03_8-3-dark-energy\01_8-3-dark-energy.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\08_dark-matter-and-dark-energy\03_8-3-dark-energy\02_8-3a-motivating-the-friedmann-equation-optional.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\particle-physics\08_dark-matter-and-dark-energy\03_8-3-dark-energy\02_8-3a-motivating-the-friedmann-equation-optional.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\s



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\spacecraft-dynamics-kinetics\01_continuous-systems-and-rigid-bodies\03_rigid-bodies\09_tips-for-solving-spring-particle-systems.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\spacecraft-dynamics-kinetics\01_continuous-systems-and-rigid-bodies\03_rigid-bodies\09_tips-for-solving-spring-particle-systems.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\spacecraft-dynamics-kinetics\01_continuous-systems-and-rigid-bodies\03_rigid-bodies\10_optional-review-rigid-body-properties.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\spacecraft-dynamics-kinetics\01_continuous-systems-and-rigid-bodies\03_rigid-bodies\10_optional-review-rigid-body-properties.en_lemmatized.txt



[POS f



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\spacecraft-dynamics-kinetics\02_torque-free-motion\02_torque-free-dual-spinners\04_9-example-dual-spinner-stability.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\spacecraft-dynamics-kinetics\02_torque-free-motion\02_torque-free-dual-spinners\04_9-example-dual-spinner-stability.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\spacecraft-dynamics-kinetics\02_torque-free-motion\02_torque-free-dual-spinners\05_9-1-spin-up-considerations.en_stemmedbyPOS.txt
[Bag of words: ]	C:\Users\ani\Desktop\Course data Thesis\PROCESSED\Coursera Downloads PreProcessed\spacecraft-dynamics-kinetics\02_torque-free-motion\02_torque-free-dual-spinners\05_9-1-spin-up-considerations.en_lemmatized.txt



[POS file for BoW: ]	C:\Users\ani\Desktop\Course data Thesis

### Bag of Words v.2

In [9]:
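import collections
import re

# Two toy documents to count words in.
texts = ['John likes to watch movies. Mary likes too.',
         'John also likes to watch football games.']

# One Counter (bag of words) per document; \w+ matches runs of word characters.
bagsofwords = [collections.Counter(re.findall(r'\w+', txt)) for txt in texts]

bagsofwords[0]
# Counter({'likes': 2, 'watch': 1, 'Mary': 1, 'movies': 1, 'John': 1, 'to': 1, 'too': 1})

bagsofwords[1]
# Counter({'watch': 1, 'games': 1, 'to': 1, 'likes': 1, 'also': 1, 'John': 1, 'football': 1})

# Merge the per-document bags into one corpus-level bag.
sumbags = sum(bagsofwords, collections.Counter())

sumbags
# Counter({'likes': 3, 'watch': 2, 'John': 2, 'to': 2, 'games': 1, 'football': 1, 'Mary': 1, 'movies': 1, 'also': 1, 'too': 1})
# (The expected output above must stay commented out: left as bare code it raised
# "NameError: name 'Counter' is not defined", since only the module `collections` is imported.)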

### --------------------------------------------------------------------------------------------------------------------------------------
### NOTES
### --------------------------------------------------------------------------------------------------------------------------------------

**TF-IDF** doesn't produce the result I need. To classify the text into the GOAL element, I need n-grams selected as combined keywords, and these are often very general phrases such as `for example` or `key concept`.
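As a rough illustration of the n-gram keywords mentioned above, here is a minimal sketch that counts word n-grams with the standard library only. The helper names `ngrams` and `count_ngrams` and the sample sentence are hypothetical and are not part of the thesis scripts.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token sequence as a list of tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


def count_ngrams(text, n=2):
    """Lowercase the text, split it into word tokens, and count its n-grams."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(ngrams(tokens, n))


# Hypothetical sample text, not taken from the course transcripts.
sample = ("For example, a key concept is the limit. "
          "For example, the key concept here is continuity.")

bigrams = count_ngrams(sample, n=2)
print(bigrams.most_common(3))
```

Note that this sketch ignores sentence boundaries, so an n-gram can span two sentences; splitting into sentences first would avoid that.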

**TextBlob** provides n-gram extraction and a connection to the WordNet ontology, which could be useful, so I will look into it further.

**WordNet** finds multiple definitions and synsets (synonyms) for most of the general words. However, given specific terms, e.g. the names of computer science algorithms, it doesn't find any synonyms or descriptions for them.

**Wikipedia** recognized some of the terms, but not all. For instance, if we give it KNN it doesn't find anything, but if we give it K-nearest neighbour, it finds it. That is how the article is named in Wikipedia, so that may be the reason. On Google, however, the first result returned for KNN is this same article. The same holds for SVM and Support vector machine. I've modified the script to return "NO DESCRIPTION or DISAMBIGUATION" every time it finds nothing or hits a disambiguation error; otherwise it wouldn't continue checking the rest of the terms. So now it skips the error.
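The skip-on-error behaviour described above could look roughly like the sketch below. It is not the actual script: `describe_terms`, `fake_lookup` and the stub page table are hypothetical, and the real Wikipedia query is replaced by a dictionary lookup so the example runs without network access.

```python
PLACEHOLDER = "NO DESCRIPTION or DISAMBIGUATION"


def describe_terms(terms, lookup):
    """Map each term to its description, or to PLACEHOLDER if the lookup fails."""
    descriptions = {}
    for term in terms:
        try:
            text = lookup(term)
        except Exception:
            # Assumption: a missing page and a disambiguation error are handled alike.
            text = None
        # An empty result is also replaced, so the loop always moves on to the next term.
        descriptions[term] = text if text else PLACEHOLDER
    return descriptions


# Hypothetical stand-in for the real Wikipedia query.
_FAKE_PAGES = {"K-nearest neighbors algorithm": "stub description"}


def fake_lookup(term):
    return _FAKE_PAGES[term]  # raises KeyError for unknown terms


result = describe_terms(["KNN", "K-nearest neighbors algorithm"], fake_lookup)
print(result)
```

Catching `Exception` keeps the sketch short; a real script would more likely catch only the specific error types raised by whichever Wikipedia client it uses.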
 
**Full list** of identified key words so far [HERE](https://docs.google.com/spreadsheets/d/1Dj4UAh6U5jAelcsz-gDCdDE9JRVhwaNei0Ctn8m0Ui4/edit?usp=sharing)