Module 1_ICP 7: Natural Language Processing in Python using NLTK
acikgozmehmet edited this page Jul 1, 2020
# Natural Language Processing in Python using NLTK
The following topics are covered.
- NLP (natural language processing)
  - Computer-aided text analysis of human language
  - The goal is to enable machines to understand human language and extract meaning from text
- NLTK (Natural Language Toolkit)
  - A Python module that provides a variety of functionality for processing text
  - An open-source library that simplifies implementing NLP in Python
  - Text-processing features include unigrams, bigrams, trigrams, tokenization, POS tagging, lemmatization, normalization, entity extraction, and language models
  - Learning these features helps with larger projects such as document classification, spelling correction, and document summarization
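
As an illustration of how n-grams feed a language model, here is a minimal sketch of a bigram model built from NLTK's `bigrams` and `ConditionalFreqDist`. The sample sentence and the helper name `bigram_prob` are assumptions made for this example, and `wordpunct_tokenize` is used because it needs no downloaded data.

```python
from nltk import ConditionalFreqDist, bigrams
from nltk.tokenize import wordpunct_tokenize

# Hypothetical sample text, used only for illustration.
text = "The cat sat on the mat. The cat ate."

# Normalization: lowercase every token so "The" and "the" are counted together.
tokens = [t.lower() for t in wordpunct_tokenize(text)]

# Count how often each word follows each preceding word.
cfd = ConditionalFreqDist(bigrams(tokens))

def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    total = cfd[prev].N()
    return cfd[prev][word] / total if total else 0.0

print(bigram_prob("the", "cat"))
```

In this text "the" is followed by "cat" twice and "mat" once, so the estimate for "cat" after "the" is 2/3. A real model would also need smoothing for unseen pairs.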
- a. Use an SVM classifier and see how the accuracy changes
- b. Change the TF-IDF vectorizer to use bigrams, TfidfVectorizer(ngram_range=(1,2)), and see how the accuracy changes
- c. Set the argument stop_words='english' and see how the accuracy changes
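
The variations above can be sketched with scikit-learn as follows. This is a rough outline and not the original lab code. The four-sentence toy corpus, the helper `build_classifier`, and the choice of `LinearSVC` as the SVM are assumptions. The actual exercise would use a real dataset with a train/test split and compare accuracy scores.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus (hypothetical); replace with the dataset used in the exercise.
train_texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "the new graphics card renders games faster",
    "the processor and memory were upgraded",
]
train_labels = ["sports", "sports", "tech", "tech"]

def build_classifier(ngram_range=(1, 1), stop_words=None):
    """TF-IDF features followed by a linear SVM (task a)."""
    return make_pipeline(
        TfidfVectorizer(ngram_range=ngram_range, stop_words=stop_words),
        LinearSVC(),
    )

# Baseline: unigrams only, no stop-word removal.
unigram_clf = build_classifier().fit(train_texts, train_labels)

# Tasks b and c: unigrams plus bigrams, with English stop words removed.
bigram_clf = build_classifier(ngram_range=(1, 2), stop_words="english")
bigram_clf.fit(train_texts, train_labels)

# Inspect the learned vocabularies to see what each setting changes.
unigram_vocab = unigram_clf.named_steps["tfidfvectorizer"].vocabulary_
bigram_vocab = bigram_clf.named_steps["tfidfvectorizer"].vocabulary_
print(sorted(bigram_vocab))
```

To compare accuracy, each pipeline could be scored on held-out data with `clf.score(test_texts, test_labels)`.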
Click here to get the source code
https://en.wikipedia.org/wiki/Google
Click here to get the source code
- a. Tokenization
- b. POS tagging
- c. Stemming
- d. Lemmatization
- e. Trigram
- f. Named Entity Recognition
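
The six steps above can be sketched with NLTK as shown below. This is a minimal outline under stated assumptions, not the original solution. The helper names (`stem`, `lemmatize`, `trigrams`, `named_entities`, `analyze`) are hypothetical. The resource names in `RESOURCES` are also assumed and differ between NLTK releases: newer versions may need `punkt_tab`, `averaged_perceptron_tagger_eng`, and `maxent_ne_chunker_tab` instead.

```python
import nltk
from nltk import ne_chunk, pos_tag, word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.util import ngrams

# Assumed resource names; adjust them to match the installed NLTK version.
RESOURCES = ("punkt", "averaged_perceptron_tagger",
             "maxent_ne_chunker", "words", "wordnet")

def stem(tokens):
    """c. Stemming: strip suffixes with the Porter algorithm."""
    stemmer = PorterStemmer()
    return [stemmer.stem(t) for t in tokens]

def lemmatize(tokens):
    """d. Lemmatization: map each token to its WordNet base form (as a noun)."""
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t) for t in tokens]

def trigrams(tokens):
    """e. Trigrams: every run of three consecutive tokens."""
    return list(ngrams(tokens, 3))

def named_entities(tagged):
    """f. NER: collect (label, phrase) pairs from the chunk tree."""
    tree = ne_chunk(tagged)
    return [
        (subtree.label(), " ".join(word for word, _ in subtree.leaves()))
        for subtree in tree.subtrees()
        if subtree.label() != "S"  # skip the root node
    ]

def analyze(text):
    """Run steps a to f on one string; downloads the NLTK data on first use."""
    for name in RESOURCES:
        nltk.download(name, quiet=True)
    tokens = word_tokenize(text)   # a. Tokenization
    tagged = pos_tag(tokens)       # b. POS tagging
    return {
        "tokens": tokens,
        "pos": tagged,
        "stems": stem(tokens),
        "lemmas": lemmatize(tokens),
        "trigrams": trigrams(tokens),
        "entities": named_entities(tagged),
    }
```

Calling `analyze(text)` on the text of a saved web page returns all six results in one dictionary. Fetching the page and stripping its HTML are left out of this sketch.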
Click here to get the source code
https://github.com/wade12/WikiScraper/blob/master/