Ganapathy-K/NLP-Using-Python

NLP-With-Python

Natural Language Processing using Python (January 2025) - Lab Activities

Day 1:

  1. Use Stop Words
  2. POS tagging
  3. Generate Antonyms
  4. Generate Synonyms

Day 2:

  1. Calculating frequency of top N words
  2. Using NLTK and the re module to tokenize text and remove stop words and special characters
  3. Using the wordcloud library to generate a word cloud from the cleaned text
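
A minimal sketch of the Day 2 cleaning and counting steps. To stay self-contained it swaps in a regex-based tokenizer and a tiny hand-written stop list; these are stand-ins for NLTK's tokenizer and its English stop word corpus, which the lab uses. The names `STOP_WORDS`, `clean_tokens`, and `top_n_words`, and the sample sentence, are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list (assumption for illustration);
# NLTK's English stop word corpus is much larger.
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "on", "to", "it"}


def clean_tokens(text):
    """Lowercase the text, drop special characters, tokenize, remove stop words."""
    # Replace everything except lowercase letters and whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def top_n_words(text, n):
    """Return the n most frequent cleaned tokens as (word, count) pairs."""
    return Counter(clean_tokens(text)).most_common(n)


sample = "The cat sat on the mat. The cat is happy, and the mat is warm!"
tokens = clean_tokens(sample)
top_two = top_n_words(sample, 2)
print(tokens)
print(top_two)
```

The cleaned tokens can then be joined back into one string and passed to the wordcloud library, for example `WordCloud().generate(" ".join(tokens))`.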

Day 3: Apply CountVectorizer to the input data and answer the following:

  1. Count Vectorizer Matrix
  2. Vocabulary (unique words in corpus)
  3. Calculating frequency of top N words (top 1 word = most frequent term)
  4. Finding out words that appear in all/maximum sentences/documents
  5. Doubts:
    1. What would happen if we set stop_words='english' in the CountVectorizer?
    2. Should we explain the tone of 'happy' manually, or should we write code for it (as we do to pick out the term that occurs in all documents)?

Day 4: Multinomial Naive Bayes model on the 'Movie Review' dataset:

  1. Preprocessing steps, using NLTK and the re module to:
    1. Tokenize text
    2. Remove stop words
    3. Remove special characters
  2. Word Cloud for the data
  3. MultinomialNB model
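
A rough sketch of the modelling step, assuming scikit-learn's `CountVectorizer` and `MultinomialNB` chained in a pipeline. The six training reviews and their labels are invented for illustration; they are not the 'Movie Review' dataset, and the NLTK preprocessing and word cloud steps are left out here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy reviews (assumption); replace with the real dataset's text and labels.
train_texts = [
    "a wonderful film with brilliant acting",
    "great story and wonderful characters",
    "brilliant direction and a great cast",
    "a boring plot and terrible acting",
    "awful pacing and a boring script",
    "terrible film with an awful ending",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# Bag-of-words counts feed directly into the Multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
model.fit(train_texts, train_labels)

predictions = list(model.predict(["wonderful and brilliant", "boring and awful"]))
print(predictions)
```

On a real dataset, the model would normally be fitted on a training split and scored on a held-out test split rather than on hand-picked sentences.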

Capstone Project: Major concepts used:

  1. Word Tokenization
  2. Stop Words
  3. Lemmatization
