Course: Natural Language Processing (ECE-467)
This repository contains project work from a natural language processing course taken at The Cooper Union.
One project was to build a text categorizer program. The following is a description of what was done:
• Built a text categorization system in Python using a Naïve Bayes approach on unigrams
• Used Python’s NLTK package for tokenization and stemming
• Used a unique variation of Laplace smoothing that was a function of the test sets
• Achieved the highest average accuracy score, measured during testing of the systems, in this project's history for Cooper Union's NLP course
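
To illustrate the general approach described above, here is a minimal sketch of a unigram Naïve Bayes categorizer. It is not the project's code: it uses standard add-alpha (Laplace) smoothing rather than the test-set-dependent variation mentioned above, and it replaces NLTK tokenization and stemming with a simple lowercase-and-split step. The function names (`tokenize`, `train`, `classify`) and the training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Stand-in for NLTK tokenization and stemming (assumption):
    # lowercase the text and split on whitespace.
    return text.lower().split()


def train(docs):
    """Count unigrams per category. docs is a list of (text, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> unigram counts
    doc_counts = Counter()              # category -> number of training documents
    vocab = set()
    for text, category in docs:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model, alpha=1.0):
    """Return the category with the highest smoothed log posterior."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # Log prior of the category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[category][token]
            # Standard add-alpha (Laplace) smoothing, not the project's variation.
            score += math.log((count + alpha) / (total_words + alpha * len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Invented toy training data, for illustration only.
training_docs = [
    ("the team won the match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("parliament passed the budget vote", "politics"),
    ("the senator won the election", "politics"),
]
model = train(training_docs)
print(classify("a late goal in the match", model))
print(classify("the senator passed the vote", model))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the `alpha` parameter controls how much probability mass is reserved for words a category has not seen.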
The final project of the course involved sentiment analysis on Amazon fine food reviews. The following is a description of what was done:
• Performed polarity (positive / negative) sentiment analysis on an Amazon fine food reviews dataset distributed by Kaggle
• The main components of the project include pre-processing the reviews (lowercasing / stemming), using bigram tokenization to create TF-IDF feature vectors for the dataset, and using a logistic regression classifier
• Generated average precision, average recall, and average F1 scores of 97% during testing
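
The pipeline described above (lowercasing, bigram tokenization, TF-IDF features, logistic regression) can be sketched from scratch as follows. This is an illustrative sketch under stated assumptions, not the project's implementation: stemming is omitted, the IDF uses a common smoothed form, the classifier is trained with plain stochastic gradient descent, and the four reviews are invented rather than taken from the Kaggle dataset. All function names and hyperparameters (`lr`, `epochs`) are hypothetical.

```python
import math
from collections import Counter


def bigrams(text):
    # Lowercase, split on whitespace, and pair adjacent tokens (stemming omitted).
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def fit_tfidf(texts):
    """Learn the bigram vocabulary and a smoothed IDF weight for each bigram."""
    docs = [Counter(bigrams(t)) for t in texts]
    vocab = sorted({bg for doc in docs for bg in doc})
    n = len(docs)
    # Assumed smoothed form: idf = log((1 + n) / (1 + df)) + 1.
    idf = {bg: math.log((1 + n) / (1 + sum(1 for d in docs if bg in d))) + 1
           for bg in vocab}
    return vocab, idf


def transform(text, vocab, idf):
    # Raw bigram count times IDF; no length normalization in this sketch.
    counts = Counter(bigrams(text))
    return [counts[bg] * idf[bg] for bg in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.1, epochs=200):
    """Fit weights and a bias with stochastic gradient descent on the log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(text, vocab, idf, w, b):
    # Returns 1 for positive polarity, 0 for negative.
    x = transform(text, vocab, idf)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


# Invented toy reviews (1 = positive, 0 = negative), for illustration only.
reviews = [
    "this tea tastes great",
    "great flavor and fast delivery",
    "this coffee tastes awful",
    "awful flavor and stale chips",
]
labels = [1, 1, 0, 0]

vocab, idf = fit_tfidf(reviews)
X = [transform(r, vocab, idf) for r in reviews]
w, b = train_logreg(X, labels)
print(predict("tea tastes great", vocab, idf, w, b))
print(predict("coffee tastes awful", vocab, idf, w, b))
```

Bigrams let the classifier weigh short phrases rather than isolated words, which is one plausible reason to prefer them for polarity classification. Bigrams that do not appear in the training vocabulary are simply ignored at prediction time.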