NLP (Natural Language Processing)

This repository contains assignments of natural language processing university course [2022/2023]

Team:

Renato Esposito
Vincenzo Mele
Luca Rubino (me)

First Assignment: Statistics and Some Preprocessing Steps

The first step was to calculate some statistics of the input corpus. in NLP it is essential to clean up the data (therefore the text) from useless information (which varies according to the application). For this reason, the preprocessing steps were carried out, such as: Tokenization, Lemmatization and Stemming.

Second Assignment: Sentiment Analysis

A very important task of NLP is sentiment analysis: it is a task whose objective is to verify whether the given input text is positive or negative. To do this we used two models provided by the NLTK library: Naive Bayes and Logistic Regression. These models received input data preprocessed differently depending on the model. after which tests and evaluations ( with Confusion Matrix ) were made to verify the accuracy of the predictions.

Example with LR (Logistic Regression)

Suppose that your input text is The other day I had a pizza in Puglia, but it looked like a cookie. Really horrible and inedible. Output is the following:

Result: ['neg']
 with prob: [[0.89728656 0.10271344]]

Output is negative because 0.89728656 is the negative probability while 0.10271344 is the positive probability.

Third Assignment: N-gram Language Model

In this assignment we want to create a word suggester (like google search does when you're typing words into the search bar). To do this, an N-gram Language Model is created, i.e. a probability matrix that is used to make predictions. Also in this case the model is tested and evaluated with the perplexity measure.

Example

Suppose that your input text is i have just. The model output for this input is the following:

i have just been  -> Probability = 0.008729388942773999
i have just such  -> Probability = 0.002263174911089555
i have just taken  -> Probability = 0.002263174911089555
i have just now  -> Probability = 0.002263174911089555
i have just freed  -> Probability = 0.002263174911089555
i have just felt  -> Probability = 0.002263174911089555
i have just ask  -> Probability = 0.002263174911089555
i have just prevailed  -> Probability = 0.002263174911089555
i have just come  -> Probability = 0.002263174911089555
i have just balances  -> Probability = 0.002263174911089555

in this case the next word will be "been", as it has a higher probability than the others.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
img		img
.gitignore		.gitignore
1_Assignement.ipynb		1_Assignement.ipynb
2_Assignement.ipynb		2_Assignement.ipynb
3_Assignement.ipynb		3_Assignement.ipynb
README.md		README.md
twitter_sentences.csv		twitter_sentences.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NLP (Natural Language Processing)

Team:

First Assignment: Statistics and Some Preprocessing Steps

Second Assignment: Sentiment Analysis

Example with LR (Logistic Regression)

Third Assignment: N-gram Language Model

Example

About

Releases

Packages

Contributors 2

Languages

Luruu/NLP

Folders and files

Latest commit

History

Repository files navigation

NLP (Natural Language Processing)

Team:

First Assignment: Statistics and Some Preprocessing Steps

Second Assignment: Sentiment Analysis

Example with LR (Logistic Regression)

Third Assignment: N-gram Language Model

Example

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages