Text Classification

This repository implements models for text classification, a common Natural Language Processing (NLP) task that assigns a label or class to a piece of text based on its content. The models are built primarily with TensorFlow and Hugging Face, and the architectures are based on LSTM networks and on pretrained models such as BERT and RoBERTa. The LSTM models also extract and visualize the word embeddings learned during training. They use Principal Component Analysis (PCA) from Scikit-learn to reduce the dimensionality of the trained weights of the Embedding layer.
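Below is a rough sketch of the embedding-visualization step. It assumes the trained weights have already been pulled out of the Keras Embedding layer as a NumPy array of shape (vocab_size, embedding_dim). One way to get it is `model.layers[0].get_weights()[0]`, which presumes the Embedding layer is the model's first layer. The sketch writes the projection to two dimensions directly with NumPy's SVD. The notebooks use Scikit-learn's `PCA`, and `PCA(n_components=2).fit_transform(weights)` should give the same coordinates up to the sign of each axis. The random matrix and its sizes are hypothetical stand-ins for real trained weights.

```python
import numpy as np

def pca_2d(embedding_weights: np.ndarray) -> np.ndarray:
    """Project each row (one word vector) onto the top-2 principal components."""
    # Center each embedding dimension, as PCA does before the decomposition.
    centered = embedding_weights - embedding_weights.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Hypothetical stand-in for trained weights, e.g. model.layers[0].get_weights()[0]
# (assumes the Embedding layer is the first layer of the Keras model).
rng = np.random.default_rng(0)
weights = rng.normal(size=(1000, 16))  # (vocab_size, embedding_dim)
coords = pca_2d(weights)
print(coords.shape)  # (1000, 2)
```

Each row of `coords` is one vocabulary word's position in the plane. Scattering the rows and annotating them with their words gives a 2D view of the embedding space.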

Text classification has numerous applications, such as sentiment analysis, Natural Language Inference (NLI), Question Natural Language Inference (QNLI), duplicate question detection, grammatical correctness assessment, and more.

Use Cases So Far:

Sentiment Analysis: used to determine the attitude or emotion expressed in a piece of text.
- Multiclass topic classification model that sorts BBC news articles into 5 categories: 'business', 'entertainment', 'politics', 'sport', and 'tech'. Such a model helps organize large volumes of articles by category, making information easier to search and analyze.
- Binary classification model that distinguishes positive from negative movie reviews in the IMDB reviews dataset. It helps capture viewer opinions about different movies, which is useful for market research and personalized recommendations.
- Model that classifies news headlines as sarcastic or not sarcastic, using the News Headlines Dataset for Sarcasm Detection. Sarcasm detection is a significant challenge in NLP because of the subtlety of language, and this model can help improve accuracy in sentiment analysis and content moderation.
- Model that distinguishes positive from negative tweets in the NLTK Twitter dataset. It is useful for social media analysis, letting businesses and organizations monitor and respond to user opinions in real time.
Duplicate Questions: used to identify whether two questions are essentially the same or are paraphrases of each other.
- Model that determines whether two questions are paraphrases of each other, using the Quora Question Pairs dataset. Such a model is valuable for question-and-answer platforms because it helps prevent duplicated content, improving search efficiency and the quality of the answers provided.
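Sentence-pair tasks such as duplicate question detection feed both questions to a pretrained model as a single sequence. The sketch below illustrates BERT-style packing with plain string tokens. Real pipelines map tokens to integer ids, and each checkpoint's Hugging Face tokenizer inserts its own special tokens; RoBERTa, for instance, uses `<s>`/`</s>` and no segment ids. The helper `pack_pair` is hypothetical and not part of this repository. It trims the longer question first, similar in spirit to the tokenizer's longest-first truncation.

```python
def pack_pair(tokens_a, tokens_b, max_len):
    """BERT-style packing of a question pair: [CLS] A [SEP] B [SEP] + padding."""
    if max_len < 3:
        raise ValueError("max_len must leave room for [CLS] and two [SEP] tokens")
    a, b = list(tokens_a), list(tokens_b)
    # Reserve 3 slots for special tokens; drop tokens from the longer question first.
    while len(a) + len(b) > max_len - 3:
        if len(a) >= len(b):
            a.pop()
        else:
            b.pop()
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    # Segment ids mark which question each token belongs to (0 = first, 1 = second).
    token_type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    attention_mask = [1] * len(tokens)
    # Pad to a fixed length so examples can be batched; padding is masked out.
    pad = max_len - len(tokens)
    return (tokens + ["[PAD]"] * pad,
            token_type_ids + [0] * pad,
            attention_mask + [0] * pad)

q1 = "how do i learn python ?".split()
q2 = "what is the best way to learn python ?".split()
tokens, type_ids, mask = pack_pair(q1, q2, max_len=20)
print(tokens)
```

The classifier then sees both questions at once, so attention can compare them token by token before the final paraphrase / not-paraphrase decision.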

Visualization of Word Embeddings Using PCA

Some Results of the Predictions

Further results from the predictions and word embeddings can be found in their respective notebooks.
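The prediction examples themselves live in the notebooks. As a hedged illustration of how raw model outputs become labels, the sketch below decodes two kinds of output. A multiclass model emits one score per category, as in the 5-category BBC model, and the top score is taken. A binary model emits a single probability, as in the review, tweet, and sarcasm models, and it is compared with a threshold. The index-to-label order, the 0.5 threshold, and the helper names are assumptions. If a model's last layer already applies softmax, the `softmax` step should be skipped.

```python
import math

# Assumed index-to-label order; the real mapping depends on how labels were encoded.
BBC_LABELS = ["business", "entertainment", "politics", "sport", "tech"]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_multiclass(logits, labels=BBC_LABELS):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

def decode_binary(prob_positive, threshold=0.5, names=("negative", "positive")):
    """Map a single sigmoid probability to one of two class names."""
    return names[1] if prob_positive >= threshold else names[0]

label, p = decode_multiclass([0.2, -1.0, 0.5, 3.1, 0.0])
print(label, round(p, 2))  # sport 0.84
print(decode_binary(0.73))  # positive
```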

Technology Stack

Contact

JersonGB22/TextClassification-TensorFlow
