Aspects Recognition in English Texts

Definition

Aspect extraction in NLP involves identifying and extracting specific aspects, attributes, or features mentioned in text data. This task is crucial for understanding the finer details and nuances of textual information, especially in domains like product reviews, customer feedback, social media analysis, and opinion mining.

Summary

This project focuses on solving the problem of Aspect Recognition in English text. There are three solutions available in the repository:

Solution 1: Frequency-Based Method
- Initial text processing involved tokenization, removing stopwords, and applying stemming using the PorterStemmer() function from nltk.
- Part-of-Speech tagging (Pos_Tag()) was applied to sentences, followed by extracting nouns and noun phrases using the noun_chuncks() function from spacy.
- Aspects were identified based on their frequency, retaining those with frequencies above a specified threshold.
Solution 2: H&L Method
- Inspired by Hu & Liu's approach, this method addresses issues like repeated nouns potentially becoming aspects.
- It first extracts nouns and noun phrases, considers them as candidate aspects, and then forms new aspect candidates by concatenating pairs and trios of aspects occurring in the same order within a sentence.
- Aspect candidates are ranked based on their p-support, which is the number of sentences containing the aspect excluding sentences containing another aspect.
- Further pruning is done by eliminating non-compact phrases and candidates with low p-support or included within another candidate.
Solution 3: Supervised Learning with Conditional Random Field (CRF)
- As aspect extraction can be framed as a sequence labeling problem, Conditional Random Field (CRF) was applied.
- Features used included part-of-speech tags, lemmas, word frequencies, and semantic context (neighboring words).
- Evaluation was performed using precision, recall, and F1-score metrics, visualized through the LETO visual interface.

Additional Notes

The state-of-the-art includes syntax-based methods focusing on syntactic relationships, and unsupervised learning using Latent Dirichlet Allocation (LDA).
Syntax-based methods excel in capturing less frequent aspects but require numerous rules, while LDA-based approaches offer an unsupervised alternative.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
project		project
README.md		README.md
Read me		Read me
sensors.py		sensors.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

project

project

README.md

README.md

Read me

Read me

sensors.py

sensors.py

Repository files navigation

Aspects Recognition in English Texts

Definition

Summary

Additional Notes

About

Releases

Packages

Languages

ClauPorto/Aspect-Extraction-English-Texts

Folders and files

Latest commit

History

Repository files navigation

Aspects Recognition in English Texts

Definition

Summary

Additional Notes

About

Resources

Stars

Watchers

Forks

Languages