Applied to 15,000 Disneyland Reviews
A comprehensive NLP pipeline applied to 15,000 Disneyland park reviews, covering sentiment classification, text summarization, named entity recognition (NER), and n-gram analysis with BERT and traditional machine learning models.
- Source: Disneyland Reviews (Kaggle)
- Size: 15,000 reviews
- Task: Sentiment classification (1-5 star ratings)
- BERT: 89% sentiment classification accuracy
- Summarizing reviews with T5 as a preprocessing step improves classification accuracy by 4-7%
- NER identifies locations, organizations, and other named entities
- Word cloud and n-gram analysis of key themes
- Text Preprocessing — contractions, tokenization, stopwords, lemmatization (NLTK + spaCy)
- Exploratory Analysis — rating distribution, word clouds, n-gram analysis, NER
- Sentiment Classification — BERT, Logistic Regression, SVM, Naive Bayes, TextBlob (TF-IDF)
- Text Summarization — extractive (Summa) and abstractive (T5 transformer), ROUGE evaluation
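
Both the n-gram analysis and the ROUGE evaluation listed above come down to counting n-grams. The following is a minimal, standard-library sketch of that idea, not the notebook's actual code: the function names `tokenize`, `ngrams`, `top_ngrams`, and `rouge_n` are hypothetical, the regex tokenizer is a stand-in for the NLTK/spaCy preprocessing, and `rouge_n` is a simplified ROUGE-N that does not reproduce the `rouge-score` package (no stemming, different tokenization). The sample sentences are invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed, simplified tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous n-gram in the token list as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(texts, n=2, k=3):
    """Count n-grams across all texts and return the k most frequent with their counts."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(tokenize(text), n))
    return counts.most_common(k)


def rouge_n(reference, candidate, n=1):
    """Simplified ROUGE-N: precision, recall, and F1 from clipped n-gram overlap."""
    ref = Counter(ngrams(tokenize(reference), n))
    cand = Counter(ngrams(tokenize(candidate), n))
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Invented example reviews, not taken from the dataset.
    sample_reviews = [
        "The Space Mountain ride was great",
        "Long lines at Space Mountain",
        "Space Mountain was closed",
    ]
    print(top_ngrams(sample_reviews, n=2, k=1))

    reference = "The ride was great but the lines were long"
    summary = "The ride was great"
    print(rouge_n(reference, summary, n=1))
```

In this sketch a short summary that only reuses words from the reference scores perfect precision but low recall, which is why ROUGE results are usually reported as precision, recall, and F1 together.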
Python | BERT | T5 | PyTorch | Hugging Face | NLTK | spaCy | Scikit-learn | Pandas | WordCloud
git clone https://github.com/Phoenixking-04/Sentiment-Analysis-NLP.git
cd Sentiment-Analysis-NLP
pip install pandas scikit-learn nltk spacy transformers torch wordcloud textblob sumy rouge-score matplotlib seaborn notebook
python -m spacy download en_core_web_sm
jupyter notebook NLP.ipynb
Developer: Kalyankumar Sandireddy