A machine learning system that classifies Spotify podcasts as "yapping" (fast, informal speech) vs "normal" (structured, professional) content.
- Python 3.7+
- Kaggle account with API credentials
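Before running the scripts, it can help to confirm that Kaggle credentials are discoverable. The sketch below is not part of this repository. It assumes the two locations the Kaggle API client commonly reads: the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, or a `kaggle.json` token file under `~/.kaggle/`. The function name `find_kaggle_credentials` is hypothetical.

```python
import json
import os
from pathlib import Path


def find_kaggle_credentials(env=None, home=None):
    """Return where Kaggle API credentials were found, or None.

    `env` and `home` are injectable for testing; by default the real
    environment and home directory are used.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Option 1: credentials supplied through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return "environment"

    # Option 2: the token file downloaded from the Kaggle account page.
    token = home / ".kaggle" / "kaggle.json"
    if token.is_file():
        data = json.loads(token.read_text())
        if data.get("username") and data.get("key"):
            return str(token)

    return None


if __name__ == "__main__":
    location = find_kaggle_credentials()
    print(location or "No Kaggle credentials found")
```

If the script reports that nothing was found, the dataset download step is likely to fail with an authentication error.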
git clone <your-repo>
cd spotify-yapping-detector
pip install -r requirements.txt

# 1. Process dataset
python scripts/spotify_processor.py
# 2. Train model
python scripts/train.py

├── src/ # Source code
├── scripts/ # Executable scripts
├── data/ # Datasets
├── models/ # Trained models
├── images/ # Visualizations
├── notebooks/ # Jupyter notebooks
└── requirements.txt # Dependencies
The system identifies "yapping" podcasts using:
- Keywords: "drama", "react", "gossip", "tea", "rant" (yapping) vs "education", "tutorial", "news" (normal)
- Language patterns: Multiple exclamations, ALL CAPS, long titles
- Content analysis: Reading complexity, sentence structure
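The keyword and language-pattern signals above can be sketched as a simple rule-based score. This is an illustration only, not the code in `src/yapping_detector.py`: the function names, the category names, the one-point weights, and the 12-word "long title" threshold are all assumptions, and the content-analysis signals (reading complexity, sentence structure) are left out.

```python
import re

# Keyword lists taken from the bullets above. The category names
# ("casual", "structured") are hypothetical.
YAPPING_KEYWORDS = {"casual": ["drama", "react", "gossip", "tea", "rant"]}
NORMAL_KEYWORDS = {"structured": ["education", "tutorial", "news"]}


def _count_keywords(words, keyword_groups):
    """Count words that exactly match any keyword in any group."""
    vocabulary = {kw for group in keyword_groups.values() for kw in group}
    return sum(1 for word in words if word in vocabulary)


def yapping_score(title, long_title_words=12):
    """Return a heuristic score; higher means more 'yapping'-like."""
    words = re.findall(r"[A-Za-z']+", title)
    lowered = [word.lower() for word in words]

    score = 0
    # Keywords: yapping terms add a point each, normal terms subtract one.
    score += _count_keywords(lowered, YAPPING_KEYWORDS)
    score -= _count_keywords(lowered, NORMAL_KEYWORDS)

    # Language patterns: multiple exclamation marks.
    if title.count("!") >= 2:
        score += 1
    # Language patterns: at least one ALL CAPS word of three or more letters.
    if any(word.isupper() and len(word) >= 3 for word in words):
        score += 1
    # Language patterns: long titles (threshold is an assumption).
    if len(words) > long_title_words:
        score += 1

    return score


def classify(title):
    """Label a title 'yapping' when the score is positive, else 'normal'."""
    return "yapping" if yapping_score(title) > 0 else "normal"


if __name__ == "__main__":
    print(classify("OMG the DRAMA!! We react to celebrity gossip"))
    print(classify("Python tutorial: reading files"))
```

Scores like this could also serve as input features for a trained classifier rather than as the final decision.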
The model achieves 99%+ accuracy with balanced datasets.
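Accuracy is easiest to interpret when both classes are equally represented, since a majority-class guess then scores only 50%. The source does not say how its datasets are balanced; the sketch below shows one common approach, downsampling every class to the size of the smallest one. The function name and the `(text, label)` input format are assumptions.

```python
import random
from collections import defaultdict


def downsample_to_balance(examples, seed=0):
    """Return a shuffled subset with the same number of examples per label.

    `examples` is a list of (text, label) pairs. Larger classes are
    randomly reduced to the size of the smallest class.
    """
    if not examples:
        return []

    # Group examples by their label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    smallest = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the split reproducible

    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], smallest))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    data = [("a", "yapping")] * 5 + [("b", "normal")] * 2
    print(len(downsample_to_balance(data)))
```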
Add new keywords in src/yapping_detector.py:
self.yapping_keywords['custom'] = ['new', 'keywords']

Educational and research use only.

