An interactive Streamlit app for learning how text data is processed, transformed, and classified using machine learning.
Students can explore datasets, preprocess text, build models, and visualize results — all with clear explanations and examples.
- Data Explorer (📂)
  Preview datasets, inspect rows and columns, and understand the structure of text and labels.
- Preprocessing (🔍)
  - Tokenization demo (split sentences into words).
  - Bag of Words vs TF‑IDF vectorization.
  - Worked example showing how TF‑IDF is calculated step‑by‑step.
  - Vocabulary preview to see which words are included.
- Model Builder (🤖)
  - Train Logistic Regression, Naive Bayes, and Support Vector Classifier.
  - Compare accuracy across models.
  - Confusion matrix visualization.
  - Top Features chart showing which words drive spam vs ham predictions.
- Results (📊)
  - Test new messages against the trained model.
  - See predictions (spam/ham) with probability scores.
  - Word clouds for spam vs ham vocabulary.
  - Explanation of confidence levels in predictions.
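
To make the Bag of Words vs TF‑IDF comparison listed under Preprocessing concrete, here is a minimal standard-library sketch. It is not the app's own code: the regex tokenizer stands in for NLTK, the three sample messages are made up, and the helper names (`tokenize`, `bag_of_words`, `tf_idf`) are hypothetical. The IDF formula used is the smoothed variant that scikit-learn applies by default, `ln((1 + N) / (1 + df)) + 1`, followed by L2 normalization of each row.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (simple regex stand-in for NLTK)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(docs):
    """Return (vocabulary, count_rows): one row of raw term counts per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tf_idf(count_rows):
    """Weight raw counts by smoothed IDF, then L2-normalize each document row."""
    n_docs = len(count_rows)
    n_terms = len(count_rows[0])
    # Document frequency: in how many documents does each term appear?
    df = [sum(1 for row in count_rows if row[j] > 0) for j in range(n_terms)]
    # Smoothed IDF (assumed variant): rare terms get a larger weight.
    idf = [math.log((1 + n_docs) / (1 + df_j)) + 1 for df_j in df]
    weighted_rows = []
    for row in count_rows:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        weighted_rows.append([v / norm for v in weighted])
    return idf, weighted_rows


# Made-up sample messages, for illustration only.
docs = [
    "Win a free prize now",
    "Free entry, win cash now",
    "Are we still meeting for lunch",
]

vocab, counts = bag_of_words(docs)
idf, weights = tf_idf(counts)

# Compare raw counts with TF-IDF weights for the first message.
for term, count, weight in zip(vocab, counts[0], weights[0]):
    if count:
        print(f"{term:>6}  count={count}  tfidf={weight:.3f}")
```

In the first message every word has a count of 1, but "prize" (found in one message) ends up with a higher TF‑IDF weight than "free" (found in two). That is the step-by-step intuition the worked example is meant to convey.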
Tech stack:
- Streamlit for interactive UI
- scikit-learn for ML models
- NLTK for tokenization
- Matplotlib & Seaborn for plots
- WordCloud for text visualization
├── app.py                  # Main entry point and landing page
├── pages/
│   ├── 1_Data_Explorer.py  # Load and preview text datasets
│   ├── 2_Preprocessing.py  # Clean, tokenize, and vectorize text
│   ├── 3_Model_Builder.py  # Train and evaluate ML models on text data
│   └── 4_Results.py        # Display predictions, metrics, and misclassifications
├── requirements.txt        # Dependencies with pinned versions
└── README.md               # Project guide and documentation
- Clone the repo:
  git clone https://github.com/your-username/text-explorer-app.git
  cd text-explorer-app
- Install dependencies:
  pip install -r requirements.txt
- Launch the app:
  streamlit run app.py
- Push your repo to GitHub.
- Go to Streamlit Cloud.
- Connect your repo and select app.py as the entry point.
- Deploy and share the link with students!
By using the Text Explorer App, students will:
- Data Explorer (📂)
  Understand how text datasets are structured, preview samples, and recognize the importance of dataset inspection.
- Preprocessing (🧹)
  Learn how to clean text (remove punctuation, stopwords), tokenize words, and convert text into numerical features (e.g., bag‑of‑words, TF‑IDF).
- Model Builder (🤖)
  Train and compare machine learning models (e.g., Logistic Regression, Naive Bayes) for text classification.
  Explore how different algorithms handle sparse text features.
- Results (📊)
  Interpret predictions, evaluate accuracy, and analyze misclassifications.
  Gain experience with confusion matrices and probability scores to understand model confidence.
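
The Model Builder and Results outcomes above can be tried outside the app with a few lines of scikit-learn. This is a rough sketch, not the code in `3_Model_Builder.py` or `4_Results.py`: the toy messages and labels are invented for illustration, and pairing `TfidfVectorizer` with each classifier in a pipeline is an assumption about how one might wire it up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy data (assumption): a real run would load a labelled dataset instead.
train_texts = [
    "win a free prize now",
    "free cash offer, claim your prize",
    "urgent: you won free entry",
    "claim free cash now",
    "are we still meeting for lunch",
    "see you at the office tomorrow",
    "can you send me the notes",
    "lunch tomorrow at noon?",
]
train_labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

test_texts = [
    "free prize offer",
    "win cash now",
    "meeting at the office",
    "send the lunch notes",
]
test_labels = ["spam", "spam", "ham", "ham"]

# Each pipeline turns text into sparse TF-IDF features, then fits a classifier.
models = {
    "Logistic Regression": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
    "Naive Bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
}

labels = ["ham", "spam"]  # fixes the row/column order of the confusion matrix
results = {}
for name, model in models.items():
    model.fit(train_texts, train_labels)
    predictions = model.predict(test_texts)
    results[name] = {
        "accuracy": accuracy_score(test_labels, predictions),
        # Rows are true labels, columns are predicted labels.
        "confusion": confusion_matrix(test_labels, predictions, labels=labels),
    }
    print(name, "accuracy:", results[name]["accuracy"])
    print(results[name]["confusion"])

# Probability scores for a new message, as on the Results page.
new_message = "claim your free prize"
nb_model = models["Naive Bayes"]
probabilities = nb_model.predict_proba([new_message])[0]
for label, prob in zip(nb_model.classes_, probabilities):
    print(f"{label}: {prob:.2f}")
```

With so few messages the accuracy numbers mean little. The point is the workflow: fit, predict, read the confusion matrix, and treat the probability scores as the model's confidence rather than a guarantee.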
Add screenshots of each page here once deployed.
Built with ❤️ by Arpit to make machine learning hands‑on and approachable for everyone.