📄 Document Classifier (NLP | Deep Learning)

A deep learning–based document classification system built using TensorFlow, NLP preprocessing, and Streamlit.
This project classifies raw text and .txt files into predefined categories and also supports bulk file sorting into folders based on predicted labels.

🚀 Features

✅ Classify typed or pasted text
✅ Upload and classify .txt files
✅ Batch document classification from a folder
✅ Automatic file sorting into category folders
✅ Deep learning–based prediction using a trained neural network
✅ Interactive Streamlit web interface
✅ Displays class probabilities
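The batch mode and automatic sorting features can be pictured as a simple loop over a folder. The sketch below is illustrative and is not the code in app.py. `sort_folder` is a hypothetical name, and the classifier is passed in as any callable that maps document text to a category label.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict


def sort_folder(src: str, dest: str, classify: Callable[[str], str]) -> Dict[str, str]:
    """Classify every .txt file in `src` and move it into `dest/<label>/`.

    `classify` is a hypothetical stand-in for the trained model: any function
    that takes the document text and returns a category name.
    Returns a mapping of file name -> predicted label.
    """
    results: Dict[str, str] = {}
    dest_root = Path(dest)
    # sorted() materializes the file list before any file is moved
    for path in sorted(Path(src).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        label = classify(text)
        target_dir = dest_root / label
        target_dir.mkdir(parents=True, exist_ok=True)  # one folder per category
        shutil.move(str(path), str(target_dir / path.name))
        results[path.name] = label
    return results
```

For example, `sort_folder("New_Files", "Sorted", my_classify)` would move each .txt file from `New_Files/` into `Sorted/<label>/`. Here `Sorted` and `my_classify` are placeholders. Non-.txt files are left where they are.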

🧠 Tech Stack

Python 3.10
TensorFlow / Keras
NLTK
Streamlit
NumPy & Pandas
Regex for text cleaning
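The README mentions regex-based cleaning but does not list the actual rules. The sketch below assumes a set of common steps: lowercasing, stripping HTML tags and URLs, removing non-letters, and collapsing whitespace. The patterns in the training notebook may differ, and `clean_text` is a hypothetical name.

```python
import re

# Precompiled patterns (assumed cleaning rules, not taken from the notebook).
HTML_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")
SPACES_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Lowercase the text and strip HTML tags, URLs, digits and punctuation."""
    text = text.lower()
    text = HTML_RE.sub(" ", text)       # tags first, so URLs are not glued to them
    text = URL_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)  # keep only a-z and whitespace
    return SPACES_RE.sub(" ", text).strip()


print(clean_text("Visit <b>https://example.com</b> NOW: 3 Goals!!"))
# → visit now goals
```

Whatever rules were used in training, the same function should be applied at prediction time. Otherwise the tokenizer will see text in a different form than it was fitted on.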

📂 Project Structure


document-classifier-nlp/
│
├── Data/                               # Training data
├── New_Files/                          # New test files
├── UI_ScreenShot.png                   # App preview image
├── app.py                              # Streamlit app
├── basic_dl_doc_classification.ipynb   # Training notebook
├── meta_basic.json                     # Model metadata (categories, max_len)
├── news_basic_dl_model.h5              # Trained deep learning model
└── requirements.txt                    # Required dependencies
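According to the project structure above, `meta_basic.json` holds the category names and `max_len`. The sketch below shows one way that metadata could be used at prediction time. It assumes the JSON keys are literally `categories` (a list in the same order as the model's output units) and `max_len` (the padded sequence length), so check the actual file for the real key names. `load_meta`, `pad_sequence` and `top_label` are hypothetical helpers, and `pad_sequence` is a pure-Python stand-in for Keras `pad_sequences`.

```python
import json
from typing import List, Sequence


def load_meta(path: str = "meta_basic.json") -> dict:
    """Read the model metadata (assumed keys: "categories", "max_len")."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def pad_sequence(seq: Sequence[int], max_len: int,
                 padding: str = "pre", truncating: str = "pre") -> List[int]:
    """Pad with 0s or truncate to exactly `max_len` token ids.

    Defaults mirror Keras `pad_sequences` ("pre" for both); the notebook may
    have used "post" instead, so match whatever was used in training.
    """
    seq = list(seq)
    if len(seq) > max_len:
        # "pre" truncation drops ids from the start, "post" from the end
        seq = seq[len(seq) - max_len:] if truncating == "pre" else seq[:max_len]
    pad = [0] * (max_len - len(seq))
    return pad + seq if padding == "pre" else seq + pad


def top_label(probs: Sequence[float], categories: Sequence[str]) -> str:
    """Return the category whose probability is highest."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return categories[best]


meta = {"categories": ["business", "sport", "tech"], "max_len": 5}  # illustrative values only
print(pad_sequence([4, 9, 2], meta["max_len"]))         # → [0, 0, 4, 9, 2]
print(top_label([0.1, 0.7, 0.2], meta["categories"]))  # → sport
```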

⚙️ Installation & Setup

1️⃣ Create Environment (Recommended)

conda create -n docclass python=3.10 -y
conda activate docclass

2️⃣ Install Dependencies

pip install -r requirements.txt

3️⃣ Download NLTK Resources (Auto on first run)

The app automatically downloads:

stopwords
punkt
wordnet
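A first-run download is usually implemented by checking for each resource and downloading only what is missing. The sketch below follows that pattern with the real NLTK functions `nltk.data.find` and `nltk.download`. `ensure_nltk_resources` and `NLTK_RESOURCES` are hypothetical names and may not match what `app.py` uses.

```python
# Map each NLTK package name to the path that nltk.data.find() looks up.
NLTK_RESOURCES = {
    "stopwords": "corpora/stopwords",
    "punkt": "tokenizers/punkt",
    "wordnet": "corpora/wordnet",
}


def ensure_nltk_resources(resources: dict = NLTK_RESOURCES) -> None:
    """Download each NLTK resource only if it is not already installed."""
    import nltk  # imported lazily so this module loads even without NLTK

    for name, path in resources.items():
        try:
            nltk.data.find(path)             # raises LookupError if missing
        except LookupError:
            nltk.download(name, quiet=True)  # cached for later runs
```

Call `ensure_nltk_resources()` once at app start-up. Later runs find the cached data and skip the download. Depending on the NLTK version, `word_tokenize` may also need the `punkt_tab` resource in recent releases.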

▶️ Run the Streamlit App

python -m streamlit run app.py

Then open the URL shown in the terminal, for example:

http://localhost:8501

🗃️ Model & Tokenizer

This project uses:

A trained deep learning model (.h5)
A saved tokenizer (.pkl)

⚠️ These files are not included in the repository because of their size and for security reasons. To run predictions, place your trained model and tokenizer in the project root.
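Once both files are in place, prediction typically follows four steps: load the artifacts, convert the text to token ids, pad to a fixed length, and call `model.predict`. The sketch below makes several assumptions. The tokenizer is taken to be a pickled Keras `Tokenizer` saved as `tokenizer.pkl`, which is a hypothetical file name because the README does not name it. The model is assumed to end in a softmax over the categories and to have been trained on sequences pre-padded with 0. `load_artifacts` and `predict_probs` are hypothetical helpers. `tf.keras.models.load_model` and `Tokenizer.texts_to_sequences` are real Keras APIs.

```python
import pickle
from typing import Dict, Sequence

import numpy as np


def load_artifacts(model_path: str = "news_basic_dl_model.h5",
                   tokenizer_path: str = "tokenizer.pkl"):
    """Load the Keras model and the pickled tokenizer from the project root.

    "tokenizer.pkl" is a hypothetical file name; use the name you saved.
    """
    import tensorflow as tf  # imported lazily: only needed for real inference

    model = tf.keras.models.load_model(model_path)
    with open(tokenizer_path, "rb") as f:
        tokenizer = pickle.load(f)  # only unpickle files you trust
    return model, tokenizer


def predict_probs(text: str, model, tokenizer,
                  categories: Sequence[str], max_len: int) -> Dict[str, float]:
    """Return {category: probability} for one (already cleaned) document.

    Assumes the model's softmax outputs follow the order of `categories`.
    """
    seq = tokenizer.texts_to_sequences([text])[0]
    seq = seq[max(0, len(seq) - max_len):]  # keep the last max_len ids ("pre" truncation)
    batch = np.array([[0] * (max_len - len(seq)) + seq], dtype="int32")  # "pre" padding
    probs = model.predict(batch, verbose=0)[0]
    return {cat: float(p) for cat, p in zip(categories, probs)}
```

Apply the same text cleaning that was used in training before calling `predict_probs`. The returned dictionary is the kind of per-class probability table the app displays.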
