Preprocess text documents (PDFs) using Python NLP libraries. Extract text with pdfplumber, tokenize with NLTK and spaCy, remove Greek stopwords, and optionally handle punctuation. Includes scripts and folder structure for preparing datasets for machine learning or deep learning NLP workflows.
AlexTsev/NLP_Preprocess_Documents
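
The pipeline described above (extract, tokenize, drop Greek stopwords, optionally drop punctuation) might look roughly like the sketch below. It is not the repository's actual code: the function names `extract_text` and `preprocess`, the `remove_punctuation` flag, and the tiny `GREEK_STOPWORDS` set are all assumptions made for illustration. To keep the example dependency-light, tokenization uses a simple regular expression as a stand-in for the NLTK or spaCy tokenizers, and the stopword set is a hand-picked subset; NLTK's stopwords corpus provides a fuller Greek list via `nltk.corpus.stopwords.words("greek")` once the corpus has been downloaded.

```python
import re

# Hypothetical, hand-picked subset of Greek stopwords (illustration only).
# A fuller list is available from NLTK's stopwords corpus.
GREEK_STOPWORDS = {
    "και", "το", "η", "ο", "να", "με", "για",
    "του", "της", "την", "σε", "που",
}

# Regex stand-in for a real tokenizer: runs of word characters,
# or any single non-space, non-word character (punctuation).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def extract_text(pdf_path):
    """Return the text of every page in a PDF, joined by newlines."""
    # Imported lazily so the rest of the sketch runs without pdfplumber.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for pages with no text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def preprocess(text, remove_punctuation=True):
    """Lowercase, tokenize, and drop Greek stopwords (and optionally punctuation)."""
    tokens = TOKEN_RE.findall(text.lower())
    if remove_punctuation:
        # Keep only tokens that contain at least one letter or digit.
        tokens = [t for t in tokens if any(ch.isalnum() for ch in t)]
    return [t for t in tokens if t not in GREEK_STOPWORDS]


if __name__ == "__main__":
    sample = "Η γάτα και ο σκύλος, μαζί."
    print(preprocess(sample))
    print(preprocess(sample, remove_punctuation=False))
```

For a real PDF, the two steps would be chained as `preprocess(extract_text("some_file.pdf"))`, where the file name is a placeholder. Swapping the regex for `nltk.word_tokenize` or a spaCy pipeline only changes the line that builds `tokens`.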