Preprocess text documents (PDFs) using Python NLP libraries. Extract text with pdfplumber, tokenize with NLTK and spaCy, remove Greek stopwords, and optionally strip punctuation. Includes scripts and a folder structure for preparing datasets for machine learning or deep learning NLP workflows.
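
The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual scripts: the function names `extract_text` and `preprocess`, the `keep_punctuation` flag, and the tiny `GREEK_STOPWORDS` set are all assumptions made for the example. A plain regex tokenizer stands in for the NLTK and spaCy tokenizers so the sketch runs without downloading any models, and pdfplumber is imported lazily so the text-cleaning part can be tried on its own.

```python
import re
import unicodedata

# Tiny illustrative stopword set (assumption); a real run would load a full Greek list.
GREEK_STOPWORDS = {"ο", "η", "το", "οι", "και", "να", "σε", "με", "για", "που"}

# A token is either a run of word characters or a single punctuation mark.
# Stand-in for the NLTK / spaCy tokenizers named in the description.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def extract_text(pdf_path):
    """Concatenate the text of every page of a PDF using pdfplumber."""
    import pdfplumber  # lazy import: only needed when a PDF is actually read

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() may return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def preprocess(text, stopwords=GREEK_STOPWORDS, keep_punctuation=False):
    """Lowercase, tokenize, optionally drop punctuation, then drop stopwords."""
    # NFC keeps accented Greek letters as single code points so \w matches them.
    text = unicodedata.normalize("NFC", text).lower()
    tokens = TOKEN_RE.findall(text)
    if not keep_punctuation:
        tokens = [t for t in tokens if re.match(r"\w", t)]
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    print(preprocess("Για το βιβλίο και το τετράδιο."))
    # ['βιβλίο', 'τετράδιο']
```

In a fuller version, `preprocess` would typically be fed the output of `extract_text("some_document.pdf")` (a hypothetical path), and the regex could be swapped for a library tokenizer without changing the rest of the flow.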

AlexTsev/NLP_Preprocess_Documents
