Text mining in Python
The code in this repository is meant to run in a cluster environment, together with ToMaR.
This is currently work in progress (some scripts have not been converted yet), so it may not work in your environment at all.
pip install nltk
TODO: Configure the nltk_data directory.
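One possible way to do this (an assumption, not the repository's confirmed setup) is to point the NLTK_DATA environment variable at a shared directory before running the scripts, so that all cluster nodes find the same data:

```shell
# Hypothetical path; adjust to where your cluster stores shared NLTK data.
export NLTK_DATA=/path/to/shared/nltk_data
```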
Install the Punkt tokenizer (the "punkt" package includes the German sentence model):
python  (or: /opt/anaconda3/bin/python)
>>> import nltk
>>> nltk.download()
Downloader> d
Identifier> punkt
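Alternatively, the same download can be done non-interactively with NLTK's downloader module, which is convenient on a cluster (the target directory below is a placeholder):

```shell
# Download the punkt package into a specific nltk_data directory.
python -m nltk.downloader -d /path/to/shared/nltk_data punkt
```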
Edit some parameters: set the correct path to the input file, and configure the following paths:

java_path = "/path/to/java_8_JRE/bin/java"
model_ger = "/path/to/ner/model"
stanford_jar = "/path/to/stanford-ner.jar"
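As a sketch of how these parameters fit together (assuming the scripts use NLTK's Stanford NER interface; the paths and the example tokens are placeholders, not values from this repository):

```python
# Sketch only: wiring the configured paths into NLTK's Stanford NER tagger.
from nltk.internals import config_java
from nltk.tag import StanfordNERTagger

java_path = "/path/to/java_8_JRE/bin/java"   # Java 8 JRE binary
model_ger = "/path/to/ner/model"             # German NER model file
stanford_jar = "/path/to/stanford-ner.jar"   # Stanford NER jar

config_java(bin=java_path)  # tell NLTK which Java binary to invoke
tagger = StanfordNERTagger(model_ger, stanford_jar, encoding="utf8")

# tag() takes a list of tokens and returns (token, entity_label) pairs
tokens = ["Angela", "Merkel", "besucht", "Berlin", "."]
tagged = tagger.tag(tokens)
```

This runs the external Stanford NER jar via Java, so the paths above must point at a real JRE, model, and jar before it will work.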