- Python>=3.4
- miniconda/Anaconda
It is recommended to use miniconda (or conda) rather than Python's default pip as the library dependency manager, and to use the local environment of this repository for reproducibility.
- Install miniconda if conda is not yet installed on the system through Anaconda
- Set up the local environment:
conda env create --prefix ./envs -f environment.yml
- Activate the project environment by running
conda activate ./envs
- Start JupyterLab server:
jupyter lab
- To deactivate the environment, run
conda deactivate
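After activating the environment, it can be useful to confirm that Python is really running from the local ./envs prefix. The following is a minimal sketch, not part of the repository: the helper name in_project_env is hypothetical, and it assumes the script is started from the repository root.

```python
import os
import sys


def in_project_env(repo_root, env_dirname="envs"):
    """Return True if the running interpreter lives in <repo_root>/<env_dirname>."""
    expected = os.path.realpath(os.path.join(repo_root, env_dirname))
    # Inside an activated conda environment, sys.prefix is the environment prefix.
    actual = os.path.realpath(sys.prefix)
    return actual == expected


# Assumption: the current working directory is the repository root.
if in_project_env(os.getcwd()):
    print("Project environment is active:", sys.prefix)
else:
    print("Not running from ./envs; current prefix is", sys.prefix)
```

If the second message appears, run conda activate ./envs again before starting JupyterLab.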
To run the labeler locally, the following additional steps are required (restart the JupyterLab server afterwards):
- Make sure that NodeJS is installed with:
conda install -c conda-forge nodejs
- Enable Jupyter Widgets:
jupyter labextension install @jupyter-widgets/jupyterlab-manager
- Download the spaCy language model:
python -m spacy download en_core_web_md
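To check whether the model download succeeded before restarting JupyterLab, a small sketch like the one below may help. It relies on the fact that spaCy models downloaded this way are installed as regular Python packages; the helper name model_installed is hypothetical and not part of the repository.

```python
import importlib.util


def model_installed(name="en_core_web_md"):
    """Return True if a package with the given name is visible to the import system."""
    return importlib.util.find_spec(name) is not None


if model_installed():
    print("en_core_web_md is installed")
else:
    print("en_core_web_md is missing; run: python -m spacy download en_core_web_md")
```

The check only confirms that the package can be found; it does not load the model or verify its version.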
Raw annual reports, sustainability reports and, if available, Form 20-F filings of the Euro STOXX 50 companies for the years 1999-2019 can be found here.
- Install Docker Desktop (or Docker)
- The docker-compose.yml in the root directory contains the container definitions and sets up networking
- The Dockerfile in the ./data directory contains the container config for the PDF mining tasks
- Start tmux:
tmux
or, to start a named session:
tmux new -s myname
- Connect to the container by listing the running containers and then opening a shell in the pdf-mining container:
docker ps
docker exec -it pdf-mining bash
- Name the session by first pressing the prefix Ctrl+b and then $
- Start the long-running process and detach from the session by pressing the prefix Ctrl+b and then d
- Later, when you want to reattach to the session:
tmux list-sessions
tmux attach-session -t 0
- Update the environment file if a package was added:
conda env export -f environment.yml --no-builds
or
conda env export --no-builds > environment.yml
NOTE: This overwrites the whole file and adds unnecessary packages, which makes cross-platform compatibility difficult. It is therefore better to update the file manually.
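If an export is used as a starting point for a manual update, one machine-specific item that can be removed automatically is the prefix: line that conda env export writes at the end of its output. The sketch below is an illustration only: the function name strip_prefix_line is hypothetical, and it does not remove the extra transitive packages, which still need manual review.

```python
def strip_prefix_line(yaml_text):
    """Drop the top-level 'prefix:' entry from `conda env export` output."""
    kept = [
        line for line in yaml_text.splitlines()
        if not line.startswith("prefix:")  # absolute path of the local environment
    ]
    return "\n".join(kept) + "\n"


# Made-up sample export, used only to illustrate the behavior.
sample = (
    "name: demo\n"
    "dependencies:\n"
    "  - python=3.8\n"
    "prefix: /home/user/project/envs\n"
)
print(strip_prefix_line(sample))
```

The function works on text, so the exported content can be reviewed and filtered before anything is written back to environment.yml.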