PDF Processing Software

Description

Software that processes papers in PDF format by calling Grobid's web service. For each paper it generates a word cloud. It also produces a graph showing the number of figures per article and a list of the links found across all of the papers.
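The call to Grobid's web service can be pictured with the sketch below. It uses only the Python standard library and is not the project's code (the instructions install Grobid's Python client for that). It assumes Grobid is reachable at http://localhost:8070, as in the Docker command in this README, and that its processFulltextDocument service takes the PDF in a multipart form field named input and returns TEI XML. build_fulltext_request and process_pdf are hypothetical names.

```python
import os
import urllib.request
import uuid

# Assumption: Grobid is running locally on port 8070 (see the docker run command).
GROBID_URL = "http://localhost:8070"


def build_fulltext_request(pdf_bytes: bytes, filename: str,
                           server: str = GROBID_URL) -> urllib.request.Request:
    """Build (without sending) a multipart POST for Grobid's processFulltextDocument."""
    boundary = uuid.uuid4().hex
    # The PDF goes into a single form field named "input".
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="input"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{server}/api/processFulltextDocument",
        data=head + pdf_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def process_pdf(path: str, server: str = GROBID_URL, timeout: float = 120.0) -> str:
    """Send one PDF to Grobid and return the TEI XML it responds with."""
    with open(path, "rb") as f:
        request = build_fulltext_request(f.read(), os.path.basename(path), server)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With Grobid running, process_pdf("pdfs/paper.pdf") would return the TEI document for one paper; the word cloud, figure count and links are then derived from that XML.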

Requirements

Input papers must have an abstract section; otherwise the software will fail.
Docker must be installed.
Download the Grobid Docker image with:

docker pull lfoppiano/grobid:0.7.2
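Because the software fails on papers without an abstract, a check like the one below can report the problem with a clear message instead. It is a minimal sketch, not the project's code: it assumes the TEI XML returned by Grobid places the abstract in a tei:abstract element (TEI namespace http://www.tei-c.org/ns/1.0), and extract_abstract and MissingAbstractError are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Namespace used by the TEI XML that Grobid returns.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


class MissingAbstractError(ValueError):
    """Raised when a paper's TEI output contains no abstract text."""


def extract_abstract(tei_xml: str) -> str:
    """Return the abstract as one whitespace-normalised string, or raise MissingAbstractError."""
    root = ET.fromstring(tei_xml)
    abstract = root.find(".//tei:abstract", TEI_NS)
    text = ""
    if abstract is not None:
        # itertext() walks nested <div>/<p> elements; join with spaces, then collapse whitespace.
        text = " ".join(" ".join(abstract.itertext()).split())
    if not text:
        raise MissingAbstractError("no abstract section found; this paper cannot be processed")
    return text
```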

Dependencies

This software was developed with Python 3.10 and should work with later versions.

The Python libraries matplotlib and wordcloud must be installed beforehand.
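A quick way to confirm the environment before running the script is sketched below. check_environment and missing_modules are hypothetical helpers that use only the standard library; the 3.10 minimum and the two library names mirror the requirements stated above.

```python
import importlib.util
import sys

# Minimum version and libraries named in this README.
REQUIRED_PYTHON = (3, 10)
REQUIRED_MODULES = ["matplotlib", "wordcloud"]


def missing_modules(names):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < REQUIRED_PYTHON:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python 3.10 or later is required, found {found}")
    for name in missing_modules(REQUIRED_MODULES):
        problems.append(f"missing library '{name}' (try: python3 -m pip install {name})")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```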

The full list of dependencies is in the dependencies directory of the repository and can be used to build the environment with Conda.

Conda

You can install Conda to easily install all the required dependencies in an environment (recommended).

If you don't want to use Conda, skip the Conda commands in step 3 of the Instructions section and install the dependencies yourself.

Instructions

Clone this repository:

git clone https://github.com/andriumon/OpenScienceAI.git

Go to the repository's src directory:

cd OpenScienceAI/src

Install the dependencies yourself, or copy the dependencies file into the src directory and install them with Conda:

conda create -n newenv python=3.10
conda activate newenv
python3 -m pip install --upgrade pip
python3 -m pip install -r dependencies.txt

Note: if python3 doesn't work (for example, on Windows), try py instead.

Create a folder called "pdfs" in the src directory and put all the papers you want to process inside it.
Install Grobid's Python client in the src directory.
Run Grobid with Docker:

docker run -t --rm -p 8070:8070 lfoppiano/grobid:0.7.2
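Once the container is up, a pre-run check can confirm that the pdfs folder contains papers and that Grobid answers on port 8070. This sketch assumes Grobid's /api/isalive endpoint, which replies with true when the service is ready; find_pdfs and grobid_is_alive are hypothetical names, not part of pdfProcessing.py.

```python
import http.client
import urllib.request
from pathlib import Path


def find_pdfs(folder="pdfs"):
    """Return the PDF files directly inside `folder`, sorted by name."""
    path = Path(folder)
    if not path.is_dir():
        return []
    # Match the extension case-insensitively so "Paper.PDF" is picked up too.
    return sorted(p for p in path.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")


def grobid_is_alive(server="http://localhost:8070", timeout=5.0):
    """Return True if Grobid's /api/isalive endpoint answers "true", else False."""
    try:
        with urllib.request.urlopen(f"{server}/api/isalive", timeout=timeout) as response:
            return response.read().decode("utf-8").strip().lower() == "true"
    except (OSError, ValueError, http.client.HTTPException):
        # Refused connections, timeouts, HTTP errors and malformed URLs all mean "not ready".
        return False


if __name__ == "__main__":
    pdfs = find_pdfs()
    print(f"{len(pdfs)} PDF(s) found in pdfs/")
    print("Grobid is up" if grobid_is_alive() else "Grobid is not reachable on port 8070")
```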

Run the script:

python3 pdfProcessing.py

The results are saved in the folders "wordclouds", "figures" and "links", which the script creates in the src directory when it runs.
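The figure counts and link lists can be derived from Grobid's TEI output roughly as follows. This is a sketch under stated assumptions rather than what pdfProcessing.py does: figures are counted as tei:figure elements not marked type="table", links are taken from http(s) target attributes such as those on ptr and ref elements, and count_figures, extract_links and save_links are hypothetical helpers (save_links writes one text file per paper into the links folder).

```python
import xml.etree.ElementTree as ET
from pathlib import Path

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def count_figures(tei_xml):
    """Count <figure> elements, skipping those Grobid marks as tables (type="table")."""
    root = ET.fromstring(tei_xml)
    figure_tag = f"{{{TEI_NS['tei']}}}figure"
    return sum(1 for fig in root.iter(figure_tag) if fig.get("type") != "table")


def extract_links(tei_xml):
    """Collect http(s) URLs from `target` attributes, without duplicates, in first-seen order."""
    root = ET.fromstring(tei_xml)
    links = []
    for element in root.iter():
        target = element.get("target", "")
        if target.startswith(("http://", "https://")) and target not in links:
            links.append(target)
    return links


def save_links(name, links, out_dir="links"):
    """Write one link per line to <out_dir>/<name>.txt, creating the folder if needed."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    out_file = folder / f"{name}.txt"
    out_file.write_text("".join(f"{link}\n" for link in links), encoding="utf-8")
    return out_file
```

Internal references such as target="#b0" (bibliography pointers) are skipped because only http and https targets are kept.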

Workflow

Contact

Main author and contact: andres.montero.martin@alumnos.upm.es
