andriumon/OpenScienceAI

PDF Processing Software


Description

Software that processes papers in PDF format by calling Grobid's web service. It generates a word cloud for each paper, a chart showing the number of figures per article, and a list of the links found across all papers.
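As an illustration of how the figure counts and link lists could be derived, here is a minimal sketch that reads a Grobid TEI XML result with Python's standard library. It is not taken from pdfProcessing.py: the function name summarize_tei and the sample document are hypothetical, and it assumes that Grobid marks figures with figure elements (with tables tagged type="table") and external links with ptr elements carrying a target attribute.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI documents such as the ones Grobid returns.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def summarize_tei(tei_xml: str) -> dict:
    """Count figures and collect link targets in a TEI XML string.

    Hypothetical helper: assumes tables appear as <figure type="table">
    and are not counted as figures.
    """
    root = ET.fromstring(tei_xml)
    figures = [
        fig for fig in root.findall(".//tei:figure", TEI_NS)
        if fig.get("type") != "table"
    ]
    links = [ptr.get("target") for ptr in root.findall(".//tei:ptr[@target]", TEI_NS)]
    return {"figure_count": len(figures), "links": links}


# Small hand-written sample standing in for a real Grobid response.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <figure xml:id="fig_0"><head>Figure 1</head></figure>
    <figure xml:id="fig_1"><head>Figure 2</head></figure>
    <figure type="table" xml:id="tab_0"><head>Table 1</head></figure>
    <p>Code is available at <ptr target="https://example.org/code"/>.</p>
  </body></text>
</TEI>"""

summary = summarize_tei(SAMPLE_TEI)
print(summary)
```

The per-article figure counts could then be passed to matplotlib for the chart, and the link lists written to a text file.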

Requirements

  • Input papers must have an abstract section, or the software will fail.
  • Docker must be installed.
  • Download the Grobid Docker image with
docker pull lfoppiano/grobid:0.7.2
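Since a paper without an abstract makes the software fail, it can be useful to detect such papers from Grobid's output before processing them further. The sketch below is one possible check, not the script's own logic: extract_abstract is a hypothetical helper, and it assumes the abstract sits under profileDesc/abstract in the TEI header.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_abstract(tei_xml: str) -> str:
    """Return the abstract text of a TEI document, or "" if there is none.

    Hypothetical helper: assumes the abstract is located at
    teiHeader/profileDesc/abstract.
    """
    root = ET.fromstring(tei_xml)
    abstract = root.find(".//tei:profileDesc/tei:abstract", TEI_NS)
    if abstract is None:
        return ""
    # Join all nested text and collapse runs of whitespace.
    return " ".join("".join(abstract.itertext()).split())


# Hand-written samples standing in for real Grobid responses.
WITH_ABSTRACT = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><profileDesc>
  <abstract><div><p>We study   PDF
  processing.</p></div></abstract>
</profileDesc></teiHeader></TEI>"""

WITHOUT_ABSTRACT = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader>
  <profileDesc/>
</teiHeader></TEI>"""

for name, doc in [("with", WITH_ABSTRACT), ("without", WITHOUT_ABSTRACT)]:
    text = extract_abstract(doc)
    print(name, "->", text if text else "no abstract found, skip this paper")
```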

Dependencies

This build was developed on Python 3.10 and should work with later versions.

The Python libraries matplotlib and wordcloud must be installed beforehand.
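A quick way to confirm that both libraries are available in the active environment is sketched below. The helper name missing_packages is hypothetical. It only checks that each package can be located for import and does not verify versions.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["matplotlib", "wordcloud"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```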

The list of dependencies can be found here and used to build the environment with Conda.

Conda

You can install Conda to easily install all the required dependencies in an environment (recommended).

If you don't want to use Conda, skip step 3 of the Instructions section.

Instructions

  1. Clone this repo
git clone https://github.com/andriumon/OpenScienceAI.git
  2. Change into the repo's src directory
cd OpenScienceAI/src
  3. Install the dependencies manually, or copy the dependencies file to the src directory and install them in a Conda environment with
conda create -n newenv python=3.10  
conda activate newenv  
python3 -m pip install --upgrade pip  
pip install -r dependencies.txt

Note: If python3 doesn't work, try py

  4. Create a folder called "pdfs" in the src directory and put all the papers you want to process in it
  5. Install Grobid's Python client there
  6. Run Grobid with Docker
docker run -t --rm -p 8070:8070 lfoppiano/grobid:0.7.2
  7. Run the script
python3 pdfProcessing.py

You can check the results in the folders "wordclouds", "figures" and "links", which will be created in the directory after you run the script.
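Before running the script, it may help to confirm that the Grobid container is reachable on port 8070. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes the service is exposed at http://localhost:8070 as in the docker run command above and answers "true" on Grobid's /api/isalive endpoint.

```python
import urllib.request


def alive_url(base_url: str) -> str:
    """Build the URL of Grobid's health-check endpoint."""
    return base_url.rstrip("/") + "/api/isalive"


def grobid_is_alive(base_url: str = "http://localhost:8070", timeout: float = 5.0) -> bool:
    """Return True if the Grobid service answers its health check.

    Any connection problem (refused, timed out, HTTP error) counts as "not alive".
    """
    try:
        with urllib.request.urlopen(alive_url(base_url), timeout=timeout) as resp:
            return resp.status == 200 and resp.read().decode().strip() == "true"
    except (OSError, ValueError):
        # URLError and HTTPError are subclasses of OSError.
        return False
```

Calling grobid_is_alive() returns False while the container is still starting, so it can be retried for a few seconds before launching pdfProcessing.py.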

Workflow

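For each PDF in the "pdfs" folder, the script calls the Grobid web service and uses the result to build a word cloud for that paper. It then produces a chart with the number of figures per article and a list of the links found across all papers.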

Contact

Main author and contact: andres.montero.martin@alumnos.upm.es
