ClementSicard/un-semun

🇺🇳 SemUN repository

Repository for the SemUN project, which runs as a docker-compose stack.

Description & Paper

  • For more information on the project, refer to the project proposal
  • For more details about the final result, refer to the paper

Running the project

Install requirements

You need to have Docker installed. I'm using OrbStack as a Docker desktop client for macOS, but a regular Docker installation works just as well.

Run the project

Once Docker is set up, run:

# Start the containers
docker-compose up -d

Open the frontend at http://localhost:8080/ if using Docker Desktop or http://un-semun-frontend.un-semun.orb.local/ if using OrbStack.

Stop the stack

To stop the stack, just run:

docker-compose down

You are all set! 🎉

Ingest documents using the ML pipeline API

To ingest documents, you can use the ML pipeline API. You can find more information about it in the README.md of the un-ml-pipeline folder.

Send a POST request to the /run endpoint at http://un-semun-api.un-semun.orb.local with a JSON body consisting of a list of objects, one per record to ingest:

[
    {"recordId": "<record_id_0>"},
    {"recordId": "<record_id_1>"},
    {"recordId": "<record_id_2>"},
    ...
]
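
As a rough illustration, the request above could be built from Python using only the standard library. This is a minimal sketch, not the project's own client: the helper names `build_run_request` and `send` are hypothetical, the `Content-Type` header is an assumption about what the API expects, and the base URL is the OrbStack hostname quoted above (adjust it for your setup). The response format is not documented here, so the sketch simply returns the raw body.

```python
import json
import urllib.request

# Assumption: OrbStack hostname quoted in this README; change it for other setups.
API_URL = "http://un-semun-api.un-semun.orb.local"


def build_run_request(record_ids, base_url=API_URL):
    """Build (but do not send) a POST request for the /run endpoint.

    Hypothetical helper: the body is a JSON list of {"recordId": ...} objects.
    """
    payload = [{"recordId": str(rid)} for rid in record_ids]
    return urllib.request.Request(
        url=f"{base_url}/run",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the API accepts a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (requires the stack to be running; the record IDs are placeholders):
# print(send(build_run_request(["<record_id_0>", "<record_id_1>"])))
```

Keeping request construction separate from sending makes it easy to inspect the payload before anything reaches the API.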

You can also send a POST request to the /run_search endpoint, at the same URL, with a natural language query for the UN Digital Library. The API will then scrape the results and ingest them into the database.

{
  "q": "<query>"
}

You can also limit the number of results to scrape by adding a field "n": <value> to the payload.

For instance:

{
  "q": "Women in peacekeeping",
  "n": 256
}
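
A search request like this one could be sent from Python along the following lines. Again, this is only a sketch under assumptions: `build_search_request` and `send` are hypothetical helpers, the `Content-Type` header is assumed, and the base URL is the OrbStack hostname given earlier. The "n" field is only included when a limit is passed.

```python
import json
import urllib.request

# Assumption: OrbStack hostname quoted in this README; change it for other setups.
API_URL = "http://un-semun-api.un-semun.orb.local"


def build_search_request(query, n=None, base_url=API_URL):
    """Build (but do not send) a POST request for the /run_search endpoint.

    Hypothetical helper: "q" is the natural language query and "n" the
    optional limit on the number of results to scrape.
    """
    payload = {"q": query}
    if n is not None:
        payload["n"] = n
    return urllib.request.Request(
        url=f"{base_url}/run_search",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the API accepts a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (requires the stack to be running):
# print(send(build_search_request("Women in peacekeeping", n=256)))
```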