evidence

evidence is a doc2vec-based assisted close reading tool with support for abstract concept-based search and context-based search.

Five recommendations for fair software from fair-software.nl:

  1. Code repository: GitHub
  2. License
  3. Community registry: Research Software Directory
  4. Enable citation: DOI
  5. Checklist: N/A

Other best practices:

  • Test model generation
  • Frontend
  • docker-compose
  • GitHub Super Linter (Lint Code Base)
  • Markdown Link Checker (Check Markdown links)

Machine-supported research in humanities

Research in the humanities has benefited from the digitization of text corpora and the development of computer-based text analysis tools. However, the interface that current systems offer is incompatible with the proven method of scholarly close reading of texts, which is key in many research scenarios that pursue complex research questions.

In practice, it is often restrictive and difficult, if not impossible, to formulate adequate selection criteria in a keyword-based search, which is the standard entry point to digitized text collections. This holds in particular for more complex or abstract concepts.

Querying by example - close reading with tailored suggestions

evidence provides an alternative, intuitive entry point into collections by leveraging the doc2vec framework. Using doc2vec, evidence learns abstract representations of the theme and content of the elements of the user's corpus. Instead of translating a research question into keywords, the user compiles a set of relevant elements as starting points, i.e. examples of the concept they are interested in, and queries the corpus with these examples. Specifically, evidence retrieves elements with similar abstract representations and presents them to the user, using the user's feedback to refine its retrieval. This concept-based query mode is complemented by additional retrieval through the more-like-this context-based retrieval function provided by Elasticsearch. Together, this lets a user combine the power of a close-reading approach with that of a large digitized corpus: evidence selects elements from the entire corpus that are likely to be of interest, but leaves the decision as to which evidence is useful up to the user.

Documentation for users

Running the demo

The repository contains a demonstration that includes a corpus and a model. The demonstration allows you to explore the features of this software without supplying your own corpus.

Prerequisites:

Step 1

First, test that the Docker installation is working. Depending on your system, use either PowerShell (on Windows) or a terminal (on Linux or macOS).

  • For Windows:

    Open a PowerShell prompt (press Windows+S and type PowerShell) and run:

    docker run hello-world
  • For Linux/macOS:

    Open a terminal and run:

    docker run hello-world

This should show a message that your Docker installation is working correctly. If so, you can proceed to the installation of evidence; otherwise, we suggest checking the Docker troubleshooting page.

Step 2

Download a copy of the evidence archive and extract its contents on your machine.

Alternatively, if you have git installed, you can also clone the repository.

git clone https://github.com/ADAH-EviDENce/evidence.git

Step 3

  • For Windows:

    • Open a PowerShell prompt

    • Change your current working directory to where you extracted the files. For instance:

      cd C:\Users\JohnDoe\Downloads\evidence-master\evidence-master
  • For Linux/macOS:

    • Open a terminal

    • Change your current working directory to where you extracted the files. For instance:

      cd /home/JohnDoe/Downloads/evidence

The demo can be started with the commands below. Keep this PowerShell/Terminal window open and running during the demo.

  • Set the experiment name

    For Windows:

    $Env:EXPERIMENT="getuigenverhalen"

    For Linux/macOS:

    export EXPERIMENT="getuigenverhalen"
  • Start the demo

    docker-compose up --build

The command above downloads the necessary Docker images, builds the images for evidence, and starts the demo.

The command prints many log messages. If all goes well, the last lines of the output should be:

...
indexer_1        | Indexing done.
evidence-master_indexer_1 exited with code 0

If you run into problems with this step, check troubleshooting.

Step 4

Go to the following URL in your web browser: http://localhost:8080/.
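If the page does not load right away, the containers may still be starting. On Linux/macOS, a small polling loop run from a second terminal can tell you when the frontend begins to answer. The sketch below is not part of evidence: `wait_for_url` is a hypothetical helper, and it assumes that `curl` is installed and that the frontend answers plain HTTP requests at the URL given above.

```shell
# Hypothetical helper (not part of evidence): poll a URL until it answers.
# Arguments: URL, number of attempts (default 30), seconds between attempts (default 2).
wait_for_url() {
    url=$1 attempts=${2:-30} delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # --fail makes curl return non-zero on HTTP error responses (4xx/5xx)
        if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
            echo "$url is responding"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$url did not respond after $attempts attempts" >&2
    return 1
}

# Usage, from a second terminal while the demo is running:
#   wait_for_url http://localhost:8080/
```

The attempt count and delay are arbitrary defaults; increase them on slower machines.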

Step 5

Once you are done exploring the demo, you can stop it by selecting the PowerShell/Terminal window that is still running the demo and pressing Ctrl+C.

Generating a model

Prerequisites

Verify that your docker-compose version is at least 1.25.4. (Earlier versions may work).

docker-compose --version

Verify that your docker version is at least 19.03.12. (Earlier versions may work).

docker --version
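On Linux/macOS, both version checks can be scripted. The sketch below is not part of evidence: the helper names are hypothetical, it assumes that the first dotted number in each tool's `--version` output is the version, and it relies on `sort -V` (version sort) as provided by GNU coreutils.

```shell
# Hypothetical helpers (not part of evidence) for checking tool versions.

# Print the first dotted version number found in a string,
# for example in the output of `docker --version`.
extract_version() {
    printf '%s\n' "$1" | head -n 1 | sed -E 's/[^0-9]*([0-9]+(\.[0-9]+)*).*/\1/'
}

# Succeed if version $1 is greater than or equal to version $2.
# With `sort -V`, the smaller version comes first; the check passes
# when that smaller (or equal) version is the required minimum.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare one installed tool against the minimum named in this README.
check_tool() {
    tool=$1 minimum=$2
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not installed"
        return 1
    fi
    found=$(extract_version "$("$tool" --version)")
    if version_at_least "$found" "$minimum"; then
        echo "$tool $found: ok"
    else
        echo "$tool $found: older than $minimum (may still work)"
    fi
}

check_tool docker 19.03.12 || true
check_tool docker-compose 1.25.4 || true
```

An older version is reported but not treated as an error, in line with the note that earlier versions may work.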

If you want to use your own corpus, refer to ./experiments/README.md for notes on the required format and directory layout.

Define which corpus to use

Define the name of the dataset/experiment. Here we choose 'getuigenverhalen'. The corpus files should reside under ./experiments/<EXPERIMENT>/corpus, relative to the repo root directory; see the sample corpora.

export EXPERIMENT=getuigenverhalen
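Before building, it can help to confirm that the corpus directory for the chosen experiment actually exists. The sketch below is not part of evidence: `check_corpus` is a hypothetical helper, and it assumes that the experiments directory sits directly under the repo root. It checks only that the directory exists and is not empty, not that the files follow the required format.

```shell
# Hypothetical pre-flight check (not part of evidence): confirm that the
# corpus directory for an experiment exists and contains at least one entry.
# Arguments: experiment name, repo root (defaults to the current directory).
check_corpus() {
    corpus_dir="${2:-.}/experiments/$1/corpus"
    if [ ! -d "$corpus_dir" ]; then
        echo "missing corpus directory: $corpus_dir" >&2
        return 1
    fi
    # ls -A lists every entry except . and .., so empty output means an empty directory
    if [ -z "$(ls -A "$corpus_dir")" ]; then
        echo "corpus directory is empty: $corpus_dir" >&2
        return 1
    fi
    echo "using corpus in $corpus_dir"
}

# Usage, starting from the repo root directory:
#   check_corpus "$EXPERIMENT"
```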

Building the model generation image

Be aware that building can take a couple of minutes.

# (starting from the repo root directory)
docker-compose --file generate-model.yml build generate-model

Generating the doc2vec model

# (starting from the repo root directory)
docker-compose --file generate-model.yml run --user $(id -u):$(id -g) generate-model

Build the user interface web application and start it

# (starting from the repo root directory)
export EXPERIMENT=getuigenverhalen
docker-compose build
docker-compose up

The frontend should now be available at http://localhost:8080.

We strongly suggest not making the frontend publicly available, as there is no authentication: anyone with the URL will have access to the frontend. Running it on a local network, for example a university network, should protect it from most evil-doers.

Besides using a web browser, you can also interact with the frontend from the command line; see here and here for examples.

Optional: manage frontend users

The first page of the frontend asks you to select a user ('gebruiker' in Dutch). A user called demo exists and can be selected.

Change initial user

The initial user named demo can be renamed by setting the FRONTEND_USER environment variable before running docker-compose up.

For example, to use myinitialusername as the user name, run the following:

# (starting from the repo root directory)
export EXPERIMENT=getuigenverhalen
export FRONTEND_USER=myinitialusername
docker-compose up

Add additional users

If the existing user is not enough, you can add a new user to the frontend with the following commands (choose your own user name by replacing the value mynewusername):

export EXPERIMENT=getuigenverhalen
export FRONTEND_USER=mynewusername
docker-compose run usercreator

To add more users, repeat the command with different values for FRONTEND_USER.
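On Linux/macOS, the repetition can be wrapped in a loop. The sketch below is not part of evidence: `add_frontend_users` and the `COMPOSE_CMD` override are hypothetical names introduced here, and the user names in the usage line are placeholders. It assumes that EXPERIMENT has already been exported as shown above.

```shell
# Hypothetical helper (not part of evidence): create one frontend user
# per argument by repeating the usercreator command from this README.
add_frontend_users() {
    for name in "$@"; do
        # FRONTEND_USER is passed to this single invocation only.
        # COMPOSE_CMD is a hypothetical override, handy for a dry run.
        FRONTEND_USER="$name" ${COMPOSE_CMD:-docker-compose} run usercreator || return 1
    done
}

# Usage, starting from the repo root directory:
#   export EXPERIMENT=getuigenverhalen
#   add_frontend_users alice bob
```

The loop stops at the first failing command, so a problem with one user name does not go unnoticed.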


Documentation for developers

Checking the Markdown links

When updating the documentation, you can check if the links are all working by running:

npm install
npm run mlc

Related repositories

https://github.com/ADAH-EviDENce/EviDENce_doc2vec_docker_framework

https://github.com/ADAH-EviDENce/evidence-gui