# Local DraCor instance with Docker

This notebook explains how you can set up a local DraCor instance using [Docker](https://www.docker.com). At the moment, the instructions are **only valid for Mac/Linux**.

## Installing Docker and Docker Compose

To run a local instance of the DraCor platform, Docker and Docker Compose must be installed on the machine on which you execute this notebook. Please refer to the documentation of both tools and follow their installation instructions.

## Running commands in the Terminal/Shell
We will rarely use Python code in this notebook. Instead, we execute various commands in the shell/terminal of your machine. If you add `!` before a command, it is executed in a shell in the background and you will see the results in the notebook. For example, to test whether Docker is installed on a Linux/Mac machine, you can use `which` to get the location of the program.

In [None]:
#by adding ! before the code, you will execute a command in the terminal
#check, where docker is installed
!which docker

In [None]:
#check for docker compose
!which docker-compose
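
The same check can also be done from Python, which is handy if you want the notebook to print a clear message when a tool is missing. The following is a minimal sketch using `shutil.which` from the standard library. The helper name `find_tools` is made up for this example, and `git` is included because it is needed later in this notebook.

```python
import shutil

def find_tools(names):
    """Map each tool name to its location on the PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}

# tools this notebook relies on
tools = find_tools(["docker", "docker-compose", "git"])
for name, path in tools.items():
    print(name, "->", path if path else "NOT FOUND")
```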

## Getting the DraCor API
Next, you need the code of the DraCor API, which can be found on GitHub: https://github.com/dracor-org/dracor-api.

We will use `git` to get the code. If you are using a Mac and have the [Xcode](https://en.wikipedia.org/wiki/Xcode) Command Line Tools installed, you should be able to follow along. If this doesn't work, you can still visit the repository, download the code and put it into the folder in which your notebook is located.

In [None]:
!which git

In [None]:
# set the url we will need for cloning and store it in a variable.
# We can use the variable in our terminal command below by prefixing the variable name with a $.
dracor_api_clone_url = "https://github.com/dracor-org/dracor-api.git"

In [None]:
!git clone $dracor_api_clone_url
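
Note that `git clone` reports an error if the target folder already exists, for example when you run the cell a second time. As a hedged sketch, the clone could be made repeatable from Python like this; the function name `clone_if_missing` is an assumption made for this example and is not part of the DraCor code.

```python
import subprocess
from pathlib import Path

def clone_if_missing(url, target):
    """Clone `url` into `target` unless that folder already exists.

    Returns True if a clone was performed, False if it was skipped.
    """
    if Path(target).exists():
        return False
    # check=True raises CalledProcessError if git exits with an error
    subprocess.run(["git", "clone", url, target], check=True)
    return True
```

Calling `clone_if_missing("https://github.com/dracor-org/dracor-api.git", "dracor-api")` would then clone the repository only on the first run.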

If this was successful, you should now have a new folder called `dracor-api`. You can list the contents of the directory with `ls`.

In [None]:
# list the contents of your directory
!ls

You could put the code somewhere else as well, but then you have to change the path that we assign to the variable `apifolder`:

In [None]:
# set the path to the code of the dracor api:
apifolder = "dracor-api"

## Building DraCor with Docker
(Skip this section for now, but logically it should be described here; see also the ANT error described below.)

When you run a DraCor Docker container for the first time, an image is built and then reused on subsequent starts. You can also force a rebuild of the image that does not rely on the cache.

In [None]:
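# make a clean build without using cached images
# (we are not inside the API folder, so we point to the compose file with -f)
!docker-compose -f $apifolder/docker-compose.yml build --no-cache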

## Running DraCor with Docker
The `README.md` of the DraCor API repository explains how to run the platform using Docker. You can have a look at the [instructions on GitHub](https://github.com/dracor-org/dracor-api#getting-started) or display the file in this notebook by using `cat`:

In [None]:
# display the contents of the README file
!cat $apifolder/README.md

If we were working in a terminal, we would follow the instructions as detailed in the `README.md`:

```
git clone https://github.com/dracor-org/dracor-api.git
cd dracor-api
docker-compose up
```

We already completed the first step, but because we are executing the code from this notebook, we have to adapt the remaining steps slightly. Although we can execute commands in the terminal directly, not all of them work as expected. More information on this behaviour of Jupyter notebooks can be found in a tutorial from the O'Reilly [Python Data Science Handbook](https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.05-IPython-And-Shell-Commands.ipynb) (VanderPlas 2016).

Although it is possible to change the working directory with a "magic command", we won't "go" into the `dracor-api` folder with `cd`. Instead, we pass the path to the folder to each command.

In [None]:
# list the contents of the API folder
!ls $apifolder

For the next step we will use the `docker-compose.yml` file that is located in the source code folder. With the command `grep` we can filter the directory listing above and see if the needed file is there as we expect:

In [None]:
# list the contents of the folder and check, if the compose file is there
!ls $apifolder | grep docker-compose.yml
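
If you prefer to stay in Python, the same check can be sketched with `pathlib`. Unlike `grep`, which treats the pattern as a regular expression and also matches partial names, this looks for exactly one file name. The helper name `has_compose_file` is made up for this example.

```python
from pathlib import Path

def has_compose_file(folder, filename="docker-compose.yml"):
    """Return True if `filename` exists as a regular file inside `folder`."""
    return (Path(folder) / filename).is_file()

# prints True if the compose file is where we expect it
print(has_compose_file("dracor-api"))
```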

We can now run the `docker-compose` command and explicitly tell the tool where to find the compose file by using the `-f` flag. Running the command in the cell below starts the API using Docker. This takes some time, especially on the first run, because Docker has to build the image before starting it. You will see a lot of text output in the cell. Wait until the stream of messages slows down and look for lines similar to the following:

```
api_1       | 20 Jul 2022 07:45:39,137 [main] INFO  (JettyStart.java [run]:288) - Server has started, listening on: 
api_1       | 20 Jul 2022 07:45:39,138 [main] INFO  (JettyStart.java [run]:290) - http://XXX.XX.X.X:8080/  
```
These lines tell you that the underlying eXist application has finished starting. You should now be able to access your local DraCor instance by opening the URL from the output, or try http://localhost:8080, which will show you the eXist Dashboard. If you want to go to the frontend, visit http://localhost:8088.

Your DraCor platform will be empty, because the corpora have to be loaded manually. We will do this in the next steps.

You can stop your instance from this notebook by clicking on the cell below and using the stop button in the Jupyter notebook menu.

In [None]:
!docker-compose -f $apifolder/docker-compose.yml up

If this process fails, you might have to change the line `ENV ANT_VERSION 1.10.11` in the `Dockerfile` in the source code repo; see the corresponding [issue on GitHub](https://github.com/dracor-org/dracor-api/issues/164) (which might be resolved soon). The version should be changed to one that is available here: https://downloads.apache.org/ant/binaries/. After changing the line, you can also try to run `docker-compose build --no-cache` in your terminal from inside the `dracor-api` folder.

To proceed, you have to stop your new local instance of the DraCor platform, otherwise the following cells won't run. Use the stop button from the notebook menu.

## Running DraCor in the background using `os`
Running subprocesses in the background of a notebook doesn't seem to be trivial. For example, the following command will fail:

In [None]:
!docker-compose -f $apifolder/docker-compose.yml up&

We can still accomplish this by making use of the `os` module from the Python standard library. We import the module and then pass our `docker-compose` command to its `system` function.

In [None]:
# import the library
import os 

In [None]:
#store the command in a variable "cmd"
cmd = "docker-compose -f " + apifolder + "/docker-compose.yml up&"

#send the command
os.system(cmd);
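
As an alternative to `os.system`, Docker Compose has its own detached mode (`up -d`), which starts the containers and then returns. The following is a minimal sketch using `subprocess`; the function names are made up for this example. Keep in mind that containers started this way are not tied to the notebook: they keep running until you stop them, and their log output will not appear in the cells.

```python
import subprocess

def compose_command(apifolder, *args):
    """Build a docker-compose call that points at the compose file in `apifolder`."""
    return ["docker-compose", "-f", apifolder + "/docker-compose.yml", *args]

def start_detached(apifolder):
    """Start the containers in the background; `up -d` returns once they are started."""
    return subprocess.run(compose_command(apifolder, "up", "-d"), check=True)

def stop(apifolder):
    """Stop and remove the containers that were started from the compose file."""
    return subprocess.run(compose_command(apifolder, "down"), check=True)
```

With these helpers, `start_detached("dracor-api")` would start the instance and `stop("dracor-api")` would shut it down again.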

We now have a running local DraCor instance that we can access at http://localhost:8088. The instance will stop working when you shut down the kernel of this notebook.
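
Because the command returns immediately, the instance may still be starting when you run the next cells. A hedged sketch of how you could wait for it: poll a URL until it answers with status `200`. The function name `wait_until_up` and the default values are assumptions made for this example; it uses only the standard library.

```python
import http.client
import time
import urllib.error
import urllib.request

def wait_until_up(url, timeout=120, interval=2):
    """Poll `url` until it answers with status 200 or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # not reachable or not ready yet: try again after a short pause
            pass
        time.sleep(interval)
    return False
```

For example, `wait_until_up("http://localhost:8088/api/info")` would return `True` as soon as the API answers, or `False` if it does not do so within two minutes.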

## Using the local API
We not only have a running (but empty) local database and a frontend, but also a working local API at `http://localhost:8088/api`, which we will use in the remainder of the tutorial. We won't cover the basics of how to use the DraCor API in Python, so please refer to the introductory tutorial ["DraCor API"](https://github.com/dracor-org/dracor-notebooks/tree/main/api-tutorial) in the [dracor-notebooks](https://github.com/dracor-org/dracor-notebooks) repository.

We define a generic function to send requests to the API. It is essentially the generic function used in other DraCor notebooks, except that the default `apibase` is changed to the local API, `http://localhost:8088/api/`:

In [None]:
# import libraries json and requests
import json
import requests

def get(**kwargs):
    """Send a GET request to the DraCor API and return the response.

    Supported keyword arguments:
    apibase    -- base URL of the API (default: the local instance)
    corpusname -- name of a corpus
    playname   -- name of a play (only used together with corpusname)
    method     -- endpoint or sub-resource to request
    parse_json -- if True, return the parsed JSON instead of the raw text
    """
    # use the local API by default; a different apibase can be passed,
    # e.g. https://dracor.org/api/ (the staging server is not recommended)
    apibase = kwargs.get("apibase", "http://localhost:8088/api/")
    # make sure the base URL ends with a slash
    if not apibase.endswith("/"):
        apibase = apibase + "/"

    if "corpusname" in kwargs and "playname" in kwargs:
        # /api/corpora/{corpusname}/play/{playname}[/{method}]
        request_url = apibase + "corpora/" + kwargs["corpusname"] + "/play/" + kwargs["playname"]
        if "method" in kwargs:
            request_url = request_url + "/" + kwargs["method"]
    elif "corpusname" in kwargs:
        # /api/corpora/{corpusname}[/{method}]
        request_url = apibase + "corpora/" + kwargs["corpusname"]
        if "method" in kwargs:
            request_url = request_url + "/" + kwargs["method"]
    elif "method" in kwargs and "playname" not in kwargs:
        # /api/{method}
        request_url = apibase + kwargs["method"]
    else:
        # nothing usable set: fall back to /api/info
        request_url = apibase + "info"

    # send the request
    r = requests.get(request_url)
    if r.status_code != 200:
        raise Exception("Request was not successful. Server returned status code: " + str(r.status_code))

    # return parsed JSON only if explicitly requested, otherwise the raw text
    if kwargs.get("parse_json"):
        return json.loads(r.text)
    return r.text

In [None]:
#use the defined function to get API Info (default method, if nothing else is specified)
get(parse_json=True)

In addition to the result of the request you will also see the log from the running process in the background, e.g. 

```
frontend_1  | XXX.XXX.X.X - - [20/Jul/2022:09:58:24 +0000] "GET /api/info HTTP/1.1" 200 118 "-" "python-requests/2.27.1" "-"
```

This tells you that there has been a `GET` request to the `/info` endpoint, which was successful (see status code `200`).

You can also list the available corpora by calling the endpoint `/corpora` which should return an empty list `[]` because there are no corpora loaded yet:

In [None]:
get(method="corpora",parse_json=True)

## Add a corpus and load the data
The [documentation](https://github.com/dracor-org/dracor-api#load-data) in the `README.md` explains how corpora can be added and loaded by using `curl` in the command line. 

Adding a corpus is a two-step process:

* In a first step, the corpus is added to the database. This step only adds a few metadata: a `name`, a `title` and a link to the repository from which the data can be retrieved.
* In a second step, the TEI files of the plays are loaded from the repository.

We will add the [Test Drama Corpus](https://github.com/dracor-org/testdracor) "testdracor". We construct the metadata that will be sent as payload in the `POST` request to the `/corpora` endpoint (for details, please refer to the [API Documentation](https://dracor.org/doc/api#/admin/post-corpora)).

The endpoints in the "Admin" section (see [Documentation](https://dracor.org/doc/api#/admin)) are only available to authorized users with admin rights. The default user of the eXist-DB is `admin` and the password is an empty string. This should, of course, be changed for production use, but by default a local instance of the eXist-DB has these credentials, which we assign to the variables `usr` and `pwd`. To be able to include this information in the request, we first need to import the class `HTTPBasicAuth` from the `requests` library:

In [None]:
#needed for authorization
from requests.auth import HTTPBasicAuth

#Username of the local instance
usr = "admin"
#Password of the admin user
pwd = ""

We also have to construct the metadata of our corpus:

In [None]:
#construct the payload
testdracor_metadata = {
  "name": "test",
  "title": "Test Drama Corpus",
  "repository": "https://github.com/dracor-org/testdracor"
}

We can then send the `POST` request to the `/corpora` endpoint, supplying the metadata and the credentials of the admin user:

In [None]:
#url of the corpora endpoint
corpora_endpoint_url = "http://localhost:8088/api/corpora"

#send the POST request using library requests
r = requests.post(corpora_endpoint_url, json = testdracor_metadata, auth=HTTPBasicAuth(usr, pwd))

When running for the first time, the API should return an HTTP status code of `200` (actually, `201` would be better!). For other status codes, please check the documentation. For example, if a corpus already exists, the API will return a status code of `409`. To get the status code, you can read the attribute `status_code` of your response object `r`.

In [None]:
print(r.status_code)
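
To make the output easier to read, the status code can be translated into a short message. This is only a sketch covering the two codes mentioned above; the names `CORPORA_POST_STATUS` and `describe_status` are made up for this example, and any other code is reported as unexpected.

```python
# meanings taken from the text above; other codes are not covered here
CORPORA_POST_STATUS = {
    200: "corpus added",
    409: "a corpus with this name already exists",
}

def describe_status(code):
    """Return a short explanation for the status code of the POST request."""
    return CORPORA_POST_STATUS.get(code, "unexpected status code, see the API documentation")

print(describe_status(409))
```

In the notebook you would call it as `describe_status(r.status_code)`.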

If everything went well, you should see your newly added corpus listed on the DraCor homepage of your [local instance](http://localhost:8088/). Your corpus might show up twice, because DraCor isn't built to display only one corpus, but you can check whether your corpus has been added successfully by using the API. This should return only one entry in the list:

In [None]:
#send a request to the /corpora endpoint
get(method="corpora",parse_json=True)

## Loading the plays
If you go to your corpus on the local platform, e.g. http://localhost:8088/test, you will see that there are no plays included. Likewise, querying the `/corpora/{corpusname}` endpoint (see [Documentation](https://dracor.org/doc/api#/public/list-corpus-content)) will not list any plays yet:

In [None]:
# pass the corpus as "corpusname" so that /corpora/test is requested
get(corpusname="test",parse_json=True)

To trigger the loading process, you have to send the JSON object `{"load" : true}` (in a Python dictionary, the Boolean value is written `True`) to the `/corpora/{corpusname}` endpoint.

In [None]:
#construct the url
load_test_endpoint_url = "http://localhost:8088/api/corpora/test"

#construct the payload to be sent to the endpoint
load_cmd_payload = {"load" : True}

#send the POST request using library requests
r = requests.post(load_test_endpoint_url, json = load_cmd_payload, auth=HTTPBasicAuth(usr, pwd))

If a corpus update was scheduled, you should get a `202` status code:

In [None]:
# inspect the status code of your POST request
print(r.status_code)
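
The two steps, adding the corpus and triggering the load, can be wrapped in one helper. The following is a minimal sketch, not part of the DraCor code: the function name `add_and_load_corpus` is made up, and `session` is assumed to be an object that behaves like a `requests.Session` on which the admin credentials have already been set. The sketch sends the second request regardless of the outcome of the first and leaves error handling to the caller.

```python
def add_and_load_corpus(session, apibase, metadata):
    """Register a corpus and schedule loading of its plays.

    `session` is expected to behave like a `requests.Session` that already
    carries the admin credentials. Returns the two HTTP status codes.
    """
    corpora_url = apibase.rstrip("/") + "/corpora"
    # step 1: add the corpus metadata
    added = session.post(corpora_url, json=metadata)
    # step 2: ask the API to load the TEI files from the repository
    loaded = session.post(corpora_url + "/" + metadata["name"], json={"load": True})
    return added.status_code, loaded.status_code
```

With `requests`, a suitable session could be created with `session = requests.Session()` followed by `session.auth = ("admin", "")`. Going by the status codes described in this notebook, a call on a fresh instance would be expected to return `200` for the first step and `202` for the second.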

(Clearing Fuseki doesn't seem to work here. This needs to be fixed; see the corresponding [issue](https://github.com/dracor-org/dracor-api/issues/165).)

## Manually adding a play to a corpus
Plays can also be added to a corpus manually by sending a `PUT` request to the `/corpora/{corpusname}/play/{playname}/tei` endpoint. The documentation of the endpoint can be found [here](https://dracor.org/doc/api#/admin/play-tei-put).
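
A hedged sketch of what such an upload could look like is given below. The function name `put_play_tei` is made up for this example, `session` is assumed to behave like a `requests.Session` with the admin credentials set, and the `Content-Type` header is an assumption that should be checked against the documentation of the endpoint.

```python
def put_play_tei(session, apibase, corpusname, playname, tei_path):
    """Upload one TEI file to /corpora/{corpusname}/play/{playname}/tei.

    `session` is expected to behave like a `requests.Session` with admin
    credentials set. Returns the HTTP status code of the response.
    """
    url = apibase.rstrip("/") + "/corpora/" + corpusname + "/play/" + playname + "/tei"
    # read the TEI document as bytes so that its encoding is left untouched
    with open(tei_path, "rb") as tei_file:
        tei_data = tei_file.read()
    # the content type is an assumption; check the endpoint documentation
    response = session.put(url, data=tei_data, headers={"Content-Type": "application/xml"})
    return response.status_code
```

A call could look like `put_play_tei(session, "http://localhost:8088/api", "test", "my-play", "my-play.xml")`, where the play name and the file name are hypothetical.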