Otto Brinkhaus edited this page Nov 2, 2023 · 26 revisions

Welcome to the DECIMER_Web wiki!

Deep Learning for Chemical Image Recognition (DECIMER) is a step towards automated chemical image segmentation and recognition. DECIMER is actively developed and maintained by the Steinbeck group at the Friedrich Schiller University Jena.

How to run DECIMER_Web locally

git clone https://github.com/OBrink/DECIMER_Web.git
sudo chmod -R 777 DECIMER_Web
cd DECIMER_Web/
mv .env.example .env
sed -i '$ d' routes/web.php  # deletes the last line, "URL::forceScheme('https');"
sudo chmod -R 777 storage/
sudo chmod -R 777 bootstrap/cache/
docker-compose up --build -d
  • Open your browser (DECIMER works best on Chrome and Chromium-based web browsers) and enter http://localhost:80
  • On the first run, you will be asked to generate an app key for the Laravel app
  • Click on "Generate app key"
  • Refresh the webpage. Now, DECIMER_Web is running locally on your machine. Have fun!
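
If you script the setup, a small poll loop can tell you when the web server has started answering. This is only a sketch and not part of DECIMER_Web: the function name wait_for_app and its timeout values are assumptions, and any HTTP response, including an error status, is counted as "up".

```python
import time
import urllib.error
import urllib.request


def wait_for_app(url="http://localhost:80", timeout=600.0, interval=5.0):
    """Poll `url` until the web server answers or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=10):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if only with an error page.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet (for example, connection refused).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_app() after docker-compose up --build -d returns True once the page responds, at which point you can open it in the browser.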

How to scale down the hardware requirements when running DECIMER_Web locally

When the default version of the Docker container is launched, it consumes approximately 20 GB of RAM, because multiple instances of the deep learning models are kept loaded in memory at all times. As a result, no model has to be loaded when a user uploads a file, and the results arrive much sooner. When multiple people access decimer.ai simultaneously, the workload is distributed over these instances and the app does not slow down.

These preloaded model instances are implemented using Python's socket module. Each model instance is handled by a socket server. When the web application processes a file, it sends a request to one of the available socket servers, which processes the input data with its preloaded model and sends back the result.
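
The pattern described above can be sketched as follows. This is a minimal illustration, not the DECIMER_Web code: load_model, the upper-casing stand-in "model" and the one-message-per-connection exchange are all assumptions, since the actual message format is not described here.

```python
import socket
import threading


def load_model():
    """Stand-in for an expensive model load (hypothetical placeholder)."""
    return lambda text: text.upper()


def serve(server_sock, model):
    """Answer requests forever, reusing the model that is already in memory."""
    while True:
        conn, _ = server_sock.accept()
        with conn:
            data = conn.recv(4096)  # assumption: one small message per connection
            if data:
                conn.sendall(model(data.decode("utf-8")).encode("utf-8"))


def start_server(port=0, host="127.0.0.1"):
    """Load the model once, then serve in a background thread; returns the bound port."""
    model = load_model()  # paid once at startup, not per request
    server_sock = socket.create_server((host, port))  # port 0 picks any free port
    threading.Thread(target=serve, args=(server_sock, model), daemon=True).start()
    return server_sock.getsockname()[1]


def send_request(port, payload, host="127.0.0.1"):
    """Client side: send the input to one server and wait for its answer."""
    with socket.create_connection((host, port), timeout=5) as client:
        client.sendall(payload.encode("utf-8"))
        return client.recv(4096).decode("utf-8")
```

The point of the design is that load_model runs once per server process, so each request only pays for the prediction itself.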

By default, the following numbers of socket servers with preloaded models are launched:

  • 3 servers with the preloaded DECIMER Image Classifier
  • 3 servers with the preloaded DECIMER Image Transformer (OCSR) model
  • 6 servers with the preloaded DECIMER Segmentation model
  • 4 servers with the preloaded STOUT V2 model

For the container that runs on decimer.ai, this setup makes sense: it ensures a good user experience and fast processing when multiple people access the site at the same time. For someone who runs the app locally, it may be far more than is needed.

How to modify the number of preloaded model instances?

1. Modify Supervisor configuration

DECIMER Web uses the process control system Supervisor to automatically launch the socket servers. Supervisor launches processes, generates logs and automatically restarts them if something goes wrong.

  • Open DECIMER_Web/docker/app/supervisor.conf

    For every process that is managed by Supervisor, there is an entry like:

    [program:segmentation_socket_server_23456]
    command=python3 /var/www/app/app/Python/decimer_segmentation_server.py 23456
    autostart=true
    autorestart=true
    redirect_stderr=true
    stdout_logfile=/var/www/app/storage/logs/segmentation_socket_server_23456.log
    

    This entry starts a process called 'segmentation_socket_server_23456' when the app is launched. It specifies the command that starts the process, the location of the log file, and the settings that make Supervisor start the process automatically and restart it when it crashes. The script 'decimer_segmentation_server.py' launches a socket server with a preloaded instance of the DECIMER Segmentation model. It accepts a port number as an argument. In this specific case, the socket server listens on port 23456.

    There are similar entries that launch socket servers with the preloaded DECIMER Image Transformer (OCSR) model by running 'decimer_predictor_server.py $PORT' or with the STOUT V2 model by running 'stout_predictor_server.py $PORT' (plus similar entries for the DECIMER Image Classifier). Remove the entries that correspond to processes that you do not want to start.

    For example, if you want to run DECIMER_Web with only one instance of each model (this configuration requires 9 GB of memory), you should only keep the following entries and delete all other entries that launch one of the above-mentioned socket servers:

    [program:segmentation_socket_server_23456]
    command=python3 /var/www/app/app/Python/decimer_segmentation_server.py 23456
    autostart=true
    autorestart=true
    redirect_stderr=true
    stdout_logfile=/var/www/app/storage/logs/segmentation_socket_server_23456.log
    
    [program:ocsr_socket_server_65432]
    command=python3 /var/www/app/app/Python/decimer_predictor_server.py 65432
    autostart=true
    autorestart=true
    redirect_stderr=true
    stdout_logfile=/var/www/app/storage/logs/ocsr_socket_server_65432.log
    
    [program:stout_socket_server_12345]
    command=python3 /var/www/app/app/Python/stout_predictor_server.py 12345
    autostart=true
    autorestart=true
    redirect_stderr=true
    stdout_logfile=/var/www/app/storage/logs/stout_socket_server_12345.log
    
    [program:decimer_classifier_socket_server_11111]
    command=/bin/bash -c "python3 /var/www/app/app/Python/decimer_classifier_server.py 11111"
    autostart=true
    autorestart=true
    redirect_stderr=true
    stdout_logfile=/var/www/app/storage/logs/decimer_classifier_server_11111.log
    

    To run DECIMER_Web with this configuration, the ports must follow this scheme:

    • The ports for the segmentation sockets start at 23456.
    • The ports for the OCSR sockets start at 65432.
    • The ports for the STOUT sockets start at 12345.
    • The ports for the Image Classifier start at 11111.
    • If you add more socket servers, start with these ports and increment by one. For example, if the first OCSR socket server listens on port 65432, the second one should listen on 65433, the third one on 65434, and so on.
    • When you launch the app for the first time, the first one of these processes will download the model automatically. To avoid multiple processes trying to do that at the same time (which does not work), the second socket should only be launched after a few minutes (in the default configuration, we added "sleep 120&&" in the command).
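
The port scheme above lends itself to generating the entries instead of writing them by hand. The following is a sketch under stated assumptions, not a tool shipped with DECIMER_Web: the function name supervisor_entries is hypothetical, and the exact form of the delayed command (a /bin/bash -c wrapper around "sleep 120&&", applied to every instance after the first) is a guess based on the description above. Program names, scripts, log names and first ports are copied from the entries shown earlier; the first classifier entry is written without its /bin/bash -c wrapper.

```python
# (program prefix, log prefix, server script, first port), taken from the entries above.
MODELS = {
    "segmentation": ("segmentation_socket_server", "segmentation_socket_server",
                     "decimer_segmentation_server.py", 23456),
    "ocsr": ("ocsr_socket_server", "ocsr_socket_server",
             "decimer_predictor_server.py", 65432),
    "stout": ("stout_socket_server", "stout_socket_server",
              "stout_predictor_server.py", 12345),
    "classifier": ("decimer_classifier_socket_server", "decimer_classifier_server",
                   "decimer_classifier_server.py", 11111),
}
SCRIPT_DIR = "/var/www/app/app/Python"
LOG_DIR = "/var/www/app/storage/logs"


def supervisor_entries(model, count, delay_seconds=120):
    """Return `count` Supervisor entries for `model`, with ports counting up from the first port."""
    program, log, script, first_port = MODELS[model]
    entries = []
    for offset in range(count):
        port = first_port + offset
        command = f"python3 {SCRIPT_DIR}/{script} {port}"
        if offset > 0:
            # ASSUMPTION: later instances wait so that only the first one downloads the model.
            command = f'/bin/bash -c "sleep {delay_seconds}&& {command}"'
        entries.append("\n".join([
            f"[program:{program}_{port}]",
            f"command={command}",
            "autostart=true",
            "autorestart=true",
            "redirect_stderr=true",
            f"stdout_logfile={LOG_DIR}/{log}_{port}.log",
        ]))
    return "\n\n".join(entries)
```

For example, print(supervisor_entries("ocsr", 2)) produces entries for ports 65432 and 65433 that can be compared against the existing ones in supervisor.conf before pasting them in.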

2. Modify socket client configuration

After following step 1, the number of socket servers that are launched with the container has been modified. The only thing left to do is to make sure that the web app only sends requests to socket servers that are actually running. The scripts decimer_segmentation_client.py, decimer_predictor_client.py, decimer_classifier_client.py and stout_predictor_client.py in DECIMER_Web/app/Python/ send requests to the socket servers. Simply modify the variable 'num_ports' in the main() function of all four scripts according to the number of corresponding socket servers/open ports. In the example above, where there is one socket server per model type, num_ports should have the value 1 in all four client scripts.
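
To double-check the values for num_ports, you can count the server entries that remain in supervisor.conf. The sketch below is an illustration only: the function names are hypothetical, it relies on supervisor.conf being INI-style text that Python's configparser can read, and it maps each server script to its client by swapping "_server.py" for "_client.py", which matches the script names listed above.

```python
import configparser
import re
from collections import defaultdict

# Matches e.g. "decimer_predictor_server.py 65432" anywhere inside a command line.
SERVER_CALL = re.compile(r"(\w+_server\.py)\s+(\d+)")


def ports_per_script(conf_text):
    """Map each server script to the sorted list of ports it is launched with."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    ports = defaultdict(list)
    for section in parser.sections():
        if not section.startswith("program:"):
            continue
        match = SERVER_CALL.search(parser.get(section, "command", fallback=""))
        if match:
            ports[match.group(1)].append(int(match.group(2)))
    return {script: sorted(found) for script, found in ports.items()}


def num_ports_per_client(conf_text):
    """Map each client script to the num_ports value that matches the configuration."""
    return {
        script.replace("_server.py", "_client.py"): len(found)
        for script, found in ports_per_script(conf_text).items()
    }
```

Reading the file with open("docker/app/supervisor.conf").read() and passing the text to num_ports_per_client gives one number per client script. Entries that are commented out are not counted.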

How to change the port of the application

In the docker-compose configuration file (docker-compose.yml), the nginx port configuration is defined as follows:

(...)
nginx:
        image: nginx:alpine
        ports:
            - "${APP_PORT:-80}:80"
(...)

${APP_PORT:-80}:80 means that the host port defined by the environment variable APP_PORT is mapped to port 80 in the container. If APP_PORT is not set, 80 is used as the default (that is what ${APP_PORT:-80} means).
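
The substitution rule can be mirrored in a few lines of Python. This is only a sketch to illustrate the semantics; the function names are hypothetical and Docker Compose does not run this code. Note that the ":-" form also falls back to the default when the variable is set but empty.

```python
import os


def resolve_port_mapping(env=None):
    """Mimic "${APP_PORT:-80}:80": use APP_PORT, or 80 when it is unset or empty."""
    env = os.environ if env is None else env
    host_port = env.get("APP_PORT") or "80"
    return f"{host_port}:80"


def local_url(env=None):
    """URL to open in the browser for the resolved host port."""
    host_port = resolve_port_mapping(env).split(":")[0]
    return f"http://localhost:{host_port}"
```

With APP_PORT=8080 the mapping resolves to 8080:80, so the app is reached at http://localhost:8080 instead of http://localhost:80.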

There are multiple ways to set the port (in this example, we set it to 8080):

  1. Set the environment variable directly: APP_PORT=8080 docker-compose up --build -d
  2. Set the environment variable in the .env file: Just add a line that states APP_PORT=8080.
  3. Directly define the port in docker-compose.yml: Replace - "${APP_PORT:-80}:80" with - "8080:80".

How to remove the limitation of 10 pages and 20 structures when running DECIMER Web locally

Please be aware that loading a lot of molecular editor windows may make your browser crash.

  • Remove limit for pdf -> image conversion:

    • Open DECIMER_Web/app/Python/convert_pdf_to_images.py
    • In line 30: set the last_page argument to None
    • The convert_from_path command should look like this:
    page_images = convert_from_path(full_pdf_path,
                                    300,
                                    last_page=None)
    
    • This will make sure that all pages from the pdf are converted to images and processed by DECIMER Segmentation
  • Remove limit of structures processed using DECIMER V2 OCSR in the DecimerController:

    • Open DECIMER_Web/app/Http/Controllers/DecimerController.php
    • Comment out lines 26-28:
    // if ($num_structures > 20){
    //     $structure_depiction_img_paths = array_slice($structure_depiction_img_paths, 0, 20);
    // }
    
    • Comment out lines 41-46:
    // if ($num_structures > 20){
    //     for ($i = 0; $i < $num_structures - 20; ++$i){
    //         array_push($smiles_array, "");
    //     }
    //     $num_structures = 20;
    // }
    
  • Delete warnings and make sure everything is presented:

    • Open DECIMER_Web/resources/views/index.blade.php
    • Delete lines 160-162:
    if ($num_ketcher_frames > 20) {
        $num_ketcher_frames = 20;
    }
    
    • Delete lines 231-238:
    @if (count($structure_img_paths_array) > 20)
        <div class="text-xl mb-3 text-red-800">
            <strong>Warning:</strong> It appears like you uploaded more than 20 chemical
            structure depictions (or we detected more than 20 structures in your uploaded
            document). Only the first 20 structures are processed. Please host your own
            version of this application if you want to process a large amounts of data.
        </div>
    @endif
    
    • Delete line 258: @if ($key < 20)
    • Delete line 285: @endif
    • Delete line 292: @if ($key < 20)
    • Delete lines 314-316:
    @else
        <strong>The image has not been processed.</strong> </br>
    @endif
    
    • Delete line 322: @if ($key < 20)
    • Delete lines 327-334:
    @else
        <div class="text-xl mb-3 text-red-800">
            <strong>Warning:</strong> It appears like you uploaded more than 20 chemical
            structure depictions (or we detected more than 20 structures in your uploaded
            document). Only the first 20 structures are processed. Please host your own
            version of this application if you want to process a large amounts of data.
        </div>
    @endif
    
    • Open DECIMER_Web/resources/views/default.blade.php
    • Delete lines 51-53:
    if ($num_ketcher_frames > 20) {
        $num_ketcher_frames = 20;
    }
    
    • Delete lines 73-75:
    if ($num_ketcher_frames > 20) {
        $num_ketcher_frames = 20;
    }
    

License

  • This project is licensed under the MIT License - see the LICENSE file for details

Citation

  • DECIMER.ai - An open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications: Rajan, K., Brinkhaus, H. O., Agea, M. I., Zielesny, A., Steinbeck, C. ChemRxiv, (2023).
  • DECIMER: towards deep learning for chemical image recognition: Rajan, K., Zielesny, A., Steinbeck, C. J Cheminform, 12, 65 (2020).
  • DECIMER-Segmentation: Automated extraction of chemical structure depictions from scientific literature: Rajan, K., Brinkhaus, H.O., Sorokina, M. et al. J Cheminform, 13, 20 (2021).
  • DECIMER 1.0: deep learning for chemical image recognition using transformers: Rajan, K., Zielesny, A., Steinbeck, C. J Cheminform, 13, 61 (2021).
  • STOUT: SMILES to IUPAC names using neural machine translation: Rajan, K., Zielesny, A., Steinbeck, C. J Cheminform, 13, 34 (2021).