Readfish sometimes goes into a stuck state #159

@hasindu2008

First of all, thank you very much for writing this nice tool. We have been running it for a while, and there is a rare situation where readfish freezes. That is, log messages like 2021-08-17 15:44:42,118 ru.ru_gen 24R/0.02245s stop being printed, and I believe readfish has gone into a frozen state. In such instances MinKNOW continues sequencing smoothly but without any rejections, most likely because readfish is frozen.

Digging into the logs, I see that the Guppy server crashed around the same time that readfish froze.

2021-08-17 15:44:47.679933   ERROR: common_process_crashed (host)
    executable: /data1/software/ont-guppy/bin/guppy_basecall_server
    name: guppy
    arguments: --config
               dna_r9.4.1_450bps_fast.cfg
               --port
               5555
               --log_path
               /var/log/minknow/guppy
               --ipc_threads
               4
               --max_queued_reads
               5000
               --num_callers
               6
               -x
               cuda:0
    exit_code: 0
    recent_output: ONT Guppy basecall server software version 4.2.2+effbaf8, client-server API version 3.2.0
                   config file:         /data1/software/ont-guppy/data/dna_r9.4.1_450bps_fast.cfg
                   model file:          /data1/software/ont-guppy/data/template_r9.4.1_450bps_fast.jsn
                   log path:            /var/log/minknow/guppy
                   chunk size:          2000
                   chunks per runner:   160
                   max queued reads:    5000
                   num basecallers:     6
                   num socket threads:  4
                   max returned events: 50000
                   gpu device:          cuda:0
                   kernel path:
                   runners per device:  8
                   Starting server on port: 5555
                   [guppy/error] pipeline::ThreadedNode::worker_function: Exception thrown in basecaller_node worker thread: Exception thrown in Caller worker thread: Called read appears to be malformed for aggregation.
                   The basecall server has shut down successfully.

Any tips on how to resolve this?
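
For anyone who wants to detect this state automatically, here is a minimal sketch of a stall check based on the symptom described above (the periodic ru.ru_gen batch lines stop appearing). It is not part of readfish. The function names `last_batch_time` and `is_stalled`, the 30-second threshold, and the regular expression are all assumptions; the pattern is inferred from the single sample log line quoted above and may need adjusting for other readfish versions or logging configurations.

```python
import re
from datetime import datetime, timedelta

# Assumed pattern, based only on the sample line
# "2021-08-17 15:44:42,118 ru.ru_gen 24R/0.02245s".
BATCH_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) ru\.ru_gen \d+R/[\d.]+s"
)


def last_batch_time(lines):
    """Return the timestamp of the last batch line in `lines`, or None."""
    last = None
    for line in lines:
        match = BATCH_LINE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            last = stamp + timedelta(milliseconds=int(match.group(2)))
    return last


def is_stalled(lines, now, max_gap=timedelta(seconds=30)):
    """Report a stall when the newest batch line is older than `max_gap`.

    Returns False when no batch line has been seen yet, because there is
    nothing to compare against. The 30 s default is an arbitrary choice.
    """
    last = last_batch_time(lines)
    if last is None:
        return False
    return now - last > max_gap
```

A wrapper script could read the readfish log periodically, pass its lines and `datetime.now()` to `is_stalled`, and raise an alert when it returns True. This only detects the frozen state; it does not address why the Guppy server exits.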
