metadata database cannot be configured past 1000000 rows #7203

mvforster opened this issue Aug 14, 2023 · 0 comments
Dear Cromwell Team,
I am trying to run a workflow written in WDL using Cromwell v65. The workflow fails, reporting the following error on stdout:

```
cromwell.services.MetadataTooLargeNumberOfRowsException: Metadata for workflow <UUID> exists in database but cannot be served because row count of 3138431 exceeds configured limit of 1000000.
```

This occurs even after editing `cromwell.conf` as suggested in [this thread](https://github.com/broadinstitute/cromwell/issues/2519).
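To make the change easy to spot, here is the specific override I added, shown in isolation before the full file. My assumption, taken from the exception text rather than the Cromwell source, is that this setting is what produces the "configured limit of 1000000" message:

```
services {
  MetadataService {
    # Assumption: this is the setting behind the "configured limit of 1000000"
    # in the exception above; its default appears to be 1000000.
    metadata-read-row-number-safety-threshold = 5000000
  }
}
```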

The configuration file used is as follows (edited to remove the main script):

```
include required(classpath("application"))

backend {
  default = LSF
  providers {
    LSF {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        exit-code-timeout-seconds = 300

        runtime-attributes = """
        Int cpu
        Int memory_mb
        String? lsf_queue
        String? lsf_project
        String? docker
        """

        submit = """
        bsub \
        -q ${lsf_queue} \
        -P ${lsf_project} \
        -J ${job_name} \
        -cwd ${cwd} \
        -o ${out} \
        -e ${err} \
        -n ${cpu} \
        -R 'rusage[mem=${memory_mb}] span[hosts=1]' \
        -M ${memory_mb} \
        /usr/bin/env bash ${script}
        """

        submit-docker = """
        module load tools/singularity/3.8.3
        SINGULARITY_MOUNTS='<redacted>'
        export SINGULARITY_CACHEDIR=$HOME/.singularity/cache
        LOCK_FILE=$SINGULARITY_CACHEDIR/singularity_pull_flock

        export SINGULARITY_DOCKER_USERNAME=<redacted>
        export SINGULARITY_DOCKER_PASSWORD=<redacted>

        flock --exclusive --timeout 900 $LOCK_FILE \
        singularity exec docker://${docker} \
        echo "Successfully pulled ${docker}"

        bsub \
        -q ${lsf_queue} \
        -P ${lsf_project} \
        -J ${job_name} \
        -cwd ${cwd} \
        -o ${out} \
        -e ${err} \
        -n ${cpu} \
        -R 'rusage[mem=${memory_mb}] span[hosts=1]' \
        -M ${memory_mb} \
        singularity exec --containall $SINGULARITY_MOUNTS --bind ${cwd}:${docker_cwd} docker://${docker} ${job_shell} ${docker_script}
        """

        job-id-regex = "Job <(\\d+)>.*"
        kill = "bkill ${job_id}"
        kill-docker = "bkill ${job_id}"
        check-alive = "bjobs -w ${job_id} |& egrep -qvw 'not found|EXIT|JOBID'"

        filesystems {
          local {
            localization: [
              "soft-link", "copy", "hard-link"
            ]
            caching {
              duplication-strategy: [
                "soft-link", "copy", "hard-link"
              ]
              hashing-strategy: "path+modtime"
            }
          }
        }
      }
    }
  }
}

call-caching {
  enabled = true
  invalidate-bad-cache-results = true
}

database {
  profile = "slick.jdbc.HsqldbProfile$"
  db {
    driver = "org.hsqldb.jdbcDriver"
    url = """
    jdbc:hsqldb:file:cromwell-executions/cromwell-db/cromwell-db;
    shutdown=false;
    hsqldb.default_table_type=cached;hsqldb.tx=mvcc;
    hsqldb.result_max_memory_rows=10000;
    hsqldb.large_data=true;
    hsqldb.applog=1;
    hsqldb.lob_compressed=true;
    hsqldb.script_format=3;
    hsqldb.log_size=0
    """
    connectionTimeout = 86400000
    numThreads = 2
  }
  insert-batch-size = 2000
  read-batch-size = 5000000
  write-batch-size = 5000000

  metadata {
    profile = "slick.jdbc.HsqldbProfile$"
    db {
      driver = "org.hsqldb.jdbcDriver"
      url = """
      jdbc:hsqldb:file:cromwell-executions/cromwell-db/cromwell-metadata-db/;
      shutdown=false;
      hsqldb.default_table_type=cached;hsqldb.tx=mvcc;
      hsqldb.result_max_memory_rows=10000;
      hsqldb.large_data=true;
      hsqldb.applog=1;
      hsqldb.lob_compressed=true;
      hsqldb.script_format=3;
      hsqldb.log_size=0
      """
      connectionTimeout = 86400000
      numThreads = 2
    }
    insert-batch-size = 2000
    read-batch-size = 5000000
    write-batch-size = 5000000
  }
}

services {
  MetadataService {
    metadata-read-row-number-safety-threshold = 5000000
  }
}
```

The main issue, as far as I can see, is that Cromwell ignores the increased metadata row threshold. This is despite my separating out the metadata database and raising the thresholds on both databases.
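One possibility I cannot rule out: in the copies of Cromwell's `reference.conf` I have seen, `MetadataService` settings appear nested inside a `config` sub-block, so my flat override above may simply not be picked up. A sketch of that alternative placement, assuming the nesting is required (I have not verified this against the v65 source):

```
services {
  MetadataService {
    # Hypothetical placement: same setting, nested one level deeper inside
    # a "config" block, matching the shape of reference.conf.
    config {
      metadata-read-row-number-safety-threshold = 5000000
    }
  }
}
```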

Before rerunning with the changes listed above, I completely purged the working directory of logs and metadata to ensure a clean run.

The documentation currently provides no additional guidance on how to overcome this error. Any assistance would be appreciated.
Best wishes,

Matthieu
