
large run failing at mafft step #44

Closed
splaisan opened this issue Dec 13, 2023 · 4 comments


splaisan commented Dec 13, 2023

Hi,
I ran the analysis of 4 SMRT Cell runs after renaming the FASTQ files to make them globally unique. It seemed to go well, but after 3 days of computing (44 threads, 292 GB RAM) it died, apparently at the mafft step, and I do not have the final HTML report and probably not all the data either. I re-ran nextflow with -resume with the same result, and also ran the experiment on a stronger server (84 threads, 512 GB RAM) with an identical outcome.

I attach a zip of the nextflow log; can you please help me fix this? I would rather run the 4 runs in one go to deliver merged data to the end user, since merging the individual runs (each of which succeeded on its own) is not documented and would in the best case not generate the nice HTML report.

Thanks in advance

dot.nextflow.log.zip
dot.nextflow_auto.log.zip

# all parameters are standard and rarefaction auto gives 8361 in the second run and was set to 10000 in the first run
# (rarefaction='' in the auto-run and ="--rarefaction_depth 10000" in the first run)
nextflow run main.nf \
  --input "${outfolder}/${outpfx}_samples.tsv" \
  --metadata "${outfolder}/${outpfx}_metadata.tsv" \
  --outdir "${outfolder}" \
  --dada2_cpu "${cpu}" \
  --vsearch_cpu "${cpu}" \
  --cutadapt_cpu "${cpu}" \
  "${rarefaction}" \
  --min_asv_totalfreq "${min_asv_totalfreq}" \
  --min_asv_sample "${min_asv_sample}" \
  --colorby "${colorby}" \
  -profile docker 2>&1 | tee ${outfolder}/run_log.txt
splaisan changed the title from "large run failing at maff step" to "large run failing at mafft step" on Dec 13, 2023
@proteinosome
Collaborator

@splaisan Are you running this locally on a single node without a job scheduler? In that case, would you be able to find the error file in the tmp directory /tmp/qiime2-q2cli-err-adqu7ohp.log as indicated in the log? That would give us a better idea of what happened.

Do you also know how many ASVs were discovered post-DADA2? You should be able to find dada2-ccs_stats.qza in the DADA2 output folder and open that in QIIME 2 View to investigate the stats.

Are you also doing separate denoising or pooled denoising? See here for a description.


splaisan commented Dec 14, 2023

Yes, I looked for the file, but it is not in my /tmp.
Is it possible that the /tmp is the Docker container's rather than my server's (and so not available after the run completes or crashes)?
To test this, I plan to mount /tmp to a local workfolder/tmp by editing the Docker config file.
We also wonder whether the mafft job is simply too heavy with 624 samples. To test that, we added a config block with much more RAM, specifically for that job.
I hope -resume will allow quick debugging.
I am OOO for a few days but will post on my return.
Cheers
S
PS: will look at your other points then too

@proteinosome
Collaborator

@splaisan Yes, the /tmp directory is the default in Docker. You can set the TMPDIR variable and mount that directory so the tools use a specific temporary directory. See this issue for an example.
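One way to do that mounting, assuming a Docker-based Nextflow setup, is a small config fragment using Nextflow's standard docker scope (`docker.runOptions` is a real Nextflow option; the host path below is a placeholder, not taken from the thread):

```groovy
// Hypothetical fragment for nextflow.config -- the host path is a placeholder.
docker {
    enabled = true
    // Bind a host folder over the container's /tmp so temporary files
    // (including the qiime2-q2cli-err-*.log files) survive the run.
    runOptions = '-v /path/on/host/tmp:/tmp'
}
```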


splaisan commented Dec 21, 2023

We (thanks to Kobe on our team) finally got the run to finish happily by giving two critical steps in the workflow more room to work.

Our fix had two parts:

  • add a custom config file extra.config, saved in the repo folder, with the following content:
process {
  // more RAM for the diversity job
  withName: qiime2_phylogeny_diversity {
    cpus = 8
    memory = 240.GB
  }
  // more RAM for the report building
  withName: html_rep {
    cpus = 8
    memory = 128.GB
  }
}

// correct bug in path for reports
// Generate report
report {
  enabled = true
  overwrite = true
  file = "$params.outdir/report/report.html"
}
// Timeline
timeline {
  enabled = true
  overwrite = true
  file = "$params.outdir/report/timeline.html"
}
// DAG
dag {
  enabled = true
  file = "$params.outdir/report/dag.html"
  overwrite = true
}

Note: The amount of extra RAM is probably excessive, but at least this ran without dying.
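If the fixed 240 GB turns out to be more than needed, a common Nextflow pattern is to start lower and grow the request on retry (a sketch only: the process name is taken from the thread above, but the starting size and retry count are guesses, not tested values):

```groovy
// Hypothetical alternative for extra.config: grow memory on each attempt
// instead of pinning a large fixed amount.
process {
    withName: qiime2_phylogeny_diversity {
        cpus          = 8
        memory        = { 64.GB * task.attempt }   // 64 GB, 128 GB, 192 GB ...
        errorStrategy = 'retry'
        maxRetries    = 3
    }
}
```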

Then we ran the nextflow command with minor edits:

# create tmp folder in output folder
mkdir -p ${outfolder}/tmp

# run edited nextflow command
TMPDIR="${outfolder}/tmp" nextflow run main.nf \
  <... more command arguments ...> \
  -profile docker \
  -c extra.config  2>&1 | tee ${outfolder}/run_log.txt

Declaring TMPDIR just before running the nextflow command ensures that /tmp (normally located inside the Docker image) is remapped to a local folder that remains visible after the run ends, so error report files can be read when things go wrong (as discussed in #42).
