returned non-zero exit status 255 for 5_epa_outgroup_rooting.py and IndexError: list index out of range for 6_root_digger_rooting.py #2

Open
vinitamehlawat opened this issue Nov 18, 2021 · 53 comments

@vinitamehlawat

vinitamehlawat commented Nov 18, 2021

Hi @idaios

I prepared my dataset from scratch with 10 high-quality SARS-CoV-2 genomes and 2 outgroups, so my data contain 12 sequences in total. I ran all the scripts again, and this time I am stuck at the 5th script.

When I first ran the 5th script, the error was:

ERROR: Must run iqtree_tests stage of pipeline first
Traceback (most recent call last):
  File "/home/vinita/covid19_cme_analysis/./pipeline/5_epa_outgroup_rooting.py", line 23, in <module>
    raise e
  File "/home/vinita/covid19_cme_analysis/./pipeline/5_epa_outgroup_rooting.py", line 20, in <module>
    util.expect_file_exists( paths.raxml_credible_ml_trees )
  File "/home/vinita/covid19_cme_analysis/scripts/util.py", line 99, in expect_file_exists
    raise RuntimeError( "File doesn't exist: " + file_path )
RuntimeError: File doesn't exist: /home/vinita/covid19_cme_analysis/work_dir/2021-11-17_05/smsao/results/trees/credible_ml_trees.newick

Then I ran the 7th script, 7_iqtree_tests.py, first and ran the 5th script again, which gives the following error:

./pipeline/5_epa_outgroup_rooting.py work_dir/2021-11-17_05/smsao

0  /  13
No protocol specified
No protocol specified
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vinita/covid19_cme_analysis/./pipeline/5_epa_outgroup_rooting.py", line 71, in <module>
    cur_modelfile = raxml_launcher.evaluate(tree_file, ref_msa, cur_outdir)
  File "/home/vinita/covid19_cme_analysis/scripts/raxml_launcher.py", line 75, in evaluate
    sub.check_call(cmd, cwd=out_dir, stdout=sub.DEVNULL)
  File "/home/vinita/miniconda3/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/vinita/covid19_cme_analysis/software/raxml-ng/bin/raxml-ng-mpi', '--evaluate', '--msa', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-17_05/smsao/data/covid_edited.fasta', '--model', 'GTR+FO+G4', '--tree', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-17_05/smsao/runs/epa_runs/0/tree.newick', '--prefix', 'eval', '--threads', '4', '--blopt', 'nr_safe', '--redo', '--blmin', '0.000000001']' returned non-zero exit status 255.

Next I tried the 6th script:

./pipeline/6_root_digger_rooting.py work_dir/2021-11-17_05/fmsan/

Traceback (most recent call last):
  File "/home/vinita/covid19_cme_analysis/./pipeline/6_root_digger_rooting.py", line 84, in <module>
    writer = csv.DictWriter(csv_file, fieldnames=results[0].keys())
IndexError: list index out of range
(name_of_my_env) ./pipeline/6_root_digger_rooting.py work_dir/2021-11-17_05/smsao/
Traceback (most recent call last):
  File "/home/vinita/covid19_cme_analysis/./pipeline/6_root_digger_rooting.py", line 84, in <module>
    writer = csv.DictWriter(csv_file, fieldnames=results[0].keys())
IndexError: list index out of range

But somehow 8_tree_thinning.py, 9_mptp_on_all_trees.py, compare_llhs, and extract_thinned_dataset.py worked on all 4 datasets: FMSAO, SMSAO, FMSAN and SMSAN.

For wuhan_placement.py I also get an error:

./pipeline/wuhan_placement.py work_dir/2021-11-17_05/smsao/

ERROR: Must run placement stage of pipeline first
Traceback (most recent call last):
  File "/home/vinita/covid19_cme_analysis/./pipeline/wuhan_placement.py", line 18, in <module>
    raise e
  File "/home/vinita/covid19_cme_analysis/./pipeline/wuhan_placement.py", line 15, in <module>
    util.expect_dir_exists( paths.epa_rooting_dir )
  File "/home/vinita/covid19_cme_analysis/scripts/util.py", line 95, in expect_dir_exists
    raise RuntimeError( "Directory doesn't exist: " + dir_path )
RuntimeError: Directory doesn't exist: /home/vinita/covid19_cme_analysis/work_dir/2021-11-17_05/smsao/results/epa_rooting

Kindly help me understand whether these issues are linked to my data, or whether something is missing from my data and that is why root_digger_lwr.csv is empty in smsao/results/rootdigger_rooting.

It would be great if you could look at these errors and suggest how I should solve them.

Thank you very much
Vinita
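
The IndexError in 6_root_digger_rooting.py is what Python raises for results[0] when results is an empty list, which would also be consistent with root_digger_lwr.csv being created but left empty. A minimal sketch of a guard, assuming results is a list of dicts that share the same keys; the function name write_results_csv is hypothetical and not part of the pipeline:

```python
import csv


def write_results_csv(results, csv_path):
    """Write a list of dicts to csv_path; return False if there is nothing to write."""
    if not results:
        # results[0] would raise IndexError here, so report the real problem instead.
        print("No results to write: did the upstream runs produce any output?")
        return False
    with open(csv_path, "w", newline="") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=list(results[0].keys()))
        writer.writeheader()
        writer.writerows(results)
    return True
```

With a check like this, an empty result list points at the earlier stage that produced no output instead of failing at the CSV step.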

@pierrebarbera
Collaborator

Hi Vinita,

Could you take a look in work_dir/2021-11-17_05/smsao/runs/epa_runs/0/ and see if there's any error message in the file eval.raxml.log?

Pierre

@vinitamehlawat
Author

Hi @Pbdas

I am attaching this eval.raxml.log for your reference, please have a look.

Thank you very much

Vinita
eval.raxml.log

@BenoitMorel
Owner

@Pbdas

Maybe we could just add --force in both raxml launcher functions? (here https://github.com/BenoitMorel/covid19_cme_analysis/blob/master/scripts/raxml_launcher.py)

This thread check is not that important in this context anyway
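
As a sketch only of how such a launcher change could look: the argument list mirrors the failing raxml-ng command from the traceback above, and build_evaluate_cmd with its force parameter is a hypothetical name, not the actual content of raxml_launcher.py.

```python
def build_evaluate_cmd(raxml_bin, msa, tree, model="GTR+FO+G4", threads=4, force=False):
    """Assemble a raxml-ng --evaluate call as an argument list."""
    cmd = [
        raxml_bin, "--evaluate",
        "--msa", msa,
        "--model", model,
        "--tree", tree,
        "--prefix", "eval",
        "--threads", str(threads),
        "--blopt", "nr_safe",
        "--redo",
        "--blmin", "0.000000001",
    ]
    if force:
        # Assumption: --force makes raxml-ng skip its safety checks,
        # including the thread check discussed here.
        cmd.append("--force")
    return cmd
```

Keeping the flag behind a parameter makes it easy to switch the check back on for datasets where it matters.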

@pierrebarbera
Collaborator

Yes I agree, only data with ~30k sites will make it through the filters anyway. I just pushed the change to the master branch, so @vinitamehlawat you should be able to update by running git pull in the folder of the repository. Let us know if this works.

As for the rootdigger stage, I'm not sure. @computations any idea?

@amkozlov
Collaborator

An even better solution would be upgrading to raxml-ng v1.0.x and using --threads auto or --threads auto{4} ;)

@vinitamehlawat
Author

vinitamehlawat commented Nov 22, 2021

Hi @Pbdas

Here I am working with my subset data, which consists of 2059 sequences.

I updated the repo with git pull and ran ./pipeline/5_epa_outgroup_rooting.py work_dir/2021-11-19_00/fmsan/ again.

This time I encountered the following error:

Traceback (most recent call last):
  File "/home/vinita/covid19_cme_analysis/./pipeline/5_epa_outgroup_rooting.py", line 86, in <module>
    hist_csv_file = placement.gappa_examine_lwr( os.path.join( epa_out_dir, "*/*.jplace" ), result_dir )
  File "/home/vinita/covid19_cme_analysis/scripts/placement.py", line 60, in gappa_examine_lwr
    sub.check_call(cmd)
  File "/home/vinita/miniconda3/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/vinita/covid19_cme_analysis/software/gappa/bin/gappa', 'examine', 'lwr', '--jplace-path', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/34/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/55/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/66/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/18/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/42/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/70/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/2/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/26/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/14/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/53/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/52/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/62/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/8/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/57/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/1/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/56/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/21/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/6/epa_result.jplace', 
'/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/5/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/41/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/0/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/13/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/67/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/59/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/10/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/16/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/50/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/64/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/29/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/7/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/43/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/71/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/20/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/37/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/38/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/47/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/51/epa_result.jplace', 
'/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/3/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/12/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/44/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/23/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/68/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/24/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/31/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/35/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/4/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/49/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/61/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/17/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/39/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/45/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/36/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/33/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/9/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/40/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/11/epa_result.jplace', 
'/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/69/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/48/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/25/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/22/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/28/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/19/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/54/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/60/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/27/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/30/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/46/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/58/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/32/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/15/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/65/epa_result.jplace', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/epa_runs/63/epa_result.jplace', '--no-list-file', '--no-compat-check', '--allow-file-overwriting', '--histogram-bins', '20', '--out-dir', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/results/epa_rooting']' returned non-zero exit status 109.

I also checked work_dir/2021-11-19_00/fmsan/runs/epa_runs/0/, but this time there is no error in that file. I also have around 71 folders for my subset data.

I am attaching the eval.raxml.log file for your reference. Please have a look and let me know how I should proceed.
eval.raxml.log

Thank you very much!
Vinita

@pierrebarbera
Collaborator

Hi Vinita,

The log file from RAxML-NG looks good now! The error seems to be with gappa this time. Same procedure as last time: git pull, and re-run stage 5. Then, under results/epa_rooting/ there should be a file called gappa_examine_lwr.log that should tell us what's going wrong. (Again, apologies for the bad error messages.)

Pierre

@vinitamehlawat
Author

Hi @Pbdas

After git pull I ran the 5th stage again, but unfortunately there is no gappa_examine_lwr.log in results/epa_rooting/; there is only one .txt file, outgroup_check.txt.

@pierrebarbera
Collaborator

Hi @vinitamehlawat , just letting you know I think I figured out the current issue, and I'm working on a fix

@pierrebarbera
Collaborator

pierrebarbera commented Nov 22, 2021

Ok, so I'm 99% sure the issue was that the call to gappa examine lwr was simply too long for the command line to handle (more than 5k characters) due to the number of trees, and the paths being full, non-relative paths. I've made the paths relative to a working directory now, so that should be sufficient to handle it. Please pull and give it a try!
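
The idea of shortening the call by using paths relative to a working directory can be sketched as follows. This is an illustration under assumptions, not the code in placement.py: relative_jplace_args and command_length are hypothetical helpers, and the directory layout in the usage note is made up.

```python
import os


def relative_jplace_args(runs_dir, jplace_paths):
    """Return jplace paths relative to runs_dir, for use with cwd=runs_dir."""
    return [os.path.relpath(p, start=runs_dir) for p in jplace_paths]


def command_length(args):
    """Approximate length of the command line: all arguments plus separating spaces."""
    return sum(len(a) for a in args) + max(len(args) - 1, 0)
```

For example, with runs_dir = "/home/user/work_dir/run/fmsan/runs/epa_runs" and one epa_result.jplace per numbered subfolder, relative_jplace_args turns each absolute path into something like "34/epa_result.jplace". The shortened list can then be passed to subprocess.check_call together with cwd=runs_dir so the relative paths still resolve.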

Also, it could be that the next issue will be related to the visualization using R, meaning that it may be necessary to install some packages. Note however that this visualization is not strictly necessary and you can repeat it later by simply calling the script directly, like so:

scripts/lwr_hist.r work_dir/<correct work dir>/<smsao/smsan/... etc>/results/epa_rooting/lwr_histogram.csv work_dir/<correct work dir>/<smsao/smsan/... etc>/results/epa_rooting/lwr_histogram.pdf

(fill in the correct paths)

the necessary R packages are:

ggplot2
readr
tidyr
dplyr
stringr

@vinitamehlawat
Author

Hi @Pbdas

After git pull I re-ran the 5th step and encountered the following error message:

Traceback (most recent call last):
  File "/home/vinita/covid19_cme_analysis/./pipeline/5_epa_outgroup_rooting.py", line 86, in <module>
    hist_csv_file = placement.gappa_examine_lwr( epa_out_dir, result_dir )
  File "/home/vinita/covid19_cme_analysis/scripts/placement.py", line 69, in gappa_examine_lwr
    sub.check_call(cmd, cwd=runs_dir, stdout=logfile)
  File "/home/vinita/miniconda3/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/vinita/covid19_cme_analysis/software/gappa/bin/gappa', 'examine', 'lwr', '--jplace-path', '34/epa_result.jplace', '55/epa_result.jplace', '66/epa_result.jplace', '18/epa_result.jplace', '42/epa_result.jplace', '70/epa_result.jplace', '2/epa_result.jplace', '26/epa_result.jplace', '14/epa_result.jplace', '53/epa_result.jplace', '52/epa_result.jplace', '62/epa_result.jplace', '8/epa_result.jplace', '57/epa_result.jplace', '1/epa_result.jplace', '56/epa_result.jplace', '21/epa_result.jplace', '6/epa_result.jplace', '5/epa_result.jplace', '41/epa_result.jplace', '0/epa_result.jplace', '13/epa_result.jplace', '67/epa_result.jplace', '59/epa_result.jplace', '10/epa_result.jplace', '16/epa_result.jplace', '50/epa_result.jplace', '64/epa_result.jplace', '29/epa_result.jplace', '7/epa_result.jplace', '43/epa_result.jplace', '71/epa_result.jplace', '20/epa_result.jplace', '37/epa_result.jplace', '38/epa_result.jplace', '47/epa_result.jplace', '51/epa_result.jplace', '3/epa_result.jplace', '12/epa_result.jplace', '44/epa_result.jplace', '23/epa_result.jplace', '68/epa_result.jplace', '24/epa_result.jplace', '31/epa_result.jplace', '35/epa_result.jplace', '4/epa_result.jplace', '49/epa_result.jplace', '61/epa_result.jplace', '17/epa_result.jplace', '39/epa_result.jplace', '45/epa_result.jplace', '36/epa_result.jplace', '33/epa_result.jplace', '9/epa_result.jplace', '40/epa_result.jplace', '11/epa_result.jplace', '69/epa_result.jplace', '48/epa_result.jplace', '25/epa_result.jplace', '22/epa_result.jplace', '28/epa_result.jplace', '19/epa_result.jplace', '54/epa_result.jplace', '60/epa_result.jplace', '27/epa_result.jplace', '30/epa_result.jplace', '46/epa_result.jplace', '58/epa_result.jplace', '32/epa_result.jplace', '15/epa_result.jplace', '65/epa_result.jplace', '63/epa_result.jplace', '--no-list-file', '--no-compat-check', '--allow-file-overwriting', '--histogram-bins', '20', '--out-dir', 
'/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/results/epa_rooting']' returned non-zero exit status 109.

This time I have a gappa_examine_lwr.log file in fmsan/results/epa_rooting, which I am attaching for you. Please have a look at it.

Thanks
Vinita
gappa_examine_lwr.log

@pierrebarbera
Collaborator

Hi Vinita,
before I keep making you try fixes, could you tell us what kind of operating system you're using?

Also please run this command in your terminal and paste the result here:
/bin/sh --version

Pierre

@vinitamehlawat
Author

Hi @Pbdas
I am pasting my terminal output, Please have a look

$SHELL --version
GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

/bin/sh =--version
/bin/sh: 0: Illegal option --

@pierrebarbera
Collaborator

Hi @vinitamehlawat,

Sorry it took a while, but I could finally reproduce the issue on my end!

Here is what you do:
In the terminal, go to software/gappa, then run these commands:

make clean
git checkout 7398c1cdf5162fe195c9c9fafe999f15e7d5012b
git submodule update --init --recursive
make -j

Now you can try stage 5 again.

@vinitamehlawat
Author

Hi @Pbdas

Thank you, I made the changes as you suggested. This time it shows an error about R packages:

Error in library(ggplot2) : there is no package called ‘ggplot2’
Execution halted
Traceback (most recent call last):
  File "/home/vinita/covid19_cme_analysis/./pipeline/5_epa_outgroup_rooting.py", line 87, in <module>
    placement.ggplot_lwr_histogram( hist_csv_file, result_dir)
  File "/home/vinita/covid19_cme_analysis/scripts/placement.py", line 84, in ggplot_lwr_histogram
    sub.check_call(cmd)
  File "/home/vinita/miniconda3/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/vinita/covid19_cme_analysis/scripts/lwr_hist.r', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/results/epa_rooting/lwr_histogram.csv', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/results/epa_rooting/lwr_histogram.pdf']' returned non-zero exit status 1.

If this error only concerns R, then I will run it as you mentioned earlier in this thread: scripts/lwr_hist.r work_dir/<correct work dir>/<smsao/smsan/... etc>/results/epa_rooting/lwr_histogram.csv work_dir/<correct work dir>/<smsao/smsan/... etc>/results/epa_rooting/lwr_histogram.pdf

But could you please look at the 6th step error? This time I am getting the same error again:

./pipeline/6_root_digger_rooting.py work_dir/2021-11-19_00/fmsan/

Traceback (most recent call last):
  File "/home/vinita/covid19_cme_analysis/./pipeline/6_root_digger_rooting.py", line 84, in <module>
    writer = csv.DictWriter(csv_file, fieldnames=results[0].keys())
IndexError: list index out of range

Thank you very much for your time and effort

Vinita

@BenoitMorel
Owner

Dear Vinita,

It looks like I introduced a bug that we haven't detected. It should be fixed now.
Please try to run 'git pull' and to start the analysis again.
Let us know if that fixes the issue

Best,
Benoit

@vinitamehlawat
Author

vinitamehlawat commented Nov 25, 2021 via email

@BenoitMorel
Owner

BenoitMorel commented Nov 25, 2021 via email

@BenoitMorel
Owner

I replied too fast. My fix only fixes step 6. I don't think it depends on step 5. So you should rerun step 6 :-)

@vinitamehlawat
Author

Hi @BenoitMorel

After git pull I re-ran the 6th script and the following error popped up in my terminal. Please have a look.

./pipeline/6_root_digger_rooting.py work_dir/2021-11-19_00/fmsan/
running 8 iterations
['mpiexec', '-np', '48', '/home/vinita/covid19_cme_analysis/software/root_digger/bin/rd', '--tree', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0.in.tree', '--msa', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/data/covid_edited.fasta', '--model', 'GTR+FO+G4', '--exhaustive', '--treefile', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0.tree', '--early-stop']
Traceback (most recent call last):
  File "/home/vinita/covid19_cme_analysis/./pipeline/6_root_digger_rooting.py", line 64, in <module>
    root_digger_launcher.launch_root_digger(tmp_tree_file, alignment, model, outfile,
  File "/home/vinita/covid19_cme_analysis/scripts/root_digger_launcher.py", line 21, in launch_root_digger
    subprocess.check_call(cmd, stdout = outfile, stderr = outfile)
  File "/home/vinita/miniconda3/envs/name_of_my_env/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['mpiexec', '-np', '48', '/home/vinita/covid19_cme_analysis/software/root_digger/bin/rd', '--tree', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0.in.tree', '--msa', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/data/covid_edited.fasta', '--model', 'GTR+FO+G4', '--exhaustive', '--treefile', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0.tree', '--early-stop']' returned non-zero exit status 132.

@vinitamehlawat
Author

Hi @Pbdas

After installing all the R packages I re-ran the 5th script and again got an error. This time I am not sure whether it is a bug in your script or just an error in the code. Please have a look (this time I pasted the whole terminal output from running the 5th script).

./pipeline/5_epa_outgroup_rooting.py work_dir/2021-11-19_00/fmsan/

hmmbuild :: profile HMM construction from multiple sequence alignments
HMMER 3.3.2 (Nov 2020); http://hmmer.org/
Copyright (C) 2020 Howard Hughes Medical Institute.
Freely distributed under the BSD open source license.


input alignment file: /home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/data/covid_edited.fasta
output HMM file: reference.hmm
number of worker threads: 48


idx name          nseq  alen  mlen     W eff_nseq re/pos description
1   covid_edited  1807 27987 27920 29776     1.70  0.619

CPU time: 15.89u 0.32s 00:00:16.21 Elapsed: 00:00:16.20
INFO Splitting files based on reference: /home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/data/covid_edited.fasta
WARN The query alignment file '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/hmmer_runs/both.afa' appears to have an alignment width that differs from the reference (29966 vs. 27987).
This is likely due to the alignment tool stripping gap-only columns, or adding columns to the reference. Please consider using the produced 'reference.fasta' during placement!
0 / 71
No protocol specified
No protocol specified
1 / 71
No protocol specified
No protocol specified
.
.
.
.
.
.
71 / 71
No protocol specified
No protocol specified

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

Warning message:
funs() was deprecated in dplyr 0.8.0.
Please use a list of either functions or lambdas:

Simple named list:
list(mean = mean, median = median)

Auto named with tibble::lst():
tibble::lst(mean, median)

Using lambdas
list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
This warning is displayed once every 8 hours.
Call lifecycle::last_lifecycle_warnings() to see where this warning was generated.
Saving 7 x 7 in image
Traceback (most recent call last):
  File "/home/vinita/covid19_cme_analysis/./pipeline/5_epa_outgroup_rooting.py", line 95, in <module>
    util.copy_dir( d, os.path.join( result_dir, os.path.basename(d) ), [".rba", ".phy", "*.startTree"] )
  File "/home/vinita/covid19_cme_analysis/scripts/util.py", line 51, in copy_dir
    ign_f = shutil.ignore_patterns(*ignore)
NameError: name 'shutil' is not defined

Thank you
Vinita

@pierrebarbera
Collaborator

Hi Vinita,

the part that fails now is just a copy of the result files from runs/epa_runs to results/epa_rooting, so it's very optional. I wouldn't re-run the script just for that. Just know that the files that are not in epa_rooting will be in epa_runs instead. Nevertheless, I just pushed a fix such that it should work correctly next time.
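
The NameError in the traceback means that shutil was used in util.py without being imported. A minimal sketch of a copy helper with the import in place, assuming the helper only needs to copy a directory tree while skipping some file patterns; this is not the repository's actual util.copy_dir:

```python
import shutil


def copy_dir(src, dst, ignore=()):
    """Copy the directory tree src to dst, skipping names that match any glob in ignore."""
    ignore_fn = shutil.ignore_patterns(*ignore)
    # dirs_exist_ok requires Python 3.8 or newer.
    shutil.copytree(src, dst, ignore=ignore_fn, dirs_exist_ok=True)
```

Note that shutil.ignore_patterns takes glob-style patterns, so filtering by file extension needs a form such as "*.rba".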

As for the rootdigger error, in the rootdigger_rooting directory, there should be a file called root_digger.log, could you share that one? Then @computations will be able to help I think

Cheers,
Pierre

@vinitamehlawat
Author

Hi @Pbdas

I checked results/rootdigger_rooting, but there is no root_digger.log; there is only one .csv file, root_digger_lwr.csv, which is empty.

Thanks
Vinita

@pierrebarbera
Collaborator

Hi Vinita,

That makes debugging a lot harder. One thing I can think of right now is that MPI may not be installed on your machine, could that be? You can check by running mpiexec --version
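
The same check can be done from Python before launching a stage. A small sketch, assuming the tool prints its version when called with --version; tool_version is a hypothetical helper, not part of the pipeline:

```python
import shutil
import subprocess


def tool_version(name, flag="--version"):
    """Return the first line of `name flag` output, or None if name is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    done = subprocess.run([path, flag], capture_output=True, text=True)
    lines = (done.stdout or done.stderr).strip().splitlines()
    return lines[0] if lines else ""
```

Calling tool_version("mpiexec") returns None when no mpiexec is found on PATH, which gives a clearer message than a failed subprocess call later on.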

@vinitamehlawat
Author

Hi @Pbdas

(base) mpiexec --version
mpiexec (OpenRTE) 4.0.3

Report bugs to http://www.open-mpi.org/community/help/

@computations
Collaborator

It is unfortunate that there is no log file, but there are things to try regardless. If you see a file called something.ckp, please upload it here, delete it, then rerun step 6.

@computations
Collaborator

Ah, I found it (or at least what I think is going on). I pushed a fix to github for rootdigger. You will need to pull the new version and rebuild the program, and then you should be able to run the script successfully.

@vinitamehlawat
Author

Hi @computations

Here I am a little confused again. Could you please clarify: after git pull, should I re-run only the 6th step, or the whole analysis from ./setup.sh through each script?

@computations
Collaborator

Ah, sorry, I should be more clear.

In the top level directory, there should be a directory software/root_digger. cd into that, and run git pull && make -j mpi. This should update and rebuild RootDigger, which will fix the bug. From there you should be able to rerun just step 6.

@vinitamehlawat
Author

Hi @computations

I did as you described above and the software updated successfully, but after re-running the 6th script I am getting the same error:

./pipeline/6_root_digger_rooting.py work_dir/2021-11-19_00/fmsan/
running 8 iterations
['mpiexec', '-np', '48', '/home/vinita/covid19_cme_analysis/software/root_digger/bin/rd', '--tree', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0.in.tree', '--msa', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/data/covid_edited.fasta', '--model', 'GTR+FO+G4', '--exhaustive', '--treefile', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0.tree', '--early-stop']
Traceback (most recent call last):
  File "/home/vinita/covid19_cme_analysis/./pipeline/6_root_digger_rooting.py", line 64, in <module>
    root_digger_launcher.launch_root_digger(tmp_tree_file, alignment, model, outfile,
  File "/home/vinita/covid19_cme_analysis/scripts/root_digger_launcher.py", line 21, in launch_root_digger
    subprocess.check_call(cmd, stdout = outfile, stderr = outfile)
  File "/home/vinita/miniconda3/envs/name_of_my_env/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['mpiexec', '-np', '48', '/home/vinita/covid19_cme_analysis/software/root_digger/bin/rd', '--tree', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0.in.tree', '--msa', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/data/covid_edited.fasta', '--model', 'GTR+FO+G4', '--exhaustive', '--treefile', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0.tree', '--early-stop']' returned non-zero exit status 1.

@computations
Collaborator

Alright, I think I managed to find the problem. There was a change in interface for rootdigger that didn't get updated in this pipeline. I have pushed a change to the pipeline, it should be good to just git pull and run step 6 again.

@vinitamehlawat
Author

Hi @computations

I did the git pull and re-ran the 6th script, but this time the result is the same:

running 8 iterations
['mpiexec', '-np', '48', '/home/vinita/covid19_cme_analysis/software/root_digger/bin/rd', '--tree', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0.in.tree', '--msa', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/data/covid_edited.fasta', '--model', 'GTR+FO+G4', '--exhaustive', '--prefix', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0', '--early-stop']
Traceback (most recent call last):
File "/home/vinita/covid19_cme_analysis/./pipeline/6_root_digger_rooting.py", line 66, in <module>
root_digger_launcher.launch_root_digger(tmp_tree_file, alignment, model,
File "/home/vinita/covid19_cme_analysis/scripts/root_digger_launcher.py", line 21, in launch_root_digger
subprocess.check_call(cmd, stdout = outfile, stderr = outfile)
File "/home/vinita/miniconda3/envs/name_of_my_env/lib/python3.10/subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['mpiexec', '-np', '48', '/home/vinita/covid19_cme_analysis/software/root_digger/bin/rd', '--tree', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0.in.tree', '--msa', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/data/covid_edited.fasta', '--model', 'GTR+FO+G4', '--exhaustive', '--prefix', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0', '--early-stop']' returned non-zero exit status 134.

@computations
Collaborator

computations commented Nov 26, 2021

And there are still no logs at this time in results/rootdigger_rooting? If so, can you just run the command manually like so:

mpiexec -np 48 /home/vinita/covid19_cme_analysis/software/root_digger/bin/rd --tree /home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0.in.tree --msa /home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/data/covid_edited.fasta --model GTR+FO+G4 --exhaustive --prefix /home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0 --early-stop
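When a launcher redirects a tool's output to a file, a crash can leave nothing visible on the terminal, which is why running the command by hand helps. As a rough sketch (not the pipeline's code), the same effect can be had from Python by writing stdout and stderr to a log and echoing the last lines on failure. `run_logged` and `tail_lines` are hypothetical names.

```python
import subprocess
import sys


def run_logged(cmd, log_path, tail_lines=20):
    """Run cmd, write stdout and stderr to log_path, return the exit status.

    On a non-zero exit status, the last tail_lines lines of the log are
    echoed to stderr so that the failure reason is visible immediately.
    """
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    if result.returncode != 0:
        with open(log_path) as log:
            tail = log.readlines()[-tail_lines:]
        sys.stderr.write("".join(tail))
    return result.returncode
```

Unlike `subprocess.check_call`, this returns the exit status instead of raising, so the caller decides whether to stop.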

@vinitamehlawat
Author

Hi @computations

Yes, I still don't have a log file in results/rootdigger_rooting. I ran the command manually as you suggested above.
Please have a look at the attached .txt file; it contains the error output I copied from the terminal after running the command:
manul_mpiexec_error.txt

@vinitamehlawat
Author

Hi @Pbdas

Thank you very much! I am now able to run the 5th script without any error, and I got my outputs for this script.

Again thank you for your time and efforts.

Vinita

@computations
Collaborator

It looks like the checkpoints might be corrupted. Remove any files in the runs/root_digger_runs with the .ckp extension and see if that works.
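A minimal sketch of that cleanup in Python, assuming the checkpoint files sit directly inside the run directory and end in `.ckp` as stated above. `remove_checkpoints` is a hypothetical helper name, not part of the pipeline.

```python
from pathlib import Path


def remove_checkpoints(run_dir, suffix=".ckp"):
    """Delete files ending in suffix directly inside run_dir.

    Returns the sorted names of the removed files so the caller can
    confirm that something was actually deleted.
    """
    removed = []
    for path in sorted(Path(run_dir).glob("*" + suffix)):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

For example, `remove_checkpoints("work_dir/2021-11-19_00/fmsan/runs/root_digger_runs")` would target the run directory used in this thread. An empty return value means no file matched, which is worth checking before re-running.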

@vinitamehlawat
Author

Hi @computations

I removed the 0.cpk file from runs/root_digger_runs and re-ran the manual command above. Please have a look at the attached .txt file:

manual_mpiexec_error2.txt

@vinitamehlawat
Author

Hi @Pbdas

I apologise for saying that the 5th script worked, but I got outputs for my three datasets fmsan, fmsao, and smsan, but NOT for smsao. For smsao data, I received the following error:

(base) ./pipeline/5_epa_outgroup_rooting.py work_dir/2021-11-19_00/smsao/
0 / 71
No protocol specified
No protocol specified
Traceback (most recent call last):
File "/home/vinita/covid19_cme_analysis/./pipeline/5_epa_outgroup_rooting.py", line 74, in <module>
placement.launch_epa( tree_file, cur_modelfile, ref_msa, query_msa, cur_outdir, thorough=True )
File "/home/vinita/covid19_cme_analysis/scripts/placement.py", line 116, in launch_epa
sub.check_call(cmd, stdout=sub.DEVNULL)
File "/home/vinita/miniconda3/lib/python3.9/subprocess.py", line 373, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/vinita/covid19_cme_analysis/software/epa-ng/bin/epa-ng', '--tree', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/smsao/runs/epa_runs/0/tree.newick', '--model', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/smsao/runs/epa_runs/0/eval.raxml.bestModel', '--msa', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/smsao/data/covid_edited.fasta', '--query', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/smsao/data/covid_outgroups.fasta', '--threads', '48', '--no-heur', '--filter-max', '50', '--filter-acc-lwr', '1.0', '--out-dir', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/smsao/runs/epa_runs/0', '--redo', '--verbose']' returned non-zero exit status 1.

Sorry for bothering you yet again
Vinita

@computations
Collaborator

@vinitamehlawat what happens when you remove mpiexec -np 48 and add --threads 48 to the end?
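Expressed on the argument list that the pipeline prints, the suggested change could look like the sketch below. `to_threaded` is a hypothetical helper for illustration; it only rewrites the list and does not run anything.

```python
def to_threaded(cmd, threads):
    """Drop a leading 'mpiexec -np N' and append '--threads <threads>'.

    The input list is left unmodified; a new list is returned.
    """
    if len(cmd) >= 3 and cmd[0] == "mpiexec" and cmd[1] == "-np":
        cmd = cmd[3:]
    return list(cmd) + ["--threads", str(threads)]
```

Applied to the command logged above, this yields the `rd` binary followed by its original flags and a trailing `--threads 48`.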

@vinitamehlawat
Author

@computations the terminal now looks like this:

(base) /home/vinita/covid19_cme_analysis/software/root_digger/bin/rd --tree /home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0.in.tree --msa /home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/data/covid_edited.fasta --model GTR+FO+G4 --exhaustive --prefix /home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0 --threads 48
No protocol specified
No protocol specified
[0.91] [Warning] Running MPI version with only 1 process,
[0.91] [Warning] Loading options from the checkpoint file
terminate called after throwing an instance of 'checkpoint_read_success_failure'
[balaji:3699665] *** Process received signal ***
[balaji:3699665] Signal: Aborted (6)
[balaji:3699665] Signal code: (-6)
[balaji:3699665] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7fc0bac513c0]
[balaji:3699665] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7fc0baa9018b]
[balaji:3699665] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7fc0baa6f859]
[balaji:3699665] [ 3] /lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7fc0baeab911]
[balaji:3699665] [ 4] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7fc0baeb738c]
[balaji:3699665] [ 5] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7fc0baeb73f7]
[balaji:3699665] [ 6] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7fc0baeb76a9]
[balaji:3699665] [ 7] /home/vinita/covid19_cme_analysis/software/root_digger/bin/rd(_Z17read_with_successI13cli_options_tEmiRT_+0x1ac9)[0x55b8c21d88e9]
[balaji:3699665] [ 8] /home/vinita/covid19_cme_analysis/software/root_digger/bin/rd(_ZN12checkpoint_t12load_optionsER13cli_options_t+0x40)[0x55b8c21d3ac0]
[balaji:3699665] [ 9] /home/vinita/covid19_cme_analysis/software/root_digger/bin/rd(_Z24merge_options_checkpointR13cli_options_tR12checkpoint_t+0x80)[0x55b8c21c9970]
[balaji:3699665] [10] /home/vinita/covid19_cme_analysis/software/root_digger/bin/rd(_Z12wrapped_mainiPPc+0xdd)[0x55b8c21cb4dd]
[balaji:3699665] [11] /home/vinita/covid19_cme_analysis/software/root_digger/bin/rd(main+0x2a)[0x55b8c21c818a]
[balaji:3699665] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fc0baa710b3]
[balaji:3699665] [13] /home/vinita/covid19_cme_analysis/software/root_digger/bin/rd(_start+0x2e)[0x55b8c21c891e]
[balaji:3699665] *** End of error message ***
Aborted (core dumped)
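The backtrace above reports "Signal: Aborted (6)", and the earlier exit status 134 equals 128 + 6, which is the common shell convention for a process killed by signal 6. The sketch below decodes a status under that convention. It is a heuristic: `describe_exit_status` is a hypothetical helper, and treating every status above 128 as a signal is an assumption, since a program may also choose such a value itself.

```python
import signal


def describe_exit_status(status):
    """Describe an exit status using the 128 + signal convention."""
    if status > 128:
        signum = status - 128   # shell-style encoding of a fatal signal
    elif status < 0:
        signum = -status        # subprocess reports a signal-killed child as -N
    else:
        return f"exit code {status}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = f"signal {signum}"  # not a known signal on this platform
    return f"terminated by {name}"
```

On Linux, `describe_exit_status(134)` gives `terminated by SIGABRT`, which matches an uncaught C++ exception such as the `checkpoint_read_success_failure` shown in the trace.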

@computations
Collaborator

OK, now try removing the checkpoint file and running that same command again. What happens?

@vinitamehlawat
Author

@computations :) it worked

It is still running, but I am not sure how much time it will take to complete:

(base) /home/vinita/covid19_cme_analysis/software/root_digger/bin/rd --tree /home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0.in.tree --msa /home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/data/covid_edited.fasta --model GTR+FO+G4 --exhaustive --prefix /home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0 --threads 48
No protocol specified
No protocol specified
[0.85] [Warning] Running MPI version with only 1 process,
[0.85] Running Root Digger
[0.85] Version: v1.7.0-14-g5f23473-mpi
[0.85] Build Commit: 5f234738b7e75848d737092a39155565205aa386
[0.85] Build Date: 2021-11-26 23:10:41
[0.85] Started: 2021-11-27 00:20:21
[0.85] Seed: 4028654047
[0.85] Number of threads per proc: 48
[0.85] Number of procs 1
[0.85] Command: /home/vinita/covid19_cme_analysis/software/root_digger/bin/rd --tree /home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0.in.tree --msa /home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/data/covid_edited.fasta --model GTR+FO+G4 --exhaustive --prefix /home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/fmsan/runs/root_digger_runs/0 --threads 48
[0.85] Please report any bugs to https://groups.google.com/forum/#!forum/raxml
[0.85] [Warning] Ignoring subst matrix GTR for model from command line. Currently only UNREST is supported
[1.59] Starting exhaustive search
[65.11] Step 1 / 3611, ETC: 65.29h
[153.64] Step 2 / 3611, ETC: 77.01h
[225.30] Step 3 / 3611, ETC: 75.27h
[289.90] Step 4 / 3611, ETC: 72.62h
[342.85] Step 5 / 3611, ETC: 68.68h
.
.
.

@computations
Collaborator

Thanks for being patient. There is an estimated runtime there, and I find it to be approximately accurate.

One thing to note: this is only one of the 8 trees that the script would have run. I am going to push a change to the script that fixes this, so that you can just run the script. But this will take a while; you have a very large tree.
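The ETC values in the log above are consistent with a simple extrapolation: elapsed seconds divided by completed steps, multiplied by the remaining steps. The sketch below reproduces that arithmetic from a progress line. This is inferred from the logged numbers, not taken from RootDigger's source, and `etc_hours` is a hypothetical helper name.

```python
import re

# Matches lines such as "[65.11] Step 1 / 3611, ETC: 65.29h".
PROGRESS = re.compile(r"\[(?P<elapsed>[\d.]+)\] Step (?P<step>\d+) / (?P<total>\d+)")


def etc_hours(line):
    """Estimate the remaining hours from one progress line, or None if no match."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    elapsed = float(match["elapsed"])      # seconds since start
    step = int(match["step"])              # steps completed so far
    total = int(match["total"])            # total number of steps
    seconds_per_step = elapsed / step
    return seconds_per_step * (total - step) / 3600.0
```

For the first logged step, 65.11 s per step times 3610 remaining steps is about 65.29 hours, matching the value printed by the tool.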

@pierrebarbera
Collaborator

> Hi @Pbdas
>
> I apologise for saying that the 5th script worked, but I got outputs for my three datasets fmsan, fmsao, and smsan, but NOT for smsao. For smsao data, I received the following error:
>
> (base) ./pipeline/5_epa_outgroup_rooting.py work_dir/2021-11-19_00/smsao/
> 0 / 71
> No protocol specified
> No protocol specified
> Traceback (most recent call last):
> File "/home/vinita/covid19_cme_analysis/./pipeline/5_epa_outgroup_rooting.py", line 74, in <module>
> placement.launch_epa( tree_file, cur_modelfile, ref_msa, query_msa, cur_outdir, thorough=True )
> File "/home/vinita/covid19_cme_analysis/scripts/placement.py", line 116, in launch_epa
> sub.check_call(cmd, stdout=sub.DEVNULL)
> File "/home/vinita/miniconda3/lib/python3.9/subprocess.py", line 373, in check_call
> raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command '['/home/vinita/covid19_cme_analysis/software/epa-ng/bin/epa-ng', '--tree', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/smsao/runs/epa_runs/0/tree.newick', '--model', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/smsao/runs/epa_runs/0/eval.raxml.bestModel', '--msa', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/smsao/data/covid_edited.fasta', '--query', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/smsao/data/covid_outgroups.fasta', '--threads', '48', '--no-heur', '--filter-max', '50', '--filter-acc-lwr', '1.0', '--out-dir', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-19_00/smsao/runs/epa_runs/0', '--redo', '--verbose']' returned non-zero exit status 1.
>
> Sorry for bothering you yet again
> Vinita

Hi Vinita,

there should be a file called epa_info.log in the runs/epa_runs directory, please share it here.

@vinitamehlawat
Author

Hi @Pbdas

Please find attached epa_info.log for smsao data

epa_info.log

Thank you
Vinita

@pierrebarbera
Collaborator

pierrebarbera commented Nov 30, 2021

Hi Vinita,

since you mentioned in #4 that for now you're only interested in getting a tree out, I think we can shelve this error for now. Placement is only relevant here if you want to try to see if it could find a better outgroup/root placement of the tree.

Let me know if in the future you want this kind of analysis, then I'll have another look!

Pierre

@vinitamehlawat
Author

Hi @Pbdas

Thank you very much. Could you please suggest exactly which scripts I need to run to get a thinned tree for my data? Also, regarding the placement you mentioned: is that scripts/placement.py or ./pipeline/wuhan_placement.py?

Vinita

@BenoitMorel
Owner

Hi Vinita,

I will help you with the thinning. I am updating the wiki page to explain how it should be run, but I need some time to read the code and remember how to use it properly. I will get back to you as soon as possible.

Benoit

@pierrebarbera
Collaborator

Hi Vinita,

I'm not sure I understand your question about placement. Do you want to use it after all? scripts/placement.py contains functions that are used by pipeline stages. pipeline/wuhan_placement.py is a separate placement-based stage that tries to place the original Wuhan SARS-CoV-2 genome into the tree. If you're just interested in building a tree, you don't need placement (or rootdigger, for that matter) at all.

Pierre

@vinitamehlawat
Author

Hi @Pbdas

The thing is, before using your pipeline I tried to construct a phylogenetic tree with two different tools, RAxML and IQ-TREE, but I didn't get good branch support in either tree. Then I read your paper, which I found extremely helpful, and followed your pipeline because you clearly describe how difficult phylogenetic inference is for SARS-CoV-2 data, given the low number of mutations in the sequences.

So my question is: which scripts in your pipeline are useful for my data (like 0_get_data.py, 1_preprocess_data.py, 2_pargenes.py, ...), so that I can run only those scripts on my SARS-CoV-2 data to study its phylogeny and get a phylogenetic tree with good branch support, which is what I am currently struggling with?

I hope my question is clear.
Vinita

@pierrebarbera
Collaborator

Hi Vinita,

I think the stages should be 0-3 to get the basic trees. Then stage 7 produces statistics about the trees and, from those, a set of "plausible" trees (these should be in a file called credible_ml_trees.newick). From these, two consensus trees are built (MR_consensus_tree.newick and MRE_consensus_tree.newick: majority rule and extended majority rule, respectively). If you load these in a tree viewer, the node labels should tell you how high the consensus was for each resolved bifurcation. While that isn't the bootstrap (BS) support value, it will give you some idea of how well the tree search could resolve the tree.

Note that a consensus tree (usually) has unresolved nodes, i.e. multifurcations. So you may need to resolve those depending on how you want to further use the tree. However, these would usually be resolved randomly, so that doesn't give you any better information. The idea of using the set of plausible trees is that it lets you see the broader picture, which is why we ran every further step once for each tree in that set and then interpreted the set of results as a whole.

I hope that makes sense!
Pierre
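To illustrate what the consensus values on the nodes mean, here is a toy sketch of the majority-rule idea: count how often each grouping of taxa appears across a set of trees, and keep the groupings found in more than half of them. It is a simplification under stated assumptions: trees are given as nested tuples of leaf names and treated as rooted, whereas real tools work on Newick files and unrooted bipartitions. `clades` and `majority_rule_support` are hypothetical names, not part of the pipeline.

```python
from collections import Counter


def clades(tree):
    """Return the non-trivial clades (leaf sets of internal nodes) of a tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):          # a leaf is just its name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)              # the root clade carries no information
    return found


def majority_rule_support(trees):
    """Map each clade present in more than half of the trees to its frequency."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))        # each clade counts once per tree
    total = len(trees)
    return {clade: n / total for clade, n in counts.items() if n / total > 0.5}
```

With three trees, two of which group A with B and C with D, both groupings get a value of 2/3 and would be kept, while a grouping seen in only one tree is dropped and shows up as an unresolved node.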

@BenoitMorel
Owner

Hi @vinitamehlawat,

I have updated the wiki page. I have made several minor changes, and added a section here: https://github.com/BenoitMorel/covid19_cme_analysis/wiki#description-of-the-steps

If you want to run tree thinning, please first update your repo (there were a few bugs which are now fixed) and read the new wiki section. Let me know if that's unclear.

Here is another important remark: we do not generate bootstrap trees in the current pipeline, and thus you won't get any support value (which is what interests you most :-)).
But you can change this by setting the following value to 100:

pargenes_bs_trees = 0

After this, you need to restart from step 2, pargenes.

But most importantly: now that I know what you want to achieve (and I am sorry to say this after all the painful debugging that you had to go through), I am not sure that this pipeline will help you more than just running raxml/iqtree. To get the support values, all we do is run raxml-ng (the other steps made sense for our study, but not for your purpose).
Maybe the tree thinning could help you a bit to prune noisy/redundant sequences, but in our experience (see our paper), it didn't improve the support that much...
In general, getting a well-supported tree for covid data is very difficult because the sequences are too similar, which is a fundamental problem that might not have any practical solution (in our opinion, at least).

Best,
Benoit

@vinitamehlawat
Author

Hi @BenoitMorel

Thank you so much for the clear explanation. I will definitely try https://github.com/BenoitMorel/covid19_cme_analysis/blob/30245cf877552d0151ba73290042cd2d8e0eb7e4/scripts/common.py#L378 and tree thinning. I apologise as well, @Pbdas @computations, for all the effort you put into debugging the different scripts.

Vinita
