
run_all.py taking forever #14

Open
nitinra opened this issue Feb 8, 2023 · 3 comments

nitinra commented Feb 8, 2023

Hello,

I am running SwiftOrtho on 276 insect species using the following command:

python run_all.py -i allprotein.fa -a 20

I started the run on Jan 30th, and it is still on the first step (the all-vs-all homology search). Is there any way I can make it run faster?

Regards,
Nitin

Rinoahu (Owner) commented Mar 7, 2023

How many protein sequences are in the file? Are they all from the same species?

nitinra (Author) commented Mar 7, 2023

Hello,

The species span the entire insect clade, so there are 276 different species, and the combined FASTA file contains ~6,839,337 sequences.
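That sequence count explains much of the slowness: an exhaustive all-vs-all search grows with the square of the number of sequences. The sketch below is a rough back-of-the-envelope calculation, not something taken from SwiftOrtho: ~6.8 million sequences give about 2.3 × 10^13 distinct pairs. The helper names and the 1e6 pairs/s comparison rate are hypothetical and only for illustration; seed-based search skips most pairs, so this is an upper bound on the work, not a runtime prediction.

```python
# Back-of-the-envelope size of an exhaustive all-vs-all comparison.
# Upper bound only: seed-based homology search avoids aligning most pairs.

N_SEQUENCES = 6_839_337  # combined FASTA size reported in this thread


def unordered_pairs(n: int) -> int:
    """Number of distinct sequence pairs, n * (n - 1) / 2."""
    return n * (n - 1) // 2


def days_needed(pairs: int, pairs_per_second: float) -> float:
    """Wall-clock days at a given comparison rate (86,400 s per day)."""
    return pairs / pairs_per_second / 86_400


pairs = unordered_pairs(N_SEQUENCES)
# Hypothetical rate for illustration, not a measured SwiftOrtho figure.
days = days_needed(pairs, 1_000_000)
print(f"{pairs:,} pairs, ~{days:,.0f} days at 1e6 pairs/s")
```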

How can I make it run faster? Thank you!

Regards,
Nitin

Rinoahu (Owner) commented Mar 8, 2023

  1. Delete the old version.

  2. Run git clone to get the latest version.

  3. Install it according to the instructions.

  4. Run the command:
    python run_all.py -i allprotein.fa -a 20 -s 11111111 -v 500

  5. If the protein sequences are highly redundant, you can also try the new tool:
    python run_all_fast.py -i allprotein.fa -a 20 -s 11111111 -v 500
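In the commands above, `-s 11111111` looks like a spaced-seed pattern. Assuming the common spaced-seed convention (a `1` marks a position that must match, a `0` marks a don't-care position; this reading is an assumption, not confirmed in this thread), the sketch below shows how a pattern turns a protein into seed keys and how two sequences are paired up through shared keys. The functions `seed_keys` and `shared_seeds` are hypothetical illustrations, not SwiftOrtho's code.

```python
from collections import defaultdict


def seed_keys(seq: str, pattern: str):
    """Yield (offset, key) for every window of `seq` spanned by `pattern`.

    Assumed convention: '1' = residue is part of the key, '0' = ignored.
    """
    span = len(pattern)
    care = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - span + 1):
        key = "".join(seq[start + i] for i in care)
        yield start, key


def shared_seeds(a: str, b: str, pattern: str) -> int:
    """Count window pairs (i in a, j in b) whose seed keys are identical."""
    index = defaultdict(list)  # key -> offsets in `a`
    for pos, key in seed_keys(a, pattern):
        index[key].append(pos)
    # `key in index` checks membership without inserting into the defaultdict
    return sum(len(index[key]) for _, key in seed_keys(b, pattern) if key in index)


if __name__ == "__main__":
    print(shared_seeds("AXC", "AYC", "101"))  # → 1 (mismatch is on a don't-care position)
    print(shared_seeds("AXC", "AYC", "111"))  # → 0 (every position must match)
```

A pattern made only of `1`s, such as `11111111`, is a contiguous seed: every residue in the window must match, so fewer unrelated window pairs share a key.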

Generally, increasing the seed length or reducing the number of hits kept in the homology search makes it run faster.
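Both knobs can be sketched roughly. Under a simplifying assumption of independent, uniformly distributed amino acids (real protein composition is not uniform), each extra must-match seed position divides the expected number of chance seed matches between unrelated sequences by about 20. Capping the hits kept per query bounds the downstream work; here `-v 500` is assumed to be such a cap, which the thread does not spell out. The function names are hypothetical.

```python
import heapq


def expected_random_seed_hits(len_a: int, len_b: int, weight: int,
                              alphabet: int = 20) -> float:
    """Expected chance seed matches between two unrelated sequences.

    Assumes independent, uniform residues, so one window pair matches
    with probability (1 / alphabet) ** weight. Edge effects are ignored.
    """
    window_pairs = len_a * len_b
    return window_pairs * (1 / alphabet) ** weight


def keep_top_hits(hits, max_hits: int):
    """Keep only the `max_hits` best (score, target_id) hits for one query.

    Higher score is better; the result is sorted from best to worst.
    """
    return heapq.nlargest(max_hits, hits)


if __name__ == "__main__":
    for weight in (6, 8):
        print(weight, expected_random_seed_hits(400, 400, weight))
    print(keep_top_hits([(12.5, "seqA"), (40.1, "seqB"), (7.3, "seqC")], 2))
```

Going from 6 to 8 must-match positions cuts the expected chance matches by a factor of about 400 (20 × 20) under these assumptions, at the cost of missing some more distant homologs.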
