Command-line version of AF2-Multimer? #34

Closed
gundalav opened this issue Nov 8, 2021 · 21 comments

@gundalav

gundalav commented Nov 8, 2021

Hi Yoshitaka-san,

As you mentioned, AF2-Multimer seems to be available now.
Is there a way to port it to a command-line version?
Just like your great AF2_advanced.

G.V.

@YoshitakaMo
Owner

AF2-Multimer already runs from a command-line interface. How about installing it from the DeepMind repository? It's very nice!

@xvazquezc

The folks from ColabFold are already working on incorporating it into their code to work with MMseqs2:
https://twitter.com/thesteinegger/status/1455768659717095427
It should be quite straightforward to include it here once they release it.

@YoshitakaMo
Owner

YoshitakaMo commented Nov 10, 2021

Yes, I noticed it, and the ColabFold team, including me, is now testing the new version in the background. The new Google Colab notebook of ColabFold will be released in a few days. I'll also port it into this repository.

@YoshitakaMo
Owner

The new notebook compatible with AF2-Multimer is now public. I'll port it into this repository, but please wait for a few days until I'm free.

Alternatively, ColabFold now has a colabfold_batch command to run it locally. Read more about it here: https://github.com/sokrypton/ColabFold#running-locally

@gitgyj

gitgyj commented Nov 14, 2021

Thanks for the work. I am wondering whether the upcoming AF2-Multimer code for local machines will use Jackhmmer?

@ShannonTown

> Alternatively, ColabFold now has a colabfold_batch command to run it locally. Read more about it here: https://github.com/sokrypton/ColabFold#running-locally

Hi, I was wondering if colabfold_batch can run a fasta file with complexes (multiple sequences)? I tried and it only returned the structure for the first sequence.

@konstin

konstin commented Nov 19, 2021

@ShannonTown Yes, since yesterday actually: sokrypton/ColabFold#88 (for older versions, you can use the csv input)

@ShannonTown

Thank you. I replaced the batch.py file in my virtual environment, "Python3Env/lib/python3.7/site-packages/colabfold/batch.py", and ran colabfold_batch again, but still got the same result. Is there anything else I need to do to update it? (I'm new to Linux and all of this, so I appreciate your help!)

If I were to try csv input, how should I make the csv files? I couldn't find an example.

@konstin

konstin commented Nov 19, 2021

I recommend pip uninstall -y colabfold && pip install git+https://github.com/sokrypton/ColabFold over manually changing files. For the input format, you need to put the whole complex on one line with colons (:) between the chains.
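
To make the colon-separated format concrete, here is a minimal sketch that writes such a file. The helper name write_complex_fasta, the record name, and the short placeholder chains are assumptions for illustration; only the colon-between-chains convention comes from the comment above.

```python
from pathlib import Path
import tempfile


def write_complex_fasta(path, name, chains):
    """Write a single-record FASTA whose sequence line joins all chains with ':'."""
    record = f">{name}\n{':'.join(chains)}\n"
    Path(path).write_text(record)
    return record


# Short placeholder chains, not real full-length protein sequences.
chains = ["MVLSPADKT", "MVHLTPEEK"]
out = Path(tempfile.gettempdir()) / "complex_example.fasta"
record = write_complex_fasta(out, "example_complex", chains)
print(record)
```

The resulting file has one header line and one sequence line, so every chain of the complex stays in a single query.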

@ShannonTown

ShannonTown commented Nov 19, 2021

Hi, I reinstalled colabfold and put the whole complex in one line with colons between chains, and got the following error:

If you require more MSAs, please host your own API and pass it to --host-url
2021-11-19 14:21:07,419 Found 5 citations for tools or databases
2021-11-19 14:21:15,490 Query 1/1: Hemoglobin (length 581)
2021-11-19 14:21:15,584 Could not predict Hemoglobin: Invalid character in the sequence: :
Traceback (most recent call last):
  File "/alpha/Python3Env/lib/python3.7/site-packages/colabfold/batch.py", line 624, in run
    sequence=sequence, description="none", num_res=len(sequence)
  File "/alpha/Python3Env/lib/python3.7/site-packages/alphafold/data/pipeline.py", line 43, in make_sequence_features
    map_unknown_to_x=True)
  File "/alpha/Python3Env/lib/python3.7/site-packages/alphafold/common/residue_constants.py", line 580, in sequence_to_onehot
    raise ValueError(f'Invalid character in the sequence: {aa_type}')
ValueError: Invalid character in the sequence: :
Traceback (most recent call last):
  File "/alpha/Python3Env/bin/colabfold_batch", line 8, in <module>
    sys.exit(main())
  File "/alpha/Python3Env/lib/python3.7/site-packages/colabfold/batch.py", line 862, in main
    recompile_all_models=args.recompile_all_models,
  File "/alpha/Python3Env/lib/python3.7/site-packages/colabfold/batch.py", line 648, in run
    np_example = features_for_chain[protein.PDB_CHAIN_IDS[0]]
KeyError: 'A'

Here's what I put in the fasta file; there are 4 chains of ~142 AA each:

'>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR:MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH:MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR:MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH'
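
The ValueError in the log above is what one would expect if the colon-joined string reaches a residue encoder without being split first. The sketch below reproduces that behaviour with a simplified stand-in: the function encode and the 20-letter ALPHABET are assumptions for illustration, not the AlphaFold code.

```python
# Simplified stand-in for a residue encoder; NOT the AlphaFold implementation.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids (assumed alphabet)


def encode(sequence):
    """Map each residue to its index, rejecting anything outside ALPHABET."""
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    encoded = []
    for aa in sequence:
        if aa not in index:
            raise ValueError(f"Invalid character in the sequence: {aa}")
        encoded.append(index[aa])
    return encoded


query = "MVLS:MVHL"  # two short placeholder chains joined by a colon

try:
    encode(query)  # the colon reaches the encoder unsplit
    message = None
except ValueError as err:
    message = str(err)
    print(message)

# Splitting on ':' first yields one clean chain per entry.
per_chain = [encode(chain) for chain in query.split(":")]
print(len(per_chain))
```

If the installed version treats the whole line as a single chain, the colon is seen as a residue, which would match the message in the log.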

@ShannonTown

I put the 4 chains in a .csv file with 2 columns, one named "id", the other named "sequence", and colabfold_batch returned the following error:

Traceback (most recent call last):
  File "/alpha/Python3Env/bin/colabfold_batch", line 8, in <module>
    sys.exit(main())
  File "/alpha/Python3Env/lib/python3.7/site-packages/colabfold/batch.py", line 835, in main
    queries, is_complex = get_queries(args.input, args.sort_queries_by)
  File "/alpha/Python3Env/lib/python3.7/site-packages/colabfold/batch.py", line 319, in get_queries
    (seqs, header) = pipeline.parsers.parse_fasta(file.read_text())
  File "/alpha/Python3Env/lib/python3.7/site-packages/alphafold/data/parsers.py", line 89, in parse_fasta
    sequences[index] += line
IndexError: list index out of range

Thank you!
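
The IndexError above is consistent with the CSV text being handed to a FASTA parser. Below is a minimal sketch of that failure mode. It assumes the parser appends each sequence line to the entry opened by the most recent ">" header, as the failing line `sequences[index] += line` suggests; parse_fasta_sketch is an illustrative re-creation, not the library source.

```python
def parse_fasta_sketch(text):
    """Simplified FASTA parser (illustrative, not the AlphaFold source)."""
    sequences, descriptions = [], []
    index = -1
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            index += 1
            descriptions.append(line[1:])
            sequences.append("")
        elif line:
            # With no header seen yet, the list is empty and this lookup fails.
            sequences[index] += line
    return sequences, descriptions


csv_text = "id,sequence\nComplex,QWERTY:ASDFGH\n"
try:
    parse_fasta_sketch(csv_text)
    outcome = "parsed"
except IndexError as err:
    outcome = f"IndexError: {err}"
print(outcome)
```

A CSV file has no ">" line, so the first data line is appended to a list that is still empty.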

@CYP152N1

Did the .fasta file contain ">" at the beginning of the first line?

https://zhanggroup.org/FASTA/

@ShannonTown

Yes, it contains a ">" on the first line; somehow the ">" was interpreted as a quote marker.

@CYP152N1

Certainly. I got it.

@hz424

hz424 commented Nov 22, 2021

@ShannonTown I ran into the same issue/error you had. Have you figured out a way to fix this? Thanks!!

@ShannonTown

@hz424 I managed to modify the AlphaFold2_mmseqs2 notebook to run locally in Jupyter, and it worked for multimers when I put colons between chains.

I'm still not sure why colabfold_batch doesn't work on the command line. Maybe it has something to do with how the queries are imported.

@konstin

konstin commented Nov 24, 2021

@ShannonTown Could it be your case is also sokrypton/ColabFold#97 (comment)?

For running it locally, what modifications did you have to make, and did you see the local runtimes option in Colab?

@ShannonTown

ShannonTown commented Nov 24, 2021

@konstin

> For running it locally, what modifications did you have to make, and did you see the local runtimes option in Colab?

I didn't use the local runtime option in Colab because I wasn't aware of it. Instead, I added the site-packages directory of my local virtual environment, which has colabfold installed, to sys.path and ran the program locally in JupyterHub:
import sys
pkgdir = '/xxxxxxxx/lib/python3.7/site-packages/'
sys.path.insert(0, pkgdir)

@hz424 I looked at sokrypton/ColabFold#97 (comment) and batch.py. I think the problem is that the get_queries function only takes fasta files and does not split sequences when the input path is a directory, which is the usage the README gives:
colabfold_batch <directory_with_fasta_files> <result_dir>

If we use a file path as input, it should work with a fasta file as long as you put colons between the chains (it seems it still would not work if you use '>fasta name' lines to separate the sequences; you have to use colons):
colabfold_batch /xxxxxxxxx/xxxxx.fasta <result_dir>
or colabfold_batch /xxxxxxxxx/xxxxx.csv <result_dir>

I think right now the batch mode only works for a directory of fasta files each with a single chain, but doesn't work for a directory of fasta files each with multiple chains.
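
As a rough illustration of the behaviour being asked for, the sketch below treats a single file and a directory the same way, splitting on colons in both cases. collect_queries is a hypothetical helper written for this note, not the get_queries function from batch.py, and it assumes one record per fasta file.

```python
from pathlib import Path
import tempfile


def collect_queries(input_path):
    """Hypothetical loader: accept one FASTA file or a directory of them."""
    path = Path(input_path)
    files = sorted(path.glob("*.fasta")) if path.is_dir() else [path]
    queries = []
    for file in files:
        header, sequence = None, ""
        for line in file.read_text().splitlines():
            line = line.strip()
            if line.startswith(">"):
                header = line[1:]
            elif line:
                sequence += line
        # Split on ':' in both cases so complexes also work from a directory.
        queries.append((header or file.stem, sequence.split(":")))
    return queries


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dimer.fasta").write_text(">dimer\nMVLS:MVHL\n")
    (Path(tmp) / "mono.fasta").write_text(">mono\nMVLS\n")
    from_dir = collect_queries(tmp)
    from_file = collect_queries(Path(tmp) / "dimer.fasta")
print(from_dir)
```

The file names and short chains are placeholders; the point is only that the directory branch and the file branch share the same colon handling.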

PS, @konstin: in lines 181-187 of batch.py, the predict_structure function sets the start time after the prediction step, so the reported prediction time is always 0.0 sec.
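
For reference, a minimal sketch of the usual timing pattern, with the clock started before the call. The names timed and slow_square are illustrative assumptions, with slow_square standing in for a prediction step; this is not the code from batch.py.

```python
import time


def timed(fn, *args):
    """Run fn and return (result, elapsed seconds); the clock starts before the call."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def slow_square(x):
    # Hypothetical stand-in for a prediction step.
    time.sleep(0.05)
    return x * x


result, elapsed = timed(slow_square, 3)
print(result, f"{elapsed:.2f}s")
```

If the start time were taken after the call instead, the difference would cover only the bookkeeping that follows and round to 0.0.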

@andzajan

Hi @ShannonTown, I know this discussion is quite old now, but the proper format for CSV input should be like this. The data import function in batch.py looks for the column names id and sequence.

A fasta file, or multiple fasta files in the same folder, won't work with --model-type AlphaFold2-multimer.

id,sequence
Complex,VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR:MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH:MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR:MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
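
A file in this layout can be generated with Python's standard csv module. The sketch below is one way to do it; the helper names write_queries_csv and read_queries_csv and the short placeholder chains are assumptions for illustration, while the id and sequence column names come from the comment above.

```python
import csv
import io


def write_queries_csv(handle, queries):
    """Write (id, chains) pairs as rows with 'id' and 'sequence' columns."""
    writer = csv.DictWriter(handle, fieldnames=["id", "sequence"], lineterminator="\n")
    writer.writeheader()
    for query_id, chains in queries:
        writer.writerow({"id": query_id, "sequence": ":".join(chains)})


def read_queries_csv(handle):
    """Read rows back and split each sequence into its chains."""
    return [(row["id"], row["sequence"].split(":")) for row in csv.DictReader(handle)]


buffer = io.StringIO()
write_queries_csv(buffer, [("Complex", ["QWERTY", "ASDFGH"])])  # placeholder chains
text = buffer.getvalue()
print(text)
parsed = read_queries_csv(io.StringIO(text))
```

Reading the file back and splitting on colons is a quick check that each row holds the intended number of chains before starting a long run.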

@jenchem

jenchem commented Aug 4, 2022

> Hi @ShannonTown, I know this discussion is quite old now, but the proper format for CSV input should be like this. The data import function in batch.py looks for the column names id and sequence.
>
> A fasta file, or multiple fasta files in the same folder, won't work with --model-type AlphaFold2-multimer.
>
> id,sequence
> Complex,VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR:MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH:MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR:MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH

So for AF multimer v2, I need to create a sequence file named 1abc.csv in the following format:

id,sequence
Complex,QWERTY : ASDFGH
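
The thread does not confirm whether spaces around the colon are accepted; the earlier examples have none. A cautious sketch, assuming bare colons are the safe form, is to normalize the sequence field before writing the file. normalize_complex is a hypothetical helper, not part of colabfold.

```python
def normalize_complex(sequence):
    """Strip whitespace around each chain and rejoin with bare colons."""
    chains = [chain.strip() for chain in sequence.split(":")]
    if any(not chain for chain in chains):
        raise ValueError("empty chain in complex sequence")
    return ":".join(chains)


normalized = normalize_complex("QWERTY : ASDFGH")
print(normalized)
```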

@ShannonTown

ShannonTown commented Aug 4, 2022 via email
