Error when generating features with `feature_mode='multimer'` #16

guzmanfj · 2023-04-06T12:27:52Z

I want to generate features for a protein complex with a modified version of the example/run_fea_gen.sh script:

#!/bin/bash
# An example script for feature generation. It depends heavily on your installation,
# because of the many third-party tools and multiple sequence libraries involved.
#
# You need to take care of these paths, python environment, and third-party sequence tools.
#. load_alphafold  ## set up proper AlphaFold conda environment.

DATA_DIR=/ibex/ai/reference/KSL/alphafold/2.3.1
af_dir=../src

if [ $# -eq 0 ]
  then
    echo "Usage: $0 <seq_file>"
    exit 1
fi
fasta_path=$1
out_dir=af2c_fea_test

# choices are "reduced_dbs", "full_dbs", "uniprot"
db_preset='full_dbs'

# choices are "monomer", "multimer", "monomer+species", "monomer+fullpdb"
# Options "monomer" and "multimer" follow the official AlphaFold data pipelines for monomeric
# and multimeric structure predictions, respectively.
#
# Option "monomer+species" is a modified monomeric pipeline in which the species information
# is recorded for MSA pairing using only monomeric input features. This option is recommended.
#feature_mode='monomer+species'
#
# Option "monomer+fullpdb": in addition to adding species, it uses the multimer template pipeline
# rather than the template pipeline of the original monomer modeling. The multimer template pipeline
# searches the full PDB for templates, which is more comprehensive than the monomer template pipeline.
# feature_mode='monomer+fullpdb'
feature_mode='multimer'

#max_template_date=2020-05-15  # CASP14 starting date
max_template_date=$(date +"%Y-%m-%d")  # current date


echo "Info: sequence file is $fasta_path"
echo "Info: out_dir is $out_dir"
echo "Info: db_preset is $db_preset"
echo "Info: feature mode is $feature_mode"
echo "Info: max_template_date is $max_template_date"


##########################################################################################


python $af_dir/run_af2c_fea.py --fasta_paths=$fasta_path --db_preset=$db_preset \
  --data_dir=$DATA_DIR --output_dir=$out_dir      \
  --uniprot_database_path=$DATA_DIR/uniprot/uniprot.fasta \
  --uniref90_database_path=$DATA_DIR/uniref90/uniref90.fasta \
  --mgnify_database_path=$DATA_DIR/mgnify/mgy_clusters_2022_05.fa \
  --pdb_seqres_database_path=$DATA_DIR/pdb_seqres/pdb_seqres.txt \
  --bfd_database_path=$DATA_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
  --uniclust30_database_path=$DATA_DIR/uniref30/UniRef30_2022_02 \
  --template_mmcif_dir=$DATA_DIR/pdb_mmcif/mmcif_files  \
  --max_template_date=$max_template_date                 \
  --obsolete_pdbs_path=$DATA_DIR/pdb_mmcif/obsolete.dat \
  --feature_mode=$feature_mode \
  --use_precomputed_msas=True

When running the script I obtain the following error:

$ ./run_fea_gen_mod.sh Q9S3U9-6.fasta
Info: sequence file is Q9S3U9-6.fasta
Info: out_dir is af2c_fea_test
Info: db_preset is full_dbs
Info: feature mode is multimer
Info: max_template_date is 2023-03-25
add_species is False
I0325 16:32:42.717077 47109242920640 templates.py:857] Using precomputed obsolete pdbs /ibex/ai/reference/KSL/alphafold/2.3.1/pdb_mmcif/obsolete.dat.
I0325 16:32:42.721372 47109242920640 run_af2c_fea.py:282] Using random seed 372986757380479995 for the data pipeline
Info: working on target Q9S3U9-6 at gpu202-23-l
I0325 16:32:42.721538 47109242920640 run_af2c_fea.py:144] Predicting Q9S3U9-6
I0325 16:32:42.726290 47109242920640 pipeline_multimer.py:287] Running monomer pipeline on chain A: sp|Q9S3U9|VIOC_CHRVO
I0325 16:32:42.726786 47109242920640 jackhmmer.py:133] Launching subprocess "/ibex/sw/csg/alphafold/2.3.1/el7.9_conda/miniconda3/envs/alphafold_2.3.1/bin/jackhmmer -o /dev/null -A /tmp/tmp5q6it5mi/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmpvq51lmhm.fasta /ibex/ai/reference/KSL/alphafold/2.3.1/uniref90/uniref90.fasta"
I0325 16:32:42.730009 47109242920640 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0325 16:37:28.661425 47109242920640 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 285.931 seconds
I0325 16:37:28.665437 47109242920640 jackhmmer.py:133] Launching subprocess "/ibex/sw/csg/alphafold/2.3.1/el7.9_conda/miniconda3/envs/alphafold_2.3.1/bin/jackhmmer -o /dev/null -A /tmp/tmpbc32gpxf/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmpvq51lmhm.fasta /ibex/ai/reference/KSL/alphafold/2.3.1/mgnify/mgy_clusters_2022_05.fa"
I0325 16:37:28.670499 47109242920640 utils.py:36] Started Jackhmmer (mgy_clusters_2022_05.fa) query
I0325 16:47:29.123045 47109242920640 utils.py:40] Finished Jackhmmer (mgy_clusters_2022_05.fa) query in 600.452 seconds
I0325 16:47:29.134068 47109242920640 hmmbuild.py:121] Launching subprocess ['/ibex/sw/csg/alphafold/2.3.1/el7.9_conda/miniconda3/envs/alphafold_2.3.1/bin/hmmbuild', '--hand', '--amino', '/tmp/tmpe_2th29r/output.hmm', '/tmp/tmpe_2th29r/query.msa']
I0325 16:47:29.147607 47109242920640 utils.py:36] Started hmmbuild query
I0325 16:47:29.319181 47109242920640 hmmbuild.py:128] hmmbuild stdout:
# hmmbuild :: profile HMM construction from multiple sequence alignments
# HMMER 3.3.2 (Nov 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# input alignment file:             /tmp/tmpe_2th29r/query.msa
# output HMM file:                  /tmp/tmpe_2th29r/output.hmm
# input alignment is asserted as:   protein
# model architecture construction:  hand-specified by RF annotation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# idx name                  nseq  alen  mlen eff_nseq re/pos description
#---- -------------------- ----- ----- ----- -------- ------ -----------
1     query                  505   156   120     3.48  0.590

# CPU time: 0.15u 0.00s 00:00:00.15 Elapsed: 00:00:00.15


stderr:


I0325 16:47:29.319365 47109242920640 utils.py:40] Finished hmmbuild query in 0.172 seconds
I0325 16:47:29.319745 47109242920640 hmmsearch.py:103] Launching sub-process ['/ibex/sw/csg/alphafold/2.3.1/el7.9_conda/miniconda3/envs/alphafold_2.3.1/bin/hmmsearch', '--noali', '--cpu', '8', '--F1', '0.1', '--F2', '0.1', '--F3', '0.1', '--incE', '100', '-E', '100', '--domE', '100', '--incdomE', '100', '-A', '/tmp/tmpzilc_m4o/output.sto', '/tmp/tmpzilc_m4o/query.hmm', '/ibex/ai/reference/KSL/alphafold/2.3.1/pdb_seqres/pdb_seqres.txt']
I0325 16:47:29.331137 47109242920640 utils.py:36] Started hmmsearch (pdb_seqres.txt) query
I0325 16:47:38.230762 47109242920640 utils.py:40] Finished hmmsearch (pdb_seqres.txt) query in 8.899 seconds
Traceback (most recent call last):
  File "../src/run_af2c_fea.py", line 309, in <module>
    app.run(main)
  File "/ibex/sw/csg/alphafold/2.3.1/el7.9_conda/miniconda3/envs/alphafold_2.3.1/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/ibex/sw/csg/alphafold/2.3.1/el7.9_conda/miniconda3/envs/alphafold_2.3.1/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "../src/run_af2c_fea.py", line 289, in main
    predict_structure(
  File "../src/run_af2c_fea.py", line 155, in predict_structure
    feature_dict = data_pipeline.process(
  File "/ibex/user/guzmanfj/af2complex/src/alphafold/data/pipeline_multimer.py", line 341, in process
    chain_features = self._process_single_chain(
  File "/ibex/user/guzmanfj/af2complex/src/alphafold/data/pipeline_multimer.py", line 289, in _process_single_chain
    chain_features = self._monomer_data_pipeline.process(
  File "/ibex/user/guzmanfj/af2complex/src/alphafold/data/pipeline.py", line 238, in process
    msa_runner=self.hhblits_bfd_uniref_runner,
AttributeError: 'DataPipeline' object has no attribute 'hhblits_bfd_uniref_runner'

These are the contents of the Q9S3U9-6.fasta input file:

>sp|Q9S3U9|VIOC_CHRVO
MKRAIIVGGGLAGGLTAIYLAKRGYEVHVVEKRGDPLRDLSSYVDVVSSRAIGVSMTVRG
IKSVLAAGIPRAELDACGEPIVAMAFSVGGQYRMRELKPLEDFRPLSLNRAAFQKLLNKY
>sp|Q9S3U9|VIOC_CHRVO
MKRAIIVGGGLAGGLTAIYLAKRGYEVHVVEKRGDPLRDLSSYVDVVSSRAIGVSMTVRG
IKSVLAAGIPRAELDACGEPIVAMAFSVGGQYRMRELKPLEDFRPLSLNRAAFQKLLNKY
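
As a quick sanity check on an input like this, the sketch below parses the FASTA text and reports how many chains it contains, their lengths, and whether all chains are identical. The helper `read_fasta` is a hypothetical name written for illustration; it is not part of AF2Complex or AlphaFold.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The two-chain input from this issue, pasted inline so the sketch is self-contained.
fasta_text = """\
>sp|Q9S3U9|VIOC_CHRVO
MKRAIIVGGGLAGGLTAIYLAKRGYEVHVVEKRGDPLRDLSSYVDVVSSRAIGVSMTVRG
IKSVLAAGIPRAELDACGEPIVAMAFSVGGQYRMRELKPLEDFRPLSLNRAAFQKLLNKY
>sp|Q9S3U9|VIOC_CHRVO
MKRAIIVGGGLAGGLTAIYLAKRGYEVHVVEKRGDPLRDLSSYVDVVSSRAIGVSMTVRG
IKSVLAAGIPRAELDACGEPIVAMAFSVGGQYRMRELKPLEDFRPLSLNRAAFQKLLNKY
"""

records = read_fasta(fasta_text)
lengths = [len(seq) for _, seq in records]
# True when every chain has the same sequence, i.e. the input is a homo-oligomer.
is_homomer = len({seq for _, seq in records}) == 1

print("chains:", len(records))
print("lengths:", lengths)
print("all chains identical:", is_homomer)
```

The 120-residue chain length agrees with the `mlen` value of 120 reported by hmmbuild in the log above.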


FreshAirTonight · 2023-04-07T20:03:48Z

Thank you for reporting this bug. It was caused by a renamed variable that affects the MSA search on the UniRef30 library. I have pushed a fix; please give it a try.
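
For readers who hit the same traceback, the failure mode can be illustrated with a minimal sketch. The class and the old attribute name below are hypothetical stand-ins, not the actual AF2Complex code: the constructor stores the runner under one name while `process` reads it under another, so the error only appears once execution reaches that line, after the earlier searches have already run.

```python
class DataPipelineSketch:
    """Hypothetical stand-in for a data pipeline; not the AF2Complex class."""

    def __init__(self):
        # The runner is stored under one attribute name (assumed for illustration)...
        self.hhblits_bfd_uniclust_runner = "hhblits-runner"

    def process(self):
        # ...but read under a different name, so this line raises AttributeError.
        return self.hhblits_bfd_uniref_runner


def missing_attributes(obj, names):
    """Return the names from `names` that `obj` does not define."""
    return [name for name in names if not hasattr(obj, name)]


pipeline = DataPipelineSketch()

try:
    pipeline.process()
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# A cheap check like this at start-up surfaces the mismatch before any long search runs.
missing = missing_attributes(pipeline, ["hhblits_bfd_uniref_runner"])

print(error_message)
print(missing)
```

Checking for expected attributes right after construction is one way to catch this kind of rename early, rather than after several minutes of Jackhmmer searches.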

guzmanfj · 2023-04-09T07:32:36Z

It seems to work now; it produced the features.pkl file. Thank you for your help!
