alphafold3_tools

Toolkit for alphafold3 input generation and output analysis


Installation

Requirements:

  • Python 3.10 or later

On macOS, Python 3.12 can be installed with Homebrew: brew install python@3.12

Note

If you are using Python 3.12, create and activate a virtual environment (venv) first. Replace /path/to/workingdirectory with your working directory.

mkdir -p /path/to/workingdirectory ; cd /path/to/workingdirectory
python3.12 -m venv .venv
source .venv/bin/activate
# Ubuntu 22.04 ships python3.10 by default. Use python3.12 instead if it is installed.
# install from GitHub
python3 -m pip install git+https://github.com/cddlab/alphafold3_tools.git
# upgrade
python3 -m pip uninstall alphafold3_tools -y && python3 -m pip install --upgrade git+https://github.com/cddlab/alphafold3_tools.git

On Ubuntu, the commands are installed in ~/.local/bin or, when a virtual environment is active, in its bin directory (e.g. /path/to/workingdirectory/.venv/bin). If the commands are not found, you may need to add the install directory to your PATH environment variable.

export PATH=$PATH:~/.local/bin

Usage

More detailed usage information can be found by running the commands with the -h option. The version information will be displayed with the -v option.

msatojson

msatojson is a command to convert an a3m-formatted multiple sequence alignment (MSA) file to JSON format. The input name can be specified with the -n option.

msatojson -i input.a3m -o input.json -n inputname

The input a3m MSA file can be generated with the MMseqs2 web server (or ColabFold). The --msa-only option of colabfold_batch is useful when you only need the a3m MSA files.

msatojson can accept a directory containing multiple a3m files. In this case, the output JSON files will be saved in the specified output directory.

msatojson -i /path/to/a3m_containing/directory -o /path/to/output/directory
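To make the conversion concrete, here is a minimal Python sketch of how a3m text could be embedded in an AlphaFold3 input JSON. It is not the msatojson implementation. The function name a3m_to_af3_input is hypothetical, and the sketch assumes a single protein chain whose query is the first a3m record. The unpairedMsa, pairedMsa, and templates fields are taken from the AlphaFold3 input format.

```python
import json


def a3m_to_af3_input(a3m_text: str, name: str, seed: int = 1) -> dict:
    """Hypothetical helper: wrap a single-chain a3m MSA in an AlphaFold3 input dict."""
    # ColabFold a3m files may start with '#' header lines; drop them and blank lines.
    lines = [ln.strip() for ln in a3m_text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a3m text must start with a '>' header line")

    # The first record of an a3m file is the query; collect its sequence lines.
    query_parts = []
    for ln in lines[1:]:
        if ln.startswith(">"):
            break
        query_parts.append(ln)
    query = "".join(query_parts)

    return {
        "name": name,
        "dialect": "alphafold3",
        "version": 1,
        "sequences": [
            {
                "protein": {
                    "id": ["A"],  # assumption: one chain, labelled A
                    "sequence": query,
                    "unpairedMsa": "\n".join(lines) + "\n",
                    "pairedMsa": "",
                    "templates": [],
                }
            }
        ],
        "modelSeeds": [seed],
    }


if __name__ == "__main__":
    demo = "#5\t1\n>query\nMKTAY\n>hit1\nMKtTAY\n"
    print(json.dumps(a3m_to_af3_input(demo, "demo"), indent=2))
```

The real command also handles directories and template search; this sketch only shows where the MSA text ends up in the JSON.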

From version 0.2.0, templates can also be added to the output JSON file. Use the --include_templates option to include them. In this case, the directory containing the mmCIF files (e.g. /path/to/mmcif_files) must be specified with the --pdb_database_path option, and the corresponding pdb_seqres.txt file with the --seqres_database_path option. The --max_template_date option sets the maximum template date.

msatojson -i input.a3m -o output.json --include_templates --pdb_database_path /path/to/mmcif_files --seqres_database_path /path/to/pdb_seqres.txt --max_template_date 2099-09-30

Note

  • This feature requires HMMER 3 or later to be installed and accessible in your PATH. For macOS users, you can install HMMER via Homebrew:
brew install hmmer
  • The --hmmbuild_binary_path and --hmmsearch_binary_path options can be used to specify the paths to the hmmbuild and hmmsearch binaries, respectively, if they are not in your PATH.
  • The --save_hmmsto option can be used to save HMMER's intermediate file.
  • The pdb_seqres.txt file can be downloaded from wwPDB. The file size is about 356 MB (as of Dec. 2025).

fastatojson

fastatojson is a command to convert a FASTA file to JSON format compatible with AlphaFold3.

fastatojson -i input.fasta [-s 1 2 3 ...] [-d]
  • -i: Input FASTA file. Mandatory.
  • -s: Model seeds to be used. Optional. Default is 1. Multiple seeds can be specified.
  • -d: Debug mode. Optional. If specified, the command will print debug information.

For example, if you have a FASTA file containing three records, input.fasta:

>P12345
KAKDLSKCLS
>Q67890
KADFILCSLK
>I23L45_I3PLS2
LAKDCL:KKALS

You will obtain three JSON files: p12345.json, q67890.json, and i23l45_i3pls2.json. The last one contains two sequences, LAKDCL and KKALS, which are separated by a colon (:) in the FASTA record. It will look like this:

{
  "name": "i23l45_i3pls2",
  "dialect": "alphafold3",
  "version": 1,
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "LAKDCL"
      }
    },
    {
      "protein": {
        "id": ["B"],
        "sequence": "KKALS"
      }
    }
  ],
  "modelSeeds": [1]
}
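The mapping from FASTA records to JSON files can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the fastatojson source. The function name fasta_to_af3_inputs is hypothetical, and the chain IDs are assumed to be assigned A, B, C, ... in order, which limits the sketch to 26 chains.

```python
import json


def fasta_to_af3_inputs(fasta_text: str, seeds=(1,)) -> dict:
    """Hypothetical helper: return {output_file_name: input_dict} per FASTA record."""
    records = {}
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            header = (line[1:].split() or ["unnamed"])[0]
            records[header] = ""
        elif header is not None:
            records[header] += line

    outputs = {}
    for header, seq in records.items():
        name = header.lower()  # output names are lower-cased, as in the example above
        # A colon separates the chains of a multi-chain record.
        chains = seq.split(":")
        sequences = [
            {"protein": {"id": [chr(ord("A") + i)], "sequence": chain}}
            for i, chain in enumerate(chains)
        ]
        outputs[f"{name}.json"] = {
            "name": name,
            "dialect": "alphafold3",
            "version": 1,
            "sequences": sequences,
            "modelSeeds": list(seeds),
        }
    return outputs


if __name__ == "__main__":
    fasta = ">P12345\nKAKDLSKCLS\n>Q67890\nKADFILCSLK\n>I23L45_I3PLS2\nLAKDCL:KKALS\n"
    for file_name, content in fasta_to_af3_inputs(fasta).items():
        print(file_name, json.dumps(content))
```

Each protein chain is placed in its own entry of the sequences list, which is how the AlphaFold3 input format represents multiple chains.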

paeplot

paeplot is a command to plot the predicted aligned error (PAE). The color map can be specified with the -c option. The default color map is bwr (ColabFold-like), but Greens_r is also available for AlphaFold Structure Database (AFDB)-like coloring.

paeplot -i /path/to/alphafold3_output/directory [-c {Greens_r,bwr}] [--dpi 300] [-n foo] [-f {png,svg}] [-a] [-t "PAE Plot"] [--chain-cmap {pymol,unhcr,<matplotlib_colormap_name>}]


Arguments:

  • -i: Input directory containing the AlphaFold3 output files. Mandatory.
  • -c: Color map for the PAE plot. Optional. Default is bwr. Choose either Greens_r or bwr.
  • --dpi: DPI of the output image. Optional. Default is 100, but 300 is recommended for publication-quality images.
  • -n: Name prefix for the output image file. Optional.
  • -f: Output image file format. Optional. Choose either png or svg. Default is png.
  • -a: If specified, the plot will include all models in the output directory.
  • -t: Title of the plot. Optional.
  • --chain-cmap: Color map for the chain coloring along the top and right edges of the plot. Optional. Choose pymol, unhcr, or any valid matplotlib colormap name (e.g. tab20). Default is pymol.

superpose_ciffiles

superpose_ciffiles is a command to superpose the output mmCIF files. The command creates a multi-model mmCIF file containing all the predicted model.cif files found in the subdirectories of the input directory. The output file name can be specified with the -o option. By default, the output file will be saved as foo_superposed.cif in the input directory. The -c option can be used to specify the chain ID to be superposed.

superpose_ciffiles -i /path/to/alphafold3_output/directory [-o /path/to/output/directory/foo_superposed.cif] [-c A]

In PyMOL, the following commands are useful for coloring the models by their pLDDT values.

color 0x0053D6, b < 100
color 0x65CBF3, b < 90
color 0xFFDB13, b < 70
color 0xFF7D45, b < 50
util.cnc
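Because each color command overrides the previous one for the atoms it selects, the four commands above define pLDDT bands. The Python sketch below mirrors that logic. It assumes the pLDDT is stored in the B-factor field, which the selections on b rely on, and the function name plddt_color is hypothetical.

```python
def plddt_color(b: float):
    """Hypothetical helper: hex color left on an atom after the four PyMOL commands."""
    color = None
    # Apply the selections in the same order as the PyMOL commands;
    # a later match overrides an earlier one.
    bands = (
        (100, "0x0053D6"),  # b < 100
        (90, "0x65CBF3"),   # b < 90
        (70, "0xFFDB13"),   # b < 70
        (50, "0xFF7D45"),   # b < 50
    )
    for threshold, hex_color in bands:
        if b < threshold:
            color = hex_color
    # None means no command matched (b >= 100), so the atom keeps its previous color.
    return color


if __name__ == "__main__":
    for value in (95.0, 80.0, 60.0, 30.0):
        print(value, plddt_color(value))
```

The resulting bands are: below 50, 50 up to 70, 70 up to 90, and 90 up to (but not including) 100.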


Note

To color only the seed-1_sample-0 object by pLDDT value, type the following commands in PyMOL.

color 0x0053D6, seed-1_sample-0 and b < 100
color 0x65CBF3, seed-1_sample-0 and b < 90
color 0xFFDB13, seed-1_sample-0 and b < 70
color 0xFF7D45, seed-1_sample-0 and b < 50

sdftoccd

sdftoccd is a command to convert an SDF file to the CCD format. Please refer to the AlphaFold3 input documentation for details of the user-provided CCD format.

sdftoccd -i input.sdf -o userccd.cif -n STR

modjson

modjson is a command to modify an existing AlphaFold3 input JSON file. It is useful for adding or modifying ligand entities and the user-provided CCD string in an input JSON file.

modjson -i input.json -o output.json [-n jobname] [-p] \
       [-a smiles "CCOCCC" 1 -a ccdCodes PRD 2] \
       [-u userccd1.cif userccd2.cif]
  • -i: Input json file. Mandatory.
  • -o: Output json file. Mandatory.
  • -n: Job name. Optional. Sets the job name in the input JSON file.
  • -p: Purge all ligand entities from the input JSON file before applying any other changes.
  • -a: Add a ligand to the input JSON file. Provide 'ligand type', 'ligand name' (a SMILES string or a CCD code, as in the example), and 'number of the ligand molecule'. The 'ligand type' must be either 'smiles' or 'ccdCodes'. Multiple ligands can be added.
    • Example: -a smiles "CCOCCC" 1 -a ccdCodes PRD 2 -a ...
  • -u: Add user-provided CCD files to the input JSON file. Multiple files can be provided.
    • Example: -u userccd1.cif userccd2.cif
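As an illustration of what the -p and -a options do to the JSON, here is a minimal Python sketch. It is not the modjson implementation. The helper names purge_ligands and add_ligand are hypothetical, and the rule for choosing chain IDs (the next unused uppercase letters) is an assumption. The ligand, smiles, and ccdCodes keys come from the AlphaFold3 input format.

```python
import json
import string


def purge_ligands(af3_input: dict) -> dict:
    """Hypothetical helper: drop every ligand entity (compare the -p option)."""
    af3_input["sequences"] = [e for e in af3_input["sequences"] if "ligand" not in e]
    return af3_input


def add_ligand(af3_input: dict, ligand_type: str, value: str, copies: int = 1) -> dict:
    """Hypothetical helper: append one ligand entity (compare the -a option)."""
    if ligand_type not in ("smiles", "ccdCodes"):
        raise ValueError("ligand_type must be 'smiles' or 'ccdCodes'")

    # Collect the chain IDs already in use; 'id' may be a string or a list.
    used = set()
    for entity in af3_input["sequences"]:
        for body in entity.values():
            ids = body["id"]
            used.update([ids] if isinstance(ids, str) else ids)

    # Assumption: new ligands take the next unused uppercase letters.
    free = [c for c in string.ascii_uppercase if c not in used]
    if copies > len(free):
        raise ValueError("not enough unused single-letter chain IDs")

    ligand = {"id": free[:copies]}
    # 'smiles' holds a string, while 'ccdCodes' holds a list of codes.
    ligand[ligand_type] = value if ligand_type == "smiles" else [value]
    af3_input["sequences"].append({"ligand": ligand})
    return af3_input


if __name__ == "__main__":
    job = {"sequences": [{"protein": {"id": ["A"], "sequence": "LAKDCL"}}]}
    add_ligand(job, "smiles", "CCOCCC", 1)
    add_ligand(job, "ccdCodes", "PRD", 2)
    print(json.dumps(job, indent=2))
```

Both helpers modify the dictionary in place and return it, so calls can be chained in the same order as the command-line options.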

Note

A *_data.json file in the AlphaFold3 output directory can also be used as the input JSON file for modjson.

jsontomsa

jsontomsa is a command to extract MSA from the AlphaFold3 input JSON file. The output file name can be specified with the -o option.

jsontomsa -i /path/to/alphafold3_data.json -o /path/to/out.a3m
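The extraction can be pictured with the short Python sketch below. It is not the jsontomsa source. The function name extract_unpaired_msas is hypothetical, and reading the unpairedMsa field of each protein entity is an assumption about which MSA is extracted.

```python
import json


def extract_unpaired_msas(af3_json_text: str) -> dict:
    """Hypothetical helper: map chain ID(s) to the a3m text stored in the JSON."""
    data = json.loads(af3_json_text)
    msas = {}
    for entity in data.get("sequences", []):
        protein = entity.get("protein")
        # Skip non-protein entities and proteins without an embedded MSA.
        if not protein or not protein.get("unpairedMsa"):
            continue
        ids = protein["id"]
        key = ids if isinstance(ids, str) else ",".join(ids)
        msas[key] = protein["unpairedMsa"]
    return msas


if __name__ == "__main__":
    text = json.dumps(
        {
            "sequences": [
                {"protein": {"id": ["A"], "sequence": "MKTAY",
                             "unpairedMsa": ">query\nMKTAY\n>hit1\nMKTAF\n"}},
                {"ligand": {"id": ["B"], "ccdCodes": ["PRD"]}},
            ]
        }
    )
    for chain, a3m in extract_unpaired_msas(text).items():
        print(chain)
        print(a3m)
```

Writing each returned string to a file gives an a3m file per protein entity.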

Other tools are being developed and will be added.

Acknowledgements

This tool uses the following libraries:

PDBeurope/ccdutils is used for the conversion of sdf to ccd.
