README.md (+9 -1): 9 additions & 1 deletion
@@ -29,7 +29,7 @@ The following script demonstrates how to provide inputs to the model, and obtain
 python examples/predict_structure.py
 ```

-For more advanced use cases, we also expose the `chai_lab.chai1.run_folding_on_context`, which allows users to construct an `AllAtomFeatureContext` manually. This allows users to specify their own templates, MSAs, embeddings, and constraints. We currently provide an example of how to construct an embeddings context, and will be releasing helper methods to build MSA and templates contexts soon.
+For more advanced use cases, we also expose the `chai_lab.chai1.run_folding_on_context`, which allows users to construct an `AllAtomFeatureContext` manually. This allows users to specify their own templates, MSAs, embeddings, and constraints. We currently provide an example of how to construct an embeddings context as well as an MSA context, and will be releasing helper methods to build template contexts soon.

 <details>
 <summary>Where are downloaded weights stored?</summary>
<summary>How can MSAs be provided to Chai-1?</summary>
+<p markdown="1">
+
+Chai-1 supports MSAs provided as an `aligned.pqt` file. This file format is similar to an `a3m` file, but has additional columns that provide metadata like the source database and sequence pairing keys. We provide code to convert `a3m` files to `aligned.pqt` files. For more information on how to provide MSAs to Chai-1, see [this documentation](examples/msas/README.md).
examples/msas/README.md (+15 -3): 15 additions & 3 deletions
@@ -1,6 +1,6 @@
 # Adding MSA evolutionary information

-While Chai-1 performs very well in "single-sequence mode," it can also be given additional evolutionary information to further improve performance. As in other folding methods, this evolutionary information is provided in the form of a multiple sequence alignment (MSA).
+While Chai-1 performs very well in "single-sequence mode," it can also be given additional evolutionary information to further improve performance. As in other folding methods, this evolutionary information is provided in the form of a multiple sequence alignment (MSA). This information is given in the form of an `MSAContext` object (see `chai_lab/data/dataset/msas/msa_context.py`); we provide code for building these `MSAContext` objects through `aligned.pqt` files, though you can play with building out an `MSAContext` yourself as well.

 ## The `.aligned.pqt` file format

@@ -24,24 +24,36 @@ See the following for a toy example of what this table might look like:
 | RKSES... | uniprot | Mus musculus | A mouse sequence from uniprot |
 | ... |

-We additionally provide code to parse `a3m` files into this format; see `merge_multi_a3m_to_aligned_dataframe` in `chai_lab/data/parsing/msas/aligned_pqt.py`.
+We additionally provide code to parse `a3m` files into this format; see `merge_multi_a3m_to_aligned_dataframe` in `chai_lab/data/parsing/msas/aligned_pqt.py`. This file can also be run as a command-line script; run `python chai_lab/data/parsing/msas/aligned_pqt.py --help` for details.
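
As a rough illustration of what such a conversion involves, the sketch below parses `a3m` text into one record per aligned sequence and attaches the kind of metadata that `.aligned.pqt` adds. It is not the `merge_multi_a3m_to_aligned_dataframe` implementation: the `AlignedRow` class, the `a3m_to_rows` helper, and every field name other than `pairing_key` are assumptions made for this example, loosely following the toy table above.

```python
from dataclasses import dataclass


@dataclass
class AlignedRow:
    """One MSA row; field names are illustrative, not the chai_lab schema."""

    sequence: str
    source_database: str
    pairing_key: str
    comment: str


def a3m_to_rows(a3m_text, source_database):
    """Parse FASTA-like a3m text into rows tagged with their source database."""
    rows = []
    header = None
    chunks = []
    for raw_line in a3m_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                rows.append(AlignedRow("".join(chunks), source_database, "", header))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence text may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        rows.append(AlignedRow("".join(chunks), source_database, "", header))
    return rows
```

The pairing key is left empty here because plain `a3m` headers do not carry it in a standard way; a real converter would have to derive it from the header or from external metadata.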

 ### TLDR

 Chai-1 uses `.aligned.pqt` files to specify MSAs. These are similar to `a3m` with added columns for source database and pairing key to pair MSAs across different chains. Each `.aligned.pqt` file contains all MSAs for a single query sequence.

 ## From `.aligned.pqt` to `MSAContext`

-By default, the `run_inference` example inference code we provide assumes that all MSAs required for a prediction are stored in a specified folder. Each file corresponds to the all alignments for a given sequence, and filenames are specified by the hash of their sequence (this filename is inferred using code in `chai_lab/data/parsing/msas/aligned_pqt.py`). During inference, the script tries to find `<HASH>.aligned.pqt` files in that folder (one file per unique chain sequence) and loads in a `MSAContext` for each MSA it can find; see `chai_lab/data/dataset/msas/load.py` for details.
+By default, the `run_inference` example inference code we provide assumes that all MSAs required for a prediction are stored in a specified folder. Each `.aligned.pqt` file in that folder corresponds to all the MSA alignments for a given sequence (spanning several databases), and filenames are specified by the hash of their sequence (this filename is inferred using code in `chai_lab/data/parsing/msas/aligned_pqt.py`). During inference, the script tries to find `<HASH>.aligned.pqt` files in that folder (one file per unique chain sequence) and loads in an `MSAContext` for each MSA it can find. The code then performs some basic preprocessing such as pairing MSAs by their given `pairing_key` and merging MSAs across chains; see `chai_lab/data/dataset/msas/load.py` for details.
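
The folder lookup described above can be sketched as follows. This is a hedged illustration rather than the logic in `chai_lab/data/dataset/msas/load.py`: the choice of SHA-256 over the upper-cased sequence and the helper names `sketch_basename` and `find_msa_files` are assumptions made for this example. The actual filename is determined by the code in `chai_lab/data/parsing/msas/aligned_pqt.py`.

```python
import hashlib
from pathlib import Path


def sketch_basename(sequence: str) -> str:
    """Hypothetical naming scheme: SHA-256 of the upper-cased sequence."""
    digest = hashlib.sha256(sequence.upper().encode("utf-8")).hexdigest()
    return f"{digest}.aligned.pqt"


def find_msa_files(msa_directory: Path, chain_sequences: list) -> dict:
    """Map each unique chain sequence to its MSA file, skipping missing ones."""
    found = {}
    # dict.fromkeys removes duplicate sequences while preserving their order.
    for sequence in dict.fromkeys(chain_sequences):
        candidate = msa_directory / sketch_basename(sequence)
        if candidate.is_file():
            found[sequence] = candidate
    return found
```

Chains without a matching file are simply absent from the result, which mirrors the "each MSA it can find" behaviour described above.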

 ## Putting it all together

 To demonstrate how these pieces tie together, we provide `aligned.pqt` files containing MSAs for the example in `examples/predict_structure.py` under the `examples/msas` folder. Inference can be run using these example MSAs by providing the path to this folder as an additional argument to `run_inference` as follows:

 ```python
+from pathlib import Path
+...
+
 candidates = run_inference(
     ...
     msa_directory=Path("examples/msas"),
     ...
 )
 ```
+
+You can also manually inspect the example `aligned.pqt` files by loading them as pandas dataframes as follows:
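
A minimal sketch of such an inspection, assuming the `.aligned.pqt` files are ordinary Parquet files that `pandas.read_parquet` can open (which requires a Parquet engine such as `pyarrow`). The helper names `list_aligned_pqt_files` and `load_aligned_pqt` are illustrative and not part of `chai_lab`:

```python
from pathlib import Path


def list_aligned_pqt_files(msa_directory: Path) -> list:
    """Return every `*.aligned.pqt` file in the folder, sorted by path."""
    return sorted(msa_directory.glob("*.aligned.pqt"))


def load_aligned_pqt(path: Path):
    """Load one MSA table as a pandas DataFrame (assumes Parquet on disk)."""
    import pandas as pd  # imported lazily so listing files works without pandas

    return pd.read_parquet(path)
```

For example, `load_aligned_pqt(list_aligned_pqt_files(Path("examples/msas"))[0]).head()` would show the first few rows of one example MSA, provided the folder contains at least one such file.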