Commit 4fbd82a

update readme (#121)
1 parent 606d7ff commit 4fbd82a

2 files changed (+24 / -4 lines)

README.md

Lines changed: 9 additions & 1 deletion
````diff
@@ -29,7 +29,7 @@ The following script demonstrates how to provide inputs to the model, and obtain
 python examples/predict_structure.py
 ```
 
-For more advanced use cases, we also expose the `chai_lab.chai1.run_folding_on_context`, which allows users to construct an `AllAtomFeatureContext` manually. This allows users to specify their own templates, MSAs, embeddings, and constraints. We currently provide an example of how to construct an embeddings context, and will be releasing helper methods to build MSA and templates contexts soon.
+For more advanced use cases, we also expose the `chai_lab.chai1.run_folding_on_context`, which allows users to construct an `AllAtomFeatureContext` manually. This allows users to specify their own templates, MSAs, embeddings, and constraints. We currently provide an example of how to construct an embeddings context as well as an MSA context, and will be releasing helper methods to build template contexts soon.
 
 <details>
 <summary>Where are downloaded weights stored?</summary>
@@ -43,6 +43,14 @@ CHAI_DOWNLOADS_DIR=/tmp/downloads python ./examples/predict_structure.py
 </p>
 </details>
 
+<details>
+<summary>How can MSAs be provided to Chai-1?</summary>
+<p markdown="1">
+
+Chai-1 supports MSAs provided as an `aligned.pqt` file. This file format is similar to an `a3m` file, but has additional columns that provide metadata like the source database and sequence pairing keys. We provide code to convert `a3m` files to `aligned.pqt` files. For more information on how to provide MSAs to Chai-1, see [this documentation](examples/msas/README.md).
+
+</p>
+</details>
 
 ## ⚡ Try it online
 
````
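In `a3m`, lowercase letters mark insertions relative to the query. Those insertions must be dropped before every row lines up column-for-column with the query. The sketch below illustrates that step using only the standard library. It is not chai_lab's converter. The function name `parse_a3m_to_rows` and the row keys `sequence`, `source_database` and `comment` are hypothetical stand-ins for the `aligned.pqt` columns.

```python
def parse_a3m_to_rows(a3m_text: str, source_database: str) -> list[dict[str, str]]:
    """Hypothetical sketch: convert a3m text into query-aligned rows.

    The first record is treated as the query. Lowercase letters (insertions
    relative to the query) and '.' padding are removed, so every remaining
    row has exactly one character per query position.
    """
    records: list[tuple[str, str]] = []  # (header without '>', raw sequence)
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and a3m metadata lines such as "#63 1"
        if line.startswith(">"):
            records.append((line[1:], ""))
        elif records:
            header, seq = records[-1]
            records[-1] = (header, seq + line)  # a sequence may wrap over several lines

    rows: list[dict[str, str]] = []
    query_length = None
    for header, raw in records:
        aligned = "".join(c for c in raw if not (c.islower() or c == "."))
        if query_length is None:
            query_length = len(aligned)  # the query defines the column count
        elif len(aligned) != query_length:
            raise ValueError(f"{header!r} has {len(aligned)} aligned columns, expected {query_length}")
        rows.append({"sequence": aligned, "source_database": source_database, "comment": header})
    return rows


example = """>query
MKTAYIAK
>hit1 a homolog with a two-residue insertion
MKT-aaYIAK
>hit2
-KTAYIA-
"""
for row in parse_a3m_to_rows(example, source_database="uniref90"):
    print(row["sequence"], row["comment"])
```

Each returned dict is one table row. If you want to inspect the result as a dataframe, you can pass the list to `pandas.DataFrame`.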
examples/msas/README.md

Lines changed: 15 additions & 3 deletions
```diff
@@ -1,6 +1,6 @@
 # Adding MSA evolutionary information
 
-While Chai-1 performs very well in "single-sequence mode," it can also be given additional evolutionary information to further improve performance. As in other folding methods, this evolutionary information is provided in the form of a multiple sequence alignment (MSA).
+While Chai-1 performs very well in "single-sequence mode," it can also be given additional evolutionary information to further improve performance. As in other folding methods, this evolutionary information is provided in the form of a multiple sequence alignment (MSA). This information is represented as an `MSAContext` object (see `chai_lab/data/dataset/msas/msa_context.py`); we provide code for building `MSAContext` objects from `aligned.pqt` files, though you can also build an `MSAContext` yourself.
 
 ## The `.aligned.pqt` file format
 
```
```diff
@@ -24,24 +24,36 @@ See the following for a toy example of what this table might look like:
 | RKSES... | uniprot | Mus musculus | A mouse sequence from uniprot |
 | ... |
 
-We additionally provide code to parse `a3m` files into this format; see `merge_multi_a3m_to_aligned_dataframe` in `chai_lab/data/parsing/msas/aligned_pqt.py`.
+We additionally provide code to parse `a3m` files into this format; see `merge_multi_a3m_to_aligned_dataframe` in `chai_lab/data/parsing/msas/aligned_pqt.py`. This file can also be run as a command-line script; run `python chai_lab/data/parsing/msas/aligned_pqt.py --help` for details.
 
```
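`merge_multi_a3m_to_aligned_dataframe` combines alignments from several `a3m` files into one table. The sketch below illustrates what such a merge has to do; it does not describe that function's internals. It tags each hit with the database it came from, puts the query row first, and skips aligned sequences it has already seen. The function name `merge_aligned_hits`, the `"query"` label and the deduplication rule are all assumptions.

```python
def merge_aligned_hits(query: str, hits_by_database: dict[str, list[str]]) -> list[dict[str, str]]:
    """Hypothetical sketch: merge query-aligned hits from several databases into one table.

    Assumptions (not taken from chai_lab): the query row comes first and is labelled
    "query", and an aligned sequence already seen (from any database) is skipped.
    """
    rows = [{"sequence": query, "source_database": "query"}]
    seen = {query}
    for database, hits in hits_by_database.items():
        for sequence in hits:
            if len(sequence) != len(query):
                raise ValueError(f"hit from {database} is not aligned to the query length")
            if sequence in seen:
                continue  # keep only the first copy of identical aligned rows
            seen.add(sequence)
            rows.append({"sequence": sequence, "source_database": database})
    return rows


merged = merge_aligned_hits(
    "MKTAYIAK",
    {"uniref90": ["MKT-YIAK", "MKTAYIAK"], "bfd": ["MKT-YIAK", "-KTAYIA-"]},
)
for row in merged:
    print(row["source_database"], row["sequence"])
```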
```diff
 ### TLDR
 
 Chai-1 uses `.aligned.pqt` files to specify MSAs. These are similar to `a3m` with added columns for source database and pairing key to pair MSAs across different chains. Each `.aligned.pqt` file contains all MSAs for a single query sequence.
 
```
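The TLDR mentions a pairing key that is used to pair MSAs across chains. One common way to do this is to keep the first hit per key in each chain. The chains are then joined on the keys they all share. The key could be a species name, like the `Mus musculus` entry in the toy table row shown in the diff. The sketch below shows this idea in plain Python. The function `pair_by_key` and the row layout are hypothetical; chai_lab's actual logic lives in `chai_lab/data/dataset/msas/load.py`.

```python
def pair_by_key(chains: list[list[dict[str, str]]]) -> list[tuple[str, ...]]:
    """Hypothetical sketch: pair MSA rows across chains that share a pairing key.

    For each chain, only the first row per non-empty key is kept (rows are assumed
    to be ordered best hit first). A key produces a paired row only if every chain
    has it; all other rows stay unpaired and are not returned here.
    """
    if not chains:
        return []
    first_hit_per_key: list[dict[str, str]] = []
    for rows in chains:
        by_key: dict[str, str] = {}
        for row in rows:
            key = row.get("pairing_key", "")
            if key and key not in by_key:
                by_key[key] = row["sequence"]
        first_hit_per_key.append(by_key)
    # Keep keys present in every chain, in the order they first appear in chain 0.
    shared = [k for k in first_hit_per_key[0] if all(k in d for d in first_hit_per_key[1:])]
    return [tuple(d[k] for d in first_hit_per_key) for k in shared]


chain_a = [
    {"sequence": "MKTA", "pairing_key": "Mus musculus"},
    {"sequence": "MKSA", "pairing_key": "Homo sapiens"},
    {"sequence": "MRTA", "pairing_key": "Mus musculus"},  # second mouse hit: ignored
]
chain_b = [
    {"sequence": "GGHL", "pairing_key": "Homo sapiens"},
    {"sequence": "GAHL", "pairing_key": "Mus musculus"},
    {"sequence": "GGHV", "pairing_key": ""},  # no key: never paired
]
print(pair_by_key([chain_a, chain_b]))
# → [('MKTA', 'GAHL'), ('MKSA', 'GGHL')]
```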
```diff
 ## From `.aligned.pqt` to `MSAContext`
 
-By default, the `run_inference` example inference code we provide assumes that all MSAs required for a prediction are stored in a specified folder. Each file corresponds to the all alignments for a given sequence, and filenames are specified by the hash of their sequence (this filename is inferred using code in `chai_lab/data/parsing/msas/aligned_pqt.py`). During inference, the script tries to find `<HASH>.aligned.pqt` files in that folder (one file per unique chain sequence) and loads in a `MSAContext` for each MSA it can find; see `chai_lab/data/dataset/msas/load.py` for details.
+By default, the `run_inference` example inference code we provide assumes that all MSAs required for a prediction are stored in a specified folder. Each `.aligned.pqt` file in that folder contains all MSA alignments for a given sequence (spanning several databases), and each file is named by the hash of its sequence (this filename is inferred using code in `chai_lab/data/parsing/msas/aligned_pqt.py`). During inference, the script tries to find `<HASH>.aligned.pqt` files in that folder (one file per unique chain sequence) and loads an `MSAContext` for each MSA it can find. The code then performs some basic preprocessing, such as pairing MSAs by their given `pairing_key` and merging MSAs across chains; see `chai_lab/data/dataset/msas/load.py` for details.
 
```
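The folder lookup described here can be sketched as follows: one `<HASH>.aligned.pqt` per unique chain sequence, loading only the files that exist. The real filename rule is defined in `chai_lab/data/parsing/msas/aligned_pqt.py`. This sketch assumes the filename is the SHA-256 of the upper-cased sequence, a guess based only on the example filename in `examples/msas` being 64 hex characters long. Both helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def hypothetical_msa_filename(sequence: str) -> str:
    # ASSUMPTION: SHA-256 hex digest of the upper-cased sequence. The real rule lives in
    # chai_lab/data/parsing/msas/aligned_pqt.py and may differ.
    digest = hashlib.sha256(sequence.upper().encode("utf-8")).hexdigest()
    return f"{digest}.aligned.pqt"


def find_msa_files(chain_sequences: list[str], msa_directory: Path) -> dict[str, Optional[Path]]:
    """Map each unique chain sequence to its MSA file, or None if no file exists."""
    found: dict[str, Optional[Path]] = {}
    for sequence in dict.fromkeys(chain_sequences):  # dedupe, keep first-seen order
        path = msa_directory / hypothetical_msa_filename(sequence)
        found[sequence] = path if path.is_file() else None
    return found


chains = ["MKTAYIAK", "GGHLGGHL", "MKTAYIAK"]  # two identical chains plus one other chain
for seq, path in find_msa_files(chains, Path("examples/msas")).items():
    print(seq, "->", path if path is not None else "no MSA file")
```

Identical chains share one lookup, which matches the "one file per unique chain sequence" behaviour described above.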
````diff
 ## Putting it all together
 
 To demonstrate how these pieces tie together, we provide `aligned.pqt` files containing MSAs for the example in `examples/predict_structure.py` under the `examples/msas` folder. Inference can be run using these example MSAs by providing the path to this folder as an additional argument to `run_inference` as follows:
 
 ```python
+from pathlib import Path
+...
+
 candidates = run_inference(
     ...
     msa_directory=Path("examples/msas"),
     ...
 )
 ```
+
+You can also manually inspect the example `aligned.pqt` files by loading them as pandas dataframes as follows:
+
+```python
+import pandas as pd
+
+aligned_pqt = pd.read_parquet("examples/msas/703adc2c74b8d7e613549b6efcf37126da7963522dc33852ad3c691eef1da06f.aligned.pqt")
+aligned_pqt.head()
+```
````
