--bias_by_res_jsonl example? #73

tony-res · 2023-10-24T20:00:15Z

I'm able to mutate just a few positions in the sequence. What I'd like to do now is to provide amino acid biases to these positions.

Does anyone have an example of how to do this?

My current attempt is to pass --bias_by_res_jsonl to protein_mpnn_run.py. For the JSON, I have a dictionary holding a 2D array (sequence length x 21 amino acid IDs).

{"PROTEIN123": {"A": [[0.0001, 0.5, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.49, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], ... ]}}

where each row is a position in the sequence and each column is an amino acid. For example, the 0.5 above would weight the second amino acid ("C") as having the highest probability in that position.
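A minimal sketch of how one row of that matrix could be built from a readable {amino acid: value} dict, so the column index for each letter is never counted by hand. It assumes the 21 columns follow the order "ACDEFGHIKLMNPQRSTVWYX" (consistent with "C" being the second column above), which is worth confirming against protein_mpnn_run.py. It also fills unspecified columns with 0.0 instead of 0.0001, on the unverified assumption that the values act as additive biases rather than probabilities, so 0 would mean "no change". The helper name bias_row is hypothetical.

```python
# Assumed column order: 20 amino acids in one-letter alphabetical order, then "X".
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"


def bias_row(weights, default=0.0):
    """Return one 21-element row with `weights` placed in the matching columns.

    `weights` maps one-letter amino-acid codes to bias values; every other
    column gets `default`.
    """
    unknown = set(weights) - set(ALPHABET)
    if unknown:
        raise ValueError(f"unknown amino-acid codes: {sorted(unknown)}")
    return [float(weights.get(aa, default)) for aa in ALPHABET]


# Favour C and N at one position and leave all other columns at 0.0.
row = bias_row({"C": 0.5, "N": 0.49})
print(len(row), ALPHABET.index("C"), row[1], row[11])  # 21 1 0.5 0.49
```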

Is this correct? When I run it ProteinMPNN doesn't seem to respect the biases.


tony-res · 2023-10-25T00:59:35Z

It looks like the parsed_pdbs.jsonl file includes a "-" (which I assume is a gap), so I needed to put an X in the original chain sequence at that gap position.
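One way to account for those "-" characters is to size the bias matrix by the parsed sequence (gaps included) and translate residue numbers that skip gaps into parsed-sequence indices. This is a sketch under the assumption that matrix rows line up one-to-one with the characters of the parsed chain sequence; the helper names parsed_index and bias_matrix and the example sequence are hypothetical.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"  # assumed column order


def parsed_index(parsed_seq, residue_number):
    """Map a 1-based residue number counted over real residues (gaps skipped)
    to the 0-based index of that residue in the parsed sequence, which has '-'."""
    count = 0
    for i, aa in enumerate(parsed_seq):
        if aa == "-":
            continue
        count += 1
        if count == residue_number:
            return i
    raise IndexError(f"residue {residue_number} not found in sequence")


def bias_matrix(parsed_seq, biases):
    """Build one 21-wide row per character of `parsed_seq`, gaps included.

    `biases` maps 1-based residue numbers (gaps skipped) to {aa: value} dicts;
    positions without an entry, including gap positions, get an all-zero row.
    """
    matrix = [[0.0] * len(ALPHABET) for _ in parsed_seq]
    for resnum, weights in biases.items():
        row = matrix[parsed_index(parsed_seq, resnum)]
        for aa, value in weights.items():
            row[ALPHABET.index(aa)] = float(value)
    return matrix


seq = "MKT--LLV"  # hypothetical parsed sequence with a two-residue gap
m = bias_matrix(seq, {4: {"W": 1.0}})  # residue 4 is the first "L"
print(len(m), parsed_index(seq, 4))  # 8 5
```

The row count comes out as 8 rather than 6 because the two gap positions still get (empty) rows.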

linamp · 2023-11-02T20:24:13Z

Judging from their examples, the format would look something like this: {"A": -1.1, "F": 0.7}. I tried one, and my JSON file looked like this; it ran OK:

{"A": -1.50487820683775, "C": -1.50336065414758, "D": 0.451363945615286, "E": 0.407588387244967, "F": 1.37065067139209, "G": -1.50487820683775, "H": 0.801568412577877, "I": 0.538915062355937, "K": -0.292820546680218, "L": 0.670241737466907, "M": -0.0301671964582751, "N": -0.227157209124733, "P": -1.50487820683775, "Q": -0.0301671964582751, "R": 0.889119529318528, "S": -0.774351688753783, "T": -0.686800572013134, "V": 0.0136083619120431, "W": 1.63330402161404, "Y": 1.28309955465145}

One of the examples shows how to use one of their helper scripts to create the JSON file automatically.

I do not know whether it is possible to have different biases for each position. It would be tedious, but maybe you could write a script that designs one position at a time, passing the respective bias each time. I would also like to know if there is a better approach here!
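Instead of one run per position, a global {amino acid: bias} dict like the one above could be expanded into a per-residue matrix that is non-zero only at the positions being designed. The short {A: -1.1, F: 0.7} form appears (from memory, so verify) to be the one documented for --bias_AA_jsonl, which applies to every position. This sketch assumes the same 21-column order as elsewhere in this thread and that `positions` are 0-based indices into the parsed sequence; restrict_global_bias is a hypothetical helper, not part of ProteinMPNN.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"  # assumed column order


def restrict_global_bias(global_bias, seq_len, positions):
    """Expand a global {aa: bias} dict into a seq_len x 21 per-residue matrix.

    Only the 0-based indices in `positions` receive the bias; every other row
    is all zeros, so a single run can bias just the mutable positions.
    """
    row = [float(global_bias.get(aa, 0.0)) for aa in ALPHABET]
    wanted = set(positions)
    # list(row) gives each biased position its own copy of the row.
    return [list(row) if i in wanted else [0.0] * len(ALPHABET) for i in range(seq_len)]


global_bias = {"A": -1.1, "F": 0.7}  # the short example from above
matrix = restrict_global_bias(global_bias, seq_len=10, positions=[2, 7])
print(matrix[2][ALPHABET.index("F")], matrix[3][ALPHABET.index("F")])  # 0.7 0.0
```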

tony-res · 2023-11-03T16:40:44Z

Thanks!

Yes, I've been successful with the per-residue bias. Essentially, it is a JSON where the first level is the name, the second is the chain, and the third is an N x M matrix of bias values, where N is the number of residues and M is 21 (the 20 amino acids plus a gap character). This works well as long as you account for any gaps in the PDB file's sequence.
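The three-level layout described above (name, then chain, then an N x 21 matrix) can be assembled and sanity-checked before writing it out. This is a sketch: it assumes the name key should match the name used in parsed_pdbs.jsonl and that the whole dictionary is written as one JSON object on a single line; the helper names, the output path, and the 5-residue example are hypothetical.

```python
import json

N_COLUMNS = 21  # 20 amino acids plus one extra column, as described above


def make_bias_by_res_entry(name, chain_matrices, seq_lengths=None):
    """Return {name: {chain: N x 21 matrix}} after basic shape checks.

    `seq_lengths` optionally maps chain IDs to the length of the parsed chain
    sequence (gaps included) so the row count can be checked as well.
    """
    seq_lengths = seq_lengths or {}
    entry = {}
    for chain, matrix in chain_matrices.items():
        expected_rows = seq_lengths.get(chain, len(matrix))
        if len(matrix) != expected_rows:
            raise ValueError(f"{name} chain {chain}: {len(matrix)} rows, expected {expected_rows}")
        for i, row in enumerate(matrix):
            if len(row) != N_COLUMNS:
                raise ValueError(f"{name} chain {chain} row {i}: {len(row)} values, expected {N_COLUMNS}")
        entry[chain] = [[float(v) for v in row] for row in matrix]
    return {name: entry}


def write_bias_jsonl(path, entry):
    """Write the whole dictionary as one JSON object on a single line."""
    with open(path, "w") as fh:
        fh.write(json.dumps(entry) + "\n")


# Hypothetical 5-residue chain A with no bias anywhere.
entry = make_bias_by_res_entry("PROTEIN123", {"A": [[0.0] * N_COLUMNS for _ in range(5)]}, seq_lengths={"A": 5})
write_bias_jsonl("bias_by_res.jsonl", entry)  # hypothetical output path
```

The resulting file would then be passed with --bias_by_res_jsonl. Checking the row count against the parsed sequence length is where a forgotten gap position would show up.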

tony-res closed this as completed Nov 3, 2023
