--bias_by_res_jsonl example? #73

Closed
tony-res opened this issue Oct 24, 2023 · 3 comments


tony-res commented Oct 24, 2023

I'm able to mutate just a few positions in the sequence. What I'd like to do now is to provide amino acid biases to these positions.

Does anyone have an example of how to do this?

My current attempt is to pass --bias_by_res_jsonl to protein_mpnn_run.py. For the JSON, I have a dictionary containing a 2D array (sequence length by 21 amino acid IDs).

{"PROTEIN123": {"A": [[0.0001, 0.5, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.49, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], ... ]}}

where each row is a position in the sequence and each column is an amino acid. For example, the 0.5 above would weight the second amino acid ("C") as having the highest probability in that position.

Is this correct? When I run it, ProteinMPNN doesn't seem to respect the biases.
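
As a minimal sketch of the layout described above (entry name, then chain ID, then one 21-value row per residue), the structure could be generated rather than typed by hand, which helps avoid stray commas or brackets. The column order `ACDEFGHIKLMNPQRSTVWYX`, the helper name `make_bias_rows`, and the bias value 2.0 are assumptions for illustration and are not taken from the ProteinMPNN documentation.

```python
import json

# Assumed column order for the 21 bias values; verify against your ProteinMPNN checkout.
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"


def make_bias_rows(seq_len, per_position_bias):
    """Build a seq_len x 21 list of lists, zero except for the requested entries.

    per_position_bias maps a 0-based position to {amino_acid_letter: bias_value}.
    """
    rows = [[0.0] * len(ALPHABET) for _ in range(seq_len)]
    for pos, aa_bias in per_position_bias.items():
        for aa, value in aa_bias.items():
            rows[pos][ALPHABET.index(aa)] = value
    return rows


# Hypothetical 5-residue chain "A" with a bias toward "C" at the first position.
rows = make_bias_rows(5, {0: {"C": 2.0}})
record = {"PROTEIN123": {"A": rows}}

# One JSON object on a single line, as expected for a .jsonl file.
line = json.dumps(record)
```

Writing `line` to a file and passing that file to --bias_by_res_jsonl would then follow the structure in the question; whether the values are treated as probabilities or as additive offsets is something to confirm in the ProteinMPNN source.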

tony-res (Author) commented

It looks like the parsed_pdbs.jsonl file includes a - (which I assume is a gap), so I needed to put an X in the original chain sequence at that gap position.
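
The gap issue can be checked before running the design. The sketch below counts every character of one chain in a parsed entry, including `-`, so the bias matrix can be given one row per character. The key name `seq_chain_A` is an assumption about the layout of parsed_pdbs.jsonl, the helper `expected_rows` is hypothetical, and the sequence is made up.

```python
import json


def expected_rows(parsed_line, chain_id):
    """Return (row count, gap positions) for one chain of a parsed entry.

    parsed_line is one line of a parsed_pdbs.jsonl-style file.
    The "seq_chain_<ID>" key is an assumed naming convention.
    """
    entry = json.loads(parsed_line)
    seq = entry[f"seq_chain_{chain_id}"]
    gaps = [i for i, ch in enumerate(seq) if ch == "-"]
    return len(seq), gaps


# Made-up entry with a single gap in chain A.
example = json.dumps({"name": "PROTEIN123", "seq_chain_A": "MKT-LV"})
n_rows, gap_positions = expected_rows(example, "A")
```

If the bias matrix has fewer rows than `n_rows`, the rows after the first gap would be shifted relative to the residues they were meant for.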


linamp commented Nov 2, 2023

From their examples, it looks like the format is something like this: {A: -1.1, F: 0.7}. I tried one, and my JSON file looks like this; it ran OK: {"A": -1.50487820683775, "C": -1.50336065414758, "D": 0.451363945615286, "E": 0.407588387244967, "F": 1.37065067139209, "G": -1.50487820683775, "H": 0.801568412577877, "I": 0.538915062355937, "K": -0.292820546680218, "L": 0.670241737466907, "M": -0.0301671964582751, "N": -0.227157209124733, "P": -1.50487820683775, "Q": -0.0301671964582751, "R": 0.889119529318528, "S": -0.774351688753783, "T": -0.686800572013134, "V": 0.0136083619120431, "W": 1.63330402161404, "Y": 1.28309955465145}

One of their examples shows how to use one of their scripts to create the JSON file automatically.

I do not know if it is possible to have different biases for each position. It would be tedious, but maybe you could write a script that designs one position at a time, passing the respective bias. I would also like to know if there is a better approach here!
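
The flat per-amino-acid dictionary shown in this comment can also be written from a short script. This is a sketch only: the helper name `make_global_bias` is hypothetical, the 20-letter alphabet check is an added safeguard rather than something ProteinMPNN is known to require, and this flat format appears to be a global bias (likely a different option from --bias_by_res_jsonl), which is worth confirming in the script's help output.

```python
import json

# The 20 standard amino acid one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_global_bias(bias):
    """Serialize a {letter: bias} dict to one JSON line, rejecting unknown letters."""
    unknown = sorted(set(bias) - set(AMINO_ACIDS))
    if unknown:
        raise ValueError(f"Unknown amino acid codes: {unknown}")
    return json.dumps(bias)


# Values taken from the short example quoted in the comment.
line = make_global_bias({"A": -1.1, "F": 0.7})
```

Amino acids left out of the dictionary carry no explicit bias in this sketch; how ProteinMPNN treats missing keys is not stated in the thread.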


tony-res commented Nov 3, 2023

Thanks!

Yes. I've been successful with the per-residue bias. Essentially, it is a JSON where the first level is the name, the second is the chain, and the third is an N x M matrix of bias values, where N is the number of residues and M is 21 (the 20 amino acids plus a gap character). This works well as long as you account for any gaps in the PDB file's sequence.

tony-res closed this as completed Nov 3, 2023