Hello, I'm wondering whether evolocity works with long proteins (>1024 residues) when embedding with ESM-1b, since the ESM repo reports issues with these proteins: facebookresearch/esm#49
That said, I see in the preprint that evolocity was applied to, e.g., Spike, which is longer than 1024 residues. So is this not an issue?
Hi @salvatoreloguercio, yes, the 1024-residue limit of ESM-1b is unfortunate. The current workaround is simply to divide the protein into 1022-residue windows (1022 residues + the before/after sequence tokens = 1024), run these through the model separately, and then concatenate the outputs. This is definitely a heuristic, though.
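A minimal Python sketch of the windowing heuristic, for illustration only. `embed_window` is a hypothetical stand-in for running ESM-1b on one window and returning one embedding per residue (with the before/after token outputs already dropped); it is not an evolocity or ESM API, and evolocity's actual code may differ, e.g. in how it handles the final short window.

```python
from typing import Callable, List, Sequence

# ESM-1b accepts 1024 tokens; two go to the before/after sequence tokens,
# leaving 1022 residues per window.
MAX_RESIDUES = 1022

Embedding = List[float]


def split_into_windows(seq: str, window: int = MAX_RESIDUES) -> List[str]:
    """Split `seq` into consecutive, non-overlapping chunks of at most `window` residues."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [seq[i:i + window] for i in range(0, len(seq), window)]


def embed_long_sequence(
    seq: str,
    embed_window: Callable[[str], Sequence[Embedding]],
    window: int = MAX_RESIDUES,
) -> List[Embedding]:
    """Embed each window separately, then concatenate the per-residue outputs.

    `embed_window` is HYPOTHETICAL: it stands in for one ESM-1b forward pass
    on a single window and must return one embedding per input residue.
    """
    per_residue: List[Embedding] = []
    for chunk in split_into_windows(seq, window):
        chunk_emb = embed_window(chunk)
        if len(chunk_emb) != len(chunk):
            raise ValueError("embed_window must return one embedding per residue")
        per_residue.extend(chunk_emb)
    return per_residue


if __name__ == "__main__":
    # Toy embedder: each residue's "embedding" is just its window's length.
    toy_embed = lambda chunk: [[float(len(chunk))] for _ in chunk]
    emb = embed_long_sequence("M" * 1273, toy_embed)
    print(len(emb))         # 1273
    print(emb[0], emb[-1])  # [1022.0] [251.0]
```

Because the windows do not overlap, residues near a window boundary get no context from the neighboring window, which is one reason this remains a heuristic rather than an exact treatment of long proteins.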
Encouragingly, this seems to give reasonable results on the (zero-shot) deep mutational scan benchmark, which contains long proteins (>1022 residues). For example, BRCA1 is longer than 1022 residues, yet the zero-shot mutational-effect performance is still higher than DeepSequence's.
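In the zero-shot mutational-effect setting, the same windowing means a substitution in a long protein is scored using only the window that contains it. The sketch below assumes a common log-ratio score, log P(mutant) - log P(wild type) at the mutated position; that scoring rule is an assumption here, not necessarily what the benchmark above uses. `window_log_probs` is hypothetical: it stands in for converting ESM-1b's outputs for one window into per-residue log-probabilities.

```python
from typing import Callable, Dict, List, Tuple

MAX_RESIDUES = 1022  # residues per ESM-1b window (1024 tokens minus 2 special tokens)

LogProbs = List[Dict[str, float]]  # one {amino acid: log-probability} dict per residue


def locate(pos: int, window: int = MAX_RESIDUES) -> Tuple[int, int]:
    """Map a 0-based residue index to (window index, offset within that window)."""
    return divmod(pos, window)


def score_substitution(
    seq: str,
    pos: int,
    mut_aa: str,
    window_log_probs: Callable[[str], LogProbs],
    window: int = MAX_RESIDUES,
) -> float:
    """Assumed log-ratio score at `pos`, using only the window containing `pos`.

    `window_log_probs` is HYPOTHETICAL: it maps one window's sequence to a
    list of per-residue {amino acid: log-probability} dicts.
    """
    if not 0 <= pos < len(seq):
        raise IndexError("pos is outside the sequence")
    k, offset = locate(pos, window)
    chunk = seq[k * window:(k + 1) * window]
    log_probs = window_log_probs(chunk)
    wt_aa = seq[pos]
    return log_probs[offset][mut_aa] - log_probs[offset][wt_aa]


if __name__ == "__main__":
    # Toy model: every position prefers A over G.
    toy = lambda chunk: [{"A": -1.0, "G": -3.0} for _ in chunk]
    print(locate(1500))                                      # (1, 478)
    print(score_substitution("A" * 2000, 1500, "G", toy))  # -2.0
```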