Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ContactFrequency to pandas dataframe #74

Open
PeptideSimulator01 opened this issue May 31, 2020 · 7 comments
Open

ContactFrequency to pandas dataframe #74

PeptideSimulator01 opened this issue May 31, 2020 · 7 comments

Comments

@PeptideSimulator01
Copy link

Dear all,

I really like using ContactMap Explorer! I was wandering if there is a way to create a pandas dataframe out of a ContactFrequency calculation.

Like:

traj=mdt.load('trajectory.h5')
peptide = topology.select('chainid 1 and element != "H"')
protein = topology.select('chainid 0 and resid 77 or resid 124 or resid 130 or resid 137 and element != "H"')
contacts = ContactFrequency(traj, query=peptide, haystack=protein, cutoff=0.45)
df=pd.DataFrame(contacts)

where the query would be the columns and the haystack the rows or the other way round, doesn't matter.

Thanks for your effort!
Have a nice day.

@dwhswenson
Copy link
Owner

First, keep in mind that the contact frequency itself isn't the contact matrix, because it holds information about both the atom-atom contact matrix and about the residue-residue contact matrix.

Assuming you want residue-residue contacts, you can (in principle) do this with:

df = contacts.residue_contacts.df

HOWEVER, pandas 1.0 (released around Feb 1) changed a bunch of stuff, including the way to read in sparse matrices (we store the contact matrices as sparse matrices). When I tried to update Contact Map Explorer to work with pandas 1.0, I found that the sparse matrix implementation in pandas had a bug: it gave completely incorrect results in some cases. I think this has since been fixed in pandas (pandas-dev/pandas#29814), but I haven't gotten around to updating the PR here (#69).

So to convert to pandas, you need to use a version of pandas older than 1.0 (for now). I believe that pandas==0.25.3 was their last version prior to the 1.0 release.

@PeptideSimulator01
Copy link
Author

Thank you, I will downgrade my pandas and then try again. In the meantime, is there a way to get the contacts.residue_contacts as a np.array and then convert this into a pandas dataframe?

@dwhswenson
Copy link
Owner

I think the way to get there would be to go through a scipy.sparse matrix; something like

residue_contacts.sparse_matrix.toarray()

should do it. Be aware; I think all of this returns a square matrix that is the size of the full number of residues in the topology, but everything will be zero except the entries where query meets haystack (that is also true for the .df property, which first creates the scipy sparse matrix and then sends that to pandas as input for a dataframe.)

@PeptideSimulator01
Copy link
Author

it worked with pandas 0.25.3 and I am checking now if the values are correct. And also yes, this gives a square matrix with all the contacts, but I think this can be handled with some pandas remodelling.

@sroet
Copy link
Collaborator

sroet commented Sep 7, 2020

Since the release of contact-map v0.5.1, this should also work with pandas > 1.0.

@PeptideSimulator01 was there anything left before closing this issue?

@PeptideSimulator01
Copy link
Author

Thanks a lot @sroet !
Actually, there is one last thing, which is how to cite your work. I didn't found a proper citation for the Contact Map Explorer. Let me know if you have citation, which suits you. Would be glad to honor your effort.

@dwhswenson
Copy link
Owner

Thanks @PeptideSimulator01. We're working on this! I've gotten a few questions about this lately. I think we'll set up Zenodo DOIs for releases in the short term, which technically makes the code citable. (@sroet is looking into that now.) We're planning to write a paper for peer review as we finish up a few more things that would make it more attractive for publication.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants