This is an exploratory analysis of the X-ray diffraction (XRD) protein structures of Homo sapiens solved as of January 2023. The raw datasets were downloaded from the Protein Data Bank (https://www.rcsb.org/). The initial aim is to clean the data, calculate several protein properties of interest, and then carry out an exploratory analysis of the currently available protein models. The long-term aim is a second project that builds on the generated data, adds information about protein structures that have not yet been solved, and uses machine learning to identify the main determinants of protein crystallization.
Note: this project is under active development. If you have any comments or suggestions, do not hesitate to contact me, whether about the code itself or about further protein analyses worth exploring.
Required Python packages (glob and os are part of the Python standard library and need no installation):
glob, os, numpy, pandas, biopython, matplotlib, seaborn
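Before running the analysis, a quick check that the listed packages are importable can save some debugging. The script below is a minimal sketch and is not part of the project; the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names. It uses `importlib.util.find_spec`, which locates a module without importing it. Note that biopython is imported under the module name `Bio`.

```python
"""Check that the packages listed in this README can be found.

Illustrative helper only (hypothetical, not part of the project).
"""
import importlib.util

# Map each requirement to the module name used in `import` statements.
# glob and os ship with Python; biopython is imported as "Bio".
REQUIRED = {
    "glob": "glob",
    "os": "os",
    "numpy": "numpy",
    "pandas": "pandas",
    "biopython": "Bio",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def missing_packages(requirements):
    """Return the requirement names whose top-level modules cannot be found."""
    return [
        name
        for name, module in requirements.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

Any missing third-party packages can then be installed with pip, for example `pip install numpy pandas biopython matplotlib seaborn`.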