Edgar-Pacheco/Team4HacktoberFest2023

Hacktoberfest 2023 Project: Generation of a Report using Voila Dashboard.

Theme

A report, delivered through Voila Dashboards, built from proteomic profile data (i.e., protein expression) of 77 patients diagnosed with breast cancer. The aim is to better understand how gene expression behaves in positive cases and to detect possible new molecular markers for early and timely detection.

Description

The approach we took is mostly an exploratory analysis of biological data from previously diagnosed, positive breast cancer cases. With a good interpretation, this analysis could open the door to identifying novel molecular markers that allow early detection of this clinical condition. In the future, once reports of this kind are well established and backed by prior research in the scientific literature, we can deploy them in the cloud to share results so that everyone interested has access. They could also be complemented with ETL processes for more effective data management and a faster, more efficient exploratory analysis. Only one dataset was used, with an open license, which contains the proteomes of 77 patients.

The report is generated by a Python script in a Jupyter Notebook. The script loads the data into a database with the DuckDB package for efficient data management, and then uses the Seaborn package to generate heatmaps that show the expression level of the proteins in each profile.

The goal of the project is to deploy the report online, so that anyone interested can study what was done and build on it in future research.

Data Sources

  • Breast Cancer Proteomes: Contains 77 proteomic (protein expression) profiles of previously diagnosed breast cancer cases, obtained from the Clinical Proteomic Tumor Analysis Consortium (NCI/NIH), with expression values for ~12,000 proteins in each sample. License unknown. (https://www.kaggle.com/datasets/piotrgrabo/breastcancerproteomes)

Methods

For the analysis and exploration of the data, the following is proposed:

  • Generate a heatmap that shows the protein expression levels in a simple, easy-to-understand way, with an explanation of the key points.
    • We use the Seaborn package, a data visualization library built on matplotlib for drawing attractive and informative statistical graphics, to produce the heatmaps.
    • For efficient data management we use DuckDB, which lets us load the data we need into a database and work with it from there.
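The heatmap step can be sketched as below. This is an illustrative example under stated assumptions, not the team's actual notebook: the matrix is randomly generated, and the protein and sample labels are hypothetical placeholders for the real profiles.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic expression matrix: rows are proteins, columns are samples.
# Values are random and only stand in for real measurements.
rng = np.random.default_rng(0)
proteins = [f"PROT_{i}" for i in range(6)]
samples = [f"sample_{j}" for j in range(4)]
matrix = pd.DataFrame(rng.normal(size=(6, 4)), index=proteins, columns=samples)

# A diverging colormap centered at 0 separates higher from lower expression.
fig, ax = plt.subplots(figsize=(6, 4))
sns.heatmap(
    matrix,
    cmap="coolwarm",
    center=0,
    ax=ax,
    cbar_kws={"label": "expression (synthetic)"},
)
ax.set_xlabel("Sample")
ax.set_ylabel("Protein")
fig.tight_layout()
```

In a Jupyter Notebook the figure renders inline; calling `fig.savefig(...)` would write it to disk instead.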

User Interface

The plan is to deploy the report through Voila Dashboards, so that the exploratory analysis is available online to anyone interested and serves as a basis for generating further reports in the future.

Team Members

  • Jesús Gerardo Ortiz Romero / @j-gorm: Generation of the Notebook with the exploratory analysis and deployment of the Report.
  • Edgar Pacheco Castan / @Edgar-Pacheco
  • Diego Peñaloza / @diegopenaloza

Final Product

Here is our final product! Many thanks to the Ploomber team and their mentoring programs; thanks to them we were able to deploy the reports online!

About

ETL pipeline with a machine learning component
