This repo reproduces the project I carried out for the Advanced Data Analysis course.
In this project I ran two analyses in parallel: one in R, using classical statistical tools, and one in Python, using a Deep Learning architecture implemented in https://github.com/welch-lab/ConvNetVAE/.
The goal is to identify the epigenomic features that drive cell identity without relying on differential analysis, since that approach is not relevant for highly sparse data.
The repo is organised as follows:
- data/ contains the input files used in the analysis. The dataset was fetched from DOI: 10.1186/s12943-025-02331-9
- ConvNetVAE-main/ contains a copy of the GitHub repo of the lab that built the DL architecture I use (https://github.com/welch-lab/ConvNetVAE/)
- Conv_Net_env/ contains the files necessary to import the environment required for the DL architecture (cf. https://github.com/welch-lab/ConvNetVAE/)
- outputs/ contains the outputs of the scripts
- scripts/ contains all the scripts and markdowns I wrote, arranged in order of execution. R_scripts are expected to be run before the Python_scripts (except for 4.). If the whole repository is downloaded, each script can be run independently. Otherwise, the scripts must be run in order so that the intermediate files they depend on are available.
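
Before running a script on its own, it can help to confirm that the layout described above is in place. The snippet below is a minimal sketch, not part of the repo: the helper name `missing_dirs` is hypothetical, and it only checks that the five top-level directories listed in this README exist, not that the intermediate files inside them are present.

```python
from pathlib import Path

# Top-level directories listed in this README.
EXPECTED_DIRS = ["data", "ConvNetVAE-main", "Conv_Net_env", "outputs", "scripts"]


def missing_dirs(repo_root="."):
    """Return the expected directories that are absent under repo_root.

    Hypothetical helper for illustration; it does not inspect file contents.
    """
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

Run it from the root of the repo. If a directory is reported as missing, download the full repository or run the scripts in order.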