Comparing_Databases_GEO

Compile and compare GEO-based databases and validate their metadata.

Requirements

Python > 3.0 and an environment with pandas installed

stand-histones

main_stand_dbs.py

This script takes four dataframes as input:

GEO (e.g., GEO_metadata_2023_91930_stand.csv)
NGS-QC (e.g., NGS_HS_ChipSeq_nodup_33233.csv)
ChIP-Atlas (e.g., CA_hg38_Hs_GSM_GSE_2022_antigenclass_2023_01_24.csv)
CistromeDB (e.g., Cistrome_filter_human_noENC.csv)

The script returns a dataframe (.csv file) containing the merged metadata from these four databases.
It also returns a filtered dataframe with the samples (GSM) associated with the histones of interest, the IHEC set (h3k4me3, h3k4me1, h3k27me3, h3k27ac, h3k9me3, h3k36me3), plus the input samples belonging to the same experiment (GSE).
In addition, four database-specific files are generated (df_hist_inp_SPECIFIC_DB.csv).
The target columns from all four databases are standardized. Each new standardized column carries the suffix _stand.
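
The standardization and filtering described above could look roughly like the following sketch. It is not the code of main_stand_dbs.py: the column names (`GSM`, `GSE`, `target`), both helper functions, and the normalization rule (lowercase, separators removed) are assumptions made for illustration.

```python
import re

import pandas as pd

# Histone marks of interest (IHEC set), as listed above.
IHEC_HISTONES = {"h3k4me3", "h3k4me1", "h3k27me3", "h3k27ac", "h3k9me3", "h3k36me3"}


def standardize_target(value):
    """Lowercase a target label and drop separators (hypothetical rule)."""
    if pd.isna(value):
        return value
    return re.sub(r"[\s_\-]+", "", str(value)).lower()


def filter_histones_and_inputs(df, target_col="target", gse_col="GSE"):
    """Keep histone samples plus inputs from experiments that contain one."""
    out = df.copy()
    stand_col = f"{target_col}_stand"  # new columns get the _stand suffix
    out[stand_col] = out[target_col].map(standardize_target)
    is_histone = out[stand_col].isin(IHEC_HISTONES)
    is_input = out[stand_col] == "input"
    # Experiments (GSE) that have at least one histone sample of interest.
    gse_with_histone = set(out.loc[is_histone, gse_col])
    keep = is_histone | (is_input & out[gse_col].isin(gse_with_histone))
    return out[keep]


example = pd.DataFrame(
    {
        "GSM": ["GSM1", "GSM2", "GSM3", "GSM4"],
        "GSE": ["GSE1", "GSE1", "GSE2", "GSE2"],
        "target": ["H3K4me3", "Input", "CTCF", "input"],
    }
)
filtered = filter_histones_and_inputs(example)
print(filtered)
```

In this toy table the input in GSE2 is dropped because that experiment has no histone sample of interest.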

Usage

A script to merge several dataframes from different databases, standardize and
filter the Histones and Input samples

optional arguments:
  -h, --help            show this help message and exit
  -g GEO, --geo GEO     GEO metadata csv file generated by GEO-Metadata script
  -n NGS, --ngs NGS     NGS-QC metadata csv file generated by NGS-QC-
                        extraction script
  -c CA, --ca CA        ChIP-Atlas metadata csv file generated by ChIP-Atla-
                        extraction script
  -C CISTROME, --cistrome CISTROME
                        Cistrome metadata csv file generated by Cistrome-
                        extraction script
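
For reference, a command-line interface matching the help text above can be declared with `argparse` as sketched below. This is a reconstruction from the help output, not the script's actual source; whether the real script marks any option as required is unknown, so all four are left optional here. The `build_parser` name is hypothetical.

```python
import argparse


def build_parser():
    """Rebuild the options shown in the help text (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description=(
            "A script to merge several dataframes from different databases, "
            "standardize and filter the Histones and Input samples"
        )
    )
    parser.add_argument(
        "-g", "--geo", help="GEO metadata csv file generated by GEO-Metadata script"
    )
    parser.add_argument(
        "-n", "--ngs", help="NGS-QC metadata csv file generated by NGS-QC-extraction script"
    )
    parser.add_argument(
        "-c", "--ca", help="ChIP-Atlas metadata csv file generated by ChIP-Atlas-extraction script"
    )
    parser.add_argument(
        "-C", "--cistrome", help="Cistrome metadata csv file generated by Cistrome-extraction script"
    )
    return parser


# Parse an explicit argument list, using the example file names from above.
args = build_parser().parse_args(
    [
        "-g", "GEO_metadata_2023_91930_stand.csv",
        "-n", "NGS_HS_ChipSeq_nodup_33233.csv",
        "-c", "CA_hg38_Hs_GSM_GSE_2022_antigenclass_2023_01_24.csv",
        "-C", "Cistrome_filter_human_noENC.csv",
    ]
)
print(args.geo, args.ngs, args.ca, args.cistrome)
```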

To see how to submit this script via Slurm, check sh_files/run_comparison_dbs.sh

merge_prediction.py

Script to merge the table generated by main_stand_dbs.py with the EpiClass prediction tables.

Usage

python merge_prediction.py Histones_basedDBs.tsv ChIP_Atlas_pred_EpiLaP.csv
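
A merge of this kind can be sketched with pandas as follows. The join key (`GSM`), the `predicted_assay` column, the left join, and the `merge_predictions` helper are all assumptions for illustration; the actual script may join on a different column or keep a different set of rows.

```python
import pandas as pd


def merge_predictions(metadata, predictions, key="GSM"):
    """Left-join predictions onto the metadata so every metadata row is kept."""
    # validate="many_to_one" raises if the prediction table repeats a key.
    return metadata.merge(predictions, on=key, how="left", validate="many_to_one")


metadata = pd.DataFrame(
    {"GSM": ["GSM1", "GSM2"], "target_stand": ["h3k4me3", "input"]}
)
predictions = pd.DataFrame({"GSM": ["GSM1"], "predicted_assay": ["h3k4me3"]})

merged = merge_predictions(metadata, predictions)
print(merged)
```

With a left join, samples without a prediction stay in the table with a missing value in the prediction column, which makes gaps easy to spot.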

compare_dbs_prediction/

main.py

Run merge_prediction.py before running this script.

This script generates a table with columns reporting how many databases agree or disagree with the EpiClass prediction, and how many samples agree or disagree among the databases.

The additional columns are added to the output file (e.g., Histones_allDBs_CA_pred_comparisonDBs.tsv):

python main.py Histones_allDBs_CA_pred.tsv Histones_allDBs_CA_pred_comparisonDBs.tsv
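
The per-sample counting could work along the lines of the sketch below. It is only an illustration of the idea: the `compare_row` function, the output key names, and the choice to ignore databases with no label are assumptions, not the logic of main.py.

```python
from collections import Counter


def compare_row(db_labels, prediction):
    """Count databases that agree/disagree with the prediction for one sample."""
    # Databases with no label for this sample are ignored (assumption).
    present = [label for label in db_labels.values() if label]
    agree = sum(label == prediction for label in present)
    # Size of the largest group of databases that share the same label.
    consensus = max(Counter(present).values(), default=0)
    return {
        "n_dbs_agree_pred": agree,
        "n_dbs_disagree_pred": len(present) - agree,
        "n_dbs_same_label": consensus,
    }


row = {
    "GEO": "h3k27ac",
    "NGS-QC": "h3k27ac",
    "ChIP-Atlas": "h3k4me1",
    "CistromeDB": None,
}
result = compare_row(row, prediction="h3k27ac")
print(result)
```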

clean_cols_histDbsPred.py

This script removes columns from the Hist_DBs_predCA table to make it easier to work with. The list of removed columns is in the script.

python clean_cols_histDbsPred.py Histones_DBs_filled_CA_pred_consensus_ENCODE_upset.tsv

merge_metadata_epilap.py

Script to merge all EpiClass predictions associated with ChIP-Atlas (e.g., assay_prediction.csv, sex_prediction.csv, biomaterial_prediction.csv). Hint: all prediction files should be in the same folder as the script. Pass the folder containing all outputs (tsv files from EpiClass) as the argument.

Usage

python merge_metadata_epilap.py <path_to_pred_outputs>
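
One way to combine several per-category prediction files from a folder is sketched below. The `*_prediction.csv` pattern follows the example file names above, but the shared `GSM` key, the `predicted` column, the outer join, and the `merge_prediction_files` helper are assumptions; the real script may read a different layout.

```python
import tempfile
from functools import reduce
from pathlib import Path

import pandas as pd


def merge_prediction_files(folder, key="GSM"):
    """Outer-join every *_prediction.csv in `folder` on a shared sample ID."""
    tables = []
    for path in sorted(Path(folder).glob("*_prediction.csv")):
        category = path.stem.replace("_prediction", "")
        table = pd.read_csv(path)
        # Prefix non-key columns with the category so they stay distinguishable.
        table = table.rename(
            columns={c: f"{category}_{c}" for c in table.columns if c != key}
        )
        tables.append(table)
    if not tables:
        raise FileNotFoundError(f"no *_prediction.csv files found in {folder}")
    return reduce(lambda left, right: left.merge(right, on=key, how="outer"), tables)


# Small demonstration with two toy prediction files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame(
        {"GSM": ["GSM1", "GSM2"], "predicted": ["h3k27ac", "input"]}
    ).to_csv(Path(tmp) / "assay_prediction.csv", index=False)
    pd.DataFrame({"GSM": ["GSM1"], "predicted": ["female"]}).to_csv(
        Path(tmp) / "sex_prediction.csv", index=False
    )
    merged_predictions = merge_prediction_files(tmp)

print(merged_predictions)
```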
