uORFs

This repository contains most of the data and all the scripts used during my final degree project (Apr-Jun 2020).

DATA: Contains all the data generated by uORFs_identifier and by the subsequent steps that add features to each uORF. The complete files are found in all_scores.
FGP_RESULTS: Contains the custom track that can be uploaded to the UCSC Genome Browser, the code needed to create it, and the files with the uORFs scored using the 5 proposed methods.
REFERENCE_DATA: Contains the reference data used to complete the uORF files with all the additional information. Much of this data is missing here because of its large size; see Data Availability.
SCRIPTS: Contains the scripts for identifying all the uORFs in a given genome (uORFs_identifier) and for adding all the additional features.
replicate_yale_analysis: Contains the full analysis pipeline written by McGillivray et al. (2018). We used their scripts and adapted them to our methods and data to obtain a proper classification. R scripts and data are found inside.
time: Contains a file that records the time taken by each execution of uORFs_identifier.pl.

Data Availability

GENCODE annotation files available at: www.gencodegenes.org

All conservation score source files can be downloaded from these sites:

phastCons: http://hgdownload.cse.ucsc.edu/goldenpath/hg38/phastCons100way/hg38.phastCons100way.bw
phyloP: http://hgdownload.cse.ucsc.edu/goldenpath/hg38/phyloP100way/hg38.phyloP100way.bw
phyloCSF: https://data.broadinstitute.org/compbio1/PhyloCSFtracks/hg38_100/20170118/ (all PhyloCSF*.bw files)

All the Ribo-seq data we used came from the GWIPS table browser (https://gwips.ucc.ie/cgi-bin/hgTables?command=start).

All these data should be downloaded and placed in the proper directories.
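The conservation score downloads listed above could be scripted roughly as follows. This is only a sketch: the destination directory REFERENCE_DATA/conservation and the DEST and DRY_RUN variables are assumptions made for illustration, not names taken from this repository, and the recursive PhyloCSF fetch assumes the server exposes a directory listing. The script prints the commands by default, since the bigWig files are large.

```shell
# Hypothetical destination; adjust to wherever the scripts expect these files.
DEST="${DEST:-REFERENCE_DATA/conservation}"
# Dry run by default (commands are printed). Set DRY_RUN=0 to really download.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

fetch_all() {
  run mkdir -p "$DEST"
  # Single-file tracks; -c resumes interrupted downloads, -P sets the target directory.
  run wget -c -P "$DEST" "http://hgdownload.cse.ucsc.edu/goldenpath/hg38/phastCons100way/hg38.phastCons100way.bw"
  run wget -c -P "$DEST" "http://hgdownload.cse.ucsc.edu/goldenpath/hg38/phyloP100way/hg38.phyloP100way.bw"
  # PhyloCSF: fetch every PhyloCSF*.bw file from the directory, without
  # ascending to the parent (-np) or recreating the remote tree (-nd).
  run wget -c -r -np -nd -A 'PhyloCSF*.bw' -P "$DEST" "https://data.broadinstitute.org/compbio1/PhyloCSFtracks/hg38_100/20170118/"
}

fetch_all
```

Running it as `DRY_RUN=0 DEST=some/dir sh script.sh` would perform the actual downloads into `some/dir`.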

uORFs_identifier

The uORFs identifier tool can be run with a command like this:

perl uORFs_identifier.pl -i ../../DATA/input_data/input_file.txt -of ../../DATA/raw_data/raw_uORFs/output_file.tsv -m number_of_processes -sp species 2> error

In our study:

perl uORFs_identifier.pl -i ../../DATA/input_data/all_human_EnsemblGeneIDs_v34.txt -of ../../DATA/raw_data/raw_uORFs/allENSG_15-5-2020_at_17-12.tsv -m 100 -sp Human 2> error
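The output name used in our study carries a date and time stamp. A small helper along these lines could assemble a similar command for a new run. It is a sketch only: build_cmd and its arguments are hypothetical names, not part of the repository, and the stamp it generates is zero-padded (for example 15-05-2020_at_17-12), which differs slightly from the name shown above. It only prints the command and does not execute it.

```shell
# Hypothetical helper: prints a uORFs_identifier.pl invocation with a
# date-stamped output file. Arguments: input file, output directory,
# number of processes, species, and an optional fixed stamp.
build_cmd() {
  input="$1"
  outdir="$2"
  procs="$3"
  species="$4"
  # Use the supplied stamp if given, otherwise the current date and time.
  stamp="${5:-$(date +%d-%m-%Y_at_%H-%M)}"
  printf 'perl uORFs_identifier.pl -i %s -of %s/allENSG_%s.tsv -m %s -sp %s\n' \
    "$input" "$outdir" "$stamp" "$procs" "$species"
}

# Example: print the command for a human run with 100 processes.
build_cmd ../../DATA/input_data/all_human_EnsemblGeneIDs_v34.txt \
  ../../DATA/raw_data/raw_uORFs 100 Human
```

The printed line can then be pasted into the shell with `2> error` appended, as in the commands above, so that error messages are written to a file named error.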

Requirements

The Ensembl Core API must be installed; see http://www.ensembl.org/info/docs/api/api_installation.html for the installation procedure. Installing GitTools is recommended in order to manage version updates easily.

Dependencies

Execution requires some additional Perl modules. The following commands install the corresponding libraries on a Debian-based Linux system (apt-get), but there are other ways to install these dependencies.

sudo apt-get install -y libparallel-forkmanager-perl
sudo apt-get install -y libdbd-mysql-perl

FerriolCalvet/uORFs
