This repository contains most of the data and all the scripts used during my final degree project (Apr-Jun 2020).

FerriolCalvet/uORFs

uORFs

Contents

  • DATA: Contains all the data generated by uORFs_identifier and by the process of adding further features to each uORF. The complete files are in all_scores.

  • FGP_RESULTS: Contains the custom track that can be uploaded to the UCSC Genome Browser, the code needed to create it, and the files with the uORFs scored using the five proposed methods.

  • REFERENCE_DATA: Contains the reference data used to complete the uORF files with all the additional information (much of this data is missing here because of its large size; see Data Availability).

  • SCRIPTS: Contains the scripts for identifying all the uORFs in a given genome (uORFs_identifier) and for adding all the additional features.

  • replicate_yale_analysis: Contains the complete analysis pipeline written by McGillivray et al. (2018). We used their scripts and adapted them to our methods and data to obtain a proper classification. The R scripts and data are included.

  • time: Contains a file that records the time taken by each execution of the uORFs_identifier.pl file.
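As a hedged illustration only, one way to record how long each run takes is a small Bash helper like the one below. The function name `timed_run`, the log path, and the tab-separated "command, seconds" format are assumptions; they do not describe the file actually stored in time.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): run a command and append
# "<command><TAB><elapsed seconds>" to a log file. The log format is an assumption.
set -euo pipefail

timed_run() {
  local log=$1
  shift
  local start=$SECONDS   # Bash built-in: seconds since the shell started
  local status=0
  "$@" || status=$?      # keep the command's exit status without triggering set -e
  printf '%s\t%d\n' "$*" "$((SECONDS - start))" >> "$log"
  return "$status"
}
```

For example, `timed_run ../../time/run_times.tsv perl uORFs_identifier.pl -i input_file.txt -of output_file.tsv -m 100 -sp Human` (hypothetical log path) appends one line per execution and still returns the script's own exit status.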

Data Availability

GENCODE annotation files are available at www.gencodegenes.org.

The source files for all the conservation scores can be downloaded from these sites:

All the Ribo-seq data we used came from the GWIPS table browser (https://gwips.ucc.ie/cgi-bin/hgTables?command=start).

All these data should be downloaded and placed in the proper directories.

uORFs_identifier

The uORFs identifier tool can be run with a command like this:

perl uORFs_identifier.pl -i ../../DATA/input_data/input_file.txt -of ../../DATA/raw_data/raw_uORFs/output_file.tsv -m number_of_processes -sp species 2> error

In our study, we ran:

perl uORFs_identifier.pl -i ../../DATA/input_data/all_human_EnsemblGeneIDs_v34.txt -of ../../DATA/raw_data/raw_uORFs/allENSG_15-5-2020_at_17-12.tsv -m 100 -sp Human 2> error
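To make repeated runs easier, the invocation above could be wrapped in a small Bash function. This is a sketch under assumptions: the helper names `output_name` and `run_identifier` are hypothetical, and the timestamped file name only mirrors the day-month-year_at_hour-minute pattern of allENSG_15-5-2020_at_17-12.tsv (zero-padding of hour and minute is itself an assumption); uORFs_identifier.pl does not require this naming.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around uORFs_identifier.pl (not part of this repository).
set -euo pipefail

# Build a file name such as allENSG_15-5-2020_at_17-12.tsv from a prefix and
# day, month, year, hour and minute values. 10# forces base 10 so that
# zero-padded values like "08" are not parsed as octal.
output_name() {
  printf '%s_%d-%d-%d_at_%02d-%02d.tsv\n' "$1" \
    "$((10#$2))" "$((10#$3))" "$((10#$4))" "$((10#$5))" "$((10#$6))"
}

# Run the identifier with the same options as above, writing to a
# timestamped file in raw_uORFs and sending stderr to "error".
run_identifier() {
  local input=$1 prefix=$2 processes=$3 species=$4
  local stamp name
  stamp=$(date +'%d %m %Y %H %M')
  # Word splitting of $stamp is intentional: it yields the five date fields.
  # shellcheck disable=SC2086
  name=$(output_name "$prefix" $stamp)
  perl uORFs_identifier.pl -i "$input" \
    -of "../../DATA/raw_data/raw_uORFs/$name" \
    -m "$processes" -sp "$species" 2> error
}
```

Called from the same directory as the commands above, `run_identifier ../../DATA/input_data/all_human_EnsemblGeneIDs_v34.txt allENSG 100 Human` would reproduce the study's invocation with a fresh timestamp in the output name.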

Requirements

The Ensembl Core API must be installed; see http://www.ensembl.org/info/docs/api/api_installation.html for the installation procedure. Installing it with GitTools is recommended, as this makes version updates easier to manage.

Dependencies

The script also depends on some additional Perl modules. The following commands install the corresponding libraries on a Debian-based Linux system (using apt-get), but there are other ways to install these dependencies.

sudo apt-get install -y libparallel-forkmanager-perl
sudo apt-get install -y libdbd-mysql-perl
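Before a long run, it can help to confirm that Perl can actually load the required modules. The sketch below assumes the script needs Parallel::ForkManager and DBD::mysql (the modules provided by the two packages above) plus the Ensembl Core API, checked here through its Bio::EnsEMBL::Registry module; the exact module list used by uORFs_identifier.pl is an assumption, and `check_module` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Report whether Perl can load each module: "perl -M<Module> -e 1" exits 0 on success.
check_module() {
  if perl -M"$1" -e 1 2> /dev/null; then
    echo "OK      $1"
  else
    echo "MISSING $1"
  fi
}

for module in Parallel::ForkManager DBD::mysql Bio::EnsEMBL::Registry; do
  check_module "$module"
done
```

A MISSING line points to a dependency that still needs installing; for the Ensembl modules it usually means PERL5LIB does not yet include the Ensembl API directories described in the installation guide.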
