Data, Code and Workflows Guideline

To give eBook authors a better sense of the workflow layout, this section briefly describes the purpose of each directory.

cache: Intermediate datasets or results generated during the preprocessing steps.
graphs: Graphs and figures produced during the analysis.
input: Raw input data. Files larger than 100 MB are not allowed. We recommend using a small sample dataset to illustrate the workflow. If you have files larger than 100 MB, please contact the chapter editor to find a solution.
lib: The source code, functions, or algorithms used within the workflow.
output: The final results of the workflow.
workflow: The step-by-step pipeline. It may contain subdirectories.
- We suggest numbering the scripts and including keywords in their names to indicate their order and main purpose, e.g., 1_fastq_quality_checking.py, 2_cleaned_reads_alignment.py.
- To ensure reproducibility, please use relative paths within the workflow.
README: In the README file, please briefly describe the purpose of the repository, the installation procedure, and the input data format.
- We recommend using a diagram to describe the workflow briefly.
- Provide the installation details.
- Unless the data file is in a well-known standard format, show a small portion of the input data, e.g., its head or tail.
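The layout and the two workflow rules above (relative paths, no large input files) can be sketched in Python. This is only an illustration: the helper names `create_layout` and `oversized_inputs` are hypothetical, "100M" is read here as 100 MiB, and the script is assumed to live in the workflow directory.

```python
from pathlib import Path

# Directory names taken from the layout described above.
LAYOUT = ["cache", "graphs", "input", "lib", "output", "workflow"]
MAX_BYTES = 100 * 1024 * 1024  # assumption: "100M" read as 100 MiB


def create_layout(root):
    """Create the standard directories under root and return their paths."""
    root = Path(root)
    paths = {}
    for name in LAYOUT:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        paths[name] = path
    return paths


def oversized_inputs(input_dir, limit=MAX_BYTES):
    """Return the files under input_dir larger than limit, sorted by path."""
    return sorted(
        p for p in Path(input_dir).rglob("*")
        if p.is_file() and p.stat().st_size > limit
    )


if __name__ == "__main__":
    # Resolve the repository root relative to this script (assumed to be in
    # workflow/), so the result does not depend on the current directory.
    repo_root = Path(__file__).resolve().parent.parent
    dirs = create_layout(repo_root)
    for path in oversized_inputs(dirs["input"]):
        print(f"Too large for the repository: {path.relative_to(repo_root)}")
```

Deriving every path from the script's own location is one way to keep the workflow free of absolute paths.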

Installation

Running environment:
- The workflow was built on a Linux system running the Oracle Java Runtime Environment (JRE), v1.6 to 1.8.
Required software and versions:

MAKER-P (Campbell et al., 2014; v3.1; http://www.yandell-lab.org/software/maker-p.html)
RepeatMasker (Tarailo-Graovac et al., 2009; v4.1.1; www.repeatmasker.org)
Augustus (Stanke et al., 2006; v3.0; http://bioinf.uni-greifswald.de/augustus/)
Fgenesh (Solovyev et al., 2006; v8.0.0a; http://www.softberry.com/berry.phtml)
Snap (Korf, 2004; version 2013-11-29; https://github.com/KorfLab/SNAP)
WUBLAST (Gish, W. (1996-2003); v2.0; http://blast.wustl.edu)
InterProScan (Quevillon et al., 2005; v89.0; http://www.ebi.ac.uk/interpro/search/sequence-search)
Exonerate (Slater GS and Birney E, 2005; https://www.ebi.ac.uk/about/vertebrate-genomics/software/exonerate)
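Before running the workflow, it can help to confirm that the required programs are reachable. The sketch below checks the PATH with Python's standard library. The executable names in `REQUIRED_TOOLS` are assumptions and may differ between installations; version checks are left out because each tool reports its version differently.

```python
import shutil

# Assumed executable names; adjust them to match the local installation.
REQUIRED_TOOLS = ["maker", "RepeatMasker", "augustus", "snap", "exonerate"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```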

Input Data

The example input is a FASTA file of genome sequence. Here we use the chr1 sequence of the maize RP125 genome from Nie et al., 2021.
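A quick sanity check on the input FASTA file is to list the sequence IDs and their lengths. The following is a minimal sketch in plain Python; the function name `fasta_lengths` is hypothetical, and the file path is taken from the command line rather than assumed.

```python
import sys


def fasta_lengths(lines):
    """Return {sequence_id: length} for FASTA-formatted lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token of the header.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


if __name__ == "__main__":
    # Usage: python fasta_lengths.py <path to FASTA file>
    with open(sys.argv[1]) as handle:
        for seq_id, length in fasta_lengths(handle).items():
            print(f"{seq_id}\t{length}")
```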

Major steps

step 1.

Set up each configuration file: Maker_opts.ctl, Maker_exe.ctl, Maker_evm.ctl, and Maker_bopts.ctl. Provide the paths of the input data and evidence data in the .ctl files, and set the parameters as needed.
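Editing the control files by hand works well, but the paths can also be filled in with a script. The sketch below assumes the usual `key=value #comment` line layout of MAKER control files and keeps the trailing comment intact. The function name `set_ctl_option` and the example path are hypothetical; check the option names against your own .ctl files.

```python
def set_ctl_option(lines, key, value):
    """Return a copy of lines with `key=` set to value, keeping comments."""
    updated = []
    found = False
    for line in lines:
        line = line.rstrip("\n")
        # Split off the trailing comment, then the option name.
        setting, hash_mark, comment = line.partition("#")
        name, equals, _old_value = setting.partition("=")
        if equals and name.strip() == key:
            line = f"{key}={value}"
            if hash_mark:
                line += f" #{comment}"
            found = True
        updated.append(line)
    if not found:
        raise KeyError(f"option {key!r} not found")
    return updated


if __name__ == "__main__":
    # Hypothetical example: point the genome option at a sample FASTA file.
    example = ["#-----Genome", "genome= #genome sequence"]
    for line in set_ctl_option(example, "genome", "../input/chr1.fasta"):
        print(line)
```

Raising an error for an unknown option, rather than appending it, guards against typos in option names.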

step 2.

Run the script ‘run_maker.sh’ to annotate the genomes.

step 3.

Use the following command to create the final merged GFF file. The “-n” option produces a GFF file without the genome sequences.

gff3_merge -s -n -d genome.maker.output/genome_master_datastore_index.log > genome.noseq.gff
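To check the merged file, one option is to count the features by type and confirm that no sequence section is embedded. The sketch below relies only on the GFF3 conventions of nine tab-separated columns and a `##FASTA` directive preceding embedded sequence; the function name `summarize_gff3` is hypothetical.

```python
import sys
from collections import Counter


def summarize_gff3(lines):
    """Count features by type (column 3) and detect a ##FASTA section."""
    counts = Counter()
    has_fasta = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            has_fasta = True
            break  # everything after this directive is sequence data
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts, has_fasta


if __name__ == "__main__":
    # Usage: python summarize_gff3.py <path to GFF3 file>
    with open(sys.argv[1]) as handle:
        counts, has_fasta = summarize_gff3(handle)
    for feature_type, count in sorted(counts.items()):
        print(f"{feature_type}\t{count}")
    print("Embedded FASTA section:", "yes" if has_fasta else "no")
```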

step 4.

Generate AED (Annotation Edit Distance) plots.

/programs/maker/AED_cdf_generator.pl -b 0.025 chr1.noseq.gff > AED_rnd3

Plot the file AED_rnd3 in Excel or any plotting software.
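For readers who prefer to compute the cumulative distribution themselves, the sketch below shows the idea in Python. It is not a port of AED_cdf_generator.pl. It assumes that each mRNA feature in the GFF file carries an `_AED=` attribute and that the `-b` value is the bin width; the function names are hypothetical.

```python
import re
import sys

# Matches the _AED attribute but not _eAED.
AED_PATTERN = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def aed_values(lines):
    """Collect the _AED value of every mRNA feature in GFF3 lines."""
    values = []
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and fields[2] == "mRNA":
            match = AED_PATTERN.search(fields[8])
            if match:
                values.append(float(match.group(1)))
    return values


def aed_cdf(values, bin_size=0.025):
    """Return (threshold, fraction of values <= threshold) from 0 to 1."""
    if not values:
        return []
    n_bins = round(1.0 / bin_size)
    table = []
    for i in range(n_bins + 1):
        threshold = i * bin_size
        # The small tolerance absorbs floating-point error in the threshold.
        count = sum(1 for v in values if v <= threshold + 1e-9)
        table.append((round(threshold, 3), count / len(values)))
    return table


if __name__ == "__main__":
    # Usage: python aed_cdf.py <path to GFF3 file>
    with open(sys.argv[1]) as handle:
        for threshold, fraction in aed_cdf(aed_values(handle)):
            print(f"{threshold}\t{fraction:.3f}")
```

The resulting two-column table can be plotted as a line chart, with the AED threshold on the x-axis and the cumulative fraction of transcripts on the y-axis.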

step 5.

Load the gff file into IGV or JBrowse. Instructions for IGV and JBrowse can be found at:

IGV: http://software.broadinstitute.org/software/igv/UserGuide

JBrowse: https://biohpc.cornell.edu/lab/userguide.aspx?a=software&i=357#c

Expected results

A GFF3 file with gene structure information and AED scores.

License

This is free and open-source software, licensed under (choose a license from the suggested list: GPLv3, MIT, or CC BY 4.0).

Bio-protocol/Gene_Annotation_Pipeline
