2020 Homework 1

Write a script called download_count.sh which does the following.
- Download the data file https://ftp.ncbi.nlm.nih.gov/pub/UniVec/UniVec_Core from NCBI
- Print the number of FASTA-format sequences in this file. Each record starts with a header line beginning with > (see the Wikipedia article on the FASTA format).
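
One way to approach the two steps above is sketched below. It is not the reference solution: the function names `download_file` and `count_fasta_records` are made up for illustration, and it assumes `curl` and `grep` are installed. Counting relies on the fact that every FASTA record has exactly one header line starting with `>`.

```shell
#!/usr/bin/env bash
# Sketch of download_count.sh; the function names are illustrative only.

URL="https://ftp.ncbi.nlm.nih.gov/pub/UniVec/UniVec_Core"

# Download the URL given as $1 into the file named by $2 (requires curl).
# --fail makes curl return a non-zero status on HTTP errors.
download_file() {
    curl --silent --show-error --fail --location --output "$2" "$1"
}

# Print the number of FASTA records in file $1 by counting the lines
# that start with '>' (one header line per record).
count_fasta_records() {
    grep -c '^>' "$1"
}
```

With these definitions, `download_file "$URL" UniVec_Core && count_fasta_records UniVec_Core` downloads the file and prints the record count. Anchoring the pattern with `^` matters, because a `>` that appears elsewhere on a line must not be counted.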
Write a script called summary_exons.sh which reports the total length of the exons in the file data/rice_random_exons.bed. The data are in the BED file format. The columns are "Chromosome", "Start position", and "Stop position". The length of a feature (an exon in this case) is STOP - START.
- read in the file
- use a loop structure to read each line
- add up the length of each exon by summing this into a variable
- Print out the total length of exon features at the end.
- You do not need to report this for each chromosome; just print the total length for this example. However, if this is too easy for you, go ahead and make a more sophisticated report which presents, per chromosome, the total length of exons, the total number of exons, and the average length of exons.
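
The steps listed above could be sketched roughly as follows. This is an illustration under assumptions, not the expected answer: the function names `total_exon_length` and `per_chromosome_report` are hypothetical, the input is assumed to be whitespace-separated with integer coordinates in columns 2 and 3, and the optional per-chromosome report uses `awk` rather than a shell loop.

```shell
#!/usr/bin/env bash
# Sketch of summary_exons.sh; the function names are illustrative only.

# Print the total exon length (sum of STOP - START) for the BED file $1.
total_exon_length() {
    total=0
    # Read one line at a time; columns beyond the third land in 'rest'.
    # The '|| [ -n "$chrom" ]' part handles a last line with no newline.
    while read -r chrom start stop rest || [ -n "$chrom" ]; do
        # Skip blank lines, headers, and anything without integer coordinates
        case "$start" in ''|*[!0-9]*) continue ;; esac
        case "$stop"  in ''|*[!0-9]*) continue ;; esac
        total=$(( total + stop - start ))
    done < "$1"
    echo "$total"
}

# Optional extension: per chromosome, print the exon count, total length,
# and mean length as tab-separated columns (uses awk, not a shell loop).
per_chromosome_report() {
    awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { len[$1] += $3 - $2; n[$1]++ }
         END { for (c in n) printf "%s\t%d\t%d\t%.1f\n", c, n[c], len[c], len[c] / n[c] }' "$1" | sort
}
```

Calling `total_exon_length data/rice_random_exons.bed` prints a single number. The report is piped through `sort` because `awk` does not guarantee the order in which array keys are visited.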
Write a script called strand_gene_count.sh to calculate the number of genes that are on the positive (+) and negative (-) strand in the file.
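
The assignment text does not say which file or format to use, so the sketch below makes an assumption: the input is a GFF3-style annotation file, tab-separated, with the feature type in column 3 and the strand in column 7. The function name `strand_gene_count` is hypothetical. If the real input is a different format, the column numbers would need to change.

```shell
#!/usr/bin/env bash
# Sketch of strand_gene_count.sh.
# ASSUMPTION: $1 is a GFF3-style file (tab-separated, feature type in
# column 3, strand in column 7). The assignment does not name the file.

strand_gene_count() {
    awk -F '\t' '
        # Ignore comment/directive lines; keep only rows of type "gene"
        $1 !~ /^#/ && $3 == "gene" {
            if ($7 == "+") plus++
            else if ($7 == "-") minus++
        }
        # Unset counters print as 0 with %d
        END { printf "+\t%d\n-\t%d\n", plus, minus }
    ' "$1"
}
```

The output is two tab-separated lines, one per strand, each giving the strand symbol and the number of genes found on it.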
