**<small>Abstract:</small>**

<small> This project demonstrates the use of Unix command-line tools and regular expressions to process and analyze large genomic data files efficiently. The workflow involved downloading compressed annotation files, extracting relevant data using commands like `awk`, `grep`, and `sed`, and performing data filtering and manipulation through regular expressions. These steps enabled automated parsing, extraction, and formatting of genomic features critical for downstream bioinformatics analysis. </small>
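
As a rough illustration of the workflow summarized above, the sketch below builds a tiny compressed annotation file and filters it in one pipeline. The file name `demo.gtf` and its contents are made up for this example, and `gzip -dc` is used as a portable stand-in for `zcat`.

```bash
# Work in a throwaway directory so nothing in the notebook is touched
workdir=$(mktemp -d)
cd "$workdir"

# A tiny, made-up GTF-like file (tab-separated, hypothetical values)
printf 'chr1\tdemo\texon\t100\t200\n' > demo.gtf
printf 'chr2\tdemo\texon\t300\t450\n' >> demo.gtf
printf 'chr2\tdemo\ttranscript\t300\t900\n' >> demo.gtf
gzip demo.gtf   # replaces demo.gtf with demo.gtf.gz

# Stream the compressed file, keep exon lines, then keep chr2 and print chromosome, start, end
result=$(gzip -dc demo.gtf.gz | grep -w "exon" | awk -F '\t' '$1 == "chr2" {print $1, $4, $5}')
echo "$result"
```

Because every step reads from a pipe, the decompressed data never has to be written to disk.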

**<small>Basic Unix Shell Commands</small>**

In [1]:
%%bash
# Create a directory named "test"
mkdir test

In [2]:
%%bash
# Navigate into the "test" directory
cd test

# Change to the user's home directory
cd ~

# Move one level up in the directory tree
cd ..

# Print the absolute path of the current directory
pwd

/
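
Each `%%bash` cell runs in its own shell process, so a `cd` in one cell does not carry over to the next. The sketch below imitates that behavior with a subshell; the variable names are illustrative only.

```bash
# Remember where we started
start_dir=$(pwd)

# The command substitution runs in a subshell; the cd inside it ends with the subshell
inside=$(cd / && pwd)

# Back in the parent shell, the working directory is unchanged
after=$(pwd)

echo "inside subshell: $inside"
echo "after subshell:  $after"
```

This is why later cells that depend on a location start with an explicit `cd` to the directory they need.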


In [3]:
%%bash
# List all files and directories in the current location
ls

# List files and directories with detailed information (permissions, size, date, etc.)
ls -l

# Same as above, but with human-readable file sizes (e.g., KB, MB)
ls -lh

sample_data
test
total 8
drwxr-xr-x 1 root root 4096 Jun 26 13:35 sample_data
drwxr-xr-x 2 root root 4096 Jul  2 09:01 test
total 8.0K
drwxr-xr-x 1 root root 4.0K Jun 26 13:35 sample_data
drwxr-xr-x 2 root root 4.0K Jul  2 09:01 test


In [5]:
%%bash
# Create a test text file
echo "This is a test file." > test.txt

# List all files ending with .txt in the current directory
ls *txt

# Print the contents of a specific text file (e.g., test.txt)
cat test.txt

test.txt
This is a test file.


In [None]:
%%bash
# Create a test.log file with a simple text
echo "This is a test log file." > test.log

# List files starting with 't' and ending with 'g'
ls t*g

# Display the content of test.log file
cat test.log

test.log
This is a test log file.


In [8]:
%%bash
# List everything whose name starts with 't' (followed by anything)
# Example: this matches 'test.txt', 'test.log', and the directory 'test', whose contents are listed under 'test:'
ls t*

# Display the contents of 'test.log'
cat test.log

# Display the contents of 'test.txt'
cat test.txt

test.log
test.txt

test:
This is a test log file.
This is a test file.
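
The `test:` entry in the output above appears because the glob `t*` also matches the directory `test`, and `ls` lists a directory's contents by default. A minimal sketch, using a scratch directory with made-up files, of how `ls -d` lists matching directories by name instead:

```bash
# Scratch directory with one directory and two files that all start with 't'
workdir=$(mktemp -d)
cd "$workdir"
mkdir test
touch test/inner.txt test.log test.txt

# Without -d, ls descends into the matching directory 'test'
ls t*

# With -d, a matching directory is listed by name like any other match
names=$(ls -d t*)
echo "$names"
```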


In [9]:
%%bash
# Create a file named 'filename.txt' and write the text "This is some text" into it
echo "This is some text" > filename.txt

In [10]:
%%bash
# Download the SARS-CoV-2 genome FASTA file from the Ensembl Genomes FTP server
wget http://ftp.ensemblgenomes.org/pub/viruses/fasta/sars_cov_2/dna/Sars_cov_2.ASM985889v3.dna.toplevel.fa.gz

--2025-07-02 09:09:29--  http://ftp.ensemblgenomes.org/pub/viruses/fasta/sars_cov_2/dna/Sars_cov_2.ASM985889v3.dna.toplevel.fa.gz
Resolving ftp.ensemblgenomes.org (ftp.ensemblgenomes.org)... 193.62.193.161
Connecting to ftp.ensemblgenomes.org (ftp.ensemblgenomes.org)|193.62.193.161|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9606 (9.4K) [application/x-gzip]
Saving to: ‘Sars_cov_2.ASM985889v3.dna.toplevel.fa.gz’

     0K .........                                             100%  114M=0s

2025-07-02 09:09:30 (114 MB/s) - ‘Sars_cov_2.ASM985889v3.dna.toplevel.fa.gz’ saved [9606/9606]



In [11]:
%%bash
# Decompress and display the contents of the gzipped FASTA file without extracting it to disk
zcat Sars_cov_2.ASM985889v3.dna.toplevel.fa.gz

>MN908947.3 dna:primary_assembly primary_assembly:ASM985889v3:MN908947.3:1:29903:1 REF
ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCT
GTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACT
CACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATC
TTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTTT
CGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAAC
ACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGACGTGCTCGTACGTGGCTTTGG
AGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCTTAAAGATGGCACTTGTGG
CTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACAGCCCTATGTGTTCATCAA
ACGTTCGGATGCTCGAACTGCACCTCATGGTCATGTTATGGTTGAGCTGGTAGCAGAACT
CGAAGGCATTCAGTACGGTCGTAGTGGTGAGACACTTGGTGTCCTTGTCCCTCATGTGGG
CGAAATACCAGTGGCTTACCGCAAGGTTCTTCTTCGTAAGAACGGTAATAAAGGAGCTGG
TGGCCATAGTTACGGCGCCGATCTAAAGTCATTTGACTTAGGCGACGAGCTTGGCACTGA
TCCTTATGAAGATTTTCAAGAAAACTGGAACACTAAACATAGCAGTGGTGTTACCCGTGA
ACTCATGCGTGAGCTTAACGGAGGGGCATACACTCGCTATGTCGATAACAACTTCTGTGG
CCCTGATGGCTACCCTCTTGAGTGCATTAAAGACCTTCTAGCACGTGCTGGTAAAGCTT

In [12]:
%%bash
# Decompress the gzipped FASTA file, replacing the .gz file with the decompressed .fa file
gunzip Sars_cov_2.ASM985889v3.dna.toplevel.fa.gz

In [21]:
%%bash
# Display the contents of the decompressed FASTA file
cat /content/Sars_cov_2.ASM985889v3.dna.toplevel.fa

>MN908947.3 dna:primary_assembly primary_assembly:ASM985889v3:MN908947.3:1:29903:1 REF
ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCT
GTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACT
CACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATC
TTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTTT
CGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAAC
ACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGACGTGCTCGTACGTGGCTTTGG
AGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCTTAAAGATGGCACTTGTGG
CTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACAGCCCTATGTGTTCATCAA
ACGTTCGGATGCTCGAACTGCACCTCATGGTCATGTTATGGTTGAGCTGGTAGCAGAACT
CGAAGGCATTCAGTACGGTCGTAGTGGTGAGACACTTGGTGTCCTTGTCCCTCATGTGGG
CGAAATACCAGTGGCTTACCGCAAGGTTCTTCTTCGTAAGAACGGTAATAAAGGAGCTGG
TGGCCATAGTTACGGCGCCGATCTAAAGTCATTTGACTTAGGCGACGAGCTTGGCACTGA
TCCTTATGAAGATTTTCAAGAAAACTGGAACACTAAACATAGCAGTGGTGTTACCCGTGA
ACTCATGCGTGAGCTTAACGGAGGGGCATACACTCGCTATGTCGATAACAACTTCTGTGG
CCCTGATGGCTACCCTCTTGAGTGCATTAAAGACCTTCTAGCACGTGCTGGTAAAGCTT

In [20]:
%%bash
# Show the first 10 lines of the FASTA file
# Note: the file was decompressed with gunzip earlier, so the output is readable sequence text
head /content/Sars_cov_2.ASM985889v3.dna.toplevel.fa

>MN908947.3 dna:primary_assembly primary_assembly:ASM985889v3:MN908947.3:1:29903:1 REF
ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCT
GTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACT
CACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATC
TTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTTT
CGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAAC
ACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGACGTGCTCGTACGTGGCTTTGG
AGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCTTAAAGATGGCACTTGTGG
CTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACAGCCCTATGTGTTCATCAA
ACGTTCGGATGCTCGAACTGCACCTCATGGTCATGTTATGGTTGAGCTGGTAGCAGAACT


In [22]:
%%bash
# Show the last 10 lines of the FASTA file
# As above, the file is already decompressed, so the output is human-readable
tail /content/Sars_cov_2.ASM985889v3.dna.toplevel.fa

TATTGACGCATACAAAACATTCCCACCAACAGAGCCTAAAAAGGACAAAAAGAAGAAGGC
TGATGAAACTCAAGCCTTACCGCAGAGACAGAAGAAACAGCAAACTGTGACTCTTCTTCC
TGCTGCAGATTTGGATGATTTCTCCAAACAATTGCAACAATCCATGAGCAGTGCTGACTC
AACTCAGGCCTAAACTCATGCAGACCACACAAGGCAGATGGGCTATATAAACGTTTTCGC
TTTTCCGTTTACGATATATAGTCTACTCTTGTGCAGAATGAATTCTCGTAACTACATAGC
ACAAGTAGATGTAGTTAACTTTAATCTCACATAGCAATCTTTAATCAGTGTGTAACATTA
GGGAGGACTTGAAAGAGCCACCACATTTTCACCGAGGCCACGCGGAGTACGATCGAGTGT
ACAGTGAACAATGCTAGGGAGAGCTGCCTATATGGAAGAGCCCTAATGTGTAAAATTAAT
TTTAGTAGTGCTATCCCCATGTGATTTTAATAGCTTCTTAGGAGAATGACAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAA
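
If the file were still gzipped, `head` and `tail` would print compressed bytes. A small sketch of previewing a compressed file by streaming it through `gzip -dc` (equivalent to `zcat` on GNU systems); `demo.fa` is a hypothetical miniature FASTA file created only for this example.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical miniature FASTA file, compressed while keeping the original
printf '>seq1 demo\nACGTACGT\nTTGGCCAA\nACGT\n' > demo.fa
gzip -c demo.fa > demo.fa.gz

# Decompress to a pipe and look at the first two lines and the last line
first=$(gzip -dc demo.fa.gz | head -n 2)
last=$(gzip -dc demo.fa.gz | tail -n 1)

echo "$first"
echo "$last"
```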


In [24]:
%%bash
# Copy the file test.txt to a new file named test1.txt
cp test.txt test1.txt

# List files and directories in the current directory
ls
# The listing should now include test1.txt alongside test.txt

# Create a new directory named 'new'
mkdir new

# Copy the directory 'new' recursively to a new directory named 'new2'
cp -r new new2

# List files and directories again to see the changes
ls
# The listing should now include the directories new and new2

# Copy test.txt file into the 'new' directory
cp test.txt new/

# Copy test.txt file to the 'new' directory but rename it to test_new.txt
cp test.txt new/test_new.txt

# Change directory to 'new'
cd new

# List files inside 'new' directory
ls
# Expected output: test_new.txt test.txt

filename.txt
new
new2
sample_data
Sars_cov_2.ASM985889v3.dna.toplevel.fa
test
test1.txt
test.log
test.txt
filename.txt
new
new2
sample_data
Sars_cov_2.ASM985889v3.dna.toplevel.fa
test
test1.txt
test.log
test.txt
test_new.txt
test.txt


mkdir: cannot create directory ‘new’: File exists
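
The error above comes from running the cell a second time, when `new` already exists. One way to make directory creation safe to re-run is `mkdir -p`, sketched here in a scratch directory:

```bash
workdir=$(mktemp -d)
cd "$workdir"
mkdir new

# Plain mkdir fails when the directory already exists
mkdir new 2>/dev/null || echo "plain mkdir failed: directory exists"

# With -p, an existing directory is not treated as an error
mkdir -p new && status="ok"
echo "$status"
```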


In [30]:
%%bash
# Change directory to /content
cd /content

# Print current working directory to confirm location
pwd  # confirm that we are in /content

# Move test1.txt from /content to /content/new directory
mv test1.txt new/

# Rename test.txt to test_rename.txt in /content directory
mv test.txt test_rename.txt

# List files and directories in /content to check current contents
ls

# If test.txt still exists in /content (unlikely after renaming), move it to new/test2.txt
mv test.txt new/test2.txt || echo "test.txt not found, skipping move"

/content
filename.txt
new
new2
sample_data
Sars_cov_2.ASM985889v3.dna.toplevel.fa
test
test.log
test_rename.txt
test.txt not found, skipping move


mv: cannot stat 'test1.txt': No such file or directory
mv: cannot stat 'test.txt': No such file or directory
mv: cannot stat 'test.txt': No such file or directory
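
The `cannot stat` messages above appear because the files had already been moved or renamed by an earlier run. A possible guard is to test for the file first; `move_if_present` is a hypothetical helper written for this sketch, not a standard command.

```bash
workdir=$(mktemp -d)
cd "$workdir"
mkdir new
touch test1.txt

# Hypothetical helper: $1 = source file, $2 = destination
move_if_present() {
  if [ -e "$1" ]; then
    mv "$1" "$2"
  else
    echo "skip: $1 not found"
  fi
}

# First call moves the file; second call finds nothing to move
move_if_present test1.txt new/
second=$(move_if_present test1.txt new/)
echo "$second"
```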


In [31]:
%%bash
# Remove the file test.log from the current directory
rm test.log

# Remove the directory new2 and all its contents recursively
rm -r new2

# List the files and directories in the current directory to verify removal
ls
# test.log and new2 should no longer appear in the listing

filename.txt
new
sample_data
Sars_cov_2.ASM985889v3.dna.toplevel.fa
test
test_rename.txt


In [32]:
%%bash
# Create a new file named example.txt
touch example.txt

# Check the current permissions of example.txt
ls -l example.txt

# Change the permissions of example.txt to be readable, writable, and executable by the owner,
# and readable and executable by group and others (rwxr-xr-x)
chmod 755 example.txt

# Verify the permissions have changed
ls -l example.txt

-rw-r--r-- 1 root root 0 Jul  2 09:33 example.txt
-rwxr-xr-x 1 root root 0 Jul  2 09:33 example.txt
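
The octal mode `755` can also be written symbolically. A short sketch, using a scratch file, that sets the same permissions with `u=rwx,go=rx` and reads back the first ten characters of the `ls -l` line:

```bash
workdir=$(mktemp -d)
cd "$workdir"
touch example.txt

# Start from rw-r--r-- (644) so the change is visible
chmod 644 example.txt
before=$(ls -l example.txt | cut -c1-10)

# Symbolic form of 755: user gets rwx, group and others get r-x
chmod u=rwx,go=rx example.txt
after=$(ls -l example.txt | cut -c1-10)

echo "$before -> $after"
```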


In [35]:
%%bash
# List the contents of the current directory to check available files
ls

# Check if 'large.txt' exists; if not, create it with 25 sample lines
if [ ! -f large.txt ]; then
  echo -e "line1\nline2\nline3\nline4\nline5\nline6\nline7\nline8\nline9\nline10\nline11\nline12\nline13\nline14\nline15\nline16\nline17\nline18\nline19\nline20\nline21\nline22\nline23\nline24\nline25" > large.txt
fi

# Count the number of lines in 'large.txt' and display the result
wc -l large.txt

# Save the line count into 'f_ls.txt', overwriting if it exists
wc -l large.txt > f_ls.txt

# Append the list of files and directories to 'f_ls.txt'
ls >> f_ls.txt

# Append the first 10 lines of 'large.txt' to 'f_ls.txt'
head large.txt >> f_ls.txt

example.txt
filename.txt
f_ls.txt
large.txt
new
sample_data
Sars_cov_2.ASM985889v3.dna.toplevel.fa
test
test_rename.txt
25 large.txt


In [36]:
%%bash
# Count and display the number of lines, words, and bytes in the FASTA file
wc Sars_cov_2.ASM985889v3.dna.toplevel.fa

  500   503 30489 Sars_cov_2.ASM985889v3.dna.toplevel.fa
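
The byte count from `wc` includes the header line and every newline, which is why it is larger than the 29903 bases given in the FASTA header. A sketch of counting only sequence characters, shown on a made-up two-line FASTA file:

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical miniature FASTA file: one header, 12 bases over two lines
printf '>seq1 demo\nACGTACGT\nTTGG\n' > demo.fa

# All bytes, including the header and newlines
total_bytes=$(wc -c < demo.fa)

# Drop header lines, remove newlines, then count what is left
seq_len=$(grep -v "^>" demo.fa | tr -d '\n' | wc -c)

echo "total bytes: $total_bytes"
echo "sequence length: $seq_len"
```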


In [40]:
%%bash
# Concatenate the contents of 'large.txt' and 'test_rename.txt', then count total lines
cat /content/large.txt /content/test_rename.txt | wc -l

26


In [39]:
%%bash
# Search and display all lines containing the character '>' in the FASTA file (usually headers)
grep ">" /content/Sars_cov_2.ASM985889v3.dna.toplevel.fa

# Search and display lines containing '>' with line numbers in the FASTA file
grep -n ">" /content/Sars_cov_2.ASM985889v3.dna.toplevel.fa

# Count the number of lines containing '>' in the FASTA file (count of sequences/headers)
grep -c ">" /content/Sars_cov_2.ASM985889v3.dna.toplevel.fa

# List the names of files (large.txt, test_rename.txt) that contain the string '1'
grep -l "1" /content/large.txt /content/test_rename.txt

# Search and display all lines containing the string '1' from large.txt and test_rename.txt
grep "1" /content/large.txt /content/test_rename.txt

>MN908947.3 dna:primary_assembly primary_assembly:ASM985889v3:MN908947.3:1:29903:1 REF
1:>MN908947.3 dna:primary_assembly primary_assembly:ASM985889v3:MN908947.3:1:29903:1 REF
1
/content/large.txt
/content/large.txt:line1
/content/large.txt:line10
/content/large.txt:line11
/content/large.txt:line12
/content/large.txt:line13
/content/large.txt:line14
/content/large.txt:line15
/content/large.txt:line16
/content/large.txt:line17
/content/large.txt:line18
/content/large.txt:line19
/content/large.txt:line21


**<small>Regular Expressions</small>**

In [41]:
%%bash
# Search for lines starting with the sequence "TTCTG" in the FASTA file
grep "^TTCTG" Sars_cov_2.ASM985889v3.dna.toplevel.fa

# Search for lines ending with the sequence "TTCT" in the FASTA file
grep "TTCT$" Sars_cov_2.ASM985889v3.dna.toplevel.fa

TTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTTT
TTCTGAAGTTGTTCTTAAAAAGTTGAAGAAGTCTTTGAATGTGGCTAAATCTGAATTTGA
AACACATTAACATTAGCTGTACCCTATAATATGAGAGTTATACATTTTGGTGCTGGTTCT


In [42]:
%%bash
# Count the number of lines containing one or more consecutive 'A's in the FASTA file
grep -E "A+" Sars_cov_2.ASM985889v3.dna.toplevel.fa | wc -l

# Find lines with at least 10 consecutive 'A's in the FASTA file (longer runs also match)
grep -E "A{10}" Sars_cov_2.ASM985889v3.dna.toplevel.fa

# Match runs of 10 to 11 'A's; without anchors this selects the same lines as A{10}
grep -E "A{10,11}" Sars_cov_2.ASM985889v3.dna.toplevel.fa

# Find lines containing 'ACTG' followed by two 'G's (i.e., 'ACTGGG'); {2} applies only to the preceding 'G'
grep -E "ACTGG{2}" Sars_cov_2.ASM985889v3.dna.toplevel.fa

500
TTTAGTAGTGCTATCCCCATGTGATTTTAATAGCTTCTTAGGAGAATGACAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAA
TTTAGTAGTGCTATCCCCATGTGATTTTAATAGCTTCTTAGGAGAATGACAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAA
ACTGGGCATTGATTTAGATGAGTGGAGTATGGCTACATACTACTTATTTGATGAGTCTGG
TTTAAAGATTGTAGTAAGGTAATCACTGGGTTACATCCTACACAGGCACCTACACACCTC
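
An unanchored `A{n}` matches any run of n or more 'A's, which is why both lines of the poly-A tail appear above. To match a run of exactly n, the pattern has to exclude an 'A' on either side. A sketch with n = 3 on made-up input lines:

```bash
# Four made-up lines: runs of 3, 5, 3 (whole line), and 2 'A's
lines=$(printf 'CAAAG\nCAAAAAG\nAAA\nCAAG\n')

# At least three consecutive 'A's
at_least=$(echo "$lines" | grep -cE "A{3}")

# Exactly three: the run must be bounded by a non-'A' character or a line edge
exactly=$(echo "$lines" | grep -cE "(^|[^A])A{3}([^A]|$)")

echo "at least 3: $at_least"
echo "exactly 3:  $exactly"
```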


In [43]:
%%bash
# Replace all occurrences of 'AAAAA' with 'NNNNN' in the FASTA file and show lines containing 'NNNNN'
sed 's/AAAAA/NNNNN/g' Sars_cov_2.ASM985889v3.dna.toplevel.fa | grep "NNNNN"

# Add parentheses around every group of two alphanumeric characters in each line of the FASTA file
sed -E 's/[[:alnum:]]{2}/(&)/g' Sars_cov_2.ASM985889v3.dna.toplevel.fa

# Add parentheses around every occurrence of two consecutive 'A's in the FASTA file
sed -E 's/A{2}/(&)/g' Sars_cov_2.ASM985889v3.dna.toplevel.fa

CTTAGTAGAAGTTGNNNNNGGCGTTTTGCCTCAACTTGAACAGCCCTATGTGTTCATCAA
TTTTGTATTTCCCTTAAATTCCATAATCAAGACTATTCAACCAAGGGTTGAAAAGNNNNN
AATACTCCNNNNNGAGAAAGTCAACATCAATATTGTTGGTGACTTTAAACTTAATGAAGA
AAAAGGAAAAGCTNNNNNAGGTGCCTGGAATATTGGTGAACAGAAATCAATACTGAGTCC
GCTAACTAACATCTTTGGCACTGTTTATGNNNNNCTCAAACCCGTCCTTGATTGGCTTGA
CTGTGTTGTGGCAGATGCTGTCATNNNNNCTTTGCAACCAGTATCTGAATTACTTACACC
AAAACTTACTGACAATGTATACATTNNNNNTGCAGACATTGTGGAAGAAGCTNNNNNGGT
TGTCTACTTAGCTGTCTTTGATNNNNNTCTCTATGACAAACTTGTTTCAAGCTTTTTGGA
AGAGGGTGTTTTAACTGCTGTGGTTATACCTACTNNNNNGGCTGGTGGCACTACTGAAAT
GGGTTTAAATGGTTACACTGTAGAGGAGGCAAAGACAGTGCTTNNNNNGTGTAAAAGTGC
CACTNNNNNGTGGAAATACCCACAAGTTAATGGTTTAACTTCTATTAAATGGGCAGATAA
AGGAGACATTATACTTAAACCAGCAAATAATAGTTTNNNNNTTACAGAAGAGGTTGGCCA
TGCTTACGTTAATACGTTTTCATCAACTTTTAACGTACCAATGGNNNNNCTCAAAACACT
TGACTGTAGTGCGCGTCATATTAATGCGCAGGTAGCNNNNNGTCACAACATTGCTTTGAT
TGCTNNNNNGAATAACTTACCTTTTAAGTTGACATGTGCAACTACTAGACAAGTTGTTAA
ACCACAAACCTCTATCACCTCAGCTGTTTTGCAGAGTGGTTTTAGNNNNNTGGCATTCCC
CCAGTTACACAATGACATTCTCTT
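
In the `sed` replacements above, `&` stands for the text that matched, and the `g` flag replaces non-overlapping matches from left to right. A small sketch on a made-up string with a run of five 'A's:

```bash
# Five 'A's form two non-overlapping pairs, leaving one 'A' unwrapped
pairs=$(echo "GAAAAAT" | sed -E 's/A{2}/(&)/g')

# A fixed-string replacement of the same run
masked=$(echo "GAAAAAT" | sed 's/AAAAA/NNNNN/g')

echo "$pairs"
echo "$masked"
```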

In [45]:
%%bash
# Download the hg38 gene annotation GTF file compressed as .gz
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.ensGene.gtf.gz

# Decompress the downloaded gzipped GTF file
gunzip hg38.ensGene.gtf.gz

# Print the first 10 lines of the GTF file (head shows 10 lines by default)
awk '{print}' hg38.ensGene.gtf | head

# Print the first column of the GTF file (chromosome info) for first 10 lines
awk '{print $1}' hg38.ensGene.gtf | head

# Print the fourth column (start position) for first 10 lines
awk '{print $4}' hg38.ensGene.gtf | head

# Print first column (chromosome) and fourth column (start position) separated by tab for first 10 lines
awk '{print $1"\t"$4}' hg38.ensGene.gtf | head

# Print lines where the start position (4th column) is greater than 50, show first 10 matches
awk '{if($4>50) print}' hg38.ensGene.gtf | head

# Count number of lines where chromosome (1st column) is "chr2"
awk '{if($1=="chr2") print}' hg38.ensGene.gtf | wc -l

# Print chromosome and feature type (3rd column) for lines where chromosome is "chr2", first 10 lines
awk '{if($1=="chr2") print $1"\t"$3}' hg38.ensGene.gtf | head

# Calculate sum of all values in the 4th column (start positions) across the whole file
awk '{sum+=$4} END {print sum}' hg38.ensGene.gtf

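# Count the transcripts annotated for each gene in hg38.ensGene.gtf:
# - Split each line on ";" and keep the first two pieces (gene_id and transcript_id)
# - Keep the text after "gene_id", then strip the transcript_id label and the quotes
# - Sort and deduplicate the (gene, transcript) pairs
# - Keep only the gene ID, then count how many times each gene occurs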
awk -F ";" '{print $1, $2}' hg38.ensGene.gtf | awk -F "gene_id" '{print $2}' | sed 's/transcript_id//' | sed 's/"//g' | sort | uniq | awk '{print $1}' | sort | uniq -c

chr1	ensGene	transcript	11869	14409	.	+	.	gene_id "ENSG00000223972"; transcript_id "ENST00000456328";  gene_name "ENSG00000223972";
chr1	ensGene	exon	11869	12227	.	+	.	gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; exon_id "ENST00000456328.1"; gene_name "ENSG00000223972";
chr1	ensGene	exon	12613	12721	.	+	.	gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; exon_id "ENST00000456328.2"; gene_name "ENSG00000223972";
chr1	ensGene	exon	13221	14409	.	+	.	gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; exon_id "ENST00000456328.3"; gene_name "ENSG00000223972";
chr1	ensGene	transcript	12010	13670	.	+	.	gene_id "ENSG00000223972"; transcript_id "ENST00000450305";  gene_name "ENSG00000223972";
chr1	ensGene	exon	12010	12057	.	+	.	gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; exon_number "1"; exon_id "ENST00000450305.1"; gene_name "ENSG00000223972";
chr1	ensGene	exon	12179	12227	.	+	.	gene_id "ENSG000
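
The per-gene count from the last pipeline can also be sketched in a single `awk` program. The version below is an alternative under stated assumptions, not the pipeline used above: it assumes `gene_id` is the first attribute in column 9 and counts `transcript` lines per gene. The file `demo.gtf` and the IDs `G1`, `G2`, `T1` to `T3` are made up.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Made-up GTF lines: gene G1 has two transcripts, gene G2 has one
printf 'chr1\tdemo\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' > demo.gtf
printf 'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' >> demo.gtf
printf 'chr1\tdemo\ttranscript\t150\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n' >> demo.gtf
printf 'chr2\tdemo\ttranscript\t50\t400\t.\t-\t.\tgene_id "G2"; transcript_id "T3";\n' >> demo.gtf

# Split column 9 on double quotes; the second piece is the gene_id value
counts=$(awk -F '\t' '$3 == "transcript" {
  split($9, attr, "\"")
  n[attr[2]]++
} END { for (g in n) print g, n[g] }' demo.gtf | sort)

echo "$counts"
```

Setting the field separator to a tab keeps the whole attribute column in `$9`, whereas the default whitespace splitting would break it apart at every space.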

--2025-07-02 09:52:53--  https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.ensGene.gtf.gz
Resolving hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)... 128.114.119.163
Connecting to hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)|128.114.119.163|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 27802050 (27M) [application/x-gzip]
Saving to: ‘hg38.ensGene.gtf.gz’

     0K .......... .......... .......... .......... ..........  0% 1.52M 17s
    50K .......... .......... .......... .......... ..........  0% 3.00M 13s
   100K .......... .......... .......... .......... ..........  0% 3.00M 12s
   150K .......... .......... .......... .......... ..........  0% 3.05M 11s
   200K .......... .......... .......... .......... ..........  0%  149M 9s
   250K .......... .......... .......... .......... ..........  1%  213M 7s
   300K .......... .......... .......... .......... ..........  1%  201M 6s
   350K .......... .......... .......... .......... ..