### 1. FASTA Format
Importance:
FASTA is a plain-text format commonly used to represent nucleotide or peptide sequences. Each record in a FASTA file begins with a single-line description, followed by one or more lines of sequence data.

Structure:

* The first line starts with a > character followed by a description.
* Subsequent lines contain the sequence data, which may be wrapped across several lines.
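To make the structure above concrete, here is a minimal pure-Python sketch of a FASTA reader. The function name `read_fasta` and the two records are made up for illustration; in practice a library parser such as Biopython's `SeqIO` is the better choice, since this sketch ignores edge cases such as comment lines.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (description, sequence) pairs from a FASTA-formatted text handle."""
    description, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:], []
        elif description is not None:
            # Sequence data may be wrapped over several lines; collect the pieces.
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


# Hypothetical two-record example (not taken from the data files used here).
example = StringIO(
    ">seq1 hypothetical example record\n"
    "ACGTAC\n"
    "GTAA\n"
    ">seq2 another made-up record\n"
    "MPALTK\n"
)
records = list(read_fasta(example))
for description, sequence in records:
    print(f"Description: {description}\nSequence: {sequence}")
```

The wrapped lines of `seq1` are joined into a single sequence string, which is the same behaviour a library parser gives you.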

In [1]:
from Bio import SeqIO

# Read the FASTA file; passing the path lets SeqIO open and close the file itself
fasta_sequences = SeqIO.parse("data/rcsb_pdb_4CS4.fasta", "fasta")
for fasta in fasta_sequences:
    name, sequence = fasta.id, str(fasta.seq)
    print(f"Name: {name}\nSequence: {sequence}")


Name: 4CS4_1|Chain
Sequence: MPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNLHHHHHH


### 2. FASTQ Format
Importance:
FASTQ format stores nucleotide sequences together with a quality score for each base. It is the standard format for raw reads in next-generation sequencing (NGS) data processing.

Structure:

* The first line starts with an @ character followed by a sequence identifier.
* The second line is the raw sequence.
* The third line begins with a + character, optionally followed by the same sequence identifier.
* The fourth line encodes the quality scores, one ASCII character per base, so it has the same length as the sequence line.
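As a rough illustration of the fourth line, the sketch below decodes a quality string into integer Phred scores and converts each score Q into an error probability using P = 10^(-Q/10). It assumes the Phred+33 (Sanger) ASCII offset; the record and the helper names `decode_phred33` and `error_probability` are made up for this example.

```python
def decode_phred33(quality_line):
    """Convert a FASTQ quality string to integer Phred scores (assumes Phred+33)."""
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


# Hypothetical four-line FASTQ record (not from the data file used here).
record = ["@read1 hypothetical example", "ACGTN", "+", "IIFF#"]
header, sequence, _, quality_line = record

scores = decode_phred33(quality_line)
print(scores)  # [40, 40, 37, 37, 2]

# Pair each base with its score and error probability.
for base, q in zip(sequence, scores):
    print(f"{base}: Q={q}, P(error)={error_probability(q):.4f}")
```

A score of 40 corresponds to an estimated error rate of 1 in 10,000, while a score of 2 (typical of an uncalled N base) means the call is more likely wrong than right.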

In [3]:
from Bio import SeqIO

# Read the FASTQ file; passing the path lets SeqIO open and close the file itself
fastq_sequences = SeqIO.parse("data/SRR1552444.fastq", "fastq")
for fastq in fastq_sequences:
    name, sequence = fastq.id, str(fastq.seq)
    # Per-base Phred quality scores, decoded to a list of integers
    quality = fastq.letter_annotations["phred_quality"]
    print(f"Name: {name}\nSequence: {sequence}\nQuality: {quality}")


Name: SRR1552444.1
Sequence: CTGCCCTCAGCTATCTTCTCATGCTGCAAGTCTGACTCCACCGTCCTAGGTGTAGGAGCTGTCTCCATGGANNGGTNACANGTACATACAGTCTACAGCC
Quality: [34, 34, 34, 37, 37, 37, 37, 37, 39, 39, 39, 39, 39, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 40, 41, 41, 41, 41, 41, 41, 40, 41, 40, 41, 41, 41, 41, 41, 41, 41, 41, 40, 41, 41, 41, 41, 41, 41, 39, 39, 39, 40, 40, 41, 41, 40, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 38, 2, 2, 12, 20, 26, 2, 11, 20, 28, 2, 11, 26, 30, 33, 35, 36, 36, 35, 35, 35, 36, 35, 35, 34, 35, 35, 35, 35, 35]
Name: SRR1552444.2
Sequence: AATAAAAAAGATAAAACCTTGGCCTGTCTGAAGATGAGGTGGAGGATCATCCAAGTACAGTACTGTTTTCTCTTGGTTCCGTGCATGCTGACCGCTCTGG
Quality: [31, 31, 27, 35, 35, 35, 30, 35, 39, 30, 35, 39, 37, 40, 38, 27, 36, 36, 38, 39, 38, 39, 36, 31, 37, 39, 35, 37, 39, 40, 35, 34, 35, 35, 39, 38, 40, 40, 40, 18, 30, 33, 30, 37, 38, 38, 40, 34, 34, 33, 39, 34, 35, 37, 40, 28, 37, 38, 38, 39, 38, 39, 36, 31, 36, 39, 40, 35, 39, 36, 39, 22, 30, 30, 31, 34, 26, 33, 26, 26, 32, 