In [None]:
#Step 1: Install required tools
!apt-get install bwa samtools


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libhts3 libhtscodecs2
Suggested packages:
  cwltool
The following NEW packages will be installed:
  bwa libhts3 libhtscodecs2 samtools
0 upgraded, 4 newly installed, 0 to remove and 41 not upgraded.
Need to get 1,158 kB of archives.
After this operation, 2,736 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 bwa amd64 0.7.17-6 [195 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libhtscodecs2 amd64 1.1.1-3 [53.2 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libhts3 amd64 1.13+ds-2build1 [390 kB]
Get:4 http://archive.ubuntu.com/ubuntu jammy/universe amd64 samtools amd64 1.13-4 [520 kB]
Fetched 1,158 kB in 1s (1,599 kB/s)
Selecting previously unselected package bwa.
(Reading database ... 125079 files and directories currently installed.)
Preparing to un

In [None]:
#Step 2: Download chr22 of hg38 (a small ~100 kb portion is extracted in the next cell)
!mkdir -p /content/reference
%cd /content/reference
!wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr22.fa.gz
!gunzip chr22.fa.gz

/content/reference
--2025-10-31 11:16:18--  https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr22.fa.gz
Resolving hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)... 128.114.119.163
Connecting to hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)|128.114.119.163|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12255678 (12M) [application/x-gzip]
Saving to: ‘chr22.fa.gz’


2025-10-31 11:16:18 (60.6 MB/s) - ‘chr22.fa.gz’ saved [12255678/12255678]

gzip: chr22.fa already exists; do you wish to overwrite (y or n)? y


In [None]:
#Optional: Keep only the first 2000 lines (header + ~100 kb of sequence) for speed
!head -n 2000 chr22.fa > chr22_small.fa
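
Before indexing, it can be worth checking how much of the extracted region is real sequence, because the start of a chromosome-level assembly can be padded with unresolved N bases. The following is a minimal sketch, not part of the original notebook; the helper names base_composition and n_fraction are hypothetical.

```python
from collections import Counter


def base_composition(fasta_path):
    """Count bases in a FASTA file, skipping header lines (those starting with '>')."""
    counts = Counter()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                continue
            # Upper-case so that soft-masked (lower-case) bases are counted too
            counts.update(line.strip().upper())
    return counts


def n_fraction(fasta_path):
    """Return the fraction of bases that are 'N' (0.0 for a file with no sequence)."""
    counts = base_composition(fasta_path)
    total = sum(counts.values())
    return counts["N"] / total if total else 0.0
```

In the notebook this could be called as n_fraction("chr22_small.fa"). A value close to 1.0 would mean the extracted region holds almost no alignable sequence, in which case a later region of chr22.fa may be a better choice.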

In [None]:
# Step 3: Index the reference genome
!bwa index chr22_small.fa

[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.02 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.01 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index chr22_small.fa
[main] Real time: 0.086 sec; CPU: 0.044 sec


In [None]:
#Step 4: Create a small query FASTA file
query_seq = """>query1
TGGAAGGACTTTAGAGATGCAAAGCCAAAGAACTAG
"""
with open("/content/query.fa", "w") as f:
  f.write(query_seq)

In [None]:
#Step 5: Align the query using BWA-MEM
!bwa mem chr22_small.fa /content/query.fa >/content/alignment.sam

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 1 sequences (36 bp)...
[M::mem_process_seqs] Processed 1 reads in 0.000 CPU sec, 0.000 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem chr22_small.fa /content/query.fa
[main] Real time: 0.005 sec; CPU: 0.002 sec


Command explained:
bwa mem chr22_small.fa /content/query.fa >/content/alignment.sam

bwa mem	: Runs the BWA-MEM aligner

chr22_small.fa	: Reference genome to align against

/content/query.fa	: Query sequence(s) to align

">"	: Redirect standard output (save to a file instead of printing)

/content/alignment.sam	: Output SAM file storing alignment results
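
The same redirection can be expressed from Python instead of the shell. This is a minimal sketch using the standard subprocess module; the helper name run_to_file is hypothetical, and the commented-out call assumes bwa and the index from Step 3 are available.

```python
import subprocess


def run_to_file(cmd, out_path):
    """Run cmd (a list of arguments) and write its standard output to out_path.

    Equivalent in spirit to `cmd > out_path` in the shell. Standard error,
    where BWA prints its progress log, is left attached to the screen.
    """
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


# Hypothetical usage inside the notebook:
# run_to_file(["bwa", "mem", "chr22_small.fa", "/content/query.fa"],
#             "/content/alignment.sam")
```

Passing check=True makes a non-zero exit status raise an exception, so a failed alignment does not silently leave an empty SAM file behind.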

In [None]:
#Show the content inside alignment
!head /content/alignment.sam

@SQ	SN:chr22	LN:99950
@PG	ID:bwa	PN:bwa	VN:0.7.17-r1188	CL:bwa mem chr22_small.fa /content/query.fa
query1	4	*	0	0	*	*	0	0	TGGAAGGACTTTAGAGATGCAAAGCCAAAGAACTAG	*	AS:i:0	XS:i:0


OUTPUT EXPLAINED:
query1- Query (read) name (QNAME).

4- FLAG 4: the read did not align to the reference.

*- Reference name (RNAME): none, because the read is unmapped.

0- Position (POS): 0, no alignment.

0- Mapping quality (MAPQ): 0, no valid alignment.

*- CIGAR string: no alignment operations (because unmapped).

*- Mate reference name (RNEXT): not applicable, because the read is unpaired.

0- Mate position (PNEXT): not applicable.

0- Insert size (TLEN): 0.

TGGAAGGACTTT...- The actual read sequence (SEQ).

*- Quality string (QUAL): no quality scores, since the input is FASTA, not FASTQ.

AS:i:0- Alignment score: 0, no match.

XS:i:0- Suboptimal alignment score: 0 (no alternate match either).
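
The field-by-field reading above can also be done programmatically. The following is a minimal sketch of a SAM line parser, not a replacement for samtools or a full SAM library; the names parse_sam_line and is_unmapped are hypothetical, and the field names follow the SAM column order.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line):
    """Split one SAM alignment line into its 11 mandatory fields plus optional tags."""
    parts = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, parts[:11]))
    for name in INT_FIELDS:
        record[name] = int(record[name])
    # Optional tags have the form TAG:TYPE:VALUE, e.g. AS:i:0
    record["tags"] = dict(
        (tag.split(":", 2)[0], tag.split(":", 2)[2]) for tag in parts[11:]
    )
    return record


def is_unmapped(record):
    """FLAG bit 0x4 marks a read with no reported alignment."""
    return bool(record["FLAG"] & 0x4)


# The alignment line produced by BWA-MEM in this notebook
line = ("query1\t4\t*\t0\t0\t*\t*\t0\t0\t"
        "TGGAAGGACTTTAGAGATGCAAAGCCAAAGAACTAG\t*\tAS:i:0\tXS:i:0")
record = parse_sam_line(line)
print(record["QNAME"], record["FLAG"], is_unmapped(record))  # prints: query1 4 True
```

Testing the 0x4 bit rather than comparing FLAG to 4 keeps the check valid when other bits are set as well, for example on paired-end reads.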


READ ALIGNMENT DEMO: Bowtie2 + hg38 (small region)

In [None]:
#Step-1: Create a fresh folder and move into it

%cd /content
!mkdir -p bowtie2_demo
%cd bowtie2_demo


/content
/content/bowtie2_demo


In [None]:
#Install Bowtie2 and Samtools
!apt-get install bowtie2 samtools > /dev/null

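/dev/null is a special system file that discards anything written to it.

Think of it as a trash can for output.

Here, "> /dev/null" redirects the standard output of apt-get (what would normally be printed on screen) into /dev/null, i.e. throws it away. Only standard output is redirected, so error messages written to standard error still appear.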
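
For comparison, the same idea in Python: subprocess.DEVNULL plays the role of /dev/null. This is a minimal sketch with a hypothetical helper name (run_quietly); the demo uses a child Python process rather than apt-get so that it runs anywhere.

```python
import subprocess
import sys


def run_quietly(cmd):
    """Run cmd, discard its standard output, and return (exit code, stderr text)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,  # same effect as `> /dev/null`
        stderr=subprocess.PIPE,     # keep error messages so failures stay visible
        text=True,
    )
    return result.returncode, result.stderr


# Demo: the child process writes one line to each stream
code, err = run_quietly(
    [sys.executable, "-c",
     "import sys; print('noise'); print('warning', file=sys.stderr)"]
)
print(code, err.strip())  # prints: 0 warning
```

The line written to standard output ('noise') is dropped, while the line written to standard error is still returned.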

In [None]:
!wget https://hgdownload.cse.ucsc.edu/goldenPath/hg38/chromosomes/chr22.fa.gz
!gunzip chr22.fa.gz

--2025-10-31 11:16:49--  https://hgdownload.cse.ucsc.edu/goldenPath/hg38/chromosomes/chr22.fa.gz
Resolving hgdownload.cse.ucsc.edu (hgdownload.cse.ucsc.edu)... 128.114.119.163
Connecting to hgdownload.cse.ucsc.edu (hgdownload.cse.ucsc.edu)|128.114.119.163|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12255678 (12M) [application/x-gzip]
Saving to: ‘chr22.fa.gz’


2025-10-31 11:16:49 (64.7 MB/s) - ‘chr22.fa.gz’ saved [12255678/12255678]



In [None]:
#Reduce the file size for quick indexing
!head -n 2000 chr22.fa > chr22_small2.fa

In [None]:
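%%bash
# Overwrite chr22_small2.fa (the 2000-line extract from the previous cell) with a 30 bp toy reference
cat > chr22_small2.fa << 'EOF'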
>chr22_small2
AGCTTAGCTACCTATATTGGTCGTTGGCCG
EOF

In [None]:
!bowtie2-build chr22_small2.fa chr22_small2_index

Settings:
  Output files: "chr22_small2_index.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  chr22_small2.fa
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 7
Using parameters --bmax 6 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 6 --dcv 1024
Constructing suffix-array element g

In [None]:
%%bash
#Create a small query (FASTA read)
cat > query.fa << 'EOF'
>query2
AGCTTAGCTAGCTACCTAT
EOF

echo "Query read content:"
cat query.fa

Query read content:
>query2
AGCTTAGCTAGCTACCTAT


In [None]:
#Run Bowtie2 alignment
!bowtie2 -x chr22_small2_index -f query.fa -S result.sam

1 reads; of these:
  1 (100.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
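
The 0.00% alignment rate can be probed by comparing the read with the toy reference directly. The following is a minimal sketch in plain Python, not part of the original notebook; the helper names reverse_complement and common_prefix_length are hypothetical, and the two sequences are copied from the cells above.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def common_prefix_length(a, b):
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


reference = "AGCTTAGCTACCTATATTGGTCGTTGGCCG"  # toy reference (chr22_small2)
query = "AGCTTAGCTAGCTACCTAT"                 # query2

print(reference.find(query))                      # -1: no exact forward match
print(reference.find(reverse_complement(query)))  # -1: no exact reverse-strand match
print(common_prefix_length(reference, query))     # 10: the first 10 bases agree
```

Here the read equals the first 10 reference bases, then four extra bases (GCTA), then the next five reference bases, so any alignment would need a 4-base insertion within a 19-base read. That may cost more than Bowtie2's default end-to-end scoring accepts, which would be consistent with the 0.00% rate above.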


In [None]:
#View the SAM output
!cat result.sam

@HD	VN:1.0	SO:unsorted
@SQ	SN:chr22_small2	LN:30
@PG	ID:bowtie2	PN:bowtie2	VN:2.4.4	CL:"/usr/bin/bowtie2-align-s --wrapper basic-0 -x chr22_small2_index -f query.fa -S result.sam"
query2	4	*	0	0	*	*	0	0	AGCTTAGCTAGCTACCTAT	IIIIIIIIIIIIIIIIIII	YT:Z:UU
