## **1) Install Miniconda**

In [None]:
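# Download the latest Miniconda installer and run it non-interactively:
# -b = batch mode (no prompts), -f = no error if the prefix already exists, -p = install prefix
!wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
!chmod +x Miniconda3-latest-Linux-x86_64.sh
!./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local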

In [2]:
!conda --version

conda 25.1.1


## **2) Add Miniconda's bin Directory to PATH**

In [3]:
import os
os.environ['PATH'] = '/usr/local/bin:' + os.environ['PATH']
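Each time the cell above is re-run, another copy of `/usr/local/bin` is prepended, so PATH slowly accumulates duplicates. Here is a minimal sketch of an idempotent alternative. `prepend_to_path` is a hypothetical helper, not part of any library, and it also drops empty PATH entries.

```python
import os


def prepend_to_path(directory, path):
    """Return a PATH string with `directory` first and no duplicate copies of it."""
    # Split on os.pathsep (':' on Linux) and drop empty entries
    entries = [p for p in path.split(os.pathsep) if p]
    # Remove any existing copy so the directory appears exactly once
    entries = [p for p in entries if p != directory]
    return os.pathsep.join([directory] + entries)


# In the notebook this would replace the one-liner above:
# os.environ['PATH'] = prepend_to_path('/usr/local/bin', os.environ['PATH'])
```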

## **3) Set Up Conda Channels**

In [4]:
!conda config --add channels defaults
!conda config --add channels bioconda
!conda config --add channels conda-forge
!conda config --set offline false

In [5]:
!conda config --show channels

channels:
  - conda-forge
  - bioconda
  - defaults
  - https://repo.anaconda.com/pkgs/main
  - https://repo.anaconda.com/pkgs/r
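`conda config --add channels X` puts `X` at the top of the list, giving it the highest priority. That is why `conda-forge`, added last, appears first in the output above. (`conda config --append channels X` adds to the bottom instead.) The toy function below only simulates this ordering rule and does not call conda. `simulate_conda_add` is a hypothetical name used for illustration.

```python
from typing import List, Optional


def simulate_conda_add(channels: List[str], existing: Optional[List[str]] = None) -> List[str]:
    """Toy model of repeated `conda config --add channels X` calls (does not run conda)."""
    result = list(existing or [])
    for ch in channels:
        # --add gives the channel top priority; an entry already present is moved up
        if ch in result:
            result.remove(ch)
        result.insert(0, ch)
    return result


# Same order of calls as the cell above
print(simulate_conda_add(['defaults', 'bioconda', 'conda-forge']))
# ['conda-forge', 'bioconda', 'defaults']
```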


## **4) Install Kallisto**

In [None]:
!conda install -c bioconda kallisto -y

In [7]:
!kallisto

kallisto 0.46.2

Usage: kallisto <CMD> [arguments] ..

Where <CMD> can be one of:

    index         Builds a kallisto index 
    quant         Runs the quantification algorithm 
    bus           Generate BUS files for single-cell data 
    pseudo        Runs the pseudoalignment step 
    merge         Merges several batch runs 
    h5dump        Converts HDF5-formatted results to plaintext
    inspect       Inspects and gives information about an index
    version       Prints version information
    cite          Prints citation information

Running kallisto <CMD> without arguments prints usage information for <CMD>



## **5) Download Datasets via FTP/HTTPS**

In [None]:
import os

folderPath = '/content/RNAseq'

# Create the working folder (no error if it already exists)
os.makedirs(folderPath, exist_ok=True)

datasets = ['ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR244/035/SRR24448335/SRR24448335.fastq.gz',
            'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR244/036/SRR24448336/SRR24448336.fastq.gz',
            'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR244/037/SRR24448337/SRR24448337.fastq.gz',
            'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR244/038/SRR24448338/SRR24448338.fastq.gz',
            'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR244/039/SRR24448339/SRR24448339.fastq.gz',
            'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR244/040/SRR24448340/SRR24448340.fastq.gz',
            'https://ftp.ensembl.org/pub/release-113/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz']

# Loop through each URL and download the file
for url in datasets:
    # Get the filename from the URL
    filename = url.split("/")[-1]

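    # Download the file; -nc (--no-clobber) skips files that already exist,
    # so re-running the cell does not create copies such as *.fastq.gz.1
    !wget -nc "$url" -P "$folderPath"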

    file_path = os.path.join(folderPath, filename)
    if os.path.exists(file_path):
        print(f"Downloaded: {filename}")
    else:
        print(f"Failed to download: {filename}")
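The same download step can also be written in plain Python without shell magics. This is a minimal sketch that assumes the runtime has network access. `local_filename` and `download_all` are hypothetical helpers, and `fetch` can be swapped out so the logic can be checked offline. The default, `urllib.request.urlretrieve`, handles both `ftp://` and `https://` URLs. The existence check in the cell above would report a partial file as "Downloaded". This version instead deletes a partial file when a transfer fails, so the next run retries it.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_filename(url):
    """Last path component of a URL, e.g. 'SRR24448335.fastq.gz'."""
    return os.path.basename(urlparse(url).path)


def download_all(urls, dest, fetch=urllib.request.urlretrieve):
    """Download each URL into `dest`; return {filename: status}."""
    os.makedirs(dest, exist_ok=True)
    status = {}
    for url in urls:
        name = local_filename(url)
        target = os.path.join(dest, name)
        if os.path.exists(target):
            # Already downloaded on a previous run
            status[name] = 'skipped'
            continue
        try:
            fetch(url, target)
            status[name] = 'downloaded'
        except OSError as exc:  # URLError and socket errors subclass OSError
            # Remove any partial file so the next run retries it
            if os.path.exists(target):
                os.remove(target)
            status[name] = f'failed: {exc}'
    return status


# Usage in the notebook (network access required):
# print(download_all(datasets, folderPath))
```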

In [12]:
!date

Mon Apr 28 05:29:43 PM UTC 2025


## **6) Decompress the Human Reference Transcriptome (cDNA)**

In [14]:
!gunzip "/content/RNAseq/Homo_sapiens.GRCh38.cdna.all.fa.gz"
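Plain `gunzip` replaces the `.gz` file with its decompressed copy. Below is a minimal Python sketch that keeps the compressed original, like `gunzip -k`. `gunzip_file` is a hypothetical helper. A truncated download surfaces here as an `EOFError` from the `gzip` module, which makes this a rough integrity check as well. kallisto's `index` command should also accept the gzipped FASTA directly, so this step may be optional.

```python
import gzip
import shutil


def gunzip_file(src, dst=None):
    """Decompress `src` (.gz) to `dst`, keeping the original (like `gunzip -k`)."""
    if dst is None:
        if not src.endswith('.gz'):
            raise ValueError(f'expected a .gz file, got {src!r}')
        dst = src[:-3]  # strip the '.gz' suffix
    # copyfileobj streams in chunks, so the large FASTA is never fully in memory
    with gzip.open(src, 'rb') as fin, open(dst, 'wb') as fout:
        shutil.copyfileobj(fin, fout)
    return dst


# gunzip_file('/content/RNAseq/Homo_sapiens.GRCh38.cdna.all.fa.gz')
```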

## **7) Create Reference Index Using Kallisto**

In [27]:
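fasta_file = os.path.join(folderPath, "Homo_sapiens.GRCh38.cdna.all.fa")
index_file = os.path.join(folderPath, "Homo_sapiens.GRCh38.cdna.all.index")

# Build the kallisto index (default k-mer length: 31)
!kallisto index -i {index_file} {fasta_file}

# Only report success if the index file was actually written
if os.path.exists(index_file):
    print(f"Index created: {index_file}")
else:
    print(f"Index not found: {index_file}")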



[build] loading fasta file /content/RNAseq/Homo_sapiens.GRCh38.cdna.all.fa
[build] k-mer length: 31
        from 1517 target sequences
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done 
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 1233731 contigs and contains 116708646 k-mers 

Index created: /content/RNAseq/Homo_sapiens.GRCh38.cdna.all.index
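A quick sanity check on the index input is to count the transcripts. Each FASTA header line (starting with `>`) is one transcript, which kallisto treats as a target. The sketch below uses `count_fasta_records`, a hypothetical helper that reads plain or gzipped FASTA. No expected total is given here because it depends on the Ensembl release.

```python
import gzip


def count_fasta_records(path):
    """Count sequences in a FASTA file (plain or .gz) by counting '>' header lines."""
    opener = gzip.open if path.endswith('.gz') else open
    n = 0
    with opener(path, 'rt') as fh:
        for line in fh:
            if line.startswith('>'):
                n += 1
    return n


# count_fasta_records(fasta_file)
```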


## **8) Pseudoalign and Quantify Reads with Kallisto**

In [None]:
%cd /content/RNAseq

In [31]:
!mkdir -p kallisto

In [None]:
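# Sample label -> single-end FASTQ file downloaded in step 5
kallistoData = [['HS01', 'SRR24448340.fastq.gz'],
                ['HS02', 'SRR24448339.fastq.gz'],
                ['HS03', 'SRR24448338.fastq.gz'],
                ['CD01', 'SRR24448337.fastq.gz'],
                ['CD02', 'SRR24448336.fastq.gz'],
                ['CD03', 'SRR24448335.fastq.gz']]

for sample, fileloc in kallistoData:
    # One output folder per sample, e.g. kallisto/HS01
    folder = f"kallisto/{sample}"

    # Single-end reads: kallisto cannot estimate the fragment length, so an
    # assumed mean (-l 250) and standard deviation (-s 30) must be supplied
    !kallisto quant \
    -i {index_file} \
    -o "$folder" \
    -t 2 \
    --single -l 250 -s 30 \
    "$fileloc"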

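A `!` command does not raise an error when it exits with a non-zero status, so the loop above moves on silently if one sample fails. Here is a minimal sketch, assuming you prefer plain Python. It builds each `kallisto quant` command as an argument list with the same flags as above and runs it with `subprocess.run(..., check=True)`, which stops at the first failing sample. `kallisto_quant_cmd` and `quant_all` are hypothetical helpers. With `dry_run=True` they only return the commands.

```python
import os
import subprocess


def kallisto_quant_cmd(index, outdir, fastq, threads=2, frag_len=250, frag_sd=30):
    """Argument list for one single-end `kallisto quant` run (same flags as above)."""
    return [
        'kallisto', 'quant',
        '-i', index,
        '-o', outdir,
        '-t', str(threads),
        '--single', '-l', str(frag_len), '-s', str(frag_sd),
        fastq,
    ]


def quant_all(index, samples, outroot='kallisto', dry_run=True):
    """Build (and optionally run) one kallisto command per (sample, fastq) pair."""
    cmds = []
    for sample, fastq in samples:
        cmd = kallisto_quant_cmd(index, os.path.join(outroot, sample), fastq)
        cmds.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so a failed sample stops the loop instead of passing silently
            subprocess.run(cmd, check=True)
    return cmds


# quant_all(index_file, kallistoData, dry_run=False)
```

Passing a list instead of a shell string also avoids quoting problems if a path ever contains spaces.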
## **9) Zip and Download the kallisto Folder**

In [None]:
!zip -r kallisto.zip kallisto

from google.colab import files
files.download("kallisto.zip")
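Each `kallisto quant` run writes an `abundance.tsv` into its sample folder, with columns `target_id`, `length`, `eff_length`, `est_counts` and `tpm`. After `kallisto.zip` is unpacked locally, the six samples can be combined into one matrix for downstream tools. This is a minimal standard-library sketch. `merge_abundance` and `write_matrix` are hypothetical helpers, and the sample labels are the ones used in step 8.

```python
import csv
import os


def merge_abundance(root, samples, column='est_counts'):
    """Combine <root>/<sample>/abundance.tsv into {target_id: {sample: value}}."""
    table = {}
    for sample in samples:
        path = os.path.join(root, sample, 'abundance.tsv')
        with open(path, newline='') as fh:
            for row in csv.DictReader(fh, delimiter='\t'):
                table.setdefault(row['target_id'], {})[sample] = float(row[column])
    return table


def write_matrix(table, samples, out_path):
    """Write a tab-separated matrix: one row per transcript, one column per sample."""
    with open(out_path, 'w', newline='') as fh:
        writer = csv.writer(fh, delimiter='\t', lineterminator='\n')
        writer.writerow(['target_id'] + list(samples))
        for tid in sorted(table):
            # All samples share one index, so a missing value is not expected; 0.0 is a fallback
            writer.writerow([tid] + [table[tid].get(s, 0.0) for s in samples])


# samples = ['HS01', 'HS02', 'HS03', 'CD01', 'CD02', 'CD03']
# write_matrix(merge_abundance('kallisto', samples), samples, 'counts_matrix.tsv')
```

Passing `column='tpm'` builds a TPM matrix instead of estimated counts.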