This repository contains a semi-automated Snakemake workflow for genomic variant comparison in HEK293 cell lines, starting from paired-end whole-genome sequencing FASTQ files.
Human embryonic kidney cells (HEK293) are widely used in biopharmaceutical manufacturing, particularly for recombinant adeno-associated virus (rAAV) production. Despite their industrial relevance, their genomic stability is not yet comprehensively understood. In this study, we systematically analyzed the genetic landscape of various HEK293 cell lines to evaluate their responses to different cultivation conditions and assess potential implications for rAAV production. To this end, adherent HEK293 cells were adapted to suspension growth using various serum-free media formulations. Following successful adaptation, whole-genome sequencing was performed on both adapted and parental cell lines. The sequenced reads were then aligned to the human reference genome, and genome stability was assessed by evaluating the identified structural variants. Comparative analysis, including additional publicly available HEK293 sequences, revealed a conserved genetic core across all lines, regardless of cultivation history or phenotypic divergence. The distribution of structural variants and single nucleotide polymorphisms (SNPs) indicated a gradual accumulation of mutations over time in culture rather than abrupt shifts in response to environmental changes. Notably, the adenoviral genes integrated into the HEK293 genome remained highly conserved with respect to both copy number and integration site. These findings provide insight into the genomic evolution of HEK293 cells and offer a foundation for further multi-omics studies aimed at optimizing rAAV production performance.
-
Quality Control
- FastQC: Assess read quality.
- GATK and BEDTools: Evaluate alignment quality.
- MultiQC: Summarize quality metrics.
-
Preprocessing
- Trimmomatic: Remove adapter sequences using TruSeq3-PE-2 templates.
-
Alignment
- BWA-MEM: Align reads against customized reference genomes (hg38 and human adenovirus 5).
- SAMtools: Sort raw alignments.
- GATK MarkDuplicates: Deduplicate alignments.
-
Variant Calling
- GATK HaplotypeCaller: Identify small genomic variants.
- Manta: Detect structural rearrangements.
- SURVIVOR: Filter structural variants (≥ 300 bp).
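The size criterion applied to structural variants can be illustrated with a short Python sketch. This is not the SURVIVOR implementation and not part of the workflow; it only shows what a ≥ 300 bp cutoff means for VCF records. The function names `sv_length` and `filter_by_size` and the `keep_unsized` option are hypothetical, and the handling of records without a size (for example breakends) is an assumption made for this example.

```python
def sv_length(pos, info):
    """Return the SV length in bp, or None if the record carries no size."""
    # Parse the INFO column into a dict; flag entries (no "=") get an empty value.
    fields = dict(
        item.split("=", 1) if "=" in item else (item, "")
        for item in info.split(";")
    )
    if "SVLEN" in fields:
        # SVLEN is negative for deletions; take the first value if several are listed.
        return abs(int(fields["SVLEN"].split(",")[0]))
    if "END" in fields:
        return int(fields["END"]) - pos
    return None


def filter_by_size(vcf_lines, min_len=300, keep_unsized=True):
    """Keep header lines and records whose SV length is at least min_len."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        length = sv_length(int(cols[1]), cols[7])
        if length is None:
            # Assumption: records without a size (e.g. breakends) are kept by default.
            if keep_unsized:
                kept.append(line)
        elif length >= min_len:
            kept.append(line)
    return kept
```

Calling `filter_by_size(open("calls.vcf"))` on an uncompressed VCF (the file name is a placeholder) would return the header plus all records passing the cutoff.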
-
Annotation and Analysis
- SnpEff: Functionally annotate variants using the hg38 reference.
- CNVkit: Evaluate and visualize copy number alterations.
-
Comparative Analysis
- R: Custom scripts for comparative analysis of small and structural variants.
- vcfR, VariantAnnotation, Biostrings: Evaluate variants.
- UpSetR, ggplot2: Visualize results.
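The comparative analysis itself is done with the R scripts in this repository. As a language-neutral illustration of what an UpSet-style comparison counts, the following Python sketch groups variants by the exact set of samples in which they occur. The function name `exclusive_intersections`, the sample names, and the variant keys are hypothetical and chosen only for this example.

```python
from collections import Counter


def exclusive_intersections(variants_by_sample):
    """Count variants per exact combination of samples that carry them."""
    membership = {}
    for sample, variants in variants_by_sample.items():
        for variant in variants:
            # Record every sample in which this variant was observed.
            membership.setdefault(variant, set()).add(sample)
    # Each variant contributes to exactly one combination (its full sample set).
    return Counter(frozenset(samples) for samples in membership.values())


# Hypothetical variant keys of the form (chromosome, position, ref, alt).
calls = {
    "parental": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "adapted": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")},
}
for combo, count in exclusive_intersections(calls).items():
    print(sorted(combo), count)
```

Variants shared by every sample end up in the combination containing all sample names, which corresponds to the conserved core described above.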
-
Clone the repository
- HTTPS: https://github.com/hek-I-seq/HEK293_genomes.git
- SSH: git@github.com:hek-I-seq/HEK293_genomes.git
-
Ensure all dependencies are installed. Required packages are provided in workflow.
-
Adapt Snakemake if needed before execution
- provide resources/sampleList.csv
- adjust optional settings directly in workflow/Snakefile
  - data: directory for raw reads .fastq files
  - UpSetR: Visualization scripts can be used for 2 to 13 samples, adapt rule input accordingly
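Because the UpSetR visualization scripts are stated to work for 2 to 13 samples, it may help to check the number of samples before starting a run. The following is a minimal sketch and not part of the workflow; the function name `check_upset_sample_count` is hypothetical, and the sample names are passed in as a plain list because the layout of resources/sampleList.csv is not described here.

```python
def check_upset_sample_count(samples, minimum=2, maximum=13):
    """Return the number of distinct samples, or raise if outside the supported range."""
    n = len(set(samples))
    if not minimum <= n <= maximum:
        raise ValueError(
            f"UpSetR visualization supports {minimum} to {maximum} samples, got {n}"
        )
    return n
```

For example, `check_upset_sample_count(["sample_A", "sample_B", "sample_C"])` returns 3, while a single sample raises a `ValueError`.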