<a href="https://colab.research.google.com/github/SenseiBassa/Bioinformatics-Projects-HackBio-/blob/main/WGS_Variant_Analysis_Human_Project_7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Clinical Case Presentation – WGS Variant Analysis: Human
**By:** Bassa Joshua Samuel  
**HackBio Internship – Week 3**

---

## Patient Background
- **Patient X (25-year-old male):** Recurrent severe fatigue, jaundice, joint pain, and anemia since childhood.  
- **Laboratory findings:** Hemoglobin 6–8 g/dL, elevated reticulocytes, hemolysis.  
- **Family:** African descent; mother sequenced (Patient Y), father unavailable.  
- **Clinical suspicion:** Genetic cause (hemoglobinopathy).  

---

## Objective
- Identify the causal mutation using WGS data.  
- Annotate and interpret the mutation.  
- Provide clinical recommendations for diagnosis and management.  

---

## Tools
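- **FastQC** and **fastp** (read quality control and trimming)
- **BWA-MEM** and **samtools** (alignment, sorting, and indexing)
- **GATK** (Genome Analysis Toolkit; variant calling and joint genotyping)
- **ANNOVAR** or **Ensembl VEP** (variant annotation)
- **Reference genome:** GRCh38 (`hg38.fa` with its `.fai` and `.dict` files)
- **Dataset:** Single-end FASTQ reads from `/data/human_stage_1/`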
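
Before running anything, it can help to confirm that the reference files listed above are actually in place. The sketch below is a hypothetical pre-flight check: the function name `check_reference` is made up for illustration, and the BWA index extensions it looks for assume the classic `bwa index` layout rather than `bwa-mem2`.

```bash
# Hypothetical pre-flight check: confirm that the reference FASTA and the
# index files expected by samtools, GATK, and BWA exist and are non-empty.
check_reference() {
  local ref_dir="${1%/}" missing=0 f
  for f in hg38.fa hg38.fa.fai hg38.dict \
           hg38.fa.amb hg38.fa.ann hg38.fa.bwt hg38.fa.pac hg38.fa.sa; do
    if [ ! -s "${ref_dir}/${f}" ]; then
      echo "missing: ${ref_dir}/${f}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# How these files are usually created (not run here):
#   samtools faidx hg38.fa                     -> hg38.fa.fai
#   gatk CreateSequenceDictionary -R hg38.fa   -> hg38.dict
#   bwa index hg38.fa                          -> .amb .ann .bwt .pac .sa
```

Calling `check_reference /data/ref/` returns a non-zero status and lists the missing files on stderr if anything is absent.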

---

## Variant Annotation

Annotate the joint-genotyped `cohort.vcf.gz` with ANNOVAR or Ensembl VEP, then screen the annotated calls for known or predicted pathogenic variants.
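
As one possible way to run this step, the sketch below wraps an Ensembl VEP call in a small shell function. The function name `annotate_cohort`, the output file name, and the particular set of flags are illustrative assumptions, and the sketch presumes that a local GRCh38 VEP cache is already installed. ANNOVAR would need a different command.

```bash
# Hypothetical wrapper around Ensembl VEP (assumes `vep` is on PATH and a
# GRCh38 cache has been downloaded for offline use).
annotate_cohort() {
  local in_vcf="$1" out_vcf="$2" fasta="$3"
  vep \
    --input_file "$in_vcf" \
    --output_file "$out_vcf" \
    --vcf --compress_output bgzip \
    --cache --offline --assembly GRCh38 \
    --fasta "$fasta" \
    --symbol --hgvs --canonical \
    --check_existing \
    --force_overwrite
  # --fasta is needed for HGVS notation when running offline;
  # --check_existing reports identifiers of known co-located variants.
}
```

A call might look like `annotate_cohort cohort.vcf.gz cohort.vep.vcf.gz /data/ref/hg38.fa`.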

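## Expected Findings

The presentation points to a pathogenic variant in the HBB gene, most likely the sickle allele (Glu6Val in legacy numbering, p.Glu7Val in HGVS notation; rs334), which would be consistent with sickle cell disease.

Compare the genotypes of Patient X and Patient Y at this site to check the inheritance pattern: for an autosomal recessive disorder, the affected patient is expected to be homozygous for the variant and the mother at least a heterozygous carrier.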
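
The patient-versus-mother comparison can be sketched as a lookup of each sample's genotype at a single site. The helper below is hypothetical (`genotypes_at_site` is an illustrative name) and uses only `gzip` and `awk`. It assumes a joint-genotyped, bgzip- or gzip-compressed VCF with `GT` as the first FORMAT field, and the rs334 coordinate given in the comment is an assumption to verify against dbSNP for the reference build in use.

```bash
# Hypothetical helper: print sample, genotype, and zygosity at one site.
# Assumption: rs334 is at chr11:5227002 on GRCh38 with UCSC-style contig
# names ("chr11"); confirm the coordinate before relying on it.
genotypes_at_site() {
  local vcf="$1" chrom="$2" pos="$3"
  gzip -dc "$vcf" | awk -F'\t' -v c="$chrom" -v p="$pos" '
    /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; next }
    /^#/      { next }
    $1 == c && $2 == p {
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")            # GT is the first FORMAT field
        gt = f[1]
        gsub(/\|/, "/", gt)          # treat phased and unphased alike
        if (gt == "1/1") call = "homozygous alternate"
        else if (gt == "0/1" || gt == "1/0") call = "heterozygous"
        else if (gt == "0/0") call = "homozygous reference"
        else call = "other/no-call"
        print name[i] "\t" gt "\t" call
      }
    }'
}

# Example call (not run here):
#   genotypes_at_site cohort.vcf.gz chr11 5227002
```

A homozygous alternate call in the patient together with a heterozygous call in the mother would fit recessive inheritance. A heterozygous patient would instead prompt a search for a second HBB variant.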

## Clinical Recommendations

**Confirmatory test:** Hemoglobin electrophoresis or a targeted PCR assay for the HBB variant.

### Management Strategies

- Hydroxyurea therapy (reduces the frequency of sickle cell crises).
- Regular monitoring and transfusion support.
- Genetic counseling for the patient and family.
- Consider curative approaches such as stem cell transplantation or emerging gene therapy.


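```bash
set -euo pipefail

# Define directories (the trailing slashes are relied on below)
DATA=/data/human_stage_1/
REF=/data/ref/

# Step 1: Quality control of the raw reads
fastqc "${DATA}PatientX.fastq.gz" "${DATA}PatientY.fastq.gz"

# Step 2: Read trimming
# fastp takes a single -i/-o pair per run for single-end data, so each
# sample is trimmed separately and gets its own report files.
for SAMPLE in PatientX PatientY
do
  fastp -i "${DATA}${SAMPLE}.fastq.gz" -o "${SAMPLE}.trimmed.fastq.gz" \
        -j "${SAMPLE}.fastp.json" -h "${SAMPLE}.fastp.html"
done

# Step 3: Alignment to GRCh38 and BAM processing
# HaplotypeCaller requires read groups, so each sample is tagged at
# alignment time (the PL value is an assumption about the platform).
# Assumes ${REF}hg38.fa has already been indexed with `bwa index`.
for SAMPLE in PatientX PatientY
do
  bwa mem -t 8 -R "@RG\tID:${SAMPLE}\tSM:${SAMPLE}\tPL:ILLUMINA" \
    "${REF}hg38.fa" "${SAMPLE}.trimmed.fastq.gz" \
    | samtools sort -o "${SAMPLE}.sorted.bam" -
  samtools index "${SAMPLE}.sorted.bam"
done

# Step 4: Variant calling (GVCF mode), one GVCF per sample
for SAMPLE in PatientX PatientY
do
  gatk HaplotypeCaller \
    -R "${REF}hg38.fa" \
    -I "${SAMPLE}.sorted.bam" \
    -O "${SAMPLE}.g.vcf.gz" \
    -ERC GVCF
done

# Step 5: Joint genotyping
# Merge the per-sample GVCFs, then genotype both samples together.
gatk CombineGVCFs \
  -R "${REF}hg38.fa" \
  -V PatientX.g.vcf.gz \
  -V PatientY.g.vcf.gz \
  -O cohort.g.vcf.gz

gatk GenotypeGVCFs \
  -R "${REF}hg38.fa" \
  -V cohort.g.vcf.gz \
  -O cohort.vcf.gz
```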
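
Once joint genotyping has finished, a quick sanity check is to confirm that both samples appear in the output and that it contains variant records. The sketch below is a hypothetical helper (`vcf_summary` is an illustrative name) that relies only on `gzip` and `awk`, which works because bgzip output is gzip-compatible.

```bash
# Hypothetical helper: list the sample columns and count the variant
# records in a gzip- or bgzip-compressed VCF.
vcf_summary() {
  local vcf="$1"
  gzip -dc "$vcf" | awk -F'\t' '
    /^#CHROM/ {
      for (i = 10; i <= NF; i++) samples = samples (i > 10 ? "," : "") $i
      next
    }
    /^#/ { next }
    { n++ }
    END {
      print "samples=" samples
      print "records=" (n + 0)
    }'
}

# Example call (not run here):
#   vcf_summary cohort.vcf.gz
```

For this cohort the `samples=` line should name both Patient X and Patient Y; a missing sample usually points to a problem in the per-sample GVCF step.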