
# 🧬 CRISPR Base Editing Practical with BEstimate

Welcome! This practical will walk you through:

✅ Preparing your environment (in Colab)  

✅ Preparing the input and running BEstimate

✅ Interpreting the results

✅ Library design  

---



## 1️⃣ Pre-Preparation (Run This Before the Practical)

> **Important:** Run these two sections to install everything you will need for the training.


In [None]:
import os

# Install BEstimate directly from GitHub
!git clone https://github.com/CansuDincer/BEstimate.git
os.chdir("/content/BEstimate/")


In [None]:
!pip3 install -r /content/BEstimate/requirements.txt

**Please restart your session so that the newly installed packages are loaded!**

**Important:** Please download the genome indices for off-target analysis from [Figshare](https://figshare.com/s/b23f86418ae71de0a759). After you install BEstimate, place the `offtargets` folder inside the BEstimate folder.

> Downloading and indexing the genome during the practical session is infeasible. However, you can run the command below (`x_genome.py`) on your own in a Linux environment.

In [None]:
#!python3 x_genome.py --pamseq NGG --assembly GRCh38 --ensembl_version 113

In [1]:
import os, pandas

In [2]:
# Make an output folder inside the content directory (no error if it already exists)
os.makedirs("/content/output/", exist_ok=True)

# Change the path to inside BEstimate folder
os.chdir("/content/BEstimate/BEstimate/")

## 2️⃣ Designing gRNAs for Base Editors

🧬 To find the most appropriate gRNA for our experiments, we should decide:

1. Length of the protospacer and PAM sequences
  - The protospacer is typically 20 nucleotides long.
  - PAM sequences are more variable; the most frequently used ones are NGG and NGN.
2. The sequence interval of the activity window
  - The activity window typically spans positions 4-8 or 3-9 of the protospacer.
3. The editable nucleotides
  - CBE or ABE
  - For a novel base editor, you can specify any nucleotide change
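
To make these parameters concrete, here is a minimal sketch of how a protospacer, a PAM and an activity window fit together. It is an illustration under stated assumptions, not BEstimate's implementation: `find_guides` and the toy sequence are hypothetical, only the forward strand is scanned, and only the PAM letters A, C, G, T and N are supported.

```python
import re

# PAM letters supported in this sketch (assumption: N is the only degenerate code)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def find_guides(seq, pam="NGG", protolen=20, actwin=(4, 8), edit="A"):
    """Return forward-strand guides with an editable base in the activity window."""
    pam_re = re.compile("".join(IUPAC[base] for base in pam))
    first, last = actwin
    guides = []
    for start in range(len(seq) - protolen - len(pam) + 1):
        proto = seq[start:start + protolen]
        pam_site = seq[start + protolen:start + protolen + len(pam)]
        if not pam_re.fullmatch(pam_site):
            continue
        # Activity window positions are 1-based and inclusive on the protospacer
        positions = [pos for pos in range(first, last + 1) if proto[pos - 1] == edit]
        if positions:
            guides.append({"protospacer": proto, "pam": pam_site, "edit_positions": positions})
    return guides


# Toy sequence (hypothetical): a 20-nt protospacer with a single A at position 4, then a TGG PAM
toy = "CCCA" + "C" * 16 + "TGG"
abe_guides = find_guides(toy, edit="A")
cbe_guides = find_guides(toy, edit="C")
print(abe_guides[0]["edit_positions"], cbe_guides[0]["edit_positions"])
```

In the toy sequence the only adenine sits at protospacer position 4, so it falls inside the 4-8 window for an ABE, while the same guide exposes the cytosines at positions 5-8 to a CBE.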

Once the base editor parameters are set, decide which gene you are interested in:

1. HUGO symbol of the gene
2. (Optionally) Ensembl Transcript ID
3. (Optionally) UniProt ID
4. Any variants you want to incorporate (in HGVS format)



## 3️⃣ Running BEstimate on Example Genes

Let's design base editor guides for *SRY* as practice.


### **Mutagenesis on *SRY* gene**

In [None]:
# Run BEstimate with SRY gene for ABE
!python3 BEstimate.py -gene SRY -assembly GRCh38 -pamseq NGG -pamwin 21-23 -actwin 4-8 -protolen 20 -edit A -edit_to G -vep -o /content/output/ -ofile SRY_ABE_NGG


In [None]:
# Run BEstimate with SRY gene for CBE
!python3 BEstimate.py -gene SRY -assembly GRCh38 -pamseq NGG -pamwin 21-23 -actwin 4-8 -protolen 20 -edit C -edit_to T -vep -o /content/output/ -ofile SRY_CBE_NGG

# If you want off targets
#!python3 BEstimate.py -gene SRY -assembly GRCh38 -pamseq NGG -pamwin 21-23 -actwin 4-8 -protolen 20 -edit C -edit_to T -vep -o /content/output/ -ofile SRY_CBE_NGG -ot -ot_path /content/BEstimate/offtargets/
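
Because the ABE and CBE runs differ only in the `-edit`, `-edit_to` and `-ofile` values, the command line can also be assembled programmatically. The sketch below only builds and prints the commands, using the flags shown in the cells above; `bestimate_command` and the `EDITORS` mapping are hypothetical helpers and are not part of BEstimate.

```python
import shlex

# Nucleotide change per editor type, as used in the cells above
EDITORS = {"ABE": ("A", "G"), "CBE": ("C", "T")}


def bestimate_command(gene, editor, pamseq="NGG", actwin="4-8", out_dir="/content/output/"):
    """Build the BEstimate argument list for one gene and one editor (hypothetical helper)."""
    edit, edit_to = EDITORS[editor]
    return [
        "python3", "BEstimate.py",
        "-gene", gene, "-assembly", "GRCh38",
        "-pamseq", pamseq, "-pamwin", "21-23",
        "-actwin", actwin, "-protolen", "20",
        "-edit", edit, "-edit_to", edit_to,
        "-vep", "-o", out_dir,
        "-ofile", f"{gene}_{editor}_{pamseq}",
    ]


# One shell-ready command string per editor
commands = {editor: shlex.join(bestimate_command("SRY", editor)) for editor in EDITORS}
for editor, command in commands.items():
    print(f"{editor}: {command}")
```

Each printed line can be pasted into a cell after `!`, or the list returned by `bestimate_command` can be passed to `subprocess.run` from inside the BEstimate folder.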

### **Reverting the sickle cell disease-associated variant**

Sickle cell disease is caused by a mutation in the β-globin gene (*HBB*): g.5227002A>T in GRCh38, resulting in p.Glu7Val.

In [40]:
# Generate a mutation file (the with-block closes the file automatically)
with open("/content/sickle_cell_variant.txt", "w") as f:
    f.write("11:g.5227002A>T")
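
A malformed line in the mutation file is easy to miss, so it can help to check the string before writing it. The sketch below parses a genomic substitution written like the example above (`chromosome:g.positionREF>ALT`). The regular expression and the `parse_genomic_substitution` helper are assumptions for illustration and cover only single-nucleotide substitutions; they do not describe what BEstimate itself accepts.

```python
import re

# Assumed pattern: chromosome, "g.", 1-based position, reference base, ">", alternative base
HGVS_SUBSTITUTION = re.compile(
    r"^(?P<chrom>[0-9]{1,2}|X|Y|MT):g\.(?P<pos>[0-9]+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_genomic_substitution(text):
    """Split a string such as '11:g.5227002A>T' into its parts, or raise ValueError."""
    match = HGVS_SUBSTITUTION.match(text.strip())
    if match is None:
        raise ValueError(f"Not a genomic substitution: {text!r}")
    return {
        "chrom": match["chrom"],
        "pos": int(match["pos"]),
        "ref": match["ref"],
        "alt": match["alt"],
    }


variant = parse_genomic_substitution("11:g.5227002A>T")
print(variant)
```

Here the reference base is A and the alternative base is T on the forward strand, which is why an A>G editor is used to target the complementary strand in the next cell.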

In [None]:
# Run BEstimate with example input
!python3 BEstimate.py -gene HBB -assembly GRCh38 -transcript ENST00000335295 -mutation_file /content/sickle_cell_variant.txt -pamseq NGN -pamwin 21-23 -actwin 3-9 -protolen 20 -edit A -edit_to G -o /content/output/ -ofile HBB_variant_specific_ABE_NGN



## 4️⃣ Exploring BEstimate Outputs and Interpreting Results

Your results are saved in the `/content/output/` folder.

To check what was generated, run:


In [None]:
# List results
!ls -lh /content/output/


**What to look for:**

- Summary `.csv` tables listing guides

- Editable nucleotides with annotations of predicted edits


You can download these files or open them directly in Colab for inspection.


### SRY mutagenesis results

**Let's start with the *edit table*, which lists gRNAs with their editable nucleotides and sequence information.**

In [None]:
edit_df = pandas.read_csv("/content/output/SRY_ABE_NGG_edit_df.csv")
edit_df[:5]

In [None]:
# Check which columns the edit table provides
edit_df.columns

In [None]:
# The number of editable gRNAs
len(edit_df.CRISPR_PAM_Sequence.unique())

In [None]:
# Number of gRNAs within the coding sequence
len(edit_df[edit_df.guide_in_CDS].CRISPR_PAM_Sequence.unique())

In [None]:
# Number of gRNAs with editable nucleotide within the coding sequence
len(edit_df[edit_df.Edit_in_CDS].CRISPR_PAM_Sequence.unique())

In [None]:
# Number of gRNAs with an editable nucleotide within the coding sequence, excluding gRNAs flagged as poly-T
len(edit_df[(edit_df.Edit_in_CDS) & (~edit_df.Poly_T)].CRISPR_PAM_Sequence.unique())

**Let's continue with the *protein table*, which includes VEP, UniProt and Interactome INSIDER annotations.**

In [None]:
protein_df = pandas.read_csv("/content/output/SRY_ABE_NGG_protein_df.csv")
protein_df[:5]

In [None]:
# Check which columns the protein table provides
protein_df.columns

**Note: a gRNA can have several editable nucleotides in its activity window, so a single gRNA may introduce multiple edits.**

In [None]:
# The most severe consequences of the gRNAs targeting the SRY gene
protein_df.most_severe_consequence.unique()

In [None]:
# Protein positions of the potential edits
protein_df.Protein_Position.unique()

In [None]:
# Targeted functional domains
protein_df.curated_Domain.unique()

In [None]:
# gRNAs whose edits are flagged as clinically relevant
protein_df[~pandas.isna(protein_df.is_clinical) & (protein_df.is_clinical)][[
    'Hugo_Symbol',  'gRNA_Target_Sequence', 'Edit_Location', 'most_severe_consequence', 'Protein_Position', 'Edited_AA','New_AA', 'clinical_id']]

In [None]:
# gRNAs whose edits have no clinical annotation (is_clinical is missing)
protein_df[pandas.isna(protein_df.is_clinical)][[
    'Hugo_Symbol', 'Direction', 'gRNA_Target_Sequence', 'Edit_Location', 'most_severe_consequence', 'Protein_Position', 'Edited_AA','New_AA', 'clinical_id']]

In [None]:
# gRNAs whose most severe predicted consequence is a missense variant
protein_df[protein_df.most_severe_consequence == "missense_variant"][[
    'Hugo_Symbol', 'gRNA_Target_Sequence', 'most_severe_consequence', 'Protein_Position','Protein_Change','curated_Domain']]

In [None]:
# Load the off-target table (only available when BEstimate was run with -ot)
grna_df = pandas.read_csv("/content/output/SRY_ABE_NGG_ot_annotated_df.csv")
grna_df[:5]

In [None]:
# Find gRNAs without any off targets
grna_df[(grna_df.exact == 1) & (grna_df.mm1 == 0) & (grna_df.mm2 == 0) & (grna_df.mm3 == 0)]

### Sickle cell reversion results

In [None]:
hbb_mut_df = pandas.read_csv("/content/output/HBB_variant_specific_ABE_NGN_edit_df.csv", index_col=0)
hbb_mut_df[:5]

In [None]:
# Find the gRNAs that edit the variant position
hbb_mut_df[hbb_mut_df.guide_change_mutation]

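**Important:** The wild-type codon is GAG (Glu) and the mutant codon is GTG (Val). *HBB* lies on the reverse (-1) strand, so on the forward (+1) strand these codons appear as their reverse complements.

The mutation is at position 5227002, and the mutant sequence at 5227001-5227003 is CAC on the +1 strand. ABE converts it to CGC, which reads GCG on the coding strand. GCG encodes Ala, which gives the naturally occurring, non-sickling hemoglobin variant "Makassar" (HbG).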
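
The strand reasoning above can be checked with a few lines of Python. This is a small sketch: the codon table holds only the three codons discussed here, and `reverse_complement` is a helper written for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Only the three codons discussed above (not a full genetic code table)
CODON_TO_AA = {"GAG": "Glu", "GTG": "Val", "GCG": "Ala"}


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


plus_mutant = "CAC"  # +1 strand, positions 5227001-5227003
assert plus_mutant[1] == "A"  # position 5227002 is the adenine targeted by ABE
plus_edited = plus_mutant[0] + "G" + plus_mutant[2]  # ABE converts A to G

for label, plus_seq in [("mutant", plus_mutant), ("edited", plus_edited)]:
    codon = reverse_complement(plus_seq)  # codon as read on the HBB coding strand
    print(label, plus_seq, "->", codon, CODON_TO_AA[codon])
```

The loop reports GTG (Val) for the mutant sequence and GCG (Ala) for the edited one, matching the reasoning above.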

### **Exercises**

Using ABE and CBE together lets us edit more of the sequence. Can you find all the positions that can be edited with ABE and CBE combined? How many nucleotides can be edited exclusively with ABE, exclusively with CBE, and with both?

Because of codon redundancy, different nucleotide changes can produce the same change in the protein sequence. Can you compute the same counts for amino acid changes rather than nucleotide changes?

Are there multiple valid gRNAs? How would you prioritise them? Please illustrate with the *SRY* gene.

### Key points to review in your output tables



- **Base Change**: Depending on your experiment, you may want to prioritise gRNAs targeting specific domains, post-translational modification sites, splice sites or clinically important locations.
  - gRNAs that target coding regions can produce functional consequences such as amino acid changes. You can eliminate gRNAs that generate only synonymous alterations.
  - gRNAs can also edit non-coding regions, so you may want to work with regulatory regions such as promoters or splice sites. (*Avoid gRNAs that disrupt known splice sites unless this is the intended effect.*)
  - gRNAs can replicate or revert known pathogenic SNPs, which is useful for building disease models or investigating corrections.
  - gRNAs targeting highly conserved sequences tend to have more severe functional consequences. You can check the functional consequences and select the gRNAs of interest.

- **Off-targets**: It is good practice to choose gRNAs with minimal off-target effects.

Note: On-Target Efficiency: You may want to select gRNAs with high on-target efficiency, which you can obtain through BE-Hive. (*It is not provided by BEstimate.*)
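
The points above can be combined into a simple ranking. The sketch below is one possible scheme, not a recommendation from BEstimate: the records are toy values, the field names follow the columns used earlier (`most_severe_consequence`, `exact`, `mm1`, `mm2`, `mm3`), and it assumes that `mm1` to `mm3` count genomic matches with one to three mismatches.

```python
# Toy records (hypothetical values); field names follow the BEstimate columns used above
guides = [
    {"gRNA": "g1", "most_severe_consequence": "synonymous_variant", "exact": 1, "mm1": 0, "mm2": 0, "mm3": 0},
    {"gRNA": "g2", "most_severe_consequence": "missense_variant", "exact": 1, "mm1": 0, "mm2": 1, "mm3": 4},
    {"gRNA": "g3", "most_severe_consequence": "missense_variant", "exact": 1, "mm1": 0, "mm2": 0, "mm3": 2},
    {"gRNA": "g4", "most_severe_consequence": "stop_gained", "exact": 2, "mm1": 0, "mm2": 0, "mm3": 0},
]


def prioritise(records):
    """Drop synonymous and multi-mapping guides, then rank by off-target counts."""
    kept = [
        r for r in records
        if r["most_severe_consequence"] != "synonymous_variant" and r["exact"] == 1
    ]
    # Near matches with fewer mismatches are penalised first: 1, then 2, then 3 mismatches
    return sorted(kept, key=lambda r: (r["mm1"], r["mm2"], r["mm3"]))


ranked = prioritise(guides)
print([r["gRNA"] for r in ranked])
```

With real output you would apply the same filters to the BEstimate tables, and then add an on-target efficiency score from an external tool as a further sorting key.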


## 5️⃣ Controls in library design


When generating a gRNA library for base editing, incorporating proper controls is essential for reliable and interpretable results. Controls help validate the functional outcomes of your gRNAs.

1. Positive controls confirm that your base editing system is working efficiently and that the experimental conditions are suitable.

  - gRNAs targeting genes essential for cell viability (such as housekeeping genes), where editing should have measurable phenotypic effects like cell death or reduced growth.

2. Negative controls are critical for assessing background levels of editing and off-target effects. They ensure that observed changes are due to base editing rather than random or non-specific effects.

  - Non-targeting gRNAs help establish the baseline for off-target activity and general effects of transfection or editing. These controls are typically random sequences with no homology to the genome but are designed to resemble real gRNAs in structure.

  - gRNAs targeting non-essential genes, where base editing is expected to have no significant phenotypic effect.
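
As a starting point for non-targeting controls, random candidate sequences can be generated as sketched below. This covers only the first step: the helper name, the fixed seed and the choice to reject runs of four or more T are assumptions made for this example, and every candidate still has to be screened against the genome (for instance with the off-target indices) before it can be called non-targeting.

```python
import random


def random_nontargeting_candidates(n, length=20, seed=0):
    """Generate n unique random sequences without TTTT runs (candidates only)."""
    rng = random.Random(seed)  # fixed seed so the library is reproducible
    candidates = set()
    while len(candidates) < n:
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        # Assumption: a poly-T stretch is defined here as four consecutive T
        if "TTTT" in seq:
            continue
        candidates.add(seq)
    return sorted(candidates)


controls = random_nontargeting_candidates(5)
for seq in controls:
    print(seq)
```

Using the same protospacer length as the targeting gRNAs keeps the controls structurally comparable to the rest of the library.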




## 🛠️ Troubleshooting Tips

❗ **No module named BEstimate** → Rerun the installation cells at the top and do not forget to restart the session! After restarting, do not run the installation cells again.


❗ **Permission errors** → Make sure you are running in a writable Colab notebook.  


## 🎉 Wrap-up

With this practical course, you have now:

✅ Set up your environment

✅ Designed base editor gRNAs with BEstimate

✅ Learned how to interpret your results

✅ Learned what to consider when selecting your gRNAs and designing your library



**Next steps:** You can try using your own genes or variants as input!



Questions? Ask during the live session or contact me at cd7@sanger.ac.uk
