### Example pool design

For CRISPR lentiviral libraries, we recommend using the CROPseq vector, which allows direct _in situ_ sequencing of the CRISPR sgRNA. The code in `ops.pool_design` can select sgRNAs for _in situ_ sequencing, combine multiple gene sets within one oligo array, and export the final oligos to order.

Inputs needed to design an oligo array with `ops.pool_design`:

- **sgRNA table**: A list of gene IDs and corresponding sgRNAs. 
    - The sgRNAs for each gene can be ranked, so that higher-ranked sgRNAs are selected first. Library designs usually rank sgRNAs by taking into account on-target efficiency, potential off-target sites, and targeting position within a gene.
    - There are many publicly available library designs and design tools for CRISPR screens. In this example, the [Brunello] and [TKOv3] CRISPR KO libraries are used.
- **gene list**: A text file with one gene ID per row. There should be one gene list for each `design` (for example, a pool of all kinases).
- **pool design**: A spreadsheet with one row for each gene set in a pool. 
    - Multiple `subpools`, each with a different gene `design`, can be synthesized in one pool. Subpools with different `dialout` adapters can be specifically amplified by PCR. 
    - Within a `group`, sgRNAs have unique 5' prefixes so they can be pooled together and read out by 5'-to-3' sequencing-by-synthesis. Prefixes are selected based on `prefix_length` and `edit_distance`. A longer `prefix_length` allows more sgRNAs to be included but requires more cycles of _in situ_ sequencing. Raising the minimum `edit_distance` between prefixes from 1 to 2 or 3 means that 1 or 2 single-base errors (insertions, deletions, or substitutions arising from synthesis or sequencing) cannot turn one valid prefix into another, so such errors can be detected.
    - The library size is set by the number of genes (`num_genes`) and targeting sgRNAs per gene (`sgRNAs_per_gene`). Oligos can be duplicated to balance subpool size or reduce abundance bias due to synthesis (`duplicate_oligos`).
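
To make the `prefix_length` and `edit_distance` trade-off concrete, the sketch below computes the smallest pairwise edit distance between sgRNA prefixes of a given length. It is a minimal illustration, not the selection code in `ops.pool_design`: the helper names `edit_distance` and `min_prefix_distance` and the example sequences are hypothetical, and it assumes that edit distance means Levenshtein distance (insertions, deletions, and substitutions).

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def min_prefix_distance(sgRNAs, prefix_length):
    """Smallest pairwise edit distance between the 5' prefixes."""
    prefixes = [s[:prefix_length] for s in sgRNAs]
    return min(edit_distance(a, b) for a, b in combinations(prefixes, 2))


# hypothetical 20-nt sgRNA sequences, for illustration only
example_sgRNAs = [
    'ACGTACGTACGTACGTACGT',
    'ACGAACGTACGTACGTACGA',
    'TTGCACGTACGTACGTACGT',
]

for prefix_length in (3, 4):
    distance = min_prefix_distance(example_sgRNAs, prefix_length)
    print(prefix_length, distance)
```

With a prefix length of 3, two of the example sequences share the prefix `ACG`, so the minimum distance is 0 and they could not be told apart. With a prefix length of 4 all prefixes are distinct, but the minimum distance is only 1, so a single-base error could still convert one prefix into another.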

[Brunello]: https://www.addgene.org/pooled-library/broadgpp-human-knockout-brunello/
[TKOv3]: https://www.addgene.org/pooled-library/moffat-crispr-knockout-tkov3/

In [1]:
from ops.imports_ipython import *
import ops.pool_design

# runs example from example_pool/ sub-directory of project
home = os.path.dirname(os.path.dirname(ops.__file__))
os.chdir(os.path.join(home, 'example_pool'))

### Prepare sgRNA table

In [None]:
# Download CRISPR KO library designs
# https://www.addgene.org/pooled-library/broadgpp-human-knockout-brunello/
brunello_url = ('https://www.addgene.org/static/cms/filer_public/'
                '8b/4c/8b4c89d9-eac1-44b2-bb2f-8fea95672705/'
                'broadgpp-brunello-library-contents.txt')

# https://www.addgene.org/pooled-library/moffat-crispr-knockout-tkov3/
tkov3_url = ('https://media.addgene.org/cms/filer_public/'
             '71/a8/71a81179-7a62-4d75-9b53-236e6f6b7d4d/'
             'tkov3_guide_sequence.xlsx')

!curl -LO {brunello_url}
!curl -LO {tkov3_url}

In [3]:
# load brunello library
f = 'broadgpp-brunello-library-contents.txt'
df_brunello = (ops.pool_design.import_brunello(f)
 .assign(source='1_brunello'))

# load TKOv3 library
f = 'NCBI_ids.tsv'
df_ncbi = ops.pool_design.import_hugo_ncbi(f)

f = 'tkov3_guide_sequence.xlsx'
df_tkov3 = (ops.pool_design.import_tkov3(f, df_ncbi)
 .assign(source='2_tkov3'))

# combine libraries
(pd.concat([df_brunello, df_tkov3], sort=True)
 .sort_values(['gene_id', 'source', 'rank'])
 .drop_duplicates('sgRNA')
 .assign(rank=lambda x: ops.utils.rank_by_order(x, 'gene_id'))
 .to_csv('sgRNAs.csv', index=None)
)

df_sgRNAs = (pd.read_csv('sgRNAs.csv')
 .pipe(ops.pool_design.filter_sgRNAs)
)

(df_sgRNAs
 # remove non-targeting
 .query('gene_id != -1')
 ['gene_id'].value_counts().value_counts()
 .rename('sgRNA_counts per gene ID'))

8    8935
7    6023
4    1896
6    1650
5     392
3     307
2      28
1      17
Name: sgRNA_counts per gene ID, dtype: int64
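
The combine step above (concatenate, sort by source priority, drop shared sgRNAs, re-rank within each gene) can be tried on toy data. This is a sketch in plain pandas under stated assumptions: the sequences, gene IDs, and source labels are made up, and `groupby(...).cumcount() + 1` stands in for `ops.utils.rank_by_order`, whose actual implementation may differ.

```python
import pandas as pd

# toy sgRNA tables; sequences and gene IDs are made up for illustration
df_a = pd.DataFrame({
    'gene_id': [1, 1, 2],
    'sgRNA': ['AAAA', 'CCCC', 'GGGG'],
    'rank': [1, 2, 1],
    'source': '1_a',
})
df_b = pd.DataFrame({
    'gene_id': [1, 1, 2],
    'sgRNA': ['CCCC', 'TTTT', 'ACGT'],
    'rank': [1, 2, 1],
    'source': '2_b',
})

df_combined = (pd.concat([df_a, df_b], sort=True, ignore_index=True)
 # the source prefix ('1_', '2_') sets library priority within a gene
 .sort_values(['gene_id', 'source', 'rank'])
 # keep the first (highest-priority) copy of an sgRNA found in both tables
 .drop_duplicates('sgRNA')
 # re-rank from 1 within each gene, following the sorted order
 # (assumed stand-in for ops.utils.rank_by_order)
 .assign(rank=lambda x: x.groupby('gene_id').cumcount() + 1)
 .reset_index(drop=True)
)

print(df_combined[['gene_id', 'source', 'rank', 'sgRNA']])
```

Here `CCCC` appears in both toy tables; only the copy from the higher-priority source `1_a` survives, and the remaining sgRNAs for gene 1 are ranked 1 to 3.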

### Load pool design and gene lists

In [4]:
df_design = (pd.read_excel('design.xls', skiprows=1)
 .pipe(ops.pool_design.validate_design))

# gene lists are read from '<design>.txt' for each design, e.g. X.txt, Y.txt, Z.txt
df_genes = (pd.concat([ops.pool_design.load_gene_list(d + '.txt')
           for d in set(df_design['design'])])
 # optionally: convert gene symbols to gene ids
 .join(df_design.set_index('design'), on='design')
 .reset_index(drop=True)
 .pipe(ops.pool_design.validate_genes, df_sgRNAs)
)
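
The join in the cell above attaches each design's settings to every gene in that design. A minimal sketch with toy data follows; the design table, gene IDs, and the in-memory `toy_gene_lists` dictionary are hypothetical stand-ins for `design.xls` and the gene list text files.

```python
import pandas as pd

# hypothetical design table: one row per gene set in a subpool
toy_design = pd.DataFrame({
    'subpool': ['pool0_0', 'pool0_1'],
    'design': ['X', 'Y'],
    'sgRNAs_per_gene': [4, 2],
})

# hypothetical gene lists, as if read from X.txt and Y.txt
toy_gene_lists = {'X': [101, 102], 'Y': [102, 203, 204]}

toy_genes = (pd.concat(
    [pd.DataFrame({'design': name, 'gene_id': ids})
     for name, ids in toy_gene_lists.items()], ignore_index=True)
 # copy each design's settings onto every gene in that design
 .join(toy_design.set_index('design'), on='design')
)

print(toy_genes)
```

Each gene row picks up the `subpool` and `sgRNAs_per_gene` values of its design. If a design name occurs in more than one row of the design table, the join yields one row per gene for each of those rows.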

### Select sgRNAs

In [5]:
f = 'kosuri_dialout_primers.csv'
dialout_primers = ops.pool_design.import_dialout_primers(f)

cols = ['subpool', 'dialout', 'design', 'vector', 'group', 
        'prefix_length', 'edit_distance', 'gene_id', 'source', 
        'rank', 'duplicate_oligos', 'sgRNA']

df_oligos = (df_genes
 # select sgRNAs separately for each prefix group
 .groupby('group')
 .apply(ops.pool_design.select_prefix_group, df_sgRNAs)
 .reset_index(drop=True)
 [cols]
 # build the full oligo sequence
 .assign(oligo=lambda x: 
         ops.pool_design.build_sgRNA_oligos(x, dialout_primers))
 # add duplicate oligos where requested
 .reset_index(drop=True)
 .pipe(lambda x: 
      x.loc[np.repeat(x.index.values, x['duplicate_oligos'])])
)
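
The last step of the cell above duplicates rows by repeating index labels with `np.repeat`. The toy example below isolates that idiom; the oligo names are placeholders rather than real sequences.

```python
import numpy as np
import pandas as pd

toy_oligos = pd.DataFrame({
    'oligo': ['oligo_A', 'oligo_B', 'oligo_C'],
    'duplicate_oligos': [1, 3, 2],
})

# repeat each index label as many times as requested, then select by label
repeated_index = np.repeat(toy_oligos.index.values,
                           toy_oligos['duplicate_oligos'])
toy_repeated = toy_oligos.loc[repeated_index]

print(toy_repeated['oligo'].tolist())
```

Because rows are selected by repeated label, the result keeps duplicated index values; this relies on the index being unique beforehand, which is why the pipeline calls `reset_index(drop=True)` before this step.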

### Check for genes with fewer sgRNAs than requested

In [6]:
designed = (df_oligos
 .drop_duplicates('oligo')
 .groupby(['subpool', 'design', 'gene_id']).size()
)
requested = (df_genes
 .set_index(['subpool', 'design', 'gene_id'])
 ['sgRNAs_per_gene']
)

(requested.sub(designed, fill_value=0)
 .rename('missing_sgRNAs')
 .reset_index()
 .groupby(['subpool', 'design'])
 ['missing_sgRNAs'].value_counts().rename('gene_ids')
 .reset_index()
)

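| | subpool | design | missing_sgRNAs | gene_ids |
|---|---|---|---|---|
| 0 | pool0_0 | X | 0 | 497 |
| 1 | pool0_0 | X | 1 | 3 |
| 2 | pool0_1 | Y | 0 | 497 |
| 3 | pool0_1 | Y | 1 | 3 |
| 4 | pool0_2 | nontargeting | 0 | 1 |
| 5 | pool0_3 | Z | 0 | 472 |
| 6 | pool0_3 | Z | 1 | 26 |
| 7 | pool0_3 | Z | 2 | 2 |
| 8 | pool0_3 | nontargeting | 0 | 1 |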
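
The check above depends on `Series.sub` with `fill_value=0`, so that a gene with no designed sgRNAs counts as fully missing instead of producing `NaN`. A small sketch with made-up gene IDs and counts:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('pool0_0', 'X', 101), ('pool0_0', 'X', 102), ('pool0_0', 'X', 103)],
    names=['subpool', 'design', 'gene_id'])

# every gene requests 4 sgRNAs
toy_requested = pd.Series([4, 4, 4], index=index, name='sgRNAs_per_gene')

# gene 102 received 3 sgRNAs; gene 103 received none and is absent
toy_designed = pd.Series([4, 3], index=index[:2])

# absent entries are treated as 0 before subtracting
toy_missing = (toy_requested
 .sub(toy_designed, fill_value=0)
 .rename('missing_sgRNAs'))

print(toy_missing)
```

Without `fill_value=0`, the entry for gene 103 would be `NaN` and would not be counted as missing.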


### Validate and export oligo pool

In [7]:
df_oligos.to_csv('pool0_design.csv', index=None)

(df_oligos['oligo']
 # optional: randomize oligo order for synthesis
 # .sample(frac=1)
 .to_csv('pool0_oligos.txt', index=None)
)

df_test = (pd.read_csv('pool0_design.csv')
 .pipe(ops.pool_design.build_test, dialout_primers)
 .pipe(ops.pool_design.validate_test))

Looking good!
