# Zhang Library sgRNAs

The second version of the Zhang lab library (GeCKOv2) includes sgRNAs targeting 1,864 human miRNAs. sgRNAs that are included in GeCKOv2 are annotated with ZhangLibrary 'T'. The GeCKOv2 sgRNAs were downloaded from <a href="http://genome-engineering.org/gecko/?page_id=15">here</a>.

In [1]:
import data_processing as dp

def zhang_library(library_file, db_name, sql_version="MySQL", firewall=False):
    """
    Annotate miRNA-targeting sgRNAs that are in the Zhang lab's GeCKOv2 library.
    """
    sg_dict = {"SgRNA": []}
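    # 'rU' (universal newlines) is needed because the file uses '\r' line endings.
    # On Python 3, plain "r" already handles this, and the 'U' flag was removed in 3.11.
    with open(library_file, "rU") as fin:
        for line in fin:
            # miRNA rows start with the species prefix "hsa"
            if line[:3] == "hsa":
                ele = line.strip("\n").split(",")
                # The third column holds the sgRNA sequence
                sg_dict["SgRNA"].append(ele[2])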
    num_sgs = len(sg_dict["SgRNA"])
    lib_dict = {"ZhangLibrary": ["T"]*num_sgs}
    
    db_con = dp.DatabaseConnection(sql_version, db_name=db_name, firewall=firewall)
    db_con.update_many_rows(lib_dict, sg_dict, "SingleGuideRNA")
    db_con.close_cursor()
    db_con.close_connection()
    return sg_dict
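
The parsing step in `zhang_library` can be tried without a database connection. The following is a minimal sketch, assuming the file is comma-separated with the miRNA name in the first column and the sgRNA sequence in the third (as the function above implies). The helper name `read_mirna_sgrnas`, the header names, and the sample rows are made up for illustration and are not taken from the GeCKOv2 file.

```python
import csv


def read_mirna_sgrnas(lines):
    """Collect sgRNA sequences from rows whose first column starts with 'hsa'."""
    sgrnas = []
    for row in csv.reader(lines):
        # Skip the header, short rows, and rows that do not target a miRNA
        if len(row) >= 3 and row[0].startswith("hsa"):
            sgrnas.append(row[2])
    return sgrnas


# Hypothetical sample; the header, the identifiers in the second column,
# and the GENE_X row are invented. Lines end in '\r' to mimic the
# line endings described for the library file.
sample = (
    "gene_id,UID,seq\r"
    "GENE_X,EXAMPLE_000001,GTCGCTGAGCTCCGATTCGA\r"
    "hsa-mir-4538,EXAMPLE_000002,CGGCTCAGCCCAGATCAGCC\r"
    "hsa-mir-4539,EXAMPLE_000003,GAGCTGAGCTGGGCTGAGCT\r"
)

# str.splitlines() splits on '\r', '\n' and '\r\n' alike
mirna_sgrnas = read_mirna_sgrnas(sample.splitlines())
```

Using the `csv` module rather than `split(",")` also copes with quoted fields, should the file contain any.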

In [2]:
# Raw string so the backslash in the Windows path is not treated as an escape character
zhang_dict = zhang_library(r"Published Libraries\Human_GeCKOv2_Library_A_09Mar2015.csv", "miR-test", firewall=True)

We can then check that all of the Zhang sgRNAs are in our sgRNA database.

In [4]:
db_con = dp.DatabaseConnection("MySQL", db_name="miR-test", firewall=True)
rows = db_con.fetch_query("SELECT SgRNA FROM SingleGuideRNA WHERE ZhangLibrary LIKE 'T';")
db_con.close_cursor()
db_con.close_connection()

# Each row is a one-element tuple; a set gives fast membership tests
my_sgs = set(sg for (sg,) in rows)
# Zhang sgRNAs missing from our database (the set also removes duplicates)
not_in = set(zhang_dict["SgRNA"]) - my_sgs
list(not_in)

['CGGCTCAGCCCAGATCAGCC',
 'AAAATTATTGTAGTGTGTGT',
 'AATGACCCGGCCTTGGGGTG',
 'ATACGGAATATATATATATA',
 'ATATACGGAATATATATATA',
 'GAGCTGAGCTGGGCTGAGCT',
 'GAGTTGAGCCAGGCTGATCT',
 'CAATATTTTAAGGAATGACC',
 'GGGCTGGGCTGAGTTGAGCC',
 'TCCGACTCATCAATATTTTA',
 'GGTCGCGGGCCCATTAGCTG',
 'AAAATTATTGTAGTGTTTGT',
 'TGGAGGGGTTGTCAGAGCTG',
 'GAAAATTATTGTAGTGTGTG',
 'GCAGCTCAGTACAGGATACT',
 'GTATACGGAATATATATATA',
 'ATATATACGGAATATGTATA',
 'ATATATATGGAATGTATATA']

The sgRNAs that are missing from our database are no longer valid, either because the target sequence changed between hg19 and hg38 or because the miRNA was removed from miRBase.

* 'CGGCTCAGCCCAGATCAGCC', 'GAGTTGAGCCAGGCTGATCT' and 'GGGCTGGGCTGAGTTGAGCC-AGG' are supposed to target mir-4538 (chr14), but the target sequences have changed to 'CGGCTCAGCCCAGATCAG<b>T</b>C', 'GAGTTGAGCCAG<b>A</b>CTGATCT' and 'GGGCTGGGCTGAGTTGAGCC-AG<b>A</b>'
* 'AAAATTATTGTAGTGTGTGT' is supposed to target mir-3118-1 (chr21), but the target sequence has changed to 'AAAATT<b>G</b>TT<b>C</b>TAGTGTGTGT'
* 'AATGACCCGGCCTTGGGGTG', 'CAATATTTTAAGGAATGACC', 'TCCGACTCATCAATATTTTA' and 'GGTCGCGGGCCCATTAGCTG' are supposed to target mir-4285 (chr7), but the target sequences have changed to 'AATGACCCGGCC<b>C</b>TGGGGTG', '<b>A</b>AATATTTTAAGGAATGACC', 'TCCGACTCAT<b>A</b>AATATTTTA' and 'GGTCGCGGGCCCATTAG---'
* 'GAGCTGAGCTGGGCTGAGCT' is supposed to target mir-4539 (chr14), but the target sequence has changed to 'GAGCTGA<b>A</b>CTGGGCTGAGCT'
* 'AAAATTATTGTAGTGTTTGT' is supposed to target mir-3118-2, but the target sequence has changed to 'AAAATTATTGTAGT<b>A</b>TGTGT'
* 'TGGAGGGGTTGTCAGAGCTG-CGG' is supposed to target mir-6730, but the target sequence has changed to 'TGGAGGGGTTGTCAGAGCTG-C<b>A</b>G'
* 'GAAAATTATTGTAGTGTGTG' is supposed to target mir-3118-3 (chr15), but the target sequence has changed to 'GAAAATTATTGTAGT<b>A</b>TGTG'
* 'GCAGCTCAGTACAGGATACT' is supposed to target mir-486, but the target sequence has changed to 'GCAGCTCAGTACAGGATA<b><u>C</u></b>CT'
* 'ATATACGGAATATATATATA', 'ATATATACGGAATATGTATA' and 'GTATACGGAATATATATATA' target mir-3669, which has been removed from miRBase
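
The substitutions listed above can be located programmatically. The following is a minimal sketch; the helper names `mismatch_positions` and `has_ngg_pam` are made up for illustration. It assumes that the three letters after a hyphen are the PAM and that a valid PAM has the form NGG, as for SpCas9. It only handles substitutions between sequences of equal length, so the deletion in mir-4285 and the insertion in mir-486 are out of scope.

```python
def mismatch_positions(old, new):
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(old) != len(new):
        raise ValueError("sequences must have the same length")
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


def has_ngg_pam(pam):
    """Return True if a 3-nt PAM matches NGG (assumed SpCas9 PAM)."""
    return len(pam) == 3 and pam[1:] == "GG"


# (hg19 sequence, hg38 sequence) pairs copied from the list above
pairs = [
    ("CGGCTCAGCCCAGATCAGCC", "CGGCTCAGCCCAGATCAGTC"),  # mir-4538
    ("AAAATTATTGTAGTGTGTGT", "AAAATTGTTCTAGTGTGTGT"),  # mir-3118-1
    ("GAGCTGAGCTGGGCTGAGCT", "GAGCTGAACTGGGCTGAGCT"),  # mir-4539
]
for old, new in pairs:
    print("{} -> {}: {}".format(old, new, mismatch_positions(old, new)))

# PAM changes listed above: AGG -> AGA (mir-4538) and CGG -> CAG (mir-6730)
for pam in ("AGG", "AGA", "CGG", "CAG"):
    print("{}: {}".format(pam, has_ngg_pam(pam)))
```

Under these assumptions, a protospacer that still matches can be unusable on its own account if the adjacent PAM no longer reads NGG.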