# Antisense lncRNA
An antisense lncRNA transcript is complementary to an mRNA gene or transcript and regulates it through binding. The antisense lncRNA gene usually lies near the gene it regulates, possibly on the opposite strand of the same locus. Antisense lncRNAs probably function within the nucleus.

Antisense genes are not annotated as such in our GenCode GFF file.
However, there is an official nomenclature for human antisense genes: GENE-AS#.
See [A guide to naming human non-coding RNA genes](https://www.embopress.org/doi/full/10.15252/embj.2019103777), The EMBO Journal, 2020.

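We generated the list of antisense lncRNA genes with the command below. It keeps the gene rows whose text matches the -AS# naming pattern and writes their Ensembl gene IDs without the version suffix. The name gff is a soft link to gencode.v43.chr_patch_hapl_scaff.annotation.gff3 from GenCode.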

    $ grep -v '^#' gff | awk '{if ($3=="gene") print $0;}' | grep "\-AS[0-9]" | cut -f 9 | cut -d ';' -f 1 | cut -c 4- | cut -d '.' -f 1 | sort > antisense.lncRNA.genes.txt
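
The same filter can be sketched in Python for readers who prefer to stay inside the notebook. This is a minimal sketch, not the code that produced the file above: the function name `extract_antisense_ids` is hypothetical, the demo rows and their IDs are made up, and it assumes tab-separated GFF3 rows whose ninth column starts with `ID=`. Like the shell pipeline, it matches the pattern anywhere in the row and does not remove duplicates.

```python
import re

# Same pattern as the grep step: a literal "-AS" followed by a digit.
AS_PATTERN = re.compile(r"-AS[0-9]")

def extract_antisense_ids(lines):
    """Return sorted, version-less gene IDs for gene rows matching -AS<digit>."""
    ids = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF comment/header lines
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only rows whose feature type is "gene"
        if not AS_PATTERN.search(line):
            continue  # like grep, the match may occur anywhere in the row
        first_attr = fields[8].split(";")[0]    # first attribute, e.g. "ID=<id>.<version>"
        gene_id = first_attr[3:].split(".")[0]  # drop "ID=" and the version suffix
        ids.append(gene_id)
    return sorted(ids)

# Made-up rows for illustration only; these are not real GenCode records.
demo = [
    "##gff-version 3",
    "chr1\tHAVANA\tgene\t100\t900\t.\t-\t.\tID=ENSG00000000001.5;gene_type=lncRNA;gene_name=ABC1-AS1",
    "chr1\tHAVANA\ttranscript\t100\t900\t.\t-\t.\tID=ENST00000000001.1;Parent=ENSG00000000001.5;gene_name=ABC1-AS1",
    "chr1\tHAVANA\tgene\t2000\t3000\t.\t+\t.\tID=ENSG00000000002.3;gene_type=protein_coding;gene_name=ABC1",
]
print(extract_antisense_ids(demo))  # prints ['ENSG00000000001']
```

To apply it to the annotation, pass an open file handle, for example `with open('gff') as fin: ids = extract_antisense_ids(fin)`.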

In [1]:
antisense_genes = set()
filename='/Users/jasonmiller/WVU/Localization/GenCode/GenCode43/antisense.lncRNA.genes.txt'
with open (filename, 'r') as fin:
    for line in fin:
        gene = line.strip()
        antisense_genes.add(gene)
print('Num antisense lncRNA genes:', len(antisense_genes))
print('Examples:', list(antisense_genes)[:3])

Num antisense lncRNA genes: 1926
Examples: ['ENSG00000224328', 'ENSG00000236859', 'ENSG00000281032']


In [2]:
all_cnrci = dict()
filename='/Users/jasonmiller/WVU/Localization/TrainTest/TrainTest_ver43/all.lncRNA_RCI.csv'
with open (filename, 'r') as fin:
    heading=None
    for line in fin:
        line = line.strip()
        if heading is None:
            heading = line
        else:
            fields = line.split(',')
            gene = fields.pop(0)
            ## fields[1] = 'nan' ## use this to exclude H1.hESC cell line, otherwise comment out this line
            cnrci = [float(x) for x in fields]
            all_cnrci[gene] = cnrci
print('Num genes with CNRCI', len(all_cnrci.keys()))
print('Examples:', [p for p in all_cnrci.items()][:3])

Num genes with CNRCI 6423
Examples: [('ENSG00000082929', [nan, nan, nan, nan, nan, nan, 1.23491, nan, nan, nan, nan, nan, nan, nan, nan]), ('ENSG00000099869', [nan, 1.0, nan, 0.00846158, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]), ('ENSG00000100181', [nan, -0.192645, nan, nan, nan, -1.01879, nan, nan, 0.0238308, -0.161441, nan, nan, nan, -0.404775, nan])]


In [3]:
import numpy as np
def get_stats(dictionary):
    """Print the overall min, max, and mean of per-gene means; return per-gene rows.

    Each row is [gene, count, min, max, range, mean], computed over the
    non-NaN CNRCI values of that gene. Genes with no values are skipped.
    """
    data = []
    min_min = np.inf
    max_max = -np.inf
    means = []
    for gene, cnrci_list in dictionary.items():
        count_this = np.count_nonzero(~np.isnan(cnrci_list))
        if count_this == 0:
            continue  # no measurement in any cell line
        min_this = np.nanmin(cnrci_list)
        max_this = np.nanmax(cnrci_list)
        range_this = max_this - min_this
        mean_this = np.nanmean(cnrci_list)
        data1 = [gene,count_this,min_this,max_this,range_this,mean_this]
        data.append(data1)
        min_min = min(min_min, min_this)
        max_max = max(max_max, max_this)
        means.append(mean_this)
    mean_mean = np.mean(means)  # mean over genes of each gene's mean
    print('Overall min, max, mean:', min_min, max_max, mean_mean)
    return data

In [4]:
antisense_cnrci = dict()
other_cnrci = dict()
for gene in all_cnrci.keys():
    if gene in antisense_genes:
        antisense_cnrci[gene]=all_cnrci[gene]
    else:
        other_cnrci[gene]=all_cnrci[gene]

In [5]:
print('All genes', len(all_cnrci))
data_all = get_stats(all_cnrci)

All genes 6423
Overall min, max, mean: -10.255 5.58139 -1.0394403070479752


In [6]:
print('Antisense genes', len(antisense_cnrci))
data_anti = get_stats(antisense_cnrci)

Antisense genes 1007
Overall min, max, mean: -7.55636 4.58496 -0.573704981244086


In [7]:
print('Other genes', len(other_cnrci))
data_other = get_stats(other_cnrci)

Other genes 5416
Overall min, max, mean: -10.255 5.58139 -1.126034744471261


## Conclusion
Breakdown of mean lncRNA CNRCI (the mean over genes of each gene's mean across up to 15 cell lines):
* antisense genes: -0.57 (slightly nuclear)
* other genes: -1.13 (nuclear)
* all genes: -1.04 (nuclear)

Contrary to expectation, the mean CNRCI is higher (more cytoplasmic)
for antisense genes than for other genes.
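
Whether a difference in means of this size could arise by chance can be checked with a permutation test. The following is a minimal sketch and was not part of the original analysis: the function name `permutation_test_mean_diff`, the number of permutations, and the seed are all assumptions, and the demo values are made up.

```python
import numpy as np

def permutation_test_mean_diff(a, b, n_perm=10000, seed=0):
    """Return (observed mean difference a - b, two-sided permutation p-value)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        # Shuffle the group labels by permuting the pooled values.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Made-up per-gene means, for illustration only.
demo_antisense = [-0.2, -0.9, 0.4, -1.1, -0.5]
demo_other = [-1.3, -0.8, -2.0, -1.6, -0.9, -1.2]
diff, p_value = permutation_test_mean_diff(demo_antisense, demo_other, n_perm=2000)
print('Observed difference:', diff, 'p-value:', p_value)
```

In this notebook, the per-gene means are the last element of each row returned by `get_stats`, so the inputs would be `[row[5] for row in data_anti]` and `[row[5] for row in data_other]`.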

# Max Range
Find examples of a gene with a large range of CNRCI across cell lines.

In [8]:
for data in data_all:
    gene,count_this,min_this,max_this,range_this,mean_this = data
    if min_this<-3 and max_this>3:
        print(gene,all_cnrci[gene])

ENSG00000252690 [nan, 4.90689, nan, -2.85982, -3.27473, nan, -5.20107, -3.06575, nan, -7.69846, -0.725984, nan, -1.48286, nan, nan]
ENSG00000259865 [nan, 3.86942, nan, -2.15141, -2.93709, nan, -4.85943, -2.47573, nan, nan, -0.568702, nan, -0.898076, nan, nan]
ENSG00000260566 [nan, 3.39232, 0.0103419, 0.316555, -0.678072, -1.54029, -3.33084, -1.50321, nan, nan, -0.749954, -1.08134, -1.89308, nan, nan]
ENSG00000261094 [nan, 3.10281, nan, 0.0140409, -1.04689, nan, -1.71193, -1.36246, -3.20163, -2.84489, 0.743501, -2.05964, 0.903014, nan, nan]
ENSG00000261366 [nan, 3.41504, nan, nan, -2.66432, -3.79008, -2.848, -2.14839, -4.78136, nan, -0.172533, nan, -1.45409, nan, nan]
ENSG00000269939 [-0.257158, 3.10434, nan, -2.01282, -3.88716, -2.43741, -4.365, -0.905957, nan, nan, nan, nan, -0.437405, -0.975338, nan]
ENSG00000273321 [nan, 3.1375, -3.33498, nan, nan, nan, -1.96025, nan, nan, -5.01142, nan, nan, nan, nan, nan]
ENSG00000274227 [nan, -1.27302, nan, nan, nan, -3.79542, -5.28077, 3.94251, 

Above:
There are 14 genes with a CNRCI above 3 in one cell line and below -3 in another.
All but 2 of them have their largest CNRCI at index 1 of the list, which is the H1.hESC cell line.
This agrees with our previous observations that H1.hESC is least correlated with the other cell lines.
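
The count of wide-range genes per peak cell line can be tallied rather than read off the printed lists. This is a minimal sketch under stated assumptions: the function name `tally_argmax_cell_line` is hypothetical, the demo values are made up, and the thresholds default to the -3 and 3 used in the cell above.

```python
from collections import Counter
import numpy as np

def tally_argmax_cell_line(cnrci_by_gene, low=-3.0, high=3.0):
    """Count, per cell-line index, the wide-range genes whose max CNRCI is there."""
    counts = Counter()
    for gene, values in cnrci_by_gene.items():
        arr = np.asarray(values, dtype=float)
        if np.all(np.isnan(arr)):
            continue  # no measurements for this gene
        # Same selection as above: some value below low and some value above high.
        if np.nanmin(arr) < low and np.nanmax(arr) > high:
            counts[int(np.nanargmax(arr))] += 1  # index of the largest non-NaN value
    return counts

# Made-up values with three cell lines, for illustration only.
nan = float('nan')
demo = {
    'geneA': [nan, 4.0, -4.0],
    'geneB': [nan, 3.5, -3.2],
    'geneC': [5.0, nan, -6.0],
    'geneD': [1.0, 2.0, nan],    # range too small, not counted
    'geneE': [nan, nan, nan],    # no data, skipped
}
print(sorted(tally_argmax_cell_line(demo).items()))  # prints [(0, 1), (1, 2)]
```

Calling `tally_argmax_cell_line(all_cnrci)` in this notebook would give the per-index counts behind the statement above.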

Here are the two exceptions. Little is known about either one. These genes are recent additions to the human genome assembly. Their high cytoplasmic scores came from different cell lines. One of these genes has only one known transcript, so for that gene the high and low CNRCI values cannot be explained by different isoforms.

* ENSG00000274227 - 'novel transcript, antisense to ERP29', 1 transcript
* ENSG00000276077 - 'novel transcript', 7 transcripts

This review gives examples of cytoplasmic AS-lncRNA function: 
[link](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7847652/)

This review says antisense lncRNA can be nuclear under normal conditions and cytoplasmic under stress conditions:
[link](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5963534/)

The implication for our work is that one lncRNA might contain both kinds of K-mers: the ones for nuclear retention under some conditions, and the ones for export under other conditions. We don't know whether the cell lines in lncAtlas were grown under stress conditions or not -- probably not. But the CNRCI might be more dependent on environmental factors than on tissue type.
