Umair Khan

Weighted Gene Co-expression Network Analysis of CD8 and CD4 T Cells of Infected Covid-19 Patients


Copy of the R Markdown used to read in the RDS file containing the metadata and to extract the scaled gene expression matrix of highly variable genes.

Here I use the Seurat package to subset the data to remove unwanted genes and cells, extract the T cell populations from the expression matrix, normalize, find variable genes, and scale.

memory.limit(30000)
library(Seurat)
pbmcs <- readRDS("C:\\Users\\zulfi\\Desktop\\blish_covid.seu.rds")
DefaultAssay(pbmcs) <- "RNA"

pbmcs <- subset(pbmcs, subset = nFeature_RNA < 4000 & nFeature_RNA > 500 & percent.mt < 10 & percent.mt > 2.5)


Idents(object = pbmcs) <- "cell.type"
covTcells <- subset(pbmcs, idents = c("CD4m T","CD4n T","CD8m T","CD8eff T"))


covTcells <- NormalizeData(covTcells,normalization.method = "LogNormalize",scale.factor = 10000, assay = "RNA")

covTcells <- FindVariableFeatures(covTcells, selection.method = "vst", nfeatures = 2000,assay = "RNA")

covTcells <- ScaleData(covTcells,assay = "RNA",vars.to.regress = "percent.mt",do.scale = TRUE,
                       do.center = TRUE)

library(data.table)
covMat <- t(covTcells@assays$RNA@scale.data)
covMat <- as.data.frame(covMat)
fwrite(covMat, file = "covTcells.csv")


In [44]:
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
from scipy.stats.mstats import rankdata 
import community as community_louvain

class CoexpressionNetwork:
    
    '''
    Python version of R's WGCNA using a biweight midcorrelation for the similarity matrix and the topological overlap 
    measure (TOM) for the network proximity measure. I convert the TOM dissimilarity matrix to a 
    networkx object and apply the Louvain method to find network modules. Last, I find hub genes for each 
    module found by Louvain. 
    
    Data: single-cell peripheral blood mononuclear cells (PBMCs) of infected Covid-19 patients from the Human
    Covid-19 Cell Atlas.
    
    '''
    
    def __init__(self,dgeMatrix,softThres,transform):
        
        '''
        Class Parameters:
        
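        dgeMatrix: Path to a CSV file containing the scaled gene expression of highly variable genes only
                   (rows are cells, columns are genes).
        softThres: Power used for soft thresholding.
        transform: Factor and offset applied to the correlation matrix so the signed matrix lies on the 0-1 scale
                   (0.5 maps correlations from [-1, 1] to [0, 1]).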
        
        '''
        
        self.dgeMatrix = pd.read_csv(dgeMatrix)
        self.softThres = softThres
        self.transform = transform
        self.mappings = { index:column for index,column in enumerate(self.dgeMatrix) }
        self.dgeMatrix = self.dgeMatrix.to_numpy(dtype='float')
        self.bicorMat = np.zeros([self.dgeMatrix.shape[1],self.dgeMatrix.shape[1]])
        self.adjMat = np.zeros([self.dgeMatrix.shape[1],self.dgeMatrix.shape[1]])
        self.tomMat = np.zeros([self.dgeMatrix.shape[1],self.dgeMatrix.shape[1]])
        self.corNet = nx.Graph()
        self.partitions = {}
        
    def bicorMatrix(self):
        
        '''
        Computes the biweight midcorrelation matrix.
        Algorithm from : https://en.wikipedia.org/wiki/Biweight_midcorrelation
        
        '''
        X = self.dgeMatrix - np.median(self.dgeMatrix,axis=0)
        
        U = X / (9*np.median(np.abs(X),axis=0))
        
        W = ((1 - U ** 2) ** 2) * ((1-np.abs(U)) > 0)
    
        E = X*W
                
        for i in range(self.dgeMatrix.shape[1]):
            
            for j in range(self.dgeMatrix.shape[1]):
                
                self.bicorMat[i,j] = np.dot(E[:,i],E[:,j]) / (np.sqrt(np.sum(E[:,i] ** 2)) * np.sqrt(np.sum(E[:,j] ** 2)))
                  
        return self.bicorMat
                              

    def adjacencyMatrix(self):
        
        '''
        Creates a weighted adjacency matrix/network by transforming the biweight midcorrelation matrix
        to a signed weighted co-expression graph on the 0-1 scale and then applying a power function 
        for soft thresholding, as opposed to hard thresholding. To transform, multiply the correlation
        matrix by 0.5 and add 0.5, then raise the result to the soft-thresholding power. The diagonal
        is set to 0.
        
        '''
        
        self.adjMat = np.power(self.bicorMat*self.transform + self.transform, self.softThres)
            
        np.fill_diagonal(self.adjMat,0)
        
        return self.adjMat
    
   
    def tomMatrix(self):
        
        '''
        Computes the topological overlap measure (TOM) and returns the TOM dissimilarity matrix (1 - TOM).
        Algorithm from : https://dibernardo.tigem.it/files/papers/2008/zhangbin-statappsgeneticsmolbio.pdf
        
        '''
        
        L = np.matmul(self.adjMat,self.adjMat)
        
        K = self.adjMat.sum(axis=1)
                        
        for i in range(self.adjMat.shape[0]):
            
            for j in range(i+1,self.adjMat.shape[0]):
                
                numerator = L[i,j] + self.adjMat[i,j]
                
                denominator = min(K[i],K[j]) + 1 - self.adjMat[i,j]
                
                self.tomMat[i,j] = numerator/denominator
                                
        self.tomMat += self.tomMat.T
          
        np.fill_diagonal(self.tomMat,1)
        
        self.tomMat = 1 - self.tomMat
            
        return self.tomMat
        
            
    def corrNetwork(self):
        
        '''
        Convert the TOM dissimilarity matrix to a networkx graph object (edge weights are the dissimilarities) 
        and relabel nodes using the integer:gene mapping.
        
        '''

        self.corNet = nx.from_numpy_array(self.tomMat, parallel_edges = False)
        
        self.corNet = nx.relabel.relabel_nodes(self.corNet,self.mappings)
                
        return self.corNet
    
    def louvain(self):
        
        '''
        Use the Louvain method for finding network modules. From python-louvain and networkx. 
        
        '''
        
        self.partitions = community_louvain.best_partition(self.corNet,weight='weight',resolution=1.0)
        
        return self.partitions


    def hubGenes(self):
        
        '''
        Find hub genes within each cluster. geneClusters is a dictionary mapping each cluster id to the list of genes 
        assigned to it. networkClusters is a list of networkx graph objects, one per cluster id, built from the edges 
        of corNet whose first endpoint belongs to that cluster. I then use nx.hits_numpy on each graph object and 
        return the top 20 hub genes per cluster.
        
        Hubs are found with the HITS algorithm via networkx's built-in function.
    
        '''
        
        geneClusters,networkClusters = {},[]
        
        for genes,cluster in self.partitions.items():
            
            if cluster not in geneClusters:
                
                geneClusters[cluster] = [] 
                
            geneClusters[cluster].append(genes)
                        
        for index in geneClusters:

            networkClusters.append(nx.Graph(((u,v,w) for u,v,w in self.corNet.edges(data=True) if u in geneClusters[index])))
        
        hubs = [ nx.hits_numpy(n) for n in networkClusters ]
        
        output =  [ sorted(j[0].items(),key=lambda x:x[1],reverse = True)[:20] for j in hubs ]
        
        return output
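
The pairwise loop in bicorMatrix can also be written as a single matrix product. The block below is a minimal sketch of that idea, not the code used to produce the results in this notebook. The function name `bicor_vectorized` and the toy data are hypothetical, and the sketch assumes every gene (column) has a nonzero median absolute deviation; a column with MAD = 0 would produce NaN values and would need to be filtered out first.

```python
import numpy as np


def bicor_vectorized(x):
    """Biweight midcorrelation between all column pairs of x (rows = cells)."""
    x = np.asarray(x, dtype=float)
    centered = x - np.median(x, axis=0)
    mad = np.median(np.abs(centered), axis=0)
    u = centered / (9.0 * mad)                 # assumes MAD > 0 for every column
    w = (1.0 - u ** 2) ** 2 * (np.abs(u) < 1.0)
    e = centered * w
    e = e / np.sqrt((e ** 2).sum(axis=0))      # scale each column to unit length
    return e.T @ e                             # all pairwise dot products at once


# Hypothetical toy data: 200 cells, 5 genes, with genes 0 and 1 made to co-vary.
rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 5))
toy[:, 1] = toy[:, 0] + 0.1 * rng.normal(size=200)

bicor = bicor_vectorized(toy)
print(np.round(bicor, 2))
```

Normalizing each weighted column once and then taking `e.T @ e` gives the same values as the nested loop while avoiding a Python-level loop over every gene pair.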
        
    

In [45]:
cn = CoexpressionNetwork(dgeMatrix='covTcells.csv',softThres=12,transform=0.5)
cn.bicorMatrix()
cn.adjacencyMatrix()
cn.tomMatrix()
cn.corrNetwork()
cn.louvain()
cn.hubGenes()

[[('CANX', 0.001684423549335662),
  ('FTL', 0.0016217710091694219),
  ('ARPC5', 0.0015234710387957446),
  ('MALAT1', 0.0014907247540372804),
  ('ANXA1', 0.0014887388922999738),
  ('CCL5', 0.0014694832332506273),
  ('CALR', 0.0013988704984717977),
  ('DEK', 0.001354717558426897),
  ('CCT3', 0.0013458539423112858),
  ('ITGA4', 0.0013430360986003789),
  ('ATP5A1', 0.001334474656488923),
  ('CD8A', 0.0013304131752991959),
  ('GPRIN3', 0.0013296992125983666),
  ('ANP32E', 0.0013171288618258998),
  ('EFHD2', 0.0013106884630893244),
  ('BIRC3', 0.0013102212436055605),
  ('CTSS', 0.001309039084519524),
  ('HMGB1', 0.0013082055981069555),
  ('DNAJA1', 0.0013045927815311768),
  ('ITGB1', 0.001301820813357916)],
 [('ACTB', 0.0009756952815166152),
  ('ACTR3', 0.0009730277682695686),
  ('ACTG1', 0.0009692350881815931),
  ('ARPC2', 0.0009625481428736085),
  ('CAP1', 0.0009615419476105815),
  ('ENO1', 0.0009574758452606727),
  ('HNRNPA2B1', 0.0009533975848766701),
  ('HSP90AA1', 0.0009525767610006365



Taking the topological overlap measure (TOM) dissimilarity matrix as input, the Louvain method identifies two clusters (modules). Gene functions are taken from genecards.org unless otherwise cited.
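
To make the matrix that Louvain receives concrete, here is a minimal sketch of the signed soft-thresholded adjacency and the TOM dissimilarity, using the same formulas as adjacencyMatrix and tomMatrix but with matrix operations in place of the nested loop. The function names and the 4-gene correlation matrix are hypothetical and serve only as an illustration.

```python
import numpy as np


def signed_adjacency(cor, power):
    """Map correlations from [-1, 1] to [0, 1], then raise to a power."""
    adj = (0.5 * np.asarray(cor, dtype=float) + 0.5) ** power
    np.fill_diagonal(adj, 0.0)
    return adj


def tom_dissimilarity(adj):
    """Return 1 - TOM for a symmetric adjacency matrix with zero diagonal."""
    shared = adj @ adj                          # l_ij = sum over u of a_iu * a_uj
    k = adj.sum(axis=1)                         # connectivity k_i of each gene
    tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return 1.0 - tom


# Hypothetical correlations: genes 0-1 and genes 2-3 form two tight pairs.
cor = np.array([
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
])
adj = signed_adjacency(cor, power=12)
diss = tom_dissimilarity(adj)
print(np.round(diss, 3))
```

In this toy case the strongly correlated pairs get a smaller dissimilarity than the unrelated pairs, so low values in the matrix mark genes that share many neighbours.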

Module 1 contains genes associated with neuropathogenesis, including GPRIN3 (implicated in neurite growth), ANP32E (implicated in cerebellar development and synaptogenesis), and FTL (linked to neurodegeneration induced by oxidative stress in the electron transport chain, ETC). Module 1 genes with metastatic properties include MALAT1, a well-known spliced long non-coding RNA noted to have a role in the transcriptional regulation of genes involved in cancer metastasis, cell migration, and the proliferation-driven growth of cancerous tissue; CALR, which produces the calreticulin protein involved in cell migration, cell adhesion, gene activity, and cell growth through calcium regulation (1); and BIRC3, which modulates inflammatory signaling and immunity, mitogenic kinase signaling, cell proliferation, cell migration, and metastasis. ITGA4 and its interaction with ITGB1 have been reported to influence cell migration and motility; both are receptors for fibronectin, which functions in cell differentiation, migration, and wound healing. Integrin family proteins also promote cell mobility near the extracellular matrix, and this, together with CTSS, which is actively involved in blood vessel angiogenesis (2), suggests the presence of this family of proteins at sites of blood vessel differentiation.

Module 2 contains genes related to actin polymerization, including ACTB, ACTR3, ACTG1, ARPC2, and CAP1 (implicated in complex developmental and morphological processes). ENO1 has suggested functions in the intravascular and pericellular fibrinolytic system because it can serve as a receptor and activator of plasminogen on the surface of several cell types, such as leukocytes and neurons. Isoforms of LCP1 are known to have replication potential in epithelial cells, fibroblasts, and endothelial cells, and LTB has been implicated in cancer metastasis. HSP90AA1, HSP90AB1, HSP90B1, HSPA5, and HNRNPA2B1 (pre-mRNA processing) are mostly molecular chaperone proteins with functions ranging from protein quality control and folding to morphological evolution, structure, and maintenance. HSP90 modulates transcription at three levels: physiological cues, steady-state levels of epigenetic modifiers, and gene expression through the eviction of histones from promoter regions. MT-CO1, MT-CO2, MT-CO3, MT-CYB, and MT-ATP6 are mitochondrial genes whose functions include regulation of the electron transport chain, oxidative phosphorylation, and aerobic metabolism. Deficiencies in MT-ATP6 and MT-CYB are both known to affect neurodegeneration and cardiovascular health, and this set of genes implicates inflammation that leads to mitochondrial oxidative damage by SARS-CoV-2 (3).

Key Takeaway: Hub genes found from gene expression data of T cells from infected Covid-19 patients suggest a multitude of problems that include: 

1. actin polymerization
2. cell migration and metastasis properties
3. mitochondrial level cardiovascular damage
4. implications of neuron differentiation and growth
5. damage to mitochondrial cytochrome subunits

The hub genes listed above may be of clinical use as targets to investigate for drug therapeutics against SARS-CoV-2, both by observing pathways in actin growth to reduce virus protrusion and by developing drugs that improve mitochondrial function and reduce oxidative stress and mitochondrial dysfunction.


Sources:

1. https://ghr.nlm.nih.gov/gene/CALR
2. https://cancerres.aacrjournals.org/content/69/10/4537
3. https://www.sciencedirect.com/science/article/pii/S1567724920301380