# ABA normalization

`list_top_genes.csv` lists around 4000 genes.
<br>Around 3000 of them are marked as used in the analysis (saved in `genes_mmp`).
<br>2941 of those genes could be found in the ABA atlas (saved in `finalgenes_mmp`).
<br>The final gene matrix needs to be transposed (saved in `finalgenes_T`).
<br>**Different normalizations**:
* Keeping the -1 values, standardized (`StandardScaler`)
* Keeping the -1 values, L2 normalized
* -1 values replaced with 0, standardized (`StandardScaler`)
* -1 values replaced with 0, L2 normalized
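
The four variants listed above can be sketched on a toy matrix. This is only an illustration under assumptions: `X_toy`, `X_toy_pos`, and `variants` are hypothetical names, and the -1 entries are treated here as placeholder values, which the notebook itself does not state explicitly.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, normalize

# Hypothetical toy matrix: 3 rows (samples) x 2 columns (genes).
# The -1 entries stand in for the -1 values in the real data (assumption).
X_toy = np.array([[3.0, 4.0],
                  [-1.0, 2.0],
                  [1.0, -1.0]], dtype='float32')

# Copy of the matrix with every negative value replaced by 0
X_toy_pos = np.where(X_toy < 0, 0, X_toy)

variants = {
    # StandardScaler works per column: zero mean, unit variance for each gene
    'std': StandardScaler().fit_transform(X_toy),
    # normalize(..., axis=1) works per row: each row gets unit L2 norm
    'L2': normalize(X_toy, norm='l2', axis=1),
    'pos_std': StandardScaler().fit_transform(X_toy_pos),
    'pos_L2': normalize(X_toy_pos, norm='l2', axis=1),
}

for name, values in variants.items():
    print(name)
    print(values)
```

Note that the two methods act along different axes: standardization rescales each gene across all rows, while L2 normalization rescales each row across all genes.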

In [2]:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize
from sklearn.preprocessing import StandardScaler

In [4]:
# Loading the top 2941 genes
X_gv = np.memmap('/data/bioprotean/ABA/MEMMAP/genes_list/finalgenes_mmp.mymemmap',\
dtype='float32', mode='r', shape=(2941,159326))

In [5]:
# Transposing the matrix
X = np.transpose(X_gv)

X_save = np.memmap('/data/bioprotean/ABA/MEMMAP/genes_list/finalgenes_T.mymemmap',\
dtype='float32', mode='w+', shape=(159326,2941))

X_save[:,:] = X[:,:]

In [6]:
# Keeping the -1s, standardizing each column (gene) to zero mean and unit variance
X_std = StandardScaler().fit_transform(X)

X_save = np.memmap('/data/bioprotean/ABA/MEMMAP/genes_list/finalgenes_std.mymemmap',\
dtype='float32', mode='w+', shape=(159326,2941))

X_save[:,:] = X_std[:,:]

In [7]:
# Keeping the -1s, L2 normalizing each row (axis=1) to unit norm
X_norm = normalize(X, norm='l2', axis=1, copy=True)

X_save = np.memmap('/data/bioprotean/ABA/MEMMAP/genes_list/finalgenes_L2.mymemmap',\
dtype='float32', mode='w+', shape=(159326,2941))

X_save[:,:] = X_norm[:,:]

In [8]:
# Changing all negative values to 0
X_pos = np.where(X < 0, 0, X)
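
As a small sanity check of the step above, the sketch below shows on a hypothetical toy array (`X_toy` is not part of the notebook) that `np.where(X < 0, 0, X)` gives the same values as `np.maximum(X, 0)`. Either call returns a new in-memory array, so on the full matrix the result is held in RAM rather than in the memmap.

```python
import numpy as np

# Hypothetical toy array with -1 placeholders
X_toy = np.array([[-1.0, 0.5],
                  [2.0, -1.0]], dtype='float32')

X_where = np.where(X_toy < 0, 0, X_toy)  # form used in the notebook
X_max = np.maximum(X_toy, 0)             # equivalent element-wise form

same = np.array_equal(X_where, X_max)
print(same)
```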

In [9]:
# -1s replaced with 0, standardizing each column (gene) to zero mean and unit variance
X_std = StandardScaler().fit_transform(X_pos)

X_save = np.memmap('/data/bioprotean/ABA/MEMMAP/genes_list/finalgenes_pos_std.mymemmap',\
dtype='float32', mode='w+', shape=(159326,2941))

X_save[:,:] = X_std[:,:]

In [10]:
# -1s replaced with 0, L2 normalizing each row (axis=1) to unit norm
X_norm = normalize(X_pos, norm='l2', axis=1, copy=True)

X_save = np.memmap('/data/bioprotean/ABA/MEMMAP/genes_list/finalgenes_pos_L2.mymemmap',\
dtype='float32', mode='w+', shape=(159326,2941))

X_save[:,:] = X_norm[:,:]
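
The save pattern repeated in the cells above could be wrapped in a small helper. The following is a minimal sketch, not the notebook's own code: `save_memmap` is a hypothetical function, and a temporary file stands in for the real paths. A raw memmap file stores neither shape nor dtype, so both must be supplied again when reopening it.

```python
import os
import tempfile
import numpy as np

def save_memmap(path, array):
    """Write `array` to a new float32 memmap at `path` and flush it to disk."""
    out = np.memmap(path, dtype='float32', mode='w+', shape=array.shape)
    out[:, :] = array
    out.flush()  # push pending writes to the file explicitly
    return out

# Hypothetical small array standing in for one of the normalized matrices
demo = np.arange(6, dtype='float32').reshape(3, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.mymemmap')
    saved = save_memmap(demo_path, demo)
    del saved  # release the writer before reopening

    # Reopen read-only with the same dtype and shape to confirm the round trip
    reloaded = np.memmap(demo_path, dtype='float32', mode='r', shape=demo.shape)
    round_trip_ok = np.array_equal(reloaded, demo)
    del reloaded  # release the mapping before the directory is removed

print(round_trip_ok)
```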