---
---
Paper Summary
---
---


---
### Preamble
---

1. Non-independent evolution of TPMs is a confounding factor in hierarchical clustering (HC).

2. This is mostly due to pleiotropic effects.

3. As a result, units (cells/genes) cluster by species instead of by homology.

4. This correlation in TPMs varies with time: its effect decreases as the time gap between tissues increases and, conversely, increases as that gap shrinks.

5. Removing the LC(T)E can improve clustering.

---
## Methods
---



### Estimating LC(T)E

---
a. Procedure:

(i) Obtain TPM
*   For each gene, divide the read count by the gene length in kilobases to get reads per kilobase (RPK).
*   Sum all RPK values in a sample and divide by 1,000,000 to get a "per million" scaling factor.
*   Divide each RPK value by this scaling factor to get TPM. Source: [CCR](https://btep.ccr.cancer.gov/question/faq/what-is-the-difference-between-rpkm-fpkm-and-tpm/)
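The three TPM steps above can be sketched in NumPy as follows. This is a minimal illustration. It assumes a genes x samples matrix of raw counts and gene lengths given in base pairs. The function name `counts_to_tpm` is hypothetical. Each sample is divided by its own per-million factor, so each sample's TPM values sum to one million.

```python
import numpy as np

def counts_to_tpm(counts, lengths_bp):
    """Convert a genes x samples read-count matrix to TPM (hypothetical helper).

    counts     : array of shape (n_genes, n_samples), raw read counts
    lengths_bp : array of shape (n_genes,), gene lengths in base pairs
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1_000.0
    # 1. Reads per kilobase: divide each gene's counts by its length in kb
    rpk = counts / lengths_kb[:, None]
    # 2. Per-million scaling factor: sum of RPK in each sample divided by 1e6
    scale = rpk.sum(axis=0) / 1e6
    # 3. TPM: divide each RPK value by its sample's scaling factor
    return rpk / scale

# Toy example: 3 genes x 2 samples
counts = np.array([[10, 20],
                   [20, 40],
                   [30, 0]])
lengths = np.array([2000, 4000, 1000])
tpm = counts_to_tpm(counts, lengths)
print(tpm.sum(axis=0))  # both columns sum to one million (up to floating-point rounding)
```

A sample with no reads would have a scaling factor of zero. Such samples would need to be dropped before this step.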

(ii) Normalize and Transform
*   Use the one-to-one ortholog transforms by Muser (NWU).
*   Apply the square-root transform and remove genes that still exhibit high variance.
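The text does not say what counts as "high variance", so the sketch below is only one possible reading. It assumes a genes x samples TPM matrix. After the square-root transform, it drops genes whose variance across samples lies above a quantile cut-off. The cut-off `quantile=0.95` is hypothetical and does not come from the source. `sqrt_and_filter` is also a hypothetical helper.

```python
import numpy as np

def sqrt_and_filter(tpm, quantile=0.95):
    """Square-root transform a genes x samples matrix and drop high-variance genes.

    `quantile` is a hypothetical cut-off: genes whose variance across samples
    (after the transform) exceeds this quantile of all gene variances are removed.
    Returns the filtered matrix and the boolean mask of kept genes.
    """
    transformed = np.sqrt(np.asarray(tpm, dtype=float))
    gene_var = transformed.var(axis=1)          # variance of each gene across samples
    cutoff = np.quantile(gene_var, quantile)
    keep = gene_var <= cutoff
    return transformed[keep], keep

example = np.array([[1, 1, 1, 1],
                    [4, 4, 4, 4],
                    [0, 100, 0, 100]])
filtered, keep = sqrt_and_filter(example)
print(keep)  # [ True  True False]
```

A quantile rule removes roughly the most variable 5% of genes regardless of their absolute spread. If a fixed variance threshold is known, comparing `gene_var` against that threshold directly would be the alternative.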

(iii) Estimate LC(T)E for all combinations of tissues and cells by computing the Pearson correlation coefficient of their contrast vectors.
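Step (iii) does not define the contrast vectors, so the sketch below assumes a common comparative setting. There are two species with the same one-to-one orthologs (rows) and the same tissues (columns). A tissue's contrast vector is its gene-by-gene expression difference between the two species. Under that assumption, the LC(T)E of a tissue pair is the Pearson correlation of their two contrast vectors. The function name `contrast_correlations` is hypothetical.

```python
import numpy as np
from itertools import combinations

def contrast_correlations(expr_sp1, expr_sp2, tissue_names):
    """Pearson correlation of between-species contrast vectors for every tissue pair.

    expr_sp1, expr_sp2 : arrays of shape (n_orthologs, n_tissues) holding transformed
        expression of the same one-to-one orthologs (same row order) and the same
        tissues (same column order) in two species.
    Assumption: a tissue's contrast vector is expr_sp1[:, t] - expr_sp2[:, t].
    """
    contrasts = np.asarray(expr_sp1, dtype=float) - np.asarray(expr_sp2, dtype=float)
    corr = np.corrcoef(contrasts, rowvar=False)  # columns (tissues) are the variables
    return {(tissue_names[i], tissue_names[j]): corr[i, j]
            for i, j in combinations(range(len(tissue_names)), 2)}

# Toy data: 3 orthologs x 3 tissues in two species
sp1 = np.array([[1, 2, 3],
                [2, 4, 2],
                [3, 6, 1]])
sp2 = np.ones((3, 3))
res = contrast_correlations(sp1, sp2, ["A", "B", "C"])
# Contrasts of A and B rise together (r = 1); A and C move in opposite directions (r = -1)
print(res)
```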

The significance of each correlation can then be tested with a permutation test.
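A permutation test for one such correlation can be sketched as below. It assumes that genes are the exchangeable units. Shuffling one contrast vector breaks its gene-wise pairing with the other, and repeating this gives a null distribution of correlations. The two-sided p-value includes the observed statistic in both numerator and denominator, so it is never zero. `permutation_test_pearson` and its defaults are hypothetical.

```python
import numpy as np

def permutation_test_pearson(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test for the Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]
    # Null distribution: permute y to destroy its gene-wise pairing with x
    r_null = np.array([np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_perm)])
    # Count permutations at least as extreme as observed; +1 includes r_obs itself
    p_value = (np.sum(np.abs(r_null) >= abs(r_obs)) + 1) / (n_perm + 1)
    return r_obs, p_value

# Usage on two made-up, moderately correlated contrast vectors
rng = np.random.default_rng(1)
cx = rng.normal(size=200)
cy = 0.5 * cx + rng.normal(size=200)
r, p = permutation_test_pearson(cx, cy, n_perm=2_000)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```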

---
---

In [None]:
## import libraries
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib
## import sklearn modules
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
## import keras models, layers, optimizers and constraints
import tensorflow as tf
import keras
import keras.backend as K
from keras.models import Sequential, Model
from keras.layers import Embedding, Flatten, Dense, Dropout, concatenate, multiply, Input, Dot, Masking, Multiply
from keras.optimizers import Adam
from keras.constraints import Constraint

Using TensorFlow backend.


In [None]:
# Load the stim and ctrl count tables: gene names in the first column, one column per cell
data1 = pd.read_csv("https://sixtusdakurah.com/projects/liger/stim_sparse_dfp.csv", sep=",", header=0)
data2 = pd.read_csv("https://sixtusdakurah.com/projects/liger/ctrl_sparse_dfp.csv", sep=",", header=0)
# datan = pd.read_csv("https://raw.githubusercontent.com/Mcpaeis/Independent-Study/master/Data/xns0-inmf-paper.csv")
# Keep the first 3,000 genes (rows) of each table
data1 = data1.iloc[:3000]
data2 = data2.iloc[:3000]
print("Shape of data 1:", data1.shape)
print("Shape of data 2:", data2.shape)

Shape of data 1: (3000, 3001)
Shape of data 2: (3000, 3001)


In [None]:
# gene names are stored in the unnamed first column
x1_genes = data1['Unnamed: 0']
print(x1_genes)
x2_genes = data2['Unnamed: 0']
print(x2_genes)

0          AL627309.1
1       RP11.206L10.2
2       RP11.206L10.9
3           LINC00115
4               NOC2L
            ...      
2995            HCLS1
2996           GOLGB1
2997            IQCB1
2998             EAF2
2999          SLC15A2
Name: Unnamed: 0, Length: 3000, dtype: object
0       RP11.206L10.2
1       RP11.206L10.9
2           LINC00115
3              FAM41C
4               NOC2L
            ...      
2995          TMPRSS7
2996          C3orf52
2997            GCSAM
2998            CD200
2999             BTLA
Name: Unnamed: 0, Length: 3000, dtype: object


In [None]:
data1.head()

Unnamed: 0.1,Unnamed: 0,stimAGGACACTCATGGT-1,stimCCCTTACTTTGCGA-1,stimGCTACCTGTGGTCA-1,stimTCCTAATGTCTCTA-1,stimTAGCCCTGACCTCC-1,stimGGTCTAGAAGTGTC-1,stimTGACGAACTACGCA-1,stimTCATTCGATTTACC-1,stimAGGTACTGTTCCAT-1,stimTATGTCTGCACACA-1,stimATCGGTGATCCCAC-1,stimCGGGCATGTGTTCT-1,stimATTGATGATTCTTG-1,stimCACTGCTGGTCTGA-1,stimAGCGAACTAAAACG-1,stimTTGGGAACTCGACA-1,stimTTATGAGATCGTGA-1,stimACGCGGTGCTTACT-1,stimCAGATCGATGGTTG-1,stimGGCGGACTGTGCTA-1,stimACCCGTACAAAGTG-1,stimATACTCTGCTGCAA-1,stimTAACACCTACCTTT-1,stimAAGTCTCTTTGCTT-1,stimCTCGACTGCGAATC-1,stimCGGTAAACGAAAGT-1,stimGAGTAAGACTTGCC-1,stimCGACTCTGGACTAC-1,stimAATCCTACTACTTC-1,stimCCAGTCACCGGAGA-1,stimAAGAAGACTAACCG-1,stimCGAAGTACAAGAGT-1,stimTGGTACGAACACTG-1,stimACGAACACGATACC-1,stimCCCACATGCGGAGA-1,stimAAGAGATGCAAAGA-1,stimAATGCGTGCTGTGA-1,stimAGGGACGAACTAGC-1,stimCAATCTACTTCTGT-1,...,stimCCTATAACTCGTGA-1,stimGAAATACTCGAGAG-1,stimATACGGACAACCAC-1,stimAACCTTTGTGCCCT-1,stimACGTTGGACCCTCA-1,stimGATATCCTTCTCGC-1,stimATTACCTGCTAGTG-1,stimAGTGAAGAGTTACG-1,stimCAACCAGAGAGCAG-1,stimCGAGAACTTCAAGC-1,stimCATGCGCTAAAACG-1,stimCAATCGGACGTAGT-1,stimGACTGAACCCAGTA-1,stimGGAGTTTGGTGCTA-1,stimGGTGATACGTTGCA-1,stimCTCAATTGGTGCAT-1,stimATTCGGGAACCACA-1,stimTAGTAAACTCGCTC-1,stimTATCAAGACAATCG-1,stimAGGCAGGACGTCTC-1,stimGAGGCAGATCATTC-1,stimCCAGTCTGGTAGGG-1,stimTGTAAAACTCCCGT-1,stimTAGTCACTGAGATA-1,stimAAATCAACGAGGCA-1,stimACAAGCACTCTTAC-1,stimACACGAACGTCTAG-1,stimCTTCACCTATGTGC-1,stimCATCAGGAGTATCG-1,stimACTGTTACGTGCTA-1,stimCCGCTATGGTCCTC-1,stimATCTTGACGCTGAT-1,stimCGGTCACTAGCACT-1,stimAATCTCTGGTATGC-1,stimCGGTACCTAGATGA-1,stimTTGGTACTTTCCCG-1,stimGGAGGCCTTTTACC-1,stimTGCAGATGACCGAT-1,stimACTCCTCTACGGAG-1,stimGCATTGGATATGCG-1
0,AL627309.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,RP11.206L10.2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,RP11.206L10.9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,LINC00115,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,NOC2L,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [None]:
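# melt data: wide genes x cells table -> long format, one (gene, cell, val) row per entry
X1 = pd.melt(data1, id_vars=['Unnamed: 0'], var_name='cell', value_name='val')
X1.rename(columns = {'Unnamed: 0':'gene'}, inplace = True)
#X1 = X1.loc[X1['gene'].isin(x2_genes)] ## select genes common to X1 and X2
# Replace gene and cell names by integer category codes (assigned in sorted order of the names).
# Note: codes are assigned independently for X1 and X2, so gene code k in X1 is not
# necessarily the same gene as gene code k in X2 (the two gene lists differ).
X1['gene'] = X1['gene'].astype('category').cat.codes
X1['cell'] = X1['cell'].astype('category').cat.codes
num_genes_x1 = X1['gene'].nunique()
num_cells_x1 = X1['cell'].nunique()
X1.head()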

Unnamed: 0,gene,cell,val
0,172,553,0
1,2051,553,0
2,2052,553,0
3,1319,553,0
4,1611,553,0
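The row order of the melted frames matters whenever a melted frame and a flattened matrix are used side by side as parallel inputs. `pd.melt` stacks the value columns one after another: all genes of the first cell, then all genes of the next cell, and so on. The `X1.head()` output above shows this, with a constant `cell` code. By contrast, `ndarray.flatten()` on a genes x cells matrix is row-major, so it lists all cells of the first gene first. The toy table below is made up and only illustrates the two orders.

```python
import pandas as pd

# Hypothetical wide table: 2 genes (rows) x 3 cells (columns), shaped like data1/data2
wide = pd.DataFrame({"gene": ["g1", "g2"],
                     "c1": [1, 2], "c2": [3, 4], "c3": [5, 6]})

# melt: column by column (cell-major)
long = pd.melt(wide, id_vars=["gene"], var_name="cell", value_name="val")
print(long["val"].tolist())  # [1, 2, 3, 4, 5, 6]

# flatten: row by row (gene-major)
flat = wide[["c1", "c2", "c3"]].to_numpy().flatten().tolist()
print(flat)  # [1, 3, 5, 2, 4, 6]
```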


In [None]:
# melt data (same steps as for X1)
X2 = pd.melt(data2, id_vars=['Unnamed: 0'], var_name='cell', value_name='val')
X2.rename(columns = {'Unnamed: 0':'gene'}, inplace = True)
X2['gene'] = X2['gene'].astype('category').cat.codes
X2['cell'] = X2['cell'].astype('category').cat.codes
num_genes_x2 = X2['gene'].nunique()
num_cells_x2 = X2['cell'].nunique()
X2.head()

Unnamed: 0,gene,cell,val
0,2038,2562,0
1,2039,2562,0
2,1311,2562,0
3,869,2562,0
4,1601,2562,0


In [None]:
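def getMatrix(df, ng, nc):
  # Dense ng x nc matrix with matrix[gene, cell] = val for every row of the long-format df
  matrix = np.zeros((ng, nc), dtype=df['val'].dtype)
  # Vectorized assignment (integer-array indexing) instead of a Python loop over 9M rows
  matrix[df['gene'].to_numpy(), df['cell'].to_numpy()] = df['val'].to_numpy()
  return matrix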

In [None]:
# Build the dense genes x cells matrix of each dataset
x1_matrix = getMatrix(X1, num_genes_x1, num_cells_x1)
x2_matrix = getMatrix(X2, num_genes_x2, num_cells_x2)

In [None]:
n_rows1 = X1['cell'].nunique()  # number of cells in dataset 1
n_cols1 = X1['gene'].nunique()  # number of genes in dataset 1
print(n_rows1)
print(n_cols1)
n_rows2 = X2['cell'].nunique()  # number of cells in dataset 2
n_cols2 = X2['gene'].nunique()  # number of genes in dataset 2
print(n_rows2)
print(n_cols2)

3000
3000
3000
3000


In [None]:
## Load pre-trained embeddings
W_Embedding = pd.read_csv("https://raw.githubusercontent.com/Mcpaeis/Independent-Study/master/Data/W-Embedding.csv", sep=",", header=None)
H1_Embedding = pd.read_csv("https://raw.githubusercontent.com/Mcpaeis/Independent-Study/master/Data/H1-Embedding.csv", sep=",", header=None)
H2_Embedding = pd.read_csv("https://raw.githubusercontent.com/Mcpaeis/Independent-Study/master/Data/H2-Embedding.csv", sep=",", header=None)
print("W_Embedding shape: ", W_Embedding.shape)
print("H1_Embedding shape: ", H1_Embedding.shape)
print("H2_Embedding shape: ", H2_Embedding.shape)

W_Embedding shape:  (1001, 20)
H1_Embedding shape:  (401, 20)
H2_Embedding shape:  (601, 20)


In [None]:
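## Build the autoencoder inputs: one expression value per training sample.
# Gather the matrix entries in the same (melt) order as X1/X2, so that sample i of the
# autoencoder input refers to the same (gene, cell) pair as the index inputs built later
# from X1['cell'], X1['gene'] and X2['cell']. A plain .flatten() is row-major over
# (gene code, cell code) and would not line up with the melt order.
x11_matrix = x1_matrix[X1['gene'].to_numpy(), X1['cell'].to_numpy()]
x22_matrix = x2_matrix[X2['gene'].to_numpy(), X2['cell'].to_numpy()]
# Pad to a common length with the mask value -10 (a no-op here: both have 9,000,000 entries)
padded_matrix = tf.keras.preprocessing.sequence.pad_sequences([x11_matrix, x22_matrix], value=-10, padding="post")
x1_pad = padded_matrix[0]
print("X1 pad sequence: ", x1_pad.shape)
x2_pad = padded_matrix[1]
print("X2 pad sequence: ", x2_pad.shape)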

X1 pad sequence:  (9000000,)
X2 pad sequence:  (9000000,)


In [None]:
def NMFL():

  ##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ##### Begin Autoencoder
  ##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  # Encoder network
  input_layera = Input(shape=(1,), name="X1-Input")
  masked_row_inputa = Masking(mask_value=-10)(input_layera)
  dense_1a = Dense(150, activation = 'relu', name = 'Dense1a')(masked_row_inputa)
  dense_1a = Dropout(0.2)(dense_1a)
  dense_2a = Dense(300, activation = 'relu', name = 'Dense2a')(dense_1a)
  dense_2a = Dropout(0.2)(dense_2a)
  dense_3a = Dense(150, activation = 'relu', name = 'Dense3a')(dense_2a)
  dense_3a = Dropout(0.2)(dense_3a)
  # add the latent space
  latent_spacea = Dense(512, activation = 'relu', name = 'LatentSpacea')(dense_3a)
  latent_spacea = Dropout(0.2)(latent_spacea)
  # Decoder network
  dense_4a = Dense(150, activation = 'relu', name = 'Dense4a')(latent_spacea)
  dense_4a = Dropout(0.2)(dense_4a)
  dense_5a = Dense(300, activation = 'relu', name = 'Dense5a')(dense_4a)
  dense_5a = Dropout(0.2)(dense_5a)
  dense_6a = Dense(150, activation = 'relu', name = 'Dense6a')(dense_5a)
  dense_6a = Dropout(0.2)(dense_6a)

  # linear output: a single-unit softmax is constant (always 1.0) and cannot reconstruct values
  resulta = Dense(1, activation = 'linear', name = 'Result1')(dense_6a)

  # Encoder network (X2)
  input_layer_a = Input(shape=(1,), name="X2-Input")
  masked_row_input_a = Masking(mask_value=-10)(input_layer_a)
  dense_1_a = Dense(150, activation = 'relu', name = 'Dense1_a')(masked_row_input_a)
  dense_1_a = Dropout(0.2)(dense_1_a)
  dense_2_a = Dense(300, activation = 'relu', name = 'Dense2_a')(dense_1_a)
  dense_2_a = Dropout(0.2)(dense_2_a)
  dense_3_a = Dense(150, activation = 'relu', name = 'Dense3_a')(dense_2_a)
  dense_3_a = Dropout(0.2)(dense_3_a)
  # add the latent space
  latent_space_a = Dense(512, activation = 'relu', name = 'LatentSpace_a')(dense_3_a)
  latent_space_a = Dropout(0.2)(latent_space_a)
  # Decoder network
  dense_4_a = Dense(150, activation = 'relu', name = 'Dense4_a')(latent_space_a)
  dense_4_a = Dropout(0.2)(dense_4_a)
  dense_5_a = Dense(300, activation = 'relu', name = 'Dense5_a')(dense_4_a)
  dense_5_a = Dropout(0.2)(dense_5_a)
  dense_6_a = Dense(150, activation = 'relu', name = 'Dense6_a')(dense_5_a)
  dense_6_a = Dropout(0.2)(dense_6_a)

  # linear output, as for Result1
  result_a = Dense(1, activation = 'linear', name = 'Result2')(dense_6_a)
  ##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ##### End Autoencoder
  ##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


  ##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ##### Begin Linear Network
  ##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  ## create keras model
  dim_embeddings = 20

  ## rows: cell index of data 1 -> non-negative H1 embedding
  # (Embedding layers are trainable by default, so no extra flag is needed)
  row_input = Input(shape=[1], name='X1-Row')
  masked_row_input = Masking(mask_value=-10)(row_input)
  row_embedding = Embedding(n_rows1 + 1, dim_embeddings, name = "H1-Embedding",
                            embeddings_constraint=tf.keras.constraints.NonNeg())(masked_row_input)
  #row_bias = Embedding(n_rows1 + 1, bias, name="Row-Bias")(row_input)

  ## cols: cell index of data 2 -> non-negative H2 embedding
  col_input = Input(shape=[1],name='X2-Row')
  masked_col_input = Masking(mask_value=-10)(col_input)
  col_embedding = Embedding(n_rows2 + 1, dim_embeddings, name="H2-Embedding",
                            embeddings_constraint=tf.keras.constraints.NonNeg())(masked_col_input)
  #col_bias = Embedding(n_rows2 + 1, bias, name="Col-Bias")(col_input)

  ## rc: gene index shared by both datasets -> non-negative W embedding
  rc_input = Input(shape=[1], name='RC-Column')
  masked_rc_input = Masking(mask_value=-10)(rc_input)
  # cols 1 or cols 2 will do (both datasets have 3,000 genes)
  rc_embedding = Embedding(n_cols1 + 1, dim_embeddings, name="W-Embedding",
                          embeddings_constraint=tf.keras.constraints.NonNeg())(masked_rc_input)
  # (batch, 1, 20) -> (batch, 20, 1)
  rc_embedding_t = keras.layers.Permute((2, 1))(rc_embedding)
  #rc_bias = Embedding(n_cols_ + 1, bias, name="RC-Bias")(rc_input)

  ## matrix product
  # Dot(axes=(1, 2)) contracts the two length-1 axes, so each result is the outer product
  # of a 20-d cell embedding and the 20-d gene embedding: shape (batch, 20, 20)
  matrix_product = Dot(axes=(1, 2))([row_embedding, rc_embedding_t])
  matrix_product = Dropout(0.2)(matrix_product)
  matrix_product_ = Dot(axes=(1, 2))([col_embedding, rc_embedding_t])
  matrix_product_ = Dropout(0.2)(matrix_product_)

  #matrix_product_gen = multiply([row_embedding, col_embedding, rc_embedding])

  ## add bias terms
  # input_terms = concatenate([matrix_product, rc_bias, row_bias])
  input_terms = Flatten()(matrix_product)
  # input_terms_ = concatenate([matrix_product, rc_bias, col_bias])
  input_terms_ = Flatten()(matrix_product_)

  ## Construct the linear update rule.

  ## add dense layers
  dense_1 = Dense(150, activation='linear', name = "Dense1")(input_terms)
  # Incorporate the autoencoder
  dense_1 = Multiply()([dense_1a, dense_1])
  dense_1 = Dropout(0.2)(dense_1)
  dense_2 = Dense(120, activation="linear", name = "Dense2")(dense_1)
  dense_2 = Dropout(0.2)(dense_2)
  dense_3 = Dense(90, activation="linear", name = "Dense3")(dense_2)
  dense_3 = Dropout(0.2)(dense_3)
  dense_4 = Dense(60, activation="linear", name = "Dense4")(dense_3)
  dense_4 = Dropout(0.2)(dense_4)
  dense_5 = Dense(20, activation="linear", name = "Dense5")(dense_4)
  dense_5 = Dropout(0.2)(dense_5)
  result = Dense(1, activation='linear', name='X1-Output')(dense_5)
  # add dense layers
  dense_1_ = Dense(150, activation="linear", name = "Dense1_")(input_terms_)
  # Incorporate the autoencoder
  dense_1_ = Multiply()([dense_1_, dense_1_a])
  dense_1_ = Dropout(0.2)(dense_1_)
  dense_2_ = Dense(120, activation="linear", name = "Dense2_")(dense_1_)
  dense_2_ = Dropout(0.2)(dense_2_)
  dense_3_ = Dense(90, activation="linear", name = "Dense3_")(dense_2_)
  dense_3_ = Dropout(0.2)(dense_3_)
  dense_4_ = Dense(60, activation="linear", name = "Dense4_")(dense_3_)
  dense_4_ = Dropout(0.2)(dense_4_)
  dense_5_ = Dense(20, activation="linear", name = "Dense5_")(dense_4_)
  dense_5_ = Dropout(0.2)(dense_5_)
  result_ = Dense(1, activation='linear', name='X2-Output')(dense_5_)

  ##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ##### End Linear Network
  ##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  ## define model with 5 inputs and 4 outputs
  model = Model(inputs=[input_layera, input_layer_a, row_input, col_input, rc_input], outputs=[resulta, result_a, result, result_])

  ## show model summary
  #model.summary()
  # plot_model requires the pydot and graphviz packages to be installed
  keras.utils.plot_model(model, "autoencoder.png", show_shapes=True, expand_nested = True)
  return model
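Two behaviors of the layers used in `NMFL()` are easy to misread from the code. The NumPy sketch below mimics them. It is not Keras itself, and the random 20-dimensional embeddings are made up for illustration. `Dot(axes=(1, 2))` applied to a `(batch, 1, 20)` tensor and a `(batch, 20, 1)` tensor contracts the two length-1 axes, so it yields an outer product of shape `(batch, 20, 20)`. An inner product would instead contract the length-20 axes. Separately, a softmax over a one-unit layer always returns 1.0.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, k = 4, 20

# Stand-ins for the Keras tensors: row_embedding is (batch, 1, k) and
# rc_embedding_t (after Permute((2, 1))) is (batch, k, 1)
row_emb = rng.random((batch, 1, k))
rc_emb_t = np.transpose(rng.random((batch, 1, k)), (0, 2, 1))

# Dot(axes=(1, 2)): contract axis 1 of the first input with axis 2 of the second.
# Both axes have length 1, so every pair of entries is kept: an outer product.
outer = np.einsum("bij,bki->bjk", row_emb, rc_emb_t)

# Contracting the length-k axes instead (axis 2 with axis 1) gives the inner product
inner = np.einsum("bij,bjl->bil", row_emb, rc_emb_t)
print(outer.shape, inner.shape)  # (4, 20, 20) (4, 1, 1)

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# A one-unit softmax divides each value by itself, so the output is always 1.0
single_unit = softmax(rng.normal(size=(5, 1)))
print(single_unit.ravel())  # [1. 1. 1. 1. 1.]
```

The notebook does not state which product is intended. The trace of each outer product equals the corresponding inner product, so the sketch only shows what each axis choice computes.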

In [None]:
model_mf = NMFL()
model_mf.compile(optimizer = Adam(lr=0.0001), loss=['mse', 'mse', 'mse', 'mse'],
                 metrics=['mean_squared_error', 'mean_squared_error',
                          'mean_squared_error', 'mean_squared_error'],
                 loss_weights = [1, 1, 1, 1])
model_mf.summary()

Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
RC-Column (InputLayer)          (None, 1)            0                                            
__________________________________________________________________________________________________
X1-Row (InputLayer)             (None, 1)            0                                            
__________________________________________________________________________________________________
masking_10 (Masking)            (None, 1)            0           RC-Column[0][0]                  
__________________________________________________________________________________________________
X2-Row (InputLayer)             (None, 1)            0                                            
____________________________________________________________________________________________

In [None]:
## Get indexes and inputs
special_value = -10.0
r = (np.array(X1['cell']).reshape(-1, 1))
print("r shape: ",r.shape[0])
c = (np.array(X2['cell']).reshape(-1, 1))
print("c shape: ",c.shape)
rc = (np.array(X1['gene']).reshape(-1, 1))
print("rc shape: ", rc.shape)
max_len = max(r.shape[0], c.shape[0], rc.shape[0])
print("max len: ", max_len)
padded_sequence = tf.keras.preprocessing.sequence.pad_sequences([r, c, rc], value=-10, padding="post")
rp = padded_sequence[0]
print("rp shape: ",rp.shape)
cp = padded_sequence[1]
print("cp shape: ",cp.shape)
rcp = padded_sequence[2]
print("rcp shape: ",rcp.shape)
padded_target = tf.keras.preprocessing.sequence.pad_sequences([X1['val'], X2['val']], value=-10, padding="post")
val1 = padded_target[0]
val2 = padded_target[1]
x1_pad = np.asarray(x1_pad)
x1_pad = x1_pad.reshape(-1, 1)
#x1_pad = np.transpose(x1_pad)
x2_pad = x2_pad.reshape(-1, 1)
#x2_pad = np.transpose(x2_pad)

x1_pad.shape

r shape:  9000000
c shape:  (9000000, 1)
rc shape:  (9000000, 1)
max len:  9000000
rp shape:  (9000000, 1)
cp shape:  (9000000, 1)
rcp shape:  (9000000, 1)


(9000000, 1)

In [None]:
## fit model
history_mf = model_mf.fit({"X1-Input": x1_pad, "X2-Input": x2_pad, "X1-Row": rp, "X2-Row": cp, "RC-Column": rcp},
                          {"Result1": x1_pad, "Result2": x2_pad, "X1-Output": val1, "X2-Output": val2},
                          batch_size = 256,
                          validation_split = 0.2,
                          epochs = 50,
                          verbose = 1)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 7200000 samples, validate on 1800000 samples
Epoch 1/50
 820992/7200000 [==>...........................] - ETA: 8:53 - loss: 7.0329 - Result1_loss: 3.3831 - Result2_loss: 1.2853 - X1-Output_loss: 1.9890 - X2-Output_loss: 0.3755 - Result1_mean_squared_error: 3.3831 - Result2_mean_squared_error: 1.2853 - X1-Output_mean_squared_error: 1.9890 - X2-Output_mean_squared_error: 0.3755
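Keras takes the `validation_split` fraction from the last samples of the arrays passed to `fit`, before any shuffling. The melted arrays are ordered cell by cell, so `validation_split = 0.2` holds out the last 20% of the cell columns as a block rather than random matrix entries. Random (gene, cell) entries may be the intended validation unit, although holding out whole cells may also be deliberate. If so, a shuffled index split could be drawn first, for example with the hypothetical helper below.

```python
import numpy as np

def random_holdout(n_samples, val_fraction=0.2, seed=0):
    """Split range(n_samples) into shuffled training and validation indices (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return perm[n_val:], perm[:n_val]

train_idx, val_idx = random_holdout(10)
print(len(train_idx), len(val_idx))  # 8 2
```

Each input and target array would then be indexed with `train_idx` for training and with `val_idx` for the `validation_data` argument of `fit`. With 9,000,000 samples, this fancy indexing copies every array, which roughly doubles memory use.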

In [None]:
# print out the layers with their positions in model_mf.layers
for index, lay in enumerate(model_mf.layers):
  print(lay.name, 'at index: ', index)

In [None]:
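## get the learned W (genes), H1 (cells of data 1) and H2 (cells of data 2) embedding matrices
# Look the layers up by name rather than by a hard-coded position in model_mf.layers
W = model_mf.get_layer("W-Embedding").get_weights()[0]
H1 = model_mf.get_layer("H1-Embedding").get_weights()[0]
H2 = model_mf.get_layer("H2-Embedding").get_weights()[0]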
print("W shape: ", W.shape)
print("H1 shape: ", H1.shape)
print("H2 shape: ", H2.shape)

In [None]:
(pd.DataFrame(W)).head()

In [None]:
def plot3D(df, title):
  # Project the rows of df onto their first three principal components and plot them in 3D
  my_dpi = 96
  df = pd.DataFrame(df)
  pca = PCA(n_components=3)
  # Store results of PCA in a data frame
  result = pd.DataFrame(pca.fit_transform(df), columns=['PCA%i' % i for i in range(3)], index=df.index)

  # Plot initialisation: a single 480 x 480 pixel figure
  fig = plt.figure(figsize=(480/my_dpi, 480/my_dpi), dpi=my_dpi)
  ax = fig.add_subplot(111, projection='3d')
  ax.scatter(result['PCA0'], result['PCA1'], result['PCA2'], s=60)

  # label the axes and set the title
  ax.set_xlabel('PC1')
  ax.set_ylabel('PC2')
  ax.set_zlabel('PC3')
  ax.set_title(title)
  plt.show()

In [None]:
d1 = data1.drop(columns=['Unnamed: 0'])
d2 = data2.drop(columns=['Unnamed: 0'])
plot3D(d1, "Data 1")
plot3D(d2, "Data 2")
plot3D(pd.DataFrame(W), "W")