# ClosedFormDeconvolutionTest
```
Andrew E. Davidson
aedavids@ucsc.edu
10/25/22
```

Cibersort took 3.5 days to run on the GTEx_TCGA training data set with the ???25 best genes selected. The resulting GEP (gene signature matrix) has about 825 genes.

This trivial example demonstrates that we are able to solve for F given G and M using a closed-form solution.

In addition to performance, I think this is a better solution. Other linear deconvolution algorithms are of the form F*G = M and use an iterative method to solve for F given G and M. They assume that the value of F that minimizes the error of the set of regression models, i.e. sum( (M-Mhat)^2 ), is the correct one.
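The objective above is ordinary least squares in F. Below is a short sketch of the standard closed-form derivation. It assumes G (types × genes) has full row rank, so that $GG^\top$ is invertible. Note that this unconstrained minimizer does not force F to be non-negative, and it does not force the rows of F to sum to 1.

$$
\begin{aligned}
\hat F &= \arg\min_F \lVert M - F G \rVert_F^2 = \arg\min_F \sum_{i,j} \bigl(M_{ij} - (F G)_{ij}\bigr)^2 \\
\nabla_F \lVert M - F G \rVert_F^2 &= -2\,(M - F G)\,G^\top = 0
\;\Longrightarrow\; F\,G G^\top = M G^\top
\;\Longrightarrow\; \hat F = M G^\top \left(G G^\top\right)^{-1}
\end{aligned}
$$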

We have an unusual data set: we have a label for each sample. What we want is to figure out the best gene signature matrix, that is, the one that minimizes sum( (F-Fhat)^2 ).
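Here is a minimal sketch of this objective. It assumes each sample's label is an integer type index, which is a hypothetical encoding. `oneHotFractions`, `fractionError` and `top1Agreement` are illustrative helpers, not part of the notebook. With single-type samples, the expected F is a one-hot matrix, and the largest estimated fraction in each row should be the labeled type.

```python
import numpy as np

def oneHotFractions(labels, numTypes):
    '''
    hypothetical helper: expected fraction matrix for single-type samples
    labels: integer type index per sample, shape (numSamples,)
    returns F with F[i, labels[i]] = 1 and zeros elsewhere
    '''
    F = np.zeros((len(labels), numTypes))
    F[np.arange(len(labels)), labels] = 1.0
    return F

def fractionError(F, FHat):
    '''sum of squared differences between expected and estimated fractions'''
    return np.sum((F - FHat) ** 2)

def top1Agreement(labels, FHat):
    '''fraction of samples whose largest estimated fraction is the labeled type'''
    return np.mean(np.argmax(FHat, axis=1) == labels)

# toy example: 3 samples, 3 types
labels = np.array([2, 0, 1])
FHat = np.array([
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
])
F = oneHotFractions(labels, numTypes=3)
print(fractionError(F, FHat))
print(top1Agreement(labels, FHat))
```

In this toy example the third sample's largest fraction is type 0 rather than its label 1, so top-1 agreement is 2/3 even though every row has a non-zero fraction for its own type.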

**<span style="color:red">TODO</span>**
1. we are not always able to calculate an inverse. 
    + after squaring G, we sometimes get `LinAlgError: Singular matrix` from np.linalg.inv, i.e. no unique solution
    + see the cells below: sometimes we get an inverse, however multiplying GSq * GInv does not produce the identity matrix
    + ref:
        - [numpy inverse](https://numpy.org/doc/stable/reference/generated/numpy.linalg.inv.html)
        - [numpy pseudo inverse](https://numpy.org/doc/stable/reference/generated/numpy.linalg.pinv.html)

2. explore conditions under which this approach will and will not work.
    + does numpy have a pseudo inverse? (yes: np.linalg.pinv, see ref above)
3. does it produce same results as cibersort?
    + 10/27/2022
    + we have not evaluated cibersort yet. We know that some samples have high correlations and others low. To see if there is any evidence that our algorithm works, we selected cibersort results with R > 0.9 and p-value <= 0.01. This results in 9,873 samples and a total of 819,459 fractions: 819,459 = 9,873 * 83.
    
    + **preprocessing**
        1. set values < 0 to 0
        2. normalize (i.e. row sums = 1)
        3. round(decimals=2) (last column of the table only)

    np.isclose() reports:

    | atol | accuracy % | accuracy % after round(decimals=2) |
    |--- | --- | --- |
    |1e-08 | 19.22 | 47.19|
    |1e-05 | 20.58 | |
    |1e-03 | 38.48 | 47.19 |
    |1e-02 | 51.33 | 53.13 |
    
    <span style="color:red">Could accuracy be inflated?</span> We have 83 types, but each sample comes from a single type, so most reference fractions are 0 and many matches may simply be zeros matching zeros.
    + <span style="color:red">TODO</span>
    1. we need to evaluate the cibersort results first.
    2. assuming the cibersort results are good, we need to check that our deconvolution results look like a one-hot encoding.

4. is it faster than cibersort?
    + measure time, memory, and CPU
5. Let's assume the closed form allows us to calculate F quickly. How do we improve the signature matrix?
    + find under-represented samples and add 1vsAll genes
    + normalize (min/max scale) G and M
        - linear models, deep models, and SVM models tend to focus on features with large values and ignore features with small values
    + drop collinear variables or variables with low variance
        - linear models tend to perform better with fewer redundant variables
    + Shearing ??? feature importance
    + Permutation feature importance
        - train the model and make a prediction
        - for each feature
            * scramble the feature (removes all of its information)
            * make a prediction
            * compare to the unscrambled reference
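
The permutation feature importance steps above can be sketched with the closed-form deconvolution as the "model" and genes as the features. The following assumptions are hypothetical and not taken from the notebook:

* importance is measured as the increase in sum((F - FHat)^2) after scrambling one gene's column of M across samples;
* the data are synthetic;
* `closedFormFractions` and `permutationImportance` are illustrative helpers.

```python
import numpy as np

def closedFormFractions(M, G):
    '''F = M G^T (G G^T)^-1, computed with a linear solve instead of an explicit inverse'''
    return np.linalg.solve(G @ G.T, G @ M.T).T

def permutationImportance(M, G, FTrue, seed=0):
    '''
    hypothetical helper: importance of gene j = increase in sum((FTrue - FHat)^2)
    after scrambling column j of the mixture matrix M across samples
    '''
    rng = np.random.default_rng(seed)
    baseline = np.sum((FTrue - closedFormFractions(M, G)) ** 2)
    importance = np.zeros(M.shape[1])
    for j in range(M.shape[1]):
        MPerm = M.copy()
        MPerm[:, j] = rng.permutation(MPerm[:, j])  # scramble gene j (removes its information)
        FHat = closedFormFractions(MPerm, G)
        importance[j] = np.sum((FTrue - FHat) ** 2) - baseline
    return importance

# synthetic data: 20 samples, 3 types, 10 genes
rng = np.random.default_rng(42)
G = rng.random((3, 10))
FTrue = rng.dirichlet(np.ones(3), size=20)   # each row sums to 1
M = FTrue @ G
importance = permutationImportance(M, G, FTrue)
print(np.argsort(importance)[::-1])  # genes ranked from most to least important
```

Because the closed form is cheap, re-solving once per gene stays fast. This matters for the ~830 genes in the real signature matrix.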
 

In [1]:
from IPython.display import display
import numpy as np
import pandas as pd
import pathlib as pl
import shutil

numGenes = 4
numTypes = 3
numSamples = 5

In [2]:
# not invertible (G*G.transpose() is singular): G = (np.arange(0, numTypes * numGenes) + 1).reshape(numTypes , numGenes)
G =  np.array([
             [ 1,  2,  3, 13]
            ,[ 4,  5,  6, 14]
            ,[ 7,  8,  9, 14]
            #,[10, 11, 12]
              ]).astype('float')

In [3]:
F = np.array([
    [0, 1/2, 2/3],
    [1/6, 3/6, 2/6],
    [5/9, 3/9, 1/9]
])

In [4]:
M = np.matmul(F, G)
M

array([[ 6.66666667,  7.83333333,  9.        , 16.33333333],
       [ 4.5       ,  5.5       ,  6.5       , 13.83333333],
       [ 2.66666667,  3.66666667,  4.66666667, 13.44444444]])

# Can we solve the trivial problem for F, given G and M?

In [5]:
GSq = np.matmul(G, G.transpose())
GInv = np.linalg.inv( GSq )

GInv

array([[  99.        , -179.33333333,   87.33333333],
       [-179.33333333,  324.92592593, -158.25925926],
       [  87.33333333, -158.25925926,   77.09259259]])

In [6]:
# check
print("check expected I")
np.matmul( GSq, GInv)

check expected I


array([[ 1.00000000e+00,  1.00044417e-11, -5.79802872e-12],
       [-4.68958206e-12,  1.00000000e+00, -9.57811608e-12],
       [-3.21165317e-12,  1.02318154e-11,  1.00000000e+00]])

```
F*G = M
F*G*G.transpose = M*G.transpose    # G is not square; G*G.transpose is square, so we can try to invert it
F*(G*G.transpose)*inv(G*G.transpose) = M*G.transpose*inv(G*G.transpose)
F*I = M*G.transpose*inv(G*G.transpose)
F = M*G.transpose*inv(G*G.transpose)
```
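The same F can be obtained without forming an explicit inverse. This is a sketch, not the notebook's method. Since $GG^\top$ is symmetric, $F\,GG^\top = MG^\top$ is equivalent to $GG^\top F^\top = GM^\top$, which `np.linalg.solve` handles directly. `np.linalg.lstsq` goes one step further and works on G itself. That may matter because the condition number of $GG^\top$ is the square of the condition number of G.

```python
import numpy as np

# toy signature matrix G (types x genes) and fractions F (samples x types) from the cells above
G = np.array([[1, 2, 3, 13],
              [4, 5, 6, 14],
              [7, 8, 9, 14]], dtype=float)
F = np.array([[0, 1/2, 2/3],
              [1/6, 3/6, 2/6],
              [5/9, 3/9, 1/9]])
M = F @ G

# F (G G^T) = M G^T  <=>  (G G^T) F^T = G M^T  (G G^T is symmetric),
# so solve the linear system instead of forming inv(G G^T) explicitly
FHatSolve = np.linalg.solve(G @ G.T, G @ M.T).T

# lstsq works on G directly: minimize ||G^T X - M^T|| over X = F^T, no G G^T needed
FHatLstsq, residuals, rank, singularValues = np.linalg.lstsq(G.T, M.T, rcond=None)
FHatLstsq = FHatLstsq.T

print(FHatSolve)
print(FHatLstsq)
```

If G is rank deficient, `lstsq` returns the minimum-norm least-squares solution instead of raising, and the returned `rank` shows the deficiency.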

In [7]:
print("M.shape:{}".format(M.shape))
print("G.shape:{}".format(G.shape))
FHat = np.matmul( np.matmul(M, G.transpose()), GInv )
print("FHat")
print(FHat)
print("\n------------- F")
print(F)

M.shape:(3, 4)
G.shape:(3, 4)
FHat
[[-1.61198462e-11  5.00000000e-01  6.66666667e-01]
 [ 1.66666667e-01  5.00000000e-01  3.33333333e-01]
 [ 5.55555556e-01  3.33333333e-01  1.11111111e-01]]

------------- F
[[0.         0.5        0.66666667]
 [0.16666667 0.5        0.33333333]
 [0.55555556 0.33333333 0.11111111]]


In [8]:
assert np.isclose(FHat, F).all(), "ERROR Solve for F failed"

# check code to calculate inverse

In [9]:
def getInverse(A, usePseudo=False):
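    '''
    returns (A, AInverse)
    If A is not square, it is replaced by A*A.transpose() and the function
    returns (A*A.transpose(), inverse of A*A.transpose())
    usePseudo=True uses np.linalg.pinv instead of np.linalg.inv
    '''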
    m,n = A.shape
    if m != n :
        # square the matrix
        A = np.matmul(A, A.transpose())
    
    if usePseudo:
        # https://numpy.org/doc/stable/reference/generated/numpy.linalg.pinv.html
        ret = np.linalg.pinv(A)
    else:
        ret = np.linalg.inv(A)
    return (A, ret)

def checkInverse(A, AInverse):
    '''
    returns A * AInverse
    '''
    print("A:\n {}".format(A))
    print("\nA inverse:\n {}".format(AInverse))
    print("\nA * AInverse")
    ret = np.matmul(A, AInverse)
    print(ret)
    return ret

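def deconvolve(M, G, GInverse):
    '''
    closed form solution: F = M * G.transpose * GInverse
    GInverse is the inverse of G * G.transpose, e.g. the second value returned by getInverse(G)
    '''
    ret = np.matmul( np.matmul(M, G.transpose()), GInverse)
    return ret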

In [10]:
# 3x3 T is singular (rank 2): it has no true inverse
T = np.array([
             [ 1,  2,  3]
            ,[ 4,  5,  6]
            ,[ 7,  8,  9]
            #,[10, 11, 12]
              ]).astype('float')


# pseudo inverse
# A inverse with usePseudo=True (the isclose assert fails):
#  [[-6.38888889e-01 -1.66666667e-01  3.05555556e-01]
#  [-5.55555556e-02  3.36727575e-17  5.55555556e-02]
#  [ 5.27777778e-01  1.66666667e-01 -1.94444444e-01]]

# usePseudo=False raises LinAlgError: Singular matrix
TSq, TInv = getInverse(T, usePseudo=True)
shouldBeIdentity = checkInverse(TSq, TInv)
m,n = TSq.shape
assert np.isclose(shouldBeIdentity, np.eye(m,n)).all(), "ERROR calc inverse failed"

A:
 [[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

A inverse:
 [[-6.38888889e-01 -1.66666667e-01  3.05555556e-01]
 [-5.55555556e-02  3.36727575e-17  5.55555556e-02]
 [ 5.27777778e-01  1.66666667e-01 -1.94444444e-01]]

A * AInverse
[[ 0.83333333  0.33333333 -0.16666667]
 [ 0.33333333  0.33333333  0.33333333]
 [-0.16666667  0.33333333  0.83333333]]


AssertionError: ERROR calc inverse failed
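For a singular A, `A @ pinv(A)` is not I. It is the orthogonal projector onto the column space of A, so its trace equals the rank of A. That matches the `A * AInverse` output above: its diagonal 0.833 + 0.333 + 0.833 sums to 2, the rank of T. A minimal check:

```python
import numpy as np

# T from the cell above: rank 2, so it has no true inverse
T = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]], dtype=float)
P = T @ np.linalg.pinv(T)

rank = np.linalg.matrix_rank(T)
print(rank)                    # 2
print(np.trace(P))             # approximately 2.0: P projects onto a 2-D subspace
print(np.allclose(P @ P, P))   # P is idempotent, i.e. a projection
print(np.allclose(P, P.T))     # and symmetric, i.e. an orthogonal projection
```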

In [11]:
T2 = np.array([
             [ 1,  2,  3]
            ,[ 4,  5,  6]
            ,[ 7,  8,  9]
            ,[10, 11, 12]
              ]).astype('float')

# A inverse with usePseudo=False: huge values (~1e13) because T2*T2.transpose() is singular (rank 2); the assert fails
#  [[-4.69124961e+13  7.03687442e+13  1.25000000e-01 -2.34562481e+13]
#  [ 9.96890543e+13 -1.40737488e+14 -1.75921860e+13  5.86406201e+13]
#  [-5.86406201e+13  7.03687442e+13  3.51843721e+13 -4.69124961e+13]
#  [ 5.86406201e+12 -0.00000000e+00 -1.75921860e+13  1.17281240e+13]]

TSq2, TInv2 = getInverse(T2, usePseudo=True)
shouldBeIdentity2 = checkInverse(TSq2, TInv2)
m,n = TSq2.shape
assert np.isclose(shouldBeIdentity2, np.eye(m,n)).all(), "ERROR calc inverse failed"

A:
 [[ 14.  32.  50.  68.]
 [ 32.  77. 122. 167.]
 [ 50. 122. 194. 266.]
 [ 68. 167. 266. 365.]]

A inverse:
 [[ 0.40833333  0.21111111  0.01388889 -0.18333333]
 [ 0.21111111  0.10925926  0.00740741 -0.09444444]
 [ 0.01388889  0.00740741  0.00092593 -0.00555556]
 [-0.18333333 -0.09444444 -0.00555556  0.08333333]]

A * AInverse
[[ 0.7  0.4  0.1 -0.2]
 [ 0.4  0.3  0.2  0.1]
 [ 0.1  0.2  0.3  0.4]
 [-0.2  0.1  0.4  0.7]]


AssertionError: ERROR calc inverse failed

In [12]:
T3 = np.array([
             [ 1,  2,  3, 13]
            ,[ 4,  5,  6, 14]
            ,[ 7,  8,  9, 14]
            #,[10, 11, 12]
              ]).astype('float')

# A inverse: usePseudo=False
#  [[  99.         -179.33333333   87.33333333]
#  [-179.33333333  324.92592593 -158.25925926]
#  [  87.33333333 -158.25925926   77.09259259]]

TSq3, TInv3 = getInverse(T3, usePseudo=True)
shouldBeIdentity3 = checkInverse(TSq3, TInv3)
m,n = TSq3.shape
assert np.isclose(shouldBeIdentity3, np.eye(m,n)).all(), "ERROR calc inverse failed"

A:
 [[183. 214. 232.]
 [214. 273. 318.]
 [232. 318. 390.]]

A inverse:
 [[  99.         -179.33333333   87.33333333]
 [-179.33333333  324.92592592 -158.25925926]
 [  87.33333333 -158.25925926   77.09259259]]

A * AInverse
[[ 1.00000000e+00  1.50066626e-11 -9.89075488e-12]
 [ 1.19371180e-12  1.00000000e+00  4.63273864e-12]
 [ 9.15179044e-12 -1.83035809e-11  1.00000000e+00]]
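For TODO item 1, the matrix could be checked before it is inverted. Here is a minimal sketch using `np.linalg.matrix_rank` and `np.linalg.cond`. `checkInvertible` is a hypothetical helper, and the `maxCond` threshold is an assumption, not a value from this notebook.

```python
import numpy as np

def checkInvertible(A, maxCond=1e12):
    '''
    hypothetical helper: decide whether a square matrix A is safe to invert
    returns (ok, rank, conditionNumber)
    maxCond is an assumed threshold, not a value taken from the notebook
    '''
    n = A.shape[0]
    rank = np.linalg.matrix_rank(A)
    cond = np.linalg.cond(A)   # very large (or inf) for singular matrices
    ok = bool(rank == n and cond < maxCond)
    return ok, rank, cond

T = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
T3 = np.array([[1, 2, 3, 13], [4, 5, 6, 14], [7, 8, 9, 14]], dtype=float)

for name, A in [("T", T), ("T3 * T3.transpose", T3 @ T3.T)]:
    ok, rank, cond = checkInvertible(A)
    print("{}: ok={} rank={} cond={:.3g}".format(name, ok, rank, cond))
```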


# Compare closed form solution vs. Cibersort

## Load Data

In [65]:
# copy data from private to local disk
kl       = pl.Path("/private/groups/kimlab")
rootDir  = pl.Path("/scratch/aedavids/GTEx_TCGA")
bestRoot = pl.Path("geneSignatureProfiles/best")
oneVsAll = pl.Path("GTEx_TCGA_1vsAll-design:~__gender_+_category-padj:0.001-lfc:2.0-n:25")
inputRootDir = rootDir.joinpath(bestRoot)

bestSrc = kl.joinpath("GTEx_TCGA/geneSignatureProfiles/best").joinpath(oneVsAll)
#print("bestSrc:\n{}".format(bestSrc))

notebookName="closedFormDeconvolution"
inputLocalDir = rootDir.joinpath(notebookName).joinpath("input")
print("\ninputLocalDir:\n{}".format(inputLocalDir))
inputLocalDir.mkdir(parents=True, exist_ok=True)

mixtureMatrixFileName = "GTEx_TCGA_TrainGroupby_mixture.txt"
mixtureMatrixSrc = bestSrc.joinpath("ciberSort").joinpath(mixtureMatrixFileName)
mixtureMatrix = inputLocalDir.joinpath(mixtureMatrixFileName)
print("\nmixtureMatrixSrc:\n{}".format(mixtureMatrixSrc))

def tree(directory):
    '''
    debug file paths
    '''
    print(f'+ {directory}')
    for path in sorted(directory.rglob('*')):
        depth = len(path.relative_to(directory).parts)
        spacer = '    ' * depth
        print(f'{spacer}+ {path.name}')
        
#tree(mixtureMatrixSrc.parent)


if not mixtureMatrix.exists():
    shutil.copy(mixtureMatrixSrc, mixtureMatrix)

signatureMatrixFileName = "signatureGenes.tsv"
signatureMatrix = inputLocalDir.joinpath(signatureMatrixFileName)
signatureMatrixSrc = bestSrc.joinpath("ciberSort").joinpath(signatureMatrixFileName)
if not signatureMatrix.exists():
    shutil.copy(signatureMatrixSrc, signatureMatrix)
print("\nsignatureMatrixSrc:\n{}".format(signatureMatrixSrc))   


inputLocalDir:
/scratch/aedavids/GTEx_TCGA/closedFormDeconvolution/input

mixtureMatrixSrc:
/private/groups/kimlab/GTEx_TCGA/geneSignatureProfiles/best/GTEx_TCGA_1vsAll-design:~__gender_+_category-padj:0.001-lfc:2.0-n:25/ciberSort/GTEx_TCGA_TrainGroupby_mixture.txt

signatureMatrixSrc:
/private/groups/kimlab/GTEx_TCGA/geneSignatureProfiles/best/GTEx_TCGA_1vsAll-design:~__gender_+_category-padj:0.001-lfc:2.0-n:25/ciberSort/signatureGenes.tsv


In [66]:
%%time
# Create pandas dataframe
outputDir = rootDir.joinpath(notebookName).joinpath("output")
outputDir.mkdir(parents=True, exist_ok=True)

outputFile = outputDir.joinpath("fractions.tsv")
if outputFile.exists():
    print("{} exists".format(outputFile))
else:
    signatureDF = pd.read_csv(signatureMatrix, sep='\t')
    signatureDF.drop(columns=["name"], inplace=True)
    
    mixtureDF = pd.read_csv(mixtureMatrix, sep='\t')
    mixtureDF.drop(columns=["sampleTitle"], inplace=True)
    
    # convert from cibersort expected format to
    # standard linear algebra format
    signatureDF = signatureDF.transpose()
    mixtureDF = mixtureDF.transpose()

CPU times: user 7.09 s, sys: 230 ms, total: 7.32 s
Wall time: 7.32 s


In [67]:
print(mixtureDF.shape)
mixtureDF.iloc[0:5,0:5]

(15801, 832)


| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| GTEX-1117F-0226-SM-5GZZ7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| GTEX-1117F-0526-SM-5EGHJ | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| GTEX-1117F-0726-SM-5GIEN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| GTEX-1117F-2826-SM-5GZXL | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| GTEX-1117F-3226-SM-5N9CT | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [68]:
%%time
# find fraction matrix
mixtureNP = mixtureDF.to_numpy()
print("mixtureNP.shape: {}".format(mixtureNP.shape))

signatureNP = signatureDF.to_numpy()
print("signatureNP.shape: {}".format(signatureNP.shape))

signatureTransposeNP = signatureNP.transpose()
print("signatureTransposeNP.shape: {}".format(signatureTransposeNP.shape))

signatureSqNP, signatureInvNP = getInverse(signatureNP, usePseudo=True)
print("signatureInvNP.shape: {}".format(signatureInvNP.shape))

fractionsNP = np.matmul( np.matmul(mixtureNP, signatureTransposeNP), signatureInvNP )
print("fractionsNP.shape: {}".format(fractionsNP.shape))

mixtureNP.shape: (15801, 832)
signatureNP.shape: (83, 832)
signatureTransposeNP.shape: (832, 83)
signatureInvNP.shape: (83, 83)
fractionsNP.shape: (15801, 83)
CPU times: user 2.33 s, sys: 5.78 s, total: 8.11 s
Wall time: 85 ms
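For TODO item 4 (measure time, memory and CPU), note that the %%time output above reports 8.11 s of CPU time against 85 ms of wall time. That is consistent with multithreaded BLAS. The following is a minimal measurement sketch on synthetic stand-in data; the data and the 1,000-sample size are assumptions. Keep in mind that `tracemalloc` only sees memory that Python and NumPy report to it, so BLAS workspace may not be included.

```python
import time
import tracemalloc
import numpy as np

def closedFormFractions(M, G):
    '''F = M G^T (G G^T)^-1, via a linear solve'''
    return np.linalg.solve(G @ G.T, G @ M.T).T

# synthetic stand-ins with the same number of types and genes as the GTEx_TCGA run
# (83 types x 832 genes); 1,000 samples instead of 15,801 to keep the sketch small
rng = np.random.default_rng(0)
G = rng.random((83, 832))
M = rng.dirichlet(np.ones(83), size=1000) @ G

tracemalloc.start()
wallStart, cpuStart = time.perf_counter(), time.process_time()
FHat = closedFormFractions(M, G)
wallSeconds = time.perf_counter() - wallStart
cpuSeconds = time.process_time() - cpuStart   # CPU time summed over all threads of this process
_, peakBytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("wall {:.3f}s  cpu {:.3f}s  peak traced memory {:,} bytes".format(wallSeconds, cpuSeconds, peakBytes))
```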


In [69]:
# check fractionsNP has expected shape
assert fractionsNP.shape[0] * fractionsNP.shape[1] == 15801 * 83

In [70]:
# check there are no negative values
negativeLogicalNP = fractionsNP < 0
print( np.sum(negativeLogicalNP) )

663372


## Compare results
todo
1. check the shape of fractionsNP
2. check fractions sum to 1
3. we have 663,372 negative fractions. There should be no values < 0
    * select the positive values and compare with cibersort results
    * cibersort probably clips gradients; can we round our results?
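
The three checks in this todo list can be bundled into one function. This is a minimal sketch; `checkFractions` is a hypothetical helper, and `atol` is an assumed tolerance. It checks that every row sums to 1. That is stricter than comparing the total of all row sums with the number of rows, where rows above 1 can cancel rows below 1.

```python
import numpy as np

def checkFractions(F, numSamples, numTypes, atol=1e-6):
    '''
    hypothetical helper: validity checks for an estimated fraction matrix
    returns a dict of booleans plus the count of negative entries
    '''
    rowSums = F.sum(axis=1)
    return {
        "shapeOK": F.shape == (numSamples, numTypes),
        "rowsSumTo1": bool(np.allclose(rowSums, 1.0, atol=atol)),  # every row, not just the total
        "numNegative": int(np.sum(F < 0)),                          # zeros are valid fractions
    }

F = np.array([[0.2, 0.9, -0.1],
              [0.5, 0.5,  0.0]])
print(checkFractions(F, numSamples=2, numTypes=3))

# rows sum to 0.5 and 1.5: the total equals the number of rows, but no row sums to 1
weak = np.array([[0.5, 0.0],
                 [1.0, 0.5]])
print(np.isclose(weak.sum(), weak.shape[0]))      # the total-sum check passes
print(checkFractions(weak, 2, 2)["rowsSumTo1"])   # the per-row check fails
```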

In [71]:
# load cibersort results
cibersortOut = pl.Path("/scratch/aedavids/cibersort.out/GTEx_TCGA_TrainGroupby_mixture-2022-10-18-07.40.54-PDT")
cibersortRet = cibersortOut.joinpath("CIBERSORTx_GTEx_TCGA_TrainGroupby_mixture-2022-10-18-07.40.54-PDT_Results.txt")
cibersortFractionsDF = pd.read_csv(cibersortRet, sep='\t')

In [72]:
# select samples with large R
display( cibersortFractionsDF.loc[:, ['Mixture', 'P-value', 'Correlation', 'RMSE']].head())
selectRowsWithHighR = (cibersortFractionsDF.loc[:, 'Correlation'] > 0.9) & (cibersortFractionsDF.loc[:, 'P-value'] <= 0.01)
print( sum(selectRowsWithHighR) )
cibersortFractionLargeRDF = cibersortFractionsDF.loc[selectRowsWithHighR, ['Mixture', 'P-value', 'Correlation', 'RMSE']]

# we need index values to select fractions we want to compare
# check index works as expected, i.e. monotonic increase, with gaps
print("\n check index works as expected, i.e. monotonic increase, with gaps")
display( cibersortFractionLargeRDF.head() )
display( cibersortFractionLargeRDF.tail() )

cibersortLargeRIdxNP = cibersortFractionLargeRDF.index.to_numpy()
cibersortLargeRIdxNP

| | Mixture | P-value | Correlation | RMSE |
|---|---|---|---|---|
| 0 | GTEX-1117F-0226-SM-5GZZ7 | 0.0 | 0.985425 | 0.926455 |
| 1 | GTEX-1117F-0526-SM-5EGHJ | 0.0 | 0.979791 | 0.934695 |
| 2 | GTEX-1117F-0726-SM-5GIEN | 0.0 | 0.984906 | 0.447916 |
| 3 | GTEX-1117F-2826-SM-5GZXL | 0.0 | 0.988464 | 0.907675 |
| 4 | GTEX-1117F-3226-SM-5N9CT | 0.0 | 0.917052 | 0.949553 |


9873

 check index works as expected, i.e. monotonic increase, with gaps


| | Mixture | P-value | Correlation | RMSE |
|---|---|---|---|---|
| 0 | GTEX-1117F-0226-SM-5GZZ7 | 0.0 | 0.985425 | 0.926455 |
| 1 | GTEX-1117F-0526-SM-5EGHJ | 0.0 | 0.979791 | 0.934695 |
| 2 | GTEX-1117F-0726-SM-5GIEN | 0.0 | 0.984906 | 0.447916 |
| 3 | GTEX-1117F-2826-SM-5GZXL | 0.0 | 0.988464 | 0.907675 |
| 4 | GTEX-1117F-3226-SM-5N9CT | 0.0 | 0.917052 | 0.949553 |


| | Mixture | P-value | Correlation | RMSE |
|---|---|---|---|---|
| 15766 | UVM-V4-A9ET-TP | 0.0 | 0.901526 | 0.954368 |
| 15775 | UVM-V4-A9F8-TP | 0.0 | 0.905897 | 0.952661 |
| 15778 | UVM-VD-A8KB-TP | 0.0 | 0.932183 | 0.95182 |
| 15793 | UVM-WC-A883-TP | 0.0 | 0.939317 | 0.713812 |
| 15800 | UVM-YZ-A985-TP | 0.0 | 0.916389 | 0.951274 |


array([    0,     1,     2, ..., 15778, 15793, 15800])

In [73]:
# select fractions
cibersortFractionsNP = cibersortFractionsDF.drop(columns=['Mixture', 'P-value', 'Correlation', 'RMSE']).to_numpy()
print("cibersortFractionsNP.shape: {}".format(cibersortFractionsNP.shape))

cibersortFractionsNP.shape: (15801, 83)


In [74]:
# check shape
assert fractionsNP.shape == cibersortFractionsNP.shape, "ERROR closed form fraction shape does not match cibersort result"

In [75]:
# check: fraction rows should sum to 1
# (weak check: compares the sum of all row sums with the number of rows,
#  so rows above 1 can cancel rows below 1)
byRows = 1  
emsg = " ERROR fractions do not sum to 1"
assert np.isclose( np.sum(np.sum( cibersortFractionsNP, axis=byRows )),  cibersortFractionsNP.shape[0] ), "cibersortFractionsNP:" + emsg

assert np.isclose( np.sum(np.sum( fractionsNP,          axis=byRows)), fractionsNP.shape[0] ), "fractionsNP:" + emsg

In [76]:
# check fractions are >=  0
emsg = " ERROR fractions can not be negative. ie can not have negative counts"
assert np.sum( cibersortFractionsNP < 0 ) == 0, "cibersortFractionsNP:" + emsg

negativeLogicalNP = fractionsNP < 0
numNegValues = np.sum( negativeLogicalNP )
print("ERROR fractionsNP has {:,} negative values!".format(numNegValues))
# assert np.sum( fractionsNP < 0 ) == 0, "fractionsNP:" + emsg
print( "percentage of non-negative values {:,}".format( (fractionsNP.size - numNegValues) / fractionsNP.size * 100) )


ERROR fractionsNP has 663,372 negative values!
percentage of non-negative values 49.418177742296315


In [77]:
# can we fix this by simply setting negative values to 0?
clippedNP = np.where(fractionsNP < 0, 0, fractionsNP)
numNegValues = np.sum( clippedNP < 0)
numNegValues

# do the fractions still sum to 1? (True marks rows that now sum to more than 1)
np.sum(clippedNP, axis=byRows) > 1

array([ True, False,  True, ...,  True,  True, False])

In [78]:
# can we fix by clipping and rescaling values so that they sum to 1
rowSums = np.sum(clippedNP, axis=byRows)
normalizedNP = np.divide(clippedNP, rowSums.reshape(rowSums.size, 1) )
#print( np.divide(matrix_2d_ordered, vector_1d.reshape(3,1)) )

print("normalizedNP")
print(normalizedNP)

print("\ncheck sums to 1")
print( np.sum(normalizedNP, axis=byRows) )
print("\n if normalized correctly should sum {:,}".format(clippedNP.shape[0]))
print( np.sum(np.sum( normalizedNP, axis=byRows )) )

normalizedNP
[[0.00000000e+00 3.79593888e-01 0.00000000e+00 ... 0.00000000e+00
  4.36614928e-03 0.00000000e+00]
 [2.39629838e-04 2.20241789e-01 0.00000000e+00 ... 0.00000000e+00
  3.24159460e-03 0.00000000e+00]
 [1.02130515e-03 1.00332718e-01 3.01667939e-02 ... 0.00000000e+00
  0.00000000e+00 1.24850457e-03]
 ...
 [0.00000000e+00 0.00000000e+00 3.64691740e-02 ... 9.46296853e-03
  0.00000000e+00 0.00000000e+00]
 [2.30110198e-04 3.96736442e-02 0.00000000e+00 ... 2.30145655e-02
  0.00000000e+00 2.49562731e-04]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 ... 2.19493353e-04
  0.00000000e+00 0.00000000e+00]]

check sums to 1
[1. 1. 1. ... 1. 1. 1.]

 if normalized correctly should sum 15,801
15801.0
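The clip-and-rescale step above can be written as one function. This sketch uses `keepdims=True` so the row sums broadcast across columns without a manual reshape. It also handles one more case: a row with no positive entries would give 0/0. Here such rows are left as all zeros, which is an assumption. `clipAndNormalize` is a hypothetical helper.

```python
import numpy as np

def clipAndNormalize(F):
    '''
    set negative fractions to 0, then rescale each row to sum to 1.
    rows with no positive entries would give 0/0; they are left as all zeros (an assumption)
    '''
    clipped = np.clip(F, 0, None)
    rowSums = clipped.sum(axis=1, keepdims=True)   # shape (numSamples, 1) broadcasts across columns
    return np.divide(clipped, rowSums, out=np.zeros_like(clipped), where=rowSums > 0)

F = np.array([[ 0.6, -0.2,  0.6],
              [-0.1, -0.3, -0.5],   # no positive entries
              [ 0.2,  0.2,  0.1]])
print(clipAndNormalize(F))
```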


In [143]:
# Is there evidence normalization works?
# check cibersort samples with high correlation and small p-values


# foo = normalizedNP[[cibersortLargeRIdxNP]]
# print(foo.shape)
# print( foo[0:4, :][:, 0:4] )

# print()
# bar = cibersortFractionsNP[[cibersortLargeRIdxNP]]
# print(bar.shape)
# bar[0:4,:][:,0:4]

# isclose is not symmetric
rtol= 1e-05 # default 1e-05 
atol= 1e-08 # default 1e-08 
normalizedLargeRNP = normalizedNP[cibersortLargeRIdxNP]
cibersortLargeRNP = cibersortFractionsNP[cibersortLargeRIdxNP]
pos = np.sum( np.isclose(normalizedLargeRNP, cibersortLargeRNP, rtol, atol)) 
print("100% TP sum() == {:,}".format(normalizedLargeRNP.size))
# note: accuracy is divided by normalizedNP.size (all 15,801 * 83 fractions),
# not by normalizedLargeRNP.size (the 819,459 fractions compared above)
print( "rtol = {} atol = {} sum() == {:,} accuracy = {:.2f}%".format(rtol, atol, pos, pos / normalizedNP.size * 100) )

print("\nround")
roundPos = np.sum( np.isclose(normalizedLargeRNP.round(decimals=2), cibersortLargeRNP.round(decimals=2), rtol, atol)) 
print( "rtol = {} atol = {}  sum() == {:,} accuracy = {:.2f}%".format(rtol, atol, roundPos, roundPos / normalizedNP.size * 100) )


100% TP sum() == 819,459
rtol = 1e-05 atol = 1e-08 sum() == 252,102 accuracy = 19.22%

round
rtol = 1e-05 atol = 1e-08  sum() == 618,842 accuracy = 47.19%




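**isclose() results** for the 9,873 high-correlation samples. As computed in the cell above, accuracy = matching fractions / all 15,801 * 83 = 1,311,483 fractions.

| atol | accuracy % | accuracy % after round(decimals=2) |
|--- | --- | --- |
|1e-08 | 19.22 | 47.19|
|1e-05 | 20.58 | |
|1e-03 | 38.48 | 47.19 |
|1e-02 | 51.33 | 53.13 |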
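
Two effects may be worth separating in this table. First, the denominator covers all 1,311,483 fractions rather than the 819,459 that were compared. Second, zeros matching zeros can inflate the rate, since each sample comes from a single type. Below is a minimal sketch that reports both the overall rate and the rate on non-zero reference entries. `matchRates` is a hypothetical helper, and the toy arrays are made up for illustration.

```python
import numpy as np

def matchRates(FHat, FRef, rtol=1e-05, atol=1e-08, decimals=None):
    '''
    hypothetical helper: how often FHat agrees with the reference fractions FRef
    returns (rate over all compared entries, rate over entries where FRef != 0)
    '''
    if decimals is not None:
        FHat, FRef = FHat.round(decimals), FRef.round(decimals)
    # isclose is not symmetric: the tolerance is atol + rtol * |FRef|, so the reference goes second
    close = np.isclose(FHat, FRef, rtol=rtol, atol=atol)
    nonZero = FRef != 0
    overall = close.sum() / close.size               # denominator: only the entries compared
    onNonZero = close[nonZero].sum() / nonZero.sum()
    return overall, onNonZero

FRef = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]])
FHat = np.array([[0.7, 0.0, 0.0, 0.3],
                 [0.0, 0.0, 1.0, 0.0]])
print(matchRates(FHat, FRef))   # zeros inflate the overall rate relative to the non-zero rate
```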

In [114]:
# print(normalizedNP[[cibersortLargeRIdxNP]])

# print()
# print( np.round(normalizedNP[[cibersortLargeRIdxNP]]) )
print()
normalizedNP[cibersortLargeRIdxNP].round()




  


array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [80]:
aedwip

NameError: name 'aedwip' is not defined

In [None]:
expected = cibersortFractionsNP
# isclose is not symmetric
rtol= 1e-05 # default 1e-05 
atol= 1e-08 # default 1e-08 

print("if perfect match isclose().sum == 1,311,483 i.e. 15801 * 83")
print(np.isclose( fractionsNP, expected,    rtol, atol).sum())
print(np.isclose( expected,    fractionsNP, rtol, atol).sum())

In [None]:
fractionsNP.flatten().shape

In [None]:
expected.flatten().shape

In [None]:
ksdlkdsl

# <span style="color:red">hacks</span>

In [None]:
if not mixtureInverseNP:
    print("WTF")

In [None]:
len(mixtureInverseNP)

In [None]:
mixtureDF.shape

In [None]:
mixtureDF.columns

In [None]:
mixtureTransposedDF = mixtureDF.transpose()
mixtureTransposedDF.shape

In [None]:
mixtureTransposedDF.columns

In [None]:
mixtureTransposedDF.head()

In [None]:
tNP = mixtureTransposedDF.to_numpy()
tNP.shape

In [None]:
tNP[0:5, 0:5]

In [None]:
tNP[-5:, -5:]

In [None]:
# sum test
d = np.arange(0,4)
dd = np.array( [d, d*-1, d, d*-1] )
print(dd)
byRows = 1
byColumns = 0
print(np.sum(dd, axis=byRows))
print(np.sum(dd, axis=byColumns))

In [None]:
# fancy idx set neg to zero
d = np.arange(0,4)
dd = np.array( [d, d*-1, d, d*-1] )
print(dd)

negValuesLogical = dd < 0
print()
print(negValuesLogical)
#dd + negValuesLogical * 999
#dd[negValuesLogical]

np.where(dd < 0, 0, dd)

In [None]:
# can we scale rows so that they sum to 1
d = np.arange(0,16).reshape(2,8)
print(d)

rowSums = np.sum(d, axis=byRows)
print("\nrowSums")
print(rowSums)

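print("\n d / rowSums")
#print(d / rowSums)
# a 1-D array's transpose is itself, so rowSums.transpose() has shape (2,) and cannot
# broadcast against d (2, 8); reshape to (2, 1) so each row is divided by its own sum
print(np.divide(d, rowSums.reshape(-1, 1)))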

# print("\nsum")
# #byColumns
# print( np.sum(d / rowSums, axis=byRows) )

In [None]:
# CREATE 1D 'VECTOR'
vector_1d = np.array([10,20,30])

# CREATE 2D MATRIX OF NUMBERS, 1 TO 9
numbers_1_to_9 = np.arange(start = 1, stop = 10)
matrix_2d_ordered = numbers_1_to_9.reshape((3,3))


print(vector_1d)
print()
print(matrix_2d_ordered)

print()
print( np.divide(matrix_2d_ordered, vector_1d) )

print("winner winner chicken dinner")
print( np.divide(matrix_2d_ordered, vector_1d.reshape(3,1)) )

In [None]:
# check we select rows of interest

d = np.arange(0,15).reshape(3,5)
print(d)
print()
print(d[[0,2]])