# Elastic nets on the Zeisel data set

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import numpy as np
import pandas as pd

In [3]:
import sklearn.linear_model as sklm

In [4]:
import sys
sys.path.append('/home/ahsvargo/xvalid')
sys.path.append('/home/ahsvargo/.local/bin')

In [5]:
from picturedrocks import Rocks
from picturedrocks.performance import FoldTester, PerformanceReport, NearestCentroidClassifier

### Import the data

The Zeisel data were processed ahead of time for easy uploading.

In [6]:
rawdata = np.load("zeisel/zeisel-proc.npz")

In [7]:
# looking at the top 5k genes
rawdata['X'].shape

(3005, 4999)

Load the folds in a somewhat roundabout manner.

In [8]:
test = Rocks(rawdata['X'], rawdata['y'])

In [9]:
ft = FoldTester(test)
ft.loadfolds('zeisel/zeisel14-5folds.npz')

In [11]:
folds = ft.folds

### Testing to find the proper values of alpha to use in the mesh

Again, test by fitting a model for cluster 0 in fold 0.

In [10]:
vec0 = (test.y == 0)*1

In [11]:
l1_ratios = [.1, .5, .7, .9, .95, .99, 1]

With `max_iter=10000`, I couldn't find any consistent pattern in the fitting times: the longest fit I observed took about 1 min 20 sec, and the shortest took less than 1 sec. Every fit converged.

Even with `alpha=50`, one or more markers are often still selected. Decreasing the `l1_ratio` generally selects fewer markers, but not always. So we need to consider a larger range of values of alpha than in the Paul data set.
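For reference, the objective that scikit-learn's `ElasticNet` minimizes can be written as follows, where $\alpha$ is `alpha`, $\rho$ is `l1_ratio`, $n$ is the number of samples and $w$ is the coefficient vector (the symbols $\rho$, $n$ and $w$ are notation introduced here, not names from the notebook):

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 \;+\; \alpha \rho \lVert w \rVert_1 \;+\; \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
$$

At a fixed `alpha`, changing `l1_ratio` rescales both penalty terms at once, which may be part of why the number of selected markers does not always move in one direction.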

It seems that 1000 iterations take about 20 seconds without a precomputed gram matrix.

Compute a gram matrix ahead of time to make the elastic nets fit faster. You only need to run the following once.

In [14]:
%%time
vec = np.zeros( (test.X.shape[0], test.X.shape[0]) ) 
for i in range(test.X.shape[0]):
    if not i%100: print("Working on row " + str(i))
    for j in range(test.X.shape[0]):
        vec[i,j] = test.X[i].dot(test.X[j].T)

Working on row 0
Working on row 100
Working on row 200
Working on row 300
Working on row 400
Working on row 500
Working on row 600
Working on row 700
Working on row 800
Working on row 900
Working on row 1000
Working on row 1100
Working on row 1200
Working on row 1300
Working on row 1400
Working on row 1500
Working on row 1600
Working on row 1700
Working on row 1800
Working on row 1900
Working on row 2000
Working on row 2100
Working on row 2200
Working on row 2300
Working on row 2400
Working on row 2500
Working on row 2600
Working on row 2700
Working on row 2800
Working on row 2900
Working on row 3000
CPU times: user 12min 18s, sys: 84.7 ms, total: 12min 18s
Wall time: 12min 18s


Numpy dot products, called one entry at a time inside a Python loop, seem to take quite a long time.
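A loop that fills a matrix one dot product at a time can usually be replaced by a single matrix product. The sketch below checks this on a small synthetic matrix; `X_demo` and `Xc_demo` are hypothetical stand-ins for `test.X` and its centered copy, and it assumes the data is a dense NumPy array.

```python
import numpy as np

# Small synthetic stand-in for test.X (hypothetical data, not the Zeisel matrix).
rng = np.random.default_rng(0)
X_demo = rng.random((30, 8))
Xc_demo = X_demo - X_demo.mean(axis=0)

# Entry-by-entry version: one dot product per pair of columns.
n_features = Xc_demo.shape[1]
gram_loop = np.zeros((n_features, n_features))
for i in range(n_features):
    for j in range(n_features):
        gram_loop[i, j] = Xc_demo[:, i].dot(Xc_demo[:, j])

# Single matrix product: the same feature-by-feature Gram matrix.
gram_fast = Xc_demo.T @ Xc_demo

same = np.allclose(gram_loop, gram_fast)
print(same)
```

The analogous product for a samples-by-samples matrix would be `X_demo @ X_demo.T`.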

In [10]:
Xcpy = test.X - np.mean(test.X,axis=0)

In [17]:
%%time
gram = np.zeros( (test.X.shape[1], test.X.shape[1]) ) 
for i in range(Xcpy.shape[1]):
    if not i%100: print("Working on row " + str(i), flush=True)
    for j in range(Xcpy.shape[1]):
        gram[i,j] = Xcpy[:,i].dot(Xcpy[:,j].T)

Working on row 0
Working on row 100
Working on row 200
Working on row 300
Working on row 400
Working on row 500
Working on row 600
Working on row 700
Working on row 800
Working on row 900
Working on row 1000
Working on row 1100
Working on row 1200
Working on row 1300
Working on row 1400
Working on row 1500
Working on row 1600
Working on row 1700
Working on row 1800
Working on row 1900
Working on row 2000
Working on row 2100
Working on row 2200
Working on row 2300
Working on row 2400
Working on row 2500
Working on row 2600
Working on row 2700
Working on row 2800
Working on row 2900
Working on row 3000
Working on row 3100
Working on row 3200
Working on row 3300
Working on row 3400
Working on row 3500
Working on row 3600
Working on row 3700
Working on row 3800
Working on row 3900
Working on row 4000
Working on row 4100
Working on row 4200
Working on row 4300
Working on row 4400
Working on row 4500
Working on row 4600
Working on row 4700
Working on row 4800
Working on row 4900
CPU times: u

In [40]:
Xcpy.shape

(3005, 4999)

In [18]:
np.savez("zeisel-centered-gram-mat.npz", gram=gram)

Load the gram matrix if you have already computed it.

In [12]:
gram=np.load("zeisel-centered-gram-mat.npz")['gram']

Supplying the gram matrix shortens the calculations significantly.

In [21]:
%%time
en = sklm.ElasticNet(l1_ratio=.1,max_iter=5000, alpha=0.08, tol=0.0001, precompute=gram)
en.fit(Xcpy, vec0[:,0])

CPU times: user 15 s, sys: 511 ms, total: 15.5 s
Wall time: 5.4 s


In [36]:
vec0[:,0].shape

(3005,)

In [25]:
%%time
en = sklm.ElasticNet(l1_ratio=.1,max_iter=5000, alpha=0.08, tol=0.0001)
en.fit(test.X, vec0)

CPU times: user 1min 53s, sys: 7.54 s, total: 2min 1s
Wall time: 53.2 s


In [26]:
en.n_iter_

2112

In [44]:
en.intercept_

0.09650582362728785

In [29]:
np.mean(vec0)- en.coef_.dot(np.mean(test.X,axis=0))

0.05500420498329887

In [39]:
np.nonzero(en.coef_)[0].shape

(4,)

## Run the cross validation to fit the models

I haven't been able to find a setting where the fit diverges on this data set, so I'm just going to try running the built-in CV method without specifying a list of alphas. First, how sparse is the data?

In [41]:
(1*(test.X != 0)).sum()/(test.X.shape[0]*test.X.shape[1])

0.13493800257555671
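The number above is the fraction of nonzero entries. A slightly more direct way to compute the same quantity is sketched here on a tiny hypothetical array (`X_demo` is not the Zeisel matrix), assuming a dense NumPy array.

```python
import numpy as np

# Tiny hypothetical matrix with 2 nonzero entries out of 6.
X_demo = np.array([[0.0, 1.5, 0.0],
                   [2.0, 0.0, 0.0]])

# Expression in the style used in this notebook.
density_original = (1 * (X_demo != 0)).sum() / (X_demo.shape[0] * X_demo.shape[1])

# Equivalent, using count_nonzero and the total number of entries.
density = np.count_nonzero(X_demo) / X_demo.size

print(density_original, density)
```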

In [14]:
en = sklm.ElasticNetCV(l1_ratio=[.1, .5, .7, .9, .95, .99, 1], cv=5, max_iter=5000, n_jobs=2, precompute=True, verbose=1)

In [15]:
%%time
en.fit(Xcpy, vec0[:,0]-np.mean(vec0))

[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

CPU times: user 31min 49s, sys: 26min 23s, total: 58min 13s
Wall time: 19min 51s


ElasticNetCV(alphas=None, copy_X=True, cv=5, eps=0.001, fit_intercept=True,
       l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1], max_iter=5000,
       n_alphas=100, n_jobs=2, normalize=False, positive=False,
       precompute=True, random_state=None, selection='cyclic', tol=0.0001,
       verbose=1)

With `max_iter=10000`, this ran for 1 hour 45 minutes without finishing, although no fit failed to converge in that time.

With `max_iter=1000`, I let it run for approximately 3 hours 20 minutes before quitting. It failed to converge at least 9 times (I had to close my laptop while travelling, so I'm not sure I saw all of the output). Either way this is untenable, since it covers only 1 cluster.

With `precompute=True`, it takes only 20 min with `max_iter=5000`, and no fit fails to converge. It seems that `precompute=True` forces it to re-use the Gram matrices in this case.

### Run on all clusters for one fold

Center the data to make precomputing the gram matrix worthwhile.
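The reasoning, as I understand it, is that a model fitting an intercept works with column-centered data, so a Gram matrix is only reusable if it was computed from centered columns. The sketch below uses a small synthetic matrix (`X_demo` is hypothetical) to check two facts: re-centering an already centered matrix changes nothing, and the centered Gram matrix equals the raw one minus a rank-one correction.

```python
import numpy as np

rng = np.random.default_rng(1)
X_demo = rng.random((20, 5))          # hypothetical stand-in for test.X
mu = X_demo.mean(axis=0)              # per-column means
Xc_demo = X_demo - mu                 # centered copy, like Xcpy

# Centering a second time is a no-op (up to floating point error).
recentered = Xc_demo - Xc_demo.mean(axis=0)

# Gram matrix of the centered data, computed two ways.
gram_centered = Xc_demo.T @ Xc_demo
gram_identity = X_demo.T @ X_demo - X_demo.shape[0] * np.outer(mu, mu)

print(np.allclose(recentered, Xc_demo), np.allclose(gram_centered, gram_identity))
```

Note that a row subset of a centered matrix, such as the training rows of one fold, is generally no longer exactly centered.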

In [12]:
Xcpy = test.X - np.mean(test.X,axis=0)

In [24]:
%%time

foldN = 4
mask = np.zeros(test.N, dtype=bool)
mask[folds[foldN]] = True

allMarks = []
allCoefs = []

for clust in np.unique(test.y):
    print("Working on cluster {}".format(clust), flush=True)
    en = sklm.ElasticNetCV(l1_ratio=[.1, .5, .7, .9, .95, .99, 1], cv=5, max_iter=5000, n_jobs=2, precompute=True, verbose=1)
    
    currVec = (test.y[~mask]==clust)*1
    en.fit(Xcpy[~mask], currVec[:,0] - currVec.mean())
    
    tempMarks = np.flipud(np.argsort(np.abs(en.coef_)))
    
    currMarks = []
    currCoefs = []
    
    ind = 0
    while ind < len(tempMarks) and en.coef_[tempMarks[ind]] != 0:
        currMarks.append(tempMarks[ind])
        currCoefs.append(en.coef_[tempMarks[ind]])
        
        ind += 1

    # keep this cluster's markers and their coefficients
    allMarks.append(np.array(currMarks))
    allCoefs.append(np.array(currCoefs))

Working on cluster 0


[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Working on cluster 1


[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Working on cluster 2


[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Working on cluster 3


[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Working on cluster 4


[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Working on cluster 5


[Parallel(n_jobs=2)]: Done  35 out of  35 | elapsed:  4.6min finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Working on cluster 6


[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Working on cluster 7


[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Working on cluster 8


[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

CPU times: user 2h 41min 12s, sys: 3h 10min 56s, total: 5h 52min 9s
Wall time: 1h 1min 39s


In [25]:
np.savez("zeisel-nets-fold{}-coefs".format(foldN), allCoefs)
np.savez("zeisel-nets-fold{}-marks".format(foldN), allMarks)
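The marker selection in the loop above (nonzero coefficients, ordered by decreasing absolute value) can also be written without an explicit `while` loop. The sketch below shows one way, together with one option for saving the per-cluster results, which are ragged because each cluster can select a different number of markers; recent NumPy versions may refuse to pack such a list into a single array. The helper `nonzero_markers`, the `cluster{k}` key names and the demo file name are all hypothetical.

```python
import os
import tempfile
import numpy as np

def nonzero_markers(coef):
    """Return indices and values of nonzero coefficients, largest |coef| first."""
    coef = np.asarray(coef)
    nz = np.flatnonzero(coef)                               # indices of nonzero entries
    order = nz[np.argsort(-np.abs(coef[nz]), kind="stable")]
    return order, coef[order]

# Hypothetical coefficient vector standing in for en.coef_.
coef_demo = np.array([0.0, -0.5, 0.2, 0.0, 1.0])
marks_demo, coefs_demo = nonzero_markers(coef_demo)

# Ragged per-cluster results: store one named array per cluster.
all_marks_demo = [marks_demo, np.array([3])]
path = os.path.join(tempfile.mkdtemp(), "marks-demo.npz")
np.savez(path, **{"cluster{}".format(k): m for k, m in enumerate(all_marks_demo)})

loaded = np.load(path)
print(marks_demo, coefs_demo, sorted(loaded.files))
```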

Some timing data (6 CPUs, 2 GB RAM):

* Fold 0: 59 min 53 s. Convergence failed 6 times for cluster 6.
* Fold 1: 1 hr 1 min 10 s. No convergence failures.
* Fold 2: 1 hr 2 min 17 s. 43 convergence failures for cluster 5 (it still reports that it completed).
* Fold 3: 1 hr 2 min 36 s. 58 convergence failures for cluster 5 (again, it still reports that it completed).
* Fold 4: 1 hr 1 min 39 s. No convergence failures.

We will need to see how well we are able to classify cluster 5 after this is done.
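Convergence failures like the ones tallied above are reported as Python warnings, so they could be counted programmatically rather than by reading the output. The sketch below is a generic helper built on the standard `warnings` module; `count_warnings`, `DemoConvergenceWarning` and `noisy_fit` are hypothetical names used only for the demonstration. With scikit-learn one would pass `sklearn.exceptions.ConvergenceWarning` as the category, and with `n_jobs > 1` warnings raised in worker threads or processes may not all be captured.

```python
import warnings

def count_warnings(fn, category, *args, **kwargs):
    """Call fn and count the warnings of the given category that it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")      # record repeats instead of deduplicating
        result = fn(*args, **kwargs)
    n = sum(issubclass(w.category, category) for w in caught)
    return result, n

class DemoConvergenceWarning(UserWarning):
    """Stand-in warning class for this demonstration only."""

def noisy_fit(n_failures):
    # Pretend to fit a model that fails to converge n_failures times.
    for _ in range(n_failures):
        warnings.warn("did not converge", DemoConvergenceWarning)
    return "fitted"

result, n_failed = count_warnings(noisy_fit, DemoConvergenceWarning, 3)
print(result, n_failed)
```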