# Generalized Linear Models

This notebook introduces the algorithms within [Dask-GLM](https://github.com/dask/dask-glm) for [Generalized Linear Models](https://en.wikipedia.org/wiki/Generalized_linear_model).

## Start Dask Client for Dashboard

Starting the Dask Client is optional. It provides a dashboard that is useful for gaining insight into the computation.

The link to the dashboard becomes visible when you create the client below. We recommend having it open on one side of your screen while using your notebook on the other side. Arranging your windows can take some effort, but seeing both at the same time is very useful when learning.

In [1]:
from dask.distributed import Client, progress
client = Client(processes=False, threads_per_worker=4,
                n_workers=1, memory_limit='2GB')
client

Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://192.168.176.2:8787/status

Workers: 1
Total threads: 4
Total memory: 1.86 GiB
Status: running
Using processes: False

Comm: inproc://192.168.176.2/615/1
Started: Just now

Comm: inproc://192.168.176.2/615/4
Dashboard: http://192.168.176.2:32827/status
Nanny: None
Local directory: /home/jovyan/work/machine-learning/dask-worker-space/worker-9jcahl34


## Make a random dataset

In [2]:
from dask_glm.datasets import make_regression
X, y = make_regression(n_samples=200000, n_features=100, n_informative=5, chunksize=10000)
X

| | Array | Chunk |
|---|---|---|
| Bytes | 152.59 MiB | 7.63 MiB |
| Shape | (200000, 100) | (10000, 100) |
| Count | 20 Tasks | 20 Chunks |
| Type | float64 | numpy.ndarray |


In [3]:
import dask
X, y = dask.persist(X, y)
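
Because `dask.persist` keeps the computed chunks in worker memory, it can help to check that the dataset fits within the `memory_limit` given to the client. The following is a small arithmetic sketch that reproduces the sizes reported in the array summary from the `make_regression` arguments, assuming 8 bytes per `float64` value. The variable names are illustrative only.

```python
# Sizes implied by the make_regression arguments used in this notebook.
n_samples, n_features, chunksize = 200_000, 100, 10_000
bytes_per_value = 8  # float64

n_chunks = n_samples // chunksize
chunk_mib = chunksize * n_features * bytes_per_value / 2**20
total_mib = n_samples * n_features * bytes_per_value / 2**20

print(n_chunks, round(chunk_mib, 2), round(total_mib, 2))
# prints: 20 7.63 152.59
```

At roughly 153 MiB, `X` sits well below the 2GB limit configured for the single worker.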

## Solve with a GLM algorithm

*We also recommend looking at the "Graph" dashboard during execution if available*

In [4]:
import dask_glm.algorithms

b = dask_glm.algorithms.admm(X, y, max_iter=5)
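
To give a feel for what an ADMM-style solver does with chunked data, here is a toy sketch of consensus ADMM in pure Python. It is not Dask-GLM's implementation: it assumes a least-squares loss with an L1 penalty and a single scalar coefficient so that each step has a closed form, and the names `consensus_admm_lasso_1d`, `lam`, and `rho` are hypothetical. Each chunk fits its own estimate, the estimates are averaged into a shared value, and a per-chunk dual variable tracks the remaining disagreement.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def consensus_admm_lasso_1d(chunks, lam, rho=5.0, max_iter=100):
    """Minimize 0.5 * sum((a*x - b)**2) + lam * |x| over a scalar x.

    `chunks` is a list of (a_values, b_values) pairs, one per data block.
    """
    n = len(chunks)
    z = 0.0        # shared (consensus) estimate
    x = [0.0] * n  # per-chunk local estimates
    u = [0.0] * n  # per-chunk scaled dual variables
    for _ in range(max_iter):
        # Local step: each chunk solves a small ridge-like problem on its own data.
        for i, (a, b) in enumerate(chunks):
            ata = sum(ai * ai for ai in a)
            atb = sum(ai * bi for ai, bi in zip(a, b))
            x[i] = (atb + rho * (z - u[i])) / (ata + rho)
        # Consensus step: average the local estimates, then apply the L1 prox.
        mean_xu = sum(xi + ui for xi, ui in zip(x, u)) / n
        z = soft_threshold(mean_xu, lam / (rho * n))
        # Dual step: accumulate each chunk's disagreement with the consensus.
        for i in range(n):
            u[i] += x[i] - z
    return z


# Three chunks of two observations each; b is roughly 2 * a.
chunks = [([1.0, 2.0], [2.1, 3.9]),
          ([1.0, 2.0], [1.8, 4.2]),
          ([1.0, 2.0], [2.0, 4.1])]
z_hat = consensus_admm_lasso_1d(chunks, lam=0.3)
```

For this toy problem the exact minimizer is available in closed form, `(sum(a*b) - lam) / sum(a*a) = (30.3 - 0.3) / 15 = 2.0`, which gives a simple check on the result.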

## Solve with a different GLM algorithm

In [5]:
b = dask_glm.algorithms.proximal_grad(X, y, max_iter=5)
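
As a rough illustration of the idea behind proximal gradient methods, the sketch below alternates a gradient step on a smooth loss with a proximal step for the penalty. It is a minimal pure-Python version under simplifying assumptions (least-squares loss, L1 penalty, fixed step size) and is not Dask-GLM's implementation; `proximal_gradient_lasso`, `lam`, and `step` are hypothetical names.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def proximal_gradient_lasso(X, y, lam, step, max_iter=100):
    """Minimize 0.5 * ||X beta - y||^2 + lam * ||beta||_1 with a fixed step."""
    n_features = len(X[0])
    beta = [0.0] * n_features
    for _ in range(max_iter):
        # Gradient of the smooth least-squares term: X^T (X beta - y).
        residual = [sum(xij * bj for xij, bj in zip(row, beta)) - yi
                    for row, yi in zip(X, y)]
        grad = [sum(row[j] * r for row, r in zip(X, residual))
                for j in range(n_features)]
        # Gradient step, then the L1 proximal operator on each coefficient.
        beta = [soft_threshold(bj - step * gj, step * lam)
                for bj, gj in zip(beta, grad)]
    return beta


# Orthogonal toy design: feature 0 is strongly related to y, feature 1 barely.
X_small = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y_small = [3.0, 0.1, 3.0, 0.1]
beta_hat = proximal_gradient_lasso(X_small, y_small, lam=1.0, step=0.25)
```

With this design the minimizer is known exactly: the first coefficient is `(6 - 1) / 2 = 2.5`, and the L1 penalty sets the weak second coefficient to zero.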



## Customizable with different families and regularizers

The Dask-GLM project is modular: GLM families and regularizers can be swapped independently of the solver, and the project offers a relatively straightforward interface for implementing custom ones.
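
One way to picture this modularity is as an objective built from two interchangeable pieces: a family-specific loss and a regularizer penalty. The sketch below is a conceptual illustration in pure Python, not Dask-GLM's interface. It assumes the standard Poisson negative log-likelihood with the constant term dropped, and an elastic net that blends an L1 term with a halved squared-L2 term. The function names and the `lam` parameter are hypothetical.

```python
import math


def poisson_loss(xbeta, y):
    """Poisson negative log-likelihood, dropping the constant log(y!) term."""
    return sum(math.exp(xb) - yi * xb for xb, yi in zip(xbeta, y))


def elastic_net_penalty(beta, weight=0.5):
    """Blend of an L1 term and a halved squared-L2 term, mixed by `weight`."""
    l1 = sum(abs(b) for b in beta)
    l2 = 0.5 * sum(b * b for b in beta)
    return weight * l1 + (1 - weight) * l2


def objective(beta, X, y, lam, family_loss, penalty):
    """Family loss on the linear predictor plus a scaled penalty on beta."""
    xbeta = [sum(xij * bj for xij, bj in zip(row, beta)) for row in X]
    return family_loss(xbeta, y) + lam * penalty(beta)


X_small = [[1.0, 0.0], [0.0, 1.0]]
y_small = [1.0, 2.0]

# With all-zero coefficients the penalty vanishes and each exp(0) contributes 1.
value_at_zero = objective([0.0, 0.0], X_small, y_small, 0.1,
                          poisson_loss, elastic_net_penalty)
penalty_value = elastic_net_penalty([1.0, -2.0])
```

Swapping in a different `family_loss` or `penalty` changes the model without touching `objective`, which is the kind of separation the paragraph above describes.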

In [6]:
import dask_glm.families
import dask_glm.regularizers

family = dask_glm.families.Poisson()
regularizer = dask_glm.regularizers.ElasticNet()

b = dask_glm.algorithms.proximal_grad(
    X, y, 
    max_iter=5, 
    family=family,
    regularizer=regularizer,
)



In [7]:
dask_glm.families.Poisson??

Init signature: dask_glm.families.Poisson()
Source:
class Poisson(object):
    """
    This implements `Poisson regression`_

    Useful for modelling count data.

    .. _Poisson regression: https://en.wikipedia.org/wiki/Poisson_regression
    """
    @staticmethod
    def loglike(Xbeta, y):
        eXbeta = exp(Xbeta)
        yXbeta = y * Xbeta
        return (eXbeta

In [8]:
dask_glm.regularizers.ElasticNet??

Init signature: dask_glm.regularizers.ElasticNet(weight=0.5)
Source:
class ElasticNet(Regularizer):
    """Elastic net regularization."""
    name = 'elastic_net'

    def __init__(self, weight=0.5):
        self.weight = weight
        self.l1 = L1()
        self.l2 = L2()