**Notes for the Docker container:**

Docker command to run this note locally:

Note: replace `<path to my directory>` with the path of the directory you want to map to `/datos` inside the Docker container.

```
docker run --rm -v <path to my directory>:/datos --name jupyterlab_r_kernel_local -p 8888:8888 -d palmoreck/jupyterlab_r_kernel:1.1.0
```

Password for JupyterLab: `qwerty`

Stop the Docker container:

```
docker stop jupyterlab_r_kernel_local
```


Documentation for the Docker image `palmoreck/jupyterlab_r_kernel:1.1.0` is available at this [link](https://github.com/palmoreck/dockerfiles/tree/master/jupyterlab/r_kernel).

---

This note uses methods covered in [1.5.Integracion_numerica](https://github.com/ITAM-DS/analisis-numerico-computo-cientifico/blob/master/temas/I.computo_cientifico/1.5.Integracion_numerica.ipynb)

In [1]:
install.packages("microbenchmark",lib="/usr/local/lib/R/site-library/",
                repos="https://cran.itam.mx/",verbose=TRUE)

system (cmd0): /usr/lib/R/bin/R CMD INSTALL

foundpkgs: microbenchmark, /tmp/RtmpCCUlp9/downloaded_packages/microbenchmark_1.4-7.tar.gz

files: /tmp/RtmpCCUlp9/downloaded_packages/microbenchmark_1.4-7.tar.gz

1): succeeded '/usr/lib/R/bin/R CMD INSTALL -l '/usr/local/lib/R/site-library' /tmp/RtmpCCUlp9/downloaded_packages/microbenchmark_1.4-7.tar.gz'



In [2]:
install.packages("tictoc",lib="/usr/local/lib/R/site-library/",
                repos="https://cran.itam.mx/",verbose=TRUE)

system (cmd0): /usr/lib/R/bin/R CMD INSTALL

foundpkgs: tictoc, /tmp/RtmpCCUlp9/downloaded_packages/tictoc_1.0.tar.gz

files: /tmp/RtmpCCUlp9/downloaded_packages/tictoc_1.0.tar.gz

1): succeeded '/usr/lib/R/bin/R CMD INSTALL -l '/usr/local/lib/R/site-library' /tmp/RtmpCCUlp9/downloaded_packages/tictoc_1.0.tar.gz'



# Parallel

Among the most popular tools in R for parallel processing are:

* [Simple Network Of Workstations: snow](https://www.rdocumentation.org/packages/snow/versions/0.4-3), see this [link](http://homepage.divms.uiowa.edu/~luke/R/cluster/cluster.html) for more information (it is now included in the **parallel** package).

* [multicore](https://www.rdocumentation.org/packages/future/versions/1.15.1/topics/multicore) (works on Unix-like systems but not on Windows; it is now included in the **parallel** package).

* [foreach](https://www.rdocumentation.org/packages/foreach/versions/1.4.7/topics/foreach). There are advantages to using it together with the [iterators](https://www.rdocumentation.org/packages/iterators/versions/1.0.12) package.

* [Rmpi](https://www.rdocumentation.org/packages/Rmpi/versions/0.6-9). Parallelization on multicore machines and on clusters of machines.

**Comments:**

* The first two have been part of the [parallel](https://www.rdocumentation.org/packages/parallel/versions/3.6.2) package since R version ().

* The four packages above follow a *scatter/gather* parallel programming paradigm: multiple R instances run at the same time (check whether this is correct for *multicore*...), either on a cluster of machines or on a single multicore machine. One of the instances is called the *manager* and the rest are the *workers*. The parallel computation proceeds as follows:

* **scatter**: the *manager* splits the computation into *chunks* and sends (*scatters*) them to the *workers*.

* **chunk computation**: the *workers* perform the computation on each *chunk* and send the results back to the *manager*.

* **gather**: the *manager* receives (*gathers*) the results and combines them to solve the problem.
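
The three steps above can be sketched with functions from the **parallel** package. This is a minimal illustration rather than code from the note: the toy problem (a sum of squares), the names `values` and `cl_demo`, and the choice of 2 workers are all assumptions made for the example.

```r
library(parallel)

# Hypothetical toy problem: sum of squares of 1..100, split over 2 workers.
values <- 1:100
cl_demo <- makeCluster(2)  # 2 workers is an arbitrary choice for this sketch

# scatter: the manager splits the data into one chunk per worker
chunks <- clusterSplit(cl_demo, values)

# chunk computation: each worker processes its own chunk
partial <- clusterApply(cl_demo, chunks, function(chunk) sum(chunk^2))

# gather: the manager combines the partial results
total <- Reduce(`+`, partial)

stopCluster(cl_demo)
print(total)
```

The anonymous function is shipped to the workers together with each chunk, so nothing needs to be exported beforehand in this small case.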



## Ejemplos

### 1) Hello world!

In [None]:
library(parallel)

In [None]:
p<-detectCores()

In [None]:
p

In [None]:
cl<-makeCluster(p)

In [None]:
cl

In [228]:
clusterApply(cl, 1:p,function(dummy)print("Hello world!"))

In [229]:
clusterApply(cl, 1:5,function(dummy)print("Hello world!"))

In [3]:
library(microbenchmark)
library(tictoc)

### 2) Composite rectangle rule

In [7]:
f<-function(x)exp(-x**2)

In [8]:
a<-0
b<-1
n<-10**6
h_hat<-(b-a)/n

**Sequential form**

In [9]:
Rcf1<-function(f,a,b,n){
    #Compute numerical approximation using rectangle or mid-point method in 
    #an interval.
    #Nodes are generated via formula: x_i = a+(i+1/2)h_hat for i=0,1,...,n-1 and h_hat=(b-a)/n
    #Args:
    #    f (function): function of integrand
    #    a (int): left point of interval
    #    b (int): right point of interval
    #    n (int): number of subintervals
    #Returns:
    #    Rcf (float)
    h_hat<-(b-a)/n
    sum_res<-0
    for(j in 0:(n-1)){
        x<-a+(j+1/2.0)*h_hat
        sum_res<-sum_res+f(x)
    }
    h_hat*sum_res
}

In [10]:
system.time(aprox<-Rcf1(f,a,b,n))

   user  system elapsed 
  0.600   0.000   0.607 

In [11]:
err_relativo<-function(aprox,obj)abs(aprox-obj)/abs(obj)

In [12]:
obj<-integrate(Vectorize(f),a,b) #the integrate documentation
                                 #mentions using Vectorize

In [13]:
err_relativo(aprox,obj$value)

In [14]:
Rcf2<-function(f,a,b,n){
    #Compute numerical approximation using rectangle or mid-point method in 
    #an interval.
    #Nodes are generated via formula: x_i = a+(i+1/2)h_hat for i=0,1,...,n-1 and h_hat=(b-a)/n
    #Args:
    #    f (function): function of integrand
    #    a (int): left point of interval
    #    b (int): right point of interval
    #    n (int): number of subintervals
    #Returns:
    #    Rcf (float)
    h_hat<-(b-a)/n
    sum_res<-0
    x<-vapply(0:(n-1),function(j)a+(j+1/2.0)*h_hat,numeric(1))
    for(j in 1:n){
        sum_res<-sum_res+f(x[j])
    }
    h_hat*sum_res
}

In [15]:
system.time(aprox<-Rcf2(f,a,b,n))

   user  system elapsed 
  1.490   0.000   1.491 

In [16]:
err_relativo(aprox,obj$value)

An implementation that uses the `sum` function of `R` is the following:

In [17]:
Rcf3<-function(f,a,b,n){
    #Compute numerical approximation using rectangle or mid-point method in 
    #an interval.
    #Nodes are generated via formula: x_i = a+(i+1/2)h_hat for i=0,1,...,n-1 and h_hat=(b-a)/n
    #Args:
    #    f (function): function of integrand
    #    a (int): left point of interval
    #    b (int): right point of interval
    #    n (int): number of subintervals
    #Returns:
    #    Rcf (float)
    h_hat<-(b-a)/n
    x<-vapply(0:(n-1),function(j)a+(j+1/2.0)*h_hat,numeric(1))
    h_hat*sum(f(x))
}

In [18]:
system.time(aprox<-Rcf3(f,a,b,n))

   user  system elapsed 
  1.070   0.010   1.075 

In [19]:
err_relativo(aprox,obj$value)
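
As a possible further step, the nodes can also be built with plain vector arithmetic instead of `vapply`. The sketch below is not part of the original note; the names `Rcf_vec`, `f_demo`, `aprox_vec` and `obj_demo` are hypothetical, and it assumes the integrand accepts a vector argument, as `exp(-x^2)` does.

```r
# Hypothetical name Rcf_vec; this variant is not part of the original note.
Rcf_vec <- function(f, a, b, n) {
    h_hat <- (b - a) / n
    # all midpoints at once: x_i = a + (i + 1/2) * h_hat for i = 0, ..., n-1
    x <- a + (0:(n - 1) + 0.5) * h_hat
    h_hat * sum(f(x))
}

f_demo <- function(x) exp(-x^2)
aprox_vec <- Rcf_vec(f_demo, 0, 1, 10^6)
obj_demo <- integrate(f_demo, 0, 1)

# relative error against the value returned by integrate
print(abs(aprox_vec - obj_demo$value) / abs(obj_demo$value))
```

It could be added to the `microbenchmark` call alongside `Rcf1`, `Rcf2` and `Rcf3` to compare timings on the same machine.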

In [20]:
library(tictoc)

In [21]:
tic("timing of the sequential rectangle rule with tictoc")
Rcf1(f,a,b,n)
toc()

timing of the sequential rectangle rule with tictoc: 0.703 sec elapsed


In [22]:
mbk<-microbenchmark(
    Rcf1(f,a,b,n),
    Rcf2(f,a,b,n),
    Rcf3(f,a,b,n),
    times=10
    )

In [23]:
print(mbk)

Unit: milliseconds
             expr       min        lq      mean    median        uq       max
 Rcf1(f, a, b, n)  535.5175  538.3726  548.1754  545.6214  547.2653  594.4634
 Rcf2(f, a, b, n) 1167.4706 1174.7262 1191.5232 1181.6957 1210.0328 1220.6331
 Rcf3(f, a, b, n)  684.7582  702.3215  717.8217  712.9784  719.0908  794.4511
 neval
    10
    10
    10


**Parallel form**

In [27]:
ns_p<-as.integer(n/p)

In [28]:
sprintf("number of subintervals: %d",n)

In [29]:
sprintf("number of subintervals per process: %d",ns_p)

In [32]:
Rcf_parallel<-function(mi_id){
    begin<-mi_id*ns_p
    end<-begin+ns_p
    suma_res<-0
    for(j in begin:(end-1)){
        x<-a+(j+1/2.0)*h_hat
        suma_res<-suma_res+f(x)
    }
    suma_res    
}

In [89]:
clusterExport(cl,c('ns_p','a','f','h_hat'))

In [34]:
tic("regla Rcf_parallel")
result<-clusterApply(cl,0:(p-1),Rcf_parallel)
aprox<-h_hat*Reduce(sum,result)
toc()

regla Rcf_parallel: 0.477 sec elapsed


In [35]:
err_relativo(aprox,obj$value)
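
The index arithmetic in `Rcf_parallel` (worker `mi_id` handles the indices from `mi_id*ns_p` to `mi_id*ns_p+ns_p-1`) can be checked without a cluster. The following sketch uses small hypothetical values, `n_demo` and `p_demo`, chosen so that the number of workers does not divide the number of subintervals; in that situation the truncation in `as.integer(n/p)` leaves the last indices unassigned.

```r
# Hypothetical values for this sketch; the note uses n = 10^6 and p = detectCores().
n_demo <- 10
p_demo <- 3
ns_p_demo <- as.integer(n_demo / p_demo)  # 3 subintervals per worker

# indices handled by worker mi_id, mirroring begin:(end-1) in Rcf_parallel
indices_for <- function(mi_id) {
    begin <- mi_id * ns_p_demo
    end <- begin + ns_p_demo
    begin:(end - 1)
}

covered <- unlist(lapply(0:(p_demo - 1), indices_for))
print(covered)                           # indices assigned to some worker
print(setdiff(0:(n_demo - 1), covered))  # indices left out when p does not divide n
```

When `n` is a multiple of `p` the second vector is empty and the partial sums cover every node exactly once.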

In [36]:
clapply<-function(cl,p){
    result<-clusterApply(cl,0:(p-1),Rcf_parallel)
    aprox<-h_hat*Reduce(sum,result)
}

In [37]:
mbk<-microbenchmark(
    Rcf1(f,a,b,n),
    clapply(cl,p),
    times=10
    )

In [38]:
print(mbk)

Unit: milliseconds
             expr      min       lq     mean   median       uq      max neval
 Rcf1(f, a, b, n) 536.6519 545.6601 572.3095 550.1806 558.9702 719.3401    10
   clapply(cl, p) 305.8862 316.0137 347.8998 345.7567 375.7119 396.7314    10
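
One way to read this table is through the usual definitions of speedup (sequential time divided by parallel time) and efficiency (speedup divided by the number of workers). The sketch below uses the two medians reported above; `p_demo` is a hypothetical worker count, since the note does not show the value of `p`.

```r
# Medians (milliseconds) copied from the microbenchmark output above.
t_seq <- 550.1806
t_par <- 345.7567

speedup <- t_seq / t_par

# p_demo is a hypothetical worker count; the note does not show the value of p.
p_demo <- 4
efficiency <- speedup / p_demo

print(c(speedup = speedup, efficiency = efficiency))
```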


An implementation using `clusterSplit`:

In [66]:
Rcf_parallel2<-function(chunk){
    suma_res<-0
    for(x in chunk){
        suma_res<-suma_res+f(x)
    }
    suma_res    
}

In [69]:
tic()
chunks<-clusterSplit(cl,vapply(0:(n-1),function(j)a+(j+1/2.0)*h_hat,numeric(1))) #create the chunks
                                                                                 #from the set of nodes
result<-clusterApply(cl,chunks,Rcf_parallel2)
aprox<-h_hat*Reduce(sum,result)
toc()

1.253 sec elapsed


In [70]:
err_relativo(aprox,obj$value)

In [221]:
Rcf_parallel3<-function(chunk){
    ns_p<-length(chunk)
    begin<-chunk[1]
    end<-begin+ns_p
    suma_res<-0
    for(j in begin:(end-1)){
        x<-a+(j+1/2.0)*h_hat
        suma_res<-suma_res+f(x)
    }
    suma_res    
}

In [222]:
tic()
chunks<-clusterSplit(cl,0:(n-1)) #create the chunks
                                 #from the iterable
result<-clusterApply(cl,chunks,Rcf_parallel3)
aprox<-h_hat*Reduce(sum,result)
toc()

0.551 sec elapsed


In [223]:
err_relativo(aprox,obj$value)
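
`Rcf_parallel3` rebuilds each range from `chunk[1]` and `length(chunk)`, which only works if every chunk holds consecutive indices. A small check of that assumption, using a hypothetical 2-worker cluster `cl_demo` and a short sequence, might look like this:

```r
library(parallel)

cl_demo <- makeCluster(2)  # 2 workers, an arbitrary choice for this sketch
chunks_demo <- clusterSplit(cl_demo, 0:9)
stopCluster(cl_demo)

print(chunks_demo)

# each chunk should equal chunk[1]:(chunk[1] + length(chunk) - 1)
is_consecutive <- vapply(chunks_demo, function(chunk) {
    all(chunk == chunk[1] + seq_along(chunk) - 1)
}, logical(1))
print(is_consecutive)
```

Sending only the index chunks, as `Rcf_parallel3` does, also means the workers compute the nodes themselves instead of receiving them from the manager.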

Measuring with microbenchmark:

In [225]:
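#assumed definition (missing from the note), rebuilt from the Rcf_parallel2 cell; p is unused
clapply2<-function(cl,p){
    chunks<-clusterSplit(cl,vapply(0:(n-1),function(j)a+(j+1/2.0)*h_hat,numeric(1)))
    result<-clusterApply(cl,chunks,Rcf_parallel2)
    aprox<-h_hat*Reduce(sum,result)
}
clapply3<-function(cl){
    chunks<-clusterSplit(cl,0:(n-1))
    result<-clusterApply(cl,chunks,Rcf_parallel3)
    aprox<-h_hat*Reduce(sum,result)
}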

In [226]:
mbk<-microbenchmark(
    Rcf1(f,a,b,n),
    clapply(cl,p),
    clapply2(cl,p),
    clapply3(cl),
    times=10
    )

In [227]:
print(mbk)

Unit: milliseconds
             expr        min        lq      mean    median        uq       max
 Rcf1(f, a, b, n) 547.169119 561.60741 576.21081 564.64923 586.70883 649.34550
   clapply(cl, p)   5.765622  10.96246  36.00702  49.67209  51.45157  62.24972
  clapply2(cl, p) 811.023255 833.16958 853.15478 854.25546 872.13496 908.65073
     clapply3(cl) 429.537391 463.68564 488.04472 480.09506 513.84278 553.04516
 neval
    10
    10
    10
    10


**Plot**

In [220]:
n<-10**6
h_hat<-(b-a)/n
clusterExport(cl,c('a','f','h_hat'))

**It is good practice to stop the cluster; note that the documentation states:** It is good practice to shut down the workers by calling stopCluster: however the workers will terminate themselves once the socket on which they are listening for commands becomes unavailable, which it should if the master R session is completed (or its process dies).

In [35]:
stopCluster(cl)
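
When the cluster is created inside a function, one possible way to make sure it is always stopped is to register `stopCluster` with `on.exit`. The helper `with_cluster` below is hypothetical and only sketches the idea; it is not part of the note or of the **parallel** package.

```r
library(parallel)

# Hypothetical helper: runs a function over inputs on a temporary cluster.
with_cluster <- function(n_workers, inputs, fun) {
    cl_tmp <- makeCluster(n_workers)
    # stopCluster runs when the function exits, whether normally or by error
    on.exit(stopCluster(cl_tmp), add = TRUE)
    clusterApply(cl_tmp, inputs, fun)
}

res_demo <- with_cluster(2, 1:4, function(k) k^2)
print(unlist(res_demo))
```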

**References:**

1. N. Matloff, Parallel Computing for Data Science. With Examples in R, C++ and CUDA, 2014.

2. [2.1.Un_poco_de_historia_y_generalidades](https://github.com/ITAM-DS/analisis-numerico-computo-cientifico/blob/master/temas/II.computo_paralelo/2.1.Un_poco_de_historia_y_generalidades.ipynb)

3. [2.2.Sistemas_de_memoria_compartida.ipynb](https://github.com/ITAM-DS/analisis-numerico-computo-cientifico/blob/master/temas/II.computo_paralelo/2.2.Sistemas_de_memoria_compartida.ipynb)

Other references:

* [snow Simplified](http://www.sfu.ca/~sblay/R/snow.html)

* [Using foreach and iterators for manual parallel execution](https://docs.microsoft.com/en-us/machine-learning-server/r/how-to-revoscaler-distributed-computing-foreach)

Another package to look into:

* [future](https://www.rdocumentation.org/packages/future/versions/1.16.0)