## Overview: how to set up a mounted path and pre-install libraries/packages

1. Establish a standard convention within your organization and your shared workspace for `{project}/{dbr_version}/{Library}` and/or `{.Rprofile}` paths

2. Install libraries and packages in the path identified as `R_LIB_PATH_MOUNTED`   

3. Set up Cluster Configs:      

    - 3A [Recommended]: Set the Cluster Init Scripts with path to Volumes Init Script      
    - 3B [Alternative]: Set the Cluster Environment variable `R_PROFILE_USER`     
    
     
##### Post Installation of Library/Packages + Cluster Init Script / Environment Variable set up
With the libraries pre-installed and either the Cluster Init Script (via Volumes) or the Environment Variable set up, we now make sure that the mounted path is `Read-Only`:

4. Unmount + Remount Storage onto DBFS with `Read-Only` permissions        

5. After the compute restarts/updates, [TEST] that R libraries load

In [0]:
## START cluster for setup: mmt_dbr14.3LTSML_cpu

### 1. Establish a standard convention within your Organization and your shared workspace for `{project}/{dbr_version}/{Library}` &/or `{.Rprofile}` paths


#### 1.1 Use Python and [dbutils](https://docs.databricks.com/en/dev-tools/databricks-utils.html) to  
- set up `Project`, `Library` paths
- retrieve the configured `Scopes` and `Secrets` [[AWS](https://docs.databricks.com/en/security/secrets/secret-scopes.html) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/security/secrets/secret-scopes)] that store credentials such as Shared Access Tokens for external blob storage (e.g. ADLS/S3)

In [0]:
%python

PROJECT_GROUP = "dais24_demo"
PROJECT_DIR = "faster_lib_loads"
DBR_VERSION = "14.3LTS_ML"

PATH_MOUNTED = f"/mnt/{PROJECT_GROUP}/"  ## points to blobContainerName = "dais24"
R_LIB_PATH_MOUNTED = f"/mnt/{PROJECT_GROUP}/{PROJECT_DIR}/{DBR_VERSION}/libs/r/"

R_PROFILE_PATH = f"/mnt/{PROJECT_GROUP}/{PROJECT_DIR}/{DBR_VERSION}/clusterEnv/r_profile/"  # .Rprofile

print(f"PATH_MOUNTED: {PATH_MOUNTED}")
print(f"R_LIB_PATH_MOUNTED: {R_LIB_PATH_MOUNTED}")
print(f"R_PROFILE_PATH: {R_PROFILE_PATH}")
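To keep the convention consistent across projects and runtime versions, the path construction above could be wrapped in a small helper. This is only a sketch: the `ProjectPaths` class and its property names are hypothetical, and the layout simply mirrors the three variables defined in the cell above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectPaths:
    """Hypothetical helper: derives paths from {project}/{dbr_version}/... parts."""

    project_group: str
    project_dir: str
    dbr_version: str

    @property
    def mount_point(self) -> str:
        # Same shape as PATH_MOUNTED
        return f"/mnt/{self.project_group}/"

    @property
    def r_lib_path(self) -> str:
        # Same shape as R_LIB_PATH_MOUNTED
        return f"{self.mount_point}{self.project_dir}/{self.dbr_version}/libs/r/"

    @property
    def r_profile_path(self) -> str:
        # Same shape as R_PROFILE_PATH
        return f"{self.mount_point}{self.project_dir}/{self.dbr_version}/clusterEnv/r_profile/"


paths = ProjectPaths("dais24_demo", "faster_lib_loads", "14.3LTS_ML")
print(paths.mount_point)
print(paths.r_lib_path)
print(paths.r_profile_path)
```

Changing only `dbr_version` then yields a parallel library tree for another runtime without touching the rest of the notebook.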

#### 1.2 Associate Blob Storage with PATH_MOUNTED as the `dbfs:/mnt` path where specific R packages are installed

![Blob Storage as PATH_MOUNTED](./markdown_images/access_storage_container.png)

<!-- %python
from IPython.display import Image

Image(
    filename="/Workspace/Users/{username@email.com}/Faster_Lib_Loads/markdown_images/access_storage_container.png",
    width=1600,
) -->

![Blob Storage as SAStoken](./markdown_images/generateSAStoken.png)


<!-- %python
from IPython.display import Image

Image(
    filename="/Workspace/Users/{username@email.com}/Faster_Lib_Loads/markdown_images/generateSAStoken.png",
    width=1600,
) -->

In [0]:
%python
secret_scope_name = "dais24_fasterlibloads"
dbutils.secrets.list(f"{secret_scope_name}")

In [0]:
%python

storageAccountName = "hlsfieldexternal"
blobContainerName = "dais24"
secret_scope_name = "dais24_fasterlibloads"
r_sasToken = dbutils.secrets.get(f"{secret_scope_name}", "r_token")
rwr_sasToken = dbutils.secrets.get(f"{secret_scope_name}", "rwr_token")

mountPoint = PATH_MOUNTED


def mount_azblob(storageAccountName, blobContainerName, mountPoint, sasToken):
    """Mount an Azure Blob container at mountPoint, replacing any existing mount there."""

    # first unmount if already mounted
    # (compare without trailing slashes, in case the mount is listed without one)
    target = mountPoint.rstrip("/")
    if any(mount.mountPoint.rstrip("/") == target for mount in dbutils.fs.mounts()):
        dbutils.fs.unmount(mountPoint)

    try:
        # mount to specified mountPoint
        dbutils.fs.mount(
            source=f"wasbs://{blobContainerName}@{storageAccountName}.blob.core.windows.net",
            mount_point=mountPoint,
            extra_configs={
                f"fs.azure.sas.{blobContainerName}.{storageAccountName}.blob.core.windows.net": sasToken
            },
        )
        print("mount succeeded!")
    except Exception as e:
        print("mount exception", e)

    # refresh the mount cache whether or not the mount succeeded
    dbutils.fs.refreshMounts()

In [0]:
%python
mount_azblob(storageAccountName, blobContainerName, mountPoint, rwr_sasToken)

In [0]:
%python
dbutils.fs.mounts()

In [0]:
%python
[p for p in dbutils.fs.mounts() if f"{blobContainerName}" in p.source]
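The filter above can be turned into a reusable check. The sketch below assumes only that each mount entry exposes `mountPoint` and `source` attributes, as the cells above already rely on; `find_mounts` and `FakeMount` are hypothetical names, and the stand-in list replaces `dbutils.fs.mounts()` so the snippet also runs outside Databricks.

```python
from collections import namedtuple


def find_mounts(mounts, needle):
    """Return the mount points whose source string contains `needle`."""
    return [m.mountPoint for m in mounts if needle in m.source]


# Stand-in for the entries returned by dbutils.fs.mounts() (illustrative values)
FakeMount = namedtuple("FakeMount", ["mountPoint", "source"])
sample_mounts = [
    FakeMount("/mnt/dais24_demo", "wasbs://dais24@hlsfieldexternal.blob.core.windows.net"),
    FakeMount("/databricks-datasets", "databricks-datasets"),
]

print(find_mounts(sample_mounts, "dais24"))
```

In the notebook itself you would pass `dbutils.fs.mounts()` instead of `sample_mounts`.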

In [0]:
%python
mountpath = PATH_MOUNTED
display(dbutils.fs.ls(mountpath))

In [0]:
%python
R_LIB_PATH_MOUNTED

### 2. Install libraries and packages in the path identified above as R_LIB_PATH_MOUNTED 

#### 2.1 Use R language for R packages installation 

In [0]:
# set up the HTTP user agent so that Posit knows which R version and platform we are using
options(HTTPUserAgent = sprintf("R/%s R (%s)", getRversion(), paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"])))

# https://packagemanager.posit.co/client/#/
# this will vary based on Databricks runtime version! 
options(repos = c(POSIT = "https://packagemanager.posit.co/cran/__linux__/jammy/latest", CRAN="http://cran.us.r-project.org"))

# Ensure that you update the URL to match the linux version of the runtime e.g. jammy or bionic
# options(repos = c(POSIT = "https://packagemanager.posit.co/cran/__linux__/<linux-release-name>/latest"))
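Since the release name in the repository URL has to match the runtime's Linux version, it can help to look it up rather than guess. The following is a sketch that assumes the driver is an Ubuntu image whose `/etc/os-release` contains a `VERSION_CODENAME` entry; the function names are hypothetical, and a sample string is parsed here so the snippet runs anywhere.

```python
def ubuntu_codename(os_release_text):
    """Extract VERSION_CODENAME (e.g. 'jammy') from os-release style text."""
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "VERSION_CODENAME":
            return value.strip().strip('"')
    return None


def posit_repo_url(codename):
    """Build the repository URL using the pattern from the cell above."""
    return f"https://packagemanager.posit.co/cran/__linux__/{codename}/latest"


# Sample content; on the cluster, read the real file instead:
#   os_release_text = open("/etc/os-release").read()
sample_os_release = 'NAME="Ubuntu"\nVERSION_ID="22.04"\nVERSION_CODENAME=jammy\n'

codename = ubuntu_codename(sample_os_release)
print(posit_repo_url(codename))
```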


In [0]:
%fs mkdirs "dbfs:/mnt/dais24_demo/faster_lib_loads/14.3LTS_ML/libs/r/"

In [0]:
%fs ls "dbfs:/mnt/dais24_demo/faster_lib_loads/14.3LTS_ML/libs/r/"
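The `%fs` cells address the library directory as `dbfs:/mnt/...`, while the R cells need the local file path form `/dbfs/mnt/...`. A small converter avoids retyping the path by hand. This is a sketch; `to_local_path` is a hypothetical helper name.

```python
def to_local_path(path):
    """Convert 'dbfs:/mnt/x' or '/mnt/x' to the local form '/dbfs/mnt/x'."""
    if path.startswith("/dbfs/"):
        return path  # already in local form
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    if not path.startswith("/"):
        path = "/" + path
    return "/dbfs" + path


print(to_local_path("dbfs:/mnt/dais24_demo/faster_lib_loads/14.3LTS_ML/libs/r/"))
```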

In [0]:
packages <- c("data.table","car","lmtest","mclust","fitdistrplus","mixtools","extraDistr","actuar","forecast","SparkR","stringi","assertthat","naniar","tidyverse","XML","xml2","rcompanion","librarian", "ggiraph","ggiraphExtra","gtable", "ggplot2")

R_LIB_PATH_MOUNTED <- '/dbfs/mnt/dais24_demo/faster_lib_loads/14.3LTS_ML/libs/r/'

install.packages(packages, 
                 dependencies=TRUE,
                 INSTALL_opts = "--no-lock", 
                 repos=c(POSIT = "https://packagemanager.posit.co/cran/__linux__/jammy/latest"),                 
                 lib = R_LIB_PATH_MOUNTED,                  
                 upgrade=TRUE, update.packages = TRUE,
                 quiet = FALSE, 
                 verbose = TRUE
                ) 

# dependencies are downloaded and installed to lib path 

# Command took 15.21 minutes -- by may.merkletan@databricks.com at 6/4/2024, 1:49:01 PM on mmt_14.3LTSML_cpu_(r)

In [0]:
R_LIB_PATH_MOUNTED <- '/dbfs/mnt/dais24_demo/faster_lib_loads/14.3LTS_ML/libs/r/'

In [0]:
%fs ls /mnt/dais24_demo/faster_lib_loads/14.3LTS_ML/libs/r/
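Beyond listing the directory, you may want to confirm which packages and versions actually landed in the library path. An R library directory holds one sub-directory per package, each with a `DESCRIPTION` file, so a rough inventory can be built from Python. This is a sketch: `list_r_packages` is a hypothetical helper, and the demo uses a throwaway directory with a made-up version number rather than the real mount.

```python
import os
import tempfile


def list_r_packages(lib_dir):
    """Map package name -> version for each sub-directory with a DESCRIPTION file."""
    found = {}
    for name in sorted(os.listdir(lib_dir)):
        description = os.path.join(lib_dir, name, "DESCRIPTION")
        if not os.path.isfile(description):
            continue  # not an installed package
        version = None
        with open(description, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if line.startswith("Version:"):
                    version = line.split(":", 1)[1].strip()
                    break
        found[name] = version
    return found


# Demo on a temporary directory (the version below is invented for illustration)
demo_lib = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_lib, "rcompanion"))
with open(os.path.join(demo_lib, "rcompanion", "DESCRIPTION"), "w") as fh:
    fh.write("Package: rcompanion\nVersion: 0.0.1\n")
os.makedirs(os.path.join(demo_lib, "not_a_package"))

print(list_r_packages(demo_lib))
```

On the cluster, the argument would be the local form of `R_LIB_PATH_MOUNTED`, i.e. the `/dbfs/mnt/...` path.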

#### 2.2 Helpful quick check on `.libPaths()` appending 

In [0]:
R_LIB_PATH_MOUNTED <- '/dbfs/mnt/dais24_demo/faster_lib_loads/14.3LTS_ML/libs/r/'
# Append the library to the end of the search path (default DBR libraries stay first)
.libPaths(c(.libPaths(), R_LIB_PATH_MOUNTED))

In [0]:
.libPaths()

In [0]:
packINFO <- as.data.frame(installed.packages())[,c("Package", "Version", "LibPath")]
rownames(packINFO) <- NULL

print(packINFO) 
print(dim(packINFO))

In [0]:
# CHECK WHERE packages installed... 
system.file(package = "ggplot2") ## the default DBR copy is found earlier in the search path 

In [0]:
rcompanion::compareGLM

In [0]:
system.file(package = "rcompanion") 

In [0]:
system.file(package = "ggiraphExtra") ## 

In [0]:
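# [7] is assumed to be the position of R_LIB_PATH_MOUNTED in .libPaths() on this cluster; adjust if yours differs
library(ggiraphExtra, lib.loc=.libPaths()[7])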
ggiraphExtra::ggBoxplot

In [0]:
system.file(package = "ggplot2") 

In [0]:
ggplot2::geom_boxplot

### 3. Set up Cluster Configs:

### 3A [Recommended]: Set the Cluster Init Scripts with path to Volumes Init Script 

Use an `init.sh` script stored in a Volume to append the path of the pre-installed libraries/packages, `{R_LIB_PATH_MOUNTED}`, to `.libPaths()`. The script leverages `cat <<EOL` to insert code into `/root/.Rprofile` during cluster startup.
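The contents of `r_profile_init.sh` are not reproduced in this notebook, so the following is only a guess at its general shape: a Python function that renders a minimal init script using a `cat <<EOL` heredoc. The function name, the use of `>>` to append rather than overwrite, and the single `.libPaths()` line are all assumptions.

```python
def render_init_script(r_lib_path, rprofile="/root/.Rprofile"):
    """Render a shell init script that appends one .libPaths() line to an .Rprofile."""
    return (
        "#!/bin/bash\n"
        # Assumption: append (>>) so any existing profile content is kept
        f"cat <<EOL >> {rprofile}\n"
        f'.libPaths(c(.libPaths(), "{r_lib_path}"))\n'
        "EOL\n"
    )


script = render_init_script("/dbfs/mnt/dais24_demo/faster_lib_loads/14.3LTS_ML/libs/r/")
print(script)
```

Because the heredoc delimiter is unquoted, the shell would expand any `$` in the body; the R line here contains none, so it is written out verbatim.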


####  3A.1 We will use Python to help with `shell cmds`  
- create + copy Init shell script `r_profile_init.sh` from Workspace to Volumes

In [0]:
%python
!head -n 8 r/r_profile_init.sh

In [0]:
%python
!cp r/r_profile_init.sh /Volumes/mmt_external/dais24/ext_vols/init_scripts/r_profile_init.sh

In [0]:
%python
!head /Volumes/mmt_external/dais24/ext_vols/init_scripts/r_profile_init.sh

####  3A.2 Next, we will specify `Init Scripts` within the cluster's `Advanced options`

![Init Scripts](./markdown_images/cluster_RProfile_InitVol.png)

<!-- %python
from IPython.display import Image

Image(
    filename="/Workspace/Users/{user@email.com}/Faster_Lib_Loads/markdown_images/cluster_RProfile_InitVol.png",
    width=900,
) -->

### 3B [Alternative]: Set the Cluster Environment variable `R_PROFILE_USER`


####  3B.1 We will use Python and [dbutils](https://docs.databricks.com/en/dev-tools/databricks-utils.html) to  
- set up a workspace `.Rprofile` script 
- specify `.Rprofile` script to include path `{R_LIB_PATH_MOUNTED}` to pre-installed libraries/packages during cluster startup 
- copy workspace `.Rprofile` script to mounted `{R_PROFILE_PATH}`
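The workspace `r/.Rprofile` is only previewed with `head` below, so as an illustration of the steps just listed, here is a sketch that writes a one-line `.Rprofile` and builds the matching `R_PROFILE_USER` setting. Both function names and the profile content are assumptions; the demo writes to a temporary directory instead of the mounted path.

```python
import os
import tempfile


def write_rprofile(target_dir, r_lib_path):
    """Write a one-line .Rprofile into target_dir and return the file path."""
    os.makedirs(target_dir, exist_ok=True)
    profile_file = os.path.join(target_dir, ".Rprofile")
    with open(profile_file, "w", encoding="utf-8") as fh:
        # Assumed content: append the pre-installed library path to the search path
        fh.write(f'.libPaths(c(.libPaths(), "{r_lib_path}"))\n')
    return profile_file


def r_profile_user_line(profile_file):
    """Format the environment variable line for the cluster configuration."""
    return f"R_PROFILE_USER={profile_file}"


demo_dir = os.path.join(tempfile.mkdtemp(), "clusterEnv", "r_profile")
profile_file = write_rprofile(
    demo_dir, "/dbfs/mnt/dais24_demo/faster_lib_loads/14.3LTS_ML/libs/r/"
)
env_line = r_profile_user_line(profile_file)
print(env_line)
```

On the cluster, `target_dir` would be the local (`/dbfs/...`) form of `R_PROFILE_PATH`, so that R can read the file named by `R_PROFILE_USER` at startup.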

In [0]:
%python
!head r/.Rprofile

In [0]:
%python

dbutils.fs.mkdirs(f"{R_PROFILE_PATH}")

R_PROFILE_FILEPATH = f"/dbfs{R_PROFILE_PATH}.Rprofile"
!echo $R_PROFILE_FILEPATH

!cp r/.Rprofile $R_PROFILE_FILEPATH

In [0]:
%python
!head $R_PROFILE_FILEPATH

####  3B.2 Next, we will specify `Environment variables` within the cluster's `Advanced options`

`R_PROFILE_USER=/dbfs/mnt/{PROJECT_GROUP}/{PROJECT_DIR}/{DBR_VERSION}/clusterEnv/r_profile/.Rprofile`



![cluster_RProfileUser_EnvVar](./markdown_images/cluster_RProfileUser_EnvVar.png)

<!-- %python
from IPython.display import Image

Image(
    filename="/Workspace/Users/{username@email.com}/Faster_Lib_Loads/markdown_images/cluster_RProfileUser_EnvVar.png",
    width=900,
) -->

### Post Installation of Library/Packages + Cluster Init Script / Environment Variable set up
With the libraries pre-installed and either the Cluster Init Script (via Volumes) or the Environment Variable set up, we now make sure that the mounted path is Read-Only:  

### 4. Unmount + Remount Storage onto DBFS with Read-Only permissions 
Now that the R packages are installed, we can unmount the (Azure Blob) storage that was mounted with the read-and-write token (`rwr_token`) and re-mount the same storage path with the `read-only` token (`r_token`).

*We use `Python` to facilitate this step*

<!-- ```
%python
mountpath = PATH_MOUNTED 
dbutils.fs.unmount(mountpath)

dbutils.fs.refreshMounts()
``` -->

In [0]:
%python
mount_azblob(storageAccountName, blobContainerName, mountPoint, r_sasToken)

In [0]:
%python
[p for p in dbutils.fs.mounts() if "dais24" in p.source]

In [0]:
%fs ls "dbfs:/mnt/dais24_demo/faster_lib_loads/14.3LTS_ML/libs/r/"

In [0]:
%fs mkdirs "dbfs:/mnt/dais24_demo/faster_lib_loads/14.3LTS_ML/libs/test"
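The `mkdirs` call above presumably serves as a write probe: on a read-only mount it should fail. The same check can be done from Python against the local path. This is a sketch; `can_write` is a hypothetical helper, demonstrated here on a temporary directory rather than the mount.

```python
import os
import tempfile
import uuid


def can_write(directory):
    """Return True if a small probe file can be created in `directory`."""
    probe = os.path.join(directory, f".write_probe_{uuid.uuid4().hex}")
    try:
        with open(probe, "w") as fh:
            fh.write("probe")
    except OSError:
        # Covers read-only filesystems, permission errors, and missing directories
        return False
    os.remove(probe)  # clean up after a successful write
    return True


print(can_write(tempfile.mkdtemp()))
```

After the read-only remount, calling it on the `/dbfs/mnt/...` library path would be expected to return `False`.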

### 5. Post Compute re-start/updates 

In [0]:
# - Change compute to : mmt_14.3LTSML_cpu_(fasterlibloads_r) 

In [0]:
.libPaths()

#### 5.1 [TEST] R_library loads

In [0]:
library(rcompanion) 

In [0]:
??rcompanion 

In [0]:
rcompanion::compareGLM

In [0]:
system.file(package="rcompanion")

In [0]:
system.file(package = "ggplot2") ## default path 

In [0]:
system.file(package = "ggiraphExtra") ## actually requires a more recent version of ggplot2 

In [0]:
library("ggiraphExtra", lib.loc=.libPaths()[7]) ## requires specifying path for more recent version of ggplot2
ggiraphExtra::ggBoxplot

In [0]:
system.file(package = "ggplot2") ## 

In [0]:
library(ggplot2, lib.loc = .libPaths()[7])
ggplot(diamonds, aes(carat, price, colour = clarity, group = clarity)) + geom_point(alpha = 0.3) + stat_smooth()

In [0]:
diamonds

In [0]:
system.file(package = "ggplot2") ## 