# Intermediate Lesson on Spatial Modeling and Analytics
### Segment 1 of 4
## Introduction to R

```python
# This code cell starts the necessary setup for Hour of CI lesson notebooks.
# First, it enables users to hide and unhide code by producing a 'Toggle raw code' button below.
# Second, it imports the hourofci package, which is necessary for lessons and interactive Jupyter Widgets.
# Third, it helps hide/control other aspects of Jupyter Notebooks to improve the user experience
# This is an initialization cell
# It is not displayed because the Slide Type is 'Skip'

from IPython.display import HTML, IFrame, Javascript, display
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import Layout

import getpass # This library allows us to get the username (User agent string)

# import package for hourofci project
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
import hourofci

import warnings
warnings.filterwarnings('ignore') # Hide warnings

# load javascript to initialize/hide cells, get user agent string, and hide output indicator
# hide code by introducing a toggle button "Toggle raw code"
HTML(''' 
    <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
    <style>
        .output_prompt{opacity:0;}
    </style>
    
    <input id="toggle_code" type="button" value="Toggle raw code">
''')
```


## <center>And now for something completely different!</center>

While all the other lessons in this Hour of CI collection have used Python as the coding language, this lesson introduces another very popular coding language: R.

Read on to find out why...

## Comparing R and Python

<table>
    <tr style="background: #fff; text-align: left; vertical-align:">
        <td style="background: #fff; text-align: left; font-size: 23px;">
            R provides a vast number of discipline-specific data science packages, including packages for
<ul>
    <li>Biostatistics</li>
    <li>Geostatistics</li>
    <li>Econometrics</li>
</ul>
Python has many general-purpose data science libraries, including
<ul>
    <li>Deep Learning (TensorFlow, …)</li>
    <li>Machine Learning (scikit-learn, …)</li>
</ul>
<br>
Importantly, Python is widely used for general analysis and to build scalable software.
</td>
     <td style="width: 50%; background: #fff; text-align: left; vertical-align: top;"> <img src='supplementary/r_python.png' width="500" height="700" alt='map'>
        </td>
    </tr>
</table>










## Strengths of the R Language

Breadth of available packages
- Almost 20,000 packages as of April 2023

Discipline-specific data science functionality

Gentle learning curve

Analysis ecosystem built for 
- Statistical analysis
- Open science

## Weaknesses of the R Language

Speed
- R is often considerably slower than Python
- Loops are notoriously slow

Memory

- R is a single-threaded programming language
- It uses a single CPU core at a time
- Packages for multithreading exist, but they are not part of base R
- Memory bottlenecks occur very frequently with medium-sized (1 to 2 GB) data

R is not as forgiving of inefficient code as Python
             
Avoid loops as much as possible!
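
To make the advice about loops concrete, here is a minimal sketch that computes the same sum of squares twice: once with an explicit loop and once with a vectorized expression. The variable names are made up for this illustration, and the timings reported by `system.time()` will differ from machine to machine.

```r
# Sum of squares of 1..n, computed two ways.
n <- 100000
x <- as.numeric(seq_len(n))

# Explicit loop: R handles one element at a time.
loop_total <- 0
for (value in x) {
  loop_total <- loop_total + value^2
}

# Vectorized: the arithmetic runs over the whole vector in compiled code.
vectorized_total <- sum(x^2)

# Both approaches give the same answer.
stopifnot(isTRUE(all.equal(loop_total, vectorized_total)))

# system.time() reports how long an expression takes to run.
system.time(for (value in x) value^2)
system.time(x^2)
```

The vectorized form is also shorter and easier to read, which is another reason it is the idiomatic choice in R.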

Despite these shortcomings, the R language is extremely popular and powerful for discipline-specific data science, including spatial analysis and modeling.

## <center>If you're going to work with R, you need to know about <b>CRAN</b></center>

## CRAN [(The Comprehensive R Archive Network)](https://cran.r-project.org/)

CRAN is a network of servers that serve all the R executables, source code, and documentation.

It is also an administrative organization that asserts policy and quality control over R packages. It is responsible for
- Ensuring new packages are open-source
- Upholding documentation quality
- Making sure every R package on CRAN works!

CRAN is your one-stop shop for downloading R.

## The R Ecosystem


<center><img src='supplementary/r_eco.png' width="500" height="700" alt='map'></center>


## The R Ecosystem: CRAN

CRAN hosts the vast majority of R packages and their documentation

All CRAN packages can be installed simply by:
- <code>install.packages("package_name")</code>

CRAN serves an exhaustive <a href="https://cran.r-project.org/web/packages/available_packages_by_name.html">list of all available packages</a>
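
In practice, it is common to install a package only when it is missing. The sketch below shows one way to do that. Note that `ensure_package` is a hypothetical helper written for this lesson and is not part of R, while `requireNamespace()`, `install.packages()`, and `library()` are standard R functions.

```r
# Hypothetical helper (not part of R): install a CRAN package only if missing.
ensure_package <- function(package_name) {
  already_installed <- requireNamespace(package_name, quietly = TRUE)
  if (!already_installed) {
    install.packages(package_name)  # downloads from a CRAN mirror
  }
  # Return whether the package was already present before this call.
  invisible(already_installed)
}

# "stats" ships with R, so this call does not download anything.
was_installed <- ensure_package("stats")

# Once installed, attach a package with library() to use its functions.
library(stats)
```

Installing is needed only once per R installation, but `library()` must be called in every new R session.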

## The R Ecosystem: Bioconductor

Serves packages related to bioinformatics
- Extensive package list for fields such as genomics
- More specific in its scope compared to CRAN

As of April 2023, serves almost 2,200 packages

Requires an R package called BiocManager to install packages
- <code>BiocManager::install("package_name")</code>
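
The two steps above (get BiocManager from CRAN, then use it to install from Bioconductor) can be sketched as a small function. `install_from_bioconductor` is a hypothetical name used only for this illustration, and the function is defined but not called here because calling it needs a network connection and a real package name.

```r
# Hypothetical helper (not part of R or Bioconductor).
install_from_bioconductor <- function(package_name) {
  # BiocManager itself is an ordinary CRAN package.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # BiocManager then installs the requested Bioconductor package.
  BiocManager::install(package_name)
}

# Example call (commented out because it downloads software):
# install_from_bioconductor("package_name")
```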

## The R Ecosystem: GitHub

- Packages hosted only on GitHub may not be as carefully managed, updated, or revised as those on CRAN or Bioconductor

- They are not peer-reviewed and may not have documentation

Nevertheless, GitHub is used frequently for personal or development projects

An R package, devtools, can be used to install non-CRAN R packages on GitHub directly
- <code>devtools::install_github("owner_name/repo_name")</code>

## R Vignettes

*Vignettes* are one of the most useful package components available on CRAN.

A vignette is a long-form description of a package, often structured like an academic paper that begins with an introduction to the method(s) implemented.

Vignettes:
- Provide detailed information and illustrations about specific function parameters
- Showcase use on sample problems

Here is an example vignette for spatstat, a commonly used spatial statistics package
- <a href="https://cran.r-project.org/web/packages/spatstat/vignettes/getstart.pdf">Example spatstat vignette</a>
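
Vignettes can also be listed and opened from inside R for any package that is already installed. The sketch below uses the grid package as a stand-in because it is bundled with R; whether its vignettes are actually present depends on how R was installed. The commented-out lines open a viewer or web browser, and the spatstat line assumes that package has been installed.

```r
# List the vignettes that come with an installed package.
grid_vignettes <- vignette(package = "grid")

# The result holds a table with one row per vignette.
grid_vignettes$results[, c("Item", "Title"), drop = FALSE]

# Open a single vignette by name (opens a viewer):
# vignette("grid", package = "grid")

# Browse a package's vignettes in a web browser (assumes spatstat is installed):
# browseVignettes("spatstat")
```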

## Task Views

Since there are so many packages on CRAN, *Task Views* can help you find what you need.

A task view is a list of R packages organized within a theme
- Within the theme, task views organize R packages in terms of common analysis types

For example, the <a href="https://cran.r-project.org/web/views/Spatial.html">Analysis of Spatial Data task view</a>
- Contains a wide array of R packages
- Groups packages with respect to their use in different stages of spatial analysis

## R Documentation

Documentation for each package is provided as a PDF and a live document
- The PDF version can be found on the CRAN page of a package
- See <a href="https://cran.r-project.org/web/packages/spatstat/index.html">spatstat example</a>

Targeted help about a function can be obtained via the R code:
- <code>?function_name</code>

Keyword search for a phrase or concept (use quotes for multiple words)
- <code>??"geographically weighted regression"</code>
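
The `?` and `??` shortcuts have function equivalents, `help()` and `help.search()`, which are handy inside scripts. The sketch below uses the base function `mean` as an example topic. What the keyword search finds depends on which packages are installed on your machine, so no particular result is assumed.

```r
# help("mean") is the function form of ?mean.
mean_help <- help("mean")
mean_help  # printing the object displays the help page

# help.search() is the function form of ??; it searches installed packages only.
search_hits <- help.search("geographically weighted regression")
search_hits

# apropos() lists objects whose names contain a pattern,
# which helps when you only remember part of a function name.
apropos("mean")
```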

And that's all you need to know to get started with R!

<font size="+1"><a style="background-color:blue;color:white;padding:12px;margin:10px;font-weight:bold;" href="sma-3.ipynb">Click here to go to the next notebook.</a></font>