<a href="https://colab.research.google.com/github/Echisholm21/azmpdata/blob/master/azmpdata_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **<font color="Black">azmpdata - R Package Demo</font>**

From **GitHub**, you can launch this page in Colab by clicking on the "Open in Colab" badge.

<br>

### **This Tutorial uses Google Colab.**

![colab logo](https://github.com/halldm2000/NOAA-AI-2020-TUTORIAL/blob/master/Images/colab.png?raw=true)

<br>


## **Set up your environment**

- First, we tell the notebook that we will be working in R
- Then, we install azmpdata from GitHub
- Lastly, we load the package with `library()` to make sure it is available (you may get a message that your package is already up to date)

In [None]:
# activate R magic
%load_ext rpy2.ipython

In [None]:
%%R
# install devtools so we can install packages from GitHub
install.packages('devtools')

# install azmpdata from GitHub, then load it
devtools::install_github('casaultb/azmpdata')
library(azmpdata)
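
Re-running the cell above reinstalls devtools every time. A common pattern is to install a package only when it is missing. The sketch below uses a hypothetical helper, `ensure_installed()` (not part of azmpdata or devtools), built on base R's `requireNamespace()`; the example call checks packages that ship with R, so nothing is downloaded.

```r
# Hypothetical helper (not part of azmpdata or devtools): install CRAN
# packages only when they are missing, so re-running a setup cell is fast.
ensure_installed <- function(pkgs) {
  # requireNamespace() returns FALSE, without an error, if a package is not installed
  is_present <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!is_present]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  # return the names of packages that had to be installed
  invisible(missing)
}

# 'stats' and 'utils' ship with R, so this installs nothing
newly_installed <- ensure_installed(c("stats", "utils"))
newly_installed
```

In the setup cell you could call `ensure_installed('devtools')` in place of the unconditional `install.packages('devtools')`.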

## **Let's try to access some data!**
- Data can be loaded directly with `data()` if we know the dataframe name
- R auto-complete doesn't work in Colab, but in RStudio you would see a list of dataframes as you start typing inside `data(...)`

In [None]:
%%R

data('Discrete_Annual_Broadscale')
head(Discrete_Annual_Broadscale)

- We can also list all the available dataframes

In [None]:
%%R

data(package = 'azmpdata')
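
The printed index is handy for browsing, but the dataset names can also be pulled out as a character vector. A minimal sketch, shown with base R's `datasets` package so it runs without azmpdata; the same `data(package = ...)` call should work with `'azmpdata'` once that package is installed.

```r
# data(package = ...) returns an index object instead of loading anything
ds_index <- data(package = "datasets")

# $results is a character matrix; its "Item" column holds the dataset names
ds_names <- ds_index$results[, "Item"]

head(ds_names)
length(ds_names)
```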

## **Now you try**
- Try calling one of the other available dataframes from the package
- Use the `summary()` function to summarize the data
- Try your favourite 'data peek' function!

In [None]:
%%R

data('Zooplankton_Occupations_Broadscale')
# TODO: Complete activity instructions above!
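
If you are not sure where to start, here are a few common 'data peek' functions. This sketch uses R's built-in `airquality` dataset rather than an azmpdata dataframe, so the numbers are not AZMP data.

```r
# R's built-in airquality dataset stands in for an azmpdata dataframe
aq <- airquality

dim(aq)       # number of rows and columns
str(aq)       # column types and the first few values
summary(aq)   # per-column summaries, including NA counts

# count the missing values in each column
na_counts <- colSums(is.na(aq))
na_counts
```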


## **Searching azmpdata**

There is a built-in search function, `variable_lookup()`, which can help you find the data you are looking for within the azmpdata package.

- This search function can:
    - Search through variable names in all dataframes
    - Search through help files (with `search_help = TRUE`)
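
To give a rough idea of how a variable-name search can work, here is a sketch with a hypothetical helper, `search_columns()`. It is not the azmpdata implementation of `variable_lookup()` and it does not search help files; it only matches a pattern against column names with `grep()`, and it is demonstrated on built-in R datasets.

```r
# Hypothetical helper (not the azmpdata implementation): search the column
# names of several dataframes for a pattern, returning one row per match.
search_columns <- function(pattern, dataset_names, env = globalenv()) {
  hits <- lapply(dataset_names, function(nm) {
    df <- get(nm, envir = env)  # look the dataframe up by its name
    matched <- grep(pattern, names(df), ignore.case = TRUE, value = TRUE)
    data.frame(variable = matched,
               dataframe = rep(nm, length(matched)),
               stringsAsFactors = FALSE)
  })
  # stack the per-dataframe results into one table
  do.call(rbind, hits)
}

hits <- search_columns("temp", c("airquality", "mtcars"))
hits
```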

In [None]:
%%R

res <- variable_lookup('nitrate')
print(res)

In [None]:
%%R

res <- variable_lookup('stratification', search_help = TRUE)
print(res)

The search result includes each variable name that matched your search, along with the dataframe that contains it.

You can then retrieve that dataframe by name with `get()`, which looks up an R object from a character string such as `res$dataframe[1]`.

In [None]:
%%R

head(get(res$dataframe[1]))

## **Now you try**
- Try searching for the term 'zooplankton'
- Try combining your search terms (e.g. 'zooplankton' and 'dry_weight')
- Try searching through both the variable names and help files
- Try calling the dataframe which matches your search terms and viewing the data!

In [None]:
%%R

variable_lookup(...) # HINT: replace '...' with your search term(s)
variable_lookup(..., search_help = TRUE) # search through help files too!

## **Analysis and Plotting**

- We can use dplyr or base R tools to manipulate, analyze and plot the data
- We can also use some specially designed tools to simplify these tasks
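
As a small base R example of manipulating, analyzing and plotting, the sketch below computes annual means with `aggregate()` and plots them. The data are made-up toy values, not AZMP observations.

```r
# Made-up toy data: four samples per year, NOT real AZMP observations
set.seed(7)
toy <- data.frame(year = rep(2000:2004, each = 4),
                  nitrate = round(runif(20, min = 2, max = 8), 2))

# manipulate/analyze: mean nitrate for each year
annual <- aggregate(nitrate ~ year, data = toy, FUN = mean)
annual

# plot: annual means as points joined by lines
plot(annual$year, annual$nitrate, type = "b",
     xlab = "Year", ylab = "Mean nitrate (toy units)")
```

The dplyr equivalent of the `aggregate()` step would be `summarise(group_by(toy, year), nitrate = mean(nitrate))`.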


Let's install another package, `multivaR`:

In [None]:
%%R

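# install multivaR from GitHub, and oce (with its dependencies) from CRAN
devtools::install_github('echisholm21/multivaR')
install.packages('oce', dependencies = TRUE)

# load multivaR, plus dplyr for data manipulation
library(multivaR)
library(dplyr)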

- We can use multivaR for common analysis tasks, such as calculating anomalies

In [None]:
%%R

# call in some data to play with
data('Derived_Occupations_Stations')

# monthly, non-normalized anomalies of nitrate_0_50,
# relative to a climatology built from the years 1999 to 2010
df_anom <- calculate_anomaly(data = Derived_Occupations_Stations,
                             anomalyType = 'monthly',
                             climatologyYears = c(1999, 2010),
                             var = 'nitrate_0_50',
                             normalizedAnomaly = FALSE)
head(df_anom)
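
For intuition about what a monthly anomaly is, here is a base R sketch using one common definition: each value minus the mean for its calendar month over the climatology years, optionally divided by that month's standard deviation for a normalized anomaly. This is an assumption about the general technique, not a description of how `calculate_anomaly()` is implemented; the helper `monthly_anomaly()` and the toy data are hypothetical.

```r
# Made-up toy data: a seasonal cycle plus noise, NOT real AZMP observations
set.seed(1)
toy <- expand.grid(month = 1:12, year = 1999:2012)
toy$nitrate_0_50 <- 5 + 2 * sin(2 * pi * toy$month / 12) + rnorm(nrow(toy), sd = 0.5)

# Hypothetical helper (not the multivaR API): anomaly of `var` relative to
# the monthly climatology over the inclusive year range clim_years
monthly_anomaly <- function(df, var, clim_years, normalized = FALSE) {
  in_clim <- df$year >= clim_years[1] & df$year <= clim_years[2]
  clim <- df[in_clim, ]
  # mean and sd of each calendar month over the climatology period
  clim_mean <- tapply(clim[[var]], clim$month, mean, na.rm = TRUE)
  clim_sd   <- tapply(clim[[var]], clim$month, sd, na.rm = TRUE)
  key <- as.character(df$month)  # match each row to its month's statistics
  anomaly <- df[[var]] - clim_mean[key]
  if (normalized) anomaly <- anomaly / clim_sd[key]
  df$anomaly <- as.numeric(anomaly)
  df
}

toy_anom <- monthly_anomaly(toy, "nitrate_0_50", clim_years = c(1999, 2010))
head(toy_anom)
```

With this definition, the anomalies for each month average to zero over the climatology years by construction, which makes a quick sanity check.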

We can also look at the spatial extent of our data, starting from the region tables that ship with multivaR

In [None]:
%%R
# plot_region(name = 'HL2')
# locate the region attribute and geometry tables shipped with multivaR
sysreg_att <- system.file('extdata', 'polygons_attributes.csv', package = 'multivaR', mustWork = TRUE)
sysreg_geo <- system.file('extdata', 'polygons_geometry.csv', package = 'multivaR', mustWork = TRUE)

# read them into dataframes
regtab_att <- utils::read.csv(sysreg_att)
regtab_geo <- utils::read.csv(sysreg_geo)

head(regtab_att)




In [None]:
%%R
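# get the attribute row(s) for station P5
subtab <- regtab_att[regtab_att$sname == 'P5',]

# keep the geometry rows whose record matches; %in% also works if P5 has several records
subtab_geo <- regtab_geo[regtab_geo$record %in% subtab$record,]

# join attributes and geometry (dplyr joins on the columns the two tables share)
full_join(subtab, subtab_geo)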
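
The same select-and-join steps can be written in base R, followed by a bounding box that summarizes spatial extent. The column names (`record`, `sname`, `longitude`, `latitude`) and all values below are made-up assumptions; check `names(regtab_geo)` for the real columns before adapting it.

```r
# Made-up toy stand-ins for the attribute and geometry tables;
# column names and coordinates are assumptions, not multivaR's real data
att <- data.frame(record = c(1, 2), sname = c("P5", "HL2"))
geo <- data.frame(record    = c(1, 1, 1, 1, 2, 2, 2),
                  longitude = c(-60, -59, -59, -60, -63, -62, -62),
                  latitude  = c(44, 44, 45, 45, 43, 43, 44))

# keep the attribute row(s) for one station, then the matching geometry rows
sub_att <- att[att$sname == "P5", ]
sub_geo <- geo[geo$record %in% sub_att$record, ]

# merge(all = TRUE) is base R's full outer join, like dplyr::full_join()
joined <- merge(sub_att, sub_geo, by = "record", all = TRUE)

# spatial extent (bounding box) of the station's polygon
extent <- c(lon_min = min(joined$longitude), lon_max = max(joined$longitude),
            lat_min = min(joined$latitude),  lat_max = max(joined$latitude))
extent
```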

## **Now you try**
- Use `help(package = 'multivaR')` to see what else multivaR can do
- Try calculating a PCA for the 'Derived_Occupations_Stations' dataframe


In [None]:
%%R

help(package = 'multivaR')
help(..., package = 'multivaR') #HINT: Insert the function name to replace '...'

# call in data

# perform PCA
PCA(...) # HINT: you should only give the PCA function data variables, not metadata!
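
One way to approach the PCA exercise without relying on multivaR's `PCA()` arguments is base R's `stats::prcomp()`. This sketch uses made-up toy data with the same kind of split between metadata and data variables; the column names are hypothetical, and this is a swapped-in base R technique, not multivaR's implementation.

```r
# Made-up toy data: metadata columns plus three data variables
set.seed(42)
toy <- data.frame(
  station     = "HL2",                        # metadata
  year        = rep(2000:2009, each = 2),     # metadata
  month       = rep(c(4, 10), times = 10),    # metadata
  temperature = rnorm(20, mean = 6,  sd = 1),
  salinity    = rnorm(20, mean = 31, sd = 0.5),
  nitrate     = rnorm(20, mean = 5,  sd = 1.5)
)

# keep only the data variables; year and month are numeric but are metadata
meta_cols <- c("station", "year", "month")
vars <- na.omit(toy[, setdiff(names(toy), meta_cols)])

# center and scale so variables in different units are comparable
pca <- prcomp(vars, center = TRUE, scale. = TRUE)
summary(pca)

# proportion of total variance explained by each component
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(prop_var, 3)
```

Scaling matters here because temperature, salinity and nitrate are in different units; without `scale. = TRUE`, the variable with the largest variance would dominate the first component.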