# Environment Setup

### Import necessary libraries
In Python, most functionality is provided by modules, which your script must import before use. Most Python scripts and
notebooks start with an import block that imports the required modules, functions, and classes.


An import typically takes the form `import module`, but you can also import a specific submodule, function, or class using `from module import name`.
You may also import a module under an alternate name using `import module as new_name`.
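The three import forms can be tried with standard-library modules. This is a minimal sketch; the alias `pl` mirrors the one used in the import block below, and the variable names are illustrative only.

```python
# Form 1: import a whole module and access members through its name
import math

# Form 2: import a specific name (a class here) from a module
from datetime import timedelta

# Form 3: import a module under a shorter alias
import pathlib as pl

circle_area = math.pi * 2.0 ** 2   # area of a circle with radius 2
one_week = timedelta(days=7)       # a 7-day duration
suffix = pl.PurePosixPath('data/input/file.tar.gz').suffix  # last file extension
```

All three forms load the same module objects; they differ only in which names end up in your namespace.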

In [1]:
# Import Python core modules
import time
tic = time.time()
import os
import sys
import pathlib as pl
import platform
from pprint import pprint
from datetime import datetime, timedelta
import shutil

# Import modules for reading archived data
from zipfile import ZipFile
import gzip

# Import module for accessing resources on a file server
import requests
from io import BytesIO

# Import GIS libraries
import arcpy
import arcgis
from arcgis.gis import GIS

# Import modules for numerical computation and analysis
import numpy as np
import pandas as pd
import xarray as xr
from scipy import stats

# Import modules for plotting and visualization
import matplotlib
import matplotlib.pyplot as plt
from ipywidgets import widgets
from ipywidgets import *
from IPython.display import display, HTML

# Do not display warnings, because geoglows.streamflow.latlon_to_reach issues a shapely deprecation warning on every call
import warnings
warnings.filterwarnings('ignore')

### Install and import libraries that are not part of the ArcGIS Pro Python environment

These Python libraries are not included in the default ArcGIS Pro Python environment.

* `geoglows` library is used to pull streamflow forecasts from the GEOGloWS service.
* `plotly` library is used in conjunction with the `geoglows` library to enable interactive plotting.
* `gdown` library will allow us to download files from a Google Drive dataset.
* `cartopy` library enables some custom geospatial plotting that is required by the Q2Q scripts provided. We will attempt to import the library and then install using a `conda` command if the library is not available in the current environment.
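The "try to import, install on failure" pattern used for these libraries can also be written without notebook `!` commands. The following is a minimal sketch, not the method used in this training: the helper name `ensure_module` and its `installer` parameter are hypothetical, and the default installer assumes `pip` is available for the running interpreter.

```python
import importlib
import subprocess
import sys

def ensure_module(name, package=None, installer=None):
    """Import `name`, running an installer first if the import fails."""
    try:
        return importlib.import_module(name)
    except ImportError:
        print('Could not import {0} library. Attempting installation.'.format(name))
        if installer is None:
            # Assumption: install with pip into the running interpreter
            def installer(pkg):
                subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', pkg])
        installer(package or name)
        importlib.invalidate_caches()
        return importlib.import_module(name)

# Modules that are already installed are returned without calling the installer
json_module = ensure_module('json')
```

Passing a custom `installer` callable makes it possible to substitute a `conda` command, or to test the logic without installing anything.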

In [2]:
# Geoglows API
try:
    import geoglows
except ImportError:
    print('Could not import geoglows library. Installing using pip.')
    
    # Install the package quietly with pip, then import it
    !pip install geoglows -q
    import geoglows

# Plotly plotting library to support geoglows plotting
try:
    import plotly
except ImportError:
    print('Could not import plotly library. Installing using pip.')
    
    # Install plotly (pinned to 5.3.1) and kaleido quietly with pip, then import plotly
    !pip install -U kaleido plotly==5.3.1 -q
    import plotly

# In the Esri Notebook environment, plotly only seems to be able to render to the browser
plotly.io.renderers.default = "browser"

# GDOWN - for downloading files from Google Drive
try:
    import gdown
except ImportError:
    print('Could not import gdown library. Installing using conda.')
    !conda install --freeze-installed esri::numpy=1.20.* conda-forge::gdown -y
    import gdown

Could not import geoglows library. Installing using pip.
Could not import gdown library. Installing using conda.
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\Users\ksampson\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone

  added / updated specs:
    - conda-forge::gdown
    - esri::numpy=1.20


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    backcall-0.2.0             |     pyhd3eb1b0_0          14 KB  esri
    ------------------------------------------------------------
                                           Total:          14 KB

The following NEW packages will be INSTALLED:

  filelock           pkgs/main/win-64::filelock-3.9.0-py39haa95532_0
  gdown              conda-forge/noarch::gdown-4.7.1-pyhd8ed1ab_0

The following packages will be UPDATED:

  attr

### Connecting ArcGIS Online session through ArcGIS Pro

This cell establishes the user connection to ArcGIS Online. There are many ways to establish this connection. [Read here for all possible authentication methods](https://developers.arcgis.com/python/guide/working-with-different-authentication-schemes/).

It is often useful to write scripts that work against the active portal in the ArcGIS Pro app.

Using the `pro` authentication scheme, scripts can create an instance of the `GIS` class representing the active portal in ArcGIS Pro without requiring the user to pass their credentials. In this mode, users can leverage the Pro app to log in to the portal, and their scripts can use whichever portal is currently active. This mode can also serve as a bridge for users with advanced authentication scenarios, such as IWA using NTLM or Kerberos, or Smart Card, where signing in with credentials may not be possible or desirable.

> NOTE: When using a [Named User license](https://pro.arcgis.com/en/pro-app/get-started/licensing-arcgis-pro.htm) to license ArcGIS Pro, unless the [Sign me in automatically](https://pro.arcgis.com/en/pro-app/help/projects/sign-in-to-your-organization.htm) checkbox is selected when signing into the licensing portal or ArcGIS Pro has been [licensed for offline use](https://pro.arcgis.com/en/pro-app/get-started/start-arcgis-pro-with-a-named-user-license.htm#ESRI_SECTION1_15AD453E27C446CE9B51D45C021E8067), ArcGIS Pro should be installed and concurrently running on the machine that executes the script for authentication to succeed. When selecting `Sign me in automatically`, ArcGIS 
> Pro can remain closed for 2 weeks by default before the portal token expires. See [here](https://enterprise.arcgis.com/en/portal/latest/administer/windows/specify-the-default-token-expiration-time.htm) for details on configuring token settings.

In [3]:
print("Active Portal in ArcGIS Pro")  
gis = GIS("pro")
print("Logged in as " + str(gis.properties.user.username))

Active Portal in ArcGIS Pro
Logged in as ksampson


### Get information about this map document

Using `arcpy.mp`, we can manipulate the content of existing ArcGIS Pro projects. We will read information about this project to set some important path variables for use later in the training. [Read more...](https://pro.arcgis.com/en/pro-app/latest/arcpy/mapping/introduction-to-arcpy-mp.htm)

In [4]:
# Get available information from the current map document
aprx = arcpy.mp.ArcGISProject("current")

# Get the full path to the current project (.aprx) file
aprx_path = aprx.filePath

# Find the home folder for the current project
home_folder = aprx.homeFolder

# Get a list of the current folder connections available to the current project
folder_connections = aprx.folderConnections

# Get the default GeoDatabase associated with the current project
default_gdb = aprx.defaultGeodatabase

# Get names for open Maps in this project
open_maps = [item.name for item in aprx.listMaps()]

# Add the home folder to the module search path and make it the working directory
sys.path.insert(0, home_folder)
os.chdir(home_folder)

### Find local environment and directory structure

This cell uses the .aprx home folder location to determine other relative paths. The rest of the training assumes that the directory structure has not been altered from that found in the GitHub repository, and that there are separate directories for `\notebooks`, `\scripts`, and `\data`.
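How the relative paths are derived can be illustrated with `pathlib` alone. This is a minimal sketch: the home folder below is a hypothetical example path, not the location of the training project, and `PureWindowsPath` is used so the result is the same on any operating system.

```python
import pathlib as pl

# Hypothetical home folder, located two levels below the repository root
home_folder = pl.PureWindowsPath(r'C:\training\repo\notebooks\project')

# Moving up two directories gives the repository root
root_dir = home_folder.parent.parent

# The "/" operator joins path components
scripts_dir = root_dir / 'scripts'
input_data_dir = root_dir / 'data' / 'input'
```

Because every other directory is built from `root_dir`, moving the project file within the repository would change all derived paths, which is why the directory structure should be left unaltered.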

In [5]:
# Derive the root folder for this training based on the .aprx home folder
# Assumes map document project is in the GitHub repository, and moves up 2 directories
root_dir = pl.Path(home_folder).parent.parent

# Set the notebook directory
notebook_dir = root_dir / 'notebooks'

# Set the directory to the scripts used in the training
scripts_dir = root_dir / 'scripts'

# Set the directory to the datasets used in the training
data_dir = root_dir / 'data'
input_data_dir = data_dir / 'input'

# Set the directory to save output datasets derived during the training
output_data_dir = data_dir / 'output'

# User specific data
user_name = os.environ.get('USERNAME', None)
temp_dir = os.environ.get('TEMP', fr'C:\{user_name}')

The next code block attempts to find the name and path of the currently active conda environment.

In [6]:
# Get the conda path and environment name
envs = ! conda env list
active_env = list(filter(lambda s: '*' in str(s), envs))[0]
env_name = active_env.split()[0]
active_env_dir = pl.Path(active_env.split()[-1])
print('Current conda environment:\n\t{0} {1}'.format(env_name, active_env_dir))

Current conda environment:
	arcgispro-py3-clone C:\Users\ksampson\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone
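The parsing step above can be tested without calling `conda`. The sketch below assumes the listing marks the active environment with `*`, as in the output shown; the function name `parse_active_env` and the sample lines are hypothetical. Splitting on the `*` marker rather than on whitespace keeps paths that contain spaces intact.

```python
def parse_active_env(lines):
    """Return (name, path) for the line marked with '*', or None if absent."""
    for line in lines:
        # Skip comment lines and environments that are not active
        if line.startswith('#') or '*' not in line:
            continue
        name, _, path = line.partition('*')
        return name.strip(), path.strip()
    return None

# Hypothetical listing in the style of `conda env list`
sample_output = [
    '# conda environments:',
    '#',
    r'base                     C:\hypothetical\envs\base',
    r'my-clone              *  C:\Users\some user\envs\my-clone',
]
env_name, env_path = parse_active_env(sample_output)
```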


### Output Options

By default, ArcGIS Pro automatically adds geoprocessing outputs to the active map or scene. To make sure this behaviour is enabled, and that existing outputs can be overwritten, set the following properties of `arcpy.env` to `True`:

In [7]:
arcpy.env.addOutputsToMap = True
arcpy.env.overwriteOutput = True

### Download input datasets

This training relies on pre-built datasets that support operations such as bias correction and visualization of gridded precipitation data, along with other geospatial and static tabular data. We will use the `gdown` library to pull these datasets from publicly available Google Drive links.

In [8]:
# Publicly accessible path to the Google Drive dataset containing training data
gfile = 'https://drive.google.com/uc?id=1GweIOjbFvPvfe88_Fuyf5WdW_Jzbx8nQ'  
gzfilename = 'training_data.tar.gz'
training_data_file = os.path.join(data_dir, gzfilename)

if os.path.exists(input_data_dir):
    print('Found input data directory: {0}'.format(input_data_dir))
else:
    os.makedirs(input_data_dir)
    if not os.path.exists(training_data_file):
        print('Downloading from Google Drive, link = {0}'.format(gfile))
        !gdown -O {training_data_file} {gfile}
    else:
        print('Found GZIP archive of training data: {0}'.format(training_data_file))
    print('    Unzipping...')
    shutil.unpack_archive(training_data_file, extract_dir=input_data_dir, format="gztar")
    print('Unzipped archive of training data to {0}'.format(input_data_dir)) 

Found input data directory: C:\Users\ksampson\Documents\GitHub\GloFAS_Q2Q_Bias_Correction_and_Verification\data\input
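The unpack-once logic in the cell above can be tried offline. This minimal sketch replaces the Google Drive download with a small stand-in archive built in a temporary directory; all file and directory names here are hypothetical.

```python
import os
import shutil
import tarfile
import tempfile

work_dir = tempfile.mkdtemp()

# Build a small stand-in archive (the training downloads its archive instead)
source_file = os.path.join(work_dir, 'example.txt')
with open(source_file, 'w') as f:
    f.write('sample data')
archive_path = os.path.join(work_dir, 'example_data.tar.gz')
with tarfile.open(archive_path, 'w:gz') as tar:
    tar.add(source_file, arcname='example.txt')

# Unpack only if the target directory does not exist yet
extract_dir = os.path.join(work_dir, 'input')
if not os.path.exists(extract_dir):
    os.makedirs(extract_dir)
    shutil.unpack_archive(archive_path, extract_dir=extract_dir, format='gztar')

extracted = sorted(os.listdir(extract_dir))
```

Note that the existence check is on the directory, not its contents: if an earlier run created the directory but failed before unpacking, the directory must be deleted before the archive is unpacked again.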


### Define Functions

To clean up and de-clutter the notebooks in this training, we will store various functions related to analysis and plotting here.

These functions will be loaded into the namespace of each of the subsequent Esri Notebooks when `%run 00_environment_setup.ipynb` is called.

In [9]:
# Quantile-to-Quantile mapping function, vectorized
# https://staff.ral.ucar.edu/hopson/GloFAS/Q2Qbiascorrection/q2q_glofas.py
def q2q(modclim, obsclim, fcst, return_pct=False):
    # Map a GloFAS forecast to the observed value at the same quantile.
    # Requires historical records of the model (modclim) and observations (obsclim).
    # Forecasts larger or smaller than the archived forecast distribution
    # are mapped to the largest or smallest values of the observations.

    # Accept scalars, lists, or arrays by converting to a 1-D array
    fcst = np.atleast_1d(np.asarray(fcst))

    # Sort the model climatology once and find where each forecast falls
    sorted_modclim = np.sort(modclim)
    first_index = np.searchsorted(sorted_modclim, fcst, side='left')
    last_index = np.searchsorted(sorted_modclim, fcst, side='right')

    # Pick one rank at random from the range of positions where the forecast could be inserted
    indval = np.array([np.random.choice(np.arange(first_index[n], last_index[n]+1), 1)[0] for n in range(fcst.shape[0])])
    percentile = 100*(indval/float(len(modclim)))
    q2q_corrected=np.percentile(obsclim,percentile)
    if return_pct:
        return q2q_corrected, percentile/100.
    else:
        return q2q_corrected

# Add a cumulative distribution function (CDF) column to an existing DataFrame, based on a column.
def get_cdf(in_df, column=''):
    '''
    Compute the empirical CDF value for each entry in the given column and
    store it in a new 'cdf' column. Input is a pandas DataFrame, which is
    modified in place and returned.
    '''
    
    # Calculate the percentile of each value in the input array
    in_df['cdf'] = np.array([stats.percentileofscore(in_df[column], item)/100. for item in in_df[column]])
    return in_df
    
# Function to wrap a dropdown widget with value list
def drop_down_select(value_list, descriptor='Select:'):
    # Initialize the dropdown widget
    w = widgets.Dropdown(
        options=value_list,
        description=descriptor,
        disabled=False)

    # Observe the widget and pass each new value to plot_content, which is
    # expected to be defined in the notebook that calls this function
    w.observe(lambda c: plot_content(c['new']) if (c['type'] == 'change' and c['name'] == 'value') else None)
    return w

def get_size_gb(input_da, silent=False):
    '''
    Get the size of an xarray Dataset or DataArray in gigabytes and optionally print it
    '''
    # Compute the size from the number of bytes (1 GB = 1024**3 bytes here)
    dataset_size_GB = input_da.nbytes/(1024.**3)
    if not silent:
        print('Size of input dataset:\t{0:3.2f} GB'.format(dataset_size_GB))
    return dataset_size_GB

# Function to describe the structure of the xarray Dataset
def report_structure(ds, variable, time_coord='time', silent=False):
    '''
    Inputs:
        ds - an xarray Dataset object.
        variable - string - a variable in the input Dataset object to examine.
        time_coord - string - the name of the time coordinate in the input Dataset object.
    Outputs:
        ds - The xarray Dataset object, returned unchanged
        timesteps - The time values in the input file
    '''

    # Pull the timesteps from the input file
    timesteps = ds[time_coord].values
    if not silent:
        print('Found {0} timestep(s) in input file'.format(timesteps.shape[0]))

    # Print out information about the input dataset
    dataset_size_GB = get_size_gb(ds, silent=True)
    if not silent:
        print('Size of input dataset:  {0:3.2f} GB'.format(dataset_size_GB))

    # Find out the size of one timestep (the unit of processing)
    timestep_size_GB = get_size_gb(ds[variable].isel({time_coord:0}), silent=True)
    if not silent:
        print('Size of 1 timestep in dataset:  {0:3.3f} GB'.format(timestep_size_GB))
    return ds, timesteps
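
The idea behind the quantile-to-quantile mapping in `q2q` can be seen in a simplified, deterministic sketch. This is not the `q2q` function itself: the name `quantile_map` and the sample climatologies are hypothetical, and it always uses the upper rank of the forecast instead of picking a rank at random.

```python
import numpy as np

def quantile_map(model_clim, obs_clim, forecast):
    """Map each forecast to the observed value at the same empirical percentile."""
    model_sorted = np.sort(np.asarray(model_clim, dtype=float))
    forecast = np.atleast_1d(np.asarray(forecast, dtype=float))
    # Rank: number of model climatology values less than or equal to the forecast
    rank = np.searchsorted(model_sorted, forecast, side='right')
    # Convert the rank to a percentile of the model climatology
    pct = 100.0 * rank / model_sorted.size
    # Read the same percentile from the observed climatology
    return np.percentile(obs_clim, pct)

# Hypothetical climatologies: the observations are twice the model values
model = np.arange(1, 11)   # 1, 2, ..., 10
obs = 2 * model            # 2, 4, ..., 20
corrected = quantile_map(model, obs, [0.5, 5, 50])
```

Forecasts below or above the model climatology map to the smallest or largest observation, which matches the behaviour described in the comments of `q2q`.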

### Finish

In [10]:
print('Completed importing and/or installing libraries in {0:3.2f} seconds.'.format(time.time()-tic))

Completed importing and/or installing libraries in 255.67 seconds.
