# AMGeO CEDAR Workshop 2021

Welcome to AMGeO's workshop for CEDAR 2021. In this workshop, we hope to:

- Give a brief introduction to AMGeO

- Get you registered with AMGeO and its data providers

- Introduce our new API for interacting with AMGeO entirely within Jupyter notebooks

- Have you try out our new API for yourself!

## AMGeO Introduction

TODO:

## Registration for AMGeO

To use AMGeO, users will need to register on 3 different platforms:

- AMGeO (https://amgeo.colorado.edu/register)

- SuperMag (https://supermag.jhuapl.edu/mag/)

- AMPERE (http://ampere.jhuapl.edu/)

For more information on why you need to register, please see https://amgeo.colorado.edu/about#data-policy

## New AMGeO API Demo

### Setting up your AMGeO Environment

Now that we are familiar with what AMGeO does and are registered, let's try out the new API! The first step is to import AMGeO from this notebook.

<div class="alert alert-block alert-danger">
    <b>Warning:</b> If you have not registered for AMGeO, you will not be able to import the API. <br><br>
    Please follow the above links in 'Registration for AMGeO' to complete this step
</div>

In [1]:
# Import AMGeOApi
from AMGeO.api import AMGeOApi

Traceback (most recent call last):
  File "/Users/willemmirkovich/AMGeO/AMGeO/src/nasaomnireader/nasaomnireader/__init__.py", line 5, in <module>
    from nasaomnireader.omnireader_config import config
ModuleNotFoundError: No module named 'nasaomnireader.omnireader_config'

Solar wind data files will be saved to /Users/willemmirkovich/Library/Application Support/nasaomnireader
Traceback (most recent call last):
  File "/Users/willemmirkovich/AMGeO/AMGeO/src/nasaomnireader/nasaomnireader/omnireader.py", line 12, in <module>
    from spacepy import pycdf
ModuleNotFoundError: No module named 'spacepy'


------------IMPORTANT----------------------------
Unable to import spacepy. Will fall back to
using Omni text files, which may have slightly
different data and incomplete metadata
-------------------------------------------------
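
The warning above means the optional `spacepy` package could not be imported, so the solar wind reader falls back to text files. The following is a minimal sketch for checking whether an optional package is importable before running the rest of the notebook. It uses only the standard library; the helper name `has_module` is hypothetical and not part of AMGeO.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


# Report which optional packages are importable in this environment.
for pkg in ("spacepy", "xarray"):
    status = "available" if has_module(pkg) else "missing"
    print(f"{pkg}: {status}")
```

If `spacepy` is reported as missing, the import still succeeds, but with the caveats printed in the warning.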



Next, create a new API object using the AMGeOApi constructor. This returns an AMGeOApi instance for us to use.

```python
# creates a new AMGeOApi class
AMGeOApi()
```

In [2]:
api = AMGeOApi()

This instance gives us access to helpful information, such as the directory where AMGeO data will be generated, and lets us change our registration information.

First, see where AMGeO's current output directory is by running the method ```get_output_dir```

```python
# returns the current output directory for AMGeO
AMGeOApi.get_output_dir()

# Example
api = AMGeOApi()
api.get_output_dir()
>>> '~/amgeo_output_dir'
```

In [3]:
api.get_output_dir()

'./potential_exercise_output'

This output directory is the default that AMGeO will set for any user. If you look at the directory we are currently in, we already have some data generated by AMGeO for you to use. To start using that data, we must set our output directory to this directory using ```set_output_dir```

```python
# sets the current output directory for AMGeO
AMGeOApi.set_output_dir('/the/directory')

# Example
api = AMGeOApi()
api.set_output_dir('./my/local/directory')
api.get_output_dir()
>>> './my/local/directory'
```

In [4]:
api.set_output_dir('./amgeo_output')

Let's verify that this worked by checking that ```get_output_dir``` now returns the new location

In [5]:
assert(api.get_output_dir() == './amgeo_output')
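
Before pointing AMGeO at a directory, it can help to confirm that the directory exists and to see what it contains. The sketch below uses only `pathlib`, with a temporary directory standing in for `./amgeo_output`. The helper `describe_output_dir` and the file name `example.h5` are hypothetical; nothing here calls the AMGeO API or assumes how AMGeO names its files.

```python
import tempfile
from pathlib import Path


def describe_output_dir(path):
    """Return the sorted entry names in `path`, or None if it is not a directory."""
    p = Path(path)
    if not p.is_dir():
        return None
    return sorted(child.name for child in p.iterdir())


# A temporary directory stands in for './amgeo_output' so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "example.h5").touch()  # hypothetical file name
    print(describe_output_dir(tmp))
    print(describe_output_dir(Path(tmp) / "missing"))
```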

### Generating AMGeO Maps

Now that we have our AMGeO environment set, let's generate some AMGeO Maps! 

![AMGeO Electric Potential Map](./static/AMGeOElectricPotentialMap.png)

This is an example of an image representation of electric potential data generated by AMGeO. What this looks like in terms of raw data is as follows:

$$\begin{pmatrix}
V_{1, 1} & V_{1, 2} & \cdots & V_{1, 37} \\
V_{2, 1} & V_{2, 2} & \cdots & V_{2, 37} \\
\vdots & \vdots & \ddots & \vdots \\
V_{24, 1} & V_{24, 2} & \cdots & V_{24, 37} \\
\end{pmatrix}$$

Where each column maps to a longitude in a given hemisphere, and each row maps to a latitude in a given hemisphere.
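
To make the row/column mapping concrete, the sketch below rebuilds the grid axes in plain Python. It assumes the approximate spacing seen in the coordinate output later in this notebook: 24 latitudes from about 88.33° down to 50° in steps of 5/3°, and 37 longitudes from 0° to 360° in steps of 10°. The names `lats`, `lons`, and `grid_location` are illustrative and not part of AMGeO, and the indices are zero-based, unlike the one-based subscripts in the matrix above.

```python
N_LAT, N_LON = 24, 37

# Latitude axis: 24 rows, from ~88.33 degrees down to 50 degrees in 5/3-degree steps.
lats = [50.0 + (5.0 / 3.0) * k for k in range(N_LAT - 1, -1, -1)]
# Longitude axis: 37 columns, from 0 to 360 degrees in 10-degree steps.
lons = [10.0 * k for k in range(N_LON)]


def grid_location(row, col):
    """Map a zero-based (row, col) index of the matrix to (lat, lon) in degrees."""
    return lats[row], lons[col]


print(grid_location(0, 0))    # first row, first column
print(grid_location(23, 36))  # last row, last column -> (50.0, 360.0)
```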



AMGeO can generate maps given many configurable options (see our documentation [here](https://amgeo.colorado.edu/protected/documentation/api.html)), but in most cases users will use default options provided by AMGeO, or some regimented subset of the given options.

AMGeO's new API has developed the concept of ```controllers``` to let users decide which out-of-the-box settings of AMGeO they would like to use. 

To get a ```controller```, run ```get_controller``` on the AMGeO API class.

```python
# Returns an AMGeO Controller
api = AMGeOApi()
api.get_controller(type='default')
>>> amgeo_controller

# Example
controller = api.get_controller() # returns the default controller by default
controller
>>> default_controller
```

In [6]:
controller = api.get_controller()
controller

Default AMGeO Controller

Currently, the AMGeO API supports only AMGeO's default settings, so the only controller type available is ```'Default'```

With our new controller, we can begin generating AMGeO Maps. Previously, this required our command line interface, which was limited to generating maps for a whole day or a single datetime; the API expands this to other high-level jobs. When generated, these maps are stored in the grid format as NumPy arrays in an HDF5 file within the AMGeO output directory.

To generate an AMGeO map, you will use the ```generate``` method on a ```controller```

```python
# generates AMGeO Data for the given arguments to the AMGeO output directory
controller.generate(date_args, hemisphere)
```

The ```date_args``` that the AMGeO API accepts are in the form of [Python date/datetimes](https://docs.python.org/3/library/datetime.html)

The new ```generate``` method allows for multiple different ways of generating maps for a given task and range of dates

- Option 1: A datetime
```python
# generate data for a single datetime
import datetime

dt = datetime.datetime(2013, 5, 6, 12, 30, 0) # 5/6/2013 @ 12:30:00
controller.generate(dt, 'N') # generates the map for the given datetime on the northern hemisphere
```

- Option 2: A list of datetimes
```python
# generate data for a list of datetimes
dts = [
    datetime.datetime(2013, 5, 6, 12, 30, 0), # 5/6/2013 @ 12:30:00
    datetime.datetime(2014, 5, 6, 13, 30, 0), # 5/6/2014 @ 13:30:00
    datetime.datetime(2015, 5, 6, 1, 30, 0), # 5/6/2015 @ 01:30:00
]
controller.generate(dts, 'N') # generates the maps for each datetime on the northern hemisphere
```

- Option 3: A date
```python
# generate data for a whole day, which will create maps for every 2 min 30 sec interval in a given day
d = datetime.date(2014, 1, 1) # 1/1/2014
controller.generate(d, 'S') # generates the maps for an entire day on the southern hemisphere
```

- Option 4: A range of dates
```python
# generate data for each day in a given range
date_range = (
    datetime.date(2014, 1, 1), 
    datetime.date(2014, 1, 5)
) # tuple of range from 1/1/2014 -> 1/5/2014
controller.generate(date_range, 'N')
```
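
For intuition about Options 3 and 4, the following sketch expands a `(start, end)` tuple into individual days using the standard library. Whether AMGeO treats the end date as inclusive is not stated here, so the inclusive behaviour below is an assumption, and `expand_date_range` is a hypothetical helper rather than an AMGeO function.

```python
import datetime


def expand_date_range(date_range):
    """Return every date from start to end, ASSUMING an inclusive end date."""
    start, end = date_range
    n_days = (end - start).days
    return [start + datetime.timedelta(days=i) for i in range(n_days + 1)]


days = expand_date_range((datetime.date(2014, 1, 1), datetime.date(2014, 1, 5)))
print(len(days))          # 5
print(days[0], days[-1])  # 2014-01-01 2014-01-05
```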

Let's run some examples for the purposes of this notebook demo

In [7]:
'''
First, import Python's datetime module

This is how you will pass arguments to the AMGeO Controller to generate data
'''
from datetime import datetime, date

In [8]:
''' 
Example 1
Option 1: A single datetime
'''
dt = datetime(2014, 5, 6, 12, 30, 0)
hemisphere = 'N'
controller.generate(dt, hemisphere)

Request recieved for 2014-5-6 N
2014-5-6 N complete


In [9]:
'''
Example 2
Option 2: A list of datetimes
'''
dts = []
# build 15-minute intervals across the 13:00 hour
for i in range(0, 60, 15):
    dt = datetime(2015, 6, 25, 13, i, 0)
    print('Datetime: %s' % dt)
    dts.append(dt)
print()
controller.generate(dts, hemisphere)

Datetime: 2015-06-25 13:00:00
Datetime: 2015-06-25 13:15:00
Datetime: 2015-06-25 13:30:00
Datetime: 2015-06-25 13:45:00

Request recieved for 2015-6-25 N
2015-6-25 N complete
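
The loop in Example 2 only varies the minute field, so it cannot cross an hour boundary. A sketch of a more general way to build evenly spaced datetimes with `timedelta` follows; the helper name `datetimes_every` is hypothetical, and the resulting list would be passed to `controller.generate` in the same way as above.

```python
from datetime import datetime, timedelta


def datetimes_every(start, step, count):
    """Return `count` datetimes beginning at `start`, spaced `step` apart."""
    return [start + i * step for i in range(count)]


dts = datetimes_every(datetime(2015, 6, 25, 13, 0, 0), timedelta(minutes=15), 6)
for dt in dts:
    print(dt)
# The last two entries fall in the following hour: 14:00:00 and 14:15:00
```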


Now that we have some assimilative maps generated, let's look at how we can load these maps easily to accomplish some basic tasks

### Loading AMGeO Maps

With the data we just generated, let's see what days we have available to load from. The ```controller``` class provides the ```browse``` method to do this.

```python
# returns the list of all days for which AMGeO has data available in its output directory
controller.browse()
>>> [] # list of days that have maps generated 

# Example
controller.browse()
>>> ['20110101N', '20120102S']
```

In [10]:
controller.browse()

['20150625N', '20140506N']
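
The strings returned by `browse` follow the `YYYYMMDDH` pattern: an eight-digit date followed by a hemisphere letter. A small sketch for splitting such a key into a `date` and a hemisphere, using only the standard library, is shown below. `parse_day_key` is a hypothetical helper, not part of the AMGeO API, and it assumes the hemisphere is always `'N'` or `'S'` as in the examples in this notebook.

```python
from datetime import datetime


def parse_day_key(key):
    """Split a 'YYYYMMDDH' string into (date, hemisphere)."""
    day, hemisphere = key[:8], key[8:]
    if hemisphere not in ("N", "S"):
        raise ValueError(f"unexpected hemisphere in {key!r}")
    return datetime.strptime(day, "%Y%m%d").date(), hemisphere


keys = ['20150625N', '20140506N']
# Sort chronologically instead of relying on the order in which keys are listed.
for key in sorted(keys, key=parse_day_key):
    print(key, parse_day_key(key))
```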

One of the key development tasks that the API achieves is giving researchers/developers the tools to interact with AMGeO data in an intuitive way. 

[Xarray](http://xarray.pydata.org/en/stable/index.html), a Python package designed for working with labeled, multi-dimensional scientific data, is used by AMGeO to accomplish this. 

Once you have used a controller to generate these maps, controllers also allow for you to load the data into an [Xarray Dataset](http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html)

This image is a visual aid to understand how AMGeO data is represented/stored in an Xarray Dataset

![AMGeO Xarray Dataset](./static/AMGeOXarrayDataset.png)

Now that we have some maps available to use, let's try loading one day into a Dataset. To do this, use the ```load``` method on the ```controller``` class

```python
# load data available for a given day/hemisphere stored in AMGeO's output directory
controller.load('YYYYMMDDH') 
>>> Xarray DataSet

# Example
first_day = controller.browse()[0] # grab first element in days available
print(first_day)
>>> '20190501N'
ds = controller.load(first_day) # store dataset
```

In [11]:
# grabs the first day from our list of days that we have maps for
days = controller.browse()
day = days[0]

# load the day's data into an Xarray Dataset
ds = controller.load(day)

ds

With a live AMGeO Dataset, let's learn more about Xarray and how it can aid your research goals

### Interacting with AMGeO Maps using Xarray


#### Xarray Data Variables

To understand how Xarray Datasets work, the first step is to understand data variables. 
If the Dataset were analogous to a Python dictionary, the names of its data variables would be this dictionary's keys. 
By accessing one, you get an [Xarray DataArray](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html) of the specific type of data you are interested in. For example, to get the electric potential data, you would access the ```epot``` key on the Dataset

In [12]:
ds['epot']

Each of these data variables maps to a specific type of data from an AMGeO Map, such as Electric Potential, Hall Conductance, Joule Heating, etc. 

#### Xarray Dimensions

Since we are dealing with multi-dimensional data, it can sometimes be helpful to understand what dimensions you are currently working with at any given time. Xarray makes this very easy to access with the ```dims``` attribute on a DataArray

In [13]:
halls = ds['cond_hall'] # get the hall conductance data from the dataset
halls.dims

('time', 'lat', 'lon')

From the above example, we can see that once we have the hall conductance data, our first dimension on the array is time (which datetime you are interested in), followed by the lat/lon grid of hall conductance observations

#### Xarray Coordinates

Xarray Coordinates allow for even further analysis of the dimensions, with information as to what the specific coordinate(s) are at a given data point. 

In [14]:
halls.coords

Coordinates:
  * time     (time) datetime64[ns] 2015-06-25T13:00:00 ... 2015-06-25T13:45:00
  * lat      (lat) float64 88.33 86.67 85.0 83.33 ... 55.0 53.33 51.67 50.0
  * lon      (lon) float64 0.0 10.0 20.0 30.0 40.0 ... 330.0 340.0 350.0 360.0

From the above, we can see that our dims ```time```, ```lat``` and ```lon``` have the datatypes ```datetime64```, ```float64``` and ```float64``` respectively. We can also access a specific coordinate we are interested in.

In [15]:
# grab the first time slice in the hall conductance array
t = halls[0]

# get the time of the specific data point in question
t.time
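
Coordinates also allow selection by label rather than by position. The sketch below builds a synthetic DataArray of zeros with the same dims and approximate coordinate layout as the output above, then uses xarray's `.sel` to pick a time slice and the grid point nearest a requested location. The data is a stand-in, not AMGeO output; with a real Dataset the same `.sel` calls would apply to `ds['cond_hall']`.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in: 4 times, 24 latitudes, 37 longitudes, all zeros.
times = (
    np.datetime64("2015-06-25T13:00:00") + np.arange(4) * np.timedelta64(15, "m")
).astype("datetime64[ns]")
lats = 50.0 + (5.0 / 3.0) * np.arange(23, -1, -1)  # ~88.33 down to 50.0
lons = 10.0 * np.arange(37)                        # 0.0 up to 360.0
halls = xr.DataArray(
    np.zeros((times.size, lats.size, lons.size)),
    dims=("time", "lat", "lon"),
    coords={"time": times, "lat": lats, "lon": lons},
    name="cond_hall",
)

# Select one time slice by its label; the 'time' dimension is dropped.
at_1330 = halls.sel(time=np.datetime64("2015-06-25T13:30:00"))
# Select the grid point nearest to a requested latitude/longitude.
nearest = halls.sel(lat=70.4, lon=93.0, method="nearest")

print(at_1330.dims)  # ('lat', 'lon')
print(float(nearest.lat), float(nearest.lon))
```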

#### Xarray Metadata

One of the other benefits of using Xarray is its use of metadata. AMGeO uses this capability currently and seeks to expand it even more in the future. 

To access the metadata, use Xarray's ```.attrs``` attribute

Metadata lives on all aspects of the AMGeO DataSet

From the Dataset itself...

In [16]:
ds.attrs

{'description': 'AMGeO v2 beta data', 'version': 'v2_beta'}

to the DataArrays inside of the DataSet

In [17]:
epots = ds['epot']
epots.attrs

{'description': 'epot',
 'longname': 'Electric Potential',
 'shortname': 'epot',
 'units': 'V'}

These aim to provide relevant information within the notebook itself, so you do not have to dig it up on your own

#### Compatibility with Numpy

Xarray allows for simple one to one mapping operations from an Xarray DataArray to a Numpy Array using the ```.values``` attribute

In [18]:
epots = ds['epot']
numpy_arr = epots.values 
type(numpy_arr)

numpy.ndarray
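
Once the data is a plain NumPy array, the usual array reductions apply. As a sketch, the block below computes the difference between the maximum and minimum electric potential at each time step, a quantity often referred to as the cross-polar-cap potential. It uses random numbers in place of real AMGeO output and assumes the `(time, lat, lon)` axis order and volt units shown earlier in this notebook.

```python
import numpy as np

# Synthetic stand-in for epots.values with shape (time, lat, lon) = (4, 24, 37).
rng = np.random.default_rng(0)
numpy_arr = rng.normal(scale=1.0e4, size=(4, 24, 37))  # values in volts

# Reduce over the lat and lon axes, leaving one value per time step.
max_per_time = numpy_arr.max(axis=(1, 2))
min_per_time = numpy_arr.min(axis=(1, 2))
potential_range_kv = (max_per_time - min_per_time) / 1.0e3  # V -> kV

print(potential_range_kv.shape)  # (4,)
```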

#### Compatibility with Pandas

All this multi-dimensional stuff not your forte? More used to pandas DataFrames? No problem! Xarray also has a simple method to transform a Dataset into a DataFrame

This is not recommended, since moving away from the multi-dimensional structure of the Dataset can make analysis even more taxing, but it is possible

In [19]:
ds.to_dataframe()

| lat | lon | time | E_ph | E_th | cond_hall | cond_ped | epot | int_joule_heat | joule_heat | v_ph | v_th |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 88.333323 | 0.0 | 2015-06-25 13:00:00 | -0.024202 | 0.005640 | 6.556684 | 5.360257 | 8236.750029 | 170.748098 | 3.310249 | 95.961155 | 411.813523 |
| 88.333323 | 0.0 | 2015-06-25 13:15:00 | -0.027219 | 0.003510 | 6.652478 | 5.439853 | 8076.395199 | 153.419431 | 4.097304 | 59.719779 | 463.147890 |
| 88.333323 | 0.0 | 2015-06-25 13:30:00 | -0.024656 | 0.000046 | 6.745371 | 5.517046 | 10311.950506 | 142.447042 | 3.354014 | 0.788708 | 419.539905 |
| 88.333323 | 0.0 | 2015-06-25 13:45:00 | -0.024285 | 0.005634 | 6.841622 | 5.571366 | 9557.370954 | 163.730132 | 3.462716 | 95.870715 | 413.226241 |
| 88.333323 | 10.0 | 2015-06-25 13:00:00 | -0.026104 | 0.001961 | 6.574237 | 5.373967 | 9188.410499 | 170.748098 | 3.682717 | 33.372274 | 444.180273 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 49.999754 | 350.0 | 2015-06-25 13:45:00 | -0.000077 | 0.000016 | 4.000000 | 4.000000 | 74.972830 | 163.730132 | 0.000025 | 0.335358 | 1.579175 |
| 49.999754 | 360.0 | 2015-06-25 13:00:00 | -0.000023 | 0.000064 | 4.000000 | 4.000000 | 135.162509 | 170.748098 | 0.000018 | 1.302176 | 0.466467 |
| 49.999754 | 360.0 | 2015-06-25 13:15:00 | -0.000009 | 0.000093 | 4.000000 | 4.000000 | 44.977675 | 153.419431 | 0.000035 | 1.910103 | 0.180971 |
| 49.999754 | 360.0 | 2015-06-25 13:30:00 | -0.000014 | 0.000106 | 4.000000 | 4.000000 | -43.991883 | 142.447042 | 0.000045 | 2.161771 | 0.279861 |


# Thanks! 

TODO: placeholder for Discussion board