## Confirm Earthdata Login

Install the [sarp-east-toolkit](https://github.com/NASA-SARP/sarp-east-toolkit).

In [2]:
import xarray
from sarp_east_toolkit import earthdata_rio

rio_env = earthdata_rio('ornldaac')
fileobj = (
    's3://ornl-cumulus-prod-protected/'
    'gedi/GEDI_L4B_Gridded_Biomass/data/'
    'GEDI04_B_MW019MW138_02_002_05_R01000M_MU.tif'
)
with rio_env as env:
    raster = xarray.open_dataset(fileobj, engine='rasterio', chunks={})

In [3]:
raster

| | Array | Chunk |
|---|---|---|
| Bytes | 1.89 GiB | 1.89 GiB |
| Shape | (1, 14616, 34704) | (1, 14616, 34704) |
| Dask graph | 1 chunks in 2 graph layers | 1 chunks in 2 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |


## Problem 1: Exploring a dataset with XArray

1. What type of data structure is `raster`?

2. What variables are there?

3. What are the dimensions of this dataset? What is the size of each dimension? (If you are a visual person, feel free to draw a picture of the data cube.)

4. Extract the `DataArray` `band_data` and assign it to a new variable called `biomass_density`.

:::{dropdown} Solution
```
biomass_density = raster.band_data
```
:::
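Questions 1 to 3 can be answered by inspecting a few attributes of the object. The sketch below uses a small made-up dataset (the name `toy` and its values are assumptions for illustration, not the GEDI file) with the same `band`, `y`, `x` layout, so it runs without an Earthdata login.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for `raster`: one variable on a (band, y, x) grid.
toy = xr.Dataset(
    {"band_data": (("band", "y", "x"), np.zeros((1, 3, 4), dtype="float32"))},
    coords={
        "band": [1],
        "y": [2500.0, 1500.0, 500.0],
        "x": [500.0, 1500.0, 2500.0, 3500.0],
    },
)

print(type(toy).__name__)   # the kind of container
print(list(toy.data_vars))  # names of the data variables
print(toy.sizes)            # length of every dimension

# Pulling one variable out of a Dataset gives a DataArray.
band = toy["band_data"]
print(type(band).__name__, band.dims, band.shape)
```

The same attributes (`data_vars`, `sizes`, `dims`, `shape`) are available on `raster` and `raster.band_data`.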

5. This data uses a **projected** grid system. Instead of using latitude and longitude to locate points, it uses an **Easting** and a **Northing**, both measured in meters or kilometers.

```{image} https://www.maptools.com/images/28ad74e.png
:alt: Illustration of easting and northing on a projected grid
:width: 400px
:align: center
```

One way we see this in our data is that the dimensions are **x** and **y**, not latitude and longitude.

Display the value of the `biomass_density` array at the point with x, y coordinates (3000, -4000). There isn't a grid point that corresponds to _exactly_ 3000 meters east and 4000 meters south of the origin, so add the additional argument `method='nearest'`. This tells xarray to return the value at the grid point closest to that point.

:::{dropdown} Solution
```
biomass_density.sel(x = 3000, y = -4000, method='nearest')
```
:::
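To see what `method='nearest'` changes, here is a minimal sketch on a made-up 1 km grid (the array `toy` and its coordinates are assumptions for illustration). An exact lookup fails when the requested coordinate is not a grid point, while a nearest-neighbor lookup snaps to the closest cell center.

```python
import numpy as np
import xarray as xr

# Hypothetical grid: cell centers every 1000 m, y decreasing like a raster.
x = np.arange(500.0, 5500.0, 1000.0)      # 500, 1500, ..., 4500
y = np.arange(-500.0, -5500.0, -1000.0)   # -500, -1500, ..., -4500
values = np.arange(y.size * x.size, dtype="float32").reshape(y.size, x.size)
toy = xr.DataArray(values, coords={"y": y, "x": x}, dims=("y", "x"))

# An exact match does not exist, so a plain .sel raises KeyError.
try:
    toy.sel(x=3100, y=-3900)
    exact_lookup_failed = False
except KeyError:
    exact_lookup_failed = True

# method="nearest" picks the closest grid point instead.
nearest = toy.sel(x=3100, y=-3900, method="nearest")
print(exact_lookup_failed)
print(float(nearest.x), float(nearest.y), float(nearest))
```

The returned object keeps its `x` and `y` coordinates, so you can check which grid point was actually selected.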

6. Now grab a slice of data. Select the data between -750,000 and -720,000 meters east and between 480,000 and 450,000 meters north.

Notice the size of the output array: it should have gotten smaller!

:::{dropdown} Solution
```
biomass_density.sel(x=slice(-750_000, -720_000), y=slice(480_000, 450_000))
```

:::
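The y bounds in the solution run from high to low. A small sketch on a made-up grid (the array `toy` is an assumption for illustration) suggests why: label-based slices follow the order of the coordinate, and raster y coordinates are usually stored in decreasing order.

```python
import numpy as np
import xarray as xr

# Hypothetical grid with increasing x and decreasing y.
x = np.arange(500.0, 5500.0, 1000.0)      # 500, 1500, ..., 4500
y = np.arange(-500.0, -5500.0, -1000.0)   # -500, -1500, ..., -4500
toy = xr.DataArray(
    np.zeros((y.size, x.size), dtype="float32"),
    coords={"y": y, "x": x},
    dims=("y", "x"),
)

# Bounds given in the same order as each coordinate: a 3 x 3 window.
subset = toy.sel(x=slice(1000, 4000), y=slice(-1000, -4000))
print(subset.shape)

# Bounds given low-to-high on a decreasing coordinate select nothing.
flipped = toy.sel(x=slice(1000, 4000), y=slice(-4000, -1000))
print(flipped.shape)
```

If a slice on real data comes back empty, checking whether the coordinate is ascending or descending is a good first step.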

## Problem 2: XArray and Pandas

:::{dropdown} Problem
:open:

The OCO3 file has a peculiar way of storing the datetime for each sounding. The `date` variable has `epoch_dimension` as its second dimension: the 7 elements along this dimension correspond to year, month, day, hour, minute, second, and microsecond. Note that the file name indicates all the data are from `200228`, or 2020-02-28.

Level: I

: Use the `xarray.DataArray.min` method to work out the earliest `sounding_id` time.

Level: I already knew about pandas.to_datetime

: Create a new variable in the dataset with the date converted to a datetime, getting rid of the `epoch_dimension` but keeping the `sounding_id` dimension.

:::
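Before opening the file, it can help to see the seven-component layout with plain Python. This is a sketch with made-up rows (the values in `rows` are assumptions, not data from the OCO3 file): each row unpacks directly into `datetime.datetime`, and because the components run from most to least significant, the smallest row is also the earliest time.

```python
from datetime import datetime

# Hypothetical soundings: [year, month, day, hour, minute, second, microsecond]
rows = [
    [2020, 2, 28, 10, 36, 4, 123400],
    [2020, 2, 28, 9, 58, 30, 0],
    [2020, 2, 28, 9, 58, 12, 500000],
]

# Each row maps one-to-one onto the datetime constructor arguments.
stamps = [datetime(*row) for row in rows]

# Lists compare element by element, so min() finds the earliest row.
earliest_row = min(rows)
earliest = datetime(*earliest_row)
print(earliest.isoformat())
```

Comparing hour first, then minute, then second is the same idea as narrowing the `date` array one component at a time.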

In [None]:
# packages you need?

In [None]:
# file location

file = (
    DATADIR
    / 'oco-3-co2-data'
    / 'oco3_LtCO2_200228_B10400Br_220317235859s.nc4'
)

In [None]:
# open dataset


In [None]:
# pull the date variable out and understand this epoch_dimension
# 2020-02-28 10:36:04.1234 -> [2020, 2, 28, 10, 36, 4, 123400]

In [None]:
# use the `xarray.DataArray.min` method over the correct dimension


In [None]:
# mask the date to select sounding_id with the minimum hour
# use the drop=True argument to ditch everything else


In [None]:
# repeat above with minutes, seconds, ...


In [None]:
# how does pandas.to_datetime work when you pass it a data frame?
# how can you build a pandas.DataFrame from this xarray.DataArray?
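
As a hint for the first question: when `pandas.to_datetime` receives a data frame, it assembles one timestamp per row from columns named after the date components. The sketch below uses a small made-up frame (the values and the `sounding_id` index labels are assumptions for illustration).

```python
import pandas as pd

# Hypothetical frame: one row per sounding, one column per date component.
frame = pd.DataFrame(
    {
        "year": [2020, 2020],
        "month": [2, 2],
        "day": [28, 28],
        "hour": [10, 9],
        "minute": [36, 58],
        "second": [4, 12],
        "microsecond": [123400, 500000],
    },
    index=pd.Index([101, 102], name="sounding_id"),
)

# The column names tell pandas which component each column holds.
stamps = pd.to_datetime(frame)
print(stamps)
```

The result is a `Series` that keeps the frame's index, which is what allows it to line up with the `sounding_id` dimension later.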

:::{dropdown} Solution

```
from pathlib import Path

import xarray
import pandas


file = (
    Path('/efs/sarp/data/rawdata_readonly')
    / 'oco-3-co2-data'
    / 'oco3_LtCO2_200228_B10400Br_220317235859s.nc4'
)
oco3 = xarray.open_dataset(file)

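date = oco3['date']
# The file name indicates a single day of data, so narrow the candidates
# one component at a time: hour (3), minute (4), second (5), microsecond (6).
hour = date.sel({'epoch_dimension': 3}).min()
print(hour)
# Soundings outside the earliest hour become NaN, which min() skips.
date = date.where(date.sel({'epoch_dimension': 3}) == hour)
minute = date.sel({'epoch_dimension': 4}).min()
print(minute)
date = date.where(date.sel({'epoch_dimension': 4}) == minute)
second = date.sel({'epoch_dimension': 5}).min()
print(second)
date = date.where(date.sel({'epoch_dimension': 5}) == second)
microsec = date.sel({'epoch_dimension': 6}).min()
print(microsec)

# Label the epoch_dimension entries with the component names that
# pandas.to_datetime recognizes as data frame columns.
date = oco3['date'].assign_coords({
    'epoch_dimension': ['year', 'month', 'day', 'hour', 'minute', 'second', 'microsecond']
})
# One column per component, one row per sounding_id.
date_pandas = date.to_dataset(dim='epoch_dimension').to_dataframe()
oco3['datetime'] = pandas.to_datetime(date_pandas)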
```
:::