---
title: Datasets II - Practice
subtitle: Work with Satellite Images and Other Multi-dimensional Data
authors:
  - name: Ian Carroll
    affiliations:
      - University of Maryland Baltimore County
      - NASA Goddard Space Flight Center
  - name: Rachel Wegener
    affiliations:
      - University of Maryland College Park
thumbnail: https://rwegener2.github.io/sarp_lessons/_static/sarp_logo.png
github: itcarroll/itcarroll.github.io
---

## Exercise 1

:::{dropdown} Problem
:open:

Level: I

: Build a tiny multi-dimensional dataset from scratch (no reading from a file!). Make the first variable one-dimensional with a coordinate. Make a second variable in the same dataset that results from an algebraic operation on the first variable. What kind of Earth system do these two variables bring to mind?

Level: II

: Make the second variable the two-dimensional result of raising the first variable to the powers 0, 1, and 2. What is a practical application of doing this operation on a variable?

Level: I took Data Visualization in Python last semester

: Plot one curve for each power in the second variable, along the coordinate.

:::

In [None]:
# your work here!

:::{dropdown} Solution
```
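import xarray
import hvplot.xarray  # registers the .hvplot accessor on xarray objects

# Level I: a one-dimensional variable with a coordinate named 'x'
y = xarray.DataArray(
    data=[-0.45, 0.5, -1.3, 0.6, 0.2],
    coords={'x': [0.1, 2.3, 4.5, 5.6, 7.8]},
    name='y'
)
problem = y.to_dataset()

# Level II: broadcasting against a new dimension 'n' gives a 2-D (x, n) result
powers = xarray.DataArray([0, 1, 2], dims='n')
problem['y^n'] = y ** powers

# Bonus level: one curve per power along the coordinate
problem.hvplot(x='x', y='y^n', by='n')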
```
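
One possible answer to the Level II question: the powers 0, 1, and 2 of a variable form the columns of a design matrix for fitting a quadratic polynomial. The sketch below is only an illustration and not part of the original solution. It assumes made-up coefficients (`true_coef`), applies the powers to the values of `x` as the predictor, and recovers the coefficients with `numpy.linalg.lstsq`.

```python
import numpy as np
import xarray as xr

# Predictor values (same numbers as the 'x' coordinate in the solution)
x = xr.DataArray([0.1, 2.3, 4.5, 5.6, 7.8], dims='x', name='x')
powers = xr.DataArray([0, 1, 2], dims='n')

# Columns are x**0, x**1, x**2: a polynomial design matrix
basis = (x ** powers).transpose('x', 'n')

# Hypothetical coefficients, used only to synthesize a target to fit
true_coef = xr.DataArray([1.0, 2.0, -0.5], dims='n')
target = (basis * true_coef).sum('n')

# Ordinary least squares on the design matrix recovers the coefficients
coef, *_ = np.linalg.lstsq(basis.values, target.values, rcond=None)
print(coef)
```

With noisy observations in place of `target`, the same call would give the best-fitting quadratic in the least-squares sense.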
:::

## Exercise 2

:::{dropdown} Problem
:open:

The OCO3 file has a peculiar way of storing the datetime for each sounding. The `date` variable has `epoch_dimension` as its second dimension: the 7 elements along this dimension correspond to year, month, day, hour, minute, second, and microsecond.

Level: I

: Read in any OCO3 data file and find the minimum and maximum dates.

Level: I already knew about `pandas.to_datetime`

: Create a new variable in the dataset with the date converted to a datetime, getting rid of the `epoch_dimension` dimension but keeping the `sounding_id` dimension.

:::

In [None]:
# your work here!

:::{dropdown} Solution
```
from pathlib import Path

import xarray
import pandas


file = (
    Path('/efs/sarp/data/rawdata_readonly')
    / 'oco-3-co2-data'
    / 'oco3_LtCO2_200228_B10400Br_220317235859s.nc4'
)
oco3 = xarray.open_dataset(file)

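# Level I: reduce over soundings. min/max act on each of the 7 date
# components independently, so the results bracket the dates but may
# not match the date of any single sounding.
min_date = oco3['date'].min(dim='sounding_id')
print(min_date.data)
max_date = oco3['date'].max(dim='sounding_id')
print(max_date.data)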

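# Level II: label the 7 components so pandas can assemble datetimes
date = oco3['date'].assign_coords({
    'epoch_dimension': ['year', 'month', 'day', 'hour', 'minute', 'second', 'microsecond']
})
# one column per component, indexed by sounding_id
date_pandas = date.to_dataset(dim='epoch_dimension').to_dataframe()
oco3['datetime'] = pandas.to_datetime(date_pandas)

# with real datetimes, min and max are the earliest and latest soundings
print(oco3['datetime'].min().data)
print(oco3['datetime'].max().data)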
```
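
To try the conversion without access to the OCO3 file, here is a minimal sketch on a toy array. The three soundings, their `sounding_id` labels, and the date values are all made up; only the layout (year, month, day, hour, minute, second, microsecond along `epoch_dimension`) follows the description in the problem. It also shows why reducing the components one at a time differs from reducing real datetimes.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the `date` variable: 3 soundings x 7 components
date = xr.DataArray(
    data=np.array([
        [2020, 2, 28, 23, 59, 30, 0],
        [2020, 2, 29, 0, 0, 10, 500000],
        [2020, 2, 28, 12, 30, 45, 250000],
    ]),
    dims=('sounding_id', 'epoch_dimension'),
    coords={'sounding_id': [101, 102, 103]},
    name='date',
)

# Component-wise minimum: mixes fields from different soundings
componentwise_min = date.min(dim='sounding_id')
print(componentwise_min.values)

# Label the components, pivot to columns, and let pandas assemble datetimes
fields = ['year', 'month', 'day', 'hour', 'minute', 'second', 'microsecond']
date = date.assign_coords(epoch_dimension=fields)
frame = date.to_dataset(dim='epoch_dimension').to_dataframe()
when = pd.to_datetime(frame)  # Series indexed by sounding_id

print(when.min(), when.max())
```

Here the component-wise minimum combines the day of one sounding with the hour and minute of another, while `when.min()` and `when.max()` are the times of actual soundings.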
:::

## Exercise 3

:::{dropdown} Problem
:open:

The lesson demonstrated twice what the result of a programming error looks like in Python. The second error occurred while trying to reproject a GeoPandas data frame to a given CRS. Compose a minimal reproducible example (MRE) that raises the same error. The purpose of an MRE is to ask for help, probably from someone who has no interest in your data but knows Python really well. You want to make it as easy as possible for them to trigger the error on their own machine, so don't expect them to download any data.

:::

In [None]:
# your work here!

:::{dropdown} Solution
```
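from shapely.geometry import Point
import geopandas

# a tiny GeoDataFrame built in memory, so there is nothing to download
gdf = geopandas.GeoDataFrame(
    data=[0, 1, 2],
    geometry=[Point(0, 0), Point(0, 1), Point(1, 0)]
)
# no CRS was set above, so reprojecting raises the error
gdf.to_crs(epsg=6933)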
```
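
Before sending an MRE to someone, it helps to confirm that it fails the way you expect. The sketch below catches the error from a GeoDataFrame with no CRS and prints its message, then shows one way the error goes away. The `value` column name and the choice of EPSG:4326 as the source CRS are assumptions made for illustration; the lesson's data may use a different CRS.

```python
from shapely.geometry import Point
import geopandas

gdf = geopandas.GeoDataFrame(
    data={'value': [0, 1, 2]},
    geometry=[Point(0, 0), Point(0, 1), Point(1, 0)],
)

# The frame has no CRS, so reprojecting is expected to raise ValueError
try:
    gdf.to_crs(epsg=6933)
except ValueError as error:
    message = str(error)
    print(message)

# Assumed fix: declare the CRS the coordinates are already in, then reproject
fixed = gdf.set_crs(epsg=4326).to_crs(epsg=6933)
print(fixed.crs.to_epsg())
```

Including the caught message alongside the MRE lets a helper check that they are seeing the same failure.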
:::