# Part 1: *Something catchy here*

## Exercise 1: COG to Zarr with a single tile

In this exercise, we will load a single GeoTIFF into xarray using [rioxarray](https://corteva.github.io/rioxarray/html/modules.html) and show how to navigate the xarray repr. We will then make some quick visualizations of the tile and save the xarray data to Zarr.

To start, let's read a single GLAD LULC tile for the year 2000 from Google Cloud Storage. The data can also be downloaded as local files [here](https://storage.googleapis.com/earthenginepartners-hansen/GLCLU2000-2020/v2/download.html). We will use rioxarray's [`open_rasterio`](https://corteva.github.io/rioxarray/html/rioxarray.html#rioxarray-open-rasterio) for this operation:

```python
import rioxarray

year = 2000  # Feel free to change this to 2005, 2010, 2015, or 2020
file_name = "50N_120W"  # Feel free to change this to any of the other tiles in the dataset

url = f"https://storage.googleapis.com/earthenginepartners-hansen/GLCLU2000-2020/v2/{year}/{file_name}.tif"

# masked=True replaces the nodata value with NaN; only the metadata is read at this point
da = rioxarray.open_rasterio(url, masked=True)
da
```
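Because the URL only varies by year and tile name, it can be built by a small helper. The sketch below is illustrative: `glad_tile_url` is a hypothetical name, and the list of years is taken from the comment in the cell above.

```python
# Years available in the GLAD GLCLU 2000-2020 v2 dataset, per the comment in the cell above
YEARS = [2000, 2005, 2010, 2015, 2020]
BASE_URL = "https://storage.googleapis.com/earthenginepartners-hansen/GLCLU2000-2020/v2"


def glad_tile_url(year: int, file_name: str) -> str:
    """Build the URL of one GLAD LULC tile (hypothetical helper)."""
    if year not in YEARS:
        raise ValueError(f"year must be one of {YEARS}, got {year}")
    return f"{BASE_URL}/{year}/{file_name}.tif"


# One URL per year for the same tile
urls = {year: glad_tile_url(year, "50N_120W") for year in YEARS}
for year, tile_url in urls.items():
    print(year, tile_url)
```

Any of these URLs can be passed to `rioxarray.open_rasterio` in place of `url`.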


Now, let's examine the data structure.

TODO: Tom to fill in

Note that we did not read in all of the tile's data: `open_rasterio` only read the metadata, which is why this cell ran so quickly. The pixel values are read only when an operation needs them, such as plotting or writing to Zarr, and for tiles this large those operations will require some additional optimizations.

For this first part of the tutorial, we will subset this tile to expedite the first few exercises. We will discuss optimizations when we get to building the global Zarr dataset. 
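To get a feel for why subsetting helps, here is a back-of-the-envelope estimate of the subset's size. It rests on two assumptions you should verify: a pixel size of 0.00025° (roughly 30 m; check `da.rio.resolution()`), and a tile stored as 8-bit integers, which `masked=True` promotes to `float32` so that nodata can be represented as NaN (check `da.dtype`).

```python
import numpy as np


def subset_shape(x_bounds, y_bounds, res):
    """Number of (rows, cols) covered by lon/lat bounds at a given pixel size."""
    ncols = round(abs(x_bounds[1] - x_bounds[0]) / res)
    nrows = round(abs(y_bounds[1] - y_bounds[0]) / res)
    return nrows, ncols


RES = 0.00025  # ASSUMED pixel size in degrees; verify with da.rio.resolution()
rows, cols = subset_shape((-113, -110), (42, 40), RES)

# Compare the in-memory footprint of the raw integer data and the masked float data
for dtype in (np.uint8, np.float32):
    nbytes = rows * cols * np.dtype(dtype).itemsize
    print(f"{np.dtype(dtype).name}: {rows} x {cols} pixels -> {nbytes / 1e6:.0f} MB")
```

Under these assumptions the subset is 8000 × 12000 pixels: about 96 MB as `uint8` and 384 MB as `float32`.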

```python
# Select a subset of the data (use slice(None) to keep the whole tile)
x_slice = slice(-113, -110)  # longitude, west to east
y_slice = slice(42, 40)  # latitude, north to south: y decreases down the raster
da_sample = da.sel(x=x_slice, y=y_slice)
da_sample
```
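Note the order of `y_slice`: raster rows run from north to south, so the `y` coordinate decreases and a label slice must go from the larger to the smaller latitude. A small synthetic example (the array below is made up for illustration):

```python
import numpy as np
import xarray as xr

# Tiny synthetic raster: 4 rows (north to south) x 3 columns.
# Like a GeoTIFF read by rioxarray, y decreases from top to bottom.
lat = np.array([43.5, 42.5, 41.5, 40.5])
lon = np.array([-112.5, -111.5, -110.5])
demo = xr.DataArray(np.arange(12).reshape(4, 3), coords={"y": lat, "x": lon}, dims=("y", "x"))

# Slicing from the larger to the smaller latitude follows the coordinate order
north_to_south = demo.sel(y=slice(42, 40), x=slice(-113, -110))
print(dict(north_to_south.sizes))  # {'y': 2, 'x': 3}

# The "natural" ascending slice returns an empty selection on a descending axis
south_to_north = demo.sel(y=slice(40, 42))
print(dict(south_to_north.sizes))  # {'y': 0, 'x': 3}
```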

You will find that many Level 3 geospatial datasets store their data in a single band (often named "band", very original!), as this one does. Let's rename this band to "lulc" to be a bit more explicit.

We will also remove the `lulc` dimension. Since it has only one value, it carries no additional information along that axis, and removing it simplifies the array from 3D to 2D.

```python
da_sample = da_sample.rename({"band": "lulc"})
da_sample = da_sample.squeeze("lulc")
da_sample
```
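After `squeeze`, the `lulc` dimension is gone, but its single value survives as a scalar coordinate. If you don't need it, `squeeze(..., drop=True)` removes it as well. A small synthetic check (the array is made up for illustration):

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a single-band raster with dims (band, y, x)
demo = xr.DataArray(
    np.zeros((1, 2, 2), dtype="uint8"),
    coords={"band": [1], "y": [41.5, 40.5], "x": [-111.5, -110.5]},
    dims=("band", "y", "x"),
)

renamed = demo.rename({"band": "lulc"}).squeeze("lulc")
print(renamed.dims)              # ('y', 'x')
print("lulc" in renamed.coords)  # True: kept as a scalar coordinate

# drop=True also discards the leftover scalar coordinate
dropped = demo.rename({"band": "lulc"}).squeeze("lulc", drop=True)
print("lulc" in dropped.coords)  # False
```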

### Plotting

Visualization is essential for geospatial data. How can we know that our data was correctly loaded into xarray without actually looking at it? Below are a few different approaches to plotting xarray data in a notebook. 

**Cloud vs. Local Latencies**

Note that the data must be loaded into memory before it can be plotted, and loading it from cloud storage can take much longer than loading it from your local machine, because each read incurs network latency. This tile is also quite large. To avoid long runtimes or kernel crashes, consider downloading the data locally or visualizing only a slice of the full tile.

**Visualizing in QGIS**

QGIS natively supports TIFFs and GeoTIFFs...

```python
# Load the subset into memory from the cloud
# This may take a while depending on your internet connection, the size of the
# selection, and whether the file is local or in cloud storage.
# This is slow because we are loading non-cloud-optimized data.
da_sample = da_sample.load()
da_sample
```

#### hvPlot

[hvPlot](https://hvplot.holoviz.org/) works well for large xarray datasets: it integrates closely with xarray, supports lazy Dask-backed data, and can use Datashader to render millions of points efficiently. It also produces interactive, zoomable plots with minimal code, which makes it well suited to exploring complex geospatial or time-series data.

We discourage the use of Dask-backed xarray datasets in this plotting example because ...

```python
import hvplot.xarray  # registers the .hvplot accessor on xarray objects
import hvplot.pandas  # needed for tile sources
import holoviews as hv
from holoviews.element.tiles import EsriImagery  # or other tile source

hv.extension("bokeh")


def plot_hvplot(data_to_plot):
    # rasterize=True will enable datashading for large datasets
    img = data_to_plot.hvplot.image(
        x="x", y="y", cmap="viridis", rasterize=False,
        frame_width=500, dynamic=True, geo=True,
    )
    # Overlay the raster on a satellite-imagery basemap
    return EsriImagery() * img


plot_hvplot(da_sample)

# TODO: values are weird, like they are being averaged/sampled
```
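LULC pixels are categorical class codes, so any resampling step that averages neighboring pixels (for example, when a large array is downsampled for display) can produce values that are not valid classes. A minimal sketch of the difference on a synthetic patch, using xarray's `coarsen` and `isel` (the class codes 1 and 10 are made up):

```python
import numpy as np
import xarray as xr

# Synthetic 4x4 "LULC" patch with two made-up class codes, 1 and 10
codes = np.array(
    [[1, 1, 10, 10],
     [1, 1, 10, 10],
     [1, 10, 10, 10],
     [10, 10, 10, 10]],
    dtype="uint8",
)
patch = xr.DataArray(codes, dims=("y", "x"))

# Averaging 2x2 blocks mixes classes: the lower-left block becomes 7.75, not a class code
averaged = patch.coarsen(y=2, x=2).mean()
print(averaged.values)

# Strided (nearest-neighbor style) subsampling keeps only real class codes
strided = patch.isel(y=slice(None, None, 2), x=slice(None, None, 2))
print(strided.values)
```

When downsampling categorical rasters, a nearest-neighbor or mode-based reduction is usually safer than a mean.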

#### Leafmap

[leafmap](https://leafmap.org/) is good for plotting xarray data because it combines the mapping power of Leaflet (via `ipyleaflet` or `folium`) with convenient tools for handling raster and vector geospatial data, including xarray. It can convert xarray DataArrays into interactive map layers and supports time sliders, colorbars, and basemaps, which makes it especially useful for visualizing geospatial time series or remote sensing data with minimal setup.

```python
import leafmap


def plot_leafmap(data_to_plot):
    m = leafmap.Map(center=(40, -100), zoom=4)
    m.add_raster(data_to_plot)
    return m


plot_leafmap(da_sample)
```

### Writing Data to Zarr

Before we move on from this single data tile, let's write our data to Zarr using xarray's [`to_zarr`](https://docs.xarray.dev/en/latest/generated/xarray.Dataset.to_zarr.html) method.

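- `store`: the path or URL of the Zarr store to write to (a local directory here).
- `group`: an optional path to a group inside the store; writing to separate groups lets one store hold several datasets.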

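```python
# Hypothetical output location; change these to suit your setup
store = "glad_lulc_sample.zarr"
group = str(year)  # e.g. one group per year of data

# Drop the leftover scalar "lulc" coordinate (if present) so the variable can take that name,
# convert to a Dataset, and write it to Zarr (mode="w" overwrites any previous output)
ds_out = da_sample.drop_vars("lulc", errors="ignore").to_dataset(name="lulc")
ds_out.to_zarr(store=store, group=group, mode="w")
```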

We can also easily read this dataset back into xarray with [`open_zarr`](https://docs.xarray.dev/en/stable/generated/xarray.open_zarr.html). Note that we set `chunks` here so that the data is read lazily into Dask arrays. We will discuss chunking more in the next section of the workshop.

```python
import xarray as xr

ds = xr.open_zarr(store=store, group=group, chunks={"x": 2048, "y": 2048})
ds
```
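With `chunks={"x": 2048, "y": 2048}`, each Dask chunk covers at most 2048 × 2048 pixels, and chunks at the array edges are smaller. The arithmetic for how many chunks a given shape produces is sketched below; `chunk_grid` is a hypothetical helper, and the 8000 × 12000 shape is an illustrative assumption (use `ds.sizes` for the real one).

```python
import math


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension, and in total."""
    per_dim = {dim: math.ceil(size / chunks[dim]) for dim, size in shape.items()}
    return per_dim, math.prod(per_dim.values())


# ASSUMED array shape; replace with dict(ds.sizes) for your data
per_dim, total = chunk_grid({"y": 8000, "x": 12000}, {"y": 2048, "x": 2048})
print(per_dim, total)
```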