# Introduction and Setup

First, we import the standard, scientific, and visualization libraries used in this notebook. Run the cell below.

In [1]:
# Standard libraries
import os

# Scientific libraries
import numpy as np
import xarray as xr
import geopandas as gpd


# Visualization libraries
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

# Utility library for the test print below
import emoji

In [2]:
# Jupyter magic command: embed Matplotlib figures directly in the notebook,
# below the cell that produces them.
%matplotlib inline

In [None]:
print(emoji.emojize('Python is :thumbs_up:'))

### Working with netCDF data 

**Downloading the 2023 Monthly Wind Data**

Navigate to the Copernicus Marine Service website to access the 2023 monthly wind data. Specifically, download the 12 NetCDF files for 2023 from the following link:
https://data.marine.copernicus.eu/product/WIND_GLO_PHY_CLIMATE_L4_MY_012_003/files?path=WIND_GLO_PHY_CLIMATE_L4_MY_012_003%2Fcmems_obs-wind_glo_phy_my_l4_P1M_202211%2F2023%2F&subdataset=cmems_obs-wind_glo_phy_my_l4_P1M_202211

Once downloaded, store the data in `OceanographicDataProcessingCourse/Data/Wind`.

In the cell below, provide the path to the folder where you stored the downloaded data, so that the data can be accessed and processed in the subsequent steps. You can use a relative or an absolute path, for example `C:/PATH/TO/FILE` on Windows or `/PATH/TO/FILE` on Linux and macOS. We start with the data for January 2023.



In [4]:
## datapath and filename
datapath = '../Data/Wind'
filename = "cmems_obs-wind_glo_phy_my_l4_P1M_202301.nc"

shapefile = '../Data/110m_cultural/ne_110m_admin_0_countries.shp'

Geographic data files are often very large, so xarray does not read the whole file when we first open it. It loads only the metadata, which lets us view summary information about the contents of the file before deciding whether to load some or all of the data into memory. The data values themselves are read only once they are accessed. Run the next cell.

In [5]:
#run the cell
full_path = os.path.join(datapath, filename)
ds = xr.open_dataset(full_path)

In [6]:
#dir(ds)

An xarray object typically has the following components:  
data: the data arrays (values); in a Dataset they are listed under `data_vars`  
coords: dictionary-like mapping of the dimensions to their coordinates and data types  
attrs: dictionary with metadata and attributes  
Have a look at the xarray Dataset `ds` by running the next cell.
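
To make these components concrete, here is a minimal sketch that builds a tiny Dataset by hand and inspects each part. The name `toy`, the grid, and all values are made up for illustration and are not taken from the Copernicus file.

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset; names and values are illustrative only
toy = xr.Dataset(
    data_vars={
        "eastward_wind": (("time", "lat", "lon"), np.zeros((1, 3, 4))),
    },
    coords={
        "time": np.array(["2023-01-01"], dtype="datetime64[ns]"),
        "lat": [-10.0, 0.0, 10.0],
        "lon": [0.0, 10.0, 20.0, 30.0],
    },
    attrs={"title": "toy wind dataset"},
)

print(list(toy.data_vars))  # names of the data variables
print(dict(toy.sizes))      # dimension names and their lengths
print(list(toy.coords))     # coordinate names
print(toy.attrs)            # global metadata dictionary
```

The same four calls should work on `ds` once the real file is open.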

In [None]:
## run the cell 
ds

Alternatively, you can use the `print()` function, which is also helpful when you work with executable Python scripts.

In [None]:
#run the cell
print(ds)


You've already observed the variables within the dataset. Another method to display the variables included in the dataset is to simply use `list()` on `data_vars`.

In [None]:
#run the cell
list(ds.data_vars)

From this point, you'll have a clear overview of what's contained in the data. You'll also be able to see how the data is distributed both temporally and spatially. If you can't immediately discern the resolution of the data, the following code snippet will assist you:

In [None]:
# run the cell
if 'lon' in ds.dims and len(ds.lon) > 1:
    print("Longitude resolution:", ds.lon.values[1] - ds.lon.values[0])
if 'lat' in ds.dims and len(ds.lat) > 1:
    print("Latitude resolution:", ds.lat.values[1] - ds.lat.values[0])
if 'time' in ds.dims and len(ds.time) > 1:
    print("Temporal resolution:", ds.time.values[1] - ds.time.values[0])

In [None]:
# or plot the spacing between neighbouring longitude values
ds.lon.diff(dim = 'lon').plot()



You might have noticed the absence of an output for temporal resolution. Why might that be? Although the dataset has a `'time'` dimension, it doesn't truly have a temporal resolution since there's only one time value. This indicates that we have only one timestep.
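
The reasoning above can be sketched with a small helper that returns no spacing when a coordinate has fewer than two values. The function `grid_spacing` and the sample coordinates are hypothetical and only stand in for the real `ds.lon` and `ds.time`.

```python
import numpy as np

def grid_spacing(values):
    """Return the spacing between the first two values, or None if fewer than two exist."""
    values = np.asarray(values)
    if values.size < 2:
        return None  # a single value has no spacing
    return values[1] - values[0]

# Hypothetical coordinates: a regular longitude axis and a single time step
lon = -179.875 + 0.25 * np.arange(1440)
time = np.array(["2023-01-01"], dtype="datetime64[D]")

print("Longitude spacing:", grid_spacing(lon))
print("Time spacing:", grid_spacing(time))
```

Note that this only looks at the first two values, so it assumes a regularly spaced axis.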

You can see which variables the dataset includes and which dimensions they have. We are interested in the u and v wind components, which are named `eastward_wind` and `northward_wind` in this dataset.

In the xarray library, a Dataset (often denoted `ds`) is a dict-like container of labeled arrays, designed as an in-memory representation of the netCDF data model. These arrays can be thought of as the variables of the dataset. There are two primary ways to access them:

1. Attribute-style access: `ds.variable_name`
2. Dictionary-style access: `ds['variable_name']`

Attribute access is shorter but does not always work: it fails for names that are not valid Python identifiers (e.g. `'123'` or `'wind speed'`) and for names that clash with existing Dataset attributes or methods (e.g. `'mean'`). Dictionary access is more universal and works with any variable name.

Test both access styles by running the next two cells.


In [None]:
# run the cell
ds.eastward_wind

In [None]:
#run the cell
ds['eastward_wind']

Both lines should produce the same output, assuming that `eastward_wind` is a valid variable in your dataset. If not, you'd get an error.

In summary, both methods are valid ways to access xarray Dataset variables, and which one to use often comes down to personal preference, the specific situation, and the variable names you're working with.
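
The limits of attribute-style access can be illustrated with a small made-up Dataset. The variable names `"wind speed"` and `"mean"` are chosen deliberately to cause trouble and are not part of the wind product.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset with one well-behaved and two awkward variable names
toy = xr.Dataset(
    {
        "eastward_wind": (("lat",), np.array([1.0, 2.0, 3.0])),
        "wind speed": (("lat",), np.array([4.0, 5.0, 6.0])),  # contains a space
        "mean": (("lat",), np.array([7.0, 8.0, 9.0])),         # same name as Dataset.mean()
    },
    coords={"lat": [-10.0, 0.0, 10.0]},
)

# For a well-behaved name, both styles return the same DataArray
same = toy.eastward_wind.identical(toy["eastward_wind"])
print(same)

# Only dictionary-style access reaches these two variables
print(toy["wind speed"].values)
print(toy["mean"].values)

# Attribute-style access finds the Dataset method, not the variable
print(callable(toy.mean))
```

In scripts that loop over variable names, dictionary-style access is therefore usually the safer default.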

## Visualization

With `xarray`'s built-in plotting functionality, we can easily visualize DataArrays. Here, we are plotting `ds.eastward_wind` at the time index `time = 0` for all values of latitude and longitude using `ds.eastward_wind[0,:,:]`. If this dataset contained data for multiple months, we would specify the desired time index accordingly.
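
As a minimal sketch of the indexing used here, the following compares positional indexing with the named equivalent `isel(time=0)` on a made-up array with two time steps. The array `u` and its coordinates are hypothetical.

```python
import numpy as np
import xarray as xr

# Hypothetical DataArray with two time steps on a 3 x 4 grid
u = xr.DataArray(
    np.arange(24, dtype=float).reshape(2, 3, 4),
    dims=("time", "lat", "lon"),
    coords={
        "time": np.array(["2023-01-01", "2023-02-01"], dtype="datetime64[ns]"),
        "lat": [-10.0, 0.0, 10.0],
        "lon": [0.0, 10.0, 20.0, 30.0],
    },
    name="eastward_wind",
)

first_by_position = u[0, :, :]   # relies on time being the first dimension
first_by_name = u.isel(time=0)   # names the dimension explicitly

print(first_by_position.shape)
print(first_by_position.equals(first_by_name))
```

The named form keeps working if the order of the dimensions ever differs from `(time, lat, lon)`.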

In [None]:
# Plot eastward wind using xarray's built-in plotting functionality
ds.eastward_wind[0,:,:].plot(cmap='coolwarm', figsize=(12, 5))

# Show the plot
plt.show()

This plot already looks quite impressive! We can observe the zonal wind velocity, with positive values in the mid-latitudes and negative values in the higher latitudes, as well as between 20°S and 20°N. However, we now also want to visualize the continents and add the spatial grid to the plot.


In [None]:
# Load the shapefile of the world map
world = gpd.read_file(shapefile)


# Create the figure and axis
fig, ax = plt.subplots(figsize=(12, 5))

# Plot eastward wind using xarray's built-in plotting functionality
# This uses the existing data from an xarray Dataset (e.g., ds)
ds.eastward_wind[0,:,:].plot(ax=ax, cmap='coolwarm')

# Plot the world map (continents)
world.plot(ax=ax, color='none', edgecolor='black', linewidth=1)

# Add gridlines
ax.set_xticks(range(-180, 181, 30))  # Longitude gridlines every 30°
ax.set_yticks(range(-90, 91, 30))    # Latitude gridlines every 30°
ax.grid(True, linestyle='--', color='gray')  # Dashed gridlines

# Show the plot
plt.show()




First, we load the shapefile containing the country borders as `world`. We create the figure and axis, and plot the eastward wind using xarray's built-in plotting functionality as before. After that, we plot the landmasses and ensure both plots are drawn on the same axis. Finally, we add gridlines and apply a specific style to them.


When you take a closer look at the code snippet above, where do you think you could change the color of the landmasses and the borders, for example?

**1. Exercise: Copy the code from above and modify it to change the color of the landmasses and borders. Experiment and see how different color schemes affect the visualization.**



In [17]:
# copy the code from above and change the relevant parameters



xarray's built-in `.plot()` method draws regularly gridded data on flat, linear longitude and latitude axes, which distorts the representation of the Earth. In such flat maps, the sizes and shapes of geographical features are not accurate: for example, landmasses near the poles appear much larger than they actually are. To create more realistic representations of the Earth, we can use map projections, such as the Robinson projection.

For this, we use `Basemap`, which comes with built-in coastlines, so there’s no need to load a separate shapefile.

In [None]:
#run the cell


# Create the figure and axis
fig, ax = plt.subplots(figsize=(12, 7))

# Define the map projection using Robinson (Basemap for Robinson projection)
m = Basemap(projection='robin', lon_0=0, ax=ax)

# Get longitude and latitude data from the dataset
lon = ds.coords['lon'].values
lat = ds.coords['lat'].values

# Create a meshgrid for plotting
lon2d, lat2d = np.meshgrid(lon, lat)

# Transform the coordinates into the Robinson projection
x, y = m(lon2d, lat2d)

# Define levels for contouring (for eastward wind)
levels = np.linspace(-12, 12, 30)

# Plot the eastward wind data using contourf
cs = m.contourf(x, y, ds.eastward_wind[0, :, :], cmap='coolwarm', levels=levels)

# Add a colorbar
cbar = m.colorbar(cs, location='right', pad="10%",ticks=np.arange(-10, 11, 2))

cbar.set_label(f'Eastward Wind ({ds.eastward_wind.units})')


# Use Basemap to fill continents with white
m.fillcontinents(color='white')

# Draw coastlines (only coastlines, no rivers)
m.drawcoastlines()

# Add gridlines (parallels and meridians) for the Robinson projection
m.drawparallels(np.arange(-90., 91., 30.), labels=[1, 0, 0, 0], linewidth=0.5, color='gray')
m.drawmeridians(np.arange(-180., 181., 40.), labels=[0, 0, 0, 1], linewidth=0.5, color='gray')

# Get the time of the data (assuming the first time step is relevant) for the title
time = str(ds.coords['time'].values[0])
# Set a dynamic title based on the dataset
plt.title(f"Eastward Wind at {time[:10]} ({ds.eastward_wind.units})")

# Show the plot
plt.show()

Basemap converts the geographic coordinates (latitude and longitude) into a different coordinate system so we can correctly plot the data using the Robinson projection. The xarray.plot() function is designed for 2D Cartesian grids and does not inherently apply geographic projections like Robinson or Mercator.

**2. Exercise: Copy the code from above and:**

1. Plot the `northward_wind` component.
2. Change the color of the landmasses.
3. Modify the color, linewidth, and spacing of the parallels and meridians.
4. Think about what else should be modified when plotting a new variable (e.g., color scale, units, or labels).


In [19]:
# copy the code from above and modify



# Data Reduction Techniques - Exploring Coarsen and Slice 

Now that you've explored various plotting techniques, you have a basic understanding of how zonal (eastward) and meridional (northward) wind velocities look. Wind, as a vector, has both a northward and eastward component, which are typically combined and represented as wind vectors. You might recognize this from weather apps, where wind direction and strength are often shown using arrows.

We have plotted the wind components separately, but typically, wind data is represented using vector arrows to visually display both speed and direction. To do this, we can use the quiver() function:

In [None]:
# run this cell

# Create the figure and axis
fig, ax = plt.subplots(figsize=(12, 7))

# Create a basemap instance (equidistant cylindrical projection 'cyl', which matches our lat/lon grid)
m = Basemap(projection='cyl', llcrnrlat=-90, urcrnrlat=90, llcrnrlon=-180, urcrnrlon=180, resolution='c', ax=ax)

# Draw coastlines and fill continents
m.drawcoastlines()
m.fillcontinents(color='white')

# Draw map boundaries and lat/lon gridlines
m.drawmapboundary()
m.drawparallels(np.arange(-90., 91., 30.), labels=[1,0,0,0], linewidth=0.5)
m.drawmeridians(np.arange(-180., 181., 60.), labels=[0,0,0,1], linewidth=0.5)



# Convert the longitude and latitude from the xarray dataset for the quiver plot
lon = ds.coords['lon'].values
lat = ds.coords['lat'].values
lon2d, lat2d = np.meshgrid(lon, lat)  # Create a meshgrid for quiver plotting

# Quiver plot with wind vectors (zonal and meridional wind components)
quiver_plot = ax.quiver(lon2d, lat2d, ds.eastward_wind, ds.northward_wind, scale=500)

# Set labels and title
plt.title('Wind Vector Plot (Eastward and Northward Components)')

# Show the plot
plt.show()


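We encountered a `ValueError`, and no data were plotted. Why do you think this happened? Hint: compare the dimensions of `ds.eastward_wind` and `ds.northward_wind` with those of `lon2d` and `lat2d`.

Even once this is fixed, drawing one arrow for every grid point would most likely give an unreadable plot, so we first need to reduce the amount of data.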

What’s the difference between downsampling and slicing?

**Downsampling** reduces the resolution of the dataset by averaging or aggregating data onto a coarser grid. Instead of simply selecting fewer points, it creates a new, lower-resolution dataset by merging data from finer grids. This results in a smoother representation with fewer data points, while still capturing the overall pattern of the dataset.

**Slicing** selects specific parts of the data based on intervals. For example, you might select every 20th point in the dataset to reduce the number of arrows plotted, making the plot more manageable.
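
The difference between the two approaches can be sketched on a tiny made-up array with plain numpy. The 4 x 4 `field` is hypothetical and only stands in for a fine-resolution wind component.

```python
import numpy as np

# Hypothetical 4 x 4 field standing in for a fine-resolution wind component
field = np.arange(16, dtype=float).reshape(4, 4)

# Slicing: keep every 2nd point in each direction and discard the rest
sliced = field[::2, ::2]

# Downsampling: average each non-overlapping 2 x 2 block
block_mean = field.reshape(2, 2, 2, 2).mean(axis=(1, 3))

print(sliced)      # values that already existed in the field
print(block_mean)  # new values, each the mean of four neighbours
```

Both results have the same 2 x 2 shape, but only the block mean uses every original value.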

Let's start by testing slicing using `pu, pv = ds.eastward_wind[0, ::20, ::20], ds.northward_wind[0, ::20, ::20]`

In [None]:
# run this cell


# Create the figure and axis
fig, ax = plt.subplots(figsize=(12, 7))

# Create a basemap instance (equidistant cylindrical projection 'cyl', which matches our lat/lon grid)
m = Basemap(projection='cyl', llcrnrlat=-90, urcrnrlat=90, llcrnrlon=-180, urcrnrlon=180, resolution='c', ax=ax)

# Draw coastlines and fill continents
m.drawcoastlines()
m.fillcontinents(color='white')

# Draw map boundaries and lat/lon gridlines
m.drawmapboundary()
m.drawparallels(np.arange(-90., 91., 30.), labels=[1,0,0,0], linewidth=0.5)
m.drawmeridians(np.arange(-180., 181., 60.), labels=[0,0,0,1], linewidth=0.5)

# Select the first time step and every 20th grid point of both wind components
pu, pv = ds.eastward_wind[0, ::20, ::20], ds.northward_wind[0, ::20, ::20]  # Slicing the wind data for clarity

# Convert the longitude and latitude from the xarray dataset for the quiver plot
lon = ds.coords['lon'].values[::20]
lat = ds.coords['lat'].values[::20]
lon2d, lat2d = np.meshgrid(lon, lat)  # Create a meshgrid for quiver plotting

# Quiver plot with wind vectors (zonal and meridional wind components)
quiver_plot = ax.quiver(lon2d, lat2d, pu, pv, scale=500)

# Set labels and title
plt.title('Wind Vector Plot (Eastward and Northward Components)')

# Show the plot
plt.show()


The direction of the arrow represents the wind direction, while the length of the arrow indicates the wind speed.


**3. Exercise: Copy the code from above and experiment with slicing. Try selecting every 10th or 50th point, and vary the step separately for the latitude and longitude directions. What else do you need to consider? (For example, when slicing differently in each direction, remember to slice the latitude and longitude coordinates with the same steps.)**

You can also use `ds.eastward_wind[0, slice(None, None, step), slice(None, None, step)]`, where `step` can be set to 10, 20, 50, or any other value depending on how strongly you want to thin out the data.
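
A minimal numpy sketch of slicing with different steps per direction is shown below. The 10° grid, the array `u`, and the step values are made up for illustration.

```python
import numpy as np

# Hypothetical 10-degree grid and an empty field of matching shape
lat = np.linspace(-90, 90, 19)
lon = np.linspace(-180, 170, 36)
u = np.zeros((lat.size, lon.size))

lat_step, lon_step = 3, 6

# slice(None, None, n) is the explicit form of ::n
u_sliced = u[slice(None, None, lat_step), slice(None, None, lon_step)]
same_shape = u_sliced.shape == u[::lat_step, ::lon_step].shape

# The coordinates must be sliced with the same steps as the data
lat_sliced = lat[::lat_step]
lon_sliced = lon[::lon_step]
lon2d, lat2d = np.meshgrid(lon_sliced, lat_sliced)

print(same_shape)
print(u_sliced.shape, lon2d.shape)
```

If the shapes of the sliced data and the meshgrid differ, `quiver()` cannot match arrows to positions.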

In [None]:
# copy the code from above and modify




Great! To create a more readable visualization, you used data slicing. Remember that slicing omits values, which can lead to a loss of information. An alternative is spatial averaging, which can be done with the `coarsen()` method.


First, we need to calculate the absolute wind speed: 
To find the wind speed, `U`, from the eastward (u) and northward (v) components, use:

$$U = \sqrt{u^2 + v^2}$$  


**4. Exercise: Use `np.sqrt()` from numpy to compute `U`. Plug in your data for u and v. Make sure the units and name attributes of the resulting xarray DataArray are correct by updating its metadata (`wind_speed.attrs['units'] = 'UNIT'`, `wind_speed.attrs['long_name'] = 'NAME OF VARIABLE'`). Then, plot the wind speed with your preferred method.**


Tip: To square a value in Python, use `**`
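
As a quick check of the formula, here is a worked example with made-up component values, not the course data. The arrays `u` and `v` are hypothetical.

```python
import numpy as np

# Hypothetical wind components in m/s
u = np.array([3.0, 0.0, -6.0])
v = np.array([4.0, -2.0, 8.0])

# Wind speed from the components: sqrt(u^2 + v^2)
speed = np.sqrt(u**2 + v**2)

print(speed)  # speeds of 5, 2 and 10 m/s
```

The sign of the components affects only the direction, so the speed is never negative.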

In [57]:
## your computation and plot of wind speed here
wind_speed = ...  # replace ... with your computation

#### Common Statistical Operations in Python

In Python, especially with libraries like xarray, numpy, or pandas, you can perform various statistical operations by appending methods like `.mean()`, `.std()`, or `.sum()` to your data structures.

Downsampling, i.e. reducing the data resolution with methods like [`coarsen()`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.coarsen.html), can be useful for several reasons, such as improving performance, aligning data with coarser grids, or reducing noise for analysis. For example, to reduce the resolution from 0.25° to 1°, use a window size of 4 along both latitude and longitude and apply a function like `.mean()`. Don’t forget to update the metadata, such as `units` and `long_name`, after downsampling.
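
The window-size idea can be sketched on a small made-up DataArray. The name `toy`, its 0.25° grid, and its values are hypothetical; the exercise below applies the same pattern to the real data.

```python
import numpy as np
import xarray as xr

# Hypothetical 8 x 8 field on a 0.25-degree grid
lat = -0.875 + 0.25 * np.arange(8)
lon = 0.125 + 0.25 * np.arange(8)
toy = xr.DataArray(
    np.arange(64, dtype=float).reshape(8, 8),
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
    attrs={"units": "m s-1", "long_name": "toy field"},
)

# A window of 4 points in each direction turns 0.25 degrees into 1 degree
toy_coarse = toy.coarsen(lat=4, lon=4).mean()

# Set the metadata explicitly so it describes the new field
toy_coarse.attrs["units"] = toy.attrs["units"]
toy_coarse.attrs["long_name"] = "toy field, 1 degree block mean"

print(toy_coarse.shape)
print(toy_coarse.lat.values)  # block-centre latitudes
```

If a dimension length is not a multiple of the window size, `coarsen()` raises an error by default; passing `boundary="trim"` drops the leftover points instead.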


**5. Exercise**:  
1. Average the data onto a 1°x1° grid. Name it `coarsened_mean`
2. Update attributes
3. Plot the results

In [58]:
# your calculation here
coarsened_mean = ...  # replace ... with your computation

In [None]:
# your plot here



In the following, we plot the wind vectors together with the absolute wind speed.

In [None]:
# run this cell

# Create the figure and axis
fig, ax = plt.subplots(figsize=(12, 7))

# Create a basemap instance (equidistant cylindrical projection 'cyl', which matches our lat/lon grid)
m = Basemap(projection='cyl', llcrnrlat=-90, urcrnrlat=90, llcrnrlon=-180, urcrnrlon=180, resolution='c', ax=ax)

# Draw coastlines (the continents are filled after the contour plot)
m.drawcoastlines()

# Draw map boundaries and lat/lon gridlines
m.drawmapboundary()
m.drawparallels(np.arange(-90., 91., 30.), labels=[1,0,0,0], linewidth=0.5)
m.drawmeridians(np.arange(-180., 181., 60.), labels=[0,0,0,1], linewidth=0.5)

# Select the first time step and every 20th grid point of both wind components
pu, pv = ds.eastward_wind[0, ::20, ::20], ds.northward_wind[0, ::20, ::20]  # Slicing the wind data for clarity

# Convert the longitude and latitude from the xarray dataset for the quiver plot
lon = ds.coords['lon'].values[::20]
lat = ds.coords['lat'].values[::20]
lon2d, lat2d = np.meshgrid(lon, lat)  # Create a meshgrid for quiver plotting
# Create a meshgrid for the coarsened wind speed
lon2dc, lat2dc = np.meshgrid(coarsened_mean.lon, coarsened_mean.lat)

# Transform the coordinates into map projection coordinates
x, y = m(lon2dc, lat2dc)
levels = np.linspace(0, 12, 30)
cs = m.contourf(x, y, coarsened_mean, cmap='viridis', levels=levels)
# Add a color bar for the contourf plot
cbar = m.colorbar(cs, location='right', pad="10%",ticks=np.arange(0, 14, 2))
cbar.set_label(f'Coarsened Wind Speed ({coarsened_mean.units})')
# After contourf, fill continents with white
m.fillcontinents(color='white')
# Quiver plot with wind vectors (zonal and meridional wind components)
quiver_plot = ax.quiver(lon2d, lat2d, pu, pv, scale=500)

# Set labels and title
plt.title('Wind Vectors and Coarsened Wind Speed')

# Show the plot
plt.show()

We use the sliced wind vectors and the coarsened wind speed for this plot. Ideally, the wind vectors should also be coarsened instead of simply sliced, to ensure better comparability with the coarsened wind speed. Despite this, the plot clearly shows regions with higher wind speeds and more prominent wind vectors, while in the blue-shaded areas with lower speeds, the arrows are also smaller.

**Extra Exercise: Coarsen the zonal and meridional wind vectors and plot them together with the coarsened wind speed.**

In [None]:
# Extra Exercise

Plotting is more efficient on a coarser grid. However, is it always appropriate? Compute the standard deviation within each grid cell and assess the variability that might be obscured due to the coarser gridding.  
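
Why the standard deviation is informative can be seen in a tiny made-up example: two grid cells with the same mean but very different variability. The arrays `smooth` and `variable` are hypothetical.

```python
import numpy as np

# Two hypothetical 2 x 2 blocks of wind speed in m/s
smooth = np.array([[5.0, 5.0], [5.0, 5.0]])
variable = np.array([[1.0, 9.0], [9.0, 1.0]])

# After averaging, both blocks look identical
print(smooth.mean(), variable.mean())

# The standard deviation reveals what the averaging hides
print(smooth.std(), variable.std())
```

A map of the per-cell standard deviation therefore shows where the coarse mean hides most detail.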

**6. Exercise:**  
1. Compute standard deviation (std) of `wind_speed` within each 1°x1° grid cell and name it `coarsened_std`
2. Update attributes
3. Visualize the std as a contour plot

In [60]:
## your calculation here 
coarsened_std = ...  # replace ... with your computation


In [None]:
# your plot here



**Note:** The plot shows pronounced spatial variability in certain areas. Choosing an appropriate grid resolution is crucial and depends on whether you are studying broad patterns or fine details. This applies to both atmospheric and ocean data.

To summarize, both `slicing` and `coarsen` offer distinct methodologies for handling and visualizing large datasets. While slicing is a direct approach to selectively display data, making visualizations more intelligible, coarsen provides a more comprehensive representation by spatially averaging the data. This ensures key information is retained, but the spatial variability within the data is smoothed out. This is because it averages over specified spatial windows, and as a result, finer-scale variations that fall below the size of this window are effectively lost or averaged out.  

For large-scale patterns, a coarser grid works. For detailed studies of small-scale phenomena, use a fine grid and perhaps zoom into an area of interest. To achieve this, we can slice the data. Unlike before, where we selected every n-th grid point by index, we now select a specific box by coordinate values using `wind_speed_sliced = wind_speed.sel(lon=slice(lon_min, lon_max), lat=slice(lat_min, lat_max))`.
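
Label-based box selection can be sketched on a small made-up grid. The array `toy` and its 30° by 60° grid are hypothetical. The second part shows one thing to watch for: if the latitudes are stored in descending order, the slice bounds have to follow that order.

```python
import numpy as np
import xarray as xr

# Hypothetical coarse global grid with ascending latitudes
lat = np.arange(-90.0, 91.0, 30.0)
lon = np.arange(-180.0, 180.0, 60.0)
toy = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
)

# Label-based slices include both end points
box = toy.sel(lon=slice(0, 120), lat=slice(-60, 30))
print(box.lat.values)  # latitudes -60, -30, 0, 30
print(box.lon.values)  # longitudes 0, 60, 120

# With descending latitudes, give the bounds in descending order too
toy_desc = toy.sortby("lat", ascending=False)
box_desc = toy_desc.sel(lat=slice(30, -60))
print(box_desc.lat.values)  # latitudes 30, 0, -30, -60
```

Checking `ds.lat.values[:3]` before slicing is a quick way to see which order a file uses.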


In [None]:
#run the cell

# Define the Indian Ocean region: 
# Approximate bounds: longitude 20°E to 120°E, latitude 60°S to 30°N
lon_min, lon_max = 20, 120
lat_min, lat_max = -60, 30

# Slice the wind speed data for the Indian Ocean region
wind_speed_sliced = wind_speed.sel(lon=slice(lon_min, lon_max), lat=slice(lat_min, lat_max))

# Create the figure and axis
fig, ax = plt.subplots(figsize=(12, 7))

# Define the map projection, but zoom in on the Indian Ocean
m = Basemap(projection='cyl', 
            llcrnrlon=lon_min, urcrnrlon=lon_max,  # Set longitude bounds
            llcrnrlat=lat_min, urcrnrlat=lat_max,  # Set latitude bounds
            ax=ax)

# Get longitude and latitude data from the sliced dataset
lon2d, lat2d = np.meshgrid(wind_speed_sliced.lon, wind_speed_sliced.lat)


# Define levels for contouring the wind speed
levels = np.linspace(0, float(wind_speed_sliced.max()), 15)  # .max() skips NaN values by default

# Plot the wind speed data using contourf
cs = m.contourf(lon2d, lat2d, wind_speed_sliced, cmap='viridis', levels=levels)

# Add a colorbar
cbar = m.colorbar(cs, location='right', pad="10%", ticks=np.arange(0, 14, 2))

cbar.set_label(f'Wind Speed ({wind_speed_sliced.units})')

# Use Basemap to fill continents with white
m.fillcontinents(color='white')

# Draw coastlines (only coastlines, no rivers)
m.drawcoastlines()


# Add gridlines (parallels and meridians) for the Indian Ocean region
m.drawparallels(np.arange(lat_min, lat_max + 10, 10), labels=[1, 0, 0, 0], linewidth=0.5, color='gray')
m.drawmeridians(np.arange(lon_min, lon_max + 20, 20), labels=[0, 0, 0, 1], linewidth=0.5, color='gray')

# Get the time of the data (assuming the first time step is relevant) for the title
time = str(ds.coords['time'].values[0])

# Set a dynamic title based on the dataset
plt.title(f"Wind Speed over Indian Ocean at {time[:10]} ({wind_speed_sliced.units})")

# Show the plot
plt.show()


**Extra Exercise: Slice a region in the Northern North Atlantic.**

In [None]:
# Extra Exercise

In [None]:
ds.close()

# Key Learnings:


**Scientific Modules:** You've worked with numpy and xarray, both crucial in handling and analyzing scientific data in Python.

**Variables and Types:** You've interacted with various data types like floats, arrays, and DataArrays in xarray, understanding how to manipulate and process them.

**Operators and Comparisons:** You've used mathematical operations in slicing and transforming data and made comparisons when selecting regions (e.g., slicing wind speed data by latitude and longitude).

**Linear Algebra:** You've dealt with vector data, like the zonal and meridional wind components, and their role in wind vector calculations.

**Scientific Algorithms:** You've calculated means and standard deviations (e.g., resampling or coarsening data) to summarize wind speed data, a fundamental part of statistical analysis.

**Exceptions and Error Handling:** You navigated errors like `ValueError` and projection issues when plotting, which required adjusting your approach to ensure the code executed properly.

**Visualization, Plotting, and Data Organization:** You’ve extensively worked on visualization tasks using matplotlib and Basemap to represent data with wind vector plots and contour maps. You've also made adjustments to plot wind speed and zonal/meridional wind components.

**Data Extraction and Manipulation:** You've sliced and extracted specific regions of datasets, e.g., focusing on the Indian Ocean for wind speed visualization. You’ve also resampled data spatially using xarray operations.

**Spatial Data Resampling:** You’ve applied resampling techniques (coarsening) to reduce the resolution of data for more manageable computations and better visualization.
