<img style="float: left;" src="earth-lab-logo-rgb.png" width="150" height="150" />

# Earth Analytics Education - Climate 101 Workshop

## Interactive Data Activity

This notebook is an interactive activity that demonstrates how to use **Python** to work with climate data! We will use packages such as **xarray** to open climate data, subset it, and turn it into a meaningful graph.

In [None]:
# Import packages
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import seaborn as sns

# Plotting options
sns.set(font_scale=1.3)
sns.set_style("white")

## Challenge 1 - Select Your Data

The data we are using is described as a "Monthly aggregation of downscaled daily meteorological data of Monthly Precipitation Amount from College of Global Change and Earth System Science, Beijing Normal University". In short, the data is a monthly summary of several meteorological variables, such as precipitation, air temperature, and more. The data also includes climate model projections of how these variables are likely to change in the future.

Below, you will assign three variables to choose which data you want to work with in this notebook. 

`model = ` selects one of 20 climate models, each of which simulates how the climate may change going into the future. The options are listed after `model_name = ` in the second cell below this one. To pick a model, set `model = ` to a number between 0 and 19, where 0 is the first option in the list and 19 is the last.

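`var = ` selects the variable in the dataset you want to analyze. The options are listed after `var_long_name = ` in the second cell below this one, with the matching short names listed after `variable_name = `. The long names describe each variable; for example, `air_temperature` is the monthly aggregate air temperature. Note that two options share each of the names `air_temperature` and `relative_humidity`: option 0 (`tasmax`) is the maximum air temperature and option 1 (`tasmin`) is the minimum, and likewise option 2 (`rhsmax`) is the maximum relative humidity and option 3 (`rhsmin`) is the minimum. There are 9 variables; to pick one, set `var = ` to a number between 0 and 8, where 0 is the first option in the list and 8 is the last.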

Lastly, `scenario = ` picks which climate scenario to pull your data from. `0` is the `historical` scenario, which covers past years (1950-2005) rather than projecting into the future. `1` is the `rcp45` scenario, an intermediate emissions scenario. `2` is the `rcp85` scenario, a high-emissions scenario often described as the worst case.
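Because options 0 and 1 (and 2 and 3) share the same long name, it can help to print each index next to both names before choosing `var`. The sketch below copies the two tuples from the notebook's option cell so that it runs on its own.

```python
# Copies of the notebook's `variable_name` and `var_long_name` tuples
variable_name = ('tasmax', 'tasmin', 'rhsmax', 'rhsmin', 'pr',
                 'rsds', 'uas', 'vas', 'huss')
var_long_name = ('air_temperature', 'air_temperature',
                 'relative_humidity', 'relative_humidity',
                 'precipitation',
                 'surface_downwelling_shortwave_flux_in_air',
                 'eastward_wind', 'northward_wind', 'specific_humidity')

# Print the number you would assign to `var` next to both names
for index, (short_name, long_name) in enumerate(zip(variable_name, var_long_name)):
    print(index, short_name, long_name)
```

The first line printed is `0 tasmax air_temperature`, so the short name is what tells the two temperature options apart.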

In [None]:
# Model options: 0-19 (index into model_name, two cells below)
model = 2
# Variable options: 0-8 (index into variable_name / var_long_name, two cells below)
var = 4
# Scenario options: 0-2 (0 = historical, 1 = rcp45, 2 = rcp85)
scenario = 1

In [None]:
dir_path = 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/'

# These are the variable options for the met data
variable_name = ('tasmax',
                 'tasmin',
                 'rhsmax',
                 'rhsmin',
                 'pr',
                 'rsds',
                 'uas',
                 'vas',
                 'huss')

# These are var options in long form
var_long_name = ('air_temperature',
                 'air_temperature',
                 'relative_humidity',
                 'relative_humidity',
                 'precipitation',
                 'surface_downwelling_shortwave_flux_in_air',
                 'eastward_wind',
                 'northward_wind',
                 'specific_humidity')

# Models to choose from
model_name = ('bcc-csm1-1',
              'bcc-csm1-1-m',
              'BNU-ESM',
              'CanESM2',
              'CCSM4',
              'CNRM-CM5',
              'CSIRO-Mk3-6-0',
              'GFDL-ESM2G',
              'GFDL-ESM2M',
              'HadGEM2-CC365',
              'HadGEM2-ES365',
              'inmcm4',
              'IPSL-CM5A-MR',
              'IPSL-CM5A-LR',
              'IPSL-CM5B-LR',
              'MIROC5',
              'MIROC-ESM',
              'MIROC-ESM-CHEM',
              'MRI-CGCM3',
              'NorESM1-M')

# Scenarios
scenario_type = ('historical', 'rcp45', 'rcp85')

# Start and end years for each scenario (historical vs. projected)
year_start = ('1950', '2006', '2006')
year_end = ('2005', '2099', '2099')
run_num = [1] * 20
run_num[4] = 6  # setting CCSM4 with run 6
domain = 'CONUS'

In [None]:
time = year_start[scenario] + '_' + year_end[scenario]
print("\u2705 Your selected time period is:", time)

In [None]:
# This is only going to provide monthly data
file_name = ('agg_macav2metdata_' +
             str(variable_name[var]) +
             '_' +
             str(model_name[model]) +
             '_r' +
             str(run_num[model])+'i1p1_' +
             str(scenario_type[scenario]) +
             '_' +
             time + '_' +
             domain + '_monthly.nc')

print("\u2705 You are accessing:\n", file_name, "\n data in netcdf format")
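The concatenation above can also be written as a single f-string, which makes the file-name pattern easier to read. This is a minimal sketch using a hypothetical helper, `build_maca_file_name`, that is not part of any library; it only reproduces the pattern the cell above builds.

```python
def build_maca_file_name(var_short, model, run, scenario, years, domain='CONUS'):
    """Assemble a monthly file name with the same pattern as the notebook cell.

    `build_maca_file_name` is a hypothetical helper for illustration only.
    """
    return (f"agg_macav2metdata_{var_short}_{model}_r{run}i1p1_"
            f"{scenario}_{years}_{domain}_monthly.nc")

# Same choices as the default cells: var=4 ('pr'), model=2 ('BNU-ESM'), scenario=1 ('rcp45')
name = build_maca_file_name('pr', 'BNU-ESM', 1, 'rcp45', '2006_2099')
print(name)  # agg_macav2metdata_pr_BNU-ESM_r1i1p1_rcp45_2006_2099_CONUS_monthly.nc
```

In the notebook you would pass `variable_name[var]`, `model_name[model]`, `run_num[model]`, `scenario_type[scenario]`, and `time` instead of the literal values.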

In [None]:
full_file_path = dir_path + file_name
full_file_path

## Challenge 2 - Run the Cell Below to Open Your Data

Run the cell below to open your dataset.


In [None]:
# Open the data. The dataset is kept open (not wrapped in a `with` block)
# because later cells still read values from it.
max_var_xr = xr.open_dataset(full_file_path)

# View xarray object
max_var_xr

## Challenge 3 - Subset Your Data

Currently, the dataset covers the entire contiguous United States (`CONUS`) for every month in your scenario's time period, which is far more data than you need. You can fix this by subsetting the data! There are two ways to subset it: spatially and temporally.

To subset the data spatially, you will select the data at a single point in the xarray Dataset. Below, assign new numbers to `latitude` and `longitude` to pick your point. The data's latitude values range from about 25 to 50 degrees north, and its longitude values range from about 235 to 292 degrees east. These longitudes use a 0-360 scale, so 270 is the same as 90 degrees west. Pick values within those ranges.
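To translate the 0-360 longitudes into the more familiar negative values for the western hemisphere, the map cell later subtracts 360. A slightly more general conversion, sketched here as a hypothetical helper (not part of xarray or cartopy), also works for longitudes below 180.

```python
def to_signed_longitude(lon_east):
    """Convert a 0-360 degrees-east longitude to the -180..180 convention.

    Hypothetical helper for illustration only.
    """
    return ((lon_east + 180) % 360) - 180

# The edges of the data's longitude range plus the default point
for lon in (235, 270, 292):
    print(lon, "->", to_signed_longitude(lon))
```

This prints `235 -> -125`, `270 -> -90`, and `292 -> -68`, so the data spans roughly 125 to 68 degrees west.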

To subset the data temporally, pick a start date and an end date to trim the data to. Below, assign new values to `time_start` and `time_end`, keeping them inside the quotes provided and in the format `'yyyy-mm'`. The years available depend on the scenario you chose above, so pick dates within that scenario's range!

|Scenario number|Scenario name|Date range|
|-------|-------|-----------|
|0|historical|1950-2005|
|1|rcp45|2006-2099|
|2|rcp85|2006-2099|
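Before running the subsetting cell, you can check that your dates fit the scenario you picked. This is a minimal sketch using a hypothetical helper, `dates_fit_scenario`; it assumes the `'yyyy-mm'` format and only compares years, so it will not catch a start month that comes after the end month within the same year.

```python
# Scenario year ranges copied from the table above
scenario_years = {0: (1950, 2005), 1: (2006, 2099), 2: (2006, 2099)}

def dates_fit_scenario(scenario, time_start, time_end):
    """Return True if both 'yyyy-mm' dates fall inside the scenario's years
    and the start year is not after the end year (hypothetical helper)."""
    first_year, last_year = scenario_years[scenario]
    start_year = int(time_start[:4])
    end_year = int(time_end[:4])
    return first_year <= start_year <= end_year <= last_year

print(dates_fit_scenario(1, '2008-01', '2012-09'))  # True
print(dates_fit_scenario(0, '2008-01', '2012-09'))  # False
```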

In [None]:
# Select the latitude, longitude, and timeframe to subset the data to

# Ensure your latitude value is between 25 and 50, and your longitude value is between 235 and 292
latitude = 35
longitude = 270
time_start = '2008-01'
time_end = '2012-09'

In [None]:
# Select the grid point nearest to the latitude and longitude you entered
max_var_point = max_var_xr[var_long_name[var]].sel(
    lat=latitude, lon=longitude, method='nearest')

# Slicing the data to the timeframe requested
max_var_point = max_var_point.sel(time=slice(time_start, time_end))
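The `method` argument of `.sel` decides which grid cell you get when your latitude or longitude does not fall exactly on the grid. `method='nearest'` picks the closest coordinate, while `method='pad'` (also spelled `'ffill'`) picks the last coordinate at or before the requested value. The plain-Python sketch below mimics both rules on a made-up, ascending 0.5-degree grid; it illustrates the two rules and is not how xarray implements them.

```python
import bisect

def pad_lookup(coords, target):
    """Mimic method='pad': the last coordinate at or below target (ascending coords)."""
    i = bisect.bisect_right(coords, target) - 1
    if i < 0:
        raise KeyError(f"{target} is below the first coordinate {coords[0]}")
    return coords[i]

def nearest_lookup(coords, target):
    """Mimic method='nearest': the coordinate closest to target (ascending coords)."""
    i = bisect.bisect_left(coords, target)
    # Only the neighbors on either side of the insertion point can be closest
    neighbors = coords[max(i - 1, 0):i + 1]
    return min(neighbors, key=lambda c: abs(c - target))

# Made-up ascending latitude grid with 0.5-degree spacing (illustration only)
lats = [34.0, 34.5, 35.0, 35.5]

print(pad_lookup(lats, 34.9))      # 34.5
print(nearest_lookup(lats, 34.9))  # 35.0
```

For a requested latitude of 34.9, `'pad'` lands on a cell almost half a grid step away, while `'nearest'` returns the cell that actually contains the point.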

Below is a map showing the latitude and longitude you selected, which is where the data for the rest of the notebook will be pulled from!

In [None]:
extent = [-120, -70, 24, 50.5]
central_lon = np.mean(extent[:2])
central_lat = np.mean(extent[2:])

f, ax = plt.subplots(figsize=(12, 6),
                     subplot_kw={'projection': ccrs.AlbersEqualArea(central_lon, central_lat)})
# The extent is in lon/lat degrees, so declare it in PlateCarree coordinates
ax.set_extent(extent, crs=ccrs.PlateCarree())

# Add land polygons and coastlines for context
ax.add_feature(cfeature.LAND, edgecolor='black')
ax.coastlines()

# Plot a star at the selected point; subtract 360 to convert 0-360 longitude to degrees west
ax.plot(longitude - 360, latitude, marker='*', color='purple', markersize=20,
        transform=ccrs.PlateCarree())
ax.set(title="Location of the Lat / Lon Used to Slice Your netCDF Climate Data File")

ax.gridlines()
plt.show()

## Challenge 4 - Modify Your Plot

With the newly subset data now a more reasonable size, you can plot it! The cell below plots a line showing how the variable you selected at the top changes over time.

You can modify a few aspects of the plot to make it even better. First, you can change the title, x-axis label, and y-axis label by editing the code shown here:

```
ax.set(title="Modify this text to change the title!",
       xlabel="Modify this text to change the x axis label!",
       ylabel="Modify this text to change the y axis label!")
```

When you change the title or axis labels, keep the new text inside the quotes that are already there.
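Instead of typing the title by hand, you could build it from the selections you made earlier with an f-string. This is a sketch with example values (the notebook's defaults); in the notebook, `var_label` would be `var_long_name[var]` and you would pass `title=title` to `ax.set`.

```python
# Example selections (the notebook's defaults); swap in your own values
var_label = 'precipitation'
latitude = 35
longitude = 270
time_start = '2008-01'
time_end = '2012-09'

# Build a title from the selections; subtracting 360 shows degrees west as negative
title = (f"Monthly {var_label.replace('_', ' ')} at "
         f"{latitude}, {longitude - 360} ({time_start} to {time_end})")
print(title)  # Monthly precipitation at 35, -90 (2008-01 to 2012-09)
```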

You can also change the colors of the plot by changing the color names listed after `color=`, `markerfacecolor=`, and `markeredgecolor=`. Try colors you think fit the plot better and see what changes! Keep each new color inside the quotes provided.
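Matplotlib accepts several kinds of color strings, including named colors, `tab:` palette names, hex codes, and grayscale values between `'0'` and `'1'`. If you are unsure whether a color will work, Matplotlib's `matplotlib.colors.is_color_like` function can check it before you plot. The candidate colors below are just examples.

```python
from matplotlib.colors import is_color_like

# Example values you might try for color=, markerfacecolor=, or markeredgecolor=
candidates = ["orange", "tab:blue", "#2a9d8f", "0.5", "orangish"]

# Report which strings Matplotlib would accept as a color
for color in candidates:
    print(f"{color!r}: {is_color_like(color)}")
```

Only `'orangish'` is rejected; the other four are valid ways to write a color.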

In [None]:
# Plotting the subset data
fig, ax = plt.subplots(figsize=(12, 6))
max_var_point.plot.line(ax=ax,
                        marker="o",
                        # Change the line color
                        color="orange",
                        # Change both arguments below to change the color of the markers
                        markerfacecolor="black",
                        markeredgecolor="black")

# Change the values below to match the data you selected
ax.set(title="Modify this text to change the title!",
       xlabel="Modify this text to change the x axis label!",
       ylabel="Modify this text to change the y axis label!")

plt.show()

## Challenge 5 - Export Your Data to a CSV File

This subset data is worth sharing! Below you will export the data to a `.csv` file. 

In [None]:
# Convert your data to a pandas DataFrame so it can be exported
max_var_point_df = max_var_point.to_dataframe()
max_var_point_df.head()

In [None]:
# Creating a file name based on the variables you chose earlier!
# The name should be the variable you chose, and then the start and end date of the subset
file_name = var_long_name[var] + "-" + time_start + "-" + time_end + ".csv"
file_name
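The file name above records only the variable and dates, so exporting the same variable and dates from a different model or scenario would overwrite the earlier file. One option, sketched here with a hypothetical helper, is to add the model and scenario names to the file name.

```python
def make_csv_name(var_label, model, scenario, time_start, time_end):
    """Hypothetical helper: include model and scenario so exports don't overwrite each other."""
    return f"{var_label}-{model}-{scenario}-{time_start}-{time_end}.csv"

name = make_csv_name('precipitation', 'BNU-ESM', 'rcp45', '2008-01', '2012-09')
print(name)  # precipitation-BNU-ESM-rcp45-2008-01-2012-09.csv
```

In the notebook you would pass `var_long_name[var]`, `model_name[model]`, `scenario_type[scenario]`, `time_start`, and `time_end`.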

In [None]:
# Export to a csv file to share with your friends!
max_var_point_df.to_csv(file_name)