# Week 9 group activities

## Introduction to group coding exercises
Today you’ll work on this exercise in the same groups of 3-4 you were assigned last week, submitting a single notebook file at the end of the class period. Decide amongst yourselves which member will upload the completed notebook to Gradescope this week. Make sure that everyone takes a turn being the “Uploader”. _You cannot upload the final code two weeks in a row._

Designate a different group member to be the "Reporter". The Reporter will be in charge of participating in the group discussion at the end of the class session.

### Workflow
Each question will be timed to ensure that everyone gets to work on at least part of every question. Group activities are graded on effort, not on completeness or correctness. Each question is broken into the following stages:
1. Independent work 
2. Group work and discussion on coding question
3. Group work and discussion on reflection questions

You are welcome and encouraged to communicate with other groups and the teaching team when you feel stuck on a problem. 

As a reminder, we will be grading based on best practices in coding. These include:
1) Use variables to store objects.

2) Use meaningful variable names.

3) Format your code consistently.

4) Add comments to document your intention, as well as (less commonly) tricky implementation details.

5) Document help from outside sources, such as other groups or online documentation.

6) Make sure the final notebook runs from start to finish. A good way to check this is to restart the kernel and run all the cells in order, watching for any errors.

### Storing your answers
In the code cells where you will write your answers, there will be comments denoting:

"**# your code**"

and 

"**# answer variables**"

You may store any intermediary variables in the **your code** section. If you do not have any intermediary variables, you can store your answer directly in the answer variables.

### Required Plot Elements for Figures
This assignment requires you to create and design figures using `matplotlib`. To build good plotting habits, each figure must include the following to receive full points:
1) Concise, descriptive title for each figure/subplot
2) Axis labels with units (when possible)
3) Appropriate axis limits (minimum and maximum)
4) Appropriate tick resolution
5) Legend when using different datasets 
6) Appropriate font size (a good range is 12-15)
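
As an illustration, here is a minimal sketch of where each required element goes in `matplotlib`. The sine and cosine curves, labels, and units are placeholders invented for this example, not part of the assignment.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data: two toy "datasets" so that a legend is needed
x = np.linspace(0, 10, 50)
y_a = np.sin(x)
y_b = np.cos(x)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, y_a, color="tab:blue", label="dataset A")
ax.plot(x, y_b, color="tab:orange", linestyle="--", label="dataset B")

ax.set_title("Two toy datasets", fontsize=15)   # 1) concise, descriptive title
ax.set_xlabel("Time (s)", fontsize=13)          # 2) axis labels with units
ax.set_ylabel("Amplitude (m)", fontsize=13)
ax.set_xlim(0, 10)                              # 3) axis limits
ax.set_ylim(-1.1, 1.1)
ax.set_xticks(np.arange(0, 11, 2))              # 4) tick resolution: one tick every 2 s
ax.legend(fontsize=12)                          # 5) legend for the two datasets
ax.tick_params(labelsize=12)                    # 6) readable tick-label font size

plt.show()
```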

## List each of your group members here **and in the Gradescope submission**:
1.
2. 
3.

## Question #1: Global temperature anomalies

The figure below shows CO<sub>2</sub> concentration over the past three centuries. We will compare this trend in CO<sub>2</sub> with measured global temperature anomalies stored in "NASA_GISS_global_temp.csv", acquired from 
[NASA's Goddard Institute for Space Studies](https://data.giss.nasa.gov/gistemp/). Today you will read this data using Pandas, plot it, and model it using linear regression.

![co2-photo.png](https://github.com/OCEAN-215-2025/week9_group_activities/blob/main/img/CO2_concentration.png?raw=true)

*Image: Atmospheric CO2 from 1700 to 2020. Source: [Scripps/UCSD](https://keelingcurve.ucsd.edu/).*

<!-- BEGIN QUESTION -->



In [None]:
# import packages you need for question 1 here


<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

## Part a. Plot the observed data (20 minutes)

1) Set the `filepath` variable to a string containing the path to the "NASA_GISS_global_temp.csv" file. 
Use Pandas to load the file into a DataFrame stored in the `global_temp` variable. Use the `index_col` argument to set the `Year` column as the DataFrame index.

2) Make a plot of the 1880-2019 temperature anomaly time series, with the year (now the index) on the x-axis and the "Temp_anomaly" column on the y-axis. Include the following:

> a) Markers for each data point
>
> b) Grid
>
> c) All required plot elements listed in the "Required Plot Elements for Figures" section above
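
The two steps above can be sketched on a small stand-in table. This sketch assumes the file has `Year` and `Temp_anomaly` columns, as the instructions state, and uses `io.StringIO` in place of a real file path; the numbers are invented, not NASA GISS values.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Invented stand-in for the CSV file (NOT real GISS data), with the same column names
toy_csv = io.StringIO(
    "Year,Temp_anomaly\n"
    "2000,0.40\n"
    "2001,0.54\n"
    "2002,0.63\n"
    "2003,0.62\n"
)

# index_col="Year" turns the Year column into the DataFrame index
toy_temp = pd.read_csv(toy_csv, index_col="Year")

fig = plt.figure(figsize=(8, 4))
ax = fig.add_subplot()

# index (years) on x, anomaly column on y, with a circle marker at every data point
ax.plot(toy_temp.index, toy_temp["Temp_anomaly"], marker="o")
ax.grid(True)
ax.set_title("Toy temperature anomaly", fontsize=14)
ax.set_xlabel("Year", fontsize=12)
ax.set_ylabel("Temperature anomaly (°C)", fontsize=12)

plt.show()
```

With a real file on disk, the same `pd.read_csv(..., index_col="Year")` call takes the path string instead of the `StringIO` object.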

In [None]:
# string for your file path
filepath = ...

# load data
global_temp = ...

# make line plot
fig = plt.figure()
ax = fig.add_subplot()

plt.show()

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

## Part b. Apply linear fit (20 minutes)

1) Use the `linregress()` function from `scipy.stats` to calculate a linear regression on the entire 1880-2019 time series, treating year as the independent variable x and the temperature anomaly as the dependent variable y. Store the output in the `regress` variable. 

2) Print out the value of R<sup>2</sup> and the slope of the regression (with units, if applicable).

3) Calculate the values of the temperature anomaly _as predicted by the linear fit_. Recall that in linear regression we have $y = m x + b$, where $m$ is the regression slope and $b$ is the regression intercept.

4) Make a new plot that adds your linear regression fit to the plot from part (a). Make sure that you:

>i) use a different line style and color for the observed data and the regression line
>
>ii) include a legend
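
A minimal sketch of steps 1 to 4 on an invented series (not the GISS record), assuming the anomaly is in °C so that the slope is in °C per year. It relies on `scipy.stats.linregress` returning the correlation coefficient `rvalue`, so R<sup>2</sup> is its square.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import linregress

# Invented toy series (NOT the GISS record): a 0.01 per-year trend plus fixed "noise"
years = np.arange(1900, 1910)
noise = np.array([0.02, -0.01, 0.03, -0.02, 0.0, 0.01, -0.03, 0.02, 0.0, -0.01])
anomaly = 0.01 * (years - 1900) - 0.2 + noise

# linregress(x, y) returns an object with slope, intercept, rvalue, pvalue, stderr
regress = linregress(years, anomaly)

# R^2 is the square of the correlation coefficient rvalue
r_squared = regress.rvalue ** 2
print(f"R^2 = {r_squared:.3f}")
print(f"slope = {regress.slope:.4f} °C per year")

# values predicted by the fit: y = m x + b
anomaly_fit = regress.slope * years + regress.intercept

# observed data and fit drawn with different line styles and colors, plus a legend
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(years, anomaly, color="tab:blue", marker="o", linestyle="-", label="toy observations")
ax.plot(years, anomaly_fit, color="tab:red", linestyle="--", label="linear fit")
ax.set_xlabel("Year", fontsize=12)
ax.set_ylabel("Temperature anomaly (°C)", fontsize=12)
ax.legend(fontsize=12)
plt.show()
```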

In [None]:
# calculate linear regression 

# print the value of R^2 and slope

# find the temperature anomaly predicted by the fit

# paste the plot from part (a) below
# add the best-fit line and legend


<!-- END QUESTION -->

## Q.1 Reflection questions (5 minutes)

The purpose of the reflection is to inform us as instructors about students' comfort level with course content. We use these answers to inform how we spend class time and design coursework in subsequent weeks. This question is graded for completeness, so please answer each question in the text box below. Be concise in your answers (max. 2 sentences). 

1) What do you feel you excelled at in this exercise? Why?

2) What did you struggle with most in the exercise? Why?

3) Is there any section of the question that you did not complete? If so, briefly describe why and the section you spent the most time on. 

4) Is there any topic you feel we need to revisit or review in class? Why?

1)

2)

3)

4)

## Question #2: Chlorophyll from Mercator-Pisces Biogeochemistry Model

In this question we will be accessing gridded data from a European global biogeochemical model.

"Produced by Mercator Ocean in Toulouse, France, is a global Ocean Biogeochemical analysis product at 1/2°. It is providing a 7-days mean global forecast updated weekly as well as 3D global ocean biogeochemical weekly mean analysis for the past 2 years updated every week. This product includes weekly mean files of dissolved iron, **nitrate**, phosphate, silicate, dissolved oxygen, **chlorophyll concentration**, phytoplankton concentration and primary production parameters from the top to the bottom of the Global Ocean on a 1/2° regular grid projection interpolated from the 1/4° ARAKAWA-C native grid. Vertical coverage includes 50 levels ranging from 0 to 5500 meters." 

[Source](http://marine.copernicus.eu)

<!-- BEGIN QUESTION -->



In [None]:
# import packages you need for question 2 here


<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

## Part a. Loading and transforming the data (30 minutes)

The particular data we use was downloaded from Simons CMAP and placed onto Google Drive. The Google Drive link is [https://drive.google.com/file/d/19Vkvo1AF8JEKpXS80Gh-WVCbqkQivcno/view?usp=drive_link](https://drive.google.com/file/d/19Vkvo1AF8JEKpXS80Gh-WVCbqkQivcno/view?usp=drive_link)

1. Copy the file from Google Drive to your JupyterHub file system. The file should be named "Mercator_nitrate_chlorophyll.csv" and placed inside a "data" folder in the same directory as your Jupyter notebook. You may do this manually or with `gdown`.

2. Read the CSV file into a pandas DataFrame. Display the result.

3. From step 2 you should find that the dataset has 2 dependent variables, CHL and NO3, and 4 independent variables: time, lat, lon, and depth. Now use `xr.Dataset.from_dataframe()` to convert the data into an xarray Dataset in which CHL and NO3 are data variables and time, lat, lon, and depth are coordinates and dimensions. (_Hint_: you will probably need to use the `.set_index()` method on the DataFrame before passing it as an argument to `xr.Dataset.from_dataframe()`.)

4. Now display the resulting xarray Dataset and answer the following in the provided markdown cell:

> i) What is the resolution (spacing) in latitude and longitude of the dataset?
> 
> ii) How many distinct time points are there in the dataset? How many distinct depth values are there?

5. Extract the DataArray associated with the chlorophyll concentration (CHL) from the dataset, and reduce its dimensions by selecting the depth closest to the surface and averaging over all time points. Assign the resulting DataArray to the variable `chl_data`.

_Type your answer here, replacing this text._
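
Steps 3 to 5 can be sketched on a tiny invented table that uses the same column names (random values, not Mercator output). The sketch assumes depth is positive downward, so the smallest depth is the one closest to the surface.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Invented "long" table with the column names the exercise describes
# (random values, NOT Mercator output): one row per (time, lat, lon, depth)
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product(
    [
        pd.to_datetime(["2020-01-01", "2020-01-08"]),  # time
        [-0.25, 0.25],                                 # lat
        [10.25, 10.75, 11.25],                         # lon
        [0.5, 1.5],                                    # depth
    ],
    names=["time", "lat", "lon", "depth"],
)
toy_df = pd.DataFrame(
    {
        "CHL": rng.uniform(0.05, 2.0, len(index)),
        "NO3": rng.uniform(0.0, 10.0, len(index)),
    },
    index=index,
).reset_index()  # flat columns, like the output of pd.read_csv

# set_index builds a MultiIndex from the four coordinate columns;
# from_dataframe unstacks each index level into a dimension
toy_ds = xr.Dataset.from_dataframe(toy_df.set_index(["time", "lat", "lon", "depth"]))
print(toy_ds)

# grid spacing and number of distinct values along each dimension
print("lat spacing:", np.diff(toy_ds["lat"].values))
print("sizes:", dict(toy_ds.sizes))

# shallowest depth (assumed positive downward), then the mean over time
surface_depth = float(toy_ds["depth"].min())
toy_chl = toy_ds["CHL"].sel(depth=surface_depth).mean(dim="time")
```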

In [None]:
# step 1: download the data

# step 2: read the csv file into pandas dataframe

# step 3: convert the dataframe to xarray dataset

# step 4: inspect the xarray dataset

# step 5: variable extraction and dimension reduction


<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

## Part b. Make a Chlorophyll map (20 minutes)

1) From your `chl_data` data array, assign the latitude, longitude, and chl values to the `lat`, `lon`, and `chl` variables, respectively. 

2) Using matplotlib and cartopy, make a global map of chlorophyll using the [pcolormesh](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.pcolormesh.html) function.

In addition to the required plot elements, your map should:

>i) Use the Robinson projection.
>
>ii) Use a reasonable colormap (consult both the [matplotlib documentation](https://matplotlib.org/stable/users/explain/colors/colormaps.html) and the [cmocean documentation](https://matplotlib.org/cmocean/)).
>
>iii) Display coastlines in black and land features in 'lemonchiffon'.
>
>iv) Include a colorbar with appropriate color limits and a label with units (mg/m<sup>3</sup>).
>
>v) Include latitude and longitude gridlines.

_HINT_: Try setting the "norm" argument in `pcolormesh()` to a [LogNorm](https://matplotlib.org/stable/api/_as_gen/matplotlib.colors.LogNorm.html) object to better distinguish between the colors! (use `from matplotlib.colors import LogNorm` to make `LogNorm()` available to you)

_HINT_: To get gridlines to show, you will need to plot them **after** you plot your data with `pcolormesh()`!
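
The `LogNorm` hint and step 1 can be illustrated with plain matplotlib axes on an invented field (not model output) that spans about 0.05 to 10. On a cartopy `GeoAxes`, the same `pcolormesh` call would additionally need `transform=ccrs.PlateCarree()` so the lat/lon data are placed correctly in the Robinson projection; that part is not shown here.

```python
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from matplotlib.colors import LogNorm

# Invented 2-D field spanning roughly 0.05 to 10 (NOT model output), stored as a DataArray
lon_vals = np.arange(-175.0, 180.0, 10.0)
lat_vals = np.arange(-85.0, 90.0, 10.0)
lon2d, lat2d = np.meshgrid(lon_vals, lat_vals)
field = 10 ** (-1.5 + 2.5 * np.abs(np.sin(np.deg2rad(lat2d))))
toy_chl = xr.DataArray(field, coords={"lat": lat_vals, "lon": lon_vals}, dims=("lat", "lon"))

# pull out plain NumPy arrays, as in step 1
lat = toy_chl["lat"].values
lon = toy_chl["lon"].values
chl = toy_chl.values

fig, ax = plt.subplots(figsize=(9, 4))
# LogNorm spaces colors logarithmically: 0.1-1 gets as much of the colormap as 1-10
mesh = ax.pcolormesh(lon, lat, chl, norm=LogNorm(vmin=0.03, vmax=10), cmap="viridis", shading="auto")
cbar = fig.colorbar(mesh, ax=ax)
cbar.set_label("Chlorophyll (mg/m$^3$)", fontsize=12)
ax.set_title("Toy field on a logarithmic color scale", fontsize=14)
ax.set_xlabel("Longitude (°E)", fontsize=12)
ax.set_ylabel("Latitude (°N)", fontsize=12)
plt.show()
```

When the color limits are set inside `LogNorm(vmin=..., vmax=...)`, they should not also be passed to `pcolormesh()` directly.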

In [None]:
# step 1: extract lat, lon, and chl arrays

# step 2: make cartopy plot


<!-- END QUESTION -->

## Q.2 Reflection questions (5 minutes)
The purpose of the reflection is to inform us as instructors about students' comfort level with course content. We use these answers to inform how we spend class time and design coursework in subsequent weeks. This question is graded for completeness, so please answer each question in the text box below. Be concise in your answers (max. 2 sentences). 

1) What do you feel you excelled at in this exercise? Why?

2) What did you struggle with most in the exercise? Why?

3) Is there any section of the question that you did not complete? If so, briefly describe why and the section you spent the most time on. 

4) Is there any topic you feel we need to revisit or review in class? Why?

1)

2)

3)

4)

## Check that your codes run without errors

Please check that your code runs without error when the notebook is executed from top to bottom. You may find the "Restart the kernel and run all cells" option (selectable using the `⏩` icon) useful.