# Week 6 group activities

In [None]:
# import the required packages once and for all
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Introduction to group coding exercises
Today you’ll work on this exercise in the same groups of 3-4 you were assigned last week, submitting a single notebook file at the end of the class period. Decide amongst yourselves which member will upload the completed notebook to Gradescope this week. Make sure that everyone takes a turn being the “Uploader”. _You cannot upload the final code two weeks in a row._

Designate a different group member to be the "Reporter". The Reporter will be in charge of participating in the group discussion at the end of the class session.

### Workflow
Each question will be timed to ensure that everyone gets to work on at least a part of every question. Group activities are not graded by completeness or correctness, but by effort. We will be breaking down each question in the following order:  
1. Independent work 
2. Group work and discussion on coding question
3. Group work and discussion on reflection questions

You are welcome and encouraged to communicate with other groups and the teaching team when you feel stuck on a problem. 

As a reminder, we will be grading based on best practices in coding. These include: 
1) Use variables to store objects.

2) Give your variables meaningful names.

3) Format your code consistently.

4) Add comments to document your intent and, less commonly, any tricky implementation.

5) Document help from outside sources, such as other groups or online documentation.

6) Make sure the final notebook runs fully from start to finish. A good way to check this is to restart the kernel and run all the cells, checking for any errors.

### Storing your answers
In the code cells where you will write your answers, there will be comments denoting:

"**# your code**"

and 

"**# answer variables**"

You may store any intermediary variables in the **your code** section. If you do not have any intermediary variables, you can also store your answer directly in the answer variables.

### Required Plot Elements for Figures
This assignment requires you to create and design figures using `matplotlib`. To build good plotting habits, each figure must include the following to receive full points:
1) Concise, descriptive title for each figure/subplot
2) Axis labels with units (when possible)
3) Appropriate axis limits (minimum and maximum)
4) Appropriate tick resolution
5) Legend when using different datasets 
6) Appropriate font size (a good range is 12-15)

## List each of your group members here **and in the Gradescope submission**:
1.
2. 
3.

## Question #1. Practice with `pandas`: Ballard Locks salmon counts (50 minutes)

<!-- BEGIN QUESTION -->

## Part a. Loading and inspecting data (15 minutes)

Here are some local salmon data we can load and visualize using pandas! Let's first get an overview of the data by practicing some dataframe functions.

We will load this data from the URL stored in the `filepath` string in the starter code cell below. Instead of downloading the data itself into the JupyterHub file system, we will practice pulling data directly into a pandas dataframe. 

### Instructions

1. Load the salmon data from the URL using the `pd.read_csv()` function and store the result in the `salmon_df` variable. Follow the steps below to supply additional arguments to `pd.read_csv()` while loading the file:

> * Specify that the first (positional index 0) column (the dates) should be the index.
>
> * Also specify that Pandas should parse the index as dates (datetimes).
>
> * Consult the documentation on [https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) to find the arguments that specify these two things.

2. Display the data.

3. Use `.describe()` to view the summary statistics.

4. Answer the following questions with your group in a comment or a markdown cell below your code:

>i. How many salmon species are counted?
>
>ii. What are the mean daily counts for each species?
>
>iii. What are the maximum daily counts for each species?
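
The sketch below shows one way to read the table that `.describe()` returns. It uses a small made-up DataFrame rather than the salmon data; the names `rain_df`, `station_a`, and `station_b` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical stand-in data: daily rainfall (mm) at two made-up stations
dates = pd.date_range("2021-01-01", periods=4, freq="D")
rain_df = pd.DataFrame(
    {"station_a": [0.0, 3.0, 1.5, 1.5], "station_b": [1.0, 0.0, 2.0, 5.0]},
    index=dates,
)

# .describe() returns a new DataFrame: one column per data column,
# one row per statistic (count, mean, std, min, quartiles, max)
summary_df = rain_df.describe()
print(summary_df)

# Individual statistics can be pulled out by row label
mean_by_station = summary_df.loc["mean"]
max_by_station = summary_df.loc["max"]
print(mean_by_station)
print(max_by_station)
```

The number of data columns in the summary tells you how many variables were measured, and the `mean` and `max` rows hold one value per column.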

In [None]:
# starter code cell, DO NOT CHANGE THIS CODE!
filepath = 'https://raw.githubusercontent.com/OCEAN-215-2025/week6_group_activities/refs/heads/main/data/ballard_salmon_counts.csv'

In [None]:
# load your csv file from the url and store in here

# display the DataFrame and its statistical summary


_Answer step 4 in this markdown cell_

i. ...

ii. ...

iii. ...

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

## Part b: **Plot the data!** (20 minutes)

### Instructions

1. Use `ax.plot()` to make line plots of each of the three species' counts over time. In other words, the x-values should be datetimes from the index and the y-values should be daily salmon counts for each species. Your plot should have 3 different lines.
2. Use the following color for each species' line on your plot:
> * Chinook: 'forestgreen'
>
> * Coho: 'darkcyan'
>
> * Sockeye: 'salmon'
>


3. Label your plot axes and add a title.
4. Add a grid to your plot.
5. Add a legend.
6. Double-check that your plot contains all the required elements listed in the "Required Plot Elements for Figures" section above.
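
As a rough illustration of the required plot elements, here is a minimal sketch that draws two lines from a made-up DataFrame with a datetime index. The data, the column names `surface` and `bottom`, and the colors are all hypothetical; the axis limits and font sizes are just one reasonable choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data: 60 days of made-up water temperatures
dates = pd.date_range("2021-01-01", periods=60, freq="D")
days = np.arange(60)
temp_df = pd.DataFrame(
    {"surface": 10 + 3 * np.sin(days / 10), "bottom": 6 + np.cos(days / 10)},
    index=dates,
)

fig = plt.figure(figsize=(8, 4))
ax = fig.add_subplot()

# One ax.plot() call per line: x is the datetime index, y is one column
ax.plot(temp_df.index, temp_df["surface"], color="tomato", label="Surface")
ax.plot(temp_df.index, temp_df["bottom"], color="navy", label="Bottom")

# Title, axis labels with units, limits, tick size, grid, and legend
ax.set_title("Made-up daily water temperature", fontsize=14)
ax.set_xlabel("Date", fontsize=12)
ax.set_ylabel("Temperature (°C)", fontsize=12)
ax.set_ylim(0, 15)
ax.tick_params(labelsize=12)
ax.grid(True)
ax.legend(fontsize=12)
fig.autofmt_xdate()  # tilt the date tick labels so they do not overlap
```

In a notebook the figure is drawn when the cell finishes; when running as a script, a call to `plt.show()` at the end displays it.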

In [None]:
# your plot here
fig = plt.figure()
ax = fig.add_subplot()

plt.show()

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

## Part c: **Slicing data to discover information** (20 minutes)

1. Use indexing to find out how many coho salmon passed through Ballard Locks on September 30, 2020, the day that Autumn Quarter started at UW that year. Store this number in the `coho_count` variable without hardcoding. 
> _Hint:_ you'll need `.loc[]` and a conditional to do this

2. Combine `.loc[]` with slicing (`:`) to get the sockeye salmon counts for all dates in the years specified below. Make sure to print the results to check that they're correct.
> * First, do this for 2020. Save the result as a new variable, `sockeye_2020`.
>
> * Then, do the same for 2013. Save this as `sockeye_2013`.
>


3. Apply NumPy functions to `sockeye_2020` and `sockeye_2013` to find the following and store in their corresponding answer variables:
> * The maximum daily sockeye count in each year (2013 and 2020).
>
> * The total number of sockeye that passed through Ballard Locks in each year.
>
Print (via `print()`) the 4 resulting numbers.
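
The sketch below illustrates label-based selection and slicing on a datetime index, followed by NumPy reductions. It uses a tiny made-up DataFrame of bird counts, not the salmon data; `counts_df`, `gulls`, and `terns` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: five days of made-up bird counts
dates = pd.date_range("2020-12-30", "2021-01-03", freq="D")
counts_df = pd.DataFrame(
    {"gulls": [4, 7, 1, 9, 3], "terns": [0, 2, 5, 5, 8]},
    index=dates,
)

# Single value: .loc[row label, column label]
gulls_on_jan_1 = counts_df.loc[pd.Timestamp("2021-01-01"), "gulls"]

# Range of dates: label slices with .loc include BOTH endpoints
terns_2021 = counts_df.loc["2021-01-01":"2021-01-03", "terns"]

# NumPy functions accept a pandas Series directly
terns_2021_max = np.max(terns_2021)
terns_2021_total = np.sum(terns_2021)

print(gulls_on_jan_1)
print(terns_2021)
print(terns_2021_max, terns_2021_total)
```

Note that, unlike ordinary Python slicing, a `.loc` slice keeps the end label, so the three January dates are all returned.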

In [None]:
# Your code for step 1

# print out the results


# Your code for step 2

# Your code for step 3

# print out the results


<!-- END QUESTION -->

## Q.1 Reflection questions (5 minutes)

The purpose of the reflection is to inform us as instructors about students' comfort level with course content. We use these answers to inform how we spend class time and design coursework in subsequent weeks. This question is graded for completeness, so please answer each question in the text box below. Be concise in your answers (max. 2 sentences). 

1) What do you feel you excelled at in this exercise? Why?

2) What did you struggle with most in the exercise? Why?

3) Is there any section of the question that you did not complete? If so, briefly describe why and the section you spent the most time on. 

4) Is there any topic you feel we need to revisit or review in class? Why?

1)

2)

3)

4)

## Question #2: Practice with `pandas` and .csv files 

<!-- BEGIN QUESTION -->

## Part a. Loading and inspecting data (10 minutes)

Download the three .csv files titled `cruise_data_0.csv`, `cruise_data_1.csv`, and `cruise_data_2.csv` and upload them to your JupyterHub under a `data` folder in the directory that contains your Jupyter notebook. Each of these files contains data for 5 different pigments representing different phytoplankton groups, taken from three different cruises (for more info, see this link: https://www-air.larc.nasa.gov/missions/pacepax/). Next:

1. Load the data into three separate variables.

2. Display each of them.

3. Keep note of which variable is which cruise for the next part.

In [None]:
# Your code for part (a)


<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

## Part b. Combining data in pandas (15 minutes)

Suppose you want to pool together the data from the 3 cruises for further analysis, but also want to keep track of which data comes from which source. To do so,

1. For each DataFrame from part (a), add a new column called `cruise` and use it to store the cruise number.

2. Use `pd.concat()` to combine the 3 DataFrames into a single one.

3. Display the new dataset and ensure that the `cruise` column starts at 0, then goes to 1 and finally 2 in order.

4. Finally, save this new dataset as a new .csv file. Title it anything you think is reasonable.
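
The pattern described above (tag each table with its source, stack the tables, save the result) can be sketched as follows. The two small DataFrames, the `site` column, and the output filename are hypothetical; the file is written to a temporary directory only so the example cleans up after itself.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in data: measurements from two made-up sites
site_a_df = pd.DataFrame({"depth_m": [5, 10], "chl": [0.8, 0.5]})
site_b_df = pd.DataFrame({"depth_m": [5, 10], "chl": [1.1, 0.9]})

# Assigning a single value to a new column fills every row with it
site_a_df["site"] = 0
site_b_df["site"] = 1

# Stack the rows; ignore_index=True renumbers the rows 0, 1, 2, ...
combined_df = pd.concat([site_a_df, site_b_df], ignore_index=True)
print(combined_df)

# Save to .csv (index=False leaves out the row numbers) and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "combined_sites.csv"
    combined_df.to_csv(out_path, index=False)
    reloaded_df = pd.read_csv(out_path)

print(reloaded_df)
```

Without `ignore_index=True`, each block of rows keeps its original row labels, so the combined index would repeat 0, 1, 0, 1.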

In [None]:
# Your code for part (b)


<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

## (Optional) Part c. Sorting data

Earlier, we combined the three datasets by cruise number in ascending order. Instead, we want to combine them by date in ascending order.

1. First, use pandas to convert the `date` column in your combined dataset from part b into datetime values.

2. Then, sort your dataset in ascending order by the `date` column (HINT: look up the `.sort_values()` and `.reset_index()` methods in pandas).

3. Display the first 20 rows of the data to check if your data is organized by time correctly.

4. Based on the result from step 3, determine if the cruises overlap in time. (Use the provided markdown cell to record your findings)
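
A minimal sketch of the convert-then-sort pattern, using a made-up four-row table rather than the cruise data (`log_df` and its columns are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in data: dates stored as plain strings, out of order
log_df = pd.DataFrame(
    {
        "date": ["2021-03-05", "2021-01-20", "2021-02-11", "2021-01-02"],
        "site": [0, 0, 1, 1],
    }
)

# Convert the strings to datetime values so they sort chronologically
log_df["date"] = pd.to_datetime(log_df["date"])

# Sort by date (ascending by default), then renumber the rows from 0;
# drop=True discards the old row labels instead of keeping them as a column
sorted_df = log_df.sort_values("date").reset_index(drop=True)

print(sorted_df.head(3))
```

If the source column (here `site`) alternates between values after sorting, as it does in this toy table, the sources overlap in time.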

In [None]:
# Your code for part (c)


_Your response for step 4_

<!-- END QUESTION -->

## Q.2 Reflection questions (5 minutes)

The purpose of the reflection is to inform us as instructors about students' comfort level with course content. We use these answers to inform how we spend class time and design coursework in subsequent weeks. This question is graded for completeness, so please answer each question in the text box below. Be concise in your answers (max. 2 sentences). 

1) What do you feel you excelled at in this exercise? Why?

2) What did you struggle with most in the exercise? Why?

3) Is there any section of the question that you did not complete? If so, briefly describe why and the section you spent the most time on. 

4) Is there any topic you feel we need to revisit or review in class? Why?

1)

2)

3)

4)

## Check that your code runs without errors

Please check that your code runs without error when the notebook is executed from top to bottom. You may find the "Restart the kernel and run all cells" option (selectable using the `⏩` icon) useful.