## Make sure your code fully runs in your notebook before submitting to Gradescope!
1. Restart your kernel with the refresh button in the top panel below this tab.
2. Run through all your cells one by one, checking that there are no errors!
3. Read through the Submission and Autograder announcement on Canvas for more details.

### To check your answers with the autograder as you go, run this cell below:


In [None]:
%%capture
# Install the autograder to check your answers as you go along
import sys
!{sys.executable} -m pip install otter-grader
# Initialize Otter
import otter
grader = otter.Notebook("HW3.ipynb")

# Homework #3
**Due Friday Nov. 10th @ 11:59 pm**

**Objective:** This assignment will give you experience with NumPy genfromtxt, Pandas, xarray, advanced plotting, and Cartopy.


## **Instructions:**
### Accessing Class Code

1. Clone this repository into your own JupyterHub by running this command in your home directory:
```git clone your_SSH_URL```

2. Once you have cloned the repository, go into your directory and set your branch to "main" (see the GitHub cheatsheet for help on this). You will not need to reset your remote origin, since you cloned directly from your own version of the repository.

3. There should now be a "homework_3" directory in the home directory of your JupyterHub. In the terminal, change directories into "homework_3". Next, click on the "homework_3" icon in the file browser in the left panel of JupyterHub. If you don't see it, make sure you're in the home folder by clicking the folder icon under the search bar.

4. Double click the "HW3.ipynb" to open it in a new tab and begin working on the assignment. Read the instructions carefully, and make sure to write your answers in the specified cells. Typically, you will see a "..." in the places you need to fill in. Make sure to use the variable names provided in the starter code. See the "Working in your Notebook" section below for an example. There are some autograder tests embedded in the notebook, but there are also some hidden tests that will be graded after submission.

5. Edit the README file and write your name and UW NetID. Add a paragraph (4-5 sentences) on which plot elements are required for this course and why these requirements are important. (5 points) Review the week 5 pre-lecture or the "10-24_lecture" (slide 33) to check these requirements.

6. As you continue to answer the homework questions and make edits to your code, make sure to regularly update your GitHub repository as well via git add, commit, and push (steps 15, 16, 19 in 0b). A good rule of thumb would be to run these git steps anytime you make an addition or change that you don't want to accidentally lose. Generally, you can push once a day to maintain good version control practices. <br>
As a note, make sure that your git commands run without errors before you refresh your GitHub page to check your changes. If you are not seeing the changes you made in your local JupyterHub directory, check your status with this command: <br>
``` git status```

Then, you can see if you made an error with your git add, commit, or push commands. 

**Sometimes, our JupyterHub server has trouble remembering the file permissions for our SSH keys. If you get a file permission error with your private SSH key, run this line of code:**
```chmod 400 ~/.ssh/id_ed25519``` 
<br>

This will change your file permission to the proper permissions that SSH requires.
    
### Submitting to GradeScope

7. Go to the class Gradescope dashboard and submit your personal GitHub repository link to the Homework 3 assignment. Make sure your GitHub account is synchronized with Gradescope so it can access both your public _and_ private repositories. If prompted, log in to GitHub.

8. Run the autograder to check if your code runs and if you passed the initial unit tests. You should be able to run the autograder as many times as you want before submitting. Again, double check that your final answers are stored in the provided variable names given in the starter code!

9. Once the autograder has finished running, check that you have submitted the assignment. If you make any more changes to your code after submitting to Gradescope, make sure to push your changes to GitHub and resubmit the assignment on Gradescope. You can submit as many times as you want, since there is no limit on submission attempts, but be sure to have your final submission in before the deadline.

### Working in your Notebook
To help you start thinking about how to write meaningful and concise variable names, we have provided variable names in most of your questions. Note that there is an ellipsis ("...") after each of the variable names; these are the sections you are expected to fill in. Please store your final answers in the provided variable names (e.g., "pelagic" and "coastal"). This will ensure that the autograder on Gradescope runs properly. <br>

Make sure that you are adding comments and your outside references as you go along! Part of your grade will include using best coding practices in your homework assignments.

### Honor Code

- Complete the assignment by writing and executing text and code cells as specified. For this assignment, do not use any features of Python that have not yet been discussed in the lessons or class sessions.

- Please keep in mind our late work and dropped homework grading policy. Review the syllabus for details.

- You can acknowledge and describe any assistance you've received on this assignment in the specified cell of this HW3 notebook, whether from an instructor, a classmate (either directly or on Ed Discussion), and/or online resources other than official Python documentation websites like docs.python.org or numpy.org. Alternatively, if you prefer, you may acknowledge assistance at the relevant point(s) in your code using a Python comment (#). Don't forget that you can receive extra credit for answering at least one question on Ed Discussion!

## Grade Breakdown
- Question 1: 25 points
- Question 2: 30 points
- Question 3: 30 points
- Best coding practices: 10 points
- README: 5 points

**Total: 100 points**

- Extra Credit: 5 points for answering a question on Ed Discussion

# Question 1: Pandas and CSV Files (25 points)

![Axial Seamount data](figures/axial_seamount.png)

For this exercise we will be examining vertical profiler data from Axial Seamount curated by Ocean Data Labs (https://datalab.marine.rutgers.edu/data-nuggets/nutrients/).

The figure above shows dissolved oxygen and dissolved organic matter depth profiles over a 4-month period in 2017. It was created using the profiler data you will be looking at today. The data will be accessed directly from the Ocean Data Labs website using the "profiler_url" string in the cell below.
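
As a minimal sketch of the pandas calls involved, the snippet below reads a tiny made-up CSV from an in-memory string. `io.StringIO` stands in for the remote file here; `pd.read_csv` accepts a URL string such as "profiler_url" in the same way. The column names (`time`, `pressure`, `temperature`) are hypothetical and are not the real profiler columns, so check the columns of your own DataFrame.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the remote file. pd.read_csv accepts
# a URL string (such as profiler_url) in exactly the same way.
csv_text = """time,pressure,temperature
2017-08-04 17:00:00,10.0,8.5
2017-08-04 17:00:00,20.0,7.9
2017-12-08 17:00:00,10.0,9.1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Column names and number of rows
print(df.columns)
n_rows = len(df)  # same as df.shape[0]
print(n_rows)

# Summary statistics (count, mean, std, min, quartiles, max) per numeric column
print(df.describe())

# Look up a column by its exact name instead of hardcoding the number
mean_temperature = df["temperature"].mean()
print(mean_temperature)
```

Looking the mean up by column name, rather than copying a number out of the describe() table, keeps the answer correct if the data changes.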


## Instructions 
1. Using Pandas, read the .csv into a DataFrame by its URL (cell below) and store it in the appropriate answer variable. If you forgot how to do this, read the [documentation](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). Print the names of the columns in this DataFrame, and store the number of rows it has in the specified answer variable. (5 points)

2. Use the describe() function to show and print summary statistics (mean, max, std, count, etc.) of each column of the DataFrame. Without hardcoding, store and print the mean value of Seawater Temperature (deg_C) in the specified answer variable. If you are having trouble accessing your column, double check that you are spelling the column name exactly as it is stored in the DataFrame. You can also press the Tab key to autocomplete. (5 points)

3. Use the .loc[] indexer to find all rows of your DataFrame where the "time" column is equal to the string "2017-08-04 17:00:00". Assign those rows to a new DataFrame and store it in its respective answer variable. Repeat for where time is "2017-12-08 17:00:00". Use the display() function to print your new, subsetted DataFrames. (5 points)


4. Your two new DataFrames contain single water column profiles measured at your selected times. Let's make a figure to compare the water column profiles of dissolved oxygen and dissolved organic matter (CDOM). Follow these steps, **and make sure to include all required plot elements**: (10 points)

>a) Use matplotlib to make a figure with two subplots. The subplots should sit side by side in columns (one row, two columns), not stacked in rows. Check the [documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html) if you need help.
	
>b) On the first subplot, plot CDOM (x-axis) vs. depth* (y-axis) for your two subsetted dataframes. Plot two lines in different colors: one profile for your 2017-08-04 17:00:00 DataFrame, and the other for your 2017-12-08 17:00:00 DataFrame. Reverse the y-axis so depth values increase downward.

*Note: You can assume pressure is a proxy for depth (1 dbar ≈ 1 meter).

>c) On the second subplot, plot dissolved oxygen vs. depth for the two subsetted dataframes following the same conventions as part b).

>d) Add a legend to your figure.

>e) In two sentences, describe how dissolved oxygen and CDOM vs. depth profiles vary between the two dates you have plotted. 
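
The sketch below illustrates the selection and plotting steps on made-up data: a boolean mask inside `.loc[]` selects the rows for one timestamp, and `plt.subplots(1, 2)` creates two side-by-side panels, each with its y-axis inverted so depth increases downward. The column names (`cdom`, `oxygen`, `pressure`), values, colors, and axis-label units are assumptions; substitute the real column names and units from your DataFrame.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical profiles: three depths at each of two timestamps
csv_text = """time,pressure,cdom,oxygen
2017-08-04 17:00:00,5,1.2,250
2017-08-04 17:00:00,50,1.0,240
2017-08-04 17:00:00,100,0.9,220
2017-12-08 17:00:00,5,1.5,260
2017-12-08 17:00:00,50,1.3,245
2017-12-08 17:00:00,100,1.1,230
"""
df = pd.read_csv(io.StringIO(csv_text))

# A boolean mask inside .loc[] keeps only the rows whose time string matches
aug = df.loc[df["time"] == "2017-08-04 17:00:00"]
dec = df.loc[df["time"] == "2017-12-08 17:00:00"]

# One row, two columns of subplots
fig, (ax_cdom, ax_oxy) = plt.subplots(1, 2, figsize=(10, 6))

panels = [(ax_cdom, "cdom", "CDOM"), (ax_oxy, "oxygen", "Dissolved oxygen")]
for ax, column, name in panels:
    ax.plot(aug[column], aug["pressure"], color="tab:orange", label="2017-08-04 17:00")
    ax.plot(dec[column], dec["pressure"], color="tab:blue", label="2017-12-08 17:00")
    ax.set_xlabel(f"{name} (use your data's units)")
    ax.set_ylabel("Pressure (dbar), approx. depth (m)")
    ax.set_title(f"{name} profiles")
    ax.invert_yaxis()  # depth increases downward
    ax.legend()

fig.tight_layout()
plt.show()
```

Looping over the two panels keeps the styling identical on both subplots, so the only difference between them is the variable being plotted.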



In [None]:
# URL to read in to pandas, DO NOT change this!
profiler_url = 'https://datalab.marine.rutgers.edu/wp-content/uploads/2020/08/E01_RS03AXPS_Axial_Base_profiler.csv'

In [None]:
# don't forget to print both your answer variables and the outputs that instructions say to print!
## PART 1
print("Part 1)")
# store profiler dataframe here
pro_df = ...
...
...
# store number of dataframe rows here
df_rows = ...

## PART 2
print("Part 2)")
...
# store mean seawater temperature here
mean_temp = ...
... 

## PART 3
print("Part 3)")
# store rows where time == "2017-08-04 17:00:00" here
aug_df = ...
...
# store rows where time == "2017-12-08 17:00:00" here
dec_df = ...
...

## PART 4
print("Part 4)")
# create your plot
...

In [None]:
grader.check("Question 1")

# Question 2: genfromtxt and cartopy (30 points)

![Data map](figures/map.png)

PANGAEA is an amazing source of oceanographic data. The database has compiled data from hundreds of cruises, totaling over 419,000 datasets with more than 25 billion individual measurements and observations. _Citation_: https://doi.org/10.1038/s41597-023-02269-x.

For this exercise we will be reading and mapping measured surface salinity and temperature data from a 2016 transatlantic cruise from South America to Europe (COLIBRI cruise 35MJ20150607). Source of ship data: https://doi.pangaea.de/10.1594/PANGAEA.865996.

Our data is stored in a text file in the data folder of this repository: "data/35MJ20150607_CO2_underway_SOCATv4.tab"
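
Before parsing a text file, it helps to look at its raw lines. Here is a minimal sketch of a readline() loop, assuming a small made-up tab-separated file held in memory (`io.StringIO` stands in for `open(filepath)`). The header layout below is invented, so count the header lines of the real file from your own printout.

```python
import io

# Hypothetical stand-in for the real file; in the notebook you would use
# open(filepath) instead. The header layout here is made up.
fake_text = (
    "Metadata line 1\n"
    "Metadata line 2\n"
    "Metadata line 3\n"
    "Date/Time\tLatitude\tLongitude\tSal\tTemp [°C]\n"
    "2015-06-08T00:00\t10.5\t-50.2\t35.1\t28.3\n"
)

lines = []
with io.StringIO(fake_text) as f:
    for i in range(50):  # print at most the first 50 lines
        line = f.readline()
        if not line:  # readline() returns '' at the end of the file
            break
        lines.append(line.rstrip("\n"))
        print(i, lines[-1])

# From the printout, line 3 holds the column names, so 4 lines precede the data
column_names = lines[3].split("\t")
wanted = ("Latitude", "Longitude", "Sal", "Temp [°C]")
cols = tuple(column_names.index(name) for name in wanted)
print(cols)  # (1, 2, 3, 4)
```

Printing the line number next to each line makes it easy to count header lines and to read off column indices.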

1) Use readline() in a loop to print the first 50 lines of the text file. Store and print the number of header lines in this text file in the answer variable. Now, look at the columns. Store the column indices for latitude, longitude, measured salinity (Sal), and measured temperature (Temp [°C]) in a tuple and print it. (5 points)

2) Use numpy genfromtxt() to read the latitude, longitude, measured salinity (Sal), and measured temperature (Temp [°C]) columns into a 2-D numpy array of floats. Pass in your tuple from Part 1 as the input for the "usecols" argument in the genfromtxt() function. Store the resulting 2-D array in the specified answer variable and print it. (5 points)

3) Use matplotlib and cartopy to plot longitude, latitude, and salinity on a map. **Include all required plot elements.** Follow these steps: (10 points)

>a) Make a figure using matplotlib.pyplot figure() and set the figure size to (15,8) ([documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.figure.html)). Set the map projection to PlateCarree (check out this [example](https://scitools.org.uk/cartopy/docs/v0.15/matplotlib/intro.html) from the cartopy documentation).

>b) Add coastlines to the map, with resolution set to '110m' and color to black.

>c) Add OCEAN, LAND, and BORDERS features to the map. Choose appropriate colors for these features.

>d) Add latitude and longitude gridlines and set their color to white.

>e) Make a scatter plot of latitude, longitude, and salinity on your map. Plot longitude in your x-axis, latitude in your y-axis, and salinity as color. Add a colorbar and label it.

>f) In a comment below your map, answer the following question: 
>> Do you see any big jumps or anomalies in the salinity data on your map? Where? Why do you think there is/is not a big jump in the data?

4) Make a second map identical to the first, but plot temperature as color instead of salinity. Answer the same question as in Part 3f, but for temperature. (10 points)
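
A minimal genfromtxt() sketch, assuming a made-up tab-separated file with two header lines. `skip_header` skips the lines before the first data row, `usecols` takes a tuple of column indices, and `delimiter="\t"` assumes tab-separated columns (PANGAEA .tab files are tab-delimited text). The values and the `toy_cols` name are illustrative only.

```python
import io

import numpy as np

# Hypothetical tab-separated text: two header lines, then two data rows
text = (
    "Metadata line\n"
    "Date/Time\tLatitude\tLongitude\tSal\tTemp\n"
    "2015-06-08T00:00\t10.5\t-50.2\t35.1\t28.3\n"
    "2015-06-08T01:00\t10.7\t-49.9\t35.3\t28.1\n"
)

toy_cols = (1, 2, 3, 4)  # latitude, longitude, salinity, temperature
arr = np.genfromtxt(
    io.StringIO(text),
    delimiter="\t",    # assumes tab-separated columns
    skip_header=2,     # lines to skip before the first data row
    usecols=toy_cols,  # read only these column indices
    dtype=float,
)
print(arr.shape)  # (2, 4): one row per measurement, one column per variable
print(arr)
```

In the real file, check whether the column-name row should count toward `skip_header`; any text left in the selected float columns would be read as nan.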

In [None]:
# don't forget to print both your answer variables and the outputs that instructions say to print!
## PART 1
print("Part 1)")
# store the string to your file path here
filepath = ...
...
# store the number of header lines in here
header_lines = ...
...
# store your lat, lon, salinity, and temperature column indices in a tuple here
data_cols = ...

## PART 2
print("Part 2)")
# store your resulting data array from numpy genfromtxt() here
lat_lon_sal_temp = ...
... 

## PART 3
print("Part 3)")
# create your salinity map
...

## PART 4
print("Part 4)")
# create your temperature map
...

In [None]:
grader.check("Question 2")

# Question 3: Xarray and 2-D plotting (30 points)

![offshore plot](figures/question3.png)

This question uses echo sounder data curated by Ocean Data Labs (https://datalab.marine.rutgers.edu/data-nuggets/zooplankton-eclipse/).


The figure above uses sonar to show the diel vertical migration of zooplankton. Zooplankton spend their daylight hours deeper in the water column to avoid visual predators. As the sun sets and the water column darkens, they make their way to the food-rich surface. Sonar uses backscattering data to estimate where the zooplankton are at a given time: higher backscattering values correlate with increased biomass. Note that the figure above shows high backscattering at the surface (dark red), but there is also a lighter blue/green layer that represents zooplankton moving up and down.

During an eclipse at ~17:00 on August 21st, 2017, as the moon blocked the sun’s light, the zooplankton began their nightly vertical migration through the water column. Once the moon moved past the sun and light in the water column increased, the zooplankton realized their mistake and swam back down. Note that since the eclipse was such a short event, the zooplankton did not make it all the way to the surface.

For this exercise, we will use xarray and 2-D plotting to examine a subset of this data. We will look specifically at the sonar backscatter data for two hours surrounding the solar eclipse.
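
As a sketch of the 2-D plotting used in this question, the snippet below draws a pcolormesh of made-up values on a small time-by-depth grid, inverts the y-axis, and adds a labeled colorbar. The array sizes, values, and labels are hypothetical stand-ins, not the echo sounder data. With 1-D x and y coordinates, C must have shape (len(y), len(x)), and `shading="auto"` treats the coordinates as cell centers.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins: 6 time steps, 4 depth bins, and a (depth, time) grid
toy_times = np.arange(6)         # x coordinates (e.g., minutes)
toy_depths = np.arange(4) * 5.0  # y coordinates (e.g., meters)
toy_values = np.linspace(-80, -60, 24).reshape(toy_depths.size, toy_times.size)

fig, ax = plt.subplots(figsize=(10, 4))
# C has shape (len(y), len(x)); shading="auto" centers cells on the coordinates
mesh = ax.pcolormesh(toy_times, toy_depths, toy_values, shading="auto")
ax.invert_yaxis()  # depth increases downward
cbar = fig.colorbar(mesh, ax=ax)
cbar.set_label("Backscatter (hypothetical, dB)")
ax.set_xlabel("Time (minutes, hypothetical)")
ax.set_ylabel("Depth (m, hypothetical)")
ax.set_title("Toy 2-D pcolormesh")
plt.show()
```

Comparing each array's shape against (len(y), len(x)) is a quick way to decide which variable belongs on which axis.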

1) Your netCDF file is stored in the data folder. The filepath string is given in the starter code cell below. Use the xarray open_dataset() function to open the .nc file as an xarray Dataset. (1 point)

2) Display your Dataset in the solution cell below. Answer the following subquestions and store/print the answers in the appropriate answer variables in the solution cell: (5 points)

>a) Store the coordinates of this Dataset in the "echo_coords" variable below without hardcoding. _HINT:_ Look at the Parameters section of xarray.Dataset in the [dataset documentation](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html) and [here](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.coords.html) for how to access the coords.

>b) Store the data variables of this Dataset in the "echo_vars" variable below without hardcoding. _HINT:_ Look at the Parameters section of xarray.Dataset in the [dataset documentation](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html) and [here](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.data_vars.html) for how to access the data_vars.

>c) Store the dimensions of this Dataset in the "echo_dims" variable below without hardcoding. _HINT:_ Look at the xarray.Dataset.dims [documentation](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.dims.html) for how to access this.
- In a comment above or below your "echo_dims" line, write which coordinates correspond to the layers, rows, and columns of the "echodata" Dataset. For example, is frequency a layer, row, or column in "echodata"? What about "ping_time" and "range_bin"? If you're having trouble, try sketching the data structure on paper based on the figures in the pre-lecture and lecture slides.

3) The different layers of this Dataset are sonar backscatter measurements (MVBS) taken at different frequencies: 38 kHz, 120 kHz, and 200 kHz. The best frequency for detecting zooplankton is 200 kHz, so let's subset our echo dataset to the 200 kHz frequency. (2 points)

>a) Select the 200 kHz layer and assign it to the provided answer variable. 

>b) Display the new Dataset.

4) Extract the ping_time (times), range_bin (depths), and MVBS values from your subsetted 200 kHz Dataset and store them as new arrays in the assigned answer variables. Store and print the shape of each new array in its respective answer variable. (3 points)

5) Plot the data using matplotlib.pyplot's figure() and [pcolormesh()](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.pcolormesh.html) functions. **Include all required plot elements.** Use the shapes of the data from Part 4 to figure out which variables correspond to the x, y, and C (color) values. Add a colorbar to the side with a label. Can you see where the zooplankton are based on the backscattering data? (5 points)

6) You may notice your plot looks different from the example plot above. This is because our x-axis and y-axis are swapped compared to the example figure. This means we need to reorient the MVBS array so that we can plot the data with ping_time (time) on the x-axis and range_bin (depth) on the y-axis. Do this by following these steps: (9 points)

>a) Flip the MVBS array using numpy flip(). (Look up the documentation if you're not sure how to do this!)

>b) Transpose the flipped MVBS array using numpy transpose().

>c) Store the shape of the resulting MVBS array in its answer variable and print it.

>d) Plot the data using the new MVBS array with ping_time on the x-axis and range_bin on the y-axis. Flip the y-axis so that range values increase downward. Include a colorbar to the side with a label.

7) Improve the visibility of the zooplankton migration on your pcolormesh plot from Part 6. You can copy and paste most of your answer from Part 6 here (don't forget the colorbar!), but adjust the following: (5 points) 

>a) Choose a good colormap from [matplotlib](https://matplotlib.org/stable/tutorials/colors/colormaps.html). Jet is one option you can try, but also try the other colormaps under the "Miscellaneous" section.

>b) Experiment with changing the limits of your colormap using the pcolormesh keyword arguments *vmin* and *vmax*, which set the data range that the colormap spans. Try to make your plot look as similar as possible to the example plot above.
- _HINT:_ Try adding/subtracting the standard deviation of the color data to/from the min() and max(). How does this change your color scale, and what can you do to improve it?
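
The array operations in Parts 6 and 7 can be sketched on a small made-up array. Treating its rows as ping times and its columns as range bins is only a hypothetical labeling; check the real shapes you stored in Part 4. Note that np.flip() with no axis argument reverses every axis, while `axis=0` reverses only the first. The vmin/vmax lines follow one reading of the Part 7b hint (pulling the limits in from the extremes by one standard deviation) and are a starting point to tune, not a fixed recipe.

```python
import numpy as np

# Hypothetical 3 x 4 array; pretend rows are ping times and columns are range bins
toy_mvbs = np.arange(12, dtype=float).reshape(3, 4)

flip_all = np.flip(toy_mvbs)           # no axis: reverses the order along every axis
flip_rows = np.flip(toy_mvbs, axis=0)  # reverses only the first axis
flip_all_T = np.transpose(flip_all)    # swaps the axes: shape (3, 4) -> (4, 3)
print(flip_all_T.shape)  # (4, 3)

# Color limits pulled in from the extremes by one standard deviation;
# the nan-aware functions ignore missing values
spread = np.nanstd(toy_mvbs)
vmin = np.nanmin(toy_mvbs) + spread
vmax = np.nanmax(toy_mvbs) - spread
print(vmin, vmax)
```

The resulting limits are passed as keyword arguments, e.g. `pcolormesh(x, y, C, vmin=vmin, vmax=vmax)`.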


In [None]:
# import netCDF4 so xarray can use it as the engine for reading .nc files
import netCDF4
# filepath to .nc file (DO NOT CHANGE THIS CODE!)
filepath = "data/OOI-D20170821-T163049_MVBS.nc"
# solution cell below

In [None]:
# Write your code in this solution cell:
print('part 1)')
# store your xarray dataset here
echodata = ...

print('part 2)')
...
# store your echodata coordinates here
echo_coords = ...
...
# store your echodata data variables here
echo_vars = ...
...
# store your echodata dimensions here
echo_dims = ...
# write your answer to part c here
...

print('part 3)')
# store your subsetted dataset by frequency of 200 kHz (200,000 Hz) here
echodata_200 = ...
...

print('part 4)')
# store your ping_time array here
times = ...
# store your ping_time shape here 
times_shape = ...
# store your range_bin array here
depths = ...
# store your range_bin shape here
depths_shape = ...
# store your MVBS array here
MVBS = ...
# store your MVBS shape here
MVBS_shape = ...

print('part 5)')
# create your plot
...

print('part 6)')
...
# store your flipped and transposed MVBS array here
MVBS_flipT = ...
# store shape of flipped and transposed MVBS, then print
MVBS_flipT_shape = ...
...
#plot again, with flipped and transposed MVBS
...

print('part 7)')
# plot figure with adjusted color 
...

In [None]:
grader.check("Question 3")