<div><img style="float: left; padding-right: 3em;" src="https://avatars.githubusercontent.com/u/19476722" width="150" /></div>

# Earth Data Science Coding Challenge!
Before we get started, make sure to read or review the guidelines below. These will help make sure that your code is **readable** and **reproducible**. 

## Don't get **caught** by these Jupyter notebook gotchas

<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*o0HleR7BSe8W-pTnmucqHA.jpeg" width=300 style="padding: 1em; border-style: solid; border-color: grey;" />

  > *Image source: https://alaskausfws.medium.com/whats-big-and-brown-and-loves-salmon-e1803579ee36*

These are the most common issues that will keep you from getting started and delay your code review:

1. When you try to run some code on GitHub Codespaces, you may be prompted to select a **kernel**.
   * The **kernel** refers to the version of Python you are using
   * You should use the **base** kernel, which should be the default option. 
   * You can also use the `Select Kernel` menu in the upper right to select the **base** kernel
2. Before you commit your work, make sure it runs **reproducibly** by clicking:
   1. `Restart` (this button won't appear until you've run some code), then
   2. `Run All`

## Check your code to make sure it's clean and easy to read

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSO1w9WrbwbuMLN14IezH-iq2HEGwO3JDvmo5Y_hQIy7k-Xo2gZH-mP2GUIG6RFWL04X1k&usqp=CAU" height=200 />

* Format all cells prior to submitting (right click on your code).
* Use expressive names for variables so you or the reader knows what they are. 
* Use comments to explain your code -- e.g. 
  ```python
  # This is a comment, it starts with a hash sign
  ```
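
As a rough illustration of the last two points, here is a small sketch that compares vague and expressive names. The rainfall value and every variable name are made up for this example and are not part of the challenge data.

```python
# Vague: the reader has to guess what these values mean
x = 2.5
y = x * 25.4

# Expressive: the quantity and its units are part of each name
MM_PER_INCH = 25.4
rainfall_in = 2.5  # hypothetical measurement, in inches
# Convert rainfall from inches to millimeters
rainfall_mm = rainfall_in * MM_PER_INCH
```

Both versions compute the same number, but only the second one tells the reader what the number represents.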

## Label and describe your plots

![Source: https://xkcd.com/833](https://imgs.xkcd.com/comics/convincing.png)

Make sure each plot has:
  * A title that explains where and when the data are from
  * x- and y- axis labels with **units** where appropriate
  * A legend where appropriate


## Icons: how to use this notebook
We use the following icons to let you know when you need to change something to complete the challenge:
  * &#128187; means you need to write or edit some code.
  
  * &#128214;  indicates recommended reading
  
  * &#9998; marks written responses to questions
  
  * &#127798; is an optional extra challenge
  

---

# Climate change is affecting temperatures in locations around the country

In this notebook, you will complete the following **five steps**:
  1. **Set up** your analysis by importing any necessary libraries
  2. **Download** data from the National Centers for Environmental Information (NCEI) about annual maximum daily average temperatures in Los Angeles, CA.
  3. **Clean up** some artifacts in the dates of the downloaded data
  4. **Convert** temperature values from Fahrenheit to Celsius
  5. **Plot** the data and write a headline/takeaway for your plot
  
Once you have finished running this example analysis, you will **create your own notebook** in which to duplicate the analysis in a location of your choosing.
  
In many locations around the U.S. (and the world!), climate change is causing **increased temperatures**. You will be able to see these trends in the Los Angeles data, and likely in data from other locations as well.

*But first...you have to set up this notebook for your analysis.*

***

## STEP 1: SET UP

### Python is more powerful with **libraries**

Because Python is **open source**, lots of different people and organizations can contribute (including you!). Many contributions are in the form of external &#128214; [**libraries**, also known as packages](https://www.earthdatascience.org/courses/intro-to-earth-data-science/python-code-fundamentals/use-python-packages/). Since they do not come with a standard Python download, &#128214; [external libraries need to be installed and **imported**](https://www.earthdatascience.org/courses/intro-to-earth-data-science/python-code-fundamentals/use-python-packages/). If you are using the `earth-analytics-python` environment on your computer or GitHub Codespaces, you should have everything you need installed.

### There are excellent `Python` libraries for tabular data and plotting

For this workflow, you will need the `pandas` library, which helps us to work with &#128214; [**tabular data** such as comma-separated value or csv files](https://www.earthdatascience.org/courses/intro-to-earth-data-science/file-formats/use-text-files/). You can think of tabular data as being like a **spreadsheet or database**. 

> It is customary to import the `pandas` library under its **alias** `pd`, to avoid taking up too much space when you use the library.

You will also use the [`hvplot` library from Holoviz](https://hvplot.holoviz.org/index.html) to plot your results. There are many ways to plot in `Python`, but `hvplot` is both simple to use and powerful.

### &#128187; YOUR TASK:

Here is some code to import the libraries, but **watch out! - there are a couple of mistakes**:

```python
import hvplot.pands
import pands as pd
```

Using the code above as a starting point, complete the following **two** steps:
  1. [ ] Paste the code from above into the **code cell** below:
  2. [ ] **Correct the typos** to properly import the pandas library under its **alias** pd as well as the hvplot pandas extension.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

The test cell below will tell you if you completed the task successfully. **Do not try to modify the test cells.** If a test cell isn't working, check that you ran your code immediately before running the test.

In [None]:
# RUN THIS TEST CELL TO CHECK YOUR ANSWER - DO NOT MODIFY
points = 0
try:
    pd.DataFrame()
    points += 3
    print('\u2705 Great work! '
          'You correctly imported the pandas library.')
except:
    print('\u274C Oops - pandas was not imported correctly.')
    
try:
    pd.DataFrame().hvplot
    points += 3
    print('\u2705 Great work! '
          'You correctly imported the hvplot.pandas library.')
except:
    print('\u274C Oops - hvplot.pandas was not imported correctly.')
    
print('You earned {} of 6 points for importing libraries'.format(points))

## STEP 2: DOWNLOAD DATA

### Every day, petabytes of new Earth Observation data are made available online

This workflow uses **annual mean temperature data** from the U.S. National Centers for Environmental Information (NCEI). &#128214; [Check out the NCEI Climate at a Glance website where you can search for more data like this](https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/).

![National Centers for Environmental Information Logo](https://www.nesdis.noaa.gov/s3/migrated/ncei_icon_550px.jpg)

#### &#9998; YOUR TASK:
  1. [ ] In the cell below, write a 2-3 sentence description of the data source. You should describe who takes the data, where they were taken, what the maximum temperature units are, and how they are collected.
  2. [ ] Include a citation of the data (HINT: NCEI has a section for 'Citing this page', but you will have to select a particular dataset such as City > Time Series).
  3. [ ] Add some **context** to your data analysis - research Los Angeles, CA and write a 2-3 sentence **site description**, including a relevant **image**. Don't forget to include the source of your image!
  
> HINT: double-click on the Markdown cells below to modify them.
  

WRITE YOUR DATA DESCRIPTION HERE

WRITE YOUR DATA CITATION HERE

WRITE YOUR SITE DESCRIPTION HERE

### Use a Uniform Resource Locator (URL) from NCEI to download data

Here is a URL you can use to download the NCEI data you will need. Go ahead and &#128187; **copy and paste it into the code cell below**:

```python
'https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/city/time-series/USW00023174/tavg/12/12/1945-2022.csv'
```

The URL is correct! However, we still have a problem - we can't get the URL back later on because it doesn't have a **name**. Right now, it just disappears into the void! 

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/c/cf/Black_hole_-_Messier_87.jpg/320px-Black_hole_-_Messier_87.jpg" width=300 />

> Image source: [Black Hole from Wikipedia](https://commons.wikimedia.org/wiki/File:Black_hole_-_Messier_87.jpg)

#### &#128187; YOUR TASK:
  1. [ ] Pick an **expressive name** for the URL value
  2. [ ] **Reformat the URL** so that it adheres to the [79-character PEP-8 line limit](https://peps.python.org/pep-0008/#maximum-line-length)
  3. [ ] **Call your url** (type out its name) at the end of the cell to test it.
  
#### &#128214; Read about [how to name a value](https://earthlab-education.github.io/Earth-Analytics-2023-01-Intro/intro-eds-textbook/04-python-fundamentals/01-get-started-python/python-fundamentals-02-variables.html) in Python in the textbook.
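
One common way to keep a long string under the line limit is to wrap it in parentheses and split it into pieces; Python joins adjacent string literals into a single string. The sketch below uses a hypothetical URL and a hypothetical variable name, so treat it as a pattern rather than the answer to this task.

```python
# Hypothetical URL, used only to demonstrate the formatting
example_data_url = (
    'https://www.example.com/data/archive/climate/observations'
    '/station-12345/annual-summary/1950-2020.csv'
)
# Typing the name on the last line of a notebook cell displays its value
example_data_url
```

Note that there are no commas or plus signs between the pieces. A comma would create a tuple of two strings instead of one long string.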

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# RUN THIS TEST CELL TO CHECK YOUR ANSWER - DO NOT MODIFY
url_ans = _
points = 0

import subprocess

# Check that the answer was called for testing
if isinstance(url_ans, str):
    points += 1
    print('\u2705 Great work! You called a string.')
else:
    print('\u274C Oops - make sure to call your url string for testing.')

# Check the URL length to make sure it got copied correctly
if len(url_ans) == 117:
    points += 4
    print('\u2705 Great work! Your URL is the correct length.')
else:
    print('\u274C Oops - your URL is not correct.')
    
# Subtract one point for any PEP-8 errors
tmp_path = "tmp.py"
with open(tmp_path, "w") as tmp_file:
    tmp_file.write(In[-2])
    
ignore_flake8 = 'W292,F401,E302'
flake8_out = subprocess.run(
    ['flake8', 
     '--ignore', ignore_flake8, 
     '--import-order-style', 'edited',
     '--count', 
     tmp_path],
    stdout=subprocess.PIPE,
).stdout.decode("ascii")
print('Formatting problems:')
print(flake8_out)

points -= int(flake8_out.splitlines()[-1])

print('You earned {} of 5 points for naming the URL variable'.format(points))

  > HINT: You can run the `%whos` **IPython magic command** to list all your variables (those are the ones that you can use at that moment). Many Python editors also have variable inspector features that will show you all your variables. Your URL value will not be available until you save it to a variable. **Run the cell below and then check that your variable has been created**. Note that there will be other variables in the namespace.

In [None]:
# List all current variables
%whos

### Use pandas to download the data

The `pandas` library you imported can download data from the internet directly into a type of Python **object** called a `DataFrame`.
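
To get a feel for what a `DataFrame` looks like before downloading anything, here is a minimal sketch that reads a tiny CSV from text held in memory. The site names, elevations, and variable names are invented for illustration; `pd.read_csv()` accepts a URL, a file path, or a file-like object such as this one.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on the internet
example_csv = io.StringIO(
    'site,elevation_m\n'
    'A,1655\n'
    'B,2320\n'
)

# By default, the first line of the file becomes the column names
example_df = pd.read_csv(example_csv)
example_df
```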

#### What do you notice about the code below? 

Here is some (not very clean) code to download NCEI data using `pandas`:

```python
dataframe = pd.read_csv(my_url, header=2, names=['col_1', 'col_2'])
dataframe
```

#### &#9998; YOUR TASK: **Modify** the Markdown cell below and answer the following questions in a **numbered list**

  1. [ ] What do you think the **parameters** of the `pd.read_csv()` function (e.g. `my_url`, `header`, `names`) are supposed to do? 
     
     > HINT: &#128214; Check out [the pandas read_csv() documentation](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) for more info. You can also try putting the code in a code cell below and running it with different values. See what happens!
    
  2. [ ] What are two things you could do to make this code more **expressive**?
  

ANSWER THE QUESTIONS ABOUT PANDAS.READ_CSV() HERE

### &#128187; Let's fix some code! YOUR TASK:
  1. [ ] Copy the example code from above into the cell below. Make any changes needed to get this code to run. Here's a hint:
    > HINT: The my_url variable doesn't exist - you need to replace it with the variable name **you** chose.
  2. [ ] Modify the value of the `header` parameter so that **only numeric data values** are included in each column.
  3. [ ] Clean up the code by using **comments**, **expressive variable names**, **expressive column names**, and **PEP-8 compliant code**

**Make sure to call your `DataFrame` by typing its name as the last line of your code cell.** Then, you will be able to run the test cell below and find out if your answer is correct.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# RUN THIS TEST CELL TO CHECK YOUR ANSWER - DO NOT MODIFY
tmax_df_resp = _
points = 0

# Check that a DataFrame was called for testing
if isinstance(tmax_df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')
    
# Check that the DataFrame has the correct values
summary = [round(val, 2) for val in tmax_df_resp.mean().values]
if summary == [198362.0, 61.93]:
    points += 6
    print('\u2705 Great work! You correctly downloaded data.')
else:
    print('\u274C Oops - your data are not correct.')
    
# Subtract one point for any PEP-8 errors
tmp_path = "tmp.py"
with open(tmp_path, "w") as tmp_file:
    tmp_file.write(In[-2])
    
ignore_flake8 = 'W292,F401,E302,F821'
flake8_out = subprocess.run(
    ['flake8', 
     '--ignore', ignore_flake8, 
     '--import-order-style', 'edited',
     '--count', 
     tmp_path],
    stdout=subprocess.PIPE,
).stdout.decode("ascii")
print('Formatting problems:')
print(flake8_out)
points -= int(flake8_out.splitlines()[-1])

print('You earned {} of 7 points for downloading data'.format(points))

## STEP 3: CLEAN UP

### It's very rare for downloaded data to be formatted *exactly* how we want when it is imported into `Python`

First things first - Take a look at your data. Do you want to use it as is, or does it need to be modified?

![image.png](attachment:image.png)

### &#128187; YOUR TASK:

Below you will find some code that will extract the year from the funky yearmonth value NCEI gives us.

```python
dataframe.year = pd.to_datetime(dataframe.year, format='????').dt.year
dataframe
```

> This code:
>   1. Converts the yearmonth value to a `Datetime` type (`pd.to_datetime()`)
>   2. Extracts only the year (`.dt.year`)
>   3. Saves the year value back to the dataframe "year" column.

Complete the following with the code above as a starting point:
1. [ ] Copy and paste the code above into the code cell below
2. [ ] Replace `dataframe` with the name of **your** dataframe whenever it appears.
3. [ ] Replace `????` with the correct **date formatting string**. 
 
 > HINT: Check out [the Python strftime documentation](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).
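
If format strings are new to you, the sketch below shows how the codes line up with the text being parsed. It uses Python's built-in `datetime` module and a made-up date in day/month/year order, which is deliberately **not** the layout NCEI uses; `pd.to_datetime()` understands the same codes.

```python
from datetime import datetime

# Hypothetical date text in day/month/year order (not the NCEI layout)
example_date_text = '25/12/1999'

# %d = two-digit day, %m = two-digit month, %Y = four-digit year
# The slashes in the format string match the slashes in the text
example_date = datetime.strptime(example_date_text, '%d/%m/%Y')
example_date.year
```

Every character in the text has to be accounted for by the format string, including separators. If the text has no separators, the format string should not have any either.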
 
***

### GOTCHA ALERT! We have a problem

If you run this code twice, the second run will fail with a `ValueError`. This happens because once you have extracted the **year only**, the format string you put in no longer matches the values in the column.

You don't have to do anything about this now - you can go `Run all above` (this is in the `Cell` menu in Jupyter Notebook) to reset. In the future, there are three approaches we recommend to address this sort of problem, depending on what you need your code to do:
  1. Do not modify a `DataFrame` after it has been created - perform any changes you need in the **same cell** where you create the `DataFrame` using `pd.read_csv()`.
  2. Create a new column when you want to compute new values (more on how to do this below).
  3. Save a copy of the `DataFrame` using the `.copy()` method of `DataFrame`s and modify the copy (in the same cell).
  
When you reproduce this workflow in a location of your choosing, you can implement one of these strategies.
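
As one possible illustration of the second strategy, the sketch below writes the extracted year to a new column instead of overwriting the original. The data, column names, and date format are hypothetical and differ from the NCEI file.

```python
import pandas as pd

# Hypothetical data: dates stored as text in a 'date_text' column
example_df = pd.DataFrame({'date_text': ['1999-12', '2000-12', '2001-12']})

# Write the result to a NEW column and leave 'date_text' untouched,
# so this cell gives the same answer no matter how often it runs
example_df['year'] = pd.to_datetime(
    example_df['date_text'], format='%Y-%m').dt.year
example_df
```

Because the original column never changes, the format string keeps matching it and the cell can safely be run again.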

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# RUN THIS TEST CELL TO CHECK YOUR ANSWER - DO NOT MODIFY
year_df_resp = _
points = 0

# Check that a DataFrame was called for testing
if isinstance(year_df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')
    
# Check that the DataFrame has the correct values
summary = [round(val, 2) for val in year_df_resp.mean().values]
if summary == [1983.5, 61.93]:
    points += 6
    print('\u2705 Great work! You correctly extracted the year.')
else:
    print('\u274C Oops - your data are not correct.')
    
# Subtract one point for any PEP-8 errors
tmp_path = "tmp.py"
with open(tmp_path, "w") as tmp_file:
    tmp_file.write(In[-2])
    
ignore_flake8 = 'W292,F401,E302,F821'
flake8_out = subprocess.run(
    ['flake8', 
     '--ignore', ignore_flake8, 
     '--import-order-style', 'edited',
     '--count', 
     tmp_path],
    stdout=subprocess.PIPE,
).stdout.decode("ascii")
print('Formatting problems:')
print(flake8_out)
points -= int(flake8_out.splitlines()[-1])

print('You earned {} of 7 points for extracting the year'.format(points))

## STEP 4: CONVERT UNITS

### For scientific applications, it is often useful to have values in metric units

When you cited the data, you should have noted that temperature was in units of degrees Fahrenheit. Often we want data in metric units, because it makes important calculations (say, sensible heat from temperature) easier. Plus, you don't want to be like the [NASA team who crashed a probe into Mars because different teams used different units](https://www.latimes.com/archives/la-xpm-1999-oct-01-mn-17288-story.html)!

### &#128187; YOUR TASK:

The code below converts the data to Celsius, using Python mathematical **operators**, like `+`, `-`, `*`, and `/`. Again, it's not well documented and doesn't follow [PEP-8 guidelines](https://peps.python.org/pep-0008/#other-recommendations), which has caused the author to miss an **important error**!

```python
dataframe['new_temperature']= dataframe['old_temperature']-32*5/9
dataframe
```

Complete the following steps:
1. [ ] Replace `dataframe` with the name of **your** `DataFrame`.
2. [ ] Replace `'old_temperature'` with the column name **you** used; Replace `'new_temperature'` with an **expressive** column name. 
3. [ ] **THERE IS AN ERROR IN THE CONVERSION - Fix it!**
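
When hunting for an error in an arithmetic expression, it can help to remember that Python follows the usual order of operations. The sketch below uses plain numbers and made-up variable names (nothing from the temperature data) to show how parentheses change a result.

```python
# Multiplication and division run before subtraction,
# so Python reads this as 10 - (4 * 3)
without_parentheses = 10 - 4 * 3

# Parentheses force the subtraction to happen first
with_parentheses = (10 - 4) * 3

print(without_parentheses, with_parentheses)
```

This prints `-2 18`: the same numbers and operators give different answers depending on grouping. Adding spaces around operators, as PEP-8 recommends, makes this kind of grouping easier to spot.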
  
### Want an EXTRA CHALLENGE?
Using the code below as a framework, write and apply a **function** that converts to Celsius. You should also rename the function to be more expressive.
  
```python
def convert(temperature):
    """Convert temperature to Celsius"""
    return temperature # Put your equation in here

dataframe['temp_c'] = dataframe['temp_f'].apply(convert)
```

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# RUN THIS TEST CELL TO CHECK YOUR ANSWER - DO NOT MODIFY
celcius_df_resp = _
points = 0

# Check that a DataFrame was called for testing
if isinstance(celcius_df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')
    
# Check that the DataFrame has the correct values
summary = [round(val, 2) for val in celcius_df_resp.mean().values]
if summary == [1983.5, 61.93, 16.63]:
    points += 9
    print('\u2705 Great work! You correctly converted units.')
else:
    print('\u274C Oops - your data are not correct.')
    
# Subtract one point for any PEP-8 errors
tmp_path = "tmp.py"
with open(tmp_path, "w") as tmp_file:
    tmp_file.write(In[-2])
    
ignore_flake8 = 'W292,F401,E302,F821'
flake8_out = subprocess.run(
    ['flake8', 
     '--ignore', ignore_flake8, 
     '--import-order-style', 'edited',
     '--count', 
     tmp_path],
    stdout=subprocess.PIPE,
).stdout.decode("ascii")
print('Formatting problems:')
print(flake8_out)
points -= int(flake8_out.splitlines()[-1])

print('You earned {} of 10 points for converting units'.format(points))

## STEP 5: PLOT

### Plot the Maximum Annual Temperature in Los Angeles, CA, USA

Present your work visually with a time-series plot of the maximum temperature values you downloaded.

### &#128187; YOUR TASK:
The code below is an attempt to plot the maximum temperature data with `Python`.

```python
dataframe.hvplot(x='col_1', y='col_2')
```
It's easy to plot in Python, but not quite this easy! You'll always need to add some instructions on labels and how you want your plot to look. Using the code above as a starting point, complete the following **four** steps:

  1. [ ] Change `dataframe` to **your** `DataFrame` name.
  2. [ ] Change `'col_1'` and `'col_2'` to **your** column names
  3. [ ] Use the `title`, `ylabel`, and `xlabel` parameters to add key text to your plot.
     > HINT: labels have to be a type in Python called a **string**. You can make a string by putting quotes around your label, just like the column names in the sample code.
  4. [ ] &#9998; Write a **headline** for your plot in the markdown cell below
    
### &#x1F336; If you want an EXTRA CHALLENGE:
Take a look at the [hvplot reference gallery](https://hvplot.holoviz.org/reference/index.html) to see if there are other changes you want to make to your plot. Some possibilities include:
  * Remove the legend since there's only one data series
  * Increase the figure size
  * Increase the font size
  * Change the colors
  * Use a bar graph instead (usually we use lines for time series, but since this is annual it could go either way)
  * [Add a trend line](https://holoviews.org/reference/elements/bokeh/Slope.html)
  
> HINT: to add a trend line, you will need to `import holoviews as hv` at the beginning of your notebook


YOUR PLOT HEADLINE HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()