# Climate change is affecting temperatures in locations around the country

In many locations around the U.S. (and the world!), climate change is causing **increased temperatures**. You will be able to see these trends in temperature data from Los Angeles.

This workflow has been split into 2 parts:
  1. Wrangling the data (This Python Jupyter Notebook!)
  2. Computing a linear trend line and presenting the results with a plot (A Quarto R notebook)
  
You should be in a group of 4-6 hackathon participants, and in a subgroup of 2-3. If you're reading this notebook in your group, you're working on part 1. With your subgroup, you will complete the following **five steps**:
  1. **Plan** your analysis as a full group of 4-6.
  2. **Set up** your analysis by importing any necessary libraries
  3. **Import data** from the National Centers for Environmental Information (NCEI) about annual maximum daily average temperatures in Los Angeles, CA.
  4. **Clean up** the dates of the downloaded data
  5. **Resample** the data from monthly to annual

*First...you have to discuss the format of your intermediate data products with your group!*

***

## STEP 1: TALK TO THE SUBGROUP THAT WILL USE YOUR DATA!

Your task as the wrangling subgroup is to produce annual mean temperature data for the City of Los Angeles from the supplied raw data file. You will need to work with the subgroup computing a linear trend line. What file format do they want the data in? If they want a CSV file, as we suggest, what columns do they need? What will the column names be? What will the data in each column look like?

You can always re-negotiate, but it's important to stay in communication with other subgroups. In the cell below, write a description of the intermediate data product you will produce.

WRITE YOUR INTERMEDIATE DATA FILE DESCRIPTION HERE

## STEP 2: SET UP

### Python is more powerful with **libraries**

Because Python is **open source**, lots of different people and organizations can contribute (including you!). Many contributions are in the form of external &#128214; [**libraries**, also known as packages](https://www.earthdatascience.org/courses/intro-to-earth-data-science/python-code-fundamentals/use-python-packages/). Since they do not come with a standard Python download, &#128214; [external libraries need to be installed and **imported**](https://www.earthdatascience.org/courses/intro-to-earth-data-science/python-code-fundamentals/use-python-packages/). If you are using the `earth-analytics-python` environment on your computer or GitHub Codespaces, you should have everything you need installed.

### There are excellent `Python` libraries for working with tabular time-series data

For this workflow, you will need the `pandas` library, which helps us to work with &#128214; [**tabular data** such as comma-separated value or csv files](https://www.earthdatascience.org/courses/intro-to-earth-data-science/file-formats/use-text-files/). You can think of tabular data as being like a **spreadsheet or database**. 

> It is customary to import the `pandas` library under its **alias** `pd`, to avoid taking up too much space when you use the library.

When you work with **file paths** in your code, you should also use the `os` library to generate **reproducible, cross-platform** file paths.
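
As a minimal sketch of what that looks like, the snippet below joins a folder name and a file name into one path. The names `data` and `example-temperatures.csv` are hypothetical placeholders, not the files used in this workflow.

```python
import os

# Hypothetical folder and file names, used only for illustration
example_path = os.path.join('data', 'example-temperatures.csv')

# The separator between the parts depends on the operating system
print(example_path)
```

Because `os.path.join()` picks the separator for you, the same line of code should work on Windows, macOS, and Linux.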

### &#128187; YOUR TASK:

Here is some code to import the `os` and `pandas` libraries, but **watch out! There are a couple of mistakes**:

```python
impot os

import pands as pd
```

Using the code above as a starting point, complete the following **two** steps:
  1. Paste the code from above into the **code cell** below.
  2. **Correct the typos** to properly import the `os` library and the `pandas` library under its **alias** `pd`.

In [None]:
# YOUR CODE HERE

The test cell below will tell you if you completed the task successfully. **Do not try to modify the test cells.** If a test cell isn't working, check that you ran your code immediately before running the test.

In [None]:
# RUN THIS TEST CELL TO CHECK YOUR ANSWER - DO NOT MODIFY
try:
    pd.DataFrame()
    print('\u2705 Great work! '
          'You correctly imported the pandas library.')
except:
    print('\u274C Oops - pandas was not imported correctly.')
    
try:
    os.path.join('data')
    print('\u2705 Great work! '
          'You correctly imported the os library.')
except:
    print('\u274C Oops - os was not imported correctly.')

## STEP 3: IMPORT DATA

### Every day, petabytes of new Earth Observation data are made available online

This workflow uses **annual mean temperature data** from the U.S. National Centers for Environmental Information (NCEI). &#128214; [Check out the NCEI Climate at a Glance website where you can search for more data like this](https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/).

![National Centers for Environmental Information Logo](https://www.nesdis.noaa.gov/s3/migrated/ncei_icon_550px.jpg)

#### &#9998; YOUR TASK:
  1. In the cell below, write a 2-3 sentence description of the data source. You should describe who takes the data, where they were taken, what the maximum temperature units are, and how they are collected.
  2. Include a citation of the data (HINT: NCEI has a section for 'Citing this page', but you will have to select a particular dataset such as City > Time Series).
  3. Add some **context** to your data analysis - research Los Angeles, CA and write a 2-3 sentence **site description**, including a relevant **image**. Don't forget to include the source of your image!
  
> HINT: double-click on the Markdown cells below to modify them.

WRITE YOUR DATA DESCRIPTION HERE

WRITE YOUR DATA CITATION HERE

WRITE YOUR SITE DESCRIPTION HERE

### Use pandas to import the data

The `pandas` library you imported can download data from the internet directly into a type of Python **object** called a `DataFrame`.

#### &#128187; Let's fix some code! YOUR TASK:

Here is some (not very clean) code to download NCEI data using `pandas`:

```python
my_path = os.path.join('data', 'filename.csv')
dataframe = pd.read_csv(my_path, header=2, names=['col_1', 'col_2'])
dataframe
```
  1. Copy the example code from above into the cell below.
  2. Make any changes needed to get this code to run. Here are some hints:
     * Replace `filename.csv` with the actual file name for the data.
     * Modify the value of the `header` parameter so that **only numeric data values** are included in each column
  3. Clean up the code by using **comments**, **expressive variable names**, **expressive column names**, and **PEP-8 compliant code**

> HINT: &#128214; Check out [the pandas read_csv() documentation](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) for more info. You can also try putting the code in a code cell below and running it with different values. See what happens!

**Make sure to call your `DataFrame` by typing its name as the last line of your code cell.** Then, you will be able to run the test cell below and find out if your answer is correct.
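
To see how the `header` and `names` parameters interact, here is a small sketch on made-up data. The file contents, the column names `yearmonth` and `temp_f`, and the `header=1` value are all assumptions chosen for this toy example; the right values for the real file will differ.

```python
import io

import pandas as pd

# Made-up file contents: one line of metadata, a header row, then values
toy_csv = io.StringIO(
    "Example Station\n"
    "Date,Value\n"
    "200001,55.1\n"
    "200002,57.3\n"
)

# header=1 treats the second line (index 1) as the header row, so the
# metadata line above it is skipped; names= replaces the header labels
toy_df = pd.read_csv(toy_csv, header=1, names=['yearmonth', 'temp_f'])
toy_df
```

If `header` pointed at the wrong line, text such as `Date` would end up mixed in with the numbers, which is a quick way to spot that the value needs adjusting.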

In [None]:
# YOUR CODE HERE

In [None]:
# RUN THIS TEST CELL TO CHECK YOUR ANSWER - DO NOT MODIFY
tmax_df_resp = _

# Check that a DataFrame was called for testing
if isinstance(tmax_df_resp, pd.DataFrame):
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')
    
# Check that the DataFrame has the correct values
summary = [round(val, 2) for val in tmax_df_resp.mean().values]
if summary == [198356.5, 61.88]:
    print('\u2705 Great work! You correctly imported LA temperature data.')
else:
    print('\u274C Oops - your data are not correct.')

## STEP 4: CLEAN UP AND WRANGLE

### It's very rare for downloaded data to be formatted *exactly* how we want when it is imported into `Python`

First things first - Take a look at your data. Do you want to use it as is, or does it need to be modified?

![image.png](attachment:image.png)

### &#128187; YOUR TASK:

Below you will find some code that will extract the year from the funky yearmonth value NCEI gives us.

```python
dataframe['date'] = pd.to_datetime(dataframe.year, format='????')
dataframe
```

Complete the following with the code above as a starting point:
1. Copy and paste the code above into the code cell below
2. Replace `dataframe` with the name of **your** dataframe whenever it appears.
3. Replace `????` with the correct **date formatting string**. 
 
 > HINT: Check out [the Python strftime documentation](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).
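
As a rough illustration of how a date formatting string works, the sketch below parses made-up text dates written in day/month/year order. This is deliberately a different layout from the NCEI values, so the format string here is not the answer to the task.

```python
import pandas as pd

# Made-up date strings in day/month/year order, for illustration only
toy_dates = pd.Series(['15/01/2000', '15/02/2000'])

# %d = day of the month, %m = month number, %Y = four-digit year
parsed_dates = pd.to_datetime(toy_dates, format='%d/%m/%Y')
parsed_dates
```

Each `%` code stands for one piece of the date, and any other characters (like the `/` here) must match the text exactly.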

In [None]:
# YOUR CODE HERE

In [None]:
# RUN THIS TEST CELL TO CHECK YOUR ANSWER - DO NOT MODIFY
year_df_resp = _

# Check that a DataFrame was called for testing
if isinstance(year_df_resp, pd.DataFrame):
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')
    
# Check that the DataFrame has the correct values
if year_df_resp.date.dtype=='datetime64[ns]':
    print('\u2705 Great work! You correctly created a date column.')
else:
    print('\u274C Oops - you did not create a date column.')

### YOUR TASK: Resample the monthly data to annual data

```python
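# Average the values within each resampling period
df = df.set_index('date').resample('PERIOD_ALIAS').mean()
# Overwrite the year column with the year from the datetime index
df['year'] = df.index.year
df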
```

1. Copy the example code into the cell below
2. Replace 'PERIOD_ALIAS' with a period alias from [this list](https://pandas.pydata.org/docs/user_guide/timeseries.html#period-aliases) that will return annual values
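
To get a feel for what resampling does, here is a sketch that averages made-up monthly values into quarters. The alias `'QS'` (quarter start) is used only for illustration; it is not the annual alias this task asks for, and the data and names are hypothetical.

```python
import pandas as pd

# Made-up monthly values for one year, indexed by the first day of each month
toy_monthly = pd.DataFrame(
    {'value': list(range(1, 13))},
    index=pd.date_range('2020-01-01', periods=12, freq='MS'),
)

# 'QS' groups the rows into quarters labeled by each quarter's start date,
# and .mean() averages the three monthly values in each group
toy_quarterly = toy_monthly.resample('QS').mean()
toy_quarterly
```

Twelve monthly rows become four quarterly rows, each holding the mean of its three months. An annual alias works the same way with one row per year.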

In [None]:
# YOUR CODE HERE

## STEP 5: EXPORT THE WRANGLED DATA

### YOUR TASK:

This sample code will export your data into a `.csv` file, removing the datetime index (optional - check with your group!):

```python
la_ann_temp_df.to_csv(
    my_annual_path, 
    index=False)
```

  1. Copy the sample code into the cell below.
  2. Change `my_annual_path` to the path your group has agreed to use. Make sure to generate a **reproducible file path** using `os.path.join()`.
  3. Check that your file adheres to the format your group agreed upon.
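
One way to check the exported format is to read the file back in. The sketch below does this with made-up data written to a temporary folder; the file name `toy-annual.csv` and the column names are assumptions, not the names your group has to use.

```python
import os
import tempfile

import pandas as pd

# Made-up annual values with a datetime index, for illustration only
toy_annual = pd.DataFrame(
    {'year': [2000, 2001], 'temp_f': [61.2, 62.0]},
    index=pd.to_datetime(['2000-01-01', '2001-01-01']),
)

# Write to a temporary folder so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, 'toy-annual.csv')
    # index=False leaves the datetime index out of the file
    toy_annual.to_csv(toy_path, index=False)
    # Read the file back to see which columns were written
    check_df = pd.read_csv(toy_path)

check_df
```

With `index=False`, only the regular columns are written, so any information your group needs from the index (such as the year) has to exist as its own column first.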

In [None]:
# YOUR CODE HERE