<img style="float: left;" src="earth-lab-logo-rgb.png" width="150" height="150" />

# Earth Analytics Education - Bootcamp Course Fall 2020

## Important - Assignment Guidelines

1. Before you turn in your assignment, make sure to run the entire notebook with a fresh kernel. To do this, first **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart & Run All).
2. In the cells below you will replace the `raise NotImplementedError()` code with your code that addresses the activity challenge. If you don't replace that code, your notebook will not run properly.

```
# YOUR CODE HERE
raise NotImplementedError()
```

3. Any open-ended questions will have a "YOUR ANSWER HERE" within a markdown cell. Replace that text with your answer, also formatted using Markdown.
4. **IMPORTANT: DO NOT RENAME THIS NOTEBOOK!** If the file name changes, the autograder will not recognize your submission or grade your assignment properly.
5. When you plot, please comment out `plt.show()`, as the code below will effectively run `plt.show()` for you and will also grab your plot for autograding. DO NOT DELETE any code that says `DO NOT REMOVE LINE BELOW`. That code is for autograding!

```
### DO NOT REMOVE LINE BELOW ###
student_plot1_ax = nb.convert_axes(plt)
```



## Follow PEP 8 Syntax Guidelines

* Run the `autopep8` tool on all cells prior to submitting (HINT: hit shift + the tool to run it on all cells at once!)
* Use clear and expressive names for variables. 
* Organize your code to support readability.
* Check for code line length
* Use comments and white space sparingly, only where they are needed.


### Add Your Name Below 
**Your Name:**

<img style="float: left;" src="colored-bar.png"/>

---

# Week 9 Homework Template - Modular Code and Functions

To complete assignment 9, be sure to <a href="https://www.earthdatascience.org/courses/intro-to-earth-data-science/write-efficient-python-code/functions-modular-code/" target="_blank">review the chapter on functions in the Intro to Earth Data Science online textbook</a>.

## Assignment Data

For this assignment, you will write **Python** pseudocode, and later **Python** functions, to download and work with multiple datasets. The primary dataset you will use can be downloaded using **this url: https://ndownloader.figshare.com/files/25033508**.

In this download, you'll see three directories with files containing data by year:

* ca-fires-yearly/monthly-fire-count
    * The dataset contains the total number of fires (greater than 100 acres) that occurred in each month and year in California between 1992 and 2015. The data are organized with a file for each year from 1992 to 2015.
* ca-fires-yearly/monthly-mean-size
    * The dataset contains the mean fire size (acres) of all fires greater than 100 acres for each month and year in California between 1992 and 2015. The data are organized with a file for each year from 1992 to 2015.
* ca-fires-yearly/1992-2015-gt-100-acres
    * The dataset contains the cause of all fires greater than 100 acres in California between 1992 and 2015. The data are organized with a file for each year from 1992 to 2015.

Other data used includes average monthly maximum temperatures (Fahrenheit) for the Sonoma County Airport and San Diego area in California between 1999 and 2018 provided by <a href="https://w2.weather.gov/climate/xmacis.php?wfo=mtr" target="_blank">the National Weather Service</a>.

In [None]:
# Core imports needed for grading - Do not modify this cell!
import matplotcheck.notebook as nb
from matplotcheck.base import PlotTester

## Practice Pseudo-coding

In the Markdown cell below, answer the following questions using any kind of Markdown list.

For this assignment, you will need to:

1. open up a group of `.csv` files in a directory, 
2. extract the year from the `.csv` file name and add it to a new column in the `DataFrame`
3. store each DataFrame into a list object using `obj-name.append()`
4. create a single pandas `DataFrame` containing all of the data in all .csv files.
5. plot the data in that `DataFrame`. 

Write pseudocode that outlines the steps needed to complete the task above. 
Do not use actual code to complete this task. Rather, write down each step using a 
Markdown list and plain English words. 

As you are writing your pseudocode, consider how you could make each part of the workflow
generic enough to accept any set of `.csv` files that could be opened using Pandas.

YOUR ANSWER HERE

<img style="float: left;" src="colored-bar.png"/>

## Import Python Packages

In the cell below, add code **after the line for `Your Code Here`**, replacing `raise NotImplementedError()` with your code to import the package/module needed to:
* create plots
* set your working directory
* download data using earthpy functions
* work with numpy arrays

Be sure to list the package imports following the appropriate PEP 8 order and 
spacing requirements. 

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Test package imports - DO NOT MODIFY THIS CELL!
import_answer_points = 0

try:
    pd.NA
    print("\u2705 Score! Pandas has been imported as pd!")
    import_answer_points += 1
except NameError:
    print("\u274C Pandas has not been imported as pd, please make sure to import it properly.")

try:
    glob('~')
    print("\u2705 Cool! Glob was imported from glob!")
    import_answer_points += 1
except NameError:
    print("\u274C Glob has not been imported from glob, please make sure to import this properly.")

try:
    plt.show()
    print("\u2705 Nice! matplotlib.pyplot has been imported as plt!")
    import_answer_points += 1
except NameError:
    print("\u274C matplotlib.pyplot has not been imported as plt, please make sure to import this properly.")

try:
    os.getcwd()
    print("\u2705 Great work! The os module has imported correctly!")
    import_answer_points += 1
except NameError:
    print("\u274C Oops make sure that the os package is imported.")

try:
    data = et.io
    print("\u2705 Score! The earthpy package has imported correctly!")
    import_answer_points += 1
except NameError:
    print("\u274C Oops make sure that the earthpy package is imported using the alias et.")

print("\n \u27A1 You received {} out of 5 points.".format(import_answer_points))

import_answer_points

## Set Working Directory

In the cell below complete the following task:

* **Use a conditional statement** to:
    * Set the working directory to the **`earth-analytics/data` directory in your home directory** if the path exists.
    * Print a helpful message if the path does not exist. 
* **Use reusable variable(s) to reduce repetition in your code.**
* Use the `os` package to ensure that the paths you create will run successfully on any operating system.
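
A minimal sketch of the conditional pattern described above, assuming a hypothetical helper named `change_dir_if_exists` and a temporary directory that stands in for `earth-analytics/data` so the example runs anywhere. It illustrates the approach and is not the required solution:

```python
import os
import tempfile


def change_dir_if_exists(target_dir):
    """Change the working directory only if the target path exists.

    Hypothetical helper for illustration; returns True when the
    working directory was changed and False otherwise.
    """
    if os.path.exists(target_dir):
        os.chdir(target_dir)
        print("Working directory set to:", os.getcwd())
        return True
    print("Path does not exist:", target_dir)
    return False


original_dir = os.getcwd()

with tempfile.TemporaryDirectory() as temp_dir:
    # os.path.join builds the path with the correct separator for any OS
    existing_dir = os.path.join(temp_dir, "earth-analytics", "data")
    os.makedirs(existing_dir)
    missing_dir = os.path.join(temp_dir, "not-a-real-folder")

    changed = change_dir_if_exists(existing_dir)
    not_changed = change_dir_if_exists(missing_dir)

    # Return to the starting directory before the temp folder is removed
    os.chdir(original_dir)
```

Storing the path in a variable once and reusing it in both the check and the `os.chdir()` call is one way to reduce repetition. A home directory path can be obtained with `os.path.expanduser("~")`.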

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Download Data Using EarthPy

In the cell below, use **EarthPy** to download the data from **this url: https://ndownloader.figshare.com/files/25033508**.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL
# Tests that the working directory is set to earth-analytics/data

path = os.path.normpath(os.getcwd())
student_wd_parts = path.split(os.sep)

wd_points = 0

if student_wd_parts[-2:] == ['earth-analytics', 'data']:
    print("\u2705 Great - it looks like your working directory is set correctly to ~/earth-analytics/data")
    wd_points += 5
else:
    print("\u274C Oops, the autograder will not run unless your working directory is set to earth-analytics/data")

print("\n \u27A1 You received {} out of 5 points for setting your working directory.".format(
    wd_points))
wd_points

<img style="float: left;" src="colored-bar.png"/>

## Function Challenge 1: Define Function to Import Data into Pandas

In the cell below create a function that opens and converts a single csv file into a 
Pandas DataFrame with a Year column. The function should have the following properties.

1. Parameters:
    * The function takes in a file path pointing to a single `.csv` file. 
    * The function also takes in a tuple of the index positions where you expect the year to be within the file path string. For example, in the path `folder/1996_data.csv` the tuple could either be (7, 11) or (-13, -9). 
2. Returns:
    * The function should return a **Pandas** DataFrame with the data from the `.csv` file, indexed on the year extracted from the file path.
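
To see how such a tuple maps onto string slicing, here is a small sketch using the example path from the text. The variable names are illustrative assumptions:

```python
file_path = "folder/1996_data.csv"

# Unpack a (start, stop) tuple and use it to slice the path string
year_index = (7, 11)
start, stop = year_index
year_from_front = file_path[start:stop]

# Negative positions count from the end of the string instead
year_from_back = file_path[-13:-9]

print(year_from_front, year_from_back)
```

Both slices select the same four characters. Negative positions depend only on the characters that follow the year, so they keep working when the folder portion of the path changes length.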

Be sure to include a docstring with a brief description of the function (i.e. how it works, purpose) as well as identify the input parameters (i.e. type, description) and the returned output (i.e. type, description). Below is an example of how to format your docstring. 

```
def function(parameter1):
    """Description of function and
    what it does.
    
    Parameters
    ----------
    parameter1 : data_type_of_parameter1
        Description of parameter1
    
    Returns
    -------
    returned_data : data_type_of_returned_data
        Description of returned_data
    """
```

In [None]:
def import_csv(file_path, year_index):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL
import_works = False

# Create path to a single file as a test
single_file = os.path.join("earthpy-downloads",
                           "ca-fires-yearly",
                           "1992-2015-gt-100-acres",
                           "gt-100-acres-1993.csv")

try:
    output_df = import_csv(single_file, year_index=(-8, -4))
    # If the above is a dataframe, the line below should run
    output_df.columns
    print("\u2705 Great! Your 'import_csv' function returns a DataFrame! I didn't test that the dataframe also has a year column in it so please make sure it does have a year column to get full credit for this challenge when the autograder runs!")
    import_works = True
except NameError:
    print("\u274C Oops! Your import_csv function should return a dataframe. ")

In [None]:
# DO NOT MODIFY THIS CELL

## Call Help on Your Custom Function

In the cell below, call help on your custom function to see what it returns. If you properly formatted the docstring, you should expect to see a very useful print out of the descriptive text you added in the docstring!
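
As an illustration of what `help()` shows, the sketch below defines a hypothetical function, `fahrenheit_to_celsius` (not part of the assignment), with a numpy-style docstring and then calls `help()` on it:

```python
def fahrenheit_to_celsius(temp_fahr):
    """Convert a temperature from Fahrenheit to Celsius.

    Parameters
    ----------
    temp_fahr : int or float
        Temperature in degrees Fahrenheit.

    Returns
    -------
    temp_celsius : float
        Temperature in degrees Celsius.
    """
    return (temp_fahr - 32) * 5 / 9


# help() prints the function signature followed by the docstring text
help(fahrenheit_to_celsius)
```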

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Function Challenge 1: Use Your Function in a Workflow

Use a `for` loop to open and add the data in the directory:

`earthpy-downloads/ca-fires-yearly/monthly-fire-count`

to a new DataFrame following the same steps that you completed last week, but implementing your function.

1. loop through each `.csv` file sorted by name.
2. put the `.csv` file into your function and add the return from your function to a list. 
3. Outside of the for loop, combine all of the files into a single DataFrame and set the year column to be an index. 

Call the new DataFrame object at the end of the cell.
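
The loop, append, and combine pattern in the steps above can be sketched with made-up data. This example assumes two tiny demo files written to a temporary directory and reads them directly with `pd.read_csv()` rather than with your own function, so treat it as an outline of the pattern only:

```python
import os
import tempfile
from glob import glob

import pandas as pd

with tempfile.TemporaryDirectory() as temp_dir:
    # Create two small, made-up yearly files for illustration only
    for year, count in [("2001", 3), ("2000", 5)]:
        demo_df = pd.DataFrame({"month": ["Jan"], "count": [count]})
        demo_df.to_csv(os.path.join(temp_dir, "demo-" + year + ".csv"),
                       index=False)

    df_list = []
    # Sort the glob output so the files are processed in name order
    for csv_path in sorted(glob(os.path.join(temp_dir, "*.csv"))):
        yearly_df = pd.read_csv(csv_path)
        # The four characters before ".csv" hold the year in these names
        yearly_df["year"] = csv_path[-8:-4]
        df_list.append(yearly_df)

# Combine outside of the loop, then set the year column as the index
combined_df = pd.concat(df_list).set_index("year")
print(combined_df)
```

Note that the year is stored as a string here because it is sliced from the file path.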

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL

<img style="float: left;" src="colored-bar.png"/>

## Function Challenge 2: Reuse Your Function

One of the benefits of functions is being able to use them in multiple areas of your code. Repeat the exercise above, but with the data in the fire size directory.

`earthpy-downloads/ca-fires-yearly/monthly-mean-size`

Once again, you will:

1. loop through each `.csv` file sorted by name.
2. put the `.csv` file into your function and add the return from your function to a list. 
3. outside of the for loop, combine all of the files into a single DataFrame and set the year column to be an index. 

Call the new `DataFrame` object at the end of the cell.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL

<img style="float: left;" src="colored-bar.png"/>

## Function Challenge 3: Create a Function That Loops Through a Directory of .csv Files

You may have noticed that a lot of the code that you use to loop through the directories 
and open `.csv` files is repeated. You can also make that entire loop workflow into a 
function that calls the function that you wrote above to open a `.csv` file. In the cell below create a function with the following properties.

1. Parameters:
    * The function takes in an unsorted list of `.csv` files (such as the output from a `glob` operation).
    * The function also takes in a tuple of the index positions where you expect the year to be within the file path string. For example, in the path `folder/1996_data.csv` the tuple could either be (7, 11) or (-13, -9). 
2. Returns:
    * The function should return a **Pandas** DataFrame that contains all of the data stored within that folder, and is indexed on the year column with the data you extracted from the file path. 
3. Other Requirements:
    * The function should call your previous function to open up the individual `.csv` files inside the `for` loop. 
    * Your function needs a numpy-style docstring following the same format outlined above for your first function. 
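
Because `glob` does not guarantee any particular order, a function that accepts an unsorted list usually sorts it before looping. A short sketch with hypothetical file paths:

```python
import os

# Hypothetical, unsorted file paths such as a glob operation might return
unsorted_paths = [
    os.path.join("folder", "1998_data.csv"),
    os.path.join("folder", "1996_data.csv"),
    os.path.join("folder", "1997_data.csv"),
]

# sorted() returns a new list and leaves the original list untouched
sorted_paths = sorted(unsorted_paths)

# The first four characters of each file name hold the year
years = [os.path.basename(path)[0:4] for path in sorted_paths]
print(years)
```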

In [None]:
def process_csv_files(file_path_list, year_index):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL
process_function_points = 0
process_works = False
single_dir = os.path.join("earthpy-downloads",
                          "ca-fires-yearly",
                          "1992-2015-gt-100-acres")

two_files = sorted(glob(os.path.join(single_dir, "*.csv")))[0:2]
two_files

try:
    out_df = process_csv_files(two_files, (-8, -4))
    out_df.columns
    process_works = True
    print("\u2705 Function 'process_csv_files' returns a dataframe!")
    process_function_points += 5
    if len(out_df) == 424:
        print("\u2705 Great! It looks like your function combines csv files correctly!")
        process_function_points += 5
    else:
        print("\u274C The function doesn't seem to combine csv files correctly.")

except Exception:
    print("\u274C Oops! It looks like your function doesn't return a dataframe.")

print("\n \u27A1 You received {} out of 10 points for your function being defined properly. Please check that it also returns a year column.".format(
    process_function_points))
process_function_points

## Call Help on your Custom Function

In the cell below, call help on your custom function to see what it returns.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

<img style="float: left;" src="colored-bar.png"/>

## Function Challenge 4: Create a DataFrame Using Your Function

In the cell below, use `glob` to get all of the `.csv` files from 

`earthpy-downloads/ca-fires-yearly/1992-2015-gt-100-acres`

Put your **unsorted** output from glob into your function and output the combined `DataFrame`.

Call the output from your function at the end of the cell.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL

<img style="float: left;" src="colored-bar.png"/>

## Function Challenge 5 - Loop Through Nested Directories 

In the next section of this assignment, you will implement functions 
using a set of nested directories. To begin, download the data needed 
from **this url: https://ndownloader.figshare.com/files/21894528** using earthpy
in the cell below.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Loop Through Nested Directories

Sometimes you have a nested data directory structure that you need to process. 

The data you just downloaded contains a folder named `avg-monthly-temp-fahr`, which 
holds two other folders, `San-Diego` and `Sonoma`. Those folders contain `.csv` 
files for each year from 1999 to 2003 that contain the monthly maximum temperature for 
each location. 

Write a function that accepts the path to the parent folder, `avg-monthly-temp-fahr`, 
and creates a single **Pandas** DataFrame with all the data in both the `San-Diego` 
and `Sonoma` folders for monthly maximum temperature. The year and the location (Sonoma 
vs. San-Diego) should be added as new columns in the `DataFrame` using the year
in the file name and the location in each directory.

In the cell below create a function with the following properties.

1. Parameters:
    * The function takes in the name of the folder to be looped through.
    * The function also takes in a tuple of the index positions where you expect the year to be within the file path string. For example, in the path `folder/1996_data.csv` the tuple could either be (7, 11) or (-13, -9). 
2. Returns:
    * The function should return a **Pandas** DataFrame that contains all of the data stored within that folder, and is indexed on the year column with the data you extracted from the file path. 
3. Other Requirements:
    * The function should use the function that you created above (`process_csv_files()`), which takes in a list of files to open in a `for` loop. 
    * Your function needs a numpy-style docstring. 
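
One way to walk a nested folder structure is to glob the parent folder, keep only the sub-directories, and take each directory's base name as the location. The sketch below assumes a temporary folder with empty placeholder files standing in for the downloaded data. All of the file names are made up:

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as parent_dir:
    # Build a small stand-in for the nested folder structure
    for location in ["Sonoma", "San-Diego"]:
        location_dir = os.path.join(parent_dir, location)
        os.makedirs(location_dir)
        # Empty placeholder file with a made-up name
        open(os.path.join(location_dir, location + "-1999.csv"), "w").close()

    found = []
    for a_dir in sorted(glob(os.path.join(parent_dir, "*"))):
        # Skip anything in the parent folder that is not a directory
        if not os.path.isdir(a_dir):
            continue
        # The last part of the directory path is the location name
        location_name = os.path.basename(a_dir)
        for csv_path in sorted(glob(os.path.join(a_dir, "*.csv"))):
            found.append((location_name, csv_path[-8:-4]))

print(found)
```

In a full solution, the list of `.csv` paths found in each sub-directory could be passed to `process_csv_files()` and the location name added as a new column to the result.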

In [None]:
def process_nested_csv_files(folder_name, year_index):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL

nested_function_points = 0
nested_works = False

a_dir = os.path.join(
    "earthpy-downloads",
    "avg-monthly-temp-fahr")

try:
    out_df_challenge_5 = process_nested_csv_files(a_dir, (-8, -4))
    out_df_challenge_5.columns
    nested_works = True
    print("\u2705 Great, your function 'process_nested_csv_files' returns a dataframe!")
    nested_function_points += 5
except NameError:
    print("\u274C Oops, make sure your function returns a dataframe or something else went wrong.")

print("\n \u27A1 You received {} out of 5 points for your function being defined properly.".format(
    nested_function_points))
nested_function_points

## Call Help on your Custom Function

In the cell below, call help on your custom function to see what it returns.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

<img style="float: left;" src="colored-bar.png"/>

## Function Challenge 6: Create a Combined DataFrame with your Function

Run your function on the `avg-monthly-temp-fahr` folder that was downloaded earlier. This should return a **pandas** DataFrame that has all the data from both folders in it, with the index set to the year and the location added as a new column.
 
Call the output from your function at the end of the cell.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL

## PEP 8, Spelling and Does the Notebook Run?
In this cell, we will give you points for the following:

1. PEP 8 is followed throughout the notebook (4 points)
2. Spelling and grammar are considered in your written responses above (4 points)
3. The notebook runs from top to bottom without any editing, i.e. it is reproducible (4 points)