<img style="float: left;" src="earth-lab-logo-rgb.png" width="150" height="150" />

# Earth Analytics Education

## Important - Assignment Guidelines

1. Before you submit your assignment to GitHub, make sure to run the entire notebook with a fresh kernel. To do this, first **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart & Run All).
2. Always replace the `raise NotImplementedError()` code with your code that addresses the activity challenge. If you don't replace that code, your notebook will not run.

```
# YOUR CODE HERE
raise NotImplementedError()
```

3. Any open-ended questions will have "YOUR ANSWER HERE" in a Markdown cell. Replace that text with your answer, also formatted using Markdown.
4. **DO NOT RENAME THIS NOTEBOOK FILE!** If the file name changes, the autograder will not grade your assignment properly.

* Only include the package imports, code, and outputs that are required to run your homework assignment.
* Be sure that your code can be run on any operating system. This means that:
   1. the data should be downloaded in the notebook to ensure it's reproducible
   2. all paths should be created dynamically using `os.path.join`
   3. sort lists of dated files even if they are sorted correctly by default on your machine
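
As a minimal sketch of the last two points, the example below builds a search pattern with `os.path.join` and sorts the matching files. The file names (`fires-1999.csv` and so on) are hypothetical and are written to a temporary directory, not taken from the assignment data.

```python
import os
import tempfile
from glob import glob

# Create a throwaway directory with a few hypothetical dated files
data_dir = tempfile.mkdtemp()
for year in [2001, 1999, 2000]:
    file_path = os.path.join(data_dir, "fires-{}.csv".format(year))
    with open(file_path, "w") as file:
        file.write("placeholder\n")

# Build the search pattern with os.path.join so that it works on any OS
pattern = os.path.join(data_dir, "fires-*.csv")

# glob does not guarantee an order, so sort the result explicitly
dated_files = sorted(glob(pattern))
file_names = [os.path.basename(path) for path in dated_files]
print(file_names)
```

Sorting works chronologically here only because the year is written the same way in every file name.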

## Follow PEP 8 Syntax Guidelines & Documentation

* Run the `autopep8` tool on all cells prior to submitting (HINT: hold Shift while clicking the tool to run it on all cells at once!)
* Use clear and expressive names for variables. 
* Organize your code to support readability.
* Check for code line length
* Use comments and white space sparingly, where they are needed
* Make sure all Python imports are at the top of your notebook and follow PEP 8 order conventions
* Spell check your Notebook before submitting it.

For all of the plots below, be sure to do the following:

* Make sure each plot has a clear TITLE and, where appropriate, label the x and y axes. Be sure to include UNITS in your labels.


### Add Your Name Below 
**Your Name:**

<img style="float: left;" src="colored-bar.png"/>

---

# Review assignment

## Import packages

Import the packages you will need to:
* build cross-platform paths
* find files using a pattern
* download data using earthpy
* create plots
* work with numpy arrays
* work with pandas DataFrames
* use the seaborn plot options (use alias sns)

**BONUS: set the plot theme to the seaborn default**

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Test package imports - DO NOT MODIFY THIS CELL!
import_answer_points = 0

# Imports for testing
import hashlib

try:
    os.getcwd()
    print("\u2705 Great work! The os module has imported correctly!")
    import_answer_points += 1
except NameError:
    print("\u274C Oops make sure that the os package is imported.")
    
try:
    files = glob('~')
    print("\u2705 Great work! The glob module has imported correctly!")
    import_answer_points += 1
except NameError:
    print("\u274C Oops make sure that the glob package is imported.")

try:
    data = et.io
    print("\u2705 Great work! The earthpy package has imported correctly!")
    import_answer_points += 1
except NameError:
    print(("\u274C Oops make sure that the earthpy package is imported "
           "using the alias et."))

try:
    plt.show()
    print("\u2705 Nice! matplotlib.pyplot has been imported as plt!")
    import_answer_points += 1
except NameError:
    print(("\u274C matplotlib.pyplot has not been imported as plt, "
           "please make sure to import it properly."))
    
try:
    np.nan
    print("\u2705 Score! Numpy has been imported as np!")
    import_answer_points += 1
except NameError:
    print(("\u274C Numpy has not been imported as np, "
           "please make sure to import it properly."))

try:
    no_data = pd.NA
    print("\u2705 Score! Pandas has been imported as pd!")
    import_answer_points += 1
except NameError:
    print(("\u274C Pandas has not been imported as pd, "
           "please make sure to import it properly."))

try:
    sns.set_theme()
    print("\u2705 Score! Seaborn has been imported as sns!")
    import_answer_points += 1
except NameError:
    print(("\u274C Seaborn has not been imported as sns, "
           "please make sure to import it properly."))
    
print("\n \u27A1 You received {} out of 7 points.".format(
    import_answer_points))

import_answer_points

## Set up data directory and path and download ca-fires-yearly dataset

We are using the ca-fires-yearly data for this set of exercises. Below, get set up to use this dataset by:
* Creating the earth-analytics/data directory if it does not exist
* Setting the data directory as the working directory
* Downloading the ca-fires-yearly data

Download url: https://ndownloader.figshare.com/files/25033508
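
The directory steps above can be sketched as follows. This is an illustration only: a temporary directory stands in for your home directory, and the demo restores the original working directory at the end, which you would not do in the assignment. The download step is left as a comment; earthpy's `et.data.get_data(url=...)` is one way to do it.

```python
import os
import tempfile

# A temporary directory stands in for your home directory in this sketch;
# in the assignment you would likely start from os.path.expanduser("~")
home_dir = tempfile.mkdtemp()
data_dir = os.path.join(home_dir, "earth-analytics", "data")

# Create the directory only if it does not already exist
if not os.path.exists(data_dir):
    os.makedirs(data_dir)

# Remember the starting directory so this demo can restore it afterwards
start_dir = os.getcwd()
os.chdir(data_dir)
working_dir_parts = os.path.normpath(os.getcwd()).split(os.sep)
print(working_dir_parts[-2:])

# The download itself would happen here, e.g. with et.data.get_data(url=...)
os.chdir(start_dir)
```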

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL

# Tests that the working directory is set to earth-analytics/data
# And that the data download directory exists

path = os.path.normpath(os.getcwd())
student_wd_parts = path.split(os.sep)

wd_points = 0

if student_wd_parts[-2:] == ['earth-analytics', 'data']:
    print(("\u2705 Great - it looks like your working directory is set "
           "correctly to .../earth-analytics/data"))
    wd_points += 3
else:
    print(("\u274C Oops, the autograder will not run unless your working "
           "directory is set to earth-analytics/data"))

# Tests that California Fires Dataset is downloaded
ca_fires_yearly_path = os.path.join(
    "earthpy-downloads", 
    "ca-fires-yearly")

if os.path.exists(ca_fires_yearly_path):
    print(("\u2705 Great - it looks like you successfully downloaded the "
           "ca-fires-yearly dataset to .../earthpy-downloads/ca-fires-yearly"))
    wd_points += 2
else:
    print(("\u274C Oops, you still need to download the ca-fires-yearly data"))
    
print(("\n \u27A1 You received {} out of 5 points for setting your working "
       "directory.").format(wd_points))

wd_points

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Challenge 1: Calculate the number of fires between 1992-2015 greater than 100 acres

Use a simple loop to go through the files in the 1992-2015-gt-100-acres directory in your CA yearly fires directory, and calculate the total number of fires by adding up the number of fires (rows) in each file.

If you are going to calculate a value in a loop - remember to set up a variable and initialize it.

**Call your variable with the number of total fires at the end of the cell**
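
The accumulator pattern described above might look like the sketch below. It uses three small hypothetical CSV files written to a temporary directory and counts rows with the standard-library `csv` module; with pandas you could instead add `len(df)` for each file you read.

```python
import csv
import os
import tempfile
from glob import glob

# Write three small hypothetical yearly files: a header plus N fire rows
data_dir = tempfile.mkdtemp()
rows_per_year = {1992: 2, 1993: 3, 1994: 1}
for year, n_rows in rows_per_year.items():
    file_path = os.path.join(data_dir, "fires-{}.csv".format(year))
    with open(file_path, "w", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(["fire_name", "fire_size"])
        for row_num in range(n_rows):
            writer.writerow(["fire-{}".format(row_num), 150.0])

# Initialize the counter before the loop, then add to it on each pass
total_fires = 0
for file_path in sorted(glob(os.path.join(data_dir, "*.csv"))):
    with open(file_path, newline="") as file:
        # Subtract 1 so the header row is not counted as a fire
        total_fires += sum(1 for row in csv.reader(file)) - 1

total_fires
```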

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL
# Testing to see if the number of fires is an integer

student_num_fires = _
numfires_points = 0

if isinstance(student_num_fires, int):
    print("\u2705 Result is an integer, good job!")
else:
    print("\u274C Result should be an integer.")

ans_hash = hashlib.sha256(
    student_num_fires.to_bytes(16, 'little', signed=False)).hexdigest()
if ans_hash=='68c45f157847777ec1c5b7db7f377cefd044e9db329a31dce4a0e94801a7715d':
    print("\u2705 You correctly computed the number of fires, good job!")
    numfires_points += 5
else:
    print("\u274C That is not the right number of fires.")

print("\n \u27A1 You received {} out of 5 points.".format(
      numfires_points))

numfires_points

## Challenge 2:  Concatenate annual files into single dataframe

Process a series of dataframes and extract the year value from the file name, storing as an **integer column** in the dataframe. **Set the DataFrame Index as the fire unique id (fd_unq_id), and sort by the index.**
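
One possible shape for this workflow is sketched below. The file names (`fires-1992.csv`) and the small in-memory DataFrames are hypothetical stand-ins for what `pd.read_csv` would return; check how the year actually appears in your file names before reusing the parsing step.

```python
import os

import pandas as pd

# Hypothetical file paths and contents standing in for pd.read_csv results
yearly_dfs = {
    os.path.join("fires", "fires-1993.csv"): pd.DataFrame(
        {"fd_unq_id": [30, 10], "fire_size": [120.0, 450.0]}),
    os.path.join("fires", "fires-1992.csv"): pd.DataFrame(
        {"fd_unq_id": [20], "fire_size": [300.0]}),
}

df_list = []
for file_path in sorted(yearly_dfs):
    year_df = yearly_dfs[file_path].copy()
    # "fires-1992.csv" -> "fires-1992" -> "1992" -> 1992
    file_stem = os.path.splitext(os.path.basename(file_path))[0]
    year_df["year"] = int(file_stem.split("-")[-1])
    df_list.append(year_df)

# Combine the yearly tables, then index and sort by the unique fire id
all_fires_df = (pd.concat(df_list)
                .set_index("fd_unq_id")
                .sort_index())
all_fires_df
```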

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL

student_dataframe_all_fires = _


if isinstance(student_dataframe_all_fires, pd.DataFrame):
    print("\u2705 Object created is a dataframe, good job!")
else:
    print("\u274C Object created is not a dataframe.")

if student_dataframe_all_fires.index.name == 'fd_unq_id':
    print("\u2705 Successfully read in the fire id column as the index!")
else:
    print("\u274C The index is not set to the fire id column.")
    
if pd.api.types.is_integer_dtype(student_dataframe_all_fires.year):
    print("\u2705 Successfully created an integer year column!")
else:
    print("\u274C The year is not an integer column")

In the cell below, answer the following in a **numbered list**:
1. What are the advantages of using the unique id as an index instead of the year as you did in previous assignments?
2. What are the advantages of using sorted year as the index?

YOUR ANSWER HERE

## Challenge 3: Group and aggregate

Use the pandas DataFrame method `.groupby` to take the **yearly maximum** and **monthly mean** fire size, and then call both dataframes, e.g.:

max_fire_size_yearly_df, mean_fire_size_monthly_df

**Use the month_num column to take the monthly mean so that the DataFrame will be in the right order**
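
As a rough illustration of `.groupby` with two different aggregations, the sketch below uses a small made-up table. The column names mirror the ones mentioned in this notebook, but the values are invented.

```python
import pandas as pd

# Small made-up table; the real data has many more rows and columns
fires_df = pd.DataFrame({
    "year": [1992, 1992, 1993, 1993],
    "month_num": [7, 8, 7, 8],
    "fire_size": [100.0, 300.0, 500.0, 200.0],
})

# Selecting with a list ([["fire_size"]]) keeps the result a DataFrame
max_size_yearly_df = fires_df.groupby("year")[["fire_size"]].max()
mean_size_monthly_df = fires_df.groupby("month_num")[["fire_size"]].mean()

max_size_yearly_df, mean_size_monthly_df
```

Grouping by a numeric month column returns the groups in numeric order, which is why `month_num` is preferable to month names here.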

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL

# Tests if the results are DataFrames
# Makes sure that the yearly and monthly results are the right length

student_max_yearly, student_mean_monthly = _
summary_pts = 0

if isinstance(student_max_yearly, pd.DataFrame):
    print("\u2705 First object created is a dataframe, good job!")
    summary_pts += 1
else:
    print("\u274C First object created is not a dataframe.")
    

if len(student_max_yearly)==24:
    print("\u2705 First object created is a yearly summary, good job!")
    summary_pts += 3
else:
    print("\u274C First object created is not a yearly summary.") 
    
if isinstance(student_mean_monthly, pd.DataFrame):
    print("\u2705 Second object created is a dataframe, good job!")
    summary_pts += 1
else:
    print("\u274C Second object created is not a dataframe.")
    
if len(student_mean_monthly)==12:
    print("\u2705 Second object created is a monthly summary, good job!")
    summary_pts += 3
else:
    print("\u274C Second object created is not a monthly summary.") 
    
if round(student_max_yearly.fire_size.mean(), 2)==115471.0:
    print("\u2705 First dataframe has the right values!")
    summary_pts += 6
else:
    print("\u274C Check your DataFrame values.")
    
if round(student_mean_monthly.fire_size.mean(), 2)==2349.63:
    print("\u2705 Second Dataframe has the right values!")
    summary_pts += 6
else:
    print("\u274C Check your DataFrame values.")
    
print('You earned {} of 20 points for summarizing your data'
      .format(summary_pts))
summary_pts

## Challenge 4: Add human ignition flag and season fields

Add two columns to your dataframe:
* A boolean `human_ignition` column that is true if the cause is one of Arson, Smoking, Equipment Use, Campfire, Powerline, Railroad, or Debris Burning
* An ordered Categorical `season` column using MAM/JJA/SON/DJF seasons. Call `pd.Categorical()` on the resulting column to make it ordered; spring (MAM) should be the first season.

BONUS: Try to create the `season` column by writing your own `month_to_season` function and applying it to the month or month_num column AND/OR create a DateTime column and derive the season from it. Note that the `dt.season` attribute comes from xarray's datetime accessor; the pandas `.dt` accessor provides `dt.month` and `dt.quarter` instead.

**Call your modified dataframe at the end of the cell**
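
A minimal sketch of both columns follows, assuming a hypothetical `cause` column (the real column name in the dataset may differ) and made-up rows. The `month_to_season` helper is one possible way to write the function mentioned in the bonus.

```python
import pandas as pd


def month_to_season(month_num):
    """Return the meteorological season label for a month number (1-12)."""
    if month_num in (3, 4, 5):
        return "MAM"
    elif month_num in (6, 7, 8):
        return "JJA"
    elif month_num in (9, 10, 11):
        return "SON"
    else:
        return "DJF"


# Made-up rows; "cause" is a hypothetical column name
fires_df = pd.DataFrame({
    "month_num": [1, 4, 7, 10],
    "cause": ["Lightning", "Arson", "Campfire", "Miscellaneous"],
})

human_causes = ["Arson", "Smoking", "Equipment Use", "Campfire",
                "Powerline", "Railroad", "Debris Burning"]
fires_df["human_ignition"] = fires_df["cause"].isin(human_causes)

# Listing the categories explicitly controls their order (spring first)
fires_df["season"] = pd.Categorical(
    fires_df["month_num"].apply(month_to_season),
    categories=["MAM", "JJA", "SON", "DJF"],
    ordered=True)
fires_df
```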

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL

# Tests if the result is a DataFrame with human_ignition and season columns

student_add_columns_df = _
add_columns_pts = 0

if isinstance(student_add_columns_df, pd.DataFrame):
    print("\u2705 Object is a dataframe, good job!")
    add_columns_pts += 1
else:
    print("\u274C Object is not a dataframe.")
    
if 'human_ignition' in student_add_columns_df.columns:
    print("\u2705 Dataframe has a human_ignition column, good job!")
    add_columns_pts += 2
else:
    print("\u274C Dataframe is missing a human_ignition column.")
    
if 'season' in student_add_columns_df.columns:
    print("\u2705 Dataframe has a season column, good job!")
    add_columns_pts += 2
else:
    print("\u274C Dataframe is missing a season column.")
    
if len(student_add_columns_df)==4101:
    print("\u2705 Dataframe has the right number of rows, good job!")
    add_columns_pts += 5
else:
    print("\u274C Dataframe has the wrong number of rows.")
    
if round(student_add_columns_df.fire_size.mean(), 2)==2995.31:
    print("\u2705 Dataframe has the right values!")
    add_columns_pts += 10
else:
    print("\u274C Check your DataFrame values.")
    
print("You earned {} of 20 points for adding columns to your data"
      .format(add_columns_pts))
add_columns_pts


## Challenge 5: Making a copy of a dataframe versus a slice

If you change the structure of a dataframe while working with only a slice of it, pandas will give you a warning message. If modifying a subset is what you truly want to do, use the `.copy()` method to make a new copy of the subset of rows/columns you are working with. Here we want to work on just the subset of fires that are human-caused. Because the subset is a copy, you can link in other table data and manipulate it. The copy is smaller than the original, but it is also a completely new copy of the data: any changes to the human-caused fires will not affect the original dataframe.

Try this out by writing code to do the following:
* Make a copy of just the rows flagged as human_ignition
* Convert fire_name to title case
* Call the first 5 rows of the fire_name column for both the original and the copy and note the difference
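
The difference between the original and the copy can be seen in a small sketch like this one, which uses made-up fire names:

```python
import pandas as pd

# Made-up data with upper-case fire names
fires_df = pd.DataFrame({
    "fire_name": ["OAK CREEK", "PINE RIDGE", "BEAR VALLEY"],
    "human_ignition": [True, False, True],
})

# .copy() makes an independent DataFrame rather than a view of the original
human_fires_df = fires_df[fires_df["human_ignition"]].copy()
human_fires_df["fire_name"] = human_fires_df["fire_name"].str.title()

print(fires_df["fire_name"].head())
print(human_fires_df["fire_name"].head())
```

Only the copy is converted to title case; the names in `fires_df` stay upper case.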

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Challenge 6: Looping through nested directories

Loop through nested directories to create a data catalog of the "ca-fires-yearly" dataset with the following specifications:
* The data catalog should be a DataFrame with the following columns:
  * The `dataset` column should contain the name of the directory you are searching, such as "monthly-mean-size"
  * The `file_name` column should contain strings of the basename only of each file in alphabetical (and chronological) order
  * The `year` column should be an integer column of the year extracted from the file name
  * The `file_rows` column should contain the number of rows in the corresponding file
  * The `file_columns` column should contain the number of columns in the corresponding file
  
HINT: There are a couple of ways to create DataFrames from scratch. [Check out the pandas documentation for examples.](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html)

**Call your data catalog dataframe at the end of the cell**
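
The nested-loop structure could be sketched as below. The directory names (`set-a`, `set-b`) and file names are hypothetical, and the example collects plain dictionaries using only the standard library, so it illustrates the pattern but does not produce the final DataFrame.

```python
import csv
import os
import tempfile
from glob import glob

# Build a tiny hypothetical dataset: two sub-directories of yearly files
dataset_dir = tempfile.mkdtemp()
for dataset in ["set-a", "set-b"]:
    os.makedirs(os.path.join(dataset_dir, dataset))
    for year in [1993, 1992]:
        file_path = os.path.join(
            dataset_dir, dataset, "{}-{}.csv".format(dataset, year))
        with open(file_path, "w", newline="") as file:
            writer = csv.writer(file)
            writer.writerow(["fire_name", "fire_size"])
            writer.writerow(["example", 150.0])

records = []
# Outer loop: each sub-directory; inner loop: each file inside it
for dir_path in sorted(glob(os.path.join(dataset_dir, "*"))):
    for file_path in sorted(glob(os.path.join(dir_path, "*.csv"))):
        with open(file_path, newline="") as file:
            rows = list(csv.reader(file))
        file_stem = os.path.splitext(os.path.basename(file_path))[0]
        records.append({
            "dataset": os.path.basename(dir_path),
            "file_name": os.path.basename(file_path),
            "year": int(file_stem.split("-")[-1]),
            "file_rows": len(rows) - 1,  # exclude the header row
            "file_columns": len(rows[0]),
        })

records
```

A list of dictionaries like `records` can be passed directly to `pd.DataFrame()`, with each dictionary becoming one row.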

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL

# Tests if the result is a DataFrame with the correct columns

student_catalog_df = _
catalog_pts = 0

if isinstance(student_catalog_df, pd.DataFrame):
    print("\u2705 Object is a dataframe, good job!")
    catalog_pts += 1
else:
    print("\u274C Object is not a dataframe.")
    
if all([col in student_catalog_df.columns.values
        for col 
        in ['dataset', 'file_name', 'year', 'file_rows', 'file_columns']]):
    print("\u2705 Dataframe has all the right columns, good job!")
    catalog_pts += 3
else:
    print("\u274C Dataframe is missing required columns.")
    
if len(student_catalog_df)==72:
    print("\u2705 Dataframe has the right number of rows, good job!")
    catalog_pts += 5
else:
    print("\u274C Dataframe has the wrong number of rows.")
    
if round(student_catalog_df.file_rows.mean(), 2)==57.62:
    print("\u2705 Dataframe has the right values!")
    catalog_pts += 6
else:
    print("\u274C Check your DataFrame values.")
    
print("You earned {} of 15 points for creating the data catalog"
      .format(catalog_pts))
catalog_pts

## Challenge 7: Convert challenge 6 into a function

**Make sure to include a numpy-style docstring**
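
For reference, a numpy-style docstring has a one-line summary followed by underlined `Parameters` and `Returns` sections. The function below is an unrelated, hypothetical example that only shows the layout:

```python
def acres_to_hectares(acres):
    """Convert an area from acres to hectares.

    Parameters
    ----------
    acres : float
        Area in acres.

    Returns
    -------
    float
        Area in hectares.
    """
    return acres * 0.40468564224


acres_to_hectares(100)
```

Calling `help(acres_to_hectares)` displays this docstring.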

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Challenge 8: Call create_dataset_catalog

Call the function you just created

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
student_catalog2_df = _
catalog2_pts = 0

if len(student_catalog2_df)==72:
    print("\u2705 Dataframe has the right number of rows, good job!")
    catalog2_pts += 5
else:
    print("\u274C Dataframe has the wrong number of rows.")
    
if round(student_catalog2_df.file_rows.mean(), 2)==57.62:
    print("\u2705 Dataframe has the right values!")
    catalog2_pts += 5
else:
    print("\u274C Check your DataFrame values.")
    
print("You earned {} of 10 points for calling your catalog function"
      .format(catalog2_pts))
catalog2_pts

## Challenge 9: Call help for catalog function

Call `help()` on the function you just created.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Challenge 10: Plot fires by cause (homework 8 challenge 10 revisited)

Create a plot using a for loop and conditional statement to the following specifications:
* Filter data to size class 'G' and 2010 or later
* Scatter plot of fire size (y-axis) vs. season (x-axis) labeled by cause
* Color human causes 'red' and non-human causes 'blue'
* Adjust the size and layout so that labels do not overlap
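
The loop-plus-conditional pattern might look like the sketch below. The points are made up and stored in a plain list of tuples instead of a filtered DataFrame, and the cause labels are shown in a legend, which is only one of several ways to label them.

```python
import matplotlib.pyplot as plt

# Made-up points: (season, fire size in acres, cause, human-caused flag)
fires = [
    ("MAM", 6000.0, "Arson", True),
    ("JJA", 25000.0, "Lightning", False),
    ("SON", 9000.0, "Powerline", True),
]

fig, ax = plt.subplots(figsize=(8, 6))
point_colors = []
for season, fire_size, cause, human_ignition in fires:
    # Conditional statement picks the color for each point
    if human_ignition:
        color = "red"
    else:
        color = "blue"
    point_colors.append(color)
    ax.scatter([season], [fire_size], color=color, label=cause)

ax.set(title="Hypothetical Fire Size by Season",
       xlabel="Season",
       ylabel="Fire Size (acres)")
ax.legend(title="Cause")
fig.tight_layout()
```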

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Challenge 11: Try plotting using the seaborn stripplot instead
This will let you see overlapping data. Set the hue to human_ignition and dodge to True. You won't need a for loop for this plot.

[Check out the documentation for some differences from matplotlib, including the data parameter](http://seaborn.pydata.org/generated/seaborn.stripplot.html#seaborn.stripplot)

In [None]:
# YOUR CODE HERE
raise NotImplementedError()