# Wildfire Data Acquisition
This notebook reads in the complete [Wildfire dataset](https://www.sciencebase.gov/catalog/item/61aa537dd34eb622f699df81), which can be retrieved from a US government repository. It also filters the wildfire data to wildfires occurring within TODO

This notebook depends on [Pyproj](https://pyproj4.github.io/pyproj/stable/index.html), the [geojson](https://pypi.org/project/geojson/) module, and the `wildfire` user module developed by Dr. David W. McDonald.
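Before running the cells below, it can help to confirm that these dependencies are importable. The following is a minimal sketch, assuming the top-level package names are `pyproj`, `geojson`, and `wildfire` (the last being the user module, which must be on the import path); `find_missing` is a hypothetical helper written for this example.

```python
from importlib.util import find_spec

# Top-level package names assumed from the dependency list above
REQUIRED_PACKAGES = ["pyproj", "geojson", "wildfire"]


def find_missing(packages):
    """Return the names in `packages` that cannot be located on the import path."""
    return [name for name in packages if find_spec(name) is None]


missing = find_missing(REQUIRED_PACKAGES)
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All required packages were found")
```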

### License
This code was adapted from an example developed by Dr. David W. McDonald for use in DATA 512, a course in the UW MS Data Science degree program. This code is provided under the [Creative Commons](https://creativecommons.org) [CC-BY license](https://creativecommons.org/licenses/by/4.0/). Revision 1.1 - August 16, 2024

## Import Libraries

In [1]:
import json

#    The 'wildfire' module is a user module. 
from wildfire.Reader import Reader as WFReader

## Read the raw wildfire data

In [2]:
# Define the filepath location of the raw data
raw_data_filepath = "../data_raw/USGS_Wildland_Fire_Combined_Dataset.json"

#
#    Open a new wildfire reader for the raw data file
#
print(f"Attempting to open '{raw_data_filepath}' with wildfire.Reader() object")
wfreader = WFReader(raw_data_filepath)


Attempting to open '../data_raw/USGS_Wildland_Fire_Combined_Dataset.json' with wildfire.Reader() object


In [3]:

#    Load the data using the wildfire module, and create a feature_list object containing the loaded data

# The full dataset consists of 135,061 fires; for now we cap the load at 1,000 features
MAX_FEATURE_LOAD = 1000  # TODO: raise this limit to load the full dataset
feature_list = list()
feature_count = 0

# A rewind() on the reader object makes sure we're at the start of the feature list
# This way, we can execute this cell multiple times and get the same result 
wfreader.rewind()
# Now, read through each of the features, saving them as dictionaries into a list
feature = wfreader.next()
while feature:
    feature_list.append(feature)
    feature_count += 1
    # if we're loading a lot of features, print progress
    if (feature_count % 100) == 0:
        print(f"Loaded {feature_count} features")
    # stop once we've loaded the maximum number of features allowed
    if feature_count >= MAX_FEATURE_LOAD:
        break
    feature = wfreader.next()
#
#    Print the number of items (features) we think we loaded
print(f"Loaded a total of {feature_count} features")
#
#    Just a validation check - did all the items we loaded get into the list?
print(f"Variable 'feature_list' contains {len(feature_list)} features")

Loaded 100 features
Loaded 200 features
Loaded 300 features
Loaded 400 features
Loaded 500 features
Loaded 600 features
Loaded 700 features
Loaded 800 features
Loaded 900 features
Loaded 1000 features
Loaded a total of 1000 features
Variable 'feature_list' contains 1000 features
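The loading loop above can be wrapped in a reusable function. The sketch below assumes only what the cell above shows: the reader exposes `rewind()` and `next()`, and `next()` returns a falsy value once the features run out. `load_features` and `StubReader` are hypothetical names; `StubReader` merely stands in for the real reader so the example runs on its own.

```python
def load_features(reader, max_features=None):
    """Collect features from a reader exposing rewind() and next().

    Stops when next() returns a falsy value or when max_features is reached.
    """
    features = []
    reader.rewind()  # start from the first feature so repeated calls give the same result
    feature = reader.next()
    while feature:
        features.append(feature)
        if max_features is not None and len(features) >= max_features:
            break
        feature = reader.next()
    return features


class StubReader:
    """Hypothetical stand-in for the wildfire reader, used only for this demo."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def rewind(self):
        self._pos = 0

    def next(self):
        if self._pos >= len(self._items):
            return None
        item = self._items[self._pos]
        self._pos += 1
        return item


stub = StubReader({"id": i} for i in range(5))
capped = load_features(stub, max_features=3)
everything = load_features(stub)
print(len(capped), len(everything))
```

Passing `max_features=None` loads every feature, so switching to the full dataset would not require editing the loop itself.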


In [5]:
with open("../data_intermediate/full_wildfires_SMALL.json", "w") as file: #TODO Change this when running the full dataset
    json.dump(feature_list, file, indent=4)
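
A quick way to confirm that the file was written correctly is to read it back and compare it with what was saved. The sketch below uses a temporary directory and a small made-up feature list so that it runs on its own; `dump_and_reload` is a hypothetical helper, not part of the wildfire module.

```python
import json
import os
import tempfile


def dump_and_reload(features, path):
    """Write features to path as JSON, then read them back."""
    with open(path, "w") as file:
        json.dump(features, file, indent=4)
    with open(path) as file:
        return json.load(file)


# Made-up sample data standing in for feature_list
sample_features = [{"attributes": {"id": 1}}, {"attributes": {"id": 2}}]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_features.json")
    reloaded = dump_and_reload(sample_features, sample_path)

print(f"Reloaded {len(reloaded)} features; matches original: {reloaded == sample_features}")
```

The same comparison could be applied to `feature_list` and the intermediate file, provided every feature contains only JSON-serializable values.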