This project analyzes wildfire data and relates it to air quality data. We will be looking at the following areas:
- Comparing national fire data with California fire data
- Comparing California fire data with California annual air quality data
- Looking at the correlation of counties with worse air quality and fires
- Projecting fires (count, counties most impacted, etc.) for the future
All data is taken from Kaggle. See the Major Findings below or view the presentation for more details.
- While the number of fires is trending up nationally, the number of fires in California is trending down
- National and California burn duration/year, average acreage, and total acreage burned are trending up
- The counties most affected by fires are: Fresno, Lake, San Bernardino, Trinity
- The counties with the worst days of air quality are: Calaveras, Placer, Inyo
- The worst fire counties are not the same as the worst air quality areas
- Air moves with wind patterns, so fires affect air quality more in other counties than in the counties where they burn
- Due to air patterns, the same counties will continue to be affected by poor air quality
- While the number of fires in California may continue to decrease, the intensity of fires in terms of duration and acreage is likely to get worse.
- At a national level, the number of fires and their intensity in terms of duration and acreage are likely to get worse.
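The findings above describe yearly series as "trending up" or "trending down" and project them forward. The write-up does not say how the trends were estimated, so the following is only a minimal sketch of one common approach: fit a least-squares line to yearly totals and read the direction from the sign of the slope. The function names and all numbers are hypothetical and are not taken from the project's notebooks or data.

```python
def linear_trend(years, values):
    """Return (slope, intercept) of the least-squares line through (year, value) points."""
    if len(years) != len(values) or len(years) < 2:
        raise ValueError("need at least two (year, value) pairs of equal length")
    n = len(years)
    mean_year = sum(years) / n
    mean_value = sum(values) / n
    # Sum of squared deviations of the years from their mean.
    spread = sum((year - mean_year) ** 2 for year in years)
    if spread == 0:
        raise ValueError("all years are identical, so no trend can be fitted")
    # Sum of products of deviations; dividing by the spread gives the slope.
    co_deviation = sum(
        (year - mean_year) * (value - mean_value)
        for year, value in zip(years, values)
    )
    slope = co_deviation / spread
    intercept = mean_value - slope * mean_year
    return slope, intercept


def describe_trend(slope):
    """Translate the sign of a slope into the wording used in the findings."""
    if slope > 0:
        return "trending up"
    if slope < 0:
        return "trending down"
    return "flat"


def project(year, slope, intercept):
    """Extrapolate the fitted line to a (possibly future) year."""
    return slope * year + intercept


# Made-up numbers for illustration only; they are NOT taken from the fire data.
example_years = [2011, 2012, 2013, 2014, 2015]
example_fire_counts = [100, 90, 95, 80, 70]
example_acres_burned = [10.0, 12.0, 15.0, 14.0, 20.0]

count_slope, count_intercept = linear_trend(example_years, example_fire_counts)
acres_slope, acres_intercept = linear_trend(example_years, example_acres_burned)

print("fire count:", describe_trend(count_slope))
print("acres burned:", describe_trend(acres_slope))
print("projected 2016 fire count:", project(2016, count_slope, count_intercept))
```

A straight-line fit is a deliberately simple choice: it makes the direction of a trend easy to state, but extrapolating it far beyond the observed years should be treated with caution.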
The project will meet the following requirements:
- Use Pandas to clean and format your data set(s)
- Create a Jupyter Notebook describing the data exploration and cleanup process
- Create a Jupyter Notebook illustrating the final data analysis
- Use Matplotlib to create a total of 6-8 visualizations of your data (ideally, at least 2 per “question” you ask of your data)
- Save PNG images of your visualizations to distribute to the class and instructional team, and for inclusion in your presentation
- Optionally, use at least one API, if you can find an API with data pertinent to your primary research questions
- Create a write-up summarizing your major findings. This should include a heading for each “question” you asked of your data, and under each heading, a short description of what you found and any relevant plots.
- The presentation should be at least 8-10 min. long
- Describe the core message or hypothesis for your project.
- Describe the questions you and your group found interesting, and what motivated you to answer them
- Summarize where and how you found the data you used to answer these questions
- Describe the data exploration and cleanup process (accompanied by your Jupyter Notebook)
- Describe the analysis process (accompanied by your Jupyter Notebook)
- Summarize your conclusions. This should include a numerical summary (i.e., what data did your analysis yield), as well as visualizations of that summary (plots of the final analysis data)
- Discuss the implications of your findings. This is where you get to have an open-ended discussion about what your findings “mean”.
- Tell a good story! Storytelling through data analysis is no different than in literature. Find your narrative and use your analysis and visualization skills to highlight conflict and resolution in your data.
To run this project the following tools are needed:
- Python (tested with v3.8.5; earlier versions may work as well, but have not been tested)
- Jupyter Notebook and/or JupyterLab
Then complete the following steps:
- Clone the repository
- Download fire historical data from 1.88 Million US Wildfires to the `Resources` directory.
- Run `Tools/FireDataExporter.ipynb` from Jupyter Notebook or JupyterLab to export the data from SQLite to the `Fires.csv` file, which will contain the subset of data used for analysis
- Download and save the CA air quality data to the `Resources` directory. This can be done by running this helper Kaggle notebook. Note the following:
  - The notebook uses the BigQuery API and public Historical Air Quality data to generate the CSV data
  - The query can take a few minutes to complete
  - Once it is completed, the data used for analysis can be downloaded from the output folder in the data menu on the right side
- Open and run `FireAnalysis.ipynb` from Jupyter Notebook or JupyterLab to run the fire-related analysis.
- Open and run `AirQualityCA.ipynb` and `CorrelationOfCountiesAndFire2015` from Jupyter Notebook or JupyterLab to run the air-quality-related analysis.
Analysis can be viewed in the individual cells.
- In some cases, data from previous Jupyter cells is used to calculate values in the current cell, so make sure to run all preceding cells to obtain the correct output
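For readers who want to see what the SQLite-to-CSV export step amounts to, here is a minimal sketch using only the Python standard library. It is not the code in `Tools/FireDataExporter.ipynb`. The table name `Fires`, the column names, and the helper names `export_fires_to_csv` and `build_demo_database` are assumptions made for illustration, and the exact subset of columns the notebook exports is not stated here. The demo rows are invented so that the sketch runs without the Kaggle download.

```python
import csv
import os
import sqlite3
import tempfile


def export_fires_to_csv(db_path, csv_path, columns, table="Fires"):
    """Write the chosen columns of one SQLite table to a CSV file.

    Returns the number of data rows written (the header is not counted).
    """
    # Identifiers cannot be bound as SQL parameters, so quote them explicitly.
    quoted_columns = ", ".join('"{}"'.format(c.replace('"', '""')) for c in columns)
    quoted_table = '"{}"'.format(table.replace('"', '""'))
    query = "SELECT {} FROM {}".format(quoted_columns, quoted_table)

    connection = sqlite3.connect(db_path)
    try:
        cursor = connection.execute(query)
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(columns)  # header row
            row_count = 0
            for row in cursor:  # stream rows instead of loading them all at once
                writer.writerow(row)
                row_count += 1
    finally:
        connection.close()
    return row_count


def build_demo_database(db_path):
    """Create a tiny stand-in database with invented rows (assumed schema)."""
    connection = sqlite3.connect(db_path)
    try:
        connection.execute(
            "CREATE TABLE Fires ("
            "FIRE_YEAR INTEGER, STATE TEXT, FIPS_NAME TEXT, FIRE_SIZE REAL)"
        )
        connection.executemany(
            "INSERT INTO Fires VALUES (?, ?, ?, ?)",
            [
                (2014, "CA", "Example County A", 12.5),
                (2014, "NV", "Example County B", 3.0),
                (2015, "CA", "Example County C", 40.25),
            ],
        )
        connection.commit()
    finally:
        connection.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        demo_db = os.path.join(workdir, "demo.sqlite")
        demo_csv = os.path.join(workdir, "Fires.csv")
        build_demo_database(demo_db)
        written = export_fires_to_csv(
            demo_db, demo_csv, ["FIRE_YEAR", "STATE", "FIRE_SIZE"]
        )
        print("rows written:", written)
```

Selecting only the needed columns keeps the CSV much smaller than the full database, which is presumably why the project works from an exported subset rather than from the SQLite file directly.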
The fire analysis made use of the following concepts and links: