# Wildfire Incident Data Collection
For wildfire incident data, we used the [National Interagency Fire Center WFIGS (Wildland Fire Interagency Geospatial Services) database](https://data-nifc.opendata.arcgis.com/datasets/nifc::wfigs-wildland-fire-locations-full-history/explore?filters=eyJDb250cm9sRGF0ZVRpbWUiOls5OTg5MDM4NTU2ODcuNjEsMTY3MzA0NDMyMDAwMF19&location=37.093867%2C-117.175606%2C7.00&showTable=true).
The data collected from WFIGS includes the point locations for all reported wildland fires in the United States.  

From their website: 
> The Wildland Fire Interagency Geospatial Services (WFIGS) Group provides authoritative geospatial data products under the interagency Wildland Fire Data Program. Hosted in the National Interagency Fire Center ArcGIS Online Organization (The NIFC Org), WFIGS provides both internal and public facing data, accessible in a variety of formats.
> This service contains all wildland fire incidents from the IRWIN (Integrated Reporting of Wildland Fire Information) integration service and historical data converted to the IRWIN schema.

In [51]:
import pandas as pd

In [52]:
df = pd.read_csv('../../data/raw/WFIGS_-_Wildland_Fire_Locations_Full_History (3).csv')
df.head()


Unnamed: 0,X,Y,OBJECTID,ABCDMisc,ADSPermissionState,CalculatedAcres,ContainmentDateTime,ControlDateTime,DailyAcres,DiscoveryAcres,...,IsDispatchComplete,OrganizationalAssessment,StrategicDecisionPublishDate,CreatedOnDateTime_dt,ModifiedOnDateTime_dt,Source,GlobalID,IsCpxChild,CpxName,CpxID
0,-111.414812,40.072836,7,,DEFAULT,,2019/10/31 16:30:00+00,2019/11/05 18:30:00+00,170.0,0.1,...,0,Type 4 Incident,,2019/10/27 00:14:29+00,2019/11/13 00:15:39+00,IRWIN,{BFD53772-94E7-43F0-9D2C-62444A07CA68},,,
1,-112.439311,34.403275,13,,DEFAULT,,2019/09/09 17:00:00+00,2019/09/09 17:00:00+00,0.1,0.5,...,0,,,2019/09/05 20:14:11+00,2019/09/14 19:28:38+00,IRWIN,{E656CA4D-EECE-4746-AEE3-4D645C4F1F13},,,
2,-108.895411,40.239896,31,,DEFAULT,,2019/07/30 18:00:00+00,2019/08/03 14:00:00+00,90.0,1.0,...,0,,,2019/07/28 22:52:13+00,2019/08/10 18:31:55+00,IRWIN,{50C0D06E-E3DC-4094-BF22-B9D4B7BA68B1},,,
3,-108.552111,38.145376,35,,DEFAULT,,2018/07/28 03:14:59+00,2018/07/28 14:39:59+00,0.1,0.1,...,0,,,2018/07/28 17:50:47+00,2018/07/29 21:56:13+00,IRWIN,{496DBF20-6556-490D-8B22-1E79AEEE74C7},,,
4,-111.348611,33.195755,51,,DEFAULT,,2020/07/23 05:29:59+00,2020/07/23 05:29:59+00,8.0,2.5,...,0,,,2020/07/22 22:56:30+00,2020/08/09 00:10:34+00,IRWIN,{9E1157E6-8784-42A5-9F43-D740C4ED357F},,,


In [53]:
df.shape

(33564, 96)

In [54]:
for i in df.columns:
    print(i)

X
Y
OBJECTID
ABCDMisc
ADSPermissionState
CalculatedAcres
ContainmentDateTime
ControlDateTime
DailyAcres
DiscoveryAcres
DispatchCenterID
EstimatedCostToDate
FinalFireReportApprovedByTitle
FinalFireReportApprovedByUnit
FinalFireReportApprovedDate
FireBehaviorGeneral
FireBehaviorGeneral1
FireBehaviorGeneral2
FireBehaviorGeneral3
FireCause
FireCauseGeneral
FireCauseSpecific
FireCode
FireDepartmentID
FireDiscoveryDateTime
FireMgmtComplexity
FireOutDateTime
FireStrategyConfinePercent
FireStrategyFullSuppPercent
FireStrategyMonitorPercent
FireStrategyPointZonePercent
FSJobCode
FSOverrideCode
GACC
ICS209ReportDateTime
ICS209ReportForTimePeriodFrom
ICS209ReportForTimePeriodTo
ICS209ReportStatus
IncidentManagementOrganization
IncidentName
IncidentShortDescription
IncidentTypeCategory
IncidentTypeKind
InitialLatitude
InitialLongitude
InitialResponseAcres
InitialResponseDateTime
IrwinID
IsFireCauseInvestigated
IsFireCodeRequested
IsFSAssisted
IsMultiJurisdictional
IsReimbursable
IsTrespass
IsUnifi

In [55]:
# Check that each incident is a true wildfire (WF) and not a prescribed fire (RX) or an incident complex
df['IncidentTypeCategory'].value_counts()

WF    33497
RX       67
Name: IncidentTypeCategory, dtype: int64

In [56]:
df = df[df['IncidentTypeCategory'] == 'WF']

In [57]:
# This investigation is based on the initial report, so it does not account for how many additional acres
# burned between the initial report and human intervention (although that data could clearly help a different model).

(df['InitialResponseAcres']>1.0).value_counts()

False    29607
True      3890
Name: InitialResponseAcres, dtype: int64
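
One caveat when reading the counts above: in pandas, a comparison against a missing value (`NaN`) evaluates to `False`, so the `False` bucket may include rows where `InitialResponseAcres` was never recorded. The sketch below uses made-up values (not WFIGS records) to show the effect and one way to count the three cases separately.

```python
import numpy as np
import pandas as pd

# Toy values standing in for InitialResponseAcres (illustrative, not real data)
acres = pd.Series([0.1, 2.5, np.nan, 1.0, np.nan], name="InitialResponseAcres")

# NaN > 1.0 evaluates to False, so missing rows land in the False bucket
over_one = acres > 1.0
print(over_one.value_counts())

# Count the three cases separately so missing values are visible
summary = {
    "over_1_acre": int(over_one.sum()),
    "at_or_under_1_acre": int((acres <= 1.0).sum()),
    "missing": int(acres.isna().sum()),
}
print(summary)
```

Whether this matters here depends on how many missing values the real column contains, which `df['InitialResponseAcres'].isna().sum()` would show.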

#### Using these columns, specifically the initial latitude and longitude, to request meteorological data at the discovery datetime, this investigation will hopefully produce a model that predicts the target, `DailyAcres` (which actually represents the total acres burned after discovery).

In [58]:
cols = ['ContainmentDateTime', 'ControlDateTime', 'DailyAcres', 'DiscoveryAcres', 'FireCause', 'FireDiscoveryDateTime', 
        'FireOutDateTime', 'IncidentTypeKind', 'InitialLatitude', 'InitialLongitude', 'POOState']

df = df[cols]
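
As a rough illustration of the plan described above (using each fire's initial coordinates and discovery time to request weather data), the sketch below assembles the request parameters for a single fire. The function name `build_weather_query` and the parameter names (`lat`, `lon`, `date`, `hour`) are hypothetical; they would need to match whichever weather API is actually used, and the input values are made up.

```python
from datetime import datetime, timezone


def build_weather_query(fire_id, latitude, longitude, discovered_at):
    """Build request parameters for one fire.

    The keys below are assumptions for illustration only; a real weather
    API will define its own parameter names and formats.
    """
    # Normalize the discovery time to UTC before extracting date and hour
    discovered_utc = discovered_at.astimezone(timezone.utc)
    return {
        "id": fire_id,                      # row identifier, used to join results back
        "lat": round(latitude, 4),
        "lon": round(longitude, 4),
        "date": discovered_utc.strftime("%Y-%m-%d"),
        "hour": discovered_utc.hour,
    }


# Illustrative values standing in for InitialLatitude, InitialLongitude
# and FireDiscoveryDateTime
query = build_weather_query(
    fire_id=0,
    latitude=33.19576,
    longitude=-111.34861,
    discovered_at=datetime(2020, 7, 22, 22, 56, 30, tzinfo=timezone.utc),
)
print(query)
```

Carrying a row identifier through the request makes it straightforward to attach the returned weather values to the correct fire afterwards.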

Looking at fires since the start of 2020:

In [59]:
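df['FireDiscoveryDateTime'] = pd.to_datetime(df['FireDiscoveryDateTime'])
# Keep fires discovered on or after 2020-01-01; copy so later assignments act on an independent frame
df = df[df['FireDiscoveryDateTime'] >= '2020-01-01'].copy()
# sort_values returns a new DataFrame, so the result must be assigned
df = df.sort_values('FireDiscoveryDateTime')
df.reset_index(drop=True, inplace=True)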
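
The filter, sort, and re-index steps can be checked on a small toy frame. This is a minimal sketch with made-up rows; it uses an explicit UTC cutoff timestamp, which is one possible way to compare against timezone-aware values. Note that `sort_values` returns a new DataFrame rather than sorting in place, so its result has to be assigned or chained.

```python
import pandas as pd

# Toy rows (illustrative, not WFIGS records); timestamps are timezone-aware UTC
toy = pd.DataFrame({
    "FireDiscoveryDateTime": pd.to_datetime(
        ["2021-03-01T10:00:00Z", "2019-08-15T12:00:00Z", "2020-06-30T08:30:00Z"]
    ),
    "DailyAcres": [5.0, 90.0, 0.1],
})

cutoff = pd.Timestamp("2020-01-01", tz="UTC")

recent = (
    toy[toy["FireDiscoveryDateTime"] >= cutoff]   # drop fires before 2020
    .sort_values("FireDiscoveryDateTime")         # returns a new, sorted frame
    .reset_index(drop=True)                       # renumber rows from 0
)
print(recent)
```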

In [60]:
df.shape

(22799, 11)

Add an `id` column so the rows can be synced with the API results:

In [61]:
df['id'] = df.index

In [62]:
df.to_csv('../../data/processed/initial_wildfire.csv', index=False)
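
To illustrate why the `id` column is useful, the sketch below joins a toy fire table to a toy table of API results on `id`. The `weather` frame and its `temperature_c` column are hypothetical stand-ins; the real API output may be shaped differently.

```python
import pandas as pd

# Toy fire records carrying the id assigned from the index
fires = pd.DataFrame({
    "id": [0, 1, 2],
    "DailyAcres": [8.0, 0.1, 90.0],
})

# Hypothetical API results keyed by the same id (column name is an assumption);
# results may arrive out of order or be missing for some fires
weather = pd.DataFrame({
    "id": [2, 0],
    "temperature_c": [31.5, 27.0],
})

# A left join keeps every fire; validate raises if an id is duplicated on either side
merged = fires.merge(weather, on="id", how="left", validate="one_to_one")
print(merged)
```

Fires without a matching API result keep their row and receive `NaN` in the weather columns, which makes gaps easy to spot.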