## 🔗 Merging the NOAA (National Oceanic and Atmospheric Administration) Dataset with FRAP (Fire and Resource Assessment Program) Dataset

### 📝 Overview of the Merge Process
The goal is to merge the **NOAA Weather Dataset** with the **California Wildfire Data** (FRAP dataset). This allows us to combine weather variables with wildfire-specific information, enhancing our ability to analyze the relationship between weather conditions and wildfire occurrences.

### 🔑 Columns Being Merged

#### **NOAA Weather Dataset Columns**:
- **STATION**: Unique identifier for the weather station.
- **LATITUDE**: Latitude of the weather station.
- **LONGITUDE**: Longitude of the weather station.
- **DATE**: Observation date.

These columns provide **geospatial and temporal** data that help us match weather conditions to specific fire events.

#### **California Wildfire Data (FRAP) Columns**:
- **LATITUDE**: Latitude of the wildfire location.
- **LONGITUDE**: Longitude of the wildfire location.
- **DISCOVERY_DATE**: Date the fire was discovered.
- **FIRE_SIZE**: The size of the wildfire (in acres).
- **FIRE_SIZE_CLASS**: A classification of the wildfire's size.
- **NWCG_CAUSE_CLASSIFICATION**: Classification of the cause of the fire.
- **COUNTY**: The county where the fire occurred.

These columns provide **wildfire location and size** data, essential for analyzing how weather patterns correlate with wildfire behavior.



In [1]:
import pandas as pd
import numpy as np
from math import radians, cos, sin, asin, sqrt
from sklearn.neighbors import BallTree

In [2]:
import pandas as pd

# Load the FRAP (Wildfire Data) and NOAA (Weather Data) datasets
frap_df = pd.read_csv("C:\\Users\\annis\\Project dsc 550\\Fire_Occurrence_Database.csv")
noaa_df = pd.read_csv("C:\\Users\\annis\\Project dsc 550\\Weather_Database.csv")


In [3]:
# Check unique values for LATITUDE, LONGITUDE, and DATE in both datasets
print("FRAP Dataset - Unique LATITUDE, LONGITUDE, and DISCOVERY_DATE:")
print(frap_df[['LATITUDE', 'LONGITUDE', 'DISCOVERY_DATE']].nunique())

print("\nNOAA Dataset - Unique LATITUDE, LONGITUDE, and DATE:")
print(noaa_df[['LATITUDE', 'LONGITUDE', 'DATE']].nunique())

FRAP Dataset - Unique LATITUDE, LONGITUDE, and DISCOVERY_DATE:
LATITUDE          114376
LONGITUDE         113854
DISCOVERY_DATE     10178
dtype: int64

NOAA Dataset - Unique LATITUDE, LONGITUDE, and DATE:
LATITUDE         7
LONGITUDE        7
DATE         10593
dtype: int64


### Merging both FRAP (Fire Records) and NOAA (Weather Stations) Datasets 

In [4]:
# Check the range of LATITUDE and LONGITUDE in both datasets
print("FRAP Dataset LATITUDE Range:", frap_df['LATITUDE'].min(), "to", frap_df['LATITUDE'].max())
print("FRAP Dataset LONGITUDE Range:", frap_df['LONGITUDE'].min(), "to", frap_df['LONGITUDE'].max())

print("\nNOAA Dataset LATITUDE Range:", noaa_df['LATITUDE'].min(), "to", noaa_df['LATITUDE'].max())
print("NOAA Dataset LONGITUDE Range:", noaa_df['LONGITUDE'].min(), "to", noaa_df['LONGITUDE'].max())

FRAP Dataset LATITUDE Range: 32.5374061 to 42.00823
FRAP Dataset LONGITUDE Range: -124.402883 to -114.13751147

NOAA Dataset LATITUDE Range: 18.6 to 37.28597
NOAA Dataset LONGITUDE Range: -120.51788 to -97.2667


In [5]:
# Extract unique stations with their LATITUDE and LONGITUDE
unique_stations = noaa_df[['NAME', 'LATITUDE', 'LONGITUDE']].drop_duplicates().reset_index(drop=True)
print(unique_stations)

                                       NAME   LATITUDE   LONGITUDE
0      FRESNO YOSEMITE INTERNATIONAL, CA US  36.779990 -119.720160
1  LOS ANGELES INTERNATIONAL AIRPORT, CA US  33.938160 -118.386600
2                SAN BERNARDINO LAGUNAS, MX  18.600000  -97.266700
3        RIVERSIDE MUNICIPAL AIRPORT, CA US  33.952820 -117.435230
4              SAN BERNARDINO 5.1 NW, CA US  34.195019 -117.350246
5           MERCED MUNICIPAL AIRPORT, CA US  37.285970 -120.517880
6    SAN DIEGO INTERNATIONAL AIRPORT, CA US  32.733600 -117.183100


In [6]:
# FRAP
frap_df['DISCOVERY_DATE'] = pd.to_datetime(frap_df['DISCOVERY_DATE'])
# NOAA
noaa_df['DATE'] = pd.to_datetime(noaa_df['DATE'])

In [7]:
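# Convert [LATITUDE, LONGITUDE] from degrees to radians;
# BallTree's haversine metric expects coordinates in radians, in (lat, lon) order
noaa_coords = np.radians(noaa_df[['LATITUDE', 'LONGITUDE']])
frap_coords = np.radians(frap_df[['LATITUDE', 'LONGITUDE']])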

In [8]:
tree = BallTree(noaa_coords, metric='haversine')

In [9]:
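# For each fire, find the nearest NOAA row (and hence station).
# dist is in radians on the unit sphere; idx indexes rows of noaa_df
dist, idx = tree.query(frap_coords, k=1)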

In [12]:
frap_df['nearest_station_id'] = noaa_df.iloc[idx.flatten()]['STATION'].values

In [13]:
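# Reduce both date columns to date-only values so they compare equal in the merge
frap_df['DISCOVERY_DATE'] = frap_df['DISCOVERY_DATE'].dt.date
noaa_df['DATE'] = noaa_df['DATE'].dt.date

# Left merge: keep every fire record; fires with no matching station/date row get NaN weather columns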
merged_df = pd.merge(
    frap_df,
    noaa_df,
    left_on=['nearest_station_id', 'DISCOVERY_DATE'],
    right_on=['STATION', 'DATE'],
    how='left'
)


In [40]:
merged_df.head()

|  | DISCOVERY_DATE | DISCOVERY_TIME | FIRE_YEAR | LATITUDE_x | LONGITUDE_x | STATE | FIRE_SIZE | FIRE_SIZE_CLASS | NWCG_CAUSE_CLASSIFICATION | NWCG_GENERAL_CAUSE | ... | DATE | AWND | PGTM | PRCP | TMAX | TMIN | WDF2 | WDF5 | WSF2 | WSF5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2005-02-02 | 13:00:00 | 2005 | 40.036944 | -121.005833 | CA | 0.1 | A | Human | Power generation/transmission/distribution | ... | 2005-02-02 | 3.13 | 123.0 | 0.0 | 60.0 | 35.0 | 110.0 | 110.0 | 8.9 | 13.0 |
| 1 | 2004-05-12 | 08:45:00 | 2004 | 38.933056 | -120.404444 | CA | 0.25 | A | Natural | Natural | ... | 2004-05-12 | 8.05 | 1155.0 | 0.0 | 80.0 | 49.0 | 320.0 | 340.0 | 17.0 | 21.0 |
| 2 | 2004-05-31 | 19:21:00 | 2004 | 38.984167 | -120.735556 | CA | 0.1 | A | Human | Debris and open burning | ... | 2004-05-31 | 6.26 | 1708.0 | 0.0 | 94.0 | 56.0 | 320.0 | 310.0 | 15.0 | 17.0 |
| 3 | 2004-06-28 | 16:00:00 | 2004 | 38.559167 | -119.913333 | CA | 0.1 | A | Natural | Natural | ... | 2004-06-28 | 6.93 | 1812.0 | 0.0 | 96.0 | 62.0 | 270.0 | 250.0 | 15.0 | 19.9 |
| 4 | 2004-06-28 | 16:00:00 | 2004 | 38.559167 | -119.933056 | CA | 0.1 | A | Natural | Natural | ... | 2004-06-28 | 6.93 | 1812.0 | 0.0 | 96.0 | 62.0 | 270.0 | 250.0 | 15.0 | 19.9 |


### **Data Merging Process**

We combine two datasets, **FRAP (Fire Records)** and **NOAA (Weather Stations)**, so that each wildfire record carries the weather observed at its nearest station on its discovery date. The approach has three steps, described below together with its drawbacks.

#### Why This Approach Was Chosen:

1. **Geospatial Matching (BallTree Algorithm)**:
   - The first step was to find the closest weather station (from NOAA) to each wildfire record (from FRAP) based on the geographic coordinates (latitude and longitude) of both.
   - We used the **BallTree** algorithm from `sklearn.neighbors` for efficient nearest-neighbor search, specifically using the **Haversine distance metric**, which is ideal for calculating distances on the Earth's surface (taking into account the spherical nature of the Earth).
   - By converting the coordinates into radians, we ensured the BallTree algorithm could accurately compute distances using the Haversine formula.
   - **Benefit**: This method allowed us to efficiently match each wildfire record with its nearest weather station, ensuring that the weather data applied to each wildfire was the most relevant in terms of proximity.

2. **Date Matching**:
   - Once we found the closest weather station, we needed to match the wildfire's discovery date (from FRAP) with the weather data available on that same date (from NOAA). 
   - We converted both the **FRAP 'DISCOVERY_DATE'** and **NOAA 'DATE'** to a common format (date-only) to ensure they could be matched accurately.
   - **Benefit**: Matching on the discovery date attaches the weather observed on the day the fire was discovered, rather than conditions from an earlier or later day.

3. **Merge Operation**:
   - After identifying the nearest station and matching the dates, we performed a **left merge** of the two datasets on the `nearest_station_id` and `DISCOVERY_DATE` for FRAP, and `STATION` and `DATE` for NOAA.
   - **Benefit**: The merge ensures that for each fire record, we can attach the corresponding weather information from the nearest station on the same date, enabling a robust analysis of how weather might have influenced the fire.
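
To make the geospatial step concrete, here is a minimal sketch of the same nearest-station idea in plain Python, without `BallTree`. It assumes a mean Earth radius of 6371 km; `haversine_km` and `nearest_station` are hypothetical helper names, not part of the notebook. The station coordinates are copied from the `unique_stations` output above, and the fire location is the first row of `merged_df.head()`.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not a value from the notebook


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Station coordinates copied from the unique_stations output.
stations = {
    "FRESNO YOSEMITE INTERNATIONAL, CA US": (36.779990, -119.720160),
    "LOS ANGELES INTERNATIONAL AIRPORT, CA US": (33.938160, -118.386600),
    "SAN BERNARDINO LAGUNAS, MX": (18.600000, -97.266700),
    "RIVERSIDE MUNICIPAL AIRPORT, CA US": (33.952820, -117.435230),
    "SAN BERNARDINO 5.1 NW, CA US": (34.195019, -117.350246),
    "MERCED MUNICIPAL AIRPORT, CA US": (37.285970, -120.517880),
    "SAN DIEGO INTERNATIONAL AIRPORT, CA US": (32.733600, -117.183100),
}


def nearest_station(lat, lon):
    """Return (station name, distance in km) for the closest station."""
    name = min(stations, key=lambda s: haversine_km(lat, lon, *stations[s]))
    return name, haversine_km(lat, lon, *stations[name])


# First fire record shown in merged_df.head()
name, km = nearest_station(40.036944, -121.005833)
print(name, round(km, 1))
```

A brute-force scan like this is fine for seven stations; `BallTree` performs the same lookup with a spatial index, which matters more as the number of stations grows.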

#### Drawbacks and Considerations:

1. **Accuracy of Geospatial Matching**:
   - **Drawback**: Using the nearest station might not always result in the most accurate weather data, especially if the fire occurred in a geographically diverse area where weather conditions may vary significantly across short distances.
   - For example, a fire in a mountainous area may be influenced by different weather patterns than what is recorded at the nearest station in a valley.
   - **Solution**: Although we used the nearest station, we acknowledge that this method might introduce some level of inaccuracy, and future improvements could involve using more advanced geospatial models that account for topography and microclimates.

2. **Date Alignment**:
   - **Drawback**: Sometimes, there could be slight mismatches between the dates of the fire and the weather data due to gaps in weather station reporting or the time lag in data availability.
   - **Solution**: We dealt with this by ensuring both date columns were formatted properly (date-only) and handling any discrepancies by using a **left merge** so that all fire records would retain their information, even if the weather data for that date was missing. Missing weather data would be marked as `NaN`, allowing us to investigate gaps further if needed.

3. **Data Volume and Complexity**:
   - **Drawback**: If the datasets are large, performing this geospatial matching and merging can become computationally expensive and time-consuming, especially with a large number of stations and fire records.
   - **Solution**: The use of efficient algorithms like **BallTree** helps mitigate some of the performance concerns, but it's important to monitor performance, especially when dealing with massive datasets.
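
One way to quantify the proximity drawback is to convert the nearest-neighbor distance to kilometres and flag fires that lie beyond some cutoff. With the haversine metric, `BallTree` returns distances in radians, so multiplying by the Earth's radius gives kilometres. The sketch below does the equivalent with NumPy only; `haversine_matrix_km` is a hypothetical helper, `MAX_KM = 100` is an arbitrary illustrative cutoff rather than a value used in this notebook, and the second fire location is made up.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
MAX_KM = 100.0            # hypothetical cutoff, chosen only for illustration


def haversine_matrix_km(points_deg, stations_deg):
    """Pairwise great-circle distances (km) between two arrays of [lat, lon] in degrees."""
    p = np.radians(np.asarray(points_deg, dtype=float))[:, None, :]    # shape (n, 1, 2)
    s = np.radians(np.asarray(stations_deg, dtype=float))[None, :, :]  # shape (1, m, 2)
    dlat = s[..., 0] - p[..., 0]
    dlon = s[..., 1] - p[..., 1]
    a = np.sin(dlat / 2) ** 2 + np.cos(p[..., 0]) * np.cos(s[..., 0]) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


stations = np.array([
    [37.285970, -120.517880],  # MERCED MUNICIPAL AIRPORT
    [33.952820, -117.435230],  # RIVERSIDE MUNICIPAL AIRPORT
])
fires = np.array([
    [40.036944, -121.005833],  # first fire record in merged_df.head()
    [33.90, -117.40],          # made-up point near Riverside
])

dist_km = haversine_matrix_km(fires, stations)  # shape (n_fires, n_stations)
nearest_idx = dist_km.argmin(axis=1)            # index of the closest station per fire
nearest_km = dist_km.min(axis=1)                # distance to that station
within_cutoff = nearest_km <= MAX_KM            # True where the match is "close enough"

print(nearest_idx, nearest_km.round(1), within_cutoff)
```

Fires that fail the cutoff could then be excluded or analyzed separately, at the cost of a smaller dataset.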

#### Reason:

This approach was chosen because it strikes a balance between computational efficiency and ensuring meaningful, real-world matching of wildfire events with weather data. The **BallTree algorithm** provided an effective method for geospatial matching, and date-based merging helped ensure the accuracy of the data being used. While there are potential drawbacks, especially related to geographic mismatches and data availability, this method is scalable and flexible for future improvements.


In [16]:
merged_df.isnull().sum()

DISCOVERY_DATE                   0
DISCOVERY_TIME                   0
FIRE_YEAR                        0
LATITUDE_x                       0
LONGITUDE_x                      0
STATE                            0
FIRE_SIZE                        0
FIRE_SIZE_CLASS                  0
NWCG_CAUSE_CLASSIFICATION        0
NWCG_GENERAL_CAUSE               0
COUNTY                           0
nearest_station_id               0
STATION                      49850
NAME                         49850
LATITUDE_y                   49850
LONGITUDE_y                  49850
ELEVATION                    49850
DATE                         49850
AWND                         49850
PGTM                         49850
PRCP                         49850
TMAX                         49850
TMIN                         49850
WDF2                         49850
WDF5                         49850
WSF2                         49850
WSF5                         49850
dtype: int64

### Handling Missing Values in Merged Dataset

After merging the **FRAP** (Fire Records) and **NOAA** (Weather Stations) datasets, 49,850 of the 251,678 merged rows had missing values in every weather-related column. The nearest-neighbor search used `k=1` with no distance cutoff, so every fire received a `nearest_station_id` (that column has no nulls). The gaps arise in the merge step, where no NOAA row existed for that station on the fire's discovery date.

#### Columns with Missing Values:
- **STATION**
- **NAME**
- **LATITUDE_y**
- **LONGITUDE_y**
- **ELEVATION**
- **DATE**
- **AWND**
- **PGTM**
- **PRCP**
- **TMAX**
- **TMIN**
- **WDF2**
- **WDF5**
- **WSF2**
- **WSF5**

These columns hold the station's name, coordinates, and elevation, plus weather metrics such as wind speed and temperature. All of them share the same null count (49,850), which indicates that the whole NOAA row was absent for those fire records rather than individual measurements being missing.

#### Why These Values are Missing:
1. **Geospatial Matching Is Not the Cause**: The search returned the single nearest station for every fire with no radius limit, so `nearest_station_id` is never null. Distance to the station therefore does not produce missing values, although it can make the attached weather less representative.
2. **Date Mismatch**: For these fire records, the nearest station had no observation row on the discovery date, so the left merge filled the weather columns with `NaN`.
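
To see which station and date combinations found no weather row, pandas' `indicator=True` option on `merge` can help. The sketch below uses small made-up frames (`fires`, `weather`) as stand-ins for `frap_df` and `noaa_df`; only the column names follow the notebook.

```python
import pandas as pd

# Toy stand-ins for frap_df and noaa_df; the values are made up for illustration.
fires = pd.DataFrame({
    "nearest_station_id": ["USW00023257", "USW00023257", "USW00003171"],
    "DISCOVERY_DATE": pd.to_datetime(["2004-05-12", "2004-05-13", "2004-05-12"]),
    "FIRE_SIZE": [0.25, 0.1, 1.0],
})
weather = pd.DataFrame({
    "STATION": ["USW00023257", "USW00003171"],
    "DATE": pd.to_datetime(["2004-05-12", "2004-05-12"]),
    "TMAX": [80.0, 85.0],
})

# indicator=True adds a _merge column: "both" for matched rows,
# "left_only" for fires with no weather row on that station/date.
merged = fires.merge(
    weather,
    left_on=["nearest_station_id", "DISCOVERY_DATE"],
    right_on=["STATION", "DATE"],
    how="left",
    indicator=True,
)

# Count unmatched fires per station to see where the gaps concentrate.
unmatched_per_station = (
    merged.loc[merged["_merge"] == "left_only"]
    .groupby("nearest_station_id")
    .size()
)

print(merged["_merge"].value_counts())
print(unmatched_per_station)
```

Grouping the unmatched rows by station (or by year) would show whether the gaps come from one station's reporting period or are spread evenly.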

#### Decision to Drop These Rows:
After reviewing the data and recognizing that these records lacked the essential weather information needed for analysis, we decided to **drop** the rows containing missing values. This decision was made because:
- The missing data would have introduced inconsistencies into the analysis, potentially skewing results if we attempted to impute or estimate the missing values.
- Without valid weather data, these records would not provide meaningful insights into the relationship between fire events and weather patterns, which is central to our analysis.

#### Outcome:
By dropping the rows with missing values, we ensured that the merged dataset contains only the records with both fire event and relevant weather data. This allows for more reliable and accurate analysis moving forward, especially when investigating the influence of weather on wildfire occurrences.


In [18]:
merged_df.shape

(251678, 27)

In [41]:
# Weather columns that must be present; rows missing any of them are dropped
columns_to_check = ['STATION', 'NAME', 'LATITUDE_y', 'LONGITUDE_y', 'ELEVATION', 
                    'DATE', 'AWND', 'PGTM', 'PRCP', 'TMAX', 'TMIN', 'WDF2', 'WDF5', 
                    'WSF2', 'WSF5']

# Drop rows with missing values in the specified columns
merged_df_cleaned = merged_df.dropna(subset=columns_to_check)

# Check the result
print(merged_df_cleaned.isnull().sum())

DISCOVERY_DATE               0
DISCOVERY_TIME               0
FIRE_YEAR                    0
LATITUDE_x                   0
LONGITUDE_x                  0
STATE                        0
FIRE_SIZE                    0
FIRE_SIZE_CLASS              0
NWCG_CAUSE_CLASSIFICATION    0
NWCG_GENERAL_CAUSE           0
COUNTY                       0
nearest_station_id           0
STATION                      0
NAME                         0
LATITUDE_y                   0
LONGITUDE_y                  0
ELEVATION                    0
DATE                         0
AWND                         0
PGTM                         0
PRCP                         0
TMAX                         0
TMIN                         0
WDF2                         0
WDF5                         0
WSF2                         0
WSF5                         0
dtype: int64


In [45]:
merged_df_cleaned.shape

(201828, 27)
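
As a quick check on how much data the drop removed, the two shapes reported above can be compared directly. This is simple arithmetic on the printed row counts, not an additional analysis step from the notebook.

```python
rows_before = 251_678  # merged_df.shape[0]
rows_after = 201_828   # merged_df_cleaned.shape[0]

dropped = rows_before - rows_after
share = dropped / rows_before

print(f"dropped {dropped} rows ({share:.1%} of the merged data)")
```

The number of dropped rows equals the null count reported for each weather column, which is consistent with every dropped row lacking the entire NOAA record.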

In [42]:
# Filter for the specific station name
filtered_df = merged_df[merged_df['NAME'] == 'RIVERSIDE MUNICIPAL AIRPORT, CA US']
filtered_df[['COUNTY', 'STATION', 'NAME']].value_counts()


COUNTY  STATION      NAME                              
65      USW00003171  RIVERSIDE MUNICIPAL AIRPORT, CA US    16952
59      USW00003171  RIVERSIDE MUNICIPAL AIRPORT, CA US      990
37      USW00003171  RIVERSIDE MUNICIPAL AIRPORT, CA US      901
71      USW00003171  RIVERSIDE MUNICIPAL AIRPORT, CA US      542
73      USW00003171  RIVERSIDE MUNICIPAL AIRPORT, CA US      356
Name: count, dtype: int64

In [44]:
# Save the cleaned DataFrame into a CSV file
merged_df_cleaned.to_csv('merged_df_cleaned.csv', index=False)

In [26]:
# Filter for the specific station name
filtered_df1 = merged_df[merged_df['NAME'] == 'SAN DIEGO INTERNATIONAL AIRPORT, CA US']

filtered_df1[['COUNTY', 'STATION', 'NAME']].value_counts()

COUNTY  STATION      NAME                                  
73      USW00023188  SAN DIEGO INTERNATIONAL AIRPORT, CA US    11739
25      USW00023188  SAN DIEGO INTERNATIONAL AIRPORT, CA US      803
65      USW00023188  SAN DIEGO INTERNATIONAL AIRPORT, CA US      544
Name: count, dtype: int64

In [27]:
# Filter for the specific station name
filtered_df1 = merged_df[merged_df['NAME'] == 'FRESNO YOSEMITE INTERNATIONAL, CA US']

filtered_df1[['COUNTY', 'STATION', 'NAME']].value_counts()

COUNTY  STATION      NAME                                
19      USW00093193  FRESNO YOSEMITE INTERNATIONAL, CA US    11264
107     USW00093193  FRESNO YOSEMITE INTERNATIONAL, CA US     9581
39      USW00093193  FRESNO YOSEMITE INTERNATIONAL, CA US     6065
79      USW00093193  FRESNO YOSEMITE INTERNATIONAL, CA US     4509
29      USW00093193  FRESNO YOSEMITE INTERNATIONAL, CA US     2904
51      USW00093193  FRESNO YOSEMITE INTERNATIONAL, CA US     1253
27      USW00093193  FRESNO YOSEMITE INTERNATIONAL, CA US     1162
53      USW00093193  FRESNO YOSEMITE INTERNATIONAL, CA US      761
31      USW00093193  FRESNO YOSEMITE INTERNATIONAL, CA US      419
83      USW00093193  FRESNO YOSEMITE INTERNATIONAL, CA US      145
43      USW00093193  FRESNO YOSEMITE INTERNATIONAL, CA US      103
69      USW00093193  FRESNO YOSEMITE INTERNATIONAL, CA US       28
109     USW00093193  FRESNO YOSEMITE INTERNATIONAL, CA US       13
Name: count, dtype: int64

In [28]:
# Filter for the specific station name
filtered_df1 = merged_df[merged_df['NAME'] == 'LOS ANGELES INTERNATIONAL AIRPORT, CA US']

filtered_df1[['COUNTY', 'STATION', 'NAME']].value_counts()

COUNTY  STATION      NAME                                    
37      USW00023174  LOS ANGELES INTERNATIONAL AIRPORT, CA US    10328
29      USW00023174  LOS ANGELES INTERNATIONAL AIRPORT, CA US     2561
111     USW00023174  LOS ANGELES INTERNATIONAL AIRPORT, CA US     1112
83      USW00023174  LOS ANGELES INTERNATIONAL AIRPORT, CA US     1008
59      USW00023174  LOS ANGELES INTERNATIONAL AIRPORT, CA US      657
79      USW00023174  LOS ANGELES INTERNATIONAL AIRPORT, CA US       46
Name: count, dtype: int64

In [29]:
# Filter for the specific station name
filtered_df1 = merged_df[merged_df['NAME'] == 'MERCED MUNICIPAL AIRPORT, CA US']

filtered_df1[['COUNTY', 'STATION', 'NAME']].value_counts()

COUNTY  STATION      NAME                           
47      USW00023257  MERCED MUNICIPAL AIRPORT, CA US    7614
23      USW00023257  MERCED MUNICIPAL AIRPORT, CA US    6836
89      USW00023257  MERCED MUNICIPAL AIRPORT, CA US    6002
17      USW00023257  MERCED MUNICIPAL AIRPORT, CA US    5964
7       USW00023257  MERCED MUNICIPAL AIRPORT, CA US    5706
93      USW00023257  MERCED MUNICIPAL AIRPORT, CA US    5501
61      USW00023257  MERCED MUNICIPAL AIRPORT, CA US    5105
103     USW00023257  MERCED MUNICIPAL AIRPORT, CA US    3846
109     USW00023257  MERCED MUNICIPAL AIRPORT, CA US    3652
45      USW00023257  MERCED MUNICIPAL AIRPORT, CA US    3501
97      USW00023257  MERCED MUNICIPAL AIRPORT, CA US    3319
13      USW00023257  MERCED MUNICIPAL AIRPORT, CA US    3122
57      USW00023257  MERCED MUNICIPAL AIRPORT, CA US    3058
35      USW00023257  MERCED MUNICIPAL AIRPORT, CA US    3030
99      USW00023257  MERCED MUNICIPAL AIRPORT, CA US    2693
63      USW00023257  MERCED MUNI

In [30]:
# Filter for the specific station name
filtered_df1 = merged_df[merged_df['NAME'] == 'SAN BERNARDINO LAGUNAS, MX']

filtered_df1[['COUNTY', 'STATION', 'NAME']].value_counts()

Series([], Name: count, dtype: int64)

In [31]:

# Filter for the specific station name
filtered_df1 = merged_df[merged_df['NAME'] == 'SAN BERNARDINO 5.1 NW, CA US']

filtered_df1[['COUNTY', 'STATION', 'NAME']].value_counts()

COUNTY  STATION      NAME                        
71      US1CASR0055  SAN BERNARDINO 5.1 NW, CA US    2262
65      US1CASR0055  SAN BERNARDINO 5.1 NW, CA US     344
29      US1CASR0055  SAN BERNARDINO 5.1 NW, CA US     116
37      US1CASR0055  SAN BERNARDINO 5.1 NW, CA US     116
27      US1CASR0055  SAN BERNARDINO 5.1 NW, CA US      13
Name: count, dtype: int64

In [36]:
merged_df['NAME'].unique()


array(['MERCED MUNICIPAL AIRPORT, CA US', nan,
       'RIVERSIDE MUNICIPAL AIRPORT, CA US',
       'FRESNO YOSEMITE INTERNATIONAL, CA US',
       'LOS ANGELES INTERNATIONAL AIRPORT, CA US',
       'SAN DIEGO INTERNATIONAL AIRPORT, CA US',
       'SAN BERNARDINO 5.1 NW, CA US'], dtype=object)

### Ensuring Correct Merging of Datasets Based on County Codes

After merging the **FRAP** (Fire Records) and **NOAA** (Weather Stations) datasets, we checked that the records were matched consistently. For each station name, we listed the county codes of the fires assigned to it and confirmed that the name maps to a single station ID.

The county codes below are taken from the `value_counts()` outputs above:

1. **SAN DIEGO INTERNATIONAL AIRPORT, CA US**  
   - Verified county codes: 73, 25, 65  
   - Matching Entries in Dataset: USW00023188  
   - Status: ✅ All county codes match.

2. **RIVERSIDE MUNICIPAL AIRPORT, CA US**  
   - Verified county codes: 65, 59, 37, 71, 73  
   - Matching Entries in Dataset: USW00003171  
   - Status: ✅ All county codes match.

3. **FRESNO YOSEMITE INTERNATIONAL, CA US**  
   - Verified county codes: 19, 107, 39, 79, 29, 51, 27, 53, 31, 83, 43, 69, 109  
   - Matching Entries in Dataset: USW00093193  
   - Status: ✅ All county codes match.

4. **LOS ANGELES INTERNATIONAL AIRPORT, CA US**  
   - Verified county codes: 37, 29, 111, 83, 59, 79  
   - Matching Entries in Dataset: USW00023174  
   - Status: ✅ All county codes match.

5. **MERCED MUNICIPAL AIRPORT, CA US**  
   - Verified county codes: 47, 23, 89, 17, 7, 93, 61, 103, 109, 45, 97, 13, 57, 35, 99, 63, 105, 9, 49, 1, 85, 77, 53, 115, 43, 67, 33, 87, 55, 5, 39, 95, 15, 81, 113, 3, 69, 91, 101, 51, 19, 41, 21, 11, 75  
   - Matching Entries in Dataset: USW00023257  
   - Status: ✅ All county codes match.

6. **SAN BERNARDINO 5.1 NW, CA US**  
   - Verified county codes: 71, 65, 29, 37, 27  
   - Matching Entries in Dataset: US1CASR0055  
   - Status: ✅ All county codes match.

#### Conclusion:
Each station name maps to exactly one station ID, and the county codes listed for each station agree with the merged dataset, so the join keys were applied consistently. This check does not show that a station is close to its fires: **MERCED MUNICIPAL AIRPORT** alone is the nearest station for fires in 45 county codes, which is the proximity limitation described under Drawbacks and Considerations.
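
The per-station cells above could be collapsed into a single grouped summary. The following is a sketch on a small made-up frame (`toy`) that borrows only the column names and a few station/county pairs from the outputs above. In the notebook, the same calls would be applied to `merged_df`.

```python
import pandas as pd

# Toy stand-in for merged_df; row counts are made up for illustration.
toy = pd.DataFrame({
    "NAME": ["RIVERSIDE MUNICIPAL AIRPORT, CA US"] * 3
            + ["SAN DIEGO INTERNATIONAL AIRPORT, CA US"] * 2,
    "STATION": ["USW00003171"] * 3 + ["USW00023188"] * 2,
    "COUNTY": [65, 65, 59, 73, 25],
})

# Fire counts per station and county in one table,
# largest county first within each station.
counts = (
    toy.groupby(["NAME", "STATION", "COUNTY"])
    .size()
    .rename("fires")
    .reset_index()
    .sort_values(["NAME", "fires"], ascending=[True, False])
)

# Consistency check: each station name should map to exactly one station ID.
ids_per_name = toy.groupby("NAME")["STATION"].nunique()

print(counts)
print(ids_per_name)
```

A single table like this makes it easier to spot a station that covers an unexpectedly large number of counties.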


### Data Visualization of Merged Data: 

In [2]:
import pandas as pd
file_path = r"C:\Users\annis\Project dsc 550\merged_df_cleaned.csv"
df = pd.read_csv(file_path)
df.head()

|  | DISCOVERY_DATE | DISCOVERY_TIME | FIRE_YEAR | LATITUDE_x | LONGITUDE_x | STATE | FIRE_SIZE | FIRE_SIZE_CLASS | NWCG_CAUSE_CLASSIFICATION | NWCG_GENERAL_CAUSE | ... | DATE | AWND | PGTM | PRCP | TMAX | TMIN | WDF2 | WDF5 | WSF2 | WSF5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2005-02-02 | 13:00:00 | 2005 | 40.036944 | -121.005833 | CA | 0.1 | A | Human | Power generation/transmission/distribution | ... | 2005-02-02 | 3.13 | 123.0 | 0.0 | 60.0 | 35.0 | 110.0 | 110.0 | 8.9 | 13.0 |
| 1 | 2004-05-12 | 08:45:00 | 2004 | 38.933056 | -120.404444 | CA | 0.25 | A | Natural | Natural | ... | 2004-05-12 | 8.05 | 1155.0 | 0.0 | 80.0 | 49.0 | 320.0 | 340.0 | 17.0 | 21.0 |
| 2 | 2004-05-31 | 19:21:00 | 2004 | 38.984167 | -120.735556 | CA | 0.1 | A | Human | Debris and open burning | ... | 2004-05-31 | 6.26 | 1708.0 | 0.0 | 94.0 | 56.0 | 320.0 | 310.0 | 15.0 | 17.0 |
| 3 | 2004-06-28 | 16:00:00 | 2004 | 38.559167 | -119.913333 | CA | 0.1 | A | Natural | Natural | ... | 2004-06-28 | 6.93 | 1812.0 | 0.0 | 96.0 | 62.0 | 270.0 | 250.0 | 15.0 | 19.9 |
| 4 | 2004-06-28 | 16:00:00 | 2004 | 38.559167 | -119.933056 | CA | 0.1 | A | Natural | Natural | ... | 2004-06-28 | 6.93 | 1812.0 | 0.0 | 96.0 | 62.0 | 270.0 | 250.0 | 15.0 | 19.9 |


In [3]:
df.columns

Index(['DISCOVERY_DATE', 'DISCOVERY_TIME', 'FIRE_YEAR', 'LATITUDE_x',
       'LONGITUDE_x', 'STATE', 'FIRE_SIZE', 'FIRE_SIZE_CLASS',
       'NWCG_CAUSE_CLASSIFICATION', 'NWCG_GENERAL_CAUSE', 'COUNTY',
       'nearest_station_id', 'STATION', 'NAME', 'LATITUDE_y', 'LONGITUDE_y',
       'ELEVATION', 'DATE', 'AWND', 'PGTM', 'PRCP', 'TMAX', 'TMIN', 'WDF2',
       'WDF5', 'WSF2', 'WSF5'],
      dtype='object')