Getting elevation from latitude/longitude and making a new CSV file that contains the elevation


Importing the packages needed to get the data and build the final dataset.

In [16]:
import pandas as pd
import numpy as np

If you already have the full dataset (CSV file) with the elevation, you don't need to run the cells between the two *** separators.

***
Function to get elevation

In [2]:
import requests
import pandas as pd

# returns the elevation for a (lat, long) point using the Open-Elevation API,
# which in turn is based on SRTM data
def get_elevation(lat: float, long: float):
    """
    :param lat: latitude of the point
    :param long: longitude of the point
    :return: elevation in meters as a numpy integer scalar (e.g. numpy.int64)
    """
    query = ('https://api.open-elevation.com/api/v1/lookup'
             f'?locations={lat},{long}')
    response = requests.get(query, timeout=30)  # don't hang forever on a stalled request
    response.raise_for_status()  # raise on HTTP errors instead of failing later while parsing
    r = response.json()  # JSON object; there are various ways to extract the value
    # one approach is to use pandas' JSON functionality:
    elevation = pd.json_normalize(r, 'results')['elevation'].values[0]
    return elevation
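The lookup endpoint responds with JSON whose results list holds one entry (latitude, longitude, elevation) per requested point. The snippet below uses a hard-coded response of that shape, with example values rather than data from this project, to show what pd.json_normalize extracts, next to a plain-dictionary alternative that avoids building a DataFrame just to read a single value.

```python
import pandas as pd

# Illustrative response in the shape returned by the Open-Elevation lookup endpoint
# (example values, not taken from the wildfire dataset).
r = {"results": [{"latitude": 41.161758, "longitude": -8.583933, "elevation": 117}]}

# Approach used in get_elevation: flatten the "results" list into a DataFrame.
elevation_pandas = pd.json_normalize(r, "results")["elevation"].values[0]

# Plain-Python alternative: index into the parsed JSON directly.
elevation_dict = r["results"][0]["elevation"]

print(type(elevation_pandas).__name__, elevation_pandas)  # a NumPy integer scalar
print(type(elevation_dict).__name__, elevation_dict)      # a built-in int
```

Both give the same number; the dictionary version returns a built-in int and skips the DataFrame construction, which adds up when the function is called once per row.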

Getting the latitude and longitude points from the CSV file

In [24]:
dfFullNoElevation = pd.read_csv(filepath_or_buffer=r"D:\Google Drive\Uni\Tilburg\Semester 6\Thesis\Data\FW_Veg_Rem_Combined.csv")
dfLatLong = dfFullNoElevation[["latitude", "longitude"]]

# dfLatLong
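If several fires share exactly the same coordinates, each distinct (latitude, longitude) pair only needs to be looked up once. A minimal sketch, assuming there are no missing coordinates; lookup_unique and fake_lookup are hypothetical helpers, and in this notebook get_elevation would be passed as lookup.

```python
import pandas as pd

def lookup_unique(df_lat_long: pd.DataFrame, lookup) -> pd.Series:
    """Call lookup(lat, long) once per distinct coordinate pair and map the
    results back onto every row of df_lat_long (original index preserved)."""
    unique_points = df_lat_long.drop_duplicates()
    cache = {
        (lat, long): lookup(lat, long)
        for lat, long in zip(unique_points["latitude"], unique_points["longitude"])
    }
    values = [cache[(lat, long)]
              for lat, long in zip(df_lat_long["latitude"], df_lat_long["longitude"])]
    return pd.Series(values, index=df_lat_long.index, name="elevation")

# Toy example: a hypothetical stand-in for get_elevation that records its calls.
calls = []

def fake_lookup(lat, long):
    calls.append((lat, long))
    return round(lat + long)

toy = pd.DataFrame({"latitude": [40.0, 40.0, 35.5], "longitude": [-120.0, -120.0, -110.5]})
elev = lookup_unique(toy, fake_lookup)
print(elev.tolist(), len(calls))  # [-80, -80, -75] 2
```

With the real data this would be used as dfElevation = lookup_unique(dfLatLong, get_elevation).to_frame(). How much it saves depends entirely on how many exact duplicate coordinates the dataset contains.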


Script to get the elevation. It calls the API once per row, so it takes a while: when I ran it, it took almost 10 hours to complete.

In [3]:
# import time  # optional, for timing the code

elevations = []  # collect results in a plain list; DataFrame.append is slow and was removed in pandas 2.0

# tic = time.perf_counter()  # timing start

for index, row in dfLatLong.iterrows():
    elevations.append(get_elevation(row["latitude"], row["longitude"]))
    if index % 100 == 0:
        print(f"{index} done.")  # useful for dev purposes, shows the progress

# toc = time.perf_counter()  # timing end

# build the DataFrame once, reusing dfLatLong's index so it lines up with the original rows
dfElevation = pd.DataFrame({"elevation": elevations}, index=dfLatLong.index)
dfElevation.head(10)
# print(f"Done in {toc - tic:0.4f} seconds")  # print the time elapsed


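The loop above makes one HTTP request per row. The Open-Elevation API also accepts a POST request to the same lookup endpoint with a JSON body listing many locations, so a batched version needs far fewer requests. The sketch below uses only the standard library. The batch size of 100 is an arbitrary choice (the public instance may impose its own limits), and the code assumes the service returns results in the same order as the requested locations: it checks the count, not the order. The post parameter is there so the function can be exercised without network access; post_json and get_elevations_batched are hypothetical names.

```python
import json
import urllib.request
from typing import Callable, List, Sequence, Tuple

LOOKUP_URL = "https://api.open-elevation.com/api/v1/lookup"

def post_json(url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

def get_elevations_batched(
    points: Sequence[Tuple[float, float]],
    batch_size: int = 100,
    post: Callable[[str, dict], dict] = post_json,
) -> List[float]:
    """Look up elevations for many (lat, long) points, batch_size points per request.

    Assumes the results come back in the same order as the requested locations."""
    elevations: List[float] = []
    for start in range(0, len(points), batch_size):
        batch = points[start:start + batch_size]
        payload = {"locations": [{"latitude": lat, "longitude": long} for lat, long in batch]}
        results = post(LOOKUP_URL, payload)["results"]
        if len(results) != len(batch):
            raise ValueError(f"expected {len(batch)} results, got {len(results)}")
        elevations.extend(item["elevation"] for item in results)
    return elevations
```

With the notebook's data, the points could be built as list(dfLatLong.itertuples(index=False, name=None)), which yields plain (latitude, longitude) tuples in row order.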

Next, we want the day of the week on which each fire was discovered. First, we convert disc_clean_date to datetime objects.

In [12]:
dfFullNoElevation = pd.read_csv(filepath_or_buffer=r"D:\Google Drive\Uni\Tilburg\Semester 6\Thesis\Data\FW_Veg_Rem_Combined.csv")
dfFullNoElevation["disc_date_datetime"] = pd.to_datetime(dfFullNoElevation["disc_clean_date"], format="%m%d%Y")  # create a new column which contains the datetime objects
dfFullNoElevation["disc_dow"] = dfFullNoElevation["disc_date_datetime"].dt.dayofweek  # create a column which contains the day of week (dow)
# 0 - Monday; 1 - Tuesday; ...; 6 - Sunday;
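To double-check the day-of-week encoding, here is a small sketch on two made-up date strings, assuming disc_clean_date is stored in the %m%d%Y layout used above. dt.day_name() returns the weekday names, which is a quick way to confirm that 0 is Monday and 6 is Sunday.

```python
import pandas as pd

# Two illustrative date strings in the "%m%d%Y" layout (not taken from the dataset).
dates = pd.Series(["02112007", "07042015"])

parsed = pd.to_datetime(dates, format="%m%d%Y")
dow = parsed.dt.dayofweek      # 0 = Monday ... 6 = Sunday
names = parsed.dt.day_name()   # human-readable check of the numeric codes

print(pd.DataFrame({"date": parsed, "disc_dow": dow, "day": names}))
```

Here 11 February 2007 comes out as 6 (Sunday) and 4 July 2015 as 5 (Saturday).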

Combining the elevation and date data into the full dataset and writing it to a CSV file. This way you don't have to rerun the slow code above every time you need the data.

In [13]:
dfFull = pd.concat([dfFullNoElevation, dfElevation], axis=1)
#dfFull.head(10)
dfFull.to_csv(path_or_buf=r"D:\Google Drive\Uni\Tilburg\Semester 6\Thesis\Data\fullWithElevation.csv", index=False)  # don't write the row index as an extra, unnamed column
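By default, DataFrame.to_csv also writes the row index as an unnamed first column, which comes back as a column called "Unnamed: 0" when the file is read again. The column selection in the next part discards it anyway, but passing index=False keeps the file clean. A small in-memory round trip (toy values) shows the difference:

```python
import io
import pandas as pd

df = pd.DataFrame({"latitude": [40.0, 35.5], "elevation": [120, 87]})

# Default: the index is written as an extra, unnamed first column.
with_index = pd.read_csv(io.StringIO(df.to_csv()))
# index=False: only the DataFrame's own columns are written.
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(with_index.columns.tolist())     # ['Unnamed: 0', 'latitude', 'elevation']
print(without_index.columns.tolist())  # ['latitude', 'elevation']
```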

***

If you already have the dataset (CSV file) with the elevation, you only need to run this part. It imports the CSV directly into a pandas DataFrame and applies the necessary transformations and column selections to produce the final dataset.

In [17]:
dfFull = pd.read_csv(filepath_or_buffer=r"D:\Google Drive\Uni\Tilburg\Semester 6\Thesis\Data\fullWithElevation.csv")
# dfFull.head(10)

dfFull = dfFull[["fire_size_class", "latitude", "longitude", "discovery_month", "disc_dow",
                 "Temp_pre_30", "Temp_pre_15", "Temp_pre_7", "Wind_pre_30", "Wind_pre_15", "Wind_pre_7",
                 "Hum_pre_30", "Hum_pre_15", "Hum_pre_7", "Prec_pre_30", "Prec_pre_15", "Prec_pre_7",
                 "Vegetation", "remoteness", "elevation"]]

Some of the weather data is missing, and it is marked with -1 rather than NA. Let's replace the -1 values with NaN and then drop the rows that contain missing values.

In [21]:
dfFull.replace(-1.0, np.nan, inplace=True)  # mark -1 as missing
dfFull.dropna(inplace=True)
print(dfFull.shape)

(41118, 20)
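Note that replace(-1.0, np.nan) above applies to every column, not only to the weather data, so a genuine value of -1 elsewhere (an elevation of -1 m, for example, is a plausible reading) would also be turned into NaN and its row dropped. A sketch that limits the replacement to the 12 weather columns and also reports the share of rows lost could look like this; WEATHER_COLS and mark_missing_weather are hypothetical names, and the toy frame is made up.

```python
import numpy as np
import pandas as pd

# The 12 weather columns from the selection above; -1 means "missing" only here.
WEATHER_COLS = [f"{var}_pre_{days}" for var in ("Temp", "Wind", "Hum", "Prec") for days in (30, 15, 7)]

def mark_missing_weather(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where -1 in the weather columns becomes NaN; other columns are untouched."""
    out = df.copy()
    out[WEATHER_COLS] = out[WEATHER_COLS].replace(-1.0, np.nan)
    return out

# Toy frame: row 0 has an elevation of -1 m, which should survive.
toy = pd.DataFrame({col: [1.0, -1.0, 2.0] for col in WEATHER_COLS})
toy["elevation"] = [-1, 50, 100]

cleaned = mark_missing_weather(toy).dropna()
share_lost = 1 - len(cleaned) / len(toy)
print(cleaned["elevation"].tolist(), f"{share_lost:.0%} of rows dropped")
```

On the real data, comparing the row counts before and after dropna in the same way gives the exact share of rows lost.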


Unfortunately, we end up losing about 20% of our data.

Perhaps, since we'll be splitting up the dataset, we can limit ourselves to using this smaller dataset only when dealing with the weather variables.
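One way to act on this idea is to keep two versions of the data: every row without the weather columns, and only the complete rows with them. The sketch below assumes it runs on the frame after the -1 values have been replaced with NaN but before the dropna call; split_by_weather and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd

# Same 12 weather columns as in the selection above.
WEATHER_COLS = [f"{var}_pre_{days}" for var in ("Temp", "Wind", "Hum", "Prec") for days in (30, 15, 7)]

def split_by_weather(df: pd.DataFrame, weather_cols=WEATHER_COLS):
    """Split a frame (with -1 already replaced by NaN) into two datasets:
    - no_weather: weather columns dropped, then rows with other missing values removed
    - with_weather: only rows that are complete in every column, weather included
    """
    no_weather = df.drop(columns=weather_cols).dropna()
    with_weather = df.dropna()
    return no_weather, with_weather

# Toy frame: the second row is missing weather values only.
toy = pd.DataFrame({col: [1.0, np.nan, 3.0] for col in WEATHER_COLS})
toy["elevation"] = [120, 87, 15]

no_weather, with_weather = split_by_weather(toy)
print(len(no_weather), len(with_weather))  # 3 2
```

Models that don't use the weather variables can then train on the larger no_weather frame, while only the weather-based ones pay the cost of the dropped rows.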