# Hurricane Path EDA

Goal: clean up the data so it is usable for analysis, find additional data sources to include (e.g., oceanic/atmospheric variables), and perform EDA.
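One way to attach oceanic/atmospheric variables is an as-of join on time, where each track point takes the most recent value from an environmental time series. The sketch below is a minimal illustration under stated assumptions: `env` and its `sst` column are hypothetical stand-ins for whatever external data set ends up being used, and the join matches on time only. A real version would also need to sample the field at each point's `lat`/`lon`.

```python
import pandas as pd

# Hypothetical track points using the IBTrACS column names (made-up values)
tracks = pd.DataFrame({
    "sid": ["A", "A", "B"],
    "iso_time": pd.to_datetime(["2020-08-01 00:00", "2020-08-01 03:00", "2020-08-01 02:00"]),
})

# Hypothetical daily environmental series, e.g. a basin-mean sea surface temperature (degC)
env = pd.DataFrame({
    "time": pd.to_datetime(["2020-07-31", "2020-08-01"]),
    "sst": [28.1, 28.4],
})

# merge_asof requires both frames to be sorted on their join keys.
# NOTE: this matches on time only; location is ignored in this sketch.
merged = pd.merge_asof(
    tracks.sort_values("iso_time"),
    env.sort_values("time"),
    left_on="iso_time",
    right_on="time",
    direction="backward",  # most recent environmental value at or before each fix
)
print(merged[["sid", "iso_time", "sst"]])
```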

Here's the documentation for the data: https://www.ncei.noaa.gov/sites/g/files/anmtlf171/files/2025-09/IBTrACS_v04r01_column_documentation.pdf

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

pd.options.display.max_columns = None
pd.set_option('future.no_silent_downcasting', True)

In [44]:
def load_ibtracs(path="ibtracs.ALL.list.v04r01.csv"):
    """
    Loads the IBTrACS hurricane track data and cleans up the column dtypes.
    """
    ## IBTrACS uses a two-row header, so skip the second row.
    ## low_memory=False parses each column in one pass, avoiding the mixed-dtype DtypeWarning
    df = pd.read_csv(path, skiprows=[1], low_memory=False)
    df.columns = [c.lower() for c in df.columns]  ## normalizes the column names to lowercase
    df = df.replace(' ', np.nan)  ## IBTrACS stores blank values as a single space

    ## fixes columns that were read with mixed dtypes (e.g., strings mixed with ints);
    ## nullable Int64/Float64 dtypes keep the missing values as <NA>
    for col in ['storm_dir', 'storm_speed', 'usa_wind']:
        df[col] = pd.to_numeric(df[col], errors='coerce').astype('Int64')
    for col in ['usa_lat', 'usa_lon']:
        df[col] = pd.to_numeric(df[col], errors='coerce').astype('Float64')

    ## converts the observation times to datetime format
    df['iso_time'] = pd.to_datetime(df['iso_time'])
    return df

hurricane_paths = load_ibtracs("ibtracs.ALL.list.v04r01.csv").copy()
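A quick way to sanity-check the loader's cleaning steps is to run them on a tiny in-memory CSV shaped like IBTrACS: a header row, a second row that `skiprows=[1]` drops, and blanks stored as a single space. The rows below are made up for illustration and only mimic the layout; they are not real IBTrACS records, and only one numeric column is coerced here.

```python
import io

import numpy as np
import pandas as pd

# Made-up rows in the IBTrACS layout: header row, second header row, then data.
# The blank STORM_SPEED on the second data row is stored as a single space.
raw = (
    "SID,SEASON,STORM_SPEED,ISO_TIME\n"
    ",Year,kts,\n"
    "X1,2020,12,2020-08-01 00:00:00\n"
    "X1,2020, ,2020-08-01 03:00:00\n"
)

# Same cleaning steps as load_ibtracs, applied to the in-memory text
df = pd.read_csv(io.StringIO(raw), skiprows=[1])  # drop the second header row
df.columns = [c.lower() for c in df.columns]
df = df.replace(" ", np.nan)  # single-space blanks become NaN
df["storm_speed"] = pd.to_numeric(df["storm_speed"], errors="coerce").astype("Int64")
df["iso_time"] = pd.to_datetime(df["iso_time"])

print(df.dtypes)
print(df)
```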



In [47]:
hurricane_paths.dtypes.head(10)

sid                 object
season               int64
number               int64
basin               object
subbasin            object
name                object
iso_time    datetime64[ns]
nature              object
lat                float64
lon                float64
dtype: object
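The columns coerced in `load_ibtracs` (`storm_dir`, `storm_speed`, `usa_wind`, `usa_lat`, `usa_lon`) fall outside the first ten listed here; `hurricane_paths[['storm_dir', 'storm_speed', 'usa_wind', 'usa_lat', 'usa_lon']].dtypes` shows them directly. They use pandas' nullable `Int64`/`Float64` dtypes because `errors='coerce'` introduces missing values, which a plain NumPy `int64` column cannot hold. A minimal sketch on a made-up series:

```python
import numpy as np
import pandas as pd

# Made-up object column mixing numeric strings, a missing value, and junk text
mixed = pd.Series(["12", "7", np.nan, "n/a"], dtype="object")

# errors="coerce" turns anything unparseable into NaN, so the result is float64
nums = pd.to_numeric(mixed, errors="coerce")
print(nums.dtype)

# A plain NumPy integer cast cannot represent NaN and raises a ValueError
try:
    nums.astype("int64")
except ValueError as err:
    print("int64 cast failed:", err)

# The nullable Int64 dtype keeps the integers and marks the gaps as <NA>
nullable = nums.astype("Int64")
print(nullable)
```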

In [51]:
hurricane_paths['iso_time']

0        1842-10-25 03:00:00
1        1842-10-25 06:00:00
2        1842-10-25 09:00:00
3        1842-10-25 12:00:00
4        1842-10-25 15:00:00
                 ...        
721932   2025-11-05 12:00:00
721933   2025-11-05 15:00:00
721934   2025-11-05 18:00:00
721935   2025-11-05 21:00:00
721936   2025-11-06 00:00:00
Name: iso_time, Length: 721937, dtype: datetime64[ns]
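The rows shown are spaced three hours apart, but they cover only ten of the 721,937 observations. A minimal sketch for checking the spacing across the whole table takes the gap between consecutive fixes within each storm, assuming storms are identified by `sid` and times by `iso_time` as above. The two storms here are made-up examples; the `sort_values`/`groupby(...).diff()` lines apply unchanged to `hurricane_paths`.

```python
import pandas as pd

# Made-up observation times for two storms
obs = pd.DataFrame({
    "sid": ["A", "A", "A", "B", "B"],
    "iso_time": pd.to_datetime([
        "2020-08-01 00:00", "2020-08-01 03:00", "2020-08-01 06:00",
        "2020-08-02 00:00", "2020-08-02 06:00",
    ]),
})

# Gap between consecutive fixes within each storm (each storm's first fix gets NaT)
obs = obs.sort_values(["sid", "iso_time"])
obs["dt"] = obs.groupby("sid")["iso_time"].diff()

# How often each interval occurs (NaT is dropped by value_counts)
print(obs["dt"].value_counts())
```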