# Intro

The goal of this project is to build a model that takes in tropical cyclone tracking data and accurately classifies whether the readings indicate a severe Tropical Storm or a less disruptive disturbance.

## Business Case

The resulting model will be used by meteorologists to judge whether an incoming storm is a major threat to a given area, so they can advise news agencies, local governments, and the public to prepare accordingly.

## Data Understanding

The data for this project is from the National Oceanic and Atmospheric Administration's International Best Track Archive for Climate Stewardship (IBTrACS) project. The goal of IBTrACS is to make tropical cyclone best track data available to aid understanding of the distribution, frequency, and intensity of tropical cyclones worldwide.

Because IBTrACS is meant to be a global data source, it pulls from many source agencies worldwide, and therefore has many columns that are duplicative, inconsistent, or difficult to interpret. Throughout this analysis I referred to the data documentation saved in this repository.

[Source](https://www.ncdc.noaa.gov/ibtracs/index.php)

I'll start by importing my data to describe it further.

In [2]:
import pandas as pd

# skiprows=[1] skips the second line of the file, which is a second header row rather than data
df = pd.read_csv('data/ibtracs.since1980.list.v04r00.csv', dtype='object', parse_dates=True, skiprows=[1])

pd.set_option('display.max_columns', None)
df.head(3)

In [None]:
print(df.shape)

(271883, 163)

The file is large (271,883 rows by 163 columns), but it will shrink during the cleaning process. Because I loaded everything with dtype='object', all 163 columns are currently object datatypes, so I'll need to go through and convert them to appropriate types.
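As a rough illustration of that clean-up, here is a minimal sketch on a toy frame rather than the real file. The column names (`iso_time`, `lat`, `usa_wind`) and the use of blank strings for missing readings are assumptions made for the example, not something confirmed above.

```python
import pandas as pd

# Toy stand-in for the raw file: every column holds strings, and missing
# readings are ASSUMED to appear as blank strings.
raw = pd.DataFrame({
    "iso_time": ["1980-01-01 00:00:00", "1980-01-01 03:00:00", "1980-01-01 06:00:00"],
    "lat": ["-12.5", "-12.2", " "],
    "usa_wind": ["25", " ", "30"],
}, dtype="object")

clean = raw.copy()

# Timestamps: anything unparseable becomes NaT instead of raising
clean["iso_time"] = pd.to_datetime(clean["iso_time"], errors="coerce")

for col in ["lat", "usa_wind"]:
    stripped = clean[col].str.strip()
    # Blank strings become NaN first, then the rest is parsed as numbers
    clean[col] = pd.to_numeric(stripped.mask(stripped == ""), errors="coerce")

print(clean.dtypes)
```

Coercing rather than raising keeps the conversion from failing on a single bad reading, at the cost of hiding parse failures as missing values, so it is worth checking the null counts afterwards.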

In [None]:
df.columns = [x.lower() for x in df.columns]
df.info(verbose=True)

The dataset has readings for storms at multiple points in their progression. There are 4,458 unique storms tracked.

In [None]:
df['sid'].nunique()

My classification task is to identify whether a reading comes from a severe Tropical Storm or a minor storm. My target column, 'nature', has six classes, which I want to collapse into two, making this a binary classification: severe storm or not severe.

NR (not reported) and MX (mixture) will be removed because they carry no usable label. TS (tropical storm) will be my '1' class, a severe storm. ET (extratropical), DS (disturbance), and SS (subtropical) are less severe storms and will be my '0' class.
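The mapping described above could look something like the sketch below. It runs on a small hand-made frame rather than the real data, and the `target` column name is a hypothetical choice for the example.

```python
import pandas as pd

# Toy frame with one row per 'nature' code, plus a repeated TS
toy = pd.DataFrame({"nature": ["TS", "ET", "NR", "DS", "MX", "SS", "TS"]})

# Drop the codes that carry no usable label
uninformative = ["NR", "MX"]
labelled = toy[~toy["nature"].isin(uninformative)].copy()

# 1 = tropical storm (severe), 0 = ET / DS / SS (less severe)
labelled["target"] = (labelled["nature"] == "TS").astype(int)

print(labelled["target"].value_counts())
```

A boolean comparison is enough here because everything that survives the filter and is not TS belongs to the '0' class.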

In [3]:
# Run the column-lowercasing cell above first; otherwise this raises KeyError: 'nature'
df['nature'].unique()

## Data Exploration & Cleaning