# Power Outages


### Getting the Data
The data is downloadable [here](https://engineering.purdue.edu/LASCI/research-data/outages/outagerisks).

A data dictionary is available in this [article](https://www.sciencedirect.com/science/article/pii/S2352340918307182) under *Table 1. Variable descriptions*.



### Introduction

Have you ever wondered why massive power outages happen in your community? Is it related to
the weather? Too many people using electricity at once? Maybe the electrical infrastructure in
your region is poorly maintained! In this project my partner and I attempt to tackle these
questions, using a dataset of power outage events that spans January 2000 to July 2016.

In [19]:
# Import packages
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import seaborn as sns
from scipy.stats import pearsonr
sns.set_theme(style= 'whitegrid')
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

### Cleaning and EDA




In [20]:
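def read_data(fp):
    # read the Excel file; the column names sit on the sixth row (header=5)
    data = pd.read_excel(fp, header=5, usecols="B:T")

    # drop the first row, which holds the descriptions rather than data
    data = data.drop([0])

    return data

fp = os.path.join('data', 'outage.xlsx')
# keep the full dataset; call data.head() separately to preview it
data = read_data(fp)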

In [21]:
# check the data types of the relevant columns
data.dtypes

OBS                        float64
YEAR                       float64
MONTH                      float64
U.S._STATE                  object
POSTAL.CODE                 object
NERC.REGION                 object
CLIMATE.REGION              object
ANOMALY.LEVEL               object
CLIMATE.CATEGORY            object
OUTAGE.START.DATE           object
OUTAGE.START.TIME           object
OUTAGE.RESTORATION.DATE     object
OUTAGE.RESTORATION.TIME     object
CAUSE.CATEGORY              object
CAUSE.CATEGORY.DETAIL       object
HURRICANE.NAMES             object
OUTAGE.DURATION             object
DEMAND.LOSS.MW              object
CUSTOMERS.AFFECTED         float64
dtype: object
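
Several columns that look numeric, such as `OUTAGE.DURATION` and `DEMAND.LOSS.MW`, are reported as `object` in the listing above. The following is a minimal sketch of one way to coerce such columns with `pd.to_numeric`. The toy frame and its values are invented for illustration and are not taken from the outage dataset.

```python
import pandas as pd

# Hypothetical stand-in for two object-typed columns (values are invented)
toy = pd.DataFrame({
    "OUTAGE.DURATION": pd.Series([3060, "1", None], dtype=object),
    "DEMAND.LOSS.MW": pd.Series(["250", None, 75.5], dtype=object),
})

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising an exception
numeric = toy.apply(pd.to_numeric, errors="coerce")

print(numeric.dtypes)
print(numeric.isna().sum())
```

Coercing rather than casting with `astype(float)` is a design choice: it tolerates stray strings, at the cost of silently turning them into missing values, so it may be worth counting the NaNs before and after.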

In [22]:
# helper function to convert into seasons
def season_helper(obs):
    if obs >= 3 and obs <= 5:
        return 'Spring'
    elif obs >= 6 and obs <= 8:
        return 'Summer'
    elif obs >= 9 and obs <= 11:
        return 'Fall'
    elif pd.isnull(obs):
        return np.nan  # the np.NaN alias was removed in NumPy 2.0
    else:
        return 'Winter'

def data_cleaning(data):

    data_copy = data.copy(deep=True)

    # NOTE: the ends of several lines below were cut off in the source. The
    # column names are restored from the dtypes listing above; the
    # .astype(str) conversions before pd.to_timedelta are an assumption.

    # combine outage start time/date
    data_copy["OUTAGE.START.DATE"] = pd.to_datetime(data_copy["OUTAGE.START.DATE"])
    data_copy["OUTAGE.START.TIME"] = pd.to_timedelta(data_copy["OUTAGE.START.TIME"].astype(str))
    data_copy["OUTAGE.START"] = data_copy["OUTAGE.START.DATE"] + data_copy["OUTAGE.START.TIME"]
    data_copy = data_copy.drop(columns=["OUTAGE.START.DATE", "OUTAGE.START.TIME"])

    # combine outage restoration time/date
    data_copy["OUTAGE.RESTORATION.DATE"] = pd.to_datetime(data_copy["OUTAGE.RESTORATION.DATE"])
    data_copy["OUTAGE.RESTORATION.TIME"] = pd.to_timedelta(data_copy["OUTAGE.RESTORATION.TIME"].astype(str))
    data_copy["OUTAGE.RESTORATION"] = data_copy["OUTAGE.RESTORATION.DATE"] + data_copy["OUTAGE.RESTORATION.TIME"]
    data_copy = data_copy.drop(columns=["OUTAGE.RESTORATION.DATE", "OUTAGE.RESTORATION.TIME"])

    # cast values to 'correct' types + set index
    data_copy = data_copy.set_index('OBS')
    data_copy["YEAR"] = data_copy["YEAR"].astype(int)
    data_copy["OUTAGE.DURATION"] = data_copy["OUTAGE.DURATION"].astype(float)
    data_copy['DEMAND.LOSS.MW'] = data_copy['DEMAND.LOSS.MW'].astype(float)

    # add on a season column based on the month
    data_copy['SEASON'] = data_copy['MONTH'].apply(season_helper)

    data_copy = data_copy.reset_index().drop(columns=['OBS'])

    return data_copy
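
The date/time combination step can be checked in isolation on a small frame. This is a sketch under stated assumptions, not the notebook's actual run: the rows below are hypothetical, the times are given as plain strings, and `errors="coerce"` is used so that a missing time becomes `NaT` rather than raising.

```python
import pandas as pd

# Hypothetical rows mimicking the separate date and time columns
toy = pd.DataFrame({
    "OUTAGE.START.DATE": ["2011-07-01", "2014-05-11", None],
    "OUTAGE.START.TIME": ["17:00:00", "18:38:00", None],
})

# Parse the date part into timestamps (midnight of each day)
start_date = pd.to_datetime(toy["OUTAGE.START.DATE"])

# Parse the time part as an offset from midnight
start_time = pd.to_timedelta(toy["OUTAGE.START.TIME"], errors="coerce")

# Timestamp + Timedelta gives the full start timestamp; missing parts yield NaT
start = start_date + start_time
print(start)
```

Rows with a missing date or time end up as `NaT`, which keeps them visible for a later missingness analysis instead of dropping them.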