Replicate Boruff et al 2003

Gopi Goteti edited this page Jan 21, 2014 · 1 revision

Bryan Boruff and others used data from the Storm Prediction Center to examine the temporal variability and spatial distribution of tornado hazards in the United States ("Boruff, Bryan J; Easoz, Jaime A; Jones, Steve D; Landry, Heather R; Mitchem, Jamie D; Cutter, Susan L; ",Tornado hazards in the United States,Climate Research,24,2,103-117,2003,"INTER-RESEARCH NORDBUNTE 23, D-21385 OLDENDORF LUHE, GERMANY).

The below code produces numbers consistent, if not identical, to those from Boruff et al. The discrepancy could be due to the differences in Tornado data used (due to regular quality control the latest data used here may not be identical to the one used by SPC).

Read raw data add column names similar to those used by Elsner et al 2013 and consistent with documentation.

library(plyr)

torn <- read.csv("data/1950-2012_torn.csv", header = FALSE, sep = ",", as.is = TRUE)

colnames(torn) <- c("OM", "YEAR", "MONTH", "DAY", "DATE", "TIME", "TIMEZONE", 
    "STATE", "FIPS", "STATENUMBER", "FSCALE", "INJURIES", "FATALITIES", "LOSS", 
    "CROPLOSS", "SLAT", "SLON", "ELAT", "ELON", "LENGTH", "WIDTH", "NS", "SN", 
    "SG", "F1", "F2", "F3", "F4")

Tornadoes spanning multiple counties are listed separately for each county. Thus, a single tornado could appear multiple times. Identify unique tornadoes using YEAR, OM and NS. Check for uniqueness; especially, tornadoes spanning multiple years (i.e, those which begin on 12/31 and end on 1/1); need to check only those with NS > 1.

dec31 <- subset(torn, MONTH == 12 & DAY == 31 & NS != 1)
jan01 <- subset(torn, MONTH == 1 & DAY == 1 & NS != 1)
if (nrow(dec31) > 0 & nrow(jan01) > 0) {
    stop("check! unique id assignment may not be accurate!")
}
torn$id <- paste(torn$YEAR, torn$MONTH, torn$OM, torn$NS, sep = "-")

Formatted data for use in subsequent analysis. Boruff et al used data from 1950-99 and excludd Alaska, Hawaii and Puerto Rico.

torn <- subset(torn, YEAR %in% seq(1950, 1999))
torn <- subset(torn, !(STATE %in% c("AK", "HI", "PR")))

str(torn)
## 'data.frame':    41107 obs. of  29 variables:
##  $ OM         : int  1 1 1 2 3 4 5 6 7 8 ...
##  $ YEAR       : int  1950 1950 1950 1950 1950 1950 1950 1950 1950 1950 ...
##  $ MONTH      : int  1 1 1 1 1 1 1 1 1 2 ...
##  $ DAY        : int  3 3 3 3 3 13 25 25 26 11 ...
##  $ DATE       : chr  "1950-01-03" "1950-01-03" "1950-01-03" "1950-01-03" ...
##  $ TIME       : chr  "11:00:00" "11:00:00" "11:10:00" "11:55:00" ...
##  $ TIMEZONE   : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ STATE      : chr  "MO" "MO" "IL" "IL" ...
##  $ FIPS       : int  29 29 17 17 39 5 29 17 48 48 ...
##  $ STATENUMBER: int  1 1 1 2 1 1 2 3 1 2 ...
##  $ FSCALE     : int  3 3 3 3 1 3 2 2 2 2 ...
##  $ INJURIES   : int  3 3 0 3 1 1 5 0 2 0 ...
##  $ FATALITIES : int  0 0 0 0 0 1 0 0 0 0 ...
##  $ LOSS       : num  6 6 5 5 4 3 5 5 0 4 ...
##  $ CROPLOSS   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SLAT       : num  38.8 38.8 38.8 39.1 40.9 ...
##  $ SLON       : num  -90.2 -90.2 -90.1 -89.3 -84.6 ...
##  $ ELAT       : num  38.8 38.8 38.8 39.1 0 ...
##  $ ELON       : num  -90 -90.1 -90 -89.2 0 ...
##  $ LENGTH     : num  9.5 6.2 3.3 3.6 0.1 0.6 2.3 0.1 4.7 9.9 ...
##  $ WIDTH      : int  150 150 100 130 10 17 300 100 133 400 ...
##  $ NS         : int  2 2 2 1 1 1 1 1 1 1 ...
##  $ SN         : int  0 1 1 1 1 1 1 1 1 1 ...
##  $ SG         : int  1 2 2 1 1 1 1 1 1 1 ...
##  $ F1         : int  0 189 119 135 161 113 93 91 47 39 ...
##  $ F2         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ F3         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ F4         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ id         : chr  "1950-1-1-2" "1950-1-1-2" "1950-1-1-2" "1950-1-2-1" ...
head(torn)
##   OM YEAR MONTH DAY       DATE     TIME TIMEZONE STATE FIPS STATENUMBER
## 1  1 1950     1   3 1950-01-03 11:00:00        3    MO   29           1
## 2  1 1950     1   3 1950-01-03 11:00:00        3    MO   29           1
## 3  1 1950     1   3 1950-01-03 11:10:00        3    IL   17           1
## 4  2 1950     1   3 1950-01-03 11:55:00        3    IL   17           2
## 5  3 1950     1   3 1950-01-03 16:00:00        3    OH   39           1
## 6  4 1950     1  13 1950-01-13 05:25:00        3    AR    5           1
##   FSCALE INJURIES FATALITIES LOSS CROPLOSS  SLAT   SLON  ELAT   ELON
## 1      3        3          0    6        0 38.77 -90.22 38.83 -90.03
## 2      3        3          0    6        0 38.77 -90.22 38.82 -90.12
## 3      3        0          0    5        0 38.82 -90.12 38.83 -90.03
## 4      3        3          0    5        0 39.10 -89.30 39.12 -89.23
## 5      1        1          0    4        0 40.88 -84.58  0.00   0.00
## 6      3        1          1    3        0 34.40 -94.37  0.00   0.00
##   LENGTH WIDTH NS SN SG  F1 F2 F3 F4         id
## 1    9.5   150  2  0  1   0  0  0  0 1950-1-1-2
## 2    6.2   150  2  1  2 189  0  0  0 1950-1-1-2
## 3    3.3   100  2  1  2 119  0  0  0 1950-1-1-2
## 4    3.6   130  1  1  1 135  0  0  0 1950-1-2-1
## 5    0.1    10  1  1  1 161  0  0  0 1950-1-3-1
## 6    0.6    17  1  1  1 113  0  0  0 1950-1-4-1
tail(torn)
##        OM YEAR MONTH DAY       DATE     TIME TIMEZONE STATE FIPS
## 41143 673 1999    12   4 1999-12-04 15:20:00        3    TX   48
## 41144 682 1999    12   4 1999-12-04 16:49:00        3    AR    5
## 41145 683 1999    12   4 1999-12-04 17:10:00        3    AR    5
## 41146 684 1999    12   4 1999-12-04 17:30:00        3    AR    5
## 41147 670 1999    12   4 1999-12-04 18:33:00        3    LA   22
## 41148 680 1999    12   9 1999-12-09 18:36:00        3    MS   28
##       STATENUMBER FSCALE INJURIES FATALITIES LOSS CROPLOSS  SLAT   SLON
## 41143          68      1        0          0 0.04        0 32.53 -94.37
## 41144          75      1        0          0 0.00        0 34.20 -92.97
## 41145          76      1        0          0 0.00        0 35.52 -93.53
## 41146          77      1        0          0 0.00        0 35.03 -93.23
## 41147          48      0        0          0 0.00        0 31.75 -93.40
## 41148          10      3        1          0 0.20        0 32.63 -90.35
##        ELAT   ELON LENGTH WIDTH NS SN SG  F1 F2 F3 F4            id
## 41143 32.53 -94.37    1.6   140  1  1  1 203  0  0  0 1999-12-673-1
## 41144 34.25 -92.95    4.3     0  1  1  1  59  0  0  0 1999-12-682-1
## 41145 35.55 -93.52    2.8    50  1  1  1  71  0  0  0 1999-12-683-1
## 41146 35.13 -93.13    8.7     0  1  1  1 149  0  0  0 1999-12-684-1
## 41147 31.78 -93.35    4.0    30  1  1  1  69  0  0  0 1999-12-670-1
## 41148 32.70 -90.28    6.0   400  1  1  1 163  0  0  0 1999-12-680-1

Function to summarize counts of unique number of tornadoes by year and month.

count_unique_tornadoes <- function(in_df, monthly_stats = TRUE) {

    require(plyr)

    if (monthly_stats) {
        # some months dont have data; assign NAs to those months
        mon_totals <- expand.grid(MONTH = seq(1, 12), stringsAsFactors = FALSE)

        # number of unique tornadoes per month
        mon_torn <- ddply(.data = in_df, .variables = .(MONTH), .fun = function(x_df) length(unique(x_df$id)), 
            .drop = FALSE)

        mon_totals <- merge(mon_totals, mon_torn, by = "MONTH", all = TRUE)

        # output matrix
        out_mat <- c(nrow(in_df), length(unique(in_df$id)), mon_totals$V1)
        out_mat <- matrix(out_mat, nrow = 1)
        colnames(out_mat) <- c("N_total", "N_unique", month.abb)
    } else {
        # output matrix
        out_mat <- c(nrow(in_df), length(unique(in_df$id)))
        out_mat <- matrix(out_mat, nrow = 1)
        colnames(out_mat) <- c("N_total", "N_unique")
    }

    return(out_mat)
}

Decades were used by Boruff et al. instead of individual years for summarizing stats. Note that Tornado segments could each have different fatalities and injuries.The data row corresp to the overall event has the max fatalities/injuries.

torn_fat <- aggregate(cbind(FATALITIES, INJURIES) ~ id, data = torn, FUN = max)
torn_fat$YEAR <- as.numeric(substr(torn_fat$id, 1, 4))
torn_fat$MONTH <- as.numeric(substr(torn_fat$id, 6, 6))

time_breaks <- c(1950, 1960, 1970, 1980, 1990, 2000)
time_labels <- c("1950s", "1960s", "1970s", "1980s", "1990s")

torn_fat$time_cat <- cut(torn_fat$YEAR, breaks = time_breaks, labels = time_labels, 
    include.lowest = TRUE, right = FALSE)

head(torn_fat)
##           id FATALITIES INJURIES YEAR MONTH time_cat
## 1 1950-1-1-2          0        3 1950     1    1950s
## 2 1950-1-2-1          0        3 1950     1    1950s
## 3 1950-1-3-1          0        1 1950     1    1950s
## 4 1950-1-4-1          1        1 1950     1    1950s
## 5 1950-1-5-1          0        5 1950     1    1950s
## 6 1950-1-6-1          0        0 1950     1    1950s

Reproduce stats on fatalities and injuries produced by Boruff et al., similar to Table 2 of their paper.

fat_stats <- aggregate(cbind(FATALITIES, INJURIES) ~ time_cat, data = torn_fat, 
    FUN = sum)

str(fat_stats)
## 'data.frame':    5 obs. of  3 variables:
##  $ time_cat  : Factor w/ 5 levels "1950s","1960s",..: 1 2 3 4 5
##  $ FATALITIES: int  1421 942 998 522 581
##  $ INJURIES  : int  14423 17258 21621 11297 11756
fat_stats
##   time_cat FATALITIES INJURIES
## 1    1950s       1421    14423
## 2    1960s        942    17258
## 3    1970s        998    21621
## 4    1980s        522    11297
## 5    1990s        581    11756

Reproduce stats on counts of tornadoes produced by Boruff et al., similar to those in Table 1 of their paper.

event_stats <- ddply(.data = torn_fat, .variables = .(time_cat), .fun = count_unique_tornadoes, 
    monthly_stats = FALSE)

str(event_stats)
## 'data.frame':    5 obs. of  3 variables:
##  $ time_cat: Factor w/ 5 levels "1950s","1960s",..: 1 2 3 4 5
##  $ N_total : int  4791 6801 8568 8185 12132
##  $ N_unique: int  4791 6801 8568 8185 12132
event_stats
##   time_cat N_total N_unique
## 1    1950s    4791     4791
## 2    1960s    6801     6801
## 3    1970s    8568     8568
## 4    1980s    8185     8185
## 5    1990s   12132    12132