Replicate Simmons et al 2013

Gopi Goteti edited this page Jan 21, 2014 · 1 revision

Kevin Simmons and others used data from the Storm Prediction Center to normalize damages from tornadoes in the United States (Simmons, Kevin M., Daniel Sutter, and Roger Pielke. "Normalized tornado damage in the United States: 1950–2011." Environmental Hazards 12.2 (2013): 132-147.).

Initially, the goal was to reproduce the entire analysis of Simmon et al. However, after trying to reproduce Table 2 of the paper, on summary statistics of 1996-2011, it appears that there are several errors with the analysis of Simmons et al.

Read raw data add column names similar to those used by Elsner et al 2013 and consistent with documentation.

library(plyr)

torn <- read.csv("data/1950-2012_torn.csv", header = FALSE, sep = ",", as.is = TRUE)

colnames(torn) <- c("OM", "YEAR", "MONTH", "DAY", "DATE", "TIME", "TIMEZONE", 
    "STATE", "FIPS", "STATENUMBER", "FSCALE", "INJURIES", "FATALITIES", "LOSS", 
    "CROPLOSS", "SLAT", "SLON", "ELAT", "ELON", "LENGTH", "WIDTH", "NS", "SN", 
    "SG", "F1", "F2", "F3", "F4")

Tornadoes spanning multiple counties are listed separately for each county. Thus, a single tornado could appear multiple times. Identify unique tornadoes using YEAR, OM and NS. Check for uniqueness; especially, tornadoes spanning multiple years (i.e, those which begin on 12/31 and end on 1/1); need to check only those with NS > 1.

dec31 <- subset(torn, MONTH == 12 & DAY == 31 & NS != 1)
jan01 <- subset(torn, MONTH == 1 & DAY == 1 & NS != 1)
if (nrow(dec31) > 0 & nrow(jan01) > 0) {
    stop("check! unique id assignment may not be accurate!")
}
torn$id <- paste(torn$YEAR, torn$MONTH, torn$OM, torn$NS, sep = "-")

Discard rows with SN = 0 since these are for the overall event when multiple segments are present.

torn <- subset(torn, SN != 0)

str(torn)
## 'data.frame':    57608 obs. of  29 variables:
##  $ OM         : int  1 1 2 3 4 5 6 7 8 9 ...
##  $ YEAR       : int  1950 1950 1950 1950 1950 1950 1950 1950 1950 1950 ...
##  $ MONTH      : int  1 1 1 1 1 1 1 1 2 2 ...
##  $ DAY        : int  3 3 3 3 13 25 25 26 11 11 ...
##  $ DATE       : chr  "1950-01-03" "1950-01-03" "1950-01-03" "1950-01-03" ...
##  $ TIME       : chr  "11:00:00" "11:10:00" "11:55:00" "16:00:00" ...
##  $ TIMEZONE   : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ STATE      : chr  "MO" "IL" "IL" "OH" ...
##  $ FIPS       : int  29 17 17 39 5 29 17 48 48 48 ...
##  $ STATENUMBER: int  1 1 2 1 1 2 3 1 2 3 ...
##  $ FSCALE     : int  3 3 3 1 3 2 2 2 2 3 ...
##  $ INJURIES   : int  3 0 3 1 1 5 0 2 0 12 ...
##  $ FATALITIES : int  0 0 0 0 1 0 0 0 0 1 ...
##  $ LOSS       : num  6 5 5 4 3 5 5 0 4 4 ...
##  $ CROPLOSS   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SLAT       : num  38.8 38.8 39.1 40.9 34.4 ...
##  $ SLON       : num  -90.2 -90.1 -89.3 -84.6 -94.4 ...
##  $ ELAT       : num  38.8 38.8 39.1 0 0 ...
##  $ ELON       : num  -90.1 -90 -89.2 0 0 ...
##  $ LENGTH     : num  6.2 3.3 3.6 0.1 0.6 2.3 0.1 4.7 9.9 12 ...
##  $ WIDTH      : int  150 100 130 10 17 300 100 133 400 1000 ...
##  $ NS         : int  2 2 1 1 1 1 1 1 1 1 ...
##  $ SN         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ SG         : int  2 2 1 1 1 1 1 1 1 1 ...
##  $ F1         : int  189 119 135 161 113 93 91 47 39 201 ...
##  $ F2         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ F3         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ F4         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ id         : chr  "1950-1-1-2" "1950-1-1-2" "1950-1-2-1" "1950-1-3-1" ...
head(torn)
##   OM YEAR MONTH DAY       DATE     TIME TIMEZONE STATE FIPS STATENUMBER
## 2  1 1950     1   3 1950-01-03 11:00:00        3    MO   29           1
## 3  1 1950     1   3 1950-01-03 11:10:00        3    IL   17           1
## 4  2 1950     1   3 1950-01-03 11:55:00        3    IL   17           2
## 5  3 1950     1   3 1950-01-03 16:00:00        3    OH   39           1
## 6  4 1950     1  13 1950-01-13 05:25:00        3    AR    5           1
## 7  5 1950     1  25 1950-01-25 19:30:00        3    MO   29           2
##   FSCALE INJURIES FATALITIES LOSS CROPLOSS  SLAT   SLON  ELAT   ELON
## 2      3        3          0    6        0 38.77 -90.22 38.82 -90.12
## 3      3        0          0    5        0 38.82 -90.12 38.83 -90.03
## 4      3        3          0    5        0 39.10 -89.30 39.12 -89.23
## 5      1        1          0    4        0 40.88 -84.58  0.00   0.00
## 6      3        1          1    3        0 34.40 -94.37  0.00   0.00
## 7      2        5          0    5        0 37.60 -90.68 37.63 -90.65
##   LENGTH WIDTH NS SN SG  F1 F2 F3 F4         id
## 2    6.2   150  2  1  2 189  0  0  0 1950-1-1-2
## 3    3.3   100  2  1  2 119  0  0  0 1950-1-1-2
## 4    3.6   130  1  1  1 135  0  0  0 1950-1-2-1
## 5    0.1    10  1  1  1 161  0  0  0 1950-1-3-1
## 6    0.6    17  1  1  1 113  0  0  0 1950-1-4-1
## 7    2.3   300  1  1  1  93  0  0  0 1950-1-5-1
tail(torn)
##           OM YEAR MONTH DAY       DATE     TIME TIMEZONE STATE FIPS
## 58164 425349 2012    12  25 2012-12-25 20:09:00        3    AL    1
## 58165 425351 2012    12  25 2012-12-25 20:26:00        3    AL    1
## 58166 425352 2012    12  25 2012-12-25 20:32:00        3    AL    1
## 58167 425358 2012    12  25 2012-12-25 22:24:00        3    AL    1
## 58168 425359 2012    12  25 2012-12-25 22:29:00        3    AL    1
## 58169 421430 2012    12  26 2012-12-26 13:58:00        3    NC   37
##       STATENUMBER FSCALE INJURIES FATALITIES LOSS CROPLOSS  SLAT   SLON
## 58164           0      1        0          0 0.00        0 32.05 -86.85
## 58165           0      0        0          0 0.00        0 32.15 -86.68
## 58166           0      2        0          0 0.00        0 32.20 -86.60
## 58167           0      2        2          0 0.00        0 31.73 -86.15
## 58168           0      1        0          0 0.00        0 32.35 -86.05
## 58169           0      1        0          0 0.01        0 34.79 -76.67
##        ELAT   ELON LENGTH WIDTH NS SN SG  F1 F2 F3 F4               id
## 58164 32.07 -86.82   2.03   300  1  1  1  85  0  0  0 2012-12-425349-1
## 58165 32.16 -86.67   0.62    75  1  1  1  85  0  0  0 2012-12-425351-1
## 58166 32.31 -86.51   9.55   900  1  1  1  85  0  0  0 2012-12-425352-1
## 58167 31.88 -85.96  15.70   600  1  1  1 109  0  0  0 2012-12-425358-1
## 58168 32.36 -86.04   0.72   100  1  1  1 101  0  0  0 2012-12-425359-1
## 58169 34.80 -76.67   0.91   100  1  1  1  31  0  0  0 2012-12-421430-1

Time period used by Simmons et al.

time_breaks <- c(1950, 1974, 2000, 2012, 2020)
time_labels <- c("1950-73", "1974-99", "2000-11", "2012-")

torn$time_cat <- cut(torn$YEAR, breaks = time_breaks, labels = time_labels, 
    include.lowest = TRUE, right = FALSE)

Summary stats presented by Simmons et al in Table 3 of their paper uses F-scale category. The SPC data has -9 assign ed to some events. Assume these correspond to 0. Simmons et al did not specify how these -9 values were treated. The values below are very close and consistent with the number of tornadoes by time period reported in Table 3 of Simmons et al.

torn$FSCALE[torn$FSCALE == -9] <- 0
fscale_stats <- ddply(torn[, c("FSCALE", "time_cat")], .(FSCALE), table)

fscale_stats
##   FSCALE 1950-73 1974-99 2000-11 2012-
## 1      0    4196   12643    9745   577
## 2      1    5374    8587    4389   241
## 3      2    4181    3239    1270    98
## 4      3    1055     945     377    27
## 5      4     269     243      80     4
## 6      5      32      28       8     0

Losses in dollars (current values) was combined with croploss, consistent with Simmons et al. These loss values were then categorized into bins (0 to 9) based on SPC damage intervals (Table 1 of Simmons et al).

torn$tot_loss <- (torn$LOSS + torn$CROPLOSS) * 10^6

# categorize loss using 1950-1995 bins (0 to 9)
loss_breaks <- c(0, 5 * 10^(1:9))
loss_labels <- paste0("Bin", (1:9))
torn$loss_cat <- cut(torn$tot_loss, breaks = loss_breaks, labels = loss_labels, 
    include.lowest = TRUE, right = FALSE)

head(torn)
##   OM YEAR MONTH DAY       DATE     TIME TIMEZONE STATE FIPS STATENUMBER
## 2  1 1950     1   3 1950-01-03 11:00:00        3    MO   29           1
## 3  1 1950     1   3 1950-01-03 11:10:00        3    IL   17           1
## 4  2 1950     1   3 1950-01-03 11:55:00        3    IL   17           2
## 5  3 1950     1   3 1950-01-03 16:00:00        3    OH   39           1
## 6  4 1950     1  13 1950-01-13 05:25:00        3    AR    5           1
## 7  5 1950     1  25 1950-01-25 19:30:00        3    MO   29           2
##   FSCALE INJURIES FATALITIES LOSS CROPLOSS  SLAT   SLON  ELAT   ELON
## 2      3        3          0    6        0 38.77 -90.22 38.82 -90.12
## 3      3        0          0    5        0 38.82 -90.12 38.83 -90.03
## 4      3        3          0    5        0 39.10 -89.30 39.12 -89.23
## 5      1        1          0    4        0 40.88 -84.58  0.00   0.00
## 6      3        1          1    3        0 34.40 -94.37  0.00   0.00
## 7      2        5          0    5        0 37.60 -90.68 37.63 -90.65
##   LENGTH WIDTH NS SN SG  F1 F2 F3 F4         id time_cat tot_loss loss_cat
## 2    6.2   150  2  1  2 189  0  0  0 1950-1-1-2  1950-73    6e+06     Bin7
## 3    3.3   100  2  1  2 119  0  0  0 1950-1-1-2  1950-73    5e+06     Bin7
## 4    3.6   130  1  1  1 135  0  0  0 1950-1-2-1  1950-73    5e+06     Bin7
## 5    0.1    10  1  1  1 161  0  0  0 1950-1-3-1  1950-73    4e+06     Bin6
## 6    0.6    17  1  1  1 113  0  0  0 1950-1-4-1  1950-73    3e+06     Bin6
## 7    2.3   300  1  1  1  93  0  0  0 1950-1-5-1  1950-73    5e+06     Bin7

Reproduce summary stats produced by Simmons et al. in Table 2 of their paper. For each loss category bin, number of events, median, mean, min, max and standard deviation were calculated. Data from 1996-2011 only was used, and also millions of dollars, consistent with Simmons et al.

simmons_events_summary <- function(in_df) {

    data.frame(N = nrow(in_df), Median = median(in_df$tot_loss)/10^6, Mean = mean(in_df$tot_loss)/10^6, 
        Min = min(in_df$tot_loss)/10^6, Max = max(in_df$tot_loss)/10^6, SD = sd(in_df$tot_loss)/10^6, 
        stringsAsFactors = FALSE)
}

torn <- subset(torn, YEAR >= 1996 & YEAR <= 2011)
event_stats <- ddply(.data = torn, .variables = .(loss_cat), .fun = simmons_events_summary)

options(scipen = 10)
event_stats
##   loss_cat     N    Median        Mean      Min       Max         SD
## 1     Bin1 10472    0.0000    0.000000   0.0000    0.0000   0.000000
## 2     Bin2     1    0.0004    0.000400   0.0004    0.0004         NA
## 3     Bin3   952    0.0020    0.001867   0.0005    0.0040   0.000958
## 4     Bin4  3672    0.0110    0.018315   0.0050    0.0480   0.010737
## 5     Bin5  3902    0.1000    0.145521   0.0500    0.4800   0.098500
## 6     Bin6  1533    1.0000    1.332635   0.5000    4.9500   0.973337
## 7     Bin7   365   10.0000   13.776142   5.0000   47.8000  10.138180
## 8     Bin8    76   94.0000  113.979842  50.0000  373.5550  74.363395
## 9     Bin9     7 1000.0000 1286.288571 500.0100 2800.1000 888.521656

Issues

  • The "Max" value in the above table and in Table 2 of Simmons et al., by definition, should not exceed the upper bound of the damage interval reported in Table 1. However, "Max" values in Simmons et al exceed this upper bound. For instance, for Bin 7, the SPC damage interval ranges from 5 million to 50 million. But the "Max" reported in Table 2 of the paper is 69.3 million.
  • Since the "Mean"" values from Table 2, which appear to be erroneous, form the basis for the remainder of the analysis by Simmons et al., it appears that the remainder of the analysis is erroneous.