-
Notifications
You must be signed in to change notification settings - Fork 0
/
ElecGenByFuelType.qmd
316 lines (228 loc) · 8.5 KB
/
ElecGenByFuelType.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
---
title: "Electricity Generation By Fuel Type"
author: "Andy Pickering"
date: "2023-08-04"
format: html
editor: visual
toc: true
---
# Overview
Building on my blog post where I calculated the fractions of electricity generation by fuel source for Colorado and compared to those on the AFDC tool.
My goal now is to generalize that analysis to work for any state, and look in more detail at other states, changes over time, and the US as a whole.
# Getting Data
```{r Load Libraries}
#| code-fold: true
library(httr)
library(jsonlite)
library(ggplot2)
theme_set(theme_grey(base_size = 15)) # make the default font sizes etc a little bigger
suppressPackageStartupMessages(library(dplyr))
library(forcats)
suppressPackageStartupMessages(library(plotly))
library(DT)
```
I'm going to make a simple little function to retrieve data from the API, that will make things a little tidier and warn me if the API call returns an error code.
```{r}
retrieve_api_data <- function(api_path) {
response_raw <- httr::GET(url = complete_api_path)
if (response_raw$status_code != 200) {
print(paste("Warning, API returned error code ", response_raw$status_code))
}
return(response_raw)
}
```
## Download data for 1 state
The data I will use is the annual electric power generation by state from the [EIA API](https://www.eia.gov/opendata/browser/). I'm going to just look at data for Colorado for now, and I'm looking at sector id 98: electric power.
```{r}
# API key stored in .Renviron
api_key <- Sys.getenv("EIA_KEY")
# base url for EIA API V2
api_base <- "https://api.eia.gov/v2/"
route <- "electricity"
subroute <- "electric-power-operational-data"
data_name <- "generation"
state <- "CO"
# sector id 98= electric power
sector_id <- 98
# annual
complete_api_path <- paste0(
api_base, route, "/", subroute, "/", "data/",
"?frequency=annual&data[0]=", data_name,
"&facets[sectorid][]=", sector_id,
"&facets[location][]=", state,
"&api_key=", api_key
)
# get the data from the API
response_raw <- retrieve_api_data(complete_api_path)
# convert from JSON
dat <- jsonlite::fromJSON(httr::content(response_raw, "text"))
if (length(dat$response$warnings)>0){
print("Warning returned with data")
print(dat$response$warnings)
}
# extract the dataframe
df <- dat$response$data
# rename a column and drop some extra unnecessary columns
df <- df %>%
rename(year = period) %>%
select(-c(location, sectorid, sectorDescription, stateDescription))
head(df)
```
Note that some of the *fueltype* categories are subsets of, or overlap with, other categories. For example *COW* is all coal products, which includes SUB (subbituminous coal) and BIT (bituminous coal). For this analysis I will look at the following categories:
- ALL
- COW (all coal)
- Natural Gas
- WND : Wind
- SUN : Solar
- HYC: conventional hydroelectric
- BIO: BiomassPlot total electricity generation by fuel type
## Get data for all states
- note that the API will only return 5000 rows for a single request, so if we request all states for all time, we exceed that limit. The API will return a warning: *dat\$response\$warnings*
- I'll try just getting 2 years of data for all states for now, which returns less than 5000 rows. To get all the data, i'll have to do multiple requests with an offset specified.
- Note that not specifiying a location returns all states, as well as regions and US total.
```{r}
# API key stored in .Renviron
api_key <- Sys.getenv("EIA_KEY")
# base url for EIA API V2
api_base <- "https://api.eia.gov/v2/"
route <- "electricity"
subroute <- "electric-power-operational-data"
data_name <- "generation"
#state <- "CO"
# sector id 98= electric power
sector_id <- 98
# annual
complete_api_path <- paste0(
api_base, route, "/", subroute, "/", "data/",
"?frequency=annual&data[0]=", data_name,
"&facets[sectorid][]=", sector_id,
"&start=2020-01&end=2023-01",
"&api_key=", api_key
)
# get the data from the API
response_raw <- retrieve_api_data(complete_api_path)
# convert from JSON
dat <- jsonlite::fromJSON(httr::content(response_raw, "text"))
if (length(dat$response$warnings)>0){
print("Warning returned with data")
print(dat$response$warnings)
}else{
print("No warnings")
}
# extract the dataframe
df <- dat$response$data
# rename a column and drop some extra unnecessary columns
df <- df %>%
rename(year = period)# %>%
# select(-c(location, sectorid, sectorDescription, stateDescription))
head(df)
```
```{r }
unique(df$fuelTypeDescription)
```
```{r}
df %>%
filter(location == "MA",
year == 2021) %>%
arrange(desc(generation)) %>%
View()
```
```{r}
state <- "NH"
wh_year <- 2021
df %>%
filter(location == state,
year == wh_year) %>%
mutate(fuelTypeDescription = forcats::fct_reorder(fuelTypeDescription, generation)) %>%
ggplot() +
geom_col(aes(fuelTypeDescription, generation)) +
coord_flip() +
ggtitle(paste0(wh_year, "Annual Elec. Gen. in ",state)) +
xlab("Fuel Type") +
ylab(paste0("Generation (", df$`generation-units`, ")" ))
```
```{r}
state_totals_2021 <- df %>%
filter(fueltypeid == "ALL",
year == 2021) %>%
arrange(desc(generation)) %>%
select(location, stateDescription, fueltypeid, generation, `generation-units`)
View(state_totals_2021)
```
choropleth map of total generation by state
```{r state total choropleth}
library(tigris)
library(leaflet)
options(tigris_use_cache = TRUE)
states <- tigris::states(cb = TRUE)
df2 <- states %>%
left_join(state_totals_2021, by = c("NAME" = "stateDescription")) %>%
filter(!is.na(generation),
STUSPS != "DC",
STUSPS != "PR")
pal_elec <- leaflet::colorNumeric(palette = "viridis",
domain = df2$generation)
leaflet() %>%
leaflet::addPolygons(data = df2,
weight = 1,
color = "black",
popup = paste(df2$NAME, "<br>",
round(df2$generation), df2$`generation-units` ),
fillColor = ~pal_elec(generation),
fillOpacity = 0.6) %>%
addLegend(data = df2,
pal = pal_elec,
values = ~generation,
opacity = 1,
title = "Total generation"
)
```
```{r}
df %>%
filter(fueltypeid == "ALL",
year == 2021,
location != "US"
) %>%
arrange(desc(generation)) %>%
select(stateDescription, fueltypeid, generation, `generation-units`) %>%
mutate(stateDescription = forcats::fct_reorder(stateDescription, generation)) %>%
ggplot(aes(stateDescription, generation)) +
geom_col() +
coord_flip()
```
# Computing percent of total generation by fuel type
- One issue is that the fuel types present for each state might differ (for example Colorado doesn't have any nuclear generation, and that field is not returned for Colorado). In the blog post I just ignored nuclear, but if I am comparing states I will want to have a zero value for nuclear even if a state doesn't have any.
Now I want to compute the percent of total generation that each fuel type makes up. Currently the dataframe has a row for each year and fuel type. To make it easier to compute, I need to pivot the data frame to a wide format, so there is one row for each year and a column for each fuel type. Then I can simply divide the value for each fuel type by the total.
After pivoting to a wider format, the dataframe has one row for each year and a column for each fuel type:
```{r Pivot wider}
df_wide <- df %>%
select(year, generation, fueltypeid) %>%
tidyr::pivot_wider(names_from = fueltypeid, values_from = generation)
head(df_wide)
```
Now I can compute the percent of total generation for each fuel type:
```{r Calculate Percentages}
df_perc <- df_wide %>%
mutate(
perc_Solar = round(SUN / ALL * 100, 2),
perc_Wind = round(WND / ALL * 100, 2),
perc_Coal = round(COW / ALL * 100, 2),
perc_NaturalGas = round(NG / ALL * 100, 2),
perc_Hydro = round((HPS + HYC) / ALL * 100, 2),
perc_Biomass = round(BIO / ALL * 100, 2),
) %>%
select(year, starts_with("perc_"))
head(df_perc)
```
Now that I've computed the percent for each fuel type, I will pivot back to a long format that will make plotting easier. In this format there is a row for each year and fueltype, and when I plot the data I can simply specify the FuelType column as the color or fill:
```{r Pivot longer}
df_perc_long <- df_perc %>%
tidyr::pivot_longer(
cols = starts_with("perc_"),
names_prefix = "perc_",
names_to = "FuelType",
values_to = "percent"
)
head(df_perc_long)
```
##