/
2020-07-25-diabetesprevalenceeurope.Rmd
238 lines (190 loc) · 8.55 KB
/
2020-07-25-diabetesprevalenceeurope.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
---
name: "Aspire Data Solutions"
title: "Visualizing the prevalence of diabetes in six European countries, 1990-2017"
categories:
- Data Science
author: Mihiretu Kebede(PhD)
url: http://www.mihiretukebede.com/
google_analytics: "UA-173027539-1"
date: "2020-07-25`"
description: |
This is a quick demonstration of diabetes prevalence in six European countries. The data are from the the Institute of Health Metrics and Evaluation (IHME).
collections:
posts:
share: [twitter, linkedin, facebook, google-plus, pinterest]
output:
distill::distill_article:
self_contained: false
encoding: UTF-8
toc: true
toc_depth: 2
---
<head>
<script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-4447780400240825" crossorigin="anonymous"></script>
</head>
<link rel="stylesheet" href="styles.css" type="text/css">
# A step by-step guide on how to improve a simple scatter plot
## My R version
- See below
```{r,cache=TRUE, include=T, echo=TRUE, warning=FALSE, message=FALSE}
version
```
# Required packages
```{r,cache=TRUE, include=T, echo=TRUE, warning=FALSE, message=FALSE}
#If you don't have any of these packages install them using install.packages("pakage)
library(readr) #to read csv file
library(dplyr) #for data manipulaion
library(ggplot2) # for awesome plotting
library(gganimate) #for animating ggplot objects
library(scales) # for customizing axis
library(lattice) #for enhancing graphics
library(directlabels) #for directly labeling!
library(transformr)
```
# Load the data and have a closer look
```{r,cache=TRUE, include=T, echo=TRUE, warning=FALSE, message=FALSE}
# The data is for all countries included in GBD studies
diabetes <- read_csv("Eurodiabetes.csv")
dim(diabetes) #980 observations and 9 variables.
str(diabetes)
```
## Let's choose only 5 European countries with high diabetes prevalence
```{r,cache=TRUE, include=TRUE, echo=TRUE, warning=FALSE, message=FALSE}
ger_au_ch <- diabetes %>%
filter(Location %in% c("Austria", "Germany", "Switzerland", "Denmark", "Portugal", "Finland"))
ger_au_ch <- na.omit(ger_au_ch) #Remove missing values
ger_au_ch$Prev <- ger_au_ch$Value*100 #Prevalence in percent.
str(ger_au_ch) #structure of the data, variable types
dim(ger_au_ch) #168 rows, 10 columns
is.factor(ger_au_ch$Year) #check if Year is saved as factor variable
ger_au_ch$yearfactor <- factor(ger_au_ch$Year) #convert it to factor and save it as Yearfactor
ger_au_ch$Yearnumeric <- as.numeric(ger_au_ch$Year) #change it to numeric and save it as Year Numeric
```
# Basic plots
Since we already have everything we need for plotting, we can start using ggplot2
```{r,cache=TRUE, layout="l-body-outset", preview=TRUE, include=TRUE, echo=TRUE, message=FALSE, warning=FALSE}
plot1 <- ggplot(ger_au_ch, aes(x=Yearnumeric, y=Prev, col=Location)) +
geom_line() + geom_point() + xlab("Year") +
ylab("Prevalence of diabetes in %")
plot1
```
```{r, cache=TRUE,include=TRUE, echo=TRUE, message=FALSE, warning=FALSE}
plot2 <- ggplot(ger_au_ch, aes(x=Yearnumeric, y=Prev, col=Location)) +
geom_line() + xlab("Year") +
ylab("Prevalence of diabetes in %")
plot2
```
# The fun part
```{r, include=TRUE, echo=TRUE, warning=FALSE, message=FALSE}
library(gganimate)
library(directlabels)
euro_anim <- ggplot(ger_au_ch, aes(x=Yearnumeric, y=Prev, col=Location)) +
geom_point(size=6) + transition_time(Yearnumeric) +
shadow_mark() +
scale_x_continuous(name ="Year",
breaks= c(1990,1995,2000,2005,
2010, 2015, 2020)) +
xlab("Year") +
ylab("Prevalence of diabetes in %") +
labs(col="Country") +
theme(
axis.title.x = element_text(color = "Blue", size=15),
axis.title.y = element_text(color = "Blue", size=15),
axis.text.x = element_text(size = 15),
axis.text.y = element_text(size = 15),
plot.title = element_text(size=20),
legend.title = element_text(size = 10),
legend.text = element_text(size = 10),
legend.position = "None",
text = element_text(family = "Comics Sans MS")
) + ease_aes('cubic-in-out') +
geom_dl(aes(label=Location),
method=list("last.points",rot=40))
```
# Use gifski_renderer to loop or no to loop the gif
- Let's see, how it looks if we assign RUE or just T in short for loop.
```{r,cache=TRUE, echo=TRUE, include=TRUE}
animate(euro_anim, renderer = gifski_renderer(loop = T), width = 700, height = 700, duration = 15) # when you assign loop=TRUE or just T, the gif starts playing again
```
- Assign loop to False or just F
```{r,cache=TRUE, echo=TRUE, include=TRUE}
animate(euro_anim, renderer = gifski_renderer(loop = F), width = 700, height = 700, duration = 15) # when you assign loop=TRUE or just F, the gif stops looping. It only plays once. Refresh if you want to see again
```
# Let's tweak few things and see what happens
```{r,cache=TRUE, echo=TRUE, warning=FALSE, message=FALSE, include=TRUE}
euro_anim <- ggplot(ger_au_ch, aes(x=Yearnumeric, y=Prev, col=Location)) +
geom_point(size=6) + transition_time(Yearnumeric) +
shadow_mark() +
scale_x_continuous(name ="Year",
breaks= c(1990,1995,2000,2005,
2010, 2015, 2020)) +
xlab("Year") +
ylab("Prevalence of diabetes in %") +
labs(col="Country") +
shadow_wake(wake_length = 0.1, alpha = FALSE) +
theme(
axis.title.x = element_text(color = "Blue", size=15),
axis.title.y = element_text(color = "Blue", size=15),
axis.text.x = element_text(size = 15),
axis.text.y = element_text(size = 15),
plot.title = element_text(size=20),
legend.title = element_text(size = 10),
legend.text = element_text(size = 10),
legend.position = "None",
text = element_text(family = "Comics Sans MS")
) + ease_aes('cubic-in-out') +
geom_dl(aes(label=Location),
method=list("last.points",rot=40))
animate(euro_anim, renderer = gifski_renderer(loop = F), width = 700, height = 700, duration = 15)
```
- And again
```{r,cache=TRUE, include=TRUE, echo=TRUE, warning=FALSE, message=FALSE}
euro_anim3 <- ger_au_ch %>%
ggplot( aes(x=Yearnumeric, y=Prev, col=Location)) + geom_line() + geom_line() +
geom_point() +
transition_reveal(Yearnumeric) +
shadow_mark() +
scale_x_continuous(name ="Year",
breaks= c(1990,1995,2000,2005,
2010, 2015, 2020)) +
xlab("Year") +
ylab("Prevalence of diabetes in %") +
labs(col="Country") +
shadow_wake(wake_length = 0.1, alpha = FALSE) +
theme(
axis.title.x = element_text(color = "Blue", size=15),
axis.title.y = element_text(color = "Blue", size=15),
axis.text.x = element_text(size = 15),
axis.text.y = element_text(size = 15),
plot.title = element_text(size=20),
legend.title = element_text(size = 10),
legend.text = element_text(size = 10),
legend.position = "None",
text = element_text(family = "Comics Sans MS")
) + ease_aes('cubic-in-out') +
geom_dl(aes(label=Location),
method=list("last.points",rot=40))
euro_anim3
```
```{r,cache=TRUE, include=TRUE, echo=TRUE, warning=FALSE, message=FALSE}
# Shorter code
ggplot(ger_au_ch, aes(x=Yearnumeric, y=Prev, col=Location)) + geom_line() + geom_line() +
geom_point() +
transition_reveal(Yearnumeric) +
shadow_mark() +
scale_x_continuous(name ="Year",
breaks= c(1990,1995,2000,2005,
2010, 2015, 2020)) +
xlab("Year") +
ylab("Prevalence of diabetes in %") +
labs(col="Country") + ease_aes('cubic-in-out') +
geom_dl(aes(label=Location),
method=list("last.points",rot=40))
```
- If you would like to reproduce these codes, you can download the data from my folder.
- The file name is **`Eurodiabetes.csv`** or simply click [*here*](https://github.com/Mihiretukebede/mihiretukebede.gitlub.io/blob/master/_posts/2020-07-25-2020-07-25-diabetesprevalenceeurope/Eurodiabetes.csv).
- Click [*here*](https://github.com/Mihiretukebede/mihiretukebede.gitlub.io/blob/master/_posts/2020-07-25-2020-07-25-diabetesprevalenceeurope/2020-07-25-diabetesprevalenceeurope.Rmd) for the codes.
- Once you download the data, don't forget to set your working directory!
That is it all for today. I hope, you like it. See you in my next post.
# Contact
Please mention [MihiretuKebede1](https://twitter.com/MihiretuKebede1) if you tweet this post.