/
Introduction-to-ggbash.Rmd
178 lines (133 loc) · 5.33 KB
/
Introduction-to-ggbash.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
---
title: "Introduction to ggbash"
author: "Yasutaka Tanaka"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Introduction to ggbash}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
(This vignette is still in a draft).
ggplot2 provides a comprehensive
collection of functions to draw a variety of informative plots.
```{r, echo=FALSE}
library(ggplot2)
```
```{r, fig.width=6, fig.height=6}
ggplot(iris) +
geom_point(aes(x=Sepal.Width,
y=Sepal.Length,
colour=Species,
size=Petal.Width))
```
ggbash is a different and faster interface for the ggplot2.
In ggbash, you can plot the same figure by the following command:
```r
ggbash( ggplot(iris) + point(x=Sepal.W, y=Sepal.L, col=Species, sz=Petal.W) )
```
The above ggbash source code
(`ggplot(iris) + point(x=Sepal.W, y=Sepal.L, col=Species, sz=Petal.W)`) displays
exactly the same plot of the first ggplot2 command.
# faster way
# how to use in R session
## principles of ggplot2
+ mapping data elements to aesthetics (`x=mpg`)
+ adding layers (point + smooth, point + geom_text_repel)
+ iterative plotting
# Why does that command work?
If you are experienced R users, you might have been surprised by the above command.
Especially if you see the following output:
```r
point
Error: no such object 'point'
```
The `point()` function used above is undefined in R global environment --
+ Why `Sepal.W`, `Sepal.L` and `Petal.W`
Essentially, ggbash is a higher-level language for ggplot2.
It compiles a short ggbash "source code" to an "executable" ggplot2 object like this:
```{r, eval=FALSE}
library(ggbash)
ggbash(iris) # enter in a ggbash session with the 'iris' data frame
```
```{bash, eval=FALSE}
user@host currentDir (iris) $ point Sepal.W Sepal.L c=Sp si=Petal.W | echo
```
```{r, eval=FALSE}
executed:
ggplot(iris) +
geom_point(aes(x=Sepal.Width,
y=Sepal.Length,
colour=Species,
size=Petal.Width))
```
```{r, fig.width=6, fig.height=6, echo=FALSE}
ggplot(iris) +
geom_point(aes(x=Sepal.Width,
y=Sepal.Length,
colour=Species,
size=Petal.Width))
```
If you prefer least possible typing,
```{bash, eval=FALSE}
# specify by column indices
user@host currentDir (iris) $ p 2 1 c=5 si=4
```
gives exactly the same plot. The original ggplot2 notation needs 91 characters (whitespaces are not counted) to draw the above plot, whereas the ggbash version of notation only with 10 characters. This means that **ggbash enables more than 80% reduction of key strokes**. The differece becomes much larger with more complicated plots.
<!-- # Saving Images -->
<!-- Suppose that you are given an unknown dataset and would like to check what the dataset looks like. -->
<!-- You might create a directory called `my_images` and begin generating plots within the directory. -->
<!-- You might want to generate histograms. -->
<!-- While some people uses diag.panel argument -->
<!-- in `graphics::pairs()` for plotting histograms, -->
<!-- others might use `geom_histogram()` for quick analysis. -->
<!-- Then the script might look like this: -->
<!-- ```{r, eval=FALSE} -->
<!-- dataset <- mtcars -->
<!-- dir.create('my_images') -->
<!-- png('my_images/histogram_mpg.png', height=480, width=480) # pixel -->
<!-- ggplot(dataset) + geom_histogram(aes(x=mpg)) -->
<!-- dev.off() -->
<!-- png('my_images/histogram_disp.png', height=480, width=480) # pixel -->
<!-- ggplot(dataset) + geom_histogram(aes(x=disp)) -->
<!-- dev.off() -->
<!-- png('my_images/histogram_hp.png', height=480, width=480) # pixel -->
<!-- ggplot(dataset) + geom_histogram(aes(x=hp)) -->
<!-- dev.off() -->
<!-- # .... -->
<!-- ``` -->
<!-- Or if you don't like to write duplicated codes, -->
<!-- you might define a helper function: -->
<!-- ```{r, eval=FALSE} -->
<!-- ggplot_histogram <- function(dataset, varname='mpg') { -->
<!-- png(paste0('my_images/histogram_',varname,'.png'), -->
<!-- height=480, width=480) -->
<!-- ggplot(dataset) + geom_histogram(aes_string(x=varname)) -->
<!-- dev.off() -->
<!-- } -->
<!-- for ( var in colnames(mtcars) ) -->
<!-- ggplot_histogram(mtcars, var) -->
<!-- ``` -->
<!-- While both approaches suffice for one-time usage, -->
<!-- it would be a pain to repeatedly write -->
<!-- these scripts for every single data analysis project. -->
<!-- You might have to think about some design decisions. -->
<!-- For example, should the directory name `my_images` -->
<!-- be an argument? If so, what is the default value? -->
<!-- How about figure height and width? geom_name like `histogram`? -->
<!-- If you pass geom_name as an argument `mygeom`, -->
<!-- how you can select appropriate geom_mygeom? -->
<!-- ggbash handles these kinds of clerical jobs. -->
<!-- In ggbash, you can do the same task more concisely. -->
<!-- ```{r, eval=FALSE} -->
<!-- ggbash(mtcars) -->
<!-- ``` -->
<!-- ```{bash, eval=FALSE} -->
<!-- user@host currentDir (mtcars) $ mkdir my_images -->
<!-- user@host currentDir (mtcars) $ cd my_images -->
<!-- user@host my_images (mtcars) $ h 1 2 | png small -->
<!-- user@host my_images (mtcars) $ h 1 3 | png small -->
<!-- user@host my_images (mtcars) $ h 1 4 | png small -->
<!-- user@host my_images (mtcars) $ h 1 5 | png small -->
<!-- ``` -->
<!-- Here, `h` partially matches `geom_histogram`. -->