<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta http-equiv="X-UA-Compatible" content="IE=EDGE" />
<title>Setting up a reproducible project</title>
<script src="site_libs/header-attrs-2.26/header-attrs.js"></script>
<script src="site_libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="site_libs/bootstrap-3.3.5/css/flatly.min.css" rel="stylesheet" />
<script src="site_libs/bootstrap-3.3.5/js/bootstrap.min.js"></script>
<script src="site_libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script>
<script src="site_libs/bootstrap-3.3.5/shim/respond.min.js"></script>
<style>h1 {font-size: 34px;}
h1.title {font-size: 38px;}
h2 {font-size: 30px;}
h3 {font-size: 24px;}
h4 {font-size: 18px;}
h5 {font-size: 16px;}
h6 {font-size: 12px;}
code {color: inherit; background-color: rgba(0, 0, 0, 0.04);}
pre:not([class]) { background-color: white }</style>
<script src="site_libs/jqueryui-1.13.2/jquery-ui.min.js"></script>
<link href="site_libs/tocify-1.9.1/jquery.tocify.css" rel="stylesheet" />
<script src="site_libs/tocify-1.9.1/jquery.tocify.js"></script>
<script src="site_libs/navigation-1.1/tabsets.js"></script>
<link href="site_libs/font-awesome-6.4.2/css/all.min.css" rel="stylesheet" />
<link href="site_libs/font-awesome-6.4.2/css/v4-shims.min.css" rel="stylesheet" />
<style type="text/css">
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
</style>
<style type="text/css">
code {
white-space: pre;
}
.sourceCode {
overflow: visible;
}
</style>
<style type="text/css" data-origin="pandoc">
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ background-color: #f8f8f8; }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ef2929; } /* Alert */
code span.an { color: #8f5902; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #204a87; } /* Attribute */
code span.bn { color: #0000cf; } /* BaseN */
code span.cf { color: #204a87; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4e9a06; } /* Char */
code span.cn { color: #8f5902; } /* Constant */
code span.co { color: #8f5902; font-style: italic; } /* Comment */
code span.cv { color: #8f5902; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #8f5902; font-weight: bold; font-style: italic; } /* Documentation */
code span.dt { color: #204a87; } /* DataType */
code span.dv { color: #0000cf; } /* DecVal */
code span.er { color: #a40000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #0000cf; } /* Float */
code span.fu { color: #204a87; font-weight: bold; } /* Function */
code span.im { } /* Import */
code span.in { color: #8f5902; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #204a87; font-weight: bold; } /* Keyword */
code span.op { color: #ce5c00; font-weight: bold; } /* Operator */
code span.ot { color: #8f5902; } /* Other */
code span.pp { color: #8f5902; font-style: italic; } /* Preprocessor */
code span.sc { color: #ce5c00; font-weight: bold; } /* SpecialChar */
code span.ss { color: #4e9a06; } /* SpecialString */
code span.st { color: #4e9a06; } /* String */
code span.va { color: #000000; } /* Variable */
code span.vs { color: #4e9a06; } /* VerbatimString */
code span.wa { color: #8f5902; font-weight: bold; font-style: italic; } /* Warning */
</style>
<script>
// apply pandoc div.sourceCode style to pre.sourceCode instead
(function() {
var sheets = document.styleSheets;
for (var i = 0; i < sheets.length; i++) {
if (sheets[i].ownerNode.dataset["origin"] !== "pandoc") continue;
try { var rules = sheets[i].cssRules; } catch (e) { continue; }
var j = 0;
while (j < rules.length) {
var rule = rules[j];
// check if there is a div.sourceCode rule
if (rule.type !== rule.STYLE_RULE || rule.selectorText !== "div.sourceCode") {
j++;
continue;
}
var style = rule.style.cssText;
// check if color or background-color is set
if (rule.style.color === '' && rule.style.backgroundColor === '') {
j++;
continue;
}
// replace div.sourceCode by a pre.sourceCode rule
sheets[i].deleteRule(j);
sheets[i].insertRule('pre.sourceCode{' + style + '}', j);
}
}
})();
</script>
<style type = "text/css">
.main-container {
max-width: 940px;
margin-left: auto;
margin-right: auto;
}
img {
max-width:100%;
}
.tabbed-pane {
padding-top: 12px;
}
.html-widget {
margin-bottom: 20px;
}
button.code-folding-btn:focus {
outline: none;
}
summary {
display: list-item;
}
details > summary > p:only-child {
display: inline;
}
pre code {
padding: 0;
}
</style>
<style type="text/css">
.dropdown-submenu {
position: relative;
}
.dropdown-submenu>.dropdown-menu {
top: 0;
left: 100%;
margin-top: -6px;
margin-left: -1px;
border-radius: 0 6px 6px 6px;
}
.dropdown-submenu:hover>.dropdown-menu {
display: block;
}
.dropdown-submenu>a:after {
display: block;
content: " ";
float: right;
width: 0;
height: 0;
border-color: transparent;
border-style: solid;
border-width: 5px 0 5px 5px;
border-left-color: #cccccc;
margin-top: 5px;
margin-right: -10px;
}
.dropdown-submenu:hover>a:after {
border-left-color: #adb5bd;
}
.dropdown-submenu.pull-left {
float: none;
}
.dropdown-submenu.pull-left>.dropdown-menu {
left: -100%;
margin-left: 10px;
border-radius: 6px 0 6px 6px;
}
</style>
<script type="text/javascript">
// manage active state of menu based on current page
$(document).ready(function () {
// active menu anchor
href = window.location.pathname
href = href.substr(href.lastIndexOf('/') + 1)
if (href === "")
href = "index.html";
var menuAnchor = $('a[href="' + href + '"]');
// mark the anchor link active (and if it's in a dropdown, also mark that active)
var dropdown = menuAnchor.closest('li.dropdown');
if (window.bootstrap) { // Bootstrap 4+
menuAnchor.addClass('active');
dropdown.find('> .dropdown-toggle').addClass('active');
} else { // Bootstrap 3
menuAnchor.parent().addClass('active');
dropdown.addClass('active');
}
// Navbar adjustments
var navHeight = $(".navbar").first().height() + 15;
var style = document.createElement('style');
var pt = "padding-top: " + navHeight + "px; ";
var mt = "margin-top: -" + navHeight + "px; ";
var css = "";
// offset scroll position for anchor links (for fixed navbar)
for (var i = 1; i <= 6; i++) {
css += ".section h" + i + "{ " + pt + mt + "}\n";
}
style.innerHTML = "body {" + pt + "padding-bottom: 40px; }\n" + css;
document.head.appendChild(style);
});
</script>
<!-- tabsets -->
<style type="text/css">
.tabset-dropdown > .nav-tabs {
display: inline-table;
max-height: 500px;
min-height: 44px;
overflow-y: auto;
border: 1px solid #ddd;
border-radius: 4px;
}
.tabset-dropdown > .nav-tabs > li.active:before, .tabset-dropdown > .nav-tabs.nav-tabs-open:before {
content: "\e259";
font-family: 'Glyphicons Halflings';
display: inline-block;
padding: 10px;
border-right: 1px solid #ddd;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li.active:before {
content: "\e258";
font-family: 'Glyphicons Halflings';
border: none;
}
.tabset-dropdown > .nav-tabs > li.active {
display: block;
}
.tabset-dropdown > .nav-tabs > li > a,
.tabset-dropdown > .nav-tabs > li > a:focus,
.tabset-dropdown > .nav-tabs > li > a:hover {
border: none;
display: inline-block;
border-radius: 4px;
background-color: transparent;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li {
display: block;
float: none;
}
.tabset-dropdown > .nav-tabs > li {
display: none;
}
</style>
<!-- code folding -->
<style type="text/css">
#TOC {
margin: 25px 0px 20px 0px;
}
@media (max-width: 768px) {
#TOC {
position: relative;
width: 100%;
}
}
@media print {
.toc-content {
/* see https://github.com/w3c/csswg-drafts/issues/4434 */
float: right;
}
}
.toc-content {
padding-left: 30px;
padding-right: 40px;
}
div.main-container {
max-width: 1200px;
}
div.tocify {
width: 20%;
max-width: 260px;
max-height: 85%;
}
@media (min-width: 768px) and (max-width: 991px) {
div.tocify {
width: 25%;
}
}
@media (max-width: 767px) {
div.tocify {
width: 100%;
max-width: none;
}
}
.tocify ul, .tocify li {
line-height: 20px;
}
.tocify-subheader .tocify-item {
font-size: 0.90em;
}
.tocify .list-group-item {
border-radius: 0px;
}
.tocify-subheader {
display: inline;
}
.tocify-subheader .tocify-item {
font-size: 0.95em;
}
</style>
</head>
<body>
<div class="container-fluid main-container">
<!-- setup 3col/9col grid for toc_float and main content -->
<div class="row">
<div class="col-xs-12 col-sm-4 col-md-3">
<div id="TOC" class="tocify">
</div>
</div>
<div class="toc-content col-xs-12 col-sm-8 col-md-9">
<div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-bs-toggle="collapse" data-target="#navbar" data-bs-target="#navbar">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="index.html">PGR-R</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li>
<a href="setup.html">
<span class="fa fa-cog"></span>
Setup
</a>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" data-bs-toggle="dropdown" aria-expanded="false">
<span class="fa fa-book"></span>
R Book
<span class="caret"></span>
</a>
<ul class="dropdown-menu" role="menu">
<li>
<a href="https://intro2r.com">
<span class="fa fa-firefox"></span>
Web book
</a>
</li>
<li class="divider"></li>
<li>
<a href="https://github.com/alexd106/Rbook/raw/master/docs/Rbook.pdf">
<span class="fa fa-file-pdf"></span>
PDF book
</a>
</li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" data-bs-toggle="dropdown" aria-expanded="false">
<span class="fa fa-university"></span>
Learn R
<span class="caret"></span>
</a>
<ul class="dropdown-menu" role="menu">
<li>
<a href="howto.html">
<span class="fa fa-tv"></span>
How-to
</a>
</li>
<li class="divider"></li>
<li>
<a href="lectures.html">
<span class="fa fa-chalkboard"></span>
Lectures
</a>
</li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" data-bs-toggle="dropdown" aria-expanded="false">
<span class="fa fa-file-contract"></span>
Exercises
<span class="caret"></span>
</a>
<ul class="dropdown-menu" role="menu">
<li>
<a href="exercises.html">
<span class="fa fa-folder"></span>
Exercises
</a>
</li>
<li class="divider"></li>
<li>
<a href="exercise_solutions.html">
<span class="fa fa-folder"></span>
Exercise solutions
</a>
</li>
</ul>
</li>
<li>
<a href="data.html">
<span class="fa fa-download"></span>
Data
</a>
</li>
<li>
<a href="Tutorials.html">
<span class="fa fa-desktop"></span>
Tutorials
</a>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" data-bs-toggle="dropdown" aria-expanded="false">
<span class="fa fa-question-circle"></span>
Info
<span class="caret"></span>
</a>
<ul class="dropdown-menu" role="menu">
<li>
<a href="syllabus.html">
<span class="fa fa-graduation-cap"></span>
Syllabus
</a>
</li>
<li class="divider"></li>
<li>
<a href="People.html">
<span class="fa fa-user-friends"></span>
People
</a>
</li>
<li class="divider"></li>
<li>
<a href="resources.html">
<span class="fa fa-book"></span>
Resources
</a>
</li>
<li>
<a href="https://forms.gle/2yBDzGs29oBMfM9w7">
<span class="fa fa-commenting"></span>
Feedback
</a>
</li>
<li class="divider"></li>
<li>
<a href="People.html">
<span class="fa fa-envelope fa-lg"></span>
Contact
</a>
</li>
<li>
<a href="https://github.com/alexd106/intro2R">
<span class="fa fa-github fa-lg"></span>
Source code
</a>
</li>
<li>
<a href="https://twitter.com/Scedacity">
<span class="fa fa-twitter fa-lg"></span>
Twitter
</a>
</li>
</ul>
</li>
</ul>
<ul class="nav navbar-nav navbar-right">
</ul>
</div><!--/.nav-collapse -->
</div><!--/.container -->
</div><!--/.navbar -->
<div id="header">
<h1 class="title toc-ignore">Setting up a reproducible project</h1>
</div>
<p> </p>
<p>This short tutorial will introduce you to setting up and managing a
project in RStudio to facilitate robust and reproducible research. We
will also touch on creating an organised directory structure, giving
files useful names, documenting data and workflows, and good scripting
practice.</p>
<p>I estimate that this tutorial should take you roughly 30 minutes to 1
hour to complete in one sitting, but feel free to dip in and out over a
longer period if that suits you better.</p>
<p>This tutorial assumes that you have already installed the latest
versions of R and RStudio. If you haven’t done this yet, you can find
instructions <a href="setup.html">here</a>.</p>
<p> </p>
<div id="why-bother" class="section level2">
<h2>Why bother?</h2>
<p> </p>
<p>As with most things in life, when it comes to dealing with data and
data analysis, things are so much simpler if you’re organised. Clear
project organisation makes it easier for both you (especially the future
you) and your collaborators to make sense of what you’ve done. There’s
nothing more frustrating than coming back to a project months (sometimes
years) later and having to spend days (or weeks) figuring out where
everything is, what you did and why you did it. A well documented
project that has a consistent and logical structure increases the
likelihood that you can pick up where you left off with minimal fuss, no
matter how much time has passed. In addition, it’s much easier to write
code to automate tasks when files are well organised and sensibly
named. This is even more relevant nowadays as it’s never been easier to
collect vast amounts of data, which can be saved across thousands or even
hundreds of thousands of separate data files. Lastly, having a well
organised project reduces the risk of introducing bugs or errors into
your workflow and, if they do occur (which inevitably they will at some
point), makes it easier to track down these errors and deal with them
efficiently.</p>
<p>Thankfully, there are some nice features in R and RStudio that make
it quite easy to manage a project. There are also a few simple steps you
can take right at the start of any project to help keep things
shipshape.</p>
<p> </p>
</div>
<div id="projects-in-rstudio" class="section level2">
<h2>Projects in RStudio</h2>
<p> </p>
<p>A great way of keeping things organised is to use RStudio Projects.
An RStudio Project keeps all of your R scripts, R Markdown documents, R
functions and data together in one place. The nice thing about RStudio
Projects is that each project has its own directory, workspace, history
and source documents, so different analyses that you are working on are
kept completely separate from each other. This means that you can have
multiple instances of RStudio open at the same time (if that’s your
thing) or you can switch very easily between projects without fear of
them interfering with each other.</p>
<p>To create a project, open RStudio and select <code>File</code> ->
<code>New Project...</code> from the menu. You can create either an
entirely new project, a project from an existing directory or a version
controlled project (see the <a href="Github_intro.html">GitHub
tutorial</a> for further details about this). In this tutorial we will
create a project in a new directory.</p>
<p> </p>
<p><img src="images/new_proj.png" width="60%" style="display: block; margin: auto;" /></p>
<p><br />
</p>
<p>You can also create a new project by clicking on the ‘Project’ button
in the top right of RStudio and selecting ‘New Project…’.</p>
<p> </p>
<p><img src="images/new_proj1.png" width="30%" style="display: block; margin: auto;" /></p>
<p> </p>
<p>In the next window select ‘New Project’.</p>
<p> </p>
<p><img src="images/new_proj2.png" width="60%" style="display: block; margin: auto;" /></p>
<p> </p>
<p>Now enter the name of the directory you want to create in the
‘Directory name:’ field (we’ll call it <code>first_project</code> for
this tutorial). If you want to change the location of the directory on
your computer, click the ‘Browse…’ button and navigate to where you would
like to create the directory. I always tick the ‘Open in new session’
box as well. Finally, click the ‘Create Project’ button to create the new
project.</p>
<p> </p>
<p><img src="images/new_proj3.png" width="60%" style="display: block; margin: auto;" /></p>
<p> </p>
<p>Once your new project has been created, you will have a new folder
on your computer that contains an RStudio project file called
<code>first_project.Rproj</code>. This <code>.Rproj</code> file contains
various project options (but you shouldn’t really interact with it) and
can also be used as a shortcut for opening the project directly from the
file system (just double click on it). You can check this out in the
‘Files’ tab in RStudio (or in Finder if you’re on a Mac or File Explorer
in Windows).</p>
<p> </p>
<p><img src="images/new_proj4.png" width="80%" style="display: block; margin: auto;" /></p>
<p> </p>
<p>The last thing I suggest you do is select <code>Tools</code> ->
<code>Project Options...</code> from the menu. Click on the ‘General’
tab on the left hand side and then change the values for ‘Restore .RData
into workspace at startup’ and ‘Save workspace to .RData on exit’ from
‘Default’ to ‘No’. This ensures that every time you open your project
you start with a clean R session. You don’t have to do this (many people
don’t), but I prefer to start with a completely clean workspace whenever
I open my projects to avoid any potential conflicts with things I did in
previous sessions. The downside to this is that you will need to rerun
your R code every time you open your project.</p>
<p> </p>
<p><img src="images/new_proj5.png" width="60%" style="display: block; margin: auto;" /></p>
<p> </p>
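<p>If you’d like to confirm from R that these options have been saved,
one option is to look inside the <code>.Rproj</code> file, where recent
versions of RStudio record them as the lines
<code>RestoreWorkspace: No</code> and <code>SaveWorkspace: No</code>
(an assumption worth checking against your own file). The function below
is a minimal sketch; its name, <code>check_blank_slate()</code>, is made
up for this example and is not part of R or RStudio.</p>
<p> </p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper (not part of R or RStudio): returns TRUE when an
# .Rproj file has both workspace options set to 'No'
check_blank_slate <- function(rproj_file) {
  # read the project file and strip stray whitespace around each line
  settings <- trimws(readLines(rproj_file, warn = FALSE))
  # both settings must be present with the value 'No'
  all(c("RestoreWorkspace: No", "SaveWorkspace: No") %in% settings)
}</code></pre></div>
<p> </p>
<p>Running <code>check_blank_slate("first_project.Rproj")</code> in the
Console of your open project should then return <code>TRUE</code>.</p>
<p> </p>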
<p>Now that you have an RStudio project set up, you can start creating R
scripts (or <a href="Rmarkdown_intro.html">R Markdown</a> documents) or
whatever else you need to complete your project. All of your R scripts
will then be contained within the RStudio project and saved in the
project folder.</p>
<p> </p>
</div>
<div id="working-directories" class="section level2">
<h2>Working directories</h2>
<p> </p>
<p>The working directory is the default location where R will look for
files you want to load and where it will put any files you save. One of
the great things about using RStudio Projects is that when you open a
project, RStudio automatically sets your working directory to the
project directory. You can check the file path of your working
directory by looking at the bar at the top of the Console pane. Note: in
the screenshot below, the <code>~</code> symbol is shorthand for my home
directory, <code>/Users/nhy163/</code>, on my Mac computer (Linux
computers use the same shorthand).</p>
<p> </p>
<p><img src="images/dir_struct.png" width="80%" style="display: block; margin: auto;" /></p>
<p> </p>
<p>You can also use the <code>getwd()</code> function in the Console
which returns the file path of the current working directory.</p>
<p> </p>
<p><img src="images/dir_struct2.png" width="60%" style="display: block; margin: auto;" /></p>
<p> </p>
<p>In the example above, my working directory is a folder called
‘first_project’, which is a subfolder of ‘Teaching’ in my ‘Alex’ folder,
which in turn is in a ‘Documents’ folder located in the ‘nhy163’ folder,
which itself is in the ‘Users’ folder. On a Windows based computer my
working directory would also include a drive letter
(e.g. <code>C:/Users/nhy163/Documents/Alex/Teaching/first_project</code>).</p>
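<p>You can also print these paths directly from the Console. This short
sketch uses only base R functions: <code>getwd()</code> for the working
directory, <code>path.expand()</code> to reveal what <code>~</code>
stands for on your computer, and <code>basename()</code> to pull out
just the final folder name.</p>
<p> </p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># full path of the current working directory
# (the project root when an RStudio Project is open)
getwd()

# expand the '~' shorthand into the full path of your home directory
path.expand("~")

# just the final folder of the working directory path
# ('first_project' in this tutorial)
basename(getwd())</code></pre></div>
<p> </p>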
<p>If you weren’t using an RStudio Project then you would have to set
your working directory using the <code>setwd()</code> function at the
start of every R script (something I did for many years).</p>
<p> </p>
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" tabindex="-1"></a><span class="fu">setwd</span>(<span class="st">'/Users/nhy163/Documents/Alex/Teaching/first_project'</span>)</span></code></pre></div>
<p> </p>
<p>However, the problem with <code>setwd()</code> is that it uses an
<em>absolute</em> file path which is specific to the computer you are
working on. If you want to send your script to someone else (or if
you’re working on a different computer), this absolute file path is not
going to work on your friend’s or colleague’s computer as their directory
configuration will be different (you are unlikely to have a directory
structure <code>/Users/nhy163/Documents/Alex/Teaching/</code> on your
computer). This results in a project that is not self-contained and not
easily portable. RStudio Projects solve this problem: because opening a
project sets the working directory to the <em>Root</em> project
directory, you can use <em>relative</em> file paths, which are relative
to that directory. The Root project directory is just the directory that
contains the <code>.Rproj</code> file (<code>first_project.Rproj</code>
in our case). If you want to share your analysis with someone else, all
you need to do is copy the entire project directory and send it to your
collaborator. They would then just need to open the project file
and any R scripts that contain references to relative file paths will
just work. For example, let’s say that you’ve created a subdirectory
called <code>raw_data</code> in your Root project directory that
contains a tab delimited data file called <code>mydata.txt</code> (we
will cover directory structures below). To import this data file in an
RStudio project, all you need to include in your R script is:</p>
<p> </p>
<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1" tabindex="-1"></a>dataf <span class="ot"><-</span> <span class="fu">read.table</span>(<span class="st">'raw_data/mydata.txt'</span>, <span class="at">header =</span> <span class="cn">TRUE</span>, <span class="at">sep =</span> <span class="st">'</span><span class="sc">\t</span><span class="st">'</span>)</span></code></pre></div>
<p> </p>
<p>Because the file path <code>raw_data/mydata.txt</code> is relative to
the project directory, it doesn’t matter where your collaborator saves
the project directory on their computer; it will still work.</p>
<p>If you weren’t using an RStudio project, you would have to use
either of the options below, neither of which would work on a different
computer.</p>
<p> </p>
<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" tabindex="-1"></a><span class="fu">setwd</span>(<span class="st">"/Users/nhy163/Documents/Alex/Teaching/first_project/"</span>)</span>
<span id="cb3-2"><a href="#cb3-2" tabindex="-1"></a></span>
<span id="cb3-3"><a href="#cb3-3" tabindex="-1"></a>dataf <span class="ot"><-</span> <span class="fu">read.table</span>(<span class="st">"raw_data/mydata.txt"</span>, <span class="at">header =</span> <span class="cn">TRUE</span>, <span class="at">sep =</span> <span class="st">"</span><span class="sc">\t</span><span class="st">"</span>)</span>
<span id="cb3-4"><a href="#cb3-4" tabindex="-1"></a></span>
<span id="cb3-5"><a href="#cb3-5" tabindex="-1"></a><span class="co"># or</span></span>
<span id="cb3-6"><a href="#cb3-6" tabindex="-1"></a></span>
<span id="cb3-7"><a href="#cb3-7" tabindex="-1"></a>dataf <span class="ot"><-</span> <span class="fu">read.table</span>(<span class="st">"/Users/nhy163/Documents/Alex/Teaching/first_project/raw_data/mydata.txt"</span>,</span>
<span id="cb3-8"><a href="#cb3-8" tabindex="-1"></a> <span class="at">header =</span> <span class="cn">TRUE</span>, <span class="at">sep =</span> <span class="st">"</span><span class="sc">\t</span><span class="st">"</span>)</span></code></pre></div>
<p> </p>
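<p>To convince yourself that the relative path really is portable, you
can simulate a collaborator’s copy of the project in a temporary
directory. This is an illustration only: the two-row data set is made
up, and <code>setwd()</code> is used here purely to mimic what opening
the <code>.Rproj</code> file does, so the original working directory is
restored straight afterwards.</p>
<p> </p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># simulate a copy of first_project saved somewhere else (a temporary directory)
project_root <- file.path(tempdir(), "first_project")
dir.create(file.path(project_root, "raw_data"), recursive = TRUE,
           showWarnings = FALSE)

# write a small, made-up tab delimited file into raw_data
write.table(data.frame(site = c("A", "B"), count = c(3, 5)),
            file.path(project_root, "raw_data", "mydata.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# opening the .Rproj file would set the working directory to the project root;
# setwd() mimics that here and returns the previous directory so we can restore it
old_wd <- setwd(project_root)
dataf <- read.table("raw_data/mydata.txt", header = TRUE, sep = "\t")
setwd(old_wd)

dataf</code></pre></div>
<p> </p>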
<p>For those of you who want to take the notion of relative file paths a
step further, take a look at the <code>here()</code> function in the
<code>here</code> <a href="https://github.com/r-lib/here">package</a>.
The <code>here()</code> function automagically builds file paths to any
file relative to the project root directory, and these paths are also
operating system agnostic (they work on a Mac or Windows machine). For
example, to import our <code>mydata.txt</code> file from the
<code>raw_data</code> directory, just use:</p>
<p> </p>
<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" tabindex="-1"></a><span class="fu">library</span>(here) <span class="co"># you may need to install the here package first</span></span>
<span id="cb4-2"><a href="#cb4-2" tabindex="-1"></a>dataf <span class="ot"><-</span> <span class="fu">read.table</span>(<span class="fu">here</span>(<span class="st">"raw_data"</span>, <span class="st">"mydata.txt"</span>), <span class="at">header =</span> <span class="cn">TRUE</span>, <span class="at">sep =</span> <span class="st">'</span><span class="sc">\t</span><span class="st">'</span>)</span>
<span id="cb4-3"><a href="#cb4-3" tabindex="-1"></a></span>
<span id="cb4-4"><a href="#cb4-4" tabindex="-1"></a><span class="co"># or without loading the here package</span></span>
<span id="cb4-5"><a href="#cb4-5" tabindex="-1"></a></span>
<span id="cb4-6"><a href="#cb4-6" tabindex="-1"></a>dataf <span class="ot"><-</span> <span class="fu">read.table</span>(here<span class="sc">::</span><span class="fu">here</span>(<span class="st">"raw_data"</span>, <span class="st">"mydata.txt"</span>), <span class="at">header =</span> <span class="cn">TRUE</span>, <span class="at">sep =</span> <span class="st">'</span><span class="sc">\t</span><span class="st">'</span>)</span></code></pre></div>
<p> </p>
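<p>If you’d rather not add a package dependency, base R’s
<code>file.path()</code> offers a simpler alternative: it joins path
components with <code>/</code>, which R accepts on Windows as well as on
a Mac. Unlike <code>here()</code>, it doesn’t locate the project root
for you, so the resulting path is still relative to the working
directory (which an RStudio Project sets to the Root directory). A
minimal sketch:</p>
<p> </p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># join the components of a relative path; file.path() uses "/" as the separator
data_path <- file.path("raw_data", "mydata.txt")
data_path  # "raw_data/mydata.txt"</code></pre></div>
<p> </p>
<p>You would then pass <code>data_path</code> to <code>read.table()</code>
exactly as in the earlier examples.</p>
<p> </p>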
</div>
<div id="directory-structure" class="section level2">
<h2>Directory structure</h2>
<p> </p>
<p>In addition to using RStudio Projects, it’s also really good practice
to structure your directory in a consistent and logical way to help both
you and your collaborators. I frequently use the following directory
structure in my R-based projects:</p>
<p> </p>
<pre><code>
Root
|
|_data
|  |_raw_data
|  |_processed_data
|  |_metadata
|
|_R
|
|_Rmd
|
|_scripts
|
|_output
</code></pre>
<p> </p>
<p>In my working directory I have the following directories:</p>
<ul>
<li><p><strong>Root</strong> - This is your project directory containing
your .Rproj file.</p></li>
<li><p><strong>data</strong> - I store all my data in this directory.
The subdirectory called <code>raw_data</code> contains raw data files
and only raw data files. These files should be treated as <strong>read
only</strong> and should not be changed in any way. If you need to
process, clean or modify your data, do this in R (not MS Excel) so that
you can document (and justify) any changes made. Any processed data
should be saved to a separate file and stored in the
<code>processed_data</code> subdirectory. Information about data
collection methods, details of data download and any other useful
metadata should be saved in a text document (see README text files
below) in the <code>metadata</code> subdirectory.</p></li>
<li><p><strong>R</strong> - This is an optional directory where I save
any custom R functions I have written for the current analysis. These
can then be loaded into your R session using the <code>source()</code>
function.</p></li>
<li><p><strong>Rmd</strong> - An optional directory where I save my R
markdown documents.</p></li>
<li><p><strong>scripts</strong> - All of the main R scripts I have
written for the current project are saved here.</p></li>
<li><p><strong>output</strong> - Outputs from my R scripts such as
plots, HTML files and data summaries are saved in this directory. This
helps me and my collaborators distinguish which files are outputs and
which are source files.</p></li>
</ul>
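<p>As a rough sketch of how these directories fit together, the code below mimics the raw to processed workflow described above. To keep it safe to run, it builds a throwaway copy of the layout under <code>tempdir()</code>; the file names (<code>counts.csv</code>, <code>counts_clean.csv</code>, <code>clean_counts.R</code>) and the helper function <code>clean_counts()</code> are made up purely for illustration. In a real project you would build paths relative to the project root (for example with <code>here::here()</code>) rather than from <code>proj</code>.</p>

```r
# Throwaway project layout in a temporary directory (illustration only;
# in a real project use paths relative to the project root instead)
proj <- file.path(tempdir(), "demo_project")
dir.create(file.path(proj, "data", "raw_data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "data", "processed_data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "R"), showWarnings = FALSE)

# Hypothetical raw data file: from here on it is only ever read, never modified
raw_file <- file.path(proj, "data", "raw_data", "counts.csv")
write.csv(data.frame(site = c("A", "B", "C"), count = c(12, NA, 7)),
          raw_file, row.names = FALSE)

# Hypothetical custom function saved in its own file in the R/ directory
writeLines(c("clean_counts <- function(df) {",
             "  # drop rows with missing counts",
             "  df[!is.na(df$count), ]",
             "}"),
           file.path(proj, "R", "clean_counts.R"))

# Source every .R file in the R/ directory so its functions are available
for (f in list.files(file.path(proj, "R"), pattern = "\\.R$", full.names = TRUE)) {
  source(f)
}

# Read the raw data, clean it in R and save the result to processed_data
raw <- read.csv(raw_file)
processed <- clean_counts(raw)
processed_file <- file.path(proj, "data", "processed_data", "counts_clean.csv")
write.csv(processed, processed_file, row.names = FALSE)
```

<p>The raw file is left exactly as it was; every change lives in the script and ends up in a new file in <code>processed_data</code>.</p>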
<p> </p>
<p>Of course, the structure described above is just what works for me
most of the time and should be viewed as a starting point for your own
needs. I tend to have a fairly consistent directory structure across my
projects as this allows me to quickly orientate myself when I return to
a project after a while. Having said that, different projects will have
different requirements so I happily add and remove directories as
required.</p>
<p> </p>
<p>You can create your directory structure using Windows Explorer (or
Finder on a Mac) or within RStudio by clicking on the ‘New folder’
button in the ‘Files’ pane.</p>
<p> </p>
<p><img src="images/dir_struct3.png" width="60%" style="display: block; margin: auto;" /></p>
<p> </p>
<p>An alternative approach is to use the <code>dir.create()</code>
function in the R Console:</p>
<p> </p>
<div class="sourceCode" id="cb6"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb6-1"><a href="#cb6-1" tabindex="-1"></a><span class="co"># create directory called 'data'</span></span>
<span id="cb6-2"><a href="#cb6-2" tabindex="-1"></a><span class="fu">dir.create</span>(<span class="st">'data'</span>)</span>
<span id="cb6-3"><a href="#cb6-3" tabindex="-1"></a></span>
<span id="cb6-4"><a href="#cb6-4" tabindex="-1"></a><span class="co"># create subdirectory raw_data in the data directory</span></span>
<span id="cb6-5"><a href="#cb6-5" tabindex="-1"></a><span class="fu">dir.create</span>(<span class="st">'data/raw_data'</span>)</span>
<span id="cb6-6"><a href="#cb6-6" tabindex="-1"></a></span>
<span id="cb6-7"><a href="#cb6-7" tabindex="-1"></a><span class="co"># list the files and directories</span></span>
<span id="cb6-8"><a href="#cb6-8" tabindex="-1"></a><span class="fu">list.files</span>(<span class="at">recursive =</span> <span class="cn">TRUE</span>, <span class="at">include.dirs =</span> <span class="cn">TRUE</span>)</span>
<span id="cb6-9"><a href="#cb6-9" tabindex="-1"></a></span>
<span id="cb6-10"><a href="#cb6-10" tabindex="-1"></a><span class="co"># [1] "data" "data/raw_data" "first_project.Rproj"</span></span></code></pre></div>
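<p>Because <code>dir.create()</code> only accepts a single path at a time, one way to set up the whole structure in one go is to loop over a vector of paths. This sketch assumes it is run from the project root (which is the working directory when an RStudio Project is open). Setting <code>recursive = TRUE</code> creates the parent <code>data</code> directory as needed, and <code>showWarnings = FALSE</code> keeps things quiet if a directory already exists.</p>

```r
# Directories from the structure described above (relative to the project root)
dirs <- c("data/raw_data", "data/processed_data", "data/metadata",
          "R", "Rmd", "scripts", "output")

for (d in dirs) {
  # recursive = TRUE also creates 'data' the first time it is needed;
  # showWarnings = FALSE stays silent if the directory already exists
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Check what is now in the project
list.files(recursive = TRUE, include.dirs = TRUE)
```

<p>Since existing directories are skipped quietly, it is safe to re-run this after adding a new entry to <code>dirs</code>.</p>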
<p> </p>
</div>
<div id="file-names" class="section level2">
<h2>File names</h2>
<p> </p>
<p>What you call your files matters more than you might think. Naming
files is also more difficult than you think. The key requirement for a
‘good’ file name is that it’s informative whilst also being relatively
short. This is not always an easy compromise and often requires some
thought. Ideally you should try to avoid the following!</p>
<p> </p>
<div class="figure" style="text-align: center">
<img src="images/xkcd_files.png" alt="source:https://xkcd.com/1459/" width="30%" />
<p class="caption">
source:<a href="https://xkcd.com/1459/"
class="uri">https://xkcd.com/1459/</a>
</p>
</div>
<p> </p>
<p>Although there’s not really a recognised standard approach to naming
files (actually <a href="https://en.wikipedia.org/wiki/Filename">there
is</a>, just not everyone uses it), there are a few things to bear in
mind.</p>
<p> </p>
<ul>
<li><p>First, avoid using spaces in file names; replace them with
underscores or hyphens instead. Why does this matter? One reason is that
some command line software (especially many bioinformatic tools) won’t
recognise a file name containing a space, and you’ll have to go through
all sorts of shenanigans using escape characters to make sure spaces are
handled correctly. Even if you don’t think you will ever use command
line software, you may be doing so indirectly. Take R markdown, for
example: if you want to render an R markdown document to PDF using the
<code>rmarkdown</code> package, you will actually be using a command line
LaTeX engine under the hood. Another good reason not to use spaces in
file names is that they make searching for file names (or parts of file
names) using <a
href="https://en.wikipedia.org/wiki/Regular_expression">regular
expressions</a> in R (or any other language) much more
difficult.</p></li>
<li><p>For the reasons given above, avoid using special characters
(e.g. @£$%^&*():;<>?{}/) in your file names.</p></li>
<li><p>If you are versioning your files with sequential numbers
(e.g. file1, file2, file3 …) and you have more than 9 files, pad the
numbers with leading zeros (01, 02, 03 … 10) to ensure the files are
sorted in the correct order (see what happens if you don’t). If you have
more than 99 files, use 001, 002, 003 … etc.</p></li>
<li><p>If your file names include dates, use the ISO 8601 format
YYYY-MM-DD (or YYYYMMDD) to ensure your files are sorted in proper
chronological order.</p></li>
<li><p>Never use the word <em>final</em> in any file name - it never
is!</p></li>
</ul>
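<p>A few of these points are easy to check in R. The sketch below (all file names are invented for illustration) shows why unpadded numbers sort out of order, how <code>sprintf()</code> can zero-pad them, why ISO 8601 dates sort chronologically while day-month-year dates don’t, and one possible way of replacing spaces and special characters with underscores using <code>gsub()</code>.</p>

```r
# Unpadded numbers sort as text, so "file10" lands between "file1" and "file2"
unpadded <- paste0("file", c(1L, 2L, 10L))
sort(unpadded)   # "file1" "file10" "file2"

# Zero-padding with sprintf() restores the intended order
# (use "%03d" instead if you have more than 99 files)
padded <- sprintf("file%02d", c(1L, 2L, 10L))
sort(padded)     # "file01" "file02" "file10"

# ISO 8601 dates (YYYY-MM-DD) sort chronologically as plain text ...
iso <- c("2019-12-12", "2019-02-03", "2019-12-02")
sort(iso)        # "2019-02-03" "2019-12-02" "2019-12-12"

# ... whereas day-month-year dates do not (3 Feb ends up after 2 Dec)
dmy <- c("12-12-2019", "03-02-2019", "02-12-2019")
sort(dmy)        # "02-12-2019" "03-02-2019" "12-12-2019"

# Today's date in ISO 8601 format, ready to use in a file name
paste0("snouter_counts_", format(Sys.Date(), "%Y-%m-%d"), ".csv")

# Replace each run of spaces or special characters with a single underscore
messy <- "snouter counts & notes.csv"
tidy <- gsub("[^A-Za-z0-9._-]+", "_", messy)
tidy             # "snouter_counts_notes.csv"
```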
<p> </p>
<p>Whatever file naming convention you decide to use, try to adopt it
early, stick with it and be consistent. You’ll thank me!</p>
<p> </p>
</div>
<div id="project-documentation" class="section level2">
<h2>Project documentation</h2>
<p> </p>
<p>A quick note or two about writing R code and creating R scripts.
Unless you’re doing something really quick and dirty, I suggest that you
always write your R code as an R script. R scripts are what make R so
useful. Not only do you have a complete record of your analysis, from
data manipulation and visualisation through to statistical analysis, but
you can also share this code (and data) with friends, colleagues and,
importantly, journals when you submit and publish your research. With
this in mind, make sure you include in your R script all the information
required to make your work reproducible (author names, dates, sampling
design etc). This information could be included as a series of comments
<code>#</code> or, even better, by mixing executable code with narrative
in an <a href="#Rmarkdown_intro.html">R markdown</a> document. It’s
also good practice to include the output of the
<code>sessionInfo()</code> function at the end of any script, which
prints the R version, details of the operating system and the loaded
packages (along with their versions).</p>
<p>Here is an example of including meta-information at the start of an R
script:</p>
<p> </p>
<div class="sourceCode" id="cb7"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb7-1"><a href="#cb7-1" tabindex="-1"></a><span class="co"># Title: Time series analysis of snouters</span></span>
<span id="cb7-2"><a href="#cb7-2" tabindex="-1"></a></span>
<span id="cb7-3"><a href="#cb7-3" tabindex="-1"></a><span class="co"># Purpose: This script performs a time series analysis on snouter count data.</span></span>
<span id="cb7-4"><a href="#cb7-4" tabindex="-1"></a><span class="co"># Data consists of counts of snouter species collected from 18 islands </span></span>
<span id="cb7-5"><a href="#cb7-5" tabindex="-1"></a><span class="co"># in the Hy-yi-yi archipelago between 1950 and 1957. </span></span>
<span id="cb7-6"><a href="#cb7-6" tabindex="-1"></a><span class="co"># For details of snouter biology see:</span></span>
<span id="cb7-7"><a href="#cb7-7" tabindex="-1"></a><span class="co"># https://en.wikipedia.org/wiki/Rhinogradentia</span></span>
<span id="cb7-8"><a href="#cb7-8" tabindex="-1"></a></span>
<span id="cb7-9"><a href="#cb7-9" tabindex="-1"></a><span class="co"># Project number: #007</span></span>
<span id="cb7-10"><a href="#cb7-10" tabindex="-1"></a></span>
<span id="cb7-11"><a href="#cb7-11" tabindex="-1"></a><span class="co"># Data file: '/Users/Another/snouter_analysis/snouter_pop.txt'</span></span>
<span id="cb7-12"><a href="#cb7-12" tabindex="-1"></a></span>
<span id="cb7-13"><a href="#cb7-13" tabindex="-1"></a><span class="co"># Author: A. Nother</span></span>
<span id="cb7-14"><a href="#cb7-14" tabindex="-1"></a><span class="co"># Contact details: a.nother@uir.ac.uk</span></span>
<span id="cb7-15"><a href="#cb7-15" tabindex="-1"></a></span>
<span id="cb7-16"><a href="#cb7-16" tabindex="-1"></a><span class="co"># Date script created: Mon Dec 2 16:06:44 2019 ------------------------------</span></span>
<span id="cb7-17"><a href="#cb7-17" tabindex="-1"></a><span class="co"># Date script last modified: Thu Dec 12 16:07:12 2019 ----------------------</span></span>
<span id="cb7-18"><a href="#cb7-18" tabindex="-1"></a></span>
<span id="cb7-19"><a href="#cb7-19" tabindex="-1"></a><span class="co"># package dependencies</span></span>
<span id="cb7-20"><a href="#cb7-20" tabindex="-1"></a><span class="fu">library</span>(PopSnouter)</span>
<span id="cb7-21"><a href="#cb7-21" tabindex="-1"></a><span class="fu">library</span>(ggplot2)</span>
<span id="cb7-22"><a href="#cb7-22" tabindex="-1"></a></span>
<span id="cb7-23"><a href="#cb7-23" tabindex="-1"></a><span class="fu">print</span>(<span class="st">'put your lovely R code here'</span>)</span>
<span id="cb7-24"><a href="#cb7-24" tabindex="-1"></a></span>
<span id="cb7-25"><a href="#cb7-25" tabindex="-1"></a><span class="co"># good practice to include sessionInfo</span></span>
<span id="cb7-26"><a href="#cb7-26" tabindex="-1"></a></span>
<span id="cb7-27"><a href="#cb7-27" tabindex="-1"></a><span class="fu">sessionInfo</span>()</span></code></pre></div>
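<p>If you would rather keep a permanent record than rely on the console output, one option is to write the output of <code>sessionInfo()</code> to a text file alongside your other outputs. This is a minimal sketch assuming the <code>output</code> directory described earlier; the file name <code>session_info.txt</code> is just an example.</p>

```r
# Make sure the output directory exists (no warning if it already does)
dir.create("output", showWarnings = FALSE)

# capture.output() collects the printed sessionInfo() as a character vector,
# which writeLines() then saves to a plain text file
session_file <- file.path("output", "session_info.txt")
writeLines(capture.output(sessionInfo()), session_file)
```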
<p> </p>
<p>This is just one example; there are no hard and fast rules, so feel
free to develop a system that works for you. A really useful shortcut in
RStudio is to automatically include a time and date stamp in your R
script. To do this, write <code>ts</code> where you want to insert your
time stamp in your R script and then press ‘shift + tab’.
RStudio will magically convert <code>ts</code> into the current date and
time and also automatically comment out this line with a <code>#</code>.
Another really useful RStudio shortcut lets you comment out multiple
lines in your script with a <code>#</code> symbol. To do this, highlight
the lines of text you want to comment and then press ‘ctrl + shift + c’
(‘cmd + shift + c’ on a Mac). To uncomment the lines, just press the same
keys again.</p>
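<p>You can also build the same kind of stamp yourself in code. As far as I know, RStudio’s built-in <code>ts</code> snippet is based on base R’s <code>date()</code> function (you can inspect it via the ‘Edit Snippets’ button in the Code section of RStudio’s Global Options), so this sketch should produce a similar comment line; the number of trailing dashes is purely cosmetic.</p>

```r
# date() returns the current date and time as a single string,
# in a format like "Mon Dec  2 16:06:44 2019"
stamp <- paste("#", date(), strrep("-", 30))
cat(stamp, "\n")
```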
<p>In addition to including metadata in your R scripts, it’s also common
practice to create a separate text file to record important information.
By convention these text files are named <code>README</code>. I often
include a <code>README</code> file in the directory where I keep my raw