-
Notifications
You must be signed in to change notification settings - Fork 2
/
09-R-intro.Rmd
147 lines (146 loc) · 15.2 KB
/
09-R-intro.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
# Lists and data frames
<hr />
<p><a href="" id="Lists"></a> <a href="" id="Lists-1"></a></p>
<h3 id="lists" class="section">6.1 Lists</h3>
<p><a href="" id="index-Lists"></a></p>
<p>An R <em>list</em> is an object consisting of an ordered collection of objects known as its <em>components</em>.</p>
<p>There is no particular need for the components to be of the same mode or type, and, for example, a list could consist of a numeric vector, a logical value, a matrix, a complex vector, a character array, a function, and so on. Here is a simple example of how to make a list:</p>
<div class="example">
<pre class="example1"><code>> Lst <- list(name="Fred", wife="Mary", no.children=3,
child.ages=c(4,7,9))</code></pre>
</div>
<p><a href="" id="index-list"></a></p>
<p>Components are always <em>numbered</em> and may always be referred to as such. Thus if <code class="calibre2">Lst</code> is the name of a list with four components, these may be individually referred to as <code class="calibre2">Lst[[1]]</code>, <code class="calibre2">Lst[[2]]</code>, <code class="calibre2">Lst[[3]]</code> and <code class="calibre2">Lst[[4]]</code>. If, further, <code class="calibre2">Lst[[4]]</code> is a vector subscripted array then <code class="calibre2">Lst[[4]][1]</code> is its first entry.</p>
<p>If <code class="calibre2">Lst</code> is a list, then the function <code class="calibre2">length(Lst)</code> gives the number of (top level) components it has.</p>
<p>Components of lists may also be <em>named</em>, and in this case the component may be referred to either by giving the component name as a character string in place of the number in double square brackets, or, more conveniently, by giving an expression of the form</p>
<div class="example">
<pre class="example1"><code>> name$component_name</code></pre>
</div>
<p>for the same thing.</p>
<p>This is a very useful convention as it makes it easier to get the right component if you forget the number.</p>
<p>So in the simple example given above:</p>
<p><code class="calibre2">Lst$name</code> is the same as <code class="calibre2">Lst[[1]]</code> and is the string <code class="calibre2">"Fred"</code>,</p>
<p><code class="calibre2">Lst$wife</code> is the same as <code class="calibre2">Lst[[2]]</code> and is the string <code class="calibre2">"Mary"</code>,</p>
<p><code class="calibre2">Lst$child.ages[1]</code> is the same as <code class="calibre2">Lst[[4]][1]</code> and is the number <code class="calibre2">4</code>.</p>
<p>Additionally, one can also use the names of the list components in double square brackets, i.e., <code class="calibre2">Lst[["name"]]</code> is the same as <code class="calibre2">Lst$name</code>. This is especially useful, when the name of the component to be extracted is stored in another variable as in</p>
<div class="example">
<pre class="example1"><code>> x <- "name"; Lst[[x]]</code></pre>
</div>
<p>It is very important to distinguish <code class="calibre2">Lst[[1]]</code> from <code class="calibre2">Lst[1]</code>. ‘<code class="calibre2">[[…]]</code>’ is the operator used to select a single element, whereas ‘<code class="calibre2">[…]</code>’ is a general subscripting operator. Thus the former is the <em>first object in the list</em> <code class="calibre2">Lst</code>, and if it is a named list the name is <em>not</em> included. The latter is a <em>sublist of the list <code class="calibre2">Lst</code> consisting of the first entry only. If it is a named list, the names are transferred to the sublist.</em></p>
<p>The names of components may be abbreviated down to the minimum number of letters needed to identify them uniquely. Thus <code class="calibre2">Lst$coefficients</code> may be minimally specified as <code class="calibre2">Lst$coe</code> and <code class="calibre2">Lst$covariance</code> as <code class="calibre2">Lst$cov</code>.</p>
<p>The vector of names is in fact simply an attribute of the list like any other and may be handled as such. Other structures besides lists may, of course, similarly be given a <em>names</em> attribute also.</p>
<hr />
<p><a href="" id="Constructing-and-modifying-lists"></a> <a href="" id="Constructing-and-modifying-lists-1"></a></p>
<h3 id="constructing-and-modifying-lists" class="section">6.2 Constructing and modifying lists</h3>
<p>New lists may be formed from existing objects by the function <code class="calibre2">list()</code>. An assignment of the form</p>
<div class="example">
<pre class="example1"><code>> Lst <- list(name_1=object_1, …, name_m=object_m)</code></pre>
</div>
<p>sets up a list <code class="calibre2">Lst</code> of <em>m</em> components using object_1, …, object_m for the components and giving them names as specified by the argument names, (which can be freely chosen). If these names are omitted, the components are numbered only. The components used to form the list are <em>copied</em> when forming the new list and the originals are not affected.</p>
<p>Lists, like any subscripted object, can be extended by specifying additional components. For example</p>
<div class="example">
<pre class="example1"><code>> Lst[5] <- list(matrix=Mat)</code></pre>
</div>
<hr />
<p><a href="" id="Concatenating-lists"></a> <a href="" id="Concatenating-lists-1"></a></p>
<h4 id="concatenating-lists" class="subheading">6.2.1 Concatenating lists</h4>
<p><a href="" id="index-Concatenating-lists"></a> <a href="" id="index-c-3"></a></p>
<p>When the concatenation function <code class="calibre2">c()</code> is given list arguments, the result is an object of mode list also, whose components are those of the argument lists joined together in sequence.</p>
<div class="example">
<pre class="example1"><code>> list.ABC <- c(list.A, list.B, list.C)</code></pre>
</div>
<p>Recall that with vector objects as arguments the concatenation function similarly joined together all arguments into a single vector structure. In this case all other attributes, such as <code class="calibre2">dim</code> attributes, are discarded.</p>
<hr />
<p><a href="" id="Data-frames"></a> <a href="" id="Data-frames-1"></a></p>
<h3 id="data-frames" class="section">6.3 Data frames</h3>
<p><a href="" id="index-Data-frames"></a></p>
<p>A <em>data frame</em> is a list with class <code class="calibre2">"data.frame"</code>. There are restrictions on lists that may be made into data frames, namely</p>
<ul>
<li>The components must be vectors (numeric, character, or logical), factors, numeric matrices, lists, or other data frames.</li>
<li>Matrices, lists, and data frames provide as many variables to the new data frame as they have columns, elements, or variables, respectively.</li>
<li>Numeric vectors, logicals and factors are included as is, and by default<a href="appendix-f-references.html#FOOT18" id="DOCF18"><sup>18</sup></a> character vectors are coerced to be factors, whose levels are the unique values appearing in the vector.</li>
<li>Vector structures appearing as variables of the data frame must all have the <em>same length</em>, and matrix structures must all have the same <em>row size</em>.</li>
</ul>
<p>A data frame may for many purposes be regarded as a matrix with columns possibly of differing modes and attributes. It may be displayed in matrix form, and its rows and columns extracted using matrix indexing conventions.</p>
<hr />
<p><a href="" id="Making-data-frames"></a> <a href="" id="Making-data-frames-1"></a></p>
<h4 id="making-data-frames" class="subheading">6.3.1 Making data frames</h4>
<p>Objects satisfying the restrictions placed on the columns (components) of a data frame may be used to form one using the function <code class="calibre2">data.frame</code>: <a href="" id="index-data_002eframe"></a></p>
<div class="example">
<pre class="example1"><code>> accountants <- data.frame(home=statef, loot=incomes, shot=incomef)</code></pre>
</div>
<p>A list whose components conform to the restrictions of a data frame may be <em>coerced</em> into a data frame using the function <code class="calibre2">as.data.frame()</code> <a href="" id="index-as_002edata_002eframe"></a></p>
<p>The simplest way to construct a data frame from scratch is to use the <code class="calibre2">read.table()</code> function to read an entire data frame from an external file. This is discussed further in <a href="#Reading-data-from-files">Reading data from files</a>.</p>
<hr />
<p><a href="" id="attach_0028_0029-and-detach_0028_0029"></a> <a href="" id="attach_0028_0029-and-detach_0028_0029-1"></a></p>
<h4 id="attach-and-detach" class="subheading">6.3.2 <code class="calibre2">attach()</code> and <code class="calibre2">detach()</code></h4>
<p><a href="" id="index-attach"></a> <a href="" id="index-detach"></a></p>
<p>The <code class="calibre2">$</code> notation, such as <code class="calibre2">accountants$home</code>, for list components is not always very convenient. A useful facility would be somehow to make the components of a list or data frame temporarily visible as variables under their component name, without the need to quote the list name explicitly each time.</p>
<p>The <code class="calibre2">attach()</code> function takes a ‘database’ such as a list or data frame as its argument. Thus suppose <code class="calibre2">lentils</code> is a data frame with three variables <code class="calibre2">lentils$u</code>, <code class="calibre2">lentils$v</code>, <code class="calibre2">lentils$w</code>. The attach</p>
<div class="example">
<pre class="example1"><code>> attach(lentils)</code></pre>
</div>
<p>places the data frame in the search path at position 2, and provided there are no variables <code class="calibre2">u</code>, <code class="calibre2">v</code> or <code class="calibre2">w</code> in position 1, <code class="calibre2">u</code>, <code class="calibre2">v</code> and <code class="calibre2">w</code> are available as variables from the data frame in their own right. At this point an assignment such as</p>
<div class="example">
<pre class="example1"><code>> u <- v+w</code></pre>
</div>
<p>does not replace the component <code class="calibre2">u</code> of the data frame, but rather masks it with another variable <code class="calibre2">u</code> in the working directory at position 1 on the search path. To make a permanent change to the data frame itself, the simplest way is to resort once again to the <code class="calibre2">$</code> notation:</p>
<div class="example">
<pre class="example1"><code>> lentils$u <- v+w</code></pre>
</div>
<p>However the new value of component <code class="calibre2">u</code> is not visible until the data frame is detached and attached again.</p>
<p>To detach a data frame, use the function</p>
<div class="example">
<pre class="example1"><code>> detach()</code></pre>
</div>
<p>More precisely, this statement detaches from the search path the entity currently at position 2. Thus in the present context the variables <code class="calibre2">u</code>, <code class="calibre2">v</code> and <code class="calibre2">w</code> would be no longer visible, except under the list notation as <code class="calibre2">lentils$u</code> and so on. Entities at positions greater than 2 on the search path can be detached by giving their number to <code class="calibre2">detach</code>, but it is much safer to always use a name, for example by <code class="calibre2">detach(lentils)</code> or <code class="calibre2">detach("lentils")</code></p>
<blockquote>
<p><strong>Note:</strong> In R lists and data frames can only be attached at position 2 or above, and what is attached is a <em>copy</em> of the original object. You can alter the attached values <em>via</em> <code class="calibre2">assign</code>, but the original list or data frame is unchanged.</p>
</blockquote>
<hr />
<p><a href="" id="Working-with-data-frames"></a> <a href="" id="Working-with-data-frames-1"></a></p>
<h4 id="working-with-data-frames" class="subheading">6.3.3 Working with data frames</h4>
<p>A useful convention that allows you to work with many different problems comfortably together in the same working directory is</p>
<ul>
<li>gather together all variables for any well defined and separate problem in a data frame under a suitably informative name;</li>
<li>when working with a problem attach the appropriate data frame at position 2, and use the working directory at level 1 for operational quantities and temporary variables;</li>
<li>before leaving a problem, add any variables you wish to keep for future reference to the data frame using the <code class="calibre2">$</code> form of assignment, and then <code class="calibre2">detach()</code>;</li>
<li>finally remove all unwanted variables from the working directory and keep it as clean of left-over temporary variables as possible.</li>
</ul>
<p>In this way it is quite simple to work with many problems in the same directory, all of which have variables named <code class="calibre2">x</code>, <code class="calibre2">y</code> and <code class="calibre2">z</code>, for example.</p>
<hr />
<p><a href="" id="Attaching-arbitrary-lists"></a> <a href="" id="Attaching-arbitrary-lists-1"></a></p>
<h4 id="attaching-arbitrary-lists" class="subheading">6.3.4 Attaching arbitrary lists</h4>
<p><code class="calibre2">attach()</code> is a generic function that allows not only directories and data frames to be attached to the search path, but other classes of object as well. In particular any object of mode <code class="calibre2">"list"</code> may be attached in the same way:</p>
<div class="example">
<pre class="example1"><code>> attach(any.old.list)</code></pre>
</div>
<p>Anything that has been attached can be detached by <code class="calibre2">detach</code>, by position number or, preferably, by name.</p>
<hr />
<p><a href="" id="Managing-the-search-path"></a> <a href="" id="Managing-the-search-path-1"></a></p>
<h4 id="managing-the-search-path" class="subheading">6.3.5 Managing the search path</h4>
<p><a href="" id="index-search"></a> <a href="" id="index-Search-path"></a></p>
<p>The function <code class="calibre2">search</code> shows the current search path and so is a very useful way to keep track of which data frames and lists (and packages) have been attached and detached. Initially it gives</p>
<div class="example">
<pre class="example1"><code>> search()
[1] ".GlobalEnv" "Autoloads" "package:base"</code></pre>
</div>
<p>where <code class="calibre2">.GlobalEnv</code> is the workspace.<a href="appendix-f-references.html#FOOT19" id="DOCF19"><sup>19</sup></a></p>
<p>After <code class="calibre2">lentils</code> is attached we have</p>
<div class="example">
<pre class="example1"><code>> search()
[1] ".GlobalEnv" "lentils" "Autoloads" "package:base"
> ls(2)
[1] "u" "v" "w"</code></pre>
</div>
<p>and as we see <code class="calibre2">ls</code> (or <code class="calibre2">objects</code>) can be used to examine the contents of any position on the search path.</p>
<p>Finally, we detach the data frame and confirm it has been removed from the search path.</p>
<div class="example">
<pre class="example1"><code>> detach("lentils")
> search()
[1] ".GlobalEnv" "Autoloads" "package:base"</code></pre>
</div>
<hr />
<p><a href="" id="Reading-data-from-files"></a> <a href="" id="Reading-data-from-files-1"></a></p>
<div id="calibre_pb_16" class="calibre8">
</div>