# Chapter 7: pragmatic matters

## Tabulating and cross-tabulating data

**Creating tables from vectors**

In [1]:
library(lsr)
load("nightgarden.Rdata")
who()

   -- Name --   -- Class --   -- Size --
   speaker      character     10        
   utterance    character     10        

In [2]:
print(speaker)

 [1] "upsy-daisy"  "upsy-daisy"  "upsy-daisy"  "upsy-daisy"  "tombliboo"  
 [6] "tombliboo"   "makka-pakka" "makka-pakka" "makka-pakka" "makka-pakka"


In [3]:
print(utterance)

 [1] "pip" "pip" "onk" "onk" "ee"  "oo"  "pip" "pip" "onk" "onk"


In [4]:
table(speaker)

speaker
makka-pakka   tombliboo  upsy-daisy 
          4           2           4 

In [5]:
table(speaker, utterance)

             utterance
speaker       ee onk oo pip
  makka-pakka  0   2  0   2
  tombliboo    1   0  1   0
  upsy-daisy   0   2  0   2

**Creating tables from data frames**

In [6]:
itng <- data.frame(speaker, utterance)
itng

speaker,utterance
upsy-daisy,pip
upsy-daisy,pip
upsy-daisy,onk
upsy-daisy,onk
tombliboo,ee
tombliboo,oo
makka-pakka,pip
makka-pakka,pip
makka-pakka,onk
makka-pakka,onk


In [7]:
table(itng)

             utterance
speaker       ee onk oo pip
  makka-pakka  0   2  0   2
  tombliboo    1   0  1   0
  upsy-daisy   0   2  0   2

use xtabs() to tabulate specific variables from a data frame

In [8]:
xtabs(formula=~speaker+utterance, data=itng)

             utterance
speaker       ee onk oo pip
  makka-pakka  0   2  0   2
  tombliboo    1   0  1   0
  upsy-daisy   0   2  0   2

**Converting a table of counts to a table of proportions**

In [9]:
itng.table <- table(itng)
itng.table

             utterance
speaker       ee onk oo pip
  makka-pakka  0   2  0   2
  tombliboo    1   0  1   0
  upsy-daisy   0   2  0   2

In [10]:
prop.table(itng.table)

             utterance
speaker        ee onk  oo pip
  makka-pakka 0.0 0.2 0.0 0.2
  tombliboo   0.1 0.0 0.1 0.0
  upsy-daisy  0.0 0.2 0.0 0.2

proportions by row (each row sums to 1)

In [11]:
prop.table(itng.table, margin=1)

             utterance
speaker        ee onk  oo pip
  makka-pakka 0.0 0.5 0.0 0.5
  tombliboo   0.5 0.0 0.5 0.0
  upsy-daisy  0.0 0.5 0.0 0.5

proportions by column (each column sums to 1)

In [12]:
prop.table(itng.table, margin=2)

             utterance
speaker        ee onk  oo pip
  makka-pakka 0.0 0.5 0.0 0.5
  tombliboo   1.0 0.0 1.0 0.0
  upsy-daisy  0.0 0.5 0.0 0.5
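A quick way to check which margin is which is to sum the result. The sketch below rebuilds the two vectors from the printouts above (so it does not need nightgarden.Rdata) and confirms what each `margin` setting normalises over.

```r
# Rebuild the vectors shown earlier so the sketch is self-contained
speaker <- c(rep("upsy-daisy", 4), rep("tombliboo", 2), rep("makka-pakka", 4))
utterance <- c("pip", "pip", "onk", "onk", "ee", "oo", "pip", "pip", "onk", "onk")
itng.table <- table(speaker, utterance)

overall <- prop.table(itng.table)              # whole table sums to 1
by.row  <- prop.table(itng.table, margin = 1)  # each row sums to 1
by.col  <- prop.table(itng.table, margin = 2)  # each column sums to 1

print(sum(overall))
print(rowSums(by.row))
print(colSums(by.col))
```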

## Transforming and recoding a variable

In [13]:
load("likert.Rdata")
likert.raw

center the likert data on 4 since it's "no opinion"

In [14]:
likert.centered <- likert.raw-4
likert.centered

strength of opinion

In [15]:
opinion.strength <- abs(likert.centered)
opinion.strength

direction of opinion, ignore strength

In [16]:
opinion.dir <- sign(likert.centered)
opinion.dir
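The three recodings fit together: strength times direction gives back the centered score. A minimal sketch, assuming likert.raw holds the ten values that appear in the ordered-factor printout later in these notes (the contents of likert.Rdata are not shown here, so treat the values as an assumption):

```r
# Assumed values; taken from the likert.ordinal printout further down
likert.raw <- c(1, 7, 3, 4, 4, 4, 2, 6, 5, 5)

likert.centered  <- likert.raw - 4        # 4 = "no opinion" becomes 0
opinion.strength <- abs(likert.centered)  # distance from neutral
opinion.dir      <- sign(likert.centered) # -1, 0 or 1

print(data.frame(likert.raw, likert.centered, opinion.strength, opinion.dir))

# strength * direction recovers the centered score
print(all(opinion.strength * opinion.dir == likert.centered))
```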

**Cutting a numeric variable into categories**

In [17]:
age <- c(60, 58, 24, 26, 34, 42, 31, 30, 33, 2, 9)

age.breaks <- seq(from=0, to=60, by=20)
age.breaks

In [18]:
age.labels <- c("young", "adult", "older")
age.labels

In [19]:
age.group <- cut(x=age, breaks=age.breaks, labels=age.labels)
data.frame(age, age.group)


age,age.group
60,older
58,older
24,adult
26,adult
34,adult
42,older
31,adult
30,adult
33,adult
2,young


In [20]:
table(age.group)

age.group
young adult older 
    2     6     3 

R can choose the breaks for us: pass the number of bins to `breaks` instead of a vector of cut points

In [21]:
age.group2 <- cut(age, breaks=3)
table(age.group2)

age.group2
(1.94,21.3] (21.3,40.7] (40.7,60.1] 
          2           6           3 

to separate into groups with roughly equal numbers of people, use quantileCut() in the lsr package

In [22]:
age.group3 <- quantileCut(age, n=3)
table(age.group3)

age.group3
(1.94,27.3] (27.3,33.7] (33.7,60.1] 
          4           3           4 
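One thing to watch with cut(): intervals are open on the left and closed on the right by default, so a value exactly equal to the lowest break becomes NA. A short sketch of the options (the age of 0 is a made-up value for illustration):

```r
age <- c(60, 58, 24, 26, 34, 42, 31, 30, 33, 2, 9)
breaks <- c(0, 20, 40, 60)

# Default: right = TRUE gives (0,20], (20,40], (40,60]
default.cut <- cut(age, breaks = breaks)
print(table(default.cut))

# Hypothetical newborn: 0 falls outside (0,20] and becomes NA
newborn.default <- cut(0, breaks = breaks)
print(newborn.default)

# include.lowest = TRUE closes the first interval: [0,20]
newborn.fixed <- cut(0, breaks = breaks, include.lowest = TRUE)
print(newborn.fixed)

# right = FALSE flips the intervals to [0,20), [20,40), [40,60),
# so now the 60-year-old is the one who becomes NA
left.cut <- cut(age, breaks = breaks, right = FALSE)
print(left.cut)
```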

## A few more mathematical functions and operations

- sqrt() - square root
- abs() - absolute value
- log10() - log base 10
- log() - natural log (base e by default; use the base argument to change it)
- exp() - exponentiation
- round() - round to nearest; use digits to specify number of digits to round to
- signif() - round to selected number of significant digits
- floor() - round down
- ceiling() - round up

- %/% - integer division
- %% - modulus
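A few of these in action, as a quick sanity check (the input numbers are arbitrary):

```r
nat.log  <- log(exp(1))            # natural log: log(e) is 1
log.b10  <- log(100, base = 10)    # same as log10(100)
quotient <- 17 %/% 5               # integer division
modulus  <- 17 %% 5                # remainder (note the doubled %)
rounded  <- round(3.14159, digits = 2)
sig      <- signif(123456, digits = 2)

print(c(nat.log, log.b10, quotient, modulus, rounded, sig))
print(c(floor(2.7), ceiling(2.2)))
```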

## Extracting a subset of a vector

In [23]:
is.MP.speaking <- speaker == 'makka-pakka'
is.MP.speaking

In [24]:
utterance[is.MP.speaking]

In [25]:
utterance[speaker == 'makka-pakka']

**%in% operator**

similar to == but can match multiple values

In [26]:
speaker[utterance %in% c("pip", "oo")]

pick elements 2 and 3

In [27]:
utterance[2:3]

drop elements 2 and 3

In [28]:
utterance[-(2:3)]

**Splitting a vector by group**

split(x=variable that needs to be split into groups, f=grouping variable)

In [29]:
speech.by.char <- split(x=utterance, f=speaker)
print(speech.by.char)

$`makka-pakka`
[1] "pip" "pip" "onk" "onk"

$tombliboo
[1] "ee" "oo"

$`upsy-daisy`
[1] "pip" "pip" "onk" "onk"



first utterance by makka-pakka:

In [30]:
speech.by.char$'makka-pakka'[1]

In [31]:
speech.by.char$tombliboo

note: R requires quotes (or backticks) around the name when it is not a syntactically valid R name, e.g. when it contains a hyphen or a space

use importList() from the lsr package to import these split variables into the workspace

In [32]:
who()

   -- Name --         -- Class --   -- Size --
   age                numeric       11        
   age.breaks         numeric       4         
   age.group          factor        11        
   age.group2         factor        11        
   age.group3         factor        11        
   age.labels         character     3         
   is.MP.speaking     logical       10        
   itng               data.frame    10 x 2    
   itng.table         table         3 x 4     
   likert.centered    numeric       10        
   likert.raw         numeric       10        
   opinion.dir        numeric       10        
   opinion.strength   numeric       10        
   speaker            character     10        
   speech.by.char     list          3         
   utterance          character     10        

In [35]:
importList(speech.by.char)

Create these variables? [y/n] y
Names of variables to be created:
[1] "makka.pakka" "tombliboo"   "upsy.daisy" 


In [36]:
who()

   -- Name --         -- Class --   -- Size --
   age                numeric       11        
   age.breaks         numeric       4         
   age.group          factor        11        
   age.group2         factor        11        
   age.group3         factor        11        
   age.labels         character     3         
   is.MP.speaking     logical       10        
   itng               data.frame    10 x 2    
   itng.table         table         3 x 4     
   likert.centered    numeric       10        
   likert.raw         numeric       10        
   makka.pakka        character     4         
   opinion.dir        numeric       10        
   opinion.strength   numeric       10        
   speaker            character     10        
   speech.by.char     list          3         
   tombliboo          character     2         
   upsy.daisy         character     4         
   utterance          character     10        

In [37]:
makka.pakka

## Extracting a subset of a data frame

note: this is pretty much identical to indexing with pandas

**subset function**

x = data frame<br> 
subset = vector of logical values indicating cases (i.e. rows) to keep<br> 
select = indicates which variables (columns) to keep


In [38]:
df <- subset(x=itng,
            subset= speaker=='makka-pakka',
            select=utterance)

In [39]:
print(df)

   utterance
7        pip
8        pip
9        onk
10       onk


note that row numbers are preserved

In [40]:
subset(x=itng,
      subset=speaker=='makka-pakka')

Unnamed: 0,speaker,utterance
7,makka-pakka,pip
8,makka-pakka,pip
9,makka-pakka,onk
10,makka-pakka,onk


In [41]:
subset(x=itng, select=utterance)

utterance
pip
pip
onk
onk
ee
oo
pip
pip
onk
onk


**Using square brackets: 1. rows and columns**

In [42]:
load("nightgarden2.Rdata")

In [43]:
garden

Unnamed: 0,speaker,utterance,line
case.1,upsy-daisy,pip,1
case.2,upsy-daisy,pip,2
case.3,tombliboo,ee,5
case.4,makka-pakka,pip,7
case.5,makka-pakka,onk,9


In [44]:
garden[4:5, 1:2]

Unnamed: 0,speaker,utterance
case.4,makka-pakka,pip
case.5,makka-pakka,onk


In [45]:
garden[c(4,5), c(1,2)]

Unnamed: 0,speaker,utterance
case.4,makka-pakka,pip
case.5,makka-pakka,onk


In [46]:
garden[c("case.4", "case.5"), c("speaker", "utterance")]

Unnamed: 0,speaker,utterance
case.4,makka-pakka,pip
case.5,makka-pakka,onk


In [47]:
garden[4:5, c("speaker", "utterance")]

Unnamed: 0,speaker,utterance
case.4,makka-pakka,pip
case.5,makka-pakka,onk


In [48]:
is.MP.speaking <- garden$speaker == "makka-pakka"
garden[is.MP.speaking, c("speaker", "utterance")]

Unnamed: 0,speaker,utterance
case.4,makka-pakka,pip
case.5,makka-pakka,onk


**Using square brackets: 2. some elaborations**

In [49]:
garden[,1:2]

Unnamed: 0,speaker,utterance
case.1,upsy-daisy,pip
case.2,upsy-daisy,pip
case.3,tombliboo,ee
case.4,makka-pakka,pip
case.5,makka-pakka,onk


In [50]:
garden[4:5,]

Unnamed: 0,speaker,utterance,line
case.4,makka-pakka,pip,7
case.5,makka-pakka,onk,9


drop the 3rd column from the result (garden itself is not modified)

In [51]:
garden[,-3]

Unnamed: 0,speaker,utterance
case.1,upsy-daisy,pip
case.2,upsy-daisy,pip
case.3,tombliboo,ee
case.4,makka-pakka,pip
case.5,makka-pakka,onk


**Using square brackets: 3. understanding 'dropping'**

In [52]:
garden[5,]

Unnamed: 0,speaker,utterance,line
case.5,makka-pakka,onk,9


In [53]:
garden

Unnamed: 0,speaker,utterance,line
case.1,upsy-daisy,pip,1
case.2,upsy-daisy,pip,2
case.3,tombliboo,ee,5
case.4,makka-pakka,pip,7
case.5,makka-pakka,onk,9


In [54]:
garden[,3]

R noticed that the output doesn't need to be a data frame because it's only one variable, so it "dropped" the result down to a plain vector. Use drop=FALSE to keep a data frame.

In [55]:
garden[,3,drop=FALSE]

Unnamed: 0,line
case.1,1
case.2,2
case.3,5
case.4,7
case.5,9


**Using square brackets: 4. columns only**

In [56]:
garden[1:2]

Unnamed: 0,speaker,utterance
case.1,upsy-daisy,pip
case.2,upsy-daisy,pip
case.3,tombliboo,ee
case.4,makka-pakka,pip
case.5,makka-pakka,onk


In [57]:
garden[3]

Unnamed: 0,line
case.1,1
case.2,2
case.3,5
case.4,7
case.5,9


In [58]:
garden[[3]]
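The difference between these forms is easiest to see by checking the class of each result. A sketch, assuming a hand-built copy of garden that matches the printout above (nightgarden2.Rdata itself is not needed):

```r
# Hypothetical reconstruction of garden from the printed output
garden <- data.frame(
  speaker   = c("upsy-daisy", "upsy-daisy", "tombliboo", "makka-pakka", "makka-pakka"),
  utterance = c("pip", "pip", "ee", "pip", "onk"),
  line      = c(1, 2, 5, 7, 9),
  row.names = paste0("case.", 1:5)
)

print(class(garden[3]))                  # single brackets keep a data frame
print(class(garden[[3]]))                # double brackets extract the column itself
print(class(garden[, 3]))                # row/column form drops to a vector
print(class(garden[, 3, drop = FALSE]))  # unless drop = FALSE

# garden[[3]] is the same thing as garden$line
print(identical(garden[[3]], garden$line))
```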

## Sorting, flipping, and merging data

**Sorting a numeric or character vector**

In [59]:
numbers <- c(2, 4, 3)
sort(numbers)

In [60]:
sort(numbers, decreasing=TRUE)

In [61]:
text <- c("aardvark", "zebra", "swing")
sort(text)

**Sorting a factor**

In [62]:
fac <- factor(text)
print(fac)

[1] aardvark zebra    swing   
Levels: aardvark swing zebra


In [63]:
print(sort(fac))

[1] aardvark swing    zebra   
Levels: aardvark swing zebra


In [64]:
fac <- factor(text, levels=c("zebra", "swing", "aardvark"))
print(fac)

[1] aardvark zebra    swing   
Levels: zebra swing aardvark


In [65]:
print(sort(fac))

[1] zebra    swing    aardvark
Levels: zebra swing aardvark


**Sorting a data frame**

a bit fiddly in base R, so use the `sortFrame()` function in the lsr package

In [66]:
sortFrame(garden, speaker, line)

Unnamed: 0,speaker,utterance,line
case.4,makka-pakka,pip,7
case.5,makka-pakka,onk,9
case.3,tombliboo,ee,5
case.1,upsy-daisy,pip,1
case.2,upsy-daisy,pip,2


Sorts by speaker, then sorts by line

use minus sign for reverse order

In [67]:
sortFrame(garden, speaker, -line)

Unnamed: 0,speaker,utterance,line
case.5,makka-pakka,onk,9
case.4,makka-pakka,pip,7
case.3,tombliboo,ee,5
case.2,upsy-daisy,pip,2
case.1,upsy-daisy,pip,1
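For comparison, the usual base R way is to index the rows with order(). This is a sketch of the general technique, not a claim about how sortFrame() is implemented; garden is rebuilt by hand from the printout above.

```r
# Hypothetical reconstruction of garden from the printed output
garden <- data.frame(
  speaker   = c("upsy-daisy", "upsy-daisy", "tombliboo", "makka-pakka", "makka-pakka"),
  utterance = c("pip", "pip", "ee", "pip", "onk"),
  line      = c(1, 2, 5, 7, 9),
  row.names = paste0("case.", 1:5)
)

# order() returns the row permutation; later arguments break ties
by.speaker.line <- garden[order(garden$speaker, garden$line), ]
print(by.speaker.line)

# A minus sign reverses a numeric sort key
by.speaker.revline <- garden[order(garden$speaker, -garden$line), ]
print(by.speaker.revline)
```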


**Binding vectors together**

In [68]:
cake.1 <- c(100, 80, 0, 0, 0)
cake.2 <- c(100, 100, 90, 30, 10)

combine with data frame

In [69]:
cake.df <- data.frame(cake.1, cake.2)
cake.df

cake.1,cake.2
100,100
80,100
0,90
0,30
0,10


column bind (cbind())

In [70]:
cake.mat1 <- cbind(cake.1, cake.2)
print(cake.mat1)

     cake.1 cake.2
[1,]    100    100
[2,]     80    100
[3,]      0     90
[4,]      0     30
[5,]      0     10


note that this is a matrix, not data.frame

rbind() binds row-wise, not column-wise

In [71]:
cake.mat2 <- rbind(cake.1, cake.2)
print(cake.mat2)

       [,1] [,2] [,3] [,4] [,5]
cake.1  100   80    0    0    0
cake.2  100  100   90   30   10


can add names using rownames() and colnames(). merge() can do database-like merging of vectors and data frames
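A small sketch of both ideas. The cake vectors are the ones above; the `time.` column names follow the cakes matrix shown later, and the flavour lookup table is entirely made up to give merge() something to join on.

```r
cake.1 <- c(100, 80, 0, 0, 0)
cake.2 <- c(100, 100, 90, 30, 10)
cake.mat2 <- rbind(cake.1, cake.2)

# Name the columns of the matrix
colnames(cake.mat2) <- paste0("time.", 1:5)
print(cake.mat2)

# Hypothetical tables sharing a key column called "cake"
remaining <- data.frame(cake = c("cake.1", "cake.2"), left.at.end = c(0, 10))
flavours  <- data.frame(cake = c("cake.1", "cake.2", "cake.3"),
                        flavour = c("lemon", "chocolate", "carrot"))

# Default merge keeps only keys present in both tables
merged <- merge(remaining, flavours, by = "cake")
print(merged)

# all = TRUE keeps every key and fills the gaps with NA
merged.all <- merge(remaining, flavours, by = "cake", all = TRUE)
print(merged.all)
```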

**Binding multiple copies of the same vector together**

In [72]:
fibonacci <- c(1, 1, 2, 3, 5, 8)
print(rbind(fibonacci, fibonacci, fibonacci))

          [,1] [,2] [,3] [,4] [,5] [,6]
fibonacci    1    1    2    3    5    8
fibonacci    1    1    2    3    5    8
fibonacci    1    1    2    3    5    8


lsr package: rowCopy and colCopy

In [73]:
print(rowCopy(fibonacci, times=3))

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    2    3    5    8
[2,]    1    1    2    3    5    8
[3,]    1    1    2    3    5    8


**Transposing a matrix or data frame**

In [74]:
load("cakes.Rdata")
print(cakes)

       time.1 time.2 time.3 time.4 time.5
cake.1    100     80      0      0      0
cake.2    100    100     90     30     10
cake.3    100     20     20     20     20
cake.4    100    100    100    100    100


In [75]:
class(cakes)

In [76]:
cakes.flipped <- t(cakes)
print(cakes.flipped)

       cake.1 cake.2 cake.3 cake.4
time.1    100    100    100    100
time.2     80    100     20    100
time.3      0     90     20    100
time.4      0     30     20    100
time.5      0     10     20    100


use tFrame() from lsr package to transpose dataframes

In [77]:
tFrame(itng)

Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10
speaker,upsy-daisy,upsy-daisy,upsy-daisy,upsy-daisy,tombliboo,tombliboo,makka-pakka,makka-pakka,makka-pakka,makka-pakka
utterance,pip,pip,onk,onk,ee,oo,pip,pip,onk,onk


## Reshaping a data frame

**Long form and wide form data**

In [78]:
load("repeated.Rdata")
who()

   -- Name --         -- Class --   -- Size --
   age                numeric       11        
   age.breaks         numeric       4         
   age.group          factor        11        
   age.group2         factor        11        
   age.group3         factor        11        
   age.labels         character     3         
   cake.1             numeric       5         
   cake.2             numeric       5         
   cake.df            data.frame    5 x 2     
   cake.mat1          matrix        5 x 2     
   cake.mat2          matrix        2 x 5     
   cakes              matrix        4 x 5     
   cakes.flipped      matrix        5 x 4     
   choice             data.frame    4 x 10    
   df                 data.frame    4 x 1     
   drugs              data.frame    10 x 8    
   fac                factor        3         
   fibonacci          numeric       6         
   garden             data.frame    5 x 3     
   is.MP.speaking     logical       5         
   itng      

In [79]:
drugs

id,gender,WMC_alcohol,WMC_caffeine,WMC_no.drug,RT_alcohol,RT_caffeine,RT_no.drug
1,female,3.7,3.7,3.9,488,236,371
2,female,6.4,7.3,7.9,607,376,349
3,female,4.6,7.4,7.3,643,226,412
4,male,6.4,7.8,8.2,684,206,252
5,female,4.9,5.2,7.0,593,262,439
6,male,5.4,6.6,7.2,492,230,464
7,male,7.9,7.9,8.9,690,259,327
8,male,4.1,5.9,4.5,486,230,305
9,female,5.2,6.2,7.2,686,273,327
10,female,6.2,7.4,7.8,645,240,498


wide form: each participant is a single row<br>
two vars which are characteristic of subject (id and gender)<br> 
6 variables -> 2 measured variables in 3 testing conditions<br> 
drug type is a **within-subject factor**

**Reshaping with wideToLong()**

if we want a separate row for each testing occasion

wideToLong() is in the lsr package. It relies on the format of the variable names (measure and condition joined by a separator, e.g. WMC_alcohol)

`id`, `gender` = **between-subject** variables

In [80]:
drugs.2 <- wideToLong(drugs, within="drug")
drugs.2

id,gender,drug,WMC,RT
1,female,alcohol,3.7,488
2,female,alcohol,6.4,607
3,female,alcohol,4.6,643
4,male,alcohol,6.4,684
5,female,alcohol,4.9,593
6,male,alcohol,5.4,492
7,male,alcohol,7.9,690
8,male,alcohol,4.1,486
9,female,alcohol,5.2,686
10,female,alcohol,6.2,645


**Reshaping data using longToWide()**

use a formula to indicate which variables are measured separately for each condition, and which is the within-subject factor specifying the condition

2 sided formula: measured vars ~ within-subject factor vars

In [81]:
longToWide(drugs.2, formula = WMC+RT~drug)

id,gender,WMC_alcohol,RT_alcohol,WMC_caffeine,RT_caffeine,WMC_no.drug,RT_no.drug
1,female,3.7,488,3.7,236,3.9,371
2,female,6.4,607,7.3,376,7.9,349
3,female,4.6,643,7.4,226,7.3,412
4,male,6.4,684,7.8,206,8.2,252
5,female,4.9,593,5.2,262,7.0,439
6,male,5.4,492,6.6,230,7.2,464
7,male,7.9,690,7.9,259,8.9,327
8,male,4.1,486,5.9,230,4.5,305
9,female,5.2,686,6.2,273,7.2,327
10,female,6.2,645,7.4,240,7.8,498


**Reshaping with multiple within-subject factors**

In [82]:
choice

id,gender,MRT/block1/day1,MRT/block1/day2,MRT/block2/day1,MRT/block2/day2,PC/block1/day1,PC/block1/day2,PC/block2/day1,PC/block2/day2
1,male,415,400,455,450,79,88,82,93
2,male,500,490,532,518,83,92,86,97
3,female,478,468,499,474,91,98,90,100
4,female,550,502,602,588,75,89,78,95


In [83]:
choice.2 <- wideToLong(choice, within=c("block", "day"), sep="/")
choice.2

id,gender,MRT,PC,block,day
1,male,415,79,block1,day1
2,male,500,83,block1,day1
3,female,478,91,block1,day1
4,female,550,75,block1,day1
1,male,400,88,block1,day2
2,male,490,92,block1,day2
3,female,468,98,block1,day2
4,female,502,89,block1,day2
1,male,455,82,block2,day1
2,male,532,86,block2,day1


In [84]:
longToWide(choice.2, MRT+PC~block+day, sep="/")

id,gender,MRT/block1/day1,PC/block1/day1,MRT/block1/day2,PC/block1/day2,MRT/block2/day1,PC/block2/day1,MRT/block2/day2,PC/block2/day2
1,male,415,79,400,88,455,82,450,93
2,male,500,83,490,92,532,86,518,97
3,female,478,91,468,98,499,90,474,100
4,female,550,75,502,89,602,78,588,95


**What other options are there?**

reshape(), stack(), unstack()<br> 

reshape package -> melt() and cast()<br> 
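As a rough idea of what the base reshape() route looks like, here is a sketch on a cut-down copy of the drugs data (first two participants, WMC only, values copied from the printout above). Everything is spelled out explicitly rather than guessed from the column names.

```r
# Small wide-form excerpt of the drugs data
drugs.small <- data.frame(
  id           = c(1, 2),
  gender       = c("female", "female"),
  WMC_alcohol  = c(3.7, 6.4),
  WMC_caffeine = c(3.7, 7.3),
  WMC_no.drug  = c(3.9, 7.9)
)

drugs.long <- reshape(
  drugs.small,
  direction = "long",
  varying   = list(c("WMC_alcohol", "WMC_caffeine", "WMC_no.drug")),
  v.names   = "WMC",                               # name of the measured variable
  timevar   = "drug",                              # name of the within-subject factor
  times     = c("alcohol", "caffeine", "no.drug"), # its levels, in column order
  idvar     = "id"
)
print(drugs.long)
```

It works, but the argument list is a lot more to remember than wideToLong(drugs, within="drug").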

## Working with Text

**Shortening a string**

In [86]:
animals <- c("cat", "dog", "kangaroo", "whale")
strtrim(animals, width=3)

In [87]:
substr(animals, start=2, stop=3)

**pasting strings together**

paste():<br> 
- \<strings to paste together\>
- sep = separator placed between the inputs (" " by default)
- collapse = optional string used to join the pasted results into one single string (default: NULL, i.e. no collapsing)
    

In [89]:
paste("hello", "world")

In [91]:
paste("hello", "world", sep=".")

In [92]:
hw <- c("hello", "world")
ng <- c("nasty", "government")

In [94]:
paste(hw, ng)

In [95]:
paste(hw, ng, sep=".")

In [96]:
paste(hw, ng, collapse=".")

In [97]:
paste(hw, ng, sep=".", collapse=":::")
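The difference between sep and collapse is easier to see with the results side by side: sep goes between the inputs within each pasted element, collapse then joins those elements into one string.

```r
hw <- c("hello", "world")
ng <- c("nasty", "government")

pairs     <- paste(hw, ng)                  # element-wise, joined by " "
dotted    <- paste(hw, ng, sep = ".")       # element-wise, joined by "."
collapsed <- paste(hw, ng, collapse = ".")  # pairs first, then glued with "."
both      <- paste(hw, ng, sep = ".", collapse = ":::")

print(pairs)      # "hello nasty" "world government"
print(dotted)     # "hello.nasty" "world.government"
print(collapsed)  # "hello nasty.world government"
print(both)       # "hello.nasty:::world.government"
```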

**splitting strings**

In [98]:
monkey <- "It was the best of times. It was the blurst of times."

use strsplit()<br>
- x=vector of character strings to be split
- split = fixed string or regular expression
- fixed = treat split as a literal string rather than a regular expression (FALSE by default; usually worth setting to TRUE)

In [100]:
monkey.1 <- strsplit(monkey, split=" ", fixed=TRUE)
print(monkey.1)

[[1]]
 [1] "It"     "was"    "the"    "best"   "of"     "times." "It"     "was"   
 [9] "the"    "blurst" "of"     "times."



strsplit() returns a list with one element per input string; with a single input, unlist() flattens it into a plain character vector

In [102]:
print(unlist(monkey.1))

 [1] "It"     "was"    "the"    "best"   "of"     "times." "It"     "was"   
 [9] "the"    "blurst" "of"     "times."
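Why fixed=TRUE matters: without it the split string is treated as a regular expression, and characters like "." have a special meaning. A small sketch with a made-up string:

```r
dotted <- "a.b.c"

# As a regex, "." matches any character, so every character is a delimiter
regex.split <- strsplit(dotted, split = ".")[[1]]
print(regex.split)

# Literal dot: either fixed = TRUE or an escaped regex
fixed.split   <- strsplit(dotted, split = ".", fixed = TRUE)[[1]]
escaped.split <- strsplit(dotted, split = "\\.")[[1]]
print(fixed.split)
print(escaped.split)
```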


**Making simple conversions**

toupper(), tolower() - convert to upper case or lower case

chartr() - character by character substitution

In [107]:
old.text <- "netflix"
chartr(old=c("e"), new=c("o"), x=old.text)

**Applying logical operators to text**

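- the result depends on the locale's collation order: in the C locale uppercase letters come before lowercase, so the comparison below is FALSE; in most other locales (e.g. en_US) it is TRUE

In [109]:
"anteater" < "ZEBRA"

if this returns TRUE, that is most likely the locale setting rather than a change in R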

**Concatenating and printing with cat()**

In [110]:
cat(hw, ng)

hello world nasty government

In [111]:
paste(hw, ng, collapse=" ")

cat() is for printing. It returns NULL rather than the string it printed

In [112]:
x<-cat(hw, ng)
x

hello world nasty government

NULL

print will print literally

In [113]:
print("hello\nworld")

[1] "hello\nworld"


cat will interpret special characters

In [114]:
cat("hello\nworld")

hello
world

**Using escape characters in text**

![image.png](attachment:image.png)

In [116]:
PJ <- "P.J. O\'Rourke says, \"Yay, money!\". It\'s a joke, but no-one laughs."
print(PJ)

[1] "P.J. O'Rourke says, \"Yay, money!\". It's a joke, but no-one laughs."


In [117]:
print.noquote(PJ)

[1] P.J. O'Rourke says, "Yay, money!". It's a joke, but no-one laughs.


In [118]:
cat(PJ)

P.J. O'Rourke says, "Yay, money!". It's a joke, but no-one laughs.

**Matching and substituting text**

grep(), gsub(), and sub()

In [120]:
beers <- c("little creatures", "sierra nevada", "coopers pale")
grep(pattern="er", x=beers, fixed=TRUE)

gsub() - replace all instances<br> 
sub() - replaces first instance<br> 

In [121]:
gsub(pattern="a", replacement="BLAH", x=beers, fixed=TRUE)

In [122]:
sub(pattern="a", replacement="BLAH", x=beers, fixed=TRUE)
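The three functions side by side, with the results stored so they can be inspected. grepl() and value=TRUE are extra base R options not covered above, included here as a convenience.

```r
beers <- c("little creatures", "sierra nevada", "coopers pale")

hits      <- grep(pattern = "er", x = beers, fixed = TRUE)               # positions
hit.names <- grep(pattern = "er", x = beers, fixed = TRUE, value = TRUE) # the strings
hit.flags <- grepl(pattern = "er", x = beers, fixed = TRUE)              # logical vector

all.replaced   <- gsub(pattern = "a", replacement = "BLAH", x = beers, fixed = TRUE)
first.replaced <- sub(pattern = "a", replacement = "BLAH", x = beers, fixed = TRUE)

print(hits)            # 2 3
print(hit.names)
print(hit.flags)
print(all.replaced)    # every "a" replaced
print(first.replaced)  # only the first "a" in each string replaced
```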

**Regular expressions**

## Reading unusual data files

**Loading data from text files**

**read.csv()**<br> 
- header: if the first row does not contain column names, set this to FALSE
- sep: delimiter (usually ",")
- quote: specify which character is used for quotes
- skip: number of lines to ignore
- na.strings: special string to indicate that an entry is missing

In [123]:
data <- read.csv(file="booksales2.csv",
                header=FALSE,
                skip=8,
                quote="*",
                sep="\t",
                na.strings="NFI")

In [124]:
head(data)

V1,V2,V3,V4
January,31,0.0,high
February,28,100.0,high
March,31,200.0,low
April,30,50.0,out
May,31,,out
June,30,0.0,high
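To try these arguments without booksales2.csv, the sketch below writes a tiny made-up file in a similar spirit (junk lines at the top, tab-separated, `*` as the quote character, "NFI" for missing values) and reads it back. The file contents are invented for illustration.

```r
# Write a small throwaway file with the same quirks
tmp <- tempfile(fileext = ".csv")
writeLines(c("junk line 1",
             "junk line 2",
             "January\t31\t0\t*high*",
             "May\t31\tNFI\t*out*"),
           tmp)

sales <- read.csv(file = tmp,
                  header = FALSE,      # no column names in the file
                  skip = 2,            # ignore the two junk lines
                  quote = "*",         # text is wrapped in * rather than "
                  sep = "\t",          # tab-separated
                  na.strings = "NFI")  # "NFI" marks a missing value
print(sales)

unlink(tmp)  # clean up the temporary file
```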


other functions for opening other types of data files (using foreign library)

- read.spss()


library(gdata)<br> 
- read.xls()

library(R.matlab) (MATLAB & Octave)<br> 
- readMat()

etc.

## Coercing data from one class to another

In [125]:
x <- "100"
class(x)

In [126]:
x <- as.numeric(x)
class(x)

In [127]:
x+1

In [128]:
x <- as.character(x)
class(x)

In [129]:
as.numeric("hello world")

“NAs introduced by coercion”

**for booleans**
- can be coerced to TRUE: "T", "TRUE", "True", "true", 1
- can be coerced to FALSE: "F", "FALSE", "False", "false", 0
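A quick check of these rules. Note the numbers are numeric 1 and 0; the strings "1" and "yes" are not recognised and give NA.

```r
true.strings  <- as.logical(c("T", "TRUE", "True", "true"))
false.strings <- as.logical(c("F", "FALSE", "False", "false"))
from.numbers  <- as.logical(c(1, 0, 2))       # any non-zero number is TRUE
unrecognised  <- as.logical(c("1", "yes"))    # strings outside the list give NA

print(true.strings)
print(false.strings)
print(from.numbers)
print(unrecognised)

# Failed numeric coercion gives NA plus a warning (suppressed here)
not.a.number <- suppressWarnings(as.numeric("hello world"))
print(not.a.number)
```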

## Other useful data structures


**Matrices (and arrays)**

In [143]:
row.1 <- c(2, 3, 1)
row.2 <- c(5, 6, 7)

M <- rbind(row.1, row.2)
print(M)

      [,1] [,2] [,3]
row.1    2    3    1
row.2    5    6    7


In [144]:
colnames(M) <- c("col.1", "col.2", "col.3")
print(M)

      col.1 col.2 col.3
row.1     2     3     1
row.2     5     6     7


In [145]:
M[2, 3]

In [146]:
M[2,]

In [147]:
M[,3]

Note on matrix multiplication: R has no concept of a row vector or column vector, so when doing matrix\*vector, R treats the vector as being in whichever orientation makes the calculation work. 

matrices must be of a homogeneous datatype

In [148]:
class(M[1])

In [149]:
M[1, 2] <- "text"

In [150]:
M

Unnamed: 0,col.1,col.2,col.3
row.1,2,text,1
row.2,5,6,7


In [151]:
class(M[1])
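What happens in the cells above, with the classes captured along the way: assigning one string silently converts every element of the matrix to character.

```r
M <- rbind(row.1 = c(2, 3, 1), row.2 = c(5, 6, 7))

class.before <- class(M[1])   # "numeric"
M[1, 2] <- "text"             # one string coerces the whole matrix
class.after <- class(M[1])    # "character"

print(class.before)
print(class.after)
print(M[2, 3])                # now the string "7", not the number 7

# Going back needs an explicit conversion; "text" cannot be converted and becomes NA
back <- suppressWarnings(as.numeric(M))
print(back)
```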

**arrays / 3d data structures**

In [154]:
dan.awake <- c(T, T, T, T, T, F, F, F, F, F)
xtab.3d <- table(speaker, utterance, dan.awake)
print(xtab.3d)

, , dan.awake = FALSE

             utterance
speaker       ee onk oo pip
  makka-pakka  0   2  0   2
  tombliboo    0   0  1   0
  upsy-daisy   0   0  0   0

, , dan.awake = TRUE

             utterance
speaker       ee onk oo pip
  makka-pakka  0   0  0   0
  tombliboo    1   0  0   0
  upsy-daisy   0   2  0   2



**Ordered factors**

2 different types of factors in R: ordered and unordered<br> 

unordered factor = nominal scaled variable

In [155]:
likert.raw

In [156]:
likert.ordinal <- factor(x=likert.raw,
                        levels=seq(7, 1, -1),
                        ordered=TRUE)
print(likert.ordinal)

 [1] 1 7 3 4 4 4 2 6 5 5
Levels: 7 < 6 < 5 < 4 < 3 < 2 < 1


*always ensure that your ordered factors are ordered properly*

In [157]:
levels(likert.ordinal) <- c("strong.disagree", "disagree", "weak.disagree",
                           "neutral", "weak.agree", "agree", "strong.agree")
print(likert.ordinal)

 [1] strong.agree    strong.disagree weak.agree      neutral        
 [5] neutral         neutral         agree           disagree       
 [9] weak.disagree   weak.disagree  
7 Levels: strong.disagree < disagree < weak.disagree < ... < strong.agree
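The reason the level order matters is that comparisons and sorting on an ordered factor follow the declared levels, not the alphabet. A minimal sketch with a made-up three-level factor:

```r
# Hypothetical example; levels are declared from lowest to highest
rating <- factor(c("low", "high", "mid", "low"),
                 levels = c("low", "mid", "high"),
                 ordered = TRUE)

below.high <- rating < "high"   # comparison uses the level order
top        <- max(rating)
codes      <- as.integer(rating)  # underlying positions in the level order

print(below.high)
print(top)
print(sort(rating))
print(codes)

# The same comparison on an unordered factor is not meaningful: NA plus a warning
plain <- factor(c("low", "high"))
plain.cmp <- suppressWarnings(plain < "high")
print(plain.cmp)
```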


**Dates and times**

In [158]:
print(Sys.Date())

[1] "2021-07-18"


In [159]:
today <- Sys.Date()
print(today+365)

[1] "2022-07-18"


weekdays() - tells you what day of the week a particular day is on

In [160]:
weekdays(today)

## Miscellaneous topics

**Problems with floating point arithmetic**

In [161]:
0.1+0.2==0.3

There are super small rounding errors that occur when the computer stores these results in memory: 

In [162]:
0.1+0.2-0.3

decimals like 0.1 have no exact finite representation in binary (the digits repeat forever), so the stored value is only a very close approximation
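The usual workaround is to compare with a tolerance instead of `==`. A short sketch using base R's all.equal():

```r
exact.test <- (0.1 + 0.2) == 0.3                  # FALSE because of rounding error
near.test  <- isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: equal within a small tolerance
manual     <- abs((0.1 + 0.2) - 0.3) < 1e-9       # same idea, done by hand

print(exact.test)
print(near.test)
print(manual)

# Printing more digits shows the stored value is not exactly 0.1
print(sprintf("%.20f", 0.1))
```

all.equal() returns a description of the difference rather than FALSE when the values differ, which is why it is wrapped in isTRUE().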

**The recycling rule**

In [163]:
x <- c(1, 1, 1, 1, 1, 1)
y <- c(0, 1)
x+y

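R recycled the shorter vector: y was repeated three times to match the length of x (similar in spirit to broadcasting in NumPy, though recycling simply repeats the elements in order)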
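A sketch of the rule, including the case where the lengths do not divide evenly (the length-4 vector is made up for illustration):

```r
x <- c(1, 1, 1, 1, 1, 1)
y <- c(0, 1)

recycled <- x + y     # y is used as 0 1 0 1 0 1
scalar   <- x + 10    # a single number is recycled to every element

print(recycled)       # 1 2 1 2 1 2
print(scalar)

# Length 6 is not a multiple of 4: R still recycles but raises a warning.
# Here the warning is caught and its message stored instead.
uneven <- tryCatch(x + c(0, 1, 2, 3),
                   warning = function(w) conditionMessage(w))
print(uneven)
```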

**Environments**

In [164]:
search()

**Attaching a data frame**

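attach() adds a data frame to the search path, so its columns can be referred to by name without the `$`; the columns are not copied into the workspace, and later changes to the data frame are not reflected in the attached copy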
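A small sketch showing where an attached data frame ends up, plus with() as a tidier alternative for one-off use. The data frame is a rebuilt copy of itng under a different name (itng.demo, a name chosen here to avoid clashing with variables already in the workspace).

```r
# Rebuilt copy of the itng data frame
itng.demo <- data.frame(
  speaker   = c(rep("upsy-daisy", 4), rep("tombliboo", 2), rep("makka-pakka", 4)),
  utterance = c("pip", "pip", "onk", "onk", "ee", "oo", "pip", "pip", "onk", "onk")
)

attach(itng.demo)
was.attached <- "itng.demo" %in% search()  # the data frame sits on the search path
n.attached   <- length(speaker)            # columns are visible by name
detach(itng.demo)                          # always detach when finished

print(was.attached)
print(n.attached)

# with() evaluates one expression using the columns, no attach/detach needed
xtab <- with(itng.demo, table(speaker, utterance))
print(xtab)
```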