# Week 1 - The Business Environment

## Gapminder

Installing the `gapminder` package and loading it into the environment:

In [1]:
if (!('gapminder' %in% rownames(installed.packages()))) {
    install.packages('gapminder')
}
suppressWarnings(library('gapminder'))
data(gapminder)
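
A slightly more defensive version of the install-and-load pattern can be sketched with `requireNamespace()`, which checks for a single package without building the full table of installed packages. The `pkg` variable is a name introduced here for illustration, and `"stats"` is used only because it ships with R, so the install branch is not expected to run; in practice it would be `"gapminder"`.

```r
# "stats" ships with base R, so the install branch should not run here;
# replace it with "gapminder" (or another CRAN package) in practice.
pkg <- "stats"

# requireNamespace() returns FALSE instead of raising an error
# when the package is not installed
if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
}

# character.only = TRUE lets library() read the package name from a variable
library(pkg, character.only = TRUE)
```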

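Subsetting rows with a logical vector and with `which`. A comparison such as `gapminder$year==2007` returns a logical vector; `which` converts it into a vector of the indices that satisfy the criteria.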

In [2]:
aus <- gapminder$country=='Australia'
year <- gapminder$year==2007
index1 <- aus & year
gapminder[index1,]

country,continent,year,lifeExp,pop,gdpPercap
Australia,Oceania,2007,81.235,20434176,34435.37


In [3]:
index2 <- which(gapminder$country=='Australia')
print(index2)
gapminder[index2,]

 [1] 61 62 63 64 65 66 67 68 69 70 71 72


country,continent,year,lifeExp,pop,gdpPercap
Australia,Oceania,1952,69.12,8691212,10039.6
Australia,Oceania,1957,70.33,9712569,10949.65
Australia,Oceania,1962,70.93,10794968,12217.23
Australia,Oceania,1967,71.1,11872264,14526.12
Australia,Oceania,1972,71.93,13177000,16788.63
Australia,Oceania,1977,73.49,14074100,18334.2
Australia,Oceania,1982,74.74,15184200,19477.01
Australia,Oceania,1987,76.32,16257249,21888.89
Australia,Oceania,1992,77.56,17481977,23424.77
Australia,Oceania,1997,78.83,18565243,26997.94


In [4]:
index3 <- which(gapminder$country=='Australia' & gapminder$year==2007)
print(index3)
gapminder[index3,]

[1] 72


country,continent,year,lifeExp,pop,gdpPercap
Australia,Oceania,2007,81.235,20434176,34435.37
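
Logical indexing and `which` give the same rows on `gapminder`, but they can differ when the data contain missing values. The sketch below uses a small made-up vector, `scores` (not part of the course data), to illustrate the difference.

```r
# Hypothetical toy vector with one missing value
scores <- c(10, NA, 30, 40)

keep <- scores > 20         # logical vector: FALSE NA TRUE TRUE
idx  <- which(scores > 20)  # indices where the test is TRUE: 3 4

scores[keep]  # NA 30 40 (the NA in the logical vector produces an NA element)
scores[idx]   # 30 40    (which() drops positions where the test is NA)
```

When a column may contain `NA`, wrapping the condition in `which` is one way to avoid spurious `NA` rows in the result.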


Subsetting using `match`. This returns the index of the first occurrence that satisfies the condition.

In [5]:
index4 <- match('Australia', gapminder$country)
print(index4)
gapminder[index4,]

[1] 61


country,continent,year,lifeExp,pop,gdpPercap
Australia,Oceania,1952,69.12,8691212,10039.6
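
As a small illustration of how `match` behaves, the sketch below looks up several values at once in a made-up vector, `countries` (an assumption for this example, not the `gapminder` column). Values that are not found come back as `NA`.

```r
# Hypothetical lookup vector with a repeated value
countries <- c("Chile", "Australia", "Peru", "Australia")

# Only the first occurrence is reported, even though "Australia" appears twice
first_pos <- match("Australia", countries)            # 2

# match() is vectorised over its first argument; unmatched values give NA
positions <- match(c("Peru", "Atlantis"), countries)  # 3 NA
```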


Testing membership with `%in%`. This returns a logical vector indicating whether each value on the left is found in the object on the right.

In [6]:
print(c('Australia', 'US') %in% gapminder$country)
print(c('Australia', 'United States') %in% gapminder$country)

[1]  TRUE FALSE
[1] TRUE TRUE
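
The result of `%in%` always has the length of its left-hand side, so the order of the operands matters. The sketch below shows both directions with made-up vectors (`wanted` and `countries` are names assumed for this example).

```r
# Hypothetical vectors for illustration
wanted    <- c("Australia", "US")
countries <- c("Australia", "Chile", "United States", "Australia")

in_data <- wanted %in% countries  # TRUE FALSE (one answer per element of `wanted`)
row_sel <- countries %in% wanted  # TRUE FALSE FALSE TRUE (one answer per element of `countries`)
```

The second form is the one that can be used to select rows, because it has one entry per row of the data.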


Using both `%in%` and `which` can be very powerful. Take the following example:
* `gapminder$country %in% c('Australia', 'United States') & gapminder$year==2007` returns a logical vector.
* Wrapping this expression in the `which` function returns a vector of the indices where the value is `TRUE`.

In [7]:
index5 <- which(gapminder$country %in% c('Australia', 'United States') & gapminder$year==2007)
print(index5)
gapminder[index5,]

[1]   72 1620


country,continent,year,lifeExp,pop,gdpPercap
Australia,Oceania,2007,81.235,20434176,34435.37
United States,Americas,2007,78.242,301139947,42951.65
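
For comparison, base R's `subset()` can express the same kind of filter without repeating the data frame name. The sketch below uses a small made-up data frame, `df`, and a `targets` vector (both assumptions for this example) and is only one possible alternative to the `which` approach used above.

```r
# Hypothetical toy data frame standing in for gapminder
df <- data.frame(
    country = c("Australia", "Australia", "United States", "Chile"),
    year    = c(2002, 2007, 2007, 2007),
    stringsAsFactors = FALSE
)
targets <- c("Australia", "United States")

# Index-based selection with which() and %in%
idx <- which(df$country %in% targets & df$year == 2007)
by_which <- df[idx, ]

# subset() evaluates the condition inside the data frame, so `df$` can be dropped
by_subset <- subset(df, country %in% targets & year == 2007)
```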


## Practice Questions

Question 3:

In [8]:
print(log2(5))

[1] 2.321928


Question 4:

In [9]:
print(seq(7, 50, 7))

[1]  7 14 21 28 35 42 49


Question 5:

In [10]:
if (!('babynames' %in% rownames(installed.packages()))) {
    install.packages('babynames')
}
suppressWarnings(library('babynames'))
data(babynames)

In [11]:
head(babynames)
help(babynames)
class(babynames$prop)

year,sex,name,n,prop
1880,F,Mary,7065,0.07238359
1880,F,Anna,2604,0.02667896
1880,F,Emma,2003,0.02052149
1880,F,Elizabeth,1939,0.01986579
1880,F,Minnie,1746,0.01788843
1880,F,Margaret,1578,0.0161672


* There are 5 variables in the `babynames` dataset.
* The `prop` variable is the proportion of people of that `sex` with that `name` born in that `year`.
* The `prop` variable is of class `numeric`.

Sorting the names vector once and storing it as a variable, `sortedNames`, so it doesn't need to be re-sorted multiple times.

In [12]:
sortedNames <- sort(babynames$name)

In [13]:
paste('The first name when sorted alphabetically is', sortedNames[1])

The rows for the first `name`, Aaban, and their `prop` values are:

In [31]:
filter1 <- which(babynames$name==sortedNames[1])
babynames[filter1,]

year,sex,name,n,prop
2007,M,Aaban,5,2.26e-06
2009,M,Aaban,6,2.83e-06
2010,M,Aaban,9,4.39e-06
2011,M,Aaban,11,5.42e-06
2012,M,Aaban,11,5.43e-06
2013,M,Aaban,14,6.94e-06
2014,M,Aaban,16,7.83e-06
2015,M,Aaban,15,7.36e-06
2016,M,Aaban,9,4.46e-06
2017,M,Aaban,11,5.6e-06


In [33]:
sortedProp <- sort(babynames$prop, decreasing=TRUE)
paste('The prop with the highest value is', sortedProp[1])

In [38]:
filter2 <- match(sortedProp[1], babynames$prop)
paste('The name corresponding to the highest prop value is', babynames$name[filter2])
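
Sorting the whole column just to read off the largest value works, but `which.max()` returns the index of the first maximum directly. The sketch below compares the two routes on a made-up data frame, `babies` (an assumption for this example, not the `babynames` data).

```r
# Hypothetical toy data with a tie for the largest prop
babies <- data.frame(
    name = c("Ann", "Bea", "Cal", "Dee"),
    prop = c(0.02, 0.07, 0.07, 0.01),
    stringsAsFactors = FALSE
)

# Direct route: index of the first maximum
top <- which.max(babies$prop)   # 2
top_name <- babies$name[top]    # "Bea"

# Sort-then-match route, as in the cells above; it also lands on the first maximum
via_match <- match(sort(babies$prop, decreasing = TRUE)[1], babies$prop)  # 2
```

Both routes report only the first row when several rows share the maximum.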

Question 6:

In [42]:
data(lifetables)
head(lifetables)

x,qx,lx,dx,Lx,Tx,ex,sex,year
0,0.14596,100000,14596,90026,5151511,51.52,M,1900
1,0.03282,85404,2803,84003,5061484,59.26,M,1900
2,0.01634,82601,1350,81926,4977482,60.26,M,1900
3,0.01052,81251,855,80824,4895556,60.25,M,1900
4,0.00875,80397,703,80045,4814732,59.89,M,1900
5,0.00628,79693,501,79443,4734687,59.41,M,1900


The `sex` variable is a factor with two levels: `M` and `F`.

In [45]:
levels(lifetables$sex)
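
One way to find which columns of a data frame are factors, rather than reading it off the printed output, is to apply `is.factor()` to every column. The sketch below does this on a made-up data frame, `people` (an assumption for this example).

```r
# Hypothetical toy data frame with one factor column
people <- data.frame(
    age = c(30, 41, 25),
    sex = factor(c("M", "F", "M"))
)

# Test each column; the result is a named logical vector
is_factor <- sapply(people, is.factor)     # age FALSE, sex TRUE
factor_cols <- names(people)[is_factor]    # "sex"

levels(people$sex)   # "F" "M" (sorted alphabetically by default)
nlevels(people$sex)  # 2
```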

Finding the corresponding x (age) when lx=76 using:
1. Logical indexing with `[]`
1. `which()`
1. `match()`
1. `which()` and `%in%`

In [53]:
lifetables[lifetables$lx==76,]
lifetables[which(lifetables$lx==76),]
lifetables[match(76, lifetables$lx),]
lifetables[which(lifetables$lx %in% 76),]

x,qx,lx,dx,Lx,Tx,ex,sex,year
114,0.42433,76,32,60,130,1.72,F,2010


x,qx,lx,dx,Lx,Tx,ex,sex,year
114,0.42433,76,32,60,130,1.72,F,2010


x,qx,lx,dx,Lx,Tx,ex,sex,year
114,0.42433,76,32,60,130,1.72,F,2010


x,qx,lx,dx,Lx,Tx,ex,sex,year
114,0.42433,76,32,60,130,1.72,F,2010
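
The four approaches agree above because only one row has lx=76. When a value occurs more than once, `match()` still returns only the first hit, while the other three return every matching row. The sketch below shows this on a made-up data frame, `lt` (an assumption for this example, not the `lifetables` data).

```r
# Hypothetical toy life table in which lx = 76 appears twice
lt <- data.frame(x = c(10, 11, 12, 13), lx = c(90, 76, 80, 76))

by_logical <- lt[lt$lx == 76, ]           # rows 2 and 4
by_which   <- lt[which(lt$lx == 76), ]    # rows 2 and 4
by_match   <- lt[match(76, lt$lx), ]      # row 2 only
by_in      <- lt[which(lt$lx %in% 76), ]  # rows 2 and 4
```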


Finding the corresponding x (age) when dx=88 and year=1930:

In [56]:
filter3 <- (lifetables$dx==88) & (lifetables$year==1930)
lifetables[filter3,]

x,qx,lx,dx,Lx,Tx,ex,sex,year
12,0.00098,90630,88,90585,5508047,60.78,M,1930


Finding the corresponding x (age) for the minimum qx value:

In [58]:
lifetables[lifetables$qx==min(lifetables$qx),]

x,qx,lx,dx,Lx,Tx,ex,sex,year
10,7e-05,99251,7,99247,7103080,71.57,M,2010


Subsetting the 12th-15th observations of x:

In [59]:
lifetables[12:15,]

x,qx,lx,dx,Lx,Tx,ex,sex,year
11,0.00217,78044,169,77960,4262653,54.62,M,1900
12,0.00212,77875,165,77793,4184693,53.74,M,1900
13,0.00239,77710,186,77617,4106900,52.85,M,1900
14,0.00254,77525,197,77426,4029283,51.97,M,1900


Showing that `sort()` and indexing with `order()` give the same result:

In [78]:
sort(lifetables$qx)[1:10]

In [79]:
lifetables$qx[order(lifetables$qx)][1:10]
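
The equivalence can also be checked programmatically instead of by eye. The sketch below uses a made-up data frame, `lt` (an assumption for this example), and also shows what `order()` offers beyond `sort()`: the same index vector can reorder whole rows.

```r
# Hypothetical toy life table
lt <- data.frame(x = c(0, 1, 2, 3), qx = c(0.15, 0.03, 0.02, 0.01))

# order() returns the positions that would sort the vector: 4 3 2 1
ord <- order(lt$qx)

# sort() returns the sorted values themselves; indexing by `ord` gives the same vector
same <- identical(sort(lt$qx), lt$qx[ord])  # TRUE

# Reorder every column of the data frame by qx
lt_sorted <- lt[ord, ]
```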