# Getting started with R

**By [Christine Zhang](https://twitter.com/christinezhang) (Knight-Mozilla / Los Angeles Times) & [Ryan Menezes](https://twitter.com/ryanvmenezes) (Los Angeles Times)**

*IRE Conference -- New Orleans, LA*
 
June 18, 2016  

This workshop is a basic introduction to R, a free and open-source software environment for data analysis and statistics.

R is a powerful tool that can help you quickly and effectively answer questions using data.

Take our host city, New Orleans, for example. Hurricane Katrina was a devastating natural disaster that substantially affected the population of New Orleans. The hurricane took place in August 2005, which coincidentally falls between the US Census full population counts in 2000 and 2010.

In this session, we will use the "Demographic Profile" -- a large summary file with many different demographic variables downloaded from the [U.S. Census Bureau website](http://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml) -- from 2000 and 2010, for all census tracts in the state of Louisiana.

**In this session, we will:**

* Load the 2000 data in R
* Select a few variables pertaining to population counts and housing occupancy
* Clean the data
* Focus on Orleans Parish in particular, which represents the city of New Orleans, and tally the population
* Perform the same steps on the 2010 data
* Merge the two sets
* Write the merged data out to a CSV file for our next class, [More with R](More%20with%20R.ipynb), where we'll do some more in-depth analysis

Basic analysis techniques like the ones you will learn in this class can help you write data-driven stories, like [this one written by The Times-Picayune](http://www.nola.com/politics/index.ssf/2011/02/new_orleans_officials_2010_pop.html) shortly after the census released its 2010 tally.

The story begins:

> Five years after Hurricane Katrina emptied New Orleans and prompted the largest mass migration in modern American history, **the 2010 Census counted 343,829 people living in the still-recovering city, a 29 percent drop since the last head count a decade ago**, according to data released today.

Using the data we have, we will attempt to replicate the calculations in that lede.

The following code and annotations were written in a Jupyter notebook. The code is best run in RStudio using R version xxx.

We'll start by loading in the 2000 data, which is stored in a CSV (comma-separated values) file. CSVs are plain-text files of data where commas separate the columns within a line. It is sometimes preferable to work with CSVs as opposed to files of a proprietary format, such as Microsoft Excel files, but the Census Bureau readily makes data available in both formats.

Let's run R's `read.csv` command and save the data to a variable called `census2000`:

In [1]:
census2000 <- read.csv('2000_census_demographic_profile.csv')

Now that this ran without incident, let's inspect the first few rows using `head`, which by default prints out the first six rows of a data frame (R's internal term for a spreadsheet):

In [2]:
head(census2000)

Unnamed: 0,GEO.id,GEO.id2,GEO.display.label,HC01_VC01,HC02_VC01,HC01_VC03,HC02_VC03,HC01_VC04,HC02_VC04,HC01_VC05,⋯,HC01_VC100,HC02_VC100,HC01_VC101,HC02_VC101,HC01_VC102,HC02_VC102,HC01_VC103,HC02_VC103,HC01_VC104,HC02_VC104
1,Id,Id2,Geography,Number; Total population,Percent; Total population,Number; Total population - SEX AND AGE - Male,Percent; Total population - SEX AND AGE - Male,Number; Total population - SEX AND AGE - Female,Percent; Total population - SEX AND AGE - Female,Number; Total population - SEX AND AGE - Under 5 years,⋯,Number; HOUSING TENURE - Occupied housing units,Percent; HOUSING TENURE - Occupied housing units,Number; HOUSING TENURE - Occupied housing units - Owner-occupied housing units,Percent; HOUSING TENURE - Occupied housing units - Owner-occupied housing units,Number; HOUSING TENURE - Occupied housing units - Renter-occupied housing units,Percent; HOUSING TENURE - Occupied housing units - Renter-occupied housing units,Number; HOUSING TENURE - Occupied housing units - Average household size of owner-occupied unit,Percent; HOUSING TENURE - Occupied housing units - Average household size of owner-occupied unit,Number; HOUSING TENURE - Occupied housing units - Average household size of renter-occupied unit,Percent; HOUSING TENURE - Occupied housing units - Average household size of renter-occupied unit
2,1400000US22001960100,22001960100,"Census Tract 9601, Acadia Parish, Louisiana",6188,100,2920,47,3268,53,462,⋯,2236,100,1526,68,710,32,3,(X),3,(X)
3,1400000US22001960200,22001960200,"Census Tract 9602, Acadia Parish, Louisiana",5056,100,2562,51,2494,49,346,⋯,1764,100,1461,83,303,17,3,(X),3,(X)
4,1400000US22001960300,22001960300,"Census Tract 9603, Acadia Parish, Louisiana",3149,100,1593,51,1556,49,209,⋯,1145,100,1041,91,104,9,3,(X),3,(X)
5,1400000US22001960400,22001960400,"Census Tract 9604, Acadia Parish, Louisiana",5617,100,2754,49,2863,51,429,⋯,1991,100,1630,82,361,18,3,(X),3,(X)
6,1400000US22001960500,22001960500,"Census Tract 9605, Acadia Parish, Louisiana",4927,100,2461,50,2466,50,400,⋯,1692,100,1419,84,273,16,3,(X),3,(X)


Upon inspection, we can see that the file came with two header rows. We do not need the first one, so we can rerun the read command and tell it to skip that line:

In [3]:
census2000 <- read.csv('2000_census_demographic_profile.csv', skip = 1)
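
To see what `skip = 1` does in isolation, here is a minimal sketch that feeds `read.csv` a tiny made-up CSV through its `text` argument instead of a file. The column labels and values are invented for illustration and are not taken from the census download.

```r
# A made-up CSV with two header rows, mimicking the layout of the census file
csv.text <- 'GEO.id2,HC01_VC01
Id2,Total population
101,"6,188"
102,"5,056"'

# Default: the first line becomes the header and the second header row leaks into the data
with.both <- read.csv(text = csv.text)
nrow(with.both)

# skip = 1 drops the first line, so the descriptive labels become the header
skipped <- read.csv(text = csv.text, skip = 1)
colnames(skipped)
nrow(skipped)
```

Without `skip`, the data frame has three rows, one of which is really a header. With it, only the two data rows remain.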

In [4]:
head(census2000)

Unnamed: 0,Id,Id2,Geography,Number..Total.population,Percent..Total.population,Number..Total.population...SEX.AND.AGE...Male,Percent..Total.population...SEX.AND.AGE...Male,Number..Total.population...SEX.AND.AGE...Female,Percent..Total.population...SEX.AND.AGE...Female,Number..Total.population...SEX.AND.AGE...Under.5.years,⋯,Number..HOUSING.TENURE...Occupied.housing.units,Percent..HOUSING.TENURE...Occupied.housing.units,Number..HOUSING.TENURE...Occupied.housing.units...Owner.occupied.housing.units,Percent..HOUSING.TENURE...Occupied.housing.units...Owner.occupied.housing.units,Number..HOUSING.TENURE...Occupied.housing.units...Renter.occupied.housing.units,Percent..HOUSING.TENURE...Occupied.housing.units...Renter.occupied.housing.units,Number..HOUSING.TENURE...Occupied.housing.units...Average.household.size.of.owner.occupied.unit,Percent..HOUSING.TENURE...Occupied.housing.units...Average.household.size.of.owner.occupied.unit,Number..HOUSING.TENURE...Occupied.housing.units...Average.household.size.of.renter.occupied.unit,Percent..HOUSING.TENURE...Occupied.housing.units...Average.household.size.of.renter.occupied.unit
1,1400000US22001960100,22001960100,"Census Tract 9601, Acadia Parish, Louisiana",6188,100,2920,47,3268,53,462,⋯,2236,100,1526,68,710,32,3,(X),3,(X)
2,1400000US22001960200,22001960200,"Census Tract 9602, Acadia Parish, Louisiana",5056,100,2562,51,2494,49,346,⋯,1764,100,1461,83,303,17,3,(X),3,(X)
3,1400000US22001960300,22001960300,"Census Tract 9603, Acadia Parish, Louisiana",3149,100,1593,51,1556,49,209,⋯,1145,100,1041,91,104,9,3,(X),3,(X)
4,1400000US22001960400,22001960400,"Census Tract 9604, Acadia Parish, Louisiana",5617,100,2754,49,2863,51,429,⋯,1991,100,1630,82,361,18,3,(X),3,(X)
5,1400000US22001960500,22001960500,"Census Tract 9605, Acadia Parish, Louisiana",4927,100,2461,50,2466,50,400,⋯,1692,100,1419,84,273,16,3,(X),3,(X)
6,1400000US22001960600,22001960600,"Census Tract 9606, Acadia Parish, Louisiana",5654,100,2647,47,3007,53,464,⋯,2073,100,1474,71,599,29,3,(X),3,(X)
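
The column names look different from the labels in the file (`Number; Total population` became `Number..Total.population`) because `read.csv` by default passes header labels through `make.names`, which replaces characters that are not allowed in R names with periods. A small sketch, using two labels copied from the header row above:

```r
# Header labels as they appear in the second row of the census file
raw.labels <- c(
    'Number; Total population',
    'Number; Total population - SEX AND AGE - Male'
)

# make.names() swaps every character that is not valid in an R name for a period
clean.labels <- make.names(raw.labels)
print(clean.labels)
```

Each semicolon, space and hyphen turns into one period, which is why the names end up with runs of two or three periods.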


Visually, we can see that this data set is very wide. In fact, there are 195 columns.

Let's keep a handful of these:

* `Id2`: This is what the Census Bureau calls a FIPS code. It is a unique numerical identifier for each census tract. This will be important when we join our two datasets together.  
 
 
* `Geography`: This is a text description of the tract, with the parish name.  
 
 
* `Number..Total.population`: The total population of the tract.  
 
 
* `Number..HOUSING.OCCUPANCY...Total.housing.units`, `Number..HOUSING.OCCUPANCY...Total.housing.units...Occupied.housing.units`, and `Number..HOUSING.OCCUPANCY...Total.housing.units...Vacant.housing.units`: The total, occupied and vacant housing units.

To help us trim the data set to just these six columns, we are going to import a package. There are thousands of packages for R created by the open-source community, which help improve on what is included in R by default.

The one we will use here is called dplyr.

In [5]:
## if dplyr was not installed we would have to run this
# install.packages('dplyr')

## to import the package and all of its functions
library('dplyr')


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union



From dplyr, we will use the `select` function to trim the data set and save it to a new variable called `census2000.trimmed`:

In [6]:
census2000.trimmed <- select(
    census2000, # name of the data frame
    # list of all the column names we want to keep
    Id2, Geography, Number..Total.population, 
    Number..HOUSING.OCCUPANCY...Total.housing.units, 
    Number..HOUSING.OCCUPANCY...Total.housing.units...Occupied.housing.units, 
    Number..HOUSING.OCCUPANCY...Total.housing.units...Vacant.housing.units
)

head(census2000.trimmed)

Unnamed: 0,Id2,Geography,Number..Total.population,Number..HOUSING.OCCUPANCY...Total.housing.units,Number..HOUSING.OCCUPANCY...Total.housing.units...Occupied.housing.units,Number..HOUSING.OCCUPANCY...Total.housing.units...Vacant.housing.units
1,22001960100,"Census Tract 9601, Acadia Parish, Louisiana",6188,2410,2236,174
2,22001960200,"Census Tract 9602, Acadia Parish, Louisiana",5056,1909,1764,145
3,22001960300,"Census Tract 9603, Acadia Parish, Louisiana",3149,1246,1145,101
4,22001960400,"Census Tract 9604, Acadia Parish, Louisiana",5617,2176,1991,185
5,22001960500,"Census Tract 9605, Acadia Parish, Louisiana",4927,1796,1692,104
6,22001960600,"Census Tract 9606, Acadia Parish, Louisiana",5654,2292,2073,219
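
For comparison, the same kind of column trimming can be done without dplyr by indexing a data frame with a vector of column names. The sketch below uses a small invented data frame (`toy`), not the census file.

```r
# An invented three-column data frame standing in for the wide census table
toy <- data.frame(
    Id2 = c(101, 102),
    Geography = c('Tract A', 'Tract B'),
    Extra.column = c('x', 'y')
)

# Base R: keep only the named columns, in the order listed
toy.trimmed <- toy[, c('Id2', 'Geography')]

colnames(toy.trimmed)
ncol(toy.trimmed)
```

The dplyr version lets you type the column names without quotes, which is convenient when the names are as long as the ones in this file.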


This shows us that we were able to select the columns correctly. But one lingering issue is that these column names are long and unwieldy. Since we are going to be typing them often, let's rename them to shorter, more convenient versions:

In [7]:
colnames(census2000.trimmed) <- c(
    'fips.code', 'census.tract', 'population',
    'total.housing.units', 'occupied.housing.units', 'vacant.housing.units'
)
head(census2000.trimmed)

Unnamed: 0,fips.code,census.tract,population,total.housing.units,occupied.housing.units,vacant.housing.units
1,22001960100,"Census Tract 9601, Acadia Parish, Louisiana",6188,2410,2236,174
2,22001960200,"Census Tract 9602, Acadia Parish, Louisiana",5056,1909,1764,145
3,22001960300,"Census Tract 9603, Acadia Parish, Louisiana",3149,1246,1145,101
4,22001960400,"Census Tract 9604, Acadia Parish, Louisiana",5617,2176,1991,185
5,22001960500,"Census Tract 9605, Acadia Parish, Louisiana",4927,1796,1692,104
6,22001960600,"Census Tract 9606, Acadia Parish, Louisiana",5654,2292,2073,219


Another helpful command to run on any data set is `str`, which gives you the structure of the variable as defined by R:

In [8]:
str(census2000.trimmed)

'data.frame':	1106 obs. of  6 variables:
 $ fips.code             : num  2.2e+10 2.2e+10 2.2e+10 2.2e+10 2.2e+10 ...
 $ census.tract          : Factor w/ 1106 levels "Census Tract 10.01, Lafayette Parish, Louisiana",..: 970 978 985 993 1000 1006 1011 1015 1018 1021 ...
 $ population            : Factor w/ 1019 levels "0","1","10,248",..: 859 710 350 801 692 806 647 804 711 842 ...
 $ total.housing.units   : Factor w/ 905 levels "1","10","1,002",..: 565 404 104 499 357 531 397 522 426 594 ...
 $ occupied.housing.units: Factor w/ 876 levels "0","1","1,001",..: 512 363 74 446 338 468 347 470 374 514 ...
 $ vacant.housing.units  : Factor w/ 374 levels "0","1","10","100",..: 86 54 5 98 9 134 83 86 110 196 ...


The structure tells us that this is a data frame with 1106 rows and six columns. It further tells us the type of each column. 

Notice how the FIPS code read in as a number but the other numeric columns read in as “factors”? That's R-speak for a categorical variable, and in versions of R before 4.0.0 `read.csv` sets any character column to this type by default (newer versions leave it as plain character strings). This happened because the numbers in those columns have commas. The presence of a single non-numeric character within a number makes R treat the entire column as strings. This will be an issue later when we try to add two numbers together, as R doesn't know how to add two strings.

The solution: we need to remove the comma from all the strings, then recast the variable as a number.
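
A short sketch of why both steps are needed, using invented values. Converting a comma-laden string straight to a number yields `NA`, and converting a factor straight to a number returns R's internal level codes rather than the values printed on screen.

```r
# Invented values that mimic the comma-formatted census counts
counts.chr <- c('6,188', '512', '1,002')

# as.numeric() cannot parse the comma, so those entries become NA (with a warning)
direct <- suppressWarnings(as.numeric(counts.chr))
print(direct)

# On a factor, as.numeric() returns the level codes, not the printed values
counts.fct <- factor(c('512', '90', '512'))
codes <- as.numeric(counts.fct)
print(codes)

# Going through character first recovers the real numbers
values <- as.numeric(as.character(counts.fct))
print(values)
```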

To help with this we are going to use another package called `stringr`, and a function from within it called `str_replace`:

In [9]:
# install.packages('stringr')

library('stringr')

Let's start with the population variable. First, let's remove the comma and write the result to the original column. (The format for calling a column from a data frame in R is `df.name$column.name`.)

In [10]:
census2000.trimmed$population <- str_replace(census2000.trimmed$population, pattern = ',', replacement = '')
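
One caveat worth knowing: `str_replace` replaces only the first match in each string (stringr also offers `str_replace_all`). That appears to be enough here, since the tract-level counts shown are in the thousands and contain a single comma. The base R pair `sub` and `gsub` behaves analogously, and the sketch below uses it on invented values so it runs without extra packages.

```r
# Invented values: one with a single comma, one with two
counts <- c('6,188', '1,234,567')

# sub() replaces only the first comma in each string
first.only <- sub(',', '', counts)
print(first.only)

# gsub() replaces every comma
all.commas <- gsub(',', '', counts)
print(all.commas)

as.numeric(all.commas)
```

For data with values of a million or more, the replace-all variant would be the safer choice.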

Then we'll visually inspect the head:

In [11]:
head(census2000.trimmed)

Unnamed: 0,fips.code,census.tract,population,total.housing.units,occupied.housing.units,vacant.housing.units
1,22001960100,"Census Tract 9601, Acadia Parish, Louisiana",6188,2410,2236,174
2,22001960200,"Census Tract 9602, Acadia Parish, Louisiana",5056,1909,1764,145
3,22001960300,"Census Tract 9603, Acadia Parish, Louisiana",3149,1246,1145,101
4,22001960400,"Census Tract 9604, Acadia Parish, Louisiana",5617,2176,1991,185
5,22001960500,"Census Tract 9605, Acadia Parish, Louisiana",4927,1796,1692,104
6,22001960600,"Census Tract 9606, Acadia Parish, Louisiana",5654,2292,2073,219


This appeared to work. But R will still think this is a character variable unless we explicitly tell it otherwise:

In [12]:
census2000.trimmed$population <- as.numeric(census2000.trimmed$population)

Running `str` will help us ensure this worked:

In [13]:
str(census2000.trimmed)

'data.frame':	1106 obs. of  6 variables:
 $ fips.code             : num  2.2e+10 2.2e+10 2.2e+10 2.2e+10 2.2e+10 ...
 $ census.tract          : Factor w/ 1106 levels "Census Tract 10.01, Lafayette Parish, Louisiana",..: 970 978 985 993 1000 1006 1011 1015 1018 1021 ...
 $ population            : num  6188 5056 3149 5617 4927 ...
 $ total.housing.units   : Factor w/ 905 levels "1","10","1,002",..: 565 404 104 499 357 531 397 522 426 594 ...
 $ occupied.housing.units: Factor w/ 876 levels "0","1","1,001",..: 512 363 74 446 338 468 347 470 374 514 ...
 $ vacant.housing.units  : Factor w/ 374 levels "0","1","10","100",..: 86 54 5 98 9 134 83 86 110 196 ...


For the rest of the columns we can nest the first function within the second to speed things up:

In [14]:
census2000.trimmed$total.housing.units <- as.numeric(str_replace(census2000.trimmed$total.housing.units, pattern = ',', replacement = ''))
census2000.trimmed$occupied.housing.units <- as.numeric(str_replace(census2000.trimmed$occupied.housing.units, pattern = ',', replacement = ''))
census2000.trimmed$vacant.housing.units <- as.numeric(str_replace(census2000.trimmed$vacant.housing.units, pattern = ',', replacement = ''))
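
Since the same two-step conversion is repeated for each column, it could also be wrapped in a small helper and applied to several columns at once. This is a sketch on an invented data frame; `strip.commas` is a hypothetical helper name, and it uses base R's `gsub` in place of `str_replace`.

```r
# Hypothetical helper: remove commas, then convert to numeric
strip.commas <- function(x) {
    as.numeric(gsub(',', '', as.character(x)))
}

# Invented data frame with comma-formatted counts stored as text
toy <- data.frame(
    total.housing.units = c('2,410', '909'),
    occupied.housing.units = c('2,236', '764'),
    stringsAsFactors = FALSE
)

# Apply the helper to each listed column and write the results back
cols <- c('total.housing.units', 'occupied.housing.units')
toy[cols] <- lapply(toy[cols], strip.commas)

str(toy)
```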

In [15]:
str(census2000.trimmed)

'data.frame':	1106 obs. of  6 variables:
 $ fips.code             : num  2.2e+10 2.2e+10 2.2e+10 2.2e+10 2.2e+10 ...
 $ census.tract          : Factor w/ 1106 levels "Census Tract 10.01, Lafayette Parish, Louisiana",..: 970 978 985 993 1000 1006 1011 1015 1018 1021 ...
 $ population            : num  6188 5056 3149 5617 4927 ...
 $ total.housing.units   : num  2410 1909 1246 2176 1796 ...
 $ occupied.housing.units: num  2236 1764 1145 1991 1692 ...
 $ vacant.housing.units  : num  174 145 101 185 104 219 171 174 196 284 ...


In [16]:
head(census2000.trimmed)

Unnamed: 0,fips.code,census.tract,population,total.housing.units,occupied.housing.units,vacant.housing.units
1,22001960100,"Census Tract 9601, Acadia Parish, Louisiana",6188,2410,2236,174
2,22001960200,"Census Tract 9602, Acadia Parish, Louisiana",5056,1909,1764,145
3,22001960300,"Census Tract 9603, Acadia Parish, Louisiana",3149,1246,1145,101
4,22001960400,"Census Tract 9604, Acadia Parish, Louisiana",5617,2176,1991,185
5,22001960500,"Census Tract 9605, Acadia Parish, Louisiana",4927,1796,1692,104
6,22001960600,"Census Tract 9606, Acadia Parish, Louisiana",5654,2292,2073,219


That worked!

But in the interest of full disclosure, you should know that we added those commas to the original CSVs from the Census Bureau to facilitate this exercise. “Commafied” numbers are one of the most frequent stumbling blocks to creating a cleaned data set.

For our last cleaning exercise, we'll work with the geography column. It has a lot of information in there, but it would be more useful if the census tract, parish name and state were separated, to help us aggregate some of these numbers.

The package tidyr has a function that helps us do just that:

In [17]:
# install.packages('tidyr')

library('tidyr')

Should you run into a function and not know what arguments it takes, running the function name preceded by a question mark will allow you to access the documentation on that function:

In [18]:
# ?separate

In [19]:
census2000.trimmed <- separate(
    census2000.trimmed,
    census.tract,
    c('tract', 'parish', 'state'),
    ', '
)

In [20]:
head(census2000.trimmed)

Unnamed: 0,fips.code,tract,parish,state,population,total.housing.units,occupied.housing.units,vacant.housing.units
1,22001960100,Census Tract 9601,Acadia Parish,Louisiana,6188,2410,2236,174
2,22001960200,Census Tract 9602,Acadia Parish,Louisiana,5056,1909,1764,145
3,22001960300,Census Tract 9603,Acadia Parish,Louisiana,3149,1246,1145,101
4,22001960400,Census Tract 9604,Acadia Parish,Louisiana,5617,2176,1991,185
5,22001960500,Census Tract 9605,Acadia Parish,Louisiana,4927,1796,1692,104
6,22001960600,Census Tract 9606,Acadia Parish,Louisiana,5654,2292,2073,219
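
To make the splitting step less of a black box, here is roughly what happens to each label, sketched in base R with `strsplit`. This illustrates the idea and is not tidyr's actual implementation; the variable names are made up.

```r
# Two labels in the same format as the Geography column
labels <- c(
    'Census Tract 9601, Acadia Parish, Louisiana',
    'Census Tract 1, Orleans Parish, Louisiana'
)

# Split each label on comma-plus-space; the result is a list of character vectors
pieces <- strsplit(labels, ', ', fixed = TRUE)

# Stack the pieces into a three-column data frame
parts <- as.data.frame(do.call(rbind, pieces), stringsAsFactors = FALSE)
colnames(parts) <- c('tract', 'parish', 'state')

print(parts)
```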


Our data set is as cleaned up as we need it to be now.

Let's summarize it with a frequency table of the parish names (parishes are Louisiana's equivalent of counties):

In [21]:
table(census2000.trimmed$parish)


              Acadia Parish                Allen Parish 
                         12                           5 
           Ascension Parish           Assumption Parish 
                         14                           6 
           Avoyelles Parish           Beauregard Parish 
                          9                           7 
           Bienville Parish              Bossier Parish 
                          5                          19 
               Caddo Parish            Calcasieu Parish 
                         64                          41 
            Caldwell Parish              Cameron Parish 
                          3                           2 
           Catahoula Parish            Claiborne Parish 
                          3                           5 
           Concordia Parish              De Soto Parish 
                          5                           7 
    East Baton Rouge Parish         East Carroll Parish 
                         89   
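
A frequency table is easier to scan when it is sorted. A minimal sketch on an invented vector of parish names (the counts here are not census figures):

```r
# Invented parish labels, one per tract
parishes <- c('Orleans Parish', 'Acadia Parish', 'Orleans Parish',
              'Caddo Parish', 'Orleans Parish', 'Caddo Parish')

# Count the tracts per parish, then sort from most to fewest
tract.counts <- sort(table(parishes), decreasing = TRUE)
print(tract.counts)

# Look up a single parish by name
tract.counts[['Orleans Parish']]
```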

Let's filter our cleaned data frame down to just Orleans Parish and its 181 tracts. Orleans Parish and the city of New Orleans are “coterminous,” so this will isolate only the census tracts of the city.

In [22]:
orleans2000 <- filter(census2000.trimmed, parish == 'Orleans Parish')
head(orleans2000)

Unnamed: 0,fips.code,tract,parish,state,population,total.housing.units,occupied.housing.units,vacant.housing.units
1,22071000000.0,Census Tract 1,Orleans Parish,Louisiana,2381,1408,1145,263
2,22071000000.0,Census Tract 2,Orleans Parish,Louisiana,1347,691,496,195
3,22071000000.0,Census Tract 3,Orleans Parish,Louisiana,1468,719,559,160
4,22071000000.0,Census Tract 4,Orleans Parish,Louisiana,2564,1034,873,161
5,22071000000.0,Census Tract 6.01,Orleans Parish,Louisiana,2034,704,506,198
6,22071000000.0,Census Tract 6.02,Orleans Parish,Louisiana,2957,1106,1011,95


Let's do a quick calculation to answer the question: **What was the population of New Orleans in 2000?**

That requires summing up the population column like so:

In [23]:
sum(orleans2000$population)
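
The filter-then-sum pattern generalizes: to get a total for every parish at once rather than one at a time, base R's `aggregate` can group and sum in a single call. A sketch on invented numbers:

```r
# Invented tract-level rows
toy <- data.frame(
    parish = c('Orleans Parish', 'Orleans Parish', 'Acadia Parish'),
    population = c(2000, 1500, 6000)
)

# Filter-then-sum for a single parish
orleans.total <- sum(toy$population[toy$parish == 'Orleans Parish'])
print(orleans.total)

# Sum population within every parish in one call
parish.totals <- aggregate(population ~ parish, data = toy, FUN = sum)
print(parish.totals)
```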

Now we need to run all of the above cleaning steps on the 2010 data:

In [24]:
census2010 <- read.csv('2010_census_demographic_profile.csv', skip = 1)

census2010.trimmed <- select(
  census2010, # name of the data frame
  # list of all the column names we want to keep
  Id2, Geography, Number..SEX.AND.AGE...Total.population, 
  Number..HOUSING.OCCUPANCY...Total.housing.units, 
  Number..HOUSING.OCCUPANCY...Total.housing.units...Occupied.housing.units, 
  Number..HOUSING.OCCUPANCY...Total.housing.units...Vacant.housing.units
)

colnames(census2010.trimmed) <- c('fips.code', 'census.tract', 'population', 
                               'total.housing.units', 'occupied.housing.units', 'vacant.housing.units')

census2010.trimmed$population <- as.numeric(str_replace(census2010.trimmed$population, pattern = ',', replacement = ''))
census2010.trimmed$total.housing.units <- as.numeric(str_replace(census2010.trimmed$total.housing.units, pattern = ',', replacement = ''))
census2010.trimmed$occupied.housing.units <- as.numeric(str_replace(census2010.trimmed$occupied.housing.units, pattern = ',', replacement = ''))
census2010.trimmed$vacant.housing.units <- as.numeric(str_replace(census2010.trimmed$vacant.housing.units, pattern = ',', replacement = ''))

census2010.trimmed <- separate(census2010.trimmed, census.tract, c('tract', 'parish', 'state'), ', ')

orleans2010 <- filter(census2010.trimmed, parish == 'Orleans Parish')

This allows us to answer the question: **What was the population of New Orleans in 2010?**

In [25]:
sum(orleans2010$population)

This matches the story exactly. What remains is a simple percentage change calculation. To do this, we'll first save each population total to its own variable. Then we'll create another variable to store the percent change.

In [26]:
nola2000pop <- sum(orleans2000$population)
nola2010pop <- sum(orleans2010$population)

perc.change.nola <- (nola2010pop - nola2000pop)/nola2000pop * 100

In [27]:
print(paste('The percent change in New Orleans population since 2000 is ', round(perc.change.nola), '%', sep =''))

[1] "The percent change in New Orleans population since 2000 is -29%"
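
The calculation above follows the pattern `(new - old) / old * 100`. A quick sketch with invented round numbers makes the sign convention easy to check; `pct.change` is a hypothetical helper, not part of the workshop code.

```r
# Hypothetical helper: percent change from an old value to a new value
pct.change <- function(old, new) {
    (new - old) / old * 100
}

# A drop from 200 to 150 is a 25 percent decrease
pct.change(200, 150)

# A rise from 150 to 200 is about a 33 percent increase, not 25:
# the base of the comparison matters
round(pct.change(150, 200))
```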


Finally, we will merge the 2000 and 2010 data. Merging allows you to link two data sets on values common to both. In this case, we know that the FIPS code and the character names of the tracts should be consistent across the 10-year period.

To be sure, though, we will keep all entries on both sides. This is what is referred to as a "full outer join" -- we need to do this because some census tracts that existed in 2000 do not exist in 2010, and vice versa. If we were to keep only the rows that were common to both data frames (R's default behavior), we would lose some data.

In [28]:
census.comparison <- merge(
    census2000.trimmed,
    census2010.trimmed, 
    by = c('fips.code', 'tract', 'parish', 'state'), 
    suffixes = c('.00', '.10'), 
    all = TRUE
)
head(census.comparison)

Unnamed: 0,fips.code,tract,parish,state,population.00,total.housing.units.00,occupied.housing.units.00,vacant.housing.units.00,population.10,total.housing.units.10,occupied.housing.units.10,vacant.housing.units.10
1,22001960100,Census Tract 9601,Acadia Parish,Louisiana,6188,2410,2236,174,6213,2574,2345,229
2,22001960200,Census Tract 9602,Acadia Parish,Louisiana,5056,1909,1764,145,5988,2362,2144,218
3,22001960300,Census Tract 9603,Acadia Parish,Louisiana,3149,1246,1145,101,3582,1427,1286,141
4,22001960400,Census Tract 9604,Acadia Parish,Louisiana,5617,2176,1991,185,6584,2604,2362,242
5,22001960500,Census Tract 9605,Acadia Parish,Louisiana,4927,1796,1692,104,6093,2349,2178,171
6,22001960600,Census Tract 9606,Acadia Parish,Louisiana,5654,2292,2073,219,5972,2504,2306,198
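
The difference between the default merge and a full outer join is easiest to see on a tiny invented example in which one tract exists only in the earlier year and another only in the later year:

```r
# Invented tables: tract 2 appears only in 2000, tract 3 only in 2010
pop00 <- data.frame(fips.code = c(1, 2), population = c(100, 200))
pop10 <- data.frame(fips.code = c(1, 3), population = c(110, 300))

# Default merge keeps only the keys found in both tables
inner <- merge(pop00, pop10, by = 'fips.code', suffixes = c('.00', '.10'))
nrow(inner)

# all = TRUE keeps every key and fills the gaps with NA
outer <- merge(pop00, pop10, by = 'fips.code', suffixes = c('.00', '.10'), all = TRUE)
print(outer)
```

The default merge returns one row, while the `all = TRUE` version returns three, with `NA` wherever a tract is missing from one of the years.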


Saving your intermediate work to a file is often good practice, so we will write the results of our merge to a CSV.

In [29]:
write.csv(census.comparison, 'census_comparison.csv', row.names = FALSE)
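
A quick way to confirm a file was written as intended is to read it straight back in. The sketch below does this with an invented data frame and temporary files, so it does not touch `census_comparison.csv`. It also shows what `row.names = FALSE` prevents.

```r
# Invented data frame standing in for the merged census table
toy <- data.frame(fips.code = c(101, 102), population.00 = c(6188, 5056))

# Write to a temporary file, leaving out R's row numbers
tmp <- tempfile(fileext = '.csv')
write.csv(toy, tmp, row.names = FALSE)

# Read it back and check the shape
round.trip <- read.csv(tmp)
colnames(round.trip)
nrow(round.trip)

# Without row.names = FALSE, the row numbers come back as an extra leading column
with.rows <- tempfile(fileext = '.csv')
write.csv(toy, with.rows)
ncol(read.csv(with.rows))
```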