# CSV Input and Output

CSV stands for comma-separated values, and it's one of the most common ways we'll be working with data throughout this course. In the basic format of a CSV file, the first line gives the column names and every remaining line is one data point, with its values separated by commas. One of the most basic ways to read CSV files in R is read.csv(), which is built into R. Later on we'll learn about fread(), which is a bit faster and more convenient, but it's important to understand all your options!

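When using read.csv() you'll need to either pass in the full path of the file or have the file in R's current working directory (check it with getwd()), which is often, but not always, the directory your R script lives in. Spaces in the path are fine as long as the whole path is a quoted string. Backslashes are a different matter: on Windows, write path separators as forward slashes or doubled backslashes (\\), because a single backslash starts an escape sequence in an R string. This is often a point of confusion for people new to programming, so make sure you understand the above before continuing!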
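As a rough illustration of the path rules, here is a minimal sketch that writes and reads a CSV inside a folder whose name contains a space. The folder name "my data" and the variable names are made up for this example, and a temporary directory is used so nothing is left behind in your project.

```r
# Where R resolves relative paths from
getwd()

# Hypothetical folder name containing a space, created in a temp directory
dir_with_space <- file.path(tempdir(), "my data")
dir.create(dir_with_space, showWarnings = FALSE)

# file.path() joins the pieces with forward slashes, which work on every platform
path <- file.path(dir_with_space, "example.csv")

# A quoted path with a space in it needs no escaping
write.csv(mtcars, file = path)
file.exists(path)

cars_from_path <- read.csv(path)
dim(cars_from_path)
```

Building paths with file.path() rather than pasting strings together avoids most separator mistakes, including the backslash issue on Windows.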

In [1]:
write.csv(mtcars,file = 'example.csv')
# You can use your own CSV if you want or make your own
# This is just for a basic example

In [2]:
csv <- read.csv("example.csv")
class(csv)
str(csv)

'data.frame':	32 obs. of  12 variables:
 $ X   : Factor w/ 32 levels "AMC Javelin",..: 18 19 5 13 14 31 7 21 20 22 ...
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : int  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : int  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : int  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : int  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: int  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: int  4 4 1 1 2 1 4 2 2 4 ...


In [3]:
# Check column names
colnames(csv)

In [4]:
# read.csv() already returns a data frame, so data.frame() here just makes a copy
df <- data.frame(csv)
head(df)

,X,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
,<fct>,<dbl>,<int>,<dbl>,<int>,<dbl>,<dbl>,<dbl>,<int>,<int>,<int>,<int>
1,Mazda RX4,21.0,6,160,110,3.9,2.62,16.46,0,1,4,4
2,Mazda RX4 Wag,21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
3,Datsun 710,22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
4,Hornet 4 Drive,21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
5,Hornet Sportabout,18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
6,Valiant,18.1,6,225,105,2.76,3.46,20.22,1,0,3,1
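The extra X column in the output above holds the row names that write.csv() saved as the first column of the file. If you would rather get them back as row names, one option is the row.names argument of read.csv(). The sketch below assumes a temporary file (csv_path is a name made up here) so that it runs on its own.

```r
# Write mtcars to a temporary file, row names included (the default)
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, file = csv_path)

# Treat the first column of the file as row names instead of a data column
cars <- read.csv(csv_path, row.names = 1)

head(rownames(cars), 3)
colnames(cars)
```

With row.names = 1 the data frame has the same 11 columns as mtcars, and the car names are available through rownames() again.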


So we can now see how easy it is to read a CSV file. If we have another flat file format, such as a tab-separated file or a file with some other delimiter, we can specify the delimiter with the sep argument when calling read.csv().

In [5]:
help(read.csv)
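For instance, a tab-separated file can be read by passing sep = "\t". This is a small sketch using a temporary file (tsv_path is a hypothetical name); it also shows read.delim(), the base R shortcut whose default separator is a tab.

```r
# Write a tab-separated copy of mtcars, without row names to keep the header simple
tsv_path <- tempfile(fileext = ".tsv")
write.table(mtcars, file = tsv_path, sep = "\t", row.names = FALSE)

# Same function as before, different delimiter
tsv <- read.csv(tsv_path, sep = "\t")

# read.delim() has sep = "\t" as its default
tsv2 <- read.delim(tsv_path)

dim(tsv)
identical(tsv, tsv2)
```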

# read.table()
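The __read.table()__ function is the general form of __read.csv()__: read.csv() is read.table() with defaults suited to CSV files, such as header = TRUE and sep = ",".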

In [6]:
read.table('example.csv')

V1,V2
<fct>,<fct>
,",""mpg"",""cyl"",""disp"",""hp"",""drat"",""wt"",""qsec"",""vs"",""am"",""gear"",""carb"""
Mazda RX4,",21,6,160,110,3.9,2.62,16.46,0,1,4,4"
Mazda RX4 Wag,",21,6,160,110,3.9,2.875,17.02,0,1,4,4"
Datsun 710,",22.8,4,108,93,3.85,2.32,18.61,1,1,4,1"
Hornet 4 Drive,",21.4,6,258,110,3.08,3.215,19.44,1,0,3,1"
Hornet Sportabout,",18.7,8,360,175,3.15,3.44,17.02,0,0,3,2"
Valiant,",18.1,6,225,105,2.76,3.46,20.22,1,0,3,1"
Duster 360,",14.3,8,360,245,3.21,3.57,15.84,0,0,3,4"
Merc 240D,",24.4,4,146.7,62,3.69,3.19,20,1,0,4,2"
Merc 230,",22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2"


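Oops! Something went wrong! We only have two columns when we should have 12. By default read.table() splits fields on whitespace rather than on commas, so we need to tell it which separator to use. Let's fix this: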

In [7]:
read.table(file = 'example.csv', sep = ',')

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12
<fct>,<fct>,<fct>,<fct>,<fct>,<fct>,<fct>,<fct>,<fct>,<fct>,<fct>,<fct>
,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21,6,160,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21,6,160,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225,105,2.76,3.46,20.22,1,0,3,1
Duster 360,14.3,8,360,245,3.21,3.57,15.84,0,0,3,4
Merc 240D,24.4,4,146.7,62,3.69,3.19,20,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
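In the output above the column names still appear as the first row of data, and every column is read as text. A possible follow-up, sketched here with a temporary file (csv_path and cars_tbl are names invented for the example), is to also pass header = TRUE so the first line is used for the column names.

```r
# Recreate the example file in a temporary location
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, file = csv_path)

# header = TRUE uses the first line as column names; sep = "," splits on commas
cars_tbl <- read.table(csv_path, header = TRUE, sep = ",")

colnames(cars_tbl)
str(cars_tbl)
```

With both arguments set, the numeric columns are parsed as numbers again instead of text.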


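Great! Now every value lands in its own column. Notice that the header line is still read as a row of data (the columns are named V1 to V12); passing header = TRUE to read.table() fixes that. In most situations though, you'll want to use the fread function for ease of use: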

# fread()
__fread()__, from the data.table package, is similar to __read.table()__, but faster and more convenient to use.

In [11]:
#install.packages("data.table") # you may need to install the data.table package to use fread
library(data.table)
fread('example.csv')

V1,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
<chr>,<dbl>,<int>,<dbl>,<int>,<dbl>,<dbl>,<dbl>,<int>,<int>,<int>,<int>
Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4


# Output to CSV file
We can write a data frame to a CSV file by using the __write.csv()__ function.

In [9]:
write.csv(df, file = "test.csv")
fread('test.csv')

V1,X,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
<int>,<chr>,<dbl>,<int>,<dbl>,<int>,<dbl>,<dbl>,<dbl>,<int>,<int>,<int>,<int>
1,Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
2,Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
3,Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
4,Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
5,Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
6,Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
7,Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
8,Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
9,Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
10,Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4


In [10]:
# Alternatively, write the file without row names
write.csv(df, file = "test2.csv",row.names = FALSE)
fread('test2.csv')

X,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
<chr>,<dbl>,<int>,<dbl>,<int>,<dbl>,<dbl>,<dbl>,<int>,<int>,<int>,<int>
Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
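To see why row.names = FALSE matters, a quick round trip can help. The sketch below uses only base R and temporary files (first_path, second_path, and df_again are names made up for the example): it writes a data frame without row names, reads it back, and checks that no extra column appeared.

```r
# Build df the same way as earlier: mtcars written with row names, then read back
first_path <- tempfile(fileext = ".csv")
write.csv(mtcars, file = first_path)
df <- read.csv(first_path)

# Write without row names, then read the result back
second_path <- tempfile(fileext = ".csv")
write.csv(df, file = second_path, row.names = FALSE)
df_again <- read.csv(second_path)

# Same columns, same values: nothing was added on the way through
identical(colnames(df), colnames(df_again))
all.equal(df, df_again)
```

Leaving row.names at its default would add one more index column on every write and read cycle, which is how the V1 column showed up in the test.csv output.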


OK, that's it for reading and writing flat/CSV files. We'll be using these a lot when working with datasets!