<h1>R: read .csv</h1>

<h2>Using <code>read.table</code></h2>
<p><code>read.table</code> reads a file in table format and creates a data frame from it, with cases (rows) corresponding to lines and variables (columns) to fields in the file. It is the principal means of reading tabular data into R. The cell below uses its comma-separated variant, <code>read.csv</code>.</p>
<p>Source: <a href="https://www.rdocumentation.org/packages/utils/versions/3.6.2/topics/read.table">read.table: Data Input</a></p>
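
<p>A minimal sketch of how <code>read.csv</code> relates to <code>read.table</code>: it is the same reader with defaults suited to comma-separated files. The file contents and the names <code>tmp</code>, <code>via_csv</code>, and <code>via_table</code> are made up for illustration.</p>

```r
# Write a tiny, hypothetical CSV to a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Name,Age,Survived", "Alice,29,1", "Bob,41,0"), tmp)

# read.csv with its defaults ...
via_csv <- read.csv(tmp)

# ... and read.table with the same settings spelled out explicitly.
via_table <- read.table(tmp, header = TRUE, sep = ",", quote = "\"",
                        dec = ".", fill = TRUE, comment.char = "")

# Compare the two data frames.
same <- identical(via_csv, via_table)
print(same)

unlink(tmp)  # remove the temporary file
```

<p>For other delimiters, calling <code>read.table</code> directly with a different <code>sep</code> follows the same pattern.</p>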

In [None]:
# Read the local copy if it exists; otherwise fall back to the remote URL.
url <- "https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv"
file <- "titanic.csv"
input <- if (file.exists(file)) {
    file
} else {
    url
}

titanic <- read.csv(input)  # titanic is a data.frame

In [None]:
titanic

<h2>Using <code>readr</code></h2>
<p><code>read_csv()</code> and <code>read_tsv()</code> are special cases of the more general
<code>read_delim()</code>. They're useful for reading the most common types of
flat file data, comma separated values and tab separated values,
respectively. <code>read_csv2()</code> uses <code>;</code> for the field separator and <code>,</code> for the
decimal point. This format is common in some European countries.</p>
<p>Source: <a href="https://readr.tidyverse.org/reference/read_delim.html">Read a delimited file (including CSV and TSV) into a tibble</a></p>
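
<p>A minimal sketch of how these readers relate, assuming readr 2.0 or later (needed for <code>I()</code> literal input and <code>show_col_types</code>). The inline data and the names <code>comma_text</code>, <code>semicolon_text</code>, <code>a</code>, <code>b</code>, and <code>c2</code> are made up for illustration.</p>

```r
library(readr)

# Hypothetical inline data; I() marks a string as literal data, not a path.
comma_text     <- I("name,price\napple,1.5\npear,2.25\n")
semicolon_text <- I("name;price\napple;1,5\npear;2,25\n")

# read_csv() is read_delim() with a comma delimiter.
a <- read_csv(comma_text, show_col_types = FALSE)
b <- read_delim(comma_text, delim = ",", show_col_types = FALSE)

# read_csv2() expects ';' between fields and ',' as the decimal mark.
c2 <- read_csv2(semicolon_text, show_col_types = FALSE)

# All three calls should parse the price column to the same numbers.
print(a$price)
print(b$price)
print(c2$price)
```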

In [None]:
library(readr)

In [None]:
# Read the local copy if it exists; otherwise fall back to the remote URL.
url <- "https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv"
file <- "titanic.csv"

input <- if (file.exists(file)) {
    file
} else {
    url
}

titanic <- read_csv(input, show_col_types = FALSE)  # titanic is a tibble

In [None]:
titanic

<h2>Using <code>data.table</code></h2>
<p><code>data.table</code> is an R package that provides <strong>an enhanced version</strong> of <code>data.frame</code>s, which are the standard data structure for storing data in <code>base</code> R.</p>
<p>Source: <a href="https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html">Introduction to <code>data.table</code></a></p>
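
<p>A minimal sketch of what "enhanced <code>data.frame</code>" means in practice: the object returned by <code>fread</code> is still a <code>data.frame</code>, but it also supports <code>data.table</code>'s own subsetting syntax. The inline data and the names <code>dt</code> and <code>over_30</code> are made up for illustration.</p>

```r
library(data.table)

# fread() can parse literal text through its `text` argument (hypothetical data).
dt <- fread(text = "name,age\nAlice,29\nBob,41\n")

# A data.table inherits from data.frame, so base R functions still accept it.
print(class(dt))
print(is.data.frame(dt))

# data.table syntax: filter rows and select a column inside the brackets.
over_30 <- dt[age > 30, name]
print(over_30)
```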

In [None]:
library(data.table)

In [None]:
# Read the local copy if it exists; otherwise fall back to the remote URL.
url <- "https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv"
file <- "titanic.csv"

input <- if (file.exists(file)) {
    file
} else {
    url
}

titanic <- fread(input)  # titanic is a data.table (which is also a data.frame)

In [None]:
titanic

<h2>Using <code>sqldf</code></h2>
<p><strong><code>read.csv.sql</code>: Read File Filtered by SQL</strong></p>
<h4>Description</h4>
<p>Reads a file into R, filtering it with an SQL statement. Only the filtered portion is processed by R, so files larger than R could otherwise handle can be accommodated.</p>
<h4>Details</h4>
<p>Reads the indicated file into an SQL database, creating the database if it does not already exist. It then applies the SQL statement and returns the result as a data frame. If the database did not exist before the call, it is removed afterwards.</p>
<p>Source: <a href="https://www.rdocumentation.org/packages/sqldf/versions/0.4-11/topics/read.csv.sql">RDocumentation</a></p>
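
<p>A minimal sketch of that filtering step on a small file, assuming <code>sqldf</code> is installed. The data and the names <code>tmp</code> and <code>jan8</code> are made up for illustration. Inside the SQL statement, the input is referred to as the table <code>file</code>.</p>

```r
library(sqldf)

# Write a tiny, hypothetical CSV without quoted fields to a temporary file.
tmp <- tempfile(fileext = ".csv")
write.csv(
    data.frame(Month = c(1L, 1L, 2L),
               DayofMonth = c(8L, 9L, 8L),
               Dest = c("DFW", "MIA", "SEA")),
    tmp, row.names = FALSE, quote = FALSE
)

# Only rows matching the WHERE clause are returned to R as a data frame.
jan8 <- read.csv.sql(tmp, sql = "SELECT * FROM file WHERE Month = 1 AND DayofMonth = 8")
print(jan8)

unlink(tmp)  # remove the temporary file
```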

In [None]:
# install.packages("sqldf")

In [None]:
library(sqldf)

Loading required package: gsubfn

Loading required package: proto

“no DISPLAY variable so Tk is not available”
Loading required package: RSQLite



In [None]:
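# Read the local copy if it exists; otherwise fall back to the remote URL.
url <- "https://raw.githubusercontent.com/selva86/datasets/master/hflights.csv"
file <- "hflights.csv"
# In the SQL statement, the table name `file` refers to the input given to read.csv.sql.
query <- "SELECT * FROM file WHERE Month = 1 AND DayofMonth = 8"

input <- if (file.exists(file)) {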
    file
} else {
    url
}
hflights <- read.csv.sql(input, sql = query)

“Column `DepTime`: mixed type, first seen values of type integer, coercing other values of type string”
“Column `ArrTime`: mixed type, first seen values of type integer, coercing other values of type string”
“Column `ActualElapsedTime`: mixed type, first seen values of type integer, coercing other values of type string”
“Column `AirTime`: mixed type, first seen values of type integer, coercing other values of type string”
“Column `ArrDelay`: mixed type, first seen values of type integer, coercing other values of type string”
“Column `DepDelay`: mixed type, first seen values of type integer, coercing other values of type string”
“Column `TaxiIn`: mixed type, first seen values of type integer, coercing other values of type string”
“Column `TaxiOut`: mixed type, first seen values of type integer, coercing other values of type string”


In [None]:
hflights

Year,Month,DayofMonth,DayOfWeek,DepTime,ArrTime,UniqueCarrier,FlightNum,TailNum,ActualElapsedTime,⋯,ArrDelay,DepDelay,Origin,Dest,Distance,TaxiIn,TaxiOut,Cancelled,CancellationCode,Diverted
<int>,<int>,<int>,<int>,<int>,<int>,<chr>,<int>,<chr>,<int>,⋯,<int>,<int>,<chr>,<chr>,<int>,<int>,<int>,<int>,<chr>,<int>
2011,1,8,6,1355,1454,"""AA""",428,"""N477AA""",59,⋯,-16,-5,"""IAH""","""DFW""",224,7,12,0,"""""",0
2011,1,8,6,713,805,"""AA""",460,"""N550AA""",52,⋯,-30,-7,"""IAH""","""DFW""",224,3,9,0,"""""",0
2011,1,8,6,1627,1736,"""AA""",1121,"""N583AA""",69,⋯,-9,-3,"""IAH""","""DFW""",224,13,11,0,"""""",0
2011,1,8,6,1749,2100,"""AA""",1294,"""N3AXAA""",131,⋯,-15,-6,"""IAH""","""MIA""",964,11,13,0,"""""",0
2011,1,8,6,1019,1324,"""AA""",1700,"""N3AHAA""",125,⋯,-16,-1,"""IAH""","""MIA""",964,7,12,0,"""""",0
2011,1,8,6,1202,1308,"""AA""",1820,"""N4YTAA""",66,⋯,-2,-3,"""IAH""","""DFW""",224,17,10,0,"""""",0
2011,1,8,6,902,1010,"""AA""",1824,"""N4YUAA""",68,⋯,-15,-8,"""IAH""","""DFW""",224,9,16,0,"""""",0
2011,1,8,6,558,910,"""AA""",1994,"""N3DAAA""",132,⋯,-5,-2,"""IAH""","""MIA""",964,11,16,0,"""""",0
2011,1,8,6,1822,2112,"""AS""",731,"""N607AS""",290,⋯,2,-3,"""IAH""","""SEA""",1874,6,15,0,"""""",0
2011,1,8,6,654,1058,"""B6""",620,"""N630JB""",184,⋯,-19,-6,"""HOU""","""JFK""",1428,9,13,0,"""""",0
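
<p>The character columns in this output appear to keep the double quotes from the source file as part of each value (for example <code>"AA"</code> rather than <code>AA</code>). A minimal sketch of one way to strip them after reading, shown on a small hand-built data frame that mimics a few of the columns above; the name <code>flights</code> and its values are made up for illustration, and this is one possible cleanup, not a feature of <code>sqldf</code>.</p>

```r
# Hypothetical data frame whose character values carry literal double quotes.
flights <- data.frame(
    UniqueCarrier = c('"AA"', '"AS"'),
    Dest = c('"DFW"', '"SEA"'),
    Distance = c(224L, 1874L),
    stringsAsFactors = FALSE
)

# Find the character columns and remove every double quote from their values.
is_chr <- vapply(flights, is.character, logical(1))
flights[is_chr] <- lapply(flights[is_chr], function(x) gsub('"', "", x, fixed = TRUE))

print(flights)
```

<p>Numeric columns are left untouched because only columns for which <code>is.character</code> is true are rewritten.</p>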
