<a href="https://colab.research.google.com/github/christophermalone/stat360/blob/main/COLAB_Readin_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Reading Data into COLAB


There are various ways in which data can be read into a COLAB document.  Three methods will be discussed here.

1.  Read data directly from a URL
2.  Place a copy of the data into the COLAB file system and read the data in
3.  Place a copy of the data into your Google Drive and read the data in

A local copy of the data must be obtained when using the COLAB file system or Google Drive.  A copy of the LaCrosse Winona Home Prices data can be downloaded below.

LaCrosse - Winona Home Prices: <a href="http://www.StatsClass.org/stat360/Datasets/LaCrosse_Winona_Redfin.csv">Data</a>



---



---



## Load the tidyverse Package

The tidyverse R package is used to read the dataset into the current R session; its readr component provides the read_csv() function used below.

In [1]:
#load tidyverse package
library(tidyverse)

“running command 'timedatectl' had status 1”
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.7      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()





---



---



## Read in Data via URL

The **read_csv()** function is used to read in the data set directly from a URL.  The following snippet of code demonstrates reading in a data set from a URL.

In [2]:
# Reading data in using a URL
LaCrosseWinonaHomePrices <- read_csv("http://www.StatsClass.org/stat360/Datasets/LaCrosse_Winona_Redfin.csv")

Rows: 67 Columns: 27
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (13): SALETYPE, PROPERTYTYPE, ADDRESS, CITY, STATE, LOCATION, STATUS, NE...
dbl (13): ZIPCODE, PRICE, BEDS, BATHS, SQUAREFEET, LOTSIZE, YEARBUILT, DAYSO...
lgl  (1): SOLDDATE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

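The message printed by read_csv() mentions two options: quieting the message with `show_col_types = FALSE`, or specifying the column types. The following is a minimal sketch of both, not a required step. To keep it runnable without a network connection, it reads a tiny literal CSV whose two rows are made up for illustration; only the column names PRICE, BEDS, and SOLDDATE come from the dataset above. A column such as SOLDDATE is typically guessed as logical when every value in it is missing.

```r
library(readr)

# Made-up two-row CSV for illustration only; SOLDDATE is left empty in every row
csv_text <- I("PRICE,BEDS,SOLDDATE\n150000,3,\n210000,4,\n")

# Option 1: let readr guess the column types, but suppress the message
guessed <- read_csv(csv_text, show_col_types = FALSE)
print(sapply(guessed, class))

# Option 2: state the type of SOLDDATE explicitly; other columns are still guessed
typed <- read_csv(csv_text, col_types = cols(SOLDDATE = col_character()))
print(sapply(typed, class))
```

The same two arguments can be passed to read_csv() when reading from the URL, the COLAB file system, or Google Drive.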



---



---



## Read in Data via COLAB File System

To read in a data set using the COLAB file system, click on the Folder icon near the upper-left corner of the COLAB document. Typically, data is placed into the sample_data folder within this file system.

Drag and drop the desired *.csv file into the sample_data folder. 

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1nUVhXBGPrRQKwiJB__venD80JhC-aDQE" width='50%' height='50%'></img></p>

Next, specify the entire file system path to the *.csv file. If filename.csv is placed into the sample_data folder, the entire file system path is:

`/content/sample_data/filename.csv`

In [None]:
# Reading data in from COLAB file system
LaCrosseWinonaHomePrices <- read_csv("/content/sample_data/LaCrosse_Winona_Redfin.csv")
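If the path is mistyped or the upload has not finished, read_csv() stops with an error. A small guard such as the one sketched below can make the problem easier to spot. The helper name `read_csv_if_present` is hypothetical (it is not part of readr), and the sketch assumes the file was uploaded under the name used above.

```r
library(readr)

# Hypothetical helper (not part of readr): read a CSV only if the file exists
read_csv_if_present <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    # List the folder contents to help spot a misspelled file name
    message("Files in ", dirname(path), ": ",
            paste(list.files(dirname(path)), collapse = ", "))
    return(NULL)
  }
  read_csv(path, show_col_types = FALSE)
}

# Returns the data if the file is present, otherwise NULL with a message
LaCrosseWinonaHomePrices <- read_csv_if_present("/content/sample_data/LaCrosse_Winona_Redfin.csv")
```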



---



---



## Read in Data via Google Drive

To begin, you will need to drag and drop a copy of the *.csv file from your local computer into your Google Drive.

Next, you must make the file shareable and identify Google's File ID for this file.  To do this, right-click on the file and select **Get link**.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1DmgNP-RIw0grhHteph0_vUJFIkIvyryW" width='35%' height='35%'></img></p>

In the Share window, select **Anyone with the link** under General access.  Click the Copy link button to obtain a copy of the URL for the file.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1vpBCzpmDmiFSb1WMNWl8khHHKJT8edDz" width='75%' height='75%'></img></p>

The following is an example URL to gain access to a file stored in Google Drive.  The File ID is part of this URL.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=19yzIvXcumLph0QtPVyQhYW5t1snhg9lV" width='75%' height='75%'></img></p>
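The File ID can also be pulled out of the copied link in R rather than by hand. The sketch below assumes the copied link has the layout `https://drive.google.com/file/d/<File ID>/view?usp=sharing`; if your link looks different, the pattern will need to be adjusted. The example link is built from the File ID used in this document.

```r
# Example share link; the /file/d/<File ID>/view layout is an assumption
share_link <- "https://drive.google.com/file/d/1ajSXOOCtAe3nnStIOaa-v6U-2KJB116a/view?usp=sharing"

# Keep only the path segment that follows "/d/"
file_id <- sub("^.*/d/([^/]+).*$", "\\1", share_link)
print(file_id)
```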

After the Google File ID has been obtained, there are two approaches that can be taken:

1.   Use a URL that points directly to the file in Google Drive
2.   Download the file from Google Drive to the COLAB file system, then read the data in from the COLAB file system 



To use a URL that points directly to the file in your Google Drive, append the File ID to the end of the following string:

`https://drive.google.com/uc?export=download&id=`

For the File ID used here, the complete URL is given next.  This URL string is then used to read in the file.

`https://drive.google.com/uc?export=download&id=1ajSXOOCtAe3nnStIOaa-v6U-2KJB116a`

In [None]:
# Reading data in from Google Drive
LaCrosseWinonaHomePrices <- read_csv("https://drive.google.com/uc?export=download&id=1ajSXOOCtAe3nnStIOaa-v6U-2KJB116a")
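When several files are stored in Google Drive, it can be convenient to keep the File ID in its own variable and build the URL with paste0(). This is only a sketch; the variable names `file_id` and `drive_url` are chosen here for illustration.

```r
# File ID taken from the example above
file_id <- "1ajSXOOCtAe3nnStIOaa-v6U-2KJB116a"

# Append the File ID to the fixed part of the download URL
drive_url <- paste0("https://drive.google.com/uc?export=download&id=", file_id)
print(drive_url)
```

Passing `drive_url` to read_csv() then reads the file in the same way as the cell above.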

COLAB can download files directly from Google Drive into the COLAB file system using the **gdown** command.  gdown is a system-level command (not an R command); thus, it must be passed through the system() R function.

<u>Note</u>:  The Google File ID specified here is the same as above.

In [4]:
system("gdown --id 1ajSXOOCtAe3nnStIOaa-v6U-2KJB116a")
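The system() function returns the exit status of the command, where 0 indicates success. The sketch below uses that value to flag a failed download, for example when the file has not been shared or gdown is not available. The helper name `run_system_checked` is hypothetical and is introduced here only for illustration.

```r
# Hypothetical helper: run a system-level command and warn if it fails
run_system_checked <- function(command) {
  status <- system(command)  # exit status; 0 means success
  if (status != 0) {
    warning("Command exited with status ", status, ": ", command)
  }
  invisible(status)
}

# Same gdown call as above, now with a check on the result
status <- run_system_checked("gdown --id 1ajSXOOCtAe3nnStIOaa-v6U-2KJB116a")

# List the *.csv files now present in the top level of the COLAB file system
print(list.files("/content", pattern = "\\.csv$"))
```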

After the system-level gdown command is executed, the file will be downloaded into COLAB's file system.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1KCT-YT4LnSZvbfsrD-JL1vea6d4Mi7Sz" width='50%' height='50%'></img></p>

Finally, the read_csv() function can be used to read in this data from the COLAB file system. Note that this file was *not* placed in the sample_data folder; thus, the path differs from the path used above.

In [5]:
# Reading data in from COLAB file system
LaCrosseWinonaHomePrices <- read_csv("/content/LaCrosse_Winona_Redfin.csv")

Rows: 67 Columns: 27
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (13): SALETYPE, PROPERTYTYPE, ADDRESS, CITY, STATE, LOCATION, STATUS, NE...
dbl (13): ZIPCODE, PRICE, BEDS, BATHS, SQUAREFEET, LOTSIZE, YEARBUILT, DAYSO...
lgl  (1): SOLDDATE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.




---



---



## Use head() and tail() to Look at the Data

The **head()** function can be used to see the first few rows of a dataset (six by default).

In [None]:
#Look at data
head(LaCrosseWinonaHomePrices)

Likewise, **tail()** can be used to see the last few rows of a dataset.

In [None]:
tail(LaCrosseWinonaHomePrices)
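Both functions accept an `n` argument that controls how many rows are shown, and dim() reports the number of rows and columns. The sketch below uses mtcars, a dataset that ships with R, as a stand-in so that it runs without the *.csv file; the same calls apply to LaCrosseWinonaHomePrices.

```r
# mtcars ships with R and stands in for LaCrosseWinonaHomePrices here
example_data <- mtcars

# First three rows and last three rows
print(head(example_data, n = 3))
print(tail(example_data, n = 3))

# Number of rows and number of columns
print(dim(example_data))
```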



---



---
