<a href="https://colab.research.google.com/github/Akmazad/Data-Science-Fundamentals-in-R/blob/main/Modules/Module_1_Data_Import_%26_Export.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module 1: Data Import & Export


---


**Contents**
*   Reading/Writing Structured Data
*   Reading Semi-Structured Data
*   Reading from Databases
---






### Module 1: Data Import & Export ###
- Data can come from different sources, e.g.,
 *   local files
 *   remote files (accessed via a URL)
 *   streamed data

- Data formats can be:
 *  structured (e.g., tabular),
 *  semi-structured (tree-like, e.g., JSON or XML),
 *  unstructured.

---



### Reading/Writing Structured Data ###
- Structured data can be imported using `read.csv` (for `*.csv` files), `read_excel` (for `*.xlsx` files), or `fread` (for delimited text files such as CSV/TSV). These functions come from the `utils` package (shipped with base R), `readxl`, and `data.table`, respectively.
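
As a minimal, self-contained sketch of the base-R route, the snippet below writes a small data frame to a temporary CSV file and reads it back with `read.csv`. The data frame `toy_df` and its values are made up for illustration only; they are not part of the course datasets.

```r
# Toy data frame (hypothetical values, for illustration only)
toy_df <- data.frame(
  id   = 1:3,
  city = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Write to a temporary CSV file; row.names = FALSE avoids an extra index column
csv_path <- tempfile(fileext = ".csv")
write.csv(toy_df, csv_path, row.names = FALSE)

# Read the file back into R
toy_back <- read.csv(csv_path, stringsAsFactors = FALSE)
print(toy_back)
print(dim(toy_back))  # number of rows and columns

# Remove the temporary file
unlink(csv_path)
```

Passing `row.names = FALSE` to `write.csv` is a common choice here because otherwise the row names are written as an unnamed first column, which then shows up as an extra column on re-import.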





---



In [None]:
# Read Excel files
library(dplyr)   # provides the %>% pipe
library(readxl)
example <- readxl_example("datasets.xlsx") # path to `datasets.xlsx`, an example file bundled with readxl
df <- read_excel(example)                  # reads the first sheet by default
df %>% head

# Write Excel file to disk
library(writexl)
writexl::write_xlsx(df, "temp_excel_file.xlsx")

In [None]:
# Read CSV/TSV files into R
library(data.table) # standalone package (not part of the tidyverse)
df = fread("/content/sample_data/california_housing_test.csv") # very efficient for loading large files
df %>% head

# Read a remote CSV/TSV file into R
remote_file_path = "https://raw.githubusercontent.com/Statology/Miscellaneous/main/basketball_data.csv"
remote_df = read.csv(remote_file_path)

# Write CSV/TSV files to disk
remote_df %>% fwrite("remote_df.csv")



---
### Reading Semi-Structured Data ###

* Semi-structured formats such as JSON and XML can be imported with packages such as `rjson`, `xml2`, and `XML`.
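
To illustrate what JSON import produces without needing a file on disk, here is a small sketch that parses an inline JSON string with `rjson::fromJSON(json_str = ...)` and flattens the resulting nested list into a data frame. The JSON text and the field names `species` and `sepal_length` are made-up examples, and the `rbind` approach assumes every record has the same fields.

```r
library(rjson)

# Hypothetical JSON text: an array of two records with identical fields
json_text <- '[
  {"species": "setosa",    "sepal_length": 5.1},
  {"species": "virginica", "sepal_length": 6.3}
]'

# Parse the string; each JSON object becomes a named R list
records <- fromJSON(json_str = json_text)
str(records)

# Flatten the list of records into a data frame, one row per record
records_df <- do.call(rbind, lapply(records, as.data.frame))
print(records_df)
```

For deeply nested or irregular JSON, this simple row-binding step is usually not enough, and the structure has to be unpacked field by field.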


---



In [None]:
# Read JSON (semi-structured) into R
library(rjson)
JsonData <- fromJSON(file = '/content/iris.json') # parsed into a nested R list
print(JsonData[1])                                # show the first top-level element

# Read XML (semi-structured) into R
library(xml2)
library(XML)

plant_xml <- read_xml("https://www.w3schools.com/xml/plant_catalog.xml")
plant_xml_parse <- xmlParse(plant_xml)
plant_xml_parse

---
### Reading from Databases ###



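* Relational databases (e.g., SQLite) can be accessed from R: open a connection, run SQL queries, and receive the results as data frames.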
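
The connect, query, disconnect workflow can be tried without any database file by using an in-memory SQLite database. The sketch below is an illustration under assumptions: it copies R's built-in `mtcars` data frame into a table (this table is not part of the course database) and runs a parameterized query through the `DBI` functions.

```r
library(DBI)
library(RSQLite)

# Open a temporary in-memory SQLite database (nothing is written to disk)
conn <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy the built-in mtcars data frame into a table called "mtcars"
dbWriteTable(conn, "mtcars", mtcars)
print(dbListTables(conn))

# Parameterized query: the ? placeholder is filled from `params`
four_cyl <- dbGetQuery(
  conn,
  "SELECT mpg, cyl FROM mtcars WHERE cyl = ?",
  params = list(4)
)
print(head(four_cyl))

# Always close the connection when done
dbDisconnect(conn)
```

Using a placeholder with `params` rather than pasting values into the SQL string is generally the safer habit, since it avoids quoting mistakes and SQL injection.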

---



In [None]:
# Read from Relational Databases (SQLite)
library(RSQLite)

conn = dbConnect(RSQLite::SQLite(), '/content/nba_salary.sqlite')
dbListTables(conn)
# Expected tables: 'NBA_season1718_salary', 'Seasons_Stats'

# Run the query; if it fails, report the error and fall back to NA
db.df <- tryCatch({
  dbGetQuery(conn, 'SELECT * FROM NBA_season1718_salary')
  }, error = function(err) {
    message("Database query error: ", conditionMessage(err))
    NA
})

db.df %>% head

# Close the connection
dbDisconnect(conn)




---
Exercise: Load the dataset (a Google Sheets spreadsheet published as CSV) and print its dimensions


---




In [None]:
library(data.table)
library(dplyr)

df = fread("https://docs.google.com/spreadsheets/d/e/2PACX-1vRtySA5U09DJktfiQdTP_j50tCI3h64G6zHFxCDJvkpA8VFgRTn6G9zFGDU9Kwv4s0sianfz7YcvYTD/pub?output=csv")
df %>% as_tibble %>% colnames # column names
dim(df)                       # dimensions: number of rows and columns

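**Note**:
`tryCatch` runs a block of code and, if an error is raised, passes it to a handler function instead of aborting execution. This lets the program deal with the failure gracefully, e.g., by returning `NA` as in the database example above.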
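
A small base-R sketch of this pattern applied to file import is given below. The helper name `safe_read_csv` and the file name `no_such_file.csv` are hypothetical, chosen only for illustration; the function returns `NULL` instead of stopping when the file cannot be read.

```r
# Hypothetical helper: read a CSV, returning NULL instead of stopping on error
safe_read_csv <- function(path) {
  tryCatch({
    if (!file.exists(path)) stop("file not found: ", path)
    read.csv(path)
  }, error = function(err) {
    # The handler receives the error object; report it and return a fallback
    message("Could not read file: ", conditionMessage(err))
    NULL
  }, finally = {
    # Runs whether or not an error occurred
    message("Finished attempt to read ", path)
  })
}

# Failing case: the error is caught and NULL is returned
result_missing <- safe_read_csv("no_such_file.csv")
print(is.null(result_missing))

# Successful case: write a small temporary CSV, then read it
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:2), tmp, row.names = FALSE)
result_ok <- safe_read_csv(tmp)
print(result_ok)
unlink(tmp)
```

Returning a sentinel such as `NULL` or `NA` keeps a longer script running, but the caller then has to check for that value before using the result.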