This repository has been archived by the owner on Jul 24, 2022. It is now read-only.
---
title: "Importing and Exporting Data"
author: "Dmitry Petukhov"
output:
  html_document:
    df_print: paged
    code_folding: hide
    keep_md: true
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
    theme: flatly
    highlight: textmate
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r set_env_opt, include=FALSE}
options(max.print = 1e3, scipen = 999, width = 1e2)
options(stringsAsFactors = FALSE)
```
## Formats and data sources
Available formats:

- **CSV files**: various separators and encodings, plain or compressed (.gz, .bz2)
- **Excel**
- **Databases**: PostgreSQL, SQL Server, Oracle DB
- **Big Data clusters**: Spark, Hive, Impala
- **Web**
- **RDS, RData**
- **SPSS, SAS**, etc.

Supported sources:

- **local disk**
- **http(s)://, file://, ftp://**
- **Cloud storage services**, such as Azure Blob Storage
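
A few of the formats listed above can be tried without any external files. The following is a minimal base R sketch, not part of the course data: the `products` data frame and all temporary paths are made up for illustration. It writes and reads a semicolon-separated file with an explicit encoding, a gzip-compressed CSV, and an RDS file.

```r
# Hypothetical sample data; not taken from the course's data/ folder.
products <- data.frame(
  product_id = 1:3,
  brand      = c("Acme", "Nord", "Zeta"),
  price      = c(1.5, 2.25, 3),
  stringsAsFactors = FALSE
)

# 1. CSV with a non-default separator and an explicit encoding.
csv_path <- tempfile(fileext = ".csv")
write.table(products, csv_path, sep = ";", row.names = FALSE,
            fileEncoding = "UTF-8")
from_csv <- read.csv(csv_path, sep = ";", fileEncoding = "UTF-8",
                     stringsAsFactors = FALSE)

# 2. Gzip-compressed CSV: write through a gzfile() connection;
#    read.csv() detects the compression when reading.
gz_path <- tempfile(fileext = ".csv.gz")
gz_con <- gzfile(gz_path, "w")
write.csv(products, gz_con, row.names = FALSE)
close(gz_con)
from_gz <- read.csv(gz_path, stringsAsFactors = FALSE)

# 3. RDS: stores a single R object together with its column types.
rds_path <- tempfile(fileext = ".rds")
saveRDS(products, rds_path)
from_rds <- readRDS(rds_path)
```

CSV is portable but loses type information, so column types are guessed again on import. RDS keeps the object as it was, at the cost of being readable only from R.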
## Import data
### Import data from `CSV`
Import from local storage:
```{r}
library(data.table)
products_csv <- fread("data/products.csv")
products_csv
```
Import from the web (this repository on GitHub):
```{r}
library(curl)
products_web <- fread("https://raw.githubusercontent.com/codez0mb1e/StarRter/master/data/products.csv")
products_web
```
Compare results:
```{r}
dim(products_web)
names(products_web)
# identical() returns a single TRUE/FALSE even when the lengths differ
stopifnot(
  identical(dim(products_web), dim(products_csv)),
  identical(names(products_web), names(products_csv))
)
```
### Import data from `Excel`
```{r}
library(readxl)
# read workbook sheets
excel_sheets(path = "data/products.xlsx")
# import sheet of interest
products_xlsx <- read_excel(path = "data/products.xlsx", sheet = "products data")
products_xlsx
```
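
When only part of a sheet is needed, `read_excel()` also accepts a cell range and explicit column types. The sketch below uses the example workbook that ships with `readxl` (located with `readxl_example()`) rather than `data/products.xlsx`, so the sheet name and range here are illustrative only.

```r
library(readxl)

# Path to a demo workbook bundled with the readxl package.
path <- readxl_example("datasets.xlsx")

# List the sheets before choosing one.
sheets <- excel_sheets(path)

# Read the header row plus five data rows from the first three columns,
# forcing every column to numeric (a single col_types value is recycled).
cars_xlsx <- read_excel(
  path,
  sheet     = "mtcars",
  range     = "A1:C6",
  col_types = "numeric"
)
cars_xlsx
```

Restricting the range avoids pulling in notes or totals that often sit below or beside the actual table in hand-made workbooks.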
### Import from `SQL Server`
```
library(DBI)
library(odbc)
# NOTE: replace the placeholders with your actual connection string
conn <- dbConnect(odbc(),
                  .connection_string = "Driver={SQL Server};Server=<server_db>;database=<db_name>;Trusted_Connection=yes;",
                  timeout = 10)
products_sql <- dbGetQuery(conn, "select * from dbo.products")
# close the connection when finished
dbDisconnect(conn)
```
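
The same `DBI` calls work against any backend, so the pattern can be tried without a SQL Server instance. The sketch below is an assumption-laden stand-in: it uses an in-memory SQLite database through the `RSQLite` package, and the `products` table and its rows are invented for illustration. It also shows a parameterized query, which is generally safer than pasting values into the SQL string.

```r
library(DBI)

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table contents, for illustration only.
dbWriteTable(conn, "products", data.frame(
  product_id = 1:4,
  department = c("PHARMA", "GROCERY", "PHARMA", "DRUG GM"),
  stringsAsFactors = FALSE
))

# Parameterized query: the value is bound to the `?` placeholder.
pharma <- dbGetQuery(
  conn,
  "SELECT product_id, department FROM products WHERE department = ?",
  params = list("PHARMA")
)

# Always release the connection when done.
dbDisconnect(conn)

pharma
```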
## Export data
### Export data to `CSV`
Filter data for export:
```{r}
library(dplyr)
products_csv %>%
  count(department, brand_ty, sort = TRUE)
new_products <- products_csv %>%
  filter(department == "PHARMA" & brand_ty == "PRIVATE") %>%
  mutate_if(is.character, tolower) %>%
  select(-x5)
new_products
```
Export:
```{r}
write.table(new_products, "data/new_products.csv",
            sep = ",",
            row.names = FALSE)
```
Now check the result with the 'Import Dataset' wizard in RStudio.
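
The export can also be verified in code by reading the file back and comparing it with the original. This is a minimal sketch using `fwrite()` and `fread()` from `data.table`. The `new_products_demo` table and the temporary path are hypothetical stand-ins for `new_products` and `data/new_products.csv`.

```r
library(data.table)

# Hypothetical stand-in for the filtered `new_products` table.
new_products_demo <- data.table(
  product_id = c(101L, 102L),
  department = c("pharma", "pharma"),
  brand_ty   = c("private", "private")
)

# Write to a temporary file; fwrite() uses a comma separator
# and omits row names by default.
out_path <- tempfile(fileext = ".csv")
fwrite(new_products_demo, out_path)

# Read the file back and compare shape and column names.
reloaded <- fread(out_path)
stopifnot(
  identical(dim(reloaded), dim(new_products_demo)),
  identical(names(reloaded), names(new_products_demo))
)
```

A round trip like this catches the most common export problems early: an unexpected extra column of row names, a wrong separator, or mangled headers.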
## Conclusion
<font size="4">
[Back to Course program](/StarRter/)
</font>