# HOW TO TUTORIAL: Send Data From `R` to `PostgreSQL` 

`with Mr Fugu Data Science`

# *(◕‿◕✿)*    

[Github](https://github.com/MrFuguDataScience) | [Youtube](https://www.youtube.com/channel/UCbni-TDI-Ub8VlGaP8HLTNw?view_as=subscriber)

`_____________________________________`


# Purpose & Outcome:     
    

    1.) Send Dataframe --> PostgreSQL

    2.) Send .CSV --> PostgreSQL


*then*
+ we will Query PostgreSQL 
    + export data from PostgreSQL into R
    
    
We will implement a `DSN` credential file to keep our `USER` inputs (*password, user name, etc.*) out of the code. There are many ways of doing this; I will list a few and use one of them. The credential file will be written in `YAML`.
    
`_________________________`


[RpostgreSQL](https://cran.r-project.org/web/packages/RPostgreSQL/RPostgreSQL.pdf) | [DSN setup](https://db.rstudio.com/best-practices/portable-code/)

In [None]:
# install.packages("randomNames")  # install if you don't already have them
# install.packages("config")
# install.packages("RPostgreSQL")

In [1]:
library(config)      # Read credentials from an external YAML file

library(randomNames) # Random name generation
library(generator)   # Fake personal information

library(RPostgreSQL) # PostgreSQL driver for DBI
library(DBI)         # Generic database interface (dbConnect, dbWriteTable, ...)

library(knitr)       # Notebook/report rendering
library(markdown)    # Render markdown documents
library(tidyverse)   # Data wrangling (also attaches dplyr)

“package ‘config’ was built under R version 3.4.4”

Attaching package: ‘config’


The following objects are masked from ‘package:base’:

    get, merge


“package ‘randomNames’ was built under R version 3.4.4”
Loading required package: DBI

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.0     ✔ purrr   0.3.3
✔ tibble  3.0.0     ✔ dplyr   0.8.5
✔ tidyr   1.0.2     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0

“package ‘readr’ was built under R version 3.4.4”
“package ‘stringr’ was built under R version 3.4.4”
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



# From randomNames():


| Race                                  	| Gender     	|
|---------------------------------------	|------------	|
| 1 = American Indian or Native Alaskan 	| 1 = Female 	|
| 2 = Asian or Pacific Islander         	| 0 = Male   	|
| 3 = Black (not Hispanic)              	|            	|
| 4 = Hispanic                          	|            	|
| 5 = White (not Hispanic)              	|            	|
| 6 = Middle-Eastern, Arabic            	|            	|

In [2]:
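# First names, returned together with the gender and race codes used to draw them
first_name <- randomNames(n = 6000, return.complete.data = TRUE, which.names = "first")
# Last names only, sampled without replacement
last_name <- list(randomNames(n = 6000, return.complete.data = FALSE, which.names = "last",
                              sample.with.replacement = FALSE))

# Combine into one list of four columns: gender, race, first name, last name
user_info <- list(c(first_name, last_name))

# Flatten to a 4-column data frame; factors are kept on purpose for the next steps
usr <- as.data.frame(matrix(unlist(cbind(user_info)), ncol = 4), stringsAsFactors = TRUE)

head(usr)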

|   | V1 `<fct>` | V2 `<fct>` | V3 `<fct>` | V4 `<fct>` |
|---|---|---|---|---|
| 1 | 1 | 5 | Breanna | Archambault |
| 2 | 1 | 2 | Emily | Sauve |
| 3 | 1 | 4 | Jocelyn | Valencia |
| 4 | 1 | 3 | Helena | Revello |
| 5 | 0 | 5 | Maxx | Mangels |
| 6 | 1 | 6 | Saamyya | Mcbride |


In [3]:
# Create Column Names:

names(usr) <- c("gender", "race", "first_name", "last_name")

head(usr)

|   | gender `<fct>` | race `<fct>` | first_name `<fct>` | last_name `<fct>` |
|---|---|---|---|---|
| 1 | 1 | 5 | Breanna | Archambault |
| 2 | 1 | 2 | Emily | Sauve |
| 3 | 1 | 4 | Jocelyn | Valencia |
| 4 | 1 | 3 | Helena | Revello |
| 5 | 0 | 5 | Maxx | Mangels |
| 6 | 1 | 6 | Saamyya | Mcbride |


In [4]:
# Convert codes -> labels (mapping follows the randomNames() code table above):

usr$gender <- factor(usr$gender, levels = c("0", "1"),
                     labels = c("Male", "Female"))

usr$race <- factor(usr$race, levels = as.character(1:6),
                   labels = c("American Indian", "Asian", "Black",
                              "Hispanic", "White", "Middle Eastern"))

usr <- data.frame(lapply(usr, as.character), stringsAsFactors = FALSE)

head(usr)

|   | gender `<chr>` | race `<chr>` | first_name `<chr>` | last_name `<chr>` |
|---|---|---|---|---|
| 1 | Female | White | Breanna | Archambault |
| 2 | Female | Asian | Emily | Sauve |
| 3 | Female | Hispanic | Jocelyn | Valencia |
| 4 | Female | Black | Helena | Revello |
| 5 | Male | White | Maxx | Mangels |
| 6 | Female | Middle Eastern | Saamyya | Mcbride |


# Securing Credentials:

+ Keeping credentials out of the code avoids publishing them in *plain text*. `R` offers several ways to handle them safely.

    **Referenced from the RStudio database documentation:**

    * Integrated security without DSN

    * Encrypt credentials with the keyring package

    * Use a configuration file with the config package

    * Environment variables using the .Renviron file

    * Using the options base R command

    * Prompt for credentials using the RStudio IDE


[R doc maintaining credentials](https://db.rstudio.com/best-practices/managing-credentials/)
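The *environment variables* option can be sketched in base R. This is a minimal sketch, assuming hypothetical variable names `PG_USER`, `PG_PASSWORD`, `PG_HOST`, `PG_DBNAME` and `PG_PORT` that you would define yourself in `~/.Renviron` (one `NAME=value` per line, then restart R); the helper `get_pg_credentials()` is also made up for illustration.

```r
# Hypothetical helper: collect PostgreSQL credentials from environment variables
# (e.g. set in ~/.Renviron) so they never appear in the script itself.
get_pg_credentials <- function(prefix = "PG_") {
  keys <- c("user", "password", "host", "dbname", "port")
  vars <- paste0(prefix, toupper(keys))          # PG_USER, PG_PASSWORD, ...
  vals <- Sys.getenv(vars, unset = NA)           # NA for variables that are not set
  missing <- vars[is.na(vals) | vals == ""]
  if (length(missing) > 0) {
    stop("Missing environment variable(s): ", paste(missing, collapse = ", "))
  }
  creds <- as.list(unname(vals))
  names(creds) <- keys
  creds$port <- as.integer(creds$port)           # store the port as an integer
  creds
}
```

With the variables set, the connection would be `creds <- get_pg_credentials()` followed by `dbConnect(drvr, user = creds$user, password = creds$password, host = creds$host, dbname = creds$dbname, port = creds$port)`.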

# Create and Load the Config File: SECURING CREDENTIALS

Here I will make a `config.yml` file and read its `datawarehouse` section into an R object called `dsn`.

Use any text editor. I used 2-space indentation; you can use more, but *stay consistent* (YAML is indentation-sensitive).

(**TYPE WHAT IS BETWEEN THE DIVIDERS AND SAVE THE FILE AS `config.yml`**)

`____________________`

default:

  datawarehouse:
  
    driver: 'Postgresql'
    user: 'Your_User_name'
    password: 'your_password'
    host: 'localhost'
    dbname: 'your_database'
    port: 5432


`___________________________`

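1.) Check which port your PostgreSQL server listens on: `5432` is the default, but change `port` if your server uses a different one.

2.) `datawarehouse` is a name I made up; it is the key you pass to `config::get()`.

3.) Keep this file in your working directory: `config::get()` looks for `config.yml` there (and in parent directories, since `use_parent = TRUE` by default). Otherwise, point to it explicitly:
 `dsn <- config::get("datawarehouse", file = "conf/config.yml")`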
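To check the file format before pointing it at a real database, the sketch below writes the same YAML to a temporary `config.yml` (placeholder values only) and reads it back with `config::get()`, passing the path through the `file` argument. It assumes the `config` package is installed; `cfg_path` is just an illustrative name.

```r
# Write an example config.yml to a temporary directory (placeholder values,
# not real credentials) so the sketch runs anywhere.
cfg_path <- file.path(tempdir(), "config.yml")
writeLines(c(
  "default:",
  "  datawarehouse:",
  "    driver: 'Postgresql'",
  "    user: 'Your_User_name'",
  "    password: 'your_password'",
  "    host: 'localhost'",
  "    dbname: 'your_database'",
  "    port: 5432"
), cfg_path)

# config::get(value, file = ...) returns the named section as a list
dsn <- config::get("datawarehouse", file = cfg_path)
str(dsn)
```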

In [5]:
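# SETTING UP A CONNECTION TO PostgreSQL:

drvr <- dbDriver("PostgreSQL")        # load the RPostgreSQL driver

dsn <- config::get("datawarehouse")   # read the 'datawarehouse' section of config.yml

# Pass each credential as a named argument ('driver' is not a dbConnect() argument)
conn <- dbConnect(drvr,
                  user     = dsn$user,
                  password = dsn$password,
                  host     = dsn$host,
                  dbname   = dsn$dbname,
                  port     = dsn$port)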


# Create a `Table` to work with (*skip this* if you already have a suitable table)

In [6]:
# List Current Tables:

dbListTables(conn)

In [8]:
# Write the data frame to the database
dbWriteTable(conn, name = "fake_r_users",
             value = usr, row.names = FALSE,append=TRUE)


In [9]:
# Drop the test table so it can be recreated with an explicit schema
dbRemoveTable(conn, "fake_r_users")


In [10]:
dbListTables(conn)

In [11]:
# Table SCHEMA (unquoted names are folded to lowercase: fake_r_users, gender, race):

res_ <- dbSendQuery(conn, "CREATE TABLE fake_R_users(
  Gender TEXT,
  Race TEXT,
  first_name TEXT,
  last_name TEXT)")
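Instead of typing the `CREATE TABLE` statement by hand, DBI can generate it from a data frame with `sqlCreateTable()`. The sketch below uses `DBI::ANSI()`, a generic ANSI-SQL dialect object, so it runs without a live connection; with a real connection you would pass `conn` instead, and the column types chosen for PostgreSQL may differ. The toy data frame is made up for illustration.

```r
library(DBI)

# Toy data frame with the same four character columns as usr (values are made up)
toy <- data.frame(gender = "Female", race = "White",
                  first_name = "Ana", last_name = "Lee",
                  stringsAsFactors = FALSE)

# Generate (but do not run) the DDL; row.names = FALSE avoids an extra row_names column
ddl <- as.character(sqlCreateTable(ANSI(), "fake_r_users", toy, row.names = FALSE))
cat(ddl)
```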



In [12]:
res <-dbSendQuery(conn, "SELECT COUNT(*) FROM fake_r_users")
dbFetch(res)

|   | count `<dbl>` |
|---|---|
| 1 | 0 |


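# IMPORTANT NOTE: `capitalized table and column names`

PostgreSQL folds unquoted identifiers to lowercase, so the table above is created as `fake_r_users` with columns `gender` and `race`. The data frame's column names must match those lowercase names; a name that really contains capitals must be double-quoted in SQL (e.g. `"Gender"`).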
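Both fixes can be sketched without a database connection. The data frame `df` is hypothetical; `dbQuoteIdentifier()` with `DBI::ANSI()` shows the standard double-quote form, and with a live connection you would pass `conn` instead so the driver applies its own quoting rules.

```r
library(DBI)

# Hypothetical data frame whose column names start with capitals
df <- data.frame(Gender = "Female", Race = "White", stringsAsFactors = FALSE)

# Option 1: lower-case the names so they match PostgreSQL's folded names
names(df) <- tolower(names(df))

# Option 2: keep the capital and double-quote it when writing SQL by hand
quoted <- as.character(dbQuoteIdentifier(ANSI(), "Gender"))
sql <- paste0("SELECT ", quoted, " FROM fake_r_users")
```

Option 2 only helps if the column really was created with a quoted, capitalized name; against the table above, `"Gender"` would not match `gender`.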

`_____________________________________________`


# _______SENDING `DF` --> `PostgreSQL`______

In [13]:
# Take Data Frame and send to PostgreSQL:

dbWriteTable(conn, name = "fake_r_users",
             value = usr, row.names = FALSE,append=TRUE)


# Check what is in our database now: 

+ We are doing a `QUERY` and `EXPORTING DATA from PSQL --> R` 
    + The data is `IMPORTED` as a `DF`

In [17]:
# Today's NEW TABLE:

res_o <- dbSendQuery(conn, "SELECT * FROM fake_r_users")
# fetch all remaining records (n = -1)
data_ <- dbFetch(res_o, n = -1)
dbClearResult(res_o)   # release the result set
head(data_)

|   | gender `<chr>` | race `<chr>` | first_name `<chr>` | last_name `<chr>` |
|---|---|---|---|---|
| 1 | Female | White | Breanna | Archambault |
| 2 | Female | Asian | Emily | Sauve |
| 3 | Female | Hispanic | Jocelyn | Valencia |
| 4 | Female | Black | Helena | Revello |
| 5 | Male | White | Maxx | Mangels |
| 6 | Female | Middle Eastern | Saamyya | Mcbride |
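The write-then-read round trip above can be tried without a PostgreSQL server. This sketch swaps in an in-memory SQLite database via the `RSQLite` package (an assumption: it must be installed) and uses only generic DBI calls; `dbGetQuery()` combines `dbSendQuery()`, `dbFetch()` and `dbClearResult()` in one step. The toy rows are made up.

```r
library(DBI)

# Assumption: RSQLite stands in for PostgreSQL so this runs without a server
con <- dbConnect(RSQLite::SQLite(), ":memory:")

toy <- data.frame(
  gender     = c("Female", "Male"),
  race       = c("White", "Asian"),
  first_name = c("Ana", "Ben"),
  last_name  = c("Lee", "Kim"),
  stringsAsFactors = FALSE
)

dbWriteTable(con, "fake_r_users", toy, row.names = FALSE)                # create + insert
dbWriteTable(con, "fake_r_users", toy, row.names = FALSE, append = TRUE) # append again

# dbGetQuery() sends the query, fetches every row and clears the result
n_rows <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM fake_r_users")$n
back   <- dbGetQuery(con, "SELECT * FROM fake_r_users")

dbDisconnect(con)
```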


# Alternate way to send a DF --> PostgreSQL:

[sqldf package documentation](https://www.rdocumentation.org/packages/sqldf/versions/0.4-11)


`_________________________________________`

# Send  `.CSV --> PostgreSQL` :

In [18]:
# Write the data frame to a .CSV file (no row names, so it has exactly 4 columns):

write.csv(usr, 'r_psql.csv', row.names = FALSE)

# check that the file is in the current directory:
list.files(path = ".")
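Why `row.names = FALSE` matters: by default `write.csv()` adds an unnamed first column of row names, and the 4-column table would then not match a 5-column file. A small sketch with a made-up data frame written to temporary files:

```r
# Toy data frame with the same four columns as usr (values are made up)
toy <- data.frame(gender = c("Female", "Male"), race = c("White", "Asian"),
                  first_name = c("Ana", "Ben"), last_name = c("Lee", "Kim"),
                  stringsAsFactors = FALSE)

with_rn    <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")
write.csv(toy, with_rn)                         # default: row names are written
write.csv(toy, without_rn, row.names = FALSE)   # only the four data columns

# The header line reveals the extra unnamed column when row names are kept
hdr_with    <- readLines(with_rn, n = 1)
hdr_without <- readLines(without_rn, n = 1)
```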


In [19]:
#getting the ABSOLUTE PATH of the .CSV I need:

library(tools)
r_psql_csv_path <-file_path_as_absolute('r_psql.csv')
#then paste the address into query
r_psql_csv_path

In [20]:
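# Server-side COPY: the PostgreSQL server itself reads the file, so the path must be
# visible to the server, and the user needs superuser rights or (PostgreSQL 11+)
# the pg_read_server_files role.
rs_ <- dbSendQuery(conn,
                   statement = sprintf("COPY fake_r_users FROM '%s' DELIMITER ',' CSV HEADER",
                                       r_psql_csv_path))

dbClearResult(rs_)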
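Pasting a file path into SQL breaks if the path contains a single quote. A small, hypothetical helper (`build_copy_sql()` is not part of any package) can double any embedded quotes, which is the standard SQL string-literal escape, before building the same `COPY` statement:

```r
# Hypothetical helper: build a server-side COPY statement for a CSV with a header row
build_copy_sql <- function(table, csv_path) {
  # escape single quotes by doubling them (standard SQL string-literal rule)
  path_literal <- paste0("'", gsub("'", "''", csv_path, fixed = TRUE), "'")
  sprintf("COPY %s FROM %s DELIMITER ',' CSV HEADER", table, path_literal)
}

build_copy_sql("fake_r_users", "/tmp/r_psql.csv")
```

It would be used as `rs_ <- dbSendQuery(conn, build_copy_sql("fake_r_users", r_psql_csv_path))`. If the server cannot read the file (for example, a remote database), the usual alternative is psql's client-side `\copy`, or sending the data frame with `dbWriteTable()` as in the previous section.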


In [21]:
s <-dbSendQuery(conn, "SELECT COUNT(*) FROM fake_r_users")
dbFetch(s)

|   | count `<dbl>` |
|---|---|
| 1 | 12000 |


In [23]:
# Clean up in order: result sets -> connection -> driver.
# Unloading the driver first fails with:
# "RS-DBI driver: (There are opened connections -- close them first)"
for (r in dbListResults(conn)) dbClearResult(r)
dbDisconnect(conn)
dbUnloadDriver(drvr)
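To avoid the ordering problem altogether, a common R pattern is to open the connection inside a function and register `dbDisconnect()` with `on.exit()`, so it runs even if an error occurs. The sketch below again assumes `RSQLite` as a stand-in backend; `with_db()` is a hypothetical helper name, and with RPostgreSQL you would replace the `dbConnect()` line with the `dbConnect(drvr, ...)` call from earlier.

```r
library(DBI)

# Hypothetical helper: open a connection, run f(con), always disconnect afterwards
with_db <- function(f) {
  con <- dbConnect(RSQLite::SQLite(), ":memory:")
  on.exit(dbDisconnect(con), add = TRUE)   # runs on normal exit and on error
  f(con)
}

answer <- with_db(function(con) dbGetQuery(con, "SELECT 1 AS x")$x)
```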


# Citations  ◔̯◔

- [randomNames reference manual](https://cran.r-project.org/web/packages/randomNames/randomNames.pdf)

- [config on GitHub](https://github.com/rstudio/config) (config files)

- [RStudio: managing credentials](https://db.rstudio.com/best-practices/managing-credentials/) (safely sending data, DSN setup)

- [RPostgreSQL reference manual](https://cran.r-project.org/web/packages/RPostgreSQL/RPostgreSQL.pdf)

- [YAML tutorial](https://rollout.io/blog/yaml-tutorial-everything-you-need-get-started/) (YAML formatting)

- [Stack Overflow: RPostgreSQL import dataframe into a table](https://stackoverflow.com/questions/33634713/rpostgresql-import-dataframe-into-a-table)

- [RStudio: portable code best practices](https://db.rstudio.com/best-practices/portable-code/)

- [config introduction vignette](https://cran.r-project.org/web/packages/config/vignettes/introduction.html) (config setup)