Spark

Anything Spark

Cheatsheets

SparkR

  • Git Bash Log In: ssh edge.db.co.com -l workid

PySpark

winutils

System Commands/Linux

General Code

# Stop Spark: kept at the top for easy access, because an open session keeps holding cluster resources
sparkR.stop()
Sys.getenv("SPARK_HOME") 
Sys.getenv("HADOOP_CONF_DIR")
Sys.getenv("SPARK_CONF_DIR")
Sys.setenv(SPARK_HOME = "/usr/share/spark-2.3.0")  # alternative location: "/usr/local/"
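# Optional sanity check before loading SparkR (a sketch, not part of the original notes):
# confirm that SPARK_HOME is set and that the install ships an R/lib folder.
# `sparkr_lib` is an illustrative variable name; warnings are used so the script keeps running.
sparkr_lib <- file.path(Sys.getenv("SPARK_HOME"), "R", "lib")
if (!nzchar(Sys.getenv("SPARK_HOME"))) {
  warning("SPARK_HOME is not set")
} else if (!dir.exists(sparkr_lib)) {
  warning("No SparkR library folder found at: ", sparkr_lib)
}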
# Load SparkR from the R/lib folder of the Spark install at SPARK_HOME
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
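SparkR::sql("SELECT * FROM df_v")  # run a SQL query (df_v is the temp view registered below); returns a SparkDataFrame
SparkR::take(df, 10)               # first 10 rows as a local data.frame
SparkR::collect(df)                # the whole SparkDataFrame as a local data.frame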
SparkR::createOrReplaceTempView(df,"df_v")
SparkR::persist(df,"MEMORY_AND_DISK")
SparkR::saveAsTable(df, "schema.df", mode = "overwrite")
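# How the calls above can fit together (a minimal sketch, assuming a working Spark install
# and SparkR already loaded). R's built-in `faithful` dataset stands in for real data;
# the names `faithful_sdf`, `faithful_v`, `long_waits` and `local_df` are illustrative.
SparkR::sparkR.session()                                   # start or reuse a Spark session
faithful_sdf <- SparkR::createDataFrame(faithful)          # local data.frame -> SparkDataFrame
SparkR::createOrReplaceTempView(faithful_sdf, "faithful_v")  # expose it to SQL
long_waits <- SparkR::sql("SELECT * FROM faithful_v WHERE waiting > 70")
SparkR::take(long_waits, 5)                                # peek at a few rows locally
local_df <- SparkR::collect(long_waits)                    # pull the full result to the driver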
# Management
properties <- sql("SET -v")
showDF(properties, numRows = 200, truncate = FALSE)
# Standardized Workflow