Spark_Notes

Notes and code tips about and around Apache Spark

| Note | Short description |
|------|-------------------|
| Spark: Miscellaneous Commands and Tips | Miscellaneous info, commands, configurations and tips for Spark. |
| Spark Memory Configuration | Info on how to configure Spark executor memory |
| Spark: Profiling and Flame Graphs using Pyroscope | How to use Pyroscope with Spark for performance troubleshooting with profiling and Flame Graph visualization |
| Spark: How to generate histograms | Examples of how to generate frequency histograms using Spark DataFrame API and Spark SQL |
| Spark Performance Tool sparkMeasure | Examples of how to use sparkMeasure to collect and display Spark metrics. |
| Spark and OpenSearch | Notes and examples on how to use Spark with OpenSearch |
| Spark mapInArrow | Examples of mapInArrow for Spark UDF and array processing. |
| Spark EventLog | Example code to read and analyze Spark EventLog data using Spark SQL. |
| Spark SQL: UDF Fun Examples With Mandelbrot Set | Mandelbrot set with Spark SQL: examples of Spark SQL and UDF, code in Python and Scala + some eye candy. |
| Spark: How To Read Oracle Tables | How to read Oracle tables into Spark dataframes using JDBC. Use this to transfer data from Oracle to Parquet. With additional notes on performance and Apache Sqoop. |
| Spark and YARN: How to Set a Custom Java Home | How to use a custom Java Home/Version for Spark executors on YARN. |
| Spark: How to deploy Kerberos TGT to the Executors | Example code of how to access Kerberized resources from Spark jobs/executors. |
| Spark Task Metrics | Short description of Spark task metrics |
| Scala Project and Spark SQL | A basic example of a working Scala project using Spark SQL |
| Spark HBase_Connector | How to access HBase from Spark |
| Spark_TFRecord_HowTo | How to convert data into TFRecord (TensorFlow's native) format using Spark |
| Tools for Apache Parquet Diagnostics | Examples of Parquet diagnostic tools: parquet-tools and parquet_reader. |
| Tools: Measure OS CPU, Disk, and Network on Linux | Notes and examples of OS tools for diagnostics and troubleshooting on Linux |
| Tools: Measure Linux Memory Performance | Notes and examples of tools for measuring CPU-bound workload and memory throughput on Linux |
| Tools: Spark and Linux Flame Graph | Notes and examples of tools for stack profiling and Flame Graph visualization relevant for Spark (Java/JVM) on Linux |