Write an Excel document using the Spark2 datasource API
This is a Spark 2 datasource application that demonstrates some of the capabilities of the hadoopoffice library: it writes Excel files containing formulas and comments. It has been tested successfully with the HDP Sandbox VM 2.5; other Hadoop distributions should work as well, provided they support Spark 2.
Building the example
Clone the hadoopoffice repository:
git clone https://github.com/ZuInnoTe/hadoopoffice.git hadoopoffice
You can build the application by changing to the directory hadoopoffice/examples/scala-spark2-excel-out-ds and using the following command:
sbt clean +assembly
Running the example
Before you run the example, make sure that the output directory does not exist. You can remove it with:
hadoop fs -rm -R /user/spark/output
Run the following command (make sure that you use the spark-submit of Spark 2):
spark-submit --class org.zuinnote.spark.office.example.excel.SparkScalaExcelOutDataSource ./example-ho-spark-scala-ds-excelout.jar /user/spark/output/
After the Spark 2 job has completed, you will find the Excel file on HDFS. To open it in Excel or LibreOffice Calc, copy it to your local filesystem with the following command:
hadoop fs -copyToLocal /user/spark/output/part-m-00000.xlsx
The Excel file contains cells with numeric values, a formula that adds up some of the cells, and a comment.
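The run steps above can be bundled into a small shell helper that you source into your session. This is a sketch, not part of the hadoopoffice repository: the function names run_excel_example and _excel_example_step, their optional arguments and the DRY_RUN switch are hypothetical. It assumes the jar name used above and that the spark-submit found first on your PATH belongs to Spark 2. Instead of plain hadoop fs -rm -R it uses hadoop fs -rm -f -R, where -f keeps the command from failing when the output directory does not exist yet.

```shell
# Hypothetical helper, not part of the hadoopoffice repository.
# Usage: run_excel_example [hdfs-output-dir] [path-to-jar]
# Set DRY_RUN=1 to print each command instead of executing it.

# Executes its arguments as a command, or only prints them when DRY_RUN=1.
_excel_example_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run_excel_example() {
  local output_dir="${1:-/user/spark/output}"
  local jar="${2:-./example-ho-spark-scala-ds-excelout.jar}"
  local main_class="org.zuinnote.spark.office.example.excel.SparkScalaExcelOutDataSource"

  # 1. Remove the output directory; -f suppresses the error if it does not exist.
  _excel_example_step hadoop fs -rm -f -R "$output_dir" || return 1
  # 2. Submit the job; the spark-submit found first on PATH must belong to Spark 2.
  _excel_example_step spark-submit --class "$main_class" "$jar" "$output_dir" || return 1
  # 3. Copy the generated workbook into the current local directory.
  _excel_example_step hadoop fs -copyToLocal "$output_dir/part-m-00000.xlsx" . || return 1
}
```

For example, run `DRY_RUN=1 run_excel_example` first to check the generated commands, then `run_excel_example /user/spark/output` to execute them. The helper stops at the first failing step with `|| return 1` rather than `set -e`, because `set -e` in a sourced file would also apply to your interactive shell.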