Apache Spark for High Energy Physics
This page collects links and examples on using Spark to read and process HEP data.
LHCb Open Data Analysis Using PySpark
- Jupyter notebook on GitHub at: LHCb_OpenData_Spark.ipynb
- CERN SWAN service users can run the notebook from this link to CERN Box
CMS Big Data project
- Example notebook using CMS opendata, spark-root and Hadoop-XRootD: CMS_BigData_Opendata_Spark_Example1.ipynb
Relevant technology and links:
- spark-root: a library to read HEP files in ROOT format into Spark DataFrames.
- Spark SQL, DataFrames and Datasets Guide
- CERN Open Data portal
- LHCb Open Data project
- CERN SWAN service
- CMS Big Data project
- Hadoop-XRootD connector: allows reading files from EOS using the XRootD protocol (currently available on CERN GitLab, will be available on GitHub)
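
As a rough illustration of how spark-root and the Hadoop-XRootD connector fit together, the sketch below builds an XRootD URL and hands it to Spark's generic DataFrame reader. It is a minimal sketch under stated assumptions, not the method used in the notebooks above. The data source name "org.dianahep.sparkroot" is the one the spark-root project is understood to register. The helper names xrootd_url and read_root_file are hypothetical. It also assumes that both the spark-root and Hadoop-XRootD jars are already on the Spark classpath.

```python
# Data source name under which spark-root is understood to register itself
# (an assumption; check the spark-root documentation for your version).
SPARK_ROOT_FORMAT = "org.dianahep.sparkroot"


def xrootd_url(host: str, path: str) -> str:
    """Build an XRootD URL of the form root://<host>//<absolute path>.

    The double slash after the host marks the start of an absolute path.
    """
    return f"root://{host}//{path.lstrip('/')}"


def read_root_file(spark, url: str):
    """Load a ROOT file into a Spark DataFrame via spark-root.

    `spark` is an existing SparkSession. Reading a root:// URL only works
    if the Hadoop-XRootD connector is available to Spark; otherwise pass a
    local or HDFS path instead.
    """
    return spark.read.format(SPARK_ROOT_FORMAT).load(url)
```

With a SparkSession named spark, a call such as read_root_file(spark, xrootd_url("eospublic.cern.ch", "/eos/opendata/cms/example.root")) would return a DataFrame whose columns correspond to the branches of the ROOT tree. The host and file path in that call are illustrative placeholders, not real dataset locations.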