<div style="text-align:center"><h1>Integration of SWAN with NXCals</h1></div>
<hr style="border-top-width: 4px; border-top-color: #34609b;">

This notebook shows how to use __Spark in SWAN to access data from the CERN accelerator logging service (NXCals)__.

### Connect to the cluster (NXCals)
In the SWAN configuration menu:
- Choose the NXCals project software stack
- Choose the NXCals Hadoop cluster

To connect to the cluster, click the star button at the top of the notebook and follow the instructions:
* The star button appears only if you selected a Spark cluster in the configuration.
* The star button becomes active once the notebook kernel is ready.
* Select the NXCALS configuration bundle.
* Access to the cluster and to NXCALS data is controlled by the acc-logging-team; to request access, contact acc-logging-team@cern.ch.


# NXCals DataExtraction API - Examples
See NXCals API documentation at: http://nxcals-docs.web.cern.ch/current/

In [1]:
from nxcals.api.extraction.data.builders import DataQuery
from pyspark.sql.functions import col

## Extract scalar values

In [2]:
# This example shows how to extract scalar values

df1 = ( DataQuery.builder(spark).byEntities()
          .system('CMW') 
          .startTime('2023-06-29 00:00:00.000')
          .endTime('2023-06-30 00:00:00.000')
          .entity()
          .keyValues({'device': 'LHC.LUMISERVER', 'property': 'CrossingAngleIP1'})
          .build()
      )

23/12/13 19:42:34 WARN URLConfigurationSource: No URLs will be polled as dynamic configuration sources.
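
The query above takes its time window as strings of the form `YYYY-MM-DD HH:MM:SS.mmm`. When the window has to be computed rather than typed by hand, the strings can be generated from `datetime` objects. The following is a minimal sketch using only the standard library; `format_query_time` and `time_window` are hypothetical helper names, not part of the NXCALS API, and the sketch assumes the builder accepts exactly the string format shown in the cell above.

```python
from datetime import datetime, timedelta


def format_query_time(moment):
    """Format a datetime as 'YYYY-MM-DD HH:MM:SS.mmm' (millisecond precision)."""
    # %f yields six digits of microseconds; dropping the last three leaves milliseconds.
    return moment.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]


def time_window(start, hours):
    """Return (start, end) strings for a window of the given length in hours."""
    end = start + timedelta(hours=hours)
    return format_query_time(start), format_query_time(end)


# Reproduce the 24-hour window used in the scalar example.
start_str, end_str = time_window(datetime(2023, 6, 29), 24)
print(start_str)
print(end_str)
```

The resulting strings can then be passed to `.startTime(...)` and `.endTime(...)` in place of the literals.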


In [3]:
df1.printSchema()

root
 |-- DeltaCrossingAngle: double (nullable = true)
 |-- Moving: boolean (nullable = true)
 |-- __record_timestamp__: long (nullable = true)
 |-- __record_version__: long (nullable = true)
 |-- acqStamp: long (nullable = true)
 |-- class: string (nullable = true)
 |-- cyclestamp: long (nullable = true)
 |-- device: string (nullable = true)
 |-- property: string (nullable = true)
 |-- selector: string (nullable = true)
 |-- nxcals_entity_id: long (nullable = true)



In [4]:
df1.limit(3).toPandas()

23/12/13 19:42:42 WARN CheckAllocator: More than one DefaultAllocationManager on classpath. Choosing first found


| | DeltaCrossingAngle | Moving | `__record_timestamp__` | `__record_version__` | acqStamp | class | cyclestamp | device | property | selector | nxcals_entity_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -75.0 | True | 1688037433065102000 | 0 | 1688037433065102000 | LhcLumiscan | 0 | LHC.LUMISERVER | CrossingAngleIP1 | | 57336 |
| 1 | -75.0 | True | 1688037433066202000 | 0 | 1688037433066202000 | LhcLumiscan | 0 | LHC.LUMISERVER | CrossingAngleIP1 | | 57336 |
| 2 | -75.0 | True | 1688037433067349000 | 0 | 1688037433067349000 | LhcLumiscan | 0 | LHC.LUMISERVER | CrossingAngleIP1 | | 57336 |


## Extract vector values

In [5]:
# This example shows how to extract vector values

df2 = ( DataQuery.builder(spark).byVariables()
         .system('CMW') 
         .startTime('2018-05-21 00:00:00.000')
         .endTime('2018-05-21 00:05:00.000')
         .variable('SPS.BCTDC.51895:TOTAL_INTENSITY')
         .build()
      )

In [6]:
df2.printSchema()

root
 |-- nxcals_value: struct (nullable = true)
 |    |-- elements: array (nullable = true)
 |    |    |-- element: double (containsNull = true)
 |    |-- dimensions: array (nullable = true)
 |    |    |-- element: integer (containsNull = true)
 |-- nxcals_entity_id: long (nullable = true)
 |-- nxcals_timestamp: long (nullable = true)
 |-- nxcals_variable_name: string (nullable = true)



In [7]:
df2.schema.fields

[StructField('nxcals_value', StructType([StructField('elements', ArrayType(DoubleType(), True), True), StructField('dimensions', ArrayType(IntegerType(), True), True)]), True),
 StructField('nxcals_entity_id', LongType(), True),
 StructField('nxcals_timestamp', LongType(), True),
 StructField('nxcals_variable_name', StringType(), True)]

In [8]:
# Flatten the struct: expose the value array and its dimensions, then keep only the values
elements = (df2
              .withColumn("nx_elements", col("nxcals_value.elements"))
              .withColumn("nx_dimensions", col("nxcals_value.dimensions"))
              .select("nx_elements")
           )
elements.take(3)

[Row(nx_elements=[0.2579849, 0.28976566, 0.30659077, 0.29101196, 0.27730262, 0.27481002, 0.25362283, 0.22246526, 0.21748003, 0.23804405, 0.2517534, 0.24053666, 0.2361746, 0.24926077, 0.25611547, 0.25362283, 0.26421642, 0.2866499, 0.2891425, 0.26172382, 0.24739133, 0.25611547, 0.26172382, 0.24801446, 0.24863762, 0.26920164, 0.27605632, 0.26297012, 0.25985438, 0.2735637, 0.28353414, 0.2766795, 0.28041837, 0.29662034, 0.29038882, 0.26421642, 0.24427556, 0.24302925, 0.23181254, 0.21498741, 0.23243569, 0.2723174, 0.2854036, 0.27543315, 0.2698248, 0.27481002, 0.254246, 0.225581, 0.23181254, 0.2710711, 0.2928814, 0.28415728, 0.27854893, 0.29101196, 0.29599717, 0.2891425, 0.29599717, 0.3165612, 0.31095284, 0.2735637, 0.2436524, 0.2374209, 0.2374209, 0.22869675, 0.24926077, 0.28478044, 0.29599717, 0.27481002, 0.25860804, 0.26920164, 0.28104153, 0.27854893, 0.28353414, 0.29973608, 0.29350457, 0.25611547, 0.22495785, 0.22371154, 0.23243569, 0.23679775, 0.26110068, 0.3022287, 0.32528532, 0.3146917

## Extract matrix values


In [9]:
# This example shows how to extract matrix values

df3 = ( DataQuery.builder(spark).byVariables()
          .system('CMW')
          .startTime('2018-08-15 00:00:00.000')
          .endTime('2018-08-30 00:00:00.000')
          .variable('HIE-BCAM-T2M03:RAWMEAS#NPIXELS')
          .build()
      )

In [10]:
df3.printSchema()

root
 |-- nxcals_value: struct (nullable = true)
 |    |-- elements: array (nullable = true)
 |    |    |-- element: long (containsNull = true)
 |    |-- dimensions: array (nullable = true)
 |    |    |-- element: integer (containsNull = true)
 |-- nxcals_entity_id: long (nullable = true)
 |-- nxcals_timestamp: long (nullable = true)
 |-- nxcals_variable_name: string (nullable = true)



In [11]:
df3.schema.fields

[StructField('nxcals_value', StructType([StructField('elements', ArrayType(LongType(), True), True), StructField('dimensions', ArrayType(IntegerType(), True), True)]), True),
 StructField('nxcals_entity_id', LongType(), True),
 StructField('nxcals_timestamp', LongType(), True),
 StructField('nxcals_variable_name', StringType(), True)]

In [12]:
matrices = (df3
              .withColumn("matrix", col("nxcals_value.elements"))
              .withColumn("dim1", col("nxcals_value.dimensions")[0])
              .withColumn("dim2", col("nxcals_value.dimensions")[1])
              .select("matrix", "dim1", "dim2")
           )

matrices.take(2)

[Row(matrix=[14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 10, 17, 15, 0, 0, 0, 0, 0, 0, 13, 13, 19, 17, 0, 0, 0, 0, 0, 0, 13, 13, 0, 0, 0, 0, 0, 0, 0, 0, 13, 12, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 8, 9, 8, 9, 0, 0, 0, 0, 12, 13, 10, 10, 10, 9, 0, 0, 0, 0, 13, 13, 0, 0, 0, 0, 0, 0, 0, 0, 13, 13, 0, 0, 0, 0, 0, 0, 0, 0, 12, 10, 0, 0, 0, 0, 0, 0, 0, 0, 12, 12, 0, 0, 0, 0, 0, 0, 0, 0, 13, 11, 0, 0, 0, 0, 0, 0, 0, 0, 12, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
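
Each row above carries the matrix as one flat list plus its two dimensions, so the 2D layout has to be rebuilt on the client side. The following is a minimal sketch in plain Python; `reshape_flat` is a hypothetical helper, and the sketch assumes the elements are stored in row-major order (`dim1` rows of `dim2` values each), which should be checked against the NXCALS documentation for the variable in question.

```python
def reshape_flat(elements, dim1, dim2):
    """Split a flat list into dim1 rows of dim2 values (row-major order assumed)."""
    if len(elements) != dim1 * dim2:
        raise ValueError(
            f"expected {dim1 * dim2} elements for a {dim1}x{dim2} matrix, got {len(elements)}"
        )
    return [elements[r * dim2:(r + 1) * dim2] for r in range(dim1)]


# Small illustrative input; real rows would come from matrices.take(...)
flat = [14, 0, 0, 15, 10, 17]
grid = reshape_flat(flat, 2, 3)
print(grid)
```

For a collected row this would be called as `reshape_flat(row.matrix, row.dim1, row.dim2)`. The length check catches rows whose dimensions do not match the number of stored elements.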

## Working with timestamps

In [13]:
import pandas as pd
from nxcals.api.extraction.data.builders import DataQuery

# Getting some sample data to use for demonstrating how to handle timestamps
df = ( DataQuery.builder(spark).byEntities()
         .system('CMW') 
         .startTime('2023-06-29 00:00:00.000')
         .endTime('2023-06-30 01:00:00.000') 
         .entity()
         .keyValue('device', 'LHC.LUMISERVER')
         .keyValue('property', 'CrossingAngleIP1')
         .build()
     )


In [14]:
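# Convert the acqStamp column to a readable timestamp and display the result.
# In the example data acqStamp holds nanoseconds since the Unix epoch, so it is
# divided by 1000 to obtain the microseconds that timestamp_micros expects.
df.selectExpr("timestamp_micros(cast(acqStamp/1000 as long)) as stamp", 'acqStamp').limit(5).toPandas()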


| | stamp | acqStamp |
|---|---|---|
| 0 | 2023-06-29 13:17:13.065102 | 1688037433065102000 |
| 1 | 2023-06-29 13:17:13.066202 | 1688037433066202000 |
| 2 | 2023-06-29 13:17:13.067349 | 1688037433067349000 |
| 3 | 2023-06-29 13:20:09.764569 | 1688037609764569000 |
| 4 | 2023-06-29 13:20:28.496843 | 1688037628496843000 |
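
The same conversion can be cross-checked outside Spark with the standard library. The following is a minimal sketch; `ns_to_utc` is a hypothetical helper name, not part of the NXCALS API. It returns a UTC datetime, whereas the `stamp` column above is rendered in the Spark session time zone: the first `acqStamp` corresponds to 11:17:13 UTC, so the output above appears to use a UTC+2 zone.

```python
from datetime import datetime, timedelta, timezone


def ns_to_utc(nanoseconds):
    """Convert nanoseconds since the Unix epoch to a timezone-aware UTC datetime."""
    # Split with integer arithmetic to avoid float rounding on 19-digit values.
    seconds, remainder_ns = divmod(nanoseconds, 1_000_000_000)
    # datetime resolves microseconds at best, so sub-microsecond digits are dropped.
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        microseconds=remainder_ns // 1000
    )


first_stamp = ns_to_utc(1688037433065102000)
print(first_stamp.isoformat())
```

Keeping the result timezone-aware makes the offset explicit when comparing against values displayed by Spark or pandas.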


In [15]:
spark.stop()

INFO:SparkMonitorKernel:Scala socket closed - empty data
INFO:SparkMonitorKernel:Socket Exiting Client Loop
INFO:SparkMonitorKernel:Starting socket thread, going to accept
