
Engine-spark

Find below the list of components provided by the engine-spark module.


ConsoleStructuredStreamProviderService

Provides a way to print output to the console in StructuredStream streams.

Class

com.hurence.logisland.stream.spark.structured.provider.ConsoleStructuredStreamProviderService

Tags

None.

Properties

This component has no required or optional properties.

Extra informations

No additional information is provided
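
As a non-normative illustration, the sketch below shows how this service could be declared in a Logisland YAML job file and used as the output of a StructuredStream. The surrounding layout (controllerServiceConfigurations, streamConfigurations) and the service ids are assumptions based on typical Logisland job files; only the class name comes from this section.

```yaml
# Hypothetical fragment of the engine section of a job file.
# Service ids and the YAML skeleton are illustrative assumptions.
controllerServiceConfigurations:
  - controllerService: console_out            # illustrative id
    component: com.hurence.logisland.stream.spark.structured.provider.ConsoleStructuredStreamProviderService
streamConfigurations:
  - stream: debug_stream
    component: com.hurence.logisland.stream.spark.structured.StructuredStream
    configuration:
      read.stream.service.provider: some_input_service   # a provider service declared elsewhere
      write.stream.service.provider: console_out          # print records to the console
```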


DummyRecordStream

No description provided.

Class

com.hurence.logisland.stream.spark.DummyRecordStream

Tags

None.

Properties

This component has no required or optional properties.

Extra informations

No additional information is provided


KafkaConnectBaseProviderService

No description provided.

Class

com.hurence.logisland.stream.spark.provider.KafkaConnectBaseProviderService

Tags

None.

Properties

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

| Name | Description | Allowable Values | Default Value | Sensitive | EL |
| --- | --- | --- | --- | --- | --- |
| kc.connector.class | The class canonical name of the kafka connector to use. | | null | false | false |
| kc.connector.properties | The properties (key=value) for the connector. | | | false | false |
| kc.data.key.converter | Key converter class | | null | false | false |
| kc.data.key.converter.properties | Key converter properties | | | false | false |
| kc.data.value.converter | Value converter class | | null | false | false |
| kc.data.value.converter.properties | Value converter properties | | | false | false |
| kc.worker.tasks.max | Max number of threads for this connector | | 1 | false | false |
| kc.partitions.max | Max number of partitions for this connector. | | null | false | false |
| kc.connector.offset.backing.store | The underlying backing store to be used. | memory (Standalone in-memory offset backing store. Not suitable for clustered deployments unless source is unique or stateless), file (Standalone filesystem-based offset backing store. You have to specify the property offset.storage.file.filename for the file path. Not suitable for clustered deployments unless source is unique or standalone), kafka (Distributed Kafka topic based offset backing store. See the javadoc of class org.apache.kafka.connect.storage.KafkaOffsetBackingStore for the configuration options. This backing store is well suited for distributed deployments.) | memory | false | false |
| kc.connector.offset.backing.store.properties | Properties to configure the offset backing store | | | false | false |

Extra informations

No additional information is provided
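
The Sink and Source services below expose the same kc.* properties as this base service. As a hedged illustration, the sketch below configures the Source variant with a file-based offset backing store; the connector and converter classes (stock Kafka classes) and the service id are assumptions about what is available on the classpath, and only the kc.* keys and their documented semantics come from the table above.

```yaml
# Hypothetical sketch of a Kafka Connect provider service configuration.
# Only the kc.* keys come from the table above; all values are illustrative.
- controllerService: kc_source_service
  component: com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredSourceProviderService
  configuration:
    kc.connector.class: org.apache.kafka.connect.file.FileStreamSourceConnector  # example connector
    kc.connector.properties: |
      file=/tmp/input.txt
      topic=logisland_raw
    kc.data.key.converter: org.apache.kafka.connect.storage.StringConverter
    kc.data.value.converter: org.apache.kafka.connect.storage.StringConverter
    kc.worker.tasks.max: 1
    # file-based offset store: offset.storage.file.filename must be provided,
    # as noted in the allowable-values description above
    kc.connector.offset.backing.store: file
    kc.connector.offset.backing.store.properties: |
      offset.storage.file.filename=/tmp/connect.offsets
```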


KafkaConnectStructuredSinkProviderService

No description provided.

Class

com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredSinkProviderService

Tags

None.

Properties

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

| Name | Description | Allowable Values | Default Value | Sensitive | EL |
| --- | --- | --- | --- | --- | --- |
| kc.connector.class | The class canonical name of the kafka connector to use. | | null | false | false |
| kc.connector.properties | The properties (key=value) for the connector. | | | false | false |
| kc.data.key.converter | Key converter class | | null | false | false |
| kc.data.key.converter.properties | Key converter properties | | | false | false |
| kc.data.value.converter | Value converter class | | null | false | false |
| kc.data.value.converter.properties | Value converter properties | | | false | false |
| kc.worker.tasks.max | Max number of threads for this connector | | 1 | false | false |
| kc.partitions.max | Max number of partitions for this connector. | | null | false | false |
| kc.connector.offset.backing.store | The underlying backing store to be used. | memory (Standalone in-memory offset backing store. Not suitable for clustered deployments unless source is unique or stateless), file (Standalone filesystem-based offset backing store. You have to specify the property offset.storage.file.filename for the file path. Not suitable for clustered deployments unless source is unique or standalone), kafka (Distributed Kafka topic based offset backing store. See the javadoc of class org.apache.kafka.connect.storage.KafkaOffsetBackingStore for the configuration options. This backing store is well suited for distributed deployments.) | memory | false | false |
| kc.connector.offset.backing.store.properties | Properties to configure the offset backing store | | | false | false |

Extra informations

No additional information is provided


KafkaConnectStructuredSourceProviderService

No description provided.

Class

com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredSourceProviderService

Tags

None.

Properties

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

| Name | Description | Allowable Values | Default Value | Sensitive | EL |
| --- | --- | --- | --- | --- | --- |
| kc.connector.class | The class canonical name of the kafka connector to use. | | null | false | false |
| kc.connector.properties | The properties (key=value) for the connector. | | | false | false |
| kc.data.key.converter | Key converter class | | null | false | false |
| kc.data.key.converter.properties | Key converter properties | | | false | false |
| kc.data.value.converter | Value converter class | | null | false | false |
| kc.data.value.converter.properties | Value converter properties | | | false | false |
| kc.worker.tasks.max | Max number of threads for this connector | | 1 | false | false |
| kc.partitions.max | Max number of partitions for this connector. | | null | false | false |
| kc.connector.offset.backing.store | The underlying backing store to be used. | memory (Standalone in-memory offset backing store. Not suitable for clustered deployments unless source is unique or stateless), file (Standalone filesystem-based offset backing store. You have to specify the property offset.storage.file.filename for the file path. Not suitable for clustered deployments unless source is unique or standalone), kafka (Distributed Kafka topic based offset backing store. See the javadoc of class org.apache.kafka.connect.storage.KafkaOffsetBackingStore for the configuration options. This backing store is well suited for distributed deployments.) | memory | false | false |
| kc.connector.offset.backing.store.properties | Properties to configure the offset backing store | | | false | false |

Extra informations

No additional information is provided


KafkaRecordStreamDebugger

No description provided.

Class

com.hurence.logisland.stream.spark.KafkaRecordStreamDebugger

Tags

None.

Properties

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

Extra informations

No additional information is provided


KafkaRecordStreamHDFSBurner

No description provided.

Class

com.hurence.logisland.stream.spark.KafkaRecordStreamHDFSBurner

Tags

None.

Properties

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

Extra informations

No additional information is provided


KafkaRecordStreamParallelProcessing

No description provided.

Class

com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing

Tags

None.

Properties

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

Extra informations

No additional information is provided


KafkaRecordStreamSQLAggregator

This is a stream capable of interpreting SQL queries.

Class

com.hurence.logisland.stream.spark.KafkaRecordStreamSQLAggregator

Tags

stream, SQL, query, record

Properties

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

Extra informations

No additional information is provided


KafkaStreamProcessingEngine

No description provided.

Class

com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine

Tags

None.

Properties

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

| Name | Description | Allowable Values | Default Value | Sensitive | EL |
| --- | --- | --- | --- | --- | --- |
| spark.app.name | The application name | | logisland | false | false |
| spark.master | The URL of the Spark master | | local[2] | false | false |
| spark.monitoring.driver.port | The port for exposing monitoring metrics | | null | false | false |
| spark.yarn.deploy-mode | The YARN deploy mode | | null | false | false |
| spark.yarn.queue | The name of the YARN queue | | default | false | false |
| spark.driver.memory | The memory size for the Spark driver | | 512m | false | false |
| spark.executor.memory | The memory size for Spark executors | | 1g | false | false |
| spark.driver.cores | The number of cores for the Spark driver | | 4 | false | false |
| spark.executor.cores | The number of cores for Spark executors | | 1 | false | false |
| spark.executor.instances | The number of executor instances for the Spark app | | null | false | false |
| spark.serializer | Class to use for serializing objects that will be sent over the network or need to be cached in serialized form | | org.apache.spark.serializer.KryoSerializer | false | false |
| spark.streaming.blockInterval | Interval at which data received by Spark Streaming receivers is chunked into blocks of data before storing them in Spark. Minimum recommended: 50 ms | | 350 | false | false |
| spark.streaming.kafka.maxRatePerPartition | Maximum rate (number of records per second) at which data will be read from each Kafka partition | | 5000 | false | false |
| spark.streaming.batchDuration | No description provided. | | 2000 | false | false |
| spark.streaming.backpressure.enabled | Enables Spark Streaming to control the receiving rate based on the current batch scheduling delays and processing times, so that the system receives data only as fast as it can process it. | | false | false | false |
| spark.streaming.unpersist | Force RDDs generated and persisted by Spark Streaming to be automatically unpersisted from Spark's memory. The raw input data received by Spark Streaming is also automatically cleared. Setting this to false will allow the raw data and persisted RDDs to be accessible outside the streaming application, as they will not be cleared automatically, but it comes at the cost of higher memory usage in Spark. | | false | false | false |
| spark.ui.port | No description provided. | | 4050 | false | false |
| spark.streaming.timeout | No description provided. | | -1 | false | false |
| spark.streaming.kafka.maxRetries | Maximum number of retries when reading data from each Kafka partition | | 3 | false | false |
| spark.streaming.ui.retainedBatches | How many batches the Spark Streaming UI and status APIs remember before garbage collecting. | | 200 | false | false |
| spark.streaming.receiver.writeAheadLog.enable | Enable write-ahead logs for receivers. All the input data received through receivers will be saved to write-ahead logs so that it can be recovered after driver failures. | | false | false | false |
| spark.yarn.maxAppAttempts | Because the Spark driver and Application Master share a single JVM, any error in the Spark driver stops our long-running job. Fortunately it is possible to configure the maximum number of attempts that will be made to re-run the application. It is reasonable to set a value higher than the default 2 (derived from the YARN cluster property yarn.resourcemanager.am.max-attempts); 4 works quite well, while a higher value may cause unnecessary restarts even if the reason for the failure is permanent. | | 4 | false | false |
| spark.yarn.am.attemptFailuresValidityInterval | If the application runs for days or weeks without restart or redeployment on a highly utilized cluster, 4 attempts could be exhausted in a few hours. To avoid this situation, the attempt counter should be reset every hour or so. | | 1h | false | false |
| spark.yarn.max.executor.failures | The maximum number of executor failures before the application fails. By default it is max(2 * num executors, 3), which is well suited for batch jobs but not for long-running jobs. The property comes with a corresponding validity interval which should also be set (e.g. 8 * num_executors). | | 20 | false | false |
| spark.yarn.executor.failuresValidityInterval | If the application runs for days or weeks without restart or redeployment on a highly utilized cluster, the failure count could be exhausted in a few hours. To avoid this situation, the counter should be reset every hour or so. | | 1h | false | false |
| spark.task.maxFailures | For long-running jobs you could also consider boosting the maximum number of task failures before giving up on the job. By default tasks are retried 4 times and then the job fails. | | 8 | false | false |
| spark.memory.fraction | Expresses the size of M as a fraction of (JVM heap space - 300MB). The rest of the space is reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually large records. | | 0.6 | false | false |
| spark.memory.storageFraction | Expresses the size of R as a fraction of M (default 0.5). R is the storage space within M where cached blocks are immune to being evicted by execution. | | 0.5 | false | false |
| spark.scheduler.mode | The scheduling mode between jobs submitted to the same SparkContext. Can be set to FAIR to use fair sharing instead of queueing jobs one after another. Useful for multi-user services. | FAIR (fair sharing), FIFO (queueing jobs one after another) | FAIR | false | false |
| spark.properties.file.path | For using the --properties-file option while submitting the Spark job | | null | false | false |
| java.library.path | The Java library path to use with Mesos. | | null | false | false |
| spark.cores.max | The maximum number of total executor cores with Mesos. | | null | false | false |

Extra informations

No additional information is provided
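
For orientation, here is a minimal, hedged sketch of how these spark.* properties might appear in the engine block of a Logisland YAML job file. The overall YAML layout (engine, configuration, streamConfigurations) is an assumption based on typical Logisland job files; only the property keys come from the table above, and the values shown are illustrative.

```yaml
# Hypothetical sketch: KafkaStreamProcessingEngine configuration in a job file.
# Only the spark.* keys come from the table above; values are illustrative.
engine:
  component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine
  configuration:
    spark.app.name: my-logisland-job          # defaults to "logisland"
    spark.master: yarn                        # defaults to local[2]
    spark.yarn.deploy-mode: cluster
    spark.yarn.queue: default
    spark.driver.memory: 1g
    spark.executor.memory: 2g
    spark.executor.instances: 4
    spark.executor.cores: 2
    spark.streaming.batchDuration: 2000
    spark.streaming.kafka.maxRatePerPartition: 5000
    spark.streaming.backpressure.enabled: true
    spark.yarn.maxAppAttempts: 4
    spark.yarn.am.attemptFailuresValidityInterval: 1h
    spark.task.maxFailures: 8
  streamConfigurations: []                    # streams would be declared here
```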


KafkaStructuredStreamProviderService

Provides a way to use Kafka as input or output in StructuredStream streams.

Class

com.hurence.logisland.stream.spark.structured.provider.KafkaStructuredStreamProviderService

Tags

None.

Properties

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

Extra informations

No additional information is provided


LocalFileStructuredStreamProviderService

Provides a way to read a local file as input in StructuredStream streams.

Class

com.hurence.logisland.stream.spark.structured.provider.LocalFileStructuredStreamProviderService

Tags

None.

Properties

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

Extra informations

No additional information is provided


MQTTStructuredStreamProviderService

Provides a way to use MQTT as input or output in StructuredStream streams.

Class

com.hurence.logisland.stream.spark.structured.provider.MQTTStructuredStreamProviderService

Tags

None.

Properties

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

| Name | Description | Allowable Values | Default Value | Sensitive | EL |
| --- | --- | --- | --- | --- | --- |
| mqtt.broker.url | brokerUrl: the URL the MqttClient connects to. Set this (or the path option) as the URL of the MQTT server, e.g. tcp://localhost:1883 | | tcp://localhost:1883 | false | false |
| mqtt.clean.session | cleanSession: setting it to true starts a clean session, removing all checkpointed messages from a previous run of this source. | | true | false | false |
| mqtt.client.id | clientID: the client ID this source is associated with. Provide the same value to recover a stopped client. | | null | false | false |
| mqtt.connection.timeout | connectionTimeout: sets the connection timeout; a value of 0 is interpreted as "wait until the client connects". See MqttConnectOptions.setConnectionTimeout for more information | | 5000 | false | false |
| mqtt.keep.alive | keepAlive: same as MqttConnectOptions.setKeepAliveInterval. | | 5000 | false | false |
| mqtt.password | password: sets the password to use for the connection | | null | false | false |
| mqtt.persistence | persistence: by default it is used for storing incoming messages on disk. If memory is provided as the value for this option, then recovery on restart is not supported. | | memory | false | false |
| mqtt.version | mqttVersion: same as MqttConnectOptions.setMqttVersion | | 5000 | false | false |
| mqtt.username | username: sets the user name to use for the connection to the MQTT server. Do not set it if the server does not need it; setting it empty will lead to errors. | | null | false | false |
| mqtt.qos | QoS: the maximum quality of service to subscribe to each topic at. Messages published at a lower quality of service will be received at the published QoS. Messages published at a higher quality of service will be received using the QoS specified on the subscribe. | | 0 | false | false |
| mqtt.topic | Topic the MqttClient subscribes to. | | null | false | false |

Extra informations

No additional information is provided
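
As a hedged illustration, the sketch below declares an MQTT provider as a controller service and uses it as the input of a StructuredStream. The surrounding YAML layout, the service ids, and the topic and client id values are assumptions; only the mqtt.* keys come from the table above.

```yaml
# Hypothetical fragment of the engine section of a job file.
# Service ids, topic names and values are illustrative; only mqtt.* keys are documented above.
controllerServiceConfigurations:
  - controllerService: mqtt_input
    component: com.hurence.logisland.stream.spark.structured.provider.MQTTStructuredStreamProviderService
    configuration:
      mqtt.broker.url: tcp://localhost:1883
      mqtt.topic: sensors/temperature          # illustrative topic
      mqtt.client.id: logisland_reader         # reuse the same id to recover a stopped client
      mqtt.qos: 0
      mqtt.clean.session: true
streamConfigurations:
  - stream: mqtt_stream
    component: com.hurence.logisland.stream.spark.structured.StructuredStream
    configuration:
      read.stream.service.provider: mqtt_input
```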


RateStructuredStreamProviderService

Generates data at the specified number of rows per second; each output row contains a timestamp and a value, where timestamp is a Timestamp type containing the time of message dispatch and value is of Long type containing the message count, starting from 0 as the first row. This source is intended for testing and benchmarking. Used in StructuredStream streams.

Class

com.hurence.logisland.stream.spark.structured.provider.RateStructuredStreamProviderService

Tags

None.

Properties

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

| Name | Description | Allowable Values | Default Value | Sensitive | EL |
| --- | --- | --- | --- | --- | --- |
| local.file.input.path | The location of the file to be loaded | | null | false | false |
| local.file.output.path | The location of the file to be written | | null | false | false |
| has.csv.header | Whether this is a CSV file with the first line as a header | | true | false | false |
| csv.delimiter | The delimiter | | , | false | false |

Extra informations

No additional information is provided


RemoteApiStreamProcessingEngine

No description provided.

Class

com.hurence.logisland.engine.spark.RemoteApiStreamProcessingEngine

Tags

None.

Properties

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

| Name | Description | Allowable Values | Default Value | Sensitive | EL |
| --- | --- | --- | --- | --- | --- |
| spark.app.name | The application name | | logisland | false | false |
| spark.master | The URL of the Spark master | | local[2] | false | false |
| spark.monitoring.driver.port | The port for exposing monitoring metrics | | null | false | false |
| spark.yarn.deploy-mode | The YARN deploy mode | | null | false | false |
| spark.yarn.queue | The name of the YARN queue | | default | false | false |
| spark.driver.memory | The memory size for the Spark driver | | 512m | false | false |
| spark.executor.memory | The memory size for Spark executors | | 1g | false | false |
| spark.driver.cores | The number of cores for the Spark driver | | 4 | false | false |
| spark.executor.cores | The number of cores for Spark executors | | 1 | false | false |
| spark.executor.instances | The number of executor instances for the Spark app | | null | false | false |
| spark.serializer | Class to use for serializing objects that will be sent over the network or need to be cached in serialized form | | org.apache.spark.serializer.KryoSerializer | false | false |
| spark.streaming.blockInterval | Interval at which data received by Spark Streaming receivers is chunked into blocks of data before storing them in Spark. Minimum recommended: 50 ms | | 350 | false | false |
| spark.streaming.kafka.maxRatePerPartition | Maximum rate (number of records per second) at which data will be read from each Kafka partition | | 5000 | false | false |
| spark.streaming.batchDuration | No description provided. | | 2000 | false | false |
| spark.streaming.backpressure.enabled | Enables Spark Streaming to control the receiving rate based on the current batch scheduling delays and processing times, so that the system receives data only as fast as it can process it. | | false | false | false |
| spark.streaming.unpersist | Force RDDs generated and persisted by Spark Streaming to be automatically unpersisted from Spark's memory. The raw input data received by Spark Streaming is also automatically cleared. Setting this to false will allow the raw data and persisted RDDs to be accessible outside the streaming application, as they will not be cleared automatically, but it comes at the cost of higher memory usage in Spark. | | false | false | false |
| spark.ui.port | No description provided. | | 4050 | false | false |
| spark.streaming.timeout | No description provided. | | -1 | false | false |
| spark.streaming.kafka.maxRetries | Maximum number of retries when reading data from each Kafka partition | | 3 | false | false |
| spark.streaming.ui.retainedBatches | How many batches the Spark Streaming UI and status APIs remember before garbage collecting. | | 200 | false | false |
| spark.streaming.receiver.writeAheadLog.enable | Enable write-ahead logs for receivers. All the input data received through receivers will be saved to write-ahead logs so that it can be recovered after driver failures. | | false | false | false |
| spark.yarn.maxAppAttempts | Because the Spark driver and Application Master share a single JVM, any error in the Spark driver stops our long-running job. Fortunately it is possible to configure the maximum number of attempts that will be made to re-run the application. It is reasonable to set a value higher than the default 2 (derived from the YARN cluster property yarn.resourcemanager.am.max-attempts); 4 works quite well, while a higher value may cause unnecessary restarts even if the reason for the failure is permanent. | | 4 | false | false |
| spark.yarn.am.attemptFailuresValidityInterval | If the application runs for days or weeks without restart or redeployment on a highly utilized cluster, 4 attempts could be exhausted in a few hours. To avoid this situation, the attempt counter should be reset every hour or so. | | 1h | false | false |
| spark.yarn.max.executor.failures | The maximum number of executor failures before the application fails. By default it is max(2 * num executors, 3), which is well suited for batch jobs but not for long-running jobs. The property comes with a corresponding validity interval which should also be set (e.g. 8 * num_executors). | | 20 | false | false |
| spark.yarn.executor.failuresValidityInterval | If the application runs for days or weeks without restart or redeployment on a highly utilized cluster, the failure count could be exhausted in a few hours. To avoid this situation, the counter should be reset every hour or so. | | 1h | false | false |
| spark.task.maxFailures | For long-running jobs you could also consider boosting the maximum number of task failures before giving up on the job. By default tasks are retried 4 times and then the job fails. | | 8 | false | false |
| spark.memory.fraction | Expresses the size of M as a fraction of (JVM heap space - 300MB). The rest of the space is reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually large records. | | 0.6 | false | false |
| spark.memory.storageFraction | Expresses the size of R as a fraction of M (default 0.5). R is the storage space within M where cached blocks are immune to being evicted by execution. | | 0.5 | false | false |
| spark.scheduler.mode | The scheduling mode between jobs submitted to the same SparkContext. Can be set to FAIR to use fair sharing instead of queueing jobs one after another. Useful for multi-user services. | FAIR (fair sharing), FIFO (queueing jobs one after another) | FAIR | false | false |
| spark.properties.file.path | For using the --properties-file option while submitting the Spark job | | null | false | false |
| java.library.path | The Java library path to use with Mesos. | | null | false | false |
| spark.cores.max | The maximum number of total executor cores with Mesos. | | null | false | false |
| remote.api.baseUrl | The base URL of the remote server providing the logisland configuration | | null | false | false |
| remote.api.polling.rate | Remote API polling rate in milliseconds | | null | false | false |
| remote.api.push.rate | Remote API configuration push rate in milliseconds | | null | false | false |
| remote.api.timeouts.connect | Remote API connection timeout in milliseconds | | 10000 | false | false |
| remote.api.auth.user | The basic authentication user for the remote API endpoint. | | null | false | false |
| remote.api.auth.password | The basic authentication password for the remote API endpoint. | | null | false | false |
| remote.api.timeouts.socket | Remote API default read/write socket timeout in milliseconds | | 10000 | false | false |

Extra informations

No additional information is provided
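
To make the remote.api.* properties concrete, here is a hedged sketch of an engine block pointing at a remote configuration endpoint. The URL and credentials are placeholders and the YAML skeleton is an assumption based on typical Logisland job files; only the remote.api.* and spark.* keys come from the table above.

```yaml
# Hypothetical sketch: RemoteApiStreamProcessingEngine fetching its stream
# configuration from a remote endpoint. URL and credentials are placeholders.
engine:
  component: com.hurence.logisland.engine.spark.RemoteApiStreamProcessingEngine
  configuration:
    spark.app.name: remotely-managed-job
    spark.master: local[2]
    remote.api.baseUrl: http://conf-server:8080/api/v1   # placeholder URL
    remote.api.polling.rate: 5000
    remote.api.push.rate: 10000
    remote.api.timeouts.connect: 10000
    remote.api.timeouts.socket: 10000
    remote.api.auth.user: logisland                      # placeholder credentials
    remote.api.auth.password: changeme
```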


StructuredStream

No description provided.

Class

com.hurence.logisland.stream.spark.structured.StructuredStream

Tags

None.

Properties

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

| Name | Description | Allowable Values | Default Value | Sensitive | EL |
| --- | --- | --- | --- | --- | --- |
| read.stream.service.provider | The controller service that gives connection information | | null | false | false |
| read.topics.serializer | The serializer to use | com.hurence.logisland.serializer.KryoSerializer (serialize events as binary blocs), com.hurence.logisland.serializer.JsonSerializer (serialize events as json blocs), com.hurence.logisland.serializer.ExtendedJsonSerializer (serialize events as json blocs supporting nested objects/arrays), com.hurence.logisland.serializer.AvroSerializer (serialize events as avro blocs), com.hurence.logisland.serializer.BytesArraySerializer (serialize events as byte arrays), com.hurence.logisland.serializer.StringSerializer (serialize events as string), none (send events as bytes), com.hurence.logisland.serializer.KuraProtobufSerializer (serialize events as Kura protocol buffer) | none | false | false |
| read.topics.key.serializer | The key serializer to use | com.hurence.logisland.serializer.KryoSerializer (serialize events as binary blocs), com.hurence.logisland.serializer.JsonSerializer (serialize events as json blocs), com.hurence.logisland.serializer.ExtendedJsonSerializer (serialize events as json blocs supporting nested objects/arrays), com.hurence.logisland.serializer.AvroSerializer (serialize events as avro blocs), com.hurence.logisland.serializer.BytesArraySerializer (serialize events as byte arrays), com.hurence.logisland.serializer.KuraProtobufSerializer (serialize events as Kura protocol buffer), com.hurence.logisland.serializer.StringSerializer (serialize events as string), none (send events as bytes) | none | false | false |
| write.stream.service.provider | The controller service that gives connection information | | null | false | false |
| write.topics.serializer | The serializer to use | com.hurence.logisland.serializer.KryoSerializer (serialize events as binary blocs), com.hurence.logisland.serializer.JsonSerializer (serialize events as json blocs), com.hurence.logisland.serializer.ExtendedJsonSerializer (serialize events as json blocs supporting nested objects/arrays), com.hurence.logisland.serializer.AvroSerializer (serialize events as avro blocs), com.hurence.logisland.serializer.BytesArraySerializer (serialize events as byte arrays), com.hurence.logisland.serializer.StringSerializer (serialize events as string), none (send events as bytes), com.hurence.logisland.serializer.KuraProtobufSerializer (serialize events as Kura protocol buffer) | none | false | false |
| write.topics.key.serializer | The key serializer to use | com.hurence.logisland.serializer.KryoSerializer (serialize events as binary blocs), com.hurence.logisland.serializer.JsonSerializer (serialize events as json blocs), com.hurence.logisland.serializer.ExtendedJsonSerializer (serialize events as json blocs supporting nested objects/arrays), com.hurence.logisland.serializer.AvroSerializer (serialize events as avro blocs), com.hurence.logisland.serializer.BytesArraySerializer (serialize events as byte arrays), com.hurence.logisland.serializer.StringSerializer (serialize events as string), none (send events as bytes), com.hurence.logisland.serializer.KuraProtobufSerializer (serialize events as Kura protocol buffer) | none | false | false |
| groupby | Comma-separated list of fields to group the partition by | | null | false | false |
| state.timeout.ms | The time in ms before we invalidate the microbatch state | | 2000 | false | false |
| chunk.size | The number of records to group into chunks | | 100 | false | false |

Extra informations

No additional information is provided
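
Putting the pieces together, here is a hedged sketch of a StructuredStream that reads from one provider service and writes to another with explicit serializers. The service ids (kafka_in, console_out) refer to controller services declared elsewhere in the job file and are illustrative, as is the overall YAML layout; only the read.*, write.*, groupby, state.timeout.ms and chunk.size keys come from the table above.

```yaml
# Hypothetical fragment of the engine section of a job file.
# Service ids and field names are illustrative; property keys are documented above.
streamConfigurations:
  - stream: enrichment_stream
    component: com.hurence.logisland.stream.spark.structured.StructuredStream
    configuration:
      read.stream.service.provider: kafka_in              # e.g. a KafkaStructuredStreamProviderService
      read.topics.serializer: com.hurence.logisland.serializer.JsonSerializer
      read.topics.key.serializer: com.hurence.logisland.serializer.StringSerializer
      write.stream.service.provider: console_out          # e.g. a ConsoleStructuredStreamProviderService
      write.topics.serializer: com.hurence.logisland.serializer.JsonSerializer
      write.topics.key.serializer: none
      groupby: host,service                                # illustrative field names
      state.timeout.ms: 2000
      chunk.size: 100
    processorConfigurations: []                            # processors would be declared here
```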