The available engine and stream components are listed below.
Provides a way to print output to the console in StructuredStream streams.
com.hurence.logisland.stream.spark.structured.provider.ConsoleStructuredStreamProviderService
None.
This component has no required or optional properties.
No additional information is provided
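As a usage sketch, the console provider is declared as a controller service and referenced as a stream's write provider. The layout below follows typical logisland job files; the service and stream names (console_service, debug_stream) and the kafka_service input are illustrative assumptions:

```yaml
# Minimal sketch: route a StructuredStream's output to the console for debugging.
# Service/stream names (console_service, debug_stream, kafka_service) are illustrative.
engine:
  component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine
  configuration:
    spark.app.name: console-debug
    spark.master: local[2]
  controllerServiceConfigurations:
    - controllerService: console_service
      component: com.hurence.logisland.stream.spark.structured.provider.ConsoleStructuredStreamProviderService
  streamConfigurations:
    - stream: debug_stream
      component: com.hurence.logisland.stream.spark.structured.StructuredStream
      configuration:
        read.stream.service.provider: kafka_service      # assumed input service defined elsewhere
        write.stream.service.provider: console_service   # print records to the console
```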
No description provided.
com.hurence.logisland.stream.spark.DummyRecordStream
None.
This component has no required or optional properties.
No additional information is provided
No description provided.
com.hurence.logisland.stream.spark.provider.KafkaConnectBaseProviderService
None.
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
kc.connector.class | The canonical class name of the Kafka connector to use. | | null | false | false |
kc.connector.properties | The properties (key=value) for the connector. | | | false | false |
kc.data.key.converter | Key converter class | | null | false | false |
kc.data.key.converter.properties | Key converter properties | | | false | false |
kc.data.value.converter | Value converter class | | null | false | false |
kc.data.value.converter.properties | Value converter properties | | | false | false |
kc.worker.tasks.max | Max number of threads for this connector | | 1 | false | false |
kc.partitions.max | Max number of partitions for this connector. | | null | false | false |
kc.connector.offset.backing.store | The underlying backing store to be used. | memory (standalone in-memory offset backing store; not suitable for clustered deployments unless the source is unique or stateless), file (standalone filesystem-based offset backing store; set the property offset.storage.file.filename to the file path; not suitable for clustered deployments unless the source is unique or standalone), kafka (distributed Kafka-topic-based offset backing store; see the javadoc of org.apache.kafka.connect.storage.KafkaOffsetBackingStore for the configuration options; well suited for distributed deployments) | memory | false | false |
kc.connector.offset.backing.store.properties | Properties to configure the offset backing store | | | false | false |
No additional information is provided
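Since kc.connector.properties and the converter properties take multi-line key=value pairs, they are naturally embedded as YAML block scalars. A hedged sketch of the format (the connector class and its keys are hypothetical, not a real connector):

```yaml
# Sketch of the key=value property format.
# org.example.connect.DummySourceConnector and its keys are hypothetical.
configuration:
  kc.connector.class: org.example.connect.DummySourceConnector
  kc.connector.properties: |
    tasks.max=1
    poll.interval.ms=1000
  kc.data.value.converter: org.apache.kafka.connect.storage.StringConverter
  kc.connector.offset.backing.store: memory
```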
No description provided.
com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredSinkProviderService
None.
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
kc.connector.class | The canonical class name of the Kafka connector to use. | | null | false | false |
kc.connector.properties | The properties (key=value) for the connector. | | | false | false |
kc.data.key.converter | Key converter class | | null | false | false |
kc.data.key.converter.properties | Key converter properties | | | false | false |
kc.data.value.converter | Value converter class | | null | false | false |
kc.data.value.converter.properties | Value converter properties | | | false | false |
kc.worker.tasks.max | Max number of threads for this connector | | 1 | false | false |
kc.partitions.max | Max number of partitions for this connector. | | null | false | false |
kc.connector.offset.backing.store | The underlying backing store to be used. | memory (standalone in-memory offset backing store; not suitable for clustered deployments unless the source is unique or stateless), file (standalone filesystem-based offset backing store; set the property offset.storage.file.filename to the file path; not suitable for clustered deployments unless the source is unique or standalone), kafka (distributed Kafka-topic-based offset backing store; see the javadoc of org.apache.kafka.connect.storage.KafkaOffsetBackingStore for the configuration options; well suited for distributed deployments) | memory | false | false |
kc.connector.offset.backing.store.properties | Properties to configure the offset backing store | | | false | false |
No additional information is provided
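A sink service is declared as a controller service and then referenced from a StructuredStream via write.stream.service.provider. A minimal sketch, with a hypothetical sink connector class and an illustrative service name:

```yaml
# Sketch: write records out through a Kafka Connect sink connector.
# org.example.connect.FileSinkConnector and kc_sink_service are hypothetical.
- controllerService: kc_sink_service
  component: com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredSinkProviderService
  configuration:
    kc.connector.class: org.example.connect.FileSinkConnector
    kc.connector.properties: |
      file=/tmp/logisland-out.txt
    kc.data.value.converter: org.apache.kafka.connect.storage.StringConverter
    kc.worker.tasks.max: 1
```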
No description provided.
com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredSourceProviderService
None.
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
kc.connector.class | The canonical class name of the Kafka connector to use. | | null | false | false |
kc.connector.properties | The properties (key=value) for the connector. | | | false | false |
kc.data.key.converter | Key converter class | | null | false | false |
kc.data.key.converter.properties | Key converter properties | | | false | false |
kc.data.value.converter | Value converter class | | null | false | false |
kc.data.value.converter.properties | Value converter properties | | | false | false |
kc.worker.tasks.max | Max number of threads for this connector | | 1 | false | false |
kc.partitions.max | Max number of partitions for this connector. | | null | false | false |
kc.connector.offset.backing.store | The underlying backing store to be used. | memory (standalone in-memory offset backing store; not suitable for clustered deployments unless the source is unique or stateless), file (standalone filesystem-based offset backing store; set the property offset.storage.file.filename to the file path; not suitable for clustered deployments unless the source is unique or standalone), kafka (distributed Kafka-topic-based offset backing store; see the javadoc of org.apache.kafka.connect.storage.KafkaOffsetBackingStore for the configuration options; well suited for distributed deployments) | memory | false | false |
kc.connector.offset.backing.store.properties | Properties to configure the offset backing store | | | false | false |
No additional information is provided
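Symmetrically, a source service feeds a stream through read.stream.service.provider. A sketch with a hypothetical source connector; note the kafka offset backing store, which the table above recommends for distributed deployments:

```yaml
# Sketch: ingest records through a Kafka Connect source connector.
# org.example.connect.SimulatorSourceConnector and kc_source_service are hypothetical.
- controllerService: kc_source_service
  component: com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredSourceProviderService
  configuration:
    kc.connector.class: org.example.connect.SimulatorSourceConnector
    kc.connector.properties: |
      poll.interval.ms=1000
    kc.data.value.converter: org.apache.kafka.connect.storage.StringConverter
    kc.partitions.max: 4
    kc.connector.offset.backing.store: kafka   # suited for distributed deployments
```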
No description provided.
com.hurence.logisland.stream.spark.KafkaRecordStreamDebugger
None.
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
No additional information is provided
No description provided.
com.hurence.logisland.stream.spark.KafkaRecordStreamHDFSBurner
None.
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
No additional information is provided
No description provided.
com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing
None.
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
No additional information is provided
A stream capable of interpreting SQL queries.
com.hurence.logisland.stream.spark.KafkaRecordStreamSQLAggregator
stream, SQL, query, record
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
No additional information is provided
No description provided.
com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine
None.
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
spark.app.name | The application name | | logisland | false | false |
spark.master | The URL of the Spark master | | local[2] | false | false |
spark.monitoring.driver.port | The port for exposing monitoring metrics | | null | false | false |
spark.yarn.deploy-mode | The YARN deploy mode | | null | false | false |
spark.yarn.queue | The name of the YARN queue | | default | false | false |
spark.driver.memory | The memory size for the Spark driver | | 512m | false | false |
spark.executor.memory | The memory size for Spark executors | | 1g | false | false |
spark.driver.cores | The number of cores for the Spark driver | | 4 | false | false |
spark.executor.cores | The number of cores for Spark executors | | 1 | false | false |
spark.executor.instances | The number of executor instances for the Spark app | | null | false | false |
spark.serializer | Class to use for serializing objects that will be sent over the network or need to be cached in serialized form | | org.apache.spark.serializer.KryoSerializer | false | false |
spark.streaming.blockInterval | Interval at which data received by Spark Streaming receivers is chunked into blocks before being stored in Spark. The recommended minimum is 50 ms. | | 350 | false | false |
spark.streaming.kafka.maxRatePerPartition | Maximum rate (number of records per second) at which data will be read from each Kafka partition | | 5000 | false | false |
spark.streaming.batchDuration | No description provided. | | 2000 | false | false |
spark.streaming.backpressure.enabled | Enables Spark Streaming to control the receiving rate based on the current batch scheduling delays and processing times, so that the system receives only as fast as it can process. | | false | false | false |
spark.streaming.unpersist | Force RDDs generated and persisted by Spark Streaming to be automatically unpersisted from Spark's memory. The raw input data received by Spark Streaming is also automatically cleared. Setting this to false allows the raw data and persisted RDDs to remain accessible outside the streaming application, since they are not cleared automatically, at the cost of higher memory usage in Spark. | | false | false | false |
spark.ui.port | No description provided. | | 4050 | false | false |
spark.streaming.timeout | No description provided. | | -1 | false | false |
spark.streaming.kafka.maxRetries | Maximum number of retries when reading from each Kafka partition | | 3 | false | false |
spark.streaming.ui.retainedBatches | How many batches the Spark Streaming UI and status APIs remember before garbage collecting. | | 200 | false | false |
spark.streaming.receiver.writeAheadLog.enable | Enable write-ahead logs for receivers. All input data received through receivers is saved to write-ahead logs so that it can be recovered after driver failures. | | false | false | false |
spark.yarn.maxAppAttempts | Because the Spark driver and Application Master share a single JVM, any error in the Spark driver stops a long-running job. Fortunately, the maximum number of attempts made to re-run the application can be configured. It is reasonable to set a value higher than the default of 2 (derived from the YARN cluster property yarn.resourcemanager.am.max-attempts); 4 works quite well, while a higher value may cause unnecessary restarts even when the cause of the failure is permanent. | | 4 | false | false |
spark.yarn.am.attemptFailuresValidityInterval | If the application runs for days or weeks without restart or redeployment on a highly utilized cluster, 4 attempts could be exhausted in a few hours. To avoid this, the attempt counter should be reset every hour or so. | | 1h | false | false |
spark.yarn.max.executor.failures | The maximum number of executor failures before the application fails. The default, max(2 * num executors, 3), is well suited for batch jobs but not for long-running ones; a value around 8 * num_executors is more appropriate. The property comes with a corresponding validity interval, which should also be set. | | 20 | false | false |
spark.yarn.executor.failuresValidityInterval | If the application runs for days or weeks without restart or redeployment on a highly utilized cluster, the allowed failures could be exhausted in a few hours. To avoid this, the failure counter should be reset every hour or so. | | 1h | false | false |
spark.task.maxFailures | For long-running jobs, consider raising the maximum number of task failures before giving up on the job. By default a task is retried 4 times before the job fails. | | 8 | false | false |
spark.memory.fraction | Expresses the size of M as a fraction of (JVM heap space - 300MB) (default 0.75). The rest of the space (25%) is reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually large records. | | 0.6 | false | false |
spark.memory.storageFraction | Expresses the size of R as a fraction of M (default 0.5). R is the storage space within M where cached blocks are immune to being evicted by execution. | | 0.5 | false | false |
spark.scheduler.mode | The scheduling mode between jobs submitted to the same SparkContext. Can be set to FAIR to use fair sharing instead of queueing jobs one after another. Useful for multi-user services. | FAIR (fair sharing), FIFO (queueing jobs one after another) | FAIR | false | false |
spark.properties.file.path | The properties file to pass via the --properties-file option when submitting the Spark job | | null | false | false |
java.library.path | The Java library path to use with Mesos. | | null | false | false |
spark.cores.max | The maximum total number of executor cores with Mesos. | | null | false | false |
No additional information is provided
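Tying a few of these properties together, a hedged sketch of an engine block for a long-running job on YARN (every key comes from the table above; the values are illustrative starting points):

```yaml
# Sketch: engine block for a long-running YARN job, using only properties
# documented above; all values are illustrative.
engine:
  component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine
  configuration:
    spark.app.name: logisland-job
    spark.master: yarn
    spark.yarn.deploy-mode: cluster
    spark.driver.memory: 512m
    spark.executor.memory: 1g
    spark.streaming.batchDuration: 2000
    spark.streaming.backpressure.enabled: true   # throttle intake to processing speed
    spark.yarn.maxAppAttempts: 4
    spark.yarn.am.attemptFailuresValidityInterval: 1h
    spark.task.maxFailures: 8
```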
Provides a way to use Kafka as input or output in StructuredStream streams.
com.hurence.logisland.stream.spark.structured.provider.KafkaStructuredStreamProviderService
None.
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
No additional information is provided
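No properties are documented above, so the sketch below mainly registers the service; the kafka.* keys shown are assumptions drawn from typical logisland Kafka settings, not confirmed by this page:

```yaml
# Sketch: Kafka provider used as a stream's read and/or write provider.
# The kafka.* property names are assumptions, not documented on this page.
- controllerService: kafka_service
  component: com.hurence.logisland.stream.spark.structured.provider.KafkaStructuredStreamProviderService
  configuration:
    kafka.input.topics: logisland_raw            # assumed property name
    kafka.output.topics: logisland_events        # assumed property name
    kafka.metadata.broker.list: localhost:9092   # assumed property name
```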
Provides a way to read a local file as input in StructuredStream streams.
com.hurence.logisland.stream.spark.structured.provider.LocalFileStructuredStreamProviderService
None.
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
No additional information is provided
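No properties are documented above either; the local.file.* and CSV keys in the sketch below appear elsewhere on this page (under the rate provider's table) and are assumed to belong to this service:

```yaml
# Sketch: read a local CSV file as a streaming input. The property names are
# assumptions taken from the rate provider's table on this page.
- controllerService: local_file_service
  component: com.hurence.logisland.stream.spark.structured.provider.LocalFileStructuredStreamProviderService
  configuration:
    local.file.input.path: /tmp/input.csv
    has.csv.header: true
    csv.delimiter: ","
```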
Provides a way to use MQTT as input or output in StructuredStream streams.
com.hurence.logisland.stream.spark.structured.provider.MQTTStructuredStreamProviderService
None.
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
mqtt.broker.url | The URL the MqttClient connects to. Set this to the URL of the MQTT server, e.g. tcp://localhost:1883. | | tcp://localhost:1883 | false | false |
mqtt.clean.session | Setting this to true starts a clean session, removing all checkpointed messages left by a previous run of this source. | | true | false | false |
mqtt.client.id | The client ID this client is associated with. Provide the same value to recover a stopped client. | | null | false | false |
mqtt.connection.timeout | Sets the connection timeout; a value of 0 is interpreted as wait until the client connects. See MqttConnectOptions.setConnectionTimeout for more information. | | 5000 | false | false |
mqtt.keep.alive | Same as MqttConnectOptions.setKeepAliveInterval. | | 5000 | false | false |
mqtt.password | Sets the password to use for the connection. | | null | false | false |
mqtt.persistence | By default, used for storing incoming messages on disk. If memory is provided as the value, recovery on restart is not supported. | | memory | false | false |
mqtt.version | Same as MqttConnectOptions.setMqttVersion. | | 5000 | false | false |
mqtt.username | Sets the user name to use for the connection to the MQTT server. Do not set it if the server does not require it; setting it empty will lead to errors. | | null | false | false |
mqtt.qos | The maximum quality of service to subscribe each topic at. Messages published at a lower quality of service will be received at the published QoS; messages published at a higher quality of service will be received using the QoS specified in the subscribe. | | 0 | false | false |
mqtt.topic | The topic the MqttClient subscribes to. | | null | false | false |
No additional information is provided
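A minimal sketch subscribing to an MQTT topic (every key comes from the table above; the broker URL, topic, and client id values are illustrative):

```yaml
# Sketch: subscribe to an MQTT topic as a stream source; values are illustrative.
- controllerService: mqtt_service
  component: com.hurence.logisland.stream.spark.structured.provider.MQTTStructuredStreamProviderService
  configuration:
    mqtt.broker.url: tcp://localhost:1883
    mqtt.topic: sensors/temperature    # topic the client subscribes to
    mqtt.client.id: logisland_mqtt     # reuse the same id to recover a stopped client
    mqtt.qos: 0
    mqtt.persistence: memory           # no recovery on restart
```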
Generates data at the specified number of rows per second; each output row contains a timestamp and a value, where timestamp is of Timestamp type, holding the time of message dispatch, and value is of Long type, containing the message count, starting from 0 for the first row. This source is intended for testing and benchmarking. Used in StructuredStream streams.
com.hurence.logisland.stream.spark.structured.provider.RateStructuredStreamProviderService
None.
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
local.file.input.path | The location of the file to be loaded | | null | false | false |
local.file.output.path | The location of the file to be written | | null | false | false |
has.csv.header | Whether this is a CSV file with the first line as a header | | true | false | false |
csv.delimiter | The delimiter | | , | false | false |
No additional information is provided
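Since the table above lists file-related rather than rate-related properties, the sketch below only registers the service as a stream's read provider:

```yaml
# Sketch: use the rate source for benchmarking. Rate-specific properties are
# not documented on this page, so only the registration is shown.
- controllerService: rate_service
  component: com.hurence.logisland.stream.spark.structured.provider.RateStructuredStreamProviderService
```

A StructuredStream would then point read.stream.service.provider at rate_service.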
No description provided.
com.hurence.logisland.engine.spark.RemoteApiStreamProcessingEngine
None.
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
spark.app.name | The application name | | logisland | false | false |
spark.master | The URL of the Spark master | | local[2] | false | false |
spark.monitoring.driver.port | The port for exposing monitoring metrics | | null | false | false |
spark.yarn.deploy-mode | The YARN deploy mode | | null | false | false |
spark.yarn.queue | The name of the YARN queue | | default | false | false |
spark.driver.memory | The memory size for the Spark driver | | 512m | false | false |
spark.executor.memory | The memory size for Spark executors | | 1g | false | false |
spark.driver.cores | The number of cores for the Spark driver | | 4 | false | false |
spark.executor.cores | The number of cores for Spark executors | | 1 | false | false |
spark.executor.instances | The number of executor instances for the Spark app | | null | false | false |
spark.serializer | Class to use for serializing objects that will be sent over the network or need to be cached in serialized form | | org.apache.spark.serializer.KryoSerializer | false | false |
spark.streaming.blockInterval | Interval at which data received by Spark Streaming receivers is chunked into blocks before being stored in Spark. The recommended minimum is 50 ms. | | 350 | false | false |
spark.streaming.kafka.maxRatePerPartition | Maximum rate (number of records per second) at which data will be read from each Kafka partition | | 5000 | false | false |
spark.streaming.batchDuration | No description provided. | | 2000 | false | false |
spark.streaming.backpressure.enabled | Enables Spark Streaming to control the receiving rate based on the current batch scheduling delays and processing times, so that the system receives only as fast as it can process. | | false | false | false |
spark.streaming.unpersist | Force RDDs generated and persisted by Spark Streaming to be automatically unpersisted from Spark's memory. The raw input data received by Spark Streaming is also automatically cleared. Setting this to false allows the raw data and persisted RDDs to remain accessible outside the streaming application, since they are not cleared automatically, at the cost of higher memory usage in Spark. | | false | false | false |
spark.ui.port | No description provided. | | 4050 | false | false |
spark.streaming.timeout | No description provided. | | -1 | false | false |
spark.streaming.kafka.maxRetries | Maximum number of retries when reading from each Kafka partition | | 3 | false | false |
spark.streaming.ui.retainedBatches | How many batches the Spark Streaming UI and status APIs remember before garbage collecting. | | 200 | false | false |
spark.streaming.receiver.writeAheadLog.enable | Enable write-ahead logs for receivers. All input data received through receivers is saved to write-ahead logs so that it can be recovered after driver failures. | | false | false | false |
spark.yarn.maxAppAttempts | Because the Spark driver and Application Master share a single JVM, any error in the Spark driver stops a long-running job. Fortunately, the maximum number of attempts made to re-run the application can be configured. It is reasonable to set a value higher than the default of 2 (derived from the YARN cluster property yarn.resourcemanager.am.max-attempts); 4 works quite well, while a higher value may cause unnecessary restarts even when the cause of the failure is permanent. | | 4 | false | false |
spark.yarn.am.attemptFailuresValidityInterval | If the application runs for days or weeks without restart or redeployment on a highly utilized cluster, 4 attempts could be exhausted in a few hours. To avoid this, the attempt counter should be reset every hour or so. | | 1h | false | false |
spark.yarn.max.executor.failures | The maximum number of executor failures before the application fails. The default, max(2 * num executors, 3), is well suited for batch jobs but not for long-running ones; a value around 8 * num_executors is more appropriate. The property comes with a corresponding validity interval, which should also be set. | | 20 | false | false |
spark.yarn.executor.failuresValidityInterval | If the application runs for days or weeks without restart or redeployment on a highly utilized cluster, the allowed failures could be exhausted in a few hours. To avoid this, the failure counter should be reset every hour or so. | | 1h | false | false |
spark.task.maxFailures | For long-running jobs, consider raising the maximum number of task failures before giving up on the job. By default a task is retried 4 times before the job fails. | | 8 | false | false |
spark.memory.fraction | Expresses the size of M as a fraction of (JVM heap space - 300MB) (default 0.75). The rest of the space (25%) is reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually large records. | | 0.6 | false | false |
spark.memory.storageFraction | Expresses the size of R as a fraction of M (default 0.5). R is the storage space within M where cached blocks are immune to being evicted by execution. | | 0.5 | false | false |
spark.scheduler.mode | The scheduling mode between jobs submitted to the same SparkContext. Can be set to FAIR to use fair sharing instead of queueing jobs one after another. Useful for multi-user services. | FAIR (fair sharing), FIFO (queueing jobs one after another) | FAIR | false | false |
spark.properties.file.path | The properties file to pass via the --properties-file option when submitting the Spark job | | null | false | false |
java.library.path | The Java library path to use with Mesos. | | null | false | false |
spark.cores.max | The maximum total number of executor cores with Mesos. | | null | false | false |
remote.api.baseUrl | The base URL of the remote server providing the logisland configuration | | null | false | false |
remote.api.polling.rate | Remote API polling rate in milliseconds | | null | false | false |
remote.api.push.rate | Remote API configuration push rate in milliseconds | | null | false | false |
remote.api.timeouts.connect | Remote API connection timeout in milliseconds | | 10000 | false | false |
remote.api.auth.user | The basic authentication user for the remote API endpoint. | | null | false | false |
remote.api.auth.password | The basic authentication password for the remote API endpoint. | | null | false | false |
remote.api.timeouts.socket | Remote API default read/write socket timeout in milliseconds | | 10000 | false | false |
No additional information is provided
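A hedged sketch of an engine polling a remote configuration server (all keys come from the table above; the URL, rates, and credentials are illustrative):

```yaml
# Sketch: engine that fetches and pushes its configuration via a remote API.
# The base URL and credentials are illustrative.
engine:
  component: com.hurence.logisland.engine.spark.RemoteApiStreamProcessingEngine
  configuration:
    spark.app.name: logisland-remote
    spark.master: local[2]
    remote.api.baseUrl: http://conf-server:8080/api
    remote.api.polling.rate: 5000
    remote.api.push.rate: 10000
    remote.api.timeouts.connect: 10000
    remote.api.auth.user: logisland    # only if the endpoint requires basic auth
    remote.api.auth.password: secret   # illustrative; treat as sensitive
```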
No description provided.
com.hurence.logisland.stream.spark.structured.StructuredStream
None.
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Description | Allowable Values | Default Value | Sensitive | EL |
---|---|---|---|---|---|
read.stream.service.provider | The controller service that gives connection information | | null | false | false |
read.topics.serializer | The serializer to use | com.hurence.logisland.serializer.KryoSerializer (serialize events as binary blocks), com.hurence.logisland.serializer.JsonSerializer (serialize events as JSON blocks), com.hurence.logisland.serializer.ExtendedJsonSerializer (serialize events as JSON blocks supporting nested objects/arrays), com.hurence.logisland.serializer.AvroSerializer (serialize events as Avro blocks), com.hurence.logisland.serializer.BytesArraySerializer (serialize events as byte arrays), com.hurence.logisland.serializer.StringSerializer (serialize events as strings), none (send events as bytes), com.hurence.logisland.serializer.KuraProtobufSerializer (serialize events as Kura protocol buffers) | none | false | false |
read.topics.key.serializer | The key serializer to use | com.hurence.logisland.serializer.KryoSerializer (serialize events as binary blocks), com.hurence.logisland.serializer.JsonSerializer (serialize events as JSON blocks), com.hurence.logisland.serializer.ExtendedJsonSerializer (serialize events as JSON blocks supporting nested objects/arrays), com.hurence.logisland.serializer.AvroSerializer (serialize events as Avro blocks), com.hurence.logisland.serializer.BytesArraySerializer (serialize events as byte arrays), com.hurence.logisland.serializer.KuraProtobufSerializer (serialize events as Kura protocol buffers), com.hurence.logisland.serializer.StringSerializer (serialize events as strings), none (send events as bytes) | none | false | false |
write.stream.service.provider | The controller service that gives connection information | | null | false | false |
write.topics.serializer | The serializer to use | com.hurence.logisland.serializer.KryoSerializer (serialize events as binary blocks), com.hurence.logisland.serializer.JsonSerializer (serialize events as JSON blocks), com.hurence.logisland.serializer.ExtendedJsonSerializer (serialize events as JSON blocks supporting nested objects/arrays), com.hurence.logisland.serializer.AvroSerializer (serialize events as Avro blocks), com.hurence.logisland.serializer.BytesArraySerializer (serialize events as byte arrays), com.hurence.logisland.serializer.StringSerializer (serialize events as strings), none (send events as bytes), com.hurence.logisland.serializer.KuraProtobufSerializer (serialize events as Kura protocol buffers) | none | false | false |
write.topics.key.serializer | The key serializer to use | com.hurence.logisland.serializer.KryoSerializer (serialize events as binary blocks), com.hurence.logisland.serializer.JsonSerializer (serialize events as JSON blocks), com.hurence.logisland.serializer.ExtendedJsonSerializer (serialize events as JSON blocks supporting nested objects/arrays), com.hurence.logisland.serializer.AvroSerializer (serialize events as Avro blocks), com.hurence.logisland.serializer.BytesArraySerializer (serialize events as byte arrays), com.hurence.logisland.serializer.StringSerializer (serialize events as strings), none (send events as bytes), com.hurence.logisland.serializer.KuraProtobufSerializer (serialize events as Kura protocol buffers) | none | false | false |
groupby | Comma-separated list of fields to group the partition by | | null | false | false |
state.timeout.ms | The time in ms before the micro-batch state is invalidated | | 2000 | false | false |
chunk.size | The number of records to group into chunks | | 100 | false | false |
No additional information is provided
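Putting it all together, a sketch of a StructuredStream wired between two providers, with explicit serializers (kafka_service and console_service are illustrative names, assumed to be declared as controller services elsewhere in the job file):

```yaml
# Sketch: a StructuredStream reading from one provider and writing to another.
# kafka_service and console_service are illustrative controller service names.
streamConfigurations:
  - stream: events_stream
    component: com.hurence.logisland.stream.spark.structured.StructuredStream
    configuration:
      read.stream.service.provider: kafka_service
      read.topics.serializer: com.hurence.logisland.serializer.JsonSerializer
      write.stream.service.provider: console_service
      write.topics.serializer: com.hurence.logisland.serializer.JsonSerializer
      groupby: host                 # optional: group the partition by the host field
      chunk.size: 100
    processorConfigurations: []     # processor chain omitted in this sketch
```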