
Welcome to Anodot Daria Agent's documentation!

Main concepts

Agent structure

(Diagram: agent structure)

  • Source - where your data is pulled from.
  • Destination - where your data is sent. Available destination: HTTP client (the Anodot REST API endpoint).
  • Pipeline - connects a source to a destination through data processing and transformation stages.
  • Raw pipeline - a pipeline that pulls data from a source and saves it to your local filesystem without any transformations. The output directory can be configured via the LOCAL_DESTINATION_OUTPUT_DIR environment variable in the agent docker container; by default it is /usr/src/app/local-output.
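
For example, a raw pipeline's output directory could be overridden in docker-compose.yaml like this (a minimal sketch; the agent service name and the host path mapping are assumptions, check your own compose file):

services:
  agent:
    environment:
      LOCAL_DESTINATION_OUTPUT_DIR: /usr/src/app/local-output   # default value; point it at any path inside the container
    volumes:
      - ./local-output:/usr/src/app/local-output                # optional: expose the output directory on the host (assumed mapping)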

CLI config structure

What pipelines do:

  1. Take data from a source.
  2. If the destination is an HTTP client, every record is transformed into a JSON object according to the Anodot 2.0 metric protocol spec.
  3. Values are converted to floating-point numbers.
  4. Timestamps are converted to UNIX timestamps in seconds.
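
For illustration only, a transformed record sent to the HTTP destination has roughly this shape (a sketch based on the publicly documented Anodot Metrics 2.0 protocol; the property names below are hypothetical, refer to the Anodot API documentation for the authoritative schema):

{
  "properties": { "what": "cpu_usage", "host": "server-1" },
  "timestamp": 1702900800,
  "value": 42.5,
  "tags": {}
}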

Available integrations

For more details regarding each integration, please go to the dedicated page for that integration.

Monitoring Apps and tools:

  • Cacti
  • Observium
  • PRTG
  • Prometheus
  • Splunk
  • Solarwinds
  • VictoriaMetrics, Thanos, Prometheus
  • Zabbix

Pub/Sub:

  • Kafka

Files & Logs:

  • Coralogix
  • Directory (Files)
  • Elasticsearch
  • RRD

Databases:

  • Clickhouse
  • Databricks
  • Impala
  • InfluxDB
  • MongoDB
  • MSSQL
  • MySQL
  • Oracle
  • PostgreSQL

Other:

  • Sage
  • SNMP

Required machine resources

Required resources depend mainly on the amount of data you want to stream. In general, the agent can process ~1,000 events per second (eps) per 1.5 vCPU, so processing 10,000 eps requires about 15 vCPU.

Memory allocation depends on the number of pipelines you create. Each pipeline represents a single query to run (or a Kafka topic to consume) and requires 300-500 MB of memory, so a standard server with 8 GB of memory should run no more than about 25 pipelines.

Minimum requirement: 6 GB RAM, 4 vCPU

Standard recommendation: 8 GB RAM, 12 vCPU
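
An illustrative sizing example using the rules of thumb above (the numbers are examples, not guarantees):

5,000 eps: 5,000 / 1,000 * 1.5 vCPU ≈ 7.5 vCPU
10 pipelines: 10 * ~400 MB ≈ 4 GB RAM, plus headroom for the OS and StreamSets itself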

How to install

Prerequisites

Note: if you're going to work with the agent via its REST API, you need to forward port 80 from the agent docker container to your host machine. To do that, uncomment the ports section in the docker-compose.yaml file for the agent service. You can change the host-side port from 8080 to any other port if needed.
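
A minimal sketch of that ports section (the agent service name and exact layout are assumptions; your shipped docker-compose.yaml may differ):

services:
  agent:
    ports:
      - "8080:80"   # host port 8080 -> container port 80 (agent REST API)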

Install using the shell script

  1. Download agent.zip
  2. Run unzip agent.zip
  3. Run ./agent.sh install

Increase Java heap size in SDC_JAVA_OPTS in the docker-compose.yaml if you plan to run a lot of pipelines.
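
For example, the heap could be raised like this (a sketch; the StreamSets service name dc is an assumption, 4 GB is an arbitrary value, and if your file already sets other flags in SDC_JAVA_OPTS keep them and adjust only the heap options):

services:
  dc:
    environment:
      SDC_JAVA_OPTS: "-Xmx4g -Xms4g"   # give the StreamSets JVM a 4 GB heap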

Installation via Kubernetes

To disable source validation and data preview, use the VALIDATION_ENABLED environment variable.
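
A minimal sketch of the corresponding container env entry in the agent Kubernetes manifest (the value "false" is an assumption; check your chart or manifest for the accepted values):

env:
  - name: VALIDATION_ENABLED
    value: "false"   # assumption: "false" disables source validation and data preview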

How to access the agent

Script installation

./agent.sh run

Kubernetes installation

AGENTPOD=$(kubectl get pod -l app.kubernetes.io/name=anodot-agent -o jsonpath="{.items[0].metadata.name}")
kubectl exec -it "$AGENTPOD" -- bash

If the label selector doesn't match your installation, substitute the actual agent pod name (e.g. streamsets-agent-0) for $AGENTPOD.

How to upgrade

Breaking compatibility

  • When upgrading to version >=3.14.0: the user in the agent container was changed from root to agent. A non-root user can no longer bind to port 80, so the listening port for the agent API must be changed:
    • Add the environment variable LISTEN_PORT: 8080 to the agent container (a docker-compose example is in the installation section).
    • For every StreamSets instance, run agent streamsets edit STREAMSETS_URL and change the port in the agent URL to 8080 (this will cause all pipelines to be updated).
  • When upgrading from a version >2.0.1:
    • Sequentially run the scripts from the src/agent/scripts/upgrade/ directory using docker exec -i anodot-agent python src/agent/scripts/upgrade/<script_name> (skip scripts whose version is less than or equal to your current agent version); a loop sketch follows this list.
    • Run docker exec -i anodot-agent agent pipeline update
  • When upgrading from a version <2.0.0:
    • Upgrade to version 1.18.1 first
    • Install a Postgres database alongside the agent (refer to the docker-compose or Kubernetes installation instructions)
    • Run docker exec -i anodot-agent python src/agent/scripts/migrate-to-db.py
    • Sequentially run the scripts from the src/agent/scripts/upgrade/ directory using docker exec -i anodot-agent python src/agent/scripts/upgrade/<script_name> (skip scripts whose version is less than or equal to your current agent version)
    • Run docker exec -i anodot-agent agent pipeline update
  • If you upgrade from a version <1.15.0, run the agent destination command before running agent update
  • If you upgrade from a version <1.6.0, Kafka pipelines will be deprecated. They will keep running, but you won't be able to update them; you will need to delete the pipelines and their sources and recreate them with the new config according to the documentation
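
A sketch of running the upgrade scripts in order (assumes the container's working directory is /usr/src/app and that script names sort by version with sort -V; remove or skip any script whose version is less than or equal to your current agent version before running):

for script in $(docker exec anodot-agent ls src/agent/scripts/upgrade/ | sort -V); do
    docker exec -i anodot-agent python "src/agent/scripts/upgrade/$script"
done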

To upgrade the agent, take the following steps depending on how it was installed:

Script installation

./agent.sh upgrade

Kubernetes installation

  1. Set the version tag for both images
  2. Apply the Kubernetes config
  3. Attach to the agent container
  4. Run
agent pipeline update

Basic flow

  1. Add a StreamSets instance
root@agent:/usr/src/app# agent streamsets add
Enter streamsets url: http://dc:18630
Username [admin]: 
Password [admin]: 
Agent external URL [http://anodot-agent]: http://anodot-agent:8080
  2. Create a destination.
> agent destination
Use proxy for connecting to Anodot? [y/N]: y
Proxy uri: http://squid:3181
Proxy username []:
Proxy password []:
Destination url [https://api.anodot.com]: https://api.anodot.com
Anodot data collection token: tokenhere
Anodot access key: apikey
Destination configured

You can connect to the Anodot application using a proxy. To do that, specify the proxy URI, username, and password.

The destination URL is the URL of the Anodot application to which all data will be sent.

To get an Anodot data collection token, open your Anodot account, go to Settings > API tokens > Data Collection > Copy.

Add the Anodot access key. Please follow the instructions to get it.

After the destination is created, you can check that the Monitoring pipeline is running and that monitoring data is being sent to Anodot.

  3. Create a source
agent source create -f /path/to/source/config.json
  4. Create a pipeline
agent pipeline create -f /path/to/pipeline/config.json
  5. Run the pipeline
agent pipeline start PIPELINE_ID
  6. Check pipeline status
agent pipeline info PIPELINE_ID
  7. If errors occur, check the troubleshooting section:
    1. Fix the errors
    2. Stop the pipeline: agent pipeline stop PIPELINE_ID
    3. Reset the pipeline origin: agent pipeline reset PIPELINE_ID
    4. Run the pipeline again

Troubleshooting

Pipelines may not work as expected for several reasons, for example, a wrong configuration or issues connecting to the destination. You can check for errors in the following places:

  1. agent pipeline info PIPELINE_ID - shows configuration issues if the pipeline is misconfigured

  2. agent pipeline logs -s ERROR PIPELINE_ID - shows error logs if there are any

  3. Check the container and agent logs:
docker logs anodot-sdc
docker logs anodot-agent
docker exec -i anodot-agent cat /var/log/agent.log
  4. It's possible to enable logging of requests to Anodot and see the exact data being sent:

    1. Stop the pipeline: agent pipeline stop PIPELINE_ID
    2. Enable logging: agent pipeline destination-logs --enable PIPELINE_ID
    3. Start the pipeline: agent pipeline start PIPELINE_ID
    4. See the logs (destination_logs)
    5. After troubleshooting, stop the pipeline and disable the logs because they consume a lot of space: agent pipeline destination-logs --disable PIPELINE_ID
  5. If you're still having an issue, please contact support@anodot.com. To help us resolve the issue faster, please send us the agent logs package. You can generate it with ./agent.sh diagnostics-info if you're using the docker-compose installation (if this command is not available, please download the latest script here: agent.zip). If the agent is installed in a Kubernetes cluster, please download and run this shell script

Frequently asked questions

Can I edit a pipeline name?

No, you need to delete the pipeline and create a new one with a new name

My data has a specific timezone, what should I do?

If your data is not in UTC, you should specify a timezone when configuring the pipeline. Example:

[
  {
    "source": "test",
    "pipeline_id": "test",
    ...
    "timestamp": {
      "type": "string",
      "name": "timestamp",
      "format": "yyyy-MM-dd HH:mm:ss"
    },
    "timezone": "Asia/Dubai"
  }
]

You should use tz database names (like Asia/Dubai, Europe/London, etc.) instead of offsets (like GMT+05:00) so that daylight saving time is handled automatically. You can find the list of all tz database names here
