
Welcome to Anodot Daria Agent's documentation!

Main concepts

Agent structure

(Diagram: agent structure)

  • Source - where your data is pulled from.
  • Destination - where your data is sent. Available destination: HTTP client (the Anodot REST API endpoint).
  • Pipeline - connects a source to a destination through data processing and transformation stages.
  • Raw pipeline - a pipeline that pulls data from a source and saves it to your local filesystem without any transformations. The output directory can be configured via the LOCAL_DESTINATION_OUTPUT_DIR environment variable in the agent docker container; by default it is /usr/src/app/local-output.
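
For example, a raw pipeline's output directory could be overridden in docker-compose.yaml like this (a minimal sketch; the agent service name and the host path mapping are assumptions, check your own compose file):

services:
  agent:
    environment:
      LOCAL_DESTINATION_OUTPUT_DIR: /usr/src/app/local-output   # default value; point it at any path inside the container
    volumes:
      - ./local-output:/usr/src/app/local-output                # optional: expose the output directory on the host (assumed mapping)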

CLI config structure

What pipelines do:

  1. Take data from a source.
  2. If the destination is an HTTP client, every record is transformed into a JSON object according to the Anodot 2.0 metric protocol spec.
  3. Values are converted to floating-point numbers.
  4. Timestamps are converted to UNIX timestamps in seconds.
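
For illustration only, a transformed record sent to the HTTP destination has roughly this shape (a sketch based on the publicly documented Anodot Metrics 2.0 protocol; the property names below are hypothetical, refer to the Anodot API documentation for the authoritative schema):

{
  "properties": { "what": "cpu_usage", "host": "server-1" },
  "timestamp": 1702900800,
  "value": 42.5,
  "tags": {}
}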

Available integrations

For more details regarding each integration, please go to the dedicated page for that integration.

Monitoring Apps and tools:

  • Cacti
  • Observium
  • PRTG
  • Prometheus
  • Splunk
  • Solarwinds
  • VictoriaMetrics, Thanos, Prometheus
  • Zabbix

Pub/Sub:

  • Kafka

Files & Logs:

  • Coralogix
  • Directory (Files)
  • Elasticsearch
  • RRD

Databases:

  • Clickhouse
  • Databricks
  • Impala
  • InfluxDB
  • MongoDB
  • MSSQL
  • MySQL
  • Oracle
  • PostgreSQL

Other:

  • Sage
  • SNMP

Required machine resources

Required resources depend mainly on the amount of data you want to stream. In general, the agent can process ~1,000 events per second (eps) per 1.5 vCPU, so processing 10,000 eps requires about 15 vCPU.

Memory allocation depends on the number of pipelines you create. Each pipeline represents a single query to run (or a Kafka topic to consume) and requires 300-500 MB of memory, so a standard server with 8 GB of memory should run no more than about 25 pipelines.

Minimum requirement: 6 GB RAM, 4 vCPU

Standard recommendation: 8 GB RAM, 12 vCPU
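
An illustrative sizing example using the rules of thumb above (the numbers are examples, not guarantees):

5,000 eps: 5,000 / 1,000 * 1.5 vCPU ≈ 7.5 vCPU
10 pipelines: 10 * ~400 MB ≈ 4 GB RAM, plus headroom for the OS and StreamSets itself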

How to install

Prerequisites

Note: if you're going to work with the agent via its REST API, you need to forward port 80 from the agent docker container to your host machine. To do that, uncomment the ports section in the docker-compose.yaml file for the agent service. You can change the host-side port from 8080 to any other port if needed.
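
A minimal sketch of that ports section (the agent service name and exact layout are assumptions; your shipped docker-compose.yaml may differ):

services:
  agent:
    ports:
      - "8080:80"   # host port 8080 -> container port 80 (agent REST API)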

Install using the shell script

  1. Download agent.zip
  2. Run unzip agent.zip
  3. Run ./agent.sh install

Increase Java heap size in SDC_JAVA_OPTS in the docker-compose.yaml if you plan to run a lot of pipelines.
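
For example, the heap could be raised like this (a sketch; the StreamSets service name dc is an assumption, 4 GB is an arbitrary value, and if your file already sets other flags in SDC_JAVA_OPTS keep them and adjust only the heap options):

services:
  dc:
    environment:
      SDC_JAVA_OPTS: "-Xmx4g -Xms4g"   # give the StreamSets JVM a 4 GB heap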

Installation via Kubernetes

To disable source validation and data preview, use the VALIDATION_ENABLED environment variable.
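
A minimal sketch of the corresponding container env entry in the agent Kubernetes manifest (the value "false" is an assumption; check your chart or manifest for the accepted values):

env:
  - name: VALIDATION_ENABLED
    value: "false"   # assumption: "false" disables source validation and data preview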

How to access the agent

Script installation

./agent.sh run

Kubernetes installation

AGENTPOD=$(kubectl get pod -l app.kubernetes.io/name=anodot-agent -o jsonpath="{.items[0].metadata.name}")
kubectl exec -it "$AGENTPOD" -- bash

If the label selector doesn't match your installation, substitute the actual agent pod name (e.g. streamsets-agent-0) for $AGENTPOD.

How to upgrade

Breaking compatibility

  • When upgrading to version >=3.14.0: the user in the agent container was changed from root to agent. A non-root user can no longer bind to port 80, so the listening port for the agent API must be changed:
    • Add the environment variable LISTEN_PORT: 8080 to the agent container (a docker-compose example is in the installation section).
    • For every StreamSets instance, run agent streamsets edit STREAMSETS_URL and change the port in the agent URL to 8080 (this will cause all pipelines to be updated).
  • When upgrading from a version >2.0.1:
    • Sequentially run the scripts from the src/agent/scripts/upgrade/ directory using docker exec -i anodot-agent python src/agent/scripts/upgrade/<script_name> (skip scripts whose version is less than or equal to your current agent version); a loop sketch follows this list.
    • Run docker exec -i anodot-agent agent pipeline update
  • When upgrading from a version <2.0.0:
    • Upgrade to version 1.18.1 first
    • Install a Postgres database alongside the agent (refer to the docker-compose or Kubernetes installation instructions)
    • Run docker exec -i anodot-agent python src/agent/scripts/migrate-to-db.py
    • Sequentially run the scripts from the src/agent/scripts/upgrade/ directory using docker exec -i anodot-agent python src/agent/scripts/upgrade/<script_name> (skip scripts whose version is less than or equal to your current agent version)
    • Run docker exec -i anodot-agent agent pipeline update
  • If you upgrade from a version <1.15.0, run the agent destination command before running agent update
  • If you upgrade from a version <1.6.0, Kafka pipelines will be deprecated. They will keep running, but you won't be able to update them; you will need to delete the pipelines and their sources and recreate them with the new config according to the documentation
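
A sketch of running the upgrade scripts in order (assumes the container's working directory is /usr/src/app and that script names sort by version with sort -V; remove or skip any script whose version is less than or equal to your current agent version before running):

for script in $(docker exec anodot-agent ls src/agent/scripts/upgrade/ | sort -V); do
    docker exec -i anodot-agent python "src/agent/scripts/upgrade/$script"
done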

To upgrade the agent, take the following steps depending on how it was installed:

Script installation

./agent.sh upgrade

Kubernetes installation

  1. Set the version tag for both images
  2. Apply the Kubernetes config
  3. Attach to the agent container
  4. Run
agent pipeline update

Basic flow

  1. Add a StreamSets instance
root@agent:/usr/src/app# agent streamsets add
Enter streamsets url: http://dc:18630
Username [admin]: 
Password [admin]: 
Agent external URL [http://anodot-agent]: http://anodot-agent:8080
  2. Create a destination.
> agent destination
Use proxy for connecting to Anodot? [y/N]: y
Proxy uri: http://squid:3181
Proxy username []:
Proxy password []:
Destination url [https://api.anodot.com]: https://api.anodot.com
Anodot data collection token: tokenhere
Anodot access key: apikey
Destination configured

You can connect to the Anodot application using a proxy. To do that, specify the proxy URI, username, and password.

The destination URL is the URL of the Anodot application to which all data will be sent.

To get an Anodot data collection token, open your Anodot account, go to Settings > API tokens > Data Collection > Copy.

Add the Anodot access key. Please follow the instructions to get it.

After the destination is created, you can check that the Monitoring pipeline is running and that monitoring data is being sent to Anodot.

  3. Create a source
agent source create -f /path/to/source/config.json
  4. Create a pipeline
agent pipeline create -f /path/to/pipeline/config.json
  5. Run the pipeline
agent pipeline start PIPELINE_ID
  6. Check pipeline status
agent pipeline info PIPELINE_ID
  7. If errors occur, check the troubleshooting section:
    1. Fix the errors
    2. Stop the pipeline: agent pipeline stop PIPELINE_ID
    3. Reset the pipeline origin: agent pipeline reset PIPELINE_ID
    4. Run the pipeline again

Troubleshooting

Pipelines may not work as expected for several reasons, for example, a wrong configuration or issues connecting to the destination. You can check for errors in the following places:

  1. agent pipeline info PIPELINE_ID - shows configuration issues if the pipeline is misconfigured

  2. agent pipeline logs -s ERROR PIPELINE_ID - shows error logs if there are any

  3. Check the container and agent logs:
docker logs anodot-sdc
docker logs anodot-agent
docker exec -i anodot-agent cat /var/log/agent.log
  4. It's possible to enable logging of requests to Anodot and see the exact data being sent:

    1. Stop the pipeline: agent pipeline stop PIPELINE_ID
    2. Enable logging: agent pipeline destination-logs --enable PIPELINE_ID
    3. Start the pipeline: agent pipeline start PIPELINE_ID
    4. See the logs (destination_logs)
    5. After troubleshooting, stop the pipeline and disable the logs because they consume a lot of space: agent pipeline destination-logs --disable PIPELINE_ID
  5. If you're still having an issue, please contact support@anodot.com. To help us resolve the issue faster, please send us the agent logs package. You can generate it with ./agent.sh diagnostics-info if you're using the docker-compose installation (if this command is not available, please download the latest script here: agent.zip). If the agent is installed in a Kubernetes cluster, please download and run this shell script

Frequently asked questions

Can I edit a pipeline name?

No, you need to delete the pipeline and create a new one with a new name

My data has a specific timezone, what should I do?

If your data is not in UTC, you should specify a timezone when configuring the pipeline. Example:

[
  {
    "source": "test",
    "pipeline_id": "test",
    ...
    "timestamp": {
      "type": "string",
      "name": "timestamp",
      "format": "yyyy-MM-dd HH:mm:ss"
    },
    "timezone": "Asia/Dubai"
  }
]

You should use tz database names (like Asia/Dubai, Europe/London, etc.) instead of offsets (like GMT+05:00) so that daylight saving time is handled automatically. You can find the list of all tz database names here
