timeseries json flatten for druid #76

Closed
bourbonkk opened this issue Dec 15, 2022 · 7 comments · Fixed by #78

@bourbonkk
Member

No description provided.

@bourbonkk bourbonkk self-assigned this Dec 15, 2022
@bourbonkk bourbonkk added enhancement New feature or request help wanted Extra attention is needed labels Dec 15, 2022
@bourbonkk
Member Author

bourbonkk commented Dec 15, 2022

I need to flatten the data in order to use it with Druid.
Clymene wraps its output in a timeseries array before loading it into Kafka.

{
  "timeseries": [
    {
      "labels": [
        {
          "name": "__name__",
          "value": "go_gc_duration_seconds"
        },
        {
          "name": "cluster",
          "value": "target-cluster"
        },
        {
          "name": "instance",
          "value": "localhost:9100"
        },
        {
          "name": "job",
          "value": "node-exporter"
        },
        {
          "name": "quantile",
          "value": "0"
        }
      ],
      "samples": [
        {
          "value": 0.000013105,
          "timestamp": "1671108031021"
        }
      ]
    },
    {
      "labels": [
        {
          "name": "__name__",
          "value": "go_gc_duration_seconds"
        },
        {
          "name": "cluster",
          "value": "target-cluster"
        },
        {
          "name": "instance",
          "value": "localhost:9100"
        },
        {
          "name": "job",
          "value": "node-exporter"
        },
        {
          "name": "quantile",
          "value": "0.25"
        }
      ],
      "samples": [
        {
          "value": 0.000024546,
          "timestamp": "1671108031021"
        }
      ]
    },
    {
      "labels": [
        {
          "name": "__name__",
          "value": "go_gc_duration_seconds"
        },
        {
          "name": "cluster",
          "value": "target-cluster"
        },
        {
          "name": "instance",
          "value": "localhost:9100"
        },
        {
          "name": "job",
          "value": "node-exporter"
        },
        {
          "name": "quantile",
          "value": "0.5"
        }
      ],
      "samples": [
        {
          "value": 0.000026219,
          "timestamp": "1671108031021"
        }
      ]
    },
    {
      "labels": [
        {
          "name": "__name__",
          "value": "go_gc_duration_seconds"
        },
        {
          "name": "cluster",
          "value": "target-cluster"
        },
        {
          "name": "instance",
          "value": "localhost:9100"
        },
        {
          "name": "job",
          "value": "node-exporter"
        },
        {
          "name": "quantile",
          "value": "0.75"
        }
      ],
      "samples": [
        {
          "value": 0.000028494,
          "timestamp": "1671108031021"
        }
      ]
    },
    {
      "labels": [
        {
          "name": "__name__",
          "value": "go_gc_duration_seconds"
        },
        {
          "name": "cluster",
          "value": "target-cluster"
        },
        {
          "name": "instance",
          "value": "localhost:9100"
        },
        {
          "name": "job",
          "value": "node-exporter"
        },
        {
          "name": "quantile",
          "value": "1"
        }
      ],
      "samples": [
        {
          "value": 0.000079431,
          "timestamp": "1671108031021"
        }
      ]
    }
  ]
}
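
For illustration, here is a minimal Python sketch of the flattening being asked for: each sample becomes one flat record whose keys are the label names plus timestamp and value. The function name flatten_timeseries is hypothetical and this is not Clymene's actual code. Converting the millisecond epoch string to an ISO 8601 UTC timestamp is an assumption, modeled on the flat records posted later in this thread.

```python
import json
from datetime import datetime, timezone


def flatten_timeseries(payload: dict) -> list[dict]:
    """Turn a {"timeseries": [...]} payload into one flat dict per sample.

    Hypothetical helper for illustration only.
    """
    rows = []
    for series in payload.get("timeseries", []):
        # Collapse [{"name": ..., "value": ...}, ...] into {name: value}.
        labels = {label["name"]: label["value"] for label in series.get("labels", [])}
        for sample in series.get("samples", []):
            # The timestamp arrives as a string of milliseconds since the epoch.
            millis = int(sample["timestamp"])
            ts = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
            ts = ts.replace(microsecond=(millis % 1000) * 1000)
            row = dict(labels)
            # Assumption: emit ISO 8601 in UTC with a trailing "Z".
            row["timestamp"] = ts.isoformat(timespec="milliseconds").replace("+00:00", "Z")
            row["value"] = sample["value"]
            rows.append(row)
    return rows


if __name__ == "__main__":
    example = {
        "timeseries": [
            {
                "labels": [
                    {"name": "__name__", "value": "go_gc_duration_seconds"},
                    {"name": "quantile", "value": "0"},
                ],
                "samples": [{"value": 0.000013105, "timestamp": "1671108031021"}],
            }
        ]
    }
    for row in flatten_timeseries(example):
        print(json.dumps(row))
```

A series with several samples yields several records, each repeating the same labels, so every record can be treated as an independent row.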

@bourbonkk
Member Author

Could the data be flattened through the flattenSpec setting instead, without modifying the code?

@litkhai

litkhai commented Dec 16, 2022

As of Druid 24.0, you can use the nested JSON functions without flattening.
Please refer to the links below, and feel free to reach out for more details.

https://druid.apache.org/docs/latest/querying/nested-columns.html
https://druid.apache.org/docs/latest/querying/sql-json-functions.html


@litkhai

litkhai commented Dec 16, 2022

In the ingestion spec, the following part should be set:

  "transformSpec": {
    "transforms": [
      {
        "type": "expression",
        "name": "data",
        "expression": "parse_json(\"data\")"
      }
    ]
  },
  "dimensionsSpec": {
    "dimensions": [
      {
        "name": "data",
        "type": "json"
      }
    ]
  },

@bourbonkk
Member Author

bourbonkk commented Jan 23, 2023

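@litkhai Hello. I'm not sure whether you will be able to answer, but I'd like to ask once more.

I changed the format, which was wrapped in an object called timeseries, so that the data is flattened into a JSON array.
Can this kind of input be used in Druid?
In my tests, adding a jq flattening setting to the json input format and entering .[] produced a parse error.
Because a large amount of data lands in Kafka at once, producing the records one by one seemed impractical, so I tried changing to the format below.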

[{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0","timestamp":"2023-01-23T02:14:16.019Z","value":0.000006993},{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0.25","timestamp":"2023-01-23T02:14:16.019Z","value":0.00002612},{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0.5","timestamp":"2023-01-23T02:14:16.019Z","value":0.000026601},{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0.75","timestamp":"2023-01-23T02:14:16.019Z","value":0.000027072},{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"1","timestamp":"2023-01-23T02:14:16.019Z","value":0.0000423},{"__name__":"go_gc_duration_seconds_sum","cluster...

@bourbonkk
Member Author

bourbonkk commented Jan 24, 2023

@litkhai

Thank you for your support.
I confirmed that ingestion works with the format below.
Except for the "," separators, the records are concatenated in byte array form.
{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0","timestamp":"2023-01-24T06:17:31.02Z","value":0.000008857}{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0.25","timestamp":"2023-01-24T06:17:31.02Z","value":0.00002599}{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0.5","timestamp":"2023-01-24T06:17:31.02Z","value":0.00002668}{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0.75","timestamp":"2023-01-24T06:17:31.02Z","value":0.000027523}{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"1","timestamp":"2023-01-24T06:17:31.02Z","value":0.000050135}
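
As a rough sketch of the format above, the following Python shows one way to build such a payload (JSON objects written back to back with no enclosing brackets and no commas between them) and to read it back. The function names are hypothetical, and the decoder only demonstrates that the byte stream is unambiguous; it makes no claim about how Druid itself parses the message.

```python
import json


def encode_concatenated(rows: list[dict]) -> bytes:
    """Serialize each row as compact JSON and join them with no separator."""
    return "".join(json.dumps(row, separators=(",", ":")) for row in rows).encode("utf-8")


def decode_concatenated(data: bytes) -> list[dict]:
    """Read back-to-back JSON values from a single byte string."""
    text = data.decode("utf-8")
    decoder = json.JSONDecoder()
    rows = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand.
        if text[pos].isspace():
            pos += 1
            continue
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        rows.append(obj)
    return rows


if __name__ == "__main__":
    sample_rows = [
        {"__name__": "go_gc_duration_seconds", "quantile": "0", "value": 0.000008857},
        {"__name__": "go_gc_duration_seconds", "quantile": "0.25", "value": 0.00002599},
    ]
    message = encode_concatenated(sample_rows)
    print(message.decode("utf-8"))
    print(decode_concatenated(message) == sample_rows)
```

Packing many records into one message this way keeps the number of Kafka produce calls low, which was the concern raised in the previous comment.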

@bourbonkk bourbonkk linked a pull request Jan 24, 2023 that will close this issue
@litkhai

litkhai commented Jan 24, 2023 via email
