timeseries json flatten for druid #76

bourbonkk · 2022-12-15T13:08:15Z

No description provided.

bourbonkk · 2022-12-15T13:29:27Z

I need to flatten the data in order to use it with Druid.
Clymene wraps the data in a timeseries array and loads it into Kafka.

{
  "timeseries": [
    {
      "labels": [
        {
          "name": "__name__",
          "value": "go_gc_duration_seconds"
        },
        {
          "name": "cluster",
          "value": "target-cluster"
        },
        {
          "name": "instance",
          "value": "localhost:9100"
        },
        {
          "name": "job",
          "value": "node-exporter"
        },
        {
          "name": "quantile",
          "value": "0"
        }
      ],
      "samples": [
        {
          "value": 0.000013105,
          "timestamp": "1671108031021"
        }
      ]
    },
    {
      "labels": [
        {
          "name": "__name__",
          "value": "go_gc_duration_seconds"
        },
        {
          "name": "cluster",
          "value": "target-cluster"
        },
        {
          "name": "instance",
          "value": "localhost:9100"
        },
        {
          "name": "job",
          "value": "node-exporter"
        },
        {
          "name": "quantile",
          "value": "0.25"
        }
      ],
      "samples": [
        {
          "value": 0.000024546,
          "timestamp": "1671108031021"
        }
      ]
    },
    {
      "labels": [
        {
          "name": "__name__",
          "value": "go_gc_duration_seconds"
        },
        {
          "name": "cluster",
          "value": "target-cluster"
        },
        {
          "name": "instance",
          "value": "localhost:9100"
        },
        {
          "name": "job",
          "value": "node-exporter"
        },
        {
          "name": "quantile",
          "value": "0.5"
        }
      ],
      "samples": [
        {
          "value": 0.000026219,
          "timestamp": "1671108031021"
        }
      ]
    },
    {
      "labels": [
        {
          "name": "__name__",
          "value": "go_gc_duration_seconds"
        },
        {
          "name": "cluster",
          "value": "target-cluster"
        },
        {
          "name": "instance",
          "value": "localhost:9100"
        },
        {
          "name": "job",
          "value": "node-exporter"
        },
        {
          "name": "quantile",
          "value": "0.75"
        }
      ],
      "samples": [
        {
          "value": 0.000028494,
          "timestamp": "1671108031021"
        }
      ]
    },
    {
      "labels": [
        {
          "name": "__name__",
          "value": "go_gc_duration_seconds"
        },
        {
          "name": "cluster",
          "value": "target-cluster"
        },
        {
          "name": "instance",
          "value": "localhost:9100"
        },
        {
          "name": "job",
          "value": "node-exporter"
        },
        {
          "name": "quantile",
          "value": "1"
        }
      ],
      "samples": [
        {
          "value": 0.000079431,
          "timestamp": "1671108031021"
        }
      ]
    }
  ]
}
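
For illustration, the flattening asked about here could be sketched in Python as below. This is a minimal sketch under assumptions, not Clymene's actual implementation: the function name `flatten_timeseries` is hypothetical, and it assumes every `timestamp` is epoch milliseconds encoded as a string, as in the sample above.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def flatten_timeseries(payload: dict) -> list[dict]:
    """Turn {"timeseries": [...]} into one flat record per sample (hypothetical helper)."""
    records = []
    for series in payload.get("timeseries", []):
        # Collapse the [{"name": ..., "value": ...}] label list into a plain dict.
        labels = {label["name"]: label["value"] for label in series.get("labels", [])}
        for sample in series.get("samples", []):
            # Assumption: timestamps are epoch milliseconds encoded as strings.
            ts = EPOCH + timedelta(milliseconds=int(sample["timestamp"]))
            record = dict(labels)
            record["timestamp"] = ts.isoformat(timespec="milliseconds").replace("+00:00", "Z")
            record["value"] = sample["value"]
            records.append(record)
    return records


if __name__ == "__main__":
    payload = {
        "timeseries": [
            {
                "labels": [
                    {"name": "__name__", "value": "go_gc_duration_seconds"},
                    {"name": "quantile", "value": "0"},
                ],
                "samples": [{"value": 0.000013105, "timestamp": "1671108031021"}],
            }
        ]
    }
    for row in flatten_timeseries(payload):
        print(json.dumps(row))
```

Each label becomes a top-level key, so every output record is a flat object that needs no further flattening on the Druid side.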

bourbonkk · 2022-12-15T13:35:13Z

Can't this be flattened with the flattenSpec setting, without modifying the code?

litkhai · 2022-12-16T00:12:41Z

Since Druid 24.0, you can use the nested JSON functions without flattening.
Please refer to the links below, and feel free to reach out for more details.

https://druid.apache.org/docs/latest/querying/nested-columns.html
https://druid.apache.org/docs/latest/querying/sql-json-functions.html

litkhai · 2022-12-16T00:16:18Z

In the ingestion spec, the following part should be set:

  "transformSpec": {
    "transforms": [
      {
        "type": "expression",
        "name": "data",
        "expression": "parse_json(\"data\")"
      }
    ]
  },
  "dimensionsSpec": {
    "dimensions": [
      {
        "name": "data",
        "type": "json"
      }
    ]
  },
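
As a rough sketch of where that fragment sits, the Python below assembles the two pieces into a `dataSchema`-shaped dictionary and prints it as JSON. The helper name `nested_column_schema` and the `clymene` datasource name are hypothetical, and a real spec also needs fields such as `timestampSpec` and `granularitySpec` that are not shown in this thread.

```python
import json


def nested_column_schema(datasource: str, column: str) -> dict:
    """Build a partial dataSchema that ingests `column` as a nested JSON column.

    Hypothetical helper: only the transformSpec and dimensionsSpec parts
    come from the reply above; everything else is illustrative.
    """
    return {
        "dataSource": datasource,
        "transformSpec": {
            "transforms": [
                {
                    "type": "expression",
                    "name": column,
                    # parse_json turns the raw string field into a structured value.
                    "expression": f'parse_json("{column}")',
                }
            ]
        },
        "dimensionsSpec": {
            # The "json" dimension type stores the value as a nested column.
            "dimensions": [{"name": column, "type": "json"}]
        },
    }


if __name__ == "__main__":
    print(json.dumps(nested_column_schema("clymene", "data"), indent=2))
```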

bourbonkk · 2023-01-23T04:38:54Z

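@litkhai Hello. I'm not sure whether you'll be able to answer, but I'd like to ask once more.

I changed the format, which was wrapped in an object named timeseries, so that the data is flattened into a JSON array.
Can Druid handle this case?
In my tests, I added a jq flattening setting with .[] to the json format, and a parse error occurred.
Because a large amount of data lands in Kafka at once, producing each record individually seemed impractical, so I tried changing to the format below.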

[{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0","timestamp":"2023-01-23T02:14:16.019Z","value":0.000006993},{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0.25","timestamp":"2023-01-23T02:14:16.019Z","value":0.00002612},{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0.5","timestamp":"2023-01-23T02:14:16.019Z","value":0.000026601},{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0.75","timestamp":"2023-01-23T02:14:16.019Z","value":0.000027072},{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"1","timestamp":"2023-01-23T02:14:16.019Z","value":0.0000423},{"__name__":"go_gc_duration_seconds_sum","cluster...

bourbonkk · 2023-01-24T06:21:35Z

@litkhai

Thank you for your support.
I confirmed that ingestion works with the format below.
The objects are concatenated in a byte array, with the `,` separators left out.
{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0","timestamp":"2023-01-24T06:17:31.02Z","value":0.000008857}{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0.25","timestamp":"2023-01-24T06:17:31.02Z","value":0.00002599}{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0.5","timestamp":"2023-01-24T06:17:31.02Z","value":0.00002668}{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"0.75","timestamp":"2023-01-24T06:17:31.02Z","value":0.000027523}{"__name__":"go_gc_duration_seconds","cluster":"target-cluster","instance":"localhost:9100","job":"node-exporter","quantile":"1","timestamp":"2023-01-24T06:17:31.02Z","value":0.000050135}
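
To make the format concrete, the sketch below shows one way a consumer could split such back-to-back JSON objects in Python, using the standard library's `json.JSONDecoder.raw_decode`. It only illustrates the payload shape; it makes no claim about how Druid parses the message internally, and the function name `iter_concatenated_json` is hypothetical.

```python
import json


def iter_concatenated_json(text: str):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip any whitespace (including newlines) between documents.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed value and the index where it ended.
        value, pos = decoder.raw_decode(text, pos)
        yield value


if __name__ == "__main__":
    message = (
        '{"__name__":"go_gc_duration_seconds","quantile":"0","value":0.000008857}'
        '{"__name__":"go_gc_duration_seconds","quantile":"0.25","value":0.00002599}'
    )
    for record in iter_concatenated_json(message):
        print(record["quantile"], record["value"])
```

Producing the format is the reverse: serialize each flat record on its own and join the results without separators before sending the bytes to Kafka.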

litkhai · 2023-01-24T08:18:20Z

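Hello. First of all, Happy New Year. I haven't looked at the details closely yet, but array handling is still an area that needs considerable improvement. I'll take a closer look and reply before next week. Thank you!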


bourbonkk self-assigned this Dec 15, 2022

bourbonkk added the enhancement and help wanted labels Dec 15, 2022

bourbonkk linked a pull request Jan 24, 2023 that will close this issue

Feature/druid support #78

Merged

bourbonkk closed this as completed in #78 Jan 24, 2023
