compactor: does not compact 4 consecutive 2-hour blocks #7287

Open
vincent-olivert-riera opened this issue Apr 19, 2024 · 6 comments

@vincent-olivert-riera

vincent-olivert-riera commented Apr 19, 2024

Thanos, Prometheus and Golang version used:

Thanos: 0.32.4
Golang: go1.20.8

Prometheus: 2.45.0
goVersion: go1.20.5

Object Storage Provider:

Openstack S3 compatible

What happened:

I have a Thanos compactor with the following metrics:

thanos_compact_halted 0
thanos_compact_todo_compactions 0

It is tracking a bucket where almost all blocks have been compacted up to level-4.
However, there are some level-1 blocks that are not compacted, and I was expecting them to be compacted into a level-2 block. I have made this animated gif to show it more clearly:

[animated GIF: compactor]

None of those blocks has been marked as no-compaction, so they should be compacted.

These are the meta.json for each one of them:

01HT1G02DF2W21A1KTHDVPX0BR
{
  "ulid": "01HT1G02DF2W21A1KTHDVPX0BR",
  "minTime": 1711584000246,
  "maxTime": 1711591200000,
  "stats": {
    "numSamples": 2492646,
    "numSeries": 5196,
    "numChunks": 20775
  },
  "compaction": {
    "level": 1,
    "sources": [
      "01HT1G02DF2W21A1KTHDVPX0BR"
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster_name": "alpha",
      "cluster_node": "prometheus004",
      "datasource": "alpha-002"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "sidecar",
    "segment_files": [
      "000001"
    ],
    "files": [
      {
        "rel_path": "chunks/000001",
        "size_bytes": 3964613
      },
      {
        "rel_path": "index",
        "size_bytes": 646029
      },
      {
        "rel_path": "meta.json"
      }
    ],
    "index_stats": {}
  }
}
01HT1PVSMCNYF8ZSDW53123NJX
{
  "ulid": "01HT1PVSMCNYF8ZSDW53123NJX",
  "minTime": 1711591200246,
  "maxTime": 1711598400000,
  "stats": {
    "numSamples": 2492640,
    "numSeries": 5193,
    "numChunks": 20772
  },
  "compaction": {
    "level": 1,
    "sources": [
      "01HT1PVSMCNYF8ZSDW53123NJX"
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster_name": "alpha",
      "cluster_node": "prometheus004",
      "datasource": "alpha-002"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "sidecar",
    "segment_files": [
      "000001"
    ],
    "files": [
      {
        "rel_path": "chunks/000001",
        "size_bytes": 3957077
      },
      {
        "rel_path": "index",
        "size_bytes": 644900
      },
      {
        "rel_path": "meta.json"
      }
    ],
    "index_stats": {}
  }
}
01HT1XQGXB5CHQB21YT5DNXFC8
{
  "ulid": "01HT1XQGXB5CHQB21YT5DNXFC8",
  "minTime": 1711598400246,
  "maxTime": 1711605600000,
  "stats": {
    "numSamples": 2492640,
    "numSeries": 5193,
    "numChunks": 20772
  },
  "compaction": {
    "level": 1,
    "sources": [
      "01HT1XQGXB5CHQB21YT5DNXFC8"
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster_name": "alpha",
      "cluster_node": "prometheus004",
      "datasource": "alpha-002"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "sidecar",
    "segment_files": [
      "000001"
    ],
    "files": [
      {
        "rel_path": "chunks/000001",
        "size_bytes": 3969637
      },
      {
        "rel_path": "index",
        "size_bytes": 645540
      },
      {
        "rel_path": "meta.json"
      }
    ],
    "index_stats": {}
  }
}
01HT24K86QTXJ1HV2NW252DAEV
{
  "ulid": "01HT24K86QTXJ1HV2NW252DAEV",
  "minTime": 1711605600246,
  "maxTime": 1711612800000,
  "stats": {
    "numSamples": 2492646,
    "numSeries": 5196,
    "numChunks": 20775
  },
  "compaction": {
    "level": 1,
    "sources": [
      "01HT24K86QTXJ1HV2NW252DAEV"
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster_name": "alpha",
      "cluster_node": "prometheus004",
      "datasource": "alpha-002"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "sidecar",
    "segment_files": [
      "000001"
    ],
    "files": [
      {
        "rel_path": "chunks/000001",
        "size_bytes": 3981026
      },
      {
        "rel_path": "index",
        "size_bytes": 645293
      },
      {
        "rel_path": "meta.json"
      }
    ],
    "index_stats": {}
  }
}
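
For reference, the time ranges in the four meta.json files above can be checked with a few lines of Python. This is only a sketch: it uses nothing but the minTime and maxTime values quoted above, the 8-hour window is an assumption about the range a level-2 block would cover, and the names (`summarize`, `WINDOW_MS`) are illustrative rather than anything from Thanos.

```python
# Time ranges (ms since epoch) copied from the four meta.json files above.
blocks = [
    ("01HT1G02DF2W21A1KTHDVPX0BR", 1711584000246, 1711591200000),
    ("01HT1PVSMCNYF8ZSDW53123NJX", 1711591200246, 1711598400000),
    ("01HT1XQGXB5CHQB21YT5DNXFC8", 1711598400246, 1711605600000),
    ("01HT24K86QTXJ1HV2NW252DAEV", 1711605600246, 1711612800000),
]

HOUR_MS = 60 * 60 * 1000
WINDOW_MS = 8 * HOUR_MS  # assumption: a level-2 block would cover 8h


def summarize(blocks, window_ms=WINDOW_MS):
    """Return per-block durations, gaps between neighbours, and window facts."""
    ordered = sorted(blocks, key=lambda b: b[1])
    durations = [max_t - min_t for _, min_t, max_t in ordered]
    # Distance from one block's maxTime to the next block's minTime.
    gaps = [nxt[1] - cur[2] for cur, nxt in zip(ordered, ordered[1:])]
    span = ordered[-1][2] - ordered[0][1]
    # Index of the aligned window in which each block starts.
    windows = {min_t // window_ms for _, min_t, _ in ordered}
    return {
        "durations_ms": durations,
        "gaps_ms": gaps,
        "span_ms": span,
        "short_of_window_ms": window_ms - span,
        "same_window": len(windows) == 1,
    }


if __name__ == "__main__":
    for key, value in summarize(blocks).items():
        print(f"{key}: {value}")
```

With these values, each block is 246 ms short of two hours, neighbouring blocks are 246 ms apart, all four start in the same aligned 8-hour window, and together they cover 246 ms less than that window. Whether this offset matters to the compaction planner is not established here.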

This is the command line that I'm using:

/bin/thanos compact \
  --bucket-web-label=cluster_node \
  --data-dir /var/thanos/compact \
  --objstore.config-file=/etc/thanos/objstore.yml \
  --wait \
  --selector.relabel-config-file=/etc/thanos/relabel_config.yml \
  --downsampling.disable \
  --retention.resolution-5m=1d \
  --retention.resolution-1h=1d \
  --log.format=json \
  --log.level=debug

Contents of /etc/thanos/objstore.yml:
type: S3
config:
  bucket: "thanos-alpha"
  endpoint: "redacted"
  access_key: "redacted"
  insecure: false
  signature_version2: false
  secret_key: "redacted"
  list_objects_version: "v1"
  http_config:
    idle_conn_timeout: 60s

Contents of /etc/thanos/relabel_config.yml:
- action: keep
  regex: "alpha-002"
  source_labels:
  - datasource
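
As a rough illustration of which blocks the `keep` rule above lets through, here is a small Python sketch. It is not Thanos code: `keeps` is a hypothetical helper that imitates the relabel behaviour for this one rule (source label values joined with `;`, regex fully anchored), and the third label set is made up to show a non-match.

```python
import re

# Selector copied from relabel_config.yml above.
RULE = {"action": "keep", "regex": "alpha-002", "source_labels": ["datasource"]}


def keeps(labels, rule=RULE, separator=";"):
    """Return True if a block with these external labels passes a 'keep' rule."""
    # Missing labels count as empty strings; values are joined with the separator.
    value = separator.join(labels.get(name, "") for name in rule["source_labels"])
    # Relabel regexes are fully anchored, hence fullmatch rather than search.
    return re.fullmatch(rule["regex"], value) is not None


if __name__ == "__main__":
    candidates = [
        {"cluster_node": "prometheus004", "datasource": "alpha-002"},
        {"cluster_node": "prometheus003-prom-jp2v-dev", "datasource": "alpha-002"},
        {"cluster_node": "example", "datasource": "alpha-001"},  # hypothetical
    ]
    for labels in candidates:
        print(labels["cluster_node"], "->", "kept" if keeps(labels) else "dropped")
```

Under this reading, the selector only looks at `datasource`, so blocks from any `cluster_node` with `datasource="alpha-002"` are kept.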

What could be the reason for this behavior?

@GiedriusS
Member

Is thanos_compact_iterations_total more than 0? 🤔

@vincent-olivert-riera
Author

vincent-olivert-riera commented Apr 19, 2024

> Is thanos_compact_iterations_total more than 0? 🤔

Yes, it is constantly growing.

This is how thanos_compact_todo_compactions compares with thanos_compact_iterations_total:

[two images: graphs of thanos_compact_todo_compactions and thanos_compact_iterations_total]

@douglascamata
Contributor

@vincent-olivert-riera can you show us some information about the level 4 blocks you mentioned? What's their duration?

@vincent-olivert-riera
Author

> @vincent-olivert-riera can you show us some information about the level 4 blocks you mentioned? What's their duration?

Sure.

[image]

This is its meta.json
{
  "ulid": "01HT1RN6JP9AZWYGHTG8XRXHSS",
  "minTime": 1710374400246,
  "maxTime": 1711584000000,
  "stats": {
    "numSamples": 418763800,
    "numSeries": 5231,
    "numChunks": 3489753
  },
  "compaction": {
    "level": 4,
    "sources": [
      "01HRXEDZ6XVH2MYV7K0W6CQ27Z",
      "01HRXN9PEVG5DWN2DA3H1EQZYW",
      "01HRXW5DPSMG8GMH0CNQZ08S79",
      "01HRY314YTX75WHERFSB1XFMQ2",
      "01HRY9WW6SCNHJBT3BWB46V9Y1",
      "01HRYGRKETBY322F7ABNHFGMNP",
      "01HRYQMAPTCKA67H1XT72KMA4E",
      "01HRYYG1YSWVX80H09XBE68MW4",
      "01HRZ5BS6TX5RANK4KVFE1A25K",
      "01HRZC7GET4BX2PCHXV0AXF0MX",
      "01HRZK37PT5KR89H78K70Z3FW2",
      "01HRZSYYYSEPRXQ3J1G99EMRN5",
      "01HS00TP6TF8HZ11SSBJ5DJTG2",
      "01HS07PDEX23P49VE9KN0Z6B6P",
      "01HS0EJ4PSQQR9HYQNKV7HP2XN",
      "01HS0NDVZX0XGC0K85FKS87HEE",
      "01HS0W9K6T60BN2G1HFWNZDYZC",
      "01HS135AEVCERKJHJAE7YE82PG",
      "01HS1A11PS6BF1W0WM0ZD1X1Y4",
      "01HS1GWRYTJE5BQQDVYQ99DFMB",
      "01HS1QRG6TZAZFAGBQ5DYTFV0H",
      "01HS1YM7ESF2Q93M625Q9SEJYX",
      "01HS25FYPTETJ90722E08R9PFS",
      "01HS2CBNYT6CM6BJ9AK2M8PGN6",
      "01HS2K7D6T8AW1R2VMRCPQ2272",
      "01HS2T34ETMF9BH086GDZEEQ2X",
      "01HS30YVPTN97BAVJ5BPP3854Y",
      "01HS37TJYTHEAMQ4H9JT792RJD",
      "01HS3EPA6TERX4QJTX09QD0PJJ",
      "01HS3NJ1ETC27EQGAB9E36SF2N",
      "01HS3WDRPT2BNNW2R2E3D6BT8X",
      "01HS439FYTNNAYWCN7FM561T3Z",
      "01HS4A576T2G0HS1HFRR55Z83H",
      "01HS4H0YETBSTJCR1VFA2KT4HM",
      "01HS4QWNPSQNT3SVHWB65TWHK0",
      "01HS4YRCYVQFRZZ88FQQKRX6SV",
      "01HS55M46TW584YFK9NYT2Y8J0",
      "01HS5CFVEVSRBDV7MK2SY3QJE7",
      "01HS5KBJPTDC3DHZ3G5DP4Y1XZ",
      "01HS5T79YTS4438E2ZX4FS4T5F",
      "01HS61316SK2JXN87693FRJ3D9",
      "01HS67YRET9XW7TJNJ5A2QSM41",
      "01HS6ETFPT53QCM7VYJZTH8QB8",
      "01HS6NP6YT5FMTPNY8D5N7C9BF",
      "01HS6WHY6T8PCT1GNAC3TVKEY1",
      "01HS73DNET028TXMPQYVVF179Q",
      "01HS7A9CPT1HTA26Q4YC1FAGHV",
      "01HS7H53YT05D4QETBPC042C7C",
      "01HS7R0V6TKWRR46E82709XQGQ",
      "01HS7YWJETVH0E1YV7KWVV6BH8",
      "01HS85R9PT52VPMPFP3B9D30YQ",
      "01HS8CM0YSBNC8E3X5Z1S1QAKY",
      "01HS8KFR6TT0C995731BTSSZ5C",
      "01HS8TBFET873G0CX47NYV5P07",
      "01HS9176PTW3XFMSYGWQKZZC6E",
      "01HS982XYT1JV16HZWKEX5N696",
      "01HS9EYN6TSV8J00BRNE4CD74H",
      "01HS9NTCEV1WCDGHNS5PJSK0NP",
      "01HS9WP3PS3B9NFFP98JRYTHJ4",
      "01HSA3HTYTQCCX7DH8EPDBN4Q0",
      "01HSAADJ6VB2YFJKY4RWY18ZA2",
      "01HSAH99EVKJQZMBSH7PF497SG",
      "01HSAR50PT0ZH3ZNE8N1VJWTXQ",
      "01HSAZ0QYXE5KPBH5NS0WFYHEF",
      "01HSB5WF6SPAB3TJP64V7NSME1",
      "01HSBCR6EVHNNNCJBN27H8RWF2",
      "01HSBKKXPTYZ5D4SH8P4KW74C9",
      "01HSBTFMYW431XKWR750PXYYAQ",
      "01HSC1BC7088CV86NBKTXXQ494",
      "01HSC873ET7YV5PK4EV61GKGD9",
      "01HSCF2TPSNKYMSTCF07FBTYHQ",
      "01HSCNYHYT6SVCYBF58KTZJQ9J",
      "01HSCWT96TDBGKXVZ1VR44X9DV",
      "01HSD3P0ETXPZ80M8EEZ61RE8H",
      "01HSDAHQPYG3XCFY91N41FR4A7",
      "01HSDHDEYT44RNSS14WYNVB9VS",
      "01HSDR966V0NK7E5CN8ED8RQJK",
      "01HSDZ4XEVH5C45F9FZK47TN59",
      "01HSE60MPT0N3CER5QERB3QBH1",
      "01HSECWBYS84V009FYSB3N6B39",
      "01HSEKR36TJCYV52XBSWFRDFW6",
      "01HSETKTETGNGNQBZYS4MSA7EP",
      "01HSF1FHPTVY7PGBHS0MHR0V4Z",
      "01HSF8B8YT8PMPF2YZ7WYX6DXA",
      "01HSFF706TDE9TJ45HVEJE1C5E",
      "01HSFP2QETJHV0QEZ70QBVE2Z4",
      "01HSFWYEPTVXBBVW872WYRQ18S",
      "01HSG3T5YVSTQ8SMZBEDACHG01",
      "01HSGANX6TNA9HNM3ZH3ZGRGRS",
      "01HSGHHMET3487PYA2BRJP80YC",
      "01HSGRDBPTQWZ1ZS64GGH6SZY5",
      "01HSGZ92YT3SNZSRC0M6GH56JN",
      "01HSH64T6TG6N29P8E8WACF9C3",
      "01HSHD0HET2HC2HP9TWRRHFEYH",
      "01HSHKW8PSF5SPN131PA2CHCYN",
      "01HSHTQZYTDZJ016DDXQZ9ZXQ6",
      "01HSJ1KQ6WMAABPCAD4QCZ30BP",
      "01HSJ8FEESK80Z3N9Z5D19841W",
      "01HSJFB5PT40EA8WMKFBWCWZ3X",
      "01HSJP6WYTZE8GE46P726YJVXK",
      "01HSJX2M6VXD0SF4YYKA920WY8",
      "01HSK3YBESNDWHZM80MBY4E4S0",
      "01HSKAT2PYPJXVRZG1NBGX88B0",
      "01HSKHNSYVZA9ZB9MZAKS7G5YP",
      "01HSKRHH6THAAP44ZDB80NGEFE",
      "01HSKZD8ET7W92EFFRP7BDMQR0",
      "01HSM68ZPXA3P18Y0DPZQJXH8N",
      "01HSMD4PYSWWKE2V6DPWFQ5VWA",
      "01HSMM0E6TA2EM8J40F7FR478S",
      "01HSMTW5ESJ3E2X9K3F8CDQJR5",
      "01HSN1QWPSWRBCKV88HVAH6CXW",
      "01HSN8KKYTYBPX1ZQC9BZ4Y4HJ",
      "01HSNFFB6THMDJ7X4FGYBZK8BD",
      "01HSNPB2EY6CXHMPWJH46T3S43",
      "01HSNX6SPV19FPNV99XT4N3BGE",
      "01HSP42GYS3GZPDEEJXEVTE5H5",
      "01HSPAY86SE3D504YN5357EEK2",
      "01HSPHSZEVQ92QFGH66YRM0W9D",
      "01HSPRNPPXG4PJJGBQEZJ0TK2E",
      "01HSPZHDYVDTWSRMDQPJEHR7VA",
      "01HSQ6D56YBN25SBSWJ12H7XCW",
      "01HSQD8WET5WDSJE28PHV491NW",
      "01HSQM4KPTTEXXA5P0JZJ8MHKS",
      "01HSQV0AYTDG339RRNFRR7H7VV",
      "01HSR1W26TJE71X3TGF056TP2S",
      "01HSR8QSET41K89H6GW418HC6X",
      "01HSRFKGPTY39Y12QYYX9RDN86",
      "01HSRPF7YSE9VPF3RQTPHAW7TZ",
      "01HSRXAZ6VYK26MJPCA1CSYSJS",
      "01HSS46PES01ZR3HJS0NQ4XSH8",
      "01HSSB2DPSPMRS3RY72KK7CEM3",
      "01HSSHY4YVT7JREWXH5NNTN14P",
      "01HSSRSW6V89AZKRRF317ZV2RS",
      "01HSSZNKET6RPQ9NH02128GSPH",
      "01HST6HAPV68TBB7GRPY9WEXGS",
      "01HSTDD1YSNXRE53KBARRATVNF",
      "01HSTM8S6V698S7JJ3EGK49AFH",
      "01HSTV4GET9ZQW2866AX8FEQ8F",
      "01HSV207PW7390V9E9J9BBZJYC",
      "01HSV8VYYTTAZAAYD5M5V93NQX",
      "01HSVFQP6V4HCN4WF95QWGVAN7",
      "01HSVPKDETASJTW0BAAJB5VB9M",
      "01HSVXF4PWVT6Q68BN4B0KXA4A",
      "01HSW4AVYTN03K408NBNQ9B7QZ",
      "01HSWB6K6TFQJPSTEDR4Z94KNF",
      "01HSWJ2AET9Q65CZ0ZGPEC69YW",
      "01HSWRY1PVR3FH3GBJN4ANA8G6",
      "01HSWZSRYVX0HNXA527GK123SH",
      "01HSX6NG6WB00R3GRJKE5QSRA4",
      "01HSXDH7EVC955BNRS0KY1R130",
      "01HSXMCYPSNVZ4SMW2MQPSY2Z8",
      "01HSXV8NYTKHZ93211PK4WCK5H",
      "01HSY24D6TRMPDSHG32KNWKVH8",
      "01HSY904ETZ47RVJ3JK0KAFB37",
      "01HSYFVVPTN80CZDQ38HT26RQJ",
      "01HSYPQJYTP7E8E6SGWBHE0SPP",
      "01HSYXKA6TPPMRJW7D1W8WYYNZ",
      "01HSZ4F1ET71NACD56BAM6RNAP",
      "01HSZBARPT7FFE4X7CA2KKTYS9",
      "01HSZJ6FYT8JTW9KSMNG6YEGZ9",
      "01HSZS276TA2KNDMBDXA25RCG6",
      "01HSZZXYEWEGBJTTBWHHQGYYBA",
      "01HT06SNPTV1CKM0NHRS32PAQR",
      "01HT0DNCYS66962KGAWWSGZS9V",
      "01HT0MH46TP81CRNFCCCHRF19J",
      "01HT0VCVETBHMAJM9SVQSX0EM6",
      "01HT128JPTHQ3Q2YVF0RTB5ER5",
      "01HT1949YT230P1R17F1HSCYER"
    ],
    "parents": [
      {
        "ulid": "01HS2VVDW0R0EVGNTND8E2BCTM",
        "minTime": 1710374400246,
        "maxTime": 1710547200000
      },
      {
        "ulid": "01HS80MPGZPTTEY3JBQ3RKV5F3",
        "minTime": 1710547200246,
        "maxTime": 1710720000000
      },
      {
        "ulid": "01HSD5E4H5NJH60BH30PKA3186",
        "minTime": 1710720000246,
        "maxTime": 1710892800000
      },
      {
        "ulid": "01HSJA7FHRDBYF0XFDEYXVMY0Z",
        "minTime": 1710892800246,
        "maxTime": 1711065600000
      },
      {
        "ulid": "01HSQF11TV1XEPV3ZN7WWX66HA",
        "minTime": 1711065600246,
        "maxTime": 1711238400000
      },
      {
        "ulid": "01HSWKTK77ZK2EM1H2EWGWCYNS",
        "minTime": 1711238400246,
        "maxTime": 1711411200000
      },
      {
        "ulid": "01HT1RKXJ3KFABZF5C1V8F7JJZ",
        "minTime": 1711411200246,
        "maxTime": 1711584000000
      }
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster_name": "alpha",
      "cluster_node": "prometheus003-prom-jp2v-dev",
      "datasource": "alpha-002"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "compactor",
    "segment_files": [
      "000001",
      "000002"
    ],
    "files": [
      {
        "rel_path": "chunks/000001",
        "size_bytes": 536870124
      },
      {
        "rel_path": "chunks/000002",
        "size_bytes": 125027769
      },
      {
        "rel_path": "index",
        "size_bytes": 22741614
      },
      {
        "rel_path": "meta.json"
      }
    ],
    "index_stats": {
      "series_max_size": 4800,
      "chunk_max_size": 1013
    }
  }
}
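
To put a number on the duration asked about, the sketch below derives it from the minTime and maxTime in the meta.json above and also compares the block's external labels with those of the four level-1 blocks. It only uses values quoted in this thread; the helper names are illustrative. As far as I understand, the compactor groups blocks by external labels and resolution, so the label comparison may be relevant, but treat that as an assumption.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Values copied from the level-4 meta.json above.
LEVEL4 = {
    "minTime": 1710374400246,
    "maxTime": 1711584000000,
    "labels": {
        "cluster_name": "alpha",
        "cluster_node": "prometheus003-prom-jp2v-dev",
        "datasource": "alpha-002",
    },
}

# External labels shared by the four level-1 blocks in the issue description.
LEVEL1_LABELS = {
    "cluster_name": "alpha",
    "cluster_node": "prometheus004",
    "datasource": "alpha-002",
}


def to_utc(ms):
    """Convert a millisecond Unix timestamp to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def duration(meta):
    """Time covered by a block, as a timedelta."""
    return timedelta(milliseconds=meta["maxTime"] - meta["minTime"])


def differing_labels(a, b):
    """Sorted label names whose values differ (or are missing) between a and b."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


if __name__ == "__main__":
    print("from:", to_utc(LEVEL4["minTime"]).isoformat())
    print("to:  ", to_utc(LEVEL4["maxTime"]).isoformat())
    print("duration:", duration(LEVEL4))
    print("labels that differ:", differing_labels(LEVEL4["labels"], LEVEL1_LABELS))
```

The block covers 14 days minus 246 ms and ends at 2024-03-28 00:00 UTC, which is where the first of the level-1 blocks begins. The only label that differs between this block and the level-1 blocks is `cluster_node`.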

@douglascamata
Contributor

@vincent-olivert-riera if you grep your Compactor's log with block IDs of the blocks that didn't get compacted, do you see anything that stands out? If possible, maybe increase the Compactor's log level to generate more logs (then revert it, otherwise logs might be too spammy). 🤔

@vincent-olivert-riera
Author

vincent-olivert-riera commented Apr 19, 2024

@douglascamata, I haven't increased the Compactor's log level yet, but this is what the Compactor is doing (in a loop):

Apr 19, 2024 @ 20:29:37.120{"caller":"compact.go:1478","level":"info","msg":"compaction iterations done","ts":"2024-04-19T11:29:29.342094884Z"}
Apr 19, 2024 @ 20:29:37.120{"caller":"compact.go:457","level":"info","msg":"downsampling was explicitly disabled","ts":"2024-04-19T11:29:29.342370667Z"}
Apr 19, 2024 @ 20:27:42.421{"cached":346,"caller":"fetcher.go:487","component":"block.BaseFetcher","duration":"8.78195813s","duration_ms":8781,"level":"info","msg":"successfully synchronized block metadata","partial":0,"returned":346,"ts":"2024-04-19T11:26:37.988636543Z"}
Apr 19, 2024 @ 20:27:42.421{"caller":"fetcher.go:317","component":"block.BaseFetcher","concurrency":32,"level":"debug","msg":"fetching meta data","ts":"2024-04-19T11:27:29.206720687Z"}
Apr 19, 2024 @ 20:27:42.421{"cached":346,"caller":"fetcher.go:487","component":"block.BaseFetcher","duration":"7.076132358s","duration_ms":7076,"level":"info","msg":"successfully synchronized block metadata","partial":0,"returned":346,"ts":"2024-04-19T11:27:36.282764217Z"}
Apr 19, 2024 @ 20:26:36.158{"caller":"fetcher.go:317","component":"block.BaseFetcher","concurrency":32,"level":"debug","msg":"fetching meta data","ts":"2024-04-19T11:25:29.206734945Z"}
Apr 19, 2024 @ 20:26:36.158{"cached":346,"caller":"fetcher.go:487","component":"block.BaseFetcher","duration":"7.2963137s","duration_ms":7296,"level":"info","msg":"successfully synchronized block metadata","partial":0,"returned":346,"ts":"2024-04-19T11:25:36.502939791Z"}
Apr 19, 2024 @ 20:26:36.158{"caller":"fetcher.go:317","component":"block.BaseFetcher","concurrency":32,"level":"debug","msg":"fetching meta data","ts":"2024-04-19T11:26:29.20683454Z"}
Apr 19, 2024 @ 20:25:37.421{"caller":"compact.go:1414","level":"info","msg":"start sync of metas","ts":"2024-04-19T11:24:22.419242154Z"}
Apr 19, 2024 @ 20:25:37.421{"caller":"fetcher.go:317","component":"block.BaseFetcher","concurrency":32,"level":"debug","msg":"fetching meta data","ts":"2024-04-19T11:24:22.419842845Z"}
Apr 19, 2024 @ 20:25:37.421{"cached":346,"caller":"fetcher.go:487","component":"block.BaseFetcher","duration":"5.786435988s","duration_ms":5786,"level":"info","msg":"successfully synchronized block metadata","partial":0,"returned":174,"ts":"2024-04-19T11:24:28.206118667Z"}
Apr 19, 2024 @ 20:25:37.421{"caller":"compact.go:1419","level":"info","msg":"start of GC","ts":"2024-04-19T11:24:28.20786563Z"}
Apr 19, 2024 @ 20:25:37.421{"caller":"compact.go:1442","level":"info","msg":"start of compactions","ts":"2024-04-19T11:24:28.208735693Z"}

I have searched for all the block IDs, but Kibana does not return anything at all.
I was going to increase the log level and see what happens, but the log level is already debug.
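
For anyone who wants to repeat the search outside Kibana, here is a minimal Python sketch that strips the Kibana timestamp prefix from exported lines, decodes the JSON payload, and looks for the block IDs. The three sample lines are copied from the excerpt above; `parse` and `mentions` are illustrative helpers, and the assumption is that every line has the shape "timestamp, then one JSON object".

```python
import json
from collections import Counter

# Three lines copied from the log excerpt above (Kibana timestamp + JSON payload).
SAMPLE = [
    'Apr 19, 2024 @ 20:25:37.421{"caller":"compact.go:1414","level":"info","msg":"start sync of metas","ts":"2024-04-19T11:24:22.419242154Z"}',
    'Apr 19, 2024 @ 20:25:37.421{"cached":346,"caller":"fetcher.go:487","component":"block.BaseFetcher","duration":"5.786435988s","duration_ms":5786,"level":"info","msg":"successfully synchronized block metadata","partial":0,"returned":174,"ts":"2024-04-19T11:24:28.206118667Z"}',
    'Apr 19, 2024 @ 20:25:37.421{"caller":"compact.go:1442","level":"info","msg":"start of compactions","ts":"2024-04-19T11:24:28.208735693Z"}',
]

# IDs of the four level-1 blocks from the issue description.
BLOCK_IDS = [
    "01HT1G02DF2W21A1KTHDVPX0BR",
    "01HT1PVSMCNYF8ZSDW53123NJX",
    "01HT1XQGXB5CHQB21YT5DNXFC8",
    "01HT24K86QTXJ1HV2NW252DAEV",
]


def parse(line):
    """Drop everything before the first '{' and decode the JSON payload."""
    return json.loads(line[line.index("{"):])


def mentions(lines, block_ids):
    """Return parsed entries whose raw text contains any of the block IDs."""
    return [parse(line) for line in lines if any(b in line for b in block_ids)]


if __name__ == "__main__":
    entries = [parse(line) for line in SAMPLE]
    print(Counter(entry["msg"] for entry in entries))
    print("entries mentioning the blocks:", len(mentions(SAMPLE, BLOCK_IDS)))
```

On these sample lines no entry mentions any of the four block IDs, which matches the empty Kibana search. The second sample line reports "cached":346 and "returned":174.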
