[Bug] Why is my compression ratio so low, and why does it keep dropping? It is now about 2.5 #12568

Open
1 of 2 tasks
hzltbs opened this issue May 22, 2024 · 15 comments

Comments

@hzltbs

hzltbs commented May 22, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Version

iotdb 1.2.0

Describe the bug and provide the minimal reproduce step

The encoding and compression settings should be fine. The data volume is currently about 2 billion points.
Uploading 微信截图_20240522102315.png…
Uploading 微信截图_20240522102351.png…

What did you expect to see?

How can I improve the compression ratio?

What did you see instead?

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Hi, this is your first issue in IoTDB project. Thanks for your report. Welcome to join the community!

@hzltbs
Author

hzltbs commented May 22, 2024

微信截图_20240522102315
微信截图_20240522102351

@HTHou
Contributor

HTHou commented May 22, 2024

Are there only two series?

@hzltbs
Author

hzltbs commented May 24, 2024

No, it is just that all of the series are of these two types. There should be about 100,000 series.

@hzltbs
Author

hzltbs commented May 24, 2024

There are about 100,000 series, and the two attributes of every series are of these two types.

@hzltbs
Author

hzltbs commented May 24, 2024

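As of today the data volume has reached 4 billion points, and the compression ratio is now only about 2.5 and drops every day. The daily increment is about 70 million points.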

@hzltbs
Author

hzltbs commented May 27, 2024

Are there only two series?

There are about 100,000 series, and the two attributes of every series are of these two types.

@jixuan1989
Member

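How much disk space is being used now?
Also, could you count the number of files?
(The earlier screenshots are not visible; the upload seems to have failed.)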

@hzltbs
Author

hzltbs commented May 28, 2024

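How much disk space is being used now? Also, could you count the number of files? (The earlier screenshots are not visible; the upload seems to have failed.)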
image
image
image

@HTHou
Contributor

HTHou commented May 28, 2024

Could you also provide the number of files under the unsequence directory? In addition, how much memory is allocated to the datanode in your IoTDB deployment?
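
For anyone gathering the same numbers, here is a minimal sketch of one way to count TsFiles per directory. The directory names `sequence` and `unsequence` come from this thread; the data directory path passed in is a placeholder for wherever your datanode stores its data, and the helper names `count_tsfiles` and `report` are hypothetical, not part of IoTDB.

```python
from pathlib import Path
from typing import Dict, Tuple


def count_tsfiles(root: Path) -> Tuple[int, int]:
    """Return (file count, total size in bytes) of *.tsfile files under root."""
    if not root.is_dir():
        # A missing directory simply counts as empty.
        return 0, 0
    files = [p for p in root.rglob("*.tsfile") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def report(data_dir: Path) -> Dict[str, Tuple[int, int]]:
    """Count TsFiles separately for the sequence and unsequence directories."""
    return {name: count_tsfiles(data_dir / name) for name in ("sequence", "unsequence")}
```

Calling `report(Path("/path/to/datanode/data"))` with your own data directory returns a dictionary such as `{"sequence": (count, bytes), "unsequence": (count, bytes)}`, which is easier to paste into a thread than a screenshot.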

@hzltbs
Author

hzltbs commented May 30, 2024

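Could you also provide the number of files under the unsequence directory? In addition, how much memory is allocated to the datanode in your IoTDB deployment?

image
The memory setting is probably 32 GB.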

@hzltbs
Author

hzltbs commented May 30, 2024

I checked, and the number of files under sequence is lower than last time.
image
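Today the physical disk usage is 91 GB and the total data volume has reached 7.5 billion points. The compression ratio is still dropping and is now about 2.41, although it is falling more slowly than before.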
image
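
The figures reported above can be turned into a rough per-point cost. This is only a back-of-the-envelope sketch: it assumes that "91 GB" means decimal gigabytes, that the data volume counts individual points, and that the reported compression ratio is uncompressed size divided by on-disk size. None of these is confirmed in the thread.

```python
DISK_BYTES = 91e9   # 91 GB on disk, assuming decimal gigabytes
POINTS = 7.5e9      # 7.5 billion points, as reported
RATIO = 2.41        # reported compression ratio, assumed to be raw size / disk size


def bytes_per_point(total_bytes: float, points: float) -> float:
    """Average number of bytes used per stored point."""
    return total_bytes / points


on_disk = bytes_per_point(DISK_BYTES, POINTS)
raw = on_disk * RATIO  # implied uncompressed bytes per point

print(round(on_disk, 2))  # 12.13
print(round(raw, 2))      # 29.24
```

Under these assumptions each point costs about 12 bytes on disk and about 29 bytes before compression, which is a lot for a single numeric sample and hints that more than one value is being stored per sample.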

@HTHou
Contributor

HTHou commented May 30, 2024

If the data is not sensitive, you could send us a TsFile and we can analyze it.

@hzltbs
Author

hzltbs commented May 31, 2024

root.JH.zip

If the data is not sensitive, you could send us a TsFile and we can analyze it.

@HTHou
Contributor

HTHou commented Jun 6, 2024

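From what we can see so far, the data modeling is not quite reasonable.

  1. In the current model, root.JH.JHGDS.DV_SYSOPSD81, for example, is a device, and each device has two measurements, ts and v. The model can be optimized so that root.JH.JHGDS is the device and DV_SYSOPSD81 is one of its measurements, with ts and v used as the timestamp and the value respectively, so they do not need to be written as two series. The series root.JH.JHGDS.DV_SYSOPSD81 can then be a single series of type double.
  2. There are too many databases; we recommend 1.

Item 1 is the main reason the compression ratio is low. With the current model, every timestamp is stored 3 times.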
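
The remodeling in item 1 can be sketched as a plain data transformation. This is an illustration only: the row shapes, the helper names `remodel` and `stored_timestamps_per_sample`, and the sample values are hypothetical, and no IoTDB client API is used. The count of three timestamps per sample follows one plausible reading of the comment above, namely that the ts and v series each carry their own time column and ts also stores the time as its value.

```python
from typing import Dict, Tuple

# Old layout: each tag is its own device with two measurements, ts and v.
OldRow = Tuple[str, int, Dict[str, float]]  # (device, write time, {"ts": ..., "v": ...})
# Suggested layout: one device, one double measurement per tag, ts used as the time.
NewRow = Tuple[str, int, str, float]        # (device, time, measurement, value)


def remodel(row: OldRow) -> NewRow:
    """Map one old-layout row onto the suggested layout."""
    old_device, _write_time, fields = row
    # "root.JH.JHGDS.DV_SYSOPSD81" -> device "root.JH.JHGDS", measurement "DV_SYSOPSD81"
    device, measurement = old_device.rsplit(".", 1)
    return (device, int(fields["ts"]), measurement, float(fields["v"]))


def stored_timestamps_per_sample(old_layout: bool) -> int:
    """Timestamp-like values kept per sample (assumption: 3 in the old layout, 1 in the new)."""
    return 3 if old_layout else 1


# Hypothetical sample row, for illustration only.
EXAMPLE: OldRow = (
    "root.JH.JHGDS.DV_SYSOPSD81",
    1716344595000,
    {"ts": 1716344595000, "v": 3.5},
)
```

Applying `remodel(EXAMPLE)` gives `("root.JH.JHGDS", 1716344595000, "DV_SYSOPSD81", 3.5)`: one timestamp and one double value per sample instead of two series that each repeat the time.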


3 participants