
potential db compression settings? #665

Closed
neilh10 opened this issue Jul 11, 2023 · 5 comments


neilh10 commented Jul 11, 2023

It's likely that systems in the field stop transmitting readings and then reconnect at a later date to upload stored readings. #485

It would be helpful to have a model of what historical timeframe might be supported.

I'm just wondering what the current internal settings for TimescaleDB compression are, if any.
https://www.timescale.com/features/compression

Examples:
#658 - three systems were interrupted Mar 12, reconnected May 23rd, and uploaded their stored data when restored.
Salmon Creek - had its solar power cable pulled May 15, was restored June 27th, and transmitted its stored data on reconnection: https://monitormywatershed.org/sites/TUCA_Sa01/
Mill Creek - installed June 12 and stopped a couple of days later: https://monitormywatershed.org/sites/TUCA_Mi06/
Very remote Navarro Creek - stopped connecting June 2: https://monitormywatershed.org/sites/TUCA-Na13/

As I understand it, TimescaleDB will compress data older than some historical timeframe, and if a POST later arrives for that period, it will decompress the rows associated with the device and then insert the new record.
It's likely that the decompression will take time, and could exceed the 10-second timeout that the device waits for an HTTP 201.
In that case the device will retry on the next connection, and continue retrying until an HTTP 201 is received.

I'm then guessing that, on some schedule, TimescaleDB will recompress the data.
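A minimal sketch of that retry behavior (hypothetical function and parameter names; only the 10-second wait for an HTTP 201 and the retry-on-next-connection logic come from the description above):

```python
from collections import deque

def drain_queue(pending, post_fn, timeout_s=10):
    """Attempt to upload queued readings; stop at the first failure.

    pending: deque of stored readings (oldest first)
    post_fn(reading, timeout_s) -> HTTP status code (or a timeout sentinel)
    Returns the number of readings acknowledged with HTTP 201.
    """
    sent = 0
    while pending:
        status = post_fn(pending[0], timeout_s)
        if status == 201:
            pending.popleft()   # acknowledged; safe to drop from the queue
            sent += 1
        else:
            break               # keep the reading; retry on the next connection
    return sent
```

Under this model, a slow decompression that pushes the server past the timeout simply leaves the reading queued, so the same record gets re-POSTed on every subsequent connection until a 201 comes back.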

@neilh10 neilh10 changed the title timedb compression settings? timescaledb compression settings? Jul 11, 2023

neilh10 commented Jul 12, 2023

I guess I'm not clear on whether MMW / ODM2 is using TimescaleDB or InfluxDB. InfluxDB is referenced in the architecture diagram outlined 4 years ago: https://github.com/ODM2/ODM2DataSharingPortal/blob/main/doc/ArchitectureDiagram/Data%20Sharing%20Portal%20Architecture%20with%20Logos%20-%20Copy.png

@aufdenkampe
Member

@neilh10, we migrated from InfluxDB to the PostgreSQL Timescale extension (i.e. TimescaleDB) with our v0.12 release in December 2021. We've unfortunately gotten behind on updating all of our documentation to reflect cumulative changes since then (outside of our release notes).

To answer the original question, we're compressing old data into 90-day chunks that are auto-created every 90 days, with a 1-month buffer, so there is always at least one month of uncompressed data.
See #502 (comment)
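As a rough sketch of what that policy implies for any given timestamp (the chunk origin and exact boundary alignment are assumptions for illustration, not the production configuration):

```python
from datetime import datetime, timedelta

CHUNK = timedelta(days=90)   # chunk interval described above
BUFFER = timedelta(days=30)  # ~1-month uncompressed buffer

def is_likely_compressed(ts, now, chunk_origin):
    """Estimate whether a reading at `ts` lands in a compressed chunk.

    A chunk is treated as compressed once it ended more than BUFFER
    before `now`. `chunk_origin` is an assumed chunk boundary.
    """
    chunks_elapsed = (ts - chunk_origin) // CHUNK
    chunk_end = chunk_origin + (chunks_elapsed + 1) * CHUNK
    return chunk_end < now - BUFFER
```

So a reading backdated a few months lands in a compressed chunk and triggers the expensive decompress/insert path, while anything within roughly the last month goes into uncompressed data.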

Regardless, the server load required to insert historical data should improve substantially when we implement #674 in our planned v0.18 milestone.

@aufdenkampe
Member

Closing to track work on this under:


neilh10 commented Feb 5, 2024

@aufdenkampe, thanks for the update. I'm just wondering what the impact is when a POST is made with a timestamp that falls in a compressed range.
Of course this issue will probably still exist with #688, and I wonder what the effect of it being queued will be.

@aufdenkampe
Member

The effect of a POST into a compressed time chunk is that the server needs substantially more resources to insert the data, because the entire chunk needs to be decompressed, appended to, and then compressed again.

This is only for updates; reading from a compressed chunk is fast.
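A toy cost model of that asymmetry (illustrative units only, not actual TimescaleDB internals):

```python
def insert_cost(rows_in_chunk, compressed):
    """Relative units of work for inserting a single row into a chunk.

    Toy model: writing into a compressed chunk forces the whole chunk to
    be decompressed and later recompressed, so the work scales with the
    chunk size; an uncompressed insert touches only the new row.
    """
    if compressed:
        return rows_in_chunk + 1 + rows_in_chunk  # decompress + append + recompress
    return 1                                      # plain append
```

The point is the scaling: one backdated row into a 90-day chunk costs on the order of the whole chunk's size, not on the order of one row.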

#688 does alleviate the problem because it spreads out the workload. We don't mind our server doing the work; the issue is that too many posts arriving at once overload the server in the minute they all arrive. We have plenty of time when the server CPU is idle, and the point of SQS is to spread the work into that idle time.
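A toy simulation of that smoothing idea (hypothetical numbers; the in-memory queue here stands in for SQS):

```python
from collections import deque

def simulate(burst_sizes, capacity_per_tick):
    """Feed bursty arrivals through a queue drained at a steady rate.

    burst_sizes[i] = posts arriving at tick i.
    Returns the maximum backlog observed; the server is never asked to do
    more than capacity_per_tick units of work in any one tick.
    """
    queue = deque()
    max_backlog, tick = 0, 0
    while tick < len(burst_sizes) or queue:
        arrivals = burst_sizes[tick] if tick < len(burst_sizes) else 0
        queue.extend([tick] * arrivals)          # burst lands in the queue
        for _ in range(min(capacity_per_tick, len(queue))):
            queue.popleft()                      # worker handles one post
        max_backlog = max(max_backlog, len(queue))
        tick += 1
    return max_backlog
```

A burst of 10 posts against a capacity of 3 per tick just builds a backlog that drains over the idle ticks that follow, instead of overloading the server in the minute it arrives.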
