Support for MongoDB time-series collections #173

Open
wants to merge 12 commits into
base: master

gregorynoma

This adds support for MongoDB time-series collections, which are available starting in MongoDB 5.0. The changes include:

  • Switch from mgo to the official MongoDB Go driver
  • Add options to the MongoDB loader
    • --url
    • --document-per-event
    • --timeseries-collection
    • --retryable-writes
    • --ordered-inserts
    • --random-field-order
  • Add option to query generator
    • --mongo-use-naive
  • Move measurement data to the top level of inserted objects
  • Support queries using MongoDB naive data format
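As a rough illustration of the "move measurement data to the top level" item, the sketch below lifts a nested sub-document into its parent document. The `flatten` helper and the field names `fields`, `hostname`, and `time` are hypothetical and are not taken from the PR; only `usage_user` and `measurement` are named elsewhere in this thread. It uses plain Go maps and no MongoDB driver calls.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatten returns a copy of doc in which the entries of the nested map stored
// under nestedKey are lifted to the top level. Keys that already exist at the
// top level are kept and take precedence over nested ones. If the value under
// nestedKey is not a map, doc is copied unchanged.
func flatten(doc map[string]interface{}, nestedKey string) map[string]interface{} {
	out := make(map[string]interface{}, len(doc))
	nested, ok := doc[nestedKey].(map[string]interface{})
	for k, v := range doc {
		if k == nestedKey && ok {
			continue // replaced by its own entries below
		}
		out[k] = v
	}
	for k, v := range nested {
		if _, exists := out[k]; !exists {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Hypothetical "before" shape: measurement values nested under "fields".
	doc := map[string]interface{}{
		"measurement": "cpu",
		"hostname":    "host_0",
		"time":        "2021-07-14T00:00:00Z",
		"fields":      map[string]interface{}{"usage_user": 42},
	}
	b, err := json.Marshal(flatten(doc, "fields"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

With the values at the top level, a query stage can refer to `usage_user` directly instead of through a nested path.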

1) HighCPUForHosts panicked when trying to generate a query for zero hostnames
2) Option parsing code used mongo-use-native rather than mongo-use-naive for the "naive" schema
3) Helper scripts didn't have a way to choose between the bucketized and document-per-event schemas
1) GroupByTimeAndPrimaryTag used two sorts (on b and then on a) when a single sort on (a,b) was needed. The double sort is probably inefficient and is only correct when a stable sort is guaranteed.
2) LastPointPerHost had a predicate on the "measurement" field, which did not exist in the pipeline at that stage, so the query returned an empty result

CLAassistant commented Jul 14, 2021

All committers have signed the CLA.

jonatas (Contributor) commented Aug 4, 2021

Hello @gregorynoma! Thanks for the PR and all the improvements related to MongoDB! Would you mind fixing the Mongo test errors that are breaking the build?

# github.com/timescale/tsbs/pkg/query [github.com/timescale/tsbs/pkg/query.test]
pkg/query/mongo_test.go:23:20: cannot use "github.com/globalsign/mgo/bson".M literal (type "github.com/globalsign/mgo/bson".M) as type primitive.M in append
FAIL	github.com/timescale/tsbs/pkg/query [build failed]
?   	github.com/timescale/tsbs/pkg/query/config	[no test files]
?   	github.com/timescale/tsbs/pkg/query/factories	[no test files]
?   	github.com/timescale/tsbs/pkg/targets	[no test files]
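The compile error above is the usual result of mixing two distinct named map types that share an underlying type. The sketch below reproduces the situation with hypothetical stand-in types (`legacyM` for the mgo-style map, `driverM` for the driver-style map) so that it builds with the standard library alone; it is not the test file from the PR.

```go
package main

import "fmt"

// legacyM and driverM are hypothetical stand-ins for the two bson.M-style
// types named in the error. They share an underlying type but are distinct
// named types, so one is not assignable to the other without a conversion.
type legacyM map[string]interface{}
type driverM map[string]interface{}

// toDriver converts the outer map only. Nested legacyM values keep their
// original type, so the conversion is shallow.
func toDriver(m legacyM) driverM {
	return driverM(m)
}

func main() {
	var pipeline []driverM
	stage := legacyM{"$match": legacyM{"hostname": "host_0"}}

	// pipeline = append(pipeline, stage) // does not compile: legacyM is not driverM
	pipeline = append(pipeline, toDriver(stage))

	fmt.Println(len(pipeline)) // 1
}
```

In a test file, the simpler remedy is probably to build the literal with the new type directly rather than converting, which also avoids leaving nested values in the old type.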

gregorynoma and others added 3 commits August 6, 2021 10:34
 * Use the official MongoDB Go Driver rather than mgo
 * Add timeseries-collection, retryable-writes, and ordered-inserts options
gregorynoma (Author)

Hi @jonatas, sorry about that. It should be fixed now.

jonatas (Contributor) left a comment

Thanks for all the fixes! Just a few things that would make the project easier to maintain:

  1. I was trying to test locally, and I see the project has a full_cycle_minitest folder with one script for each database. It would be great to have one for Mongo too. It would help me review and understand some basic scenarios from a short benchmark example.
  2. I also see an opportunity to add some sample-configs with the same purpose, but as YAML files.

// GroupByOrderByLimit populates a query.Query that has a time WHERE clause, that groups by a
// truncated date, orders by that date, and takes a limit, e.g. in pseudo-SQL:
//
// SELECT minute, MAX(cpu) FROM cpu
Contributor

In the implementation, the column name is usage_user. Should we fix the docs too?

Suggested change
- // SELECT minute, MAX(cpu) FROM cpu
+ // SELECT minute, MAX(usage_user) FROM cpu

gregorynoma (Author)

Hey @jonatas, I added the files you suggested and incorporated some additional changes we made since the original PR!

gregorynoma requested a review from jonatas June 21, 2022 16:12

jonatas (Contributor) commented Jun 23, 2022

Thank you, @gregorynoma! I'll review it again!

lcasassa commented Dec 3, 2023

Hi all, any updates on this? Looking forward to the benchmark results!

9 participants