Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support expireAfterSeconds for indexes (TTL) #2415

Open
Tracked by #2256 ...
AlekSi opened this issue Apr 12, 2023 · 21 comments
Open
Tracked by #2256 ...

Support expireAfterSeconds for indexes (TTL) #2415

AlekSi opened this issue Apr 12, 2023 · 21 comments
Assignees
Labels
area/indexes Issues about indexes code/feature Some user-visible feature is not implemented yet not ready Issues that are not ready to be worked on; PRs that should skip CI

Comments

@AlekSi
Copy link
Member

AlekSi commented Apr 12, 2023

What should be done?

https://www.mongodb.com/docs/manual/tutorial/expire-data/

@AlekSi AlekSi added the code/feature Some user-visible feature is not implemented yet label Apr 12, 2023
@winrid
Copy link

winrid commented Apr 13, 2023

How would this be implemented? Postgres doesn't support this out of the box AFAIK. Create a regular index on a date field and transparently create a cron?

@wqhhust
Copy link
Contributor

wqhhust commented May 1, 2023

When create such an index, create a cron entry for postgres like:
SELECT cron.schedule('*/5 * * * *', $$DELETE FROM events WHERE event_time < now() - interval '1 day(based on TTL)'$$);

And in the postgresql, we need to enable pg_cron by running command
CREATE EXTENSION pg_cron;

If you think this is ok, then please assign it to me.

@AlekSi AlekSi added the not ready Issues that are not ready to be worked on; PRs that should skip CI label May 1, 2023
@AlekSi AlekSi added the area/indexes Issues about indexes label Jun 8, 2023
@rpavlik
Copy link

rpavlik commented Jun 15, 2023

For some reason, in an older version (from early february) ferretdb did not reject an attempt to create these auto expire ttl columns. However, newer versions do, fwiw.

@AlekSi
Copy link
Member Author

AlekSi commented Jun 15, 2023

That's because we are now more strict on checking unsupported parameters. We could revert that change for that parameter if that's a problem for you.

@rpavlik
Copy link

rpavlik commented Jun 15, 2023

No problem, I was able to make connect-mongo handle the expiration itself. I mostly wanted to comment here as breadcrumbs in case someone in the future hits issues when upgrading their version of ferretdb.

@c-nv-s
Copy link

c-nv-s commented Jul 11, 2023

so I checked the roadmap and can't see anywhere that this feature is accommodated.
does that mean it is abandoned?

@BoBiene
Copy link

BoBiene commented Aug 17, 2023

If you think this is ok, then please assign it to me.

@wqhhust do you still plan to implement the TTL index?

@wqhhust
Copy link
Contributor

wqhhust commented Aug 17, 2023 via email

@AlekSi
Copy link
Member Author

AlekSi commented Aug 17, 2023

so I checked the roadmap and can't see anywhere that this feature is accommodated.
does that mean it is abandoned?

The issue is not abandoned. Please vote for it with 👍 for the description if you need it.

@c-nv-s
Copy link

c-nv-s commented Aug 18, 2023

definitely still need it

@mrwogu
Copy link

mrwogu commented Aug 30, 2023

Is there an approximate release date for this solution or any workaround?

@jouir
Copy link

jouir commented Sep 26, 2023

When create such an index, create a cron entry for postgres like: SELECT cron.schedule('*/5 * * * *', $$DELETE FROM events WHERE event_time < now() - interval '1 day(based on TTL)'$$);

And in the postgresql, we need to enable pg_cron by running command CREATE EXTENSION pg_cron;

If you think this is ok, then please assign it to me.

The pg_cron extension is a great idea but it has several drawbacks:

  • Not compatible with multiple databases on the same instance (see issue)
  • Extension that must be declared in the shared preload libraries (requires a restart)
  • PostgreSQL specific

May I suggest another approach?

  • Find a way to retrieve the time based on Object ID
  • At index creation, FerretDB could register the TTL index configuration in a metadata table on the backend (table name, column name, TTL value), update the configuration in memory for further access and create a regular index on the backend
  • At start, FerretDB could connect to the database to load the TTL configurations from the metadata table into memory, then create a go routine that will periodically check for TTL index configurations and perform the various DELETE operations in the background

MongoDB has more optimizations but this may be too soon to implement:

  • drop buckets / partitions
  • delete in multiple passes
  • assign priorities (when multiple passes don't delete a record, priority is raised)

What do you think?

I'm not ready to work on this issue right now because:

  • Didn't tested FerretDB yet
  • Don't know the code by heart / didn't contribute to smaller issues
  • Lack of personal time to work on the issue
  • Management may not be allow me to work on an issue that is already solved on MongoDB itself

But if you are interested, I may give it a try, in a best effort mode.

@c-nv-s
Copy link

c-nv-s commented Sep 26, 2023

If it is of any help, here are two other projects which have implemented TTL over a Postgres database
first project is a job queue
https://github.com/acaloiaro/neoq/pull/59/files

second is a project which takes/understands Redis commands but stores the data on a Postgres backend
https://github.com/alash3al/redix/blob/4b04e059b69fbb488bc7e9ddba2e4eb3c794a8ef/internals/datastore/contract/engine.go#L26

@AlekSi
Copy link
Member Author

AlekSi commented Sep 26, 2023

Find a way to retrieve the time based on Object ID

Unfortunately, _id could be a different type like string. Mixing different _id types in the same collection is also possible. And without a proper way of sorting data from oldest to newest, it is unclear what to do.

One possible solution is #3339, but details are not clear yet. We will be looking into that issue soon™.

@jouir
Copy link

jouir commented Sep 27, 2023

I thought it was automagicaly deduced by the engine but it turns out you need a user-defined time-based field to create the TTL index on:

To create a TTL index, use createIndex(). Specify an index field that is either a date type or an array that contains date type values. Use the expireAfterSeconds option to specify a TTL value in seconds.

https://www.mongodb.com/docs/manual/core/index-ttl/#create-a-ttl-index

No need to create a new internal field to manage TTL indexes after all.

More details in the documentation.

@BoBiene
Copy link

BoBiene commented Nov 3, 2023

With the implementation of capped collection (#3458) the TTL-indexes should be able to be done in a comparable way, or not?
In my understanding for the capped collection, it is required to delete entries based on some condition - same as here.

@BoBiene
Copy link

BoBiene commented Nov 6, 2023

@AlekSi I'm wondering if you have any plans to implement TTL indices in the fourth quarter of 2023. This feature would be very useful and we would appreciate your feedback on its feasibility and timeline. Thanks for your hard work and dedication to this project.

@AlekSi
Copy link
Member Author

AlekSi commented Nov 6, 2023

I tentatively added it to the Q1 of 2024.

@AlekSi AlekSi added this to the v2.0.0 milestone Mar 1, 2024
@AlekSi AlekSi removed this from the v2.0.0 milestone Apr 4, 2024
@boutell
Copy link

boutell commented Apr 4, 2024

This issue also came up in testing for compatibility with ApostropheCMS. It sounds like it was a near miss for inclusion in 2.0.0, so perhaps soon?

Workarounds exist.

@c-nv-s
Copy link

c-nv-s commented Apr 4, 2024

I'm still hoping it gets included soon and would be all ears about any workarounds.

@boutell
Copy link

boutell commented Apr 4, 2024

The workaround makes sense if you control the application code: include a check for the property in question in your actual query, so results that are too old just don't come back, and do your own purges of stuff older than that periodically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/indexes Issues about indexes code/feature Some user-visible feature is not implemented yet not ready Issues that are not ready to be worked on; PRs that should skip CI
Projects
Status: No status
Development

No branches or pull requests

10 participants