Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues with revisions and activity #17894

Open
3 tasks done
Tracked by #21980
BenoitAverty opened this issue Mar 23, 2023 · 11 comments · May be fixed by #20683
Open
3 tasks done
Tracked by #21980

Performance issues with revisions and activity #17894

BenoitAverty opened this issue Mar 23, 2023 · 11 comments · May be fixed by #20683

Comments

@BenoitAverty
Copy link

Checklist

Describe the Bug

There is a problem with the reveisions and activity queries.

When I navigate in my directus instance, I regularly get a popup that says that a request failed with status 504.
When I look in my devtools, it's the requests fetching the revisions for a particular item.

I have to purge the activity and revisions often to avoid that, even though I don't have a particularly big database (there are ~ 1 million in activity and 500 000 in revisions).

My guess is that there's a missing index in the revisions table, because the query is almost always fetching revisions for a particular collection and item : there should be an index on the item column.

To Reproduce

There's not a reliable way I think beause it depends on the volume of revisions and the performance of the DB. However it should be possible to measure a performance boost with an added index.

Hosting Strategy

Self-Hosted (Docker Image)

@Zack-Heisnberg
Copy link

what database are you using ?

@bevanmw
Copy link

bevanmw commented Mar 24, 2023

what database are you using ?

I'm not OP but I am having the same issue with Postgres 14 running on AWS RDS.

@jimhuds
Copy link

jimhuds commented Mar 24, 2023

Same issue here. Revisions take a really long time to load. Have about 700,000 records in the directus_revisions table.

I use two fields for caching stock values from our slow ERP, which are updated every 6 hours. Would be nice to be able to disable revision logging for certain fields.

@bevanmw
Copy link

bevanmw commented Mar 24, 2023

I added an index on the item column of the directus_revisions table and it greatly improved perfomance. It would be great if it was added by default or possible to enable as a config flag.

@joselcvarela
Copy link
Member

On top of the index suggestion we should also have a configuration to limit the amount of revisions per item, so they are rotative. For example, only save last 0, 5, 10, 20, 50 revisions per item. Probably can be configured per collection.

This is because Directus can crash because it does not have sufficient memory (OOMKilled - Out of Memory). It happens
when the machine has limited memory and there's some WYSIWYG, JSON or any other column that can have a great amount of data. Because directus_revisions will store every column of the record in data and will store the huge data column in delta, it can crash sooner than later.

Another thing we can do is truncate data and delta and only retrieve those columns in full when we want to restore a specific version or when we retrieve that single revision 🤔

@rijkvanzanten
Copy link
Member

Partial fix numero uno here is to ensure there's some configuration to control the max retention on those tables. Part of the problem is that it effectively grows infinitely big without limit. #18105 implements two new env vars to control the max retention in time for activity and revisions separately.

@ponanin
Copy link

ponanin commented Apr 12, 2023

Same issue here, ~1,1M records.

@WoLfulus
Copy link
Contributor

WoLfulus commented Jul 27, 2023

For those struggling with issues similar to this (but not quite exactly the one OP opened)

We wanted to clean up the revision/activity table and the estimated time was ~3 hours, ran immediately after the change.

Add an index to column directus_revisions.parent and if you want to clean the table up, also set the ON DELETE action to SET NULL in the same directus_revisions.parent column. To speed up deletions on directus_activity, add an index to directus_revisions.activity too.

These indexes might help with selections too, but I haven't had any issues in this particular case.

For reference (use at your own risk):

CREATE INDEX custom_directus_revisions_parent_idx ON public.directus_revisions USING btree (parent);
ALTER TABLE public.directus_revisions DROP CONSTRAINT directus_revisions_parent_foreign;
ALTER TABLE public.directus_revisions ADD CONSTRAINT directus_revisions_parent_foreign FOREIGN KEY (parent) REFERENCES public.directus_revisions(id) ON DELETE SET NULL;
CREATE INDEX custom_directus_revisions_activity_idx ON public.directus_revisions (activity);

@JohJonker
Copy link

Just to confirm, we had the same issue: (i) random 504s (which I could see in the request monitor were very slow or timed-out requests for revisions) and (ii) the list of revisions in the single item sidebar taking ages to load. I can confirm @BenoitAverty and @bevanmw's suggestion to add an index to the item column of the directus_revisions table fixed this for us. Revisions requests are resolved instantly now.

@joselcvarela
Copy link
Member

Also want to mention this became more trickier with the addition of Content Versioning.
Now, we cannot simply just remove the records from Activity and Revisions.
We must check if field version is populated in directus_revisions and not remove it if that's the case 😕

When we try to delete a collection, it's freezing the whole project.
That's because all the tables under the transaction (see next preview) are locked (due to the transaction) and transaction need to wait until all the records are removed from those (and the other) tables.

await this.knex.transaction(async (trx) => {
if (collectionToBeDeleted!.schema) {
await trx.schema.dropTable(collectionKey);
}
// Make sure this collection isn't used as a group in any other collections
await trx('directus_collections').update({ group: null }).where({ group: collectionKey });
if (collectionToBeDeleted!.meta) {
const collectionItemsService = new ItemsService('directus_collections', {
knex: trx,
accountability: this.accountability,
schema: this.schema,
});
await collectionItemsService.deleteOne(collectionKey, {
bypassEmitAction: (params) =>
opts?.bypassEmitAction ? opts.bypassEmitAction(params) : nestedActionEvents.push(params),
});
}
if (collectionToBeDeleted!.schema) {
const fieldsService = new FieldsService({
knex: trx,
accountability: this.accountability,
schema: this.schema,
});
await trx('directus_fields').delete().where('collection', '=', collectionKey);
await trx('directus_presets').delete().where('collection', '=', collectionKey);
const revisionsToDelete = await trx
.select('id')
.from('directus_revisions')
.where({ collection: collectionKey });
if (revisionsToDelete.length > 0) {
const chunks = chunk(
revisionsToDelete.map((record) => record.id),
10000,
);
for (const keys of chunks) {
await trx('directus_revisions').update({ parent: null }).whereIn('parent', keys);
}
}
await trx('directus_revisions').delete().where('collection', '=', collectionKey);
await trx('directus_activity').delete().where('collection', '=', collectionKey);
await trx('directus_permissions').delete().where('collection', '=', collectionKey);
await trx('directus_relations').delete().where({ many_collection: collectionKey });
const relations = this.schema.relations.filter((relation) => {
return relation.collection === collectionKey || relation.related_collection === collectionKey;
});
for (const relation of relations) {
// Delete related o2m fields that point to current collection
if (relation.related_collection && relation.meta?.one_field) {
await fieldsService.deleteField(relation.related_collection, relation.meta.one_field, {
autoPurgeCache: false,
autoPurgeSystemCache: false,
bypassEmitAction: (params) =>
opts?.bypassEmitAction ? opts.bypassEmitAction(params) : nestedActionEvents.push(params),
});
}
// Delete related m2o fields that point to current collection
if (relation.related_collection === collectionKey) {
await fieldsService.deleteField(relation.collection, relation.field, {
autoPurgeCache: false,
autoPurgeSystemCache: false,
bypassEmitAction: (params) =>
opts?.bypassEmitAction ? opts.bypassEmitAction(params) : nestedActionEvents.push(params),
});
}
}
const a2oRelationsThatIncludeThisCollection = this.schema.relations.filter((relation) => {
return relation.meta?.one_allowed_collections?.includes(collectionKey);
});
for (const relation of a2oRelationsThatIncludeThisCollection) {
const newAllowedCollections = relation
.meta!.one_allowed_collections!.filter((collection) => collectionKey !== collection)
.join(',');
await trx('directus_relations')
.update({ one_allowed_collections: newAllowedCollections })
.where({ id: relation.meta!.id });
}
}
});

@u12206050
Copy link
Contributor

u12206050 commented Apr 12, 2024

Wanted to open a new issue, when I saw this one is in progress. So just adding that any use of LOWER such as in this query will not be able to use indexes. And so I would advise dropping the use of LOWER on possibly all the fields except the user_agent and comment field.

If possible, standardize the case of the values in these columns when data is inserted or updated. This way, you can avoid using LOWER() during querying.

SELECT
  `directus_activity` . `action`,
  `directus_activity` . `collection`,
  `directus_activity` . `timestamp`,
  `directus_activity` . `id`,
  `directus_activity` . `user`
FROM
  `directus_activity`
WHERE
  ( `LOWER` ( `directus_activity` . `action` ) LIKE ?
    OR `LOWER` ( `directus_activity` . `user` ) LIKE ?
    OR `LOWER` ( `directus_activity` . `ip` ) LIKE ?
    OR `LOWER` ( `directus_activity` . `user_agent` ) LIKE ?
    OR `LOWER` ( `directus_activity` . `collection` ) LIKE ?
    OR `LOWER` ( `directus_activity` . `item` ) LIKE ?
    OR `LOWER` ( `directus_activity` . `comment` ) LIKE ?
    OR `LOWER` ( `directus_activity` . `origin` ) LIKE ? )
  AND ( ( ( `directus_activity` . `user` IS NOT NULL )
      OR `directus_activity` . `user` = ? ) )
ORDER BY
  `directus_activity` . `timestamp` DESC
LIMIT
  ?```

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 🏗 In progress
Development

Successfully merging a pull request may close this issue.