
Meta: 🔡 Large synthetic dataset for performance evaluation #2516

Open
PeterNerlich opened this issue Oct 31, 2023 · 2 comments

@PeterNerlich
Contributor

This issue is about research into improving the development workflow when investigating performance bottlenecks. While we could simply create a copy of the live system for local experimentation (as we do with the test system every so often), it might contain personal information which we, as developers, would rather not even be able to obtain.

  • Devise a method of generating data that is reasonably authentic in comparison with the live system, namely a similar number of regions with similar content (see the sketch after this list).
  • Verify that a dev environment with that data behaves similarly to the live environment in terms of performance (mind the available resources, which might vary greatly between the machines used for development; e.g. the Redis cache probably only needs to be mentioned to the developer as a potentially decisive factor if it is not active).
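
A minimal sketch of what such a generator could look like, assuming the data is written as a Django fixture. The model labels ("cms.region", "cms.page"), the field names and the region count below are assumptions that would need to be aligned with the actual integreat-cms models; only the "~1k pages for large regions" estimate further down in this thread is taken from the discussion.

```python
# Hypothetical sketch: emit a large Django fixture with synthetic regions and
# pages. Model labels and field names are assumptions, not the real schema.
import json


def synthetic_fixture(num_regions: int = 100, pages_per_region: int = 1000) -> list:
    objects = []
    page_pk = 1
    for region_pk in range(1, num_regions + 1):
        objects.append({
            "model": "cms.region",  # assumed model label
            "pk": region_pk,
            "fields": {"name": f"Synthetic Region {region_pk}", "slug": f"region-{region_pk}"},
        })
        for page in range(pages_per_region):
            objects.append({
                "model": "cms.page",  # assumed model label
                "pk": page_pk,
                "fields": {"region": region_pk, "title": f"Page {page} of region {region_pk}"},
            })
            page_pk += 1
    return objects


if __name__ == "__main__":
    # pages_per_region=1000 follows the "~1k pages for large regions" estimate
    # mentioned below; the number of regions is a guess.
    with open("large_test_data.json", "w") as f:
        json.dump(synthetic_fixture(), f, indent=2)
```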

This should be kept separate from the existing test_data.json fixture, as the small dataset is highly preferable during quick iterations on a feature, except when performance with large data is the focus. The developer should be able to switch between the two with relative ease (see the sketch below).
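
As a sketch of how switching could work, assuming integreat-cms-cli wraps Django's manage.py so the standard flush and loaddata commands are available; "large_test_data" is a hypothetical fixture name:

```python
# Hypothetical sketch: reset the database and load either the small default
# fixture or a large synthetic one. flush and loaddata are standard Django
# management commands; "large_test_data" is a made-up fixture name.
from django.core.management import call_command


def load_test_data(large: bool = False) -> None:
    call_command("flush", interactive=False)  # wipe the current data without prompting
    call_command("loaddata", "large_test_data" if large else "test_data")
```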

@timobrembeck
Member

Just for completeness, I want to mention

```
./tools/integreat-cms-cli duplicate_pages augsburg
```

which can be used to generate a lot of pages; however, it does not cover specific edge cases that are not reflected in the original test data. So one solution could be to create a more diverse baseline of test data, which would hopefully result in a more realistic dataset once the duplication algorithm has been executed a few times (~1k pages for large regions is realistic).
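
As an illustration, repeated execution could look like this; the sketch assumes duplicate_pages takes the region slug as a positional argument (as in the CLI invocation above) and can be called via Django's call_command:

```python
# Sketch: grow the dataset by running the duplication command several times,
# assuming each run roughly doubles the page count of the given region.
from django.core.management import call_command

for _ in range(4):
    call_command("duplicate_pages", "augsburg")
```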

@timobrembeck timobrembeck changed the title META: Large synthetic dataset for performance evaluation Meta: Large synthetic dataset for performance evaluation Oct 31, 2023
@timobrembeck timobrembeck added this to the Meta Issues milestone Oct 31, 2023
@timobrembeck timobrembeck changed the title Meta: Large synthetic dataset for performance evaluation Meta: 🔡 Large synthetic dataset for performance evaluation Nov 4, 2023
@david-venhoff
Member

> however, it does not cover specific edge cases that are not reflected in the original test data

One example of such an edge case would be #2530, where performance testing requires lots of different links, which cannot be created using the duplicate_pages tool.
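
A hedged sketch of how such link-heavy content could be produced; the helper below is hypothetical and not part of the integreat-cms API, it merely builds HTML page bodies with many distinct URLs for the link checker to process:

```python
# Hypothetical sketch: build page content containing many distinct links, so
# that link-checking performance (cf. #2530) can be exercised.
def synthetic_page_content(page_id: int, links_per_page: int = 20) -> str:
    links = "\n".join(
        f'<a href="https://example.com/{page_id}/{i}">Link {i}</a>'
        for i in range(links_per_page)
    )
    return f"<h1>Synthetic page {page_id}</h1>\n{links}"


# Example: three distinct links for page 1
print(synthetic_page_content(1, links_per_page=3))
```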
