
Conversation

@vbrodsky (Contributor) commented Aug 17, 2023

Story: https://labelbox.atlassian.net/browse/SDK-17

The goal of this story was to reduce the time spent in test setup (i.e., fixtures).

BEFORE: estimated average setup time over the 5 slowest features on stage: 300s
AFTER: 224s

BEFORE: total running time of a successful run: 13.5 min (12 reruns)
AFTER: 11.5 min (15 reruns)

Updates and improvements:

  • Replaced configured_project (14 data rows) with configured_project_with_one_data_row where possible (see the sketch after this list)
    • From my observations, each data row creation takes ~1s, and we create about 6K rows via the configured_project fixture during a typical run with reruns
    • The fewer data rows we create, the less chance of data row creation timing out
  • Reduced the sleep interval while waiting for data row processing from the default 30s to 3s
    • We usually create data rows and then proceed to create a batch. Inside create_batch we wait for data row completion with a sleep time of 30s. Completion is almost never immediate, so this adds 30s of run time; reducing the sleep time to 3s brings data row creation down from ~30s to 15-20s per batch
  • Added a way to record the slowest-running individual fixtures (Codefresh reports the slowest setup times per test only, not cumulative per fixture)
    • Optional feature, off by default
    • Currently per worker; results need manual post-processing to get a global total
  • Provided a way to customize the data row ids logic for the prediction_id_mapping fixtures, drastically speeding up tests that do not require actual data row creation
    • We should be able to extend this in another PR to reduce the number of data rows created in configured_project
  • Reliability improvements
    • Removed some more data leaks
    • Another attempt to fix the flaky test_filtering
    • Fixed the possibly flaky test test_user_and_org
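
As a rough sketch (not the exact fixture code in this PR; the asset URL is a placeholder and the ontology attachment is omitted), the one-data-row fixture combined with the shorter polling interval looks roughly like this:

import pytest

PLACEHOLDER_IMG_URL = "https://picsum.photos/200"  # hypothetical asset URL

@pytest.fixture
def configured_project_with_one_data_row(client, ontology, rand_gen):
    # one project, one dataset, one data row instead of the 14 rows
    # created by the heavier configured_project fixture
    project = client.create_project(name=rand_gen(str))
    dataset = client.create_dataset(name=rand_gen(str))
    data_row = dataset.create_data_row(row_data=PLACEHOLDER_IMG_URL)
    # ontology attachment omitted for brevity

    # poll every 3s instead of the default 30s; processing is never done on
    # the first check, so the long default only added dead time
    project._wait_until_data_rows_are_processed(
        data_row_ids=[data_row.uid], sleep_interval=3)

    project.create_batch(rand_gen(str), [data_row.uid])
    yield project

    dataset.delete()
    project.delete()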

IMPROVEMENT STORIES

@vbrodsky requested review from a team, apollonin and kkim-labelbox on August 17, 2023 00:01
@vbrodsky force-pushed the VB/optimize-test_AL-6133_2 branch 7 times, most recently from 4e4314d to 2dea1b1 on August 18, 2023 22:08
@vbrodsky changed the title from "Vb/optimize test al 6133 2" to "[SDK-17] Optimize sdk tests why reducing fixture times" on Aug 18, 2023
@vbrodsky changed the title from "[SDK-17] Optimize sdk tests why reducing fixture times" to "[SDK-17] Optimize sdk tests via reducing fixture times" on Aug 18, 2023
@vbrodsky force-pushed the VB/optimize-test_AL-6133_2 branch from 2dea1b1 to 448164d on August 18, 2023 22:39
project._wait_until_data_rows_are_processed(data_row_ids=data_row_ids,
                                            sleep_interval=3)

project.create_batch(
Contributor:
Should we delete the batch after yield?

Contributor Author (@vbrodsky):
Yes.
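
(For reference, a minimal sketch of that teardown, assuming the Batch object returned by create_batch is kept in the fixture; batch_name and data_row_ids are placeholders:)

    batch = project.create_batch(batch_name, data_row_ids)
    yield project
    # teardown: delete the batch so it doesn't leak into later runs
    batch.delete()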



@pytest.fixture
def project_with_ontology(project):
Contributor:
Should we rename this to project_with_empty_ontology?

Contributor Author (@vbrodsky):
k

Comment on lines -408 to -409
dataset.delete()
project.delete()
Contributor:
no cleanup is needed here?

Contributor Author (@vbrodsky):
I will add batch.delete(); the rest of the resources now come from other fixtures.

label = _create_label(project, data_row, ontology,
                      wait_for_label_processing)

print("create_label took: ", time.time() - start_time)
Contributor:
Are you keeping these print calls or were they only for debugging purposes?

Contributor Author (@vbrodsky):
No, I will remove them.

data_rows = [dr.uid for dr in list(dataset.data_rows())]
project._wait_until_data_rows_are_processed(
    data_row_ids=data_rows,
    wait_processing_max_seconds=DATA_ROW_PROCESSING_WAIT_TIMEOUT_SECONDS,
Contributor:
I think we shouldn't remove the custom wait timeout here; in case of any backend issues, we would wait for the default 3600 seconds.

Contributor Author (@vbrodsky), Aug 23, 2023:
Good find.


@pytest.fixture
def configured_project_without_data_rows(client, ontology, rand_gen):
def configured_project_with_one_data_row(client, ontology, rand_gen):
Contributor:
This does not create a data row; should we revert the rename?

Contributor Author (@vbrodsky):
Another good find.

def test_where(client, project_to_test_where):
p_a, p_b, p_c = project_to_test_where
p_a_name, p_b_name, p_c_name = [p.name for p in [p_a, p_b, p_c]]
p_a_name, p_b_name, _ = [p.name for p in [p_a, p_b, p_c]]
Contributor:
You can remove _ if you also remove p_c from the array.
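
That is, roughly (assuming p_c_name is not used elsewhere in the test):

def test_where(client, project_to_test_where):
    p_a, p_b, p_c = project_to_test_where
    # only the first two names are needed by the assertions below
    p_a_name, p_b_name = [p.name for p in [p_a, p_b]]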

This fixture has only one data row, and all predictions will be mapped to it.
Custom data row ids strategy:
Individual tests can create their own fixture to supply data row ids.
Contributor:
Let's specify that this is only used for tests that require data row ids, not actual created data rows.
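
A sketch of the intended override pattern (the fixture name and id format here are hypothetical, not the actual conftest code): a test that only needs data row ids can supply fabricated ids so prediction_id_mapping never creates real data rows.

import uuid

import pytest

@pytest.fixture
def hypothetical_data_row_ids():
    # fabricate ids locally instead of calling the API; only valid for tests
    # that never dereference the data rows in the backend
    return [str(uuid.uuid4()) for _ in range(3)]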

end = time.time()

exec_time = end - start
if "FIXTURE_PROFILE" in os.environ:
Contributor:
Let's make sure to include this in Codefresh & GitHub.
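
For context, the profiling hook is roughly along these lines (a sketch, not the exact conftest code; the output file name is illustrative). It stays off unless FIXTURE_PROFILE is set, and each xdist worker writes its own file, which is why the totals currently need manual post-processing:

import os
import time

import pytest

@pytest.hookimpl(hookwrapper=True)
def pytest_fixture_setup(fixturedef, request):
    start = time.time()
    yield  # the fixture setup itself runs here
    exec_time = time.time() - start
    if "FIXTURE_PROFILE" in os.environ:
        # one file per xdist worker; aggregate across files afterwards
        worker = os.environ.get("PYTEST_XDIST_WORKER", "main")
        with open(f"fixture_profile_{worker}.txt", "a") as f:
            f.write(f"{fixturedef.argname}\t{exec_time:.3f}\n")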

@vbrodsky force-pushed the VB/optimize-test_AL-6133_2 branch from 235c789 to 5a6e250 on August 23, 2023 00:18
@vbrodsky force-pushed the VB/optimize-test_AL-6133_2 branch from 400e36e to e585e8c on August 23, 2023 17:39
@vbrodsky (Contributor Author):
@attila @kevin Implemented all changes on this PR except for one batch deletion, as it was giving me an error: labelbox.exceptions.OperationNotAllowedException: You can't delete batches that have labels. I think it's not worth chasing in this case. Also did not follow up on adding the FIXTURE_PROFILE key, as discussed with Kevin.
Ready for your review

@kkim-labelbox (Contributor) left a comment:
LGTM! Nice work. Huge improvements!

@vbrodsky merged commit 7339d55 into develop on Aug 23, 2023
@vbrodsky deleted the VB/optimize-test_AL-6133_2 branch on August 23, 2023 20:31