Acceptance tests keep running for hours #1587

Closed
davidgamez opened this issue Sep 26, 2023 · 2 comments · Fixed by #1590
Comments

@davidgamez (Member)

Description

Acceptance tests kept running without stopping and had to be canceled.

Examples of long-running tests:

@jcpitre (Contributor) commented Sep 27, 2023

I tested with this dataset: https://storage.googleapis.com/storage/v1/b/mdb-latest/o/de-unknown-ulmer-eisenbahnfreunde-gtfs-1081.zip?alt=media (it's 900 MB with 40,000,000 shape rows).
I had to increase the heap to 8 GB; otherwise there was an out-of-memory error.
Even with 8 GB it ran seemingly forever.

It seems the problem started with PR #1553.

I locally removed the code from that PR and the test ran fast.

Possible solutions:

  • There might be ways to optimize the code.
  • We should establish limits above which we don't run a given validator (and issue a notice about it). Initially we should target the trip vs. shape distance validation; see the sketch after this list.
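
A rough sketch of such a guard, where MAX_SHAPE_ROWS and TooManyShapeRowsNotice are hypothetical names for a threshold and notice class that don't exist yet:

    // Hypothetical guard: skip the expensive validation on oversized feeds
    // and emit a notice instead of running for hours.
    private static final int MAX_SHAPE_ROWS = 1_000_000; // hypothetical limit

    @Override
    public void validate(NoticeContainer noticeContainer) {
      if (shapeTable.getEntities().size() > MAX_SHAPE_ROWS) {
        // TooManyShapeRowsNotice is a hypothetical notice type.
        noticeContainer.addValidationNotice(
            new TooManyShapeRowsNotice(shapeTable.getEntities().size()));
        return;
      }
      // ... run the trip vs. shape distance validation as usual ...
    }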

It would also be useful to add more logging; I don't think there is any DEBUG logging in the validator.
This would help pinpoint problems faster. For example, a DEBUG log before and after calling each validator would have told us what takes so long.
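
A minimal sketch of such a timing wrapper using Flogger's FluentLogger (any DEBUG-capable logger would do); the runValidator hook itself is hypothetical:

    // Hypothetical hook: log before/after each validator so slow ones
    // show up in DEBUG (FINE) output with their wall-clock time.
    private static final FluentLogger logger = FluentLogger.forEnclosingClass();

    private void runValidator(FileValidator validator, NoticeContainer notices) {
      String name = validator.getClass().getSimpleName();
      logger.atFine().log("Starting %s", name);
      long startNanos = System.nanoTime();
      validator.validate(notices);
      logger.atFine().log(
          "Finished %s in %d ms",
          name, TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos));
    }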

@bdferris-v2 (Collaborator)

Looking at #1553, there is definitely some inefficient code in there.

For example:

    // Scans every row of shapes.txt just to collect the distinct shape ids.
    List<String> uniqueShapeIds =
        shapeTable.getEntities().stream()
            .map(GtfsShape::shapeId)
            .distinct()
            .collect(Collectors.toList());

This is iterating over every line in shapes.txt, when the set of unique shape ids is already available via GtfsShapeTableContainer#byShapeIdMap().
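
A sketch of the cheaper lookup, assuming byShapeIdMap() returns a Guava multimap keyed by shape id, as described above:

    // Reuse the prebuilt index instead of re-scanning every shape row.
    Set<String> uniqueShapeIds = shapeTable.byShapeIdMap().keySet();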

Then, the worst offender:

    // For each shape id, re-scans the ENTIRE shapes table to find that
    // shape's points: O(shapes * shape_points) overall.
    uniqueShapeIds.forEach(
        shapeId -> {
          double maxShapeDist =
              shapeTable.getEntities().stream()
                  .filter(s -> s.shapeId().equals(shapeId))
                  .mapToDouble(GtfsShape::shapeDistTraveled)
                  .max()
                  .orElse(Double.NEGATIVE_INFINITY);
          // ... (rest of the per-shape validation)
        });

For each shape id, we are again looping over every entry in shapes.txt to find matching shape points; this is likely where the N×M blow-up and slowdown come from. Again, GtfsShapeTableContainer#byShapeIdMap() already has all the shape points grouped by shape id and should be used directly.
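
A sketch of the single-pass alternative, under the same assumption about byShapeIdMap():

    // One pass per shape: iterate the grouped map instead of re-filtering
    // the whole table for every shape id.
    shapeTable.byShapeIdMap().asMap().forEach(
        (shapeId, shapePoints) -> {
          double maxShapeDist =
              shapePoints.stream()
                  .mapToDouble(GtfsShape::shapeDistTraveled)
                  .max()
                  .orElse(Double.NEGATIVE_INFINITY);
          // ... compare against the trip's stop_times distances here ...
        });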

I'd also point out that the shape points and stop points below should both probably be filtered by hasShapeDistTraveled as well.
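
Sketch of that filter on the shape side (the stop_times side would presumably use the matching GtfsStopTime::hasShapeDistTraveled):

    // Ignore points that never set shape_dist_traveled, so the default 0.0
    // is not treated as a real measurement.
    double maxShapeDist =
        shapePoints.stream()
            .filter(GtfsShape::hasShapeDistTraveled)
            .mapToDouble(GtfsShape::shapeDistTraveled)
            .max()
            .orElse(Double.NEGATIVE_INFINITY);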

cka-y added a commit that referenced this issue Sep 29, 2023
cka-y added a commit that referenced this issue Oct 11, 2023
* fix: comments on #1587

* fix: removed unused variable

* fix: formatting

* fix: rollback sources for acceptance test

* fix: removing source causing timeout

* fix: removing source causing timeout