Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Test project from NetTopologySuite.IO.ShapeFile repository #7

Merged
merged 123 commits into from
Nov 21, 2022

Conversation

KubaSzostak
Copy link
Member

Add Test project from NetTopologySuite.IO.ShapeFile repository

  • Most of the the tests have been ported without test logic changes. Test code was adjusted to new NTS.IO.Esri library API.
  • Some of the tests have changed test logic caused by new NTS.IO.Esri library approach. All of those changes are marked by TODO: Changed original test logic comments.
  • Some of the tests have been omitted because they are no longer relevant for new NTS.IO.Esri library. All of those tests are marked by TODO: Remove no longer relevant test comments.
  • Some of the test were already ignored (for example because of missing SHP file).
  • In my opinion all omitted and ignored test cases should be removed from source code in order to improve code quality. This requires code review to make sure important test wasn't removed by mistake.

@KubaSzostak
Copy link
Member Author

NetTopologySuite.IO.Esri currently read all polygon shapefiles always as MultiPolygon what is consistent with ESRI Shapefile Technical Description. The same for lines - they are always read as MultiLineString.

@DGuidi
Copy link

DGuidi commented Oct 30, 2022

I have some compilation issues
dbf.Keys() should be record.GetNames() I think

in Test.cs, PrintFieldValues should be something like

        protected void PrintFieldValues(IAttributesTable values)
        {
            Console.WriteLine();
            foreach (var nameVal in values.GetNames())
            {
                PrintFieldValue(nameVal, values[nameVal]);
            }
        }

or am I missing something?

There is also something in ShapeFileDataReaderTest.TestReadingShapeFileWithNulls that prevents subsequent tests to be executed, at least in my machine. There is a Debug.Assert in ShpPolygonReader that fails when this test is executed.
Debug.Assert(!shell.IsCCW, "Invalid Shapefile polygon - shell coordinates are not in in clockwise order.");

Beside this, all tests are basically green.
In my opinion, the code can be integrated and marked as experimental in the docs, so everyone out there can start using it, but the clear disclaimer that the code written against the old shapefile library can be rewritten, the behavour is different and there will be some bugs.

@KubaSzostak
Copy link
Member Author

Thanks for the feedback, @DGuidi . I've fixed compilation errors in the TestConsole app. By the way I also upgraded it from .NET Framework into .NET 6.

Assert statement was set for this piece of code

var shell = Factory.CreateLinearRing(partsBuilder[0]);
if (shell.IsCCW)
{
    if (partsBuilder.Count > 1)
    {
        ThrowInvalidRecordException("Invalid Shapefile polygon - shell coordinates are not in in clockwise order.");
    }
    Debug.Assert(!shell.IsCCW, "Invalid Shapefile polygon - shell coordinates are not in in clockwise order.");
    shell = Factory.CreateLinearRing(partsBuilder[0].Reversed());
}

According to SHP specification vertices for a polygon shell should have clockwise order. I could throw an error if that's not the case, but in order to pass other tests forgiving approach was used. If, contrary to documentation, shell IsCCW and there is only one polygon part (there are no other polygons and there are no holes) then it is recreated using Reversed() vertices order. In theory it shouldn't happen so I decided to add an Assert statement to make developer aware of the issue. In this specific test case we have test shapefile that contains incorrect data. That is intentionally alerted through Assert statement. I started thinking that after passing all other test maybe this is no longer needed? Should I remove it or there is another way to make the subsequent tests pass even if there is failure in an Assert statement?

@DGuidi
Copy link

DGuidi commented Nov 4, 2022

I think that this can be related to IO.ShapeFile issue 70 and related PR 80?

@KubaSzostak
Copy link
Member Author

Yes, it's related to issue 70 and PR 80. Currently it works like this:

  1. Read first LinearRing
    • If IsCCW == false it's fine, go further.
    • If IsCCW == true and there is only one polygon part (no other polygons, no holes) then reversed order of vertices is used.
    • If IsCCW == true and there are more polygon parts then Invalid Shapefile polygon exception is thrown .
  2. Rest of polygon parts are read assuming that:
    • IsCCW == false starts new polygon
    • IsCCW == true are holes in previously read polygon.
  3. If polygon parts are invalid (for example hole is outside the shell) then no exception is thrown. In that case geometry has IsValid property set to fasle.

Above approach was chosen in order to make the reading fast. Personally I don't like the idea of checking every geometry in the world because there can be someone who provided us invalid Shapefile. But if you like to sacrifice performance for convenience I can change the code to follow @FObermaier proposal:

  1. Order the polygon rings by their area (as if they were Polygons without holes)
  2. Pick the largest as 1st shell
  3. Check the others if they are contained by any of the shells and if so, make them a hole of that shell, otherwise create a new shell (in that case we have a MultiPolygon).
  4. Repeat 3 until there are no more rings left
  5. Build the valid [Multi]Polygon.

What is the decision?

@KubaSzostak
Copy link
Member Author

In the meantime I've moved all ported test into Depraced folder/namespace. I've also added a new test checking if MultiPolygons that have a polygon inside the hole are handled properly.

image

@DGuidi
Copy link

DGuidi commented Nov 5, 2022

What is the decision?

My2cents: as in old library (if I remember correctly), enable the "slower code that checks every geometry" only if a specific flag is enabled. Otherwise assume the data provided is correct and use the faster.code.

@KubaSzostak
Copy link
Member Author

I like this idea. I will add a flag to ShapefileReaderOptions and create two ReadGeometry() variants. How to name the flag?

  1. FixInvalidShapes
  2. FixInvalidGeometries
  3. FixStructure (as in WktRreadder)

@DGuidi
Copy link

DGuidi commented Nov 5, 2022

There should be similar flags in the codebase of the io.shapefile library. You can take a look for inspiration, but feel free to choose whatever you prefer.
EDIT: @KubaSzostak actually the code I mentioned is not in the main branch, see PolygonBuilder here

@KubaSzostak
Copy link
Member Author

I'm going to handle it using following flags (common for all geometry types):

  • Strict (throw exceptions)
  • IngoreFailures
  • SkipFailures
  • FixFailures
  • FixBasicFailures

@DGuidi
Copy link

DGuidi commented Nov 14, 2022

In my opinion, the code can be integrated and marked as experimental in the docs, so everyone out there can start using it, but the clear disclaimer that the code written against the old shapefile library can be rewritten, the behavour is different and there will be some bugs.

@FObermaier @airbreather sorry to bother you. any advice on this?

Copy link
Member

@FObermaier FObermaier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the continued work on this.

@FObermaier FObermaier merged commit f948075 into develop Nov 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants