Python: Integration tests #6398

Fokko · 2022-12-09T22:10:10Z

This is the first version of a framework to read Iceberg tables, produced by Spark, using PyIceberg. This makes it easier to run end-to-end tests and also validate the behavior of PyArrow and DuckDB.

rdblue · 2023-03-14T20:50:41Z

.github/workflows/python-integration.yml

+        python-version: '3.9'
+        cache: poetry
+        cache-dependency-path: |
+          ./python/poetry.lock


Should there be more than just the lock file?

If you change the dependencies, you need to regenerate the lock file, so that should be enough.
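To illustrate the point about the lock file, here is a minimal sketch that assumes the cache key is derived from a hash of the lock file's contents. The helper name `cache_key`, the key prefix, and the sample package versions are hypothetical, and this is not a description of how the GitHub action is actually implemented.

```python
import hashlib


def cache_key(lock_file_contents: bytes, prefix: str = "poetry") -> str:
    """Derive a cache key from the lock file contents (hypothetical helper)."""
    digest = hashlib.sha256(lock_file_contents).hexdigest()
    return f"{prefix}-{digest[:16]}"


# Illustrative lock file snippets; the versions are made up for the example.
old_lock = b'[[package]]\nname = "pyarrow"\nversion = "10.0.1"\n'
new_lock = b'[[package]]\nname = "pyarrow"\nversion = "11.0.0"\n'

# Same contents give the same key, so the cache is reused.
same_key = cache_key(old_lock) == cache_key(old_lock)
# Regenerating the lock file changes its contents, so the key changes too.
changed_key = cache_key(old_lock) != cache_key(new_lock)
```

Under that assumption, any dependency change that is reflected in a regenerated lock file invalidates the cache without tracking further files.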

rdblue · 2023-03-14T20:52:19Z

python/dev/spark-defaults.conf

+
+spark.sql.extensions                   org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
+spark.sql.catalog.demo                 org.apache.iceberg.spark.SparkCatalog
+spark.sql.catalog.demo.catalog-impl    org.apache.iceberg.rest.RESTCatalog


Less is more, thanks!

rdblue · 2023-03-14T20:54:01Z

python/dev/docker-compose-integration.yml

+      - minio:minio
+  rest:
+    image: tabulario/iceberg-rest:0.2.0
+    container_name: pyiceberg-rest


Where does this store the underlying catalog metadata?

An in-memory SQLite database.
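The practical consequence of an in-memory store is that catalog entries do not survive a restart. A minimal sketch using Python's `sqlite3` module follows. The table name `catalog_tables`, its columns, and the metadata path are hypothetical and do not reflect the schema used by the REST catalog image.

```python
import sqlite3
from typing import List


def create_catalog() -> sqlite3.Connection:
    """Open a fresh in-memory database with a hypothetical catalog table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE catalog_tables ("
        "identifier TEXT PRIMARY KEY, metadata_location TEXT)"
    )
    return conn


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Return all registered table identifiers in sorted order."""
    rows = conn.execute(
        "SELECT identifier FROM catalog_tables ORDER BY identifier"
    ).fetchall()
    return [row[0] for row in rows]


# First "process": register a table, then shut down.
first = create_catalog()
first.execute(
    "INSERT INTO catalog_tables VALUES (?, ?)",
    ("default.test_null_nan", "s3://warehouse/example/metadata.json"),  # made-up path
)
registered = list_tables(first)
first.close()

# Second "process": a new in-memory database starts empty.
second = create_catalog()
after_restart = list_tables(second)
second.close()
```

For integration tests this is usually acceptable, since the tables are provisioned again on every run.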

rdblue · 2023-03-14T20:54:35Z

python/pyiceberg/io/pyarrow.py

@@ -428,11 +428,11 @@ def visit_not_in(self, term: BoundTerm[pc.Expression], literals: Set[Any]) -> pc

    def visit_is_nan(self, term: BoundTerm[Any]) -> pc.Expression:
        ref = pc.field(term.ref().field.name)
-        return ref.is_null(nan_is_null=True) & ref.is_valid()
+        return pc.is_nan(ref)


This probably shouldn't be in this PR, right? It seems like an update that goes with a new version of PyArrow.

This is actually to make the CI pass. I've created a PR to allow ref.is_nan() as well, but this is not released yet.
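The two expressions in the diff are intended to select the same rows. A pure-Python model of the logic, which does not use PyArrow, shows why: `is_null(nan_is_null=True)` is true for nulls and NaNs, and combining it with `is_valid()` removes the nulls again. The function names mirror the PyArrow calls in the diff, but the implementations are illustrative stand-ins, and the model simplifies null handling by returning False for nulls.

```python
import math
from typing import List, Optional

Column = List[Optional[float]]


def is_null(values: Column, nan_is_null: bool = False) -> List[bool]:
    """True for None, and also for NaN when nan_is_null is set."""
    return [v is None or (nan_is_null and math.isnan(v)) for v in values]


def is_valid(values: Column) -> List[bool]:
    """True for every value that is not None."""
    return [v is not None for v in values]


def is_nan_old(values: Column) -> List[bool]:
    """Old formulation: (null or NaN) and not null."""
    nulls_or_nans = is_null(values, nan_is_null=True)
    return [a and b for a, b in zip(nulls_or_nans, is_valid(values))]


def is_nan_new(values: Column) -> List[bool]:
    """New formulation: a direct NaN check (nulls map to False here)."""
    return [v is not None and math.isnan(v) for v in values]


column: Column = [1.0, None, float("nan")]
old_mask = is_nan_old(column)
new_mask = is_nan_new(column)
```

Both masks select only the NaN entry, so the change affects how the filter is expressed rather than which rows match.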

rdblue · 2023-03-14T20:54:53Z

python/pyiceberg/table/__init__.py

@@ -331,7 +331,7 @@ def __init__(
        self,
        table: Table,
        row_filter: Union[str, BooleanExpression] = ALWAYS_TRUE,
-        selected_fields: Tuple[str] = ("*",),
+        selected_fields: Tuple[str, ...] = ("*",),


This also seems like a separate PR change, but good cleanup.
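For context on the annotation fix: `Tuple[str]` describes a tuple with exactly one string, while `Tuple[str, ...]` describes a tuple of strings of any length. A short sketch using only the standard `typing` module follows; the function `select` is a hypothetical stand-in for the scan constructor.

```python
from typing import Tuple, get_args

ExactlyOne = Tuple[str]       # a 1-tuple such as ("*",)
AnyLength = Tuple[str, ...]   # zero or more strings

# Inspect the type arguments to see the difference between the two aliases.
one_arg = get_args(ExactlyOne)
variadic = get_args(AnyLength)


def select(selected_fields: Tuple[str, ...] = ("*",)) -> Tuple[str, ...]:
    """Hypothetical function that accepts any number of field names."""
    return selected_fields


# A two-element tuple matches Tuple[str, ...]; a type checker would reject
# it against the original Tuple[str] annotation.
fields = select(("idx", "col_numeric"))
```

The runtime behavior is identical in both cases; the change only matters to static type checkers.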

rdblue · 2023-03-14T20:57:18Z

python/tests/test_integration.py

+    arrow_table = table_test_null_nan.scan(row_filter=IsNaN("col_numeric"), selected_fields=("idx", "col_numeric")).to_arrow()
+    assert len(arrow_table) == 1
+    assert arrow_table[0][0].as_py() == 1
+    assert math.isnan(arrow_table[1][0].as_py())


I think it would be easier to read these tests if you called as_py() to produce rows and validated the rows. It looks like there's just one row, but the row/column indexes are backward because this is columnar?

Let me rewrite those tests a bit

I've changed it to: assert math.isnan(arrow_table["col_numeric"][0].as_py())
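The row-oriented validation suggested in the review can be sketched without PyArrow by modelling the table as a dictionary of columns. The helper `to_rows` is hypothetical; with a real `pyarrow.Table`, `to_pylist()` produces a comparable list of row dictionaries.

```python
import math
from typing import Any, Dict, List

# Columnar layout: the first index selects a column, the second a row.
columns: Dict[str, List[Any]] = {
    "idx": [1],
    "col_numeric": [float("nan")],
}


def to_rows(cols: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Transpose a dict of equal-length columns into a list of row dicts."""
    names = list(cols)
    return [dict(zip(names, values)) for values in zip(*cols.values())]


rows = to_rows(columns)

# Row-oriented assertions read in the natural order: row first, then field.
first_row = rows[0]
idx_matches = first_row["idx"] == 1
numeric_is_nan = math.isnan(first_row["col_numeric"])
```

Addressing values by column name, as in the rewritten assertion, avoids the ambiguity of positional indexes in either layout.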

rdblue · 2023-03-14T20:58:10Z

python/tests/test_integration.py

+def test_duckdb_nan(table_test_null_nan_rewritten: Table) -> None:
+    con = table_test_null_nan_rewritten.scan().to_duckdb("table_test_null_nan")
+    result = con.query("SELECT idx FROM table_test_null_nan WHERE isnan(col_numeric)").fetchone()
+    assert result == (1,)


It doesn't return NaN?

Now it does :)

rdblue

Overall the changes look like a good start.

Fokko · 2023-03-15T19:05:33Z

Thanks for the review, @rdblue. We can add more tests later on.

* Integration tests
* First version
* Add caching
* Add caching
* Restore pyproject
* WIP
* NaN seems to be broken
* WIP
* Coming along
* Cleanup
* Install duckdb
* Cleanup
* Revert changes to poetry
* Make it even nicer
* Revert unneeded change
* Update Spark version
* Make test passing
* comments

Fokko added 2 commits December 7, 2022 20:51

Integration tests

d4e1916

Merge branch 'master' of github.com:apache/iceberg into fd-integratio…

9c68ca2

…n-tests

github-actions bot added INFRA python labels Dec 9, 2022

First version

05b8aed

Fokko force-pushed the fd-integration-tests branch from ad9dae5 to 05b8aed Compare December 9, 2022 22:11

Add caching

79b8e36

Fokko force-pushed the fd-integration-tests branch from e8fc9e1 to 79b8e36 Compare December 11, 2022 21:12

Add caching

58af0c3

This was referenced Dec 15, 2022

Python: Add adlfs support (Azure DataLake FileSystem) #6392

Merged

[Python] support iceberg hadoop catalog in python library #3220

Closed

Fokko added 2 commits December 20, 2022 22:05

Merge branch 'master' of github.com:apache/iceberg into fd-integratio…

084ea4d

…n-tests

Restore pyproject

9b7fc33

Fokko mentioned this pull request Dec 21, 2022

Python: Read a date as an int #6478

Merged

Fokko mentioned this pull request Jan 11, 2023

Python write support #6564

Closed

4 tasks

Fokko mentioned this pull request Jan 30, 2023

Python: Add visitor to DNF expr into Dask/PyArrow format #6566

Merged

Fokko added this to the Python 0.4.0 release milestone Jan 30, 2023

Fokko added 13 commits January 31, 2023 17:49

WIP

b81b45f

Merge branch 'master' of github.com:apache/iceberg into fd-integratio…

cb2741b

…n-tests

NaN seems to be broken

3ff7427

WIP

cffa6cd

Coming along

e3e70ae

Merge branch 'master' of github.com:apache/iceberg into fd-integratio…

9f13128

…n-tests

Cleanup

ff08efc

Install duckdb

3b564d0

Cleanup

8cb8b9c

Revert changes to poetry

099d720

Merge branch 'master' of github.com:apache/iceberg into fd-integratio…

2b9836a

…n-tests

Make it even nicer

0f19e2f

Merge branch 'master' of github.com:apache/iceberg into fd-integratio…

c2635cf

…n-tests

Fokko added 3 commits February 23, 2023 18:32

Revert unneeded change

0bc6861

Merge branch 'master' of github.com:apache/iceberg into fd-integratio…

843e5f0

…n-tests

Update Spark version

8972f22

Fokko mentioned this pull request Feb 27, 2023

Python: Fix timezone concat issue #6946

Merged

Fokko added 3 commits February 28, 2023 09:14

Make test passing

3516159

Merge branch 'master' of github.com:apache/iceberg into fd-integratio…

8205f34

…n-tests

Merge branch 'master' of github.com:apache/iceberg into fd-integratio…

6d857ba

…n-tests

This was referenced Mar 8, 2023

Python: Add Google Cloud Storage support #6906

Closed

Python: Add positional deletes #6775

Merged

Fokko requested a review from rdblue March 12, 2023 21:51

Merge branch 'master' of github.com:apache/iceberg into fd-integratio…

89bf8f7

…n-tests

rdblue reviewed Mar 14, 2023

rdblue approved these changes Mar 14, 2023

Fokko added 2 commits March 15, 2023 16:36

Merge branch 'master' of github.com:apache/iceberg into fd-integratio…

b1ec6a5

…n-tests

comments

bf1d59a

Fokko merged commit 0807857 into apache:master Mar 15, 2023

Fokko deleted the fd-integration-tests branch March 15, 2023 19:04

This was referenced Mar 18, 2023

Python: Current Python CI ignore most of the unit tests #7135

Closed

Python: Add more unit tests to remain 90% test coverage result #7149

Closed

Fokko mentioned this pull request Oct 2, 2023

Python write support apache/iceberg-python#23

Closed

4 tasks
