#3 UDT support / Overall Unit & Integration tests #54

grighetto · 2020-11-23T15:26:57Z

This PR adds support for UDT anonymization and includes unit and integration tests for all anonymized elements.

For the integration tests, a base CQL schema (generated programmatically) is executed against a few C* versions running in Docker; the anonymizer is then run and its output is compared to the expected CQL file in the "schemas" folder.

The unit tests are mostly self-explanatory; they validate a few complex UDTs, as requested in the original issue #3.
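As a rough illustration of why nested UDTs need special care: a UDT can be referenced from inside collections and other UDTs, so every reference in a type string has to be rewritten to the same anonymized name. The sketch below assumes a hypothetical `anonymize_type` helper and a precomputed name mapping (the `udt_` prefix is also an assumption); it is not the code in `anonymize.py`.

```python
import re

# Matches bare CQL identifiers such as "frozen", "map", "text" or a UDT name.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def anonymize_type(cql_type, udt_names):
    """Rewrite every UDT reference in a CQL type string (hypothetical helper).

    udt_names maps original UDT names to anonymized ones. Built-in names like
    frozen/map/list/text are left untouched because they are not in the map.
    """
    return IDENTIFIER.sub(lambda m: udt_names.get(m.group(0), m.group(0)), cql_type)


mapping = {"address": "udt_0", "phone": "udt_1"}
print(anonymize_type("frozen<map<text, frozen<list<frozen<address>>>>>", mapping))
# -> frozen<map<text, frozen<list<frozen<udt_0>>>>>
```

A real implementation would also have to cope with quoted (case-sensitive) identifiers and with UDT names that collide with built-in type names; this sketch ignores both.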

Steps to run the tests:

1. Build the Python package as usual with: `python3 setup.py install`
2. From `python/adelphi/tests/unit`, run the unit tests with: `pytest`
3. From `python/adelphi/integration-tests`, run the integration tests with: `./adelphi-docker-tests.sh`

grighetto · 2020-11-23T16:20:07Z

.gitignore

@@ -1,3 +1,6 @@
+# Test output files
+output/


The integration test output goes there.

grighetto · 2020-11-23T17:04:18Z

python/adelphi/integration-tests/adelphi-docker-tests.sh

+	local version=$1
+	echo "Comparing..."
+	diff schemas/$version.cql output/$version.cql
+	diffExitCode=$?


Keeps track of the exit code so that the parent process fails if any comparison failed, even when the last comparison succeeded.
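A minimal Python sketch of the same "sticky" exit-code idea, assuming the `schemas/<version>.cql` and `output/<version>.cql` layout from the snippet above. `compare_outputs` is a hypothetical name and is not part of the PR's shell script.

```python
import filecmp
import os


def compare_outputs(versions, expected_dir="schemas", output_dir="output"):
    """Compare expected_dir/<version>.cql with output_dir/<version>.cql.

    Returns 0 only if every comparison matched. Any mismatch or missing file
    sets the result to 1, and it is never reset back to 0 afterwards.
    """
    exit_code = 0
    for version in versions:
        expected = os.path.join(expected_dir, version + ".cql")
        actual = os.path.join(output_dir, version + ".cql")
        print("Comparing %s..." % version)
        try:
            same = filecmp.cmp(expected, actual, shallow=False)
        except OSError:
            same = False  # a missing file counts as a failed comparison
        if not same:
            print("Mismatch for %s" % version)
            exit_code = 1  # sticky: later successes don't clear it
    return exit_code
```

Called as `sys.exit(compare_outputs(["2.1.22", "4.0-beta3"]))`, the process fails if any version mismatched, regardless of which comparison ran last.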

grighetto · 2020-11-23T17:05:22Z

python/adelphi/integration-tests/base-schema.cql

@@ -0,0 +1,211 @@
+CREATE KEYSPACE my_ks_0 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;


This is the base schema that gets executed against all C* versions.

It's generated with: python3 schema_util.py > ../../integration-tests/base-schema.cql

grighetto · 2020-11-23T17:07:38Z

python/adelphi/tests/unit/schema_util.py

+	IndexMetadata,\
+	SimpleStrategy, \
+	UserType
+


Utility script that programmatically generates a C* schema compatible with C* versions 2.1.22 through 4.0-beta3.

It contains a couple of keyspaces, tables, indexes (regular and custom), and complex UDTs.
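For a sense of what programmatically generated nested UDTs can look like, here is a small sketch that builds plain CQL strings. The helper names and the `my_type_N` naming (modelled on `my_ks_0` above) are hypothetical; the real `schema_util.py` appears to use the driver's metadata classes (`UserType`, `IndexMetadata`, ...) instead.

```python
def create_type_cql(keyspace, name, fields):
    """Build a CREATE TYPE statement from a list of (field_name, cql_type) pairs."""
    body = ", ".join("%s %s" % (field, cql_type) for field, cql_type in fields)
    return "CREATE TYPE %s.%s (%s);" % (keyspace, name, body)


def nested_udts(keyspace, depth):
    """Return CQL for a chain of UDTs in which my_type_N embeds my_type_(N-1).

    Types are emitted innermost-first because a CREATE TYPE statement can only
    reference UDTs that already exist. UDTs used inside collections or other
    UDTs are wrapped in frozen<>, which C* requires in those positions.
    """
    statements = [create_type_cql(keyspace, "my_type_0",
                                  [("f_text", "text"), ("f_int", "int")])]
    for i in range(1, depth):
        inner = "my_type_%d" % (i - 1)
        statements.append(create_type_cql(keyspace, "my_type_%d" % i, [
            ("f_nested", "frozen<%s>" % inner),
            ("f_list", "list<frozen<%s>>" % inner),
        ]))
    return statements


for stmt in nested_udts("my_ks_0", 3):
    print(stmt)
```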

grighetto · 2020-11-23T17:07:57Z

python/adelphi/tests/unit/test_anonymize.py

+except ImportError:
+    import unittest  # noqa
+
+class TestCqlAnonymize(unittest.TestCase):


Anonymizer unit tests.

We can expand this to test the command-line params too; for now, these focus on the anonymization itself.

absurdfarce · 2020-11-24T06:37:48Z

python/adelphi/adelphi/anonymize.py

@@ -1,12 +1,10 @@
+import re


Nit: this should be moved below the top-level comment for this package. At least that's the convention used in the other Adelphi packages.

absurdfarce · 2020-11-24T06:40:28Z

python/adelphi/adelphi/store.py

@@ -54,7 +54,6 @@ def partition(pred, iterable):
        log.info("Excluding system keyspaces " + ",".join((ks.name for ks in failed)))
    return passed

-


Nit: there are two blank lines between top-level functions in all the Python packages (per PEP 8). We can change that if desired, but a one-off change doesn't seem like a good idea.

absurdfarce · 2020-11-24T06:45:36Z

python/adelphi/adelphi/anonymize.py

+    # remove functions, aggregates and views for now
+    keyspace.functions = {}
+    keyspace.aggregates = {}
+    keyspace.views = {}


Is the removal of all of these required for anonymization? Seems like we're giving up a fair amount of useful schema info here.

@absurdfarce do you think we can safely anonymize functions and aggregates?

Maybe materialized views could be added, but since views aren't recommended for prod, it seemed like a lower priority that shouldn't block the rest of the PR. We can open a ticket to track this though.

My inclination is to say we probably can do this, but yeah, it can be held off for future work. It does seem like including as many of these as possible in some capacity is probably worth it, though, since it will make for more robust validation testing.

> My inclination is to say we probably can do this

Did you consider anonymizing the function body across all the supported languages? Even so, I think hiding the implementation of a UDF's algorithm is more important than just anonymizing its variable names.
Anyway, that's a much larger scope and we can revisit it in the future, as you said. +1

python/adelphi/adelphi/anonymize.py

absurdfarce · 2020-11-30T21:51:35Z

pytest is actually a third-party tool for running Python unit tests (see https://docs.pytest.org/en/stable/). It needs to be installed as a separate package for use here, so make sure to run "pip install pytest" if you wish to use it.

absurdfarce · 2020-11-30T21:54:40Z

python/adelphi/tests/unit/test_anonymize.py

@@ -0,0 +1,133 @@
+import pytest


This import doesn't appear to be used: everything below uses the conventional Python unittest framework (which seems like the right decision to me) and I don't see any difference in behaviour when this import is removed.

absurdfarce · 2020-11-30T21:56:30Z

python/adelphi/tests/unit/test_anonymize.py

+		name2 = get_name("test_column", COLUMN_PREFIX)
+		name3 = get_name("another_column", COLUMN_PREFIX)
+		self.assertEqual(name1, name2)
+		self.assertNotEqual(name1, name3)


Worth adding this at the end so that we can run this thing with a simple python test_anonymize.py (i.e. to avoid any explicit dependence on pytest):

if __name__ == "__main__": unittest.main()
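To show the kind of property `test_anonymize.py` checks, here is a self-contained sketch using a hypothetical `get_name` stand-in (the real function presumably lives in `adelphi/anonymize.py` and may differ): repeated names map to the same anonymized value, distinct names to distinct values. Appending the `if __name__ == "__main__"` guard from the comment above lets it run with plain `python`.

```python
import unittest

COLUMN_PREFIX = "col_"  # the PR consolidated the pk/ck/col prefixes into "col_"

# Hypothetical module-level cache: prefix -> {original name: anonymized name}
_anonymized = {}


def get_name(name, prefix):
    """Hypothetical stand-in: reuse an existing mapping, else assign the next index."""
    by_name = _anonymized.setdefault(prefix, {})
    if name not in by_name:
        by_name[name] = "%s%d" % (prefix, len(by_name))
    return by_name[name]


class TestGetName(unittest.TestCase):
    def test_names_are_stable_and_distinct(self):
        name1 = get_name("test_column", COLUMN_PREFIX)
        name2 = get_name("test_column", COLUMN_PREFIX)
        name3 = get_name("another_column", COLUMN_PREFIX)
        self.assertEqual(name1, name2)
        self.assertNotEqual(name1, name3)
```

Because the cache is module-level state, the order in which names are first seen decides their indices; that order sensitivity is what surfaces as the Python 2 vs. 3 differences discussed below.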

absurdfarce · 2020-11-30T21:58:59Z

The unit test is currently failing when run against Python 2.7.14. Looks mainly like a collection of mismatches due to the integers in use for anonymized names. It's not immediately clear to me if that process just isn't deterministic or if there's something else going on here. Integration tests using Python 2.7.14 also fail across the board for what looks to be the same reasons.

Unit test passes without issue using Python 3.9.0.

Integration test is behaving... strangely when using Python 3.9.0. Still looking into what's going on there.

grighetto · 2020-12-03T03:58:57Z

@absurdfarce Thanks for the thorough review. PR feedback has been addressed.
I made the change to use pip install instead of python setup.py install to cope with your virtualenv setup (other people would probably run into that too). Let me know how that goes for you now so we can move on with this PR.

absurdfarce · 2020-12-07T18:24:16Z

@grighetto I agree that all the comments on this PR are put to bed so we're very close. At this point the only thing holding this PR up from going in is the test failures: I continue to see unit test failures for both Python 2.7.x (2.7.14 specifically) and 3.x (3.9.0 specifically). I'd like to at least see passing tests before we move this forward. Is this something you can take a look at?

UPDATE: failures for Python 3.x were presumably user error on my part. After a clean rebuild + retest the unit tests pass when using Python 3.9.0.

absurdfarce · 2020-12-08T21:36:26Z

I'll report the error output I'm seeing from the unit tests just so there's a record of those errors somewhere.

After doing a "pip uninstall; pip install" of this package I'm observing the following failures using Python 2.x:

python2_unit_test_failures.txt

A similar process (pip uninstall; pip install; manual run of unit tests) looks to pass without errors when using Python 3.x (specifically Python 3.9.0).

grighetto · 2020-12-09T06:56:24Z

Tests fixed for Python 2: the failures were caused by different dict ordering behavior across Python versions. Manually sorting the keyspace elements by name makes the anonymization deterministic and fixes the issues across the board.
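A minimal sketch of the fix described above, assuming the elements to anonymize are available as a dict keyed by name (`anonymize_names` and the `tbl_` prefix are hypothetical). Sorting the names before assigning indices makes the result independent of dict iteration order.

```python
def anonymize_names(names, prefix):
    """Assign prefix + index to each name, iterating in sorted order.

    Python 2 dicts have no guaranteed iteration order, while Python 3.7+ dicts
    preserve insertion order, so iterating a dict directly can hand out
    different indices on different interpreters. Sorting first removes that.
    """
    return {name: "%s%d" % (prefix, i) for i, name in enumerate(sorted(names))}


tables = {"users": None, "events": None, "accounts": None}
print(anonymize_names(tables, "tbl_"))
```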

absurdfarce · 2020-12-09T07:30:49Z

Recent changes to get to passing unit tests seem sensible to me... I think we're finally done with this guy!

absurdfarce and others added 26 commits November 12, 2020 13:06

Base packaging infrastructure for adelphi package. Also broke CQL, Gemini functionality into their own modules.

bde91be

Rename actual app to "adelphi"

c6d5b97

Convert to click for CLI/command mgmt

1d63762

Turned the anonymizer into a Python package; Added integration tests that run against different versions of C* in Docker containers.

6fa2ca2

Fix (no-)anonymize functionality

3fb95b1

Pulled anonymization logic into it's own module, same with C* fns. Also added comments to the various modules indicating what they're supposed to do.

7b5d177

Forgot to add, you know, the new modules 🤦

a972ede

Avoid copying the entire metadata object from the driver. Instead provide a higher-order fn which can be called with the driver's cluster object (when connected) + infrastructure for extracting what we need from this object.

ca97dbb

#41 Renamed config param from commit_hash to git_identifier.

c76d50b

Removing old anonymizer.py script

b93418e

Moving system keyspaces list to adelphi.store

e28224f

Code review fixes. Also fixed bone-headed mistake of removing __init__.py entirely.

b1e6343

Missed py3 compat fix

b3250a3

Fixing problem of iterating over all metadata keyspaces rather than only those selected

493c3c0

Some additional cleanup; avoid passing top-level metadata object to exporters

7899d4d

Merge remote-tracking branch 'origin/40-what-about-github' into 3-anonymizer-automated-validation # Conflicts: # schema-anonymizer/cassandra-anonymizer/anonymizer.py

712b693

#3 Moving integration tests to match new project structure after refactoring.

3008571

#3 Created a schemas folder for integration tests.

431c6ba

#3 Improved error handling and debug messages in the anonymizer integration tests.

da48205

Exclude system keyspaces in all cases. Also added more logging to clearly indicate what is and isn't being included in a given export operation.

b73af9a

Trying to simplify arg processing: hope is to not have to pull click context object apart more than once and to re-use this object rather than creating new ones.

f211fde

#3 Added unit tests to the anonymizer and simplified code around pk/ck/col prefixes and consolidated them all to the `col_` prefix.

163d0da

#3 Improved integration tests script; Reverted removal of `get_standard_columns_from_table_metadata`.

131f7a5

#3 Fixed UDT anonymization and added tests for complex UDTs; Remove table comment since it may contain sensitive info. Updated integration tests.

817aff9

Merge remote-tracking branch 'origin/40-what-about-github' into 3-anonymizer-automated-validation # Conflicts: # python/adelphi/adelphi/store.py

f9c2a73

Merge remote-tracking branch 'origin/master' into 3-anonymizer-automated-validation # Conflicts: # python/adelphi/adelphi/anonymize.py # python/adelphi/adelphi/cql.py # python/adelphi/adelphi/store.py

c595058

grighetto requested a review from absurdfarce November 23, 2020 16:19

grighetto added the hard label Nov 23, 2020

grighetto linked an issue Nov 23, 2020 that may be closed by this pull request

Validate schema-anonymizer on nested UDTs #3

Closed

absurdfarce reviewed Nov 24, 2020

absurdfarce reviewed Nov 25, 2020

python/adelphi/adelphi/anonymize.py

This was referenced Nov 25, 2020

Flesh out support for (some) custom indexes #61

Open

Convert integration test run script to Python, integrate with test frameworks for more granular feedback #62

Closed

absurdfarce reviewed Nov 30, 2020

jdonenine added this to the Schema Repository milestone Dec 2, 2020

#3 Addressed PR comments.

b14d90f

grighetto added 3 commits December 9, 2020 02:13

#3 Made anonymization order deterministic across Python versions by sorting elements by name.

2198864

Merge remote-tracking branch 'origin/master' into 3-anonymizer-automated-validation # Conflicts: # python/adelphi/adelphi/anonymize.py # python/adelphi/adelphi/cql.py

a48f085

Updated test schemas; Fixed missing if not exist bug; Fixed missing return statement after merge.

80078bd

absurdfarce approved these changes Dec 9, 2020

grighetto merged commit ff6c038 into master Dec 9, 2020

grighetto deleted the 3-anonymizer-automated-validation branch December 9, 2020 14:36
