-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#3 UDT support / Overall Unit & Integration tests #54
Conversation
…mini functionality into their own modules.
…that run against different versions of C* in Docker containers.
added comments to the various modules indicating what they're supposed to do.
…vide a higher-order fn which can be called with the driver's cluster object (when connected) + infrastructure for extracting what we need from this object.
…nly those selected
…nymizer-automated-validation # Conflicts: # schema-anonymizer/cassandra-anonymizer/anonymizer.py
…arly indicate what is and isn't being included in a given export operation.
…context object apart more than once and to re-use this object rather than creating new ones.
…k/col prefixes and consolidated them all to the `col_` prefix.
…rd_columns_from_table_metadata`.
…able comment since it may contain sensitive info. Updated integration tests.
…nymizer-automated-validation # Conflicts: # python/adelphi/adelphi/store.py
…ted-validation # Conflicts: # python/adelphi/adelphi/anonymize.py # python/adelphi/adelphi/cql.py # python/adelphi/adelphi/store.py
@@ -1,3 +1,6 @@ | |||
# Test output files | |||
output/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the integration tests output go there
local version=$1 | ||
echo "Comparing..." | ||
diff schemas/$version.cql output/$version.cql | ||
diffExitCode=$? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keeps track of the exit code, such that even if the last comparison succeeded, but some previous comparison failed, we want the parent process to fail
@@ -0,0 +1,211 @@ | |||
CREATE KEYSPACE my_ks_0 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the base schema that gets executed in all C* versions
it's generated with: python3 schema_util.py > ../../integration-tests/base-schema.cql
IndexMetadata,\ | ||
SimpleStrategy, \ | ||
UserType | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utility script that programmatically generates a C* schema compatible with C* from 2.1.22 to 4.0-beta3
it contains a couple of keyspaces, tables, indexes (regular and custom) and complex UDTs
except ImportError: | ||
import unittest # noqa | ||
|
||
class TestCqlAnonymize(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anonymizer unit tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can expand this to test the command-line params too, for now these gravitate towards the anonymization
python/adelphi/adelphi/anonymize.py
Outdated
@@ -1,12 +1,10 @@ | |||
import re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this should be moved below the top-level comment for this package. At least that's the convention used in the other Adelphi packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
python/adelphi/adelphi/store.py
Outdated
@@ -54,7 +54,6 @@ def partition(pred, iterable): | |||
log.info("Excluding system keyspaces " + ",".join((ks.name for ks in failed))) | |||
return passed | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: there are two spaces between the various functions in all the Python packages. We can change that if desired but a one-off change doesn't seem like a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
# remove functions, aggregates and views for now | ||
keyspace.functions = {} | ||
keyspace.aggregates = {} | ||
keyspace.views = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the removal of all of these required for anonymization? Seems like we're giving up a fair amount of useful schema info here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@absurdfarce do you think we can safely anonymize functions and aggregates?
Maybe materialized views could be added, but since views aren't recommended for prod, it seemed like a lower priority that shouldn't block the rest of the PR. We can open a ticket to track this though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My inclination is to say we probably can do this, but yeah, it can be held off for future work. It does seem like including as many of these as possible in some capacity is probably worth it, though, since it will make for more robust validation testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My inclination is to say we probably can do this
Did you consider anonymizing the function body across all the supported languages? And even so, I think hiding the implementation of an UDF algorithm is more important than just anonymizing its variable names.
Anyway, that's a much larger scope and we can revisit it the future as you said. +1
pytest is actually a third-party tool for running Python unit tests (see https://docs.pytest.org/en/stable/). It needs to be installed as a discrete package for use here, so make sure you do "pip install pytest" if you wish to use it. |
@@ -0,0 +1,133 @@ | |||
import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import doesn't appear to be used: everything below uses the conventional Python unittest framework (which seems like the right decision to me) and I don't see any difference in behaviour when this import is removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
name2 = get_name("test_column", COLUMN_PREFIX) | ||
name3 = get_name("another_column", COLUMN_PREFIX) | ||
self.assertEqual(name1, name2) | ||
self.assertNotEqual(name1, name3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth adding this at the end so that we can run this thing with a simple python test_anonymize.py
(i.e. to avoid any explicit dependence on pytest):
if __name__ == "__main__":
unittest.main()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
The unit test is currently failing when run against Python 2.7.14. Looks mainly like a collection of mismatches due to the integers in use for anonymized names. It's not immediately clear to me if that process just isn't deterministic or if there's something else going on here. Integration tests using Python 2.7.14 also fail across the board for what looks to be the same reasons. Unit test passes without issue using Python 3.9.0 Integration test is behaving... strangely when using Python 3.9.0. Still looking into what's going on there. |
@absurdfarce Thanks for the thorough review. PR feedback has been addressed. |
@grighetto I agree that all the comments on this PR are put to bed so we're very close. At this point the only thing holding this PR up from going in is the test failures: I continue to see unit test failures for both Python 2.7.x (2.7.14 specifically) and 3.x (3.9.0 specifically). I'd like to at least see passing tests before we move this forward. Is this something you can take a look at? UPDATE: failures for Python 3.x were presumably user error on my part. After a clean rebuild + retest the unit tests pass when using Python 3.9.0. |
I'll report the error output I'm seeing from the unit tests just so there's a record of those errors somewhere. After doing a "pip uninstall; pip install" of this package I'm observing the following failures using Python 2.x: python2_unit_test_failures.txt A similar process (pip uninstall; pip install; manual run of unit tests) looks to pass without errors when using Python 3.x (specifically Python 3.9.0). |
…orting elements by name.
…ted-validation # Conflicts: # python/adelphi/adelphi/anonymize.py # python/adelphi/adelphi/cql.py
… return statement after merge.
Tests fixed for Python 2 - it was caused by different dict sorting behavior across Python versions. Manually sorting the keyspace elements by name makes it deterministic and fixes the issues across the board. |
Recent changes to get to passing unit tests seem sensible to me... I think we're finally done with this guy! |
This PR adds support for UDT anonymization and includes unit and integration tests for all anonymized elements.
For the integration tests, there's a base CQL schema (generated programmatically) that is executed against a few C* versions in Docker, then the anonymizer is executed and the output is compared to the expected CQL file in the "schemas" folder.
The unit tests are mostly self-explanatory and they validate a few complex UDTs as per the original issue #3.
Steps to run the tests:
python3 setup.py install
python/adelphi/tests/unit
, run the unit tests with:pytest
python/adelphi/integration-tests
, run the integration tests with:./adelphi-docker-tests.sh