v2.7.0

@petrkalos petrkalos released this 30 May 09:08
· 78 commits to main since this release
667c2ad

The data.all 2.7.0 release focuses on platform security while also delivering significant new capabilities. The Amazon Redshift integration, with its sharing controls, and the new row- and column-level data filters give finer-grained access governance over data assets. Metadata forms now support enforcement, so security policies can be applied programmatically. These features are accompanied by security enhancements including stronger input validation, dependency upgrades, platform hardening (such as S3 bucket versioning), improved logging and monitoring, and additional network security controls.

Finally, a warm welcome to @anushka-singh, @rbernotas, and @TejasRGitHub from Yahoo, who join data.all's maintainers team.

What's Changed

Security Related Changes

  • fix DatabaseResourceArn SSM param by @petrkalos in #1398
  • Add init for resource lock by @noah-paige in #1426
  • Fix: Typo, missing @staticmethod in ResourcePolicyRepository method by @dlpzx in #1439
  • Redshift data sharing - Cluster encryption guardrails and information by @dlpzx in #1447
  • update checkov baseline for cdk synth output by @noah-paige in #1450
  • Updated glue crawler security config by @mourya-33 in #1434
  • allow dbmigrations lambda to invoke any alembic command by @petrkalos in #1488
  • Import Datasets: Validate that bucket is unique by @SofiaSazonova in #1498
  • check bucket encryption type: key|alias by @SofiaSazonova in #1499
  • Validate imported resource names via NamingConventionService by @SofiaSazonova in #1501
  • S3Bucket WRITE/MODIFY permissions by @petrkalos in #1472
  • Allow origins conf changes by @mourya-33 in #1486
  • fix importing sse encrypted buckets by @petrkalos in #1514
  • Redshift data sharing - Add interface for share validations and Redshift guardrails by @dlpzx in #1484
  • Update baseline removing checkov exception for glue security config by @noah-paige in #1516
  • Add External Id Conditions to Deployment Roles by @noah-paige in #1521
  • Add bucket versioning by @noah-paige in #1522
  • Add bucket versioning pt 2 by @noah-paige in #1529
  • Increase access point creation buffer time and fix bug in share cross account if condition by @SofiaSazonova in #1552
  • Bandit fix: explicitly install typing-extensions by @SofiaSazonova in #1600
  • New permission model for Redshift ADMIN connections by @dlpzx in #1573
  • warn users when evaluating a non-readonly share request by @petrkalos in #1568
  • try to create AP every time, catch if already exists by @SofiaSazonova in #1609
  • Restrict invitation to Redshift Connections and edit permission name by @dlpzx in #1638
  • Add forceDelete to shareObjects to clean-up all shareItems by @dlpzx in #1646
  • Add permission checks to markNotificationAsRead + deleteNotification by @noah-paige in #1654
  • Add Removal Policy Retain to Bucket Policy IaC by @noah-paige in #1660
  • Extend Tenant Perms Coverage by @noah-paige in #1630
  • add custom domain support for apigw by @petrkalos in #1679
  • add warning to untrust data.all account when removing an environment by @petrkalos in #1685
  • Restrict pivotRole permissions with DENY statement by @dlpzx in #1681
  • Added Token Validations by @noah-paige in #1682
  • Updating overly permissive policies tagged by checkov for environment role using least privilege principles by @mourya-33 in #1632
  • Update sanitization technique by @noah-paige in #1692
  • Fix/input validation by @noah-paige in #1693
  • Add MANAGE_SHARES permissions by @dlpzx in #1702
  • Disable introspection on prod sizing by @noah-paige in #1704
  • Bump python runtime to bump cdk klayers cryptography version by @noah-paige in #1707
  • tenant-permission tests by @dlpzx in #1694
  • Added permission check - is tenant to update SSM parameters API by @dlpzx in #1714
  • Add GET_SHARE_OBJECT permissions to get data filters API by @dlpzx in #1717
  • Add permissions on list datasets for env group + cosmetic S3 Datasets by @dlpzx in #1718
  • Add GET_WORKSHEET permission in RUN_SQL_QUERY by @dlpzx in #1716
  • Added permissions to Quicksight monitoring service layer by @dlpzx in #1715
  • Add LIST_ENVIRONMENT_DATASETS permission for listing shared datasets and cleanup unused code by @dlpzx in #1719
  • Add omics create_run unauthorized test and improve other tests by @dlpzx in #1723
  • Introduce is_owner permissions to Glossary mutations + add new integration tests by @dlpzx in #1721
  • Refactor env permissions + modify getTrustAccount by @dlpzx in #1712
  • Avoid infinite loop in glossaries checks by @dlpzx in #1725
  • Feed consistent permissions by @dlpzx in #1722
  • Votes consistent permissions by @dlpzx in #1724
  • Consistent get_<DATA_ASSET> permissions - Dashboards by @dlpzx in #1729
  • add resource permission checks by @petrkalos in #1711
  • Consistent get_<DATA_ASSET> permissions - S3_Datasets by @dlpzx in #1727
  • [BUGFIX] gh-1734 by @TejasRGitHub in #1741
  • [Gh 884] IAM policy splitting for requestor IAM policies by @TejasRGitHub in #1650
  • [Bugfix] - Changes in logic to delete share db by @TejasRGitHub in #1706
  • [Bugfix] | GH-1749 - Fixing share expiration task by @TejasRGitHub in #1750
  • Fix: Add conditional to not lock empty list of resources by @dlpzx in #1760
  • disable apigw data tracing to avoid leaking sensitive information by @petrkalos in #1798
  • allow customization of waf rate limits and api gateway throttling limits by @petrkalos in #1800
  • add s3 server access logging by @petrkalos in #1811
  • git CodeBuild baseline role permissions to use GitHub connection by @SofiaSazonova in #1813
  • create a new access logs bucket instead of importing by @petrkalos in #1815
  • Fix/custom auth 500 by @petrkalos in #1792
  • change all lambdas to structured logging by @petrkalos in #1801
  • add explicit token duration config for both JWTs by @noah-paige in #1698
  • Userguide signout flow by @noah-paige in #1629
  • log API handler response only for LOG_LEVEL DEBUG. Set log level INFO for prod deployments by @dlpzx in #1662
  • Separating Out Access Logging by @noah-paige in #1695
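
As background for "Add External Id Conditions to Deployment Roles" (#1521) in the list above, the following is a minimal sketch of an IAM trust policy that only permits role assumption when the caller supplies an expected external ID. It is not taken from data.all's code: the function name `build_trust_policy`, the account ID, and the external ID value are placeholders chosen for illustration.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Build an IAM trust policy document (hypothetical helper).

    The role can be assumed by principals in `trusted_account_id`
    only when they pass the matching `sts:ExternalId`.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The condition is what enforces the external ID check.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values, not real identifiers.
    policy = build_trust_policy("111122223333", "example-external-id")
    print(json.dumps(policy, indent=2))
```

An external ID condition of this kind is the usual mitigation for the "confused deputy" problem in cross-account role assumption; how data.all generates and stores the value is described in the PR itself.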

Major Changes

Redshift Integration

This section details significant advancements in integrating Amazon Redshift, enabling better management, sharing, and security of Redshift datasets within the platform.

  • Add Redshift datasets module by @dlpzx in #1424 - Introduces a new module for managing Redshift datasets.
  • Redshift dataset module testing: Re-added client factories, mocking clients by @dlpzx in #1449 - Enhances testing capabilities for the Redshift dataset module by re-adding client factories and mocking clients.
  • Redshift data sharing - Redshift connection types and namespace Id by @dlpzx in #1451 - Adds support for different Redshift connection types and namespace IDs for data sharing.
  • Redshift data sharing - Boilerplate for redshift dataset sharing module by @dlpzx in #1461 - Provides foundational code for the Redshift dataset sharing module.
  • Redshift data sharing - Make ShareObject.IAMRole a generic "Role" by @dlpzx in #1462 - Generalizes the IAM Role definition within ShareObject for Redshift data sharing.
  • Redshift data sharing - Polish frontend views for Redshift shares by @dlpzx in #1477 - Improves the user interface for managing Redshift shares.
  • Redshift data sharing - Add sharing tasks to process Redshift datashares by @dlpzx in #1467 - Implements tasks to process Redshift data shares.
  • Redshift data sharing - Added methods from sharing back to redshift datasets (check_on_delete, list_shared_datasets...) by @dlpzx in #1511 - Adds methods for managing shared Redshift datasets, including checks on deletion and listing shared datasets.
  • Redshift data sharing - Documentation 1 - Redshift Connections and Datasets by @dlpzx in #1512 - First part of the Redshift connections and datasets documentation.
  • Redshift data sharing - Documentation 2 - Redshift Sharing by @dlpzx in #1519 - Second part of the Redshift sharing documentation.
  • Redshift data sharing - frontend changes in the Catalog - clean by @dlpzx in #1458 - Cleans up frontend changes related to Redshift data sharing in the Catalog.
  • Fix wrong environment in the verification of redshift role by @dlpzx in #1587 - Corrects an issue with Redshift role verification related to environments.
  • Add Redshift connection tooltips and info + restrict to DATA_USER connections for import Redshift Dataset by @dlpzx in #1565 - Adds helpful tooltips and restricts Redshift dataset import to DATA_USER connections.
  • Integration tests executed on a real deployment as part of the CICD - Redshift Connections by @dlpzx in #1628 - Introduces integration tests for Redshift connections within the CI/CD pipeline, executed on a real deployment.
  • Integration tests executed on a real deployment as part of the CICD - Redshift Datasets by @dlpzx in #1636 - Adds integration tests for Redshift datasets within the CI/CD pipeline, executed on a real deployment.
  • Fix error message of Redshift share verifier by @dlpzx in #1647 - Resolves an issue with the error message from the Redshift share verifier.
  • Fix: check if Redshift table exists before publishing it to data.all by @dlpzx in #1644 - Ensures Redshift tables exist before being published to data.all.
  • Integration tests executed on a real deployment as part of the CICD - Redshift Shares by @dlpzx in #1643 - Implements integration tests for Redshift shares within the CI/CD pipeline, executed on a real deployment.

Test improvements

This section highlights a series of enhancements to the testing framework, including new integration tests, improved test stability, and better CI/CD integration for more robust deployments.

  • Redshift dataset module testing: Re-added client factories, mocking clients by @dlpzx in #1449 - Reintroduces client factories and mocking clients for enhanced Redshift dataset module testing.
  • move backend approval_tests as the last step within the backend stage by @petrkalos in #1423 - Reorders backend approval tests to be the final step in the backend stage.
  • Increase CodeBuild timeout for integration tests by @dlpzx in #1532 - Extends the timeout for CodeBuild integration tests.
  • Add Dataset integration tests - Tables, Folders by @noah-paige in #1391 - Adds integration tests for dataset tables and folders.
  • add mlstudio integ tests by @petrkalos in #1535 - Introduces integration tests for ML Studio.
  • Feat/integration tests dataset filters by @noah-paige in #1539 - Adds integration tests for dataset filters.
  • Add Dataset integration tests - Dataset missing tests, Table Profiling by @dlpzx in #1533 - Adds the dataset integration tests that were still missing, plus tests for table profiling.
  • Add Permissions integration tests by @dlpzx in #1550 - Introduces integration tests for permissions.
  • Add Stacks and KeyValueTags integration tests by @dlpzx in #1551 - Adds integration tests for stacks and key-value tags.
  • Add VPC network integration tests + fix tags bug in networks by @dlpzx in #1555 - Introduces integration tests for VPC networks and fixes a tag-related bug.
  • Add Glossaries integration tests by @dlpzx in #1556 - Adds integration tests for glossaries.
  • Feat/integration tests dashboards by @noah-paige in #1560 - Implements integration tests for dashboards.
  • Add Dataset integration tests - Table Columns by @dlpzx in #1548 - Adds integration tests for dataset table columns.
  • Add Dataset integration tests - S3 Share requests by @SofiaSazonova in #1389 - Introduces integration tests for S3 share requests.
  • increase codebuild timeout for integration tests by @petrkalos in #1584 - Increases the CodeBuild timeout for integration tests.
  • Failed test fix: rename fixture session_cross_acc_env_1 by @SofiaSazonova in #1586 - Fixes a failed test by renaming a fixture.
  • Integration Test CICD: iam role bugfix by @SofiaSazonova in #1589 - Addresses an IAM role bug in the CI/CD integration tests.
  • Fixes to integration tests by @noah-paige in #1602 - General fixes applied to integration tests.
  • Add integration tests feed by @noah-paige in #1579 - Adds integration tests for the activity feed.
  • add integration tests votes by @noah-paige in #1578 - Introduces integration tests for voting functionality.
  • CICD Integration tests: s3 dataset shares, persistent shares by @SofiaSazonova in #1580 - Adds CI/CD integration tests for S3 dataset and persistent shares.
  • CICD Integration test: table test fix by @SofiaSazonova in #1605 - Fixes a table test in the CI/CD integration tests.
  • CICD Integration test: iam client fix by @SofiaSazonova in #1604 - Fixes the IAM client used in the CI/CD integration tests.
  • Extend id token duration if tests included as part of pipeline by @noah-paige in #1606 - Extends ID token duration for pipeline tests.
  • Integration tests - refresh tokens of AWS Clients by @dlpzx in #1607 - Implements refreshing of AWS Client tokens during integration tests.
  • Fix - clean up buckets integration test - PR overwrite by @dlpzx in #1622 - Fixes an issue with cleaning up buckets in integration tests.
  • CICD: assume consumption role from environment client by @SofiaSazonova in #1624 - Modifies CI/CD to assume a consumption role from the environment client.
  • CICD: share tests fixes by @SofiaSazonova in #1625 - Provides fixes for share tests in CI/CD.
  • Dashboard Integration Test Improvements by @noah-paige in #1623 - Enhances integration tests for dashboards.
  • Integration tests executed on a real deployment as part of the CICD - Redshift Connections by @dlpzx in #1628 - Adds integration tests for Redshift connections in a real deployment.
  • Integration tests executed on a real deployment as part of the CICD - Redshift Datasets by @dlpzx in #1636 - Adds integration tests for Redshift datasets in a real deployment.
  • Integration tests executed on a real deployment as part of the CICD - Redshift Shares by @dlpzx in #1643 - Implements integration tests for Redshift shares in a real deployment.
  • Fix: integration tests missing default value for principalRoleName and msg in exception forceDelete task by @dlpzx in #1661 - Addresses missing default values and messages in integration tests.
  • fix: missing CREATE_SHARE_OBJECT permission in integration tests by @dlpzx in #1663 - Corrects a missing permission in integration tests.
  • test unhealthy shares by @petrkalos in #1649 - Adds tests for unhealthy shares.
  • assert successful updates based on stack's last log timestamp by @petrkalos in #1676 - Asserts successful updates based on stack log timestamps.
  • Tests/extend token validity by @noah-paige in #1669 - Extends token validity for tests.
  • Fix Snyk Workflow to Find Project Deps by @noah-paige in #1708 - Fixes the Snyk workflow for finding project dependencies.
  • Fix integration tests for list_environment_datasets unauthorized cases by @dlpzx in #1720 - Fixes integration tests for unauthorized list_environment_datasets cases.
  • Fix count votes integ test by @dlpzx in #1733 - Corrects an integration test for counting votes.
  • fix test_get_dashboard_unauthorized by @petrkalos in #1736 - Fixes an unauthorized dashboard retrieval test.
  • Integrational Tests fixes by @SofiaSazonova in #1744 - General fixes for integration tests.
  • CICD Integration tests: new shares for pre-existing datasets by @SofiaSazonova in #1611 - Adds CI/CD integration tests for new shares on pre-existing datasets.
  • Feat/integ tests notifications by @noah-paige in #1597 - Implements integration tests for notifications.
  • Fix global conftest shares after notifications PR by @noah-paige in #1747 - Fixes global conftest shares after notification-related pull requests.
  • Integration tests glossaries/dashboard bugfix by @SofiaSazonova in #1765 - Addresses bug fixes in integration tests for glossaries and dashboards.
  • don't import dataall from integtest by @petrkalos in #1581
  • Clean up S3 Buckets in integration test by @dlpzx in #1603
  • return EnvironmentLogsBucketName from integration test getEnv query by @noah-paige in #1697

Metadata Forms

This section details the evolution of the metadata forms functionality, enabling dynamic data capture, improved UI, access control, and versioning for metadata.

  • Database tables and enums for metadata forms by @SofiaSazonova in #1422 - Introduces database tables and enums specifically for metadata forms.
  • Feat: API call to query Enum values by @SofiaSazonova in #1435 - Adds an API call to query Enum values.
  • Feat: API call to query Enum values - continuation - semgrep fix by @SofiaSazonova in #1445 - Continues work on the API call to query Enum values, including a Semgrep fix.
  • Metadata forms-2: Create, display list, search list by @SofiaSazonova in #1444 - Second iteration of metadata forms, enabling form creation and display and search of the form list.
  • Fix: Remove enums from i-tests for MFs by @SofiaSazonova in #1473 - Removes enums from integration tests for metadata forms.
  • Metadata forms 3: Metadata Form View page. Add, Edit fields by @SofiaSazonova in #1455 - Third iteration of metadata forms, introducing the Metadata Form View page for adding and editing fields.
  • Fix history of alembic migration scripts data filters vs metadata forms by @dlpzx in #1478 - Corrects the history of Alembic migration scripts for data filters versus metadata forms.
  • Metadata forms 4: Access Control by @SofiaSazonova in #1474 - Fourth iteration of metadata forms, focusing on access control.
  • Metadata forms 5: UI improvement + possible values validation by @SofiaSazonova in #1480 - Fifth iteration of metadata forms, bringing UI improvements and validation for possible values.
  • Metadata forms 6: attach MF to Orgs, Envs and Datasets by @SofiaSazonova in #1495 - Sixth iteration of metadata forms, enabling attachment of metadata forms to organizations, environments, and datasets.
  • Metadata form 7: Access control and deletion behaviour by @SofiaSazonova in #1540 - Seventh iteration of metadata forms, covering access control and deletion behavior.
  • Add schema in database routines in metadata forms migration script by @dlpzx in #1601 - Adds schema to database routines within the metadata forms migration script.
  • MF7 bugfix by @SofiaSazonova in #1595 - Bug fix for Metadata Form iteration 7.
  • Metadata form versioning - 1 by @SofiaSazonova in #1637 - First part of implementing metadata form versioning.
  • Metadata versioning 2 by @SofiaSazonova in #1641 - Second part of implementing metadata form versioning.
  • Metadata form Userguide by @SofiaSazonova in #1596 - Provides a user guide for metadata forms.
  • Metadata form versioning - 3 by @SofiaSazonova in #1648 - Third part of implementing metadata form versioning.
  • Metadata forms: List of Attached Forms by @SofiaSazonova in #1652 - Adds functionality to list attached metadata forms.
  • Metadata form enforcement by @SofiaSazonova in #1730 - Implements enforcement mechanisms for metadata forms.
  • Resolve conflicts in polymorphism naming by @SofiaSazonova in #1763 - Resolves naming conflicts related to polymorphism.
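
To illustrate the idea behind possible-values validation and form enforcement mentioned in the list above, here is a small, self-contained sketch. It is an assumption-laden illustration rather than data.all's implementation: `FormField`, `validate_entry`, and the example field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FormField:
    """Hypothetical definition of one field in a metadata form."""
    name: str
    required: bool = False
    possible_values: Optional[list] = None  # None means any value is accepted


def validate_entry(fields: list, entry: dict) -> list:
    """Return a list of validation errors; an empty list means the entry is valid."""
    errors = []
    for f in fields:
        value = entry.get(f.name)
        if value is None:
            # Enforcement: a required field must be filled in.
            if f.required:
                errors.append(f"missing required field: {f.name}")
            continue
        # Possible-values validation: the value must come from the allowed set.
        if f.possible_values is not None and value not in f.possible_values:
            errors.append(f"invalid value for {f.name}: {value!r}")
    return errors


if __name__ == "__main__":
    form = [
        FormField("classification", required=True,
                  possible_values=["public", "internal", "secret"]),
        FormField("owner_email"),
    ]
    print(validate_entry(form, {"classification": "internal"}))
    print(validate_entry(form, {"classification": "top-secret"}))
    print(validate_entry(form, {}))
```

Returning a list of errors instead of raising on the first failure lets a UI show every problem with a submitted form at once.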

Row Column filtering

This section details the implementation of fine-grained data access controls, allowing users to define row and column level filters for shared datasets.

  • Row/Column Level Data Filters by @noah-paige in #1438 - Introduces functionality for row and column level data filtering.
  • Save data filter perms before backfilling by @noah-paige in #1485 - Ensures data filter permissions are saved before backfilling.
  • add docs on how to create table filters and assign to shares by @noah-paige in #1506 - Adds documentation on creating table filters and assigning them to shares.
  • Fix for getting correct gluedb name for central cataloged dataset by @TejasRGitHub in #1433 - Corrects the retrieval of the Glue database name for centrally cataloged datasets.
  • fix table share revoke with no filters by @noah-paige in #1493 - Fixes an issue with table share revocation when no filters are applied.
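
Conceptually, a row filter decides which records a consumer can see and a column filter decides which attributes of those records are exposed. The sketch below shows that idea on in-memory data only. It is not how data.all applies filters to shared tables, and `apply_filters` and the sample rows are hypothetical.

```python
from typing import Callable


def apply_filters(rows: list, row_predicate: Callable[[dict], bool],
                  allowed_columns: list) -> list:
    """Keep rows matching `row_predicate`, projected onto `allowed_columns`."""
    return [
        # Column-level filter: expose only the allowed columns.
        {col: row[col] for col in allowed_columns if col in row}
        for row in rows
        # Row-level filter: drop rows the consumer may not see.
        if row_predicate(row)
    ]


if __name__ == "__main__":
    sales = [
        {"region": "EU", "amount": 10, "email": "a@example.com"},
        {"region": "US", "amount": 20, "email": "b@example.com"},
    ]
    # Share only EU rows, and hide the email column.
    visible = apply_filters(sales, lambda r: r["region"] == "EU", ["region", "amount"])
    print(visible)
```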

Version Upgrades


Other Minor Changes