v2.7.0
The data.all 2.7.0 release focuses on platform security while delivering significant new capabilities. The new Amazon Redshift integration, with its enhanced sharing controls, and the introduction of row and column level data filtering give finer-grained access governance over data assets. Dynamic metadata forms now enable programmatic enforcement of security policies, adding another layer of data protection. These features are backed by security enhancements: strengthened input validation, dependency upgrades, platform hardening (such as S3 bucket versioning), improved logging and monitoring, and tighter network security controls.
Finally, a warm welcome to @anushka-singh, @rbernotas, and @TejasRGitHub from Yahoo, who join data.all's maintainers team.
What's Changed
Security Related Changes
- fix DatabaseResourceArn SSM param by @petrkalos in #1398
- Add init for resource lock by @noah-paige in #1426
- Fix: Typo, missing @staticmethod in ResourcePolicyRepository method by @dlpzx in #1439
- Redshift data sharing - Cluster encryption guardrails and information by @dlpzx in #1447
- update checkov baseline for cdk synth output by @noah-paige in #1450
- Updated glue crawler security config by @mourya-33 in #1434
- allow dbmigrations lambda to invoke any alembic command by @petrkalos in #1488
- Import Datasets: Validate that bucket is unique by @SofiaSazonova in #1498
- check bucket encryption type: key|alias by @SofiaSazonova in #1499
- Validate imported resource names via NamingConventionService by @SofiaSazonova in #1501
- S3Bucket WRITE/MODIFY permissions by @petrkalos in #1472
- Allow origins conf changes by @mourya-33 in #1486
- fix importing sse encrypted buckets by @petrkalos in #1514
- Redshift data sharing - Add interface for share validations and Redshift guardrails by @dlpzx in #1484
- Update baseline removing checkov exception for glue security config by @noah-paige in #1516
- Add External Id Conditions to Deployment Roles by @noah-paige in #1521
- Add bucket versioning by @noah-paige in #1522
- Add bucket versioning pt 2 by @noah-paige in #1529
- Increase access point creation buffer time and fix bug in share cross account if condition by @SofiaSazonova in #1552
- Bandit fix: explicitly install typing-extensions by @SofiaSazonova in #1600
- New permission model for Redshift ADMIN connections by @dlpzx in #1573
- warn users when evaluating a non-readonly share request by @petrkalos in #1568
- try to create AP every time, catch if already exists by @SofiaSazonova in #1609
- Restrict invitation to Redshift Connections and edit permission name by @dlpzx in #1638
- Add forceDelete to shareObjects to clean-up all shareItems by @dlpzx in #1646
- Add permission checks to markNotificationAsRead + deleteNotification by @noah-paige in #1654
- Add Removal Policy Retain to Bucket Policy IaC by @noah-paige in #1660
- Extend Tenant Perms Coverage by @noah-paige in #1630
- add custom domain support for apigw by @petrkalos in #1679
- add warning to untrust data.all account when removing an environment by @petrkalos in #1685
- Restrict pivotRole permissions with DENY statement by @dlpzx in #1681
- Added Token Validations by @noah-paige in #1682
- Updating overly permissive policies tagged by checkov for environment role using least privilege principles by @mourya-33 in #1632
- Update sanitization technique by @noah-paige in #1692
- Fix/input validation by @noah-paige in #1693
- Add MANAGE_SHARES permissions by @dlpzx in #1702
- Disable introspection on prod sizing by @noah-paige in #1704
- Bump python runtime to bump cdk klayers cryptography version by @noah-paige in #1707
- tenant-permission tests by @dlpzx in #1694
- Added permission check - is tenant to update SSM parameters API by @dlpzx in #1714
- Add GET_SHARE_OBJECT permissions to get data filters API by @dlpzx in #1717
- Add permissions on list datasets for env group + cosmetic S3 Datasets by @dlpzx in #1718
- Add GET_WORKSHEET permission in RUN_SQL_QUERY by @dlpzx in #1716
- Added permissions to Quicksight monitoring service layer by @dlpzx in #1715
- Add LIST_ENVIRONMENT_DATASETS permission for listing shared datasets and cleanup unused code by @dlpzx in #1719
- Add omics create_run unauthorized test and improve other tests by @dlpzx in #1723
- Introduce is_owner permissions to Glossary mutations + add new integration tests by @dlpzx in #1721
- Refactor env permissions + modify getTrustAccount by @dlpzx in #1712
- Avoid infinite loop in glossaries checks by @dlpzx in #1725
- Feed consistent permissions by @dlpzx in #1722
- Votes consistent permissions by @dlpzx in #1724
- Consistent get_<DATA_ASSET> permissions - Dashboards by @dlpzx in #1729
- add resource permission checks by @petrkalos in #1711
- Consistent get_<DATA_ASSET> permissions - S3_Datasets by @dlpzx in #1727
- [BUGFIX] gh-1734 by @TejasRGitHub in #1741
- [Gh 884] IAM policy splitting for requestor IAM policies by @TejasRGitHub in #1650
- [Bugfix] - Changes in logic to delete share db by @TejasRGitHub in #1706
- [Bugfix] | GH-1749 -Fixing share expiration task by @TejasRGitHub in #1750
- Fix: Add conditional to not lock empty list of resources by @dlpzx in #1760
- disable apigw data tracing to avoid leaking sensitive information by @petrkalos in #1798
- allow customization of waf rate limits and api gateway throttling limits by @petrkalos in #1800
- add s3 server access logging by @petrkalos in #1811
- git CodeBuild baseline role permissions to use GitHub connection by @SofiaSazonova in #1813
- create a new access logs bucket instead of importing by @petrkalos in #1815
- Fix/custom auth 500 by @petrkalos in #1792
- change all lambdas to structured logging by @petrkalos in #1801
- add explicit token duration config for both JWTs by @noah-paige in #1698
- Userguide signout flow by @noah-paige in #1629
- log API handler response only for LOG_LEVEL DEBUG. Set log level INFO for prod deployments by @dlpzx in #1662
- Separating Out Access Logging by @noah-paige in #1695
Major Changes
Redshift Integration
This section details significant advancements in integrating Amazon Redshift, enabling better management, sharing, and security of Redshift datasets within the platform.
- Add Redshift datasets module by @dlpzx in #1424 - Introduces a new module for managing Redshift datasets.
- Redshift dataset module testing: Re-added client factories, mocking clients by @dlpzx in #1449 - Enhances testing capabilities for the Redshift dataset module by re-adding client factories and mocking clients.
- Redshift data sharing - Redshift connection types and namespace Id by @dlpzx in #1451 - Adds support for different Redshift connection types and namespace IDs for data sharing.
- Redshift data sharing - Boilerplate for redshift dataset sharing module by @dlpzx in #1461 - Provides foundational code for the Redshift dataset sharing module.
- Redshift data sharing - Make ShareObject.IAMRole a generic "Role" by @dlpzx in #1462 - Generalizes the IAM Role definition within `ShareObject` for Redshift data sharing.
- Redshift data sharing - Polish frontend views for Redshift shares by @dlpzx in #1477 - Improves the user interface for managing Redshift shares.
- Redshift data sharing - Add sharing tasks to process Redshift datashares by @dlpzx in #1467 - Implements tasks to process Redshift data shares.
- Redshift data sharing - Added methods from sharing back to redshift datasets (check_on_delete, list_shared_datasets...) by @dlpzx in #1511 - Adds methods for managing shared Redshift datasets, including checks on deletion and listing shared datasets.
- Redshift data sharing - Documentation 1 - Redshift Connections and Datasets by @dlpzx in #1512 - First part of the Redshift connections and datasets documentation.
- Redshift data sharing - Documentation 2 - Redshift Sharing by @dlpzx in #1519 - Second part of the Redshift sharing documentation.
- Redshift data sharing - frontend changes in the Catalog - clean by @dlpzx in #1458 - Cleans up frontend changes related to Redshift data sharing in the Catalog.
- Fix wrong environment in the verification of redshift role by @dlpzx in #1587 - Corrects an issue with Redshift role verification related to environments.
- Add Redshift connection tooltips and info + restrict to DATA_USER connections for import Redshift Dataset by @dlpzx in #1565 - Adds helpful tooltips and restricts Redshift dataset import to `DATA_USER` connections.
- Integration tests executed on a real deployment as part of the CICD - Redshift Connections by @dlpzx in #1628 - Introduces integration tests for Redshift connections within the CI/CD pipeline, executed on a real deployment.
- Integration tests executed on a real deployment as part of the CICD - Redshift Datasets by @dlpzx in #1636 - Adds integration tests for Redshift datasets within the CI/CD pipeline, executed on a real deployment.
- Fix error message of Redshift share verifier by @dlpzx in #1647 - Resolves an issue with the error message from the Redshift share verifier.
- Fix: check if Redshift table exists before publishing it to data.all by @dlpzx in #1644 - Ensures Redshift tables exist before being published to data.all.
- Integration tests executed on a real deployment as part of the CICD - Redshift Shares by @dlpzx in #1643 - Implements integration tests for Redshift shares within the CI/CD pipeline, executed on a real deployment.
Test improvements
This section highlights a series of enhancements to the testing framework, including new integration tests, improved test stability, and better CI/CD integration for more robust deployments.
- Redshift dataset module testing: Re-added client factories, mocking clients by @dlpzx in #1449 - Reintroduces client factories and mocking clients for enhanced Redshift dataset module testing.
- move backend approval_tests as the last step within the backend stage by @petrkalos in #1423 - Reorders backend approval tests to be the final step in the backend stage.
- Increase CodeBuild timeout for integration tests by @dlpzx in #1532 - Extends the timeout for CodeBuild integration tests.
- Add Dataset integration tests - Tables, Folders by @noah-paige in #1391 - Adds integration tests for dataset tables and folders.
- add mlstudio integ tests by @petrkalos in #1535 - Introduces integration tests for ML Studio.
- Feat/integration tests dataset filters by @noah-paige in #1539 - Adds integration tests for dataset filters.
- Add Dataset integration tests - Dataset missing tests, Table Profiling by @dlpzx in #1533 - Adds integration tests for missing dataset tests and table profiling.
- Add Permissions integration tests by @dlpzx in #1550 - Introduces integration tests for permissions.
- Add Stacks and KeyValueTags integration tests by @dlpzx in #1551 - Adds integration tests for stacks and key-value tags.
- Add VPC network integration tests + fix tags bug in networks by @dlpzx in #1555 - Introduces integration tests for VPC networks and fixes a tag-related bug.
- Add Glossaries integration tests by @dlpzx in #1556 - Adds integration tests for glossaries.
- Feat/integration tests dashboards by @noah-paige in #1560 - Implements integration tests for dashboards.
- Add Dataset integration tests - Table Columns by @dlpzx in #1548 - Adds integration tests for dataset table columns.
- Add Dataset integration tests - S3 Share requests by @SofiaSazonova in #1389 - Introduces integration tests for S3 share requests.
- increase codebuild timeout for integration tests by @petrkalos in #1584 - Increases the CodeBuild timeout for integration tests.
- Fialed test fix: rename fixture session_cross_acc_env_1 by @SofiaSazonova in #1586 - Fixes a failed test by renaming a fixture.
- Integration Test CICD: iam role bugfix by @SofiaSazonova in #1589 - Addresses an IAM role bug in the CI/CD integration tests.
- Fixes to integration tests by @noah-paige in #1602 - General fixes applied to integration tests.
- Add integration tests feed by @noah-paige in #1579 - Adds integration tests for the activity feed.
- add integration tests votes by @noah-paige in #1578 - Introduces integration tests for voting functionality.
- CICD Integration tests: s3 dataset shares, persistent shares by @SofiaSazonova in #1580 - Adds CI/CD integration tests for S3 dataset and persistent shares.
- CICD Integration test: table test fix by @SofiaSazonova in #1605 - Fixes a table test in the CI/CD integration tests.
- CICD Integration test: iam client fix by @SofiaSazonova in #1604 - Addresses an IAM client fix in the CI/CD integration tests.
- Extend id token duration if tests included as part of pipeline by @noah-paige in #1606 - Extends ID token duration for pipeline tests.
- Integration tests - refresh tokens of AWS Clients by @dlpzx in #1607 - Implements refreshing of AWS Client tokens during integration tests.
- Fix - clean up buckets integration test - PR overwrite by @dlpzx in #1622 - Fixes an issue with cleaning up buckets in integration tests.
- CICD: assume consumption role from environment client by @SofiaSazonova in #1624 - Modifies CI/CD to assume a consumption role from the environment client.
- CICD: share tests fixes by @SofiaSazonova in #1625 - Provides fixes for share tests in CI/CD.
- Dashboard Integration Test Improvements by @noah-paige in #1623 - Enhances integration tests for dashboards.
- Integration tests executed on a real deployment as part of the CICD - Redshift Connections by @dlpzx in #1628 - Adds integration tests for Redshift connections in a real deployment.
- Integration tests executed on a real deployment as part of the CICD - Redshift Datasets by @dlpzx in #1636 - Adds integration tests for Redshift datasets in a real deployment.
- Integration tests executed on a real deployment as part of the CICD - Redshift Shares by @dlpzx in #1643 - Implements integration tests for Redshift shares in a real deployment.
- Fix: integration tests missing default value for principalRoleName and msg in exception forceDelete task by @dlpzx in #1661 - Addresses missing default values and messages in integration tests.
- fix: missing CREATE_SHARE_OBJECT permission in integration tests by @dlpzx in #1663 - Corrects a missing permission in integration tests.
- test unhealthy shares by @petrkalos in #1649 - Adds tests for unhealthy shares.
- assert successful updates based on stack's last log timestamp by @petrkalos in #1676 - Asserts successful updates based on stack log timestamps.
- Tests/extend token validity by @noah-paige in #1669 - Extends token validity for tests.
- Fix Snyk Workflow to Find Project Deps by @noah-paige in #1708 - Fixes the Snyk workflow for finding project dependencies.
- Fix integration tests for list_environment_datasets unauthorized cases by @dlpzx in #1720 - Fixes integration tests for unauthorized `list_environment_datasets` cases.
- Fix count votes integ test by @dlpzx in #1733 - Corrects an integration test for counting votes.
- fix test_get_dashboard_unauthorized by @petrkalos in #1736 - Fixes an unauthorized dashboard retrieval test.
- Integrational Tests fixes by @SofiaSazonova in #1744 - General fixes for integration tests.
- CICD Integration tests: new shares for pre-existing datasets by @SofiaSazonova in #1611 - Adds CI/CD integration tests for new shares on pre-existing datasets.
- Feat/integ tests notifications by @noah-paige in #1597 - Implements integration tests for notifications.
- Fix global conftest shares after notifications PR by @noah-paige in #1747 - Fixes global conftest shares after notification-related pull requests.
- Integration tests glossaries/dashboard bugfix by @SofiaSazonova in #1765 - Addresses bug fixes in integration tests for glossaries and dashboards.
- don't import dataall from integtest by @petrkalos in #1581
- Clean up S3 Buckets in integration test by @dlpzx in #1603
- return EnvironmentLogsBucketName from integraiton test getEnv query by @noah-paige in #1697
Metadata Forms
This section details the evolution of the metadata forms functionality, enabling dynamic data capture, improved UI, access control, and versioning for metadata.
- Database tables and enums for metadata forms by @SofiaSazonova in #1422 - Introduces database tables and enums specifically for metadata forms.
- Feat: API call to query Enum values by @SofiaSazonova in #1435 - Adds an API call to query Enum values.
- Feat: API call to query Enum values - continuation - semgrep fix by @SofiaSazonova in #1445 - Continues work on the API call to query Enum values, including a Semgrep fix.
- Metadata forms-2: Create, display list, search list by @SofiaSazonova in #1444 - Second iteration of metadata forms, enabling creation, display, and search of lists.
- Fix: Remove enums from i-tests for MFs by @SofiaSazonova in #1473 - Removes enums from integration tests for metadata forms.
- Metadata forms 3: Metadata Form View page. Add, Edit fields by @SofiaSazonova in #1455 - Third iteration of metadata forms, introducing the Metadata Form View page for adding and editing fields.
- Fix history of alembic migration scripts data filters vs metadata forms by @dlpzx in #1478 - Corrects the history of Alembic migration scripts for data filters versus metadata forms.
- Metadata forms 4: Access Control by @SofiaSazonova in #1474 - Fourth iteration of metadata forms, focusing on access control.
- Metadata forms 5: UI improvement + possible values validation by @SofiaSazonova in #1480 - Fifth iteration of metadata forms, bringing UI improvements and validation for possible values.
- Metadata forms 6: attach MF to Orgs, Envs and Datasets by @SofiaSazonova in #1495 - Sixth iteration of metadata forms, enabling attachment of metadata forms to organizations, environments, and datasets.
- Metadata form 7: Access control and deletion behaviour by @SofiaSazonova in #1540 - Seventh iteration of metadata forms, covering access control and deletion behavior.
- Add schema in database routines in metadata forms migration script by @dlpzx in #1601 - Adds schema to database routines within the metadata forms migration script.
- MF7 bugfix by @SofiaSazonova in #1595 - Bug fix for Metadata Form iteration 7.
- Metadata form versioning - 1 by @SofiaSazonova in #1637 - First part of implementing metadata form versioning.
- Metadata versioning 2 by @SofiaSazonova in #1641 - Second part of implementing metadata form versioning.
- Metadata form Userguide by @SofiaSazonova in #1596 - Provides a user guide for metadata forms.
- Metadata form versioning - 3 by @SofiaSazonova in #1648 - Third part of implementing metadata form versioning.
- Metadata forms: List of Attached Forms by @SofiaSazonova in #1652 - Adds functionality to list attached metadata forms.
- Metadata form enforcement by @SofiaSazonova in #1730 - Implements enforcement mechanisms for metadata forms.
- Resolve conflicts in polymorphysm naming by @SofiaSazonova in #1763 - Resolves naming conflicts related to polymorphism.
Row Column filtering
This section details the implementation of fine-grained data access controls, allowing users to define row and column level filters for shared datasets.
- Row/Column Level Data Filters by @noah-paige in #1438 - Introduces functionality for row and column level data filtering.
- Save data filter perms before backfilling by @noah-paige in #1485 - Ensures data filter permissions are saved before backfilling.
- add docs on how to create table filters and assign to shares by @noah-paall/pull/1506 - Adds documentation on creating table filters and assigning them to shares.
- Fix for getting correct gluedb name for central cataloged dataset by @TejasRGitHub in #1433 - Corrects the retrieval of the Glue database name for centrally cataloged datasets.
- fix table share revoke with no filters by @noah-paige in #1493 - Fixes an issue with table share revocation when no filters are applied.
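For readers new to the concept, the following is a minimal, purely illustrative Python sketch of what combining a row filter with a column filter does to a shared table. The function `apply_filters`, the sample `orders` rows, and the column names are hypothetical and exist only for this example; this is not data.all's implementation or API.

```python
from typing import Callable, Iterable

Row = dict[str, object]


def apply_filters(
    rows: Iterable[Row],
    included_columns: list[str],
    row_predicate: Callable[[Row], bool],
) -> list[Row]:
    """Keep rows that satisfy the predicate, then keep only the allowed columns."""
    return [
        {col: row[col] for col in included_columns if col in row}
        for row in rows
        if row_predicate(row)
    ]


# Hypothetical table contents, used only for illustration.
orders = [
    {"order_id": 1, "region": "EU", "amount": 120, "customer_email": "a@example.com"},
    {"order_id": 2, "region": "US", "amount": 80, "customer_email": "b@example.com"},
    {"order_id": 3, "region": "EU", "amount": 45, "customer_email": "c@example.com"},
]

# Row filter: only EU orders. Column filter: hide customer_email.
shared_view = apply_filters(
    orders,
    included_columns=["order_id", "region", "amount"],
    row_predicate=lambda row: row["region"] == "EU",
)

for row in shared_view:
    print(row)
```

In this sketch a consumer of the share would see only the two EU orders, and never the `customer_email` column.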
Version Upgrades
- Dependencies: Upgrade `fast-xml-parser` to 4.4.1 by @dlpzx in #1441
- Upgrade axios version by @noah-paige in #1483
- Bump flask-cors from 4.0.1 to 5.0.0 in /backend by @dependabot in #1515
- Bump webpack to 5.94.0 by @noah-paige in #1517
- Bump micromatch from 4.0.7 to 4.0.8 in /frontend by @dependabot in #1518
- Upgrade `path-to-regexp` to 0.1.10 by @dlpzx in #1525
- Upgrade body parser dependency by @noah-paige in #1530
- Upgrade send to 0.19.0 and express to 4.20.0 by @dlpzx in #1542
- Upgrade rollup to non-vulnerable version 2.79.1 -> 3.29.5 by @dlpzx in #1571
- set typeguard version 4.2.1 by @noah-paige in #1634
- Upgrade `http-proxy-middleware` 2.0.7 by @dlpzx in #1656
- Bump werkzeug from 3.0.3 to 3.0.6 in /backend/dataall/base/cdkproxy by @dependabot in #1666
- Bump werkzeug 3.0.0 to 3.0.6 in tests and integration tests by @dlpzx in #1672
- Upgrade Spark version to 3.3 by @noah-paige in #1675
- update fastapi dependency by @noah-paige in #1699
- Upgrade "cross-spawn" to "7.0.5" by @dlpzx in #1701
- Bump deps and fix snyk workflow by @noah-paige in #1745
- Upgrade pyjwt by @dlpzx in #1758
- npm audit fix by @SofiaSazonova in #1789
- Bump image-size from 1.1.1 to 1.2.1 in /frontend by @dependabot in #1799
- Bump @babel/runtime from 7.24.7 to 7.27.0 in /frontend by @dependabot in #1803
- Bump @babel/helpers from 7.24.7 to 7.27.0 in /frontend by @dependabot in #1802
Other Minor Changes
- fix delete_env parameter by @petrkalos in #1397
- Fix deprecated mui tree view by @noah-paige in #1427
- pass ShareableType instead of it's value and log exception details by @petrkalos in #1452
- Issue1456: Fix for persistent email reminders by @anushka-singh in #1457
- hide access point consumer details if access points feature is disabled by @fourtyplustwo in #1466
- Fix local share processors registered by @noah-paige in #1470
- Issue1468: Submit request redirect by @anushka-singh in #1469
- Bugfix: Parsing error in Admin settings tab by @SofiaSazonova in #1482
- Run reapply automatically if Share Verifier Task detects Unhealthy Shared Items by @noah-paige in #1476
- Modifying Regex for fixing redirection not working when visitin s3-datasets by @TejasRGitHub in #1494
- Make log query period configurable by @SofiaSazonova in #1503
- feat(GH-1083) share expiration by @TejasRGitHub in #1489
- Config log retention by @noah-paige in #1527
- Add check to skip processor initialization if there are not shareable items in revoke, verify and reapply by @dlpzx in #1538
- Updating logic to check if expiration is changed on the UI by @TejasRGitHub in #1545
- Allow configurable Region to run CDK IaC checks by @noah-paige in #1531
- fix setting maintenance modes enum by @noah-paige in #1567
- [Gh-1528] Configurable stack logs display by @TejasRGitHub in #1559
- [Gh 1570] feature flag for table metrics by @TejasRGitHub in #1574
- migrate local server to FastAPI by @petrkalos in #1577
- Enable hyperlinks in dataset description by @rbernotas in #1591
- retry for LF grant_permissions by @SofiaSazonova in #1585
- Fix share expiration date calculation for end-of-month days by @dlpzx in #1594
- Pipeline Module Updates by @noah-paige in #1616
- User modal dialog - team link by @rbernotas in #1627
- Changes to the logic of calculating expiration date by @TejasRGitHub in #1635
- Fix: Remove optional AllowWrites - not supported in all regions by @dlpzx in #1664
- Fix: Remove optional AllowWrites 2 - not supported in all regions by @dlpzx in #1667
- Added error view and unified utility to check tenant user by @dlpzx in #1657
- Limit Response info dataset queries by @noah-paige in #1665
- add salt to FrontendCognitoConfig to make it always run by @petrkalos in #1674
- Lambda Event Logs Handling by @noah-paige in #1678
- get-parameter CloudfrontDistributionDomainName from us-east-1 by @petrkalos in #1687
- Move worksheet logic to service layer by @dlpzx in #1696
- Add snyk workflow on schedule by @noah-paige in #1705
- Unify Logger Config for Tasks by @noah-paige in #1709
- Change Snyk Actions by @noah-paige in #1713
- make dashboards optional based on config by @petrkalos in #1677
- Fix ruff format for latest version by @dlpzx in #1755
- GH-1756 - UI Minor change by @TejasRGitHub in #1757
- Update MAINTAINERS.md by @NickCorbett in #1766
- Symlink fix for GitHub source by @SofiaSazonova in #1812
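The end-of-month fix for share expiration dates (#1594) touches a common date-arithmetic pitfall: adding a month to the 31st can land on a day that does not exist. The sketch below shows one general way to handle it, by clamping to the last valid day of the target month. It is an assumption about the kind of problem involved, not the code from that PR; `add_months` is a hypothetical helper name.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add whole months to a date, clamping the day to the target month's length."""
    total = start.month - 1 + months          # zero-based month index from the start year
    year = start.year + total // 12
    month = total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in the target month
    return date(year, month, min(start.day, last_day))


# January 31 plus one month falls back to the last day of February.
print(add_months(date(2024, 1, 31), 1))
# A mid-month date keeps its day, including across a year boundary.
print(add_months(date(2023, 12, 15), 1))
```

Clamping is only one possible policy; rolling over into the following month is another, and which one a platform should use is a product decision.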