Merged
62 commits
340635f
Feature: Write to Glue Governed Tables (#560)
jaidisido Feb 25, 2021
1a0c551
Minor - Reducing scope gitworkflow
jaidisido Feb 25, 2021
6034f76
Minor - Fixing _sanitize_name
jaidisido Mar 3, 2021
796751c
Merge branch 'main' into main-governed-tables
jaidisido Mar 9, 2021
2660de2
Minor - Adding map_types flag
jaidisido Mar 9, 2021
4a4cef8
Merge branch 'main' into main-governed-tables
jaidisido Mar 12, 2021
cb10ed2
Merge branch 'main' into main-governed-tables
jaidisido Mar 12, 2021
02445e5
Minor - Aligning optional path argument with main branch
jaidisido Mar 12, 2021
09d7f7e
Merge branch 'main' into main-governed-tables
jaidisido Mar 13, 2021
8ff3a1e
Merge branch 'main' into main-governed-tables
jaidisido Mar 16, 2021
e03a23f
Merge branch 'main' into main-governed-tables
jaidisido Mar 21, 2021
c1fb2b1
Merge branch 'main' into main-governed-tables
jaidisido Mar 22, 2021
acb9d98
Merge branch 'main' into main-governed-tables
jaidisido Mar 26, 2021
ca21ff2
Merge branch 'main' into main-governed-tables
jaidisido Apr 6, 2021
0918da0
Minor tests adjustments
jaidisido Apr 7, 2021
152e570
Merge branch 'main' into main-governed-tables
jaidisido Apr 19, 2021
febcb5c
Minor - Removing Chunked parameter
jaidisido Apr 19, 2021
0afc79c
Merge branch 'main' into main-governed-tables
jaidisido Apr 27, 2021
c3e90c7
Merge branch 'main' into main-governed-tables
jaidisido May 3, 2021
886d1ed
Fixing issues from diverged branch
jaidisido May 3, 2021
82c423e
Merge branch 'main' into main-governed-tables
jaidisido May 12, 2021
bdc484b
Merge branch 'main' into main-governed-tables
jaidisido May 24, 2021
b9e868e
Major - M1 Launch (Stable)
jaidisido May 24, 2021
1382e6f
Improving read by concatenating zero copies of arrow tables
jaidisido May 25, 2021
c947a26
[skip ci] - Minor - Killing thread
jaidisido May 27, 2021
b20f3e4
[skip ci] - Minor - Passing client instead of session
jaidisido May 28, 2021
5372f58
Merge branch 'main' into main-governed-tables
jaidisido May 28, 2021
444c41c
Merge branch 'main' into main-governed-tables
jaidisido Jun 4, 2021
9fca3db
Major - Adding Metadata Transaction API changes
jaidisido Jun 5, 2021
53027b5
Minor - Adding query as of time to test
jaidisido Jun 5, 2021
4294111
Merge branch 'main' into main-governed-tables
jaidisido Jun 5, 2021
23eb1c4
[skip ci] - syncing with main branch
jaidisido Jun 11, 2021
5f7dd3f
Merge branch 'main' into main-governed-tables
jaidisido Jun 16, 2021
016f305
Merge main and adapt tests to API changes from Erie team
jaidisido Jul 9, 2021
582ee59
Merge branch 'main' into main-governed-tables
jaidisido Jul 9, 2021
df4eef2
Merge main
jaidisido Aug 5, 2021
1aae34f
Merge branch 'main' into main-governed-tables
jaidisido Aug 5, 2021
a4000b8
Merging main - move to poetry
jaidisido Aug 12, 2021
b361d37
Merge main and resolve conflicts
jaidisido Sep 7, 2021
e5c6fa3
Minor - Sync with main
jaidisido Sep 8, 2021
81b8821
Merge branch 'main' into main-governed-tables
jaidisido Sep 8, 2021
ee49fb6
Merging with main
jaidisido Oct 19, 2021
fe8b152
Merge branch 'main' into main-governed-tables
jaidisido Oct 19, 2021
dd6e847
Lint
jaidisido Oct 19, 2021
865332e
Fixing get_table_obj retries
jaidisido Nov 2, 2021
170f19b
Green tests
jaidisido Nov 3, 2021
6845725
Merge branch 'main' into main-governed-tables
jaidisido Nov 3, 2021
6045e79
Minor - Fixing automated merge
jaidisido Nov 3, 2021
245d365
LakeFormation test infra
jaidisido Nov 5, 2021
a091986
Commit protocol change - Erie
jaidisido Nov 11, 2021
a3e78c6
Merge branch 'main' into main-governed-tables
jaidisido Nov 11, 2021
14c03b7
[skip ci] - Minor - Fixing catalog unit test
jaidisido Nov 22, 2021
7fadf86
[skip ci] - Minor - Adding transaction_id to does_table_exist
jaidisido Nov 22, 2021
0a8cccb
Merge branch 'main' into main-governed-tables
jaidisido Nov 22, 2021
1bb91a7
Minor - Missing projection_storage_location_template
jaidisido Nov 22, 2021
786e205
Merge branch 'main' into main-governed-tables
jaidisido Dec 1, 2021
9284d7c
Upgrading botocore
jaidisido Dec 1, 2021
38fb6a8
xfail moto
jaidisido Dec 2, 2021
2f71850
Adding s3fs to tox
jaidisido Dec 2, 2021
afb209f
LF concurrent modification exception
jaidisido Dec 2, 2021
c54ba97
catalog.py test
jaidisido Dec 2, 2021
04d44ab
lint
jaidisido Dec 2, 2021
1 change: 0 additions & 1 deletion .github/workflows/minimal-tests.yml
@@ -4,7 +4,6 @@ on:
push:
branches:
- main
- main-governed-tables
pull_request:
branches:
- main
1 change: 0 additions & 1 deletion .github/workflows/static-checking.yml
@@ -4,7 +4,6 @@ on:
push:
branches:
- main
- main-governed-tables
pull_request:
branches:
- main
24 changes: 14 additions & 10 deletions CONTRIBUTING.md
@@ -153,9 +153,9 @@ or

``cd scripts``

* Deploy the Cloudformation template `base.yaml`
* Deploy the `base` CDK stack

``./deploy-base.sh``
``./deploy-stack.sh base``

* Return to the project root directory

@@ -175,7 +175,7 @@ or

* [OPTIONAL] To remove the base test environment cloud formation stack post testing:

``./test_infra/scripts/delete-base.sh``
``./test_infra/scripts/delete-stack.sh base``

### Full test environment

@@ -210,14 +210,18 @@ or

``cd scripts``

* Deploy the Cloudformation templates `base.yaml` and `databases.yaml`. This step could take about 15 minutes to deploy.
* Deploy the `base` and `databases` CDK stacks. This step could take about 15 minutes to deploy.

``./deploy-base.sh``
``./deploy-databases.sh``
``./deploy-stack.sh base``
``./deploy-stack.sh databases``

* [OPTIONAL] Deploy the Cloudformation template `opensearch.yaml` (if you need to test Amazon OpenSearch Service). This step could take about 15 minutes to deploy.
* [OPTIONAL] Deploy the `lakeformation` CDK stack (if you need to test against the AWS Lake Formation Service). You must ensure Lake Formation is enabled in the account.

``./deploy-opensearch.sh``
``./deploy-stack.sh lakeformation``

* [OPTIONAL] Deploy the `opensearch` CDK stack (if you need to test against the Amazon OpenSearch Service). This step could take about 15 minutes to deploy.

``./deploy-stack.sh opensearch``

* Go to the `EC2 -> SecurityGroups` console, open the `aws-data-wrangler-*` security group and configure to accept your IP from any TCP port.
- Alternatively run:
@@ -254,9 +258,9 @@ or

* [OPTIONAL] To remove the base test environment cloud formation stack post testing:

``./test_infra/scripts/delete-base.sh``
``./test_infra/scripts/delete-stack.sh base``

``./test_infra/scripts/delete-databases.sh``
``./test_infra/scripts/delete-stack.sh databases``

## Recommended Visual Studio Code Recommended setting

1 change: 1 addition & 0 deletions README.md
@@ -139,6 +139,7 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
- [029 - S3 Select](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/029%20-%20S3%20Select.ipynb)
- [030 - Data Api](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/030%20-%20Data%20Api.ipynb)
- [031 - OpenSearch](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/031%20-%20OpenSearch.ipynb)
- [032 - Lake Formation Governed Tables](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/032%20-%20Lake%20Formation%20Governed%20Tables.ipynb)
- [**API Reference**](https://aws-data-wrangler.readthedocs.io/en/2.12.1/api.html)
- [Amazon S3](https://aws-data-wrangler.readthedocs.io/en/2.12.1/api.html#amazon-s3)
- [AWS Glue Catalog](https://aws-data-wrangler.readthedocs.io/en/2.12.1/api.html#aws-glue-catalog)
2 changes: 2 additions & 0 deletions awswrangler/__init__.py
@@ -16,6 +16,7 @@
dynamodb,
emr,
exceptions,
lakeformation,
mysql,
opensearch,
postgresql,
@@ -44,6 +45,7 @@
"s3",
"sts",
"redshift",
"lakeformation",
"mysql",
"postgresql",
"secretsmanager",
13 changes: 12 additions & 1 deletion awswrangler/_config.py
@@ -43,14 +43,15 @@ class _ConfigArg(NamedTuple):
"redshift_endpoint_url": _ConfigArg(dtype=str, nullable=True, enforced=True),
"kms_endpoint_url": _ConfigArg(dtype=str, nullable=True, enforced=True),
"emr_endpoint_url": _ConfigArg(dtype=str, nullable=True, enforced=True),
"lakeformation_endpoint_url": _ConfigArg(dtype=str, nullable=True, enforced=True),
"dynamodb_endpoint_url": _ConfigArg(dtype=str, nullable=True, enforced=True),
"secretsmanager_endpoint_url": _ConfigArg(dtype=str, nullable=True, enforced=True),
# Botocore config
"botocore_config": _ConfigArg(dtype=botocore.config.Config, nullable=True),
}


class _Config: # pylint: disable=too-many-instance-attributes, too-many-public-methods
class _Config: # pylint: disable=too-many-instance-attributes,too-many-public-methods
"""Wrangler's Configuration class."""

def __init__(self) -> None:
@@ -63,6 +64,7 @@ def __init__(self) -> None:
self.redshift_endpoint_url = None
self.kms_endpoint_url = None
self.emr_endpoint_url = None
self.lakeformation_endpoint_url = None
self.dynamodb_endpoint_url = None
self.secretsmanager_endpoint_url = None
self.botocore_config = None
@@ -356,6 +358,15 @@ def emr_endpoint_url(self) -> Optional[str]:
def emr_endpoint_url(self, value: Optional[str]) -> None:
self._set_config_value(key="emr_endpoint_url", value=value)

@property
def lakeformation_endpoint_url(self) -> Optional[str]:
"""Property lakeformation_endpoint_url."""
return cast(Optional[str], self["lakeformation_endpoint_url"])

@lakeformation_endpoint_url.setter
def lakeformation_endpoint_url(self, value: Optional[str]) -> None:
self._set_config_value(key="lakeformation_endpoint_url", value=value)

@property
def dynamodb_endpoint_url(self) -> Optional[str]:
"""Property dynamodb_endpoint_url."""
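The new `lakeformation_endpoint_url` property mirrors the existing endpoint properties: reads go through item access and writes go through `_set_config_value`. As a rough, self-contained illustration of that property/setter pattern (the class name `ConfigSketch`, the internal `_values` dictionary, the type check, and the example URL are all hypothetical; the real `_Config` validation may behave differently):

```python
from typing import Dict, Optional


class ConfigSketch:
    """Hypothetical, reduced stand-in for a config object with one nullable str key."""

    def __init__(self) -> None:
        self._values: Dict[str, Optional[str]] = {}
        # Assigning here goes through the property setter defined below.
        self.lakeformation_endpoint_url = None

    def _set_config_value(self, key: str, value: Optional[str]) -> None:
        # Assumption for this sketch: accept None or a str, reject anything else.
        if value is not None and not isinstance(value, str):
            raise TypeError(f"{key} must be a str or None, got {type(value).__name__}")
        self._values[key] = value

    @property
    def lakeformation_endpoint_url(self) -> Optional[str]:
        return self._values["lakeformation_endpoint_url"]

    @lakeformation_endpoint_url.setter
    def lakeformation_endpoint_url(self, value: Optional[str]) -> None:
        self._set_config_value(key="lakeformation_endpoint_url", value=value)


config = ConfigSketch()
default_value = config.lakeformation_endpoint_url  # None until a value is set
config.lakeformation_endpoint_url = "http://localhost:4566"  # illustrative URL only
```

Routing every write through a single setter keeps validation in one place, which is presumably why each endpoint in `_config.py` gets its own thin property pair.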
2 changes: 2 additions & 0 deletions awswrangler/_utils.py
@@ -93,6 +93,8 @@ def _get_endpoint_url(service_name: str) -> Optional[str]:
endpoint_url = _config.config.kms_endpoint_url
elif service_name == "emr" and _config.config.emr_endpoint_url is not None:
endpoint_url = _config.config.emr_endpoint_url
elif service_name == "lakeformation" and _config.config.lakeformation_endpoint_url is not None:
endpoint_url = _config.config.lakeformation_endpoint_url
elif service_name == "dynamodb" and _config.config.dynamodb_endpoint_url is not None:
endpoint_url = _config.config.dynamodb_endpoint_url
elif service_name == "secretsmanager" and _config.config.secretsmanager_endpoint_url is not None:
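The `lakeformation` branch added in this hunk follows the same per-service rule as its neighbours: a configured value, when present, overrides the default endpoint. A minimal sketch of that lookup, using a plain dictionary in place of `_config.config` (the names `ENDPOINT_OVERRIDES` and `resolve_endpoint_url` and the URL are hypothetical and not part of awswrangler):

```python
from typing import Dict, Optional

# Hypothetical stand-in for the per-service values held by the library config.
ENDPOINT_OVERRIDES: Dict[str, Optional[str]] = {
    "emr": None,
    "lakeformation": "http://localhost:4566",  # illustrative URL only
    "dynamodb": None,
}


def resolve_endpoint_url(service_name: str) -> Optional[str]:
    """Return the configured override for a service, or None to use the default."""
    return ENDPOINT_OVERRIDES.get(service_name)


lakeformation_url = resolve_endpoint_url("lakeformation")  # override is set
emr_url = resolve_endpoint_url("emr")  # known service, no override
glue_url = resolve_endpoint_url("glue")  # service not listed at all
```

A dictionary lookup would avoid adding one `elif` per service, but the diff keeps the explicit chain, which matches the existing style of `_get_endpoint_url`.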
19 changes: 15 additions & 4 deletions awswrangler/catalog/_add.py
@@ -14,7 +14,7 @@
_parquet_partition_definition,
_update_table_definition,
)
from awswrangler.catalog._utils import _catalog_id, sanitize_table_name
from awswrangler.catalog._utils import _catalog_id, _transaction_id, sanitize_table_name

_logger: logging.Logger = logging.getLogger(__name__)

@@ -300,7 +300,8 @@ def add_column(
table: str,
column_name: str,
column_type: str = "string",
column_comment: Optional[str] = "",
column_comment: Optional[str] = None,
transaction_id: Optional[str] = None,
boto3_session: Optional[boto3.Session] = None,
catalog_id: Optional[str] = None,
) -> None:
@@ -318,6 +319,8 @@ def add_column(
Column type.
column_comment : str
Column Comment
transaction_id: str, optional
The ID of the transaction (i.e. used with GOVERNED tables).
boto3_session : boto3.Session(), optional
Boto3 Session. The default boto3 session will be used if boto3_session receive None.
catalog_id : str, optional
@@ -341,13 +344,21 @@ def add_column(
"""
if _check_column_type(column_type):
client_glue: boto3.client = _utils.client(service_name="glue", session=boto3_session)
table_res: Dict[str, Any] = client_glue.get_table(DatabaseName=database, Name=table)
table_res: Dict[str, Any] = client_glue.get_table(
**_catalog_id(
catalog_id=catalog_id,
**_transaction_id(transaction_id=transaction_id, DatabaseName=database, Name=table),
)
)
table_input: Dict[str, Any] = _update_table_definition(table_res)
table_input["StorageDescriptor"]["Columns"].append(
{"Name": column_name, "Type": column_type, "Comment": column_comment}
)
res: Dict[str, Any] = client_glue.update_table(
**_catalog_id(catalog_id=catalog_id, DatabaseName=database, TableInput=table_input)
**_catalog_id(
catalog_id=catalog_id,
**_transaction_id(transaction_id=transaction_id, DatabaseName=database, TableInput=table_input),
)
)
if ("Errors" in res) and res["Errors"]:
for error in res["Errors"]:
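The nested `_catalog_id(..., **_transaction_id(...))` calls in this hunk build the keyword arguments for the Glue client so that optional identifiers are only sent when provided. The helper bodies are not part of this diff; the sketch below assumes each one simply adds its key when the value is not `None`. The function names `catalog_id_kwargs` and `transaction_id_kwargs` and all sample values are hypothetical stand-ins:

```python
from typing import Any, Dict, Optional


def catalog_id_kwargs(catalog_id: Optional[str] = None, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in: add CatalogId only when a value is given."""
    if catalog_id is not None:
        kwargs["CatalogId"] = catalog_id
    return kwargs


def transaction_id_kwargs(transaction_id: Optional[str] = None, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in: add TransactionId only when a value is given."""
    if transaction_id is not None:
        kwargs["TransactionId"] = transaction_id
    return kwargs


# Non-governed table: neither optional key reaches the API call.
plain_args = catalog_id_kwargs(
    catalog_id=None,
    **transaction_id_kwargs(transaction_id=None, DatabaseName="db", Name="tbl"),
)

# Governed table inside a transaction, in an explicitly named catalog.
governed_args = catalog_id_kwargs(
    catalog_id="123456789012",
    **transaction_id_kwargs(transaction_id="tx-1", DatabaseName="db", Name="tbl"),
)
```

Under that assumption, callers that omit `transaction_id` produce exactly the same request as before the change, so existing non-governed usage should be unaffected.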