Skip to content

Conversation

@jaidisido
Copy link
Contributor

Issue #, if available:
#586

Description of changes:
path argument is now optional. The code changes implementing this feature are based on the below logic tree:

if dataset is False:
	if path is None:
		raise Exception("you must pass `path`")
	else:
		no problem
else:
	if database and table are None:
		if path is None:
			raise Exception("you must pass `path`")
		else:
			no problem
	else:
		if path is None:
			if table exists:
				path = catalog_path
			else:
				raise Exception("you must pass `path`")
		else:
			if table exists:
				if path != catalog_path:
					raise Exception("`path` must be equal to catalog path")
			else:
				no problem

Added and modified s3 tests to evaluate impact

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…ption if existing table path is different to input
@jaidisido jaidisido requested a review from igorborgest March 12, 2021 17:52
@jaidisido jaidisido self-assigned this Mar 12, 2021
@jaidisido jaidisido added this to the 2.6.0 milestone Mar 12, 2021
@jaidisido jaidisido added enhancement New feature or request major release Will be addressed in the next major release labels Mar 12, 2021
Copy link
Contributor Author

@jaidisido jaidisido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified some code sections

df = pd.DataFrame({"c1": [None, 1, None]}, dtype="Int16")
wr.s3.to_parquet(
df=df,
path=path,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since path is now optional, it was removed from some tests to check that they still pass even without referencing it

Comment on lines +106 to +120
def test_missing_or_wrong_path(path, glue_database, glue_table):
# Missing path
df = pd.DataFrame({"FooBoo": [1, 2, 3]})
with pytest.raises(wr.exceptions.InvalidArgumentValue):
wr.s3.to_parquet(df=df)
with pytest.raises(wr.exceptions.InvalidArgumentCombination):
wr.s3.to_parquet(df=df, dataset=True)
with pytest.raises(wr.exceptions.InvalidArgumentValue):
wr.s3.to_parquet(df=df, dataset=True, database=glue_database, table=glue_table)

# Wrong path
wr.s3.to_parquet(df=df, path=path, dataset=True, database=glue_database, table=glue_table)
wrong_path = "s3://bucket/prefix"
with pytest.raises(wr.exceptions.InvalidArgumentValue):
wr.s3.to_parquet(df=df, path=wrong_path, dataset=True, database=glue_database, table=glue_table)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In certain cases, Wrangler must throw an error if the path is missing or wrong

@jaidisido jaidisido added minor release Will be addressed in the next minor release and removed major release Will be addressed in the next major release labels Mar 12, 2021
Copy link
Contributor

@igorborgest igorborgest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jaidisido jaidisido merged commit bd5526e into main Mar 13, 2021
@jaidisido jaidisido deleted the enhancement/glue_catalog_wrong_path branch March 13, 2021 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request minor release Will be addressed in the next minor release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

s3.to_parquet(): weird behaviour when inserting existing partitions on a different location with mode='overwrite_partitions'

2 participants