Merged
21 commits
87d71af
add test cases
bryanyang0528 May 27, 2020
9751ea0
Add one extra digit to tutorial's numbers.
igorborgest May 28, 2020
2189fdc
chech columns_comments
bryanyang0528 May 28, 2020
b942145
add test case for to_csv
bryanyang0528 May 28, 2020
41c37b7
Merge remote-tracking branch 'upstream/dev' into add_to_csv_tests
bryanyang0528 May 28, 2020
a232488
Add read_parquet() tests for filter arg and update doc. #267
igorborgest May 28, 2020
1f8e59d
Merge pull request #269 from bryanyang0528/add_to_csv_tests
igorborgest May 28, 2020
8c22bf0
Add black and isort checks to static checking (GitHub action) #268
igorborgest May 28, 2020
b57e2b7
Fix support for encoding on read_csv and to_csv. #271
igorborgest May 29, 2020
31d9b21
Add dtype functionality to to_parquet with dataset=False.
igorborgest May 31, 2020
0d19bf2
add test for pandas argument and encoding
bryanyang0528 Jun 1, 2020
eca88db
fixed pylint
bryanyang0528 Jun 1, 2020
5bdbf68
catch the exception by pytest
bryanyang0528 Jun 1, 2020
a137e07
Merge pull request #273 from bryanyang0528/add_test_for_pass_argument…
igorborgest Jun 1, 2020
58f1eb9
Add support for reading CSV, JSON and FWF partitions. #265
igorborgest Jun 2, 2020
5249d09
Merge remote-tracking branch 'origin/dev' into dev
igorborgest Jun 2, 2020
8aa84eb
Refactoring some tests.
igorborgest Jun 2, 2020
bf6405d
Bumping version to 1.4.0.
igorborgest Jun 2, 2020
d6e9789
Update README.md
igorborgest Jun 2, 2020
0dc8057
Fix typo in README.md
igorborgest Jun 2, 2020
94994da
Add last empty line to .isort.cfg.
igorborgest Jun 2, 2020
4 changes: 4 additions & 0 deletions .github/workflows/static-checking.yml
@@ -36,3 +36,7 @@ jobs:
       run: flake8 setup.py awswrangler testing/test_awswrangler
     - name: Pylint Lint
       run: pylint -j 0 awswrangler
+    - name: Black style
+      run: black --check --line-length 120 --target-version py36 awswrangler testing/test_awswrangler
+    - name: Imports order check (isort)
+      run: isort -rc --check-only awswrangler testing/test_awswrangler
6 changes: 6 additions & 0 deletions .isort.cfg
@@ -0,0 +1,6 @@
+[settings]
+multi_line_output=3
+include_trailing_comma=True
+force_grid_wrap=0
+use_parentheses=True
+line_length=120
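These settings are the isort options usually recommended for compatibility with Black: vertical hanging indent, a trailing comma, parentheses, and no forced grid wrap. Here `line_length` is set to 120 to match the `--line-length 120` passed to Black in the workflow. The sketch below assumes isort's default ordering inside a `from` import, where classes come before functions. Under that assumption, an import that would run past 120 characters on one line should be wrapped like this:

```python
# On one line this import is 125 characters, over line_length=120, so isort wraps it.
# multi_line_output=3 (vertical hanging indent) puts one name per line;
# include_trailing_comma=True keeps the comma after the last name.
from typing import (
    Any,
    Callable,
    Dict,
    Generator,
    Iterable,
    List,
    Mapping,
    Optional,
    Sequence,
    Set,
    Tuple,
    Type,
    Union,
    cast,
)
```

This layout is also what Black produces for an over-long parenthesized import. That is why the two `--check` steps in the workflow can pass on the same files.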
36 changes: 18 additions & 18 deletions README.md
@@ -3,7 +3,7 @@

 ![AWS Data Wrangler](docs/source/_static/logo2.png?raw=true "AWS Data Wrangler")

-[![Release](https://img.shields.io/badge/release-1.3.0-brightgreen.svg)](https://pypi.org/project/awswrangler/)
+[![Release](https://img.shields.io/badge/release-1.4.0-brightgreen.svg)](https://pypi.org/project/awswrangler/)
 [![Python Version](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8-brightgreen.svg)](https://anaconda.org/conda-forge/awswrangler)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
@@ -63,23 +63,23 @@ df = wr.db.read_sql_query("SELECT * FROM external_schema.my_table", con=engine)
   - [EMR](https://aws-data-wrangler.readthedocs.io/en/latest/install.html#emr)
   - [From source](https://aws-data-wrangler.readthedocs.io/en/latest/install.html#from-source)
 - [**Tutorials**](https://github.com/awslabs/aws-data-wrangler/tree/master/tutorials)
-  - [01 - Introduction](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/01%20-%20Introduction.ipynb)
-  - [02 - Sessions](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/02%20-%20Sessions.ipynb)
-  - [03 - Amazon S3](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/03%20-%20Amazon%20S3.ipynb)
-  - [04 - Parquet Datasets](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/04%20-%20Parquet%20Datasets.ipynb)
-  - [05 - Glue Catalog](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/05%20-%20Glue%20Catalog.ipynb)
-  - [06 - Amazon Athena](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/06%20-%20Amazon%20Athena.ipynb)
-  - [07 - Databases (Redshift, MySQL and PostgreSQL)](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/07%20-%20Redshift%2C%20MySQL%2C%20PostgreSQL.ipynb)
-  - [08 - Redshift - Copy & Unload.ipynb](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/08%20-%20Redshift%20-%20Copy%20%26%20Unload.ipynb)
-  - [09 - Redshift - Append, Overwrite and Upsert](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/09%20-%20Redshift%20-%20Append%2C%20Overwrite%2C%20Upsert.ipynb)
-  - [10 - Parquet Crawler](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/10%20-%20Parquet%20Crawler.ipynb)
-  - [11 - CSV Datasets](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/11%20-%20CSV%20Datasets.ipynb)
-  - [12 - CSV Crawler](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/12%20-%20CSV%20Crawler.ipynb)
-  - [13 - Merging Datasets on S3](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/13%20-%20Merging%20Datasets%20on%20S3.ipynb)
-  - [14 - Schema Evolution](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/14%20-%20Schema%20Evolution.ipynb)
-  - [15 - EMR](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/15%20-%20EMR.ipynb)
-  - [16 - EMR & Docker](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/16%20-%20EMR%20%26%20Docker.ipynb)
-  - [17 - Partition Projection](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/17%20-%20Partition%20Projection.ipynb)
+  - [001 - Introduction](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/001%20-%20Introduction.ipynb)
+  - [002 - Sessions](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/002%20-%20Sessions.ipynb)
+  - [003 - Amazon S3](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/003%20-%20Amazon%20S3.ipynb)
+  - [004 - Parquet Datasets](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/004%20-%20Parquet%20Datasets.ipynb)
+  - [005 - Glue Catalog](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/005%20-%20Glue%20Catalog.ipynb)
+  - [006 - Amazon Athena](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/006%20-%20Amazon%20Athena.ipynb)
+  - [007 - Databases (Redshift, MySQL and PostgreSQL)](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/007%20-%20Redshift%2C%20MySQL%2C%20PostgreSQL.ipynb)
+  - [008 - Redshift - Copy & Unload.ipynb](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/008%20-%20Redshift%20-%20Copy%20%26%20Unload.ipynb)
+  - [009 - Redshift - Append, Overwrite and Upsert](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/009%20-%20Redshift%20-%20Append%2C%20Overwrite%2C%20Upsert.ipynb)
+  - [010 - Parquet Crawler](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/010%20-%20Parquet%20Crawler.ipynb)
+  - [011 - CSV Datasets](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/011%20-%20CSV%20Datasets.ipynb)
+  - [012 - CSV Crawler](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/012%20-%20CSV%20Crawler.ipynb)
+  - [013 - Merging Datasets on S3](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/013%20-%20Merging%20Datasets%20on%20S3.ipynb)
+  - [014 - Schema Evolution](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/014%20-%20Schema%20Evolution.ipynb)
+  - [015 - EMR](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/015%20-%20EMR.ipynb)
+  - [016 - EMR & Docker](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/016%20-%20EMR%20%26%20Docker.ipynb)
+  - [017 - Partition Projection](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/017%20-%20Partition%20Projection.ipynb)
 - [**API Reference**](https://aws-data-wrangler.readthedocs.io/en/latest/api.html)
   - [Amazon S3](https://aws-data-wrangler.readthedocs.io/en/latest/api.html#amazon-s3)
   - [AWS Glue Catalog](https://aws-data-wrangler.readthedocs.io/en/latest/api.html#aws-glue-catalog)
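This hunk pads every tutorial number to three digits, matching the commit "Add one extra digit to tutorial's numbers." Zero-padded prefixes keep the notebooks in numeric order when file names are sorted as plain strings.

Below is a minimal sketch that rebuilds the new links with `urllib.parse.quote`. It assumes the `NNN - Title.ipynb` naming seen in the URLs above, and `tutorial_url` is a hypothetical helper, not part of awswrangler.

```python
from urllib.parse import quote

# Base URL copied from the tutorial links in the README diff.
BASE = "https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/"


def tutorial_url(number: int, title: str) -> str:
    """Hypothetical helper: build a tutorial link with a three-digit, zero-padded number."""
    filename = f"{number:03d} - {title}.ipynb"
    # quote() leaves letters, digits and "-" alone and percent-encodes
    # spaces (%20), commas (%2C) and ampersands (%26).
    return BASE + quote(filename)


print(tutorial_url(16, "EMR & Docker"))
# https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/016%20-%20EMR%20%26%20Docker.ipynb

# With zero padding, string order matches numeric order.
print(sorted(f"{n:03d}" for n in (100, 17, 9)))
# ['009', '017', '100']
```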
2 changes: 1 addition & 1 deletion awswrangler/__metadata__.py
@@ -7,5 +7,5 @@

 __title__ = "awswrangler"
 __description__ = "Pandas on AWS."
-__version__ = "1.3.0"
+__version__ = "1.4.0"
 __license__ = "Apache License 2.0"
23 changes: 20 additions & 3 deletions awswrangler/_utils.py
@@ -203,10 +203,10 @@ def get_region_from_session(boto3_session: Optional[boto3.Session] = None, defau
     )  # pragma: no cover


-def extract_partitions_from_paths(
+def extract_partitions_metadata_from_paths(
     path: str, paths: List[str]
 ) -> Tuple[Optional[Dict[str, str]], Optional[Dict[str, List[str]]]]:
-    """Extract partitions from Amazon S3 paths."""
+    """Extract partitions metadata from Amazon S3 paths."""
     path = path if path.endswith("/") else f"{path}/"
     partitions_types: Dict[str, str] = {}
     partitions_values: Dict[str, List[str]] = {}
@@ -217,7 +217,7 @@ def extract_partitions_from_paths(
             )  # pragma: no cover
         path_wo_filename: str = p.rpartition("/")[0] + "/"
         if path_wo_filename not in partitions_values:
-            path_wo_prefix: str = p.replace(f"{path}/", "")
+            path_wo_prefix: str = path_wo_filename.replace(f"{path}/", "")
             dirs: List[str] = [x for x in path_wo_prefix.split("/") if (x != "") and ("=" in x)]
             if dirs:
                 values_tups: List[Tuple[str, str]] = [tuple(x.split("=")[:2]) for x in dirs]  # type: ignore
@@ -238,6 +238,23 @@ def extract_partitions_from_paths(
     return partitions_types, partitions_values


+def extract_partitions_from_path(path_root: str, path: str) -> Dict[str, Any]:
+    """Extract partitions values and names from Amazon S3 path."""
+    path_root = path_root if path_root.endswith("/") else f"{path_root}/"
+    if path_root not in path:
+        raise exceptions.InvalidArgumentValue(
+            f"Object {path} is not under the root path ({path_root})."
+        )  # pragma: no cover
+    path_wo_filename: str = path.rpartition("/")[0] + "/"
+    path_wo_prefix: str = path_wo_filename.replace(f"{path_root}/", "")
+    dirs: List[str] = [x for x in path_wo_prefix.split("/") if (x != "") and ("=" in x)]
+    if not dirs:
+        return {}  # pragma: no cover
+    values_tups: List[Tuple[str, str]] = [tuple(x.split("=")[:2]) for x in dirs]  # type: ignore
+    values_dics: Dict[str, str] = dict(values_tups)
+    return values_dics
+
+
 def list_sampling(lst: List[Any], sampling: float) -> List[Any]:
     """Random List sampling."""
     if sampling > 1.0 or sampling <= 0.0:  # pragma: no cover
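`extract_partitions_from_path` maps the Hive-style `name=value` directories between a root prefix and an object key to a dict. The block below is a standalone sketch of that logic, not the library code, and it differs from the diff in a few deliberate ways:

- A plain `ValueError` stands in for `exceptions.InvalidArgumentValue`.
- The root check uses `str.startswith` instead of the diff's substring test.
- The root prefix is removed by slicing rather than with `replace`.

```python
from typing import Dict, List, Tuple


def extract_partitions_from_path(path_root: str, path: str) -> Dict[str, str]:
    """Sketch: map "name=value" directories between path_root and the file name to a dict."""
    # Normalise the root so it always ends with a single trailing slash.
    path_root = path_root if path_root.endswith("/") else f"{path_root}/"
    if not path.startswith(path_root):
        # ValueError stands in for awswrangler's exceptions.InvalidArgumentValue.
        raise ValueError(f"Object {path} is not under the root path ({path_root}).")
    # Drop the file name so that a file such as "part=0.csv" is never read as a partition.
    path_wo_filename: str = path.rpartition("/")[0] + "/"
    # Slice off the root prefix, leaving e.g. "year=2020/month=06/".
    path_wo_prefix: str = path_wo_filename[len(path_root):]
    # Keep only non-empty segments that contain "=".
    dirs: List[str] = [x for x in path_wo_prefix.split("/") if x != "" and "=" in x]
    # As in the diff: name is the text before the first "=", value the text up to the next "=".
    pairs: List[Tuple[str, str]] = [(x.split("=")[0], x.split("=")[1]) for x in dirs]
    return dict(pairs)


print(extract_partitions_from_path("s3://bucket/root", "s3://bucket/root/year=2020/month=06/part=0.csv"))
# {'year': '2020', 'month': '06'}
```

Dropping the file name before splitting is what keeps `part=0.csv` out of the result. The change from `p.replace(...)` to `path_wo_filename.replace(...)` in `extract_partitions_metadata_from_paths` has the same effect: without it, a file name containing `=` would be counted as a partition there too.

There is also a quirk in the diff itself. `path_root` already ends with `/`, so `replace(f"{path_root}/", "")` looks for a double slash and normally leaves the string unchanged. The bucket and prefix segments are then discarded only because they contain no `=`. Slicing avoids relying on that filter.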