Merged
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 2.17.0
current_version = 2.18.0
commit = False
tag = False
tag_name = {new_version}
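
The bump from 2.17.0 to 2.18.0 is a plain string replacement repeated across the files in this change. As a rough illustration only (this is not how bumpversion is implemented, and `bump_version_strings` is a hypothetical helper name), the replacement could be sketched like this:

```python
import re


def bump_version_strings(text: str, old: str, new: str) -> str:
    """Replace standalone occurrences of `old` with `new` (hypothetical helper)."""
    # re.escape keeps the dots in the version literal.
    # The lookbehind skips matches that are the tail of a longer
    # version-like string (e.g. "12.17.0"); the lookahead skips matches
    # followed by another digit (e.g. "2.17.01").
    pattern = re.compile(r"(?<![\d.])" + re.escape(old) + r"(?!\d)")
    # A callable replacement avoids any backslash interpretation in `new`.
    return pattern.sub(lambda _match: new, text)


line = "https://aws-sdk-pandas.readthedocs.io/en/2.17.0/install.html"
print(bump_version_strings(line, "2.17.0", "2.18.0"))
```

The guards around the pattern are a design choice for this sketch: they keep an unrelated number that merely contains the old version from being rewritten.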
6 changes: 3 additions & 3 deletions CONTRIBUTING_COMMON_ERRORS.md
@@ -13,9 +13,9 @@ Requirement already satisfied: pbr!=2.1.0,>=2.0.0 in ./.venv/lib/python3.7/site-
Using legacy 'setup.py install' for python-Levenshtein, since package 'wheel' is not installed.
Installing collected packages: awswrangler, python-Levenshtein
Attempting uninstall: awswrangler
Found existing installation: awswrangler 2.17.0
Uninstalling awswrangler-2.17.0:
Successfully uninstalled awswrangler-2.17.0
Found existing installation: awswrangler 2.18.0
Uninstalling awswrangler-2.18.0:
Successfully uninstalled awswrangler-2.18.0
Running setup.py develop for awswrangler
Running setup.py install for python-Levenshtein ... error
ERROR: Command errored out with exit status 1:
74 changes: 37 additions & 37 deletions README.md
@@ -11,7 +11,7 @@ Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, Q

> An [AWS Professional Service](https://aws.amazon.com/professional-services/) open source initiative | aws-proserve-opensource@amazon.com

[![Release](https://img.shields.io/badge/release-2.17.0-brightgreen.svg)](https://pypi.org/project/awswrangler/)
[![Release](https://img.shields.io/badge/release-2.18.0-brightgreen.svg)](https://pypi.org/project/awswrangler/)
[![Python Version](https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9%20%7C%203.10-brightgreen.svg)](https://anaconda.org/conda-forge/awswrangler)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
@@ -26,7 +26,7 @@ Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, Q
| **[PyPi](https://pypi.org/project/awswrangler/)** | [![PyPI Downloads](https://pepy.tech/badge/awswrangler)](https://pypi.org/project/awswrangler/) | `pip install awswrangler` |
| **[Conda](https://anaconda.org/conda-forge/awswrangler)** | [![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/awswrangler.svg)](https://anaconda.org/conda-forge/awswrangler) | `conda install -c conda-forge awswrangler` |

> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
➡️ `pip install pyarrow==2 awswrangler`

Powered By [<img src="https://arrow.apache.org/img/arrow.png" width="200">](https://arrow.apache.org/powered_by/)
@@ -44,7 +44,7 @@ Powered By [<img src="https://arrow.apache.org/img/arrow.png" width="200">](http

Installation command: `pip install awswrangler`

> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
➡️`pip install pyarrow==2 awswrangler`

```py3
@@ -98,17 +98,17 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3

## [Read The Docs](https://aws-sdk-pandas.readthedocs.io/)

- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/about.html)
- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/install.html)
- [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/install.html#pypi-pip)
- [Conda](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/install.html#conda)
- [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/install.html#aws-lambda-layer)
- [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/install.html#aws-glue-python-shell-jobs)
- [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/install.html#aws-glue-pyspark-jobs)
- [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/install.html#amazon-sagemaker-notebook)
- [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/install.html#amazon-sagemaker-notebook-lifecycle)
- [EMR](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/install.html#emr)
- [From source](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/install.html#from-source)
- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/about.html)
- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/install.html)
- [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/install.html#pypi-pip)
- [Conda](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/install.html#conda)
- [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/install.html#aws-lambda-layer)
- [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/install.html#aws-glue-python-shell-jobs)
- [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/install.html#aws-glue-pyspark-jobs)
- [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/install.html#amazon-sagemaker-notebook)
- [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/install.html#amazon-sagemaker-notebook-lifecycle)
- [EMR](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/install.html#emr)
- [From source](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/install.html#from-source)
- [**Tutorials**](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials)
- [001 - Introduction](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/001%20-%20Introduction.ipynb)
- [002 - Sessions](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/002%20-%20Sessions.ipynb)
@@ -143,29 +143,29 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
- [031 - OpenSearch](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/031%20-%20OpenSearch.ipynb)
- [032 - Lake Formation Governed Tables](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/032%20-%20Lake%20Formation%20Governed%20Tables.ipynb)
- [033 - Amazon Neptune](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/033%20-%20Amazon%20Neptune.ipynb)
- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html)
- [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#amazon-s3)
- [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#aws-glue-catalog)
- [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#amazon-athena)
- [AWS Lake Formation](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#aws-lake-formation)
- [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#amazon-redshift)
- [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#postgresql)
- [MySQL](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#mysql)
- [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#sqlserver)
- [Oracle](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#oracle)
- [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#data-api-redshift)
- [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#data-api-rds)
- [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#opensearch)
- [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#amazon-neptune)
- [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#dynamodb)
- [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#amazon-timestream)
- [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#amazon-emr)
- [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#amazon-cloudwatch-logs)
- [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#amazon-chime)
- [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#amazon-quicksight)
- [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#aws-sts)
- [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#aws-secrets-manager)
- [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html#global-configurations)
- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html)
- [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#amazon-s3)
- [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#aws-glue-catalog)
- [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#amazon-athena)
- [AWS Lake Formation](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#aws-lake-formation)
- [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#amazon-redshift)
- [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#postgresql)
- [MySQL](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#mysql)
- [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#sqlserver)
- [Oracle](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#oracle)
- [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#data-api-redshift)
- [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#data-api-rds)
- [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#opensearch)
- [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#amazon-neptune)
- [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#dynamodb)
- [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#amazon-timestream)
- [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#amazon-emr)
- [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#amazon-cloudwatch-logs)
- [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#amazon-chime)
- [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#amazon-quicksight)
- [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#aws-sts)
- [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#aws-secrets-manager)
- [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/2.18.0/api.html#global-configurations)
- [**License**](https://github.com/aws/aws-sdk-pandas/blob/main/LICENSE.txt)
- [**Contributing**](https://github.com/aws/aws-sdk-pandas/blob/main/CONTRIBUTING.md)

2 changes: 1 addition & 1 deletion awswrangler/__metadata__.py
@@ -7,5 +7,5 @@

__title__: str = "awswrangler"
__description__: str = "Pandas on AWS."
__version__: str = "2.17.0"
__version__: str = "2.18.0"
__license__: str = "Apache License 2.0"
16 changes: 8 additions & 8 deletions awswrangler/athena/_read.py
@@ -692,11 +692,11 @@ def read_sql_query( # pylint: disable=too-many-arguments,too-many-locals

**Related tutorial:**

- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/2.17.0/
- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/2.18.0/
tutorials/006%20-%20Amazon%20Athena.html>`_
- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/2.17.0/
- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/2.18.0/
tutorials/019%20-%20Athena%20Cache.html>`_
- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/2.17.0/
- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/2.18.0/
tutorials/021%20-%20Global%20Configurations.html>`_

**There are three approaches available through ctas_approach and unload_approach parameters:**
@@ -760,7 +760,7 @@ def read_sql_query( # pylint: disable=too-many-arguments,too-many-locals
/athena.html#Athena.Client.get_query_execution>`_ .

For a practical example check out the
`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/2.17.0/
`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/2.18.0/
tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!


@@ -1011,11 +1011,11 @@ def read_sql_table(

**Related tutorial:**

- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/2.17.0/
- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/2.18.0/
tutorials/006%20-%20Amazon%20Athena.html>`_
- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/2.17.0/
- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/2.18.0/
tutorials/019%20-%20Athena%20Cache.html>`_
- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/2.17.0/
- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/2.18.0/
tutorials/021%20-%20Global%20Configurations.html>`_

**There are two approaches to be defined through ctas_approach parameter:**
@@ -1060,7 +1060,7 @@ def read_sql_table(
/athena.html#Athena.Client.get_query_execution>`_ .

For a practical example check out the
`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/2.17.0/
`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/2.18.0/
tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!


2 changes: 1 addition & 1 deletion awswrangler/s3/_read_parquet.py
@@ -864,7 +864,7 @@ def read_parquet_table(
This function MUST return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-sdk-pandas.readthedocs.io/en/2.17.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-sdk-pandas.readthedocs.io/en/2.18.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
columns : List[str], optional
Names of columns to read from the file(s).
validate_schema:
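
The docstring above describes the partition filter as a callback that returns True to read a partition and False to ignore it. The sketch below shows that contract in isolation, without touching S3. `apply_partition_filter` is a hypothetical stand-in written for this example and is not awswrangler's internal code; the partition dictionaries are made-up sample data.

```python
from typing import Callable, Dict, List

# A filter takes one partition (column name -> value) and returns a bool.
PartitionFilter = Callable[[Dict[str, str]], bool]


def keep_january_2020(partition: Dict[str, str]) -> bool:
    # Same condition as the docstring's lambda, written without the
    # redundant "True if ... else False". Values are compared as strings,
    # matching the docstring's example.
    return partition["year"] == "2020" and partition["month"] == "1"


def apply_partition_filter(
    partitions: List[Dict[str, str]], partition_filter: PartitionFilter
) -> List[Dict[str, str]]:
    """Hypothetical helper: keep only partitions the callback accepts."""
    return [p for p in partitions if partition_filter(p)]


# Made-up sample partitions for illustration.
partitions = [
    {"year": "2019", "month": "12"},
    {"year": "2020", "month": "1"},
    {"year": "2020", "month": "2"},
]
selected = apply_partition_filter(partitions, keep_january_2020)
print(selected)
```

In real use the same callable would be passed as the `partition_filter` argument; a named function is used here only because it is easier to test than a lambda.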
6 changes: 3 additions & 3 deletions awswrangler/s3/_read_text.py
@@ -249,7 +249,7 @@ def read_csv(
This function MUST return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-sdk-pandas.readthedocs.io/en/2.17.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-sdk-pandas.readthedocs.io/en/2.18.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
pandas_kwargs :
KEYWORD arguments forwarded to pandas.read_csv(). You can NOT pass `pandas_kwargs` explicitly, just add valid
Pandas arguments in the function call and awswrangler will accept it.
@@ -397,7 +397,7 @@ def read_fwf(
This function MUST return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-sdk-pandas.readthedocs.io/en/2.17.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-sdk-pandas.readthedocs.io/en/2.18.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
pandas_kwargs:
KEYWORD arguments forwarded to pandas.read_fwf(). You can NOT pass `pandas_kwargs` explicit, just add valid
Pandas arguments in the function call and awswrangler will accept it.
@@ -549,7 +549,7 @@ def read_json(
This function MUST return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-sdk-pandas.readthedocs.io/en/2.17.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-sdk-pandas.readthedocs.io/en/2.18.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
pandas_kwargs:
KEYWORD arguments forwarded to pandas.read_json(). You can NOT pass `pandas_kwargs` explicit, just add valid
Pandas arguments in the function call and awswrangler will accept it.
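
The `pandas_kwargs` note in these docstrings describes ordinary keyword forwarding: extra keyword arguments are collected and handed to the underlying reader. The sketch below illustrates the pattern using only the standard library, with `csv.reader` swapped in for the pandas readers so it runs without pandas or AWS access. `read_csv_text` is a hypothetical wrapper and does not reflect awswrangler's actual code.

```python
import csv
import io
from typing import Any, List


def read_csv_text(text: str, **reader_kwargs: Any) -> List[List[str]]:
    """Hypothetical wrapper: forward extra keyword arguments to csv.reader."""
    # Anything the caller passes by keyword ends up in reader_kwargs and is
    # unpacked again here, so the caller writes delimiter="|" directly.
    return list(csv.reader(io.StringIO(text), **reader_kwargs))


rows = read_csv_text("a|b\n1|2\n", delimiter="|")
print(rows)
```

In this sketch, calling `read_csv_text(text, reader_kwargs={"delimiter": "|"})` would forward a keyword literally named `reader_kwargs`, which `csv.reader` rejects with a `TypeError`. That is the general reason such wrappers ask callers to pass the options directly rather than as a dictionary.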
6 changes: 3 additions & 3 deletions awswrangler/s3/_write_parquet.py
@@ -302,18 +302,18 @@ def to_parquet( # pylint: disable=too-many-arguments,too-many-locals,too-many-b
concurrent_partitioning: bool
If True will increase the parallelism level during the partitions writing. It will decrease the
writing time and increase the memory usage.
https://aws-sdk-pandas.readthedocs.io/en/2.17.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
https://aws-sdk-pandas.readthedocs.io/en/2.18.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
mode: str, optional
``append`` (Default), ``overwrite``, ``overwrite_partitions``. Only takes effect if dataset=True.
For details check the related tutorial:
https://aws-sdk-pandas.readthedocs.io/en/2.17.0/tutorials/004%20-%20Parquet%20Datasets.html
https://aws-sdk-pandas.readthedocs.io/en/2.18.0/tutorials/004%20-%20Parquet%20Datasets.html
catalog_versioning : bool
If True and `mode="overwrite"`, creates an archived version of the table catalog before updating it.
schema_evolution : bool
If True allows schema evolution (new or missing columns), otherwise a exception will be raised. True by default.
(Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
Related tutorial:
https://aws-sdk-pandas.readthedocs.io/en/2.17.0/tutorials/014%20-%20Schema%20Evolution.html
https://aws-sdk-pandas.readthedocs.io/en/2.18.0/tutorials/014%20-%20Schema%20Evolution.html
database : str, optional
Glue/Athena catalog: Database name.
table : str, optional