Merged

Changes from all commits
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 2.12.1
+current_version = 2.13.0
 commit = False
 tag = False
 tag_name = {new_version}
6 changes: 3 additions & 3 deletions CONTRIBUTING_COMMON_ERRORS.md
@@ -13,9 +13,9 @@ Requirement already satisfied: pbr!=2.1.0,>=2.0.0 in ./.venv/lib/python3.7/site-
 Using legacy 'setup.py install' for python-Levenshtein, since package 'wheel' is not installed.
 Installing collected packages: awswrangler, python-Levenshtein
 Attempting uninstall: awswrangler
-Found existing installation: awswrangler 2.12.1
-Uninstalling awswrangler-2.12.1:
-Successfully uninstalled awswrangler-2.12.1
+Found existing installation: awswrangler 2.13.0
+Uninstalling awswrangler-2.13.0:
+Successfully uninstalled awswrangler-2.13.0
 Running setup.py develop for awswrangler
 Running setup.py install for python-Levenshtein ... error
 ERROR: Command errored out with exit status 1:
28 changes: 14 additions & 14 deletions README.md
@@ -8,7 +8,7 @@ Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, Clo
 
 > An [AWS Professional Service](https://aws.amazon.com/professional-services/) open source initiative | aws-proserve-opensource@amazon.com
 
-[![Release](https://img.shields.io/badge/release-2.12.1-brightgreen.svg)](https://pypi.org/project/awswrangler/)
+[![Release](https://img.shields.io/badge/release-2.13.0-brightgreen.svg)](https://pypi.org/project/awswrangler/)
 [![Python Version](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8%20%7C%203.9-brightgreen.svg)](https://anaconda.org/conda-forge/awswrangler)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
@@ -23,7 +23,7 @@ Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, Clo
 | **[PyPi](https://pypi.org/project/awswrangler/)** | [![PyPI Downloads](https://pepy.tech/badge/awswrangler)](https://pypi.org/project/awswrangler/) | `pip install awswrangler` |
 | **[Conda](https://anaconda.org/conda-forge/awswrangler)** | [![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/awswrangler.svg)](https://anaconda.org/conda-forge/awswrangler) | `conda install -c conda-forge awswrangler` |
 
-> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.12.1/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.12.1/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
+> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.13.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.13.0/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
 ➡️ `pip install pyarrow==2 awswrangler`
 
 Powered By [<img src="https://arrow.apache.org/img/arrow.png" width="200">](https://arrow.apache.org/powered_by/)
@@ -42,7 +42,7 @@ Powered By [<img src="https://arrow.apache.org/img/arrow.png" width="200">](http
 
 Installation command: `pip install awswrangler`
 
-> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.12.1/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.12.1/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
+> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.13.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.13.0/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
 ➡️`pip install pyarrow==2 awswrangler`
 
 ```py3
@@ -96,17 +96,17 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
 
 ## [Read The Docs](https://aws-data-wrangler.readthedocs.io/)
 
-- [**What is AWS Data Wrangler?**](https://aws-data-wrangler.readthedocs.io/en/2.12.1/what.html)
-- [**Install**](https://aws-data-wrangler.readthedocs.io/en/2.12.1/install.html)
-- [PyPi (pip)](https://aws-data-wrangler.readthedocs.io/en/2.12.1/install.html#pypi-pip)
-- [Conda](https://aws-data-wrangler.readthedocs.io/en/2.12.1/install.html#conda)
-- [AWS Lambda Layer](https://aws-data-wrangler.readthedocs.io/en/2.12.1/install.html#aws-lambda-layer)
-- [AWS Glue Python Shell Jobs](https://aws-data-wrangler.readthedocs.io/en/2.12.1/install.html#aws-glue-python-shell-jobs)
-- [AWS Glue PySpark Jobs](https://aws-data-wrangler.readthedocs.io/en/2.12.1/install.html#aws-glue-pyspark-jobs)
-- [Amazon SageMaker Notebook](https://aws-data-wrangler.readthedocs.io/en/2.12.1/install.html#amazon-sagemaker-notebook)
-- [Amazon SageMaker Notebook Lifecycle](https://aws-data-wrangler.readthedocs.io/en/2.12.1/install.html#amazon-sagemaker-notebook-lifecycle)
-- [EMR](https://aws-data-wrangler.readthedocs.io/en/2.12.1/install.html#emr)
-- [From source](https://aws-data-wrangler.readthedocs.io/en/2.12.1/install.html#from-source)
+- [**What is AWS Data Wrangler?**](https://aws-data-wrangler.readthedocs.io/en/2.13.0/what.html)
+- [**Install**](https://aws-data-wrangler.readthedocs.io/en/2.13.0/install.html)
+- [PyPi (pip)](https://aws-data-wrangler.readthedocs.io/en/2.13.0/install.html#pypi-pip)
+- [Conda](https://aws-data-wrangler.readthedocs.io/en/2.13.0/install.html#conda)
+- [AWS Lambda Layer](https://aws-data-wrangler.readthedocs.io/en/2.13.0/install.html#aws-lambda-layer)
+- [AWS Glue Python Shell Jobs](https://aws-data-wrangler.readthedocs.io/en/2.13.0/install.html#aws-glue-python-shell-jobs)
+- [AWS Glue PySpark Jobs](https://aws-data-wrangler.readthedocs.io/en/2.13.0/install.html#aws-glue-pyspark-jobs)
+- [Amazon SageMaker Notebook](https://aws-data-wrangler.readthedocs.io/en/2.13.0/install.html#amazon-sagemaker-notebook)
+- [Amazon SageMaker Notebook Lifecycle](https://aws-data-wrangler.readthedocs.io/en/2.13.0/install.html#amazon-sagemaker-notebook-lifecycle)
+- [EMR](https://aws-data-wrangler.readthedocs.io/en/2.13.0/install.html#emr)
+- [From source](https://aws-data-wrangler.readthedocs.io/en/2.13.0/install.html#from-source)
 - [**Tutorials**](https://github.com/awslabs/aws-data-wrangler/tree/main/tutorials)
 - [001 - Introduction](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/001%20-%20Introduction.ipynb)
 - [002 - Sessions](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/002%20-%20Sessions.ipynb)
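For context on the package being bumped, a minimal usage sketch in the spirit of the README's collapsed quick-start block. It is not part of this diff, and the bucket, database, and table names are made up:

```py3
import awswrangler as wr
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})

# Write a DataFrame to S3 as a Glue-cataloged Parquet dataset.
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/dataset/",  # hypothetical bucket
    dataset=True,
    database="my_db",                # hypothetical Glue database
    table="my_table",
)

# Read it back through Athena.
df2 = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")
```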
2 changes: 1 addition & 1 deletion awswrangler/__metadata__.py
@@ -7,5 +7,5 @@
 
 __title__: str = "awswrangler"
 __description__: str = "Pandas on AWS."
-__version__: str = "2.12.1"
+__version__: str = "2.13.0"
 __license__: str = "Apache License 2.0"
16 changes: 8 additions & 8 deletions awswrangler/athena/_read.py
@@ -626,11 +626,11 @@ def read_sql_query(
 
 **Related tutorial:**
 
-- `Amazon Athena <https://aws-data-wrangler.readthedocs.io/en/2.12.1/
+- `Amazon Athena <https://aws-data-wrangler.readthedocs.io/en/2.13.0/
 tutorials/006%20-%20Amazon%20Athena.html>`_
-- `Athena Cache <https://aws-data-wrangler.readthedocs.io/en/2.12.1/
+- `Athena Cache <https://aws-data-wrangler.readthedocs.io/en/2.13.0/
 tutorials/019%20-%20Athena%20Cache.html>`_
-- `Global Configurations <https://aws-data-wrangler.readthedocs.io/en/2.12.1/
+- `Global Configurations <https://aws-data-wrangler.readthedocs.io/en/2.13.0/
 tutorials/021%20-%20Global%20Configurations.html>`_
 
 **There are two approaches to be defined through ctas_approach parameter:**
@@ -678,7 +678,7 @@ def read_sql_query(
 /athena.html#Athena.Client.get_query_execution>`_ .
 
 For a practical example check out the
-`related tutorial <https://aws-data-wrangler.readthedocs.io/en/2.12.1/
+`related tutorial <https://aws-data-wrangler.readthedocs.io/en/2.13.0/
 tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!
 
 
@@ -911,11 +911,11 @@ def read_sql_table(
 
 **Related tutorial:**
 
-- `Amazon Athena <https://aws-data-wrangler.readthedocs.io/en/2.12.1/
+- `Amazon Athena <https://aws-data-wrangler.readthedocs.io/en/2.13.0/
 tutorials/006%20-%20Amazon%20Athena.html>`_
-- `Athena Cache <https://aws-data-wrangler.readthedocs.io/en/2.12.1/
+- `Athena Cache <https://aws-data-wrangler.readthedocs.io/en/2.13.0/
 tutorials/019%20-%20Athena%20Cache.html>`_
-- `Global Configurations <https://aws-data-wrangler.readthedocs.io/en/2.12.1/
+- `Global Configurations <https://aws-data-wrangler.readthedocs.io/en/2.13.0/
 tutorials/021%20-%20Global%20Configurations.html>`_
 
 **There are two approaches to be defined through ctas_approach parameter:**
@@ -960,7 +960,7 @@ def read_sql_table(
 /athena.html#Athena.Client.get_query_execution>`_ .
 
 For a practical example check out the
-`related tutorial <https://aws-data-wrangler.readthedocs.io/en/2.12.1/
+`related tutorial <https://aws-data-wrangler.readthedocs.io/en/2.13.0/
 tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!
 
 
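The read_sql_query/read_sql_table docstrings touched above keep referring to the ctas_approach parameter; a minimal sketch of the two modes (not part of this diff; database and table names are hypothetical):

```py3
import awswrangler as wr

# ctas_approach=True (default): wraps the query in a CTAS and reads the
# resulting Parquet files back, which is faster for large result sets.
df = wr.athena.read_sql_query(
    "SELECT * FROM my_table WHERE year = '2020'",  # hypothetical table
    database="my_db",
    ctas_approach=True,
)

# ctas_approach=False: reads the regular CSV query output instead; slower for
# big results, but avoids CTAS side effects (temporary table, extra S3 data).
df = wr.athena.read_sql_table("my_table", database="my_db", ctas_approach=False)
```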
2 changes: 1 addition & 1 deletion awswrangler/s3/_read_parquet.py
@@ -841,7 +841,7 @@ def read_parquet_table(
 This function MUST return a bool, True to read the partition or False to ignore it.
 Ignored if `dataset=False`.
 E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
-https://aws-data-wrangler.readthedocs.io/en/2.12.1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
+https://aws-data-wrangler.readthedocs.io/en/2.13.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
 columns : List[str], optional
 Names of columns to read from the file(s).
 validate_schema:
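The partition_filter parameter documented above takes a callable over partition key/value pairs; a minimal sketch (not from this diff; names are made up):

```py3
import awswrangler as wr

# The callback receives one dict of partition key/value pairs (all strings)
# and must return a bool; partitions mapped to False are never read.
df = wr.s3.read_parquet_table(
    table="my_table",   # hypothetical Glue table
    database="my_db",
    partition_filter=lambda x: x["year"] == "2020" and x["month"] == "1",
)
```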
6 changes: 3 additions & 3 deletions awswrangler/s3/_read_text.py
@@ -241,7 +241,7 @@ def read_csv(
 This function MUST return a bool, True to read the partition or False to ignore it.
 Ignored if `dataset=False`.
 E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
-https://aws-data-wrangler.readthedocs.io/en/2.12.1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
+https://aws-data-wrangler.readthedocs.io/en/2.13.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
 pandas_kwargs :
 KEYWORD arguments forwarded to pandas.read_csv(). You can NOT pass `pandas_kwargs` explicit, just add valid
 Pandas arguments in the function call and Wrangler will accept it.
@@ -389,7 +389,7 @@ def read_fwf(
 This function MUST return a bool, True to read the partition or False to ignore it.
 Ignored if `dataset=False`.
 E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
-https://aws-data-wrangler.readthedocs.io/en/2.12.1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
+https://aws-data-wrangler.readthedocs.io/en/2.13.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
 pandas_kwargs:
 KEYWORD arguments forwarded to pandas.read_fwf(). You can NOT pass `pandas_kwargs` explicit, just add valid
 Pandas arguments in the function call and Wrangler will accept it.
@@ -541,7 +541,7 @@ def read_json(
 This function MUST return a bool, True to read the partition or False to ignore it.
 Ignored if `dataset=False`.
 E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
-https://aws-data-wrangler.readthedocs.io/en/2.12.1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
+https://aws-data-wrangler.readthedocs.io/en/2.13.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
 pandas_kwargs:
 KEYWORD arguments forwarded to pandas.read_json(). You can NOT pass `pandas_kwargs` explicit, just add valid
 Pandas arguments in the function call and Wrangler will accept it.
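The pandas_kwargs wording above is easy to misread: pandas arguments are passed inline, never as an actual pandas_kwargs dict. A minimal sketch (not part of this diff; the path is made up):

```py3
import awswrangler as wr

# OK: `sep` and `skiprows` are valid pandas.read_csv() keywords and are
# forwarded as-is.
df = wr.s3.read_csv("s3://my-bucket/raw/data.csv", sep="|", skiprows=1)

# Not allowed: passing a literal pandas_kwargs dict raises an exception.
# df = wr.s3.read_csv("s3://my-bucket/raw/data.csv", pandas_kwargs={"sep": "|"})
```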
6 changes: 3 additions & 3 deletions awswrangler/s3/_write_parquet.py
@@ -300,18 +300,18 @@ def to_parquet( # pylint: disable=too-many-arguments,too-many-locals,too-many-b
 concurrent_partitioning: bool
 If True will increase the parallelism level during the partitions writing. It will decrease the
 writing time and increase the memory usage.
-https://aws-data-wrangler.readthedocs.io/en/2.12.1/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
+https://aws-data-wrangler.readthedocs.io/en/2.13.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
 mode: str, optional
 ``append`` (Default), ``overwrite``, ``overwrite_partitions``. Only takes effect if dataset=True.
 For details check the related tutorial:
-https://aws-data-wrangler.readthedocs.io/en/2.12.1/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet
+https://aws-data-wrangler.readthedocs.io/en/2.13.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet
 catalog_versioning : bool
 If True and `mode="overwrite"`, creates an archived version of the table catalog before updating it.
 schema_evolution : bool
 If True allows schema evolution (new or missing columns), otherwise a exception will be raised. True by default.
 (Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
 Related tutorial:
-https://aws-data-wrangler.readthedocs.io/en/2.12.1/tutorials/014%20-%20Schema%20Evolution.html
+https://aws-data-wrangler.readthedocs.io/en/2.13.0/tutorials/014%20-%20Schema%20Evolution.html
 database : str, optional
 Glue/Athena catalog: Database name.
 table : str, optional
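A minimal sketch (not part of this diff) combining the to_parquet options documented above; bucket, database, and table names are made up:

```py3
import awswrangler as wr
import pandas as pd

wr.s3.to_parquet(
    df=pd.DataFrame({"id": [1], "year": ["2020"]}),
    path="s3://my-bucket/dataset/",   # hypothetical bucket
    dataset=True,
    partition_cols=["year"],
    mode="overwrite_partitions",      # append | overwrite | overwrite_partitions
    database="my_db",
    table="my_table",
    schema_evolution=True,            # tolerate new/missing columns (default here)
    concurrent_partitioning=True,     # faster partition writes, more memory
)
```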
12 changes: 6 additions & 6 deletions awswrangler/s3/_write_text.py
@@ -177,18 +177,18 @@ def to_csv( # pylint: disable=too-many-arguments,too-many-locals,too-many-state
 concurrent_partitioning: bool
 If True will increase the parallelism level during the partitions writing. It will decrease the
 writing time and increase the memory usage.
-https://aws-data-wrangler.readthedocs.io/en/2.12.1/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
+https://aws-data-wrangler.readthedocs.io/en/2.13.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
 mode : str, optional
 ``append`` (Default), ``overwrite``, ``overwrite_partitions``. Only takes effect if dataset=True.
 For details check the related tutorial:
-https://aws-data-wrangler.readthedocs.io/en/2.12.1/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet
+https://aws-data-wrangler.readthedocs.io/en/2.13.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet
 catalog_versioning : bool
 If True and `mode="overwrite"`, creates an archived version of the table catalog before updating it.
 schema_evolution : bool
 If True allows schema evolution (new or missing columns), otherwise a exception will be raised.
 (Only considered if dataset=True and mode in ("append", "overwrite_partitions")). False by default.
 Related tutorial:
-https://aws-data-wrangler.readthedocs.io/en/2.12.1/tutorials/014%20-%20Schema%20Evolution.html
+https://aws-data-wrangler.readthedocs.io/en/2.13.0/tutorials/014%20-%20Schema%20Evolution.html
 database : str, optional
 Glue/Athena catalog: Database name.
 table : str, optional
@@ -750,18 +750,18 @@ def to_json( # pylint: disable=too-many-arguments,too-many-locals,too-many-stat
 concurrent_partitioning: bool
 If True will increase the parallelism level during the partitions writing. It will decrease the
 writing time and increase the memory usage.
-https://aws-data-wrangler.readthedocs.io/en/2.12.1/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
+https://aws-data-wrangler.readthedocs.io/en/2.13.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
 mode : str, optional
 ``append`` (Default), ``overwrite``, ``overwrite_partitions``. Only takes effect if dataset=True.
 For details check the related tutorial:
-https://aws-data-wrangler.readthedocs.io/en/2.12.1/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet
+https://aws-data-wrangler.readthedocs.io/en/2.13.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet
 catalog_versioning : bool
 If True and `mode="overwrite"`, creates an archived version of the table catalog before updating it.
 schema_evolution : bool
 If True allows schema evolution (new or missing columns), otherwise a exception will be raised.
 (Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
 Related tutorial:
-https://aws-data-wrangler.readthedocs.io/en/2.12.1/tutorials/014%20-%20Schema%20Evolution.html
+https://aws-data-wrangler.readthedocs.io/en/2.13.0/tutorials/014%20-%20Schema%20Evolution.html
 database : str, optional
 Glue/Athena catalog: Database name.
 table : str, optional
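The same options apply to the text writers, with one difference worth flagging from the docstrings above: schema_evolution defaults to True for to_parquet but False for to_csv. A minimal sketch (not part of this diff; names are made up):

```py3
import awswrangler as wr
import pandas as pd

wr.s3.to_csv(
    df=pd.DataFrame({"id": [1], "year": ["2020"]}),
    path="s3://my-bucket/csv-dataset/",  # hypothetical bucket
    dataset=True,
    mode="append",
    partition_cols=["year"],
    database="my_db",
    table="my_csv_table",
    schema_evolution=False,  # explicit; False is already the default for to_csv
)
```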
6 changes: 3 additions & 3 deletions docs/source/install.rst
@@ -120,7 +120,7 @@ Go to your Glue PySpark job and create a new *Job parameters* key/value:
 
 To install a specific version, set the value for above Job parameter as follows:
 
-* Value: ``cython==0.29.21,pg8000==1.21.0,pyarrow==2,pandas==1.3.0,awswrangler==2.12.1``
+* Value: ``cython==0.29.21,pg8000==1.21.0,pyarrow==2,pandas==1.3.0,awswrangler==2.13.0``
 
 .. note:: Pyarrow 3 is not currently supported in Glue PySpark Jobs, which is why a previous installation of pyarrow 2 is required.
 
@@ -139,7 +139,7 @@ Lambda zipped layers and Python wheels are stored in a publicly accessible S3 bu
 
 * Python wheel: ``awswrangler-<version>-py3-none-any.whl``
 
-For example: ``s3://aws-data-wrangler-public-artifacts/releases/2.12.1/awswrangler-layer-2.12.1-py3.8.zip``
+For example: ``s3://aws-data-wrangler-public-artifacts/releases/2.13.0/awswrangler-layer-2.13.0-py3.8.zip``
 
 Amazon SageMaker Notebook
 -------------------------
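A minimal sketch (not part of this diff) of publishing the layer artifact above with boto3; the layer name and staging bucket are made up, and the zip generally needs to be copied to an S3 bucket in your own region first:

```py3
import boto3

client = boto3.client("lambda", region_name="us-east-1")  # assumed region
response = client.publish_layer_version(
    LayerName="aws-data-wrangler",        # hypothetical layer name
    Content={
        "S3Bucket": "my-staging-bucket",  # your regional copy of the zip
        "S3Key": "awswrangler-layer-2.13.0-py3.8.zip",
    },
    CompatibleRuntimes=["python3.8"],
)
print(response["LayerVersionArn"])
```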
@@ -231,7 +231,7 @@ complement Big Data pipelines.
 sudo pip install pyarrow==2 awswrangler
 
 .. note:: Make sure to freeze the Wrangler version in the bootstrap for productive
-environments (e.g. awswrangler==2.12.1)
+environments (e.g. awswrangler==2.13.0)
 
 .. note:: Pyarrow 3 is not currently supported in the default EMR image, which is why a previous installation of pyarrow 2 is required.
 
2 changes: 1 addition & 1 deletion docs/source/what.rst
@@ -8,4 +8,4 @@ SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL
 
 Built on top of other open-source projects like `Pandas <https://github.com/pandas-dev/pandas>`_, `Apache Arrow <https://github.com/apache/arrow>`_ and `Boto3 <https://github.com/boto/boto3>`_, it offers abstracted functions to execute usual ETL tasks like load/unload data from **Data Lakes**, **Data Warehouses** and **Databases**.
 
-Check our `tutorials <https://github.com/awslabs/aws-data-wrangler/tree/main/tutorials>`_ or the `list of functionalities <https://aws-data-wrangler.readthedocs.io/en/2.12.1/api.html>`_.
+Check our `tutorials <https://github.com/awslabs/aws-data-wrangler/tree/main/tutorials>`_ or the `list of functionalities <https://aws-data-wrangler.readthedocs.io/en/2.13.0/api.html>`_.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "awswrangler"
-version = "2.12.1"
+version = "2.13.0"
 description = "Pandas on AWS."
 authors = ["Igor Tavares"]
 license = "Apache License 2.0"
2 changes: 1 addition & 1 deletion test_infra/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "awswrangler - test infrastructure"
-version = "2.12.1"
+version = "2.13.0"
 description = "CDK test infrastructure for AWS"
 authors = ["Amazon Web Services"]
 license = "Apache License 2.0"
2 changes: 1 addition & 1 deletion tests/test_metadata.py
@@ -2,7 +2,7 @@
 
 
 def test_metadata():
-    assert wr.__version__ == "2.12.1"
+    assert wr.__version__ == "2.13.0"
     assert wr.__title__ == "awswrangler"
     assert wr.__description__ == "Pandas on AWS."
     assert wr.__license__ == "Apache License 2.0"