Releases: aliyun/aliyun-odps-python-sdk

v0.11.6

17 Apr 03:17
01a8f38

Features

  • Add support for cluster info and views in tables and table DDL output.
  • Add support for easier threaded writing and writing in multiple processes for TableWriter.

Enhancements

  • Use monotonic time to calculate timeout.
  • Add support for http+unix socket connection.
  • Optimize RequestsIO by introducing buffering and simplify threaded sync.
  • Revoke embedded requests and use buffered writer for table API by default.
  • Add cython converter for legacy decimal.
  • Store local configs inside context variables if possible.
  • Add support for zoneinfo of Python standard library since Python 3.9.
  • Show the SQL statement when a ParseError is encountered.
  • (experimental) Add support for new V4 signature. The signature is turned off by default.
  • (experimental) Add support for accessing MaxCompute with AlibabaCloud credentials.
  • (experimental) Upgrade six and setuptools requirements under Python 3.12.
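
The item "Use monotonic time to calculate timeout" can be illustrated with a short standard-library sketch. It is not the SDK's actual code; the helper name wait_until and its parameters are hypothetical. It only shows why a deadline computed from time.monotonic() is not affected by changes to the system clock.

```python
import time


def wait_until(predicate, timeout, interval=0.01):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Hypothetical helper for illustration only; not part of PyODPS.
    """
    # time.monotonic() never goes backwards, so the deadline is unaffected
    # by NTP adjustments or manual changes to the system clock.
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A deadline based on time.time() could expire too early or too late if the wall clock is adjusted while waiting; the monotonic clock avoids that.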

Bugfixes

  • Fix TypeError when calling open_resource to create resources.
  • Fix superset error when reuse_odps=true.
  • Fix errors when decimal digits are not sufficient.
  • Fix support for DataFrame UDFs under Python 3.11.
  • Fix timezone of arrow tunnel to make it consistent with record tunnels.
  • Fix comparison of date sequences in DataFrame.

Tests

  • Add check for malicious tunnel requests.

Documentation

  • Fix default timezone document.
  • Add detailed usage for reading table data under multiprocessing and multiple threads.
  • Add detailed info for low-level tunnel interfaces.

Compatibility issues

  • Timestamp objects obtained with the arrow tunnel now use the local timezone instead of UTC, to keep consistency with record tunnels. Please update your code if it already performs manual timezone conversion.
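
The following standard-library sketch shows why an existing manual conversion needs attention after this change. It assumes, purely for illustration, that values arrive as naive datetimes and that the local zone is a fixed UTC+8; the helper utc_naive_to_local is hypothetical and stands for whatever conversion your own code performed.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed local zone (UTC+8), used only to keep the example deterministic.
LOCAL_TZ = timezone(timedelta(hours=8))


def utc_naive_to_local(value, local_tz=LOCAL_TZ):
    """Old-style manual conversion: treat a naive value as UTC, return naive local time."""
    return value.replace(tzinfo=timezone.utc).astimezone(local_tz).replace(tzinfo=None)


# The same instant as a naive UTC value (old behavior) and a naive local value (new behavior).
old_style_value = datetime(2024, 4, 17, 3, 17)
new_style_value = datetime(2024, 4, 17, 11, 17)

# Converting an old-style value gives the correct local time ...
converted_old = utc_naive_to_local(old_style_value)
# ... but converting an already-local value shifts it a second time.
converted_new = utc_naive_to_local(new_style_value)
```

Under these assumptions, the second conversion is off by the full UTC offset, so the manual step should be removed once values are already in local time.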

v0.11.5.post0

24 Jan 07:54
fa594e8

Bugfix

  1. Fix attribute errors for table preview and storage API.

v0.11.5

05 Jan 10:59
4c12aa6

Features

  • Add support for arrow table preview reader
  • Enhance support for Apache Superset
  • Add support for storage tier on tables and partitions
  • (Experimental) Add support for tunnel upsert
  • (Experimental) Add image argument for DataFrame

Bugfixes

  • Fill partition value for tunnel records
  • Use PERCENTILE_APPROX for doubles under ODPS 2.0
  • Convert all requirement files to UNIX format for pyodps-pack
  • Fix error when reloading volume tunnel session
  • Fix logview setting not working in options
  • Dump the SQL statement when a ParseError is encountered
  • Remove misplaced warnings when pickling user functions
  • Fix errors of to_pandas for InSessionInstance readers
  • Fix position of tablesample clause for sample
  • Fix compatibility for SQLAlchemy 2.0
  • Fix results of value_counts when values are None
  • Remove empty equal mark for url actions
  • Stop copying and caching for DataFrame(pd).persist if possible to reduce memory usage
  • Fix missing quotaName in full lifecycle of tunnel requests
  • Fix starting of Mars notebook and Mars import in some case
  • Delete deflate Content-Encoding header for halo in storage api

Enhancements

  • Supports scanning dependencies for pkg_resources
  • Add PEP517 args for pyodps-pack
  • Persist pandas dataframes in batches
  • Use date in response headers to replace fields in Schemas
  • Add detailed logs for sign server on errors
  • Make option contexts thread-local
  • Adapt to extended types for ODPS arrow format
  • Supports schema API along with SQL implementations
  • Add support for MaxFieldSize passed by server end
  • Add options to allow keeping resources for DataFrame
  • Add support for timestamp_ntz type
  • Refine error message for malfunctioning create instance response
  • Allow adding custom log handlers to support displaying logs in notebook kernels
  • Allow using run_sql to execute merge smallfiles or compact commands
  • Allow specifying transactional table property
  • Unify verbose_log into standard Python logging and dump progress when waiting for instances
  • Return struct values as namedtuples by default and fix DataFrame customized functions on complex types
  • Add retry for BufferedRecordWriter when writing blocks
  • Reuse task utilities to simplify MCQA submission

Documentation

  • Fix pyodps-pack doc on docker requirements
  • Add doc for timezone setting
  • Make bare tunnel docs more explicit
  • Refine documents of instance tunnel limit

Compatibility issues

  • PyODPS now returns struct values as namedtuples for tunnels to keep consistency with UDFs. In most cases your code should still work. If it doesn't, try configuring options.struct_as_dict = True.
  • From v0.11.5, a nullable property is added to columns for transactional tables, and its default value for partition columns is False. If you reuse these column instances elsewhere, for instance as regular columns when creating tables, non-nullable columns could be created, and inserting null values will then result in errors. To ignore nullable flags in columns, try configuring sql.ignore_fields_not_null = True.
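
To see which kinds of code are affected by the switch from dicts to namedtuples, here is a small standard-library sketch. The Point type and its fields are hypothetical; real field names come from the struct type in the table schema, and the sketch assumes the returned values behave like ordinary collections.namedtuple instances.

```python
from collections import namedtuple

# Hypothetical struct type with two fields; real field names come from the table schema.
Point = namedtuple("Point", ["x", "y"])
value = Point(x=1, y=2)

by_attribute = value.x      # attribute access works
by_position = value[1]      # positional access works
as_dict = value._asdict()   # convert when dict-style code must keep working

try:
    value["x"]              # dict-style key access does not work on a namedtuple
    key_access_ok = True
except TypeError:
    key_access_ok = False
```

Code that only iterates over or unpacks struct values is unaffected, while code that indexes by field name needs either _asdict() or the options.struct_as_dict setting mentioned above.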

v0.11.5b2

20 Nov 11:35
4ce1c53
Pre-release

Bugfixes

  • Stop copying and caching for DataFrame(pd).persist if possible to reduce memory usage.

v0.11.5b1

10 Nov 09:51
Pre-release

Features

  • Add support for arrow table preview reader
  • Enhance support for Apache Superset
  • Add support for storage tier on tables and partitions
  • (Experimental) Add support for tunnel upsert

Bugfixes

  • Fill partition value for tunnel records
  • Use PERCENTILE_APPROX for doubles under ODPS 2.0
  • Convert all requirement files to UNIX format for pyodps-pack
  • Fix error when reloading volume tunnel session
  • Fix logview setting not working in options
  • Dump the SQL statement when a ParseError is encountered
  • Remove misplaced warnings when pickling user functions
  • Fix errors of to_pandas for InSessionInstance readers
  • Fix position of tablesample clause for sample
  • Fix compatibility for SQLAlchemy 2.0
  • Fix results of value_counts when values are None
  • Remove empty equal mark for url actions

Enhancements

  • Supports scanning dependencies for pkg_resources
  • Add PEP517 args for pyodps-pack
  • Persist pandas dataframes in batches
  • Use date in response headers to replace fields in Schemas
  • Add detailed logs for sign server on errors
  • Make option contexts thread-local
  • Adapt to extended types for ODPS arrow format
  • Supports schema API along with SQL implementations
  • Add support for MaxFieldSize passed by server end
  • Add options to allow keeping resources for DataFrame
  • Add support for timestamp_ntz type
  • Refine error message for malfunctioning create instance response
  • Allow adding custom log handlers to support displaying logs in notebook kernels
  • Allow using run_sql to execute merge smallfiles or compact commands
  • Allow specifying transactional table property
  • Unify verbose_log into standard Python logging and dump progress when waiting for instances
  • Return struct values as namedtuples by default and fix DataFrame customized functions on complex types

Documentation

  • Fix pyodps-pack doc on docker requirements
  • Add doc for timezone setting
  • Make bare tunnel docs more explicit
  • Refine documents of instance tunnel limit

Compatibility issues

  • PyODPS now returns struct values as namedtuples for tunnels to keep consistency with UDFs. In most cases your code should still work. If it doesn't, try configuring options.struct_as_dict = True.

v0.11.4.1

19 Jul 06:29
c5b897f

Enhancements

  • Reuse UDFs when the code is the same and has no closures
  • Add function to show versions of dependencies
  • Make stream tunnel to write in blocks
  • Add quota_name params for various tunnel sessions
  • Refine MCQA execution API and fallback behavior
  • Supports JSON column type
  • Use TABLESAMPLE clause to implement sampling with frac or rows
  • Allow packing dynamic libraries with pyodps-pack
  • Auto resolve source dependencies in no docker mode in pyodps-pack

Bug fixes

  • Fix jump targets when jump instruction size changes
  • Fix auto-flush for arrow writers

v0.11.4.post0

19 May 02:13
95404d3

Deployment

  • Restrict urllib3 version to 1.x.

v0.11.4

18 May 08:28
0f130fb

Features

  • Add API-by-API implementation for storage API
  • Add retry for table read API
  • Add automatic submission for table write API

Bugfixes

  • Fix OSError caused by BPO-29097 under certain Python versions
  • Show composite error message when failed to parse data type

Enhancements

  • Drop support for Python 2.6
  • Add more options of pip into pyodps-pack
  • Show more information when command not found on pyodps-pack
  • Refine creating ODPS instances from environment variables
  • Use modified requests library to simplify file-like writers
  • Optimize cython implementation of tunnel record IO by introducing more nogil marks
  • Refine error parsing and add tag of endpoint
  • Reduce calls of tenant APIs
  • Add options to read antique datetime as None
  • Add support for minikube in pyodps-pack
  • Support yielding data while writing in arrow tunnels
  • Support to_pandas on slices of readers

Deployment

  • Fix missing directory when installing from source code with Jupyter

Tests

  • Migrate all tests to pytest

Documentation

  • Require jQuery for documentations
  • Add notifications for checking XFlow instances after iter_xflow_subinstances.

Compatibility Issues

  • Support for Python 2.6 is formally dropped as of 0.11.4. Please use 0.11.3.1 or an earlier version.
  • Passing async_ arguments as positional arguments is deprecated. Please pass them as keyword arguments.
  • BufferredRecordWriter is now renamed to BufferedRecordWriter. References to the old class should be switched to the new one.

v0.11.3.1

10 Apr 03:44
b9e6316

Enhancements

  • Add support for non-Docker mode in pyodps-pack. It now supports limited scenarios when Docker is not available.
  • Reduce maximum memory cost of to_pandas() on tunnels by converting to pandas in batches
  • Supports complex types when calling to_pandas() on tunnels
  • Use default schema when odps.namespace.schema enabled on tenants, or options.always_enable_schema set to True
  • Make sure merging small files is available under schemas
  • (Experimental) Supports more functionality of external volumes

Bugfixes

  • Fixes tunnel writing when pd.NA is used

Documentation

  • Multiple documentation fixes

v0.11.3

10 Mar 11:15
6a51282

Features

  • Add new command line tool pyodps-pack to pack third-party libraries, recommended as standard packing mechanism
  • (Experimental) Add preliminary support for custom DataFrame functions with Python 3.8 / 3.9 / 3.10
  • Supports DataFrame column join methods
  • Support configuring instance settings via connection strings with SQLAlchemy
  • (Experimental) Supports external volume
  • Supports run_sql_interactive_with_fallback interface in pyodps
  • Supports get_max_partition for tables
  • Implements stream read and write of resources
  • Supports iterating table partitions with logical conditions

Enhancements

  • Supports quota name when creating tunnels
  • Allow apply UDFs to use __getitem__
  • Reset table object when alter table is called
  • Allow persisting DataFrames with table objects
  • Support compression with LZ4 and zstd for tunnels
  • Remove dependencies of deprecated distutils package
  • Rename last_modified_time with last_data_modified_time for clarity
  • Refine error display and arrow format when fetching data with multiprocessing

Bug fixes

  • Fix downloading with multiprocessing under Windows
  • Fix negative timestamp issue under Windows
  • Fix multiple issues on GitHub
  • Resolve coded json when parsing
  • Add retry when encountering MetaTransactionFailed
  • Fix parsing RFC822 date when loading ODPS meta
  • Fix schema error when calling to_pandas() with columns specified
  • Fix handling NaNs with fillna
  • Fix compiling SQL scalars with different integer types

Compatibility Issues

  • As a new Schema object level is introduced, using schema to refer to table schemas is now discouraged and will produce warnings. Try using table_schema instead when you code with the new version.
  • Attributes like creation_time now use local time instead of UTC time by default, for consistency with other datetime attributes. Switch to the old behavior by setting options.use_legacy_parsedate = True.
  • Attribute last_modified_time on tables and partitions is now renamed to last_data_modified_time for clarity. Warnings might be produced with old attribute names.
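
The sketch below shows one common way such a rename is kept backward compatible, and how recorded warnings can help locate leftover uses of an old name. The TableInfo class is a hypothetical stand-in and not PyODPS's implementation; the warning category and message are assumptions as well.

```python
import warnings


class TableInfo:
    """Hypothetical stand-in for an object whose attribute was renamed."""

    def __init__(self, last_data_modified_time):
        self.last_data_modified_time = last_data_modified_time

    @property
    def last_modified_time(self):
        # Old name kept as an alias that warns and forwards to the new name.
        warnings.warn(
            "last_modified_time is deprecated; use last_data_modified_time",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.last_data_modified_time


info = TableInfo(last_data_modified_time="2023-03-10 11:15:00")

# Record warnings so that leftover uses of the old name can be found.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_value = info.last_modified_time

new_value = info.last_data_modified_time
```

Running a test suite with warnings turned into errors (for example via warnings.simplefilter("error")) is another way to surface remaining uses of old names before upgrading.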