Upgrade to v0.11.2.3 (#194)
wjsi committed Dec 23, 2022
1 parent 61e34b2 commit d3a82e1
Showing 23 changed files with 332 additions and 79 deletions.
2 changes: 2 additions & 0 deletions docs/source/df-basic.rst
@@ -854,6 +854,8 @@ DataFrame 中相应字段的值决定该行将被写入的分区。例如,当
>>> iris[iris.sepalwidth < 2.5].persist('pyodps_iris4', partition='ds=test', drop_partition=True, create_partition=True)
persist 时,默认会覆盖原有数据。例如,当 persist 到一张分区表,对应分区的数据将会被重写。如果写入一张非分区表,整张表的数据都将被重写。如果你想要追加数据,可以使用参数 ``overwrite=False`` 。
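A minimal sketch of appending instead of overwriting, assuming the target partition already exists (``pyodps_iris4`` comes from the example above):

.. code:: python

   >>> iris[iris.sepalwidth < 2.5].persist('pyodps_iris4', partition='ds=test', overwrite=False)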

写入表时,还可以指定表的生命周期,如下列语句将表的生命周期指定为10天:

.. code:: python
74 changes: 43 additions & 31 deletions docs/source/locale/en/LC_MESSAGES/df-basic.po
@@ -1508,16 +1508,28 @@ msgid ""
msgstr ""

#: ../../source/df-basic.rst:857
msgid ""
"persist 时,默认会覆盖原有数据。例如,当 persist "
"到一张分区表,对应分区的数据将会被重写。如果写入一张非分区表,整张表的数据都将被重写。如果你想要追加数据,可以使用参数 "
"``overwrite=False`` 。"
msgstr ""
"Persisting a DataFrame will overwrite existing data by default. For "
"instance, when persisting into a partitioned table, data in corresponding "
"partitions will be overwritten, while persisting into an unpartitioned "
"table will overwrite all data in it. If you want to append data into "
"existing tables or partitions, you may add ``overwrite=False`` ."

#: ../../source/df-basic.rst:859
msgid "写入表时,还可以指定表的生命周期,如下列语句将表的生命周期指定为10天:"
msgstr ""
"You can also specify the lifecycle of a table when writing to it. The "
"following example sets the lifecycle of a table to 10 days."

#: ../../source/df-basic.rst:859
#: ../../source/df-basic.rst:861
msgid ">>> iris[iris.sepalwidth < 2.5].persist('pyodps_iris5', lifecycle=10)"
msgstr ""

#: ../../source/df-basic.rst:863
#: ../../source/df-basic.rst:865
msgid ""
"如果数据源中没有 ODPS 对象,例如数据源仅为 Pandas,在 persist 时需要手动指定 ODPS 入口对象, "
"或者将需要的入口对象标明为全局对象,如:"
@@ -1526,7 +1538,7 @@ msgstr ""
"data, you need to manually specify the ODPS object or mark the object as "
"global when calling persist. For example:"

#: ../../source/df-basic.rst:866
#: ../../source/df-basic.rst:868
msgid ""
">>> # 假设入口对象为 o\n"
">>> # 指定入口对象\n"
@@ -1542,25 +1554,25 @@ msgstr ""
">>> o.to_global()\n"
">>> df.persist('table_name')"

#: ../../source/df-basic.rst:876
#: ../../source/df-basic.rst:878
msgid "保存执行结果为 Pandas DataFrame"
msgstr "Save results to pandas DataFrame"

#: ../../source/df-basic.rst:878
#: ../../source/df-basic.rst:880
msgid "我们可以使用 ``to_pandas``\\ 方法,如果wrap参数为True,将返回PyODPS DataFrame对象。"
msgstr ""
"You can use the ``to_pandas``\\ method. If wrap is set to True, a PyODPS "
"DataFrame object is returned."

#: ../../source/df-basic.rst:880
#: ../../source/df-basic.rst:882
msgid ""
">>> type(iris[iris.sepalwidth < 2.5].to_pandas())\n"
"pandas.core.frame.DataFrame\n"
">>> type(iris[iris.sepalwidth < 2.5].to_pandas(wrap=True))\n"
"odps.df.core.DataFrame"
msgstr ""

#: ../../source/df-basic.rst:889
#: ../../source/df-basic.rst:891
msgid ""
"``to_pandas`` 返回的 pandas DataFrame 与直接通过 pandas 创建的 DataFrame 没有任何区别, "
"数据的存储和计算均在本地。如果 ``wrap=True``,生成的即便是 PyODPS DataFrame,数据依然在本地。 "
@@ -1574,47 +1586,47 @@ msgstr ""
" of data, or your running enviromnent is quite restrictive, please be "
"cautious when using ``to_pandas``."

#: ../../source/df-basic.rst:894
#: ../../source/df-basic.rst:896
msgid "立即运行设置运行参数"
msgstr "Set runtime parameters"

#: ../../source/df-basic.rst:896
#: ../../source/df-basic.rst:898
msgid ""
"对于立即执行的方法,比如 ``execute``、``persist``、``to_pandas`` 等,可以设置运行时参数(仅对ODPS "
"SQL后端有效 )。"
msgstr ""
"For actions such as `execute``, ``persist``, and ``to_pandas``, you can "
"set runtime parameters. This is only valid for MaxCompute SQL."

#: ../../source/df-basic.rst:898
#: ../../source/df-basic.rst:900
msgid "一种方法是设置全局参数。详细参考 :ref:`SQL设置运行参数 <sql_hints>` 。"
msgstr ""
"You can also set global parameters. For details, see :ref:`SQL - runtime "
"parameters <sql_hints>`."

#: ../../source/df-basic.rst:900
#: ../../source/df-basic.rst:902
msgid "也可以在这些立即执行的方法上,使用 ``hints`` 参数。这样,这些参数只会作用于当前的计算过程。"
msgstr ""
"Additionally, you can use the `hints`` parameter. These parameters are "
"only valid for the current calculation."

#: ../../source/df-basic.rst:903
#: ../../source/df-basic.rst:905
msgid ""
">>> iris[iris.sepallength < "
"5].to_pandas(hints={'odps.sql.mapper.split.size': 16})"
msgstr ""

#: ../../source/df-basic.rst:909
#: ../../source/df-basic.rst:911
msgid "运行时显示详细信息"
msgstr "Display details at runtime"

#: ../../source/df-basic.rst:911
#: ../../source/df-basic.rst:913
msgid "有时,用户需要查看运行时instance的logview时,需要修改全局配置:"
msgstr ""
"You sometimes need to modify the global configuration to view the logview"
" of an instance."

#: ../../source/df-basic.rst:913
#: ../../source/df-basic.rst:915
msgid ""
">>> from odps import options\n"
">>> options.verbose = True\n"
@@ -1636,11 +1648,11 @@ msgid ""
"4 2.9 1.4 0.2 Iris-setosa"
msgstr ""

#: ../../source/df-basic.rst:934
#: ../../source/df-basic.rst:936
msgid "用户可以指定自己的日志记录函数,比如像这样:"
msgstr "You can specify a logging function as follows:"

#: ../../source/df-basic.rst:936
#: ../../source/df-basic.rst:938
msgid ""
">>> my_logs = []\n"
">>> def my_logger(x):\n"
@@ -1663,11 +1675,11 @@ msgid ""
"\\nLIMIT 5', 'logview:', u'http://logview']"
msgstr ""

#: ../../source/df-basic.rst:956
#: ../../source/df-basic.rst:958
msgid "缓存中间Collection计算结果"
msgstr "Cache intermediate results"

#: ../../source/df-basic.rst:958
#: ../../source/df-basic.rst:960
msgid ""
"DataFrame的计算过程中,一些Collection被多处使用,或者用户需要查看中间过程的执行结果, 这时用户可以使用 ``cache``\\"
" 标记某个collection需要被优先计算。"
@@ -1677,13 +1689,13 @@ msgstr ""
"intermediate process, you can use the cache method to mark a collection "
"object so that it is calculated first."

#: ../../source/df-basic.rst:963
#: ../../source/df-basic.rst:965
msgid "值得注意的是,``cache``\\ 延迟执行,调用cache不会触发立即计算。"
msgstr ""
"Note that ``cache``\\ delays execution. Calling this method does not "
"trigger automatic calculation."

#: ../../source/df-basic.rst:965
#: ../../source/df-basic.rst:967
msgid ""
">>> cached = iris[iris.sepalwidth < 3.5].cache()\n"
">>> df = cached['sepallength', 'name'].head(3)\n"
@@ -1712,11 +1724,11 @@ msgstr ""
"1 4.7 Iris-setosa\n"
"2 4.6 Iris-setosa"

#: ../../source/df-basic.rst:984
#: ../../source/df-basic.rst:986
msgid "异步和并行执行"
msgstr "Asynchronous and parallel executions"

#: ../../source/df-basic.rst:986
#: ../../source/df-basic.rst:988
msgid ""
"DataFrame 支持异步操作,对于立即执行的方法,包括 "
"``execute``、``persist``、``head``、``tail``、``to_pandas`` (其他方法不支持), 传入 "
@@ -1732,7 +1744,7 @@ msgstr ""
"<https://docs.python.org/3/library/concurrent.futures.html#future-"
"objects>`_ objects."

#: ../../source/df-basic.rst:990
#: ../../source/df-basic.rst:992
msgid ""
">>> future = iris[iris.sepal_width < 10].head(10, async=True)\n"
">>> future.done()\n"
@@ -1751,7 +1763,7 @@ msgid ""
"9 4.9 3.1 1.5 0.1 Iris-setosa"
msgstr ""

#: ../../source/df-basic.rst:1009
#: ../../source/df-basic.rst:1011
msgid ""
"DataFrame 的并行执行可以使用多线程来并行,单个 expr 的执行可以通过 ``n_parallel`` 参数来指定并发度。 比如,当一个"
" DataFrame 的执行依赖的多个 cache 的 DataFrame 能够并行执行时,该参数就会生效。"
@@ -1762,7 +1774,7 @@ msgstr ""
"the multiple cached DataFrame objects that a single DataFrame execution "
"depends on can be executed in parallel."

#: ../../source/df-basic.rst:1012
#: ../../source/df-basic.rst:1014
msgid ""
">>> expr1 = "
"iris.groupby('category').agg(value=iris.sepal_width.sum()).cache()\n"
@@ -1807,14 +1819,14 @@ msgstr ""
"7 Iris-versicolor 3.000\n"
"8 Iris-virginica 4.500"

#: ../../source/df-basic.rst:1032
#: ../../source/df-basic.rst:1034
msgid "当同时执行多个 expr 时,我们可以用多线程执行,但会面临一个问题, 比如两个 DataFrame 有共同的依赖,这个依赖将会被执行两遍。"
msgstr ""
"You can use multiple threads to execute multiple expr objects in "
"parallel, but you may encounter a problem when two DataFrame objects "
"share the same dependency, and this dependency will be executed twice."

#: ../../source/df-basic.rst:1035
#: ../../source/df-basic.rst:1037
msgid ""
"现在我们提供了新的 ``Delay API``, 来将立即执行的操作(包括 "
"``execute``、``persist``、``head``、``tail``、``to_pandas``,其他方法不支持)变成延迟操作, "
@@ -1829,7 +1841,7 @@ msgstr ""
"dependency is executed based on the degree of parallelism you have "
"specified. Asynchronous execution is supported."

#: ../../source/df-basic.rst:1040
#: ../../source/df-basic.rst:1042
msgid ""
">>> from odps.df import Delay\n"
">>> delay = Delay() # 创建Delay对象\n"
@@ -1865,7 +1877,7 @@ msgstr ""
">>> future2.result()\n"
"3.0540000000000007"

#: ../../source/df-basic.rst:1057
#: ../../source/df-basic.rst:1059
msgid ""
"可以看到上面的例子里,共同依赖的对象会先执行,然后再以并发度为3分别执行future1到future3。 当 ``n_parallel`` "
"为1时,执行时间会达到37s。"
@@ -1875,7 +1887,7 @@ msgstr ""
"parallelism set to 3. When ``n_parallel`` is set to 1, the execution time"
" reaches 37s."

#: ../../source/df-basic.rst:1060
#: ../../source/df-basic.rst:1062
msgid ""
"``delay.execute`` 也接受 ``async`` 操作来指定是否异步执行,当异步的时候,也可以指定 ``timeout`` "
"参数来指定超时时间。"
2 changes: 1 addition & 1 deletion docs/source/options.rst
@@ -53,7 +53,7 @@ PyODPS 提供了一系列的配置选项,可通过 ``odps.options`` 获得,
+------------------------+---------------------------------------------------+-------+
|pool_maxsize | 连接池最大容量 |10 |
+------------------------+---------------------------------------------------+-------+
|connect_timeout | 连接超时 |10 |
|connect_timeout | 连接超时 |120 |
+------------------------+---------------------------------------------------+-------+
|read_timeout | 读取超时 |120 |
+------------------------+---------------------------------------------------+-------+
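A quick, hedged sketch of overriding these defaults at runtime (the values are illustrative):

.. code:: python

   >>> from odps import options
   >>> options.connect_timeout = 60
   >>> options.read_timeout = 300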
2 changes: 1 addition & 1 deletion odps/_version.py
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

version_info = (0, 11, 2, 2)
version_info = (0, 11, 2, 3)
_num_index = max(idx if isinstance(v, int) else 0
                 for idx, v in enumerate(version_info))
__version__ = '.'.join(map(str, version_info[:_num_index + 1])) + \
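For this all-integer ``version_info``, ``_num_index`` evaluates to 3 and the joined prefix is ``'0.11.2.3'``; the continuation after ``+ \`` presumably appends a suffix only when non-integer components such as pre-release tags are present.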
2 changes: 1 addition & 1 deletion odps/config.py
@@ -25,7 +25,7 @@

DEFAULT_CHUNK_SIZE = 1496
DEFAULT_CONNECT_RETRY_TIMES = 4
DEFAULT_CONNECT_TIMEOUT = 10
DEFAULT_CONNECT_TIMEOUT = 120
DEFAULT_READ_TIMEOUT = 120
DEFAULT_POOL_CONNECTIONS = 10
DEFAULT_POOL_MAXSIZE = 10
24 changes: 23 additions & 1 deletion odps/core.py
@@ -22,6 +22,7 @@
from .rest import RestClient
from .config import options
from .errors import NoSuchObject
from .errors import ODPSError
from .tempobj import clean_stored_objects
from .utils import split_quoted
from .compat import six, Iterable
@@ -1873,7 +1874,7 @@ def create_session(self, session_worker_count, session_worker_memory,
def run_sql_interactive(self, sql, hints=None, **kwargs):
"""
Run SQL query in interactive mode (a.k.a MaxCompute QueryAcceleration).
Will not fall back to offline mode automatically if the query is not supported or fails.
:param sql: the sql query.
:param hints: settings for sql query.
:return: instance.
Expand All @@ -1894,6 +1895,27 @@ def run_sql_interactive(self, sql, hints=None, **kwargs):
self._default_session_name = service_name
return self._default_session.run_sql(sql, hints, **kwargs)

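A hedged usage sketch for the interactive call (the table name is a placeholder):

>>> inst = o.run_sql_interactive('select * from my_table')
>>> inst.wait_for_success()
>>> for record in inst.open_reader():
>>>     print(record)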
def run_sql_interactive_with_fallback(self, sql, hints=None, **kwargs):
"""
Run SQL query in interactive mode (a.k.a MaxCompute QueryAcceleration).
If the query is not supported or fails, falls back to offline mode automatically.
:param sql: the sql query.
:param hints: settings for sql query.
:return: instance.
"""
try:
    inst = self.run_sql_interactive(sql, hints=hints, **kwargs)
    inst.wait_for_success(interval=0.2)
    rd = inst.open_reader(tunnel=True, limit=False)
    if not rd:
        raise ODPSError('Get sql result fail')
    return inst
except Exception:
    # interactive mode unavailable or the query failed: fall back to offline SQL
    return self.execute_sql(sql, hints=hints)
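A matching sketch for the fallback variant (again with a placeholder table); by the time it returns, the query has run either through query acceleration or through ordinary offline SQL:

>>> inst = o.run_sql_interactive_with_fallback('select * from my_table')
>>> for record in inst.open_reader():
>>>     print(record)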

@classmethod
def _build_account(cls, access_id, secret_access_key):
return accounts.AliyunAccount(access_id, secret_access_key)
8 changes: 4 additions & 4 deletions odps/df/backends/odpssql/context.py
@@ -155,7 +155,8 @@ def prepare_resources(self, libraries):
tar.close()

res_name = self._gen_resource_name() + '.tar.gz'
res = self._odps.create_resource(res_name, 'archive', file_obj=tarbinary.getvalue())
res = self._odps.create_resource(res_name, 'archive', file_obj=tarbinary.getvalue(), temp=True)
tempobj.register_temp_resource(self._odps, res_name)
self._path_to_resources[lib] = res
self._to_drops.append(res)
ret_libs.append(res)
Expand All @@ -166,7 +167,7 @@ def create_udfs(self, libraries=None):

for func, udf in six.iteritems(self._func_to_udfs):
udf_name = self._registered_funcs[func]
py_resource = self._odps.create_resource(udf_name + '.py', 'py', file_obj=udf)
py_resource = self._odps.create_resource(udf_name + '.py', 'py', file_obj=udf, temp=True)
tempobj.register_temp_resource(self._odps, udf_name + '.py')
self._to_drops.append(py_resource)

@@ -176,8 +177,7 @@ def create_udfs(self, libraries=None):
if not create:
    resources.append(name)
else:
    res = self._odps.create_resource(name, 'table',
                                     table_name=table_name)
    res = self._odps.create_resource(name, 'table', table_name=table_name, temp=True)
    tempobj.register_temp_resource(self._odps, name)
    resources.append(res)
    self._to_drops.append(res)
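The ``temp=True`` flag added throughout this file marks the created objects as temporary resources. A hedged standalone sketch (resource name and payload are placeholders):

>>> res = o.create_resource('tmp_lib.tar.gz', 'archive', file_obj=tar_bytes, temp=True)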
2 changes: 1 addition & 1 deletion odps/df/backends/odpssql/tests/test_engine.py
@@ -3515,7 +3515,7 @@ def testComposites(self):
expr = expr_in[expr_in.name, expr_in.detail.values().explode()]
res = self.engine.execute(expr)
result = self._get_result(res)
self.assertEqual(result, expected)
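# row order of exploded results is not guaranteed by the engine, so compare sorted copies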
self.assertEqual(sorted(result), sorted(expected))

expected = [
['name1', 4.0, 2.0, False, False, ['HKG', 'PEK', 'SHA', 'YTY'],
Expand Down
