PullRequest: 58 merge github 2524
Merge branch merge_github_2524 of git@gitlab.alipay-inc.com:ray-project/mars.git into master
https://code.alipay.com/ray-project/mars/pull_requests/58?tab=diff

Signed-off-by: 捕牛 <hejialing.hjl@antgroup.com>


* [Ray] Support reconstructing worker (mars-project#2413)


* Make cmdline support third party modules (mars-project#2454)

Co-authored-by: hanguang <zhusiyuan.zsy@alibaba-inc.com>
* Support visualizing subtask graphs on Mars Web (mars-project#2426)


* Fix timeout error when waiting for a submitted task (mars-project#2457)


* Print the error message when error happens in `TaskProcessor` (mars-project#2458)


* Add nightly builds for docker images (mars-project#2456)


* Fix misuse of `name` parameter in DataFrame align (mars-project#2469)


* Fix hang when start sub pool fails (mars-project#2468)


* Refine and unify subtask detail APIs (mars-project#2465)


* Fix coverage for Azure pipeline (mars-project#2474)


* Split tileable information and subtask graph into two tabs (mars-project#2480)


* Support specified vineyard socket and skip the launching vineyardd process (mars-project#2481)


* Basic reschedule subtask (mars-project#2467)


* Compatible with scikit-learn 1.0 (mars-project#2486)

Co-authored-by: hekaisheng <kaisheng.hks@alibaba-inc.com>
* Fix wrong translation in cluster deployment. (mars-project#2489)


* Fix bug that failed to execute query when there are multiple arguments (mars-project#2490)


* Include tileable property in detail api (mars-project#2493)


* Fix version of statsmodels to pass CI (mars-project#2497)


* Implements `glm.LogisticRegression` (mars-project#2466)


* Implements bagging sampling (mars-project#2496)


* Refine MarsDMatrix & support more parameters for XGB classifier and regressor (mars-project#2498)


* Fix output of df.groupby(as_index=False).size() (mars-project#2507)


* Add preliminary implementations for ufunc methods (mars-project#2510)


* Add doc for reading csv in oss (mars-project#2514)


* [Ray] Fix serializing lambdas in web (mars-project#2512)


* Add `make_regression` support for learn module (mars-project#2515)


* Fix reduction result on empty series (mars-project#2520)


* Fix df.loc when df is empty (mars-project#2524)


* fix start subpool

* fix test_kill_and_wait_timeout

* fix autoscale timeout

* fix ray larger cluster fixture

* Update ci ray package to 1.2.2

* remove python 3.6, 3.8, 3.9 ut and upgrade ray 3.7 image

* echo python path

* fix json decode error

* fix bundle release timeout

* fix remove placement group timeout

* fix no_restart

* fix ci

* fix autoscale
chaokunyang committed Oct 27, 2021
1 parent 4522b5f commit 7bf62f7
Showing 133 changed files with 4,288 additions and 970 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/core-ci.yml
Original file line number Diff line number Diff line change
@@ -39,7 +39,7 @@ jobs:
source ./ci/reload-env.sh
export DEFAULT_VENV=$VIRTUAL_ENV
if [[ ! "$PYTHON" =~ "3.9" ]]; then
if [[ ! "$PYTHON" =~ "3.6" ]]; then
conda install -n test --quiet --yes -c conda-forge python=$PYTHON numba
fi
13 changes: 11 additions & 2 deletions .github/workflows/docker-cd.yml
@@ -2,6 +2,10 @@ name: Mars CD for DockerHub

on:
push:
branches:
- master
- main
- 'v*'
tags:
- '*'

@@ -27,5 +31,10 @@ jobs:
if [[ "$DOCKER_ORG" == "marsuploader" ]]; then
export DOCKER_ORG="marsproject"
fi
bash bin/kube-image-tool.sh -o "$DOCKER_ORG" -t "$GIT_TAG" build
docker push "$DOCKER_ORG/mars:$GIT_TAG"
if [[ -n "$GIT_TAG" ]]; then
export IMAGE_TAG="$GIT_TAG"
else
export IMAGE_TAG="nightly-$GIT_BRANCH"
fi
bash bin/kube-image-tool.sh -o "$DOCKER_ORG" -t "$IMAGE_TAG" build
docker push "$DOCKER_ORG/mars:$IMAGE_TAG"
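The docker-cd.yml change above stops failing on branch pushes: a release push carries a git tag, while a plain branch push now falls back to a `nightly-<branch>` image tag. In isolation the selection logic behaves like this sketch (the function name is assumed for illustration, not part of the workflow):

```shell
# Mirrors the if/else added to docker-cd.yml: prefer the release tag,
# otherwise derive a nightly tag from the branch name.
pick_image_tag() {
  # $1: GIT_TAG (may be empty), $2: GIT_BRANCH
  if [ -n "$1" ]; then
    echo "$1"
  else
    echo "nightly-$2"
  fi
}

pick_image_tag "v0.8.0" "master"   # release build -> v0.8.0
pick_image_tag "" "master"         # branch push   -> nightly-master
```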
2 changes: 1 addition & 1 deletion .github/workflows/install-hadoop.sh
@@ -9,7 +9,7 @@ sudo npm uninstall -g yarn || true

sudo apt-get install -yq ssh rsync

VERSION=3.3.0
VERSION=3.3.1
HADOOP_URL="https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=hadoop/common/hadoop-$VERSION/hadoop-$VERSION.tar.gz"

# download hadoop
31 changes: 4 additions & 27 deletions .upload.aci.yml
@@ -29,7 +29,8 @@ upload_py36_wheel_to_oss:
inputs:
image: ${ACI_VAR_MARS_IMAGES_FOR_RELEASE_PY36}
command:
- bash ci/antbuild.sh
- |
python setup.py bdist_wheel
pluginConfig:
appname: mars
source: ray-project
@@ -79,7 +80,8 @@ upload_py38_wheel_to_oss:
inputs:
image: ${ACI_VAR_MARS_IMAGES_FOR_RELEASE_PY38}
command:
- bash ci/antbuild.sh
- |
python setup.py bdist_wheel
pluginConfig:
appname: mars
source: ray-project
@@ -91,28 +93,3 @@ upload_py38_wheel_to_oss:
only:
change:
- ^mars/*

upload_py39_wheel_to_oss:
stage: 上传wheel包
aciTags: DOCKER
agent:
resourceClass: L
steps:
- plugin: clone
- plugin: shell
defaultWebTerminal: true
inputs:
image: ${ACI_VAR_MARS_IMAGES_FOR_RELEASE_PY39}
command:
- bash ci/antbuild.sh
pluginConfig:
appname: mars
source: ray-project
type: UT
publisher:
archiveArtifacts:
artifacts: '**/dist/pymars-*-cp39-cp39-linux_x86_64.whl'
allowEmptyArchive: false
only:
change:
- ^mars/*
2 changes: 2 additions & 0 deletions LICENSE
@@ -265,6 +265,8 @@ MIT License
- material-ui:4.0.0
- lodash:4.17.0
- dagre-d3:0.6.4
- cytoscape:3.19.1
- cytoscape-dagre:2.3.2


ISC License
7 changes: 6 additions & 1 deletion azure-pipelines.yml
@@ -82,8 +82,9 @@ jobs:
conda install -n test --quiet --yes -c pkgs/main python=$PYTHON certifi
if [[ "$(mars.test.module)" == "learn" ]]; then
# remove version limit when blue-yonder/tsfresh#897 is fixed.
pip install xgboost lightgbm tensorflow faiss-cpu torch torchvision \
statsmodels tsfresh
statsmodels\<0.13.0 tsfresh
fi
fi
conda list -n test
@@ -98,6 +99,10 @@ jobs:
displayName: 'Run tests'
- bash: |
GIT_ORIGIN="$(git config --get remote.origin.url)"
if [[ "$GIT_ORIGIN" == *"mars-project/mars"* ]]; then
export CODECOV_TOKEN="f578555e-b4f4-49a8-8d33-b21a305299ee"
fi
bash <(curl -s https://codecov.io/bash)
displayName: 'Upload coverage'
16 changes: 3 additions & 13 deletions ci/antbuild.sh
@@ -54,24 +54,14 @@ $OSS_HTTP_PREFIX/$(ls pyodps*)
EOF

echo "Upload files..."
FILES=(dist/"$(ls dist)" dsmp/dist/"$(ls dsmp/dist)" "$(ls pyodp*)" package.txt requirements.txt
client/mars_driver.py client/odps_op_with_dsmp.py)
FILES=(dist/"$(ls dist)" dsmp/dist/"$(ls dsmp/dist)" "$(ls pyodp*)" package.txt requirements.tx client/mars_driver.py client/odps_op_with_dsmp.py)
for f in "${FILES[@]}"; do
set +x

FILE_NAME=$(basename $f)
if [[ $FILE_NAME == pymars-* ]]; then
IFS='-' read -ra wheel_name_array <<< "$FILE_NAME"
# If in release branch, wheel_name_array[1] == $VERSION
wheel_name_array[1]=$VERSION
WHEEL_NAME=$(IFS="-" ; echo "${wheel_name_array[*]}")
else
WHEEL_NAME=$FILE_NAME
fi
echo "Upload $WHEEL_NAME to oss"

echo "Upload $FILE_NAME to oss"
osscmd put --headers="x-oss-forbid-overwrite:$FORBID_OVERWRITE" "$WORKSPACE_DIR/$f" \
oss://rayoltest/"$PKG_PATH/$WHEEL_NAME" \
oss://rayoltest/"$PKG_PATH/$FILE_NAME" \
config --id=LTAI4GKK6Chbk8gNULPpYV4j --key="${ACI_VAR_MARS_PACK_OSS_KEY}" \
--host=oss-cn-hangzhou-zmf.aliyuncs.com;
set -x
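The antbuild.sh hunk deletes the step that rewrote a wheel's version field before upload (split the file name on `-`, overwrite the second field with `$VERSION`, rejoin). For reference, the removed behavior is equivalent to this POSIX one-liner (the `rename_wheel` helper is an assumed name, not part of the script):

```shell
# Equivalent of the removed renaming step: rewrite the version field
# (second '-'-separated field) of a wheel file name.
rename_wheel() {
  # $1: original wheel file name, $2: target version
  echo "$1" | sed "s/^\([^-]*\)-[^-]*/\1-$2/"
}

rename_wheel "pymars-0.7.1-cp38-cp38-linux_x86_64.whl" "0.8.0"
# -> pymars-0.8.0-cp38-cp38-linux_x86_64.whl
```

After this commit, files are uploaded under their original names instead.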
4 changes: 4 additions & 0 deletions ci/reload-env.sh
@@ -7,6 +7,8 @@ if [[ "$GITHUB_REF" =~ ^"refs/tags/" ]]; then
export GITHUB_TAG_REF="$GITHUB_REF"
unset CYTHON_TRACE
export GIT_TAG=$(echo "$GITHUB_REF" | sed -e "s/refs\/tags\///g")
else
export GIT_BRANCH=$(echo "$GITHUB_REF" | sed -e "s/refs\/heads\///g")
fi

if [[ $UNAME == "mingw"* ]] || [[ $UNAME == "msys"* ]]; then
@@ -91,3 +93,5 @@ shopt -s expand_aliases
# TODO(buniu): ant internal ray hava some bug.
# Enable gcs when bug fix.
export RAY_GCS_TASK_SCHEDULING_ENABLED=true

echo "python path: $(which python)"
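The reload-env.sh change exports `GIT_BRANCH` for non-tag builds, complementing the existing `GIT_TAG` export. `GITHUB_REF` is `refs/tags/<tag>` for tag pushes and `refs/heads/<branch>` for branch pushes, so the two sed substitutions behave as below (helper names are assumed for illustration):

```shell
# Strip the GitHub Actions ref prefixes, as reload-env.sh does.
ref_to_tag()    { echo "$1" | sed -e "s/refs\/tags\///g"; }
ref_to_branch() { echo "$1" | sed -e "s/refs\/heads\///g"; }

ref_to_tag "refs/tags/v0.8.0"      # -> v0.8.0
ref_to_branch "refs/heads/master"  # -> master
```

The docker-cd.yml workflow in this commit relies on exactly this pair: `GIT_TAG` when set, otherwise `nightly-$GIT_BRANCH`.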
16 changes: 16 additions & 0 deletions docs/source/getting_started/dataframe.rst
@@ -143,3 +143,19 @@ DataFrame can be saved to CSV etc.
Index: []
Refer to :ref:`api.dataframe.io` for more information.

DataFrame can be read from CSV which is saved in `OSS <https://help.aliyun.com/document_detail/32027.html>`_.

.. code-block:: python
>>> import mars.dataframe as md
>>> from mars.lib.filesystem.oss import build_oss_path
>>> access_key_id = your_oss_id
>>> access_key_secret = your_oss_key
>>> end_point = oss_endpoint
>>> file_path = 'oss://buckey/path/to/data.csv'
>>> auth_path = build_oss_path(file_path, access_key_id, access_key_secret, end_point)
>>> df = md.read_csv(auth_path).execute()
You can find AccessKey by `AcessKey Documentation <https://help.aliyun.com/document_detail/53045.htm?spm=a2c4g.11186623.0.0.401f4b78nWS2uH#task968>`_
and get endpoint by `Comparison table of OSS Region and Endpoint <https://help.aliyun.com/document_detail/31837.htm?spm=a2c4g.11186623.0.0.cd7c7041aR2CL9>`_.
4 changes: 2 additions & 2 deletions docs/source/installation/kubernetes.rst
@@ -104,10 +104,10 @@ Arguments for workers:
+====================+================================================================+
| worker_num | Number of workers in the cluster, 1 by default |
+--------------------+----------------------------------------------------------------+
| worker_cpu | Number of CPUs for every worker |
| worker_cpu | Number of CPUs for every worker, required. |
+--------------------+----------------------------------------------------------------+
| worker_mem | Memory size for workers in the cluster, in bytes or size units |
| | like ``1g`` |
| | like ``1g``, required. |
+--------------------+----------------------------------------------------------------+
| worker_spill_paths | List of spill paths for worker pods on hosts |
+--------------------+----------------------------------------------------------------+
4 changes: 2 additions & 2 deletions docs/source/installation/yarn.rst
@@ -122,10 +122,10 @@ Arguments for workers:
+====================+================================================================+
| worker_num | Number of workers in the cluster, 1 by default |
+--------------------+----------------------------------------------------------------+
| worker_cpu | Number of CPUs for every worker |
| worker_cpu | Number of CPUs for every worker, required. |
+--------------------+----------------------------------------------------------------+
| worker_mem | Memory size for workers in the cluster, in bytes or size units |
| | like ``1g`` |
| | like ``1g``, required. |
+--------------------+----------------------------------------------------------------+
| worker_spill_paths | List of spill paths for worker pods on hosts |
+--------------------+----------------------------------------------------------------+
11 changes: 11 additions & 0 deletions docs/source/locale/zh_CN/LC_MESSAGES/getting_started/dataframe.po
@@ -103,3 +103,14 @@ msgstr "DataFrame 能被保存到 CSV 文件。"
msgid "Refer to :ref:`api.dataframe.io` for more information."
msgstr "参考 :ref:`api.dataframe.io` 获取更多信息。"

#: ../../source/getting_started/dataframe.rst:147
msgid "DataFrame can be read from CSV which is saved in `OSS <https://help.aliyun.com/document_detail/32027.html>`_."
msgstr "DataFrame 可以从被保存在 OSS 上的 CSV 文件中读取。"

#: ../../source/getting_started/dataframe.rst:160
msgid "You can find AccessKey by `AcessKey Documentation <https://help.aliyun.com/document_detail/53045.htm?spm=a2c4g.11186623.0.0.401f4b78nWS2uH#task968>`_"
msgstr "你可以通过 `AcessKey 文档 <https://help.aliyun.com/document_detail/32027.html>`_ 描述的方法获取你的 AccessKey"

#: ../../source/getting_started/dataframe.rst:161
msgid "and get endpoint by `Comparison table of OSS Region and Endpoint <https://help.aliyun.com/document_detail/31837.htm?spm=a2c4g.11186623.0.0.cd7c7041aR2CL9>`_."
msgstr "并且,你可以通过 `公共云下 OSS Region 和 Endpoint 对照表 <https://help.aliyun.com/document_detail/31837.htm?spm=a2c4g.11186623.0.0.cd7c7041aR2CL9>`_ 获得对应 bucket 的 endpoint 。"
4 changes: 2 additions & 2 deletions docs/source/locale/zh_CN/LC_MESSAGES/installation/deploy.po
@@ -217,11 +217,11 @@ msgstr ""

#: ../../source/installation/deploy.rst:112
msgid "On Supervisor 1 (192.168.1.10):"
msgstr "在 Worker 1(192.168.1.20)上"
msgstr "在 Supervisor 1(192.168.1.10)上"

#: ../../source/installation/deploy.rst:118
msgid "On Supervisor 2 (192.168.1.11):"
msgstr "在 Worker 2(192.168.1.21)上"
msgstr "在 Supervisor 2(192.168.1.11)上"

#: ../../source/installation/deploy.rst:124
msgid "On Worker 1 (192.168.1.20):"
14 changes: 9 additions & 5 deletions docs/source/locale/zh_CN/LC_MESSAGES/installation/kubernetes.po
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: mars 0.5.0a2\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2021-08-04 17:13+0800\n"
"POT-Creation-Date: 2021-09-16 16:05+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
@@ -169,16 +169,20 @@ msgid "worker_cpu"
msgstr ""

#: ../../source/installation/kubernetes.rst:107
msgid "Number of CPUs for every worker"
msgstr "每个 Worker 的 CPU 数目"
msgid "Number of CPUs for every worker, required."
msgstr "每个 Worker 的 CPU 数目,此参数为必需"

#: ../../source/installation/kubernetes.rst:109
msgid "worker_mem"
msgstr ""

#: ../../source/installation/kubernetes.rst:109
msgid "Memory size for workers in the cluster, in bytes or size units like ``1g``"
msgstr "每个 Worker 的内存大小,可使用字节数或带单位的大小,例如 ``1g``"
msgid ""
"Memory size for workers in the cluster, in bytes or size units like "
"``1g``, required."
msgstr ""
"每个 Worker 的内存大小,可使用字节数或带单位的大小,例如 ``1g``,此参数为"
"必需"

#: ../../source/installation/kubernetes.rst:112
msgid "worker_spill_paths"
14 changes: 9 additions & 5 deletions docs/source/locale/zh_CN/LC_MESSAGES/installation/yarn.po
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: mars 0.7.0a2\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2021-08-04 17:36+0800\n"
"POT-Creation-Date: 2021-09-16 16:05+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
@@ -185,16 +185,20 @@ msgid "worker_cpu"
msgstr ""

#: ../../source/installation/yarn.rst:125
msgid "Number of CPUs for every worker"
msgstr "每个 Worker 的 CPU 数目"
msgid "Number of CPUs for every worker, required."
msgstr "每个 Worker 的 CPU 数目,此参数为必需"

#: ../../source/installation/yarn.rst:127
msgid "worker_mem"
msgstr ""

#: ../../source/installation/yarn.rst:127
msgid "Memory size for workers in the cluster, in bytes or size units like ``1g``"
msgstr "每个 Worker 的内存大小,可使用字节数或带单位的大小,例如 ``1g``"
msgid ""
"Memory size for workers in the cluster, in bytes or size units like "
"``1g``, required."
msgstr ""
"每个 Worker 的内存大小,可使用字节数或带单位的大小,例如 ``1g``,此参数为"
"必需"

#: ../../source/installation/yarn.rst:130
msgid "worker_spill_paths"
1 change: 1 addition & 0 deletions docs/source/reference/learn/reference.rst
@@ -58,6 +58,7 @@ Samples generator
datasets.make_blobs
datasets.make_classification
datasets.make_low_rank_matrix
datasets.make_regression

.. _decomposition_ref:

2 changes: 1 addition & 1 deletion mars/_version.py
@@ -16,7 +16,7 @@
import os
from typing import NamedTuple, Optional

__version__ = "0.7.1"
__version__ = "0.8.0"


def _get_cmd_results(pkg_root, cmd):
5 changes: 3 additions & 2 deletions mars/conftest.py
@@ -55,9 +55,10 @@ def ray_large_cluster(request): # pragma: no cover
cluster = Cluster()
remote_nodes = []
for i in range(num_nodes):
remote_nodes.append(cluster.add_node(num_cpus=num_cpus))
remote_nodes.append(cluster.add_node(num_cpus=num_cpus, memory=32 * 1024 ** 3))
if len(remote_nodes) == 1:
ray.init(address=cluster.address)
ray.init(address=cluster.address,
job_config=ray.job_config.JobConfig(total_memory_mb=num_nodes * 32 * 1024 ** 2))
register_ray_serializers()
try:
yield
2 changes: 2 additions & 0 deletions mars/core/entity/tileables.py
@@ -151,6 +151,8 @@ def __getitem__(self, item):
The length of indices must be the same with `chunk_shape` of tileable.
"""
if isinstance(item, int):
item = (item,)
if isinstance(item, tuple):
if len(item) == 0 and self._tileable.is_scalar():
return self._tileable.chunks[0]
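The two added lines in tileables.py normalize a bare integer index into a 1-tuple before the existing tuple-handling path runs, so an int and a 1-tuple index take the same code path. Outside the Mars class, the pattern reduces to this minimal sketch (the function name is assumed for illustration):

```python
def normalize_chunk_index(item):
    """Coerce a bare int index to a 1-tuple so only tuples reach the
    lookup logic, mirroring the lines added in __getitem__ above."""
    if isinstance(item, int):
        item = (item,)
    if isinstance(item, tuple):
        return item
    raise TypeError(f"unsupported index type: {type(item).__name__}")

normalize_chunk_index(2)       # -> (2,)
normalize_chunk_index((0, 1))  # -> (0, 1)
```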
2 changes: 1 addition & 1 deletion mars/core/operand/base.py
@@ -192,7 +192,7 @@ def inputs(self, vals):
self._set_inputs(vals)

@property
def output_limit(self):
def output_limit(self) -> int:
return 1

@property
4 changes: 2 additions & 2 deletions mars/dataframe/align.py
@@ -154,7 +154,7 @@ def build_map_chunk_kw(self, inputs, **kw):
if kw.get('dtype', None) is None and getattr(inputs[0], 'dtype', None) is not None:
kw['dtype'] = inputs[0].dtype
if kw.get('name', None) is None and getattr(inputs[0], 'name', None) is not None:
kw['name'] = inputs[0].dtype
kw['name'] = inputs[0].name
return kw

def build_reduce_chunk_kw(self, inputs, index, **kw):
@@ -192,7 +192,7 @@ def build_reduce_chunk_kw(self, inputs, index, **kw):
if kw.get('dtype', None) is None and getattr(inputs[0].inputs[0], 'dtype', None) is not None:
kw['dtype'] = inputs[0].inputs[0].dtype
if kw.get('name', None) is None and getattr(inputs[0].inputs[0], 'name', None) is not None:
kw['name'] = inputs[0].inputs[0].dtype
kw['name'] = inputs[0].inputs[0].name
return kw

@classmethod
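Both align.py fixes correct the same copy-paste bug: when filling a missing `name` keyword the code assigned `inputs[0].dtype` instead of `inputs[0].name`, corrupting series names during alignment. The intended propagation pattern, sketched outside the operand class (the helper and the namedtuple stand-in are assumptions for illustration):

```python
from collections import namedtuple

def fill_missing_kw(first_input, **kw):
    # Propagate dtype and name from the first input only when the caller
    # did not supply them; the fixed bug wrote dtype into the name slot.
    if kw.get('dtype') is None and getattr(first_input, 'dtype', None) is not None:
        kw['dtype'] = first_input.dtype
    if kw.get('name') is None and getattr(first_input, 'name', None) is not None:
        kw['name'] = first_input.name
    return kw

# Minimal stand-in for a series-like input with dtype and name attributes.
Series = namedtuple('Series', ['dtype', 'name'])
fill_missing_kw(Series('int64', 'a'))  # -> {'dtype': 'int64', 'name': 'a'}
```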
3 changes: 1 addition & 2 deletions mars/dataframe/base/eval.py
@@ -30,7 +30,6 @@
from ... import opcodes
from ...core import ENTITY_TYPE, OutputType, get_output_types, recursive_tile
from ...serialization.serializables import BoolField, DictField, StringField
from ..core import DATAFRAME_TYPE
from ..operands import DataFrameOperand, DataFrameOperandMixin
from ..utils import parse_index

Expand Down Expand Up @@ -123,7 +122,7 @@ def get_named_object(self, obj_name):
raise KeyError(f'name {obj_name} is not defined')

def visit(self, node):
if isinstance(node, DATAFRAME_TYPE):
if isinstance(node, ENTITY_TYPE):
return node
node_name = node.__class__.__name__
method = 'visit_' + node_name
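The eval.py hunk ends inside a `visit_`-dispatch visitor: the method name is built from the node's class name, and the change broadens the early-return check from `DATAFRAME_TYPE` to any `ENTITY_TYPE`. The dispatch idiom itself can be sketched independently (class names below are illustrative, not Mars APIs):

```python
class NodeVisitor:
    """Dispatch on the node's class name, like the Mars eval visitor."""

    def visit(self, node):
        # Build 'visit_<ClassName>' and fall back to generic_visit.
        method = 'visit_' + node.__class__.__name__
        visitor = getattr(self, method, self.generic_visit)
        return visitor(node)

    def generic_visit(self, node):
        raise NotImplementedError(f'no visitor for {node.__class__.__name__}')

class IntDoubler(NodeVisitor):
    def visit_int(self, node):
        return node * 2

IntDoubler().visit(21)  # -> 42
```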