PullRequest: 58 merge github 2524
Merge branch merge_github_2524 of git@gitlab.alipay-inc.com:ray-project/mars.git into master
https://code.alipay.com/ray-project/mars/pull_requests/58?tab=diff

Signed-off-by: 捕牛 <hejialing.hjl@antgroup.com>


* [Ray] Support reconstructing worker (mars-project#2413)


* Make cmdline support third party modules (mars-project#2454)

Co-authored-by: hanguang <zhusiyuan.zsy@alibaba-inc.com>
* Support visualizing subtask graphs on Mars Web (mars-project#2426)


* Fix timeout error when waiting for a submitted task (mars-project#2457)


* Print the error message when error happens in `TaskProcessor` (mars-project#2458)


* Add nightly builds for docker images (mars-project#2456)


* Fix misuse of `name` parameter in DataFrame align (mars-project#2469)


* Fix hang when start sub pool fails (mars-project#2468)


* Refine and unify subtask detail APIs (mars-project#2465)


* Fix coverage for Azure pipeline (mars-project#2474)


* Split tileable information and subtask graph into two tabs (mars-project#2480)


* Support specified vineyard socket and skip the launching vineyardd process (mars-project#2481)


* Basic reschedule subtask (mars-project#2467)


* Compatible with scikit-learn 1.0 (mars-project#2486)

Co-authored-by: hekaisheng <kaisheng.hks@alibaba-inc.com>
* Fix wrong translation in cluster deployment. (mars-project#2489)


* Fix bug that failed to execute query when there are multiple arguments (mars-project#2490)


* Include tileable property in detail api (mars-project#2493)


* Fix version of statsmodels to pass CI (mars-project#2497)


* Implements `glm.LogisticRegression` (mars-project#2466)


* Implements bagging sampling (mars-project#2496)


* Refine MarsDMatrix & support more parameters for XGB classifier and regressor (mars-project#2498)


* Fix output of df.groupby(as_index=False).size() (mars-project#2507)


* Add preliminary implementations for ufunc methods (mars-project#2510)


* Add doc for reading csv in oss (mars-project#2514)


* [Ray] Fix serializing lambdas in web (mars-project#2512)


* Add `make_regression` support for learn module (mars-project#2515)


* Fix reduction result on empty series (mars-project#2520)


* Fix df.loc when df is empty (mars-project#2524)


* fix start subpool

* fix test_kill_and_wait_timeout

* fix autoscale timeout

* fix ray larger cluster fixture

* Update ci ray package to 1.2.2

* remove python 3.6, 3.8, 3.9 ut and upgrade ray 3.7 image

* echo python path

* fix json decode error

* fix bundle release timeout

* fix remove placement group timeout

* fix no_restart

* fix ci

* fix autoscale
chaokunyang committed Oct 27, 2021
1 parent 4522b5f commit 7bf62f7
Showing 133 changed files with 4,288 additions and 970 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/core-ci.yml
Original file line number Diff line number Diff line change
@@ -39,7 +39,7 @@ jobs:
source ./ci/reload-env.sh
export DEFAULT_VENV=$VIRTUAL_ENV
if [[ ! "$PYTHON" =~ "3.9" ]]; then
if [[ ! "$PYTHON" =~ "3.6" ]]; then
conda install -n test --quiet --yes -c conda-forge python=$PYTHON numba
fi
13 changes: 11 additions & 2 deletions .github/workflows/docker-cd.yml
@@ -2,6 +2,10 @@ name: Mars CD for DockerHub

on:
push:
branches:
- master
- main
- 'v*'
tags:
- '*'

@@ -27,5 +31,10 @@ jobs:
if [[ "$DOCKER_ORG" == "marsuploader" ]]; then
export DOCKER_ORG="marsproject"
fi
bash bin/kube-image-tool.sh -o "$DOCKER_ORG" -t "$GIT_TAG" build
docker push "$DOCKER_ORG/mars:$GIT_TAG"
if [[ -n "$GIT_TAG" ]]; then
export IMAGE_TAG="$GIT_TAG"
else
export IMAGE_TAG="nightly-$GIT_BRANCH"
fi
bash bin/kube-image-tool.sh -o "$DOCKER_ORG" -t "$IMAGE_TAG" build
docker push "$DOCKER_ORG/mars:$IMAGE_TAG"
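The docker-cd.yml change above stops failing on branch pushes: a release push carries a git tag, while a plain branch push now falls back to a `nightly-<branch>` image tag. In isolation the selection logic behaves like this sketch (the function name is assumed for illustration, not part of the workflow):

```shell
# Mirrors the if/else added to docker-cd.yml: prefer the release tag,
# otherwise derive a nightly tag from the branch name.
pick_image_tag() {
  # $1: GIT_TAG (may be empty), $2: GIT_BRANCH
  if [ -n "$1" ]; then
    echo "$1"
  else
    echo "nightly-$2"
  fi
}

pick_image_tag "v0.8.0" "master"   # release build -> v0.8.0
pick_image_tag "" "master"         # branch push   -> nightly-master
```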
2 changes: 1 addition & 1 deletion .github/workflows/install-hadoop.sh
@@ -9,7 +9,7 @@ sudo npm uninstall -g yarn || true

sudo apt-get install -yq ssh rsync

VERSION=3.3.0
VERSION=3.3.1
HADOOP_URL="https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=hadoop/common/hadoop-$VERSION/hadoop-$VERSION.tar.gz"

# download hadoop
31 changes: 4 additions & 27 deletions .upload.aci.yml
@@ -29,7 +29,8 @@ upload_py36_wheel_to_oss:
inputs:
image: ${ACI_VAR_MARS_IMAGES_FOR_RELEASE_PY36}
command:
- bash ci/antbuild.sh
- |
python setup.py bdist_wheel
pluginConfig:
appname: mars
source: ray-project
@@ -79,7 +80,8 @@ upload_py38_wheel_to_oss:
inputs:
image: ${ACI_VAR_MARS_IMAGES_FOR_RELEASE_PY38}
command:
- bash ci/antbuild.sh
- |
python setup.py bdist_wheel
pluginConfig:
appname: mars
source: ray-project
@@ -91,28 +93,3 @@ upload_py38_wheel_to_oss:
only:
change:
- ^mars/*

upload_py39_wheel_to_oss:
stage: 上传wheel包
aciTags: DOCKER
agent:
resourceClass: L
steps:
- plugin: clone
- plugin: shell
defaultWebTerminal: true
inputs:
image: ${ACI_VAR_MARS_IMAGES_FOR_RELEASE_PY39}
command:
- bash ci/antbuild.sh
pluginConfig:
appname: mars
source: ray-project
type: UT
publisher:
archiveArtifacts:
artifacts: '**/dist/pymars-*-cp39-cp39-linux_x86_64.whl'
allowEmptyArchive: false
only:
change:
- ^mars/*
2 changes: 2 additions & 0 deletions LICENSE
@@ -265,6 +265,8 @@ MIT License
- material-ui:4.0.0
- lodash:4.17.0
- dagre-d3:0.6.4
- cytoscape:3.19.1
- cytoscape-dagre:2.3.2


ISC License
7 changes: 6 additions & 1 deletion azure-pipelines.yml
@@ -82,8 +82,9 @@ jobs:
conda install -n test --quiet --yes -c pkgs/main python=$PYTHON certifi
if [[ "$(mars.test.module)" == "learn" ]]; then
# remove version limit when blue-yonder/tsfresh#897 is fixed.
pip install xgboost lightgbm tensorflow faiss-cpu torch torchvision \
statsmodels tsfresh
statsmodels\<0.13.0 tsfresh
fi
fi
conda list -n test
@@ -98,6 +99,10 @@ jobs:
displayName: 'Run tests'
- bash: |
GIT_ORIGIN="$(git config --get remote.origin.url)"
if [[ "$GIT_ORIGIN" == *"mars-project/mars"* ]]; then
export CODECOV_TOKEN="f578555e-b4f4-49a8-8d33-b21a305299ee"
fi
bash <(curl -s https://codecov.io/bash)
displayName: 'Upload coverage'
16 changes: 3 additions & 13 deletions ci/antbuild.sh
@@ -54,24 +54,14 @@ $OSS_HTTP_PREFIX/$(ls pyodps*)
EOF

echo "Upload files..."
FILES=(dist/"$(ls dist)" dsmp/dist/"$(ls dsmp/dist)" "$(ls pyodp*)" package.txt requirements.txt
client/mars_driver.py client/odps_op_with_dsmp.py)
FILES=(dist/"$(ls dist)" dsmp/dist/"$(ls dsmp/dist)" "$(ls pyodp*)" package.txt requirements.tx client/mars_driver.py client/odps_op_with_dsmp.py)
for f in "${FILES[@]}"; do
set +x

FILE_NAME=$(basename $f)
if [[ $FILE_NAME == pymars-* ]]; then
IFS='-' read -ra wheel_name_array <<< "$FILE_NAME"
# If in release branch, wheel_name_array[1] == $VERSION
wheel_name_array[1]=$VERSION
WHEEL_NAME=$(IFS="-" ; echo "${wheel_name_array[*]}")
else
WHEEL_NAME=$FILE_NAME
fi
echo "Upload $WHEEL_NAME to oss"

echo "Upload $FILE_NAME to oss"
osscmd put --headers="x-oss-forbid-overwrite:$FORBID_OVERWRITE" "$WORKSPACE_DIR/$f" \
oss://rayoltest/"$PKG_PATH/$WHEEL_NAME" \
oss://rayoltest/"$PKG_PATH/$FILE_NAME" \
config --id=LTAI4GKK6Chbk8gNULPpYV4j --key="${ACI_VAR_MARS_PACK_OSS_KEY}" \
--host=oss-cn-hangzhou-zmf.aliyuncs.com;
set -x
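The antbuild.sh hunk deletes the step that rewrote a wheel's version field before upload (split the file name on `-`, overwrite the second field with `$VERSION`, rejoin). For reference, the removed behavior is equivalent to this POSIX one-liner (the `rename_wheel` helper is an assumed name, not part of the script):

```shell
# Equivalent of the removed renaming step: rewrite the version field
# (second '-'-separated field) of a wheel file name.
rename_wheel() {
  # $1: original wheel file name, $2: target version
  echo "$1" | sed "s/^\([^-]*\)-[^-]*/\1-$2/"
}

rename_wheel "pymars-0.7.1-cp38-cp38-linux_x86_64.whl" "0.8.0"
# -> pymars-0.8.0-cp38-cp38-linux_x86_64.whl
```

After this commit, files are uploaded under their original names instead.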
4 changes: 4 additions & 0 deletions ci/reload-env.sh
@@ -7,6 +7,8 @@ if [[ "$GITHUB_REF" =~ ^"refs/tags/" ]]; then
export GITHUB_TAG_REF="$GITHUB_REF"
unset CYTHON_TRACE
export GIT_TAG=$(echo "$GITHUB_REF" | sed -e "s/refs\/tags\///g")
else
export GIT_BRANCH=$(echo "$GITHUB_REF" | sed -e "s/refs\/heads\///g")
fi

if [[ $UNAME == "mingw"* ]] || [[ $UNAME == "msys"* ]]; then
@@ -91,3 +93,5 @@ shopt -s expand_aliases
# TODO(buniu): ant internal ray hava some bug.
# Enable gcs when bug fix.
export RAY_GCS_TASK_SCHEDULING_ENABLED=true

echo "python path: $(which python)"
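The reload-env.sh change exports `GIT_BRANCH` for non-tag builds, complementing the existing `GIT_TAG` export. `GITHUB_REF` is `refs/tags/<tag>` for tag pushes and `refs/heads/<branch>` for branch pushes, so the two sed substitutions behave as below (helper names are assumed for illustration):

```shell
# Strip the GitHub Actions ref prefixes, as reload-env.sh does.
ref_to_tag()    { echo "$1" | sed -e "s/refs\/tags\///g"; }
ref_to_branch() { echo "$1" | sed -e "s/refs\/heads\///g"; }

ref_to_tag "refs/tags/v0.8.0"      # -> v0.8.0
ref_to_branch "refs/heads/master"  # -> master
```

The docker-cd.yml workflow in this commit relies on exactly this pair: `GIT_TAG` when set, otherwise `nightly-$GIT_BRANCH`.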
16 changes: 16 additions & 0 deletions docs/source/getting_started/dataframe.rst
@@ -143,3 +143,19 @@ DataFrame can be saved to CSV etc.
Index: []
Refer to :ref:`api.dataframe.io` for more information.

DataFrame can be read from CSV which is saved in `OSS <https://help.aliyun.com/document_detail/32027.html>`_.

.. code-block:: python
>>> import mars.dataframe as md
>>> from mars.lib.filesystem.oss import build_oss_path
>>> access_key_id = your_oss_id
>>> access_key_secret = your_oss_key
>>> end_point = oss_endpoint
>>> file_path = 'oss://buckey/path/to/data.csv'
>>> auth_path = build_oss_path(file_path, access_key_id, access_key_secret, end_point)
>>> df = md.read_csv(auth_path).execute()
You can find AccessKey by `AcessKey Documentation <https://help.aliyun.com/document_detail/53045.htm?spm=a2c4g.11186623.0.0.401f4b78nWS2uH#task968>`_
and get endpoint by `Comparison table of OSS Region and Endpoint <https://help.aliyun.com/document_detail/31837.htm?spm=a2c4g.11186623.0.0.cd7c7041aR2CL9>`_.
4 changes: 2 additions & 2 deletions docs/source/installation/kubernetes.rst
@@ -104,10 +104,10 @@ Arguments for workers:
+====================+================================================================+
| worker_num | Number of workers in the cluster, 1 by default |
+--------------------+----------------------------------------------------------------+
| worker_cpu | Number of CPUs for every worker |
| worker_cpu | Number of CPUs for every worker, required. |
+--------------------+----------------------------------------------------------------+
| worker_mem | Memory size for workers in the cluster, in bytes or size units |
| | like ``1g`` |
| | like ``1g``, required. |
+--------------------+----------------------------------------------------------------+
| worker_spill_paths | List of spill paths for worker pods on hosts |
+--------------------+----------------------------------------------------------------+
4 changes: 2 additions & 2 deletions docs/source/installation/yarn.rst
@@ -122,10 +122,10 @@ Arguments for workers:
+====================+================================================================+
| worker_num | Number of workers in the cluster, 1 by default |
+--------------------+----------------------------------------------------------------+
| worker_cpu | Number of CPUs for every worker |
| worker_cpu | Number of CPUs for every worker, required. |
+--------------------+----------------------------------------------------------------+
| worker_mem | Memory size for workers in the cluster, in bytes or size units |
| | like ``1g`` |
| | like ``1g``, required. |
+--------------------+----------------------------------------------------------------+
| worker_spill_paths | List of spill paths for worker pods on hosts |
+--------------------+----------------------------------------------------------------+
11 changes: 11 additions & 0 deletions docs/source/locale/zh_CN/LC_MESSAGES/getting_started/dataframe.po
@@ -103,3 +103,14 @@ msgstr "DataFrame 能被保存到 CSV 文件。"
msgid "Refer to :ref:`api.dataframe.io` for more information."
msgstr "参考 :ref:`api.dataframe.io` 获取更多信息。"

#: ../../source/getting_started/dataframe.rst:147
msgid "DataFrame can be read from CSV which is saved in `OSS <https://help.aliyun.com/document_detail/32027.html>`_."
msgstr "DataFrame 可以从被保存在 OSS 上的 CSV 文件中读取。"

#: ../../source/getting_started/dataframe.rst:160
msgid "You can find AccessKey by `AcessKey Documentation <https://help.aliyun.com/document_detail/53045.htm?spm=a2c4g.11186623.0.0.401f4b78nWS2uH#task968>`_"
msgstr "你可以通过 `AcessKey 文档 <https://help.aliyun.com/document_detail/32027.html>`_ 描述的方法获取你的 AccessKey"

#: ../../source/getting_started/dataframe.rst:161
msgid "and get endpoint by `Comparison table of OSS Region and Endpoint <https://help.aliyun.com/document_detail/31837.htm?spm=a2c4g.11186623.0.0.cd7c7041aR2CL9>`_."
msgstr "并且,你可以通过 `公共云下 OSS Region 和 Endpoint 对照表 <https://help.aliyun.com/document_detail/31837.htm?spm=a2c4g.11186623.0.0.cd7c7041aR2CL9>`_ 获得对应 bucket 的 endpoint 。"
4 changes: 2 additions & 2 deletions docs/source/locale/zh_CN/LC_MESSAGES/installation/deploy.po
@@ -217,11 +217,11 @@ msgstr ""

#: ../../source/installation/deploy.rst:112
msgid "On Supervisor 1 (192.168.1.10):"
msgstr "在 Worker 1(192.168.1.20)上"
msgstr "在 Supervisor 1(192.168.1.10)上"

#: ../../source/installation/deploy.rst:118
msgid "On Supervisor 2 (192.168.1.11):"
msgstr "在 Worker 2(192.168.1.21)上"
msgstr "在 Supervisor 2(192.168.1.11)上"

#: ../../source/installation/deploy.rst:124
msgid "On Worker 1 (192.168.1.20):"
14 changes: 9 additions & 5 deletions docs/source/locale/zh_CN/LC_MESSAGES/installation/kubernetes.po
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: mars 0.5.0a2\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2021-08-04 17:13+0800\n"
"POT-Creation-Date: 2021-09-16 16:05+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
@@ -169,16 +169,20 @@ msgid "worker_cpu"
msgstr ""

#: ../../source/installation/kubernetes.rst:107
msgid "Number of CPUs for every worker"
msgstr "每个 Worker 的 CPU 数目"
msgid "Number of CPUs for every worker, required."
msgstr "每个 Worker 的 CPU 数目,此参数为必需"

#: ../../source/installation/kubernetes.rst:109
msgid "worker_mem"
msgstr ""

#: ../../source/installation/kubernetes.rst:109
msgid "Memory size for workers in the cluster, in bytes or size units like ``1g``"
msgstr "每个 Worker 的内存大小,可使用字节数或带单位的大小,例如 ``1g``"
msgid ""
"Memory size for workers in the cluster, in bytes or size units like "
"``1g``, required."
msgstr ""
"每个 Worker 的内存大小,可使用字节数或带单位的大小,例如 ``1g``,此参数为"
"必需"

#: ../../source/installation/kubernetes.rst:112
msgid "worker_spill_paths"
14 changes: 9 additions & 5 deletions docs/source/locale/zh_CN/LC_MESSAGES/installation/yarn.po
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: mars 0.7.0a2\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2021-08-04 17:36+0800\n"
"POT-Creation-Date: 2021-09-16 16:05+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
@@ -185,16 +185,20 @@ msgid "worker_cpu"
msgstr ""

#: ../../source/installation/yarn.rst:125
msgid "Number of CPUs for every worker"
msgstr "每个 Worker 的 CPU 数目"
msgid "Number of CPUs for every worker, required."
msgstr "每个 Worker 的 CPU 数目,此参数为必需"

#: ../../source/installation/yarn.rst:127
msgid "worker_mem"
msgstr ""

#: ../../source/installation/yarn.rst:127
msgid "Memory size for workers in the cluster, in bytes or size units like ``1g``"
msgstr "每个 Worker 的内存大小,可使用字节数或带单位的大小,例如 ``1g``"
msgid ""
"Memory size for workers in the cluster, in bytes or size units like "
"``1g``, required."
msgstr ""
"每个 Worker 的内存大小,可使用字节数或带单位的大小,例如 ``1g``,此参数为"
"必需"

#: ../../source/installation/yarn.rst:130
msgid "worker_spill_paths"
1 change: 1 addition & 0 deletions docs/source/reference/learn/reference.rst
@@ -58,6 +58,7 @@ Samples generator
datasets.make_blobs
datasets.make_classification
datasets.make_low_rank_matrix
datasets.make_regression

.. _decomposition_ref:

2 changes: 1 addition & 1 deletion mars/_version.py
@@ -16,7 +16,7 @@
import os
from typing import NamedTuple, Optional

__version__ = "0.7.1"
__version__ = "0.8.0"


def _get_cmd_results(pkg_root, cmd):
5 changes: 3 additions & 2 deletions mars/conftest.py
@@ -55,9 +55,10 @@ def ray_large_cluster(request): # pragma: no cover
cluster = Cluster()
remote_nodes = []
for i in range(num_nodes):
remote_nodes.append(cluster.add_node(num_cpus=num_cpus))
remote_nodes.append(cluster.add_node(num_cpus=num_cpus, memory=32 * 1024 ** 3))
if len(remote_nodes) == 1:
ray.init(address=cluster.address)
ray.init(address=cluster.address,
job_config=ray.job_config.JobConfig(total_memory_mb=num_nodes * 32 * 1024 ** 2))
register_ray_serializers()
try:
yield
2 changes: 2 additions & 0 deletions mars/core/entity/tileables.py
@@ -151,6 +151,8 @@ def __getitem__(self, item):
The length of indices must be the same with `chunk_shape` of tileable.
"""
if isinstance(item, int):
item = (item,)
if isinstance(item, tuple):
if len(item) == 0 and self._tileable.is_scalar():
return self._tileable.chunks[0]
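The two added lines in tileables.py normalize a bare integer index into a 1-tuple before the existing tuple-handling path runs, so an int and a 1-tuple index take the same code path. Outside the Mars class, the pattern reduces to this minimal sketch (the function name is assumed for illustration):

```python
def normalize_chunk_index(item):
    """Coerce a bare int index to a 1-tuple so only tuples reach the
    lookup logic, mirroring the lines added in __getitem__ above."""
    if isinstance(item, int):
        item = (item,)
    if isinstance(item, tuple):
        return item
    raise TypeError(f"unsupported index type: {type(item).__name__}")

normalize_chunk_index(2)       # -> (2,)
normalize_chunk_index((0, 1))  # -> (0, 1)
```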
2 changes: 1 addition & 1 deletion mars/core/operand/base.py
@@ -192,7 +192,7 @@ def inputs(self, vals):
self._set_inputs(vals)

@property
def output_limit(self):
def output_limit(self) -> int:
return 1

@property
4 changes: 2 additions & 2 deletions mars/dataframe/align.py
@@ -154,7 +154,7 @@ def build_map_chunk_kw(self, inputs, **kw):
if kw.get('dtype', None) is None and getattr(inputs[0], 'dtype', None) is not None:
kw['dtype'] = inputs[0].dtype
if kw.get('name', None) is None and getattr(inputs[0], 'name', None) is not None:
kw['name'] = inputs[0].dtype
kw['name'] = inputs[0].name
return kw

def build_reduce_chunk_kw(self, inputs, index, **kw):
@@ -192,7 +192,7 @@ def build_reduce_chunk_kw(self, inputs, index, **kw):
if kw.get('dtype', None) is None and getattr(inputs[0].inputs[0], 'dtype', None) is not None:
kw['dtype'] = inputs[0].inputs[0].dtype
if kw.get('name', None) is None and getattr(inputs[0].inputs[0], 'name', None) is not None:
kw['name'] = inputs[0].inputs[0].dtype
kw['name'] = inputs[0].inputs[0].name
return kw

@classmethod
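Both align.py fixes correct the same copy-paste bug: when filling a missing `name` keyword the code assigned `inputs[0].dtype` instead of `inputs[0].name`, corrupting series names during alignment. The intended propagation pattern, sketched outside the operand class (the helper and the namedtuple stand-in are assumptions for illustration):

```python
from collections import namedtuple

def fill_missing_kw(first_input, **kw):
    # Propagate dtype and name from the first input only when the caller
    # did not supply them; the fixed bug wrote dtype into the name slot.
    if kw.get('dtype') is None and getattr(first_input, 'dtype', None) is not None:
        kw['dtype'] = first_input.dtype
    if kw.get('name') is None and getattr(first_input, 'name', None) is not None:
        kw['name'] = first_input.name
    return kw

# Minimal stand-in for a series-like input with dtype and name attributes.
Series = namedtuple('Series', ['dtype', 'name'])
fill_missing_kw(Series('int64', 'a'))  # -> {'dtype': 'int64', 'name': 'a'}
```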
3 changes: 1 addition & 2 deletions mars/dataframe/base/eval.py
@@ -30,7 +30,6 @@
from ... import opcodes
from ...core import ENTITY_TYPE, OutputType, get_output_types, recursive_tile
from ...serialization.serializables import BoolField, DictField, StringField
from ..core import DATAFRAME_TYPE
from ..operands import DataFrameOperand, DataFrameOperandMixin
from ..utils import parse_index

Expand Down Expand Up @@ -123,7 +122,7 @@ def get_named_object(self, obj_name):
raise KeyError(f'name {obj_name} is not defined')

def visit(self, node):
if isinstance(node, DATAFRAME_TYPE):
if isinstance(node, ENTITY_TYPE):
return node
node_name = node.__class__.__name__
method = 'visit_' + node_name
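The eval.py hunk ends inside a `visit_`-dispatch visitor: the method name is built from the node's class name, and the change broadens the early-return check from `DATAFRAME_TYPE` to any `ENTITY_TYPE`. The dispatch idiom itself can be sketched independently (class names below are illustrative, not Mars APIs):

```python
class NodeVisitor:
    """Dispatch on the node's class name, like the Mars eval visitor."""

    def visit(self, node):
        # Build 'visit_<ClassName>' and fall back to generic_visit.
        method = 'visit_' + node.__class__.__name__
        visitor = getattr(self, method, self.generic_visit)
        return visitor(node)

    def generic_visit(self, node):
        raise NotImplementedError(f'no visitor for {node.__class__.__name__}')

class IntDoubler(NodeVisitor):
    def visit_int(self, node):
        return node * 2

IntDoubler().visit(21)  # -> 42
```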