[Feat][Spark] Implementation of PySpark bindings to Scala API #300

Merged (25 commits, Jan 10, 2024)

Commits
069311b
Implementation draft & concept
SemyonSinchenko Dec 23, 2023
5f12da4
Merge remote-tracking branch 'upstream/main' into 297-add-pyspark-bin…
SemyonSinchenko Dec 27, 2023
418168d
Part of tests & update branch
SemyonSinchenko Dec 27, 2023
5de3c04
Update VertexInfo.load_vertex_info & test & fixes
SemyonSinchenko Dec 27, 2023
815043c
Push changes before pulling from upstream
SemyonSinchenko Dec 28, 2023
e361cb6
Merge remote-tracking branch 'upstream/main' into 297-add-pyspark-bin…
SemyonSinchenko Dec 28, 2023
fe39bef
Tests + fixes + updates from comments
SemyonSinchenko Dec 29, 2023
20e3aee
Fix init for GraphArSession
SemyonSinchenko Dec 29, 2023
4c02ef5
Tests and fixes from comments
SemyonSinchenko Dec 29, 2023
0b4a4cd
Make PR ready for review
SemyonSinchenko Dec 29, 2023
5cefd2f
Merge remote-tracking branch 'upstream/main' into 297-add-pyspark-bin…
SemyonSinchenko Dec 29, 2023
d78b0ae
Merge remote-tracking branch 'upstream/main' into 297-add-pyspark-bin…
SemyonSinchenko Jan 2, 2024
072d2f9
Fixes from comments && docs
SemyonSinchenko Jan 2, 2024
56b9a1a
Add license-header to pyspark/README
SemyonSinchenko Jan 2, 2024
7a24cc9
Update tests && small fixes
SemyonSinchenko Jan 3, 2024
b650c93
Merge remote-tracking branch 'upstream/main' into 297-add-pyspark-bin…
SemyonSinchenko Jan 3, 2024
3f4eb14
Drop outdated comment and TODO
SemyonSinchenko Jan 3, 2024
4a03944
Fix broken commit
SemyonSinchenko Jan 3, 2024
5297399
Tests coverage 95% && docstrings && linting pass
SemyonSinchenko Jan 4, 2024
2800e4b
Ci & docs
SemyonSinchenko Jan 9, 2024
b70e158
Merge remote-tracking branch 'upstream/main' into 297-add-pyspark-bin…
SemyonSinchenko Jan 9, 2024
a52728a
Update branch && update README
SemyonSinchenko Jan 9, 2024
851c668
Fixes from comments
SemyonSinchenko Jan 10, 2024
2afc52b
Merge remote-tracking branch 'upstream/main' into 297-add-pyspark-bin…
SemyonSinchenko Jan 10, 2024
5195a7c
Fix linter errors
SemyonSinchenko Jan 10, 2024
62 changes: 62 additions & 0 deletions .github/workflows/pyspark.yml
@@ -0,0 +1,62 @@
# Copyright 2022-2023 Alibaba Group Holding Limited.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: GraphAr PySpark CI

on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

concurrency:
  group: ${{ github.repository }}-${{ github.event.number || github.head_ref || github.sha }}-${{ github.workflow }}
  cancel-in-progress: true

jobs:
  GraphAr-spark:
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v3
        with:
          submodules: true

      - name: Install Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.9

      - name: Install Poetry
        uses: abatilo/actions-poetry@v2

      - name: Install Spark Scala && PySpark
        run: |
          cd pyspark
          make install_test

      - name: Run PyTest
        run: |
          cd pyspark
          make test

      - name: Lint
        run: |
          cd pyspark
          make install_lint
          make lint
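The concurrency group above chains `github.event.number || github.head_ref || github.sha`; in GitHub Actions expressions, `||` yields the first truthy operand, much like Python's `or`. A small sketch of that fallback (the repository name below is a placeholder, not taken from this PR):

```python
def concurrency_group(repository, event_number, head_ref, sha, workflow):
    """Mimic the Actions expression
    ${{ github.repository }}-${{ github.event.number || github.head_ref || github.sha }}-${{ github.workflow }}
    where `||` returns the first truthy operand."""
    return f"{repository}-{event_number or head_ref or sha}-{workflow}"

# Pull request run: the PR number is set, so it wins.
pr_group = concurrency_group("example/GraphAr", 300, None, "5195a7c", "GraphAr PySpark CI")

# Push to main: no PR number or head ref, so the commit SHA is used.
push_group = concurrency_group("example/GraphAr", None, None, "5195a7c", "GraphAr PySpark CI")
```

This is why concurrent runs of the same PR cancel each other (same group), while pushes of distinct commits to main each get their own group.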

63 changes: 63 additions & 0 deletions .gitignore
@@ -6,3 +6,66 @@
.ccls-cache

compile_commands.json

### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
pyspark/assets

# Jupyter Notebook
.ipynb_checkpoints
*.ipynb


# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Ruff
.ruff_cache

### Scala ###
*.bloop
*.metals
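Several of the new ignore rules use glob character classes (e.g. `*.py[cod]` covers `.pyc`, `.pyo`, and `.pyd` files). Python's `fnmatch` approximates, but is not identical to, git's wildmatch rules, so the following is only an illustration of how a few of the patterns above behave:

```python
from fnmatch import fnmatchcase

# A handful of patterns taken verbatim from the .gitignore diff above.
patterns = ["*.py[cod]", ".coverage.*", "*.egg"]

def ignored(path: str) -> bool:
    """Return True if the path matches any of the sample ignore patterns."""
    return any(fnmatchcase(path, p) for p in patterns)

print(ignored("module.pyc"))      # matched by *.py[cod]
print(ignored(".coverage.unit"))  # matched by .coverage.*
print(ignored("module.py"))       # matched by nothing above
```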
3 changes: 2 additions & 1 deletion .licenserc.yaml
@@ -40,11 +40,12 @@ header:
- '*.md'
- '*.rst'
- '**/*.json'
- 'pyspark/poetry.lock' # This file is generated automatically by Poetry; there is no way to add a license header

comment: on-failure

# If you don't want to check dependencies' license compatibility, remove the following part
dependency:
files:
- spark/pom.xml # If this is a maven project.
- java/pom.xml # If this is a maven project.
- java/pom.xml # If this is a maven project.
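The `poetry.lock` exemption above exists because the license checker configured by `.licenserc.yaml` expects a license notice near the top of every file, and generated lockfiles cannot carry one. A minimal, hypothetical sketch of what such a check does (this is not the actual tool's implementation):

```python
def has_license_header(text: str, marker: str = "Licensed under the Apache License") -> bool:
    """Naive stand-in for a header check: look for the Apache License
    notice within the first few lines of a file's contents."""
    head = "\n".join(text.splitlines()[:15])
    return marker in head

workflow_text = (
    "# Copyright 2022-2023 Alibaba Group Holding Limited.\n"
    "#\n"
    '# Licensed under the Apache License, Version 2.0 (the "License");\n'
)
lockfile_text = "# This file is generated automatically by Poetry.\n[[package]]\n"

print(has_license_header(workflow_text))  # the workflow header passes
print(has_license_header(lockfile_text))  # the lockfile has no header
```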
22 changes: 22 additions & 0 deletions docs/Makefile
@@ -53,3 +53,25 @@ html: cpp-apidoc spark-apidoc
		--quiet
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

.PHONY: pyspark-apidoc
pyspark-apidoc:
	cd $(ROOTDIR)/pyspark && \
	poetry run sphinx-apidoc -o $(ROOTDIR)/docs/pyspark/api graphar_pyspark/

.PHONY: html-poetry
html-poetry:
	cd $(ROOTDIR)/pyspark && \
	poetry run bash -c "cd $(ROOTDIR)/docs && $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html"
	rm -fr $(BUILDDIR)/html/spark/reference
	cp -fr $(ROOTDIR)/spark/target/site/scaladocs $(BUILDDIR)/html/spark/reference/
	cd $(ROOTDIR)/java && \
	mvn -P javadoc javadoc:aggregate \
		-Dmaven.antrun.skip=true \
		-DskipTests \
		-Djavadoc.output.directory=$(ROOTDIR)/docs/$(BUILDDIR)/html/java/ \
		-Djavadoc.output.destDir=reference \
		--quiet
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

1 change: 1 addition & 0 deletions docs/index.rst
@@ -21,6 +21,7 @@
C++ <cpp/index>
Java <java/index>
Spark <spark/index>
PySpark <pyspark/index>

.. toctree::
:maxdepth: 2
69 changes: 69 additions & 0 deletions docs/pyspark/api/graphar_pyspark.rst
@@ -0,0 +1,69 @@
graphar\_pyspark package
========================

Submodules
----------

graphar\_pyspark.enums module
-----------------------------

.. automodule:: graphar_pyspark.enums
:members:
:undoc-members:
:show-inheritance:

graphar\_pyspark.errors module
------------------------------

.. automodule:: graphar_pyspark.errors
:members:
:undoc-members:
:show-inheritance:

graphar\_pyspark.graph module
-----------------------------

.. automodule:: graphar_pyspark.graph
:members:
:undoc-members:
:show-inheritance:

graphar\_pyspark.info module
----------------------------

.. automodule:: graphar_pyspark.info
:members:
:undoc-members:
:show-inheritance:

graphar\_pyspark.reader module
------------------------------

.. automodule:: graphar_pyspark.reader
:members:
:undoc-members:
:show-inheritance:

graphar\_pyspark.util module
----------------------------

.. automodule:: graphar_pyspark.util
:members:
:undoc-members:
:show-inheritance:

graphar\_pyspark.writer module
------------------------------

.. automodule:: graphar_pyspark.writer
:members:
:undoc-members:
:show-inheritance:

Module contents
---------------

.. automodule:: graphar_pyspark
:members:
:undoc-members:
:show-inheritance:
7 changes: 7 additions & 0 deletions docs/pyspark/api/modules.rst
@@ -0,0 +1,7 @@
graphar_pyspark
===============

.. toctree::
:maxdepth: 4

graphar_pyspark