[Infra][Spark] Repackaging Spark library under the Apache namespace #428

Merged 7 commits on Mar 28, 2024
4 changes: 2 additions & 2 deletions .licenserc.yaml
@@ -25,8 +25,8 @@ header:
- 'cpp/thirdparty'
- 'cpp/include/gar/external/result.hpp'
- 'cpp/misc/cpplint.py'
-  - 'spark/datasources-32/src/main/scala/com/alibaba/graphar/datasources'
-  - 'spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources'
+  - 'spark/datasources-32/src/main/scala/org/apache/graphar/datasources'
+  - 'spark/datasources-33/src/main/scala/org/apache/graphar/datasources'
- 'java/src/main/java/com/alibaba/graphar/stdcxx/StdString.java'
- 'java/src/main/java/com/alibaba/graphar/stdcxx/StdVector.java'
- '*.md'
2 changes: 1 addition & 1 deletion docs/spark/spark-lib.rst
@@ -68,7 +68,7 @@ You can include GraphAr as a dependency in your maven project
</repositories>
<dependencies>
<dependency>
-<groupId>com.alibaba</groupId>
+<groupId>org.apache</groupId>
<artifactId>graphar</artifactId>
<version>0.1.0</version>
</dependency>
2 changes: 1 addition & 1 deletion pyspark/graphar_pyspark/__init__.py
@@ -40,7 +40,7 @@ def set_spark_session(self, spark_session: SparkSession) -> None:
self.ss = spark_session # Python SparkSession
self.sc = spark_session.sparkContext # Python SparkContext
self.jvm = spark_session._jvm # JVM
-self.graphar = spark_session._jvm.com.alibaba.graphar # Alias to scala graphar
+self.graphar = spark_session._jvm.org.apache.graphar # Alias to scala graphar
self.jsc = spark_session._jsc # Java SparkContext
self.jss = spark_session._jsparkSession # Java SparkSession

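The hunk above captures the whole pattern of this PR: every `com.alibaba.graphar` reference becomes `org.apache.graphar`, both in dotted package form and in slash-separated source-tree paths. A minimal Python sketch of that mechanical rewrite (the helper name and usage are illustrative, not part of the PR):

```python
# Old and new namespaces, as dotted packages and as source-tree paths.
OLD_PKG = "com.alibaba.graphar"
NEW_PKG = "org.apache.graphar"

def migrate(text: str) -> str:
    """Rewrite old-namespace references in source text or file paths."""
    text = text.replace(OLD_PKG, NEW_PKG)             # dotted package form
    return text.replace(OLD_PKG.replace(".", "/"),    # path form
                        NEW_PKG.replace(".", "/"))

print(migrate("import com.alibaba.graphar.GeneralParams"))
# → import org.apache.graphar.GeneralParams
```

The same substitution applies uniformly to Scala `package`/`import` statements, Maven `groupId`s, Python docstrings, and license-check path globs, which is why the diff is large but repetitive.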
4 changes: 2 additions & 2 deletions pyspark/graphar_pyspark/graph.py
@@ -15,7 +15,7 @@
# specific language governing permissions and limitations
# under the License.

-"""Bindings to com.alibaba.graphar.graph."""
+"""Bindings to org.apache.graphar.graph."""

from __future__ import annotations

@@ -188,7 +188,7 @@ def write(
) -> None:
"""Write graph data in graphar format.

-Note: for default parameters check com.alibaba.graphar.GeneralParams;
+Note: for default parameters check org.apache.graphar.GeneralParams;
For this method None for any of arguments means that the default value will be used.

:param path: the directory to write.
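The docstring above describes a None-means-default convention: any argument left as `None` falls back to the Scala-side default from `GeneralParams`. A sketch of that resolution pattern, with made-up parameter names and default values (the real ones live in `org.apache.graphar.GeneralParams`, not here):

```python
# Hypothetical defaults standing in for org.apache.graphar.GeneralParams.
DEFAULTS = {"name": "graph", "vertex_chunk_size": 2 ** 18}

def write(path, name=None, vertex_chunk_size=None):
    """Resolve None arguments to library defaults before writing."""
    return {
        "path": path,
        "name": DEFAULTS["name"] if name is None else name,
        "vertex_chunk_size": (
            DEFAULTS["vertex_chunk_size"]
            if vertex_chunk_size is None
            else vertex_chunk_size
        ),
    }

print(write("/tmp/graph"))  # both optional arguments fall back to defaults
```

Resolving defaults on the Python side like this keeps `None` as a pure sentinel, so callers never need to know the JVM-side default values.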
2 changes: 1 addition & 1 deletion pyspark/graphar_pyspark/info.py
@@ -29,7 +29,7 @@
# see the license for the specific language governing permissions and
# limitations under the license.

-"""Bindings to com.alibaba.graphar info classes."""
+"""Bindings to org.apache.graphar info classes."""

# because we are using type-hints, we need to define few custom TypeVar
# to describe returns of classmethods;
2 changes: 1 addition & 1 deletion pyspark/graphar_pyspark/reader.py
@@ -29,7 +29,7 @@
# see the license for the specific language governing permissions and
# limitations under the license.

-"""Bindings to com.alibaba.graphar.graph."""
+"""Bindings to org.apache.graphar.graph."""

from __future__ import annotations

2 changes: 1 addition & 1 deletion pyspark/graphar_pyspark/util.py
@@ -29,7 +29,7 @@
# see the license for the specific language governing permissions and
# limitations under the license.

-"""Bindings to com.alibaba.graphar.util."""
+"""Bindings to org.apache.graphar.util."""

from __future__ import annotations

2 changes: 1 addition & 1 deletion pyspark/graphar_pyspark/writer.py
@@ -29,7 +29,7 @@
# see the license for the specific language governing permissions and
# limitations under the license.

-"""Bindings to com.alibaba.graphar.writer."""
+"""Bindings to org.apache.graphar.writer."""


from __future__ import annotations
10 changes: 5 additions & 5 deletions spark/README.md
@@ -51,8 +51,8 @@ Build and run the unit tests:
Build and run certain unit test:

```bash
-$ mvn clean test -Dsuites='com.alibaba.graphar.GraphInfoSuite' # run the GraphInfo test suite
-$ mvn clean test -Dsuites='com.alibaba.graphar.GraphInfoSuite load graph info' # run the `load graph info` test of test suite
+$ mvn clean test -Dsuites='org.apache.graphar.GraphInfoSuite' # run the GraphInfo test suite
+$ mvn clean test -Dsuites='org.apache.graphar.GraphInfoSuite load graph info' # run the `load graph info` test of test suite
```

### Generate API document
@@ -68,7 +68,7 @@ The API document is generated in the directory ``spark/graphar/target/site/scala
## Running Neo4j to GraphAr example

Spark provides a simple example to convert Neo4j data to GraphAr data.
-The example is located in the directory ``spark/graphar/src/main/scala/com/alibaba/graphar/examples/``.
+The example is located in the directory ``spark/graphar/src/main/scala/org/apache/graphar/examples/``.

To run the example, download Spark and Neo4j first.

@@ -159,7 +159,7 @@ We can write a json configuration file like `import/neo4j.json` to do the import
Running this example requires `Docker` to be installed, if not, follow [this link](https://docs.docker.com/engine/install/). Run `docker version` to check it.

Spark provides a simple example to convert NebulaGraph data to GraphAr data.
-The example is located in the directory ``spark/src/main/scala/com/alibaba/graphar/examples/``.
+The example is located in the directory ``spark/src/main/scala/org/apache/graphar/examples/``.

To run the example, download Spark and NebulaGraph first.

@@ -242,7 +242,7 @@ You can include GraphAr as a dependency in your maven project
</repositories>
<dependencies>
<dependency>
-<groupId>com.alibaba</groupId>
+<groupId>org.apache</groupId>
<artifactId>graphar</artifactId>
<version>0.1.0</version>
</dependency>
4 changes: 2 additions & 2 deletions spark/datasources-32/pom.xml
@@ -23,12 +23,12 @@
<modelVersion>4.0.0</modelVersion>

<parent>
-<groupId>com.alibaba</groupId>
+<groupId>org.apache</groupId>
<artifactId>graphar</artifactId>
<version>${graphar.version}</version>
</parent>

-<groupId>com.alibaba</groupId>
+<groupId>org.apache</groupId>
<artifactId>graphar-datasources</artifactId>
<version>${graphar.version}</version>
<packaging>jar</packaging>
This file was deleted.

@@ -14,9 +14,9 @@
* limitations under the License.
*/

-package com.alibaba.graphar.datasources
+package org.apache.graphar.datasources

-import com.alibaba.graphar.GeneralParams
+import org.apache.graphar.GeneralParams

import org.json4s._
import org.json4s.jackson.JsonMethods._
@@ -14,7 +14,7 @@
* limitations under the License.
*/

-package com.alibaba.graphar.datasources
+package org.apache.graphar.datasources

import scala.collection.JavaConverters._
import scala.util.matching.Regex
@@ -14,7 +14,7 @@
* limitations under the License.
*/

-package com.alibaba.graphar.datasources
+package org.apache.graphar.datasources

import scala.collection.JavaConverters._
import scala.collection.mutable.ArrayBuffer
@@ -14,7 +14,7 @@
* limitations under the License.
*/

-package com.alibaba.graphar.datasources
+package org.apache.graphar.datasources

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connector.read.{Scan, SupportsPushDownFilters}
@@ -14,7 +14,7 @@
* limitations under the License.
*/

-package com.alibaba.graphar.datasources
+package org.apache.graphar.datasources

import scala.collection.JavaConverters._

@@ -31,9 +31,9 @@ import org.apache.spark.sql.execution.datasources.v2.FileTable
import org.apache.spark.sql.types._
import org.apache.spark.sql.util.CaseInsensitiveStringMap

-import com.alibaba.graphar.datasources.csv.CSVWriteBuilder
-import com.alibaba.graphar.datasources.parquet.ParquetWriteBuilder
-import com.alibaba.graphar.datasources.orc.OrcWriteBuilder
+import org.apache.graphar.datasources.csv.CSVWriteBuilder
+import org.apache.graphar.datasources.parquet.ParquetWriteBuilder
+import org.apache.graphar.datasources.orc.OrcWriteBuilder

/** GarTable is a class to represent the graph data in GraphAr as a table. */
case class GarTable(
@@ -16,7 +16,7 @@
* The implementation of GarWriteBuilder is referred from FileWriteBuilder of spark 3.1.1
*/

-package com.alibaba.graphar.datasources
+package org.apache.graphar.datasources

import java.util.UUID

@@ -16,7 +16,7 @@
 * The implementation of CSVWriteBuilder is referred from CSVWriteBuilder of spark 3.1.1
*/

-package com.alibaba.graphar.datasources.csv
+package org.apache.graphar.datasources.csv

import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext}
import org.apache.spark.sql.catalyst.csv.CSVOptions
@@ -31,7 +31,7 @@ import org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{DataType, StructType}

-import com.alibaba.graphar.datasources.GarWriteBuilder
+import org.apache.graphar.datasources.GarWriteBuilder

class CSVWriteBuilder(
paths: Seq[String],
@@ -16,7 +16,7 @@
* The implementation of OrcOutputWriter is referred from OrcOutputWriter of spark 3.1.1
*/

-package com.alibaba.graphar.datasources.orc
+package org.apache.graphar.datasources.orc

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.NullWritable
@@ -16,7 +16,7 @@
* The implementation of OrcWriteBuilder is referred from OrcWriteBuilder of spark 3.1.1
*/

-package com.alibaba.graphar.datasources.orc
+package org.apache.graphar.datasources.orc

import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext}
@@ -32,7 +32,7 @@ import org.apache.spark.sql.execution.datasources.orc.{OrcOptions, OrcUtils}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._

-import com.alibaba.graphar.datasources.GarWriteBuilder
+import org.apache.graphar.datasources.GarWriteBuilder

object OrcWriteBuilder {
// the getQuotedSchemaString method of spark OrcFileFormat
@@ -16,7 +16,7 @@
* The implementation of ParquetWriteBuilder is referred from ParquetWriteBuilder of spark 3.1.1
*/

-package com.alibaba.graphar.datasources.parquet
+package org.apache.graphar.datasources.parquet

import org.apache.hadoop.mapreduce.{Job, OutputCommitter, TaskAttemptContext}
import org.apache.parquet.hadoop.{ParquetOutputCommitter, ParquetOutputFormat}
@@ -35,7 +35,7 @@ import org.apache.spark.sql.execution.datasources.parquet._
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._

-import com.alibaba.graphar.datasources.GarWriteBuilder
+import org.apache.graphar.datasources.GarWriteBuilder

class ParquetWriteBuilder(
paths: Seq[String],
4 changes: 2 additions & 2 deletions spark/datasources-33/pom.xml
@@ -23,12 +23,12 @@
<modelVersion>4.0.0</modelVersion>

<parent>
-<groupId>com.alibaba</groupId>
+<groupId>org.apache</groupId>
<artifactId>graphar</artifactId>
<version>${graphar.version}</version>
</parent>

-<groupId>com.alibaba</groupId>
+<groupId>org.apache</groupId>
<artifactId>graphar-datasources</artifactId>
<version>${graphar.version}</version>
<packaging>jar</packaging>
This file was deleted.

@@ -14,9 +14,9 @@
* limitations under the License.
*/

-package com.alibaba.graphar.datasources
+package org.apache.graphar.datasources

-import com.alibaba.graphar.GeneralParams
+import org.apache.graphar.GeneralParams

import org.json4s._
import org.json4s.jackson.JsonMethods._
@@ -14,7 +14,7 @@
* limitations under the License.
*/

-package com.alibaba.graphar.datasources
+package org.apache.graphar.datasources

import scala.collection.JavaConverters._
import scala.util.matching.Regex
@@ -14,7 +14,7 @@
* limitations under the License.
*/

-package com.alibaba.graphar.datasources
+package org.apache.graphar.datasources

import scala.collection.JavaConverters._
import scala.collection.mutable.ArrayBuffer
@@ -14,7 +14,7 @@
* limitations under the License.
*/

-package com.alibaba.graphar.datasources
+package org.apache.graphar.datasources

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connector.read.Scan
@@ -14,7 +14,7 @@
* limitations under the License.
*/

-package com.alibaba.graphar.datasources
+package org.apache.graphar.datasources

import scala.collection.JavaConverters._

@@ -31,9 +31,9 @@ import org.apache.spark.sql.execution.datasources.v2.FileTable
import org.apache.spark.sql.types._
import org.apache.spark.sql.util.CaseInsensitiveStringMap

-import com.alibaba.graphar.datasources.csv.CSVWriteBuilder
-import com.alibaba.graphar.datasources.parquet.ParquetWriteBuilder
-import com.alibaba.graphar.datasources.orc.OrcWriteBuilder
+import org.apache.graphar.datasources.csv.CSVWriteBuilder
+import org.apache.graphar.datasources.parquet.ParquetWriteBuilder
+import org.apache.graphar.datasources.orc.OrcWriteBuilder

/** GarTable is a class to represent the graph data in GraphAr as a table. */
case class GarTable(
@@ -16,7 +16,7 @@
* The implementation of GarWriteBuilder is referred from FileWriteBuilder of spark 3.1.1
*/

-package com.alibaba.graphar.datasources
+package org.apache.graphar.datasources

import java.util.UUID

@@ -16,7 +16,7 @@
 * The implementation of CSVWriteBuilder is referred from CSVWriteBuilder of spark 3.1.1
*/

-package com.alibaba.graphar.datasources.csv
+package org.apache.graphar.datasources.csv

import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext}
import org.apache.spark.sql.catalyst.csv.CSVOptions
@@ -31,7 +31,7 @@ import org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{DataType, StructType}

-import com.alibaba.graphar.datasources.GarWriteBuilder
+import org.apache.graphar.datasources.GarWriteBuilder

class CSVWriteBuilder(
paths: Seq[String],
@@ -16,7 +16,7 @@
* The implementation of OrcOutputWriter is referred from OrcOutputWriter of spark 3.1.1
*/

-package com.alibaba.graphar.datasources.orc
+package org.apache.graphar.datasources.orc

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.NullWritable