Initial PR (#1)
* Initial PR

Co-authored-by: Liang-Chi Hsieh <liangchi@apple.com>
Co-authored-by: Kazuyuki Tanimura <ktanimura@apple.com>
Co-authored-by: Steve Vaughan Jr <s_vaughan@apple.com>
Co-authored-by: Huaxin Gao <huaxin_gao@apple.com>
Co-authored-by: Parth Chandra <parthc@apple.com>
Co-authored-by: Oleksandr Voievodin <ovoievodin@apple.com>

* Add license header to Makefile. Remove unnecessary file core/.lldbinit.

* Update DEBUGGING.md

* add license and address comments

---------

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
8 people committed Feb 9, 2024
1 parent 20edb17 commit 383c8fd
Showing 233 changed files with 55,020 additions and 1 deletion.
10 changes: 10 additions & 0 deletions .gitignore
@@ -0,0 +1,10 @@
target
.idea
*.iml
derby.log
metastore_db/
spark-warehouse/
dependency-reduced-pom.xml
core/src/execution/generated
prebuild
.flattened-pom.xml
27 changes: 27 additions & 0 deletions .scalafix.conf
@@ -0,0 +1,27 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
rules = [
ExplicitResultTypes,
NoAutoTupling,
RemoveUnused,

DisableSyntax,
LeakingImplicitClassVal,
NoValInForComprehension,
ProcedureSyntax,
RedundantSyntax
]
96 changes: 96 additions & 0 deletions DEBUGGING.md
@@ -0,0 +1,96 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Comet Debugging Guide

This HOWTO describes how to debug JVM code and native code concurrently. The guide assumes you have:
1. IntelliJ as the Java IDE
2. CLion as the native IDE. For Rust code, the CLion Rust language plugin is required. Note that the
IntelliJ Rust plugin is not sufficient.
3. CLion/LLDB as the native debugger. CLion ships with a bundled LLDB and the Rust community has
its own packaging of LLDB (`lldb-rust`). Both provide a better display of Rust symbols than plain
LLDB or the LLDB that is bundled with Xcode. We will use the LLDB packaged with CLion for this guide.
4. A Comet _unit_ test, which we will use as the canonical use case.

_Caveat: The steps here have only been tested with JDK 11 on Mac (M1)._

## Debugging for Advanced Developers

Add a `.lldbinit` to comet/core. This is not strictly necessary but will be useful if you want to
use advanced `lldb` debugging.

### In IntelliJ

1. Set a breakpoint in `NativeBase.load()`, at a point _after_ the Comet library has been loaded.

1. Add a Debug Configuration for the unit test

1. In the Debug Configuration for that unit test add `-Xint` as a JVM parameter. This flag runs the
JVM in interpreter-only mode; why it is needed here is not documented. Without it, the LLDB debugger
hits an `EXC_BAD_ACCESS` (or `EXC_BAD_INSTRUCTION`) from which one cannot recover.

1. Add a println to the unit test to print the PID of the JVM process. (`jps` can also be used, but printing the PID is less error-prone if you have multiple JVM processes running.)

For JDK 8:
```scala
import java.lang.management.ManagementFactory
println("Waiting for Debugger: PID - " + ManagementFactory.getRuntimeMXBean().getName())
```
This will print something like `PID@your_machine_name`.

For JDK 9 and newer:
```scala
println("Waiting for Debugger: PID - " + ProcessHandle.current().pid())
```

==> Note the PID

1. Debug-run the test in IntelliJ and wait for the breakpoint to be hit
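
The PID printout and the `-Xint` setting from the steps above can be checked together from inside the test. The following is a minimal Java sketch (the same calls can be made from a Scala test). `DebugAttachInfo` and its method names are hypothetical and not part of Comet. It assumes the HotSpot convention that the runtime name looks like `PID@hostname`, and `startedWithXint` only sees flags that were passed as JVM arguments.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

// Hypothetical helper, not part of Comet: call it from the unit test being debugged.
class DebugAttachInfo {

    // Extracts the PID from a runtime name of the form "PID@hostname" (JDK 8 style).
    // The "PID@hostname" shape is a HotSpot convention, not a guarantee of the API.
    static long pidFromRuntimeName(String runtimeName) {
        int at = runtimeName.indexOf('@');
        if (at <= 0) {
            throw new IllegalArgumentException("Unexpected runtime name: " + runtimeName);
        }
        return Long.parseLong(runtimeName.substring(0, at));
    }

    // True if -Xint appears among the arguments the JVM was started with.
    static boolean startedWithXint() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        return runtime.getInputArguments().contains("-Xint");
    }

    public static void main(String[] args) {
        long pid = ProcessHandle.current().pid(); // JDK 9 and newer
        System.out.println("Waiting for Debugger: PID - " + pid);
        System.out.println("Started with -Xint: " + startedWithXint());
    }
}
```

If the second line reports `false`, the Debug Configuration is probably not passing `-Xint` to the test JVM, which is worth fixing before attaching LLDB.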

### In CLion

1. After the breakpoint is hit in IntelliJ, switch to CLion (or LLDB from a terminal or editor).

1. Attach to the JVM process (make sure the PID matches). In CLion, this is `Run -> Attach to Process`

1. Put your breakpoint in the native code

1. Go back to IntelliJ and resume the process.

1. Most debugging in CLion is similar to Intellij. For advanced LLDB based debugging the LLDB command line can be accessed from the LLDB tab in the Debugger view. Refer to the [LLDB manual](https://lldb.llvm.org/use/tutorial.html) for LLDB commands.

### After your debugging is done

1. In CLion, detach from the process if not already detached

2. In IntelliJ, the debugger might have lost track of the process. If so, the debugger tab
will show the process as running (even if the test/job is shown as completed).

3. Close the debugger tab, and if the IDE asks whether it should terminate the process,
click Yes.

4. In a terminal, use `jps` to identify the process with the process id you were debugging. If
it shows up as running, run `kill -9 [pid]`. If that doesn't remove the process, don't bother:
the process will be left behind as a zombie and will consume no (significant) resources.
It will eventually be cleaned up when you reboot, possibly after a software update.
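
The last cleanup step can also be done from the JVM side. The sketch below uses the standard `ProcessHandle` API (JDK 9 and newer) to check whether a PID is still alive and to request forcible termination. The class `LeftoverProcessCheck` and its method names are hypothetical and not part of Comet, and the result of a forced kill remains up to the operating system.

```java
import java.util.Optional;

// Hypothetical helper, not part of Comet: checks on a JVM left behind after debugging.
class LeftoverProcessCheck {

    // True if a process with this PID exists and is still alive.
    static boolean isStillRunning(long pid) {
        Optional<ProcessHandle> handle = ProcessHandle.of(pid);
        return handle.map(ProcessHandle::isAlive).orElse(false);
    }

    // Requests forcible termination of the process.
    // Returns false if the process was not found or termination could not be requested.
    // Note: destroyForcibly() throws IllegalStateException when called on the current process.
    static boolean forceKill(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::destroyForcibly).orElse(false);
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("usage: java LeftoverProcessCheck.java <pid>");
            return;
        }
        long pid = Long.parseLong(args[0]);
        if (!isStillRunning(pid)) {
            System.out.println("Process " + pid + " is not running");
        } else {
            System.out.println("Kill requested for " + pid + ": " + forceKill(pid));
        }
    }
}
```

Pass the PID noted earlier as the only argument. A `true` result only means the termination request was accepted, so the process may still linger as described in the last step.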

### Additional Info

OpenJDK mailing list on debugging the JDK on macOS
https://mail.openjdk.org/pipermail/hotspot-dev/2019-September/039429.html

Detecting the debugger
https://stackoverflow.com/questions/5393403/can-a-java-application-detect-that-a-debugger-is-attached
65 changes: 65 additions & 0 deletions DEVELOPMENT.md
@@ -0,0 +1,65 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Comet Development Guide

## Project Layout

```
├── common <- common Java/Scala code
├── conf <- configuration files
├── core <- core native code, in Rust
├── spark <- Spark integration
```

## Development Setup

1. Make sure `JAVA_HOME` is set and points to a JDK 11 installation.
2. Install the Rust toolchain. The easiest way is to use
[rustup](https://rustup.rs).
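
To double-check the first setup step, a small program can report what the JVM itself sees. This is only a sketch: `JdkSetupCheck` is a hypothetical name and not part of Comet, and the program reports the JVM it is run with, which may differ from `JAVA_HOME` if the `java` on your `PATH` comes from another installation.

```java
// Hypothetical helper, not part of Comet: reports which JDK is in use.
class JdkSetupCheck {

    // Feature (major) version from a java.version string,
    // e.g. "11.0.21" -> 11 and the legacy form "1.8.0_392" -> 8.
    static int featureVersion(String javaVersion) {
        String[] parts = javaVersion.split("[._+-]");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        String javaHome = System.getenv("JAVA_HOME");
        System.out.println("JAVA_HOME = " + (javaHome == null ? "(not set)" : javaHome));
        System.out.println("java.home = " + System.getProperty("java.home"));
        int feature = featureVersion(System.getProperty("java.version"));
        System.out.println("JDK feature version = " + feature
                + (feature == 11 ? "" : " (expected 11)"));
    }
}
```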

## Build & Test

A few common commands are specified in the project's `Makefile`:

- `make`: compile the entire project, but don't run tests
- `make test`: compile the project and run tests on both the Rust and Java
sides.
- `make release`: compile the project and create a release build. This
is useful when you want to test a local Comet installation in another project
such as Spark.
- `make clean`: clean up the workspace
- `bin/comet-spark-shell -d . -o spark/target/`: run the Comet Spark shell for V1 datasources
- `bin/comet-spark-shell -d . -o spark/target/ --conf spark.sql.sources.useV1SourceList=""`: run the Comet Spark shell for V2 datasources

## Benchmark

There's a `make` command to run micro benchmarks in the repo. For
instance:

```
make benchmark-org.apache.spark.sql.benchmark.CometReadBenchmark
```

To run TPC-H or TPC-DS micro benchmarks, please follow the instructions
in the respective source code, e.g., `CometTPCHQueryBenchmark`.

## Debugging
Comet is a multi-language project with native code written in Rust and JVM code written in Java and Scala.
It is possible to debug both native and JVM code concurrently, as described in the [DEBUGGING guide](DEBUGGING.md).
96 changes: 96 additions & 0 deletions EXPRESSIONS.md
@@ -0,0 +1,96 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Expressions Supported by Comet

The following Spark expressions are currently available:

+ Literals
+ Arithmetic Operators
    + UnaryMinus
    + Add/Minus/Multiply/Divide/Remainder
+ Conditional functions
    + Case When
    + If
+ Cast
+ Coalesce
+ Boolean functions
    + And
    + Or
    + Not
    + EqualTo
    + EqualNullSafe
    + GreaterThan
    + GreaterThanOrEqual
    + LessThan
    + LessThanOrEqual
    + IsNull
    + IsNotNull
    + In
+ String functions
    + Substring
    + Coalesce
    + StringSpace
    + Like
    + Contains
    + Startswith
    + Endswith
    + Ascii
    + Bit_length
    + Octet_length
    + Upper
    + Lower
    + Chr
    + Initcap
    + Trim/Btrim/Ltrim/Rtrim
    + Concat_ws
    + Repeat
    + Length
    + Reverse
    + Instr
    + Replace
    + Translate
+ Bitwise functions
    + Shiftright/Shiftleft
+ Date/Time functions
    + Year/Hour/Minute/Second
+ Math functions
    + Abs
    + Acos
    + Asin
    + Atan
    + Atan2
    + Cos
    + Exp
    + Ln
    + Log10
    + Log2
    + Pow
    + Round
    + Signum
    + Sin
    + Sqrt
    + Tan
    + Ceil
    + Floor
+ Aggregate functions
    + Count
    + Sum
    + Max
    + Min
