Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(Interactive): Support more configurable options for rt_mutable_graph bulk loading with csv file #3116

Closed
zhanglei1949 opened this issue Aug 16, 2023 · 0 comments · Fixed by #3154

Comments

@zhanglei1949
Copy link
Collaborator

zhanglei1949 commented Aug 16, 2023

Currently the supported options are very restricted. For example,
the delimiter can only be |, and user cannot configure whether to skip header, or some commented lines.

@zhanglei1949 zhanglei1949 self-assigned this Aug 16, 2023
@zhanglei1949 zhanglei1949 added this to To do in GIE GraphDB-like production PoC via automation Aug 16, 2023
@longbinlai longbinlai changed the title refactor(Interactive): Support more delimeter for rt_mutable_graph bulk loading with csv file refactor(Interactive): Support more configurable options for rt_mutable_graph bulk loading with csv file Aug 18, 2023
GIE GraphDB-like production PoC automation moved this from To do to Done Sep 8, 2023
zhanglei1949 added a commit that referenced this issue Sep 8, 2023
…3154)

1. Use Arrow CSV Reader to replace current adhoc csv reader, to support
more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` will support loading
fragment from csv with options:
- delimeter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default'\\'
- batch_size: the batch size of when reading file into memory, default
1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` will be used to parse the input file.
Otherwise, `arrow::TableReader` will be used.


With this PR merged, the performance of graph loading will be improved. 
The Adhoc Reader denote the current implemented csv parser, 1,2,4,8
denotes the parallelism of graph loading, i.e. how many labels of
vertex/edge are concurrently processed.

Note that TableReader is around 10x faster than StreamingReader. The
possible reason could be the multi-threading is used.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.


### SF30
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |805s|	468s|	349s|	313s|
| Adhoc Reader | Serialization | 126s|	126s|	126s|	126s|
| Adhoc Reader  | **Total** |931s|	594s|	475s|	439s|
| Table Reader |  ReadFile | 9s	|9s	|9s|	9s|
| Table Reader | LoadGraph |455s|	280s|	211s|	182s|
| Table Reader |Serialization |126s|	126s|	126s|	126s|
| Table Reader | **Total** | 600s|	415s|	346s|	317s|
| Streaming Reader | ReadFile |91s|	91s|	91s|	91s|
| Streaming Reader | LoadGraph | 555s|	289s|	196s|	149s|
| Streaming Reader | Serialization |126s|	126s|	126s|	126s|
| Streaming Reader | **Total** | 772s|	506s|	413s|	366s|

### SF100
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |2720s|	1548s|	1176s|	948s|
| Adhoc Reader | Serialization | 409s|	409s|	409s|	409s|
| Adhoc Reader  | **Total** | 3129s|	1957s|	1585s|	1357s|
| Table Reader |  ReadFile |24s|	24s|	24s|	24s|
| Table Reader | LoadGraph |1576s|	949s|	728s|	602s|
| Table Reader |Serialization |409s|	409s|	409s|	409s|
| Table Reader | **Total** | 2009s|	1382s|	1161s|	1035s|
| Streaming Reader | ReadFile |300s|	300s|	300s|	300s|
| Streaming Reader | LoadGraph | 1740s|	965s|	669s|	497s|
| Streaming Reader | Serialization | 409s|	409s|	409s|	409s|
| Streaming Reader | **Total** | 2539s|	1674s|	1378s|	1206s|
### SF300
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s|	4900s	|3603s	|2999s|
| Adhoc Reader | Serialization | 1201s |	1201s|	1201s	|1201s|
| Adhoc Reader  | **Total** | 9461s|	6101s | 4804s	|4200s|
| Table Reader |  ReadFile | 73s	|73s|	96s|	96s|
| Table Reader | LoadGraph |4650s|	2768s|	2155s	|1778s|
| Table Reader |Serialization | 1201s |	1201s|	1201s	|1201s|
| Table Reader | **Total** | 5924s|	4042s|	3452s|	3075s|
| Streaming Reader | ReadFile | 889s |889s | 889s| 889s|
| Streaming Reader | LoadGraph | 5589s|	3005s|	2200s|	1712s|
| Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s |
| Streaming Reader | **Total** | 7679s	| 5095s |4290s| 	3802s|


FIx #3116
zhanglei1949 added a commit to zhanglei1949/GraphScope that referenced this issue Sep 11, 2023
…libaba#3154)

1. Use Arrow CSV Reader to replace current adhoc csv reader, to support
more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` will support loading
fragment from csv with options:
- delimeter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default'\\'
- batch_size: the batch size of when reading file into memory, default
1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` will be used to parse the input file.
Otherwise, `arrow::TableReader` will be used.

With this PR merged, the performance of graph loading will be improved.
The Adhoc Reader denote the current implemented csv parser, 1,2,4,8
denotes the parallelism of graph loading, i.e. how many labels of
vertex/edge are concurrently processed.

Note that TableReader is around 10x faster than StreamingReader. The
possible reason could be the multi-threading is used.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |805s|	468s|	349s|	313s|
| Adhoc Reader | Serialization | 126s|	126s|	126s|	126s|
| Adhoc Reader  | **Total** |931s|	594s|	475s|	439s|
| Table Reader |  ReadFile | 9s	|9s	|9s|	9s|
| Table Reader | LoadGraph |455s|	280s|	211s|	182s|
| Table Reader |Serialization |126s|	126s|	126s|	126s|
| Table Reader | **Total** | 600s|	415s|	346s|	317s|
| Streaming Reader | ReadFile |91s|	91s|	91s|	91s|
| Streaming Reader | LoadGraph | 555s|	289s|	196s|	149s|
| Streaming Reader | Serialization |126s|	126s|	126s|	126s|
| Streaming Reader | **Total** | 772s|	506s|	413s|	366s|

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |2720s|	1548s|	1176s|	948s|
| Adhoc Reader | Serialization | 409s|	409s|	409s|	409s|
| Adhoc Reader  | **Total** | 3129s|	1957s|	1585s|	1357s|
| Table Reader |  ReadFile |24s|	24s|	24s|	24s|
| Table Reader | LoadGraph |1576s|	949s|	728s|	602s|
| Table Reader |Serialization |409s|	409s|	409s|	409s|
| Table Reader | **Total** | 2009s|	1382s|	1161s|	1035s|
| Streaming Reader | ReadFile |300s|	300s|	300s|	300s|
| Streaming Reader | LoadGraph | 1740s|	965s|	669s|	497s|
| Streaming Reader | Serialization | 409s|	409s|	409s|	409s|
| Streaming Reader | **Total** | 2539s|	1674s|	1378s|	1206s|
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s|	4900s	|3603s	|2999s|
| Adhoc Reader | Serialization | 1201s |	1201s|	1201s	|1201s|
| Adhoc Reader  | **Total** | 9461s|	6101s | 4804s	|4200s|
| Table Reader |  ReadFile | 73s	|73s|	96s|	96s|
| Table Reader | LoadGraph |4650s|	2768s|	2155s	|1778s|
| Table Reader |Serialization | 1201s |	1201s|	1201s	|1201s|
| Table Reader | **Total** | 5924s|	4042s|	3452s|	3075s|
| Streaming Reader | ReadFile | 889s |889s | 889s| 889s|
| Streaming Reader | LoadGraph | 5589s|	3005s|	2200s|	1712s|
| Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s |
| Streaming Reader | **Total** | 7679s	| 5095s |4290s| 	3802s|

FIx alibaba#3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash
zhanglei1949 pushed a commit to zhanglei1949/GraphScope that referenced this issue Sep 12, 2023
refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (alibaba#3154)

1. Use Arrow CSV Reader to replace current adhoc csv reader, to support
more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` will support loading
fragment from csv with options:
- delimeter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default'\\'
- batch_size: the batch size of when reading file into memory, default
1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` will be used to parse the input file.
Otherwise, `arrow::TableReader` will be used.

With this PR merged, the performance of graph loading will be improved.
The Adhoc Reader denote the current implemented csv parser, 1,2,4,8
denotes the parallelism of graph loading, i.e. how many labels of
vertex/edge are concurrently processed.

Note that TableReader is around 10x faster than StreamingReader. The
possible reason could be the multi-threading is used.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |805s|	468s|	349s|	313s|
| Adhoc Reader | Serialization | 126s|	126s|	126s|	126s|
| Adhoc Reader  | **Total** |931s|	594s|	475s|	439s|
| Table Reader |  ReadFile | 9s	|9s	|9s|	9s|
| Table Reader | LoadGraph |455s|	280s|	211s|	182s|
| Table Reader |Serialization |126s|	126s|	126s|	126s|
| Table Reader | **Total** | 600s|	415s|	346s|	317s|
| Streaming Reader | ReadFile |91s|	91s|	91s|	91s|
| Streaming Reader | LoadGraph | 555s|	289s|	196s|	149s|
| Streaming Reader | Serialization |126s|	126s|	126s|	126s|
| Streaming Reader | **Total** | 772s|	506s|	413s|	366s|

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |2720s|	1548s|	1176s|	948s|
| Adhoc Reader | Serialization | 409s|	409s|	409s|	409s|
| Adhoc Reader  | **Total** | 3129s|	1957s|	1585s|	1357s|
| Table Reader |  ReadFile |24s|	24s|	24s|	24s|
| Table Reader | LoadGraph |1576s|	949s|	728s|	602s|
| Table Reader |Serialization |409s|	409s|	409s|	409s|
| Table Reader | **Total** | 2009s|	1382s|	1161s|	1035s|
| Streaming Reader | ReadFile |300s|	300s|	300s|	300s|
| Streaming Reader | LoadGraph | 1740s|	965s|	669s|	497s|
| Streaming Reader | Serialization | 409s|	409s|	409s|	409s|
| Streaming Reader | **Total** | 2539s|	1674s|	1378s|	1206s|
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s|	4900s	|3603s	|2999s|
| Adhoc Reader | Serialization | 1201s |	1201s|	1201s	|1201s|
| Adhoc Reader  | **Total** | 9461s|	6101s | 4804s	|4200s|
| Table Reader |  ReadFile | 73s	|73s|	96s|	96s|
| Table Reader | LoadGraph |4650s|	2768s|	2155s	|1778s|
| Table Reader |Serialization | 1201s |	1201s|	1201s	|1201s|
| Table Reader | **Total** | 5924s|	4042s|	3452s|	3075s|
| Streaming Reader | ReadFile | 889s |889s | 889s| 889s|
| Streaming Reader | LoadGraph | 5589s|	3005s|	2200s|	1712s|
| Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s |
| Streaming Reader | **Total** | 7679s	| 5095s |4290s| 	3802s|

FIx alibaba#3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash

minor change

remove plugin-dir

fix

minor fix

debug

debug

fix

fix
zhanglei1949 pushed a commit to zhanglei1949/GraphScope that referenced this issue Sep 15, 2023
refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (alibaba#3154)

1. Use Arrow CSV Reader to replace current adhoc csv reader, to support
more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` will support loading
fragment from csv with options:
- delimeter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default'\\'
- batch_size: the batch size of when reading file into memory, default
1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` will be used to parse the input file.
Otherwise, `arrow::TableReader` will be used.

With this PR merged, the performance of graph loading will be improved.
The Adhoc Reader denote the current implemented csv parser, 1,2,4,8
denotes the parallelism of graph loading, i.e. how many labels of
vertex/edge are concurrently processed.

Note that TableReader is around 10x faster than StreamingReader. The
possible reason could be the multi-threading is used.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |805s|	468s|	349s|	313s|
| Adhoc Reader | Serialization | 126s|	126s|	126s|	126s|
| Adhoc Reader  | **Total** |931s|	594s|	475s|	439s|
| Table Reader |  ReadFile | 9s	|9s	|9s|	9s|
| Table Reader | LoadGraph |455s|	280s|	211s|	182s|
| Table Reader |Serialization |126s|	126s|	126s|	126s|
| Table Reader | **Total** | 600s|	415s|	346s|	317s|
| Streaming Reader | ReadFile |91s|	91s|	91s|	91s|
| Streaming Reader | LoadGraph | 555s|	289s|	196s|	149s|
| Streaming Reader | Serialization |126s|	126s|	126s|	126s|
| Streaming Reader | **Total** | 772s|	506s|	413s|	366s|

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |2720s|	1548s|	1176s|	948s|
| Adhoc Reader | Serialization | 409s|	409s|	409s|	409s|
| Adhoc Reader  | **Total** | 3129s|	1957s|	1585s|	1357s|
| Table Reader |  ReadFile |24s|	24s|	24s|	24s|
| Table Reader | LoadGraph |1576s|	949s|	728s|	602s|
| Table Reader |Serialization |409s|	409s|	409s|	409s|
| Table Reader | **Total** | 2009s|	1382s|	1161s|	1035s|
| Streaming Reader | ReadFile |300s|	300s|	300s|	300s|
| Streaming Reader | LoadGraph | 1740s|	965s|	669s|	497s|
| Streaming Reader | Serialization | 409s|	409s|	409s|	409s|
| Streaming Reader | **Total** | 2539s|	1674s|	1378s|	1206s|
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s|	4900s	|3603s	|2999s|
| Adhoc Reader | Serialization | 1201s |	1201s|	1201s	|1201s|
| Adhoc Reader  | **Total** | 9461s|	6101s | 4804s	|4200s|
| Table Reader |  ReadFile | 73s	|73s|	96s|	96s|
| Table Reader | LoadGraph |4650s|	2768s|	2155s	|1778s|
| Table Reader |Serialization | 1201s |	1201s|	1201s	|1201s|
| Table Reader | **Total** | 5924s|	4042s|	3452s|	3075s|
| Streaming Reader | ReadFile | 889s |889s | 889s| 889s|
| Streaming Reader | LoadGraph | 5589s|	3005s|	2200s|	1712s|
| Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s |
| Streaming Reader | **Total** | 7679s	| 5095s |4290s| 	3802s|

FIx alibaba#3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash

minor change

remove plugin-dir

fix

minor fix

debug

debug

fix

fix

fix bulk_load.yaml

bash format

some fix

fix format

fix grin test

some fi

check ci

fix ci

set

fix ci

fix

dd

f

disable tmate
zhanglei1949 pushed a commit to zhanglei1949/GraphScope that referenced this issue Sep 19, 2023
refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (alibaba#3154)

1. Use Arrow CSV Reader to replace current adhoc csv reader, to support
more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` will support loading
fragment from csv with options:
- delimeter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default'\\'
- batch_size: the batch size of when reading file into memory, default
1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` will be used to parse the input file.
Otherwise, `arrow::TableReader` will be used.

With this PR merged, the performance of graph loading will be improved.
The Adhoc Reader denote the current implemented csv parser, 1,2,4,8
denotes the parallelism of graph loading, i.e. how many labels of
vertex/edge are concurrently processed.

Note that TableReader is around 10x faster than StreamingReader. The
possible reason could be the multi-threading is used.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |805s|	468s|	349s|	313s|
| Adhoc Reader | Serialization | 126s|	126s|	126s|	126s|
| Adhoc Reader  | **Total** |931s|	594s|	475s|	439s|
| Table Reader |  ReadFile | 9s	|9s	|9s|	9s|
| Table Reader | LoadGraph |455s|	280s|	211s|	182s|
| Table Reader |Serialization |126s|	126s|	126s|	126s|
| Table Reader | **Total** | 600s|	415s|	346s|	317s|
| Streaming Reader | ReadFile |91s|	91s|	91s|	91s|
| Streaming Reader | LoadGraph | 555s|	289s|	196s|	149s|
| Streaming Reader | Serialization |126s|	126s|	126s|	126s|
| Streaming Reader | **Total** | 772s|	506s|	413s|	366s|

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |2720s|	1548s|	1176s|	948s|
| Adhoc Reader | Serialization | 409s|	409s|	409s|	409s|
| Adhoc Reader  | **Total** | 3129s|	1957s|	1585s|	1357s|
| Table Reader |  ReadFile |24s|	24s|	24s|	24s|
| Table Reader | LoadGraph |1576s|	949s|	728s|	602s|
| Table Reader |Serialization |409s|	409s|	409s|	409s|
| Table Reader | **Total** | 2009s|	1382s|	1161s|	1035s|
| Streaming Reader | ReadFile |300s|	300s|	300s|	300s|
| Streaming Reader | LoadGraph | 1740s|	965s|	669s|	497s|
| Streaming Reader | Serialization | 409s|	409s|	409s|	409s|
| Streaming Reader | **Total** | 2539s|	1674s|	1378s|	1206s|
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s|	4900s	|3603s	|2999s|
| Adhoc Reader | Serialization | 1201s |	1201s|	1201s	|1201s|
| Adhoc Reader  | **Total** | 9461s|	6101s | 4804s	|4200s|
| Table Reader |  ReadFile | 73s	|73s|	96s|	96s|
| Table Reader | LoadGraph |4650s|	2768s|	2155s	|1778s|
| Table Reader |Serialization | 1201s |	1201s|	1201s	|1201s|
| Table Reader | **Total** | 5924s|	4042s|	3452s|	3075s|
| Streaming Reader | ReadFile | 889s |889s | 889s| 889s|
| Streaming Reader | LoadGraph | 5589s|	3005s|	2200s|	1712s|
| Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s |
| Streaming Reader | **Total** | 7679s	| 5095s |4290s| 	3802s|

FIx alibaba#3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash

minor change

remove plugin-dir

fix

minor fix

debug

debug

fix

fix

fix bulk_load.yaml

bash format

some fix

fix format

fix grin test

some fi

check ci

fix ci

set

fix ci

fix

dd

f

disable tmate

fix some bug

fix

fix

refactor

fix

fix

fix

minor

some fix
zhanglei1949 pushed a commit to zhanglei1949/GraphScope that referenced this issue Sep 21, 2023
refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (alibaba#3154)

1. Use Arrow CSV Reader to replace current adhoc csv reader, to support
more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` will support loading
fragment from csv with options:
- delimeter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default'\\'
- batch_size: the batch size of when reading file into memory, default
1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` will be used to parse the input file.
Otherwise, `arrow::TableReader` will be used.

With this PR merged, the performance of graph loading will be improved.
The Adhoc Reader denote the current implemented csv parser, 1,2,4,8
denotes the parallelism of graph loading, i.e. how many labels of
vertex/edge are concurrently processed.

Note that TableReader is around 10x faster than StreamingReader. The
possible reason could be the multi-threading is used.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |805s|	468s|	349s|	313s|
| Adhoc Reader | Serialization | 126s|	126s|	126s|	126s|
| Adhoc Reader  | **Total** |931s|	594s|	475s|	439s|
| Table Reader |  ReadFile | 9s	|9s	|9s|	9s|
| Table Reader | LoadGraph |455s|	280s|	211s|	182s|
| Table Reader |Serialization |126s|	126s|	126s|	126s|
| Table Reader | **Total** | 600s|	415s|	346s|	317s|
| Streaming Reader | ReadFile |91s|	91s|	91s|	91s|
| Streaming Reader | LoadGraph | 555s|	289s|	196s|	149s|
| Streaming Reader | Serialization |126s|	126s|	126s|	126s|
| Streaming Reader | **Total** | 772s|	506s|	413s|	366s|

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |2720s|	1548s|	1176s|	948s|
| Adhoc Reader | Serialization | 409s|	409s|	409s|	409s|
| Adhoc Reader  | **Total** | 3129s|	1957s|	1585s|	1357s|
| Table Reader |  ReadFile |24s|	24s|	24s|	24s|
| Table Reader | LoadGraph |1576s|	949s|	728s|	602s|
| Table Reader |Serialization |409s|	409s|	409s|	409s|
| Table Reader | **Total** | 2009s|	1382s|	1161s|	1035s|
| Streaming Reader | ReadFile |300s|	300s|	300s|	300s|
| Streaming Reader | LoadGraph | 1740s|	965s|	669s|	497s|
| Streaming Reader | Serialization | 409s|	409s|	409s|	409s|
| Streaming Reader | **Total** | 2539s|	1674s|	1378s|	1206s|
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s|	4900s	|3603s	|2999s|
| Adhoc Reader | Serialization | 1201s |	1201s|	1201s	|1201s|
| Adhoc Reader  | **Total** | 9461s|	6101s | 4804s	|4200s|
| Table Reader |  ReadFile | 73s	|73s|	96s|	96s|
| Table Reader | LoadGraph |4650s|	2768s|	2155s	|1778s|
| Table Reader |Serialization | 1201s |	1201s|	1201s	|1201s|
| Table Reader | **Total** | 5924s|	4042s|	3452s|	3075s|
| Streaming Reader | ReadFile | 889s |889s | 889s| 889s|
| Streaming Reader | LoadGraph | 5589s|	3005s|	2200s|	1712s|
| Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s |
| Streaming Reader | **Total** | 7679s	| 5095s |4290s| 	3802s|

FIx alibaba#3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash

minor change

remove plugin-dir

fix

minor fix

debug

debug

fix

fix

fix bulk_load.yaml

bash format

some fix

fix format

fix grin test

some fi

check ci

fix ci

set

fix ci

fix

dd

f

disable tmate

fix some bug

fix

fix

refactor

fix

fix

fix

minor

some fix

fix

support default src_dst primarykey mapping in bulk load

fix

fix

fix

fix

Ci
zhanglei1949 pushed a commit to zhanglei1949/GraphScope that referenced this issue Sep 21, 2023
refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (alibaba#3154)

1. Use Arrow CSV Reader to replace current adhoc csv reader, to support
more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` will support loading
fragment from csv with options:
- delimeter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default'\\'
- batch_size: the batch size of when reading file into memory, default
1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` will be used to parse the input file.
Otherwise, `arrow::TableReader` will be used.

With this PR merged, the performance of graph loading will be improved.
The Adhoc Reader denote the current implemented csv parser, 1,2,4,8
denotes the parallelism of graph loading, i.e. how many labels of
vertex/edge are concurrently processed.

Note that TableReader is around 10x faster than StreamingReader. The
possible reason could be the multi-threading is used.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |805s|	468s|	349s|	313s|
| Adhoc Reader | Serialization | 126s|	126s|	126s|	126s|
| Adhoc Reader  | **Total** |931s|	594s|	475s|	439s|
| Table Reader |  ReadFile | 9s	|9s	|9s|	9s|
| Table Reader | LoadGraph |455s|	280s|	211s|	182s|
| Table Reader |Serialization |126s|	126s|	126s|	126s|
| Table Reader | **Total** | 600s|	415s|	346s|	317s|
| Streaming Reader | ReadFile |91s|	91s|	91s|	91s|
| Streaming Reader | LoadGraph | 555s|	289s|	196s|	149s|
| Streaming Reader | Serialization |126s|	126s|	126s|	126s|
| Streaming Reader | **Total** | 772s|	506s|	413s|	366s|

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |2720s|	1548s|	1176s|	948s|
| Adhoc Reader | Serialization | 409s|	409s|	409s|	409s|
| Adhoc Reader  | **Total** | 3129s|	1957s|	1585s|	1357s|
| Table Reader |  ReadFile |24s|	24s|	24s|	24s|
| Table Reader | LoadGraph |1576s|	949s|	728s|	602s|
| Table Reader |Serialization |409s|	409s|	409s|	409s|
| Table Reader | **Total** | 2009s|	1382s|	1161s|	1035s|
| Streaming Reader | ReadFile |300s|	300s|	300s|	300s|
| Streaming Reader | LoadGraph | 1740s|	965s|	669s|	497s|
| Streaming Reader | Serialization | 409s|	409s|	409s|	409s|
| Streaming Reader | **Total** | 2539s|	1674s|	1378s|	1206s|
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s|	4900s	|3603s	|2999s|
| Adhoc Reader | Serialization | 1201s |	1201s|	1201s	|1201s|
| Adhoc Reader  | **Total** | 9461s|	6101s | 4804s	|4200s|
| Table Reader |  ReadFile | 73s	|73s|	96s|	96s|
| Table Reader | LoadGraph |4650s|	2768s|	2155s	|1778s|
| Table Reader |Serialization | 1201s |	1201s|	1201s	|1201s|
| Table Reader | **Total** | 5924s|	4042s|	3452s|	3075s|
| Streaming Reader | ReadFile | 889s |889s | 889s| 889s|
| Streaming Reader | LoadGraph | 5589s|	3005s|	2200s|	1712s|
| Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s |
| Streaming Reader | **Total** | 7679s	| 5095s |4290s| 	3802s|

FIx alibaba#3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash

minor change

remove plugin-dir

fix

minor fix

debug

debug

fix

fix

fix bulk_load.yaml

bash format

some fix

fix format

fix grin test

some fi

check ci

fix ci

set

fix ci

fix

dd

f

disable tmate

fix some bug

fix

fix

refactor

fix

fix

fix

minor

some fix

fix

support default src_dst primarykey mapping in bulk load

fix

fix

fix

fix

Ci

rename

fix java and add get_person_name.cypher
zhanglei1949 pushed a commit to zhanglei1949/GraphScope that referenced this issue Sep 21, 2023
refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (alibaba#3154)

1. Use Arrow CSV Reader to replace current adhoc csv reader, to support
more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` will support loading
fragment from csv with options:
- delimeter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default'\\'
- batch_size: the batch size of when reading file into memory, default
1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` will be used to parse the input file.
Otherwise, `arrow::TableReader` will be used.

With this PR merged, the performance of graph loading will be improved.
The Adhoc Reader denote the current implemented csv parser, 1,2,4,8
denotes the parallelism of graph loading, i.e. how many labels of
vertex/edge are concurrently processed.

Note that TableReader is around 10x faster than StreamingReader. The
possible reason could be the multi-threading is used.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |805s|	468s|	349s|	313s|
| Adhoc Reader | Serialization | 126s|	126s|	126s|	126s|
| Adhoc Reader  | **Total** |931s|	594s|	475s|	439s|
| Table Reader |  ReadFile | 9s	|9s	|9s|	9s|
| Table Reader | LoadGraph |455s|	280s|	211s|	182s|
| Table Reader |Serialization |126s|	126s|	126s|	126s|
| Table Reader | **Total** | 600s|	415s|	346s|	317s|
| Streaming Reader | ReadFile |91s|	91s|	91s|	91s|
| Streaming Reader | LoadGraph | 555s|	289s|	196s|	149s|
| Streaming Reader | Serialization |126s|	126s|	126s|	126s|
| Streaming Reader | **Total** | 772s|	506s|	413s|	366s|

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |2720s|	1548s|	1176s|	948s|
| Adhoc Reader | Serialization | 409s|	409s|	409s|	409s|
| Adhoc Reader  | **Total** | 3129s|	1957s|	1585s|	1357s|
| Table Reader |  ReadFile |24s|	24s|	24s|	24s|
| Table Reader | LoadGraph |1576s|	949s|	728s|	602s|
| Table Reader |Serialization |409s|	409s|	409s|	409s|
| Table Reader | **Total** | 2009s|	1382s|	1161s|	1035s|
| Streaming Reader | ReadFile |300s|	300s|	300s|	300s|
| Streaming Reader | LoadGraph | 1740s|	965s|	669s|	497s|
| Streaming Reader | Serialization | 409s|	409s|	409s|	409s|
| Streaming Reader | **Total** | 2539s|	1674s|	1378s|	1206s|
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s|	4900s	|3603s	|2999s|
| Adhoc Reader | Serialization | 1201s |	1201s|	1201s	|1201s|
| Adhoc Reader  | **Total** | 9461s|	6101s | 4804s	|4200s|
| Table Reader |  ReadFile | 73s	|73s|	96s|	96s|
| Table Reader | LoadGraph |4650s|	2768s|	2155s	|1778s|
| Table Reader |Serialization | 1201s |	1201s|	1201s	|1201s|
| Table Reader | **Total** | 5924s|	4042s|	3452s|	3075s|
| Streaming Reader | ReadFile | 889s |889s | 889s| 889s|
| Streaming Reader | LoadGraph | 5589s|	3005s|	2200s|	1712s|
| Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s |
| Streaming Reader | **Total** | 7679s	| 5095s |4290s| 	3802s|

FIx alibaba#3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash

minor change

remove plugin-dir

fix

minor fix

debug

debug

fix

fix

fix bulk_load.yaml

bash format

some fix

fix format

fix grin test

some fi

check ci

fix ci

set

fix ci

fix

dd

f

disable tmate

fix some bug

fix

fix

refactor

fix

fix

fix

minor

some fix

fix

support default src_dst primarykey mapping in bulk load

fix

fix

fix

fix

Ci

rename

fix java and add get_person_name.cypher
zhanglei1949 pushed a commit to zhanglei1949/GraphScope that referenced this issue Sep 22, 2023
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348300 +0800

parent 6ab796e
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348286 +0800

[GIE Compiler] fix bugs of columnId in schema

refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (alibaba#3154)

1. Use Arrow CSV Reader to replace current adhoc csv reader, to support
more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` will support loading
fragment from csv with options:
- delimeter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default'\\'
- batch_size: the batch size of when reading file into memory, default
1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` will be used to parse the input file.
Otherwise, `arrow::TableReader` will be used.

With this PR merged, the performance of graph loading will be improved.
The Adhoc Reader denote the current implemented csv parser, 1,2,4,8
denotes the parallelism of graph loading, i.e. how many labels of
vertex/edge are concurrently processed.

Note that TableReader is around 10x faster than StreamingReader. The
possible reason could be the multi-threading is used.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |805s|	468s|	349s|	313s|
| Adhoc Reader | Serialization | 126s|	126s|	126s|	126s|
| Adhoc Reader  | **Total** |931s|	594s|	475s|	439s|
| Table Reader |  ReadFile | 9s	|9s	|9s|	9s|
| Table Reader | LoadGraph |455s|	280s|	211s|	182s|
| Table Reader |Serialization |126s|	126s|	126s|	126s|
| Table Reader | **Total** | 600s|	415s|	346s|	317s|
| Streaming Reader | ReadFile |91s|	91s|	91s|	91s|
| Streaming Reader | LoadGraph | 555s|	289s|	196s|	149s|
| Streaming Reader | Serialization |126s|	126s|	126s|	126s|
| Streaming Reader | **Total** | 772s|	506s|	413s|	366s|

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |2720s|	1548s|	1176s|	948s|
| Adhoc Reader | Serialization | 409s|	409s|	409s|	409s|
| Adhoc Reader  | **Total** | 3129s|	1957s|	1585s|	1357s|
| Table Reader |  ReadFile |24s|	24s|	24s|	24s|
| Table Reader | LoadGraph |1576s|	949s|	728s|	602s|
| Table Reader |Serialization |409s|	409s|	409s|	409s|
| Table Reader | **Total** | 2009s|	1382s|	1161s|	1035s|
| Streaming Reader | ReadFile |300s|	300s|	300s|	300s|
| Streaming Reader | LoadGraph | 1740s|	965s|	669s|	497s|
| Streaming Reader | Serialization | 409s|	409s|	409s|	409s|
| Streaming Reader | **Total** | 2539s|	1674s|	1378s|	1206s|
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s|	4900s	|3603s	|2999s|
| Adhoc Reader | Serialization | 1201s |	1201s|	1201s	|1201s|
| Adhoc Reader  | **Total** | 9461s|	6101s | 4804s	|4200s|
| Table Reader |  ReadFile | 73s	|73s|	96s|	96s|
| Table Reader | LoadGraph |4650s|	2768s|	2155s	|1778s|
| Table Reader |Serialization | 1201s |	1201s|	1201s	|1201s|
| Table Reader | **Total** | 5924s|	4042s|	3452s|	3075s|
| Streaming Reader | ReadFile | 889s |889s | 889s| 889s|
| Streaming Reader | LoadGraph | 5589s|	3005s|	2200s|	1712s|
| Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s |
| Streaming Reader | **Total** | 7679s	| 5095s |4290s| 	3802s|

FIx alibaba#3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash

minor change

remove plugin-dir

fix

minor fix

debug

debug

fix

fix

fix bulk_load.yaml

bash format

some fix

fix format

fix grin test

some fi

check ci

fix ci

set

fix ci

fix

dd

f

disable tmate

fix some bug

fix

fix

refactor

fix

fix

fix

minor

some fix

fix

support default src_dst primarykey mapping in bulk load

fix

fix

fix

fix

Ci

rename

fix java and add get_person_name.cypher

[GIE Compiler] minor fix

use graphscope gstest

format

add movie queries
zhanglei1949 pushed a commit to zhanglei1949/GraphScope that referenced this issue Sep 24, 2023
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348300 +0800

parent 6ab796e
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348286 +0800

[GIE Compiler] fix bugs of columnId in schema

refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (alibaba#3154)

1. Use Arrow CSV Reader to replace current adhoc csv reader, to support
more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` will support loading
fragment from csv with options:
- delimeter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default'\\'
- batch_size: the batch size of when reading file into memory, default
1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` will be used to parse the input file.
Otherwise, `arrow::TableReader` will be used.

With this PR merged, the performance of graph loading will be improved.
The Adhoc Reader denote the current implemented csv parser, 1,2,4,8
denotes the parallelism of graph loading, i.e. how many labels of
vertex/edge are concurrently processed.

Note that TableReader is around 10x faster than StreamingReader. The
possible reason could be the multi-threading is used.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |805s|	468s|	349s|	313s|
| Adhoc Reader | Serialization | 126s|	126s|	126s|	126s|
| Adhoc Reader  | **Total** |931s|	594s|	475s|	439s|
| Table Reader |  ReadFile | 9s	|9s	|9s|	9s|
| Table Reader | LoadGraph |455s|	280s|	211s|	182s|
| Table Reader |Serialization |126s|	126s|	126s|	126s|
| Table Reader | **Total** | 600s|	415s|	346s|	317s|
| Streaming Reader | ReadFile |91s|	91s|	91s|	91s|
| Streaming Reader | LoadGraph | 555s|	289s|	196s|	149s|
| Streaming Reader | Serialization |126s|	126s|	126s|	126s|
| Streaming Reader | **Total** | 772s|	506s|	413s|	366s|

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |2720s|	1548s|	1176s|	948s|
| Adhoc Reader | Serialization | 409s|	409s|	409s|	409s|
| Adhoc Reader  | **Total** | 3129s|	1957s|	1585s|	1357s|
| Table Reader |  ReadFile |24s|	24s|	24s|	24s|
| Table Reader | LoadGraph |1576s|	949s|	728s|	602s|
| Table Reader |Serialization |409s|	409s|	409s|	409s|
| Table Reader | **Total** | 2009s|	1382s|	1161s|	1035s|
| Streaming Reader | ReadFile |300s|	300s|	300s|	300s|
| Streaming Reader | LoadGraph | 1740s|	965s|	669s|	497s|
| Streaming Reader | Serialization | 409s|	409s|	409s|	409s|
| Streaming Reader | **Total** | 2539s|	1674s|	1378s|	1206s|
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s|	4900s	|3603s	|2999s|
| Adhoc Reader | Serialization | 1201s |	1201s|	1201s	|1201s|
| Adhoc Reader  | **Total** | 9461s|	6101s | 4804s	|4200s|
| Table Reader |  ReadFile | 73s	|73s|	96s|	96s|
| Table Reader | LoadGraph |4650s|	2768s|	2155s	|1778s|
| Table Reader |Serialization | 1201s |	1201s|	1201s	|1201s|
| Table Reader | **Total** | 5924s|	4042s|	3452s|	3075s|
| Streaming Reader | ReadFile | 889s |889s | 889s| 889s|
| Streaming Reader | LoadGraph | 5589s|	3005s|	2200s|	1712s|
| Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s |
| Streaming Reader | **Total** | 7679s	| 5095s |4290s| 	3802s|

FIx alibaba#3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash

minor change

remove plugin-dir

fix

minor fix

debug

debug

fix

fix

fix bulk_load.yaml

bash format

some fix

fix format

fix grin test

some fi

check ci

fix ci

set

fix ci

fix

dd

f

disable tmate

fix some bug

fix

fix

refactor

fix

fix

fix

minor

some fix

fix

support default src_dst primarykey mapping in bulk load

fix

fix

fix

fix

Ci

rename

fix java and add get_person_name.cypher

[GIE Compiler] minor fix

use graphscope gstest

format

add movie queries

dd

debug

add movie test

format

format
zhanglei1949 pushed a commit to zhanglei1949/GraphScope that referenced this issue Sep 25, 2023
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348300 +0800

parent 6ab796e
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348286 +0800

[GIE Compiler] fix bugs of columnId in schema

refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (alibaba#3154)

1. Use Arrow CSV Reader to replace current adhoc csv reader, to support
more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` will support loading
fragment from csv with options:
- delimeter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default'\\'
- batch_size: the batch size of when reading file into memory, default
1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` will be used to parse the input file.
Otherwise, `arrow::TableReader` will be used.

With this PR merged, the performance of graph loading will be improved.
The Adhoc Reader denote the current implemented csv parser, 1,2,4,8
denotes the parallelism of graph loading, i.e. how many labels of
vertex/edge are concurrently processed.

Note that TableReader is around 10x faster than StreamingReader. The
possible reason could be the multi-threading is used.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |805s|	468s|	349s|	313s|
| Adhoc Reader | Serialization | 126s|	126s|	126s|	126s|
| Adhoc Reader  | **Total** |931s|	594s|	475s|	439s|
| Table Reader |  ReadFile | 9s	|9s	|9s|	9s|
| Table Reader | LoadGraph |455s|	280s|	211s|	182s|
| Table Reader |Serialization |126s|	126s|	126s|	126s|
| Table Reader | **Total** | 600s|	415s|	346s|	317s|
| Streaming Reader | ReadFile |91s|	91s|	91s|	91s|
| Streaming Reader | LoadGraph | 555s|	289s|	196s|	149s|
| Streaming Reader | Serialization |126s|	126s|	126s|	126s|
| Streaming Reader | **Total** | 772s|	506s|	413s|	366s|

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |2720s|	1548s|	1176s|	948s|
| Adhoc Reader | Serialization | 409s|	409s|	409s|	409s|
| Adhoc Reader  | **Total** | 3129s|	1957s|	1585s|	1357s|
| Table Reader |  ReadFile |24s|	24s|	24s|	24s|
| Table Reader | LoadGraph |1576s|	949s|	728s|	602s|
| Table Reader |Serialization |409s|	409s|	409s|	409s|
| Table Reader | **Total** | 2009s|	1382s|	1161s|	1035s|
| Streaming Reader | ReadFile |300s|	300s|	300s|	300s|
| Streaming Reader | LoadGraph | 1740s|	965s|	669s|	497s|
| Streaming Reader | Serialization | 409s|	409s|	409s|	409s|
| Streaming Reader | **Total** | 2539s|	1674s|	1378s|	1206s|
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s|	4900s	|3603s	|2999s|
| Adhoc Reader | Serialization | 1201s |	1201s|	1201s	|1201s|
| Adhoc Reader  | **Total** | 9461s|	6101s | 4804s	|4200s|
| Table Reader |  ReadFile | 73s	|73s|	96s|	96s|
| Table Reader | LoadGraph |4650s|	2768s|	2155s	|1778s|
| Table Reader |Serialization | 1201s |	1201s|	1201s	|1201s|
| Table Reader | **Total** | 5924s|	4042s|	3452s|	3075s|
| Streaming Reader | ReadFile | 889s |889s | 889s| 889s|
| Streaming Reader | LoadGraph | 5589s|	3005s|	2200s|	1712s|
| Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s |
| Streaming Reader | **Total** | 7679s	| 5095s |4290s| 	3802s|

FIx alibaba#3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash

minor change

remove plugin-dir

fix

minor fix

debug

debug

fix

fix

fix bulk_load.yaml

bash format

some fix

fix format

fix grin test

some fi

check ci

fix ci

set

fix ci

fix

dd

f

disable tmate

fix some bug

fix

fix

refactor

fix

fix

fix

minor

some fix

fix

support default src_dst primarykey mapping in bulk load

fix

fix

fix

fix

Ci

rename

fix java and add get_person_name.cypher

[GIE Compiler] minor fix

use graphscope gstest

format

add movie queries

dd

debug

add movie test

format

format

fix script

debug

fix test script

minor

sort query results

minor

minor

format
zhanglei1949 pushed a commit to zhanglei1949/GraphScope that referenced this issue Sep 25, 2023
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348300 +0800

parent 6ab796e
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348286 +0800

[GIE Compiler] fix bugs of columnId in schema

refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (alibaba#3154)

1. Use Arrow CSV Reader to replace current adhoc csv reader, to support
more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` will support loading
fragment from csv with options:
- delimeter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default'\\'
- batch_size: the batch size of when reading file into memory, default
1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` will be used to parse the input file.
Otherwise, `arrow::TableReader` will be used.

With this PR merged, the performance of graph loading will be improved.
The Adhoc Reader denote the current implemented csv parser, 1,2,4,8
denotes the parallelism of graph loading, i.e. how many labels of
vertex/edge are concurrently processed.

Note that TableReader is around 10x faster than StreamingReader. The
possible reason could be the multi-threading is used.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |805s|	468s|	349s|	313s|
| Adhoc Reader | Serialization | 126s|	126s|	126s|	126s|
| Adhoc Reader  | **Total** |931s|	594s|	475s|	439s|
| Table Reader |  ReadFile | 9s	|9s	|9s|	9s|
| Table Reader | LoadGraph |455s|	280s|	211s|	182s|
| Table Reader |Serialization |126s|	126s|	126s|	126s|
| Table Reader | **Total** | 600s|	415s|	346s|	317s|
| Streaming Reader | ReadFile |91s|	91s|	91s|	91s|
| Streaming Reader | LoadGraph | 555s|	289s|	196s|	149s|
| Streaming Reader | Serialization |126s|	126s|	126s|	126s|
| Streaming Reader | **Total** | 772s|	506s|	413s|	366s|

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |2720s|	1548s|	1176s|	948s|
| Adhoc Reader | Serialization | 409s|	409s|	409s|	409s|
| Adhoc Reader  | **Total** | 3129s|	1957s|	1585s|	1357s|
| Table Reader |  ReadFile |24s|	24s|	24s|	24s|
| Table Reader | LoadGraph |1576s|	949s|	728s|	602s|
| Table Reader |Serialization |409s|	409s|	409s|	409s|
| Table Reader | **Total** | 2009s|	1382s|	1161s|	1035s|
| Streaming Reader | ReadFile |300s|	300s|	300s|	300s|
| Streaming Reader | LoadGraph | 1740s|	965s|	669s|	497s|
| Streaming Reader | Serialization | 409s|	409s|	409s|	409s|
| Streaming Reader | **Total** | 2539s|	1674s|	1378s|	1206s|
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s|	4900s	|3603s	|2999s|
| Adhoc Reader | Serialization | 1201s |	1201s|	1201s	|1201s|
| Adhoc Reader  | **Total** | 9461s|	6101s | 4804s	|4200s|
| Table Reader |  ReadFile | 73s	|73s|	96s|	96s|
| Table Reader | LoadGraph |4650s|	2768s|	2155s	|1778s|
| Table Reader |Serialization | 1201s |	1201s|	1201s	|1201s|
| Table Reader | **Total** | 5924s|	4042s|	3452s|	3075s|
| Streaming Reader | ReadFile | 889s |889s | 889s| 889s|
| Streaming Reader | LoadGraph | 5589s|	3005s|	2200s|	1712s|
| Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s |
| Streaming Reader | **Total** | 7679s	| 5095s |4290s| 	3802s|

FIx alibaba#3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash

minor change

remove plugin-dir

fix

minor fix

debug

debug

fix

fix

fix bulk_load.yaml

bash format

some fix

fix format

fix grin test

some fi

check ci

fix ci

set

fix ci

fix

dd

f

disable tmate

fix some bug

fix

fix

refactor

fix

fix

fix

minor

some fix

fix

support default src_dst primarykey mapping in bulk load

fix

fix

fix

fix

Ci

rename

fix java and add get_person_name.cypher

[GIE Compiler] minor fix

use graphscope gstest

format

add movie queries

dd

debug

add movie test

format

format

fix script

debug

fix test script

minor

sort query results

minor

minor

format
zhanglei1949 pushed a commit to zhanglei1949/GraphScope that referenced this issue Sep 25, 2023
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348300 +0800

parent 6ab796e
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348286 +0800

[GIE Compiler] fix bugs of columnId in schema

refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (alibaba#3154)

1. Use Arrow CSV Reader to replace current adhoc csv reader, to support
more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` will support loading
fragment from csv with options:
- delimeter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default'\\'
- batch_size: the batch size of when reading file into memory, default
1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` will be used to parse the input file.
Otherwise, `arrow::TableReader` will be used.

With this PR merged, the performance of graph loading will be improved.
The Adhoc Reader denote the current implemented csv parser, 1,2,4,8
denotes the parallelism of graph loading, i.e. how many labels of
vertex/edge are concurrently processed.

Note that TableReader is around 10x faster than StreamingReader. The
possible reason could be the multi-threading is used.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |805s|	468s|	349s|	313s|
| Adhoc Reader | Serialization | 126s|	126s|	126s|	126s|
| Adhoc Reader  | **Total** |931s|	594s|	475s|	439s|
| Table Reader |  ReadFile | 9s	|9s	|9s|	9s|
| Table Reader | LoadGraph |455s|	280s|	211s|	182s|
| Table Reader |Serialization |126s|	126s|	126s|	126s|
| Table Reader | **Total** | 600s|	415s|	346s|	317s|
| Streaming Reader | ReadFile |91s|	91s|	91s|	91s|
| Streaming Reader | LoadGraph | 555s|	289s|	196s|	149s|
| Streaming Reader | Serialization |126s|	126s|	126s|	126s|
| Streaming Reader | **Total** | 772s|	506s|	413s|	366s|

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |2720s|	1548s|	1176s|	948s|
| Adhoc Reader | Serialization | 409s|	409s|	409s|	409s|
| Adhoc Reader  | **Total** | 3129s|	1957s|	1585s|	1357s|
| Table Reader |  ReadFile |24s|	24s|	24s|	24s|
| Table Reader | LoadGraph |1576s|	949s|	728s|	602s|
| Table Reader |Serialization |409s|	409s|	409s|	409s|
| Table Reader | **Total** | 2009s|	1382s|	1161s|	1035s|
| Streaming Reader | ReadFile |300s|	300s|	300s|	300s|
| Streaming Reader | LoadGraph | 1740s|	965s|	669s|	497s|
| Streaming Reader | Serialization | 409s|	409s|	409s|	409s|
| Streaming Reader | **Total** | 2539s|	1674s|	1378s|	1206s|
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s|	4900s	|3603s	|2999s|
| Adhoc Reader | Serialization | 1201s |	1201s|	1201s	|1201s|
| Adhoc Reader  | **Total** | 9461s|	6101s | 4804s	|4200s|
| Table Reader |  ReadFile | 73s	|73s|	96s|	96s|
| Table Reader | LoadGraph |4650s|	2768s|	2155s	|1778s|
| Table Reader |Serialization | 1201s |	1201s|	1201s	|1201s|
| Table Reader | **Total** | 5924s|	4042s|	3452s|	3075s|
| Streaming Reader | ReadFile | 889s |889s | 889s| 889s|
| Streaming Reader | LoadGraph | 5589s|	3005s|	2200s|	1712s|
| Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s |
| Streaming Reader | **Total** | 7679s	| 5095s |4290s| 	3802s|

FIx alibaba#3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash

minor change

remove plugin-dir

fix

minor fix

debug

debug

fix

fix

fix bulk_load.yaml

bash format

some fix

fix format

fix grin test

some fi

check ci

fix ci

set

fix ci

fix

dd

f

disable tmate

fix some bug

fix

fix

refactor

fix

fix

fix

minor

some fix

fix

support default src_dst primarykey mapping in bulk load

fix

fix

fix

fix

Ci

rename

fix java and add get_person_name.cypher

[GIE Compiler] minor fix

use graphscope gstest

format

add movie queries

dd

debug

add movie test

format

format

fix script

debug

fix test script

minor

sort query results

minor

minor

format

fix ci

format

gstest
zhanglei1949 pushed a commit to zhanglei1949/GraphScope that referenced this issue Sep 25, 2023
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348300 +0800

parent 6ab796e
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348286 +0800

[GIE Compiler] fix bugs of columnId in schema

refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (alibaba#3154)

1. Use Arrow CSV Reader to replace current adhoc csv reader, to support
more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` will support loading
fragment from csv with options:
- delimeter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default'\\'
- batch_size: the batch size of when reading file into memory, default
1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` will be used to parse the input file.
Otherwise, `arrow::TableReader` will be used.

With this PR merged, the performance of graph loading will be improved.
The Adhoc Reader denote the current implemented csv parser, 1,2,4,8
denotes the parallelism of graph loading, i.e. how many labels of
vertex/edge are concurrently processed.

Note that TableReader is around 10x faster than StreamingReader. The
possible reason could be the multi-threading is used.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |805s|	468s|	349s|	313s|
| Adhoc Reader | Serialization | 126s|	126s|	126s|	126s|
| Adhoc Reader  | **Total** |931s|	594s|	475s|	439s|
| Table Reader |  ReadFile | 9s	|9s	|9s|	9s|
| Table Reader | LoadGraph |455s|	280s|	211s|	182s|
| Table Reader |Serialization |126s|	126s|	126s|	126s|
| Table Reader | **Total** | 600s|	415s|	346s|	317s|
| Streaming Reader | ReadFile |91s|	91s|	91s|	91s|
| Streaming Reader | LoadGraph | 555s|	289s|	196s|	149s|
| Streaming Reader | Serialization |126s|	126s|	126s|	126s|
| Streaming Reader | **Total** | 772s|	506s|	413s|	366s|

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |2720s|	1548s|	1176s|	948s|
| Adhoc Reader | Serialization | 409s|	409s|	409s|	409s|
| Adhoc Reader  | **Total** | 3129s|	1957s|	1585s|	1357s|
| Table Reader |  ReadFile |24s|	24s|	24s|	24s|
| Table Reader | LoadGraph |1576s|	949s|	728s|	602s|
| Table Reader |Serialization |409s|	409s|	409s|	409s|
| Table Reader | **Total** | 2009s|	1382s|	1161s|	1035s|
| Streaming Reader | ReadFile |300s|	300s|	300s|	300s|
| Streaming Reader | LoadGraph | 1740s|	965s|	669s|	497s|
| Streaming Reader | Serialization | 409s|	409s|	409s|	409s|
| Streaming Reader | **Total** | 2539s|	1674s|	1378s|	1206s|
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s|	4900s	|3603s	|2999s|
| Adhoc Reader | Serialization | 1201s |	1201s|	1201s	|1201s|
| Adhoc Reader  | **Total** | 9461s|	6101s | 4804s	|4200s|
| Table Reader |  ReadFile | 73s	|73s|	96s|	96s|
| Table Reader | LoadGraph |4650s|	2768s|	2155s	|1778s|
| Table Reader |Serialization | 1201s |	1201s|	1201s	|1201s|
| Table Reader | **Total** | 5924s|	4042s|	3452s|	3075s|
| Streaming Reader | ReadFile | 889s |889s | 889s| 889s|
| Streaming Reader | LoadGraph | 5589s|	3005s|	2200s|	1712s|
| Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s |
| Streaming Reader | **Total** | 7679s	| 5095s |4290s| 	3802s|

FIx alibaba#3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash

minor change

remove plugin-dir

fix

minor fix

debug

debug

fix

fix

fix bulk_load.yaml

bash format

some fix

fix format

fix grin test

some fi

check ci

fix ci

set

fix ci

fix

dd

f

disable tmate

fix some bug

fix

fix

refactor

fix

fix

fix

minor

some fix

fix

support default src_dst primarykey mapping in bulk load

fix

fix

fix

fix

Ci

rename

fix java and add get_person_name.cypher

[GIE Compiler] minor fix

use graphscope gstest

format

add movie queries

dd

debug

add movie test

format

format

fix script

debug

fix test script

minor

sort query results

minor

minor

format

fix ci

format

gstest

Add License
zhanglei1949 pushed a commit to zhanglei1949/GraphScope that referenced this issue Sep 26, 2023
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348300 +0800

parent 6ab796e
author shirly121 <yihe.zxl@alibaba-inc.com> 1694167237 +0800
committer xiaolei.zl <xiaolei.zl@alibaba-inc.com> 1695348286 +0800

[GIE Compiler] fix bugs of columnId in schema

refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (alibaba#3154)

1. Use Arrow CSV Reader to replace current adhoc csv reader, to support
more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` will support loading
fragment from csv with options:
- delimeter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default'\\'
- batch_size: the batch size of when reading file into memory, default
1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` will be used to parse the input file.
Otherwise, `arrow::TableReader` will be used.

With this PR merged, the performance of graph loading will be improved.
The Adhoc Reader denote the current implemented csv parser, 1,2,4,8
denotes the parallelism of graph loading, i.e. how many labels of
vertex/edge are concurrently processed.

Note that TableReader is around 10x faster than StreamingReader. The
possible reason could be the multi-threading is used.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |805s|	468s|	349s|	313s|
| Adhoc Reader | Serialization | 126s|	126s|	126s|	126s|
| Adhoc Reader  | **Total** |931s|	594s|	475s|	439s|
| Table Reader |  ReadFile | 9s	|9s	|9s|	9s|
| Table Reader | LoadGraph |455s|	280s|	211s|	182s|
| Table Reader |Serialization |126s|	126s|	126s|	126s|
| Table Reader | **Total** | 600s|	415s|	346s|	317s|
| Streaming Reader | ReadFile |91s|	91s|	91s|	91s|
| Streaming Reader | LoadGraph | 555s|	289s|	196s|	149s|
| Streaming Reader | Serialization |126s|	126s|	126s|	126s|
| Streaming Reader | **Total** | 772s|	506s|	413s|	366s|

| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph |2720s|	1548s|	1176s|	948s|
| Adhoc Reader | Serialization | 409s|	409s|	409s|	409s|
| Adhoc Reader  | **Total** | 3129s|	1957s|	1585s|	1357s|
| Table Reader |  ReadFile |24s|	24s|	24s|	24s|
| Table Reader | LoadGraph |1576s|	949s|	728s|	602s|
| Table Reader |Serialization |409s|	409s|	409s|	409s|
| Table Reader | **Total** | 2009s|	1382s|	1161s|	1035s|
| Streaming Reader | ReadFile |300s|	300s|	300s|	300s|
| Streaming Reader | LoadGraph | 1740s|	965s|	669s|	497s|
| Streaming Reader | Serialization | 409s|	409s|	409s|	409s|
| Streaming Reader | **Total** | 2539s|	1674s|	1378s|	1206s|
| Reader | Phase | 1 | 2 | 4 | 8 |
| --------- | -------------- | ---------- |---------- |----------
|---------- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s|	4900s	|3603s	|2999s|
| Adhoc Reader | Serialization | 1201s |	1201s|	1201s	|1201s|
| Adhoc Reader  | **Total** | 9461s|	6101s | 4804s	|4200s|
| Table Reader |  ReadFile | 73s	|73s|	96s|	96s|
| Table Reader | LoadGraph |4650s|	2768s|	2155s	|1778s|
| Table Reader |Serialization | 1201s |	1201s|	1201s	|1201s|
| Table Reader | **Total** | 5924s|	4042s|	3452s|	3075s|
| Streaming Reader | ReadFile | 889s |889s | 889s| 889s|
| Streaming Reader | LoadGraph | 5589s|	3005s|	2200s|	1712s|
| Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s |
| Streaming Reader | **Total** | 7679s	| 5095s |4290s| 	3802s|

FIx alibaba#3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash

minor change

remove plugin-dir

fix

minor fix

debug

debug

fix

fix

fix bulk_load.yaml

bash format

some fix

fix format

fix grin test

some fi

check ci

fix ci

set

fix ci

fix

dd

f

disable tmate

fix some bug

fix

fix

refactor

fix

fix

fix

minor

some fix

fix

support default src_dst primarykey mapping in bulk load

fix

fix

fix

fix

Ci

rename

fix java and add get_person_name.cypher

[GIE Compiler] minor fix

use graphscope gstest

format

add movie queries

dd

debug

add movie test

format

format

fix script

debug

fix test script

minor

sort query results

minor

minor

format

fix ci

format

gstest

Add License

fix bugs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

1 participant