[ggma] Add ggma_tokenize.h and implementation#16267
[ggma] Add ggma_tokenize.h and implementation#16267glistening merged 4 commits intoSamsung:masterfrom
Conversation
91945c7 to
a675788
Compare
It adds ggma_tokenize API and implementation. It supports SentencePiece library. ONE-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>
2facee6 to
77595bf
Compare
| return() | ||
| endif(NOT SentencePieceSource_FOUND) | ||
|
|
||
| include_directories(${SentencePieceSource_DIR}) |
There was a problem hiding this comment.
Any reason need this line?
There was a problem hiding this comment.
In file included from /z/ONE/runtime/externals/SENTENCEPIECE/src/util.h:32,
from /z/ONE/runtime/externals/SENTENCEPIECE/src/flags.cc:27:
/z/ONE/runtime/externals/SENTENCEPIECE/src/sentencepiece_processor.h:25:10: fatal error: third_party/absl/strings/string_view.h: No such file or directory
25 | #include "third_party/absl/strings/string_view.h"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
There was a problem hiding this comment.
It seems v0.1.90 does not consider build by add_subdirectory(). It considers build as root or by external build like ExternalProject_Add. v0.2.1 seems considering add_subdirectory().
There was a problem hiding this comment.
It seems
v0.1.90does not consider build byadd_subdirectory()
And because of this, install(...) in SentencePiece's cmake is not working, fortunately. If it is working, it will install SentencePiece's libraries, headers, and tools in our install path.
There was a problem hiding this comment.
SentencePiece libraries are installed under lib/ggma.
$ tree Product/x86_64-linux.debug/out/lib/ggma
Product/x86_64-linux.debug/out/lib/ggma
├── libggma_api.so
├── libggma_tokenize.so
├── libsentencepiece.so -> libsentencepiece.so.0
├── libsentencepiece.so.0 -> libsentencepiece.so.0.0.0
└── libsentencepiece.so.0.0.0
No header is installed under include/ggma
$ tree Product/x86_64-linux.debug/out/include/ggma/
Product/x86_64-linux.debug/out/include/ggma/
├── ggma_api.h
├── ggma_context.h
├── ggma_generate.h
├── ggma_tokenize.h
└── ggma_types.h
I don't recognize any problem. Could you please let me know if somethig is wrong?
Do you prefer static library which hides libsentencepiece.so in libggma_tokenize.so ?
There was a problem hiding this comment.
As I told, fortunately no problem, because SentencePiece's cmake is not working correctly.
Do you prefer static library which hides libsentencepiece.so in libggma_tokenize.so ?
I have no preference about this.
There was a problem hiding this comment.
It seems
v0.1.90does not consider build byadd_subdirectory(). It considers build as root or by external build likeExternalProject_Add.v0.2.1seems consideringadd_subdirectory().
For your information, I've used v0.1.90 since our internal translation model does not work the latest version of sentencepiece. I confirmed it works with v0.1.90. I hope I can use v0.2.1 or later after translation model is done.
There was a problem hiding this comment.
I see. Could you change this to use target setting target_include_directories(sentencepiece PRIVATE ${SentencePieceSource_DIR}) instead of global setting?
| return() | ||
| endif() | ||
|
|
||
| set_property(TARGET sentencepiece PROPERTY POSITION_INDEPENDENT_CODE ON) |
There was a problem hiding this comment.
Because you does not set SPM_ENABLE_SHARED as FALSE, sentencepiece may be shared library.
- Is
POSITION_INDEPENDENT_CODErequired? - Is it intended to use shared
sentencepiecelibrary?
There was a problem hiding this comment.
I made sentencepiece as shared library.
By default, SPM_ENABLED_SHARED seems ON.
runtime/externals/SENTENCEPIECE/CMakeLists.txt
option(SPM_ENABLE_SHARED "Builds shared libaries in addition to static libraries." ON)
Product/x86_64-linux.debug/out/lib/ggma/libsentencepiece.so
Product/x86_64-linux.debug/out/lib/ggma/libsentencepiece.so.0
Product/x86_64-linux.debug/out/lib/ggma/libsentencepiece.so.0.0.0
I prefered shared since we may choose to load the necessary shared library later using dlopen.
Do you prefer static library, then could you please let me know why?
There was a problem hiding this comment.
Do you prefer static library, then could you please let me know why?
No. I commented because line 22 is meaningless if you want to use shared library. I thought that you added POSITION_INDEPENDENT_CODE property explicitly because you want to use static library.
|
@hseok-oh Please see the last commit. I've applied your reviews. After this PR merged, I will update debian build and Tizen rpm build. |
It adds ggma_tokenize API and implementation.
It supports SentencePiece library.
ONE-DCO-1.0-Signed-off-by: Sanggyu Lee sg5.lee@samsung.com
I will enable tokenizer Tizen build in separate PR since it requires SENTENCEPIECE source tarball.