Error in the Vectorizer stage when building a knowledge base #544

@iazkaban

OS: Ubuntu
Runtime environment: Docker

I have tried several different files, and every one of them gets stuck at this stage.
The full log is as follows:

Reader

2025-04-25 13:31:40(192.168.176.5): Task scheduling completed. cost:22 ms !
2025-04-25 13:31:40(192.168.176.5): Lock released successfully!
2025-04-25 13:31:40(192.168.176.5): Store the results of the read operator. file:builder/builder/project_1/instance_2/7_kagReaderSyncTask.kag
2025-04-25 13:31:40(192.168.176.5): The read operator was invoked successfully. chunk size:1
2025-04-25 13:31:40(192.168.176.5): Invoke read operator:run_reader
2025-04-25 13:31:40(192.168.176.5): Lock preempted successfully!

Splitter

2025-04-25 13:31:50(192.168.176.5): Task scheduling completed. cost:8 ms !
2025-04-25 13:31:50(192.168.176.5): Lock released successfully!
2025-04-25 13:31:50(192.168.176.5): Splitter task trace log:
>> 13:31:50: Store the results of the split operator. file:builder/builder/project_1/instance_2/8_kagSplitterAsyncTask.kag
>> 13:31:50: Split document complete. number of paragraphs:3
>> 13:31:50: The split operator was invoked successfully. chunk size:3
>> 13:31:50: Split chunk(灵脉之巅) successfully. chunk size:3
>> 13:31:50: Split chunk(灵脉之巅)
>> 13:31:50: Invoke the split operator
>> 13:31:50: Start split document!

2025-04-25 13:31:50(192.168.176.5): Splitter task status is FINISH
2025-04-25 13:31:50(192.168.176.5): The asynchronous task has been created! resource:builder/project_1/instance_2/8_kagSplitterAsyncTask.kag
2025-04-25 13:31:50(192.168.176.5): Lock preempted successfully!

2025-04-25 13:31:50(192.168.176.5): Task scheduling completed. cost:11 ms !
2025-04-25 13:31:50(192.168.176.5): Lock released successfully!
2025-04-25 13:31:50(192.168.176.5): The asynchronous task created successfully! resource:builder/project_1/instance_2/8_kagSplitterAsyncTask.kag
2025-04-25 13:31:50(192.168.176.5): Splitter task has been successfully created!
2025-04-25 13:31:50(192.168.176.5): The asynchronous task has not been created yet!
2025-04-25 13:31:50(192.168.176.5): Lock preempted successfully!

Extractor

2025-04-25 13:34:58(192.168.176.5): Task scheduling completed. cost:7 ms !
2025-04-25 13:34:58(192.168.176.5): Lock released successfully!
2025-04-25 13:34:58(192.168.176.5): Extractor task trace log:
>> 13:34:58: Store the results of the extract operator. file:builder/builder/project_1/instance_2/9_kagExtractorAsyncTask.kag
>> 13:34:58: Extract document complete. nodes:43. edges:43
>> 13:34:58: Extract chunk(灵脉之巅_split_0:5a62a0) index:1/3 successfully. nodes:20. edges:22

2025-04-25 13:34:58(192.168.176.5): Extractor task status is FINISH
2025-04-25 13:34:58(192.168.176.5): The asynchronous task has been created! resource:builder/project_1/instance_2/9_kagExtractorAsyncTask.kag
2025-04-25 13:34:58(192.168.176.5): Lock preempted successfully!

2025-04-25 13:34:04(192.168.176.5): Task scheduling completed. cost:4 ms !
2025-04-25 13:34:04(192.168.176.5): Lock released successfully!
2025-04-25 13:34:04(192.168.176.5): Extractor task trace log:
>> 13:33:30: Extract chunk(灵脉之巅_split_1:f58a25) index:2/3 successfully. nodes:13. edges:12

2025-04-25 13:34:04(192.168.176.5): Extractor task status is RUNNING
2025-04-25 13:34:04(192.168.176.5): The asynchronous task has been created! resource:builder/project_1/instance_2/9_kagExtractorAsyncTask.kag
2025-04-25 13:34:04(192.168.176.5): Lock preempted successfully!

2025-04-25 13:33:04(192.168.176.5): Task scheduling completed. cost:3 ms !
2025-04-25 13:33:04(192.168.176.5): Lock released successfully!
2025-04-25 13:33:04(192.168.176.5): Extractor task trace log:
>> 13:32:59: Extract chunk(灵脉之巅_split_2:00402f) index:3/3 successfully. nodes:10. edges:9

2025-04-25 13:33:04(192.168.176.5): Extractor task status is RUNNING
2025-04-25 13:33:04(192.168.176.5): The asynchronous task has been created! resource:builder/project_1/instance_2/9_kagExtractorAsyncTask.kag
2025-04-25 13:33:04(192.168.176.5): Lock preempted successfully!

2025-04-25 13:32:04(192.168.176.5): Task scheduling completed. cost:4 ms !
2025-04-25 13:32:04(192.168.176.5): Lock released successfully!
2025-04-25 13:32:04(192.168.176.5): Extractor task trace log:
>> 13:32:00: Start extract chunk(灵脉之巅_split_2:00402f) index:3/3
>> 13:32:00: Start extract chunk(灵脉之巅_split_1:f58a25) index:2/3
>> 13:32:00: Start extract chunk(灵脉之巅_split_0:5a62a0) index:1/3
>> 13:32:00: Extract ThreadPool(9535ae5a-6017-4a3b-a5d8-9c72c2d9f29f). size:13 active:2 completed:11 total:13 queue:0
>> 13:32:00: Extract ThreadPool(9535ae5a-6017-4a3b-a5d8-9c72c2d9f29f). size:12 active:1 completed:11 total:12 queue:0
>> 13:32:00: Extract ThreadPool(9535ae5a-6017-4a3b-a5d8-9c72c2d9f29f). size:11 active:0 completed:11 total:11 queue:0
>> 13:32:00: Start extract document. chunk size:3

2025-04-25 13:32:04(192.168.176.5): Extractor task status is RUNNING
2025-04-25 13:32:04(192.168.176.5): The asynchronous task has been created! resource:builder/project_1/instance_2/9_kagExtractorAsyncTask.kag
2025-04-25 13:32:04(192.168.176.5): Lock preempted successfully!

2025-04-25 13:32:00(192.168.176.5): Task scheduling completed. cost:15 ms !
2025-04-25 13:32:00(192.168.176.5): Lock released successfully!
2025-04-25 13:32:00(192.168.176.5): The asynchronous task created successfully! resource:builder/project_1/instance_2/9_kagExtractorAsyncTask.kag
2025-04-25 13:32:00(192.168.176.5): Extractor task has been successfully created!
2025-04-25 13:32:00(192.168.176.5): The asynchronous task has not been created yet!
2025-04-25 13:32:00(192.168.176.5): Lock preempted successfully!
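The per-chunk results in the Extractor log add up to the document totals (20 + 13 + 10 = 43 nodes, 22 + 12 + 9 = 43 edges), which suggests that extraction finished for all three chunks before the Vectorizer stage began. The short Python snippet below only re-checks that arithmetic; every number is copied from the log, and the dictionary keys are the chunk names as they appear there.

```python
# Per-chunk (nodes, edges) copied from the Extractor trace log.
chunk_counts = {
    "灵脉之巅_split_0": (20, 22),
    "灵脉之巅_split_1": (13, 12),
    "灵脉之巅_split_2": (10, 9),
}

total_nodes = sum(nodes for nodes, _ in chunk_counts.values())
total_edges = sum(edges for _, edges in chunk_counts.values())

# The log reports "Extract document complete. nodes:43. edges:43".
assert (total_nodes, total_edges) == (43, 43)
print(total_nodes, total_edges)  # 43 43
```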

Vectorizer

2025-04-25 14:09:04(192.168.176.5): Task scheduling completed. cost:3 ms !
2025-04-25 14:09:04(192.168.176.5): Lock released successfully!
2025-04-25 14:09:04(192.168.176.5): Vectorizer task status is ERROR
2025-04-25 14:09:04(192.168.176.5): The asynchronous task has been created! resource:builder/project_1/instance_2/10_kagVectorizerAsyncTask.kag
2025-04-25 14:09:04(192.168.176.5): Lock preempted successfully!

2025-04-25 14:08:05(192.168.176.5): Task scheduling completed. cost:4 ms !
2025-04-25 14:08:05(192.168.176.5): Lock released successfully!
2025-04-25 14:08:05(192.168.176.5): Vectorizer task trace log:
pemja.core.PythonException: <class 'TypeError'>: 'NoneType' object is not iterable
at /home/admin/miniconda3/lib/python3.10/site-packages/kag/bridge/spg_server_bridge.run_component(spg_server_bridge.py:111)
at /home/admin/miniconda3/lib/python3.10/site-packages/kag/bridge/spg_server_bridge.run_component(spg_server_bridge.py:103)
at /home/admin/miniconda3/lib/python3.10/site-packages/kag/interface/builder/base.invoke(base.py:153)
at /home/admin/miniconda3/lib/python3.10/site-packages/kag/builder/component/vectorizer/batch_vectorizer._invoke(batch_vectorizer.py:327)
at /home/admin/miniconda3/lib/python3.10/site-packages/tenacity/__init__.wrapped_f(__init__.py:338)
at /home/admin/miniconda3/lib/python3.10/site-packages/tenacity/__init__.__call__(__init__.py:477)
at /home/admin/miniconda3/lib/python3.10/site-packages/tenacity/__init__.iter(__init__.py:378)
at /home/admin/miniconda3/lib/python3.10/site-packages/tenacity/__init__.exc_check(__init__.py:420)
at /home/admin/miniconda3/lib/python3.10/site-packages/tenacity/__init__.reraise(__init__.py:187)
at /home/admin/miniconda3/lib/python3.10/concurrent/futures/_base.result(_base.py:451)
at /home/admin/miniconda3/lib/python3.10/concurrent/futures/_base.__get_result(_base.py:403)
at /home/admin/miniconda3/lib/python3.10/site-packages/tenacity/__init__.__call__(__init__.py:480)
at /home/admin/miniconda3/lib/python3.10/site-packages/kag/builder/component/vectorizer/batch_vectorizer._generate_embedding_vectors(batch_vectorizer.py:273)
at /home/admin/miniconda3/lib/python3.10/site-packages/kag/builder/component/vectorizer/batch_vectorizer.batch_generate(batch_vectorizer.py:164)
at /home/admin/miniconda3/lib/python3.10/site-packages/kag/builder/component/vectorizer/batch_vectorizer.batch_generate(batch_vectorizer.py:119)
at /home/admin/miniconda3/lib/python3.10/site-packages/kag/builder/component/vectorizer/batch_vectorizer._generate_vectors(batch_vectorizer.py:92)
at pemja.core.PythonInterpreter.invokeMethod(Native Method)
at pemja.core.PythonInterpreter.invokeMethod(PythonInterpreter.java:118)
at com.antgroup.openspg.common.util.pemja.PemjaUtils.invoke(PemjaUtils.java:41)
at com.antgroup.openspg.server.core.scheduler.service.task.async.builder.KagVectorizerAsyncTask$VectorizerTaskCallable.vectorizer(KagVectorizerAsyncTask.java:209)
at com.antgroup.openspg.server.core.scheduler.service.task.async.builder.KagVectorizerAsyncTask$VectorizerTaskCallable.call(KagVectorizerAsyncTask.java:161)
at com.antgroup.openspg.server.core.scheduler.service.task.async.builder.KagVectorizerAsyncTask$VectorizerTaskCallable.call(KagVectorizerAsyncTask.java:128)
at com.antgroup.openspg.server.core.scheduler.service.common.MemoryTaskServer.executeTask(MemoryTaskServer.java:74)
at com.antgroup.openspg.server.core.scheduler.service.common.MemoryTaskServer.lambda$submit$0(MemoryTaskServer.java:62)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

2025-04-25 14:08:05(192.168.176.5): Vectorizer task status is ERROR
2025-04-25 14:08:05(192.168.176.5): The asynchronous task has been created! resource:builder/project_1/instance_2/10_kagVectorizerAsyncTask.kag
2025-04-25 14:08:05(192.168.176.5): Lock preempted successfully!

2025-04-25 14:08:04(192.168.176.5): Task scheduling completed. cost:6 ms !
2025-04-25 14:08:04(192.168.176.5): Lock released successfully!
2025-04-25 14:08:04(192.168.176.5): Vectorizer task has been successfully created!
2025-04-25 14:08:04(192.168.176.5): Vectorizer task execute failed, recreating……
2025-04-25 14:08:04(192.168.176.5): Vectorizer task status is ERROR
2025-04-25 14:08:04(192.168.176.5): The asynchronous task has been created! resource:builder/project_1/instance_2/10_kagVectorizerAsyncTask.kag
2025-04-25 14:08:04(192.168.176.5): Lock preempted successfully!

2025-04-25 14:07:04(192.168.176.5): Task scheduling completed. cost:4 ms !
2025-04-25 14:07:04(192.168.176.5): Lock released successfully!
2025-04-25 14:07:04(192.168.176.5): Vectorizer task status is ERROR
2025-04-25 14:07:04(192.168.176.5): The asynchronous task has been created! resource:builder/project_1/instance_2/10_kagVectorizerAsyncTask.kag
2025-04-25 14:07:04(192.168.176.5): Lock preempted successfully!

2025-04-25 14:06:05(192.168.176.5): Task scheduling completed. cost:5 ms !
2025-04-25 14:06:05(192.168.176.5): Lock released successfully!
2025-04-25 14:06:05(192.168.176.5): Vectorizer task trace log:

…… (Some entries in the middle have been omitted because of the length limit.) ……

2025-04-25 13:35:05(192.168.176.5): Vectorizer task status is ERROR
2025-04-25 13:35:05(192.168.176.5): The asynchronous task has been created! resource:builder/project_1/instance_2/10_kagVectorizerAsyncTask.kag
2025-04-25 13:35:05(192.168.176.5): Lock preempted successfully!

2025-04-25 13:35:04(192.168.176.5): Task scheduling completed. cost:8 ms !
2025-04-25 13:35:04(192.168.176.5): Lock released successfully!
2025-04-25 13:35:04(192.168.176.5): The asynchronous task created successfully! resource:builder/project_1/instance_2/10_kagVectorizerAsyncTask.kag
2025-04-25 13:35:04(192.168.176.5): Vectorizer task has been successfully created!
2025-04-25 13:35:04(192.168.176.5): The asynchronous task has not been created yet!
2025-04-25 13:35:04(192.168.176.5): Lock preempted successfully!

Alignment

Writer
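The trace log ends in `kag/builder/component/vectorizer/batch_vectorizer._generate_embedding_vectors` with `TypeError: 'NoneType' object is not iterable`, re-raised through tenacity's retry wrapper. One possible cause, which is an assumption and is not confirmed by these logs, is that the embedding call returned `None` (for example because the vectorizer model or its endpoint is misconfigured) and the result was then iterated over. The minimal Python sketch below reproduces that failure mode and shows a guard that replaces the opaque `TypeError` with a clearer message. `generate_embedding_vectors`, `embed_batch`, and `EmbeddingError` are hypothetical names, not KAG's actual API.

```python
from typing import Callable, List, Optional

Vector = List[float]


class EmbeddingError(RuntimeError):
    """Hypothetical error raised when the embedding step returns no usable data."""


def generate_embedding_vectors(
    texts: List[str],
    embed_batch: Callable[[List[str]], Optional[List[Vector]]],
) -> List[Vector]:
    """Hypothetical stand-in for a batch-embedding step (not KAG's real code).

    `embed_batch` is an assumed callable that returns one vector per input
    text, or None when the embedding backend fails without raising.
    """
    vectors = embed_batch(texts)
    if vectors is None:
        # Iterating over a None result (e.g. in a for loop) raises
        # TypeError: 'NoneType' object is not iterable, the error in the
        # trace log; fail early with an actionable message instead.
        raise EmbeddingError(
            "embedding call returned None; check the vectorizer model configuration"
        )
    if len(vectors) != len(texts):
        raise EmbeddingError(f"expected {len(texts)} vectors, got {len(vectors)}")
    return [list(v) for v in vectors]


if __name__ == "__main__":

    def broken_embed(texts: List[str]) -> Optional[List[Vector]]:
        return None  # simulates a backend that silently returns nothing

    # Unguarded iteration reproduces the message seen in the trace log.
    try:
        for _ in broken_embed(["chunk"]):
            pass
    except TypeError as exc:
        print(exc)

    # The guarded version fails with an actionable message instead.
    try:
        generate_embedding_vectors(["chunk"], broken_embed)
    except EmbeddingError as exc:
        print(exc)
```

Running the script prints the original `TypeError` message first and then the guarded error, which points at the embedding configuration rather than at the iteration site.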
