Skip to content

feat(llm): support multiple property_type & importing graph from the entire doc#84

Merged
imbajin merged 14 commits intoapache:mainfrom
vichayturen:main0920
Oct 10, 2024
Merged

feat(llm): support multiple property_type & importing graph from the entire doc#84
imbajin merged 14 commits intoapache:mainfrom
vichayturen:main0920

Conversation

@vichayturen
Copy link
Contributor

@vichayturen vichayturen commented Sep 22, 2024

Done:

  1. Fixed bug of config qianfan llm api
  2. Added two enums for properties settings
  3. support different property type
  4. support list/set cardinality for property_key

TODO:

  • enhance the logic for split chunk for vector & graph (maybe we should divide them separately)

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Sep 22, 2024
@github-actions github-actions bot added the llm label Sep 22, 2024
@dosubot dosubot bot added bug Something isn't working enhancement New feature or request labels Sep 22, 2024
@vichayturen vichayturen changed the title feat: Modify the method of importing graph by extracting from the entire document feat: Supported multiple property data types and modified the method of importing graph by extracting from the entire document Sep 29, 2024
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Oct 9, 2024
for prop in properties:
self.schema.propertyKey(prop).asText().ifNotExist().create()
# for prop in properties:
# self.schema.propertyKey(prop).asText().ifNotExist().create()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove directly?

for prop in properties:
self.schema.propertyKey(prop).asText().ifNotExist().create()
# for prop in properties:
# self.schema.propertyKey(prop).asText().ifNotExist().create()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

language: Literal["zh", "en"] = "zh"
self,
text: Union[str, List[str]], # text to be split
split_type: Literal["document", "paragraph", "sentence"] = "document",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the default value changed to document, does it have a better effect?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the default value changed to document, does it have a better effect?

handle it in next PR (we need refactor the split logic for better effect)

print(context)
return json.dumps(context, ensure_ascii=False, indent=2)
except Exception as e: # pylint: disable=W0718
except Exception as e: # pylint: disable=W0718
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this configuration universal? Does it need to be added to pylint.conf?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this configuration universal? Does it need to be added to pylint.conf?

No need for now maybe

"""Get max-allowed token length"""
# TODO: list all models and their max tokens from api
return 2049
return 8192
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why increase to 8192?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why increase to 8192?

2049 is outdated (in 2023y)

BTW, this method is not used now (may remove it later)

Copy link
Member

@imbajin imbajin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge it to avoid blocking other PRs (handle chunk split logic in another PR later) @vichayturen

@imbajin imbajin changed the title feat: Supported multiple property data types and modified the method of importing graph by extracting from the entire document feat(llm): support multiple property_type & importing graph from the entire doc Oct 10, 2024
@imbajin imbajin merged commit f7fc02f into apache:main Oct 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request llm size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants