airbyte.documents

This module contains the Document class for converting Airbyte records into documents.

Generally you will not create Document objects directly. Instead, you can use one of the following methods to generate documents from records:

  • Source.get_documents(): Get an iterable of documents from a source.
  • Dataset.to_documents(): Get an iterable of documents from a dataset.
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.
"""This module contains the `Document` class for converting Airbyte records into documents.

Generally you will not create `Document` objects directly. Instead, you can use one of the
following methods to generate documents from records:

- `Source.get_documents()`: Get an iterable of documents from a source.
- `Dataset.to_documents()`: Get an iterable of documents from a dataset.
"""

from __future__ import annotations

from typing import TYPE_CHECKING, Any, Optional

from pydantic import BaseModel, Field


if TYPE_CHECKING:
    import datetime


MAX_SINGLE_LINE_LENGTH = 60
AIRBYTE_DOCUMENT_RENDERING = "airbyte_document_rendering"
TITLE_PROPERTY = "title_property"
CONTENT_PROPS = "content_properties"
METADATA_PROPERTIES = "metadata_properties"


class Document(BaseModel):
    """A PyAirbyte document is a specific projection on top of a record.

    Documents have the following structure:
    - id (Optional[str]): A unique string identifier for the document. Defaults to None.
    - content (str): A string representing the record when rendered as a document.
    - metadata (dict[str, Any]): Associated metadata about the document, such as the record's IDs
      and/or URLs.

    This class is duck-typed to be compatible with the LangChain project's `Document` class.
    """

    id: Optional[str] = Field(default=None)
    content: str
    metadata: dict[str, Any]
    last_modified: Optional[datetime.datetime] = Field(default=None)

    def __str__(self) -> str:
        return self.content

    @property
    def page_content(self) -> str:
        """Return the content of the document.

        This is an alias for the `content` property, and is provided for duck-type compatibility
        with the LangChain project's `Document` class.
        """
        return self.content


__all__ = [
    "Document",
]
class Document(pydantic.main.BaseModel):

A PyAirbyte document is a specific projection on top of a record.

Documents have the following structure:

  • id (Optional[str]): A unique string identifier for the document. Defaults to None.
  • content (str): A string representing the record when rendered as a document.
  • metadata (dict[str, Any]): Associated metadata about the document, such as the record's IDs and/or URLs.

This class is duck-typed to be compatible with the LangChain project's Document class.
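
To make the idea of a "projection on top of a record" concrete, here is a minimal sketch of how selected record fields might be rendered into the id, content, and metadata structure described above. It is an illustration only, not PyAirbyte's implementation: `SketchDocument` is a standard-library dataclass stand-in for the real pydantic model, and `project_record` with its `content_keys`, `metadata_keys`, and `id_key` parameters is a hypothetical helper invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SketchDocument:
    """Stand-in mirroring the documented fields (NOT the real pydantic model)."""

    content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    id: Optional[str] = None


def project_record(
    record: dict[str, Any],
    content_keys: list[str],
    metadata_keys: list[str],
    id_key: str = "id",
) -> SketchDocument:
    """Hypothetical helper: render selected record fields as a document."""
    # Render each chosen field as a "key: value" line to form the content string.
    content = "\n".join(
        f"{key}: {record[key]}" for key in content_keys if key in record
    )
    # Copy the chosen identifying fields (IDs, URLs, ...) into the metadata dict.
    metadata = {key: record[key] for key in metadata_keys if key in record}
    # The id is optional; stringify it only when the record provides one.
    raw_id = record.get(id_key)
    return SketchDocument(
        content=content,
        metadata=metadata,
        id=None if raw_id is None else str(raw_id),
    )


record = {
    "id": 7,
    "title": "Widget",
    "description": "A small part.",
    "url": "https://example.com/7",
}
doc = project_record(
    record,
    content_keys=["title", "description"],
    metadata_keys=["id", "url"],
)
```

In this sketch the record itself is left untouched; the document is a derived view that keeps only the fields chosen for rendering and for metadata.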

id: Optional[str]
content: str
metadata: dict[str, typing.Any]
last_modified: Optional[datetime.datetime]
page_content: str

Return the content of the document.

This is an alias for the content property, and is provided for duck-type compatibility with the LangChain project's Document class.
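
The alias pattern can be sketched in isolation as follows. This is a hedged illustration rather than the library's code: `AliasedDocument` is a hypothetical plain-Python stand-in that reproduces only the `content` / `page_content` relationship, and `preview` is a hypothetical consumer that, like LangChain-style code, reads `page_content` without knowing which class it was given.

```python
from __future__ import annotations

from typing import Any


class AliasedDocument:
    """Hypothetical stand-in reproducing the `page_content` alias pattern."""

    def __init__(self, content: str, metadata: dict[str, Any]) -> None:
        self.content = content
        self.metadata = metadata

    @property
    def page_content(self) -> str:
        # Read-only alias: always returns the current value of `content`.
        return self.content


def preview(doc: Any, width: int = 20) -> str:
    """Hypothetical consumer that relies only on a `page_content` attribute."""
    return doc.page_content[:width]


doc = AliasedDocument(content="Widget: a small part.", metadata={"id": 7})
# Updating `content` is immediately visible through the alias.
doc.content = "Gadget: a larger part."
short = preview(doc, width=6)
```

Because the alias is a property rather than a second stored field, the two names cannot drift apart.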

Inherited Members

  • pydantic.main.BaseModel: BaseModel, Config, dict, json, parse_obj, parse_raw, parse_file, from_orm, construct, copy, schema, schema_json, validate, update_forward_refs