# Building blocks in Haystack: Data classes 

With Haystack, we can leverage the following built-in data classes: 

* Haystack Document data class 

* Haystack ByteStream data class 

* Haystack ChatMessage data class 

* Haystack StreamingChunk data class 

Each of these classes acts as a data structure for storing and processing data. We can use them to store data in a standardized format and then process that data through Haystack pipelines.

In the next section, we will provide examples of each.

### Haystack Document data class 

The Document is a foundational data class in Haystack that encapsulates a variety of data types that can be queried, such as text snippets, tables, and binary data.

Let's import it and take a look at its functionality.

In [1]:
from haystack.preview.dataclasses import Document

help(Document)

Help on class Document in module haystack.preview.dataclasses.document:

class Document(builtins.object)
 |  Document(*args, **kwargs)
 |  
 |  Base data class containing some data to be queried.
 |  Can contain text snippets, tables, and file paths to images or audios.
 |  Documents can be sorted by score and saved to/from dictionary and JSON.
 |  
 |  :param id: Unique identifier for the document. When not set, it's generated based on the Document fields' values.
 |  :param content: Text of the document, if the document contains text.
 |  :param dataframe: Pandas dataframe with the document's content, if the document contains tabular data.
 |  :param blob: Binary data associated with the document, if the document has any binary data associated with it.
 |  :param meta: Additional custom metadata for the document. Must be JSON-serializable.
 |  :param score: Score of the document. Used for ranking, usually assigned by retrievers.
 |  :param embedding: Vector representation of the docu

Calling `help` on the class shows us which parameters `Document` accepts.
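The docstring notes that Documents can be sorted by score and saved to and from a dictionary and JSON, and that `meta` must be JSON-serializable. The following is a minimal standard-library sketch of that idea. `ScoredDoc` is a hypothetical stand-in written for illustration; it is not Haystack's `Document` and does not reproduce its methods.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ScoredDoc:
    """Hypothetical stand-in for illustration; not Haystack's Document."""

    content: str
    meta: Dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = None


docs = [
    ScoredDoc("first", score=0.2),
    ScoredDoc("second", score=0.9),
    ScoredDoc("third", score=0.5),
]

# Rank the highest score first, treating a missing score as 0.0.
ranked = sorted(docs, key=lambda d: d.score or 0.0, reverse=True)
print([d.content for d in ranked])  # ['second', 'third', 'first']

# Round-trip the top result through a dictionary and a JSON string.
as_json = json.dumps(asdict(ranked[0]))
restored = ScoredDoc(**json.loads(as_json))
print(restored == ranked[0])  # True
```

The round trip only works because every field holds JSON-serializable values, which is presumably why the docstring places that requirement on `meta`.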

Let's create a simple Document object.

In [11]:
sample_document = Document(content="This is a simple document", meta={"name": "test_doc"})
sample_document

Document(id='ca53157e450d009adb4c2217111faadc9e7c02aefb22717c4901e1c1c1ba314a', content='This is a simple document', dataframe=None, blob=None, meta={'name': 'test_doc'}, score=None)

In [18]:
sample_document.id

'ca53157e450d009adb4c2217111faadc9e7c02aefb22717c4901e1c1c1ba314a'
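The id is a 64-character hexadecimal string, and the docstring says it is generated from the Document's field values when not set explicitly. A rough sketch of how such a content-derived id could work is shown below. The `MiniDocument` class and the choice of SHA-256 over the content and metadata are assumptions made for illustration, not Haystack's actual implementation.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class MiniDocument:
    """Hypothetical stand-in for illustration; not Haystack's Document."""

    content: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)
    id: str = ""

    def __post_init__(self) -> None:
        # Only derive an id when the caller did not supply one.
        if not self.id:
            payload = f"{self.content}{sorted(self.meta.items())}"
            self.id = hashlib.sha256(payload.encode("utf-8")).hexdigest()


doc_a = MiniDocument(content="This is a simple document", meta={"name": "test_doc"})
doc_b = MiniDocument(content="This is a simple document", meta={"name": "test_doc"})
doc_c = MiniDocument(content="A different document")

print(len(doc_a.id))         # 64
print(doc_a.id == doc_b.id)  # True: same field values, same id
print(doc_a.id == doc_c.id)  # False: different content, different id
```

With an id derived this way, two objects holding identical data share an id, which makes duplicates easy to detect.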

We see that an id was automatically generated for the document. The content and metadata are available as attributes as well.

In [19]:
sample_document.content

'This is a simple document'

In [20]:
sample_document.meta

{'name': 'test_doc'}

Now let's create dataframe-based documents.
In [9]:
import pandas as pd
from sklearn.datasets import fetch_20newsgroups, load_iris

# Load some example data
iris_df = load_iris(as_frame=True)["frame"]
news_df = pd.DataFrame(fetch_20newsgroups(subset="train").data, columns=["text"])

# Convert each row of the iris data into its own Document object
iris_docs = [Document(dataframe=row.to_frame().T) for _, row in iris_df.iterrows()]

Each row was converted into a Document object with its own id. Let's access the first Document and its attributes.

In [17]:
iris_docs[0]

Document(id='22cf9396b67c1929c273ed65a6fcea5b8ba8b384ae45d5164be9ca7b6827c66c', content=None, dataframe=   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   

   target  
0     0.0  , blob=None, meta={}, score=None)

In [16]:
iris_docs[0].id

'22cf9396b67c1929c273ed65a6fcea5b8ba8b384ae45d5164be9ca7b6827c66c'

In [15]:
iris_docs[0].dataframe

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.1               3.5                1.4               0.2     0.0
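Note that `target` appears as `0.0` in the outputs above, even though it is a class label. This is most likely a side effect of `iterrows`, which yields each row as a Series with a single dtype, so integers are upcast when they sit next to floats. The sketch below uses a small hypothetical frame (not the iris data) to show the upcast and what `row.to_frame().T` produces.

```python
import pandas as pd

# Hypothetical two-row frame with one float column and one integer column.
df = pd.DataFrame({"length": [5.1, 4.9], "target": [0, 1]})

# iterrows() yields (index, Series) pairs; a Series has one dtype,
# so the integer value is upcast to float alongside the float value.
index, row = next(df.iterrows())
print(row.dtype)  # float64

# to_frame() turns the Series into a one-column frame;
# .T transposes it back into a one-row frame with the original column names.
single_row = row.to_frame().T
print(single_row.shape)          # (1, 2)
print(list(single_row.columns))  # ['length', 'target']
```

If the original dtypes matter, slicing with `df.iloc[[i]]` returns a one-row frame without passing through a Series, so each column keeps its own dtype.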
