## Load data (Ingestion)

Before your chosen LLM can act on your data, you first need to load and process it. This parallels data-cleaning and feature-engineering pipelines in machine learning, or ETL pipelines in traditional data engineering.

This ingestion pipeline typically consists of three main stages:

- Load the data
- Transform the data
- Index and store the data
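To make the three stages concrete, here is a dependency-free toy pipeline. It is not LlamaIndex code: the `load`, `transform` and `index_and_store` helpers are hypothetical. The "index" is a simple keyword inverted index saved as JSON, standing in for the embedding-based vector index LlamaIndex typically builds.

```python
import json
import os
import tempfile
from collections import defaultdict

# Hypothetical helpers illustrating the three stages; not part of LlamaIndex.


def load(raw_texts):
    """Stage 1 (load): wrap each raw string in a record with text and metadata."""
    return [{"text": t, "metadata": {"doc_id": i}} for i, t in enumerate(raw_texts)]


def transform(docs, chunk_size=4):
    """Stage 2 (transform): split each document into chunks of at most chunk_size words."""
    chunks = []
    for doc in docs:
        words = doc["text"].split()
        for start in range(0, len(words), chunk_size):
            chunks.append({
                "text": " ".join(words[start:start + chunk_size]),
                "metadata": dict(doc["metadata"]),  # each chunk keeps a copy of its parent's metadata
            })
    return chunks


def index_and_store(chunks, path):
    """Stage 3 (index and store): map each lowercase word to the chunks containing it, then save to JSON."""
    index = defaultdict(list)
    for pos, chunk in enumerate(chunks):
        for word in set(chunk["text"].lower().split()):
            index[word].append(pos)
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"chunks": chunks, "index": index}, f)
    return dict(index)


docs = load(["LlamaIndex loads data", "then it splits and indexes the data"])
chunks = transform(docs, chunk_size=4)
path = os.path.join(tempfile.mkdtemp(), "store.json")
index = index_and_store(chunks, path)
print(index["data"])  # [0, 2]
```

Each stage only consumes the previous stage's output, so any one of them can be swapped out (a different splitter, a real vector store) without touching the others.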

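### SimpleDirectoryReader

`SimpleDirectoryReader` is the simplest way to load files from a local directory: it reads each file and returns a list of `Document` objects.

```python
from llama_index.core import SimpleDirectoryReader

# Read every file in ../../data and return a list of Document objects
documents = SimpleDirectoryReader("../../data").load_data()
```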
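`SimpleDirectoryReader` chooses a parser based on each file's extension (the real reader also exposes options such as `required_exts` and `recursive`). To show the basic idea without installing anything, here is a hypothetical stdlib-only stand-in. The `load_directory` name, its extension filter, and the metadata keys are assumptions for this sketch, not the reader's actual implementation, and unlike the real reader's default it always recurses into subdirectories.

```python
import tempfile
from pathlib import Path


def load_directory(input_dir, exts=(".txt", ".md")):
    """Hypothetical toy reader: one {"text", "metadata"} record per matching file."""
    records = []
    # rglob("*") walks the directory recursively; sorting keeps the order stable
    for path in sorted(Path(input_dir).rglob("*")):
        if path.is_file() and path.suffix in exts:
            records.append({
                "text": path.read_text(encoding="utf-8"),
                "metadata": {"file_name": path.name, "file_path": str(path)},
            })
    return records


# Build a small throwaway directory so the sketch runs anywhere
data_dir = tempfile.mkdtemp()
Path(data_dir, "a.txt").write_text("first file", encoding="utf-8")
Path(data_dir, "notes.md").write_text("second file", encoding="utf-8")
Path(data_dir, "image.png").write_bytes(b"\x89PNG")  # skipped: not a listed extension

docs = load_directory(data_dir)
print([d["metadata"]["file_name"] for d in docs])  # ['a.txt', 'notes.md']
```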

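### Using Readers from LlamaHub

LlamaHub offers readers for many other data sources. For example, `DatabaseReader` runs a SQL query against a database and loads the results as documents. Here the connection settings are read from environment variables, which `load_dotenv()` populates from a local `.env` file.

```python
import os

from dotenv import load_dotenv
from llama_index.readers.database import DatabaseReader  # pip install llama-index-readers-database

# Load DB_* connection settings from a local .env file into the environment
load_dotenv()

reader = DatabaseReader(
    scheme=os.getenv("DB_SCHEME"),  # SQLAlchemy dialect, e.g. "postgresql"
    host=os.getenv("DB_HOST"),
    port=os.getenv("DB_PORT"),
    user=os.getenv("DB_USER"),
    password=os.getenv("DB_PASS"),
    dbname=os.getenv("DB_NAME"),
)

# Each row returned by the query becomes one Document
query = "SELECT * FROM users"
documents = reader.load_data(query=query)
```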
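To see roughly what "one document per row" means without a database server, the sketch below uses the stdlib `sqlite3` module with an in-memory table. The `rows_to_records` helper, the sample `users` rows, and the `"column: value, ..."` text format are assumptions for illustration; they approximate, but are not guaranteed to match, how `DatabaseReader` renders rows.

```python
import sqlite3


def rows_to_records(conn, query):
    """Hypothetical helper: turn each result row into a {"text", "metadata"} record."""
    cursor = conn.execute(query)
    columns = [desc[0] for desc in cursor.description]  # column names from the cursor
    records = []
    for row in cursor.fetchall():
        # Render the row as "column: value" pairs joined by commas (assumed format)
        text = ", ".join(f"{col}: {val}" for col, val in zip(columns, row))
        records.append({"text": text, "metadata": {"query": query}})
    return records


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ada",), ("Linus",)])

records = rows_to_records(conn, "SELECT * FROM users ORDER BY id")
print(records[0]["text"])  # id: 1, name: Ada
```

Because every row becomes its own record, a `SELECT *` over a wide or very large table can produce many small documents; selecting only the columns you want the LLM to see keeps them focused.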

### Creating Documents directly

Instead of using a loader, you can also construct a `Document` directly and attach your own metadata.

```python
from llama_index.core import Document

doc = Document(
    text="This is the longer piece of text",
    metadata={"source": "example.txt", "author": "Koyilbek"},
)
```

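Evaluating `doc` in a notebook cell prints its full repr. The output below was captured from an earlier run, so its `text` (`'Text'`) and empty `metadata` do not match the `Document` created above, but the fields are the same.

```python
doc
```

```
Document(id_='b0d18c3e-dcfd-468b-966b-1f695c32d391', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text_resource=MediaResource(embeddings=None, data=None, text='Text', path=None, url=None, mimetype=None), image_resource=None, audio_resource=None, video_resource=None, text_template='{metadata_str}\n\n{content}')
```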
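The repr exposes the fields that decide what text an LLM or embedding model actually sees: `metadata_template`, `metadata_separator`, `text_template`, and the `excluded_llm_metadata_keys` / `excluded_embed_metadata_keys` lists. In LlamaIndex you get the combined string from `doc.get_content(metadata_mode=MetadataMode.LLM)`, with `MetadataMode` imported from `llama_index.core.schema`. The plain-Python sketch below re-creates that formatting from the template values shown in the repr so it runs without the library; the `render` helper and its `excluded_keys` argument are hypothetical, and the library's own logic may differ in details.

```python
# Template values copied from the Document repr above
METADATA_TEMPLATE = "{key}: {value}"
METADATA_SEPARATOR = "\n"
TEXT_TEMPLATE = "{metadata_str}\n\n{content}"


def render(text, metadata, excluded_keys=()):
    """Hypothetical helper: combine metadata and text the way the templates suggest."""
    metadata_str = METADATA_SEPARATOR.join(
        METADATA_TEMPLATE.format(key=k, value=v)
        for k, v in metadata.items()
        if k not in excluded_keys  # mimics the excluded_*_metadata_keys lists
    )
    if not metadata_str:
        # Assumption: with no metadata left, return the bare text
        return text
    return TEXT_TEMPLATE.format(metadata_str=metadata_str, content=text)


meta = {"source": "example.txt", "author": "Koyilbek"}
print(render("This is the longer piece of text", meta))
print(render("This is the longer piece of text", meta, excluded_keys=["author"]))
```

Excluding a key (for example `author`) hides it from that view of the text while it stays stored in `metadata`, which is why the LLM and embedding exclusion lists are kept separately.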