# Apify dataset
This notebook shows how to load a dataset from Apify into LangChain.

## Prerequisites
You need to have an existing dataset on the Apify platform. If you don't, please check out [this notebook](../modules/agents/tools/examples/apify.ipynb) on how to scrape websites with Apify.

## Basic Usage

In [None]:
from langchain.document_loaders import ApifyDatasetLoader
from langchain.docstore.document import Document

You need to provide a mapping function that converts each item of the Apify dataset into a `Document`.

For example, if your dataset items are structured like this:
```json
{
    "url": "https://apify.com",
    "text": "Apify is the best web scraping and automation platform."
}
```
The mapping function in the code below converts them to LangChain's `Document` format, so that you can use them with any language model (e.g. for question answering).

In [None]:
loader = ApifyDatasetLoader(
    dataset_id="your-dataset-id",
    dataset_mapping_function=lambda dataset_item: Document(
        page_content=dataset_item["text"], metadata={"source": dataset_item["url"]}
    ),
)

In [None]:
data = loader.load()
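
The inline lambda works for well-formed items, but it raises a `KeyError` when an item lacks `text` or `url`. As a minimal sketch, the mapping can also be written as a named function that tolerates missing fields. The name `item_to_document` and the fallback values are illustrative choices, not part of the Apify or LangChain API. The `try`/`except` import is only there so the snippet runs without LangChain installed, using a stand-in class with the same two fields.

```python
from dataclasses import dataclass, field

try:
    from langchain.docstore.document import Document
except ImportError:
    # Stand-in so the sketch runs without LangChain installed (assumption:
    # only page_content and metadata are needed here).
    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)


def item_to_document(dataset_item: dict) -> Document:
    """Map one Apify dataset item to a Document (hypothetical helper)."""
    return Document(
        # Fall back to an empty string if "text" is missing or None.
        page_content=dataset_item.get("text") or "",
        # Keep the page URL as the source so answers can cite it.
        metadata={"source": dataset_item.get("url", "")},
    )


# Sample items shaped like the JSON example above.
sample_items = [
    {"url": "https://apify.com", "text": "Apify is the best web scraping and automation platform."},
    {"url": "https://example.com/empty", "text": None},
]
docs = [item_to_document(item) for item in sample_items]
print(docs[0].page_content)
print(docs[1].metadata["source"])
```

To use it with the loader, pass `dataset_mapping_function=item_to_document` in place of the lambda.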

## An example with question answering

In this example, we index the documents loaded from a dataset and query the index to answer a question, along with the sources the answer came from.

In [None]:
from langchain.docstore.document import Document
from langchain.document_loaders import ApifyDatasetLoader
from langchain.indexes import VectorstoreIndexCreator

if __name__ == "__main__":
    loader = ApifyDatasetLoader(
        dataset_id="your-dataset-id",
        dataset_mapping_function=lambda item: Document(
            page_content=item["text"] or "", metadata={"source": item["url"]}
        ),
    )

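    # Load the documents from the dataset and build a vector store index over them.
    index = VectorstoreIndexCreator().from_loaders([loader])

    query = "What is Apify?"
    # The result is a dict holding the answer and the sources it was drawn from.
    result = index.query_with_sources(query)
    print(result["answer"])
    print(result["sources"])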
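
The `or ""` guard in the mapping function keeps items without text from failing, but it also produces documents with empty content. One possible way to keep those out of the index is a thin wrapper around the loader, sketched below. `NonEmptyLoader` and `FakeLoader` are hypothetical names introduced for illustration, and the sketch assumes the index creator only calls `load()` on each loader it receives, which is worth verifying against your installed LangChain version.

```python
from types import SimpleNamespace
from typing import Any, List


class NonEmptyLoader:
    """Hypothetical wrapper that drops documents whose page_content is blank."""

    def __init__(self, inner_loader: Any) -> None:
        self.inner_loader = inner_loader

    def load(self) -> List[Any]:
        # Delegate to the wrapped loader, then keep only non-blank documents.
        return [
            doc for doc in self.inner_loader.load() if doc.page_content.strip()
        ]


class FakeLoader:
    """Stand-in for ApifyDatasetLoader so the sketch runs offline."""

    def load(self) -> List[Any]:
        return [
            SimpleNamespace(
                page_content="Apify is the best web scraping and automation platform.",
                metadata={"source": "https://apify.com"},
            ),
            SimpleNamespace(
                page_content="",
                metadata={"source": "https://example.com/empty"},
            ),
        ]


kept = NonEmptyLoader(FakeLoader()).load()
print([doc.metadata["source"] for doc in kept])
```

Under that assumption, `NonEmptyLoader(loader)` would be passed to `from_loaders` in place of `loader`.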
