# Apify

This notebook goes over how to use Apify to get content from documentation, knowledge bases, help centers, or blogs. You can use the data to train, fine-tune, or feed your large language models (LLMs) such as ChatGPT or LLaMA.

See the [Apify documentation](https://docs.apify.com/) for more information about the Apify platform.

## Example usage
The example below runs a website content crawler on Apify to extract text from all pages in the platform documentation, then uses the scraped text for question answering.

In [1]:
from langchain.document_loaders.base import Document
from langchain.indexes import VectorstoreIndexCreator
from langchain.utilities import ApifyWrapper

In [2]:
apify = ApifyWrapper()
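
`ApifyWrapper` needs an Apify API token, and the default `VectorstoreIndexCreator` used later relies on OpenAI. A minimal sketch of a pre-flight check follows. The helper name `missing_env_vars` is hypothetical (it is not part of LangChain or Apify), and the variable names `APIFY_API_TOKEN` and `OPENAI_API_KEY` are assumptions based on the libraries' usual defaults, so adjust them if your setup differs.

```python
import os

# Assumed variable names; not taken from this notebook.
REQUIRED_VARS = ("APIFY_API_TOKEN", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these environment variables first:", ", ".join(missing))
```

Running this before creating the wrapper reports a missing credential up front instead of partway through the crawl.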

The code below calls an Actor and fetches its results into a document loader. If you already have results in a dataset and want to work with them, see [this notebook](../../../indexes/document_loaders/examples/apify_dataset.ipynb), which shows how to use `ApifyDatasetLoader`. That notebook also explains the `dataset_mapping_function`.

In [3]:
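# Run the Actor and map each item of its result dataset to a Document.
loader = apify.call_actor(
    actor_id="apify/website-content-crawler",
    run_input={"startUrls": [{"url": "https://docs.apify.com/platform"}]},
    # `item["text"] or ""` guards against items whose text is None.
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "", metadata={"source": item["url"]}
    ),
)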
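
The lambda passed as `dataset_mapping_function` can also be written as a named function, which is easier to test and can tolerate items that lack a field. The sketch below is illustrative only: `item_to_document` and `sample_item` are hypothetical names, and the small `Document` dataclass is a stand-in for LangChain's class so that the snippet runs without LangChain installed.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for LangChain's Document so this sketch runs on its own."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def item_to_document(item: dict) -> Document:
    """Map one dataset item to a Document, tolerating missing fields."""
    return Document(
        page_content=item.get("text") or "",
        metadata={"source": item.get("url", "")},
    )


# Hypothetical dataset item whose text is missing.
sample_item = {"url": "https://docs.apify.com/platform", "text": None}
doc = item_to_document(sample_item)
print(repr(doc.page_content), doc.metadata["source"])
```

In the notebook itself you would keep the `Document` import from LangChain and pass `dataset_mapping_function=item_to_document` to `call_actor`.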

In [4]:
index = VectorstoreIndexCreator().from_loaders([loader])

In [5]:
query = "What is Apify?"
result = index.query_with_sources(query)

In [6]:
print(result["answer"])
print(result["sources"])

 Apify is a platform for developing, running, and sharing serverless cloud programs. It enables users to create web scraping and automation tools and publish them on the Apify platform.

https://docs.apify.com/platform/actors, https://docs.apify.com/platform/actors/running/actors-in-store, https://docs.apify.com/platform/security, https://docs.apify.com/platform/actors/examples
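
In this run, `result["sources"]` is a single comma-separated string. If you need the individual URLs, a small helper can split it. This is a sketch that assumes the string keeps the format shown above. The model generates that string, so the format is not guaranteed, and `split_sources` is a hypothetical helper rather than part of LangChain.

```python
from typing import List


def split_sources(sources: str) -> List[str]:
    """Split a comma-separated sources string into a list of clean URLs."""
    return [part.strip() for part in sources.split(",") if part.strip()]


# Sample copied from the output shown above.
sample = (
    "https://docs.apify.com/platform/actors, "
    "https://docs.apify.com/platform/actors/running/actors-in-store, "
    "https://docs.apify.com/platform/security, "
    "https://docs.apify.com/platform/actors/examples"
)
urls = split_sources(sample)
for url in urls:
    print(url)
```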
