In [1]:
from llama_index.llms.ollama import Ollama

In [2]:
llm = Ollama(model="gemma:2b", request_timeout=600)  # request_timeout is in seconds

In [11]:
response = llm.complete("what is pathway api?")
print(response)

**Pathway API** is a software development kit (SDK) designed to facilitate the creation of secure and efficient APIs for various platforms. It provides a set of tools and resources to help developers define, build, and manage their APIs in a consistent and standardized manner.

**Key features of the Pathway API SDK include:**

* **Support for multiple platforms:** It provides support for popular platforms such as Azure, AWS, Google Cloud Platform, Salesforce, SAP, and more.
* **Rich documentation:** The SDK comes with comprehensive documentation, including tutorials, examples, and best practices.
* **Code generation tools:** It offers code generation tools to simplify the development process and generate API code from your specifications.
* **Security and compliance:** Pathway API is built with security and compliance in mind, ensuring that your APIs meet industry standards and best practices.
* **Versioning and rollback:** The SDK supports versioning and rollback, allowing you to mana
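The `Ollama` class talks to a locally running Ollama server over HTTP (by default at `http://localhost:11434`). As a rough illustration of that exchange, the sketch below builds, but does not send, a non-streaming request to Ollama's `/api/generate` REST endpoint. `build_generate_request` and `send` are hypothetical helpers, and LlamaIndex may call a different endpoint or pass extra options internally.

```python
import json
import urllib.request

# Ollama's default local address (assumed; adjust if your server runs elsewhere).
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0) -> str:
    """Send the request (needs a running Ollama server) and return the generated text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


req = build_generate_request("gemma:2b", "what is pathway api?")
print(req.get_method(), req.full_url)  # prints: POST http://localhost:11434/api/generate
```

Calling `send(req)` against a running server would return the text from the reply's `response` field; the 600-second default mirrors `request_timeout=600` above.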

In [8]:
from llama_index.embeddings.ollama import OllamaEmbedding
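`OllamaEmbedding` turns text into fixed-length vectors, and a vector index ranks stored chunks by how close their vectors are to the query's vector. A minimal sketch of that comparison using cosine similarity on made-up 3-dimensional vectors; real `all-minilm` embeddings are much longer, and the exact metric a given vector store uses is an assumption here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k (doc_id, similarity) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for real embeddings (hypothetical values).
docs = {
    "pathway_paper": [0.9, 0.1, 0.0],
    "readme": [0.7, 0.3, 0.1],
    "unrelated": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
print([doc_id for doc_id, _ in top_k(query, docs)])  # prints: ['pathway_paper', 'readme']
```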

In [10]:
from llama_index.retrievers.pathway import PathwayRetriever

retriever = PathwayRetriever(
    url="https://demo-document-indexing.pathway.stream"
)
retriever.retrieve(str_or_query_bundle="what is pathway")

[NodeWithScore(node=TextNode(id_='f83da7b8-b901-42df-b1d3-6c4170744cce', embedding=None, metadata={'created_at': 1706785638, 'modified_at': 1706785638, 'path': '/sites/ConnectorSandbox/Shared Documents/IndexerSandbox/arxiv 2307.13116.pdf', 'seen_at': 1715464864, 'size': 1550383, 'status': 'parsed'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='3 2 0 2\nl u J\n2 1\n]\nG L . s c [\n1 v 6 1 1 3 1 . 7 0 3 2 : v i X r a\nPathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications\nMichał Bartoszkiewicz\nJan Chorowski∗\nAdrian Kosowski\nJakub Kowalski\nSergey Kulik\nMateusz Lewandowski\nKrzysztof Nowicki\nKamil Piechowiak\nOlivier Ruas\nZuzanna Stamirowska\nPrzemysław Uznański\n{firstname.lastname}@pathway.com Pathway.com Paris, France\nABSTRACT We present Pathway, a new unified data processing framework that can run workloads on both bounded and unbounded data streams. The framework was cre
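The retriever returns a list of `NodeWithScore` objects, each wrapping a `TextNode` whose `text` is a matching chunk. A common next step, not shown in this notebook, is to paste those chunks into a prompt for `llm.complete` so the model answers from the retrieved documents. A minimal sketch, with plain strings standing in for `node.text` so it runs without a server; `build_rag_prompt` and the prompt wording are hypothetical.

```python
def build_rag_prompt(question: str, chunks: list[str], max_chunks: int = 3) -> str:
    """Assemble a grounded prompt from retrieved chunk texts.

    `chunks` stands in for [n.node.text for n in retriever.retrieve(question)];
    only the first `max_chunks` are kept, in the order given.
    """
    context = "\n\n".join(
        f"[{i + 1}] {text.strip()}" for i, text in enumerate(chunks[:max_chunks])
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunks = [
    "We present Pathway, a new unified data processing framework that can run "
    "workloads on both bounded and unbounded data streams.",
    "A second, less relevant chunk.",
]
prompt = build_rag_prompt("what is pathway", chunks, max_chunks=1)
print(prompt)
# With a live model, the answer would come from: llm.complete(prompt)
```

Grounding the prompt this way lets `gemma:2b` answer from the Pathway paper rather than from memory; its unaided answer above describes Pathway as an API SDK, whereas the retrieved abstract describes a stream data processing framework.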

In [13]:
!mkdir -p 'data/'
!wget 'https://gist.githubusercontent.com/janchorowski/dd22a293f3d99d1b726eedc7d46d2fc0/raw/pathway_readme.md' -O 'data/pathway_readme.md'

--2024-05-14 01:35:19--  https://gist.githubusercontent.com/janchorowski/dd22a293f3d99d1b726eedc7d46d2fc0/raw/pathway_readme.md
Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.111.133, ...
Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8645 (8.4K) [text/plain]
Saving to: ‘data/pathway_readme.md’


2024-05-14 01:35:20 (17.3 MB/s) - ‘data/pathway_readme.md’ saved [8645/8645]



In [14]:
import pathway as pw

data_sources = []
data_sources.append(
    pw.io.fs.read(
        "./data",
        format="binary",
        mode="streaming",
        with_metadata=True,
    )  # This creates a `pathway` connector that tracks
    # all the files in the ./data directory
)

# This creates a connector that tracks files in Google Drive.
# Please follow the instructions at https://pathway.com/developers/tutorials/connectors/gdrive-connector/ to get credentials.
# data_sources.append(
#     pw.io.gdrive.read(object_id="17H4YpBOAKQzEJ93xmC2z170l0bP2npMy", service_user_credentials_file="credentials.json", with_metadata=True))
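With `mode="streaming"`, the `pw.io.fs.read` connector keeps watching `./data` and feeds new or changed files into the pipeline, so the index stays current without rerunning the cell. The plain-Python sketch below only illustrates that idea and is not Pathway's implementation: a hypothetical `snapshot` helper records each file's modification time, and a hypothetical `scan_changes` helper diffs two snapshots.

```python
import os
import tempfile


def snapshot(directory: str) -> dict[str, float]:
    """Map every file path under `directory` to its last-modification time."""
    result = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            result[path] = os.path.getmtime(path)
    return result


def scan_changes(old: dict[str, float], new: dict[str, float]) -> dict[str, list[str]]:
    """Diff two snapshots into added, modified and deleted paths."""
    return {
        "added": sorted(p for p in new if p not in old),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
        "deleted": sorted(p for p in old if p not in new),
    }


with tempfile.TemporaryDirectory() as data_dir:
    before = snapshot(data_dir)
    # Simulate a new document landing in the watched directory.
    with open(os.path.join(data_dir, "pathway_readme.md"), "w") as f:
        f.write("# Pathway\n")
    after = snapshot(data_dir)
    changes = scan_changes(before, after)
    print({kind: [os.path.basename(p) for p in paths] for kind, paths in changes.items()})
    # A real watcher would call snapshot() periodically and diff consecutive results.
```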

In [15]:
from pathway.xpacks.llm.vector_store import VectorStoreServer
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.core.node_parser import TokenTextSplitter

embed_model = OllamaEmbedding(
    model_name="all-minilm:latest",
    base_url="http://localhost:11434",
    ollama_additional_kwargs={"mirostat": 0},
)

transformations_example = [
    TokenTextSplitter(
        chunk_size=150,
        chunk_overlap=10,
        separator=" ",
    ),
    embed_model,
]

processing_pipeline = VectorStoreServer.from_llamaindex_components(
    *data_sources,
    transformations=transformations_example,
)

# Host and port on which the Pathway server will listen
PATHWAY_HOST = "127.0.0.1"
PATHWAY_PORT = 8754

# `threaded=True` runs Pathway in detached mode (in a background thread), which suits notebooks;
# set it to False when running from a terminal or container.
# For more information on `with_cache`, see https://pathway.com/developers/api-docs/persistence-api
processing_pipeline.run_server(
    host=PATHWAY_HOST, port=PATHWAY_PORT, with_cache=False, threaded=True
)

[nltk_data] Downloading package stopwords to /home/alan/porgramming/py
[nltk_data]     thon/llamaindex/lindex/lib/python3.11/site-
[nltk_data]     packages/llama_index/legacy/_static/nltk_cache...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /home/alan/porgramming/python
[nltk_data]     /llamaindex/lindex/lib/python3.11/site-
[nltk_data]     packages/llama_index/legacy/_static/nltk_cache...
[nltk_data]   Unzipping tokenizers/punkt.zip.


<Thread(VectorStoreServer, started 127985001436736)>

(Press CTRL+C to quit)
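In the cell above, `TokenTextSplitter` cuts each document into chunks of at most `chunk_size=150` tokens, and consecutive chunks share `chunk_overlap=10` tokens, so text near a chunk boundary appears in both neighbouring chunks. The sketch below shows that sliding-window idea using words split on the `" "` separator in place of real tokenizer tokens; it is a simplification, not LlamaIndex's actual splitting algorithm, and `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text: str, chunk_size: int = 150, chunk_overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words; each window repeats the
    last `chunk_overlap` words of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split(" ")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(300))  # 300 word "tokens"
chunks = split_with_overlap(sample)
print([len(c.split(" ")) for c in chunks])  # prints: [150, 150, 20]
```

A larger overlap preserves more context across boundaries at the cost of storing and embedding more duplicated text.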
