# Runhouse

[Runhouse](https://www.run.house/) lets you work with remote compute and data across environments and users. See the [Runhouse docs](https://www.run.house/docs).

This example shows how to use LangChain and [Runhouse](https://github.com/run-house/runhouse) to interact with models hosted on your own GPU, or on on-demand GPUs on AWS, GCP, Azure, or Lambda.

**Note**: The LangChain classes use the `SelfHosted` prefix (for example, `SelfHostedHuggingFaceLLM`) rather than `Runhouse`.

In [None]:
%pip install --upgrade --quiet "runhouse[sky]"

In [2]:
import runhouse as rh
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import SelfHostedHuggingFaceLLM, SelfHostedPipeline

In [None]:
# On-demand cluster; g5.4xlarge is an AWS instance type with a single A10G GPU
gpu = rh.cluster(name="langchain-rh-a10x", instance_type="g5.4xlarge")
gpu.up_if_not()  # bring the cluster up only if it is not already running
# For an on-demand A10G with AWS (no single A100s on AWS)
# gpu = rh.cluster(name='rh-a10x', instance_type='g5.2xlarge', provider='aws')

# For an existing cluster
# gpu = rh.cluster(ips=['<ip of the cluster>'],
#                  ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'},
#                  name='rh-a10x')

In [None]:
model_env = rh.env(
    name="model_env",
    reqs=["transformers", "torch", "accelerate", "huggingface-hub"],
    secrets=["huggingface"],  # needed to download google/gemma-2b-it
).to(system=gpu)

In [None]:
gpu.run(commands=["pip install langchain"])

In [None]:
llm = SelfHostedHuggingFaceLLM(
    model_id="google/gemma-2b-it",
    hardware=gpu,
    env=model_env,
)

In [7]:
template = """Question: {question}

Answer: Let's think step by step."""

In [8]:
prompt = PromptTemplate.from_template(template)
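
To see the exact text the model receives, the template can be rendered by hand. The following is a minimal sketch that uses plain Python string formatting as a stand-in for `PromptTemplate`, whose default template format is the same f-string-style `{question}` substitution; it does not call LangChain or Runhouse.

```python
# Same template as in the cell above; {question} is its only input variable.
template = """Question: {question}

Answer: Let's think step by step."""

# str.format is used here only to illustrate the substitution
# that the prompt template performs before the text is sent to the model.
rendered = template.format(question="What is the capital of Germany?")
print(rendered)
```

The rendered string ends with "Let's think step by step.", so the model's completion continues from that sentence.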

In [9]:
llm_chain = LLMChain(prompt=prompt, llm=llm)

In [10]:
question = "What is the capital of Germany?"

llm_chain.invoke(question)

INFO | 2024-03-24 13:58:18.040352 | Calling LangchainLLMModelPipeline.interface_fn
INFO | 2024-03-24 14:00:22.353148 | Time to call LangchainLLMModelPipeline.interface_fn: 124.31 seconds


{'question': 'What is the capital of Germany?',
 'text': '\n\nThe word "Germany" refers to a country in Western Europe that is located between the River Rhine and the River Danube.\n\nThe capital city of Germany is Berlin.\n\nTherefore, the capital of Germany is Berlin.'}
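
As the output above shows, the chain returns a dictionary that holds the input under `question` and the generated completion under `text`. The sketch below pulls the final answer out of that dictionary; `result` is a hypothetical variable name, and its value is copied from the output above rather than produced by a live call.

```python
# Stand-in for the value returned by llm_chain.invoke(question),
# copied from the notebook output above (no remote call is made here).
result = {
    "question": "What is the capital of Germany?",
    "text": '\n\nThe word "Germany" refers to a country in Western Europe that is located between the River Rhine and the River Danube.\n\nThe capital city of Germany is Berlin.\n\nTherefore, the capital of Germany is Berlin.',
}

# Strip the leading newlines and keep the last line of the completion.
answer = result["text"].strip()
last_line = answer.splitlines()[-1]
print(last_line)
```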

You can also call the model directly, without going through a chain:


In [11]:
llm.invoke("Write me a short poem about Super Bowl")

INFO | 2024-03-24 14:00:22.377121 | Calling LangchainLLMModelPipeline.interface_fn
INFO | 2024-03-24 14:04:15.849123 | Time to call LangchainLLMModelPipeline.interface_fn: 233.47 seconds


' Sunday.\n\nBright lights paint the stadium floor,\nA symphony of cheers and roars.\nFamilies gather, hand in hand,\nTo watch their heroes on the grandest stage.\n\nThe crowd roars loud, a thunderous beat,\nAs the game unfolds, a thrilling feat.\nThe halftime show ignites the sky,\nA spectacle that leaves a joyous cry.\n\nSuper Bowl Sunday, a spectacle to behold,\nA moment to cherish, a story to be told.\nThe anticipation hangs in air,\nAs the excitement reaches its peak.'