# Fake LLM

https://api.python.langchain.com/en/stable/llms/langchain_community.llms.fake.FakeListLLM.html#langchain_community.llms.fake.FakeListLLM

Use a fake LLM during development and testing to avoid the cost of calling a real model.


#### Google Colab
If you are running the code in Google Colab, install the packages first.

* The API key file will not be available
* You will be prompted to provide the Cohere API Token

Uncomment & run the code in the cell below:

In [1]:
# !pip install langchain langchain-community -q

## 1. Import fake LLM

In [1]:
from langchain_community.llms.fake import FakeListLLM
from langchain_community.llms.fake import FakeStreamingListLLM

## 2. Setup the Fake List LLM

In [2]:
# Canned responses returned by the fake LLM, in list order
fake_responses = [
    "this is a fake response 1",
    "this is a fake response 2",
]

# Create the fake LLM
fake_list_llm = FakeListLLM(responses=fake_responses)

#### Synchronous invoke()

In [3]:
%%time

# Quick test to see if it is working
fake_list_llm.invoke("give whatever")

CPU times: total: 0 ns
Wall time: 3.78 ms


'this is a fake response 1'
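
The fake LLM ignores the prompt and hands back its canned responses in list order, starting again from the first one once the list is exhausted (the batch output later in this notebook shows the same wrap-around). The sketch below only mimics that behaviour in plain Python; `CyclingResponder` is a hypothetical stand-in written for illustration, not the LangChain implementation.

```python
from itertools import cycle


class CyclingResponder:
    """Hypothetical stand-in: returns canned responses in order, wrapping around."""

    def __init__(self, responses):
        # cycle() repeats the list endlessly: 1, 2, 1, 2, ...
        self._responses = cycle(responses)

    def invoke(self, prompt):
        # The prompt is ignored, just as a fake LLM ignores its input
        return next(self._responses)


responder = CyclingResponder(
    ["this is a fake response 1", "this is a fake response 2"]
)

# Three calls against two responses: the third call wraps around
outputs = [responder.invoke("give whatever") for _ in range(3)]
print(outputs)
```

Because the sequence is predictable, assertions in unit tests can rely on which response comes back for each call.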

#### Asynchronous ainvoke()

* Uses the Python [asyncio library](https://docs.python.org/3/library/asyncio.html)


In [7]:
# Use await to invoke the async function (top-level await works in Jupyter/IPython)
response = await fake_list_llm.ainvoke("give whatever")

# print response
print(response)
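
The cell above relies on the notebook's support for top-level `await`. In a plain Python script the coroutine has to be driven by an event loop, for example with `asyncio.run()`. A minimal sketch of that pattern follows; `fake_ainvoke` is a hypothetical stand-in coroutine (not a LangChain function) so the example runs without any packages installed.

```python
import asyncio


async def fake_ainvoke(prompt: str) -> str:
    """Hypothetical stand-in for an async LLM call; the prompt is ignored."""
    await asyncio.sleep(0.01)  # simulate a short wait for the model
    return "this is a fake response 1"


async def main() -> str:
    # Inside a coroutine, await works the same way as in the notebook cell
    return await fake_ainvoke("give whatever")


# asyncio.run() creates an event loop, runs main() to completion, and closes the loop
response = asyncio.run(main())
print(response)
```

With the real fake LLM, the body of `main()` would presumably await `fake_list_llm.ainvoke(...)` instead of the stand-in.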


## 3. Streaming

https://python.langchain.com/v0.1/docs/expression_language/streaming/

* Supports synchronous & asynchronous streaming
* Uses the fake streaming LLM (`FakeStreamingListLLM`) to demonstrate how streaming works

https://api.python.langchain.com/en/stable/llms/langchain_community.llms.fake.FakeStreamingListLLM.html#langchain_community.llms.fake.FakeStreamingListLLM



#### Synchronous stream()

In [8]:
# Create a fake streaming LLM with an artificial delay of 0.5 seconds between chunks
fake_streaming_llm = FakeStreamingListLLM(responses=fake_responses, sleep=0.5)

# Synchronous streaming
for chunk in fake_streaming_llm.stream("does not matter"):
    print(chunk,end="")

this is a fake response 1
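
Streaming delivers the response as a sequence of small chunks that the caller prints or accumulates. The sketch below shows the accumulation side in plain Python. It assumes each chunk is a single character, which is how the fake streaming LLM appears to behave; `fake_stream` is a hypothetical stand-in generator, not the library's code.

```python
import time
from typing import Iterator


def fake_stream(text: str, sleep: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in: yield the text one character at a time."""
    for ch in text:
        time.sleep(sleep)  # artificial delay before each chunk
        yield ch


# Collect the chunks instead of printing them
chunks = list(fake_stream("this is a fake response 1"))

# Joining the chunks reconstructs the full response
full_text = "".join(chunks)
print(len(chunks), "chunks ->", full_text)
```

Under that one-character-per-chunk assumption, a delay of 0.5 seconds adds up quickly for long responses, so a smaller `sleep` value may be more practical in tests.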

#### Asynchronous astream()

In [10]:
# astream() returns an async generator; no chunks are produced until it is iterated
streaming_object = fake_streaming_llm.astream("does not matter")

# Other work can run here before the stream is consumed
print("\nDo something while LLM is generating a response i.e., thread is not blocked ...")

# Consume the streamed chunks as they arrive
async for chunk in streaming_object:
    print(chunk,end="")


Do something while LLM is generating a response i.e., thread is not blocked ...
this is a fake response 1
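
To have other work genuinely overlap with the stream, both can be scheduled on the event loop together, for instance with `asyncio.gather()`. The following is a minimal sketch under that assumption; `fake_astream`, `collect`, and `other_work` are hypothetical names introduced for illustration, and the async generator stands in for the LLM's `astream()`.

```python
import asyncio
from typing import AsyncIterator, List, Tuple


async def fake_astream(text: str, delay: float = 0.01) -> AsyncIterator[str]:
    """Hypothetical stand-in for astream(): yield one character per step."""
    for ch in text:
        await asyncio.sleep(delay)  # yields control to the event loop
        yield ch


async def collect(text: str) -> str:
    """Consume the async stream and return the joined chunks."""
    parts: List[str] = []
    async for chunk in fake_astream(text):
        parts.append(chunk)
    return "".join(parts)


async def other_work() -> str:
    """Some unrelated task that runs while the stream is being consumed."""
    await asyncio.sleep(0.01)
    return "other work done"


async def main() -> Tuple[str, str]:
    # gather() runs both coroutines concurrently on the same event loop
    streamed, other = await asyncio.gather(
        collect("this is a fake response 1"), other_work()
    )
    return streamed, other


streamed, other = asyncio.run(main())
print(streamed)
print(other)
```

In a notebook, `await main()` would replace the `asyncio.run(main())` call, since an event loop is already running there.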

## 4. Batches

In [12]:
# Fake responses
fake_responses = [
    '{"person": "tom", "org": "apple"}',
    '{"person": "nick", "org": "amazon"}',
    '{"person": "anil", "org": "meta"}',
]

fake_list_llm = FakeListLLM(responses=fake_responses)

# Test requests sent together in a single batch call
request_inputs = [
    "extract named entities :  ......",
    "extract named entities :  ......",
    "extract named entities :  ......",
    "extract named entities :  ......",
]

# Invoke the LLM for all 4 inputs; with only 3 fake responses, the 4th wraps around to the first
fake_list_llm.batch(request_inputs)

['{"person": "tom", "org": "apple"}',
 '{"person": "nick", "org": "amazon"}',
 '{"person": "anil", "org": "meta"}',
 '{"person": "tom", "org": "apple"}']
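
Because the fake responses are JSON strings, the batch output can be used to exercise downstream parsing code without calling a real model. A minimal sketch, with the batch output copied in as a literal list so the snippet runs on its own; the variable names `batch_outputs`, `records`, and `orgs` are introduced here for illustration.

```python
import json

# Output of the batch call above, copied in as a literal list
batch_outputs = [
    '{"person": "tom", "org": "apple"}',
    '{"person": "nick", "org": "amazon"}',
    '{"person": "anil", "org": "meta"}',
    '{"person": "tom", "org": "apple"}',
]

# Parse each JSON string into a dictionary
records = [json.loads(raw) for raw in batch_outputs]

# Pull out one field per record, as an entity-extraction pipeline might
orgs = [record["org"] for record in records]
print(orgs)
```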