# Build apps with language models

This is the first follow-up notebook from the list of tutorials.

Link: https://www.youtube.com/watch?v=LbT1yp6quS8

In [26]:
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv('../.env'))

True

In [2]:
# Import model from Hugging Face Hub
from langchain import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id="google/flan-t5-small",
    model_kwargs={
        "temperature": 0,
        "max_length": 64
    }
)

llm("Translate English to French: How old are you?")



'How old sont vous?'

In [43]:
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

In [4]:
# Prompt template
from langchain import PromptTemplate

template = """Question: {question}

Let us think step by step

Answer:"""

prompt = PromptTemplate(template=template, input_variables=["question"])
prompt.format(question="Can you hear me?")

'Question: Can you hear me?\n\nLet us think step by step\n\nAnswer:'
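
For a plain `{question}` placeholder like the one above, the formatting step behaves much like Python's built-in `str.format`. The sketch below uses only the standard library and is meant to illustrate the idea; it is not a description of how `PromptTemplate` is implemented internally.

```python
# Standard-library illustration of filling a prompt template.
# No LangChain involved; this only mirrors the string substitution step.
template = """Question: {question}

Let us think step by step

Answer:"""

formatted = template.format(question="Can you hear me?")
print(repr(formatted))
```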

In [5]:
llm(prompt.format(question="Can you hear me?"))

'\n\nStep 1: Are you speaking out loud?\n\nStep 2: Are you using a microphone or other audio device?\n\nStep 3: Is the audio device connected to a speaker or other audio output device?\n\nStep 4: Is the speaker or other audio output device turned on and working properly?\n\nIf the answer to all of these questions is yes, then the answer to your question is yes, you can be heard.'

In [6]:
# Chain
from langchain import LLMChain

llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "Can you answer my question?"
llm_chain.run(question)

' It depends on what your question is. If it is something that I can answer, then yes, I can answer your question. If it is something that I do not know the answer to, then I cannot answer your question.'
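
Conceptually, the chain above does two things in sequence: it fills the prompt template, then passes the result to the model. A minimal sketch of that idea follows, assuming a hypothetical `make_chain` helper and a stand-in `echo_llm` function in place of a real model; neither name comes from LangChain.

```python
from typing import Callable


def make_chain(template: str, llm: Callable[[str], str]) -> Callable[[str], str]:
    """Return a function that formats the template, then calls the model."""
    def run(question: str) -> str:
        return llm(template.format(question=question))
    return run


def echo_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model: it just echoes the prompt.
    return "ECHO: " + prompt


run_chain = make_chain("Question: {question}\n\nAnswer:", echo_llm)
result = run_chain("Can you answer my question?")
print(result)
```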

In [7]:
# Agents and Tools
from langchain.agents import load_tools
from langchain.agents import initialize_agent

tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

agent.run('In what year was the film Departed with Leopnardo Dicarpio released? What is this year raised to the 0.43 power?')



> Entering new AgentExecutor chain...
I need to find out the year the film was released and then use the calculator to calculate the power.
Action: Wikipedia
Action Input: Departed with Leonardo Dicarpio
Observation: No good Wikipedia Search Result was found
Thought: I should try searching for the movie title
Action: Wikipedia
Action Input: The Departed



Observation: Page: The Departed
Summary: The Departed is a 2006 American epic crime thriller film directed by Martin Scorsese and written by William Monahan. It is both a remake of the 2002 Hong Kong film Infernal Affairs and also loosely based on the real-life Boston Winter Hill Gang; the character Colin Sullivan is based on the corrupt FBI agent John Connolly, while the character Frank Costello is based on Irish-American gangster and crime boss Whitey Bulger. The film stars Leonardo DiCaprio, Matt Damon, Jack Nicholson, and Mark Wahlberg, with Martin Sheen, Ray Winstone, Vera Farmiga, Alec Baldwin, Anthony Anderson and James Badge Dale in supporting roles.
The film takes place in Boston and the surrounding metro area, primarily in the city’s South End neighborhood. Irish Mob boss Frank Costello (Nicholson) plants Colin Sullivan (Damon) as a spy within the Massachusetts State Police; simultaneously, the police assign undercover state trooper Billy Costigan (DiCaprio) to 

'The film Departed with Leonardo Dicarpio was released in 2006 and this year raised to the 0.43 power is 26.30281917656938.'
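
The agent's arithmetic can be checked directly in Python. The year 2006 is taken from the Wikipedia summary above; the rest is a plain floating-point power.

```python
# Check the calculator step of the agent run: 2006 raised to the 0.43 power.
year = 2006
result = year ** 0.43
print(result)  # roughly 26.30
```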

In [8]:
# Memory
from langchain import ConversationChain

conversation = ConversationChain(llm=llm, verbose=True)
conversation.predict(input="Hi there!")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi there!
AI:

> Finished chain.


" Hi there! It's nice to meet you. My name is AI. What's your name?"

In [9]:
conversation.predict(input="Can we talk about AI?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI:  Hi there! It's nice to meet you. My name is AI. What's your name?
Human: Can we talk about AI?
AI:

> Finished chain.


' Sure! What would you like to know about AI?'

In [10]:
conversation.predict(input="I'm interested in how you were trained?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI:  Hi there! It's nice to meet you. My name is AI. What's your name?
Human: Can we talk about AI?
AI:  Sure! What would you like to know about AI?
Human: I'm interested in how you were trained?
AI:

> Finished chain.


' I was trained using a combination of supervised and unsupervised learning techniques. Supervised learning involves providing the AI with labeled data sets that it can use to learn from. Unsupervised learning involves giving the AI unlabeled data sets and allowing it to find patterns and correlations in the data.'
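
The verbose output shows that each new prompt replays all earlier turns under "Current conversation:". A minimal sketch of that buffering idea follows, using only the standard library. `BufferedConversation` and `canned_llm` are hypothetical names for illustration and are not LangChain classes.

```python
from typing import Callable, List, Tuple


class BufferedConversation:
    """Hypothetical helper: stores every turn and replays it in each prompt."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self.llm = llm
        self.turns: List[Tuple[str, str]] = []

    def build_prompt(self, user_input: str) -> str:
        lines = ["Current conversation:"]
        for human, ai in self.turns:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        lines.append(f"Human: {user_input}")
        lines.append("AI:")
        return "\n".join(lines)

    def predict(self, user_input: str) -> str:
        reply = self.llm(self.build_prompt(user_input))
        # Remember the turn so the next prompt includes it.
        self.turns.append((user_input, reply))
        return reply


def canned_llm(prompt: str) -> str:
    # Stand-in for a real model: reports how many human turns it was shown.
    return f"reply {prompt.count('Human:')}"


conversation = BufferedConversation(canned_llm)
first = conversation.predict("Hi there!")
second = conversation.predict("Can we talk about AI?")
next_prompt = conversation.build_prompt("How were you trained?")
print(next_prompt)
```

Because the whole history is replayed, the prompt grows with every turn, which matches the three prompts printed above.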

In [20]:
# Document Loader
from langchain.document_loaders import TextLoader

loader = TextLoader("big.txt")
documents = loader.load()

In [23]:
# Text Splitter
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(chunk_size=3000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

Created a chunk of size 3932, which is longer than the specified 3000
Created a chunk of size 5781, which is longer than the specified 3000
Created a chunk of size 3248, which is longer than the specified 3000
Created a chunk of size 4125, which is longer than the specified 3000
Created a chunk of size 3541, which is longer than the specified 3000
Created a chunk of size 3054, which is longer than the specified 3000
Created a chunk of size 3526, which is longer than the specified 3000
Created a chunk of size 4387, which is longer than the specified 3000
Created a chunk of size 3166, which is longer than the specified 3000
Created a chunk of size 3322, which is longer than the specified 3000
Created a chunk of size 3592, which is longer than the specified 3000
Created a chunk of size 3376, which is longer than the specified 3000
Created a chunk of size 3094, which is longer than the specified 3000
Created a chunk of size 54862, which is longer than the specified 3000
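
These warnings appear because `CharacterTextSplitter` splits on a separator (by default a blank line, `"\n\n"`) and does not cut inside a piece, so any single paragraph longer than `chunk_size` becomes an oversized chunk. The sketch below is a simplified standard-library imitation of that behavior, not the library's actual code; `split_on_separator` is a hypothetical name.

```python
from typing import List


def split_on_separator(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size.

    A single piece longer than chunk_size is kept whole, so it ends up oversized.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # may be longer than chunk_size
    if current:
        chunks.append(current)
    return chunks


text = "short one\n\n" + "x" * 50 + "\n\nshort two"
chunks = split_on_separator(text, chunk_size=20)
sizes = [len(c) for c in chunks]
print(sizes)  # [9, 50, 9]
```

The middle chunk is 50 characters long even though `chunk_size` is 20, which is the same situation the warnings above report.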


In [13]:
# Embeddings
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings()


In [24]:
# Vector Store
from langchain.vectorstores import FAISS

db = FAISS.from_documents(docs, embeddings)

query = "Did Holmes return to the Baker Street by 3 o'clock?"
docs = db.similarity_search(query)

In [25]:
docs[0].page_content

'I had been delayed at a case, and it was a little after half-past six when I found myself in Baker Street once more. As I approached the house I saw a tall man in a Scotch bonnet with a coat which was buttoned up to his chin waiting outside in the bright semicircle which was thrown from the fanlight. Just as I arrived the door was opened, and we were shown up together to Holmes\' room.\n\n"Mr. Henry Baker, I believe," said he, rising from his armchair and greeting his visitor with the easy air of geniality which he could so readily assume. "Pray take this chair by the fire, Mr. Baker. It is a cold night, and I observe that your circulation is more adapted for summer than for winter. Ah, Watson, you have just come at the right time. Is that your hat, Mr. Baker?"\n\n"Yes, sir, that is undoubtedly my hat."\n\nHe was a large man with rounded shoulders, a massive head, and a broad, intelligent face, sloping down to a pointed beard of grizzled brown. A touch of red in nose and cheeks, with 
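
A similarity search ranks stored texts by how close their vectors are to the query vector. The sketch below illustrates the idea with word-count vectors and cosine similarity from the standard library. This is an assumption-laden toy: the real pipeline uses learned embeddings and a FAISS index, and the three sample sentences are invented for the example.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Toy "embedding": a bag of lowercase words (not a learned embedding).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_search(query: str, texts: List[str], k: int = 1) -> List[str]:
    """Return the k texts whose toy vectors are closest to the query."""
    q = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(q, embed(t)), reverse=True)
    return ranked[:k]


# Invented sample sentences, used only for this illustration.
texts = [
    "holmes returned to baker street in the evening",
    "the goose was sold at the market",
    "watson read the newspaper at breakfast",
]
top = similarity_search("when did holmes return to baker street", texts, k=1)
print(top)
```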

### Additional Practice - Analyzing a Paper

In [50]:
# Document Loader
from langchain.document_loaders import ArxivLoader

loader = ArxivLoader("2206.04564")
documents = loader.load()

In [51]:
# Text Splitter
from langchain.text_splitter import LatexTextSplitter

text_splitter = LatexTextSplitter()
docs = text_splitter.split_documents(documents)

In [52]:
docs[0].page_content

'TwiBot-22: Towards Graph-Based\nTwitter Bot Detection\nShangbin Feng1,2∗ Zhaoxuan Tan1∗ Herun Wan1∗ Ningnan Wang1∗ Zilong Chen1,3∗ Binchi Zhang1,4∗\nQinghua Zheng1† Wenqian Zhang1\nZhenyu Lei1\nShujie Yang1\nXinshun Feng1\nQingyue Zhang1\nHongrui Wang1\nYuhan Liu1\nYuyang Bai1\nHeng Wang1\nZijian Cai1\nYanbo Wang1\nLijing Zheng1\nZihan Ma1\nJundong Li4\nMinnan Luo1\nXi’an Jiaotong University1, University of Washington2, Tsinghua University3, University of Virginia4\ncontact: shangbin@cs.washington.edu\nAbstract\nTwitter bot detection has become an increasingly important task to combat\nmisinformation, facilitate social media moderation, and preserve the integrity of\nthe online discourse. State-of-the-art bot detection methods generally leverage the\ngraph structure of the Twitter network, and they exhibit promising performance\nwhen confronting novel Twitter bots that traditional methods fail to detect.\nHowever, very few of the existing Twitter bot detection datasets are graph-based

In [53]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

In [54]:
from langchain.vectorstores import Chroma

vector_store = Chroma.from_documents(
    docs,
    embeddings,
    collection_name="deformable-graph-transformer",
    persist_directory="../chroma"
)

Using embedded DuckDB with persistence: data will be stored in: ../chroma


In [60]:
from langchain.chains import ConversationalRetrievalChain

chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vector_store.as_retriever(),
    return_source_documents=True
)

prompt = "How many users are there in the Twibot-22 dataset?"
response = chain({
    "question": prompt,
    "chat_history": []  # list of (question, answer) tuples; empty on the first turn
})

# Retrieve the answer
answer = response["answer"]
source = response["source_documents"]
print("Question:", prompt)
print("Answer:", answer)

Chroma collection deformable-graph-transformer contains fewer than 4 elements.
Chroma collection deformable-graph-transformer contains fewer than 3 elements.
Chroma collection deformable-graph-transformer contains fewer than 2 elements.


Question: How many users are there in the Twibot-22 dataset?
Answer:  I don't know.
