In [5]:
import os
from dotenv import load_dotenv, find_dotenv

In [6]:
# Load the .env file
load_dotenv(find_dotenv(), override=True)


True

LLM Wrappers Using GPT-3 (text-davinci-003)

In [7]:
from langchain.llms import OpenAI 
llm = OpenAI(model_name='text-davinci-003', temperature=0.7, max_tokens=512)
print(llm)

OpenAI
Params: {'model_name': 'text-davinci-003', 'temperature': 0.7, 'top_p': 1, 'frequency_penalty': 0, 'presence_penalty': 0, 'n': 1, 'logit_bias': {}, 'max_tokens': 512}


In [8]:
output = llm("What is an AI Large Language Model?")
print(output)

# get_num_tokens is applied to the model's response here, not to the prompt
print("Number of tokens in the response:", llm.get_num_tokens(output))



AI Large Language Models are a type of artificial intelligence system that uses large amounts of data to learn the structure and rules of a language. These models are typically used for natural language processing (NLP) tasks such as text classification, sentiment analysis, and machine translation. They are trained on large amounts of data, such as text from books, articles, and other sources, and use unsupervised learning techniques to learn the underlying patterns and rules of languages.
Number of tokens in the response: 93


ChatModels: GPT-3.5-Turbo and GPT-4

In [9]:
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain.chat_models import ChatOpenAI

In [10]:
# Have the chat model act as a German physicist and explain what quantum mechanics is
chat = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.5, max_tokens=1024)  # cap the response at 1024 tokens
messages = [
    SystemMessage(content="You are a German physicist and respond only in German."),
    HumanMessage(content="What is quantum mechanics?")
]
output = chat(messages)

print(output)

content='Quantenmechanik ist ein physikalisches Theoriegebäude, das die Eigenschaften und das Verhalten von Teilchen auf atomarer und subatomarer Ebene beschreibt. Sie beschäftigt sich mit dem Verhalten von Materie und Energie auf kleinster Skala und beschreibt Phänomene wie Superposition, Verschränkung und Quanteninterferenz. Die Quantenmechanik verwendet mathematische Modelle wie die Schrödinger-Gleichung, um die Wahrscheinlichkeitsverteilung von Teilchen und deren Zustände zu beschreiben. Sie ist eine der fundamentalen Theorien der modernen Physik und hat zu vielen technologischen Fortschritten geführt, wie zum Beispiel der Entwicklung von Quantencomputern und Quantenkommunikationssystemen.'


Prompt Templates


In [1]:
from langchain import PromptTemplate
from langchain.llms import OpenAI

template = '''You are an experienced virologist. Write a few sentences about the following {virus} in {language}.'''

prompt = PromptTemplate(
    input_variables=['virus', 'language'],
    template=template
)
print(prompt)

llm = OpenAI(model_name='text-davinci-003', temperature=0.7, max_tokens=512)

output = llm(prompt.format(virus='SARS-CoV-2', language='Spanish'))
print(output)

input_variables=['language', 'virus'] template='You are an experienced virologist. Write a few sentences about the following {virus} in {language}.'


SARS-CoV-2 es un virus altamente contagioso que causa enfermedades respiratorias graves, como la COVID-19. Es una cepa de coronavirus que se transmite principalmente a través de contacto cercano con una persona infectada. Los síntomas se pueden manifestar desde unos pocos días hasta varias semanas después de la exposición. Por lo tanto, es importante tomar medidas preventivas para prevenir la propagación de SARS-CoV-2.
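A prompt template is, at its core, a format string with named placeholders. The sketch below uses only the standard library to pull the placeholder names out of a template and fill them in. `placeholder_names` is a hypothetical helper, not part of LangChain, and it makes no claim about how `PromptTemplate` is implemented internally.

```python
from string import Formatter

def placeholder_names(template):
    """Return the sorted, de-duplicated placeholder names in a format string."""
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return sorted(names)

template = "Write a few sentences about the following {virus} in {language}."
variables = placeholder_names(template)
filled = template.format(virus="SARS-CoV-2", language="Spanish")

print(variables)
print(filled)
```

Deriving the variable names from the template itself avoids keeping a hand-written list in sync with the placeholders.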


Simple Chains

In [2]:
from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate
from langchain.chains import LLMChain

llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.5, max_tokens=1024)
prompt = PromptTemplate(
    input_variables=['virus', 'language'],
    template=template
)

chain = LLMChain(llm=llm, prompt=prompt)
output = chain.run({'virus':'HSV-1', 'language':'Spanish'})

In [3]:
print(output)

El HSV-1, también conocido como virus del herpes simple tipo 1, es un virus de ADN perteneciente a la familia Herpesviridae. Este virus es la causa principal del herpes labial, una infección que se caracteriza por la aparición de ampollas dolorosas alrededor de la boca. El HSV-1 se transmite principalmente a través del contacto directo con una persona infectada, ya sea a través de besos, compartir utensilios o por contacto con las lesiones activas. Aunque el herpes labial es una enfermedad común y generalmente benigna, en casos raros puede causar complicaciones graves en personas con sistemas inmunológicos debilitados.
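Conceptually, the chain above does two things: it fills the template with the supplied values and passes the resulting string to the model. The sketch below mimics that flow with a stand-in model so that it runs offline. `fake_llm` and `run_chain` are hypothetical names, and the real `LLMChain` handles considerably more (for example callbacks, memory, and output parsing).

```python
def fake_llm(prompt):
    """Stand-in for a model call: echoes the prompt so the flow is visible."""
    return "[model response to: " + prompt + "]"

def run_chain(llm, template, inputs):
    """Fill the template with `inputs`, then hand the prompt to `llm`."""
    prompt = template.format(**inputs)
    return llm(prompt)

template = "Write a few sentences about the following {virus} in {language}."
result = run_chain(fake_llm, template, {"virus": "HSV-1", "language": "Spanish"})
print(result)
```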


Sequential Chains

In [5]:
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm1 = OpenAI(model_name='text-davinci-003', temperature=0.7, max_tokens=512)
prompt1 = PromptTemplate(
    input_variables=['concept'],
    template='You are an experienced scientist and Python programmer. Write a function that implements the concept of {concept}.'
)
chain1 = LLMChain(llm=llm1, prompt=prompt1)

llm2 = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=1.2, max_tokens=1024)
prompt2 = PromptTemplate(
    input_variables=['function'],  # must match the {function} placeholder in the template
    template='Given the Python function {function}, describe it in as much detail as possible.'
)
chain2 = LLMChain(llm=llm2, prompt=prompt2)

combine_chain = SimpleSequentialChain(chains=[chain1, chain2], verbose=True) # run chain1, then feed its output to chain2
output = combine_chain.run('linear regression')



> Entering new SimpleSequentialChain chain...

def linear_regression(x, y):
    """
    Implements the concept of linear regression.
    Returns the slope (m) and y-intercept (b) for the linear regression equation y = mx + b
    Args:
        x (list): a list of x values
        y (list): a list of y values
    Returns:
        m (float): the slope
        b (float): the y-intercept
    """
    
    # Calculate sums
    x_sum = sum(x)
    y_sum = sum(y)
    x_squared_sum = sum([x_i**2 for x_i in x])
    xy_sum = sum([x[i] * y[i] for i in range(len(x))])
    
    # Calculate m 
    m = (len(x) * xy_sum - x_sum * y_sum) / (len(x) * x_squared_sum - x_sum ** 2)
    
    # Calculate b
    b = (y_sum - m * x_sum) / len(x)
    
    return m, b
The given Python function implements the concept of linear regression. Linear regression is a statistical method used to model the relationship between two variables by fitting a linear equation to observed data.

In [6]:
print(output)

The given Python function implements the concept of linear regression. Linear regression is a statistical method used to model the relationship between two variables by fitting a linear equation to observed data. The linear equation is of the form y = mx + b, where m represents the slope (change in y for a unit change in x) and b represents the y-intercept (value of y when x is zero).

The function accepts two arguments, x and y, which are lists of x and y values, respectively. These lists represent the observed data points for which the linear regression line needs to be fitted.

The function first calculates the following sums:
- x_sum: the sum of all x values in the list
- y_sum: the sum of all y values in the list
- x_squared_sum: the sum of all squared x values (multiplying each x value by itself and summing them)
- xy_sum: the sum of the product of each x and y pair (multiplying x[i] and y[i] for each respective index i and summing them)

Using these sums, the function then calcu
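Because the function in the trace above was written by a model, it is worth checking before use. A minimal check, assuming the generated code is copied as shown (lightly tidied here to use `zip`), is to feed it points that lie exactly on a known line and confirm the slope and intercept come back. The sample data is made up for the test.

```python
def linear_regression(x, y):
    """Least-squares slope (m) and intercept (b) for y = m*x + b."""
    n = len(x)
    x_sum = sum(x)
    y_sum = sum(y)
    x_squared_sum = sum(x_i ** 2 for x_i in x)
    xy_sum = sum(x_i * y_i for x_i, y_i in zip(x, y))

    m = (n * xy_sum - x_sum * y_sum) / (n * x_squared_sum - x_sum ** 2)
    b = (y_sum - m * x_sum) / n
    return m, b

# Points on the line y = 2x + 1, so the fit should recover m = 2 and b = 1.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
m, b = linear_regression(xs, ys)
print(m, b)
```

Note that the function divides by zero when all x values are identical, which the generated code does not guard against.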

Agents

LLMs are unreliable at exact calculations (e.g. 3.4**5.2). An alternative is to have the LLM generate Python code for the calculation and run it. This approach carries risks of its own, since the generated code is executed directly.

In [8]:
from langchain_experimental.agents.agent_toolkits import create_python_agent 
from langchain_experimental.tools.python.tool import PythonREPLTool
from langchain.llms import OpenAI

In [9]:
llm = OpenAI(temperature=0)
agent_executor = create_python_agent(
    llm=llm,
    tool=PythonREPLTool(),
    verbose=True
)

output = agent_executor.run('Calculate the square root of the factorial 20 and display with 4 decimal places.')

print(output)



> Entering new AgentExecutor chain...


Python REPL can execute arbitrary code. Use with caution.


I need to calculate the square root of 20!
Action: Python_REPL
Action Input: import math
Observation: 
Thought: I need to use the math.sqrt() function
Action: Python_REPL
Action Input: print(round(math.sqrt(math.factorial(20)), 4))
Observation: 1559776268.6285

Thought: I now know the final answer
Final Answer: 1559776268.6285

> Finished chain.
1559776268.6285
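The agent's answer is easy to confirm without an LLM, since the code it generated is a one-liner over the standard `math` module. The check below reproduces that calculation and also evaluates the `3.4**5.2` example mentioned earlier.

```python
import math

# Square root of 20!, rounded to 4 decimal places as the agent did.
root = round(math.sqrt(math.factorial(20)), 4)
print(f"{root:.4f}")

# The kind of exact arithmetic that is better delegated to an interpreter.
power = 3.4 ** 5.2
print(power)
```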


Splitting and Embedding Text Using LangChain 

In [11]:
import os 
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [12]:
with open('churchill_speech.txt') as f:
    churchill_speech = f.read()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
    length_function=len
)

chunks = text_splitter.create_documents([churchill_speech])
print("length of chunks", len(chunks))

length of chunks 24
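To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sliding-window splitter. It is not how `RecursiveCharacterTextSplitter` works (that class prefers to break on separators such as paragraph and line breaks). `split_with_overlap` is a hypothetical helper that only illustrates how consecutive chunks share text.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=100):
    """Cut `text` into windows of `chunk_size` characters, each starting
    `chunk_size - chunk_overlap` characters after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return windows

sample = "".join(str(i % 10) for i in range(2500))  # synthetic 2,500-character text
pieces = split_with_overlap(sample, chunk_size=1000, chunk_overlap=100)
print([len(p) for p in pieces])
```

The overlap means the last 100 characters of one chunk reappear at the start of the next, so a sentence cut at a boundary still appears whole in at least one chunk.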


Embedding costs 

In [13]:
def print_embedding_costs(texts):
    import tiktoken

    enc = tiktoken.encoding_for_model('text-embedding-ada-002')
    total_tokens = sum([len(enc.encode(page.page_content)) for page in texts])
    print(f"Total tokens: {total_tokens}")
    print(f"Embedding cost in USD: {total_tokens * 0.00000006}")

print_embedding_costs(chunks)

Total tokens: 5015
Embedding cost in USD: 0.0003009


In [15]:
from langchain.embeddings import OpenAIEmbeddings
embedding = OpenAIEmbeddings()

In [16]:
vector = embedding.embed_query('abc')
vector

[0.0026003900945243607,
 -0.011285445990905635,
 -0.009404538636195538,
 -0.039117222457942386,
 -0.03425231768417374,
 0.012063265143912339,
 -0.02122738659540084,
 -0.022853734632851872,
 0.01865351288299622,
 -0.0005285633100776218,
 0.0034188677622028346,
 0.019148488453639798,
 -0.0026852431226671723,
 -0.004553776460640191,
 -0.01868179714810028,
 0.0030900624818762422,
 0.027082240647811583,
 0.010521768970450962,
 0.010663190295971263,
 0.007410493289746674,
 -0.014382579539057892,
 0.017748414537021247,
 -0.006979157315587233,
 -0.015386672812897077,
 -0.020152582658801505,
 -0.0034754365252415865,
 0.009850016277245747,
 -0.020124298393697445,
 0.026247853895919285,
 -0.007353924759538554,
 0.007778189667421981,
 0.013583547652884408,
 -0.007452919687402765,
 -0.009941940604495205,
 -0.0103520633798266,
 -0.014255300346089623,
 -0.013640116183092526,
 -0.01578265345567644,
 0.010203570522369023,
 -0.00027334150236069853,
 0.0243245201435531,
 0.0046421650219210095,
 0.0139512

Inserting the Embeddings into a Pinecone Index

In [17]:
import os 
import pinecone 
from langchain.vectorstores import Pinecone 

pinecone.init(api_key=os.environ['PINECONE_API_KEY'], environment=os.environ.get('PINECONE_ENV'))



In [19]:
# Delete any existing indexes so the notebook starts from a clean state

for name in pinecone.list_indexes():
    print(f'Deleting index {name}')
    pinecone.delete_index(name)

print('done')

index_name = 'churchill-speech'
if index_name not in pinecone.list_indexes():
    print(f'Creating index {index_name}')
    pinecone.create_index(index_name, dimension=1536, metric='cosine')
    print('done')

done
Creating index churchill-speech
done
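The index above is created with `metric='cosine'`, meaning vectors are compared by the cosine of the angle between them rather than by raw distance. A minimal pure-Python sketch of that measure follows. `cosine_similarity` is an illustrative helper, and the short vectors are toy data rather than real 1536-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Vectors pointing the same way score close to 1 regardless of their magnitude, while unrelated (orthogonal) vectors score 0.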


In [21]:
vector_store = Pinecone.from_documents(chunks, embedding, index_name=index_name)

Asking Questions and Running Similarity Searches

In [24]:
query = 'Where should we fight?'
result = vector_store.similarity_search(query)


for r in result:
    print(r.page_content)
    print('='*100)

the Gestapo and all the odious apparatus of Nazi rule, we shall not flag or fail. We shall go on to the
end, we shall fight in France, we shall fight on the seas and oceans, we shall fight with growing
confidence and growing strength in the air, we shall defend our Island, whatever the cost may be, we
shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the
streets, we shall fight in the hills; we shall never surrender, and even if, which I do not for a moment
believe, this Island or a large part of it were subjugated and starving, then our Empire beyond the
seas, armed and guarded by the British Fleet, would carry on the struggle, until, in God's good time,
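the New World, with all its power and might, steps forth to the rescue and the liberation of the old.
====================================================================================================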
the Gestapo and all the odious apparatus of Nazi rule, we shall not flag or fail. We shall go on to the
end, we shall fight in France, we shall fight on the seas and oceans, we shal

In [25]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=1, max_tokens=1024)

retriever = vector_store.as_retriever(search_type='similarity', search_kwargs={'k':3})

chain = RetrievalQA.from_chain_type(llm=llm, chain_type='stuff', retriever=retriever)

In [27]:
query = 'Where should we fight?'
answer = chain.run(query)
print(answer)

According to the Winston Churchill speech, "We shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the streets, we shall fight in the hills." Therefore, it suggests that the fighting should take place on various terrains, including the beaches, landing grounds, fields, streets, and hills.
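With `chain_type='stuff'`, the retrieved chunks are placed together into a single prompt ahead of the question. The sketch below shows the idea with plain strings. `build_stuff_prompt` and its prompt wording are assumptions for illustration, not LangChain's actual template. It also drops exact duplicate chunks, since the similarity search output above returned the same passage more than once.

```python
def build_stuff_prompt(chunks, question):
    """Join unique chunks (order preserved) into one context block and
    append the question."""
    unique_chunks = list(dict.fromkeys(chunks))  # de-duplicate, keep order
    context = "\n\n".join(unique_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        + context
        + "\n\nQuestion: " + question
    )

retrieved = [
    "we shall fight on the beaches",
    "we shall fight on the beaches",  # duplicate hit, as in the search output
    "we shall never surrender",
]
prompt = build_stuff_prompt(retrieved, "Where should we fight?")
print(prompt)
```

Because everything goes into one prompt, this approach only works while the combined chunks fit within the model's context window.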
