# PyJax LangChain examples

Use this interactive Python notebook to see how Python and LangChain can be combined to build your own retrieval-augmented generation (RAG) applications.

If you use a hosted frontier model, such as one from OpenAI, set your API key first, as in the example below.

In [3]:
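import getpass
import os

# Prompt for the key only if it is not already set in the environment
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")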

from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")

# Basic model invocation

Here is the most basic example of an LLM invocation: a system message sets the task and a human message supplies the input.

In [4]:
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="Translate the following from English into Italian"),
    HumanMessage(content="hi!"),
]

ai_message = model.invoke(messages)

ai_message.content

'Ciao!'

# Document loaders

Document loaders read text, HTML, Markdown, and PDF files into `Document` objects, which a text splitter then breaks into smaller chunks.

In [5]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the PDF file (load_and_split() also applies LangChain's default splitter)
loader = PyPDFLoader("AIM_Bsc_w_Chg_1_2_and_3_dtd_9-5-24.pdf")
pages = loader.load_and_split()

# Create text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len
)

# Split documents
docs = text_splitter.split_documents(pages)
print(len(docs))
print(docs[0].page_content)

3325
April 20, 2023 
U.S. Department 
of Transportation 
Federal Aviation 
Administration 
Aeronautical 
Information 
Manual  Official Guide to
 Basic Flight Information and ATC Procedures 
An electronic version of this publication is available online at 
http://www.faa.gov/air_traffic/publications
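
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-width chunking with overlap. It is a deliberate simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line boundaries, which this sketch ignores. The function name `chunk_text` is hypothetical and not part of LangChain.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-width windows that overlap by chunk_overlap characters.

    Simplified illustration only; it does not look for separators.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for piece in chunk_text(sample, chunk_size=10, chunk_overlap=3):
    print(piece)
```

Each chunk repeats the last three characters of the previous one, so text that straddles a chunk boundary still appears whole in at least one chunk, provided it is no longer than the overlap.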


# Create embeddings

Embeddings turn text into numeric vectors that can be stored and compared for similarity.

In [6]:
from langchain_openai import OpenAIEmbeddings

# Initialize the embedding model (make sure your OPENAI_API_KEY is set)
embeddings = OpenAIEmbeddings()

# Embed a single query string into one vector
query_text = "What is the meaning of Life"
vector = embeddings.embed_query(query_text)
print(vector)

[-0.001592583954334259, -0.023564955219626427, -0.007989352568984032, -0.0030331783927977085, -0.031243721023201942, -0.013546875678002834, -0.023512089625000954, 0.0067932624369859695, -0.014009451493620872, -0.012251662090420723, -0.006231562700122595, 0.014736357145011425, 0.004351520910859108, 0.01078463438898325, 0.003406543517485261, -0.00637694401666522, 0.04007231816649437, -0.00939690601080656, 0.004473772831261158, -0.0037270428147166967, 0.003753475844860077, 0.00951585453003645, -0.007097241003066301, -0.003908769227564335, -0.00516763748601079, 0.010401356965303421, 0.006063052918761969, -0.03409847617149353, 0.01555577851831913, -0.011676745489239693, 0.017921525985002518, -0.018595565110445023, -0.01641484908759594, -0.013071082532405853, -0.007784497458487749, -0.00900701992213726, 0.006393464282155037, 0.0037237387150526047, 9.390916966367513e-05, -0.003921985626220703, 0.02746381238102913, 0.013546875678002834, -0.0022996647749096155, 0.003750171745195985, -0.02838896

In [7]:
vectors = embeddings.embed_documents(["hello", "goodbye"])
# Showing only the first 3 coordinates
print(len(vectors))
print(vectors[0][:3])

2
[-0.025122925639152527, -0.019487135112285614, -0.02802019938826561]
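
Embedding vectors are usually compared by cosine similarity: vectors that point in similar directions represent similar text. The sketch below is plain Python with made-up three-dimensional vectors standing in for real embeddings, which have many more dimensions. `cosine_similarity` is a hypothetical helper, not a LangChain function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (assumed values, not real model output)
hello = [0.9, 0.1, 0.0]
hi = [0.8, 0.2, 0.1]
goodbye = [-0.7, 0.3, 0.2]

# Compare how close "hi" and "goodbye" are to "hello"
print(cosine_similarity(hello, hi))
print(cosine_similarity(hello, goodbye))
```

With real embeddings, the same comparison would be applied to the lists returned by `embed_query` and `embed_documents`.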


# Nomic Embedding

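One way to lower costs is to compute document embeddings locally, for example with a Nomic embedding model, instead of paying for a hosted frontier embedding API. The cell below is left commented out; uncomment it if `langchain_nomic` is installed.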

In [20]:

# from langchain_nomic import NomicEmbeddings

# Initialize the NomicEmbeddings model
# embeddings = NomicEmbeddings(model="nomic-embed-text-v1.5", inference_mode='local', device='gpu')

# nomic_vectors = embeddings.embed_query("Jokes about cats!")

# print(nomic_vectors[:3])
# print(len(nomic_vectors))

# Create vector store

A vector store is a database or in-process index that stores embedding vectors and retrieves the entries most similar to a query vector.

In [8]:
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore

# Build the store on top of the embeddings model created earlier
vector_store = InMemoryVectorStore(embeddings)

document_1 = Document(id="1", page_content="foo", metadata={"baz": "bar"})
document_2 = Document(id="2", page_content="thud", metadata={"bar": "baz"})
document_3 = Document(id="3", page_content="i will be deleted :(")

documents = [document_1, document_2, document_3]
vector_store.add_documents(documents=documents)

results = vector_store.similarity_search(query="thud", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* thud [{'bar': 'baz'}]
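
To show what "store and search based on vectors" means, here is a minimal brute-force sketch. It assumes a toy letter-frequency function in place of a real embedding model, and `ToyVectorStore`, `toy_embed`, and `cosine` are hypothetical names. It is not how `InMemoryVectorStore` is implemented, only the general idea.

```python
import math
from collections import Counter


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: a 26-slot letter-frequency vector."""
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    return [float(counts.get(chr(ord("a") + i), 0)) for i in range(26)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Keeps (vector, text) pairs in a list and searches them by brute force."""

    def __init__(self, embed):
        self._embed = embed
        self._rows: list[tuple[list[float], str]] = []

    def add_texts(self, texts: list[str]) -> None:
        # Embed each text once, at insertion time
        for text in texts:
            self._rows.append((self._embed(text), text))

    def similarity_search(self, query: str, k: int = 1) -> list[str]:
        # Score every stored vector against the query and keep the top k
        query_vector = self._embed(query)
        ranked = sorted(
            self._rows,
            key=lambda row: cosine(query_vector, row[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


store = ToyVectorStore(toy_embed)
store.add_texts(["foo", "thud", "i will be deleted :("])
print(store.similarity_search("thud", k=1))
```

Scanning every row is fine for small collections; dedicated vector databases typically add approximate indexes to avoid the full scan.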


In [9]:
# Add the PDF chunks to the vector store, then confirm
vector_store.add_documents(documents=docs)
print("PDF documents added to the vector_store.")

results = vector_store.similarity_search(query="Runway End Identifier Lights (REIL)", k=5)
print(len(results))
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

PDF documents added to the vector_store.
5
* AIM 4/20/23 
2−1−3. Runway End Identifier Lights (REIL) 
REILs are installed at many airfields to provide rapid and positive identification of the approach end of a 
particular runway. The system consists of a pair of synchronized flashing lights located laterally on each side of 
the runway threshold. REILs may be either omnidirectional or unidirectional facing the approach area. They are 
effective for: 
a. Identification of a runway surrounded by a preponderance of other lighting. 
b. Identification of a runway which lacks contrast with surrounding terrain. 
c. Identification of a runway during reduced visibility. 
2−1−4. Runway Edge Light Systems 
a. Runway edge lights are used to outline the edges of runways during periods of darkness or restricted 
visibility conditions. These light systems are classified according to the intensity or brightness they are capable 
of producing: they are the High Intensity Runway Lights (HIRL), Medium In

In [10]:
# Define your query
query = "What does the document say about taxiway lights?"

# Retrieve relevant documents using vector_store (k is the number of documents to fetch)
retrieved_docs = vector_store.similarity_search(query, k=3)

# Combine the content from the retrieved documents to form context
context = "\n\n".join([doc.page_content for doc in retrieved_docs])

# Build a prompt that uses the context to answer the query
prompt = f"""
Use the following context to answer the question.

Context:
{context}

Question: {query}

Answer:
"""

# Build the messages (SystemMessage gives instruction and HumanMessage contains the actual prompt)
messages = [
    SystemMessage(content="Answer the question using only the provided context."),
    HumanMessage(content=prompt),
]

# Call ChatOpenAI to get the answer and print the result
response = model.invoke(messages)
print(response.content)

The document states that there are two types of taxiway lights: 

1. **Taxiway Edge Lights**: These lights outline the edges of taxiways during darkness or restricted visibility conditions and emit blue light. They often have variable intensity settings that can be adjusted at pilot request or by the controller when necessary.

2. **Taxiway Centerline Lights**: These lights facilitate ground traffic under low visibility conditions. They are located along the taxiway centerline and emit green light, burning steadily.

Additionally, there are **Clearance Bar Lights**, which are installed at holding positions on taxiways to increase the visibility of the holding position in low visibility conditions.
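
When answers need to be traceable, one option is to label each retrieved chunk with its source before building the prompt. The sketch below assumes each chunk carries a `page` entry in its metadata (an assumption; inspect `doc.metadata` for your loader) and uses a small `Chunk` dataclass as a stand-in for LangChain's `Document`. `format_context` and `build_prompt` are hypothetical helpers, and the page numbers are made-up sample values.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Stand-in for a retrieved document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(chunks: list[Chunk]) -> str:
    """Number each chunk and prefix it with its page, if known."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        page = chunk.metadata.get("page", "unknown")
        blocks.append(f"[{index}] (page {page})\n{chunk.page_content.strip()}")
    return "\n\n".join(blocks)


def build_prompt(query: str, chunks: list[Chunk]) -> str:
    """Assemble the labeled context and the question into one prompt string."""
    return (
        "Use the following context to answer the question. "
        "Cite the bracketed chunk numbers you relied on.\n\n"
        f"Context:\n{format_context(chunks)}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )


# Sample chunks with assumed page numbers
retrieved = [
    Chunk("Taxiway edge lights emit blue light.", {"page": 41}),
    Chunk("Taxiway centerline lights emit green light.", {"page": 42}),
]
print(build_prompt("What color are taxiway edge lights?", retrieved))
```

The resulting string could be passed as the `HumanMessage` content in place of the plain `prompt` used above.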


# Chain

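This example of a chain in LangChain pipes a prompt template into a chat model and then into a JSON output parser with the `|` operator, streaming the partially parsed JSON as it arrives.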
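
As a rough mental model of `|` composition, the sketch below chains plain Python functions through an overloaded `__or__`. It is not LangChain's actual implementation; `Step`, `prompt_step`, `fake_model`, and `parser` are hypothetical stand-ins, and no LLM is called.

```python
class Step:
    """Wraps a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # Composition: run self first, then feed its result to other
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for a prompt template, a model, and an output parser
prompt_step = Step(lambda inputs: f"List some {inputs['topic']}")
fake_model = Step(lambda text: text.upper())
parser = Step(lambda text: {"result": text})

chain = prompt_step | fake_model | parser
print(chain.invoke({"topic": "colors"}))
# {'result': 'LIST SOME COLORS'}
```

Each `|` produces a new step whose input is the first step's input and whose output is the last step's output, which is why the whole pipeline can be invoked with a single call.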

In [11]:
from langchain_core.output_parsers.json import SimpleJsonOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

prompt = PromptTemplate.from_template(
    'In JSON format, give me a list of {topic} and their '
    'corresponding names in French, Spanish and in a '
    'Cat Language.'
)

model = ChatOpenAI()
chain = prompt | model | SimpleJsonOutputParser()

# Top-level `async for` works in a notebook; in a script, wrap it in an async function
async for chunk in chain.astream({'topic': 'colors'}):
    print('-')  # noqa: T201
    print(chunk, sep='', flush=True)  # noqa: T201

-
{}
-
{'colors': []}
-
{'colors': [{}]}
-
{'colors': [{'name': ''}]}
-
{'colors': [{'name': 'red'}]}
-
{'colors': [{'name': 'red', 'french': ''}]}
-
{'colors': [{'name': 'red', 'french': 'rou'}]}
-
{'colors': [{'name': 'red', 'french': 'rouge'}]}
-
{'colors': [{'name': 'red', 'french': 'rouge', 'spanish': ''}]}
-
{'colors': [{'name': 'red', 'french': 'rouge', 'spanish': 'ro'}]}
-
{'colors': [{'name': 'red', 'french': 'rouge', 'spanish': 'rojo'}]}
-
{'colors': [{'name': 'red', 'french': 'rouge', 'spanish': 'rojo', 'cat': ''}]}
-
{'colors': [{'name': 'red', 'french': 'rouge', 'spanish': 'rojo', 'cat': 'me'}]}
-
{'colors': [{'name': 'red', 'french': 'rouge', 'spanish': 'rojo', 'cat': 'meow'}]}
-
{'colors': [{'name': 'red', 'french': 'rouge', 'spanish': 'rojo', 'cat': 'meow'}, {}]}
-
{'colors': [{'name': 'red', 'french': 'rouge', 'spanish': 'rojo', 'cat': 'meow'}, {'name': ''}]}
-
{'colors': [{'name': 'red', 'french': 'rouge', 'spanish': 'rojo', 'cat': 'meow'}, {'name': 'blue'}]}
-
{'colo