In [12]:
from typing import List, Dict
from sagemaker import Session
import boto3
import json

In [11]:
import sys

# Update the GNU C++ compiler toolchain
!apt-get update && apt-get install -y build-essential
!{sys.executable} -m pip install shap
# Install the dependencies required for the chromadb in-memory vector DB and the embedding library
!{sys.executable} -m pip install chromadb tiktoken langchain


Hit:1 http://security.debian.org/debian-security bullseye-security InRelease
Hit:2 http://deb.debian.org/debian bullseye InRelease
Hit:3 http://deb.debian.org/debian bullseye-updates InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
build-essential is already the newest version (12.9).
0 upgraded, 0 newly installed, 0 to remove and 30 not upgraded.

# Building Foundation Model based Applications

Getting started with LLMs for advanced reasoning and routing is easy. After all, the models understand human-level instructions and return string outputs that can be formatted.

Yet production-ready applications need more: robust data integrations to provide input to your model, a solution to the alignment problem, and behavior tuned to your specific corporate governance and brand messaging.

Complex workflows involve multiple stages that create intermediate results. These stages require the model to switch roles, or they might use optimized task models that are more efficient or fine-tuned further on the specific task.

All of this creates complexity when creating and maintaining your LLM based applications in practice.



In [21]:
#load stored variables from previous notebook
%store -r

# Initialize key environment variables
sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
sm_client = boto3.client("sagemaker", aws_region)
model_version = "*"

print(inference_model)

tiiuae/falcon-40b-instruct


### Registering your Third-Party API key (PURELY OPTIONAL!)
We will be running a section on the ChatModel API exposed by API endpoint providers such as OpenAI, Anthropic, and Google Vertex. As the SageMaker deployed models do not currently support this API, you can choose to experiment at your own cost if you register for an OpenAI key or already have access to Anthropic (which is currently not accepting new registrations).

In [22]:
openai_api_key=""
anthropic_api_key=""

In [23]:
# Installing required dependencies for third-party Foundation APIs
import sys
if openai_api_key:
    !{sys.executable} -m pip install openai
if anthropic_api_key:
    !{sys.executable} -m pip install anthropic



#### Load Widgets used across the notebook

In [25]:
from ipywidgets import Select, Text

# This creates the widgets used across the notebook for easier configuration
model_selections = ['SageMaker-Falcon40B']
# Subset based on available ApiKeys
if openai_api_key:
    model_selections.append('OpenAI')
if anthropic_api_key:
    model_selections.append('Anthropic-Claude')

model_selection_widget = Select(
    options=model_selections
)

In [26]:
chat_model_selections = []
if openai_api_key:
    chat_model_selections.append('ChatOpenAI')
if anthropic_api_key:
    chat_model_selections.append('ChatAnthropic')

chat_model_selection_widget = Select(
    options=chat_model_selections
)

# Using the power of LangChain
Recently the community unified its efforts on a high-level framework to ease the development of applications based on foundation models.
LangChain was developed to ease the integration of models, whether self-deployed or used over proprietary APIs. It lets you integrate models into your application, manage prompt templates to tune your model behaviour, provide IO, add memory, and chain multiple reasoning and action steps. 

### What is LangChain
LangChain is a framework for developing applications powered by language models.

It helps us with:
1. **Integration** - Bring external data, such as files, databases, web content, and API data, to your application
2. **Coordination** - Develop reusable, modularized pipelines to execute complex workflows 
3. **Agency** - Enable your LLM to interact with its environment via decision making

## Benefits of using the Framework
1. Components - LangChain makes it easy to swap out abstractions and components necessary to work with language models.

2. Customized Chains - LangChain provides out of the box support for using and customizing 'chains' - a series of actions strung together.

3. Speed 🚢 - This team ships insanely fast. You'll be up to date with the latest LLM features.

4. Community 👥 - Wonderful discord and community support, meet ups, hackathons, etc.


## Connecting your model on AWS
To work with your models on AWS you can use either an integration with the SageMaker endpoint, or in the future directly talk to the Bedrock API. 

For now, let's look at how to work with a custom SageMaker Model Endpoint.

In [27]:
import json
from langchain.llms.sagemaker_endpoint import LLMContentHandler, SagemakerEndpoint

# Set model configuration
parameters = {
    "max_new_tokens": 200,
    "max_length": 1024,
    # "num_return_sequences": 1,
    "top_k": 1,
    # "top_p": 0.50,
    "do_sample": True,
    "temperature": 0.1,
    "return_full_text": False,
    "include_prompt_in_result": False,
}


class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs={}) -> bytes:
        input_str = json.dumps({"inputs": prompt, "parameters": model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json[0]["generated_text"]

content_handler = ContentHandler()
# Instantiate all available models 
sm_llm = SagemakerEndpoint(
    endpoint_name=_MODEL_CONFIG_[inference_model]['endpoint_name'],
    region_name=aws_region,
    model_kwargs=parameters,
    content_handler=content_handler,
)

print(f"Loaded model endpoint: {inference_model}")

Loaded model endpoint: tiiuae/falcon-40b-instruct
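
The content handler above only translates between a prompt string and the JSON payload exchanged with the endpoint. Here is a minimal sketch of that round trip without calling SageMaker. The helper names `build_request_body` and `parse_response_body` are hypothetical, and the response shape (a list with one `generated_text` entry) is assumed from the handler above rather than taken from a real endpoint response.

```python
import io
import json


def build_request_body(prompt: str, model_kwargs: dict) -> bytes:
    """Serialize the prompt and generation parameters the way transform_input does."""
    return json.dumps({"inputs": prompt, "parameters": model_kwargs}).encode("utf-8")


def parse_response_body(body) -> str:
    """Read a file-like response body and return the generated text."""
    response_json = json.loads(body.read().decode("utf-8"))
    return response_json[0]["generated_text"]


request = build_request_body("What day comes after Friday", {"max_new_tokens": 20})
# Stand-in for the streaming body an endpoint invocation would return (assumed shape).
fake_response = io.BytesIO(json.dumps([{"generated_text": "Saturday"}]).encode("utf-8"))
answer = parse_response_body(fake_response)
print(json.loads(request))
print(answer)
```

Testing the two transformations in isolation like this can help when an endpoint expects a different payload layout than the one assumed here.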


### Instantiate a proprietary API endpoint such as OpenAI

In [29]:
# Connecting to Third-party endpoints using provided API keys 
from langchain import OpenAI

if openai_api_key:
    openai_llm = OpenAI(openai_api_key=openai_api_key)
    print("(success) - Successfully connected to OpenAI")
else:
    print("(failure) - You have not provided an OpenAI API key, and you won't have access to work with the model in this notebook")

# Work with Anthropic
from langchain import Anthropic

if anthropic_api_key:
    anthropic_llm = Anthropic(anthropic_api_key=anthropic_api_key)
    print("(success) - Successfully connected to Anthropic")
else:
    print("(failure) - You have not provided an Anthropic API key, and you won't have access to work with the model in this notebook")

(failure) - You have not provided an OpenAI API key, and you won't have access to work with the model in this notebook
(failure) - You have not provided an Anthropic API key, and you won't have access to work with the model in this notebook


## Select your model
To showcase how differently the models respond to prompting, you can select between an open-source leaderboard model, `Falcon-40B-Instruct`, and `OpenAI's Davinci` model.

In [30]:
model_selection_widget

Select(options=('SageMaker-Falcon40B',), value='SageMaker-Falcon40B')

In [31]:
match model_selection_widget.value:
    case "SageMaker-Falcon40B":
        llm = sm_llm
    case "OpenAI":
        llm = openai_llm
    case "Anthropic-Claude":
        llm = anthropic_llm
        
print(f"Activated {model_selection_widget.value}")

Activated SageMaker-Falcon40B


In [32]:
print(llm("What day comes after Friday").strip())

?
Saturday


# Creating a basic LangChain application

Every LangChain application centers around your LLM. This can be either a deployed inference endpoint or a managed service (Bedrock, OpenAI API). The framework provides a series of out-of-the-box integrations in the `llms` module and can be easily extended to your use case. 



## Models

A model wrapper defines what you send to the model and what you get back.

You can choose between:
1. **Language Model** Takes text and returns text
2. **Chat Model** Takes a series of messages and returns a message output
3. **Embedding Models** Transform your text into a latent space vector to power similarity search
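
To make the similarity search idea behind embedding models concrete, here is a minimal sketch that ranks texts by cosine similarity. The three-dimensional vectors are invented for illustration only. A real embedding model returns much longer vectors, and the `cosine_similarity` helper is our own, not a LangChain function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for illustration only.
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

query = toy_embeddings["cat"]
# Rank the remaining texts by how close their vectors are to the query vector.
ranked = sorted(
    (name for name in toy_embeddings if name != "cat"),
    key=lambda name: cosine_similarity(query, toy_embeddings[name]),
    reverse=True,
)
print(ranked)
```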

### Language Models
A wrapper around a typical text input, text output interaction with the model. No structure is expected, and no structure is maintained. Good starting point for many non-chat applications. 

Now that we established our connection, we can query the model by sending it instructions as text.

In [33]:
text = "Give me 10 names for a template factory library for prompt engineering. Ensure to create the required number of examples. Only provide the items of the list"
print(llm(text))


1. Prompt Engineering Template Factory
2. Prompt Engineering Template Library
3. Prompt Engineering Template Generator
4. Prompt Engineering Template Creator
5. Prompt Engineering Template Builder
6. Prompt Engineering Template Maker
7. Prompt Engineering Template Designer
8. Prompt Engineering Template Creator
9. Prompt Engineering Template Generator
10. Prompt Engineering Template Factory


In [34]:
complex_prompt = """
Create a list of services a company named {prompt} could sell.
"""

In [35]:
print(llm(complex_prompt.format(prompt="FactoryBot")))

FactoryBot could sell services such as:
- Custom manufacturing
- Prototyping
- Product development
- Assembly
- Packaging
- Warehousing
- Distribution
- Quality control
- Inventory management
- Logistics
- Supply chain management
- Consulting services for manufacturing processes and operations


We can use vanilla string formatting to integrate information into our prompts. This allows us to pass information into the model in a structured manner, masking the general nature of the model. This is how many of the common products built natively on LLMs are created. 
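
Vanilla string formatting has two pitfalls worth knowing before templates grow larger. The sketch below uses a made-up `json_prompt` to show them: literal braces (for example in a JSON sample inside the prompt) must be doubled, and a forgotten value only surfaces as a `KeyError` at format time.

```python
# A template mixing a literal JSON example with a placeholder.
# Literal braces must be doubled, otherwise str.format treats them as fields.
json_prompt = (
    "Describe the company {company} as JSON, for example "
    '{{"name": "Acme", "services": ["consulting"]}}.'
)

formatted = json_prompt.format(company="FactoryBot")
print(formatted)

# Forgetting a value raises KeyError, which is easy to miss in larger templates.
try:
    json_prompt.format()
except KeyError as err:
    missing_field = err.args[0]
    print(f"missing value for: {missing_field}")
```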

In [36]:
architect_prompt = """
Play the role of a solution architect experienced with AWS. You are analysing customer requirements to create
well-architected solution architectures that you present to the customer. You are detailled, kind and
focussed. Given the following context

Context:
#System Requirements:
{requirements}
#Scale:
{scale}
#Features:
{features}
Describe an architecture on AWS in technical detail.
"""

In [37]:
prompt = architect_prompt.format(
    requirements="A website for my foodstore", 
    scale="Must handle 10k requests per second in peak. Must be globally available. Must be reponsive and fast", 
    features="Landing page describing our product. About page describing the company. Career page describing open positions."
)
print(prompt)


Play the role of a solution architect experienced with AWS. You are analysing customer requirements to create
well-architected solution architectures that you present to the customer. You are detailled, kind and
focussed. Given the following context

Context:
#System Requirements:
A website for my foodstore
#Scale:
Must handle 10k requests per second in peak. Must be globally available. Must be reponsive and fast
#Features:
Landing page describing our product. About page describing the company. Career page describing open positions.
Describe an architecture on AWS in technical detail.



In [38]:
print(llm(prompt))

As an AI language model, I can suggest the following architecture on AWS:

1. Web Application: The web application will be hosted on Amazon Elastic Compute Cloud (EC2) instances. The instances will be launched in multiple Availability Zones to ensure high availability and fault tolerance. The web application will be deployed using Amazon Elastic Beanstalk or AWS CodeDeploy.

2. Load Balancing: The web application will be load balanced using Amazon Elastic Load Balancing (ELB). ELB will distribute the incoming traffic across multiple instances to ensure that the application can handle the expected load.

3. Content Delivery Network (CDN): The web application will be served from a Content Delivery Network (CDN) to improve the performance and availability of the application. The CDN will cache the static content and serve it from the edge locations closest to the user.

4. Database: The database will be hosted on Amazon Relational Database Service (RDS


This works, but it can get clunky when you scale it out to more complex use cases. The next type of model wrapper provides a solution to this problem.

### Chat Models
These models structure their inputs and outputs with schemata that let you reason about the expected input and output. This helps to build more complex designs by separating the role instruction given to the model, the query, and the context of the query. 

Currently this is only implemented for API-based models such as ChatGPT and Anthropic.

This section is OPTIONAL, as you need your own API key for ChatOpenAI or ChatAnthropic to follow along. Registration for Anthropic API keys is currently closed as they roll out the service. If you do not have a key yet, just read through the outputs of the notebook for reference. 
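
If you do not have an API key, the idea of separating role instruction and query can still be explored in plain Python. The sketch below uses a hypothetical `Message` dataclass and a `flatten_messages` helper of our own to render role-tagged messages as a single prompt for a text-in, text-out model. The role labels are an assumption for illustration and do not reproduce how LangChain or any provider formats chat input.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    role: str  # "system", "human" or "ai"
    content: str


def flatten_messages(messages: List[Message]) -> str:
    """Render role-tagged messages as one prompt for a text-in, text-out model."""
    labels = {"system": "System", "human": "Human", "ai": "AI"}
    lines = [f"{labels[m.role]}: {m.content}" for m in messages]
    # End with an open "AI:" turn so a completion model continues as the assistant.
    lines.append("AI:")
    return "\n".join(lines)


conversation = [
    Message("system", "You are a helpful assistant that translates English to German."),
    Message("human", "What a wonderful day we had at the beach."),
]
flat_prompt = flatten_messages(conversation)
print(flat_prompt)
```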


In [39]:
chat_model_selection_widget

Select(options=(), value=None)

In [40]:
# Load selected ChatModel Endpoint
from langchain.chat_models import ChatOpenAI, ChatAnthropic
match chat_model_selection_widget.value:
    case "ChatOpenAI":
        chat_llm = ChatOpenAI(openai_api_key=openai_api_key)
    case "ChatAnthropic":
        chat_llm = ChatAnthropic(anthropic_api_key=anthropic_api_key)
        
print(f"Activated {chat_model_selection_widget.value} as chat_llm")

Activated None as chat_llm


In [41]:
from langchain.schema import HumanMessage, SystemMessage, AIMessage
response = chat_llm([
    SystemMessage(content="You are an unhelpful AI bot that makes jokes at whatever the user says."),
    HumanMessage(content="I would like to go to New York, how should i do this?"),
    AIMessage(content="???")
])
print(response.content)

NameError: name 'chat_llm' is not defined

## Schemata

We see that ChatModels use typed classes to structure inputs. This is an example of a LangChain `Schema`, but it's just one of many. 

LangChain currently provides the following schemata:

* **Text** The primary interface to interact with a model (used with LanguageModels)
* **ChatMessages** The typed messages we defined above for the ChatModel
* **Examples** Input/output pairs acting as context for fine tuning model behavior in n-shot learning
* **Document** Piece of unstructured data holding data as content and metadata for retrieval in context

### ChatMessages Schema
The primary interface through which end users interact with LLMs is a chat interface. For this reason, some model providers even started providing access to the underlying API in a way that expects chat messages.

In [42]:
from langchain.schema import HumanMessage, SystemMessage, AIMessage

hum_msg = HumanMessage(content='inputs send to the model by the user', additional_kwargs={}, example=True)
hum_msg

HumanMessage(content='inputs send to the model by the user', additional_kwargs={}, example=True)

In [43]:
sys_msg = SystemMessage(content='Instructions to the model', additional_kwargs={})
sys_msg

SystemMessage(content='Instructions to the model', additional_kwargs={})

In [44]:
ai_msg = AIMessage(content='Context answer providing further input to the model', additional_kwargs={})
ai_msg

AIMessage(content='Context answer providing further input to the model', additional_kwargs={}, example=False)

This structure allows us to pass multiple requests into a model for batch processing, making application integration easier.

In [45]:
# Generate completions for multiple sets of messages
batch_messages = [
    [   SystemMessage(content="You are a helpful assistant that translates English to German."),
        HumanMessage(content="What a wonderful day we had at the beach this late summer.")
    ],
    [
        SystemMessage(content="You are a helpful assistant that translates English to malay."),
        HumanMessage(content="What a wonderful day we had at the beach this late summer.")
    ]
]

In [44]:
chat_llm.generate(batch_messages)

LLMResult(generations=[[ChatGeneration(text='Was für ein wundervoller Tag wir am Strand hatten, spät im Sommer.', generation_info=None, message=AIMessage(content='Was für ein wundervoller Tag wir am Strand hatten, spät im Sommer.', additional_kwargs={}, example=False))], [ChatGeneration(text='Apa satu hari yang indah yang kita lalui di pantai pada musim lewat ini.', generation_info=None, message=AIMessage(content='Apa satu hari yang indah yang kita lalui di pantai pada musim lewat ini.', additional_kwargs={}, example=False))]], llm_output={'token_usage': {'prompt_tokens': 75, 'completion_tokens': 40, 'total_tokens': 115}, 'model_name': 'gpt-3.5-turbo'}, run=RunInfo(run_id=UUID('878168d3-4f86-4a58-9fa6-0bcbb2ec1d21')))

## EXERCISE 1
We are working to enable our marketing team to send customized sales emails at scale. You are asked to engineer a prompt for a custom marketing email copy creation pipeline. 

You will be given the following inputs that are collected on the users in your database:
* Name 
* Age
* Interest (List of strings)

You will also be given a product to recommend to the user in a personalized way:
* Product described as a dictionary of attributes (document from DB)


Work to complete the function below:

In [46]:
from typing import List
#TODO Rewrite for callcenter use case

# Complete the function 
def create_email_copy(name: str, age: int, interests: List[str], product: dict) -> str:
    """
    The email should be personalized, be age appropriate, target the interests 
    of the person and market the product you are selling. 

    Fill in this template using string formatting and a combination of the prompt
    engineering techniques you have learned previously. 
    """
    pass

In [47]:
# Define the product you are selling. Play with the level of detail

_product = {}

In [48]:
# Define a set of users to generate emails for
users = [
    {
    "name": "",
    "age": 0,
    "interests": [],
    "product": _product
    }
]

In [49]:
# Test your marketing output
for user in users:
    print("\n\n")
    # Unpack each user dict into the keyword arguments of create_email_copy
    print(llm(create_email_copy(**user)))

# Prompt templates 

When building more complex scenarios, managing the parameters placed into the templates can be too complex for simple string injection methods. Eventually you want to describe the interface in a more programmatic way. Here the `PromptTemplate` helps to define verified input variables to be utilized in the format string.
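
To illustrate what "verified input variables" can mean in practice, here is a minimal sketch of a template class that checks the declared variables against the placeholders in the format string. `MiniPromptTemplate` is a hypothetical stand-in written for this notebook. It does not reproduce LangChain's implementation.

```python
from string import Formatter
from typing import List


class MiniPromptTemplate:
    """Hypothetical stand-in that checks declared variables against the template."""

    def __init__(self, template: str, input_variables: List[str]):
        # Collect every named placeholder that appears in the format string.
        found = {name for _, name, _, _ in Formatter().parse(template) if name}
        if found != set(input_variables):
            raise ValueError(
                f"declared {sorted(input_variables)} but template uses {sorted(found)}"
            )
        self.template = template
        self.input_variables = list(input_variables)

    def format(self, **kwargs: str) -> str:
        # Fail early with a readable message when a declared variable is missing.
        missing = set(self.input_variables) - set(kwargs)
        if missing:
            raise KeyError(f"missing values for {sorted(missing)}")
        return self.template.format(**kwargs)


mini = MiniPromptTemplate(
    template="#System Requirements:\n{requirements}\n#Scale:\n{scale}",
    input_variables=["requirements", "scale"],
)
rendered = mini.format(requirements="A website", scale="10k requests per second")
print(rendered)
```

Validating at construction time means a mismatch between the interface and the format string is caught once, when the template is defined, rather than on every call.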

### The PromptTemplate class 

Let's structure our architecture template to make it reusable in our application.

In [50]:
print(architect_prompt)


Play the role of a solution architect experienced with AWS. You are analysing customer requirements to create
well-architected solution architectures that you present to the customer. You are detailled, kind and
focussed. Given the following context

Context:
#System Requirements:
{requirements}
#Scale:
{scale}
#Features:
{features}
Describe an architecture on AWS in technical detail.



In [51]:
from langchain.prompts import PromptTemplate

# First we can define an exposed parameter interface to the format string
prompt = PromptTemplate(
    input_variables=["requirements", "scale", "features"],
    template=architect_prompt,
)

The template can be asked to format itself, returning the compiled format string for review.

In [52]:
final_prompt = prompt.format(
    requirements="External facing web application written in Javascript, global deployment",
    scale="Average of 500 requests per minute, scale events up to 3000 requests per second",
    features="Mobile website, desktop version, javascript"
)
print(final_prompt)


Play the role of a solution architect experienced with AWS. You are analysing customer requirements to create
well-architected solution architectures that you present to the customer. You are detailled, kind and
focussed. Given the following context

Context:
#System Requirements:
External facing web application written in Javascript, global deployment
#Scale:
Average of 500 requests per minute, scale events up to 3000 requests per second
#Features:
Mobile website, desktop version, javascript
Describe an architecture on AWS in technical detail.



In [53]:
llm(final_prompt)

'As an AI language model, I cannot provide a detailed architecture on AWS as it requires a deep understanding of the AWS services and their capabilities. However, I can provide some general guidelines for creating a well-architected solution on AWS:\n\n1. Start with a well-defined architecture: Before starting with the implementation, it is important to have a clear understanding of the requirements and the architecture that will meet those requirements. This will help in creating a scalable, reliable, and cost-effective solution.\n\n2. Use AWS services: AWS provides a wide range of services that can be used to build a scalable and reliable solution. It is important to choose the right services based on the requirements and the architecture.\n\n3. Use AWS best practices: AWS provides best practices for designing and deploying solutions on AWS. It is important to follow these best practices to ensure that the solution is secure, scalable, and reliable.\n\n4. Use automation: AWS provides

Having a string output is nice and dandy, but what if we want structured return values for further use in our applications? For example, how would we continue working with a set of attributes extracted from a text in a parser scenario? 

Is a string good enough, or would we rather want to return a named tuple, dict or list of class instances from the model?

In [54]:
template="""
    You identify named entities in the text and extract relations amongst them. 
    You do not answer questions, and you do not ask questions.
    It is very important to extract all references you find. Each reference contains (Subject, relationship, value). 
    Do not skip any in your output.
    {format_instructions}

    # Examples:
    The Dow Jones closed with a plus of 1456 points // [("Dow Jones", "closed", "1456 points")]
    The ./ is a relative path and assumes that you are currently in your virtual environment directory // [("./", "is", "relative path")]
    Q: {text} // 
    """
reference_template = PromptTemplate(
    template=template,
    input_variables=['text', 'format_instructions'],
)

In [55]:
text = "Putting in effort means going beyond what’s required to solve problems, even when you aren’t asked to — on top of your job’s normal responsibilities, Cuban said. You take the initiative, and exhaust every possible option to find answers."
llm(reference_template.format(text=text, format_instructions=""))

'[("Q: ", "Putting in effort means going beyond what’s required to solve problems, even when you aren’t asked to — on top of your job’s normal responsibilities, Cuban said. You take the initiative, and exhaust every possible option to find answers.")]\n    The Dow Jones closed with a plus of 1456 points // [("The Dow Jones", "closed", "1456 points")]\n    The ./ is a relative path and assumes that you are currently in your virtual environment directory // [("The ./", "is", "relative path")]\n    Q: Putting in effort means going beyond what’s required to solve problems, even when you aren’t asked to — on top of your job’s normal responsibilities, Cuban said. You take the initiative, and exhaust every possible option to find answers. // [("Q: ", "Putting in effort means going beyond'

### Output parsers
When we need to validate the output of a model for a given prompt, or return a value as a programming-language object instead of a plain string, we can use output parsers.

The class implements a dual interface:
1. It standardizes prompt engineering to align the output of the model to the required format
2. It parses the resulting model output into the desired language primitive (list, dict, object)

The available parser classes let you create `pydantic` schema objects that can integrate validation steps.
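
The dual interface described above can be sketched in a few lines of plain Python. `KeyValueParser` is a hypothetical class written for illustration. It mirrors the two responsibilities (format instructions for the prompt, parsing of the completion) but is far simpler than the LangChain parsers.

```python
import json
from typing import List


class KeyValueParser:
    """Hypothetical parser pairing format instructions with a parse step."""

    def __init__(self, keys: List[str]):
        self.keys = keys

    def get_format_instructions(self) -> str:
        # Text appended to the prompt so the model knows the expected shape.
        return "Respond only with a JSON object with the keys: " + ", ".join(self.keys)

    def parse(self, completion: str) -> dict:
        # Turn the raw completion into a dict and verify the expected keys.
        data = json.loads(completion)
        missing = [key for key in self.keys if key not in data]
        if missing:
            raise ValueError(f"completion is missing keys: {missing}")
        return data


kv_parser = KeyValueParser(["subject", "relation", "object"])
instructions = kv_parser.get_format_instructions()
# Hand-written completion standing in for a model response.
parsed = kv_parser.parse('{"subject": "play", "relation": "follows", "object": "story"}')
print(instructions)
print(parsed["relation"])
```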

#### List parser
The most common scenario is to handle a return of multiple items in a list. We want to prompt the model into returning the elements in a nicely formatted structure. Typically this will be a CSV format. 

In [56]:
# List generating prompt
topic_recommender_prompt="List {number} topics to write on blog posts about {topic}"

recommend_topic_prompt = PromptTemplate(
    template=topic_recommender_prompt,
    input_variables=['topic', 'number']
)

llm(recommend_topic_prompt.format(topic="Python", number=10))

'\n1. Python for beginners\n2. Python data structures\n3. Python web development\n4. Python machine learning\n5. Python libraries and frameworks\n6. Python code optimization\n7. Python debugging\n8. Python code style and best practices\n9. Python code reviews\n10. Python community and resources'

In [62]:
from langchain.output_parsers import CommaSeparatedListOutputParser
parsed_recommender_prompt = topic_recommender_prompt + "\n{format_instructions}"

parser = CommaSeparatedListOutputParser()

parsed_recommender_template = PromptTemplate(
    template=parsed_recommender_prompt,
    input_variables=['topic', 'number'],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)


In [63]:
gen_prompt = parsed_recommender_template.format(topic='Generative AI', number=10)
print(gen_prompt)

List 10 topics to write on blog posts about Generative AI
Your response should be a list of comma separated values, eg: `foo, bar, baz`


In [64]:
output = llm(gen_prompt)
output

'\n1. How Generative AI can help in content creation\n2. Generative AI in the healthcare industry\n3. Generative AI in the education industry\n4. Generative AI in the finance industry\n5. Generative AI in the legal industry\n6. Generative AI in the marketing industry\n7. Generative AI in the entertainment industry\n8. Generative AI in the fashion industry\n9. Generative AI in the automotive industry\n10. Generative AI in the real estate industry'
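
The completion above is a numbered list, even though the prompt asked for comma separated values. A model does not always follow format instructions, so a tolerant post-processing step can be useful. The sketch below is one possible approach using a helper of our own, `parse_list_output`. It is not part of LangChain.

```python
import re
from typing import List


def parse_list_output(completion: str) -> List[str]:
    """Split a completion into items, accepting numbered lines or comma separated values."""
    lines = [line.strip() for line in completion.strip().splitlines() if line.strip()]
    numbered = [re.match(r"^\d+[.)]\s*(.+)$", line) for line in lines]
    if lines and all(numbered):
        # Every line looks like "1. item" or "1) item": strip the numbering.
        return [m.group(1).strip() for m in numbered]
    # Fall back to the comma separated format requested in the prompt.
    return [item.strip() for item in completion.split(",") if item.strip()]


numbered_completion = (
    "\n1. Generative AI in the healthcare industry"
    "\n2. Generative AI in the finance industry"
)
print(parse_list_output(numbered_completion))
print(parse_list_output("foo, bar, baz"))
```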

#### Custom parser
For any entity model that relies on structured information being returned, we have to implement a custom model based on `BaseModel` from the `pydantic` library.

The model class describes the expected schema and an optional set of validation functions to ensure the accepted values are properly configured.

In [65]:
from pydantic import BaseModel, Field, validator
from langchain.output_parsers import PydanticOutputParser
from typing import List

# Define the target structure
class Entity(BaseModel):
    subject: str = Field(description="subject of the relation")
    object: str = Field(description="object of the relation")
    relation: str = Field(description="relation between subject and object")

parser = PydanticOutputParser(pydantic_object=Entity)


In [66]:
# Let's go back to our entity extraction use case
extraction_template = """
You are extracting relations between entities in a text.
You extract them in the format (Subject, Predicate, Object).
All relations are in present tense.
{format_instructions}
Input: The play followed the story of Edalaine. She was a young woman.The play was written by Edgar Allan Poe.
[('play', 'followed', 'story'), ('woman', is', 'young'), ('play', 'was written', 'poe')]

{text}
"""

In [67]:
parsed_template = PromptTemplate(
    template = extraction_template,
    input_variables=['text'],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

In [68]:
display(parsed_template.to_json())

{'lc': 1,
 'type': 'constructor',
 'id': ['langchain', 'prompts', 'prompt', 'PromptTemplate'],
 'kwargs': {'template': "\nYou are extracting relations between entities in a text.\nYou extract them in the format (Subject, Predicate, Object).\nAll relations are in present tense.\n{format_instructions}\nInput: The play followed the story of Edalaine. She was a young woman.The play was written by Edgar Allan Poe.\n[('play', 'followed', 'story'), ('woman', is', 'young'), ('play', 'was written', 'poe')]\n\n{text}\n",
  'input_variables': ['text'],
  'partial_variables': {'format_instructions': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is n

In [69]:
# The model we defined is parsed into the following context template format
print(parsed_template.to_json()['kwargs']['partial_variables']['format_instructions'])

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"subject": {"title": "Subject", "description": "subject of the relation", "type": "string"}, "object": {"title": "Object", "description": "object of the relation", "type": "string"}, "relation": {"title": "Relation", "description": "relation between subject and object", "type": "string"}}, "required": ["subject", "object", "relation"]}
```


In [70]:
responses = llm(parsed_template.format(text=text))

In [71]:
responses

"\nTo extract relations between entities in a text, you can use regular expressions to match the subject, predicate, and object of each relation. Here's an example code snippet in Python:\n\n```python\nimport re\n\ndef extract_relations(text):\n    relations = []\n    pattern = r'([^,]+),\\s*([^,]+),\\s*([^,]+)'\n    matches = re.findall(pattern, text)\n    for match in matches:\n        relations.append((match[0], match[1], match[2]))\n    return relations\n```\n\nThis function takes a text as input and returns a list of tuples, where each tuple contains the subject, predicate, and object of a relation.\n\nTo format the output as a JSON instance, you can use the `json` module in Python:\n\n```python\nimport json\n\ndef format_relations(relations):\n    output "

The template is not yet tuned to identify the correct subjects to extract information for, and the response above is free text rather than the requested JSON. We can continue to refine the template to hit our expected target. Until the model produces the target format, parsing the output with `parser.parse` fails.
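
A completion may wrap the JSON in prose or contain no JSON at all. One defensive option is to search the text for the first JSON object with the expected keys before handing it to a parser. The sketch below uses only the standard library. The `extract_entity` helper and the `REQUIRED_KEYS` tuple are our own assumptions, modelled on the `Entity` fields defined earlier.

```python
import json
from typing import Optional

REQUIRED_KEYS = ("subject", "object", "relation")


def extract_entity(completion: str) -> Optional[dict]:
    """Return the first JSON object with all required keys, or None if there is none."""
    decoder = json.JSONDecoder()
    position = completion.find("{")
    while position != -1:
        try:
            # raw_decode parses one JSON value starting at the given index.
            candidate, _ = decoder.raw_decode(completion, position)
        except json.JSONDecodeError:
            candidate = None
        if isinstance(candidate, dict) and all(key in candidate for key in REQUIRED_KEYS):
            return candidate
        position = completion.find("{", position + 1)
    return None


noisy = 'Sure, here it is: {"subject": "play", "object": "story", "relation": "follows"} Hope that helps.'
print(extract_entity(noisy))
print(extract_entity("To extract relations you can use regular expressions."))
```

Returning `None` instead of raising lets the calling code decide whether to retry the prompt, fall back to a default, or surface the failure.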

In [72]:
entity = parser.parse(responses)
print(type(entity))
print(entity)

OutputParserException: Failed to parse Entity from completion 
To extract relations between entities in a text, you can use regular expressions to match the subject, predicate, and object of each relation. Here's an example code snippet in Python:

```python
import re

def extract_relations(text):
    relations = []
    pattern = r'([^,]+),\s*([^,]+),\s*([^,]+)'
    matches = re.findall(pattern, text)
    for match in matches:
        relations.append((match[0], match[1], match[2]))
    return relations
```

This function takes a text as input and returns a list of tuples, where each tuple contains the subject, predicate, and object of a relation.

To format the output as a JSON instance, you can use the `json` module in Python:

```python
import json

def format_relations(relations):
    output . Got: Expecting value: line 1 column 1 (char 0)

## Example
Examples can be inputs/outputs for a model or for a chain, and the two types serve different purposes. Examples for a model can be used to fine-tune the model. Examples for a chain can be used to evaluate the end-to-end chain, or even to train a model that replaces the whole chain.

In [73]:
# Examples are structured as Q/A pairs. Let's create a list of few-shot examples to explore
examples = [
  {
    "question": "Who lived longer, Muhammad Ali or Alan Turing?",
    "answer": 
    """
        Are follow up questions needed here: Yes.
        Follow up: How old was Muhammad Ali when he died?
        Intermediate answer: Muhammad Ali was 74 years old when he died.
        Follow up: How old was Alan Turing when he died?
        Intermediate answer: Alan Turing was 41 years old when he died.
        So the final answer is: Muhammad Ali
    """
  },
  {
    "question": "When was the founder of craigslist born?",
    "answer": 
    """
        Are follow up questions needed here: Yes.
        Follow up: Who was the founder of craigslist?
        Intermediate answer: Craigslist was founded by Craig Newmark.
        Follow up: When was Craig Newmark born?
        Intermediate answer: Craig Newmark was born on December 6, 1952.
        So the final answer is: December 6, 1952
    """
  }
]

In [74]:
# Then we use a prompt template to format each example into a string input
example_prompt = PromptTemplate(input_variables=['question', 'answer'], template="Question: {question}\n{answer}")

In [75]:
print(example_prompt.format(**examples[0]))

Question: Who lived longer, Muhammad Ali or Alan Turing?

        Are follow up questions needed here: Yes.
        Follow up: How old was Muhammad Ali when he died?
        Intermediate answer: Muhammad Ali was 74 years old when he died.
        Follow up: How old was Alan Turing when he died?
        Intermediate answer: Alan Turing was 41 years old when he died.
        So the final answer is: Muhammad Ali
    


In [76]:
from langchain import FewShotPromptTemplate
# Feed the examples and formatter to FewShotPromptTemplate

prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=['input']
)
print(prompt.format(input="Who was the father of Mary Ball Washington?"))

Question: Who lived longer, Muhammad Ali or Alan Turing?

        Are follow up questions needed here: Yes.
        Follow up: How old was Muhammad Ali when he died?
        Intermediate answer: Muhammad Ali was 74 years old when he died.
        Follow up: How old was Alan Turing when he died?
        Intermediate answer: Alan Turing was 41 years old when he died.
        So the final answer is: Muhammad Ali
    

Question: When was the founder of craigslist born?

        Are follow up questions needed here: Yes.
        Follow up: Who was the founder of craigslist?
        Intermediate answer: Craigslist was founded by Craig Newmark.
        Follow up: When was Craig Newmark born?
        Intermediate answer: Craig Newmark was born on December 6, 1952.
        So the final answer is: December 6, 1952
    

Question: Who was the father of Mary Ball Washington?


## Using an example selector
If you provide a full set of examples that covers many different topics in depth, the length of the context can exceed what your model endpoint is able to process.

Example selectors help by passing only the subset of examples that is relevant to the specific question at hand instead of the full set of examples.

The example selector utilizes a similarity score across the embedded question and example pairs. We will cover embeddings in detail in Lab2.
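
To make the idea of similarity-based selection concrete, here is a small sketch that runs without any embedding endpoint. It is not the LangChain implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, only the question text is compared, and all function names are hypothetical. The ranking step, scoring every example against the input and keeping the top `k`, is the part that carries over.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_examples(question: str, candidates: List[Dict[str, str]], k: int = 1) -> List[Dict[str, str]]:
    """Return the k candidates whose question is most similar to the input question."""
    query = embed(question)
    ranked = sorted(candidates, key=lambda ex: cosine(query, embed(ex["question"])), reverse=True)
    return ranked[:k]


toy_examples = [
    {"question": "Who lived longer, Muhammad Ali or Alan Turing?", "answer": "Muhammad Ali"},
    {"question": "When was the founder of craigslist born?", "answer": "December 6, 1952"},
]
print(select_examples("Who was the founder of craigslist?", toy_examples, k=1))
```

A real embedding model captures meaning rather than word overlap, so it can match questions that share no vocabulary at all. The toy version only rewards shared words.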

In [77]:
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings


example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    #TODO: Replace with embedding model endpoint
    OpenAIEmbeddings(openai_api_key=openai_api_key),
    Chroma,
    k=1
)

# Select the most similar example to the input.
question = "Who was the father of Mary Ball Washington?"
selected_examples = example_selector.select_examples({"question": question})
print(f"Examples most similar to the input: {question}")
for example in selected_examples:
    print("\n")
    for k, v in example.items():
        print(f"{k}: {v}")

ValidationError: 1 validation error for OpenAIEmbeddings
__root__
  Did not find openai_api_key, please add an environment variable `OPENAI_API_KEY` which contains it, or pass  `openai_api_key` as a named parameter. (type=value_error)

We can use the `example_selector` instead of a hardcoded list of examples. Under the hood, this similarity search utilizes embeddings and vector stores.

In [249]:
# Build the few-shot prompt from the example selector instead of the full example list
prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=['input']
)
print(prompt.format(input="Who was the father of Mary Ball Washington?"))

Question: Who lived longer, Muhammad Ali or Alan Turing?

        Are follow up questions needed here: Yes.
        Follow up: How old was Muhammad Ali when he died?
        Intermediate answer: Muhammad Ali was 74 years old when he died.
        Follow up: How old was Alan Turing when he died?
        Intermediate answer: Alan Turing was 41 years old when he died.
        So the final answer is: Muhammad Ali
    

Question: Who was the father of Mary Ball Washington?


# Langchaining 

More complex workflows require the model to react to specific subsets of tasks with a specialized sequence of behaviors. We capture these subsections of our application in modules called `chains`.

Each chain structures the interaction with a model through specialized prompts, optional examples and optional output parsers. 

The langchain chains module contains the classes to help us easily create specialized sequences of chains that can receive inputs from previous model inferences as structured outputs. 
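
Conceptually, a single chain is the composition of three steps: format the prompt, call the model, parse the completion. The sketch below illustrates that structure in plain Python. It is not how LangChain implements `LLMChain`, and `make_chain` and `fake_llm` are hypothetical names, with `fake_llm` standing in for a real model endpoint so the example runs offline.

```python
from typing import Callable, Dict


def make_chain(
    template: str,
    llm: Callable[[str], str],
    output_key: str,
    parser: Callable[[str], str] = str.strip,
) -> Callable[[Dict[str, str]], Dict[str, str]]:
    """Build a function that formats the prompt, calls the model, and parses the result."""

    def run(inputs: Dict[str, str]) -> Dict[str, str]:
        prompt = template.format(**inputs)   # 1. fill the template
        completion = llm(prompt)             # 2. call the model
        # 3. parse the completion and return it next to the original inputs
        return {**inputs, output_key: parser(completion)}

    return run


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model: echoes the prompt with surrounding whitespace.
    return f"  ECHO[{prompt}]  "


name_chain = make_chain("Name a company selling {product}.", fake_llm, output_key="company_name")
print(name_chain({"product": "colorful socks"}))
```

Returning the inputs together with the new output key is what lets a later step consume results from an earlier one.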

### Basic LLMChain

Let's play through the example of creating a product that allows users to generate a full set of marketing materials.

1. Generate name proposals for a company based on intended product to be sold
2. Generate a marketing slogan based on company values provided
3. Create a marketing template for email communication for the new company launch



### Generating the company name

In [78]:
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["product"],
    template="""
    You are a helpful marketing assistant that creates a marketable company name for a company selling {product} 
    Answer with a single name only no comments or discussion.
    """
)

In [79]:
# The LLMChain formats the prompt from the template and sends it to the model for us
company_name_chain = LLMChain(llm=llm, prompt=prompt, output_key="company_name")

In [80]:
# We pass all variables required in the PromptTemplate directly into the chain
product = "colorful socks"
company_name = company_name_chain.run(product)
print(company_name)


As an AI language model, I cannot provide a single name for the company selling colorful socks. However, here are some suggestions that you can consider:

1. Rainbow Socks
2. Colorful Socks
3. Funky Socks
4. Vibrant Socks
5. Happy Socks
6. Bright Socks
7. Cheerful Socks
8. Sunny Socks
9. Sunny Side Up Socks
10. Sunny Side Down Socks

I hope these suggestions help you come up with a unique and catchy name for the company.


A chain can generate batch predictions to answer multiple questions at a time using the `generate` method.

In [81]:
qs = [
    {'product': "Kids kites"},
    {'product': "Running shoes"},
    {'product': "Tennis sports wear"},
]

company_name_generated = company_name_chain.generate(qs)

In [83]:
for response in company_name_generated.generations:
    print(response[0].text)


As an AI language model, I cannot provide a single name for the company selling Kids kites. However, here are some suggestions that you can consider:

1. Kite Kids
2. Kite Kingdom
3. Kite Club
4. Kite Flight
5. Kite Adventures
6. Kite Dreams
7. Kite Sky
8. Kite Fun
9. Kite Play
10. Kite Zone

I hope these suggestions help you find a suitable name for the company.

As an AI language model, I cannot provide a single name for a company selling running shoes. However, here are some tips to consider when creating a marketable company name:

1. Keep it simple and memorable: A name that is easy to remember and pronounce is more likely to stick in people's minds.

2. Use descriptive words: Use words that describe what the company does or what it offers. This will help people understand what the company is about.

3. Avoid using generic names: Avoid using generic names that could apply to any company. This will make it harder for people to remember your brand.

4. Check for availability: Make 

Let's consider a more realistic problem that we cannot easily solve with a single prompt. Here we see the power of chaining a series of steps, where the result of each step is serialized and passed on as input to the next.

In [84]:
services_prompt = PromptTemplate(
    input_variables=["company_name", "product"],
    template="Create a list of services and products that {company_name} can develop around {product}. The list must be comma separated such as (abc, def, ghi, jkl)",
)
services_chain = LLMChain(
    llm=llm,
    prompt=services_prompt,
    output_key="services"
)

In [85]:
# Step 1: We get the recommended list of services 
services = services_chain.run(product=product, company_name=company_name)
print(services)




In [86]:
# Next let's create recommendations for the company slogan
slogan_prompt = PromptTemplate(
    input_variables=['product','company_name','services'],
    template="""
    Context:
    You are designing a corporate identity for {company_name} selling {product}
    The company is providing {services}. 
    Create a slogan for the company that is unique and memorable.
    """
)
# Slogan chain
slogan_chain = LLMChain(
    llm=llm,
    prompt=slogan_prompt,
    output_key="slogan"
)

In [87]:
slogan = slogan_chain.run(company_name=company_name, product=product, services=services)
print(slogan)


As an AI language model, I cannot provide a slogan for the company selling colorful socks. However, here are some suggestions that you can consider:

1. Socks that make you smile
2. Socks that make you happy
3. Socks that make you feel good
4. Socks that make you feel alive
5. Socks that make you feel like a kid again
6. Socks that make you feel like dancing
7. Socks that make you feel like singing
8. Socks that make you feel like laughing
9. Socks that make you feel like hugging someone
10. Socks that make you feel like jumping for joy

I hope these suggestions help you come up with a unique and catchy slogan for the company selling colorful socks.


### Chaining the chains 
Now that we have defined the stages of our chain, identified the inputs and outputs we require, and engineered our prompt templates to get the desired outputs, we can build an integrated end-to-end chain.
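
Before wiring the stages together, it helps to picture how values flow between them: each stage reads some keys from a shared state and writes one new key. The following is a plain-Python sketch of that idea under simplifying assumptions. `run_sequential` and the `Step` tuple are hypothetical, and the lambdas stand in for model calls.

```python
from typing import Callable, Dict, List, Tuple

# (input keys, output key, function that maps the selected inputs to the output)
Step = Tuple[List[str], str, Callable[[Dict[str, str]], str]]


def run_sequential(steps: List[Step], inputs: Dict[str, str]) -> Dict[str, str]:
    """Run the steps in order, feeding each one the keys it declares as inputs."""
    state = dict(inputs)
    for input_keys, output_key, fn in steps:
        missing = [key for key in input_keys if key not in state]
        if missing:
            raise KeyError(f"step producing '{output_key}' is missing inputs: {missing}")
        state[output_key] = fn({key: state[key] for key in input_keys})
    return state


# Stand-ins for the three stages; a real stage would call the model.
steps: List[Step] = [
    (["product"], "company_name", lambda v: f"{v['product'].title()} Co"),
    (["company_name", "product"], "services", lambda v: f"{v['product']} repair, {v['product']} rental"),
    (["product", "company_name", "services"], "slogan", lambda v: f"{v['company_name']}: more than {v['product']}"),
]
print(run_sequential(steps, {"product": "kites"}))
```

Checking for missing keys up front surfaces wiring mistakes, such as a misspelled output key, before any model call is made.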

In [88]:
from langchain.chains import SequentialChain

marketing_chain = SequentialChain(
    chains=[company_name_chain, services_chain, slogan_chain],
    input_variables=['product'],
    output_variables=['company_name', 'services', 'slogan']
    
)

In [89]:
marketing_chain("Kites")

{'product': 'Kites',
 'company_name': '\nAs an AI language model, I cannot provide a single name for the company selling kites. However, here are some suggestions that may help:\n\n1. KiteFly\n2. KiteRise\n3. KiteFlyer\n4. KiteWings\n5. KiteFlight\n6. KiteFlyers\n7. KiteFlyersClub\n8. KiteFlyersClub.com\n9. KiteFlyersClub.org\n10. KiteFlyersClub.net\n11. KiteFlyersClub.info\n12. KiteFlyersClub.co\n13. KiteFlyersClub.io\n14. KiteFlyersClub.biz\n15. KiteFlyersClub.me\n16. KiteFlyersClub.us\n17. KiteFlyersClub.ca\n18. KiteFlyersClub.co.uk\n19. KiteFlyersClub.',
 'services': '\n20. KiteFlyersClub.com.au\n21. KiteFlyersClub.co.in\n22. KiteFlyersClub.co.jp\n23. KiteFlyersClub.co.kr\n24. KiteFlyersClub.co.nz\n25. KiteFlyersClub.co.uk\n26. KiteFlyersClub.co.za\n27. KiteFlyersClub.co.il\n28. KiteFlyersClub.co.id\n29. KiteFlyersClub.co.th\n30. KiteFlyersClub.co.tr\n31. KiteFlyersClub.co.vn\n32. KiteFlyersClub.co.tw\n33. KiteFlyersClub.co.sg\n34. KiteFlyersClub.co.my\n35. KiteFlyersClub.co.ph\n36

#### What we could improve
You will likely see that your models won't always comply with the formatting rules you put into their context instructions. You can expand on this chain design by adding output parsers between the stages to clean up the intermediate results.
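
As an illustration of such a cleanup step, the sketch below reduces a verbose completion with a numbered list of suggestions to a single value that can be passed to the next stage. The helper names are hypothetical and the regular expression only covers the `1. item` style of list, so treat it as a starting point rather than a general parser.

```python
import re
from typing import List


def parse_numbered_list(completion: str) -> List[str]:
    """Return the items of a numbered list ('1. foo') found in a completion."""
    pattern = r"^[ \t]*\d+\.[ \t]+(.+)$"
    return [m.group(1).strip() for m in re.finditer(pattern, completion, flags=re.MULTILINE)]


def first_suggestion(completion: str) -> str:
    """Pick one clean value: the first list item, else the first non-empty line."""
    items = parse_numbered_list(completion)
    if items:
        return items[0]
    for line in completion.splitlines():
        if line.strip():
            return line.strip()
    raise ValueError("completion is empty")


raw = """
As an AI language model, I cannot provide a single name. However, here are some suggestions:

1. Rainbow Socks
2. Colorful Socks
3. Funky Socks
"""
print(first_suggestion(raw))  # prints "Rainbow Socks"
```

Raising on an empty completion acts as a simple quality gate: the chain stops instead of passing an empty string into the next prompt.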

# EXERCISE 2
Let's bring it all together and create a custom chain that builds a travel and cooking article for our new online magazine.

We will develop a series of chains that expand on an initial set of travel destinations to cover.

Each article will contain:
* An overview section of the travel destination
* A list of things to do in the region
* A recipe for a famous local dish (description, list of ingredients, step by step instruction)

### Target
Your system should create the articles based on the following inputs:

* Travel destination (City, Country)

### Focus on this exercise
* Apply modularity to the steps in your workflow
* Consider quality gates when chaining values across steps
* Think about suitable prompt engineering methods to improve your results

In [None]:
# Let's start by creating the subchains

# 1. Create the destination_overview chain


# 2. Create the recommended_activities chain


# 3. Create the famous_local_dish recommender 


Each chain should cover a self-contained subset of the workflow. Asking the model to handle too many tasks at once can overload it and lead to poor results.

In [None]:
# Now unify the elements and ensure the information flows properly between them

In [None]:
# Create a list of locations to cover in your reports
location_list = []

In [None]:
# Execute the chain against the location_list
result = ...

## Summary
We have seen how LangChain provides us with a series of abstractions on the core building blocks of LLM based applications. 

We have learned to connect our models, create custom templates, use schemata to structure our message flow, chain a series of steps, and structure the model's outputs so they can be passed into downstream systems.

Each component of the system is under continued development, and you will likely see the library change as it matures.

Still, we believe LangChain to be a useful abstraction to develop faster and build a more sustainable code base. 