# Creating Conversational AI with Large Language Models for Business

# Algoritma

Algoritma Data Science School is an educational institution based in Indonesia that specializes in providing training and courses in the field of data science and analytics. It aims to equip individuals with the knowledge and skills needed to excel in the rapidly growing field of data science.

This Jupyter Notebook has been specifically designed for Algoritma Data Science School, providing a platform for students to explore, analyze, and visualize data using Python and its associated data science libraries. Jupyter Notebook serves as an interactive workspace where students can demonstrate their understanding of data science concepts, showcase their skills, and present their findings in a structured and organized manner. Please note that this Jupyter Notebook is intended for personal or educational use only. Kindly refrain from reproducing or distributing this notebook without prior permission. 

# Outline

- Large Language Model (LLM)
    - Overview of Large Language Model & Transformer
    - Introduction to popular LLMs such as GPT-3, GPT-2, and BERT
    - Understanding the capabilities and limitations of LLMs

- Large Language Model Implementation
    - Introduction to LangChain
    - Setting API key and .env
    - LangChain QuickStart

- Build Question-Answering System
    - Introduction to Question-Answering System
    - Connecting data sources (databases and text data) to an LLM
    - Basics of building a Question-Answering System using an LLM with database and text data
    - Demonstration of using OpenAI and LangChain to build a Question-Answering System

- Hugging Face
    - Introduction to Text Generation and Hugging Face
    - Setting API key and .env
    - Applying Hugging Face's Inference API to use an LLM without OpenAI credits
    - Integrating Hugging Face's Inference API into the previously built Question-Answering System
    - Demonstration of using Hugging Face's Inference API to build a Question-Answering System

# Large Language Model (LLM)

# Large Language Model Implementation

## Intro to LangChain

LangChain is a framework for developing applications powered by language models. The LangChain framework is designed around these principles:

1. Data-aware: connect a language model to other sources of data

2. Agentic: allow a language model to interact with its environment

## Setting API key and .env

### dotenv

The dotenv library (installed from PyPI as `python-dotenv`) is a popular Python library that simplifies the process of loading environment variables from a .env file into your Python application. It allows you to store configuration variables separately from your code, making it easier to manage sensitive information such as API keys, database credentials, or other environment-specific settings.

#### `.env` file

The .env file is a text file commonly used in software development projects to store environment-specific configuration variables. It follows a simple key-value format, where each line represents a single configuration variable.

Here are a few important points about the .env file:

1. Purpose: The primary purpose of the .env file is to separate sensitive or environment-specific information from your codebase. It allows you to store configuration variables such as API keys, database credentials, or other settings that may change based on the environment (e.g., development, staging, production).

2. File Format: The .env file is typically a plain text file without any special formatting. Each line in the file consists of a key-value pair, where the key and value are separated by an equals sign (=). For example:

```
API_KEY=abc123
DATABASE_URL=mysql://user:password@localhost/db
```

3. Environment Variables: Each line in the .env file represents an environment variable. The key is the name of the environment variable, and the value is the corresponding value for that variable. These variables can be accessed within your code to retrieve the associated values.

4. Loading Variables: To make use of the variables defined in the .env file, you need to load them into your application. This is typically done using a library like dotenv in Python. The library reads the .env file and sets the defined variables as environment variables that can be accessed within your code.

5. Security: It's essential to ensure the security of your .env file. The file may contain sensitive information, such as passwords or access tokens. Make sure to exclude the .env file from version control systems like Git and only share it with authorized individuals who require access to the environment-specific configuration.

Overall, the .env file provides a convenient and flexible way to manage configuration variables in your project, allowing you to keep sensitive information separate from your code and easily configure different environments.
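To make the loading step concrete, here is a simplified sketch of what reading a `.env` file involves. It is not python-dotenv's actual implementation, which handles more cases (such as `export` prefixes and variable expansion); the helper names `parse_dotenv` and `load_into_environ` are hypothetical. The `override=False` default mirrors `load_dotenv()`, which by default leaves variables that are already set in the environment untouched.

```python
import os

def parse_dotenv(text: str) -> dict:
    """Parse KEY=VALUE lines from .env-style text (simplified sketch)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # skip blank lines and full-line comments
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            continue  # not a KEY=VALUE line; ignore it
        value = value.strip()
        # drop one pair of matching surrounding quotes, e.g. "abc" or 'abc'
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values

def load_into_environ(values: dict, override: bool = False) -> None:
    """Copy parsed values into os.environ, keeping existing variables unless override=True."""
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value

sample = """
API_KEY=abc123
DATABASE_URL=mysql://user:password@localhost/db
"""
config = parse_dotenv(sample)
print(config["DATABASE_URL"])  # mysql://user:password@localhost/db
```

Splitting on the first `=` only keeps values such as URLs or tokens that themselves contain `=` intact.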


### Verify

In [1]:
from dotenv import load_dotenv

load_dotenv()

True
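A returned `True` shows that a `.env` file was found and loaded, but not that the specific key this notebook needs is present. LangChain's `OpenAI` wrapper reads the key from the `OPENAI_API_KEY` environment variable, so a quick check can confirm it is set without printing the whole secret. `missing_env_vars` and `masked` are hypothetical helpers for illustration:

```python
import os

def missing_env_vars(names):
    """Return the names from `names` that are unset or empty in the environment."""
    return [name for name in names if not os.getenv(name)]

def masked(value: str, visible: int = 4) -> str:
    """Hide all but the last few characters of a secret (hide everything if it is short)."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]

missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Missing environment variables:", missing)
else:
    print("OPENAI_API_KEY loaded:", masked(os.environ["OPENAI_API_KEY"]))
```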

## LangChain QuickStart

**Topics**

- Prompt
- Chain
- Agent

In [2]:
from langchain import OpenAI

llm = OpenAI(temperature=0.9)  # temperature controls randomness: higher values give more varied output

### Prompt

#### Basic Prompt

The basic building block of LangChain is the LLM, which takes in text and generates more text (the answer).

For example, suppose we're building an application that generates a brand name based on a company description.

In [10]:
prompt = "What is a good name for a brand that makes local burger?"

print(llm.predict(prompt))



HomeTown Burgers.


Notice that re-running the cell generates a different answer each time, because `temperature=0.9` makes the output more random.
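Why does a re-run give a different name? Language models typically pick each next token by sampling from a probability distribution, and temperature rescales the model's scores before that sampling. The toy below illustrates temperature scaling on three candidates; it is not OpenAI's implementation, and the candidate names and scores are made up for illustration. In this sketch temperature 0 always picks the top option; with the real API, `temperature=0` makes output much more repeatable.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # treat temperature 0 as greedy decoding: all probability on the top-scoring option
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(options, logits, temperature, rng):
    """Draw one option according to the temperature-scaled probabilities."""
    return rng.choices(options, weights=softmax_with_temperature(logits, temperature), k=1)[0]

# made-up candidate names and scores, for illustration only
options = ["HomeTown Burgers", "Burgers by the Bay", "Patty Town"]
logits = [2.0, 1.5, 0.5]
rng = random.Random(42)
print([sample(options, logits, 0.9, rng) for _ in range(5)])  # random draws; depend on the seed
print([sample(options, logits, 0.0, rng) for _ in range(5)])  # always "HomeTown Burgers"
```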

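We can also prompt in other languages. Let's try Bahasa Indonesia (the prompt below asks: "What is a good name for a brand that makes pisang goreng mentai?", i.e. fried bananas topped with mentai sauce):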

In [13]:
print(llm.predict("Nama yang bagus untuk brand yang membuat pisang goreng mentai?"))



MentaiFrits


#### Prompt Templates

Most LLM applications do not pass user input directly to an LLM. Instead, they usually insert the user input into a larger piece of text called a prompt template.

A prompt template refers to a reproducible way to generate a prompt. It contains a text string (“the template”), that can take in a set of parameters from the end user and generate a prompt.

The prompt template may contain:

- instructions to the language model,

- a set of few shot examples to help the language model generate a better response,

- a question to the language model.

In the previous example, the text we passed to the model contained an instruction to generate a brand name for a given product. For our application, it would be better if the user only had to provide a description of the company or product, without having to write the instruction for the model.

In [24]:
from langchain.prompts import PromptTemplate

template_prompt = PromptTemplate.from_template("What is a good name for a brand that makes {product}?")

prompt = template_prompt.format(product="local burger")

print(prompt)

What is a good name for a brand that makes local burger?


Notice that the product is filled in from the user input to produce the full prompt; this prompt is then passed to `llm` to generate the response.

In [17]:
print(llm.predict(prompt))



Burgers by the Bay.


Because it is a template, it can also take more than one input variable. For example:

In [18]:
template = "Write a {adjective} poem about {subject}"

poem_template = PromptTemplate(
    input_variables=["adjective", "subject"],
    template=template,
)

In [20]:
poem_template.format(adjective='sad', subject='ducks')

'Write a sad poem about ducks'
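Conceptually, formatting a template is named-placeholder substitution, and LangChain's default template format uses the same `{name}` placeholder syntax as Python's `str.format`. The class below is a hypothetical plain-Python stand-in, not LangChain's `PromptTemplate`: it uses the standard library's `string.Formatter` to discover the placeholders and reports any missing inputs by name.

```python
from string import Formatter

class SimplePromptTemplate:
    """Hypothetical stand-in for a prompt template, for illustration only."""

    def __init__(self, template: str):
        self.template = template
        # collect placeholder names such as {adjective} and {subject}
        self.input_variables = sorted(
            {field for _, field, _, _ in Formatter().parse(template) if field}
        )

    def format(self, **inputs: str) -> str:
        missing = set(self.input_variables) - set(inputs)
        if missing:
            raise KeyError(f"missing input variables: {sorted(missing)}")
        return self.template.format(**inputs)

poem = SimplePromptTemplate("Write a {adjective} poem about {subject}")
print(poem.input_variables)                           # ['adjective', 'subject']
print(poem.format(adjective="sad", subject="ducks"))  # Write a sad poem about ducks
```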

In [21]:
print(llm.predict(poem_template.format(adjective='sad', subject='ducks')))



Oh, little ducks in sorrow
Sad are your cries
Your little feet leave a sorrowful trail
On the mirror lakes you glide

You quack so sadly in pain
Your sorrowful quacks echoing
A mournful reminder of your sadness
And a call to understand

Oh, little ducks I do not know
The pain of your sorrow
But I would like to share your load
To carry it with you tomorrow

For little ducks are so sad
Their little hearts so strong
They may not understand the pain
But they still keep going on


The next template combines all three elements: an instruction (act as a naming consultant), a few examples of good company names, and the question itself.

In [22]:
template = """
I want you to act as a naming consultant for new companies.

Here are some examples of good company names:

- search engine, Google
- social media, Facebook
- video sharing, YouTube

The name should be short, catchy and easy to remember.

What is a good name for a brand that makes {product}?
"""

brand_template = PromptTemplate(
    input_variables=["product"],
    template=template,
)

batik_prompt = brand_template.format(product='batik')

print(llm.predict(batik_prompt))


BatikBye.
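Keeping the few-shot examples as data, rather than hard-coding them in the template string, makes them easy to swap or extend. LangChain provides a `FewShotPromptTemplate` class for this; the sketch below instead uses plain string formatting, and the helper `build_naming_prompt` and the extra example are hypothetical. It assembles the same instruction, examples and question as the template above.

```python
# the few-shot examples from the template above, kept as (category, company name) pairs
EXAMPLES = [
    ("search engine", "Google"),
    ("social media", "Facebook"),
    ("video sharing", "YouTube"),
]

def build_naming_prompt(product: str, examples=EXAMPLES) -> str:
    """Assemble instruction + few-shot examples + question into one prompt string."""
    example_lines = "\n".join(f"- {category}, {name}" for category, name in examples)
    return (
        "I want you to act as a naming consultant for new companies.\n\n"
        "Here are some examples of good company names:\n\n"
        f"{example_lines}\n\n"
        "The name should be short, catchy and easy to remember.\n\n"
        f"What is a good name for a brand that makes {product}?"
    )

# adding a (hypothetical) example only changes the data, not the template text
extended = EXAMPLES + [("online marketplace", "Tokopedia")]
print(build_naming_prompt("batik", extended))
```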


### Chain

Now that we have a model and a prompt template, we want to combine the two by "chaining" them together. Chains give us a way to link (or chain) together multiple primitives, such as models, prompts, and other chains.

The simplest and most common type of chain is an LLMChain, which passes an input first to a PromptTemplate and then to an LLM. We can construct an LLM chain from our existing model and prompt template.

For example, to generate a response from our template without a chain, the workflow has two steps:

1. Create the prompt based on input with `template_prompt`

In [25]:
prompt = template_prompt.format(product="rendang mozarella")

print(prompt)

What is a good name for a brand that makes rendang mozarella?


2. Generate a response from the prompt with `llm`

In [31]:
print(llm.predict(prompt))



Rendez Mozzarella.


We can simplify this workflow by linking the two steps together with a chain:

In [32]:
from langchain.chains import LLMChain

chain = LLMChain(llm=llm, prompt=template_prompt)

In [35]:
print(chain.run('rendang mozarella'))



Mozarella Rendang Co.


Notice that with a chain, each new input takes only one line of code to generate a response. Understanding how this simple chain works will set you up well for working with more complex chains.
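Stripped of the framework, a chain is function composition: fill the template, then call the model, and optionally feed one chain's output into the next. LangChain's own classes for this include `LLMChain` and `SimpleSequentialChain`; the sketch below is not their implementation. `make_chain` and `run_in_sequence` are illustrative helpers, and `fake_llm` is a hypothetical stub standing in for the OpenAI model so the example runs without an API key.

```python
from typing import Callable

def make_chain(template: str, llm: Callable[[str], str]) -> Callable[[str], str]:
    """Toy chain: fill a one-variable template, then pass the prompt to the model."""
    def run(user_input: str) -> str:
        prompt = template.format(input=user_input)  # step 1: build the prompt
        return llm(prompt)                          # step 2: call the model
    return run

def run_in_sequence(*steps: Callable[[str], str]) -> Callable[[str], str]:
    """Link chains so that each step's output becomes the next step's input."""
    def run(user_input: str) -> str:
        result = user_input
        for step in steps:
            result = step(result)
        return result
    return run

def fake_llm(prompt: str) -> str:
    # hypothetical stub standing in for a real model call
    return f"<answer to: {prompt}>"

name_chain = make_chain("What is a good name for a brand that makes {input}?", fake_llm)
slogan_chain = make_chain("Write a slogan for a brand called {input}.", fake_llm)

print(name_chain("rendang mozarella"))
pipeline = run_in_sequence(name_chain, slogan_chain)
print(pipeline("rendang mozarella"))
```

With a real model in place of `fake_llm`, the second step would receive the generated brand name instead of the stub's echo.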

### Agents

The chain we built follows a fixed, predetermined sequence of steps. However, more complicated workflows need to choose actions based on the situation.

Agents help us do exactly that. They use a language model to decide which actions to take and in what order. An agent has tools at its disposal, and it repeatedly selects a tool, runs it, and examines the result until it reaches a final answer.
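The select-run-observe loop can be sketched in a few lines of plain Python. This is a toy for intuition, not LangChain's agent executor: `scripted_policy` is a hypothetical stand-in for the language model that simply follows a fixed script, and `power_tool` is a hypothetical tool. In a real ReAct-style agent such as `ZERO_SHOT_REACT_DESCRIPTION`, the model reads the question, the tool descriptions and the previous observations, and writes its next action as text.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (tool name, tool input, observation) for each step the agent has taken
History = List[Tuple[str, str, str]]
# a policy returns (tool name, tool input), or (None, final answer) when done
Policy = Callable[[str, History], Tuple[Optional[str], str]]

def run_agent(policy: Policy, tools: Dict[str, Callable[[str], str]],
              question: str, max_steps: int = 5) -> str:
    history: History = []
    for _ in range(max_steps):
        tool_name, payload = policy(question, history)     # choose the next action
        if tool_name is None:
            return payload                                 # final answer reached
        observation = tools[tool_name](payload)            # run the chosen tool
        history.append((tool_name, payload, observation))  # visible to the next decision
    raise RuntimeError("agent did not reach a final answer within max_steps")

def power_tool(text: str) -> str:
    """Hypothetical tool: 'base,exponent' -> base ** exponent."""
    base, exponent = (float(part) for part in text.split(","))
    return str(base ** exponent)

def scripted_policy(question: str, history: History) -> Tuple[Optional[str], str]:
    # stand-in for the LLM: a real agent would decide this from the question and history
    if not history:
        return "power", "2,10"
    return None, f"The answer is {history[-1][2]}"

toy_tools = {"power": power_tool}
print(run_agent(scripted_policy, toy_tools, "What is 2 raised to the 10th power?"))
# The answer is 1024.0
```

The `max_steps` cap guards against a model that never settles on a final answer.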

To load an agent, you need to choose a(n):

- **LLM/Chat model:** The language model powering the agent.

- **Tool(s):** A function that performs a specific duty. This can be things like: Google Search, Database lookup, Python REPL, other chains. For a list of predefined tools and their specifications, see the Tools documentation.

- **Agent name:** A string, supplied here through the `AgentType` enum, that references a supported agent class. An agent class is largely parameterized by the prompt the language model uses to determine which action to take. Because this notebook focuses on the simplest, highest-level API, it only covers the standard supported agents. To implement a custom agent, or for a list of supported agents and their specifications, see the LangChain agents documentation.

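For this example, we'll give the agent two tools: `wikipedia`, to look up information on Wikipedia, and `llm-math`, to do arithmetic. The `wikipedia` tool requires the `wikipedia` Python package to be installed.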

In [3]:
from langchain.agents import AgentType, initialize_agent, load_tools

# The language model we're going to use to control the agent.
llm_agent = OpenAI(temperature=0)

# The tools we'll give the Agent access to. Note that the 'llm-math' tool uses an LLM, so we need to pass that in.
tools = load_tools(["wikipedia", "llm-math"], llm=llm_agent)

# Finally, let's initialize an agent with the tools, the language model, and the type of agent we want to use.
agent = initialize_agent(tools, llm_agent, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

Let's test it out

In [4]:
agent.run("What year did Lionel Messi Joined Barcelona? What is his current age raised to the 0.43 power?")



> Entering new  chain...
I need to find out when Messi joined Barcelona and then calculate his current age raised to the 0.43 power.
Action: Wikipedia
Action Input: Lionel Messi
Observation: Page: Lionel Messi
Summary: Lionel Andrés Messi (Spanish pronunciation: [ljoˈnel anˈdɾes ˈmesi] (listen); born 24 June 1987), also known as Leo Messi, is an Argentine professional footballer who plays as a forward for and captains the Argentina national team. Widely regarded as one of the greatest players of all time, Messi has won a record seven Ballon d'Or awards and a record six European Golden Shoes, and in 2020 he was named to the Ballon d'Or Dream Team. Until leaving the club in 2021, he had spent his entire professional career with Barcelona, where he won a club-record 34 trophies, including ten La Liga titles, seven Copa del Rey titles and the UEFA Champions League four times. With his country, he won the 2021 Copa América and the 2022 FIFA World Cup

'Lionel Messi joined Barcelona in 2004 and his current age raised to the 0.43 power is 3.92.'
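Agent answers are worth checking, because the model can slip in any intermediate step. Here is an independent check of the arithmetic in plain Python, using the birth date from the Wikipedia summary above (24 June 1987). The reference date is an assumption about when the notebook was run, and `age_on` is a hypothetical helper:

```python
from datetime import date

def age_on(birth: date, on: date) -> int:
    """Age in whole years on a given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)

messi_birth = date(1987, 6, 24)   # from the Wikipedia summary in the agent log
reference_day = date(2023, 7, 1)  # assumption: approximate date the notebook was run
age = age_on(messi_birth, reference_day)
print(age, round(age ** 0.43, 2))  # 36 4.67
```

For an age of 35 or 36 the result is about 4.61 or 4.67, so the agent's 3.92 (roughly 24 ** 0.43) should not be trusted without checking.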