# Creating a chatbot using LangChain and the D-ID API

In this notebook, we will build a LangChain app using techniques such as prompts, chains, and document loaders. We will then connect the LLM response to a facial animation and text-to-speech (TTS) output via the D-ID API.

The steps are:
1. Import modules and set the OpenAI API key
2. Create prompts
3. Embed documents
4. Pass the response to D-ID

### Creating a quick test without document embedding

In [8]:
import os
import openai


In [9]:
# Load the OpenAI API key from the environment.
# Set OPENAI_API_KEY before starting the notebook rather than hard-coding the secret here.

API_KEY = os.getenv("OPENAI_API_KEY")


In [14]:
# Prompt templates

from langchain import PromptTemplate

test_template = "Briefly describe what {physics_concept} is in one sentence"

prompt = PromptTemplate(
    input_variables=['physics_concept'],
    template=test_template
)

prompt.format(physics_concept='special relativity')


'Briefly describe what special relativity is in one sentence'
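As a rough mental model (not LangChain's implementation), a prompt template behaves like a Python format string with named placeholders. The helper name `format_prompt` below is hypothetical and exists only for this sketch.

```python
# Minimal sketch of prompt templating with plain Python string formatting.
# `format_prompt` is a hypothetical helper, not part of LangChain.

def format_prompt(template: str, **variables: str) -> str:
    """Fill the named placeholders in `template` with the given values."""
    return template.format(**variables)


template = "Briefly describe what {physics_concept} is in one sentence"
filled = format_prompt(template, physics_concept="special relativity")
print(filled)
```

Leaving out a placeholder value raises a `KeyError`, which is a cheap way to catch a mismatch between the template and its input variables.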

In [18]:
# LLM chain: the chain fills in the prompt template and sends the result to the LLM

from langchain.llms import OpenAI
from langchain.chains import LLMChain

llm = OpenAI(temperature=0.7, model="text-davinci-003", openai_api_key=API_KEY)
chain = LLMChain(llm=llm, prompt=prompt)

answer = chain.run('special relativity')

In [19]:
# pass the answer from the LLM chain to D-ID

import requests
import json

url = "https://api.d-id.com/talks"

payload = {
    "script": {
        "type": "text",
        "subtitles": False,
        "provider": {
            "type": "microsoft",
            "voice_id": "en-US-JennyNeural",
            "voice_config": {
                "style": "Friendly",
                "rate": "0.75"
            }
        },
        "ssml": False,
        "input": answer
    },
    "config": {
        "fluent": False,
        "pad_audio": "0.0",
        "stitch": True
    },
    "source_url": "https://create-images-results.d-id.com/google-oauth2%7C107017662203149014763/upl_0ta3JhUnJ1W2qKckIjRMu/image.png"
}
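headers = {
    "accept": "application/json",
    "content-type": "application/json",
    # Assumption: the Basic credential is read from a DID_API_KEY environment variable
    # (a name chosen here) instead of being hard-coded in the notebook.
    "authorization": "Basic " + os.environ["DID_API_KEY"]
}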

response = requests.post(url, json=payload, headers=headers)

print(response.text)
r = response.json()


{"id":"tlk_k2bM9d6nApJn8pVSrHMlV","created_at":"2023-07-04T11:43:27.628Z","created_by":"google-oauth2|107017662203149014763","status":"created","object":"talk"}
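The request above can be wrapped in a small helper so that any LLM answer can be sent to D-ID. The sketch below only builds the payload and headers, mirroring the fields used in this notebook, and sends nothing. The function name `build_talk_request`, its parameters, and the example URL and credential are assumptions made for illustration.

```python
from typing import Any, Dict, Tuple


def build_talk_request(
    text: str,
    source_url: str,
    basic_credential: str,
    voice_id: str = "en-US-JennyNeural",
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Return (payload, headers) for a POST to the /talks endpoint.

    Hypothetical helper: field names and values are copied from the cell above.
    """
    if not text.strip():
        raise ValueError("text must not be empty")

    payload = {
        "script": {
            "type": "text",
            "subtitles": False,  # Python booleans are serialised as JSON true/false
            "provider": {
                "type": "microsoft",
                "voice_id": voice_id,
                "voice_config": {"style": "Friendly", "rate": "0.75"},
            },
            "ssml": False,
            "input": text.strip(),
        },
        "config": {"fluent": False, "pad_audio": "0.0", "stitch": True},
        "source_url": source_url,
    }
    headers = {
        "accept": "application/json",
        "content-type": "application/json",
        "authorization": f"Basic {basic_credential}",
    }
    return payload, headers


# Placeholder values only; nothing is sent over the network here.
demo_payload, demo_headers = build_talk_request(
    "Special relativity relates space and time.",
    "https://example.com/face.png",
    "PLACEHOLDER_CREDENTIAL",
)
print(demo_payload["script"]["input"])
```

The returned pair can then be passed to `requests.post(url, json=payload, headers=headers)` exactly as in the cell above.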


In [20]:
# response from D-id endpoint

url = "https://api.d-id.com/talks/"+r["id"]

response = requests.get(url, headers=headers)

print(response.text)
r = response.json()

{"user":{"features":["stitch","clips:write",null],"id":"google-oauth2|107017662203149014763","plan":"deid-trial","authorizer":"basic","email":"ricioman.it@gmail.com","owner_id":"google-oauth2|107017662203149014763"},"script":{"ssml":false,"subtitles":false,"type":"text","provider":{"type":"microsoft","voice_id":"en-US-JennyNeural","voice_config":{"rate":"0.75","style":"Friendly"}}},"metadata":{"driver_url":"bank://lively/driver-03/flipped","mouth_open":false,"num_faces":1,"num_frames":258,"processing_fps":39.321661709038295,"resolution":[942,774],"size_kib":1625.9423828125},"audio_url":"https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%7C107017662203149014763/tlk_k2bM9d6nApJn8pVSrHMlV/microsoft.wav?AWSAccessKeyId=AKIA5CUMPJBIK65W6FGA&Expires=1688557407&Signature=JSS5MKIFCXdIJrnPijIzQmnbbcw%3D&X-Amzn-Trace-Id=Root%3D1-64a405dd-1efc95bc68c275f92df9df42%3BParent%3D5f64a0da599ea635%3BSampled%3D0%3BLineage%3Da08e19fe%3A0","created_at":"2023-07-04T11:43:27.628Z","face":{"mask_

In [21]:
print(r["result_url"])

https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%7C107017662203149014763/tlk_k2bM9d6nApJn8pVSrHMlV/1688471007628.mp4?AWSAccessKeyId=AKIA5CUMPJBIK65W6FGA&Expires=1688557415&Signature=5X%2FRwinrU8gYqi%2F1k1G%2FQ16QKvc%3D&X-Amzn-Trace-Id=Root%3D1-64a405e7-74e71c9e58a14693196056d2%3BParent%3D18c39a4cfaab62b9%3BSampled%3D1%3BLineage%3D6b931dd4%3A0
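The talk is reported as `created` immediately after the POST, so `result_url` may not be present on the first GET. A minimal polling sketch follows, assuming the talk eventually reports a `done` status and that `error` or `rejected` indicate failure; these status names, the helper `wait_for_talk`, and its parameters are assumptions and should be checked against the D-ID documentation. The fetch function is injected so the loop can be tried without network access.

```python
import time
from typing import Any, Callable, Dict


def wait_for_talk(
    fetch_talk: Callable[[], Dict[str, Any]],
    poll_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `fetch_talk` until the talk is finished and return its result URL."""
    for _ in range(max_attempts):
        talk = fetch_talk()
        status = talk.get("status")
        if status == "done":  # assumed terminal success status
            return talk["result_url"]
        if status in ("error", "rejected"):  # assumed failure statuses
            raise RuntimeError(f"talk failed with status {status!r}")
        sleep(poll_seconds)
    raise TimeoutError(f"talk not ready after {max_attempts} attempts")


# Simulated responses standing in for successive GET requests.
fake_responses = iter([
    {"status": "created"},
    {"status": "started"},
    {"status": "done", "result_url": "https://example.com/talk.mp4"},
])
video_url = wait_for_talk(lambda: next(fake_responses), sleep=lambda seconds: None)
print(video_url)
```

With the real API, `fetch_talk` could be `lambda: requests.get(url, headers=headers).json()`, reusing the `url` and `headers` from the cells above.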


### Adding document embedding into prompt

In [2]:
# importing modules

from PyPDF2 import PdfReader
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SequentialChain 
from langchain.memory import ConversationBufferMemory
from langchain.utilities import WikipediaAPIWrapper
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS


In [5]:
# read pdf file

pdfreader = PdfReader('CubeSat measurements of thermospheric plasma_Publication.pdf')

# extract the text of every page, skipping pages with no extractable text

raw_text = ''
for page in pdfreader.pages:
    content = page.extract_text()
    if content:
        raw_text += content


print(len(raw_text))

51609


In [3]:
raw_text

"Vol.:(0123456789)1 3CEAS Space Journal \nhttps://doi.org/10.1007/s12567-022-00439-y\nORIGINAL PAPER\nCubeSat measurements of\xa0thermospheric plasma: spacecraft charging \neffects on\xa0a\xa0plasma analyzer\nSachin\xa0Reddy1 \xa0· Dhiren\xa0Kataria1\xa0· Gethyn\xa0Lewis1\xa0· Anasuya\xa0Aruliah2\xa0· Daniel\xa0Verscharen1,4\xa0· Joel\xa0Baby\xa0Abraham1\xa0· \nGregoire\xa0Deprez3\xa0· Rifat\xa0Mahammod2\nReceived: 27 August 2021 / Revised: 31 January 2022 / Accepted: 1 March 2022 \n© The Author(s) 2022\nAbstract\nSpacecraft charging affects the accuracy of in-situ plasma measurements in space. We investigate the impact of spacecraft \ncharging on upper thermospheric plasma measurements captured by a 2U CubeSat called Phoenix. Using the Spacecraft \nPlasma Interactions Software (SPIS), we simulate dayside surface potentials of −\xa00.6\xa0V, and nightside potentials of −\xa00.2\xa0V. \nWe also observe this charging mechanism in the distribution function captured by the Ion and Neutral 

In [6]:
# Split the raw text into overlapping chunks

text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_overlap=200,
    chunk_size=800,  # measured in characters (length_function=len); keep each chunk well within the LLM's input limit
    length_function=len
)

texts = text_splitter.split_text(raw_text)


['Vol.:(0123456789)1 3CEAS Space Journal \nhttps://doi.org/10.1007/s12567-022-00439-y\nORIGINAL PAPER\nCubeSat measurements of\xa0thermospheric plasma: spacecraft charging \neffects on\xa0a\xa0plasma analyzer\nSachin\xa0Reddy1 \xa0· Dhiren\xa0Kataria1\xa0· Gethyn\xa0Lewis1\xa0· Anasuya\xa0Aruliah2\xa0· Daniel\xa0Verscharen1,4\xa0· Joel\xa0Baby\xa0Abraham1\xa0· \nGregoire\xa0Deprez3\xa0· Rifat\xa0Mahammod2\nReceived: 27 August 2021 / Revised: 31 January 2022 / Accepted: 1 March 2022 \n© The Author(s) 2022\nAbstract\nSpacecraft charging affects the accuracy of in-situ plasma measurements in space. We investigate the impact of spacecraft \ncharging on upper thermospheric plasma measurements captured by a 2U CubeSat called Phoenix. Using the Spacecraft',
 'charging on upper thermospheric plasma measurements captured by a 2U CubeSat called Phoenix. Using the Spacecraft \nPlasma Interactions Software (SPIS), we simulate dayside surface potentials of −\xa00.6\xa0V, and nightside potentials of

In [5]:
len(texts)

87
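To see why consecutive chunks share text, here is a simplified sketch of separator-based splitting with overlap. It is not `CharacterTextSplitter`'s actual algorithm: it greedily packs lines into chunks of at most `chunk_size` characters and carries trailing lines (up to `chunk_overlap` characters) into the next chunk. The function name `split_with_overlap` is hypothetical.

```python
from typing import List


def split_with_overlap(
    text: str, chunk_size: int, chunk_overlap: int, separator: str = "\n"
) -> List[str]:
    """Greedily pack separator-delimited pieces into overlapping chunks."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    pieces = [p for p in text.split(separator) if p]
    chunks: List[str] = []
    current: List[str] = []  # pieces of the chunk being built

    for piece in pieces:
        if current and len(separator.join(current + [piece])) > chunk_size:
            chunks.append(separator.join(current))
            # Drop leading pieces until the carried-over tail fits in the
            # overlap budget and leaves room for the new piece.
            while current and (
                len(separator.join(current)) > chunk_overlap
                or len(separator.join(current + [piece])) > chunk_size
            ):
                current.pop(0)
        current.append(piece)

    if current:
        chunks.append(separator.join(current))
    return chunks


toy_chunks = split_with_overlap("aaaa\nbbbb\ncccc\ndddd", chunk_size=9, chunk_overlap=4)
print(toy_chunks)
```

In this toy input each chunk repeats the last line of the previous one, which is the same effect visible in the first two entries of `texts` above. A single line longer than `chunk_size` is emitted as an oversized chunk in this sketch.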

In [10]:
# get embeddings from OpenAI

embeddings = OpenAIEmbeddings()

# text from pdf is converted into embeddings and stored into vector database
vector_db = FAISS.from_texts(texts=texts, embedding=embeddings)
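Conceptually, the vector database stores one embedding per chunk and answers a query by returning the chunks whose embeddings lie closest to the query embedding. The sketch below illustrates that idea in pure Python with made-up two-dimensional vectors; real embeddings are far higher-dimensional, FAISS uses optimised index structures, and the choice of Euclidean distance here is an assumption. The names `l2_distance` and `nearest_texts` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_texts(
    query_vec: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    k: int = 4,
) -> List[str]:
    """Return the texts of the k stored vectors closest to `query_vec`."""
    ranked = sorted(store, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy store: (chunk text, made-up embedding).
toy_store = [
    ("ion and neutral mass spectrometer", [1.0, 0.0]),
    ("spacecraft charging", [0.0, 1.0]),
    ("plasma analyzer", [0.9, 0.1]),
]
top_two = nearest_texts([1.0, 0.05], toy_store, k=2)
print(top_two)
```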


In [11]:
# initialise the OpenAI LLM in a QA chain, run a similarity search for the query,
# then run the chain on the retrieved documents

from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

query = "what does the word INMS stand for?"
llm = OpenAI(temperature=0.7, model="text-davinci-003", openai_api_key=API_KEY)
chain = load_qa_chain(llm=llm, chain_type="stuff")

docs = vector_db.similarity_search(query=query)
answer = chain.run(input_documents=docs, question=query)
# chain.run({"input_documents": docs, "question": query},return_only_outputs=True)

In [12]:
answer

' INMS stands for Ion Neutral Mass Spectrometer.'
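The `"stuff"` chain type places all retrieved documents into a single prompt together with the question. The sketch below shows that idea only; the prompt wording is made up and is not LangChain's template, and `build_stuff_prompt` is a hypothetical name.

```python
from typing import List


def build_stuff_prompt(docs: List[str], question: str) -> str:
    """Concatenate all retrieved chunks into one context block, then ask the question."""
    context = "\n\n".join(docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


toy_docs = [
    "The Ion and Neutral Mass Spectrometer (INMS) is a plasma analyzer.",
    "Phoenix is a 2U CubeSat.",
]
stuff_prompt = build_stuff_prompt(toy_docs, "what does the word INMS stand for?")
print(stuff_prompt)
```

Because every retrieved chunk is placed in one prompt, the number of chunks multiplied by `chunk_size` has to stay within the model's input limit.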

### Speech Recognition 

In [14]:
import whisper
import speech_recognition as sr

In [52]:
def STT_microphone():
    """Record one phrase from the microphone and transcribe it with Whisper."""
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Calibrating...")
        r.adjust_for_ambient_noise(source, duration=0.5)
        # optional parameters to adjust microphone sensitivity
        # r.energy_threshold = 200
        # r.pause_threshold=0.5

        print("listening now...")
        audio = r.listen(source, timeout=5, phrase_time_limit=30)
        print("Recognizing...")

        text = r.recognize_whisper(
            audio,
            model="base.en",
            show_dict=True,
        )["text"]
        print(text)

    return text



In [54]:
text = STT_microphone() 

Calibrating...
listening now...
Recognizing...
 Thank you.
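The transcript comes back with leading whitespace, and it is the natural input to the question-answering step. The sketch below shows one way the pieces of this notebook could be wired into a single conversational turn. It is an assumption about how the parts fit together, not code from the notebook: `run_turn` and its three callables are hypothetical, and lambdas stand in for the microphone, the QA chain, and the D-ID request.

```python
from typing import Callable, Optional


def run_turn(
    listen: Callable[[], str],
    answer_question: Callable[[str], str],
    make_video: Callable[[str], str],
) -> Optional[str]:
    """Speech -> question -> answer -> video URL. Returns None if nothing was heard."""
    question = listen().strip()  # Whisper output may carry leading whitespace
    if not question:
        return None
    answer = answer_question(question).strip()
    return make_video(answer)


# Stand-ins for STT_microphone, the QA chain, and the D-ID call.
turn_result = run_turn(
    listen=lambda: " what does the word INMS stand for?",
    answer_question=lambda q: " INMS stands for Ion Neutral Mass Spectrometer.",
    make_video=lambda text: f"video for: {text}",
)
print(turn_result)
```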


### Opening video in Streamlit app

In [13]:
import streamlit as st

st.title('🤖 VVA Bot Test')
st.video("test2.mp4")


2023-07-13 10:50:53.199 
  command:

    streamlit run /Users/rifatmahammod/.pyenv/versions/3.9.2/envs/vva_env/lib/python3.9/site-packages/ipykernel_launcher.py [ARGUMENTS]


DeltaGenerator()
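Streamlit apps are started with `streamlit run` on a script file rather than from inside a notebook kernel, which is what the message above points to. A minimal sketch, assuming the two Streamlit calls are saved to a standalone script (the file name `vva_app.py` is hypothetical):

```python
from pathlib import Path

APP_PATH = Path("vva_app.py")  # hypothetical file name for the Streamlit script

# Same two Streamlit calls as in the cell above, saved as a standalone script.
APP_SOURCE = '''\
import streamlit as st

st.title("🤖 VVA Bot Test")
st.video("test2.mp4")
'''

APP_PATH.write_text(APP_SOURCE, encoding="utf-8")
print(f"Wrote {APP_PATH}; start it from a terminal with: streamlit run {APP_PATH}")
```

Pointing `streamlit run` at this script, instead of at `ipykernel_launcher.py`, serves the page with the title and video.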

In [14]:
! streamlit run "/Users/rifatmahammod/.pyenv/versions/3.9.2/envs/vva_env/lib/python3.9/site-packages/ipykernel_launcher.py"


  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://192.168.1.27:8501

  For better performance, install the Watchdog module:

  $ xcode-select --install
  $ pip install watchdog

NOTE: When using the `ipython kernel` entry point, Ctrl-C will not work.

To exit, you will have to explicitly quit this process, by either sending
"quit" from a client, or using Ctrl-\ in UNIX-like environments.

To read more about this, see https://github.com/ipython/ipython/issues/2049


To connect another client to this kernel, use:
    --existing kernel-28164.json
[IPKernelApp] ERROR | Unable to initialize signal:
Traceback (most recent call last):
  File "/Users/rifatmahammod/.pyenv/versions/3.9.2/envs/vva_env/lib/python3.9/site-packages/ipykernel/kernelapp.py", line 690, in initialize
    self.init_signal()
  File "/Users/rifatmahammod/.pyenv/versions/3.9