## Generation: Generating a Response

In [1]:
%load_ext dotenv
%dotenv

In [11]:
from langchain_chroma import Chroma
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.runnables import RunnableParallel
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

In [3]:
vectorstore = Chroma(persist_directory="./intro-to-ds-lectures",
                     embedding_function=OpenAIEmbeddings(model="text-embedding-ada-002"))

In [4]:
retriever = vectorstore.as_retriever(search_type="mmr",
                                     search_kwargs={'k': 3,
                                                    'lambda_mult': 0.7})
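
For intuition about what `search_type="mmr"` and `lambda_mult` control, here is a minimal plain-Python sketch of maximal marginal relevance selection. It is not Chroma's or LangChain's implementation (which, among other things, first fetches a larger candidate pool); the helper names `cosine` and `mmr_select` and the toy vectors are assumptions made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.7):
    """Greedily pick k document indices, trading relevance against redundancy.

    lambda_mult=1.0 ranks by relevance only; lower values penalize
    documents that resemble ones already selected.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Highest similarity to anything already chosen (0 if nothing chosen yet).
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data: documents 0 and 1 are near-duplicates; document 2 is less relevant but different.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]

print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # [0, 1]
print(mmr_select(query, docs, k=2, lambda_mult=0.3))  # [0, 2]
```

With `lambda_mult` at 0.7, as in the retriever above, the balance leans toward relevance while still discounting passages that closely repeat ones already picked.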

In [5]:
TEMPLATE = '''
Answer the following question:
{question}

To answer the question, use only the following context:
{context}

At the end of the response, specify the name of each lecture this context is taken from in the format:
Resources: *Lecture Title*
where *Lecture Title* should be substituted with the titles of all source lectures.
'''

prompt_template = PromptTemplate.from_template(TEMPLATE)

In [12]:
chat = ChatOpenAI(model='gpt-4',
                  seed=365,
                  max_tokens=250)

In [13]:
question = "What software do Data Scientists use?"

In [16]:
chain = (
    {'context': retriever, 'question': RunnablePassthrough()}
    | prompt_template
    | chat
    | StrOutputParser()
)
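
To make the data flow through the chain easier to follow, here is a rough plain-Python analogue that runs offline. Every function in it (`fake_retriever`, `passthrough`, `fill_prompt`, `fake_chat`, `parse_output`, `run_chain`) is a hypothetical stand-in rather than a LangChain API, and the template is shortened for the sketch.

```python
# Shortened stand-in for the notebook's TEMPLATE.
TEMPLATE = (
    "Answer the following question:\n{question}\n\n"
    "To answer the question, use only the following context:\n{context}\n"
)

def fake_retriever(query):
    """Hypothetical retriever: returns canned passages instead of querying Chroma."""
    return ["R and Python are popular tools.",
            "Tableau is used for business intelligence visualizations."]

def passthrough(value):
    """Plays the role of RunnablePassthrough: forwards its input unchanged."""
    return value

def fill_prompt(inputs):
    """Plays the role of the prompt template: substitutes the placeholders."""
    return TEMPLATE.format(**inputs)

def fake_chat(prompt_text):
    """Hypothetical model: returns a message-like dict instead of calling an LLM."""
    return {"content": f"(reply to a {len(prompt_text)}-character prompt)"}

def parse_output(message):
    """Plays the role of StrOutputParser: extracts the text of the message."""
    return message["content"]

def run_chain(question):
    # The dict step feeds the same input to both branches.
    inputs = {"context": fake_retriever(question),
              "question": passthrough(question)}
    # Each later step consumes the previous step's output, like the | operator.
    return parse_output(fake_chat(fill_prompt(inputs)))

print(run_chain("What software do Data Scientists use?"))
```

The point to take away is that the question string reaches both the retriever and the `question` placeholder, and only the final step turns the model's message into a plain string.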

In [17]:
chain.invoke(question)

'Data Scientists often use several sets of software and programming languages. R and Python are popular tools due to their ability to manipulate data, integrate within multiple data and data science software platforms, and perform mathematical and statistical computations. These languages are adaptable, solving a wide range of business and data-related problems. Hadoop, a software framework, is employed to address the complexity and computational intensity of big data by distributing it across multiple computers. For business intelligence visualizations, software like Power BI, SaS, Qlik, and especially Tableau are prominent examples used in this field.\n\nResources: Programming Languages & Software Employed in Data Science - All the Tools You Need'

In [18]:
# In IPython/Jupyter, `_` holds the output of the previously executed cell.
print(_)

Data Scientists often use several sets of software and programming languages. R and Python are popular tools due to their ability to manipulate data, integrate within multiple data and data science software platforms, and perform mathematical and statistical computations. These languages are adaptable, solving a wide range of business and data-related problems. Hadoop, a software framework, is employed to address the complexity and computational intensity of big data by distributing it across multiple computers. For business intelligence visualizations, software like Power BI, SaS, Qlik, and especially Tableau are prominent examples used in this field.

Resources: Programming Languages & Software Employed in Data Science - All the Tools You Need
