# Install dependencies

In [1]:
%conda install langchain langchain-community langchain-chroma -c conda-forge
%pip install -qU langchain-openai
%pip install beautifulsoup4

Retrieving notices: ...working... done
Channels:
 - conda-forge
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-community 0.2.16 requires langchain-core<0.3.0,>=0.2.38, but you have langchain-core 0.3.12 which is incompatible.
langchain 0.2.16 requires langchain-core<0.3.0,>=0.2.38, but you have langchain-core 0.3.12 which is incompatible.
langchain-text-splitters 0.2.4 requires langchain-core<0.3.0,>=0.2.38, but you have langchain-core 0.3.12 which is incompatible.
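The resolver output above reports `langchain`, `langchain-community` and `langchain-text-splitters` on the 0.2 line next to `langchain-core` 0.3.12. Before going further, it may help to confirm what is actually installed in the running kernel. The following is a minimal sketch using only the standard library; the `PACKAGES` tuple and the `installed_versions` helper are illustrative names, not part of LangChain.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the install cell and the resolver output above.
PACKAGES = (
    "langchain",
    "langchain-core",
    "langchain-community",
    "langchain-text-splitters",
    "langchain-openai",
    "langchain-chroma",
)


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

If the versions still straddle the 0.2 and 0.3 lines, bringing all of the `langchain*` packages onto one release line is probably the simplest way to clear the conflict.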


# Before you start: get your API keys
- Create a LangSmith account:
  - (https://smith.langchain.com/)
- Create an API key on OpenAI's website:
  - (https://openai.com/index/openai-api/)
- Follow this tutorial to set up the environment and test a hello-world application:
  - (https://python.langchain.com/docs/tutorials/rag/)

## Setting up API keys:
  - Create a file called `set_api_keys.py`
  - Paste in the following template
---
```
import os

# Set environment variables
os.environ['OPENAI_API_KEY'] = '<INSERT OPEN AI API KEY>'
os.environ['LANGCHAIN_ENDPOINT'] = "<INSERT LANGCHAIN ENDPOINT>"
os.environ['LANGCHAIN_API_KEY'] = "<INSERT LANGCHAIN API KEY>"
os.environ['LANGCHAIN_PROJECT'] = "<INSERT LANGCHAIN PROJECT>"
```
---
- Replace each `<INSERT ...>` placeholder with the corresponding value from your OpenAI and LangSmith accounts
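Because the template above ships with `<INSERT ...>` placeholders, it is easy to run the notebook with a value left unfilled. The sketch below is one way to catch that early. It assumes the four variable names from the template; `REQUIRED_KEYS` and `find_unset_keys` are hypothetical helper names introduced here for illustration.

```python
import os

# Variable names copied from the set_api_keys.py template.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
)


def find_unset_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return names that are missing, empty, or still an '<INSERT ...>' placeholder."""
    unset = []
    for name in required:
        value = env.get(name, "").strip()
        if not value or value.startswith("<INSERT"):
            unset.append(name)
    return unset


missing = find_unset_keys()
if missing:
    print("Still unset:", ", ".join(missing))
else:
    print("All required keys are set.")
```

Running this right after `%run set_api_keys.py` should report any variable that still needs a real value.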


#### The following cell runs `set_api_keys.py`, which sets your API keys as environment variables so the libraries below can read them when calling the model

In [2]:
%run set_api_keys.py
import os

# Enable LangSmith tracing for the runs in this notebook
os.environ["LANGCHAIN_TRACING_V2"] = "true"


In [3]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

In [5]:
import bs4
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load, chunk and index the contents of the blog.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is a Scientific Discovery Agent?")

'A Scientific Discovery Agent, like ChemCrow, is a domain-specific tool that combines a large language model (LLM) with expert-designed tools to facilitate tasks in areas such as organic synthesis, drug discovery, and materials design. It utilizes a structured workflow that incorporates reasoning and actions based on user prompts, following a format known as ReAct. This agent is designed to assist in scientific inquiries while considering the associated risks of misuse, particularly concerning sensitive applications.'
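The `chunk_size=1000, chunk_overlap=200` settings in the cell above mean that consecutive chunks share some text, so context near a chunk boundary is less likely to be lost at retrieval time. The sketch below illustrates only the overlap idea, using a hypothetical `sliding_chunks` helper that slices at fixed character offsets. It is not how `RecursiveCharacterTextSplitter` works internally: that class prefers to split on separators such as paragraph and line breaks, so its real chunk boundaries will differ.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Small numbers make the overlap visible: each chunk repeats the last
# two characters of the previous one.
print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

With the notebook's values, each window would advance 800 characters and repeat the final 200 characters of its predecessor.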

In [5]:
# Clean up: delete the Chroma collection created above
vectorstore.delete_collection()