In [94]:
pip install -qU beautifulsoup4


Note: you may need to restart the kernel to use updated packages.


In [4]:
from langchain_community.document_loaders import RecursiveUrlLoader

In [96]:
# Basic loading example of RecursiveUrlLoader; optional parameters are shown commented out
loader = RecursiveUrlLoader(
    "https://docs.python.org/3.9/",
    # max_depth=2,
    # use_async=False,
    # extractor=None,
    # metadata_extractor=None,
    # exclude_dirs=(),
    # timeout=10,
    # check_response_status=True,
    # continue_on_failure=True,
    # prevent_outside=True,
    # base_url=None,
    # ...
)
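To make the `max_depth` and `prevent_outside` options above concrete, here is a minimal sketch of a depth-limited recursive crawl. It runs over a hypothetical in-memory link graph (`FAKE_SITE`, with made-up `docs.example.org` URLs) instead of real HTTP requests, and `crawl` is a hypothetical helper, not LangChain's code. I'm assuming the loader counts the start page as depth 0 and stops descending once depth reaches `max_depth`, so `max_depth=2` means "start page plus the pages it links to directly"; check the LangChain source before relying on that.

```python
from __future__ import annotations

from collections.abc import Iterator
from urllib.parse import urljoin

# Hypothetical site: page URL -> links found on that page (relative or absolute)
FAKE_SITE = {
    "https://docs.example.org/3.9/": ["tutorial/", "library/", "https://other.example.com/"],
    "https://docs.example.org/3.9/tutorial/": ["index.html", "../library/"],
    "https://docs.example.org/3.9/tutorial/index.html": ["controlflow.html"],
    "https://docs.example.org/3.9/library/": [],
}


def crawl(
    url: str,
    links_of: dict[str, list[str]],
    max_depth: int = 2,
    prevent_outside: bool = True,
    base_url: str | None = None,
    depth: int = 0,
    visited: set[str] | None = None,
) -> Iterator[str]:
    """Yield page URLs depth-first, never going deeper than max_depth - 1."""
    visited = set() if visited is None else visited
    base_url = base_url or url  # default: stay under the start URL
    if depth >= max_depth or url in visited:
        return
    visited.add(url)
    yield url  # a real loader would fetch the page and yield a Document here
    for href in links_of.get(url, []):
        child = urljoin(url, href)  # resolve relative links like "../library/"
        if prevent_outside and not child.startswith(base_url):
            continue  # skip links that leave the base URL
        yield from crawl(child, links_of, max_depth, prevent_outside, base_url, depth + 1, visited)
```

For example, `list(crawl("https://docs.example.org/3.9/", FAKE_SITE))` visits the start page, `tutorial/` and `library/`, but not `tutorial/index.html` (depth 2) or the external link.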

In [97]:
docs = loader.load()

In [98]:
print(docs[1].page_content[:300])


<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />

    <title>What’s New in Python &#8212; Python 3.9.24 documentation</title><meta name="viewport" content="width=de


In [1]:
!pip install lxml


In [2]:
# Now add an extractor so the loaded text is more LLM-friendly
import re

from bs4 import BeautifulSoup

def bs4_extractor(html: str) -> str:
    soup = BeautifulSoup(html, "lxml")
    return re.sub(r"\n\n+", "\n\n", soup.text).strip()
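For comparison, here is a rough stdlib-only approximation of what `bs4_extractor` does, using `html.parser.HTMLParser` instead of BeautifulSoup/lxml. It is a sketch, not what the loader uses: `stdlib_extractor` and `_TextCollector` are hypothetical names, and skipping `<script>`/`<style>` contents is my own choice, which may not match `soup.text` exactly. The final `re.sub(r"\n\n+", "\n\n", ...)` step is the same one used above: it collapses any run of two or more newlines into a single blank line.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def stdlib_extractor(html: str) -> str:
    parser = _TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = "".join(parser.parts)
    # Same clean-up as bs4_extractor: collapse runs of blank lines, trim the ends
    return re.sub(r"\n\n+", "\n\n", text).strip()
```

Usage: `stdlib_extractor("<p>Hello</p>\n\n\n\n<p>World</p>")` returns `"Hello\n\nWorld"`.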

In [5]:
loader = RecursiveUrlLoader("https://docs.python.org/3.9/", extractor=bs4_extractor)
docs = loader.load()
print(docs[0].page_content[:200])


Assuming this really is an XML document, what you're doing might work, but you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the Python package 'lxml' installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.
  soup = BeautifulSoup(html, "lxml")

Assuming this really is an XML document, what you're doing might work, but you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the Python package 'lxml' installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.
  soup = BeautifulSoup(raw_html, "html.parser")


3.9.24 Documentation

Download
Download these documents
Docs by version

Stable
In development
All versions

Other resources

PEP Index
Beginner's Guide
Book List
Audio/Visual Talks
Python Developer’s


In [31]:
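from langchain_core.documents import Document

# Apparently Pinecone does not accept metadata values like None or "",
# so drop those and keep only non-empty scalar values (str, int, float, bool)
def fix_metadata(data: dict) -> dict:
    fixed = {}
    for k, v in data.items():
        if v is None or v == "":
            continue
        if isinstance(v, (str, int, float, bool)):
            fixed[k] = v
    # Return only after every key has been checked
    return fixed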

cleaned_docs = [
    Document(
        page_content=d.page_content,
        metadata=fix_metadata(d.metadata)
    )
    for d in docs
]

#chunk the docs 
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

chunks = splitter.split_documents(cleaned_docs)

ids = [f"id_{i}" for i in range(len(chunks))]
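The splitter settings above (`chunk_size=1000`, `chunk_overlap=200`) mean each chunk is roughly at most 1000 characters and neighbouring chunks share some text, so a sentence cut at a boundary still shows up whole in one of them. The sketch below is a deliberately simplified fixed-window splitter that shows only the size/overlap arithmetic; it is not `RecursiveCharacterTextSplitter`'s algorithm, which (as I understand it) first splits on separators such as blank lines, newlines and spaces and then merges the pieces up to `chunk_size`. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With a 2500-character text this yields windows starting at 0, 800 and 1600, so the last 200 characters of each chunk are the first 200 of the next.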

In [9]:
pip install langchain_core langchain_pinecone langchain_openai python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [32]:
# Set up the embedding model
import os

from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

index_name = "warmer-mitten"

# load_dotenv() copies PINECONE_API_KEY and OPENAI_API_KEY from .env into os.environ,
# where langchain_pinecone and langchain_openai read them automatically
load_dotenv()
for key in ("PINECONE_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"{key} is missing; add it to your .env file")

embeddings = OpenAIEmbeddings()

In [35]:
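# Create the vector store
vectorstore = PineconeVectorStore(index_name=index_name, embedding=embeddings)

# from_documents is a classmethod: it embeds the chunks, upserts them into the
# index, and returns a new store object for that index (shown in the output below).
# `vectorstore` points at the same index, so it can query the new vectors too.
vectorstore.from_documents(
    documents=chunks,
    index_name=index_name,
    embedding=embeddings
)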


<langchain_pinecone.vectorstores.PineconeVectorStore at 0xffff50628f40>

In [36]:
#create retriever
retriever = vectorstore.as_retriever()
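Under the hood, the retriever embeds the query and asks Pinecone for the stored vectors most similar to it. Pinecone does this server-side using the metric the index was created with, which the notebook doesn't show. The toy sketch below ranks a hypothetical in-memory index (`index`, with made-up 2-D vectors) by cosine similarity to show the idea; `cosine_similarity` and `top_k` are hypothetical helpers. If I remember correctly, `as_retriever()` returns 4 documents by default, and you can change that with `vectorstore.as_retriever(search_kwargs={"k": 8})`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 4) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to query_vec, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, with `index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}`, the query `[1.0, 0.1]` ranks `a` first and `c` second.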



In [40]:
# Define the query to answer
query = "how do i do type casting in python?"


In [41]:
result = retriever.invoke(query)
print(result)

[Document(id='74084a7c-c983-4839-aeee-e11fe3af0202', metadata={'source': 'https://docs.python.org/3.9/tutorial/index.html'}, page_content='The Python Tutorial¶\nPython is an easy to learn, powerful programming language. It has efficient\nhigh-level data structures and a simple but effective approach to\nobject-oriented programming. Python’s elegant syntax and dynamic typing,\ntogether with its interpreted nature, make it an ideal language for scripting\nand rapid application development in many areas on most platforms.\nThe Python interpreter and the extensive standard library are freely available\nin source or binary form for all major platforms from the Python Web site,\nhttps://www.python.org/, and may be freely distributed. The same site also\ncontains distributions of and pointers to many free third party Python modules,\nprograms and tools, and additional documentation.\nThe Python interpreter is easily extended with new functions and data types\nimplemented in C or C++ (or other

In [44]:
# Generate an answer

# First, join the retrieved chunks into a single context string
context = "\n\n".join(entry.page_content for entry in result)
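The join above drops each chunk's source URL. One variation is to label every chunk with its `source` metadata so the model (and you) can tell where each piece came from. This is a minimal sketch: `Doc` is a hypothetical stand-in for LangChain's `Document` (which likewise has `page_content` and `metadata`), and `format_context` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(docs: list[Doc]) -> str:
    """Number each chunk and prefix it with its source URL, separated by blank lines."""
    blocks = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown source")
        blocks.append(f"[{i}] {source}\n{doc.page_content}")
    return "\n\n".join(blocks)
```

Because real `Document` objects have the same two attributes, `format_context(result)` should work on the retriever output as well.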

In [46]:
from openai import OpenAI

client = OpenAI()



# The prompt
prompt = f"""
You are an expert assistant that specializes in answering questions. You will always be given some context to work with.
Based on the context, please answer the query as well as possible.

context:
{context}

question:
{query}
"""
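A common variation is to put the instructions in a `system` message and the context plus question in a `user` message, which is the list-of-dicts shape `client.chat.completions.create(messages=...)` accepts. The extra "say so" instruction is my own addition. It may help here: the chunk printed above is the tutorial's introduction and doesn't discuss type casting, so the answer below likely draws on the model's own knowledge rather than the context. `build_messages` is a hypothetical helper.

```python
def build_messages(context: str, query: str) -> list[dict[str, str]]:
    """Split the RAG prompt into a system instruction and a user turn."""
    system = (
        "You are an expert assistant that specializes in answering questions. "
        "Answer the question using the provided context. "
        "If the context does not contain the answer, say so."
    )
    user = f"context:\n{context}\n\nquestion:\n{query}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

It would be used as `client.chat.completions.create(model="gpt-4o-mini", messages=build_messages(context, query))`.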


In [47]:
rag_answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": prompt}]
)

print(rag_answer.choices[0].message.content)

In Python, type casting is the process of converting a variable from one type to another. This can be done using built-in functions that correspond to the data types you want to convert to. Here are some common type casting methods:

1. **Converting to Integer**: You can convert a string or float to an integer using `int()`.
   ```python
   num_str = "123"
   num_int = int(num_str)  # Converts string to integer
   ```

2. **Converting to Float**: You can convert a string or integer to a float using `float()`.
   ```python
   num_int = 123
   num_float = float(num_int)  # Converts integer to float
   ```

3. **Converting to String**: You can convert an integer or float to a string using `str()`.
   ```python
   num_float = 123.45
   num_str = str(num_float)  # Converts float to string
   ```

4. **Converting to List**: You can convert other iterable types (like tuples or strings) to a list using `list()`.
   ```python
   tuple_data = (1, 2, 3)
   list_data = list(tuple_data)  # Converts