# What is LangChain?

LangChain is a Python framework for building applications on top of LLM APIs and related tools. It aims to be easy to use and flexible, and it is organized around two core abstractions: chains and agents. Chains are "invoked", while agents are "executed".

The framework comes with some bells and whistles: the LangChain Expression Language (LCEL), used to describe chains and their inputs, and LangSmith, a paid service for request tracing.
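
To make the idea of describing a chain with a pipe-style expression more concrete, here is a minimal toy sketch in plain Python. It is not LangChain's implementation: the `Step` class and the three steps (`build_prompt`, `fake_model`, `parse_output`) are hypothetical names invented for illustration, and the "model" simply upper-cases its input.

```python
# Toy illustration of pipe-style composition; this is NOT LangChain's implementation.
class Step:
    """Wraps a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose: run this step first, then feed its output to the next step
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical steps standing in for a prompt template, a model, and an output parser
build_prompt = Step(lambda question: f"Answer briefly: {question}")
fake_model = Step(lambda prompt: {"content": prompt.upper()})
parse_output = Step(lambda message: message["content"])

chain = build_prompt | fake_model | parse_output
print(chain.invoke("what is css?"))  # prints "ANSWER BRIEFLY: WHAT IS CSS?"
```

The point of the sketch is only that each step consumes the previous step's output, and that the whole pipeline is run with a single `invoke` call.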

This notebook is based on the [LangChain Quickstart](https://python.langchain.com/docs/use_cases/question_answering/quickstart/).

LangChain is not shipped as a single Python package, but as a collection of packages:

1. Core: Parsers, Runnables
2. Community: Vector Stores, Document Loaders
3. Separate packages: Text Splitters, LLM APIs, and more

The framework trades simplicity for power. It has happy paths for beginners, but it is primarily built around advanced features that enable experimentation. It is not a "plug and play" framework, but a "plug and experiment" one.

In [13]:
import getpass
import os

import bs4
from langchain import hub

## Langchain imports
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Qdrant
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import DirectoryLoader
from tqdm import tqdm

In [2]:
os.environ["OPENAI_API_KEY"] = getpass.getpass()

In [3]:
llm = ChatOpenAI(model="gpt-4-turbo")

## Getting the data

We use the CSS documentation from [mdn-docs](https://github.com/mdn/content/tree/main/files/en-us/web/css) for our use case.

We use `DirectoryLoader` to read the Markdown files from the directory.



In [4]:
# Load the data
loader = DirectoryLoader(
    "../data/css_docs",
    glob="**/*.md"
)
docs = loader.load()


[nltk_data] Downloading package punkt to /Users/abcom/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/abcom/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [7]:
len(docs)

1057

In [5]:
docs[0].metadata

{'source': '../data/css_docs/index.md'}

In [31]:
# Rule of thumb for English text: 1 token is roughly 4 characters
chars = len(docs[0].page_content)
approx_tokens = chars / 4
print(f"Approximate number of tokens: {approx_tokens}")

Approximate number of tokens: 1367.25


In [30]:
# Estimate the total number of tokens across all docs (roughly 4 characters per token)
total_approx_tokens = sum(len(doc.page_content) / 4 for doc in docs)
print(f"Total number of tokens: {total_approx_tokens}")


Total number of tokens: 1030502.0


> 💡 Reader Exercise: Find the exact number of tokens for GPT4 model instead of approximate tokens. Post your code snippet on Discourse!

### Chunking
Chunking divides a text into smaller, manageable pieces. Chunk overlap is the portion of text that is repeated across consecutive chunks to maintain context continuity. With `RecursiveCharacterTextSplitter`, both `chunk_size` and `chunk_overlap` are measured in characters by default.
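
As a rough illustration of what size and overlap mean, here is a minimal fixed-window sketch in plain Python. It is an assumption-laden simplification: the helper `chunk_text` is a hypothetical name, and unlike `RecursiveCharacterTextSplitter` it cuts at fixed character offsets rather than preferring separators such as paragraph or line breaks.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # prints ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, so text that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.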


In [10]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
splits = text_splitter.split_documents(docs)

In [11]:
splits[:5]

[Document(page_content='title: "CSS: Cascading Style Sheets"\nslug: Web/CSS\npage-type: landing-page\n\n{{CSSRef}}\n\nCascading Style Sheets (CSS) is a stylesheet language used to describe the presentation of a document written in HTML or XML (including XML dialects such as SVG, MathML or {{Glossary("XHTML")}}). CSS describes how elements should be rendered on screen, on paper, in speech, or on other media.\n\nCSS is among the core languages of the open web and is standardized across Web browsers according to W3C specifications. Previously, the development of various parts of CSS specification was done synchronously, which allowed the versioning of the latest recommendations. You might have heard about CSS1, CSS2.1, or even CSS3. There will never be a CSS3 or a CSS4; rather, everything is now CSS without a version number.', metadata={'source': '../data/css_docs/index.md'}),
 Document(page_content="After CSS 2.1, the scope of the specification increased significantly and the progress on

In [14]:
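# Note: tqdm only tracks iteration over the list of splits, not the embedding calls,
# which is why the progress bar finishes almost instantly.
vectorstore = Qdrant.from_documents(
    documents=tqdm(splits, desc="Processing documents"),
    embedding=OpenAIEmbeddings(),
    collection_name="css_docs",
    location=":memory:",
)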

Processing documents: 100%|██████████| 5220/5220 [00:00<00:00, 183898.90it/s]
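
With the chunks embedded and stored, a similarity search conceptually compares the query's embedding with each stored vector and returns the closest matches. The sketch below shows that idea with cosine similarity, which is one common choice of distance. Everything in it is an assumption for illustration: the helper names `cosine_similarity` and `top_k` are hypothetical, the 3-dimensional vectors are made up, and a real vector store uses an index rather than a brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed, k):
    """Return the k texts whose vectors are most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up 3-dimensional "embeddings"; real embeddings have many more dimensions
toy_index = [
    (":autofill pseudo-class", [0.9, 0.1, 0.0]),
    ("border-radius property", [0.6, 0.4, 0.1]),
    ("grid-template-columns property", [0.0, 0.2, 0.9]),
]
toy_query = [1.0, 0.2, 0.0]
print(top_k(toy_query, toy_index, k=2))
# prints [':autofill pseudo-class', 'border-radius property']
```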


In [29]:
# Let's check how the retrieval process works: we perform a semantic search and see which documents are retrieved
results = vectorstore.similarity_search("change the border of a text field that has been autocompleted by the browser", k=3)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* Syntax

css
:autofill {
  /* ... */
}

Examples

The following example demonstrates the use of the :autofill pseudo-class to change the border of a text field that has been autocompleted by the browser. For the best browser compatibility use both :-webkit-autofill and :autofill.

```css
input {
  border: 3px solid grey;
  border-radius: 3px;
}

input:-webkit-autofill {
  border: 3px solid blue;
}
input:autofill {
  border: 3px solid blue;
}
```

```html

```

{{EmbedLiveSample('Examples')}}

Specifications

{{Specifications}}

Browser compatibility

{{Compat}}

See also

Chromium issue 46543: Auto-filled input text box yellow background highlight cannot be turned off

WebKit bug 66032: Allow site authors to override autofilled fields' colors.

Mozilla bug 740979: implement :-moz-autofill pseudo-class on input elements with an autofilled value

User Interface Module Level 4: more selectors [{'source': '../data/css_docs/_colon_autofill/index.md', '_id': '2028eaf94edc46169ba06a4a16bae62

In [15]:
retriever = vectorstore.as_retriever()

In [16]:
prompt = hub.pull("rlm/rag-prompt")

In [17]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [18]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [22]:
result = rag_chain.invoke("change the border of a text field that has been autocompleted by the browser")
print(result)

To change the border of a text field that has been autocompleted by the browser, you can use the CSS pseudo-classes `:-webkit-autofill` and `:autofill`. Here is an example of how to set the border color to blue when the input field has been autofilled:

```css
input:-webkit-autofill,
input:autofill {
  border: 3px solid blue;
}
```
