One of the coolest things you can do with LangChain is tagging and extraction.

![](./assets-resources/tagging.png)

Let's flip the usual order and work through an example before diving into any details.

For this example, imagine you have a GitHub repository that you want to try out.

Usually you would go to that repo and read through the installation section to see which commands you have to run in the terminal to set it up locally. How about using OpenAI functions to extract the exact list of commands needed for that setup?

This way, we could later pass that list to some model that runs the setup for us!

Let's try that out.

In [14]:
from pydantic import BaseModel, Field
from langchain.prompts import ChatPromptTemplate
from langchain.utils.openai_functions import convert_pydantic_to_openai_function
from typing import List



class Tagging(BaseModel):
    """Call this function to extract the terminal commands to set up a given github repository"""
    commands: List[str] = Field(description="Terminal commands needed to set up the repository, in order")


tagging_function = convert_pydantic_to_openai_function(Tagging)

functions = [tagging_function]
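
To make the conversion less magical, here is a rough sketch of the kind of dictionary that ends up in `functions`. It is written by hand for illustration, assuming the standard OpenAI function definition layout (`name`, `description`, `parameters` as JSON Schema). The name `tagging_function_sketch` and the field description text are hypothetical, and the real converter output may contain extra keys such as `title`.

```python
# Hand-written approximation of an OpenAI function definition for `Tagging`.
# This is NOT the exact converter output; it only illustrates the general shape.
tagging_function_sketch = {
    # The class name becomes the function name.
    "name": "Tagging",
    # The class docstring becomes the function description.
    "description": (
        "Call this function to extract the terminal commands "
        "to set up a given github repository"
    ),
    # The fields become a JSON Schema object.
    "parameters": {
        "type": "object",
        "properties": {
            "commands": {
                "type": "array",  # List[str] maps to an array of strings
                "items": {"type": "string"},
                # Illustrative text; it would come from Field(description=...).
                "description": "Terminal commands needed to set up the repository",
            }
        },
        "required": ["commands"],
    },
}

print(tagging_function_sketch["name"])
print(list(tagging_function_sketch["parameters"]["properties"]))
```

The takeaway is that the docstring and the `Field` descriptions are what the model actually reads, so they are worth writing carefully.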

In [53]:
from langchain.chat_models import ChatOpenAI

llm_chat = ChatOpenAI(temperature=0)


# function_call={"name": "Tagging"} forces the model to always call our function
llm_chat_with_tagging_function = llm_chat.bind(functions=functions, function_call={"name": "Tagging"})

prompt = ChatPromptTemplate.from_template("{input_document}")

Now we need a way to read the contents of the GitHub repository page.
We can use LangChain's `WebBaseLoader` to load the text from a web page such as a GitHub repo page.

In [54]:
from langchain.document_loaders import WebBaseLoader

url = "https://github.com/ml-explore/mlx#installation"

docs = WebBaseLoader(url).load()

# getting the contents of the extracted page document
document = docs[0].page_content

Now, let's inspect the `document` text we pulled out of `docs`:

In [11]:
# from IPython.display import Markdown
# Markdown(document)  # `document` is already the page text (a string)

Now, let's extract the commands by feeding that document into a chain that uses the tagging function we defined earlier:

In [55]:
tagging_chain = prompt | llm_chat_with_tagging_function
output_commands = tagging_chain.invoke({"input_document": document})
output_commands

AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n  "commands": [\n    "git clone https://github.com/ml-explore/mlx.git",\n    "cd mlx"\n  ]\n}', 'name': 'Tagging'}})

And there we have it! The commands, properly organized in a list that we can easily parse out of the `function_call` arguments.
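
One detail worth spelling out: the arguments come back as a JSON string, not as a Python list. Here is a minimal sketch of decoding them with the standard `json` module. The `function_call` dictionary is copied from the output above, and the helper name `parse_commands` is just a hypothetical name for this example.

```python
import json
from typing import List

# Copied from the AIMessage output above: "arguments" is a JSON string.
function_call = {
    "name": "Tagging",
    "arguments": '{\n  "commands": [\n    "git clone https://github.com/ml-explore/mlx.git",\n    "cd mlx"\n  ]\n}',
}


def parse_commands(function_call: dict) -> List[str]:
    """Decode the JSON arguments and return the list of commands."""
    arguments = json.loads(function_call["arguments"])
    return arguments["commands"]


commands = parse_commands(function_call)
for command in commands:
    print(command)
```

In the notebook itself, the same dictionary should be available as `output_commands.additional_kwargs["function_call"]`, so you could pass that to the helper instead of the hard-coded copy.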

Now, let's look at extraction, which is similar to tagging but suited to more complex scenarios where we want to pull out multiple pieces of information.