# Example: Extract contact information from a business webpage

Let's extract contact details such as emails, phone numbers, and addresses from business websites using Unstructured.

https://python.langchain.com/docs/integrations/document_loaders/unstructured_file/#overview

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables (for example OPENAI_API_KEY) from a local .env file
load_dotenv()

In [None]:
from langchain_unstructured import UnstructuredLoader

# Fetch the contact page and split it into elements; each element becomes one Document
loader = UnstructuredLoader(web_url="https://csc.iitd.ac.in/contact")
docs = loader.load()

In [None]:
full_doc = "\n\n".join(doc.page_content for doc in docs)
print(full_doc)

## No need for chunking or splitting

Since this is a very small page, we can pass its entire content as context. There is no need for splitting and retrieval in this case.
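
To sanity-check that a page really is small enough, one option is a rough token estimate before sending it to the model. The sketch below is only a heuristic: the helper name `fits_in_context`, the ratio of about 4 characters per token, and the `max_tokens` budget are all assumptions made for illustration, not values taken from any model's documentation. The sample text is made up.

```python
def fits_in_context(text: str, max_tokens: int = 100_000,
                    chars_per_token: float = 4.0) -> bool:
    """Roughly estimate whether `text` fits within a token budget.

    Both defaults are illustrative assumptions; check the limits of the
    model you actually use.
    """
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= max_tokens


# Made-up sample standing in for `full_doc`
sample_page = "Help Desk\nEmail: help@example.org\nPhone: +91-11-0000-0000"
print(len(sample_page), "characters; fits:", fits_in_context(sample_page))
```

In the notebook, `fits_in_context(full_doc)` could be called right after joining the documents; if it returned `False`, splitting and retrieval would be worth reconsidering.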

In [None]:
question = "Extract all email addresses and phone numbers from this contact page."

In [None]:
prompt_template = """You are a contact information extractor. Use the following webpage content to find contact details. Extract emails, phone numbers, and addresses if available. Present the information in a clear, organized format.
Question: {question} 
Webpage Content: {context} 
Contact Information:"""

from langchain.chat_models import init_chat_model
llm = init_chat_model("gpt-4o-mini", model_provider="openai")

response = llm.invoke(prompt_template.format(
    context=full_doc,
    question=question))
    
print(response.content)

In [None]:
from IPython.display import Markdown
Markdown(response.content)
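
Because an LLM can occasionally return an address that is not on the page, it may help to cross-check its answer against the page text. The following is a minimal sketch using simplified regular expressions; `EMAIL_RE`, `PHONE_RE`, and the helper names are hypothetical, and the patterns are deliberately loose rather than a complete grammar for emails or phone numbers.

```python
import re

# Simplified patterns (assumptions, not full validators)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def find_emails(text: str) -> list[str]:
    """Return the unique email-like strings in `text`, sorted."""
    return sorted(set(EMAIL_RE.findall(text)))


def find_phones(text: str) -> list[str]:
    """Return phone-like strings in `text`, in order of appearance."""
    return PHONE_RE.findall(text)


def unsupported_emails(llm_answer: str, page_text: str) -> list[str]:
    """Emails in the LLM answer that do not appear in the page text."""
    on_page = {email.lower() for email in EMAIL_RE.findall(page_text)}
    return [e for e in find_emails(llm_answer) if e.lower() not in on_page]
```

In the notebook this could be called as `unsupported_emails(response.content, full_doc)`; a non-empty result would flag addresses worth checking by hand.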

In [None]:
prompt_template = """You are a business directory formatter. Use the following webpage content to extract and organize contact information. The content contains contact details for various departments or offices.
Webpage Content: {context} 
Task: Extract department names, email addresses, and phone numbers. Format the output as:

**Department/Office Name**
- Email: email@domain.com
- Phone: +xx-xxx-xxx-xxxx

"""

from langchain.chat_models import init_chat_model
llm = init_chat_model("gpt-4o-mini", model_provider="openai")

# This template has no {question} placeholder, so only the context is filled in
response = llm.invoke(prompt_template.format(context=full_doc))
    
from IPython.display import Markdown
Markdown(response.content)
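
If the model follows the requested layout, its Markdown answer can be turned into plain Python records for further processing. The parser below is a sketch under that assumption: `parse_directory` is a hypothetical helper, it only recognises the exact `**Name**`, `- Email:`, and `- Phone:` lines from the prompt above, and it silently skips anything else the model might add.

```python
import re


def parse_directory(markdown_text: str) -> list[dict]:
    """Parse '**Name**' headings followed by '- Email:' / '- Phone:' lines."""
    records = []
    current = None
    for raw_line in markdown_text.splitlines():
        line = raw_line.strip()
        heading = re.fullmatch(r"\*\*(.+?)\*\*", line)
        if heading:
            # A bold line starts a new department/office record
            current = {"name": heading.group(1).strip(),
                       "email": None, "phone": None}
            records.append(current)
            continue
        field = re.fullmatch(r"-\s*(Email|Phone):\s*(.+)", line)
        if field and current is not None:
            current[field.group(1).lower()] = field.group(2).strip()
    return records


# Made-up sample in the format requested by the prompt
sample_output = """**Help Desk**
- Email: help@example.org
- Phone: +91-11-0000-0000

**Accounts**
- Email: accounts@example.org
"""
print(parse_directory(sample_output))
```

In the notebook, `parse_directory(response.content)` would be the natural call; fields the model omits stay `None`.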

# Exercise: Try extracting contact information from other business websites

Try university contact pages, company about/contact pages, or government office directories, and extract emails, phone numbers, and addresses from each.
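
For the exercise, it can be convenient to wrap the steps from this notebook in one reusable function. The sketch below is one possible shape, not the notebook's own code: `extract_contacts`, `load_text`, and `ask_llm` are hypothetical names, and the loader and model are passed in as plain callables so the function can be tried without network access.

```python
from typing import Callable

PROMPT_TEMPLATE = """You are a contact information extractor. Use the following webpage content to find contact details. Extract emails, phone numbers, and addresses if available. Present the information in a clear, organized format.
Question: {question}
Webpage Content: {context}
Contact Information:"""

DEFAULT_QUESTION = "Extract all email addresses and phone numbers from this contact page."


def extract_contacts(url: str,
                     load_text: Callable[[str], str],
                     ask_llm: Callable[[str], str],
                     question: str = DEFAULT_QUESTION) -> str:
    """Load one page, build the prompt, and return the model's answer."""
    context = load_text(url)
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    return ask_llm(prompt)


# Possible wiring with the objects used earlier in this notebook (assumed, untested here):
# load_text = lambda u: "\n\n".join(
#     d.page_content for d in UnstructuredLoader(web_url=u).load())
# ask_llm = lambda p: llm.invoke(p).content
```

Calling it in a loop over a list of URLs would then cover several sites with the same prompt.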