# 🔍 Search-Augmented Question Answering with SerpAPI + LangChain

## 1. Introduction
- Goal: build a lightweight question-answering (Q&A) system whose answers draw on text collected from search engine results
- Tech stack: Python, SerpAPI (Bing engine), newspaper3k, LangChain, HuggingFace (`transformers`, `sentence-transformers`), FAISS

## 2. Web Search and Text Collection (via SerpAPI)
- Use SerpAPI to search for a topic (e.g., “History of AI”)
- Collect the URLs of the top organic results, then download and parse each article with `newspaper3k`
- Save the article titles and text to `result.txt`
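
To make the URL-extraction step concrete, here is a minimal sketch that pulls links out of a hand-written dictionary shaped like the JSON this notebook reads (an `organic_results` list whose items may carry a `link`). The sample payload and the `extract_links` helper are illustrative assumptions, not real SerpAPI output.

```python
def extract_links(data, limit=10):
    """Return up to `limit` unique result links, preserving their order."""
    seen = set()
    links = []
    for item in data.get("organic_results", []):
        link = item.get("link")
        # Skip entries without a link and links that were already collected
        if link and link not in seen:
            seen.add(link)
            links.append(link)
    return links[:limit]

# Hypothetical payload, made up for illustration
sample = {
    "organic_results": [
        {"title": "A", "link": "https://example.com/a"},
        {"title": "No link here"},
        {"title": "A again", "link": "https://example.com/a"},
        {"title": "B", "link": "https://example.com/b"},
    ]
}

links = extract_links(sample)
print(links)
```

Removing duplicates here avoids downloading the same article twice in the collection step.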

In [None]:
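# Newer lxml releases ship lxml.html.clean as a separate package, which newspaper3k imports
!pip install newspaper3k lxml_html_clean
# langchain-community provides the langchain_community imports used below
!pip install langchain langchain-community faiss-cpu
!pip install sentence-transformers
!pip install transformers requests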

import os
import requests
from newspaper import Article

from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA
from transformers import pipeline

In [None]:
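def serpapi_bing_search(query, api_key, num_results=10):
    """Return the URLs of the top organic Bing results for `query` via SerpAPI."""
    url = "https://serpapi.com/search"
    params = {
        "engine": "bing",            # Use the Bing engine
        "q": query,                  # Search keyword
        "count": num_results,        # Number of results to request
        "api_key": api_key
    }

    try:
        # A timeout keeps the cell from hanging on a stalled connection
        res = requests.get(url, params=params, timeout=30)
        res.raise_for_status()
        data = res.json()

        # Keep only organic results that actually carry a link
        results = data.get("organic_results", [])
        urls = [item["link"] for item in results if item.get("link")]
        return urls

    except (requests.RequestException, ValueError) as e:
        # Network or HTTP failure, or a response body that is not valid JSON
        print(f"Error: {e}")
        return []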

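def detailed_search(url, txtNum=500):
    """Print the title and the first `txtNum` characters of the article at `url`."""
    article = Article(url)
    article.download()
    article.parse()
    print(article.title)
    print(article.text[:txtNum])

def save_articles_to_file(urls, output_path="result.txt"):
    """Download each article and write its title and text to `output_path`."""
    with open(output_path, "w", encoding="utf-8") as f:
        for url in urls:
            try:
                article = Article(url)
                article.download()
                article.parse()
                # Each article starts with a "=== title ===" header line
                f.write(f"=== {article.title} ===\n")
                f.write(article.text + "\n\n")
            except Exception as e:
                # Skip pages that cannot be downloaded or parsed
                print(f"Failed to process {url}: {e}")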
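
Because `save_articles_to_file` writes a `=== title ===` header before each article, the saved file can be split back into individual articles later. The sketch below is one way to do that. `parse_saved_articles` is a hypothetical helper, and it assumes that no line inside an article body itself has the `=== ... ===` shape.

```python
import re

# Matches the header line written before each article
HEADER = re.compile(r"^=== (.*) ===$")

def parse_saved_articles(text):
    """Split saved text into (title, body) pairs using the header lines."""
    articles = []
    title, body = None, []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            # A new header closes the article collected so far
            if title is not None:
                articles.append((title, "\n".join(body).strip()))
            title, body = match.group(1), []
        elif title is not None:
            body.append(line)
    if title is not None:
        articles.append((title, "\n".join(body).strip()))
    return articles

# Made-up text in the same layout as result.txt
sample = "=== First ===\nAlpha line.\n\n=== Second ===\nBeta line.\nMore beta.\n\n"
parsed = parse_saved_articles(sample)
print(parsed)
```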


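In [None]:
from getpass import getpass

keyword = input("Enter search keyword: ")
# getpass hides the key as it is typed, so it does not end up in the cell output
os.environ["SERPAPI_API_KEY"] = getpass("Enter your SerpAPI key: ")
api_key = os.environ["SERPAPI_API_KEY"]

# Run the search, then download the result pages into result.txt
urls = serpapi_bing_search(keyword, api_key)
save_articles_to_file(urls)

## 3. Text Processing and Vector Embedding
- Load `result.txt`
- Split it into overlapping chunks
- Embed the chunks with `sentence-transformers` and index them with FAISS

These steps run in the first half of the code cell under Section 4.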
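
The chunking step of Section 3 can be pictured as a sliding window over the text. The sketch below is a simplified, pure-Python stand-in that cuts fixed-size character windows. It is not how `CharacterTextSplitter` works internally (that class first splits on a separator and then merges the pieces), and `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(text, chunk_size=10, chunk_overlap=3)
print(chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap.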

## 4. Lightweight LLM + QA Pipeline
- Load a tiny LLM with the HuggingFace `pipeline` API
- Set up `RetrievalQA` in LangChain on top of the FAISS retriever
- Ask sample questions and print the answers
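
Conceptually, a retrieval QA chain fetches the chunks most relevant to the question and places them in the prompt ahead of the question. The sketch below shows that idea in plain Python. The prompt wording, the character budget, and the `build_prompt` helper are assumptions for illustration, not LangChain's actual template.

```python
def build_prompt(question, chunks, max_chars=1500):
    """Concatenate retrieved chunks into a context block, then append the question."""
    context_parts = []
    used = 0
    for chunk in chunks:
        # Stop adding context once the character budget would be exceeded
        if used + len(chunk) > max_chars:
            break
        context_parts.append(chunk)
        used += len(chunk)
    context = "\n\n".join(context_parts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt(
    "Who coined the term?",
    ["Chunk one text.", "Chunk two text."],
)
print(prompt)
```

A budget of this kind matters because language models accept only a limited amount of input, so not every retrieved chunk will necessarily fit.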

In [None]:
# Tiny GPT-2 checkpoint: quick to download and run, but too small to give reliable answers
generator = pipeline("text-generation", model="sshleifer/tiny-gpt2", max_new_tokens=256)
llm = HuggingFacePipeline(pipeline=generator)

# 1. Load the previously saved results
loader = TextLoader("result.txt", encoding="utf-8")
docs = loader.load()

# 2. Split the content into chunks
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
split_docs = text_splitter.split_documents(docs)

# 3. Text vectorization (using an English model)
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.from_documents(split_docs, embedding)

# 4. QA system setup
qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())

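# 5. Test QA
question = input("Enter your question: ")
# invoke() replaces the deprecated run(); the answer is stored under the "result" key
result = qa.invoke({"query": question})["result"]
print("Answer:", result)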
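
At query time, the retriever returned by `db.as_retriever()` embeds the question and looks up the nearest chunks. The sketch below mimics that with a deliberately crude stand-in: word-count vectors and cosine similarity in plain Python, in place of the `all-MiniLM-L6-v2` embeddings and the FAISS index. `top_k_chunks` and the sample chunks are made up for illustration.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_chunks(question, chunks, k=2):
    """Return the `k` chunks most similar to the question, best first."""
    q_vec = vectorize(question)
    scored = [(cosine(q_vec, vectorize(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

# Sample chunks written for this example only
chunks = [
    "The Dartmouth workshop took place in 1956.",
    "Bananas are rich in potassium.",
    "Neural networks regained popularity after 2012.",
]
best = top_k_chunks("When was the Dartmouth workshop?", chunks, k=1)
print(best)
```

Learned embeddings differ from word counts in that they can match a question to a chunk that shares its meaning but not its vocabulary.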

## 5. Notes on API Key Security
- Keep the SerpAPI key out of the notebook source: read it from an environment variable or from Colab secrets instead of hard-coding it
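
One way to follow this advice is to read the key from the environment and prompt for it only when it is missing. This is a sketch: `load_api_key` and `mask` are hypothetical helpers, and `SERPAPI_API_KEY` is the variable name already used earlier in the notebook.

```python
import os
from getpass import getpass

def load_api_key(var_name="SERPAPI_API_KEY", prompt=getpass):
    """Return the key from the environment, prompting (without echo) if it is absent."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        key = prompt(f"Enter value for {var_name}: ").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    return key

def mask(key, visible=4):
    """Hide all but the last few characters so the key is safer to print."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]

# Example value only, not a real key
print(mask("abcdef123456"))
```

In the notebook, `api_key = load_api_key()` could then stand in for the prompt that asks for the key, and `mask(api_key)` is what to print when confirming that a key was loaded.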