# Part 1 Semantic Search Engine: Demonstration Notebook

This notebook tests and demonstrates the FastAPI-based semantic search engine.

**Workflow:**
1.  Read the compatible files (`.txt`) from the local folder `../documents`.
2.  Upload these files to the `/index` endpoint to be processed and indexed.
3.  Run several search queries against the `/search` endpoint and print the results.

### Prerequisites:

**1. `documents` Folder**
* The main directory must contain a folder called **`documents`** (referenced from this notebook as `../documents`) holding 5 `.txt` files.

**2. Launch the FastAPI Server**
* This notebook only acts as a **client**. The API server must be running in the background for this to work.
* Open a **separate terminal**, navigate to the code directory, and run the following command:
    ```bash
    uvicorn main:app --reload --port 8080
    ```
* Keep that terminal open while you use this notebook.
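
Before running the cells below, it can help to confirm that the server is actually reachable. The following is a minimal sketch, not part of the search engine itself: `server_is_up` is a hypothetical helper name, and it assumes the app still serves FastAPI's default interactive documentation page at `/docs` (this can be disabled in the app's configuration). It uses only the standard library so it works even before `requests` is imported.

```python
import urllib.request


def server_is_up(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers with 200 at base_url + '/docs'.

    Assumption: the FastAPI app exposes its default '/docs' page.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/docs", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError are both subclasses of OSError).
        return False
```

Calling `server_is_up("http://127.0.0.1:8080")` should return `True` once the `uvicorn` command above has finished starting up, and `False` if nothing is listening on that port.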

In [5]:
import requests
import json
import os

# The base URL of your running FastAPI application
BASE_URL = "http://127.0.0.1:8080"
DOCS_FOLDER = "../documents"

print("Setup complete. Ready to proceed.")

Setup complete. Ready to proceed.


**Step 1 - Index Local Documents**

In [6]:
# --- Step 1: Index Local Documents ---

import mimetypes
from contextlib import ExitStack

print(f"--- Reading and indexing all files from the '{DOCS_FOLDER}' folder ---")

if not os.path.isdir(DOCS_FOLDER):
    print(f"❌ Error: The '{DOCS_FOLDER}' directory was not found. Please complete the prerequisite steps.")
else:
    # Only regular .txt files are uploaded; subdirectories and other file types are skipped.
    filenames = sorted(
        name for name in os.listdir(DOCS_FOLDER)
        if name.lower().endswith(".txt") and os.path.isfile(os.path.join(DOCS_FOLDER, name))
    )

    if not filenames:
        print(f"❌ Error: No .txt files found in the '{DOCS_FOLDER}' folder.")
    else:
        # ExitStack closes every opened file, even if a later open() or the request fails.
        with ExitStack() as stack:
            upload_files = []
            for filename in filenames:
                file_path = os.path.join(DOCS_FOLDER, filename)
                # Guess the MIME type from the file name; fall back to a generic binary type.
                content_type, _ = mimetypes.guess_type(file_path)
                if content_type is None:
                    content_type = "application/octet-stream"
                handle = stack.enter_context(open(file_path, "rb"))
                # Each entry is (form field name, (filename, file object, content type)).
                upload_files.append(("files", (filename, handle, content_type)))

            try:
                response = requests.post(f"{BASE_URL}/index", files=upload_files, timeout=60)
                response.raise_for_status()
                print("\n✅ Indexing successful!")
                print(response.json())
            except requests.exceptions.RequestException as e:
                print(f"\n❌ Error during indexing: {e}")

--- Reading and indexing all files from the '../documents' folder ---

✅ Indexing successful!
{'message': 'Successfully indexed content from 5 files.'}


**Step 2 - Run Search Queries**

In [7]:
print("\n--- Performing Search Queries ---")

queries = [
    "Wifi",
    "Ocean",
    "Interstellar",
    "Medieval history",
    "Money"
]

for query in queries:
    print(f"\n🔎 Searching for: '{query}'")
    try:
        # A timeout keeps the cell from hanging if the server stops responding.
        response = requests.get(f"{BASE_URL}/search", params={'query': query}, timeout=30)
        response.raise_for_status()
        results = response.json()
        print("✅ Results found:")
        print(json.dumps(results, indent=2))
    except requests.exceptions.RequestException as e:
        print(f"❌ Error during search: {e}")
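
The `/search` responses in this notebook are JSON objects with a `results` list of text passages, and each passage can run to several paragraphs. When a compact view is enough, one option is to print a single shortened line per passage. The following is a minimal sketch assuming that response shape; `summarize_results` and its `width` parameter are hypothetical names, not part of the API.

```python
import textwrap


def summarize_results(payload: dict, width: int = 100) -> list[str]:
    """Return one numbered, shortened line per passage in payload['results']."""
    lines = []
    for rank, passage in enumerate(payload.get("results", []), start=1):
        # shorten() collapses newlines and runs of whitespace, then cuts the
        # text at a word boundary so it fits in `width` characters.
        snippet = textwrap.shorten(str(passage), width=width, placeholder=" ...")
        lines.append(f"{rank}. {snippet}")
    return lines
```

Inside the query loop, `print("\n".join(summarize_results(results)))` could stand in for the `json.dumps` call; the full passages remain available in `results`.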


--- Performing Search Queries ---

🔎 Searching for: 'Wifi'
✅ Results found:
{
  "results": [
    "The internet's origins can be traced back to the 1960s as a United States military project called ARPANET (Advanced Research Projects Agency Network). The goal was to create a decentralised, robust communications network that could withstand a nuclear attack. The first successful message was sent between two computers at UCLA and Stanford in 1969. A key innovation was packet switching, a method of breaking down data into small blocks, or packets, that could be sent independently over various routes and reassembled at the destination.\n\nThe 1980s saw the standardisation of the TCP/IP protocol, which remains the fundamental communication language of the internet. However, it was the invention of the World Wide Web by British scientist Tim Berners-Lee at CERN in 1990 that transformed the internet from a niche academic and military tool into a global platform for information sharing. He deve