# 🚀 FoundryIQ Deployment Notebook

This notebook deploys the FoundryIQ AI Document Assistant directly from **GitHub Codespaces**.

## What This Notebook Does

1. **Installs dependencies** - Azure SDK, OpenAI client, and file parsers
2. **Logs into Azure** - Using device code flow
3. **Discovers Azure resources** - Finds AI Services and Search instances
4. **Creates an Azure AI Search index** - With a vector field for similarity search
5. **Processes documents** - Reads CSV and Excel files from the `files/` folder, one chunk per row
6. **Indexes documents** - Generates embeddings and uploads the chunks
7. **Tests the system** - With sample queries

---

**Run each cell in order using `Shift+Enter`.**

## 1️⃣ Install Required Dependencies

In [None]:
import subprocess, sys

# xlrd is needed by pandas to read legacy .xls files (openpyxl only handles .xlsx)
packages = ["azure-identity", "azure-search-documents", "openai", "python-dotenv", "pandas", "openpyxl", "xlrd", "python-docx", "PyPDF2"]

print("📦 Installing dependencies...")
for pkg in packages:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", pkg])
print("✅ Done!")

## 2️⃣ Login to Azure

Uses the **device code flow**, which works in GitHub Codespaces: the CLI prints a code that you enter on the Microsoft device login page in any browser.
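
The login check in the next cell shells out to the Azure CLI. A minimal sketch of a reusable wrapper for such calls is shown here; `run_cli` is a hypothetical helper name, not part of the notebook, and it assumes only that the command prints its result to stdout.

```python
import subprocess
import sys


def run_cli(cmd, timeout=30):
    """Run a command and return its stripped stdout, or None on any failure.

    Hypothetical helper: returns None when the executable is missing,
    the call times out, or the exit code is non-zero.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


# Demonstrated with the Python interpreter so the sketch runs without the Azure CLI.
print(run_cli([sys.executable, "-c", "print('ok')"]))  # prints: ok

# The notebook's own login check would then read:
# run_cli(["az", "account", "show", "--query", "user.name", "-o", "tsv"])
```

Returning `None` for every failure mode keeps the calling code to a single `if user:` test, at the cost of hiding why the call failed.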

In [None]:
import subprocess, os, json

def check_login():
    try:
        r = subprocess.run(["az", "account", "show", "--query", "user.name", "-o", "tsv"], capture_output=True, text=True, timeout=10)
        return r.stdout.strip() if r.returncode == 0 else None
    except (OSError, subprocess.SubprocessError):
        # az CLI is not installed or the call timed out
        return None

user = check_login()
if user:
    print(f"✅ Logged in as: {user}")
else:
    print("🔐 Starting Azure login...")
    subprocess.run(["az", "login", "--use-device-code"])

### Select Subscription
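
The selection cells in this notebook use 1-based indices such as `SUBSCRIPTION_INDEX`. A small sketch of validating such an index before using it is given here; `pick` is a hypothetical helper and the subscription records are made-up sample data.

```python
def pick(items, index):
    """Return items[index - 1] for a 1-based index, or None when index is None.

    Raises ValueError for 0, negative, or too-large values instead of
    silently wrapping around the way a raw list index would.
    """
    if index is None:
        return None
    if not 1 <= index <= len(items):
        raise ValueError(f"index must be between 1 and {len(items)}, got {index}")
    return items[index - 1]


# Made-up sample data shaped like `az account list` output.
sample_subs = [{"name": "Dev", "id": "aaa"}, {"name": "Prod", "id": "bbb"}]
print(pick(sample_subs, 2)["name"])  # prints: Prod
```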

In [None]:
subs = json.loads(subprocess.run(["az", "account", "list", "-o", "json"], capture_output=True, text=True).stdout)
print("📋 Subscriptions:")
for i, s in enumerate(subs):
    print(f"  [{i+1}] {s['name']}" + (" ← CURRENT" if s.get('isDefault') else ""))

# Set SUBSCRIPTION_INDEX to change (e.g., 1, 2, 3)
SUBSCRIPTION_INDEX = None
if SUBSCRIPTION_INDEX:
    subprocess.run(["az", "account", "set", "--subscription", subs[SUBSCRIPTION_INDEX-1]['id']])

## 3️⃣ Discover Azure Resources
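
The discovery cell filters the `az cognitiveservices account list` output with a JMESPath `--query` expression. For readers unfamiliar with that syntax, this is a rough Python equivalent of the same filter and projection. `select_ai_accounts` is a hypothetical name and the records are made-up sample data.

```python
def select_ai_accounts(accounts):
    """Keep AIServices/OpenAI accounts and project name, endpoint, resource group.

    Mirrors: [?kind=='AIServices'||kind=='OpenAI'].{name:name,endpoint:properties.endpoint,rg:resourceGroup}
    """
    return [
        {
            "name": acct.get("name"),
            "endpoint": (acct.get("properties") or {}).get("endpoint"),
            "rg": acct.get("resourceGroup"),
        }
        for acct in accounts
        if acct.get("kind") in ("AIServices", "OpenAI")
    ]


# Made-up sample records; real output contains many more fields.
sample_accounts = [
    {"name": "my-ai", "kind": "AIServices", "resourceGroup": "rg1",
     "properties": {"endpoint": "https://my-ai.example.com/"}},
    {"name": "my-vision", "kind": "ComputerVision", "resourceGroup": "rg1",
     "properties": {"endpoint": "https://my-vision.example.com/"}},
]
print(select_ai_accounts(sample_accounts))
```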

In [None]:
from dotenv import load_dotenv
load_dotenv('.env')  # no-op when the file does not exist

config = {
    "AZURE_OPENAI_ENDPOINT": os.getenv("AZURE_OPENAI_ENDPOINT", ""),
    "AZURE_OPENAI_API_KEY": os.getenv("AZURE_OPENAI_API_KEY", ""),
    "AZURE_OPENAI_DEPLOYMENT": os.getenv("AZURE_OPENAI_DEPLOYMENT", "gpt-4.1"),
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT": os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", "text-embedding-3-small"),
    "AZURE_SEARCH_ENDPOINT": os.getenv("AZURE_SEARCH_ENDPOINT", ""),
    "AZURE_SEARCH_API_KEY": os.getenv("AZURE_SEARCH_API_KEY", ""),
    "AZURE_SEARCH_INDEX_NAME": os.getenv("AZURE_SEARCH_INDEX_NAME", "foundryiq-documents"),
}

print("🔍 Discovering resources...")
ai_svc = json.loads(subprocess.run(["az", "cognitiveservices", "account", "list", "--query", "[?kind=='AIServices'||kind=='OpenAI'].{name:name,endpoint:properties.endpoint,rg:resourceGroup}", "-o", "json"], capture_output=True, text=True).stdout or '[]')
search_svc = json.loads(subprocess.run(["az", "resource", "list", "--resource-type", "Microsoft.Search/searchServices", "--query", "[].{name:name,rg:resourceGroup}", "-o", "json"], capture_output=True, text=True).stdout or '[]')

print(f"🤖 AI Services: {[s['name'] for s in ai_svc]}")
print(f"🔎 Search: {[s['name'] for s in search_svc]}")

### Select Resources
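
This step ends by writing API keys to `.env` in plain text. A minimal sketch, assuming a POSIX filesystem such as the one in Codespaces, of reporting settings that are still empty and writing the file so that only the current user can read it. `missing_keys` and `write_env` are hypothetical helpers, not part of the notebook.

```python
import os
import tempfile


def missing_keys(config):
    """Return the names of settings whose value is empty."""
    return [key for key, value in config.items() if not value]


def write_env(path, config):
    """Write KEY=VALUE lines and restrict the file to the current user."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        for key, value in config.items():
            f.write(f"{key}={value}\n")
    os.chmod(path, 0o600)  # also tightens a file that already existed


# Made-up values; written to a temporary directory so nothing real is touched.
demo = {"AZURE_SEARCH_ENDPOINT": "https://example.search.windows.net", "AZURE_SEARCH_API_KEY": ""}
print(missing_keys(demo))  # prints: ['AZURE_SEARCH_API_KEY']
with tempfile.TemporaryDirectory() as tmp:
    write_env(os.path.join(tmp, ".env"), demo)
```

An empty key usually means the `az ... keys list` call failed, so checking before later cells run avoids a less obvious authentication error further down. Keeping `.env` out of version control is also worth confirming.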

In [None]:
OPENAI_INDEX, SEARCH_INDEX = 1, 1

if ai_svc:
    sel = ai_svc[OPENAI_INDEX-1]
    config["AZURE_OPENAI_ENDPOINT"] = sel.get("endpoint", "")
    config["AZURE_OPENAI_API_KEY"] = subprocess.run(["az", "cognitiveservices", "account", "keys", "list", "--name", sel["name"], "--resource-group", sel["rg"], "--query", "key1", "-o", "tsv"], capture_output=True, text=True).stdout.strip()
    print(f"✅ AI Service: {sel['name']}")

if search_svc:
    sel = search_svc[SEARCH_INDEX-1]
    config["AZURE_SEARCH_ENDPOINT"] = f"https://{sel['name']}.search.windows.net"
    config["AZURE_SEARCH_API_KEY"] = subprocess.run(["az", "search", "admin-key", "show", "--service-name", sel["name"], "--resource-group", sel["rg"], "--query", "primaryKey", "-o", "tsv"], capture_output=True, text=True).stdout.strip()
    print(f"✅ Search: {sel['name']}")

with open('.env', 'w') as f:
    for k, v in config.items(): f.write(f"{k}={v}\n")
print("💾 Saved to .env")

## 4️⃣ Create Search Index
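
The index defines its vector field with a fixed dimension count (1536 in this notebook), which has to match the length of the vectors the embedding model returns. A sketch of a sanity check is given here, assuming the deployment name equals the underlying model name, as it does for the notebook's default `text-embedding-3-small`. `check_vector_size` is a hypothetical helper.

```python
# Default output sizes of OpenAI embedding models (without a `dimensions` override).
DEFAULT_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_vector_size(model, index_dimensions):
    """Return the model's default size, or None when the name is not recognized.

    Raises ValueError when a known model's size differs from the index field.
    """
    expected = DEFAULT_DIMENSIONS.get(model)
    if expected is None:
        return None  # e.g. a custom deployment name: nothing to compare against
    if expected != index_dimensions:
        raise ValueError(
            f"{model} returns {expected}-dimensional vectors, "
            f"but the index field expects {index_dimensions}"
        )
    return expected


print(check_vector_size("text-embedding-3-small", 1536))  # prints: 1536
```

If the deployment is later switched to a model with a different vector size, the index would need to be recreated with a matching field.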

In [None]:
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import SearchIndex, SearchField, SearchFieldDataType, VectorSearch, HnswAlgorithmConfiguration, VectorSearchProfile, SearchableField, SimpleField
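from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import ResourceNotFoundError

index_client = SearchIndexClient(config["AZURE_SEARCH_ENDPOINT"], AzureKeyCredential(config["AZURE_SEARCH_API_KEY"]))
index_name = config["AZURE_SEARCH_INDEX_NAME"]

try:
    index_client.get_index(index_name)
    print(f"✅ Index '{index_name}' exists")
except ResourceNotFoundError:
    # Create the index only when it is missing; auth or network errors should surface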
    fields = [
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchableField(name="content", type=SearchFieldDataType.String),
        SearchableField(name="title", type=SearchFieldDataType.String),
        SimpleField(name="file_name", type=SearchFieldDataType.String, filterable=True),
        SearchField(name="content_vector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), searchable=True, vector_search_dimensions=1536, vector_search_profile_name="vector-profile")
    ]
    vs = VectorSearch(algorithms=[HnswAlgorithmConfiguration(name="hnsw")], profiles=[VectorSearchProfile(name="vector-profile", algorithm_configuration_name="hnsw")])
    index_client.create_index(SearchIndex(name=index_name, fields=fields, vector_search=vs))
    print(f"✅ Created '{index_name}'")


## 5️⃣ Process Documents
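
Each spreadsheet row becomes one searchable document: the non-empty cells are joined into a `column: value` string, and the ID is a hash of the file name and row number. The sketch below illustrates that flattening with the standard-library `csv` module instead of pandas, so values stay as the raw strings from the file. `rows_to_docs` is a hypothetical name and the CSV text is made up.

```python
import csv
import hashlib
import io


def rows_to_docs(csv_text, file_name):
    """Turn each CSV row into a dict with id, content, title, and file_name."""
    docs = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        # Skip empty cells so the content only carries real values.
        content = " | ".join(
            f"{col}: {val}" for col, val in row.items() if val not in (None, "")
        )
        docs.append({
            # Stable ID: re-running on the same file overwrites instead of duplicating.
            "id": hashlib.md5(f"{file_name}_{i}".encode()).hexdigest(),
            "content": content,
            "title": f"{file_name} - Row {i + 1}",
            "file_name": file_name,
        })
    return docs


sample = "product,price,notes\nWidget,9.99,\nGadget,19.50,fragile\n"
for doc in rows_to_docs(sample, "products.csv"):
    print(doc["title"], "->", doc["content"])
# products.csv - Row 1 -> product: Widget | price: 9.99
# products.csv - Row 2 -> product: Gadget | price: 19.50 | notes: fragile
```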

In [None]:
import pandas as pd, hashlib
from pathlib import Path

def read_file(path):
    docs, name, suffix = [], Path(path).name, Path(path).suffix.lower()
    try:
        if suffix == '.csv':
            for i, row in pd.read_csv(path).iterrows():
                docs.append({"id": hashlib.md5(f"{name}_{i}".encode()).hexdigest(), "content": " | ".join([f"{c}: {v}" for c,v in row.items() if pd.notna(v)]), "title": f"{name} - Row {i+1}", "file_name": name})
        elif suffix in ['.xlsx', '.xls']:
            for i, row in pd.read_excel(path).iterrows():
                docs.append({"id": hashlib.md5(f"{name}_{i}".encode()).hexdigest(), "content": " | ".join([f"{c}: {v}" for c,v in row.items() if pd.notna(v)]), "title": f"{name} - Row {i+1}", "file_name": name})
    except Exception as e: print(f"⚠️ {name}: {e}")
    return docs

all_docs = []
for f in Path('files').iterdir():
    if not f.name.startswith('.'):
        docs = read_file(str(f))
        if docs: all_docs.extend(docs); print(f"✅ {f.name}: {len(docs)}")
print(f"📊 Total: {len(all_docs)} chunks")

## 6️⃣ Generate Embeddings & Index
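
Embeddings are requested in batches of 16 documents, with each text truncated to 8000 characters, and the returned vectors are attached to the documents in order. A sketch of that loop with a stand-in embedder, so it runs offline, is given here. `embed_in_batches` and `fake_embed` are hypothetical names; a real `embed_fn` would wrap the embeddings API call.

```python
def embed_in_batches(docs, embed_fn, batch_size=16, max_chars=8000):
    """Attach a 'content_vector' to every doc, calling embed_fn once per batch."""
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        vectors = embed_fn([d["content"][:max_chars] for d in batch])
        if len(vectors) != len(batch):
            raise ValueError(f"expected {len(batch)} vectors, got {len(vectors)}")
        # Vectors come back in input order, so zip pairs them with their docs.
        for doc, vector in zip(batch, vectors):
            doc["content_vector"] = vector
    return docs


def fake_embed(texts):
    """Stand-in for the real API: one-element vector holding the text length."""
    return [[float(len(t))] for t in texts]


sample_docs = [{"content": "x" * n} for n in range(1, 36)]  # 35 docs -> 3 batches
embed_in_batches(sample_docs, fake_embed)
print(len(sample_docs), sample_docs[0]["content_vector"], sample_docs[-1]["content_vector"])
# 35 [1.0] [35.0]
```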

In [None]:
from openai import AzureOpenAI
from azure.search.documents import SearchClient

oai = AzureOpenAI(api_key=config["AZURE_OPENAI_API_KEY"], api_version="2024-08-01-preview", azure_endpoint=config["AZURE_OPENAI_ENDPOINT"])

print("🧠 Generating embeddings...")
for i in range(0, len(all_docs), 16):
    batch = all_docs[i:i+16]
    resp = oai.embeddings.create(input=[d["content"][:8000] for d in batch], model=config["AZURE_OPENAI_EMBEDDING_DEPLOYMENT"])
    for j, item in enumerate(resp.data): batch[j]["content_vector"] = item.embedding
    print(f"  Batch {i//16+1}")

print("📤 Uploading...")
sc = SearchClient(config["AZURE_SEARCH_ENDPOINT"], config["AZURE_SEARCH_INDEX_NAME"], AzureKeyCredential(config["AZURE_SEARCH_API_KEY"]))
for i in range(0, len(all_docs), 100):
    sc.upload_documents(all_docs[i:i+100])
print("✅ Done!")

## 7️⃣ Test
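
The test combines retrieval and generation: the top search hits are condensed into a context string, which is sent to the chat model together with the question. The prompt-assembly step can be sketched on its own as follows. `build_messages` is a hypothetical helper and the search results are made-up sample data.

```python
def build_messages(question, results, max_chars=300):
    """Build chat messages from search hits, truncating each hit's content."""
    context = "\n".join(
        f"[{r['title']}]: {r['content'][:max_chars]}" for r in results
    )
    return [
        {"role": "system", "content": "Answer based on context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQ: {question}"},
    ]


# Made-up hits shaped like the indexed documents.
sample_results = [
    {"title": "products.csv - Row 1", "content": "product: Widget | price: 9.99"},
    {"title": "products.csv - Row 2", "content": "product: Gadget | price: 19.50"},
]
for message in build_messages("What products are available?", sample_results):
    print(message["role"], "->", message["content"])
```

Separating prompt assembly from the API call makes it easy to inspect exactly what the model sees when an answer looks wrong.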

In [None]:
from azure.search.documents.models import VectorizedQuery

def ask(q):
    vec = oai.embeddings.create(input=[q], model=config["AZURE_OPENAI_EMBEDDING_DEPLOYMENT"]).data[0].embedding
    results = list(sc.search(search_text=q, vector_queries=[VectorizedQuery(vector=vec, k_nearest_neighbors=5, fields="content_vector")], top=5))
    ctx = "\n".join([f"[{r['title']}]: {r['content'][:300]}" for r in results])
    return oai.chat.completions.create(model=config["AZURE_OPENAI_DEPLOYMENT"], messages=[{"role": "system", "content": "Answer based on context."}, {"role": "user", "content": f"Context:\n{ctx}\n\nQ: {q}"}]).choices[0].message.content

print("🧪 Testing...")
for q in ["What products are available?", "Customer status overview?"]:
    print(f"❓ {q}\n💬 {ask(q)}\n")

## ✅ Complete!

Next steps:
- Start the API: `python -m uvicorn src.api:app --reload`
- Start the frontend: `cd frontend && npm run dev`

In [None]:
# Interactive - change question and re-run
YOUR_QUESTION = "Executive summary of operational health"
print(f"💬 {ask(YOUR_QUESTION)}")