+ Load people.csv → list of dicts
+ Build an unweighted NetworkX graph: connect two people if they share a school or company, or have ≥2 skills in common
+ Run BFS from “You” to a target node → show the path
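
The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the sample rows, the field names (`school`, `company`, `skills`), and the helpers `connected`, `build_graph`, and `bfs_path` are hypothetical, and a plain adjacency dict stands in for the NetworkX graph so the sketch runs on the standard library alone. With NetworkX, `nx.Graph` and `nx.shortest_path` would play the same roles.

```python
from collections import deque
from itertools import combinations

# Hypothetical sample rows; in the plan above these would come from people.csv.
people = [
    {"name": "You", "school": "Seneca", "company": "", "skills": {"Python", "SQL"}},
    {"name": "Ana", "school": "Seneca", "company": "RBC", "skills": {"Java"}},
    {"name": "Ben", "school": "York", "company": "RBC", "skills": {"Docker", "Go"}},
    {"name": "Cy", "school": "UofT", "company": "KPMG", "skills": {"Docker", "Go", "Rust"}},
]

def connected(a, b):
    """Edge rule from the plan: same school, same company, or >= 2 shared skills."""
    same_school = bool(a["school"]) and a["school"] == b["school"]
    same_company = bool(a["company"]) and a["company"] == b["company"]
    return same_school or same_company or len(a["skills"] & b["skills"]) >= 2

def build_graph(rows):
    """Return an unweighted, undirected adjacency dict {name: set of neighbours}."""
    graph = {p["name"]: set() for p in rows}
    for a, b in combinations(rows, 2):
        if connected(a, b):
            graph[a["name"]].add(b["name"])
            graph[b["name"]].add(a["name"])
    return graph

def bfs_path(graph, start, target):
    """Breadth-first search; returns one shortest path as a list, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parent links back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(graph[node]):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None

graph = build_graph(people)
path = bfs_path(graph, "You", "Cy")
print(" -> ".join(path) if path else "no path")
```

Because the graph is unweighted, BFS is enough to find a path with the fewest hops; no edge weights or Dijkstra-style search are needed.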


# MVP — Batch test extraction on 10–20 profiles

This notebook:
1. Loads `sample_profiles.jsonl`
2. Calls `POST /extract-profiles` on your API once per profile text
3. Upserts all results in a single `POST /ingest-people` request
4. Shows a preview of the extracted profiles and the contents of the saved `data/people.json`

> Make sure your server is running first:
```bash
uvicorn main:app --reload --port 8000
```


In [None]:

%pip install -q requests pandas networkx
import json, requests, time, pandas as pd
from pathlib import Path

BASE_URL = "http://127.0.0.1:8000"
SAMPLES = Path("../sample_profiles.jsonl")
OUT = Path("../data/people.json")
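# Despite the .jsonl name, the file may hold one JSON array; fall back to line-by-line JSONL.
raw = SAMPLES.read_text(encoding="utf-8")
try:
    rows = json.loads(raw)
    rows = rows if isinstance(rows, list) else [rows]
except json.JSONDecodeError:
    rows = [json.loads(line) for line in raw.splitlines() if line.strip()]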
print("Loaded samples:", len(rows))

people = []
for r in rows:
    payload = {"text": r["text"]}
    resp = requests.post(f"{BASE_URL}/extract-profiles", json=payload, timeout=60)
    if resp.status_code != 200:
        print("ERR", resp.status_code, resp.text[:200])
        continue
    prof = resp.json()
    prof["_id"] = r["id"]
    people.append(prof)
    time.sleep(0.2)  # gentle

print("Extracted", len(people), "profiles")

# Ingest
resp = requests.post(f"{BASE_URL}/ingest-people", json={"people": people}, timeout=60)
print("Ingest:", resp.status_code, resp.text)

# Preview
df = pd.DataFrame(people)
df.head(400)


Note: you may need to restart the kernel to use updated packages.
Loaded samples: 32
Extracted 32 profiles
Ingest: 200 {"ok":true,"inserted":2,"updated":30,"total":34,"path":"/Users/crishuynh/Documents/Semester 7/DPS970/6ixPathConnect/service/data/people.json"}
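
The counts in this response are consistent with an upsert: 2 inserted plus 30 updated accounts for the 32 profiles sent, and a total of 34 would suggest that two stored records were not part of this batch. The server code is not shown in this notebook, so the following is only a sketch of how such counts could arise, assuming records are matched on a normalised name. `upsert_people` and its `key` parameter are hypothetical, not the service's actual implementation.

```python
def upsert_people(store, incoming, key=lambda p: p["name"].strip().lower()):
    """Merge `incoming` into `store` (a dict keyed by key(person)) and return counts."""
    inserted = updated = 0
    for person in incoming:
        k = key(person)
        if k in store:
            updated += 1   # record already known: overwrite it
        else:
            inserted += 1  # new record
        store[k] = person  # last write wins
    return {"ok": True, "inserted": inserted, "updated": updated, "total": len(store)}

# Hypothetical data: two stored records, one of which reappears in the batch.
store = {"ana": {"name": "Ana"}, "ben": {"name": "Ben"}}
summary = upsert_people(store, [{"name": "Ana", "role": "QA"}, {"name": "Cy"}])
print(summary)
```

Under this scheme, rerunning the same batch leaves `total` unchanged and reports every record as updated, which is why repeated notebook runs should not inflate the stored file.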


| | name | company | role | schools | skills | keywords | seniority | _id |
|---|---|---|---|---|---|---|---|---|
| 0 | Travis Liu | RBC | Software Engineer in Test | [Seneca Polytechnic] | [Selenium, Test Automation, Test Engineering, ... | [automation, cloud-native, observability, UI t... | Other | travis_liu |
| 1 | Kristina Z | (Company) | (Current Role) | [(School name)] | [Communication, MS Office, Teamwork, Problem S... | [] | Other | kristina_z_16412b2a7 |
| 2 | Ngoc Vien Do | York University | Student/Research/Club Roles | [York University] | [Capital Markets, Communication, Data Analysis... | [quantitative finance, data-driven research, c... | Student/Intern | ngoc_vien_do_b13b37244 |
| 3 | Andrew Nguyen | KPMG Canada | Software Engineer | [Seneca College] | [Agile/Scrum, Communication, Git/GitHub, JavaS... | [software engineering, web development, team c... | Entry | andrewnt219 |
| 4 | Tanise Lacasse, CPA, CA | KPMG Canada | Partner, Canadian Corporate / Asset & Wealth M... | [University of Calgary] | [Corporate Tax, Canadian Tax, Accounting, Fina... | [corporate tax, asset management, wealth manag... | Manager+ | tanise_lacasse_cpa_ca_5ab67640 |
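
Row 1 of the preview carries template placeholders such as `(Company)` and `(School name)` instead of real values, which suggests the source text was an unfilled profile template. A batch test can flag such rows before they are ingested. The sketch below assumes placeholders are always a whole value wrapped in parentheses; `PLACEHOLDER` and `placeholder_fields` are hypothetical names and not part of the service.

```python
import re

# Assumption: a placeholder is a whole value wrapped in parentheses, e.g. "(Company)".
PLACEHOLDER = re.compile(r"^\(.*\)$")

def placeholder_fields(profile):
    """Return the names of fields whose value (or any list item) looks like a placeholder."""
    flagged = []
    for field, value in profile.items():
        items = value if isinstance(value, list) else [value]
        if any(isinstance(v, str) and PLACEHOLDER.match(v.strip()) for v in items):
            flagged.append(field)
    return flagged

# Values copied from row 1 of the preview.
row = {"name": "Kristina Z", "company": "(Company)", "role": "(Current Role)",
       "schools": ["(School name)"], "seniority": "Other"}
print(placeholder_fields(row))
```

A value such as `Elastic Stack (ELK)` is not flagged, because the pattern only matches strings that are parenthesised from start to end.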


In [2]:

# Inspect saved JSON (optional)
from pathlib import Path
p = Path("../data/people.json")
print("people.json exists:", p.exists(), "size:", p.stat().st_size if p.exists() else 0)
if p.exists():
    print(p.read_text(encoding="utf-8")[:1000])  # first 1000 characters only


people.json exists: True size: 838
[
  {
    "name": "Travis Liu",
    "company": "RBC",
    "role": "Software Engineer in Test",
    "schools": [
      "Seneca Polytechnic"
    ],
    "skills": [
      "Selenium",
      "Test Automation",
      "Test Engineering",
      "Software Testing",
      "API Testing",
      "Web Testing",
      "CI/CD",
      "Kubernetes",
      "OpenShift",
      "Docker",
      "Elastic Stack (ELK)",
      "Dynatrace",
      "REST APIs",
      "Python",
      "JavaScript",
      "TypeScript",
      "React.js",
      "Next.js",
      "MongoDB",
      "SQL",
      "Database Testing",
      "OAuth",
      "Jest",
      "GitHub",
      "DevOps",
      "Project Management"
    ],
    "keywords": [
      "automation",
      "cloud-native",
      "observability",
      "UI testing",
      "API testing"
    ],
    "seniority": "Other"
  }
]
