+ Load people.csv → list of dicts
+ Build unweighted NetworkX graph: connect if same school/company or ≥2 shared skills
+ BFS from “You” to a target node → show path
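
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: it assumes each person is a dict with `name`, `company`, `schools`, and `skills` keys (the field names that appear in the extracted profiles), and the helper names `connected`, `build_graph`, and `find_path`, as well as the demo records, are hypothetical.

```python
from itertools import combinations

import networkx as nx


def _as_set(value):
    """Normalize a missing value, a single string, or a list into a set."""
    if not value:
        return set()
    if isinstance(value, str):
        return {value}
    return set(value)


def connected(a, b, min_shared_skills=2):
    """Apply the rule: same company, a shared school, or enough shared skills."""
    if a.get("company") and a.get("company") == b.get("company"):
        return True
    if _as_set(a.get("schools")) & _as_set(b.get("schools")):
        return True
    shared = _as_set(a.get("skills")) & _as_set(b.get("skills"))
    return len(shared) >= min_shared_skills


def build_graph(people):
    """Build an unweighted, undirected graph with one node per person name."""
    graph = nx.Graph()
    graph.add_nodes_from(p["name"] for p in people)
    for a, b in combinations(people, 2):
        if connected(a, b):
            graph.add_edge(a["name"], b["name"])
    return graph


def find_path(graph, source, target):
    """Return a fewest-hops path, or None when no path (or node) exists."""
    try:
        # Without a weight argument this is an unweighted, breadth-first style search.
        return nx.shortest_path(graph, source=source, target=target)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return None


# Made-up records, for illustration only.
demo_people = [
    {"name": "You", "company": "Acme", "schools": ["School A"], "skills": ["Python", "SQL"]},
    {"name": "Ana", "company": "Acme", "schools": ["School B"], "skills": ["Go"]},
    {"name": "Ben", "company": "Beta", "schools": ["School B"], "skills": ["Rust"]},
    {"name": "Cy", "company": "Gamma", "schools": ["School C"], "skills": ["Java"]},
]
demo_graph = build_graph(demo_people)
print(find_path(demo_graph, "You", "Ben"))  # ['You', 'Ana', 'Ben']
print(find_path(demo_graph, "You", "Cy"))   # None
```

Comparing every pair of people is quadratic in the number of profiles, which is acceptable for a few hundred records; a larger dataset would likely call for indexing people by company, school, and skill first.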


# MVP — Batch test extraction on 10–20 profiles

This notebook:
1. Loads `sample_profiles.jsonl`
2. Calls your API `POST /extract-profiles` for each text
3. Upserts all results via `POST /ingest-people`
4. Shows a preview and the saved `data/people.json`

> Make sure your server is running first:
```bash
uvicorn main:app --reload --port 8000
```
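
Before running the cells below, it can help to confirm that something is actually listening on the port. The following is a small sketch using only the standard library; the helper name `server_is_up` is hypothetical, and the check only proves that a TCP connection is accepted, not that the API routes exist.

```python
import socket


def server_is_up(host="127.0.0.1", port=8000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if not server_is_up():
    print("Nothing is listening on 127.0.0.1:8000; start uvicorn first.")
```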


In [10]:

%pip install -q requests pandas networkx
import json
import time

import pandas as pd
import requests
from pathlib import Path

BASE_URL = "http://127.0.0.1:8000"
SAMPLES = Path("../sample_profiles.jsonl")
OUT = Path("../data/people.json")
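# Despite its .jsonl extension, the samples file holds a single JSON array,
# so it is parsed in one call. For true JSON Lines (one object per line), use:
# rows = [json.loads(line) for line in SAMPLES.read_text(encoding="utf-8").splitlines() if line.strip()]
rows = json.loads(SAMPLES.read_text(encoding="utf-8"))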
print("Loaded samples:", len(rows))

people = []
for r in rows:
    payload = {"text": r["text"]}
    resp = requests.post(f"{BASE_URL}/extract-profiles", json=payload, timeout=60)
    if resp.status_code != 200:
        print("ERR", resp.status_code, resp.text[:200])
        continue
    prof = resp.json()
    prof["_id"] = r["id"]
    people.append(prof)
    time.sleep(0.2)  # brief pause between requests to avoid flooding the server

print("Extracted", len(people), "profiles")
OUT.write_text(json.dumps(people, indent=2, ensure_ascii=False), encoding="utf-8")

Note: you may need to restart the kernel to use updated packages.
Loaded samples: 148
Extracted 148 profiles


71839
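
The samples file carries a `.jsonl` extension but is parsed as one JSON document in the cell above. If the file format may change, a tolerant loader along these lines could accept both layouts. This is a sketch; `parse_samples` and `load_samples` are hypothetical helper names, and it assumes each record is a JSON object rather than an array.

```python
import json
from pathlib import Path


def parse_samples(text):
    """Parse either a JSON array or JSON Lines text into a list of records."""
    stripped = text.strip()
    if not stripped:
        return []
    if stripped.startswith("["):
        # Whole file is one JSON array.
        return json.loads(stripped)
    # Otherwise treat each non-empty line as its own JSON object.
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]


def load_samples(path):
    """Read a samples file from disk and parse it with parse_samples."""
    return parse_samples(Path(path).read_text(encoding="utf-8"))
```

With this in place, `rows = load_samples(SAMPLES)` would work whether the file is a JSON array or one object per line.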

### Ingest data

In [11]:
# Upsert every extracted profile in a single request
resp = requests.post(f"{BASE_URL}/ingest-people", json={"people": people}, timeout=60)
print("Ingest:", resp.status_code, resp.text)

Ingest: 200 {"ok":true,"inserted":0,"updated":148,"total":148,"path":"/Users/crishuynh/Documents/Semester 7/DPS970/6ixPathConnect/service/data/people.json"}
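
Rather than reading the raw response text, the counts can be checked programmatically. The sketch below is based only on the keys visible in the output above (`ok`, `inserted`, `updated`, `total`); it assumes that `inserted + updated` should equal the number of records sent and that `total` is the size of the store after the call, neither of which the API is confirmed to guarantee. The helper name `summarize_ingest` is hypothetical.

```python
import json


def summarize_ingest(body_text, sent):
    """Parse an ingest response and report whether every sent record was handled."""
    body = json.loads(body_text)
    handled = body.get("inserted", 0) + body.get("updated", 0)
    return {
        "ok": bool(body.get("ok")),
        "handled": handled,
        "all_accounted_for": handled == sent,  # assumption: every record is inserted or updated
        "total_in_store": body.get("total"),   # assumption: total counts records in the store
    }


sample = '{"ok":true,"inserted":0,"updated":148,"total":148}'
print(summarize_ingest(sample, sent=148))
```

In the notebook this could be called as `summarize_ingest(resp.text, sent=len(people))`.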


In [12]:
# Preview
df = pd.DataFrame(people)
df.head(400)

| | _id | name | company | role | schools | skills | keywords | seniority |
|---|---|---|---|---|---|---|---|---|
| 0 | travis_liu | Travis Liu | RBC | Software Engineer in Test | [Seneca Polytechnic] | [Selenium, Test Automation, Test Engineering, ... | [automation, cloud-native, observability, UI t... | Other |
| 1 | kristina_z_16412b2a7 | Kristina Zaporozhets | Scotiabank | Global Analytics and Financial Engineer Intern | [Seneca Polytechnic] | [C, C++, Applied Research, Communication, Git,... | [analytics, financial engineering, software de... | Student/Intern |
| 2 | ngoc_vien_do_b13b37244 | Ngoc Vien Do | York University | Student/Research/Club Roles | [York University] | [Capital Markets, Communication, Data Analysis... | [quantitative finance, data-driven research, c... | Student/Intern |
| 3 | andrewnt219 | Andrew Nguyen | KPMG Canada | Software Engineer | [Seneca College] | [Agile/Scrum, Communication, Git/GitHub, JavaS... | [software engineering, web development, team c... | Entry |
| 4 | tanise_lacasse_cpa_ca_5ab67640 | Tanise Lacasse, CPA, CA | KPMG Canada | Partner | [University of Calgary] | [Corporate Tax, Canadian Tax, Accounting, Fina... | [corporate tax, asset management, wealth manag... | Manager+ |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 143 | google_eng_manager_01 | Sarah Park | Google | Engineering Manager | [University of Toronto] | [People Management, Goal Setting, Web Performa... | [browser, performance, delivery, career growth... | Manager+ |
| 144 | meta_eng_manager_01 | Ryan Cole | Meta | Engineering Manager | [MIT] | [People Leadership, Product Engineering, Mobil... | [video creation, editing experiences, product ... | Manager+ |
| 145 | pwc_partner_candidate_01 | Amanda Li | PwC Canada | Director, Deals & Valuations | [University of Waterloo] | [Valuation, Financial Modeling, Client Relatio... | [valuation, financial modeling, transaction su... | Manager+ |
| 146 | deloitte_partner_01 | Matthew O’Reilly | Deloitte | Partner | [Ivey Business School] | [Risk Advisory, Financial Services, Executive ... | [regulatory change, cyber risk, third-party ri... | Manager+ |
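
The preview truncates the list-valued columns, so it is hard to see which skills or schools are most common. One way to summarize such a column is to explode the lists into one row per item and count them. This is a sketch with a hypothetical helper `top_values` and made-up demo data; it assumes the column holds Python lists, as the preview suggests.

```python
import pandas as pd


def top_values(frame, column, n=10):
    """Count how often each item appears across a list-valued column."""
    # explode() yields one row per list item; empty lists become NaN and are dropped.
    return frame[column].explode().dropna().value_counts().head(n)


# Made-up data, for illustration only.
demo = pd.DataFrame(
    {
        "name": ["A", "B", "C"],
        "skills": [["Python", "SQL"], ["Python"], []],
    }
)
print(top_values(demo, "skills"))
```

In the notebook, `top_values(df, "skills")` or `top_values(df, "schools")` would give a quick sense of how densely a shared-skill or shared-school rule is likely to connect people.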


In [13]:

# Inspect saved JSON (optional)
from pathlib import Path
p = Path("../data/people.json")
print("people.json exists:", p.exists(), "size:", p.stat().st_size if p.exists() else 0)
print(p.read_text(encoding="utf-8")[:1000])


people.json exists: True size: 71859
[
  {
    "_id": "travis_liu",
    "name": "Travis Liu",
    "company": "RBC",
    "role": "Software Engineer in Test",
    "schools": [
      "Seneca Polytechnic"
    ],
    "skills": [
      "Selenium",
      "Test Automation",
      "Test Engineering",
      "Software Testing",
      "API Testing",
      "Web Testing",
      "CI/CD",
      "Kubernetes",
      "OpenShift",
      "Docker",
      "Elastic Stack (ELK)",
      "Dynatrace",
      "REST APIs",
      "Python",
      "JavaScript",
      "TypeScript",
      "React.js",
      "Next.js",
      "MongoDB",
      "SQL",
      "Database Testing",
      "OAuth",
      "Jest",
      "GitHub",
      "DevOps",
      "Project Management"
    ],
    "keywords": [
      "automation",
      "cloud-native",
      "observability",
      "UI testing",
      "API testing"
    ],
    "seniority": "Other"
  },
  {
    "_id": "kristina_z_16412b2a7",
    "name": "Kristina Zaporozhets",
    "company": "Scotiaban