# 🚀 Colab Setup — AI_Matching

This notebook clones the repo, installs dependencies, runs the full pipeline, then executes `learn_from_outputs.ipynb` and renders it inline as an HTML report.

## 1) Clone the repo & enter project folder

In [1]:
# --- Repo settings ---
REPO_URL = "https://github.com/Clem085/AI_Matching.git"
BRANCH   = "main"        # change if needed
SUBDIR   = "ai_matching" # code lives here

# --- Fresh clone and enter project folder ---
!rm -rf /content/AI_Matching
!git clone -b "$BRANCH" --single-branch "$REPO_URL" /content/AI_Matching

%cd /content/AI_Matching/{SUBDIR}
!pwd && ls -lah


Cloning into '/content/AI_Matching'...
remote: Enumerating objects: 77, done.
remote: Counting objects: 100% (77/77), done.
remote: Compressing objects: 100% (62/62), done.
remote: Total 77 (delta 27), reused 57 (delta 15), pack-reused 0 (from 0)
Receiving objects: 100% (77/77), 3.25 MiB | 9.13 MiB/s, done.
Resolving deltas: 100% (27/27), done.
/content/AI_Matching/ai_matching
/content/AI_Matching/ai_matching
total 432K
drwxr-xr-x 3 root root 4.0K Oct 19 08:12 .
drwxr-xr-x 6 root root 4.0K Oct 19 08:12 ..
-rw-r--r-- 1 root root  13K Oct 19 08:12 associates_unassigned.csv
-rw-r--r-- 1 root root 6.2K Oct 19 08:12 build_learn_notebook.py
-rw-r--r-- 1 root root 1.3K Oct 19 08:12 build_pairs.py
-rw-r--r-- 1 root root 3.8K Oct 19 08:12 generate_data.py
-rw-r--r-- 1 root root  28K Oct 19 08:12 learn_from_outputs.ipynb
-rw-r--r-- 1 root root 4.8K Oct 19 08:12 matcher_lib.py
drwxr-xr-x 2 root root 4.0K Oct 19 08:12 __pycache__
-rw-r--r-- 1 root root 1.5K Oct 19 08:12 score_matches.p
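The `!rm -rf` / `!git clone` pair only works inside IPython, and `!` does not stop the cell if the clone fails. Outside a notebook, the same fresh-clone step can be sketched with `shutil` and `subprocess`; `clone_command` and `fresh_clone` are hypothetical helper names, and the sketch assumes `git` is on `PATH`.

```python
import shutil
import subprocess
from pathlib import Path


def clone_command(repo_url: str, branch: str, dest: str) -> list[str]:
    """Build the same argv as the `!git clone` line in the cell above."""
    return ["git", "clone", "-b", branch, "--single-branch", repo_url, dest]


def fresh_clone(repo_url: str, branch: str, dest: str, subdir: str) -> Path:
    """Delete any previous checkout, clone again, and return the project folder."""
    shutil.rmtree(dest, ignore_errors=True)  # equivalent of `rm -rf dest`
    # check=True raises CalledProcessError if git exits with a non-zero status
    subprocess.run(clone_command(repo_url, branch, dest), check=True)
    project = Path(dest) / subdir
    if not project.is_dir():
        raise FileNotFoundError(f"expected project folder {project}")
    return project


# Example (needs network access and git):
# project = fresh_clone("https://github.com/Clem085/AI_Matching.git", "main",
#                       "/content/AI_Matching", "ai_matching")
```

Passing argv as a list avoids shell quoting issues, and a failed clone raises instead of silently continuing into the next step.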

## 2) Install dependencies

In [2]:
import os

# Prefer requirements.txt in ai_matching/. If not there, look one level up.
if os.path.exists("requirements.txt"):
    req_path = "requirements.txt"
elif os.path.exists("../requirements.txt"):
    req_path = "../requirements.txt"
else:
    req_path = None

if req_path:
    print("Installing from", req_path)
    %pip install -q -r {req_path}
else:
    print("No requirements.txt found — installing a minimal set")
    %pip install -q pandas numpy scikit-learn joblib matplotlib


Installing from ../requirements.txt
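The cell above checks exactly two locations. The same lookup can be written as a reusable function that walks up a configurable number of parent folders; this is a sketch, and `find_requirements` and `levels_up` are hypothetical names.

```python
from pathlib import Path
from typing import Optional


def find_requirements(start: str = ".", levels_up: int = 1) -> Optional[Path]:
    """Return the first requirements.txt in `start` or its nearest `levels_up` parents."""
    here = Path(start).resolve()
    # Search order: the start folder itself, then each parent, nearest first
    candidates = [here, *here.parents][: levels_up + 1]
    for folder in candidates:
        req = folder / "requirements.txt"
        if req.is_file():
            return req
    return None
```

With `levels_up=1` this reproduces the cell's behavior: it prefers `ai_matching/requirements.txt` and falls back to `../requirements.txt`, which is the file found in this run.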


## 3) Make the repo importable

In [3]:
import sys, os
cwd = os.getcwd()

# Put the project dir on sys.path so code run in this kernel can import the project modules
if cwd not in sys.path:
    sys.path.insert(0, cwd)

# Also export PYTHONPATH for any subprocess (nbconvert)
%env PYTHONPATH={cwd}

print("CWD:", cwd)
print("sys.path[0]:", sys.path[0])


env: PYTHONPATH=/content/AI_Matching/ai_matching
CWD: /content/AI_Matching/ai_matching
sys.path[0]: /content/AI_Matching/ai_matching
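Both steps are needed because they reach different processes. `sys.path.insert` changes only the running kernel, while `%env` sets `os.environ`, which child processes started with `!` (such as `jupyter nbconvert` and the kernel it launches) inherit and use to seed their own `sys.path`. The sketch below demonstrates this with a throwaway module in a temporary folder; `demo_mod` and `child_can_import` are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def child_can_import(module: str, extra_path: str, set_pythonpath: bool) -> bool:
    """Start a fresh interpreter and report whether `module` is importable there."""
    env = os.environ.copy()
    if set_pythonpath:
        env["PYTHONPATH"] = extra_path
    else:
        env.pop("PYTHONPATH", None)
    result = subprocess.run([sys.executable, "-c", f"import {module}"],
                            env=env, capture_output=True)
    return result.returncode == 0


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_mod.py").write_text("VALUE = 1\n")
    sys.path.insert(0, tmp)   # visible to this process only
    import demo_mod           # succeeds here
    without_env = child_can_import("demo_mod", tmp, set_pythonpath=False)
    with_env = child_can_import("demo_mod", tmp, set_pythonpath=True)
    sys.path.remove(tmp)

print("child without PYTHONPATH:", without_env)  # False
print("child with PYTHONPATH:", with_env)        # True
```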


## 4) Run the full pipeline

In [4]:
# Generates synthetic data, builds pairs, trains, and scores
!python supervision_tool.py generate build train score


=== generate ===
Wrote synthetic supervisors & associates.
done.

=== build ===
Wrote pairs -> /content/AI_Matching/ai_matching/Supervision_HistoricalPairs_SYNTH.csv
done.

=== train ===
=== Validation Report ===
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         9
           1       0.84      1.00      0.91        48

    accuracy                           0.84        57
   macro avg       0.42      0.50      0.46        57
weighted avg       0.71      0.84      0.77        57

Saved model -> /content/AI_Matching/ai_matching/supervision_pair_model.joblib
done.

=== score ===
Wrote -> /content/AI_Matching/ai_matching/supervision_matches.csv
Wrote -> /content/AI_Matching/ai_matching/associates_unassigned.csv
done.
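In the validation report, class 0 gets 0.00 precision and recall, and the three `_warn_prf` lines are scikit-learn's undefined-metric warnings for a class that was never predicted. Every number in the table is reproduced by assuming the model predicted label 1 for all 57 validation pairs (9 with true label 0, 48 with true label 1), i.e. it behaves like a majority-class baseline on this split. The plain-Python sketch below recomputes the table under that assumption.

```python
def prf(y_true, y_pred, label):
    """Precision, recall, F1 and support for one class; 0.0 when undefined, as sklearn reports it."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)
    actual = sum(1 for t in y_true if t == label)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1, actual


# Assumption: 9 pairs with true label 0, 48 with true label 1, all predicted as 1.
y_true = [0] * 9 + [1] * 48
y_pred = [1] * 57

rows = {label: prf(y_true, y_pred, label) for label in (0, 1)}
n = len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
# Macro: plain mean over classes; weighted: mean weighted by each class's support
macro = [sum(rows[c][i] for c in rows) / len(rows) for i in range(3)]
weighted = [sum(rows[c][i] * rows[c][3] for c in rows) / n for i in range(3)]

for label, (p, r, f, s) in rows.items():
    print(f"{label:>12} {p:9.2f} {r:9.2f} {f:9.2f} {s:9d}")
print(f"accuracy {accuracy:.2f}")
print("macro avg    " + " ".join(f"{v:.2f}" for v in macro))
print("weighted avg " + " ".join(f"{v:.2f}" for v in weighted))
```

Confirming this on the real model would mean counting its predicted labels on the validation split; the training code that produces them is not shown in this notebook.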



## 5) See outputs (CSV previews and listing)

In [5]:
import os, glob, pandas as pd

print("Working dir:", os.getcwd())
print("\nGenerated artifacts:")
for f in sorted(glob.glob("*.csv") + glob.glob("*.joblib") + glob.glob("*.png")):
    print(" -", f)

# Preview tables inline when present
for f in ["Supervision_HistoricalPairs_SYNTH.csv",
          "supervision_matches.csv",
          "associates_unassigned.csv"]:
    if os.path.exists(f):
        print(f"\n=== {f} (head) ===")
        display(pd.read_csv(f).head())


Working dir: /content/AI_Matching/ai_matching

Generated artifacts:
 - Supervision_Associates_SYNTH.csv
 - Supervision_HistoricalPairs_SYNTH.csv
 - Supervision_Supervisors_SYNTH.csv
 - associates_unassigned.csv
 - supervision_matches.csv
 - supervision_pair_model.joblib

=== Supervision_HistoricalPairs_SYNTH.csv (head) ===


Unnamed: 0,assoc_idx,sup_idx,Associate,Associate Email,Associate State,Associate License,Associate Availability,Supervisor,Supervisor Email,Supervisor State,Who can you supervise?,Supervisor Availability,Capacity,AvailabilityOverlap,AvailabilityScore,Label,Reason
0,0,22,Dakota Wilson,dakota.wilson@sample.org,VA,Social Worker,"Monday 10:30 AM, Tuesday 11:00 AM, Tuesday 12:...",Casey Bennett,casey.bennett@example.com,VA,"Social Worker, Marriage and Family Therapist","Monday 12:00 PM, Monday 3:30 PM, Tuesday 10:00...",4,1,0.641311,1,good_match
1,4,28,Sam Martinez,sam.martinez@mail.net,IA,Marriage and Family Therapist,"Monday 11:30 AM, Monday 4:30 PM, Monday 5:00 P...",Taylor Lee,taylor.lee@example.com,IA,"Counselor, Marriage and Family Therapist","Monday 9:00 AM, Monday 11:30 AM, Monday 12:00 ...",6,5,0.837173,1,good_match
2,4,56,Sam Martinez,sam.martinez@mail.net,IA,Marriage and Family Therapist,"Monday 11:30 AM, Monday 4:30 PM, Monday 5:00 P...",Emerson Martinez,emerson.martinez@sample.org,IA,"Marriage and Family Therapist, Social Worker, ...","Monday 10:30 AM, Monday 2:00 PM, Monday 4:30 P...",0,1,0.694562,0,capacity_full
3,4,58,Sam Martinez,sam.martinez@mail.net,IA,Marriage and Family Therapist,"Monday 11:30 AM, Monday 4:30 PM, Monday 5:00 P...",Reese Lopez,reese.lopez@example.com,IA,"Psychologist, Marriage and Family Therapist, C...","Monday 8:30 AM, Monday 3:30 PM, Tuesday 2:00 P...",4,0,0.779163,1,good_match
4,5,68,Rowan Garcia,rowan.garcia@sample.org,TN,Social Worker,"Monday 11:00 AM, Monday 1:30 PM, Tuesday 9:30 ...",Reese Brooks,reese.brooks@mail.net,TN,"Social Worker, Marriage and Family Therapist, ...","Monday 8:00 AM, Monday 5:00 PM, Tuesday 9:00 A...",2,1,0.653683,1,good_match



=== supervision_matches.csv (head) ===


Unnamed: 0,assoc_idx,sup_idx,Associate,Associate Email,Associate State,Associate License,Associate Availability,Supervisor,Supervisor Email,Supervisor State,Who can you supervise?,Supervisor Availability,Capacity,AvailabilityOverlap,AvailabilityScore,match_score,final_score
0,169,47,Harper Chen,harper.chen@example.com,IL,Marriage and Family Therapist,"Monday 11:30 AM, Monday 5:00 PM, Monday 6:00 P...",Alex Bennett,alex.bennett@sample.org,IL,"Marriage and Family Therapist, Psychologist, C...","Monday 9:30 AM, Monday 11:00 AM, Monday 3:30 P...",6,2,0.874573,0.924525,0.899549
1,64,99,Casey Clark,casey.clark@mail.net,MN,Psychologist,"Monday 1:00 PM, Monday 4:30 PM, Tuesday 1:00 P...",Jordan Ramirez,jordan.ramirez@mail.net,MN,"Counselor, Psychologist","Monday 11:00 AM, Monday 2:00 PM, Monday 2:30 P...",6,1,0.838486,0.945587,0.892037
2,129,27,Alex Cooper,alex.cooper@example.com,"OK, TN, RI",Marriage and Family Therapist,"Monday 12:00 PM, Monday 2:30 PM, Tuesday 3:00 ...",Kendall Martinez,kendall.martinez@sample.org,OK,"Counselor, Marriage and Family Therapist","Monday 12:30 PM, Tuesday 12:30 PM, Tuesday 1:0...",6,2,0.847485,0.930792,0.889138
3,187,79,Emerson Reed,emerson.reed@example.com,WV,Marriage and Family Therapist,"Monday 5:00 PM, Monday 6:00 PM, Tuesday 9:00 A...",Sam Nguyen,sam.nguyen@example.com,WV,"Social Worker, Marriage and Family Therapist, ...","Monday 9:00 AM, Monday 12:30 PM, Monday 2:00 P...",6,2,0.835162,0.93348,0.884321
4,35,67,Skyler Rivera,skyler.rivera@example.com,MN,Counselor,"Monday 10:30 AM, Monday 2:30 PM, Wednesday 8:0...",Quinn Hernandez,quinn.hernandez@mail.net,"RI, MD, MN",Counselor,"Monday 2:30 PM, Monday 3:00 PM, Tuesday 11:30 ...",6,1,0.793352,0.953063,0.873207
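In these five preview rows, `final_score` equals the unweighted mean of `AvailabilityScore` and `match_score` to within display rounding, and the rows are sorted by `final_score` in descending order. That is an observation from the preview only; the actual scoring formula lives in the repository's scoring code, which is not shown here. A quick check on the copied values:

```python
# (AvailabilityScore, match_score, final_score) copied from the preview above
preview = [
    (0.874573, 0.924525, 0.899549),
    (0.838486, 0.945587, 0.892037),
    (0.847485, 0.930792, 0.889138),
    (0.835162, 0.933480, 0.884321),
    (0.793352, 0.953063, 0.873207),
]

# Compare against an equal-weight blend; 1e-6 absorbs the 6-decimal display rounding.
matches_mean = [abs((a + m) / 2 - f) <= 1e-6 for a, m, f in preview]
finals = [f for _, _, f in preview]
sorted_desc = finals == sorted(finals, reverse=True)

print("final == mean(availability, match):", all(matches_mean))  # True
print("sorted by final_score (desc):", sorted_desc)              # True
```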



=== associates_unassigned.csv (head) ===


Unnamed: 0,Timestamp,Email Address,Name,State,License Type,Availability
0,,alex.davis@example.com,Alex Davis,ME,Counselor,"Monday 5:00 PM, Tuesday 12:30 PM, Tuesday 4:30..."
1,,logan.cooper@mail.net,Logan Cooper,MD,Marriage and Family Therapist,"Tuesday 8:30 AM, Thursday 12:00 PM, Thursday 2..."
2,,harper.brooks@sample.org,Harper Brooks,AR,Marriage and Family Therapist,"Monday 5:30 PM, Tuesday 9:00 AM, Tuesday 4:30 ..."
3,,logan.jenkins@sample.org,Logan Jenkins,PA,Psychologist,"Monday 5:30 PM, Tuesday 10:00 AM, Tuesday 12:0..."
4,,peyton.gonzalez@mail.net,Peyton Gonzalez,AL,Psychologist,"Monday 8:30 AM, Monday 12:00 PM, Monday 5:30 P..."
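Each preview starts with an `Unnamed: 0` column. That is what `pd.read_csv` produces for a CSV written by `DataFrame.to_csv` with its default `index=True`: the index is saved under an empty header. Reading back with `index_col=0`, or writing with `index=False`, avoids it. Which code wrote these particular files is not shown here, so the sketch below only reproduces the effect on a tiny frame.

```python
import io

import pandas as pd

df = pd.DataFrame({"Associate": ["Dakota Wilson", "Sam Martinez"], "Label": [1, 1]})

with_index = df.to_csv()               # default index=True: header starts with an empty cell
without_index = df.to_csv(index=False)

naive_cols = list(pd.read_csv(io.StringIO(with_index)).columns)
fixed_cols = list(pd.read_csv(io.StringIO(with_index), index_col=0).columns)
no_index_cols = list(pd.read_csv(io.StringIO(without_index)).columns)

print(naive_cols)     # ['Unnamed: 0', 'Associate', 'Label']
print(fixed_cols)     # ['Associate', 'Label']
print(no_index_cols)  # ['Associate', 'Label']
```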


## 6) Execute `learn_from_outputs.ipynb` and render report

In [6]:
import os

nb_in  = "learn_from_outputs.ipynb"
nb_out = "learn_from_outputs_EXECUTED.ipynb"
html   = "learn_from_outputs_report.html"

if not os.path.exists(nb_in):
    print(f"{nb_in} not found in this directory.")
else:
    # Clear stale outputs first; old Colab error outputs can make the notebook JSON fail validation
    !jupyter nbconvert --to notebook --ClearOutputPreprocessor.enabled=True --inplace {nb_in}

    # Execute with the current kernel and environment (PYTHONPATH already set)
    !jupyter nbconvert --to notebook \
      --execute \
      --ExecutePreprocessor.kernel_name=python3 \
      --ExecutePreprocessor.timeout=600 \
      {nb_in} \
      --output {nb_out}

    # Convert executed notebook to HTML
    !jupyter nbconvert --to html {nb_out} --output {html}

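    # Render the HTML directly (more reliable than an IFrame in Colab).
    # display() is required: a bare expression inside an if/else block is not auto-rendered.
    from IPython.display import HTML, display
    with open(html, "r", encoding="utf-8") as fh:
        display(HTML(fh.read()))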


[NbConvertApp] Converting notebook learn_from_outputs.ipynb to notebook
  validate(nb)
[NbConvertApp] ERROR | Notebook JSON is invalid: Additional properties are not allowed ('errorDetails' was unexpected)

Failed validating 'additionalProperties' in error:

On instance['cells'][9]['outputs'][0]:
{'ename': 'ModuleNotFoundError',
 'errorDetails': {'actions': [{'action': 'open_url',
                               'actionText': 'Open Examples',
                               'url': '/notebooks/snippets/importing_libraries.ipynb'}]},
 'evalue': "No module named 'matcher_lib'",
 'output_type': 'error',
 'traceback': ['\x1b[0;31m---------------------------------------------------------...',
               '\x1b[0;31mModuleNotFoundError\x1b[0m                       '
               'Traceback (...',
               '\x1b[0;32m/tmp/ipython-input-2009321633.py\x1b[0m in '
               '\x1b[0;36m<cell line...',
               '\x1b[0;31mModuleNotFoundError\x1b[0m: No module named '
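The ERROR above comes from nbformat's schema validation while nbconvert reads the notebook: the copy saved from Colab contains an `error` output with an `errorDetails` key, which the schema does not allow. The stored error itself (`No module named 'matcher_lib'`) is left over from an earlier run in which the project folder was not importable. The output is cut off, so whether the in-place clear then succeeded is not visible here. One way to sidestep the validation error is to strip outputs with plain `json` before calling nbconvert; this is a sketch, and `clean_notebook` / `clean_notebook_file` are hypothetical helper names.

```python
import json
from pathlib import Path


def clean_notebook(nb: dict, clear_outputs: bool = True) -> dict:
    """Clear code-cell outputs, or (clear_outputs=False) only drop Colab's `errorDetails` key."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if clear_outputs:
            cell["outputs"] = []
            cell["execution_count"] = None
        else:
            for output in cell.get("outputs", []):
                output.pop("errorDetails", None)  # Colab-only key rejected by the schema
    return nb


def clean_notebook_file(path: str, clear_outputs: bool = True) -> None:
    """Rewrite the .ipynb file in place with plain json (no schema validation on read)."""
    p = Path(path)
    nb = json.loads(p.read_text(encoding="utf-8"))
    clean_notebook(nb, clear_outputs)
    p.write_text(json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")


# Possible use in section 6, in place of the in-place ClearOutputPreprocessor call:
# clean_notebook_file("learn_from_outputs.ipynb")
```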
           

## 7) (Optional) Download the report or executed notebook

In [7]:
from google.colab import files
# Uncomment a line below to download that file to your machine
# files.download("learn_from_outputs_EXECUTED.ipynb")
# files.download("learn_from_outputs_report.html")


## 8) Sanity check (optional)

In [8]:
# Confirm we’re in the correct directory and imports work
import os, sys, glob
print("CWD:", os.getcwd())
print("sys.path[0]:", sys.path[0])
import matcher_lib
print("matcher_lib import OK")
print("Artifacts:", sorted(glob.glob("*.csv") + glob.glob("*.joblib")))


CWD: /content/AI_Matching/ai_matching
sys.path[0]: /content/AI_Matching/ai_matching
matcher_lib import OK
Artifacts: ['Supervision_Associates_SYNTH.csv', 'Supervision_HistoricalPairs_SYNTH.csv', 'Supervision_Supervisors_SYNTH.csv', 'associates_unassigned.csv', 'supervision_matches.csv', 'supervision_pair_model.joblib']
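The sanity check prints the artifact list but does not fail if something is missing. A stricter variant can return the names that are absent or empty; this is a sketch, and `EXPECTED_ARTIFACTS` and `missing_artifacts` are hypothetical names whose list is copied from the output above. Note that `associates_unassigned.csv` is already committed in the repo (it appears in the listing right after cloning), so its presence alone does not prove the `score` stage ran.

```python
from pathlib import Path

# Artifacts printed by the sanity check above; adjust if the pipeline changes.
EXPECTED_ARTIFACTS = [
    "Supervision_Associates_SYNTH.csv",
    "Supervision_HistoricalPairs_SYNTH.csv",
    "Supervision_Supervisors_SYNTH.csv",
    "associates_unassigned.csv",
    "supervision_matches.csv",
    "supervision_pair_model.joblib",
]


def missing_artifacts(folder: str = ".", expected=EXPECTED_ARTIFACTS) -> list[str]:
    """Return expected artifact names that are absent or zero bytes in `folder`."""
    root = Path(folder)
    return [name for name in expected
            if not (root / name).is_file() or (root / name).stat().st_size == 0]


# In the notebook: assert not missing_artifacts(), missing_artifacts()
```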
