<a href="https://colab.research.google.com/github/PriyaSinha786/research-papers/blob/main/CDIPR/hr_poc_colab_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# HR Governance POC — Colab Notebook (Faster model)

This notebook sets up a small, self-contained POC for *Building Ethical AI in HR*. It tries a small sentence-transformer model first for faster runs, falls back to a slightly larger one if that fails to load, and uses a TF-IDF index if neither is available.
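
The fallback order can be sketched as follows. This is a minimal illustration, not the code in `index_docs.py`: `load_first_available` and `stub_loader` are hypothetical names, and the stub stands in for a real model constructor so the example runs without downloading anything.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def load_first_available(
    candidates: Sequence[str],
    loader: Callable[[str], T],
) -> Tuple[Optional[str], Optional[T]]:
    """Return (name, model) for the first candidate that loads, else (None, None)."""
    for name in candidates:
        try:
            return name, loader(name)
        except Exception as exc:  # any load failure moves on to the next candidate
            print(f"Could not load {name}: {exc}")
    return None, None


def stub_loader(name: str) -> str:
    # Hypothetical stand-in for a model constructor; only one name "loads".
    if name != "all-MiniLM-L6-v2":
        raise ImportError(f"{name} unavailable")
    return f"model:{name}"


name, model = load_first_available(
    ["paraphrase-MiniLM-L3-v2", "all-MiniLM-L6-v2"], stub_loader
)
print(name, model)
```

When every candidate fails, the caller receives `(None, None)` and can switch to a non-neural index such as TF-IDF.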

**Steps:**
1. Install dependencies
2. Write POC scripts to `/content/hr_poc`
3. Run data preparation, index build, training, and demo
4. (Optional) Set `OPENAI_API_KEY` to enable LLM answers


In [1]:
!pip install -q sentence-transformers==2.2.2 scikit-learn pandas joblib openai
print('Installed packages (if not already present).')

  Preparing metadata (setup.py) ... done
  Building wheel for sentence-transformers (setup.py) ... done
Installed packages (if not already present).
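
The run log further down shows `sentence-transformers` failing with `cannot import name 'cached_download' from 'huggingface_hub'`, which suggests the pinned 2.2.2 release expects a symbol that the installed `huggingface_hub` no longer exports. A quick pre-flight check along these lines can flag the mismatch before the pipeline runs. It is a sketch only: `hub_has_cached_download` is a hypothetical helper, not part of the POC.

```python
import importlib
import importlib.util


def hub_has_cached_download() -> bool:
    """True if the installed huggingface_hub still exports `cached_download`."""
    if importlib.util.find_spec("huggingface_hub") is None:
        return False  # package is not installed at all
    hub = importlib.import_module("huggingface_hub")
    return hasattr(hub, "cached_download")


if hub_has_cached_download():
    print("huggingface_hub exports cached_download; the 2.2.2 pin should import.")
else:
    print("cached_download is missing; dense embeddings will likely fail to import.")
```

If the check reports the symbol missing, two plausible remedies (neither verified in this notebook) are removing the `==2.2.2` pin or installing an older `huggingface_hub`.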


In [2]:

import base64
import pathlib
import shutil

# Start from a clean working directory so files from earlier runs do not linger.
BASE = pathlib.Path('/content/hr_poc')
if BASE.exists():
    shutil.rmtree(BASE)
BASE.mkdir(parents=True, exist_ok=True)

files_b64 = {"README.md": "SFIgR292ZXJuYW5jZSBQT0MgLSBRdWlja3N0YXJ0IChDb2xhYikKPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQpUaGlzIGZvbGRlciBjb250YWlucyBhIG1pbmltYWwgUE9DIGZvciAiRXRoaWNhbCBBSSBpbiBIUiIgKHN5bnRoZXRpYyBkYXRhICsgcmV0cmlldmVyICsgY2xhc3NpZmllcikuClJ1biB0aGUgc2NyaXB0cyBpbiBvcmRlcjoKICAxKSBweXRob24gcHJlcGFyZV9kYXRhLnB5CiAgMikgcHl0aG9uIGluZGV4X2RvY3MucHkKICAzKSBweXRob24gdHJhaW5fY2xhc3NpZmllci5weQogIDQpIHB5dGhvbiByYWdfcXVlcnkucHkgLS1yZXN1bWVfcGF0aCBkYXRhL3Jlc3VtZXMvcmVzdW1lXzEudHh0CiAgNSkgcHl0aG9uIGV2YWxfcGlwZWxpbmUucHkK", "prepare_data.py": "IyBwcmVwYXJlX2RhdGEucHkKaW1wb3J0IG9zLCByYW5kb20sIGpzb24KZnJvbSBwYXRobGliIGltcG9ydCBQYXRoCnJhbmRvbS5zZWVkKDQyKQoKYmFzZSA9IFBhdGgoX19maWxlX18pLnJlc29sdmUoKS5wYXJlbnQKZGF0YV9kaXIgPSBiYXNlIC8gImRhdGEiCnJlc3VtZXNfZGlyID0gZGF0YV9kaXIgLyAicmVzdW1lcyIKcG9saWNpZXNfZGlyID0gZGF0YV9kaXIgLyAicG9saWN5X2RvY3MiCm9zLm1ha2VkaXJzKHJlc3VtZXNfZGlyLCBleGlzdF9vaz1UcnVlKQpvcy5tYWtlZGlycyhwb2xpY2llc19kaXIsIGV4aXN0X29rPVRydWUpCgojIFN5bnRoZXRpYyBhdHRyaXRpb24gQ1NWCmltcG9ydCBwYW5kYXMgYXMgcGQKbiA9IDUwMApyb3dzID0gW10KZm9yIGkgaW4gcmFuZ2Uobik6CiAgICBhZ2UgPSByYW5kb20ucmFuZGludCgyMiwgNjApCiAgICBtb250aGx5X2luY29tZSA9IHJhbmRvbS5yYW5kaW50KDIwMDAsIDIwMDAwKQogICAgam9iX3NhdCA9IHJhbmRvbS5jaG9pY2UoWzEsIDIsIDMsIDRdKQogICAgeWVhcnMgPSByYW5kb20ucmFuZGludCgwLCAzMCkKICAgIGdlbmRlciA9IHJhbmRvbS5jaG9pY2UoWyJNYWxlIiwgIkZlbWFsZSJdKQogICAgZWR1Y2F0aW9uID0gcmFuZG9tLmNob2ljZShbMSwgMiwgMywgNCwgNV0pCiAgICBwcm9iX2F0dHIgPSAwLjIKICAgIGlmIGFnZSA8IDMwIGFuZCBqb2Jfc2F0IDw9IDIgYW5kIHllYXJzIDwgMzoKICAgICAgICBwcm9iX2F0dHIgPSAwLjYKICAgIGF0dHJpdGlvbiA9IDEgaWYgcmFuZG9tLnJhbmRvbSgpIDwgcHJvYl9hdHRyIGVsc2UgMAogICAgcm93cy5hcHBlbmQoW2FnZSwgbW9udGhseV9pbmNvbWUsIGpvYl9zYXQsIHllYXJzLCBnZW5kZXIsIGVkdWNhdGlvbiwgYXR0cml0aW9uXSkKCmRmID0gcGQuRGF0YUZyYW1lKHJvd3MsIGNvbHVtbnM9WyJBZ2UiLCAiTW9udGhseUluY29tZSIsICJKb2JTYXRpc2ZhY3Rpb24iLCAiWWVhcnNBdENvbXBhbnkiLCAiR2VuZGVyIiwgIkVkdWNhdGlvbiIsICJBdHRyaXRpb24iXSkKZGYudG9fY3N2KGRhdGFfZGlyIC8gImF0dHJpdGlvbl9zeW50aGV0aWMuY3N2IiwgaW5kZXg9RmFsc2UpCnByaW50KCJXcm90ZS
BzeW50aGV0aWMgYXR0cml0aW9uIGRhdGFzZXQ6IiwgZGF0YV9kaXIgLyAiYXR0cml0aW9uX3N5bnRoZXRpYy5jc3YiKQoKIyBDcmVhdGUgcmVzdW1lcwpza2lsbHNfcG9vbCA9IFsiUHl0aG9uIiwgIkphdmEiLCAiU1FMIiwgIk1hY2hpbmUgTGVhcm5pbmciLCAiRGVlcCBMZWFybmluZyIsICJOTFAiLCAiQ29tcHV0ZXIgVmlzaW9uIiwgIkRhdGEgRW5naW5lZXJpbmciLCAiVGVuc29yRmxvdyIsICJQeVRvcmNoIiwgIkt1YmVybmV0ZXMiLCAiQVdTIiwgIkNvbW11bmljYXRpb24iLCAiTGVhZGVyc2hpcCJdCnRpdGxlcyA9IFsiU29mdHdhcmUgRW5naW5lZXIiLCAiRGF0YSBTY2llbnRpc3QiLCAiTUwgRW5naW5lZXIiLCAiQmFja2VuZCBEZXZlbG9wZXIiLCAiRGF0YSBFbmdpbmVlciIsICJQcm9kdWN0IE1hbmFnZXIiXQpmb3IgaSBpbiByYW5nZSgxLCAxMSk6CiAgICBuYW1lID0gZiJDYW5kaWRhdGUge2l9IgogICAgdGl0bGUgPSByYW5kb20uY2hvaWNlKHRpdGxlcykKICAgIHllYXJzID0gcmFuZG9tLnJhbmRpbnQoMCwgMTIpCiAgICBza2lsbHMgPSByYW5kb20uc2FtcGxlKHNraWxsc19wb29sLCBrPXJhbmRvbS5yYW5kaW50KDMsIDYpKQogICAgZXhwID0gZiJJIHdvcmtlZCBhcyBhIHt0aXRsZX0gZm9yIHt5ZWFyc30geWVhcnMuIEkgaGF2ZSBleHBlcmllbmNlIGluICIgKyAiLCAiLmpvaW4oc2tpbGxzKSArICIuIgogICAgcmVzdW1lX3RleHQgPSBmIntuYW1lfVxue3RpdGxlfVxue2V4cH1cblJlc3BvbnNpYmlsaXRpZXM6IERlbGl2ZXJlZCBwcm9qZWN0cywgY29sbGFib3JhdGVkIHdpdGggdGVhbXMsIGFuZCBpbXByb3ZlZCBzeXN0ZW1zLiIKICAgIChyZXN1bWVzX2RpciAvIGYicmVzdW1lX3tpfS50eHQiKS53cml0ZV90ZXh0KHJlc3VtZV90ZXh0KQoKcHJpbnQoIldyb3RlIHNhbXBsZSByZXN1bWVzIHVuZGVyIiwgcmVzdW1lc19kaXIpCgojIFBvbGljeSBkb2NzCnBvbGljeV90ZXh0cyA9IHsKICAgICJwb2xpY3lfMS50eHQiOiAiRXF1YWwgT3Bwb3J0dW5pdHkgUG9saWN5OiBUaGUgY29tcGFueSBpcyBhbiBlcXVhbCBvcHBvcnR1bml0eSBlbXBsb3llci4gSGlyaW5nIGRlY2lzaW9ucyBtdXN0IG5vdCBiZSBiYXNlZCBvbiBnZW5kZXIsIHJhY2UsIHJlbGlnaW9uLCBvciBhZ2UuIEFzc2VzcyBjYW5kaWRhdGVzIG9uIHNraWxscyBhbmQgZXhwZXJpZW5jZS4iLAogICAgInBvbGljeV8yLnR4dCI6ICJEYXRhIFByaXZhY3kgUG9saWN5OiBDYW5kaWRhdGUgcGVyc29uYWwgZGF0YSBtdXN0IGJlIGhhbmRsZWQgcGVyIGxvY2FsIGxhd3MuIERvIG5vdCBleHBvc2UgUElJIGluIHJlcG9ydHMuIFVzZSBhZ2dyZWdhdGVkIG1ldHJpY3Mgd2hlcmUgcG9zc2libGUuIiwKICAgICJwb2xpY3lfMy50eHQiOiAiUHJvbW90aW9uIEVsaWdpYmlsaXR5OiBNaW5pbXVtIDIgeWVhcnMgaW4gcm9sZSBhbmQgZGVtb25zdHJhYmxlIGltcGFjdC4gTWFuYWdlcnMgbXVzdCBjb25zdWx0IEhSIGJlZm9yZSBwcm9tb3Rpb24gZGVjaXNpb25zLiIsCiAgICAicG
9saWN5XzQudHh0IjogIkludGVydmlldyBGZWVkYmFjayBQb2xpY3k6IEludGVydmlldyBub3RlcyBjb250YWluaW5nIHBlcnNvbmFsIG9waW5pb25zIG11c3QgYmUgZmFjdHVhbCBhbmQgYmFzZWQgb24gb2JzZXJ2ZWQgYmVoYXZpb3VyLiBBdm9pZCBzdWJqZWN0aXZlIHVuc3VwcG9ydGVkIGNsYWltcy4iCn0KZm9yIGZuYW1lLCB0eHQgaW4gcG9saWN5X3RleHRzLml0ZW1zKCk6CiAgICAocG9saWNpZXNfZGlyIC8gZm5hbWUpLndyaXRlX3RleHQodHh0KQpwcmludCgiV3JvdGUgcG9saWN5IGRvY3MgdW5kZXIiLCBwb2xpY2llc19kaXIpCgojIGNvbmZpZwpjb25maWcgPSB7CiAgICAiYXR0cml0aW9uX2NzdiI6IHN0cihkYXRhX2RpciAvICJhdHRyaXRpb25fc3ludGhldGljLmNzdiIpLAogICAgInJlc3VtZXNfZGlyIjogc3RyKHJlc3VtZXNfZGlyKSwKICAgICJwb2xpY2llc19kaXIiOiBzdHIocG9saWNpZXNfZGlyKQp9CihkYXRhX2RpciAvICJjb25maWcuanNvbiIpLndyaXRlX3RleHQoanNvbi5kdW1wcyhjb25maWcsIGluZGVudD0yKSkKcHJpbnQoIldyaXR0ZW4gY29uZmlnLmpzb24iKQ==", "index_docs.py": "IyBpbmRleF9kb2NzLnB5CiMgVXNlcyBhIHNtYWxsIHNlbnRlbmNlLXRyYW5zZm9ybWVyIG1vZGVsIGZvciBmYXN0ZXIgZW1iZWRkaW5nIGNyZWF0aW9uOyBmYWxscyBiYWNrIHRvIGEgc2xpZ2h0bHkgbGFyZ2VyIG1vZGVsIGlmIG5lZWRlZC4KaW1wb3J0IG9zLCBqc29uCmZyb20gcGF0aGxpYiBpbXBvcnQgUGF0aAoKTU9ERUxfQ0FORElEQVRFID0gInBhcmFwaHJhc2UtTWluaUxNLUwzLXYyIgpGQUxMQkFDS19NT0RFTCA9ICJhbGwtTWluaUxNLUw2LXYyIgoKQkFTRSA9IFBhdGgoX19maWxlX18pLnJlc29sdmUoKS5wYXJlbnQKREFUQV9ESVIgPSBCQVNFIC8gImRhdGEiCk1PREVMU19ESVIgPSBCQVNFIC8gIm1vZGVscyIKb3MubWFrZWRpcnMoTU9ERUxTX0RJUiwgZXhpc3Rfb2s9VHJ1ZSkKCmNvbmZpZ19wYXRoID0gREFUQV9ESVIgLyAiY29uZmlnLmpzb24iCmlmIG5vdCBjb25maWdfcGF0aC5leGlzdHMoKToKICAgIHJhaXNlIEZpbGVOb3RGb3VuZEVycm9yKCJSdW4gcHJlcGFyZV9kYXRhLnB5IGZpcnN0LiIpCmNvbmZpZyA9IGpzb24ubG9hZChvcGVuKGNvbmZpZ19wYXRoKSkKcmVzdW1lc19kaXIgPSBQYXRoKGNvbmZpZ1sicmVzdW1lc19kaXIiXSkKcG9saWNpZXNfZGlyID0gUGF0aChjb25maWdbInBvbGljaWVzX2RpciJdKQoKZG9jcyA9IFtdCm1ldGFkYXRhID0gW10KZm9yIHAgaW4gc29ydGVkKHJlc3VtZXNfZGlyLmdsb2IoIioudHh0IikpOgogICAgZG9jcy5hcHBlbmQocC5yZWFkX3RleHQoKSkKICAgIG1ldGFkYXRhLmFwcGVuZCh7InNvdXJjZSI6IHN0cihwKSwgInR5cGUiOiAicmVzdW1lIn0pCmZvciBwIGluIHNvcnRlZChwb2xpY2llc19kaXIuZ2xvYigiKi50eHQiKSk6CiAgICBkb2NzLmFwcGVuZChwLnJlYWRfdGV4dCgpKQogICAgbWV0YWRhdGEuYXBwZW5kKHsic291cmNlIjogc3RyKHApLCAid
HlwZSI6ICJwb2xpY3kifSkKCnByaW50KGYiTG9hZGVkIHtsZW4oZG9jcyl9IGRvY3MgaW50byBjb3JwdXMiKQoKdHJ5OgogICAgZnJvbSBzZW50ZW5jZV90cmFuc2Zvcm1lcnMgaW1wb3J0IFNlbnRlbmNlVHJhbnNmb3JtZXIKICAgIGltcG9ydCBudW1weSBhcyBucCwgam9ibGliCiAgICAjIFRyeSB0aGUgc21hbGxlciBjYW5kaWRhdGUgbW9kZWwgZmlyc3QgZm9yIHNwZWVkCiAgICBtb2RlbCA9IFNlbnRlbmNlVHJhbnNmb3JtZXIoTU9ERUxfQ0FORElEQVRFKQogICAgZW1icyA9IG1vZGVsLmVuY29kZShkb2NzLCBzaG93X3Byb2dyZXNzX2Jhcj1GYWxzZSkKICAgIGpvYmxpYi5kdW1wKHsgImVtYmVkZGluZ3MiOiBlbWJzLCAiZG9jcyI6IGRvY3MsICJtZXRhZGF0YSI6IG1ldGFkYXRhIH0sIE1PREVMU19ESVIgLyAiZG9jX2luZGV4LmpvYmxpYiIpCiAgICBwcmludCgiU2F2ZWQgZGVuc2UgZW1iZWRkaW5ncyBpbmRleCB3aXRoIG1vZGVsIiwgTU9ERUxfQ0FORElEQVRFKQpleGNlcHQgRXhjZXB0aW9uIGFzIGU6CiAgICBwcmludCgiRmFpbGVkIHRvIGxvYWQgc21hbGwgbW9kZWwgKG9yIGVuY29kaW5nIGVycm9yKToiLCBlKQogICAgdHJ5OgogICAgICAgIHByaW50KCJGYWxsaW5nIGJhY2sgdG8iLCBGQUxMQkFDS19NT0RFTCkKICAgICAgICBmcm9tIHNlbnRlbmNlX3RyYW5zZm9ybWVycyBpbXBvcnQgU2VudGVuY2VUcmFuc2Zvcm1lcgogICAgICAgIG1vZGVsID0gU2VudGVuY2VUcmFuc2Zvcm1lcihGQUxMQkFDS19NT0RFTCkKICAgICAgICBlbWJzID0gbW9kZWwuZW5jb2RlKGRvY3MsIHNob3dfcHJvZ3Jlc3NfYmFyPUZhbHNlKQogICAgICAgIGltcG9ydCBqb2JsaWIKICAgICAgICBqb2JsaWIuZHVtcCh7ICJlbWJlZGRpbmdzIjogZW1icywgImRvY3MiOiBkb2NzLCAibWV0YWRhdGEiOiBtZXRhZGF0YSB9LCBNT0RFTFNfRElSIC8gImRvY19pbmRleC5qb2JsaWIiKQogICAgICAgIHByaW50KCJTYXZlZCBkZW5zZSBlbWJlZGRpbmdzIGluZGV4IHdpdGggbW9kZWwiLCBGQUxMQkFDS19NT0RFTCkKICAgIGV4Y2VwdCBFeGNlcHRpb24gYXMgZTI6CiAgICAgICAgcHJpbnQoIkRlbnNlIGVtYmVkZGluZyBmYWlsZWQ7IGZhbGxpbmcgYmFjayB0byBURi1JREYiLCBlMikKICAgICAgICBmcm9tIHNrbGVhcm4uZmVhdHVyZV9leHRyYWN0aW9uLnRleHQgaW1wb3J0IFRmaWRmVmVjdG9yaXplcgogICAgICAgIGZyb20gc2tsZWFybi5uZWlnaGJvcnMgaW1wb3J0IE5lYXJlc3ROZWlnaGJvcnMKICAgICAgICBpbXBvcnQgam9ibGliCiAgICAgICAgdmVjID0gVGZpZGZWZWN0b3JpemVyKG1heF9mZWF0dXJlcz0yMDAwKQogICAgICAgIFggPSB2ZWMuZml0X3RyYW5zZm9ybShkb2NzKQogICAgICAgIG5uID0gTmVhcmVzdE5laWdoYm9ycyhuX25laWdoYm9ycz01LCBtZXRyaWM9J2Nvc2luZScpLmZpdChYKQogICAgICAgIGpvYmxpYi5kdW1wKHsgInZlY3Rvcml6ZXIiOiB2ZWMsICJubiI6IG5uLCAiZG9jcyI6IGRvY3MsICJtZXRhZGF0YSI6IG1ldGFkY
XRhIH0sIE1PREVMU19ESVIgLyAiZG9jX2luZGV4X3RmaWRmLmpvYmxpYiIpCiAgICAgICAgcHJpbnQoIlNhdmVkIFRGLUlERiBpbmRleCIp", "train_classifier.py": "IyB0cmFpbl9jbGFzc2lmaWVyLnB5CmZyb20gcGF0aGxpYiBpbXBvcnQgUGF0aAppbXBvcnQgcGFuZGFzIGFzIHBkLCBqb2JsaWIsIG9zCmZyb20gc2tsZWFybi5lbnNlbWJsZSBpbXBvcnQgUmFuZG9tRm9yZXN0Q2xhc3NpZmllcgpmcm9tIHNrbGVhcm4ubW9kZWxfc2VsZWN0aW9uIGltcG9ydCB0cmFpbl90ZXN0X3NwbGl0CmZyb20gc2tsZWFybi5tZXRyaWNzIGltcG9ydCBhY2N1cmFjeV9zY29yZSwgZjFfc2NvcmUsIGNsYXNzaWZpY2F0aW9uX3JlcG9ydAoKQkFTRSA9IFBhdGgoX19maWxlX18pLnJlc29sdmUoKS5wYXJlbnQKREFUQV9ESVIgPSBCQVNFIC8gImRhdGEiCk1PREVMU19ESVIgPSBCQVNFIC8gIm1vZGVscyIKb3MubWFrZWRpcnMoTU9ERUxTX0RJUiwgZXhpc3Rfb2s9VHJ1ZSkKCmRmID0gcGQucmVhZF9jc3YoREFUQV9ESVIgLyAiYXR0cml0aW9uX3N5bnRoZXRpYy5jc3YiKQpkZjIgPSBwZC5nZXRfZHVtbWllcyhkZiwgY29sdW1ucz1bIkdlbmRlciIsICJFZHVjYXRpb24iXSwgZHJvcF9maXJzdD1UcnVlKQpYID0gZGYyLmRyb3AoY29sdW1ucz1bIkF0dHJpdGlvbiJdKQp5ID0gZGYyWyJBdHRyaXRpb24iXQoKWF90cmFpbiwgWF90ZXN0LCB5X3RyYWluLCB5X3Rlc3QgPSB0cmFpbl90ZXN0X3NwbGl0KFgsIHksIHRlc3Rfc2l6ZT0wLjIsIHJhbmRvbV9zdGF0ZT00Miwgc3RyYXRpZnk9eSkKCmNsZiA9IFJhbmRvbUZvcmVzdENsYXNzaWZpZXIobl9lc3RpbWF0b3JzPTEwMCwgcmFuZG9tX3N0YXRlPTQyKQpjbGYuZml0KFhfdHJhaW4sIHlfdHJhaW4pCnByZWRzID0gY2xmLnByZWRpY3QoWF90ZXN0KQpwcmludCgiQWNjOiIsIGFjY3VyYWN5X3Njb3JlKHlfdGVzdCwgcHJlZHMpLCAiRjE6IiwgZjFfc2NvcmUoeV90ZXN0LCBwcmVkcykpCmpvYmxpYi5kdW1wKHsgIm1vZGVsIjogY2xmLCAiZmVhdHVyZXMiOiBsaXN0KFguY29sdW1ucykgfSwgTU9ERUxTX0RJUiAvICJhdHRyaXRpb25fcmYuam9ibGliIikKcHJpbnQoIlNhdmVkIG1vZGVsIik=", "audit_logger.py": 
"IyBhdWRpdF9sb2dnZXIucHkKaW1wb3J0IGpzb24sIG9zLCB0aW1lCmZyb20gcGF0aGxpYiBpbXBvcnQgUGF0aApCQVNFID0gUGF0aChfX2ZpbGVfXykucmVzb2x2ZSgpLnBhcmVudApPVVRfRElSID0gQkFTRSAvICJvdXRwdXQiCm9zLm1ha2VkaXJzKE9VVF9ESVIsIGV4aXN0X29rPVRydWUpCkxPR19GSUxFID0gT1VUX0RJUiAvICJhdWRpdF9sb2cuanNvbmwiCmRlZiBsb2coZW50cnk6IGRpY3QpOgogICAgZW50cnkgPSBkaWN0KGVudHJ5KQogICAgZW50cnlbJ3RzJ10gPSB0aW1lLnN0cmZ0aW1lKCclWS0lbS0lZCAlSDolTTolUycsIHRpbWUubG9jYWx0aW1lKCkpCiAgICB3aXRoIG9wZW4oTE9HX0ZJTEUsICdhJykgYXMgZjoKICAgICAgICBmLndyaXRlKGpzb24uZHVtcHMoZW50cnkpICsgJ1xuJykKICAgIHByaW50KCJMb2dnZWQgYXVkaXQgZW50cnkgdG8iLCBMT0dfRklMRSk=", "rag_query.py": "IyByYWdfcXVlcnkucHkKIyBVc2VzIHNhbWUgc21hbGwtbW9kZWwgY2FuZGlkYXRlIGFzIGluZGV4X2RvY3M7IHRyaWVzIHNtYWxsIG1vZGVsIHRoZW4gZmFsbGJhY2suCmltcG9ydCBvcywgam9ibGliLCBqc29uLCBhcmdwYXJzZQpmcm9tIHBhdGhsaWIgaW1wb3J0IFBhdGgKZnJvbSBhdWRpdF9sb2dnZXIgaW1wb3J0IGxvZwoKTU9ERUxfQ0FORElEQVRFID0gInBhcmFwaHJhc2UtTWluaUxNLUwzLXYyIgpGQUxMQkFDS19NT0RFTCA9ICJhbGwtTWluaUxNLUw2LXYyIgoKQkFTRSA9IFBhdGgoX19maWxlX18pLnJlc29sdmUoKS5wYXJlbnQKTU9ERUxTX0RJUiA9IEJBU0UgLyAibW9kZWxzIgpEQVRBX0RJUiA9IEJBU0UgLyAiZGF0YSIKCnBhcnNlciA9IGFyZ3BhcnNlLkFyZ3VtZW50UGFyc2VyKCkKcGFyc2VyLmFkZF9hcmd1bWVudCgnLS1yZXN1bWVfcGF0aCcsIHR5cGU9c3RyLCBkZWZhdWx0PXN0cihEQVRBX0RJUiAvICdyZXN1bWVzJyAvICdyZXN1bWVfMS50eHQnKSkKcGFyc2VyLmFkZF9hcmd1bWVudCgnLS10b3BrJywgdHlwZT1pbnQsIGRlZmF1bHQ9MykKYXJncyA9IHBhcnNlci5wYXJzZV9hcmdzKCkKCnJlc3VtZV90ZXh0ID0gUGF0aChhcmdzLnJlc3VtZV9wYXRoKS5yZWFkX3RleHQoKQpwcmludCgiTG9hZGVkIHJlc3VtZToiLCBhcmdzLnJlc3VtZV9wYXRoKQoKcmV0cmlldmVkID0gW10KaWYgKE1PREVMU19ESVIgLyAiZG9jX2luZGV4LmpvYmxpYiIpLmV4aXN0cygpOgogICAgaWR4ID0gam9ibGliLmxvYWQoTU9ERUxTX0RJUiAvICJkb2NfaW5kZXguam9ibGliIikKICAgIGltcG9ydCBudW1weSBhcyBucAogICAgZnJvbSBudW1weS5saW5hbGcgaW1wb3J0IG5vcm0KICAgIHRyeToKICAgICAgICBmcm9tIHNlbnRlbmNlX3RyYW5zZm9ybWVycyBpbXBvcnQgU2VudGVuY2VUcmFuc2Zvcm1lcgogICAgICAgICMgVHJ5IHRvIHVzZSB0aGUgc21hbGwgbW9kZWwgZm9yIHF1ZXJ5IGVuY29kaW5nIChmYXN0KQogICAgICAgIHRyeToKICAgICAgICAgICAgbW9kZWwgPSBTZW50ZW5jZVRyYW5zZm9ybWVyKE1PREVMX0NBTkRJREF
URSkKICAgICAgICBleGNlcHQgRXhjZXB0aW9uOgogICAgICAgICAgICBtb2RlbCA9IFNlbnRlbmNlVHJhbnNmb3JtZXIoRkFMTEJBQ0tfTU9ERUwpCiAgICAgICAgcV9lbWIgPSBtb2RlbC5lbmNvZGUoW3Jlc3VtZV90ZXh0XSlbMF0KICAgICAgICBlbWJzID0gaWR4WydlbWJlZGRpbmdzJ10KICAgICAgICBzY29yZXMgPSAoZW1icyBAIHFfZW1iKSAvICgobm9ybShlbWJzLCBheGlzPTEpICogbm9ybShxX2VtYikpICsgMWUtOCkKICAgICAgICB0b3BrX2lkeCA9IGxpc3Qoc2NvcmVzLmFyZ3NvcnQoKVstYXJncy50b3BrOl1bOjotMV0pCiAgICAgICAgcmV0cmlldmVkID0gW3sgInNjb3JlIjogZmxvYXQoc2NvcmVzW2ldKSwgInRleHQiOiBpZHhbJ2RvY3MnXVtpXSwgIm1ldGEiOiBpZHhbJ21ldGFkYXRhJ11baV0gfSBmb3IgaSBpbiB0b3BrX2lkeF0KICAgIGV4Y2VwdCBFeGNlcHRpb24gYXMgZToKICAgICAgICBwcmludCgiRmFpbGVkIHRvIHJlLWVuY29kZSBxdWVyeSB3aXRoIHNlbnRlbmNlLXRyYW5zZm9ybWVyczoiLCBlKQogICAgICAgIHJldHJpZXZlZCA9IFtdCmVsaWYgKE1PREVMU19ESVIgLyAiZG9jX2luZGV4X3RmaWRmLmpvYmxpYiIpLmV4aXN0cygpOgogICAgaWR4ID0gam9ibGliLmxvYWQoTU9ERUxTX0RJUiAvICJkb2NfaW5kZXhfdGZpZGYuam9ibGliIikKICAgIHZlY3Rvcml6ZXIgPSBpZHhbJ3ZlY3Rvcml6ZXInXQogICAgbm4gPSBpZHhbJ25uJ10KICAgIHF2ID0gdmVjdG9yaXplci50cmFuc2Zvcm0oW3Jlc3VtZV90ZXh0XSkKICAgIGRpc3RzLCBpZHMgPSBubi5rbmVpZ2hib3JzKHF2LCBuX25laWdoYm9ycz1hcmdzLnRvcGspCiAgICByZXRyaWV2ZWQgPSBbXQogICAgZm9yIGksIGRpc3QgaW4gemlwKGlkc1swXSwgZGlzdHNbMF0pOgogICAgICAgIHJldHJpZXZlZC5hcHBlbmQoeyAic2NvcmUiOiBmbG9hdCgxIC0gZGlzdCksICJ0ZXh0IjogaWR4Wydkb2NzJ11baV0sICJtZXRhIjogaWR4WydtZXRhZGF0YSddW2ldIH0pCmVsc2U6CiAgICBwcmludCgiTm8gaW5kZXggZm91bmQuIFJ1biBpbmRleF9kb2NzLnB5IGZpcnN0LiIpCiAgICByZXRyaWV2ZWQgPSBbXQoKcHJpbnQoIlxuVG9wIHJldHJpZXZlZCBkb2NzIChzb3VyY2UsIHNjb3JlKToiKQpmb3IgciBpbiByZXRyaWV2ZWQ6CiAgICBwcmludCgiLSIsIHJbJ21ldGEnXVsnc291cmNlJ10sIGYiKHNjb3JlPXtyWydzY29yZSddOi4zZn0pIikKCiMgU2ltcGxlIHRlbXBsYXRlIGFuc3dlciAobm8gT3BlbkFJIHJlcXVpcmVkKQpzb3VyY2VzX2NpdGVkID0gW3JbJ21ldGEnXVsnc291cmNlJ10gZm9yIHIgaW4gcmV0cmlldmVkXQp0ZXh0ID0gcmVzdW1lX3RleHQubG93ZXIoKQppZiAnbWFjaGluZSBsZWFybmluZycgaW4gdGV4dCBvciAnZGF0YScgaW4gdGV4dCBvciAnbWwnIGluIHRleHQ6CiAgICByZWMgPSAiQ29uc2lkZXIgZm9yIERhdGEvTUwgcm9sZSDigJQgc3Ryb25nIHJlbGV2YW50IHNraWxscy4iCmVsc2U6CiAgICByZWMgPSAiQ29uc2lkZXIgd2l0aCBjYXV
0aW9uIOKAlCBpbnN1ZmZpY2llbnQgZG9tYWluLXNwZWNpZmljIHNraWxscy4iCnJlYXNvbnMgPSAiUmVjb21tZW5kYXRpb24gYmFzZWQgb24gc2tpbGxzIG1lbnRpb25lZCBpbiByZXN1bWUgYW5kIEVxdWFsIE9wcG9ydHVuaXR5IHBvbGljeS4iCmFuc3dlciA9IGYiUmVjb21tZW5kYXRpb246IHtyZWN9XG5SZWFzb25zOiB7cmVhc29uc31cbkNpdGVkIHNvdXJjZXM6IHsnLCAnLmpvaW4oc291cmNlc19jaXRlZCl9IgpwcmludCgiXG4tLS0gUkFHLXN0eWxlIEFuc3dlciAtLS1cbiIpCnByaW50KGFuc3dlcikKbG9nKHsgInF1ZXJ5IjogcmVzdW1lX3RleHRbOjIwMF0sICJyZXRyaWV2ZWQiOiBzb3VyY2VzX2NpdGVkLCAiYW5zd2VyX3N1bW1hcnkiOiBhbnN3ZXIuc3BsaXRsaW5lcygpWzBdIH0p", "eval_pipeline.py": "IyBldmFsX3BpcGVsaW5lLnB5CmltcG9ydCBvcwpmcm9tIHBhdGhsaWIgaW1wb3J0IFBhdGgKQkFTRSA9IFBhdGgoX19maWxlX18pLnJlc29sdmUoKS5wYXJlbnQKREFUQV9ESVIgPSBCQVNFIC8gImRhdGEiCk1PREVMU19ESVIgPSBCQVNFIC8gIm1vZGVscyIKCiMgY2xhc3NpZmljYXRpb24gZXZhbAppZiAoTU9ERUxTX0RJUiAvICJhdHRyaXRpb25fcmYuam9ibGliIikuZXhpc3RzKCk6CiAgICBpbXBvcnQgam9ibGliLCBwYW5kYXMgYXMgcGQKICAgIG0gPSBqb2JsaWIubG9hZChNT0RFTFNfRElSIC8gImF0dHJpdGlvbl9yZi5qb2JsaWIiKQogICAgbW9kZWwgPSBtWydtb2RlbCddOyBmZWF0dXJlcyA9IG1bJ2ZlYXR1cmVzJ10KICAgIGRmID0gcGQucmVhZF9jc3YoREFUQV9ESVIgLyAiYXR0cml0aW9uX3N5bnRoZXRpYy5jc3YiKQogICAgZGYyID0gcGQuZ2V0X2R1bW1pZXMoZGYsIGNvbHVtbnM9WyJHZW5kZXIiLCAiRWR1Y2F0aW9uIl0sIGRyb3BfZmlyc3Q9VHJ1ZSkKICAgIFggPSBkZjIucmVpbmRleChjb2x1bW5zPWZlYXR1cmVzLCBmaWxsX3ZhbHVlPTApOyB5ID0gZGYyWyJBdHRyaXRpb24iXQogICAgcHJlZHMgPSBtb2RlbC5wcmVkaWN0KFgpCiAgICBmcm9tIHNrbGVhcm4ubWV0cmljcyBpbXBvcnQgYWNjdXJhY3lfc2NvcmUsIGYxX3Njb3JlCiAgICBwcmludCgiQWNjOiIsIGFjY3VyYWN5X3Njb3JlKHksIHByZWRzKSwgIkYxOiIsIGYxX3Njb3JlKHksIHByZWRzKSkKZWxzZToKICAgIHByaW50KCJObyBtb2RlbCBmb3VuZC4gUnVuIHRyYWluX2NsYXNzaWZpZXIucHkiKQoKIyByZXRyaWV2YWwgZGVtbyBmb3IgcmVzdW1lcwpyZXN1bWVzID0gc29ydGVkKChEQVRBX0RJUiAvICJyZXN1bWVzIikuZ2xvYigiKi50eHQiKSkKZm9yIHIgaW4gcmVzdW1lczoKICAgIHByaW50KCJcbi0tLSIsIHIubmFtZSwgIi0tLSIpCiAgICBvcy5zeXN0ZW0oZidweXRob24gcmFnX3F1ZXJ5LnB5IC0tcmVzdW1lX3BhdGggIntyfSIgLS10b3BrIDInKQ==", "paper_outline.md": 
"UGFwZXIgb3V0bGluZSAoc3VnZ2VzdGVkKSAtICJCdWlsZGluZyBFdGhpY2FsIEFJIGluIEh1bWFuIFJlc291cmNlIE1hbmFnZW1lbnQ6IEFuIEVuZC10by1FbmQgR292ZXJuYW5jZSBQT0MiCjEuIEFic3RyYWN0CjIuIEludHJvZHVjdGlvbgozLiBSZWxhdGVkIFdvcmsKNC4gU3lzdGVtIERlc2lnbgo1LiBFeHBlcmltZW50cwo2LiBSZXN1bHRzCjcuIERpc2N1c3Npb24KOC4gQ29uY2x1c2lvbgo=", "requirements.txt": "c2VudGVuY2UtdHJhbnNmb3JtZXJzPT0yLjIuMgpzY2lraXQtbGVhcm4KcGFuZGFzCmpvYmxpYgpvcGVuYWkK"}

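# Decode each embedded script and write it into the working directory.
for fname, b64 in files_b64.items():
    data = base64.b64decode(b64).decode('utf-8')
    p = BASE / fname
    # Write as UTF-8 explicitly; rag_query.py contains non-ASCII characters.
    p.write_text(data, encoding='utf-8')
    print('Wrote', p)

print('All files written to', BASE)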


Wrote /content/hr_poc/README.md
Wrote /content/hr_poc/prepare_data.py
Wrote /content/hr_poc/index_docs.py
Wrote /content/hr_poc/train_classifier.py
Wrote /content/hr_poc/audit_logger.py
Wrote /content/hr_poc/rag_query.py
Wrote /content/hr_poc/eval_pipeline.py
Wrote /content/hr_poc/paper_outline.md
Wrote /content/hr_poc/requirements.txt
All files written to /content/hr_poc
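
Because the scripts are stored as base64 strings, they cannot be read directly in the cell above. The sketch below shows one way to preview an embedded file without writing it to disk. `preview` and `demo_b64` are hypothetical; the stand-in dictionary only exists so the example runs on its own, and in the notebook you would pass `files_b64` instead.

```python
import base64


def preview(files: dict, name: str, max_lines: int = 5) -> list:
    """Decode one base64-encoded file and return its first `max_lines` lines."""
    text = base64.b64decode(files[name]).decode("utf-8")
    return text.splitlines()[:max_lines]


# Stand-in data so the example is self-contained.
demo_b64 = {
    "hello.py": base64.b64encode(b"# hello.py\nprint('hi')\n").decode("ascii"),
}
for line in preview(demo_b64, "hello.py"):
    print(line)
```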


In [3]:
# Run the pipeline: prepare data, build index, train classifier, demo RAG query, and evaluation.
import os
os.chdir('/content/hr_poc')
print('Running prepare_data.py ...')
!python prepare_data.py
print('\nRunning index_docs.py ...')
!python index_docs.py
print('\nRunning train_classifier.py ...')
!python train_classifier.py
print('\nRunning rag_query.py demo ...')
!python rag_query.py --resume_path data/resumes/resume_1.txt
print('\nRunning eval_pipeline.py ...')
!python eval_pipeline.py
print('\nDone. Check /content/hr_poc/output and /content/hr_poc/models for artifacts.')

Running prepare_data.py ...
Wrote synthetic attrition dataset: /content/hr_poc/data/attrition_synthetic.csv
Wrote sample resumes under /content/hr_poc/data/resumes
Wrote policy docs under /content/hr_poc/data/policy_docs
Written config.json

Running index_docs.py ...
Loaded 14 docs into corpus
Failed to load small model (or encoding error): cannot import name 'cached_download' from 'huggingface_hub' (/usr/local/lib/python3.12/dist-packages/huggingface_hub/__init__.py)
Falling back to all-MiniLM-L6-v2
Dense embedding failed; falling back to TF-IDF cannot import name 'cached_download' from 'huggingface_hub' (/usr/local/lib/python3.12/dist-packages/huggingface_hub/__init__.py)
Saved TF-IDF index

Running train_classifier.py ...
Acc: 0.79 F1: 0.0
Saved model

Running rag_query.py demo ...
Loaded resume: data/resumes/resume_1.txt

Top retrieved docs (source, score):
- /content/hr_poc/data/resumes/resume_1.txt (score=1.000)
- /content/hr_poc/data/resumes/resume_3.txt (score=0.870)
- /content
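
The `Acc: 0.79 F1: 0.0` line printed by `train_classifier.py` is the pattern an imbalanced dataset produces when a classifier never correctly predicts the positive class. The sketch below reproduces those two numbers under stated assumptions: a test split of 100 rows with 21 positives and a model that predicts 0 for every row. These counts were chosen to match the printed accuracy and are not taken from the run itself.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and positive-class F1 for binary labels (F1 is 0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = (2 * tp / denom) if denom else 0.0
    return acc, f1


# Assumed labels: 21 leavers, 79 stayers; the model always predicts "stays".
y_true = [1] * 21 + [0] * 79
y_pred = [0] * 100
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"Acc: {acc} F1: {f1}")
```

If this is what happened in the run, accuracy alone overstates the model's usefulness; inspecting the confusion matrix would confirm it.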

## Optional: use OpenAI API for LLM-generated answers

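To experiment with LLM-generated answers, add your OpenAI API key to the environment before running the demo. Note that the bundled `rag_query.py` builds its answer from a fixed template and does not read the key, so the script would need to be extended before the key has any effect. Do **not** share your key. Example: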

```python
from getpass import getpass
import os
os.environ['OPENAI_API_KEY'] = getpass('OpenAI API key: ')
```
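
A script that supports both answer paths has to decide which one to take. A minimal sketch of that decision follows; `answer_mode` is a hypothetical helper and not part of the bundled scripts. It treats an empty or whitespace-only key as missing.

```python
import os


def answer_mode(env=None) -> str:
    """Return 'llm' when a non-empty OPENAI_API_KEY is set, else 'template'."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    return "llm" if key else "template"


print("Answer mode:", answer_mode())
```

Passing a plain dictionary as `env` makes the function easy to exercise without touching the real environment.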

In [None]:
print('Notebook created. Run the cells in order to execute the POC.')

In [None]:
import shutil
shutil.make_archive('/content/hr_poc_artifacts', 'zip', '/content/hr_poc')
print('Created /content/hr_poc_artifacts.zip')
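
To confirm what ended up in the archive, the member names can be listed with the standard `zipfile` module. The sketch below runs on a throwaway directory rather than `/content/hr_poc`; `archive_and_list` is a hypothetical helper. The archive is written outside the source directory so that it does not include itself.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def archive_and_list(src_dir, archive_base):
    """Zip `src_dir` to `<archive_base>.zip` and return the sorted member names."""
    archive_path = shutil.make_archive(str(archive_base), "zip", root_dir=str(src_dir))
    with zipfile.ZipFile(archive_path) as zf:
        return sorted(zf.namelist())


# Demo on a temporary directory standing in for the POC folder.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "README.md").write_text("demo", encoding="utf-8")
    names = archive_and_list(src, Path(tmp) / "artifacts")
    print(names)
```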