# Example notebook with secrets and PII

## Secrets

A common vulnerability in Jupyter notebooks is improper handling of secrets. Secrets are often pasted directly into notebook cells, and because notebook contents typically do not pass through code review, they can end up on GitHub or other source code hosting platforms.

This notebook is an example that demonstrates the vulnerability.

In [4]:
OPEN_AI_KEY = "sk-bBP_fhSH9cT_G5q-CtWtYqzIkuNS3A66c2jEasj0rRT3BlbkFJX1loYRP9p5kVwjpwbN65_LGV_9sUtM3wSOMFOmN-4A"  # Not  real key - shuffled around


In [3]:
import requests

def make_openai_call(prompt: str, api_key: str) -> str:
    url = "https://api.openai.com/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    payload = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7
    }
    
    try:
        # A timeout keeps the cell from hanging indefinitely if the API is unreachable.
        response = requests.post(url, headers=headers, json=payload, timeout=30)
        response.raise_for_status()
        data = response.json()
        return data["choices"][0]["message"]["content"].strip()
    except requests.RequestException as e:
        return f"An error occurred while requesting the OpenAI API: {e}"
    except KeyError as e:
        return f"Unexpected response structure: {e}"



In [3]:
make_openai_call(
    prompt = "how much wood would a woodchuck chuck if a woodchuck could chuck wood",
    api_key=OPEN_AI_KEY)

"The classic tongue twister suggests that a woodchuck would chuck as much wood as a woodchuck could chuck if it could chuck wood! In a playful interpretation, it's often humorously stated that a woodchuck could chuck about 700 pounds of wood, based on various whimsical calculations. However, in reality, woodchucks (also known as groundhogs) don't actually chuck wood at all!"
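A safer pattern keeps the key out of the notebook entirely. The sketch below assumes the key has been exported in an environment variable named `OPENAI_API_KEY`; both that variable name and the `load_api_key` helper are illustrative and not part of the original notebook. It fails loudly when the variable is missing instead of falling back to a hardcoded value.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var` (hypothetical helper).

    The variable name is an assumption for this sketch; use whatever
    name your environment or secret manager actually provides.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {env_var} is not set; "
            "export it in your shell rather than pasting the key into a cell."
        )
    return key
```

With this in place, the call above could pass `api_key=load_api_key()` so that the key never appears in the saved notebook.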

---

## PII

Another common issue is that PII is downloaded and then used or stored irresponsibly.

#### Download an example CSV with PII

In [9]:
requests.get("https://raw.githubusercontent.com/tokern/piicatcher/master/tests/samples/sample-data.csv").text

'id,gender,birthdate,maiden_name,lname,fname,address,city,state,zip,phone,email,cc_type,cc_number,cc_cvc,cc_expiredate\r\n172-32-1176,m,1958/04/21,Smith,White,Johnson,10932 Bigge Rd,Menlo Park,CA,94025,408 496-7223,jwhite@domain.com,m,5270 4267 6450 5516,123,2010/06/25\r\n514-14-8905,f,1944/12/22,Amaker,Borden,Ashley,4469 Sherman Street,Goff,KS,66428,785-939-6046,aborden@domain.com,m,5370 4638 8881 3020,713,2011/02/01\r\n213-46-8915,f,1958/04/21,Pinson,Green,Marjorie,309 63rd St. #411,Oakland,CA,94618,415 986-7020,mgreen@domain.com,v,4916 9766 5240 6147,258,2009/02/25\r\n524-02-7657,m,1962/03/25,Hall,Munsch,Jerome,2183 Roy Alley,Centennial,CO,80112,303-901-6123,jmunsch@domain.com,m,5180 3807 3679 8221,612,2010/03/01\r\n489-36-8350,m,1964/09/06,Porter,Aragon,Robert,3181 White Oak Drive,Kansas City,MO,66215,816-645-6936,raragon@domain.com,v,4929 3813 3266 4295,911,2011/12/01\r\n514-30-2668,f,1986/05/27,Nicholson,Russell,Jacki,3097 Better Street,Kansas City,MO,66215,913-227-6106,jrussell@
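One way to avoid leaving raw PII in a cell output like the one above is to mask sensitive columns before displaying anything. The following is a minimal sketch, not a complete anonymization scheme: the `redact_csv` and `mask_value` helpers are hypothetical, and the choice of which columns count as sensitive is an assumption based on the header of the sample file.

```python
import csv
import io
import re

# Column names come from the sample CSV header; treating exactly these
# as sensitive is an assumption made for this sketch.
SENSITIVE_COLUMNS = {
    "id", "birthdate", "maiden_name", "address", "phone",
    "email", "cc_number", "cc_cvc", "cc_expiredate",
}


def mask_value(value: str) -> str:
    """Replace every ASCII letter and digit with '*', keeping punctuation and spacing."""
    return re.sub(r"[A-Za-z0-9]", "*", value)


def redact_csv(text: str) -> str:
    """Return the CSV text with the sensitive columns masked (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        cleaned = {}
        for name in fieldnames:
            value = row.get(name) or ""  # short or truncated rows yield None
            cleaned[name] = mask_value(value) if name in SENSITIVE_COLUMNS else value
        writer.writerow(cleaned)
    return out.getvalue()
```

Passing the downloaded text through `redact_csv` before it becomes a cell output keeps values such as the `id` and `cc_number` columns out of the saved notebook, although the unmasked data still exists in memory and at the source URL.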

The cell below contains a commented-out block (a bare string literal) that references `MustEr/gpt2-elite` but never actually loads it.

In [2]:
"""
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "MustEr/gpt2-elite"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
"""

'\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\n\nmodel_name = "MustEr/gpt2-elite"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForCausalLM.from_pretrained(model_name)\n'

The same applies to `microsoft/deberta-v3-xsmall`.

In [3]:
"""
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "microsoft/deberta-v3-xsmall"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
"""

'\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\nmodel_name = "microsoft/deberta-v3-xsmall"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForSequenceClassification.from_pretrained(model_name)\n'