# Updating Quality Docs from Emails

I am one of only two employees of a medical device startup, so I run every function: manufacturing, software, bench testing, clinical testing, and more. I make progress in all of these areas primarily by emailing with contractors. There are two types of artifacts I need to maintain:

* To keep track of everything, I keep a set of *detailed logs, one for each function*. I've been populating these by
    1. cutting and pasting emails into a chatbot to get a cleaned-up text transcript of the conversation, then
    2. pasting that transcript into the relevant log (a Google Doc with a heading for each date).
* I also keep a set of informal but foundational *quality documents: risk register, decision log, requirements list*. These are currently Google Sheets. To maintain them, I've been feeding the logs to an OpenAI custom GPT, which compares recent logs to the existing docs and proposes new entries. I then review the proposals myself and change their status to accepted.

Each step has been manual and error-prone, and the automated bits were vibe-coded and obscure.

What I *want* is for the _system_ to read my email every night, update the logs, and write proposed entries into the risk/requirements/decision registers for me to review -- from code I wrote and understand. Let's make it happen!

## Google Cloud Deployment Guide

#### Overview
This system automatically fetches emails from Gmail, cleans them using Claude AI, and updates Google Docs logs daily at 3:00 AM Central Time.

#### Architecture
- **Cloud Run Function**: Serverless function that processes emails and updates docs
- **Cloud Scheduler**: Triggers the function daily at 3:00 AM CT
- **Service Account**: `docs-updater@qa-assistant-458920.iam.gserviceaccount.com` with domain-wide delegation
- **Secret Manager**: Stores Anthropic API key securely

---

#### Part 1: Project Setup

##### 1.1 Create/Select Google Cloud Project
1. Go to https://console.cloud.google.com
2. Create new project or select existing: `qa-assistant-458920`
3. Note your project ID for later steps

##### 1.2 Enable Billing
1. Navigate to **Billing** in Cloud Console
2. Create or link a billing account
3. Required even for free tier usage
4. Set budget alert at $5.00 (warning only, doesn't cap charges)

##### 1.3 Enable Required APIs
Enable these APIs in **APIs & Services** → **Library**:
- Cloud Run API
- Cloud Scheduler API
- Secret Manager API
- Gmail API
- Google Docs API
- IAM Service Account Credentials API

---

#### Part 2: Service Account Configuration

##### 2.1 Create Service Account
1. Go to **IAM & Admin** → **Service Accounts**
2. Click **CREATE SERVICE ACCOUNT**
3. Name: `docs-updater`
4. Email will be: `docs-updater@qa-assistant-458920.iam.gserviceaccount.com`
5. Click **CREATE AND CONTINUE**
6. Skip role assignment for now
7. Click **DONE**

##### 2.2 Grant Service Account Token Creator Role
**Why this is needed**: The service account needs to sign JWT tokens as itself for domain-wide delegation without downloading private keys. This approach keeps keys within Google's infrastructure and complies with security best practices.

**Reference**: [Using domain-wide delegation without service account keys](https://jpassing.com/2022/01/15/using-domain-wide-delegation-on-google-cloud-without-service-account-keys/index.html)

1. Go to **IAM & Admin** → **Service Accounts**
2. Click on `docs-updater@qa-assistant-458920.iam.gserviceaccount.com`
3. Click **PERMISSIONS** tab
4. Click **Manage service account permissions**
5. Add principal: `docs-updater@qa-assistant-458920.iam.gserviceaccount.com` (yes, itself)
6. Assign role: **Service Account Token Creator**
7. Leave IAM conditions blank
8. Click **SAVE**
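
To make the self-signing step concrete, here is a minimal sketch of the claim set a service account signs when it requests a delegated token. The mailbox address `someone@example.com` and the helper name `delegation_claims` are hypothetical. In the deployed function the signing itself would be done remotely through the IAM Credentials API, which is why no private key needs to be downloaded.

```python
import time

TOKEN_URI = "https://oauth2.googleapis.com/token"
SA_EMAIL = "docs-updater@qa-assistant-458920.iam.gserviceaccount.com"

def delegation_claims(subject, scopes, now=None, lifetime_s=3600):
    """Build the JWT claim set for a domain-wide-delegation token request."""
    now = int(time.time()) if now is None else int(now)
    return {
        "iss": SA_EMAIL,            # the service account issues the assertion...
        "sub": subject,             # ...on behalf of this Workspace user
        "scope": " ".join(scopes),  # space-delimited here (commas are only for the Admin console form)
        "aud": TOKEN_URI,
        "iat": now,
        "exp": now + lifetime_s,    # Google allows at most one hour
    }

claims = delegation_claims(
    "someone@example.com",  # hypothetical mailbox owner
    ["https://www.googleapis.com/auth/gmail.readonly"],
    now=1_700_000_000,
)
print(claims)
```

The `sub` claim is what makes this delegation rather than plain service-account access, and it only works for scopes authorized in the Workspace Admin Console.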

##### 2.3 Configure Domain-Wide Delegation
1. Go to Google Workspace Admin Console: https://admin.google.com
2. Navigate to **Security** → **Access and data control** → **API Controls**
3. Click **MANAGE DOMAIN WIDE DELEGATION**
4. Click **Add new**
5. Client ID: Get from service account details (numeric ID, not email)
6. OAuth Scopes (comma-separated):

   https://www.googleapis.com/auth/gmail.readonly, 
   https://www.googleapis.com/auth/documents,
   https://www.googleapis.com/auth/spreadsheets

7. Click **AUTHORIZE**

---

#### Part 3: Secret Manager Setup

##### 3.1 Create Secret for Anthropic API Key
1. Go to **Security** → **Secret Manager**
2. Click **CREATE SECRET**
3. Name: `anthropic-api-key`
4. Secret value: Your Anthropic API key
5. Click **CREATE SECRET**

##### 3.2 Grant Access to Service Account
1. Click on the `anthropic-api-key` secret
2. Click **PERMISSIONS** tab
3. Click **GRANT ACCESS**
4. Add principal: `docs-updater@qa-assistant-458920.iam.gserviceaccount.com`
5. Assign role: **Secret Manager Secret Accessor**
6. Click **SAVE**

---

#### Part 4: Deploy Cloud Run Function

##### 4.1 Prepare Code Files

**main.py** - Contains your function code with these key components:
- `get_google_service_keyless()` function using IAM signer (not downloaded keys)
- `update_logs_daily()` as the entry point
- Email fetching, cleaning, and document update logic

**requirements.txt** (one package per line):
- `google-api-python-client`
- `google-auth`
- `claudette`

##### 4.2 Deploy Function
1. Go to **Cloud Run** → **Functions**
2. Click **CREATE FUNCTION**
3. Configuration:
   - **Function name**: `update-logs-daily`
   - **Region**: `us-central1` (Iowa)
   - **Trigger type**: HTTPS
   - **Authentication**: Require authentication
   - **Runtime service account**: `docs-updater@qa-assistant-458920.iam.gserviceaccount.com`
4. Click **NEXT**
5. Runtime settings:
   - **Runtime**: Python 3.12
   - **Entry point**: `update_logs_daily`
   - **Memory**: 512 MB (or adjust as needed)
   - **Timeout**: 540 seconds
6. Environment variables:
   - Click **REFERENCE A SECRET**
   - Secret: `anthropic-api-key`
   - Exposed as: Environment variable
   - Name: `ANTHROPIC_API_KEY`
7. Copy your code into `main.py`
8. Copy dependencies into `requirements.txt`
9. Click **DEPLOY**
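
Inside the function, the secret referenced in step 6 appears as an ordinary environment variable. A minimal sketch of reading it defensively follows; `require_env` is a hypothetical helper, not part of the deployed code.

```python
import os

def require_env(name):
    """Return the value of an environment variable, failing loudly if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; check the function's secret reference")
    return value

# Typical use at the start of the entry point:
#   api_key = require_env("ANTHROPIC_API_KEY")
```

Failing early with a clear message makes a missing secret reference easier to spot in the Cloud Run logs than an authentication error from a later API call.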

##### 4.3 Grant Invoker Permission
**Why this is needed**: Cloud Scheduler needs permission to trigger the function.

1. From Cloud Run services list, check the box next to `update-logs-daily`
2. Click **PERMISSIONS** at the top
3. Click **ADD PRINCIPAL**
4. New principal: `docs-updater@qa-assistant-458920.iam.gserviceaccount.com`
5. Role: **Cloud Run Invoker**
6. Click **SAVE**

---

#### Part 5: Cloud Scheduler Setup

##### 5.1 Create Scheduled Job
1. Go to **Cloud Scheduler**
2. Click **CREATE JOB**
3. Configuration:
   - **Name**: `daily-log-update`
   - **Region**: `us-central1` (must match function region)
   - **Frequency**: `0 3 * * *` (3:00 AM daily)
   - **Timezone**: America/Chicago (Central Time)
4. Configure the execution:
   - **Target type**: HTTP
   - **URL**: Your Cloud Run function URL (from function details page)
   - **HTTP method**: POST
   - **Auth header**: OIDC token
   - **Service account**: `docs-updater@qa-assistant-458920.iam.gserviceaccount.com`
5. Click **CREATE**

---

#### Part 6: Testing

##### 6.1 Manual Test
1. Go to **Cloud Scheduler**
2. Find `daily-log-update` job
3. Click three dots → **FORCE RUN**
4. Check **Cloud Run** → **Logs** for execution details

##### 6.2 Verify Results
1. Check your Google Docs logs for new entries
2. Entries should be dated for yesterday (function processes previous day by default)

---

#### Troubleshooting

##### HTTP 403 Error
- **Symptom**: Scheduler triggers but function returns 403
- **Cause**: Missing Cloud Run Invoker permission
- **Solution**: Follow Part 4.3 to grant invoker permission

##### HTTP 500 Error: 'Credentials' object has no attribute 'with_subject'
- **Symptom**: Function executes but crashes with attribute error
- **Cause**: Using `google.auth.default()` with `.with_subject()` for delegation
- **Solution**: Use `google.auth.iam.Signer` approach (already in provided code)

##### No Emails Processed
- **Symptom**: Function runs successfully but no log updates
- **Cause**: Function processes yesterday's emails by default
- **Solution**: Verify emails exist for the previous day, or manually specify date for testing
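
When checking which emails a given run should have seen, it helps to compute the exact window being scanned. A minimal sketch, assuming a date string is treated as one local calendar day in Central Time and compared against Gmail's millisecond `internalDate`; `day_window_ms` is a hypothetical name.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def day_window_ms(date_str, tz="America/Chicago", fmt="%Y/%m/%d"):
    """Epoch-millisecond [start, end) bounds of one local calendar day."""
    start = datetime.strptime(date_str, fmt).replace(tzinfo=ZoneInfo(tz))
    end = start + timedelta(days=1)  # wall-clock arithmetic: the next local midnight
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)

start_ms, end_ms = day_window_ms("2025/01/07")
print(start_ms, end_ms, (end_ms - start_ms) / 3_600_000)  # bounds and window length in hours
```

Because the bounds are local midnights, the window is 23 or 25 hours long on the days daylight saving time changes.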

---

#### Cost Estimates
- **Cloud Run**: Free tier includes 2 million requests/month
- **Cloud Scheduler**: Free tier includes 3 jobs
- **Secret Manager**: $0.06 per active secret version per month + $0.03 per 10,000 access operations
- **Anthropic API**: Depends on usage (Claude Sonnet pricing)
- **Expected monthly cost**: <$5 for typical usage

---

#### Security Notes
1. **No downloaded keys**: Service account uses Google's IAM signJwt API instead of user-managed keys
2. **Principle of least privilege**: Service account only has necessary permissions
3. **Secret management**: API keys stored in Secret Manager, not in code
4. **Authentication required**: All endpoints require IAM authentication
5. **Internal only**: Function not exposed to public internet

Ok I'm a little fuzzy. What are the steps I need to take to deploy this? It should be the same in many ways as with update-logs-daily. Maybe first I have to check my exported files are complete? Help me out :) 

##### 🤖Reply🤖<!-- SOLVEIT_SEPARATOR_7f3a9b2c -->

Good idea to start by checking your exported files! Since you've done this before with `update-logs-daily`, the process will be very similar.

Let's go step by step. First, think about what functions your `update_qdocs_weekly` entry point needs to run. It calls:

1. `update_all_docs` → `get_logs`, `update`
2. `get_logs` → `get_gdocs_service`, `get_log_by_date`, etc.
3. `update` → `get_props`, `filter_props`, `qdoc.add_entries`
4. The `QDoc` and `GSheet` classes
5. The three qdoc instances (`rreg`, `dlog`, `dio`)

Can you trace through and list out all the functions/classes that need to be in your `main.py`? 

Also, you'll need to decide: do you want this as a **separate** Cloud Run function, or add it to the same deployment as `update-logs-daily`? Both approaches work - separate is cleaner, same deployment shares code.

This had to be refactored slightly to move the creation of the qdocs (rreg, dio, dlog) to be called later rather than at module import:
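
The reason for that change: anything created at module level runs when `main.py` is imported, before any request arrives, so a failed API call there can stop the function from starting. A minimal sketch of the pattern follows, using a stand-in class instead of the real `QDoc` (whose constructor reads a sheet). `FakeQDoc` and the `lru_cache` caching are illustrative assumptions, not the deployed code.

```python
from functools import lru_cache

class FakeQDoc:
    """Stand-in for QDoc: counts constructions instead of calling the Sheets API."""
    created = 0

    def __init__(self, name):
        FakeQDoc.created += 1
        self.name = name

@lru_cache(maxsize=1)
def get_qdocs():
    """Build the quality docs on first use, then reuse them."""
    return tuple(FakeQDoc(n) for n in ("rreg", "dlog", "dio"))

print(FakeQDoc.created)  # nothing has been constructed at import time
docs = get_qdocs()
print(FakeQDoc.created)  # constructed on the first call
get_qdocs()
print(FakeQDoc.created)  # unchanged: the cached tuple is reused
```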

Success! Please review this dialogue and make a checklist of things to verify when deploying update-qdocs-weekly to google cloud run function / scheduler

##### 🤖Reply🤖<!-- SOLVEIT_SEPARATOR_7f3a9b2c -->

Here's your deployment checklist for `update-qdocs-weekly`:

**Code Preparation:**
- [ ] Replace `get_gdocs_service()` with `get_gdocs_service_keyless()`
- [ ] Replace `get_gsheets_service()` with `get_gsheets_service_keyless()`
- [ ] Move `QDoc` instance creation into `get_qdocs()` function (not at module level)
- [ ] Prepare `requirements.txt` with: `google-api-python-client`, `google-auth`, `claudette`

**Cloud Run Function Setup:**
- [ ] Function name: `update-qdocs-weekly`
- [ ] Region: `us-central1`
- [ ] Runtime: Python 3.12
- [ ] Entry point: `update_qdocs_weekly`
- [ ] **Runtime service account**: `docs-updater@qa-assistant-458920.iam.gserviceaccount.com` (NOT default!)
- [ ] Authentication: Require authentication
- [ ] Add secret reference: `anthropic-api-key` as environment variable `ANTHROPIC_API_KEY`

**Permissions:**
- [ ] Grant **Cloud Run Invoker** role to `docs-updater` service account on the function
- [ ] Verify **Service Account Token Creator** role exists on `docs-updater` (should already be there from first function)
- [ ] Verify **IAM Service Account Credentials API** is enabled

**Cloud Scheduler:**
- [ ] Job name: `weekly-qdocs-update`
- [ ] Frequency: `0 10 * * 5` (Fridays 4am CST / 5am CDT)
- [ ] Timezone: UTC
- [ ] Target: HTTP POST to function URL
- [ ] Auth: OIDC token with `docs-updater` service account

**Test:**
- [ ] Force run from scheduler
- [ ] Check logs for errors
- [ ] Verify entries appear in quality docs

The key gotcha: runtime service account must be set explicitly!
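
Because the `weekly-qdocs-update` job is scheduled in UTC, its local run time shifts by an hour with daylight saving, whereas a job scheduled in `America/Chicago` keeps the same wall-clock time. A quick check of the `0 10 * * 5` schedule, using two arbitrarily chosen Fridays:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def central_time_of(utc_dt):
    """Convert an aware UTC datetime to Central Time and format it."""
    return utc_dt.astimezone(ZoneInfo("America/Chicago")).strftime("%a %H:%M %Z")

utc = ZoneInfo("UTC")
winter_run = datetime(2026, 1, 2, 10, 0, tzinfo=utc)  # a Friday during standard time
summer_run = datetime(2026, 7, 3, 10, 0, tzinfo=utc)  # a Friday during daylight time

print(central_time_of(winter_run))
print(central_time_of(summer_run))
```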

Please read through the above and create a concise but complete checklist of the google-cloud-api-side steps that must be taken to get these up and running as google cloud functions. Both are actually running now, so the purpose of this checklist is as documentation/reference later if I need to change anything and need to remember how it was set up. 

##### 🤖Reply🤖<!-- SOLVEIT_SEPARATOR_7f3a9b2c -->

Here's a concise reference checklist for Google Cloud setup:

**One-Time Project Setup:**
- [ ] Enable APIs: Cloud Run, Cloud Scheduler, Secret Manager, Gmail, Google Docs, Google Sheets, IAM Service Account Credentials
- [ ] Enable billing and set budget alert

**Service Account Configuration:**
- [ ] Create service account: `docs-updater@qa-assistant-458920.iam.gserviceaccount.com`
- [ ] Grant **Service Account Token Creator** role to itself (for keyless JWT signing)
- [ ] Configure domain-wide delegation in Google Workspace Admin Console with scopes: `gmail.readonly`, `documents`, `spreadsheets`

**Secret Management:**
- [ ] Create secret `anthropic-api-key` in Secret Manager
- [ ] Grant `docs-updater` service account **Secret Manager Secret Accessor** role

**Per-Function Deployment:**
- [ ] Deploy function with runtime service account set to `docs-updater`
- [ ] Reference `anthropic-api-key` as environment variable
- [ ] Grant `docs-updater` **Cloud Run Invoker** role on the deployed function

**Scheduler Setup:**
- [ ] Create job with OIDC auth using `docs-updater` service account
- [ ] Set schedule and timezone appropriately

Does this capture what you need, or would you like me to add/modify anything?

## Organize!

Now that this is up and running on google cloud, I want to set google cloud aside and think about refactoring, organizing and documenting this code here in solveit. I have two objectives for "organized code":
1. Finish refactoring this into a small set of modules, ideally each no longer than 50-60 lines, with clear separation of concerns and small functions that are readable and well named, requiring little documentation to understand what they do.
2. Then create a dialogue behind each module "nbdev style". Some dialogue messages export to/sync with the corresponding module. What this adds (I think) is test calls and markdown notes for elements of the code which show tests for some functions and also make it clear for anyone who wants to modify the code for other uses what's going on. 

The current state of this dialogue is a mix of attempts at both and does neither successfully. It's half exported-to-modules and half scattered-fragments-through-this-dialogue. For the two modules already written, should I now delete those from this dialogue to simplify? Etc. 

I want to make it nice. Please suggest a plan.

##### 🤖Reply🤖<!-- SOLVEIT_SEPARATOR_7f3a9b2c -->

Great goal! Let me suggest a plan.

**Step 1: Identify your logical modules**

Based on what I see, here are natural boundaries:

| Module | Responsibility | ~Lines |
|--------|----------------|--------|
| `google_auth.py` | Auth helpers (keyless + local) | ~50 |
| `logs.py` | Read/parse log gdocs | ~50 |
| `email_digest.py` | Fetch/clean emails → logs | ~60 |
| `qdocs.py` | GSheet, QDoc classes, entry functions | ~60 |
| `proposals.py` | LLM calls: get_props, filter_props | ~30 |
| `main.py` | Entry points for Cloud Run | ~40 |

Does that breakdown feel right to you? Would you group things differently?

**Step 2: Clean up this dialogue**

Yes, I'd suggest:
- Delete the code cells that were already exported to `google_auth.py` and `logs.py`
- Keep only the "live" working code here temporarily while you reorganize

**Step 3: Create one dialogue per module**

Each dialogue would have:
- A note explaining the module's purpose
- Code cells that export to the `.py` file
- Test cells (not exported) showing usage
- Markdown notes explaining non-obvious bits

**Question before we proceed:**

Do you want to start by agreeing on the module structure, or do you want to first audit what's currently exported vs. scattered? I can help you trace through if you'd like.

In [None]:
from fastcore.foundation import Path
import sys
sys.path.insert(0, '.')

In [None]:
utils_code='''
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

DEF_TZ = 'US/Central'
DEF_FMT = '%Y/%m/%d' 

def ytd(tz=DEF_TZ, fmt=DEF_FMT):
    # Yesterday's date in tz, as a string in fmt (the form parse_date expects)
    now = datetime.now(ZoneInfo(tz))
    return (now - timedelta(days=1)).strftime(fmt)

def str_to_date(content, year=None):
    if year is None:
        year = datetime.now().year

    formats = [
        (content,'%Y/%m/%d'),
        (content,'%Y-%m-%d'),
        (content,'%m/%d/%y'),
        (f'{year}/{content}','%Y/%m/%d'),
        (f'{content} {year}', '%d %b %Y'),
        (f'{content} {year}', '%d %B %Y'),
        (f'{content} {year}', '%b %d %Y'),
        (f'{content} {year}', '%B %d %Y')
    ]

    for fmt in formats:
        try:
            return datetime.strptime(fmt[0], fmt[1]).date()
        except ValueError:
            continue
    raise ValueError(f"Could not parse date: {content}")

def parse_date(date,fmt=DEF_FMT,timezone=DEF_TZ):
    tz = ZoneInfo(timezone)
    return datetime.strptime(date, fmt).replace(tzinfo=tz)

def add_days(date,delta:int):
    return date+timedelta(days=delta)
    
def date_to_ms(date):
    return int(date.timestamp() * 1000)

def start_stop_ms(date,days,fmt=DEF_FMT,timezone=DEF_TZ):
    start_date=parse_date(date,fmt,timezone)
    end_date=add_days(start_date,days)
    return date_to_ms(start_date), date_to_ms(end_date)
'''
Path('date_utils.py').write_text(utils_code)
import date_utils



In [None]:
from date_utils import ytd

What's happening here?

##### 🤖Reply🤖<!-- SOLVEIT_SEPARATOR_7f3a9b2c -->

You're hitting a Python module caching issue! When you first imported `date_utils`, Python cached the empty (or previous) version. Writing new code to the file doesn't automatically reload it.

Try this:

```python
import importlib
importlib.reload(date_utils)
from date_utils import ytd
```

Or restart your Python interpreter (the "Clear" button) and run the cells again from top to bottom.

Does that fix it?

In [None]:
auth_code=r"""
import os
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import Flow
from googleapiclient.discovery import build

def get_google_service(service_name, version, scopes, token_file):
    '''Authenticate and return a Google API service.'''
    creds = None
    
    if os.path.exists(token_file):
        with open(token_file, 'rb') as token:
            creds = pickle.load(token)
    
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            client_config = {
                "web": {
                    "client_id": os.environ.get('QA_ASST_CLIENT_ID'),
                    "client_secret": os.environ.get('QA_ASST_CLIENT_SECRET'),
                    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
                    "token_uri": "https://oauth2.googleapis.com/token",
                    "redirect_uris": ["http://localhost:8080"]
                }
            }
            
            flow = Flow.from_client_config(
                client_config, 
                scopes=scopes
            )
            flow.redirect_uri = 'http://localhost:8080'
            
            auth_url, state = flow.authorization_url(
                access_type='offline',
                prompt='consent'
            )
            
            print("Visit this URL to authorize:")
            print(auth_url)
            print("\nAfter authorizing, paste the full redirect URL here:")
            redirect_response = input()
            
            flow.fetch_token(authorization_response=redirect_response)
            creds = flow.credentials
        
        with open(token_file, 'wb') as token:
            pickle.dump(creds, token)
    
    return build(service_name, version, credentials=creds)

def get_gmail_service():
    '''Authenticate and return Gmail API service.'''
    return get_google_service(
        'gmail', 
        'v1', 
        ['https://www.googleapis.com/auth/gmail.readonly'],
        'gmail_token.pickle'
    )
    
def get_gdocs_service():
    '''Authenticate and return GDocs API service.'''
    return get_google_service(
        'docs', 
        'v1', 
        ['https://www.googleapis.com/auth/documents'],
        'docs_token.pickle'
    )

def get_gsheets_service():    
    '''Authenticate and return Google Sheets API service.'''
    return get_google_service(
        'sheets',
        'v4',
        ['https://www.googleapis.com/auth/spreadsheets'],
        'sheets_token.pickle'
    )
"""
Path('google_auth.py').write_text(auth_code)
import google_auth

In [None]:
from google_auth import get_gsheets_service

In [None]:
get_gsheets_service()

<googleapiclient.discovery.Resource at 0x70f685a900b0>

In [None]:
emails_code=r'''
from date_utils import start_stop_ms
from google_auth import get_gmail_service

# Email parsing
def find_text_plain(payload):
    """Recursively search for text/plain part in email payload."""
    if payload.get('mimeType') == 'text/plain':
        return payload.get('body', {}).get('data')
    
    if 'parts' in payload:
        for part in payload['parts']:
            result = find_text_plain(part)
            if result:
                return result
    
    return None

def get_message_text(service, message_id):
    """Get plain text body, timestamp, threadId, and sender from a Gmail message."""
    import base64
    msg = service.users().messages().get(userId='me', id=message_id).execute()
    sender = None
    for header in msg['payload']['headers']:
        if header['name'] == 'From': 
            sender = header['value']
    text_data = find_text_plain(msg['payload'])
    if not text_data: return None, None, None, None

    decoded = base64.urlsafe_b64decode(text_data).decode('utf-8')
    timestamp = msg['internalDate']
    threadId = msg['threadId']
    return decoded, timestamp, threadId, sender
    
def extract_new_content(msg_text):
    """Extract new content from email, removing quoted replies."""
    markers = ["\r\n\r\nOn ", "\r\nFrom: "]
    positions = [msg_text.find(mark) for mark in markers]
    valid_positions = [p for p in positions if p != -1]
    if valid_positions:
        split_point = min(valid_positions)
        return msg_text[:split_point]
    return msg_text

def get_thread_ids(service,label_id):
    """Set of ids of threads carrying label_id (first page of results only)."""
    thread_list_json = service.users().threads().list(
        userId='me',
        labelIds=[label_id]
    ).execute()
    if 'threads' not in thread_list_json: return set()
    return {d.get('id') for d in thread_list_json['threads']}

# Note: We fetch all threads with label, then all messages in those threads, 
# then filter by date. This keeps unlabeled replies in labeled threads.
# Using q='after:X before:Y' with labelIds would miss those unlabeled messages.
# (A labeled thread doesn't necessarily tag each message in the thread with the label.)
def get_messages_for_label(label_key): # → returns raw message dicts (all dates)
    service = get_gmail_service()
    label_ids=get_gmail_labels(service)
    label_id=label_ids[label_key]
    thread_ids=get_thread_ids(service,label_id)
    msg_dicts=[]
    for thread_id in thread_ids:
        thread = service.users().threads().get(userId='me', id=thread_id).execute()
        for msg in thread['messages']:
            text_data, timestamp, threadId, sender = get_message_text(service, msg['id'])
            if text_data:
                msg_dicts.append({
                    'timestamp': timestamp,
                    'threadId': threadId,
                    'sender': sender,
                    'content': extract_new_content(text_data)
                })
    return msg_dicts

def filter_by_date(msgs, date):
    start_ms, end_ms=start_stop_ms(date,1)
    return [m for m in msgs if m['timestamp'] and (start_ms <= int(m['timestamp']) < end_ms)]

def format_messages(msgs): # → does the sorting and string building
    sorted_msgs = sorted(msgs, key=lambda x: (x['threadId'], x['timestamp']))
    sub_strs = [f"From: {d['sender']}: \n\n{d['content']}\n\n=================" 
                for d in sorted_msgs]
    return '\n\n'.join(sub_strs)

def get_daily_messages(label_key, date:str):
    """Get all messages for a given label and date."""
    msg_dicts = get_messages_for_label(label_key)
    msgs = filter_by_date(msg_dicts, date)
    return format_messages(msgs)

def get_gmail_labels(service=None):
    """dict of label_id's by label name.lower()"""
    if service is None: service = get_gmail_service()
    results = service.users().labels().list(userId='me').execute()
    labels = results.get('labels', [])
    return {l['name'].lower(): l['id'] for l in labels if l['type']=='user'}

def label_keys(): 
    return list(get_gmail_labels())
'''
Path('emails.py').write_text(emails_code)
import emails
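
A quick usage check for the quoted-reply stripping, in the spirit of the test cells planned above. The function is repeated in condensed form so the snippet runs on its own, and the sample message is made up.

```python
def extract_new_content(msg_text):
    """Keep only the text before the first quoted-reply marker."""
    markers = ["\r\n\r\nOn ", "\r\nFrom: "]
    positions = [msg_text.find(mark) for mark in markers]
    valid_positions = [p for p in positions if p != -1]
    return msg_text[:min(valid_positions)] if valid_positions else msg_text

reply = (
    "Looks good, ship it.\r\n\r\n"
    "On Tue, Jan 7, 2025 at 9:14 AM Pat <pat@example.com> wrote:\r\n"
    "> Can you review the fixture drawing?\r\n"
)
print(repr(extract_new_content(reply)))
print(repr(extract_new_content("No quoted text here.")))
```

The markers assume CRLF line endings and an English-language "On ... wrote:" line, so replies from mail clients that quote differently would pass through unstripped.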

In [None]:
logs_code=r'''
from google_auth import get_gdocs_service
from date_utils import str_to_date

def get_paragraph_style(part):
    return (part.get('paragraph', {})
                .get('paragraphStyle', {})
                .get('namedStyleType'))

def is_heading_level(level:int,part): 
    return get_paragraph_style(part)==f'HEADING_{level}'

def get_content(part):
    els = part.get('paragraph',{}).get('elements','')
    return ''.join([e.get('textRun',{}).get('content','') for e in els])

def get_parts_list(service,doc_id):
    doc = service.documents().get(documentId=doc_id).execute()
    return doc['body']['content']

def prepend_str(service, log_doc_id, pp_str, style='NORMAL_TEXT'):
    requests = [{
        'insertText': {
            'location': {'index': 1},
            'text': pp_str
            }
        },{
            'updateParagraphStyle': {
                'range': {'startIndex': 1, 'endIndex': 1 + len(pp_str)},  # inserted text spans [1, 1+len)
                'paragraphStyle': {'namedStyleType': style},
                'fields': 'namedStyleType'
            }
        }]
    service.documents().batchUpdate(documentId=log_doc_id, body={'requests': requests}).execute()

LOGS = {
    "assembly": "1Jb1LEU_VVorhIkslgkGOUgRADny23Z1Q310GJSo6ZWg",
    "wcp": "1S4te4jvQGokB4cSbbInO9HuQUCEY6MHRontdIplXOBo",
    "bench": "1ezsnTsRJQE8NkVId5sOqXowKopygoov9xzIl35oKczk",
    "singulation": "1C6pDV0YWWwQra3sxPSRatdNV8DjPPdeBOvzxylLIUok",
    "sterilization": "1HUfc25DiOzTIVqnFQ0FEqNqEaxs7eTosMxVKJCov9KQ",
    "algorithm": "1nxCES9tzy2ZC0aFN-B08LO5lSs4YYVsgplED0v6GXxU",
    "firmware": "1KF8LlS0UbJCdVbcqp_7BEwScJQc4EHwfI6fxNCU-SKE",
    "quality": "1uXwbWHT3PjMFg6CJCIPs8ttQYTEdE4SYqm8wmH_AIvA",
    "clinicals": "1EXJ4kd531ewCqZpD-aPqKdlMfVmRmQ9LIRsppUhSLuk",
    "submission": "1Bbr8U9iDfYPrnoFKNeMgeMxYTN6uyeTTan5cvX_WmDI"
}

def log_to_dict_by_date(parts_list):
    log_dict = {}
    key = None
    year = None
    for part in parts_list:
        content = get_content(part).strip()
        if is_heading_level(2,part):
            year = int(content)  # a HEADING_2 holds the year for the date headings that follow
        elif is_heading_level(3,part) and content:
            key = str_to_date(content, year)
            log_dict[key] = ''
        else: 
            if key is not None: 
                log_dict[key] += content
    return log_dict

def get_log_entries_in_range(log_dict,start_date,end_date):
    return {str(k): v for k,v in log_dict.items() #str(date) for json
            if start_date <= k <= end_date}

def get_log_by_date(service,log_key, start_date, end_date):
    log_id = LOGS[log_key]
    parts_list = get_parts_list(service,log_id)
    log_dict = log_to_dict_by_date(parts_list)
    return get_log_entries_in_range(log_dict, start_date, end_date)

def read_logs(start_date,end_date):
    gdocs=get_gdocs_service()
    return {k: get_log_by_date(gdocs,k,start_date,end_date)
            for k in list(LOGS)}

def log_keys():
    return LOGS.keys()

def write_log(key,digest,date):
    docs = get_gdocs_service()
    log_id=LOGS[key]
    prepend_str(docs, log_id, digest + "\n", 'NORMAL_TEXT')
    prepend_str(docs, log_id, f"{date}\n", 'HEADING_3')
'''
Path('logs.py').write_text(logs_code)
import logs
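
For reference, the helpers above walk the structural elements returned by the Google Docs API. Below is a hand-built element of the shape they expect. The content is invented, and the two helpers are repeated so the snippet runs standalone.

```python
def get_paragraph_style(part):
    """Named style of a paragraph element, or None for non-paragraph elements."""
    return part.get('paragraph', {}).get('paragraphStyle', {}).get('namedStyleType')

def get_content(part):
    """Concatenated text of all text runs in a paragraph element."""
    els = part.get('paragraph', {}).get('elements', [])
    return ''.join(e.get('textRun', {}).get('content', '') for e in els)

heading = {
    'paragraph': {
        'paragraphStyle': {'namedStyleType': 'HEADING_3'},
        'elements': [{'textRun': {'content': '2025/01/07\n'}}],
    }
}
section_break = {'sectionBreak': {}}  # non-paragraph elements yield no style and no text

print(get_paragraph_style(heading), repr(get_content(heading)))
print(get_paragraph_style(section_break), repr(get_content(section_break)))
```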

In [None]:
qdocs_code='''
from google_auth import get_gsheets_service

class GSheet:
    "A google sheet"
    def __init__(self,sheet_id:str,tab_range:str):
        self.sheet_id=sheet_id # gsheet document id
        self.tab_range=tab_range #gsheet tab & range

    def get_rows(self):
        gsheets=get_gsheets_service()
        result = gsheets.spreadsheets().values().get(
            spreadsheetId=self.sheet_id,
            range=self.tab_range
        ).execute()
        return result.get('values',[])

    def append(self,rows):
        gsheets=get_gsheets_service()
        gsheets.spreadsheets().values().append(
            spreadsheetId=self.sheet_id,
            range=self.tab_range,  
            valueInputOption='RAW',  # or 'USER_ENTERED'
            body={'values': rows}
        ).execute()

class QDoc(GSheet):
    "Quality Document (Risk Register, Decision Log, Design I/O Matrix)"
    def __init__(self, sheet_id, tab_range, entry_f):
        super().__init__(sheet_id, tab_range)
        self.entry_f = entry_f #doc entry defn function
        self.rows = self.get_rows()
        self._headers = self.rows[0] if self.rows else []

    @staticmethod
    def _sort_by_date_if_present(prop_dicts):
        if not prop_dicts or 'Date' not in prop_dicts[0]: return prop_dicts
        return sorted(prop_dicts, key=lambda d: d.get('Date', ''))

    def _mk_row(self,prop):
        return [prop.get(h,'') for h in self._headers]

    def add_entries(self,props):
        props=self._sort_by_date_if_present(props)
        rows=[self._mk_row(prop) for prop in props]
        self.append(rows)

def qdocs(): 
    def risk_f(hazard:str, harms:str, causes:str)->dict:
        "Candidate Risk Register entry"
        return {'Hazard': hazard,
            'Potential Harms': harms,
            'Possible Causes': causes,
            'Reviewed': 'FALSE'}

    def decision_f(component:str,decision:str,rationale:str,date:str)->dict:
        "Candidate Decision Log entry"
        return {'Component': component,
            'Decision': decision,
            'Rationale': rationale,
            'Date': date,
            'Reviewed': 'FALSE'}

    def design_f(design_input:str,source_rationale:str,design_output:str,verif_valid_method:str)->dict:
        "Candidate Design Input/Output Matrix entry"
        return {'Design Input': design_input,
            'Source / Rationale': source_rationale,
            'Design Output': design_output,
            'Verification/Validation Method': verif_valid_method,
            'Reviewed': 'FALSE'}

    rreg = QDoc(
        sheet_id='1j752K9N89Qv9WLkjje3OWkrzDbSMjLwYuxdoTfArcpQ',
        tab_range='Risk Register!A:L',
        entry_f=risk_f)
    dlog = QDoc(
        sheet_id='14oGD5hwGP_8OLfq_tqQVdEjicRzJ4cyQWPkXz92avv4',
        tab_range='Decision Log!A:G',
        entry_f=decision_f)   
    dio = QDoc(
        sheet_id='1xrV_39iaVoTpAXmbxbKfFBBnbdHN5yeyKJMDZ59MdqM',
        tab_range='Design IO Matrix!A:G',
        entry_f=design_f)
    return [rreg,dlog,dio]

'''
Path('qdocs.py').write_text(qdocs_code)
import qdocs
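
How a proposal dict becomes a sheet row: values are placed in header order, and any column the proposal does not supply is left blank. A standalone sketch with made-up headers and entries (the real headers come from the first row of each sheet):

```python
def mk_row(headers, prop):
    """Order a proposal's values by the sheet's header row; missing columns become ''."""
    return [prop.get(h, '') for h in headers]

def sort_by_date_if_present(props):
    """Sort proposals by their 'Date' field when the first one has it."""
    if not props or 'Date' not in props[0]:
        return props
    return sorted(props, key=lambda d: d.get('Date', ''))

headers = ['Date', 'Component', 'Decision', 'Rationale', 'Owner', 'Reviewed']  # hypothetical
props = [
    {'Component': 'Sensor', 'Decision': 'B', 'Rationale': 'r2', 'Date': '2025/02/01', 'Reviewed': 'FALSE'},
    {'Component': 'Housing', 'Decision': 'A', 'Rationale': 'r1', 'Date': '2025/01/15', 'Reviewed': 'FALSE'},
]
rows = [mk_row(headers, p) for p in sort_by_date_if_present(props)]
for row in rows:
    print(row)
```

Sorting on the string works here only because `%Y/%m/%d` dates sort chronologically as text.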

In [None]:
core_code=r'''
from claudette import Chat, Client
from date_utils import ytd
from emails import label_keys, get_daily_messages
from logs import log_keys, write_log, read_logs
from qdocs import qdocs

# update logs

def get_email_cleaner():
    instr = """Please format this as a clean plain-text chronological transcript by:
    1. Removing all email signature blocks
    2. Removing any quoted/forwarded text
    3. For each message: sender's name, colon, then the message body
    4. After each message: "---" with no newlines or returns (no blank lines)
    5. Keep it tight - no additional blank lines
    6. Plain text only - no markdown formatting"""
    return Chat(model='claude-sonnet-4-20250514', sp=instr)

def clean(messages):
    c = get_email_cleaner()
    r = c(messages)
    return r.content

def update_logs(date=None):
    """update all logs based on emails from date (default yesterday)"""
    if date is None: date=ytd()
    for k in (set(label_keys()) & set(log_keys())): 
        messages = get_daily_messages(k, date)
        if not messages: continue   # no mail under this label; keep checking the others
        clean_digest = clean(messages)
        if not clean_digest: continue
        write_log(k,clean_digest,date)

# update quality docs

def _entries_prompt(logs):
    p = f"""Review this set of engineering logs from a continuous glucose monitor development project
            and identify candidate entries to the quality document represented in the tool. Focus on items 
            of significant, long-term importance; ignore short-term operational or execution issues. Keep 
            each field terse (10-20 words max).

            Logs:
            {logs} 
        """
    return p

def _get_props(logs, qdoc):
    """propose entries for qdoc before seeing existing entries"""
    c = Client('claude-sonnet-4-5')
    return c.structured(_entries_prompt(logs),[qdoc.entry_f])

def _filter_props(props, qdoc):
    p = f"""
    Determine which proposed new entries are NOT already documented in the given existing document. 
    Return ONLY those proposals that represent genuinely NEW concepts not already captured in existing entries. 
    - Ignore wording differences - focus on whether the core concept is already documented.
    - If a proposal is semantically similar to an existing entry, exclude it.
    - If NONE of the proposals are new, make NO tool calls.

    Proposals:
    {props}

    Existing entries:
    {qdoc.rows}
    """
    c = Client('claude-sonnet-4-5')
    return c.structured(p,[qdoc.entry_f])

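def update_qdocs(start_date,end_date):
    """Propose new quality-doc entries from the logs between start_date and end_date."""
    entries=read_logs(start_date,end_date)
    if not any(entries.values()): return
    for qdoc in qdocs(): 
        props = _get_props(entries,qdoc)
        if not props: continue   # nothing proposed, so skip the dedup call
        news = _filter_props(props,qdoc)
        if news: qdoc.add_entries(news)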

'''
Path('core.py').write_text(core_code)
import core
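
The two-stage flow in `update_qdocs` (propose candidates from the logs, then drop those already covered) can be exercised without any API calls. A minimal sketch, assuming hypothetical stand-ins: `propose` and `is_new` replace the two Claude calls, `existing` replaces `qdoc.rows`, and `dry_run` itself is not part of `core.py`. The toy dedup is an exact string match, which is far cruder than the semantic comparison the real prompt asks for.

```python
def dry_run(logs, existing, propose, is_new):
    """Return the entries that would be appended, without writing anything."""
    if not any(logs.values()):   # nothing logged in the window
        return []
    props = propose(logs)
    if not props:                # no candidates, so skip the filter step
        return []
    return [p for p in props if is_new(p, existing)]

# Toy stand-ins for the two model calls (made-up data).
def propose(logs):
    return [{'Hazard': h} for text in logs.values() for h in text.split(';') if h]

def is_new(prop, existing):
    return prop['Hazard'] not in {row['Hazard'] for row in existing}

logs = {'bench': 'sensor drift;adhesive failure', 'clinical': ''}
existing = [{'Hazard': 'sensor drift'}]
result = dry_run(logs, existing, propose, is_new)
print(result)
```

Passing the two stages in as callables keeps the control flow testable on its own; the real functions can be swapped in once the flow behaves as expected.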

In [None]:
cloud_code=r'''
# CLOUD DEPLOYMENT: When creating main.py, combine all modules but replace google_auth functions
# with the keyless versions below. 

from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

from google.auth import default, iam
from google.auth.transport import requests as google_requests
from google.oauth2 import service_account
from googleapiclient.discovery import build

SERVICE_ACCOUNT = 'docs-updater@qa-assistant-458920.iam.gserviceaccount.com'
USER_EMAIL = 'dan@onehealthbiosensing.com'

def get_google_service(service_name, version, scopes, user_email=USER_EMAIL):
    """Get Google API service using service account with delegation via IAM signJwt API."""
    signing_creds, _ = default()
    signer = iam.Signer(
        request=google_requests.Request(),
        credentials=signing_creds,
        service_account_email=SERVICE_ACCOUNT
    )
    delegated_creds = service_account.Credentials(
        signer=signer,
        service_account_email=SERVICE_ACCOUNT,
        token_uri='https://oauth2.googleapis.com/token',
        scopes=scopes,
        subject=user_email
    )
    return build(service_name, version, credentials=delegated_creds)

def get_gmail_service():
    return get_google_service('gmail', 'v1', 
        ['https://www.googleapis.com/auth/gmail.readonly'])

def get_gdocs_service():
    return get_google_service('docs', 'v1', 
        ['https://www.googleapis.com/auth/documents'])

def get_gsheets_service():
    return get_google_service('sheets', 'v4', 
        ['https://www.googleapis.com/auth/spreadsheets'])

# Cloud Function entry points

def update_logs_daily(request):
    """Cloud Function entry point - updates logs from yesterday's emails."""
    from core import update_logs
    update_logs()
    return 'Logs updated successfully'

def update_qdocs_weekly(request):
    """Cloud Function entry point - run Fridays at 4am for prior week (Fr-Th)."""
    from core import update_qdocs
    now = datetime.now(ZoneInfo('US/Central'))
    end_date = (now - timedelta(hours=6)).date()   # Thursday, when run Friday at 4am
    start_date = end_date - timedelta(days=6)      # previous Friday, so the window is Fr-Th
    update_qdocs(start_date, end_date)
    return f"Quality docs updated through {end_date.strftime('%Y/%m/%d')}"
'''
Path('cloud.py').write_text(cloud_code)
import cloud
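
The date arithmetic in `update_qdocs_weekly` is easier to check when it is pulled into a pure function. A minimal sketch, assuming the job fires Friday at 4:00 AM Central and that `read_logs` treats both endpoints as inclusive (the code shown here does not confirm either). `weekly_window` is a hypothetical helper, not part of `cloud.py`, and the example uses a naive datetime because the arithmetic does not depend on the timezone.

```python
from datetime import datetime, timedelta

def weekly_window(now):
    """Return (start, end) dates for the Friday-through-Thursday week before `now`."""
    end = (now - timedelta(hours=6)).date()   # Friday 04:00 minus 6h lands on Thursday
    start = end - timedelta(days=6)           # six days back is the previous Friday
    return start, end

# Friday 2025-10-10 at 04:00 local time
start, end = weekly_window(datetime(2025, 10, 10, 4, 0))
print(start, end)
```

This yields the same range as the manual call `update_qdocs(date(2025,10,3),date(2025,10,9))`.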

In [None]:
cloud_md='''
# Google Cloud setup:

**One-Time Project Setup:**
- [ ] Enable APIs: Cloud Run, Cloud Scheduler, Secret Manager, Gmail, Google Docs, Google Sheets, IAM Service Account Credentials
- [ ] Enable billing and set budget alert

**Service Account Configuration:**
- [ ] Create service account: `docs-updater@qa-assistant-458920.iam.gserviceaccount.com`
- [ ] Grant **Service Account Token Creator** role to itself (for keyless JWT signing)
- [ ] Configure domain-wide delegation in Google Workspace Admin Console with scopes: `gmail.readonly`, `documents`, `spreadsheets`

**Secret Management:**
- [ ] Create secret `anthropic-api-key` in Secret Manager
- [ ] Grant `docs-updater` service account **Secret Manager Secret Accessor** role

**Per-Function Deployment:**
- [ ] Deploy function with runtime service account set to `docs-updater`
- [ ] Expose the `anthropic-api-key` secret to the function as the environment variable `ANTHROPIC_API_KEY`
- [ ] Grant `docs-updater` **Cloud Run Invoker** role on the deployed function

**Scheduler Setup:**
- [ ] Create job with OIDC auth using `docs-updater` service account
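- [ ] Set the timezone to `America/Chicago` and the schedule per function: `0 3 * * *` (daily, 3:00 AM) for `update_logs_daily`; `0 4 * * 5` (Fridays, 4:00 AM) for `update_qdocs_weekly`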
'''
Path('gcloud.md').write_text(cloud_md)

1112
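
The checklist hands the Anthropic key to the function through an environment variable, so a missing or empty value is worth catching before any email is fetched. A minimal sketch of a fail-fast check, assuming the variable is named `ANTHROPIC_API_KEY` (the name the Anthropic SDK reads by default); `require_env` is a hypothetical helper, not part of the modules above.

```python
import os

def require_env(name, env=os.environ):
    """Return the value of `name`, or raise a clear error if it is unset or empty."""
    value = env.get(name, '').strip()
    if not value:
        raise RuntimeError(f"{name} is not set; check the function's secret reference")
    return value

# Example with an explicit mapping instead of the real environment:
key = require_env('ANTHROPIC_API_KEY', {'ANTHROPIC_API_KEY': ' sk-example '})
```

Calling `require_env('ANTHROPIC_API_KEY')` at the top of each entry point would turn a misconfigured deployment into an immediate, readable error rather than a failure partway through a run.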

In [None]:
from datetime import date
from core import update_qdocs
update_qdocs(date(2025,10,3),date(2025,10,9))