# 📄 Google Docs → GitHub Sync

**One-click sync of all InsightPulseAI documentation to GitHub**

This notebook:
1. Authenticates with your Google account
2. Fetches all 12+ deliverable documents
3. Converts them to Markdown
4. Pushes them to the GitHub repository

---

## 🚀 Quick Start

1. Run **Cell 1** (Setup) - installs dependencies
2. Run **Cell 2** (Auth) - authenticates with Google
3. Run **Cell 3** (Config) - enter your GitHub token
4. Run **Cell 4** (Sync) - syncs all docs to GitHub

That's it! Every document listed in `DOCUMENTS` will be in the repo under `DOCS_PATH`.

## 1️⃣ Setup - Install Dependencies

In [None]:
# Install required packages
!pip install -q PyGithub google-api-python-client google-auth-httplib2 google-auth-oauthlib

print("✅ Dependencies installed!")

## 2️⃣ Authenticate with Google

In [None]:
# Authenticate with Google (uses your logged-in Colab account)
from google.colab import auth
auth.authenticate_user()

from google.auth import default
from googleapiclient.discovery import build

creds, _ = default()
docs_service = build('docs', 'v1', credentials=creds)
drive_service = build('drive', 'v3', credentials=creds)

print("✅ Google authentication successful!")
print("   You can now access your Google Docs.")

## 3️⃣ Configuration

In [None]:
# =============================================================================
# CONFIGURATION - Edit these values
# =============================================================================

# GitHub Configuration
GITHUB_TOKEN = ""  # @param {type:"string"}
REPO_NAME = "Insightpulseai-net/pulser-agent-framework"  # @param {type:"string"}
BRANCH = "claude/system-design-analysis-pVVIl"  # @param {type:"string"}
DOCS_PATH = "docs/google-docs/"  # @param {type:"string"}

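# If no token was entered above, try Colab secrets, then fall back to a prompt
if not GITHUB_TOKEN:
    from getpass import getpass
    from google.colab import userdata
    try:
        GITHUB_TOKEN = userdata.get('GITHUB_TOKEN')
        print("✅ Using GITHUB_TOKEN from Colab secrets")
    except Exception:
        # Secret missing or notebook access not granted: prompt without echoing the token
        GITHUB_TOKEN = getpass("Enter your GitHub Personal Access Token: ")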

# =============================================================================
# DOCUMENT LIST - Your 12+ Deliverables
# =============================================================================
# Format: (doc_id, output_filename, description)

DOCUMENTS = [
    # Testing & Development
    ("1Qp4nf8nl7M8MnaNtmrBgP4B1mw2aSUqEzYMKmFBCzH4", "COMPREHENSIVE_TESTING_STRATEGY.md", "Comprehensive Testing Strategy"),
    ("12cvYyZdPeLeLJSGX7OW8XQAwvsBVOaiO146UTkVcc7w", "GOOGLE_DOCS_TO_GITHUB_WORKFLOW.md", "Google Docs to GitHub Workflow"),
    ("1qL1fJT6mX4zjXFO_ui8VKKALTlACSa87VgTIc7HXqbo", "PULSER_AGENT_FRAMEWORK_TESTING.md", "Pulser-Agent-Framework Testing Implementation"),
    ("1Bfe2Lih6dj1Xw85T5xqjtQs5DvT1LMhSW218mnH657A", "ODOO_18_CE_OCA_TESTING.md", "Odoo 18 CE/OCA Native Testing"),
    ("1WY2GJz8IWTWNuTBIOeAoko5f1o_oMTQMd0kzpBLFxXM", "GITHUB_INTEGRATION_CODE_MANAGEMENT.md", "GitHub Integration & Code Management"),

    # Architecture & Design
    # Add more document IDs as needed - format:
    # ("DOC_ID_HERE", "FILENAME.md", "Description"),
]

print("✅ Configuration loaded!")
print(f"   Repository: {REPO_NAME}")
print(f"   Branch: {BRANCH}")
print(f"   Output path: {DOCS_PATH}")
print(f"   Documents to sync: {len(DOCUMENTS)}")
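A malformed entry in `DOCUMENTS` only surfaces when the sync cell fails on it. A minimal sketch of a pre-flight check follows. `validate_documents` and `DOC_ID_PATTERN` are hypothetical names, not part of this notebook, and the ID pattern is a heuristic assumption about what Google Doc IDs look like.

```python
import re

# Assumption: Google Doc IDs consist of letters, digits, '-' and '_'.
# The minimum length of 25 is a heuristic, not a documented guarantee.
DOC_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{25,}$")


def validate_documents(documents):
    """Return a list of human-readable problems found in a DOCUMENTS-style list."""
    problems = []
    seen_ids = set()
    seen_files = set()
    for index, entry in enumerate(documents):
        if len(entry) != 3:
            problems.append(f"entry {index}: expected 3 fields, got {len(entry)}")
            continue
        doc_id, filename, _description = entry
        if not DOC_ID_PATTERN.match(doc_id):
            problems.append(f"entry {index}: suspicious document ID {doc_id!r}")
        if not filename.endswith(".md"):
            problems.append(f"entry {index}: {filename!r} does not end in .md")
        if doc_id in seen_ids:
            problems.append(f"entry {index}: duplicate document ID {doc_id!r}")
        if filename in seen_files:
            problems.append(f"entry {index}: duplicate filename {filename!r}")
        seen_ids.add(doc_id)
        seen_files.add(filename)
    return problems
```

Calling `validate_documents(DOCUMENTS)` before running Cell 4 and printing any returned problems would catch typos before any API request is made.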

## 4️⃣ Sync All Documents to GitHub

In [None]:
import re
from datetime import datetime
from github import Github
from github.GithubException import GithubException

def fetch_doc_as_html(doc_id: str) -> tuple:
    """Fetch Google Doc and export as HTML."""
    # Get metadata
    doc = docs_service.documents().get(documentId=doc_id).execute()
    title = doc.get('title', 'Untitled')

    # Export as HTML
    html = drive_service.files().export(
        fileId=doc_id,
        mimeType='text/html'
    ).execute()

    return title, html.decode('utf-8')

def html_to_markdown(html: str, title: str, doc_id: str) -> str:
    """Convert HTML to Markdown with frontmatter."""
    content = html

    # Remove style and script tags
    content = re.sub(r'<style[^>]*>.*?</style>', '', content, flags=re.DOTALL)
    content = re.sub(r'<script[^>]*>.*?</script>', '', content, flags=re.DOTALL)

    # Convert HTML to Markdown
    conversions = [
        (r'<h1[^>]*>(.*?)</h1>', r'# \1\n'),
        (r'<h2[^>]*>(.*?)</h2>', r'## \1\n'),
        (r'<h3[^>]*>(.*?)</h3>', r'### \1\n'),
        (r'<h4[^>]*>(.*?)</h4>', r'#### \1\n'),
        (r'<strong[^>]*>(.*?)</strong>', r'**\1**'),
        (r'<b[^>]*>(.*?)</b>', r'**\1**'),
        (r'<em[^>]*>(.*?)</em>', r'*\1*'),
        (r'<i[^>]*>(.*?)</i>', r'*\1*'),
        (r'<code[^>]*>(.*?)</code>', r'`\1`'),
        (r'<br\s*/?>', '\n'),
        (r'<p[^>]*>(.*?)</p>', r'\1\n\n'),
        (r'<li[^>]*>(.*?)</li>', r'- \1\n'),
        (r'<ul[^>]*>', ''),
        (r'</ul>', '\n'),
        (r'<ol[^>]*>', ''),
        (r'</ol>', '\n'),
        (r'<a[^>]*href=["\']([^"\']*)["\'][^>]*>(.*?)</a>', r'[\2](\1)'),
        (r'<[^>]+>', ''),
        (r'&nbsp;', ' '),
        (r'&lt;', '<'),
        (r'&gt;', '>'),
        (r'&quot;', '"'),
        (r'&amp;', '&'),  # decode last so "&amp;lt;" yields "&lt;", not "<"
        (r'\n{3,}', '\n\n'),
    ]

    for pattern, replacement in conversions:
        content = re.sub(pattern, replacement, content, flags=re.DOTALL | re.IGNORECASE)

    content = content.strip()

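    # Add frontmatter (escape backslashes and quotes so the YAML title stays valid)
    timestamp = datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ')
    safe_title = title.replace('\\', '\\\\').replace('"', '\\"')
    frontmatter = f"""---
title: "{safe_title}"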
source: Google Docs
document_id: "{doc_id}"
synced_at: "{timestamp}"
---

"""

    footer = f"""

---

*Synced from [Google Docs](https://docs.google.com/document/d/{doc_id}/edit) on {timestamp}*
"""

    return frontmatter + content + footer

def push_to_github(path: str, content: str, message: str):
    """Push file to GitHub repository."""
    g = Github(GITHUB_TOKEN)
    repo = g.get_repo(REPO_NAME)

    try:
        # Try to get existing file
        existing = repo.get_contents(path, ref=BRANCH)
        repo.update_file(path, message, content, existing.sha, branch=BRANCH)
        return "updated"
    except GithubException as e:
        if e.status == 404:
            repo.create_file(path, message, content, branch=BRANCH)
            return "created"
        raise

# =============================================================================
# SYNC ALL DOCUMENTS
# =============================================================================
print("="*60)
print("📄 GOOGLE DOCS → GITHUB SYNC")
print("="*60)
print()

results = []
for doc_id, filename, description in DOCUMENTS:
    print(f"📥 Fetching: {description}...")

    try:
        # Fetch and convert
        title, html = fetch_doc_as_html(doc_id)
        markdown = html_to_markdown(html, title, doc_id)

        # Push to GitHub
        path = f"{DOCS_PATH}{filename}"
        status = push_to_github(
            path,
            markdown,
            f"docs: sync {filename} from Google Docs"
        )

        print(f"   ✅ {status.upper()}: {path}")
        results.append((filename, "success", status))

    except Exception as e:
        print(f"   ❌ FAILED: {e}")
        results.append((filename, "failed", str(e)))

    print()

# Summary
print("="*60)
print("📊 SYNC SUMMARY")
print("="*60)
success = sum(1 for _, status, _ in results if status == "success")
failed = sum(1 for _, status, _ in results if status == "failed")
print(f"✅ Success: {success}")
print(f"❌ Failed: {failed}")
print()
print(f"🔗 View at: https://github.com/{REPO_NAME}/tree/{BRANCH}/{DOCS_PATH}")
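Because `html_to_markdown` stamps every file with the current time, each run produces a new commit for every document even when its text has not changed. One way to avoid that is to compare the content with the timestamps blanked out. The sketch below assumes the frontmatter and footer keep the exact shape generated in the sync cell. `strip_sync_timestamps` and `content_changed` are hypothetical helpers, not part of this notebook.

```python
import re

# Matches the two places where html_to_markdown embeds the sync timestamp:
# the `synced_at` frontmatter field and the trailing "Synced from ... on ..." line.
_SYNCED_AT = re.compile(r'^synced_at: ".*"$', flags=re.MULTILINE)
_FOOTER = re.compile(
    r'^\*Synced from \[Google Docs\]\((.*?)\) on .*\*$', flags=re.MULTILINE
)


def strip_sync_timestamps(markdown: str) -> str:
    """Blank out the timestamp fields so two syncs of the same doc compare equal."""
    markdown = _SYNCED_AT.sub('synced_at: ""', markdown)
    return _FOOTER.sub(r'*Synced from [Google Docs](\1)*', markdown)


def content_changed(old_markdown: str, new_markdown: str) -> bool:
    """True when the documents differ in anything other than the sync timestamps."""
    return strip_sync_timestamps(old_markdown) != strip_sync_timestamps(new_markdown)
```

Inside `push_to_github`, the existing file's text should be available as `existing.decoded_content.decode('utf-8')` (PyGithub's `ContentFile.decoded_content`). If `content_changed` returns `False` for that text and the new Markdown, the update could be skipped.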

## 5️⃣ Add More Documents (Optional)

To add more documents, edit the `DOCUMENTS` list in Cell 3:

```python
DOCUMENTS = [
    ("DOC_ID", "filename.md", "Description"),
    # Add more here...
]
```

Get the Doc ID from the URL:
```
https://docs.google.com/document/d/THIS_IS_THE_DOC_ID/edit
```
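Copying the ID out of the URL by hand is easy to get wrong. As a minimal sketch, assuming document URLs always follow the `/document/d/<ID>/` shape shown above, a small helper can accept either a full URL or a bare ID. `extract_doc_id` is a hypothetical name, not part of this notebook.

```python
import re

# Assumption: document URLs follow the /document/d/<ID>/ shape shown above.
_DOC_URL = re.compile(r"/document/d/([A-Za-z0-9_-]+)")


def extract_doc_id(url_or_id: str) -> str:
    """Return the document ID from a Google Docs URL, or the input if it is already a bare ID."""
    match = _DOC_URL.search(url_or_id)
    if match:
        return match.group(1)
    # No URL structure found: accept the input only if it looks like a bare ID.
    if re.fullmatch(r"[A-Za-z0-9_-]+", url_or_id):
        return url_or_id
    raise ValueError(f"Could not find a document ID in {url_or_id!r}")
```

With this in place, entries in `DOCUMENTS` could be written with pasted URLs and normalized through `extract_doc_id` before syncing.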

## 🔍 Sync a Single Document

In [None]:
# Quick sync a single document
SINGLE_DOC_ID = ""  # @param {type:"string"}
SINGLE_FILENAME = "custom_doc.md"  # @param {type:"string"}

if SINGLE_DOC_ID:
    print(f"📥 Fetching document {SINGLE_DOC_ID}...")

    title, html = fetch_doc_as_html(SINGLE_DOC_ID)
    markdown = html_to_markdown(html, title, SINGLE_DOC_ID)

    path = f"{DOCS_PATH}{SINGLE_FILENAME}"
    status = push_to_github(path, markdown, f"docs: sync {SINGLE_FILENAME}")

    print(f"✅ {status.upper()}: {path}")
    print(f"🔗 https://github.com/{REPO_NAME}/blob/{BRANCH}/{path}")
else:
    print("Enter a document ID above to sync a single document.")
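The single-document cell falls back to `custom_doc.md` unless a filename is typed in. A possible alternative, sketched here as an assumption rather than the notebook's behavior, is to derive the filename from the document title in the same UPPER_SNAKE_CASE style as the entries in `DOCUMENTS`. `title_to_filename` is a hypothetical helper.

```python
import re
import unicodedata


def title_to_filename(title: str) -> str:
    """Turn a document title into an UPPER_SNAKE_CASE .md filename."""
    # Fold accented characters to their closest ASCII equivalents.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", ascii_title).strip("_").upper()
    # Fall back to a fixed name when nothing usable is left.
    return f"{slug or 'UNTITLED'}.md"
```

The filenames in `DOCUMENTS` appear to be hand-picked, so a generated name will not always match them exactly. The title returned by `fetch_doc_as_html` would be the natural input.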

---

## 📋 Your Document IDs Reference

| Document | ID | Link |
|----------|-------|------|
| Comprehensive Testing Strategy | `1Qp4nf8nl7M8MnaNtmrBgP4B1mw2aSUqEzYMKmFBCzH4` | [Open](https://docs.google.com/document/d/1Qp4nf8nl7M8MnaNtmrBgP4B1mw2aSUqEzYMKmFBCzH4/edit) |
| Google Docs to GitHub Workflow | `12cvYyZdPeLeLJSGX7OW8XQAwvsBVOaiO146UTkVcc7w` | [Open](https://docs.google.com/document/d/12cvYyZdPeLeLJSGX7OW8XQAwvsBVOaiO146UTkVcc7w/edit) |
| Pulser-Agent-Framework Testing | `1qL1fJT6mX4zjXFO_ui8VKKALTlACSa87VgTIc7HXqbo` | [Open](https://docs.google.com/document/d/1qL1fJT6mX4zjXFO_ui8VKKALTlACSa87VgTIc7HXqbo/edit) |
| Odoo 18 CE/OCA Testing | `1Bfe2Lih6dj1Xw85T5xqjtQs5DvT1LMhSW218mnH657A` | [Open](https://docs.google.com/document/d/1Bfe2Lih6dj1Xw85T5xqjtQs5DvT1LMhSW218mnH657A/edit) |
| GitHub Integration | `1WY2GJz8IWTWNuTBIOeAoko5f1o_oMTQMd0kzpBLFxXM` | [Open](https://docs.google.com/document/d/1WY2GJz8IWTWNuTBIOeAoko5f1o_oMTQMd0kzpBLFxXM/edit) |

---

*InsightPulseAI Docs2Code Pipeline*