<a href="https://colab.research.google.com/github/Palaeoprot/GoogleSlides-from-markdown/blob/main/GoogleSlides_md_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cell 1 – Project Overview
# Google Slides Markdown to Presentation Converter

## Requirements
This script converts a Markdown file into a Google Slides presentation using a predefined slide template.

### Formatting Requirements:
1. The presentation must use the custom template from `PalaeomeSlideFormat.pptx`.
2. The **Lato** font must be applied to all text fields (title and body).
3. Slide types and layouts must follow these rules:
   - The **first slide** uses the "Title" layout from the template.
   - All subsequent content slides use the "Title and Content" layout.
   - Any section dividers should use the "Section Header" layout.
4. Use `placeholderIdMappings` in `createSlide` requests to bind content to the TITLE and BODY placeholders.
5. If needed, explicitly enforce Lato font using `updateTextStyle` with `"fontFamily": "Lato"`.
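
Requirements 4 and 5 can be expressed as plain `batchUpdate` request dicts. The following is a minimal sketch and not what the cells below currently do, since they look up placeholder IDs after each slide has been created. The hypothetical helper `build_slide_requests` assigns its own placeholder object IDs through `placeholderIdMappings`, inserts the text, and applies Lato with `updateTextStyle`. The slide and layout IDs are assumptions supplied by the caller, and the resulting list would be sent with `slides_service.presentations().batchUpdate(...)`.

```python
def build_slide_requests(slide_id, layout_id, title, body, font="Lato",
                         title_type="TITLE", body_type="BODY"):
    """Build Slides API batchUpdate requests for one slide (hypothetical helper).

    Placeholder object IDs are chosen here via placeholderIdMappings, so text can be
    inserted and styled in the same batch without re-fetching the slide.
    For a predefined "Title" layout, the types are typically CENTER_TITLE and SUBTITLE.
    """
    title_id = f"{slide_id}_title"
    body_id = f"{slide_id}_body"
    requests = [{
        "createSlide": {
            "objectId": slide_id,
            "slideLayoutReference": {"layoutId": layout_id},
            # Bind the layout's first title/body placeholders to our own object IDs
            "placeholderIdMappings": [
                {"layoutPlaceholder": {"type": title_type, "index": 0}, "objectId": title_id},
                {"layoutPlaceholder": {"type": body_type, "index": 0}, "objectId": body_id},
            ],
        }
    }]
    for object_id, text in ((title_id, title), (body_id, body)):
        if not text:
            continue  # nothing to insert or style
        requests.append({"insertText": {"objectId": object_id, "insertionIndex": 0, "text": text}})
        # Apply the font to all text in the placeholder
        requests.append({
            "updateTextStyle": {
                "objectId": object_id,
                "textRange": {"type": "ALL"},
                "style": {"fontFamily": font},
                "fields": "fontFamily",
            }
        })
    return requests


requests = build_slide_requests("slide_0", "LAYOUT_ID", "Key Concepts", "First point\nSecond point")
print(len(requests))  # 5
```

Each mapped placeholder type has to exist on the chosen layout, otherwise the request is likely to be rejected; `print_layouts` in Cell 4 lists what a template provides.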

Ensure that the template ID (`template_id`, chosen in Cell 3 from `TEMPLATE_OPTIONS`) points to the PalaeomeSlideFormat presentation in Google Drive.

# Markdown Formatting Guide for Google Slides Converter

## Overview
This guide explains the exact formatting requirements for markdown files that will be converted to Google Slides presentations using the Python converter.

## File Structure
- Use **YAML-style frontmatter** with sections separated by `---` (three dashes)
- Each section between separators represents **one slide**
- Sections must be separated by exactly `---` on its own line
- Empty sections are automatically skipped
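
The separator rule can be sketched with a line-anchored regular expression. This illustration uses a hypothetical function name and is not the notebook's parser: only lines consisting of exactly `---` split the file, so `---` inside a sentence stays in the text, and empty sections are dropped.

```python
import re


def split_slide_sections(markdown_text):
    """Split slide Markdown on lines that are exactly '---' (hypothetical helper).

    '---' inside a line does not split, '----' or '--- ' lines are not treated as
    separators, and empty sections (e.g. after a trailing separator) are dropped.
    """
    text = markdown_text.replace("\r\n", "\n")
    parts = re.split(r"^---$", text, flags=re.MULTILINE)
    return [part.strip("\n") for part in parts if part.strip()]


sample = "layout: title\ntitle: Demo\n---\nlayout: content\ntitle: A --- B\n---\n"
for section in split_slide_sections(sample):
    print(repr(section))
# 'layout: title\ntitle: Demo'
# 'layout: content\ntitle: A --- B'
```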

## Required Fields for Each Slide

### Universal Fields
- `layout:` - Determines slide type (required; if missing, the parser falls back to `content`)
- `title:` - Slide title (required; if missing, the parser uses "Untitled Slide")

### Optional Fields
- `subtitle:` - Subtitle for title and section slides
- `presenter:` - Presenter name (title slides only)
- `body:` - Main content (content slides only)

## Slide Layout Types

### 1. Title Slide (`layout: title`)
```markdown
layout: title
title: Your Presentation Title
subtitle: Optional Subtitle Here
presenter: Your Name
---
```

### 2. Section Header (`layout: section_header`)
```markdown
layout: section_header
title: Section Name
subtitle: Optional Section Description
---
```

### 3. Content Slide (`layout: content`)
```markdown
layout: content
title: Slide Title
body: |
  * First bullet point
  * Second bullet point
  * Third bullet point
---
```
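
Each `layout:` value is matched to a predefined Google Slides layout name. The mapping below mirrors `get_layout_id_by_name` in Cell 4; the sketch resolves a key against a dict shaped like a `presentations.get` response, and the trimmed `fake_presentation` is invented for illustration. Templates converted from PowerPoint may use other layout names, so it is worth checking them with `print_layouts`.

```python
LAYOUT_NAMES = {
    "title": "TITLE",
    "section_header": "SECTION_HEADER",
    "content": "TITLE_AND_BODY",
}


def resolve_layout_id(presentation, layout_key):
    """Return the objectId of the layout matching a markdown `layout:` value.

    Unknown keys fall back to the content layout; None means the template has
    no layout with that predefined name.
    """
    target = LAYOUT_NAMES.get(layout_key.strip().lower(), LAYOUT_NAMES["content"])
    for layout in presentation.get("layouts", []):
        if layout.get("layoutProperties", {}).get("name", "").upper() == target:
            return layout["objectId"]
    return None


# Hypothetical, trimmed presentations.get() response
fake_presentation = {"layouts": [
    {"objectId": "p1", "layoutProperties": {"name": "TITLE"}},
    {"objectId": "p2", "layoutProperties": {"name": "TITLE_AND_BODY"}},
]}
print(resolve_layout_id(fake_presentation, "title"))           # p1
print(resolve_layout_id(fake_presentation, "section_header"))  # None
print(resolve_layout_id(fake_presentation, "bullet_list"))     # p2
```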

## Body Content Formatting Rules

### Multi-line Content (Recommended)
Always use the pipe (`|`) syntax for body content:
```markdown
body: |
  Line 1 of content
  Line 2 of content
  Line 3 of content
```

### Indentation Rules
- **CRITICAL:** Every body content line must be indented by at least **2 spaces** (the base indent)
- The parser removes only the first 2 spaces from each line
- Each additional 2 spaces beyond the base indent adds one level of nesting

```markdown
body: |
  * Top-level bullet (2 spaces)
    * Sub-bullet (4 spaces = 2 removed + 2 kept)
      * Sub-sub-bullet (6 spaces = 2 removed + 4 kept)
```
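
The indentation rule above amounts to "remove the 2-space base indent, then count 2 spaces per level". A minimal sketch with a hypothetical function name:

```python
def body_levels(body_lines, base_indent=2):
    """Turn raw `body: |` lines into (nesting_level, text) pairs.

    The first `base_indent` spaces are removed; every further 2 spaces is one level.
    Blank lines come back as (0, "").
    """
    result = []
    for raw in body_lines:
        if not raw.strip():
            result.append((0, ""))
            continue
        line = raw[base_indent:] if raw.startswith(" " * base_indent) else raw.lstrip()
        extra = len(line) - len(line.lstrip(" "))
        result.append((extra // 2, line.strip()))
    return result


lines = ["  * Top-level bullet", "    * Sub-bullet", "      * Sub-sub-bullet"]
print(body_levels(lines))
# [(0, '* Top-level bullet'), (1, '* Sub-bullet'), (2, '* Sub-sub-bullet')]
```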

### Bullet Points
Use asterisks (`*`) or dashes (`-`) for bullets:
```markdown
body: |
  * Primary bullet point
  * Another primary bullet
    * Nested sub-bullet
    * Another sub-bullet
  * Back to primary level
```
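
Cell 5 renders bullets as literal `•`, `◦` and `▪` characters. An alternative, not used by this notebook, is to let Slides create real bullets: according to the Slides API reference, `createParagraphBullets` takes each paragraph's nesting level from its leading tab characters and removes those tabs. The sketch below (hypothetical helper names; `BULLET_DISC_CIRCLE_SQUARE` is one of the API's bullet presets) converts bullet lines whose base indent has already been removed into tab-prefixed paragraphs and builds the two requests.

```python
def to_tabbed_bullets(body_lines):
    """Convert '*'/'-' bullet lines (2 spaces per level, base indent already
    removed) into tab-prefixed paragraphs for createParagraphBullets."""
    paragraphs = []
    for line in body_lines:
        stripped = line.lstrip(" ")
        level = (len(line) - len(stripped)) // 2
        if stripped.startswith(("* ", "- ")):
            stripped = stripped[2:]  # drop the markdown bullet marker
        paragraphs.append("\t" * level + stripped)
    return "\n".join(paragraphs)


def bullet_requests(object_id, text, preset="BULLET_DISC_CIRCLE_SQUARE"):
    """Insert the text, then turn every paragraph in the shape into a bullet."""
    return [
        {"insertText": {"objectId": object_id, "insertionIndex": 0, "text": text}},
        {"createParagraphBullets": {
            "objectId": object_id,
            "textRange": {"type": "ALL"},
            "bulletPreset": preset,
        }},
    ]


text = to_tabbed_bullets(["* Primary", "* Another", "  * Nested", "* Back"])
print(repr(text))  # 'Primary\nAnother\n\tNested\nBack'
```

With `textRange` set to `ALL`, every paragraph gets a bullet, including plain sentences; a body that mixes prose and bullets would need `FIXED_RANGE` ranges instead.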

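### Text Formatting
Standard inline markdown is accepted within content, but the converter strips the markers and inserts plain text, so bold, italic, and code styling are not reproduced on the slide: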
```markdown
body: |
  * **Bold text** for emphasis
  * *Italic text* for subtle emphasis
  * `Code or technical terms`
  * Regular text for normal content
```
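
Because the markers are stripped, keeping bold would require recording where each `**...**` span lands in the plain text and styling those ranges afterwards. A sketch under stated assumptions: the hypothetical helpers `strip_bold` and `bold_requests` run on the final text (after any bullet characters are added), the text is inserted at index 0 of an empty shape, and it contains no characters outside the Basic Multilingual Plane, because the Slides API counts indexes in UTF-16 code units while Python counts code points.

```python
import re

BOLD = re.compile(r"\*\*(.+?)\*\*")


def strip_bold(text):
    """Remove **...** markers and return (plain_text, [(start, end), ...]) where
    each pair is the span of formerly-bold text in plain_text (end exclusive)."""
    pieces, spans, pos, out_len = [], [], 0, 0
    for match in BOLD.finditer(text):
        before = text[pos:match.start()]
        pieces.append(before)
        out_len += len(before)
        inner = match.group(1)
        spans.append((out_len, out_len + len(inner)))
        pieces.append(inner)
        out_len += len(inner)
        pos = match.end()
    pieces.append(text[pos:])
    return "".join(pieces), spans


def bold_requests(object_id, spans):
    """One updateTextStyle request per bold span, using FIXED_RANGE indexes."""
    return [{
        "updateTextStyle": {
            "objectId": object_id,
            "textRange": {"type": "FIXED_RANGE", "startIndex": start, "endIndex": end},
            "style": {"bold": True},
            "fields": "bold",
        }
    } for start, end in spans]


plain, spans = strip_bold("* **Bold text** for emphasis")
print(plain)  # * Bold text for emphasis
print(spans)  # [(2, 11)]
```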

### Paragraph Breaks
Use empty lines for paragraph separation:
```markdown
body: |
  * First paragraph of bullets
  * More bullets in first paragraph
  
  * Second paragraph starts here
  * More content in second paragraph
```
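
A minimal sketch of collecting a `body: |` block while keeping blank lines as paragraph breaks. The function name is hypothetical, and text written on the `body:` line itself is ignored here. Joining the result with `"\n"` produces an empty paragraph wherever the source had a blank line.

```python
def collect_body(section_lines):
    """Collect the lines after 'body: |' up to the next unindented line.

    Blank lines are kept (as "") so paragraph breaks survive; trailing blanks
    are dropped. Only the 2-space base indent is removed.
    """
    body, in_body = [], False
    for raw in section_lines:
        if not in_body:
            in_body = raw.strip().startswith("body:")
            continue
        if not raw.strip():
            body.append("")  # paragraph break
        elif not raw.startswith(" "):
            break  # an unindented line ends the body block
        else:
            body.append(raw[2:] if raw.startswith("  ") else raw.lstrip())
    while body and not body[-1]:
        body.pop()
    return body


section = ["layout: content", "title: T", "body: |", "  * First", "  ", "  * Second", "notes: x"]
print(collect_body(section))  # ['* First', '', '* Second']
```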

## Complete Example Template

```markdown
layout: title
title: My Presentation Title
subtitle: A Comprehensive Overview
presenter: Your Name
---
layout: section_header
title: Introduction
subtitle: Setting the Context
---
layout: content
title: Key Concepts
body: |
  * **First concept:** Brief explanation here
  * **Second concept:** Building on the first
    * Supporting detail A
    * Supporting detail B
  * **Third concept:** Bringing it all together
---
layout: content
title: Detailed Analysis
body: |
  The research reveals several important findings:
  
  * **Finding 1:** Significant results observed
  * **Finding 2:** Unexpected patterns emerged
  * **Finding 3:** Hypothesis confirmed
  
  These results suggest new directions for future work.
---
layout: section_header
title: Conclusion
---
layout: content
title: Summary
body: |
  * We have successfully demonstrated our approach
  * Key contributions include methodology and findings
  * Future work should expand the scope
  * Thank you for your attention
---
```

## Critical Formatting Rules

### ✅ DO:
- Use exactly `---` (three dashes) as separators
- Indent body content with 2 spaces, plus 2 more per nesting level
- Use consistent field names: `layout:`, `title:`, `subtitle:`, `body:`
- Include `body: |` for multi-line content
- Use standard markdown formatting (`**bold**`, `*italic*`)

### ❌ DON'T:
- Use tabs instead of spaces for indentation
- Forget the `|` after `body:`
- Use inconsistent spacing
- Include extra characters in separators (like `----` or `--- `)
- Mix indentation styles within the same file

## Common Mistakes to Avoid

1. **Wrong separator:** Using `----` instead of `---`
2. **Missing pipe:** Writing `body:` instead of `body: |`
3. **Incorrect indentation:** Using tabs or wrong number of spaces
4. **Inconsistent field names:** Using `Body:` instead of `body:`
5. **Missing separators:** Forgetting `---` between slides

## Validation Checklist

Before running the converter, verify:
- [ ] Each slide section is separated by `---`
- [ ] All body content uses `body: |` format
- [ ] All body content lines start with at least 2 spaces (2 more per nesting level)
- [ ] Layout types are spelled correctly: `title`, `section_header`, or `content`
- [ ] No trailing spaces after field names or separators
- [ ] File encoding is UTF-8
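
Several checklist items can be automated. The sketch below uses a hypothetical function name and flags tabs, malformed separators, `body:` without `|`, capitalised field names, unknown layouts and trailing spaces on field lines; it does not check indentation depth or file encoding.

```python
import re

VALID_LAYOUTS = {"title", "section_header", "content"}
KNOWN_FIELDS = {"layout", "title", "subtitle", "presenter", "body"}


def check_slide_markdown(text):
    """Return a list of problems found by a few of the checks above (hypothetical helper)."""
    problems = []
    for n, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            problems.append(f"line {n}: tab character, use spaces")
        if re.fullmatch(r"-{4,}\s*|---\s+", line):
            problems.append(f"line {n}: separator must be exactly '---'")
        field_match = re.match(r"([A-Za-z_]+):", line)  # unindented 'name:' lines only
        if not field_match:
            continue
        field = field_match.group(1)
        value = line.split(":", 1)[1].strip()
        if field.lower() in KNOWN_FIELDS and field != field.lower():
            problems.append(f"line {n}: field '{field}' should be lowercase")
        if field == "body" and value == "":
            problems.append(f"line {n}: use 'body: |' for multi-line content")
        if field == "layout" and value not in VALID_LAYOUTS:
            problems.append(f"line {n}: unknown layout '{value}'")
        if line != line.rstrip():
            problems.append(f"line {n}: trailing spaces")
    return problems


for problem in check_slide_markdown("Layout: content\ntitle: T\nbody:\n----\nlayout: slides\n\tx"):
    print(problem)
```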

## Field Defaults

If fields are missing, the parser uses these defaults:
- **layout:** Defaults to `content`
- **title:** Defaults to "Untitled Slide"
- **subtitle:** Defaults to empty string
- **body:** Defaults to empty list
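
The defaults can be expressed as a starting dict that parsed values overwrite. A sketch with hypothetical names that handles only single-line fields; note that Cell 6 stores the body under the key `content` rather than `body`.

```python
SLIDE_DEFAULTS = {"layout": "content", "title": "Untitled Slide", "subtitle": ""}
SINGLE_LINE_FIELDS = ("layout", "title", "subtitle", "presenter")


def parse_simple_fields(section):
    """Read single-line 'key: value' fields from one slide section and fill in
    the defaults for anything missing. The body is returned as an empty list;
    collecting the indented 'body: |' block is out of scope for this sketch."""
    slide = dict(SLIDE_DEFAULTS, body=[])
    for line in section.splitlines():
        key, sep, value = line.partition(":")
        if sep and not line.startswith(" ") and key in SINGLE_LINE_FIELDS:
            slide[key] = value.strip()
    return slide


print(parse_simple_fields("title: Summary"))
# {'layout': 'content', 'title': 'Summary', 'subtitle': '', 'body': []}
```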

Following these formatting rules ensures your markdown file will convert correctly to a Google Slides presentation with proper layout, formatting, and content structure.

In [None]:
# ===== Cell 2 =====
# Authentication and Google Drive connection
from google.colab import auth, drive
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
import os
import traceback
import time
import re
import yaml

# Mount Google Drive
drive.mount('/content/drive')

# Authenticate using Colab's built-in method
auth.authenticate_user()

# Build the Google Slides service
slides_service = build('slides', 'v1')

In [None]:
# ===== Cell 3 (REVISED for File Discovery and Template Dropdown) =====
# Configuration parameters
import os

# --- Discover Markdown Files in /content ---
content_dir = "/content"
md_files = [f for f in os.listdir(content_dir) if f.endswith('.md')]

# Check for files and set a default or error message
if not md_files:
    md_options = ["No .md files found in /content"]
    default_md = md_options[0]
else:
    md_options = md_files
    # Prioritize 'test2_deduped.md' if it exists, otherwise use the first found file
    if "test2_deduped.md" in md_options:
        default_md = "test2_deduped.md"
    else:
        default_md = md_options[0]

# --- Template Options ---
TEMPLATE_OPTIONS = {
    "Palaeome (GoogleSlides)": "1bYjQ10Pfkjk16ATPJlXL9xy-MxkR9JQVgF41pEZPMHg",
    "Codicum (PowerPoint derived)": "1uU5MxvuYRsTuNg8zCUU5Gf9Lw7rrewVmYNuT5z3zk1U"
}
default_template = "Palaeome (GoogleSlides)"
template_keys = list(TEMPLATE_OPTIONS.keys())


# --- User Adjustable Parameters ---

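# Markdown file selection. Colab form dropdowns only accept a literal list of
# options, so they cannot be filled from md_options at runtime; the detected
# default is used unless you overwrite md_filename with one of the names below.
print("Markdown files found:", md_options)
md_filename = default_md
md_path = os.path.join(content_dir, md_filename)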

# Dropdown Menu for Template Selection
template_choice = "Palaeome (GoogleSlides)" #@param ["Palaeome (GoogleSlides)", "Codicum (PowerPoint derived)"]
template_id = TEMPLATE_OPTIONS[template_choice]

# Other parameters
save_folder = "/content/drive/MyDrive/4 Presentations/Python Presentations/" #@param {type:"string"}
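# Derived from the Markdown filename (e.g. talk.md -> talk_Presentation)
presentation_name = f"{os.path.splitext(md_filename)[0]}_Presentation"
debug_mode = False #@param {type:"boolean"}
# Note: default_font is not currently used by create_slide_presentation (Cell 5)
default_font = "Lato Light" #@param {type:"string"}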

# --- Final Check ---
if "No .md files found" in md_filename:
    print("WARNING: Please upload a Markdown (.md) file to the /content directory.")
    # template_id is left valid so Cell 4 can still list the template's layouts;
    # Cell 7 reports an error because the Markdown file cannot be read.

In [None]:
# ===== Cell 4 =====
# Helper functions for folder and layout management
def get_or_create_folder(service, folder_path):
    """Creates folder structure in Google Drive if it doesn't exist."""
    relative_path = folder_path.replace('/content/drive/MyDrive/', '')
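    # Skip empty segments so a path ending at MyDrive does not create a nameless folder
    parts = [p for p in relative_path.strip('/').split('/') if p]
    parent_id = 'root'

    for part in parts:
        # Escape backslashes and single quotes for the Drive query language
        safe_part = part.replace('\\', '\\\\').replace("'", "\\'")
        query = (
            f"name='{safe_part}' and mimeType='application/vnd.google-apps.folder' "
            f"and '{parent_id}' in parents and trashed=false"
        )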
        result = service.files().list(q=query, spaces='drive', fields='files(id, name)').execute().get('files', [])
        if result:
            parent_id = result[0]['id']
        else:
            folder_metadata = {
                'name': part,
                'mimeType': 'application/vnd.google-apps.folder',
                'parents': [parent_id]
            }
            folder = service.files().create(body=folder_metadata, fields='id').execute()
            parent_id = folder['id']
    return parent_id

def get_layout_id_by_name(presentation_info, layout_name):
    """Finds the layout ID based on its name."""
    layout_mapping = {
        'title': 'TITLE',
        'section_header': 'SECTION_HEADER',
        'content': 'TITLE_AND_BODY'
    }

    target_layout = layout_mapping.get(layout_name.lower(), 'TITLE_AND_BODY')

    for layout in presentation_info.get('layouts', []):
        if layout['layoutProperties']['name'].upper() == target_layout.upper():
            return layout['objectId']
    return None

def print_layouts(presentation_id):
    """Debug function to print all available layouts in the template."""
    slides_service = build('slides', 'v1')
    presentation = slides_service.presentations().get(presentationId=presentation_id).execute()
    print("Available layouts:")
    for layout in presentation.get('layouts', []):
        layout_name = layout['layoutProperties']['name']
        layout_id = layout['objectId']
        print(f"Layout Name: {layout_name}, Layout ID: {layout_id}")
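        # Placeholders are page elements on the layout whose shape carries a 'placeholder'
        for element in layout.get('pageElements', []):
            placeholder = element.get('shape', {}).get('placeholder')
            if placeholder:
                print(f" - Placeholder Type: {placeholder.get('type')}, "
                      f"Index: {placeholder.get('index', 0)}, ID: {element['objectId']}")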

# Print available layouts for debugging
print_layouts(template_id)

In [None]:
# ===== Cell 5 - Universal Placeholder Fix =====
# Clean text processing functions that remove all markdown formatting

def strip_inline_markdown(text):
    """Removes paired **bold**, __bold__, *italic*, _italic_ and `code` markers.

    Lone underscores and asterisks (e.g. snake_case names, "5 * 3") are kept.
    """
    text = re.sub(r'`([^`]*)`', r'\1', text)
    text = re.sub(r'(\*\*|__)(.+?)\1', r'\2', text)
    text = re.sub(r'(?<![\w*])\*(?!\s)(.+?)(?<!\s)\*', r'\1', text)
    text = re.sub(r'(?<!\w)_(?!\s)(.+?)(?<!\s)_(?!\w)', r'\1', text)
    return text


def clean_markdown_text(text):
    """
    Removes markdown formatting and returns clean plain text.
    Converts '*' / '-' bullets to display bullet characters (one nesting level per
    2 spaces of indentation) and strips bold/italic/code markers.
    """
    bullet_chars = ['• ', '◦ ', '▪ ']
    processed_lines = []

    for line in text.split('\n'):
        # Keep empty lines as paragraph breaks
        if not line.strip():
            processed_lines.append('')
            continue

        stripped_line = line.lstrip()

        # Handle bullet points
        if stripped_line.startswith(('* ', '- ')):
            indent_level = (len(line) - len(stripped_line)) // 2
            clean_text = strip_inline_markdown(stripped_line[2:].strip())

            # Pick the display bullet for this level; deeper levels reuse the last one
            bullet_char = bullet_chars[min(indent_level, len(bullet_chars) - 1)]
            processed_lines.append('    ' * indent_level + bullet_char + clean_text)

        else:
            # Regular text line: only remove inline formatting
            processed_lines.append(strip_inline_markdown(line))

    return '\n'.join(processed_lines)


def get_placeholder_ids(slides_service, presentation_id, slide_id):
    """
    Fetches placeholder IDs from a newly created slide, prioritizing
    standard TITLE, BODY, and SUBTITLE types.
    """
    try:
        presentation = slides_service.presentations().get(presentationId=presentation_id).execute()

        slide = next((s for s in presentation['slides'] if s['objectId'] == slide_id), None)

        if slide:
            print(f"\n🔍 Slide ID: {slide_id}")
            placeholders = {}
            for element in slide.get('pageElements', []):
                if 'shape' in element and 'placeholder' in element['shape']:
                    ph_info = element['shape']['placeholder']
                    ph_type = ph_info.get('type', 'UNKNOWN')
                    ph_id = element['objectId']

                    # Store all found placeholders by type.
                    # CRITICAL: This is the data we search for in the insertion loop.
                    placeholders[ph_type] = ph_id
                    print(f" - Found placeholder: Type={ph_type}, Object ID={ph_id}")
            return placeholders

        print(f"⚠️ No slide found with ID {slide_id}")
        return {}
    except Exception as e:
        print(f"Error fetching placeholders for slide {slide_id}: {e}")
        traceback.print_exc()
        return {}


def create_slide_presentation(slides_content, folder_path, template_id, presentation_name, debug=False):
    """Creates a Google Slides presentation from parsed markdown content with clean text."""
    slides_service = build('slides', 'v1')
    drive_service = build('drive', 'v3')

    # Copy template to new presentation
    folder_id = get_or_create_folder(drive_service, folder_path)
    copied_file = {'name': presentation_name, 'parents': [folder_id]}
    presentation = drive_service.files().copy(fileId=template_id, body=copied_file).execute()
    presentation_id = presentation['id']

    # Fetch the copy once: its layouts are the ones the new slides will use,
    # and its existing slides are deleted below
    presentation_info = slides_service.presentations().get(presentationId=presentation_id).execute()
    template_info = presentation_info  # layout lookups below read from the copy

    # Remove all initial slides
    requests = []
    for slide in presentation_info.get('slides', []):
        requests.append({'deleteObject': {'objectId': slide['objectId']}})

    if requests:
        slides_service.presentations().batchUpdate(
            presentationId=presentation_id,
            body={'requests': requests}
        ).execute()

    # Now add new slides one by one
    requests = []

    # Store slide object IDs for later mapping
    slide_ids = {}

    for i, slide_data in enumerate(slides_content):
        sid = f"slide_{i}"
        slide_ids[i] = sid

        # Determine layout ID based on slide's layout type
        layout_id = get_layout_id_by_name(template_info, slide_data['layout'])
        if not layout_id:
            print(f"Warning: Layout '{slide_data['layout']}' not found. Falling back to TITLE_AND_BODY.")
            # Fallback to a known content slide layout
            layout_id = get_layout_id_by_name(template_info, 'content')

        # Create slide request
        requests.append({
            'createSlide': {
                'objectId': sid,
                'slideLayoutReference': {'layoutId': layout_id}
            }
        })

    # Execute batch update to create slides
    slides_service.presentations().batchUpdate(
        presentationId=presentation_id,
        body={'requests': requests}
    ).execute()

    # Now fetch presentation again and insert clean text
    content_requests = []

    for i, slide_data in enumerate(slides_content):
        try:
            sid = slide_ids[i]
            placeholders = get_placeholder_ids(slides_service, presentation_id, sid)

            # --- UNIVERSAL PLACEHOLDER IDENTIFICATION ---
            # Standard placeholder types are checked first; this should work for both templates.
            title_id = placeholders.get('TITLE') or placeholders.get('CENTER_TITLE')
            body_id = placeholders.get('BODY')
            subtitle_id = placeholders.get('SUBTITLE')

            # --- TITLE INSERTION ---
            clean_title = slide_data['title'].replace('**', '').replace('*', '')
            if title_id and clean_title:
                print(f"Inserting title: {clean_title} into {title_id}")
                content_requests.append({'insertText': {'objectId': title_id, 'text': clean_title}})
            elif not title_id:
                print(f"⚠️ Slide {i+1} ('{slide_data['title']}'): No title placeholder found "
                      f"(placeholder types: {list(placeholders.keys())}).")

            # --- BODY/CONTENT INSERTION ---
            if body_id and slide_data.get('content'):
                clean_body_text = clean_markdown_text('\n'.join(slide_data['content']))
                print(f"📝 Slide {i+1}: Processed content preview: {clean_body_text[:100]}...")
                content_requests.append({'insertText': {'objectId': body_id, 'text': clean_body_text}})

            # --- SUBTITLE/PRESENTER INSERTION ---
            # Subtitle and presenter share the SUBTITLE placeholder, one per line
            subtitle_text = '\n'.join(
                part for part in (slide_data.get('subtitle'), slide_data.get('presenter')) if part
            )
            if subtitle_id and subtitle_text:
                clean_text = subtitle_text.replace('**', '').replace('*', '')
                print(f"Inserting subtitle/presenter: {clean_text}")
                content_requests.append({'insertText': {'objectId': subtitle_id, 'text': clean_text}})

        except Exception as e:
            print(f"Error processing slide {i+1}: {e}")
            continue

    # Send insert text requests in batches
    print(f"📝 Inserting clean text content...")

    # Batch updates are safer
    for i in range(0, len(content_requests), 50):
        slides_service.presentations().batchUpdate(
            presentationId=presentation_id,
            body={'requests': content_requests[i:i + 50]}
        ).execute()

    print(f"✅ Presentation created: https://docs.google.com/presentation/d/{presentation_id}")
    return presentation_id
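
`get_placeholder_ids` downloads the whole presentation again for every slide. A possible alternative, sketched with a hypothetical helper, is to fetch once after the `createSlide` batch with `slides_service.presentations().get(presentationId=presentation_id).execute()`, pass the result to `placeholders_by_slide`, and read `placeholder_map.get(sid, {})` inside the insertion loop.

```python
def placeholders_by_slide(presentation):
    """Map slide objectId -> {placeholder type: element objectId} from one
    presentations.get() response (hypothetical helper)."""
    result = {}
    for slide in presentation.get("slides", []):
        found = {}
        for element in slide.get("pageElements", []):
            placeholder = element.get("shape", {}).get("placeholder")
            if placeholder:
                # As in get_placeholder_ids, a later placeholder of the same type wins
                found[placeholder.get("type", "UNKNOWN")] = element["objectId"]
        result[slide["objectId"]] = found
    return result


# Hypothetical, trimmed presentations.get() response
fake = {"slides": [{
    "objectId": "slide_0",
    "pageElements": [
        {"objectId": "t1", "shape": {"placeholder": {"type": "TITLE"}}},
        {"objectId": "b1", "shape": {"placeholder": {"type": "BODY"}}},
        {"objectId": "img", "image": {}},
    ],
}]}
print(placeholders_by_slide(fake))  # {'slide_0': {'TITLE': 't1', 'BODY': 'b1'}}
```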

In [None]:
# ===== Cell 6 =====
# Enhanced markdown parser for YAML-style frontmatter
def parse_markdown_to_slides(md_path, debug=False):
    """
    Parses markdown file with YAML-style frontmatter separated by ---.
    Handles layout, title, subtitle, and body fields.
    """
    try:
        with open(md_path, 'r', encoding='utf-8') as f:
            content = f.read()
    except Exception as e:
        print(f"Error reading markdown file: {e}")
        return []

    slides = []

    # Split only on lines consisting of '---' (optional trailing spaces/tabs), so a
    # '---' inside body text does not start a new slide
    content = content.replace('\r\n', '\n')
    sections = re.split(r'^---[ \t]*$', content, flags=re.MULTILINE)

    # Remove empty sections and strip whitespace
    sections = [section.strip() for section in sections if section.strip()]

    for section in sections:
        slide_data = {
            'layout': 'content',  # Default layout
            'title': 'Untitled Slide',
            'subtitle': '',
            'content': []
        }

        current_field = None
        body_content = []

        for raw_line in section.split('\n'):
            line = raw_line.strip()

            # Blank or indented lines after 'body:' belong to the body. Only the
            # 2-space base indent is removed so deeper indentation keeps nesting,
            # and blank lines are kept as paragraph breaks.
            if current_field == 'body' and (not line or raw_line.startswith(' ')):
                body_content.append(raw_line[2:] if raw_line.startswith('  ') else line)
                continue

            if line.startswith('layout:'):
                slide_data['layout'] = line.split(':', 1)[1].strip()
            elif line.startswith('title:'):
                slide_data['title'] = line.split(':', 1)[1].strip()
            elif line.startswith('subtitle:'):
                slide_data['subtitle'] = line.split(':', 1)[1].strip()
            elif line.startswith('presenter:'):
                # Presenter name (title slides)
                slide_data['presenter'] = line.split(':', 1)[1].strip()
            elif line.startswith('body:'):
                current_field = 'body'
                # Keep any text written on the same line as "body:" (other than "|")
                body_part = line.split(':', 1)[1].strip()
                if body_part and body_part != '|':
                    body_content.append(body_part)
            elif current_field == 'body' and line:
                # Unindented continuation text after 'body:'
                body_content.append(line)

        # Drop trailing blank lines collected at the end of the body
        while body_content and not body_content[-1].strip():
            body_content.pop()

        # Set the body content
        slide_data['content'] = body_content

        # Only add slides that have meaningful content
        if slide_data['title'] != 'Untitled Slide' or slide_data['content']:
            slides.append(slide_data)

    if debug:
        print(f"Parsed {len(slides)} slides:")
        for i, slide in enumerate(slides):
            print(f"Slide {i+1}: '{slide['title']}' (Layout: {slide['layout']})")
            if slide.get('subtitle'):
                print(f"  Subtitle: {slide['subtitle']}")
            print(f"  Content lines: {len(slide['content'])}")
            if slide['content']:
                print(f"  First content line: {slide['content'][0][:50]}...")

    return slides

In [None]:
# ===== Cell 7 =====
# Execute the conversion process
print("🚀 Starting markdown to slides conversion...")

# Parse the markdown file
slides = parse_markdown_to_slides(md_path, debug=debug_mode)

if slides:
    print(f"📄 Found {len(slides)} slides in the markdown file:")
    for i, slide in enumerate(slides):
        print(f"  Slide {i+1}: '{slide['title']}' (Layout: {slide['layout']})")
        if slide.get('subtitle'):
            print(f"    Subtitle: {slide['subtitle']}")
        print(f"    Content lines: {len(slide['content'])}")

    print(f"\n🔧 Creating presentation from template...")

    # Create the presentation
    presentation_id = create_slide_presentation(
        slides,
        folder_path=save_folder,
        template_id=template_id,
        presentation_name=presentation_name,
        debug=debug_mode
    )

    print(f"\n🎉 SUCCESS! Presentation created successfully!")
    print(f"📎 Direct link: https://docs.google.com/presentation/d/{presentation_id}")

else:
    print("❌ ERROR: No slides found in markdown file. Check your markdown formatting.")