Improve onboarding imports and graph summaries#6362

Merged
kodjima33 merged 2 commits into main from feat/gmail-background-import on Apr 6, 2026

Conversation

@kodjima33
Collaborator

Summary

  • improve onboarding source connectors and branded icons
  • expand Gmail session-based import coverage and dedupe for higher onboarding email counts
  • improve second-brain footer copy and graph layout for onboarding

Testing

  • swift test --filter OnboardingFlowTests

@kodjima33 kodjima33 merged commit 16f7b24 into main Apr 6, 2026
3 checks passed
@kodjima33 kodjima33 deleted the feat/gmail-background-import branch April 6, 2026 19:36
@greptile-apps
Contributor

greptile-apps Bot commented Apr 6, 2026

Greptile Summary

This PR improves the onboarding Gmail import by merging a Gmail-bootstrap-page scan with per-label Atom feeds for higher email coverage, adds branded connector icons (Google Calendar, Gmail), and adds a "Who you are" footer panel to the second-brain graph pane.

  • Dedup key mismatch (P1): bootstrapEmails uses real Gmail hex IDs; fetchGmailViaLabelFeeds (Atom path) generates atom_<sha1> IDs when the link URL lacks /message_id=. The dictionary merge in readRecentEmails won't collapse those pairs, so the same message can reach synthesizeFromEmails twice, inflating memory counts and LLM token usage.

Confidence Score: 4/5

Safe to merge after resolving the dedup key mismatch in readRecentEmails.

One P1 logic issue: duplicate emails can survive the ID-based merge and be sent to the LLM synthesis step, producing inflated memory counts. All other findings are P2 style/cleanup items that don't affect correctness.

GmailReaderService.swift — the readRecentEmails merge block and fetchGmailViaLabelFeeds.

Important Files Changed

| Filename | Overview |
|---|---|
| desktop/Desktop/Sources/GmailReaderService.swift | Adds multi-label Atom feed collection and dedup merge; P1 ID-format mismatch means bootstrap and Atom emails for the same message aren't collapsed. |
| desktop/Desktop/Sources/ConnectorBrandIcon.swift | New component that loads branded icons from app bundles or bundled PNG resources with a cached NSImage loader; looks correct. |
| desktop/Desktop/Sources/OnboardingDataSourcesStepView.swift | Adopts ConnectorBrandIcon in all source rows; logic unchanged and straightforward. |
| desktop/Desktop/Sources/OnboardingStepScaffold.swift | Adds rightPaneFooterText and 'Who you are' section to the graph pane footer; clean implementation. |
| desktop/Desktop/Sources/OnboardingPagedIntroCoordinator.swift | Adds connectedContextSummary computed property feeding the new footer; logic is straightforward. |

Sequence Diagram

sequenceDiagram
    participant OC as OnboardingCoordinator
    participant GRS as GmailReaderService
    participant BS as Bootstrap (Gmail HTML)
    participant LF as Label Feeds (13 Atom feeds)
    participant LLM as LLM (synthesizeFromEmails)

    OC->>GRS: readRecentEmails(maxResults:300, query:"newer_than:365d")
    GRS->>BS: fetchGmailViaAtomFeedSingle(allowBootstrap:true)
    BS-->>GRS: bootstrapEmails [real Gmail IDs]
    GRS->>LF: fetchGmailViaLabelFeeds(maxResults:300)
    loop each of 13 feed paths
        LF->>BS: fetchGmailViaAtomFeedSingle(feedPath, allowBootstrap:false)
        BS-->>LF: atom emails [atom_sha1 IDs if no message_id in URL]
    end
    LF-->>GRS: labelEmails [mixed ID formats]
    GRS->>GRS: merge by email.id (ID mismatch = duplicates survive)
    GRS-->>OC: merged emails (may contain duplicates)
    OC->>LLM: synthesizeFromEmails(emails.prefix(120))
    LLM-->>OC: memories, tasks, profileSummary

Comments Outside Diff (1)

  1. desktop/Desktop/Sources/GmailReaderService.swift, line 159-166 (link)

    P2 Redundant sort after inner sort

    The days > 20 branch already sorts and trims to maxResults before assigning to emails. The unconditional return emails.sorted { $0.date > $1.date } at line 166 re-sorts an already-sorted array. For maxResults = 300 this is a trivial O(n log n) overhead, but the intent is unclear.
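
    One way to make the intent explicit would be to sort and trim in a single place at the return. The sketch below is illustrative only: the `GmailEmail` stand-in and the `newestFirst` helper are hypothetical names, and the real function's branches are not shown in this review.

    ```swift
    import Foundation

    // Stand-in model; only `id` and `date` are confirmed by the diff excerpts.
    struct GmailEmail {
        let id: String
        let date: Date
    }

    // Hypothetical helper: the single place where ordering and trimming happen,
    // so callers never need to re-sort the result.
    func newestFirst(_ emails: [GmailEmail], maxResults: Int) -> [GmailEmail] {
        // max(0, ...) guards against a negative count, which prefix(_:) does not accept.
        Array(emails.sorted { $0.date > $1.date }.prefix(max(0, maxResults)))
    }
    ```

    With a helper like this, each branch could assign unsorted results to `emails` and the function would end with `return newestFirst(emails, maxResults: maxResults)`.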

Reviews (1): Last reviewed commit: "Improve onboarding imports and graph sum..."

Comment on lines +152 to +162
    var merged: [String: GmailEmail] = [:]
    for email in bootstrapEmails + labelEmails {
        let existing = merged[email.id]
        if existing == nil || existing!.date < email.date {
            merged[email.id] = email
        }
    }
    emails = Array(merged.values)
        .sorted { $0.date > $1.date }
        .prefix(maxResults)
        .map(\.self)
Contributor

P1 Dedup key mismatch between bootstrap and Atom-feed emails

The bootstrapEmails path extracts real Gmail thread/message IDs (hex strings like 1958abc…), while the Atom-feed path (fetchGmailViaLabelFeeds) generates atom_<sha1> IDs for any entry whose link URL lacks /message_id=. Because the two sources produce structurally incompatible IDs for the same email, the dictionary merge keyed on email.id will not detect those duplicates—the same message can end up in the result set twice. When the merged array is later passed to synthesizeFromEmails, the LLM receives duplicate content, inflating memory counts and wasting tokens.
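
A possible mitigation, sketched under assumptions: since the two ID formats cannot be mapped onto each other, the merge could key on a content-derived fingerprint instead of `email.id`. The `sender` and `subject` fields, the `contentKey` helper, and `mergeEmails` are hypothetical; the excerpts in this review only confirm that `GmailEmail` has `id` and `date`.

```swift
import Foundation

// Stand-in model. `sender` and `subject` are assumed fields for illustration.
struct GmailEmail {
    let id: String
    let sender: String
    let subject: String
    let date: Date
}

// Hypothetical helper: a key that does not depend on which path produced the ID.
func contentKey(for email: GmailEmail) -> String {
    let sender = email.sender.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
    let subject = email.subject.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
    // Truncate to the minute so sub-minute timestamp differences still match.
    let minute = Int(email.date.timeIntervalSince1970 / 60)
    return "\(sender)|\(subject)|\(minute)"
}

func mergeEmails(_ emails: [GmailEmail], maxResults: Int) -> [GmailEmail] {
    var merged: [String: GmailEmail] = [:]
    for email in emails {
        let key = contentKey(for: email)
        if let existing = merged[key] {
            // Prefer the entry carrying a real Gmail ID over a synthetic atom_ one.
            if existing.id.hasPrefix("atom_") && !email.id.hasPrefix("atom_") {
                merged[key] = email
            }
        } else {
            merged[key] = email
        }
    }
    return Array(merged.values.sorted { $0.date > $1.date }.prefix(max(0, maxResults)))
}
```

The trade-off is that a fingerprint can miss pairs whose timestamps straddle a minute boundary, and can wrongly collapse two distinct messages with the same sender and subject sent within the same minute. Whether that is acceptable for onboarding synthesis is a judgment call for the author.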

Comment on lines +785 to +812
    private func fetchGmailViaLabelFeeds(maxResults: Int) throws -> [GmailEmail] {
        guard maxResults > 0 else { return [] }

        let feedPaths = [
            "atom/all",
            "atom/inbox",
            "atom/sent",
            "atom/starred",
            "atom/important",
            "atom/trash",
            "atom/spam",
            "atom/unread",
            "atom/social",
            "atom/promotions",
            "atom/updates",
            "atom/forums",
            "atom/personal",
        ]

        var merged: [String: GmailEmail] = [:]
        for feedPath in feedPaths {
            let feedEmails = try fetchGmailViaAtomFeedSingle(
                maxResults: min(20, maxResults),
                query: "newer_than:1d",
                feedPath: feedPath,
                allowBootstrap: false
            )
            for email in feedEmails {
Contributor

P2 fetchGmailViaLabelFeeds ignores the caller-supplied query

Every call inside the loop passes query: "newer_than:1d" as a hard-coded string. However, because feedPath is also set, the Python script takes the feedPath branch and builds the URL from the feed path alone—the query argument is never used. The hard-coded value is dead code but also misleading: a reader might assume these feeds are limited to the past day, whereas they actually return the N most recent items in each Gmail folder regardless of date. Consider removing the query argument from these calls (pass query: "" or add a dedicated parameter) to make the intent explicit.
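
One way the "dedicated parameter" idea might look, as a rough sketch: model the two fetch modes as separate cases so a label-feed request cannot carry a query at all. `FeedRequest` and `labelFeedRequests` are hypothetical names, not part of the PR, and the actual URL construction in the Python script is not shown here.

```swift
// Hypothetical request type: a fetch is either query-driven or a fixed label feed.
enum FeedRequest: Equatable {
    case search(query: String)
    case label(feedPath: String)
}

extension FeedRequest {
    // The query only exists for search requests; label feeds have none to ignore.
    var query: String? {
        if case .search(let query) = self {
            return query
        }
        return nil
    }
}

// Builds one label request per feed path, with no misleading query attached.
func labelFeedRequests(feedPaths: [String]) -> [FeedRequest] {
    feedPaths.map { FeedRequest.label(feedPath: $0) }
}
```

With this shape, the hard-coded "newer_than:1d" string would have nowhere to go, which makes it clearer to a reader that label feeds return the most recent items regardless of date.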

Glucksberg pushed a commit to Glucksberg/omi-local that referenced this pull request Apr 28, 2026
* Read Gmail bootstrap inbox before Atom fallback

* Improve onboarding imports and graph summaries