
Add discord_activity_tracker app with DiscordChatExporter CLI integration#50

Merged
wpak-ai merged 22 commits into develop from feature/issue-26
Feb 24, 2026

Conversation

@timon0305
Contributor

@timon0305 timon0305 commented Feb 10, 2026

Closes #36

Summary

  • Add discord_activity_tracker Django app to archive C/C++ Together Discord server (ID: 331718482485837825) chat history
  • Bot API blocked by server admin → switched to DiscordChatExporter CLI with user token
  • Full pipeline: CLI export → JSON → Django DB → markdown
  • Markdown exported to discord-cplusplus-together-context repo
  • Management command run_discord_exporter with sync modes: sync, export, all, import-only
  • Incremental sync via last_synced_at tracking, with --days-back, --full-sync overrides
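
The interaction between last_synced_at, --days-back, and --full-sync can be sketched as below. This is a minimal illustration, not the code in this PR: the function name `resolve_sync_start` and its parameters are hypothetical, and returning `None` to mean "export full history" (including on a first run with no flags) is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_sync_start(
    last_synced_at: Optional[datetime],
    days_back: Optional[int],
    full_sync: bool,
    now: datetime,
) -> Optional[datetime]:
    """Return the earliest timestamp to export from, or None for full history.

    Hypothetical helper: the names and the None convention are assumptions.
    """
    if full_sync:
        # --full-sync overrides any stored sync state.
        return None
    days_back_date = now - timedelta(days=days_back) if days_back is not None else None
    candidates = [d for d in (last_synced_at, days_back_date) if d is not None]
    # Taking the earlier date lets --days-back widen the window, never narrow it.
    return min(candidates) if candidates else None


now = datetime(2026, 2, 10, tzinfo=timezone.utc)
last = datetime(2026, 2, 9, tzinfo=timezone.utc)
start = resolve_sync_start(last, days_back=10, full_sync=False, now=now)
```

With these inputs the window starts ten days before `now` rather than at the stored sync date, which is the widening behaviour the --days-back override is meant to give.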

What changed

| Area | Details |
|---|---|
| CLI wrapper | sync/chat_exporter.py: Popen (no timeout), stderr streaming, proxy stripping |
| Management command | run_discord_exporter, with tasks: sync, export, all, import-only |
| CLI flags | --days-back, --full-sync, --months, --active-days |
| Import mode | --task import-only for pre-exported JSON (full history export) |
| Collector update | run_all_collectors.py → uses run_discord_exporter |
| Config | Discord settings in settings.py, .env.example, requirements.txt |
| Docs | README.md, EXPORTER_INTEGRATION.md, tools/README.md (user token guide) |
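
For orientation, the task and flag surface listed above could be declared roughly as follows. This is a sketch using plain `argparse` rather than the actual Django command class; the default task of `all` is an assumption, and --months and --active-days are left out because their semantics are not described here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the run_discord_exporter argument setup."""
    parser = argparse.ArgumentParser(prog="run_discord_exporter")
    parser.add_argument(
        "--task",
        choices=["sync", "export", "all", "import-only"],
        default="all",  # assumed default; the PR description does not state one
    )
    # Widen the incremental window to N days before now.
    parser.add_argument("--days-back", type=int, default=None)
    # Ignore last_synced_at and re-sync everything.
    parser.add_argument("--full-sync", action="store_true")
    return parser


args = build_parser().parse_args(["--task", "import-only"])
```

In a Django management command the same `add_argument` calls would go inside `add_arguments(self, parser)`.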

Bugs fixed

| Bug | Root cause | Fix |
|---|---|---|
| CLI killed after 60 min | subprocess.run(timeout=3600) hard-kills long exports | Replaced with subprocess.Popen: no timeout, streams stderr progress |
| --days-back ignored when last_synced_at exists | Incremental sync date always took priority over CLI flag | Use min(sync_date, days_back_date) so --days-back can widen window |
| KeyError on timestamp during DB import | export_and_parse_guild converted messages, then _persist_exported_data converted again | Removed conversion from export_and_parse_guild |
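
The first fix (replacing a timed `subprocess.run` with `subprocess.Popen` and streaming stderr) can be illustrated with the sketch below. It is not the wrapper in sync/chat_exporter.py: the function name `run_streaming` and the `on_progress` callback are hypothetical, and discarding stdout is an assumption made to keep the example short.

```python
import subprocess
import sys
from typing import Callable, List


def run_streaming(cmd: List[str], on_progress: Callable[[str], None] = print) -> int:
    """Run cmd to completion with no timeout, forwarding each stderr line."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,  # not read here, so do not leave it as an unread pipe
        stderr=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered, so progress shows up as it is written
    )
    assert proc.stderr is not None
    for line in proc.stderr:  # blocks until the child closes stderr
        on_progress(line.rstrip("\n"))
    return proc.wait()  # no timeout argument: long exports are never killed


lines: List[str] = []
child = "import sys; print('a', file=sys.stderr); print('b', file=sys.stderr)"
returncode = run_streaming([sys.executable, "-c", child], lines.append)
```

Reading stderr line by line keeps progress visible during multi-hour exports, and the caller decides what to do with a non-zero return code.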

File structure

discord_activity_tracker/
├── management/commands/
│ ├── run_discord_exporter.py # CLI integration command (primary)
│ └── run_discord_activity_tracker.py # Bot API command (blocked)
├── migrations/
│ └── 0001_initial.py
├── sync/
│ ├── chat_exporter.py # DiscordChatExporter CLI wrapper
│ ├── client.py # Discord HTTP API client (bot method)
│ ├── export.py # Markdown generation + git operations
│ ├── messages.py # Message processing + DB persistence
│ └── utils.py # Date parsing, URL formatting
├── tools/
│ └── README.md # User token extraction guide
├── models.py # Server, Channel, Author, Message, Reaction
├── services.py # DB helper functions
├── workspace.py
├── README.md # Usage docs, all commands, full history setup
└── EXPORTER_INTEGRATION.md # CLI integration details

Modified existing files

| File | Change |
|---|---|
| config/settings.py | Added Discord config (token, server ID, context repo path) |
| .env.example | Added DISCORD_USER_TOKEN, DISCORD_SERVER_ID, DISCORD_CONTEXT_REPO_PATH |
| .gitignore | Added CLI binary exclusion, allow tools/README.md |
| requirements.txt | Added aiohttp, asgiref |
| workflow/.../run_all_collectors.py | Changed to run_discord_exporter |

Exported data

Test plan

  • 10-day sync completed — data visible in context repo
  • 15-day full sync in progress (264+ JSON files exported, CLI still running)
  • Pipeline end-to-end verified: CLI → JSON → DB → markdown (on 10-day sync)
  • Verify 15-day sync completes and data appears in context repo
  • Run full history export (2017-present) via --task import-only

Summary by CodeRabbit

  • New Features
    • End-to-end Discord support: track servers, channels, messages, reactions; user profiles; management commands to sync and export Markdown contexts.
  • Documentation
    • Added Discord service API docs and updated contributing/service index.
  • Tests
    • Added comprehensive tests for bulk processing and markdown export.
  • Chores
    • Added Discord env placeholders, gitignore entries, and new runtime dependencies.
  • Revert
    • Removed legacy collectors runner and its associated tests/fixtures.

@timon0305 timon0305 self-assigned this Feb 10, 2026
@timon0305 timon0305 marked this pull request as ready for review February 10, 2026 20:01
Collaborator

@snowfox1003 snowfox1003 left a comment


I reviewed your result (https://github.com/CppDigest/discord-cplusplus-together-context) and noticed a few issues:

  • Timestamps are not unique. It appears the time may be based on your local time zone. Could you standardize all timestamps to UTC (or clearly specify the time zone)?
  • Some content appears corrupted or incorrectly formatted.
    Example: https://github.com/CppDigest/discord-cplusplus-together-context/blob/main/2026/2026-02/2026-02-c-help-text.md#1416---mxtreme-1
  • Files such as 2026-02-c-cpp-discussion.md are too large. Could you split them into separate files per day if a single day's content is too big?
  • The Markdown formatting should be cleaner and more readable in preview mode. Please review the llvm-project-context repository and align the formatting style accordingly.
  • Update reply references from
    Reply to: @roemy2826 (01:59)
    to a linkable format such as:
    Reply to: [@roemy2826 (01:59)](<filename>#<date>-@<username>)
  • Bot messages are missing. Is it not possible to retrieve bot-generated content?
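
The UTC and per-day requests above could be approached along these lines. This is a sketch only: it assumes each exported message is a dict with an ISO 8601 `timestamp` string that carries a UTC offset, and the helper names `to_utc` and `group_by_utc_day` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with an offset and convert it to UTC."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # Refuse naive values instead of silently assuming a local time zone.
        raise ValueError(f"timestamp has no UTC offset: {timestamp!r}")
    return parsed.astimezone(timezone.utc)


def group_by_utc_day(messages: Iterable[dict]) -> Dict[str, List[dict]]:
    """Bucket messages by their UTC calendar day (YYYY-MM-DD)."""
    days: Dict[str, List[dict]] = defaultdict(list)
    for message in messages:
        days[to_utc(message["timestamp"]).date().isoformat()].append(message)
    return dict(days)


messages = [
    {"timestamp": "2026-02-01T20:30:00.000-05:00", "content": "late evening, local time"},
    {"timestamp": "2026-02-01T01:34:12.223+00:00", "content": "already UTC"},
]
by_day = group_by_utc_day(messages)
```

The first message lands in the 2026-02-02 bucket once converted, which is the kind of shift that makes local-time exports ambiguous.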

@timon0305
Contributor Author

Hello @snowfox1003,
Please check https://github.com/CppDigest/discord-cplusplus-together-context/tree/main/2026/2026-02. Is this the export format you want? Please let me know.

@snowfox1003
Collaborator

  • Remove all <a> tags. Markdown provides an auto-anchor system. If the subsection is ### 01:32:31.841 UTC — @twopic, its URL is #013231841-utc--twopic (replace blank spaces with -, use all lowercase letters, and remove special characters).
  • All metadata should start with >.
  • All metadata except attachments should be placed before the message.
  • In attachments, add two blank spaces at the start so that they display better in preview mode.
  • Use Url: <url of message> instead of [Source](link).
  • In my opinion, reactions should not be in the md file; they should only be stored in the database.
  • Some parts are not correct. (See ### 03:21:13.564 UTC — @tartarusfire in https://github.com/CppDigest/discord-cplusplus-together-context/blob/main/2026/2026-02/2026-02-01/Additional%20PyLongs.md)
  • Is it still impossible to get the bot messages? See: https://discord.com/channels/331718482485837825/5062744055009
### 01:34:12.223 UTC — @tartarusfire
> Reply to: [@twopic (01:32:31.841 UTC)](#013231841-utc--twopic)
> Url: https://discord.com/channels/331718482485837825/839604334023147571/1467332174955937835

jensen's background music makes it better
https://cdn.discordapp.com/emojis/1162599085421899827.png?size=48

> Attachments:
  - [image.png](https://cdn.discordapp.com/attachments/839604334023147571/1467336901508726928/image.png?ex=698c89bb&is=698b383b&hm=a32b052f04fe558f2d7af3a6c886846fb398a1701201947e25b3dea5ee19b1e9&)
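
The auto-anchor rule described in the first bullet (lowercase, drop special characters, replace spaces with hyphens) can be sketched as follows. The helper names `github_anchor` and `reply_line` are hypothetical, and the sketch does not handle duplicate headings, which GitHub disambiguates with a numeric suffix such as `-1`.

```python
import re


def github_anchor(heading: str) -> str:
    """Approximate the auto-generated anchor for a Markdown heading."""
    text = heading.strip().lower()
    # Keep word characters, hyphens, and spaces; drop everything else.
    text = re.sub(r"[^\w\- ]", "", text)
    # Each space becomes a hyphen, so a removed dash between two spaces leaves "--".
    return text.replace(" ", "-")


def reply_line(author: str, time_label: str) -> str:
    """Build the quoted 'Reply to' metadata line that links to the parent message."""
    heading = f"{time_label} — @{author}"
    return f"> Reply to: [@{author} ({time_label})](#{github_anchor(heading)})"


anchor = github_anchor("01:32:31.841 UTC — @twopic")
line = reply_line("twopic", "01:32:31.841 UTC")
```

Here `anchor` comes out as `013231841-utc--twopic`, matching the URL fragment given in the review comment.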

Collaborator

@snowfox1003 snowfox1003 left a comment


Review feedback

Performance

  • Exporting only 10 days of Discord history can take over an hour. That is a known limitation when you pull a lot of history at once.

Intended use vs full scrape

  • The design is for daily-incremental runs (like in boost-data-collector). Each run only grabs new data since the last sync, so one day's worth is small and runs stay fast.
  • For full-historical scrapes, a different approach may work better instead of using the same flow with a big --days-back or "all history" from the CLI.

Data and service layer

  • All user/identity profiles in this project live in cppa_user_tracker.
  • When you add or update user-related records (identities, profiles, emails, etc.), use cppa_user_tracker.services. Do not write to models or other apps directly.
  • See docs/Contributing.md and docs/service_api/cppa_user_tracker.md.

Documentation

  • Project docs live under the docs/ folder.
  • Add (or extend) service API docs: add docs/service_api/discord_activity_tracker.md and link it from docs/service_api/README.md so the Discord tracker is in the same service API index as the other apps.

Workspace and raw files

  • Use the app-level workspace (each app's workspace module or config.workspace.get_workspace_path(app_slug)). Do not hardcode project-root paths like workspace/exporter_temp.
  • Put raw export files in that app's workspace under a raw subfolder (e.g. workspace/raw/discord_activity_tracker).
  • This follows the pattern in docs/Workspace.md: one workspace root, one subfolder per app. Temp and raw data stay under the app that owns them.
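
A sketch of the layout described above, with raw exports kept under a per-app `raw` subfolder instead of a hardcoded project-root path. The helper `raw_export_dir` is hypothetical, and the workspace root is passed in explicitly because the exact return value of `config.workspace.get_workspace_path(app_slug)` is not shown in this thread.

```python
from pathlib import Path

APP_SLUG = "discord_activity_tracker"


def raw_export_dir(workspace_root: Path, app_slug: str = APP_SLUG) -> Path:
    """Return (and create) <workspace_root>/raw/<app_slug> for raw export files.

    Hypothetical helper: in the project, the root would be resolved through
    the shared workspace module rather than passed in by hand.
    """
    path = workspace_root / "raw" / app_slug
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Callers would write exporter JSON into the returned directory, so temporary and raw data stay with the app that owns them.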

@wpak-ai
Collaborator

wpak-ai commented Feb 18, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Feb 18, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Feb 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a new discord_activity_tracker Django app with models, migrations, services, async sync clients and DiscordChatExporter wrappers, markdown export & git push tooling, management commands, workspace utilities, tests, docs, settings/env entries, and migrations migrating DiscordUser → DiscordProfile and deleting DiscordUser.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Configuration & Ignore | .env.example, config/settings.py, config/test_settings.py, .gitignore | Adds Discord env placeholders (DISCORD_TOKEN, DISCORD_USER_TOKEN, DISCORD_SERVER_ID, DISCORD_CONTEXT_REPO_PATH), registers discord_activity_tracker in INSTALLED_APPS and workspace slugs, updates test settings, and adds ignore patterns. |
| User profile & cppa_user_tracker | cppa_user_tracker/models.py, cppa_user_tracker/migrations/0003_discordprofile_alter_baseprofile_type.py, cppa_user_tracker/services.py | Adds DiscordProfile (BaseProfile multi-table inheritance), expands ProfileType with DISCORD, indexes type field, and adds get_or_create_discord_profile service. |
| Discord app models & migrations | discord_activity_tracker/models.py, discord_activity_tracker/migrations/0001_initial.py, .../0002_migrate_users_to_discord_profile.py, .../0003_alter_discordmessage_author.py, .../0004_delete_discorduser.py | Introduces DiscordServer/Channel/Message/Reaction models with indexes/constraints; data migration remaps DiscordUser → DiscordProfile (bulk SQL); repoints DiscordMessage.author to DiscordProfile; deletes legacy DiscordUser model. |
| Service layer & workspace utilities | discord_activity_tracker/services.py, discord_activity_tracker/workspace.py | Implements get-or-create primitives, bulk upsert workflows (users, messages, reactions), transactional batch processing, channel activity/sync updates, and workspace path helpers for exported JSON. |
| Sync clients & exporter wrappers | discord_activity_tracker/sync/client.py, discord_activity_tracker/sync/chat_exporter.py, discord_activity_tracker/sync/messages.py, discord_activity_tracker/sync/utils.py, discord_activity_tracker/sync/export.py | Adds async DiscordSyncClient (discord.py) with fetch/serialize, DiscordChatExporter CLI wrapper and parser, message normalization, concurrent/incremental/full sync orchestration, markdown export (per-day/month, sanitization, reply linking), and git commit/push helpers. |
| Management commands | discord_activity_tracker/management/commands/run_discord_activity_tracker.py, .../run_discord_exporter.py, .../debug_discord_export.py | Adds commands for syncing/exporting/importing (discord.py or DiscordChatExporter), debug inspection, CLI args, dry-run modes, and configuration validation. |
| Admin & App config | discord_activity_tracker/admin.py, discord_activity_tracker/apps.py | Adds AppConfig and admin registrations for Discord models (list_display, search, readonly, date_hierarchy). |
| Tests | discord_activity_tracker/tests/test_bulk_services.py, discord_activity_tracker/tests/test_export.py | Adds tests for bulk DB services (idempotence, upserts) and export/markdown behavior including sanitization and anchor generation. |
| Docs & service API | docs/Contributing.md, docs/service_api/README.md, docs/service_api/cppa_user_tracker.md, docs/service_api/discord_activity_tracker.md | Documents new service API and DiscordProfile/get_or_create_discord_profile, updates contributing and service API indexes and quick reference. |
| Dependencies | requirements.txt | Adds discord.py>=2.3.0 and python-dateutil>=2.8.0. |
| Removed workflow pieces & tests | workflow/management/commands/run_all_collectors.py (deleted), workflow/tests/fixtures.py (fixture removed), workflow/tests/test_commands.py (deleted) | Removes consolidated run_all_collectors command and its associated tests/fixture. |
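
The idempotent bulk upsert that the service layer and its tests are described as providing can be illustrated with a small stand-in. This sketch deliberately uses the standard-library `sqlite3` module instead of the Django ORM so that it runs on its own; the `message` table, its columns, and the function `upsert_messages` are hypothetical and are not taken from discord_activity_tracker/services.py.

```python
import sqlite3
from typing import Iterable, Tuple


def upsert_messages(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str]]) -> None:
    """Insert (id, content) rows, updating content when the id already exists."""
    with conn:  # one transaction per batch: commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO message (id, content) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET content = excluded.content",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id TEXT PRIMARY KEY, content TEXT NOT NULL)")

upsert_messages(conn, [("1", "hi"), ("2", "yo")])
# Re-running with overlapping ids must not create duplicates.
upsert_messages(conn, [("1", "hi"), ("2", "edited")])
```

Running the same batch twice leaves two rows, with the second row reflecting the edited content, which is the property an incremental sync depends on when windows overlap.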

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Cmd as Management Command
    participant Client as DiscordSyncClient
    participant Exporter as DiscordChatExporter
    participant Parser as Export Parser
    participant Services as Service Layer
    participant DB as Database
    participant Git as Git Repo

    User->>Cmd: invoke (sync / import / export / debug)
    alt sync via discord.py
        Cmd->>Client: connect & fetch guild/channels/messages
        Client-->>Services: raw message dicts
    else sync via DiscordChatExporter
        Cmd->>Exporter: export_guild_to_json(user_token, guild_id)
        Exporter-->>Parser: JSON files
        Parser-->>Services: parsed message dicts
    end
    Services->>DB: bulk upsert users, messages, reactions
    DB-->>Services: created/updated results
    Services->>DB: update channel last_activity/last_synced
    Cmd->>Services: export_and_push(context_repo_path, server)
    Services->>Git: commit_and_push_context_repo
    Git-->>Cmd: push result
    Cmd-->>User: summary / logs

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 I nibbled logs and chased each thread,

Servers and channels neatly fed,
Messages hop into markdown rows,
Commits bound where the context grows,
Hooray — the warren tracks Discord's tread!

Comment thread discord_activity_tracker/models.py Outdated
Comment thread discord_activity_tracker/README.md Outdated
Comment thread docs/Workspace.md Outdated
Comment thread docs/Workspace.md Outdated
Comment thread docs/Workspace.md Outdated
Comment thread discord_activity_tracker/services.py Outdated
Comment thread discord_activity_tracker/models.py Outdated
Comment thread discord_activity_tracker/EXPORTER_INTEGRATION.md Outdated
Comment thread discord_activity_tracker/sync/client.py
Collaborator

@snowfox1003 snowfox1003 left a comment


I have a question:
Why did you delete run_all_collectors.py, fixtures.py, and test_commands.py in the workflow app?

Comment thread .gitignore
Collaborator

@snowfox1003 snowfox1003 left a comment


lgtm

@timon0305 timon0305 requested a review from wpak-ai February 23, 2026 15:25
@wpak-ai wpak-ai merged commit 81d823f into develop Feb 24, 2026
3 checks passed
3 participants