Refactor CNBC and InsiderTrading data formats for storage efficiency #6
Merged
AlexCatarino merged 2 commits on May 12, 2026
- Extract TransactionCode, OwnershipType, AcquiredDisposedCode enums under the QuantConnect.DataSource.QuiverQuant namespace, each annotated with [EnumMember] for the SEC single-letter code so Newtonsoft serializes them directly without a custom converter.
- Persist enum values, booleans, and OrderDirection in CSV as single letters / T-F / underlying int (helpers in QuiverQuantCsvExtensions), shrinking on-disk size for both datasets.
- InsiderTrading now reads every field returned by live/insiders: Date, fileDate, TransactionCode, PricePerShare, Shares, SharesOwnedFollowing, AcquiredDisposedCode, DirectOrIndirectOwnership, OfficerTitle, IsDirector/IsOfficer/IsTenPercentOwner/IsOther.
- Reader semantics: Time = uploadedDate.AddDays(-1) so EndTime equals the upload day. fileDate / adviceDate empty-shortcut falls back to that upload date (CNBC and InsiderTrading aligned).
- InsiderTrading downloader mirrors the CNBC pattern: Run accumulates per-ticker in memory, Flush writes per-ticker files with per-ticker exception handling, ProcessUniverse rebuilds universe files from the per-ticker corpus by upload date.
- Reject invalid tickers (e.g. "N/A") in the InsiderTrading Run loop to avoid Windows path-separator failures.
- Program.cs lifts processing-date / processing-date-lookback so a single invocation can backfill recent days; CNBC and InsiderTrading share the same iteration loop. Dataset is now selectable via args[0].
- Add tests covering the Reader and universe Reader for the compact format plus every CSV helper mapping (33 helper cases + 8 Reader cases).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
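The compact CSV encoding described in this commit (single letters for enums, T/F for booleans) could look roughly like the sketch below. It is not the code in QuiverQuantCsvExtensions: the class `CompactCsvSketch`, its method names, and the two-member enum are hypothetical, and the `A`/`D` values are assumed to follow the SEC acquired/disposed codes.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical two-member enum; the real AcquiredDisposedCode may differ.
public enum AcquiredDisposedCode
{
    [EnumMember(Value = "A")] Acquired,
    [EnumMember(Value = "D")] Disposed
}

// Illustrative helpers only, not the PR's QuiverQuantCsvExtensions.
public static class CompactCsvSketch
{
    // Booleans are stored as a single character.
    public static string BoolToCsv(bool value) => value ? "T" : "F";

    public static bool ParseBool(string field) => field == "T";

    // Writes the [EnumMember] value when present, otherwise the member name.
    public static string EnumToCsv<TEnum>(TEnum value) where TEnum : struct, Enum
    {
        var member = typeof(TEnum).GetField(value.ToString());
        var attribute = member?.GetCustomAttribute<EnumMemberAttribute>();
        return attribute?.Value ?? value.ToString();
    }

    // Reverses EnumToCsv by scanning the enum's members for a matching code.
    public static TEnum ParseEnum<TEnum>(string code) where TEnum : struct, Enum
    {
        foreach (var value in Enum.GetValues(typeof(TEnum)).Cast<TEnum>())
        {
            if (EnumToCsv(value) == code)
            {
                return value;
            }
        }
        throw new ArgumentException($"Unknown code '{code}' for {typeof(TEnum).Name}");
    }
}
```

Reusing the `[EnumMember]` value for the CSV column keeps the on-disk letter and the JSON code in one place, at the cost of a reflection lookup per call; a cached dictionary would be the obvious refinement if this ran on a hot path.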
…Bets

- Program.cs now reads the dataset name from the `vendor-data-name` config key (default `cnbc`) instead of args, lifts `processingDate` and `processingStartDate` to the top so every case shares them, and prints the valid options when an unknown dataset is provided.
- QuiverCongressDataDownloader now combines `destinationFolder` with `VendorName` and `VendorDataName`, matching every other downloader. Previously it dropped `VendorName` from the path.
- QuiverWallStreetBetsDataDownloader stops re-prefixing `alternative` — Program.cs already passes `destinationDirectory` (which includes it), so the downloader now follows the same convention as the others.
- config.json gains `processing-date-lookback` and `vendor-data-name` defaults; auth token kept empty.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
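As a rough illustration of the shared iteration loop, the sketch below expands a processing date and a lookback into the list of days to process. The class `LookbackSketch`, the method name, and the choice of an inclusive, oldest-first range are assumptions; the actual loop in Program.cs is not shown in this PR description.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only; not the loop from Program.cs.
public static class LookbackSketch
{
    // Yields every day from (processingDate - lookbackDays) through
    // processingDate inclusive, oldest first. A lookback of 0 yields
    // only processingDate itself.
    public static IEnumerable<DateTime> ProcessingDates(DateTime processingDate, int lookbackDays)
    {
        if (lookbackDays < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(lookbackDays));
        }

        var end = processingDate.Date;
        var start = end.AddDays(-lookbackDays);
        for (var date = start; date <= end; date = date.AddDays(1))
        {
            yield return date;
        }
    }
}
```

Under these assumptions, a downloader for either dataset would be invoked once per yielded date, so a lookback of 0 reproduces the single-day behaviour.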
Summary
- Enums (`TransactionCode`, `OwnershipType`, `AcquiredDisposedCode`) sit in the new `QuantConnect.DataSource.QuiverQuant` sub-namespace with `[EnumMember]` so Newtonsoft serializes API JSON natively (no custom JsonConverters needed).
- InsiderTrading reads every field returned by `live/insiders` (transaction date, fileDate, transaction code, acquired/disposed, ownership, officer title, isDirector/isOfficer/isTenPercentOwner/isOther). Reader sets `Time = uploadedDate.AddDays(-1)` so `EndTime` is the day data became available; when fileDate/adviceDate is empty in the CSV, the Reader reuses the upload date.
- The InsiderTrading downloader follows the CNBC pattern: `Run` accumulates per-ticker, `Flush` writes per-ticker files with per-ticker resiliency, `ProcessUniverse` rebuilds universe files from the corpus. Invalid tickers (e.g. `N/A`) are filtered up front to avoid Windows path-separator failures.
- `Program.cs` lifts `processing-date` + `processing-date-lookback` (default 0) so the same iteration loop backfills recent days for both datasets; dataset is selectable via `args[0]`.

Test plan
- `dotnet test tests/Tests.csproj`: 61 Quiver tests pass.
- `aapl.csv` in both datasets shows compact format.
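For reference, the Reader timing rule from the summary can be sketched as follows. Everything here is illustrative: `DataPointSketch` and `ReaderSketch` are hypothetical stand-ins for the real data classes, the one-day period is an assumption consistent with `EndTime` landing on the upload day, and the `yyyyMMdd` date format is assumed rather than taken from the PR.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for the dataset's data point type.
public sealed class DataPointSketch
{
    public DateTime Time { get; set; }

    // Assumed one-day period, so EndTime falls on the upload day.
    public TimeSpan Period { get; } = TimeSpan.FromDays(1);

    public DateTime EndTime => Time + Period;

    public DateTime FileDate { get; set; }
}

public static class ReaderSketch
{
    // uploadedDate: the day the row became available.
    // fileDateField: raw CSV field, empty when it equals the upload date.
    public static DataPointSketch Read(DateTime uploadedDate, string fileDateField)
    {
        var fileDate = string.IsNullOrEmpty(fileDateField)
            ? uploadedDate
            : DateTime.ParseExact(fileDateField, "yyyyMMdd", CultureInfo.InvariantCulture);

        return new DataPointSketch
        {
            Time = uploadedDate.AddDays(-1),
            FileDate = fileDate
        };
    }
}
```

Leaving the field empty when it matches the upload date is what makes the shortcut a storage saving: the common case costs zero characters in the CSV row.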