
Add mic_code (ISO 10383 MIC) column #149

Merged
JerBouma merged 1 commit into JerBouma:main from dokson:feature/add-mic-code on May 29, 2026


dokson (Contributor) commented May 29, 2026

Closes #110.

Adds a standardised ISO 10383 MIC code for each ticker, as requested in the issue, by mapping the existing Yahoo exchange code to its MIC.

Data

  • New mic_code column (placed right after exchange) in equities, etfs, funds and indices.
  • Every Yahoo exchange code that corresponds to a real trading venue is mapped to its operating MIC (e.g. NMS/NGM/NCM → XNAS, NYQ → XNYS, LSE/IOB → XLON, GER → XETR, JPX → XJPX).
  • All mapped MICs were validated as ACTIVE against the official ISO 10383 registry, with the ISO country code cross-checked against the data.
  • Pseudo-codes that are not trading venues (indices, FX, crypto, NAV mutual funds) and venue-ambiguous codes (e.g. ENX, generic Euronext) are intentionally left blank rather than guessed.
  • Spot-checked end-to-end on well-known tickers (AAPL→XNAS, JPM→XNYS, VOD.L→XLON, SAP.DE→XETR, 7203.T→XJPX, NESN.SW→XSWX, 005930.KS→XKRX, …).
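A minimal sketch of the mapping step described in the Data bullets, assuming rows are read with Python's `csv` module. It uses only the example pairs quoted above (the full mapping table is not shown in this PR), and `YAHOO_TO_MIC` and `add_mic_code_column` are hypothetical names. The column is called `mic_code` here to match this description; it is renamed to `mic` later in the thread. Unmapped codes such as ENX stay blank, and the new column is inserted right after `exchange`. The last sample row is invented.

```python
import csv
import io

# Illustrative subset: only the example pairs quoted in the PR description,
# not the full Yahoo-code -> MIC table shipped with the data.
YAHOO_TO_MIC = {
    "NMS": "XNAS", "NGM": "XNAS", "NCM": "XNAS",
    "NYQ": "XNYS",
    "LSE": "XLON", "IOB": "XLON",
    "GER": "XETR",
    "JPX": "XJPX",
}


def add_mic_code_column(rows, fieldnames):
    """Return (rows, fieldnames) with mic_code inserted right after exchange.

    Codes absent from the mapping (pseudo-codes, or venue-ambiguous codes
    such as ENX) are left blank rather than guessed.
    """
    out_fields = list(fieldnames)
    out_fields.insert(out_fields.index("exchange") + 1, "mic_code")
    out_rows = []
    for row in rows:
        mic = YAHOO_TO_MIC.get(row["exchange"], "")
        # Rebuild each row in the new column order.
        out_rows.append({f: (mic if f == "mic_code" else row[f]) for f in out_fields})
    return out_rows, out_fields


# Simplified sample; the last row is a hypothetical listing on a venue-ambiguous code.
SAMPLE = """symbol,exchange,country
AAPL,NMS,United States
VOD.L,LSE,United Kingdom
SAP.DE,GER,Germany
EXAMPLE.XX,ENX,France
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
header = reader.fieldnames  # reads the header line
rows, fields = add_mic_code_column(list(reader), header)
```

Leaving unknown codes blank, rather than falling back to a guess, keeps a missing MIC distinguishable from a wrong one.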

Package

  • mic_code added as a filter to select() and as an option in show_options() for every asset class.
  • The duplicated validate-and-filter blocks were refactored into a single shared helper (FinanceDatabase._filter_by_options) with a centralised COLUMN_LABELS map for the error messages.
  • select() is now annotated to return FinanceFrame; search(**kwargs) typing broadened.
  • Fixed a pandas DtypeWarning and cleared the package's pyright/ruff findings.

Tests

  • New coverage for the mic_code filter, show_options, and an exchange → mic_code one-to-one invariant.
  • Snapshot fixtures regenerated.
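The exchange → mic_code invariant can be read as "every exchange code yields exactly one mic_code value across all rows" (several exchange codes may still share one MIC). A minimal stdlib sketch, assuming rows as dicts. The function name is hypothetical and this is not the repository's actual test.

```python
from collections import defaultdict


def exchange_mic_conflicts(rows):
    """Return {exchange: set_of_mic_codes} for exchanges with more than one value.

    A blank mic_code counts as a value, so an exchange that is mapped in some
    rows but blank in others is reported too.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row["exchange"]].add(row["mic_code"])
    return {ex: mics for ex, mics in seen.items() if len(mics) > 1}


rows = [
    {"exchange": "NMS", "mic_code": "XNAS"},
    {"exchange": "NGM", "mic_code": "XNAS"},  # several codes may share one MIC
    {"exchange": "ENX", "mic_code": ""},      # consistently blank is fine
]
assert exchange_mic_conflicts(rows) == {}
```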

Note: compression/ artifacts are intentionally not committed — they are regenerated by the Database-Update workflow.

JerBouma (Owner) commented May 29, 2026

It says the column is right next to exchange; shouldn't it be alongside the other codes, e.g. after the ISIN column? Whatever makes the most sense.

Other columns are without the word "code" so I'd say we just make it "mic" (and MIC already contains the word code 😉)

EDIT: it's related to exchange so location of the column makes sense!

dokson (Contributor, Author) commented May 29, 2026

Good points, thanks!

On the name: agreed — I'll rename it to mic to stay consistent with the other columns (none use a code suffix).

On the placement: I put it right after exchange on purpose, because mic is essentially the ISO-standardised encoding of exchange — both identify the trading venue, just in different notations (the Yahoo code vs. ISO 10383). Keeping them adjacent (and next to market) lets you read the venue in both forms at a glance. isin / cusip / figi, on the other hand, identify the security itself rather than the venue, so semantically mic felt closer to exchange/market than to that block.

That said, it's your call — happy to move it next to isin with the other identifiers if you prefer that grouping. Just let me know and I'll update the PR (together with the mic rename).

JerBouma (Owner)

Location makes sense indeed, if you could rename to mic this looks good to go!

dokson force-pushed the feature/add-mic-code branch from 07f5565 to 97767a2 on May 29, 2026 at 19:27
dokson (Contributor, Author) commented May 29, 2026

> Location makes sense indeed, if you could rename to mic this looks good to go!

renamed 😉

JerBouma (Owner) commented May 29, 2026

The code is making a lot of changes to the package, somewhat simplifying it, but in this case the way the code was originally written was deliberate. This adds a layer of abstraction that, given the minimal size of the package, doesn't add much value.

I'd much prefer to keep the package the way it is, even if it could in theory be done in fewer lines of code.

Resolves JerBouma#110.

Data:
- Add `mic` column (right after `exchange`) to equities, etfs, funds and
  indices, mapping each Yahoo exchange code to its ISO 10383 MIC. Pseudo-codes
  (indices/FX/crypto/NAV funds) and venue-ambiguous codes (e.g. ENX) are left
  blank rather than guessed. All MICs validated as ACTIVE against the official
  ISO 10383 registry.

Package:
- Add a `mic` filter to select()/show_options() across the asset classes,
  keeping the existing explicit per-field style.
- Annotate select() to return FinanceFrame and broaden search(**kwargs) typing.
- Fix the pandas DtypeWarning (low_memory=False) and clear pyright/ruff findings.

Tests:
- Add mic coverage (filter, show_options, exchange->MIC invariant) and
  regenerate the snapshot fixtures.
dokson force-pushed the feature/add-mic-code branch from 97767a2 to 5836813 on May 29, 2026 at 20:05
dokson (Contributor, Author) commented May 29, 2026

> The code is making a lot of changes to the package, somewhat simplifying it, but in this case the way the code was originally written was deliberate. This adds a layer of abstraction that, given the minimal size of the package, doesn't add much value.
>
> I'd much prefer to keep the package the way it is, even if it could in theory be done in fewer lines of code.

Updated, following your guidance.

JerBouma (Owner)

Perfect, thank you. Once there is functionality for the delisted tickers (as mentioned in your issue), I'll publish a new PyPI and GitHub release. I want to work on this sometime next week.

JerBouma merged commit 2786d9d into JerBouma:main on May 29, 2026
3 checks passed
JerBouma (Owner) commented May 29, 2026

@dokson I forgot something: the GitHub Actions script doesn't include a placeholder for the new mic column yet. Since the exchange code will be filled for new additions there, mic should be derived for them as well.

dokson (Contributor, Author) commented May 29, 2026

> the GitHub Actions script doesn't include a placeholder for the new mic column yet. Since the exchange code will be filled for new additions there, mic should be derived for them as well.

@JerBouma fixed in #150. In the Database-Update action, build_new_ticker now derives exchange → ISO 10383 MIC from the existing one-to-one data and fills mic for new tickers (placed right after exchange); unknown codes intentionally stay blank. Since the job only ever assigns NMS/ASE, those resolve to XNAS/XASE.
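One way the workflow step could derive the mapping from data already in the CSVs, sketched under assumptions: `build_new_ticker` is the function named above, but its signature is not shown here, so `derive_mic_map` and `build_new_ticker_row` below are hypothetical stand-ins. Only exchanges with a single observed MIC are kept, so ambiguous or unknown codes stay blank. The symbols are placeholders.

```python
def derive_mic_map(rows):
    """Learn exchange -> mic from existing rows, keeping only unambiguous pairs."""
    seen = {}
    for row in rows:
        exchange, mic = row.get("exchange", ""), row.get("mic", "")
        if exchange and mic:
            seen.setdefault(exchange, set()).add(mic)
    return {ex: next(iter(mics)) for ex, mics in seen.items() if len(mics) == 1}


def build_new_ticker_row(symbol, exchange, mic_map):
    """Hypothetical stand-in for build_new_ticker: mic sits right after exchange."""
    return {"symbol": symbol, "exchange": exchange, "mic": mic_map.get(exchange, "")}


# Illustrative existing rows (symbols are placeholders).
existing = [
    {"symbol": "AAA", "exchange": "NMS", "mic": "XNAS"},
    {"symbol": "BBB", "exchange": "ASE", "mic": "XASE"},
    {"symbol": "CCC", "exchange": "ENX", "mic": ""},
]
mic_map = derive_mic_map(existing)
new_row = build_new_ticker_row("DDD", "NMS", mic_map)
```

Deriving the map from the data, instead of hard-coding it in the workflow, keeps the action consistent with whatever mapping the CSVs already carry.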

It also backfills the 16 NASDAQ rows (NMS → XNAS) that were added by a prior run before the column existed, and adds test_mic_filled_when_exchange_mapped — which fails if a row has a known exchange but a missing mic (the gap that let those rows ship).
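The backfill and the guard test described above reduce to two small checks. The sketch below is hypothetical: the function names, row shape, and two-entry map are assumptions, with the map mirroring the NMS/ASE codes mentioned in the comment. It fills only blank mic cells whose exchange is mapped, and it lists the rows that would make a test like test_mic_filled_when_exchange_mapped fail.

```python
# Illustrative map; in the workflow it would be derived from existing data.
MIC_MAP = {"NMS": "XNAS", "ASE": "XASE"}


def rows_missing_mic(rows, mic_map):
    """Rows whose exchange has a known MIC but whose mic cell is empty."""
    return [r for r in rows if r.get("exchange") in mic_map and not r.get("mic")]


def backfill_mic(rows, mic_map):
    """Fill those gaps in place without overwriting existing values; return the count."""
    missing = rows_missing_mic(rows, mic_map)
    for row in missing:
        row["mic"] = mic_map[row["exchange"]]
    return len(missing)


sample_rows = [
    {"symbol": "NEW1", "exchange": "NMS", "mic": ""},    # added before the column existed
    {"symbol": "OLD1", "exchange": "NMS", "mic": "XNAS"},
    {"symbol": "IDX1", "exchange": "ZZZ", "mic": ""},    # unknown code stays blank
]
```

In a test, asserting that `rows_missing_mic(...)` is empty for each CSV would reproduce the guard.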

JerBouma pushed a commit that referenced this pull request May 29, 2026
Two related data-quality resolutions:

1. composite_figi enrichment (adapts PR #125 to current main)
   - Fills empty composite_figi for 1,539 ISINs sourced via OpenFIGI,
     applied to 1,753 rows in equities.csv. Only previously-empty cells
     are populated; no existing value is overwritten and no other column
     is touched (coverage 7,235 -> 8,774 unique ISINs).

2. mic backfill + workflow fix (resolves PR #149 follow-up)
   - The Database-Update action did not set the new mic column for added
     tickers. build_new_ticker now derives exchange -> ISO 10383 MIC from
     the existing one-to-one data and fills it; unknown codes stay blank.
   - Backfills 16 NASDAQ rows (NMS -> XNAS) added by a prior run before
     the column existed.
   - Adds test_mic_filled_when_exchange_mapped: fails when a row has a
     known exchange but a missing mic (the gap that let those rows ship).

Snapshot fixtures regenerated; full suite passes (58).
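The composite_figi enrichment follows a fill-only-empty rule. A minimal sketch, assuming rows as dicts and an ISIN → FIGI lookup that has already been fetched (the OpenFIGI request itself is out of scope here). The function name is hypothetical, and the ISINs and FIGIs are placeholders, not real identifiers. Because several rows can share one ISIN, the update is applied per row, which is how 1,539 ISINs can touch 1,753 rows.

```python
def fill_empty_composite_figi(rows, isin_to_figi):
    """Populate composite_figi only where it is blank; return the number of rows updated.

    Existing values are never overwritten and no other column is touched.
    """
    updated = 0
    for row in rows:
        if row.get("composite_figi"):
            continue  # keep any existing value
        figi = isin_to_figi.get(row.get("isin", ""))
        if figi:
            row["composite_figi"] = figi
            updated += 1
    return updated


# Placeholder identifiers for illustration only.
sample = [
    {"isin": "XX0000000001", "composite_figi": ""},
    {"isin": "XX0000000001", "composite_figi": ""},               # same ISIN, second listing
    {"isin": "XX0000000002", "composite_figi": "EXISTING-FIGI"},  # must not change
    {"isin": "XX0000000003", "composite_figi": ""},               # no lookup result
]
lookup = {"XX0000000001": "FIGI-A", "XX0000000002": "FIGI-B"}
updated = fill_empty_composite_figi(sample, lookup)
```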
dokson deleted the feature/add-mic-code branch on May 29, 2026 at 20:57

Development

Successfully merging this pull request may close these issues.

[DATA] Add mic to the database

2 participants