# KaxaNuk Data Corrector: error analysis

## 00: Preamble
Data provider issues are a common problem that financial analysts need to address. Data quality translates into model quality, and model quality translates into alpha. The purpose of this challenge is to first understand the problems, along with their underlying causes and patterns, in order to propose better-informed solutions for data correction or imputation.

## 01: Provided error log analysis

Financial Modeling Prep's data errors are **not isolated incidents but systemic issues** concentrated in three categories: non-standard securities (preferred shares, warrants, senior notes), companies undergoing corporate actions, and illiquid micro-cap stocks. The errors stem from data architecture that struggles with securities that deviate from standard common stock data structures.

### The "sorted by date" errors

The 74 tickers generating "FundamentalData.rows not correctly sorted by date" errors reveal the same patterns. Nearly **40% are non-common equity securities** (preferred shares, warrants, or senior notes) that have fundamentally different data reporting requirements than common stock. The B. Riley Financial family alone contributes 8 tickers (RILY, RILYG, RILYK, RILYL, RILYN, RILYT, RILYZ, RILYP), spanning common stock, preferred shares, and tradeable senior notes. Federal Agricultural Mortgage (Farmer Mac) adds 8 more (AGM and seven preferred series), while Presidio Property Trust contributes common stock, preferred shares, and warrants.

**Corporate actions create data discontinuities** across this list. At least 12 tickers underwent mergers, acquisitions, or name changes in 2024-2025, including:

| Ticker | Event | Date |
|--------|-------|------|
| FARO | Acquired by AMETEK | July 2025 |
| IVAC | Acquired by Seagate | March 2025 |
| SASR | Acquired by Atlantic Union | April 2025 |
| APDN | Rebranded to BNBX | October 2025 |
| MICS | Became RIME | September 2024 |
| ATON | Rebranded AlphaTON Capital | September 2025 |
| NBP | Former I-Mab, now NovaBridge | October 2025 |

Several companies are in financial distress: B. Riley Financial suspended dividends and faces Nasdaq delisting risk after **$435-475M quarterly losses**; Ideanomics (IDEX) filed Chapter 11 bankruptcy in December 2024 following SEC fraud settlements; Staffing 360 Solutions (STAF) was delisted to OTC.

### Market cap distribution

The market cap breakdown exposes another pattern: **approximately 35-40% of affected tickers are micro-cap stocks** (under $300M market capitalization). These include XELB ($8-11M), EVTV ($20M), SOTK ($40M), TPCS ($35M), and DLPN ($50M). Micro-cap stocks typically have less rigorous data reporting, lower analyst coverage, and more frequent data quality issues due to limited institutional oversight.

| Market Cap Category | Percentage of Error Tickers |
|---------------------|---------------------------|
| Micro-cap (<$300M) | ~35-40% |
| Small-cap ($300M-$2B) | ~25-30% |
| Mid-cap ($2B-$10B) | ~20% |
| Large-cap (>$10B) | ~15% |

The large-cap tickers that appear—DOV ($27B), DG ($22-24B), JBL ($18B), RBA ($19.5B)—likely experience errors due to corporate actions rather than data quality. RBA (RB Global) completed a major merger with IAA in 2023, and MTZ (MasTec) faced shareholder lawsuits creating reporting complexities.
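
The thresholds in the table above can be written as a small classification helper. This is a minimal sketch: the function name `market_cap_bucket` is hypothetical, and assigning exactly $10B to mid-cap is an assumption, since the table leaves the boundaries open. The sample values are the approximate market caps quoted in this section.

```python
def market_cap_bucket(market_cap_usd: float) -> str:
    """Map a market capitalization in USD to the categories used in the table."""
    if market_cap_usd < 0:
        raise ValueError("market cap cannot be negative")
    if market_cap_usd < 300e6:
        return "micro-cap"
    if market_cap_usd < 2e9:
        return "small-cap"
    if market_cap_usd <= 10e9:  # assumption: the $10B boundary counts as mid-cap
        return "mid-cap"
    return "large-cap"


# Approximate values quoted in the text above.
approx_caps = {"EVTV": 20e6, "SOTK": 40e6, "JBL": 18e9, "DOV": 27e9}
buckets = {ticker: market_cap_bucket(cap) for ticker, cap in approx_caps.items()}
```

Tagging each error ticker this way makes it easy to recompute the distribution whenever the error log changes.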

### Preferred shares dominate the "no data returned" errors

All 12 tickers generating "No data returned by unadjusted market data endpoint" errors are **preferred shares or eliminated share classes**:

- **PEI series** (PEI-PB, PEI-PC, PEI-PD): Pennsylvania REIT preferred shares—company emerged from Chapter 11 bankruptcy in 2020 with restructured capital
- **PSB series** (PSB-PX, PSB-PY, PSB-PZ): PS Business Parks preferred depositary shares—parent company was acquired by Blackstone for **$7.6 billion in 2022**, delisting common stock but potentially leaving preferred shares trading
- **NRZ series** (NRZ-PA, NRZ-PB, NRZ-PC): Rithm Capital (formerly New Residential Investment) fixed-to-floating rate preferreds experiencing LIBOR transition complications
- **STZ-B**: Constellation Brands Class B stock—**eliminated entirely in November 2022** when the Sands family exchanged their super-voting shares for $64.64 cash plus Class A shares
- **PNC-PP**: PNC Financial Series P preferred with complex fixed-to-floating rate structure
- **ALP-PQ**: Appears to be an invalid or delisted ticker

The pattern is unmistakable: FMP's unadjusted market data endpoint cannot handle preferred share structures, depositary shares, or securities that no longer trade but retain historical data.
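
One way to keep such tickers from reaching the endpoint is to clean the request universe first: drop symbols known to be redeemed or delisted, and rename rebranded ones. The sketch below is illustrative only. The names `PURGED`, `RENAMED_BASES`, and `clean_universe` are hypothetical, the purge set is a hand-picked subset of the tickers listed above, and the `NRZ` to `RITM` mapping follows the correction table later in this document.

```python
# Hypothetical purge list: preferred series of delisted or acquired issuers.
PURGED = frozenset({"PEI-PB", "PEI-PC", "PEI-PD", "PSB-PX", "PSB-PY", "PSB-PZ"})

# Hypothetical rename map, keyed by the base symbol before the "-" suffix.
RENAMED_BASES = {"NRZ": "RITM"}


def clean_universe(tickers):
    """Drop purged tickers and rename rebranded ones, preserving input order."""
    cleaned = []
    for ticker in tickers:
        if ticker in PURGED:
            continue
        base, sep, suffix = ticker.partition("-")
        remapped = RENAMED_BASES.get(base, base) + sep + suffix
        if remapped not in cleaned:  # avoid duplicates created by the rename
            cleaned.append(remapped)
    return cleaned


universe = clean_universe(["PEI-PB", "NRZ-PA", "DOV", "NRZ", "RITM-PA"])
```

Keeping both lookups as plain data means the lists can be reviewed and updated by hand, which matches the manual nature of this correction.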

### Warrants create impossible price relationships

The three tickers with "MarketDataDailyRow low > high" errors are all **SPAC warrants trading at near-zero prices**:

| Ticker | Company | Current Price | Status |
|--------|---------|---------------|--------|
| UWMC-WT | UWM Holdings | ~$0.01 | NYSE delisting proceedings initiated December 19, 2025 |
| BFLY-WT | Butterfly Network | ~$0.02 | Extremely illiquid, ~44K average daily volume |
| ML-WT | MoneyLion | ~$0.26 | Thinly traded, expires September 2026 |

When securities trade for pennies with minimal volume, bad tick data becomes hard to avoid. Wide bid-ask spreads, stale quotes, and erroneous trade reports create situations where recorded daily lows can exceed daily highs. UWMC-WT is actively being delisted for "abnormally low selling price": the security is essentially worthless.
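
A check for this condition, and one possible repair, can be sketched as follows. The correction table later in this document suggests imputing the low as the minimum of the four recorded prices; setting the high to the maximum as well is an added assumption here. The `DailyBar` class, its field names, and the sample prices are illustrative, not FMP's or DataCurator's actual schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DailyBar:
    """One unadjusted daily price row (field names are illustrative)."""
    date: str
    open: float
    high: float
    low: float
    close: float


def has_inverted_range(bar: DailyBar) -> bool:
    """True when the recorded low is above the recorded high."""
    return bar.low > bar.high


def repair_range(bar: DailyBar) -> DailyBar:
    """Rebuild low and high from the four prices recorded for the same bar."""
    prices = (bar.open, bar.high, bar.low, bar.close)
    return replace(bar, low=min(prices), high=max(prices))


# Made-up prices for a near-worthless warrant with swapped low and high.
bar = DailyBar("2025-01-02", open=0.0100, high=0.0095, low=0.0105, close=0.0098)
fixed = repair_range(bar) if has_inverted_range(bar) else bar
```

The repair only reorders values that are already in the row, so it introduces no information from other dates.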

### Negative shares outstanding traces to corporate restructuring

The three "Negative shares outstanding" errors (HELE, QLGN, ELDN) correlate directly with significant corporate events:

**HELE** (Helen of Troy, error date May 1, 2017): No stock splits, but the company operates on a February fiscal year-end. The error date falls during fiscal year transitions when share counts from buyback programs may create calculation discrepancies across FMP's data sources.

**QLGN** (Qualigen Therapeutics, error date November 14, 2025): This company has undergone **two reverse stock splits** (1-for-10 in 2022, 1-for-50 in 2024), saw Faraday Future take a 55% stake, and rebranded to AIxCrypto Holdings in November 2025. Each of these events creates a large data discontinuity.

**ELDN** (Eledon Pharmaceuticals, error date November 14, 2025): A **$50 million dilutive offering** closed around November 12, 2025, adding 15+ million shares plus warrants, increasing share count by over 100% year-over-year. The error date coincides exactly with this offering.
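
Whatever the root cause, a negative share count is never a usable value. A minimal sketch of one conservative treatment is to carry the last valid observation forward, which uses only information available at each date. The function name is hypothetical, and the input is assumed to be in chronological order with `None` marking missing values.

```python
from typing import Optional


def forward_fill_invalid_shares(
    shares: list[Optional[float]],
) -> list[Optional[float]]:
    """Replace missing or non-positive share counts with the last valid value.

    Leading invalid entries stay None because there is nothing to carry forward.
    """
    cleaned = []
    last_valid = None
    for value in shares:
        if value is not None and value > 0:
            last_valid = value
        cleaned.append(last_valid)
    return cleaned


# Made-up series with one negative and one missing observation.
filled = forward_fill_invalid_shares([100.0, -5.0, None, 120.0])
```

Here `filled` is `[100.0, 100.0, 100.0, 120.0]`. Backward filling would be the mirror image, but it borrows a future value and therefore risks look-ahead bias.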

### The single negative price error points to data corruption

ASB-PF (Associated Banc-Corp Series F Preferred) showing a negative low price is simply **data corruption**. Preferred stocks have complex ex-dividend adjustments, and a calculation error in FMP's dividend adjustment pipeline likely produced an impossible negative value. The security trades normally around $21 with a 6.65% yield.

### Conclusions: systemic issues with non-standard securities

These errors demonstrate FMP has **architectural limitations handling three categories of securities**:

**Non-standard security types**: Preferred shares, warrants, senior notes, and depositary shares have different data structures, reporting requirements, and pricing mechanics than common stock. FMP's fundamental data infrastructure appears designed primarily for common equity.

**Corporate action transitions**: Mergers, acquisitions, reverse splits, name changes, and bankruptcies create data discontinuities. When Constellation Brands eliminated STZ-B or Blackstone acquired PSB, historical data must be handled differently—FMP's pipeline struggles with these transitions.

**Illiquid and penny securities**: When UWMC-WT trades at $0.0098 with minimal volume, standard data validation breaks down. The "low > high" errors are essentially the data provider acknowledging bad tick data from nearly untradeable securities.

The pattern suggests FMP should implement separate handling for non-common equity, add corporate action flags to identify transitioning securities, and apply different validation rules for illiquid instruments. For users, these errors serve as a useful filter—securities generating them often require manual review regardless of the data provider.
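
On the user side, a rough first filter for non-common equity can be built from the ticker suffixes seen in this log. The sketch below is a heuristic under stated assumptions: it treats `-WT` as a warrant and any suffix starting with `P` as a preferred series, which matches the examples above but is not an official FMP convention. It cannot detect suffix-less tickers such as the B. Riley notes and preferreds (`RILYZ`, `RILYP`), which still need a manual list.

```python
def classify_ticker(ticker: str) -> str:
    """Guess the security type from the part of the symbol after the dash."""
    _base, _sep, suffix = ticker.upper().partition("-")
    if not suffix:
        return "common_or_unknown"
    if suffix == "WT":
        return "warrant"
    if suffix.startswith("P"):
        return "preferred"
    return "other_share_class"


labels = {t: classify_ticker(t) for t in ["UWMC-WT", "PEI-PB", "STZ-B", "RILYP"]}
```

Tickers labeled anything other than `common_or_unknown` can then be routed to separate validation rules or to manual review.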




### 01.1: Correction strategies

| Error Category | Specific Scenario | Correction Strategy |
| :--- | :--- | :--- |
| **1. Temporal Sorting Failures** | **Restatement Risks**<br>*(e.g., MasTec, Flowserve)* | **Skip Strict Sorting:** The "not sorted" error is a validation check. Adjust the ingestion script to bypass this check. After the data is ingested, sort entries by market date and give the user the option to keep either the entry with the earliest filing date (to avoid look-ahead bias) or the most recent one. Only one version of each period should be accepted to avoid conflicts. |
| | **Parent-Child Inheritance**<br>*(e.g., B. Riley, Farmer Mac)* | **Implement "Parent-Proxy" Logic:** Do not query child tickers (e.g., `RILYZ`) directly. Query the **Parent (RILY)** fundamental data and map it to the child security, because child tickers have no independent EDGAR existence. |
| **2. "Zombie Ticker" Failures** | **Bankruptcy / Delisting**<br>*(e.g., PREIT, PEI)* | **Purge Universe:** Remove these tickers from the data request; data for them is unavailable. |
| | **Acquisition**<br>*(e.g., PS Business Parks)* | **Purge Universe:** Remove acquired tickers (`PSB`, `PSB-PX`), as they have been redeemed and data for them is unavailable. |
| | **Rebranding**<br>*(e.g., New Residential)* | **Remap Tickers:** Update the symbol map. Change all `NRZ` requests to `RITM` (e.g., `NRZ-PA` $\rightarrow$ `RITM-PA`). |
| **3. Calculation Artifacts** | **Reverse Splits**<br>*(e.g., Qualigen, Eledon)* | **Manual Override:** Flag micro-caps with "Negative Shares". Either backward-fill with the next valid observation or forward-fill with the last valid observation (the latter avoids look-ahead bias). |
| | **Illiquidity / Warrants**<br>*(e.g., UWMC-WT)* | **Drop Tickers or Impute Values:** Recognize `Low > High` as a data quality issue specific to illiquid ticks, not a system failure. **Impute** low = min(open, high, low, close). |

Points 1 and 3 can be implemented using DataCurator custom calculations (except for the ticker remapping). The strategy in point 2 requires manual intervention.
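
As an illustration of the restatement strategy in point 1, the sketch below keeps exactly one filing per reporting period and returns the rows in date order. It is plain Python rather than a DataCurator custom calculation, and the row layout (dicts with `period_end` and `filing_date` keys) and the function name are assumptions made for the example.

```python
from datetime import date


def select_one_filing_per_period(rows, keep="earliest"):
    """Keep one row per period_end, chosen by filing_date, sorted by period_end.

    keep="earliest" avoids look-ahead bias; keep="latest" prefers restated data.
    """
    if keep not in ("earliest", "latest"):
        raise ValueError("keep must be 'earliest' or 'latest'")
    chosen = {}
    for row in rows:
        current = chosen.get(row["period_end"])
        if current is None:
            chosen[row["period_end"]] = row
        elif keep == "earliest" and row["filing_date"] < current["filing_date"]:
            chosen[row["period_end"]] = row
        elif keep == "latest" and row["filing_date"] > current["filing_date"]:
            chosen[row["period_end"]] = row
    return sorted(chosen.values(), key=lambda r: r["period_end"])


# Made-up rows: Q1 appears twice because of a later restatement.
rows = [
    {"period_end": date(2024, 6, 30), "filing_date": date(2024, 8, 1), "revenue": 110},
    {"period_end": date(2024, 3, 31), "filing_date": date(2024, 5, 1), "revenue": 100},
    {"period_end": date(2024, 3, 31), "filing_date": date(2024, 11, 1), "revenue": 95},
]
as_reported = select_one_filing_per_period(rows, keep="earliest")
restated = select_one_filing_per_period(rows, keep="latest")
```

With these rows, `as_reported` keeps the original Q1 revenue of 100 while `restated` keeps 95, and both results are ordered by period.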

## 02: Errors not covered by the error log

The log analysis and correction strategies above resulted from a forensic examination of the critical errors that kept DataCurator from extracting the data. This section tackles non-critical errors that did not prevent the data from being downloaded.

### 02.1: Sanity check