Summary
Analysis of all records with date_event before 1901 revealed ~800 records with date errors across 4 problem categories, embedded among ~7,200 legitimately ancient sighting records.
An analysis dataset has been extracted to temp/historic_pre1901.db with a date_analysis table pre-classified into categories for manual review.
Problem Categories
1. UFOCAT century-only 19// (692 records)
- Raw date: 19// (YEAR=19, MO=empty, DAY=empty)
- Parsed as: 0019-01-01 (zero-padded 2-digit year)
- Actual meaning: "Sometime in the 1900s"; the year is genuinely unknown and only the century is known
- Evidence: Descriptions include modern events: "recalled abduction from orphanage as little girl", "Motion pictures", "radar confirmation". Cities: Sacramento, Miami, Chicago, Houston
- Root cause: parse_ufocat_date() in import_ufocat.py, line 39: f"{y:04d}" zero-pads 19 → 0019
- Proposed fix: Set date_event = NULL (year genuinely unknown)
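A minimal sketch of how the century-only case might be handled at parse time. The real signature of parse_ufocat_date() is not shown in this issue, so the function name, its parameters, and the rule "a 2-digit year with empty month and day is a century marker" are all assumptions made for illustration.

```python
from typing import Optional


def _part(value: str) -> int:
    """Return a month/day component, falling back to 1 when empty or zero."""
    value = value.strip()
    return int(value) if value.isdigit() and int(value) > 0 else 1


def parse_ufocat_date_sketch(year: str, month: str = "", day: str = "") -> Optional[str]:
    """Hypothetical replacement: return an ISO date, or None if the year is unknown."""
    year, month, day = year.strip(), month.strip(), day.strip()
    if not year.isdigit():
        return None
    # Assumption: a 1- or 2-digit year with no month and no day (raw "19//")
    # is only a century marker, so no usable year exists.
    if len(year) <= 2 and not month and not day:
        return None
    # 3- and 4-digit years are still zero-padded as before; 3-digit values
    # remain ambiguous and are left for manual review.
    return f"{int(year):04d}-{_part(month):02d}-{_part(day):02d}"
```

With this rule the raw value 19// yields None (stored as NULL) instead of 0019-01-01, while 3-digit years such as 195// still parse and stay in the manual-review queue.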
2. UFOCAT 3-digit year ambiguity (88 records)
- 3-digit raw years (034–999), mostly legitimately ancient
- 2 confirmed modern mislabels:
  - 195// → "H-BOMB TEST" (clearly 1950s, not 195 AD)
  - 188// with states CN, NZL, FRA, TUR and no descriptions (possibly 1880s)
- Proposed fix: Manual classification in analysis DB, then targeted corrections
3. UPDB mangled modern years (~20 records)
- Raw JSON confirms broken years in upstream UPDB export
- Pattern examples:
  - 0196 = 1962 (description says "June 22-23, 1962")
  - 0200 = ~2000 (Ellsworth AFB radar, modern descriptions)
  - 0191 = ~1991 (Boxford, "bright red object...upside down saucer")
  - 0100 = modern (Topanga Canyon, "triangle formation")
- Proposed fix: Manual correction from description context where possible; NULL the rest
4. NUFORC data entry errors (~3 records)
- 0205-01-05 = 2005 (description: "Dad seen outside window")
- 1071-06-16 = ~2007 (description: "my friend spotted object in sky")
- 1721-02-01 = ~2021 (description: "straight line of lights in sky")
- Proposed fix: Manual correction from description context
Legitimately Ancient Records (~7,200)
These are correct and need no changes:
- UFOCAT 4-digit years: 4,436 records (1001–1900)
- UFO-search: 1,984 records (Geldreich Majestic Timeline, 61–1900 AD)
- UPDB: ~760 records (1000–1900)
- MUFON: 40 records (1890s)
- NUFORC: ~23 records (historic sighting reports)
Analysis Tooling
- extract_historic.py: Extracts pre-1901 records into temp/historic_pre1901.db
- temp/historic_pre1901.db: Standalone SQLite DB with date_analysis table containing:
  - category: Auto-classified category
  - corrected_year: Manual override column (NULL = no correction needed)
  - notes: Reviewer notes
  - Views: v_category_summary, v_3digit_review, v_century_only, v_updb_review, v_timeline
- temp/ANALYSIS.md: Detailed analysis report
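To illustrate how review progress could be tracked in the analysis DB, here is a small sketch that counts annotated records per category. It runs against an in-memory stand-in, not the real file: the record_id and raw_date columns and the category label strings are assumptions, and only category, corrected_year, and notes are taken from the description above.

```python
import sqlite3

# In-memory stand-in for temp/historic_pre1901.db (schema partly assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE date_analysis (
           record_id      INTEGER PRIMARY KEY,  -- hypothetical column
           raw_date       TEXT,                 -- hypothetical column
           category       TEXT,
           corrected_year INTEGER,              -- NULL = no correction entered
           notes          TEXT
       )"""
)
conn.executemany(
    "INSERT INTO date_analysis VALUES (?, ?, ?, ?, ?)",
    [
        (1, "19//", "ufocat_century_only", None, None),
        (2, "0196", "updb_mangled", 1962, 'description says "June 22-23, 1962"'),
        (3, "0205-01-05", "nuforc_entry_error", 2005, None),
        (4, "0100", "updb_mangled", None, None),
    ],
)

# Per category: total records and how many already carry a corrected year.
summary = conn.execute(
    """SELECT category,
              COUNT(*)                        AS total,
              SUM(corrected_year IS NOT NULL) AS annotated
       FROM date_analysis
       GROUP BY category
       ORDER BY category"""
).fetchall()

for category, total, annotated in summary:
    print(f"{category}: {annotated}/{total} annotated")
```

The existing v_category_summary view may already provide something similar; its definition is not shown here, so this query is written directly against the table.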
Next Steps
- Annotate ~110 ambiguous records in the analysis DB (UFOCAT 3-digit + UPDB + NUFORC)
- Decide on UFOCAT century-only handling (NULL vs. marker value)
- Generate SQL fix statements from annotated analysis DB
- Add tests for historic date fixes
- Apply fixes to rebuild_db.py / import_ufocat.py
- Update methodology documentation
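The "generate SQL fix statements" step could look roughly like the sketch below. The target table name sightings, its id column, and the category label are hypothetical; only date_event and corrected_year come from this issue. The sketch also assumes the NULL option is chosen for century-only records, which is still an open decision above.

```python
from typing import Optional, Tuple

Fix = Tuple[str, tuple]


def build_fix(record_id: int, date_event: str, category: str,
              corrected_year: Optional[int]) -> Optional[Fix]:
    """Return a parameterized UPDATE for one annotated record, or None to skip it."""
    # Assumption: century-only records get date_event = NULL.
    if category == "ufocat_century_only":
        return ("UPDATE sightings SET date_event = NULL WHERE id = ?", (record_id,))
    # No manual override entered: leave the record untouched.
    if corrected_year is None:
        return None
    # Replace only the 4-digit year prefix; keep the stored month and day.
    fixed = f"{corrected_year:04d}{date_event[4:]}"
    return ("UPDATE sightings SET date_event = ? WHERE id = ?", (fixed, record_id))


# Example using the NUFORC record listed above (record id is made up).
print(build_fix(3, "0205-01-05", "nuforc_entry_error", 2005))
```

Returning parameterized statements instead of interpolated SQL strings keeps the fixes safe to run through sqlite3's execute(), and the same function doubles as a unit under test for the "add tests" step.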