Skip to content

Databricks CSV seed load coercing NULL booleans to False #5589

@saipraveenpn

Description

@saipraveenpn

Observed in sqlmesh[databricks] v0.224.
Seed models with Boolean columns having NULL values are coerced to False in the INSERT query. The ideal behavior would be to leave NULLs as is without manipulating data. I was unable to change this behavior through csv_settings.keep_default_na or csv_settings.na_values.

Example snippets to reproduce issue:
test_seed.csv

id,test_ind
1,null
2,false
3,true
4,null

test_seed.sql

MODEL (
  name test_lookup.test_seed
  kind SEED (
    path '$root/seeds/test_seed.csv',
    -- csv_settings (
    --   keep_default_na = false,
    --   na_values = []
    -- )
  ),
  columns (
    id string,
    test_ind boolean
  ),
  tags ( 'SEED' )
);

Generated INSERT Query

INSERT INTO `sqlmesh_testing_dev`.`test_lookup`.test_seed
  REPLACE WHERE
    TRUE
  SELECT
    CAST(`id` AS STRING) AS `id`,
    CAST(`test_ind` AS BOOLEAN) AS test_ind
  FROM
    VALUES ('1', FALSE), ('2', FALSE), ('3', TRUE), ('4', FALSE) AS `t` (`id`, `test_ind`)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions