Skip to content

External store codec columns (<filepath@store>, <blob@store>) fail on MariaDB: TypeError: dict can not be used as parameter #1451

@esutlie

Description

@esutlie

Summary

#1438 / PR #1443 fixed json column handling on MariaDB by recovering attr.json from the :json: comment marker. External store codec columns (<filepath@store>, <blob@store>) have the same root cause but are not covered by that fix — their encode() returns a Python dict that reaches pymysql unserialized, raising TypeError: dict can not be used as parameter.

Reproduction

Using DataJoint 2.2.2 against MariaDB 10.3.28:

import datajoint as dj

schema = dj.Schema("test_filepath_mariadb")

dj.config["stores"] = {
    "local": {
        "protocol": "file",
        "location": "/tmp/dj_test_store",
        "stage": "/tmp/dj_test_store",
    }
}

@schema
class FileTest(dj.Manual):
    definition = """
    id : int
    ---
    data_file : <filepath@local>
    """

# Create a dummy file
import os
os.makedirs("/tmp/dj_test_store", exist_ok=True)
with open("/tmp/dj_test_store/test.bin", "wb") as f:
    f.write(b"test")

FileTest.insert1({"id": 1, "data_file": "test.bin"})
# TypeError: dict can not be used as parameter

Same error for <blob@store> columns.

Root cause

The filepath codec's encode() returns a dict ({path, store, size, timestamp}). The blob codec behaves similarly. On insert, __make_placeholder() (table.py line ~1383) checks attr.json to decide whether to json.dumps() the value:

elif attr.json:
    value = json.dumps(value)

On MySQL, the column type is reported as json from information_schema, so attr.json is set to True and the dict is serialized. On MariaDB, the column type is reported as longtext. PR #1443 recovers attr.json for plain json columns using the :json: comment marker, but codec columns have a :<filepath@store>: marker instead — their original_type resolves through the codec's get_dtype()"json", which should hit the same category == "JSON" recovery path, but empirically attr.json remains False for these columns on MariaDB.

The unserialized dict reaches pymysql, which raises TypeError in converters.escape_dict().

Impact

This affects all external store columns on MariaDB:

  • <filepath@store> — file reference metadata
  • <blob@store> — serialized data references

Both insert and fetch are affected (fetch returns a raw JSON string instead of a dict on MariaDB).

Current workaround

We are patching pymysql's converters at import time to serialize dicts as JSON strings:

import json
import pymysql.converters

def _escape_dict_as_json(val, charset, mapping=None):
    return pymysql.converters.escape_str(json.dumps(val, default=str), charset)

pymysql.converters.encoders[dict] = _escape_dict_as_json
pymysql.converters.conversions[dict] = _escape_dict_as_json

This must be applied before any DataJoint connection is created. On MySQL, it is inert (DataJoint serializes dicts before they reach pymysql). This workaround is in production at the Sainsbury Wellcome Centre.

Suggested fix

Extend the attr.json recovery from PR #1443 to cover codec columns whose get_dtype() resolves to "json". Alternatively, have __make_placeholder() call json.dumps() on any dict value destined for a longtext column, regardless of attr.json.

Environment

  • DataJoint 2.2.2
  • MariaDB 10.3.28
  • pymysql 1.1.1
  • Python 3.11

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugIndicates an unexpected problem or unintended behavior

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions