[Bug] Paimon isn't writing data to s3 #2263

@gordonmurray

Description

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

I have tried the following versions, all with the same result:

  • paimon-flink-1.17-0.5.0-incubating.jar
  • paimon-flink-1.17-0.6-20231105.002237-58.jar
  • paimon-flink-1.18-0.6-20231105.002237-43.jar

Compute Engine

flink:1.17.1 docker image

Minimal reproduce step

I use the following Docker Compose file to start Flink version 1.17.1:

version: '3.7'

services:

  mariadb:
    image: mariadb:10.6.14
    environment:
      MYSQL_ROOT_PASSWORD: rootpassword
    volumes:
      - ./sql/mariadb.cnf:/etc/mysql/mariadb.conf.d/mariadb.cnf
      - ./sql/seed.sql:/docker-entrypoint-initdb.d/seed.sql
    ports:
      - "3306:3306"

  jobmanager:
    image: flink:1.17.1
    container_name: jobmanager
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
    ports:
      - "8081:8081"
    command: jobmanager
    volumes:
      - ./jars/flink-sql-connector-mysql-cdc-2.4.1.jar:/opt/flink/lib/flink-sql-connector-mysql-cdc-2.4.1.jar
      - ./jars/flink-connector-jdbc-3.1.0-1.17.jar:/opt/flink/lib/flink-connector-jdbc-3.1.0-1.17.jar
      - ./jars/paimon-flink-1.18-0.6-20231105.002237-43.jar:/opt/flink/lib/paimon-flink-1.18-0.6-20231105.002237-43.jar
      - ./jars/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:/opt/flink/lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
      - ./jars/paimon-s3-0.6-20231027.002013-54.jar:/opt/flink/lib/paimon-s3-0.6-20231027.002013-54.jar
      - ./jobs/job.sql:/opt/flink/job.sql

  taskmanager:
    image: flink:1.17.1
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
    depends_on:
      - jobmanager
    command: taskmanager
    volumes:
      - ./jars/flink-sql-connector-mysql-cdc-2.4.1.jar:/opt/flink/lib/flink-sql-connector-mysql-cdc-2.4.1.jar
      - ./jars/flink-connector-jdbc-3.1.0-1.17.jar:/opt/flink/lib/flink-connector-jdbc-3.1.0-1.17.jar
      - ./jars/paimon-flink-1.18-0.6-20231105.002237-43.jar:/opt/flink/lib/paimon-flink-1.18-0.6-20231105.002237-43.jar
      - ./jars/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:/opt/flink/lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
      - ./jars/paimon-s3-0.6-20231027.002013-54.jar:/opt/flink/lib/paimon-s3-0.6-20231027.002013-54.jar
    deploy:
      replicas: 2

I then submit the following Flink SQL Job:

USE CATALOG default_catalog;

CREATE CATALOG s3_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 's3://my-test-bucket/paimon',
    's3.access-key' = 'xxxxxx',
    's3.secret-key' = 'xxxxx'
);

USE CATALOG s3_catalog;

CREATE DATABASE my_database;

USE my_database;

CREATE TABLE myproducts (
    id INT PRIMARY KEY NOT ENFORCED,
    name VARCHAR,
    price DECIMAL(10, 2)
);

CREATE TEMPORARY TABLE products (
    id INT,
    name VARCHAR,
    price DECIMAL(10, 2),
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'connection.pool.size' = '10',
    'hostname' = 'mariadb',
    'port' = '3306',
    'username' = 'root',
    'password' = 'rootpassword',
    'database-name' = 'mydatabase',
    'table-name' = 'products'
);

INSERT INTO myproducts (id,name) SELECT id, name FROM products;
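One thing the job above does not do is enable checkpointing. If Paimon's streaming sink commits data only when a Flink checkpoint completes (an assumption on my part, not something I have verified here), that would let the INSERT job run indefinitely without ever producing snapshot or data files. A minimal sketch of enabling checkpointing in the SQL client before submitting the INSERT (the 10-second interval is an arbitrary example value):

```sql
-- Enable periodic checkpoints so a streaming sink can commit its snapshots;
-- without this, the INSERT job can stay RUNNING but never finalize any data files.
SET 'execution.checkpointing.interval' = '10 s';

INSERT INTO myproducts (id, name) SELECT id, name FROM products;
```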

The products table already contains data; it is created and seeded as follows:

CREATE DATABASE IF NOT EXISTS mydatabase;

USE mydatabase;

CREATE TABLE IF NOT EXISTS products (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10,2) NOT NULL
);

INSERT INTO products (name, price) VALUES
    ('Product A', 19.99),
    ('Product B', 29.99),
    ('Product C', 39.99);

On S3, I can see that folders for the catalog and the table are created successfully; however, they contain only the schema folder and schema file. No other data is written.

Inside the schema file I see the following JSON, which reflects the structure of the table:

{
  "id" : 0,
  "fields" : [ {
    "id" : 0,
    "name" : "id",
    "type" : "INT NOT NULL"
  }, {
    "id" : 1,
    "name" : "name",
    "type" : "STRING"
  }, {
    "id" : 2,
    "name" : "price",
    "type" : "DECIMAL(10, 2)"
  } ],
  "highestFieldId" : 2,
  "partitionKeys" : [ ],
  "primaryKeys" : [ "id" ],
  "options" : { },
  "timeMillis" : 1696694538055
}
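For reference, the schema file can be inspected programmatically to confirm what was committed. A small Python sketch using only the standard library (the JSON literal is copied from the schema file above):

```python
import json

# Schema JSON as found in the table's schema file on S3 (copied from above)
schema_text = """
{
  "id": 0,
  "fields": [
    {"id": 0, "name": "id", "type": "INT NOT NULL"},
    {"id": 1, "name": "name", "type": "STRING"},
    {"id": 2, "name": "price", "type": "DECIMAL(10, 2)"}
  ],
  "highestFieldId": 2,
  "partitionKeys": [],
  "primaryKeys": ["id"],
  "options": {},
  "timeMillis": 1696694538055
}
"""

schema = json.loads(schema_text)

# Map field names to their declared types, and confirm the primary key
fields = {f["name"]: f["type"] for f in schema["fields"]}
print(fields)                  # {'id': 'INT NOT NULL', 'name': 'STRING', 'price': 'DECIMAL(10, 2)'}
print(schema["primaryKeys"])   # ['id']
```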

I tried a table without the price DECIMAL(10,2) NOT NULL field in case that was the problem, but there was no change.

There are no errors in the logs. The job shows as running successfully in Flink's UI; it's just that no records are being written to S3.

[Screenshot: the job running in the Flink UI]

What doesn't meet your expectations?

I expect to see table data in the S3 bucket. I only see the schema folder, which indicates it is at least partially working.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
