[pulsar-sql] Handle schema not found #4890

Merged: 6 commits, merged Aug 10, 2019

codelipenghui (Contributor)

Motivation

Currently, Pulsar does not store a byte[] schema in the schema registry. When a producer sends messages as plain byte[], querying that topic with Pulsar SQL fails with:

Query 20190805_043001_00002_7yazb failed: Table has no columns: pulsar:PulsarTableHandle{connectorId=pulsar, schemaName=public/default, tableName=test, topicName=test}

Modifications

Handle the schema-not-found exception and fall back to a default BYTES schema:

SchemaInfo.builder()
        .type(SchemaType.BYTES)
        .schema(new byte[0])
        .name(name)
        .build();
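A minimal sketch of the fallback described above. It assumes a hypothetical `SchemaLookup` interface whose `getSchema` throws a hypothetical `SchemaNotFoundException` when the registry has no entry; the actual connector handles the 404 returned by the admin API, which is not reproduced here. `SchemaType` and `SchemaInfo` are simplified stand-ins, not Pulsar's classes.

```java
import java.util.Map;

// Simplified stand-ins for Pulsar's SchemaType / SchemaInfo (hypothetical, illustration only).
enum SchemaType { BYTES, STRING, AVRO, JSON }

record SchemaInfo(String name, SchemaType type, byte[] schema) { }

// Hypothetical exception standing in for the registry's "schema not found" (HTTP 404) response.
class SchemaNotFoundException extends Exception {
    SchemaNotFoundException(String topic) {
        super("Schema not found for topic " + topic);
    }
}

// Hypothetical registry lookup; the real connector queries the Pulsar admin API instead.
interface SchemaLookup {
    SchemaInfo getSchema(String topic) throws SchemaNotFoundException;
}

final class SchemaFallback {
    private SchemaFallback() { }

    // Default used when a topic was written with raw byte[] and no schema was registered.
    static SchemaInfo defaultSchema(String name) {
        return new SchemaInfo(name, SchemaType.BYTES, new byte[0]);
    }

    // Returns the registered schema, or the BYTES default instead of failing the query.
    static SchemaInfo resolve(SchemaLookup lookup, String topic) {
        try {
            return lookup.getSchema(topic);
        } catch (SchemaNotFoundException e) {
            return defaultSchema(topic);
        }
    }

    // Example registry backed by a map: topics missing from the map have "no schema".
    static SchemaLookup fromMap(Map<String, SchemaInfo> registry) {
        return topic -> {
            SchemaInfo info = registry.get(topic);
            if (info == null) {
                throw new SchemaNotFoundException(topic);
            }
            return info;
        };
    }
}
```

In this sketch, a topic like `test` from the error above resolves to a BYTES schema instead of raising, so table metadata can still be built for it.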

Verifying this change

Unit tests passed

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (yes)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

  • Does this pull request introduce a new feature? (no)

codelipenghui self-assigned this Aug 5, 2019
codelipenghui added the area/sql (Pulsar SQL related features) label Aug 5, 2019
codelipenghui added this to the 2.5.0 milestone Aug 5, 2019
@@ -50,4 +51,11 @@ static SchemaHandler newPulsarSchemaHandler(SchemaInfo schemaInfo,
}
}

static SchemaInfo defaultSchema(String name) {
return SchemaInfo.builder()
Member:

You can simplify this with Schema.BYTES.getSchemaInfo(). That also avoids creating a new schema info object on every call.
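The reviewer's point, sketched with simplified stand-in types rather than Pulsar's real classes: a default built once and shared (as `Schema.BYTES.getSchemaInfo()` returns) avoids allocating a fresh object per call. The `DefaultSchemas` holder and the schema name used below are hypothetical.

```java
// Simplified stand-ins (hypothetical), mirroring the shape of Pulsar's SchemaType / SchemaInfo.
enum SchemaType { BYTES, STRING }

record SchemaInfo(String name, SchemaType type, byte[] schema) { }

final class DefaultSchemas {
    private DefaultSchemas() { }

    // Built once at class initialization and shared by every caller (name is illustrative).
    static final SchemaInfo BYTES = new SchemaInfo("bytes", SchemaType.BYTES, new byte[0]);

    // Per-call construction, like the original defaultSchema(name): a new object each time.
    static SchemaInfo buildEachTime(String name) {
        return new SchemaInfo(name, SchemaType.BYTES, new byte[0]);
    }

    // Suggested approach: return the shared instance instead of rebuilding it.
    static SchemaInfo defaultSchema() {
        return BYTES;
    }
}
```

One trade-off in this sketch is that the shared default no longer carries the topic name, which is acceptable when only the BYTES type matters for building the table.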

codelipenghui (Contributor, Author):

Fixed.

sijie modified the milestones: 2.5.0, 2.4.1 Aug 5, 2019
@@ -352,7 +352,7 @@ private ConnectorTableMetadata getTableMetadata(SchemaTableName schemaTableName,
ColumnMetadata valueColumn = new PulsarColumnMetadata(
"__value__",
convertPulsarType(schemaInfo.getType()),
null, null, false, false,
"The value of the message with primary type schema", null, false, false,
jerrypeng (Contributor) commented Aug 5, 2019:

"The value of the message with primitive schema type"

jerrypeng (Contributor) left a comment:

Looks good, apart from a minor wording mistake. Is it possible to add some tests?

codelipenghui (Contributor, Author):

@jerrypeng I have added a unit test for getSplit when the topic has no schema. PTAL.

codelipenghui (Contributor, Author):

run cpp tests

codelipenghui (Contributor, Author):

run java8 tests

sijie (Member) commented Aug 10, 2019:

// org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?

run java8 tests

sijie merged commit 2069f76 into apache:master Aug 10, 2019

mingfang:

I'm not sure if this is related, but I'm seeing the "Table has no columns" error when running the Pulsar SQL tutorial here: https://pulsar.apache.org/docs/en/sql-getting-started/.

Steps to reproduce the problem:
1. Generate some data by running this command for a minute, then kill it:
./bin/pulsar-admin sources localrun --name generator --destinationTopicName generator_test -a connectors/pulsar-io-data-generator-2.4.0.nar

2. Use Pulsar SQL to "consume" the data:
select * from pulsar."public/default".generator_test;

3. Repeat steps 1 and 2. In step 2, I see this error:
Query 20190823_033331_00067_sw7tk failed: Table has no columns: pulsar:PulsarTableHandle{connectorId=pulsar, schemaName=public/default, tableName=generator_test, topicName=generator_test}

I can reproduce this error 100% of the time.

sijie (Member) commented Aug 23, 2019:

@mingfang

3-Repeat steps 1 and 2. In step 2 I would see this error

Do you mean you stopped the generator and reran it? Or did you only rerun step 2?

mingfang:

You need new data in the topic to see the error, so you have to repeat step 1 (run the generator for a minute, then kill it) before running step 2 again.

My wild guess is that the schema got deleted and is not recreated by the generator, because the topic already exists on every run after the first. If I change the topic name, it works again.

jiazhai pushed a commit that referenced this pull request Aug 28, 2019
* Handle get schema 404 in pulsar sql(table meta and get splits)

* Fix unit test.

* add defaultSchema()

* use Schema.BYTES.getSchemaInfo()

* add unit test

* rebase and fix unit tests

(cherry picked from commit 2069f76)
codelipenghui deleted the pulsar_sql_schema_404 branch May 19, 2021 05:31