Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change type of publish_time to timestamp #4757

Merged
merged 2 commits into from
Jul 19, 2019

Conversation

codelipenghui
Copy link
Contributor

Fixes #4734

Motivation

"publish_time" is Pulsar SQL internal column, as Pulsar only stores timestamps, it doesn’t store the timezone information. Use timestamp as "publish_time" type is more correct way in Pulsar SQL.

Modifications

Change type of publish_time to timestamp.

Verifying this change

predicate of publish_time is pushdown

Use __publish_time__ to trim messages:

SELECT COUNT(*)
FROM "sql-test-1" 
WHERE "__publish_time__" >= TIMESTAMP '2019-07-18 17:26:50.119' 
AND  "__publish_time__" < TIMESTAMP '2019-07-18 17:26:51.119';

image

Without __publish_time__ predicate:

SELECT COUNT(*)
FROM "sql-test-1";

image

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

  • Does this pull request introduce a new feature? (no)

@codelipenghui
Copy link
Contributor Author

run Integration Tests
run java8 tests

@codelipenghui
Copy link
Contributor Author

run java8 tests

1 similar comment
@sijie
Copy link
Member

sijie commented Jul 18, 2019

run java8 tests

@sijie
Copy link
Member

sijie commented Jul 18, 2019

run java8 tests

Copy link
Contributor

@jerrypeng jerrypeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codelipenghui thanks for making this fix! I tested your branch out and everything is working correctly.

@jerrypeng
Copy link
Contributor

rerun java8 tests

@jiazhai jiazhai added this to the 2.5.0 milestone Jul 19, 2019
@jiazhai jiazhai added the area/sql Pulsar SQL related features label Jul 19, 2019
@jiazhai jiazhai modified the milestones: 2.5.0, 2.4.1 Jul 19, 2019
@sijie sijie merged commit 6f5416e into apache:master Jul 19, 2019
@codelipenghui codelipenghui deleted the change_type_publish_time branch July 19, 2019 09:08
easyfan pushed a commit to easyfan/pulsar that referenced this pull request Jul 26, 2019
Fixes apache#4734

### Motivation

"publish_time" is Pulsar SQL internal column, as Pulsar only stores timestamps, it doesn’t store the timezone information. Use timestamp as "publish_time" type is more correct way in Pulsar SQL.

### Modifications

Change type of publish_time to timestamp.

### Verifying this change

predicate of publish_time is pushdown

Use `__publish_time__` to trim messages:
```
SELECT COUNT(*)
FROM "sql-test-1" 
WHERE "__publish_time__" >= TIMESTAMP '2019-07-18 17:26:50.119' 
AND  "__publish_time__" < TIMESTAMP '2019-07-18 17:26:51.119';
```
![image](https://user-images.githubusercontent.com/12592133/61447301-43835080-a983-11e9-814b-bc2b378f02b9.png)

Without `__publish_time__` predicate:
```
SELECT COUNT(*)
FROM "sql-test-1";
```
![image](https://user-images.githubusercontent.com/12592133/61447427-82190b00-a983-11e9-8d3f-3bf2a4798047.png)
jiazhai pushed a commit that referenced this pull request Aug 28, 2019
Fixes #4734

### Motivation

"publish_time" is Pulsar SQL internal column, as Pulsar only stores timestamps, it doesn’t store the timezone information. Use timestamp as "publish_time" type is more correct way in Pulsar SQL.

### Modifications

Change type of publish_time to timestamp.

### Verifying this change

predicate of publish_time is pushdown

Use `__publish_time__` to trim messages:
```
SELECT COUNT(*)
FROM "sql-test-1"
WHERE "__publish_time__" >= TIMESTAMP '2019-07-18 17:26:50.119'
AND  "__publish_time__" < TIMESTAMP '2019-07-18 17:26:51.119';
```
![image](https://user-images.githubusercontent.com/12592133/61447301-43835080-a983-11e9-814b-bc2b378f02b9.png)

Without `__publish_time__` predicate:
```
SELECT COUNT(*)
FROM "sql-test-1";
```
![image](https://user-images.githubusercontent.com/12592133/61447427-82190b00-a983-11e9-8d3f-3bf2a4798047.png)

(cherry picked from commit 6f5416e)
sijie pushed a commit that referenced this pull request Feb 7, 2020
…d due to TTL (#6211)

Fixes #5579 

### Motivation

In Pulsar 2.4.1 and later versions, if message TTL is enabled, `PersistentMessageExpiryMonitor` always deletes one non-expired message every 5 minutes.

The cause of this bug is #4744. `PersistentMessageExpiryMonitor` expects `ManagedCursor#asyncFindNewestMatching()` to pass null as its found position to itself as a callback if no expired messages exist.
https://github.com/apache/pulsar/blob/c5ba52983fee994de61984aae7d1757e9b738caf/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentMessageExpiryMonitor.java#L119-L130

However, due to the change in #4744, if no entry is found that matches the search condition, the callback will be passed `startPosition` instead of null now. For this reason, the earliest backlog message is always deleted by `PersistentMessageExpiryMonitor`.

This means that unexpected message loss can occur.

### Modifications

Revert the #4744 changes. The motivation of #4744 is to avoid NPE caused in pulse-sql, but that seems to be fixed in #4757.
https://github.com/apache/pulsar/blob/2069f761753940ed6a1faca8999af70036f20fd6/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarSplitManager.java#L363-L382
aahmed-se pushed a commit to aahmed-se/pulsar that referenced this pull request Feb 11, 2020
…d due to TTL (apache#6211)

Fixes apache#5579 

### Motivation

In Pulsar 2.4.1 and later versions, if message TTL is enabled, `PersistentMessageExpiryMonitor` always deletes one non-expired message every 5 minutes.

The cause of this bug is apache#4744. `PersistentMessageExpiryMonitor` expects `ManagedCursor#asyncFindNewestMatching()` to pass null as its found position to itself as a callback if no expired messages exist.
https://github.com/apache/pulsar/blob/c5ba52983fee994de61984aae7d1757e9b738caf/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentMessageExpiryMonitor.java#L119-L130

However, due to the change in apache#4744, if no entry is found that matches the search condition, the callback will be passed `startPosition` instead of null now. For this reason, the earliest backlog message is always deleted by `PersistentMessageExpiryMonitor`.

This means that unexpected message loss can occur.

### Modifications

Revert the apache#4744 changes. The motivation of apache#4744 is to avoid NPE caused in pulse-sql, but that seems to be fixed in apache#4757.
https://github.com/apache/pulsar/blob/2069f761753940ed6a1faca8999af70036f20fd6/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarSplitManager.java#L363-L382
tuteng pushed a commit to AmateurEvents/pulsar that referenced this pull request Feb 23, 2020
…d due to TTL (apache#6211)

Fixes apache#5579 

### Motivation

In Pulsar 2.4.1 and later versions, if message TTL is enabled, `PersistentMessageExpiryMonitor` always deletes one non-expired message every 5 minutes.

The cause of this bug is apache#4744. `PersistentMessageExpiryMonitor` expects `ManagedCursor#asyncFindNewestMatching()` to pass null as its found position to itself as a callback if no expired messages exist.
https://github.com/apache/pulsar/blob/c5ba52983fee994de61984aae7d1757e9b738caf/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentMessageExpiryMonitor.java#L119-L130

However, due to the change in apache#4744, if no entry is found that matches the search condition, the callback will be passed `startPosition` instead of null now. For this reason, the earliest backlog message is always deleted by `PersistentMessageExpiryMonitor`.

This means that unexpected message loss can occur.

### Modifications

Revert the apache#4744 changes. The motivation of apache#4744 is to avoid NPE caused in pulse-sql, but that seems to be fixed in apache#4757.
https://github.com/apache/pulsar/blob/2069f761753940ed6a1faca8999af70036f20fd6/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarSplitManager.java#L363-L382
tuteng pushed a commit to AmateurEvents/pulsar that referenced this pull request Mar 21, 2020
…d due to TTL (apache#6211)

Fixes apache#5579 

### Motivation

In Pulsar 2.4.1 and later versions, if message TTL is enabled, `PersistentMessageExpiryMonitor` always deletes one non-expired message every 5 minutes.

The cause of this bug is apache#4744. `PersistentMessageExpiryMonitor` expects `ManagedCursor#asyncFindNewestMatching()` to pass null as its found position to itself as a callback if no expired messages exist.
https://github.com/apache/pulsar/blob/c5ba52983fee994de61984aae7d1757e9b738caf/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentMessageExpiryMonitor.java#L119-L130

However, due to the change in apache#4744, if no entry is found that matches the search condition, the callback will be passed `startPosition` instead of null now. For this reason, the earliest backlog message is always deleted by `PersistentMessageExpiryMonitor`.

This means that unexpected message loss can occur.

### Modifications

Revert the apache#4744 changes. The motivation of apache#4744 is to avoid NPE caused in pulse-sql, but that seems to be fixed in apache#4757.
https://github.com/apache/pulsar/blob/2069f761753940ed6a1faca8999af70036f20fd6/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarSplitManager.java#L363-L382

(cherry picked from commit 54b39e6)
tuteng pushed a commit that referenced this pull request Apr 13, 2020
…d due to TTL (#6211)

Fixes #5579 

### Motivation

In Pulsar 2.4.1 and later versions, if message TTL is enabled, `PersistentMessageExpiryMonitor` always deletes one non-expired message every 5 minutes.

The cause of this bug is #4744. `PersistentMessageExpiryMonitor` expects `ManagedCursor#asyncFindNewestMatching()` to pass null as its found position to itself as a callback if no expired messages exist.
https://github.com/apache/pulsar/blob/c5ba52983fee994de61984aae7d1757e9b738caf/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentMessageExpiryMonitor.java#L119-L130

However, due to the change in #4744, if no entry is found that matches the search condition, the callback will be passed `startPosition` instead of null now. For this reason, the earliest backlog message is always deleted by `PersistentMessageExpiryMonitor`.

This means that unexpected message loss can occur.

### Modifications

Revert the #4744 changes. The motivation of #4744 is to avoid NPE caused in pulse-sql, but that seems to be fixed in #4757.
https://github.com/apache/pulsar/blob/2069f761753940ed6a1faca8999af70036f20fd6/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarSplitManager.java#L363-L382

(cherry picked from commit 54b39e6)
jiazhai pushed a commit to jiazhai/pulsar that referenced this pull request May 18, 2020
…d due to TTL (apache#6211)

Fixes apache#5579

### Motivation

In Pulsar 2.4.1 and later versions, if message TTL is enabled, `PersistentMessageExpiryMonitor` always deletes one non-expired message every 5 minutes.

The cause of this bug is apache#4744. `PersistentMessageExpiryMonitor` expects `ManagedCursor#asyncFindNewestMatching()` to pass null as its found position to itself as a callback if no expired messages exist.
https://github.com/apache/pulsar/blob/c5ba52983fee994de61984aae7d1757e9b738caf/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentMessageExpiryMonitor.java#L119-L130

However, due to the change in apache#4744, if no entry is found that matches the search condition, the callback will be passed `startPosition` instead of null now. For this reason, the earliest backlog message is always deleted by `PersistentMessageExpiryMonitor`.

This means that unexpected message loss can occur.

### Modifications

Revert the apache#4744 changes. The motivation of apache#4744 is to avoid NPE caused in pulse-sql, but that seems to be fixed in apache#4757.
https://github.com/apache/pulsar/blob/2069f761753940ed6a1faca8999af70036f20fd6/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarSplitManager.java#L363-L382
(cherry picked from commit 54b39e6)
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020
…d due to TTL (apache#6211)

Fixes apache#5579 

### Motivation

In Pulsar 2.4.1 and later versions, if message TTL is enabled, `PersistentMessageExpiryMonitor` always deletes one non-expired message every 5 minutes.

The cause of this bug is apache#4744. `PersistentMessageExpiryMonitor` expects `ManagedCursor#asyncFindNewestMatching()` to pass null as its found position to itself as a callback if no expired messages exist.
https://github.com/apache/pulsar/blob/c5ba52983fee994de61984aae7d1757e9b738caf/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentMessageExpiryMonitor.java#L119-L130

However, due to the change in apache#4744, if no entry is found that matches the search condition, the callback will be passed `startPosition` instead of null now. For this reason, the earliest backlog message is always deleted by `PersistentMessageExpiryMonitor`.

This means that unexpected message loss can occur.

### Modifications

Revert the apache#4744 changes. The motivation of apache#4744 is to avoid NPE caused in pulse-sql, but that seems to be fixed in apache#4757.
https://github.com/apache/pulsar/blob/2069f761753940ed6a1faca8999af70036f20fd6/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarSplitManager.java#L363-L382
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/sql Pulsar SQL related features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use timestamp as "__publish_time__" type in Pulsar SQL.
4 participants