-
Notifications
You must be signed in to change notification settings - Fork 5
Implementing Exercise 07 - Querying Kafka Topics #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3b9fa6c to
783f5eb
Compare
0f48c1c to
3278d80
Compare
pmoskovi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice exercises. My comments are based on reading them. If you need someone going through the exercises after David & Martijn's reviews, let me know, happy to do it.
|
|
||
| More importantly, it creates a foundation for future work. | ||
|
|
||
| **Note:** It's important to remember that the data in the table is streaming and unbounded. Once the query is executed it will run forever until it is terminated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it be clear to the reader at this point what the term unbounded means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this describes streaming: Once the query is executed it will run forever until it is terminated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user was just doing the exercises, then that might not be clear. If they are watching the lectures, then I hope it would be clear.
| - Execute and return the result. | ||
|
|
||
| <details> | ||
| <summary>**Hint**</summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the expandable Hint here! Nice touch!
|
|
||
| Modify `Marketplace.java` as follows. | ||
|
|
||
| - In the `main` method, create an instance of the `CustomerService`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When referring to a method, consider adding (), as in: main().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main() wouldn't be the right signature though. I think using the backticks works to make it clear this is a code entity.
| Modify `Marketplace.java` as follows. | ||
|
|
||
| - In the `main` method, create an instance of the `CustomerService`. | ||
| - Call the `allCustomers` method on the service to get a `TableResult`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allCustomers()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment applies to other methods mentioned below.
| You can implement a basic select statement as follows: | ||
|
|
||
| ``` | ||
| env.from("TABLE NAME") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how Hints are just that. No solutions, just hints.
This made me think - would it make sense to add links to the actual solutions? Or they'll have it all cloned to their machines anyway, so it's not necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During the first exercise, they will be reading through the README which explains how they can pull the solution if necessary.
Links to the solution are an interesting idea though. Let me think on that.
|
|
||
| import static org.junit.jupiter.api.Assertions.*; | ||
|
|
||
| class CustomerServiceTest extends FlinkTableAPITest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are written as "Integration" style tests. They talk to the real cluster. That's not ideal. Many of these tests take 30-90 seconds to execute which is incredibly slow.
However, I don't think we currently have a viable unit test strategy. I am open to alternatives that might help speed things up.
The question is, in the absence of a viable unit test strategy, is this suitable for at least doing internal enablement on the technology?
| this.env = env; | ||
| } | ||
|
|
||
| public TableResult allCustomers() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed a way to make these methods at least somewhat "testable". Most of the examples of the Table API consist of nothing but a "main" method which isn't testable.
I opted to create these methods inside of Services so that I could write tests that target individual Flink queries/jobs.
However, if there is a better/more common way to organize a Flink Table API application, please let me know what it is (ideally with an example) and I can see if I can adapt.
|
|
||
| Arrays.stream(env.listTables()).forEach(System.out::println); | ||
|
|
||
| customers.allCustomers(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When these queries are executed there is nothing to shut them down. For queries like this, with no sink, they will automatically shutdown a short time after the application terminates.
For complete streams that include a sink, they never shut down and will run forever.
Is that desirable? How would we recommend people manage these jobs to make sure they don't leak a bunch of queries and consume resources unnecessarily? How can we prevent someone from running the same job multiple times by accident and causing duplicate data in their topic? What is our recommended way of managing the lifecycle of these jobs?
Description
Implementing the second exercise (module 7) which focuses on writing basic Select statements in Flink.
Checklist
./build.sh validate.