Implementing Exercise 07 - Querying Kafka Topics #4

Closed

WadeWaldron wants to merge 3 commits into main from exercise-07

Contributor

WadeWaldron commented Jun 25, 2024 (edited)

Description

Implements the second exercise (module 7), which focuses on writing basic select statements in Flink.

Checklist

Unit tests created/updated for any new code (where applicable).
Run all tests with ./build.sh validate.
Update the CHANGELOG.md.
Update the README.md if necessary.

cla-assistant bot commented Jun 25, 2024 (edited)

All committers have signed the CLA.

WadeWaldron force-pushed the exercise-07 branch 2 times, most recently from 3b9fa6c to 783f5eb Compare

June 25, 2024 20:47


          Implementing Exercise 04 - Connecting to Confluent Cloud

82caba9

WadeWaldron force-pushed the exercise-07 branch 3 times, most recently from 0f48c1c to 3278d80 Compare

June 25, 2024 20:56


          Implementing Exercise 07 - Querying Kafka Topics

b84d90b

WadeWaldron force-pushed the exercise-07 branch from 9e23fb7 to b84d90b Compare

June 27, 2024 18:27


          Slight tweak to the cleanup method.

0dda484

WadeWaldron marked this pull request as ready for review

July 8, 2024 16:36

WadeWaldron requested a review from a team as a code owner

July 8, 2024 16:36

WadeWaldron requested review from MartijnVisser, alpinegizmo and pmoskovi

July 8, 2024 16:37

pmoskovi reviewed

Member

pmoskovi left a comment

Nice exercises. My comments are based on reading them. If you need someone to go through the exercises after David's and Martijn's reviews, let me know; I'm happy to do it.

instructions/07-querying-flink-tables.md


		More importantly, it creates a foundation for future work.

		Note: It's important to remember that the data in the table is streaming and unbounded. Once the query is executed it will run forever until it is terminated.

Member

pmoskovi Jul 8, 2024

Will it be clear to the reader at this point what the term unbounded means?

Member

pmoskovi Jul 8, 2024

I assume this describes streaming: Once the query is executed it will run forever until it is terminated.

Contributor Author

WadeWaldron Jul 8, 2024

If the user was just doing the exercises, then that might not be clear. If they are watching the lectures, then I hope it would be clear.

instructions/07-querying-flink-tables.md

+              - Execute and return the result.
+              <details>
+                <summary>**Hint**</summary>

Member

pmoskovi Jul 8, 2024

I like the expandable Hint here! Nice touch!

instructions/07-querying-flink-tables.md


		Modify `Marketplace.java` as follows.

		- In the `main` method, create an instance of the `CustomerService`.

Member

pmoskovi Jul 8, 2024

When referring to a method, consider adding (), as in: main().

Contributor Author

WadeWaldron Jul 8, 2024

main() wouldn't be the right signature though. I think using the backticks works to make it clear this is a code entity.

instructions/07-querying-flink-tables.md

+              Modify `Marketplace.java` as follows.
+              - In the `main` method, create an instance of the `CustomerService`.
+              - Call the `allCustomers` method on the service to get a `TableResult`.

Member

pmoskovi Jul 8, 2024

allCustomers()

Member

pmoskovi Jul 8, 2024

This comment applies to other methods mentioned below.

instructions/07-querying-flink-tables.md

+              You can implement a basic select statement as follows:
+              ```
+              env.from("TABLE NAME")

Member

pmoskovi Jul 8, 2024

I like how the hints are just that: no solutions, just hints.
This made me think: would it make sense to add links to the actual solutions? Or will they have everything cloned to their machines anyway, making links unnecessary?

Contributor Author

WadeWaldron Jul 8, 2024

During the first exercise, they will be reading through the README, which explains how they can pull the solution if necessary.

Links to the solution are an interesting idea though. Let me think on that.

WadeWaldron commented

staging/07-querying-flink-tables/src/test/java/marketplace/CustomerServiceTest.java


		import static org.junit.jupiter.api.Assertions.*;

		class CustomerServiceTest extends FlinkTableAPITest {

Contributor Author

WadeWaldron Jul 16, 2024

These are written as integration-style tests: they talk to the real cluster, which is not ideal. Many of them take 30-90 seconds to execute, which is incredibly slow.

However, I don't think we currently have a viable unit test strategy. I am open to alternatives that might help speed things up.

The question is: in the absence of a viable unit test strategy, is this approach at least suitable for internal enablement on the technology?

WadeWaldron commented

staging/07-querying-flink-tables/src/main/java/marketplace/CustomerService.java

+                      this.env = env;
+                  }
+                  public TableResult allCustomers() {

Contributor Author

WadeWaldron Jul 16, 2024 (edited)

I needed a way to make these methods at least somewhat "testable". Most of the Table API examples consist of nothing but a `main` method, which isn't testable.

I opted to put these methods inside services so that I could write tests that target individual Flink queries/jobs.

However, if there is a better/more common way to organize a Flink Table API application, please let me know what it is (ideally with an example) and I can see if I can adapt.
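
To illustrate why this structure helps, here is a minimal sketch that runs without a cluster. It is not the code from this PR: `QueryRunner`, `CustomerServiceSketch`, and `RecordingRunner` are hypothetical names, and `QueryRunner` is a deliberately tiny stand-in rather than Flink's `TableEnvironment`. It only shows that once a query lives in a service method with the environment passed in through the constructor, a test has something it can target.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the slice of a table environment the service needs.
// This is NOT Flink's TableEnvironment; it exists only to show the shape.
interface QueryRunner {
    List<String> selectAll(String tableName);
}

// One method per query, with the environment injected through the constructor,
// mirroring the `this.env = env` / `allCustomers()` structure in the diff.
class CustomerServiceSketch {
    private final QueryRunner env;
    private final String customersTable;

    CustomerServiceSketch(QueryRunner env, String customersTable) {
        this.env = env;
        this.customersTable = customersTable;
    }

    List<String> allCustomers() {
        return env.selectAll(customersTable);
    }
}

// Test double that records which tables were queried and returns canned rows.
class RecordingRunner implements QueryRunner {
    final List<String> queriedTables = new ArrayList<>();

    @Override
    public List<String> selectAll(String tableName) {
        queriedTables.add(tableName);
        return List.of("row-1", "row-2");
    }
}
```

A test can then construct `CustomerServiceSketch` with a `RecordingRunner` and assert on both the returned rows and the table that was queried. Whether a comparable seam exists for the real Table API is an open question in this thread; the sketch only covers the organizational idea.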

WadeWaldron commented

solutions/07-querying-flink-tables/src/main/java/marketplace/Marketplace.java


		Arrays.stream(env.listTables()).forEach(System.out::println);

		customers.allCustomers();

Contributor Author

WadeWaldron Jul 16, 2024

When these queries are executed, there is nothing to shut them down. Queries like this one, with no sink, automatically shut down a short time after the application terminates.

Complete streams that include a sink never shut down and will run forever.

Is that desirable? How would we recommend people manage these jobs to make sure they don't leak a bunch of queries and consume resources unnecessarily? How can we prevent someone from running the same job multiple times by accident and causing duplicate data in their topic? What is our recommended way of managing the lifecycle of these jobs?
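
One possible direction, offered as a sketch rather than a recommendation confirmed anywhere in this thread: track each submitted job by name, refuse a second submission under the same name, and cancel everything on shutdown. `JobRegistry` and its methods are hypothetical, and the cancel hook is a plain `Runnable`. In open-source Flink, `TableResult.getJobClient()` returns an `Optional<JobClient>` whose `cancel()` could serve as that hook; whether that behaves the same against Confluent Cloud is an assumption that would need checking.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper, not part of Flink or of this PR.
class JobRegistry {
    // Job name -> hook that cancels the running job.
    private final Map<String, Runnable> cancelHooks = new LinkedHashMap<>();

    // Starts the job only if no job with this name is already tracked.
    // Returns true if the job was started, false if it was skipped as a duplicate.
    synchronized boolean submitOnce(String name, Supplier<Runnable> start) {
        if (cancelHooks.containsKey(name)) {
            return false;
        }
        cancelHooks.put(name, start.get());
        return true;
    }

    // Cancels every tracked job, forgets them, and returns how many were cancelled.
    synchronized int cancelAll() {
        int cancelled = cancelHooks.size();
        cancelHooks.values().forEach(Runnable::run);
        cancelHooks.clear();
        return cancelled;
    }
}
```

This only guards against duplicates within a single JVM process. It does not stop someone from launching the application twice, so it addresses leaked queries on shutdown more than duplicate data in a topic.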

WadeWaldron closed this

WadeWaldron deleted the exercise-07 branch

August 21, 2024 15:30
