IN-930 Create data flow diagram #131

jonavellecuerdo · 2023-10-17T15:03:39Z

What does this PR do?

Create a data flow diagram for the Carbon application.

Helpful background context

I went back-and-forth on what type of diagram to use (flowchart, sequence, or state flow), but ultimately, I decided that this flow chart showing where the data comes from and where it ends up were the key questions a data engineer might ask when first working on the application. I couldn't figure out a diagram type--with the existing mermaid features--that could easily communicate the 'threading' component of the Carbon application, so i opted to just create this diagram for now. However, I'd be happy to get suggestions for future work!

A little while ago, I made this diagram on Figma that tries to show the concurrent threads (i.e., green bars are used to indicate which thread is running for a given action), but I couldn't see how to replicate this with mermaid.

How can a reviewer manually see the effects of these changes?

Please see the README.

Includes new or updated dependencies?

NO

Developer

README is updated to reflect all changes as needed
All related Jira tickets are linked in commit message(s)

Code Reviewer

The commit message is clear and follows our guidelines
(not just this pull request message)
There are appropriate tests covering any new functionality
The documentation has been updated to reflect all changes or is not needed
The changes have been verified
New dependencies are appropriate or there were no changes

Why these changes are being introduced: * Provide a visual that encapsulates the flow of data from source to destination. How this addresses that need: * Create a diagram showing data flow for a Carbon run * Add link to Confluence document in the README Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-930

ghukill

I think it looks fantastic, and I actually think this slightly simplified mermaid is even desirable over a diagram that exposes the threads in play.

Perhaps another diagram could be helpful -- if the need or request presents itself -- that really dives into the pipes and threads, but this diagram feels good for a high level, project level README.

Feel free to use or disregard any comments, and consider approved as-is. I think it's clear as-is, and my suggestions are just moving the needle slightly in the direction of conceptual/simple over technical/precise.

ghukill · 2023-10-17T18:31:05Z

README.md

+   
+   subgraph in-memory[Application In-memory]
+      direction TB
+      rec-generator([Record generator])


Is it accurate that this "Record generator" is a python generator from the data warehouse query results? If so, I might suggest renaming to "Query Results Generator" to really drive home that it's looping over the records from the data warehouse.

Yes! Update made!

ghukill · 2023-10-17T18:32:47Z

README.md

+   end
+
+   mit-dwrhs -->|Fetch query results | rec-generator
+   rec-generator-->|Yielding records one at a time, <br> transform record into normalized XML strings <br> and write to a file| buffered-writer


What about changing:

and write to a file

to:

and passes to write buffer

Please pushback though if I'm misunderstanding how it works.

I went back and forth on saying "write to file" (as in "file-like object") or "write to stream", but I do think "pass to write buffer" is clean and easy to understand.

ghukill · 2023-10-17T18:35:48Z

README.md

+
+   mit-dwrhs -->|Fetch query results | rec-generator
+   rec-generator-->|Yielding records one at a time, <br> transform record into normalized XML strings <br> and write to a file| buffered-writer
+   buffered-writer -->|Stream contents from file to another read end of pipe| buffered-reader


To continue simplifying for people new to codebase, what about changing:

Stream contents from file to another read end of pipe

to:

Direct pipe

Acknowledging this may paper over some details, at the end of the day it feels like a reader should understand that the "writer" is "piped" to the "reader" and they don't need to think about it very deeply past that (unless digging into code of course...).

We discussed this over Slack, and I made changes to reflect our conversation! Please let me know what you think.

ghukill

Again, approved!

I think diagramming the read/write pipes in Carbon is difficult, depending a lot on what kind of detail you want to show, but I think this diagram gives enough information to a developer to understand where they could look further.

At a very high level, my understanding is this:

data warehouse --> query --> results generator --> normalize XML --> read/write pipe --> FTPS client --> FTPS server

What this flow suggests to me is that each record that is normalized, is immediately moving through some kind of pipe and straight onto the FTP server.

What I think is tricky, is what do you define as the "Buffered Writer" or "Buffered Readers" there?

I think your diagram communicates that flow, and to me, that's the important thing at this README level.

ehanson8

Looks good!

jonavellecuerdo · 2023-10-18T15:13:58Z

Thank you both for your review!

jonavellecuerdo force-pushed the IN-930-add-diagrams-to-readme branch 2 times, most recently from 971e377 to cc13640 Compare October 17, 2023 15:09

jonavellecuerdo force-pushed the IN-930-add-diagrams-to-readme branch from cc13640 to 833d505 Compare October 17, 2023 15:18

jonavellecuerdo marked this pull request as ready for review October 17, 2023 15:19

jonavellecuerdo requested a review from ghukill October 17, 2023 15:19

jonavellecuerdo self-assigned this Oct 17, 2023

jonavellecuerdo requested a review from ehanson8 October 17, 2023 15:20

ghukill approved these changes Oct 17, 2023

View reviewed changes

Update diagram

bbb2f9a

jonavellecuerdo force-pushed the IN-930-add-diagrams-to-readme branch from 7791e75 to bbb2f9a Compare October 18, 2023 14:32

ghukill approved these changes Oct 18, 2023

View reviewed changes

ehanson8 approved these changes Oct 18, 2023

View reviewed changes

jonavellecuerdo merged commit f804b68 into main Oct 18, 2023

jonavellecuerdo deleted the IN-930-add-diagrams-to-readme branch October 18, 2023 15:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

IN-930 Create data flow diagram #131

IN-930 Create data flow diagram #131

Uh oh!

jonavellecuerdo commented Oct 17, 2023 •

edited

Loading

Uh oh!

ghukill left a comment •

edited

Loading

Uh oh!

ghukill Oct 17, 2023

Uh oh!

jonavellecuerdo Oct 18, 2023

Uh oh!

ghukill Oct 17, 2023

Uh oh!

jonavellecuerdo Oct 18, 2023

Uh oh!

ghukill Oct 17, 2023

Uh oh!

jonavellecuerdo Oct 18, 2023

Uh oh!

ghukill left a comment

Uh oh!

ehanson8 left a comment

Uh oh!

jonavellecuerdo commented Oct 18, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

IN-930 Create data flow diagram #131

IN-930 Create data flow diagram #131

Uh oh!

Conversation

jonavellecuerdo commented Oct 17, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Helpful background context

How can a reviewer manually see the effects of these changes?

Includes new or updated dependencies?

Developer

Code Reviewer

Uh oh!

ghukill left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ghukill Oct 17, 2023

Choose a reason for hiding this comment

Uh oh!

jonavellecuerdo Oct 18, 2023

Choose a reason for hiding this comment

Uh oh!

ghukill Oct 17, 2023

Choose a reason for hiding this comment

Uh oh!

jonavellecuerdo Oct 18, 2023

Choose a reason for hiding this comment

Uh oh!

ghukill Oct 17, 2023

Choose a reason for hiding this comment

Uh oh!

jonavellecuerdo Oct 18, 2023

Choose a reason for hiding this comment

Uh oh!

ghukill left a comment

Choose a reason for hiding this comment

Uh oh!

ehanson8 left a comment

Choose a reason for hiding this comment

Uh oh!

jonavellecuerdo commented Oct 18, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jonavellecuerdo commented Oct 17, 2023 •

edited

Loading

ghukill left a comment •

edited

Loading