Skip to content

Conversation

@jonavellecuerdo
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo commented Oct 17, 2023

What does this PR do?

Create a data flow diagram for the Carbon application.

Helpful background context

I went back-and-forth on what type of diagram to use (flowchart, sequence, or state flow), but ultimately, I decided that this flow chart showing where the data comes from and where it ends up were the key questions a data engineer might ask when first working on the application. I couldn't figure out a diagram type--with the existing mermaid features--that could easily communicate the 'threading' component of the Carbon application, so i opted to just create this diagram for now. However, I'd be happy to get suggestions for future work!

A little while ago, I made this diagram on Figma that tries to show the concurrent threads (i.e., green bars are used to indicate which thread is running for a given action), but I couldn't see how to replicate this with mermaid.
image

How can a reviewer manually see the effects of these changes?

Please see the README.

Includes new or updated dependencies?

NO

Developer

  • README is updated to reflect all changes as needed
  • All related Jira tickets are linked in commit message(s)

Code Reviewer

  • The commit message is clear and follows our guidelines
    (not just this pull request message)
  • There are appropriate tests covering any new functionality
  • The documentation has been updated to reflect all changes or is not needed
  • The changes have been verified
  • New dependencies are appropriate or there were no changes

@jonavellecuerdo jonavellecuerdo force-pushed the IN-930-add-diagrams-to-readme branch 2 times, most recently from 971e377 to cc13640 Compare October 17, 2023 15:09
Why these changes are being introduced:
* Provide a visual that encapsulates the flow of data from source
to destination.

How this addresses that need:
* Create a diagram showing data flow for a Carbon run
* Add link to Confluence document in the README

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-930
@jonavellecuerdo jonavellecuerdo force-pushed the IN-930-add-diagrams-to-readme branch from cc13640 to 833d505 Compare October 17, 2023 15:18
@jonavellecuerdo jonavellecuerdo marked this pull request as ready for review October 17, 2023 15:19
@jonavellecuerdo jonavellecuerdo self-assigned this Oct 17, 2023
Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks fantastic, and I actually think this slightly simplified mermaid is even desirable over a diagram that exposes the threads in play.

Perhaps another diagram could be helpful -- if the need or request presents itself -- that really dives into the pipes and threads, but this diagram feels good for a high level, project level README.

Feel free to use or disregard any comments, and consider approved as-is. I think it's clear as-is, and my suggestions are just moving the needle slightly in the direction of conceptual/simple over technical/precise.

README.md Outdated
subgraph in-memory[Application In-memory]
direction TB
rec-generator([Record generator])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it accurate that this "Record generator" is a python generator from the data warehouse query results? If so, I might suggest renaming to "Query Results Generator" to really drive home that it's looping over the records from the data warehouse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Update made!

README.md Outdated
end
mit-dwrhs -->|Fetch query results | rec-generator
rec-generator-->|Yielding records one at a time, <br> transform record into normalized XML strings <br> and write to a file| buffered-writer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about changing:

and write to a file

to:

and passes to write buffer

Please pushback though if I'm misunderstanding how it works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on saying "write to file" (as in "file-like object") or "write to stream", but I do think "pass to write buffer" is clean and easy to understand.

README.md Outdated
mit-dwrhs -->|Fetch query results | rec-generator
rec-generator-->|Yielding records one at a time, <br> transform record into normalized XML strings <br> and write to a file| buffered-writer
buffered-writer -->|Stream contents from file to another read end of pipe| buffered-reader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To continue simplifying for people new to codebase, what about changing:

Stream contents from file to another read end of pipe

to:

Direct pipe

Acknowledging this may paper over some details, at the end of the day it feels like a reader should understand that the "writer" is "piped" to the "reader" and they don't need to think about it very deeply past that (unless digging into code of course...).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this over Slack, and I made changes to reflect our conversation! Please let me know what you think.

@jonavellecuerdo jonavellecuerdo force-pushed the IN-930-add-diagrams-to-readme branch from 7791e75 to bbb2f9a Compare October 18, 2023 14:32
Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, approved!

I think diagramming the read/write pipes in Carbon is difficult, depending a lot on what kind of detail you want to show, but I think this diagram gives enough information to a developer to understand where they could look further.

At a very high level, my understanding is this:

data warehouse --> query --> results generator --> normalize XML --> read/write pipe --> FTPS client --> FTPS server

What this flow suggests to me is that each record that is normalized, is immediately moving through some kind of pipe and straight onto the FTP server.

What I think is tricky, is what do you define as the "Buffered Writer" or "Buffered Readers" there?

I think your diagram communicates that flow, and to me, that's the important thing at this README level.

Copy link
Contributor

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@jonavellecuerdo
Copy link
Contributor Author

Thank you both for your review!

@jonavellecuerdo jonavellecuerdo merged commit f804b68 into main Oct 18, 2023
@jonavellecuerdo jonavellecuerdo deleted the IN-930-add-diagrams-to-readme branch October 18, 2023 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants