Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database diagram #29

Open
wants to merge 6 commits into
base: master
from

Conversation

@kinow
Copy link
Member

commented Jul 9, 2019

Closes #28

Adds script bin/database-diagram.py which relies on eralchemy, to produce a graphviz/dot database diagram.

The script uses cylc-flow Python code to create the SQLite DB, just like what Cylc does normally. Then eralchemy produces a dot file.

The diagram generated by eralchemy is thin and long, without any constraints. Adding the contraints to cylc-flow's DB could lead to regressions when running suites, so I prefer to leave that for post Cylc 8 - or at least until I feel confident to help in case things go bad after this change. eralchemy supports manually entered relationships, so they are hardcoded for now (we don't seem to change the DB schema with frequency anyway).

To solve the confusing layout in the vanilla diagram, the script re-organizes the contents of the eralchemy generated dot file, by changing the graph settings, and adding a subgraph.

Final diagram:

cylc-database

Added a section to the current documentation as well. Later when we start working on the new documentation, I would like to move the database documentation up to the initial chapters, as I believe this is basic/introductory information, which is currently listed under "Suite Run databases", in chapter 13.

Screenshot_2019-07-08_20-07-58

ps: eralchemy is not a hard requirement for the documentation, but I've added an optional dependency under group [database], that can be used only when the script needs to be executed again via pip install .[database] and then python bin/database-diagram.py.

@kinow kinow self-assigned this Jul 9, 2019

@kinow kinow added the content label Jul 9, 2019

@kinow kinow referenced this pull request Jul 9, 2019
@matthewrmshin

This comment has been minimized.

Copy link
Member

commented Jul 9, 2019

Looks good.

@kinow

This comment has been minimized.

Copy link
Member Author

commented Jul 9, 2019

Travis failed with ModuleNotFoundError: No module named 'sphinx', but I think it is not due to changes in this PR. It built correctly in my notebook

@kinow kinow force-pushed the kinow:database-diagram branch from 4ab8710 to b4e14e0 Jul 12, 2019

@kinow

This comment has been minimized.

Copy link
Member Author

commented Jul 12, 2019

Conflicts fixed, updated setup.py's commit to match new syntax from master. Rebased, ready for review again.

@kinow kinow force-pushed the kinow:database-diagram branch from b4e14e0 to 6c50ab2 Jul 12, 2019

@kinow kinow force-pushed the kinow:database-diagram branch from 6c50ab2 to d2b0229 Jul 12, 2019

@sadielbartholomew
Copy link
Member

left a comment

(Going to jump in with some comments, as a possible reviewer 2).

Great work Bruno, that's a beautiful diagram, & it being (largely) auto-generated will be really useful. The more auto-generated content in the docs, the less work we need to do maintaining them 😇!

The diagram generated by eralchemy is thin and long, without any constraints

I appreciate all your efforts to make the overall dimensions more appropriate, but now the schematic seems to be too wide in that the text comes out very small, so that most would likely struggle to read it, & even opening it from the docs into a separate tab, it needs zooming in to be legible.

It would be a shame if people couldn't read it, so can I suggest, if possible, editing the script to change the placement of the third layer down as so:

diagram_1

to reduce the width, & bumping up the font size as well, since most table items have plenty of extra space in their "boxes", & only the task_jobs boxes would increase in length a bit (not enough to nullify the previous width change), so that overall the text gets larger relative to the image size. Ideally we can have the image as it sits inside the docs (with a standard-ish screen size) with font size comparable to that of the plain docs text, but it is currently about half of that.

Some further suggestions:

  • A brief description, in the docs text introducing the diagram, of the meaning of the DB schema aspects would be useful; I appreciate it is a standard DB schema format, but for example:
    • what do the lbels 0 .. N and {0, 1} mean? Could they perhaps even be changed to labels with a more obvious meaning?
    • why are certain items underlined?
  • Some DB table elements have columns of key & value. Are these just generic columns with those words to mean 'any key', 'any value'? Could that be clarified (possibly this is just my lack of knowledge on the DB, so if this makes little sense to you, no worries!)?

I would like to move the database documentation up to the initial chapters, as I believe this is basic/introductory information, which is currently listed under "Suite Run databases", in chapter 13.

Good point, though personally I'm not sure if the introductory chapters are the best place, as only advanced users should need to know about the DB, but we certainly don't want this diagram hidden away & hard to find).

In fact, much of the docs needs a restructuring, as outlined briefly in #11 (comment); perhaps add that item onto that Issue under the restructuring to help us keep a note, though restructuring might be significant enough (& need sufficient discussion) to merit it's own Issue, so we could move it out to its own one too.

@sadielbartholomew
Copy link
Member

left a comment

Sorry, forgot to submit some code comments with my previous sub-review. Very minor observations here, as the code looks great.

return cursor.fetchall()


def is_orphan(table_name, relationships):

This comment has been minimized.

Copy link
@sadielbartholomew

sadielbartholomew Jul 13, 2019

Member

This function is only used once, & is a textbook candidate for a list comprehension, so to cut out many lines I think you could just use an equivalent if-in-for-in list comprehension on line 115-116.

# eralchemy needs a file that ends with .er to parse markdown
with tempfile.NamedTemporaryFile(suffix=".er") as tf_md:
markdown = "\n".join(schema)
logger.info(f"Markdown generated: ")

This comment has been minimized.

Copy link
@sadielbartholomew

sadielbartholomew Jul 13, 2019

Member

Instead of calling logger.info twice, you could use string formatting, unless calling separately is for ease of placing these on separate lines or something? (This comment also applies to lines 226-227 below).

"""
# TODO: remove this once the relationships are in the DB, and then automate
return [
["task_states", "*--?", "task_events"],

This comment has been minimized.

Copy link
@sadielbartholomew

sadielbartholomew Jul 13, 2019

Member

Even if these are temporary, using two constants like ZERO_TO_N_REL = *--? etc. might be more comprehensible for development & ensures consistency.

@kinow

This comment has been minimized.

Copy link
Member Author

commented Jul 24, 2019

Thanks for the review @sadielbartholomew ! Will go through the feedback slowly over the next days in-between working in issues in Cylc projects :)

@sadielbartholomew sadielbartholomew referenced this pull request Jul 27, 2019
5 of 7 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.