This repository has been archived by the owner on May 12, 2021. It is now read-only.

METRON-660 [Umbrella] up-to-date versioned documentation #429

Closed
wants to merge 3 commits into from

mattf-apache
Member

This has come out rather well, I think. The integration is still open to discussion. Currently it is in a stand-alone, versioned-with-the-code sub-directory and sub-project. The idea is that a release manager would build the site-book (following the instructions below), then copy it into a versioned subdirectory of the (unversioned) public site, to publish it along with each code release.

To build the book, do the following:
In any git clone of incubator-metron containing the site-book subdirectory,

cd site-book
bin/generate-md.sh
mvn site:site

It only takes a few seconds. You may now view your copy of the book in a browser by opening file:///your/path/to/incubator-metron/site-book/target/site/index.html. On a Mac, you can just run open target/site/index.html from the site-book directory.

Enjoy! For code review purposes, the key files under site-book/ are:

  • bin/generate-md.sh copies all .md files from the code directory tree into the site tree, performs some transformations on them, and generates the nav tree structure and labels.
  • bin/fix-md-dialect.awk is called by 'generate-md.sh'. It transforms the text of each file, converting the Github-MD dialect of markdown into the doxia-markdown dialect.
  • pom.xml and src/site/site.xml are doxia boilerplate, tweaked for our specific needs. Thanks to @mmiklavc for his help getting these right. Please don't squash the entire PR, as the second commit is to his credit.
  • The rest is either routine fixes to .md file syntax, or should be self-explanatory.
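The copy step in bin/generate-md.sh can be pictured with a short Python sketch. The README.md to index.md renaming is an inference from the build log later in this thread (./index.md, ./metron-analytics/index.md) and from metron-common/index.html being generated from its README.md. The function name plan_site_copies and the choice to skip hidden directories and site-book/ itself are assumptions, not taken from the actual script, which also rewrites the files and builds the nav tree (not shown here).

```python
import os

def plan_site_copies(code_root, site_root):
    """Map every .md file under code_root to its destination under site_root.

    Hypothetical sketch: README.md becomes index.md, so each directory's README
    turns into that directory's page in the site tree.
    """
    plan = {}
    for dirpath, dirnames, filenames in os.walk(code_root):
        # Prune hidden dirs (e.g. .git) and site-book itself; sorted for stable output.
        dirnames[:] = sorted(d for d in dirnames
                             if not d.startswith(".") and d != "site-book")
        rel_dir = os.path.relpath(dirpath, code_root)
        for name in sorted(filenames):
            if not name.endswith(".md"):
                continue
            dest_name = "index.md" if name == "README.md" else name
            plan[os.path.join(dirpath, name)] = os.path.normpath(
                os.path.join(site_root, rel_dir, dest_name))
    return plan
```

Returning a plan instead of copying immediately keeps the mapping easy to inspect or test before any files are touched.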

No .md files were harmed in the making of this PR! The goal was "If it works in Github-MD, don't edit the source .md file -- fix it in the re-writer."

The only thing hardwired is the handling of the few image files used; that can be removed when it becomes important. Some may object to the awk script; if you wish to translate it into python, I'll be more than happy to accept the PR :-)

Thanks,
--Matt

@justinleet
Contributor

This is a fantastic improvement to what we have, and even though I'm just starting to dig in, this looks really good. I'll add any comments I have as I go through it, but this is great.

See the License for the specific language governing permissions and
limitations under the License.
-->
<project name="Falcon" xmlns="http://maven.apache.org/DECORATION/1.3.0"
Contributor

Could you change the name from 'Falcon'?

Member Author

Oops! Missed that one. Thanks for pointing it out.

Member Author

Done.

@justinleet
Contributor

justinleet commented Jan 30, 2017

Bullet points get a little weird in some places (e.g., in the metron-maas-service README). It looks like, if we don't have a blank line before the first bullet, that bullet just gets included as part of the previous sentence. I found a few in the metron-maas-service README, but it wouldn't surprise me if more show up.

Same actually applies to code blocks.
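The symptom above suggests a rewrite rule: when a list item or code fence directly follows paragraph text, insert a blank line before it. A minimal Python sketch, assuming that is the only separation doxia-markdown needs; the helper name is hypothetical and this is not the actual fix-md-dialect logic. The fence pattern is written as `{3} so the snippet does not contain a literal triple backtick.

```python
import re

# A list item: optional indent, then a "*", "+", "-" or "1." marker followed by a blank.
LIST_ITEM = re.compile(r"^[ \t]*([*+-]|\d+\.)[ \t]+")
# Opening or closing code fence: three backticks, optionally indented.
FENCE = re.compile(r"^[ \t]*`{3}")

def ensure_blank_line_before_blocks(lines):
    """Insert a blank line wherever a list or code fence directly follows paragraph text.

    Hypothetical helper; lines inside fenced code are never changed.
    """
    out = []
    in_fence = False
    in_list = False  # inside a run of list items and their continuation lines
    for line in lines:
        is_fence = bool(FENCE.match(line))
        is_item = not in_fence and bool(LIST_ITEM.match(line))
        prev_is_text = bool(out) and out[-1].strip() != "" and not in_list
        if not in_fence and (is_fence or is_item) and prev_is_text:
            out.append("")  # separate the block from the paragraph above it
        out.append(line)
        if is_fence:
            in_fence = not in_fence
        if not in_fence:
            if is_item:
                in_list = True
            elif line.strip() == "":
                in_list = False
    return out
```

Tracking in_list keeps consecutive items of one list tight; only the first item after prose gets the extra blank line, and running the helper twice changes nothing further.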

@JonZeolla
Member

JonZeolla commented Jan 30, 2017

Awesome stuff, I'm taking a look now.

Related to @justinleet's comment, it looks like there's an issue with nesting ordered and unordered lists. An example is on the main index.html page: it looks like it's incorrectly ending the <ol> after "Efficient information storage" and starting a separate one before "An interface that gives a security...", where it should be a continuation of the first <ol>. Still looking at things...

Also, regarding the bullet point issue, there is an example on the main page under "Navigating the Architecture", under metron-parsers, and I'm sure in various other locations.

@mattf-apache
Member Author

@justinleet and @JonZeolla , the issues you are describing with bullets and codeblocks sound like things I worked hard to fix, and they work on my system. Please see new screenshot at https://issues.apache.org/jira/secure/attachment/12850028/METRON-660-screenshot2.png

I'm running the re-writer under Mac OS X with GNU Awk version 4.1.1. Would you both please tell me what system you're running and what awk --version gives you? Thanks.

@JonZeolla
Member

Mac 10.12.2, awk version 20070501

@mattf-apache
Member Author

@JonZeolla , interesting. That would be equivalent to gawk version 3.1.5 or 3.1.6. I see now that my updated awk is from a homebrew installation.

Would you be willing to update awk? If you already use homebrew, all you have to do is run brew install gawk.
If you don't yet use homebrew, that's a single-line install too.

@justinleet
Contributor

@mattf-horton Homebrew gawk seems to work out well for me. Not sure what the implementation difference is between the two though.

@mattf-apache
Member Author

Guys, to make it easier to distinguish between tool problems vs content bugs, I've uploaded a tarball of the full site-book as built on my platform, at site-book_0.3.0_20170130.tar.gz

Could you please compare problems you are seeing in your build of the book, vs what you see in my build? Thanks.

BTW, I'm not minimizing the tool problems, they need to be resolved. But we do need to know whether we are looking at a tool problem or a content bug! :-)

@justinleet
Contributor

@mattf-horton
Screenshots from your build vs mine of code formatting. I assume both are off the latest, because the title name is correct on both (not included in the pics). Both are in Chrome.
Yours (which looks good):
[screenshot: mattf_book]
Mine (which does not look good):
[screenshot: leet_book]

@justinleet
Contributor

justinleet commented Jan 30, 2017

@mattf-horton
Edit: The issue appears to be the use of the 3-argument form of match() (and I am not an awk expert by any measure, so I'll need double-checking on this).

gawk man page: match(s, r [, a]) ...
default awk man page: match(s, r) ...

Here is the obvious awk error in the output (it doesn't show up with gawk), which I should have noted, but apparently I just shut my brain off:

Fixing up markdown dialect problems between Github-MD and doxia-markdown:
./index.md
awk: syntax error at source line 147 in function fix_prefix_blanks source file /Users/jleet/Documents/workspace/incubator-metron/site-book/bin/fix-md-dialect.awk
 context is
    prefix_blanks = match($0, >>> /^[[:blank:]]*/, <<< a);
awk: illegal statement at source line 148 in function fix_prefix_blanks source file /Users/jleet/Documents/workspace/incubator-metron/site-book/bin/fix-md-dialect.awk
awk: illegal statement at source line 148 in function fix_prefix_blanks source file /Users/jleet/Documents/workspace/incubator-metron/site-book/bin/fix-md-dialect.awk
./metron-analytics/index.md
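The failing statement uses gawk's three-argument match(s, r, a), which stores the matched text in array a. POSIX awk (like the awk reporting version 20070501 above) only has match(s, r), which sets RSTART and RLENGTH, so a portable awk would read the prefix as substr($0, 1, RLENGTH). For the Python port, the same extraction is a one-liner with re. The function name below is hypothetical, and what fix_prefix_blanks then does with the prefix is not shown in this thread, so the sketch stops at splitting it off.

```python
import re

# POSIX [[:blank:]] is a space or a tab.
PREFIX_BLANKS = re.compile(r"[ \t]*")

def split_prefix_blanks(line):
    """Split a markdown line into (leading blanks, remainder).

    Stand-in for gawk's match($0, /^[[:blank:]]*/, a), where a[0] would hold
    the matched prefix. re.match is anchored at the start, like ^.
    """
    m = PREFIX_BLANKS.match(line)  # always matches, possibly the empty string
    return m.group(0), line[m.end():]
```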

@mattf-apache
Member Author

@justinleet , yah, that's typical of what I called "paragraph munching". The triple-backtick delimiter is being consumed as text, then interpreted as an empty double-backtick, followed by single backtick that starts a code-phrase, hence the ugly highlighting.

Was this with the homebrew awk? And can you confirm that

ls -l  `which awk`

points at ../Cellar/gawk/4.1.1/bin/awk ?

It's most likely an awk problem, because that's where the re-writing gets done.
The only other toolchain, afaik, is maven, the maven-site-plugin, and doxia-module-markdown.
I'm using maven 3.3.9 on Mac OS X 10.11.6. The latter two are specified in the pom.xml file at version 3.4 and 1.6, respectively.
Could you confirm your maven version, and which versions of maven-site-plugin and doxia-module-markdown you have under

  • ~/.m2/repository/org/apache/maven/plugins/maven-site-plugin/
  • ~/.m2/repository/org/apache/maven/doxia/doxia-module-markdown/

@mattf-apache
Member Author

@justinleet , hi, sorry our responses crossed. Looks like the old awk is rejecting the regex, probably the [:blank:] notation? So yah, need an updated awk. Is that an acceptable solution?

On my Centos7 test VM, awk --version gives 4.0.2. Probably will work, but I'll try it out.

@justinleet
Contributor

@mattf-horton I think we're dealing with different implementations of awk, rather than just a pure versioning issue. I'm guessing it works on that VM, because that version number looks like a Gawk number. I'm pretty sure it's not working on the default Mac, because it's a different implementation entirely.

At that point, we're saying that whoever is building has to install gawk if they're on a Mac (which a good portion of people are). Could the awk script be refactored relatively easily to not use gawk-specific functions/features? I simply don't have enough awk knowledge to know how easy or hard that would be.

@mattf-apache
Member Author

@justinleet, no, the advanced awk features are heavily used. Well, I guess this is a good reason to convert to a python script after all. I'll give it a whack.

if [[ "$OSTYPE" =~ "darwin" ]] ; then
# Macs need an argument after -i option
sed -i '' -e "${HREF_REWRITE_LIST[ $(( i + 1 )) ]}" "${HREF_REWRITE_LIST[$i]}"
else
Member

I would suggest handling this slightly more explicitly. My two thoughts are:

  1. Approach it similarly to here, using a case "${OSTYPE}" statement to match darwin* and linux*, and then a *) branch that alerts the user.
  2. Add an elif for either "${OSTYPE}" == linux* or "${OSTYPE}" =~ "linux", and then an else that alerts the user.
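Both suggestions come down to handling each known platform explicitly and failing loudly otherwise. The underlying difference is real: BSD sed on macOS expects a backup-suffix argument after -i (empty for no backup), while GNU sed on Linux accepts a bare -i. Since the rewriter later moved to Python, here is the same idea as a Python sketch; the function name sed_inplace_cmd is hypothetical, and it only builds the argument list rather than running sed.

```python
import sys

def sed_inplace_cmd(expression, path, platform=None):
    """Return an argv list that runs sed to edit `path` in place.

    BSD sed (macOS) needs a backup-suffix argument after -i, here an empty one;
    GNU sed (Linux) takes a bare -i. Anything else is reported instead of guessed.
    """
    platform = sys.platform if platform is None else platform
    if platform == "darwin":
        return ["sed", "-i", "", "-e", expression, path]
    if platform.startswith("linux"):
        return ["sed", "-i", "-e", expression, path]
    raise RuntimeError(f"unsupported platform for in-place sed: {platform!r}")
```

The result can be handed to subprocess.run(..., check=True). Passing the empty suffix as its own list element is what the quoted '' does in the bash version.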

Is the path of least resistance here just to provide a docker container with the right version of awk?

Member Author

@JonZeolla thanks for the example pointer. Fixed.

@james-sirota , yes maybe sorta. But that's work too, and the on-going nuisance for people trying to use it isn't adequately balanced for the effort. I'm almost done with the conversion to python.

@JonZeolla
Member

Ok, I'm going to hold off on any additional review until the move to a python script.

@mattf-apache
Member Author

Okay, @JonZeolla , @justinleet , and others, the awk script has been replaced by a python script. The results of the dialect fix-up are identical to the awk script's results across all 37 .md files. Thanks.

@mattf-apache
Member Author

BTW, the instructions for use are unchanged, since the fix-up script is invoked from within the bin/generate-md.sh script.

@justinleet
Contributor

@mattf-horton, thanks for taking the time/effort to migrate the script. Tried it out and it works great.

The only thing I'd like to see is the indentation in the Python script made consistent. I think spaces are more standard in Python, and mixing tabs and spaces can create issues because of the way Python handles tabs (searching around online suggests a tab is treated as 8 spaces).
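The tab concern can be checked directly. Python 3 rejects indentation that is ambiguous between tab and space interpretations at compile time with TabError (a subclass of IndentationError) instead of guessing. A small sketch; the helper name indentation_error is hypothetical.

```python
def indentation_error(source):
    """Compile `source` and return the name of the indentation error it raises, or None."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        return type(exc).__name__
    return None

# First body line indented with a tab, second with eight spaces. They line up
# under an 8-column tab stop, but Python 3 refuses to guess and raises TabError.
mixed = "if True:\n\tx = 1\n        y = 2\n"
consistent = "if True:\n    x = 1\n    y = 2\n"

print(indentation_error(mixed))       # TabError
print(indentation_error(consistent))  # None
```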

@mattf-apache
Member Author

@justinleet thanks for the catch. Fixed.

@justinleet
Contributor

+1, this is great.

@JonZeolla
Member

Great job with the migration to python, things look much much better now.

I did notice that the h1 named anchors don't exist on incubator-metron/site-book/target/site/metron-platform/metron-common/index.html (Stellar_Language, Global_Configuration, Management_Utility) but these appear to be working in the source README.md. Other than that, everything looks good to me.

@mattf-apache
Member Author

Hi @JonZeolla , you have good eyes! I missed those cases, which are only visible where H1 headers have been used in the body of the document, AND there are TOC or other internal links to those headings.

Eeesh, that's an unfortunate deficiency in the doxia-markdown plugin. Apparently it does not generate named anchors for the H1 headers, perhaps under the assumption there will only be one at the beginning of the doc? At any rate, I've opened two new jiras:

  • METRON-697 [site-book] doxia-markdown plugin doesn't generate named anchors for H1 headings
  • METRON-698 [site-book] Add link checker as a "unit test" for site-book build

But I request that we allow these to be addressed as follow-on tasks, so we can get the site-book as it is into the public's hands. Thanks.
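For a rough idea of what the METRON-698 check could look like, here is a minimal sketch using only Python's standard html.parser. It collects anchor targets (id attributes and <a name=...>) in one generated page and reports same-page #fragment links with no matching target, which is the H1 symptom described above. Class and function names are hypothetical; cross-page links and URL-encoded fragments are not handled.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect anchor targets (id= and <a name=>) and same-page links (href="#...")."""

    def __init__(self):
        super().__init__()
        self.targets = set()
        self.fragment_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes map to None
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.targets.add(attrs["name"])
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("#") and len(href) > 1:
            self.fragment_links.append(href[1:])

def missing_anchors(html_text):
    """Return same-page fragment links that have no matching anchor, in document order."""
    parser = AnchorCollector()
    parser.feed(html_text)
    parser.close()
    return [frag for frag in parser.fragment_links if frag not in parser.targets]
```

Run over each index.html under target/site, a non-empty result could fail the build, which is the "unit test" role the jira describes.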

@mattf-apache
Member Author

@JonZeolla you pushed my perfectionist button :-) This latest push fixes the problem with anchors on H1 headers. The generated html is unchanged except for the insertion of anchor lines next to H1 header lines. The visuals appear the same.
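The fix described here, inserting an anchor line next to each H1 header before doxia renders the page, could look roughly like the following Python sketch. The <a name=...></a> form, the underscore naming rule (inferred from anchors such as Stellar_Language and Global_Configuration mentioned above), and the names add_h1_anchors and anchor_name are assumptions; the commit's exact anchor format is not shown in this thread. Lines inside code fences are skipped so that a shell comment like "# build" is not mistaken for a heading.

```python
import re

# ATX-style H1 with an optional closing run of '#' separated by whitespace.
H1 = re.compile(r"^# +(.+?)(?:\s+#+)?\s*$")
# Opening or closing code fence: three backticks, optionally indented.
FENCE = re.compile(r"^[ \t]*`{3}")

def anchor_name(title):
    # Assumed naming rule: keep the heading text, replace whitespace runs with "_".
    return re.sub(r"\s+", "_", title.strip())

def add_h1_anchors(lines):
    """Insert an <a name="..."></a> line before each H1 heading outside code fences."""
    out = []
    in_fence = False
    for line in lines:
        if FENCE.match(line):
            in_fence = not in_fence
        elif not in_fence:
            m = H1.match(line)
            if m:
                out.append('<a name="%s"></a>' % anchor_name(m.group(1)))
        out.append(line)
    return out
```

Because only new lines are added and the heading lines themselves are untouched, the rendered headings should look the same, consistent with the "visuals appear the same" observation.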

@mattf-apache
Member Author

@cestella , as Release Manager you can choose whether to accept this last commit (if @JonZeolla has time to review it) or just roll in the previous change set. I'm ok either way.

@JonZeolla
Member

+1 looks good to me @mattf-horton.

Reviewed the latest commit and ran it up/did some brief validation.

@cestella
Member

cestella commented Feb 6, 2017

I'm +1 as well, merge master in @mattf-horton and I'll commit. Great work; I've been lurking on this one and like what I saw. 🥇

…rite script to conform Github-MD source files to doxia-markdown. Also misc fixes to markdown files.
@mattf-apache
Member Author

Thank you all for your efforts reviewing this.

@cestella , I have rebased to master and re-validated. The only conflict was a trivial edit in one doc file.

Per discussions in the mailing lists, I also squashed most of my commits into a single one, keeping only those necessary to preserve the attribution of @mmiklavc's assist. Hence a total of three commits.

@asfgit asfgit closed this in e4d54a2 Feb 6, 2017
6 participants