Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support context-aware includes for images and anchors #826

Merged
merged 5 commits into from
Apr 18, 2019

Conversation

sijie123
Copy link
Contributor

@sijie123 sijie123 commented Apr 9, 2019

• [x] Enhancement to an existing feature

Fixes #822

What is the rationale for this request?
After our parser includes a file, it's context is discarded.
This means that in subsequent passes of parser, we do not know where
the content has been included from, and included content is treated
uniformly as if they were originally part of the file.

This means that content in an included file must be valid wherever
it is included from. For example, resource references need to be
absolute paths, so that they remain valid wherever they are included
from.

This makes it impossible to support relative src references written using Markdown, as markdown code is only parsed after all the includes have been made.

What changes did you make? (Give an overview)
To support context-based references, i.e. the included segments
retain awareness of their original include locations, let's add
a data-included-from meta whenever we include files. To maintain
the existing behaviour and to prevent leaking file structure data,
this meta tag is removed before we finish generating the final
website.

Breaking changes
Previously, some relative URLs are allowed. The reference was always made against files which initiate the include.
Now, any reference is always made against files using the relative URLs.
This should make it more intuitive for authors to reuse content, but may break existing code especially if they are generated using nunjucks/macros.

Provide some example code that this change will affect:
Given this directory structure:

/
├ userGuide/
├-- subFolder/
├---- content.md
├-- index.md
├-- tutorial.md

Assume that we have a tutorial file that simply collates a list of content:

<include src="subFolder/content.md" />

And assuming we have the existing content file:

Topic One
Please refer to [our user guide here](index.html)

The existing content file is correct because the generated tutorial.html file will have a correct relative reference to index.html, the homepage of the user guide.
However, after this PR, the existing content file is incorrect. index.html will refer to an index file within subFolder. All relative references to resources and links are now with respect to the file that writes these references, so in this case, they are made with respect to content.md.
content.md should be updated as

Topic One
Please refer to [our user guide here](../index.html)

Is there anything you'd like reviewers to focus on?
As this PR is significantly breaking and likely to be buggy, please help me test as much as possible and point out any bugs so that I can correct them ASAP.
Sites confirmed working:
[x] Our system test site
[x] CS2103T website
[x] TE3201 website

List of known issues:
All known issues to date are resolved.

Progress:
[x] Finish testing against all existing websites and ensure that there are no bugs
[x] Update documentation to reflect change against relative URL references
[x] Determine reason behind unwrapping div bug. Update: bug is due to difference between span and div used for inline and non-inline includes. Bug squashed.
[x] Determine why tests aren't correct - and whether to update tests or to find bug. Update: bug squashed.

Testing instructions:
Please check out this PR and build it against existing, known good sites and verify that the sites are working as expected.

@sijie123
Copy link
Contributor Author

sijie123 commented Apr 9, 2019

@Chng-Zhi-Xuan there were some issues with the commit previously - i've updated and rebased. Please pull the changes again. Thanks!

@Chng-Zhi-Xuan
Copy link
Contributor

I will be testing in-depth more tomorrow due to time constraints.

Not immediately after requesting for review from myself :P

@acjh
Copy link
Contributor

acjh commented Apr 9, 2019

Do test with boilerplate files.

@yamgent
Copy link
Member

yamgent commented Apr 9, 2019

A brief comment from a quick glance... commit 11bb1e8 seems to be doing two different things at the same time (adding test + pushing the parsing of img and anchors to a later stage), is it not possible to split it up?

@sijie123
Copy link
Contributor Author

sijie123 commented Apr 10, 2019

A brief comment from a quick glance... commit 11bb1e8 seems to be doing two different things at the same time (adding test + pushing the parsing of img and anchors to a later stage), is it not possible to split it up?

Oh no. One of the commits disappeared / got squashed together... I'll fix that up in a bit. Thanks for the heads up.

Edit: It appears okay now.

@Chng-Zhi-Xuan
Copy link
Contributor

So I've updated the test_site files to pass the tests.

Then I've notice the image links are broken for the 2 lines are are replaced here, in testReuse.md:
unexpected-behaviour

So the src is wrong and there is a href attached to those links.

Here is the line within testReuse.md. Can you take a look into this weird bug?

<pic src="images/I'm not allowed to use my favorite tool.png"></pic>
<pic src="{{imgFolder}}/I'm not allowed to use my favorite tool.png"></pic>

@sijie123 sijie123 force-pushed the opRelativeSrc branch 3 times, most recently from 3313e79 to 3635062 Compare April 11, 2019 07:17
@sijie123
Copy link
Contributor Author

Thanks @yamgent. That was dangerous - I had thought that the relative URLs in the testReuse.include.html was something expected from the new behaviour.
Lesson learnt: Always trust your tests :P

I've fixed the oversight. Now, included files (via panels) should be processed correctly too.

@sijie123 sijie123 requested a review from yamgent April 12, 2019 12:31
Copy link
Member

@yamgent yamgent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit "Parser: Support context-aware includes"

This means that content in an included file must be valid wherever
it is included from. For example, resource references need to be
absolute paths, so that they remain valid wherever they are included
from.

Um, the England here doesn't make sense? :P Are you trying to say that
paths need to be valid regardless of wherever it is included from?

src/lib/markbind/src/parser.js Outdated Show resolved Hide resolved
test/unit/parser.test.js Show resolved Hide resolved
@sijie123
Copy link
Contributor Author

Commit "Parser: Support context-aware includes"

This means that content in an included file must be valid wherever
it is included from. For example, resource references need to be
absolute paths, so that they remain valid wherever they are included
from.

Um, the England here doesn't make sense? :P Are you trying to say that
paths need to be valid regardless of wherever it is included from?

I was trying to say something like if we have 2 files from different places, e.g.
/a/index.md and /index.md both includes /a/sub-folder/b.md, any reference in b.md needs to remain valid whether it ends up in /a/index.md or /index.md.
For example, if b.md refers to a css file located in /a/file.css, it cannot refer to it as file.css, because this reference is invalid from the perspective of /index.md.

I've updated the commit message and fixed the {{hostBaseUrl}} bug. I still don't really like my commit message, but I can't find a clearer way to express the idea above^

@yamgent
Copy link
Member

yamgent commented Apr 15, 2019

I put in the example that you gave, and also invented the word "host file", see if that makes it clearer. The first two paragraphs are modified.

After our parser includes a file, its context is discarded.
This means that in subsequent passes of parser, we do not know where
the content has been included from, and included content is treated
uniformly as if they were originally part of the host file.

The path of the references in the included file may not resolve
correctly if it belongs to a different directory from the host file.
For example, if /a/b/c.md refers to a file in '../file.css', and
/index.md is the host file that includes /a/b/c.md, then /index.md
should resolve this path to '/a/b/file.css'. It cannot retain the
original path '../file.css', as that will lead to the wrong css file.

To support context-aware references, i.e. the included segments
retain awareness of their original include locations, let's add
a data-included-from meta whenever we include files. To maintain
the existing behaviour and to prevent leaking file structure data,
this meta tag is removed before we finish generating the final
website.

@Chng-Zhi-Xuan
Copy link
Contributor

Chng-Zhi-Xuan commented Apr 15, 2019

I just pulled the latest branch. Tested and found this:

Both Pic and Img tags test have a stay C:/ within the src, which results in a broken link
bug

@sijie123
Copy link
Contributor Author

I just pulled the latest branch. Tested and found this:

Both Pic and Img tags test have a stay C:/ within the src, which results in a broken link
bug

Confirmed. I'll just have to add a couple more ensurePosix then :(
On a side note, we really have to get our AppVeyor up and running :P

@sijie123 sijie123 force-pushed the opRelativeSrc branch 2 times, most recently from 6f20e97 to c9213fb Compare April 15, 2019 12:47
@sijie123
Copy link
Contributor Author

I put in the example that you gave, and also invented the word "host file", see if that makes it clearer. The first two paragraphs are modified.

After our parser includes a file, its context is discarded.
This means that in subsequent passes of parser, we do not know where
the content has been included from, and included content is treated
uniformly as if they were originally part of the host file.

The path of the references in the included file may not resolve
correctly if it belongs to a different directory from the host file.
For example, if /a/b/c.md refers to a file in '../file.css', and
/index.md is the host file that includes /a/b/c.md, then /index.md
should resolve this path to '/a/b/file.css'. It cannot retain the
original path '../file.css', as that will lead to the wrong css file.

To support context-aware references, i.e. the included segments
retain awareness of their original include locations, let's add
a data-included-from meta whenever we include files. To maintain
the existing behaviour and to prevent leaking file structure data,
this meta tag is removed before we finish generating the final
website.

Yes! The use of "host file" made it much clearer. Thank you so much - I will adopt this commit message :D

I've also attempted a fix for the Windows bug @Chng-Zhi-Xuan by forcing it to use posix notation. I've tested it on a Windows VM and it looks fine now.

@sijie123 sijie123 requested a review from yamgent April 15, 2019 12:50
yamgent
yamgent previously approved these changes Apr 15, 2019
@yamgent yamgent added this to the v2.2.1 milestone Apr 15, 2019
@sijie123 sijie123 requested a review from yamgent April 15, 2019 17:14
@sijie123 sijie123 dismissed yamgent’s stale review April 15, 2019 17:14

Updated code base

@sijie123
Copy link
Contributor Author

sijie123 commented Apr 15, 2019

Thanks for reviewing @yamgent. Sorry I had to dismiss your review because I found an issue with authoringContents.md (and some CS2103T pages) that happened as a result of inline includes.
When we do inline includes, we use a span instead of a div. However, my implementation "broke" inline includes because I always use a div. I have now added checks: if it's an inline include, use a span; otherwise, use a div.
(The reason why I don't use span all the time is because somehow spans introduce extra <p></p> that cause the formatting to go wrong)

I've taken a quick glance at the resulting CS2103T and TE3201 websites, they look more fine now :P

I'll still need to update the documentation to reflect the change in our includes. Will do that tomorrow.

@Chng-Zhi-Xuan
Copy link
Contributor

Chng-Zhi-Xuan commented Apr 16, 2019

Hi @sijie123 Thanks for updating the PR, and doing additional tests.

It will be good to add a Pending Item or To-Do list either in the PR description or latest comment to inform reviewers that there are still items being added to the PR.

Just a minor QoL nit :P

@sijie123 sijie123 changed the title Support context-aware includes for images and anchors [WIP] Support context-aware includes for images and anchors Apr 16, 2019
@sijie123 sijie123 force-pushed the opRelativeSrc branch 2 times, most recently from 9597ebc to f108f1c Compare April 16, 2019 05:48
@sijie123 sijie123 changed the title [WIP] Support context-aware includes for images and anchors Support context-aware includes for images and anchors Apr 16, 2019
@sijie123
Copy link
Contributor Author

sijie123 commented Apr 16, 2019

Finally! I think I'm done with everything I set out to do...
I'll do one last test on a Windows VM, but in the meantime it's up for review. (Yup, looks good on a Windows VM)
Thanks @yamgent @Chng-Zhi-Xuan for your help!

Copy link
Contributor

@Chng-Zhi-Xuan Chng-Zhi-Xuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with panels and nested panels and seems fine so far 👍

Just a couple of comments, feel free to give your perspective if needed.

src/lib/markbind/src/parser.js Outdated Show resolved Hide resolved
docs/userGuide/syntax/links.mbdf Show resolved Hide resolved
After our parser includes a file, its context is discarded.
This means that in subsequent passes of parser, we do not know where
the content has been included from, and included content is treated
uniformly as if they were originally part of the host file.

The path of the references in the included file may not resolve
correctly if it belongs to a different directory from the host file.
For example, if /a/b/c.md refers to a file in '../file.css', and
/index.md is the host file that includes /a/b/c.md, then /index.md
should resolve this path to '/a/b/file.css'. It cannot retain the
original path '../file.css', as that will lead to the wrong css file.

To support context-aware references, i.e. the included segments
retain awareness of their original include locations, let's add
a data-included-from meta whenever we include files. To maintain
the existing behaviour and to prevent leaking file structure data,
this meta tag is removed before we finish generating the final
website.
MarkBind does not support relative references if they are written
using Markdown syntax. The existing MarkBind parser only processes
relative URLs once during the preprocess stage, during which
Markdown code has not yet been transcribed into html.

With context-based includes available in a previous commit, we can
now store the context from which files are included from. We can
then use this information to derive the absolute path of a resource
referenced by an included file.

Let's rewrite the support for relative URLs by processing all
such relative URLs after the site is generated. This way, we can
convert relative URLs in Markdown code to absolute URLs too.

With this new method of processing relative URLs, let's also remove
the old implementation.
As parser now supports context-aware includes, we need to update
our unit tests for parser to check that our data-included-from
tags are correct.

Let's update our unit tests in parser.test.js.
With context-aware includes, all relative URLs should now be with
respect to the file that contains the relative URLs, instead of
being with respect to the files that are expected to include them.

FilterTags needs to be updated so that the reference to Hiding
Tags is with respect to its location in userGuide/plugins/.
Our user guide does not mention how users should use relative URL
references. Now that parser fully supports relative URL references
for images and links, let's update our user guide to document
MarkBind's behavior towards relative references.

In particular, the intra-site links section has been updated with
an important note on how users should specify relative references,
while images and pictures section have been updated with a note
teaching users that they can now use relative references.
@sijie123
Copy link
Contributor Author

Updated with @Chng-Zhi-Xuan's comments. Thanks :))

Copy link
Contributor

@Chng-Zhi-Xuan Chng-Zhi-Xuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 , don't notice any other glaring issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Markdown syntax for images doesn't work without {{ baseUrl }}
4 participants