composebox_typeahead: Avoid generating broken links. #30071

Open: wants to merge 1 commit into base: main

kuv2707
Collaborator

@kuv2707 kuv2707 commented May 13, 2024

The #**stream>topic** syntax generates broken links for topics containing two backticks or starting/ending with *, because of architectural flaws in the backend markdown processor. So we avoid generating the syntax for such topics and instead generate the normal link syntax in markdown.

Fixes #19873

CZO discussion

Screenshots and Screen Captures

[Image: 67C2DQnjcD]

[Image: chrome_yORpX1Hnrn]

Self-review checklist
  • Self-reviewed the changes for clarity and maintainability
    (variable names, code reuse, readability, etc.).

Communicate decisions, questions, and potential concerns.

  • Explains differences from previous plans (e.g., issue description).
  • Highlights technical choices and bugs encountered.
  • Calls out remaining decisions and concerns.
  • Automated tests verify logic where appropriate.

Individual commits are ready for review (see commit discipline).

  • Each commit is a coherent idea.
  • Commit message(s) explain reasoning and motivation for changes.

Completed manual review and testing of the following:

  • Visual appearance of the changes.
  • Responsiveness and internationalization.
  • Strings and tooltips.
  • End-to-end functionality of buttons, interactions and flows.
  • Corner cases, error conditions, and easily imagined bugs.

@zulipbot zulipbot added size: S area: markdown (mentions) Mentions for users/groups/stream/topic/time. bug rust community request Issues of interest to the Rust community labels May 13, 2024
Collaborator Author

@kuv2707 kuv2707 left a comment

I have made a preliminary implementation and highlighted the reasons behind some technical choices.
Once we decide the correct logic to detect faulty topic names, I'll refactor the functions and put them in a sensible place.

@@ -995,8 +997,29 @@ export function content_typeahead_selected(item, query, input_element, event) {
// Stream + topic mention typeahead; close the stream+topic mention syntax
// with the topic and the final **. Note that token.length can be 0
// if we are completing from `**streamname>`.
function will_produce_broken_link(topic) {
Collaborator Author

@kuv2707 kuv2707 May 13, 2024

Currently, I have considered only two cases which produced buggy links:

  • When there are two or more backticks
  • When the topic name begins or ends with asterisks.

There can still be topic names which cause broken links. I had earlier tried to parse the syntax text using the frontend markdown parser and see if the link in the parsed html points to the correct stream and topic. But double-backtick topic names gave different urls through frontend and backend markdown processors (in fact, the frontend parser correctly parsed the double-backtick topic name, while the backend processor generated an incorrect url).

This means that checking if the frontend markdown processor parses the syntax correctly gives no guarantee about the correctness of the link in the final message. So I thought the best option would be to check for the individual reported cases which produce broken urls.

I infer from CZO discussions that we will eventually move to a different markdown processor, and this will no longer be an issue then.
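
The two cases listed above can be sketched as a small predicate. This is a minimal illustration, not the code in this PR: the name `topic_may_break_link_syntax` is hypothetical, and the check covers only the two reported cases, so other problematic topic names may still slip through.

```typescript
// Hypothetical helper, not the PR's actual implementation.
// Flags topic names that match the two reported cases.
function topic_may_break_link_syntax(topic: string): boolean {
    // Case 1: two or more backticks anywhere in the topic name.
    const backtick_count = (topic.match(/`/g) || []).length;
    if (backtick_count >= 2) {
        return true;
    }
    // Case 2: the topic name begins or ends with an asterisk.
    return topic.startsWith("*") || topic.endsWith("*");
}

console.log(topic_may_break_link_syntax("plain topic")); // false
console.log(topic_may_break_link_syntax("a `code` span")); // true
console.log(topic_may_break_link_syntax("*asterisk")); // true
console.log(topic_may_break_link_syntax("a * in the middle")); // false
```

Checking explicit cases like this keeps the typeahead independent of how the frontend parser happens to render the syntax, which is the concern raised in the comment.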

web/src/composebox_typeahead.js (outdated)
@alya
Contributor

alya commented May 14, 2024

@timabbott Do you want to take a look and provide some feedback here?

@alya alya added the integration review Added by maintainers when a PR may be ready for integration. label May 14, 2024
beginning.slice(0, syntax_start_index) +
url_syntax(get_stream_name(beginning), item.topic) +
" ";
}
Sponsor Member

Can you extract this logic to a function and/or add unit tests for it? It's complicated enough that it's very much worth having a test for.

Collaborator Author

@kuv2707 kuv2707 May 18, 2024

I have extracted the logic and added some unit tests.

@kuv2707
Collaborator Author

kuv2707 commented May 20, 2024

@timabbott Addressed the suggestion. Please take a look.

);
assert.equal(
ct.stream_topic_link_syntax("#**stream>t", "*asterisk"),
"[#stream>*asterisk](#narrow/stream/-1-unknown/topic/*asterisk) ",
Sponsor Member

Why do all of these have -1 as the stream ID in the URLs? The first couple don't, which makes me think you've got some sort of weird bug in either one test case or the other.

Collaborator Author

Stream IDs are only required when we generate the Markdown link syntax, since we then have to build the URL ourselves.
That is why the first two cases didn't need them.
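
To illustrate why the fallback needs a stream ID, here is a rough sketch of building an ordinary Markdown link to a topic. Everything in it is an assumption made for illustration: the `stream_ids_by_name` table, the function name `fallback_topic_link`, the use of `encodeURIComponent` in place of the project's own URL encoder, and the `-1-unknown` slug, which simply mirrors the expected string in the test under review.

```typescript
// Hypothetical lookup table standing in for the app's stream data.
const stream_ids_by_name = new Map<string, number>([["Denmark", 1]]);

// Sketch of the fallback: a plain Markdown link to the topic's narrow URL.
function fallback_topic_link(stream_name: string, topic: string): string {
    const stream_id = stream_ids_by_name.get(stream_name);
    // Streams missing from the table get the "-1-unknown" slug.
    const stream_slug =
        stream_id === undefined
            ? "-1-unknown"
            : `${stream_id}-${encodeURIComponent(stream_name)}`;
    // Assumption: encodeURIComponent stands in for the real URL encoder.
    const topic_slug = encodeURIComponent(topic);
    return `[#${stream_name}>${topic}](#narrow/stream/${stream_slug}/topic/${topic_slug})`;
}

console.log(fallback_topic_link("Denmark", "lunch"));
// [#Denmark>lunch](#narrow/stream/1-Denmark/topic/lunch)
console.log(fallback_topic_link("stream", "*asterisk"));
// [#stream>*asterisk](#narrow/stream/-1-unknown/topic/*asterisk)
```

The `#**stream>topic**` form carries only names, so the server resolves the ID; the plain link form has to embed the ID itself, which is why an unregistered stream in a test fixture shows up as `-1`.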

@@ -1180,6 +1185,31 @@ export function content_typeahead_selected(
return beginning + rest;
}

function will_produce_broken_stream_topic_link(topic_name: string): boolean {
return /(\*+)|(.*`.*`.*)/.test(topic_name);
Collaborator Author

The links for topic names with double backticks are now generated correctly, but the double backticks cause the enclosed part of the topic name to render as an inline code block.
I don't know what to do in this case since backslash escaping is disabled.

Member

Yeah, I think there's no great answer for that at present. This is #18202.

As long as the link works correctly, though, that's a much less severe issue than #19873 is.

A possible workaround is to use HTML character references, as suggested here: #18202 (comment)
I'd definitely leave that as a separate later commit, though.
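
The character-reference workaround could look roughly like the sketch below. The function name and the exact set of escaped characters are assumptions for illustration, and whether the result renders as intended depends on how the backend processor treats HTML character references in link text.

```typescript
// Numeric character references for characters that Markdown treats as syntax.
const CHARACTER_REFERENCES: Record<string, string> = {
    "`": "&#96;",
    "*": "&#42;",
    "$": "&#36;",
};

// Hypothetical helper: replaces each special character with its reference.
function escape_markdown_syntax_characters(text: string): string {
    return text.replace(/[`*$]/g, (ch) => CHARACTER_REFERENCES[ch] ?? ch);
}

console.log(escape_markdown_syntax_characters("a `b` c")); // a &#96;b&#96; c
console.log(escape_markdown_syntax_characters("$$x$$")); // &#36;&#36;x&#36;&#36;
```

Escaping would apply only to the visible link text; the URL part of the link is encoded separately.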

@kuv2707 kuv2707 force-pushed the 19873_topic_name_fix branch 2 times, most recently from a3c5613 to 8c243b1 Compare May 21, 2024 12:16
@kuv2707 kuv2707 requested a review from timabbott May 21, 2024 15:38
Member

@gnprice gnprice left a comment

Thanks @kuv2707! This is an old issue people have repeatedly run into, and I'm happy to see progress being made on it.

web/src/composebox_typeahead.ts (outdated)
@kuv2707 kuv2707 force-pushed the 19873_topic_name_fix branch 2 times, most recently from 7704b04 to d15f3fd Compare June 4, 2024 12:32
@kuv2707
Collaborator Author

kuv2707 commented Jun 4, 2024

@alya Does this PR have to go through the buddy and mentor review stages? (Since it's already reviewed by maintainers once)

@alya
Contributor

alya commented Jun 5, 2024

I think it's OK to skip those at this stage. Has all the feedback above been addressed?

@kuv2707
Collaborator Author

kuv2707 commented Jun 6, 2024

@alya I have addressed all the feedback.

Changes:

  • Simplified the regex.
  • Included $$ in the regex.
  • Checking both the stream name and the topic name.
  • Using HTML character references to escape backticks, dollars etc.

I have updated the PR description with the screenshots.

I've initiated a discussion about some decisions regarding corner cases here

const end = syntax.lastIndexOf(">");
return syntax.slice(start + 3, end);
}
export function replace_markdown_characters(text: string): string {
Sponsor Member

@timabbott timabbott Jun 7, 2024

You should have a blank line between functions.

Also, usually it's better to add new functions above the one function that uses them.

I also think a better name would be something like html_escape_markdown_syntax_characters.

Collaborator Author

Sure, done

) {
return url_syntax(stream_name, topic_name);
}
return `#**${stream_name}>${topic_name}** `;
Sponsor Member

Why is there a trailing space here? Seems like it should be added by the caller if desired.

Collaborator Author

@kuv2707 kuv2707 Jun 8, 2024

I noticed that elsewhere in this module a trailing space is always added after inserting any kind of syntax, so I had included it in this function.
I have now moved the trailing space to the caller.
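
The split of responsibilities discussed in this thread might look like the following sketch. Both function names are hypothetical and the caller is heavily simplified; the point is only that the formatter returns bare syntax and the caller decides the spacing.

```typescript
// Hypothetical formatter: returns only the syntax, with no trailing space.
function stream_topic_syntax(stream_name: string, topic_name: string): string {
    return `#**${stream_name}>${topic_name}**`;
}

// Hypothetical caller: owns the spacing around the inserted syntax.
function insert_syntax(before: string, syntax: string, after: string): string {
    return before + syntax + " " + after;
}

console.log(insert_syntax("see ", stream_topic_syntax("design", "colors"), "for details"));
// see #**design>colors** for details
```

Keeping the space out of the formatter makes its return value easier to assert on in unit tests and lets it be reused where no trailing space is wanted.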

return invalid_stream_topic_regex.test(word);
}

function get_stream_name(syntax: string): string {
Sponsor Member

This function name is too generic; something like get_stream_name_from_topic_link_syntax would be more appropriate, though I've not checked exactly how it is used.

Collaborator Author

This name matches what it does.
Changed.

The #**stream>topic** syntax generates broken links for
topics containing two backticks or ending with *, because of
architectural flaws in the backend markdown processor.
So we avoid generating the syntax for such topics and instead
generate the normal link syntax in markdown.

Fixes zulip#19873
@kuv2707
Collaborator Author

kuv2707 commented Jun 8, 2024

@timabbott I have addressed the feedback.

@kuv2707 kuv2707 requested a review from timabbott June 8, 2024 08:17
Labels
area: markdown (mentions) Mentions for users/groups/stream/topic/time. bug integration review Added by maintainers when a PR may be ready for integration. rust community request Issues of interest to the Rust community size: XL
Development

Successfully merging this pull request may close these issues.

Fix linking to topic names containing multiple backticks with in-message markdown
5 participants