Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

markdown: detect valid url #113

Closed
wants to merge 11 commits into from
Closed

Conversation

laysauchoa
Copy link
Collaborator

@laysauchoa laysauchoa commented Sep 26, 2020

Steps:

  • [https://ahoi.io](https://ahoi.io)
  • [fun](../fun.html)
  • [struct Foo](super::Foo)

Signed-off-by: Laysa Uchoa laysa.uchoa@gmail.com

What does this PR accomplish?

  • 🩹 Bug Fix

Closes # #44

Changes proposed by this PR:

detect valid urls in [] so spell checkings can be ignored in that span https://ahoi.io

📜 Checklist

  • Works on the ./demo sub directory
  • Test coverage is excellent and passes
  • Documentation is thorough

in case of valid urls, spell checking
can ignore in case of valid ones.

Signed-off-by: Laysa Uchoa <laysa.uchoa@gmail.com>
@laysauchoa laysauchoa marked this pull request as draft September 27, 2020 10:57
@@ -110,6 +111,7 @@ struct Args {
flag_skip_readme: bool,
flag_code: u8,
flag_stdout: bool,
flag_reachable_link: bool,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @drahnr
silly question, where can I pick up the result of what is passed in the flag? Can I pick up in the markdown file?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate where you need what information? I don't quite understand what you are asking for.

The Args struct contains all the commandline parameters. From there on you have to thread 🧵 / stitch 🪡 it through to the place you need it. If there are already other arguments passed through, consider adding a per component argument struct.

I hope that covers it :) if not, please elaborate on you question ❓

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually do not know how to or where should I save the boolean value of this flag from main to markdown file.
Should I do something like this? or how can I do: save the boolean value of the flag so I can do actions in markdown about whether the links should be checked for Get request or not?

lazy_static! {
    static ref USERS: HashMap<&'static str, Vec<&'static str>> = {
        let mut map = HashMap::new();
        map.insert("Loe", vec!["user", "user"]);
        map
    };
}

fn show_user(name: &str) {
    let access = USERS.get(name);
    println!("{}: {:?}", name, access);
}

Thank you in advance

Copy link
Owner

@drahnr drahnr Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

markdown file

I assume you mean propagate the bool from main.rs / fn main to markdown.rs.

Just looked into the code now, probably should have done that earlier.

Soo, this is indeed a bit iffy. There are three options (see detailed discussion underneath):

  1. fiddle the bool through to erase_cmark
  2. make the whole configuration a global singleton
  3. make the link destination check a separate, indepedent checker that implements trait Checker

I would recommend option 3., since markdown parsing is pretty cheap compared to the http lookup (even with smid yet to be enabled), and also compared to the dictionary lookup (see #57 for the flamegraph)

Option 1. is feasible, but would mix concerns rather unpleasantly - a mashup of a feature and a parsing layer - not so good and entangles the code quite a bit.

Option 2. is something for a server application / service mostly done for ad-hoc re-configuration, which would work too, but is somewhat special and makes testing harder, so I think the trade of is simply not given here.

Very good question!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not really follow how adding the Checker trait will help me to have the desired flag in markdown file.

This is an example of the config I found in the Hunspell checker like here:
https://github.com/drahnr/cargo-spellcheck/blob/master/src/checker/hunspell.rs#L118

{ lang: Some("en_US"), search_dirs: SearchDirs(Some(["/home/tmhdev/Documents/cargo-spellcheck/.config", "/usr/share/hunspell"])), extra_dictionaries: Some(["/home/tmhdev/Documents/cargo-spellcheck/.config/lingo.dic"]), quirks: Some(Quirks { transform_regex: Some([WrappedRegex(^'([^\s])'$), WrappedRegex(^[0-9]+x$), WrappedRegex(^\#[0-9]+$), WrappedRegex(^[0-9]+$), WrappedRegex(^.+\+$)]), allow_concatenation: Some(true), allow_dashes: None }) }

Also, I am confused if you meant that this flag reachable_link should not be a boolean but rather be passed to:
--checkers=<checkers> Calculate the intersection between

I am confused about this step. Probably because I lack to understand how other parts of the project work.

Copy link
Owner

@drahnr drahnr Oct 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not get notifed of this comment 🤨 maybe one of the plenty gh outages swallowed that notification, maybe I screwed up.

The issue here is, that currently the markdown is a layer in the general processing of input - which as of today - has no knowledge of any configuration, it's in it's core a pseudo-stream based pipeline with origin tracking.

Now if you would implement this with the trait Checker you would be called straight from main like Hunspell and LanguageTool (and soon Reflow) are, each with their defined params. So if you move away from being just a knot or layer in the input processing, but actually a consumer of the output of those layers, you can operate on the "raw" chunks (with markdown included) and do the checks right there.

Hopefully I can find a few minutes to generate some documentation images for #54 to clarify the whole flow a bit.

Copy link
Owner

@drahnr drahnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minuscule nitpicks only 👍

src/documentation/markdown.rs Outdated Show resolved Hide resolved
src/documentation/markdown.rs Outdated Show resolved Hide resolved
src/documentation/markdown.rs Outdated Show resolved Hide resolved
src/documentation/markdown.rs Show resolved Hide resolved
src/main.rs Show resolved Hide resolved
@drahnr drahnr mentioned this pull request Sep 28, 2020
@drahnr
Copy link
Owner

drahnr commented Oct 9, 2020

I'd like to release a 1.0.0 rather soon~ish (end of October~ish), and I'd love to include this feature with that :) Do you think you that is enough time to complete it until then? Is there anything you need help with?

Signed-off-by: Laysa Uchoa <laysa.uchoa@gmail.com>
@laysauchoa
Copy link
Collaborator Author

Unfortunately, closing it as I did not figure out how to bring flags to other files needed for this PR.

@laysauchoa laysauchoa closed this Oct 16, 2020
@drahnr
Copy link
Owner

drahnr commented Oct 16, 2020

You are so close, it's just one last refactor with the trait Checker to make it all work ( 🤞 ) but if you feel this is becoming a chore, I can take over from here. Whatever you prefer.

@drahnr drahnr reopened this Oct 16, 2020
@drahnr
Copy link
Owner

drahnr commented Oct 18, 2020

@laysauchoa I went forward and added the integration with the configuration file according to Option 1) described earlier - it turned out less messy than anticipated.
Do you want to tackle the rest or should I take it over in its enterty?

@drahnr drahnr added enhancement 🦚 New feature or request hacktoberfest labels Oct 18, 2020
@laysauchoa
Copy link
Collaborator Author

laysauchoa commented Oct 18, 2020

@laysauchoa I went forward and added the integration with the configuration file according to Option 1) described earlier - it turned out less messy than anticipated.
Do you want to tackle the rest or should I take it over in its enterty?

thanks, @drahnr I had no idea how to do this. I understood that the flag will be link_check. I will try to continue from here. Thanks a lot!

@laysauchoa laysauchoa marked this pull request as ready for review October 26, 2020 16:34
@laysauchoa
Copy link
Collaborator Author

@laysauchoa I went forward and added the integration with the configuration file according to Option 1) described earlier - it turned out less messy than anticipated.
Do you want to tackle the rest or should I take it over in its enterty?

Would it be possible to update the readme how to use the integration for the flag added it? I put a place holder because I am still missing the value for config.

@@ -61,6 +62,14 @@ impl<'a> PlainOverlay<'a> {
fn valid_url(url: &str) -> bool {
url::Url::parse(url).is_ok()
}
fn reachable_link(url: &str) -> bool {
use reqwest;
let response = reqwest::blocking::get(url).unwrap();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want this to remain blocking, which I think makes sense, we should think about parallelizing work per document eventuall (separate PR).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also track which url we already checked, we don't want to spam a server with requests and on the other side cause the user to wait for an extended time for not much benefit.

@drahnr
Copy link
Owner

drahnr commented Oct 26, 2020

@laysauchoa I went forward and added the integration with the configuration file according to Option 1) described earlier - it turned out less messy than anticipated.
Do you want to tackle the rest or should I take it over in its enterty?

Would it be possible to update the readme how to use the integration for the flag added it? I put a place holder because I am still missing the value for config.

I think in this case you either need to add &self to fn extract_plain_with_mapping (here~ish

fn extract_plain_with_mapping(cmark: &str) -> (String, IndexMap<Range, Range>) {
) as argument, or feed out the resulting unreachable websites via a suggestion variant, and then, later on filtering that. Does that make sense?

@drahnr drahnr changed the title [wip] markdown: detect valid url markdown: detect valid url Nov 3, 2020
@drahnr
Copy link
Owner

drahnr commented Nov 14, 2020

@laysauchoa I merged all the giant reflow stuff so I'll go ahead and rebase this.

Could you outline which work items are still open? Is anything blocking you right now?

@drahnr
Copy link
Owner

drahnr commented Dec 2, 2020

We could take some code from lychee, which does all the link checking pretty nicely already :)

@drahnr drahnr mentioned this pull request Dec 15, 2020
3 tasks
@drahnr drahnr mentioned this pull request Apr 27, 2021
3 tasks
@drahnr
Copy link
Owner

drahnr commented Jul 30, 2021

It's been a while :) I think we should come to an end and attempt to push it over the finish line, or define the topic to be out of scope for cargo-spellcheck unless somebody else wants to pick it up.

@drahnr drahnr added the stale 😐 PRs and issues that need some additional work to make it across the finishline label Aug 18, 2021
@drahnr
Copy link
Owner

drahnr commented May 4, 2022

Unfortunately closing, would love to see this eventually

@drahnr drahnr closed this May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement 🦚 New feature or request stale 😐 PRs and issues that need some additional work to make it across the finishline
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants