This repository has been archived by the owner on Feb 19, 2019. It is now read-only.

Download helper functions for common annoying sources #377

Open
Redsandro opened this issue Nov 18, 2013 · 9 comments

Comments

@Redsandro
Contributor

We need download helper functions for sites that often host the software we package.
Right now, every time one of those sites changes, we have to fix, push, and moderate a lot of packages, whereas fixing a shared helper would fix all of those packages at once.

  • CNET (...)
  • FossHub
  • SourceForge (already has a built-in helper)

@gep13 said:

This is something that other packages that come from CNET are going to run into. Does this warrant some sort of helper method in Chocolatey to do this work, or should it be left to package maintainers to look after?

@Redsandro said:

It's doable. But given that a Chocolatey update can take a few months to make it through, I don't think it's clever to have multiple packages depend on this when CNET changes things.

This needs some thinking, better open an issue on GitHub.

@ferventcoder said:

Well look for things to get faster in the coming months.


@ferventcoder I didn't mean that in a bad way. It's good that Chocolatey has some quality assurance before being pushed.

Maybe if we're gonna do this we should have the helper functions in a separate module that can be updated instantly, independent of Chocolatey, when someone pushes a fix.

Anyway I just wanted to put this out there. In case anyone wants to play with it.

I've extracted the regex code to a separate module here:
https://github.com/Redsandro/chocolatey/blob/master/partitionassistant/tools/Get-UrlFromCnet.ps1

The example regex (Get-FilenameFromRegex $url '(?ms).*href="(http://download.cnet.com/.+?)".*') works when there is only one CNET link on the page. Otherwise it has to be more detailed (e.g. add anchor text/title).
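
As a rough illustration of the single-link case described above, here is a minimal sketch. The function name `Get-FirstCnetLink`, the sample HTML, and the URL path are made up for this example; it uses the .NET `[regex]` class directly rather than `Get-FilenameFromRegex`, whose exact signature is not shown in this thread.

```powershell
# Hypothetical helper for illustration only; not part of Chocolatey.
function Get-FirstCnetLink {
    param([string]$Html)

    # Capture the first href that points at download.cnet.com.
    $m = [regex]::Match($Html, 'href="(http://download\.cnet\.com/.+?)"')
    if (-not $m.Success) { throw "No CNET link found" }
    $m.Groups[1].Value
}

# Made-up sample page with exactly one CNET link.
$page = '<a href="http://example.com/about">About</a> ' +
        '<a href="http://download.cnet.com/Some-App/">Download</a>'

$link = Get-FirstCnetLink $page
```

If the page contained several CNET links, this would silently return the first one, which is why a more specific pattern (anchor text or title) would be needed in that case.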

@przemoc

przemoc commented Nov 19, 2013

I have to add that while my example gist, linked in the comments about AOMEI Partition Assistant Standard, works, it was only a quick and dirty how-it-can-be-done thing. It may work as a first approach, but it's wrong in the long term, because regexes shouldn't be used to parse HTML; an HTML parser is needed for that.

The question is whether PowerShell has an HTML parser out of the box and, if not, which of the open source ones out there (I think there should be at least one, but I haven't googled yet) should be included in Chocolatey for the convenience of package creators.

@Redsandro
Contributor Author

Nothing wrong with it. I use regexes for other packages too. Just didn't think about using a bunch of them in a row.
Feel free to improve. Coming from Linux myself, I didn't know powershell either when I discovered Chocolatey.

Maybe your regexes are smarter, I just rewrote them because I don't get the sed syntax with all the commas. ❓

@gep13
Member

gep13 commented Nov 19, 2013

What is being described here sounds to me like a Chocolatey.Contrib project, which is where the helper module would live. This could then be a standalone package that people can take a dependency on.

Thoughts?

@Redsandro
Contributor Author

Same problem again. Fixing a broken CNET parser would still mean fixing all the packages. This should be internal, just like SourceForge is handled internally.

@Redsandro
Contributor Author

For reference:
https://github.com/Redsandro/chocolatey/blob/master/autopackages/PartitionAssistant/tools/Get-UrlFromCnet2.ps1

However, why does this return an array?

[0]    =>    True,
[1]    =>    "http://www.TheUrlThatIAmLookingFor"

@jberezanski
Contributor

In line 33, the -match operator returns a boolean value indicating whether the match succeeded or not. Because this value is neither assigned to a variable nor used in any other way, it becomes part of the function output.

To fix this, either:

  1. pipe the result to Out-Null:
    $html -match "(?ms)data-dl-url=['`"](.+?)['`" ]+data-nodlm-url" | Out-Null
  2. or, better, actually check if the match succeeded:
    if ($html -notmatch "(?ms)data-dl-url=['`"](.+?)['`" ]+data-nodlm-url") { throw "Match failed" }
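
The behaviour can be reproduced with a small self-contained sketch. The function names, the simplified pattern, and the sample HTML below are assumptions made for illustration; they are not taken from Get-UrlFromCnet2.ps1.

```powershell
# Made-up sample input for illustration.
$sample = '<a data-dl-url="http://example.com/setup.exe" data-nodlm-url="x">Download</a>'

function Get-UrlLeaky {
    param([string]$Html)
    # The boolean returned by -match is not captured, so it becomes output.
    $Html -match 'data-dl-url="(.+?)"'
    $Matches[1]
}

function Get-UrlChecked {
    param([string]$Html)
    # The if statement consumes the boolean and handles the failure case.
    if ($Html -notmatch 'data-dl-url="(.+?)"') { throw "Match failed" }
    $Matches[1]
}

# Wrap in @() so a single result is still treated as an array.
$leaky   = @(Get-UrlLeaky $sample)     # two items: the boolean, then the URL
$checked = @(Get-UrlChecked $sample)   # one item: the URL
```

Any expression in a PowerShell function whose value is not assigned, piped away, or otherwise consumed is written to the function's output, which is why the leaky version returns an array.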

@Redsandro
Contributor Author

Thanks! 👍

@Redsandro
Contributor Author

I still think helpers should have a certain format, and they should be packages that can be referenced as dependencies.

One change means one package to fix, and all packages that depend on it will work again.
