Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make wget automatically determine the file name. #536

Open
wants to merge 9 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@Luca0208
Copy link
Contributor

Luca0208 commented Apr 4, 2018

Sometimes when using wget it is pretty annoying having to write the file name again.
This PR makes that unnecessary by automatically determining the file name.
This is done by stripping all trailing slashes and afterwards taking everything after the last remaining slash.
This is either the domain name if used like this: https://domain.tld or the file name if used liked this: https://domain.tld/folder/file.ext

Luca0208 added some commits Apr 4, 2018

Giving a filename to wget is now optional.
wget will try to determine the file name by:
1. Stripping all trailing slashes
2. Take everything right of the last remaining slash, meaning either the full domain name or the file or foldername on the server.
@SquidDev

This comment has been minimized.

Copy link
Contributor

SquidDev commented Apr 4, 2018

Thank you! This is something I've been meaning to do for a while. Instead of using a loop, it might be a little cleaner to use gsub and match though:

-- Trims everything after ? or # (means foo#bar or foo?bar=qux reduce to foo), 
-- then trims trailing `/`
sUrl = sUrl:gsub( "[#?].*", "" ):gsub( "/+$", "" )

-- Find everything from the last / to the end of the string
return sUrl:match( "/([^/]+)$" ) or sUrl

Note I've only done some limited testing, so this may be more susceptible to breaking.

@Luca0208

This comment has been minimized.

Copy link
Contributor Author

Luca0208 commented Apr 4, 2018

foo#bar should be reduced to bar, however I'm not sure if foo?bar=s should be reduced to foo.
Sometimes these URL Parameters matter.

As for the pattern I tested them against a bunch of edge cases and they seem to work.
The or sUrl in the last line isn't even needed because the pattern will match the / of "http://" or "https://" which is required for a valid URL.

@dmarcuse

This comment has been minimized.

Copy link
Contributor

dmarcuse commented Apr 4, 2018

@Luca0208

This comment has been minimized.

Copy link
Contributor Author

Luca0208 commented Apr 4, 2018

That's a good point. I don't have access to CC in MC right now, but both CCEmuX and CCEmuRedux also emit ? in filenames, = seems to be allowed(However I'm on a mac right now, not sure how it behaves on windows)

@dmarcuse

This comment has been minimized.

Copy link
Contributor

dmarcuse commented Apr 4, 2018

On NTFS (the most commonly used Windows filesystem) it will fail.

@Luca0208

This comment has been minimized.

Copy link
Contributor Author

Luca0208 commented Apr 4, 2018

Ok it will now strip anchors and url parameters and I also applied the suggestion of SquidDev to use patterns and gsub/find

getFilename() needs to return something, because shell.resolve() is c…
…alled before checking if the URL is valid
@Luca0208
Copy link
Contributor Author

Luca0208 left a comment

Any opinions on this?


local function getFilename( sUrl )
sUrl = sUrl:gsub( "[#?].*" , "" ):gsub( "/+$" , "" )
return sUrl:match( "/([^/]+)$" ) or " "

This comment has been minimized.

@Luca0208

Luca0208 Apr 5, 2018

Author Contributor

I'm not sure if the or " " should be here or further down(See comment on line 52)

@@ -43,7 +48,7 @@ end

-- Determine file to download
local sUrl = tArgs[1]
local sFile = tArgs[2]
local sFile = tArgs[2] or getFilename(sUrl)
local sPath = shell.resolve( sFile )

This comment has been minimized.

@Luca0208

Luca0208 Apr 5, 2018

Author Contributor

This could be replaced by shell.resolve( sFile or " ")

@SquidDev

This comment has been minimized.

Copy link
Contributor

SquidDev commented Apr 5, 2018

On my machine (Windows), fs.exists(" ") always returns true, as it's parsed as a directory. Which means wget google.com will result in File already exists instead of URL malformed.

I think it would be better to move the http.checkURL call before the file name extraction. This way one doesn't need the or " " at all, which makes it substantially less ugly. It might produce a tiny bit of delay, as this may require a DNS lookup, but it should be imperceptible (<50ms).

@Luca0208

This comment has been minimized.

Copy link
Contributor Author

Luca0208 commented Apr 5, 2018

Ok in that case I will put the http.checkURL before the filename extraction.
Edit: Newest commit should fix this

@Luca0208

This comment has been minimized.

Copy link
Contributor Author

Luca0208 commented Apr 5, 2018

I just noticed that the help page of wget should be changed to reflect this change. Going to do that this evening.

--Check if the URL is valid
local ok, err = http.checkURL( sUrl )
if not ok then
if err then

This comment has been minimized.

@SquidDev

SquidDev Apr 5, 2018

Contributor

I wonder if it'd be worth doing printError( err or "Failed. ") instead. Though err shouldn't ever be nil, so neither is really needed.

This comment has been minimized.

@Luca0208

Luca0208 Apr 5, 2018

Author Contributor

If this is done it would probably be better to use "Invalid URL" or something like that instead of "Failed." to distinguish it from the "Failed." message caused when the download failed.
Edit: But you are right in any way the if is not needed. Just copied the checking function from the original

Luca0208 added some commits Apr 5, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.