Allow Image URLs without .JPG/.PNG Suffixes #51

Closed
CavalloScuro opened this issue Feb 23, 2021 · 9 comments

@CavalloScuro

I have been searching for well over a week for an application that does precisely what this wonderful application can do. However, I ran into one insurmountable obstacle: the website from which I am currently attempting to batch download images (of historical newspapers, journals, magazines, etc.) obscures the absolute path to the images in question. Thus, my image URLs conform to the following format:

http://digitale.bnc.roma.sbn.it/tecadigitale/img/giornale/TO00185283/1880/unico/00000001/original

It would be super if the developers of this application could allow for image URLs such as this one that do not feature a .jpg/.png file extension. As it currently stands, my URLs simply error out. I have tried to use other batch downloaders, and they successfully download these images, but none of them allow for filenames or folder names, which is what makes this application highly attractive.

Everything else about this application is incredible, and I commend the developers for their very good work.

Thank you for your consideration.

@btargac
Owner

btargac commented Feb 25, 2021

Hi @CavalloScuro, thanks for reaching out, trying the app, and giving feedback about possible improvements.

In your case, URLs without an extension pose a problem when saving to the filesystem (which filename and extension should be used?), but I think it can be handled by parsing the MIME type from the server's response headers.
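A minimal sketch of that idea, in Python purely for illustration (it is not the app's code): take the `Content-Type` value from the response and map it to an extension. The `MIME_TO_EXT` table and the `extension_from_content_type` helper are hypothetical names, and the list of types is an assumption about what an image server might return.

```python
# Hypothetical lookup table; extend it with the types the server really sends.
MIME_TO_EXT = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
    "image/avif": ".avif",
}


def extension_from_content_type(content_type, default=""):
    """Return a file extension for a Content-Type header value."""
    # Drop parameters such as "; charset=binary" and normalise the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Unknown types fall back to `default` instead of guessing.
    return MIME_TO_EXT.get(media_type, default)


print(extension_from_content_type("image/jpeg"))                 # .jpg
print(extension_from_content_type("Image/PNG; charset=binary"))  # .png
```

An explicit table keeps the result predictable; an unknown type such as `text/html` returns the empty default here, which a caller could treat as "not an image".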

Can you supply a small set of image resources without file extensions to make the testing easier?

Kind regards

@btargac btargac self-assigned this Feb 25, 2021
@btargac btargac added enhancement New feature or request good first issue Good for newcomers labels Feb 25, 2021
@CavalloScuro
Author

Dear Burak,

Thank you for responding so quickly and so kindly to my inquiry. As I mentioned, I'm really impressed with your tool, which will be an enormous (invaluable, really) resource for my research.

The URLs that I am dealing with right now look like this:

http://digitale.bnc.roma.sbn.it/tecadigitale/img/giornale/TO00181645/1935/unico/00000001/original
http://digitale.bnc.roma.sbn.it/tecadigitale/img/giornale/TO00181645/1935/unico/00000002/original
http://digitale.bnc.roma.sbn.it/tecadigitale/img/giornale/TO00181645/1935/unico/00000003/original
http://digitale.bnc.roma.sbn.it/tecadigitale/img/giornale/TO00181645/1935/unico/00000004/original
http://digitale.bnc.roma.sbn.it/tecadigitale/img/giornale/TO00181645/1935/unico/00000005/original

There is no way, that I know of, to get an absolute path to the jpg itself; I have only been able to find these URLs which are essentially HTML websites with the .jpg embedded inside of them. But when one performs the Save As function, it saves the file as a .jpg (at least in my browser). The problem, of course, is automation!

I thank you for being willing to look into this solution. It means a lot. And thanks for your hard work.

CS

@CavalloScuro
Author

I just had an idea: I wonder if I could just put the .jpg extensions in the "FileName" column of the spreadsheet, which would then become the full file name when the Parser assigns the extensionless file a file name. Am I off here, or could this be a possible solution?

@btargac
Owner

btargac commented Feb 26, 2021

Yeah, that could work, of course.

But it could require extra work to prepare the Excel file in cases where you cannot guess the extension, or where the resource changes from PNG to JPG on the server. Also, if the images are served through an image CDN, it can check your request headers (in this case, the embedded Chromium's user agent) and send WebP, AVIF, or another modern format instead. So the most convenient way seems to be to rely on the MIME type of the server's response and save the file with the matching extension.

This task won't take long; I guess I'll have some spare time to work on it next week 👨‍💻

@CavalloScuro
Author

I thank you so much, Burak! Very kind!

@btargac
Owner

btargac commented Mar 8, 2021

Hi @CavalloScuro,

I looked into the MIME type solution, but we missed another problem: every URL ends with the name "original", so when the job completes you only have one file. It is the last one listed in the Excel file, because each download overwrites the previous file with the same name :)
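The collision can be reproduced with a short sketch (Python, for illustration only). It assumes the default filename is the last path segment of the URL, and it uses a dictionary as a stand-in for the download folder; `default_name` is a hypothetical helper, and the URLs are the five sample pages posted earlier in this thread.

```python
import posixpath
from urllib.parse import urlparse

BASE = "http://digitale.bnc.roma.sbn.it/tecadigitale/img/giornale/TO00181645/1935/unico"
# Pages 00000001 to 00000005, matching the sample URLs in this thread.
urls = [f"{BASE}/{page:08d}/original" for page in range(1, 6)]


def default_name(url):
    """Assumed default: the last segment of the URL path."""
    return posixpath.basename(urlparse(url).path)


# A dict keyed by filename behaves like a folder: same name, same slot.
saved = {}
for url in urls:
    saved[default_name(url)] = url  # later downloads replace earlier ones

print(len(urls), "downloads ->", len(saved), "file on disk")
print(saved)
```

Only the entry for page 00000005 survives, which is why a distinct name per row is needed.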

So I guess you should also fill in column B, which is already supported; in your case it can contain an extension or not.

For example:

|    | A                                             | B            | C                         |
| ---| :---------------------------------------------| :------------| :-------------------------|
| 1  | http://.................../00000001/original  | optional1    | optional-sub-folder-name  |
| 2  | http://.................../00000002/original  | optional2    | optional-sub-folder-name  |
| 3  | http://.................../00000003/original  | optional3.png|                           |
| .  | ...                                           |              |                           |
| .  | ...                                           |              |                           |

So if column B contains an extension, that will be used; if not, the MIME type from the response will be used. Does that sound good to you?
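The rule above could be sketched roughly as follows (Python, for illustration; this is not the code from the pull request). `resolve_filename` and `MIME_TO_EXT` are hypothetical names, and falling back to the last URL segment when column B is empty is an assumption, not something stated in this thread.

```python
import os
import posixpath
from urllib.parse import urlparse

# Hypothetical lookup table for the response's MIME type.
MIME_TO_EXT = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/webp": ".webp",
    "image/avif": ".avif",
}


def resolve_filename(url, column_b, content_type):
    """Pick the name to save under, following the column B rule."""
    name = (column_b or "").strip()
    if not name:
        # Assumption: with no column B value, use the last URL segment.
        name = posixpath.basename(urlparse(url).path)
    # Note: splitext treats anything after the last dot as an extension,
    # so a name like "vol.1" would count as already having one.
    _, ext = os.path.splitext(name)
    if ext:
        return name  # column B (or the URL) already supplies an extension
    media_type = content_type.split(";", 1)[0].strip().lower()
    return name + MIME_TO_EXT.get(media_type, "")


url = "http://digitale.bnc.roma.sbn.it/tecadigitale/img/giornale/TO00181645/1935/unico/00000001/original"
print(resolve_filename(url, "optional1", "image/jpeg"))      # optional1.jpg
print(resolve_filename(url, "optional3.png", "image/jpeg"))  # optional3.png
```

With the table rows above, `optional1` would pick up its extension from the response, while `optional3.png` would be kept as written even if the server sent a different type.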

@CavalloScuro
Author

Dear Burak,

Thank you very much for taking the time to look into this and making these adjustments to your code. I really appreciate it. I'll give it a go right now and report back later.

Many thanks again.

CS

@btargac
Owner

btargac commented Mar 8, 2021

By the way, the new code is not usable yet. There will be a new release, and the update notes will cover this scenario for extension-free file URLs.

Kind regards

@btargac
Owner

btargac commented Mar 29, 2021

The updates were handled in pull request #54 and merged to master 👍🏼

@btargac btargac closed this as completed Mar 29, 2021