is it possible to include the level in the name template #40

Closed
ghisbo opened this issue Jun 9, 2021 · 10 comments

Comments

@ghisbo

ghisbo commented Jun 9, 2021

Is it possible to include the level (depth) in the name template? This would be very helpful, but I see no reference to it in the documentation. Example: %level//%url or similar.

Thanks in advance.

@DannyBen
Owner

DannyBen commented Jun 9, 2021

You mean the page depth?
This can be added, but I am not sure it will be useful.

Consider this: on many sites, a page is linked from more than one place. The depth will only represent the depth at which snapcrawl happened to encounter the page.

The "real" level is in fact the URL itself, since it represents how deep the page is.
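As a rough illustration of reading the level from the URL itself, the sketch below counts non-empty path segments. The helper name `url_depth` and the example.com URLs are made up for illustration; this is not part of snapcrawl.

```ruby
require "uri"

# Hypothetical helper (not part of snapcrawl): treat the number of
# non-empty path segments as a crawl-independent "level" of a page.
def url_depth(url)
  URI.parse(url).path.split("/").reject(&:empty?).size
end

puts url_depth("https://example.com/")            # root page
puts url_depth("https://example.com/docs/intro")  # two segments deep
```

Unlike the crawl depth, this value stays the same no matter which link led the crawler to the page.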

What is the use case for this?

@ghisbo
Author

ghisbo commented Jun 9, 2021

We are looking to use ImageAI to recognise the site and the level of the page (in a standard crawl from the root onward, of course). When snapcrawl runs, it shows that level, which would be useful for this application. From the image we could then conclude whether we are at the top level or somewhere in between.

@DannyBen
Owner

DannyBen commented Jun 9, 2021

I see.

Give me a few minutes, I will push a test version for you to try.

@DannyBen
Owner

DannyBen commented Jun 9, 2021

Can you try the version from GitHub?

Create a Gemfile with this content:

source "https://rubygems.org"
git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
gem 'snapcrawl', github: 'dannyben/snapcrawl', branch: 'add/depth-var-to-name-template'

Then run bundle to install it and, in the same folder, use bundle exec snapcrawl instead of snapcrawl.

This version lets you use the variable %{depth} in the config, for example:

name_template: '%{depth}-%{url}'

Note, you cannot use slashes. If this is important, I can try and implement this as well.

Let me know if this is what you need.

@ghisbo
Author

ghisbo commented Jun 9, 2021

I was able to get the Docker image running, but what you describe here is beyond my understanding. I am sorry, I am 65 and trying to follow, but I do not know Ruby, gems, ...
However, what you show looks OK to me.

@DannyBen
Owner

DannyBen commented Jun 9, 2021

Oh, sorry - I thought you were using the Ruby version. That's fine.

I just tested it locally, and in fact there is a side effect that makes my implementation useless.

Snapcrawl determines that it does not need to take a screenshot if the file already exists. So, when the depth variable is part of the filename, the same page is captured again under a different name whenever it appears at a different depth.
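The side effect can be reproduced with a small model. This is only a sketch: `screenshot_name` and `capture_all` are hypothetical helpers, the "file exists" check is simulated with an in-memory set, and none of it is snapcrawl's actual code.

```ruby
require "set"

# Hypothetical naming helper: fill the template with a slug of the URL
# and the depth at which the page was reached.
def screenshot_name(template, url:, depth:)
  slug = url.gsub(/[^a-zA-Z0-9]+/, "-")
  format(template, { url: slug, depth: depth })
end

# Hypothetical crawl loop: skip a page when its target "file" already exists.
# Returns the names of all files that would be written.
def capture_all(template, visits)
  existing = Set.new
  visits.each do |url, depth|
    name = screenshot_name(template, url: url, depth: depth)
    next if existing.include?(name) # same check as "file exists, skip"
    existing << name
  end
  existing.to_a
end

# The same page reached at depth 1 and again at depth 2.
visits = [["example.com/about", 1], ["example.com/about", 2]]

puts capture_all('%{url}', visits).size          # one file for the page
puts capture_all('%{depth}-%{url}', visits).size # two files for the same page
```

With the depth in the name, the existence check no longer recognises a page it has already captured, so the duplicate slips through.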

@ghisbo
Author

ghisbo commented Jun 9, 2021

I understand; in fact, I did not think about that. The file comparison could instead be done on the part of the name without the depth. That would solve this issue, I think.
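One way to picture this suggestion is to key the "already captured" check on the URL rather than on the full filename. The sketch below is an assumption about how that could look, with a hypothetical `capture_once` helper and an in-memory set; it is not how snapcrawl works.

```ruby
require "set"

# Hypothetical sketch: decide whether to capture based on the URL alone,
# and only use the depth when building the filename.
def capture_once(template, visits)
  seen_urls = Set.new
  visits.each_with_object([]) do |(url, depth), names|
    next unless seen_urls.add?(url) # add? returns nil if the URL was already seen
    slug = url.gsub(/[^a-zA-Z0-9]+/, "-")
    names << format(template, { url: slug, depth: depth })
  end
end

visits = [["example.com/about", 1], ["example.com/about", 2]]
puts capture_once('%{depth}-%{url}', visits).inspect
```

The in-memory set only works within a single run. A check against files already on disk would need to strip or ignore the depth part of each existing name, which is the deeper change this approach implies.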

@DannyBen
Owner

DannyBen commented Jun 9, 2021

Yeah, it could, but it will require a deeper change, and I am not sure this use case is common.

You already have the depth information in the output log (which can be redirected to a file with > out.log or something) - isn't this sufficient for your model? It can take this log file as input and get the depth for each url.
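A minimal sketch of extracting depths from such a log follows. The line format matched here ("Processing <url>, depth: <n>") is an assumption, not a documented snapcrawl format, so `LINE_PATTERN` would need adjusting to the real output (which may also contain color codes). The `depths_from_log` helper is hypothetical.

```ruby
# ASSUMPTION: relevant log lines look like "... Processing <url>, depth: <n>".
# Adjust this pattern to whatever the real log actually contains.
LINE_PATTERN = /Processing\s+(?<url>\S+?),\s+depth:\s+(?<depth>\d+)/

# Build a { url => depth } hash from log lines, keeping the shallowest
# depth when a URL appears more than once.
def depths_from_log(lines)
  lines.each_with_object({}) do |line, depths|
    match = LINE_PATTERN.match(line)
    next unless match

    url = match[:url]
    depth = match[:depth].to_i
    depths[url] = depth if !depths.key?(url) || depth < depths[url]
  end
end

# Made-up sample lines in the assumed format.
sample = [
  "INFO : Processing /, depth: 0",
  "INFO : Processing /about, depth: 1",
  "INFO : Screenshot saved",
  "INFO : Processing /about, depth: 2",
]
depths_from_log(sample).each { |url, depth| puts "#{url} #{depth}" }
```

With a real log file, the same helper could be fed `File.foreach("out.log")`, and the resulting hash joined with the screenshot filenames before passing them to the model.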

@ghisbo
Author

ghisbo commented Jun 9, 2021

Yes, I can use the log file to get the info.
Thanks anyway for the speedy replies and for having considered it.

Best regards

@DannyBen
Owner

DannyBen commented Jun 9, 2021

Alright. If I come up with a better solution with a good effort-reward ratio, I will post here.
