is it possible to include the level in the name template #40
Comments
You mean the page depth? Consider this: on many sites, a page is linked from more than one place, so the depth would only represent the depth at which snapcrawl first encountered the page. The "real" level is in fact the URL itself, since it shows how deep the page is. What is the use case for this?
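To illustrate the point that the URL itself carries the level, here is a minimal sketch that counts path segments. The helper `url_depth` is hypothetical and not part of Snapcrawl; it assumes that "level" means the number of non-empty path segments, with the site root at 0.

```ruby
require "uri"

# Hypothetical helper (not part of Snapcrawl): counts the non-empty
# path segments of a URL, so the site root has depth 0.
def url_depth(url)
  path = URI.parse(url).path.to_s
  path.split("/").reject(&:empty?).size
end

puts url_depth("https://example.com/")
puts url_depth("https://example.com/docs/intro")
```

Unlike the crawl depth, this value does not depend on which link the crawler happened to follow first.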
We are looking to use ImageAI to recognise the site and the level of the page (in a standard search starting from the root, of course). When running snapcrawl, it shows that level, and that would be useful for this application. From the image we could then conclude whether we are at the top level or somewhere in between.
I see. Give me a few minutes, I will push a test version for you to try.
Can you try the version from GitHub? Create a Gemfile with this content:
source "https://rubygems.org"
git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
gem 'snapcrawl', github: 'dannyben/snapcrawl', branch: 'add/depth-var-to-name-template'
then run
This version lets you use the depth variable in the name template:
name_template: '%{depth}-%{url}'
Note that you cannot use slashes. If this is important, I can try to implement that as well. Let me know if this is what you need.
I was able to get the Docker image running, but what you say here is beyond my understanding. I am sorry, I am 65 and try to follow, but I do not know Ruby, gems, ...
Oh, sorry - I thought you were using the Ruby version. That's fine. I just tested it locally, and in fact there is a side effect that makes my implementation useless. Snapcrawl decides that it does not need to take a screenshot if the file already exists. So, when using the
I understand. In fact, I did not think about that. The file comparison could perhaps be done on the part of the name without the depth. That would solve this issue, I think.
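A minimal sketch of that idea, under stated assumptions: screenshots are named `<depth>-<slug>.png` (matching a template like '%{depth}-%{url}'), and the helper `screenshot_exists?` is hypothetical. This is not how Snapcrawl actually performs its check.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (not Snapcrawl's actual code): returns true if a
# screenshot for `slug` already exists in `dir` under any depth prefix,
# assuming files are named "<depth>-<slug>.png".
def screenshot_exists?(dir, slug)
  pattern = /\A\d+-#{Regexp.escape(slug)}\.png\z/
  Dir.children(dir).any? { |name| name.match?(pattern) }
end

# Usage: a file saved at depth 1 is still found without knowing its depth.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "1-example-com-about.png"))
  puts screenshot_exists?(dir, "example-com-about")
  puts screenshot_exists?(dir, "example-com-contact")
end
```

Matching on the name without the depth prefix means a page reached again at a different depth would not be captured a second time.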
Yeah, it could, but it would require a deeper change, and I am not sure this use case is common. You already have the depth information in the output log (which can be redirected to a file with
Yes, I can use the log file to get the info. Best regards
Alright. If I come up with a better solution with a good effort-reward ratio, I will post here.
Is it possible to include the level (depth) in the name template? This would be very helpful, but I see no reference to it in the documentation. Example: %level//%url or similar.
Thanks in advance.