Store the crawl logs in a unique subfolder per installation #1865
@@ -99,7 +99,7 @@ public function run()

 $jobId = Input::get('jobId');
 $queue = $factory->createLazyQueue();
-$crawLogsDir = sys_get_temp_dir() . '/contao-crawl';
+$crawLogsDir = sys_get_temp_dir() . '/contao-crawl-' . $jobId;
Should imo rather be something like /contao-crawl/md5(%kernel.project_dir%).
Note: sys_get_temp_dir() can contain a trailing slash, as visible in this example on Strato.
Maybe something like this, then 🙂
-$crawLogsDir = sys_get_temp_dir() . '/contao-crawl-' . $jobId;
+$crawLogsDir = Path::join(sys_get_temp_dir(), 'contao/crawl', md5(System::getContainer()->getParameter('kernel.project_dir')));
I like the use of Path::join. But it's important to use a unique folder in tmp. Otherwise the next user who tries to write into the folder "contao" would also get a "permission denied".
Maybe something like this:
$crawLogsDir = Path::join(sys_get_temp_dir(), 'contao-crawl-' . $jobId);
> Otherwise the next user that tries to write into the folder "contao" would also get a "permission denied".
The main problem is that your hoster is using a wrong tmp folder configuration. Every user needs to have their own tmp folder.
I contacted Strato to clarify the problem.
> Every user needs to have its own tmp folder.
This is not generally true. Shared folders such as /tmp or /var/lib/php/sessions often have the sticky bit set to protect the files.
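For anyone who wants to check this on their own hosting, here is a minimal sketch that tests whether a directory has the sticky bit set. `hasStickyBit()` is a hypothetical helper written for illustration, not part of Contao, and the check only makes sense on POSIX file systems.

```php
<?php

/**
 * Returns true if the sticky bit (octal 01000) is set on the given path.
 * Hypothetical helper for illustration only; not part of Contao.
 */
function hasStickyBit(string $path): bool
{
    // fileperms() returns the full mode (file type plus permission bits)
    $perms = @fileperms($path);

    if (false === $perms) {
        // Path does not exist or cannot be stat'ed
        return false;
    }

    return ($perms & 01000) === 01000;
}

$tmpDir = sys_get_temp_dir();
printf("%s sticky bit: %s\n", $tmpDir, hasStickyBit($tmpDir) ? 'yes' : 'no');
```

With the sticky bit set, users can only delete or rename their own entries in the shared folder. It does not stop another user from creating a subfolder first, which is the situation described in this PR.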
Related: #1813
Note:
Also related: #267 - using a folder based on the
Thinking about it again, this
should never be a problem if multiple Contao applications are running on the same hosting. So imo no fix is needed.
@m-vo he probably means by another Contao installation from a completely different user.
Yeah, well, this is clearly a misconfiguration by the hoster. In which case we shouldn't do anything either, should we?
It's a completely different customer. Sounds like Strato has to fix this problem. I am waiting for their answer and will keep you up to date.
Yes, however, similar adjustments were made to the old
As already outlined, it's a potential security issue. 🤷
It seems that other web hosters also share the tmp folder by default: https://help.dreamhost.com/hc/en-us/articles/216735938--tmp-directory-overview I was wondering what operations the folder "system/tmp" is used for and whether it could be an alternative?
The whole
If we must continue to use contao/core-bundle/src/Command/AbstractLockedCommand.php Lines 59 to 69 in 8e364e7
Crawl.php now generates an MD5-hashed folder name based on kernel.project_dir. Furthermore, path names are now generated with Path::join to avoid double slashes.
Sorry, I didn't realize that Webmozart is not a requirement of the Contao core. Removed Path::join.
It's implicitly there anyway (for example via
Hm, if the core-bundle uses it, it should be required. Same for #1868
@@ -99,7 +99,7 @@ public function run()

 $jobId = Input::get('jobId');
 $queue = $factory->createLazyQueue();
-$crawLogsDir = sys_get_temp_dir() . '/contao-crawl';
+$crawLogsDir = sys_get_temp_dir() . md5(System::getContainer()->getParameter('kernel.project_dir')) . '/contao-crawl';
This won't work if sys_get_temp_dir() contains no trailing slash.
Yes, of course. I just wanted to say that using it won't stretch our requirements budget. 🙂
But where does it come from? I can't find it in the composer.json of Contao. https://github.com/contao/contao/blob/master/core-bundle/composer.json
Yes, you need to add it in this PR.
Like I said, it's from a dependency. You can use It has to be put into the core's
Ok, I got it. Files are changed again.
@bohnmedia Since this is a bugfix, can you please rebase your changes onto the
I am not sure how to do a rebase. I am getting many conflicts when I try to do "git rebase origin/4.9". I could create a completely new pull request.
That's ok as well. But you should probably wait for the outcome of the discussion regarding
Alternatively, we can use rtrim to avoid the double slash.
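A minimal sketch of what the rtrim approach could look like. This is not necessarily the code that was suggested or merged here. `buildCrawlLogsDir()` is a hypothetical helper, `$projectDir` stands in for the `kernel.project_dir` parameter, and the folder layout mirrors the diff above.

```php
<?php

/**
 * Builds a per-installation crawl log directory below the system temp dir.
 * Hypothetical helper for illustration; $projectDir stands in for the
 * value of the kernel.project_dir parameter.
 */
function buildCrawlLogsDir(string $tmpDir, string $projectDir): string
{
    // Strip trailing slashes so "/var/tmp/" and "/var/tmp" give the same result
    $base = rtrim($tmpDir, '/');

    // Hash the project dir so every installation gets its own subfolder
    return $base . '/' . md5($projectDir) . '/contao-crawl';
}

// Both calls print the same path, without a double slash
echo buildCrawlLogsDir('/var/tmp/', '/var/www/example'), "\n";
echo buildCrawlLogsDir('/var/tmp', '/var/www/example'), "\n";
```

In the real code the first argument would be `sys_get_temp_dir()`. The point is only that the separator is always added explicitly after trimming, so it does not matter whether the hoster's temp dir ends with a slash.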
Yes, that is what I prefer as well. But let's see.
I have rebased the PR and adjusted the implementation. @bohnmedia Can you please test if it works for you?
When you merge, please make sure the commit message does not contain "crawling problems" because that's not correct.
Thank you @bohnmedia.
I moved a Contao instance to a shared hoster (Strato) and got the following error when I tried to generate a crawl index:
The stream or file "/var/tmp//contao-crawl/3ce19d12-15a6-45df-89f3-b717adb35f51_log.csv" could not be opened: failed to open stream: Permission denied
It turned out that the folder "/var/tmp/contao-crawl/" had already been created by another Contao instance on the same shared hoster. I was able to read the folder with "readdir", load the files in it and see all crawled pages.
After changing the two lines in the Crawler.php, the crawler worked fine. Furthermore, it's not possible to use "readdir" on "/var/tmp/", so it is no longer possible to find other Contao instances by scanning "/var/tmp/contao-crawl/".