Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store the crawl logs in a unique subfolder per installation #1865

Merged
merged 2 commits into from
Jul 3, 2020
Merged

Store the crawl logs in a unique subfolder per installation #1865

merged 2 commits into from
Jul 3, 2020

Conversation

bohnmedia
Copy link

I moved a contao instance to a sharehoster (strato) and got the following error when i tried to generate a crawl index.

The stream or file "/var/tmp//contao-crawl/3ce19d12-15a6-45df-89f3-b717adb35f51_log.csv" could not be opened: failed to open stream: Permission denied

It turned out that the folder "/var/tmp/contao-crawl/" already was created by another contao instance on the same sharehoster. I was able to read the folder with "readdir", load the files in it and see all crawled pages.

After changing the two lines in the Crawler.php, the crawler worked fine. Furthermore its not possible to use "readdir" on "/var/tmp/" so it isnt possible to scan for other contao instances by scanning "/var/tmp/contao-crawl/" anymore.

@@ -99,7 +99,7 @@ public function run()

$jobId = Input::get('jobId');
$queue = $factory->createLazyQueue();
$crawLogsDir = sys_get_temp_dir() . '/contao-crawl';
$crawLogsDir = sys_get_temp_dir() . '/contao-crawl-' . $jobId;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should imo rather be something like /contao-crawl/md5(%kernel.projectDir%).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: sys_get_temp_dir() can contain a trailing slash, as visible in this example on Strato.

Mabe something like this, then 🙂

Suggested change
$crawLogsDir = sys_get_temp_dir() . '/contao-crawl-' . $jobId;
$crawLogsDir = Path::join(sys_get_temp_dir(), 'contao/crawl', md5(System::getContainer()->getParameter('kernel.projectDir'));

Copy link
Author

@bohnmedia bohnmedia Jun 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the use of Path::join. But its important to use a unique folder in tmp. Otherwise the next user that tries to write into the folder "contao" would also get a "permission denied".

Maybe something like this.

$crawLogsDir = Path::join(sys_get_temp_dir(), 'contao-crawl-' . $jobId);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise the next user that tries to write into the folder "contao" would also get a "permission denied".

The main problem is, that your hoster is using a wrong tmp folder configuration. Every user needs to have its own tmp folder.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I contactet strato to clarify the problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every user needs to have its own tmp folder.

This is not generally true. Shared folders such as /tmp or /var/lib/php/sessions often have the sticky bit set to protect the files.

@fritzmg
Copy link
Contributor

fritzmg commented Jun 24, 2020

Related: #1813

@fritzmg
Copy link
Contributor

fritzmg commented Jun 24, 2020

Note: sys_get_temp_dir() can contain a trailing slash, as visible in this example on Strato.

@fritzmg
Copy link
Contributor

fritzmg commented Jun 24, 2020

Also related: #267 - using a folder based on the kernel.projectDir might still fail in special circumstances apparently.

@m-vo
Copy link
Member

m-vo commented Jun 24, 2020

Thinking about it again, this

It turned out that the folder "/var/tmp/contao-crawl/" already was created by another contao instance

should never be a problem if multiple Contao applications are running in the same hosting. So imo no fix is needed.

@fritzmg
Copy link
Contributor

fritzmg commented Jun 24, 2020

@m-vo he probably means, by another Contao installation from a completely different user.

@m-vo
Copy link
Member

m-vo commented Jun 24, 2020

he probably means, by another Contao installation from a completely different user.

Yeah, well this is clearly a misconfiguration by the hoster. In which case we shouldn't do anything as well, don't we?

@bohnmedia
Copy link
Author

bohnmedia commented Jun 24, 2020

Its a completely different customer. Sounds like strato has to fix this problem. I am waiting for their answer and will keep you up to date.

@fritzmg
Copy link
Contributor

fritzmg commented Jun 24, 2020

Yeah, well this is clearly a misconfiguration by the hoster. In which case we shouldn't do anything as well, don't we?

Yes, however, similar adjustments were made to the old AbstractLockedCommand for that reason, before it was eventually dropped.

@asaage
Copy link

asaage commented Jun 24, 2020

Reading this i checked my own sys_temp_dir wich was not set in php.ini /tmp was utilised via TMPDIR by default.
File-ownership was correct however - do you realy think having a serverwide tmp-dir is a misconfiguration?
I then set it to /home/myown/tmp
And was greeted by the manager:
image
which was because i did not create the directory 🙈.

I doubt that it's good if getenv('TMPDIR') and sys_get_temp_dir() return different values.

@m-vo
Copy link
Member

m-vo commented Jun 24, 2020

  • do you realy think having a serverwide tmp-dir is a misconfiguration?

As already outlined, it's a potential security issue. 🤷

@bohnmedia
Copy link
Author

bohnmedia commented Jun 25, 2020

It seems, that other webhosters also share the tmp-folder by default.

https://help.dreamhost.com/hc/en-us/articles/216735938--tmp-directory-overview

I was wondering for what operations the folder "system/tmp" is used for and if it could be an alternative?

image

$crawLogsDir = System::getContainer()->getParameter('kernel.project_dir') . '/system/tmp/contao-crawl';
...
return System::getContainer()->getParameter('kernel.project_dir') . '/system/tmp/contao-crawl/' . $jobId . '_' . $subscriberName . '_log.csv';

@fritzmg
Copy link
Contributor

fritzmg commented Jun 25, 2020

I was wondering for what operations the folder "system/tmp" is used for and if it could be an alternative?

The whole system/ folder is there for legacy reasons. (var/tmp/ would be better suited.)

@leofeyer
Copy link
Member

leofeyer commented Jun 25, 2020

If we must continue to use sys_get_temp_dir(), even though it is error-prone and even the Symfony devs have already discussed adding a var/tmp folder in symfony/symfony#36515, we should generate installation specific subfolders as we do here:

private function getTempDir(): string
{
$container = $this->getContainer();
$tmpDir = sys_get_temp_dir().'/'.md5($container->getParameter('kernel.project_dir'));
if (!is_dir($tmpDir)) {
$container->get('filesystem')->mkdir($tmpDir);
}
return $tmpDir;
}

@bohnmedia
Copy link
Author

If we must continue to use sys_get_temp_dir(), even though it is error-prone and even the Symfony devs have already discussed adding a var/tmp folder in symfony/symfony#36515, we should generate installation specific subfolders as we do here:

private function getTempDir(): string
{
$container = $this->getContainer();
$tmpDir = sys_get_temp_dir().'/'.md5($container->getParameter('kernel.project_dir'));
if (!is_dir($tmpDir)) {
$container->get('filesystem')->mkdir($tmpDir);
}
return $tmpDir;
}

The Crawl.php ist generating a md5 hashed folder name based on kernel.project_dir now. Furthermore pathnames are generated with Path::join now to avoid double slashes.

@bohnmedia
Copy link
Author

Sorry, didnt realize that Webmozart is no requirement of the contao-core. Removed Path::join.

@m-vo
Copy link
Member

m-vo commented Jun 26, 2020

Sorry, didnt realize that Webmozart is no requirement of the contao-core. Removed Path::join.

It's implicitly there anyway (for example via contao/image).

@fritzmg
Copy link
Contributor

fritzmg commented Jun 26, 2020

It's implicitly there anyway (for example via contao/image).

Hm, if the core-bundle uses it, it should be required. Same for #1868

@@ -99,7 +99,7 @@ public function run()

$jobId = Input::get('jobId');
$queue = $factory->createLazyQueue();
$crawLogsDir = sys_get_temp_dir() . '/contao-crawl';
$crawLogsDir = sys_get_temp_dir() . md5(System::getContainer()->getParameter('kernel.project_dir')) . '/contao-crawl';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work, if sys_get_temp_dir() contains no trailing slash.

@m-vo
Copy link
Member

m-vo commented Jun 26, 2020

Yes, of course. I just wanted to say that using it won't stretch our requirements budget. 🙂

@bohnmedia
Copy link
Author

It's implicitly there anyway (for example via contao/image).

Hm, if the core-bundle uses it, it should be required. Same for #1868

But where does it come from? Cant find it in the composer.json of contao.

https://github.com/contao/contao/blob/master/core-bundle/composer.json

@fritzmg
Copy link
Contributor

fritzmg commented Jun 26, 2020

But where does it come from?

contao/image, as @m-vo already said.

Cant find it in the composer.json of contao.

https://github.com/contao/contao/blob/master/core-bundle/composer.json

Yes, you need to add it in this PR.

@m-vo
Copy link
Member

m-vo commented Jun 26, 2020

Like I said, It's from a dependency. You can use composer why webmozart/path-util --tree to find out.

It has to be put into the core's composer.json if you're using it in the core's source, but it won't - to this date - install any new packages.

@bohnmedia
Copy link
Author

Ok, I got it. Files are changed again.

@bohnmedia bohnmedia requested a review from fritzmg June 26, 2020 15:45
fritzmg
fritzmg previously approved these changes Jun 26, 2020
@leofeyer leofeyer added the bug label Jun 30, 2020
@leofeyer leofeyer added this to the 4.9 milestone Jun 30, 2020
@leofeyer
Copy link
Member

@bohnmedia Since this is a bugfix, can you please rebase your changes onto the 4.9 branch?

@bohnmedia
Copy link
Author

I am not sure how to do a rebase. I am getting many conflicts when I try to do "git rebase origin/4.9". I could create a completetly new pull request.

@leofeyer
Copy link
Member

That's ok as well. But you should probably wait up the discussion regarding Path::join() here, because IMHO it is an unnecessary dependency that could be avoided.

@bohnmedia
Copy link
Author

bohnmedia commented Jun 30, 2020

Alternatively, we can use rtrim to avoid the double slash.

rtrim(sys_get_temp_dir(), '/') . '/' . md5(System::getContainer()->getParameter('kernel.project_dir')) . '/contao-crawl/' . $jobId . '_' . $subscriberName . '_log.csv'

@leofeyer
Copy link
Member

Alternatively, we can use rtrim to avoid the double slash.

Yes, that is what I prefer as well. But let's see.

@leofeyer leofeyer changed the base branch from master to 4.9 July 1, 2020 12:18
@leofeyer
Copy link
Member

leofeyer commented Jul 1, 2020

I have rebased the PR and adjusted the implementation. @bohnmedia Can you please test if it works for you?

@leofeyer leofeyer requested a review from a team July 3, 2020 07:14
Copy link
Member

@Toflar Toflar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you merge, please make sure the commit message does not contain "crawling problems" because that's not correct.

@leofeyer leofeyer changed the title Fixed crawling problems on sharehoster (strato) Store the crawl logs in a unique subfolder per installation Jul 3, 2020
@leofeyer leofeyer merged commit 9969120 into contao:4.9 Jul 3, 2020
@leofeyer
Copy link
Member

leofeyer commented Jul 3, 2020

Thank you @bohnmedia.

@bohnmedia bohnmedia removed their assignment Jul 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants