HTMLPurifier not writable #84

Closed
AroundtheGlobe opened this issue Apr 4, 2019 · 15 comments

Comments

@AroundtheGlobe

AroundtheGlobe commented Apr 4, 2019

I updated php-htmldiff from version 1.0.0 to 1.0.9 because it couldn't compare a 5,500-word article within 30 seconds, before the PHP timeout kicked in. After updating I got this error message:
/vendor/ezyang/htmlpurifier/library/HTMLPurifier/DefinitionCache/Serializer not writable, please chmod to 777

I pointed the purifier cache location at a directory with 777 permissions, like this:

$htmlDiffConfig = new HtmlDiffConfig();
$htmlDiffConfig->setPurifierCacheLocation(Config::getDocRoot().'tempdir');
$htmlDiff = new HtmlDiff('text1 ', 'text2', $htmlDiffConfig);

But that setting doesn't seem to be passed on by HtmlDiff, so there is no way to set the new (temp) directory. Version 1.0.9 also struggled with the large text, so I think downgrading is the only option I have?
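
Before concluding that the setting is ignored, it may help to rule out a permissions problem on the target directory itself. Below is a minimal shell sketch of such a check. The function name check_cache_dir is made up for illustration, and the check only applies to the user who runs the script, which may differ from the user PHP runs as on the server.

```shell
#!/bin/sh
# Report whether a directory is usable as a cache location for the current user.
# Assumption: this is run as the same user that executes PHP (often the web
# server user), because write permission is evaluated per user.
check_cache_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 2
    fi
    echo "ok: $dir"
}

# Only run the check when a directory is passed on the command line.
if [ "$#" -eq 1 ]; then
    check_cache_dir "$1"
fi
```

Saved under a name of your choice and called with the directory that is passed to setPurifierCacheLocation(), this prints one status line. If it prints "ok" while the error message still names the default Serializer path, the directory permissions are probably not the cause.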

@SavageTiger
Contributor

After setting cache location, do you see the path you set in the error message?

@AroundtheGlobe
Author

The message above was the only message I got. I had Composer download and update all files today, and it seems the problem is solved now.

The 5,500-word HTML diff still takes a long time (longer than a normal user would wait). When I wait for 400 seconds (nearly 7 minutes), I am now able to do a diff without any errors.

@SavageTiger
Contributor

I am not sure why it takes 7 minutes. One of the things that I know can take up loads of time is inline images (base64-encoded src attributes). I also advise you to make sure you run at least version 7 of PHP. I might be able to tell you why your HTML diff is slow if you can provide the data set.

@AroundtheGlobe
Author

This is the text I'm trying to compare (with an older / different version): https://www.aroundtheglobe.nl/reizen/duitsland/berlijn-si7161.html
It has no base64-encoded src attributes, and I am using PHP 7.1.x on the server, so that should be okay. Maybe something else in my HTML is slowing it down?

@SavageTiger
Contributor

Just to be clear, are you literally comparing the source of the entire page HTML (including all the stuff like menus, etc.), or just the content of <article class="post-content">, for example?

@AroundtheGlobe
Author

AroundtheGlobe commented Apr 16, 2019

Only the content of <article class="post-content">

@SavageTiger
Contributor

I have created some test fixtures based on that page:

fixtures.zip

The performance on my local machine was not too bad, so I'm not sure what the issue is.

Screenshot from 2019-04-16 23-00-03

I'm not sure what your server specs are. I ran the test on a "Core i7-4790K CPU @ 4.60GHz".

If you want to test this yourself: check out the library, install its dependencies with Composer, overwrite the HTML pages in the directory /tests/fixtures/Performance with the ones from the zip, and run vendor/phpunit/phpunit/phpunit --group=performance from the root of the library folder.

@SavageTiger
Contributor

@jschroed91 I think we can close this issue, since the original issue seems invalid and the performance problem is not really applicable to this specific ticket.

@AroundtheGlobe
Author

I will try to have a look today to test and see how it performs on the main server (vs. the dev server). The dev server is an Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz.

@SavageTiger
Contributor

> I will try to have a look today to test and see how it performs on the main server (vs. the dev server). The dev server is an Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz.

According to UserBenchmark, the 950 has around 35% of the per-core performance of a 4790, and there is a clock-speed difference on top of that. So what would we expect on the 950, maybe around 60 seconds, ignoring memory speed differences? That's not really scientific, but I don't think 400 seconds makes sense.

@AroundtheGlobe
Author

After uploading everything to the main server I got the same error again.
/var/www/[mysite]/vendor/ezyang/htmlpurifier/library/HTMLPurifier/DefinitionCache/Serializer not writable, please chmod to 777

The HtmlDiff code is still set up the same as above.

@AroundtheGlobe
Author

AroundtheGlobe commented May 2, 2019

To answer this question:

> After setting cache location, do you see the path you set in the error message?

No, I don't see the path I set in the error message. Instead I see what I assume is the default path of HTMLPurifier: /ezyang/htmlpurifier/library/HTMLPurifier/DefinitionCache/Serializer

@AroundtheGlobe
Author

I managed to solve the problem, but I can't explain what caused it. I use Composer on my PC to get the latest files and update the Composer packages. When I compared the vendor files (including the caxy files) on my development server, which at some point had started to work, with those on the live server, it turned out a few directories and files were missing on live. Uploading those files and directories solved the problem.

The missing file in my vendor dir was:
/vendor_path/caxy/lib/Caxy/HtmlDiff/ListDiffNew.php

In the ezyang package some directories were missing:
/vendor_path/ezyang/htmlPurifier/library/DefinitionCache/Serializer
In this directory the two subdirectories HTML and URI were missing.

Uploading the files / directories solved the caching problem and comparing the HTML text above takes about 15 seconds on the live server which is acceptable.
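
For anyone running into the same thing, the comparison described above can be sketched in shell. This assumes both vendor trees are reachable from one machine (for example after copying one of them over). The function name list_missing and any paths passed to it are hypothetical; it is not part of Composer or php-htmldiff.

```shell
#!/bin/sh
# Print every path (file or directory) that exists under the reference tree
# but is absent from the target tree. Directories are listed too, so missing
# empty directories show up as well.
list_missing() {
    src="$1"   # reference tree, e.g. the vendor dir that works
    dst="$2"   # tree to check, e.g. a copy of the live server's vendor dir
    (cd "$src" && find . ! -path . | sort) | while IFS= read -r rel; do
        rel="${rel#./}"
        [ -e "$dst/$rel" ] || echo "$rel"
    done
}

# Only run the comparison when two directories are passed on the command line.
if [ "$#" -eq 2 ]; then
    list_missing "$1" "$2"
fi
```

An empty result means the target has everything the reference has. It compares names only, not file contents, so it would catch missing entries like the ones above but not outdated files.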

Thank you @SavageTiger for all your support!

@SavageTiger
Contributor

@AroundtheGlobe Sorry for the late response, I was a bit busy. Happy that the issue is solved though. I think 15 seconds is about what is to be expected for a text of that size.

cc @jschroed91

@jschroed91
Member

Thanks for your help here @SavageTiger !
