Special Characters #89

Closed
adonisfigueroa opened this issue Jun 7, 2016 · 25 comments

@adonisfigueroa
Contributor

Some text with special characters is parsed incorrectly, for example:

$amp = new AMP();
$amp->loadHtml("You’re there");
echo $amp->convertToAmpHtml();

Or symbols:
$amp->loadHtml("end​");

Other examples:
première
of ‘A
a – b
Gael García Bernal
-“quotes”-

@sidkshatriya
Contributor

@adonisfigueroa I am not able to repeat the error.

What version of the library are you using? What is the version of PHP?

<?php
require __DIR__.'/vendor/autoload.php';

use Lullabot\AMP\AMP;

$amp = new AMP();
$amp->loadHtml("<p>Gael García Bernal</p>");
echo $amp->convertToAmpHtml();

gives
<p>Gael García Bernal</p>

@sidkshatriya
Contributor

Can you post your phpinfo(); perhaps?

@sidkshatriya
Contributor

There was a similar problem some time ago, #48, which was fixed.

Are you running with the latest versions of the library and its dependencies?

@adonisfigueroa
Contributor Author

I ran an update today (with Composer) and I have this version:
lullabot/amp dev-master 02d65f9 A set of useful classes and utilities to convert html to AMP html (See https://www.ampproject.org/)

Regarding phpinfo, what information do you need specifically?
Here are the PHP version and the DOM version:

[screenshot, 2016-06-07 14:24:21]
[screenshot, 2016-06-07 14:22:08]

@sidkshatriya
Contributor

@adonisfigueroa
Do you have the mbstring extension installed?

@adonisfigueroa
Contributor Author

Yes, I do:

[screenshot, 2016-06-07 14:49:05]

@sidkshatriya
Contributor

sidkshatriya commented Jun 7, 2016

What is the output of the small program I posted in #89 (comment) for you?

@adonisfigueroa
Contributor Author

Output: Gael García Bernal

@sidkshatriya
Contributor

This is puzzling to me. What is your operating system? Can you try another installation of PHP that you have access to (preferably on Linux or Mac) and tell me what results you get?

@adonisfigueroa
Contributor Author

My localhost is Ubuntu 14.04.4 LTS with PHP 5.5.9.
I get the same results on Red Hat Enterprise Linux Server release 6.7 with PHP 5.6.21.

@sidkshatriya
Contributor

  • Can you tell me what you get when you type php -i | grep libxml ?
  • Also, can you run the script above (#89 (comment)) in cgi/fpm/fastcgi etc. mode (i.e. not from the command line) and tell me what output you get?

@adonisfigueroa
Contributor Author

adonisfigueroa commented Jun 7, 2016

php -i | grep libxml
libxml Version => 2.9.1
libxml
libxml2 Version => 2.9.1

When I execute the code in the console, I get the correct characters. Maybe there is a problem with the ISO-8859-1 charset?

/usr/bin/php /var/www/html/test.php

Gael García Bernal

@sidkshatriya
Contributor

Ah! So things are working correctly in the console and not working correctly in the browser?

The AMP library output is UTF-8. We only support UTF-8 output.
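
One possible reason for the console/browser difference (an assumption, since the thread does not show the response headers) is that the browser is told, or guesses, that the page is ISO-8859-1 while the library emits UTF-8 bytes. A minimal sketch for a standalone test script only, declaring the charset explicitly; `$ampHtml` is a hypothetical stand-in for the return value of `convertToAmpHtml()`:

```php
<?php
// Hypothetical stand-in for $amp->convertToAmpHtml(); the bytes \xC3\xAD are "í" in UTF-8.
$ampHtml = "<p>Gael Garc\xC3\xADa Bernal</p>";

// Tell the browser the body is UTF-8 so it does not fall back to another charset.
// headers_sent() guards against the "headers already sent" warning.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

echo $ampHtml;
```

This only helps when the whole response is UTF-8. On a page that is otherwise served as ISO-8859-1, the output would need to be converted instead.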

@sidkshatriya
Contributor

(We don't support ISO-8859-1)

@sidkshatriya
Contributor

But you can try to do some encoding conversions using mb_convert_encoding. Encoding conversions are tricky but ISO-8859-1 is quite a mainstream encoding...

However, I would advise using UTF-8 for all multilingual stuff in your web pages...
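
As a rough sketch of that suggestion, assuming the stored content really is ISO-8859-1: `mb_convert_encoding` takes the target encoding first and the source encoding second. The helper name `latin1ToUtf8` is made up for illustration, and the input string is written with byte escapes so the example does not depend on how this file itself is saved.

```php
<?php
// Hypothetical helper: convert ISO-8859-1 bytes to UTF-8 before handing them to the library.
function latin1ToUtf8($html)
{
    // Argument order: string, target encoding, source encoding.
    return mb_convert_encoding($html, 'UTF-8', 'ISO-8859-1');
}

// \xED is "í" in ISO-8859-1.
$latin1 = "<p>Gael Garc\xEDa Bernal</p>";
$utf8 = latin1ToUtf8($latin1);

// The original bytes are not valid UTF-8; the converted ones are.
var_dump(mb_check_encoding($latin1, 'UTF-8'));
var_dump(mb_check_encoding($utf8, 'UTF-8'));

// $utf8 is what would be passed to $amp->loadHtml().
```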

@adonisfigueroa
Contributor Author

Before submitting the issue, I tested mb_convert_encoding, but it did not work for those symbols. I'll try converting another way before calling loadHtml.

@sidkshatriya
Contributor

Is there a reason why you must use ISO-8859-1 for your output? UTF-8 is just so much better and you can mix languages other than those supported in latin-1...

@adonisfigueroa
Contributor Author

Yes, we need to use AMP on a site whose database holds a lot of content in ISO-8859-1, so to support UTF-8 we would first need to convert all the data.

@sidkshatriya
Contributor

I'm closing this ticket as this is not really a bug in the library. The library is working as intended.

However feel free to keep updating this ticket in case you find something useful or need any help...

@adonisfigueroa
Contributor Author

Ok thanks for your help

@sidkshatriya
Contributor

I want to re-word a comment I made above:

The AMP library output is UTF-8. We only support UTF-8 output.

Actually, it would be more accurate to say that you should only provide UTF-8 input to the library, and then you will get proper UTF-8 output. We only support UTF-8 encoding.
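
One way to act on this is to check the input before it reaches the library. The guard below is only a sketch: `ensureUtf8` is a hypothetical helper, and the fallback encoding is an assumption the caller has to supply, because bytes alone do not say which legacy encoding they came from.

```php
<?php
// Hypothetical guard: pass valid UTF-8 through, otherwise convert from an assumed legacy encoding.
function ensureUtf8($html, $assumedEncoding = 'ISO-8859-1')
{
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html; // Already valid UTF-8 (plain ASCII also passes).
    }
    return mb_convert_encoding($html, 'UTF-8', $assumedEncoding);
}

$alreadyUtf8 = ensureUtf8("caf\xC3\xA9"); // "é" as UTF-8, returned unchanged
$fromLatin1  = ensureUtf8("caf\xE9");     // "é" as ISO-8859-1, converted

var_dump($alreadyUtf8 === $fromLatin1);
```

The check is a heuristic: some ISO-8859-1 byte sequences happen to be valid UTF-8 as well, so when the source encoding is known it is safer to convert unconditionally.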

@adonisfigueroa
Contributor Author

It works if the content is converted to UTF-8 before calling loadHtml and the result is then converted back to the original encoding:

$text = iconv('ISO-8859-1', 'UTF-8//TRANSLIT//IGNORE', $text);
$amp = new AMP();
$amp->loadHtml($text);
$text_amp = $amp->convertToAmpHtml();
$text_amp = iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $text_amp);

Thanks again. You could mention the supported encoding in the documentation.

@sidkshatriya
Contributor

@adonisfigueroa Thanks for the tip. I have updated README.md and added a note saying that we only support UTF-8 and ASCII (UTF-8 is a superset of ASCII).

BTW I'm curious that you chose to use iconv. Does mb_convert_encoding not do the trick for you?

@adonisfigueroa
Contributor Author

It was an example, but the good news is that both work (when used before and after the AMP library).

@adonisfigueroa
Contributor Author

Sorry, here is an example that doesn't work with mb_convert_encoding but does work with iconv:
-> special quotes -“What”-
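
A likely explanation, offered as a sketch rather than a confirmed diagnosis: the curly quotes U+201C and U+201D do not exist in ISO-8859-1. When converting back from UTF-8, `mb_convert_encoding` replaces them with its substitute character, while `iconv` with `//TRANSLIT` may approximate them, and that behaviour depends on the iconv implementation PHP is built against. Windows-1252 does contain these quotes (bytes 0x93 and 0x94), so if the stored data is actually Windows-1252 (an assumption to check against the database), naming that encoding keeps them intact.

```php
<?php
// "-“What”-" written with UTF-8 byte escapes so the example is independent of file encoding.
$utf8 = "-\xE2\x80\x9CWhat\xE2\x80\x9D-";

// ISO-8859-1 has no curly quotes: mbstring uses its substitute character ("?" by default).
$viaMb = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// iconv with //TRANSLIT may map them to a close ASCII equivalent; the result varies by platform.
$viaIconv = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8);

// Windows-1252 has real code points for both quotes (0x93 and 0x94).
$viaCp1252 = mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');

var_dump($viaMb);
var_dump($viaIconv);
var_dump(bin2hex($viaCp1252));
```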
