not all non-ascii urls seem to work #3082

Closed
Raven24 opened this issue Mar 29, 2012 · 8 comments

Comments

@Raven24
Member

Raven24 commented Mar 29, 2012

It appears my axiomatic regex is not perfect in every way ... what a bummer ;)

The link
http://موقع.وزارة-الاتصالات.مصر/ (Egyptian Ministry of Communications and Information Technology)
should yield
http://xn--4gbrim.xn----ymcbaaajlc6dj7bxne2c.xn--wgbh1c/
which it doesn't.

I'll look into it soon (or, if anyone else is interested in improving their regex skills, I'd be happy to assist).

@maxwell
Member

maxwell commented Mar 29, 2012

I am beginning to think there has to be a vendored regex SOMEWHERE. This is not the kind of thing I think we should be maintaining ourselves :P

@maxwell
Member

maxwell commented Mar 29, 2012

Or we switch to a regex that defines what each part could NOT be...
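A rough sketch of that idea, in Python for brevity: each part is described by the characters it may not contain, so non-ASCII hosts pass through without being listed. `URL_RE` and its group names are made up for illustration and are not the project's actual pattern; note that the `host` group here also swallows any userinfo and port.

```python
import re

# Hypothetical pattern: the host is "anything that is not whitespace, '/', '?' or '#'",
# and the rest of the URL is "anything that is not whitespace".
URL_RE = re.compile(
    r"(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*)://"
    r"(?P<host>[^\s/?#]+)"
    r"(?P<rest>\S*)"
)


def find_url(text):
    """Return (scheme, host, rest) for the first URL in text, or None."""
    match = URL_RE.search(text)
    if match is None:
        return None
    return match.group("scheme"), match.group("host"), match.group("rest")
```

Because nothing in the host class is an allow-list, the Arabic host from this issue matches as a whole and can then be handed to a separate encoding step.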

@Raven24
Member Author

Raven24 commented Mar 29, 2012

Now I get it ... the regex should be fine. But it seems the dot-separated parts of a sub.domainname.tld host should be processed separately, as this PHP function does:

/**
 * Removes a weakness of encode(), which cannot properly handle URIs but instead encodes their
 * path or query components, too.
 * @author  Matthias Sommerfeld <mso@phlylabs.de>
 * @copyright 2004-2011 phlyLabs Berlin, http://phlylabs.de
 * @param string  $uri  Expects the URI as a UTF-8 (or ASCII) string
 * @return  string  The URI encoded to Punycode, everything but the host component is left alone
 * @since 0.6.4
 */
public function encode_uri($uri)
  {
    $parsed = parse_url($uri);
    if (!isset($parsed['host'])) {
      $this->_error('The given string does not look like a URI');
      return false;
    }
    $arr = explode('.', $parsed['host']);
    foreach ($arr as $k => $v) {
      $conv = $this->encode($v, 'utf8');
      if ($conv) $arr[$k] = $conv;
    }
    $parsed['host'] = join('.', $arr);
    $return =
      (empty($parsed['scheme']) ? '' : $parsed['scheme'].(strtolower($parsed['scheme']) == 'mailto' ? ':' : '://'))
      .(empty($parsed['user']) ? '' : $parsed['user'].(empty($parsed['pass']) ? '' : ':'.$parsed['pass']).'@')
      .$parsed['host']
      .(empty($parsed['port']) ? '' : ':'.$parsed['port'])
      .(empty($parsed['path']) ? '' : $parsed['path'])
      .(empty($parsed['query']) ? '' : '?'.$parsed['query'])
      .(empty($parsed['fragment']) ? '' : '#'.$parsed['fragment']);
    return $return;
  }
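For comparison, here is a minimal sketch of the same per-label approach in Python. It is not the code that went into the fix; it assumes Python's built-in `idna` codec (which implements the older IDNA 2003 rules) is good enough, and the function name simply mirrors the PHP one.

```python
from urllib.parse import urlsplit, urlunsplit


def encode_uri(uri):
    """Punycode-encode only the host of uri, one label at a time."""
    parts = urlsplit(uri)
    if not parts.hostname:
        raise ValueError("The given string does not look like a URI")

    # Split the host on dots and encode each label on its own;
    # plain ASCII labels come back unchanged.
    labels = [
        label.encode("idna").decode("ascii")
        for label in parts.hostname.split(".")
    ]
    netloc = ".".join(labels)

    # Put port and userinfo back around the encoded host.
    if parts.port is not None:
        netloc += f":{parts.port}"
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += f":{parts.password}"
        netloc = f"{userinfo}@{netloc}"

    # Path, query and fragment are left alone.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(encode_uri("http://موقع.وزارة-الاتصالات.مصر/"))
```

Run on the link from the top of this issue, this should print the `xn--` form quoted there, with everything after the host untouched.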

That shouldn't be too hard ;)

@Raven24
Member Author

Raven24 commented Mar 29, 2012

nice, I got it pretty much working.
I shall be waiting to submit a pull request until my earlier tests are back ;)

@maxwell
Member

maxwell commented Mar 29, 2012

could you just include them in this new pull request to fix it?

@Raven24
Member Author

Raven24 commented Mar 29, 2012

in post_view_spec.js or stream_post_spec.js?
;)

@maxwell
Member

maxwell commented Mar 29, 2012

Good question. It is generic to anything being markdowned, right? So post_view_spec.js, I think.

@Raven24
Member Author

Raven24 commented Mar 29, 2012

done, see #3084

Raven24 closed this as completed Mar 29, 2012