utf8 encoding problem in casper.getCurrentUrl() and document.location.href #276

Closed
maerten opened this Issue Nov 14, 2012 · 3 comments

2 participants
Contributor

maerten commented Nov 14, 2012

We found that UTF-8 characters like "ö" get replaced by a question mark when using document.location.href inside a casper.evaluate(func) block. The same happens with casper.getCurrentUrl() (which probably uses document.location.href internally).

CasperJS version 1.0.0-RC1, tested on both OS X and Ubuntu 12.04.
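
The report depends on how a non-ASCII character like "ö" is represented in a URL. The following is a minimal Node.js sketch (plain Node, not CasperJS; the variable names are illustrative). It shows three things: "ö" is two bytes in UTF-8, its percent-encoded form %C3%B6 is just those two bytes written as %XX, and decoding the same bytes with the wrong charset (Latin-1 here) yields "Ã¶" instead of the original letter. It does not reproduce the "?" itself. It only illustrates that the bytes have to be decoded as UTF-8 somewhere along the way.

```javascript
// Node.js sketch: how "ö" (U+00F6) is represented on the wire.
const word = 'ö';

// UTF-8 encodes U+00F6 as two bytes: 0xC3 0xB6.
const utf8Bytes = Buffer.from(word, 'utf8');

// Percent-encoding writes each UTF-8 byte as %XX (uppercase hex).
const percentForm = Array.from(utf8Bytes)
  .map(function (b) { return '%' + b.toString(16).toUpperCase().padStart(2, '0'); })
  .join('');

// Reading the same bytes as Latin-1 instead of UTF-8 produces mojibake.
const misread = utf8Bytes.toString('latin1');

console.log(Array.from(utf8Bytes));                     // [ 195, 182 ]
console.log(percentForm);                               // %C3%B6
console.log(percentForm === encodeURIComponent(word));  // true
console.log(misread);                                   // Ã¶
```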

Owner

n1k0 commented Nov 14, 2012

Can you please try updating to the latest master? Commit ace3fe0 may have fixed your issue since RC1.

Contributor

maerten commented Nov 14, 2012

Same results with master. I made a test you can try:

var casper = require('casper').create({
    clientScripts: [ 'jquery.min.js' ]
});

casper.start('http://localhost:3000/', function() { /* this.echo('Opened URL at testserver: ' + this.getCurrentUrl()); */ });

casper.then(function() {
  var link_url = this.evaluate(function() { return $('#testlink').attr('href'); });

  // this prints the link URL correctly, including the 'ö'
  this.echo("Found one link with URL: " + link_url);

  this.thenOpen('http://localhost:3000' + link_url, function() {

    // this shows a questionmark instead of the 'ö'
    this.echo("Opened link. Current URL: " + this.getCurrentUrl());
  });
});


casper.run(function() {
  this.exit();
});
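
In the script above, the scraped href is concatenated onto the base URL as-is, so the raw "ö" is handed to thenOpen(). One option the thread does not try is percent-encoding the href first. The sketch below assumes that sending an already-encoded path avoids the problem, which nobody verified here. `toRequestUrl` is a hypothetical helper. It uses only the standard `encodeURIComponent` global and ES5 syntax, so it should also run inside a PhantomJS-based CasperJS script. It encodes only runs of non-ASCII characters, so existing `%XX` escapes such as `%2F` are not double-encoded.

```javascript
// Hypothetical helper: build an absolute URL from a scraped href,
// percent-encoding only non-ASCII characters (as their UTF-8 bytes).
// Existing %XX escapes and reserved characters are left untouched.
function toRequestUrl(base, href) {
  var encodedHref = href.replace(/[^\x00-\x7F]+/g, function (run) {
    return encodeURIComponent(run);
  });
  return base + encodedHref;
}

var href = '/freie-stellen/maschinenbau/konstrukteur-fördertechnik-275/';
console.log(toRequestUrl('http://localhost:3000', href));
// http://localhost:3000/freie-stellen/maschinenbau/konstrukteur-f%C3%B6rdertechnik-275/
```

In the CasperJS script, this would replace the plain concatenation, e.g. `this.thenOpen(toRequestUrl('http://localhost:3000', link_url), ...)`. Whether that changes what getCurrentUrl() reports was not tested in this thread.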

And a Node.js test server:

var express = require('express')
  , app = express.createServer();

app.use(express.bodyParser());
app.get('/', function(req, res){
  console.log("Client requested: "  + req.url);
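  // res.header() takes the header name and value as separate arguments
  res.header('Content-Type', 'text/html; charset=utf-8');
  res.end('<!doctype html><html><head><meta charset="utf-8"></head><body><a id="testlink" href="/freie-stellen/maschinenbau/konstrukteur-fördertechnik-275/"></a></body></html>');
});
app.get('/*', function(req, res) {
  console.log("Client requested: "  + req.url);
  // end the response so the client does not hang waiting for a reply
  res.end();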
console.log('--- webserver running at localhost:3000');
app.listen(3000);

Something else came up: Node.js reports that the requested URL is malformed, so there may be more going on here. In the actual CasperJS script I am using, though, the page with the UTF-8 URL is requested without problems (I think the target site is running on IIS/ASP).
I hope this helps!
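
The thread does not show the exact "malformed" message. One common source of it in Node.js is `decodeURIComponent`, which throws a `URIError` ("URI malformed") when a `%XX` escape does not form valid UTF-8. Assuming that is what happened, here is a minimal sketch of decoding a request path for logging without crashing. `safeDecodePath` is a hypothetical helper, not part of Express.

```javascript
// Hypothetical helper: decode a request path for logging, falling back
// to the raw string when the percent-escapes are not valid UTF-8.
function safeDecodePath(rawUrl) {
  try {
    return { ok: true, path: decodeURIComponent(rawUrl) };
  } catch (err) {
    if (err instanceof URIError) {
      // e.g. a lead byte "%C3" that is not followed by a continuation byte
      return { ok: false, path: rawUrl };
    }
    throw err;
  }
}

console.log(safeDecodePath('/konstrukteur-f%C3%B6rdertechnik-275/'));
// { ok: true, path: '/konstrukteur-fördertechnik-275/' }
console.log(safeDecodePath('/konstrukteur-f%C3rdertechnik-275/'));
// { ok: false, path: '/konstrukteur-f%C3rdertechnik-275/' }
```

In the test server, the logging line could then read `console.log("Client requested: " + safeDecodePath(req.url).path);`.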

Owner

n1k0 commented Nov 15, 2012

I think the problem is related to your Node.js/server environment:

var casper = require('casper').create();

casper.start('https://www.google.fr/#q=f%C3%B6rdertechnik&fp=1', function() {
    this.echo(this.getCurrentUrl());
});

casper.thenOpen('https://www.google.fr/#q=fördertechnik&fp=1', function() {
    this.echo(this.getCurrentUrl());
});

casper.run();

Test:

$ casperjs c.js
https://www.google.fr/search?q=fördertechnik&cad=h
https://www.google.fr/search?q=fördertechnik&cad=h
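
This test opens the same Google search twice, once with "ö" percent-encoded and once written raw, and both runs report the same URL. The following Node.js sketch uses the WHATWG `URL` class. It illustrates why the two spellings are interchangeable, but it does not show what PhantomJS does internally. A standards-compliant parser percent-encodes non-ASCII characters in the fragment, so both inputs serialize to the same string.

```javascript
// Node.js sketch: the raw and percent-encoded spellings name the same URL.
const raw = new URL('https://www.google.fr/#q=fördertechnik&fp=1');
const encoded = new URL('https://www.google.fr/#q=f%C3%B6rdertechnik&fp=1');

// The parser percent-encodes non-ASCII fragment characters as UTF-8 bytes,
// so both inputs end up with the same serialization.
console.log(raw.href === encoded.href);  // true
console.log(raw.hash);                   // #q=f%C3%B6rdertechnik&fp=1

// Decoding the fragment gives the human-readable form back.
console.log(decodeURIComponent(raw.hash));  // #q=fördertechnik&fp=1
```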

n1k0 closed this Nov 15, 2012

hubpan pushed a commit to hubpan/casperjs that referenced this issue Feb 7, 2014
