Character problems with character encoding of iso-8859-1 sites #1543

Closed
darkcattz opened this issue Apr 5, 2018 · 23 comments

Comments

@darkcattz darkcattz commented Apr 5, 2018

  • Operating System: Mac OS
  • Cypress Version: Last
  • Browser Version: Chrome and Electron

Hello,

My site is not yet in UTF-8, and I have problems with accented characters (charset iso-8859-1) when I use Cypress. It's not really blocking, except when doing regular expression searches:

HTML HEADER CHARSET :

<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">

Result :

Les sites ont bien �t� supprim�s
� Tous droits r�serv�s � 
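For context, a minimal Node.js sketch (not part of the original report) of how output like this can arise: an accented character such as é is the single byte 0xE9 in iso-8859-1, and when those bytes are decoded as UTF-8 instead, each lone 0xE9 is an invalid sequence and is replaced with U+FFFD, the � character.

```javascript
// "été" as ISO-8859-1 bytes: 0xE9 0x74 0xE9
const bytes = Buffer.from('été', 'latin1')

// Decoding with the wrong charset: each lone 0xE9 is invalid UTF-8
// and is replaced with U+FFFD (rendered as �)
const asUtf8 = bytes.toString('utf8')

// Decoding with the declared charset restores the original text
const asLatin1 = bytes.toString('latin1')

console.log(asUtf8)   // contains U+FFFD replacement characters
console.log(asLatin1) // été
```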

Thanks
Regards

@victorjspinto victorjspinto commented Jul 2, 2018

Same problem with me.

I'm trying to write some tests for an old platform with charset iso-8859-1.

@Longtrainz Longtrainz commented Aug 19, 2018

Same problem with windows-1251

@cannibalcow cannibalcow commented Sep 14, 2018

Why is this labeled as a feature? Is this not a bug?

@dudevictor dudevictor commented Jan 14, 2019

I'm also facing this issue. I'm trying to write tests for an old app that uses ISO-8859-1. Any workaround?

@jennifer-shehane jennifer-shehane changed the title Character problems with no utf8 site Character problems with character encoding of iso-8859-1 sites Jan 15, 2019
Member

@jennifer-shehane jennifer-shehane commented Jan 15, 2019

I'm having a hard time replicating this behavior with the example characters I've tried so far. Could any of you provide the exact html content that will print as � within Cypress?

@dudevictor dudevictor commented Jan 15, 2019

Hello @jennifer-shehane thanks for your time. I've uploaded the repository https://github.com/dudevictor/cypress-character-problem that shows this issue.

Besides the file being encoded in iso-8859-1, I've noticed that the server must also return the Content-Type header with charset=iso-8859-1 for the issue to occur.

I noticed there is an app called 'runner' that shows the content inside the Cypress app. If the solution isn't too complex, you could give me some directions and I could try to solve it.
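Based on that description, a minimal Node.js server along these lines should reproduce the two conditions (ISO-8859-1 bytes on the wire plus a `charset=iso-8859-1` Content-Type header). This is a hypothetical sketch, not the contents of the linked repository; the page text is reconstructed from the original report.

```javascript
const http = require('http')

// Page text reconstructed from the original report
const html = '<!DOCTYPE html><html><body><p>Les sites ont bien été supprimés</p></body></html>'

// Encode as ISO-8859-1 (Node's 'latin1' encoding): é becomes the single byte 0xE9
const body = Buffer.from(html, 'latin1')

const server = http.createServer((req, res) => {
  res.writeHead(200, {
    // Declaring the charset in the header is part of what triggers the issue
    'Content-Type': 'text/html; charset=iso-8859-1',
    'Content-Length': body.length
  })
  res.end(body)
})
```

To use it, call `server.listen(8080)` and point `cy.visit('http://localhost:8080')` at it; the port is an arbitrary choice.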

Member

@jennifer-shehane jennifer-shehane commented Jan 15, 2019

@dudevictor Thank you! All of the characters I was trying previously were working, so this is extra helpful. I'd be happy to help with any directions you need working on the repo if you have any leads.

From everything I've read, iso-8859-1 is meant to be decoded as windows-1252 per the spec.

I really thought this might be the problem, so I checked document.characterSet, which reports the character encoding used to render the page. I printed it for the application under test both within Cypress and when visiting the application directly, and both report the correct windows-1252.

cy.document().its('characterSet').should('include', 'windows-1252') // passes
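The label mapping mentioned above can be checked with the Encoding API, assuming a runtime whose `TextDecoder` ships full ICU data (current browsers and the official Node.js builds do). The `iso-8859-1` label resolves to `windows-1252`; the two only differ in bytes 0x80 to 0x9F, where for example 0x80 is € in windows-1252 but a C1 control character in strict ISO-8859-1.

```javascript
// The "iso-8859-1" label is an alias for windows-1252 in the Encoding Standard
const decoder = new TextDecoder('iso-8859-1')
console.log(decoder.encoding) // windows-1252

// Byte 0x80 decodes to the euro sign under windows-1252...
const euroByte = new Uint8Array([0x80])
const viaLabel = decoder.decode(euroByte)

// ...whereas Node's 'latin1' Buffer encoding maps each byte straight to its code point
const viaBuffer = Buffer.from(euroByte).toString('latin1')

console.log(viaLabel)               // €
console.log(viaBuffer === '\u0080') // true
```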

The Content-Type header also looks correct in the Network panel: content-type: text/html;charset=iso-8859-1

One thing that does stand out is that Cypress serves the page with Content-Encoding: gzip, which is absent when visiting localhost directly, but I don't think that should be related to the issue. So now I'm at a dead end.

Member

@jennifer-shehane jennifer-shehane commented Jan 16, 2019

Some more thoughts on this. The prevailing theory now is that since we are gzipping and sending chunked content, the chunking may assume one charset when it should assume another, which could cause the content to be chunked at the wrong byte boundaries (since a character can take a different number of bytes in different charsets).

That may not be the greatest explanation, but basically we think there is something going wrong in the chunking.
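The general failure mode behind that theory can be illustrated in isolation (this is not Cypress's proxy code). With a multi-byte encoding such as UTF-8, decoding each chunk independently breaks any character whose bytes straddle a chunk boundary, whereas a `TextDecoder` called with `{ stream: true }` holds incomplete bytes until the next chunk arrives.

```javascript
// "café": the é is two bytes in UTF-8 (0xC3 0xA9), split across two chunks
const chunks = [new Uint8Array([0x63, 0x61, 0x66, 0xc3]), new Uint8Array([0xa9])]

// Naive: a fresh decode per chunk cannot see the rest of the character
const naive = chunks.map(chunk => new TextDecoder('utf-8').decode(chunk)).join('')

// Streaming: the decoder keeps partial byte sequences between calls
const decoder = new TextDecoder('utf-8')
let streamed = ''
for (const chunk of chunks) {
  streamed += decoder.decode(chunk, { stream: true })
}
streamed += decoder.decode() // flush any bytes still buffered

console.log(naive)    // "caf" followed by two U+FFFD replacement characters
console.log(streamed) // café
```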

@opensas opensas commented Mar 17, 2019

I have a similar issue. I reported it here and added a very simple HTML page to reproduce the bug:

<html>
<head>
  <meta http-equiv="Content-Type" content="text/html">
  <meta charset="windows-1252">
</head>
<body>
  <h1>Character encoding failing test: á é í ó ú ñ</h1>
</body>
</html>

If this is the case, can anybody point me to a workaround to sidestep this issue until it gets fixed?

I'm testing for the presence of the following text in a span like this:

    cy.get('div.flash_warning span')
      .should('have.text', 'El código de la aplicacion no puede estar vacío.')

Which is failing because of the broken encoding.

Is there some way to test for something like this?

    cy.get('div.flash_warning span')
      .should('have.text', 'El c?digo de la aplicacion no puede estar vac?o.')

That would allow me to work around this issue, and I could easily build a helper function that would replace the troublesome characters. I hope I made myself clear.


Update: this is the best workaround I could find so far. If anybody has a better alternative, I'd be grateful.

describe('playing with regular expressions', () => {
  it.only('should match by regular expression', () => {
    cy.visit('http://localhost/metaSSC/cypress/regexp.html')
    cy.get('div.flash_warning span')
      .should('have.text', 'El registro no ha podido ser dado de alta.')
    cy.get('div.flash_error span')
      .contains(/^El c.digo de la aplicacion no puede estar vac.o\.$/) // match span text by regexp
  })
})

Also asked on Stack Overflow.

@opensas opensas commented Mar 17, 2019

On the other hand, I noticed this issue is labeled 'stage: needs information'. I'd gladly help with this if anybody can tell me what information is missing.

Member

@jennifer-shehane jennifer-shehane commented Mar 18, 2019

Hey @opensas - I do believe this issue is likely the same.

Honestly, this should be labeled as 'ready for work' on our side, since we do have a reproducible example. The cause is still unknown, though we had a theory.

The fact that this doesn't run correctly in Electron only is a helpful new piece of information.

I will close #3725 as a duplicate.

@opensas opensas commented Mar 18, 2019

I will close #3725 as a duplicate.

Sure, go ahead, I do hope you can work it out. Please let me know if there's anything I can do to help.

BTW, can anybody give me a clue on how to implement a custom command on cy like this:

cy.get('div.flash_error span')
      .containsWithEncoding('El código de la aplicacion no puede estar vacío.')

It would just build a regular expression replacing every problematic char with '.'

thanks a lot
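A minimal sketch of that approach as a plain function, so the generated pattern can be checked outside Cypress. The name `likePattern` and the default character list are illustrative assumptions. The search text is escaped first so that regexp metacharacters such as the trailing `.` match literally, and only then is each problematic character replaced with a `.` wildcard.

```javascript
// Hypothetical helper: build an anchored RegExp in which each "problematic"
// character matches any single character
function likePattern (search, chars = 'áéíóúñÁÉÍÓÚÑ') {
  // Escape regexp metacharacters so the rest of the text matches literally
  let source = search.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

  // Replace every occurrence of each problematic character with a wildcard
  for (const char of chars) {
    source = source.split(char).join('.')
  }

  return new RegExp('^' + source + '$')
}

const re = likePattern('El código de la aplicacion no puede estar vacío.')
console.log(re.source) // ^El c.digo de la aplicacion no puede estar vac.o\.$
```

The resulting pattern accepts the mis-decoded text (with � in place of ó and í) while still requiring a literal `.` at the end, and can be passed to `.contains()`.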

Member

@jennifer-shehane jennifer-shehane commented Mar 18, 2019

@opensas Look into our custom command documentation

@opensas opensas commented Mar 20, 2019

Thanks. Just for the record, this is the workaround I developed:

Cypress.Commands.add('containsLike', {
  prevSubject: true
}, (subject, search, chars) => {

  // Characters that may be mis-decoded; each one is matched with a '.' wildcard
  chars = chars || 'áéíóúñÁÉÍÓÚÑ'
  if (!Array.isArray(chars)) chars = chars.toString().split('')

  // Escape regexp metacharacters (such as a trailing '.') so they match literally
  search = search.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

  chars.forEach( char => {
    const repAllChars = new RegExp(char, 'g') // see: https://stackoverflow.com/a/17606289/47633
    search = search.replace(repAllChars, '.')
  })

  const regExp = new RegExp('^' + search + '$')
  return cy.wrap(subject).contains(regExp)
})

and I use it like this:

describe('my first test', () => {
  it.only('should pass', () => {
    cy.visit('http://localhost/xxxx/yyy.asp')
      .get('div.flash_error span')
      .containsLike('El código de la aplicacion no puede estar vacío.')
    // it runs .contains(/^El c.digo de la aplicacion no puede estar vac.o\.$/)
  })
})

@HeatherFlux HeatherFlux commented Mar 26, 2019

Hey, this has to do with obstructive code. To fix the issue, set "modifyObstructiveCode": false in your configuration file. This should fix any issues with weird charsets.

@opensas opensas commented Mar 27, 2019

I can confirm that setting modifyObstructiveCode to false does NOT fix the issue. This is my cypress.json:

{
  "modifyObstructiveCode": false,
  "browser": {
    "modifyObstructiveCode": false
  }
}

(I didn't know whether the setting goes at the root level or inside browser.)

and I also tried starting cypress with:

cypress open --config modifyObstructiveCode=false

None of them seemed to work

@HeatherFlux HeatherFlux commented Mar 27, 2019

Hmmm, sorry then. In my case, some chars were not being translated properly when running my application. https://docs.cypress.io/guides/references/configuration.html#modifyObstructiveCode
This section helped me solve the issue I was having with �tï-type characters in the bsdatepicker translations.

My config was laid out as such:
{ "modifyObstructiveCode": false }

@nagyzso94 nagyzso94 commented May 16, 2019

I have the same issue. Has anyone found a solution to this?
I tried setting modifyObstructiveCode to false in cypress.json, but that didn't help.

@simonmeggle simonmeggle commented Jul 11, 2019

Is there any news on this topic? I also have a website that is served in windows-1252; in Cypress, the charset becomes UTF-8. The German umlauts (ä, ö, ü) on this site are all displayed wrong (e.g. �).
Setting modifyObstructiveCode to false did not work for me either.

@cypress-bot cypress-bot bot commented Jul 15, 2019

The code for this is done in cypress-io/cypress#4698, but has yet to be released.
We'll update this issue and reference the changelog when it's released.

@simonmeggle simonmeggle commented Jul 16, 2019

Great to hear that you are working on the charset issue. As far as I can see, #4698 does not cover the windows-1252 charset. Is there any plan to do this too? Thanks...

Member

@flotwig flotwig commented Jul 16, 2019

Hey @simonmeggle, it does also fix win-1252 charset, along with any other charset you're likely to experience on the web (full list: https://github.com/ashtuchkin/iconv-lite/wiki/Supported-Encodings). I'll update the issue comment to clarify :)
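The general idea behind such a fix can be sketched in plain Node.js: decode the response body with the charset declared in its Content-Type header instead of assuming UTF-8. This is a hypothetical `decodeBody` helper that uses the built-in `TextDecoder` in place of iconv-lite; it is not the code from cypress-io/cypress#4698, and it assumes a runtime with full ICU so that labels like `iso-8859-1` are supported.

```javascript
// Hypothetical helper: decode a response body using the charset declared in
// its Content-Type header, falling back to UTF-8 when none is given
function decodeBody (body, contentType = '') {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType)
  const charset = match ? match[1] : 'utf-8'
  try {
    return new TextDecoder(charset).decode(body)
  } catch (err) {
    // TextDecoder throws a RangeError for unsupported labels; fall back to UTF-8
    return new TextDecoder('utf-8').decode(body)
  }
}

// The bytes of "été" in ISO-8859-1
const body = Buffer.from('été', 'latin1')
console.log(decodeBody(body, 'text/html; charset=iso-8859-1')) // été
console.log(decodeBody(body, 'text/html')) // UTF-8 fallback yields U+FFFD characters
```

iconv-lite covers more encodings than the Encoding API, which is one reason to prefer it in a proxy; the lookup-then-decode shape stays the same.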

@cypress-bot cypress-bot bot commented Jul 29, 2019

Released in 3.4.1.
