Character problems with character encoding of iso-8859-1 sites #1543

Open
darkcattz opened this Issue Apr 5, 2018 · 17 comments

Comments

8 participants
@darkcattz

darkcattz commented Apr 5, 2018

  • Operating System: Mac OS
  • Cypress Version: Last
  • Browser Version: Chrome and Electron

Hello,

My site is not yet in UTF-8, and accented characters are garbled when I use Cypress (the page declares charset iso-8859-1). It's not really blocking, except when doing regular expression searches:

HTML HEADER CHARSET :

meta http-equiv="content-type" content="text/html; charset=iso-8859-1"

Result :

Les sites ont bien �t� supprim�s
� Tous droits r�serv�s � 
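
The replacement characters above are what a UTF-8 decoder emits for bytes that are not valid UTF-8. Here is a minimal Node.js sketch of that effect. It assumes (this is not confirmed in the report) that the ISO-8859-1 bytes are being decoded as UTF-8 somewhere between the server and the rendered page. The sample string is taken from the output above.

```javascript
// Encode a string as ISO-8859-1 (one byte per character), as a legacy server would.
const latin1Bytes = Buffer.from('Les sites ont bien été supprimés', 'latin1');

// Decode those bytes with the wrong charset. In UTF-8, the byte 0xE9
// ('é' in ISO-8859-1) starts a multi-byte sequence, so each occurrence
// that is not followed by continuation bytes is replaced with U+FFFD.
const misdecoded = new TextDecoder('utf-8').decode(latin1Bytes);

// Decoding with the charset the server declared round-trips cleanly.
const correct = latin1Bytes.toString('latin1');

console.log(misdecoded);
console.log(correct);
```

If the charset mismatch is indeed the cause, the first line printed should show the same pattern of replacement characters as the report.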

Thanks
Regards

@victorjspinto

victorjspinto commented Jul 2, 2018

Same problem with me.

I'm trying to write some tests for an old platform with charset iso-8859-1.

@Longtrainz

Longtrainz commented Aug 19, 2018

Same problem with windows-1251

@cannibalcow

cannibalcow commented Sep 14, 2018

Why is this labeled as a feature? Is this not a bug?

@dudevictor

dudevictor commented Jan 14, 2019

I'm also facing this issue. I'm trying to write tests for an old app that uses ISO-8859-1. Is there any workaround?

@jennifer-shehane jennifer-shehane changed the title from "Character problems with no utf8 site" to "Character problems with character encoding of iso-8859-1 sites" Jan 15, 2019

@jennifer-shehane

Member

jennifer-shehane commented Jan 15, 2019

I'm having a hard time replicating this behavior with the example characters I've tried so far. Could any of you provide the exact html content that will print as � within Cypress?

@dudevictor

dudevictor commented Jan 15, 2019

Hello @jennifer-shehane thanks for your time. I've uploaded the repository https://github.com/dudevictor/cypress-character-problem that shows this issue.

Besides the file needing to be encoded as iso-8859-1, I've noticed that the server must also return the Content-Type header with charset=iso-8859-1 for the issue to occur.
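
For anyone who wants to reproduce those two conditions without cloning the repository, here is a rough Node.js sketch. It is not taken from the linked repository: the page content, the function names `latin1Response` and `startServer`, and the port are all hypothetical choices made for illustration.

```javascript
// Hypothetical page content; any text with accented characters will do.
const html = '<html><body><h1>Les sites ont bien été supprimés</h1></body></html>';

// Build the response the way a legacy server would: ISO-8859-1 bytes on the
// wire and a Content-Type header that declares that charset.
function latin1Response() {
  const body = Buffer.from(html, 'latin1');
  const headers = {
    'Content-Type': 'text/html; charset=iso-8859-1',
    'Content-Length': body.length,
  };
  return { headers, body };
}

// Serve the page on every request; the default port is an arbitrary choice.
async function startServer(port = 8080) {
  const http = await import('node:http');
  const { headers, body } = latin1Response();
  return http
    .createServer((req, res) => {
      res.writeHead(200, headers);
      res.end(body);
    })
    .listen(port);
}
```

Calling `startServer()` and pointing `cy.visit()` at `http://localhost:8080` should then exercise both conditions, assuming nothing else is already listening on that port.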

I noticed there is an app called 'runner' that shows the content inside the Cypress app. If the solution isn't too complex, you could give me some directions and I could try to solve it.

@jennifer-shehane

Member

jennifer-shehane commented Jan 15, 2019

@dudevictor Thank you! All of the characters I was trying previously were working, so this is extra helpful. I'd be happy to help with any directions you need working on the repo if you have any leads.

From everything I've read, the iso-8859-1 is meant to be parsed as windows-1252 per the spec.

I really thought this might be the problem, so I looked at document.characterSet, which reports the character encoding used to render the page. I printed it both from Cypress, within the application under test, and from the application opened directly, and in both cases it prints the expected windows-1252.

cy.document().its('characterSet').should('include', 'windows-1252') // passes

The content-type also seems to be printing fine in the Network panel: content-type: text/html;charset=iso-8859-1

I suppose one thing that does stand out is the content-encoding in Cypress of Content-Encoding: gzip, which does not exist when visiting on localhost, but I don't think this should be related to the issue. And now I'm at a dead end.
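
The label mapping mentioned above can also be checked outside the browser. This is a small sketch, assuming a Node.js build with full ICU (the default in current releases), whose `TextDecoder` follows the same WHATWG Encoding Standard label table that browsers use:

```javascript
// Per the WHATWG Encoding Standard, the label "iso-8859-1" selects windows-1252.
const decoder = new TextDecoder('iso-8859-1');
console.log(decoder.encoding);

// The bytes for "é í ñ" are identical in ISO-8859-1 and windows-1252.
const bytes = new Uint8Array([0xe9, 0x20, 0xed, 0x20, 0xf1]);
const text = decoder.decode(bytes);
console.log(text);
```

This only confirms that the label resolves as the spec describes. It says nothing about where the bytes get corrupted inside Cypress.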

@jennifer-shehane

Member

jennifer-shehane commented Jan 16, 2019

Some more thoughts on this. The prevailing theory now is that since we are gzipping and sending chunked content, the chunking may think it is of one charset when it should be set to another - this may be causing the content to be chunked at the incorrect byte size (since charsets have different byte sizes).

That may not be the greatest explanation, but basically we think there is something going wrong in the chunking.
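
For illustration only: the sketch below shows the general class of failure this theory describes, not Cypress's actual proxy code. It uses UTF-8 because it is a multi-byte encoding. Decoding each chunk independently corrupts a character that straddles a chunk boundary, whereas a single streaming decoder does not.

```javascript
// 'é' is two bytes in UTF-8 (0xC3 0xA9), so "café" is five bytes long.
const bytes = new TextEncoder().encode('café');

// Split the payload in the middle of the two-byte sequence.
const chunks = [bytes.subarray(0, 4), bytes.subarray(4)];

// Naive: a fresh decoder per chunk knows nothing about the previous chunk,
// so each half of the split sequence is treated as invalid input.
const naive = chunks.map((c) => new TextDecoder('utf-8').decode(c)).join('');

// Streaming: one decoder buffers the incomplete sequence until the next chunk.
const streaming = new TextDecoder('utf-8');
const joined =
  chunks.map((c) => streaming.decode(c, { stream: true })).join('') +
  streaming.decode(); // final call flushes any buffered bytes

console.log(naive);
console.log(joined);
```

Whether anything like this happens in the gzip and chunking path is still only a hypothesis at this point in the thread.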

@opensas

opensas commented Mar 17, 2019

I have a similar issue. I reported it here and added a very simple HTML page to reproduce the bug:

<html>
<head>
  <meta http-equiv="Content-Type" content="text/html">
  <meta charset="windows-1252">
</head>
<body>
  <h1>Character encoding failing test: á é í ó ú ñ</h1>
</body>
</html>

If this is the case, can anybody point me to a workaround to sidestep this issue until it gets fixed?

I'm testing for the presence of the following text in a span like this:

    cy.get('div.flash_warning span')
      .should('have.text', 'El código de la aplicacion no puede estar vacío.')

Which is failing because of the broken encoding.

Is there some way to test for something like this?

    cy.get('div.flash_warning span')
      .should('have.text', 'El c?digo de la aplicacion no puede estar vac?o.')

That would allow me to work around this issue, and I could easily build a helper function that would replace the troublesome characters. I hope I made myself clear.
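
One possible shape for such a helper is sketched below. The name `maskNonAscii` is hypothetical. It assumes each mangled character appears in the page text as exactly one non-ASCII character (for example U+FFFD), so masking both the expected and the actual text makes them comparable.

```javascript
// Replace every non-ASCII character (including U+FFFD) with '?'.
// Normalizing to NFC first keeps an accented letter as a single character.
function maskNonAscii(text) {
  return text.normalize('NFC').replace(/[^\x00-\x7F]/g, '?');
}

const expected = 'El código de la aplicacion no puede estar vacío.';
// What the page text might look like when the accented characters are mangled.
const mangled = 'El c\uFFFDdigo de la aplicacion no puede estar vac\uFFFDo.';

console.log(maskNonAscii(expected));
console.log(maskNonAscii(expected) === maskNonAscii(mangled));
```

In a test this could be applied to both sides, for instance by reading the element text with `.invoke('text')` and comparing `maskNonAscii(text)` against `maskNonAscii(expected)` inside a `.then()` callback. The trade-off is that the assertion no longer distinguishes between different accented characters.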


Update: this is the best workaround I could find so far. If anybody has a better alternative, I'd be grateful.

describe('playing with regular expressions', () => {
  it.only('should match by regular expression', () => {
    cy.visit('http://localhost/metaSSC/cypress/regexp.html')
    cy.get('div.flash_warning span')
      .should('have.text', 'El registro no ha podido ser dado de alta.')
    cy.get('div.flash_error span')
      .contains(/^El c.digo de la aplicacion no puede estar vac.o\.$/) // match span text by regexp
  })
})

I also asked on SO.

@opensas

opensas commented Mar 17, 2019

On the other hand, I noticed this issue is labeled stage: needs information. I would gladly help with this if anybody can tell me what information is missing.

@jennifer-shehane

Member

jennifer-shehane commented Mar 18, 2019

Hey @opensas - I do believe this issue is likely the same.

Honestly, this should be labeled as 'ready for work' on our side, since we do have a reproducible example. The cause is still unknown, although we had a theory.

The fact that this doesn't run correctly in Electron only is a helpful new piece of information.

I will close #3725 as a duplicate.

@opensas

opensas commented Mar 18, 2019

> I will close #3725 as a duplicate.

Sure, go ahead, I do hope you can work it out. Please let me know if there's anything I can do to help.

BTW, can anybody give a clue on how to implement a custom extension to cy like this:

cy.get('div.flash_error span')
      .containsWithEncoding('El código de la aplicacion no puede estar vacío.')

It would just build a regular expression, replacing every problematic character with '.'.

thanks a lot

@jennifer-shehane

Member

jennifer-shehane commented Mar 18, 2019

@opensas Look into our custom command documentation

@opensas

opensas commented Mar 20, 2019

thanks, just for the record, this is the workaround I developed:

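Cypress.Commands.add('containsLike', {
  prevSubject: true
}, (subject, search, chars) => {

  // Characters known to be mangled; override via the optional third argument
  chars = chars || 'áéíóúñÁÉÍÓÚÑ'
  if (!Array.isArray(chars)) chars = chars.toString().split('')

  // Escape regex metacharacters first, so a literal '.' or '?' in the
  // search text is matched literally rather than as a wildcard
  search = search.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

  chars.forEach( char => {
    const repAllChars = new RegExp(char, 'g') // see: https://stackoverflow.com/a/17606289/47633
    // Replace each problematic character with the '.' wildcard
    search = search.replace(repAllChars, '.')
  })

  // Anchor the pattern so the whole element text must match
  const regExp = new RegExp('^' + search + '$')
  return cy.wrap(subject).contains(regExp)
})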

and I use it like this:

describe('my first test', () => {
  it.only('should pass', () => {
    cy.visit('http://localhost/xxxx/yyy.asp')
      .get('div.flash_error span')
      .containsLike('El código de la aplicacion no puede estar vacío.')
// it runs .contains(/^El c.digo de la aplicacion no puede estar vac.o\.$/)
  })
})
@HeatherFlux

HeatherFlux commented Mar 26, 2019

Hey, this has to do with obstructive code. To fix the issue, set "modifyObstructiveCode": false in your configuration file. This should fix any issues with weird charsets.

@opensas

opensas commented Mar 27, 2019

I can confirm that setting modifyObstructiveCode to false does NOT fix the issue. This is my cypress.json:

{
  "modifyObstructiveCode": false,
  "browser": {
    "modifyObstructiveCode": false
  }
}

(I didn't know whether the setting goes at the root level or inside browser.)

and I also tried starting cypress with:

cypress open --config modifyObstructiveCode=false

None of them seemed to work

@HeatherFlux

HeatherFlux commented Mar 27, 2019

Hmmm, sorry then. I was having issues with some characters not being rendered properly when running my application. https://docs.cypress.io/guides/references/configuration.html#modifyObstructiveCode
This section helped me solve the issue I was having with characters like �tï in the translations of bsdatepicker.

My config was laid out as such:
{ "modifyObstructiveCode": false }
