Case Sensitive / Capital Characters #27

Closed
julkue opened this issue Jan 25, 2018 · 6 comments

@julkue
Member

julkue commented Jan 25, 2018

Currently we have output like this:

"ü": {
  "mapping": {
    "base": "u",
    "decompose": {
      "titleCase": "Ue",
      "upperCase": "UE",
      "lowerCase": "ue"
    }
  }
},

To keep the behavior of mark.js' regular expression creation, we need to differentiate between capital and non-capital characters. For the example above, I could imagine output like:

"ü": {
  "capital": false,
  "mapping": {
    "base": "u",
    "decompose": {
      "lowerCase": "ue"
    }
  }
}

So, a new property capital is added, and titleCase as well as upperCase are removed. For the opposite case, the capital character:

"Ü": {
  "mapping": {
    "base": "U",
    "decompose": {
      "titleCase": "Ue",
      "upperCase": "UE",
      "lowerCase": "ue"
    }
  }
}

it could be:

"Ü": {
  "capital": true,
  "mapping": {
    "base": "U",
    "decompose": {
      "titleCase": "Ue",
      "upperCase": "UE"
    }
  }
}
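To make the proposed transformation concrete, here is a minimal TypeScript sketch that converts an entry from the current format into the proposed one. It assumes a character's case can be read from JavaScript's toLowerCase() (a character counts as capital if lower-casing changes it). The names DecomposeForms, DiacriticEntry, isCapital and toProposed are hypothetical and not part of the diacritics database or mark.js.

```typescript
// Hypothetical types modelled on the JSON entries shown above.
interface DecomposeForms {
  titleCase?: string;
  upperCase?: string;
  lowerCase?: string;
}

interface DiacriticEntry {
  capital?: boolean;
  mapping: { base: string; decompose?: DecomposeForms };
}

// A character counts as capital if lower-casing changes it ("Ü" -> "ü").
function isCapital(ch: string): boolean {
  return ch !== ch.toLowerCase();
}

// Add `capital` and keep only the decompose forms that match the case.
function toProposed(ch: string, entry: DiacriticEntry): DiacriticEntry {
  const capital = isCapital(ch);
  const d = entry.mapping.decompose;
  let decompose: DecomposeForms | undefined = undefined;
  if (d !== undefined) {
    decompose = capital
      ? { titleCase: d.titleCase, upperCase: d.upperCase }
      : { lowerCase: d.lowerCase };
  }
  return { capital, mapping: { base: entry.mapping.base, decompose } };
}

const current: DiacriticEntry = {
  mapping: {
    base: "u",
    decompose: { titleCase: "Ue", upperCase: "UE", lowerCase: "ue" },
  },
};
console.log(JSON.stringify(toProposed("ü", current)));
// {"capital":false,"mapping":{"base":"u","decompose":{"lowerCase":"ue"}}}
```

Deriving capital from the character itself keeps the flag consistent with the key, rather than relying on a hand-maintained value.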

What do you think @Mottie?

Btw.: I've noticed that the equivalents generation is inconsistent. We have always used camelCase keys (e.g. "languageNative"), but in the equivalents part we have snake_case keys such as html_decimal.

@julkue julkue added this to In Progress in Database Jan 25, 2018
@Mottie
Member

Mottie commented Jan 25, 2018

Adding a capital value would be a good idea, but I think it would be better to keep all decompose values in case the user switches the case of the character.

I'll update the database to use camel-cased keys.
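Renaming snake_case keys such as html_decimal to camelCase can be done mechanically. The sketch below is a hypothetical helper (snakeToCamel and camelCaseKeys are illustrative names, not existing database tooling) that recursively renames the keys of a parsed JSON value; the sample object is made up for illustration.

```typescript
// Convert a snake_case key such as "html_decimal" to camelCase ("htmlDecimal").
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

// Recursively rename every object key; arrays are mapped, primitives pass through.
function camelCaseKeys(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => camelCaseKeys(item));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) {
      out[snakeToCamel(k)] = camelCaseKeys(v);
    }
    return out;
  }
  return value;
}

console.log(JSON.stringify(camelCaseKeys({ html_decimal: "&#252;", languageNative: "Deutsch" })));
// {"htmlDecimal":"&#252;","languageNative":"Deutsch"}
```

Keys that are already camelCase, like languageNative, pass through unchanged, so the helper is safe to run over the whole file.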

@julkue
Member Author

julkue commented Jan 25, 2018

but I think it would be better to include all decompose values in case the user is switching the case of the character.

Could you please elaborate on this use case a bit?

@Mottie
Member

Mottie commented Jan 25, 2018

I know the value could be converted to the desired case before accessing the database, but we don't really know the use case of the database yet. If a developer wants to get the data for ü but wants the output to be all caps, we should have an upperCase value available.
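If only lowerCase were stored, a consumer who needs another case would have to derive it at runtime. A minimal sketch, assuming plain JavaScript case mapping is acceptable (deriveCases and DerivedForms are hypothetical names):

```typescript
// Hypothetical helper: rebuild all three decompose forms from a stored
// lower-case value, using plain JavaScript case mapping.
interface DerivedForms {
  titleCase: string;
  upperCase: string;
  lowerCase: string;
}

function deriveCases(lowerCase: string): DerivedForms {
  // "Title case" here means: first code point upper-cased, the rest unchanged.
  const [first = "", ...rest] = Array.from(lowerCase);
  return {
    titleCase: first.toUpperCase() + rest.join(""),
    upperCase: lowerCase.toUpperCase(),
    lowerCase,
  };
}

console.log(JSON.stringify(deriveCases("ue")));
// {"titleCase":"Ue","upperCase":"UE","lowerCase":"ue"}
```

Simple toUpperCase() is not a full Unicode titlecase mapping and can change string length (for example "ß".toUpperCase() is "SS"), so derived values may not always match curated ones; that is the trade-off being weighed here.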

@julkue
Member Author

julkue commented Jan 25, 2018

@Mottie From my point of view, we're currently doing everything necessary to get mark.js rolled out with the diacritics integration. If we have no use case for upperCase and it won't be included in the Node module, I don't see a reason to keep it.

@Mottie
Member

Mottie commented Jan 25, 2018

What do we do about es (Spanish) characters like ¿, which decomposes to ? in title, upper and lower case alike? There is no case.

Edit: I set these as lower case and removed the titleCase and upperCase entries.
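One way to spot characters like ¿ automatically is to check whether case mapping changes them at all. A minimal sketch (hasCase is a hypothetical helper relying only on String.prototype.toLowerCase and toUpperCase):

```typescript
// True when lower- and upper-casing the character give different strings,
// i.e. the character participates in case mapping at all.
function hasCase(ch: string): boolean {
  return ch.toLowerCase() !== ch.toUpperCase();
}

for (const ch of ["¿", "ü", "Ü"]) {
  console.log(ch, hasCase(ch));
}
// ¿ false
// ü true
// Ü true
```

Because ¿ is unchanged by lower-casing, a "ch !== ch.toLowerCase()" capital check also classifies it as non-capital, which fits storing such characters as lower case only.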

@Mottie
Member

Mottie commented Jan 25, 2018

Please review.

@julkue julkue closed this as completed in ffc151d Feb 6, 2018
@julkue julkue moved this from In Progress to Done in Database Feb 8, 2018