TEI using outmoded ISO 5218 for sex value attribute #426

Closed
TEITechnicalCouncil opened this issue Jan 21, 2013 · 36 comments

Comments

@TEITechnicalCouncil

TEI uses ISO 5218:2004 to assign sexuality of persons in a document (with attribute values given as 1 for male, 2 for female, 9 for not applicable, and 0 for unknown). This is an outmoded and problematic representation of sexuality, and in particular formally assigns women to be secondary to men.

There are other discussions online regarding how best to tackle sexuality in markup, and the problems in using ISO 5218; see the W3C lists here: http://lists.w3.org/Archives/Public/public-contacts-coord/2010JulSep/0010.html

I would like to see TEI move away from enshrining women as the second sex in their markup - as Steven Ramsay tweeted:
<author>Simone de Beauvoir</author> <sex value="2">female</sex> sigh

Can a discussion be had about how best to achieve this? Your current approach is both outmoded and offensive.
best,
Melissa

Original comment by: @melissaterras

@TEITechnicalCouncil

I have long argued that ISO 5218 was inadequate for recording sex, not only because of the precedence of "male" in the numbering (although I've heard people interpret it the other way around--female is double the value of male--but that's a derailment), but because the values "unknown" and "not applicable" (which presumably can only refer to things like an anonymous blogger and a robot, respectively) are completely inadequate for representing the many other sexes and sexual-identities possible, including intersex, genderqueer, gender-neutral, fluid, trans* and many others.

I did ask around various online queer communities if there were other proposed open standards for representing sex more inclusively, but no one could think of any. (The problem being, of course, that any such standard would be inadequate.)

I agree very strongly with Melissa that we need a discussion about how to improve this situation, both at the TEI level, where we have a chance to improve the situation in the short term, and at ISO (which will no doubt take a lot longer). Failing any other external standard to adopt, I suggest the datatype of @sex be changed from data.sex to enumerated, allowing project-specific definition of sex (with documentation). This would allow anyone who is currently using ISO 5218 validly to continue doing so, but anyone who wants to do better to define their taxonomy in the teiHeader (perhaps a sexDesc element provided for the purpose?).

Original comment by: @gabrielbodard

@TEITechnicalCouncil

I have been arguing for this for the past 6 years, since the presentation of P5, and I don't buy the retrospective argument that 2 is assigned to a woman because it is twice as good as a man. The TEI should not be an accomplice to the sexism of ISO. Agreed with Melissa and Gabby.

Original comment by: sf_user_epierazzo

@TEITechnicalCouncil

I agree with Gabby's proposed solution to go to data.enumerated for sex/@value in the short term, and revise the prose of the Guidelines. Then we need a working group to come up with a proper solution.

Original comment by: @martindholmes

@TEITechnicalCouncil

My feeling is that we should stick to the principle that we re-use external standards where at all possible. There isn't an obvious other contender to a categorisation of sex than ISO, so if that's inadequate, lobby ISO, not TEI.

Why do people care? If you want to ignore the normalization to ISO, use the body of the element to say whatever is needed. If you want to use another normalization schema, redefine data.sex in your customization as usual.

By the way, I don't regard TEI as "them" or "you". It's "we" and "our" standard. Similarly, ISO.
Let's not fall into the "leave the EU, they take away all our rights" trap....

Original comment by: @sebastianrahtz

@TEITechnicalCouncil

I don't seriously make the argument that '2' is better than '1' because it is more. When I've said that, it is to point out how silly I find it to assume that a numbering system of 1 and 2 somehow implies precedence or order (especially when '9' is also used). IMHO, it doesn't truly 'assign women to be secondary to men', any more than expanding ISO 5218 to have, let's say, trans people as '3' would mean they were considered tertiary and below women. I'm sorry that you find it offensive. I've always felt that it is just a different number. It is simply an agreed machine-processable label. Yes, people can get offended by that, but that isn't inherent to the number itself but to the interpretations people place on it. I'm not saying those interpretations aren't real or don't have weight, validity or consequences. But that has to be balanced against using some ad hoc, linguistically specific system, which is why we moved away from 'm', 'f', 'u', and 'x'. There are many, many other possible ways that people could record the information about sex and/or gender using TEI should they wish to do so on a point of principle (using @ana to point to a full taxonomy of possibilities, for example). Or, as Sebastian points out, local implementations are free to change the values associated with data.sex if they wish. I've always felt unease at the limitation of the 4 values and would happily argue for updating it if an agreed standard could be formalised.

I'm not saying the TEI shouldn't change this, but I would instead be trying to get ISO to redefine the standard in some appropriate way and then TEI would, happily and without argument, implement that.

No one has suggested what possible values we should use, so discussion would need to develop a clear proposal. Part of the problem is that any numerical system has the perception of ordering, and alphabetic ones have linguistic culture-specific assumptions that we'd prefer to shy away from if possible. One possibility suggested to me was to code for chromosome types, XY for males and XX for females (which nicely gives a way to deal with sex chromosome abnormalities like Klinefelter syndrome). However, while this may work for biological sex identification, it does not deal with gender and/or sexual identification such as the list proposed by Gabby. I honestly do not know what the right values should be. If I was encoding texts where it was felt important to have more than the four categories, as I suggested, I would probably use a <taxonomy> with a range of values suitable to the task.

[Since Gabby has already done some research in this area, I'm assigning the ticket to him to make sure we don't lose sight of it. Marking it as group 'RED' at the moment because we'd need a clear proposal to discuss and it isn't at all clear what that proposal should be.]

Original comment by: @jamescummings

@TEITechnicalCouncil

  • assigned_to: nobody --> gabrielbodard
  • milestone: --> RED

Original comment by: @jamescummings

@TEITechnicalCouncil

I agree that TEI is not--and shouldn't be--in the business of creating new standards, but we are currently in the business of recommending the use of existing standards, and we should be careful that the standards we recommend are fit for the purpose our users are going to employ them for. It's clear that for several reasons ISO 5218 is not fit.

I do have a concrete proposal, in fact: change the datatype of person/@sex and sex/@value to data.enumerated, with instructions in the classSpec to use a project-defined or other standard taxonomy of sex (e.g. ISO 5218, or something better if it arises). This is backward compatible and doesn't prevent anyone who likes the current scheme from continuing to use it as their default normalization; it just deprivileges it.

Those of us who care can also petition ISO to improve or remove the inadequate standard (but good luck with that), or work with other communities to come up with a rival standard. It's not TEI's business to do that, though.

Original comment by: @gabrielbodard

@TEITechnicalCouncil

Having thought about this a bit more, Gabby's proposal (changing data.sex to data.enumerated) won't work transparently, because data.enumerated is data.name, and data.name is an XML name which cannot begin with a digit, and so ISO 5218 numerical values would become invalid. They could be prefixed with a letter, of course, but such a change would break backwards compatibility.
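
The constraint described above can be illustrated with a minimal Python sketch. It assumes a simplified, ASCII-only approximation of the XML Name production (real XML Names also admit many non-ASCII characters), and the names `XML_NAME` and `is_xml_name` are hypothetical, not part of any TEI tooling.

```python
import re

# Simplified, ASCII-only approximation of the XML Name production.
# Assumption: real XML Names allow many more characters; this is only a sketch.
XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_xml_name(value: str) -> bool:
    """Return True if value looks like an XML Name (which cannot start with a digit)."""
    return XML_NAME.fullmatch(value) is not None


# The four ISO 5218 codes as they would appear in an attribute value.
iso_5218_codes = ["0", "1", "2", "9"]

# Check the bare codes, then the same codes with a hypothetical letter prefix.
name_check = {code: is_xml_name(code) for code in iso_5218_codes}
prefixed_check = {f"s{code}": is_xml_name(f"s{code}") for code in iso_5218_codes}
```

Under this approximation none of the bare ISO 5218 codes passes the check, while letter-prefixed forms such as `s1` do, which is the backwards-compatibility trade-off noted above.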

Original comment by: @martindholmes

@TEITechnicalCouncil

That's a problem. This isn't the first time that the datatype of data.enumerated has turned out to be a problem (cf. discussion of datatype of @ed). Is there some other value we can use (e.g. data.code) that allows arbitrary enumerated values?

Why does data.enumerated need to start with an alphabetic character anyway?

Original comment by: @gabrielbodard

@TEITechnicalCouncil

The discussion Melissa has pointed to advocates an open-ended approach in which there are some suggested values ("male", "female") but the category is open so that users can express their sexuality or gender in a way that suits them. One option would be to create a new attribute with open data.enumerated values, and suggest some values. This could coexist alongside the @sex and sex/@value, which could then be deprecated if we wanted to make a clear statement of disapproval of 5218.

Naming this attribute would be problematic. @gender springs to mind, but the potential confusion between sex and gender, transsexual vs transgendered, etc. would probably rule it out. We need a lot of input from the community here.

Original comment by: @martindholmes

@TEITechnicalCouncil

It seems decidedly retrograde to go back to an open list of arbitrary tokens made up by each project as it deems fit. Either let's use tokens from a recognized authority or convention (as we do for dates and times, for example), or let's use the "pointer to a classification" system which we espouse elsewhere. So just as we say hand="#hand1", let's say sex="#sex1", where "sex1" is the ID of an item in a typology. People can then make up as many categories as they see fit, and making it a pointer forces them to decide which scheme to use.

Using arbitrary magic codes is the worst possible solution.

(My survey of people, asking them if they find the 1 and 2 thing offensive, so far yields more or less equal numbers of "I have no idea what you're talking about", "oh yes that old chestnut, but there are far more important issues to solve", and "it's an unordered set of arbitrary tokens, what's the issue".)

Original comment by: @sebastianrahtz

@TEITechnicalCouncil

I was amused to read that Sweden used to/uses a citizen identifier where "The number uses ten digits, YYMMDD-NNGC. The first six give the birth date in YYMMDD format. Digits seven to nine (NNG) are used to make the number unique, where digit nine (G) is odd for men and even for women." Scotland does something similar. It raises the possibility that one could use any old numbering system, but follow the convention of "odd is sort of like men and even is sort of like women" (leaving 0 or negative numbers for other uses). That would allow one to use 100 for women and 101 for men. A trivial function will return the ISO equivalent for those that want to map to it.
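
A minimal Python sketch of such a mapping function follows. The function name `to_iso5218` is hypothetical, and the treatment of 0 and negative numbers is an assumption: the comment only says they are left "for other uses", so here 0 is mapped to ISO 0 (not known) and negatives to ISO 9 (not applicable) purely for illustration.

```python
def to_iso5218(code: int) -> int:
    """Map a project-specific odd/even code to an ISO 5218 value.

    Convention from the discussion: odd means male (ISO 1), even means female (ISO 2).
    """
    if code == 0:
        return 0  # assumption: 0 stays "not known"
    if code < 0:
        return 9  # assumption: negatives are treated as "not applicable"
    return 1 if code % 2 == 1 else 2


# The example values mentioned in the comment: 100 for women, 101 for men.
women = to_iso5218(100)
men = to_iso5218(101)
```

A project could therefore use any positive integers internally and still export ISO 5218 values for interchange.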

http://en.wikipedia.org/wiki/National_identification_number is fascinating reading :-}

Original comment by: @sebastianrahtz

@TEITechnicalCouncil

Thanks for your comments on this, and glad to see that some are taking this seriously (when someone says that they are offended at something, it is generally useful to believe that they are offended, rather than telling them that they can't possibly be offended, or that there are better things to be doing).

I'm following this discussion with interest (although markup isn't my forte). I agree that using an outmoded standard, just because it is a standard, isn't a useful approach.

Fwiw, I'd be interested in working with someone on petitioning the ISO about this, if anyone else is willing to join forces.

Original comment by: @melissaterras

@TEITechnicalCouncil

ISO standards have a very detailed and carefully designed process, to make sure they don't just hang on for ever. This one was last examined and renewed by due process in 2004. I don't think the right process is to "petition ISO", however. They don't make standards, they merely publish the work done by their working groups, which are composed of representatives of the national standard bodies, i.e. the BSI in our case. So I'd suggest contacting BSI, and finding out when the next examination is due, and which the relevant committee is.

On a quick browse of 5218, it is very clear that this isn't a group of people sitting down and making up codes; it is (as is often the case) formalizing existing processes in member countries. My other investigation suggests that the convention odd=male, even=female is probably the origin of it. It may well be, then, a very uphill task indeed to argue for a revision.
Maybe the argument of offence caused would have an impact. Maybe the argument that sex is not regarded as binary these days would have an impact. Still, the point about formalizing existing processes remains; arguably, one has to first get a majority practice across the member countries of no longer regarding sex as a binary divide.

I cannot see which of the lengthy and detailed replies on the ticket is not taking it seriously, by the way.

I would not agree that "using an outmoded standard, just because it is a standard, isn't a useful approach". I'd argue that it is a great deal better than having no interchange of information at all. It is pretty obvious, isn't it, that the standard of doing our calendar based on the supposed birth of a Jewish prophet in a religion which is a minority worldwide is outmoded, but it's jolly useful!

Original comment by: @sebastianrahtz

@TEITechnicalCouncil

I think there are 6 answers here, some backward compatible and some not:

  1. stay aligned with ISO, adding a sentence or two apologizing for any offence caused and explaining more background

  2. allow any digits, except 0 and 9, and say that "odd means male, even means female"

  3. just change to an open value, taxonomy up to the project

  4. make data.sex into a pointer, and say that it must point to a taxonomy

  5. invent our own tokenization based on a non-ordered set (e.g. symbols, though almost any choice is open to causing offence; symbols are powerful beasts)

  6. some combination of the above, with several attributes; e.g. keep @sex as @isosex, and add a new @gender (or whatever) pointing at a taxonomy

Original comment by: @sebastianrahtz

@TEITechnicalCouncil

@melissaterras: I don't think anyone has said that you can't possibly be offended at this, you certainly can and as I said your interpretation has weight and validity, just as much as anyone else's. I believe we honestly don't want to cause offence to any of our users while still providing a useful and robust encoding scheme. It is tricky and would be much easier for the TEI if the ISO standard was changed.

@rahtz: You are right in your summary of ISO workings, I had forgotten that. I still think the right course of action is to get involved with any others out there who are working to change the ISO standard whether that is via the BSI or elsewhere.

I will encourage the rest of tei-council to comment on this (and if it seems reasonable later to draw attention to this ticket on TEI-L).

To spell out my proposed solution for those who don't want to use a numerical @sex attribute I would have used @ana with two URI pointers in it pointing to a taxonomy in the header (or elsewhere online):

<person ana="#idOfBiologicalSexualCategory #idOfGenderIdentification">...</person>

which would then point to a taxonomy with categories with the appropriate IDs. I include the (debatable) bio vs genderIdentification here simply to highlight that such an approach allows multiple vectors however the encoders feel would be useful to categorise their taxonomies. If we adopted @rahtz's proposal #4 then @sex would do this. The drawback is one of perceived interoperability -- a large corpus of texts from disparate projects would have to be normalised back into a single system (whatever that system is).
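
As an illustration of how such pointers could be resolved, here is a minimal Python sketch using only the standard library. The sample document is a hypothetical fragment, not a complete valid TEI file; the category descriptions are placeholders; and `resolve_ana` is a hypothetical helper that only handles local `#id` pointers.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: a taxonomy plus one person pointing into it via @ana.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <taxonomy xml:id="sexTaxonomy">
    <category xml:id="idOfBiologicalSexualCategory">
      <catDesc>placeholder description A</catDesc>
    </category>
    <category xml:id="idOfGenderIdentification">
      <catDesc>placeholder description B</catDesc>
    </category>
  </taxonomy>
  <person ana="#idOfBiologicalSexualCategory #idOfGenderIdentification"/>
</TEI>
"""


def resolve_ana(root):
    """Return, for each <person>, the catDesc text of every category its @ana points to."""
    categories = {
        cat.get(XML_ID): cat.findtext(f"{{{TEI_NS}}}catDesc")
        for cat in root.iter(f"{{{TEI_NS}}}category")
    }
    resolved = []
    for person in root.iter(f"{{{TEI_NS}}}person"):
        pointers = person.get("ana", "").split()
        # Only local fragment pointers ("#id") are handled in this sketch.
        resolved.append([categories.get(p[1:]) for p in pointers if p.startswith("#")])
    return resolved


root = ET.fromstring(SAMPLE.strip())
descriptions = resolve_ana(root)
```

Normalising a corpus drawn from several projects would still require mapping each project's category IDs onto a shared scheme, which is the interoperability drawback mentioned above.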

Of his suggestions:

  1. I'd vote for this one.
  2. So women become 5 and men become 4? (Or something else... it doesn't necessarily solve the problem) and makes interchange more difficult.
  3. possible but we lose the benefit of having a datatype in the first place.
  4. If not 1. then I'd vote for this one (with backwards compatibility arguments pending)
  5. This sounds like just a different can of worms
  6. Possible, but a more major change... I think we'd need more community input on how desired this change was.

Of all of them 1. is easiest, but may not really solve the problem of offence generated, just apologise for it, shifting the blame to ISO. I realise that isn't very satisfactory, but it at least recognises the problem while causing minimal side-effects in backwards compatibility for the community.

Original comment by: @jamescummings

@TEITechnicalCouncil

I would vote for 1. as well, maybe referring to James' good proposal of using @ana. It is my understanding that anything going beyond the baseline reflected in the ISO standard (which I do not read as a political or sociological stance, but rather as the simplest way to ensure interoperability within administrative information systems) relates to interpretative processes that @ana can take into account quite well.

Original comment by: @laurentromary

@TEITechnicalCouncil

I agree with Laurent. Promoting/explaining the use of @ana on <sex> (surely not on <person>? that would not have enough context) alongside the existing minimal @(iso)sex seems desirable. One could imagine a new section (after http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html#NDPERSEpc ?) which discussed the issue of how to represent non-binary sex.

Original comment by: @sebastianrahtz

@TEITechnicalCouncil

Using @ana on <sex> would also be fine. I've usually preferred the @sex attribute because projects I've been involved with have not been recording human-readable text alongside the sex, just a very limited (and usually binary) interpretation of it, viz. male/female represented by a digit. Using <sex> in those situations just adds markup that isn't required. But yes, that would work as well.

None of this addresses the central problem of ISO 5218 and that our use of it may be offensive. I think there is a choice (before @rahtz's more detailed options) of either:
a) we still use ISO 5218 for data.sex datatype (possibly encouraging alternatives and explaining its limitations and potential for offence) or
b) we abandon ISO 5218 in favour of some-other-system, as yet not codified, that either we create or adopt from a different group.

Of the two I would prefer a) but with an explanation of possible problems and limitations added in chapters and reference pages, and use of something like @ana pointing to a detailed taxonomy suggested as a mechanism for those who want more fine-grained (and potentially less offensive) methods.

Original comment by: @jamescummings

@TEITechnicalCouncil

I think we need a slightly more coherent approach than is being discussed here. I suggest:

  1. deprecating (but not removing) both person/@sex and sex/@value

  2. replacing both by a new (presumably classed, e.g. att.sex), @sex-iso (parallel with @when-iso which we don't recommend but provide for people who want to use the, to us not ideal, ISO standard)

  3. also add (maybe in the same class) @sexRef, as suggested by Sebastian, which allows linking to internal or externally defined taxonomies of sex via url/uri/pointer (I prefer this to @ana, especially on <person>, but am flexible)

For those who wish to continue using ISO 5218 in the meantime, the only difference is that the attribute they are using is mildly disrecommended in favour of @sex-iso, and we encourage them to move over to that in the next couple of years. Of course, when ISO next update that datatype, we'll change the model of that attribute to follow.

Brief additional prose to point out the problems with ISO 5218 would be welcome. I don't think we want to discuss "how to represent non-binary sex" especially, as again that's not our place. How to use taxonomies other than ISO (whatever the reason for your dissatisfaction with it) would be essential, however.

Original comment by: @gabrielbodard

@TEITechnicalCouncil

I'd simplify this to

  1. add @sexRef to <person> and <sex>
  2. add extra prose around @sex/<sex> pointing out the issues

and forget @sex-iso/deprecated @sex stuff.

I don't think our deprecation mechanisms are enough to have enough effect.

If we are prepared to consider this issue against Birnbaum, rename @sex to @iso-sex now.

Original comment by: @sebastianrahtz

@TEITechnicalCouncil

<personGrp> would also need any new attribute (it currently has @sex), and while we're at it we could add it to <listPerson> as well. If a <personGrp> can have a consistent sex value, I don't see why a <listPerson> couldn't either.

A new attribute class for @sex and @sexRef would be a good idea, but there are two problems: <sex> uses @value rather than @sex, and I can't think of any name for such a class that doesn't seem ridiculous.

Original comment by: @martindholmes

@TEITechnicalCouncil

Martin: This is why I suggested we replace both person/@sex (and as you say personGrp/@sex) and sex/@value with the new @sex-iso and @sexRef. It has the added advantage of making the attributes more coherent. (And groupable in a class.)

Why does att.sex sound ridiculous?

Sebastian: I still think that renaming to @sex-iso (either with deprecation or without) has two advantages: consistency (as above) and quarantining ourselves a little bit from the problems with the ISO standard.

Original comment by: @gabrielbodard

@TEITechnicalCouncil

Gabby: I do like your solution, but it is a bit disruptive compared with Sebastian's. Re att.sex: I was thinking it should be an adjective, and I couldn't think of an acceptable one.

Original comment by: @martindholmes

@TEITechnicalCouncil

I am OK with renaming @sex to @iso-sex, but less keen on that and keeping old @sex (albeit deprecated). It is a toss-up between Birnbaum and user confusion. I'd take that one to TEI-L, to get a sense of how much @sex is used and relied upon in processing.

Original comment by: @sebastianrahtz

@TEITechnicalCouncil

Sorry to come to this particular party a bit late. Here are my views:

  1. Renaming @sex to @iso-sex is fine by me (not @sex-iso, please) but if we do that we have to deprecate @sex, I think.
  2. By the same token I've no problem with renaming sex/@value to sex/@iso-value, and deprecating sex/@value.
  3. If there is any evidence of people wanting coded values for sexuality or gender more subtle than those provided by the ISO standard, by all means add @sexRef to <person>, but it makes no sense to add it to <sex> (since what would it be pointing to if not a <sex> element in which the particular constellation of factors concerned was described?). So far as I am aware, however, the only requirement is to encode quite coarsely. Happy to be proved wrong of course.
  4. I'm unhappy with using @ana for this -- it's not an interpretation, it's a (possibly reductive) coding/classification of information already represented in full within the <sex> element.
  5. The difference between personGrp and listPerson is a bit subtle: a personGrp can have a @sex iff it's a group of people acting as one, sexed, individual; a listPerson, on the other hand, cannot have a @sex imho.

Original comment by: @lb42

@TEITechnicalCouncil

Lou:

  1. I prefer the idea of renaming sex/@value -> @iso-sex, in parallel with (and defined by the same class as) the attribute on <person>. I can see why the idea of naming an attribute sex/@sex looked silly when this was first designed (but cf the parallel of certainty/@cert and precision/@precision, which I assert are Good Things, because they can be defined in a single place).

  2. I don't think <sex> is an element that defines a sex is it (although that would be useful); it's just a free-text way of defining a person's sex, and so liable to normalization in exactly the same way as the attribute on <person>.

(5. Side issue: presumably you could have a listPerson broken up into 1 sub-listPerson containing all the men, another containing only women, and a third containing all the intersex athletes at the 2028 Olympics; wouldn't it be useful to be able to attach @sexRef to the containing list rather than to each person with it?)

Original comment by: @gabrielbodard

@TEITechnicalCouncil

Carina Zona has sent me some links to more inclusive standards for recording sex, some of which have a certain amount of real-world use. Most interesting is:

> http://transhealth.ucsf.edu/trans?page=lib-data-collection <- the
> best I've seen, and it's gotten medical research community traction.
> Still has problems (chromosomal sex isn't addressed well -- a
> surprising oversight given the context), but a good starting point for
> cases in which there's a case to be made for requiring checkboxes
> instead of open text fields.

Other links she recommended included:
https://genderoutlaw.wordpress.com/2008/05/16/pownce-scores-on-gender-options/
http://genderoutlaw.wordpress.com/2008/03/09/w00t-for-yahoo/
http://www.sarahdopp.com/blog/2010/gender-is-a-text-field-diaspora-backstory-and-context/
http://tantek.com/log/2007/11.html#d02t2318
http://microformats.org/wiki/gender-brainstorming
http://microformats.org/wiki/gender-formats
http://microformats.org/wiki/gender-examples
and "Schemas for the Real World" in http://cczona.com/talks

I don't think any of these schemas/standards are ready for use as a replacement for ISO 5218 at this moment, but some may be the basis for a competing standard or a modified ISO proposal at some point. More immediately relevant right now, it might be worth pointing to one or two of these in the Guidelines discussion of what alternatives there are to the binary distinction forced by @iso-sex.

Original comment by: @gabrielbodard

@TEITechnicalCouncil

The first link does look interesting. It breaks down sex/gender identification into two distinct questions: 1) how does someone self-identify, and 2) what sex was assigned to the person at birth. The options for the latter fit with ISO (although that doesn't address the offence of ordinal precedence). The first question is concerned with how people self-identify when asked, and this is an enumeration with the additional option of supplying a new value.

I think this distinction fits with our proposed distinction between @sex and @sexRef.

Original comment by: @martindholmes

@TEITechnicalCouncil

  • labels: TEI: New or Changed Element --> TEI: New Or Changed Element
  • status: open --> open-accepted

Original comment by: @gabrielbodard

@TEITechnicalCouncil

At the TEI Council meeting in Brown, 2013-04, we agreed to change the datatype of person/@sex, personGrp/@sex and sex/@value from ISO 5218 to data.word, so as to allow locally defined values or alternative published standards to be used in these attributes. I will make this change in data.sex, and also change the prose to reflect this in all the affected elements (while retaining a reference to ISO and one or two other useful standards).

In the meantime (and in another ticket) Syd is going to suggest changing data.enumerated to data.word so that we can use that here and values such as "0", "1" will remain valid (currently a data.enumerated is data.name which has to begin with an alphabetic character, and would therefore break backward-compatibility). The datatype of data.sex may therefore eventually be changed to data.enumerated or similar, but all values valid against data.word will remain valid.
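
A rough way to see why data.word keeps the existing values valid is sketched below in Python. It assumes data.word accepts any non-empty token made up only of letters, digits, punctuation and symbols; that is an approximation for illustration, not the normative definition of the datatype. `is_data_word` is a hypothetical helper.

```python
import unicodedata


def is_data_word(value: str) -> bool:
    """Approximate data.word: a non-empty token of letters, digits, punctuation or symbols.

    Assumption: whitespace and control characters are what the datatype excludes.
    """
    allowed_major_classes = ("L", "N", "P", "S")
    return bool(value) and all(
        unicodedata.category(ch)[0] in allowed_major_classes for ch in value
    )


# Existing ISO 5218 codes and a locally defined value should all be accepted.
existing_values = ["0", "1", "2", "9"]
all_existing_valid = all(is_data_word(v) for v in existing_values)
local_value_valid = is_data_word("female")
```

Values containing whitespace, such as a multi-word description, would be rejected under this approximation, so longer descriptions would still belong in element content or a referenced taxonomy.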

Original comment by: @gabrielbodard

@TEITechnicalCouncil

  • labels: TEI: New Or Changed Element --> TEI: New or Changed Element

Original comment by: @gabrielbodard

@TEITechnicalCouncil

  • status: open-accepted --> closed-accepted
  • Group: RED --> AMBER

Original comment by: @gabrielbodard

@TEITechnicalCouncil

Done at revision [r11913].

Original comment by: @gabrielbodard

@TEITechnicalCouncil

Great to see some movement on this! thanks for taking it forward - appreciated.

Original comment by: @melissaterras

@TEITechnicalCouncil

This issue was originally assigned to SF user: gabrielbodard
Current user is: gabrielbodard
