Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue for converting bio sequence #17

Closed
ihungalexhsu opened this issue Jun 16, 2022 · 3 comments
Closed

Issue for converting bio sequence #17

ihungalexhsu opened this issue Jun 16, 2022 · 3 comments

Comments

@ihungalexhsu
Copy link

Hi @jgmf-amazon, thanks for your reply. However, the current evaluation code contains some issue:

def convert_to_bio(seq_tags, outside='Other', labels_merge=None):
    """
    Converts a sequence of tags into BIO format. EX:

        ['city', 'city', 'Other', 'country', -100, 'Other']
        to
        ['B-city', 'I-city', 'O', 'B-country', 'I-country', 'O']
        where outside = 'Other' and labels_merge = [-100]

    :param seq_tags: the sequence of tags that should be converted
    :type seq_tags: list
    :param outside: The label(s) to put outside (ignore). Default: 'Other'
    :type outside: str or list
    :param labels_merge: The labels to merge leftward (i.e. for tokenized inputs)
    :type labels_merge: str or list
    :return: a BIO-tagged sequence
    :rtype: list
    """

    seq_tags = [str(x) for x in seq_tags]

    outside = [outside] if type(outside) != list else outside
    outside = [str(x) for x in outside]

    if labels_merge:
        labels_merge = [labels_merge] if type(labels_merge) != list else labels_merge
        labels_merge = [str(x) for x in labels_merge]
    else:
        labels_merge = []

    bio_tagged = []
    prev_tag = None
    for tag in seq_tags:
        if tag in outside:
            bio_tagged.append('O')
            prev_tag = tag
            continue
        if tag != prev_tag and tag not in labels_merge:
            bio_tagged.append('B-' + tag)
            prev_tag = tag
            continue
        if tag == prev_tag or tag in labels_merge:
            if prev_tag in outside:
                bio_tagged.append('O')
            else:
                bio_tagged.append('I-' + prev_tag)

    return bio_tagged

The current code will meet a bug when:
prev_tag is None and tag is -100.
This will not happen if model is properly trained, but when train the model at the beginning stage, the model might output this silly combination.

Maybe a potential solution is not initialized prev_tag with None, but using 'O'?

Originally posted by @ihungalexhsu in #13 (comment)

@jgmf-amazon
Copy link
Contributor

Hi @ihungalexhsu , good catch. How does this look?

6a12822

@ihungalexhsu
Copy link
Author

Yeah, I think this test case looks good!

@cperiz
Copy link
Contributor

cperiz commented Jun 16, 2022

Thanks @ihungalexhsu. Merged PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants