Support FamPlex #131

Closed
wants to merge 18 commits into from

@jvwong jvwong commented Mar 29, 2023

Summary

Added support for FamPlex-namespaced entities (families and complexes), including 31 tests for FamPlex entities. I also corrected some older test cases that in fact reference families (e.g. Acc, p70S6K).

There are some additional changes and dependencies that should be addressed at some point (see Outstanding and Related issues).

Entity "type"?

The ENTRY_TYPE for famplex is currently set to 'entity'.

There are several alternatives to consider:

  • Leave as entity
    • Most appropriate in factoid
    • Issues
      • Curation tool: entity doesn't appear to be an option; update required
      • Export to BioPAX and SBGN
        • SBGN: entity is labelled as SBGN node type 'unspecified entity' which is OK for now
        • BioPAX: entity is labelled as bp:PhysicalEntity but no external grounding is attached.
  • Set as ggp
    • Easiest; requires no other changes to the grounding UI or export formats
    • This may be reasonable for family members, but it feels odd for complexes, as they are not single genes or molecules.
  • Set type as complex
    • factoid expects a complex to consist of > 1 components. However, this is complicated because FamPlex complexes can consist of different combinations of components (e.g. proteins), some of which may be family members
    • factoid doesn't 'ground' the parent complex
  • Create new type complex/family
    • This starts to feel like entity anyway
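
To make the "leave as entity" option concrete, here is a minimal sketch of how an entry typed as 'entity' would pick up the export labels listed above. The dictionary layout and the `export_label` helper are hypothetical illustrations, not factoid's actual export code; only the two labels and the `fplx`/`MIRLET7` identifier come from this PR.

```python
# Labels for 'entity' are the ones described above. The mapping structure and
# function name are hypothetical and do not reflect factoid's real exporters.
EXPORT_LABELS = {
    "entity": {"sbgn": "unspecified entity", "biopax": "bp:PhysicalEntity"},
}


def export_label(entry_type: str, fmt: str) -> str:
    """Look up the export label for an entry type, failing loudly if unmapped."""
    try:
        return EXPORT_LABELS[entry_type][fmt]
    except KeyError:
        raise ValueError(f"no {fmt!r} label defined for type {entry_type!r}") from None


# A FamPlex entry as it might look with the type left as 'entity'.
famplex_entry = {"namespace": "fplx", "id": "MIRLET7", "type": "entity"}

print(export_label(famplex_entry["type"], "sbgn"))
print(export_label(famplex_entry["type"], "biopax"))
```

Any of the other options (ggp, complex, or a new type) would need its own row in such a mapping, which is part of the cost weighed above.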

Test results

Added famplex.json

Test cases added for each of the top 15 most frequently referenced entities:

"We found that the 15 most frequently-referenced FamPlex identifiers accounted for 50% of all FamPlex groundings (blue bars in Fig. 3d)"
Bachman, J. A., Gyori, B. M. & Sorger, P. K. FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining. BMC Bioinformatics 19, 248 (2018).

Results

Search accuracy without and with FamPlex (texts, data):

| FamPlex included | type | passes | failures | failure rate |
| --- | --- | --- | --- | --- |
| No | search | 649 | 105 | 13.9% |
| Yes | search | 680 | 107 | 13.6% |
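
The rate column appears to be the failure rate, i.e. failures / (passes + failures). That formula is an assumption, but a quick check with the counts from the table reproduces both reported percentages:

```python
# Counts are taken from the table above; the formula is an assumption that
# happens to reproduce the reported percentages.
results = {
    "without FamPlex": {"passes": 649, "failures": 105},
    "with FamPlex": {"passes": 680, "failures": 107},
}


def failure_rate(passes: int, failures: int) -> float:
    """Return failures as a percentage of all test cases."""
    return 100.0 * failures / (passes + failures)


for label, counts in results.items():
    print(f"{label}: {failure_rate(**counts):.1f}%")
# without FamPlex: 13.9%
# with FamPlex: 13.6%
```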

Below, I report the additional failing cases when FamPlex data and tests are included:

| text | expected.namespace | expected.id | actual.namespace | actual.id | organismOrdering | *rank |
| --- | --- | --- | --- | --- | --- | --- |
| let-7 | ncbi | 266952 | fplx | MIRLET7 | [6239] | -1 |
| UNC-40 | ncbi | 172233 | ncbi | 32757 | [6239] | -1 |
lin-35 ncbi 172249 ncbi 55957 [6239] -1

*rank indicates the position of the expected entity in the search results; -1 means it was not returned at all
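
As an illustration of the rank convention, here is a minimal sketch of how such a rank could be computed from a list of returned ids. The `rank_of` helper is hypothetical and the zero-based indexing is an assumption; only the -1 sentinel for "not returned" comes from the note above.

```python
from typing import Sequence


def rank_of(expected_id: str, result_ids: Sequence[str]) -> int:
    """Return the position of expected_id in result_ids, or -1 if absent.

    Zero-based indexing is an assumption; only the -1 sentinel is from the text.
    """
    try:
        return list(result_ids).index(expected_id)
    except ValueError:
        return -1


# Hypothetical result list for the 'UNC-40' query: the expected NCBI id
# 172233 is not among the returned ids, so the rank is -1.
print(rank_of("172233", ["32757"]))  # -1
```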

Refs #130

jvwong commented Apr 13, 2023

Sample outputs:

biofactoid

[Image: biofactoid]

app-ui (SBGN)

[Image: apps-ui_pathway]

neo4j

[Image: neo4j_ui]

@jvwong jvwong closed this May 11, 2023