# Introduction to using Gilda

## Using Gilda through its Python API
First, we `import gilda`, which gives us access to its API functions, most importantly `ground`. The `ground` function takes the entity string to be grounded as its argument. It also takes an optional `context` argument, another string, which provides surrounding text used to disambiguate the entity string if it is ambiguous.

In [1]:
import gilda
scored_matches = gilda.ground('k-ras')

The `ground` function returns a list of `ScoredMatch` objects, each of which contains a `Term` (a grounded entity) and some metadata about the grounding process, including a `score` that can be used for ranking matches. The list of `ScoredMatch` objects is always sorted by score in descending order, so the best match comes first.

In [2]:
scored_matches[0].term

Term(kras,Kras,HGNC,6407,KRAS,assertion,famplex)

The namespace and ID of the resulting term can now be used for any downstream task.

In [3]:
scored_matches[0].term.db, scored_matches[0].term.id

('HGNC', '6407')
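As a minimal sketch of such a downstream step, the helper below combines the namespace and ID of the top match into a single `DB:ID` string and returns `None` when the list of matches is empty. The names `to_curie` and `top_curie` are hypothetical and not part of Gilda, and the `SimpleNamespace` objects only stand in for the `ScoredMatch` and `Term` attributes used above so that the example runs without Gilda installed. That an unmatched string yields an empty list is an assumption here.

```python
from types import SimpleNamespace


def to_curie(db, identifier):
    """Join a namespace and an ID into a 'DB:ID' string (hypothetical helper).

    Some IDs (e.g., the CHEBI ID shown in a later example of this document)
    already embed the namespace prefix, in which case it is not repeated.
    """
    if identifier.startswith(db + ':'):
        return identifier
    return f'{db}:{identifier}'


def top_curie(scored_matches):
    """Return the 'DB:ID' string of the first match, or None if there is none."""
    if not scored_matches:
        return None
    term = scored_matches[0].term
    return to_curie(term.db, term.id)


# Stand-in for the result of gilda.ground('k-ras'); only the attributes
# used by the helpers above are mimicked.
fake_matches = [SimpleNamespace(term=SimpleNamespace(db='HGNC', id='6407'))]

print(top_curie(fake_matches))  # HGNC:6407
print(top_curie([]))            # None
```

With Gilda installed, the same helper could presumably be called directly on the list returned by `gilda.ground`.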

## Using Gilda as a web service
First, the Gilda service needs to be running. The public service is available at

In [4]:
public_url = 'http://grounding.indra.bio/ground'

but it is also possible to run the Gilda service locally with `python -m gilda.app` and then use the local address instead:

In [5]:
local_url = 'http://localhost:8001/ground'

By default, here we will use the `public_url` to show how Gilda works.

In [6]:
url = public_url

### Submitting a request to the service

The following helper function submits a request to the service and returns the result.

In [7]:
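import requests

def ground(url, text, context=None):
    # POST the entity text and the optional context as a JSON payload
    res = requests.post(url, json={'text': text, 'context': context})
    # Raise an exception on HTTP errors instead of parsing an error page
    res.raise_for_status()
    return res.json()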

#### Simple entity string with one match
In this first example, we submit the string `k-ras` for grounding. It doesn't directly match any names or synonyms in the integrated resources, but the approximate string matching accounts for capitalization and dashes, and finds the `KRAS` gene as the unique match.

Note the following details of the returned value:
- The returned value is a list of scored matches. In this case it has only 1 element.
- Each scored match contains 3 keys: `term`, `score`, and `match`.
- The `term` describes the entry that the string was matched to, including the database / namespace `db`, the identifier within that namespace `id`, the standardized name of the entry `entry_name`, and some additional information about the entry, including its `status`, `source`, and `norm_text`.
- The `score` is a number between 0 and 1, with "better" matches corresponding to a higher score. The service, by default, sorts scored matches from highest to lowest score.
- The `match` field holds metadata about the match. It characterizes how the input string relates to the entry, and users typically do not need to use it directly.

In [8]:
ground(url, 'k-ras')

[{'match': {'cap_combos': [['all_lower', 'single_cap_letter']],
   'dash_mismatches': ['query'],
   'exact': True,
   'query': 'k-ras',
   'ref': 'Kras',
   'space_mismatch': False},
  'score': 0.9936095650381365,
  'term': {'db': 'HGNC',
   'entry_name': 'KRAS',
   'id': '6407',
   'norm_text': 'kras',
   'source': 'famplex',
   'status': 'assertion',
   'text': 'Kras'}}]
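The fields described above can be pulled out of the response with ordinary dictionary access. The following is a minimal sketch that works on a copy of the response shown above (with the `match` details left out), so it runs without contacting the service; `summarize` is a hypothetical helper name, not part of Gilda.

```python
def summarize(matches):
    """Reduce each scored match to (db, id, entry_name, rounded score)."""
    return [
        (m['term']['db'], m['term']['id'], m['term']['entry_name'],
         round(m['score'], 3))
        for m in matches
    ]


# Copied from the k-ras response above; the 'match' key is omitted
response = [{
    'score': 0.9936095650381365,
    'term': {'db': 'HGNC', 'entry_name': 'KRAS', 'id': '6407',
             'norm_text': 'kras', 'source': 'famplex',
             'status': 'assertion', 'text': 'Kras'},
}]

print(summarize(response))  # [('HGNC', '6407', 'KRAS', 0.994)]
```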

#### Simple entity string with multiple matches
Let's now look at an example where there are multiple matches with different statuses. Here `MEK` is an exact match for the MEK protein family, asserted by FamPlex, and also matches a synonym of a chemical in ChEBI called butan-2-one. The protein family gets a much higher score because its `assertion` status is prioritized over the `synonym` status of the ChEBI entry.

In [9]:
ground(url, 'MEK')

[{'match': {'cap_combos': [],
   'dash_mismatches': [],
   'exact': True,
   'query': 'MEK',
   'ref': 'MEK',
   'space_mismatch': False},
  'score': 1.0,
  'term': {'db': 'FPLX',
   'entry_name': 'MEK',
   'id': 'MEK',
   'norm_text': 'mek',
   'source': 'famplex',
   'status': 'assertion',
   'text': 'MEK'}},
 {'match': {'cap_combos': [],
   'dash_mismatches': [],
   'exact': True,
   'query': 'MEK',
   'ref': 'MEK',
   'space_mismatch': False},
  'score': 0.5555555555555556,
  'term': {'db': 'CHEBI',
   'entry_name': 'butan-2-one',
   'id': 'CHEBI:28398',
   'norm_text': 'mek',
   'source': 'chebi',
   'status': 'synonym',
   'text': 'MEK'}}]
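When a string has several matches like this, a caller may want to keep only some of them. The sketch below filters a response on the client side by namespace and by a minimum score. The helper name `filter_matches` and the threshold value are illustrative assumptions rather than Gilda features, and the data is an abbreviated copy of the response above.

```python
def filter_matches(matches, namespaces=None, min_score=0.0):
    """Keep matches in the allowed namespaces that reach min_score."""
    kept = []
    for m in matches:
        if namespaces is not None and m['term']['db'] not in namespaces:
            continue
        if m['score'] < min_score:
            continue
        kept.append(m)
    return kept


# Abbreviated copy of the MEK response above
mek = [
    {'score': 1.0,
     'term': {'db': 'FPLX', 'id': 'MEK', 'entry_name': 'MEK',
              'status': 'assertion'}},
    {'score': 0.5555555555555556,
     'term': {'db': 'CHEBI', 'id': 'CHEBI:28398',
              'entry_name': 'butan-2-one', 'status': 'synonym'}},
]

# Only chemical groundings
print([m['term']['entry_name'] for m in filter_matches(mek, {'CHEBI'})])
# Only matches scoring at least 0.9 (an arbitrary, illustrative cutoff)
print([m['term']['entry_name'] for m in filter_matches(mek, min_score=0.9)])
```

Because the input order is preserved, the filtered list stays sorted from highest to lowest score.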

### Grounding with contextual disambiguation
In this example, we demonstrate how Gilda can disambiguate entity senses based on additional context. Gilda integrates Adeft, and relies on one of the 46 trained models that Adeft provides (https://github.com/indralab/adeft) to disambiguate an entity text based on some additional context (i.e., surrounding text).

We look at "IR" as an example, which is widely used in the literature as an acronym for, among other things, insulin receptor and ionizing radiation.

In the first example, we ground IR with context implying the insulin receptor sense:

In [10]:
ground(url, 'IR', context='IR binds INS at the membrane.')[0]

{'disambiguation': {'match': 'grounded',
  'score': 0.9945447300565196,
  'type': 'adeft'},
 'match': {'cap_combos': [],
  'dash_mismatches': [],
  'exact': True,
  'query': 'IR',
  'ref': 'IR',
  'space_mismatch': False},
 'score': 0.9945447300565196,
 'term': {'db': 'HGNC',
  'entry_name': 'INSR',
  'id': '6091',
  'norm_text': 'ir',
  'source': 'famplex',
  'status': 'assertion',
  'text': 'IR'}}

As expected, the top grounding we get is to the insulin receptor gene, INSR.

Next, we look at a sentence which implies that IR is used in the sense of ionizing radiation:

In [11]:
ground(url, 'IR', context='IR can cause DNA damage.')[0]

{'disambiguation': {'match': 'grounded',
  'score': 0.9915279740334499,
  'type': 'adeft'},
 'match': {'cap_combos': [],
  'dash_mismatches': [],
  'exact': True,
  'query': 'IR',
  'ref': 'IR',
  'space_mismatch': False},
 'score': 0.9915279740334499,
 'term': {'db': 'MESH',
  'entry_name': 'Radiation, Ionizing',
  'id': 'D011839',
  'norm_text': 'ir',
  'source': 'famplex',
  'status': 'assertion',
  'text': 'IR'}}

In this case, we end up with the MeSH entry representing ionizing radiation as the top grounding.
The above examples demonstrate that in many cases, even a few words of surrounding text can help reliably disambiguate senses. Generally, disambiguation becomes more accurate given more context, e.g., the full text of the article containing the entity string.
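
In the responses shown in this document, a `disambiguation` key appears only when context was used to choose a sense. The sketch below checks for that key, again on abbreviated copies of the responses above rather than live results; `disambiguation_info` is a hypothetical helper name.

```python
def disambiguation_info(match):
    """Return (type, score) if the match carries disambiguation
    metadata, otherwise None."""
    dis = match.get('disambiguation')
    if dis is None:
        return None
    return dis['type'], dis['score']


# Abbreviated copy of the "IR can cause DNA damage." result above
ir_match = {
    'disambiguation': {'match': 'grounded',
                       'score': 0.9915279740334499,
                       'type': 'adeft'},
    'score': 0.9915279740334499,
    'term': {'db': 'MESH', 'id': 'D011839',
             'entry_name': 'Radiation, Ionizing'},
}
# Abbreviated copy of the k-ras result, which has no disambiguation key
kras_match = {
    'score': 0.9936095650381365,
    'term': {'db': 'HGNC', 'id': '6407', 'entry_name': 'KRAS'},
}

print(disambiguation_info(ir_match))    # ('adeft', 0.9915279740334499)
print(disambiguation_info(kras_match))  # None
```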