In [1]:
import google.generativeai as genai
import os
import pandas as pd
from IPython.display import display, Markdown, Latex
from pathlib import Path
from tqdm import tqdm

In [2]:
OUTPATH = 'data/eutils_cleaned'
DATA_DIR = "data/eutils_raw/"
Path(OUTPATH).mkdir(parents=True, exist_ok=True)
BASE_PROMPT_PATH = "data/prompts/apidoc2md.md"
DOC_ORDER = ["einfo", "esearch", "epost",
             "esummary", "efetch", "elink", 
             "egquery", "espell", "ecitmatch"]
DOC_ORDER = pd.DataFrame(zip(DOC_ORDER, range(len(DOC_ORDER))), 
                         columns=["api_name", "format_order"])

In [3]:
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# See https://ai.google.dev/api/python/google/generativeai/GenerativeModel
# When response_mime_type is set to "application/json", the API returns a
# "RECITATION" finish reason, possibly because of a repeated-token safeguard. See
# https://dropbox.tech/machine-learning/bye-bye-bye-evolution-of-repeated-token-attacks-on-chatgpt-models
generation_config = {
  "temperature": 0,
  "top_p": 0.95,
  "top_k": 64,
  "max_output_tokens": 8192,
  "response_mime_type": "text/plain",
}

In [4]:
model = genai.GenerativeModel(
  model_name="gemini-1.5-flash-latest",
  generation_config=generation_config,
  system_instruction="You are a helpful assistant. You take raw, unformatted text and output clean, [structured markdown](https://www.markdownguide.org/)."
)

In [5]:
api_docs = []

for _file in Path(DATA_DIR).iterdir():
    with open(_file) as f:
        doc = f.read()
    api_docs.append([Path(_file).stem, doc])

api_docs = pd.DataFrame(api_docs, columns=["api_name", "raw_text"])

In [6]:
with open(BASE_PROMPT_PATH) as f:
    base_prompt = f.read()
base_prompt

'# Instruction\nConvert the following api documentation into markdown. Improve its structure so it is more informative to readers.\n'

In [7]:
model = genai.GenerativeModel(
  model_name="gemini-1.5-flash-latest",
  generation_config=generation_config,
)


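responses = []
for index, row in tqdm(api_docs.iterrows(), total=api_docs.shape[0]):
  # Tack on an extra instruction to keep API names spelled as in the source.
  prompt = base_prompt + "\nPlease spell the API name exactly as it appears in the text.\n"

  # Pass the extended prompt (not base_prompt) so the extra instruction is used.
  response = model.generate_content([prompt, f"filename: {row['api_name']}.txt", row["raw_text"]])
  try:
    responses.append(response.text)
  except ValueError:
    # response.text raises ValueError when no usable candidate came back.
    # Append a placeholder so `responses` stays aligned with the rows of api_docs.
    responses.append("")
    try:
      print(response.candidates)
    except Exception:
      print("Gemini is throwing a tantrum.")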

100%|██████████| 9/9 [01:22<00:00,  9.12s/it]


In [9]:
api_docs["text_md"] = responses
api_docs = api_docs.merge(DOC_ORDER)
api_docs = api_docs.sort_values(["format_order"])

In [10]:
for i in api_docs["text_md"]:
    display(Markdown(i))

## EInfo API Documentation

The EInfo API provides information about Entrez databases, including a list of all available databases and detailed statistics for a specific database.

**Base URL:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi
```

**Functions:**

* **List all Entrez databases:** Returns a list of all valid Entrez database names.
* **Retrieve database statistics:** Provides detailed statistics for a single Entrez database, including indexing fields and available link names.

**Required Parameters:**

* None. If no `db` parameter is provided, EInfo will return a list of all valid Entrez databases.

**Optional Parameters:**

| Parameter | Description | Example |
|---|---|---|
| `db` | Target database to retrieve statistics for. Must be a valid Entrez database name. | `db=protein` |
| `version` | Specifies version 2.0 EInfo XML. The only supported value is '2.0'. | `version=2.0` |
| `retmode` | Retrieval type. Determines the format of the returned output. | `retmode=json` |

**Version 2.0 EInfo XML:**

When `version=2.0` is specified, the EInfo XML will include two new fields:

* `<IsTruncatable>`: Indicates whether a field allows the wildcard character '*' for searching.
* `<IsRangeable>`: Indicates whether a field allows the range operator ':' for searching.

**Examples:**

**1. List all Entrez databases:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi
```

**2. Retrieve statistics for Entrez Protein (version 2.0):**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=protein&version=2.0
```

**3. Retrieve statistics for Entrez Protein in JSON format:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=protein&retmode=json
```

**Note:**

* The wildcard character '*' can be used in fields marked as `<IsTruncatable>true</IsTruncatable>`.
* The range operator ':' can be used in fields marked as `<IsRangeable>true</IsRangeable>`.
* The `retmode` parameter can be used to retrieve the output in JSON format.
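
The three example requests above can also be assembled in code. The following is a minimal sketch using only Python's standard library; the helper name `einfo_url` is hypothetical (it is not part of any NCBI client), and the function only builds the URL without sending a request.

```python
from urllib.parse import urlencode

EINFO_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"

def einfo_url(db=None, version=None, retmode=None):
    """Build an EInfo URL (hypothetical helper); omitted parameters are left out."""
    params = {"db": db, "version": version, "retmode": retmode}
    # Keep only the parameters the caller actually supplied.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{EINFO_BASE}?{query}" if query else EINFO_BASE

print(einfo_url())                              # list all databases
print(einfo_url(db="protein", version="2.0"))   # statistics, version 2.0 XML
print(einfo_url(db="protein", retmode="json"))  # statistics as JSON
```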


## ESearch API Documentation

The ESearch API provides a powerful way to search Entrez databases programmatically. It allows you to:

* **Retrieve a list of UIDs** matching a text query.
* **Post search results** to the History server for later use.
* **Download UIDs** from a dataset stored on the History server.
* **Combine or limit** UID datasets stored on the History server.
* **Sort sets of UIDs**.

**Important Note:** Some NCBI products offer search features on their web interfaces that are not available through ESearch. For example, PubMed's web interface includes citation matching and spelling correction tools that are not accessible via the API.

### Base URL

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
```

### Required Parameters

* **db:** The Entrez database to search. Must be a valid database name (e.g., `pubmed`, `nucleotide`, `protein`). Defaults to `pubmed`.
* **term:** The Entrez text query. All special characters must be URL encoded. Spaces can be replaced with '+' signs. For very long queries, consider using an HTTP POST call. Refer to the PubMed or Entrez help for information on search field descriptions and tags.

**Example:**

```
esearch.fcgi?db=pubmed&term=asthma
```

**Proximity Searching in PubMed:**

PubMed supports "proximity searching" for multiple terms appearing within a specified number of words from each other.

**Example:**

```
esearch.fcgi?db=pubmed&term="asthma treatment"[Title:~3]
```

This query searches for "asthma treatment" where the terms appear within 3 words of each other in the `Title` field.

### Optional Parameters

#### History Server

* **usehistory:** When set to 'y', ESearch posts the resulting UIDs to the History server. This allows you to use them in subsequent E-utility calls. `usehistory` must be set to 'y' to use `WebEnv` or `query_key`.
* **WebEnv:** A web environment string returned from a previous ESearch, EPost, or ELink call. Providing `WebEnv` appends the search results to the existing environment. It also allows using query keys in `term` to combine or limit previous search sets. `usehistory` must be set to 'y' when using `WebEnv`.
* **query_key:** An integer query key returned from a previous ESearch, EPost, or ELink call. ESearch finds the intersection of the set specified by `query_key` and the set retrieved by the query in `term` (i.e., joins them with AND). For `query_key` to function, `WebEnv` must be supplied and `usehistory` must be set to 'y'.

**Query Keys in `term`:**

Query keys can be included in `term` by preceding them with '#' (%23 in the URL). While only one `query_key` parameter can be provided, multiple query keys can be combined in `term` using AND, OR, and NOT.

**Example:**

```
esearch.fcgi?db=pubmed&term=%231+AND+asthma&WebEnv=<webenv string>&usehistory=y
```

This is equivalent to:

```
esearch.fcgi?db=pubmed&term=asthma&query_key=1&WebEnv=<webenv string>&usehistory=y
```
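
The URL encoding described above (`#` becoming `%23`, spaces becoming `+`) is what `urllib.parse.urlencode` does by default, so the two equivalent requests can be sketched as follows. The helper `esearch_query` and the placeholder WebEnv value are assumptions for illustration only.

```python
from urllib.parse import urlencode

def esearch_query(term, webenv=None, query_key=None, db="pubmed"):
    """Return the query string for an ESearch call (hypothetical helper)."""
    params = {"db": db, "term": term}
    if query_key is not None:
        params["query_key"] = query_key
    if webenv is not None:
        # WebEnv is only honoured together with usehistory=y.
        params["WebEnv"] = webenv
        params["usehistory"] = "y"
    return urlencode(params)

# Query key embedded in the term: '#' is encoded as %23, spaces as '+'.
print(esearch_query("#1 AND asthma", webenv="WEBENV_PLACEHOLDER"))
# The same intersection expressed with the query_key parameter.
print(esearch_query("asthma", webenv="WEBENV_PLACEHOLDER", query_key=1))
```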

#### Retrieval

* **retstart:** The sequential index of the first UID to be shown in the XML output (default=0). Use this parameter with `retmax` to download a specific subset of UIDs.
* **retmax:** The total number of UIDs to be shown in the XML output (default=20). By default, ESearch only includes the first 20 UIDs. The remaining UIDs are stored on the History server if `usehistory` is set to 'y'. Increase `retmax` to include more UIDs in the output (up to 10,000).
* **rettype:** The retrieval type. Allowed values:
    * `uilist` (default): Displays the standard XML output.
    * `count`: Displays only the `<Count>` tag.
* **retmode:** The format of the returned output. Allowed values:
    * `xml` (default): ESearch XML format.
    * `json`: JSON format.
* **sort:** Specifies the sorting method for UIDs in the output. Available values vary by database and can be found in the Display Settings menu on an Entrez search results page. If `usehistory` is set to 'y', the UIDs are loaded onto the History Server in the specified sort order.

**Example:**

```
esearch.fcgi?db=pubmed&term=asthma&sort=pub_date
```

This sorts the results by publication date in descending order.

* **field:** Limits the entire search term to a specific Entrez field.

**Example:**

```
esearch.fcgi?db=pubmed&term=asthma&field=title
```

This is equivalent to:

```
esearch.fcgi?db=pubmed&term=asthma[title]
```

* **idtype:** Specifies the type of identifier to return for sequence databases (nuccore, popset, protein). Defaults to GI numbers. Set to `acc` to return accession.version identifiers.
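
Because `retmax` is capped at 10,000 and `retstart` is zero-based, downloading a larger result set means stepping through it in windows. A minimal sketch of that arithmetic, assuming the total count is already known (for example from the `<Count>` tag); `esearch_windows` is a hypothetical name:

```python
def esearch_windows(total, batch_size=10_000):
    """Yield (retstart, retmax) pairs covering `total` UIDs (hypothetical helper)."""
    if not 1 <= batch_size <= 10_000:
        raise ValueError("batch_size must be between 1 and 10,000")
    for retstart in range(0, total, batch_size):
        # The final window may be smaller than batch_size.
        yield retstart, min(batch_size, total - retstart)

print(list(esearch_windows(25_000)))
# A second window of six UIDs starts at retstart=6, i.e. at the seventh UID.
print(list(esearch_windows(12, batch_size=6)))
```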

#### Dates

* **datetype:** The type of date used to limit a search. Allowed values vary by database, but common values include `mdat` (modification date), `pdat` (publication date), and `edat` (Entrez date).
* **reldate:** When set to an integer `n`, returns items with a date specified by `datetype` within the last `n` days.
* **mindate, maxdate:** Date range used to limit a search result by the date specified by `datetype`. Must be used together to specify an arbitrary date range. Format: YYYY/MM/DD, YYYY, YYYY/MM.
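
The pairing rule for `mindate` and `maxdate` is easy to get wrong, so it can help to check it before building a request. The sketch below is one possible way to do that; `date_params` is a hypothetical helper and it does not validate the date strings themselves.

```python
def date_params(datetype, reldate=None, mindate=None, maxdate=None):
    """Return ESearch date-limit parameters as a dict (hypothetical helper)."""
    params = {"datetype": datetype}
    if reldate is not None:
        params["reldate"] = int(reldate)
    # mindate and maxdate must be supplied as a pair.
    if (mindate is None) != (maxdate is None):
        raise ValueError("mindate and maxdate must be used together")
    if mindate is not None:
        params.update(mindate=mindate, maxdate=maxdate)
    return params

print(date_params("edat", reldate=60))
print(date_params("pdat", mindate="2020/01", maxdate="2020/12/31"))
```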

### Examples

* **Search PubMed for abstracts with an Entrez date within the last 60 days, retrieve the first 100 PMIDs, post the results to the History server, and return a WebEnv and query_key:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=60&datetype=edat&retmax=100&usehistory=y
```

* **Search PubMed for articles in PNAS, Volume 97, and retrieve six PMIDs starting with the seventh PMID:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PNAS[ta]+AND+97[vi]&retstart=6&retmax=6&tool=biomed3
```

* **Search the NLM Catalog for journals matching the term "obstetrics":**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nlmcatalog&term=obstetrics+AND+ncbijournals[filter]
```

* **Search PubMed Central for free full text articles containing the query "stem cells":**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pmc&term=stem+cells+AND+free+fulltext[filter]
```

* **Search Nucleotide for all tRNAs:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=biomol+trna[prop]
```

* **Search Protein for a molecular weight range:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=70000:90000[molecular+weight]
```

This documentation provides a comprehensive overview of the ESearch API. For more detailed information, refer to the official NCBI documentation.


## epost: Upload UIDs to Entrez History

The `epost` API allows you to upload a list of UIDs (Unique Identifiers) to the Entrez History server. This enables you to save and manage sets of UIDs for later retrieval or use in other Entrez tools.

**Base URL:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi
```

**Functions:**

* **Uploads a list of UIDs to the Entrez History server:** This function allows you to create a new set of UIDs associated with a specific Entrez database.
* **Appends a list of UIDs to an existing set of UID lists attached to a Web Environment:** This function allows you to add UIDs to an existing set of UIDs that are already associated with a specific Web Environment.

**Required Parameters:**

* **`db`:** The Entrez database containing the UIDs in the input list. This must be a valid Entrez database name (e.g., `pubmed`, `protein`, `nuccore`). The default value is `pubmed`.
* **`id`:** A comma-delimited list of UIDs from the specified database. You can provide a single UID or multiple UIDs. 
    * **PubMed:**  A maximum of 10,000 UIDs can be included in a single URL request.
    * **Other databases:** There is no set maximum, but for large lists (over 200 UIDs), it is recommended to use the HTTP POST method.
    * **Sequence databases (nuccore, popset, protein):** You can use a mixed list of GI numbers and accession.version identifiers. However, large lists of accession.version identifiers may cause timeouts due to conversion steps. It is recommended to batch these requests in sizes of about 500 UIDs or less.

**Example:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=pubmed&id=19393038,30242208,29453458
```

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=protein&id=15718680,NP_001098858.1,119703751
```
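
For long UID lists, the batching advice above can be sketched as follows. This is only an illustration: `epost_bodies` is a hypothetical helper that produces form-encoded bodies suitable for an HTTP POST, and actually sending them (and deciding how to combine the resulting query keys) is left out.

```python
from urllib.parse import urlencode

def epost_bodies(db, uids, batch_size=500):
    """Yield form-encoded POST bodies, one per batch of UIDs (hypothetical helper)."""
    uids = [str(u) for u in uids]
    for start in range(0, len(uids), batch_size):
        batch = uids[start:start + batch_size]
        # safe="," keeps the comma delimiters unescaped in the encoded body.
        yield urlencode({"db": db, "id": ",".join(batch)}, safe=",")

# A tiny batch size is used here only to make the splitting visible.
bodies = list(epost_bodies("protein", ["15718680", "NP_001098858.1", "119703751"], batch_size=2))
print(bodies)
```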

**Optional Parameter:**

* **`WebEnv`:** The Web Environment to which the UID list will be added. This parameter is usually obtained from the output of a previous `ESearch`, `EPost`, or `ELink` call. If not provided, a new Web Environment will be created and the UID list will be associated with query_key 1.

**Example:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=protein&id=15718680,157427902,119703751&WebEnv=<webenv string>
```

**Example: Post records to PubMed:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=pubmed&id=11237011,12466850
```

**Note:**

* The `epost` API returns a response in XML format.
* The response contains information about the Web Environment and the query key associated with the uploaded UIDs.
* You can use the `WebEnv` and `query_key` values to retrieve the uploaded UIDs using the `efetch` API.

**Further Information:**

* [Entrez Programming Utilities Documentation](https://www.ncbi.nlm.nih.gov/books/NBK25499/)
* [epost API Documentation](https://www.ncbi.nlm.nih.gov/books/NBK25499/chapter/ch10/)


## ESummary API Documentation

The ESummary API provides a way to retrieve document summaries (DocSums) for a list of input UIDs from various Entrez databases. 

**Base URL:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi
```

**Functions:**

* Returns DocSums for a list of input UIDs.
* Returns DocSums for a set of UIDs stored on the Entrez History server.

**Required Parameters:**

**1. `db`:**

* **Description:** Database from which to retrieve DocSums.
* **Value:** A valid Entrez database name (e.g., `pubmed`, `protein`, `nucleotide`).
* **Default:** `pubmed`

**2. `id` (Used only when input is from a UID list):**

* **Description:** UID list. Either a single UID or a comma-delimited list of UIDs.
* **Value:** A valid UID or a list of UIDs from the specified database.
* **Note:** For sequence databases (e.g., `nuccore`, `popset`, `protein`), the UID list can include both GI numbers and accession.version identifiers.
* **Example:** `esummary.fcgi?db=pubmed&id=19393038,30242208,29453458`

**3. `query_key` and `WebEnv` (Used only when input is from the Entrez History server):**

* **Description:** These parameters specify the Web Environment and query key containing the UID list to be used as input.
* **Value:** Obtained from the output of previous ESearch, EPost, or ELink calls.
* **Example:** `esummary.fcgi?db=protein&query_key=<key>&WebEnv=<webenv string>`
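
The two input modes are mutually exclusive: a request carries either `id`, or `query_key` together with `WebEnv`. A minimal sketch that enforces this while building the URL (the helper name `esummary_url` is an assumption, and no request is sent):

```python
from urllib.parse import urlencode

ESUMMARY_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"

def esummary_url(db, ids=None, query_key=None, webenv=None):
    """Build an ESummary URL from a UID list or a History reference (hypothetical helper)."""
    if ids is not None and query_key is None and webenv is None:
        params = {"db": db, "id": ",".join(str(i) for i in ids)}
    elif ids is None and query_key is not None and webenv is not None:
        params = {"db": db, "query_key": query_key, "WebEnv": webenv}
    else:
        raise ValueError("pass either ids, or both query_key and webenv")
    # safe="," keeps the comma-delimited UID list readable.
    return f"{ESUMMARY_BASE}?{urlencode(params, safe=',')}"

print(esummary_url("pubmed", ids=[19393038, 30242208, 29453458]))
print(esummary_url("protein", query_key=1, webenv="WEBENV_PLACEHOLDER"))
```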

**Optional Parameters:**

**1. Retrieval:**

* **`retstart`:** Sequential index of the first DocSum to be retrieved (default: 1).
* **`retmax`:** Total number of DocSums to be retrieved (maximum: 10,000).
* **`retmode`:** Retrieval type (default: `xml`). Supported values: `xml`, `json`.
* **`version`:** Specifies version 2.0 ESummary XML (only supported value: `2.0`).

**Examples:**

* **PubMed:** `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=11850928,11482001`
* **PubMed (version 2.0 XML):** `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=11850928,11482001&version=2.0`
* **Protein:** `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=protein&id=28800982,28628843`
* **Nucleotide:** `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=28864546,28800981`
* **Structure:** `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=structure&id=19923,12120`
* **Taxonomy:** `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=taxonomy&id=9913,30521`
* **UniSTS:** `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=unists&id=254085,254086`

**Note:**

* The ESummary API is designed to retrieve DocSums in batches. If the total set of DocSums is larger than 10,000, you can use `retstart` and `retmax` to download the entire set in batches.
* The `version` parameter allows you to retrieve version 2.0 ESummary XML, which is unique to each Entrez database and often contains more data than the default DocSum XML.


## EFetch API Documentation

The EFetch API allows you to retrieve formatted data records from various Entrez databases. It provides a flexible way to access and download information based on specific UIDs or query results stored in the Entrez History server.

### Base URL

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi
```

### Functions

* **Retrieves formatted data records for a list of input UIDs:**  You can provide a comma-separated list of UIDs from a specific Entrez database.
* **Retrieves formatted data records for a set of UIDs stored on the Entrez History server:** This allows you to retrieve data based on previous search results or saved queries.

### Required Parameters

#### Input from UID List

* **db:** The Entrez database from which to retrieve records. This must be a valid Entrez database name (e.g., pubmed, nucleotide, protein). See Table 1 in Chapter 2 for a list of available databases.
* **id:** A single UID or a comma-delimited list of UIDs from the specified database. There is no set maximum for the number of UIDs, but for large lists, consider using the HTTP POST method.

**Note for Sequence Databases (nuccore, popset, protein):** You can include a mixed list of GI numbers and accession.version identifiers in the `id` parameter.

#### Input from Entrez History Server

* **query_key:** An integer specifying the UID list associated with the given Web Environment. This value is obtained from the output of previous ESearch, EPost, or ELink calls.
* **WebEnv:** A string representing the Web Environment containing the UID list. This value is also obtained from the output of previous ESearch, EPost, or ELink calls.

### Optional Parameters

#### Retrieval Options

* **retmode:** Specifies the data format of the returned records (e.g., text, HTML, XML). See Table 1 for allowed values for each database.
* **rettype:** Specifies the record view (e.g., Abstract or MEDLINE for PubMed, GenPept or FASTA for protein). See Table 1 for allowed values for each database.
* **retstart:** The sequential index of the first record to be retrieved (default: 0). Use this in conjunction with `retmax` to download subsets of records.
* **retmax:** The total number of records to retrieve (maximum: 10,000). Use this to download large sets in batches.

#### Sequence Database Specific Parameters

* **strand:** Specifies the strand of DNA to retrieve (1 for plus strand, 2 for minus strand).
* **seq_start:** The integer coordinate of the first base to retrieve (1 represents the first base).
* **seq_stop:** The integer coordinate of the last base to retrieve.
* **complexity:** An integer value (0-4) determining the amount of data to return. This is relevant for sequence records that are part of a larger data structure.

| Complexity | Data Returned |
|---|---|
| 0 | Entire blob |
| 1 | Bioseq |
| 2 | Minimal bioseq-set |
| 3 | Minimal nuc-prot |
| 4 | Minimal pub-set |
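
As an illustration of how the sequence-specific parameters combine, the sketch below builds a FASTA sub-sequence request. `efetch_range_url` is a hypothetical helper; the range checks reflect the parameter descriptions above rather than any documented server-side validation.

```python
from urllib.parse import urlencode

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_range_url(uid, seq_start, seq_stop, strand=1, db="nuccore"):
    """Build an EFetch URL for a FASTA sub-sequence (hypothetical helper)."""
    if strand not in (1, 2):
        raise ValueError("strand must be 1 (plus) or 2 (minus)")
    if seq_start < 1 or seq_stop < seq_start:
        raise ValueError("expected 1 <= seq_start <= seq_stop")
    params = {
        "db": db,
        "id": uid,
        "strand": strand,
        "seq_start": seq_start,  # 1 is the first base
        "seq_stop": seq_stop,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"

# First 100 bases of the plus strand of GI 21614549.
print(efetch_range_url(21614549, 1, 100))
```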

### Examples

#### PubMed

* **Fetch PMIDs 17284678 and 9997 as text abstracts:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=17284678,9997&retmode=text&rettype=abstract
```

* **Fetch PMIDs in XML:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933,11700088&retmode=xml
```

#### PubMed Central

* **Fetch XML for PubMed Central ID 212403:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=212403
```

#### Nucleotide/Nuccore

* **Fetch the first 100 bases of the plus strand of GI 21614549 in FASTA format:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=21614549&strand=1&seq_start=1&seq_stop=100&rettype=fasta&retmode=text
```

* **Fetch the first 100 bases of the minus strand of GI 21614549 in FASTA format:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=21614549&strand=2&seq_start=1&seq_stop=100&rettype=fasta&retmode=text
```

* **Fetch the nuc-prot object for GI 21614549:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=21614549&complexity=3
```

* **Fetch the full ASN.1 record for GI 5:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5
```

* **Fetch FASTA for GI 5:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=fasta
```

* **Fetch the GenBank flat file for GI 5:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=gb
```

* **Fetch GBSeqXML for GI 5:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=gb&retmode=xml
```

* **Fetch TinySeqXML for GI 5:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=fasta&retmode=xml
```

#### Popset

* **Fetch the GenPept flat file for Popset ID 12829836:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=popset&id=12829836&rettype=gp
```

#### Protein

* **Fetch the GenPept flat file for GI 8:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=8&rettype=gp
```

* **Fetch GBSeqXML for GI 8:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=8&rettype=gp&retmode=xml
```

#### Sequences

* **Fetch FASTA for a transcript and its protein product (GIs 312836839 and 34577063):**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=sequences&id=312836839,34577063&rettype=fasta&retmode=text
```

#### Gene

* **Fetch full XML record for Gene ID 2:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=2&retmode=xml
```

This documentation provides a comprehensive overview of the EFetch API, including its functions, parameters, and examples. For more detailed information, refer to the official NCBI documentation.


## ELink API Documentation

The ELink API provides a way to retrieve links between UIDs in different Entrez databases or within the same database. It allows you to find related records, check for the existence of links, and retrieve LinkOut URLs.

**Base URL:**

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi
```

**Functions:**

* **Neighbor:** Returns a set of UIDs in the destination database linked to the input UIDs in the origin database.
* **Neighbor_score:** Returns a set of UIDs within the same database as the input UIDs along with computed similarity scores.
* **Neighbor_history:** Posts the output UIDs to the Entrez History server and returns a query_key and WebEnv.
* **Acheck:** Lists all links available for a set of UIDs.
* **Ncheck:** Checks for the existence of links within the same database for a set of UIDs.
* **Lcheck:** Checks for the existence of external links (LinkOuts) for a set of UIDs.
* **Llinks:** Lists the URLs and attributes for LinkOut providers that are not libraries.
* **Llinkslib:** Lists the URLs and attributes for all LinkOut providers, including libraries.
* **Prlinks:** Lists the primary LinkOut provider for each input UID, or links directly to the LinkOut provider's website for a single UID.

**Required Parameters:**

* **db:** Destination database for the link operation. Must be a valid Entrez database name (default: pubmed).
* **dbfrom:** Origin database of the link operation. Must be a valid Entrez database name (default: pubmed).
* **cmd:** ELink command mode. Specifies the function ELink will perform.
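* **id:** UID list. Either a single UID or a comma-delimited list of UIDs from the database specified by dbfrom. Required when the input is a UID list.
* **query_key:** Query key obtained from a previous ESearch, EPost, or ELink call. Used in conjunction with WebEnv, in place of id, when the input comes from the Entrez History server.
* **WebEnv:** Web Environment obtained from a previous ESearch, EPost, or ELink call. Used in conjunction with query_key, in place of id, when the input comes from the Entrez History server.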

**Optional Parameters:**

* **retmode:** Retrieval type. Determines the format of the returned output (default: xml, also supports json).
* **idtype:** Specifies the type of identifier to return for sequence databases (default: GI numbers, can be set to 'acc' for accession.version identifiers).
* **linkname:** Name of the Entrez link to retrieve. Only links with the specified name will be retrieved.
* **term:** Entrez query used to limit the output set of linked UIDs.
* **holding:** Name of LinkOut provider. Only URLs for the specified provider will be returned.
* **datetype:** Type of date used to limit a link operation (only for pubmed dbfrom).
* **reldate:** Returns only items with a date within the last n days (only for pubmed dbfrom).
* **mindate, maxdate:** Date range used to limit a link operation (only for pubmed dbfrom).

**Examples:**

* **Link from protein to gene:**
    ```
    https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=protein&db=gene&id=15718680,157427902
    ```
* **Find related articles to PMID 20210808:**
    ```
    https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=pubmed&id=20210808&cmd=neighbor_score
    ```
* **List all possible links from two protein GIs:**
    ```
    https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=protein&id=15718680,157427902&cmd=acheck
    ```
* **Find information from clinicaltrials.gov for a PMID:**
    ```
    https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&cmd=llinkslib&id=16210666&holding=CTgov
    ```
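
The example URLs above follow one pattern, which the sketch below reproduces. `elink_url` is a hypothetical helper. The command set is the lower-cased form of the function names listed earlier: `neighbor_score`, `acheck`, and `llinkslib` appear in the examples, and the remaining values are assumed to follow the same convention.

```python
from urllib.parse import urlencode

ELINK_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

# Assumed cmd values, derived from the function names in this section.
ELINK_COMMANDS = {
    "neighbor", "neighbor_score", "neighbor_history", "acheck",
    "ncheck", "lcheck", "llinks", "llinkslib", "prlinks",
}

def elink_url(dbfrom, ids, db=None, cmd=None):
    """Build an ELink URL from a UID list (hypothetical helper)."""
    if cmd is not None and cmd not in ELINK_COMMANDS:
        raise ValueError(f"unknown ELink command: {cmd}")
    params = {"dbfrom": dbfrom}
    if db is not None:
        params["db"] = db
    params["id"] = ",".join(str(i) for i in ids)
    if cmd is not None:
        params["cmd"] = cmd
    # safe="," keeps the comma-delimited UID list unescaped.
    return f"{ELINK_BASE}?{urlencode(params, safe=',')}"

print(elink_url("protein", [15718680, 157427902], db="gene"))
print(elink_url("protein", [15718680, 157427902], cmd="acheck"))
```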

**Entrez Links:**

A comprehensive list of available Entrez links can be found [here](https://www.ncbi.nlm.nih.gov/books/NBK25499/).

**Note:**

* For requests with more than 200 UIDs, use the HTTP POST method.
* For sequence databases, the UID list can be a mixed list of GI numbers and accession.version identifiers.

This documentation provides a comprehensive overview of the ELink API. For more detailed information, please refer to the official NCBI documentation.


## EGQuery API Documentation

### Overview

The `EGQuery` API provides a simple way to retrieve the number of records matching a given text query across all Entrez databases. This is useful for quickly estimating the size of a search result set before performing a full search.

### Base URL

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi
```

### Function

This API endpoint returns the count of records matching the provided query across all Entrez databases.

### Required Parameter

**term:** The Entrez text query. 

* **URL Encoding:** All special characters must be URL encoded.
* **Spaces:** Spaces can be replaced by '+' signs.
* **Long Queries:** For very long queries (more than several hundred characters), consider using an HTTP POST call.
* **Search Fields and Tags:** Refer to the PubMed or Entrez help for information about search field descriptions and tags. These are database specific.

### Example Request

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term=asthma
```

### Response Format

The API returns a simple XML response containing the number of records matching the query.

**Example Response:**

```xml
<?xml version="1.0"?>
<!DOCTYPE eGQueryResult PUBLIC "-//NLM//DTD eGQueryResult, 20030314//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/eGQueryResult.dtd">
<eGQueryResult>
  <Count>12345</Count>
</eGQueryResult>
```

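In this illustrative response, the `Count` element shows 12345 records matching the query "asthma". The number is a placeholder rather than real output, and the structure is simplified: a live response reports a count for each Entrez database.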

### Notes

* The `EGQuery` API is a simple and efficient way to get a quick estimate of the size of a search result set.
* For more complex searches or specific database queries, consider using other Entrez APIs like `esearch` or `efetch`.
* Refer to the Entrez documentation for detailed information about search fields, tags, and other API options.


## eSpell API Documentation

### Overview

The eSpell API provides spelling suggestions for terms within a single text query in a given Entrez database. This can be helpful for improving search accuracy and finding relevant results.

### Base URL

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi
```

### Required Parameters

| Parameter | Description | Example |
|---|---|---|
| `db` | The Entrez database to search. Must be a valid Entrez database name. | `pubmed` |
| `term` | The Entrez text query. All special characters must be URL encoded. Spaces can be replaced by '+' signs. For very long queries, consider using an HTTP POST call. | `asthmaa+OR+alergies` |

**Note:** For information about search field descriptions and tags, refer to the PubMed or Entrez help documentation. Search fields and tags are database specific.

### Example Request

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi?db=pubmed&term=asthmaa+OR+alergies
```

This request asks for spelling suggestions in the PubMed database for the query "asthmaa OR alergies".

### Response Format

The eSpell API returns an XML response containing the following elements:

* **Query**: The original search query.
* **Suggestions**: A list of spelling suggestions for the query. Each suggestion includes:
    * **Term**: The suggested term.
    * **Score**: A score indicating the likelihood of the suggestion being correct.

### Example Response

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE eSpell PUBLIC "-//NLM//DTD eSpell 2.0//EN" "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.dtd">
<eSpell>
  <Query>asthmaa OR alergies</Query>
  <Suggestions>
    <Suggestion>
      <Term>asthma OR allergies</Term>
      <Score>100</Score>
    </Suggestion>
  </Suggestions>
</eSpell>
```

This response suggests "asthma OR allergies" as a possible correction for the original query.

### Additional Information

* The eSpell API is case-insensitive.
* The `db` parameter defaults to `pubmed` if not specified.
* For more information about Entrez databases and search fields, visit the [Entrez website](https://www.ncbi.nlm.nih.gov/entrez/).


## eCitMatch API Documentation

This documentation outlines the eCitMatch API, a service provided by NCBI's Entrez system for retrieving PubMed IDs (PMIDs) based on input citation strings.

### Base URL

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ecitmatch.cgi
```

### Function

The eCitMatch API retrieves PMIDs corresponding to a set of input citation strings. It utilizes a specific format for representing citations, allowing for efficient matching against the PubMed database.

### Required Parameters

The following parameters are mandatory for using the eCitMatch API:

* **db:** Specifies the database to search. Currently, only 'pubmed' is supported.
* **retmode:** Defines the retrieval type. Only 'xml' is supported.
* **bdata:** Contains the citation strings. Each citation must adhere to the following format:

```
journal_title|year|volume|first_page|author_name|your_key|
```

**Explanation of fields:**

* **journal_title:** The title of the journal. Spaces should be replaced with '+' symbols.
* **year:** The year of publication.
* **volume:** The volume number.
* **first_page:** The first page number of the article.
* **author_name:** The name of the first author.
* **your_key:** An arbitrary label provided by the user to identify the citation. This label will be included in the output.

**Multiple citations:**

To submit multiple citations, separate them with a carriage return character (`%0D`).

**Example:**

```
proc+natl+acad+sci+u+s+a|1991|88|3248|mann+bj|Art1|%0Dscience|1987|235|182|palmenberg+ac|Art2|
```

This example provides two citations, one for an article in "Proceedings of the National Academy of Sciences, USA" and another for an article in "Science".
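
The `bdata` format above can be produced from structured citation data. A minimal sketch, with hypothetical helper names `citation_to_bdata` and `build_bdata`, that reproduces the example string:

```python
def citation_to_bdata(journal, year, volume, first_page, author, key):
    """Format one citation as a pipe-delimited bdata record (hypothetical helper)."""
    fields = [journal, str(year), str(volume), str(first_page), author, key]
    # Spaces become '+', and every field (including the last) ends with '|'.
    return "".join(field.replace(" ", "+") + "|" for field in fields)

def build_bdata(citations):
    """Join several citation records with an encoded carriage return (%0D)."""
    return "%0D".join(citation_to_bdata(*c) for c in citations)

bdata = build_bdata([
    ("proc natl acad sci u s a", 1991, 88, 3248, "mann bj", "Art1"),
    ("science", 1987, 235, 182, "palmenberg ac", "Art2"),
])
print(bdata)
```

The resulting string is already in URL form, so it should be appended to the request as-is; running it through `urllib.parse.urlencode` would escape the `+` and `%0D` a second time.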

### Output Format

The eCitMatch API returns an XML response containing the matched PMIDs. The output includes the provided 'your_key' for each citation, allowing you to easily associate the PMIDs with their corresponding input citations.

### Example Request

```
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ecitmatch.cgi?db=pubmed&retmode=xml&bdata=proc+natl+acad+sci+u+s+a|1991|88|3248|mann+bj|Art1|%0Dscience|1987|235|182|palmenberg+ac|Art2|
```

This request searches the PubMed database for the two citations provided in the 'bdata' parameter and returns the corresponding PMIDs in XML format.

### Conclusion

The eCitMatch API provides a convenient way to retrieve PMIDs based on citation strings. By adhering to the specified format and using the required parameters, you can efficiently match citations against the PubMed database and obtain the relevant PMIDs for further research or analysis.
