In [None]:
import ast
from IPython.display import display, Markdown

The documentation for Hutch now uses [Nextra](https://nextra.site), which renders markdown into pages. I liked using Sphinx because, with the autoapi plugin, it populates an API reference automatically. That behaviour would be useful to have if the DRS converges on Nextra as a solution.

I don't know how to write a plugin for a static site generator, but I do know how to make something that generates markdown files you can drop in. I've already written a solution that makes the folder structure and inserts a header for each source code file. I had a hacky solution that rendered Python entities as markdown, but it was a bit rubbish. I've tried an improved version that goes through the Python abstract syntax tree using `ast` from the standard library. I'll explain it using a Python script that handily contains a function and a class.

First, we read the source as a string.

In [24]:
with open("/Users/james/Documents/Code/lettuce/Lettuce/omop/OMOP_match.py", "r") as f:
    omop_source = f.read()

omop_source[:300]

'import re\nfrom os import environ\nfrom urllib.parse import quote_plus\nfrom typing import List\n\nimport pandas as pd\nfrom dotenv import load_dotenv\nfrom rapidfuzz import fuzz\nfrom sqlalchemy import create_engine\nfrom sqlalchemy.orm import sessionmaker\nfrom omop.omop_models import build_query\n\nfrom logg'

`ast.parse` parses the code and returns an `ast.Module`. The top-level statements of the syntax tree are in the `.body` attribute of that object.

In [25]:
omop_tree = ast.parse(omop_source)

In [26]:
omop_tree.body

[<ast.Import at 0x13ddbbf90>,
 <ast.ImportFrom at 0x13d7eba50>,
 <ast.ImportFrom at 0x13ddbbf10>,
 <ast.ImportFrom at 0x13ddbbe90>,
 <ast.Import at 0x13ddbbe10>,
 <ast.ImportFrom at 0x13ddbbd90>,
 <ast.ImportFrom at 0x13ddbbd10>,
 <ast.ImportFrom at 0x13ddbbc90>,
 <ast.ImportFrom at 0x13ddbbc10>,
 <ast.ImportFrom at 0x13ddbbb90>,
 <ast.ImportFrom at 0x13ddbbb10>,
 <ast.ImportFrom at 0x13ddbba90>,
 <ast.ClassDef at 0x13ddbba10>,
 <ast.FunctionDef at 0x13d9c7550>]

The objects in the `body` list are, in this case, imports, a class definition, and a function definition. Let's start with the function. We can access its name with the `.name` attribute.

In [29]:
omop_functions = [x for x in omop_tree.body if isinstance(x, ast.FunctionDef)]
for fun in omop_functions:
    print(fun.name)

run
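The same steps can be tried without the Lettuce file on disk. This is a minimal sketch on a made-up module (the `EXAMPLE_SOURCE` string, `Greeter`, `shout` and `fetch` are invented for illustration). It also checks for `ast.AsyncFunctionDef`, which an `isinstance(x, ast.FunctionDef)` test on its own would miss.

```python
import ast

# A made-up module, standing in for a real source file
EXAMPLE_SOURCE = '''
import re

class Greeter:
    """Says hello"""
    def greet(self, name: str) -> str:
        return f"hello {name}"

def shout(text: str) -> str:
    """Upper-cases some text"""
    return text.upper()

async def fetch(url: str) -> str:
    return url
'''

tree = ast.parse(EXAMPLE_SOURCE)

# Only top-level definitions are in tree.body;
# methods live in the ClassDef's own .body
functions = [
    node.name
    for node in tree.body
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
]
classes = [node.name for node in tree.body if isinstance(node, ast.ClassDef)]

print(functions)
print(classes)
```

Note that `greet` does not show up in `functions`, because it is nested inside the class definition rather than sitting at the top level.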


## Functions

There's a function, `ast.get_docstring`, that retrieves the docstring for a definition. Here's `run`'s docstring.

In [41]:
ast.get_docstring(omop_functions[0])

'Runs queries against the OMOP database\n\nLoads the query options from BaseOptions, then uses these to select which queries to run.\n\nParameters\n----------\nvocabulary_id: list[str]\n    A list of vocabularies to use for search\nconcept_ancestor: bool\n    Whether to return ancestor concepts in the result\nconcept_relationship: bool\n    Whether to return related concepts in the result\nconcept_synonym: bool\n    Whether to explore concept synonyms in the result\nsearch_threshold: int\n    The fuzzy match threshold for results\nmax_separation_descendant: int\n    The maximum separation between a base concept and its descendants\nmax_separation_ancestor: int\n    The maximum separation between a base concept and its ancestors\nsearch_term: str\n    The name of a drug to use in queries to the OMOP database\nlogger: Logger\n    A logger for logging runs of the tool\n\nReturns\n-------\nlist\n    A list of OMOP concepts relating to the search term and relevant information'
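Two details of `ast.get_docstring` are worth knowing before building on it: by default it strips the common indentation from the docstring, and it returns `None` when there is no docstring at all. A small sketch on two made-up functions:

```python
import ast

tree = ast.parse('''
def documented():
    """
    Has a docstring
        with indentation
    """

def undocumented():
    pass
''')

# Map each function name to whatever ast.get_docstring gives back
docstrings = {node.name: ast.get_docstring(node) for node in tree.body}

print(repr(docstrings["documented"]))
print(repr(docstrings["undocumented"]))
```

Anything that takes the docstring as a string needs to allow for the `None` case.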

We can define our first two functions for rendering Python documentation with these. First, we can render the name of the `run` function. The `entity_type` argument is there for classes and methods, which come later. I want each function to start with an h3 header containing the name, followed by the opening of a code block like this:
```
def some_name
```

In [165]:
def render_name(name: str, entity_type: str = "") -> str:
    """
    Renders the name of an entity for markdown

    Parameters
    ----------
    name: str
        The name of the entity
    entity_type: str
        The kind of entity we're rendering. "method" and "class" have slightly different formatting.
        The default is just a def, for functions

    Returns
    -------
    str
        The name of the entity, formatted for markdown
    """
    if entity_type == "method":
        return f"#### `{name}`\n```\nmethod {name}"
    elif entity_type == "class":
        return f"### `{name}`\n```\nclass {name}"
    else:
        return f"### `{name}`\n```\ndef {name}"

render_name("run")

'### `run`\n```\ndef run'

Next, we want to be able to render the docstring of a function. For mine, I have some text describing the function, then

```
Parameters
----------
```

delineating the start of the description of the parameters and

```
Returns
-------
```
delineating the start of the description of what the function returns. Class methods will be a subsection of their class, so the header level of the Parameters and Returns subsections needs to be configurable.

In [173]:
def render_docstring(docstring: str, header_level: int = 4, parameters_string: str = "Parameters", returns_string: str = "Returns") -> str:
    """
    Renders the docstring of an entity for markdown.
    The header level of the subsections is configurable.

    Parameters
    ----------
    docstring: str
        The docstring to render
    header_level: int
        The level of header that the Parameters and Returns should be rendered at
    parameters_string: str
        The string used to delineate the parameters section
    returns_string: str
        The string used to delineate the returns section

    Returns
    -------
    str
        The markdown for the docstring
    """
    # Only restructure docstrings that have both sections
    if parameters_string in docstring and returns_string in docstring:
        # maxsplit=1 stops a second mention of a section name breaking the unpacking
        result, func_sig_string = docstring.split(parameters_string, 1)
        pars_string, ret_string = func_sig_string.split(returns_string, 1)
        result += "#" * header_level + f" {parameters_string}\n"
        for line in pars_string.split("\n"):
            # Skip blank lines and the dashed underline
            if any(char.isalnum() for char in line):
                result += line.strip() + "\n\n"
        result += "#" * header_level + f" {returns_string}\n"
        for line in ret_string.split("\n"):
            if any(char.isalnum() for char in line):
                result += line.strip() + "\n\n"
    else:
        result = docstring
    return result

render_docstring(ast.get_docstring(omop_functions[0]))

'Runs queries against the OMOP database\n\nLoads the query options from BaseOptions, then uses these to select which queries to run.\n\n#### Parameters\nvocabulary_id: list[str]\n\nA list of vocabularies to use for search\n\nconcept_ancestor: bool\n\nWhether to return ancestor concepts in the result\n\nconcept_relationship: bool\n\nWhether to return related concepts in the result\n\nconcept_synonym: bool\n\nWhether to explore concept synonyms in the result\n\nsearch_threshold: int\n\nThe fuzzy match threshold for results\n\nmax_separation_descendant: int\n\nThe maximum separation between a base concept and its descendants\n\nmax_separation_ancestor: int\n\nThe maximum separation between a base concept and its ancestors\n\nsearch_term: str\n\nThe name of a drug to use in queries to the OMOP database\n\nlogger: Logger\n\nA logger for logging runs of the tool\n\n#### Returns\nlist\n\nA list of OMOP concepts relating to the search term and relevant information\n\n'
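Splitting on the bare word `Parameters` trips up on a docstring whose description happens to contain that word. A possible alternative, sketched here under the assumption that section headers always sit on their own line followed by a line of dashes, is to look for that two-line pattern instead. `split_sections` and `SECTION_HEADER` are hypothetical names, not part of the docs builder.

```python
import re
from typing import Dict

# A header is a line of text immediately followed by a line of only dashes
SECTION_HEADER = re.compile(r"^(?P<title>\S.*)\n-{3,}[ \t]*$", re.MULTILINE)


def split_sections(docstring: str) -> Dict[str, str]:
    """Split a docstring into sections keyed by header; "" holds the summary."""
    sections = {}
    title = ""
    start = 0
    for match in SECTION_HEADER.finditer(docstring):
        # Everything up to this header belongs to the previous section
        sections[title] = docstring[start:match.start()].strip()
        title = match.group("title").strip()
        start = match.end()
    sections[title] = docstring[start:].strip()
    return sections


example = """Adds the Parameters together

Parameters
----------
a: int
    The first number

Returns
-------
int
    The sum
"""
parts = split_sections(example)
for title, body in parts.items():
    print(repr(title), "->", repr(body))
```

The word `Parameters` in the summary line is left alone because it is not followed by a dashed underline.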

The next thing we want to have in our documentation is the arguments to the function/method. Here's the structure of `run`'s arguments:

In [175]:
omop_functions[0].args.args

[<ast.arg at 0x13d9c7290>,
 <ast.arg at 0x13d9c6b50>,
 <ast.arg at 0x13d9c6a50>,
 <ast.arg at 0x13d9c6090>,
 <ast.arg at 0x13d9c56d0>,
 <ast.arg at 0x13dde6f50>,
 <ast.arg at 0x13dde6e90>,
 <ast.arg at 0x13dde6c50>,
 <ast.arg at 0x13dde7990>]

There's a list of `arg` objects in there. These are the attributes of an arg.

In [179]:
[attr for attr in dir(omop_functions[0].args.args[0]) if "_" not in attr]

['annotation', 'arg', 'lineno']

The `arg` attribute is the name of the argument, and the `annotation` is the type annotation. Ideally we want both. The `arg` is just a string, but a type annotation is defined recursively. For example, you can have

```python
List[Dict[str, Any]]
```

where the outermost annotation is a `List`.
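To see how that nesting is stored, it helps to parse the annotation on its own and look at the node types. A small sketch, assuming Python 3.9 or later (`mode="eval"` makes `ast.parse` return a single expression under `.body`):

```python
import ast

annotation = ast.parse("List[Dict[str, Any]]", mode="eval").body

# The outermost node is a Subscript: .value is the thing being subscripted,
# .slice is whatever sits inside the square brackets
print(type(annotation).__name__)
print(annotation.value.id)
print(type(annotation.slice).__name__)

# ast.unparse (Python 3.9+) turns any node back into source text
print(ast.unparse(annotation))
```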

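If the annotation is a single type (`ast.Name`), then `.id` returns the name of the type, like `str`. If the annotation is more complex, like `List[str]`, it is an `ast.Subscript`. Its `.value` is the outer type (`List`) and its `.slice` holds whatever is inside the square brackets (`str`), which can itself be another `ast.Subscript`. This means we can recursively call a function that builds annotations. The `build_annotation` below reads these two attributes the wrong way round, which is why `List[str]` comes out as `str[List]` in the rendered signatures; swapping `.value` and `.slice` would fix that. There are other kinds of annotation node that I'm not handling (for example `ast.Attribute` for `typing.List`, or `ast.BinOp` for `str | None`), so I'm just having it return an empty string in these cases.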

I've put the functions for rendering arguments together with the `render_name` and `render_docstring` functions into `build_function_md`. Here's what that looks like for `run`:

In [161]:
def build_annotation(annotation):
    if isinstance(annotation, ast.Name):
        return annotation.id
    elif isinstance(annotation, ast.Subscript):
        return f"{annotation.slice.id}[{build_annotation(annotation.value)}]"
    else:
        return ""
        
def render_args(function_args: ast.arguments):
    result = ""
    args = [arg.arg for arg in function_args.args]
    annotations = [build_annotation(x) for x in [arg.annotation for arg in function_args.args]]
    for arg, annotation in zip(args, annotations):
        result += f"\t{arg}: {annotation}\n\n"
    return result
        


def build_function_md(function_definition: ast.FunctionDef) -> str:
    result = ""
    result += render_name(function_definition.name)
    result += "(\n"
    result += render_args(function_definition.args)
    result += ")\n```"
    result += "\n\n"
    # ast.get_docstring returns None for undocumented functions
    docstring = ast.get_docstring(function_definition)
    if docstring:
        result += render_docstring(docstring)

    return result

test_function_md = build_function_md(omop_functions[0])
display(Markdown(test_function_md))

### `run`
```
def run(
	search_term: str[List]

	logger: Logger

	vocabulary_id: str[list]

	search_threshold: int

	concept_ancestor: bool

	concept_relationship: bool

	concept_synonym: bool

	max_separation_descendant: int

	max_separation_ancestor: int

)
```

Runs queries against the OMOP database

Loads the query options from BaseOptions, then uses these to select which queries to run.

#### Parameters
vocabulary_id: list[str]

A list of vocabularies to use for search

concept_ancestor: bool

Whether to return ancestor concepts in the result

concept_relationship: bool

Whether to return related concepts in the result

concept_synonym: bool

Whether to explore concept synonyms in the result

search_threshold: int

The fuzzy match threshold for results

max_separation_descendant: int

The maximum separation between a base concept and its descendants

max_separation_ancestor: int

The maximum separation between a base concept and its ancestors

search_term: str

The name of a drug to use in queries to the OMOP database

logger: Logger

A logger for logging runs of the tool

#### Returns
list

A list of OMOP concepts relating to the search term and relevant information
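
The `str[List]` in the signature above is the giveaway that the annotation is being read the wrong way round: in an `ast.Subscript`, the outer type is in `.value` and the bracket contents are in `.slice`. One possible fix, sketched here on the assumption of Python 3.9 or later, is to hand the whole node to `ast.unparse`, which also copes with annotations that are neither an `ast.Name` nor an `ast.Subscript`. `annotation_text` and the `example` function are made up for illustration.

```python
import ast
from typing import Optional


def annotation_text(annotation: Optional[ast.expr]) -> str:
    """Return the source text of a type annotation, or "" if there is none."""
    if annotation is None:
        return ""
    # ast.unparse rebuilds source code from any expression node
    return ast.unparse(annotation)


# A made-up signature; the names only need to parse, not to be importable
source = "def example(a: List[str], b: Dict[str, int] | None, c): ..."
function = ast.parse(source).body[0]

rendered = [
    f"{arg.arg}: {annotation_text(arg.annotation)}"
    for arg in function.args.args
]
print("\n".join(rendered))
```

Arguments without an annotation, like `c` here, still render with an empty type, the same as `self` does in the class output.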



## Classes

So much for functions. Classes are a bit more complicated. We have a class definition to use here, `OMOPMatcher`.

In [118]:
omop_classes = [x for x in omop_tree.body if isinstance(x, ast.ClassDef)]
omop_classes

[<ast.ClassDef at 0x13ddbba10>]

In [120]:
print(omop_classes[0].name)
ast.get_docstring(omop_classes[0])

OMOPMatcher


'This class retrieves matches from an OMOP database and returns the best'

We can re-use the code from above in `build_class_md`. This renders the name of the class in the same way as a function. Then, it uses `__init__` to describe the class constructor. Annoyingly, this currently includes the `self` parameter. I'm tired of writing this, so I'm not fixing it right now. As this makes markdown files, we can just edit it out manually. The methods of the class are then rendered under a Methods subheading.
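
One way the `self` entry could be dropped without hand-editing, sketched under the assumption that the receiver is always the first positional argument and is named `self` or `cls`, is to filter it out before rendering. `visible_args` is a hypothetical helper and the `Example` class is made up.

```python
import ast
from typing import List


def visible_args(function_definition: ast.FunctionDef) -> List[ast.arg]:
    """Return the positional arguments, minus a leading self/cls."""
    args = list(function_definition.args.args)
    if args and args[0].arg in ("self", "cls"):
        args = args[1:]
    return args


source = '''
class Example:
    def __init__(self, logger: Logger):
        self.logger = logger

    def close(self):
        pass
'''
example_class = ast.parse(source).body[0]

# Map each method name to the argument names that would be rendered
method_args = {
    method.name: [arg.arg for arg in visible_args(method)]
    for method in example_class.body
    if isinstance(method, ast.FunctionDef)
}
print(method_args)
```

This goes by name only, so a plain function whose first argument happens to be called `self` would lose it too.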

In [181]:
def build_class_md(class_definition: ast.ClassDef) -> str:
    methods = [x for x in class_definition.body if isinstance(x, ast.FunctionDef)]
    # A class might not define its own __init__
    init_method = next((x for x in methods if x.name == "__init__"), None)

    result = ""
    result += render_name(class_definition.name, entity_type="class")
    result += "(\n"
    if init_method is not None:
        result += render_args(init_method.args)
    result += ")\n```\n\n"

    # ast.get_docstring returns None for undocumented classes
    class_docstring = ast.get_docstring(class_definition)
    if class_docstring:
        result += render_docstring(class_docstring)
    if len(methods) > 1:
        result += "\n### Methods"
        for method in methods:
            result += "\n\n"
            result += render_name(method.name, entity_type = "method")
            result += "(\n"
            result += render_args(method.args)
            result += ")\n```"
            result += "\n\n"
            if ast.get_docstring(method):
                result += render_docstring(ast.get_docstring(method), header_level = 5)

    return result

display(Markdown(build_class_md(omop_classes[0])))

### `OMOPMatcher`
```
class OMOPMatcher(
	self: 

	logger: Logger

)
```

This class retrieves matches from an OMOP database and returns the best
### Methods

#### `__init__`
```
method __init__(
	self: 

	logger: Logger

)
```



#### `close`
```
method close(
	self: 

)
```

Close the engine connection.

#### `calculate_best_matches`
```
method calculate_best_matches(
	self: 

	search_terms: str[List]

	vocabulary_id: 

	concept_ancestor: bool

	concept_relationship: bool

	concept_synonym: bool

	search_threshold: int

	max_separation_descendant: int

	max_separation_ancestor: int

)
```

Calculate best OMOP matches for given search terms

Calls fetch_OMOP_concepts on every item in search_terms.

##### Parameters
search_terms: List[str]

A list of queries to send to the OMOP database

vocabulary_id: str

An OMOP vocabulary_id to pass to the OMOP query to restrict the concepts received to a specific vocabulary

concept_ancestor: bool

If 'y', then calls fetch_concept_ancestor()

concept_relationship: bool

If 'y', then calls fetch_concept_relationship()

concept_synonym: bool

If 'y', then queries the synonym table of the OMOP database for matches to the search terms

search_threshold: int

The threshold on fuzzy string matching for returned results

max_separation_descendant: int

The maximum separation to search for concept descendants

max_separation_ancestor: int

The maximum separation to search for concept ancestors

##### Returns
list

A list of results for the search terms run with the other parameters provided.



#### `fetch_OMOP_concepts`
```
method fetch_OMOP_concepts(
	self: 

	search_term: str

	vocabulary_id: 

	concept_ancestor: bool

	concept_relationship: bool

	concept_synonym: bool

	search_threshold: int

	max_separation_descendant: int

	max_separation_ancestor: int

)
```

Fetch OMOP concepts for a given search term

Runs queries against the OMOP database
If concept_synonym != 'y', then a query is run that queries the concept table alone. If concept_synonym == 'y', then this search is expanded to the concept_synonym table.

Any concepts returned by the query are then filtered by fuzzy string matching. Any concepts satisfying the concept threshold are returned.

If the concept_ancestor and concept_relationship arguments are 'y', the relevant methods are called on these concepts and the result added to the output.

##### Parameters
search_term: str

A search term for a concept inserted into a query to the OMOP database

vocabulary_id: list[str]

A list of OMOP vocabularies to filter the findings by

concept_ancestor: str

If 'y' then appends the results of a call to fetch_concept_ancestor to the output

concept_relationship: str

If 'y' then appends the result of a call to fetch_concept_relationship to the output

concept_synonym: str

If 'y', checks the concept_synonym table for the search term

search_threshold: int

The threshold on fuzzy string matching for returned results

max_separation_descendant: int

The maximum separation to search for concept descendants

max_separation_ancestor: int

The maximum separation to search for concept ancestors

##### Returns
list | None

A list of search results from the OMOP database if the query comes back with results, otherwise returns None



#### `fetch_concept_ancestor`
```
method fetch_concept_ancestor(
	self: 

	concept_id: str

	max_separation_descendant: int

	max_separation_ancestor: int

)
```

Fetch concept ancestor for a given concept_id

Queries the OMOP database's ancestor table to find ancestors for the concept_id provided within the constraints of the degrees of separation provided.

##### Parameters
concept_id: str

The concept_id used to find ancestors

max_separation_descendant: int

The maximum level of separation allowed between descendant concepts and the provided concept

max_separation_ancestor: int

The maximum level of separation allowed between ancestor concepts and the provided concept

##### Returns
list

A list of retrieved concepts and their relationships to the provided concept_id



#### `fetch_concept_relationship`
```
method fetch_concept_relationship(
	self: 

	concept_id: 

)
```

Fetch concept relationship for a given concept_id

Queries the concept_relationship table of the OMOP database to find the relationship between concepts

##### Parameters
concept_id: str

An id for a concept provided to the query for finding concept relationships

##### Returns
list

A list of related concepts from the OMOP database



Plugging this code into the `populate_python` function of my markdown docs builder makes more useful files.