# Watchful Python API

This notebook introduces the Watchful Python API with some examples.
Be sure to check the other available documentation, as there's more you can do than is shown here.

Everything here is in Python; you can use the code segments outside a Jupyter notebook as well.

By now, your ops team should have spun up your hosted Watchful application instance, and it should be up and running.
For the purpose of experimenting with the API, you will connect to your hosted Watchful application instance in this notebook.

The server output is logged in your hosted Watchful application's mounted volume.
This can be useful to look at if anything goes wrong.

Be aware that the log file contains a complete record of your session, so if you import a lot of data, the log file grows correspondingly large and you may want to delete it to save space.

We recommend Python 3.8.12, since it is the version used to build the SDK and run this notebook.
Generally, Python >=3.7 and <=3.10.9 should work.

## Installing dependencies and Watchful SDK

We first install the Python dependencies for running this notebook, as well as the Watchful SDK, which provides a suite of APIs for interacting with your hosted Watchful application instance.

In [1]:
# Install the dependencies
import sys
!{sys.executable} -m pip install -r requirements_api_intro.txt

In [2]:
# Import Watchful SDK
import watchful as w
w.__version__

'1.1.3'

## Connecting to your already-running Watchful application instance

If you're creating hinters in an existing project, you'll need to connect to your already-running Watchful application instance and its currently active project.

Here's what that would look like:

In [3]:
# Connect to your hosted Watchful application instance
host = "your.watchful.application.host"  # change this string to your actual host
port = "9001"
w.external(host, port)

We can do a sanity check here by calling `w.get()`.
After you've connected to your hosted Watchful application instance, this function can be called anytime you like to check on its status.

As you can see in the output below, `w.get()` returns a response that contains information such as your currently active project, dataset examples (candidates) and classes, hinters created, hand labels and label distribution, confidences and error rate, recall and precision, and much more.

In [4]:
import pprint
pp = pprint.PrettyPrinter(indent=4).pprint
pp(w.get())

{   'auto_complete': {'end': 0, 'start': 0, 'values': []},
    'cand_seq_full': 24,
    'cand_seq_prefix': 24,
    'candidates': [   {   'fields': [   "I'm not happy with the service, what "
                                        'do I have to do to submit a consumer '
                                        'complaint?',
                                        'FEEDBACK'],
                          'matches': {   'hats': [   [   [0, 8, None, None, []],
                                                         [   8,
                                                             13,
                                                             None,
                                                             [   [   8,
                                                                     13,
                                                                     'Something_Else',
                                                                     67]],
                                      

Alternatively, you could have listed your projects and then opened the project you're interested in.
You can do this with the steps below.

First, we list the projects using `w.list_projects()`.
It shows every `*.hints` file in your Watchful projects directory.

In [5]:
w.list_projects()

[{'path': '/root/watchful/projects/1664235417.be7863f8-008d-4f0d-9993-2f16d39a3c24',
  'shared': True,
  'title': 'Untitled Project 2022-09-22'},
 {'path': '/root/watchful/2022-09-28.hints',
  'shared': False,
  'title': 'Untitled Project 2022-09-28'},
 {'path': '/root/watchful/projects/1664395633.d8e2fa69-aa08-4b5f-8df4-23a6295b76b9',
  'shared': True,
  'title': 'Untitled Project 2022-09-28'},
 {'path': '/root/watchful/projects/1664395695.e84d28cd-264f-4e72-be87-30bdee029cc2',
  'shared': True,
  'title': 'test project'}]
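Titles are not unique: in the output above, two projects are both titled "Untitled Project 2022-09-28". If you'd rather pick a project by title than copy its path, a small helper like the hypothetical `find_project_paths` below can filter the list. It is a sketch that assumes the list-of-dicts shape shown in the output; it doesn't call the SDK itself, so it runs offline on a copy of that output.

```python
from typing import Dict, List


def find_project_paths(projects: List[Dict], title: str) -> List[str]:
    """Return the paths of all projects whose title matches exactly.

    Hypothetical helper: `projects` is assumed to have the shape returned by
    `w.list_projects()` in this notebook (dicts with 'path', 'shared', 'title').
    """
    return [p["path"] for p in projects if p.get("title") == title]


# Sample data copied from the `w.list_projects()` output in this notebook.
projects = [
    {"path": "/root/watchful/projects/1664235417.be7863f8-008d-4f0d-9993-2f16d39a3c24",
     "shared": True, "title": "Untitled Project 2022-09-22"},
    {"path": "/root/watchful/2022-09-28.hints",
     "shared": False, "title": "Untitled Project 2022-09-28"},
    {"path": "/root/watchful/projects/1664395633.d8e2fa69-aa08-4b5f-8df4-23a6295b76b9",
     "shared": True, "title": "Untitled Project 2022-09-28"},
    {"path": "/root/watchful/projects/1664395695.e84d28cd-264f-4e72-be87-30bdee029cc2",
     "shared": True, "title": "test project"},
]

matches = find_project_paths(projects, "Untitled Project 2022-09-28")
print(len(matches))  # 2: this title alone is ambiguous
```

With a live connection you might call `find_project_paths(w.list_projects(), "test project")` and pass the single resulting path to `w.open_project(...)`, after checking that exactly one path came back.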

Next, we open the project you're interested in by passing its path to `w.open_project(...)`.

Note that you can also open a hints file outside of the Watchful project directory, if you ever need to.

In [6]:
w.open_project("/root/watchful/projects/1664235417.be7863f8-008d-4f0d-9993-2f16d39a3c24")

b'"OK"\n'

In [7]:
pp(w.get())

{   'auto_complete': {'end': 0, 'start': 0, 'values': []},
    'cand_seq_full': 24,
    'cand_seq_prefix': 24,
    'candidates': [   {   'fields': [   "I'm not happy with the service, what "
                                        'do I have to do to submit a consumer '
                                        'complaint?',
                                        'FEEDBACK'],
                          'matches': {   'hats': [   [   [0, 8, None, None, []],
                                                         [   8,
                                                             13,
                                                             None,
                                                             [   [   8,
                                                                     13,
                                                                     'Something_Else',
                                                                     67]],
                                      

If you're not using the UI to manage your projects, and want to do everything using the API, you can create a new project with `w.create_project()`.
It will additionally be opened automatically, so you don't need to call `w.open_project(...)`.
You can give it a title with `w.title("My Project")`.
We will show this in a later section below.

We recommend using the UI and the API together: connect to your hosted Watchful application via the API from a notebook or plain Python, and visualize your work in the UI at the same time.

# A walk-through of the API

Now that you're connected to your Watchful application, let's take a look at the key parts of the API that are most commonly used.

## `get()`

You can use `w.get()` at any time to get the current status.
The structure returned is called the `summary`, and it is fairly complete.
We already used `w.get()` earlier as a sanity check after connecting to your Watchful application instance; here we explain it in more detail.

It's worth noting that:
- The frontend of your Watchful application gets everything that it displays from this same `summary` object, so if you see it in the frontend, you can find it in the `summary`.
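- The `summary` object is returned from most API calls, not just `w.get()`, so when you call another function that sends a request to your Watchful application, you usually get the `summary` object back. (Some calls, such as `w.create_project()` and `w.open_project(...)`, return a short `b'"OK"\n'` response instead, as seen in this notebook.)
- The fields of the `summary` object are always present, even when they are empty.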

In [8]:
w.get().keys()

dict_keys(['auto_complete', 'cand_seq_full', 'cand_seq_prefix', 'candidates', 'classes', 'datasets', 'disagreements', 'error_msg', 'error_verb', 'export_preview', 'exports', 'field_names', 'hand_labels', 'hinters', 'messages', 'n_candidates', 'n_handlabels', 'ner_hl_text', 'notifications', 'precision_candidate', 'project_config', 'project_id', 'published_title', 'pull_actions', 'push_actions', 'query', 'query_breakdown', 'query_completed', 'query_end', 'query_examined', 'query_full_rows', 'query_history', 'query_hit_count', 'query_page', 'selected_class', 'selections', 'show_notification_badge', 'state_seq', 'status', 'suggestion', 'suggestions', 'title', 'unlabeled_candidate', 'watchful_home'])

It's also worth mentioning a couple of these fields that are especially useful.

The `status` field tells you whether the backend is doing work or not. It is usually "current", which is what you want.
If it is "working", the backend is still doing some work, and you can expect some things to change.
For example, when you create a hinter (as we'll do below), the `summary` object is returned immediately with a status of "working" while the hinter is still being applied to all the candidates in the background; once that finishes, the status goes back to "current".
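Because background work can leave the `status` at "working" for a while, it can be handy to poll until it settles. The sketch below is a hypothetical helper, not part of the SDK: it takes any zero-argument function that returns a summary (such as `w.get`) and polls it until `status` is "current" or a timeout expires. The timeout and polling interval are arbitrary defaults.

```python
import time
from typing import Callable, Dict


def wait_until_current(get_summary: Callable[[], Dict],
                       timeout: float = 60.0,
                       interval: float = 0.5) -> Dict:
    """Poll `get_summary` until its 'status' is "current", then return that summary.

    Hypothetical helper. Raises TimeoutError if the status is still not
    "current" after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        summary = get_summary()
        if summary.get("status") == "current":
            return summary
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status still {summary.get('status')!r} after {timeout}s")
        time.sleep(interval)


# Offline demo: a fake getter that reports "working" twice, then "current".
_statuses = iter(["working", "working", "current"])
demo = wait_until_current(lambda: {"status": next(_statuses)}, interval=0.01)
print(demo["status"])  # current
```

In the notebook you would call `summary = wait_until_current(w.get)`; the demo substitutes a fake getter so the logic can be checked without a server.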

The `error_msg` field holds error information when something goes wrong.
If it is not `None`, the API request did not succeed, so check this field after calls that might fail.
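One way to act on `error_msg` is to wrap calls that return a summary in a check that raises when it is set. The hypothetical `check_summary` below is a sketch under that assumption. It also reports `error_verb`, which we assume names the verb of the failed request; the field appears in the summary, but its meaning isn't documented in this notebook.

```python
from typing import Dict


def check_summary(summary: Dict) -> Dict:
    """Return `summary` unchanged, or raise RuntimeError if it reports an error.

    Hypothetical helper: relies on 'error_msg' being None when a request succeeds.
    'error_verb' is included only for context (its meaning is assumed).
    """
    if summary.get("error_msg"):
        raise RuntimeError(
            f"Watchful request failed ({summary.get('error_verb')}): {summary['error_msg']}"
        )
    return summary


# Offline demo on a trimmed, successful summary.
ok = check_summary({"error_msg": None, "error_verb": None, "status": "current"})
print(ok["status"])  # current
```

With a live connection this might look like `check_summary(w.hinter("Interesting", "/7$/", 80))`.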

## Loading and querying data

If you want to edit this notebook to load a CSV file from your own computer, read the file's contents into the `csv` variable as shown below.

In [None]:
# from pathlib import Path
# csv = Path("~/path/to/data.csv").expanduser().read_text()  # open() alone would not expand "~"

As we just want to focus on the API without getting distracted by real data, we'll use the integers from 0 to 999 as a toy example.

We generate the integers 0 through 999 with `range(1000)` and put each on its own line, so that we end up with a minimal CSV file containing a single column of integers.

This will be the dataset that we'll be working with.

In [9]:
csv = ""
for i in range(1000):
    csv += str(i) + "\n"
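The cell above builds the CSV text by hand, which is fine for a single column of integers. For real data whose values may contain commas or quotes, Python's standard `csv` module handles quoting for you. This is a sketch that assumes you still want one row per line with `\n` endings; the module is imported under another name because this notebook uses `csv` as a variable.

```python
import csv as csv_module  # aliased: the notebook uses `csv` as a variable name
import io

# One single-column row per integer, as in the hand-built version.
rows = [[i] for i in range(1000)]

buffer = io.StringIO()
# lineterminator="\n" matches the hand-built string; the module default is "\r\n".
writer = csv_module.writer(buffer, lineterminator="\n")
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text[:8].splitlines())  # ['0', '1', '2', '3']

# A value containing a comma is quoted automatically (default QUOTE_MINIMAL).
quoted = io.StringIO()
csv_module.writer(quoted, lineterminator="\n").writerow(["I'm not happy, really"])
print(quoted.getvalue())  # "I'm not happy, really"
```

For this toy dataset `csv_text` is identical to the hand-built string, so it could be passed to `w.records(...)` in the same way.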

Now, you can create a new project with `w.create_project()`, then give it the title "My Project" by calling `w.title("My Project")`.

The `summary` below is empty because we don't have any data yet, but it shows the fields that are always there.

In [10]:
w.create_project()

b'"OK"\n'

In [11]:
w.title("My Project")

{'auto_complete': {'end': 0, 'start': 0, 'values': []},
 'cand_seq_full': 0,
 'cand_seq_prefix': 0,
 'candidates': [],
 'classes': {},
 'datasets': [],
 'disagreements': [],
 'error_msg': None,
 'error_verb': None,
 'export_preview': None,
 'exports': [],
 'field_names': [],
 'hand_labels': None,
 'hinters': [],
 'messages': [],
 'n_candidates': 0,
 'n_handlabels': 0,
 'ner_hl_text': None,
 'notifications': [],
 'precision_candidate': {'candidate': [], 'mode': 'ner'},
 'project_config': {},
 'project_id': '2022-09-30.hints',
 'published_title': None,
 'pull_actions': [],
 'push_actions': [],
 'query': '',
 'query_breakdown': {'depths': [], 'hits': [], 'offsets': [], 'values': []},
 'query_completed': True,
 'query_end': True,
 'query_examined': 0,
 'query_full_rows': False,
 'query_history': {'hits': [], 'values': []},
 'query_hit_count': 0,
 'query_page': 0,
 'selected_class': '',
 'selections': [],
 'show_notification_badge': False,
 'state_seq': 1836,
 'status': 'current',
 'suggest

Next, import the CSV data created earlier into your Watchful application with `w.records(...)`.

Since `w.records(...)` returns the `summary` object, we extract its `n_candidates` field directly to make sure that the import worked; we expect to see 1000 candidates loaded.

If you get a number smaller than 1000, your Watchful application's `status` is most likely still "working" (see the explanation of the `status` field above). In that case, call `w.get()` again a moment later and extract `n_candidates`; it should then be 1000.

In [12]:
# The output may be less than 1000 if your Watchful application is still processing the data
w.records(csv)["n_candidates"]

1000

In [13]:
w.get()["n_candidates"]

1000

When the data is imported, your Watchful application gives the first (and only) column the default field name "F1". You can verify this by extracting the `field_names` field from the `summary` object.

In [14]:
w.get()["field_names"]

['F1']

Now, we can do a query on the loaded data using `w.query(...)`.

Since our data is just numbers, we'll look for occurrences of the regex `/88/`, that is, two consecutive 8 digits.
If you like these kinds of puzzles, take a moment to guess how many numbers between 0 and 999 contain the digit pattern "88".

We'll print the whole summary object here, so we can see what it looks like with the integer data.

In [15]:
w.query("/88/")

{'auto_complete': {'end': 0, 'start': 0, 'values': []},
 'cand_seq_full': 1,
 'cand_seq_prefix': 1,
 'candidates': [{'fields': ['88'],
   'matches': {'hats': [[[0, 2, None, None, [[0, 2, 'h']]]]]}},
  {'fields': ['188'],
   'matches': {'hats': [[[0, 3, None, None, [[1, 3, 'h']]]]]}},
  {'fields': ['288'],
   'matches': {'hats': [[[0, 3, None, None, [[1, 3, 'h']]]]]}},
  {'fields': ['388'],
   'matches': {'hats': [[[0, 3, None, None, [[1, 3, 'h']]]]]}},
  {'fields': ['488'],
   'matches': {'hats': [[[0, 3, None, None, [[1, 3, 'h']]]]]}},
  {'fields': ['588'],
   'matches': {'hats': [[[0, 3, None, None, [[1, 3, 'h']]]]]}},
  {'fields': ['688'],
   'matches': {'hats': [[[0, 3, None, None, [[1, 3, 'h']]]]]}},
  {'fields': ['788'],
   'matches': {'hats': [[[0, 3, None, None, [[1, 3, 'h']]]]]}}],
 'classes': {},
 'datasets': ['9666b00c-d6fc-442f-b9f7-0ab195efe3d4'],
 'disagreements': [],
 'error_msg': None,
 'error_verb': None,
 'export_preview': None,
 'exports': [],
 'field_names': ['F1'],

Further down in the full summary, `query_examined` is 1000, which means your Watchful application finished searching all the data before returning the `summary` object.

With larger datasets, the query typically continues to run in the background, but this dataset is small enough that all the results come back at once.
We also see `query_hit_count` is 19, which is the answer to our puzzle above: ten from 88, 188, ..., 988, and ten from 880 to 889, but 888 is counted twice, so the total is 19, not 20.

Finally, we see that eight candidates matching the query are returned.
These are a sample of the 19 matched candidates.
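To double-check the puzzle answer and the sample offline, here is the same search in plain Python. This assumes the `/88/` query behaves like an unanchored regex search, which Python's `re.search` mimics for this simple pattern.

```python
import re

pattern = re.compile(r"88")  # assumed Python equivalent of the Watchful query /88/
hits = [n for n in range(1000) if pattern.search(str(n))]

print(len(hits))    # 19
print(10 + 10 - 1)  # 19: x88 plus 88x, minus 888 which is in both groups
print(hits[:8])     # [88, 188, 288, 388, 488, 588, 688, 788]
```

The first eight matches in numeric order are the same eight candidates shown in the query output.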

## Set base rate, create hinter and check for matches

Let's look at the API for setting the base rate for a class, and creating a hinter for that class.

Setting a base rate or creating a hinter for a class that does not exist yet creates that class in your Watchful application.

We'll create a class of interesting numbers, which we name `Interesting`, with a base rate of 10%. This means that we expect 10% of the data to be interesting.

We deem numbers ending in "7" as interesting, so we'll create a hinter for those with a weight of 80%. This means that we expect 80% of the numbers matched by this hinter to be interesting.

We then do a query, print the first matching candidate, and check that it is indeed a candidate the hinter should match.

In [16]:
w.base_rate("Interesting", 10)
w.hinter("Interesting", "/7$/", 80)
w.query("/7$/")["candidates"][0]

{'fields': ['7'], 'matches': {'hats': [[[0, 1, None, None, [[0, 1, 'h']]]]]}}
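To check a query's candidates programmatically instead of by eye, you can pull the field values out of the `candidates` list and test them against the same pattern. The helper `candidate_values` is hypothetical, and the sample summary is trimmed from this notebook's `w.query("/7$/")` output (only the `candidates` key is kept). As before, we assume the `/7$/` query means the regex `7$`.

```python
import re
from typing import Dict, List


def candidate_values(summary: Dict, field_index: int = 0) -> List[str]:
    """Return one field from every candidate in a summary's 'candidates' list."""
    return [c["fields"][field_index] for c in summary.get("candidates", [])]


# Trimmed copy of a `w.query("/7$/")` summary from this notebook.
sample = {"candidates": [
    {"fields": ["7"], "matches": {"hats": [[[0, 1, None, None, [[0, 1, "h"]]]]]}},
    {"fields": ["17"], "matches": {"hats": [[[0, 2, None, None, [[1, 2, "h"]]]]]}},
    {"fields": ["27"], "matches": {"hats": [[[0, 2, None, None, [[1, 2, "h"]]]]]}},
]}

values = candidate_values(sample)
# Every returned value should match the hinter's pattern.
all_match = all(re.search(r"7$", v) for v in values)
print(values, all_match)  # ['7', '17', '27'] True
```

Live, this would be `candidate_values(w.query("/7$/"))`.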

In [17]:
w.query("/7$/")

{'auto_complete': {'end': 0, 'start': 0, 'values': []},
 'cand_seq_full': 7,
 'cand_seq_prefix': 7,
 'candidates': [{'fields': ['7'],
   'matches': {'hats': [[[0, 1, None, None, [[0, 1, 'h']]]]]}},
  {'fields': ['17'],
   'matches': {'hats': [[[0, 2, None, None, [[1, 2, 'h']]]]]}},
  {'fields': ['27'],
   'matches': {'hats': [[[0, 2, None, None, [[1, 2, 'h']]]]]}},
  {'fields': ['37'],
   'matches': {'hats': [[[0, 2, None, None, [[1, 2, 'h']]]]]}},
  {'fields': ['47'],
   'matches': {'hats': [[[0, 2, None, None, [[1, 2, 'h']]]]]}},
  {'fields': ['57'],
   'matches': {'hats': [[[0, 2, None, None, [[1, 2, 'h']]]]]}},
  {'fields': ['67'],
   'matches': {'hats': [[[0, 2, None, None, [[1, 2, 'h']]]]]}},
  {'fields': ['77'],
   'matches': {'hats': [[[0, 2, None, None, [[1, 2, 'h']]]]]}}],
 'classes': {'Interesting': {'br_given': 10,
   'br_pdf': [0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    100,
    99,
    97,
    96,
    95,
    94,
    92,
    89,
    87,
    85,
    80,
    77

We can see all of our hinters in the `hinters` field of the `summary`. Currently we only have the hinter that looks for `/7$/`, so exactly one hinter is returned.

Besides its label (the class) being "Interesting" and its weight being 80, its `hit_ratio` shows that it hits 100 of the 1000 integers. Its `name` is empty because we didn't give it one.

In [18]:
w.get()['hinters']

[{'hit_ratio': [100, 1000],
  'hl_ratio': [0, 0],
  'id': 1,
  'label': 'Interesting',
  'name': '',
  'query': '/7$/',
  'suggested': False,
  'weight': 80}]

## Creating a hinter from external data

Sometimes you can't extract all the data you want using a regex, or you need to query an external data source for a hinter.
"External hinters" let you provide the hint values yourself for each candidate.

Since we are looking for interesting numbers, prime numbers might also be interesting.
Perhaps it is possible to write a regex to recognize prime numbers, but it doesn't sound like a good idea, so let's use an external hinter!

In [19]:
# get the hinters
w.get()["hinters"]

[{'hit_ratio': [100, 1000],
  'hl_ratio': [0, 0],
  'id': 1,
  'label': 'Interesting',
  'name': '',
  'query': '/7$/',
  'suggested': False,
  'weight': 80}]

In [20]:
w.external_hinter("Interesting", "prime", 90)["hinters"]

[{'hit_ratio': [100, 1000],
  'hl_ratio': [0, 0],
  'id': 1,
  'label': 'Interesting',
  'name': '',
  'query': '/7$/',
  'suggested': False,
  'weight': 80},
 {'hit_ratio': [0, 0],
  'hl_ratio': [0, 0],
  'id': 2,
  'label': 'Interesting',
  'name': 'prime',
  'query': '[external]',
  'suggested': False,
  'weight': 90}]

We called `w.external_hinter(...)`, which takes the class name, the name of the hinter, and the hinter weight (here 90%).
We are saying that prime numbers are 90% likely to be interesting.
Why not 100%?
Remember that we set our base rate to 10%, which means we are looking specifically for the 10% of numbers that are the *most* interesting, so some of the prime numbers might not make the cut.
A number that is both prime *and* ends in 7 matches both of our hinters and gets a much stronger boost as an interesting number, so we'd expect to see such numbers if, for example, we take the top 10% of the probabilistic labels for the `Interesting` class.

In the output above, after creating the external hinter, you can see that there is now a hinter in the list with name "prime" and a query "\[external\]".
Compare this to the hinter we created before which has the query `/7$/`.
This is the difference between a regular hinter, which is defined by what matches a query, and an external hinter.

This is also why it's important that we give our external hinters names.
If we look at the first hinter, we can see what it's doing because there's a query there, but for the second hinter, if we had several external hinters, we would have no idea what this one is doing if we hadn't given it a name.

The `w.external_hinter(...)` API call provides "\[external\]" as the query value, which is not a real query but a special value that Watchful recognizes.
Unlike regular hinters, which are applied immediately, an external hinter is not applied until we send its hint values: one true or false value for every candidate.

Note that the `hit_ratio` in a hinter, seen above, indicates both the number of positive matches (the first number) and the number of candidates that have been examined so far (the second number).
In our new external hinter, both of these numbers will be 0, because the system is waiting for us to give it these values.
With a regular hinter that is defined by a query, you can see both these numbers go up as the entire dataset is queried, just like with the `query_examined` and `query_hit_count` values that we saw before.
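Since `hit_ratio` carries both the hit count and the examined count, a short report over `summary["hinters"]` makes it easy to see which hinters are still waiting. The hypothetical `hinter_progress` below is a sketch built on the hinter fields shown in the `w.external_hinter(...)` output; it falls back to the query when a hinter has no name.

```python
from typing import Dict, List, Tuple


def hinter_progress(hinters: List[Dict]) -> List[Tuple[int, str, int, int]]:
    """Summarize each hinter as (id, description, hits, examined).

    The description is the hinter's name if set, otherwise its query,
    since regular hinters in this notebook have an empty name.
    """
    rows = []
    for h in hinters:
        hits, examined = h["hit_ratio"]
        rows.append((h["id"], h["name"] or h["query"], hits, examined))
    return rows


# Copied from the `w.external_hinter(...)` output in this notebook.
hinters = [
    {"hit_ratio": [100, 1000], "hl_ratio": [0, 0], "id": 1, "label": "Interesting",
     "name": "", "query": "/7$/", "suggested": False, "weight": 80},
    {"hit_ratio": [0, 0], "hl_ratio": [0, 0], "id": 2, "label": "Interesting",
     "name": "prime", "query": "[external]", "suggested": False, "weight": 90},
]

for hid, desc, hits, examined in hinter_progress(hinters):
    status = "nothing examined yet" if examined == 0 else f"{hits}/{examined} hits"
    print(hid, desc, status)
# 1 /7$/ 100/1000 hits
# 2 prime nothing examined yet
```

Live, you would pass `w.get()["hinters"]` instead of the copied list.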

Now we'll provide the values for this hinter.

In [21]:
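# We iterate over all candidates, store the hint values as we go,
# then provide all the candidates' hint values at once with `w.hint_all(...)`
hint_values = []
for candidate in w.dump():
    # Candidate fields are stored as strings, so convert the single field to an int
    n = int(candidate[0])
    # Quick trial-division check: no divisor in 2..31. Every composite below 1000
    # has a prime factor <= 31, but this shortcut also flags 1 as prime and
    # rejects the primes 2..31 themselves (each is divisible by itself).
    is_prime = not any(n % x == 0 for x in range(2, 32))
    hint_values.append(is_prime)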
    
# We use `w.hint_all(...)` because we already have all the hint values in memory,
# but if the dataset is large, you can stream your hints back in chunks.
summary = w.hint_all("prime", hint_values)
print(summary["hinters"])
print(summary["status"])

[{'hit_ratio': [100, 1000], 'hl_ratio': [0, 0], 'id': 1, 'label': 'Interesting', 'name': '', 'query': '/7$/', 'suggested': False, 'weight': 80}, {'hit_ratio': [158, 1000], 'hl_ratio': [0, 0], 'id': 2, 'label': 'Interesting', 'name': 'prime', 'query': '[external]', 'suggested': False, 'weight': 90}]
current


Above, `dump()` is a special API call used for external hinting.
Behind the scenes, it is making multiple calls to the API to get candidates in chunks.
If your dataset is very large, you may want to stream the results back to the system as well, rather than creating a single array of hint values as we are doing here.
If you need to do this, check out the implementation of `hint_all()` and the other `*hint*()` functions for details.

Although our table holds numbers, candidate data is stored as strings, which is why the cell converts each value with `int(...)`.
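As mentioned above, for very large datasets you may want to stream hint values back in chunks. This notebook doesn't show which `*hint*()` function accepts a partial batch or what its signature is, so the sketch below covers only the batching side: a hypothetical `chunked` generator that splits any iterable of hint values into fixed-size lists. (Python 3.12 offers `itertools.batched`, but the recommended 3.8 environment does not have it.) Check `client.py` for the function to send each chunk with.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical hint values produced lazily, one per candidate.
hint_stream = (n % 2 == 0 for n in range(10))
print([len(c) for c in chunked(hint_stream, 4)])  # [4, 4, 2]
```

Because `chunked` consumes its input lazily, the full list of hint values never has to be held in memory at once.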

When you run `w.hint_all(...)`, the new hint values are applied in the background, so the `summary` object it returns may not reflect much change yet: the `hit_ratio` may still read 0/0, and the `status` field may be "working" while background work continues. If so, call `w.get()` a moment later to see the final results.

In [22]:
summary = w.get()
print(summary["hinters"])
print(summary["status"])

[{'hit_ratio': [100, 1000], 'hl_ratio': [0, 0], 'id': 1, 'label': 'Interesting', 'name': '', 'query': '/7$/', 'suggested': False, 'weight': 80}, {'hit_ratio': [158, 1000], 'hl_ratio': [0, 0], 'id': 2, 'label': 'Interesting', 'name': 'prime', 'query': '[external]', 'suggested': False, 'weight': 90}]
current


Now we can see the hit ratio for our external hinter.

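So the hinter flags 158 of the 1000 numbers, which is also the number of `True` values in the `hint_values` list above. Note that this is not the number of primes below 1000, which is 168: the quick divisibility check in the cell counts 1 as prime and rejects the primes 2 through 31, since each of them is divisible by itself.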
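The snippet below compares the quick check from the hinting cell with an exact trial-division test over 0 to 999, to show precisely where the two differ. `is_prime` and `shortcut` are names chosen for this sketch, and it runs without the SDK.

```python
def is_prime(n: int) -> bool:
    """Exact trial-division primality test for non-negative integers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:  # only divisors up to sqrt(n) need checking
        if n % d == 0:
            return False
        d += 1
    return True


def shortcut(n: int) -> bool:
    """The quick check used in the hinting cell: no divisor in 2..31."""
    return not any(n % d == 0 for d in range(2, 32))


exact = {n for n in range(1000) if is_prime(n)}
quick = {n for n in range(1000) if shortcut(n)}
print(len(exact), len(quick))  # 168 158
print(sorted(quick - exact))   # [1]
print(sorted(exact - quick))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
```

To send exact values instead, you could build the list as `[is_prime(int(c[0])) for c in w.dump()]` (renaming the cell's `is_prime` variable so it doesn't shadow the function) and pass it to `w.hint_all("prime", ...)`; we'd then expect the hinter's hit count to be 168.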

## Deleting hinters and classes

Now that we've created some hinters, we can also delete them.

You can delete a class as well, but you have to delete all the hinters in it first.

Hinters are deleted by id and classes by name.
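Because a class can only be deleted once its hinters are gone, it helps to collect the ids of a class's hinters first. The hypothetical helper below does that from a summary's `hinters` list, matching on each hinter's `label`. The deletion calls themselves appear only as comments, since they need a live connection.

```python
from typing import Dict, List


def hinter_ids_for_class(summary: Dict, class_name: str) -> List[int]:
    """Return the ids of every hinter whose label is `class_name`."""
    return [h["id"] for h in summary.get("hinters", []) if h.get("label") == class_name]


# Trimmed sample summary; the "Boring" hinter is a hypothetical extra class.
sample_summary = {"hinters": [
    {"id": 1, "label": "Interesting", "name": "", "query": "/7$/"},
    {"id": 2, "label": "Interesting", "name": "prime", "query": "[external]"},
    {"id": 3, "label": "Boring", "name": "", "query": "/0$/"},
]}
print(hinter_ids_for_class(sample_summary, "Interesting"))  # [1, 2]

# With a live connection (not run here):
# for hinter_id in hinter_ids_for_class(w.get(), "Interesting"):
#     w.delete(hinter_id)
# w.delete_class("Interesting")
```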

In [23]:
w.get()['hinters']

[{'hit_ratio': [100, 1000],
  'hl_ratio': [0, 0],
  'id': 1,
  'label': 'Interesting',
  'name': '',
  'query': '/7$/',
  'suggested': False,
  'weight': 80},
 {'hit_ratio': [158, 1000],
  'hl_ratio': [0, 0],
  'id': 2,
  'label': 'Interesting',
  'name': 'prime',
  'query': '[external]',
  'suggested': False,
  'weight': 90}]

In [24]:
w.delete(1)
w.delete(2)
w.get()['hinters']

[]

Above, you can see that the `summary` object no longer lists any hinters after we deleted them.

Next, we delete the class.

In [25]:
summary = w.get()
list(summary['classes'].keys())

['Interesting']

In [26]:
summary = w.delete_class("Interesting")
list(summary['classes'].keys())

[]

Above, you can see that the `summary` object no longer lists any classes after we deleted the one (and only) class.

# Exploring the API further

You can also look at the `client.py` module in the Watchful package for the API functions, some of which are already covered above.

A way to quickly explore the functions in `client.py` and their current documentation is to use the built-in help function, as shown below.

You could also explore the Watchful package documentation hosted at https://watchful.readthedocs.io/en/stable/.
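Besides `help(...)`, the standard `inspect` module can list a module's functions together with their signatures. The sketch below is a generic helper, not part of the SDK. It is shown on the standard-library `textwrap` module so it runs anywhere; with the SDK installed you could pass `w.client` instead.

```python
import inspect
import textwrap
from types import ModuleType
from typing import List


def public_functions(module: ModuleType) -> List[str]:
    """List 'name(signature)' for public functions defined in `module` itself."""
    out = []
    for name, fn in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions imported from other modules.
        if name.startswith("_") or fn.__module__ != module.__name__:
            continue
        out.append(f"{name}{inspect.signature(fn)}")
    return out


for line in public_functions(textwrap):
    print(line)
```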

In [27]:
help(w)

Help on package watchful:

NAME
    watchful - Initializes ``watchful`` as a module.

PACKAGE CONTENTS
    attributes
    client
    enrich
    enricher

DATA
    ATTR_WRITER = None
    BASE = 64
    COMPRESSED = {0: '#', 1: '$', 2: '%', 3: '&', 4: "'", 5: '(', 6: ')', ...
    COMPRESSED_LEN = 8
    Callable = typing.Callable
    Dict = typing.Dict
    ENRICHMENT_ARGS = None
    EnrichedCell = typing.List[typing.Tuple[typing.Union[typing.Lis...ing....
    Generator = typing.Generator
    HOST = 'localhost'
    IS_MULTIPROC = False
    List = typing.List
    MULTIPROC_CHUNKSIZE = None
    NUMERALS = {0: '0', 1: '1', 2: '2', 3: '3', 4: '4', 5: '5', 6: '6', 7:...
    Optional = typing.Optional
    PORT = '9002'
    Tuple = typing.Tuple
    Union = typing.Union

VERSION
    1.1.3




To explore the API functions in the `client.py` module:

In [28]:
help(w.client)

Help on module watchful.client in watchful:

NAME
    watchful.client

DESCRIPTION
    This script provides the functions required for interacting directly with
    Watchful client application.

FUNCTIONS
    api(verb: str, **args: Dict) -> Union[Dict, NoneType]
        This is a convenience function for API calls; made up of a verb and optional
        keyword arguments.
        
        :param verb: The verb for the API.
        :type verb: str
        :param args: Optional parameters to support the API for ``verb``.
        :type args: Dict
        :return: The dictionary of the HTTP response from the connection request.
        :rtype: Dict
    
    api_send_action(action: Dict) -> Union[Dict, NoneType]
        This is a convenience function for API calls with an action.
        
        :param action: The ``verb`` for the API with optional parameters.
        :type action: Dict
        :return: The dictionary of the HTTP response from the connection request.
        :rtype: Dict
 