# Connect to Data Commons Natural Language API

In September 2024, Google launched the [Data Gemma model](https://blog.google/technology/ai/google-datagemma-ai-llm/). Its [RIG (Retrieval Interleaved Generation) notebook](https://colab.sandbox.google.com/github/datacommonsorg/llm-tools/blob/master/notebooks/datagemma_rig.ipynb) connects the Data Gemma model to the [Data Commons Natural Language API](https://docs.datacommons.org/2023/09/13/explore.html).

This notebook explores the possibility of connecting the Gemini API to that same interface.

## Setup

### Install

Install the necessary Python packages.

In [None]:
!pip install -Uq "google-generativeai>=0.8.1"

In [None]:
!pip install -Uq datacommons

Use the helper code from the Data Commons `llm-tools` repository, which was written for Data Gemma:

https://github.com/datacommonsorg/llm-tools/blob/main/data_gemma/datacommons.py

In [None]:
!pip install -q git+https://github.com/datacommonsorg/llm-tools@d99b583ca7aa5e7085c3181a87e23364749d7c63

### Import

Import the packages

In [None]:
import data_gemma.datacommons as dc_lib

In [None]:
import google.generativeai as genai

In [None]:
from IPython import display

Get an API key from https://apikeys.datacommons.org, and make sure to activate the NL API for your key.

Note: The Data Commons trial key does not work with the Natural Language API.

The code below fetches both keys (`DATACOMMONS_API_KEY` and `GOOGLE_API_KEY`) from the "Colab Secrets" tab (the "🔑" icon on the left of the Colab window).

In [None]:
from google.colab import userdata

DATACOMMONS_API_KEY = userdata.get("DATACOMMONS_API_KEY")
genai.configure(api_key=userdata.get('GOOGLE_API_KEY'))
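Outside Colab, `google.colab.userdata` is not available. A minimal sketch of a fallback, assuming you export the keys as environment variables with the same names (`DATACOMMONS_API_KEY`, `GOOGLE_API_KEY`). The `get_secret` helper is hypothetical and not part of either SDK:

```python
import os


def get_secret(name: str) -> str:
  """Return a secret from Colab Secrets, falling back to an environment variable.

  Assumes the environment variable has the same name as the Colab secret.
  """
  try:
    # Only importable when running inside Colab.
    from google.colab import userdata
    return userdata.get(name)
  except ImportError:
    value = os.environ.get(name)
    if value is None:
      raise KeyError(f"Set the {name} environment variable or Colab secret.")
    return value
```

With this helper, `DATACOMMONS_API_KEY = get_secret("DATACOMMONS_API_KEY")` works both in Colab and in a local Jupyter session.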

In [None]:
dc = dc_lib.DataCommons(api_key=DATACOMMONS_API_KEY)

## Try the Data Commons NL API

In [None]:
dc.point("what is the GDP of Spain?")

In [None]:
dc.table("what was the GDP of spain over the years?")

## Write some wrapper functions for Gemini

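This section wraps the two methods so that Gemini can call them as tools.
The main goal is to give each function a clear docstring that explains what it does: the SDK builds each tool description from the function's name, docstring, and parameter type annotations.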
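To see which pieces of a function can feed a tool description, you can inspect it with the standard `inspect` module. This sketch only approximates what a function-calling SDK reads; it is not the SDK's own conversion code, and `describe_tool` and `example_lookup` are hypothetical names:

```python
import inspect


def describe_tool(fn):
  """Collect the name, docstring, and parameter annotations of a function."""
  signature = inspect.signature(fn)
  return {
      "name": fn.__name__,
      "description": inspect.getdoc(fn),
      # Map each parameter name to the name of its annotated type.
      "parameters": {
          name: param.annotation.__name__
          for name, param in signature.parameters.items()
      },
  }


def example_lookup(query: str):
  """Look up a single statistic for a natural language query."""
  return {}


print(describe_tool(example_lookup))
```

A missing docstring or annotation leaves the model with less to go on, which is why the wrappers below spell both out.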

In [None]:
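import dataclasses

def asdict(thing):
  """Convert a Data Commons result dataclass to a plain dict for Gemini."""
  thing = dataclasses.asdict(thing)
  # Drop empty fields so the model only sees populated values.
  thing = {key: value for key, value in thing.items() if value != ""}
  # Drop the echoed `query` and the `id` field.
  thing.pop('query', None)
  thing.pop('id', None)
  return thing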
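A self-contained check of this conversion, using a hypothetical `FakeResult` dataclass in place of the object that `dc.point` returns (its fields and values are placeholders, and the real fields may differ). The point is that the result is plain, JSON-serializable data that can be sent back to the model:

```python
import dataclasses
import json


@dataclasses.dataclass
class FakeResult:
  """Hypothetical stand-in for a Data Commons result; not the real class."""
  id: int
  query: str
  val: str = ""
  unit: str = ""
  date: str = ""


def to_response(result) -> dict:
  """Same filtering as `asdict` above: drop empty fields, `query`, and `id`."""
  data = dataclasses.asdict(result)
  data = {key: value for key, value in data.items() if value != ""}
  for key in ("query", "id"):
    data.pop(key, None)
  return data


response = to_response(FakeResult(id=1, query="placeholder", val="123", date="2020"))
print(json.dumps(response))
```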

In [None]:
def datacommons_point(query:str):
  """Call the datacommons api with a natural language query and return a single value.

  For example: "what is the GDP of Spain?"

  If the lookup fails it returns an empty result.
  """
  return asdict(dc.point(query))

In [None]:
def datacommons_table(query:str):
  """Call the datacommons api with a natural language query and return a table of values.

  For example: "what was the GDP of spain over the years?"

  If the lookup fails it returns an empty result.
  """
  return asdict(dc.table(query))

## Try it

In [None]:
SI = """You are a data analyst and research assistant.
You have access to the DataCommons natural language API.
The user will send you a question, and you will query the database to research the answer.
Make sure your answers are well researched: Don't be lazy and simply copy-paste the user's question.
Any good answer will likely take several database queries.
"""


In [None]:
model = genai.GenerativeModel(model_name='gemini-1.5-flash',
                              # Describe the tools to Gemini
                              tools=[datacommons_point, datacommons_table],
                              system_instruction=SI)

In [None]:
chat = model.start_chat(
    # Have the Chat-session automatically make the function calls and send back the responses.
    enable_automatic_function_calling=True)
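With automatic function calling, the SDK runs the loop for you: when a model reply contains a function call, it runs the matching Python function, sends the result back, and repeats until the model answers with text. The dispatch step of that loop can be sketched in plain Python as below. `FunctionCall`, `dispatch`, and `fake_point` are hypothetical stand-ins, not SDK types:

```python
import dataclasses
from typing import Any, Callable


@dataclasses.dataclass
class FunctionCall:
  """Hypothetical stand-in for a function call requested by the model."""
  name: str
  args: dict[str, Any]


def dispatch(call: FunctionCall, tools: list[Callable[..., Any]]) -> dict[str, Any]:
  """Run the tool the model asked for and package its result as a response."""
  registry = {tool.__name__: tool for tool in tools}
  if call.name not in registry:
    # Report unknown tools back to the model instead of raising.
    return {"name": call.name, "response": {"error": f"unknown function {call.name}"}}
  result = registry[call.name](**call.args)
  return {"name": call.name, "response": result}


def fake_point(query: str) -> dict[str, Any]:
  """Hypothetical tool that returns the length of the query."""
  return {"chars": len(query)}


result = dispatch(FunctionCall("fake_point", {"query": "GDP of Spain"}), [fake_point])
print(result)
```

Looking tools up by `__name__` mirrors how the model refers to each tool by the Python function's name.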

In [None]:
# Ignore the returned response, look at the chat history for the full results.
_ = chat.send_message('Give me a report on the progress in public health in Pakistan over the last 20 years')

### Show the result text

The easiest way to show the result is to go through the chat history and print all the text chunks:

In [None]:
for message in chat.history:
  for part in message.parts:
    if text:=part.text:
      print(f'{message.role.title()}:\n\n')
      print(text)
      print('-'*80)
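If you only want the final report rather than every intermediate turn, take the text parts of the last message with the `model` role. A minimal sketch; `final_answer` is a hypothetical helper, and the `SimpleNamespace` history below is a stand-in for `chat.history`:

```python
from types import SimpleNamespace


def final_answer(history) -> str:
  """Return the text of the last model message in a chat history.

  Works on any list of messages with `.role` and `.parts`, where each part
  has a `.text` attribute that is empty for non-text parts.
  """
  for message in reversed(history):
    if message.role == "model":
      return "".join(part.text for part in message.parts if part.text)
  return ""


# Stand-in history; in the notebook you would call final_answer(chat.history).
history = [
    SimpleNamespace(role="user", parts=[SimpleNamespace(text="Question?")]),
    SimpleNamespace(role="model", parts=[SimpleNamespace(text="Part one. "),
                                         SimpleNamespace(text="Part two.")]),
]
print(final_answer(history))
```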

### Show detailed results

You can get more detailed output by handling each output type separately.
`Part`s can contain `text`, `function_call`, or `function_response`. For responses that include a `table` (from `datacommons_table`), convert the plain-text table to Markdown.

In [None]:
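def markdown_table(table):
  """Convert the plain-text table from `dc.table` into a Markdown table."""
  lines = [line for line in table.splitlines() if line]
  # Infer the column count from the '|' separators in the header row.
  n_cols = lines[0].count('|') + 1 if lines else 2

  def fixline(line):
    if line.count('-') == len(line):
      # A row of dashes is the header separator: one '-' cell per column.
      return '|' + '-|' * n_cols
    # Otherwise wrap the row in outer pipes.
    return f"|{line}|"

  return '\n'.join(fixline(line) for line in lines)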

In [None]:
def table_fr(fr):
  """Format a function response as a key/value Markdown table.

  Skips the 'table' entry and empty values, without modifying `fr`.
  """
  rows = [f"|{key}|{value}|" for key, value in fr.items() if key != 'table' and value]
  return '|key|value|\n|-|-|\n' + '\n'.join(rows)
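`table_fr` writes values straight into Markdown cells, so a value that contains a `|` (for example, a source title) would split into extra columns, and a newline would end the row early. A small helper that escapes cell text, which you could apply to `key` and `value` in `table_fr`; `md_cell` is a hypothetical name, not part of the original notebook:

```python
def md_cell(value) -> str:
  """Format a value for a Markdown table cell.

  Escapes '|' so it is not read as a column separator, and replaces
  newlines, which would otherwise end the table row.
  """
  return str(value).replace("|", r"\|").replace("\n", " ")
```

GitHub-flavored Markdown treats `\|` inside a table cell as a literal pipe.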

In [None]:
for message in chat.history:
  display.display(display.Markdown(f'## {message.role.title()}:'))
  for part in message.parts:
    if text := part.text:
      display.display(display.Markdown(text))
    if function_call := part.function_call:
      query = function_call.args['query']
      display.display(display.Markdown(f'function_call: {function_call.name}("{query}")'))
    if function_response := part.function_response:
      fr = dict(function_response.response)
      # Table lookups include a plain-text 'table' entry; read it before formatting.
      table = fr.get('table')
      display.display(display.Markdown(f"function_response: {function_response.name}"))
      # Key/value summary of the other fields.
      display.display(display.Markdown(table_fr(fr)))
      if table:
        display.display(display.Markdown(markdown_table(table)))
  display.display(display.Markdown('-'*80))