**`Webinar 2024-02-28`**

[YouTube Recording](https://www.youtube.com/watch?v=r-BJha0niXM&list=PLePKvgYhNOFxg3O7gXfmxaKp0EWXEhvNC&index=12)

# Web Search

Import the required packages and set the API credentials.

In [16]:
import json
import superpowered
from pprint import pprint

API_KEY_ID = ''
API_KEY_SECRET = ''

superpowered.set_api_key(API_KEY_ID, API_KEY_SECRET)
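
Rather than pasting the key pair into the notebook, the credentials could be read from the environment. This is a minimal sketch: the variable names `SUPERPOWERED_API_KEY_ID` and `SUPERPOWERED_API_KEY_SECRET` and the helper `load_api_credentials` are assumptions made for illustration, not names defined by the library.

```python
import os


def load_api_credentials(env=os.environ):
    """Return (key_id, key_secret) read from environment variables.

    The variable names below are assumptions for this sketch, not names
    defined by the superpowered library.
    """
    key_id = env.get('SUPERPOWERED_API_KEY_ID', '')
    key_secret = env.get('SUPERPOWERED_API_KEY_SECRET', '')
    if not key_id or not key_secret:
        # Fail early with a clear message instead of sending empty credentials.
        raise RuntimeError(
            'Set SUPERPOWERED_API_KEY_ID and SUPERPOWERED_API_KEY_SECRET first.'
        )
    return key_id, key_secret
```

The returned pair can then be passed to `superpowered.set_api_key(...)` in place of the hard-coded empty strings.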

### Create Web Search Preset

In [3]:
wsp = superpowered.create_web_search_preset(
    title='Web Scholar',
    include_domains=['arxiv.org', 'sciencedirect.com'],
    timeframe_days=365,
)

### Query Endpoint

This query uses the web search preset created above rather than per-request overrides.

##### NOTE: this uses the new `json_response` feature, so the summary comes back as a JSON string

In [19]:
response = superpowered.query_knowledge_bases(
    query='What is the latest research on processing units to improve inference time in deep learning? Please respond in json where the keys are `question` and `answer`.',
    use_web_search=True,
    web_search_preset_id=wsp['id'],
    summarize_results=True,
    json_response=True,
)
pprint(json.loads(response['summary']))
pprint(response)

{'answer': 'The GitHub repositories provide valuable information on '
           'benchmarking inference speed for deep learning models on various '
           'platforms and architectures. They discuss techniques such as '
           'pruning, quantization, and optimization to improve performance. '
           'Different frameworks and platforms are compared based on their '
           'compatibility with models, detection support, and speed '
           'benchmarks. The repositories also mention specific tools and '
           'libraries like SparseML for pruning and quantization, as well as '
           'various inference engines optimized for different hardware '
           'configurations. Overall, the repositories offer a comprehensive '
           'overview of the latest research and techniques aimed at enhancing '
           'inference time in deep learning.',
 'question': 'What is the latest research on processing units to improve '
             'inference time in deep learnin
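
Because the `question` and `answer` keys are requested only through the prompt, it may be worth parsing the summary defensively in case the model returns something else. The helper below is a sketch under that assumption; `parse_json_summary` and its `required_keys` parameter are hypothetical names, not part of the library.

```python
import json


def parse_json_summary(summary, required_keys=('question', 'answer')):
    """Parse a JSON summary string.

    Returns the parsed dict, or None if the text is not valid JSON,
    is not a JSON object, or lacks one of the required keys.
    """
    try:
        data = json.loads(summary)
    except (TypeError, ValueError):
        # TypeError covers a None summary; ValueError covers malformed JSON.
        return None
    if not isinstance(data, dict):
        return None
    if any(key not in data for key in required_keys):
        return None
    return data
```

Calling `parse_json_summary(response['summary'])` in place of the bare `json.loads` would give `None` rather than an exception when the summary is not usable.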

### Chat Endpoint

This example passes the web search settings as per-request overrides instead of adding them to the chat thread's default options.

It also uses the new `mistral-large` model (released 2024-02-26).

In [20]:
thread = superpowered.create_chat_thread()

response = superpowered.get_chat_response(
    thread_id=thread['id'],
    input='What is the latest research on processing units to improve inference time in deep learning?',
    model='mistral-large',
    use_web_search=True,
    web_search_include_domains=['arxiv.org', 'sciencedirect.com', 'wikipedia.org'],
)
pprint(response)

{'interaction': {'id': '0000000001',
                 'model_response': {'content': 'The latest research on '
                                               'processing units to improve '
                                               'inference time in deep '
                                               'learning is quite extensive '
                                               'and multifaceted. Here are '
                                               'some key findings from the '
                                               'papers you provided:\n'
                                               '\n'
                                               '1. **Edge TPUs**: A study by '
                                               'Google researchers evaluated '
                                               'Edge TPUs, a domain of '
                                               'accelerators for low-power, '
                                               'edge devices used in vario
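
The printed response nests the reply text under `interaction` → `model_response` → `content`. As a sketch, a small helper can pull that text out without raising if a level is missing. The name `extract_chat_content` is hypothetical, and `sample_response` is a stand-in that only mirrors the key structure shown above, not real output.

```python
def extract_chat_content(response):
    """Return the model's reply text from a chat response dict, or None."""
    interaction = response.get('interaction') or {}
    model_response = interaction.get('model_response') or {}
    return model_response.get('content')


# Stand-in dict with the same nesting as the printed response (placeholder text).
sample_response = {
    'interaction': {
        'id': '0000000001',
        'model_response': {'content': 'Example reply text.'},
    }
}

print(extract_chat_content(sample_response))
```

Printing the extracted string directly renders the reply's line breaks and markdown as intended, which is easier to read than the `pprint` dump of the whole response.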