## Hybrid Search
KDB.AI hybrid search is a similarity search method that increases the relevance of results retrieved from the vector database by combining two search methods: sparse vector search and dense vector search.

Sparse vector search uses the BM25 algorithm to find the most relevant keyword matches, while dense vector search finds the most semantically relevant matches. 

In KDB.AI, users can run sparse or dense search independently, or run a hybrid search, which runs both the sparse and dense vector searches and then re-ranks and combines their results based on a user-defined "alpha" value. An alpha value closer to 0 gives more weight to the sparse search, while a value closer to 1 gives more weight to the dense search.
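To make the role of alpha concrete, here is a minimal sketch of one common way to blend two sets of scores. It assumes that higher scores mean more relevant, that both score sets are min-max normalized to [0, 1], and that they are then mixed linearly. KDB.AI's internal re-ranking may work differently, and the names `min_max` and `blend_scores` are illustrative helpers, not part of the KDB.AI client.

```python
def min_max(scores):
    """Scale a dict of {doc_id: score} to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def blend_scores(sparse_scores, dense_scores, alpha=0.5):
    """Rank doc IDs by alpha * dense + (1 - alpha) * sparse (hypothetical helper)."""
    sparse_n, dense_n = min_max(sparse_scores), min_max(dense_scores)
    docs = set(sparse_n) | set(dense_n)
    combined = {
        doc: alpha * dense_n.get(doc, 0.0) + (1 - alpha) * sparse_n.get(doc, 0.0)
        for doc in docs
    }
    return sorted(combined, key=combined.get, reverse=True)


# Toy scores, made up for illustration: higher means more relevant
sparse_scores = {"a": 3.0, "b": 1.0, "c": 0.5}
dense_scores = {"a": 0.2, "b": 0.6, "c": 0.9}

print(blend_scores(sparse_scores, dense_scores, alpha=0.0))  # follows the sparse ranking
print(blend_scores(sparse_scores, dense_scores, alpha=1.0))  # follows the dense ranking
```

With alpha at 0 the ranking is driven entirely by the sparse scores, and with alpha at 1 entirely by the dense scores; values in between trade one off against the other.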

In this notebook we will use hybrid search over a Federal Reserve speech to extract the chunks of the speech that are most relevant to a user's query. We will split the document into smaller chunks, create sparse and dense vectors for each chunk, store the vectors in the KDB.AI vector database, and then run dense, sparse, and hybrid searches to retrieve the most relevant chunks.

Agenda:
1. Imports & Setup
2. Ingest & Chunk Data
3. Generate Sparse & Dense Vectors for Each Chunk
4. Define KDB.AI Session
5. Create KDB.AI Schema & Table
6. Insert Data into KDB.AI Table
7. Create Sparse and Dense Query Vectors
8. Run Sparse, Dense, and Hybrid Searches

[Inflation: Progress and the Path Ahead](https://www.federalreserve.gov/newsevents/speech/powell20230825a.htm)

## 1. Imports & Setup

In [26]:
import pandas as pd
import numpy as np
import os
from getpass import getpass
import kdbai_client as kdbai
import time
from transformers import BertTokenizerFast
from collections import Counter

# Ignore Warnings
import warnings
warnings.filterwarnings("ignore")

from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

## 2. Ingest & Chunk Data
The data is a speech by Federal Reserve Chair Jerome H. Powell:

[Inflation: Progress and the Path Ahead](https://www.federalreserve.gov/newsevents/speech/powell20230825a.htm)

In [37]:
### Load the document we want to search over
doc = TextLoader("data/inflation.txt").load()

In [39]:
### Chunk the documents into chunks of at most 500 characters using langchain's text splitter "RecursiveCharacterTextSplitter"
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)

In [40]:
### split_documents produces a list of all the chunks created
pages = [p.page_content for p in text_splitter.split_documents(doc)]
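To illustrate the idea behind recursive character splitting, here is a simplified sketch in plain Python. It is not LangChain's implementation: the function name `split_recursive` and its exact merging behavior are assumptions made for illustration. It tries coarse separators first (paragraph breaks), falls back to finer ones (line breaks, spaces), and hard-cuts only as a last resort.

```python
def split_recursive(text, chunk_size=500, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most chunk_size characters,
    preferring to break on the earliest separator in `separators`."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate          # keep merging small pieces
            continue
        if current:
            chunks.append(current)       # flush what has been merged so far
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: split it with finer separators
            chunks.extend(split_recursive(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = " ".join(["word"] * 300)        # 1499 characters, no newlines
chunks = split_recursive(sample, chunk_size=500)
print(len(chunks), [len(c) for c in chunks])
```

Because chunks break on separators rather than at a fixed offset, most chunks come out shorter than the limit, which is why the chunks in this notebook vary in length.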

In [41]:
### Create a blank dataframe to store chunks and vectors in before insertion
data = {
    'ID':[],
    'chunk': [],
    'dense': [],
    'sparse': []
}

# Create the DataFrame
df = pd.DataFrame(data)

## 3. Generate Sparse & Dense Vectors for Each Chunk

In [42]:
### Tokenizer to create sparse vectors
token = BertTokenizerFast.from_pretrained('bert-base-uncased')

### Embedding model used to create dense vectors for the chunks and, later, for the user's query
from sentence_transformers import SentenceTransformer
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

In [45]:
### Create sparse and dense vectors of each chunk, append to the dataframe

chunk_id = 0
for chunk in pages:
    ### Create the dense vector for this chunk
    dense_chunk = [embedding_model.encode(chunk).tolist()]

    ### Create the sparse vector for this chunk (token ID -> count)
    sparse_chunk = [dict(Counter(y)) for y in token([chunk], padding=True,max_length=None)['input_ids']]
    ### Drop the [CLS] (101) and [SEP] (102) special tokens added by the BERT tokenizer
    sparse_chunk[0].pop(101);sparse_chunk[0].pop(102);

    new_row_df = pd.DataFrame([{"ID": str(chunk_id), "chunk": chunk, "dense": dense_chunk[0], "sparse": sparse_chunk[0]}])
    df = pd.concat([df, new_row_df], ignore_index=True)
    chunk_id += 1
df.head()

Unnamed: 0,ID,chunk,dense,sparse
0,0,"At last year's Jackson Hole symposium, I delivered a brief, direct message. My remarks this year will be a bit longer, but the message is the same: It is the Fed's job to bring inflation down to our 2 percent goal, and we will do so. We have tightened policy significantly over the past year. Although inflation has moved down from its peak—a welcome development—it remains too high. We are prepared to raise rates further if appropriate, and intend to hold policy at a restrictive level until we","[-0.02285625785589218, -0.02936527691781521, 0.03257254883646965, 0.10052481293678284, 0.04068514332175255, -0.04594222083687782, -0.06294257938861847, -0.012264790944755077, -0.08026789873838425, -0.025917794555425644, -0.04978106915950775, -0.03822582960128784, -0.1009216457605362, 0.045194994658231735, 0.06557615846395493, 0.055128004401922226, 0.00954356323927641, 0.011856192722916603, -0.025397202000021935, -0.005371921695768833, 0.0076469676569104195, -0.004146605730056763, 0.012556673027575016, 0.051397811621427536, -0.018861399963498116, 0.05590061843395233, -0.03667116165161133, -0.0006381056737154722, -0.0489947535097599, -0.00392116280272603, -0.03900369629263878, -0.01048232801258564, -0.012800188735127449, 0.012096517719328403, 0.04901905730366707, 0.04637635499238968, 0.0036269817501306534, -0.006850716192275286, 0.021863503381609917, -0.0403534397482872, -0.003986533731222153, -0.08572252094745636, 0.02243533730506897, -0.05199118331074715, -0.006603453308343887, 0.020408615469932556, 0.020964983850717545, 0.07860825210809708, -0.013116786256432533, 0.03581731766462326, 0.03360438346862793, -0.02600356563925743, 0.037548549473285675, -0.02683015540242195, 0.02880382165312767, 0.06678441166877747, -0.02589700184762478, -0.06794388592243195, -0.00928811077028513, -0.005671375896781683, 0.011809244751930237, -0.0003264880215283483, -0.09315428882837296, 0.028234535828232765, 0.022194689139723778, -0.03869275748729706, 0.006919924635440111, 
-0.047879934310913086, -0.012619221583008766, 0.059348221868276596, 0.038876794278621674, 0.010870655998587608, 0.00904536247253418, -0.0645502358675003, 0.04553348943591118, 0.0051292432472109795, 0.017876025289297104, 0.06248472258448601, 0.13011716306209564, -0.07174630463123322, 0.09569113701581955, -0.03645782172679901, 0.04733900725841522, -0.09409350156784058, -0.1287020593881607, -0.0757233127951622, 0.015421920455992222, -0.04373693838715553, -0.0005575934192165732, -0.014823147095739841, 0.0886424332857132, -0.00674480851739645, -0.01761506497859955, 0.012838358990848064, 0.019711405038833618, 0.06962540745735168, -0.09802231192588806, -0.029251888394355774, -0.05243441462516785, 0.06309278309345245, ...]","{2012: 2, 2197: 1, 2095: 3, 1005: 2, 1055: 2, 4027: 1, 4920: 1, 17899: 1, 1010: 5, 1045: 1, 5359: 1, 1037: 4, 4766: 1, 3622: 1, 4471: 2, 1012: 4, 2026: 1, 12629: 1, 2023: 1, 2097: 2, 2022: 1, 2978: 1, 2936: 1, 2021: 1, 1996: 4, 2003: 2, 2168: 1, 1024: 1, 2009: 2, 7349: 1, 3105: 1, 2000: 4, 3288: 1, 14200: 2, 2091: 2, 2256: 1, 1016: 1, 3867: 1, 3125: 1, 1998: 2, 2057: 4, 2079: 1, 2061: 1, 2031: 1, 8371: 1, 3343: 2, 6022: 1, 2058: 1, 2627: 1, 2348: 1, 2038: 1, 2333: 1, 2013: 1, 2049: 1, 4672: 1, 1517: 2, 6160: 1, 2458: 1, 3464: 1, 2205: 1, 2152: 1, 2024: 1, 4810: 1, 5333: 1, 6165: 1, 2582: 1, 2065: 1, 6413: 1, 13566: 1, 2907: 1, 25986: 1, 2504: 1, 2127: 1}"
1,1,are confident that inflation is moving sustainably down toward our objective.,"[0.01128313411027193, -0.030178550630807877, -0.028004847466945648, 0.08286452293395996, 0.07045692950487137, -0.012613235972821712, 0.015299422666430473, -0.048042941838502884, -0.014649586752057076, -0.01985168643295765, -0.006772790104150772, -0.012450152076780796, 0.02647615596652031, -0.03598489984869957, -0.03806186839938164, -0.013750843703746796, 0.0006535585271194577, -0.04038465768098831, -0.040352098643779755, 0.08869611471891403, -0.04489782080054283, -0.0018517323769629002, -0.016759781166911125, 0.03662634268403053, 0.06356023997068405, 0.07317017763853073, -0.021493328735232353, -0.00033986615017056465, 0.03778466582298279, 0.037972498685121536, -0.036830175668001175, -0.01918942481279373, -0.02211209200322628, -0.02584174834191799, 0.09871915727853775, 0.1006321832537651, 0.05575035139918327, 0.005863363388925791, 0.06447790563106537, -0.023004818707704544, 0.00847711507230997, -0.12029243260622025, -0.010796607472002506, -0.054761987179517746, -0.00019693249487318099, 0.02268078178167343, 0.026154324412345886, 0.008541055954992771, -0.05315438657999039, 0.03262715041637421, -0.04761399328708649, -0.0019579329527914524, 0.007144295144826174, -0.028067082166671753, -0.021158892661333084, 0.05623309314250946, 0.02764894813299179, -0.04430371895432472, 0.04405578225851059, -0.03655688092112541, -0.013488036580383778, 0.056885384023189545, -0.055619459599256516, 0.07481297105550766, 0.07835764437913895, -0.022458255290985107, 0.037293825298547745, 0.04162101820111275, -0.06493495404720306, 0.041778795421123505, 0.015524281188845634, -0.023366855457425117, -0.013581580482423306, 0.0011435907799750566, -0.02101784758269787, -0.025099484249949455, 0.09411011636257172, -0.04841676726937294, 0.06328073889017105, 0.01261104829609394, 0.1185251995921135, 0.0005713221617043018, -0.014801022596657276, -0.05258866399526596, -0.08134287595748901, -0.058117497712373734, 
0.014299919828772545, -0.06440902501344681, -0.0016581265954300761, -0.05167120695114136, 0.007921971380710602, 0.03901951014995575, -0.016558902338147163, 0.035806670784950256, 0.019900007173419, 0.061307985335588455, -0.05479276925325394, -0.03253747522830963, -0.01426837407052517, 0.06486073136329651, ...]","{2024: 1, 9657: 1, 2008: 1, 14200: 1, 2003: 1, 3048: 1, 15770: 1, 8231: 1, 2091: 1, 2646: 1, 2256: 1, 7863: 1, 1012: 1}"
2,2,"Today I will review our progress so far and discuss the outlook and the uncertainties we face as we pursue our dual mandate goals. I will conclude with a summary of what this means for policy. Given how far we have come, at upcoming meetings we are in a position to proceed carefully as we assess the incoming data and the evolving outlook and risks.","[-0.03170399367809296, 0.017693432047963142, 0.028395740315318108, -0.0036524597089737654, 0.028333019465208054, 0.025562051683664322, -0.01357969082891941, -0.03617359325289726, -0.07777886837720871, -0.013764874078333378, -0.03317200019955635, 0.00584209430962801, -0.009432682767510414, -0.016689687967300415, 0.001956022111698985, 0.07148478925228119, -0.011375511065125465, -0.11026592552661896, -0.07611670345067978, 0.07181675732135773, 0.0035658071283251047, -0.02708597667515278, 0.0772479698061943, 0.021828148514032364, -0.05115029960870743, 0.027582159265875816, -0.043227750808000565, -0.01163290161639452, -0.056656353175640106, -0.03405488282442093, 0.011104020290076733, -0.03140908107161522, -0.027516767382621765, -0.01690256968140602, 0.03154020383954048, 0.028387175872921944, 0.034220293164253235, -0.07338082045316696, 0.05520183965563774, 0.00796577613800764, -0.024402910843491554, -0.08948075026273727, -0.04223588854074478, 0.024946143850684166, 0.013297993689775467, 0.006549637299031019, 0.018830042332410812, -0.05291004478931427, -0.004785293247550726, -0.00040339920087717474, -0.0262099951505661, -0.07117480784654617, 0.06951809674501419, -0.05466916412115097, 0.028682997450232506, 0.08154810220003128, 0.0025897647719830275, -0.036078356206417084, 0.02166397124528885, -0.058171141892671585, -0.01537337526679039, -0.018165871500968933, -0.06759598851203918, 0.05044863745570183, 0.033789556473493576, -0.039111051708459854, -0.05849521607160568, -0.003715317230671644, -0.004995780065655708, 0.025195350870490074, -0.09989162534475327, 0.0021416277159005404, -0.011228229850530624, -0.050963178277015686, 
0.016373509541153908, -0.05123377963900566, 0.012105263769626617, 0.06505385786294937, 0.10960511863231659, -0.11112980544567108, 0.07526575773954391, 0.01632758416235447, -0.0001942870585480705, -0.010997369885444641, -0.04432172700762749, -0.07947203516960144, -0.01576756127178669, 0.007026952225714922, -0.009899158030748367, -0.02159152925014496, -0.008432329632341862, -0.02529800310730934, 0.07689513266086578, 0.03582959994673729, 0.011584165506064892, 0.08381620049476624, -0.04691419005393982, -0.05245824158191681, -0.0028198575600981712, 0.07436425238847733, ...]","{2651: 1, 1045: 2, 2097: 2, 3319: 1, 2256: 2, 5082: 1, 2061: 1, 2521: 2, 1998: 4, 6848: 1, 1996: 4, 17680: 2, 9662: 1, 7368: 1, 2057: 5, 2227: 1, 2004: 2, 7323: 1, 7037: 1, 11405: 1, 3289: 1, 1012: 3, 16519: 1, 2007: 1, 1037: 2, 12654: 1, 1997: 1, 2054: 1, 2023: 1, 2965: 1, 2005: 1, 3343: 1, 2445: 1, 2129: 1, 2031: 1, 2272: 1, 1010: 1, 2012: 1, 9046: 1, 6295: 1, 2024: 1, 1999: 1, 2597: 1, 2000: 1, 10838: 1, 5362: 1, 14358: 1, 14932: 1, 2951: 1, 20607: 1, 10831: 1}"
3,3,The Decline in Inflation So Far,"[0.003466656431555748, 0.007666202262043953, -0.03565439581871033, 0.08759867399930954, 0.03441566601395607, 0.03998332843184471, -0.06377791613340378, 0.029775315895676613, -0.03601164370775223, -0.006999653298407793, 0.07259610295295715, 0.03494003787636757, -0.0006784996367059648, 0.03840668871998787, 0.002948357257992029, -0.011374080553650856, -0.029362453147768974, -0.05348880961537361, -0.0015722834505140781, -0.022970641031861305, -0.034450456500053406, -0.03381078317761421, -0.08411557227373123, -0.0069604297168552876, 0.1026393324136734, 0.018168464303016663, -0.053261931985616684, -0.05014416202902794, -0.013423727825284004, -0.0207371823489666, -0.0374159961938858, 0.09294652193784714, 0.020451026037335396, -0.0020170961506664753, 0.07631052285432816, -0.02301836758852005, 0.1343110203742981, 0.05784633383154869, -0.0042676376178860664, -0.0519573800265789, 0.008486058562994003, -0.0734884962439537, 0.001993207959458232, -0.058242302387952805, 0.016656914725899696, 0.007503883447498083, 0.056056562811136246, 0.029338646680116653, -0.02605483867228031, 0.058837875723838806, 0.022654112428426743, 0.021755971014499664, 0.01756257191300392, -0.01265937089920044, 0.015090744942426682, 0.009587511420249939, -0.02104850858449936, -0.03836342692375183, 0.005009876564145088, -0.03674819692969322, -0.041876088827848434, 0.0024690700229257345, -0.03661569207906723, -0.013928686268627644, 0.02700718864798546, 0.00413798913359642, 0.03524109348654747, -0.04441413655877113, -0.07790438085794449, 0.07959410548210144, 0.013096524402499199, -0.005116598680615425, -0.03804832324385643, -0.03534472733736038, -0.0004915629979223013, -0.0572199709713459, 0.049819912761449814, -0.02310873381793499, 0.07504858821630478, -0.009914012625813484, 0.11099277436733246, -0.03639334812760353, -0.043169502168893814, -0.02786279283463955, -0.06924278289079666, -0.05690569803118706, 0.04791276901960373, -0.0986170545220375, -0.002906450070440769, 
-0.023096973076462746, 0.10629398375749588, 0.03341894969344139, -0.03612646460533142, 0.004329432733356953, 0.010832712054252625, 0.0008999643614515662, -0.009579421021044254, -0.04287347197532654, 0.007509719580411911, 0.10590385645627975, ...]","{1996: 1, 6689: 1, 1999: 1, 14200: 1, 2061: 1, 2521: 1}"
4,4,"The ongoing episode of high inflation initially emerged from a collision between very strong demand and pandemic-constrained supply. By the time the Federal Open Market Committee raised the policy rate in March 2022, it was clear that bringing down inflation would depend on both the unwinding of the unprecedented pandemic-related demand and supply distortions and on our tightening of monetary policy, which would slow the growth of aggregate demand, allowing supply time to catch up. While these","[-0.020729437470436096, -0.05514803156256676, -0.005661901086568832, 0.05776488780975342, 0.08166161924600601, 0.007415444124490023, -0.08715937286615372, -0.011653534136712551, 0.022230807691812515, 0.008095250464975834, -0.008189831860363483, 0.08031855523586273, -0.0494932197034359, -0.057165008038282394, 0.0641394779086113, -0.003304441459476948, -0.032255660742521286, -0.03476017341017723, -0.03335358202457428, 0.012043125927448273, 0.009181149303913116, 0.011864849366247654, -0.060355525463819504, -0.006636300124228001, 0.005846268031746149, 0.06044892594218254, -0.011055702343583107, -0.006277767475694418, -0.008302600122988224, 0.09433145821094513, -0.01611296832561493, 0.06718583405017853, 0.03032250516116619, -0.03269006684422493, 0.04195820167660713, 0.015321889892220497, 0.028874561190605164, 0.030524494126439095, -0.005840185098350048, 0.018246399238705635, 0.016586050391197205, -0.04388229548931122, -0.026119636371731758, 0.035497166216373444, 0.04690246284008026, 0.029245955869555473, 0.02934139408171177, 0.10130330175161362, -0.06881672888994217, 0.0029219265561550856, -0.03086479753255844, -0.022620946168899536, 0.008488346822559834, -0.02577389031648636, 0.036416374146938324, 0.06583990156650543, -0.040072642266750336, -0.07448919862508774, 0.03216037526726723, -0.006307804025709629, -0.10942457616329193, 0.018112896010279655, -0.010425835847854614, 0.01585676707327366, 0.1051362156867981, -0.030995449051260948, 0.04500046372413635, 
-0.017133615911006927, -0.019761838018894196, 0.08552390336990356, 0.008444778621196747, 0.06095994636416435, 0.029359299689531326, -0.07735776156187057, 0.02533135376870632, 0.031516458839178085, 0.025117812678217888, -0.02084812894463539, 0.13202016055583954, -0.016911912709474564, 0.09788121283054352, -0.06095634028315544, 0.05486052855849266, -0.08059404790401459, -0.06915923207998276, -0.07546098530292511, 0.06765684485435486, -0.04679181054234505, -0.01491133589297533, 0.0023532966151833534, 0.09007390588521957, -0.03570729121565819, 0.032751113176345825, 0.03596649318933487, -0.057767096906900406, 0.008761380799114704, -0.05398254841566086, -0.04752792418003082, 0.01885335147380829, 0.04489108920097351, ...]","{1996: 7, 7552: 1, 2792: 1, 1997: 4, 2152: 1, 14200: 2, 3322: 1, 6003: 1, 2013: 1, 1037: 1, 12365: 1, 2090: 1, 2200: 1, 2844: 1, 5157: 3, 1998: 3, 6090: 2, 3207: 2, 7712: 2, 1011: 2, 27570: 1, 4425: 3, 1012: 2, 2011: 1, 2051: 2, 2976: 1, 2330: 1, 3006: 1, 2837: 1, 2992: 1, 3343: 2, 3446: 1, 1999: 1, 2233: 1, 16798: 1, 2475: 1, 1010: 3, 2009: 1, 2001: 1, 3154: 1, 2008: 1, 5026: 1, 2091: 1, 2052: 2, 12530: 1, 2006: 2, 2119: 1, 4895: 1, 11101: 1, 2075: 1, 15741: 1, 3141: 1, 20870: 1, 2015: 1, 2256: 1, 18711: 1, 12194: 1, 2029: 1, 4030: 1, 3930: 1, 9572: 1, 4352: 1, 2000: 1, 4608: 1, 2039: 1, 2096: 1, 2122: 1}"
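The sparse column above is simply a count of token IDs per chunk. The following self-contained sketch shows the same counting step without loading the tokenizer. The token IDs are the ones shown in the sparse column for chunk 3 ("The Decline in Inflation So Far"), wrapped in 101 and 102, the `[CLS]` and `[SEP]` markers that `bert-base-uncased` adds. The helper name `to_sparse` is illustrative.

```python
from collections import Counter

# Token IDs for one chunk, as a BERT tokenizer would return them:
# 101 = [CLS] at the start, 102 = [SEP] at the end
input_ids = [101, 1996, 6689, 1999, 14200, 2061, 2521, 102]


def to_sparse(ids, special_ids=(101, 102)):
    """Count token IDs, dropping special tokens (illustrative helper)."""
    counts = Counter(ids)
    for special in special_ids:
        counts.pop(special, None)   # pop with a default never raises KeyError
    return dict(counts)


sparse = to_sparse(input_ids)
print(sparse)
```

Using `pop(special, None)` rather than `pop(special)` is a small defensive choice: it keeps the helper from failing if a special token happens to be absent from the input.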


## 4. Define KDB.AI Session
KDB.AI comes in two offerings:

- KDB.AI Cloud: For experimenting with smaller generative AI projects with a vector database in our cloud.
- KDB.AI Server: For evaluating large scale generative AI applications on-premises or on your own cloud provider.

Depending on which you use, there will be different setup steps and connection details required.

### Option 1. KDB.AI Cloud
To use KDB.AI Cloud, you will need two session details - a URL endpoint and an API key. To get these you can sign up for free here.

You can connect to a KDB.AI Cloud session using kdbai.Session and passing the session URL endpoint and API key details from your KDB.AI Cloud portal.

If the environment variables KDBAI_ENDPOINT and KDBAI_API_KEY exist on your system and contain your KDB.AI Cloud portal details, they will be used automatically to connect. If they do not exist, you will be prompted to enter your KDB.AI Cloud portal session URL endpoint and API key.

In [None]:
# Set up KDB.AI endpoint and API key
KDBAI_ENDPOINT = (
    os.environ["KDBAI_ENDPOINT"]
    if "KDBAI_ENDPOINT" in os.environ
    else input("KDB.AI endpoint: ")
)
KDBAI_API_KEY = (
    os.environ["KDBAI_API_KEY"]
    if "KDBAI_API_KEY" in os.environ
    else getpass("KDB.AI API key: ")
)

In [None]:
### Start Session with KDB.AI Cloud
session = kdbai.Session(api_key=KDBAI_API_KEY, endpoint=KDBAI_ENDPOINT)

### Option 2. KDB.AI Server
To use KDB.AI Server, you will need to download and run your own container. To do this, you will first need to sign up for free here.

You will receive an email with the required license file and bearer token needed to download your instance. Follow instructions in the signup email to get your session up and running.

Once the setup steps are complete you can then connect to your KDB.AI Server session using kdbai.Session and passing your local endpoint.

In [46]:
### Start session with KDB.AI Server (no endpoint is passed here, so the client's default endpoint is used)
session = kdbai.Session()

## 5. Create Schema and KDB.AI Table

Now, let us define the schema that will be used to create the KDB.AI table.

The "ID" and "chunk" columns will hold the unique identifier and the raw text of each chunk.

The "sparse" and "dense" columns will hold the respective sparse and dense vectors.

Note that in the "sparse" column we define the "b" and "k" parameters. These parameters can also be overridden at search time, which enables hyperparameter tuning of term saturation (k) and of the impact of document length on relevance (b). This is discussed further in the Sparse Search Hyperparameter Optimization section.

In [47]:
schema = dict(
        columns=[
            {"name": "ID", "pytype": "str"},
            {"name": "chunk", "pytype": "str"},
            {
                "name":"sparse",
                "pytype":"dict",
                "sparseIndex":{
                    "k": 1.25,
                    "b": 0.75
                },
            },
            {
                "name":"dense",
                "pytype":"float32",
                "vectorIndex":{
                    "type": "flat",
                    "metric": "L2",
                    "dims": 384
                },
            },
        ]
    )
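The `dims` value in the schema has to match the length of the dense vectors being inserted; `all-MiniLM-L6-v2` produces 384-dimensional embeddings, which is why 384 is used above. A small check such as the one below can catch a mismatch before insertion. This is a sketch only: `dense_dims` and `check_vector` are hypothetical helpers, not part of the KDB.AI client, and the schema here is a trimmed copy of the one above.

```python
def dense_dims(schema, column="dense"):
    """Return the 'dims' declared for a vector column in a schema dict."""
    for col in schema["columns"]:
        if col["name"] == column:
            return col["vectorIndex"]["dims"]
    raise KeyError(f"no column named {column!r}")


def check_vector(schema, vector, column="dense"):
    """Raise ValueError if the vector length differs from the schema's dims."""
    expected = dense_dims(schema, column)
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    return True


# Trimmed copy of the schema above, keeping only the dense column
example_schema = {
    "columns": [
        {
            "name": "dense",
            "pytype": "float32",
            "vectorIndex": {"type": "flat", "metric": "L2", "dims": 384},
        },
    ]
}

print(check_vector(example_schema, [0.0] * 384))
```

In the notebook itself, the same check could be run against the first embedded chunk before calling `table.insert`.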

In [48]:
# If we're re-running this, remove the old inflation table
if 'inflation' in session.list():
    table = session.table('inflation')
    table.drop()

In [49]:
table = session.create_table("inflation", schema)

## 6. Insert data into the KDB.AI Table

In [50]:
### Insert the dataframe into the KDB.AI table
table.insert(df)

True

## 7. Create Sparse and Dense Query Vectors

In [51]:
query = '12-month basis'

### Create the dense query vector
dense_query = [embedding_model.encode(query).tolist()]

### Create the sparse query vector
sparse_query = [dict(Counter(y)) for y in token([query], padding=True,max_length=None)['input_ids']]
sparse_query[0].pop(101);sparse_query[0].pop(102);

## 8. Run Sparse, Dense, and Hybrid Searches

In [52]:
### Adjust display settings so we can see full output
pd.set_option('display.max_colwidth', None)

In [53]:
### Type 1 - dense search
table.search(dense_query, n=5)[0][['ID','chunk']]

Unnamed: 0,ID,chunk
0,9,"coming quarters. Twelve-month core inflation is still elevated, and there is substantial further ground to cover to get back to price stability."
1,35,"That assessment is further complicated by uncertainty about the duration of the lags with which monetary tightening affects economic activity and especially inflation. Since the symposium a year ago, the Committee has raised the policy rate by 300 basis points, including 100 basis points over the past seven months. And we have substantially reduced the size of our securities holdings. The wide range of estimates of these lags suggests that there may be significant further drag in the pipeline."
2,29,"Total hours worked has been flat over the past six months, and the average workweek has declined to the lower end of its pre-pandemic range, reflecting a gradual normalization in labor market conditions (figure 5)."
3,23,"Restrictive monetary policy has tightened financial conditions, supporting the expectation of below-trend growth.5 Since last year's symposium, the two-year real yield is up about 250 basis points, and longer-term real yields are higher as well—by nearly 150 basis points.6 Beyond changes in interest rates, bank lending standards have tightened, and loan growth has slowed sharply.7 Such a tightening of broad financial conditions typically contributes to a slowing in the growth of economic"
4,24,"activity, and there is evidence of that in this cycle as well. For example, growth in industrial production has slowed, and the amount spent on residential investment has declined in each of the past five quarters (figure 4)."


In [54]:
### Type 2 - sparse search
table.search(sparse_query, n=5)[0][['ID','chunk']]

Unnamed: 0,ID,chunk
0,14,"Similar dynamics are playing out for core goods inflation overall. As they do, the effects of monetary restraint should show through more fully over time. Core goods prices fell the past two months, but on a 12-month basis, core goods inflation remains well above its pre-pandemic level. Sustained progress is needed, and restrictive monetary policy is called for to achieve that progress."
1,8,"On a 12-month basis, core PCE inflation peaked at 5.4 percent in February 2022 and declined gradually to 4.3 percent in July (figure 1, panel B). The lower monthly readings for core inflation in June and July were welcome, but two months of good data are only the beginning of what it will take to build confidence that inflation is moving down sustainably toward our goal. We can't yet know the extent to which these lower readings will continue or where underlying inflation will settle over"
2,6,"On a 12-month basis, U.S. total, or ""headline,"" PCE (personal consumption expenditures) inflation peaked at 7 percent in June 2022 and declined to 3.3 percent as of July, following a trajectory roughly in line with global trends (figure 1, panel A).1 The effects of Russia's war against Ukraine have been a primary driver of the changes in headline inflation around the world since early 2022. Headline inflation is what households and businesses experience most directly, so this decline is very"
3,9,"coming quarters. Twelve-month core inflation is still elevated, and there is substantial further ground to cover to get back to price stability."
4,23,"Restrictive monetary policy has tightened financial conditions, supporting the expectation of below-trend growth.5 Since last year's symposium, the two-year real yield is up about 250 basis points, and longer-term real yields are higher as well—by nearly 150 basis points.6 Beyond changes in interest rates, bank lending standards have tightened, and loan growth has slowed sharply.7 Such a tightening of broad financial conditions typically contributes to a slowing in the growth of economic"


**Comparing the sparse and dense search results for the query "12-month basis", we see that while both return relevant results, the sparse search returns several chunks that contain the exact phrase "12-month basis".**

**This search example shows the advantage of having a sparse search when interested in specific terms.** 
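The difference between the two result sets can be made explicit by comparing the returned IDs. The lists below are copied by hand from the dense and sparse search outputs above; in a live session they could instead be read from the `ID` column of each result.

```python
# Top-5 IDs copied from the dense and sparse search outputs above
dense_ids = [9, 35, 29, 23, 24]
sparse_ids = [14, 8, 6, 9, 23]

shared = [i for i in dense_ids if i in sparse_ids]
only_sparse = [i for i in sparse_ids if i not in dense_ids]
only_dense = [i for i in dense_ids if i not in sparse_ids]

print("Returned by both searches:", shared)
print("Returned only by sparse search:", only_sparse)
print("Returned only by dense search:", only_dense)
```

The three chunks returned only by the sparse search (14, 8, and 6) are exactly the ones that contain the literal phrase "12-month basis".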

Let's run a hybrid search to combine the results:

In [55]:
### Type 3 - hybrid search
table.hybrid_search(dense_vectors=dense_query,sparse_vectors=sparse_query,n=5)[0][['ID','chunk']]

Unnamed: 0,ID,chunk
0,9,"coming quarters. Twelve-month core inflation is still elevated, and there is substantial further ground to cover to get back to price stability."
1,14,"Similar dynamics are playing out for core goods inflation overall. As they do, the effects of monetary restraint should show through more fully over time. Core goods prices fell the past two months, but on a 12-month basis, core goods inflation remains well above its pre-pandemic level. Sustained progress is needed, and restrictive monetary policy is called for to achieve that progress."
2,23,"Restrictive monetary policy has tightened financial conditions, supporting the expectation of below-trend growth.5 Since last year's symposium, the two-year real yield is up about 250 basis points, and longer-term real yields are higher as well—by nearly 150 basis points.6 Beyond changes in interest rates, bank lending standards have tightened, and loan growth has slowed sharply.7 Such a tightening of broad financial conditions typically contributes to a slowing in the growth of economic"
3,35,"That assessment is further complicated by uncertainty about the duration of the lags with which monetary tightening affects economic activity and especially inflation. Since the symposium a year ago, the Committee has raised the policy rate by 300 basis points, including 100 basis points over the past seven months. And we have substantially reduced the size of our securities holdings. The wide range of estimates of these lags suggests that there may be significant further drag in the pipeline."
4,8,"On a 12-month basis, core PCE inflation peaked at 5.4 percent in February 2022 and declined gradually to 4.3 percent in July (figure 1, panel B). The lower monthly readings for core inflation in June and July were welcome, but two months of good data are only the beginning of what it will take to build confidence that inflation is moving down sustainably toward our goal. We can't yet know the extent to which these lower readings will continue or where underlying inflation will settle over"


##### Hybrid Search with 'alpha = 0.1' Sparse Bias

In [56]:
table.hybrid_search(dense_vectors=dense_query,sparse_vectors=sparse_query,n=5,alpha=0.1)[0][['ID','chunk']]

Unnamed: 0,ID,chunk
0,14,"Similar dynamics are playing out for core goods inflation overall. As they do, the effects of monetary restraint should show through more fully over time. Core goods prices fell the past two months, but on a 12-month basis, core goods inflation remains well above its pre-pandemic level. Sustained progress is needed, and restrictive monetary policy is called for to achieve that progress."
1,8,"On a 12-month basis, core PCE inflation peaked at 5.4 percent in February 2022 and declined gradually to 4.3 percent in July (figure 1, panel B). The lower monthly readings for core inflation in June and July were welcome, but two months of good data are only the beginning of what it will take to build confidence that inflation is moving down sustainably toward our goal. We can't yet know the extent to which these lower readings will continue or where underlying inflation will settle over"
2,9,"coming quarters. Twelve-month core inflation is still elevated, and there is substantial further ground to cover to get back to price stability."
3,6,"On a 12-month basis, U.S. total, or ""headline,"" PCE (personal consumption expenditures) inflation peaked at 7 percent in June 2022 and declined to 3.3 percent as of July, following a trajectory roughly in line with global trends (figure 1, panel A).1 The effects of Russia's war against Ukraine have been a primary driver of the changes in headline inflation around the world since early 2022. Headline inflation is what households and businesses experience most directly, so this decline is very"
4,23,"Restrictive monetary policy has tightened financial conditions, supporting the expectation of below-trend growth.5 Since last year's symposium, the two-year real yield is up about 250 basis points, and longer-term real yields are higher as well—by nearly 150 basis points.6 Beyond changes in interest rates, bank lending standards have tightened, and loan growth has slowed sharply.7 Such a tightening of broad financial conditions typically contributes to a slowing in the growth of economic"


##### Hybrid Search with 'alpha = 0.9' Dense Bias

In [57]:
table.hybrid_search(dense_vectors=dense_query,sparse_vectors=sparse_query,n=5,alpha=0.9)[0][['ID','chunk']]

Unnamed: 0,ID,chunk
0,9,"coming quarters. Twelve-month core inflation is still elevated, and there is substantial further ground to cover to get back to price stability."
1,35,"That assessment is further complicated by uncertainty about the duration of the lags with which monetary tightening affects economic activity and especially inflation. Since the symposium a year ago, the Committee has raised the policy rate by 300 basis points, including 100 basis points over the past seven months. And we have substantially reduced the size of our securities holdings. The wide range of estimates of these lags suggests that there may be significant further drag in the pipeline."
2,29,"Total hours worked has been flat over the past six months, and the average workweek has declined to the lower end of its pre-pandemic range, reflecting a gradual normalization in labor market conditions (figure 5)."
3,23,"Restrictive monetary policy has tightened financial conditions, supporting the expectation of below-trend growth.5 Since last year's symposium, the two-year real yield is up about 250 basis points, and longer-term real yields are higher as well—by nearly 150 basis points.6 Beyond changes in interest rates, bank lending standards have tightened, and loan growth has slowed sharply.7 Such a tightening of broad financial conditions typically contributes to a slowing in the growth of economic"
4,24,"activity, and there is evidence of that in this cycle as well. For example, growth in industrial production has slowed, and the amount spent on residential investment has declined in each of the past five quarters (figure 4)."
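To summarize how alpha shifted the results, the sketch below counts how many IDs each hybrid result shares with the pure sparse and pure dense results. All ID lists are copied by hand from the outputs in this notebook; the helper `shared` is illustrative.

```python
# Top-5 IDs copied from the search outputs in this notebook
sparse_ids = [14, 8, 6, 9, 23]
dense_ids = [9, 35, 29, 23, 24]
hybrid_ids = {
    "no alpha given": [9, 14, 23, 35, 8],
    "alpha=0.1": [14, 8, 9, 6, 23],
    "alpha=0.9": [9, 35, 29, 23, 24],
}


def shared(a, b):
    """Number of IDs that two result lists have in common."""
    return len(set(a) & set(b))


for label, ids in hybrid_ids.items():
    print(f"{label}: {shared(ids, sparse_ids)} shared with sparse, "
          f"{shared(ids, dense_ids)} shared with dense")
```

With alpha at 0.1 the hybrid search returns the same five chunks as the sparse search (in a slightly different order), and with alpha at 0.9 it returns the dense search results in the same order.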


### Sparse Search Hyperparameter Optimization
#### Dynamic Testing / Override of b and k using sparse_index_options

Depending on the use-case, it could be beneficial to tune the underlying parameters of sparse search in order to increase relevancy of retrieved data. KDB.AI offers developers the ability to customize the 'b' and 'k' parameters during runtime, ensuring flexibility in sparse search implementation.

**b: (values 0 to 1, defaults to 0.75)**
<br>Document length impact on relevance
<br>In general, the more specific a document is, the less likely its length is to hurt relevance, so b should be low. For general documents that cover multiple topics at a high level, consider using a higher value of b.
<br>
<br>**k: (values 0 to 3, defaults to 1.2)**
<br>Term saturation
<br>How much more relevant additional instances of a term make a document. The lower k is, the faster term saturation occurs (i.e. additional occurrences of a term do not count as much).
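The effect of b and k is easiest to see in the scoring formula itself. Below is a minimal sketch of the textbook Okapi BM25 score, operating on `{token: count}` dictionaries like the sparse vectors in this notebook. It is an illustration under stated assumptions: KDB.AI's exact BM25 variant may differ, and the function `bm25_score` and the toy corpus are made up for this example.

```python
import math


def bm25_score(query, doc, corpus, k=1.2, b=0.75):
    """Score one document (a {token: count} dict) against a list of
    query tokens using the textbook Okapi BM25 formula."""
    n_docs = len(corpus)
    avg_len = sum(sum(d.values()) for d in corpus) / n_docs
    doc_len = sum(doc.values())
    score = 0.0
    for term in query:
        freq = doc.get(term, 0)
        if freq == 0:
            continue
        n_with_term = sum(1 for d in corpus if term in d)
        idf = math.log(1 + (n_docs - n_with_term + 0.5) / (n_with_term + 0.5))
        length_norm = 1 - b + b * doc_len / avg_len   # b controls the length penalty
        score += idf * freq * (k + 1) / (freq + k * length_norm)  # k controls saturation
    return score


# Toy corpus: strings are used as tokens here for readability
doc_a = {"rate": 1, "x": 9}     # 10 tokens, term appears once
doc_b = {"rate": 10}            # 10 tokens, term appears ten times
doc_c = {"rate": 1, "y": 29}    # 30 tokens, term appears once
doc_d = {"z": 10}               # does not contain the term
corpus = [doc_a, doc_b, doc_c, doc_d]
query = ["rate"]

# k: with k=0 repeated terms add nothing; with k=3 they add a lot
print(bm25_score(query, doc_a, corpus, k=0), bm25_score(query, doc_b, corpus, k=0))
print(bm25_score(query, doc_a, corpus, k=3), bm25_score(query, doc_b, corpus, k=3))

# b: with b=0 length is ignored; with b=1 the longer document is penalized
print(bm25_score(query, doc_a, corpus, b=0), bm25_score(query, doc_c, corpus, b=0))
print(bm25_score(query, doc_a, corpus, b=1), bm25_score(query, doc_c, corpus, b=1))
```

In this formula, k scales how quickly the contribution of repeated terms levels off, and b scales how strongly a document longer than average is penalized.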

In [None]:
table.hybrid_search(dense_vectors=dense_query,sparse_vectors=sparse_query,n=5,sparse_index_options={'b':0.1, 'k':3})[0][['ID','chunk']]

**Additionally, when new data is inserted into the KDB.AI table, all underlying BM25 statistics are updated. This means that BM25 scoring always reflects all sparse data in the table when a sparse search is run.**
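Why the statistics need updating can be seen from the inverse document frequency (IDF) term of BM25, which depends on how many documents exist and how many of them contain a given token. The sketch below uses the textbook BM25 IDF on a toy corpus; it illustrates the principle only and makes no claim about how KDB.AI stores or refreshes these statistics internally. The function `idf` and the corpus are made up for this example.

```python
import math


def idf(term, corpus):
    """Textbook BM25 inverse document frequency for one term."""
    n_docs = len(corpus)
    n_with_term = sum(1 for doc in corpus if term in doc)
    return math.log(1 + (n_docs - n_with_term + 0.5) / (n_with_term + 0.5))


# Toy corpus of {token: count} documents
corpus = [{"inflation": 1}, {"rate": 1}]
before = idf("inflation", corpus)

# Inserting a new document that also mentions the term makes it less rare
corpus.append({"inflation": 2, "rate": 1})
after = idf("inflation", corpus)

print(before, after)
```

Because the term becomes more common after the insert, its IDF drops, so scores computed with stale statistics would overweight it.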

### Delete the KDB.AI Table
Once finished with the table, it is best practice to drop it.

In [16]:
table.drop()

True