# Vespa Demo


See:
- [Vespa Quick start](https://docs.vespa.ai/en/vespa-quick-start.html)
- [Getting Started](https://docs.vespa.ai/en/getting-started.html)
- [PyVespa](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa.html)

Note:
- This demo populates Vespa with MS MARCO data from `sample_docs.csv`.
- Download it from: [sample_docs.csv](https://data.vespa.oath.cloud/blog/msmarco/sample_docs.csv)
- `sample_docs.csv` is a CSV file with the columns `id`, `title`, and `body`.
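
Before loading the file, it can help to confirm that the header really contains the three columns listed above. The following is a minimal sketch using only the standard library; the helper name `missing_columns` and the sample row are made up for illustration and are not part of the demo or of PyVespa.

```python
import csv
import io

# Columns the demo expects, as listed in the note above.
EXPECTED_COLUMNS = {"id", "title", "body"}


def missing_columns(csv_text):
    """Return the expected columns that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return EXPECTED_COLUMNS - set(reader.fieldnames or [])


# Hypothetical one-row sample standing in for sample_docs.csv.
sample = "id,title,body\nD1,Example title,Example body\n"
print(missing_columns(sample))            # empty set: header is complete
print(missing_columns("id,title\nD1,x"))  # the 'body' column is missing
```

For the real file, the same check could be run on the first lines of `sample_docs.csv` read from disk.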

In [23]:
from dotenv import find_dotenv
# NOTE: an empty `.env` file sits under the `src` directory (ignored by .gitignore).
# find_dotenv() locates it, so its parent directory can be added to sys.path below.

import os
import sys
sys.path.append(os.path.dirname(find_dotenv()))

from notebooks.notebook_utils import DevData

In [25]:
import os
import pandas as pd

# vespa imports used below
from vespa.package import ApplicationPackage
from vespa.package import Field
from vespa.package import FieldSet
from vespa.package import RankProfile
from vespa.deployment import VespaDocker

In [7]:
# Create an application package; it starts with an empty schema named after the application
app_package = ApplicationPackage(name="textsearch")

### Add fields to the schema
(i.e. define the schema)

`id` - holds the document IDs  
`title` and `body` - the text fields of the documents  

Note:
- Including "index" in `indexing` creates a searchable index for `title` and `body`.
- Setting `index = "enable-bm25"` pre-computes the quantities needed to compute the BM25 score quickly.


n.b. Read about the BM25 score:
- [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25)
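
To make the BM25 score concrete, here is a minimal pure-Python sketch of the Okapi BM25 formula from the Wikipedia article. It is an illustration only, not Vespa's implementation: the function name `bm25_score`, the whitespace tokenization, the toy corpus, and the parameter values `k1=1.2` and `b=0.75` (commonly used defaults) are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query with Okapi BM25."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    term_freqs = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        # Number of documents in the corpus that contain the term.
        doc_freq = sum(1 for d in corpus if term in d)
        if doc_freq == 0:
            continue
        # Smoothed inverse document frequency (always non-negative).
        idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        tf = term_freqs[term]
        # Term frequency saturation with document-length normalization.
        norm = tf + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score


# Toy corpus, tokenized by whitespace (illustrative only).
corpus = [
    "what keeps airplanes in the air".split(),
    "how long is a tax refund check good".split(),
    "the eagle has flown".split(),
]
query = "airplanes air".split()
scores = [bm25_score(query, doc, corpus) for doc in corpus]
print(scores)
```

Only the first document contains the query terms, so it is the only one with a non-zero score. The corpus-wide quantities (document frequencies, average length) are the kind of values that are cheaper to look up than to recompute per query.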

In [17]:
# Add fields to the schema
# n.b. these correspond to the columns in the sample_docs.csv file
app_package.schema.add_fields(
    Field(name = "id",    type = "string", indexing = ["attribute", "summary"]),
    Field(name = "title", type = "string", indexing = ["index", "summary"], index = "enable-bm25"),
    Field(name = "body",  type = "string", indexing = ["index", "summary"], index = "enable-bm25")
)

### Search multiple fields

In [12]:
# A FieldSet groups fields together for searching.
# Queries against the "default" field set look for matches in both the title and body fields:
app_package.schema.add_field_set(
    FieldSet(name = "default", fields = ["title", "body"])
)

### Define ranking

In [15]:
# A RankProfile specifies how the matched documents are ranked.
# Two profiles are defined below; a query selects one of them by name:
app_package.schema.add_rank_profile(
    RankProfile(name = "bm25", first_phase = "bm25(title) + bm25(body)")
)
app_package.schema.add_rank_profile(
    RankProfile(name = "native_rank", first_phase = "nativeRank(title, body)")
)

### Deploy


The text search app has been defined with 
- fields
- a fieldset to group fields together
- rank profiles

It is now ready to deploy.  
Deploy `app_package` on the local machine using Docker by creating an instance of `VespaDocker`:

In [20]:
# Start a local Vespa Docker container, deploy the application package to it,
# and keep the returned handle (`app`) to the running application
vespa_docker = VespaDocker()
app = vespa_docker.deploy(application_package=app_package)


Waiting for configuration server, 0/300 seconds...
Waiting for configuration server, 5/300 seconds...
Waiting for application status, 0/300 seconds...
Waiting for application status, 5/300 seconds...
Waiting for application status, 10/300 seconds...
Waiting for application status, 15/300 seconds...
Waiting for application status, 20/300 seconds...
Waiting for application status, 25/300 seconds...
Waiting for application status, 30/300 seconds...
Finished deployment.


### Feed

There are two options:
1. Read the sample data directly from the URL into a DataFrame, or
2. Download the sample data to a local file and then load it.

URL for option #1: https://data.vespa.oath.cloud/blog/msmarco/sample_docs.csv


In [21]:
# OPTION 1 (commented out; uncomment to read straight from the URL)

# docs = pd.read_csv(
#     filepath_or_buffer="https://data.vespa.oath.cloud/blog/msmarco/sample_docs.csv"
# ).fillna('')
# docs.head()

In [28]:
# OPTION 2

path = os.path.join(DevData().data_dir, "sample_data", "vespa", "sample_docs.csv")
docs = pd.read_csv(path).fillna("")
docs.head(5)

Unnamed: 0,id,title,body
0,D1712962,Can you eat crab or imitation krab when you ha...,Answers com Wiki Answers Categories Health...
1,D1817294,How long is a tax refund check good,Answers com Wiki Answers Categories Busine...
2,D1761039,The Suffolk Resolves 1774,The Suffolk Resolves 1774 Across New England ...
3,D2899268,The eagle has flown,Download citation Share Download full text PDF...
4,D3278481,22b Cotton and African American Life,22b Cotton and African American Life Two thi...


In [29]:
# Feed the documents to the application:
feed_res = app.feed_df(docs, asynchronous=False, batch_size=1000)

Successful documents fed: 1000/1000.
Batch progress: 1/10.
Successful documents fed: 1000/1000.
Batch progress: 2/10.
Successful documents fed: 1000/1000.
Batch progress: 3/10.
Successful documents fed: 1000/1000.
Batch progress: 4/10.
Successful documents fed: 1000/1000.
Batch progress: 5/10.
Successful documents fed: 1000/1000.
Batch progress: 6/10.
Successful documents fed: 1000/1000.
Batch progress: 7/10.
Successful documents fed: 1000/1000.
Batch progress: 8/10.
Successful documents fed: 1000/1000.
Batch progress: 9/10.
Successful documents fed: 963/963.
Batch progress: 10/10.
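
The log above shows nine batches of 1000 documents and a final batch of 963, i.e. 9963 documents in total. The arithmetic behind that split can be sketched as follows. This is not PyVespa's internal code; the helper name `batch_sizes` is made up for illustration.

```python
def batch_sizes(total, batch_size):
    """Return the size of each batch when `total` items are split into
    consecutive batches of at most `batch_size` items."""
    full, remainder = divmod(total, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


# 9963 documents with batch_size=1000, matching the feed log above.
sizes = batch_sizes(9963, 1000)
print(len(sizes), sizes[-1])
```

With a smaller `batch_size` the number of batches grows accordingly, for example `batch_sizes(9963, 500)` yields 20 batches.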


### Query
Query the text search app using the Vespa Query Language (YQL) by passing the query parameters to the `body` argument of `Vespa.query`. This example uses the `bm25` rank profile.

In [30]:
query = {
    'yql': 'select * from sources * where userQuery()',
    'query': 'what keeps planes in the air',
    'ranking': 'bm25',
    'type': 'all',
    'hits': 10
}
res = app.query(body=query)
res.hits[0]

{'id': 'id:textsearch:textsearch::D1871659',
 'relevance': 25.661158431161503,
 'source': 'textsearch_content',
 'fields': {'sddocname': 'textsearch',
  'documentid': 'id:textsearch:textsearch::D1871659',
  'id': 'D1871659',
  'title': 'What keeps airplanes in the air ',
  'body': 'Answers com   Wiki Answers   Categories Cars   Vehicles Airplanes and Aircraft What keeps airplanes in the air  Flag What keeps airplanes in the air  Answer by Karin L  Confidence votes 95 0KThere s more to raising cattle than throwing them out to pasture  Know your soil and plants to earn profit above ground and wealth below  It is the combined forces of lift  thrust and weight that keeps an airplane in the air  Lift happens to be the largest force in this equation  and is dependent on the speed of the wing  or how fast an airplane is going   vertical velocity of air and air density  Well the elevator the rudder will help and something else I forgot what it was but don t judge me for that               And 

In [38]:
print(f"Number of hits: {len(res.hits)}")

print(res.hits[0]["fields"]["body"])

Number of hits: 10
Answers com   Wiki Answers   Categories Cars   Vehicles Airplanes and Aircraft What keeps airplanes in the air  Flag What keeps airplanes in the air  Answer by Karin L  Confidence votes 95 0KThere s more to raising cattle than throwing them out to pasture  Know your soil and plants to earn profit above ground and wealth below  It is the combined forces of lift  thrust and weight that keeps an airplane in the air  Lift happens to be the largest force in this equation  and is dependent on the speed of the wing  or how fast an airplane is going   vertical velocity of air and air density  Well the elevator the rudder will help and something else I forgot what it was but don t judge me for that               And that s how you be a bow done   Like a boss  Boss    15 people found this useful Was this answer useful  Yes Somewhat No How do airplane windows keep out the cold  Airplane windows   The only way that heat can escape the warm cabin is to travel through something or
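
Since the `body` field is long, a compact view of each hit is often easier to read. The sketch below assumes hits are plain dictionaries shaped like the `res.hits[0]` output shown earlier; the helper name `summarize_hits` is made up, and the sample hit is a shortened copy of that output with the body replaced by a placeholder.

```python
def summarize_hits(hits, max_title_len=60):
    """Reduce each hit to its document id, rounded relevance, and title."""
    rows = []
    for hit in hits:
        fields = hit.get("fields", {})
        rows.append({
            "id": fields.get("id"),
            "relevance": round(hit.get("relevance", 0.0), 3),
            "title": fields.get("title", "").strip()[:max_title_len],
        })
    return rows


# Shortened copy of the first hit shown above (body replaced by a placeholder).
sample_hits = [
    {
        "id": "id:textsearch:textsearch::D1871659",
        "relevance": 25.661158431161503,
        "source": "textsearch_content",
        "fields": {
            "id": "D1871659",
            "title": "What keeps airplanes in the air ",
            "body": "...",
        },
    }
]
rows = summarize_hits(sample_hits)
print(rows)
```

In the notebook, the same helper could be applied as `summarize_hits(res.hits)`.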

### Clean up
Stop and remove the Docker container.

In [39]:
vespa_docker.container.stop()
vespa_docker.container.remove()