# Vespa Demo


See:
- [Vespa Quick start](https://docs.vespa.ai/en/vespa-quick-start.html)
- [Getting Started](https://docs.vespa.ai/en/getting-started.html)
- [PyVespa](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa.html)


The steps below assume that:
- A Vespa Docker container is running

Note:
- This demo populates Vespa with MS MARCO data contained in `sample_docs.csv`.
- Download it from: [sample_docs.csv](https://data.vespa.oath.cloud/blog/msmarco/sample_docs.csv)
- `sample_docs.csv` is a CSV file with the columns `id`, `title` and `body`.
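
As a quick illustration of the file layout described above, here is a minimal sketch that reads rows with the `id`, `title`, `body` header using only the standard library. The two rows are made-up stand-ins, not actual MS MARCO content, and `read_docs` is a hypothetical helper name; to read the real file you would pass `open("sample_docs.csv", newline="")` instead of the in-memory sample.

```python
import csv
import io

# Hypothetical two-row stand-in for sample_docs.csv (same header, invented content).
sample = io.StringIO(
    "id,title,body\n"
    "D1,First title,First body text\n"
    "D2,Second title,Second body text\n"
)


def read_docs(handle):
    """Return one dict per CSV row, keeping only the id, title and body columns."""
    return [
        {"id": row["id"], "title": row["title"], "body": row["body"]}
        for row in csv.DictReader(handle)
    ]


docs = read_docs(sample)
print(docs[0])
```

Each resulting dict has the same keys as the schema fields defined later in this demo.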

In [16]:
# vespa imports used below
from vespa.package import ApplicationPackage
from vespa.package import Field
from vespa.package import FieldSet
from vespa.package import RankProfile

In [7]:
# Create an empty schema
app_package = ApplicationPackage(name="textsearch")

### Add fields to the schema:
(i.e. define the schema)

`id` - holds the document IDs  
`title` and `body` - hold the text of the documents  

Note:
- Including `"index"` in `indexing` creates a searchable index for the field (here `title` and `body`).
- Setting `index = "enable-bm25"` makes Vespa pre-compute the quantities needed to compute the BM25 score quickly.


n.b. Read about the BM25 score:
- [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25)
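
To make the BM25 idea concrete, here is a minimal pure-Python sketch of the textbook Okapi BM25 formula. It is an illustration only and not necessarily how Vespa computes `bm25(...)` internally; the function name `bm25_scores`, the toy documents, and the parameter values `k1=1.2` and `b=0.75` (commonly used defaults) are assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by document length via b.
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Toy, already-tokenized documents (hypothetical).
toy_docs = [
    ["vespa", "search", "engine"],
    ["cooking", "pasta"],
    ["search", "search", "ranking"],
]
scores = bm25_scores(["search"], toy_docs)
print(scores)
```

The document frequencies and average length are corpus-level statistics, which is the kind of quantity that is cheaper to have available ahead of query time.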

In [17]:
# Add fields to the schema
# n.b. these correspond to the columns in sample_docs.csv
app_package.schema.add_fields(
    Field(name = "id",    type = "string", indexing = ["attribute", "summary"]),
    Field(name = "title", type = "string", indexing = ["index", "summary"], index = "enable-bm25"),
    Field(name = "body",  type = "string", indexing = ["index", "summary"], index = "enable-bm25")
)

### Search multiple fields

In [12]:
# A FieldSet groups fields together for searching: queries against it look for
# matches in both the title and body fields of the documents.
app_package.schema.add_field_set(
    FieldSet(name = "default", fields = ["title", "body"])
)

### Define ranking

In [15]:
# Specify how to rank the matched documents by defining a RankProfile.
# Either of the two rank profiles below can be selected at query time.
app_package.schema.add_rank_profile(
    RankProfile(name = "bm25", first_phase = "bm25(title) + bm25(body)")
)
app_package.schema.add_rank_profile(
    RankProfile(name = "native_rank", first_phase = "nativeRank(title, body)")
)
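
The rank profiles above are chosen per query. As a hedged sketch, the helper below builds a request body using the Vespa query API parameters `yql`, `query`, `ranking` and `hits`; `build_query_body` is a hypothetical helper name and the query text is invented. The code only constructs a dictionary and does not contact Vespa. With a deployed application, a body like this could be passed to pyvespa, for example via `app.query(body=...)`, though the exact call may differ between pyvespa versions.

```python
def build_query_body(user_query, rank_profile="bm25", hits=10):
    """Build a Vespa query body that selects one of the rank profiles by name."""
    return {
        # userQuery() matches the text in "query" against the "default" field set.
        "yql": "select * from sources * where userQuery()",
        "query": user_query,
        # Must match a RankProfile name defined in the schema.
        "ranking": rank_profile,
        "hits": hits,
    }


bm25_body = build_query_body("what is dad bod")
native_body = build_query_body("what is dad bod", rank_profile="native_rank")
print(bm25_body["ranking"], native_body["ranking"])
```

Switching between `bm25` and `native_rank` only changes the `ranking` value, so the same query text can be compared under both profiles.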