# Adding a ranking profile to Vespa 

### Adding fields
Here we add the fields that the ranking function needs. We add "body_length" because we want to rank documents by the length of their body. Length is not a useful relevance signal on its own; it only serves as an illustration.

In [2]:
from vespa.package import Document, Field

document = Document(
    fields=[
        Field(name = "id", type = "string", indexing = ["attribute", "summary"]),
        Field(name = "title", type = "string", indexing = ["index", "summary"], index = "enable-bm25"),
        Field(name = "body", type = "string", indexing = ["index", "summary"], index = "enable-bm25"),
        Field(name = "body_length", type = "int", indexing = ["attribute", "summary"] )
    ]
)


In [3]:
from vespa.package import Schema, FieldSet, RankProfile

msmarco_schema = Schema(
    name = "msmarco",
    document = document,
    fieldsets = [FieldSet(name = "default", fields = ["title", "body"])],
    rank_profiles = [RankProfile(name = "default", first_phase = "nativeRank(title, body)")]
)

In [4]:
from vespa.package import ApplicationPackage

app_package = ApplicationPackage(name = "msmarco", schema=msmarco_schema)


In [5]:
from vespa.package import VespaCloud

# Folder that holds the Vespa Cloud API key (adjust to your own machine)
path_key = "C:\\Users\\User\\OneDrive - NTNU\\NTNU\\Prosjekt oppgave NLP\\Cloud_test\\"
file = "andre.olaisen.tmartins-ntnu.pem"


# App name in Cloud
app_name = "andre-test-loud"
vespa_cloud = VespaCloud(
    tenant="tmartins-ntnu",
    application=app_name,
    key_location=path_key + file,
    application_package=app_package
)

In [6]:
# Folder where the application package files are written before deployment
path_key = "C:\\Users\\User\\OneDrive - NTNU\\NTNU\\Prosjekt oppgave NLP\\Cloud_test\\"

app = vespa_cloud.deploy(
    instance='andre-olaisen',
    disk_folder=path_key
)


Deployment started in run 40 of dev-aws-us-east-1c for tmartins-ntnu.andre-test-loud.andre-olaisen. This may take about 15 minutes the first time.
INFO    [07:52:50]  Deploying platform version 7.299.105 and application version unknown ...
INFO    [07:52:51]  No services requiring restart.
INFO    [07:52:51]  Deployment successful.
INFO    [07:52:51]  Session 1689 for tenant 'tmartins-ntnu' prepared and activated.
INFO    [07:52:52]  ######## Details for all nodes ########
INFO    [07:52:52]  h800a.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP
INFO    [07:52:52]  --- platform docker.ouroath.com:4443/vespa/centos-tenant:7.299.105
INFO    [07:52:52]  --- container-clustercontroller on port 19050 has config generation 1687, wanted is 1689
INFO    [07:52:52]  --- storagenode on port 19102 has config generation 1687, wanted is 1689
INFO    [07:52:52]  --- searchnode on port 19107 has config generation 1689, wanted is 1689
INFO    [07:52:52]  --- distributor on port 191

In [7]:
from pandas import read_csv

docs = read_csv("https://thigm85.github.io/data/msmarco/docs.tsv", sep = "\t")
docs.shape

doc = docs[1:100]
doc.shape

(99, 3)

### Feeding data

Here we feed the data to the Vespa app. The body length is not part of the source data, so we compute it for each document at feed time and send it in the "body_length" field.

In [8]:

i = 1
for idx, row in doc.iterrows():
    i += 1
    if (i > 100): # Do not need much data for this test
        break 
    response = app.feed_data_point(
        schema = "msmarco",
        data_id = str(row["id"]),
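        fields = {
            "id": str(row["id"]),
            "title": str(row["title"]),
            "body": str(row["body"]),
            # Measure the same string that is stored in "body";
            # str() avoids a TypeError if the body is missing (NaN)
            "body_length": len(str(row["body"]))
        }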
    )
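
The feeding step above can be isolated into a small helper, which makes the body_length computation easy to check without a running Vespa instance. This is a minimal sketch: `to_vespa_fields` and `example_row` are hypothetical names that are not part of pyvespa, and the row is a plain dict standing in for a pandas row.

```python
def to_vespa_fields(row):
    """Build the `fields` payload for one document, including body_length."""
    # str() guards against a missing body, which pandas represents as NaN (a float)
    body = str(row["body"])
    return {
        "id": str(row["id"]),
        "title": str(row["title"]),
        "body": body,
        # Character count of the stored body, not a token count
        "body_length": len(body),
    }


# Hypothetical row, for illustration only
example_row = {"id": "D1", "title": "Example title", "body": "Vespa ranks documents."}
fields = to_vespa_fields(example_row)
print(fields["body_length"])
```

The returned dict has the same shape as the `fields` argument passed to `app.feed_data_point` in the cell above.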

In [9]:
from vespa.query import Query, OR, AND, WeakAnd, ANN, RankProfile as Ranking


results = app.query(
    query="Where is my app",
    query_model = Query(
        match_phase=OR(),
        rank_profile=Ranking(name="default")
    ),
    hits = 10
)

In [10]:
print(results.number_documents_retrieved)
print(results.number_documents_indexed)

print("\n")

print("Results: OR, default")
for result in results.hits:
    print(result['fields']['title'])
    print(result["relevance"])
    print(result['fields']['body_length'])


904
996


Results: OR, default
My Thoughts on Standardized Work and  Lean
0.1412692730883527
14603
Coffered Ceiling Layout
0.11916579151483872
26195
Is the word autistic another word for retarded or mentally challenged 
0.11796103067641245
4560
What is the difference between science and philosophy 
0.11557961521393088
35962
Amazonaws virus
0.1151814856231579
92797
Is war justified for any reasons 
0.10981591717687116
6322
Frequently Asked Questions
0.10748563735615728
7989
 Myth Busters  regular Erik Gates dead
0.10402027021032244
59022
Where did the name Scotland come from 
0.10309842164117858
2021
The Best Time to Visit Paris
0.1010418499838613
28048


### Adding Ranking profile
Here we add the ranking profile. The expression has to refer to the field as attribute(body_length); a bare body_length is not a rank feature, so Vespa cannot resolve it. The expression can then apply arithmetic operations and functions such as sqrt, which are documented here:

https://docs.vespa.ai/documentation/reference/ranking-expressions.html

We can also use the built-in rank features listed here:

https://docs.vespa.ai/documentation/reference/rank-features.html

In [19]:
# Refer to the field as attribute(body_length); Vespa cannot resolve a bare body_length

app_package.schema.add_rank_profile(
    RankProfile(name = "body_length", inherits = "default", first_phase = "sqrt(attribute(body_length))"))
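
To get a feel for what this expression does to the ordering, it can be emulated locally. The sketch below is plain Python, not something Vespa runs: `first_phase` is a hypothetical stand-in for sqrt(attribute(body_length)), and the titles and lengths are copied from the query output shown earlier.

```python
import math

# (title, body_length) pairs copied from the earlier query output
docs = [
    ("My Thoughts on Standardized Work and  Lean", 14603),
    ("Coffered Ceiling Layout", 26195),
    ("Amazonaws virus", 92797),
]


def first_phase(body_length):
    # Local stand-in for the ranking expression sqrt(attribute(body_length))
    return math.sqrt(body_length)


# Highest score first, as Vespa orders hits by relevance
ranked = sorted(docs, key=lambda d: first_phase(d[1]), reverse=True)
by_length = sorted(docs, key=lambda d: d[1], reverse=True)

for title, length in ranked:
    print(f"{first_phase(length):10.3f}  {title}")
```

Because sqrt is monotonically increasing, `ranked` and `by_length` come out in the same order: the square root changes the size of the scores, not which document comes first.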

In [12]:
app_package.schema.add_rank_profile(
    RankProfile(name = "nativerank_bm25_combo", inherits = "default",
                first_phase = "10 * nativeRank(title,body) + bm25(body)")
)
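
The factor of 10 in this profile sets how much each feature contributes to the final score. A rough way to reason about it, assuming nativeRank is normalized to about [0, 1] while bm25 is unbounded, is to evaluate the same expression by hand. The feature values below are hypothetical and chosen only for illustration; they are not taken from the deployed application.

```python
def nativerank_bm25_combo(native_rank, bm25):
    # Mirrors the expression 10 * nativeRank(title,body) + bm25(body)
    return 10 * native_rank + bm25


# Hypothetical feature values for two documents, for illustration only
doc_a = {"native_rank": 0.14, "bm25": 2.0}
doc_b = {"native_rank": 0.10, "bm25": 3.5}

score_a = nativerank_bm25_combo(doc_a["native_rank"], doc_a["bm25"])
score_b = nativerank_bm25_combo(doc_b["native_rank"], doc_b["bm25"])

print("doc_a:", round(score_a, 3))
print("doc_b:", round(score_b, 3))
```

With these made-up numbers doc_b scores higher than doc_a even though doc_a has the larger nativeRank, which shows how the weight decides which feature dominates.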

In [20]:
path_key = "C:\\Users\\User\\OneDrive - NTNU\\NTNU\\Prosjekt oppgave NLP\\Cloud_test\\"

app = vespa_cloud.deploy(
    instance='andre-olaisen',
    disk_folder=path_key
)

Deployment started in run 43 of dev-aws-us-east-1c for tmartins-ntnu.andre-test-loud.andre-olaisen. This may take about 15 minutes the first time.
INFO    [07:57:01]  Deploying platform version 7.299.105 and application version unknown ...
INFO    [07:57:03]  No services requiring restart.
INFO    [07:57:03]  Deployment successful.
INFO    [07:57:03]  Session 1693 for tenant 'tmartins-ntnu' prepared and activated.
INFO    [07:57:03]  ######## Details for all nodes ########
INFO    [07:57:03]  h800a.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP
INFO    [07:57:03]  --- platform docker.ouroath.com:4443/vespa/centos-tenant:7.299.105
INFO    [07:57:03]  --- container-clustercontroller on port 19050 has config generation 1693, wanted is 1693
INFO    [07:57:03]  --- storagenode on port 19102 has config generation 1692, wanted is 1693
INFO    [07:57:03]  --- searchnode on port 19107 has config generation 1693, wanted is 1693
INFO    [07:57:03]  --- distributor on port 191

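### Rank profile in action
Here we query with the body_length profile. Each hit's relevance should equal the square root of its body_length, which the output below confirms.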
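
Instead of comparing the printed numbers by eye, the same check can be done programmatically. This is a minimal sketch: `relevance_matches_sqrt` is a hypothetical helper, and it assumes each hit is a dict with "relevance" and "fields" keys, as in the loops over `results.hits` in this notebook. The sample hit uses the body_length of "Amazonaws virus" from the earlier output and the relevance that the body_length profile reports for it.

```python
import math


def relevance_matches_sqrt(hit, rel_tol=1e-9):
    """Return True if the hit's relevance equals sqrt(body_length)."""
    expected = math.sqrt(hit["fields"]["body_length"])
    return math.isclose(hit["relevance"], expected, rel_tol=rel_tol)


# One hit in the shape used by results.hits, with values from this notebook's output
sample_hit = {
    "relevance": 304.62600020352824,
    "fields": {"title": "Amazonaws virus", "body_length": 92797},
}

# The same document scored by the default profile, for contrast
default_hit = {
    "relevance": 0.1151814856231579,
    "fields": {"title": "Amazonaws virus", "body_length": 92797},
}

print(relevance_matches_sqrt(sample_hit))
print(relevance_matches_sqrt(default_hit))
```

Applied to real results, `all(relevance_matches_sqrt(h) for h in results_or_body_length.hits)` would confirm that the query ran with the body_length profile and not with the default one.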

In [25]:
query_text = "What is an apple?"

results_or_body_length = app.query(
    query=query_text,
    query_model = Query(
        match_phase=OR(),
        rank_profile=Ranking(name="body_length")
    ),
    hits = 10
)

print("\n")
print("Results: OR , Body_length")
rank = 1
for result in results_or_body_length.hits:
    print("Ranking:", rank)
    print("Title: \t", result['fields']['title'])
    print("Relevance: \t", result["relevance"])
    print("Sqrt(body_length): \t", (result["fields"]["body_length"])**(1/2), "\n")
    rank += 1



Results: OR , Body_length
Ranking: 1
Title: 	 Theses and Dissertations Available from Pro Quest
Relevance: 	 657.7659158089601
Sqrt(body_length): 	 657.7659158089601 

Ranking: 2
Title: 	 FAQs
Relevance: 	 481.64821187252426
Sqrt(body_length): 	 481.64821187252426 

Ranking: 3
Title: 	 RAMBLER automobile Kenosha Wisconsin USA Part I
Relevance: 	 343.3642380912724
Sqrt(body_length): 	 343.3642380912724 

Ranking: 4
Title: 	 Amazonaws virus
Relevance: 	 304.62600020352824
Sqrt(body_length): 	 304.62600020352824 

Ranking: 5
Title: 	 Donald Trump presidential campaign  2020
Relevance: 	 293.1603656703955
Sqrt(body_length): 	 293.1603656703955 

Ranking: 6
Title: 	 Warfarin
Relevance: 	 275.7879620288021
Sqrt(body_length): 	 275.7879620288021 

Ranking: 7
Title: 	 Gene Wilder  star of  Willy Wonka   dead at 83
Relevance: 	 268.43621216221925
Sqrt(body_length): 	 268.43621216221925 

Ranking: 8
Title: 	 Slavery  Manifest Destiny  Pre Civil War  Civil War  Reconstruction
Relevance: 	 255.4