
# Application packages

Vespa is configured using an [application package](https://docs.vespa.ai/en/application-packages.html).
Pyvespa provides an API to generate a deployable application package.
An application package has at a minimum a [schema](https://docs.vespa.ai/en/schemas.html)
and [services.xml](https://docs.vespa.ai/en/reference/services.html).

> **_NOTE: pyvespa generally does not support all indexing options in Vespa - it is made for easy experimentation._**
  **_To configure setting an unsupported indexing option (or any other unsupported option),_**
  **_export the application package like above, modify the schema or other files_**
  **_and deploy the application package from the directory, or as a zipped file._**
  **_Find more details at the end of this notebook._**

In [6]:
!pip3 install pyvespa




[notice] A new release of pip is available: 24.0 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip


By exporting to disk, one can see the generated files:

In [16]:
!rm -rf  /content/sample_data

In [1]:
import os
import tempfile
from pathlib import Path
from vespa.package import ApplicationPackage

app_name = "myschema"
app_package = ApplicationPackage(name=app_name, create_query_profile_by_default=False)

temp_dir = '.'
app_package.to_files(temp_dir)

for p in Path(temp_dir).rglob("*"):
    if p.is_file():
        print(p)

application_packages.ipynb
services.xml
schemas\myschema.sd


## Schema

A schema is created with the same name as the application package:

In [2]:
os.environ["TMP_APP_DIR"] = temp_dir
os.environ["APP_NAME"] = "schemas/myschema"



Configure the schema with [fields](https://docs.vespa.ai/en/schemas.html#field),
[fieldsets](https://docs.vespa.ai/en/schemas.html#fieldset)
and a [ranking function](https://docs.vespa.ai/en/ranking.html):

In [3]:
from vespa.package import Field, FieldSet, RankProfile

app_package.schema.add_fields(
    Field(name="id", type="string", indexing=["attribute", "summary"]),
    Field(
        name="title", type="string", indexing=["index", "summary"], index="enable-bm25"
    ),
    Field(
        name="body", type="string", indexing=["index", "summary"], index="enable-bm25"
    ),
)

app_package.schema.add_field_set(FieldSet(name="default", fields=["title", "body"]))

app_package.schema.add_rank_profile(
    RankProfile(name="default", first_phase="bm25(title) + bm25(body)")
)

Export the application package again, show schema:

In [4]:
app_package.to_files(temp_dir)



## Services

`services.xml` configures container and content clusters -
see the [Vespa Overview](https://docs.vespa.ai/en/overview.html).
This is a file you will normally not change or need to know much about:

In [30]:
!cat '/content/services.xml'

<?xml version="1.0" encoding="UTF-8"?>
<services version="1.0">
    <container id="myschema_container" version="1.0">
        <search></search>
        <document-api></document-api>
        <document-processing></document-processing>
    </container>
    <content id="myschema_content" version="1.0">
        <redundancy>1</redundancy>
        <documents>
            <document type="myschema" mode="index"></document>
        </documents>
        <nodes>
            <node distribution-key="0" hostalias="node1"></node>
        </nodes>
    </content>
</services>

Observe:

* A _content cluster_ (this is where the index is stored) called `myschema_content` is created.
  This is information not normally needed, unless using
  [delete_all_docs](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.delete_all_docs)
  to quickly remove all documents from a schema

## Deploy

After completing the code for the fields and ranking, deploy the application into a Docker container -
the container is started by pyvespa:

In [12]:
from vespa.deployment import VespaDocker

vespa_container = VespaDocker(port=8080) 
vespa_connection = vespa_container.deploy(application_package=app_package)
# 7:34  6:00

Waiting for configuration server, 0/60 seconds...
Waiting for configuration server, 5/60 seconds...
Waiting for configuration server, 10/60 seconds...
Waiting for configuration server, 15/60 seconds...
Waiting for application to come up, 0/300 seconds.
Waiting for application to come up, 5/300 seconds.
Waiting for application to come up, 10/300 seconds.
Waiting for application to come up, 15/300 seconds.
Waiting for application to come up, 20/300 seconds.
Waiting for application to come up, 25/300 seconds.
Application is up!
Finished deployment.


## Deploy from modified files

To add configuration the the schema, which is not supported by the pyvespa code,
export the files, modify, then deploy by using `deploy_from_disk`.
This example adds custom configuration to the `services.xml` file above and deploys it:

In [None]:
import os


# Set your environment variables or fallback to defaults
TMP_APP_DIR = temp_dir
APP_NAME = "schemas/myschema"

# Ensure the directory exists
os.makedirs(TMP_APP_DIR, exist_ok=True)

# Define the XML content
xml_content = f"""<?xml version="1.0" encoding="UTF-8"?>
<services version="1.0">
    <container id="{APP_NAME}_container" version="1.0">
        <search></search>
        <document-api></document-api>
    </container>
    <content id="{APP_NAME}_content" version="1.0">
        <redundancy reply-after="1">1</redundancy>
        <documents>
            <document type="{APP_NAME}" mode="index"></document>
        </documents>
        <nodes>
            <node distribution-key="0" hostalias="node1"></node>
        </nodes>
        <tuning>
            <resource-limits>
                <disk>0.90</disk>
            </resource-limits>
        </tuning>
    </content>
</services>
"""

# Write the file
with open(os.path.join(TMP_APP_DIR, "services.xml"), "w", encoding="utf-8") as f:
    f.write(xml_content)

print("services.xml created successfully.")

# Ensure the directory exists
os.makedirs(TMP_APP_DIR, exist_ok=True)

# Define the XML content
xml_content = f"""<?xml version="1.0" encoding="UTF-8"?>
<services version="1.0">
    <container id="{APP_NAME}_container" version="1.0">
        <search></search>
        <document-api></document-api>
    </container>
    <content id="{APP_NAME}_content" version="1.0">
        <redundancy reply-after="1">1</redundancy>
        <documents>
            <document type="{APP_NAME}" mode="index"></document>
        </documents>
        <nodes>
            <node distribution-key="0" hostalias="node1"></node>
        </nodes>
        <tuning>
            <resource-limits>
                <disk>0.90</disk>
            </resource-limits>
        </tuning>
    </content>
</services>
"""

# Write the file
with open(os.path.join(TMP_APP_DIR, "services.xml"), "w", encoding="utf-8") as f:
    f.write(xml_content)

print("services.xml created successfully.")


In [None]:
%%sh
cat << EOF > $TMP_APP_DIR/services.xml
<?xml version="1.0" encoding="UTF-8"?>
<services version="1.0">
    <container id="${APP_NAME}_container" version="1.0">
        <search></search>
        <document-api></document-api>
    </container>
    <content id="${APP_NAME}_content" version="1.0">
        <redundancy reply-after="1">1</redundancy>
        <documents>
            <document type="${APP_NAME}" mode="index"></document>
        </documents>
        <nodes>
            <node distribution-key="0" hostalias="node1"></node>
        </nodes>
        <tuning>
            <resource-limits>
                <disk>0.90</disk>
            </resource-limits>
        </tuning>
    </content>
</services>
EOF

The [resource-limits](https://docs.vespa.ai/en/reference/services-content.html#resource-limits) in `tuning/resource-limits/disk` configuration setting allows a higher disk usage.

Deploy using the exported files:

In [None]:
vespa_connection = vespa_container.deploy_from_disk(
    application_name=app_name, application_root=temp_dir.name
)

Waiting for configuration server, 0/300 seconds...
Waiting for configuration server, 5/300 seconds...
Waiting for application status, 0/300 seconds...
Waiting for application status, 5/300 seconds...
Finished deployment.


One can also export a deployable zip-file, which can be deployed using the Vespa Cloud Console:

In [None]:
Path.mkdir(Path(temp_dir.name) / "zip", exist_ok=True, parents=True)
app_package.to_zipfile(temp_dir.name + "/zip/application.zip")

! find "$TMP_APP_DIR/zip" -type f

/var/folders/9_/z105jyln7jz8h2vwsrjb7kxh0000gp/T/tmp6geo2dpg/zip/application.zip


### Cleanup

Remove the container resources and temporary application package file export:

In [None]:
temp_dir.cleanup()
vespa_container.container.stop()
vespa_container.container.remove()

## Next step: Deploy, feed and query

Once the schema is ready for deployment, decide deployment option and deploy the application package:

* [Deploy to local container](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa.html)
* [Deploy to Vespa Cloud](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa-cloud.html)

Use the guides on the pyvespa site to feed and query data.