# Build our Presentation Corpus
We'll now build our presentation corpus, which is where we'll store existing presentations. Normally we'd crawl all the existing talks, but here we've just taken the last few.

A key requirement for this corpus is to include five filter attributes, which demonstrate the power of semantic search combined with key-value search. We define the five filter attributes below:

* **City:** Where the presentation occurred / will occur
* **Format:** Meetup or Conference
* **Date:** When the meetup occurred / will occur
* **Status:** Proposed, Accepted, or Rejected
* **Presenters:** List of presenters
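
To make the key-value side concrete, here is a minimal sketch of how filter strings over these attributes might be assembled. The `doc.<attribute> = '<value>'` form is the one used in the query later in this notebook. Everything else is an assumption for illustration: `build_filter` is a hypothetical helper (not part of `vectara_client`), joining clauses with `AND` is assumed to be accepted by the service, and the city and format values are made-up samples.

```python
def build_filter(**attributes: str) -> str:
    """Join attribute/value pairs into a single filter expression string."""
    clauses = []
    for name, value in attributes.items():
        if "'" in value:
            # Quote-escaping rules are not covered here, so refuse rather than guess.
            raise ValueError(f"value for {name!r} contains a single quote")
        clauses.append(f"doc.{name} = '{value}'")
    # Assumption: multiple clauses may be combined with AND.
    return " AND ".join(clauses)


# Single-attribute filter, matching the form used in the query cell.
presenter_filter = build_filter(presenters="David Levy")

# Two attributes combined (sample values, not taken from the corpus).
combined_filter = build_filter(city="Sydney", format="Meetup")

print(presenter_filter)
print(combined_filter)
```

A string built this way could be passed wherever the notebook passes a `metadata=` filter to the query service.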

In [9]:
from vectara_client.core import Factory
from vectara_client.admin import CorpusBuilder
import logging

logging.basicConfig(format='%(asctime)s:%(name)-35s %(levelname)s:%(message)s', level=logging.INFO, datefmt='%H:%M:%S %z')
logging.getLogger("OAuthUtil").setLevel(logging.WARNING)
logger = logging.getLogger(__name__)

client = Factory().build()
manager = client.corpus_manager

corpus = (CorpusBuilder("meetup-presentations")
          .description("This is where we put all of our meetup presentations")
          .add_attribute("city", "The city where the presentation occurred / will occur")
          .add_attribute("format", "Meetup or Conference")
          .add_attribute("date", "When the meetup occurred / will occur")
          .add_attribute("status", "Proposed / Accepted / Rejected")
          .add_attribute("presenters", "Who the presenters were", type="text")
          .build()
         )
corpus_id = manager.create_corpus(corpus, delete_existing=True)

10:59:08 +1000:Factory                             INFO:initializing builder
10:59:08 +1000:Factory                             INFO:Factory will load configuration from home directory
10:59:08 +1000:HomeConfigLoader                    INFO:Loading configuration from users home directory [C:\Users\david]
10:59:08 +1000:HomeConfigLoader                    INFO:Loading default configuration [default]
10:59:08 +1000:HomeConfigLoader                    INFO:Parsing config
10:59:08 +1000:root                                INFO:We are processing authentication type [OAuth2]
10:59:08 +1000:root                                INFO:initializing Client
10:59:08 +1000:CorpusManager                       INFO:Performing account checks before corpus creation for name [meetup-presentations]
10:59:10 +1000:RequestUtil                         INFO:URL for operation list-corpora is: https://api.vectara.io/v1/list-corpora
10:59:12 +1000:CorpusManager                       INFO:Checking corpus with name

## Load our Corpus
We'll now load our corpus. The metadata is stored as JSON with one presentation per line, and each record is uploaded alongside its matching `.docx` file.

In [10]:
from pathlib import Path
import json

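# meetups.json holds one JSON object per line; each describes one presentation.
with open(Path("../resources/meetups/meetups.json"), "r") as f:
    for line in f:
        presentation = json.loads(line)

        # Upload the matching .docx, attaching the record as document metadata.
        upload_path = Path(f"../resources/meetups/meetup_{presentation['id']}.docx")
        client.indexer_service.upload(corpus_id, upload_path, metadata=presentation)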

11:00:10 +1000:IndexerService                      INFO:Headers: {"c": "1623270172", "o": "725"}
meetup_1.docx: 14.1kB [00:03, 4.29kB/s]                                                                                
11:00:14 +1000:IndexerService                      INFO:Headers: {"c": "1623270172", "o": "725"}
meetup_2.docx: 14.4kB [00:02, 5.30kB/s]                                                                                
11:00:17 +1000:IndexerService                      INFO:Headers: {"c": "1623270172", "o": "725"}
meetup_3.docx: 14.4kB [00:03, 4.07kB/s]                                                                                
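
Because every filter attribute has to arrive through the uploaded metadata, it can help to check each record before uploading. The sketch below is one possible pre-flight check and runs without any network access. It assumes the JSON keys match the attribute names defined on the corpus (only `id` is confirmed by the loading code), and the sample records are invented stand-ins for `meetups.json`.

```python
import io
import json
from pathlib import Path

# Assumption: metadata keys mirror the corpus attribute names, plus "id".
REQUIRED_KEYS = {"id", "city", "format", "date", "status", "presenters"}

# Made-up sample records standing in for meetups.json (one JSON object per line).
SAMPLE = io.StringIO(
    '{"id": 1, "city": "Sydney", "format": "Meetup", "date": "2024-01-01", '
    '"status": "Accepted", "presenters": "A. Presenter"}\n'
    "\n"
    '{"id": 2, "city": "Sydney", "format": "Conference"}\n'
)


def read_presentations(lines):
    """Yield (record, missing_keys) for every non-blank JSON line."""
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines instead of failing in json.loads
        record = json.loads(line)
        yield record, sorted(REQUIRED_KEYS - set(record))


results = list(read_presentations(SAMPLE))
for record, missing in results:
    # Same file-naming pattern as the upload loop: meetup_<id>.docx
    path = Path(f"meetup_{record['id']}.docx")
    print(path.name, "ok" if not missing else f"missing: {missing}")
```

Records reported with missing keys would still upload, but filters on those attributes would not match them.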


## Run Tests
We'll now check that the corpus has information available to query, by running a summarized query filtered to a single presenter.

In [12]:
# Summarize the query results, keeping only documents whose presenters attribute is 'David Levy'.
response = client.query_service.query(
    "What was the speech about?", corpus_id, summary=True,
    summarizer="vectara-summary-ext-v1.3.0", summary_result_count=5,
    metadata="doc.presenters = 'David Levy'")
logger.info(f"Response was: {response.summary[0].text}")

11:02:11 +1000:RequestUtil                         INFO:URL for operation query is: https://api.vectara.io/v1/query
11:02:18 +1000:__main__                            INFO:Response was: The speech was about building a grounded GenAI solution with Vectara, which was delivered in a format consisting of 33% slides, 33% UI console, and 33% Python Notebook, with a touch of humor [1][2][4]. The speaker was David Levy, the Head of Field Engineering APAC for Vectara. He formerly worked at Databricks/Cloudera and has experience as a tech lead on projects for numerous well-known public sector organizations [2][3][5]. His focus now lies in implementing simple operational solutions into production [3][5].
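
The summary text carries bracketed markers such as `[1][2][4]`. A minimal sketch for pulling those numbers out is shown below, for example to see which results the summary drew on. `cited_results` is a hypothetical helper, the sample string is abridged from the output above, and reading the numbers as 1-based positions in the result list is an assumption rather than something this notebook confirms.

```python
import re
from typing import List


def cited_results(summary_text: str) -> List[int]:
    """Return the distinct bracketed citation numbers found in the text, sorted."""
    return sorted({int(number) for number in re.findall(r"\[(\d+)\]", summary_text)})


# Abridged from the logged summary; marker positions are kept as in the output.
sample_summary = (
    "The speech was about building a grounded GenAI solution with Vectara [1][2][4]. "
    "The speaker was David Levy [2][3][5]. "
    "His focus now lies in implementing simple operational solutions into production [3][5]."
)

citations = cited_results(sample_summary)
print(citations)
```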
