# Build our Presentation Corpus
We'll now build our presentation corpus, which is where we'll put existing presentations. Normally we would crawl all the existing talks, but here we've just taken the last few.

A key thing for this corpus is to include five filter attributes, which demonstrate the power of Semantic Search combined with Key-Value searches. We will define the five filter attributes below:

* **City:** Where the presentation occurred / will occur
* **Format:** Meetup or Conference
* **Date:** When the meetup occurred / will occur
* **Status:** Proposed, Accepted or Rejected
* **Presenters:** List of presenters

In [None]:
from vectara_client.core import Factory
from vectara_client.admin import CorpusBuilder
import logging

logging.basicConfig(format='%(asctime)s:%(name)-35s %(levelname)s:%(message)s', level=logging.INFO, datefmt='%H:%M:%S %z')
logging.getLogger("OAuthUtil").setLevel(logging.WARNING)
logger = logging.getLogger(__name__)

client = Factory().build()
manager = client.corpus_manager

corpus = (CorpusBuilder("meetup-presentations")
          .description("This is where we put all of our meetup presentations")
          .add_attribute("city", "The city where the presentation occurred / will occur")
          .add_attribute("format", "Meetup or Conference")
          .add_attribute("date", "When the meetup occurred / will occur")
          .add_attribute("status", "Proposed / Accepted / Rejected")
          .add_attribute("presenters", "Who the presenters were", type="text")
          .build()
         )
corpus_id = manager.create_corpus(corpus, delete_existing=True)

## Load our Corpus
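We'll now load our corpus from meetups.json, which holds one JSON record per line. Each record is attached as metadata to the matching .docx file when it is uploaded.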

In [None]:
from pathlib import Path
import json

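# meetups.json holds one JSON object per line
with open(Path("../resources/meetups/meetups.json"), "r") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines, which json.loads would reject
        presentation = json.loads(line)

        # Upload the matching document, attaching the parsed record as metadata
        upload_path = Path(f"../resources/meetups/meetup_{presentation['id']}.docx")
        client.indexer_service.upload(corpus_id, upload_path, metadata=presentation)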
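Because each record becomes the document's metadata, it can help to check that a record carries all five filter attributes before uploading it. The following is a minimal sketch of such a check, not part of the notebook: the sample records, their values, and the helper name read_presentations are invented for illustration, and the real meetups.json may contain other fields.

```python
import io
import json

# The five filter attributes defined on the corpus.
REQUIRED_KEYS = {"city", "format", "date", "status", "presenters"}

# Hypothetical stand-in for meetups.json; every value here is invented.
SAMPLE = io.StringIO(
    '{"id": 1, "city": "Sydney", "format": "Meetup", "date": "2024-01-01", '
    '"status": "Accepted", "presenters": "Jane Doe"}\n'
    "\n"
    '{"id": 2, "city": "Sydney", "format": "Conference", "status": "Proposed"}\n'
)


def read_presentations(lines):
    """Yield (record, missing_keys) for each non-blank JSON line."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines rather than failing in json.loads
        record = json.loads(line)
        yield record, sorted(REQUIRED_KEYS - set(record))


results = list(read_presentations(SAMPLE))
for record, missing in results:
    print(record["id"], "missing:", missing)
```

A record with missing attributes would still upload, but it would not match any filter on those attributes, so flagging it early makes later query results easier to interpret.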
    

## Run Tests
We'll now check that we have information available to query, filtering the results to a single presenter.

In [None]:
response = client.query_service.query(
    "What was the speech about?", corpus_id, summary=True, 
    summarizer="vectara-summary-ext-v1.3.0", summary_result_count=5,
    metadata="doc.presenters = 'David Levy'")
logger.info(f"Response was: {response.summary[0].text}")
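The metadata filter in the query above is a hand-written string. When filtering on several attributes, a small helper can keep those strings consistent. This is only a sketch: equals_filter and all_of are hypothetical names, not part of vectara_client, and joining clauses with "and" is an assumption about the filter syntax that should be checked against the Vectara documentation. Values containing a single quote are rejected because the escaping rule is not covered here.

```python
def equals_filter(attribute, value):
    """Build "doc.<attribute> = '<value>'" in the style of the query above."""
    if "'" in value:
        # The quoting rule is unknown here, so refuse instead of guessing.
        raise ValueError(f"value for {attribute!r} contains a single quote: {value!r}")
    return f"doc.{attribute} = '{value}'"


def all_of(*clauses):
    """Join clauses with 'and' (assumed to be supported by the filter syntax)."""
    return " and ".join(clauses)


presenter_filter = equals_filter("presenters", "David Levy")
combined_filter = all_of(presenter_filter, equals_filter("status", "Accepted"))
print(combined_filter)
```

The resulting string could then be passed as the metadata argument of the query call, in place of the literal used above.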