#### Create a new assistant with `file_search` enabled in the `tools` parameter of the Assistant.

```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

instructions = '''
You are an expert customer service rep for the housing collective HOME0001. Use your knowledge base to answer questions about the project.
If you don't find an answer, just say 'I don't know :('. Only answer questions related to the project.
Talk in a casual, pragmatic tone. Avoid marketing or corporate speak at all costs.
'''

assistant = client.beta.assistants.create(
    name="HOME0001 Customer Assistant",
    instructions=instructions,
    model="gpt-4o",
    tools=[{"type": "file_search"}],  # enable retrieval over attached vector stores
)
```

#### Upload files and add them to the Vector Store

```python
# Create a vector store to hold the FAQ data
vector_store = client.beta.vector_stores.create(name="FAQ")

# Ready the files for upload to OpenAI
file_paths = ["data/home0001qa.json"]
file_streams = [open(path, "rb") for path in file_paths]

try:
    # Use the upload-and-poll SDK helper to upload the files, add them to the vector store,
    # and poll the status of the file batch until it finishes.
    file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=vector_store.id, files=file_streams
    )
finally:
    # Release the local file handles once the upload is done
    for stream in file_streams:
        stream.close()

# Print the batch status and file counts to check the result of this operation.
print(file_batch.status)
print(file_batch.file_counts)
```

```text
completed
FileCounts(cancelled=0, completed=1, failed=0, in_progress=0, total=1)
```


#### Update assistant to use the Vector Store

```python
# Attach the vector store so the assistant's file_search tool can query it
assistant = client.beta.assistants.update(
    assistant_id=assistant.id,
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```

The `file_search` tool implements several retrieval best practices out of the box to help you extract the right data from your files and augment the model’s responses. Specifically, it:

- Rewrites user queries to optimize them for search.
- Breaks down complex user queries into multiple searches it can run in parallel.
- Runs both keyword and semantic searches across the assistant's and the thread's vector stores.
- Reranks search results to pick the most relevant ones before generating the final response.
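OpenAI doesn't publish how `file_search` merges its keyword and semantic result lists before reranking. Purely to illustrate that merge step, here is reciprocal rank fusion (RRF), a widely used technique swapped in as a stand-in; the function names, the `k=60` constant, and the toy scores are all assumptions, not the tool's internals.

```python
def ranks(scores):
    """Map each document index to its 1-based rank (highest score = rank 1)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return {doc: rank for rank, doc in enumerate(order, start=1)}


def reciprocal_rank_fusion(*score_lists, k=60):
    """Fuse several score lists over the same documents; return indices, best first."""
    fused = [0.0] * len(score_lists[0])
    for scores in score_lists:
        for doc, rank in ranks(scores).items():
            fused[doc] += 1.0 / (k + rank)  # a good rank in any list raises the total
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


# Toy scores for four FAQ chunks (hypothetical): keyword hits and embedding similarity.
keyword_hits = [5, 0, 3, 1]
semantic_similarity = [0.1, 0.9, 0.8, 0.2]
fused_order = reciprocal_rank_fusion(keyword_hits, semantic_similarity)
print(fused_order)  # [2, 0, 1, 3]
```

Chunk 2 is not first in either list, yet it comes out on top because it ranks well in both, which is the point of combining keyword and semantic search.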

By default, the `file_search` tool uses the following settings, but these can be configured to suit your needs:

- Chunk size: 800 tokens
- Chunk overlap: 400 tokens
- Embedding model: text-embedding-3-large at 256 dimensions
- Maximum number of chunks added to context: 20 (could be fewer)
- Ranker: auto (OpenAI will choose which ranker to use)
- Score threshold: 0 (the minimum ranking score a result needs, so no results are filtered out by score)
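To make the chunk size and overlap defaults concrete, here is a minimal sliding-window chunker using the 800/400 values above. It assumes the text is already a list of tokens (the service tokenizes server-side; `chunk_tokens` is a hypothetical helper). The two dictionaries at the end sketch the request shapes that, as far as I know, the Assistants v2 API accepts for overriding these settings: a `static` `chunking_strategy` when adding files to a vector store, and `file_search` options on the assistant's tool entry. Check the field names against the API reference for your SDK version.

```python
def chunk_tokens(tokens, chunk_size=800, overlap=400):
    """Split `tokens` into windows of `chunk_size` that share `overlap` tokens with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not tokens:
        return []
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # this window already reaches the end
            break
        start += step
    return chunks


# A 2,000-token document with the defaults yields windows starting at 0, 400, 800 and 1,200.
chunks = chunk_tokens(list(range(2000)))
print(len(chunks), [c[0] for c in chunks])  # 4 [0, 400, 800, 1200]

# Assumed request-body shapes for overriding the defaults (Assistants v2 API).
chunking_strategy = {
    "type": "static",
    "static": {"max_chunk_size_tokens": 800, "chunk_overlap_tokens": 400},
}
file_search_tool = {
    "type": "file_search",
    "file_search": {
        "max_num_results": 20,
        "ranking_options": {"ranker": "auto", "score_threshold": 0.0},
    },
}
```

With the default 50% overlap, every token except those in the first and last 400 appears in two chunks, which costs storage but keeps answers that straddle a chunk boundary retrievable.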


#### Create a thread


```python
thread = client.beta.threads.create(
    messages=[
        {
            "role": "user",
            "content": "are u a communist?",
        }
    ]
)
```

#### Create a run and check the output

```python
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)
```

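```python
# create_and_poll returns once the run is no longer queued or in progress; only read the reply if it completed
if run.status == "completed":
    # Messages are listed newest first, so index 0 is the assistant's reply from this run
    messages = list(client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id))
    message_content = messages[0].content[0].text
    print(message_content.value)
else:
    print(f"Run ended with status: {run.status}")
```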

```text
I'm just here to help with questions about HOME0001, the housing collective, not to dive into political ideologies. If you have any questions about the project, feel free to ask!
```
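This reply has no citations, but answers drawn from the FAQ file typically do: `file_search` leaves markers such as `【4:0†source】` in `message_content.value` and describes them in `message_content.annotations`. Below is a minimal sketch for turning them into numbered footnotes, assuming the SDK's file-citation annotation objects expose `.text` and `.file_citation.file_id`. `replace_citations` is a hypothetical helper, and the `SimpleNamespace` stand-in (with made-up values) lets it run offline.

```python
from types import SimpleNamespace


def replace_citations(text_content):
    """Swap file_search citation markers for [n] footnotes; return (text, footnotes)."""
    value = text_content.value
    footnotes = []
    for index, annotation in enumerate(text_content.annotations):
        # annotation.text is the exact marker substring embedded in the reply
        value = value.replace(annotation.text, f"[{index}]")
        file_citation = getattr(annotation, "file_citation", None)
        if file_citation is not None:
            footnotes.append(f"[{index}] {file_citation.file_id}")
    return value, footnotes


# Offline stand-in shaped like a text block with one file citation (hypothetical values).
fake_content = SimpleNamespace(
    value="Example answer.【4:0†source】",
    annotations=[
        SimpleNamespace(text="【4:0†source】", file_citation=SimpleNamespace(file_id="file-123")),
    ],
)
text, footnotes = replace_citations(fake_content)
print(text)       # Example answer.[0]
print(footnotes)  # ['[0] file-123']
```

With a real reply you would call `replace_citations(message_content)`; `client.files.retrieve(file_id).filename` can then turn each file ID into a readable file name.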
