This notebook goes over the process of creating an assistant. You should make sure to run the "Resource Creator" notebook before you run this one. 

# Contents

**1 - Setup**

- 1.1 Imports
- 1.2 OpenAI
- 1.3 Directories

**2 - Assistant Creator**

- 2.1 Loading Vector Stores
- 2.2 Assistant Creation
- 2.3 Storing Assistant ID

# 1 - Setup

This section covers setting up the modules, OpenAI functionality and directories we'll need.

## 1.1 Imports

These are the modules we'll need to import.

`from openai import OpenAI`: the OpenAI module lets us set up a client that communicates with OpenAI's services. These services are not limited to chatbots: besides that purpose, we can also use them to create vector stores (more on this later) and to upload and modify files.

`import os`: this module lets us access and modify files and folders.

`import json`: this module lets us read and create .json files, which we use to store the data we process for later use.

In [None]:
from openai import OpenAI
import os
import json

## 1.2 OpenAI

We define the values OpenAI needs here so we can easily access them later.

`api_key =`: this is essentially a password provided by OpenAI. It authorises every request we make to OpenAI's services, so it should be kept private.

`client = OpenAI(api_key=api_key)`: this sets up a client that communicates with OpenAI's services. We create it once here so that we can reuse `client` instead of writing out `OpenAI(api_key=api_key)` every time we want to communicate with OpenAI.

In [None]:
api_key = ""  # paste your OpenAI API key here and keep it private

client = OpenAI(api_key=api_key)
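Pasting the key directly into the notebook makes it easy to leak when the notebook is shared. Below is a minimal sketch of one alternative. It assumes the key is stored in an environment variable named `OPENAI_API_KEY`, which is also the variable the OpenAI Python client reads by default when no `api_key` is passed. The helper `get_api_key` is hypothetical and not part of the original notebook.

```python
import os


def get_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # fail early with a clear message instead of an authentication error later on
        raise RuntimeError(f"No API key found; set the {env_var} environment variable first.")
    return key
```

With the variable set, the client can then be created with `client = OpenAI(api_key=get_api_key())`, and the key never appears in the notebook itself.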

## 1.3 Directories

We set up the directories for any files we will use later.

`store_name =`: a general-purpose name that we use when naming the files we create, so that we retrieve the right documents later on.

`data_directory =` this is the file directory where we'll store and retrieve any other kinds of data.

`document_directory =` this is the file directory where we'll store and retrieve our documents from.

`assistant_directory =` this is the file directory where we'll store the assistant IDs.

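Make sure these match the values used in the Conversation Script and Resource Creator notebooks. The cell below assigns the directories automatically by looking for subfolders of the current working directory whose names contain "Data Base", "Documents", "Output Images" or "Assistants".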

In [None]:
# automatically assigns the directories by matching subfolder names in the current working directory
store_name = "Labs Dutchess"

this_directory = os.getcwd()

# keep only the subfolders of the current working directory
directories = [os.path.join(this_directory, entry) for entry in os.listdir(this_directory)
               if os.path.isdir(os.path.join(this_directory, entry))]

# default to None so a missing folder shows up in the printout instead of raising a NameError
data_directory = document_directory = image_directory = assistant_directory = None

for directory in directories:
    folder_name = os.path.basename(directory)
    if "Data Base" in folder_name:
        data_directory = directory
    elif "Documents" in folder_name:
        document_directory = directory
    elif "Output Images" in folder_name:
        image_directory = directory
    elif "Assistants" in folder_name:
        assistant_directory = directory

print(f"assistant_directory = {assistant_directory}")
print(f"data_directory = {data_directory}")
print(f"document_directory = {document_directory}")

# 2 - Assistant Creator

This section covers creating the assistant and making sure it has the knowledge base we want it to have.

## 2.1 Loading Vector Stores

We need to retrieve the vector store ID that we stored in the Resource Creator notebook, because our assistant needs it to know which vector store to use as its knowledge base. We first check that a vector store exists before trying to load its ID.

In [None]:
vector_store_found = False
vector_store_id = None

vector_name = f"vector_store_id_{store_name}.json"
vector_path = os.path.join(data_directory, vector_name) # gets the path for our vector store id

# only load the id if there are documents and the Resource Creator has saved a vector store id
if os.listdir(document_directory) and os.path.exists(vector_path):
    with open(vector_path, "r") as file:
        vector_store_id = json.load(file) # retrieves our vector store id
    vector_store_found = True
else:
    print("no vector store found")

print(f"Vector Store Found: {vector_store_found}")

## 2.2 Assistant Creation

`Description` and `instructions`: Descriptions and instructions are very similar but differ in scope. The description is limited to 512 characters and gives a general overview of what the assistant does; it should be a high-level statement outlining the assistant's capabilities, functions and intended use cases. The instructions are limited to 256,000 characters and provide detailed, specific guidelines for how the assistant should perform its task; they should contain specific rules, behaviours or constraints. Neither field is set in the creation cell below, but both can be added as extra fields.

`Model`: Here I've used the gpt-4o model. If you want your assistant to exhibit more specific behaviours, try the following approaches in this order of preference: 1 - Modification of Prompts, 2 - Modification of Description and Instructions, 3 - Fine-tuning. Fine-tuning can be a very long-winded process to get the behaviour you want, so the other two methods, and above all the modification of prompts, should take priority. If you decide that fine-tuning is the only way to go, I'd recommend looking at the fine-tuning notebook and tutorial, as most of this takes place outside of coding. If you have a fine-tuned model, you can swap "gpt-4o" for the output model name given in the fine-tuning menu.

`Tools`: These are the tools your chatbot has access to. The two I have given it here are file_search and code_interpreter: file_search lets the chatbot search through any vector stores we give it, while code_interpreter lets it write and run its own code. It's generally a good idea to give a chatbot both, but you can remove either from its list of tools, at the cost of the corresponding functionality.

`Tool Resources`: This is used mainly to give the file_search tool access to the vector stores you want your chatbot to use.

`Top_p`: Top_p is one of two parameters that control the balance between creativity and determinism in the chatbot's responses. The other is temperature, and it is generally recommended to adjust only one of them at a time. Higher values lead to more creative/random output, whereas lower values lead to more deterministic, focused behaviour, which is why I have set top_p relatively low.
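To make the effect of `top_p` more concrete, here is a toy sketch of nucleus (top-p) sampling as it is commonly described. Keep the most likely tokens until their cumulative probability reaches `top_p`, renormalise, and sample only from that set. This illustrates the general idea with made-up probabilities and is not OpenAI's internal implementation. The helpers `nucleus` and `sample` are hypothetical.

```python
import random


def nucleus(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Return the smallest set of most-likely tokens whose cumulative probability reaches top_p, renormalised."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        total += p
        if total >= top_p:
            break  # enough probability mass has been collected
    return {token: p / total for token, p in kept}


def sample(probs: dict[str, float], top_p: float, rng: random.Random) -> str:
    """Draw one token, choosing only among the tokens kept by nucleus()."""
    candidates = nucleus(probs, top_p)
    return rng.choices(list(candidates), weights=list(candidates.values()), k=1)[0]


# made-up next-token probabilities, for illustration only
next_token = {"Paris": 0.6, "Lyon": 0.25, "Nice": 0.1, "Lille": 0.05}
print(nucleus(next_token, 0.05))  # only the single most likely token survives
print(nucleus(next_token, 0.9))   # several tokens remain, so the output can vary
```

With a value as low as 0.05, the top token usually carries enough probability on its own to fill the nucleus. Sampling then typically reduces to picking the most likely token, which is where the more deterministic behaviour comes from.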

**IMPORTANT**: Make sure to change the `name` of the assistant, as this is how it will be referenced in Dutchess. Also, if you do not want the assistant to have access to the vector store, set `access_to_vector_store` to `False`.

In [None]:
name = "Labs General"
model = "gpt-4o"
top_p = 0.05

access_to_vector_store = True

You can then run the following cell to create the assistant. To add any fields that are not included below, follow the same `field_name=value,` format, separating each new field from the previous one with a comma.

In [None]:
if vector_store_found and access_to_vector_store: # only attach the vector store if one was loaded and access is wanted
    assistant = client.beta.assistants.create( # creates the assistant
        name=name, # gives the assistant a name
        model=model, # specifies the model to be used
        tools=[{"type": "file_search"}, {"type": "code_interpreter"}], # gives it access to several tools
        tool_resources={"file_search": {"vector_store_ids": [vector_store_id]}}, # gives it access to the vector store we created 
        top_p=top_p, # specifies the top_p
        )
else:
    assistant = client.beta.assistants.create( # creates the assistant
        name=name, # gives the assistant a name
        model=model, # specifies the model to be used
        tools=[{"type": "code_interpreter"}], # gives it access to code interpreter tool
        top_p=top_p, # specifies the top_p
        )
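The cell above sets only the name, model, tools, tool resources and `top_p`. Below is a minimal sketch of how the `description` and `instructions` fields described earlier could be added, with a check against the 512 and 256,000 character limits before anything is sent to OpenAI. The helper `build_assistant_kwargs` and the example texts are hypothetical. The keys mirror the keyword arguments already passed to `client.beta.assistants.create`, plus `description` and `instructions`.

```python
DESCRIPTION_MAX_CHARS = 512        # limit on the assistant description
INSTRUCTIONS_MAX_CHARS = 256_000   # limit on the assistant instructions


def build_assistant_kwargs(name, model, top_p, description, instructions, vector_store_id=None):
    """Collect keyword arguments for client.beta.assistants.create (hypothetical helper)."""
    if len(description) > DESCRIPTION_MAX_CHARS:
        raise ValueError(f"description is {len(description)} characters; the limit is {DESCRIPTION_MAX_CHARS}")
    if len(instructions) > INSTRUCTIONS_MAX_CHARS:
        raise ValueError(f"instructions are {len(instructions)} characters; the limit is {INSTRUCTIONS_MAX_CHARS}")

    kwargs = {
        "name": name,
        "model": model,
        "description": description,
        "instructions": instructions,
        "tools": [{"type": "code_interpreter"}],
        "top_p": top_p,
    }
    if vector_store_id is not None:
        # only add file_search when there is a vector store for it to search
        kwargs["tools"].append({"type": "file_search"})
        kwargs["tool_resources"] = {"file_search": {"vector_store_ids": [vector_store_id]}}
    return kwargs


# hypothetical example texts; no vector store is attached here
example = build_assistant_kwargs(
    name="Labs General",
    model="gpt-4o",
    top_p=0.05,
    description="Answers questions about the lab using the uploaded documents.",
    instructions="Answer using the attached files where possible and say when you are unsure.",
)
print(sorted(example))
```

The assistant could then be created with `assistant = client.beta.assistants.create(**kwargs)`, passing `vector_store_id=vector_store_id` to the helper when `vector_store_found` is `True`.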

## 2.3 Storing Assistant ID
We store the assistant ID in a .json file so it can be used later. The file is named after the assistant.

In [None]:
assistant_name = f"{name}_assistant_id.json" # creates the file name for the assistant id
assistant_path = os.path.join(assistant_directory, assistant_name) # creates the file path for the assistant id

with open(assistant_path, "w") as file: # saves the assistant id as a .json file
    json.dump(assistant.id, file)
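To reuse the assistant later, for example from another notebook, the ID can be read back with `json.load`. Below is a minimal sketch that assumes the same `{name}_assistant_id.json` naming as the cell above. The helpers `save_assistant_id` and `load_assistant_id` are hypothetical. The demo uses a temporary folder and a made-up ID so it runs without calling OpenAI.

```python
import json
import os
import tempfile


def save_assistant_id(assistant_directory, name, assistant_id):
    """Write the assistant id to <name>_assistant_id.json, as in the cell above."""
    path = os.path.join(assistant_directory, f"{name}_assistant_id.json")
    with open(path, "w") as file:
        json.dump(assistant_id, file)
    return path


def load_assistant_id(assistant_directory, name):
    """Read the assistant id back, or return None if no file was saved under this name."""
    path = os.path.join(assistant_directory, f"{name}_assistant_id.json")
    if not os.path.exists(path):
        return None
    with open(path, "r") as file:
        return json.load(file)


# demo in a throwaway folder with a made-up id
with tempfile.TemporaryDirectory() as folder:
    save_assistant_id(folder, "Labs General", "asst_example123")
    print(load_assistant_id(folder, "Labs General"))  # asst_example123
    print(load_assistant_id(folder, "Missing"))       # None
```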