# **Support Ticket Search**
A support ticket search system takes in a customer support request, such as:

*"My internet connection has been slow for the past few days after traveling overseas"*

and searches for similar past tickets. This is useful both for internal use and for helping customers troubleshoot using the outcomes of previous support tickets.

# **Step 1: Get your support ticket history**
A CSV file with columns for ticket IDs and titles is sufficient, but our system can use any additional information that you give it. In this notebook, we use a dataset of Stack Overflow questions and their answers as our "support ticket history", because these question-answer pairs emulate the problem-solution nature of support tickets. It has the following fields:
- `"id"`: The ticket ID.
- `"title"`: The title of the Stack Overflow post or question.
- `"question_body"`: The body of the question, which describes the problem and its context.
- `"answer_body"`: The answer to the question.
- `"tags"`: Categories for the question, such as "git version" or "html button".

Note that the IDs must be integers between 0 and `N_TICKETS - 1`, where `N_TICKETS` is the number of tickets in the dataset.
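
The ID requirement above can be checked before training. The following is a minimal sketch using only the standard library; the helper name `check_ticket_ids` is hypothetical, and it additionally assumes that each ID appears exactly once, which this notebook implies (Step 4 looks tickets up by row position) but does not state outright.

```python
import csv
import io


def check_ticket_ids(csv_file, id_column="id"):
    """Return the number of tickets if the IDs are exactly 0..N-1, else raise ValueError."""
    reader = csv.DictReader(csv_file)
    # int() raises ValueError for any ID that is not an integer.
    ids = [int(row[id_column]) for row in reader]
    n_tickets = len(ids)
    # Assumption: every ID in 0..N-1 is used exactly once (order does not matter here).
    if sorted(ids) != list(range(n_tickets)):
        raise ValueError(
            f"IDs in column {id_column!r} must be the integers 0..{n_tickets - 1}, each used once"
        )
    return n_tickets


# Tiny in-memory sample standing in for the real CSV file.
sample = io.StringIO("id,title\n0,First ticket\n2,Third ticket\n1,Second ticket\n")
n_tickets = check_ticket_ids(sample)
print(n_tickets)
```

To run the check on a real file, pass an open file object, for example one opened with `open("stackoverflow-train.csv", newline="", encoding="utf-8")`.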

In [None]:
import os

# This dataset is originally from https://www.kaggle.com/datasets/stackoverflow/stackoverflow?select=posts_questions
# Preprocessed as follows:
# - Keep 100,000 most popular resolved questions
# - Extract relevant columns
# - Add ticket IDs
# We will download two files that contain the same samples but preprocessed slightly differently.

# stackoverflow-display.csv retains original HTML formatting
# (the URL is quoted so the shell does not interpret the "?" character)
os.system('curl -L "https://www.dropbox.com/s/fby91zecup9fq9r/stackoverflow-display.csv?dl=0" --output stackoverflow-display.csv')

# stackoverflow-train.csv replaces newlines in question_body and answer_body with spaces
os.system('curl -L "https://www.dropbox.com/s/amjs8d5xisiinv5/stackoverflow-train.csv?dl=0" --output stackoverflow-train.csv')

!pip3 install pandas
import pandas as pd

pd.options.display.max_colwidth = 150
df = pd.read_csv("stackoverflow-display.csv")

df.head(30)
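
The comments in the cell above say that the training file differs from the display file only in that newlines in `question_body` and `answer_body` are replaced. The exact preprocessing script is not part of this notebook, so the following is only a sketch of what that step might look like; the helper names `flatten_newlines` and `make_training_rows` and the sample row are hypothetical.

```python
def flatten_newlines(text):
    """Replace every line break in the text with a single space."""
    return text.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")


def make_training_rows(display_rows, text_columns=("question_body", "answer_body")):
    """Copy each row, flattening newlines in the given text columns only."""
    training_rows = []
    for row in display_rows:
        flat = dict(row)  # copy, so the display version keeps its formatting
        for column in text_columns:
            flat[column] = flatten_newlines(flat[column])
        training_rows.append(flat)
    return training_rows


# Made-up row for illustration; not taken from the real dataset.
display_rows = [
    {
        "id": "0",
        "title": "Example",
        "question_body": "<p>line one</p>\n<p>line two</p>",
        "answer_body": "fix:\r\ntry this",
        "tags": "example",
    },
]
training_rows = make_training_rows(display_rows)
print(training_rows[0]["question_body"])
```

Keeping each ticket on a single line like this is one plausible reason for having a separate training file, while the display file keeps the original HTML for presentation.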

# **Step 2: Import and Configure UDT**
First, install the ThirdAI package and activate the license.

In [None]:
!pip3 install thirdai --upgrade

import thirdai
thirdai.licensing.activate("AHUU-TM9U-EH4M-JWFW-VE7F-3VRH-7TMV-CHPX")

The next step is where we define our model. Let's walk through each parameter:
- `data_types`: the data fields and types that the model works with. We want a model that accepts a text query, which we arbitrarily name `"query"`, and returns a ticket ID, which is named `"id"` to match our dataset.
- `n_classes` (inside `bolt.types.categorical`): the number of unique ticket IDs, which is 100,000 for this dataset.
- `type="int"` (inside `bolt.types.categorical`): our ticket IDs are integers between 0 and `n_classes - 1`.
- `target`: the data field that we will predict. In this case, it's `"id"`.

In [None]:
from thirdai import bolt

model = bolt.UniversalDeepTransformer(
    data_types={
        "query": bolt.types.text(),
        "id": bolt.types.categorical(n_classes=100_000, type="int"),
    },
    target="id",
)

# **Step 3: Cold-start the model**
It can be difficult to collect a supervised dataset that maps customer queries to relevant past ticket IDs. Such a dataset would look something like the following:
```
query,id
npm install stuck on ubuntu machine,32887
...
```
Instead, we can cold-start the model on our existing support ticket history. To do so, we give the model the path to the dataset and tell it how to use the dataset's different columns.

Each support ticket has a title, a question body, an answer body, and tags. Since the title usually summarizes the problem well, we’ll call that a **strong column**. But the question body and the answer have all the details, so we’ll add them as **weak columns**, along with tags.

You should feel free to play around and see what configuration gives you the best results for your dataset!
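
One way to compare configurations is to score each trained model on a small set of hand-labeled queries. The sketch below is only an illustration: the names `top_k_accuracy`, `predict_scores`, and `labeled_queries` are hypothetical, and a stand-in scorer replaces the real model so that the example runs on its own.

```python
def top_k_accuracy(predict_scores, labeled_queries, k=3):
    """Fraction of queries whose expected ticket ID is among the k highest-scoring IDs."""
    hits = 0
    for query, expected_id in labeled_queries:
        scores = predict_scores(query)
        # Rank ticket IDs by score, highest first, and keep the first k.
        top_ids = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += expected_id in top_ids
    return hits / len(labeled_queries)


# Stand-in scorer so the sketch runs without a trained model.
def fake_predict_scores(query):
    return [0.1, 0.7, 0.2] if "install" in query else [0.6, 0.3, 0.1]


# Made-up (query, expected ticket ID) pairs.
labeled = [("npm install stuck", 1), ("git rebase conflict", 2)]
print(top_k_accuracy(fake_predict_scores, labeled, k=1))
```

With the model from this notebook, `predict_scores` could be `lambda q: model.predict({"query": q})`, which is the call used in Step 4.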

In [None]:
model.cold_start(
    filename="stackoverflow-train.csv",
    strong_column_names=["title"],
    weak_column_names=["question_body", "answer_body", "tags"],
    learning_rate=0.001,
    epochs=5,
    metrics=["categorical_accuracy"],
)

# Save the model for later use.
model.save("ticketsearch.bolt")

# **Step 4: Display the results**
When we call `model.predict()` with our query, we get a score for each ticket ID. Let’s write a helper function that displays the top-scoring tickets to customers in a readable way.
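
The ranking step can be illustrated in isolation. This is a minimal sketch assuming, as the notebook's own code does, that position `i` in the score array belongs to ticket ID `i`; the helper name `top_k_ids` and the sample scores are hypothetical. It uses `heapq.nlargest` from the standard library, which selects the top `k` entries without sorting every score.

```python
import heapq


def top_k_ids(scores, k=3):
    """Return the indices of the k largest scores, best first."""
    return heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])


# Made-up scores for five tickets.
scores = [0.05, 0.40, 0.10, 0.30, 0.15]
print(top_k_ids(scores, k=3))
```

The helper function in the next cell does the same selection with `argsort()`, which sorts all scores and then keeps the last `k`.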

In [None]:
df = pd.read_csv("stackoverflow-display.csv")

def search(query):
    print("Question:", query)
    print("Results:\n")

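    # The model returns one score per ticket ID.
    scores = model.predict({"query": query})
    k = 3
    # Take the indices of the k highest scores, best first.
    sorted_ticket_ids = scores.argsort()[-k:][::-1]

    for t_id in sorted_ticket_ids:
        # This lookup relies on row position t_id holding ticket t_id.
        result = df.iloc[t_id]["title"]
        print("\t> " + result)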

search("c++ move semantics")

A major advantage of this model is that it handles long, detailed queries well, such as queries that include the machine type, specific error messages, or full descriptive sentences. Try these!


In [None]:
search("How do I copy a vector in rust without getting borrow as mutable error")
search("I ran pip install Jupiter but terminal cannot find Jupiter and ipython command is not found")
search("Postgres db 9.1 running on AWS EC2 with ubuntu 12.04 running clusters stuck")

You can also wrap this in a UI, for example using `gradio`, to turn this into an app.

In [None]:
!pip3 install --upgrade pydantic
!pip3 install gradio
import gradio as gr

max_posts = 10

# If you want to move this app script to a separate file, uncomment the 
# following lines to load the saved model and to load the dataset for display.
# model = bolt.UniversalDeepTransformer.load("ticketsearch.bolt")
# df = pd.read_csv("stackoverflow-display.csv")

def search(query):
    scores = model.predict({"query": query})
    sorted_post_ids = scores.argsort()[-max_posts:][::-1]
    relevant_posts = [
        df.iloc[pid] for pid in sorted_post_ids
    ]

    # gr.update() is used instead of the per-component .update() methods,
    # which were removed in Gradio 4.
    title = [gr.update(visible=True)]
    boxes = [
        gr.update(visible=True, open=False, label=post['title'])
        for post in relevant_posts
    ]
    bodies = [
        gr.update(
            visible=True,
            value=f"<h1>{post['title']}</h1>\n\n"
            f"<h2>Question:</h2>\n{post['question_body']}\n\n"
            f"<h2>Answer:</h2>\n{post['answer_body']}\n\n"
            f"<h2>Tags:</h2>\n{post['tags']}\n\n")
        for post in relevant_posts
    ]

    return title + boxes + bodies


with gr.Blocks() as demo:
    query = gr.Textbox(label="Question")
    title = [gr.Markdown("# Relevant Posts", visible=False)]
    post_boxes = []
    post_bodies = []
    for i in range(max_posts):
        with gr.Accordion("", visible=False) as box:
            post_boxes.append(box)
            body = gr.HTML("", visible=False)
            post_bodies.append(body)
    allblocks = title + post_boxes + post_bodies

    # Replace `change()` with `submit()` to make the results update only after 
    # pressing 'enter' instead of at every keystroke.
    query.change(search, query, allblocks)


demo.launch()
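
The `search` function above places every field directly into an HTML string. That suits `question_body` and `answer_body`, since the display file retains their original HTML, but a title such as one mentioning `vector<bool>` could be misread as markup. The sketch below shows one possible way to guard against that, assuming (this notebook does not confirm it) that titles and tags are stored as plain text; the helper name `render_post_html` and the sample post are hypothetical.

```python
import html


def render_post_html(post):
    """Build the HTML body for one post, escaping the fields assumed to be plain text."""
    return (
        f"<h1>{html.escape(str(post['title']))}</h1>\n\n"
        # The bodies are inserted as-is because they already contain HTML.
        f"<h2>Question:</h2>\n{post['question_body']}\n\n"
        f"<h2>Answer:</h2>\n{post['answer_body']}\n\n"
        f"<h2>Tags:</h2>\n{html.escape(str(post['tags']))}\n\n"
    )


# Made-up post for illustration; not taken from the real dataset.
post = {
    "title": "Why does std::vector<bool> behave differently?",
    "question_body": "<p>Question text</p>",
    "answer_body": "<p>Answer text</p>",
    "tags": "c++ vector",
}
print(render_post_html(post))
```

If the titles in your own ticket history are already HTML-escaped, escaping them a second time would show the entities literally, so check a few rows before adopting this.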