# Combining all pieces

Ideally, a UI built for deployment would follow a process similar to the one in this notebook.

In [1]:

from openai import OpenAI
from sqlalchemy import create_engine
from transformers import pipeline, TapasTokenizer, TapasForQuestionAnswering
import pandas as pd



Load our newly trained model for multilabel classification

In [2]:
prompt = "Who is the coordinator of the Data Engineering career?"

In [3]:
# "sentiment-analysis" is an alias of the "text-classification" task
pipelabel = pipeline("text-classification", model = "mlabelclassmodel")

Get all labels with a probability higher than 0.5

In [4]:
labelresults = [labelscore["label"] for labelscore in pipelabel(prompt, return_all_scores = True)[0] if labelscore["score"] > 0.5]
labelresults



['careercoord']
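
To make the thresholding step easier to follow, here is a minimal sketch that applies the same filter to a hand-written score list. The scores and the helper name `labels_above` are made up for illustration; only the label names come from this notebook.

```python
# Hypothetical scores in the shape used above: one inner list of
# {"label", "score"} dicts per input text. The numbers are invented.
example_output = [[
    {"label": "careercoord", "score": 0.97},
    {"label": "wherestudent", "score": 0.03},
    {"label": "grouptutor", "score": 0.51},
]]


def labels_above(scores, threshold=0.5):
    """Return the labels whose score is strictly greater than threshold."""
    return [item["label"] for item in scores if item["score"] > threshold]


# [0] selects the scores of the first (and only) input text
selected = labels_above(example_output[0])
print(selected)
```

Because this is multilabel classification, more than one label can pass the threshold for a single prompt, and it is also possible that none does.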

Since the available information is small and the number of labels is limited, the labels can be mapped to tables with simple code.
I would like to have a well-defined database to improve this part. Personally, I think a simple query covers most cases,
so there is no need to attach another model just to do the heavy lifting. There are models that can choose the table for you,
but they seem to be intended for databases with a huge number of tables. For a school and the information that will be available,
I think this is not the case.

In [5]:
def choosetable(labels: list):
    table = None
    if "careercoord" in labels:
        table = "coordinators"
    elif ("wherestudent" in labels) or ("studenttutor" in labels) or ("groupstudents" in labels):
        table = "students"
    elif ("whereprof" in labels) or ("classesprofessor" in labels) or ("subjectprofessors" in labels):
        table = "classes"
    elif ("careergroups" in labels) or ("groupclassroom" in labels) or ("grouptutor" in labels):
        table = "groups"
    return table

Get the table that matters to our prompt

In [6]:
requiredtable = choosetable(labelresults)
requiredtable

'coordinators'
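
As the number of labels grows, the `if`/`elif` chain in `choosetable` could be replaced by a data-driven lookup. The following is one possible sketch, not the notebook's method; the names `TABLE_LABELS` and `choosetable_lookup` are hypothetical. It keeps the same priority order as the original function.

```python
# Priority-ordered mapping that mirrors the if/elif order of choosetable.
TABLE_LABELS = (
    ("coordinators", {"careercoord"}),
    ("students", {"wherestudent", "studenttutor", "groupstudents"}),
    ("classes", {"whereprof", "classesprofessor", "subjectprofessors"}),
    ("groups", {"careergroups", "groupclassroom", "grouptutor"}),
)


def choosetable_lookup(labels):
    """Return the first table whose label set overlaps labels, else None."""
    found = set(labels)
    for table, table_labels in TABLE_LABELS:
        if found & table_labels:
            return table
    return None


print(choosetable_lookup(["careercoord"]))
```

Adding a new label then only requires editing the mapping, while the function itself stays unchanged.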

Connect to the database, retrieve the table, and load it into a pandas DataFrame

In [7]:
engine = create_engine('postgresql://localhost/school_info?user=postgres&password=somepassword')

In [8]:
connection = engine.connect()

In [9]:
dfresult = pd.read_sql("select * from \"" + requiredtable + "\"", connection)
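
The query above concatenates the table name into the SQL string, and `choosetable` returns `None` when no label matches, which would make the concatenation fail. A small guard could check the name before the query is built. This is a sketch under the assumption that the four tables used in this notebook are the only valid ones; `ALLOWED_TABLES` and `build_select_all` are hypothetical names.

```python
# Assumption: these are the only tables the chatbot is allowed to read.
ALLOWED_TABLES = {"coordinators", "students", "classes", "groups"}


def build_select_all(table):
    """Return a SELECT * statement for an allow-listed table name."""
    if table is None:
        raise ValueError("no table matched the predicted labels")
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table!r}")
    # Safe to interpolate: the name was checked against the allow-list
    return f'select * from "{table}"'


print(build_select_all("coordinators"))
```

The returned string can be passed to `pd.read_sql` in place of the concatenated query.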

In [10]:
dfresult

Unnamed: 0,career,coordinator
0,Data Engineering,Pascual Icíar
1,Robotics,Jerónimo Micaela
2,Embedded Systems,Alma Irma
3,Cybersecurity,José Ángel Anastacia


Don't forget to close your connection

In [11]:
connection.close()
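
A `with` block closes the connection automatically, even when a query raises. The sketch below shows the pattern with the standard-library `sqlite3` module and an in-memory database as a stand-in for the PostgreSQL instance, so it runs without a server. The connection returned by SQLAlchemy's `engine.connect()` can be used in a `with` statement in the same way.

```python
import sqlite3
from contextlib import closing

# In-memory SQLite database as a stand-in for the PostgreSQL instance
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("create table coordinators (career text, coordinator text)")
    conn.execute(
        "insert into coordinators values (?, ?)",
        ("Data Engineering", "Pascual Icíar"),
    )
    rows = conn.execute('select * from "coordinators"').fetchall()

# The connection is closed here, even if a statement above had raised
print(rows)
```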

Use our TAPAS model to extract the answer from the table

In [12]:
tokenizer = TapasTokenizer.from_pretrained("google/tapas-base-finetuned-wtq")
model = TapasForQuestionAnswering.from_pretrained("google/tapas-base-finetuned-wtq")


In [13]:
pipeextract = pipeline("table-question-answering", model = model, tokenizer = tokenizer)

In [19]:
answer = pipeextract(table = dfresult, query = prompt)["answer"]
answer



'Pascual Icíar'

Reply to the user, generating text with the OpenAI API, using the prompt and the answer

In [20]:
client = OpenAI()

In [21]:
completion = client.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a chatbot for a university that helps students get information to guide themselves"},
        {"role": "user", "content": "Answer the prompt " + prompt + " knowing that the answer is " + answer}
    ]
)

In [22]:
completion.choices[0].message

ChatCompletionMessage(content='The coordinator of the Data Engineering career is Pascual Icíar.', role='assistant', function_call=None, tool_calls=None)
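
For a UI, the message construction could be factored into a small helper so it can be tested without calling the API. This is a minimal sketch; `build_messages` and `SYSTEM_MESSAGE` are hypothetical names, and quoting the prompt and the answer inside the user message is a design choice of this sketch, not something the notebook does.

```python
SYSTEM_MESSAGE = (
    "You are a chatbot for a university that helps students get "
    "information to guide themselves"
)


def build_messages(prompt, answer):
    """Build the chat messages list from the user prompt and the table answer."""
    user_content = (
        f'Answer the prompt "{prompt}" knowing that the answer is "{answer}"'
    )
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": user_content},
    ]


messages = build_messages(
    "Who is the coordinator of the Data Engineering career?", "Pascual Icíar"
)
print(messages[1]["content"])
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`, and the reply text alone is available as `completion.choices[0].message.content`.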