<a href="https://colab.research.google.com/github/Revmaker/BotFramework-WebChat/blob/master/Magic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# TASKS/NOTES
- Better dev and runtime environment
- Multiply-connected nodes
- Hyundai version
- Define synonyms in new sheet
- Match all words in user input with dictionary and process in batch
- Add text search as fall-back
- Dictionary vs. array
- Nodes: what CAN I talk about given which priors?
- Design convos for KPIs, not long tail
- Recipes
- Recommendation engine / build & price
- How to model tasks / when bot leads (BMW)
- Two cars with the same engine can share the engine in the graph
- Tags to classify nodes (model, attribute, performance...)
- Connect Croatian data
- %debug magic command
- Node functions (explain, describe, compare...)
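
The item "Match all words in user input with dictionary and process in batch" could be prototyped roughly as follows. This is only a sketch under assumptions: `NAME_INDEX`, `SYNONYMS` and `match_tokens` are hypothetical names, the ids come from the sample sheet in this notebook, and the synonym entries are made up for illustration.

```python
# Hypothetical sketch: all names and the synonym entries are assumptions.
# Maps a lower-cased node name to the ids of every node carrying that name.
NAME_INDEX = {
    "kia": [1],
    "soul": [2],
    "niro": [3],
    "horsepower": [6, 8],
    "price": [7],
}

# Made-up synonyms; the notes suggest keeping these in a separate sheet.
SYNONYMS = {"hp": "horsepower", "cost": "price"}

def match_tokens(user_input):
    """Return {canonical_name: [node_ids]} for every known word in the input."""
    found = {}
    for token in user_input.lower().split():
        token = token.strip(".,!?")          # drop trailing punctuation
        name = SYNONYMS.get(token, token)    # map a synonym to its node name
        if name in NAME_INDEX:
            found[name] = NAME_INDEX[name]
    return found

print(match_tokens("What is the hp of the Niro?"))
```

A dictionary lookup per word keeps the batch step cheap; the ambiguous entries (two `horsepower` nodes) would still need the disambiguation step.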

In [0]:
#!pip install --upgrade --quiet gspread
#!pip install -U textblob
#!python -m textblob.download_corpora

In [0]:
#  STEP 1: GET AUTHORIZATION TO CONNECT TO YOUR GOOGLE ACCOUNT 

from google.colab import auth
auth.authenticate_user()

import gspread
# oauth2client is deprecated; google.auth supplies the default credentials
from google.auth import default

creds, _ = default()
gc = gspread.authorize(creds)

In [140]:
#  STEP 2: CONNECT TO THE SPREADSHEET & READ WORKSHEET INTO A LIST CALLED 'gRows'
# ------------------------------------------------------------------------------
filename = 'Magic Data'
spreadsheet = gc.open(filename)
# super-freaky: yesterday (9/4/19) I was able to get a worksheet using 
# gc.open(filename).SheetName. Today I can only ref by index. WTF?
worksheet = spreadsheet.get_worksheet(0)    
gRows = worksheet.get_all_values()
# we don't use the DataFrame representation except to display the fetched data
# in a nice format
import pandas as pd
pd.DataFrame.from_records(gRows)

Unnamed: 0,0,1,2,3,4
0,id,name,parent_name,parent_id,message
1,0,root,,-1,Root node
2,1,Kia,root,0,"Yes, I'm all about Kia. Ask me about models, f..."
3,2,Soul,kia,1,"The Soul is fun, affordable and the most popul..."
4,3,Niro,kia,1,"The Niro is versatile, fuel-efficient and avai..."
5,4,FE,niro,3,FE stand for Fuel-Efficient. 52mpg in the city.
6,5,Touring,niro,3,"The Touring comes with a sunroof, roofracks an..."
7,6,horsepower,FE,4,The FE's engine makes139 horsepower
8,7,price,FE,4,"Starts at $23,000"
9,8,horsepower,Touring,5,The Touring's hybrid engine/electric motor com...


In [0]:
# UTILITY FUNCTIONS TO ACCESS, FIND AND TRAVERSE NODES IN THE GRAPH MORE EASILY
# ------------------------------------------------------------------------------

# it's a pain to get to a gnode's id. We use the keys method which returns a set
# of keys of type dict_keys, then cast it to a list.
# The first (and only) element is the id, and Bob's your uncle
#
def get_node_from_gNode(gNode):
  nodeId = list(gNode.keys())[0]
  node = gNode[nodeId]
  return node

# give it a node_id and get back the matching node (without gNode 'wrapper')
#
def get_node_from_id(id):
  if id >= 0 and id < len(gNodes):
    return get_node_from_gNode(gNodes[id])
  else:
    error_print("get_node_from_id ERROR 3 id:",id)

# give it a node_id and get back the parent node
#
def get_parent_node_from_id(id):
  if id >= 0 and id < len(gNodes):
    this_node = get_node_from_id(id)
    parent_node = get_node_from_id(this_node["parent_id"])
    return parent_node
  else:
    error_print("ERROR 4")

# search for all nodes with name='name' at or below node 'start_id'
# (depth-first, includes the start node itself)
# modifies 'matches' by reference
#
def find_downstream_nodes_by_name(matches,name,start_id):
  pprint("looking downstream for ",name,"starting at",start_id )
  node = get_node_from_id(start_id)
  if node["name"].lower() == name.lower():
    matches.append(node)
  for child in node["children"]:
    find_downstream_nodes_by_name(matches,name,child)
  pprint("Found matches",matches)

# search for all nodes with name='name' upstream from node 'start_id', i.e. the
# node itself and its chain of ancestors up to the root
# modifies 'matches' by reference
#
def find_upstream_nodes_by_name(matches,name,start_id):
  pprint("looking upstream for ",name,"starting at",start_id )
  # the root's parent_id is -1, so a negative id means we walked past the root
  while start_id >= 0:
    node = get_node_from_id(start_id)
    if not(node):
      error_print("get_node_from_id ERROR 5 id:",start_id)
      break
    if node["name"].lower() == name.lower():
      matches.append(node)
    start_id = node["parent_id"]
  pprint(matches)

# search for all nodes at the same level as node start_id
# modifies 'matches' by reference
#
def find_sibling_nodes_by_name(matches,name,start_id):
  pprint("looking for siblings called ",name,"starting at",start_id )
  parent_node = get_parent_node_from_id(start_id)
  if not(parent_node):
    error_print("ERROR 6")
    return
  pprint("parent_node['children']",parent_node["children"])
  for node_id in parent_node["children"]:
    if node_id != start_id:
      child_node = get_node_from_id(node_id)
      pprint("child_name, name", child_node["name"],name)
      if child_node["name"].lower() == name.lower():
        matches.append(child_node)
  pprint(matches)

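# search for all nodes with name='name' anywhere in the graph, which also
# covers nodes that are 'up and over', i.e. in a different branch than the
# current node. Searches downstream from the root (id 0); 'start_id' is unused
# and only kept so all search functions share the same signature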
#
def find_all_nodes_by_name(matches,name,start_id):
  pprint("looking for nodes called ",name,"anywhere")
  find_downstream_nodes_by_name(matches,name,0)
  pprint(matches)

# crude way to turn debug info on and off
debug = False
def pprint(*s): 
  # unpack the arguments so they print as plain text rather than as a tuple
  if debug: print(*s)

error_debug = True
def error_print(*s): 
  if error_debug: print(*s)
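
To try the traversal helpers without connecting to the spreadsheet, here is a self-contained sketch on a hand-built graph. It assumes a flat dictionary keyed by id (`NODES`, a hypothetical stand-in for `gNodes`) and uses simplified functions `find_downstream` and `find_upstream` that return ids, so it illustrates the idea rather than reproducing the functions above.

```python
# Hypothetical stand-in for gNodes: a flat dict keyed by node id.
# Names, ids and parents follow the sample sheet; messages are omitted.
NODES = {
    0: {"id": 0, "name": "root", "parent_id": -1, "children": [1]},
    1: {"id": 1, "name": "Kia", "parent_id": 0, "children": [2, 3]},
    2: {"id": 2, "name": "Soul", "parent_id": 1, "children": []},
    3: {"id": 3, "name": "Niro", "parent_id": 1, "children": [4, 5]},
    4: {"id": 4, "name": "FE", "parent_id": 3, "children": [6, 7]},
    5: {"id": 5, "name": "Touring", "parent_id": 3, "children": [8]},
    6: {"id": 6, "name": "horsepower", "parent_id": 4, "children": []},
    7: {"id": 7, "name": "price", "parent_id": 4, "children": []},
    8: {"id": 8, "name": "horsepower", "parent_id": 5, "children": []},
}

def find_downstream(name, start_id):
    """Depth-first search: ids of nodes called `name` at or below start_id."""
    node = NODES[start_id]
    hits = [start_id] if node["name"].lower() == name.lower() else []
    for child_id in node["children"]:
        hits.extend(find_downstream(name, child_id))
    return hits

def find_upstream(name, start_id):
    """Walk parent links up to the root: ids of nodes called `name` on the way."""
    hits = []
    while start_id >= 0:              # the root's parent_id is -1
        node = NODES[start_id]
        if node["name"].lower() == name.lower():
            hits.append(start_id)
        start_id = node["parent_id"]
    return hits

print(find_downstream("horsepower", 3))  # both trims under Niro
print(find_downstream("horsepower", 4))  # only the FE's
print(find_upstream("kia", 6))
```

Starting at the Niro, `horsepower` is ambiguous; starting at the FE, it is not. That is why the current context matters for the search order.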



In [0]:
# STEP 3: READ THE LIST OF ROWS AND TURN EACH INTO A gNode, A GRAPH NODE
# ------------------------------------------------------------------------------
# 
# gNodes have a somewhat odd shape. Each one is a single-entry dictionary whose
# key is the node ID, to make it really easy to find a node by its id.
# The value of that entry is the node itself, including its id. The price we pay
# for the extra complexity is that it's a pain to get the id from a gNode.
# That's the purpose of the get_node_from_gNode function. Use it liberally!
#
labels = gRows[0]
gNodes=[]
# slice off the header row so it doesn't become a bum node
for row in gRows[1:]:
  node = {}
  for x in range(len(labels)):
    if (labels[x][-2:] == 'id'):
      # hack: if the label ends in 'id', convert the cell to integer
      node[labels[x]] = int(row[x])
    else:
      node[labels[x]] = row[x]
  node["children"] = []
  gNode = {node["id"]:node}
  gNodes.append(gNode)

# add list of children to each node to save us processing time
for gNode in gNodes:
  # get a node and see who its parent is
  childNode = get_node_from_gNode(gNode)
  childId = childNode["id"]
  parentId = childNode["parent_id"]
  if parentId < 0:
    continue
  # now add this node's ID to the parentId node's list of children
  parentNode = get_node_from_id(parentId)
  parentNode["children"].append(childId)

# print out list of nodes
for gNode in gNodes:
  pprint(gNode)
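
The comments in this cell describe each gNode as a one-entry dictionary of the form `{id: node}`. The sketch below shows that shape next to a flat dictionary keyed by id, which relates to the "Dictionary vs. array" note at the top. It is an illustration under assumptions: `rows`, `row_to_node`, `nodes_list` and `nodes_by_id` are hypothetical names, and the three sample rows are copied from the sheet.

```python
# Sample rows in the same layout that worksheet.get_all_values() returns
rows = [
    ["id", "name", "parent_name", "parent_id", "message"],
    ["0", "root", "", "-1", "Root node"],
    ["1", "Kia", "root", "0",
     "Yes, I'm all about Kia. Ask me about models, features, pricing, inventory..."],
    ["2", "Soul", "kia", "1",
     "The Soul is fun, affordable and the most popular hatchback in the country"],
]

labels, *data = rows   # split the header row from the data rows

def row_to_node(row):
    """Build a node dict; cells whose label ends in 'id' become integers."""
    node = {}
    for label, cell in zip(labels, row):
        node[label] = int(cell) if label.endswith("id") else cell
    node["children"] = []
    return node

# Shape used in this notebook: a list of one-entry dicts {id: node}
nodes_list = [{n["id"]: n} for n in map(row_to_node, data)]

# Alternative: one flat dict keyed by id, with children filled in
nodes_by_id = {n["id"]: n for n in map(row_to_node, data)}
for n in nodes_by_id.values():
    if n["parent_id"] >= 0:
        nodes_by_id[n["parent_id"]]["children"].append(n["id"])

print(nodes_list[1][1]["name"])     # list position first, then id key
print(nodes_by_id[1]["children"])   # a single lookup by id
```

With the flat dictionary, lookups no longer depend on ids matching list positions, which the list-of-gNodes layout quietly assumes.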


In [0]:
# FUNCTION LIBRARY
# ------------------------------------------------------------------------------

def getBridge(from_id,to_id):
  bridge = ''
  while True:
    pprint("bridge:",bridge)
    node = get_node_from_id(to_id)
    if node == None:
      error_print("ERROR 7")
      return bridge
    bridge = node['name'] + "➡︎" + bridge
    to_id = node["parent_id"]
    if to_id == from_id:
      return bridge 
  
# output bot message to user
def respond(*argv):
  response = "   "
  for arg in argv:
    response += str(arg)
  print(response)

# CASE 1: only one node with that name downstream
def oneMatch(node):
  respond(node["message"])
  # the caller updates contextId; assigning it here would only create a local
  # variable and leave the global untouched

def getPath(from_id,to_id):
  # Create the list of nodes leading from just below the node 'from_id' down to
  # and including the target node 'to_id'
  path = []
  while True:
    target_node = get_node_from_id(to_id)
    if target_node == None:
      return path
    path.insert(0,target_node)
    to_id = target_node["parent_id"]
    if to_id == from_id:
      return path

def disambiguate(target_nodes):
  """ disambiguate
      target_nodes: all nodes that matched a token entered by user ("kia", "EV")
      returns: id of the node the user chose at the first branch point
  """

  # Build a list of paths from the current to each of the Target nodes
  paths = []
  for x in range(len(target_nodes)):
    node = target_nodes[x]
    path = getPath(contextId,node["id"])
    paths.append(path)
  
  pprint("PATHS in Disambiguate:",paths)
  x = 0
  while True:
    # Determine how many branches there are
    first_node_ids=[]
    for path in paths:
      first_node_ids.append(path[x]["id"])

    # Converting the list to a set and back leaves only unique elements
    unique_node_ids = set(first_node_ids)
    unique_node_ids = (list(unique_node_ids))
    pprint("unique_node_ids in disambiguate:",unique_node_ids)
    if len(unique_node_ids) == 1:
      x+=1
      respond("all the same")
    else:
      break

  respond("Which one do you mean?")
  x=0
  responses=[]
  for id in unique_node_ids:
    x+=1
    node = get_node_from_id(id)
    responses.append(node)
    respond("  ",x,": ",node['name'])
  while True:
    msg = input()
    if msg.isdigit():
      index = int(msg)-1
      if index >= 0 and index < len(responses):
        node = responses[index]
        respond(node["message"])
        return node["id"]
      else:
        respond("try one of those numbers above")
    else:
      for node in responses:
        if node["name"] == msg:
          respond("nice choice: ",msg)
          # return right away: a bare 'break' would only leave the for loop
          # and the surrounding while loop would ask for input again
          return node["id"]
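
The branch-point search at the heart of `disambiguate` can be isolated as below. This is a minimal sketch, assuming every path is a list of node ids from just below the current context down to a target. `first_branch_point` is a hypothetical helper, and unlike the loop above it stops when the shortest path runs out instead of indexing past its end.

```python
def first_branch_point(paths):
    """Return (depth, sorted unique ids) at the first depth where paths differ.

    Returns (None, []) if the paths never diverge within the shortest one.
    Assumes `paths` is a non-empty list of lists.
    """
    shortest = min(len(p) for p in paths)
    for depth in range(shortest):
        ids_at_depth = sorted({p[depth] for p in paths})
        if len(ids_at_depth) > 1:
            return depth, ids_at_depth
    return None, []

# Paths from the root context to nodes 6, 7 and 8 of the sample sheet
paths = [[1, 3, 4, 6], [1, 3, 4, 7], [1, 3, 5, 8]]
print(first_branch_point(paths))
```

Here all three paths share Kia (1) and Niro (3) and first split at the trim level, so the user would be asked to choose between FE (4) and Touring (5).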


In [143]:
# START THE CONVERSATION
# ------------------------------------------------------------------------------
contextId = 0         # id of the current node
stopWord = "q"        # way to terminate user input loop
debug = False         # type 'debug' to toggle 
error_debug = False   # can't be toggled in convo
# Sequence in which we search for matching nodes
searches = [find_downstream_nodes_by_name,
            find_upstream_nodes_by_name,
            find_sibling_nodes_by_name,
            find_all_nodes_by_name]
# Main loop where we wait for and process user input
while True:
  user_input = input()
  if user_input == stopWord:
    break
  if user_input == "debug":
    debug = not(debug)
    continue
  ''' Some user queries result in multiple matches. If you start the convo by 
      typing 'price', there are one or more disambiguation steps involved each
      of which requires a run through this loop'''
  resolved = False
  while not(resolved):
    ''' After disambiguation, we go through this loop again as if the user
        had entered a new command. This way we can do multiple steps 
        of disambiguation'''
    for search in searches:
      matches=[]
      search(matches,user_input,contextId)  
      nMatches = len(matches)
      ''' This is our preferred way to exit this loop: we found one matching
          node'''
      if nMatches == 1:
        oneMatch(matches[0])
        contextId = matches[0]["id"]
        resolved = True
        break
      ''' If we found more than one matching node, we disambiguate and then 
          run through this loop again'''
      if nMatches > 1:
        contextId = disambiguate(matches)
        break
    ''' If all has failed, we give control back to the user to type something
        more intelligible'''
    if nMatches == 0:
      pprint("huh?")
      resolved = True
    

price
   all the same
   Which one do you mean?
     1: warranty
     2: Soul
     3: Niro
2
   The Soul is fun, affordable and the most popular hatchback in the country
   Which one do you mean?
     1: EV
     2: Touring 
1
   The Soul EV. All-new for 2020. Powerful, fun and 234 miles range
   Starts at $35,000
price
   Starts at $35,000
kia
   Yes, I'm all about Kia. Ask me about models, features, pricing, inventory...
price
   Which one do you mean?
     1: warranty
     2: Soul
     3: Niro
1
   Kia has the best warranty of all car makers
   It's free -- no charge.
price
   It's free -- no charge.


KeyboardInterrupt: ignored
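
The main loop tries each search strategy in order and stops at the first one that returns any match. That fallback pattern can be sketched independently of the graph. Everything here is an assumption for illustration: `first_nonempty` is a hypothetical helper, and the two toy searchers stand in for the downstream, upstream, sibling and global searches.

```python
def first_nonempty(searches, query, context_id):
    """Run searches in order; return (search_name, matches) for the first hit."""
    for search in searches:
        matches = search(query, context_id)
        if matches:
            return search.__name__, matches
    return None, []

# Toy strategies: a narrow search tied to the context, then a wider fallback
def narrow_search(query, context_id):
    return [7] if query == "price" and context_id == 4 else []

def wide_search(query, context_id):
    return [7] if query == "price" else []

strategies = [narrow_search, wide_search]
print(first_nonempty(strategies, "price", 4))    # narrow search succeeds
print(first_nonempty(strategies, "price", 5))    # falls back to the wide search
print(first_nonempty(strategies, "sunroof", 4))  # nothing matches
```

Returning the matches instead of filling a list passed by reference makes each strategy easier to test on its own.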

In [146]:
from IPython.display import display, HTML
js = "<script>alert('Hello World!');</script>"
display(HTML(js))