<a href="https://colab.research.google.com/github/averma12/Full-Stack-DL/blob/master/Copy_of_Localization_using_word_vectors_Aug_2020.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Example 1 Step 1: Extract entities that need to be localized
Example 1 is completely automated and needs no manual intervention. Example 2 adds UI functionality for selecting the correct replacement manually.

In [None]:
import spacy
import pandas as pd
from spacy import displacy
from spacy.tokens import Span
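# spaCy 3 removed the "en" shortcut, so load the small English model by its full name
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")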

In [None]:

original_input = "Frank lives in San Francisco and Elizabeth lives in Los Angeles. If the flight time is 2 hrs when will Elizabeth reach Frank if she starts at 8am in the morning?"
processed_input_text=nlp(original_input)
keyword_set = set()
entity_mapping = []
for ent in processed_input_text.ents:
    # Record each distinct entity once, in order of first appearance
    if ent.text not in keyword_set:
        keyword_set.add(ent.text)
        entity_mapping.append((ent.text, ent.label_))
print(entity_mapping)
displacy.render(processed_input_text, style='ent', jupyter=True)

[('Frank', 'PERSON'), ('San Francisco', 'GPE'), ('Elizabeth', 'PERSON'), ('Los Angeles', 'GPE'), ('2', 'CARDINAL'), ('8am in the morning', 'TIME')]


In [None]:
# Not all entities can be localized; numbers, for example, need no localization.
# Keep only the entity types that are relevant for localization.
keep_entities_list = ['PERSON','GPE','FAC','ORG','PRODUCT','NORP','MONEY','LOC','WORK_OF_ART','LAW','LANGUAGE','QUANTITY']
finalized_entity_mapping = {}
for ent in entity_mapping:
  if ent[1] in keep_entities_list:
    finalized_entity_mapping[ent[0]] = []

print (finalized_entity_mapping)





{'Frank': [], 'San Francisco': [], 'Elizabeth': [], 'Los Angeles': []}
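
The label filter above can also be written as a single dict comprehension. The following is a minimal sketch, not part of the original notebook: the entity list is hard-coded from the output printed in Step 1 so that it runs without spaCy, and `LOCALIZABLE_LABELS` is a hypothetical name for the same keep-list stored as a set.

```python
# Entities copied from the output printed above, hard-coded so the sketch runs without spaCy.
entity_mapping = [
    ("Frank", "PERSON"), ("San Francisco", "GPE"), ("Elizabeth", "PERSON"),
    ("Los Angeles", "GPE"), ("2", "CARDINAL"), ("8am in the morning", "TIME"),
]

# Hypothetical name; same labels as keep_entities_list, stored as a set for fast membership checks.
LOCALIZABLE_LABELS = {
    "PERSON", "GPE", "FAC", "ORG", "PRODUCT", "NORP", "MONEY",
    "LOC", "WORK_OF_ART", "LAW", "LANGUAGE", "QUANTITY",
}

# Keep only entities whose label is localizable; each starts with an empty suggestion list.
finalized_entity_mapping = {
    text: [] for text, label in entity_mapping if label in LOCALIZABLE_LABELS
}
print(finalized_entity_mapping)
```

Because Python dicts preserve insertion order, the entities stay in the order in which they first appeared in the sentence.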


## Example 1 Step 2: Initialize the Google news word vectors from Gensim and perform localization

In [None]:
import gensim.downloader as api
model = api.load("word2vec-google-news-300")
# api.load already returns the KeyedVectors object, so the deprecated `.wv` lookup is not needed
word_vectors = model





In [None]:
Origin_country='USA' 
Target_country='India'

final_mapping ={}

for word in finalized_entity_mapping: 
  word = word.strip()
  word = word.replace(" ","_")
  try:
    similar_words_list= model.most_similar(positive=[Target_country,word],negative=[Origin_country],topn=10)
    # Remove the scores for the retrieved choices
    similar_words_list = [choices[0].replace("_"," ") for choices in similar_words_list ]
    final_mapping[word.replace("_"," ")] = similar_words_list
  except KeyError:
    # most_similar raises KeyError when a word is missing from the model vocabulary
    similar_words_list = []
    print (" Fetching similar words failed for ",word)
  print (word," -- Replacement suggestions -- ",similar_words_list)




Frank  -- Replacement suggestions --  ['Sanjay Verma', 'Sabyasachi Sen', 'JK Jain', 'Sunil Chauhan', 'Don', 'Sudip', 'Ajay Shankar', 'Robert', 'V. Srinivasan', 'Kanwar Sain']
San_Francisco  -- Replacement suggestions --  ['Bangalore', 'Kolkata', 'Mumbai', 'Chennai', 'Delhi', 'Hyderabad', 'Calcutta', 'San Franciso', 'Bombay', 'Bengaluru']
Elizabeth  -- Replacement suggestions --  ['Rekha', 'Nandita', 'Meera', 'Margaret', 'Katharine', 'Bhagirath', 'Monica', 'Lakshmi', 'Manisha', 'Anita']
Los_Angeles  -- Replacement suggestions --  ['Mumbai', 'Los Angles', 'Kolkata', 'Chennai', 'Bangalore', 'LA', 'Delhi', 'Hyderabad', 'Ahmedabad', 'Calcutta']
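
The query `most_similar(positive=[Target_country, word], negative=[Origin_country])` is a word-vector analogy: it looks for words close to `word - Origin_country + Target_country`. The sketch below illustrates the idea with tiny hand-made vectors. Everything in it is an assumption for illustration: the 3-dimensional vectors are invented (the real model has 300 dimensions), and `toy_most_similar` is a hypothetical helper that only roughly mirrors what gensim does internally.

```python
import math

# Toy vectors invented for illustration; they are NOT taken from the Google News model.
# Axis 0 ~ "USA-ness", axis 1 ~ "India-ness", axis 2 ~ city (+1) versus person (-1).
TOY_VECTORS = {
    "USA": [1.0, 0.0, 0.0],
    "India": [0.0, 1.0, 0.0],
    "San_Francisco": [1.0, 0.0, 1.0],
    "Bangalore": [0.0, 1.0, 1.0],
    "Frank": [1.0, 0.0, -1.0],
    "Sanjay": [0.0, 1.0, -1.0],
}


def unit(vec):
    """Scale a vector to length 1 so that dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def toy_most_similar(positive, negative, topn=3):
    """Rank words by cosine similarity to (sum of positives - sum of negatives)."""
    query = [0.0, 0.0, 0.0]
    for word in positive:
        query = [q + x for q, x in zip(query, unit(TOY_VECTORS[word]))]
    for word in negative:
        query = [q - x for q, x in zip(query, unit(TOY_VECTORS[word]))]
    query = unit(query)
    inputs = set(positive) | set(negative)  # the query words themselves are excluded
    scored = [
        (word, sum(q * x for q, x in zip(query, unit(vec))))
        for word, vec in TOY_VECTORS.items()
        if word not in inputs
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


print(toy_most_similar(positive=["India", "San_Francisco"], negative=["USA"]))
print(toy_most_similar(positive=["India", "Frank"], negative=["USA"]))
```

In this toy space, subtracting `USA` and adding `India` moves a US city toward an Indian city and a US name toward an Indian name, which is the behaviour the suggestions above rely on.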


In [None]:
from IPython.display import Markdown, display

#  Localization here assumes that the first suggestion returned is the correct one.
#  Elizabeth  -- Replacement suggestions --  ['Rekha', 'Nandita', 'Meera', 'Margaret', 'Katharine', 'Bhagirath', 'Monica', 'Lakshmi', 'Manisha', 'Anita']
#  For example, Elizabeth is replaced with Rekha. The next section shows how to provide a UI to choose the replacement ourselves.

#  This function bolds the relevant entities that are changed.
def prepare_string(sentence,mapping,orig=True):
  if orig:
    for k in mapping:
      sentence = sentence.replace(k,"**"+k+"**")
  else:
    for k in mapping:
      sentence = sentence.replace(mapping[k][0],"**"+mapping[k][0]+"**")

  return sentence


def localize(sentence,mapping):
  for k in mapping:
    sentence = sentence.replace(k,mapping[k][0])
  return sentence


def printmd(string):
    display(Markdown(string))



print('Original Sentence:')
printmd(prepare_string(original_input,final_mapping))

localized_string =  localize(original_input,final_mapping)

print('\nLocalized Sentence:')
printmd(prepare_string(localized_string,final_mapping,orig=False))



Original Sentence:


**Frank** lives in **San Francisco** and **Elizabeth** lives in **Los Angeles**. If the flight time is 2 hrs when will **Elizabeth** reach **Frank** if she starts at 8am in the morning?


Localized Sentence:


**Sanjay Verma** lives in **Bangalore** and **Rekha** lives in **Mumbai**. If the flight time is 2 hrs when will **Rekha** reach **Sanjay Verma** if she starts at 8am in the morning?
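
Chained `str.replace` calls, as used in `localize`, also match inside longer words, and a later key can match text inserted by an earlier replacement. One possible alternative is a single regular-expression pass with word boundaries. The sketch below is an assumption, not the notebook's method: `localize_whole_words` is a hypothetical helper, and its mapping holds one replacement string per entity rather than a list of suggestions.

```python
import re


def localize_whole_words(sentence, mapping):
    """Replace every whole-word occurrence of each key in one pass over the sentence."""
    if not mapping:
        return sentence
    # Try longer keys first so that multi-word entities win over shorter overlapping ones.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    # Each match is replaced exactly once, so inserted text is never re-scanned.
    return pattern.sub(lambda match: mapping[match.group(0)], sentence)


example_sentence = "Frank flew from San Francisco to Frankfurt."
example_mapping = {"Frank": "Sanjay", "San Francisco": "Bangalore"}
print(localize_whole_words(example_sentence, example_mapping))
# prints: Sanjay flew from Bangalore to Frankfurt.
```

With plain `str.replace`, "Frankfurt" in this example sentence would become "Sanjayfurt"; the word boundary `\b` prevents that.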

## Example 2 Step 1: Extract entities that need to be localized
Example 2 shows how to manually add entities that were missed and how to select the correct replacement via a simple UI.

In [None]:
original_input = "Elizabeth bought 10 croissants at the Los Angeles airport for 5 dollars. How much does 4 croissants cost?"
processed_input_text=nlp(original_input)
keyword_set = set()
entity_mapping = []
for ent in processed_input_text.ents:
    # Record each distinct entity once, in order of first appearance
    if ent.text not in keyword_set:
        keyword_set.add(ent.text)
        entity_mapping.append((ent.text, ent.label_))
print(entity_mapping)
displacy.render(processed_input_text, style='ent', jupyter=True)

[('Elizabeth', 'PERSON'), ('10', 'CARDINAL'), ('Los Angeles', 'GPE'), ('5 dollars', 'MONEY'), ('4', 'CARDINAL')]


In [None]:
# spaCy did not tag "croissants", so add it manually using one of spaCy's existing entity labels
entity_mapping.append(('croissants','PRODUCT'))


In [None]:
def filter_numbers_from_entity(ent_string):
  # Return the last word of the entity if it is purely alphabetic, otherwise None.
  # E.g. "5 dollars" -> "dollars" and "8 pounds" -> "pounds"
  last = ent_string.split()[-1]
  return last if last.isalpha() else None


# Not all entities can be localized; numbers, for example, need no localization.
# Keep only the entity types that are relevant for localization.
keep_entities_list = ['PERSON','GPE','FAC','ORG','PRODUCT','NORP','MONEY','LOC','WORK_OF_ART','LAW','LANGUAGE','QUANTITY']
finalized_entity_mapping = {}
for ent in entity_mapping:
  if ent[1] in keep_entities_list:
    if ent[1] in ['MONEY','QUANTITY']:
      filtered_entity = filter_numbers_from_entity(ent[0])
      if filtered_entity is not None:
        finalized_entity_mapping[filtered_entity] = []
    else:
      finalized_entity_mapping[ent[0]] = []

print (finalized_entity_mapping)





{'Elizabeth': [], 'Los Angeles': [], 'dollars': [], 'croissants': []}


## Example 2 Step 2: Perform localization


In [None]:
Origin_country='USA' 
Target_country='India'

final_mapping ={}

for word in finalized_entity_mapping: 
  word = word.strip()
  word = word.replace(" ","_")
  try:
    similar_words_list= model.most_similar(positive=[Target_country,word],negative=[Origin_country],topn=10)
    # Remove the scores for the retrieved choices
    similar_words_list = [choices[0].replace("_"," ") for choices in similar_words_list ]
    final_mapping[word.replace("_"," ")] = similar_words_list
  except KeyError:
    # most_similar raises KeyError when a word is missing from the model vocabulary
    similar_words_list = []
    print (" Fetching similar words failed for ",word)
  print (word," -- Replacement suggestions -- ",similar_words_list)




Elizabeth  -- Replacement suggestions --  ['Rekha', 'Nandita', 'Meera', 'Margaret', 'Katharine', 'Bhagirath', 'Monica', 'Lakshmi', 'Manisha', 'Anita']
Los_Angeles  -- Replacement suggestions --  ['Mumbai', 'Los Angles', 'Kolkata', 'Chennai', 'Bangalore', 'LA', 'Delhi', 'Hyderabad', 'Ahmedabad', 'Calcutta']
dollars  -- Replacement suggestions --  ['rupees', 'crores', 'Rupees', 'Rs.###', 'Rs## crores', 'Rs5 crore', 'Rs.##', 'Rs.### crore Rs.1', 'INR ###bn', 'Rs.### crore Rs.#.#']
croissants  -- Replacement suggestions --  ['idlis', 'jalebis', 'pakoras', 'dosas', 'parathas', 'idli', 'masala dosa', 'jalebi', 'idli vada', 'idli dosa']
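
Some of the suggestions above are noise: masked numbers such as `Rs.###` and misspellings of the original word such as `Los Angles`. A small filter could remove these before they reach the dropdowns. The sketch below is one possible approach under stated assumptions: `clean_suggestions` is a hypothetical helper, and the `max_similarity` threshold of 0.8 is an arbitrary choice rather than a tuned value.

```python
import difflib
import re


def clean_suggestions(word, suggestions, max_similarity=0.8):
    """Drop suggestions that contain digits or '#', or that nearly repeat the original word."""
    cleaned = []
    for candidate in suggestions:
        # Masked or literal numbers (e.g. "Rs.###", "Rs5 crore") are not usable replacements.
        if re.search(r"[#\d]", candidate):
            continue
        # Near-duplicates of the source word are most likely misspellings of it.
        ratio = difflib.SequenceMatcher(None, word.lower(), candidate.lower()).ratio()
        if ratio >= max_similarity:
            continue
        cleaned.append(candidate)
    return cleaned


# Suggestion lists copied from the output printed above.
dollar_suggestions = ['rupees', 'crores', 'Rupees', 'Rs.###', 'Rs## crores', 'Rs5 crore',
                      'Rs.##', 'Rs.### crore Rs.1', 'INR ###bn', 'Rs.### crore Rs.#.#']
city_suggestions = ['Mumbai', 'Los Angles', 'Kolkata', 'Chennai', 'Bangalore', 'LA',
                    'Delhi', 'Hyderabad', 'Ahmedabad', 'Calcutta']

print(clean_suggestions("dollars", dollar_suggestions))
print(clean_suggestions("Los Angeles", city_suggestions))
```

The filter leaves the ranking untouched, so the first remaining suggestion is still the model's top usable choice.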


**Select the necessary replacements from the drop downs shown.**

In [None]:
import ipywidgets as widgets                        # Creating widgets
from IPython.display import display                 # Displaying widgets

output = widgets.Output()
# Creating dropdown objects
dropdownobjects =[widgets.Dropdown(options = final_mapping[key], description=key) for key in final_mapping]

# Display the dropdowns
input_widgets = widgets.VBox(dropdownobjects)
print ("Choose from the dropdown the best replacement for a given word and proceed to the next cell : \n")
display(input_widgets)

Choose from the dropdown the best replacement for a given word and proceed to the next cell : 



VBox(children=(Dropdown(description='Frank', options=('Sanjay Verma', 'Sabyasachi Sen', 'JK Jain', 'Sunil Chau…

In [None]:
final_UI_chosen_mapping = {}
for key,choice in zip(final_mapping.keys(),dropdownobjects):
  final_UI_chosen_mapping[key] = choice.value
print (final_UI_chosen_mapping)

{'Frank': 'Sanjay Verma', 'San Francisco': 'Bangalore', 'Elizabeth': 'Rekha', 'Los Angeles': 'Mumbai'}


In [None]:
from IPython.display import Markdown, display

#  This function bolds the relevant entities that are changed.
def prepare_string(sentence,mapping,orig=True):
  if orig:
    for k in mapping:
      sentence = sentence.replace(k,"**"+k+"**")
  else:
    for k in mapping:
      sentence = sentence.replace(mapping[k],"**"+mapping[k]+"**")

  return sentence


def localize(sentence,mapping):
  for k in mapping:
    sentence = sentence.replace(k,mapping[k])
  return sentence


def printmd(string):
    display(Markdown(string))



print('Original Sentence:')
printmd(prepare_string(original_input,final_UI_chosen_mapping))

localized_string =  localize(original_input,final_UI_chosen_mapping)


print('\nLocalized Sentence:')
printmd(prepare_string(localized_string,final_UI_chosen_mapping,orig=False))



Original Sentence:


**Elizabeth** bought 10 croissants at the **Los Angeles** airport for 5 dollars. How much does 4 croissants cost?


Localized Sentence:


**Rekha** bought 10 croissants at the **Mumbai** airport for 5 dollars. How much does 4 croissants cost?