## Search Engine using semantic search for Cordlife SG 
### Data source: 64 FAQs taken directly from the [website](https://www.cordlife.com/sg/)
### This search engine uses a deep learning model from the Sentence Transformers library.

### Author: Amir Rahman

### Testing the model on a sample of sentences

In [2]:
from sentence_transformers import SentenceTransformer, util
sentences = ["I'm happy", "I'm full of happiness"]

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Compute an embedding for each sentence
embedding_1 = model.encode(sentences[0], convert_to_tensor=True)
embedding_2 = model.encode(sentences[1], convert_to_tensor=True)

x = util.pytorch_cos_sim(embedding_1, embedding_2)
print(x)

tensor([[0.6003]])
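
The value printed above is a cosine similarity: the dot product of the two embeddings divided by the product of their lengths. As a rough illustration, the same quantity can be computed by hand in plain Python. The function name `cosine_similarity` and the toy vectors below are assumptions made for this sketch; real `all-MiniLM-L6-v2` embeddings have 384 dimensions and are compared with `util.pytorch_cos_sim` in this notebook.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy vectors (hypothetical, for illustration only)
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # same direction: close to 1
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # orthogonal: 0
```

A score near 1 means the two vectors point in nearly the same direction, which is why paraphrases are expected to score higher than unrelated sentences.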


### Importing the FAQ dataset

In [3]:
import pandas as pd

raw = pd.read_csv('cordlife_faqs1.csv')

lookup = {}
for a, b, *_ in raw.values:
    lookup[a] = b

raw

Unnamed: 0,prompt,response,Unnamed: 2
0,What is cord blood?,Cord blood is blood that remains in the umbili...,
1,What are cord blood stem cells?,Cord blood stem cells are also known as Haemat...,
2,Why should I save my baby's cord blood stem ce...,There are several advantages of storing your b...,
3,What can cord blood stem cells do?,Cord blood stem cells can:\r\n\r\nReplace and ...,
4,How does a cord blood stem cell transplant work?,The purpose of a stem cell treatment is to rec...,
...,...,...,...
58,When will the new GST change from 7% to 8% tak...,The new GST change from 7% to 8% will take eff...,
59,How will the GST change affect my payments to ...,For Existing Clients\r\nThe revision in GST ra...,
60,"With the absorption of the 1% GST increase, wh...",Any enrolment on or before 31 December 2022 wi...,
61,What should I do if I am currently on an annua...,You can consider upgrading your current annual...,
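
The loop that fills `lookup` relies on column order and silently overwrites an entry if two rows share the same prompt. A minimal sketch of a stricter variant is shown here, keyed by column name and using only the standard library. The helper `build_lookup`, the variable `lookup_sketch`, and the two-row sample with placeholder responses are all hypothetical and are not part of the original notebook.

```python
import csv
import io

def build_lookup(rows):
    """Map each prompt to its response, rejecting duplicate prompts."""
    lookup = {}
    for row in rows:
        prompt = row["prompt"]
        if prompt in lookup:
            raise ValueError(f"duplicate prompt: {prompt!r}")
        lookup[prompt] = row["response"]
    return lookup

# Hypothetical two-row sample; the responses are placeholders, not the real FAQ text.
sample = io.StringIO(
    "prompt,response\n"
    "What is cord blood?,placeholder response 1\n"
    "What are cord blood stem cells?,placeholder response 2\n"
)
lookup_sketch = build_lookup(csv.DictReader(sample))
print(len(lookup_sketch))  # 2
```

With the DataFrame above, passing `raw.to_dict('records')` as `rows` should work as well, assuming the columns are named `prompt` and `response` as displayed.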


### Loading the L6 model and initializing the sentence embedding variable

In [3]:
from sentence_transformers import SentenceTransformer, util

sentences = list(raw['prompt'].values)
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
sentences_embeddings = [model.encode(s, convert_to_tensor=True) for s in sentences]


### Creating the main search function
#### Compares the query embedding against each precomputed FAQ question embedding

In [4]:
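def search(q, THRESHOLD=0.65):
    # Embed the query with the same model used for the FAQ questions
    q_embedding = model.encode(q, convert_to_tensor=True)
    out = []
    # Score the query against every precomputed FAQ question embedding
    for sentence, sentence_embedding in zip(sentences, sentences_embeddings):
        score = util.pytorch_cos_sim(q_embedding, sentence_embedding).item()
        out.append((sentence, score))
    # Sort by cosine similarity, highest first
    out.sort(key=lambda x: -x[-1])
    # Fall back when even the best match is below the threshold
    if out[0][1] < THRESHOLD:
        return ["I don't know"]
    top = out[:3]
    # Return (question, answer, score) for the top 3 matches
    return [(qns, lookup[qns], score) for qns, score in top]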

search('what is cordlife preferential plan and how would it benefit a parent like me')

[('How would Preferential Plan benefit a Cordlife Parent like me?',
  "As a gesture of our appreciation to our Cordlife parents, we've introduced the Preferential Plan which allows Cordlife Preferential Plan lets you enjoy: Protection against increases in Annual Instalment Fees. Protection against hikes in Goods and Services Tax (GST). The convenience of a one-time payment for the remaining balance of annual instalment fee. Multiple payment modes such as Child Development Account (CDA), credit card interest-free instalment plans & etc.                                                           Contact us to find out more about the full benefits.",
  0.9085880517959595),
 ('What is Cordlife Preferential Plan?',
  "As a gesture of our appreciation to our Cordlife parents, we've introduced the Preferential Plan which allows parents to prepay the balance of their annual fees. Multiple payment modes are available; including the use of your child's Child Development Account (CDA) for this one

#### Testing the model and function on some sample questions
##### Each of the top 3 matched FAQs ends with its cosine similarity score to the query (a similarity measure, not an accuracy)

In [5]:
search('what is CDA and how does it work')

[('What is CDA First Step Grant?',
  'The CDA First Step Grant is new grant of $3,000 for eligible Singaporean children born from 24 March 2016. It is paid into the child’s CDA and forms part of the existing overall Government contribution cap.\r\n\r\nSingaporean children born from 24 March 2016 who are Singaporean Citizens (or become a citizen before turning 12 years old) and have lawfully married parents, are eligible for the CDA benefits.',
  0.7030069828033447),
 ('How does the Child Development Account (CDA) work?',
  'The CDA has two components: the CDA First Step Grant and the Government Dollar-for-Dollar Matching.\r\n\r\nUnder the CDA First Step Grant, parents will receive an initial amount of $3,000 (from the Government’s existing contribution caps) which will be deposited into your child’s account.',
  0.634533703327179),
 ('What is a Child Development Account (CDA)?',
  'The CDA* is a special savings account for children to help build up the savings that can be spent on appr

In [6]:
search('does cordlife plan to launch anything in the near future')

["I don't know"]

In [7]:
search('does cordlife have branches in europe or america')

["I don't know"]

In [8]:
search('can i donate blood to cordlife')

["I don't know"]
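
The three queries above return the fallback because their best match scores below `THRESHOLD`. The thresholding logic can be sketched without downloading a model. In the sketch below, `toy_encode` is a hypothetical bag-of-words stand-in for `model.encode`, `search_sketch` and its `top_k` parameter are illustrative names, and the answers in `faq` are placeholders rather than the real FAQ text. Unlike `search`, it returns an empty list when nothing clears the threshold, so the return type stays uniform.

```python
import math
from collections import Counter

def toy_encode(text):
    """Hypothetical stand-in for model.encode: a bag-of-words count vector."""
    return Counter(text.lower().replace("?", "").split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def search_sketch(query, faq, encode=toy_encode, threshold=0.65, top_k=3):
    """Return up to top_k (question, answer, score) tuples, or [] if the best score is below threshold."""
    q_vec = encode(query)
    scored = sorted(((cosine(q_vec, encode(question)), question) for question in faq), reverse=True)
    if not scored or scored[0][0] < threshold:
        return []
    return [(question, faq[question], score) for score, question in scored[:top_k]]

# Placeholder answers; only the questions come from the FAQ table.
faq = {
    "What is cord blood?": "placeholder answer 1",
    "What are cord blood stem cells?": "placeholder answer 2",
}
print(search_sketch("what is cord blood", faq))  # best match clears the threshold
print(search_sketch("opening hours", faq))       # no overlap, so the result is []
```

As in `search`, only the best score is checked against the threshold, so lower-ranked results in the top 3 may themselves score below it.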

### Loading the L12 model and initializing the sentence embedding variable

In [9]:
from sentence_transformers import SentenceTransformer, util

sentences = list(raw['prompt'].values)
model = SentenceTransformer('sentence-transformers/all-MiniLM-L12-v2')
sentences_embeddings = [model.encode(s, convert_to_tensor=True) for s in sentences]


In [10]:
def search(q, THRESHOLD=0.65):
    q_embedding = model.encode(q, convert_to_tensor=True)
    out = []
    for sentence, sentence_embedding in zip(sentences, sentences_embeddings):
        score = util.pytorch_cos_sim(q_embedding, sentence_embedding).item()
        out.append((sentence, score))
    out.sort(key=lambda x:-x[-1])
    if out[0][1] < THRESHOLD:
        return ['I don\'t know']
    top = out[:3]
    return [(qns, lookup[qns], score) for qns,score in top]

search('what is cordlife preferential plan and how would it benefit a parent like me')

[('How would Preferential Plan benefit a Cordlife Parent like me?',
  "As a gesture of our appreciation to our Cordlife parents, we've introduced the Preferential Plan which allows Cordlife Preferential Plan lets you enjoy: Protection against increases in Annual Instalment Fees. Protection against hikes in Goods and Services Tax (GST). The convenience of a one-time payment for the remaining balance of annual instalment fee. Multiple payment modes such as Child Development Account (CDA), credit card interest-free instalment plans & etc.                                                           Contact us to find out more about the full benefits.",
  0.9248884916305542),
 ('What is Cordlife Preferential Plan?',
  "As a gesture of our appreciation to our Cordlife parents, we've introduced the Preferential Plan which allows parents to prepay the balance of their annual fees. Multiple payment modes are available; including the use of your child's Child Development Account (CDA) for this one

In [11]:
search('what is CDA and how does it work')

[('How does the Child Development Account (CDA) work?',
  'The CDA has two components: the CDA First Step Grant and the Government Dollar-for-Dollar Matching.\r\n\r\nUnder the CDA First Step Grant, parents will receive an initial amount of $3,000 (from the Government’s existing contribution caps) which will be deposited into your child’s account.',
  0.6768630743026733),
 ('What is a Child Development Account (CDA)?',
  'The CDA* is a special savings account for children to help build up the savings that can be spent on approved uses.\r\n\r\nYour child’s CDA can be opened at any OCBC Bank, DBS Bank or UOB Bank branch.\r\n\r\nYou can deposit cash into the CDA any time until 31 December of the year your child turns 12 years of age.\r\n\r\nFor more information on CDA, please visit www.babybonus.msf.gov.sg or ask our friendly consultants today!\r\n\r\n*A child is eligible to apply for a CDA if he/she is:\r\n\r\nBorn on or after 17 August 2008\r\nA Singaporean citizen (or becomes a citizen 

In [12]:
search('does cordlife plan to launch anything in the near future')

["I don't know"]

In [13]:
search('does cordlife have branches in europe or america')

["I don't know"]
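
Each model section in this notebook rebinds the global `model` and `sentences_embeddings` and then redefines the same `search` function. One possible alternative, sketched here under stated assumptions, is a small factory that precomputes the question embeddings once and returns a search function bound to a given encoder, so several models can be held side by side. `make_search`, `word_encode` and `trigram_encode` are hypothetical names; the two toy encoders only stand in for `model.encode` so the sketch runs without any model download.

```python
import math
from collections import Counter

def word_encode(text):
    """Hypothetical encoder 1: bag of lowercase words."""
    return Counter(text.lower().split())

def trigram_encode(text):
    """Hypothetical encoder 2: bag of character trigrams."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def make_search(encode, questions):
    """Precompute question embeddings once and return a search function bound to them."""
    embedded = [(q, encode(q)) for q in questions]

    def search_fn(query, top_k=3):
        q_vec = encode(query)
        scored = sorted(((cosine(q_vec, vec), q) for q, vec in embedded), reverse=True)
        return [(q, score) for score, q in scored[:top_k]]

    return search_fn

questions = ["What is cord blood?", "What are cord blood stem cells?"]
searchers = {
    "words": make_search(word_encode, questions),
    "trigrams": make_search(trigram_encode, questions),
}
for name, fn in searchers.items():
    print(name, fn("what is cord blood", top_k=1))
```

Keeping one bound function per encoder makes it straightforward to run the same query through every model and compare the rankings.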

### Loading the L12 Multilingual model and initializing the sentence embedding variable

In [4]:
# sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
from sentence_transformers import SentenceTransformer, util

sentences = list(raw['prompt'].values)
model = SentenceTransformer('sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2')
sentences_embeddings = [model.encode(s, convert_to_tensor=True) for s in sentences]


In [12]:
def search(q, THRESHOLD=0.65):
    q_embedding = model.encode(q, convert_to_tensor=True)
    out = []
    for sentence, sentence_embedding in zip(sentences, sentences_embeddings):
        score = util.pytorch_cos_sim(q_embedding, sentence_embedding).item()
        out.append((sentence, score))
    out.sort(key=lambda x:-x[-1])
    if out[0][1] < THRESHOLD:
        return ['I don\'t know']
    top = out[:3]
    return [(qns, lookup[qns], score) for qns,score in top]

search('what is cordlife preferential plan and how would it benefit a parent like me')

[('How would Preferential Plan benefit a Cordlife Parent like me?',
  "As a gesture of our appreciation to our Cordlife parents, we've introduced the Preferential Plan which allows Cordlife Preferential Plan lets you enjoy: Protection against increases in Annual Instalment Fees. Protection against hikes in Goods and Services Tax (GST). The convenience of a one-time payment for the remaining balance of annual instalment fee. Multiple payment modes such as Child Development Account (CDA), credit card interest-free instalment plans & etc.                                                           Contact us to find out more about the full benefits.",
  0.9360084533691406),
 ('What is Cordlife Preferential Plan?',
  "As a gesture of our appreciation to our Cordlife parents, we've introduced the Preferential Plan which allows parents to prepay the balance of their annual fees. Multiple payment modes are available; including the use of your child's Child Development Account (CDA) for this one

### Testing the model's capabilities with French

In [13]:
search("qu'est-ce que le CDA et comment ça marche ?")

[('How does the Child Development Account (CDA) work?',
  'The CDA has two components: the CDA First Step Grant and the Government Dollar-for-Dollar Matching.\r\n\r\nUnder the CDA First Step Grant, parents will receive an initial amount of $3,000 (from the Government’s existing contribution caps) which will be deposited into your child’s account.',
  0.746713399887085),
 ('What is a Child Development Account (CDA)?',
  'The CDA* is a special savings account for children to help build up the savings that can be spent on approved uses.\r\n\r\nYour child’s CDA can be opened at any OCBC Bank, DBS Bank or UOB Bank branch.\r\n\r\nYou can deposit cash into the CDA any time until 31 December of the year your child turns 12 years of age.\r\n\r\nFor more information on CDA, please visit www.babybonus.msf.gov.sg or ask our friendly consultants today!\r\n\r\n*A child is eligible to apply for a CDA if he/she is:\r\n\r\nBorn on or after 17 August 2008\r\nA Singaporean citizen (or becomes a citizen b

### Testing the model's capabilities with Chinese

In [14]:
search("什麼是 CDA 以及它是如何運作的？")

[('How does the Child Development Account (CDA) work?',
  'The CDA has two components: the CDA First Step Grant and the Government Dollar-for-Dollar Matching.\r\n\r\nUnder the CDA First Step Grant, parents will receive an initial amount of $3,000 (from the Government’s existing contribution caps) which will be deposited into your child’s account.',
  0.735127866268158),
 ('What is a Child Development Account (CDA)?',
  'The CDA* is a special savings account for children to help build up the savings that can be spent on approved uses.\r\n\r\nYour child’s CDA can be opened at any OCBC Bank, DBS Bank or UOB Bank branch.\r\n\r\nYou can deposit cash into the CDA any time until 31 December of the year your child turns 12 years of age.\r\n\r\nFor more information on CDA, please visit www.babybonus.msf.gov.sg or ask our friendly consultants today!\r\n\r\n*A child is eligible to apply for a CDA if he/she is:\r\n\r\nBorn on or after 17 August 2008\r\nA Singaporean citizen (or becomes a citizen b

In [15]:
search("我為什麼要儲存寶寶的臍帶血？")

[("How is my baby's cord blood stored?",
  "Your baby's umbilical cord blood is stored within a US FDA-approved cryogenic storage pouch made of a special material that is specifically designed to withstand cryogenic temperatures. The pouch has two main segments (20% and 80%) that are attached integrally, and two test segments that are also integrally attached. The integral segments allow for additional testing on the associated unit to ensure that no samples are mixed up and that the cord blood remains viable. These tests are typically performed prior to a transplant. Having dual integrated segments addresses the possibility of future stem cell expansion. This means that when stem cell expansion is available commercially, you will be able to withdraw a portion of the stem cells for expansion while keeping the remainder in storage.",
  0.8103278279304504),
 ("How is my baby's umbilical cord blood collected by my OBGYN?",
  'Immediately after the delivery of your baby, your child’s umbil

### Testing the model's capabilities with Malay

In [16]:
search("kenapa saya perlu menyimpan darah tali pusat bayi saya?")

[("How is my baby's cord blood stored?",
  "Your baby's umbilical cord blood is stored within a US FDA-approved cryogenic storage pouch made of a special material that is specifically designed to withstand cryogenic temperatures. The pouch has two main segments (20% and 80%) that are attached integrally, and two test segments that are also integrally attached. The integral segments allow for additional testing on the associated unit to ensure that no samples are mixed up and that the cord blood remains viable. These tests are typically performed prior to a transplant. Having dual integrated segments addresses the possibility of future stem cell expansion. This means that when stem cell expansion is available commercially, you will be able to withdraw a portion of the stem cells for expansion while keeping the remainder in storage.",
  0.8031582236289978),
 ("Why should I save my baby's cord blood stem cells?",
  'There are several advantages of storing your baby’s cord blood stem cells

### Testing the model's capabilities with Indonesian

In [17]:
search("kenapa saya harus menyimpan darah tali pusat bayi saya?")

[("How is my baby's cord blood stored?",
  "Your baby's umbilical cord blood is stored within a US FDA-approved cryogenic storage pouch made of a special material that is specifically designed to withstand cryogenic temperatures. The pouch has two main segments (20% and 80%) that are attached integrally, and two test segments that are also integrally attached. The integral segments allow for additional testing on the associated unit to ensure that no samples are mixed up and that the cord blood remains viable. These tests are typically performed prior to a transplant. Having dual integrated segments addresses the possibility of future stem cell expansion. This means that when stem cell expansion is available commercially, you will be able to withdraw a portion of the stem cells for expansion while keeping the remainder in storage.",
  0.8019660711288452),
 ("Why should I save my baby's cord blood stem cells?",
  'There are several advantages of storing your baby’s cord blood stem cells

In [20]:
search("疫情期間提取臍帶血安全嗎？")

[('Is it safe to have cord blood, cord lining and cord tissue collected during this pandemic?',
  'As cord blood will be collected by your caregiver immediately after the delivery of your baby, the collection of cord blood, cord lining, and cord tissue remains safe even during this pandemic period.\r\n\r\nEmerging evidence is now suggesting that the risk of direct transmission at the point of delivery is low, and there was no detection of the COVID-19 virus strain within the maternal and neonatal samples that can be transmitted to the umbilical cord blood and cord tissue.9\r\n\r\nIn most, if not all, cases of COVID-19 human-to-human transmission globally, the virus is spread mainly through close interaction with an infected person, where respiratory secretions can enter the eyes, mouth, nose, or airways, and via the touching of a surface or an object that is contaminated with the respiratory droplets.10\r\n\r\nCordlife is listed as an essential service provider during this pandemic. Ou
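
The similarity scores in the multilingual tests show how close a match is, but not whether the intended FAQ was ranked first. A rough way to quantify that is the mean reciprocal rank over a set of labelled queries. The sketch below uses only the rankings visible in the outputs above (lower-ranked entries are truncated there and are left out), and the expected FAQ for each query is an assumption based on reading the query's intent. The helper names are hypothetical.

```python
def reciprocal_rank(ranked_questions, expected):
    """Return 1/rank of the expected question, or 0.0 if it is absent."""
    for rank, question in enumerate(ranked_questions, start=1):
        if question == expected:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(cases):
    """Average reciprocal rank over (ranked_questions, expected) pairs."""
    if not cases:
        raise ValueError("need at least one test case")
    return sum(reciprocal_rank(ranked, expected) for ranked, expected in cases) / len(cases)

WHY_SAVE = "Why should I save my baby's cord blood stem cells?"
HOW_STORED = "How is my baby's cord blood stored?"
PANDEMIC = "Is it safe to have cord blood, cord lining and cord tissue collected during this pandemic?"

# (visible ranking, assumed intended FAQ) for three of the queries above
cases = [
    ([HOW_STORED, WHY_SAVE], WHY_SAVE),  # Malay query: intended FAQ ranked 2nd
    ([HOW_STORED, WHY_SAVE], WHY_SAVE),  # Indonesian query: intended FAQ ranked 2nd
    ([PANDEMIC], PANDEMIC),              # Chinese pandemic query: intended FAQ ranked 1st
]
print(round(mean_reciprocal_rank(cases), 3))  # 0.667
```

A value of 1.0 would mean every intended FAQ was ranked first; a larger labelled query set per language would be needed before drawing conclusions about any model.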