# Exploring techniques for NLP
As we mentioned in the lecture slides, the different approaches used to solve NLP problems commonly fall into three categories: 
- rule-based, 
- machine learning, and 
- deep learning. 

In this notebook we show how to use rule-based approaches to solve NLP problems. You can open a cloud version of this notebook using the following link:
<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/Ali-Alameer/NLP/blob/main/week1_rule_based_technique.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>


## Rule-based technique
Similar to other early AI systems, early attempts at designing NLP systems were based on building rules for the task at hand. This required that the developers had some expertise in the domain to formulate rules that could be incorporated into a program. Such systems also required resources like dictionaries and thesauruses, typically compiled and digitized over a period of time.

Regular expressions (regex) are a great tool for text analysis and for building rule-based systems. A regex is a sequence of characters that defines a search pattern, which is used to match and find substrings in text. For example, the regex <b><font color='maroon'>`^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$`</font></b> matches email addresses. Because it is anchored with `^` and `$`, it only matches when the entire string is an email address; to find email addresses inside a longer piece of text, the anchors have to be removed. Regexes are also a great way to incorporate domain knowledge into an NLP system. For example, suppose customer complaints arrive via chat or email and we want to build a system that automatically identifies the product each complaint is about. If a range of product codes maps to certain brand names, we can use regexes to match these codes easily.
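
The sketch below illustrates both ideas from the paragraph above. The email pattern is the one given in the text. The product-code format (two capital letters, a hyphen, four digits) and the `BRANDS` table are hypothetical, invented only to show how a rule can map codes to brand names.

```python
import re

# Email pattern from the text. The ^ and $ anchors make it match only when the
# WHOLE string is an email address, so here it acts as a validator.
EMAIL_RE = r'^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$'

print(re.match(EMAIL_RE, 'jane.doe@example.com').groups())  # ('jane.doe', 'example', 'com')
print(re.match(EMAIL_RE, 'contact: jane.doe@example.com'))  # None: extra text before the address

# Hypothetical product-code scheme: two capital letters, a hyphen, four digits.
PRODUCT_CODE_RE = r'\b([A-Z]{2})-(\d{4})\b'
BRANDS = {'AC': 'Acme', 'ZX': 'Zenix'}  # hypothetical prefix -> brand-name table

def brands_mentioned(complaint):
    """Map every product code found in a complaint to its (hypothetical) brand name."""
    return [BRANDS.get(prefix, 'unknown')
            for prefix, _number in re.findall(PRODUCT_CODE_RE, complaint)]

print(brands_mentioned('My AC-1042 stopped working and ZX-7781 arrived broken.'))
# ['Acme', 'Zenix']
```

Unknown prefixes fall back to `'unknown'`, so the rule degrades gracefully when a new product line appears before the table is updated.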

Scenario: Consider a chatbot that helps a customer support team by identifying order numbers in the following chats with different customers:

- <b>Customer 1:</b> "Hello, I am having an issue with my order # 412889912"
- <b>Customer 2:</b> "I have a problem with my order number 412889913" 
- <b>Customer 3:</b> "My order 412889915 is having an issue, I was charged £300 when online it says £280" 

Regular expressions can be used to find all the order numbers in these chats.

In [None]:
import re

chat1='Customer 1: Hello, I am having an issue with my order # 412889912'

pattern = r'order[^\d]*(\d*)'  # 'order', then any non-digit characters, then capture the digits that follow (\d = a digit)
matches = re.findall(pattern, chat1)
matches
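
To see what each part of `order[^\d]*(\d*)` does, the same pattern can be written with the `re.VERBOSE` flag, which ignores whitespace inside the pattern and lets `#` start a comment. This is a sketch for explanation only; it behaves the same as the compact string used in the cell above.

```python
import re

# The pattern from the cell above, spelled out with re.VERBOSE:
# whitespace inside the pattern is ignored and '#' starts a comment.
ORDER_RE = re.compile(r"""
    order    # the literal word 'order'
    [^\d]*   # any run of non-digit characters, e.g. ' # ' or ' number '
    (\d*)    # capture group: the digits that follow
""", re.VERBOSE)

chat1 = 'Customer 1: Hello, I am having an issue with my order # 412889912'
print(ORDER_RE.findall(chat1))  # ['412889912']
```

Note that `[^\d]` can also be written as `\D`, the built-in class for any non-digit character.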

In [None]:
chat2='Customer 2: I have a problem with my order number 412889913'
pattern = r'order[^\d]*(\d*)'
matches = re.findall(pattern, chat2)
matches

In [None]:
chat3='Customer 3: My order 412889915 is having an issue, I was charged £300 when online it says £280'
pattern = r'order[^\d]*(\d*)'
matches = re.findall(pattern, chat3)
matches
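
The three cells above use the same loose pattern, and it works on these chats. However, `[^\d]*` can skip over any amount of text, so if a message mentions an order without giving a number, the pattern captures the next number it finds instead (here, a price). The sketch below compares it with a stricter pattern. The fourth chat is a hypothetical message added to show the problem, and `STRICT` is just one possible tightening, assuming order numbers follow the word `order`, optionally with `#`, `number` or `no.` in between.

```python
import re

LOOSE = r'order[^\d]*(\d*)'
# One possible stricter rule (an illustrative assumption, not the only option):
# the whole word 'order', an optional '#', 'number' or 'no.', then one or more digits.
STRICT = r'\border\b\s*(?:#|number|no\.?)?\s*(\d+)'

chats = [
    'Customer 1: Hello, I am having an issue with my order # 412889912',
    'Customer 2: I have a problem with my order number 412889913',
    'Customer 3: My order 412889915 is having an issue, I was charged £300 when online it says £280',
    'Customer 4: My order is late and I was charged £300',  # hypothetical: no order number
]

for chat in chats:
    # Print what each rule extracts, side by side.
    print(re.findall(LOOSE, chat), re.findall(STRICT, chat))
```

For the fourth chat the loose pattern returns `['300']`, while the strict one returns `[]`. The trade-off is that a stricter rule misses phrasings it does not list (for example "order ref 412889912"), which is typical of rule-based systems: every new phrasing needs a new rule.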

In [4]:
def get_pattern_match(pattern, text):
    """Return the first match of `pattern` in `text`, or None if there is no match."""
    matches = re.findall(pattern, text)
    if matches:
        return matches[0]
    return None
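
One detail worth knowing about `get_pattern_match`: what `re.findall` returns depends on the number of capture groups in the pattern, so `matches[0]` may be a string or a tuple. The sketch below repeats the helper so it runs on its own and tries it on the price sentence from customer 3's chat; the price patterns are illustrative only.

```python
import re

def get_pattern_match(pattern, text):
    # Same behaviour as the helper above: first findall() result, or None.
    matches = re.findall(pattern, text)
    if matches:
        return matches[0]
    return None

text = 'My order 412889915 is having an issue, I was charged £300 when online it says £280'

print(get_pattern_match(r'£\d+', text))                    # no groups: whole match -> £300
print(get_pattern_match(r'£(\d+)', text))                  # one group: the group -> 300
print(get_pattern_match(r'(charged|says) £(\d+)', text))   # two groups: tuple -> ('charged', '300')
print(get_pattern_match(r'order[^\d]*(\d*)', 'Where is my parcel?'))  # no match -> None
```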

In [None]:
get_pattern_match(r'order[^\d]*(\d*)', chat1)

## Exercise
Later in the conversation, the chatbot asks each customer for their phone number and email address. Here are three sample answers:

- <b>Customer 1:</b> "you ask lot of questions 😠  1235678912, customer1@salford.com"
- <b>Customer 2:</b> "here it is: (123)-567-8912, cust2@gmail.com" 
- <b>Customer 3:</b> "yes, phone: 1235678912 email: customer_3@my-site.com" 

Try to write regular expressions that extract the phone number and the email address from each response.


In [7]:
chat1 = 'Customer 1: you ask lot of questions 😠  1235678912, customer1@salford.com'
chat2 = 'Customer 2: here it is: (123)-567-8912, cust2@gmail.com'
chat3 = 'Customer 3: yes, phone: 1235678912 email: customer_3@my-site.com'

In [None]:
# Write a pattern to extract the email address of customer 1.
# Replace the '?' placeholder: '?' on its own is not a valid regex and raises re.error.
pattern = '?'
get_pattern_match(pattern, chat1)

In [None]:
# Write a pattern to extract the phone number of customer 2.
pattern = '?'
get_pattern_match(pattern, chat2)

In [None]:
# Write a pattern to extract the phone number of customer 3.
pattern = '?'
get_pattern_match(pattern, chat3)