<a href="https://colab.research.google.com/github/carloslme/automating-boring-stuff/blob/main/Chapter_7_Project_Phone_Number_and_Address_Extractor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Summary
Find every phone number and email address in a long web page or document.

The code will need to do the following:
* Use the `pyperclip` module to copy and paste strings.
* Create two regexes, one for matching phone numbers and the other for matching email addresses.
* Find all matches, not just the first match, of both regexes.
* Neatly format the matched strings into a single string to paste.
* Display some kind of message if no matches were found in the text.


# Step 1: Create a Regex for Phone Numbers
Create a regular expression to search for phone numbers.

In [None]:
# !pip install pyperclip

Collecting pyperclip
  Downloading https://files.pythonhosted.org/packages/6f/4c/0b1d507ad7e8bc31d690d04b4f475e74c2002d060f7994ce8c09612df707/pyperclip-1.8.1.tar.gz
Building wheels for collected packages: pyperclip
  Building wheel for pyperclip (setup.py) ... done
  Created wheel for pyperclip: filename=pyperclip-1.8.1-cp36-none-any.whl size=11120 sha256=f3491dac3ccd456adea724e638cb30545af205fd79a6834fd20f4a8d9e5b1f0e
  Stored in directory: /root/.cache/pip/wheels/44/10/3a/c830e9bb3db2c93274ea1f213a41fabde0d8cf3794251fad0c
Successfully built pyperclip
Installing collected packages: pyperclip
Successfully installed pyperclip-1.8.1


In [None]:
# %cd /content/drive/MyDrive/Colab\ Notebooks

In [None]:
# !ls

Chapter_1_Exploratory_Data_Analysis.ipynb
Chapter_2_Introduction_to_TensorFlow_Extended.ipynb
Chapter_7_Project_Phone_Number_and_Address_Extractor.ipynb
phoneAndEmail.py
Portfolio_and_projects
testing_data_on_google_colab.ipynb
ThinkStats2
ThinkStats2_Exercises


In [30]:
import pyperclip, re

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))? # area code 
    (\s|-|\.)? # separator 
    (\d{3}) # first 3 digits 
    (\s|-|\.) # separator 
    (\d{4}) # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))? # extension
  )''', re.VERBOSE)
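
To see what the phone pattern accepts, here is a minimal sketch that runs it against a made-up sentence. The names `phone_regex` and `sample_text` and the sample numbers are assumptions for illustration only; the pattern is the one from the cell above, with the dot in `ext.` escaped so that it matches a literal period.

```python
import re

# Same pattern as the cell above, with the dot in "ext." escaped.
phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?               # area code
    (\s|-|\.)?                       # separator
    (\d{3})                          # first 3 digits
    (\s|-|\.)                        # separator
    (\d{4})                          # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?  # extension
)''', re.VERBOSE)

# Made-up sample numbers, not taken from any real page.
sample_text = 'Call 415-555-1011 or (415) 555-9999 ext 12.'

# search() stops at the first match; group(0) is the full matched text.
first = phone_regex.search(sample_text)
print(first.group(0))

# finditer() yields a match object for every non-overlapping match.
all_numbers = [m.group(0) for m in phone_regex.finditer(sample_text)]
print(all_numbers)
```

Because the area code is optional and the first separator may be whitespace, a seven-digit number preceded by a space can be matched together with that leading space. The formatting in Step 3 sidesteps this by rebuilding each number from its digit groups.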

# Step 2: Create a Regex for Email Addresses

In [31]:
emailRegex = re.compile(r'''(
  [a-zA-Z0-9._%+-]+ # username 
  @ # @ symbol 
  [a-zA-Z0-9.-]+ # domain name 
  (\.[a-zA-Z]{2,4}) # dot-something 
  )''', re.VERBOSE)
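
The email pattern can be tried the same way. This is a sketch under stated assumptions: `email_regex`, `sample_text`, and every address in it are made up for illustration. It also shows one limitation of the `{2,4}` quantifier: a longer top-level domain is cut off after four letters.

```python
import re

# Same pattern as the cell above.
email_regex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+    # username
    @                    # @ symbol
    [a-zA-Z0-9.-]+       # domain name
    (\.[a-zA-Z]{2,4})    # dot-something
)''', re.VERBOSE)

# Made-up addresses on reserved example domains.
sample_text = 'Write to info@example.com or jane.doe+news@mail.example.org today.'
found = [m.group(0) for m in email_regex.finditer(sample_text)]
print(found)

# A top-level domain longer than four letters is only partially matched.
truncated = email_regex.search('curator@example.museum').group(0)
print(truncated)
```

If long top-level domains matter for your input, widening the upper bound of `{2,4}` is one option, at the cost of accepting more false positives.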

# Step 3: Find All Matches in the Clipboard Text

In [None]:
text = str(pyperclip.paste()) 
matches = [] 
for groups in phoneRegex.findall(text): 
  # findall() returns one tuple per match: index 1 is the area code, 3 the first
  # three digits, 5 the last four digits, and 8 the extension digits.
  phoneNum = '-'.join([groups[1], groups[3], groups[5]]) 
  if groups[8] != '':
    phoneNum += ' x' + groups[8] 
  matches.append(phoneNum)
for groups in emailRegex.findall(text):
  matches.append(groups[0])
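
The loop above depends on the clipboard, which makes it awkward to test in an environment without one. As a rough sketch, the same logic can be wrapped in a function that takes the text as an argument. The name `extract_matches` and the sample string are hypothetical, and the patterns are repeated here only so the block runs on its own.

```python
import re

phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?               # area code
    (\s|-|\.)?                       # separator
    (\d{3})                          # first 3 digits
    (\s|-|\.)                        # separator
    (\d{4})                          # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?  # extension
)''', re.VERBOSE)

email_regex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+    # username
    @                    # @ symbol
    [a-zA-Z0-9.-]+       # domain name
    (\.[a-zA-Z]{2,4})    # dot-something
)''', re.VERBOSE)


def extract_matches(text):
    """Return formatted phone numbers, then email addresses, found in text."""
    matches = []
    for groups in phone_regex.findall(text):
        # With groups in the pattern, findall() returns tuples of group values:
        # index 0 is the whole match, 1 the area code, 3 the first three
        # digits, 5 the last four digits, and 8 the extension digits.
        phone_num = '-'.join([groups[1], groups[3], groups[5]])
        if groups[8] != '':
            phone_num += ' x' + groups[8]
        matches.append(phone_num)
    for groups in email_regex.findall(text):
        matches.append(groups[0])
    return matches


# Made-up sample text for illustration.
print(extract_matches('Reach us at 800.420.7240 x123 or support@example.com.'))
```

With this shape, the clipboard version reduces to calling `extract_matches(str(pyperclip.paste()))`.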

# Step 4: Join the Matches into a String for the Clipboard

In [33]:
# Copy results to the clipboard. 
if len(matches) > 0: 
  pyperclip.copy('\n'.join(matches)) 
  print('Copied to clipboard:') 
  print('\n'.join(matches)) 
else: 
  print('No phone numbers or email addresses found.')

No phone numbers or email addresses found.
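
The cell above mixes formatting with clipboard access. As a sketch only, the formatting can be pulled into a small helper that returns the string to copy or the fallback message. The name `format_matches` is hypothetical, and dropping repeated entries is an optional addition, not something the original script does.

```python
def format_matches(matches):
    """Join matches one per line, dropping repeats but keeping first-seen order."""
    # dict.fromkeys() keeps the first occurrence of each item, in order
    # (dict insertion order is guaranteed from Python 3.7).
    unique = list(dict.fromkeys(matches))
    if not unique:
        return 'No phone numbers or email addresses found.'
    return '\n'.join(unique)


print(format_matches(['800-767-7734', 'dinero@banxico.org.mx', '800-767-7734']))
print(format_matches([]))
```

The returned string can then be passed to `pyperclip.copy()` and `print()` as in the cell above.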


# Running
* Download the Google Colab notebook as a `.py` file.
* Install pyperclip with the command `pip install pyperclip`.
* Open any webpage, press Ctrl+A to select all of its text, then Ctrl+C to copy it.
* Open Command Prompt (assuming Python is already installed) and run the downloaded script: `python Chapter_7_Project_Phone_Number_and_Address_Extractor.py`

# Example
* Access this webpage: https://www.banxico.org.mx/contacto/contacto.html
* Press Ctrl+A, then Ctrl+C
* Run the Python script
* Check the results

```
D:\Downloads> python Chapter_7_Project_Phone_Number_and_Address_Extractor.py
Copied to clipboard:
800-767-7734
800-226-9426
dinero@banxico.org.mx
```

