## Regular Expressions using the `re` module

In [1]:
# Import the regular expressions library
import re
from pprint import pprint

## Part 1: Toll-free numbers
In this first subsection, you will work through a chunk of text that contains company names and their phone numbers.

From this text, you need to extract each company's name and its phone number.

For example, in the string `Our toll free number is 1800 180 1407`, we want to match `1800 180 1407`.
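A single pattern can be tried on the example sentence before opening any file. `re.search` returns the first match as a match object (or `None` if nothing matches). This sketch uses the same `XXXX XXX XXXX` pattern that the graded cell in this part asks for.

```python
import re

# The example sentence from the task description
sentence = "Our toll free number is 1800 180 1407"

# \d{4}: exactly four digits, \s: one whitespace character, and so on
match = re.search(r"\d{4}\s\d{3}\s\d{4}", sentence)

if match:
    print(match.group())  # 1800 180 1407
```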

In [2]:
# First, open the 'tollfree.txt' file and read its contents into the `phone_nums` variable
with open('tollfree.txt') as f:
    phone_nums = f.read()

In [3]:
# Print the first 100 characters of the file to preview its contents
print(phone_nums[:100])

Toll Free Numbers in India Airlines Indian Airlines 1800 180 1407 Jet Airways 1800 22 5522 Spice Jet


In [4]:
### edTest(test_nums) ###
# Create a regular expression which extracts the toll free phone numbers
# of the form XXXX XXX XXXX
regex = re.compile(r'\d{4}\s\d{3}\s\d{4}')
# Use regex.findall() to get an extracted list of phone numbers
nums = regex.findall(phone_nums)
pprint(nums)

['1800 180 1407',
 '1800 180 3333',
 '1800 180 0101',
 '1800 425 1400',
 '1800 425 6664',
 '1800 424 1800',
 '1800 180 8080',
 '1800 180 1104',
 '1800 180 1225',
 '1901 180 9999',
 '1800 180 8000',
 '1800 425 4255']
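Because the middle group is a fixed `\d{3}`, numbers grouped differently are skipped. The 100-character preview printed earlier contains `Jet Airways 1800 22 5522`, and that number is absent from the list. The sketch below runs the exercise pattern on that preview and compares it with a hypothetical looser pattern (`\d{2,3}` in the middle). The looser pattern is only an illustration and is not what the graded cell expects.

```python
import re

# The preview printed from tollfree.txt (first 100 characters)
snippet = ("Toll Free Numbers in India Airlines Indian Airlines "
           "1800 180 1407 Jet Airways 1800 22 5522 Spice Jet")

# Exercise pattern: 4 digits, 3 digits, 4 digits
strict = re.findall(r"\d{4}\s\d{3}\s\d{4}", snippet)
# Hypothetical variant: allow a 2- or 3-digit middle group
loose = re.findall(r"\d{4}\s\d{2,3}\s\d{4}", snippet)

print(strict)  # ['1800 180 1407']
print(loose)   # ['1800 180 1407', '1800 22 5522']
```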


In [5]:
# Unfortunately, the regular expression above returns only the phone numbers.
# We need one that captures the company name and the phone number together.
# HINT: Use () to make two groups: one that finds the company name and one that finds the number

regex = re.compile(r"(\b[a-zA-Z\s']{1,20})(\d{4}\s\d{3}\s\d{4})")
name_nums = regex.findall(phone_nums)
pprint(name_nums)

[(' Indian Airlines ', '1800 180 1407'),
 (' Spice Jet ', '1800 180 3333'),
 (' Kingfisher ', '1800 180 0101'),
 (' Indian Bank ', '1800 425 1400'),
 (' AMD ', '1800 425 6664'),
 (' Data One Broadband ', '1800 424 1800'),
 (' HCL ', '1800 180 8080'),
 (' Seagate ', '1800 180 1104'),
 (' Xerox ', '1800 180 1225'),
 (' LG ', '1901 180 9999'),
 (' Investments ', '1800 180 8000'),
 (' Templeton Fund ', '1800 425 4255')]
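When a pattern contains capture groups, `re.findall` returns one tuple per match, with one element per group, instead of the whole matched text. That is why each result is a `(name, number)` pair, and why the names still carry the spaces absorbed by `[a-zA-Z\s']`. The sketch below gives the same two groups names with `(?P<name>...)` and reads them through `re.finditer`; the group names `company` and `number` are illustrative choices, and the input is the 100-character preview printed earlier.

```python
import re

snippet = ("Toll Free Numbers in India Airlines Indian Airlines "
           "1800 180 1407 Jet Airways 1800 22 5522 Spice Jet")

# Same pattern as the exercise, with the two groups given names
pattern = re.compile(r"(?P<company>\b[a-zA-Z\s']{1,20})(?P<number>\d{4}\s\d{3}\s\d{4})")

# findall with groups -> list of tuples, one element per group
print(pattern.findall(snippet))  # [(' Indian Airlines ', '1800 180 1407')]

# finditer -> match objects whose groups can be read by name
pairs = [(m.group("company").strip(), m.group("number"))
         for m in pattern.finditer(snippet)]
print(pairs)  # [('Indian Airlines', '1800 180 1407')]
```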


In [6]:
# If you have captured the groups correctly, the code below prints each company's
# name and phone number
print('No.'.ljust(4) + 'Company'.ljust(22) + 'Phone number')
for sr, (company, number) in enumerate(name_nums, start=1):
    print(f'{str(sr).ljust(3)}{company.strip().ljust(20)}: {number}')

No. Company               Phone number
1  Indian Airlines     : 1800 180 1407
2  Spice Jet           : 1800 180 3333
3  Kingfisher          : 1800 180 0101
4  Indian Bank         : 1800 425 1400
5  AMD                 : 1800 425 6664
6  Data One Broadband  : 1800 424 1800
7  HCL                 : 1800 180 8080
8  Seagate             : 1800 180 1104
9  Xerox               : 1800 180 1225
10 LG                  : 1901 180 9999
11 Investments         : 1800 180 8000
12 Templeton Fund      : 1800 425 4255
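The `str.ljust` calls pad each column to a fixed width. The same layout can be written with f-string format specifiers, where `:<N` left-aligns a value in a field of width `N`. The `<` matters for the integer serial number, because numbers are right-aligned by default. A small sketch on two pairs copied from the output above:

```python
# Two (company, number) pairs copied from the output above
rows = [(" Indian Airlines ", "1800 180 1407"), (" AMD ", "1800 425 6664")]

lines = []
for sr, (company, number) in enumerate(rows, start=1):
    ljust_version = f"{str(sr).ljust(3)}{company.strip().ljust(20)}: {number}"
    spec_version = f"{sr:<3}{company.strip():<20}: {number}"
    # Both styles produce identical text
    assert ljust_version == spec_version
    lines.append(spec_version)

print("\n".join(lines))
```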


## Part 2: Email IDs
In this subsection, you will dig through a blob of text for email IDs.

In [7]:
# Next, open the 'emails.txt' file and read its contents with the .read() method
# (assuming the file is UTF-8 encoded, since it contains Devanagari text)
with open('emails.txt', encoding='utf-8') as f:
    email_text = f.read()

In [8]:
# Print the first 800 characters of the file to preview its contents
print(email_text[:800])


INFOHINDIHUB.IN
SUBSCRIBE

SEARCH

Home Contact Us About Us Download Aadhar Computer
.
Free Bulk Email ID List - 1000 Active Email Data For Free 1 To 200
March 03, 2021
अगर आप Email Marketing करते है या करना चाहते है, तो आपको लोगो को email करने पड़ते है जिससे आप की वेबसाइट की marketing होती है. Email Marketing से हम बौहत अच्छी earning कर सकते है लकिन problem यह है की हमे email id कैसे मिले। free bulk email id list कैसे मिले. Free Email List कहा से Download करे. इस website की सहायता से हम आप को ज़ादा से ज़ादा free email address database देंगे। अगर आप google पर search कर रहे है की email id list kahan se nikale. तो आप को कही और जाने की ज़रूरत नही है. आप इस Email database को कॉपी एंड पेस्ट कर के इस्तिमाल कर सकते है.

Bulk Email ID
Aaradhykumar@gmail.com
Aarhantkumar@gmail.com

Aarishkumar@gmail.c


In [9]:
# Create a regular expression that extracts the email IDs from the text blob above,
# of the form xxxx@xxx.xxx (raw string, so '\w' and '\.' reach the regex engine as-is)
regex = re.compile(r"[\w.]+@\w+\.\w+")
# Use regex.findall() to get an extracted list of emails
emails = regex.findall(email_text)
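The pattern `[\w.]+@\w+\.\w+` reads as: one or more word characters or dots, a literal `@`, a word, a literal dot, and another word. It covers addresses such as `Aaradhykumar@gmail.com` from the preview, but it is deliberately simple: a domain with several dots is cut short, and a hyphen ends the part before `@`. The sketch below shows those limits on made-up addresses, alongside a hypothetical variant with a repeated `(?:\.\w+)+` group that accepts multi-part domains. Neither pattern is a full email validator.

```python
import re

simple = re.compile(r"[\w.]+@\w+\.\w+")
# Hypothetical variant: one or more ".part" suffixes after the domain name
multi_dot = re.compile(r"[\w.]+@\w+(?:\.\w+)+")

# Made-up addresses for illustration only
text = "Aaradhykumar@gmail.com first.last@mail.co.in a-b@example.com"

print(simple.findall(text))
# ['Aaradhykumar@gmail.com', 'first.last@mail.co', 'b@example.com']
print(multi_dot.findall(text))
# ['Aaradhykumar@gmail.com', 'first.last@mail.co.in', 'b@example.com']
```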

### ⏸ How many email IDs in the `email_text` string are *Yahoo!* accounts?


#### A. 24
#### B. 14
#### C. 16
#### D. 2

In [10]:
# The number of Yahoo! email IDs is the length of the list returned by findall
# (the dot after 'yahoo' is escaped so that it matches only a literal '.')
print(len(re.findall(r'[\w.]+@yahoo\.\w+', email_text)))

24
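In a pattern fragment such as `yahoo.`, an unescaped `.` matches any single character, so it would also accept text like `yahoo_mail`. Escaping it as `\.` restricts the match to a literal dot. The sketch below contrasts the two forms on made-up addresses and counts matches with `len()` of the `findall` list, the same way the counting cell does.

```python
import re

# Made-up addresses for illustration only
sample = "a@yahoo.com b@yahoo_mail c@gmail.com d@yahoo.co.in"

loose = re.findall(r"[\w.]+@yahoo.\w+", sample)    # '.' matches ANY character
strict = re.findall(r"[\w.]+@yahoo\.\w+", sample)  # '\.' matches only a literal dot

print(len(loose), loose)    # 3 ['a@yahoo.com', 'b@yahoo_mail', 'd@yahoo.co']
print(len(strict), strict)  # 2 ['a@yahoo.com', 'd@yahoo.co']
```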


In [12]:
### edTest(test_chow2) ###

# Submit an answer choice as a string below (e.g., if you choose option C, put 'C')
answer2 = 'A'