<a href="https://colab.research.google.com/github/Ayush-Singh2309/Python2-Shivank/blob/main/08-Regex_notes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Regex

---

## Content

- Regular Expressions
  - Simple pattern matching
  - `.` character
  - `\` character
- Metacharacters
  - Matching digits (and non-digits) - `\d` , `\D`
  - Matching word (and non-word) characters - `\w` , `\W`
  - Matching whitespace characters - `\s` , `\S`
- Anchors
  - Word boundary (and non-boundary) anchors - `\b` , `\B`
  - Beginning and end anchors - `^` , `$`
- Character Set
  - Range notation - `[a-z]` , `[A-Z]` , `[1-6]`
  - Negation - `[^a-z]`
- Quantifiers
  - `*`, `+`, `?`, `{3}`, `{3,4}`
- Groups
  - `()` and `|`
- Functions in `re` library

---

### Why do we study regex?

- It helps in Natural Language Processing.
- It comes handy for text analytics and text processing tasks.
- It helps in searching, parsing and manipulating textual data.

### Business Use Case

- Extracting customer information
- Validating an email
- Masking a phone number

---

### Email Validation

How can we check if an email address is valid?

- The syntax of email addresses is defined by internet standards (for example RFC 5322), and the rules are numerous.
- Even with those rules in hand, validating an email by hand would take a long if-elif-else chain and would still miss some cases.

This is where **regex** comes into action.

In [None]:
import re # importing regex library

def is_vemail(s):
  # Raw string, so the backslashes reach the regex engine unchanged
  email_pattern = r"^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$"
  # Scans through the string for the first location where the pattern matches
  res = re.search(email_pattern, s)
  return bool(res)  # True if a match was found, False otherwise

In [None]:
is_vemail("abcd@gmail")

False

In [None]:
is_vemail("abcd@xy.sx")

True

In [None]:
is_vemail("nikhil.sanghi@scaler-academy.co.in")

True

#### The string `^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$` is called a **Regular Expression**.

1. `^` anchors the match at the start of the string and `$` at the end.
2. `\w+` matches one or more word characters (letters, digits or underscore).
3. Then comes the group `([\.-]?\w+)*`, which may repeat zero or more times.
4. Within the group, `[\.-]?` matches either a dot or a dash, zero or one time.
5. `(\.\w{2,3})+` matches a `.` followed by 2 or 3 word characters, e.g. `.com` or `.in`, one or more times.
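To see how the pieces listed above behave on their own, here is a minimal sketch that tests two of the components separately with `re.fullmatch`. The variable names and sample strings are made up for illustration.

```python
import re

# Components of the email pattern, tested in isolation with re.fullmatch.
# The sample strings below are illustrative assumptions, not part of the dataset.
local_part = r"\w+([\.-]?\w+)*"  # word characters, optionally joined by single dots or dashes
suffix = r"(\.\w{2,3})+"         # one or more ".xx" / ".xxx" endings

checks = {
    "nikhil.sanghi": bool(re.fullmatch(local_part, "nikhil.sanghi")),
    "nikhil..sanghi": bool(re.fullmatch(local_part, "nikhil..sanghi")),
    ".co.in": bool(re.fullmatch(suffix, ".co.in")),
    ".c": bool(re.fullmatch(suffix, ".c")),
}
for sample, ok in checks.items():
    print(f"{sample!r}: {ok}")
```

The double dot is rejected because `[\.-]?` allows at most one separator before the next run of word characters, and `.c` is rejected because `\w{2,3}` needs at least two characters.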

#### So, what is a Regular Expression (RegEx)?

- A regular expression can be thought of as a small, highly specialised language for describing patterns in strings.

#### How to write these regular expressions?
- The rules for writing REs are largely programming-language agnostic, although individual regex engines differ in some details.

---

In [None]:
# Importing necessary libraries -

import re
import pandas as pd
import numpy as np

In [None]:
# Downloading dataset -

!gdown 1sSDV5UspYZL3UUOGuiuxppSGcv1wS9ex

Downloading...
From: https://drive.google.com/uc?id=1sSDV5UspYZL3UUOGuiuxppSGcv1wS9ex
To: /content/data.txt
  0% 0.00/9.33k [00:00<?, ?B/s]100% 9.33k/9.33k [00:00<00:00, 17.0MB/s]


In [None]:
with open("data.txt", "r") as f:  # the file is closed automatically after reading
  data = f.read()
print(data[:500]) # display some data

Dave Martin
615-555-7164
173 Main St., Springfield RI 55924
davemartin@bogusemail.com

Charles Harris
800-555-5669
969 High St., Atlantis VA 34075
charlesharris@bogusemail.com

Eric Williams
560-555-5153
806 1st St., Faketown AK 86847
laurawilliams@bogusemail.com

Corey Jefferson
900-555-9340
826 Elm St., Epicburg NE 10671
coreyjefferson@bogusemail.com

Jennifer Martin-White
714-555-7405
212 Cedar St., Sunnydale CT 74983
jenniferwhite@bogusemail.com

Erick Davis
800-555-6771
519 Washington St., 


---

### Online tool: https://regex101.com/

**Sample Text :**

```
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890

abcdef

.[{()\^$|?*+

scaler.com

321-555-4321
123.555.1234

Mr. Varma
Mr Anant  
Ms Nandini
Mrs. Singh
Mr. T
```

#### Finding simple patterns in this textual content :

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/352/original/1.png?1706165253" width="500" height="550">

- Notice that it couldn't find "ABC". That means the search is case-sensitive.
- **Regular expressions are case-sensitive by default.**

#### Can it find the same string if we jumble up the characters in RE?

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/353/original/2.png?1706165409" width="500" height="500">

- **Order matters (e.g. "abc" is different from "cba")**
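The same two observations (case sensitivity and order) can be reproduced in Python. This is a small sketch on a made-up string rather than the full sample text.

```python
import re

# Illustrative text containing both lower-case and upper-case letters.
text = "abcdef ABCDEF"

lower_hits = re.findall("abc", text)     # matches only the lower-case run
upper_hits = re.findall("ABC", text)     # matches only the upper-case run
reversed_hits = re.findall("cba", text)  # the characters exist, but not in this order

print(lower_hits, upper_hits, reversed_hits)
```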

#### Finding period symbol (dot) "."

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/355/original/3.png?1706165552" width="500" height="500">

- It highlights everything except newline.
- Which means **dot has a special meaning in REs**.

#### Use escape character (backslash) along with dot `.`

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/356/original/4.png?1706165717" width="500" height="500">

- **A backslash followed by a dot (`\.`) matches a literal period character.**
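A short sketch of the difference between the bare dot and the escaped dot in Python, using `scaler.com` from the sample text followed by a newline:

```python
import re

text = "scaler.com\n"

any_char = re.findall(r".", text)      # every character except the newline
literal_dot = re.findall(r"\.", text)  # only the period itself

print(len(any_char), literal_dot)
```

The raw-string prefix `r` keeps the backslash intact so that the regex engine, not Python's string parser, interprets `\.`.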

---

### What are Metacharacters?

- Special characters that don't match themselves.
- Instead, they signal that something out of the ordinary should be matched.
- Example: `.[{()\^$|?*+`
- Note that the backslash is also a metacharacter, so to look for a literal `\` we need to use `\\`.

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/358/original/5.png?1706165965" width="500" height="500">
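As a sketch of how metacharacters can be matched literally in Python: each one is escaped with a backslash, or the whole literal is passed through `re.escape`. The sample string is an assumption made up for this example.

```python
import re

# Raw string, so the text contains exactly one backslash.
text = r"C:\temp (v1.0) costs $5?"

backslashes = re.findall(r"\\", text)  # a literal backslash needs \\ in the pattern
dollars = re.findall(r"\$", text)      # a literal dollar sign

# re.escape adds the backslashes for us when the literal text is known in advance.
escaped = re.escape("(v1.0)")
versions = re.findall(escaped, text)

print(backslashes, dollars, escaped, versions)
```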

#### Metacharacter - `\d`
- matches any single digit

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/359/original/6.png?1706166082" width="500" height="500">

#### Metacharacter - `\D`
- matches any character that is not a digit (including newline)

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/360/original/7.png?1706166217" width="500" height="400">

#### Metacharacter - `\w`
- matches word characters: alphanumeric characters and underscore `(_)`

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/362/original/8.png?1706166344" width="500" height="400">

#### Metacharacter - `\W`

- matches any character that is neither alphanumeric nor an underscore

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/363/original/9.png?1706166420" width="500" height="400">

- The upper-case counterparts negate the match.
- Of these, `\d` and `\w` never match the newline character, whereas `\D` and `\W` do.
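A quick way to check these four character classes in Python is to run them over a tiny string. The string below is an illustrative assumption; it contains a letter, an underscore, a digit, a newline and a dash, so it also shows how each class treats the newline in Python's `re` module.

```python
import re

text = "a_1\n-"

digits = re.findall(r"\d", text)      # digits only
non_digits = re.findall(r"\D", text)  # everything else, newline included
word = re.findall(r"\w", text)        # letters, digits and underscore
non_word = re.findall(r"\W", text)    # everything else, newline included

print(digits, non_digits, word, non_word)
```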

#### Matching whitespace characters

- `\s`
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/364/original/10.png?1706166627" width = "500" height = "400">

- `\S` <br>
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/365/original/11.png?1706166657" width = "500" height = "400">
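A minimal sketch of `\s` and `\S` in Python, on a made-up string that contains a space, a tab and a newline:

```python
import re

text = "Mr Anant\tMs\nNandini"

whitespace = re.findall(r"\s", text)  # space, tab and newline all count as whitespace
tokens = re.findall(r"\S+", text)     # runs of non-whitespace characters

print(whitespace, tokens)
```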

### Other Special Characters

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/366/original/12.png?1706166741" width="500" height="300">

**Documentation:** https://docs.python.org/3/howto/regex.html

---

### What are Anchors?

- They don't match any characters.
- They match invisible positions before or after the characters.

#### `\b` - word boundary

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/368/original/13.png?1706166971" width="500" height="400"><br>

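- Notice that when we search for `Ha` preceded by a word boundary, it matches only the first two `Ha`.
- A word boundary is the position between a word character and a non-word character, so both the **start of the line and the position after a whitespace count as word boundaries**.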
<br>

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/369/original/14.png?1706166984" width="500" height="400">
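The behaviour of `\b` and `\B` on the `Ha HaHa` line can be sketched in Python by printing where each match starts:

```python
import re

text = "Ha HaHa"

# \bHa: "Ha" that begins at a word boundary
boundary_starts = [m.start() for m in re.finditer(r"\bHa", text)]
# \BHa: "Ha" that does NOT begin at a word boundary
non_boundary_starts = [m.start() for m in re.finditer(r"\BHa", text)]

print(boundary_starts, non_boundary_starts)
```

The third `Ha` (index 5) directly follows another word character, so only `\B` matches there.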

Let's look at two very important anchors, `^` and `$`:

 - `^` matches a pattern only if it is at the beginning of the string.
 - `$` matches a pattern only if it is at the end of the string.

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/370/original/15.png?1706167186" width="500" height="400">
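A small sketch of both anchors in Python, on a two-line string made up for this example. By default `^` and `$` refer to the whole string; with the `re.MULTILINE` flag they also match at the start and end of each line.

```python
import re

text = "abcdef\n1234567890"

starts_with_abc = re.search(r"^abc", text) is not None   # "abc" opens the string
starts_with_123 = re.search(r"^123", text) is not None   # "123" opens a line, not the string
ends_with_890 = re.search(r"890$", text) is not None     # "890" closes the string
ends_with_def = re.search(r"def$", text) is not None     # "def" closes a line, not the string

# With re.MULTILINE the anchors apply per line.
line_starts_with_123 = re.search(r"^123", text, re.MULTILINE) is not None

print(starts_with_abc, starts_with_123, ends_with_890, ends_with_def, line_starts_with_123)
```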

---

#### Parse phone numbers in the given sample text -

**Sample Text :**

```
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890

Ha HaHa

.[{()^$|?*+


scaler.com
321-555-4321
123.555.1234


Mr. Varma
Mr Anant
Ms Nandini
Mrs. Singh
Mr. T
```

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/371/original/16.png?1706167394" width="500" height="400"> <br>

Replace "1234567890" with "123456789012"

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/372/original/17.png?1706167532" width="500" height="400"> <br>

- We need to be more specific about only allowing dot or dash.
- Currently, it matches the wrong pattern rather than what we actually want.
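The problem can be reproduced in Python. The pattern below is an assumption about what the screenshots use (three digits, any character, three digits, any character, four digits); the text is a shortened version of the sample.

```python
import re

# Assumed "loose" pattern: the dot accepts ANY character as a separator.
loose = r"\d\d\d.\d\d\d.\d\d\d\d"

text = "123456789012\n321-555-4321\n123.555.1234"
loose_hits = re.findall(loose, text)

print(loose_hits)
```

The 12-digit number is matched as well, because the dot is happy to consume a digit where a separator was intended.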

---

### Character Set

- A character set matches any one of the characters listed inside the square brackets.
- For example: `[.-]` matches either a dot or a dash.

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/373/original/18.png?1706167850" width = "500" height = "400">

**Note:** Instead of listing all potential characters that you want to match, you can also provide the range.
1. `[1-6]` - Matches any digit from 1 to 6.
2. `[a-z]` - Matches any lower-case letter from a to z.

#### What if we want to match the characters except the ones mentioned in the set?

- Use `^` at the start of the set to negate it, e.g. `[^a-z]` matches every character that is not a lower-case letter. <br>

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/375/original/19.png?1706168031" width="500" height="400">
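A minimal Python sketch of a plain set, a range and a negated set. The sample strings are illustrative assumptions.

```python
import re

text = "321-555-4321 123.555.1234 123*555*1234"

# Only a dot or a dash is accepted as the separator.
phones = re.findall(r"\d\d\d[.-]\d\d\d[.-]\d\d\d\d", text)

one_to_six = re.findall(r"[1-6]", "1234567890")  # range notation
not_lower = re.findall(r"[^a-z]", "abC d1")      # negated set

print(phones, one_to_six, not_lower)
```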

---

### Quantifiers

- Quantifiers specify how many times the preceding token must occur, which lets us match more than one character at a time.

```
* - 0 or More
+ - 1 or More
? - 0 or One
{3} - Exact Number
{3,4} - Range of Numbers (Minimum, Maximum)
```

Example: `\d{3}[.-]\d{3}[-.]\d{4}`

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/376/original/20.png?1706168167" width="500" height="400">
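The example pattern can be tried in Python as follows. This is a sketch on a shortened version of the sample text; the second call shows the `{min,max}` form on a made-up string.

```python
import re

pattern = r"\d{3}[.-]\d{3}[-.]\d{4}"
text = "321-555-4321\n123.555.1234\n1234567890"

phones = re.findall(pattern, text)  # the plain 10-digit number has no separators

# {3,4} takes between three and four digits, preferring the longer match.
three_or_four = re.findall(r"\d{3,4}", "12 123 12345")

print(phones, three_or_four)
```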

#### Special Characters

- `+` - 1 or more. It ensures that the preceding token occurs at least once.

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/062/015/original/9.png?1705320374" width="300" height="200">

- `*` - 0 or more. The preceding token may occur any number of times, including not at all.
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/062/017/original/10.png?1705320405" width="300" height="200">

- `?` - 0 or 1. The preceding token is optional: it occurs once or not at all.

- `()` - Groups tokens together; combined with `|`, it matches any one of a group of alternatives.

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/062/018/original/11.png?1705320435" width="300" height="200">
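The difference between `+`, `*` and `?` is easiest to see side by side. A small sketch with made-up strings:

```python
import re

text = "H Ha Haaa"

plus_hits = re.findall(r"Ha+", text)  # "H" followed by at least one "a"
star_hits = re.findall(r"Ha*", text)  # "H" followed by any number of "a", even none

# "s" is optional, but "r" is still required.
optional_hits = re.findall(r"Mrs?", "Mr Mrs Ms")

print(plus_hits, star_hits, optional_hits)
```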

---

#### Search for names starting with Mr -

- `\bMr\.?\s[A-Z][a-z]*`

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/377/original/21.png?1706168240" width="500" height="400">

**Question:** How can we add Ms and Mrs to the search pattern?
- By using groups.

### What are Groups?

- Groups allow us to match any one of several alternative patterns, separated by `|`.
- We can create groups using parentheses `()`.
- Example: `M(r|s|rs)\.?\s[A-Z]\w*`

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/063/379/original/22.png?1706168368" width="500" height="400">
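One point worth knowing when using this pattern in Python: when a pattern contains a capturing group, `re.findall` returns the group's content rather than the whole match. The sketch below shows that, plus two ways to get the full names (`finditer` with `.group()`, or a non-capturing group `(?:...)`).

```python
import re

text = "Mr. Varma\nMr Anant\nMs Nandini\nMrs. Singh\nMr. T"
pattern = r"M(r|s|rs)\.?\s[A-Z]\w*"

# With a capturing group, findall returns only what the group captured.
group_only = re.findall(pattern, text)

# Option 1: iterate over match objects and take the whole match.
full_matches = [m.group() for m in re.finditer(pattern, text)]

# Option 2: make the group non-capturing with (?:...).
non_capturing = re.findall(r"M(?:r|s|rs)\.?\s[A-Z]\w*", text)

print(group_only)
print(full_matches)
```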

---

### Important `re` functions -

- `match` : Checks for a match only at the beginning of the string.
- `search` : Finds the first occurrence of the pattern in the data.
- `findall` : Finds all occurrences of the pattern in the data and returns them as a list.
- `finditer` : Finds all occurrences of the pattern and returns an iterator that yields a match object for each one.

#### Function to do string manipulation:

- `sub` : Replaces the matches of a pattern with a replacement string.
- `split` : Splits the text at each match of the given pattern.
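Before applying these functions to the dataset, here is a minimal sketch of `match` versus `search`, and of `sub` and `split`, on a short made-up string:

```python
import re

text = "321-555-4321, 123.555.1234"

at_start = re.match(r"\d{3}", text)  # the string begins with three digits
not_at_start = re.match(r"555", text)  # "555" exists, but not at the beginning
first_555 = re.search(r"555", text)    # search looks anywhere in the string

spaced = re.sub(r"[.-]", " ", text)  # replace every dot or dash with a space
parts = re.split(r",\s*", text)      # split on a comma plus optional whitespace

print(at_start.group(), not_at_start, first_555.start())
print(spaced)
print(parts)
```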

---

### Extracting phone number -

#### `re.match()` -

In [None]:
pattern = r"\d{3}-\d{3}-\d{4}"  # raw string keeps the backslashes intact
print(re.match(pattern, data))

None


#### `re.search()` -

In [None]:
pattern = r"\d{3}-\d{3}-\d{4}"
print(re.search(pattern, data))

<re.Match object; span=(12, 24), match='615-555-7164'>


#### `re.findall()` -

In [None]:
pattern = r"\d{3}-\d{3}-\d{4}"
phone_numbers = re.findall(pattern, data)
print(phone_numbers)

['615-555-7164', '800-555-5669', '560-555-5153', '900-555-9340', '714-555-7405', '800-555-6771', '783-555-4799', '516-555-4615', '127-555-1867', '608-555-4938', '568-555-6051', '292-555-1875', '900-555-3205', '614-555-1166', '530-555-2676', '470-555-2750', '800-555-6089', '880-555-8319', '777-555-8378', '998-555-7385', '800-555-7100', '903-555-8277', '196-555-5674', '900-555-5118', '905-555-1630', '203-555-3475', '884-555-8444', '904-555-8559', '889-555-7393', '195-555-2405', '321-555-9053', '133-555-1711', '900-555-5428', '760-555-7147', '391-555-6621', '932-555-7724', '609-555-7908', '800-555-8810', '149-555-7657', '130-555-9709', '143-555-9295', '903-555-9878', '574-555-3194', '496-555-7533', '210-555-3757', '900-555-9598', '866-555-9844', '669-555-7159', '152-555-7417', '893-555-9832', '217-555-7123', '786-555-6544', '780-555-2574', '926-555-8735', '895-555-3539', '874-555-3949', '800-555-2420', '936-555-6340', '372-555-9809', '890-555-5618', '670-555-3005', '509-555-5997', '721-55

#### `re.finditer()` -

In [None]:
pattern = r"\d{3}-\d{3}-\d{4}"
numbers = re.finditer(pattern, data)
for i, num in enumerate(numbers):
  print(num)
  if i == 5:  # stop after the first six matches
    break

<re.Match object; span=(12, 24), match='615-555-7164'>
<re.Match object; span=(102, 114), match='800-555-5669'>
<re.Match object; span=(191, 203), match='560-555-5153'>
<re.Match object; span=(281, 293), match='900-555-9340'>
<re.Match object; span=(378, 390), match='714-555-7405'>
<re.Match object; span=(467, 479), match='800-555-6771'>


#### How can we extract the location and content from these `re.Match` objects?

In [None]:
pattern = r"\d{3}-\d{3}-\d{4}"
numbers = re.finditer(pattern, data)
for i, num in enumerate(numbers):
  # num.group() returns the matched text; num.start() and num.end() return the span
  print(num.group(), num.start(), num.end())
  if i == 5:
    break

615-555-7164 12 24
800-555-5669 102 114
560-555-5153 191 203
900-555-9340 281 293
714-555-7405 378 390
800-555-6771 467 479


---

### Extracting email -

general structure - `(string1)@(string2).(2+characters)`

In [None]:
pattern = r'\w+@\w+\.\w{2,3}'
emails = re.finditer(pattern, data)
for i, email in enumerate(emails):
  print(email)
  if(i==5):
    break

<re.Match object; span=(60, 85), match='davemartin@bogusemail.com'>
<re.Match object; span=(147, 175), match='charlesharris@bogusemail.com'>
<re.Match object; span=(235, 263), match='laurawilliams@bogusemail.com'>
<re.Match object; span=(325, 354), match='coreyjefferson@bogusemail.com'>
<re.Match object; span=(425, 453), match='jenniferwhite@bogusemail.com'>
<re.Match object; span=(517, 540), match='tomdavis@bogusemail.com'>


Let's see if our old generic pattern would have worked or not -




In [None]:
pattern = r'\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+'
emails = re.finditer(pattern, data)
for i, email in enumerate(emails):
  print(email)
  if i == 5:
    break

<re.Match object; span=(60, 85), match='davemartin@bogusemail.com'>
<re.Match object; span=(147, 175), match='charlesharris@bogusemail.com'>
<re.Match object; span=(235, 263), match='laurawilliams@bogusemail.com'>
<re.Match object; span=(325, 354), match='coreyjefferson@bogusemail.com'>
<re.Match object; span=(425, 453), match='jenniferwhite@bogusemail.com'>
<re.Match object; span=(517, 540), match='tomdavis@bogusemail.com'>


---

### Extracting name -

Names follow the pattern: `String1 String2`
- The first letter of string 1 is upper case, followed by lower-case letters.
- Then comes a whitespace character.
- The first letter of string 2 is also upper case, followed by lower-case letters.

In [None]:
pattern = r'[A-Z][a-z]*\s[A-Z][a-z]*'

names = re.finditer(pattern, data)
for i, name in enumerate(names):
  print(name)
  if(i==5):
    break

<re.Match object; span=(0, 11), match='Dave Martin'>
<re.Match object; span=(29, 36), match='Main St'>
<re.Match object; span=(39, 52), match='Springfield R'>
<re.Match object; span=(87, 101), match='Charles Harris'>
<re.Match object; span=(119, 126), match='High St'>
<re.Match object; span=(129, 139), match='Atlantis V'>


**Takeaways?**
- Along with names, we are also getting parts of addresses, e.g. Main St, High St, etc.

**Why?**
- Because they also fit the pattern: a capital letter followed by zero or more lower-case letters, a whitespace, and another capital letter followed by zero or more lower-case letters.
- Therefore, we tighten the pattern by requiring at least two lower-case letters after the second capital letter, using `{2,}`.

In [None]:
pattern = r'[A-Z][a-z]*\s[A-Z][a-z]{2,}'

names = re.finditer(pattern, data)
for i, name in enumerate(names):
  print(name)
  if(i==5):
    break

<re.Match object; span=(0, 11), match='Dave Martin'>
<re.Match object; span=(87, 101), match='Charles Harris'>
<re.Match object; span=(177, 190), match='Eric Williams'>
<re.Match object; span=(265, 280), match='Corey Jefferson'>
<re.Match object; span=(356, 371), match='Jennifer Martin'>
<re.Match object; span=(455, 466), match='Erick Davis'>


---

### Modifying RE with flags -

#### `re.IGNORECASE` or `re.I`

- Makes matching case-insensitive.

In [None]:
print(re.search('a+', 'aaaAAA'))
print(re.search('A+', 'aaaAAA'))

<re.Match object; span=(0, 3), match='aaa'>
<re.Match object; span=(3, 6), match='AAA'>


Let's try to do the same pattern matching while ignoring case.

In [None]:
print(re.search('a+', 'aaaAAA', re.I))
print(re.search('A+', 'aaaAAA', re.IGNORECASE))

<re.Match object; span=(0, 6), match='aaaAAA'>
<re.Match object; span=(0, 6), match='aaaAAA'>


#### `re.VERBOSE` or `re.X`

Allows us to add
- Better spacing, indentation, and a clean formatting for writing intricate patterns.
- Comments right inside the pattern for later reference using the hash sign `#`.

We can use multiple flags together by simply combining them using `|`.
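As a small sketch of combining flags, the snippet below joins `re.VERBOSE`, `re.IGNORECASE` and `re.MULTILINE` with `|`. The text and pattern are made up for illustration: the pattern looks for a line that starts with "mr" in any letter case.

```python
import re

text = "Mr. Varma\nmr anant\nMs Nandini"

pattern = r"""
    ^mr      # title at the start of a line (any letter case)
    \.?      # optional dot
    \s\w+    # one whitespace character, then the name
"""

# Flags are combined with the bitwise OR operator.
flags = re.VERBOSE | re.IGNORECASE | re.MULTILINE
matches = re.findall(pattern, text, flags)

print(matches)
```

In verbose mode, whitespace inside the pattern is ignored and everything after `#` is treated as a comment, so a literal space has to be written as `\s` or escaped.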

In [None]:
regex = r'^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$' # highly non-readable regex

regex_verbose = re.compile(r"""   # Very readable and easy to understand.
                ^\w+([\.-]?\w+)*  # Start of string, then the local part
                @                 # Single @ sign
                \w+([\.-]?\w+)*   # Domain name
                (\.\w{2,3})+$     # .com, .ac.in etc., then end of string
                """, re.VERBOSE | re.IGNORECASE)

---

### Extracting pincode -

In [None]:
print(data[:263])

Dave Martin
615-555-7164
173 Main St., Springfield RI 55924
davemartin@bogusemail.com

Charles Harris
800-555-5669
969 High St., Atlantis VA 34075
charlesharris@bogusemail.com

Eric Williams
560-555-5153
806 1st St., Faketown AK 86847
laurawilliams@bogusemail.com


In [None]:
regex = r'\b(\d{5})\b'
matches = re.findall(regex,data)
print(matches)

['55924', '34075', '86847', '10671', '74983', '32425', '61914', '29947', '43597', '90938', '99000', '87282', '28362', '92474', '61967', '56526', '97152', '82767', '72160', '97183', '58176', '89212', '96962', '77737', '34615', '30826', '29348', '94854', '21888', '59348', '74122', '99431', '16576', '25668', '29540', '60758', '78172', '79714', '87195', '85386', '57112', '64102', '17880', '46692', '78455', '29087', '11899', '43281', '78036', '62260', '66724', '18586', '16272', '89569', '54999', '89260', '61275', '88289', '75205', '36433', '25473', '30958', '62155', '57680', '55462', '51312', '72025', '78862', '13147', '92369', '29551', '23225', '81427', '96421', '96698', '98412', '26245', '56449', '97503', '49113', '47472', '11845', '74526', '26941', '47182', '22772', '73725', '47466', '73860', '82473', '58720', '57764', '31836', '31169', '81541', '15445', '22215', '39308', '16547', '24886']


- `\b`: Word boundary to ensure that we match complete words (pin codes).
- `(\d{5})`: This is a capturing group that matches exactly five digits.
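To see what the word boundaries contribute, here is a sketch comparing the pattern with and without `\b` on a made-up string that contains a pin code, a longer number and a phone number:

```python
import re

text = "55924 1234567 615-555-7164"

with_boundaries = re.findall(r"\b(\d{5})\b", text)  # exactly five digits as a whole word
without_boundaries = re.findall(r"\d{5}", text)     # also grabs five digits from a longer number

print(with_boundaries, without_boundaries)
```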

---

### Masking email -

In [None]:
pattern = r'\w+@\w+\.[a-z]{3}'  # the dot is escaped so that it matches a literal "."
emails = re.findall(pattern,data)
print(emails)

['davemartin@bogusemail.com', 'charlesharris@bogusemail.com', 'laurawilliams@bogusemail.com', 'coreyjefferson@bogusemail.com', 'jenniferwhite@bogusemail.com', 'tomdavis@bogusemail.com', 'neilpatterson@bogusemail.com', 'laurajefferson@bogusemail.com', 'mariajohnson@bogusemail.com', 'michaelarnold@bogusemail.com', 'michaelsmith@bogusemail.com', 'robertstuart@bogusemail.com', 'lauramartin@bogusemail.com', 'barbaramartin@bogusemail.com', 'lindajackson@bogusemail.com', 'stevemiller@bogusemail.com', 'davearnold@bogusemail.com', 'jenniferjacobs@bogusemail.com', 'neilwilson@bogusemail.com', 'kurtjackson@bogusemail.com', 'maryjacobs@bogusemail.com', 'michaelwhite@bogusemail.com', 'jenniferjenkins@bogusemail.com', 'samwright@bogusemail.com', 'johndavis@bogusemail.com', 'neildavis@bogusemail.com', 'laurajackson@bogusemail.com', 'johnwilliams@bogusemail.com', 'michaelmartin@bogusemail.com', 'maggiebrown@bogusemail.com', 'kurtwilson@bogusemail.com', 'elizabetharnold@bogusemail.com', 'janemartin@bog

In [None]:
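def mask_email(s):
  # Keep the first and last character of the local part and hide the rest
  if '@' not in s:
    return s  # not an email address; return it unchanged
  name, domain = s.split('@', 1)
  return f"{name[0]}#####{name[-1]}@{domain}"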

In [None]:
print([mask_email(e) for e in emails])

['d#####n@bogusemail.com', 'c#####s@bogusemail.com', 'l#####s@bogusemail.com', 'c#####n@bogusemail.com', 'j#####e@bogusemail.com', 't#####s@bogusemail.com', 'n#####n@bogusemail.com', 'l#####n@bogusemail.com', 'm#####n@bogusemail.com', 'm#####d@bogusemail.com', 'm#####h@bogusemail.com', 'r#####t@bogusemail.com', 'l#####n@bogusemail.com', 'b#####n@bogusemail.com', 'l#####n@bogusemail.com', 's#####r@bogusemail.com', 'd#####d@bogusemail.com', 'j#####s@bogusemail.com', 'n#####n@bogusemail.com', 'k#####n@bogusemail.com', 'm#####s@bogusemail.com', 'm#####e@bogusemail.com', 'j#####s@bogusemail.com', 's#####t@bogusemail.com', 'j#####s@bogusemail.com', 'n#####s@bogusemail.com', 'l#####n@bogusemail.com', 'j#####s@bogusemail.com', 'm#####n@bogusemail.com', 'm#####n@bogusemail.com', 'k#####n@bogusemail.com', 'e#####d@bogusemail.com', 'j#####n@bogusemail.com', 't#####n@bogusemail.com', 'l#####n@bogusemail.com', 't#####s@bogusemail.com', 'j#####r@bogusemail.com', 'j#####t@bogusemail.com', 's#####e@bo
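An alternative sketch: `re.sub` accepts a function as the replacement, so emails can be masked in place inside a larger text without extracting them first. The helper name `mask_match` and the shortened text are assumptions made for this example.

```python
import re

def mask_match(m):
    # m.group(1) is the part before "@"; m.group(2) is "@" plus the domain.
    name = m.group(1)
    return f"{name[0]}#####{name[-1]}{m.group(2)}"

text = "Dave Martin\n615-555-7164\ndavemartin@bogusemail.com"

# The function is called once per match and its return value replaces the match.
masked_text = re.sub(r"(\w+)(@\w+\.[a-z]{2,3})", mask_match, text)

print(masked_text)
```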

---

### Masking phone number -

In [None]:
pattern = r"\d{3}-\d{3}-\d{4}"
numbers = re.findall(pattern,data)
print(numbers)

['615-555-7164', '800-555-5669', '560-555-5153', '900-555-9340', '714-555-7405', '800-555-6771', '783-555-4799', '516-555-4615', '127-555-1867', '608-555-4938', '568-555-6051', '292-555-1875', '900-555-3205', '614-555-1166', '530-555-2676', '470-555-2750', '800-555-6089', '880-555-8319', '777-555-8378', '998-555-7385', '800-555-7100', '903-555-8277', '196-555-5674', '900-555-5118', '905-555-1630', '203-555-3475', '884-555-8444', '904-555-8559', '889-555-7393', '195-555-2405', '321-555-9053', '133-555-1711', '900-555-5428', '760-555-7147', '391-555-6621', '932-555-7724', '609-555-7908', '800-555-8810', '149-555-7657', '130-555-9709', '143-555-9295', '903-555-9878', '574-555-3194', '496-555-7533', '210-555-3757', '900-555-9598', '866-555-9844', '669-555-7159', '152-555-7417', '893-555-9832', '217-555-7123', '786-555-6544', '780-555-2574', '926-555-8735', '895-555-3539', '874-555-3949', '800-555-2420', '936-555-6340', '372-555-9809', '890-555-5618', '670-555-3005', '509-555-5997', '721-55

In [None]:
def mask_phone(p):
  # Hide everything except the last three digits
  if len(p) == 12:
    return f"###-###-{p[-3:]}"
  return p  # unexpected length; return it unchanged

In [None]:
print([mask_phone(n) for n in numbers])

['###-###-164', '###-###-669', '###-###-153', '###-###-340', '###-###-405', '###-###-771', '###-###-799', '###-###-615', '###-###-867', '###-###-938', '###-###-051', '###-###-875', '###-###-205', '###-###-166', '###-###-676', '###-###-750', '###-###-089', '###-###-319', '###-###-378', '###-###-385', '###-###-100', '###-###-277', '###-###-674', '###-###-118', '###-###-630', '###-###-475', '###-###-444', '###-###-559', '###-###-393', '###-###-405', '###-###-053', '###-###-711', '###-###-428', '###-###-147', '###-###-621', '###-###-724', '###-###-908', '###-###-810', '###-###-657', '###-###-709', '###-###-295', '###-###-878', '###-###-194', '###-###-533', '###-###-757', '###-###-598', '###-###-844', '###-###-159', '###-###-417', '###-###-832', '###-###-123', '###-###-544', '###-###-574', '###-###-735', '###-###-539', '###-###-949', '###-###-420', '###-###-340', '###-###-809', '###-###-618', '###-###-005', '###-###-997', '###-###-632', '###-###-567', '###-###-830', '###-###-426', '###-###-
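The same idea can be sketched with `re.sub` and a backreference, which masks the numbers directly inside the text. Note that, unlike `mask_phone` above, this variant keeps all four digits of the last block; the shortened text is an assumption made for this example.

```python
import re

text = "Dave Martin\n615-555-7164"

# (\d{4}) captures the last block; \1 in the replacement inserts it back.
masked_text = re.sub(r"\d{3}-\d{3}-(\d{4})", r"###-###-\1", text)

print(masked_text)
```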

---