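# Assignment goal: use regular expressions to produce the expected matches
This assignment uses the interactive site [Regex101](https://regex101.com/) for practice. Copy each test text into Regex101's **TEST STRING** box and experiment there.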

### HW1: Matching phone numbers

Extract the area code and the subscriber number from each phone number.

```
ex: 02-33334444 --> match 02, 33334444
```


**Test text:**
```
02-27208889
04-2220-3585
(06)-2991111
(07)799-5678
```

**Expected matches:**
```
02, 27208889
04, 22203585
06, 2991111
07, 7995678
```
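One way to get the area code and the number as separate values is to use two capture groups and strip the internal hyphens afterwards. The following is a minimal sketch under that assumption; the names `phone_re` and `pairs` are illustrative and not part of the assignment.

```python
import re

# Sample numbers from the exercise, one per line.
text = """02-27208889
04-2220-3585
(06)-2991111
(07)799-5678"""

# Group 1: two-digit area code, with or without parentheses.
# Group 2: the remaining digits, which may contain internal hyphens.
phone_re = re.compile(r"\(?(\d{2})\)?-?(\d[\d-]*\d)")

pairs = []
for m in phone_re.finditer(text):
    # Remove hyphens inside the subscriber number, e.g. 2220-3585 -> 22203585.
    area, number = m.group(1), m.group(2).replace("-", "")
    pairs.append((area, number))
    print(f"{area}, {number}")
```

Regex101 shows the same two groups in its MATCH INFORMATION panel, although the hyphen removal has to happen outside the pattern.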

In [28]:
import re

txt = "SaveTheWorld@hotmail.com \n foobar@gmail.com"
pattern = r"(.*)@(?!gmail)\w+\.com"

# Build a compiled pattern object
pattern_obj = re.compile(pattern=pattern)
# Match using pattern_obj.search
x1 = pattern_obj.search(txt)
print("type(x1):", type(x1))
print(x1.group())

# Or use re.search directly
x2 = re.search(pattern, txt)
print("type(x2):", type(x2))
print(x2.group())

print("-"*50)

# \(?\d{2}\)?-?[\d-]+
str_no = '02-27208889 \n 04-2220-3585 \n (06)-2991111 \n (07)799-5678'
pattern = r'\(?\d{2}\)?-?[\d-]+'  # raw string so the backslashes reach the regex engine untouched

pattern_obj = re.compile(pattern=pattern)
print("pattern_obj:", pattern_obj)
print("type(pattern_obj):", type(pattern_obj))
# search() returns only the first match in str_no
matches = pattern_obj.search(str_no)
print("matches:", matches)
print("matches.start():", matches.start())
print("matches.end():", matches.end())
print("matches.group():", matches.group())
print("matches.group(0):", matches.group(0))
#print("type(matches.group()):", type(matches.group()))

print("-"*50)

# re.findall()
matches = re.findall(pattern, str_no)
print("matches:", matches)
print("type(matches):", type(matches))

print("-"*50)

# re.finditer()
matches = re.finditer(pattern, str_no)
print("matches:", matches)
print("type(matches):", type(matches))
for m in matches:
    print(m)
    print(m.group())


type(x1): <class '_sre.SRE_Match'>
SaveTheWorld@hotmail.com
type(x2): <class '_sre.SRE_Match'>
SaveTheWorld@hotmail.com
--------------------------------------------------
pattern_obj: re.compile('\\(?\\d{2}\\)?-?[\\d-]+')
type(pattern_obj): <class '_sre.SRE_Pattern'>
matches: <_sre.SRE_Match object; span=(0, 11), match='02-27208889'>
matches.start(): 0
matches.end(): 11
matches.group(): 02-27208889
matches.group(0): 02-27208889
--------------------------------------------------
matches: ['02-27208889', '04-2220-3585', '(06)-2991111', '(07)799-5678']
type(matches): <class 'list'>
--------------------------------------------------
matches: <callable_iterator object at 0x7fd828f040f0>
type(matches): <class 'callable_iterator'>
<_sre.SRE_Match object; span=(0, 11), match='02-27208889'>
02-27208889
<_sre.SRE_Match object; span=(14, 26), match='04-2220-3585'>
04-2220-3585
<_sre.SRE_Match object; span=(29, 41), match='(06)-2991111'>
(06)-2991111
<_sre.SRE_Match object; span=(44, 56), match='(

### HW2: Matching national ID numbers
Match the ID numbers of males (the digit after the letter is 1) registered in Taoyuan (H), Tainan (D), or Chiayi (Q).

**Test text:**
```
A121040176
L186856359
Z127598010
I114537095
D279884447
L186834359
D243736345
I114537095
Q146110887
D187217314
I114537095
Q243556025
Z127598010
H250077453
Q188367037
```

**Expected matches:**
```
Q146110887
D187217314
Q188367037
```

In [39]:
import re

str_data = 'A121040176 L186856359 Z127598010 I114537095 D279884447 L186834359 D243736345 I114537095 Q146110887 \
D187217314 I114537095 Q243556025 Z127598010 H250077453 Q188367037'
pattern = r'[HDQ]1\d{8}'

re.findall(pattern, str_data)

['Q146110887', 'D187217314', 'Q188367037']
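
The pattern in the cell above is unanchored, so it would also match the first ten characters of a longer token. A hedged variant adds `\b` on both sides; the shortened sample string and the over-long token `Q1461108879` below are made up for illustration.

```python
import re

ids = "A121040176 D279884447 Q146110887 D187217314 H250077453 Q188367037"

# \b on both sides keeps the pattern from matching inside a longer token;
# [HDQ] is the region letter, 1 marks a male holder, \d{8} is the rest.
male_id_re = re.compile(r"\b[HDQ]1\d{8}\b")

matches = male_id_re.findall(ids)
print(matches)

# Hypothetical over-long token: rejected with \b, while the unanchored
# pattern still picks out its first ten characters.
bounded = male_id_re.findall("Q1461108879")
unbounded = re.findall(r"[HDQ]1\d{8}", "Q1461108879")
print(bounded, unbounded)
```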

### HW3: Matching email addresses
Extract the email addresses that are not Gmail addresses.

**Test text:**
```
foobar@gmail.com
NoOneCareMe@gmail.com
SaveTheWorld@hotmail.com
zzzGroup@yahoo.com
eagle1963@gmail.com
maythefourthwithyiu@starwars.com
```

**Expected matches:**
```
SaveTheWorld@hotmail.com
zzzGroup@yahoo.com
maythefourthwithyiu@starwars.com
```


In [75]:
str_data = 'foobar@gmail.com NoOneCareMe@gmail.com SaveTheWorld@hotmail.com zzzGroup@yahoo.com eagle1963@gmail.com maythefourthwithyiu@starwars.com'
pattern = r'\S+@(?!gmail)\w+\.com'

re.findall(pattern, str_data)

['SaveTheWorld@hotmail.com',
 'zzzGroup@yahoo.com',
 'maythefourthwithyiu@starwars.com']
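
The lookahead `(?!gmail)` in the cell above rejects every domain that merely starts with "gmail". If only the exact label should be excluded, the lookahead can include the following dot. This is a sketch of that idea; `gmailfans.com` is a hypothetical domain used only to show the difference.

```python
import re

emails = (
    "foobar@gmail.com NoOneCareMe@gmail.com SaveTheWorld@hotmail.com "
    "zzzGroup@yahoo.com eagle1963@gmail.com maythefourthwithyiu@starwars.com"
)

# (?!gmail\.) rejects only the exact domain label "gmail";
# \b after "com" stops the match at the end of the address.
non_gmail_re = re.compile(r"\S+@(?!gmail\.)\w+\.com\b")

non_gmail = non_gmail_re.findall(emails)
print(non_gmail)

# Hypothetical look-alike domain: still accepted by this variant.
lookalike = non_gmail_re.findall("a@gmailfans.com")
print(lookalike)
```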

### HW4: Matching HTML tags

Extract only the tag name from each <TAG>; exclude any attributes inside it.

```
ex: <p class='test'> --> extract p
```

**Test text:**
```
<h1>This is a header 1</h1>
<a>This is a hyperlink</a>
<div class='test'>This is a text block</div>
<a href="https://regexisfun.com.tw/">Learning Regular Expression</a>
```

**Expected matches:**
```
h1
a
div
a
```

In [114]:
str_data = "<h1>This is a header 1</h1> <a>This is a hyperlink</a> <div class='test'>This is a text block</div> \
<a href='https://regexisfun.com.tw/'>Learning Regular Expression</a>"

pattern = r'(?<=<)[^\/]\w*'
re.findall(pattern, str_data)

['h1', 'a', 'div', 'a']
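
An alternative to the lookbehind is a capture group that requires a letter right after `<`, which skips closing tags without a negated character class. The sketch below assumes tag names consist of ASCII letters and digits; `tag_re` is an illustrative name.

```python
import re

html = (
    "<h1>This is a header 1</h1> <a>This is a hyperlink</a> "
    "<div class='test'>This is a text block</div> "
    "<a href='https://regexisfun.com.tw/'>Learning Regular Expression</a>"
)

# "<" followed by a letter starts an opening tag; closing tags begin
# with "</" and are skipped because "/" is not a letter.
tag_re = re.compile(r"<([A-Za-z][A-Za-z0-9]*)")

# With exactly one capture group, findall returns the group contents.
tags = tag_re.findall(html)
print(tags)
```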

### HW5: Matching specific file names and extensions

Among all the files, extract the names of those that are gif or jpg files.


**Test text:**
```
.bash_profile
workShop.ai
file_folderName_num.jpg
favicon.png
IMG_002.png
IMG_003.gif
qoo.jpg.tmp
index.html
foobar.bmp
foobar.jpg
account.html
access.lock
```

**Expected matches:**
```
IMG_003.gif
file_folderName_num.jpg
foobar.jpg
```

In [122]:
str_data = '.bash_profile workShop.ai file_folderName_num.jpg favicon.png IMG_002.png IMG_003.gif qoo.jpg.tmp \
index.html foobar.bmp foobar.jpg account.html access.lock'
pattern = r'\w+\.(?:jpg|gif)(?!\S)'  # (?!\S): the extension must end the name, so qoo.jpg.tmp is skipped
re.findall(pattern, str_data)

['file_folderName_num.jpg', 'IMG_003.gif', 'foobar.jpg']
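
When the file names are kept one per line, as in the test text, line anchors are a direct way to require that the extension ends the name. A minimal sketch, assuming the listing is a newline-separated string; `image_re` is an illustrative name.

```python
import re

listing = """.bash_profile
workShop.ai
file_folderName_num.jpg
favicon.png
IMG_002.png
IMG_003.gif
qoo.jpg.tmp
index.html
foobar.bmp
foobar.jpg
account.html
access.lock"""

# With re.MULTILINE, ^ and $ anchor at every line, so the extension
# must be the last thing on the line and qoo.jpg.tmp is rejected.
image_re = re.compile(r"^\w+\.(?:jpg|gif)$", re.MULTILINE)

images = image_re.findall(listing)
print(images)
```

On Regex101 the same behavior comes from enabling the `m` (multiline) flag.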

### HW6: Matching URLs

Extract the protocol, host, and port from each URL.

```
ex: Https://localhost:4200/ --> extract Https, localhost, 4200
```

**Test text:**
```
ftp://file_server.com:21/account/customers.xml
https://hengxiuxu.blogspot.tw/
file://localhost:4200
https://s3cur3-server.com:9999/
```

**Expected matches:**
```
ftp, file_server, 21
https, hengxiuxu.blogspot.tw
file, localhost, 4200
https, s3cur3-server.com, 9999
```

In [131]:
str_data = 'ftp://file_server.com:21/account/customers.xml https://hengxiuxu.blogspot.tw/ file://localhost:4200 \
https://s3cur3-server.com:9999/'
pattern = r'ftp(?=:)|https(?=:)|file(?=:)|(?<=:\/\/)[\w\.-]+[^:\/]|\d+'
re.findall(pattern, str_data)

['ftp',
 'file_server.com',
 '21',
 'https',
 'hengxiuxu.blogspot.tw',
 'file',
 'localhost',
 '4200',
 'https',
 's3cur3-server.com',
 '9999']
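
The flat list above does not show which parts belong to the same URL. One possible alternative is a single pattern with three capture groups, the last one optional, so that each URL yields one tuple. This is a sketch rather than the notebook's method; it returns the full host (for example `file_server.com`), as the cell above does, and `url_re` and `parts` are illustrative names.

```python
import re

urls = """ftp://file_server.com:21/account/customers.xml
https://hengxiuxu.blogspot.tw/
file://localhost:4200
https://s3cur3-server.com:9999/"""

# Group 1: protocol, group 2: host, group 3: port (optional).
url_re = re.compile(r"(\w+)://([\w.-]+)(?::(\d+))?")

# groups() yields None for the port when the URL has none.
parts = [m.groups() for m in url_re.finditer(urls)]
for protocol, host, port in parts:
    print(protocol, host, port)
```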