## A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern. RegEx can be used to check if a string contains the specified search pattern.

### Expressions:
***

`\d`                         Matches any digit character. Equivalent to `[0-9]` for ASCII text.

`\D`                         Matches any character that is not a digit; the opposite of `\d`. Equivalent to `[^0-9]`.

`\w`                         Matches any "word" character (alphanumeric or underscore). Equivalent to `[A-Za-z0-9_]` for ASCII text.

`\W`                         Matches any character that is not a word character. Equivalent to `[^A-Za-z0-9_]`.

`\s`                         Matches any whitespace character, such as a space, tab, or newline.

`\S`                         Matches any character that is not whitespace.
***
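As a quick illustration of the six classes above, the sketch below runs each one through `re.findall`. The sample strings and variable names are made up for this example.

```python
import re

sample = "Room_7 opens\tat 9"  # hypothetical sample text

digits = re.findall(r"\d", sample)         # digit characters
non_digits = re.findall(r"\D", "7a9")      # everything except digits
word_chars = re.findall(r"\w", "a_1 !")    # letters, digits and underscore
non_word = re.findall(r"\W", "a_1 !")      # everything that is not a word character
spaces = re.findall(r"\s", sample)         # spaces, tabs, newlines
non_spaces = re.findall(r"\S", "a b")      # everything that is not whitespace

print(digits, non_digits, word_chars, non_word, spaces, non_spaces)
```

Writing the patterns as raw strings (`r"..."`) keeps Python from interpreting the backslashes itself before the `re` module sees them.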

In [613]:
import re
import pandas as pd

- Search

Returns a Match object for the first match anywhere in the string, or `None` if there is no match.

In [614]:
text = "A78L41K"

In [615]:
num = re.search(r'\d\d', text)
num

<re.Match object; span=(1, 3), match='78'>

In [616]:
num.start()

1

In [617]:
num.end()

3

In [618]:
num.group(0)

'78'

In [619]:
text = "8PM_19MIN"

In [620]:
nondigi = re.search(r'\D', text)
nondigi.group()
# print(nondigi.group())

'P'
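Because `re.search` returns `None` when nothing matches, calling `.group()` directly on the result can raise an `AttributeError`. A minimal sketch of a guard, where the helper name `first_digits` is only an illustration:

```python
import re

def first_digits(text):
    """Return the first pair of consecutive digits in text, or None if there is none."""
    match = re.search(r"\d\d", text)
    if match is None:  # re.search returns None when the pattern is not found
        return None
    return match.group()

print(first_digits("A78L41K"))
print(first_digits("no digits here"))
```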

In [2]:
text = 'My phone number is 555 666 7777'

In [3]:
telno = re.search(r'\d\d\d \d\d\d \d\d\d\d', text)
print(telno.span())
print(telno.group())

(19, 31)
555 666 7777


In [623]:
text[19:31]

'555 666 7777'

In [624]:
text = 'My phone number is 415-555-1212'

In [625]:
telno = re.search(r'\d\d\d-\d\d\d-\d\d\d\d', text)
print(telno.group())

415-555-1212


In [626]:
telno = re.search(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)', text)
print(telno.group(1))
print(telno.group(2))
print(telno.group(3))

415
555
1212


In [627]:
telno = re.search(r"\d"*3 + "-" + r"\d"*3 + "-" + r"\d"*4, text)
telno.group()
# print(telno.group())

'415-555-1212'
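The repeated `\d\d\d` can also be written with a `{n}` quantifier, and `groups()` returns all captured groups at once. A short sketch using the same phone number; the names `area`, `prefix` and `line` are chosen here only for illustration:

```python
import re

text = "My phone number is 415-555-1212"

# \d{3} means "exactly three digits", the same as \d\d\d
match = re.search(r"(\d{3})-(\d{3})-(\d{4})", text)

# groups() returns every captured group as a tuple
area, prefix, line = match.groups()
print(area, prefix, line)
```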

## The sub() function replaces the matches with the text of your choice:

In [628]:
phone = "2004-959-559"

In [629]:
output = re.sub(r'\D', '*', phone)
print(output)

2004*959*559


In [630]:
output = re.sub(r'\d', ' + ', phone)
print(output)

 +  +  +  + - +  +  + - +  +  + 


In [631]:
txt = "You are good man"
x = re.sub(r"\s", "9", txt, count=2)  # replace only the first two matches
x

'You9are9good man'
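The replacement text can also refer back to captured groups with `\1`, `\2`, and so on. The sketch below reformats a phone number that way and uses `count` to mask only the first six digits; the sample number and variable names are assumptions for this example.

```python
import re

phone = "415-555-1212"  # sample value for illustration

# \1, \2 and \3 in the replacement insert the text of the captured groups
formatted = re.sub(r"(\d{3})-(\d{3})-(\d{4})", r"(\1) \2-\3", phone)

# count limits how many matches are replaced, counting from the left
masked = re.sub(r"\d", "*", phone, count=6)

print(formatted)
print(masked)
```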

## The findall() function returns a list containing all matches.

### Special Characters
___

``"[]"``      A set of characters, e.g. ``"[a-m]"``

``"\"``       Signals a special sequence (can also be used to escape special characters)

``"."``       Any character (except the newline character)

``"^"``       Starts with, e.g. ``"^hello"``

``"$"``       Ends with, e.g. ``"world$"``

``"*"``       Zero or more occurrences

``"+"``       One or more occurrences

``"{}"``      Exactly the specified number of occurrences, e.g. ``"\d{3}"``, or a range, e.g. ``"\d{1,6}"``

``"|"``       Either or, e.g. ``"falls|stays"``

``"()"``      Capture and group
___
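A small sketch that exercises most of the special characters listed above with `re.findall`. All sample strings and variable names are invented for the illustration.

```python
import re

in_set = re.findall(r"[a-m]", "regex")           # characters between a and m
any_char = re.findall(r"h.t", "hat hot h\nt")    # "." does not match the newline
starts = re.findall(r"^hello", "hello world")    # anchored at the start
ends = re.findall(r"world$", "hello world")      # anchored at the end
zero_or_more = re.findall(r"ab*", "a ab abbb")   # "a" followed by any number of "b"
one_or_more = re.findall(r"ab+", "a ab abbb")    # "a" followed by at least one "b"
exactly_two = re.findall(r"\d{2}", "1 22 333")   # exactly two digits per match
either = re.findall(r"falls|stays", "The rain falls and stays")
escaped = re.findall(r"\$\d+", "cost: $15")      # "\$" matches a literal dollar sign

print(in_set, any_char, zero_or_more, one_or_more, exactly_two, either, escaped)
```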

In [632]:
value = "o 1, t 10, o 100. 100000"

In [633]:
output = re.findall(r'\d{1,6}', value)
print(output)

['1', '10', '100', '100000']


In [634]:
txt = "1 person against 100 people"

In [635]:
output = re.findall(r'\d+', txt)
print(output)

['1', '100']


In [636]:
txt = "hello world"

In [637]:
re.findall('^hello',txt)

['hello']

In [638]:
out=re.findall('world$',txt)
print(out)

['world']
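One detail worth knowing about `findall`: when the pattern contains capture groups, it returns the groups (as tuples) rather than the whole match. A brief sketch with two made-up phone numbers:

```python
import re

text = "415-555-1212 and 212-555-0199"  # sample text for illustration

# No groups: each list item is the full match
numbers = re.findall(r"\d{3}-\d{3}-\d{4}", text)

# With groups: each list item is a tuple of the captured parts
parts = re.findall(r"(\d{3})-(\d{3})-(\d{4})", text)

print(numbers)
print(parts)
```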


In [639]:
s = pd.Series(['a3', 'b4', 'c5', 'd'])

In [640]:
s

0    a3
1    b4
2    c5
3     d
dtype: object

In [641]:
s.str.contains(r"\d")

0     True
1     True
2     True
3    False
dtype: bool

In [642]:
s.apply(lambda x: bool(re.search(r"\d", x)))

0     True
1     True
2     True
3    False
dtype: bool
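The boolean Series returned by `str.contains` is typically used as a mask to filter rows. A minimal sketch on the same data; the variable names are chosen for this example:

```python
import pandas as pd

s = pd.Series(['a3', 'b4', 'c5', 'd'])

mask = s.str.contains(r"\d")   # True where the value contains a digit
with_digit = s[mask]           # keep rows that contain a digit
without_digit = s[~mask]       # "~" inverts the mask

print(with_digit.tolist())
print(without_digit.tolist())
```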

In [643]:
s.str.extract(r'(\d)')

     0
0    3
1    4
2    5
3  NaN


In [644]:
s.str.extract(r'(\w)')

   0
0  a
1  b
2  c
3  d


In [645]:
s = pd.Series(['a3aa', 'b4aa', 'c5aa'])
s

0    a3aa
1    b4aa
2    c5aa
dtype: object

In [646]:
s.str.extract(r'(\w)\d(\w)(\w)')

   0  1  2
0  a  a  a
1  b  a  a
2  c  a  a
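With `str.extract`, named groups written as `(?P<name>...)` become column names, and `expand=False` returns a Series when the pattern has a single group. A short sketch; the group names `letter` and `digit` are assumptions made for this example:

```python
import pandas as pd

s = pd.Series(['a3aa', 'b4aa', 'c5aa'])

# Named groups become the column names of the resulting DataFrame
parts = s.str.extract(r"(?P<letter>\w)(?P<digit>\d)")

# With one group and expand=False, the result is a Series instead of a DataFrame
first_digit = s.str.extract(r"(\d)", expand=False)

print(parts)
print(first_digit.tolist())
```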


In [647]:
s= pd.Series(['40 l/100 km (comb)', 
        '38 l/100 km (comb)', '6.4 l/100 km (comb)',
       '8.3 kg/100 km (comb)', '5.1 kg/100 km (comb)',
       '5.4 l/100 km (comb)', '6.7 l/100 km (comb)',
       '6.2 l/100 km (comb)', '7.3 l/100 km (comb)',
       '6.3 l/100 km (comb)', '5.7 l/100 km (comb)',
       '6.1 l/100 km (comb)', '6.8 l/100 km (comb)',
       '7.5 l/100 km (comb)', '7.4 l/100 km (comb)',
       '3.6 kg/100 km (comb)', '0 l/100 km (comb)', 
       '7.8 l/100 km (comb)'])


In [648]:
s

0       40 l/100 km (comb)
1       38 l/100 km (comb)
2      6.4 l/100 km (comb)
3     8.3 kg/100 km (comb)
4     5.1 kg/100 km (comb)
5      5.4 l/100 km (comb)
6      6.7 l/100 km (comb)
7      6.2 l/100 km (comb)
8      7.3 l/100 km (comb)
9      6.3 l/100 km (comb)
10     5.7 l/100 km (comb)
11     6.1 l/100 km (comb)
12     6.8 l/100 km (comb)
13     7.5 l/100 km (comb)
14     7.4 l/100 km (comb)
15    3.6 kg/100 km (comb)
16       0 l/100 km (comb)
17     7.8 l/100 km (comb)
dtype: object

In [649]:
result = s.str.extract(r'(\d\d|\d.\d|\d)')
result

      0
0    40
1    38
2   6.4
3   8.3
4   5.1
5   5.4
6   6.7
7   6.2
8   7.3
9   6.3
10  5.7
11  6.1
12  6.8
13  7.5
14  7.4
15  3.6
16    0
17  7.8


In [650]:
result = s.str.extract(r'(\d\d|\d.\d|\d).+/(\d\d\d)')
result
# Escaping "/" as "\/" is harmless but not required: "/" is not a special character in Python's re module.

      0    1
0    40  100
1    38  100
2   6.4  100
3   8.3  100
4   5.1  100
5   5.4  100
6   6.7  100
7   6.2  100
8   7.3  100
9   6.3  100
10  5.7  100
11  6.1  100
12  6.8  100
13  7.5  100
14  7.4  100
15  3.6  100
16    0  100
17  7.8  100


In [651]:
result = s.str.extract(r'(^\d*.\d*) \w*/(\d*)')
result

      0    1
0    40  100
1    38  100
2   6.4  100
3   8.3  100
4   5.1  100
5   5.4  100
6   6.7  100
7   6.2  100
8   7.3  100
9   6.3  100
10  5.7  100
11  6.1  100
12  6.8  100
13  7.5  100
14  7.4  100
15  3.6  100
16    0  100
17  7.8  100


In [652]:
s.str.extract(r'(\d*.\d*) .+/(\d*)')

      0    1
0    40  100
1    38  100
2   6.4  100
3   8.3  100
4   5.1  100
5   5.4  100
6   6.7  100
7   6.2  100
8   7.3  100
9   6.3  100
10  5.7  100
11  6.1  100
12  6.8  100
13  7.5  100
14  7.4  100
15  3.6  100
16    0  100
17  7.8  100


In [653]:
result = s.str.extract(r'(\d*.\d*) \w*/(\d*)')
result

      0    1
0    40  100
1    38  100
2   6.4  100
3   8.3  100
4   5.1  100
5   5.4  100
6   6.7  100
7   6.2  100
8   7.3  100
9   6.3  100
10  5.7  100
11  6.1  100
12  6.8  100
13  7.5  100
14  7.4  100
15  3.6  100
16    0  100
17  7.8  100
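The patterns above rely on an unescaped `.`, which matches any character and not only a decimal point. One possible stricter variant, offered as a sketch and not as the only way to do it, escapes the dot, makes the decimal part optional, and converts the extracted strings to numbers. The group names `amount`, `unit` and `distance` and the shortened sample Series are assumptions for this example.

```python
import pandas as pd

# A few representative values from the consumption data
s = pd.Series(['40 l/100 km (comb)', '6.4 l/100 km (comb)',
               '8.3 kg/100 km (comb)', '0 l/100 km (comb)'])

# \d+(?:\.\d+)? : one or more digits, optionally followed by a literal "." and more digits
pattern = r"^(?P<amount>\d+(?:\.\d+)?) (?P<unit>\w+)/(?P<distance>\d+) km"
parsed = s.str.extract(pattern)

# extract returns strings, so convert the numeric columns explicitly
parsed["amount"] = parsed["amount"].astype(float)
parsed["distance"] = parsed["distance"].astype(int)

print(parsed)
```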


In [654]:
s = pd.Series(['06/2020\n\n4.9 l/100 km (comb)',
'11/2020\n\n166 g CO2/km (comb)',                                 
'10/2019\n\n5.3 l/100 km (comb)',
'05/2022\n\n6.3 l/100 km (comb)',
'07/2019\n\n128 g CO2/km (comb)',
'06/2022\n\n112 g CO2/km (comb)',                                                 
'01/2022\n\n5.8 l/100 km (comb)',
'11/2020\n\n106 g CO2/km (comb)',
'04/2019\n\n105 g CO2/km (comb)',
'08/2020\n\n133 g CO2/km (comb)',
'04/2022\n\n133 g CO2/km (comb)'])


In [655]:
result = s.str.extract(r'(\d+).(\d+)')
result

     0     1
0   06  2020
1   11  2020
2   10  2019
3   05  2022
4   07  2019
5   06  2022
6   01  2022
7   11  2020
8   04  2019
9   08  2020
10  04  2022


In [656]:
t = s.str.extract(r'(\S+)/(\S+)')
t

     0     1
0   06  2020
1   11  2020
2   10  2019
3   05  2022
4   07  2019
5   06  2022
6   01  2022
7   11  2020
8   04  2019
9   08  2020
10  04  2022


In [657]:
result = s.str.extract(r'(\d{2})/(\d{4})')
result

# "." would also work in place of "/", because "." matches any character

     0     1
0   06  2020
1   11  2020
2   10  2019
3   05  2022
4   07  2019
5   06  2022
6   01  2022
7   11  2020
8   04  2019
9   08  2020
10  04  2022
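Since `str.extract` returns strings, leading zeros such as `06` are kept until the columns are converted. A brief sketch that names the groups and casts them to integers; the column names `month` and `year` and the two-row sample are assumptions for this example:

```python
import pandas as pd

# Two representative values from the data above
s = pd.Series(['06/2020\n\n4.9 l/100 km (comb)',
               '11/2020\n\n166 g CO2/km (comb)'])

# Anchor at the start so only the leading MM/YYYY part is matched
dates = s.str.extract(r"^(?P<month>\d{2})/(?P<year>\d{4})")
print(dates["month"].tolist())   # still strings at this point

# Convert both columns to integers
dates = dates.astype(int)
print(dates)
```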


In [1]:
# Shell commands need a leading "!" in a notebook cell; without it, Python raises a SyntaxError.
!python -m pip install notebook --pre --upgrade