# Table of Contents
<a id="toc"></a>
- [Regular Expressions (regex)](#1)
- [ Regular Expressions - 2: Walkthrough example](#2)
- [Pandas: Extract, Contains in Python using regex](#3)

<a id="1"></a>
# **<span style="color:#00BFC4;"> Regular Expressions (regex)  </span>**

<div style="font-size:15px; font-family:verdana;">In this part you will learn:<br>
<ol>
    <li>What is a regex?</li>
    <li>Why is it needed?</li>
    <li>How to create and use a regex?</li> 
    <li>What are <span style="color:green;">metacharacters</span>? <code>. ^ $ * + ? { } [ ] | \ ( )</code></li> 
    <li>What are <span style="color:crimson;">special sequences</span>?</li>
    <li>What is a raw string, and when should you use it?</li>
</ol>
</div>

`^` : caret

In [1]:
import numpy as np
import pandas as pd
import re

We are going to find patterns

In [16]:
# String
s = 'dfda& ab&cdeabb& 98 abcd f12_abbbb&gh$[a \
lsjfdlj 1jkfd'

<div style="font-size:15px; font-family:verdana;">
<b>Alphanumeric (word) characters:</b><br>
<code>a-z A-Z 0-9 _</code><br>
<b>Everything else is non-alphanumeric.</b>
</div>

### Methods or commands

In [17]:
# The pattern we are looking for
p = 'a&'

In [18]:
# p : pattern
# s : string
m = re.match(p, s)        # returns a Match object or None ; only checks the beginning of the string
print(m)

None


In [7]:
# String
s = 'a& ab&cdeabb& 98 abcd f12_abbbb&gh$[a \
lsjfdlj 1jkfd'

In [8]:
m = re.match(p, s)
print(m)

<re.Match object; span=(0, 2), match='a&'>


`span` is `(start position, end position)`; the end position is exclusive, so `(0, 2)` covers the characters at index 0 and 1.

In [10]:
s

'a& ab&cdeabb& 98 abcd f12_abbbb&gh$[a lsjfdlj 1jkfd'

In [13]:
print('group : ', m.group())
print('start point : ', m.start())
print('end point : ', m.end())
print('span : ', m.span())

group :  a&
start point :  0
end point :  2
span :  (0, 2)


### search()

In [19]:
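# Here s is the first string again (the one starting with 'dfda&'), so the first 'a&' is at index 3
sr = re.search(p, s)          # returns a Match object or None ; scans the whole string and returns the first occurrence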

In [20]:
print('group : ', sr.group())
print('start point : ', sr.start())
print('end point : ', sr.end())
print('span : ', sr.span())

group :  a&
start point :  3
end point :  5
span :  (3, 5)


In [33]:
s1 = 'dfd ab&cda& eaa&bb& 98 abcd f12_aba& bbb&gh$[a lsjfa& dlj 1jkfd'

In [31]:
print(re.match(p, s1))

None


`re.match` returns `None` because the pattern `a&` does not occur at the very beginning of the string.

In [34]:
re.search(p, s1)

<re.Match object; span=(9, 11), match='a&'>

### findall()

It returns a list of all matches for that particular pattern in the string.

In [36]:
f = re.findall(p, s1)      # returns list ; looks through entire string

In [37]:
f

['a&', 'a&', 'a&', 'a&']

### finditer()

In [42]:
m = re.finditer(p, s1)
for i in m:
    print('group : ', i.group())
    print('start point : ', i.start())
    print('end point : ', i.end())
    print('span : ', i.span(), '\n')

group :  a&
start point :  9
end point :  11
span :  (9, 11) 

group :  a&
start point :  14
end point :  16
span :  (14, 16) 

group :  a&
start point :  34
end point :  36
span :  (34, 36) 

group :  a&
start point :  51
end point :  53
span :  (51, 53) 
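To put the four functions side by side, here is a small sketch on a shorter string. The names `text`, `pattern` and the sample value are made up for illustration and are not part of the cells above.

```python
import re

text = 'xa& ya& za&'   # hypothetical sample string
pattern = 'a&'

# match() only tries position 0, so it finds nothing here
first_at_start = re.match(pattern, text)

# search() scans forward and stops at the first hit
first_anywhere = re.search(pattern, text)

# findall() returns the matched text of every hit as a list of strings
all_text = re.findall(pattern, text)

# finditer() yields one Match object per hit, so positions stay available
all_spans = [m.span() for m in re.finditer(pattern, text)]

print(first_at_start)
print(first_anywhere.span())
print(all_text)
print(all_spans)
```

As a rule of thumb: use `findall` when only the matched text matters, and `finditer` when the positions are needed as well.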



### How metacharacters work

In [49]:
# String
s = 'a& dfd ab&cdeabb& 98 abcd f12_abbbb&gh$[a \
lsjfdlj 1jkfd'

In [46]:
p = 'a&'

In [50]:
re.findall(p, s)

['a&']

In [51]:
p = '[abc]'
re.findall(p, s)

['a',
 'a',
 'b',
 'c',
 'a',
 'b',
 'b',
 'a',
 'b',
 'c',
 'a',
 'b',
 'b',
 'b',
 'b',
 'a']

`[abc]` is a character class: it matches any single character that is `a`, `b` or `c`, so we get a list of every occurrence of those characters in the string.

In [52]:
re.match(p, s).span()

(0, 1)

In [58]:
s = 'a& dfd ab&cdeabb& 98 abcd f12_abbbb&gh$[a \
lsjfdlj 1jkfd'
s

'a& dfd ab&cdeabb& 98 abcd f12_abbbb&gh$[a lsjfdlj 1jkfd'

In [57]:
p = '[a-c]'       # a-c : a range ; matches any single character from a to c
re.findall(p, s)

['a',
 'a',
 'b',
 'c',
 'a',
 'b',
 'b',
 'a',
 'b',
 'c',
 'a',
 'b',
 'b',
 'b',
 'b',
 'a']

In [59]:
p = '[1-9]'      
re.findall(p, s)

['9', '8', '1', '2', '1']

In [60]:
p = '[7-9]'      
re.findall(p, s)

['9', '8']

In [63]:
p = '\$'      
re.findall(p, s)

['$']

Because `$` is a metacharacter, we put `\` in front of it to search for the literal `$` sign in the text.

In [65]:
s

'a& dfd ab&cdeabb& 98 abcd f12_abbbb&gh$[a lsjfdlj 1jkfd'

In [68]:
p = '\$'      
re.search(p, s)

<re.Match object; span=(38, 39), match='$'>

In [69]:
p = '\['      
re.search(p, s)

<re.Match object; span=(39, 40), match='['>

In [70]:
p = '[$]'      
re.search(p, s)

<re.Match object; span=(38, 39), match='$'>

If we put a **metacharacter** inside `[ ]`, it loses its special meaning and is matched literally (the exceptions are `^` at the start, `-` between two characters, `]` and `\`).
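Both ways of matching a literal metacharacter can be compared in a short sketch, together with the standard-library helper `re.escape`. The sample string `price` and the variable names are made up for illustration. The patterns use the `r` prefix (raw string), which is covered further down in this part.

```python
import re

price = 'cost: $5 [approx]'   # hypothetical sample string

# A backslash and a character class both find the same literal '$'
with_backslash = re.findall(r'\$', price)
with_class = re.findall(r'[$]', price)

# re.escape() adds the backslashes for you when the literal text
# comes from a variable
literal = '$5 ['
escaped = re.escape(literal)
found = re.search(escaped, price)

print(with_backslash, with_class)
print(found.group(), found.span())
```

`re.escape` is handy when the text to search for comes from user input or from data, because you do not have to know which characters are special.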

### caret `^`

<li> Outside of `[ ]`, the caret anchors the match to the start of the string: `^a` matches only if the string starts with `a`.

In [73]:
p = '^a'      
re.search(p, s)

<re.Match object; span=(0, 1), match='a'>

In [75]:
p = '[^a]'      
re.search(p, s)

<re.Match object; span=(1, 2), match='&'>

`[^a]` : inside `[ ]`, a leading caret negates the class, so it matches any single character except `a`.
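The caret therefore has different meanings depending on where it stands. A minimal sketch, using a made-up sample word:

```python
import re

word = 'banana'   # hypothetical sample string

# Outside [ ]: anchor the match to the start of the string
starts_with_b = re.findall('^b', word)
starts_with_a = re.findall('^a', word)

# Inside [ ] as the first character: negate the class
not_a = re.findall('[^a]', word)

# Inside [ ] but not first: an ordinary, literal caret
literal_caret = re.findall('[a^]', 'a^b')

print(starts_with_b, starts_with_a)
print(not_a)
print(literal_caret)
```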

## Special sequences

`\d` : matches any single `digit`, so `findall` returns all the digits in the string

In [77]:
p = '\d'      
re.findall(p, s)

['9', '8', '1', '2', '1']

`\D` : matches any single character `except digits`

In [78]:
p = '\D'      
re.findall(p, s)

['a',
 '&',
 ' ',
 'd',
 'f',
 'd',
 ' ',
 'a',
 'b',
 '&',
 'c',
 'd',
 'e',
 'a',
 'b',
 'b',
 '&',
 ' ',
 ' ',
 'a',
 'b',
 'c',
 'd',
 ' ',
 'f',
 '_',
 'a',
 'b',
 'b',
 'b',
 'b',
 '&',
 'g',
 'h',
 '$',
 '[',
 'a',
 ' ',
 'l',
 's',
 'j',
 'f',
 'd',
 'l',
 'j',
 ' ',
 'j',
 'k',
 'f',
 'd']

`\s` : matches any single `whitespace` character (space, tab, newline) in the string

In [79]:
p = '\s'      
re.findall(p, s)

[' ', ' ', ' ', ' ', ' ', ' ', ' ']

In [80]:
s

'a& dfd ab&cdeabb& 98 abcd f12_abbbb&gh$[a lsjfdlj 1jkfd'

`\S` : matches any single character `except whitespace`.

In [81]:
p = '\S'      
re.findall(p, s)

['a',
 '&',
 'd',
 'f',
 'd',
 'a',
 'b',
 '&',
 'c',
 'd',
 'e',
 'a',
 'b',
 'b',
 '&',
 '9',
 '8',
 'a',
 'b',
 'c',
 'd',
 'f',
 '1',
 '2',
 '_',
 'a',
 'b',
 'b',
 'b',
 'b',
 '&',
 'g',
 'h',
 '$',
 '[',
 'a',
 'l',
 's',
 'j',
 'f',
 'd',
 'l',
 'j',
 '1',
 'j',
 'k',
 'f',
 'd']

`\w` : matches any single `alphanumeric` (word) character: letters, digits and the underscore.

In [87]:
p = '\w'      
re.findall(p, s)

['a',
 'd',
 'f',
 'd',
 'a',
 'b',
 'c',
 'd',
 'e',
 'a',
 'b',
 'b',
 '9',
 '8',
 'a',
 'b',
 'c',
 'd',
 'f',
 '1',
 '2',
 '_',
 'a',
 'b',
 'b',
 'b',
 'b',
 'g',
 'h',
 'a',
 'l',
 's',
 'j',
 'f',
 'd',
 'l',
 'j',
 '1',
 'j',
 'k',
 'f',
 'd']

`\W` : matches any single character `except alphanumeric` ones. This includes whitespace and punctuation, not only metacharacters.

In [88]:
p = '\W'      
re.findall(p, s)

['&', ' ', ' ', '&', '&', ' ', ' ', ' ', '&', '$', '[', ' ', ' ']
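For plain ASCII text, each special sequence can be read as shorthand for a character class. The sketch below checks that claim on a made-up sample string. It passes `re.ASCII` on purpose: without that flag, Python 3 lets `\d`, `\w` and `\s` also match non-ASCII digits, letters and whitespace, so the explicit classes would only be an approximation.

```python
import re

sample = 'f12_ab& 98\t$'   # hypothetical sample string (contains a tab)

# shorthand -> explicit character class (ASCII only)
pairs = {
    r'\d': r'[0-9]',
    r'\D': r'[^0-9]',
    r'\w': r'[a-zA-Z0-9_]',
    r'\W': r'[^a-zA-Z0-9_]',
    r'\s': r'[ \t\n\r\f\v]',
    r'\S': r'[^ \t\n\r\f\v]',
}

for shorthand, explicit in pairs.items():
    # re.ASCII restricts the shorthand to ASCII so both sides agree exactly
    a = re.findall(shorthand, sample, flags=re.ASCII)
    b = re.findall(explicit, sample)
    print(shorthand, a == b, a)
```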

`\b...` : `\b` matches a word boundary, i.e. the position between a word character (`\w`) and a non-word character, or the start or end of the string. `\bdata` therefore finds the substring `data` only where it begins a word.

In [92]:
s = 'data science data'
p = '\bdata'
print(re.search(p, s))

None


In [93]:
s = 'data science data'
p = r'\bdata'
re.search(p, s)

<re.Match object; span=(0, 4), match='data'>

In [12]:
s = 'Alireza'
a = b'Alireza'

print(s, type(s))
print(a, type(a))

Alireza <class 'str'>
b'Alireza' <class 'bytes'>


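`\b` : in a normal Python string, `\b` already has another meaning (the backspace character), so the regex engine never sees the word-boundary sequence.<br>Therefore we have to write the pattern as a **raw string** by putting `r` in front of the string<br>for example : **r**'\bdata'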
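The difference between the two spellings is visible before any regex is involved. A minimal sketch (the variable names `plain` and `raw` are chosen here for illustration):

```python
import re

plain = '\bdata'    # Python turns \b into a single backspace character
raw = r'\bdata'     # the backslash and the 'b' stay two separate characters

print(len(plain), len(raw))
print(plain[0] == '\x08')            # the first character is a backspace

text = 'data science data'
print(re.search(plain, text))        # no backspace in text, so no match
print(re.search(raw, text).span())   # word boundary followed by 'data'

# Doubling the backslash is equivalent to using a raw string
print('\\bdata' == raw)
```

Writing every pattern as a raw string is a common habit: it costs nothing for patterns without backslashes and avoids this kind of surprise for the rest.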

# Metacharacters : Repeats

In [97]:
s = 'a& ab&cdeabb& 98 abcd f12_abbbb&gh$[a \
lsjfdlj 1jkfd'

s

'a& ab&cdeabb& 98 abcd f12_abbbb&gh$[a lsjfdlj 1jkfd'

### **`*`**

`'ab*&'` : the match has to `start` with the letter `a`<br>and has to `end` with an ampersand (`&`);<br>in the `middle`, between those two, `b` can occur
zero or more times.

In [98]:
p = 'ab*&'
re.findall(p, s)

['a&', 'ab&', 'abb&', 'abbbb&']

### **`+`**

`'ab+&'` : the biggest difference between `*` and `+` is that with `+` there has to be <span style="color:red;">at least 1</span> `b` between `a` and `&`.


In [99]:
p = 'ab+&'
re.findall(p, s)

['ab&', 'abb&', 'abbbb&']

### **`?`**

`?` : it makes the preceding character optional<br>`b` has to occur either exactly once or not at all.

In [101]:
p = 'ab?&'
re.findall(p, s)

['a&', 'ab&']

### **`{ }`**

`{ }` : `'ab{2,3}&'` sets how many times `b` has to occur between `a` and `&` <br>
at least 2 `b` have to be there <br>
but not more than 3<br>
General form: `{minimum,maximum}`

In [103]:
p = 'ab{2,3}&'
re.findall(p, s)

['abb&']
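The repeat operators are easiest to compare when they all run against the same string. The sketch below reuses the string `s` from this section and adds two more forms of the braces, `{2,}` (no upper limit) and `{4}` (exactly four):

```python
import re

s = 'a& ab&cdeabb& 98 abcd f12_abbbb&gh$[a lsjfdlj 1jkfd'

patterns = ['ab*&', 'ab+&', 'ab?&', 'ab{2,3}&', 'ab{2,}&', 'ab{4}&']

# Run every pattern against the same string and collect the matches
results = {p: re.findall(p, s) for p in patterns}

for p, found in results.items():
    print(f'{p:10} {found}')
```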

# Match at the end of the string

`$` (dollar) : matches at the end of the string, i.e. it checks whether the string ends with the given character(s) or not.

In [106]:
s = 'data1 science data'
p = 'data$'
re.findall(p, s)

['data']
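The two anchors `^` and `$` can be combined. A small sketch on the same string, showing which occurrence of `data` each pattern picks:

```python
import re

s = 'data1 science data'

at_end = re.search('data$', s)      # only the last 'data' qualifies
at_start = re.search('^data', s)    # only the first 'data' qualifies
whole = re.search('^data$', s)      # the whole string would have to be 'data'

print(at_end.span())
print(at_start.span())
print(whole)
```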

### Group

In [13]:
s = 'a& ab&cdeabb& 98 abcd f12_abbbb&gh$[a \
lsjfdlj 1jkfd'

A group is created by putting parentheses `( )` around part of a pattern. Inside the group, `|` means "or", so `(a|b|c)` matches `a`, `b` or `c`.

In [15]:
p = '(a|b|c)'
re.findall(p, s)

['a',
 'a',
 'b',
 'c',
 'a',
 'b',
 'b',
 'a',
 'b',
 'c',
 'a',
 'b',
 'b',
 'b',
 'b',
 'a']

A group can be used to combine a large number of alternative patterns into a single pattern.
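Groups also change what `findall` returns: with one group it returns the text captured by the group, and with several groups it returns one tuple per match. A minimal sketch on the string `s` from this section (the patterns themselves are chosen here for illustration):

```python
import re

s = 'a& ab&cdeabb& 98 abcd f12_abbbb&gh$[a lsjfdlj 1jkfd'

# One group: findall returns only the text captured by that group
one_group = re.findall('(ab+)&', s)

# Two groups: findall returns one tuple per match
two_groups = re.findall('(a)(b+)&', s)

# A Match object exposes each group by number; group(0) is the whole match
m = re.search('(a)(b+)&', s)

print(one_group)
print(two_groups)
print(m.group(0), m.group(1), m.group(2))
```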

## Use cases

In [16]:
s = 'This is a a Monday'
p = '\s'
re.split(p, s)

['This', 'is', 'a', 'a', 'Monday']

In [17]:
s = 'This is a a Monday'
p = '\s'

re.sub(p, '*', s)          # sub : replace

'This*is*a*a*Monday'

In [20]:
s = 'This is a a Monday'
p = '\s'

# subn : like sub, but returns a tuple (new string, number of replacements)
print('(Replace the pattern with *, The number of times it was replaced) : ', re.subn(p, '*', s) )

(Replace the pattern with *, The number of times it was replaced) :  ('This*is*a*a*Monday', 4)
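When the text contains runs of several spaces, the single-character pattern `\s` behaves differently from `\s+`. The sketch below uses a made-up variant of the sentence with repeated spaces, and also shows the `count` argument of `re.sub`:

```python
import re

s = 'This  is a   a Monday'   # hypothetical variant with repeated spaces

# Splitting on a single whitespace character leaves empty strings behind
split_single = re.split(r'\s', s)

# Splitting on one or more whitespace characters does not
split_run = re.split(r'\s+', s)

# Collapse each run of whitespace into one space
collapsed = re.sub(r'\s+', ' ', s)

# count limits how many replacements are made (here: the first two runs)
first_two = re.sub(r'\s+', '*', s, count=2)

print(split_single)
print(split_run)
print(collapsed)
print(first_two)
```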


## Raw string

In [21]:
s = 'today is \n Monday \b'

In [23]:
p = '[\n\b]'
re.findall(p, s)

['\n', '\x08']
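In the cell above, Python itself converts `\n` and `\b` to a newline and a backspace before the pattern reaches the regex engine. With a raw string the regex engine does the conversion instead, and inside `[ ]` it reads `\b` as a backspace as well, so both spellings give the same result. Outside of `[ ]`, a raw `\b` is a word boundary. A short sketch:

```python
import re

s = 'today is \n Monday \b'   # contains a real newline and a real backspace

plain = re.findall('[\n\b]', s)    # Python converts the escapes first
raw = re.findall(r'[\n\b]', s)     # the regex engine converts them instead

print(plain == raw, raw)

# Outside [ ], a raw \b is a word boundary, not a backspace
boundary = re.search(r'\bMonday', s)
print(boundary.span())
```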

<a id="2"></a>
# **<span style="color:#00BFC4;"> Regular Expressions - 2: Walkthrough example  </span>**

In [24]:
# Data
s = '\ [***Note***: This is a made up text to explain Regular Expressions in Python. It is NOT real.]\
    This year in 2020, the rainfall in this region of the forest is 2% more than the rest the forest as compared to last year. \
    And the primary reason we think is because of increased tree plantations since the year 2005. \
    Most notably the organizations such as "Save Forest", "Save Planet", "Sun & Rain" have \
    made an important contribution. Annually each of these organizations have planted 1000000, 500000 and 200000 \
    saplings. And about 25% of those have now grown into tall magnificient trees. The survival rate\
    of such saplings is lower because of hot and dry summer temperatures that are upwards of 113 deg F (or 45 deg C).\
    There were 500, 245 and 793 volunteers from each of \
    organizations. The new forest canopy also provides a lush green habitat to support wildlife.\
    We can now see species of birds such as parrots increase in numbers fro 500 to almost 2000.\
    Few species of animals such as monkeys have also grown in population from 200 to around 400.\
    This is all very encouraging, however there still lies one problem. The rest of the forest\
    outside this 1000 sq km has a rocky terrain. Most of the soil was washed out by rain water\
    because there was no vegetation to hold it in place. The lack of vegetation in those area is\
    likely a consequence of rapid deforestation. Over past 5 years, more than 1 trillion trees have been cut down \
    in this forest for various reasons such as global demand for wood, clearing land for large farmlands, and \
    consequently reduced rainfall. The estimated global market for wood has increased from $60 billion in the year 2005 to \
    $300 billion in the year 2020. Therefore, such incresed efforts to plant more trees to save forests and also meet\
    increased demand in global market are needed. Save forests! \
    Contact us: \
    Email: savetree@saveforest.org or Phone: +00-1111-2222 \
    Email: saveblue@saveplanet.org or Phone: +00-3333-4444 \
    Email: bringrain@sunandrain.org or Phone: +00-3333-4444 \
    '

In [26]:
p = '\]'
dlist = re.split(p, s)
dlist

['\\ [***Note***: This is a made up text to explain Regular Expressions in Python. It is NOT real.',
 '    This year in 2020, the rainfall in this region of the forest is 2% more than the rest the forest as compared to last year.     And the primary reason we think is because of increased tree plantations since the year 2005.     Most notably the organizations such as "Save Forest", "Save Planet", "Sun & Rain" have     made an important contribution. Annually each of these organizations have planted 1000000, 500000 and 200000     saplings. And about 25% of those have now grown into tall magnificient trees. The survival rate    of such saplings is lower because of hot and dry summer temperatures that are upwards of 113 deg F (or 45 deg C).    There were 500, 245 and 793 volunteers from each of     organizations. The new forest canopy also provides a lush green habitat to support wildlife.    We can now see species of birds such as parrots increase in numbers fro 500 to almost 2000.    F

It returns a list of 2 strings: the note before the `]` and the main text after it.

In [27]:
d_note = dlist[0]
d_note

'\\ [***Note***: This is a made up text to explain Regular Expressions in Python. It is NOT real.'

In [28]:
d_text = dlist[1]
d_text

'    This year in 2020, the rainfall in this region of the forest is 2% more than the rest the forest as compared to last year.     And the primary reason we think is because of increased tree plantations since the year 2005.     Most notably the organizations such as "Save Forest", "Save Planet", "Sun & Rain" have     made an important contribution. Annually each of these organizations have planted 1000000, 500000 and 200000     saplings. And about 25% of those have now grown into tall magnificient trees. The survival rate    of such saplings is lower because of hot and dry summer temperatures that are upwards of 113 deg F (or 45 deg C).    There were 500, 245 and 793 volunteers from each of     organizations. The new forest canopy also provides a lush green habitat to support wildlife.    We can now see species of birds such as parrots increase in numbers fro 500 to almost 2000.    Few species of animals such as monkeys have also grown in population from 200 to around 400.    This is

## Remove blank spaces

In [30]:
# collapse every run of 2 to 5 spaces into a single space
p = ' {2,5}'
dt = re.sub(p, ' ', d_text)
dt

' This year in 2020, the rainfall in this region of the forest is 2% more than the rest the forest as compared to last year. And the primary reason we think is because of increased tree plantations since the year 2005. Most notably the organizations such as "Save Forest", "Save Planet", "Sun & Rain" have made an important contribution. Annually each of these organizations have planted 1000000, 500000 and 200000 saplings. And about 25% of those have now grown into tall magnificient trees. The survival rate of such saplings is lower because of hot and dry summer temperatures that are upwards of 113 deg F (or 45 deg C). There were 500, 245 and 793 volunteers from each of organizations. The new forest canopy also provides a lush green habitat to support wildlife. We can now see species of birds such as parrots increase in numbers fro 500 to almost 2000. Few species of animals such as monkeys have also grown in population from 200 to around 400. This is all very encouraging, however there s

### Get company information

#### <li>Get names

In [32]:
p = '"(.+?)"'
cnames = re.findall(p, dt)
cnames

['Save Forest', 'Save Planet', 'Sun & Rain']
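The `?` after `.+` matters here: it makes the repeat lazy (non-greedy), so each match stops at the next closing quote instead of the last one. A minimal sketch on a shortened line taken from the text above:

```python
import re

line = 'such as "Save Forest", "Save Planet", "Sun & Rain" have'

# Greedy: .+ runs on to the LAST quote in the line
greedy = re.findall('"(.+)"', line)

# Lazy: .+? stops at the NEXT quote
lazy = re.findall('"(.+?)"', line)

# Alternative without laziness: anything except a quote
no_quotes = re.findall('"([^"]+)"', line)

print(greedy)
print(lazy)
print(no_quotes)
```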

#### <li>Get emails

In [35]:
p = '[a-zA-Z0-9_]+@[a-zA-Z0-9_]+\.[a-zA-Z0-9_]+'
email = re.findall(p, dt)
email

['savetree@saveforest.org',
 'saveblue@saveplanet.org',
 'bringrain@sunandrain.org']

#### <li>Get domain

In [37]:
p = '[a-zA-Z0-9_]+\.[a-zA-Z0-9_]+'
domain = re.findall(p, dt)
domain

['saveforest.org', 'saveplanet.org', 'sunandrain.org']

#### <li>Get number

In [39]:
p = '\+\d{2}-\d{4}-\d{4}'
number = re.findall(p, dt)
number

['+00-1111-2222', '+00-3333-4444', '+00-3333-4444']

#### <li>Get count of sapling

In [41]:
p = '\d{5,}'
sapling = re.findall(p, dt)
sapling

['1000000', '500000', '200000']

#### <li>Get volunteer

In [49]:
p = 'were .* volunteers'
r = re.findall(p, dt)[0]

p_2 = '\d{2,}'
volunteer = re.findall(p_2, r)
volunteer

['500', '245', '793']

In [50]:
p = 'were .* volunteers'
r = re.findall(p, dt)[0]

p_2 = '\d+'
volunteer = re.findall(p_2, r)
volunteer

['500', '245', '793']

Now, combine this information and put it into a DataFrame.

## Create a company DataFrame

In [51]:
dfc = pd.DataFrame({
    'company' : cnames,
    'email' : email,
    'domain' : domain,
    'phone': number,
    'sapling_planted' : sapling,
    'volunteers' : volunteer
})

dfc

| | company | email | domain | phone | sapling_planted | volunteers |
|---|---|---|---|---|---|---|
| 0 | Save Forest | savetree@saveforest.org | saveforest.org | +00-1111-2222 | 1000000 | 500 |
| 1 | Save Planet | saveblue@saveplanet.org | saveplanet.org | +00-3333-4444 | 500000 | 245 |
| 2 | Sun & Rain | bringrain@sunandrain.org | sunandrain.org | +00-3333-4444 | 200000 | 793 |


## DataFrame

In [56]:
p = '\.'
x = re.split(p, dt)

dfl = pd.DataFrame({
    'raw' : x
})
dfl

| | raw |
|---|---|
| 0 | This year in 2020, the rainfall in this regio... |
| 1 | And the primary reason we think is because of... |
| 2 | Most notably the organizations such as "Save ... |
| 3 | Annually each of these organizations have pla... |
| 4 | And about 25% of those have now grown into ta... |
| 5 | The survival rate of such saplings is lower b... |
| 6 | There were 500, 245 and 793 volunteers from e... |
| 7 | The new forest canopy also provides a lush gr... |
| 8 | We can now see species of birds such as parro... |
| 9 | Few species of animals such as monkeys have a... |


In [74]:
# Raw strings keep the backslashes in the patterns; rows without the keyword get NaN
dfl['billion_$'] = dfl['raw'].apply(lambda x : re.findall(r'\$\d+', x) if ('$' in x) else np.nan)
dfl['reasons'] = dfl['raw'].apply(lambda x : re.split('because ', x)[1] if ('because' in x) else np.nan)
dfl['birds'] = dfl['raw'].apply(lambda x : re.split('birds ', x)[1] if ('birds' in x) else np.nan)
dfl['animals'] = dfl['raw'].apply(lambda x : re.split('animals ', x)[1] if ('animals' in x) else np.nan)
# 'year' in x is also True for 'years', so one test is enough
dfl['years'] = dfl['raw'].apply(lambda x : re.findall(r'\d{4}', x) if ('year' in x) else np.nan)

If we don't put `[1]`, `re.split` returns the whole list of pieces for each row; `[1]` keeps only the text that follows the keyword.
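What the `[1]` selects is easier to see on a single sentence. The sketch below uses a made-up, shortened sentence. The helper `after_keyword` is hypothetical and not part of the notebook: it shows one way to guard against a missing keyword instead of testing with `in` first.

```python
import re

# Hypothetical, shortened sentence
sentence = 'The survival rate is lower because of hot and dry summers'

parts = re.split('because ', sentence)
print(parts)      # two pieces: the text before and after the keyword
print(parts[1])   # only the text after 'because '

def after_keyword(keyword, text):
    """Return the text after the first occurrence of keyword, or None."""
    pieces = re.split(keyword, text, maxsplit=1)
    return pieces[1] if len(pieces) > 1 else None

print(after_keyword('because ', sentence))
print(after_keyword('because ', 'no reason given'))
```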

In [75]:
dfl

| | raw | billion_$ | reasons | birds | animals | years |
|---|---|---|---|---|---|---|
| 0 | This year in 2020, the rainfall in this regio... | NaN | NaN | NaN | NaN | [2020] |
| 1 | And the primary reason we think is because of... | NaN | of increased tree plantations since the year 2005 | NaN | NaN | [2005] |
| 2 | Most notably the organizations such as "Save ... | NaN | NaN | NaN | NaN | NaN |
| 3 | Annually each of these organizations have pla... | NaN | NaN | NaN | NaN | NaN |
| 4 | And about 25% of those have now grown into ta... | NaN | NaN | NaN | NaN | NaN |
| 5 | The survival rate of such saplings is lower b... | NaN | of hot and dry summer temperatures that are up... | NaN | NaN | NaN |
| 6 | There were 500, 245 and 793 volunteers from e... | NaN | NaN | NaN | NaN | NaN |
| 7 | The new forest canopy also provides a lush gr... | NaN | NaN | NaN | NaN | NaN |
| 8 | We can now see species of birds such as parro... | NaN | NaN | such as parrots increase in numbers fro 500 to... | NaN | NaN |
| 9 | Few species of animals such as monkeys have a... | NaN | NaN | NaN | such as monkeys have also grown in population ... | NaN |


### Get market size

In [76]:
dfnew = pd.DataFrame({
    'year' : dfl.loc[15, 'years'],
    'billion_$' : dfl.loc[15, 'billion_$']
})
dfnew

| | year | billion_$ |
|---|---|---|
| 0 | 2005 | $60 |
| 1 | 2020 | $300 |


In [79]:
x = dfl['animals'].dropna()
x

9    such as monkeys have also grown in population ...
Name: animals, dtype: object

In [88]:
x = dfl['animals'].dropna().values[0]
a_count = re.findall('\d+', x)
a_name = re.split('such as ', x)[1].split(' ')[0]
dfnew['animal_population'] = a_count

In [87]:
re.split('such as ', x)[1].split(' ')[0]

'monkeys'

In [89]:
dfnew

| | year | billion_$ | animal_population |
|---|---|---|---|
| 0 | 2005 | $60 | 200 |
| 1 | 2020 | $300 | 400 |


<a id="3"></a>
# **<span style="color:#00BFC4;"> Pandas: Extract, Contains in Python using regex  </span>**

How to use regex or regular expressions in Pandas?<br>
How to split strings in a Series and DataFrame?<br>
How to split index?<br>
How to check if a pattern exists in a Series or a column of DataFrame?<br>
How to use regex to directly split strings into dummy variables?<br>

<b>Commands:</b><br>
<li>extract() with and without expand<br>
<li>extractall()<br>
<li>.contains()

### Data

In [2]:
# s : Series
# p : pattern

In [5]:
s = pd.Series(['m1', 'n2', 'o3', '4'])

### Extract

`extract` splits the strings within a Series by capture group and puts the pieces into different columns of a DataFrame, or into a new Series.

In [6]:
s

0    m1
1    n2
2    o3
3     4
dtype: object

In [7]:
p = r'([mn])(\d)'

In [14]:
x = s.str.extract(p, expand=False)
print(type(x))
x

<class 'pandas.core.frame.DataFrame'>


| | 0 | 1 |
|---|---|---|
| 0 | m | 1 |
| 1 | n | 2 |
| 2 | NaN | NaN |
| 3 | NaN | NaN |


In [18]:
p = r'(?P<letter>[mn])(?P<number>\d)'
x = s.str.extract(p, expand=False)
print(type(x))
x

<class 'pandas.core.frame.DataFrame'>


| | letter | number |
|---|---|---|
| 0 | m | 1 |
| 1 | n | 2 |
| 2 | NaN | NaN |
| 3 | NaN | NaN |


In [22]:
p = r'([mn])?(\d)'
x = s.str.extract(p, expand=False)
print(type(x))
x

<class 'pandas.core.frame.DataFrame'>


| | 0 | 1 |
|---|---|---|
| 0 | m | 1 |
| 1 | n | 2 |
| 2 | NaN | 3 |
| 3 | NaN | 4 |


#### expand = True vs. False

In [23]:
p = r'[mn](\d)'
x = s.str.extract(p, expand=False)
print(type(x))
x

<class 'pandas.core.series.Series'>


0      1
1      2
2    NaN
3    NaN
dtype: object

In [24]:
p = r'[mn](\d)'
x = s.str.extract(p, expand=True)
print(type(x))
x

<class 'pandas.core.frame.DataFrame'>


| | 0 |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | NaN |
| 3 | NaN |


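With a single capture group, `expand=False` returns a Series (or an Index when applied to an index), while `expand=True` always returns a DataFrame with one column per group.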

### Index

In [25]:
s.index = ['M1', 'N2', 'O3', 'Q4']

In [28]:
s

M1    m1
N2    n2
O3    o3
Q4     4
dtype: object

In [26]:
p = r'[MN](?P<number>\d)'
x = s.index.str.extract(p, expand=False)
print(type(x))
x

<class 'pandas.core.indexes.base.Index'>


Index(['1', '2', nan, nan], dtype='object', name='number')

In [27]:
p = r'[MN](?P<number>\d)'
x = s.index.str.extract(p, expand=True)
print(type(x))
x

<class 'pandas.core.frame.DataFrame'>


| | number |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | NaN |
| 3 | NaN |


In [30]:
p = r'(?P<letter>[MN])(?P<number>\d)'
x = s.index.str.extract(p, expand=True)
print(type(x))
x

<class 'pandas.core.frame.DataFrame'>


| | letter | number |
|---|---|---|
| 0 | M | 1 |
| 1 | N | 2 |
| 2 | NaN | NaN |
| 3 | NaN | NaN |


### extract all

In [33]:
s.iloc[0] = 'm1m2'   # positional assignment ; s[0] is ambiguous once the index holds string labels
s

M1    m1m2
N2      n2
O3      o3
Q4       4
dtype: object

In [35]:
p = r'(?P<letter>[a-zA-Z])(?P<number>\d)'
s.str.extractall(p)

| | match | letter | number |
|---|---|---|---|
| M1 | 0 | m | 1 |
| M1 | 1 | m | 2 |
| N2 | 0 | n | 2 |
| O3 | 0 | o | 3 |


### contains

In [37]:
s = pd.Series(['0a2', '3bb', '4', 'q'])
s

0    0a2
1    3bb
2      4
3      q
dtype: object

In [38]:
p = r'[0-9][a-z][0-9]'
x = s.str.contains(p)
print(type(x))
x

<class 'pandas.core.series.Series'>


0     True
1    False
2    False
3    False
dtype: bool

In [39]:
s.str.match(p)

0     True
1    False
2    False
3    False
dtype: bool
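On the Series above, `contains` and `match` give the same answer, so the difference between them does not show. The sketch below uses made-up values where they disagree, and adds `str.fullmatch` for comparison. It assumes a pandas version that provides `Series.str.fullmatch` (1.1 or later).

```python
import pandas as pd

s = pd.Series(['0a2', 'x0a2', '0a2x'])   # hypothetical sample values
p = r'[0-9][a-z][0-9]'

result = pd.DataFrame({
    'value': s,
    'contains': s.str.contains(p),     # pattern anywhere in the string
    'match': s.str.match(p),           # pattern at the start of the string
    'fullmatch': s.str.fullmatch(p),   # pattern must cover the whole string
})
print(result)
```

This mirrors `re.search`, `re.match` and `re.fullmatch` from the first part: `contains` behaves like a search, `match` is anchored at the start only.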