# [Regular Expressions in Python](https://www.datacamp.com/completed/statement-of-accomplishment/course/43a09b72c90c13053dfdc04f32f46fbb77b25cbe)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adamelliotfields/datacamp/blob/main/notebooks/courses/regular_expressions_in_python/notebook.ipynb)
[![Render nbviewer](https://raw.githubusercontent.com/jupyter/design/main/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/adamelliotfields/datacamp/blob/main/notebooks/courses/regular_expressions_in_python/notebook.ipynb)

**Contents:**
  * [String Manipulation](#string-manipulation)
  * [String Formatting](#string-formatting)
  * [Regular Expressions](#regular-expressions)
  * [Advanced Regular Expressions](#advanced-regular-expressions)

In [1]:
import re
import pandas as pd
from datetime import datetime
from string import Template


## String Manipulation

In [2]:
movie = "fox and kelley soon become bitter rivals because the new fox books store is opening up right across the block from the small business ."
statement = "Number of characters in this review:"

# find number of characters
length_string = len(movie)

# convert to string
to_string = str(length_string)

# concatenate
print(f"{statement} {to_string}")


Number of characters in this review: 135


In [3]:
movie1 = "the most significant tension of _election_ is the potential relationship between a teacher and his student ."
movie2 = "the most significant tension of _rushmore_ is the potential relationship between a teacher and his student ."
# first 32 characters
first_part = movie1[:32]

# starting from 43rd character
last_part = movie1[42:]

# from 33rd to 42nd
middle_part = movie2[32:42]

# same!
print(f"{first_part}{middle_part}{last_part}")
print(movie2)


the most significant tension of _rushmore_ is the potential relationship between a teacher and his student .
the most significant tension of _rushmore_ is the potential relationship between a teacher and his student .


In [4]:
movie = "oh my God! desserts I stressed was an ugly movie"

# get the word
movie_title = movie[11:30]

# get the palindrome
palindrome = movie_title[::-1]

# same!
print(movie_title)
print(palindrome)


desserts I stressed
desserts I stressed


In [5]:
movie = "$I supposed that coming from MTV Films I should expect no less$"

# convert to lowercase
movie_lower = movie.lower()

# remove `$`
movie_no_sign = movie_lower.strip("$")

# split into substrings
movie_split = movie_no_sign.split()

# get 2nd word, all but last letter
word_root = movie_split[1][:-1]
print(movie_no_sign)
print(word_root)


i supposed that coming from mtv films i should expect no less
suppose


In [6]:
movie = "the film,however,is all good<\\i>"

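# remove the trailing tag; note that `strip` removes any of these characters from both ends, not the exact substring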
movie_tag = movie.strip("<\\i>")

# split on commas
movie_no_comma = movie_tag.split(",")

# join back
movie_join = " ".join(movie_no_comma)

# print
print(movie_tag)
print(movie_join)


the film,however,is all good
the film however is all good


In [7]:
file = "mtv films election, a high school comedy, is a current example\nfrom there, director steven spielberg wastes no time, taking us into the water on a midnight swim"

# split at line boundaries (newline characters)
file_split = file.split("\n")

# split by commas
for substring in file_split:
    substring_split = substring.split(",")
    print(substring_split)


['mtv films election', ' a high school comedy', ' is a current example']
['from there', ' director steven spielberg wastes no time', ' taking us into the water on a midnight swim']


In [8]:
movies = pd.Series(
    [
        "it's clear that he's passionate about his beliefs , and that he's not just a punk looking for an excuse to beat people up .",
        "I believe you I always said that the actor actor actor is amazing in every movie he has played",
        "it's astonishing how frightening the actor actor norton looks with a shaved head and a swastika on his chest.",
    ],
    index=[200, 201, 202],
    name="text",
)

# if actor is not between char 37 and 41 (inclusive)
for movie in movies:
    if movie.find("actor", 37, 42) == -1:
        print("Word not found")
    # replace "actor actor" with "actor" only if repeated twice
    elif movie.count("actor") == 2:
        print(movie.replace("actor actor", "actor"))
    # replace "actor actor actor" with "actor"
    else:
        print(movie.replace("actor actor actor", "actor"))


Word not found
I believe you I always said that the actor is amazing in every movie he has played
it's astonishing how frightening the actor norton looks with a shaved head and a swastika on his chest.


In [9]:
movies = pd.Series(
    [
        "heck , jackie doesn't even have enough money f...",
        "in condor , chan plays the same character he's...",
    ],
    index=[137, 138],
    name="text",
)

for movie in movies:
    # find index of `money` between 12 and 50
    try:
        print(movie.index("money", 12, 51))
    except ValueError:
        print("substring not found")


39
substring not found


In [10]:
movies = "the rest of the story isn't important because all it does is serve as a mere backdrop for the two stars to share the screen ."

# replace negations
movies_no_negation = movies.replace("isn't", "is")

# replace "important" with "insignificant"
movies_antonym = movies_no_negation.replace("important", "insignificant")

# print
print(movies_antonym)


the rest of the story is insignificant because all it does is serve as a mere backdrop for the two stars to share the screen .


## String Formatting

In [11]:
wikipedia_article = "In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals."
my_list = []

# characters 4 to 19 (inclusive)
first_pos = wikipedia_article[3:19]

# characters 22 to 44 (inclusive)
second_pos = wikipedia_article[21:44]

# template
my_list.append("The tool {} is used in {}")

# rearrange template
my_list.append("The tool {1} is used in {0}")

# print
for my_string in my_list:
    print(my_string.format(first_pos, second_pos))


The tool computer science is used in artificial intelligence
The tool artificial intelligence is used in computer science


In [12]:
courses = ["artificial intelligence", "neural networks"]
plan = {
    "field": courses[0],
    "tool": courses[1],
}

# use dict in template
my_message = (
    "If you are interested in {plan[field]}, you can take the course related to {plan[tool]}"
)

# pass the dict as the keyword argument `plan`
print(my_message.format(plan=plan))


If you are interested in artificial intelligence, you can take the course related to neural networks


In [13]:
# get today's date
get_date = datetime.now()

# add named placeholders with format specifiers
message = "Good morning. Today is {today:%B %d, %Y}. It's {today:%H:%M} ... time to work!"

# print
print(message.format(today=get_date))


Good morning. Today is September 30, 2023. It's 11:29 ... time to work!


### F-strings

Proposed in [PEP 498](https://peps.python.org/pep-0498) and introduced in Python 3.6, _f-strings_ offer a concise way to embed expressions inside string literals.

_Format specifiers_ follow a colon (`:`) and specify how the expression should be formatted, for example limiting a float to 2 decimal places. A _conversion flag_ such as `!r` applies `repr()` to the value first, which is how a string ends up quoted.
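
As a small illustration of the syntax, here is a sketch with a few common specifiers and the `!r` conversion. The variable names and values are made up for this example and are not part of the course data.

```python
# Hypothetical values, chosen only to illustrate the syntax.
ratio = 3.14159
label = "pi"
population = 1234567

two_places = f"{ratio:.2f}"   # fixed-point with 2 decimal places
quoted = f"{label!r}"         # !r applies repr(), which adds the quotes
padded = f"{label:>6}"        # right-align in a field 6 characters wide
grouped = f"{population:,}"   # comma as thousands separator

print(two_places)  # 3.14
print(quoted)      # 'pi'
print(padded)
print(grouped)     # 1,234,567
```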

In [14]:
field1 = "sexiest job"
field2 = "data is produced daily"
field3 = "Individuals"
fact1 = 21
fact2 = 2500000000000000000
fact3 = 72.41415415151
fact4 = 1.09

# complete f-strings
print(f"Data science is considered {field1!r} in the {fact1:d}st century")
print(f"About {fact2:e} of {field2} in the world")
print(f"{field3} create around {fact3:.2f}% of the data but only {fact4:.1f}% is analyzed")


Data science is considered 'sexiest job' in the 21st century
About 2.500000e+18 of data is produced daily in the world
Individuals create around 72.41% of the data but only 1.1% is analyzed


In [15]:
number1 = 120
number2 = 7
string1 = "httpswww.datacamp.com"
list_links = [
    "www.news.com",
    "www.google.com",
    "www.yahoo.com",
    "www.bbc.com",
    "www.msn.com",
    "www.facebook.com",
    "www.news.google.com",
]

# complete f-strings
print(
    f"{number1} tweets were downloaded in {number2} minutes indicating a speed of {number1 / number2:.1f} tweets per min"
)
print(f"{string1.replace('https', '')}")
print(f"Only {((len(list_links) * 100) / 120):.2f}% of the posts contain links")


120 tweets were downloaded in 7 minutes indicating a speed of 17.1 tweets per min
www.datacamp.com
Only 5.83% of the posts contain links


In [16]:
east = {"date": datetime(2007, 4, 20, 0, 0), "price": 1232443}
west = {"date": datetime(2006, 5, 26, 0, 0), "price": 1432673}

# complete f-strings
print(
    f"The price for a house in the east neighborhood was ${east['price']} in {east['date']:%m-%d-%Y}"
)
print(
    f"The price for a house in the west neighborhood was ${west['price']} in {west['date']:%m-%d-%Y}."
)


The price for a house in the east neighborhood was $1232443 in 04-20-2007
The price for a house in the west neighborhood was $1432673 in 05-26-2006.


### Template Strings

The `string` module provides a `Template` class that supports `$`-based substitutions. This is useful for string templates that are provided by users and not known until runtime. For example, in an email app, you could let users store reusable templates.

Template strings are safer for untrusted input because they only substitute named placeholders; unlike f-strings, they never evaluate expressions.
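
To make the safety point concrete, the sketch below contrasts a user-supplied `str.format` string with a user-supplied `Template`. The `Account` class and its `api_key` attribute are hypothetical, invented only for this illustration.

```python
from string import Template


class Account:
    """Hypothetical object holding a value the user should not see."""

    def __init__(self):
        self.name = "Ada"
        self.api_key = "s3cr3t"


account = Account()

# A user-supplied format string can reach into attributes of its arguments.
user_format = "Hello {acct.api_key}"
leaked = user_format.format(acct=account)

# Template only swaps in the values passed by name; ".api_key" stays literal text.
user_template = Template("Hello $name.api_key")
harmless = user_template.substitute(name=account.name)

print(leaked)    # Hello s3cr3t
print(harmless)  # Hello Ada.api_key
```

With `Template`, the caller decides exactly which values are exposed, so a user-written template cannot reach anything that was not passed in by name.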

In [17]:
tool1 = "Natural Language Toolkit"
tool2 = "TextBlob"
tool3 = "Gensim"
description1 = "suite of libraries and programs for symbolic and statistical natural language processing (NLP) for English written in the Python programming language. It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania."
description2 = "Python library for processing textual data. It provides a simple API for diving into common natural language processing tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more."
description3 = "robust open-source vector space modeling and topic modeling toolkit implemented in Python. It uses NumPy, SciPy and optionally Cython for performance. Gensim is specifically designed to handle large text collections, using data streaming and efficient incremental algorithms, which differentiates it from most other scientific software packages that only target batch and in-memory processing."

# create template
wikipedia = Template("$tool is a $description")

# substitute and print
print(wikipedia.substitute(tool=tool1, description=description1))
print(wikipedia.substitute(tool=tool2, description=description2))
print(wikipedia.substitute(tool=tool3, description=description3))


Natural Language Toolkit is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for English written in the Python programming language. It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania.
TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
Gensim is a robust open-source vector space modeling and topic modeling toolkit implemented in Python. It uses NumPy, SciPy and optionally Cython for performance. Gensim is specifically designed to handle large text collections, using data streaming and efficient incremental algorithms, which differentiates it from most other scientific software packages that only target batch and in-memory processing.


In [18]:
tools = ["Natural Language Toolkit", "20", "month"]
our_tool = tools[0]
our_fee = tools[1]
our_pay = tools[2]

# create template
course = Template("We are offering a 3-month beginner course on $tool just for $$$fee ${pay}ly")

# substitute and print
print(course.substitute(tool=our_tool, fee=our_fee, pay=our_pay))


We are offering a 3-month beginner course on Natural Language Toolkit just for $20 monthly


In [19]:
answers = {"answer1": "I really like the app. But there are some features that can be improved"}

# create template
the_answers = Template("Check your answer 1: $answer1, and your answer 2: $answer2")

# use safe substitution: missing placeholders are left as-is instead of raising KeyError
try:
    print(the_answers.safe_substitute(answers))
except KeyError:
    print("Missing information")


Check your answer 1: I really like the app. But there are some features that can be improved, and your answer 2: $answer2


## Regular Expressions

A _regular expression_ (or _regex_) is a sequence of characters that defines a search pattern. Regular expressions are a powerful tool for finding patterns in text.

Regular expressions are made up of ordinary characters, which match themselves, and special _metacharacters_ such as `\d`, `\w`, or `+`. Metacharacters have a specific meaning and allow you to build complex expressions.

When writing regular expressions, it is common to use Python's _raw strings_ to avoid having to escape backslashes. Raw strings are prefixed with an `r`.
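
A short sketch of why the `r` prefix matters; the sample text is made up for this example. Without the prefix, Python processes escape sequences such as `\b` before the regex engine sees the pattern.

```python
import re

# Both spell the same two-character pattern: a backslash followed by "d".
escaped = "\\d+"
raw = r"\d+"

# Without the r prefix, Python turns "\b" into a single backspace character
# before the regex engine ever sees it.
backspace = "\b"
word_boundary = r"\b"

text = "cat concat cat."
with_raw = re.findall(r"\bcat\b", text)
without_raw = re.findall("\bcat\b", text)

print(escaped == raw)                      # True
print(len(backspace), len(word_boundary))  # 1 2
print(with_raw)                            # ['cat', 'cat']
print(without_raw)                         # []
```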

In [20]:
sentiment_analysis = "@robot9! @robot4& I have a good feeling that the show isgoing to be amazing! @robot9$ @robot7%"

# define regex
regex = r"@robot\d\W"

# find matches
print(re.findall(regex, sentiment_analysis))


['@robot9!', '@robot4&', '@robot9$', '@robot7%']


In [21]:
sentiment_analysis = "Unfortunately one of those moments wasn't a giant squid monster. User_mentions:2, likes: 9, number of retweets: 7"

# find matches
print(re.findall(r"User_mentions:\d", sentiment_analysis))
print(re.findall(r"likes:\s\d", sentiment_analysis))
print(re.findall(r"number\sof\sretweets:\s\d", sentiment_analysis))


['User_mentions:2']
['likes: 9']
['number of retweets: 7']


In [22]:
sentiment_analysis = (
    "He#newHis%newTin love with$newPscrappy. #8break%He is&newYmissing him@newLalready"
)

regex_sentence = r"\W\dbreak\W"
sentiment_sub = re.sub(regex_sentence, " ", sentiment_analysis)
regex_words = r"\Wnew\w"

# print
print(re.sub(regex_words, " ", sentiment_sub))


He is in love with scrappy.  He is missing him already


### Quantifiers

_Quantifiers_ determine how many instances of a particular character or group must be present in order for a match to be found.
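
As a quick reference, this sketch applies the four most common quantifiers to the same made-up text so their differences are visible side by side.

```python
import re

# Made-up sample text with zero, one, two, and three "a" characters.
text = "ct cat caat caaat"

zero_or_more = re.findall(r"ca*t", text)      # * : zero or more "a"
one_or_more = re.findall(r"ca+t", text)       # + : one or more "a"
zero_or_one = re.findall(r"ca?t", text)       # ? : zero or one "a"
two_to_three = re.findall(r"ca{2,3}t", text)  # {2,3} : between 2 and 3 "a"

print(zero_or_more)  # ['ct', 'cat', 'caat', 'caaat']
print(one_or_more)   # ['cat', 'caat', 'caaat']
print(zero_or_one)   # ['ct', 'cat']
print(two_to_three)  # ['caat', 'caaat']
```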

In [23]:
sentiment_analysis = pd.Series(
    [
        "Boredd. Colddd @blueKnight39 Internet keeps stuffing up. Save me! https://www.tellyourstory.com",
        "I had a horrible nightmare last night @anitaLopez98 @MyredHat31 which affected my sleep, now I'm really tired",
        "im lonely  keep me company @YourBestCompany! @foxRadio https://radio.foxnews.com 22 female, new york",
    ],
    index=[545, 546, 547],
    name="text",
)

for tweet in sentiment_analysis:
    # match http links and print
    print(re.findall(r"https?\://\S+", tweet))

    # match user mentions and print
    print(re.findall(r"@\w+\d+", tweet))


['https://www.tellyourstory.com']
['@blueKnight39']
[]
['@anitaLopez98', '@MyredHat31']
['https://radio.foxnews.com']
[]


In [24]:
sentiment_analysis = pd.Series(
    [
        "I would like to apologize for the repeated Video Games Live related tweets. 32 minutes ago",
        "@zaydia but i cant figure out how to get there / back / pay for a hotel 1st May 2019",
        "FML: So much for seniority, bc of technological ineptness 23rd June 2018 17:54",
    ],
    index=[232, 233, 234],
    name="text",
)

for date in sentiment_analysis:
    print(re.findall(r"\d{1,2}\s\w+\sago", date))
    print(re.findall(r"\d{1,2}\w+\s\w+\s\d{4}", date))
    print(re.findall(r"\d{1,2}\w+\s\w+\s\d{4}\s\d{1,2}:\d{2}", date))


['32 minutes ago']
[]
[]
[]
['1st May 2019']
[]
[]
['23rd June 2018']
['23rd June 2018 17:54']


In [25]:
sentiment_analysis = "ITS NOT ENOUGH TO SAY THAT IMISS U #MissYou #SoMuch #Friendship #Forever"

# hashtag regex
regex = r"#\w+"

# replace hashtags
no_hashtag = re.sub(regex, "", sentiment_analysis)

# get tokens by splitting text
print(re.split(r"\s+", no_hashtag))


['ITS', 'NOT', 'ENOUGH', 'TO', 'SAY', 'THAT', 'IMISS', 'U', '']


In [26]:
sentiment_analysis = [
    "AIshadowhunters.txt aaaaand back to my literature review. At least i have a friendly cup of coffee to keep me company",
    "ouMYTAXES.txt I am worried that I won't get my $900 even though I paid tax last year",
]

# match filenames that start with 2 or 3 vowels
regex = r"^[aeiouAEIOU]{2,3}\w+\.txt\b"

for text in sentiment_analysis:
    # find files
    print(re.findall(regex, text))

    # replace all matches with empty string
    print(re.sub(regex, "", text))


['AIshadowhunters.txt']
 aaaaand back to my literature review. At least i have a friendly cup of coffee to keep me company
['ouMYTAXES.txt']
 I am worried that I won't get my $900 even though I paid tax last year


In [27]:
emails = ["n.john.smith@gmail.com", "87victory@hotmail.com", "!#mary-=@msca.net"]

# username can contain letters, numbers and some symbols
# domain is separated by `@` and contains word characters followed by `.com`
regex = r"[A-Za-z0-9!#%&*\$\.]+@\w+\.com"

for example in emails:
    if re.match(regex, example):
        print("The email {email_example} is a valid email".format(email_example=example))
    else:
        print("The email {email_example} is invalid".format(email_example=example))


The email n.john.smith@gmail.com is a valid email
The email 87victory@hotmail.com is a valid email
The email !#mary-=@msca.net is invalid


In [28]:
passwords = ["Apple34!rose", "My87hou#4$", "abc123"]

# passwords can contain letters, numbers, and some symbols
# must be at least 8 characters, but no more than 20
regex = r"[A-Za-z0-9!#%&*\$\.]{8,20}"

for example in passwords:
    # the whole password must match, so the 20-character limit is enforced
    if re.fullmatch(regex, example):
        # print the result
        print("The password {example} is a valid password".format(example=example))
    else:
        print("The password {example} is invalid".format(example=example))


The password Apple34!rose is a valid password
The password My87hou#4$ is a valid password
The password abc123 is invalid


### Greedy vs Lazy

_Greedy_ and _lazy_ describe how much text a quantifier consumes. Greedy matching captures as much text as possible, while lazy matching captures as little as possible.

By default, quantifiers are greedy. You can make them lazy by appending a `?` to the quantifier.

In [29]:
string = "I want to see that <strong>amazing show</strong> again!"

# remove tags, using a lazy match so each tag is matched separately
print(re.sub(r"<.+?>", "", string))


I want to see that amazing show again!


In [30]:
sentiment_analysis = "Was intending to finish editing my 536-page novel manuscript tonight, but that will probably not happen. And only 12 pages are left "

numbers_found_greedy = re.findall(r"\d+", sentiment_analysis)
numbers_found_lazy = re.findall(r"\d+?", sentiment_analysis)

print(numbers_found_greedy)
print(numbers_found_lazy)


['536', '12']
['5', '3', '6', '1', '2']


In [31]:
sentiment_analysis = "Put vacation photos online (They were so cute) a few yrs ago. PC crashed, and now I forget the name of the site (I'm crying). "

# match text in parens
sentences_found_greedy = re.findall(r"\(.+\)", sentiment_analysis)
sentences_found_lazy = re.findall(r"\(.+?\)", sentiment_analysis)

print(sentences_found_greedy)
print(sentences_found_lazy)


["(They were so cute) a few yrs ago. PC crashed, and now I forget the name of the site (I'm crying)"]
['(They were so cute)', "(I'm crying)"]


## Advanced Regular Expressions

### Grouping

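_Grouping_ with parentheses lets you treat multiple characters as a single unit, for example to apply a quantifier to the whole group. Each group also captures the text it matched.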
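
A minimal sketch of what "a single unit" means in practice; the strings are made up for this example. A quantifier after a group repeats the whole group, and each parenthesized group also captures its matched text.

```python
import re

# Without a group, + applies only to the preceding character "a".
no_group = re.fullmatch(r"ha+", "hahaha")

# With a group, + repeats the whole unit "ha".
with_group = re.fullmatch(r"(ha)+", "hahaha")

# Each parenthesized group also captures the text it matched.
phone = re.search(r"(\d{3})-(\d{4})", "call 555-1234 now")

print(no_group)             # None
print(with_group.group(0))  # hahaha
print(phone.groups())       # ('555', '1234')
```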

In [32]:
sentiment_analysis = [
    "Just got ur newsletter, those fares really are unbelievable. Write to statravelAU@gmail.com or statravelpo@hotmail.com. They have amazing prices",
    "I should have paid more attention when we covered photoshop in my webpage design class in undergrad. Contact me Hollywoodheat34@msn.net.",
    "hey missed ya at the meeting. Read your email! msdrama098@hotmail.com",
]

# capture the name part of the email
regex_email = r"([A-Za-z0-9]+)@\S+"

for tweet in sentiment_analysis:
    # find all matches of regex in each tweet
    email_matched = re.findall(regex_email, tweet)

    # print
    print("Lists of users found in this tweet: {}".format(email_matched))


Lists of users found in this tweet: ['statravelAU', 'statravelpo']
Lists of users found in this tweet: ['Hollywoodheat34']
Lists of users found in this tweet: ['msdrama098']


In [33]:
flight = "Subject: You are now ready to fly. Here you have your boarding pass IB3723 AMS-MAD 06OCT"

# capture flight information
regex = r"([A-Z]{2})(\d{4})\s([A-Z]{3})-([A-Z]{3})\s(\d{2}[A-Z]{3})"
flight_matches = re.findall(regex, flight)

# print
print("Airline: {} Flight number: {}".format(flight_matches[0][0], flight_matches[0][1]))
print("Departure: {} Destination: {}".format(flight_matches[0][2], flight_matches[0][3]))
print("Date: {}".format(flight_matches[0][4]))


Airline: IB Flight number: 3723
Departure: AMS Destination: MAD
Date: 06OCT


### Alternation

In [34]:
sentiment_analysis = [
    "I totally love the concert The Book of Souls World Tour. It kinda amazing!",
    "I enjoy the movie Wreck-It Ralph. I watched with my boyfriend.",
    "I still like the movie Wish Upon a Star. Too bad Disney doesn't show it anymore.",
]

regex_positive = r"(love|like|enjoy).+?(movie|concert)\s(.+?)\."

for tweet in sentiment_analysis:
    positive_matches = re.findall(regex_positive, tweet)
    print("Positive comments found {}".format(positive_matches))


Positive comments found [('love', 'concert', 'The Book of Souls World Tour')]
Positive comments found [('enjoy', 'movie', 'Wreck-It Ralph')]
Positive comments found [('like', 'movie', 'Wish Upon a Star')]


### Non-capturing Groups

In [35]:
sentiment_analysis = [
    "That was horrible! I really dislike the movie The cabin and the ant. So boring.",
    "I disapprove the movie Honest with you. It's full of cliches.",
    "I dislike very much the concert After twelve Tour. The sound was horrible.",
]

# use `?:` for the non-capturing group
regex_negative = r"(hate|dislike|disapprove).+?(?:movie|concert)\s(.+?)\."

for tweet in sentiment_analysis:
    negative_matches = re.findall(regex_negative, tweet)
    print("Negative comments found {}".format(negative_matches))


Negative comments found [('dislike', 'The cabin and the ant')]
Negative comments found [('disapprove', 'Honest with you')]
Negative comments found [('dislike', 'After twelve Tour')]


### Backreferences

A _backreference_ refers to the text matched by an earlier capturing group. Groups can be numbered or named. On a match object, `group(0)` returns the entire match, while `group(1)`, `group(2)`, and so on return the individual groups.
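
The sketch below shows both flavors on a made-up sentence with a doubled word: a numbered backreference (`\1`), a named one (`(?P=word)`), and a backreference used in a replacement string.

```python
import re

# Made-up sample text containing a doubled word.
text = "this is is a test"

# Numbered backreference: \1 must repeat whatever group 1 matched.
numbered = re.search(r"\b(\w+)\s\1\b", text)

# Named backreference: (?P<word>...) defines the group, (?P=word) refers to it.
named = re.search(r"\b(?P<word>\w+)\s(?P=word)\b", text)

# In a replacement string, \1 inserts the captured text.
deduplicated = re.sub(r"\b(\w+)\s\1\b", r"\1", text)

print(numbered.group(0))    # is is
print(numbered.group(1))    # is
print(named.group("word"))  # is
print(deduplicated)         # this is a test
```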

In [36]:
contract = "Provider will invoice Client for Services performed within 30 days of performance.  Client will pay Provider as set forth in each Statement of Work within 30 days of receipt and acceptance of such invoice. It is understood that payments to Provider for services rendered shall be made in full as agreed, without any deductions for taxes of any kind whatsoever, in conformity with Provider’s status as an independent contractor. Signed on 03/25/2001."

# capture the month, day, and year when the contract was signed
regex_dates = r"Signed\son\s(\d{2})/(\d{2})/(\d{4})"
dates = re.search(regex_dates, contract)

# create a dict from the results
signature = {
    "day": dates.group(2),
    "month": dates.group(1),
    "year": dates.group(3),
}

# print
print(
    "Our first contract is dated back to {data[year]}. Particularly, the day {data[day]} of the month {data[month]}.".format(
        data=signature
    )
)


Our first contract is dated back to 2001. Particularly, the day 25 of the month 03.


In [37]:
html_tags = [
    "<body>Welcome to our course! It would be an awesome experience</body>",
    "<article>To be a data scientist, you need to have knowledge in statistics and mathematics</article>",
    "<nav>About me Links Contact me!",
]

for string in html_tags:
    # find if there is a match
    match_tag = re.match(r"<(\w+)>.*?</\1>", string)

    if match_tag:
        # if so, print the first captured group
        print("Your tag {} is closed".format(match_tag.group(1)))
    else:
        # if not, capture only the tag
        notmatch_tag = re.match(r"<(\w+)>", string)
        print("Close your {} tag!".format(notmatch_tag.group(1)))


Your tag body is closed
Your tag article is closed
Close your nav tag!


### Repeated Characters

In [38]:
sentiment_analysis = [
    "@marykatherine_q i know! I heard it this morning and wondered the same thing. Moscooooooow is so behind the times",
    "Staying at a friends house...neighborrrrrrrs are so loud-having a party",
    "Just woke up an already have read some e-mail",
]

# match words containing the same character twice in a row (a rough test for elongated words)
regex_elongated = r"\w*(\w)\1\w*"

for tweet in sentiment_analysis:
    match_elongated = re.search(regex_elongated, tweet)

    if match_elongated:
        elongated_word = match_elongated.group(0)
        print("Elongated word found: {word}".format(word=elongated_word))


Elongated word found: Moscooooooow
Elongated word found: neighborrrrrrrs


### Lookaround

In [39]:
sentiment_analysis = (
    "You need excellent python skills to be a data scientist. Must be! Excellent python"
)

# positive lookahead
look_ahead = re.findall(r"\w+(?=\spython)", sentiment_analysis)

# positive lookbehind
look_behind = re.findall(r"(?<=[Pp]ython\s)\w+", sentiment_analysis)

# print
print(look_ahead)  # ['excellent', 'Excellent']
print(look_behind)  # ['skills']


In [40]:
cellphones = ["4564-646464-01", "345-5785-544245", "6476-579052-01"]

# negative lookbehind
for phone in cellphones:
    number = re.findall(r"(?<!\d{3}-)\d{4}-\d{6}-\d{2}", phone)
    print(number)

# negative lookahead
for phone in cellphones:
    number = re.findall(r"\d{3}-\d{4}-\d{6}(?!-\d{2})", phone)
    print(number)


['4564-646464-01']
[]
['6476-579052-01']
[]
['345-5785-544245']
[]
