
# Strings and Regular Expressions

*by Dr. Larissa Munishkina, March, 2025*


## Objectives
The purpose of this notebook is to show you how to use strings and regular expressions in text analysis and text-based applications.

You will create:
* a function that validates an email address
* a function `isfloat` that checks if a string can be converted into a float
* a Caesar cipher encoder-decoder
* a hangman game (extra credit)

## What is a String?

A string literal is a sequence of characters enclosed in quotes: single quotes, double quotes, or triple quotes.

We use a string literal to represent a string value.

In [None]:
'a'       # a string literal enclosed in single quotes

In [None]:
"hello"    # string literal enclosed in double quotes

In [None]:
# string literal enclosed in triple quotes

'''I can have as many lines
as I want in a string literal
enclosed in triple quotes.'''

In programming, we store values in variables. We use assignment statements to assign values to variables.

If we assign a string to a variable, we create a string variable.

In [None]:
char = '$'
word = 'python'
sentence = 'I love Python'
paragraph = '''I can have as many lines
as I want in a string literal
enclosed in triple quotes'''

We can check a variable's type using the built-in function `type`. The data type of a string is `str`.

In [None]:
type(char)

In [None]:
type(paragraph)

As you may know, there is no difference between single and double quotes in string literals.

Triple quotes are different: we use them for string literals that span several lines. Triple-quoted strings are also used as docstrings to document code ([See Docstrings](https://peps.python.org/pep-0257/#what-is-a-docstring)).

Also, remember that the surrounding quotes of a string literal are not part of the string value. They are delimiters: they mark the boundaries of the string literal so that the Python interpreter can parse the code.

For example, the string literal `'hello'` has the value `hello`.

We can usually replace single quotes with double quotes without affecting the code, as long as the closing quote matches the opening quote. However, when a string contains a quote character, the choice of delimiters matters.

In [None]:
print("Hello, John! What's up?")

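If we replace the double quotes with single quotes, we get a `SyntaxError`: the apostrophe in `What's` ends the string literal early, and the remaining characters are not valid Python code.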

In [None]:
print('Hello, John! What's up?')

## Escape Sequences

A problem arises when a string literal contains certain characters, such as a quotation mark or an apostrophe. To solve it, we use the escape character `\`, called a backslash. A backslash changes the meaning of the character that follows it.

For example, `n` preceded by a backslash forms the newline character `\n`, and `t` preceded by a backslash forms the tab character `\t`.

We can use a backslash to create a string literal containing quotation marks and apostrophes.

Let's check the following examples.

In [None]:
print('I can\'t believe it!')

In [None]:
print('We can create \ta \tstring \tcontaining \ttabs \nand \nnewlines.')

In [None]:
print('We can print a backslash: \\')

A backslash followed by a character is called an **escape sequence**. An escape sequence has a different meaning than its characters have on their own.

In [None]:
print('\n')

In [None]:
s = '\\' + 'n'
print(s)

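Here are a few other escape sequences that can be handy for designing an editor application: `\r` (carriage return) moves the cursor back to the start of the line, and `\b` (backspace) moves it back one position. How they are displayed depends on the terminal, so the output may look different in a notebook than in a command-line terminal.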

In [None]:
print('this is not printed\rthis is printed')

In [None]:
print('we can fix the type\bo')

In [None]:
word = 'dogz'
print(word + '\bs')
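A short sketch, using only built-ins, to check what happens behind the scenes: an escape sequence is typed as two characters but stands for a single character, and control characters such as `\b` remain in the string. Only the way the terminal displays them changes.

```python
# '\n' is one character (a newline); '\\n' is two characters (a backslash and n)
print(len('\n'))    # 1
print(len('\\n'))   # 2

# The backspace is still stored in the string; repr() shows it as \x08
word = 'dogz' + '\bs'
print(len(word))    # 6
print(repr(word))   # 'dogz\x08s'
```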

## Print and Stdout

In the code above, we used the `print` function to display data. You may recall that `print` writes its output to the standard output, `stdout`, which is usually shown on the screen.

On the screen, characters are drawn as small images called **glyphs**. Through `stdout`, we can output only text: string characters, including spaces, tabs, and newlines. We cannot output other images.

To display regular images, we need to use applications with a GUI (**G**raphical **U**ser **I**nterface) window.

The `print` function displays data in a window called the terminal. The terminal, also called the console, is now usually a GUI window, either a standalone application or a panel inside an IDE.

In the early days of computing, terminals and consoles were stations with only two devices, a screen and a keyboard. These stations were wired to a mainframe computer and used to send data to and receive data from it.

When we use the `print` function to output data to the terminal window, data are converted to a string. For example, integers are converted into a sequence of digits (string characters), Booleans are converted into strings 'True' or 'False', etc.
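The conversion can be observed directly. In the sketch below (the variable names are illustrative), the built-in `str` produces the same text that `print` displays. It also shows that `print` accepts a `sep` argument, which controls what is placed between the converted values, and that `print` writes to `sys.stdout`.

```python
import sys

values = [42, 3.5, True, None]

# str() gives the text that print() shows for each value
as_strings = [str(v) for v in values]
print(as_strings)            # ['42', '3.5', 'True', 'None']

# print() converts every argument and joins them with sep (a space by default)
print(*values)               # 42 3.5 True None
print(*values, sep=' | ')    # 42 | 3.5 | True | None

# print() writes to sys.stdout, the standard output stream
sys.stdout.write('written directly to stdout\n')
```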



## Indexing and Slicing Operations

Strings are sequences of characters and stored in memory as arrays of characters.

An ***array*** is a sequence of elements, where each element has an index. An index is an integer that corresponds to the position of the element within the sequence. For example, in the sequence [1, 2, 3], the element 1 has index 0, 2 has index 1, and 3 has index 2. In Python, indexing starts from 0.

In the string `'python'`, the first character `p` has index 0, the second character `y` has index 1, and so on.

We can retrieve an element from an array using an indexing operation, which involves placing an index inside square brackets.

In [None]:
s = 'python'
s[0]  # this is an indexing operation

We can use the `len` function to determine how many elements are in a string and use that value to iterate over all its characters.

In [None]:
for i in range(len(s)):
    print(s[i])

In addition to the positive indexes used above, we can also use negative indexes. The last character in a string has the index $-1$, the second-to-last is $-2$, and so on. The first character can be accessed using the negative index `-len(s)`, where `s` is the string.

In [None]:
for i in range(-1, -len(s) - 1, -1):
    print(s[i])
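The relationship between positive and negative indexes can be checked with a small loop: a negative index `-k` refers to the same character as the positive index `len(s) - k`.

```python
s = 'python'

# every negative index -k matches the positive index len(s) - k
for k in range(1, len(s) + 1):
    assert s[-k] == s[len(s) - k]

print(s[-1], s[len(s) - 1])   # n n
print(s[-len(s)], s[0])       # p p
```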

In addition to indexing, we can use slicing to access parts of a string. Both indexing and slicing operations return new strings.

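A slicing operation `s[start:stop:step]` uses square brackets with up to three integers (start, stop, and step) separated by colons `:`. The slice begins at index start and ends just before index stop, so the character at index stop is not included.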

In [None]:
s = 'hello world'
s[0:5:1]

In [None]:
s[6:11]
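Because the stop index is excluded, `s[0:5]` contains five characters, and `s[6:11]` ends at index 10. Unlike indexing, slicing does not raise an error when start or stop is out of range. The sketch below illustrates both points.

```python
s = 'hello world'

print(s[0:5])          # hello  (indexes 0 to 4; index 5 is excluded)
print(len(s[2:7]))     # 5      (stop - start characters)
print(s[6:100])        # world  (a stop past the end is clamped to len(s))
print(repr(s[20:30]))  # ''     (an out-of-range slice is empty)

try:
    s[100]             # indexing past the end is an error
except IndexError as err:
    print('IndexError:', err)
```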

We can extract various substrings from a given string using slicing. For example, we can reverse a string.

In [None]:
s[::-1]
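As an application of reversing with `[::-1]`, here is a sketch of a palindrome check. The function name `is_palindrome` is hypothetical, and the choice to ignore case, spaces, and punctuation is an assumption.

```python
def is_palindrome(text):
    # keep only letters and digits, lowercased, so spaces, punctuation, and case are ignored
    cleaned = ''.join(ch.lower() for ch in text if ch.isalnum())
    # a palindrome reads the same forwards and backwards
    return cleaned == cleaned[::-1]

print(is_palindrome('racecar'))                         # True
print(is_palindrome('A man, a plan, a canal: Panama'))  # True
print(is_palindrome('hello world'))                     # False
```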

We can retrieve characters at even indexes.

In [None]:
n = '123456'
n[::2]

As you may notice, we can omit the start, stop, and step, as well as the second colon, in a slicing operation. When a value is omitted, a default value is used.

* If step is not specified, it defaults to $1$.
* If start is not specified, it defaults to 0 when step is positive, and $-1$ when step is negative.
* If stop is not specified, it defaults to the end of the string `len(s)` when step is positive, and to `-len(s) - 1` when step is negative.

In [None]:
s[::-1]

In [None]:
s[-1: -len(s)-1: -1]

In [None]:
s[:]

In [None]:
s[0: len(s): 1]

## String Concatenation and Formatting

We can concatenate strings using the `+` operator, which acts as a concatenation operator when both operands are strings.

Note that `+` is overloaded in Python, meaning its behavior depends on the types of its operands. For strings, it performs concatenation.

Concatenation is a binary operation on two strings, $a$ and $b$, that produces a new string $c$, formed by appending $b$ to $a$:

$c = ab$



In [None]:
s1 = 'Hello'
s2 = 'world'
s1 + s2

We can concatenate as many strings as we want.

In [None]:
s1 + ' ' + s2 + '!'
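Because `+` performs concatenation only when both operands are strings, adding a string and an integer raises a `TypeError`. A minimal sketch of the error and the usual fix, converting the number with `str` first:

```python
age = 20

try:
    sentence = 'I am ' + age + ' years old'   # str + int is not defined
except TypeError as err:
    print('TypeError:', err)
    sentence = 'I am ' + str(age) + ' years old'   # convert the number first

print(sentence)   # I am 20 years old
```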

In addition to the `+` operator, we can use the asterisk `*` to perform repeated string concatenation. This requires an integer to specify how many times the string should be repeated.

When used this way, the `*` operator is called the repetition operator, as it creates a new string by repeating the original one multiple times.

In [None]:
'=' * 10

In [None]:
'hello ' * 5

In [None]:
5 * 'hi'
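A few edge cases of the repetition operator, shown in a small sketch: a count of zero or a negative count gives an empty string, and the count must be an integer.

```python
print(repr('ab' * 0))    # ''
print(repr('ab' * -2))   # ''
print('-' * 20)          # a separator line of 20 hyphens

try:
    'ab' * 2.0           # a float count is not allowed
except TypeError as err:
    print('TypeError:', err)
```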

Often, we need to combine literal strings, variables (string, integers, floats, Booleans), and function calls in one string that we may use for data processing, printing, streaming, or storing.

We can format strings in several ways, using the following constructs and operators:

* f-strings
* the `format` string method
* the built-in `format` function
* alignment operators
* the `%` operator
* concatenation

Let's define some variables and functions and use them to form one string.

In [None]:
def add(a, b):
    return a + b

x = 10
y = 20
op = 'sum'

In f-strings, we prefix the string with **f** and use curly braces **{}** to embed expressions such as variables and function calls. Although the f-string is written with placeholders, it is evaluated at runtime, when each expression in braces is replaced with its value.

In [None]:
#@title f-string

f'The {op} of {x} and {y} is {add(x, y)}.'

The `format` string method works similarly to f-strings. We use curly braces {} as placeholders to embed variable values or function calls. However, with `format`, the actual variables and function calls are placed inside the method's parentheses, not directly in the string.

In [None]:
#@title the format method

'The {} of {} and {} is {}.'.format(op, x, y, add(x, y))

The built-in `format` function is different from the string method `format`. While both can be used for formatting, the `format` function takes a value to format and, optionally, a format specification such as `'>15d'`. It's often used for formatting numbers, dates, or other objects according to a specified format.

For example, we can use the following characters in a format specification for alignment:
* `>` align to the right
* `<` align to the left
* `^` align at the center

We can convert numbers into decimal, binary, octal, hexadecimal, scientific, and float string representations:

* d decimal
* b binary
* o octal
* x hexadecimal
* e scientific
* f float

We use a number (like $15$ in the example below) to specify the width of the formatted string. A dot followed by a number (such as `.2`) is used for floating-point numbers to set the precision, that is, how many digits appear after the decimal point.

In [None]:
#@title the format function

print('decimal notation    ', format(123, '>15d'))
print('binary notation     ', format(123, '>15b'))
print('octal notation      ', format(123, '>15o'))
print('hexadecimal notation', format(123, '>15x'))
print('scientific notation ', format(123.456, '>15e'))
print('float notation      ', format(123.456, '>15.2f'))
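A few more format specifications, each shown with the value it produces. The sketch also shows that an f-string accepts the same specification after a colon, so `f'{n:>15d}'` and `format(n, '>15d')` give the same string. The `#` option, which adds a base prefix such as `0x`, is part of Python's format specification mini-language.

```python
n = 123

print(format(n, 'b'))          # 1111011
print(format(n, '08b'))        # 01111011  (width 8, padded with zeros)
print(format(n, 'o'))          # 173
print(format(n, 'x'))          # 7b
print(format(n, '#x'))         # 0x7b      ('#' adds the base prefix)
print(format(123.456, 'e'))    # 1.234560e+02
print(format(123.456, '.2f'))  # 123.46

# the same specification works after the colon in an f-string
print(f'{n:>15d}' == format(n, '>15d'))   # True
```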


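We can use alignment and float precision with both f-strings and the `format` method. A character placed before the alignment operator is used as the fill character; for example, `{w:.>15}` right-aligns `w` in a field of width 15 and pads it with dots.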

In [None]:
w = 'hello'
f'{w:.>15}'

In [None]:
number = 123.456789
formatted_string = "The number is {:.2f}".format(number)
print(formatted_string)

In earlier versions of Python, such as 2.x, string formatting was commonly done using the `%` operator. This style is still supported in Python 3.x.

Common format specifiers include:

* `%s` for strings
* `%d` for integers
* `%f` for floating-point numbers
* `%x` for hexadecimal values

In the following example, `%s` is used for the string variable name, and `%d` is used for the integer variable age.

In [None]:
#@title string formatting operator %
name = "Alice"
age = 20
sentence = "My name is %s, and I am %d years old." % (name, age)
print(sentence)

My name is Alice, and I am 20 years old.
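To compare the three styles side by side, the sketch below builds the same sentence with an f-string, the `format` method, and the `%` operator. The variable `gpa` is made-up example data.

```python
name, age, gpa = 'Alice', 20, 3.8567   # made-up example data

a = f'{name} is {age} and has a GPA of {gpa:.2f}'
b = '{} is {} and has a GPA of {:.2f}'.format(name, age, gpa)
c = '%s is %d and has a GPA of %.2f' % (name, age, gpa)

print(a)             # Alice is 20 and has a GPA of 3.86
print(a == b == c)   # True

print('%x' % 255)          # ff
print(repr('%5d' % 42))    # '   42'  (width 5, right-aligned)
```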


## Regular Expressions

Regular expressions (often abbreviated as RE or regex) are used to search strings for specific patterns. A pattern is a string written in a special syntax that describes which character sequences should match (see [Regular Expressions](https://docs.python.org/3/howto/regex.html)).

For example, a pattern that matches a sequence of digits can be written as `\d+`, where `\d` matches any decimal digit (including 0-9), and `+` is a repetition operator that allows one or more occurrences. This pattern can be used to find integers within text.

Another common pattern is a word. In regular expressions, a word character is a letter, a digit, or an underscore, and a word is a sequence of word characters. It can be expressed as `\w+`, where `\w` matches any word character and `+` again allows repetition. This pattern is useful for identifying words in a string.

A regular expression combines literal characters (such as `A`, `1`, or `-`), which match themselves, with special sequences (such as `\d`, `\w`, and `\W`) and metacharacters (such as `*`, `.`, and `+`) to define a search pattern. In theory, regular expressions describe regular languages, which can be recognized by a finite automaton, a type of state machine that can be implemented in code and matches text quickly. Python's `re` module uses a backtracking matcher instead, which supports extra features but can be slow for some patterns.

In [None]:
#@title Finding numbers
import re
pattern = re.compile(r'\d+') # finds numbers in the text
matches = pattern.findall('5005 students and 234 teachers')
print(matches)

['5005', '234']


The same search can be written without `compile` by passing the pattern directly to `re.findall`. Compiling a pattern is convenient when the same pattern is used many times.

In [None]:
import re
matches = re.findall(r'\d+', '5005 students and 234 teachers')
print(matches)

['5005', '234']
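The patterns in these examples are written as raw strings, with the prefix `r`. In a raw string, a backslash is kept as an ordinary character, so the `re` module receives sequences such as `\d` unchanged. This matters when an escape sequence means something different to Python than to `re`. The sketch below uses `\b`, which is a backspace in a normal string (as in the escape-sequence examples earlier) but a word boundary in a regular expression.

```python
import re

# In a normal string literal, '\b' is the backspace character.
# In a raw string literal, r'\b' keeps the backslash, and re reads it as a word boundary.
print(len('\b'), len(r'\b'))   # 1 2
print(r'\d+' == '\\d+')        # True: both are a backslash, d, and +

text = 'cat catalog bobcat'
print(re.findall(r'\bcat\b', text))   # ['cat']  (only the whole word)
print(re.findall('\bcat\b', text))    # []       (looks for backspace characters)
```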


In [None]:
#@title Finding words
import re
text = 'Hello, 007! First Line.\nSecond Line\nThird Line.'
#print(text)
pattern = re.compile(r'\w+') # finds words in the text
matches = pattern.findall(text)
print(matches)

['Hello', '007', 'First', 'Line', 'Second', 'Line', 'Third', 'Line']


In [None]:
#@title Validating identifiers
import re
pattern = re.compile(r'^[A-Za-z_]\w*$')

identifier = input("Enter an identifier: ")
match = pattern.search(identifier)
if match:
    print("It is a valid identifier.")
else:
    print("It is not a valid identifier.")


Enter an identifier: #272@
It is not a valid identifier.
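Because `input` waits for the user, the check above is hard to test. Here is a non-interactive sketch with a hypothetical helper `is_identifier`. It uses `fullmatch`, which requires the whole string to match (unlike `$`, which also matches just before a trailing newline). For comparison, it prints Python's built-in `str.isidentifier` and `keyword.iskeyword`: a keyword such as `class` has the form of an identifier but cannot be used as a variable name.

```python
import keyword
import re

IDENTIFIER = re.compile(r'[A-Za-z_]\w*')   # same pattern as above, without ^ and $

def is_identifier(name):
    # fullmatch succeeds only if the pattern covers the entire string
    return IDENTIFIER.fullmatch(name) is not None

for name in ['total', '_tmp', 'x2', '2x', 'my-var', '#272@', 'class']:
    print(name, is_identifier(name), name.isidentifier(), keyword.iskeyword(name))
```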


In [None]:
#@title Validating a password
import re
p_upper = re.compile(r'[A-Z]')
p_lower = re.compile(r'[a-z]')
p_digit = re.compile(r'[0-9]')
p_symbol = re.compile(r'[@#$%&?! ]')
while True:
    password = input("Enter a password: ")
    m_lower = p_lower.search(password)
    m_upper = p_upper.search(password)
    m_digit = p_digit.search(password)
    m_symbol = p_symbol.search(password)

    if not m_lower:
        print("You need to use a lowercase letter.")
        continue
    if not m_upper:
        print("You need to use an uppercase letter.")
        continue
    if not m_digit:
        print("You need to use a digit.")
        continue
    if not m_symbol:
        print("You need to use a special character.")
        continue
    break
print('Your password is created successfully!')
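The loop above is interactive; a non-interactive version is easier to test. The sketch below puts the same patterns into a list of (pattern, description) pairs. The names `RULES` and `missing_requirements` are hypothetical, and, like the original, the sketch does not check the password length.

```python
import re

# Hypothetical helper: the same four checks as the loop above, written as a
# function that returns the missing requirements instead of reading input().
# Note that the special-character class also accepts a space.
RULES = [
    (re.compile(r'[a-z]'), 'a lowercase letter'),
    (re.compile(r'[A-Z]'), 'an uppercase letter'),
    (re.compile(r'[0-9]'), 'a digit'),
    (re.compile(r'[@#$%&?! ]'), 'a special character'),
]

def missing_requirements(password):
    # keep the description of every rule whose pattern finds no match
    return [desc for pattern, desc in RULES if not pattern.search(password)]

print(missing_requirements('hello'))       # ['an uppercase letter', 'a digit', 'a special character']
print(missing_requirements('Hello2025!'))  # []
```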

In [None]:
#@title Finding phone numbers
import re
p = re.compile(r'\(?\d{3}\)?-?\d{3}-?\d{4}')
while True:
    text = input("Enter text with phone numbers: ")
    m1 = p.search(text)
    if m1:
        print("Search group: ", m1.group())
    else:
        print("Search: no number")
        continue
    m2 = p.findall(text)
    print("Findall: ", m2)
    for m in m2:
        print("Findall group: ", m)
    m3 = p.match(text)
    if m3:
        print("Match: ", m3.group())
    else:
        print("Match: no match")
    break
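The loop above reads from `input`, so here is a non-interactive sketch on a fixed example text (the phone numbers are made up). It shows how the three methods differ: `search` returns the first match anywhere in the text, `findall` returns a list of all matches, and `match` succeeds only if the match starts at the beginning of the text. The pattern does not allow a space after `)`, so a number written as `(408) 555-9876` would not be found.

```python
import re

p = re.compile(r'\(?\d{3}\)?-?\d{3}-?\d{4}')   # the same pattern as above
text = 'Call 831-555-1234 or (408)555-9876.'

m1 = p.search(text)        # first match anywhere in the text
print(m1.group())          # 831-555-1234

m2 = p.findall(text)       # all non-overlapping matches, as strings
print(m2)                  # ['831-555-1234', '(408)555-9876']

m3 = p.match(text)         # a match only at the very beginning of the text
print(m3)                  # None, because the text starts with 'Call'

print(p.match('831-555-1234 is my number').group())   # 831-555-1234
```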

## Instructions

The instructions for using these notebooks are from Prof. Luca de Alfaro.

Copyright Luca de Alfaro, 2019-20. License: CC-BY-NC-ND.

### Notebook format

For each question in this notebook, there is:

* A text description of the problem.
* One or more places where you have to insert your solution.  You need to complete every place marked:

    `# YOUR SOLUTION HERE`
    
    and you should not modify any other place.
* One or more test cells.  Each cell is worth some number of points, marked at the top.  You should not modify these test cells.  The tests pass if no error is printed out: when there is a statement that says, for instance:

    `assert x == 2`
    
    then the test passes if `x` has value 2, and fails otherwise.
     
* You can debug your work by placing `print()` or other statements in the cell marked

    `# YOUR OPTIONAL TESTS HERE`  

    and make sure your optional tests work with the rest of the code.

### Notes

* Your code will be tested both according to the tests you can see (the `assert` statements you can see), _and_ additional tests.  This prevents you from hard-coding the answer to the particular questions posed.  Your code should solve the _general_ intended case, not hard-code the particular answer for the values used in the tests.

* **Please do not delete or add cells!** The test is autograded, and if you modify the test by adding or deleting cells, even if you re-add cells you delete, you may not receive credit.

* **Please do not import modules that are not part of the [standard library](https://docs.python.org/3/library/index.html).** You do not need any, and they will likely not be available in the grading environment, which will cause your code to fail.

* **If you are inactive too long, your notebook might get disconnected from the back-end.** Your work is never lost, but you have to re-run all the cells before you continue.

* You can write out print statements in your code, to help you test/debug it. But remember: the code is graded on the basis of what it outputs or returns, not on the basis of what it prints.

* **TAs and tutors have access to this notebook,** so if you let them know you need their help, they can look at your work and give you advice.

### Grading

Each cell where there are tests is worth a certain number of points.  You get the points allocated to a cell only if you pass _all_ the tests in the cell.

The tests in a cell include both the tests you can see, and other, similar, tests that are used for grading only.  Therefore, you cannot hard-code the solutions: you really have to solve the essence of the problem, to receive the points in a cell.


### Code of Conduct

* Work on the test yourself, alone.
* You can search documentation on the web, on sites such as the Python documentation sites, Stackoverflow, and similar, and you can use the results.
* You cannot share your work with others or solicit their help.


### Submission of your work

* First, click on "Runtime > Restart and run all", and check that you get no errors.  This enables you to catch any error you might have introduced, and not noticed, due to your running cells out of order.
* Second, submit your work on the notebook grader website before the deadline specified on Canvas to avoid late penalties.

You can submit multiple times; the last submission before the deadline is the one that counts.

# Exercises

## Question 1: An Email Validator

In this exercise, you need to implement a function called `is_valid` that takes a string and returns `True` if the string is a valid email address. Otherwise, it returns `False`.

A valid email address should be composed of three main parts:

* **Local** is the first part that typically identifies the specific user or mailbox. For example, in user@example.com, **user** is the local part. The local part can include letters, numbers, and special characters such as periods, hyphens, and underscores.
* **Domain** is the second part, separated from the local part by an ampersat **@**. It specifies the domain that hosts the email server. For example, in user@example.com, **example** is the domain name, which can include letters, numbers, and hyphens.
* **Top-level domain (TLD)** is the last part and separated from the domain by a period. For example, **.com**, **.net**, or **.org**.

In an email address such as john.doe@company.com, john.doe is the local part, and company.com is the domain part (the domain name followed by the TLD).

For simplicity, you can write a pattern that uses only letters, digits, and underscores for naming the local part and domains (do not worry about periods or hyphens that may be present in the real email addresses).

In [None]:
import re

In [None]:
#@title Implementing an email validator

def is_valid(email):
    parts = email.split("@")
    # a valid address contains exactly one "@"
    if len(parts) != 2:
        return False
    local, domain = parts
    # local part: a letter, digit, period, underscore, or hyphen, followed by
    # word characters; the hyphen is placed last so it is not read as a range
    if not re.fullmatch(r'[A-Za-z0-9._-]\w*', local):
        return False
    # domain part: a domain name made of word characters, a period, and a TLD
    if not re.fullmatch(r'\w+\.(org|net|com|edu)', domain):
        return False
    return True
    ### YOUR SOLUTION HERE

In [None]:
# Tests 10 points.

print(is_valid('hello@gmail.com'))
assert is_valid('hello@gmail.com')

print(is_valid('tom_121@cruz_2025.edu'))
assert is_valid('tom_121@cruz_2025.edu')

True
True


In [None]:
# Tests 10 points.

print(is_valid('#ello@gmail.com'))
assert not is_valid('#ello@gmail.com')

print(is_valid('hello@@gmail.com'))
assert not is_valid('hello@@gmail.com')

print(is_valid('hello@gmail.com'))
assert is_valid('hello@gmail.com')

False
False
True


**Interesting Facts:**

* The word ***ampersat*** refers to the **@** symbol, commonly used in email addresses and social media handles.
* ***Ampersat*** is a blend of ***ampers*** and ***at***.
* ***Ampers*** comes from the word ***ampersand***, which is a corruption of the phrase ***and per se and***, meaning ***and by itself and***; it refers to the **&** symbol.
* The **@** symbol has an interesting history. It is thought to have been used in medieval manuscripts as a shorthand for the Latin word **ad**, meaning **at** or **toward**. It later found its way into commerce to indicate a unit price, as in **5 apples @ $1 each**.
* Its modern use in email addresses was popularized by Ray Tomlinson in the early 1970s when he chose the **@** symbol to designate email addresses in the format **user@domain**.

Let's check addresses with a missing period `.` or a missing ampersat `@`.

In [None]:
# Tests 10 points.

print(is_valid('hello@gmailcom'))
assert not is_valid('hello@gmailcom')

print(is_valid('hello_gmail.com'))
assert not is_valid('hello_gmail.com')

print(is_valid('hello_@gmail.com'))
assert is_valid('hello_@gmail.com')

False
False
True


## Question 2: Function isfloat

Often, we need to validate user input to check whether it can be converted into a float.

Write a function called `isfloat` that determines if a string can be safely converted into a float. If the string can be converted into a float, the function should return `True`; otherwise, it should return `False`.

Your `isfloat` function should be based on regular expressions.

In [None]:
import re

In [None]:
#@title Implementing a function isfloat

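def isfloat(num):
    # an optional sign, then either digits with an optional fractional part
    # ("345", "345.", "123.456") or a period followed by digits (".456");
    # at least one digit is required, so "", "-", and "." are rejected
    pattern = re.compile(r'[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)')
    return pattern.fullmatch(num) is not None
    ### YOUR SOLUTION HERE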

In [None]:
# Tests 10 points.

print(isfloat('345'))
assert isfloat('345')

print(isfloat('$345'))
assert not isfloat('$345')

print(isfloat('345.'))
assert isfloat('345.')

print(isfloat('345.o'))  # o is not zero 0
assert not isfloat('345.o')

True
False
True
False


In [None]:
# Tests 10 points.

print(isfloat('123..'))
assert not isfloat('123..')

print(isfloat('123.456'))
assert isfloat('123.456')

print(isfloat('-123.456'))
assert isfloat('-123.456')

print(isfloat('123.-456'))
assert not isfloat('123.-456')

False
True
True
False


In [None]:
# Tests 10 points.

print(isfloat('.456'))
assert isfloat('.456')

print(isfloat('123.123.'))
assert not isfloat('123.123.')

print(isfloat('--123'))
assert not isfloat('--123')

print(isfloat('-123-123'))
assert not isfloat('-123-123')

True
False
False
False


## Question 3: A Caesar's Cipher

In this exercise, you can implement a simple encoder-decoder called a Caesar cipher. A Caesar cipher replaces each letter with the letter located a fixed number of positions (the shift) further along the alphabet.

For example, with a shift of 3, the letter A is encoded as D, the letter three positions after A. D is the fourth letter of the alphabet, whereas A is the first letter, so the distance between them is 3.

In this encoding, we use the right shift, so A becomes D, B becomes E, C becomes F, and so on. It is also easy to decode the cipher because we can use the left shift to reverse the encoding, so F becomes C, E becomes B, and D becomes A.

The only problem is how to encode the last letters of the alphabet, because they are not followed by any other letters. This problem is easily solved if we assume that the letters are arranged in a circle, so Z is followed by A. Then, with a right shift of 3, Z becomes C, Y becomes B, and X becomes A.

To achieve this functionality, we can use a circular array. A circular array is a sequence of items (that can be implemented as an array, list, or string) where items are accessed by an index that is calculated using the **modulo operation**.
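A worked example of the wrap-around, written as a short sketch (the constant `ALPHABET` mirrors the alphabet used in the exercise). In Python, `%` with a positive divisor always returns a value from 0 to divisor - 1, even for a negative left operand, so the same formula handles both right and left shifts.

```python
ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
n = len(ALPHABET)   # 26

# Right shift by 3: Z has index 25, and (25 + 3) % 26 == 2, the index of C
right = ALPHABET[(ALPHABET.index('Z') + 3) % n]
print(right)   # C

# Left shift by 3: A has index 0, and (0 - 3) % 26 == 23, the index of X.
# In Python, the result of % has the sign of the divisor, so it is never negative here.
left = ALPHABET[(ALPHABET.index('A') - 3) % n]
print(left)    # X

print((0 - 3) % n)   # 23
```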

In the following code, you need to complete the `encode` function that can be used as a Caesar cipher encoder and decoder (for encoding, we can use positive numbers for the right shift, and for decoding, we can use negative numbers for the left shift).

In [None]:
def encode(text, shift):
    alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    new_text = '' # a placeholder for the new text
    for char in text:
        if char in alphabet:
            index = alphabet.index(char)
            newindex = (index + shift) % len(alphabet)
            newletter = alphabet[newindex]
        else:
            newletter = char
        new_text=new_text+newletter
        ### YOUR SOLUTION HERE
    return new_text


In [None]:
# Tests 10 points.

text = 'HELLO'
shift = 3
print(encode(text, shift))
assert encode(text, shift) == 'KHOOR'

text = 'KHOOR'
shift = -3
print(encode(text, shift))
assert encode(text, shift) == 'HELLO'

KHOOR
HELLO


In [None]:
# Tests 10 points.

text = 'HELLO, HOW ARE YOU?'
shift = 3
print(encode(text, shift))
assert encode(text, shift) == 'KHOOR, KRZ DUH BRX?'

text = 'KHOOR, KRZ DUH BRX?'
shift = -3
print(encode(text, shift))
assert encode(text, shift) == 'HELLO, HOW ARE YOU?'

KHOOR, KRZ DUH BRX?
HELLO, HOW ARE YOU?


Now, you can write your own improved function `encode2` that handles both uppercase and lowercase letters.

In [None]:
def encode2(text, shift):
    alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    Low_case = 'abcdefghijklmnopqrstuvwxyz'
    new_text = '' # a placeholder for the new text
    for char in text:
        if char in alphabet:
            index = alphabet.index(char)
            newindex = (index + shift) % len(alphabet)
            newletter = alphabet[newindex]
        elif char in Low_case:
            index = Low_case.index(char)
            newindex = (index + shift) % len(Low_case)
            newletter = Low_case[newindex]
        else:
            newletter = char
        new_text=new_text+newletter
        ### YOUR SOLUTION HERE
    return new_text


In [None]:
# Tests 10 points.

text = 'Hello'
shift = 3
print(encode2(text, shift))
assert encode2(text, shift) == 'Khoor'

text = 'Khoor'
shift = -3
print(encode2(text, shift))
assert encode2(text, shift) == 'Hello'

Khoor
Hello


In [None]:
# Tests 10 points.

text = 'Hello, How Are You?'
shift = 3
print(encode2(text, shift))
assert encode2(text, shift) == 'Khoor, Krz Duh Brx?'

text = 'Khoor, Krz Duh Brx?'
shift = -3
print(encode2(text, shift))
assert encode2(text, shift) == 'Hello, How Are You?'

Khoor, Krz Duh Brx?
Hello, How Are You?


## Extra Credit Question 4: A Hangman Game

Hangman is a classic word-guessing game that is a mix of logic, deduction, and vocabulary. It's great for parties, classrooms, or other occasions.

The goal of the game is to guess a hidden word, letter by letter. One player chooses a word and represents it as a series of blank spaces (_ _ _). The other player guesses letters to try to fill in the blanks.

For every incorrect guess, a part of a stick figure is drawn. This stick figure is hung on a gallows, piece by piece, as guesses run out — hence the name ***Hangman***.

If the word is guessed correctly before the stick figure is fully drawn, the guesser wins! If not, the figure is completed, and the guesser loses the game.

In this exercise, you can complete the hangman game written below. At the beginning, we select a secret word and let the program run the hangman game. The program repeatedly asks the user to guess a letter and prints the secret word with the correctly guessed letters revealed.

The program does not draw a hangman but keeps track of the remaining lives.

Your task is to complete two functions: one updates the hidden word, and the other formats the hidden word together with the remaining lives and the guessed letters. In the code below, the following variables are used:

* `word` refers to the secret word (a string made of letters)
* `hidden` refers to the hidden word used to display the secret word (it is a list made of double underscores and letters)
* `letters` refers to previously guessed letters (a list)
* `lives` refers to the remaining lives (an int)


The function `output_hidden` returns a string in the following format:
```
Letters chosen: A
A __ __ __ __ lives: 5
```

where the first line shows the letters already guessed, and the second line shows the hidden word, with hidden letters shown as double underscores and correctly guessed letters shown as letters, followed by the remaining lives. In the example above, the user correctly guessed the letter `A`, the first letter of the hidden word is `A`, and the number of remaining lives is 5.


The function `update_hidden` does not return anything but modifies the list `hidden`: it replaces double underscores with the correctly guessed letter. For example, if the secret word is `APPLE` and the user chooses the letter `A`, then the first item of `hidden`, `__` (a double underscore), is changed to `A`. If the letter occurs more than once, as `P` does in `APPLE`, every occurrence should be revealed.

In [None]:

def update_hidden(word, hidden, letter):
    # reveal the guessed letter at every position where it occurs
    # (a word such as APPLE contains the letter P twice)
    for i, char in enumerate(word):
        if char == letter:
            hidden[i] = letter

def output_hidden(hidden, letters, lives):
    # first line: guessed letters; second line: hidden word and remaining lives
    letters = ' '.join(letters)
    hidden = ' '.join(hidden)
    return f"\nLetters chosen: {letters}\n{hidden} lives: {lives}"




def hangman(word, lives):
    word = word.upper()
    letters = []
    hidden = ["__"] * len(word)
    for i in range(len(word)) :
        if word[i] == '-':
            hidden[i] = '-'
        elif word[i] == '\'':
            hidden[i] = '\''

    while True:
        # format and print the game interface:
        print(output_hidden(hidden, letters, lives))

        # ask user to guess a letter
        while True:
            try:
                letter = input("Please choose a new letter > ").upper()
                if letter in letters:
                    print("You have already chosen this letter.")
                elif letter.isalpha() and len(letter) == 1:
                    break
            except Exception:
                continue

        # update the list of chosen letters
        letters.append(letter)

        # if the letter is correct update the hidden word,
        # else update the number of lives
        if letter in word:
            print("You guessed right!")
            update_hidden(word, hidden, letter)
        else:
            print("You guessed wrong, you lost one life.")
            lives -= 1

        # check if the user guesses the word correctly or lost all lives,
        # if yes finish the game
        if "__" not in hidden:
            # format and print the game interface:
            print(output_hidden(hidden, letters, lives))
            print("Congratulations!!! You won! The word is " + word + "!")
            break
        elif lives < 1:
            # format and print the game interface:
            print(output_hidden(hidden, letters, lives))
            print("You lost! The word is " + word + "!")
            break

In [None]:
# Tests 10 points.
hidden = ['__'] * 5
letters = ['A']
lives = 5

s = output_hidden(hidden, letters, lives)

print(s)
print('\nLetters chosen: A\n__ __ __ __ __ lives: 5')

print(len(s))
print(len('\nLetters chosen: A\n__ __ __ __ __ lives: 5'))

assert s == '\nLetters chosen: A\n__ __ __ __ __ lives: 5'


Letters chosen: A
__ __ __ __ __ lives: 5

Letters chosen: A
__ __ __ __ __ lives: 5
42
42


In [None]:
# Tests 10 points.
word = 'APPLE'
hidden = ['__'] * 5
letter = 'A'

update_hidden(word, hidden, letter)
print(hidden)
assert hidden == ['A', '__', '__', '__', '__']

['A', '__', '__', '__', '__']


And now, you are ready to play a hangman game. Feel free to modify it for your future projects.

In [None]:
# You need to uncomment the statement below to run the game
# hangman('apple', 5)