<a href="https://colab.research.google.com/github/Pakpako95/GooglexPython/blob/main/Google_IT_Automation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **🐍 Google IT Automation with Python Specialization**

## 🟥 **Course 2: Using Python to Interact with the Operating System**

### 🟢 **Module 3: Regular Expressions**

#### ◽**Theme 2: Basic Regular Expressions**

##### 💠**Simple Matching in Python**

In Python, regular expressions are handled by the `re` module, which provides functions for searching and manipulating strings. Use the `r` prefix to create raw strings. This keeps Python from interpreting backslashes as escape sequences before the regex engine sees the pattern.

The `search` function checks whether a pattern exists in a string. It returns a match object with information such as the position of the match (`span`) and the matching substring.

If no match is found, `search` returns `None`. Special regex characters include `^`, which matches the beginning of the string (or of each line when `re.MULTILINE` is set), and `.`, which matches any single character except a newline. Flags such as `re.IGNORECASE` modify matching behavior.
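The match object mentioned above can be inspected directly. The short sketch below reuses the notebook's own examples and shows the standard `Match` methods `span()`, `start()`, `end()` and `group()`, plus the `None` check to run before using a result.

```python
import re

# search() returns a Match object for the first match, or None if there is none.
match = re.search(r"aza", "plaza")
if match is not None:
    print(match.span())   # (2, 5): start and end positions of the match
    print(match.start())  # 2
    print(match.end())    # 5
    print(match.group())  # aza (the matched substring)

# Always check for None before using a result.
print(re.search(r"aza", "maze"))  # None

# re.IGNORECASE lets the lowercase pattern match "Pang".
print(re.search(r"^p.ng", "Pangaea", re.IGNORECASE).group())  # Pang
```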

In [None]:
import re
result = re.search(r"aza", "plaza")
print(result)

In [None]:
import re
result = re.search(r"aza", "bazaar")
print(result)

In [None]:
import re
result = re.search(r"aza", "maze")
print(result)

print(re.search(r"^x", "xenon"))

In [None]:
import re
print(re.search(r"p.ng", "penguin"))

In [None]:
import re
print(re.search(r"p.ng", "clapping"))
print(re.search(r"p.ng", "sponge"))

In [None]:
import re
print(re.search(r"p.ng", "Pangaea", re.IGNORECASE))

##### 💠**Wildcards and Character Classes**

Character classes in regex allow matching specific character groups using square brackets.

To match Python with either uppercase or lowercase p, use `[Pp]ython`. Ranges are defined with dashes: `[a-z]` for lowercase letters, `[A-Z]` for uppercase, `[0-9]` for digits.

The `^` symbol inside brackets negates the class, matching anything not in the specified set. For example, `[^a-zA-Z]` matches characters that aren't letters.

The pipe symbol `|` creates alternatives, matching either expression. For instance, `cat|dog` matches either "cat" or "dog".

While `search` finds only the first match, `findall` returns all matches in a string.
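The pieces above can be combined on a single string. This sketch uses a made-up sentence to contrast `search` (first hit only) with `findall` (every hit), and shows a range, a negated class, and an alternation side by side.

```python
import re

text = "Python 3.12 was released; python 2 is retired!"

# [Pp] matches either case of the first letter; search() stops at the first hit.
print(re.search(r"[Pp]ython", text).group())  # Python

# findall() returns every non-overlapping match as a list of strings.
print(re.findall(r"[Pp]ython", text))         # ['Python', 'python']

# A range: every single digit in the text.
print(re.findall(r"[0-9]", text))             # ['3', '1', '2', '2']

# A negated class: every character that is not a letter, digit, or space.
print(re.findall(r"[^a-zA-Z0-9 ]", text))     # ['.', ';', '!']

# Alternation: either word, wherever it appears.
print(re.findall(r"released|retired", text))  # ['released', 'retired']
```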

In [None]:
import re
print(re.search(r"[Pp]ython", "Python"))

In [None]:
import re
print(re.search(r"[a-z]way", "The end of the highway"))
print(re.search(r"[a-z]way", "What a way to go"))
print(re.search("cloud[a-zA-Z0-9]", "cloudy"))
print(re.search("cloud[a-zA-Z0-9]", "cloud9"))

In [None]:
import re
print(re.search(r"[^a-zA-Z]", "This is a sentence with spaces."))
print(re.search(r"[^a-zA-Z ]", "This is a sentence with spaces."))

print(re.search(r"cat|dog", "I like cats."))
print(re.search(r"cat|dog", "I love dogs!"))
print(re.search(r"cat|dog", "I like both dogs and cats."))

print(re.findall(r"cat|dog", "I like both dogs and cats."))

##### 💠**Repetition Qualifiers**

Repetition qualifiers in regex allow matching characters multiple times.
The `*` qualifier matches the preceding character zero or more times. Combined with `.` as `.*`, it matches any run of characters, and it is greedy: it matches as much as possible. For example, `p.*n` matches from the first "p" to the last "n" in a string.

The `+` qualifier matches one or more occurrences of the preceding character, while `?` matches zero or one occurrence. For example, `o+l+` matches one or more o's followed by one or more l's, and `p?each` matches both "each" and "peach".

These qualifiers help build more complex patterns for finding specific text patterns like the longest word in a string or hostnames in log files.
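As an example of the "longest word in a string" use case, here is one possible sketch. It assumes a "word" is a run of ASCII letters, lets `findall` collect every `[a-zA-Z]+` match (the `+` guarantees at least one letter), and picks the longest with `max()`. The helper name `longest_word` is hypothetical, not part of the course code.

```python
import re

def longest_word(text):
    """Return the longest run of letters in text (the first one wins on ties)."""
    words = re.findall(r"[a-zA-Z]+", text)
    if not words:
        return ""
    # max() with key=len returns the first word of maximum length.
    return max(words, key=len)

print(longest_word("Regular expressions are surprisingly handy"))  # surprisingly
print(longest_word("123 456") == "")  # True: no letters at all
```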

In [None]:
import re
print(re.search(r"Py.*n", "Pygmalion"))
print(re.search(r"Py.*n", "Python Programming"))
print(re.search(r"Py.*?n", "Python Programming"))
print(re.search(r"Py[a-z]*n", "Python Programming"))
print(re.search(r"Py[a-z]*n", "Pyn"))

In [None]:
import re
print(re.search(r"o+l+", "goldfish"))
print(re.search(r"o+l+", "woolly"))
print(re.search(r"o+l+", "boil"))

In [None]:
import re
print(re.search(r"p?each", "To each their own"))
print(re.search(r"p?each", "I like peaches"))

**Greedy vs Non-Greedy in Regular Expressions**

**Greedy:** Matches as much as possible.
Example: `a.*a`
→ Captures from the first `a` to the last `a`.

**Non-Greedy:** Matches as little as possible.
Example: `a.*?a`
→ Captures from the first `a` to the next closest `a`.

Explaining the expression `.*?`:

`.` matches any character (except newlines).

`*` means zero or more repetitions of the preceding item (in this case, any character).

`?` placed after `*` makes it non-greedy, so it matches the smallest possible number of characters between the two letters.

→ So `.*?` means: "the fewest possible characters of any kind."
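Running both forms on the same input makes the difference visible. The HTML snippet below is a made-up sample used only to show where each version stops; the last two lines reuse the `a.*a` and `a.*?a` patterns on "banana".

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the end, then backtracks to the LAST '>' it can use.
print(re.search(r"<.*>", html).group())   # <b>bold</b> and <i>italic</i>

# Non-greedy: .*? stops at the FIRST '>' that completes the match.
print(re.search(r"<.*?>", html).group())  # <b>

# With findall(), the non-greedy version returns each tag separately.
print(re.findall(r"<.*?>", html))         # ['<b>', '</b>', '<i>', '</i>']

# The a.*a vs a.*?a patterns on "banana":
print(re.search(r"a.*a", "banana").group())   # anana
print(re.search(r"a.*?a", "banana").group())  # ana
```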

##### 💠**Escaping Characters**

Escaping special characters in regex requires a backslash (`\`). For example, to match a literal dot rather than any character, use `\.`, as in `\.com`, which matches ".com" specifically.

Python raw strings (`r""`) avoid confusion with string escape sequences like `\n` or `\t`, because backslashes are not interpreted when the string is created.

Python's `re` module also provides special character sequences:

* `\w` matches word characters (letters, digits, and underscores)
* `\d` matches digits
* `\s` matches whitespace (space, tab, newline)
* `\b` matches a word boundary, the position between a word character and a non-word character

For regex testing and analysis, regex101.com is a helpful resource.
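The sketch below exercises each special sequence, plus escaped `$` and `.`, on small made-up strings, so the output of every pattern can be checked at a glance.

```python
import re

line = "Order #66 shipped on 2024-03-15, total: $19.99"

# \d+ finds runs of digits.
print(re.findall(r"\d+", line))  # ['66', '2024', '03', '15', '19', '99']

# \$ and \. are escaped, so they match a literal dollar sign and dot.
print(re.search(r"\$\d+\.\d+", line).group())  # $19.99

# \s+ splits on any run of whitespace (spaces, tabs, newlines).
print(re.split(r"\s+", "one  two\tthree\nfour"))  # ['one', 'two', 'three', 'four']

# \b keeps "cat" from matching inside a longer word.
print(re.findall(r"\bcat\b", "cat concatenate bobcat cat."))  # ['cat', 'cat']

# \w+ matches letters, digits, and underscores.
print(re.findall(r"\w+", "snake_case and CamelCase99"))  # ['snake_case', 'and', 'CamelCase99']
```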

In [None]:
import re
print(re.search(r".com", "welcome"))
print(re.search(r"\.com", "welcome"))
print(re.search(r"\.com", "mydomain.com"))

<re.Match object; span=(2, 6), match='lcom'>
None
<re.Match object; span=(8, 12), match='.com'>


In [None]:
import re
print(re.search(r"\w*", "This is an example"))
print(re.search(r"\w*", "And_this_is_another"))

<re.Match object; span=(0, 4), match='This'>
<re.Match object; span=(0, 19), match='And_this_is_another'>


##### 💠**Regular Expressions in Action**

Regular expressions can be combined to create powerful patterns. To match country names that start with "A" and end with "a", use `^A.*a$`, where `^` marks the beginning and `$` marks the end of the string.

To validate Python variable names (starting with letter or underscore, containing letters, numbers or underscores):

Build the pattern in steps:

1. `^[a-zA-Z_]` matches the first character
2. `[a-zA-Z0-9_]*$` matches the remaining characters up to the end
3. The complete pattern is `^[a-zA-Z_][a-zA-Z0-9_]*$`

This pattern correctly validates variable names like "my_var1" while rejecting invalid ones like "123var" (starts with number) or "my var" (contains space).

Practice with regex builds comfort with this powerful tool for text processing.
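An equivalent way to write the variable-name check uses `re.fullmatch()`, which only succeeds when the whole string matches, so the `^` and `$` anchors become unnecessary. This is a swapped-in variant, not the course's solution. One practical difference: in Python, `$` also matches just before a trailing newline, so the anchored `search` version accepts `"my_var\n"` while `fullmatch` does not. Note that this ASCII-only pattern still accepts keywords such as `for`, which Python itself rejects as variable names.

```python
import re

def is_valid_variable_name(name):
    # fullmatch() requires the entire string to match the pattern.
    return re.fullmatch(r"[a-zA-Z_][a-zA-Z0-9_]*", name) is not None

for candidate in ["my_var1", "_private", "123var", "my var", ""]:
    print(repr(candidate), is_valid_variable_name(candidate))
# 'my_var1' True
# '_private' True
# '123var' False
# 'my var' False
# '' False

# The ^...$ version lets a trailing newline slip through; fullmatch() does not.
print(re.search(r"^[a-zA-Z_][a-zA-Z0-9_]*$", "my_var\n") is not None)  # True
print(is_valid_variable_name("my_var\n"))                               # False
```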

In [None]:
import re
print(re.search(r"A.*a", "Argentina"))
print(re.search(r"A.*a", "Azerbaijan"))
print(re.search(r"^A.*a$", "Azerbaijan"))
print(re.search(r"^A.*a$", "Argentina"))

<re.Match object; span=(0, 9), match='Argentina'>
<re.Match object; span=(0, 9), match='Azerbaija'>
None
<re.Match object; span=(0, 9), match='Argentina'>


In [None]:
import re
pattern = r"^[a-zA-Z_][a-zA-Z0-9_]*$"
print(re.search(pattern, "_this_is_a_valid_variable_name"))
print(re.search(pattern, "this isn't a valid variable"))
print(re.search(pattern, "my_variable1"))
print(re.search(pattern, "2my_variable1"))

Explanation of the regex `r"^[A-Z][a-z\s]+[.?!]$"`:

`^` — Start of string

`[A-Z]` — First character must be an uppercase letter

`[a-z\s]+` — At least one lowercase letter or space

`[.?!]` — Ends with a period, question mark, or exclamation mark

`$` — End of string
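Wrapped in a function, the pattern explained above could be used like this. The function name `check_sentence` and the test sentences are hypothetical, chosen to hit each rule once.

```python
import re

def check_sentence(text):
    # Uppercase first letter, then lowercase letters or whitespace,
    # then exactly one closing punctuation mark at the very end.
    pattern = r"^[A-Z][a-z\s]+[.?!]$"
    return re.search(pattern, text) is not None

print(check_sentence("Is this is a sentence?"))  # True
print(check_sentence("is this is a sentence?"))  # False: starts lowercase
print(check_sentence("Hello"))                   # False: no closing punctuation
print(check_sentence("Hello, world!"))           # False: the comma is not allowed
```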

##### 💠**Practice of the Theme: Basic Regular Expressions**

Question 1
The check_web_address() function checks if the text passed qualifies as a top-level web address, meaning that it contains alphanumeric characters (which includes letters, numbers, and underscores), as well as periods, dashes, and a plus sign, followed by a period and a character-only top-level domain such as ".com", ".info", ".edu", etc. Fill in the regular expression to do that, using escape characters, wildcards, repetition qualifiers, beginning and end-of-line characters, and character classes.

In [None]:
import re
def check_web_address(text):
  pattern = r"^[\w\.\-\+]+\.([a-zA-Z]+)$"
  result = re.search(pattern, text)
  return result is not None

print(check_web_address("gmail.com")) # True
print(check_web_address("www@google")) # False
print(check_web_address("www.Coursera.org")) # True
print(check_web_address("web-address.com/homepage")) # False
print(check_web_address("My_Favorite-Blog.US")) # True


Question 2
The check_time() function checks for the time format of a 12-hour clock, as follows: the hour is between 1 and 12, with no leading zero, followed by a colon, then minutes between 00 and 59, then an optional space, and then AM or PM, in upper or lower case. Fill in the regular expression to do that. How many of the concepts that you just learned can you use here?

In [None]:
import re
def check_time(text):
  pattern = r"^(1[0-2]|[1-9]):[0-5][0-9]\s?(am|pm|AM|PM)$"
  result = re.search(pattern, text)
  return result is not None

print(check_time("12:45pm")) # True
print(check_time("9:59 AM")) # True
print(check_time("6:60am")) # False
print(check_time("five o'clock")) # False
print(check_time("6:02 am")) # True
print(check_time("6:02km")) # False


Question 3
The contains_acronym() function checks the text for the presence of 2 or more letters or digits surrounded by parentheses, with at least the first character in uppercase (if it's a letter), returning True if the condition is met, or False otherwise. For example, "Instant messaging (IM) is a set of communication technologies used for text-based communication" should return True since (IM) satisfies the match conditions. Fill in the regular expression in this function:

In [None]:
import re
def contains_acronym(text):
  # The first character may be an uppercase letter or a digit, as in "(4GL)"
  pattern = r"\(([A-Z0-9][a-zA-Z0-9]+)\)"
  result = re.search(pattern, text)
  return result is not None

print(contains_acronym("Instant messaging (IM) is a set of communication technologies used for text-based communication")) # True
print(contains_acronym("American Standard Code for Information Interchange (ASCII) is a character encoding standard for electronic communication")) # True
print(contains_acronym("Please do NOT enter without permission!")) # False
print(contains_acronym("PostScript is a fourth-generation programming language (4GL)")) # True
print(contains_acronym("Have fun using a self-contained underwater breathing apparatus (Scuba)!")) # True


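What does the `r` before the pattern string in `re.search(r"Py.*n", sample.txt)` indicate?

It marks a raw string: backslashes are kept as-is and passed to the regex engine instead of being treated as Python escape sequences.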

What does the plus character [+] do in regex?

Matches one or more occurrences of the character before it.



An intern implemented a zip code checker, but it works only with five-digit zip codes. Your task is to update the checker so that it includes all nine digits of the zip code; the leading five digits and the optional four after the hyphen. The zip code needs to be preceded by at least one space, and cannot be at the start of the text. Update the regular expression.

In [None]:
import re

def check_zip_code(text):
  # \s: at least one whitespace character before the zip code
  # \d{5}: the five leading digits
  # (-\d{4})?: an optional hyphen followed by four more digits
  result = re.search(r"\s\d{5}(-\d{4})?", text)
  return result is not None

# Call the check_zip_code function with test cases
print(check_zip_code("The zip codes for New York are 10001 thru 11104."))  # True
print(check_zip_code("90210 is a TV show"))  # False (no space before 90210)
print(check_zip_code("Their address is: 123 Main Street, Anytown, AZ 85258-0001."))  # True
print(check_zip_code("The Parliament of Canada is at 111 Wellington St, Ottawa, ON K1A0A9."))  # False


#### ◽**Theme 3: Advanced Regular Expressions**

##### 💠**Capturing Groups**

Capturing groups in regex allow extracting matched portions for further processing. Created by enclosing patterns in parentheses, they store matched text that can be accessed using the `groups()` method or index notation.

When working with names in "lastname, firstname" format, use `(\w+), (\w+)` to capture both parts separately. The complete match is accessed at index 0, while captured groups start at index 1.

To handle more complex names with spaces, dots, or dashes, expand the character class: `([a-zA-Z .-]+), ([a-zA-Z .-]+)`

This pattern can be implemented in a rearrange_name function that returns "firstname lastname" when the pattern matches, or the original string if no match is found.
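The sketch below applies the expanded character class to several names at once. It writes the class as `[\w .-]+` (so `\w` stands in for letters and also admits digits and underscores) and is an illustration rather than the course's reference solution; the hyphenated sample name is made up.

```python
import re

def rearrange_name(name):
    # Two groups separated by ", "; the class also allows spaces, dots and
    # dashes so middle initials and hyphenated names still match.
    result = re.search(r"^([\w .-]+), ([\w .-]+)$", name)
    if result is None:
        return name  # not in "Last, First" form: return it unchanged
    # result[0] is the whole match; result[1] and result[2] are the groups.
    return "{} {}".format(result[2], result[1])

names = ["Lovelace, Ada", "Hopper, Grace M.", "Lee-Smith, Mary-Ann", "Alan Turing"]
for name in names:
    print(rearrange_name(name))
# Ada Lovelace
# Grace M. Hopper
# Mary-Ann Lee-Smith
# Alan Turing
```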

In [None]:
import re
result = re.search(r"^(\w*), (\w*)$", "Lovelace, Ada")
print(result)
print(result.groups())
print(result[0])
print(result[1])
print(result[2])
"{} {}".format(result[2], result[1])

<re.Match object; span=(0, 13), match='Lovelace, Ada'>
('Lovelace', 'Ada')
Lovelace, Ada
Lovelace
Ada


'Ada Lovelace'

In [None]:
import re
def rearrange_name(name):
    result = re.search(r"^(\w*), (\w*)$", name)
    if result is None:
        return name
    return "{} {}".format(result[2], result[1])
rearrange_name("Lovelace, Ada")

'Ada Lovelace'

In [None]:
import re
def rearrange_name(name):
    result = re.search(r"^(\w*), (\w*)$", name)
    if result is None:
        return name
    return "{} {}".format(result[2], result[1])
rearrange_name("Ritchie, Dennis")

'Dennis Ritchie'

In [None]:
import re
def rearrange_name(name):
    result = re.search(r"^([\w \.-]*), ([\w \.-]*)$", name)
    if result is None:
        return name
    return "{} {}".format(result[2], result[1])
rearrange_name("Hopper, Grace M.")

'Grace M. Hopper'

##### 💠**More on Repetition Qualifiers**

Numeric repetition qualifiers in regex allow matching patterns a specific number of times using curly brackets:

* `{n}`: Exactly n repetitions
* `{n,m}`: Between n and m repetitions
* `{n,}`: At least n repetitions
* `{,m}`: Up to m repetitions (from zero)

For example, `[a-zA-Z]{5}` matches exactly 5 letters. To match complete words of exactly 5 letters, use `\b[a-zA-Z]{5}\b` where `\b` marks word boundaries.

The `findall` function returns all matches rather than just the first one. For instance, `\w{5,10}` finds every run of 5 to 10 word characters, while `s\w{,20}` matches "s" followed by up to 20 word characters.
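The following sketch uses a made-up string of "PINs" to show `{n,m}` with and without `\b`, an exact `{n}` count, and a `{,m}` upper bound.

```python
import re

text = "PINs: 12, 1234, 98765, 1234567"

# {4,6}: runs of 4 to 6 digits. \b stops a match from starting or
# ending inside a longer number.
print(re.findall(r"\b\d{4,6}\b", text))  # ['1234', '98765']

# Without \b, the 7-digit number is cut down to a 6-digit piece.
print(re.findall(r"\d{4,6}", text))      # ['1234', '98765', '123456']

# {n} means exactly n repetitions.
print(re.fullmatch(r"\d{4}", "2024") is not None)   # True
print(re.fullmatch(r"\d{4}", "20245") is not None)  # False

# {,m} means anywhere from 0 to m repetitions.
print(re.search(r"s\w{,3}", "strawberries").group())  # stra
```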

In [None]:
import re
print(re.search(r"[a-zA-Z]{5}", "a ghost"))

<re.Match object; span=(2, 7), match='ghost'>


In [None]:
import re
print(re.search(r"[a-zA-Z]{5}", "a scary ghost appeared"))

<re.Match object; span=(2, 7), match='scary'>


In [None]:
import re
print(re.findall(r"[a-zA-Z]{5}", "a scary ghost appeared"))

['scary', 'ghost', 'appea']


In [None]:
import re
re.findall(r"\b[a-zA-Z]{5}\b", "A scary ghost appeared")

['scary', 'ghost']

In [None]:
import re
print(re.findall(r"\w{5,10}", "I really like strawberries"))

['really', 'strawberri']


In [None]:
import re
print(re.findall(r"\w{5,}", "I really like strawberries"))

['really', 'strawberries']


In [None]:
import re
print(re.search(r"s\w{,20}", "I really like strawberries"))

<re.Match object; span=(14, 26), match='strawberries'>


**Reflection:**

In [None]:
import re

def long_words(text):
  pattern = r"\b\w{7,}\b"
  result = re.findall(pattern, text)
  return result

# Test cases
print(long_words("I like to drink coffee in the morning."))  # ['morning']
print(long_words("I also have a taste for hot chocolate in the afternoon."))  # ['chocolate', 'afternoon']
print(long_words("I never drink tea late at night."))  # []

['morning']
['chocolate', 'afternoon']
[]


##### 💠**Extracting a PID Using regexes in Python**

The process ID extraction example uses capturing groups to get numbers between square brackets:

`\[(\d+)\]` matches a pattern where:

* `\[` is an escaped opening square bracket
* `(\d+)` is a capturing group matching one or more digits
* `\]` is an escaped closing square bracket

To safely extract PIDs from log lines, the extract_pid function:

1. Searches for the pattern in the string
2. Checks if the result exists (not None)
3. Returns the first capturing group if found
4. Returns an empty string if no match exists

This approach prevents errors when processing lines without PIDs. For example, given "[12345]" it returns "12345", while for strings without this pattern it returns an empty string.

In the code cells, the regex `r"\[(\d+)\]"` is written as `r"\[" + r"(\d+)" + r"\]"` so that it displays correctly on GitHub. The concatenated string is identical, so the matching behavior does not change.
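When a whole log is available at once, `findall` can collect every PID in one call. Because the pattern has exactly one capturing group, `findall` returns only the group's text (the digits), not the bracketed match. The second log line below is made-up sample data; the loop mirrors `extract_pid()` by producing an empty string for lines without a PID.

```python
import re

# Same pattern as extract_pid(), built in pieces the way this notebook does.
PID_REGEX = r"\[" + r"(\d+)" + r"\]"

# Made-up sample log: two lines with PIDs, one without.
logs = """July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade
July 31 07:52:10 mycomputer cron[301]: INFO Job started
99 elephants in a [cage]"""

# With exactly one capturing group, findall() returns only the group's text.
print(re.findall(PID_REGEX, logs))  # ['12345', '301']

# Line by line: an empty string when there is no PID.
pids = []
for line in logs.splitlines():
    match = re.search(PID_REGEX, line)
    pids.append(match[1] if match else "")
print(pids)  # ['12345', '301', '']
```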

In [None]:
import re
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
# Regex format to prevent GitHub LaTeX rendering:
regex = r"\[" + r"(\d+)" + r"\]"
result = re.search(regex, log)
print(result[1])

12345


In [None]:
import re
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
regex = r"\[" + r"(\d+)" + r"\]"
result = re.search(regex, log)
result = re.search(regex, "A completely different string that also has numbers [34567]")
print(result[1])

34567


In [None]:
import re
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
regex = r"\[" + r"(\d+)" + r"\]"
result = re.search(regex, log)
result = re.search(regex, "A completely different string that also has numbers [34567]")
result = re.search(regex, "99 elephants in a [cage]")
print(result[1])
# Note: as shown in the video, this print() raises TypeError. "[cage]" contains no digits, so search() returns None, and None cannot be indexed.

In [None]:
import re
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
regex = r"\[" + r"(\d+)" + r"\]"
result = re.search(regex, log)
result = re.search(regex, "A completely different string that also has numbers [34567]")
result = re.search(regex, "99 elephants in a [cage]")
def extract_pid(log_line):
    regex = r"\[" + r"(\d+)" + r"\]"
    result = re.search(regex, log_line)
    if result is None:
        return ""
    return result[1]
print(extract_pid(log))

12345


In [None]:
import re
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
regex = r"\[" + r"(\d+)" + r"\]"
result = re.search(regex, log)
result = re.search(regex, "A completely different string that also has numbers [34567]")
result = re.search(regex, "99 elephants in a [cage]")
def extract_pid(log_line):
    regex = r"\[" + r"(\d+)" + r"\]"
    result = re.search(regex, log_line)
    if result is None:
        return ""
    return result[1]
print(extract_pid("99 elephants in a [cage]"))




**Reflection:** Extend the regular expression used in the extract_pid function so that it also returns the uppercase message that follows the process ID, shown in parentheses.

In [None]:
import re
def extract_pid(log_line):
    regex = r"\[" + r"(\d+)" + r"\]:\s+([A-Z]+)"
    result = re.search(regex, log_line)
    if result is None:
        return None
    return "{} ({})".format(result.group(1), result.group(2))

print(extract_pid("July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade")) # 12345 (ERROR)
print(extract_pid("99 elephants in a [cage]")) # None
print(extract_pid("A string that also has numbers [34567] but no uppercase message")) # None
print(extract_pid("July 31 08:08:08 mycomputer new_process[67890]: RUNNING Performing backup")) # 67890 (RUNNING)

12345 (ERROR)
None
None
67890 (RUNNING)


**Regular Expression Explanation:**

    # r"\[" + r"(\d+)" + r"\]:\s+([A-Z]+)"
    #
    # \[         => Match a literal '[' character. The backslash escapes the special meaning of '['.
    # (\d+)      => Match one or more digits and capture them in group 1 (this is the PID).
    # \]         => Match a literal ']' character.
    # :          => Match a literal colon ':'.
    # \s+        => Match one or more whitespace characters (space, tab, etc.).
    # ([A-Z]+)   => Match one or more uppercase letters and capture them in group 2 (the status message).

##### 💠**Splitting and Replacing**

* **The `split()` Function**

Divides text using regex patterns as separators.

Example: `re.split(r"[.?!]", text)` splits text into sentences. Wrapping the pattern in capturing parentheses, as in `re.split(r"([.?!])", text)`, keeps the separators in the result list.

* **The `sub()` Function**

Substitutes matching patterns with replacement text. For anonymizing emails: `re.sub(r"[\w.%+-]+@[\w.-]+", "[REDACTED]", text)`

Captured groups can be referenced in replacements using `\1`, `\2`, etc.

Example for name rearrangement: `re.sub(r"([a-zA-Z \.-]+), ([a-zA-Z \.-]+)", r"\2 \1", name)`

These functions expand regex capabilities for text processing, making them powerful tools for data manipulation despite their complexity.
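Two small follow-ups on the functions above, as a sketch: `split` leaves an empty string after the final punctuation mark and keeps leading spaces, which a list comprehension can tidy up; and `sub` with backreferences can reorder any captured pieces, not just names. The date format and the sample addresses are made-up examples.

```python
import re

text = "One sentence. Another one? And the last one!"

# Strip each piece and drop the empty ones.
sentences = [s.strip() for s in re.split(r"[.?!]", text) if s.strip()]
print(sentences)  # ['One sentence', 'Another one', 'And the last one']

# sub() with backreferences: reorder a YYYY-MM-DD date into DD/MM/YYYY.
print(re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "Due 2024-03-15"))  # Due 15/03/2024

# sub() without groups: redact every email-like token.
print(re.sub(r"[\w.%+-]+@[\w.-]+", "[REDACTED]", "Mail a@b.com or c.d@e.org"))
# Mail [REDACTED] or [REDACTED]
```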

In [None]:
import re
re.split(r"[.?!]", "One sentence. Another one? And the last one!")

['One sentence', ' Another one', ' And the last one', '']

In [None]:
import re
re.split(r"([.?!])", "One sentence. Another one? And the last one!")

['One sentence', '.', ' Another one', '?', ' And the last one', '!', '']

In [None]:
import re
re.sub(r"[\w.%+-]+@[\w.-]+", "[REDACTED]", "Received an email for go_nuts95@my.example.com")

'Received an email for [REDACTED]'

In [None]:
import re
re.sub(r"^([\w .-]*), ([\w .-]*)$", r"\2 \1", "Lovelace, Ada")

'Ada Lovelace'

In [None]:
import re
re.split(r"the|a", "One sentence. Another one? And the last one!")

['One sentence. Ano', 'r one? And ', ' l', 'st one!']

##### 💠**Practice**

**Advanced regex techniques expand pattern matching capabilities:**

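Alternations use the pipe symbol `|` to match any one of multiple options:

`r"location.*(London|Berlin|Madrid)"` matches `"location"` followed by any of the three cities

Position anchors limit where matches can occur:

* `^` at the beginning matches only at the start of the string
* `$` at the end matches only at the end of the string
* Example: `r"^My name is (\w+)"` only matches if the string begins with `"My name is"`

**Character ranges match single characters from defined sets:**

* `r"[A-Z]"` matches any uppercase letter
* `r"[0-9$-,.]"` matches a digit, a period, or any character in the ASCII range from the dollar sign through the comma (which also includes `%`, `&`, `(`, `)`, `*` and `+`), because the unescaped `-` between them forms a range; moving the `-` to the end of the class, as in `[0-9$,.-]`, matches only digits and those four literal symbols
* Combined with quantities: `r"([0-9]{3}-[0-9]{3}-[0-9]{4})"` matches phone numbers like "888-123-7612"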

Backreferences in substitutions refer to captured groups:

`re.sub(r"([A-Z]).\s+(\w+)", r"Ms. \2", text)` changes "A. Weber" to "Ms. Weber"

Lookahead matches patterns only when followed by another pattern:

`r"(Test\d)-(?=Passed)"` matches Test numbers only if followed by "Passed"
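Each technique in this summary can be checked with a one-liner. The sketch below runs the patterns listed above on short made-up inputs; note that the lookahead checks for "Passed" without consuming it, so `findall` returns only the captured test names.

```python
import re

# Alternation inside a group: which city follows "location"?
print(re.search(r"location.*(London|Berlin|Madrid)", "Meeting location: Berlin office")[1])  # Berlin

# ^ anchor: only matches when the string starts with the phrase.
print(re.search(r"^My name is (\w+)", "My name is Ada")[1])   # Ada
print(re.search(r"^My name is (\w+)", "Hi! My name is Ada"))  # None

# Ranges with quantities inside a group: a US-style phone number.
print(re.search(r"([0-9]{3}-[0-9]{3}-[0-9]{4})", "Call 888-123-7612 now")[1])  # 888-123-7612

# Backreference in sub(): "A. Weber" becomes "Ms. Weber".
print(re.sub(r"([A-Z]).\s+(\w+)", r"Ms. \2", "A. Weber"))  # Ms. Weber

# Lookahead: "Passed" is required but not consumed or returned.
results = "Test1-Passed Test2-Failed Test3-Passed"
print(re.findall(r"(Test\d)-(?=Passed)", results))  # ['Test1', 'Test3']
```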

Question 1
You’re working with a CSV file that contains employee information. Each record has a name field, followed by a phone number field, and a role field. The phone number field contains U.S. phone numbers and needs to be modified to the international format, with +1- in front of the phone number. The rest of the phone number should not change. Fill in the regular expression, using groups, to use the transform_record() function to do that.

In [None]:
import re
def transform_record(record):
  new_record = re.sub(r"(\d{3}-\d{3}-\d{4}|\d{3}-\d{7})", r"+1-\1", record)
  return new_record

print(transform_record("Sabrina Green,802-867-5309,System Administrator"))
# Sabrina Green,+1-802-867-5309,System Administrator

print(transform_record("Eli Jones,684-3481127,IT specialist"))
# Eli Jones,+1-684-3481127,IT specialist

print(transform_record("Melody Daniels,846-687-7436,Programmer"))
# Melody Daniels,+1-846-687-7436,Programmer

print(transform_record("Charlie Rivera,698-746-3357,Web Developer"))
# Charlie Rivera,+1-698-746-3357,Web Developer

The multi_vowel_words() function returns all words with 3 or more consecutive vowels (a, e, i, o, u). Fill in the regular expression to do that.

In [None]:
import re
def multi_vowel_words(text):
  pattern = r'\b\w*[aeiou]{3,}\w*\b'
  result = re.findall(pattern, text)
  return result

print(multi_vowel_words("Life is beautiful"))
# ['beautiful']

print(multi_vowel_words("Obviously, the queen is courageous and gracious."))
# ['Obviously', 'queen', 'courageous', 'gracious']

print(multi_vowel_words("The rambunctious children had to sit quietly and await their delicious dinner."))
# ['rambunctious', 'quietly', 'delicious']

print(multi_vowel_words("The order of a data queue is First In First Out (FIFO)"))
# ['queue']

print(multi_vowel_words("Hello world!"))
# []

When capturing regex groups, what datatype does the groups method return?

A tuple

The transform_comments() function converts comments in a Python script into those usable by a C compiler. This means looking for text that begins with a hash mark (#) and replacing it with double slashes (//), which is the C single-line comment indicator. For the purpose of this exercise, we'll ignore the possibility of a hash mark embedded inside of a Python command, and assume that it's only used to indicate a comment. We also want to treat repetitive hash marks (##), (###), etc., as a single comment indicator, to be replaced with just (//) and not (#//) or (//#). Fill in the parameters of the substitution method to complete this function:

In [None]:
import re
def transform_comments(line_of_code):
  result = re.sub(r'#+', '//', line_of_code)
  return result

print(transform_comments("### Start of program"))
# Should be "// Start of program"
print(transform_comments("  number = 0   ## Initialize the variable"))
# Should be "  number = 0   // Initialize the variable"
print(transform_comments("  number += 1   # Increment the variable"))
# Should be "  number += 1   // Increment the variable"
print(transform_comments("  return(number)"))
# Should be "  return(number)"

The convert_phone_number() function checks for a U.S. phone number format: XXX-XXX-XXXX (3 digits followed by a dash, 3 more digits followed by a dash, and 4 digits), and converts it to a more formal format that looks like this: (XXX) XXX-XXXX. Fill in the regular expression to complete this function.

In [None]:
import re
def convert_phone_number(phone):
  result = re.sub(r'(\b\d{3})-(\d{3})-(\d{4}\b)', r'(\1) \2-\3', phone)
  return result

print(convert_phone_number("My number is 212-345-9999.")) # My number is (212) 345-9999.
print(convert_phone_number("Please call 888-555-1234")) # Please call (888) 555-1234
print(convert_phone_number("123-123-12345")) # 123-123-12345
print(convert_phone_number("Phone number of Buckingham Palace is +44 303 123 7300"))
# Phone number of Buckingham Palace is +44 303 123 7300

#### ◽**Theme 4: Module Review**

##### 💠**Glossary terms from course 2, module 3**

**Terms and definitions from course 2, module 3**

**Alternation:** RegEx that matches any one of the alternatives separated by the pipe symbol

**Backreference:** This is applied when using re.sub( ) to substitute the value of a capture group into the output

**Character classes:** These are written inside square brackets and let us list the characters we want to match inside of those brackets

**Character ranges:** Ranges used to match a single character against a set of possibilities

**grep:** An especially easy-to-use yet extremely powerful tool for applying RegExes

**Lookahead:** RegEx that matches a pattern only if it’s followed by another pattern

**Regular expression:** A search query for text that's expressed by string pattern, also known as RegEx or RegExp

**Wildcard:** A character, such as `.`, that can stand for any one of many different characters

##### 💠**Qwiklabs assessment: Work with regular expressions**

**Introduction:**
It's time to put your new skills to the test! In this lab, you'll have to find the users using an old email domain in a big list using regular expressions.

**What you'll do**
* Replacing the old domain name (abc.edu) with a new domain name (xyz.edu).

* Storing all domain names, including the updated ones, in a new file.

Please note that there is a graded quiz that follows this lab. You must complete the lab before attempting the quiz. The quiz will assess your comprehension of the key concepts and procedures covered in the lab.

Here's what you can do to prepare:

* Pay close attention to the instructions and explanations provided during the lab session.

* Actively participate in the lab activities and take notes throughout.

* Review your lab notes before taking the quiz.

**Pro tip:**
You are allowed to refer to your completed lab notes during the quiz.

The data lives in `~/data`: run `cd ~/data` and `ls` to find `user_emails.csv`, then `cat user_emails.csv` to inspect it. The script lives in `~/scripts`: `cd ~/scripts` and `ls` show `script.py`. Use this file to find, with regex, every address in `user_emails.csv` that uses the old domain `abc.edu` and replace it with the new domain `xyz.edu`. The file already contains the function skeletons; you only need to complete them.

Update the permissions with `sudo chmod 777 script.py`, then open the file with `nano script.py`. It starts with `import csv` and `import re`.

**IDENTIFY THE OLD DOMAIN**
```
def contains_domain(address, domain):
  domain_pattern = r'[\w\.-]+@' + re.escape(domain) + '$'  # re.escape() makes the "." literal
  if re.match(domain_pattern, address):
    return True
  return False
```
**REPLACE THE DOMAIN NAME**
```
def replace_domain(address, old_domain, new_domain):
  old_domain_pattern = re.escape(old_domain) + '$'  # match the domain only at the end
  address = re.sub(old_domain_pattern, new_domain, address)
  return address
```
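The two helpers can be tried without the lab VM. This sketch feeds them a small in-memory CSV through `io.StringIO` instead of `/home/student/data/user_emails.csv`; the rows are made up, modeled on the expected output below, and `skipinitialspace=True` is an assumption about the file's `", "` separators. It also shows why `re.escape()` matters: without it, the `.` in `abc.edu` would match any character.

```python
import csv
import io
import re

def contains_domain(address, domain):
    """True if the address ends with "@" + domain."""
    domain_pattern = r"[\w.-]+@" + re.escape(domain) + r"$"
    return re.match(domain_pattern, address) is not None

def replace_domain(address, old_domain, new_domain):
    """Swap the trailing old_domain for new_domain."""
    return re.sub(re.escape(old_domain) + r"$", new_domain, address)

# A tiny in-memory stand-in for user_emails.csv (made-up rows).
sample = "Full Name, Email Address\nBlossom Gill, blossom@abc.edu\nOleg Noel, noel@liberomauris.ca\n"
rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

for row in rows[1:]:
    if contains_domain(row[1], "abc.edu"):
        row[1] = replace_domain(row[1], "abc.edu", "xyz.edu")

print(rows)
# [['Full Name', 'Email Address'], ['Blossom Gill', 'blossom@xyz.edu'], ['Oleg Noel', 'noel@liberomauris.ca']]
print(contains_domain("user@abcXedu", "abc.edu"))  # False: the dot is literal
```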
**Write a CSV file with replaced domain from main**
```
#!/usr/bin/env python3

import re
import csv

def contains_domain(address, domain):
    """Returns True if the email address contains the given domain in the domain position, False if not."""
    domain_pattern = r'[\w\.-]+@' + re.escape(domain) + '$'  # re.escape() makes the "." literal
    if re.match(domain_pattern, address):
        return True
    return False

def replace_domain(address, old_domain, new_domain):
    """Replaces the old domain with the new domain in the received address."""
    old_domain_pattern = re.escape(old_domain) + '$'  # match the domain only at the end
    address = re.sub(old_domain_pattern, new_domain, address)
    return address

def main():
    """Processes the list of emails, replacing any instances of the old domain with the new domain."""
    old_domain, new_domain = 'abc.edu', 'xyz.edu'
    csv_file_location = '/home/student/data/user_emails.csv'
    report_file = '/home/student/data/updated_user_emails.csv'

    user_email_list = []
    old_domain_email_list = []
    new_domain_email_list = []

    with open(csv_file_location, 'r') as f:
        user_data_list = list(csv.reader(f))
        user_email_list = [data[1].strip() for data in user_data_list[1:]]

        for email_address in user_email_list:
            if contains_domain(email_address, old_domain):
                old_domain_email_list.append(email_address)
                replaced_email = replace_domain(email_address, old_domain, new_domain)
                new_domain_email_list.append(replaced_email)

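        # Strip header cells so " Email Address" (with a leading space) is found too
        email_index = [column.strip() for column in user_data_list[0]].index('Email Address')

        for user in user_data_list[1:]:
            for old_email, new_email in zip(old_domain_email_list, new_domain_email_list):
                if user[email_index].strip() == old_email:
                    # replace() keeps any leading space the CSV had after the comma
                    user[email_index] = user[email_index].replace(old_email, new_email)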

    with open(report_file, 'w+', newline='') as output_file:
        writer = csv.writer(output_file)
        writer.writerows(user_data_list)

if __name__ == "__main__":
    main()

```
Run the script with `./script.py`, then check the result with `ls ~/data` and `cat ~/data/updated_user_emails.csv`.

Expected result:

```
Full Name, Email Address
Blossom Gill, blossom@xyz.edu
Hayes Delgado, nonummy@utnisia.com
Petra Jones, ac@xyz.edu
Oleg Noel, noel@liberomauris.ca
Ahmed Miller, ahmed.miller@nequenonquam.co.uk
Macaulay Douglas, mdouglas@xyz.edu
Aurora Grant, enim.non@xyz.edu
Madison Mcintosh, mcintosh@nisiaenean.net
Montana Powell, montanap@semmagna.org
Rogan Robinson, rr.robinson@xyz.edu
Simon Rivera, sri@xyz.edu
Benedict Pacheco, bpacheco@xyz.edu
Maisie Hendrix, mai.hendrix@xyz.edu
Xaviera Gould, xlg@utnisia.net
Oren Rollins, oren@semmagna.com
Flavia Santiago, flavia@utnisia.net
Jackson Owens, jackowens@xyz.edu
Britanni Humphrey, britanni@ut.net
Kirk Nixon, kirknixon@xyz.edu
Bree Campbell, breee@utnisia.net

```



##### 💠**Module 3 challenge: Work with Regular Expressions**

✅ Question 1

What is a regular expression?

Answer: A sequence of characters that forms a search pattern

✅ Question 2

What is a characteristic of a CSV file?

Answer: Data in each row is separated by a special character

✅ Question 3

Complete the function to extract full URLs.
You are reading an article that contains website URLs in the format:

https://www.website-domain.com

You’d like to extract the complete URLs from the text automatically, instead of copying and pasting them by hand. Complete the function find_url() to extract all encrypted websites that begin with https:// and end with any top-level domain, such as .org, .com, or .co, from the text.

In [None]:
import re

def find_url(website):
    # "https://", then host characters, a dot, and a top-level domain of 2+ letters
    pattern = r"https://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
    result = re.findall(pattern, website)  # every non-overlapping match, as a list
    return result


print(find_url("Go to the website https://www.coursera.com find more information about Google Certificate Programs. Then, visit https://www.python.org/ to learn more about Python. ")) # Should return ['https://www.coursera.com', 'https://www.python.org']
print(find_url("You can find anything on https://www.google.com!")) # Should return ['https://www.google.com']
print(find_url("You can find anything on http://www.google.com!")) # Should return []
print(find_url("Check out python.org!")) # Should return []


✅ Question 4

Split sentences keeping punctuation with the word.

You are exploring the punctuation at the end of sentences and want to split sentences so that each word is separate and any punctuation is included in the word next to it. Complete the parse_sentences() function to accomplish this task.

In [None]:
import re

def parse_sentences(sentence):
    # A word with an optional trailing !, ? or ., or a lone symbol such as "+"
    pattern = r'\w+[!?.]?|[^\w\s]'
    result = re.findall(pattern, sentence)
    return result

print(parse_sentences("Hello! How are you doing?")) # should return ['Hello!', 'How', 'are', 'you', 'doing?']
print(parse_sentences("what a beautiful day it is")) # should return ['what', 'a', 'beautiful', 'day', 'it', 'is']
print(parse_sentences("2 + 2 is definitely 4!")) # should return ['2', '+', '2', 'is', 'definitely', '4!']


✅ Question 5

Extract employee IDs. A company uses unique employee ID codes that begin with a capital letter, followed by a hyphen (-), followed by 7 or 8 digits (9 or 10 characters in total), in the format:

A-1234567 or A-12345678

Project reports submitted to the company include the employee’s ID number and a summary of the work they completed on the project. A data analyst wants to pull all of the employee ID numbers out of these projects. Complete the find_eid() function to extract these employee ID numbers from the reports.

In [None]:
import re

def find_eid(report):
    # \b keeps a match from starting or ending inside a longer token
    pattern = r'\b[A-Z]-\d{7,8}\b'
    result = re.findall(pattern, report)
    return result


print(find_eid("Employees B-1234567 and C-12345678 worked with products X-123456 and Z-123456789")) # Should return ['B-1234567', 'C-12345678']
print(find_eid("Employees B-1234567 and C-12345678, not employees b-1234567 and c-12345678")) #Should return ['B-1234567', 'C-12345678']


✅ Question 6

Role of replace_domain() in the lab.

Answer: To replace the old domain with the new domain in an email address



In [None]:
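import re

def replace_domain(address, old_domain, new_domain):
    # Escape the old domain so "." matches a literal dot, and anchor it
    # with "$" so only a domain at the end of the address is replaced.
    old_domain_pattern = re.escape(old_domain) + '$'
    address = re.sub(old_domain_pattern, new_domain, address)
    return address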

✅ Question 7

Purpose of old_domain_email_list.

Answer: To store email addresses with the old domain that match the regex pattern

✅ Question 8

Why write the updated list to an output file?

Answer: To save the changes made to the email addresses and create a new CSV file with updated data

✅ Question 9

Why replace domains and generate a new file?

Answer: To enhance data security measures

✅ Question 10

How contains_domain() and replace_domain() work together.

Answer:
contains_domain() uses a regular expression to identify emails with a specific domain, and replace_domain() replaces these domains with new ones

### 🟢 **Module 4: Managing Data and Processes**


#### ◽**Theme 1: Data Streams**

##### 💠**Reading data interactively** 24/05/20

Python's `input()` function allows interactive user input during script execution. It always returns data as a string, so conversion to other types is necessary when needed.

In [None]:
#!/usr/bin/env python3
name = input("Please enter your name: ")
print("Hello, " + name)

Please enter your name: Paco
Hello, Paco


When collecting numeric input, convert the string to the appropriate type:

In [None]:
def to_seconds(hours, minutes, seconds):
    return hours*3600+minutes*60+seconds

print("Welcome to this time converter")

cont = "y"
while cont.lower() == "y":
    hours = int(input("Enter the number of hours: "))
    minutes = int(input("Enter the number of minutes: "))
    seconds = int(input("Enter the number of seconds: "))

    print("That's {} seconds".format(to_seconds(hours, minutes, seconds)))
    print()
    cont = input("Do you want to do another conversion? [y to continue] ")

print("Goodbye!")

Welcome to this time converter
Enter the number of hours: 2
Enter the number of minutes: 30
Enter the number of seconds: 0
That's 9000 seconds

Do you want to do another conversion? [y to continue] n
Goodbye!


While interactive input isn't always the best solution for automation tasks, it's valuable for scenarios requiring user-specific information that can't be predetermined.

##### 💠**Standard Streams**

I/O streams are pathways for programs to receive input and send output. Python uses three default streams:

* STDIN (Standard Input): Channel for receiving input data, typically from the keyboard. The input() function reads from it.
* STDOUT (Standard Output): Channel for normal program output, usually displayed on screen. The print() function writes to it by default.
* STDERR (Standard Error): Channel specifically for error messages and diagnostics. Python error messages, such as a TypeError traceback, appear here.

In [None]:
#!/usr/bin/env python3

data = input("This will come from STDIN: ")
print("Now we write it to STDOUT: " + data)

This will come from STDIN: Paco
Now we write it to STDOUT: Paco


These streams aren't exclusive to Python; all system commands use them. For instance, commands like cat or ls send their output to STDOUT and their errors to STDERR, though both typically appear on screen.
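To see the two output streams kept apart, here is a minimal sketch (standard library only) in which a child Python process writes one line with print() and one with `print(..., file=sys.stderr)`; subprocess.run() then captures STDOUT and STDERR separately. Running the child with `sys.executable` is just a portable stand-in for any system command.

```python
import subprocess
import sys

# A tiny child program that writes one line to each output stream.
child_code = (
    "import sys\n"
    "print('normal output')\n"
    "print('something went wrong', file=sys.stderr)\n"
)

# Run it with the same interpreter and capture both streams as text.
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
)

print("STDOUT:", result.stdout.strip())  # STDOUT: normal output
print("STDERR:", result.stderr.strip())  # STDERR: something went wrong
```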

##### 💠**Environment variables**

The shell is a command-line interface that executes commands on Linux systems. Common shells include bash (most common), Zsh, and Fish. Environment variables are values set in the shell environment that programs can access.
Environment variables store information like:

* PATH: Lists directories the shell searches for executable files
* HOME: User's home directory location
* SHELL: Current shell being used

To view environment variables:

* Use the env command to see all variables
* Use echo $VARIABLE_NAME to see specific variables

In Python, access environment variables using the os.environ dictionary from the os module:

In [None]:
!echo $PATH

/opt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin


In [None]:
#!/usr/bin/env python3
import os
print("HOME: " + os.environ.get("HOME", ""))
print("SHELL: " + os.environ.get("SHELL", ""))
print("FRUIT: " + os.environ.get("FRUIT", ""))

HOME: /root
SHELL: /bin/bash
FRUIT: 


The get() method returns the default passed as its second argument (here an empty string) when the variable doesn't exist, instead of raising a KeyError the way `os.environ["FRUIT"]` would.
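A short sketch of the difference between get() and square-bracket access. `FRUIT_EXAMPLE` is a hypothetical variable name chosen so it is unlikely to exist; the code removes it first to be sure.

```python
import os

# FRUIT_EXAMPLE is a made-up name; remove it in case it already exists.
os.environ.pop("FRUIT_EXAMPLE", None)

# get() (and os.getenv) fall back to the default you pass in.
fallback = os.environ.get("FRUIT_EXAMPLE", "no fruit")
print(fallback)                                # no fruit
print(os.getenv("FRUIT_EXAMPLE", "no fruit"))  # no fruit

# Square-bracket access raises KeyError when the variable is missing.
try:
    os.environ["FRUIT_EXAMPLE"]
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)              # KeyError raised: True
```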

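In a shell, `export FRUIT=Strawberry` sets a variable that every program launched from that shell afterwards can read. In a notebook, though, each `!` command runs in its own short-lived subshell, so the export in the next cell does not carry over to Python; the cell after it sets the variable through `os.environ` instead: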

In [None]:
!export FRUIT=Strawberry

In [None]:
#!/usr/bin/env python3
import os
os.environ["FRUIT"] = "Strawberry"

print("HOME: " + os.environ.get("HOME", ""))
print("SHELL: " + os.environ.get("SHELL", ""))
print("FRUIT: " + os.environ.get("FRUIT", ""))

HOME: /root
SHELL: /bin/bash
FRUIT: Strawberry
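A minimal sketch showing that a value set through os.environ is inherited by child processes started afterwards. The child here is another Python interpreter (`sys.executable`), used only as a portable example of a program that reads its environment.

```python
import os
import subprocess
import sys

# Variables set through os.environ are inherited by child processes.
os.environ["FRUIT"] = "Strawberry"

# The child reads FRUIT from the environment it received from this script.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('FRUIT', ''))"],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # Strawberry
```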


##### 💠**Command-Line Arguments and Exit Status** 25/05/20

Command-line arguments provide information to programs at startup. They let generic code run automatically without user input, which makes them valuable for system administration and automation tasks. Access these values through the `sys.argv` list from the sys module. When a script runs without parameters, `sys.argv` contains only the program name; each parameter passed appears as a separate element after it.

The exit status (or return code) is a value a program returns to the shell when it finishes. In Unix-like systems, exit status 0 indicates success, while non-zero values indicate failure, and the specific value can describe the kind of error. Check the exit status of the last command with the `$?` variable. The wc command demonstrates this behavior, returning 0 for success and 1 for failure.

Python scripts exit with 0 by default when they finish normally, and typically with 1 when an unhandled exception stops them. Developers can specify custom exit codes using `sys.exit()` to indicate different error conditions. This helps with error handling, logging failures, and implementing automatic retries when commands fail.

In [None]:
import sys

# Mocking command-line arguments

sys.argv = ['script_name.py', 'one', 'two', 'three']

print(sys.argv)

['script_name.py', 'one', 'two', 'three']


In [None]:
# Execute a command and capture its return code
import subprocess

# For a successful command
result = subprocess.run(['echo', 'hello'], capture_output=True)
print(f"Exit code: {result.returncode}")  # Should print 0

# For a command that fails
result = subprocess.run(['cat', 'nonexistent_file.txt'], capture_output=True)
print(f"Exit code: {result.returncode}")  # Should print non-zero (usually 1)

Exit code: 0
Exit code: 1
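Putting command-line arguments and exit codes together, here is a sketch of a hypothetical child script that expects one argument and calls `sys.exit(1)` when it is missing. It is run through `python -c`, where `sys.argv[0]` is the string "-c" rather than a file name; the file name report.txt is made up.

```python
import subprocess
import sys

# Hypothetical child script: it expects one argument (a file name) and
# exits with status 1 if the argument is missing.
child_code = """
import sys
if len(sys.argv) < 2:
    print("usage: script.py FILENAME", file=sys.stderr)
    sys.exit(1)
print("processing", sys.argv[1])
"""

# With "python -c", sys.argv[0] is "-c" and extra items become sys.argv[1:].
no_args = subprocess.run([sys.executable, "-c", child_code],
                         capture_output=True, text=True)
with_arg = subprocess.run([sys.executable, "-c", child_code, "report.txt"],
                          capture_output=True, text=True)

print(no_args.returncode)       # 1
print(with_arg.returncode)      # 0
print(with_arg.stdout.strip())  # processing report.txt
```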


In a Linux/Unix terminal, the `$?` variable stores the exit status of the previous command:

**Exit status 0:** Indicates successful command execution (for example, after `wc variables.py` succeeds)

**Exit status non-zero:** Indicates command failure (for example, 1 after trying to access a file that doesn't exist)

##### 💠**More About Input Functions**

The shebang line #!/usr/bin/env python3 specifies Python 3 as the interpreter, which is important because of differences in handling data streams between Python versions.

In Python 2, raw_input() is the preferred method for getting user input as it simply captures a string without evaluating it. The input() function in Python 2 acts like eval(raw_input()), meaning it evaluates the entered text as a Python expression, performing operations like basic math.

In Python 3, input() behaves like Python 2's raw_input(), capturing user input as a plain string without evaluation. The expression entered is treated solely as a string. To evaluate a string as a Python expression in Python 3, eval() must be explicitly called on the input string.

Python 3 doesn't natively include raw_input(), though there are techniques to maintain backward compatibility when modernizing legacy Python code. This difference in input handling is crucial when writing code that needs to run in different Python environments.
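A sketch of the Python 3 behavior without waiting for keyboard input: the string stands in for what input() would return if the user typed `2 + 3`. The final try/except is one common compatibility idiom for code that must run on both versions, not the only option.

```python
# Stand-in for what input() returns in Python 3 if the user types "2 + 3".
user_text = "2 + 3"
print(type(user_text).__name__)  # str (no evaluation happened)

# Python 2's input() was roughly eval(raw_input()); in Python 3 you must
# call eval() yourself, and only on text you trust.
print(eval(user_text))           # 5

# Compatibility idiom: use raw_input when it exists (Python 2),
# otherwise fall back to Python 3's input.
try:
    read_line = raw_input  # only defined on Python 2
except NameError:
    read_line = input
```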

##### 💠**Practice**

**Question 1**
Which command will print out the exit value of a script that just ran successfully? **echo $?**

**Question 2** Which command will create a new environment variable? **export**

**Question 3** Which I/O stream are we using when we use the input function to accept user input in a Python script? **STDIN**

**Question 4** What is the meaning of an exit code of 0? **The program ended successfully**

**Question 5** Which statements are true about input() and raw_input() in Python 2? (select all that apply) **In Python 2, input() evaluates the user's input as an expression., raw_input() gets a raw string from the user.**

#### ◽**Theme 2: Python Subprocesses**

##### 💠**Running System Commands in Python**

The subprocess module enables Python scripts to execute system commands when built-in or external modules lack necessary functionality. The `subprocess.run()` function launches these commands and returns a CompletedProcess object containing execution information.

ICMP stands for Internet Control Message Protocol.

When executing an external command, Python creates a secondary environment for the child process (subprocess) while the parent process (Python script) remains blocked until completion. The function receives a list starting with the command name followed by any parameters.

The returncode attribute in the CompletedProcess object indicates success (0) or failure (non-zero). This information allows scripts to implement conditional logic based on command execution results.

This approach works well for commands without useful output (like cp, chmod, sleep) or when displaying output on screen is sufficient. For commands whose output needs further processing, a different strategy is required.

In [None]:
import subprocess
subprocess.run(["date"])

CompletedProcess(args=['date'], returncode=0)

In [None]:
import subprocess
subprocess.run(["date"])
subprocess.run(["sleep", "2"])

CompletedProcess(args=['sleep', '2'], returncode=0)

In [None]:
import subprocess
subprocess.run(["date"])
subprocess.run(["sleep", "2"])
result = subprocess.run(["ls", "this_file_does_not_exist"])
print(result.returncode)

2



##### 💠**Obtaining the Output of a System Command**

To manipulate system command output in Python scripts, use subprocess.run() with capture_output=True. This captures command output for further processing, useful for tasks like extracting user login information with commands like who.

When output is captured, it's stored in the stdout attribute of the CompletedProcess object. This output is a bytes object (indicated by the `b` prefix) because Python doesn't know which encoding the command used. Convert it to a string with the decode() method, which decodes as UTF-8 by default. Once decoded, the string can be manipulated with standard string operations like split().

If the command writes to standard error, that content is captured in the stderr attribute. For example, running rm with a nonexistent filename fails with a non-zero return code, an empty stdout, and an error message in stderr. This shows how Python captures the two output streams separately.

These capabilities allow Python scripts to execute system commands, verify their success or failure, and process their output, significantly expanding the range of tasks that can be automated.

In [None]:
import subprocess

result = subprocess.run(["host", "8.8.8.8"], capture_output=True)
print(result.returncode)
print(result.stdout)
print(result.stdout.decode().split())

0
b'8.8.8.8.in-addr.arpa domain name pointer dns.google.\n'
['8.8.8.8.in-addr.arpa', 'domain', 'name', 'pointer', 'dns.google.']


In [None]:
import subprocess

result = subprocess.run(["rm", "does_not_exist"], capture_output=True)
print(result.returncode)
print(result.stdout)
print(result.stderr)

1
b''
b"rm: cannot remove 'does_not_exist': No such file or directory\n"
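Two run() parameters that simplify the steps above, shown in a sketch that runs a small Python child (`sys.executable`) so it works even where host or rm behave differently: `text=True` returns str instead of bytes, and `check=True` raises subprocess.CalledProcessError for a non-zero exit status.

```python
import subprocess
import sys

# text=True decodes stdout/stderr to str, so no .decode() call is needed.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from the child')"],
    capture_output=True,
    text=True,
)
print(type(result.stdout).__name__)  # str
words = result.stdout.split()
print(words)                         # ['hello', 'from', 'the', 'child']

# check=True raises CalledProcessError when the exit status is non-zero.
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"], check=True)
    failed_status = None
except subprocess.CalledProcessError as err:
    failed_status = err.returncode
print("failed with status", failed_status)  # failed with status 3
```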


##### 💠**Advanced subprocess management**

The subprocess module offers advanced options for running system commands in Python. To modify environment variables for child processes, first copy the current environment with os.environ.copy(), make necessary changes, and pass this modified dictionary using the env parameter.

For example, adding a directory to the PATH variable means joining the new directory and the existing value with `os.pathsep.join()`, which uses the platform's path-list separator (`:` on Linux and macOS, `;` on Windows). This lets the command find executables in additional locations.

Other useful run() parameters include:

* **cwd:** Changes the current working directory for command execution
* **timeout:** Kills the process if it exceeds the specified time limit
* **shell:** When set to True, executes the command inside the default system shell, enabling variable expansions and shell operations (though this poses security risks)
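A sketch of the cwd and timeout parameters, using a temporary directory and a child Python process as stand-ins for a real command (both are assumptions for the example): the child reports its working directory, and a deliberately slow child is stopped by timeout, which raises subprocess.TimeoutExpired.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # cwd: the child process starts inside tmpdir instead of our directory.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=tmpdir,
        capture_output=True,
        text=True,
    )
    # realpath() smooths over symlinked temp dirs (e.g. on macOS).
    same_dir = os.path.realpath(result.stdout.strip()) == os.path.realpath(tmpdir)
print("child ran in tmpdir:", same_dir)  # child ran in tmpdir: True

# timeout: the child is killed and TimeoutExpired is raised after 0.5 s.
try:
    subprocess.run([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5)
    timed_out = False
except subprocess.TimeoutExpired:
    timed_out = True
print("timed out:", timed_out)  # timed out: True
```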

Using system commands via subprocesses comes with drawbacks. It builds assumptions about the infrastructure into scripts, which can break if command flags change or when switching operating systems. For one-off, well-defined tasks, subprocesses are convenient, but for complex or long-running operations, Python's built-in or external modules are usually better alternatives.

Before using subprocesses, check Python's standard library or the Python Package Index (PyPI) to avoid reinventing existing solutions.

In [None]:
import os
import subprocess

# Create a simple script to demonstrate the concept
script_path = "/tmp/myapp"
with open(script_path, "w") as f:
    f.write('#!/bin/bash\necho "Hello from myapp. My PATH is $PATH"')

# Make the script executable
os.chmod(script_path, 0o755)

# Copy current environment and modify PATH
my_env = os.environ.copy()
my_env["PATH"] = os.pathsep.join(["/tmp", my_env["PATH"]])

# Run using full path (no need for shell)
result = subprocess.run([script_path], env=my_env, capture_output=True, text=True)

# Display the result
print("Return code:", result.returncode)
print("Output:", result.stdout)


Return code: 0
Output: Hello from myapp. My PATH is /tmp:/opt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin



##### 💠**Study Guide**

subprocess allows running external applications and scripts from Python, enabling parallel execution and shell integration. It's ideal when interfacing with external processes, capturing outputs, handling errors, or running complex commands with minimal overhead.

For tasks like opening multiple files, running shell commands, or capturing return codes and pipe outputs, subprocess is preferred. However, for simple file or directory operations, os and Pathlib are more efficient and readable:

**1. Getting current directory:**

  * **subprocess:** check_output(['pwd'])

  * **os:** getcwd()

  * **Pathlib:** Path.cwd()

**2. Creating directories:**

  * **subprocess:** run(['mkdir', 'dir'])

  * **os:** mkdir('dir')

  * **Pathlib:** Path('dir').mkdir(exist_ok=True)
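A sketch comparing the approaches on a Unix-like system (it assumes a `mkdir` command is available). Everything happens inside a throwaway temporary directory; the folder names are arbitrary.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Current directory: os and pathlib agree without starting a new process.
print(os.getcwd() == str(Path.cwd()))  # True

with tempfile.TemporaryDirectory() as base:
    # Three ways to create a directory inside the throwaway folder.
    subprocess.run(["mkdir", os.path.join(base, "via_subprocess")], check=True)
    os.mkdir(os.path.join(base, "via_os"))
    Path(base, "via_pathlib").mkdir(exist_ok=True)
    created = sorted(os.listdir(base))

print(created)  # ['via_os', 'via_pathlib', 'via_subprocess']
```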

**subprocess** excels when precision and flexibility are needed:

* **run():** Recommended; executes, waits, and returns a CompletedProcess object with stdout, stderr, and return code.

* **call():** Older; runs and returns exit code.

* **check_call():** Like call(), but raises CalledProcessError on non-zero exit.

* **check_output():** Returns the command's output; raises CalledProcessError on a non-zero exit.

All these wrap around Popen(), which provides full control: background execution, input/output/error pipe handling, and non-blocking status checks. This enables background processing without halting the script.
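A minimal sketch of Popen() running a child in the background while the parent polls it with poll(), then collects its output with communicate(). The child (a Python one-liner that sleeps for a second) is only an example; with large outputs, call communicate() directly rather than polling first, so the pipe buffer cannot fill up and block the child.

```python
import subprocess
import sys
import time

# Start the child in the background; Popen returns without waiting.
proc = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(1); print('done')"],
    stdout=subprocess.PIPE,
    text=True,
)

# poll() returns None while the child is still running, so the parent
# can keep doing other work in the meantime.
while proc.poll() is None:
    time.sleep(0.1)

# communicate() collects the remaining output and closes the pipe.
output, _ = proc.communicate()
print(proc.returncode, output.strip())  # 0 done
```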

Takeaway: Use subprocess when shell-level control, process communication, or parallelism is needed. Use os or Pathlib for standard, readable file and path operations.

##### 💠**Practice**

**What type of object does a run function return?**

A CompletedProcess object

**How can you change the current working directory where a command will be executed?**

Use the `cwd` parameter of `run()`. This changes the current working directory where the command will be executed.

**When a child process is run using the subprocess module, which of the following are true?**

To run the external command, a secondary environment is created for the child subprocess, where the command is executed.

While the parent process is waiting on the subprocess to finish, it’s blocked, meaning the parent can’t do any work until the child finishes.

After the external command completes its work, the child process exits, and the flow of control returns to the parent.

**When using the run command of the subprocess module, what parameter, when set to True, allows us to store the output of a system command?**

The capture_output parameter allows us to get and store the output of the system command we're using.

**What does the copy method of os.environ do?**

Creates a new dictionary of environment variables

#### ◽**Theme 3: Processing Log Files**

Log files are records that document events occurring in operating systems, applications, and services. They contain valuable information about system activities, errors, warnings, and events that happen when programs run without being connected to a terminal. These files are crucial for system administrators and developers, especially when debugging complex problems on a computer.

Python provides powerful tools for interacting with the system through file operations, regular expressions, shell interaction, and command execution, making it ideal for processing these log files.

Regular expressions combined with scripting offer tremendous flexibility for extracting specific information from log files, which can otherwise be overwhelming to search manually due to their volume and complexity. This approach allows for customized text processing to get precisely the results needed.

With these techniques, you can create targeted solutions for handling the large volumes of data commonly found in system logs, web request logs, and other text-based records.

##### 💠**Filtering log files with regular expressions**

When processing log files, use open() to access file objects and iterate through lines with for-loops. For large files, read line by line instead of loading everything into memory.


The example examines server logs for cron jobs to find which users started them. Lines that don't contain the "CRON" substring are skipped by testing for it with the `in` keyword and calling `continue`.

To extract usernames, regular expressions are employed with escape characters, capture groups, and anchors. The pattern r"USER \((\w+)\)$" searches for "USER" followed by a username in parentheses at line end. Backslashes escape parentheses, while another set creates a capture group for the alphanumeric username (\w+).
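Testing the pattern interactively before putting it in the script might look like this sketch, using one line from the sample syslog and one made-up line where the parentheses are not at the end.

```python
import re

pattern = r"USER \((\w+)\)$"

# Matches: "USER (name)" at the very end of the line.
line = "Jul 6 14:04:01 computer.name CRON[29440]: USER (naughty_user)"
match = re.search(pattern, line)
print(match[1])  # naughty_user

# No match: the parenthesized name is not at the end of the line.
print(re.search(pattern, "USER (naughty_user) extra text"))  # None
```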

After testing the regex in an interpreter and confirming it works, it's incorporated into the script which processes log files, identifies users who started Cron jobs, and prints the results.

In [None]:
#!/usr/bin/env python3
import re

# Create the syslog file content directly in the code
syslog_content = """Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)
Jul 6 14:02:08 computer.name jam_tag=psim[29187]: (UUID:006)
Jul 6 14:02:09 computer.name jam_tag=psim[29187]: (UUID:007)
Jul 6 14:03:01 computer.name CRON[29440]: USER (naughty_user)
Jul 6 14:03:40 computer.name cacheclient[29807]: start syncing
Jul 6 14:04:01 computer.name CRON[29440]: USER (naughty_user)
Jul 6 14:05:01 computer.name CRON[29440]: USER (naughty_user)
Jul 6 14:06:01 computer.name CRON[29440]: USER (naughty_user)"""

# Write the content to a file
with open('syslog.txt', 'w') as f:
    f.write(syslog_content)

print("Created syslog.txt file with sample data")

# Process the file: print the username from every CRON line
with open('syslog.txt') as f:
    for line in f:
        if "CRON" not in line:
            continue
        pattern = r"USER \((.+)\)$"
        result = re.search(pattern, line)
        if result:
            print(result[1])


Created syslog.txt file with sample data
good_user
naughty_user
naughty_user
naughty_user
naughty_user


**Reflect**

We're using the same syslog, and we want to display the date, time, and process id that's inside the square brackets. We can read each line of the syslog and pass the contents to the show_time_of_pid function. Fill in the gaps to extract the date, time, and process id from the passed line, and return this format: Jul 6 14:01:23 pid:29440.

In [None]:
import re
def show_time_of_pid(line):
  pattern = r"(Jul \d+ \d+:\d+:\d+).*\[(\d+)\]"
  result = re.search(pattern, line)
  return "{} pid:{}".format(result[1], result[2])

print(show_time_of_pid("Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)")) # Jul 6 14:01:23 pid:29440
print(show_time_of_pid("Jul 6 14:02:08 computer.name jam_tag=psim[29187]: (UUID:006)")) # Jul 6 14:02:08 pid:29187
print(show_time_of_pid("Jul 6 14:02:09 computer.name jam_tag=psim[29187]: (UUID:007)")) # Jul 6 14:02:09 pid:29187
print(show_time_of_pid("Jul 6 14:03:01 computer.name CRON[29440]: USER (naughty_user)")) # Jul 6 14:03:01 pid:29440
print(show_time_of_pid("Jul 6 14:03:40 computer.name cacheclient[29807]: start syncing from \"0xDEADBEEF\"")) # Jul 6 14:03:40 pid:29807
print(show_time_of_pid("Jul 6 14:04:01 computer.name CRON[29440]: USER (naughty_user)")) # Jul 6 14:04:01 pid:29440
print(show_time_of_pid("Jul 6 14:05:01 computer.name CRON[29440]: USER (naughty_user)")) # Jul 6 14:05:01 pid:29440

Jul 6 14:01:23 pid:29440
Jul 6 14:02:08 pid:29187
Jul 6 14:02:09 pid:29187
Jul 6 14:03:01 pid:29440
Jul 6 14:03:40 pid:29807
Jul 6 14:04:01 pid:29440
Jul 6 14:05:01 pid:29440
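The pattern above only recognizes July. If the log spans several months, one option (an assumption, not part of the original exercise) is to accept any capitalized three-letter month abbreviation and return None when a line has no timestamp. The August log line below is made up for the example.

```python
import re

def show_time_of_pid(line):
    # Month abbreviation (e.g. Jul, Aug), day, HH:MM:SS, then the PID in brackets.
    pattern = r"([A-Z][a-z]{2} \d+ \d+:\d+:\d+).*\[(\d+)\]"
    result = re.search(pattern, line)
    if result is None:
        return None
    return "{} pid:{}".format(result[1], result[2])

print(show_time_of_pid("Aug 12 09:15:00 computer.name CRON[31337]: USER (good_user)"))
# Aug 12 09:15:00 pid:31337
print(show_time_of_pid("no timestamp here"))  # None
```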


##### 💠**Making sense out of the data**

A script that extracts usernames starting cron jobs can be improved by counting occurrences of each username. Dictionaries are ideal for counting string appearances, using usernames as keys and occurrence counts as values.

The implementation uses the get() method to efficiently handle this counting. First, an empty dictionary is created with curly brackets. When processing each username, the count is incremented by setting dictionary[name] = dictionary.get(name, 0) + 1. The get() method returns the current value or 0 if the key doesn't exist.

Before adding values to the dictionary, check that the regular expression actually found a match by verifying if the result is not None. Use the continue keyword to skip lines without matches.

**The script flow is modified to:**

1. Initialize an empty dictionary
2. Process each line for usernames
3. Check if regex returned a match
4. Extract the name from the capture group
5. Increment the count in the dictionary
6. Print the completed dictionary

The improved script quickly shows both who started cron jobs and their frequency, providing more comprehensive information for investigating server issues.
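As an alternative to dict.get(), the standard library's collections.Counter treats missing keys as 0, so the increment becomes a single `+=`. This is a swapped-in technique rather than the one used in the course; the sketch reuses the sample syslog lines from this section.

```python
import re
from collections import Counter

# The sample syslog lines used in this section.
log_lines = [
    "Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)",
    "Jul 6 14:02:08 computer.name jam_tag=psim[29187]: (UUID:006)",
    "Jul 6 14:02:09 computer.name jam_tag=psim[29187]: (UUID:007)",
    "Jul 6 14:03:01 computer.name CRON[29440]: USER (naughty_user)",
    "Jul 6 14:03:40 computer.name cacheclient[29807]: start syncing",
    "Jul 6 14:04:01 computer.name CRON[29440]: USER (naughty_user)",
    "Jul 6 14:05:01 computer.name CRON[29440]: USER (naughty_user)",
    "Jul 6 14:06:01 computer.name CRON[29440]: USER (naughty_user)",
]

usernames = Counter()
for line in log_lines:
    if "CRON" not in line:
        continue
    result = re.search(r"USER \((\w+)\)$", line)
    if result is None:
        continue
    usernames[result[1]] += 1  # missing keys start at 0 in a Counter

print(dict(usernames))           # {'good_user': 1, 'naughty_user': 4}
print(usernames.most_common(1))  # [('naughty_user', 4)]
```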

In [None]:
#!/usr/bin/env python3
import re

# Create the syslog file content directly in the code
syslog_content = """Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)
Jul 6 14:02:08 computer.name jam_tag=psim[29187]: (UUID:006)
Jul 6 14:02:09 computer.name jam_tag=psim[29187]: (UUID:007)
Jul 6 14:03:01 computer.name CRON[29440]: USER (naughty_user)
Jul 6 14:03:40 computer.name cacheclient[29807]: start syncing
Jul 6 14:04:01 computer.name CRON[29440]: USER (naughty_user)
Jul 6 14:05:01 computer.name CRON[29440]: USER (naughty_user)
Jul 6 14:06:01 computer.name CRON[29440]: USER (naughty_user)"""

# Write the content to a file
with open('syslog.txt', 'w') as f:
    f.write(syslog_content)

print("Created syslog.txt file with sample data")


Created syslog.txt file with sample data


In [None]:

# Function to extract time and PID from log lines
def show_time_of_pid(line):
    pattern = r"(Jul \d+ \d+:\d+:\d+).*\[(\d+)\]"
    result = re.search(pattern, line)
    if result:
        return "{} pid:{}".format(result[1], result[2])
    return "No match found"

# Testing the time and PID extraction function
print("\nTesting time and PID extraction:")
print(show_time_of_pid("Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"))
print(show_time_of_pid("Jul 6 14:02:08 computer.name jam_tag=psim[29187]: (UUID:006)"))
print(show_time_of_pid("Jul 6 14:03:40 computer.name cacheclient[29807]: start syncing"))



Testing time and PID extraction:
Jul 6 14:01:23 pid:29440
Jul 6 14:02:08 pid:29187
Jul 6 14:03:40 pid:29807


In [None]:
# Dictionary to count username occurrences
print("\nCounting username occurrences in CRON entries:")
usernames = {}
logfile = 'syslog.txt'

with open(logfile) as f:
    for line in f:
        if "CRON" not in line:
            continue

        pattern = r"USER \((.+)\)$"
        result = re.search(pattern, line)

        if result is None:
            continue

        name = result[1]
        usernames[name] = usernames.get(name, 0) + 1

print(usernames)


Counting username occurrences in CRON entries:
{'good_user': 1, 'naughty_user': 4}


In [None]:
# Simple example of dictionary usage
print("\nSimple dictionary example:")
usernames = {}
name = "good_user"
usernames[name] = usernames.get(name, 0) + 1
print(usernames)
usernames[name] = usernames.get(name, 0) + 1
print(usernames)




Simple dictionary example:
{'good_user': 1}
{'good_user': 2}


In [None]:
# Example of extracting all usernames with counts from the log file
print("\nCounting all usernames from full log analysis:")
usernames = {}

with open(logfile) as f:
    for line in f:
        if "CRON" not in line:
            continue

        pattern = r"USER \((\w+)\)"
        result = re.search(pattern, line)

        if result is None:
            continue

        name = result[1]
        usernames[name] = usernames.get(name, 0) + 1

print(usernames)


Counting all usernames from full log analysis:
{'good_user': 1, 'naughty_user': 4}


**This script combines:**

1. Creating the sample syslog file - Instead of reading from a command-line argument, it creates the syslog file directly in code
2. The time and PID extraction function - Implements the show_time_of_pid function to extract timestamps and process IDs from log entries
3. Username counting from CRON entries - Uses dictionaries to count occurrences of usernames in CRON entries
4. Dictionary usage examples - Shows the basic pattern of incrementing counts in a Python dictionary

**The script is structured to:**

1. Create the syslog file
2. Demonstrate the time and PID extraction functionality
3. Count usernames from CRON entries
4. Show simple dictionary usage for tracking counts
5. Perform a full analysis of the log file to count all usernames

##### 💠**Practice**

**Question 1**
You have created a Python script to read a log of users running CRON jobs. The script needs to accept a command line argument for the path to the log file. Which line of code accomplishes this?

**syslog=sys.argv[1]**

Correct
Right on! This will assign the script's first command line argument to the variable syslog.


---


**Question 2**
Which of the following is a data structure that can be used to count how many times a specific error appears in a log?

**Dictionary**

Correct
Great work! A dictionary is useful to count appearances of strings.


---


**Question 3**
Which keyword will return control back to the top of a loop when iterating through logs?

**continue**

Correct
Excellent! The continue statement is used to return control back to the top of a loop.



---


**Question 4**
When searching log files using regex, which statement uses a capture group to search for the alphanumeric word "IP" followed by one or more digits wrapped in parentheses at the end of a line?

**r"IP \((\d+)\)$"**

Correct
Awesome! This expression will search for the word IP followed by a space and parentheses. It uses a capture group and \d+ to capture any digit characters found in the parentheses.


---



**Question 5**
Which of the following are true about parsing log files? (Select all that apply.)

**You should parse log files line by line.**
Correct
Well done! Since log files can get pretty large, it's a good idea to parse them one line at a time instead of loading the entire file into memory at once.

**It is efficient to ignore lines that don't contain the information we need.**
Correct
Right on! We can save a lot of time by not parsing lines that don't contain what we need.

**We have to open() the log files first.**
Correct
Nice job! Before we can parse our log file, we have to use the open() or with open() command on the file first.



#### ◽**Theme 4: Module Review**

##### 💠**Glossary terms from course 2, module 4**

**Bash:** The most commonly used shell on Linux

**Command line arguments:** Inputs provided to a program when running it from the command line

**Environment variables:** Settings and data stored outside a program that can be accessed by it to alter how the program behaves in a particular environment

**Input / Output (I/O):** These streams are the basic mechanism for performing input and output operations in your programs

**Log files:** Log files are records or text files that store a history of events, actions, or errors generated by a computer system, software, or application for diagnostic, troubleshooting, or auditing purposes

**Standard input stream (STDIN):** A channel between a program and a source of input

**Standard output stream (STDOUT):** A pathway between a program and a target of output, like a display

**Standard error (STDERR):** This displays output like standard out, but is used specifically as a channel to show error messages and diagnostics from the program

**Shell:** The application that reads and executes all commands

**Subprocesses:** A process to call and run other applications from within Python, including other Python scripts
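Several of these terms appear together in a few lines of Python. In this minimal sketch, the child process is simply another Python interpreter (`sys.executable`), so it runs on any system. `HOME` is an environment variable that is normally set on Linux and macOS, and the code falls back to a default when it is missing.

```python
import os
import subprocess
import sys

# Command line arguments: sys.argv[0] is the script name, the rest are user inputs.
print("Arguments:", sys.argv[1:])

# Environment variables: read with a default in case the variable is not set.
print("HOME is", os.environ.get("HOME", "(not set)"))

# Subprocess: run another Python program and capture its STDOUT and STDERR separately.
child_code = "import sys; print('hello from STDOUT'); print('oops on STDERR', file=sys.stderr)"
result = subprocess.run([sys.executable, "-c", child_code], capture_output=True, text=True)

print("exit code:", result.returncode)    # 0 means success
print("STDOUT:", result.stdout.strip())   # hello from STDOUT
print("STDERR:", result.stderr.strip())   # oops on STDERR
```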

##### 💠**Qwiklabs: Work with Log Files**

Introduction
You dealt with a program that kept throwing an error, but its source code was too complicated to find the cause quickly. The good news is that the program outputs a log file you can read! Let's review how to write a script that searches the log file for the exact error, then outputs that error into a separate file so you can work out what's wrong.

This exemplar is a walkthrough of the previous Qwiklab activity, including detailed instructions and solutions. You may use this exemplar if you were unable to complete the lab or need extra guidance completing lab tasks. You may also refer to this exemplar to prepare for the graded quiz in this module.

View log file
In the ~/data directory, there's a file named fishy.log, which contains the system log. Log entries are written in this format:

In [None]:
Month Day hour:minute:second mycomputername "process_name"["random 5 digit number"]: "ERROR/INFO/WARN" "Error description"

Each log entry generated for a process contains a timestamp and a message. You can view all the logs using the command below:

In [None]:
cat ~/data/fishy.log

Output:

In [None]:
July 31 00:06:21 mycomputername kernel[96041]: WARN Failed to start network connection
July 31 00:09:53 mycomputername updater[46711]: WARN Computer needs to be turned off and on again
July 31 00:12:36 mycomputername kernel[48462]: INFO Successfully connected
July 31 00:13:52 mycomputername updater[43530]: ERROR Error running Python2.exe: Segmentation Fault (core dumped)
July 31 00:16:13 mycomputername NetworkManager[63902]: WARN Failed to start application install
July 31 00:26:45 mycomputername CRON[83063]: INFO I'm sorry Dave. I'm afraid I can't do that
July 31 00:27:56 mycomputername cacheclient[75746]: WARN PC Load Letter
July 31 00:33:31 mycomputername system[25588]: ERROR Out of yellow ink, specifically, even though you want grayscale
July 31 00:36:55 mycomputername updater[73786]: WARN Packet loss
July 31 00:37:38 mycomputername dhcpclient[87602]: INFO Googling the answer
July 31 00:37:48 mycomputername utility[21449]: ERROR The cake is a lie!
July 31 00:44:50 mycomputername kernel[63793]: ERROR Failed process [13966]
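Every entry follows the same layout, so a single regular expression with one capture group per field can split a line into its parts. This sketch assumes that all lines look like the output above. `parse_entry` is a hypothetical helper. The optional colon after the level covers entries written as `ERROR:`, like the one found later in this lab.

```python
import re

# month day time host process[pid]: LEVEL message  (the colon after LEVEL is optional)
LOG_PATTERN = re.compile(r"^(\w+ \d+ [\d:]+) (\S+) (\w+)\[(\d+)\]: (ERROR|INFO|WARN):? (.*)$")


def parse_entry(line):
    """Split one log line into named fields, or return None if it doesn't fit the format."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    timestamp, host, process, pid, level, message = match.groups()
    return {"timestamp": timestamp, "host": host, "process": process,
            "pid": int(pid), "level": level, "message": message}


entry = parse_entry("July 31 00:13:52 mycomputername updater[43530]: ERROR Error running Python2.exe: Segmentation Fault (core dumped)")
print(entry["process"], entry["pid"], entry["level"])  # updater 43530 ERROR
print(entry["message"])  # Error running Python2.exe: Segmentation Fault (core dumped)
```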

Find an error
In this lab, we'll search for the CRON error that failed to start. To do this, we'll use a Python script to search the fishy.log file for a particular type of ERROR log, narrowing our search to "CRON ERROR Failed to start".

To get started, let's create a Python script named find_error.py in the ~/scripts directory using the nano editor.

In [None]:
cd ~/scripts
nano find_error.py

Completed Script
The Python code you wrote searches a log file for user-specified errors. It prompts the user to enter the error message, then checks each line of the log file. A line matches only if it contains the word "error" plus every word of the user's message, ignoring case. Matching lines are added to a list of found errors. Once the search is complete, that list is written to a separate output file (~/data/errors_found.log) for further analysis. Each word is used as a regular expression pattern, which gives flexibility in defining error patterns.

In [None]:
#!/usr/bin/env python3
import sys
import os
import re


def error_search(log_file):
    """Ask for an error message and return every log line that contains all of its words."""
    error = input("What is the error? ")
    returned_errors = []
    # A line must contain "error" plus each word the user typed (case-insensitive).
    error_patterns = ["error"]
    for word in error.split():
        error_patterns.append(r"{}".format(word.lower()))
    with open(log_file, mode='r', encoding='UTF-8') as file:
        for log in file:  # read one line at a time instead of loading the whole file
            if all(re.search(error_pattern, log.lower()) for error_pattern in error_patterns):
                returned_errors.append(log)
    return returned_errors


def file_output(returned_errors):
    """Write the matching lines to ~/data/errors_found.log."""
    with open(os.path.expanduser('~') + '/data/errors_found.log', 'w') as file:
        for error in returned_errors:
            file.write(error)


if __name__ == "__main__":
    log_file = sys.argv[1]
    returned_errors = error_search(log_file)
    file_output(returned_errors)
    sys.exit(0)


Save the file by pressing Ctrl+O, then Enter, and exit nano with Ctrl+X.

Make the file executable before running it.

In [None]:
sudo chmod +x find_error.py

Now, run the script, passing the path to fishy.log as its command line argument.

In [None]:
./find_error.py ~/data/fishy.log

This script will now prompt for the type of error to be searched. Continue by entering the following type of error:

In [None]:
CRON ERROR Failed to start

On successful execution, this will generate an errors_found.log file in ~/data, containing every log line that matches your search. You can view it using the command below:

In [None]:
cat ~/data/errors_found.log

Output:

In [None]:
July 31 04:11:32 mycomputername CRON[51253]: ERROR: Failed to start CRON job due to script syntax error. Inform the CRON job owner!
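You can pull the matching rule inside `error_search` out into a function that doesn't call `input()`. That makes it easy to check why only one line was written. In this sketch, `matches_error` is a hypothetical helper that mirrors the script's logic, and the two sample lines are copied from the outputs above.

```python
import re


def matches_error(log_line, error):
    """Hypothetical helper mirroring error_search: True if the line contains
    'error' and every word of the user's message (case-insensitive)."""
    error_patterns = ["error"] + [word.lower() for word in error.split()]
    return all(re.search(pattern, log_line.lower()) for pattern in error_patterns)


lines = [
    "July 31 00:06:21 mycomputername kernel[96041]: WARN Failed to start network connection",
    "July 31 04:11:32 mycomputername CRON[51253]: ERROR: Failed to start CRON job due to script syntax error. Inform the CRON job owner!",
]
found = [line for line in lines if matches_error(line, "CRON ERROR Failed to start")]
print(len(found))  # 1
```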

Congratulations!
Congrats! You've written a script to search the log file for the exact error, and then output that error into a separate file for further analysis. As an IT specialist, this tool will be super helpful, allowing you to use Python scripting to filter out and analyze all types of logs.



##### 💠**Module 4 challenge: Working with log files**

1. In the process of connecting to a virtual machine using SSH and PuTTY on Windows, which of the following steps is necessary?
✅ Downloading the PPK key file from the Qwiklabs Start Lab page

2. What is the primary purpose of the sys module in Python?
✅ Provides functions and variables to interact with the Python interpreter and the runtime environment

3. In the lab’s Python script, what is the role of the error_search function in relation to processing log files with regular expressions (RegEx)?
✅ To interactively receive an error type from the user and use RegEx to find corresponding logs

4. What is the role of fishy.log in the provided Python script for log file analysis?
✅ It is the log file that is being analyzed for specific error patterns.

5. What is the step-by-step process of how errors are searched for and processed in the script within the lab?
✅ Set the log_file variable, call the error_search() function with the log_file parameter to search for errors, and store the matching errors in the returned_errors list.

6. Based on the Python script provided for log file analysis, what type of information would you expect to find in the errors_found.log file?
✅ Specific error logs that match the user-defined search criteria

7. What is the function that takes the errors returned by another function as a formal parameter?
✅ file_output

8. In the lab’s Python script find_error.py, what happens when the script is executed with a log file like fishy.log?
✅ The script prompts the user for a type of error, searches fishy.log for that error, and writes the found errors to errors_found.log.

9. What is the primary function of the os module in Python?
✅ For managing and manipulating file paths and directory structures

10. How would you modify the script to also search for warning messages in addition to errors?
✅ Modify the error_patterns list initialization to include both "error" and "WARN" as base patterns.



### 🟢 **Module 5: Testing in Python** 27/05/25

#### ◽**Theme 1: Simple Tests**

##### 💠**What is testing?**

**Software testing**

The process of evaluating computer code to determine whether or not it does what you expect it to do.

##### 💠**Manual Testing and Automated Testing**

To ensure software behaves correctly, developers use test cases, which are predefined scenarios to check code functionality. The most basic testing method involves trying different parameters and comparing outcomes to expected results. When tests are written as part of the code, they become automated tests, enabling ongoing validation without manual effort. Using test cases simplifies repeated testing across various inputs, improving reliability. As software grows more complex, testing becomes increasingly valuable for detecting and managing errors effectively.
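You can sketch the idea of test cases without any framework. Pair each input with its expected result and let a loop compare them. `word_count` is a hypothetical function used only for illustration.

```python
def word_count(text):
    """Hypothetical function under test: number of whitespace-separated words."""
    return len(text.split())


# Test cases: predefined inputs paired with the results we expect.
test_cases = [
    ("once upon a time", 4),
    ("single", 1),
    ("", 0),
    ("  extra   spaces  ", 2),
]

# Automated test: the same check runs for every case, with no manual retyping.
failures = 0
for given, expected in test_cases:
    actual = word_count(given)
    if actual != expected:
        failures += 1
        print(f"FAIL: word_count({given!r}) returned {actual}, expected {expected}")
print(f"{len(test_cases) - failures}/{len(test_cases)} test cases passed")
```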

#### ◽**Theme 2: Unit Tests**

##### 💠**unittest**

Python's unittest module provides developers with a set of tools to construct and run tests.

**Concepts**

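* **Test fixture:** The preparation needed to perform one or more tests, plus any associated cleanup.

* **Test case:** An individual unit of testing that checks for a specific response to a particular set of inputs.

* **Test suite:** A collection of test cases (or other test suites) that should be run together.

* **Test runner:** The component that runs the tests and reports the results.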
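The four concepts map onto `unittest` as shown below. In this minimal sketch, the class and test names are made up, and `setUp()` plays the role of the fixture.

```python
import unittest


class TestWords(unittest.TestCase):
    def setUp(self):
        # Test fixture: prepares fresh data before every test method
        self.words = ["spam", "eggs"]

    def test_upper(self):
        # Test case: checks one specific response
        self.assertEqual(self.words[0].upper(), "SPAM")

    def test_join(self):
        self.assertEqual(" ".join(self.words), "spam eggs")


# Test suite: a collection of test cases to run together
suite = unittest.TestSuite()
suite.addTest(TestWords("test_upper"))
suite.addTest(TestWords("test_join"))

# Test runner: executes the suite and reports the results
result = unittest.TextTestRunner(verbosity=2).run(suite)
print("All passed:", result.wasSuccessful())
```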

##### 💠**pytest**

pytest is a powerful Python testing tool. It supports automatic test discovery and generates informative test reports.

**How to write tests**

Pytest tests are written using the built-in **assert** statement, which works as a sanity check for the code.

If the condition given to **assert** turns out to be false, it indicates a bug: an AssertionError is raised and the test stops executing.
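Here is a sketch of pytest-style tests. It assumes a hypothetical file named `test_average.py` that you run with `pytest test_average.py`. The tests are plain functions whose names start with `test_`, and each one uses a bare `assert`. The `__main__` block only shows that, without pytest, a false assert raises `AssertionError`.

```python
def average(values):
    """Hypothetical function under test."""
    return sum(values) / len(values)


# pytest collects these automatically because their names start with test_
def test_average_of_integers():
    assert average([1, 2, 3]) == 2


def test_average_of_single_value():
    assert average([5]) == 5


if __name__ == "__main__":
    # Without pytest, a false assert simply raises AssertionError
    try:
        assert average([1, 2]) == 2, "average([1, 2]) should be 1.5"
    except AssertionError as err:
        print("AssertionError:", err)
```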



**Pytest fixtures**

Fixtures separate the parts of code that only run for tests. They are reusable pieces of test setup.

Pytest is a user-friendly testing framework for developers writing code in Python to focus on creating simple and clear tests.

##### 💠**Comparing unittest and pytest**

unittest is built into Python's standard library, while pytest is a third-party package that must be installed and imported separately. Test discovery also works differently for each. unittest can automatically detect test cases within an application, but discovery must be started from the command line (for example, python -m unittest). pytest automatically finds and runs tests whose names start with the prefix test_. unittest uses an object-oriented approach (tests are methods of a TestCase class), while pytest uses a functional approach (tests are plain functions). pytest uses the built-in assert statement, which makes tests easier to read and write. unittest instead provides special assert methods like assertEqual() or assertTrue().

Unittest and pytest are both beneficial to developers in executing tests on their code written in Python. Each one has its pros and cons, and it is up to the developer and their preference on which type of testing framework they want to use.

##### 💠**Unit tests**

Unit tests are used to verify that small, isolated parts of a program are correct.

##### 💠**Writing unit tests in python**

1. Import the code you want to test and the unittest module.

2. Define a test class that inherits from unittest.TestCase.

3. Inside a test method, define the input to test (testcase) and the expected result (expected).

4. Use an assert method such as assertEqual() to compare the actual result with the expected one.

5. Run the tests and check the results.

In [None]:
#!/usr/bin/env python3

from rearrange import rearrange_name
import unittest


class TestRearrange(unittest.TestCase):
    def test_basic(self):
        testcase = "Gonzalez, Luis"
        expected = "Luis Gonzalez"
        self.assertEqual(rearrange_name(testcase), expected)

if __name__ == "__main__":
    unittest.main()

##### 💠**Edge Cases**

Edge cases are inputs to our code that produce unexpected results. They are found at the extreme ends of the ranges of input we imagine our programs will typically work with.

In [None]:
    def test_empty(self):
        testcase = " "
        expected = " "
        self.assertEqual(rearrange_name(testcase), expected)

##### 💠**Additional test cases**

Additional tests help prove that the code works for the kinds of input it may receive in real use. With this information, we can anticipate malfunctions.

In [None]:
    def test_double_name(self):
        testcase = "Hopper, Grace M."
        expected = "Grace M. Hopper"
        self.assertEqual(rearrange_name(testcase), expected)

    def test_one_name(self):
        testcase = "Voltaire"
        expected = "Voltaire"
        self.assertEqual(rearrange_name(testcase), expected)
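The snippets above are methods of the same `TestRearrange` class. The self-contained sketch below puts them together. These notes don't show `rearrange.py`, so the `rearrange_name` below is an assumed implementation. It returns input without a comma unchanged, which is what `test_one_name` and the edge case expect.

```python
import re
import unittest


def rearrange_name(name):
    """Assumed stand-in for rearrange.py: turns 'Last, First' into 'First Last'."""
    result = re.search(r"^([\w .]*), ([\w .]*)$", name)
    if result is None:
        # No "Last, First" shape (blank input, single name): return it unchanged
        return name
    return "{} {}".format(result[2], result[1])


class TestRearrange(unittest.TestCase):
    def test_basic(self):
        self.assertEqual(rearrange_name("Gonzalez, Luis"), "Luis Gonzalez")

    def test_empty(self):
        self.assertEqual(rearrange_name(" "), " ")

    def test_double_name(self):
        self.assertEqual(rearrange_name("Hopper, Grace M."), "Grace M. Hopper")

    def test_one_name(self):
        self.assertEqual(rearrange_name("Voltaire"), "Voltaire")


if __name__ == "__main__":
    # argv and exit=False let this also run inside a notebook cell
    unittest.main(argv=["ignored"], exit=False)
```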

##### 💠**Study guide: Unit tests**

Unit tests check individual functions or methods for correct behavior.

Use Python's unittest module to write and run tests.

Test methods must start with test_ and belong to a class inheriting from unittest.TestCase.

**Use assertions like:**

* assertEqual(a, b)

* assertTrue(x)

* assertRaises(Exception)

**Run tests via command line:**

* python -m unittest test_module.TestClass.test_method

**Structure tests using:**

* setUp() and tearDown() (run before and after each test)

Use the Arrange-Act-Assert pattern to organize test logic.

Group related tests into test suites for better management.

Unit testing helps catch bugs early and supports automation.
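This sketch combines several points from the study guide: `setUp()` and `tearDown()` around each test, the Arrange-Act-Assert layout, and `assertRaises`. The log contents are hypothetical. They live in a temporary file that `tearDown()` removes.

```python
import os
import tempfile
import unittest


class TestErrorLines(unittest.TestCase):
    def setUp(self):
        # Runs before each test: write a small temporary log (hypothetical contents).
        handle, self.path = tempfile.mkstemp(suffix=".log")
        with os.fdopen(handle, "w") as f:
            f.write("INFO service started\nERROR disk full\nINFO service stopped\n")

    def tearDown(self):
        # Runs after each test, even one that failed: remove the temporary file.
        os.remove(self.path)

    def test_counts_error_lines(self):
        # Arrange: setUp() already created the input file.
        # Act: read the file and keep the lines that mention ERROR.
        with open(self.path) as f:
            errors = [line for line in f if "ERROR" in line]
        # Assert: exactly one error line is expected.
        self.assertEqual(len(errors), 1)

    def test_missing_file_raises(self):
        # assertRaises passes only if the block raises the given exception.
        with self.assertRaises(FileNotFoundError):
            open(self.path + ".missing")


if __name__ == "__main__":
    unittest.main(argv=["ignored"], exit=False)
```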

#### ◽**Theme 3: Other Test Concepts**

##### 💠**Black Box vs White Box**

* **White Box** testing relies on the test creator's knowledge of the software being tested to construct the test cases.

* **Black Box** tests are written with an awareness of what the program is supposed to do (its requirements or specifications) but not how it does it.




##### 💠**Other Test Types**

**Integration Tests:** Verify that the different parts of a system interact correctly with each other, validating the system as a whole.

**Regression Tests:** Written as part of the debugging and troubleshooting process to verify that a bug has been fixed and doesn't reappear later.

**Build Verification Tests / Smoke Tests:** Run a piece of software as-is to see whether it runs at all.

**Load Tests:** Verify that the system behaves well when it's under significant load.

##### 💠**Test-Driven Development TDD**

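TDD means writing the tests before writing the code. First, you describe what the script should do as tests, which fail at first because the code doesn't exist yet. Then you write just enough code to make them pass. Because the tests are already in place, you can keep changing the code without manually going back to check that it still works.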
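Here is a minimal TDD sketch. The order in the file mirrors the workflow: the tests come first, then the simplest code that makes them pass. `is_even` is a hypothetical function chosen only to keep the example short.

```python
import unittest


# Step 1: write the tests first. Running them at this point would report errors,
# because is_even() doesn't exist yet.
class TestIsEven(unittest.TestCase):
    def test_even_number(self):
        self.assertTrue(is_even(4))

    def test_odd_number(self):
        self.assertFalse(is_even(7))

    def test_zero(self):
        self.assertTrue(is_even(0))


# Step 2: write just enough code to make the tests pass.
def is_even(number):
    return number % 2 == 0


# Step 3: run the tests; any later change to is_even() is re-checked automatically.
if __name__ == "__main__":
    unittest.main(argv=["ignored"], exit=False)
```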

#### ◽**Theme 4: Errors and Exceptions**

##### 💠**The Try-Except concept**

The code in the except block is only executed if one of the instructions in the try block raises an error of the matching type.

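Catching an exception doesn't make the error disappear. Instead, it lets the program respond in a controlled way (for example, by returning None or a default value) instead of crashing.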

In [None]:
#!/usr/bin/env python3

def character_frequency(filename):
    """Counts the frequency of each character in the given file."""
    # First try to open the file; if that fails, return None instead of crashing
    try:
        f = open(filename)
    except OSError:
        return None

    # Now process the file; the with block closes it even if processing fails
    characters = {}
    with f:
        for line in f:
            for char in line:
                characters[char] = characters.get(char, 0) + 1
    return characters
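You can exercise both paths of the try/except with a temporary file and a path that doesn't exist. In this sketch, `character_frequency` is repeated (with a `with` block to close the file) so that the example runs on its own. The missing path is made up.

```python
import os
import tempfile


def character_frequency(filename):
    """Counts the frequency of each character in the given file."""
    try:
        f = open(filename)
    except OSError:
        return None
    characters = {}
    with f:
        for line in f:
            for char in line:
                characters[char] = characters.get(char, 0) + 1
    return characters


# Success path: a small temporary file with known contents
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("aab\n")
counts = character_frequency(tmp.name)
os.remove(tmp.name)
print(counts)  # {'a': 2, 'b': 1, '\n': 1}

# Failure path: open() raises OSError, so the function returns None instead of crashing
missing = character_frequency("/this/path/does/not/exist.txt")
print(missing)  # None
```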

##### 💠**Raising Errors**

Raising our own errors with raise, or checking conditions with assert, produces a clear message. That message helps identify what kind of error is happening and why.

In [None]:
#!/usr/bin/env python3

def validate_user(username, minlen):
    assert type(username) == str, "username must be a string"
    if minlen < 1:
        raise ValueError("minlen must be at least 1")
    if len(username) < minlen:
        return False
    if not username.isalnum():
        return False
    return True
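Calling `validate_user` with a few inputs shows what each check produces. In this sketch, the function is repeated so that the example runs on its own. Python skips `assert` statements when it runs with `-O`, so a non-string username would then fail later with a `TypeError`. That is one reason to use `raise` for checks on caller input.

```python
def validate_user(username, minlen):
    """Copy of validate_user from above, so this sketch runs on its own."""
    assert type(username) == str, "username must be a string"
    if minlen < 1:
        raise ValueError("minlen must be at least 1")
    if len(username) < minlen:
        return False
    if not username.isalnum():
        return False
    return True


for args in [("validuser", 3), ("user", -1), (42, 3)]:
    try:
        print(args, "->", validate_user(*args))
    except (ValueError, AssertionError) as err:
        # The message given to raise/assert tells us exactly what went wrong
        print(args, "->", type(err).__name__ + ":", err)
```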


##### 💠**Testing for expected errors**

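It is helpful to write unit tests that cover the expected situations together, including cases that should raise an error. This example runs four tests in one go. The last one uses assertRaises to check that a ValueError is raised: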

In [None]:
#!/usr/bin/env python3

import unittest

from validations import validate_user

class TestValidateUser(unittest.TestCase):
  def test_valid(self):
    self.assertEqual(validate_user("validuser", 3), True)

  def test_too_short(self):
    self.assertEqual(validate_user("inv", 5), False)

  def test_invalid_characters(self):
    self.assertEqual(validate_user("invalid_user", 1), False)

  def test_invalid_minlen(self):
    self.assertRaises(ValueError, validate_user, "user", -1)

# Run the tests
if __name__ == "__main__":
  unittest.main()


##### 💠**Study Guide: Handling Errors**

In Python, error handling allows you to anticipate and manage problems during code execution using try and except blocks. Code inside the try block runs until an error occurs, which is then caught by a matching except clause—otherwise, the error propagates and may crash the program. Developers should be specific about the errors they catch (e.g., ValueError, FileNotFoundError, ZeroDivisionError) and use raise to deliberately trigger exceptions when certain conditions aren’t met. Additionally, assert statements help catch issues early by enforcing conditions during development. Good practice involves predicting where failures may occur, handling them clearly, and avoiding vague or overly broad exception handling.


### 🟢 **Module 6: Bash Scripting** 06/06/25

#### ◽**Theme 1: Interacting with the Command Line Shell**

##### 💠**Basic Linux commands**

**Command Summary – Shell (Git Bash on Windows)**

`mkdir mynewdir`
Creates a new directory named mynewdir.

`cd mynewdir`
Changes into the mynewdir directory.

`pwd`
Prints the current working directory path.

`cp ../spider.txt`
Fails: cp needs a destination, and none was given.

`cp ../spider.txt .`
Corrects the previous command – copies spider.txt into the current directory.

`touch myfile.txt`
Creates a new empty file named myfile.txt.

`ls -l`
Lists files in long format (shows permissions, size, etc.).

`ls -la`
Same as ls -l, but also includes hidden files and the `.` and `..` entries.

`mv myfile.txt emptyfile.txt`
Renames myfile.txt to emptyfile.txt.

`cp spider.txt yetanotherfile.txt`
Copies spider.txt to a new file named yetanotherfile.txt.

`rm *`
Deletes all files in the current directory.

`ls -l`
Confirms the directory is empty.

`cd ..`
Moves up one level to the parent directory.

`rmdir mynewdir/`
Deletes the now-empty mynewdir directory.

`ls mynewdir`
Error – confirms the directory no longer exists.

`ls`
Lists the contents of the current (home) directory.
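Python can perform the same file operations through the `os`, `shutil`, and `pathlib` modules. This sketch replays the session above inside a throwaway temporary directory, so nothing real is touched. The contents of `spider.txt` are made up.

```python
import os
import shutil
import tempfile
from pathlib import Path

original_dir = os.getcwd()
base = Path(tempfile.mkdtemp())  # throwaway stand-in for the home directory
(base / "spider.txt").write_text("The itsy bitsy spider\n")
os.chdir(base)

os.mkdir("mynewdir")                             # mkdir mynewdir
os.chdir("mynewdir")                             # cd mynewdir
print(os.getcwd())                               # pwd
shutil.copy("../spider.txt", ".")                # cp ../spider.txt .
Path("myfile.txt").touch()                       # touch myfile.txt
print(sorted(os.listdir(".")))                   # ls -> ['myfile.txt', 'spider.txt']
os.rename("myfile.txt", "emptyfile.txt")         # mv myfile.txt emptyfile.txt
shutil.copy("spider.txt", "yetanotherfile.txt")  # cp spider.txt yetanotherfile.txt
for name in os.listdir("."):                     # rm *
    os.remove(name)
os.chdir("..")                                   # cd ..
os.rmdir("mynewdir")                             # rmdir mynewdir/ (only works if empty)
print(os.path.exists("mynewdir"))                # ls mynewdir -> False

os.chdir(original_dir)  # clean up the throwaway directory
shutil.rmtree(base)
```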

##### 💠**Redirecting streams**

**Redirection:** The process of sending a stream to a different destination.

`>` redirects STDOUT to a file. Each time it is used, the destination file is overwritten.

To append to the file instead of overwriting it, use `>>`.

The `<` symbol redirects input from a file into a command. It tells the command, "use this file as if I were typing it." For example, `sort < names.txt` sorts the lines from the names.txt file without needing manual input.

`< new_file.txt`
Redirects the contents of new_file.txt into the script's input (input() reads from this file instead of waiting for keyboard input).

`2> error_file.txt`
Redirects the error messages (STDERR, file descriptor 2) to a file called error_file.txt.
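You can reproduce the same redirections from Python with `subprocess.run`, which accepts open files for `stdin` and `stderr`. In this sketch, the child script is a hypothetical stand-in for the script used in the course. `./script.py` in the comment is a made-up name. The files live in a temporary directory.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in script: reads one line with input(),
# writes it to STDOUT, then writes a message to STDERR.
SCRIPT = (
    "import sys\n"
    "data = input()\n"
    "print('Now we write it to STDOUT: ' + data)\n"
    "print('Now we write an error to STDERR: ' + data, file=sys.stderr)\n"
)

workdir = Path(tempfile.mkdtemp())
(workdir / "new_file.txt").write_text("This is a new file\n")

# Python equivalent of:  ./script.py < new_file.txt 2> error_file.txt
with open(workdir / "new_file.txt") as stdin_file, \
        open(workdir / "error_file.txt", "w") as stderr_file:
    result = subprocess.run([sys.executable, "-c", SCRIPT],
                            stdin=stdin_file, stderr=stderr_file,
                            stdout=subprocess.PIPE, text=True)

stdout_text = result.stdout.strip()
stderr_text = (workdir / "error_file.txt").read_text().strip()
print(stdout_text)  # Now we write it to STDOUT: This is a new file
print(stderr_text)  # Now we write an error to STDERR: This is a new file
shutil.rmtree(workdir)
```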

##### 💠**Pipes and pipelines** 07/06/25

**Pipes** connect the output of one program to the input of another in order to pass data between programs.

Breakdown of the command:

`cat spider.txt | tr ' ' '\n' | sort | uniq -c | sort -nr | head`

1. cat spider.txt
Reads and outputs the content of spider.txt.

cat: Short for concatenate, it was originally meant to join multiple files. Today, it's often used to simply dump a file to standard output.

2. tr ' ' '\n'
Replaces each space (' ') with a newline ('\n'), effectively putting each word on a new line. This assumes that words are space-separated.

Stands for translate. It’s a character-by-character translator, ideal for quick transformations like this one.

3. sort
Sorts the lines alphabetically. Required for the next step (uniq) to work properly.

Literally just sorts input. Unix naming kept it minimal and literal.

4. uniq -c
Collapses adjacent identical lines into one and prefixes each with its number of occurrences.

It filters for unique lines. With -c, it counts how many times each one occurs.

5. sort -nr
Sorts the result numerically (-n) in reverse order (-r), so the most frequent items appear first.

This time used with different options — again, a straightforward name doing multiple types of sorting depending on flags.

6. head
Shows the first 10 lines of the result — i.e., the 10 most frequent tokens.

Because it displays the “head” (top) of the output.
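You can write the same word count in Python with `collections.Counter`. Its `most_common()` method does the job of `sort | uniq -c | sort -nr | head`. In this sketch, made-up text stands in for spider.txt. `str.split()` splits on any whitespace, which is slightly broader than `tr ' ' '\n'`. Words with equal counts may also come out in a different order than in the shell version.

```python
from collections import Counter

# Made-up text standing in for spider.txt
text = (
    "the itsy bitsy spider climbed up the water spout\n"
    "down came the rain and washed the spider out\n"
)

# tr ' ' '\n': one word per item (split() also breaks on newlines and tabs)
words = text.split()

# sort | uniq -c | sort -nr | head: count each word, keep the 10 most frequent
top = Counter(words).most_common(10)
for word, count in top:
    print(f"{count:7} {word}")
```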



##### 💠**Signaling processes**

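Signals are tokens delivered to running processes to indicate a desired action.

CTRL + C = SIGINT (Interrupt)

CTRL + Z = SIGTSTP (Stop: pauses the process)

fg = resumes the stopped process in the foreground (sends SIGCONT)

kill = SIGTERM (Terminate; the default signal sent by kill)

ps = lists running processes and their process IDs (PIDs)


`ps ax | grep ping` finds the PID of the ping process, so you can send it a signal (for example, with kill) from another terminal.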
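You can also send signals from Python. This sketch assumes Linux or macOS. A sleeping Python child process stands in for `ping`, and `send_signal(signal.SIGTERM)` does what `kill <PID>` does. On POSIX systems, a negative return code means the process was ended by that signal.

```python
import signal
import subprocess
import sys

# A long-running child process standing in for `ping`.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print("child PID:", child.pid)  # the number `ps ax | grep ...` would show

child.send_signal(signal.SIGTERM)  # same effect as: kill <PID>
child.wait()                       # collect the exit status
print("return code:", child.returncode)  # -15 on Linux/macOS: ended by SIGTERM
```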



#### ◽**Theme 2: Bash Scripting**

##### 💠**Creating Bash Scripts** 08/06/25

Bash is the most commonly used shell in Linux.

`gather-information.sh`


In [None]:
#!/bin/bash

echo "starting at: $(date)"; echo

echo "UPTIME"; uptime; echo

echo "FREE"; free; echo

echo "WHO"; who; echo

echo "Finishing at: $(date)"


##### 💠**Using variables and Globs**

Bash is a full scripting language, not just for running commands. It supports variables, conditionals, loops, and functions. Variables are assigned with =, without spaces, and accessed using $. To make a variable available to subprocesses, use export.

You can enhance scripts using variables, like creating a line variable to print separators.

Bash also supports globs (wildcards like `*` and `?`) to match filenames. For example:

* `*.py` matches all Python files,

* `C*` matches files starting with "C",

* `?????` matches files with five-character names.

These help batch-process files, and similar functionality exists in Python's glob module.
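Python's `glob` module understands the same wildcards. This sketch creates a few hypothetical file names in a temporary directory and matches them with the three patterns above.

```python
import glob
import os
import shutil
import tempfile

# Throwaway directory with a few hypothetical file names.
workdir = tempfile.mkdtemp()
for name in ["Cat.py", "dog.py", "Cow.txt", "hello", "notes.md"]:
    open(os.path.join(workdir, name), "w").close()


def matches(pattern):
    """Sorted names of the files in workdir that match a shell-style glob pattern."""
    full_pattern = os.path.join(glob.escape(workdir), pattern)
    return sorted(os.path.basename(path) for path in glob.glob(full_pattern))


python_files = matches("*.py")   # ['Cat.py', 'dog.py']
starts_with_c = matches("C*")    # ['Cat.py', 'Cow.txt']
five_chars = matches("?????")    # ['hello']
print(python_files, starts_with_c, five_chars)

shutil.rmtree(workdir)
```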


##### 💠**Conditional execution in Bash**

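In Bash scripting, an exit value of 0 means success and any other value means failure. The if statement runs the command that follows it and uses its exit value to choose a branch. In the example below, grep exits with 0 when it finds a match.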

In [None]:
#!/bin/bash

if grep "127.0.0.1" /etc/hosts; then
	echo "Everything OK"
else
	echo "ERROR! 127.0.0.1 is not in /etc/hosts"
fi
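Python sees the same exit values through `subprocess.run(...).returncode`. In this sketch, `succeeds` is a hypothetical helper. Two tiny Python child processes that exit with 0 and 1 stand in for real commands such as grep.

```python
import subprocess
import sys


def succeeds(command):
    """Run a command and report success the way Bash's `if` does: exit value 0."""
    return subprocess.run(command).returncode == 0


exit_ok = [sys.executable, "-c", "import sys; sys.exit(0)"]
exit_fail = [sys.executable, "-c", "import sys; sys.exit(1)"]

if succeeds(exit_ok):
    print("Everything OK")
if not succeeds(exit_fail):
    print("ERROR! the command exited with a non-zero value")
```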


**test:** A command that evaluates the conditions it receives and exits with zero when they're true and with one when they're false. The `[ ... ]` form is equivalent to test (the spaces inside the brackets are required).



```
luis@Luis-HP-EliteBook-840-G5:~$ if test -n "$PATH"; then echo "Your PATH is not empty"; fi
Your PATH is not empty


luis@Luis-HP-EliteBook-840-G5:~$ if [ -n "$PATH" ]; then echo "Your PATH is not empty"; fi
Your PATH is not empty

```




#### ◽**Theme 3: Advanced Bash Concepts**

##### 💠**While loops in Bash Scripts**

**./while.sh**



```
#!/bin/bash

n=1
while [ $n -le 5 ]; do
  echo "Iteration number $n"
  ((n+=1))
done


```



**./random-exit.py**




```
#!/usr/bin/env python3

import sys
import random

value=random.randint(0, 3)
print("Returning: " + str(value))
sys.exit(value)

```



**./retry.sh**



```
#!/bin/bash

n=0
command=$1
while ! $command && [ $n -le 5 ]; do
  sleep $n
  ((n=n+1))
  echo "Retry #$n"
done;

```
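Here is the same retry logic in Python. This sketch assumes the command is given as a list, for example `["./random-exit.py"]`. `retry` and its `max_retries` parameter are hypothetical names. The default of 5 mirrors `[ $n -le 5 ]`.

```python
import subprocess
import sys
import time


def retry(command, max_retries=5):
    """Run command until it exits with 0, waiting a little longer before each retry.

    Returns the number of retries performed (0 if the first run succeeded).
    """
    n = 0
    while subprocess.run(command).returncode != 0 and n <= max_retries:
        time.sleep(n)  # sleep $n: 0 s, then 1 s, then 2 s, ...
        n += 1
        print(f"Retry #{n}")
    return n


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage, like ./retry.sh ./random-exit.py
    retry(sys.argv[1:])
```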



##### 💠**For loops in Bash Scripts**


```
luis@Luis-HP-EliteBook-840-G5:~$ nano fruits.sh

luis@Luis-HP-EliteBook-840-G5:~$ cat fruits.sh
#!/bin/bash

for fruit in peach orange apple; do
	echo "I like $fruit!"
done

luis@Luis-HP-EliteBook-840-G5:~$ chmod +x fruits.sh

luis@Luis-HP-EliteBook-840-G5:~$ ./fruits.sh
I like peach!
I like orange!
I like apple!

luis@Luis-HP-EliteBook-840-G5:~$

```





```
luis@Luis-HP-EliteBook-840-G5:~/old_website_practice$ ls -l
total 12
-rw-rw-r-- 1 luis luis 1 Jun  9 22:46 Test1.HTM
-rw-rw-r-- 1 luis luis 1 Jun  9 22:46 Test2.HTM
-rw-rw-r-- 1 luis luis 1 Jun  9 22:46 Test3.HTM
luis@Luis-HP-EliteBook-840-G5:~/old_website_practice$ basename Test1.HTM .HTM
Test1
luis@Luis-HP-EliteBook-840-G5:~/old_website_practice$ nano
luis@Luis-HP-EliteBook-840-G5:~/old_website_practice$ nano rename.sh
luis@Luis-HP-EliteBook-840-G5:~/old_website_practice$ chmod +x rename.sh
luis@Luis-HP-EliteBook-840-G5:~/old_website_practice$ ./rename.sh
mv Test1.HTM Test1.html
mv Test2.HTM Test2.html
mv Test3.HTM Test3.html
luis@Luis-HP-EliteBook-840-G5:~/old_website_practice$ nano rename.sh
luis@Luis-HP-EliteBook-840-G5:~/old_website_practice$ ./rename.sh
luis@Luis-HP-EliteBook-840-G5:~/old_website_practice$ ls -l
total 16
-rwxrwxr-x 1 luis luis 98 Jun  9 22:53 rename.sh
-rw-rw-r-- 1 luis luis  1 Jun  9 22:46 Test1.html
-rw-rw-r-- 1 luis luis  1 Jun  9 22:46 Test2.html
-rw-rw-r-- 1 luis luis  1 Jun  9 22:46 Test3.html
luis@Luis-HP-EliteBook-840-G5:~/old_website_practice$

```
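The contents of rename.sh aren't shown above, so this is a Python sketch of the same rename, not the script itself. It works in a temporary directory with the same three file names. `DRY_RUN` is a hypothetical switch that reproduces the first run, which only printed the `mv` commands.

```python
import shutil
import tempfile
from pathlib import Path

DRY_RUN = False  # set to True to only print the mv commands, like the first run above

# Throwaway directory with the same three sample files as the session above.
workdir = Path(tempfile.mkdtemp())
for name in ["Test1.HTM", "Test2.HTM", "Test3.HTM"]:
    (workdir / name).touch()

for path in sorted(workdir.glob("*.HTM")):
    new_path = path.with_suffix(".html")  # same idea as: basename "$file" .HTM
    print(f"mv {path.name} {new_path.name}")
    if not DRY_RUN:
        path.rename(new_path)

renamed = sorted(p.name for p in workdir.iterdir())
print(renamed)  # ['Test1.html', 'Test2.html', 'Test3.html']
shutil.rmtree(workdir)
```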




##### 💠**Advanced Command Interaction**


```
luis@Luis-HP-EliteBook-840-G5:~$ nano toploglines.sh
luis@Luis-HP-EliteBook-840-G5:~$ cat toploglines.sh
#!/bin/bash

for logfile in /var/log/*log; do
    echo "Processing: $logfile"
    cut -d' ' -f5- "$logfile" | sort | uniq -c | sort -nr | head -5
done
luis@Luis-HP-EliteBook-840-G5:~$ chmod +x toploglines.sh
luis@Luis-HP-EliteBook-840-G5:~$ ./toploglines.sh

```
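This Python sketch does the same counting for the lines of a single file. `top_log_lines` is a hypothetical helper, and the sample lines are made up. It splits on a single space, like `cut -d' '`, so the double space in dates such as `Jun  9` produces the same empty field in both versions.

```python
from collections import Counter


def top_log_lines(lines, n=5):
    """Most common log lines after dropping the first four space-separated fields.

    Mirrors: cut -d' ' -f5- FILE | sort | uniq -c | sort -nr | head -5
    """
    counts = Counter(" ".join(line.rstrip("\n").split(" ")[4:]) for line in lines)
    return counts.most_common(n)


# Hypothetical syslog-style lines; an open file object works the same way.
sample = [
    "Jun 10 10:00:01 mypc CRON[101]: session opened for user root\n",
    "Jun 10 10:05:01 mypc CRON[101]: session opened for user root\n",
    "Jun 10 10:06:12 mypc kernel: usb 1-1: new device\n",
]
for line, count in top_log_lines(sample):
    print(f"{count:7} {line}")
```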




##### 💠**Choosing between Bash and Python**

```
luis@Luis-HP-EliteBook-840-G5:~$ nano capitalize_words.py
luis@Luis-HP-EliteBook-840-G5:~$ cat capitalize_words.py
#!/usr/bin/env python3

import sys

for line in sys.stdin:
    words = line.strip().split()
    print(" ".join([word.capitalize() for word in words]))

luis@Luis-HP-EliteBook-840-G5:~$ nano story.txt
luis@Luis-HP-EliteBook-840-G5:~$ cat story.txt
once upon a time there was an egg of programming language called python
luis@Luis-HP-EliteBook-840-G5:~$ cat story.txt | ./capitalize_words.py
bash: ./capitalize_words.py: Permission denied
luis@Luis-HP-EliteBook-840-G5:~$ chmod +x capitalize_words.py
luis@Luis-HP-EliteBook-840-G5:~$ cat story.txt | ./capitalize_words.py
Once Upon A Time There Was An Egg Of Programming Language Called Python
luis@Luis-HP-EliteBook-840-G5:~$

```