In [1]:
# Anil Osman Tur
# 2024-10

## Exercise Compilation Intro to Python 4

Let's remember this :)

```python
while True:
    print("Ask questions. :)")
    if satisfied_with_answer:
        break
```

----

Let's pull the notebook from the repo to our local machine:

```sh
git clone https://github.com/AnilOsmanTur/Intro-to-Python-Exercises.git
```

After cloning the repo, we can use the `git pull` command to get the latest version of the notebook.

```sh
git pull
```
----


## Regular Expressions

Let's remember some key points about regular expressions (regex) in Python:

1. The `re` module is used for working with regular expressions in Python.

2. Basic pattern matching:
   - `re.search(pattern, string)` searches for a pattern in a string
   - `re.findall(pattern, string)` finds all occurrences of a pattern

3. Common regex patterns:
   - `.` matches any single character
   - `^` matches the start of a string
   - `$` matches the end of a string
   - `[a-e]` matches any single character in the range a to e
   - `[^abc]` matches any character except a, b, or c
   - `\d` matches any digit

4. Quantifiers:
   - `*` matches 0 or more occurrences
   - `+` matches 1 or more occurrences
   - `?` matches 0 or 1 occurrence

5. Groups and alternation:
   - `(...)` creates a group
   - `|` represents alternation (OR)

6. Useful regex functions:
   - `re.sub(pattern, replacement, string)` for substituting patterns
   - `re.split(pattern, string)` for splitting strings based on a pattern

7. Flags can be used to modify regex behavior, e.g., `re.I` for case-insensitive matching

8. Remember that regex-based string manipulation can be computationally intensive, especially on long texts or with patterns that backtrack heavily (for example, nested quantifiers).
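To make the key points above concrete, here is a minimal sketch that exercises each function once. The sample strings and variable names are made up for illustration; only the `re` calls come from the list.

```python
import re

# Made-up sample text, only for illustration
text = "Call 555-1234 or 555-9876, Monday or Tuesday."

# re.search returns the first match object, or None when nothing matches
first = re.search(r"\d+-\d+", text)
print(first.group())

# re.findall returns every non-overlapping match as a list of strings
numbers = re.findall(r"\d+-\d+", text)
print(numbers)

# Group with alternation: match either day name
days = re.findall(r"(Monday|Tuesday)", text)
print(days)

# re.sub replaces each match, re.split cuts the string at each match
masked = re.sub(r"\d", "#", "room 42")
parts = re.split(r",\s*", "a, b,c")
print(masked, parts)

# The re.I flag makes the match case-insensitive
monday = re.findall(r"monday", text, flags=re.I)
print(monday)
```

Note that `re.search` returns a match object, so we need `.group()` to get the matched text, while `re.findall` already returns plain strings.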



In [2]:
# Let's remember how to use your functions as a module
import os
import sys
sys.path.append('../')
from book_downloader.cleaning import clean_text, split_text, read_book, download_book

# read the list of books from the booklist.txt
with open('../book_downloader/booklist.txt', 'r') as f:
    books = f.readlines()

# split the books into name and url (comma separated), skipping blank lines
# readlines() keeps the trailing '\n', so an empty line is '\n', not ''
strip_split = lambda x: x.strip().split(',')
books = [strip_split(item) for item in books if item.strip()]

# Create a folder to store the books (no error if it already exists)
os.makedirs('bookcode', exist_ok=True)

# Process the books
for name, url in books:
    # replace spaces in the book name so it is safe to use as a file name
    name = name.replace(' ', '_')
    path = f'bookcode/{name}.txt'
    download_book(url, path=path)
    tmp_book = read_book(file_path=path)
    tmp_book = clean_text(tmp_book)
    sentences = split_text(tmp_book)
    # print a small sample of sentences from each book
    for sentence in sentences[100:110]:
        print(sentence)
    print('------------------------------------')



CHAPTER 41
Moby Dick
CHAPTER 42
The Whiteness of the Whale
CHAPTER 43
Hark
CHAPTER 44
The Chart
CHAPTER 45
The Affidavit
------------------------------------
At that age I became acquainted with the celebrated poets of our owncountry
but it was only when it had ceased to be in my power to derive itsmost important benefits from such a conviction that I perceived thenecessity of becoming acquainted with more languages than that of my nativecountry
Now I am twenty-eight and am in reality more illiterate than manyschoolboys of fifteen
It is true that I have thought more and that mydaydreams are more extended and magnificent, but they want (as the painterscall it) _keeping
_ and I greatly need a friend who would have senseenough not to despise me as romantic, and affection enough for me toendeavour to regulate my mind
Well, these are useless complaints
I shall certainly find no friend on thewide ocean, nor even here in Archangel, among merchants and seamen
Yetsome feelings, unallied to the 

## Regular Expressions with Text Analysis


In [3]:
# A program that finds all words ending in "ing" in a given text.
import re

def find_ing_words(text):
    pattern = r'\b\w+ing\b'
    matches = re.findall(pattern, text)
    return matches

# Example usage
text = "I am running and jumping while singing in the rain"
print(find_ing_words(text))


['running', 'jumping', 'singing']
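The pattern above only matches words that end in "ing". As a small sketch of the difference, the two helper functions below (hypothetical names, not part of the notebook) compare "ends in ing" with "contains ing anywhere in the word".

```python
import re

def find_words_ending_in_ing(text):
    # \w+ needs at least one character before "ing", \b pins "ing" to the word end
    return re.findall(r'\b\w+ing\b', text)

def find_words_containing_ing(text):
    # \w* on both sides lets "ing" sit anywhere inside the word
    return re.findall(r'\b\w*ing\w*\b', text)

# Made-up sample sentence
sample = "The singer kept singing about rings"
ending = find_words_ending_in_ing(sample)
containing = find_words_containing_ing(sample)
print(ending)      # ['singing']
print(containing)  # ['singer', 'singing', 'rings']
```

Which one you want depends on the exercise: for spotting verb forms the "ending" version is usually closer, while the "containing" version is a plain substring search at word level.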


In [4]:
# Create a script that extracts all capitalized words from a sentence.
import re

def find_capital_words(text):
    # pattern = r'.[A-Z]+[a-z]*'  # what we did in class (also captures the character before the word)
    pattern = r'\b[A-Z][a-z]*\b'  # maybe a slightly better way to do it
    # What are the differences? What do you think?
    matches = re.findall(pattern, text)
    return matches

# Example usage
text = "The Quick Brown Fox Jumps Over The Lazy Dog"
print(find_capital_words(text))


['The', 'Quick', 'Brown', 'Fox', 'Jumps', 'Over', 'The', 'Lazy', 'Dog']
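One way to explore the question in the cell above is to run both patterns side by side on the same sentence. This is just a sketch; the variable names are made up for the comparison.

```python
import re

text = "The Quick Brown Fox Jumps Over The Lazy Dog"

# Pattern from class: '.' consumes whatever character sits before the capital letter
class_pattern = r'.[A-Z]+[a-z]*'
# Word-boundary version: '\b' matches a position, so it consumes nothing
boundary_pattern = r'\b[A-Z][a-z]*\b'

class_matches = re.findall(class_pattern, text)
boundary_matches = re.findall(boundary_pattern, text)

print(class_matches)
# [' Quick', ' Brown', ' Fox', ' Jumps', ' Over', ' The', ' Lazy', ' Dog']
print(boundary_matches)
# ['The', 'Quick', 'Brown', 'Fox', 'Jumps', 'Over', 'The', 'Lazy', 'Dog']
```

The class pattern misses the first word, because there is no character before `T` for `.` to consume, and every match carries a leading space. The `\b` version avoids both problems.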


In [6]:
# Implement a simple stemming function that removes common English suffixes (-ing, -ed, -s) from words.
import re

def simple_stem(word):
    patterns = [
        (r'ing$', ''),  # strips any trailing 'ing', not only verb forms ('king' -> 'k')
        (r'ed$', ''),   # strips any trailing 'ed', not only past tense ('bed' -> 'b')
        (r's$', '')     # strips any trailing 's', not only plurals ('bus' -> 'bu')
    ]
    
    for pattern, replacement in patterns:
        if re.search(pattern, word):
            return re.sub(pattern, replacement, word)
    return word

# Example usage
words = ["running", "jumped", "cats"]
print([simple_stem(word) for word in words])

['runn', 'jump', 'cat']
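Notice that "running" became "runn". As a rough sketch (an assumption for illustration, not a real stemming algorithm), we could collapse a doubled final consonant after stripping the suffix. The function name `stem_with_doubling` is hypothetical.

```python
import re

def stem_with_doubling(word):
    # Same suffix rules as simple_stem, tried in the same order
    for suffix in (r'ing$', r'ed$', r's$'):
        if re.search(suffix, word):
            stem = re.sub(suffix, '', word)
            # Collapse a doubled final non-vowel: "runn" -> "run"
            return re.sub(r'([^aeiou])\1$', r'\1', stem)
    return word

words = ["running", "jumped", "cats", "falling"]
stems = [stem_with_doubling(word) for word in words]
print(stems)  # ['run', 'jump', 'cat', 'fal']
```

The last result shows the limit of this kind of rule: "falling" should give "fall", but the doubled "l" gets collapsed too. Real stemmers handle such cases with many more rules.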


In [16]:
# A program that matches all words starting with a vowel.
import re

def find_vowel_words(text):
    pattern = r'\b[aeiouAEIOU]\w*'  # \w* (not \w+) so one-letter words like "a" and "I" also match
    matches = re.findall(pattern, text)
    return matches

# Example usage
text = "top An elephant and ^octopus are inside the orange umbrella"
print(find_vowel_words(text))

['An', 'elephant', 'and', 'octopus', 'are', 'inside', 'orange', 'umbrella']
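Two details are worth trying out here: the `re.I` flag can replace listing both cases in the character class, and the quantifier after the first letter decides whether one-letter words are found. A small sketch with a made-up sentence and hypothetical variable names:

```python
import re

# Made-up sample sentence with one-letter words
sample = "I saw a cat eat an Apple under the oak"

# \w+ requires at least one more character after the vowel
two_or_more = re.findall(r'\b[aeiou]\w+', sample, flags=re.I)
# \w* also accepts a vowel standing alone
one_or_more = re.findall(r'\b[aeiou]\w*', sample, flags=re.I)

print(two_or_more)  # ['eat', 'an', 'Apple', 'under', 'oak']
print(one_or_more)  # ['I', 'a', 'eat', 'an', 'Apple', 'under', 'oak']
```

The `\b` at the start is what keeps the `a` inside "saw" and "cat" from matching: there is no word boundary in the middle of a word.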
