# Problem 1


---




For each problem, write a regular expression that matches only the output strings in the given set of input strings.


#### Problem 1.1
Input:
```
ababababab
abb
bbabaa
bba
abbabb
```

Output:
```
abb
abbabb
```

Note: Do not use a regular expression of the form `(word1|word2)` that lists each case individually. This defeats the purpose of a regular expression.

In [None]:
import re

words = ["ababababab", "abb", "bbabaa", "bba", "abbabb"]
pattern = re.compile("^abb(abb)?")
for word in words:
  if pattern.match(word):
    print(word)

abb
abbabb
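
`re.match` anchors a pattern only at the start of the string, so a pattern without an end anchor also accepts longer strings that merely begin with a valid prefix. The sketch below compares `match` with `fullmatch` for the same pattern. The two strings in `extra` are hypothetical inputs that are not part of the assignment; they are included only to show the difference.

```python
import re

words = ["ababababab", "abb", "bbabaa", "bba", "abbabb"]

# Hypothetical extra inputs (not in the assignment) that only share a valid prefix.
extra = ["abba", "abbabbb"]

pattern = re.compile(r"abb(abb)?")

def select(candidates, full):
    """Return the candidates accepted by the pattern, using fullmatch when full is True."""
    check = pattern.fullmatch if full else pattern.match
    return [w for w in candidates if check(w)]

prefix_hits = select(words + extra, full=False)  # accepts anything starting with "abb"
strict_hits = select(words + extra, full=True)   # accepts only "abb" or "abbabb"

print(prefix_hits)
print(strict_hits)
```

On the assignment's own inputs both variants select the same two strings; they differ only on inputs such as the hypothetical ones above.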


#### Problem 1.2
Input:
```
abc
bc
ab
aabbcc
abbbbbbbb
ccbbaa
```

Output:
```
abc
ab
abbbbbbbb
```

In [None]:
words = ["abc", "bc", "ab", "aabbcc", "abbbbbbbb", "ccbbaa"]

pattern = re.compile("^ab[bc]*")
for word in words:
  if pattern.match(word):
    print(word)

abc
ab
abbbbbbbb


#### Problem 1.3

Input:
```
abc
abbbbbbbb
acc
abcbcbcbc
ac
xzzyyyyzyzzzz
accvvvvcvcccc
```

Output:
```
abc
abbbbbbbb
abcbcbcbc
```

Note: Define the regular expression from the positive strings (the ones that should match).

In [None]:
words = ['abc', 'abbbbbbbb', 'acc', 'abcbcbcbc', 'ac', 'xzzyyyyzyzzzz', 'accvvvvcvcccc']

pattern = re.compile("^ab[bc]*")
for word in words:
  if pattern.match(word):
    print(word)

abc
abbbbbbbb
abcbcbcbc
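
One way to check a candidate pattern against both the positive and the negative strings is a small helper that compares the selected strings with the expected output. This is a minimal sketch: `check_pattern` is a hypothetical helper name, and the second pattern, `a[bc]*`, is a deliberately loose example used only to show a failing check.

```python
import re

def check_pattern(regex, inputs, expected):
    """Return True when fullmatch over inputs selects exactly the expected strings, in order."""
    compiled = re.compile(regex)
    selected = [s for s in inputs if compiled.fullmatch(s)]
    return selected == expected

inputs_1_3 = ["abc", "abbbbbbbb", "acc", "abcbcbcbc", "ac", "xzzyyyyzyzzzz", "accvvvvcvcccc"]
expected_1_3 = ["abc", "abbbbbbbb", "abcbcbcbc"]

ok = check_pattern(r"ab[bc]*", inputs_1_3, expected_1_3)        # requires "ab" at the start
too_loose = check_pattern(r"a[bc]*", inputs_1_3, expected_1_3)  # also accepts "acc" and "ac"

print(ok, too_loose)
```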


#### Problem 1.4

Input:
```
x. y
x?  Y
x! y
x y
x.  Y
x Y
xY
```

Output:
```
x. y
x?  Y
x! y
x.  Y
```


In [None]:
words = ['x. y', 'x?  Y', 'x! y', 'x y', 'x.  Y', 'x Y', 'xY']

pattern = re.compile(r"^x[.!?]\s+[Yy]")  # raw string so that \s reaches the regex engine unchanged
for word in words:
  if pattern.match(word):
    print(word)

x. y
x?  Y
x! y
x.  Y
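
Writing the pattern as a raw string keeps backslash sequences such as `\s` intact for the regex engine; in an ordinary string literal, Python interprets recognized escapes first and recent versions warn about unrecognized ones. The sketch below illustrates the difference with `\b`, which is a backspace character in an ordinary literal but a word boundary in a regex, and then applies a raw-string pattern to the inputs with `fullmatch`.

```python
import re

# In an ordinary literal "\b" is a single backspace character;
# in a raw literal it stays as backslash + "b", the regex word boundary.
backspace = "\b"
boundary = r"\b"
print(len(backspace), len(boundary))  # 1 2

words = ['x. y', 'x?  Y', 'x! y', 'x y', 'x.  Y', 'x Y', 'xY']

# x, one sentence-ending punctuation mark, at least one whitespace character, then y or Y.
pattern = re.compile(r"x[.!?]\s+[Yy]")
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)
```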


#### Problem 1.5

Input:

```
<html class="client-nojs" lang="en" dir="ltr">
<table class="infobox biota" style="text-align: left; width: 200px; font-size: 100%">
<body class="ltr sitedir-ltr">
<div id="page-base" class="noprint">
<a id="top"></a>
```

Output:
```
tags:
html
table
body
div
a

keys:
class
lang
dir
style
id

values:
client-nojs; infobox biota; ltr sitedir-ltr; noprint
en
ltr
text-align: left; width: 200px; font-size: 100%
page-base; top
```

Note: For this question, your code must use either the method [Match.group()](https://docs.python.org/3/library/re.html#re.Match.group) or the function [re.findall()](https://docs.python.org/3/library/re.html#re.findall) to find tags, keys, and values. Do not write a separate regular expression for each input line (i.e., your code must use the same regular expression for every line). The output should be exactly as above.

In [None]:
input_lines = ['<html class="client-nojs" lang="en" dir="ltr">', '<table class="infobox biota" style="text-align: left; width: 200px; font-size: 100%">', '<body class="ltr sitedir-ltr">', '<div id="page-base" class="noprint">', '<a id="top"></a>']

tags = []
keys = []
values_dict = {}

pattern = re.compile(r"<(\w+)|(\w+)=\"(.*?)\"")

for line in input_lines:
    found = pattern.findall(line)
    for tag, key, value in found:
        if tag and tag not in tags:
            tags.append(tag)
        if key and key not in keys:
            keys.append(key)
        if value:
            if key in values_dict:
                values_dict[key].append(value)
            else:
                values_dict[key] = [value]

print("tags:")
print("\n".join(tags))

print("\nkeys:")
print("\n".join(keys))

print("\nvalues:")
formatted_values = ["; ".join(values_dict[key]) for key in keys]
print("\n".join(formatted_values))


tags:
html
table
body
div
a

keys:
class
lang
dir
style
id

values:
client-nojs; infobox biota; ltr sitedir-ltr; noprint
en
ltr
text-align: left; width: 200px; font-size: 100%
page-base; top
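
The same result can also be reached with `Match.group()`. The following is a sketch of one alternative, not the required solution: it assumes named groups (`tag`, `key`, `value`) and `re.finditer`, and it relies on dictionaries preserving insertion order to keep the keys in first-seen order. Unlike a truthiness check on the value, it would also record empty attribute values.

```python
import re

input_lines = [
    '<html class="client-nojs" lang="en" dir="ltr">',
    '<table class="infobox biota" style="text-align: left; width: 200px; font-size: 100%">',
    '<body class="ltr sitedir-ltr">',
    '<div id="page-base" class="noprint">',
    '<a id="top"></a>',
]

# Either an opening tag name, or a key="value" attribute pair.
pattern = re.compile(r'<(?P<tag>\w+)|(?P<key>\w+)="(?P<value>[^"]*)"')

tags = []
values_by_key = {}  # insertion order gives the keys in first-seen order

for line in input_lines:
    for m in pattern.finditer(line):
        if m.group("tag"):
            if m.group("tag") not in tags:
                tags.append(m.group("tag"))
        else:
            values_by_key.setdefault(m.group("key"), []).append(m.group("value"))

keys = list(values_by_key)
values = ["; ".join(v) for v in values_by_key.values()]

print("tags:", "\n".join(tags), sep="\n")
print("\nkeys:", "\n".join(keys), sep="\n")
print("\nvalues:", "\n".join(values), sep="\n")
```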



## Problems 2 and 3

See https://docs.google.com/document/d/1KRnEequL1kJcngDNOvmTzX5iuy88J8IRWTETujsVgAQ/edit?usp=sharing



## Instructions to convert the ipython notebook to a pdf

Step 1: Make sure Google Drive is mounted (it likely already is from the first question).

In [None]:
from google.colab import drive
drive.mount('/content/drive/')
!ls "/content/drive/My Drive/Colab Notebooks/"

Mounted at /content/drive/
'Assignment 1'
'NL2DS-W2025-Assignment-1.ipynb adlı not defterinin kopyası'
'NL2DS-W2025-Assignment-2-regular-expressions.ipynb adlı not defterinin kopyası'


Step 2: Install the packages needed to convert .ipynb to .pdf.

In [None]:
!apt-get -q install texlive-xetex texlive-fonts-recommended texlive-plain-generic pandoc

Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  dvisvgm fonts-droid-fallback fonts-lato fonts-lmodern fonts-noto-mono fonts-texgyre
  fonts-urw-base35 libapache-pom-java libcmark-gfm-extensions0.29.0.gfm.3 libcmark-gfm0.29.0.gfm.3
  libcommons-logging-java libcommons-parent-java libfontbox-java libfontenc1 libgs9 libgs9-common
  libidn12 libijs-0.35 libjbig2dec0 libkpathsea6 libpdfbox-java libptexenc1 libruby3.0 libsynctex2
  libteckit0 libtexlua53 libtexluajit2 libwoff1 libzzip-0-13 lmodern pandoc-data poppler-data
  preview-latex-style rake ruby ruby-net-telnet ruby-rubygems ruby-webrick ruby-xmlrpc ruby3.0
  rubygems-integration t1utils teckit tex-common tex-gyre texlive-base texlive-binaries
  texlive-latex-base texlive-latex-extra texlive-latex-recommended texlive-pictures tipa
  xfonts-encodings xfonts-utils
Suggested packages:
  fonts-noto fonts-freefont-otf | fonts-freefont-ttf libavalon-fr

Step 3: Convert to PDF (replace `000000000` with your student ID)

In [None]:
%env STUDENT_ID=000000000
!jupyter nbconvert --to pdf \
    "/content/drive/My Drive/Colab Notebooks/NL2DS-W2025-Assignment-2-regular-expressions.ipynb" \
    --output "/content/drive/My Drive/Colab Notebooks/$STUDENT_ID-W2025-Assignment-2-regular-expressions"


env: STUDENT_ID=000000000
[NbConvertApp] Converting notebook /content/drive/My Drive/Colab Notebooks/NL2DS-W2025-Assignment-2-regular-expressions.ipynb to pdf
[NbConvertApp] Writing 61843 bytes to notebook.tex
[NbConvertApp] Building PDF
[NbConvertApp] Running xelatex 3 times: ['xelatex', 'notebook.tex', '-quiet']
[NbConvertApp] Running bibtex 1 time: ['bibtex', 'notebook']
[NbConvertApp] PDF successfully created
[NbConvertApp] Writing 63983 bytes to /content/drive/My Drive/Colab Notebooks/000000000-W2025-Assignment-2-regular-expressions.pdf
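
After the conversion, it may be worth confirming that the PDF was written where expected. The sketch below rebuilds the output path from the directory and file-name suffix used in the `nbconvert` command and checks that the file exists. `expected_pdf_path` is a hypothetical helper, and the fallback ID `000000000` is the placeholder from this notebook.

```python
import os
from pathlib import Path

# Directory and file-name suffix copied from the nbconvert command in this notebook.
NOTEBOOK_DIR = Path("/content/drive/My Drive/Colab Notebooks")
SUFFIX = "-W2025-Assignment-2-regular-expressions.pdf"

def expected_pdf_path(student_id, base_dir=NOTEBOOK_DIR):
    """Build the path that the conversion step should have written for this student ID."""
    return base_dir / f"{student_id}{SUFFIX}"

# Falls back to the placeholder ID when STUDENT_ID is not set in the environment.
student_id = os.getenv("STUDENT_ID", "000000000")
pdf_path = expected_pdf_path(student_id)

if pdf_path.is_file():
    print(f"Found {pdf_path} ({pdf_path.stat().st_size} bytes)")
else:
    print(f"Missing {pdf_path}; re-run the conversion cell")
```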


Step 4: Download the resulting PDF file. If you are using Chrome, you can do so by running the following code. In other browsers, you can download the PDF using the file manager on the left of the screen (navigate to the file > right-click > Download).

In [None]:
import os
from google.colab import files
files.download(f"/content/drive/My Drive/Colab Notebooks/{os.getenv('STUDENT_ID')}-W2025-Assignment-2-regular-expressions.pdf")
