<a href="https://colab.research.google.com/github/MK316/applications/blob/main/ReSearch_words.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🌟 Basic Regular Expressions, Including Wildcards 🌟

Regular expression | Interpretation | Example
---|---|---
`.` | `Wildcard; matches any single character except a newline` |
`^abc` | `Matches the pattern abc at the start of a string` |
`abc$` | `Matches the pattern abc at the end of a string` |
`[abc]`	|    `Matches one of a set of characters`|
`[^abc]`  |  `Matches anything but a set of characters`|
`[A-Z0-9]`|	`Matches one of a range of characters`|
`ed\|ing\|s` |	`Matches one of the specified strings (disjunction)`|
`*` | `Zero or more of the previous item (also known as the Kleene closure)` | `e.g. a*, [a-z]*`
`+`	 | `One or more of previous item` | `e.g. a+, [a-z]+`
`?` |	`Zero or one of the previous item (i.e. optional)` | `e.g. a?, [a-z]?`
`{n}`	| `Exactly n repeats where n is a non-negative integer` |
`{n,}` | `At least n repeats` |
`{,n}` | `No more than n repeats` |
`{m,n}` |	`At least m and no more than n repeats` |
`a(b\|c)+` |	`Parentheses that indicate the scope of the operators` |
`(...)` | `Matches whatever regular expression is inside the parentheses` |
`\d` | `Matches any decimal digit; for ASCII text this is equivalent to the class [0-9].` |
`\D` | `Matches any non-digit character; for ASCII text this is equivalent to the class [^0-9].` |
`\s` | `Matches any whitespace character; for ASCII text this is equivalent to the class [ \t\n\r\f\v].` |
`\S` | `Matches any non-whitespace character; for ASCII text this is equivalent to the class [^ \t\n\r\f\v].` |
`\w` | `Matches any alphanumeric character or underscore; for ASCII text this is equivalent to the class [a-zA-Z0-9_].` |
`\W` | `Matches any character that is not alphanumeric or an underscore; for ASCII text this is equivalent to the class [^a-zA-Z0-9_].` |
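
To make a few rows of the table concrete, here is a minimal sketch that tries several of the patterns with `re.search`. The word list `sample` and the helper `matches` are hypothetical, introduced only for illustration; they are not part of the notebook.

```python
import re

# Hypothetical sample words, chosen only to illustrate the table above.
sample = ["cat", "cot", "coat", "walked", "walking", "walks",
          "A1", "abcabc", "color", "colour"]

def matches(pattern, words):
    """Return the words in which `pattern` is found by re.search."""
    return [w for w in words if re.search(pattern, w)]

print(matches(r"^c.t$", sample))        # wildcard between two anchors
print(matches(r"(ed|ing|s)$", sample))  # disjunction at the end of a word
print(matches(r"^[A-Z0-9]+$", sample))  # only uppercase letters or digits
print(matches(r"colou?r", sample))      # optional item
print(matches(r"^(abc){2}$", sample))   # exactly two repeats of a group
```

Raw strings (`r"..."`) are used so that backslashes in patterns such as `\d` reach the `re` module unchanged.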

In [8]:
#@markdown Import packages:
import nltk
import re
nltk.download('words')

[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.


True

In [9]:
#@markdown ◉ English dictionary:

engdict = nltk.corpus.words.words('en')
print('Dictionary length: ', len(engdict))

Dictionary length:  235886


In [None]:
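# Keep only the dictionary words that end in "ed" ($ anchors the match at the end).
result = [w for w in engdict if re.search(r'ed$', w)]
result[:10]  # show the first ten matches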
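
The same search can be wrapped in a small reusable function, which makes it easy to try other patterns from the table. The sketch below is one possible way to do that: `search_words` and `tiny_dict` are hypothetical names, and `tiny_dict` is a stand-in for `engdict` so that the example runs without downloading NLTK data.

```python
import re

def search_words(pattern, words, limit=10):
    """Return up to `limit` words in which `pattern` matches (re.search)."""
    compiled = re.compile(pattern)  # compile once, reuse for every word
    hits = [w for w in words if compiled.search(w)]
    return hits[:limit]

# Hypothetical stand-in for engdict, so the sketch runs without NLTK data.
tiny_dict = ["abandoned", "bed", "education", "red", "Edward", "wedding", "zed"]

print(search_words(r"ed$", tiny_dict))           # words ending in "ed"
print(search_words(r"^ed", tiny_dict))           # words starting with lowercase "ed"
print(search_words(r"^[^aeiou]ed$", tiny_dict))  # one non-vowel character, then "ed"
```

With the real dictionary, passing `engdict` in place of `tiny_dict` should behave the same way, assuming `engdict` is a sequence of strings as in the cell above.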

In [14]:
#@markdown ◉ Treebank (Wall Street Journal)
nltk.download('treebank')
wsj = nltk.corpus.treebank.words()
print('WSJ length: ', len(wsj))

[nltk_data] Downloading package treebank to /root/nltk_data...
[nltk_data]   Package treebank is already up-to-date!
WSJ length:  100676


In [None]:
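# Keep only the WSJ tokens that end in "es" (tokens repeat, since this is running text).
result = [w for w in wsj if re.search(r'es$', w)]
result[:10]  # show the first ten matches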
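
Because a corpus is running text, the same token can match many times, so counting the matches is often more informative than listing them. The following is a rough sketch of that idea using `collections.Counter`; the `tokens` list is a hypothetical stand-in for `wsj`, and the case-insensitive flag is an assumption about what is wanted, not something the notebook specifies.

```python
import re
from collections import Counter

# Hypothetical tokens standing in for the wsj list.
tokens = ["shares", "prices", "rose", "Companies", "said",
          "rates", "yes", "prices", "is"]

# re.IGNORECASE lets the pattern also match tokens such as "SHARES".
suffix = re.compile(r"es$", re.IGNORECASE)

# Count each matching token, lowercased so "Companies" and "companies" merge.
counts = Counter(w.lower() for w in tokens if suffix.search(w))

print(counts.most_common(3))  # the most frequent "es" tokens with their counts
```

Replacing `tokens` with `wsj` would give the counts for the Treebank sample, assuming the corpus has been downloaded as in the cell above.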