## Regular Expressions (Regex)

This Python script demonstrates common text-processing tasks with regular expressions, a fundamental tool in Natural Language Processing (NLP). Each section applies one regex pattern to a specific task on a sample text.

1. Find all numbers in text:
   This section uses the pattern `\d+` to find every run of digits (that is, each whole number) in the given text and prints them one per line.

2. Find all non-numeric characters in text:
   Here, the pattern `\D` matches every character that is not a digit, and the matched characters are printed back as a single string.

3. Find all punctuation marks in text:
   This section uses a character class to find the punctuation marks '.', ',', '!', and '?' in the text and prints them.

4. Remove all punctuation marks from text:
   This part uses the `re.sub()` function to remove those punctuation marks from the text and prints the cleaned text.

5. Remove all spaces from text:
   Using a regex pattern, this section removes all whitespace characters (spaces, tabs, newlines) from the text and prints the result.

6. Find all hyphenated words in text:
   Here, a regex pattern finds all words that contain a hyphen ("-") and prints them.

7. Extract all email addresses from text:
   This section uses a regex pattern to extract all email addresses from the text and prints them.

8. Extract all usernames from email addresses:
   Building upon the previous section, this part extracts all usernames (before '@') from the email addresses found in the text and prints them.


#### Import Python's `re` module for regular expressions

In [8]:
import re

#### 1. Find all numbers in text

In [10]:

text = "During the 1970s, many programmers began to write conceptual ontologies, which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert 1981). During this time, the first many chatterbots were written (e.g., PARRY)."

# \d+ matches each run of one or more digits, i.e. each whole number
numbers = re.findall(r'\d+', text)

for number in numbers:
    print(number)


1970
1975
1978
1978
1976
1977
1979
1981
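
Because `\d+` matches any run of digits, the decade "1970s" contributes `1970` to the output above. If only standalone four-digit years are wanted, word boundaries can exclude it. The following is a minimal sketch under that assumption; the helper name `find_years` and the shortened `sample` string are illustrative and not part of the original script.

```python
import re

def find_years(text):
    """Return standalone four-digit numbers in `text` as integers."""
    # \b requires a word boundary on both sides, so the "1970" in "1970s"
    # is skipped because it is directly followed by the letter "s".
    return [int(match) for match in re.findall(r'\b\d{4}\b', text)]

sample = "During the 1970s, MARGIE (Schank, 1975) and SAM (Cullingford, 1978) appeared."
print(find_years(sample))  # [1975, 1978]
```

Converting the matches with `int()` is optional; `re.findall` itself always returns strings.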


#### 2. Find all non-numeric characters in text

In [11]:

text = "During the 1970s, many programmers began to write conceptual ontologies, which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert 1981). During this time, the first many chatterbots were written (e.g., PARRY)."

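# \D matches any single character that is not a digit
non_digits = re.findall(r'\D', text)

# Join the characters to reconstruct the text without its digits
print(''.join(non_digits))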

During the s, many programmers began to write conceptual ontologies, which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, ), SAM (Cullingford, ), PAM (Wilensky, ), TaleSpin (Meehan, ), QUALM (Lehnert, ), Politics (Carbonell, ), and Plot Units (Lehnert ). During this time, the first many chatterbots were written (e.g., PARRY).
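
Since `\D` matches one character at a time, `re.findall` returns a list of single characters that then has to be stitched back together. A possibly simpler route to the same string is to delete the digits with `re.sub`. The sketch below compares the two approaches on a shortened, illustrative `sample` string.

```python
import re

sample = "SAM (Cullingford, 1978), PAM (Wilensky, 1978)"

# Approach used in this section: collect every non-digit character, then join.
joined = ''.join(re.findall(r'\D', sample))

# Alternative: remove each digit directly.
stripped = re.sub(r'\d', '', sample)

print(joined == stripped)  # True
print(stripped)            # SAM (Cullingford, ), PAM (Wilensky, )
```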

#### 3. Find all punctuation marks in text

In [12]:

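# Lithuanian sample: "Hello. Great that you have gathered!?"
text = "Sveiki. Šaunu, kad susirinkote!?"

# The character class matches only the four marks listed: . , ! ?
punctuation_marks = re.findall(r'[.,!?]', text)

print(punctuation_marks)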

['.', ',', '!', '?']
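
The class `[.,!?]` covers only four marks, so characters such as `:`, `"`, or `(` go unnoticed. One way to widen it, sketched here as an assumption rather than as the script's own method, is to build the class from `string.punctuation` in the standard library. The constant name `ASCII_PUNCTUATION` and the `sample` string are illustrative.

```python
import re
import string

# string.punctuation holds the 32 ASCII punctuation characters.
# re.escape() makes characters such as ']', '\' and '-' safe inside [...].
ASCII_PUNCTUATION = re.compile(f'[{re.escape(string.punctuation)}]')

sample = 'Wait: is "real-world" data (really) messy?'
found = ASCII_PUNCTUATION.findall(sample)
print(found)  # [':', '"', '-', '"', '(', ')', '?']
```

This still covers ASCII only; typographic marks such as curly quotes would need to be added separately.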


#### 4. Remove all punctuation marks from text

In [13]:

text = "Sveiki. Šaunu, kad susirinkote!?"

# Replace each of the four punctuation marks with an empty string
without_punctuation = re.sub(r'[.,!?]', '', text)

print(without_punctuation)

Sveiki Šaunu kad susirinkote
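
An alternative to listing punctuation marks explicitly is to remove everything that is neither a word character nor whitespace. The sketch below assumes that behavior is acceptable for the task; the helper name `strip_punctuation` is hypothetical.

```python
import re

def strip_punctuation(text):
    """Remove every character that is neither a word character nor whitespace."""
    # For str patterns in Python 3, \w is Unicode-aware, so letters such as 'Š' are kept.
    # Note that '_' counts as a word character and is therefore kept as well.
    return re.sub(r'[^\w\s]', '', text)

print(strip_punctuation("Sveiki. Šaunu, kad susirinkote!?"))
# Sveiki Šaunu kad susirinkote
print(strip_punctuation('He said: "done" (finally); ok_then'))
# He said done finally ok_then
```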


#### 5. Remove all spaces from text

In [14]:

text = "Sveiki. Šaunu, kad susirinkote!?"

without_spaces = re.sub(r'\s', '', text)

print(without_spaces)

Sveiki.Šaunu,kadsusirinkote!?
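
Removing every whitespace character also glues the words together, as the output above shows. In text cleaning it is often more useful to collapse runs of whitespace into a single space instead. A minimal sketch of that variant follows; the helper name `normalize_whitespace` and the `messy` string are illustrative.

```python
import re

def normalize_whitespace(text):
    """Collapse each run of whitespace into one space and trim both ends."""
    # \s+ matches one or more spaces, tabs, or newlines in a row.
    return re.sub(r'\s+', ' ', text).strip()

messy = "  Sveiki.\tŠaunu,\n\nkad   susirinkote!?  "
print(normalize_whitespace(messy))  # Sveiki. Šaunu, kad susirinkote!?
```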


#### 6. Find all hyphenated words in text

In [15]:

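# Lithuanian sample: "The 1st and 3rd place winners have been announced"
text = "Paskelbti 1-osios ir 3-osios vietų laimėtojai"

# Match a word part, one hyphen, and another word part
hyphenated_words = re.findall(r'\w+-\w+', text)

print(hyphenated_words)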

['1-osios', '3-osios']
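
The pattern `\w+-\w+` allows exactly one hyphen per match, so a word with several hyphens is split into pieces. Repeating the "hyphen plus word part" group handles that case. The sketch below contrasts the two patterns on an illustrative English `sample`; the variable names are not from the original script.

```python
import re

sample = "A state-of-the-art parser for real-world text"

# Pattern from this section: exactly one hyphen per match.
one_hyphen = re.findall(r'\w+-\w+', sample)
print(one_hyphen)   # ['state-of', 'the-art', 'real-world']

# Variant: one or more "-part" groups. The non-capturing group (?:...)
# keeps findall returning whole matches instead of group contents.
any_hyphens = re.findall(r'\w+(?:-\w+)+', sample)
print(any_hyphens)  # ['state-of-the-art', 'real-world']
```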


#### 7. Extract all email addresses from text

In [16]:

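# Lithuanian sample: "Hello, send all general inquiries to info@mano.lt. If you have questions about solving the exercises, send them to uzdaviniai@mano.lt. For technical problems, contact pagalba@mano.lt."
text = "Sveiki, visas bendras užklausas siųskite adresu info@mano.lt. Jei turite klausimų, susijusių su uždavinių sprendimu, siųskite juos adresu uzdaviniai@mano.lt. Dėl techninių problemų kreipkitės adresu pagalba@mano.lt."

# \S+@\S+ would also capture the sentence-ending period ('info@mano.lt.'),
# so the domain is matched as dot-separated labels instead
addresses = re.findall(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+', text)

print(addresses)

['info@mano.lt', 'uzdaviniai@mano.lt', 'pagalba@mano.lt']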


#### 8. Extract all usernames from email addresses

In [17]:

text = "Sveiki, visas bendras užklausas siųskite adresu info@mano.lt. Jei turite klausimų, susijusių su uždavinių sprendimu, siųskite juos adresu uzdaviniai@mano.lt. Dėl techninių problemų kreipkitės adresu pagalba@mano.lt."

# The parentheses form a capture group, so findall returns only the part before '@'
usernames = re.findall(r'(\w+)@\w+\.\w+', text)

print(usernames)

['info', 'uzdaviniai', 'pagalba']
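
When both the username and the domain are needed, named groups make each part of the match addressable by name. The following is a sketch only: the pattern name `EMAIL`, the group names, and the `sample` string are assumptions for illustration, and the pattern is deliberately simple rather than a full address validator.

```python
import re

# Hypothetical helper pattern: a local part, '@', then dot-separated domain labels.
EMAIL = re.compile(r'(?P<user>[\w.+-]+)@(?P<domain>[\w-]+(?:\.[\w-]+)+)')

sample = "Write to info@mano.lt or pagalba@mano.lt."

# finditer yields match objects, so each named group can be read separately.
pairs = [(m.group('user'), m.group('domain')) for m in EMAIL.finditer(sample)]
print(pairs)  # [('info', 'mano.lt'), ('pagalba', 'mano.lt')]
```

Because each domain label must contain at least one word character or hyphen after its dot, the period that ends the sentence is not included in the match.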
