Regex Data Extraction

Project Overview

This is an individual assignment for the Formative One - Regex Onboarding Hackathon. The scenario is that of a junior full-stack developer working on a web application that aggregates data from various sources. This project uses regular expressions to extract specific data types from hundreds of pages of text. The repository contains a Python script that validates different data formats, including a range of edge cases.

Features Implemented

The assignment required the implementation of at least four of the following data extractions. This project includes a command-line interface for validating these data formats:

Email Addresses: Validates email formats like user@example.com and firstname.lastname@company.co.uk.
URLs: Validates URLs such as https://www.example.com and https://subdomain.example.org/page.
Phone Numbers: Validates various phone number formats, including (123) 456-7890, 123-456-7890, and 123.456.7890.
Credit Card Numbers: Validates credit card formats like 1234 5678 9012 3456 and 1234-5678-9012-3456.
Time: Validates time in both 24-hour (14:30) and 12-hour (2:30 PM) formats.
HTML Tags: Validates HTML tags like <p>, <div class="example">, and <img src="image.jpg" alt="description">.
Hashtags: Validates hashtags such as #example and #ThisIsAHashtag.
Currency Amounts: Validates currency amounts like $19.99 and $1,234.56.
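
The feature list above is about validating a single string, while the overview also mentions pulling data out of longer text. A minimal sketch of that extraction step follows. It assumes unanchored variants of the email and hashtag patterns (no `^` or `$`), and the names `EMAIL_FINDER`, `HASHTAG_FINDER`, and `extract` are hypothetical rather than taken from the repository.

```python
import re

# Assumption: unanchored variants of the README's email and hashtag patterns,
# so they can match anywhere inside a larger body of text.
EMAIL_FINDER = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')
HASHTAG_FINDER = re.compile(r'#[a-zA-Z0-9_]+')


def extract(text):
    """Return every email address and hashtag found in text (hypothetical helper)."""
    return {
        "emails": EMAIL_FINDER.findall(text),
        "hashtags": HASHTAG_FINDER.findall(text),
    }


sample = "Contact user@example.com or firstname.lastname@company.co.uk about #example."
print(extract(sample))
```

Dropping the anchors is what turns a validator into an extractor: `findall` then scans the whole text and returns each non-overlapping match.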

Requirements

Python 3.x: This project is implemented in Python.
re module: The built-in Python re module is used for regular expression operations.
GitHub Repository: The project is hosted on GitHub under the name alu_regex-data-extraction-Git-with-gideon, on an account created with an ALU email address.

How to Use the Script

Clone this repository: git clone https://github.com/Git-with-gideon/alu_regex-data-extraction-Git-with-gideon.git
Navigate to the project directory: cd alu_regex-data-extraction-Git-with-gideon
Run the script from your terminal: python main.py or python3 main.py
Follow the on-screen prompts to select a validation type and enter a string.
The script will tell you whether the input is valid or not.
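
The select-then-validate flow in the steps above can be sketched as a small dispatch table. This is an illustration only: `VALIDATORS` and `check` are hypothetical names, the real main.py may be organised differently, and only two menu options are wired up here.

```python
import re

# Hypothetical mapping from menu choice to (label, compiled pattern).
# The patterns themselves are the email and hashtag patterns from this README.
VALIDATORS = {
    "1": ("Email Addresses",
          re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')),
    "4": ("Hashtags", re.compile(r'^#([a-zA-Z0-9_]+)$')),
}


def check(choice, text):
    """Return the message a menu-driven validator might show for one input."""
    if choice not in VALIDATORS:
        return "Unknown choice"
    label, pattern = VALIDATORS[choice]
    # Strip surrounding whitespace so a trailing newline from input() is ignored.
    verdict = "valid" if pattern.match(text.strip()) else "invalid"
    return f"{label}: {text!r} is {verdict}"


print(check("1", "user@example.com"))
print(check("4", "not a hashtag"))
```

Keeping the choices in a dictionary means a new data type only needs one new entry, not another branch in the menu logic.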

And for test_cases.py

test_cases.py needs no typed input: it runs the validation functions against a predefined set of inputs.
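
A table-driven runner is one way such predefined inputs could be organised. The sketch below is not the contents of test_cases.py: the `CASES` list, the expected values, and the `run_cases` helper are assumptions made for illustration, using the currency and credit card patterns from this README.

```python
import re

CURRENCY = re.compile(r'^\$?\d{1,3}(?:,?\d{3})*(?:\.\d{2})?$')
CREDIT_CARD = re.compile(r'^\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}$')

# Hypothetical cases: (pattern, input string, expected validity).
CASES = [
    (CURRENCY, "$19.99", True),
    (CURRENCY, "$1,234.56", True),
    (CURRENCY, "$19.9", False),          # only one decimal digit
    (CREDIT_CARD, "1234-5678-9012-3456", True),
    (CREDIT_CARD, "1234 5678 9012", False),  # only twelve digits
]


def run_cases(cases):
    """Return (input, passed) pairs, where passed means actual == expected."""
    results = []
    for pattern, text, expected in cases:
        actual = pattern.match(text) is not None
        results.append((text, actual == expected))
    return results


for text, passed in run_cases(CASES):
    print(f"{'PASS' if passed else 'FAIL'}  {text}")
```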

Code and Repository Quality

Code Quality: The code is clean, readable, and well-documented to explain the logic behind the regex patterns and functions.
README: This file provides a detailed overview of the project and setup instructions.
Test Cases: The repository includes sample inputs and their corresponding outputs to demonstrate the functionality of the regex solutions.

Example Output for main.py

-------------------------------------
Welcome to the Regex Data Validator!
Select a data type to validate:
1. Email Addresses
2. URLs
3. Phone Numbers
4. Hashtags
5. Currency Amounts
6. Exit
-------------------------------------
Enter your choice (1-6):

Regular Expression Patterns

The script uses the following regular expression patterns for validation:

Email: r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
URL: r'^(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/[a-zA-Z0-9]+\.[^\s]{2,}|[a-zA-Z0-9]+\.[^\s]{2,})$'
Phone Number: r'^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$'
Credit Card: r'^\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}$'
Time: r'^(?:[01]\d|2[0-3]):[0-5]\d(?::[0-5]\d)?$|^(?:1[0-2]|0?[1-9]):[0-5]\d(?::[0-5]\d)?\s?(?:am|pm)$'
HTML Tag: r'^<([a-z]+)([^>]*)>(.*?)<\/\1>$|^<([a-z]+)([^>]*)\/>$'
Hashtag: r'^#([a-zA-Z0-9_]+)$'
Currency: r'^\$?\d{1,3}(?:,?\d{3})*(?:\.\d{2})?$'
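
A short sketch of how two of these patterns might be compiled and applied follows. Two points are assumptions and not confirmed behaviour of the script: the phone pattern is written with escaped parentheses (`\(?` and `\)?`) so that (123) 456-7890 matches, and the time pattern is compiled with `re.IGNORECASE` so that the uppercase "PM" from the feature list is accepted. The `is_valid` helper is hypothetical.

```python
import re

# Assumption: literal parentheses are escaped so "(123) 456-7890" is accepted.
PHONE = re.compile(r'^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$')

# Assumption: IGNORECASE is applied, since the pattern spells am/pm in lowercase.
TIME = re.compile(
    r'^(?:[01]\d|2[0-3]):[0-5]\d(?::[0-5]\d)?$'
    r'|^(?:1[0-2]|0?[1-9]):[0-5]\d(?::[0-5]\d)?\s?(?:am|pm)$',
    re.IGNORECASE,
)


def is_valid(pattern, text):
    """Return True when the whole string matches the anchored pattern."""
    return pattern.match(text) is not None


for candidate in ["(123) 456-7890", "123.456.7890", "123-45-6789"]:
    print(candidate, is_valid(PHONE, candidate))

for candidate in ["14:30", "2:30 PM", "25:00"]:
    print(candidate, is_valid(TIME, candidate))
```

Without the flag, "2:30 PM" would be rejected while "2:30 pm" would pass, so the case handling is worth checking against the actual script.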

Author

Erioluwa Gideon Olowoyo: Full-stack developer in training.
