
This is a project that demonstrates the use of Regular Expressions (Regex) in Python to extract specific data types from text strings. The goal is to develop a tool that can identify and extract various pieces of information, such as email addresses, URLs, and phone numbers, from a large body of text.


Git-with-gideon/alu_regex-data-extraction-Git-with-gideon


Regex Data Extraction

Project Overview

This is an individual assignment for the Formative One - Regex Onboarding Hackathon. The assignment's scenario casts me as a junior full-stack developer working on a web application that aggregates data from various sources, where regular expressions are used to extract specific data types from large volumes of text. This repository contains a Python script that validates those data formats and handles common variations and edge cases.


Features Implemented

The assignment required the implementation of at least four of the following data extractions. This project includes a command-line interface for validating these data formats:

  • Email Addresses: Validates email formats like user@example.com and firstname.lastname@company.co.uk.
  • URLs: Validates URLs such as https://www.example.com and https://subdomain.example.org/page.
  • Phone Numbers: Validates various phone number formats, including (123) 456-7890, 123-456-7890, and 123.456.7890.
  • Credit Card Numbers: Validates credit card formats like 1234 5678 9012 3456 and 1234-5678-9012-3456.
  • Time: Validates time in both 24-hour (14:30) and 12-hour (2:30 PM) formats.
  • HTML Tags: Validates HTML tags like <p>, <div class="example">, and <img src="image.jpg" alt="description">.
  • Hashtags: Validates hashtags such as #example and #ThisIsAHashtag.
  • Currency Amounts: Validates currency amounts like $19.99 and $1,234.56.
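The overview frames this project as extracting data from large bodies of text, whereas the patterns in the Regular Expression Patterns section are anchored with ^ and $ and therefore validate one whole string at a time. As a hedged sketch of the extraction side (not code from main.py), the snippet below removes those anchors from four of the listed patterns and collects every match in a block of text with re.findall. EXTRACTION_PATTERNS, extract_all and the sample text are hypothetical, and two further changes to the patterns are assumptions marked in the comments.

```python
import re

# Hypothetical extraction patterns: the README's validation patterns with the
# ^ and $ anchors removed so they can match inside longer text.
# Assumption 1: the hashtag capture group is dropped, because re.findall
# returns the captured group instead of the whole match when one exists.
# Assumption 2: the currency "$" is made mandatory; in the anchored pattern it
# is optional, which in free text would match any bare number.
EXTRACTION_PATTERNS = {
    "email": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "phone": re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"),
    "hashtag": re.compile(r"#[a-zA-Z0-9_]+"),
    "currency": re.compile(r"\$\d{1,3}(?:,?\d{3})*(?:\.\d{2})?"),
}


def extract_all(text: str) -> dict:
    """Return every match of each pattern, in the order it appears in text."""
    return {name: pattern.findall(text) for name, pattern in EXTRACTION_PATTERNS.items()}


if __name__ == "__main__":
    sample = (
        "Email user@example.com or firstname.lastname@company.co.uk. "
        "Call (123) 456-7890 or 123.456.7890. "
        "Tickets cost $19.99, or $1,234.56 for a group. #example #ThisIsAHashtag"
    )
    for name, matches in extract_all(sample).items():
        print(f"{name}: {matches}")
```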

Requirements

  • Python 3.x: This project is implemented in Python.
  • re module: The built-in Python re module is used for regular expression operations.
  • GitHub Repository: The project is hosted on GitHub as alu_regex-data-extraction-Git-with-gideon, under an account created with my ALU email address.

How to Use the Script

  1. Clone this repository: git clone https://github.com/Git-with-gideon/alu_regex-data-extraction-Git-with-gideon.git
  2. Navigate to the project directory: cd alu_regex-data-extraction-Git-with-gideon
  3. Run the script from your terminal: python main.py or python3 main.py
  4. Follow the on-screen prompts to select a validation type and enter a string.
  5. The script will tell you whether the input is valid or not.

Running test_cases.py

Run python test_cases.py (or python3 test_cases.py) to execute the validation functions against a predefined set of sample inputs.
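test_cases.py itself is not reproduced in this README. As a minimal sketch of the same idea, assuming a plain table-driven layout with no test framework, the snippet below pairs five of the patterns listed in this README with sample inputs and the verdict each should produce, then prints PASS or FAIL per case. TEST_CASES and run_tests are hypothetical names, and the rejected inputs are illustrative additions rather than cases from the repository.

```python
import re

# Hypothetical test table: name -> (pattern copied from this README, cases).
# Each case is (input string, whether the pattern should accept it).
TEST_CASES = {
    "email": (
        r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
        [("user@example.com", True), ("firstname.lastname@company.co.uk", True),
         ("user@example", False)],
    ),
    "phone": (
        r"^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$",
        [("(123) 456-7890", True), ("123-456-7890", True), ("123.456.7890", True),
         ("12-3456-7890", False)],
    ),
    "credit_card": (
        r"^\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}$",
        [("1234 5678 9012 3456", True), ("1234-5678-9012-3456", True),
         ("1234 5678 9012", False)],
    ),
    "hashtag": (
        r"^#([a-zA-Z0-9_]+)$",
        [("#example", True), ("#ThisIsAHashtag", True), ("#no spaces", False)],
    ),
    "currency": (
        r"^\$?\d{1,3}(?:,?\d{3})*(?:\.\d{2})?$",
        [("$19.99", True), ("$1,234.56", True), ("$19.9", False)],
    ),
}


def run_tests() -> int:
    """Print one PASS/FAIL line per case and return the number of failures."""
    failures = 0
    for name, (pattern, cases) in TEST_CASES.items():
        compiled = re.compile(pattern)
        for text, expected in cases:
            # fullmatch: the whole input must match, not just a prefix.
            actual = compiled.fullmatch(text) is not None
            if actual != expected:
                failures += 1
            status = "PASS" if actual == expected else "FAIL"
            print(f"{status} {name}: {text!r} -> {actual} (expected {expected})")
    return failures


if __name__ == "__main__":
    print(f"{run_tests()} failure(s)")
```

Keeping the inputs in one table means a new edge case is a one-line addition, and the failure count makes the result easy to check at a glance.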

Code and Repository Quality

  • Code Quality: The code is commented to explain the logic behind each regex pattern and validation function.
  • README: This file provides a detailed overview of the project and setup instructions.
  • Test Cases: The repository includes sample inputs and their corresponding outputs to demonstrate the functionality of the regex solutions.

Example Output for main.py

-------------------------------------
Welcome to the Regex Data Validator!
Select a data type to validate:
1. Email Addresses
2. URLs
3. Phone Numbers
4. Hashtags
5. Currency Amounts
6. Exit
-------------------------------------
Enter your choice (1-6):
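The menu above implies a simple loop: show the options, read a choice, read a string, report the verdict, and repeat until 6 is chosen. The following is a hypothetical reconstruction rather than the code in main.py. It maps choices 1 to 5 to the matching patterns from this README and checks the whole input with re.fullmatch. The run_menu function and its read/write parameters are assumptions, added so the loop can be driven by a list of inputs instead of a terminal.

```python
import re

# Menu number -> (label, pattern copied from this README's pattern list).
MENU = {
    "1": ("Email Addresses", r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"),
    "2": ("URLs", r"^(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/[a-zA-Z0-9]+\.[^\s]{2,}|[a-zA-Z0-9]+\.[^\s]{2,})$"),
    "3": ("Phone Numbers", r"^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$"),
    "4": ("Hashtags", r"^#([a-zA-Z0-9_]+)$"),
    "5": ("Currency Amounts", r"^\$?\d{1,3}(?:,?\d{3})*(?:\.\d{2})?$"),
}


def run_menu(read=input, write=print) -> None:
    """Show the menu until the user picks 6 or input runs out (hypothetical)."""
    while True:
        write("Select a data type to validate:")
        for key, (label, _) in MENU.items():
            write(f"{key}. {label}")
        write("6. Exit")
        try:
            choice = read("Enter your choice (1-6): ").strip()
            if choice == "6":
                return
            if choice not in MENU:
                write("Invalid choice, please enter a number from 1 to 6.")
                continue
            label, pattern = MENU[choice]
            text = read(f"Enter a value to check as {label}: ").strip()
        except EOFError:  # stdin closed, e.g. piped input ran out
            return
        verdict = "valid" if re.fullmatch(pattern, text) else "NOT valid"
        write(f"{text!r} is {verdict} for {label}.")


if __name__ == "__main__":
    print("Welcome to the Regex Data Validator!")
    run_menu()
```

For example, with answers = iter(["1", "user@example.com", "6"]), calling run_menu(read=lambda prompt: next(answers)) replays one email check without typing.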


Regular Expression Patterns

The script uses the following regular expression patterns for validation:

  • Email: r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
  • URL: r'^(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/[a-zA-Z0-9]+\.[^\s]{2,}|[a-zA-Z0-9]+\.[^\s]{2,})$'
  • Phone Number: r'^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$'
  • Credit Card: r'^\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}$'
  • Time: r'^(?:[01]\d|2[0-3]):[0-5]\d(?::[0-5]\d)?$|^(?:1[0-2]|0?[1-9]):[0-5]\d(?::[0-5]\d)?\s?(?:am|pm)$'
  • HTML Tag: r'^<([a-z]+)([^>]*)>(.*?)<\/\1>$|^<([a-z]+)([^>]*)\/>$'
  • Hashtag: r'^#([a-zA-Z0-9_]+)$'
  • Currency: r'^\$?\d{1,3}(?:,?\d{3})*(?:\.\d{2})?$'
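A hedged sketch of how these patterns can be applied, assuming each one is checked against the whole input with re.fullmatch (main.py may use re.match or re.search instead). Running the patterns exactly as listed also surfaces two points worth checking against the actual script: the time pattern only accepts lowercase am/pm unless the re.IGNORECASE flag is passed, so the "2:30 PM" example from the feature list needs that flag; and the HTML pattern only accepts a paired tag such as <p>...</p> or a self-closing tag ending in />, so a lone opening tag like <p> does not match. PATTERNS and validate are illustrative names.

```python
import re

# Three patterns copied verbatim from the list above (hypothetical container).
PATTERNS = {
    "credit_card": r"^\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}$",
    "time": r"^(?:[01]\d|2[0-3]):[0-5]\d(?::[0-5]\d)?$|^(?:1[0-2]|0?[1-9]):[0-5]\d(?::[0-5]\d)?\s?(?:am|pm)$",
    "html_tag": r"^<([a-z]+)([^>]*)>(.*?)<\/\1>$|^<([a-z]+)([^>]*)\/>$",
}


def validate(kind: str, text: str, flags: int = 0) -> bool:
    """Return True if the entire string matches the named pattern."""
    # re.fullmatch rejects trailing characters, including a trailing "\n"
    # that a bare "$" would otherwise allow when used with re.match.
    return re.fullmatch(PATTERNS[kind], text, flags) is not None


if __name__ == "__main__":
    checks = [
        ("credit_card", "1234-5678-9012-3456", 0),
        ("time", "14:30", 0),
        ("time", "2:30 PM", 0),              # uppercase PM, no flag
        ("time", "2:30 PM", re.IGNORECASE),  # same input, case-insensitive
        ("html_tag", "<p>Hello</p>", 0),
        ("html_tag", "<p>", 0),              # lone opening tag
        ("html_tag", '<img src="image.jpg" alt="description" />', 0),
    ]
    for kind, text, flags in checks:
        print(f"{kind:12} {text!r}: {validate(kind, text, flags)}")
```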

Author

  • Erioluwa Gideon Olowoyo: Full-stack developer in training.
