The script uses the requests library, BeautifulSoup, and regular expressions to scrape a telephone directory webpage, extract phone numbers, and print each number with its corresponding name and address.

In [2]:
import re
import requests
from bs4 import BeautifulSoup

The function below downloads the page, reads the name, contact number, and address from each row of the telephone directory table, and prints one entry for every phone number that the regular expression finds in the contact number field.

Python's re module supports regular expressions: searching for patterns within strings, performing substitutions, and more. In this script, re.compile() builds a pattern that matches ten-digit phone numbers written as "123-456-7890", "123 456 7890", "123.456.7890", or "1234567890". Because both separators are optional, the pattern also matches numbers grouped as a three-digit code plus seven digits, such as "051-9108312", which is the format that appears in the output. The pattern is applied to the contact number field of each row to extract the phone numbers.
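As a quick check of what the pattern accepts, the sketch below runs it against a few strings. The sample strings are made up for illustration and are not taken from the scraped page; only the pattern itself comes from the script.

```python
import re

# Same pattern as in the script
phone_number_pattern = re.compile(r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b')

# Hypothetical sample strings (assumptions, not data from the directory)
samples = [
    "123-456-7890",   # dashes
    "123 456 7890",   # spaces
    "123.456.7890",   # dots
    "1234567890",     # no separators
    "051-9108312",    # three digits, separator, seven digits
    "12-345-6789",    # only nine digits, should not match
]

for text in samples:
    # findall returns every non-overlapping match as a list of strings
    print(repr(text), "->", phone_number_pattern.findall(text))
```

The last sample yields an empty list because the pattern requires exactly ten digits in a 3-3-4 layout, with word boundaries on both sides.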

In [3]:
def scrape_telephone_directory(url):
    # Send a GET request to the URL; the timeout keeps the call from hanging indefinitely
    response = requests.get(url, timeout=10)
    
    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        # Parse the HTML content
        soup = BeautifulSoup(response.content, 'html.parser')
        
        # Find all <tr> tags on the page (the rows of every table it contains)
        rows = soup.find_all('tr')
        
        # Define a regular expression pattern for phone numbers
        phone_number_pattern = re.compile(r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b')
        
        # Loop through each row and extract data
        for row in rows:
            # Find all <td> tags within the row
            columns = row.find_all('td')
            # Check if there are exactly three columns (name, contact number, address)
            if len(columns) == 3:
                # Extract name, contact number, and address
                name = columns[0].text.strip()
                contact_number = columns[1].text.strip()
                address = columns[2].text.strip()
                # Find all phone numbers in the contact number field
                phone_numbers = phone_number_pattern.findall(contact_number)
                # Print one block per phone number; rows with no match print nothing
                for phone_number in phone_numbers:
                    print("Name:", name)
                    print("Contact Number:", phone_number)
                    print("Address:", address)
                    print("------------")
    else:
        print(f"Failed to retrieve the webpage (status code {response.status_code}).")


This block ensures that scrape_telephone_directory runs only when the script is executed directly, not when it is imported as a module. It sets the URL of the webpage to scrape and calls the function with that URL.

In [4]:
if __name__ == "__main__":
    url = "https://ictadministration.gov.pk/telephone-directory/"
    scrape_telephone_directory(url)


Name: Chief Commissioner Office
Contact Number: 051-9108312
Address: ICT Administration Complex, Mauve Area, G-11/4, Islamabad
------------
Name: Islamabad Police
Contact Number: 051-9258371
Address: Police Lines Headquarters, H-11, Islamabad
------------
Name: Deputy Commissioner Office
Contact Number: 051-9108108
Address: ICT Administration Complex, Mauve Area, G-11/4, Islamabad
------------
Name: Deputy Commissioner Office
Contact Number: 051-9108109
Address: ICT Administration Complex, Mauve Area, G-11/4, Islamabad
------------
Name: Revenue Department
Contact Number: 051-9262372
Address: Plot No. 65, Near Honda Workshop, I-10/3, Islamabad
------------
Name: Labour Department
Contact Number: 051-9108397
Address: ICT Agriculture Complex, Mauve Area, G-11/4, Islamabad
------------
Name: Cooperative Societies Department
Contact Number: 051-9261239
Address: F-8 Markaz, Islamabad
------------
Name: Civil Defense Department
Contact Number: 051-9160778
Address: F-8 Markaz, Islamabad
------------

In summary, scrape_telephone_directory retrieves the HTML of the page at the given URL with the requests library and parses it with BeautifulSoup, collecting every <tr> tag as a table row. For each row with exactly three <td> cells, it reads the name, contact number, and address, then applies the regular expression to the contact number field. It prints one block of name, phone number, and address per match, so an entry whose contact field lists two numbers is printed twice (which may be why Deputy Commissioner Office appears twice in the output above), and a row whose contact field contains no matching number prints nothing. The if __name__ == "__main__": block runs the function only when the script is executed directly.
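The row-parsing logic can also be exercised without a network request. The sketch below is a variant, not the script's own function: extract_entries is a hypothetical helper that returns the matches as tuples instead of printing them, and sample_html is made-up markup that only assumes the same three-column layout.

```python
import re
from bs4 import BeautifulSoup

PHONE_PATTERN = re.compile(r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b')

def extract_entries(html):
    """Return (name, phone_number, address) tuples from three-column rows.

    Hypothetical helper: same row logic as the script, but it returns data
    instead of printing it.
    """
    soup = BeautifulSoup(html, 'html.parser')
    entries = []
    for row in soup.find_all('tr'):
        columns = row.find_all('td')
        if len(columns) != 3:
            continue  # header rows (<th> cells) and other layouts are skipped
        name = columns[0].text.strip()
        contact_number = columns[1].text.strip()
        address = columns[2].text.strip()
        # One tuple per phone number found in the contact field
        for phone_number in PHONE_PATTERN.findall(contact_number):
            entries.append((name, phone_number, address))
    return entries

# Made-up sample markup (an assumption, not the real directory page)
sample_html = """
<table>
  <tr><th>Name</th><th>Contact</th><th>Address</th></tr>
  <tr><td>Office A</td><td>051-1234567, 051-7654321</td><td>Sector X</td></tr>
  <tr><td>Office B</td><td>N/A</td><td>Sector Y</td></tr>
</table>
"""

for entry in extract_entries(sample_html):
    print(entry)
```

Office A produces two tuples because its contact field holds two numbers, while Office B produces none. Returning a list rather than printing makes it easier to save the results or test the parsing separately from the download step.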