### Problem Statement

Aviation regulators such as the Federal Aviation Administration (FAA) in the United States, Transport Canada (TC), and the European Union Aviation Safety Agency (EASA) frequently update their policies, regulations, and safety guidelines. These changes underpin safety, efficiency, and compliance across the aviation industry. Aviation training organizations and educational institutions must therefore monitor these updates continually and incorporate them into their curricula, so that professionals learn current and relevant information.



### Features of the Citation Tracking and Validation Script

1. **Comprehensive Change Tracking:** The script monitors cited sources from the FAA and other aviation authorities for changes to their text and to their images, including visual charts and statistical information. Covering every form of content helps ensure that an update affecting aviation training and education is not missed because it appeared outside the body text.

2. **Link Tracking:** The script tracks the URLs embedded in each source so that external references stay current and accessible. This protects the integrity of each citation and the reliability of the content provided to clients.

3. **User-Friendly Citation Input:** Users add citation links to the database to specify which sources should be monitored for updates or changes. The input process is designed to be intuitive, so that a broad range of users can manage their own citation tracking.

4. **Automated Weekly Updates:** The script checks for changes automatically once a week. This cadence reports updates promptly without inundating users with information or requiring daily monitoring.

5. **Reliable Information Delivery:** By reporting changes from authoritative sources as they are detected, the script gives clients access to current and reliable information. This matters for organizations that depend on the latest regulatory updates to maintain compliance and instructional relevance.
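
The change-tracking and link-tracking features above can be sketched as a comparison between two snapshots of the same page. This is a minimal illustration, not the notebook's actual implementation: `fingerprint` and `diff_snapshots` are hypothetical helper names, the snapshot shape mirrors the dictionary returned by `extract_content_info` later in this notebook, and the sample snapshots are made up. Note that comparing image URLs only detects added or removed images; detecting an image that changed under the same URL would require downloading and hashing the image itself.

```python
import hashlib


def fingerprint(text):
    """Return a SHA-256 hex digest of the text with whitespace runs collapsed."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def diff_snapshots(old, new):
    """Compare two snapshots shaped like {'text_content', 'links', 'images'}.

    Returns a dict describing what changed; an empty dict means no change.
    """
    changes = {}
    if fingerprint(old["text_content"]) != fingerprint(new["text_content"]):
        changes["text_changed"] = True
    for field in ("links", "images"):
        before, after = set(old[field]), set(new[field])
        added, removed = sorted(after - before), sorted(before - after)
        if added or removed:
            changes[field] = {"added": added, "removed": removed}
    return changes


# Made-up snapshots for illustration only (not real page content)
old_snapshot = {
    "text_content": "Chart Users' Guide\n\n  effective 2023",
    "links": ["/a", "/b"],
    "images": ["/img/chart1.png"],
}
new_snapshot = {
    "text_content": "Chart Users' Guide effective 2024",
    "links": ["/a", "/c"],
    "images": ["/img/chart1.png"],
}
print(diff_snapshots(old_snapshot, new_snapshot))
```

Collapsing whitespace before hashing is a design choice: it keeps a purely cosmetic layout change on the source page from being reported as a content change.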

### Importing Libraries

In [1]:
!pip install requests
!pip install bs4

Collecting bs4
  Downloading bs4-0.0.1.tar.gz (1.1 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... done
  Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1256 sha256=3feef3f6b2938ac4435b9210bdab299f1dd9cf7dcc7382879e4fa0bfcf866dc4
  Stored in directory: /Users/akshaykhandelwal/Library/Caches/pip/wheels/d4/c8/5b/b5be9c20e5e4503d04a6eac8a3cd5c2393505c29f02bea0960
Successfully built bs4
Installing collected packages: bs4
Successfully installed bs4-0.0.1


In [2]:
import requests
from bs4 import BeautifulSoup
import hashlib
import os
import time
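
The cells shown here do not yet use `hashlib`, `os`, or `time`. One plausible way to combine them for the weekly check, sketched under assumptions, is to store a content hash and a timestamp per citation and re-check only when a week has elapsed. The file name `citation_state.json` and the helpers `load_state`, `save_state`, `is_check_due`, and `record_check` are hypothetical and are not taken from the notebook.

```python
import hashlib
import json
import os
import time

STATE_FILE = "citation_state.json"  # hypothetical file name
WEEK_SECONDS = 7 * 24 * 60 * 60


def load_state(path=STATE_FILE):
    """Load saved state, or return an empty dict if no state file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def save_state(state, path=STATE_FILE):
    """Write the state dict to disk as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)


def is_check_due(entry, now=None, interval=WEEK_SECONDS):
    """True if the entry was never checked or the interval has elapsed."""
    now = time.time() if now is None else now
    last = entry.get("last_checked")
    return last is None or now - last >= interval


def record_check(state, key, text, now=None):
    """Store the new hash and timestamp for key; return True if the hash changed.

    The first time a key is seen there is nothing to compare against,
    so the result is False.
    """
    now = time.time() if now is None else now
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    previous = state.get(key, {}).get("hash")
    state[key] = {"hash": digest, "last_checked": now}
    return previous is not None and previous != digest
```

Passing `now` explicitly makes the timing logic testable without waiting; in a real run both functions fall back to `time.time()`.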

### Testing

In [9]:
import requests
from bs4 import BeautifulSoup

# Function to extract all content information from a web URL
def extract_content_info(url):
    try:
        # Send a GET request to the URL (time out rather than hang on a slow server)
        response = requests.get(url, timeout=10)

        # Check if the request was successful (status code 200)
        if response.status_code == 200:
            # Parse the HTML content using Beautiful Soup
            soup = BeautifulSoup(response.content, 'html.parser')

            # Extract text content
            text_content = soup.get_text()

            # Extract all links (href attributes)
            links = [link.get('href') for link in soup.find_all('a') if link.get('href')]

            # Extract image URLs (src attributes)
            images = [img.get('src') for img in soup.find_all('img') if img.get('src')]

            # Return the extracted content information
            return {
                'text_content': text_content,
                'links': links,
                'images': images
                # Add more content types as needed
            }
        else:
            print(f"Failed to fetch URL: {url}. Status code: {response.status_code}")
            return None

    except requests.RequestException as e:
        print(f"An error occurred: {e}")
        return None

# URL to extract content information from
target_url = 'https://tc.canada.ca/en/programs/airport-critical-infrastructure-program' # A sample Transport Canada Link

# Extract content information
content_info = extract_content_info(target_url)

if content_info:
    # Print extracted content information
    print("Text Content:")
    print(content_info['text_content'])
    print("\nLinks:")
    print(content_info['links'])
    print("\nImages:")
    print(content_info['images'])
    # Print other types of content extracted as needed


Text Content:
Airport Critical Infrastructure Program
Skip to main content
Skip to "About this site"
Language selection
WxT Language switcher
Françaisfr
/
Gouvernement du Canada
Search this site
Customize your search
canada.ca.ca
Search
Menu
Main Menu
Home
Jobs and the workplace
Immigration and citizenship
Travel and tourism
Business and industry
Health
Benefits
Health – More
Taxes
Environment and natural resources
National security and defence
Culture, history and sport
Policing, justice and emergencies
Transport and infrastructure
Canada and the world
Money and finances
Science and innovation
You are here
Canada.ca
Transport Canada
Programs
Airport Critical Infrastructure Program
From: Transport Canada
The Airport C
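
`soup.get_text()` returns the page text with the layout whitespace of the source HTML intact, and `href` values are often relative paths rather than full URLs. Both make later comparison harder. The sketch below shows one way to clean the text and resolve links before storing them. `clean_text` and `absolutize` are hypothetical helper names, and the sample `href` values in the usage lines are invented for illustration.

```python
from urllib.parse import urljoin


def clean_text(raw_text):
    """Strip each line and drop empty ones so layout whitespace does not matter."""
    lines = (line.strip() for line in raw_text.splitlines())
    return "\n".join(line for line in lines if line)


def absolutize(base_url, hrefs):
    """Resolve hrefs against the page URL, skipping in-page anchors and non-HTTP schemes."""
    resolved = []
    for href in hrefs:
        if href.startswith(("#", "mailto:", "javascript:")):
            continue
        resolved.append(urljoin(base_url, href))
    return resolved


# Usage with invented sample values
page_url = "https://tc.canada.ca/en/programs/airport-critical-infrastructure-program"
sample_text = "\n\n\nAirport Critical Infrastructure Program\n\n\n   Skip to main content\n   \n"
sample_hrefs = ["#wb-cont", "/en/aviation", "https://www.canada.ca/en.html"]

print(clean_text(sample_text))
print(absolutize(page_url, sample_hrefs))
```

Resolved, absolute URLs can be requested directly, which is what a link-availability check would need.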

### Refactoring For Multiple Links

In [11]:
import requests
from bs4 import BeautifulSoup

# Define the URLs in a dictionary with descriptive keys
urls = {
    'SMS L101': "https://www.faa.gov/air_traffic/flight_info/aeronav/digital_products/aero_guide/",
    'SMS L201': "https://tc.canada.ca/en/programs/airport-critical-infrastructure-program",
}

def extract_content_info(url):
    """Fetch and extract content information from a given URL."""
    try:
        response = requests.get(url, timeout=10)  # Time out instead of hanging on a slow server
        if response.status_code == 200:  # Check that the link is still active
            soup = BeautifulSoup(response.content, 'html.parser')
            text_content = soup.get_text()  # Extract text
            links = [link.get('href') for link in soup.find_all('a') if link.get('href')]  # Extract links
            images = [img.get('src') for img in soup.find_all('img') if img.get('src')]  # Extract image URLs (visual charts and other images)
            return {
                'text_content': text_content,
                'links': links,
                'images': images
            }
        else:
            print(f"Failed to fetch URL: {url}. Status code: {response.status_code}.")  # Flag the broken link
            return None
    except requests.RequestException as e:
        print(f"An error occurred: {e}")
        return None

def extract_multiple_urls(urls_dict):
    """Extract content from multiple URLs and organize it by key."""
    results = {}
    for key, url in urls_dict.items():
        content_info = extract_content_info(url)
        if content_info:
            results[key] = content_info
        else:
            results[key] = "Failed to fetch or process URL."
    return results

# Extract content information for each URL in the dictionary
content_infos = extract_multiple_urls(urls)

# Process and output the extracted content
for key, content in content_infos.items():
    if isinstance(content, dict):
        print(f"Results for {key}:")
        print("Text Content:")
        print(content['text_content'][:1000], "...")  # Display a snippet of text content for brevity
        print("Links:")
        for link in content['links']:
            print(link)
        print("Images:")
        for image in content['images']:
            print(image)
        print("\n")
    else:
        print(f"{key}: {content}")


Results for SMS L101:
Text Content:




                          
Aeronautical Chart Users' Guide










Skip to page content





Please enable JavaScript to use this site.   Skip to main content      USA Banner           An official website of the United States government Here's how you know            Official websites use .govA .gov website belongs to an official government organization in the United States.        Secure .gov websites use HTTPS A lock ( LockA locked padlock ) or https:// means you’ve safely connected to the .gov website. Share sensitive information only on official, secure websites.                  United States Department of Transportation United States Department of Transportation                           Secondary navigation   About   Jobs   News               Enter Search Term(s):              Ultimenu: Main navigation Aircraft   Subnav: Aircraft 1   Aircraft Certification   Aviation Safety Draft Documents Open for Comment   Vintage & Experimental Airc