**Broken Link Validator**

**Description:** Checks the validity of URLs in Excel files and classifies them as Working, Redirected, Blocked, or Not working.

In [21]:
#!pip install requests

In [22]:
import os  
import pandas as pd  
import requests
from concurrent.futures import ThreadPoolExecutor

In [23]:
os.chdir(r"C:/Users/mmaha/projects/broken links")

In [24]:
#os.getcwd()
print(os.listdir())

['.gitignore.txt', '.ipynb_checkpoints', 'data', 'example.png', 'notebook', 'README.md.txt', 'requirements.txt.txt', 'url_status_2024.xlsx']


In [25]:
df = pd.read_excel("data/check url status.xls")

In [26]:
df.tail()

Unnamed: 0,URL,Title
774,https://www.indianretailer.com/news/retail-ind...,Retail India News: Vedix and Shoppers Stop Lau...
775,https://eu.fdlreporter.com/story/money/2024/05...,Remember when bowling was a favorite pastime f...
776,https://chiefswire.usatoday.com/2024/08/23/nc-...,NC Sportsbook Promos: Best North Carolina Spor...
777,https://www.energyjobline.com/job/director-sus...,"Director, Sustainability Reporting APAC"
778,https://www.cbs17.com/business/press-releases/...,Military's Leading Foodservice Units Honored f...


In [27]:
# Function to check URL status
def check_url(URL):
    try:
        # Redirects are followed, so status_code belongs to the final response;
        # any intermediate 301/302 responses are stored in response.history.
        response = requests.get(URL, allow_redirects=True, timeout=10)
        if response.status_code == 200:
            if response.history:
                return "Redirected but Working"
            return "Working"
        elif response.status_code == 403:
            return "Blocked (403 Forbidden)"
        else:
            return f"Not Working ({response.status_code})"
    except requests.RequestException as e:
        # Covers connection errors, timeouts, invalid URLs, and similar failures
        return f"Not Working ({str(e)})"

# Apply ThreadPoolExecutor for faster execution
with ThreadPoolExecutor(max_workers=10) as executor:
    df['Status'] = list(executor.map(check_url, df['URL']))

# Export to Excel
df.to_excel("data/url_status.xlsx", index=False)

# Print the DataFrame
df

Unnamed: 0,URL,Title,Status
0,https://simpleflying.com/how-to-spend-skymiles...,5 Ways To Spend Delta SkyMiles Without Flying,Working
1,https://sports.yahoo.com/portugal-route-final-...,Portugal Route To The Final: Winning start put...,Working
2,https://indiaeducationdiary.in/anandana-the-co...,Anandana- The Coca-Cola India Foundation and I...,Blocked (403 Forbidden)
3,https://www.theringer.com/movies/2024/5/30/241...,The Postapocalyptic Movie Survivability Index,Working
4,https://www.iracing.com/enascar-coca-cola-irac...,eNASCAR Coca-Cola iRacing Series Returns to NA...,Working
...,...,...,...
774,https://www.indianretailer.com/news/retail-ind...,Retail India News: Vedix and Shoppers Stop Lau...,Blocked (403 Forbidden)
775,https://eu.fdlreporter.com/story/money/2024/05...,Remember when bowling was a favorite pastime f...,Working
776,https://chiefswire.usatoday.com/2024/08/23/nc-...,NC Sportsbook Promos: Best North Carolina Spor...,Not Working (404)
777,https://www.energyjobline.com/job/director-sus...,"Director, Sustainability Reporting APAC",Working


**Status code descriptions**

200 OK → The request succeeded; the script treats the URL as "Working".

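301 Moved Permanently → The URL has changed permanently, and future requests should use the new URL.

302 Found (Temporary Redirect) → The resource has temporarily moved to another location.

Because the script calls requests.get with allow_redirects=True, these redirects are followed automatically and response.status_code reports the status of the final response, not the 301 or 302 itself.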
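
When `requests` follows redirects, the intermediate 301/302 responses are kept in `response.history`, so a redirected URL can still be told apart from a direct hit. The sketch below shows one way to inspect that chain. The helper names `redirect_chain` and `was_redirected` are hypothetical (not part of the notebook or of `requests`), and the functions only assume an object with `history`, `status_code`, and `url` attributes, as a `requests.Response` has.

```python
def redirect_chain(response):
    """Return [(status_code, url), ...] for each redirect hop, then the final response.

    `response` is expected to behave like a requests.Response: `history` lists the
    intermediate responses (oldest first); `status_code` and `url` describe the final one.
    """
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def was_redirected(response):
    """True if at least one redirect was followed before the final response."""
    return len(response.history) > 0


# Example use inside the notebook (needs network access):
# response = requests.get(url, allow_redirects=True, timeout=10)
# print(redirect_chain(response))
```

A non-empty chain ending in 200 corresponds to the "Redirected but Working" label.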

403 Forbidden → The server refused the request; the script labels it "Blocked (403 Forbidden)". Possible causes:
- Website blocks automated requests (bot protection).
- Missing authentication or permission.
- IP restrictions.
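
For the bot-protection case, one thing that may be worth trying is a single retry with browser-like headers, since some sites reject the default `python-requests` User-Agent. This is only a sketch under assumptions: `BROWSER_HEADERS`, its example User-Agent value, and the helper `check_with_retry` are hypothetical, and many sites will still return 403 (authentication and IP restrictions are not affected by headers). The `get` argument is any callable with the same keyword arguments as `requests.get`.

```python
# Hypothetical header set; the User-Agent string is only an example value.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def check_with_retry(url, get, timeout=10):
    """Call `get` once; if the answer is 403, retry once with browser-like headers."""
    response = get(url, timeout=timeout)
    if response.status_code == 403:
        response = get(url, headers=BROWSER_HEADERS, timeout=timeout)
    return response


# Example use inside the notebook (needs network access):
# response = check_with_retry(url, requests.get)
```

If the retry also returns 403, the URL may still open fine in a browser, so "Blocked" is best read as "could not be verified automatically" rather than "broken".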

404 Not Found → The requested URL does not exist.

405 Method Not Allowed → The request method is not supported by the server.

500 Internal Server Error → The server encountered an error.

503 Service Unavailable → The server is down or overloaded.
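
The descriptions above could also be attached to the status labels automatically. The sketch below uses the standard library's `http.HTTPStatus` to look up the reason phrase for a code; the helper names `describe_status` and `not_working_label` are hypothetical and not part of the notebook.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the code with its standard reason phrase, e.g. 404 plus "Not Found"."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not know
        return f"{code} (unrecognized status code)"


def not_working_label(code):
    """Build a status label in the same style as the notebook's "Not Working (...)"."""
    return f"Not Working ({describe_status(code)})"


for code in (404, 405, 500, 503):
    print(not_working_label(code))
```

Inside `check_url`, the final `else` branch could return `not_working_label(response.status_code)` so the exported spreadsheet carries the reason phrase alongside the number.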
