# Static Data Scraper

This notebook extracts static data (HTML tables/text) from the [Sri Lanka Treasury website](https://www.treasury.gov.lk/) using:
- `requests` to fetch HTML
- `BeautifulSoup` to parse and navigate
- `pandas` to read HTML tables


In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

print("✅ Libraries imported successfully.")


✅ Libraries imported successfully.


Fetch homepage HTML

In [2]:
url = "https://www.treasury.gov.lk/"
headers = {"User-Agent": "Group5Scraper/1.0 (+https://github.com/YourGitHubRepoLink)"}

# A timeout keeps the cell from hanging if the server never responds
response = requests.get(url, headers=headers, timeout=30)

if response.status_code == 200:
    html_content = response.text
    print("✅ Page fetched successfully.")
else:
    print("❌ Failed to fetch page. Status code:", response.status_code)


✅ Page fetched successfully.
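The fetch step above can also be wrapped in a small reusable function. The following is a minimal sketch, not part of the original notebook: the name `fetch_html` and its optional `session` parameter are hypothetical, and the 30-second timeout is an assumed default. It relies on `raise_for_status()` so that a failed request raises an exception instead of leaving `html_content` undefined.

```python
import requests


def fetch_html(url, headers=None, timeout=30, session=None):
    """Return the page body as text, raising on HTTP errors or timeouts."""
    # `session` is optional (hypothetical parameter); any object with a
    # requests-style .get() method works, which also makes offline testing easy.
    http = session or requests
    response = http.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text
```

In the notebook this could be called as `html_content = fetch_html(url, headers=headers)`, with a `try`/`except requests.RequestException` around it if the cell should report failures rather than stop.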


Extract HTML tables with pandas

In [3]:
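from io import StringIO

try:
    # Parse the HTML fetched earlier instead of passing the URL, which would
    # trigger a second request that does not send the custom headers
    tables = pd.read_html(StringIO(html_content))
    print(f"✅ Found {len(tables)} tables.")
    if tables:
        display(tables[0])  # Show first table
except Exception as e:
    print("⚠️ No HTML tables found or error reading tables:", e)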


✅ Found 4 tables.


Unnamed: 0,Currency,Buying,Selling
0,USD,297.35LKR,304.87LKR
1,GBP,401.53LKR,414.18LKR
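The `Buying` and `Selling` values shown above are strings with an `LKR` suffix, so they cannot be used in arithmetic as they are. Here is a minimal sketch of one way to convert them, assuming every value follows the `<number>LKR` pattern. The helper name `to_number` is hypothetical, and the sample rows are copied from the output above with the `Unnamed: 0` column left out.

```python
import pandas as pd

# Sample rows copied from the table output above ("Unnamed: 0" omitted)
rates = pd.DataFrame({
    "Currency": ["USD", "GBP"],
    "Buying": ["297.35LKR", "401.53LKR"],
    "Selling": ["304.87LKR", "414.18LKR"],
})


def to_number(column):
    """Strip the 'LKR' suffix from a string column and convert it to float."""
    cleaned = column.str.replace("LKR", "", regex=False).str.strip()
    # errors="coerce" turns anything that still is not a number into NaN
    return pd.to_numeric(cleaned, errors="coerce")


for name in ["Buying", "Selling"]:
    rates[name] = to_number(rates[name])

print(rates.dtypes)
```

In the notebook, the same loop could be applied to `tables[0]` instead of the sample frame, provided its column names match.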


Save HTML to file

In [5]:
import os

os.makedirs("data_raw", exist_ok=True)  # open() fails if the folder is missing
with open("data_raw/treasury_homepage.html", "w", encoding="utf-8") as f:
    f.write(html_content)

print("💾 HTML saved to data_raw/treasury_homepage.html")


💾 HTML saved to data_raw/treasury_homepage.html


Extract links and save to CSV

In [8]:
from bs4 import BeautifulSoup
import pandas as pd

# Open saved HTML file
with open("data_raw/treasury_homepage.html", "r", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

# Collect the text and href of every <a> tag (adjust the selector for other elements)
data = []
for item in soup.find_all("a"):
    data.append({
        "text": item.get_text(strip=True),
        "link": item.get("href")
    })

# Convert to DataFrame
df = pd.DataFrame(data)

# Save raw CSV
output_path = "data_raw/treasury_data_raw.csv"
df.to_csv(output_path, index=False, encoding="utf-8")
print(f"✅ Raw data saved to: {output_path}")


✅ Raw data saved to: data_raw/treasury_data_raw.csv
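The raw link list will typically contain relative paths, anchors without an `href`, and repeated URLs. The sketch below shows one possible clean-up step using only the standard library. The function name `normalize_links` and the sample rows (including the path `web/reports`) are hypothetical and only mirror the shape of `data` above.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.treasury.gov.lk/"


def normalize_links(rows, base_url=BASE_URL):
    """Resolve relative hrefs against base_url and drop unusable or duplicate links."""
    seen = set()
    cleaned = []
    for row in rows:
        href = row.get("link")
        # Skip anchors with no href, in-page fragments, and javascript: pseudo-links
        if not href or href.startswith(("#", "javascript:")):
            continue
        absolute = urljoin(base_url, href)
        if absolute in seen:
            continue  # keep only the first occurrence of each URL
        seen.add(absolute)
        cleaned.append({"text": row.get("text", ""), "link": absolute})
    return cleaned


# Hypothetical rows in the same shape as `data` above
sample = [
    {"text": "Home", "link": "/"},
    {"text": "Reports", "link": "web/reports"},
    {"text": "Top", "link": "#"},
    {"text": "No href", "link": None},
    {"text": "Home again", "link": "https://www.treasury.gov.lk/"},
]
print(normalize_links(sample))
```

Passing the result to `pd.DataFrame(...)` before `to_csv` would give a CSV with absolute, de-duplicated links, while the raw file stays available for comparison.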
