# Evolution of Website Design

## Overview

This project analyzes the evolution of website design from the early 2000s onward, focusing on the following metrics:

- Number of images per year
- Website size (in KB)
- Number of ads on the website
- Most used font
- Average font size

Data was collected from arquivo.pt, a web archive that provides snapshots of websites over time. The analysis covers the years 2000–2024. Results are stored in CSV files, which are then used to produce graphs that make the trends easier to see.

## Technologies Used

- Python: For data scraping, processing, and analysis.
- BeautifulSoup: For extracting website content and HTML structure.
- Matplotlib: For visualizing the collected data in graphical form.
- Pandas: For handling and processing CSV data.
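
The CSV-to-graph step described in the overview can be sketched roughly as follows. This is a minimal illustration, not the project's actual plotting code: `load_series` and `plot_series` are hypothetical helper names, the standard `csv` module stands in for Pandas to keep the example dependency-light, and the column names are assumed to match the CSV files written by the cells in this notebook.

```python
import csv


def load_series(lines, value_column):
    """Read a year/metric CSV (given as an open file or a list of lines)
    into two parallel lists. Rows with an empty value are skipped."""
    years, values = [], []
    for row in csv.DictReader(lines):
        if row[value_column] == '':
            continue  # e.g. a year whose snapshot could not be fetched
        years.append(int(row['year']))
        values.append(float(row[value_column]))
    return years, values


def plot_series(years, values, ylabel, out_path):
    """Save a simple line chart of one metric over time."""
    # Imported here so load_series stays usable without Matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(years, values, marker='o')
    ax.set_xlabel('Year')
    ax.set_ylabel(ylabel)
    fig.savefig(out_path)
    plt.close(fig)
```

For example, opening `WebsiteSize.csv` with `open('WebsiteSize.csv', newline='')`, passing the file object to `load_series(file, 'size')`, and then calling `plot_series(years, sizes, 'Size (bytes)', 'website_size.png')` would produce one chart per metric. The output file name is an arbitrary choice.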

# URLs for data scraping

In [6]:
urls = {
    2000: "https://arquivo.pt/noFrame/replay/20000606140117/http://www.sapo.pt/",
    2001: "https://arquivo.pt/noFrame/replay/20010606140117/http://www.sapo.pt/",
    2002: "https://arquivo.pt/noFrame/replay/20020606140117/http://www.sapo.pt/",
    2003: "https://arquivo.pt/noFrame/replay/20030606140117/http://www.sapo.pt/",
    2004: "https://arquivo.pt/noFrame/replay/20040606140117/http://www.sapo.pt/",
    2005: "https://arquivo.pt/noFrame/replay/20050606140117/http://www.sapo.pt/",
    2006: "https://arquivo.pt/noFrame/replay/20060606140117/http://www.sapo.pt/",
    2007: "https://arquivo.pt/noFrame/replay/20070606140117/http://www.sapo.pt/",
    2008: "https://arquivo.pt/noFrame/replay/20080606140117/http://www.sapo.pt/",
    2009: "https://arquivo.pt/noFrame/replay/20090606140117/http://www.sapo.pt/",
    2010: "https://arquivo.pt/noFrame/replay/20100202180215/http://www.sapo.pt/",
    2011: "https://arquivo.pt/noFrame/replay/20110202180215/http://www.sapo.pt/",
    2012: "https://arquivo.pt/noFrame/replay/20120202180215/http://www.sapo.pt/",
    2013: "https://arquivo.pt/noFrame/replay/20130202180215/http://www.sapo.pt/",
    2014: "https://arquivo.pt/noFrame/replay/20140202180215/http://www.sapo.pt/",
    2015: "https://arquivo.pt/noFrame/replay/20150202180215/http://www.sapo.pt/",
    2016: "https://arquivo.pt/noFrame/replay/20160202180215/http://www.sapo.pt/",
    2017: "https://arquivo.pt/noFrame/replay/20170202180215/http://www.sapo.pt/",
    2018: "https://arquivo.pt/noFrame/replay/20180202180215/http://www.sapo.pt/",
    2019: "https://arquivo.pt/noFrame/replay/20190202180215/http://www.sapo.pt/",
    2020: "https://arquivo.pt/noFrame/replay/20200202180215/http://www.sapo.pt/"
}
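
The dictionary above follows a regular pattern: one replay URL per year, with the timestamp suffix `0606140117` for 2000 to 2009 and `0202180215` for 2010 to 2020. As a sketch, the same mapping could be generated rather than typed by hand. The names `build_urls`, `REPLAY_BASE`, and `SITE` are hypothetical and only illustrate the pattern.

```python
REPLAY_BASE = "https://arquivo.pt/noFrame/replay"
SITE = "http://www.sapo.pt/"


def build_urls(first_year=2000, last_year=2020):
    """Rebuild the year -> replay URL mapping used in this notebook."""
    urls = {}
    for year in range(first_year, last_year + 1):
        # Same month/day/time suffixes as the hand-written dictionary.
        suffix = "0606140117" if year < 2010 else "0202180215"
        urls[year] = f"{REPLAY_BASE}/{year}{suffix}/{SITE}"
    return urls
```

Calling `build_urls()` with the defaults yields 21 entries whose values match the hand-written `urls` dictionary.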

# Analysis of www.sapo.pt in 2000, 2010 and 2020

## *Photos and videos count*

In [1]:
import requests
from bs4 import BeautifulSoup
import csv

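def analyze_media(archived_url):
    """Count the <img> elements in an archived snapshot."""
    if not archived_url:
        return {'images': 0}

    # A timeout keeps one unresponsive snapshot from stalling the whole run.
    html_content = requests.get(archived_url, timeout=30).text
    soup = BeautifulSoup(html_content, 'html.parser')

    # Only <img> tags are counted; video elements are not covered here.
    images = soup.find_all('img')
    return {'images': len(images)}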

def collect_data(urls):
    data = []
    
    for year, url in urls.items():
        print(f"Fetching data for {year}...")
        media_data = analyze_media(url)
        data.append({'year': year, 'image_count': media_data['images']})

    with open('ImageCount.csv', mode='w', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=['year', 'image_count'])
        writer.writeheader()
        for row in data:
            writer.writerow(row)

    print("Data saved to 'ImageCount.csv'")

collect_data(urls)


Fetching data for 2000...
Fetching data for 2001...
Fetching data for 2002...
Fetching data for 2003...
Fetching data for 2004...
Fetching data for 2005...
Fetching data for 2006...
Fetching data for 2007...
Fetching data for 2008...
Fetching data for 2009...
Fetching data for 2010...
Fetching data for 2011...
Fetching data for 2012...
Fetching data for 2013...
Fetching data for 2014...
Fetching data for 2015...
Fetching data for 2016...
Fetching data for 2017...
Fetching data for 2018...
Fetching data for 2019...
Fetching data for 2020...
Data saved to 'ImageCount.csv'
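
The cell above counts only `<img>` elements, although the section title also mentions videos. One possible way to cover both is sketched below. It is an illustration under assumptions, not the notebook's method: it uses the standard-library `html.parser` instead of BeautifulSoup so that it runs without extra dependencies, and the choice of `video`, `embed`, and `object` as "video-like" tags is a guess at how older pages embedded media. `MediaCounter` and `count_media` are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags treated as media; the non-img entries are an assumption.
MEDIA_TAGS = {'img', 'video', 'embed', 'object'}


class MediaCounter(HTMLParser):
    """Tally opening tags that belong to MEDIA_TAGS."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in MEDIA_TAGS:
            self.counts[tag] += 1


def count_media(html_content):
    """Return a dict mapping each media tag found to its count."""
    parser = MediaCounter()
    parser.feed(html_content)
    parser.close()
    return dict(parser.counts)
```

Passing the text of a fetched snapshot to `count_media` returns per-tag counts, which could be written as extra columns next to `image_count`.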


## *Calculating website size in bytes*

In [5]:
import requests
import csv

def get_page_size(archived_url):
    try:
        response = requests.get(archived_url)
        if response.ok:
            page_size = len(response.content)
            return page_size
        else:
            print(f"Error fetching page: {response.status_code}")
            return None
    except Exception as e:
        print(f"Exception occurred: {e}")
        return None

def collect_data(urls):
    data = []

    for year, url in urls.items():
        print(f"Fetching data for {year}...")
        size = get_page_size(url)
        if size is not None:
            data.append({'year': year, 'size': size})
        else:
            print(f"Year {year}: Failed to fetch size")

    with open('WebsiteSize.csv', mode='w', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=['year', 'size'])
        writer.writeheader()
        for row in data:
            writer.writerow(row)

    print("Data saved to 'WebsiteSize.csv'")

collect_data(urls)


Fetching data for 2000...
Fetching data for 2001...
Fetching data for 2002...
Fetching data for 2003...
Fetching data for 2004...
Fetching data for 2005...
Fetching data for 2006...
Fetching data for 2007...
Fetching data for 2008...
Fetching data for 2009...
Fetching data for 2010...
Fetching data for 2011...
Fetching data for 2012...
Fetching data for 2013...
Fetching data for 2014...
Fetching data for 2015...
Fetching data for 2016...
Fetching data for 2017...
Fetching data for 2018...
Fetching data for 2019...
Fetching data for 2020...
Data saved to 'WebsiteSize.csv'
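
The overview lists website size in KB, while the cell above stores `len(response.content)`, which is the size in bytes of the HTML document alone (images, stylesheets, and scripts referenced by the page are not downloaded). A small conversion sketch follows. It assumes 1 KB = 1024 bytes, and `bytes_to_kb` and `rows_in_kb` are hypothetical helper names.

```python
def bytes_to_kb(size_bytes):
    """Convert a byte count to kilobytes (1 KB = 1024 bytes assumed)."""
    return round(size_bytes / 1024, 2)


def rows_in_kb(rows):
    """Convert rows shaped like WebsiteSize.csv to a size_kb column."""
    return [
        {'year': row['year'], 'size_kb': bytes_to_kb(int(row['size']))}
        for row in rows
    ]
```

The `rows` argument can be the output of `csv.DictReader` over `WebsiteSize.csv`, where every value arrives as a string.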


## *Number of ads:*

In [12]:
import requests
from bs4 import BeautifulSoup
import csv

def fetch_archived_page(archived_url):
    try:
        response = requests.get(archived_url)
        if response.ok:
            return response.text
        else:
            print(f"Failed to fetch page: {response.status_code}")
            return None
    except Exception as e:
        print(f"Exception occurred: {e}")
        return None

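def analyze_ads(html_content):
    """Estimate the number of ads in a page with a rough heuristic."""
    soup = BeautifulSoup(html_content, 'html.parser')
    # Substring match on class names: this also matches unrelated classes
    # such as "header" or "loading", so the figure tends to overcount.
    ad_divs = soup.find_all('div', class_=lambda x: x and 'ad' in x.lower())
    # Every <iframe> and <script> is counted as a potential ad container,
    # including ones that serve no ads at all.
    iframes = soup.find_all('iframe')
    script_tags = soup.find_all('script')

    ads_count = len(ad_divs) + len(iframes) + len(script_tags)

    return {'ads_count': ads_count}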

def collect_data(urls):
    data = []

    for year, url in urls.items():
        print(f"Fetching data for {year}...")
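        html_content = fetch_archived_page(url)
        if html_content is None:
            # fetch_archived_page returns None on failure; skip the year
            # instead of passing None to BeautifulSoup, which would raise.
            print(f"Year {year}: Failed to fetch page")
            continue
        ads_data = analyze_ads(html_content)
        data.append({'year': year, 'adds': ads_data['ads_count']})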

    with open('NumberOfAdds.csv', mode='w', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=['year', 'adds'])
        writer.writeheader()
        for row in data:
            writer.writerow(row)

    print("Data saved to 'NumberOfAdds.csv'")

collect_data(urls)


Fetching data for 2000...
Fetching data for 2001...
Fetching data for 2002...
Fetching data for 2003...
Fetching data for 2004...
Fetching data for 2005...
Fetching data for 2006...
Fetching data for 2007...
Fetching data for 2008...
Fetching data for 2009...
Fetching data for 2010...
Fetching data for 2011...
Fetching data for 2012...
Fetching data for 2013...
Fetching data for 2014...
Fetching data for 2015...
Fetching data for 2016...
Fetching data for 2017...
Fetching data for 2018...
Fetching data for 2019...
Fetching data for 2020...
Data saved to 'NumberOfAdds.csv'
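
The class-name check in `analyze_ads` is a substring test, so any class containing the letters "ad" (for example `header` or `loading`) is counted. A stricter matcher is sketched below as one possible refinement, not as the method used to produce the CSV above. It assumes that ad-related classes use `ad`, `ads`, `advert`, or `advertisement` as a whole token delimited by `-` or `_`; real pages may follow other naming schemes. `AD_TOKEN`, `looks_like_ad`, and `count_ad_classes` are hypothetical names.

```python
import re

# Match "ad", "ads", "advert(s)" or "advertisement(s)" only as a whole
# token at the start/end of the class name or between '-' / '_' separators.
AD_TOKEN = re.compile(
    r'(?:^|[-_])(?:ads?|adverts?|advertisements?)(?:$|[-_])',
    re.IGNORECASE,
)


def looks_like_ad(class_name):
    """Return True if a single CSS class name looks ad-related."""
    return bool(class_name and AD_TOKEN.search(class_name))


def count_ad_classes(class_lists):
    """Count elements (given as lists of class names) with an ad-like class."""
    return sum(
        1 for classes in class_lists
        if any(looks_like_ad(name) for name in classes)
    )
```

Because `looks_like_ad` accepts `None`, it can also be passed directly as the `class_` filter, as in `soup.find_all('div', class_=looks_like_ad)`.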


## *Most used font and average font size*

In [14]:
import requests
from bs4 import BeautifulSoup
import re
from collections import Counter
import csv

def fetch_archived_page(archived_url):
    try:
        response = requests.get(archived_url)
        return response.text if response.ok else None
    except Exception as e:
        print(f"Exception occurred: {e}")
        return None

def fetch_css_file(css_url):
    try:
        response = requests.get(css_url)
        return response.text if response.ok else None
    except Exception as e:
        print(f"Exception occurred: {e}")
        return None

def analyze_typography(html_content):
    soup = BeautifulSoup(html_content, 'html.parser')
    text_elements = soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'span'])
    font_sizes, font_families = [], []

    for element in text_elements:
        style = element.get('style')
        if style:
            if 'font-size' in style:
                font_sizes.append(style.split('font-size:')[-1].split(';')[0].strip())
            if 'font-family' in style:
                font_families.append(style.split('font-family:')[-1].split(';')[0].strip())

    external_links = soup.find_all('link', rel="stylesheet")
    for link in external_links:
        css_href = link.get('href')
        if css_href:
            css_url = css_href if css_href.startswith('http') else f"https://arquivo.pt{css_href}"
            css_content = fetch_css_file(css_url)
            if css_content:
                extract_fonts_from_css(css_content, font_sizes, font_families)

    return font_sizes, font_families

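def extract_fonts_from_css(css_content, font_sizes, font_families):
    # Stop at ';' or '}' so the last declaration in a rule (which may omit
    # the semicolon) is captured without swallowing the following rule.
    font_sizes.extend(v.strip() for v in re.findall(r'font-size\s*:\s*([^;}]+)', css_content))
    font_families.extend(v.strip() for v in re.findall(r'font-family\s*:\s*([^;}]+)', css_content))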

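def calculate_average_font_size(font_sizes):
    """Average the font sizes in pixels, assuming a 16px base for em and %."""
    total_size, count = 0, 0
    for size in font_sizes:
        size_value = re.findall(r'(\d*\.?\d+)', size)
        if size_value:
            value = float(size_value[0])
            if 'px' in size:
                total_size += value
            elif 'em' in size:
                total_size += value * 16
            elif '%' in size:
                total_size += (value / 100) * 16
            else:
                # Unrecognized unit (for example pt): skip it so it does not
                # drag the average down by counting as zero.
                continue
            count += 1
    return total_size / count if count > 0 else 0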

def get_most_used_font(font_families):
    if font_families:
        font_count = Counter(font_families)
        most_common_font = font_count.most_common(1)[0]
        return most_common_font[0], most_common_font[1]
    return None, 0

def collect_data_to_csv(archived_urls):
    data = []

    for year, url in archived_urls.items():
        print(f"Fetching data for {year}...")
        html_content = fetch_archived_page(url)
        if html_content:
            font_sizes, font_families = analyze_typography(html_content)
            average_font_size = calculate_average_font_size(font_sizes)
            most_used_font, count = get_most_used_font(font_families)
            data.append({
                'year': year,
                'average_font_size_px': average_font_size,
                'most_used_font': most_used_font,
                'font_usage_count': count
            })
            print(f"{year}: Average Font Size = {average_font_size:.2f}px, Most Used Font = {most_used_font} ({count} times)")
        else:
            print(f"Failed to fetch the page for the year {year}.")
            data.append({
                'year': year,
                'average_font_size_px': None,
                'most_used_font': None,
                'font_usage_count': 0
            })

    # Save to CSV
    with open('TypographyReport.csv', mode='w', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=['year', 'average_font_size_px', 'most_used_font', 'font_usage_count'])
        writer.writeheader()
        for row in data:
            writer.writerow(row)

    print("\nTypography analysis data saved to 'TypographyReport.csv'")

collect_data_to_csv(urls)
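
In `analyze_typography`, stylesheet links are used as-is when they start with `http` and are otherwise prefixed with `https://arquivo.pt`. That covers absolute and root-relative hrefs, but not protocol-relative ones such as `//cdn.example.com/a.css`. A sketch of an alternative using the standard library's `urljoin` is shown below. `resolve_css_url` is a hypothetical helper, and passing the snapshot URL as `page_url` is an assumption, since the original function does not receive it.

```python
from urllib.parse import urljoin


def resolve_css_url(page_url, css_href):
    """Resolve a stylesheet href against the URL of the archived page.

    Absolute hrefs are returned unchanged; root-relative and
    protocol-relative hrefs inherit the scheme and host of page_url.
    """
    return urljoin(page_url, css_href)
```

Document-relative hrefs (for example `css/main.css`) are resolved against the full replay path, which itself embeds the original site URL, so the results for such hrefs are worth checking by hand before relying on them.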


Fetching data for 2000...
2000: Average Font Size = 0.00px, Most Used Font = None (0 times)
Fetching data for 2001...
2001: Average Font Size = 0.00px, Most Used Font = None (0 times)
Fetching data for 2002...
2002: Average Font Size = 0.00px, Most Used Font = None (0 times)
Fetching data for 2003...
2003: Average Font Size = 0.00px, Most Used Font = None (0 times)
Fetching data for 2004...
2004: Average Font Size = 0.00px, Most Used Font = None (0 times)
Fetching data for 2005...
2005: Average Font Size = 0.00px, Most Used Font = None (0 times)
Fetching data for 2006...
2006: Average Font Size = 0.00px, Most Used Font = None (0 times)
Fetching data for 2007...
2007: Average Font Size = 0.00px, Most Used Font = None (0 times)
Fetching data for 2008...
2008: Average Font Size = 14.59px, Most Used Font = Arial, Helvetica, sans-serif (1 times)
Fetching data for 2009...
2009: Average Font Size = 14.43px, Most Used Font = Arial, Helvetica, sans-serif (1 times)
Fetching data for 2010...
2010