# Digital Shift: The Evolution of Products and Platforms in Portuguese E-commerce

## Introduction

E-commerce has become a major player in the retail landscape, especially in sectors like electronics, where online shopping has seen significant growth.

In recent years, competition among retailers in Portugal such as Worten, Fnac, Rádio Popular, El Corte Inglés, Pc Diga and Staples has intensified. This competition has driven changes in both the types of products offered and the functionality of their platforms, shaping the way consumers shop online.

By analyzing historical data from Arquivo.pt, it is possible to explore the evolution of these e-commerce platforms, providing insights into how product offerings and platform features have developed over time.

## Context

The Arquivo.pt platform stores archived web pages dating back to 1996, making it an invaluable resource for examining the historical trajectory of e-commerce in Portugal.

For this project, Arquivo.pt will be the primary data source, allowing us to analyze past versions of e-commerce websites for major electronics retailers such as Worten, Fnac, Rádio Popular, El Corte Inglés, Pc Diga and Staples.

By examining archived versions of these websites, we aim to track the changes in product offerings (with a focus on electronic devices like smartphones, laptops, and televisions) and investigate how the functionalities of these platforms have evolved to improve the customer experience.


## Goals

The main objectives of this project are:

- **Analyze the evolution of electronic product offerings over the years**, identifying key trends in product categories (e.g., smartphones, televisions, laptops), and examining which products gained popularity or disappeared from the market.

- **Compare specific devices**, such as various models of the iPhone, in terms of pricing and features across multiple retailers, highlighting how these factors have shifted over time.

- **Examine price trends and promotional strategies** used by different retailers during key sales periods, such as Black Friday or Christmas, to identify which platforms offer the best deals.

- **Evaluate the evolution of e-commerce platform functionalities**, particularly search mechanisms, filters, and personalization tools, to understand how they improved the user experience. (This part is optional and may be omitted if there isn't enough time to complete it.)




### Data Collection



In [2]:
sites = ["www.fnac.pt", "www.worten.pt", "www.elcorteingles.pt", "www.radiopopular.pt", "www.staples.pt", "www.pcdiga.com"]

In [3]:
import requests
import urllib.parse
import json
import time
import pandas as pd

In [4]:
# Function to search Arquivo.pt
def arquivo_search(query=None, max_items=500, from_year=None, to_year=None, site=None, doc_type=None, version_history_url=None):
    if version_history_url:
        encoded_url = urllib.parse.quote(version_history_url, safe='')
        base_url = f"https://arquivo.pt/textsearch?versionHistory={encoded_url}"
    else:
        # The query must be URL-encoded too; fall back to an empty string when no query is given
        base_url = "https://arquivo.pt/textsearch?q=" + urllib.parse.quote(query or "")
    
    if from_year and to_year:
        base_url += f"&from={from_year}&to={to_year}"
    
    if site:
        base_url += f"&siteSearch={site}"
    
    if doc_type:
        base_url += f"&type={doc_type}"
    
    base_url += f"&maxItems={max_items}&prettyPrint=false"
    
    # A timeout keeps a stalled connection from blocking the whole crawl
    response = requests.get(base_url, timeout=30)

    if response.status_code == 200:
        return response.json()
    else:
        print(f"Request error: {response.status_code}")
        return None
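
The URL assembly in `arquivo_search` can be checked without touching the network. The sketch below is a hypothetical helper, `build_search_url`, that mirrors the same parameters but delegates the encoding to `urllib.parse.urlencode`. It is an illustration under that assumption, not the function used in the rest of this notebook. Note that `urlencode` encodes spaces as `+` rather than `%20`, which is also valid in a query string.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ARQUIVO_TEXTSEARCH = "https://arquivo.pt/textsearch"


def build_search_url(query=None, max_items=500, from_year=None, to_year=None,
                     site=None, doc_type=None, version_history_url=None):
    """Build the request URL without sending it (hypothetical helper)."""
    params = {}
    if version_history_url:
        params["versionHistory"] = version_history_url
    else:
        params["q"] = query or ""
    if from_year and to_year:
        params["from"] = from_year
        params["to"] = to_year
    if site:
        params["siteSearch"] = site
    if doc_type:
        params["type"] = doc_type
    params["maxItems"] = max_items
    params["prettyPrint"] = "false"
    # urlencode percent-encodes every value, including the site and type
    return f"{ARQUIVO_TEXTSEARCH}?{urlencode(params)}"


url = build_search_url(version_history_url="http://www.fnac.pt/",
                       from_year=2006, to_year=2007)
print(url)
```

Building the URL separately makes it easy to inspect exactly what is sent before spending time on real requests.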

In [118]:
# Extract the home page of each site over the years (2005 to 2023)
def process_sites(sites):
    # Dictionary that stores the links per site and per year
    site_data = {}

    for site in sites:
        print(f"Processando site: {site}")
        site_links_by_year = {year: [] for year in range(2005, 2024)}

        for year in range(2005, 2024):

            # Query the version history of the site's home page
            version_history_url = f"http://{site}/"
            response = arquivo_search(version_history_url=version_history_url, from_year=year, to_year=year + 1)

            if response and 'response_items' in response:
                response_items = response['response_items']

                for item in response_items:
                    item_year = int(item['tstamp'][0:4])  # the first 4 characters are the year

                    # Keep at most 2 links per year
                    if item_year == year and item['originalURL'] == version_history_url:
                        if len(site_links_by_year[year]) < 2:
                            site_links_by_year[year].append(item['linkToArchive'])
                        else:
                            break

            # If no link was found for this year, store None as the value
            if len(site_links_by_year[year]) == 0:
                site_links_by_year[year] = None

        site_data[site] = site_links_by_year

    return site_data
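
The selection rule inside `process_sites` (same year, exact home-page URL, at most two links) can be isolated into a pure function and tried on hand-written items. The helper name `pick_snapshots` and the sample items are hypothetical. Only the field names `tstamp`, `originalURL` and `linkToArchive` come from the code above.

```python
def pick_snapshots(items, year, original_url, per_year=2):
    """Return up to `per_year` archive links for `year`, or None if there are none."""
    links = []
    for item in items:
        if len(links) == per_year:
            break
        # The first 4 characters of the timestamp are the year
        same_year = item.get("tstamp", "")[:4] == str(year)
        if same_year and item.get("originalURL") == original_url:
            links.append(item["linkToArchive"])
    return links or None


# Hand-written sample items (illustrative, not real API output)
sample_items = [
    {"tstamp": "20061118120805", "originalURL": "http://www.fnac.pt/", "linkToArchive": "link-a"},
    {"tstamp": "20061001061029", "originalURL": "http://www.fnac.pt/", "linkToArchive": "link-b"},
    {"tstamp": "20060901000000", "originalURL": "http://www.fnac.pt/", "linkToArchive": "link-c"},
    {"tstamp": "20070607054401", "originalURL": "http://www.fnac.pt/", "linkToArchive": "link-d"},
]

print(pick_snapshots(sample_items, 2006, "http://www.fnac.pt/"))
print(pick_snapshots(sample_items, 2005, "http://www.fnac.pt/"))
```

Separating the filter from the HTTP calls lets the rule be tested offline before running the full crawl.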



In [119]:
dados_sites = process_sites(sites)

print(dados_sites)

Processando site: www.fnac.pt
Processando site: www.worten.pt
Processando site: www.elcorteingles.pt
Processando site: www.radiopopular.pt
Processando site: www.staples.pt
Processando site: www.pcdiga.com
{'www.fnac.pt': {2005: None, 2006: ['https://arquivo.pt/wayback/20061118120805/http://www.fnac.pt/', 'https://arquivo.pt/wayback/20061001061029/http://www.fnac.pt/'], 2007: ['https://arquivo.pt/wayback/20070928223117/http://www.fnac.pt/', 'https://arquivo.pt/wayback/20070607054401/http://www.fnac.pt/'], 2008: ['https://arquivo.pt/wayback/20081027081756/http://www.fnac.pt/', 'https://arquivo.pt/wayback/20081021193426/http://www.fnac.pt/'], 2009: ['https://arquivo.pt/wayback/20091218064527/http://www.fnac.pt/', 'https://arquivo.pt/wayback/20090925194633/http://www.fnac.pt/'], 2010: ['https://arquivo.pt/wayback/20100804062306/http://www.fnac.pt/', 'https://arquivo.pt/wayback/20100605082703/http://www.fnac.pt/'], 2011: ['https://arquivo.pt/wayback/20110702090458/http://www.fnac.pt/', 'htt

In [120]:
for site, links_by_year in dados_sites.items():
    print(f"Site: {site}")
    for year, link in links_by_year.items():
        print(f"Ano: {year}, Link: {link}")

Site: www.fnac.pt
Ano: 2005, Link: None
Ano: 2006, Link: ['https://arquivo.pt/wayback/20061118120805/http://www.fnac.pt/', 'https://arquivo.pt/wayback/20061001061029/http://www.fnac.pt/']
Ano: 2007, Link: ['https://arquivo.pt/wayback/20070928223117/http://www.fnac.pt/', 'https://arquivo.pt/wayback/20070607054401/http://www.fnac.pt/']
Ano: 2008, Link: ['https://arquivo.pt/wayback/20081027081756/http://www.fnac.pt/', 'https://arquivo.pt/wayback/20081021193426/http://www.fnac.pt/']
Ano: 2009, Link: ['https://arquivo.pt/wayback/20091218064527/http://www.fnac.pt/', 'https://arquivo.pt/wayback/20090925194633/http://www.fnac.pt/']
Ano: 2010, Link: ['https://arquivo.pt/wayback/20100804062306/http://www.fnac.pt/', 'https://arquivo.pt/wayback/20100605082703/http://www.fnac.pt/']
Ano: 2011, Link: ['https://arquivo.pt/wayback/20110702090458/http://www.fnac.pt/', 'https://arquivo.pt/wayback/20110519163144/http://www.fnac.pt/']
Ano: 2012, Link: ['https://arquivo.pt/wayback/20120122102914/http://www.

In [122]:
# Save the data to a CSV file
df = pd.DataFrame(dados_sites)
df.to_csv("sites_links.csv")
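
Because each cell of that DataFrame holds a Python list (or `None`), the lists are written to the CSV as their string form and need parsing when read back. One possible alternative, sketched here with a hypothetical helper `to_long_rows`, is to flatten the nested dictionary into one row per archived link before saving. The sample input reuses two of the links shown in the output above.

```python
def to_long_rows(site_data):
    """Flatten {site: {year: [links] or None}} into one dict per link."""
    rows = []
    for site, links_by_year in site_data.items():
        for year, links in links_by_year.items():
            for link in links or []:  # years stored as None contribute no rows
                rows.append({"site": site, "year": year, "link": link})
    return rows


sample = {
    "www.fnac.pt": {
        2005: None,
        2006: [
            "https://arquivo.pt/wayback/20061118120805/http://www.fnac.pt/",
            "https://arquivo.pt/wayback/20061001061029/http://www.fnac.pt/",
        ],
    }
}

rows = to_long_rows(sample)
for row in rows:
    print(row)

# With pandas available, the rows could then be saved as:
# pd.DataFrame(rows).to_csv("sites_links_long.csv", index=False)
```

The file name `sites_links_long.csv` is only an example. The long format trades a wider table for rows that can be filtered by site or year directly.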


# Data collection
The first analysis we will carry out is to collect the number of categories, and which categories are present on each of the sites, every x years.
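
The interval x is not fixed yet. As a rough sketch, the hypothetical helper below picks the years to analyze for one site: it starts at the first year that has an archived link and steps forward by `step` years, skipping years with no snapshot. Both the helper name and the example value `step=2` are assumptions for illustration.

```python
def years_to_sample(links_by_year, step):
    """Return the years with snapshots that fall on a `step`-year grid."""
    # Keep only years that actually have at least one archived link
    available = sorted(year for year, links in links_by_year.items() if links)
    if not available:
        return []
    first = available[0]
    return [year for year in available if (year - first) % step == 0]


# Illustrative input: placeholder link names, with gaps in 2005 and 2008
example = {
    2005: None,
    2006: ["link-2006"],
    2007: ["link-2007"],
    2008: None,
    2009: ["link-2009"],
    2010: ["link-2010"],
}

print(years_to_sample(example, step=2))
```

A year that falls on the grid but has no snapshot is simply dropped here. Falling back to the nearest available year would be another reasonable choice.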