Node attributes:

1. Name (of the knowledge concept)
2. Literal (description)
3. Narrower concepts (child knowledge concepts)
4. Broader concepts (ancestor knowledge concepts)
5. Competence type (knowledge, skill, ...)
6. Optional competence for occupation x
7. Essential competence for occupation x
8. Knowledge level
9. Number of occupations related to the node
10. Weight? (consider whether an edge weight needs to be defined)
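
As a rough illustration, the attributes listed above could be held in a small container like the one below. This is only a sketch: the class name `KnowledgeNode`, its field names, and the `urn:example:` identifiers are hypothetical and do not come from the ESCO API.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeNode:
    """Hypothetical container for the node attributes listed above."""
    uri: str
    name: str
    description: str = ""
    narrower: list = field(default_factory=list)       # child concept URIs
    broader: list = field(default_factory=list)        # ancestor concept URIs
    skill_type: str = "knowledge"                      # competence type
    optional_for: list = field(default_factory=list)   # occupation URIs
    essential_for: list = field(default_factory=list)  # occupation URIs
    level: int = 0

    @property
    def occupation_count(self) -> int:
        # Number of occupations related to the node (essential + optional)
        return len(self.optional_for) + len(self.essential_for)


# Placeholder data, not real ESCO records
node = KnowledgeNode(
    uri="urn:example:skill-1",
    name="example knowledge",
    essential_for=["urn:example:occ-a"],
    optional_for=["urn:example:occ-b", "urn:example:occ-c"],
)
print(node.occupation_count)
```

Deriving the occupation count from the two occupation lists, rather than storing it separately, keeps the three attributes consistent with each other.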


----

Steps:

1. Connect to the ESCO API **(Done)**
2. Understand the keys of the response **(Done)**
3. Extract the list of all knowledge concepts with the API **(Done)**
4. Extract the metadata of the knowledge concepts obtained in step 3 **(Done)**
5. Create the graph **(Done)**
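
As a rough illustration of the graph-building step, the sketch below turns enriched records into a node table and a parent-to-children adjacency map using only the standard library. It assumes records shaped like the dictionaries returned by `get_esco_concept_data` further down. The helper name `build_graph`, the `parent_key` parameter, and the `urn:example:` identifiers are hypothetical, and the link key that actually carries the parent may differ (for instance `broaderHierarchyConcept`).

```python
from collections import defaultdict


def build_graph(records, parent_key="broaderConcept"):
    """Build (nodes, children) from {uri: record} dictionaries.

    nodes maps each URI to a few attributes; children maps a parent URI
    to the set of URIs that list it under `parent_key`.
    """
    nodes = {}
    children = defaultdict(set)
    for uri, record in records.items():
        nodes[uri] = {
            "title": record.get("title"),
            "occupation_count": len(record.get("occupations", [])),
        }
        for parent in record.get("_links", {}).get(parent_key, []):
            if parent.get("uri"):
                children[parent["uri"]].add(uri)
    return nodes, dict(children)


# Placeholder records, not real ESCO data
records = {
    "urn:example:child": {
        "title": "child",
        "occupations": [{"uri": "urn:example:occ"}],
        "_links": {"broaderConcept": [{"uri": "urn:example:parent", "title": "parent"}]},
    },
    "urn:example:parent": {
        "title": "parent",
        "occupations": [],
        "_links": {"broaderConcept": []},
    },
}
nodes, children = build_graph(records)
print(children)
```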

-----

### Example of connecting to the API

In [63]:
import requests

def get_esco_concept_data(uri: str, language: str = "es") -> dict:
    """
    Query the ESCO API for a resource of type 'skill' or 'concept'
    and return a filtered version containing only the essential data.

    Args:
        uri (str): URI of the ESCO resource (for example, a skill or a category).
        language (str): Preferred language (defaults to "es").

    Returns:
        dict: Dictionary with the essential data, or an error.
    """

    # ESCO endpoint (change to 'concept' if needed)
    url = "https://ec.europa.eu/esco/api/resource/skill"

    params = {
        "uri": uri,
        "language": language
    }

    headers = {
        "Accept": "application/json"
    }

    response = requests.get(url, params=params, headers=headers)

    if response.status_code == 200:
        data = response.json()

        # Keys of interest inside _links
        link_keys = [
            'hasSkillType', 'broaderConcept', 'broaderHierarchyConcept',
            'narrowerSkill', 'narrowerConcept',
            'isEssentialForOccupation', 'isOptionalForOccupation'
        ]

        # Extract only uri and title
        filtered_links = {
            key: [
                {
                    'uri': item.get('uri'),
                    'title': item.get('title')
                }
                for item in data.get('_links', {}).get(key, [])
            ]
            for key in link_keys
        }

        # Extract ancestors with uri and title
        raw_ancestors = data.get('_embedded', {}).get('ancestors', [])
        filtered_ancestors = [
            {
                'uri': item.get('_links', {}).get('self', {}).get('uri'),
                'title': item.get('title')
            }
            for item in raw_ancestors
        ]

        # Build the essential result
        result = {
            "className": data.get("className"),
            "uri": data.get("uri"),
            "title": data.get("title"),
            "preferredLabel_es": data.get("preferredLabel", {}).get(language),
            "preferredLabel_en": data.get("preferredLabel",{}).get("en"),
            "description": (
                data.get("description", {}).get(language, {}).get("literal")
                or data.get("description", {}).get("en", {}).get("literal")
            ),
            "_links": filtered_links,
            "_embedded": {
                "ancestors": filtered_ancestors,
                "title": data.get("_embedded", {}).get("title")
            }
        }

        return result

    else:
        return {
            "error": f"Error {response.status_code}: {response.text}",
            "uri": uri
        }


In [66]:
# uri = "http://data.europa.eu/esco/skill/0e361e34-c563-4892-b9d3-873a6a4fef8a"  # alternative example URI
uri = "http://data.europa.eu/esco/skill/cd56a093-3400-4635-80bd-b9611dcf1542"
concept_info = get_esco_concept_data(uri, language="es")

for key, value in concept_info.items():
    print(f"{key}: {value}\n")

className: Skill

uri: http://data.europa.eu/esco/skill/cd56a093-3400-4635-80bd-b9611dcf1542

title: Gamemaker Studio

preferredLabel_es: Gamemaker Studio

preferredLabel_en: Gamemaker Studio

description: Motor de juego de plataforma transversal que está redactado en un lenguaje de programación Delphi y que consiste en entornos de desarrollo integrados y herramientas de diseño especializadas diseñadas para la iteración rápida de juegos informáticos creados por usuarios.

_links: {'hasSkillType': [{'uri': 'http://data.europa.eu/esco/skill-type/knowledge', 'title': 'conocimiento'}], 'broaderConcept': [], 'broaderHierarchyConcept': [{'uri': 'http://data.europa.eu/esco/isced-f/0211', 'title': 'técnicas audiovisuales y producción para medios de comunicación'}], 'narrowerSkill': [], 'narrowerConcept': [], 'isEssentialForOccupation': [], 'isOptionalForOccupation': []}

_embedded: {'ancestors': [{'uri': 'http://data.europa.eu/esco/skill/cd56a093-3400-4635-80bd-b9611dcf1542', 'title': 'Gamemak

### Code to retrieve the URIs of all ESCO knowledge skills

In [56]:
import requests

def get_knowledge_skills(language="es", limit=100, offset=0):
    """
    Fetch ESCO concepts from the 'skills' scheme and filter by skill type 'knowledge'.

    Returns:
        dict: {uri: title} of knowledge skills
        set: all skill type URIs encountered
    """
    url = "https://ec.europa.eu/esco/api/resource/skill"

    params = {
        "isInScheme": "http://data.europa.eu/esco/concept-scheme/skills",
        "language": language,
        "limit": limit,
        "offset": offset,
        "view": "full"  # ensures we get full details including _links
    }

    headers = {
        "Accept": "application/json"
    }

    response = requests.get(url, params=params, headers=headers)

    if response.status_code != 200:
        print(f"Error {response.status_code}: {response.text}")
        return {}, set()

    response_data = response.json()
    embedded = response_data.get('_embedded', {})
    knowledge_concepts = {}
    all_skill_types = set()

    for concept_uri, concept_data in embedded.items():
        if not isinstance(concept_data, dict):
            continue

        skill_types = concept_data.get('_links', {}).get('hasSkillType', [])

        for st in skill_types:
            if 'uri' in st:
                all_skill_types.add(st['uri'])

        is_knowledge = any(
            link.get('uri') == 'http://data.europa.eu/esco/skill-type/knowledge'
            for link in skill_types
        )

        if is_knowledge:
            title = concept_data.get("title")
            knowledge_concepts[concept_uri] = title

    return knowledge_concepts, all_skill_types


# === RUN THE BATCH PROCESS ===
all_knowledge_skills = {}
all_skills_types = set()
limit = 100
page = 0
max_pages = 200  # ← limit * pages = 20,000 max items

while page < max_pages:
    skill_batch, skill_types = get_knowledge_skills(limit=limit, offset=page)

    all_knowledge_skills.update(skill_batch)
    all_skills_types.update(skill_types)

    print(f"✅ Page {page} fetched. Total knowledge skills: {len(all_knowledge_skills)}")
    page += 1

# Optionally: Save results
import json
with open("knowledge_skills.json", "w", encoding="utf-8") as f:
    json.dump(all_knowledge_skills, f, ensure_ascii=False, indent=2)

print(f"\n🎉 Done. Total knowledge skills collected: {len(all_knowledge_skills)}")


✅ Page 0 fetched. Total knowledge skills: 20
✅ Page 1 fetched. Total knowledge skills: 33
✅ Page 2 fetched. Total knowledge skills: 51
✅ Page 3 fetched. Total knowledge skills: 70
✅ Page 4 fetched. Total knowledge skills: 86
✅ Page 5 fetched. Total knowledge skills: 107
✅ Page 6 fetched. Total knowledge skills: 122
✅ Page 7 fetched. Total knowledge skills: 144
✅ Page 8 fetched. Total knowledge skills: 162
✅ Page 9 fetched. Total knowledge skills: 182
✅ Page 10 fetched. Total knowledge skills: 193
✅ Page 11 fetched. Total knowledge skills: 211
✅ Page 12 fetched. Total knowledge skills: 229
✅ Page 13 fetched. Total knowledge skills: 249
✅ Page 14 fetched. Total knowledge skills: 260
✅ Page 15 fetched. Total knowledge skills: 274
✅ Page 16 fetched. Total knowledge skills: 287
✅ Page 17 fetched. Total knowledge skills: 297
✅ Page 18 fetched. Total knowledge skills: 315
✅ Page 19 fetched. Total knowledge skills: 328
✅ Page 20 fetched. Total knowledge skills: 352
✅ Page 21 fetched. Total kno
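
The loop above always runs up to `max_pages`, even after the API has no more items to return. A minimal sketch of a loop that stops at the first empty page is shown below. The names `collect_pages` and `fetch_page` are hypothetical, and the fetcher is assumed to return the raw items of a page (before any knowledge filter), so that an empty result really means the data is exhausted.

```python
def collect_pages(fetch_page, max_pages=200):
    """Call fetch_page(page) for page = 0, 1, ... until a page is empty."""
    collected = {}
    for page in range(max_pages):
        batch = fetch_page(page)
        if not batch:
            # Empty page: assume there is nothing left to fetch
            break
        collected.update(batch)
    return collected


# Fake fetcher standing in for the API call (placeholder data)
fake_pages = [{"urn:example:a": "A"}, {"urn:example:b": "B"}, {}]
calls = []


def fake_fetch(page):
    calls.append(page)
    return fake_pages[page] if page < len(fake_pages) else {}


result = collect_pages(fake_fetch)
print(len(result), calls)
```

Passing the fetcher in as an argument keeps the stopping rule testable without touching the network.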

### Code to retrieve the full information of the ESCO knowledge skills

In [3]:
import json
import os
import requests
import time
from datetime import datetime

# === Function to fetch ESCO concept data ===
def get_esco_concept_data(uri: str, language: str = "es") -> dict:
    url = "https://ec.europa.eu/esco/api/resource/skill"
    params = {"uri": uri, "language": language}
    headers = {"Accept": "application/json"}

    response = requests.get(url, params=params, headers=headers)
    if response.status_code == 200:
        data = response.json()

        link_keys = [
            'hasSkillType', 'broaderConcept', 'broaderHierarchyConcept',
            'narrowerSkill', 'narrowerConcept',
            'isEssentialForOccupation', 'isOptionalForOccupation'
        ]

        filtered_links = {
            key: [
                {'uri': item.get('uri'), 'title': item.get('title')}
                for item in data.get('_links', {}).get(key, [])
            ]
            for key in link_keys
        }

        occupations = []
        for occ_key in ["isEssentialForOccupation", "isOptionalForOccupation"]:
            for occ in data.get('_links', {}).get(occ_key, []):
                occupations.append({
                    "uri": occ.get("uri"),
                    "title": occ.get("title"),
                    "type": "essential" if occ_key == "isEssentialForOccupation" else "optional"
                })

        raw_ancestors = data.get('_embedded', {}).get('ancestors', [])
        filtered_ancestors = [
            {
                'uri': item.get('_links', {}).get('self', {}).get('uri'),
                'title': item.get('title')
            }
            for item in raw_ancestors
        ]
    
        # Define root knowledge URI
        ROOT_URI = "http://data.europa.eu/esco/skill/K"
        
        # Find index of the root (if it exists)
        root_index = next((i for i, item in enumerate(filtered_ancestors) if item["uri"] == ROOT_URI), None)
        
        if root_index is not None:
            # Count before and after root, including the root itself (+1)
            levels = [root_index + 1, len(filtered_ancestors) - root_index]
            level = max(levels)
        else:
            # If root not found, fallback to just length
            levels = [len(filtered_ancestors)]
            level = len(filtered_ancestors)
        
        parent_knowledge = filtered_ancestors[1] if len(filtered_ancestors) > 1 else None
        fetched_at = datetime.utcnow().strftime("%Y-%m-%d")
        energy = len(occupations)

        return {
            "className": data.get("className"),
            "uri": data.get("uri"),
            "title": data.get("title"),
            "preferredLabel_es": data.get("preferredLabel", {}).get(language),
            "preferredLabel_en": data.get("preferredLabel", {}).get("en"),
            "description": (
                data.get("description", {}).get(language, {}).get("literal")
                or data.get("description", {}).get("en", {}).get("literal")
            ),
            "occupations": occupations,
            "energy": energy,
            "level": level,
            "levels": levels,
            "parent": parent_knowledge,
            "_links": filtered_links,
            "_embedded": {
                "ancestors": filtered_ancestors,
                "title": data.get("_embedded", {}).get("title")
            },
            "fetchedAt": fetched_at
        }
    else:
        return {
            "error": f"Error {response.status_code}: {response.text}",
            "uri": uri
        }

# === Paths ===
input_path = os.path.join("..", "output", "knowledge_skills.json")
output_path = os.path.join("..", "output", "knowledge_skills_full.json")
error_log_path = os.path.join("..", "output", "knowledge_skills_errors.json")

# === Load base knowledge skills ===
with open(input_path, "r", encoding="utf-8") as f:
    base_knowledge = json.load(f)

# === Enrich data ===
enriched_knowledge = {}
error_log = {}

for i, (uri, title) in enumerate(base_knowledge.items(), 1):
    print(f"🔍 Fetching {i}/{len(base_knowledge)}: {title}")
    concept_data = get_esco_concept_data(uri)

    if "error" in concept_data:
        print(f"⚠️  Failed to fetch {title}: {concept_data['error']}")
        error_log[uri] = {
            "title": title,
            "error": concept_data["error"]
        }
        continue

    enriched_knowledge[uri] = concept_data

    # Periodic backup every 100 entries
    if i % 100 == 0:
        print(f"💾 Backup at item {i}")
        with open(output_path, "w", encoding="utf-8") as f_out:
            json.dump(enriched_knowledge, f_out, ensure_ascii=False, indent=2)
        with open(error_log_path, "w", encoding="utf-8") as f_err:
            json.dump(error_log, f_err, ensure_ascii=False, indent=2)

    time.sleep(0.3)  # Be kind to the API

# === Final Save ===
with open(output_path, "w", encoding="utf-8") as f:
    json.dump(enriched_knowledge, f, ensure_ascii=False, indent=2)

with open(error_log_path, "w", encoding="utf-8") as f:
    json.dump(error_log, f, ensure_ascii=False, indent=2)

print(f"\n✅ Saved enriched knowledge skills to: {output_path}")
print(f"⚠️ Logged {len(error_log)} errors to: {error_log_path}")



🔍 Fetching 1/2673: estrategia de colaboración masiva
🔍 Fetching 2/2673: Gamemaker Studio
🔍 Fetching 3/2673: sistemas de asistencia visual en los aeropuertos
🔍 Fetching 4/2673: anclas utilizadas en el transporte marítimo
🔍 Fetching 5/2673: proceso de impresión serigráfica
🔍 Fetching 6/2673: comercio internacional
🔍 Fetching 7/2673: SPARK
🔍 Fetching 8/2673: gestos con las manos
🔍 Fetching 9/2673: electrofotografía
🔍 Fetching 10/2673: estado de las carreteras locales
🔍 Fetching 11/2673: MySQL
🔍 Fetching 12/2673: reglas de vuelo visual
🔍 Fetching 13/2673: ABBYY FineReader
🔍 Fetching 14/2673: Informatica PowerCenter
🔍 Fetching 15/2673: masterización de audio
🔍 Fetching 16/2673: políticas organizativas
🔍 Fetching 17/2673: software de fabricación asistida por ordenador
🔍 Fetching 18/2673: seguridad alimentaria de la carne de caza silvestre
🔍 Fetching 19/2673: normas aplicables a las terminales de aeropuerto
🔍 Fetching 20/2673: programas de ofimática
🔍 Fetching 21/2673: procedimientos para la 
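
The `level` rule inside `get_esco_concept_data` can be checked in isolation. The sketch below restates it as a standalone function, hypothetically named `compute_level`, and runs it on placeholder ancestor chains. The `urn:example:` URIs are invented, and the order in which the API returns ancestors is not assumed, which is why both orderings are tried.

```python
ROOT_URI = "http://data.europa.eu/esco/skill/K"  # root knowledge URI used in the cell above


def compute_level(ancestor_uris, root_uri=ROOT_URI):
    """Return (level, levels) following the same rule as the cell above."""
    if root_uri in ancestor_uris:
        root_index = ancestor_uris.index(root_uri)
        # Count up to and including the root, and from the root to the end
        levels = [root_index + 1, len(ancestor_uris) - root_index]
    else:
        # Root not present: fall back to the chain length
        levels = [len(ancestor_uris)]
    return max(levels), levels


leaf_first = ["urn:example:leaf", "urn:example:mid", ROOT_URI]
root_first = [ROOT_URI, "urn:example:mid", "urn:example:leaf"]
no_root = ["urn:example:mid", "urn:example:leaf"]

print(compute_level(leaf_first))  # (3, [3, 1])
print(compute_level(root_first))  # (3, [1, 3])
print(compute_level(no_root))     # (2, [2])
```

Taking the maximum of the two counts gives the same level whichever end of the chain the root sits at.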

-----

### Code to retrieve all skills

In [54]:
import requests
import json
import time


def get_all_skill_uris(language="es", limit=1000, output_file="all_skills.json"):
    """
    Fetch all ESCO skill URIs from the ESCO API and save to a JSON file.
    Treats offset as a page number (required by ESCO API).
    
    Args:
        language (str): Language code.
        limit (int): Results per page.
        output_file (str): Output JSON filename.

    Returns:
        dict: {uri: title} of all skills.
    """
    url = "https://ec.europa.eu/esco/api/resource/skill"
    page = 0
    all_skills = {}

    headers = {"Accept": "application/json;charset=UTF-8"}

    # First request to get total count
    params = {
        "isInScheme": "http://data.europa.eu/esco/concept-scheme/skills",
        "language": language,
        "limit": limit,
        "offset": page,
        "view": "full"
    }

    response = requests.get(url, params=params, headers=headers)
    if response.status_code != 200:
        print(f"❌ Error {response.status_code}: {response.text}")
        return {}

    data = response.json()
    total = data.get("total", 0)
    print(f"🔢 Total skills reported by API: {total}")

    while page * limit < total:
        params["offset"] = page
        response = requests.get(url, params=params, headers=headers)

        if response.status_code != 200:
            print(f"❌ Error {response.status_code} at page {page}: {response.text}")
            break

        embedded = response.json().get("_embedded", {})
        print(f"📦 Page {page}: Returned {len(embedded)} items")

        if not embedded:
            print(f"⛔ No more data at page {page}. Ending.")
            break

        for uri, concept_data in embedded.items():
            title = concept_data.get("title")
            if uri and title:
                all_skills[uri] = title

        page += 1
        print(f"✅ Total collected so far: {len(all_skills)}")
        time.sleep(1.5)  # Be gentle with API

    # Save to JSON
    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(all_skills, f, ensure_ascii=False, indent=2)

    print(f"\n💾 Saved {len(all_skills)} skills to {output_file}")
    return all_skills



In [55]:
get_all_skill_uris()

🔢 Total skills reported by API: 14158
📦 Page 0: Returned 1000 items
✅ Total collected so far: 1000
📦 Page 1: Returned 1000 items
✅ Total collected so far: 2000
📦 Page 2: Returned 1000 items
✅ Total collected so far: 3000
📦 Page 3: Returned 1000 items
✅ Total collected so far: 4000
📦 Page 4: Returned 1000 items
✅ Total collected so far: 5000
❌ Error 500 at page 5: {"logref":"RemoteSolrException","status":500,"message":"Error from server at http://tmldb00699.cc.cec.eu.int:1061/esco-solr: Expected mime type application/octet-stream but got text/html. <!doctype html><html lang=\"en\"><head><title>HTTP Status 400 – Bad Request</title><style type=\"text/css\">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 400 – Bad Request</h1><hr class=\"line\" /><p><

{'http://data.europa.eu/esco/skill/3164ecc3-1ccc-43c6-8d7e-9e9e80b46ae4': 'diagnosticar la muerte cerebral',
 'http://data.europa.eu/esco/skill/3f8d4e8f-17cc-4447-9325-fcfe4306c238': 'describir las propias aspiraciones artísticas en relación con las tendencias artísticas',
 'http://data.europa.eu/esco/skill/69b8fc3e-b896-4c14-b71b-ee7819dfd111': 'operar cargadoras transportadoras',
 'http://data.europa.eu/esco/skill/97756f18-bf3d-4ad8-be45-fb88dce250f6': 'limpiar zonas de vinificación',
 'http://data.europa.eu/esco/skill/a94ef591-d0c5-4097-98ff-6a7f50ea72e8': 'controlar registros de perforación',
 'http://data.europa.eu/esco/skill/694dc996-52f3-4afa-a802-672e19f061b7': 'estrategia de colaboración masiva',
 'http://data.europa.eu/esco/skill/09be5c9a-aaa4-4fe7-a701-982f492e93ac': 'investigar incidentes relativos a los animales',
 'http://data.europa.eu/esco/skill/8316979b-df5a-4dc7-98ef-61ec6a10df84': 'realizar investigaciones cuantitativas',
 'http://data.europa.eu/esco/skill/113bbd65-c
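
The run above stops at the first non-200 response. One possible mitigation is to retry a failed page with exponential backoff before giving up, sketched below. Whether a retry would help with the specific error 500 shown above is not established here. The names `fetch_with_retry` and `fetch_page` are hypothetical, and the fetcher is assumed to raise `RuntimeError` on a failed request.

```python
import time


def fetch_with_retry(fetch_page, page, retries=3, base_delay=1.0):
    """Call fetch_page(page), retrying with exponential backoff on RuntimeError."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch_page(page)
        except RuntimeError as exc:
            last_error = exc
            # Wait base_delay, then 2x, then 4x, ... before the next attempt
            time.sleep(base_delay * (2 ** attempt))
    raise last_error


# Fake fetcher that fails twice and then succeeds (placeholder data)
attempts = {"count": 0}


def flaky_fetch(page):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated server error")
    return {"urn:example:skill": "example"}


page_items = fetch_with_retry(flaky_fetch, page=5, base_delay=0)
print(page_items, attempts["count"])
```

In the real loop, `fetch_page` would wrap the `requests.get` call and raise when `status_code` is not 200.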