# Counting Slides from a YAML File
In this notebook, we will count the total number of slides in the `.pptx` files attached to Zenodo records whose URLs are listed in a YAML file.

## Step 1: Load YAML File
We will start by loading the YAML file to extract the URLs.

In [1]:
import yaml
import os

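# Load the YAML file; create an empty one (and its folder) if it does not exist yet
yaml_path = "./resources/nfdi4bioimage.yml"
if not os.path.exists(yaml_path):
    os.makedirs(os.path.dirname(yaml_path), exist_ok=True)
    with open(yaml_path, "w") as f:
        f.write("urls: []\n")

with open(yaml_path, "r") as yml_file:
    # safe_load returns None for an empty file, so fall back to an empty dict
    data = yaml.safe_load(yml_file) or {}

urls = data.get("urls", [])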
urls



## Step 2: Identify Zenodo URLs and Fetch Record Information via API
For URLs that point to `https://zenodo.org`, retrieve each record's file list via the Zenodo REST API.

In [None]:
import requests

# Filter for Zenodo URLs
zenodo_urls = [url for url in urls if "https://zenodo.org" in url]

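# Fetch the list of files attached to a Zenodo record
def get_zenodo_files(zenodo_url):
    # Assumes the record ID is the last path segment, e.g. https://zenodo.org/records/<id>;
    # drop any query string and trailing slash before splitting
    record_id = zenodo_url.split("?")[0].rstrip("/").split("/")[-1]
    api_url = f"https://zenodo.org/api/records/{record_id}"
    response = requests.get(api_url, timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    return response.json().get("files", [])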

all_files = [get_zenodo_files(url) for url in zenodo_urls]
all_files
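The cell above takes the last path segment of each URL as the record ID, which works for URLs such as `https://zenodo.org/records/<id>` but not for links that point to a file inside a record. A slightly more defensive sketch, assuming record URLs contain a `record/<id>` or `records/<id>` segment, parses the path with `urllib.parse`. The helper name `extract_record_id` and the example IDs are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def extract_record_id(zenodo_url: str) -> Optional[str]:
    """Return the numeric record ID from a Zenodo URL, or None if none is found.

    Hypothetical helper: assumes URLs contain a "record/<id>" or "records/<id>" segment.
    """
    parsed = urlparse(zenodo_url)
    if parsed.netloc not in ("zenodo.org", "www.zenodo.org"):
        return None
    # Split the path (query string and fragment are already excluded) and drop empty parts
    segments = [segment for segment in parsed.path.split("/") if segment]
    # Look for the numeric segment that directly follows "record" or "records"
    for name, value in zip(segments, segments[1:]):
        if name in ("record", "records") and value.isdigit():
            return value
    return None


# Example URLs with a made-up record ID
print(extract_record_id("https://zenodo.org/records/12345"))  # 12345
print(extract_record_id("https://zenodo.org/records/12345/files/slides.pptx?download=1"))  # 12345
print(extract_record_id("https://example.org/records/12345"))  # None
```

With this helper, `get_zenodo_files` could call `extract_record_id(zenodo_url)` and skip URLs for which it returns `None`.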

## Step 3: Filter `.pptx` Files and Download Them
Check the files in each Zenodo record and download those with a `.pptx` extension into `./temp`.

In [None]:
import os

# Ensure temp directory exists
os.makedirs("./temp", exist_ok=True)

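# Filter and download .pptx files
pptx_files = []
for files in all_files:
    for file in files:
        if file["key"].lower().endswith(".pptx"):
            # Depending on the API version, the file URL is under "download" or "self"
            download_url = file["links"].get("download") or file["links"].get("self")
            response = requests.get(download_url, timeout=120)
            response.raise_for_status()
            file_path = os.path.join("./temp", file["key"])
            with open(file_path, "wb") as f:
                f.write(response.content)
            pptx_files.append(file_path)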

pptx_files
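The filtering logic can be separated from the downloads so that it can be checked on a small sample without network access. This sketch assumes each file entry is a dict with a `key` (file name) and a `links` dict, as in the cell above; because the name of the link field may differ between Zenodo API versions, it tries `download` first and then `self`. The function name `select_pptx` and the sample entries are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def select_pptx(files: Iterable[Dict]) -> List[Tuple[str, str]]:
    """Return (file name, download URL) pairs for the .pptx entries of a record's file list."""
    selected = []
    for entry in files:
        name = entry.get("key", "")
        # Compare case-insensitively so that "Talk.PPTX" is also picked up
        if not name.lower().endswith(".pptx"):
            continue
        links = entry.get("links", {})
        # Assumption: the URL lives under "download" or "self", depending on the API version
        url = links.get("download") or links.get("self")
        if url:
            selected.append((name, url))
    return selected


# Hypothetical file entries shaped like a Zenodo record's "files" list
sample_files = [
    {"key": "Talk.PPTX", "links": {"self": "https://example.org/files/Talk.PPTX"}},
    {"key": "notes.pdf", "links": {"self": "https://example.org/files/notes.pdf"}},
    {"key": "intro.pptx", "links": {}},
]
print(select_pptx(sample_files))
# [('Talk.PPTX', 'https://example.org/files/Talk.PPTX')]
```

Entries without any usable link (like `intro.pptx` above) are skipped rather than raising a `KeyError` in the download loop.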

## Step 4: Count Slides in Downloaded `.pptx` Files
Use the `python-pptx` library to open each `.pptx` file and count the slides.

In [None]:
from pptx import Presentation

# Count slides
def count_slides(file_path):
    presentation = Presentation(file_path)
    return len(presentation.slides)

total_slides = sum(count_slides(file_path) for file_path in pptx_files)
total_slides
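As a cross-check that does not depend on `python-pptx`, one can use the fact that a `.pptx` file is a ZIP package in which each slide is normally stored as a part named `ppt/slides/slideN.xml`. The sketch below counts those parts with the standard `zipfile` module. This is a swapped-in technique, not what the cell above does: `python-pptx` counts the slides listed in the presentation, so the two numbers could differ for unusual files that contain unreferenced slide parts. The name `count_slides_zip` is hypothetical.

```python
import io
import re
import zipfile

# Slide parts inside a .pptx package, e.g. "ppt/slides/slide1.xml"
SLIDE_PART = re.compile(r"^ppt/slides/slide\d+\.xml$")


def count_slides_zip(file) -> int:
    """Count slide parts in a .pptx package (a path or a binary file-like object)."""
    with zipfile.ZipFile(file) as archive:
        return sum(1 for name in archive.namelist() if SLIDE_PART.match(name))


# Build a tiny in-memory archive that mimics the layout of a .pptx package
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for part in ("ppt/presentation.xml",
                 "ppt/slides/slide1.xml",
                 "ppt/slides/slide2.xml",
                 "ppt/slides/_rels/slide1.xml.rels"):
        archive.writestr(part, "<xml/>")
buffer.seek(0)
print(count_slides_zip(buffer))  # 2
```

In the notebook, `sum(count_slides_zip(path) for path in pptx_files)` would normally match `total_slides`.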

## Step 5: Save the Result
Write the total count of slides to a text file for later reference.

In [None]:
# Save the total count outside ./temp so that the cleanup step does not delete it
with open("./total_slides.txt", "w") as f:
    f.write(f"Total slides: {total_slides}\n")
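If a per-file breakdown is useful later, the result can also be stored as JSON. This is a minimal sketch using only the standard library; the function name `save_slide_report`, the output file name, and the report layout are hypothetical choices, and the demo uses made-up counts written to a throwaway directory.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def save_slide_report(counts: Dict[str, int], out_path) -> dict:
    """Write per-file slide counts and their total to a JSON file and return the report."""
    report = {"total_slides": sum(counts.values()), "per_file": counts}
    Path(out_path).write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report


# Demo with made-up counts, written to a throwaway directory
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "slide_counts.json"
    demo_report = save_slide_report({"a.pptx": 3, "b.pptx": 4}, demo_path)
    demo_loaded = json.loads(demo_path.read_text(encoding="utf-8"))
print(demo_report["total_slides"])  # 7

# In the notebook (hypothetical file name, kept outside ./temp):
# save_slide_report({path: count_slides(path) for path in pptx_files}, "./slide_counts.json")
```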

## Cleanup
Delete the `./temp` directory, including the downloaded `.pptx` files, to clean up the workspace.

In [None]:
import shutil

shutil.rmtree("./temp")
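`shutil.rmtree` deletes the whole `./temp` directory. If other files in that directory should be kept, a narrower cleanup can delete only the `.pptx` files. The sketch below uses `pathlib`; the function name `remove_pptx_files` and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path


def remove_pptx_files(directory) -> int:
    """Delete the .pptx files directly inside `directory` and return how many were removed."""
    removed = 0
    # Materialise the listing first so that deleting files does not disturb the iteration
    for path in list(Path(directory).iterdir()):
        # Only regular files whose extension is .pptx (any capitalisation) are deleted
        if path.is_file() and path.suffix.lower() == ".pptx":
            path.unlink()
            removed += 1
    return removed


# Demo on a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("a.pptx", "B.PPTX", "total_slides.txt"):
        (Path(tmp_dir) / name).write_text("placeholder")
    removed_count = remove_pptx_files(tmp_dir)
    remaining = sorted(p.name for p in Path(tmp_dir).iterdir())
print(removed_count, remaining)  # 2 ['total_slides.txt']
```

In the notebook, `remove_pptx_files("./temp")` would replace the `shutil.rmtree` call and leave the directory itself in place.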