# Counting Slides from a YAML File
In this notebook, we count the total number of slides in the `.pptx` files attached to the Zenodo records that are listed in a YAML file.

## Step 1: Load YAML File
We start by loading the YAML file and collecting the URL(s) of every resource entry.

In [3]:
import yaml

yaml_path = "../resources/nfdi4bioimage.yml"

with open(yaml_path, "r", encoding="utf-8") as yml_file:
    data = yaml.safe_load(yml_file)

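# Each record's "url" field holds either a single string or a list of strings;
# normalize to a list and collect everything in one flat list.
urls = []
records = data["resources"]

for r in records:
    r_urls = r["url"]
    if not isinstance(r_urls, list):
        r_urls = [r_urls]
    urls.extend(r_urls)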

len(urls)

755

In [4]:
urls[:3]

['https://focalplane.biologists.com/2023/07/26/sharing-your-poster-on-figshare/',
 'https://biapol.github.io/blog/marcelo_zoccoler/omero_scripts/readme.html',
 'https://biapol.github.io/blog/robert_haase/browsing_idr/readme.html']
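The loop above assumes that every record has a `url` key. As a minimal sketch, assuming some entries might lack the key (the file used here may never do so), the same normalization can be wrapped in a hypothetical helper, `collect_urls`, that skips such entries instead of raising a `KeyError`:

```python
def collect_urls(records):
    """Flatten the "url" field of each record into one list of strings.

    Hypothetical helper, not part of the original notebook. It accepts a
    single string or a list per record and skips records without a URL.
    """
    urls = []
    for record in records:
        value = record.get("url")
        if value is None:
            continue  # assumption: entries without a URL are simply ignored
        if not isinstance(value, list):
            value = [value]
        urls.extend(value)
    return urls


# Small in-memory example instead of the real YAML file
example_records = [
    {"name": "a", "url": "https://zenodo.org/record/4071471"},
    {"name": "b", "url": ["https://example.org/x", "https://example.org/y"]},
    {"name": "c"},
]
example_urls = collect_urls(example_records)
print(len(example_urls))
```

With the real data, `collect_urls(data["resources"])` would take the place of the explicit loop.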

## Step 2: Identify Zenodo URLs and Fetch Record Information via API
For URLs containing `https://zenodo.org`, retrieve the record's file list via the Zenodo REST API (`https://zenodo.org/api/records/<id>`).

In [5]:
import requests

# Filter for Zenodo URLs
zenodo_urls = [url for url in urls if "https://zenodo.org" in url]
len(zenodo_urls), zenodo_urls[0]

(183, 'https://zenodo.org/record/4071471')
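The first match has the form `https://zenodo.org/record/<id>`, but a substring filter also accepts Zenodo links that are not record pages. A hedged sketch of a stricter parser, using only the standard library and a hypothetical helper `zenodo_record_id`, returns `None` for anything that does not look like a record URL. The accepted path patterns (`/record/<digits>` and `/records/<digits>`) are an assumption about which link styles appear in the YAML file:

```python
import re
from urllib.parse import urlparse

# Assumption: record pages look like /record/<digits> or /records/<digits>
_RECORD_PATH = re.compile(r"^/records?/(\d+)/?$")


def zenodo_record_id(url):
    """Return the numeric record ID as a string, or None if the URL is not a record page."""
    parsed = urlparse(url)
    if parsed.netloc != "zenodo.org":
        return None
    match = _RECORD_PATH.match(parsed.path)
    return match.group(1) if match else None


print(zenodo_record_id("https://zenodo.org/record/4071471"))       # 4071471
print(zenodo_record_id("https://zenodo.org/communities/example"))  # None
```

URLs for which the helper returns `None` could be listed separately and inspected by hand rather than sent to the API.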

In [6]:
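# Fetch the file list of a Zenodo record via the REST API
def get_zenodo_files(zenodo_url):
    # The record ID is the last path segment; tolerate a trailing slash
    record_id = zenodo_url.rstrip("/").split("/")[-1]
    api_url = f"https://zenodo.org/api/records/{record_id}"
    response = requests.get(api_url, timeout=30)
    if not response.ok:
        # The URL did not resolve to a record: treat it as having no files
        return []
    return response.json().get("files", [])

all_files = [get_zenodo_files(url) for url in zenodo_urls]
len(all_files)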

183

In [7]:
all_files[:3]

[[{'id': '9c5eb1a7-1ad0-432a-8349-09d7a6f336aa',
   'key': 'Train-the-Trainer_Concept_V3.pdf',
   'size': 17495119,
   'checksum': 'md5:f12de829b72f268c5e65627562c175ba',
   'links': {'self': 'https://zenodo.org/api/records/4071471/files/Train-the-Trainer_Concept_V3.pdf/content'}},
  {'id': '84080478-da62-4d89-830c-27064ec5647a',
   'key': 'WorkingMaterials_FDMentor_V3.zip',
   'size': 29100571,
   'checksum': 'md5:08e28db87c659d1df36b454a53c21a9e',
   'links': {'self': 'https://zenodo.org/api/records/4071471/files/WorkingMaterials_FDMentor_V3.zip/content'}}],
 [{'id': 'b6d0b3e8-895b-4934-a8cd-1f1748b5a0ae',
   'key': 'Poster-Efficiently-starting-institutional-RDM.pdf',
   'size': 433244,
   'checksum': 'md5:a1399249c4b1368107959c5cc897ae2d',
   'links': {'self': 'https://zenodo.org/api/records/3490058/files/Poster-Efficiently-starting-institutional-RDM.pdf/content'}}],
 [{'id': 'b362d45e-35da-461a-a4d4-c42d1ca672d7',
   'key': 'Module_03_RDM_Knowledge_Blocks.pdf',
   'size': 15471950,

## Step 3: Filter `.pptx` Files and Download Them
Check the files in each Zenodo record and download those with `.pptx` extensions.

In [9]:
import os

# Ensure the download directory exists
os.makedirs("./temp", exist_ok=True)

# Download every file whose name ends in .pptx
pptx_files = []
for files in all_files:
    for file in files:
        if file["key"].endswith(".pptx"):
            response = requests.get(file["links"]["self"], timeout=60)
            response.raise_for_status()  # do not save an error page as a .pptx
            file_path = f"./temp/{file['key']}"
            with open(file_path, "wb") as f:
                f.write(response.content)
            pptx_files.append(file_path)

pptx_files

['./temp/202310_GENERAL_OMERO_Material_01_WhatIsOMERO.pptx',
 './temp/202310_GENERAL_OMERO_Material_06-1_DataSearch.pptx',
 './temp/202310_GENERAL_OMERO_Material_03_OMERO_Explained.pptx',
 './temp/202310_GENERAL_OMERO_Material_07-0_Metadata.pptx',
 './temp/202310_GENERAL_OMERO_Material_04_UserGroups.pptx',
 './temp/202310_GENERAL_OMERO_Material_09_More.pptx',
 './temp/202310_GENERAL_OMERO_Material_06-0_DataOrganization.pptx',
 './temp/202310_GENERAL_OMERO_Material_07-1_Metadata_Tags.pptx',
 './temp/202310_GENERAL_OMERO_Material_05_UploadingData.pptx',
 './temp/202310_GENERAL_OMERO_Material_07-3_Metadata_Ontologies.pptx',
 './temp/202310_GENERAL_OMERO_Material_02_ConnectToOMERO.pptx',
 './temp/202310_GENERAL_OMERO_Material_07-2_Metadata_KeyValuePairs.pptx',
 './temp/202310_GENERAL_OMERO_Material_08_OMERO-Fiji.pptx',
 './temp/Bio-Image_Data_Strudel_TU-Dresden_TP_Workshop_2023.pptx',
 './temp/LLMs_BIA_v3.pptx',
 './temp/Cultivating Open Training_v2.pptx',
 './temp/DataWeek_git_de.pptx',
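Because each file is saved under its bare `key`, two records that both contain a file named, say, `slides.pptx` would overwrite each other in `./temp`. Whether that happens in this dataset is not known. As a sketch under that assumption, a hypothetical helper `plan_downloads` selects the `.pptx` entries and prefixes each local name with the record ID parsed from the download link:

```python
import os
import re


def plan_downloads(all_files, download_dir="./temp"):
    """Return (download_url, local_path) pairs for every .pptx entry.

    Hypothetical helper. It assumes each entry has the "key" and
    "links" -> "self" fields shown in the API output above.
    """
    plan = []
    for files in all_files:
        for entry in files:
            if not entry["key"].endswith(".pptx"):
                continue
            url = entry["links"]["self"]
            # Record ID as it appears in .../api/records/<id>/files/...
            match = re.search(r"/records/(\d+)/files/", url)
            prefix = match.group(1) if match else "unknown"
            # basename() guards against keys that contain path separators
            name = f"{prefix}_{os.path.basename(entry['key'])}"
            plan.append((url, os.path.join(download_dir, name)))
    return plan


# In-memory example shaped like the API response; the record IDs are made up
example = [
    [{"key": "slides.pptx",
      "links": {"self": "https://zenodo.org/api/records/1/files/slides.pptx/content"}}],
    [{"key": "slides.pptx",
      "links": {"self": "https://zenodo.org/api/records/2/files/slides.pptx/content"}},
     {"key": "poster.pdf",
      "links": {"self": "https://zenodo.org/api/records/2/files/poster.pdf/content"}}],
]
example_plan = plan_downloads(example)
for url, path in example_plan:
    print(path)
```

Separating the planning step from the download also makes it possible to review which files would be fetched before any request is sent.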
 

## Step 4: Count Slides in Downloaded `.pptx` Files
Use the `python-pptx` library to open each `.pptx` file and count the slides.

In [10]:
from pptx import Presentation

# Count slides
def count_slides(file_path):
    presentation = Presentation(file_path)
    return len(presentation.slides)

total_slides = sum(count_slides(file_path) for file_path in pptx_files)
total_slides

2371
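A `.pptx` file is a ZIP archive in which each slide is stored as a part named `ppt/slides/slideN.xml`. As a rough cross-check that does not need `python-pptx`, the slide parts can be counted with the standard library alone. This is a sketch, not what the notebook ran: `count_slide_parts` is a hypothetical helper, and its result could differ from `len(presentation.slides)` if an archive contains slide parts that the presentation no longer references.

```python
import io
import re
import zipfile

_SLIDE_PART = re.compile(r"^ppt/slides/slide\d+\.xml$")


def count_slide_parts(pptx_source):
    """Count ppt/slides/slideN.xml entries in a .pptx (path or file-like object)."""
    with zipfile.ZipFile(pptx_source) as archive:
        return sum(1 for name in archive.namelist() if _SLIDE_PART.match(name))


# Build a tiny stand-in archive in memory; it is not a valid presentation,
# it only mimics the naming scheme of the slide parts.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("ppt/slides/slide1.xml", "<slide/>")
    archive.writestr("ppt/slides/slide2.xml", "<slide/>")
    archive.writestr("ppt/slides/_rels/slide1.xml.rels", "<rels/>")
    archive.writestr("ppt/presentation.xml", "<presentation/>")
buffer.seek(0)
print(count_slide_parts(buffer))  # 2
```

Applied to the downloaded files, `sum(count_slide_parts(p) for p in pptx_files)` could be compared against `total_slides`.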