# Plot number of Zenodo links over time

In this notebook, we analyze the CSV files in the `download_statistics` folder. Each file is named after the date on which its data was created, in `YYYYMMDD` format (e.g., `20240628.csv`). We count the records in each file and plot how that count changes over time.

## Step 1: Import required libraries

In [1]:
import os
import pandas as pd
import matplotlib.pyplot as plt
import warnings

# Suppress matplotlib UserWarnings to keep the notebook output readable
warnings.filterwarnings("ignore", category=UserWarning, module="matplotlib")

## Step 2: Define the folder containing the CSV files

In [2]:
data_folder = "download_statistics"

# Create the folder if it doesn't exist
if not os.path.exists(data_folder):
    os.makedirs(data_folder)
    print(f"Folder '{data_folder}' was created. Please add CSV files to this folder and re-run the notebook.")

## Step 3: List all CSV files and extract their dates

In [3]:
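# Collect the CSV files; each file name (without extension) is a YYYYMMDD date
file_list = sorted(f for f in os.listdir(data_folder) if f.endswith(".csv"))

# Warn if no files are found; the later cells are skipped when file_list is empty
if not file_list:
    print(f"No CSV files found in the '{data_folder}' folder. Please add some files and re-run the notebook.")

dates, record_counts = [], []

for file in file_list:
    # Parse the date from the file name, e.g. "20240628.csv" -> 2024-06-28
    date = os.path.splitext(file)[0]
    dates.append(pd.to_datetime(date, format="%Y%m%d"))

    # Count the data rows (records) in the file; the header row is not counted
    file_path = os.path.join(data_folder, file)
    df = pd.read_csv(file_path)
    record_counts.append(len(df))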

No CSV files found in the 'download_statistics' folder. Please add some files and re-run the notebook.
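
The loop above assumes that every CSV file name is a valid `YYYYMMDD` date. As a minimal sketch of a more defensive variant, the helper below (the name `parse_snapshot_date` is hypothetical and not part of the notebook) returns `None` for names that do not match, so such files could be skipped instead of raising an error. It uses only the standard library.

```python
from datetime import date, datetime
from pathlib import Path
from typing import Optional


def parse_snapshot_date(filename: str) -> Optional[date]:
    """Return the date encoded in a name like '20240628.csv', or None.

    Assumes the file stem follows the YYYYMMDD convention described above.
    """
    stem = Path(filename).stem  # '20240628.csv' -> '20240628'
    try:
        return datetime.strptime(stem, "%Y%m%d").date()
    except ValueError:
        # The stem is not a valid YYYYMMDD date (e.g. 'readme.csv')
        return None


print(parse_snapshot_date("20240628.csv"))  # 2024-06-28
print(parse_snapshot_date("readme.csv"))    # None
```

Whether to skip or report non-matching files is a design choice; skipping keeps the plot robust when unrelated CSV files end up in the folder.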


## Step 4: Create a DataFrame to hold the results

In [4]:
if file_list:
    # One row per file, ordered chronologically
    results = pd.DataFrame({
        "Date": dates,
        "Record_Count": record_counts
    }).sort_values(by="Date").reset_index(drop=True)

    # Save the summary table next to the notebook
    results.to_csv("zenodo_links_over_time.csv", index=False)
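
To illustrate the shape of the resulting table, here is a small sketch with made-up values (the dates and counts are hypothetical, not real download statistics). It builds the same two columns, sorts them chronologically, and adds an illustrative `Change` column that is not part of the notebook's output.

```python
import pandas as pd

# Hypothetical snapshot dates and record counts, deliberately out of order
sample = pd.DataFrame({
    "Date": pd.to_datetime(["20240705", "20240628", "20240712"], format="%Y%m%d"),
    "Record_Count": [130, 120, 145],
})

# Sort chronologically and renumber the rows from 0
sample = sample.sort_values(by="Date").reset_index(drop=True)

# Illustrative extra column: difference to the previous snapshot (NaN for the first row)
sample["Change"] = sample["Record_Count"].diff()

print(sample)
```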

## Step 5: Plot the number of records over time

In [5]:
if file_list:
    # Line plot with one marker per CSV file
    plt.figure(figsize=(10, 6))
    plt.plot(results["Date"], results["Record_Count"], marker="o", linestyle="-", color="b")
    plt.xlabel("Date")
    plt.ylabel("Number of Records")
    plt.title("Number of Zenodo Links Over Time")
    plt.grid(True)
    plt.tight_layout()

    # Save the figure next to the notebook
    plt.savefig("zenodo_links_over_time.png")
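
When many dates are plotted, the tick labels on the x-axis can overlap. The following sketch shows one possible way to format and rotate them using `matplotlib.dates.DateFormatter` and `Figure.autofmt_xdate`. The data is hypothetical, the figure is written to the system's temporary directory rather than next to the notebook, and the `Agg` backend is selected only so the example runs without a display; inside the notebook that line would likely be unnecessary.

```python
import os
import tempfile
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical data standing in for results["Date"] and results["Record_Count"]
example_dates = [datetime(2024, 6, 28), datetime(2024, 7, 5), datetime(2024, 7, 12)]
example_counts = [120, 130, 145]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(example_dates, example_counts, marker="o", linestyle="-", color="b")
ax.set_xlabel("Date")
ax.set_ylabel("Number of Records")
ax.grid(True)

# Show tick labels as YYYY-MM-DD and rotate them to avoid overlap
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()

# Write the example figure to a temporary location
output_path = os.path.join(tempfile.gettempdir(), "example_links_over_time.png")
fig.savefig(output_path)
plt.close(fig)
```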