# Plot Number of Zenodo Links Over Time

In this notebook, we analyze the CSV files in the `../download_statistics` folder. Each file is named after a date. We take the date from each filename, count the records in each file, and plot the number of records over time.

In [1]:
# Import required libraries
import os
import pandas as pd
import matplotlib.pyplot as plt

## Verify and List CSV Files
Ensure the `../download_statistics` folder exists and list all CSV files within it.

In [2]:
# Folder path
folder_path = "../download_statistics"

# Check if folder exists
if not os.path.exists(folder_path):
    raise FileNotFoundError(f"Folder '{folder_path}' does not exist. Please create it and add CSV files.")

# List CSV files, sorted by name so each file stays paired with its date
# (name order is chronological only for ISO-style names such as YYYY-MM-DD)
csv_files = sorted(f for f in os.listdir(folder_path) if f.endswith(".csv"))
dates = [os.path.splitext(f)[0] for f in csv_files]  # Extract date from filenames
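
Sorting filename strings gives chronological order only when the names are ISO-style dates. The following is a minimal sketch of an alternative that sorts on the parsed date instead, assuming each file stem is a date string that pandas can parse. `list_dated_csvs` is a hypothetical helper for illustration, not part of this notebook.

```python
from pathlib import Path

import pandas as pd


def list_dated_csvs(folder):
    """Return (date, path) pairs for the CSV files in `folder`, oldest first."""
    pairs = []
    for path in Path(folder).glob("*.csv"):
        # Assumption: the file stem (name without ".csv") is a parseable date
        date = pd.to_datetime(path.stem, errors="coerce")
        if pd.isna(date):
            continue  # skip files whose names are not dates
        pairs.append((date, path))
    # Sort on the parsed date rather than on the raw filename string
    return sorted(pairs, key=lambda pair: pair[0])
```

Keeping each date and path together in one tuple means the two can never drift out of step, which is the main risk when dates and filenames are held in separate lists and sorted independently.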

## Count Records in Each CSV File
Count how many rows (records) exist in each file and store this along with its corresponding date.

In [3]:
record_counts = []

for file in csv_files:
    file_path = os.path.join(folder_path, file)
    data = pd.read_csv(file_path)
    record_counts.append(len(data))

# Combine dates and record counts into a DataFrame
time_data = pd.DataFrame({"Date": pd.to_datetime(dates), "RecordCount": record_counts})
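
Note that `len(pd.read_csv(...))` counts data rows only, because pandas treats the first line as a header by default. A small sketch of that behaviour, using made-up in-memory contents (the column names and values are illustrative assumptions, not the real file layout):

```python
import io

import pandas as pd

# Hypothetical file contents: one header line followed by three records
sample = "link,title\na,first\nb,second\nc,third\n"

with_header = pd.read_csv(io.StringIO(sample))                   # first line is the header
without_header = pd.read_csv(io.StringIO(sample), header=None)   # first line is data

print(len(with_header))     # 3
print(len(without_header))  # 4
```

If the files in `../download_statistics` had no header line, the default call would undercount each file by one, so `header=None` would be the appropriate choice in that case.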

## Plot the Number of Records Over Time
Plot the number of records against the date and save the figure as a PNG file.

In [4]:
plt.figure(figsize=(10, 6))
plt.plot(time_data["Date"], time_data["RecordCount"], marker="o", linestyle="-", color="b")
plt.title("Number of Zenodo Links Over Time")
plt.xlabel("Date")
plt.ylabel("Number of Records")
plt.grid()

# Save the plot to a PNG file
plot_path = "zenodo_links_over_time.png"
plt.savefig(plot_path)
plt.close()
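
Date labels on the x-axis can overlap when there are many files. The sketch below shows one possible variant using matplotlib's object-oriented interface with `fig.autofmt_xdate()`. The `demo` table and the `demo_plot.png` filename are made-up stand-ins for illustration, not the notebook's data.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; only needed when no display is available
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for time_data, with made-up values
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01"]),
    "RecordCount": [10, 12, 15],
})

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(demo["Date"], demo["RecordCount"], marker="o", linestyle="-", color="b")
ax.set_xlabel("Date")
ax.set_ylabel("Number of Records")
ax.grid(True)
fig.autofmt_xdate()   # rotate and right-align the date tick labels
fig.tight_layout()    # trim excess whitespace around the axes
fig.savefig("demo_plot.png", dpi=150)
plt.close(fig)        # release the figure's memory
```

Working with an explicit `fig` and `ax` avoids relying on pyplot's implicit "current figure", which helps when a notebook produces more than one plot.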

## Save Processed Data to CSV
Export the table of dates and record counts to a CSV file for further analysis.

In [5]:
output_csv_path = "zenodo_links_over_time.csv"
time_data.to_csv(output_csv_path, index=False)
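
When the exported file is read back later, the `Date` column comes back as plain strings unless it is parsed explicitly. A minimal round-trip sketch, using an in-memory buffer and made-up values in place of the real file:

```python
import io

import pandas as pd

# Hypothetical stand-in for time_data, with made-up values
table = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-01", "2024-02-01"]),
    "RecordCount": [10, 12],
})

buffer = io.StringIO()
table.to_csv(buffer, index=False)  # same call as above, written to memory
buffer.seek(0)

# parse_dates restores the datetime dtype of the Date column
restored = pd.read_csv(buffer, parse_dates=["Date"])
print(restored.dtypes)
```

For the real file, the same `parse_dates=["Date"]` argument would be passed along with `output_csv_path` instead of the buffer.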