# Plot Number of Zenodo Links Over Time

This notebook analyzes the CSV files in the `../download_statistics` folder. Each file holds a set of records (Zenodo links), and its filename, without the `.csv` extension, is a date. We'll extract the dates, count the records in each file, and plot the number of records over time.

```python
# Import required libraries
import os
import pandas as pd
import matplotlib.pyplot as plt
```

## Check the existence of the folder
Ensure the `../download_statistics` folder exists. If it doesn't, raise an error.

```python
# Folder containing the dated CSV files
folder_path = "../download_statistics"

# Verify that the path exists and is a directory (os.listdir fails on a regular file)
if not os.path.isdir(folder_path):
    raise FileNotFoundError(f"Folder '{folder_path}' does not exist. Please create it and add CSV files.")
```
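For comparison, the existence check and the CSV listing can also be combined with `pathlib`. The sketch below is a minimal illustration, not part of the original notebook: `list_csv_files` is a hypothetical helper name, and the filenames in the demo are made up. It rejects paths that exist but are not directories, which `os.listdir` could not read anyway.

```python
import tempfile
from pathlib import Path


def list_csv_files(folder: str) -> list[str]:
    """Return the sorted names of the .csv files in `folder` (hypothetical helper).

    Raises FileNotFoundError if `folder` is missing or is not a directory.
    """
    path = Path(folder)
    if not path.is_dir():
        raise FileNotFoundError(f"Folder '{folder}' does not exist or is not a directory.")
    # Keep regular files only; a subdirectory named "x.csv" is skipped
    return sorted(p.name for p in path.glob("*.csv") if p.is_file())


# Demo on a throwaway directory with illustrative filenames
with tempfile.TemporaryDirectory() as tmp:
    for name in ("2024-02-15.csv", "2024-01-15.csv", "notes.txt"):
        (Path(tmp) / name).write_text("id\n1\n", encoding="utf-8")
    found = list_csv_files(tmp)
    try:
        list_csv_files(str(Path(tmp) / "notes.txt"))  # a file, not a folder
        rejected_file = False
    except FileNotFoundError:
        rejected_file = True

print(found)          # ['2024-01-15.csv', '2024-02-15.csv']
print(rejected_file)  # True
```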

## List all CSV files and sort them by date
List the CSV files in the folder, extract the date from each filename, and sort the files by those dates.

```python
# Collect the CSV files; each filename without ".csv" is expected to be a date
csv_files = [f for f in os.listdir(folder_path) if f.endswith(".csv")]

# Parse the dates so files sort chronologically, not as plain strings
dated_files = sorted((pd.to_datetime(os.path.splitext(f)[0]), f) for f in csv_files)
dates = [date for date, _ in dated_files]
csv_files = [file for _, file in dated_files]
```
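Sorting the raw filename strings is only chronological when the names use a zero-padded year-month-day format such as ISO 8601 (`2024-01-15`). The filenames below are hypothetical and use day-month-year, to show how a plain string sort goes wrong and how sorting on parsed dates fixes it; the `%d-%m-%Y` format and the `file_date` helper are assumptions for this example only.

```python
from datetime import datetime

# Hypothetical filenames in a day-month-year format (not ISO 8601)
files = ["15-01-2024.csv", "02-03-2023.csv", "28-12-2023.csv"]

# A string sort compares character by character, so the day decides the order
by_string = sorted(files)


def file_date(name: str) -> datetime:
    """Parse the filename stem with an assumed %d-%m-%Y format."""
    stem = name.rsplit(".", 1)[0]
    return datetime.strptime(stem, "%d-%m-%Y")


# Sorting on the parsed date gives true chronological order
by_date = sorted(files, key=file_date)

print(by_string)  # ['02-03-2023.csv', '15-01-2024.csv', '28-12-2023.csv']
print(by_date)    # ['02-03-2023.csv', '28-12-2023.csv', '15-01-2024.csv']
```

If the real filenames are not ISO formatted, passing an explicit `format=` to `pd.to_datetime` avoids relying on pandas to guess whether the day or the month comes first.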

## Count the number of records in each file
Read each file in the sorted list, count its rows, and store the results in a DataFrame.

```python
# Count records for each file
record_counts = []

for file in csv_files:
    file_path = os.path.join(folder_path, file)
    data = pd.read_csv(file_path)
    record_counts.append(len(data))

# Combine dates with record counts into a DataFrame
time_data = pd.DataFrame({"Date": pd.to_datetime(dates), "RecordCount": record_counts})
```
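`pd.read_csv` loads each whole file into memory just to take its length. If the files grow large, a streaming count with the standard `csv` module is a lighter alternative. This is a sketch assuming every file is UTF-8 with exactly one header row; `count_records` is a hypothetical name and the sample file content is made up. Using `csv.reader` rather than counting raw lines means a quoted field that spans several lines is still counted as one record, and blank lines are skipped, as `pd.read_csv` does by default.

```python
import csv
import os
import tempfile


def count_records(path: str) -> int:
    """Count data rows in a CSV file, excluding one header row (hypothetical helper).

    Streams the file row by row instead of building a DataFrame. Blank lines
    are skipped, mirroring pandas' default skip_blank_lines=True.
    """
    with open(path, newline="", encoding="utf-8") as fh:
        rows = (row for row in csv.reader(fh) if row)
        next(rows, None)  # drop the header row, if the file has one
        return sum(1 for _ in rows)


# Demo with an illustrative file: one quoted field spans two lines, plus a blank line
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "2024-01-15.csv")
    with open(sample, "w", newline="", encoding="utf-8") as fh:
        fh.write('id,title\n1,"first line\nsecond line"\n2,b\n\n3,c\n')
    n_records = count_records(sample)

print(n_records)  # 3
```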

## Save the processed data to a CSV file
Write `time_data` to `zenodo_links_over_time.csv` in the notebook's working directory so the counts can be reused without re-reading every file.

```python
# Save data to CSV
output_csv_path = "zenodo_links_over_time.csv"
time_data.to_csv(output_csv_path, index=False)
```
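Because `to_csv` writes the dates as plain text, reading the file back later returns strings unless the column is parsed again. A minimal round-trip sketch, using an in-memory buffer and made-up counts in place of the real output file:

```python
import io

import pandas as pd

# Illustrative data with the same columns as time_data (values are made up)
sample = pd.DataFrame(
    {"Date": pd.to_datetime(["2024-01-15", "2024-02-15"]), "RecordCount": [120, 135]}
)

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores a datetime column on reload
reloaded = pd.read_csv(buffer, parse_dates=["Date"])
print(reloaded.dtypes)
```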

## Plot the number of records over time
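Plot the record counts over time as a line chart with point markers and save it as `zenodo_links_over_time.png`. The figure is closed after saving, so it is not displayed inline.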

```python
# Plot record counts against their dates
plt.figure(figsize=(10, 6))
plt.plot(time_data["Date"], time_data["RecordCount"], marker="o", color="b")
plt.title("Number of Zenodo Links Over Time")
plt.xlabel("Date")
plt.ylabel("Number of Records")
plt.grid()
plt.gcf().autofmt_xdate()  # rotate date labels so they do not overlap
plt.tight_layout()

# Save the plot as a PNG file, then close the figure to free memory
plot_path = "zenodo_links_over_time.png"
plt.savefig(plot_path)
plt.close()
```
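An alternative is matplotlib's object-oriented interface with a date-aware tick formatter. This sketch uses made-up values and a hypothetical output path in the system temp directory; `AutoDateLocator` and `ConciseDateFormatter` from `matplotlib.dates` (matplotlib 3.1 and later) pick tick spacing and keep the date labels short. The `matplotlib.use("Agg")` line is only there so the sketch runs headless as a standalone script; drop it inside the notebook.

```python
import os
import tempfile
from datetime import date

import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Illustrative values only; in the notebook these come from time_data
dates = [date(2024, 1, 15), date(2024, 2, 15), date(2024, 3, 15)]
counts = [120, 135, 150]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(dates, counts, marker="o", color="b")

# Let matplotlib choose tick spacing and print compact date labels
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))

ax.set_title("Number of Zenodo Links Over Time")
ax.set_xlabel("Date")
ax.set_ylabel("Number of Records")
ax.grid(True)
fig.tight_layout()

# Hypothetical output location for the sketch
out_path = os.path.join(tempfile.gettempdir(), "zenodo_links_sketch.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
```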