# Plot number of Zenodo links over time

This notebook processes the CSV files in the `download_statistics` folder. Each file is a snapshot for one date, named `YYYYMMDD.csv`, with one record per row. We count the rows in each file and plot how the number of records changes over time.

## Step 1: Import necessary libraries
We use `pandas` for data processing and `matplotlib` for plotting.

In [1]:
import os
import pandas as pd
import matplotlib.pyplot as plt

## Step 2: Define the path to the `download_statistics` folder
Set the relative path where the files are located.

In [2]:
# Relative path to the data folder
data_folder = '../download_statistics'

## Step 3: Get a list of all `.csv` files in the folder
The part of each filename before the `.csv` extension is the snapshot date in `YYYYMMDD` format, so we strip the extension to get the dates.

In [3]:
# List all CSV files in the folder
file_names = [f for f in os.listdir(data_folder) if f.endswith('.csv')]

# Extract dates from filenames
dates = [f.split('.')[0] for f in file_names]
print("Found files:", file_names)

Found files: ['20241112.csv', '20250415.csv', '20250107.csv', '20241022.csv', '20250311.csv', '20250128.csv', '20250520.csv', '20241001.csv', '20250211.csv', '20250225.csv', '20250325.csv', '20240628.csv', '20250218.csv', '20250527.csv', '20241210.csv', '20240809.csv', '20250408.csv', '20250114.csv', '20250429.csv', '20250318.csv', '20241105.csv', '20240826.csv', '20250204.csv', '20250506.csv', '20250401.csv', '20241015.csv', '20250610.csv', '20241224.csv', '20241203.csv', '20250603.csv', '20241119.csv', '20241217.csv', '20240910.csv', '20241029.csv', '20240924.csv', '20241008.csv', '20250121.csv', '20241231.csv', '20240711.csv', '20250422.csv', '20250513.csv', '20250304.csv', '20241126.csv', '20240917.csv', '20240903.csv']
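
The files come back in no particular order, since `os.listdir` makes no ordering guarantee, and `f.split('.')[0]` would accept any CSV name, dated or not. A minimal sketch of a stricter variant follows, assuming every snapshot really is named `YYYYMMDD.csv`; the helper `parse_snapshot_dates` and the pattern `DATE_FILE_PATTERN` are hypothetical names, not part of the notebook.

```python
import re
from datetime import datetime

# Assumed naming scheme: eight digits followed by ".csv", e.g. "20241112.csv"
DATE_FILE_PATTERN = re.compile(r"^(\d{8})\.csv$")

def parse_snapshot_dates(file_names):
    """Return (date, file_name) pairs sorted by date, skipping non-matching names."""
    pairs = []
    for name in file_names:
        match = DATE_FILE_PATTERN.match(name)
        if match is None:
            continue  # not a dated snapshot, e.g. a stray "notes.csv"
        try:
            date = datetime.strptime(match.group(1), "%Y%m%d")
        except ValueError:
            continue  # eight digits that do not form a valid calendar date
        pairs.append((date, name))
    return sorted(pairs)

# Three of the filenames listed above, in their unsorted order
example = parse_snapshot_dates(['20241112.csv', '20250415.csv', '20240628.csv'])
print([name for _, name in example])
```

Sorting here is optional, because Step 4 sorts the final table by date anyway; the main benefit is that unexpected files fail quietly at this stage instead of raising inside `pd.to_datetime` later.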


## Step 4: Load record counts for each file
Read each CSV file and count its rows, then sort the counts by date, since the files are listed in no particular order.

In [4]:
record_counts = []

for file, date in zip(file_names, dates):
    file_path = os.path.join(data_folder, file)
    df = pd.read_csv(file_path)
    record_counts.append({'date': pd.to_datetime(date, format='%Y%m%d'), 'count': len(df)})

record_counts_df = pd.DataFrame(record_counts).sort_values(by='date')
print(record_counts_df.head())

         date  count
11 2024-06-28     34
38 2024-07-11     34
15 2024-08-09     37
21 2024-08-26     40
44 2024-09-03     43
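
Since the goal is to see how the count changes, it can help to look at the difference between consecutive snapshots as well as the totals. The sketch below is an illustration only: it re-enters the five rows printed above by hand into a DataFrame called `sample` (a hypothetical name) and adds a `change` column with `Series.diff`.

```python
import pandas as pd

# The five rows printed above, re-entered by hand for illustration
sample = pd.DataFrame({
    'date': pd.to_datetime(['2024-06-28', '2024-07-11', '2024-08-09',
                            '2024-08-26', '2024-09-03']),
    'count': [34, 34, 37, 40, 43],
})

# Change relative to the previous snapshot; the first row has no
# predecessor, so its NaN is replaced with 0
sample['change'] = sample['count'].diff().fillna(0).astype(int)
print(sample)
```

The same two lines would work on `record_counts_df`, provided it is already sorted by date as in the cell above.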


## Step 5: Plot the number of records over time
Create a line plot to visualize how the number of records changes over time and save the plot to a `.png` file.

In [5]:
plt.figure(figsize=(8, 6))
plt.plot(record_counts_df['date'], record_counts_df['count'], marker='o', linestyle='-', color='b')
plt.title('Number of Zenodo Links Over Time')
plt.xlabel('Date')
plt.ylabel('Record Count')
plt.grid()
plt.tight_layout()

# Save the plot next to the data, reusing data_folder from Step 2
output_plot_path = os.path.join(data_folder, 'zenodo_links_over_time.png')
plt.savefig(output_plot_path)
plt.close()

print(f"Plot saved to {output_plot_path}")

Plot saved to ../download_statistics/zenodo_links_over_time.png
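
With weekly snapshots spread over roughly a year, the default date ticks can get crowded. One possible refinement, sketched here under the assumption that one tick per month is wanted, uses `matplotlib.dates` to control tick placement and label format. The sample values are the five rows printed in Step 4; `sample_dates` and `sample_counts` are hypothetical names, and saving the figure is left out.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime

# The five rows printed in Step 4, re-entered by hand for illustration
sample_dates = [datetime(2024, 6, 28), datetime(2024, 7, 11),
                datetime(2024, 8, 9), datetime(2024, 8, 26),
                datetime(2024, 9, 3)]
sample_counts = [34, 34, 37, 40, 43]

fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(sample_dates, sample_counts, marker='o', linestyle='-', color='b')

# One major tick per month, labelled as year-month
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
fig.autofmt_xdate()  # rotate the labels so they do not overlap

ax.set_title('Number of Zenodo Links Over Time')
ax.set_xlabel('Date')
ax.set_ylabel('Record Count')
ax.grid()
fig.tight_layout()
plt.close(fig)
```

The object-oriented `fig, ax` style is used here only because the tick locator and formatter are set on the axis object; the `plt.*` calls in the cell above produce the same kind of figure.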
