This script automates the process of exporting full tables from a Datasette instance by downloading them in CSV format using the `_stream=on` feature, which bypasses default row limits and retrieves the entire dataset. It is particularly useful for obtaining complete datasets from the [planning.data.gov.uk](https://datasette.planning.data.gov.uk/) performance database.

### Detailed Overview:

1. **Command-Line Interface**:
   The script uses `argparse` to accept a required `--output-dir` argument, specifying where the CSV files will be saved.

2. **Datasette Table Downloads**:
   A dictionary of table names and corresponding Datasette URLs is defined in the script (`tables`). For each entry:
   - The full CSV URL is constructed by appending `.csv?_stream=on` to the base URL.
   - The script attempts to load the data using `pandas.read_csv()` and saves it to the specified directory under the given table name.

3. **Error Handling**:
   If any request or file operation fails (e.g. network issues, bad URLs), an error message is printed, allowing the user to identify which tables couldn't be downloaded.

4. **Use Case**:
   This is useful when exporting tables that are too large to download via the web UI or API defaults (which often cap rows at 1000), ensuring full data extraction for offline analysis or archiving.

In [1]:
import pandas as pd
import os
import argparse

def full_datasette_table(tables, output_dir):
    """
    Downloads full tables from Datasette in CSV format using streaming.

    Args:
        tables (dict): A dictionary where keys are table names and values are their Datasette URLs.
        output_dir (str): The directory to save the exported CSV files.
    """
    os.makedirs(output_dir, exist_ok=True)  # Ensure output directory exists

    for name, url in tables.items():
        full_url = f"{url}.csv?_stream=on"  # Enable full streaming of rows
        try:
            df = pd.read_csv(full_url)  # Load full dataset
            csv_name = f"{name}.csv"
            save_path = os.path.join(output_dir, csv_name)
            df.to_csv(save_path, index=False)  # Save to CSV without index
            print(f"Saved: {save_path}")
        except Exception as e:
            print(f"[ERROR] Failed to fetch {name}: {e}")

def parse_args():
    """
    Parses command-line arguments for specifying the output directory.

    Returns:
        argparse.Namespace: Parsed arguments containing the output directory path.
    """
    parser = argparse.ArgumentParser(description="Datasette batch exporter")
    parser.add_argument(
        "--output-dir",
        type=str,
        required=True,
        help="Directory to save exported CSVs"
    )
    return parser.parse_args()

if __name__ == "__main__":
    # Parse command-line arguments
    #args = parse_args()

    # Dictionary of table names and their Datasette URLs
    tables = {
        "endpoint_dataset_issue_type_summary":
            "https://datasette.planning.data.gov.uk/performance/endpoint_dataset_issue_type_summary"
    }

    # Run export
    output_dir = "C:/Users/DanielGodden/Documents/MCHLG/collecting_and_managing_data"
    full_datasette_table(tables, output_dir)#args.output_dir)


Saved: C:/Users/DanielGodden/Documents/MCHLG/collecting_and_managing_data\endpoint_dataset_issue_type_summary.csv
