This code connects to the mapMECFS CKAN DataStore, retrieves data for a given resource_id, and returns it as a Pandas DataFrame.

CKAN is a platform for managing and publishing data. The DataStore is a feature that allows querying data directly via an API.

For more information on CKAN DataStore, visit: https://docs.ckan.org/en/latest/maintaining/datastore.html#
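
To make the API idea concrete, here is a minimal sketch of what a single `datastore_search` query looks like as a URL. It builds the URL only and sends nothing. The helper name `build_search_url` and the placeholder resource_id are assumptions made for illustration; the endpoint is the one used by the code in this notebook.

```python
from urllib.parse import urlencode

# Endpoint used by the code in this notebook.
BASE_URL = "https://mapmecfs.org/api/3/action/datastore_search"

def build_search_url(resource_id, limit=5, offset=0):
    """Return the full datastore_search URL for one page of records.

    Hypothetical helper for illustration; it only formats the query string.
    """
    query = urlencode({"resource_id": resource_id, "limit": limit, "offset": offset})
    return f"{BASE_URL}?{query}"

# "your-resource-id" is a placeholder, not a real resource.
url = build_search_url("your-resource-id", limit=5)
print(url)
```

Passing the same three values as `params` to `requests.get` should produce an equivalent query string, so the URL never has to be assembled by hand.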

Here's what the code does:
1. Defines a function `get_resource_data` that:
   - Takes a CKAN resource_id, a page_size (how many records to fetch per request), and your mapMECFS API key for authorization.
   - Uses the mapMECFS CKAN DataStore API endpoint (`datastore_search`) to get data in batches (pagination) until no more records remain.
2. For each batch, it:
   - Constructs a URL with the resource_id, limit (page_size), and offset (starting point for fetching records).
   - Sends an HTTP GET request with the provided API key and a User-Agent header.
   - Checks if the response is valid and contains JSON data.
   - Checks if the returned JSON is successful and includes a 'records' field inside 'result'.
   - Converts the list of records into a Pandas DataFrame.
   - Keeps requesting the next batch by increasing the offset until no records remain.
3. Finally, it combines all pages of data into a single DataFrame and returns it.
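
The limit/offset loop described in the steps above can be sketched without any network access. In this sketch, `fetch_all`, `fetch_page`, and the in-memory `fake_table` are hypothetical stand-ins: `fetch_page` plays the role of the HTTP request to `datastore_search`, and the fake table plays the role of the remote resource.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect records page by page until an empty page is returned.

    fetch_page is a hypothetical callable taking (limit, offset) and
    returning a list of records.
    """
    offset = 0
    all_records = []
    while True:
        page = fetch_page(page_size, offset)
        if not page:             # an empty page means no records remain
            break
        all_records.extend(page)
        offset += page_size      # move to the start of the next page
    return all_records

# Stand-in data source: 250 fake records held in memory (illustration only).
fake_table = [{"_id": i} for i in range(1, 251)]

def fake_fetch_page(limit, offset):
    """Return one slice of the fake table, like one page from the API."""
    return fake_table[offset:offset + limit]

records = fetch_all(fake_fetch_page, page_size=100)
print(len(records))  # 250
```

With 250 records and a page size of 100, the loop asks for offsets 0, 100, 200, and 300. The last request returns an empty page, which is what ends the loop.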

<b>Note</b>:
- You must have a valid mapMECFS API key for the CKAN instance if the data is not publicly accessible.
- Adjust the resource_id and API key before running.
- Install required packages if you haven't already: `!pip install requests pandas`
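
One way to avoid pasting the API key directly into the notebook is to read it from an environment variable. This is a sketch under assumptions, not part of the original code: the variable name `MAPMECFS_API_KEY` and the helper `load_api_key` are hypothetical choices for this example.

```python
import os

# MAPMECFS_API_KEY is a hypothetical variable name chosen for this example.
def load_api_key(env_var="MAPMECFS_API_KEY"):
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

The returned value could then be passed along as `api_key=load_api_key()`, which keeps the key out of any notebook that is shared or committed.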

In [None]:
pip install requests pandas

In [None]:
import requests
import pandas as pd

def get_resource_data(resource_id, page_size=100, api_key=None):
    """
    Fetch all records for a given CKAN resource_id from the DataStore.

    Parameters:
    - resource_id (str): The unique identifier of the CKAN resource.
    - page_size (int): Number of records to fetch per request. Defaults to 100.
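    - api_key (str): Your mapMECFS (CKAN) API key. Required by this function;
      a ValueError is raised if it is None.

    Returns:
    - combined_data (pd.DataFrame): A DataFrame containing all retrieved records,
      plus a resource_id column identifying the source resource.
      If no data is retrieved, returns an empty DataFrame.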
    """

    # This function always requires an API key, so fail early if none was given
    if api_key is None:
        raise ValueError("Please provide a valid API key if the data is not public.")

    offset = 0
    resource_records = []
    base_url = "https://mapmecfs.org/api/3/action/datastore_search"

    while True:
        # Prepare the parameters for the API request:
        # - resource_id: which dataset resource to fetch
        # - limit: how many records per batch
        # - offset: where to start fetching data from
        params = {
            "resource_id": resource_id,
            "limit": page_size,
            "offset": offset
        }

        # The headers contain:
        # - Authorization: The API key, which might be required depending on the dataset's access settings.
        # - User-Agent: A browser-like user-agent string to help ensure we get a proper response.
        headers = {
            "Authorization": api_key,
            "User-Agent": "Mozilla/5.0"
        }

        # Make the HTTP GET request to the DataStore API endpoint.
        # The timeout keeps the loop from hanging forever on a stalled connection.
        response = requests.get(base_url, headers=headers, params=params, timeout=30)

        # Check for a successful HTTP response and JSON content
        if response.status_code != 200 or "application/json" not in response.headers.get('Content-Type', ''):
            print(f"Failed to retrieve data for resource_id: {resource_id}")
            print("Status code:", response.status_code)
            print("Response content:\n", response.text)
            break

        # Parse the JSON response into a Python dictionary
        result_list = response.json()

        # 'success' is a field in CKAN responses indicating if the request succeeded
        if not result_list.get("success", False):
            print(f"Request not successful for resource_id: {resource_id}")
            print("Response content:\n", response.text)
            break

        # 'records' should be present inside 'result' if data is available
        result = result_list.get("result", {})
        if "records" not in result:
            print(f"No records field found for resource_id: {resource_id}")
            print("Response content:\n", response.text)
            break

        records = result["records"]

        # If no records are returned, it means we've reached the end of the data
        if len(records) == 0:
            break

        # Convert the records (list of dicts) into a Pandas DataFrame for easy data manipulation
        df = pd.DataFrame(records)
        # Add the resource_id to track which resource these records came from
        df["resource_id"] = resource_id

        # Store the DataFrame in our list to combine later
        resource_records.append(df)

        # Increase offset to fetch the next batch of records in the next loop iteration
        offset += page_size

    # Combine all fetched DataFrames into one
    if len(resource_records) > 0:
        combined_data = pd.concat(resource_records, ignore_index=True)
    else:
        combined_data = pd.DataFrame()
        print(f"No data retrieved from resource_id: {resource_id}")

    return combined_data

# Example usage:
# Replace <Insert API Key> with your actual mapMECFS API key. get_resource_data raises a
# ValueError when api_key is None, so a key must be passed even if the data is public.
resource_id = "fcac7a0d-2325-4cb8-9bee-f0ee3fb29435"  # Example resource_id used for testing
api_key = "<Insert API Key>"
df = get_resource_data(resource_id, page_size=100, api_key=api_key)
df.head()