# Story protector
This notebook crawls a story for the items it references (maps, dashboards, scenes, and their content) and applies delete protection to each item that belongs to your organization. It can also update the sharing level of those items in bulk.

## How to run
1. Provide the `itemId` of your story to the `story_id` parameter below.
2. Configure `delete_protect` to set whether you would like to apply **delete protection** to the story and all of the content items found within it. **True** = protect items, **False** = leave unprotected.
3. Configure `share` to set whether you would like to perform a bulk update of the sharing permissions for the story and all of its content.
4. If `share` is set to **True**, set `share_level` to one of 'private', 'org', or 'public'.
5. Set `agoNotebook` to `False` if you are running this notebook outside of ArcGIS Online.
6. For easier viewing of the results, click 'View' > 'Collapse All Code' in the menu bar above. 
7. Once parameters have been configured, click 'Run' > 'Run All Cells' in the menu bar above.
8. Scroll down in the notebook and inspect the results.
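
Before running the remaining cells, it can help to sanity-check the parameters. The following is a minimal sketch and not part of the original notebook: `validate_parameters` is a hypothetical helper, and the ID check simply reuses the 32-character alphanumeric shape that the notebook's crawler searches for.

```python
import re

VALID_SHARE_LEVELS = ("private", "org", "public")

def validate_parameters(story_id, delete_protect, share, share_level):
    """Return a list of problems found in the notebook parameters (empty if none)."""
    problems = []
    if not re.fullmatch(r"[a-zA-Z0-9]{32}", story_id or ""):
        problems.append("story_id should be a 32-character alphanumeric item ID")
    if not isinstance(delete_protect, bool):
        problems.append("delete_protect should be True or False")
    if not isinstance(share, bool):
        problems.append("share should be True or False")
    # share_level only matters when share is True
    if share is True and share_level not in VALID_SHARE_LEVELS:
        problems.append(f"share_level should be one of {VALID_SHARE_LEVELS}")
    return problems

# An empty story_id is reported as a problem
print(validate_parameters("", True, False, "public"))
```

An empty list means the parameters look usable. It does not confirm that the story exists or that you have permission to update it.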


In [None]:
## These are the input parameters
story_id = '' # <-- Paste your story itemId here
delete_protect = True # <- toggle the delete protection ON (True) or OFF (False)
## If `share` below is False, sharing is left unchanged and `share_level` is ignored.
share = False # <- if you want to bulk share the content set this to True otherwise, False
share_level = 'public' # <- set this to ['private', 'org', or 'public']
agoNotebook = True # <- set this to False if running this Notebook outside of ArcGIS Online

## Script setup
These are functions that do smaller tasks within the main script. For instance, some crawl specific items like dashboards or web maps, and others crawl nested group layers within a web map.

Storing them here is just easier and makes bits of code re-usable.
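
One of these smaller tasks, pulling candidate item IDs out of an item's JSON, can be tried in isolation. The sketch below uses the same regular expression as the `find_all_possible_ids` helper in this notebook. The web map JSON and its IDs are made-up placeholders, not real items.

```python
import json
import re

# Same pattern as the notebook's find_all_possible_ids helper:
# 32 alphanumeric characters wrapped in quotes or slashes.
ID_PATTERN = re.compile(r"[\"\'\/]([a-zA-Z0-9]{32})[\"\'\/]")

# Hypothetical web map JSON; the IDs are placeholders for illustration only.
sample_webmap = {
    "operationalLayers": [
        {"title": "Parcels", "itemId": "0123456789abcdef0123456789abcdef"},
        {"title": "Roads", "itemId": "fedcba9876543210fedcba9876543210"},
    ],
    "baseMap": {"title": "Too short to match", "itemId": "abc123"},
}

# De-duplicate and sort the matches so the output is stable
found_ids = sorted(set(ID_PATTERN.findall(json.dumps(sample_webmap))))
print(found_ids)
```

The pattern matches any 32-character alphanumeric token, so some matches may not be item IDs at all. The notebook deals with this later by querying the portal for each candidate and keeping only those that return an item.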

### Import the packages

In [None]:
from arcgis.gis import GIS
from arcgis.gis import Item
import re # import regex
import time # used to pause between retries when fetching items
import pandas as pd
from typing import List, Set, Union

# Set Pandas dataframe display options
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_columns',500)

### Authenticate with ArcGIS Online

If you are running this notebook outside of ArcGIS Online, you will need to log in.
- An easy way to do that is with the Python keyring module.
- If the keyring module is not installed, install it from a command line: `pip install keyring`
- Before running this cell, open a command line window on your machine and run the command:
  - `python -m keyring set system <your_ago_username>`
  - If using Windows PowerShell, use `./python -m keyring set system <your_ago_username>`
- You will be prompted to enter your password.
- When you press Enter/Return, the password will be saved to your local credential store.

The code in the cell below then retrieves the password.

In [None]:
# Define the GIS
if not agoNotebook:
    import keyring
    service_name = "system" # Use the default local credential store
    success = False # Set initial state

    # Ask for the username
    while not success:
        username_for_keyring = input("Enter your ArcGIS Online username:") # If you are using VS Code, the text input dialog box appears at the top of the window
        # Get the credential object
        credential = keyring.get_credential(service_name, username_for_keyring)
        # Check if the username is in the credential store
        if credential is None:
            print(f"'{username_for_keyring}' is not in the local system's credential store. Try another username.")
        # Retrieve the password, login and set the GIS portal
        else:
            password_from_keyring = keyring.get_password(service_name, username_for_keyring)
            portal_url = 'https://www.arcgis.com'
            gis = GIS(portal_url, username=username_for_keyring, password=password_from_keyring)
            success = True
            # Print a success message with username and user's organization role
            print(f"Successfully logged in as: {gis.properties.user.username} (role: {gis.properties.user.role})")
else:
    gis = GIS("home")

### Helper functions

In [None]:
# Empty container to eventually hold all of the items found within the story
result_list = []

# Look up a single item ID and append the matching item record to itemList
def getResourceInfo(resourceId, itemList):
    query = f"id: {resourceId}"
    resource = gis.content.advanced_search(query=query, max_items=-1, as_dict=True)['results']
    if len(resource) > 0:
        itemList += resource

# Look up each found item ID and append the matching item records to item_list
def query_found_items(found_items, item_list):
    for item_id in found_items:
        query = f"id: {item_id}"
        item_results = gis.content.advanced_search(query=query, max_items=-1, as_dict=True)['results']
        if len(item_results) > 0:
            item_list += item_results
    return item_list
        

def get_item_data(item: Item):
    """
    Fetches data for a given ArcGIS item, handling different resource types.

    Args:
        item (Item): The ArcGIS item to fetch data for.

    Returns:
        tuple: A tuple containing the item data and any related data.
    """
    item_data = item.get_data(try_json=True)
    if item.type in ["StoryMap", "StoryMap Theme"]:
        # Should only be relevant to StoryMaps
        try:
            resources = [resource["resource"] for resource in item.resources.list()]
            has_published_data = "published_data.json" in resources
            draft_id = None
            for keyword in item.typeKeywords:
                if keyword.startswith("smdraftresourceid"):
                    draft_id = keyword.split(":")[1]
            if has_published_data and not draft_id:
                return (item.resources.get("published_data.json", try_json=True), None)
            elif draft_id and not has_published_data:
                return (item.resources.get(f"{draft_id}", try_json=True), None)
            elif draft_id and has_published_data:
                return (
                    item.resources.get(f"{draft_id}", try_json=True),
                    item.resources.get("published_data.json", try_json=True),
                )
            else:
                return (item.resources.get("draft.json", try_json=True), None)
        except Exception as e:
            return (None, None)
    return (item_data, None)

def find_all_possible_ids(json_string: str):
    """
    Extracts all possible item IDs from a JSON string using regex.

    Args:
        json_string (str): The JSON string to search for IDs.

    Returns:
        list: A list of found item IDs.
    """
    return re.findall(r"[\"\'\/]([a-zA-Z0-9]{32})[\"\'\/]", json_string)
    
def get_related_items_for_id(
    gis_con: GIS,
    item_id: str,
    relation_path: Union[List[str], None] = None,
    relations_in_process: Union[List[str], None] = None
):
    """
    Finds the IDs referenced by an item and, recursively, by the items it references.
    This function attempts to fetch an item by its ID up to three times. If successful, it scans the item's data for item IDs and crawls each new ID in turn. IDs that are already on the relation path or already visited are skipped, which guards against cyclic references and repeated work.
    Args:
        gis_con (GIS): The GIS connection object.
        item_id (str): The ID of the item to fetch and process.
        relation_path (List[str], optional): IDs that are already known, such as the ancestors of this item. Defaults to an empty list.
        relations_in_process (List[str], optional): IDs visited so far during this crawl. Defaults to an empty list.
    Returns:
        list: The related item IDs that were newly discovered.
    """
    # Create fresh lists per call; mutable default arguments would be shared between calls
    relation_path = [] if relation_path is None else relation_path
    relations_in_process = [] if relations_in_process is None else relations_in_process
    related_items = []
    relations_in_process.append(item_id)

    valid_item = None
    # Attempt to fetch the item up to 3 times
    for tries in range(3):
        try:
            valid_item = Item(gis_con, item_id)
            break  # Exit loop on successful fetch
        except Exception:
            if tries < 2:
                time.sleep(1)  # Adding delay before retry
    # Nothing to crawl if the item could not be fetched
    if valid_item is None:
        return related_items

    # Copy the current relation path and add this item for further processing
    new_relation_path = relation_path.copy()
    new_relation_path.append(valid_item.itemid)

    # Collect IDs from the item data and, for StoryMaps, from the second payload when present
    items_related_to_valid_item = set()
    for data in get_item_data(valid_item):
        if data:  # skip None and empty payloads
            items_related_to_valid_item.update(find_all_possible_ids(str(data)))

    # Iterate over each related ID found
    for related_id in items_related_to_valid_item:
        # Skip IDs that are already known or already visited (this also breaks cycles)
        if related_id in new_relation_path or related_id in relations_in_process:
            continue
        related_items.append(related_id)
        # Recursively call the function to find related items for each related ID
        related_items.extend(
            get_related_items_for_id(gis_con, related_id, new_relation_path, relations_in_process)
        )
    return related_items

## Content discovery
The cell below crawls the story data, then calls the helper functions defined above to crawl the contents of each item found within the story.

Once this block runs the script will return a table showing all the items found within the story.
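
The crawl can be pictured without a portal connection. The sketch below is an illustration only: it replaces the notebook's recursive helper with a breadth-first queue, and the `REFERENCES` dictionary is a hypothetical stand-in for "the IDs found in each item's JSON", with short names used in place of real item IDs.

```python
from collections import deque

# Hypothetical stand-in for the portal: each key is an item and each value
# lists the items referenced in its JSON.
REFERENCES = {
    "story": ["webmap", "dashboard"],
    "webmap": ["layer_a", "layer_b"],
    "dashboard": ["webmap", "layer_c"],
    "layer_a": [],
    "layer_b": ["webmap"],  # a cycle back to the web map
    "layer_c": [],
}

def crawl(start_id, references):
    """Return every ID reachable from start_id, in the order first discovered."""
    discovered = [start_id]
    seen = {start_id}
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        for related_id in references.get(current, []):
            # Skip items already found, which also breaks cycles
            if related_id not in seen:
                seen.add(related_id)
                discovered.append(related_id)
                queue.append(related_id)
    return discovered

print(crawl("story", REFERENCES))
```

Tracking the IDs already seen keeps the crawl finite even when items reference each other, as `layer_b` and `webmap` do here.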

In [None]:
# Crawl the story to find items and record their item_id
story = None
try:
    story = Item(gis, story_id)
    story_data = get_item_data(story)
    items_in_story = list(set(find_all_possible_ids(str(story_data))))

    # The list grows while it is walked, so newly found items are crawled as well
    for item_id in items_in_story:
        try:
            related_items = get_related_items_for_id(
                gis, item_id, items_in_story
            )
        except Exception:
            continue  # skip IDs that could not be crawled
        for related_item in related_items:
            # Only add new IDs; otherwise items that reference each other would be re-added forever
            if related_item not in items_in_story:
                items_in_story.append(related_item)

    items_found = query_found_items(items_in_story, result_list)
    # Turn the contents from the story into a dataframe
    items_df = pd.DataFrame(items_found)
    # Create a convenient subset of columns
    items_df = items_df[['id', 'owner', 'created', 'isOrgItem', 'modified', 'title', 'type','protected', 'access']] # drop columns except these
    # Remove duplicate items
    items_df = items_df.drop_duplicates(subset='id') # drop duplicate items
    # Filter to only show those items that are within the 'home' org
    items_df = items_df.loc[items_df['isOrgItem'] == True]

    # Preview (a bare expression inside a try block is not rendered, so call display())
    display(items_df)

except Exception as e:
    story = None  # keep the later cells from running without a valid items table
    print(f"Error fetching story: {e}")
    print(f"Check the story itemID '{story_id}' and try again.")
    

## Protect the items
Using the table of items above, this next block loops through those items and performs the desired protection and sharing updates.

Once complete, this block will report back an updated table of all the items for review.
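
To preview what the update step would change before touching the portal, a dry run over the item records can be useful. This is a minimal sketch under stated assumptions: `plan_updates` is a hypothetical helper, and the sample records are invented rows that only mimic the `id`, `protected`, and `access` columns of the table above.

```python
def plan_updates(items, delete_protect, share, share_level):
    """List the changes each item would need, without touching the portal."""
    plan = []
    for item in items:
        changes = {}
        if item["protected"] != delete_protect:
            changes["protected"] = delete_protect
        # Sharing is only considered when share is True
        if share and item["access"] != share_level:
            changes["access"] = share_level
        if changes:
            plan.append((item["id"], changes))
    return plan

# Hypothetical records shaped like rows of the items table (IDs shortened)
sample_items = [
    {"id": "item-1", "protected": False, "access": "private"},
    {"id": "item-2", "protected": True, "access": "public"},
    {"id": "item-3", "protected": True, "access": "org"},
]

for item_id, changes in plan_updates(sample_items, True, True, "public"):
    print(item_id, changes)
```

In the notebook, records of this shape could be produced with `items_df.to_dict('records')`. Items that already match the requested settings are left out of the plan.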

In [None]:
# Now that we have a list of items we'll protect them from deletion and optionally update their sharing level
if story:
    id_list = items_df['id'].tolist()

    # Function to perform the protection and sharing
    def update_item_properties(item, protection, share, level):
        i = gis.content.get(item)
        i.protect(enable = protection)
        if share:
            i.update(item_properties={"access": level})

    # Update the settings for each item
    for item in id_list:
        try:
            update_item_properties(item, delete_protect, share, share_level)
        except Exception as e:
            print('Error: Could not update "{0}": {1}'.format(item, e))

## Review the results
Wait a few moments after running the above. This last cell will query those items that were protected and present an updated table where you can confirm that things were protected/shared as expected.
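
Instead of waiting a fixed amount of time, the re-query could be repeated until the results look right. The sketch below shows one way to do that with a small polling helper. `wait_until` and `fake_check` are hypothetical names, and the stand-in check simply succeeds on its third call.

```python
import time

def wait_until(check, attempts=5, delay_seconds=2.0):
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        # Pause between attempts, but not after the last one
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False

# Stand-in check that succeeds on the third call. In the notebook, check() could
# re-query the items and compare their `protected` values with `delete_protect`.
calls = {"count": 0}

def fake_check():
    calls["count"] += 1
    return calls["count"] >= 3

print(wait_until(fake_check, attempts=5, delay_seconds=0))
```

A bounded number of attempts keeps the cell from hanging if an update genuinely failed.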

In [None]:
# Reset the container
if story:
    itemList = []

    # Re-query the items to refresh the properties
    for item in id_list:
        getResourceInfo(item, itemList)

    # Turn the contents from the story into a dataframe
    items_df = pd.DataFrame(itemList)
    # Create a convenient subset of columns
    items_df = items_df[['id', 'owner', 'created', 'isOrgItem', 'modified', 'title', 'type','protected', 'access']] # drop columns except these
    # A bare expression inside an if block is not rendered, so call display()
    display(items_df)