# Populate AreaFillRGB in the DMU table
This is a Jupyter Notebook with some code and examples to show you how to auto-populate the `AreaFillRGB` field in the `DescriptionOfMapUnits` table based on the `MapUnit` or `Symbol` fields in the DMU and either a style file you are using or the actual layer renderer in an active map. I will eventually turn this into a tool in the GeMS Tools toolbox when I have more time but the great thing about a Jupyter Notebook is that there is context and documentation right beside the code. 

Using this Notebook requires a basic understanding of Python, Jupyter Notebooks, and JSON. You will have to set some Python variables to the catalog paths of objects such as geodatabases and style files. To decide how to proceed, you will have to inspect some properties of objects by looking at tables and lists of values. Some complicated concepts are discussed only for context; you don't need to understand everything that is brought up.

Remember to set the focus on each cell that needs to run for your particular situation and press shift-enter to run the code.

A related Notebook is [Make Stylx Symbols](https://github.com/DOI-USGS/gems-tools-pro/blob/notebooks/Make%20Stylx%20Symbols.ipynb), which, like this one, has an example of accessing symbol properties in a stylx file.

Start at [Step 1](#one)* if symbols are in a style

Start at [Step 2](#two)* if symbols are in a layer renderer in a map

*internal links to anchor headings don't work for me in ArcGIS Pro, but do in a standalone Jupyter Notebook.

In [47]:
import json
import csv
import sqlite3
import pandas as pd
from pprint import pprint
import colorsys
from itertools import islice

## Functions
You can keep this section folded out of sight if you like. You don't have to change anything here but do run each of these cells.

In [14]:
def color_convert(space, values, n=2):
    # between the built-in standard library of colorsys, the cmyk_to_rgb function below, and a simple
    # conversion from grayscale (that might not be completely accurate) we have conversions to RGB
    # from the CMYK, HSL, HSV, RGB, and gray color spaces you can choose to use in ArcGIS Pro.
    # Hopefully, no one is selecting colors by using the LAB space
    # n is 'int' for whole integers or the number of decimal places to round to
    if space == 'CIMCMYKColor': 
        return cmyk_to_rgb(values, n)
    
    elif space == 'CIMHSLColor': 
        h, s, l = values[0]/360, values[1]/100, values[2]/100
        # note that colorsys orders the arguments hue, lightness, saturation
        rgb = [c*255 for c in colorsys.hls_to_rgb(h, l, s)]
    
    elif space == 'CIMHSVColor':
        h, s, v = values[0]/360, values[1]/100, values[2]/100
        rgb = [c*255 for c in colorsys.hsv_to_rgb(h, s, v)]
    
    elif space == 'CIMRGBColor':
        rgb = values[:3]
    
    elif space == "CIMGrayColor":
        rgb = [values[0], values[0], values[0]]
    
    else:
        raise ValueError(f"no conversion available for color space {space}")
    
    # return a comma-delimited string, like cmyk_to_rgb does
    rgb = [int(c) if n == 'int' else round(c, n) for c in rgb]
    return ", ".join(str(c) for c in rgb)

In [15]:
def cmyk_to_rgb(cmyk, n='int', cmyk_scale=100, rgb_scale=255):
    c = cmyk[0]
    m = cmyk[1]
    y = cmyk[2]
    k = cmyk[3]
    r = rgb_scale * (1.0 - c / float(cmyk_scale)) * (1.0 - k / float(cmyk_scale))
    g = rgb_scale * (1.0 - m / float(cmyk_scale)) * (1.0 - k / float(cmyk_scale))
    b = rgb_scale * (1.0 - y / float(cmyk_scale)) * (1.0 - k / float(cmyk_scale))
    
    if n == 'int':
        r = int(r)
        g = int(g)
        b = int(b)
    elif n == 0:
        r = r
        g = g
        b = b
    else:
        r = round(r, n)
        g = round(g, n)
        b = round(b, n)
    
    return f"{str(r)}, {str(g)}, {str(b)}"

In [45]:
def get_class(class_string):
    """When the CIM definition of a layer is retrieved with arcpy all 'type' values (which,
    in the JSON representation, appear as strings starting with 'CIM'  ) are instantiated as 
    CIM objects and there is no .type attribute to interrogate through dot notation. Because 
    of that, it is difficult to discover what kind of object is being evaluated when iterating.
    This function cleans up the .__class__ string so that the CIM type is returned"""
    clean_str = str(class_string).split(".")[-1]
    clean_str = clean_str.rstrip(">'")
                                 
    return clean_str
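
As an alternative sketch (not part of the original workflow), Python's built-in `type(obj).__name__` returns the bare class name directly and should give the same answer as `get_class(obj.__class__)`. The `CIMSolidFill` class below is a hypothetical stand-in so the example runs without arcpy, and the module path assigned to it is an assumption about how arcpy names its CIM classes.

```python
def get_class(class_string):
    """Same logic as the Notebook's get_class, repeated so this sketch is self-contained."""
    clean_str = str(class_string).split(".")[-1]
    return clean_str.rstrip(">'")


class CIMSolidFill:
    """Hypothetical stand-in for an arcpy CIM object."""
    # assumed module path, set only so that str(cls) resembles a real CIM class
    __module__ = "arcpy.cim.CIMSymbols"


fill = CIMSolidFill()

# string clean-up of the class representation, as in the Notebook
name_from_string = get_class(fill.__class__)

# the built-in route: the class name without any string handling
name_from_type = type(fill).__name__

print(name_from_string, name_from_type)
```

If both names agree for your layer's objects, either approach can be used in the comparison against `"CIMSolidFill"` in Step 2.1.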

In [48]:
def take(n, iterable):
    """Return the first n items of the iterable as a list."""
    return list(islice(iterable, n))

<a id="one"></a>

## 1. Use a style file to look up colors
Start here if your symbols are stored in a style file.

### 1.1 Explore the style file
ArcGIS style files are SQLite databases with a specified set of tables. The important one for us is called `Items`. Take a look at the columns and a few rows of this table to become familiar with what the rest of this Notebook does.

In [None]:
# find the path to the style file holding your symbols and save it to the variable stylx below
# when writing this Notebook, I used the FGDC.stylx file available from the GeMS NGMDB page
#   https://ngmdb.usgs.gov/Info/standards/GeMS/#reso
stylx = r"path\to\stylx\file"

# use sqlite3 to make a connection
con = sqlite3.connect(stylx)

# create an SQL query that pandas will use to pull some info from the table
# and display just the first few rows of data
query = "SELECT * FROM Items"
df_items = pd.read_sql_query(query, con)
df_items.head()

| | ID | CLASS | CATEGORY | NAME | TAGS | CONTENT | KEY |
|---|---|---|---|---|---|---|---|
| 0 | 1786 | 4 | 1.1 - Contacts | 01.01.01 | Contact—Identity and existence certain, locati... | {"type":"CIMLineSymbol","symbolLayers":[{"type... | 1786 |
| 1 | 1787 | 4 | 1.1 - Contacts | 01.01.02 | Contact—Identity or existence questionable, lo... | {"type":"CIMLineSymbol","symbolLayers":[{"type... | 1787 |
| 2 | 1788 | 4 | 1.1 - Contacts | 01.01.03 | Contact—Identity and existence certain, locati... | {"type":"CIMLineSymbol","symbolLayers":[{"type... | 1788 |
| 3 | 1789 | 4 | 1.1 - Contacts | 01.01.04 | Contact—Identity or existence questionable, lo... | {"type":"CIMLineSymbol","symbolLayers":[{"type... | 1789 |
| 4 | 1790 | 4 | 1.1 - Contacts | 01.01.05 | Contact—Identity and existence certain, locati... | {"type":"CIMLineSymbol","symbolLayers":[{"type... | 1790 |


Some of the data in this table should be familiar. `CATEGORY` is a class of symbols. `NAME` is the name of the symbol, which would match values in `Symbol` in the DMU if you are using the [Match Layer Symbology To A Style tool](https://pro.arcgis.com/en/pro-app/3.3/tool-reference/data-management/match-layer-symbology-to-a-style.htm), and `TAGS` has a longer description of the symbol. Not all style files will be this informative, but before you proceed you will need to discover the `CATEGORY` value for the polygon fills you are interested in.
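
If you would rather check the table and column names without pandas, the sketch below queries SQLite's own catalog. It uses an in-memory database as a stand-in for a style file, and the column list is only copied from the output above; a real .stylx will likely hold more tables.

```python
import sqlite3

# in-memory stand-in for a .stylx file (an assumption, so the sketch runs anywhere)
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE ITEMS (ID INTEGER, CLASS INTEGER, CATEGORY TEXT, "
    "NAME TEXT, TAGS TEXT, CONTENT TEXT, KEY TEXT)"
)

# sqlite_master lists every table in the database
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column; the column name is the second value
columns = [row[1] for row in con.execute("PRAGMA table_info(ITEMS)")]

print(tables)
print(columns)
```

To run this against your own style, replace the stand-in connection with `sqlite3.connect(stylx)` and skip the `CREATE TABLE` statement.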

### 1.2 Find the CATEGORY with your polygon fill symbols

In [4]:
# create another query that will collect all distinct values in CATEGORY
query = "SELECT DISTINCT category FROM ITEMS"
cursor = con.execute(query)
for category in cursor:
    print(category)

('1.1 - Contacts',)
('1.2 - Key beds',)
('1.3 - Dikes',)
('2.1 - Faults (generic; vertical, subvertical, or high-angle; or unknown or unspecified orientation or sense of slip)',)
('2.2 - Normal faults',)
('2.3 - Low-angle faults (unknown or unspecified sense of slip)',)
('2.4 - Reverse faults',)
('2.5 - Rotational or scissor faults',)
('2.8 - Thrust faults',)
('2.9 - Overturned thrust faults',)
('2.10 - Detachment faults (sense of slip unspecified)',)
('2.12 - Fault scarps',)
('2.13 - Quaternary faulting',)
('3.1 - Boundaries located by geophysical methods',)
('3.2 - Faults located by geophysical methods',)
('3.3 - Geophysical survey lines and stations',)
('4.1 - Lineaments',)
('4.2 - Joints',)
('5.1 - Anticlines',)
('5.2 - Antiforms',)
('5.3 - Asymmetric, overturned, and inverted anticlines',)
('5.4 - Antiformal sheath folds',)
('5.5 - Synclines',)
('5.6 - Synforms',)
('5.7 - Asymmetric, overturned, and inverted synclines',)
('5.8 - Synformal sheath folds',)
('5.9 - Monoclines',)
('11

With the FGDC Style, this list is long but down near the bottom is the category we want: `'CMYK Polygons'`
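
When the list is long, filtering it with a `LIKE` clause can save some scrolling. This is a minimal sketch on an in-memory stand-in table; the search term `polygon` and the sample rows are assumptions for illustration only.

```python
import sqlite3

# in-memory stand-in for the ITEMS table of a style file
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ITEMS (CATEGORY TEXT, NAME TEXT)")
con.executemany(
    "INSERT INTO ITEMS VALUES (?, ?)",
    [
        ("1.1 - Contacts", "01.01.01"),
        ("2.2 - Normal faults", "02.02.01"),  # sample row, name is made up
        ("CMYK Polygons", "1"),
        ("CMYK Polygons", "10"),
    ],
)

# hypothetical search term; SQLite's LIKE ignores case for ASCII letters
search_term = "polygon"
query = "SELECT DISTINCT category FROM ITEMS WHERE category LIKE ?"
matches = [row[0] for row in con.execute(query, (f"%{search_term}%",))]

print(matches)
```

The `?` placeholder lets sqlite3 handle the quoting of the search term, which avoids trouble if the term itself contains a quote character.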

<a id="onethree"></a>

### 1.3 Inspect the CONTENT of a symbol
The symbol properties are stored in JSON in the field `CONTENT`. Take a look at a polygon fill symbol in the category you care about.

In [11]:
# create a query to pull the CONTENT for CATEGORY 'CMYK Polygons'
# or the category with your polygon symbols from a different style
query = "SELECT name, content FROM ITEMS WHERE category = 'CMYK Polygons'"
cursor = con.execute(query)
just_one = cursor.fetchone()
name = just_one[0]
symbol = json.loads(just_one[1])
pprint(symbol)

{'angleAlignment': 'Map',
 'symbolLayers': [{'anchor3D': 'Center',
                   'capStyle': 'Round',
                   'color': {'type': 'CIMRGBColor', 'values': [0, 0, 0, 100]},
                   'enable': True,
                   'height3D': 1,
                   'joinStyle': 'Round',
                   'lineStyle3D': 'Strip',
                   'miterLimit': 10,
                   'type': 'CIMSolidStroke',
                   'width': 0},
                  {'color': {'type': 'CIMCMYKColor',
                             'values': [0, 8, 0, 0, 100]},
                   'enable': True,
                   'type': 'CIMSolidFill'}],
 'type': 'CIMPolygonSymbol'}


Symbol properties follow the schema of the [ArcGIS Cartographic Information Model](https://pro.arcgis.com/en/pro-app/latest/arcpy/mapping/python-cim-access.htm#:~:text=The%20CIM%20is%20the%20Esri%20Cartographic%20Information%20Model.,the%20CIM%20as%20limited%20to%20only%20cartographic%20settings.), which I don't understand well. For some reason, not all polygon symbols, even the ones in the FGDC Style, have the same number of `symbolLayers`. By cross-referencing the JSON with the FGDC CMYK Color Chart, I confirmed that the JSON object we want is the one with `type` = `CIMSolidFill`, and that the `values` of its `color` are the values we are concerned with. Regardless of the number of `symbolLayers`, we should be able to find the correct object by searching for the correct `type`.

Note also the `type` value of the `color` object. In the case of the FGDC Style, the colors have been saved as `CIMCMYKColor` but there are a number of [other classes](https://pro.arcgis.com/en/pro-app/latest/sdk/api-reference/topic815.html) corresponding to common color spaces. It may be possible that a style category has color values in more than one color space so we will collect those as well so that we apply the correct conversion to RGB.

### 1.4 Make a Python dictionary of symbol names, color space, and colors
In this step we will build a python dictionary of all symbol names in the style and the converted color values. With this done, to retrieve the values for writing to `AreaFillRGB`, which is done in Step 3, it's a simple matter of looking the `Symbol` name up in the dictionary, as opposed to building a new database query for each `Symbol` name. With the dictionary built, you also have the option of working off that the next time you need to find values in the style or writing the whole thing to a text file for easier reference in the future. With the sizes of the tables and databases we are working with here, the difference in execution time between these two methods will be negligible. 

In [51]:
# start with an empty dictionary
name_color = {}

# build a query getting the name and content for each row
# remember to change the category name below if you have to
query = "SELECT name, content FROM ITEMS WHERE category = 'CMYK Polygons'"
cursor = con.execute(query)

# iterate through the cursor and add entries to the dictionary
# getting the color values requires converting the string in Content to JSON
# and then selecting the correct object based on 'type': 'CIMSolidFill'
for row in (cursor):
    for sym_lyr in json.loads(row[1])["symbolLayers"]:
        if sym_lyr["type"] == "CIMSolidFill":
            color_space = sym_lyr["color"]["type"]
            color_values = sym_lyr["color"]["values"]
            name_color[row[0]] = (color_space, color_values)

# print a few entries in the dictionary to see that it looks right
for entry in list(name_color)[:5]:
    print(entry, name_color[entry])

1 ('CIMCMYKColor', [0, 8, 0, 0, 100])
10 ('CIMCMYKColor', [0, 0, 8, 0, 100])
100 ('CIMCMYKColor', [8, 0, 0, 0, 100])
101 ('CIMCMYKColor', [8, 8, 0, 0, 100])
102 ('CIMCMYKColor', [8, 13, 0, 0, 100])
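
To see whether a category really does mix color spaces, you can count the color space names stored in the dictionary. This is a small sketch with a hand-made `name_color`; the `example_rgb` entry is made up to show what a mixed result would look like.

```python
from collections import Counter

# stand-in for the dictionary built above; the last entry is hypothetical
name_color = {
    "1": ("CIMCMYKColor", [0, 8, 0, 0, 100]),
    "10": ("CIMCMYKColor", [0, 0, 8, 0, 100]),
    "example_rgb": ("CIMRGBColor", [255, 234, 255, 100]),
}

# count how many symbols use each color space
space_counts = Counter(space for space, values in name_color.values())

for space, count in space_counts.items():
    print(f"{space}: {count}")
```

If only one color space is reported, only one conversion will actually be exercised in the next step.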


### 1.5 Convert to RGB
Of course, these values are in CMYK and the field is called `AreaFillRGB`, so we need to convert them (I have always thought that this field should, in fact, be color space-agnostic but here we are). Because we might be collecting multiple color spaces, we need multiple conversions. The function `color_convert` above has branches for the CMYK, HSL, HSV, RGB, and gray color spaces; LAB is not covered. Pass `'int'` as `n` to get whole integer values of R, G, and B, or pass a number of decimal places to round to. Finally, `color_convert` returns a string of comma-delimited values because that is the format for writing to `AreaFillRGB`.
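
As a worked example of the arithmetic inside `cmyk_to_rgb`, the sketch below converts the values of symbol `1` by hand. The helper name `cmyk_channel` is hypothetical, and reading the fifth value as alpha is an assumption; the functions above simply ignore it.

```python
def cmyk_channel(ink, black, rgb_scale=255, cmyk_scale=100):
    """One RGB channel from one ink percentage and the black percentage."""
    return rgb_scale * (1 - ink / cmyk_scale) * (1 - black / cmyk_scale)


# values of symbol '1' in the FGDC style; the fifth value is assumed to be alpha
c, m, y, k, alpha = [0, 8, 0, 0, 100]

r = cmyk_channel(c, k)  # cyan controls red
g = cmyk_channel(m, k)  # magenta controls green
b = cmyk_channel(y, k)  # yellow controls blue

# the two output styles offered by the n argument
rounded = [round(x, 2) for x in (r, g, b)]
whole = [int(x) for x in (r, g, b)]

print(rounded)
print(whole)
```

Note that `int()` truncates rather than rounds, which is why 8% magenta gives 234 in the whole-integer output and 234.6 in the rounded output.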

In [None]:
# try out the color_convert function to find out what RGB values you want to save
# whole integers are specified by using 'int' for n, the last argument
# just print out 10 entries using take()
for key, value in take(10, name_color.items()):
    print(f"{key}: {color_convert(value[0], value[1], 'int')}")

1: 255, 234, 255
10: 255, 255, 234
100: 234, 255, 255
101: 234, 234, 255
102: 234, 221, 255
103: 234, 204, 255
104: 234, 178, 255
105: 234, 153, 255
106: 234, 127, 255
107: 234, 102, 255


In [53]:
# rounded to n decimal places (the number might have fewer if there are only zeros after the n you have chosen)
for key, value in take(10, name_color.items()):
    print(f"{key}: {color_convert(value[0], value[1], 2)}")

1: 255.0, 234.6, 255.0
10: 255.0, 255.0, 234.6
100: 234.6, 255.0, 255.0
101: 234.6, 234.6, 255.0
102: 234.6, 221.85, 255.0
103: 234.6, 204.0, 255.0
104: 234.6, 178.5, 255.0
105: 234.6, 153.0, 255.0
106: 234.6, 127.5, 255.0
107: 234.6, 102.0, 255.0


In [54]:
# once you have decided on the number of decimal places or if you just want whole integer values,
# re-build the dictionary with the RGB values
# we'll do it in one line with a dictionary comprehension
new_dict = {k: color_convert(v[0], v[1], 'int') for k,v in name_color.items()}

# and take a look
for k, v in take(10, new_dict.items()):
    print(f"{k}: {v}")

1: 255, 234, 255
10: 255, 255, 234
100: 234, 255, 255
101: 234, 234, 255
102: 234, 221, 255
103: 234, 204, 255
104: 234, 178, 255
105: 234, 153, 255
106: 234, 127, 255
107: 234, 102, 255
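
Step 1.4 mentions writing the dictionary to a text file for future reference. One way to do that, sketched here with a hand-made `new_dict` and a hypothetical file name in a temporary folder, is to save it as JSON and load it back later.

```python
import json
import os
import tempfile

# stand-in for the dictionary of RGB strings built above
new_dict = {"1": "255, 234, 255", "10": "255, 255, 234"}

# hypothetical output location; point this at a folder of your own
out_folder = tempfile.mkdtemp()
out_path = os.path.join(out_folder, "style_rgb_lookup.json")

# write the dictionary as indented JSON
with open(out_path, "w", encoding="utf-8") as f:
    json.dump(new_dict, f, indent=2)

# read it back, for example in a later session
with open(out_path, encoding="utf-8") as f:
    reloaded = json.load(f)

print(reloaded == new_dict)
```

This round-trips cleanly because the keys and values are plain strings. The tuples in `name_color` would come back from JSON as lists.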


Now that you have the dictionary built, go to Step 3.

<a id="two"></a>

## 2. Use a layer renderer to get symbol values
Start here if your MapUnit polygons are symbolized in a layer in a map. Either you are not using a style file or you have already matched symbols to a style.

To prepare for this step, set up a map layer called "MapUnitPolys" and symbolize it using either a Unique Values renderer or using the Match Layer Symbology To A Style tool. Note that your choice for the field containing the unique values, that is, `MapUnit` or `Symbol`, will be the field you use in Step 3 to select rows in the DMU table for updating.

### 2.1 Drill down through the layer's CIM definition
The symbology that has been used for a layer can be accessed through the CIM [(Cartographic Information Model)](https://pro.arcgis.com/en/pro-app/latest/arcpy/mapping/python-cim-access.htm#:~:text=The%20CIM%20is%20the%20Esri%20Cartographic%20Information%20Model.,the%20CIM%20as%20limited%20to%20only%20cartographic%20settings.) definition for the layer which allows for dot notation (e.g., `object.method().property`) access of attributes and their values. When a .lyrx layer file is created, the content is the JSON representation of the CIM and nested a few levels down is the same JSON representation of polygon symbols that we explored in style files in [Step 1.3](#onethree). 

Because of the various levels of nesting I think are possible with where those symbol properties end up, I tried to write code that would find a list of named JSON objects regardless of where they were in the file, but was not successful. I tried a few examples I found for recursively searching JSON objects, but none of them worked and I was losing too much time trying to figure out why.

With the code below, although the dot-notation paths are hard-coded, I *think* they will always exist for any polygon symbol. If they fail, you will have to inspect the .lyrx file and try to figure out what the notation should be to drill down to the right level and extract the information.

Note that when a CIM definition is retrieved by arcpy, all JSON objects are instantiated as the CIM `type` value that is shown in the array. That is, all arrays become collections of CIM objects, not simply JSON arrays or Python dictionaries. In dot notation access, there is no `.type` attribute for any of those objects, as there is for the other JSON keys, because the object being interrogated IS the type. This is very handy for copying and modifying symbols programmatically, but to discover what `type` the object is, the only solution I found was to inspect the `__class__` attribute, which is available for all Python objects regardless of how they were created. 

In [None]:
# first get the current project
p = arcpy.mp.ArcGISProject('current')

# get a pointer to the MapUnitPolys layer
# change the name of the map wildcard below for listMaps() to the map that has the MapUnitPolys layer
m = p.listMaps('Map1')[0]
l = m.listLayers('MapUnitPolys')[0]

# get the CIM definition of the layer
l_cim = l.getDefinition("v3")

# create an empty dictionary to save values into as we iterate
# this will take the same form as the one built in step 1.4
# name of symbol: (color space, color values)
name_color = {}

# start by iterating through the groups. I am not quite sure what these 
# would be in a layer renderer
groups = l_cim.renderer.groups
for group in groups:
    # in each group, there is a list of "classes"
    for sym_class in group.classes:
        name = sym_class.label
        symbol_layers = sym_class.symbol.symbol.symbolLayers
        # for reference with a JSON file of the CIM definition:
        # at this point, the dot notation to this array is
        # renderer.groups[i].classes[i].symbol.symbol.symbolLayers
        for symbol_layer in symbol_layers:
            if get_class(symbol_layer.__class__) == "CIMSolidFill":
                color_space = get_class(symbol_layer.color.__class__)
                color_values = symbol_layer.color.values
                name_color[name] = (color_space, color_values)

# look at the dictionary. My test layer only had 6 map unit polygons
# and they were labelled by the FGDC CMYK color symbol name
for k,v in name_color.items():
    print(f"{k}: {v}")

290: ('CIMCMYKColor', [13, 0, 100, 0, 100])
399: ('CIMCMYKColor', [20, 100, 100, 0, 100])
609: ('CIMCMYKColor', [50, 100, 0, 0, 100])
900: ('CIMCMYKColor', [100, 0, 0, 0, 100])
200: ('CIMCMYKColor', [13, 0, 0, 0, 100])
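
For the recursive search mentioned in 2.1, here is one possible sketch. It works on plain JSON, for example a .lyrx file loaded with `json.load`, not on the CIM objects arcpy returns. The nested `layer_json` is a simplified, made-up structure rather than a real layer file, and `find_by_type` is a hypothetical helper name.

```python
def find_by_type(node, type_name):
    """Yield every dictionary at any depth whose 'type' equals type_name."""
    if isinstance(node, dict):
        if node.get("type") == type_name:
            yield node
        # keep searching inside the values of this dictionary
        for value in node.values():
            yield from find_by_type(value, type_name)
    elif isinstance(node, list):
        # search each item of an array
        for item in node:
            yield from find_by_type(item, type_name)


# simplified stand-in for the JSON in a .lyrx file (structure is an assumption)
layer_json = {
    "type": "CIMLayerDocument",
    "layerDefinitions": [{
        "renderer": {"groups": [{"classes": [{
            "label": "290",
            "symbol": {"symbol": {
                "type": "CIMPolygonSymbol",
                "symbolLayers": [
                    {"type": "CIMSolidStroke",
                     "color": {"type": "CIMRGBColor", "values": [0, 0, 0, 100]}},
                    {"type": "CIMSolidFill",
                     "color": {"type": "CIMCMYKColor", "values": [13, 0, 100, 0, 100]}},
                ],
            }},
        }]}]},
    }],
}

fills = list(find_by_type(layer_json, "CIMSolidFill"))
colors = [(fill["color"]["type"], fill["color"]["values"]) for fill in fills]
print(colors)
```

A search like this finds the fills wherever they are nested, but it loses the class `label` that ties each fill to a map unit. Searching for the objects that hold both the label and the symbol, and then for `CIMSolidFill` inside each one, would be one way around that.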


### 2.2 Convert to RGB
As with Step 1.5, regardless of the color space the symbol was originally stored in, convert to RGB.

In [41]:
# now re-build the dictionary with the RGB values
# we'll do it in one line with a dictionary comprehension
# note that the converted values go into a new dictionary, new_dict.
# name_color keeps the original color space and values, so this cell can be run again as is.
new_dict = {k: color_convert(v[0], v[1], 2) for k,v in name_color.items()}

# print the entries out
for k,v in new_dict.items():
    print(f"{k}: {v}")

290: 221.85, 255.0, 0.0
399: 204.0, 0.0, 0.0
609: 127.5, 0.0, 255.0
900: 0.0, 255.0, 255.0
200: 221.85, 255.0, 255.0


## 3. Write values to the DMU table
Now, we just need to iterate through the `MapUnit` or `Symbol` values (depending on how you established unique colors) in the DMU and write the values in the dictionary to the `AreaFillRGB` field. To minimize the number of nested for and cursor loops, first collect a Python list of key values.

### 3.1 Check that all Symbol or MapUnit values have an entry in name_color

In [None]:
# set the variable dmu below to the full catalog path to the DMU table
# drag and drop from the Catalog window
dmu = r"path\to\my_gems.gdb\DescriptionOfMapUnits"

# Are MapUnitPolys symbolized on Symbol or MapUnit?
# set the variable key_field below appropriately. Comment out the line you don't need.
key_field = "Symbol"
# key_field = "MapUnit"

# use a list comprehension with a SearchCursor to get the values in that field
symbol_names = [row[0] for row in arcpy.da.SearchCursor(dmu, key_field) if row[0] is not None]
        
# check that all Symbol or MapUnit values are in the name_color dictionary
for symbol_name in symbol_names:
    if symbol_name not in name_color:
        print(f"The value {symbol_name} from field {key_field} is not in the dictionary")

# fix any problems and come back to this step until nothing prints out

### 3.2 Write the RGB color values from name_color to the appropriate AreaFillRGB

In [44]:
# iterate through the symbol names collected from the DMU in step 3.1, and use an UpdateCursor to 
# write the converted value from new_dict to AreaFillRGB in the DMU
for symbol_name in symbol_names:
    # build the where clause from key_field (Symbol or MapUnit), which was set in step 3.1
    where = f"{key_field} = '{symbol_name}'"

    with arcpy.da.UpdateCursor(dmu, [key_field, "AreaFillRGB"], where) as cursor:
        for row in cursor:
            row[1] = new_dict[symbol_name]
            cursor.updateRow(row)
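
The where clause above will break if a `Symbol` or `MapUnit` value contains a single quote. A minimal sketch of one workaround, assuming standard SQL escaping (doubling the quote) is accepted by your geodatabase, is below. `build_where` is a hypothetical helper, and the map unit value with an apostrophe is made up.

```python
def build_where(field_name, value):
    """Return a where clause with any single quotes in value doubled."""
    escaped = value.replace("'", "''")
    return f"{field_name} = '{escaped}'"


plain = build_where("Symbol", "290")
quoted = build_where("MapUnit", "Qa'")  # made-up value containing an apostrophe

print(plain)
print(quoted)
```

The result could be passed as the where clause of the `UpdateCursor` in place of the f-string built inline.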

### 3.3 Inspect the table
Use a SearchCursor on DescriptionOfMapUnits to check that the values were written correctly.

In [47]:
fields = ["MapUnit", "Symbol", "AreaFillRGB"]
with arcpy.da.SearchCursor(dmu, fields) as cursor:
    for row in cursor:
        print(row[0], row[1], row[2])

None None None
Da 290 221.85, 255.0, 0.0
Dt 399 204.0, 0.0, 0.0
Ddr 609 127.5, 0.0, 255.0
Ddr 900 0.0, 255.0, 255.0
Swlc 200 221.85, 255.0, 255.0
