Different Protocol Names In Study Sequence Cause An Error #501

ptth222 · 2023-08-28T20:23:09Z

I initially modified a JSON example directly and found this issue, but I think showing it from the Tab side is clearer.

I modified the BII-I-1 Tab example so that the first culture uses a different protocol than the rest. The modified files validate and convert to JSON without issues. However, converting that JSON back to Tab fails because of the different protocol.

Modified study and investigation files:
s_BII-S-1.txt
i_investigation.txt

Code:

import json

from isatools.convert import isatab2json, json2isatab

# Convert the modified Tab example to ISA-JSON (this step succeeds).
isa_json = isatab2json.convert('C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/tab/BII-I-1_conversion_testing', use_new_parser=True)

with open('C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing.json', 'w') as out_fp:
    json.dump(isa_json, out_fp, indent=2)

# Convert the JSON back to Tab (this step raises the KeyError below).
with open('C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing.json') as file_pointer:
    json2isatab.convert(file_pointer, 'C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing/', validate_first=False)

Traceback:

Traceback (most recent call last):

  File "C:\Users\Sparda\AppData\Local\Temp\ipykernel_5600\1208495759.py", line 5, in <cell line: 4>
    json2isatab.convert(file_pointer, 'C:/Users/Sparda/Desktop/Moseley Lab/Code/MESSES/isadatasets/BII-I-1_testing/', validate_first=False)

  File "C:\Python310\lib\site-packages\isatools\convert\json2isatab.py", line 49, in convert
    isatab.dump(isa_obj=isa_obj, output_path=path, i_file_name=i_file_name,

  File "C:\Python310\lib\site-packages\isatools\isatab\dump\core.py", line 170, in dump
    write_study_table_files(investigation, output_path)

  File "C:\Python310\lib\site-packages\isatools\isatab\dump\write.py", line 134, in write_study_table_files
    df_dict[olabel][-1] = node.executes_protocol.name

KeyError: 'Protocol REF.growth protocol 2'

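I investigated the error, and it seems to come from identifying process nodes by the protocol they execute rather than by their position in the path, which is how sample nodes are identified in the same section of code. Since the column headers appear to be generated only from the longest path, a protocol that does not appear on that path (here "growth protocol 2") never gets a column, and writing its name raises the KeyError. I think I was able to fix it by simply changing the process node code to work like the sample node code.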
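To make the suspected failure mode concrete, here is a minimal, library-free sketch. It does not use isatools: `Node`, `name_based_label`, `build_columns` and `fill_rows` are hypothetical stand-ins that only mimic the pattern of building the column dictionary from one "longest" path and then filling one row per path, with process columns keyed by protocol name as before the fix. The node and protocol names are illustrative.

```python
from typing import Dict, List, NamedTuple


class Node(NamedTuple):
    """Hypothetical stand-in for an ISA graph node."""
    kind: str  # "Source", "Process" or "Sample"
    name: str  # node name, or the executed protocol's name for a Process


def name_based_label(node: Node) -> str:
    # Pre-fix behaviour (sketched): a process column is keyed by its protocol name.
    if node.kind == "Process":
        return "Protocol REF.{}".format(node.name)
    return "{} Name".format(node.kind)


def build_columns(longest_path: List[Node]) -> Dict[str, List[str]]:
    # Column headers are generated from a single (longest) path only.
    return {name_based_label(node): [] for node in longest_path}


def fill_rows(df_dict: Dict[str, List[str]], paths: List[List[Node]]) -> None:
    for path in paths:
        for column in df_dict.values():
            column.append("")  # one new row per path
        for node in path:
            # Raises KeyError when this path's protocol has no column.
            df_dict[name_based_label(node)][-1] = node.name


paths = [
    [Node("Source", "source1"), Node("Process", "growth protocol 2"), Node("Sample", "sample1")],
    [Node("Source", "source2"), Node("Process", "growth protocol"), Node("Sample", "sample2")],
]

df_dict = build_columns(paths[1])  # suppose the second path is picked as "longest"
try:
    fill_rows(df_dict, paths)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'Protocol REF.growth protocol 2'
```

Both paths have the same shape, but because the header set only contains the protocol name seen on the chosen path, the other path's protocol has nowhere to go.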

New Code:

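        # Label process nodes by their position in the path (like sample
        # nodes) rather than by the name of the protocol they execute.
        sample_in_path_count = 0
        protocol_in_path_count = 0
        longest_path = _longest_path_and_attrs(paths, s_graph.indexes)
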
        for node_index in longest_path:
            node = s_graph.indexes[node_index]
            if isinstance(node, Source):
                olabel = "Source Name"
                columns.append(olabel)
                columns += flatten(
                    map(lambda x: get_characteristic_columns(olabel, x),
                        node.characteristics))
                columns += flatten(
                    map(lambda x: get_comment_column(
                        olabel, x), node.comments))
            elif isinstance(node, Process):
                olabel = "Protocol REF.{}".format(protocol_in_path_count)
                columns.append(olabel)
                protocol_in_path_count += 1
                if node.executes_protocol.name not in protnames.keys():
                    protnames[node.executes_protocol.name] = protrefcount
                    protrefcount += 1
                columns += flatten(map(lambda x: get_pv_columns(olabel, x),
                                       node.parameter_values))
                if node.date is not None:
                    columns.append(olabel + ".Date")
                if node.performer is not None:
                    columns.append(olabel + ".Performer")
                columns += flatten(
                    map(lambda x: get_comment_column(
                        olabel, x), node.comments))

            elif isinstance(node, Sample):
                olabel = "Sample Name.{}".format(sample_in_path_count)
                columns.append(olabel)
                sample_in_path_count += 1
                columns += flatten(
                    map(lambda x: get_characteristic_columns(olabel, x),
                        node.characteristics))
                columns += flatten(
                    map(lambda x: get_comment_column(
                        olabel, x), node.comments))
                columns += flatten(map(lambda x: get_fv_columns(olabel, x),
                                       node.factor_values))


        omap = get_object_column_map(columns, columns)
        # load into dictionary
        df_dict = dict(map(lambda k: (k, []), flatten(omap)))

        for path_ in paths:
            for k in df_dict.keys():  # add a row per path
                df_dict[k].extend([""])

            sample_in_path_count = 0
            protocol_in_path_count = 0
            for node_index in path_:
                node = s_graph.indexes[node_index]
                if isinstance(node, Source):
                    olabel = "Source Name"
                    df_dict[olabel][-1] = node.name
                    for c in node.characteristics:
                        category_label = c.category.term if isinstance(c.category.term, str) \
                            else c.category.term["annotationValue"]
                        clabel = "{0}.Characteristics[{1}]".format(
                            olabel, category_label)
                        write_value_columns(df_dict, clabel, c)
                    for co in node.comments:
                        colabel = "{0}.Comment[{1}]".format(olabel, co.name)
                        df_dict[colabel][-1] = co.value

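                elif isinstance(node, Process):
                    olabel = "Protocol REF.{}".format(
                        protocol_in_path_count)
                    # Advance the counter so a second process in the same
                    # path maps to its own "Protocol REF.N" column.
                    protocol_in_path_count += 1
                    df_dict[olabel][-1] = node.executes_protocol.name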
                    for pv in node.parameter_values:
                        pvlabel = "{0}.Parameter Value[{1}]".format(
                            olabel, pv.category.parameter_name.term)
                        write_value_columns(df_dict, pvlabel, pv)
                    if node.date is not None:
                        df_dict[olabel + ".Date"][-1] = node.date
                    if node.performer is not None:
                        df_dict[olabel + ".Performer"][-1] = node.performer
                    for co in node.comments:
                        colabel = "{0}.Comment[{1}]".format(olabel, co.name)
                        df_dict[colabel][-1] = co.value

                elif isinstance(node, Sample):
                    olabel = "Sample Name.{}".format(sample_in_path_count)
                    sample_in_path_count += 1
                    df_dict[olabel][-1] = node.name
                    for c in node.characteristics:
                        category_label = c.category.term if isinstance(c.category.term, str) \
                            else c.category.term["annotationValue"]
                        clabel = "{0}.Characteristics[{1}]".format(
                            olabel, category_label)
                        write_value_columns(df_dict, clabel, c)
                    for co in node.comments:
                        colabel = "{0}.Comment[{1}]".format(olabel, co.name)
                        df_dict[colabel][-1] = co.value
                    for fv in node.factor_values:
                        fvlabel = "{0}.Factor Value[{1}]".format(
                            olabel, fv.factor_name.name)
                        write_value_columns(df_dict, fvlabel, fv)

This corresponds to approximately lines 64-167 of isatools\isatab\dump\write.py, in the write_study_table_files function. With this change the conversion no longer errors, and the study Tab file converted from the JSON looks correct to me.
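The idea behind the change can be sketched the same way: label each process and sample by its position within the path, resetting the counters for every path, so every path maps onto the same column set regardless of which protocol it executes. This is a hypothetical, library-free illustration (`Node`, `positional_labels` and `build_table` are not isatools names), and it assumes paths share the same ordering of node kinds, so a shorter path's labels are a subset of the longest path's labels.

```python
from typing import Dict, List, NamedTuple, Tuple


class Node(NamedTuple):
    """Hypothetical stand-in for an ISA graph node."""
    kind: str  # "Source", "Process" or "Sample"
    name: str  # node name, or the executed protocol's name for a Process


def positional_labels(path: List[Node]) -> List[Tuple[str, Node]]:
    """Label processes and samples by their position within this path."""
    protocol_in_path_count = 0  # counters restart for every path
    sample_in_path_count = 0
    labelled = []
    for node in path:
        if node.kind == "Process":
            label = "Protocol REF.{}".format(protocol_in_path_count)
            protocol_in_path_count += 1
        elif node.kind == "Sample":
            label = "Sample Name.{}".format(sample_in_path_count)
            sample_in_path_count += 1
        else:
            label = "Source Name"
        labelled.append((label, node))
    return labelled


def build_table(paths: List[List[Node]]) -> Dict[str, List[str]]:
    longest = max(paths, key=len)
    # Headers still come from the longest path, but they no longer
    # depend on which protocol each process executes.
    table = {label: [] for label, _ in positional_labels(longest)}
    for path in paths:
        for column in table.values():
            column.append("")  # one new row per path, blank by default
        for label, node in positional_labels(path):
            table[label][-1] = node.name
    return table


paths = [
    [Node("Source", "source1"), Node("Process", "growth protocol 2"), Node("Sample", "sample1")],
    [Node("Source", "source2"), Node("Process", "growth protocol"), Node("Sample", "sample2")],
]
for label, values in build_table(paths).items():
    print(label, values)
# Source Name ['source1', 'source2']
# Protocol REF.0 ['growth protocol 2', 'growth protocol']
# Sample Name.0 ['sample1', 'sample2']
```

The different protocol names now end up as values in the same "Protocol REF.0" column instead of requiring separate columns.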


ptth222 added a commit to ptth222/isa-api that referenced this issue Sep 6, 2023

Update write.py

f1936eb

Changed write_study_table_files and write_assay_table_files to count the protocol nodes instead of naming them by the protocol executed. Addresses issue ISA-tools#501.

This was referenced Sep 6, 2023

Update write.py #502

Closed

Are Multiple "<entity> Name" Columns Allowed? #500

Open

proccaserra mentioned this issue Oct 23, 2023

Develop #505

Closed

ptth222 mentioned this issue Feb 5, 2024

Fairly significant changes to check_protocol_fields #531

Merged
