# <a id='toc1_'></a>[Molecular Oncology Almanac Assertion Analysis](#toc0_)

MOA evidence items are referred to as assertions and MOA variants are referred to as features in this analysis. 

**Table of contents**<a id='toc0_'></a>    
- [Molecular Oncology Almanac Assertion Analysis](#toc1_)    
  - [All Features (Variants) Analysis](#toc1_1_)    
    - [Creating a table with feature (variant) and assertion (evidence) information](#toc1_1_1_)    
    - [Converting feature (variant) types to normalized categories](#toc1_1_2_)    
    - [Adding a numerical impact score based on the predictive implication](#toc1_1_3_)    
    - [Impact Score Analysis](#toc1_1_4_)    
    - [Features (Variants) Analysis](#toc1_1_5_)    
    - [Assertions (Evidence Items) Analysis](#toc1_1_6_)    
    - [Summaries for all Features (Variants) and Assertions (Evidence Items)](#toc1_1_7_)    
  - [Create functions / global variables used in analysis](#toc1_2_)    
  - [Normalized Analysis](#toc1_3_)    
  - [Not Supported Analysis](#toc1_4_)    
    - [Feature (Variant) Analysis](#toc1_4_1_)    
    - [Not Supported Feature (Variant) Analysis by Subcategory](#toc1_4_2_)    
    - [Not Support Feature (Variant) Assertion (Evidence) Analysis by Subcategory](#toc1_4_3_)    
    - [Impact by Subcategory](#toc1_4_4_)    
- [MOA Summary](#toc2_)    
  - [Feature (Variant) Analysis](#toc2_1_)    
  - [Evidence Analysis](#toc2_2_)    
  - [Impact](#toc2_3_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

In [89]:
from enum import Enum
from typing import Dict
import json

import pandas as pd
import plotly.express as px
import requests
from ga4gh.core import sha512t24u

In [90]:
def get_feature_digest(feature: Dict) -> str:
    """Get digest for feature

    :param feature: MOA feature
    :return: Digest
    """
    attrs = json.dumps(
        feature["attributes"][0], sort_keys=True, separators=(",", ":"), indent=None
    ).encode("utf-8")
    return sha512t24u(attrs)

In [91]:
class VariantNormType(str, Enum):
    """Variation Normalization types"""

    NORMALIZED = "Normalized"
    NOT_SUPPORTED = "Not Supported"


VARIANT_NORM_TYPE_VALUES = [v.value for v in VariantNormType.__members__.values()]

In [92]:
class VariantCategory(str, Enum):
    """Create enum for the kind of variants that are in MOA."""

    EXPRESSION = "Expression"
    EPIGENETIC_MODIFICATION = "Epigenetic Modification"
    FUSION = "Fusion"
    PROTEIN_CONS = "Protein Consequence"
    GENE_FUNC = "Gene Function"
    REARRANGEMENTS = "Rearrangements"
    COPY_NUMBER = "Copy Number"
    OTHER = "Other"
    GENOTYPES_EASY = "Genotypes Easy"
    GENOTYPES_COMPOUND = "Genotypes Compound"
    REGION_DEFINED_VAR = "Region Defined Variant"
    TRANSCRIPT_VAR = "Transcript Variant"  # no attempt to normalize these ones, since there is no query we could use


VARIANT_CATEGORY_VALUES = [v.value for v in VariantCategory.__members__.values()]

## <a id='toc1_1_'></a>[All Features (Variants) Analysis](#toc0_)

### <a id='toc1_1_1_'></a>[Creating a table with feature (variant) and assertion (evidence) information](#toc0_)

In [93]:
# Create dictionary for MOA Feature ID -> Feature Type
r = requests.get("https://moalmanac.org/api/features")
if r.status_code == 200:
    feature_data = r.json()

features = {}

for feature in feature_data:
    feature_id = feature["feature_id"]
    digest = get_feature_digest(feature)
    features[digest] = feature["feature_type"]

count_unique_feature_ids = len(features.keys())
print(count_unique_feature_ids)

429


In [94]:
# Create DF for assertions and their associated feature + predictive implication
r = requests.get("https://moalmanac.org/api/assertions")
if r.status_code == 200:
    assertion_data = r.json()

transformed = []

# Mapping from feature ID to feature digest
feature_id_to_digest = {}

for assertion in assertion_data:
    assertion_id = assertion["assertion_id"]
    predictive_implication = assertion["predictive_implication"]

    if len(assertion["features"]) != 1:
        print(f"assertion id ({assertion_id}) does not have 1 feature")
        continue

    feature = assertion["features"][0]
    feature_id = feature["feature_id"]
    feature_digest = get_feature_digest(feature)

    feature_id_to_digest[feature_id] = digest

    transformed.append(
        {
            "assertion_id": assertion_id,
            "feature_id": feature_id,
            "feature_type": features[feature_digest],
            "predictive_implication": predictive_implication,
            "feature_digest": feature_digest,
        }
    )
moa_df = pd.DataFrame(transformed)
moa_df

Unnamed: 0,assertion_id,feature_id,feature_type,predictive_implication,feature_digest
0,1,1,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP
1,2,2,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP
2,3,3,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP
3,4,4,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP
4,5,5,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP
...,...,...,...,...,...
889,890,890,somatic_variant,FDA-Approved,c3CkYcMt4ssh4AL4gpacJtFil8xl2TB2
890,891,891,somatic_variant,FDA-Approved,uAW4cOXId1N1MKo5fqHYdw9JGceCMmE5
891,892,892,somatic_variant,FDA-Approved,B5m8cSgi6w2xRCg0X_dPpQU2dwbvtXk1
892,893,893,somatic_variant,FDA-Approved,YLXf4Q8yr45bD0I_v6nkpDjBFuGPdFbd


In [95]:
unique_features_df = moa_df.sort_values("feature_id").drop_duplicates("feature_digest")
len_unique_feature_ids = len(list(unique_features_df.feature_id))

In [96]:
total_len_features = len(moa_df.feature_digest.unique())
f"Total number of unique features (variants): {total_len_features}"

'Total number of unique features (variants): 428'

In [97]:
assert total_len_features == len_unique_feature_ids

In [98]:
total_len_assertions = len(moa_df.assertion_id.unique())
f"Total number of unique assertions (evidence items): {total_len_assertions}"

'Total number of unique assertions (evidence items): 894'

### <a id='toc1_1_2_'></a>[Converting feature (variant) types to normalized categories](#toc0_)

In [99]:
list(moa_df.feature_type.unique())

['rearrangement',
 'somatic_variant',
 'germline_variant',
 'copy_number',
 'microsatellite_stability',
 'mutational_signature',
 'mutational_burden',
 'knockdown',
 'aneuploidy']

In [100]:
moa_df["category"] = moa_df["feature_type"].copy()

moa_df["category"] = moa_df["category"].replace(
    "rearrangement", VariantCategory.REARRANGEMENTS.value
)
moa_df["category"] = moa_df["category"].replace(
    "aneuploidy", VariantCategory.COPY_NUMBER.value
)
moa_df["category"] = moa_df["category"].replace(
    "knockdown", VariantCategory.EXPRESSION.value
)
moa_df["category"] = moa_df["category"].replace(
    "somatic_variant", VariantCategory.PROTEIN_CONS.value
)
moa_df["category"] = moa_df["category"].replace(
    "germline_variant", VariantCategory.PROTEIN_CONS.value
)
moa_df["category"] = moa_df["category"].replace(
    "microsatellite_stability", VariantCategory.REARRANGEMENTS.value
)
moa_df["category"] = moa_df["category"].replace(
    "mutational_burden", VariantCategory.OTHER.value
)
moa_df["category"] = moa_df["category"].replace(
    "mutational_signature", VariantCategory.OTHER.value
)
moa_df["category"] = moa_df["category"].replace(
    "copy_number", VariantCategory.COPY_NUMBER.value
)

moa_df.head()

Unnamed: 0,assertion_id,feature_id,feature_type,predictive_implication,feature_digest,category
0,1,1,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP,Rearrangements
1,2,2,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP,Rearrangements
2,3,3,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP,Rearrangements
3,4,4,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP,Rearrangements
4,5,5,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP,Rearrangements


In [101]:
list(moa_df.category.unique())

['Rearrangements', 'Protein Consequence', 'Copy Number', 'Other', 'Expression']

### <a id='toc1_1_3_'></a>[Adding a numerical impact score based on the predictive implication](#toc0_)
This is based on the structure of MOA scoring

In [102]:
predictive_implication_categories = moa_df.predictive_implication.unique()
list(predictive_implication_categories)

['FDA-Approved',
 'Guideline',
 'Clinical trial',
 'Preclinical',
 'Inferential',
 'Clinical evidence']

In [103]:
moa_df["impact_score"] = moa_df["predictive_implication"].copy()

moa_df.loc[moa_df["impact_score"] == "FDA-Approved", "impact_score"] = 10
moa_df.loc[moa_df["impact_score"] == "Guideline", "impact_score"] = 10
moa_df.loc[moa_df["impact_score"] == "Clinical evidence", "impact_score"] = 5
moa_df.loc[moa_df["impact_score"] == "Clinical trial", "impact_score"] = 5
moa_df.loc[moa_df["impact_score"] == "Preclinical", "impact_score"] = 1
moa_df.loc[moa_df["impact_score"] == "Inferential", "impact_score"] = 0.5

moa_df.head()

Unnamed: 0,assertion_id,feature_id,feature_type,predictive_implication,feature_digest,category,impact_score
0,1,1,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP,Rearrangements,10
1,2,2,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP,Rearrangements,10
2,3,3,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP,Rearrangements,10
3,4,4,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP,Rearrangements,10
4,5,5,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP,Rearrangements,10


### <a id='toc1_1_4_'></a>[Impact Score Analysis](#toc0_)

In [104]:
feature_categories_impact_data = dict()
for category in VARIANT_CATEGORY_VALUES:
    feature_categories_impact_data[category] = {}
    impact_category_df = moa_df[moa_df.category == category]

    total_sum_category_impact = impact_category_df["impact_score"].sum()
    feature_categories_impact_data[category][
        "total_sum_category_impact"
    ] = total_sum_category_impact
    print(f"{category}: {total_sum_category_impact}")

Expression: 12
Epigenetic Modification: 0
Fusion: 0
Protein Consequence: 4152.5
Gene Function: 0
Rearrangements: 643.0
Copy Number: 400.0
Other: 53.5
Genotypes Easy: 0
Genotypes Compound: 0
Region Defined Variant: 0
Transcript Variant: 0


### <a id='toc1_1_5_'></a>[Features (Variants) Analysis](#toc0_)

In [105]:
moa_feature_data = dict()
for category in VARIANT_CATEGORY_VALUES:
    moa_feature_data[category] = {}
    feature_type_df = moa_df[moa_df.category == category]

    number_unique_category_features = len(set(feature_type_df.feature_digest))
    moa_feature_data[category][
        "number_unique_category_features"
    ] = number_unique_category_features

    fraction_category_feature = (
        f"{number_unique_category_features} / {total_len_features}"
    )
    moa_feature_data[category]["fraction_category_feature"] = fraction_category_feature

    percent_category_feature = (
        "{:.2f}".format(number_unique_category_features / total_len_features * 100)
        + "%"
    )
    moa_feature_data[category]["percent_category_feature"] = percent_category_feature

### <a id='toc1_1_6_'></a>[Assertions (Evidence Items) Analysis](#toc0_)

In [106]:
moa_assertion_data = dict()
for category in VARIANT_CATEGORY_VALUES:
    moa_assertion_data[category] = {}
    assertion_type_df = moa_df[moa_df.category == category]

    number_unique_category_assertions = len(set(assertion_type_df.assertion_id))
    moa_assertion_data[category][
        "number_unique_category_assertions"
    ] = number_unique_category_assertions

    fraction_category_assertion = (
        f"{number_unique_category_assertions} / {total_len_assertions}"
    )
    moa_assertion_data[category][
        "fraction_category_assertion"
    ] = fraction_category_assertion

    percent_category_assertion = (
        "{:.2f}".format(number_unique_category_assertions / total_len_assertions * 100)
        + "%"
    )
    moa_assertion_data[category][
        "percent_category_assertion"
    ] = percent_category_assertion

### <a id='toc1_1_7_'></a>[Summaries for all Features (Variants) and Assertions (Evidence Items)](#toc0_)

In [107]:
feature_category_impact_score = [
    v["total_sum_category_impact"] for v in feature_categories_impact_data.values()
]
feature_category_number = [
    v["number_unique_category_features"] for v in moa_feature_data.values()
]
feature_category_fraction = [
    v["fraction_category_feature"] for v in moa_feature_data.values()
]
feature_category_percent = [
    v["percent_category_feature"] for v in moa_feature_data.values()
]
feature_category_assertion_number = [
    v["number_unique_category_assertions"] for v in moa_assertion_data.values()
]
feature_category_assertion_fraction = [
    v["fraction_category_assertion"] for v in moa_assertion_data.values()
]
feature_category_assertion_percent = [
    v["percent_category_assertion"] for v in moa_assertion_data.values()
]

In [108]:
feature_category_dict = {
    "Category": VARIANT_CATEGORY_VALUES,
    "Number of Features": feature_category_number,
    "Fraction of Features": feature_category_fraction,
    "Percent of Features": feature_category_percent,
    "Number of Assertions": feature_category_assertion_number,
    "Fraction of Assertions": feature_category_assertion_fraction,
    "Percent of Assertions": feature_category_assertion_percent,
    "Impact Score": feature_category_impact_score,
}

In [109]:
moa_feature_df = pd.DataFrame(feature_category_dict)
moa_feature_df

Unnamed: 0,Category,Number of Features,Fraction of Features,Percent of Features,Number of Assertions,Fraction of Assertions,Percent of Assertions,Impact Score
0,Expression,11,11 / 428,2.57%,12,12 / 894,1.34%,12.0
1,Epigenetic Modification,0,0 / 428,0.00%,0,0 / 894,0.00%,0.0
2,Fusion,0,0 / 428,0.00%,0,0 / 894,0.00%,0.0
3,Protein Consequence,323,323 / 428,75.47%,676,676 / 894,75.62%,4152.5
4,Gene Function,0,0 / 428,0.00%,0,0 / 894,0.00%,0.0
5,Rearrangements,38,38 / 428,8.88%,81,81 / 894,9.06%,643.0
6,Copy Number,47,47 / 428,10.98%,102,102 / 894,11.41%,400.0
7,Other,9,9 / 428,2.10%,23,23 / 894,2.57%,53.5
8,Genotypes Easy,0,0 / 428,0.00%,0,0 / 894,0.00%,0.0
9,Genotypes Compound,0,0 / 428,0.00%,0,0 / 894,0.00%,0.0


In [110]:
moa_feature_df["Percent of Features"] = (
    moa_feature_df["Fraction of Features"].astype(str)
    + " ("
    + moa_feature_df["Percent of Features"]
    + ")"
)
moa_feature_df["Percent of Assertions"] = (
    moa_feature_df["Fraction of Assertions"].astype(str)
    + " ("
    + moa_feature_df["Percent of Assertions"]
    + ")"
)

In [111]:
moa_feature_df_abbreviated = moa_feature_df.drop(
    [
        "Number of Features",
        "Fraction of Features",
        "Number of Assertions",
        "Fraction of Assertions",
    ],
    axis=1,
)

In [112]:
moa_feature_df_abbreviated = moa_feature_df_abbreviated.set_index("Category")
moa_feature_df_abbreviated

Unnamed: 0_level_0,Percent of Features,Percent of Assertions,Impact Score
Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Expression,11 / 428 (2.57%),12 / 894 (1.34%),12.0
Epigenetic Modification,0 / 428 (0.00%),0 / 894 (0.00%),0.0
Fusion,0 / 428 (0.00%),0 / 894 (0.00%),0.0
Protein Consequence,323 / 428 (75.47%),676 / 894 (75.62%),4152.5
Gene Function,0 / 428 (0.00%),0 / 894 (0.00%),0.0
Rearrangements,38 / 428 (8.88%),81 / 894 (9.06%),643.0
Copy Number,47 / 428 (10.98%),102 / 894 (11.41%),400.0
Other,9 / 428 (2.10%),23 / 894 (2.57%),53.5
Genotypes Easy,0 / 428 (0.00%),0 / 894 (0.00%),0.0
Genotypes Compound,0 / 428 (0.00%),0 / 894 (0.00%),0.0


In [113]:
fig = px.scatter(
    data_frame=moa_feature_df,
    x="Number of Assertions",
    y="Impact Score",
    size="Number of Features",
    size_max=40,
    text="Number of Features",
    color="Category",
)
fig.show()

In [114]:
fig.write_html("moa_feature_categories_impact_scatterplot.html")

## <a id='toc1_2_'></a>[Create functions / global variables used in analysis](#toc0_)

In [115]:
feature_analysis_summary = {
    "Variant Category": VARIANT_NORM_TYPE_VALUES,
    "Count of MOA Features per Category": [],
    "Fraction of all MOA Features": [],
    "Percentage of all MOA Features": [],
}
feature_analysis_summary

{'Variant Category': ['Normalized', 'Not Supported'],
 'Count of MOA Features per Category': [],
 'Fraction of all MOA Features': [],
 'Percentage of all MOA Features': []}

In [116]:
def feature_analysis(
    df: pd.DataFrame, variant_norm_type: VariantNormType
) -> pd.DataFrame:
    """Do feature analysis (counts, percentages). Updates `feature_analysis_summary`

    :param df: Dataframe of variants
    :param variant_norm_type: The kind of features that are in `df`
    :return: Transformed dataframe with variant ID duplicates dropped
    """
    # Drop duplicate rows
    df = df.drop_duplicates(subset=["feature_id"])
    feature_ids = list(df["feature_id"])

    # Count
    num_features = len(feature_ids)
    fraction_features = f"{num_features} / {total_len_features}"
    print(f"\nNumber of {variant_norm_type.value} Features in MOA: {fraction_features}")

    # Percentage
    percentage_features = f"{num_features / total_len_features * 100:.2f}%"
    print(
        f"Percentage of {variant_norm_type.value} Features in MOA: {percentage_features}"
    )

    feature_analysis_summary["Count of MOA Features per Category"].append(num_features)
    feature_analysis_summary["Fraction of all MOA Features"].append(fraction_features)
    feature_analysis_summary["Percentage of all MOA Features"].append(
        percentage_features
    )

    return df

In [117]:
assertion_analysis_summary = {
    "Variant Category": VARIANT_NORM_TYPE_VALUES,
    "Count of MOA Assertions per Category": [],
    "Fraction of all MOA Assertions": [],
    "Percentage of all MOA Assertions": [],
}
assertion_analysis_summary

{'Variant Category': ['Normalized', 'Not Supported'],
 'Count of MOA Assertions per Category': [],
 'Fraction of all MOA Assertions': [],
 'Percentage of all MOA Assertions': []}

In [118]:
def assertion_analysis(
    all_df: pd.DataFrame,
    variant_norm_df: pd.DataFrame,
    variant_norm_type: VariantNormType,
):
    """Do evidence analysis (counts, percentages). Updates `assertion_analysis_summary`

    :param all_df: Dataframe for all assertions and features
    :param variant_norm_df: Dataframe for features given certain `variant_norm_type`
    :param variant_norm_type: The kind of variants that are in `df`
    """
    # Need to do this bc of duplicate features
    _feature_ids = set(variant_norm_df.feature_digest)
    tmp_df = all_df[all_df["feature_digest"].isin(_feature_ids)]

    # Count
    num_assertions = len(tmp_df.assertion_id)
    fraction_assertions = f"{num_assertions} / {total_len_assertions}"
    print(
        f"Number of {variant_norm_type.value} Feature Assertions in MOA: {fraction_assertions}"
    )

    # Percentage
    percentage_assertions = f"{num_assertions / total_len_assertions * 100:.2f}%"
    print(
        f"Percentage of {variant_norm_type.value} Feature Assertions in MOA: {percentage_assertions}"
    )

    assertion_analysis_summary["Count of MOA Assertions per Category"].append(
        num_assertions
    )
    assertion_analysis_summary["Fraction of all MOA Assertions"].append(
        fraction_assertions
    )
    assertion_analysis_summary["Percentage of all MOA Assertions"].append(
        percentage_assertions
    )

In [119]:
feature_id_to_digest_df = pd.DataFrame(
    feature_id_to_digest.items(), columns=["feature_id", "feature_digest"]
)
feature_id_to_digest_df

Unnamed: 0,feature_id,feature_digest
0,1,qd-wbn94ZLfPXYoS5Mdv0IGpN163Z6pr
1,2,qd-wbn94ZLfPXYoS5Mdv0IGpN163Z6pr
2,3,qd-wbn94ZLfPXYoS5Mdv0IGpN163Z6pr
3,4,qd-wbn94ZLfPXYoS5Mdv0IGpN163Z6pr
4,5,qd-wbn94ZLfPXYoS5Mdv0IGpN163Z6pr
...,...,...
889,890,qd-wbn94ZLfPXYoS5Mdv0IGpN163Z6pr
890,891,qd-wbn94ZLfPXYoS5Mdv0IGpN163Z6pr
891,892,qd-wbn94ZLfPXYoS5Mdv0IGpN163Z6pr
892,893,qd-wbn94ZLfPXYoS5Mdv0IGpN163Z6pr


## <a id='toc1_3_'></a>[Normalized Analysis](#toc0_)

In [120]:
normalized_queries_df = pd.read_csv("able_to_normalize_queries.csv", sep="\t")
normalized_queries_df = pd.merge(
    normalized_queries_df,
    feature_id_to_digest_df,
    left_on="variant_id",
    right_on="feature_id",
)
normalized_queries_df.shape

(179, 7)

In [121]:
normalized_queries_df = pd.merge(
    normalized_queries_df["variant_id"],
    moa_df,
    left_on="variant_id",
    right_on="feature_id",
    how="left",
)
normalized_queries_df = normalized_queries_df.drop(columns=["variant_id"])

In [122]:
normalized_queries_df = feature_analysis(
    normalized_queries_df, VariantNormType.NORMALIZED
)
normalized_queries_df


Number of Normalized Features in MOA: 179 / 428
Percentage of Normalized Features in MOA: 41.82%


Unnamed: 0,assertion_id,feature_id,feature_type,predictive_implication,feature_digest,category,impact_score
0,71,71,somatic_variant,Preclinical,KgolzM3HWhww4t4HywFYCySUtGRIQ_mx,Protein Consequence,1
1,73,73,somatic_variant,Clinical evidence,j3HtSnIdrU8CcuW8_Qs3qVxOn-kMJV1T,Protein Consequence,5
2,75,75,somatic_variant,Clinical evidence,X_Az48pPjt4IODuY2a50Yl2_1tGopcuF,Protein Consequence,5
3,76,76,somatic_variant,Clinical evidence,LQQXFXpA4FCOQ3Fz4988x2vynER4J-Wh,Protein Consequence,5
4,77,77,somatic_variant,Clinical evidence,DKoCqZUY0WBdUnoly9DL_PAjBBZTs51d,Protein Consequence,5
...,...,...,...,...,...,...,...
174,858,858,copy_number,Clinical evidence,UYE-1dofAcf0kc44xdOY2hxwkMNUzjl7,Copy Number,5
175,859,859,copy_number,Inferential,s8SpNzXJuTJlGEqC0Rk-zd8ke9l4fq00,Copy Number,0.5
176,868,868,somatic_variant,FDA-Approved,xEngbInsi1BKQp2pVFi44N8CYLj6ZEkD,Protein Consequence,10
177,869,869,somatic_variant,FDA-Approved,fqvuveTjuO96HizOsbWgFQmfF76lGtdl,Protein Consequence,10


In [123]:
assertion_analysis(moa_df, normalized_queries_df, VariantNormType.NORMALIZED)

Number of Normalized Feature Assertions in MOA: 325 / 894
Percentage of Normalized Feature Assertions in MOA: 36.35%


## <a id='toc1_4_'></a>[Not Supported Analysis](#toc0_)

In [124]:
not_supported_queries_df = pd.read_csv("not_supported_variants.csv", sep="\t")
not_supported_queries_df = pd.merge(
    not_supported_queries_df,
    feature_id_to_digest_df,
    left_on="variant_id",
    right_on="feature_id",
)
not_supported_queries_df.shape

(244, 6)

In [125]:
not_supported_queries_df = pd.merge(
    not_supported_queries_df["variant_id"],
    moa_df,
    left_on="variant_id",
    right_on="feature_id",
    how="left",
)
not_supported_queries_df = not_supported_queries_df.drop(columns=["variant_id"])
not_supported_queries_df

Unnamed: 0,assertion_id,feature_id,feature_type,predictive_implication,feature_digest,category,impact_score
0,1,1,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP,Rearrangements,10
1,12,12,rearrangement,FDA-Approved,g99yF3kKnB-We_fMS5RaVygoSuT7qA-I,Rearrangements,10
2,15,15,rearrangement,FDA-Approved,e8PMq2A96-aBJ3Ip74ovx5VOUCztBTq7,Rearrangements,10
3,18,18,rearrangement,Guideline,DxfRiRV-3J6zRON4pnzNJjXkJf2bsp20,Rearrangements,10
4,21,21,rearrangement,Preclinical,BRsPjsZSCyDXnKtBt9XgsWX2JDNWY3FP,Rearrangements,1
...,...,...,...,...,...,...,...
239,849,849,copy_number,Inferential,jLD_tOKaW8wX5P2RiZ7EkMT8yBzfA_U_,Copy Number,0.5
240,853,853,copy_number,Inferential,3ZPmhQucEgPWkaRLg9viECM4O4pEg-BU,Copy Number,0.5
241,862,862,copy_number,Preclinical,LwEU_0YQA4iVchOdEiBrq_RmIciU-9EW,Copy Number,1
242,863,863,rearrangement,FDA-Approved,MpJsmn4LCLMXDsTNTTlyV5t3fKAnsJzL,Rearrangements,10


### <a id='toc1_4_1_'></a>[Feature (Variant) Analysis](#toc0_)

In [126]:
not_supported_queries_df = feature_analysis(
    not_supported_queries_df, VariantNormType.NOT_SUPPORTED
)


Number of Not Supported Features in MOA: 244 / 428
Percentage of Not Supported Features in MOA: 57.01%


### <a id='toc1_4_2_'></a>[Not Supported Feature (Variant) Analysis by Subcategory](#toc0_)

In [127]:
not_supported_feature_analysis_summary = {
    "Category": VARIANT_CATEGORY_VALUES,
    "Count of MOA Features per Category": [],
    "Fraction of all MOA Features": [],
    "Percent of all MOA Features": [],
    "Fraction of Not Supported Features": [],
    "Percent of Not Supported Features": [],
}

In [128]:
not_supported_feature_categories_summary_data = dict()
total_number_unique_not_supported_features = len(
    set(not_supported_queries_df.feature_id)
)

for category in VARIANT_CATEGORY_VALUES:  # These are not supported categories
    not_supported_feature_categories_summary_data[category] = {}
    category_df = not_supported_queries_df[
        not_supported_queries_df.category == category
    ]

    # Count
    number_unique_not_supported_category_features = len(set(category_df.feature_id))
    not_supported_feature_categories_summary_data[category][
        "number_unique_not_supported_category_features"
    ] = number_unique_not_supported_category_features

    # Fraction
    fraction_not_supported_category_feature_of_moa = (
        f"{number_unique_not_supported_category_features} / {total_len_features}"
    )
    not_supported_feature_categories_summary_data[category][
        "fraction_not_supported_category_feature_of_moa"
    ] = fraction_not_supported_category_feature_of_moa

    # Percent
    percent_not_supported_category_feature_of_moa = f"{number_unique_not_supported_category_features / total_len_features * 100:.2f}%"
    not_supported_feature_categories_summary_data[category][
        "percent_not_supported_category_feature_of_moa"
    ] = percent_not_supported_category_feature_of_moa

    # Not supported fraction
    fraction_not_supported_category_feature_of_total_not_supported = f"{number_unique_not_supported_category_features} / {total_number_unique_not_supported_features}"
    not_supported_feature_categories_summary_data[category][
        "fraction_not_supported_category_feature_of_total_not_supported"
    ] = fraction_not_supported_category_feature_of_total_not_supported

    # Not supported percent
    percent_not_supported_category_feature_of_total_not_supported = f"{number_unique_not_supported_category_features / total_number_unique_not_supported_features * 100:.2f}%"
    not_supported_feature_categories_summary_data[category][
        "percent_not_supported_category_feature_of_total_not_supported"
    ] = percent_not_supported_category_feature_of_total_not_supported

    not_supported_feature_analysis_summary["Count of MOA Features per Category"].append(
        number_unique_not_supported_category_features
    )
    not_supported_feature_analysis_summary["Fraction of all MOA Features"].append(
        fraction_not_supported_category_feature_of_moa
    )
    not_supported_feature_analysis_summary["Percent of all MOA Features"].append(
        percent_not_supported_category_feature_of_moa
    )
    not_supported_feature_analysis_summary["Fraction of Not Supported Features"].append(
        fraction_not_supported_category_feature_of_total_not_supported
    )
    not_supported_feature_analysis_summary["Percent of Not Supported Features"].append(
        percent_not_supported_category_feature_of_total_not_supported
    )

In [129]:
number_unique_not_supported_category_features

0

In [130]:
not_supported_variant_df = pd.DataFrame(not_supported_feature_analysis_summary)
not_supported_variant_df

Unnamed: 0,Category,Count of MOA Features per Category,Fraction of all MOA Features,Percent of all MOA Features,Fraction of Not Supported Features,Percent of Not Supported Features
0,Expression,11,11 / 428,2.57%,11 / 244,4.51%
1,Epigenetic Modification,0,0 / 428,0.00%,0 / 244,0.00%
2,Fusion,0,0 / 428,0.00%,0 / 244,0.00%
3,Protein Consequence,169,169 / 428,39.49%,169 / 244,69.26%
4,Gene Function,0,0 / 428,0.00%,0 / 244,0.00%
5,Rearrangements,38,38 / 428,8.88%,38 / 244,15.57%
6,Copy Number,17,17 / 428,3.97%,17 / 244,6.97%
7,Other,9,9 / 428,2.10%,9 / 244,3.69%
8,Genotypes Easy,0,0 / 428,0.00%,0 / 244,0.00%
9,Genotypes Compound,0,0 / 428,0.00%,0 / 244,0.00%


### <a id='toc1_4_3_'></a>[Not Support Feature (Variant) Assertion (Evidence) Analysis by Subcategory](#toc0_)

List all the possible variant categories

In [131]:
not_supported_feature_categories = not_supported_queries_df.category.unique()
[v for v in not_supported_feature_categories]

['Rearrangements', 'Protein Consequence', 'Copy Number', 'Other', 'Expression']

In [132]:
not_supported_queries_df

Unnamed: 0,assertion_id,feature_id,feature_type,predictive_implication,feature_digest,category,impact_score
0,1,1,rearrangement,FDA-Approved,RnRyn89cJzVbVM93aw4OA44NIF5zblyP,Rearrangements,10
1,12,12,rearrangement,FDA-Approved,g99yF3kKnB-We_fMS5RaVygoSuT7qA-I,Rearrangements,10
2,15,15,rearrangement,FDA-Approved,e8PMq2A96-aBJ3Ip74ovx5VOUCztBTq7,Rearrangements,10
3,18,18,rearrangement,Guideline,DxfRiRV-3J6zRON4pnzNJjXkJf2bsp20,Rearrangements,10
4,21,21,rearrangement,Preclinical,BRsPjsZSCyDXnKtBt9XgsWX2JDNWY3FP,Rearrangements,1
...,...,...,...,...,...,...,...
239,849,849,copy_number,Inferential,jLD_tOKaW8wX5P2RiZ7EkMT8yBzfA_U_,Copy Number,0.5
240,853,853,copy_number,Inferential,3ZPmhQucEgPWkaRLg9viECM4O4pEg-BU,Copy Number,0.5
241,862,862,copy_number,Preclinical,LwEU_0YQA4iVchOdEiBrq_RmIciU-9EW,Copy Number,1
242,863,863,rearrangement,FDA-Approved,MpJsmn4LCLMXDsTNTTlyV5t3fKAnsJzL,Rearrangements,10


In [133]:
assertion_analysis(moa_df, not_supported_queries_df, VariantNormType.NOT_SUPPORTED)

Number of Not Supported Feature Assertions in MOA: 564 / 894
Percentage of Not Supported Feature Assertions in MOA: 63.09%


In [134]:
not_supported_feature_assertion_summary = {
    "Category": VARIANT_CATEGORY_VALUES,
    "Count of MOA Assertions per Category": [],
    "Fraction of MOA Assertions": [],
    "Percent of all MOA Assertions": [],
    "Fraction of Not Supported Feature Assertions": [],
    "Percent of Not Supported Feature Assertions": [],
}

In [135]:
not_supported_feature_ids = set(not_supported_queries_df.feature_digest)

In [136]:
not_supported_feature_categories_assertion_summary_data = dict()
total_number_not_supported_feature_unique_assertions = len(
    set(not_supported_queries_df.assertion_id)
)
not_supported_feature_ids = set(not_supported_queries_df.feature_digest)

for category in VARIANT_CATEGORY_VALUES:
    not_supported_feature_categories_assertion_summary_data[category] = {}

    # Need to do this bc of duplicate features
    tmp_df = moa_df[moa_df["feature_digest"].isin(not_supported_feature_ids)]

    evidence_category_df = tmp_df[tmp_df.category == category]

    evidence_category_df = evidence_category_df.drop_duplicates(subset=["assertion_id"])

    # Count
    number_unique_not_supported_category_assertion = len(
        set(evidence_category_df.assertion_id)
    )
    not_supported_feature_categories_assertion_summary_data[category][
        "number_unique_not_supported_category_assertion"
    ] = number_unique_not_supported_category_assertion

    # Fraction
    fraction_not_supported_category_feature_assertion_of_moa = (
        f"{number_unique_not_supported_category_assertion} / {total_len_assertions}"
    )
    not_supported_feature_categories_assertion_summary_data[category][
        "fraction_not_supported_category_feature_assertion_of_moa"
    ] = fraction_not_supported_category_feature_assertion_of_moa

    # Percent
    percent_not_supported_category_feature_assertion_of_moa = f"{number_unique_not_supported_category_assertion / total_len_assertions * 100:.2f}%"
    not_supported_feature_categories_assertion_summary_data[category][
        "percent_not_supported_category_feature_assertion_of_moa"
    ] = percent_not_supported_category_feature_assertion_of_moa

    # Not supported fraction
    fraction_not_supported_category_feature_assertion_of_total_not_supported = f"{number_unique_not_supported_category_assertion} / {total_number_not_supported_feature_unique_assertions}"
    not_supported_feature_categories_assertion_summary_data[category][
        "fraction_not_supported_category_feature_assertion_of_total_not_supported"
    ] = fraction_not_supported_category_feature_assertion_of_total_not_supported

    # Not supported percent
    percent_not_supported_category_feature_assertion_of_total_not_supported = f"{number_unique_not_supported_category_assertion / total_number_not_supported_feature_unique_assertions * 100:.2f}%"
    not_supported_feature_categories_assertion_summary_data[category][
        "percent_not_supported_category_feature_assertion_of_total_not_supported"
    ] = percent_not_supported_category_feature_assertion_of_total_not_supported

    not_supported_feature_assertion_summary[
        "Count of MOA Assertions per Category"
    ].append(number_unique_not_supported_category_assertion)
    not_supported_feature_assertion_summary["Fraction of MOA Assertions"].append(
        fraction_not_supported_category_feature_assertion_of_moa
    )
    not_supported_feature_assertion_summary["Percent of all MOA Assertions"].append(
        percent_not_supported_category_feature_assertion_of_moa
    )
    not_supported_feature_assertion_summary[
        "Fraction of Not Supported Feature Assertions"
    ].append(fraction_not_supported_category_feature_assertion_of_total_not_supported)
    not_supported_feature_assertion_summary[
        "Percent of Not Supported Feature Assertions"
    ].append(percent_not_supported_category_feature_assertion_of_total_not_supported)

In [137]:
number_unique_not_supported_category_features

0

### <a id='toc1_4_4_'></a>[Impact Score Analysis by Subcategory](#toc0_)

In [138]:
not_supported_impact_summary = {
    "Category": VARIANT_CATEGORY_VALUES,
    "MOA Total Sum Impact Score": [],
    "Average Impact Score per Feature": [],
    "Average Impact Score per Assertion": [],
    "Total Number Assertions": [
        v["number_unique_not_supported_category_assertion"]
        for v in not_supported_feature_categories_assertion_summary_data.values()
    ],
    "Total Number Features": [
        v["number_unique_not_supported_category_features"]
        for v in not_supported_feature_categories_summary_data.values()
    ],
}

In [139]:
not_supported_feature_categories_impact_data = dict()
for category in VARIANT_CATEGORY_VALUES:
    # print(category)
    not_supported_feature_categories_impact_data[category] = {}
    impact_category_df = not_supported_queries_df[
        not_supported_queries_df["category"] == category
    ].copy()

    total_sum_not_supported_category_impact = impact_category_df["impact_score"].sum()
    # print(f"total sum {total_sum_not_supported_category_impact}")
    not_supported_feature_categories_impact_data[category][
        "total_sum_not_supported_category_impact"
    ] = total_sum_not_supported_category_impact
    # print(f"")
    number_unique_not_supported_category_features = (
        impact_category_df.feature_id.nunique()
    )
    number_unique_not_supported_category_assertion = (
        impact_category_df.assertion_id.nunique()
    )
    # print(number_unique_not_supported_category_features)
    # print(number_unique_not_supported_category_assertion)

    if number_unique_not_supported_category_features == 0:
        avg_impact_score_feature = 0
        not_supported_feature_categories_impact_data[category][
            "avg_impact_score_feature"
        ] = avg_impact_score_feature

        avg_impact_score_assertion = 0
        not_supported_feature_categories_impact_data[category][
            "avg_impact_score_evidence"
        ] = avg_impact_score_assertion
    else:
        avg_impact_score_feature = f"{total_sum_not_supported_category_impact / number_unique_not_supported_category_features:.2f}"
        not_supported_feature_categories_impact_data[category][
            "avg_impact_score_feature"
        ] = avg_impact_score_feature

        avg_impact_score_assertion = f"{total_sum_not_supported_category_impact / number_unique_not_supported_category_assertion:.2f}"
        not_supported_feature_categories_impact_data[category][
            "avg_impact_score_evidence"
        ] = avg_impact_score_assertion

    not_supported_impact_summary["MOA Total Sum Impact Score"].append(
        total_sum_not_supported_category_impact
    )
    not_supported_impact_summary["Average Impact Score per Feature"].append(
        avg_impact_score_feature
    )
    not_supported_impact_summary["Average Impact Score per Assertion"].append(
        avg_impact_score_assertion
    )

    print(
        f"Number of unique features within category: {number_unique_not_supported_category_features}"
    )
    print(
        f"{category}: {total_sum_not_supported_category_impact}, {avg_impact_score_feature}, {avg_impact_score_assertion}"
    )

Number of unique features within category: 11
Expression: 11, 1.00, 1.00
Number of unique features within category: 0
Epigenetic Modification: 0, 0, 0
Number of unique features within category: 0
Fusion: 0, 0, 0
Number of unique features within category: 169
Protein Consequence: 1054.5, 6.24, 6.24
Number of unique features within category: 0
Gene Function: 0, 0, 0
Number of unique features within category: 38
Rearrangements: 291.0, 7.66, 7.66
Number of unique features within category: 17
Copy Number: 44.5, 2.62, 2.62
Number of unique features within category: 9
Other: 32.5, 3.61, 3.61
Number of unique features within category: 0
Genotypes Easy: 0, 0, 0
Number of unique features within category: 0
Genotypes Compound: 0, 0, 0
Number of unique features within category: 0
Region Defined Variant: 0, 0, 0
Number of unique features within category: 0
Transcript Variant: 0, 0, 0


In [140]:
not_supported_feature_impact_df = pd.DataFrame(not_supported_impact_summary)

In [141]:
not_supported_feature_impact_df

Unnamed: 0,Category,MOA Total Sum Impact Score,Average Impact Score per Feature,Average Impact Score per Assertion,Total Number Assertions,Total Number Features
0,Expression,11.0,1.0,1.0,12,11
1,Epigenetic Modification,0.0,0.0,0.0,0,0
2,Fusion,0.0,0.0,0.0,0,0
3,Protein Consequence,1054.5,6.24,6.24,419,169
4,Gene Function,0.0,0.0,0.0,0,0
5,Rearrangements,291.0,7.66,7.66,81,38
6,Copy Number,44.5,2.62,2.62,29,17
7,Other,32.5,3.61,3.61,23,9
8,Genotypes Easy,0.0,0.0,0.0,0,0
9,Genotypes Compound,0.0,0.0,0.0,0,0


In [142]:
not_supported_feature_impact_df.to_csv(
    "../not_supported_feature_impact_df.csv", index=False
)

# <a id='toc2_'></a>[MOA Summary](#toc0_)

## <a id='toc2_1_'></a>[Feature (Variant) Analysis](#toc0_)

In [143]:
all_features_df = pd.DataFrame(feature_analysis_summary)

In [144]:
all_features_df["Percentage of all MOA Features"] = (
    all_features_df["Fraction of all MOA Features"].astype(str)
    + "  ("
    + all_features_df["Percentage of all MOA Features"]
    + ")"
)

In [145]:
for_merge_all_variant_percent_of_moa_df = all_features_df.drop(
    columns=["Fraction of all MOA Features"]
)

all_features_percent_of_moa_df = all_features_df.drop(
    columns=["Fraction of all MOA Features", "Count of MOA Features per Category"]
)

In [146]:
for_merge_all_variant_percent_of_moa_df.to_csv(
    "../for_merge_all_variant_percent_of_moa_df.csv", index=False
)

Summary Table 1: The table below shows the 2 categories that MOA features (variants) were divided into after normalization and what percentage they make up of all features (variants) in MOA data. 

In [147]:
all_features_percent_of_moa_df = all_features_percent_of_moa_df.set_index(
    "Variant Category"
)
all_features_percent_of_moa_df

Unnamed: 0_level_0,Percentage of all MOA Features
Variant Category,Unnamed: 1_level_1
Normalized,179 / 428 (41.82%)
Not Supported,244 / 428 (57.01%)


In [148]:
moa_summary_table_1 = all_features_percent_of_moa_df

Summary Table 2: The table below shows the categories that the Not Supported features (variants) were broken into and what percentage of all MOA features (variants) they make up.

In [149]:
not_supported_features_total_df = pd.DataFrame(not_supported_feature_analysis_summary)
not_supported_features_total_df["Percent of all MOA Features"] = (
    not_supported_features_total_df["Fraction of all MOA Features"].astype(str)
    + "  ("
    + not_supported_features_total_df["Percent of all MOA Features"]
    + ")"
)
for_merge_not_supported_features_total_df = not_supported_features_total_df.drop(
    columns=[
        "Fraction of all MOA Features",
        "Fraction of Not Supported Features",
        "Percent of Not Supported Features",
    ]
)

not_supported_features_total_df = not_supported_features_total_df.drop(
    columns=[
        "Fraction of all MOA Features",
        "Fraction of Not Supported Features",
        "Percent of Not Supported Features",
        "Count of MOA Features per Category",
    ]
)
not_supported_features_total_df = not_supported_features_total_df.set_index("Category")
not_supported_features_total_df

Unnamed: 0_level_0,Percent of all MOA Features
Category,Unnamed: 1_level_1
Expression,11 / 428 (2.57%)
Epigenetic Modification,0 / 428 (0.00%)
Fusion,0 / 428 (0.00%)
Protein Consequence,169 / 428 (39.49%)
Gene Function,0 / 428 (0.00%)
Rearrangements,38 / 428 (8.88%)
Copy Number,17 / 428 (3.97%)
Other,9 / 428 (2.10%)
Genotypes Easy,0 / 428 (0.00%)
Genotypes Compound,0 / 428 (0.00%)


In [150]:
moa_summary_table_2 = not_supported_features_total_df

In [151]:
for_merge_not_supported_features_total_df.to_csv(
    "../for_merge_not_supported_features_total_df.csv", index=False
)

Summary Table 3: The table below shows the categories that the Not Supported features (variants) were broken into what percent each sub category take up in Not Supported variant group.

In [152]:
not_supported_features_category_df = pd.DataFrame(
    not_supported_feature_analysis_summary
)
not_supported_features_category_df["Percent of Not Supported Features"] = (
    not_supported_features_category_df["Fraction of Not Supported Features"].astype(str)
    + "  ("
    + not_supported_features_category_df["Percent of Not Supported Features"]
    + ")"
)
not_supported_features_category_df = not_supported_features_category_df.drop(
    columns=[
        "Fraction of all MOA Features",
        "Fraction of Not Supported Features",
        "Percent of all MOA Features",
        "Count of MOA Features per Category",
    ]
)
not_supported_features_category_df = not_supported_features_category_df.set_index(
    "Category"
)
not_supported_features_category_df

Unnamed: 0_level_0,Percent of Not Supported Features
Category,Unnamed: 1_level_1
Expression,11 / 244 (4.51%)
Epigenetic Modification,0 / 244 (0.00%)
Fusion,0 / 244 (0.00%)
Protein Consequence,169 / 244 (69.26%)
Gene Function,0 / 244 (0.00%)
Rearrangements,38 / 244 (15.57%)
Copy Number,17 / 244 (6.97%)
Other,9 / 244 (3.69%)
Genotypes Easy,0 / 244 (0.00%)
Genotypes Compound,0 / 244 (0.00%)


In [153]:
moa_summary_table_3 = not_supported_features_category_df

## <a id='toc2_2_'></a>[Evidence Analysis](#toc0_)

In [154]:
all_features_assertions_df = pd.DataFrame(assertion_analysis_summary)

In [155]:
all_features_assertions_df["Percentage of all MOA Assertions"] = (
    all_features_assertions_df["Fraction of all MOA Assertions"].astype(str)
    + "  ("
    + all_features_assertions_df["Percentage of all MOA Assertions"]
    + ")"
)

In [156]:
for_merge_all_features_assertions_df = all_features_assertions_df.drop(
    columns=["Fraction of all MOA Assertions"]
)

all_features_assertions_df = for_merge_all_features_assertions_df.drop(
    columns=["Count of MOA Assertions per Category"]
)

In [157]:
for_merge_all_features_assertions_df.to_csv(
    "../for_merge_all_features_assertions_df.csv", index=False
)

Summary Table 4: The table below shows what percentage of all assertions (evidence items) in MOA are associated with Normalized and Not Supported features (variants)

In [158]:
all_features_assertions_df = all_features_assertions_df.set_index("Variant Category")
moa_summary_table_4 = all_features_assertions_df
moa_summary_table_4

Unnamed: 0_level_0,Percentage of all MOA Assertions
Variant Category,Unnamed: 1_level_1
Normalized,325 / 894 (36.35%)
Not Supported,564 / 894 (63.09%)


In [159]:
not_supported_feature_assertion_df = pd.DataFrame(
    not_supported_feature_assertion_summary
)

In [160]:
not_supported_feature_assertion_df["Percent of all MOA Assertions"] = (
    not_supported_feature_assertion_df["Fraction of MOA Assertions"].astype(str)
    + "  ("
    + not_supported_feature_assertion_df["Percent of all MOA Assertions"]
    + ")"
)
not_supported_feature_assertion_df["Percent of Not Supported Feature Assertions"] = (
    not_supported_feature_assertion_df[
        "Fraction of Not Supported Feature Assertions"
    ].astype(str)
    + "  ("
    + not_supported_feature_assertion_df["Percent of Not Supported Feature Assertions"]
    + ")"
)

In [161]:
not_supported_feature_assertion_df = not_supported_feature_assertion_df.drop(
    columns=[
        "Fraction of MOA Assertions",
        "Fraction of Not Supported Feature Assertions",
    ]
)

In [162]:
for_merge_not_supported_feature_assertion_df = not_supported_feature_assertion_df.drop(
    ["Percent of Not Supported Feature Assertions"], axis=1
)

not_supported_feature_assertion_of_moa_df = (
    for_merge_not_supported_feature_assertion_df.drop(
        ["Count of MOA Assertions per Category"], axis=1
    )
)

not_supported_feature_assertion_of_not_supported_df = (
    not_supported_feature_assertion_df.drop(
        ["Percent of all MOA Assertions", "Count of MOA Assertions per Category"],
        axis=1,
    )
)

In [163]:
for_merge_not_supported_feature_assertion_df.to_csv(
    "../for_merge_not_supported_feature_assertion_df.csv", index=False
)

Summary Table 5: The table below shows the percentage of all MOA assertions (evidence items) that are associated with a Not Supported variant sub category.

In [164]:
not_supported_feature_assertion_of_moa_df = (
    not_supported_feature_assertion_of_moa_df.set_index("Category")
)
moa_summary_table_5 = not_supported_feature_assertion_of_moa_df
moa_summary_table_5

Unnamed: 0_level_0,Percent of all MOA Assertions
Category,Unnamed: 1_level_1
Expression,12 / 894 (1.34%)
Epigenetic Modification,0 / 894 (0.00%)
Fusion,0 / 894 (0.00%)
Protein Consequence,419 / 894 (46.87%)
Gene Function,0 / 894 (0.00%)
Rearrangements,81 / 894 (9.06%)
Copy Number,29 / 894 (3.24%)
Other,23 / 894 (2.57%)
Genotypes Easy,0 / 894 (0.00%)
Genotypes Compound,0 / 894 (0.00%)


Summary Table 6: The table below shows the percentage of all MOA Assertions (evidence items) associated with Not Supported features (variants) that are associated with a variant sub category. 

In [165]:
not_supported_feature_assertion_of_not_supported_df = (
    not_supported_feature_assertion_of_not_supported_df.set_index("Category")
)
moa_summary_table_6 = not_supported_feature_assertion_of_not_supported_df
moa_summary_table_6

Unnamed: 0_level_0,Percent of Not Supported Feature Assertions
Category,Unnamed: 1_level_1
Expression,12 / 244 (4.92%)
Epigenetic Modification,0 / 244 (0.00%)
Fusion,0 / 244 (0.00%)
Protein Consequence,419 / 244 (171.72%)
Gene Function,0 / 244 (0.00%)
Rearrangements,81 / 244 (33.20%)
Copy Number,29 / 244 (11.89%)
Other,23 / 244 (9.43%)
Genotypes Easy,0 / 244 (0.00%)
Genotypes Compound,0 / 244 (0.00%)


## <a id='toc2_3_'></a>[Impact](#toc0_)

The bar graph below shows the relationship between the Not Suported variant sub category impact score and the sub category. Additionally, the colors illustrate the number of assertions (evidence items) associated each sub category.

In [166]:
not_supported_feature_impact_df = pd.DataFrame(not_supported_impact_summary)
not_supported_feature_impact_df

Unnamed: 0,Category,MOA Total Sum Impact Score,Average Impact Score per Feature,Average Impact Score per Assertion,Total Number Assertions,Total Number Features
0,Expression,11.0,1.0,1.0,12,11
1,Epigenetic Modification,0.0,0.0,0.0,0,0
2,Fusion,0.0,0.0,0.0,0,0
3,Protein Consequence,1054.5,6.24,6.24,419,169
4,Gene Function,0.0,0.0,0.0,0,0
5,Rearrangements,291.0,7.66,7.66,81,38
6,Copy Number,44.5,2.62,2.62,29,17
7,Other,32.5,3.61,3.61,23,9
8,Genotypes Easy,0.0,0.0,0.0,0,0
9,Genotypes Compound,0.0,0.0,0.0,0,0


In [167]:
not_supported_feature_impact_df.to_csv(
    "../not_supported_feature_impact_df.csv", index=False
)

In [168]:
fig3 = px.bar(
    not_supported_feature_impact_df,
    x="Category",
    y="MOA Total Sum Impact Score",
    hover_data=["Total Number Assertions"],
    color="Total Number Assertions",
    labels={"MOA Total Sum Impact Score": "MOA Total Sum Impact Score"},
    text_auto=".1f",
    color_continuous_scale="geyser",
)
fig3.update_traces(width=1)
fig3.show()

In [169]:
fig3.write_html("moa_ns_categories_impact_redgreen.html")

The scatterplot below shows the relationship between the Not Suported variant sub category impact score and the number of assertions (evidence items) associated with features (variants) in each sub category. Additionally, the sizes of the data point represent the number of features (variants) in each sub category. 

In [170]:
fig2 = px.scatter(
    data_frame=not_supported_feature_impact_df,
    x="Total Number Assertions",
    y="MOA Total Sum Impact Score",
    size="Total Number Features",
    size_max=40,
    text="Total Number Features",
    color="Category",
)
fig2.show()

In [171]:
fig2.write_html("moa_ns_categories_impact_scatterplot.html")