# <a id='toc1_'></a>[Merged notebooks for Not Supported CIViC and MOA evidence (assertion analysis)](#toc0_)

**Table of contents**<a id='toc0_'></a>    
- [Merged notebooks for Not Supported CIViC and MOA evidence (assertion analysis)](#toc1_)    
    - [Create output directory](#toc1_1_1_)    
  - [Merge CIViC and MOA Summary Variant Dataframes](#toc1_2_)    
    - [Summary Table 1](#toc1_2_1_)    
    - [Building Summary Table 2](#toc1_2_2_)    
    - [Summary Table 2](#toc1_2_3_)    
    - [Building Summary Table 3](#toc1_2_4_)    
    - [Summary Table 3](#toc1_2_5_)    
    - [Building Summary Table 4](#toc1_2_6_)    
    - [Summary Table 4](#toc1_2_7_)    
    - [Building Summary Table 5](#toc1_2_8_)    
    - [Summary Table 5](#toc1_2_9_)    
    - [Building Summary Table 6](#toc1_2_10_)    
    - [Summary Table 6](#toc1_2_11_)    
  - [Merge CIViC and MOA Summary Evidence Dataframes](#toc1_3_)    
    - [Summary Table 7](#toc1_3_1_)    
    - [Building Summary Table 8](#toc1_3_2_)    
    - [Summary Table 8](#toc1_3_3_)    
    - [Building Summary Table 9](#toc1_3_4_)    
    - [Summary Table 9](#toc1_3_5_)    
    - [Building Summary Table 10](#toc1_3_6_)    
    - [Summary Table 10](#toc1_3_7_)    
    - [Building Summary Table 11](#toc1_3_8_)    
    - [Summary Table 11](#toc1_3_9_)    
    - [Building Summary Table 12](#toc1_3_10_)    
    - [Summary Table 12](#toc1_3_11_)    
  - [Merge CIViC and MOA Summary Impact Dataframes](#toc1_4_)    
    - [Building Summary Table 13 & 14](#toc1_4_1_)    
    - [Summary Table 13](#toc1_4_2_)    
    - [Summary Table 14](#toc1_4_3_)    
  - [Building Scatterpie plot](#toc1_5_)    
    - [Merge aspects of the dataframe (number of evidence items, variants, impact score)](#toc1_5_1_)    
    - [Calculate the ratio of features/variants that come from MOA](#toc1_5_2_)    
  - [Building Parallel Impact Plot](#toc1_6_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

In [None]:
from pathlib import Path
import zipfile
import pandas as pd
from civicpy import civic as civicpy
from enum import Enum
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.lines import Line2D
from matplotlib.pyplot import figure
from matplotlib.path import Path as mPath
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
from matplotlib.colors import LinearSegmentedColormap

### <a id='toc1_1_1_'></a>[Create output directory](#toc0_)

In [None]:
path = Path("merged_moa_civic_evidence_analysis_output")
path.mkdir(exist_ok=True)

## <a id='toc1_2_'></a>[Merge CIViC and MOA Summary Variant Dataframes](#toc0_)

In [None]:
# Use latest cache that has been pushed to the repo
latest_cache_zip_path = sorted(Path().glob("../../analysis/civic/cache-*.pkl.zip"))[-1]
print(f"Using {latest_cache_zip_path} for civicpy cache")

with zipfile.ZipFile(latest_cache_zip_path, "r") as zip_ref:
    zip_ref.extractall("../../analysis/civic/")

civicpy.load_cache(
    local_cache_path=Path("../../analysis/civic/cache.pkl"), on_stale="ignore"
)
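The cell above picks the last entry of a sorted glob. A minimal sketch of that selection follows, assuming the cache file names embed a zero-padded date so that lexicographic order matches chronological order. The file names and the `pick_latest` helper are hypothetical, and the sketch raises a clear error when no cache file is found instead of an `IndexError`.

```python
from pathlib import Path

# Hypothetical file names; the real cache names in the repo may differ.
candidates = [
    Path("cache-20230801.pkl.zip"),
    Path("cache-20231105.pkl.zip"),
    Path("cache-20230317.pkl.zip"),
]


def pick_latest(paths):
    """Return the lexicographically greatest path, or raise if the list is empty."""
    ordered = sorted(paths)
    if not ordered:
        raise FileNotFoundError("No cache-*.pkl.zip files were found")
    return ordered[-1]


latest = pick_latest(candidates)
print(f"Using {latest} for civicpy cache")
```

If the names were not zero-padded (for example `cache-2023-8-1`), sorting by name would no longer match date order, and sorting by modification time would be a safer choice.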

In [None]:
civic_variant_ids = civicpy.get_all_variants(include_status=["accepted", "submitted"])
total_number_variants = len(civic_variant_ids)
f"Total Number of variants in CIViC: {total_number_variants}"

In [None]:
moa_df = pd.read_csv(
    "../moa/assertion_analysis/moa_assertion_analysis_output/moa_df.csv", sep=","
)

In [None]:
total_len_features = len(moa_df.feature_digest.unique())
f"Total number of unique features (variants): {total_len_features}"

Add all variants from CIViC and MOA

In [None]:
merged_variant_total = total_number_variants + total_len_features
merged_variant_total = float(merged_variant_total)
merged_variant_total

Import summary tables from source notebooks

In [None]:
for_merge_all_variant_percent_of_civic_df = pd.read_csv(
    "../civic/evidence_analysis/civic_evidence_analysis_output/for_merge_all_variant_percent_of_civic_df.csv",
    sep=",",
)
for_merge_all_variant_percent_of_moa_df = pd.read_csv(
    "../moa/assertion_analysis/moa_assertion_analysis_output/for_merge_all_variant_percent_of_moa_df.csv",
    sep=",",
)

Merge CIViC and MOA summary tables

In [None]:
merged_all_variants_df = pd.merge(
    for_merge_all_variant_percent_of_civic_df,
    for_merge_all_variant_percent_of_moa_df,
    on="Variant Category",
    how="outer",
)
merged_all_variants_df = merged_all_variants_df.replace(np.nan, 0, regex=True)
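To illustrate why the outer merge is followed by a NaN replacement, here is a small sketch on toy data. The counts are made up; only the column names follow the notebook. A category present in just one source gets NaN in the other source's columns, and filling with 0 keeps the later sums well defined.

```python
import pandas as pd

# Toy inputs: the counts are illustrative, not real CIViC or MOA numbers.
civic = pd.DataFrame(
    {
        "Variant Category": ["Normalized", "Not Supported"],
        "Count of CIViC Variants per Category": [10, 5],
    }
)
moa = pd.DataFrame(
    {
        "Variant Category": ["Normalized", "Unable to Normalize"],
        "Count of MOA Features per Category": [4, 1],
    }
)

# how="outer" keeps categories that appear in only one of the two tables.
merged = pd.merge(civic, moa, on="Variant Category", how="outer")

# Missing counts become NaN after the merge; treat them as zero.
merged = merged.fillna(0).set_index("Variant Category")
print(merged)
```

Note that filling NaN leaves the count columns as floats, which is why the notebook casts them back to `int` before building the fraction strings.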

### <a id='toc1_2_1_'></a>[Summary Table 1](#toc0_)

The table below shows the 3 categories that CIViC and MOA variants were divided into after normalization and what percent they make up of all variants in the respective source data.

<ins>Numerator:</ins> # of CIViC or MOA variants based on normalization status
<br><ins>Denominator:</ins> # of all CIViC or MOA variants

In [None]:
merged_civic_moa_summary_table_1 = merged_all_variants_df[
    ["Variant Category", "Percent of all CIViC Variants", "Percent of all MOA Features"]
]

merged_civic_moa_summary_table_1 = merged_civic_moa_summary_table_1.set_index(
    "Variant Category"
)
merged_civic_moa_summary_table_1

### <a id='toc1_2_2_'></a>[Building Summary Table 2](#toc0_)

Add up variants from CIViC and MOA for each Variant Category

In [None]:
merged_all_variants_df["Count of CIViC Variants per Category"] = merged_all_variants_df[
    "Count of CIViC Variants per Category"
].astype(float)
merged_all_variants_df["Sum of Variants from CIViC and MOA per Category"] = (
    merged_all_variants_df["Count of CIViC Variants per Category"]
    + merged_all_variants_df["Count of MOA Features per Category"]
)

New column for the total combined variant number from CIViC and MOA

In [None]:
merged_all_variants_df[
    "Sum of total Variants from CIViC and MOA"
] = merged_variant_total

New percent of each category of the total merged variants from CIViC and MOA

In [None]:
merged_all_variants_df["Merged Variant Percent"] = (
    merged_all_variants_df["Sum of Variants from CIViC and MOA per Category"]
    / merged_all_variants_df["Sum of total Variants from CIViC and MOA"]
) * 100
merged_all_variants_df = merged_all_variants_df.round({"Merged Variant Percent": 2})
merged_all_variants_df["Merged Variant Percent"] = (
    merged_all_variants_df["Merged Variant Percent"].astype(str) + "%"
)

In [None]:
merged_all_variants_df[
    "Sum of Variants from CIViC and MOA per Category"
] = merged_all_variants_df["Sum of Variants from CIViC and MOA per Category"].astype(
    int
)

merged_all_variants_df[
    "Sum of total Variants from CIViC and MOA"
] = merged_all_variants_df["Sum of total Variants from CIViC and MOA"].astype(int)

Merge fraction and percent

In [None]:
merged_all_variants_df["Percent of all Merged Variants"] = (
    merged_all_variants_df["Sum of Variants from CIViC and MOA per Category"].astype(
        str
    )
    + " / "
    + merged_all_variants_df["Sum of total Variants from CIViC and MOA"].astype(str)
    + " ("
    + merged_all_variants_df["Merged Variant Percent"].astype(str)
    + ")"
)
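The "count / total (percent%)" string is rebuilt many times in this notebook. As a sketch only, the pattern could be wrapped in one helper. `fraction_with_percent` is a hypothetical name that does not exist in the notebook or in pandas.

```python
import pandas as pd


def fraction_with_percent(numerator: pd.Series, denominator) -> pd.Series:
    """Format each row as 'n / d (p%)', with p rounded to two decimals.

    `denominator` may be a scalar or a Series aligned with `numerator`.
    """
    # Broadcast a scalar, or align a Series, to the numerator's index.
    denominator = pd.Series(denominator, index=numerator.index)
    percent = (numerator / denominator * 100).round(2)
    return (
        numerator.astype(int).astype(str)
        + " / "
        + denominator.astype(int).astype(str)
        + " ("
        + percent.astype(str)
        + "%)"
    )


# Toy counts for two categories out of a combined total of 3 variants.
counts = pd.Series([1, 2], index=["Category A", "Category B"])
formatted = fraction_with_percent(counts, 3)
print(formatted)
```

A helper like this keeps the numeric columns untouched, so they can still be used in later arithmetic without being converted back from strings.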

### <a id='toc1_2_3_'></a>[Summary Table 2](#toc0_)

The table below shows the 3 categories that merged CIViC and MOA variants were divided into after normalization and what percent they make up of all variants in the combined data. 

<ins>Numerator:</ins> # of variants (from CIViC and MOA combined) based on normalization status
<br><ins>Denominator:</ins> # of all variants (from CIViC and MOA combined)

In [None]:
merged_civic_moa_summary_table_2 = merged_all_variants_df[
    ["Variant Category", "Percent of all Merged Variants"]
].copy()
merged_civic_moa_summary_table_2 = merged_civic_moa_summary_table_2.set_index(
    "Variant Category"
)
merged_civic_moa_summary_table_2

### <a id='toc1_2_4_'></a>[Building Summary Table 3](#toc0_)

In [None]:
merged_all_variants_df["Count of CIViC Variants per Category"] = merged_all_variants_df[
    "Count of CIViC Variants per Category"
].astype(int)

merged_all_variants_df["Count of MOA Features per Category"] = merged_all_variants_df[
    "Count of MOA Features per Category"
].astype(int)

New percent of CIViC contribution for each category

In [None]:
merged_all_variants_df["CIViC Variants of Category Percent"] = (
    merged_all_variants_df["Count of CIViC Variants per Category"]
    / merged_all_variants_df["Sum of Variants from CIViC and MOA per Category"]
) * 100
merged_all_variants_df = merged_all_variants_df.round(
    {"CIViC Variants of Category Percent": 2}
)
merged_all_variants_df["CIViC Variants of Category Percent"] = (
    merged_all_variants_df["CIViC Variants of Category Percent"].astype(str) + "%"
)

Merge fraction and percent

In [None]:
merged_all_variants_df["Percent of CIViC Variants of Category"] = (
    merged_all_variants_df["Count of CIViC Variants per Category"].astype(str)
    + " / "
    + merged_all_variants_df["Sum of Variants from CIViC and MOA per Category"].astype(
        str
    )
    + " ("
    + merged_all_variants_df["CIViC Variants of Category Percent"].astype(str)
    + ")"
)

New percent of MOA contribution for each category

In [None]:
merged_all_variants_df["MOA Variants of Category Percent"] = (
    merged_all_variants_df["Count of MOA Features per Category"]
    / merged_all_variants_df["Sum of Variants from CIViC and MOA per Category"]
) * 100
merged_all_variants_df = merged_all_variants_df.round(
    {"MOA Variants of Category Percent": 2}
)
merged_all_variants_df["MOA Variants of Category Percent"] = (
    merged_all_variants_df["MOA Variants of Category Percent"].astype(str) + "%"
)

Merge fraction and percent

In [None]:
merged_all_variants_df["Percent of MOA Variants of Category"] = (
    merged_all_variants_df["Count of MOA Features per Category"].astype(str)
    + " / "
    + merged_all_variants_df["Sum of Variants from CIViC and MOA per Category"].astype(
        str
    )
    + " ("
    + merged_all_variants_df["MOA Variants of Category Percent"].astype(str)
    + ")"
)

### <a id='toc1_2_5_'></a>[Summary Table 3](#toc0_)

The table below shows, for each variant category, what percent of the merged variants originates from CIViC and what percent originates from MOA.

<ins>Numerator:</ins> # of CIViC or MOA variants based on normalization status
<br><ins>Denominator:</ins> # of all variants (from CIViC and MOA combined) based on normalization status

In [None]:
merged_civic_moa_summary_table_3 = merged_all_variants_df[
    [
        "Variant Category",
        "Percent of CIViC Variants of Category",
        "Percent of MOA Variants of Category",
    ]
].copy()
merged_civic_moa_summary_table_3 = merged_civic_moa_summary_table_3.set_index(
    "Variant Category"
)
merged_civic_moa_summary_table_3

### <a id='toc1_2_6_'></a>[Building Summary Table 4](#toc0_)

Import summary tables from source notebooks

In [None]:
for_merge_not_supported_variant_percent_of_civic_df = pd.read_csv(
    "../civic/evidence_analysis/civic_evidence_analysis_output/for_merge_not_supported_variant_percent_of_civic_df.csv",
    sep=",",
)
for_merge_not_supported_features_total_df = pd.read_csv(
    "../moa/assertion_analysis/moa_assertion_analysis_output/for_merge_not_supported_features_total_df.csv",
    sep=",",
)

Merge CIViC and MOA summary tables

In [None]:
merged_not_supported_variants_df = pd.merge(
    for_merge_not_supported_variant_percent_of_civic_df,
    for_merge_not_supported_features_total_df,
    on="Category",
    how="outer",
)
merged_not_supported_variants_df = merged_not_supported_variants_df.replace(
    np.nan, 0, regex=True
)

### <a id='toc1_2_7_'></a>[Summary Table 4](#toc0_)

The table below shows the categories of Not Supported variants and what percent of source (CIViC or MOA) variants they make up. These percentages will not add up to 100% because Not Supported variants are only a subset of CIViC variants and a subset of MOA variants (see Summary Table 1, merged_civic_moa_summary_table_1).

<ins>Numerator:</ins> # of CIViC or MOA variants that are Not Supported in a given Subcategory
<br><ins>Denominator:</ins> # of all CIViC or MOA variants

In [None]:
# Use a dedicated name so Summary Table 3 is not overwritten
merged_civic_moa_summary_table_4 = merged_not_supported_variants_df.drop(
    ["Count of CIViC Variants per Category", "Count of MOA Features per Category"],
    axis=1,
)
merged_civic_moa_summary_table_4 = merged_civic_moa_summary_table_4.set_index(
    "Category"
)
merged_civic_moa_summary_table_4

### <a id='toc1_2_8_'></a>[Building Summary Table 5](#toc0_)

Add up Not Supported variants from CIViC and MOA for each Category

In [None]:
merged_not_supported_variants_df["Sum of Variants from CIViC and MOA per Category"] = (
    merged_not_supported_variants_df["Count of CIViC Variants per Category"]
    + merged_not_supported_variants_df["Count of MOA Features per Category"]
)
merged_not_supported_variants_df["Sum of total Variants from CIViC and MOA"] = int(
    merged_variant_total
)

New percent of each category of the total merged variants from CIViC and MOA

In [None]:
merged_not_supported_variants_df[
    "Merged Not Supported Variant Category of Merged Total Percent"
] = (
    merged_not_supported_variants_df["Sum of Variants from CIViC and MOA per Category"]
    / merged_not_supported_variants_df["Sum of total Variants from CIViC and MOA"]
) * 100
merged_not_supported_variants_df = merged_not_supported_variants_df.round(
    {"Merged Not Supported Variant Category of Merged Total Percent": 2}
)
merged_not_supported_variants_df[
    "Merged Not Supported Variant Category of Merged Total Percent"
] = (
    merged_not_supported_variants_df[
        "Merged Not Supported Variant Category of Merged Total Percent"
    ].astype(str)
    + "%"
)

Merge fraction and percent

In [None]:
merged_not_supported_variants_df["Percent of all Merged Variants"] = (
    merged_not_supported_variants_df[
        "Sum of Variants from CIViC and MOA per Category"
    ].astype(str)
    + " / "
    + merged_not_supported_variants_df[
        "Sum of total Variants from CIViC and MOA"
    ].astype(str)
    + " ("
    + merged_not_supported_variants_df[
        "Merged Not Supported Variant Category of Merged Total Percent"
    ].astype(str)
    + ")"
)

### <a id='toc1_2_9_'></a>[Summary Table 5](#toc0_)

The table below shows the categories of Not Supported variants and what percent of all variants (CIViC and MOA) they make up. These percentages will not add up to 100% because Not Supported variants make up a subset of all variants (CIViC and MOA). 

<ins>Numerator:</ins> # of variants that are Not Supported in a given Subcategory
<br><ins>Denominator:</ins> # of all variants

In [None]:
merged_civic_moa_summary_table_5 = merged_not_supported_variants_df[
    ["Category", "Percent of all Merged Variants"]
].copy()
merged_civic_moa_summary_table_5 = merged_civic_moa_summary_table_5.set_index(
    "Category"
)
merged_civic_moa_summary_table_5

### <a id='toc1_2_10_'></a>[Building Summary Table 6](#toc0_)

Add up all variants, for total number of Not Supported Variants

In [None]:
merged_not_supported_variant_total = merged_not_supported_variants_df[
    "Sum of Variants from CIViC and MOA per Category"
].sum()
merged_not_supported_variants_df[
    "Merged Not Supported Variant Total"
] = merged_not_supported_variant_total

New percent of each category of the total merged Not Supported variants from CIViC and MOA

In [None]:
merged_not_supported_variants_df[
    "Merged Not Supported Variant Category of Not Supported Percent"
] = (
    merged_not_supported_variants_df["Sum of Variants from CIViC and MOA per Category"]
    / merged_not_supported_variants_df["Merged Not Supported Variant Total"]
) * 100
merged_not_supported_variants_df = merged_not_supported_variants_df.round(
    {"Merged Not Supported Variant Category of Not Supported Percent": 2}
)
merged_not_supported_variants_df[
    "Merged Not Supported Variant Category of Not Supported Percent"
] = (
    merged_not_supported_variants_df[
        "Merged Not Supported Variant Category of Not Supported Percent"
    ].astype(str)
    + "%"
)

Merge fraction and percent

In [None]:
merged_not_supported_variants_df["Percent of all Not Supported Variants"] = (
    merged_not_supported_variants_df[
        "Sum of Variants from CIViC and MOA per Category"
    ].astype(str)
    + " / "
    + merged_not_supported_variants_df["Merged Not Supported Variant Total"].astype(str)
    + " ("
    + merged_not_supported_variants_df[
        "Merged Not Supported Variant Category of Not Supported Percent"
    ].astype(str)
    + ")"
)

### <a id='toc1_2_11_'></a>[Summary Table 6](#toc0_)

The table below shows the categories of Not Supported variants and what percent of all merged Not Supported variants each category makes up.

<ins>Numerator:</ins> # of variants that are Not Supported in a given Subcategory
<br><ins>Denominator:</ins> # of all variants that are Not Supported

In [None]:
merged_civic_moa_summary_table_6 = merged_not_supported_variants_df[
    ["Category", "Percent of all Not Supported Variants"]
].copy()
merged_civic_moa_summary_table_6 = merged_civic_moa_summary_table_6.set_index(
    "Category"
)
merged_civic_moa_summary_table_6
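Summary Tables 5 and 6 use the same numerators but different denominators. The sketch below uses toy counts and placeholder subcategory names to show why the first set of percentages stays below 100% while the second sums to 100%.

```python
import pandas as pd

# Toy counts; the subcategory names and numbers are illustrative only.
not_supported = pd.DataFrame(
    {
        "Category": ["Subcategory A", "Subcategory B", "Subcategory C"],
        "count": [20, 15, 5],
    }
).set_index("Category")

# Assumed combined total, which also includes Normalized and
# Unable to Normalize variants.
all_variants_total = 100

# Table 5 style: denominator is every variant from both sources.
pct_of_all = not_supported["count"] / all_variants_total * 100

# Table 6 style: denominator is only the Not Supported variants.
pct_of_not_supported = not_supported["count"] / not_supported["count"].sum() * 100

print(pct_of_all.sum())            # stays below 100
print(pct_of_not_supported.sum())  # sums to 100
```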

## <a id='toc1_3_'></a>[Merge CIViC and MOA Summary Evidence Dataframes](#toc0_)

In [None]:
civic_evidence_items = civicpy.get_all_evidence(
    include_status=["accepted", "submitted"]
)

In [None]:
total_ac_sub_evidence = len(civic_evidence_items)
f"Total Number of accepted and submitted evidence items in CIViC: {total_ac_sub_evidence}"

Count unique MOA assertions in the MOA dataframe

In [None]:
total_len_assertions = len(moa_df.assertion_id.unique())
f"Total number of unique assertions (evidence items): {total_len_assertions}"

Add all evidence items (assertions) from CIViC and MOA

In [None]:
merged_evidence_total = total_ac_sub_evidence + total_len_assertions

Import summary tables from source notebooks

In [None]:
for_merge_all_variant_evidence_percent_of_civic_df = pd.read_csv(
    "../civic/evidence_analysis/civic_evidence_analysis_output/for_merge_all_variant_evidence_percent_of_civic_df.csv",
    sep=",",
)
for_merge_all_features_assertions_df = pd.read_csv(
    "../moa/assertion_analysis/moa_assertion_analysis_output/for_merge_all_features_assertions_df.csv",
    sep=",",
)

Merge CIViC and MOA summary tables

In [None]:
merged_all_evidence_df = pd.merge(
    for_merge_all_variant_evidence_percent_of_civic_df,
    for_merge_all_features_assertions_df,
    on="Variant Category",
    how="outer",
)
merged_all_evidence_df = merged_all_evidence_df.replace(np.nan, 0, regex=True)
merged_all_evidence_df["Count of MOA Assertions per Category"] = merged_all_evidence_df[
    "Count of MOA Assertions per Category"
].astype(int)

### <a id='toc1_3_1_'></a>[Summary Table 7](#toc0_)

The table below shows what percent of all evidence items in CIViC and MOA are associated with Normalized, Unable to Normalize, and Not Supported variants. This will not add up to 100% because evidence items may be used across multiple variants.

<ins>Numerator:</ins> # of CIViC or MOA evidence items based on normalization status of associated variant
<br><ins>Denominator:</ins> # of all CIViC or MOA evidence items

In [None]:
merged_civic_moa_summary_table_7 = merged_all_evidence_df.drop(
    [
        "Count of CIViC Evidence Items per Category",
        "Count of MOA Assertions per Category",
    ],
    axis=1,
)
merged_civic_moa_summary_table_7 = merged_civic_moa_summary_table_7.set_index(
    "Variant Category"
)
merged_civic_moa_summary_table_7
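As a sketch of why these percentages need not add up to 100%, consider a toy link table in which one evidence item is attached to variants in two different categories. The data and column names below are hypothetical, and the counting method used in the source notebooks is an assumption here.

```python
import pandas as pd

# Each row links one evidence item to the category of one associated variant.
# Evidence item 1 is linked to variants in two different categories.
links = pd.DataFrame(
    {
        "evidence_id": [1, 1, 2, 3, 4],
        "variant_category": [
            "Normalized",
            "Not Supported",
            "Normalized",
            "Not Supported",
            "Unable to Normalize",
        ],
    }
)

# Denominator: every distinct evidence item, counted once.
total_evidence = links["evidence_id"].nunique()

# Numerator: distinct evidence items seen in each category.
per_category = links.groupby("variant_category")["evidence_id"].nunique()
percent = per_category / total_evidence * 100

print(percent)
print(percent.sum())  # exceeds 100 because item 1 is counted in two categories
```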

### <a id='toc1_3_2_'></a>[Building Summary Table 8](#toc0_)

In [None]:
# add up evidence items from CIViC and MOA for each Variant Category
merged_all_evidence_df["Sum of Evidence from CIViC and MOA per Category"] = (
    merged_all_evidence_df["Count of CIViC Evidence Items per Category"]
    + merged_all_evidence_df["Count of MOA Assertions per Category"]
)

In [None]:
# new column for the total combined evidence number from CIViC and MOA
merged_all_evidence_df[
    "Sum of total Evidence from CIViC and MOA"
] = merged_evidence_total

In [None]:
# new percent of each category of the total merged evidence from CIViC and MOA
merged_all_evidence_df["Merged Evidence Percent"] = (
    merged_all_evidence_df["Sum of Evidence from CIViC and MOA per Category"]
    / merged_all_evidence_df["Sum of total Evidence from CIViC and MOA"]
) * 100
merged_all_evidence_df = merged_all_evidence_df.round({"Merged Evidence Percent": 2})
merged_all_evidence_df["Merged Evidence Percent"] = (
    merged_all_evidence_df["Merged Evidence Percent"].astype(str) + "%"
)

In [None]:
# merge fraction and percent
merged_all_evidence_df["Percent of all Merged Evidence Items"] = (
    merged_all_evidence_df["Sum of Evidence from CIViC and MOA per Category"].astype(
        str
    )
    + " / "
    + merged_all_evidence_df["Sum of total Evidence from CIViC and MOA"].astype(str)
    + " ("
    + merged_all_evidence_df["Merged Evidence Percent"].astype(str)
    + ")"
)

### <a id='toc1_3_3_'></a>[Summary Table 8](#toc0_)

The table below shows what percent of all evidence items in merged CIViC and MOA data are associated with Normalized, Unable to Normalize, and Not Supported variants. This will not add up to 100% because evidence items may be used across multiple variants.

<ins>Numerator:</ins> # of evidence items (from CIViC and MOA combined) based on normalization status of associated variant
<br><ins>Denominator:</ins> # of all evidence items (from CIViC and MOA combined)

In [None]:
# clean up summary table by selecting columns and setting the index
merged_civic_moa_summary_table_8 = merged_all_evidence_df[
    ["Variant Category", "Percent of all Merged Evidence Items"]
].copy()
merged_civic_moa_summary_table_8 = merged_civic_moa_summary_table_8.set_index(
    "Variant Category"
)
merged_civic_moa_summary_table_8

### <a id='toc1_3_4_'></a>[Building Summary Table 9](#toc0_)

In [None]:
# new percent of CIViC contribution for each category
merged_all_evidence_df["CIViC Evidence of Category Percent"] = (
    merged_all_evidence_df["Count of CIViC Evidence Items per Category"]
    / merged_all_evidence_df["Sum of Evidence from CIViC and MOA per Category"]
) * 100
merged_all_evidence_df = merged_all_evidence_df.round(
    {"CIViC Evidence of Category Percent": 2}
)
merged_all_evidence_df["CIViC Evidence of Category Percent"] = (
    merged_all_evidence_df["CIViC Evidence of Category Percent"].astype(str) + "%"
)

In [None]:
# merge fraction and percent
merged_all_evidence_df["Percent of CIViC Evidence of Category"] = (
    merged_all_evidence_df["Count of CIViC Evidence Items per Category"].astype(str)
    + " / "
    + merged_all_evidence_df["Sum of Evidence from CIViC and MOA per Category"].astype(
        str
    )
    + " ("
    + merged_all_evidence_df["CIViC Evidence of Category Percent"].astype(str)
    + ")"
)

In [None]:
# new percent of MOA contribution for each category
merged_all_evidence_df["MOA Evidence of Category Percent"] = (
    merged_all_evidence_df["Count of MOA Assertions per Category"]
    / merged_all_evidence_df["Sum of Evidence from CIViC and MOA per Category"]
) * 100
merged_all_evidence_df = merged_all_evidence_df.round(
    {"MOA Evidence of Category Percent": 2}
)
merged_all_evidence_df["MOA Evidence of Category Percent"] = (
    merged_all_evidence_df["MOA Evidence of Category Percent"].astype(str) + "%"
)

In [None]:
# merge fraction and percent
merged_all_evidence_df["Percent of MOA Evidence of Category"] = (
    merged_all_evidence_df["Count of MOA Assertions per Category"].astype(str)
    + " / "
    + merged_all_evidence_df["Sum of Evidence from CIViC and MOA per Category"].astype(
        str
    )
    + " ("
    + merged_all_evidence_df["MOA Evidence of Category Percent"].astype(str)
    + ")"
)

### <a id='toc1_3_5_'></a>[Summary Table 9](#toc0_)

The table below shows, for each variant category, what percent of the merged evidence items originates from CIViC and what percent originates from MOA.

<ins>Numerator:</ins> # of CIViC or MOA evidence items based on normalization status of associated variant
<br><ins>Denominator:</ins> # of all evidence items based on normalization status

In [None]:
# clean up summary table by selecting columns and setting the index
merged_civic_moa_summary_table_9 = merged_all_evidence_df[
    [
        "Variant Category",
        "Percent of CIViC Evidence of Category",
        "Percent of MOA Evidence of Category",
    ]
].copy()
merged_civic_moa_summary_table_9 = merged_civic_moa_summary_table_9.set_index(
    "Variant Category"
)
merged_civic_moa_summary_table_9

### <a id='toc1_3_6_'></a>[Building Summary Table 10](#toc0_)

In [None]:
# import summary tables from source notebooks
for_merge_not_supported_variant_evidence_percent_of_civic_df = pd.read_csv(
    "../civic/evidence_analysis/civic_evidence_analysis_output/for_merge_not_supported_variant_evidence_percent_of_civic_df.csv",
    sep=",",
)
for_merge_not_supported_feature_assertion_df = pd.read_csv(
    "../moa/assertion_analysis/moa_assertion_analysis_output/for_merge_not_supported_feature_assertion_df.csv",
    sep=",",
)

In [None]:
# merge CIViC and MOA summary tables
merged_not_supported_evidence_df = pd.merge(
    for_merge_not_supported_variant_evidence_percent_of_civic_df,
    for_merge_not_supported_feature_assertion_df,
    on="Category",
    how="outer",
)
merged_not_supported_evidence_df = merged_not_supported_evidence_df.replace(
    np.nan, 0, regex=True
)

### <a id='toc1_3_7_'></a>[Summary Table 10](#toc0_)

The table below shows the categories of Not Supported variants and what percent of source (CIViC or MOA) evidence items are associated with those variants. These percentages will not add up to 100% because Not Supported variants make up 44.11% of CIViC variants and 63.09% of MOA variants, and evidence items may be used across multiple variants (see Summary Table 7, merged_civic_moa_summary_table_7).

<ins>Numerator:</ins> # of CIViC or MOA evidence items that are associated with Not Supported variants in a given Subcategory
<br><ins>Denominator:</ins> # of all CIViC or MOA evidence items

In [None]:
# clean up summary table by dropping count columns and setting the index
merged_civic_moa_summary_table_10 = merged_not_supported_evidence_df.drop(
    [
        "Count of CIViC Evidence Items per Category",
        "Count of MOA Assertions per Category",
    ],
    axis=1,
)
merged_civic_moa_summary_table_10 = merged_civic_moa_summary_table_10.set_index(
    "Category"
)
merged_civic_moa_summary_table_10

### <a id='toc1_3_8_'></a>[Building Summary Table 11](#toc0_)

In [None]:
# add up evidence from CIViC and MOA for each Variant Category
merged_not_supported_evidence_df["Sum of Evidence from CIViC and MOA per Category"] = (
    merged_not_supported_evidence_df["Count of CIViC Evidence Items per Category"]
    + merged_not_supported_evidence_df["Count of MOA Assertions per Category"]
)
# new column for the total combined evidence number from CIViC and MOA
merged_not_supported_evidence_df["Sum of total Evidence from CIViC and MOA"] = int(
    merged_evidence_total
)

In [None]:
# new percent of each category of the total merged evidence from CIViC and MOA
merged_not_supported_evidence_df[
    "Merged Not Supported Evidence Category of Merged Total Percent"
] = (
    merged_not_supported_evidence_df["Sum of Evidence from CIViC and MOA per Category"]
    / merged_not_supported_evidence_df["Sum of total Evidence from CIViC and MOA"]
) * 100
merged_not_supported_evidence_df = merged_not_supported_evidence_df.round(
    {"Merged Not Supported Evidence Category of Merged Total Percent": 2}
)
merged_not_supported_evidence_df[
    "Merged Not Supported Evidence Category of Merged Total Percent"
] = (
    merged_not_supported_evidence_df[
        "Merged Not Supported Evidence Category of Merged Total Percent"
    ].astype(str)
    + "%"
)

In [None]:
# merge fraction and percent
merged_not_supported_evidence_df["Percent of all Merged Evidence Items"] = (
    merged_not_supported_evidence_df[
        "Sum of Evidence from CIViC and MOA per Category"
    ].astype(str)
    + " / "
    + merged_not_supported_evidence_df[
        "Sum of total Evidence from CIViC and MOA"
    ].astype(str)
    + " ("
    + merged_not_supported_evidence_df[
        "Merged Not Supported Evidence Category of Merged Total Percent"
    ].astype(str)
    + ")"
)

### <a id='toc1_3_9_'></a>[Summary Table 11](#toc0_)

The table below shows the evidence items associated with categories of Not Supported variants and what percent of all evidence items (CIViC and MOA) they make up. These percentages will not add up to 100% because evidence items associated with Not Supported variants make up a subset of all evidence items (CIViC and MOA).

<ins>Numerator:</ins> # of evidence items that are associated with Not Supported variants in a given Subcategory
<br><ins>Denominator:</ins> # of all evidence items

In [None]:
# clean up summary table by selecting columns and setting the index
merged_civic_moa_summary_table_11 = merged_not_supported_evidence_df[
    ["Category", "Percent of all Merged Evidence Items"]
].copy()
merged_civic_moa_summary_table_11 = merged_civic_moa_summary_table_11.set_index(
    "Category"
)
merged_civic_moa_summary_table_11

### <a id='toc1_3_10_'></a>[Building Summary Table 12](#toc0_)

In [None]:
# add up all evidence, for total evidence items associated with Not Supported Variants
merged_not_supported_evidence_total = merged_not_supported_evidence_df[
    "Sum of Evidence from CIViC and MOA per Category"
].sum()
merged_not_supported_evidence_df[
    "Merged Not Supported Evidence Total"
] = merged_not_supported_evidence_total

In [None]:
# new percent of each category of the total merged Not Supported evidence from CIViC and MOA
merged_not_supported_evidence_df[
    "Merged Not Supported Evidence Category of Not Supported Percent"
] = (
    merged_not_supported_evidence_df["Sum of Evidence from CIViC and MOA per Category"]
    / merged_not_supported_evidence_df["Merged Not Supported Evidence Total"]
) * 100
merged_not_supported_evidence_df = merged_not_supported_evidence_df.round(
    {"Merged Not Supported Evidence Category of Not Supported Percent": 2}
)
merged_not_supported_evidence_df[
    "Merged Not Supported Evidence Category of Not Supported Percent"
] = (
    merged_not_supported_evidence_df[
        "Merged Not Supported Evidence Category of Not Supported Percent"
    ].astype(str)
    + "%"
)

In [None]:
# merge fraction and percent
merged_not_supported_evidence_df["Percent of all Not Supported Evidence Items"] = (
    merged_not_supported_evidence_df[
        "Sum of Evidence from CIViC and MOA per Category"
    ].astype(str)
    + " / "
    + merged_not_supported_evidence_df["Merged Not Supported Evidence Total"].astype(
        str
    )
    + " ("
    + merged_not_supported_evidence_df[
        "Merged Not Supported Evidence Category of Not Supported Percent"
    ].astype(str)
    + ")"
)

### <a id='toc1_3_11_'></a>[Summary Table 12](#toc0_)

The table below shows the categories of Not Supported variants and what percent of all merged evidence items associated with Not Supported variants each category makes up.

<ins>Numerator:</ins> # of evidence items that are associated with Not Supported variants in a given Subcategory
<br><ins>Denominator:</ins> # of evidence items associated with Not Supported variants

In [None]:
# clean up summary table by selecting columns and setting the index
merged_civic_moa_summary_table_12 = merged_not_supported_evidence_df[
    ["Category", "Percent of all Not Supported Evidence Items"]
].copy()
merged_civic_moa_summary_table_12 = merged_civic_moa_summary_table_12.set_index(
    "Category"
)
merged_civic_moa_summary_table_12

## <a id='toc1_4_'></a>[Merge CIViC and MOA Summary Impact Dataframes](#toc0_)

In [None]:
not_supported_variant_impact_df = pd.read_csv(
    "../civic/evidence_analysis/civic_evidence_analysis_output/not_supported_variant_impact_df.csv",
    sep=",",
)
not_supported_feature_impact_df = pd.read_csv(
    "../moa/assertion_analysis/moa_assertion_analysis_output/not_supported_feature_impact_df.csv",
    sep=",",
)

In [None]:
moa_impact_df = pd.read_csv(
    "../moa/assertion_analysis/moa_assertion_analysis_output/not_supported_feature_impact_df.csv",
    sep=",",
)
civic_both_evidence_impact_df = pd.read_csv(
    "../civic/evidence_analysis/civic_evidence_analysis_output/civic_both_evidence_cat_impact_df.csv",
    sep=",",
)
civic_accepted_evidence_only_impact_df = pd.read_csv(
    "../civic/evidence_analysis/civic_evidence_analysis_output/civic_accepted_evidence_only_impact_df.csv",
    sep=",",
)

### <a id='toc1_4_1_'></a>[Building Summary Table 13 & 14](#toc0_)

In [None]:
civic_both_evidence_impact_df = civic_both_evidence_impact_df.rename(
    columns={
        "category": "Category",
        "number_of_variants": "Number_CIViC_Variants",
        "#_evidence_items": "Number_CIViC_Evidences",
        "impact": "CIViC_Impact_Score",
        "average_impact_per_variant": "CIViC_AVG_Variant_Impact",
    }
)
civic_accepted_evidence_only_impact_df = civic_accepted_evidence_only_impact_df.rename(
    columns={
        "category": "Category",
        "number_of_variants": "Number_CIViC_Variants",
        "#_evidence_items": "Number_CIViC_Evidences",
        "impact": "CIViC_Impact_Score",
        "average_impact_per_variant": "CIViC_AVG_Variant_Impact",
    }
)

In [None]:
merged_both_impact_df = pd.merge(
    civic_both_evidence_impact_df, moa_impact_df, on="Category"
).copy()
merged_accepted_only_impact_df = pd.merge(
    civic_accepted_evidence_only_impact_df, moa_impact_df, on="Category"
).copy()
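
`pd.merge` performs an inner join by default, so a category that appears in only one of the two sources is dropped from the merged table. A small sketch of how one might check for such categories is shown here; the toy dataframes and their values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-ins for the CIViC and MOA impact tables
civic_toy = pd.DataFrame(
    {"Category": ["Fusion", "Expression"], "CIViC_Impact_Score": [10.0, 5.0]}
)
moa_toy = pd.DataFrame(
    {"Category": ["Fusion", "Copy Number"], "MOA Total Sum Impact Score": [2.0, 1.0]}
)

# An outer merge with indicator=True records which side each category came from
check = pd.merge(civic_toy, moa_toy, on="Category", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", "Category"].tolist()
print(sorted(unmatched))
```

An empty `unmatched` list would indicate that the inner merge keeps every category from both tables.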

In [None]:
merged_both_impact_df["Sum_Variants"] = (
    merged_both_impact_df["Number_CIViC_Variants"]
    + merged_both_impact_df["Total Number Features"]
)
merged_both_impact_df["Sum_Evidence_Items"] = (
    merged_both_impact_df["Number_CIViC_Evidences"]
    + merged_both_impact_df["Total Number Assertions"]
)
merged_both_impact_df["Sum_Impact"] = (
    merged_both_impact_df["CIViC_Impact_Score"]
    + merged_both_impact_df["MOA Total Sum Impact Score"]
)
merged_both_impact_df["Average_Sum_Impact_Per_Variant"] = (
    merged_both_impact_df["Sum_Impact"] / merged_both_impact_df["Sum_Variants"]
)
merged_both_impact_df.sort_values(by=["Sum_Impact"], ascending=False, ignore_index=True)

### <a id='toc1_4_2_'></a>[Summary Table 13](#toc0_)

In [None]:
merged_both_impact_df[
    "Ratio of MOA Features to MOA+CIVIC per Category"
] = merged_both_impact_df["Total Number Features"] / (
    merged_both_impact_df["Total Number Features"]
    + merged_both_impact_df["Number_CIViC_Variants"]
)
merged_both_impact_df

In [None]:
def add_color(
    df: pd.DataFrame, color: str, all_categories: bool, index_pos: int
) -> pd.DataFrame:
    """Add column with information about the color of the lines in the parallel plot with the most impactful category being red

    :param df: Dataframe of variants
    :param color: string of hexadecimal color code
    :param all_categories: True to apply the color to all of the categories, False to apply it to a single category
    :param index_pos: if all_categories = False, the index of the category you would like to indicate based on being ordered by the Sum_Impact in ascending order; ignored (may be None) otherwise
    :return: Transformed dataframe with a color column
    """
    if all_categories is True:
        df["Color"] = color
    else:
        # .loc assigns in place and avoids chained assignment (df["Color"][index_pos])
        df.loc[index_pos, "Color"] = color
    return df

In [None]:
def add_linewidth(
    df: pd.DataFrame, width: int, all_categories: bool, index_pos: int
) -> pd.DataFrame:
    """Add column with information about the width of the lines in the parallel plot with the most impactful category being thickest

    :param df: Dataframe of variants
    :param width: number indicating line width
    :param all_categories: True to apply the width to all of the categories, False to apply it to a single category
    :param index_pos: if all_categories = False, the index of the category you would like to indicate based on being ordered by the Sum_Impact in ascending order; ignored (may be None) otherwise
    :return: Transformed dataframe with a line width column
    """
    if all_categories is True:
        df["Line_Width"] = width
    else:
        # .loc assigns in place and avoids chained assignment (df["Line_Width"][index_pos])
        df.loc[index_pos, "Line_Width"] = width
    return df
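
The two helpers set a style value either for every row or for a single row. The sketch below illustrates the single-row case with label-based `.loc` assignment on a toy dataframe; `highlight_row` and the toy values are hypothetical and only meant to show the pattern.

```python
import pandas as pd


def highlight_row(df: pd.DataFrame, column: str, value, index_pos: int) -> pd.DataFrame:
    """Set `value` in `column` for the row labelled `index_pos` (hypothetical helper)."""
    df.loc[index_pos, column] = value
    return df


# Toy dataframe: every category starts with the default style
toy_df = pd.DataFrame({"Category": ["A", "B"], "Color": "#222222", "Line_Width": 1})

# Highlight the row with index label 0 in red with a thicker line
toy_df = highlight_row(toy_df, "Color", "#CC0000", index_pos=0)
toy_df = highlight_row(toy_df, "Line_Width", 3, index_pos=0)
print(toy_df)
```

Because `.loc` selects by index label, the highlighted row depends on how the dataframe is ordered and indexed at the time of the call.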

In [None]:
merged_both_impact_df = add_color(
    merged_both_impact_df, "#222222", True, index_pos=None
)

In [None]:
merged_both_impact_df = add_linewidth(merged_both_impact_df, 1, True, index_pos=None)

In [None]:
merged_both_impact_df = add_linewidth(merged_both_impact_df, 3, False, index_pos=0)

In [None]:
merged_both_impact_df = add_color(merged_both_impact_df, "#CC0000", False, index_pos=0)

In [None]:
merged_both_impact_df = merged_both_impact_df.sort_values(
    "Sum_Impact", ignore_index=True
)

In [None]:
merged_both_impact_df.to_csv(
    "merged_moa_civic_evidence_analysis_output/merged_both_impact_df.csv", index=False
)
merged_both_impact_df

In [None]:
merged_accepted_only_impact_df["Sum_Variants"] = (
    merged_accepted_only_impact_df["Number_CIViC_Variants"]
    + merged_accepted_only_impact_df["Total Number Features"]
)
merged_accepted_only_impact_df["Sum_Evidence_Items"] = (
    merged_accepted_only_impact_df["Number_CIViC_Evidences"]
    + merged_accepted_only_impact_df["Total Number Assertions"]
)
merged_accepted_only_impact_df["Sum_Impact"] = (
    merged_accepted_only_impact_df["CIViC_Impact_Score"]
    + merged_accepted_only_impact_df["MOA Total Sum Impact Score"]
)
merged_accepted_only_impact_df["Average_Sum_Impact_Per_Variant"] = (
    merged_accepted_only_impact_df["Sum_Impact"]
    / merged_accepted_only_impact_df["Sum_Variants"]
)
merged_accepted_only_impact_df.sort_values(
    by=["Sum_Impact"], ascending=False, ignore_index=True
)

In [None]:
merged_accepted_only_impact_df = add_color(
    merged_accepted_only_impact_df, "#222222", True, index_pos=None
)

In [None]:
merged_accepted_only_impact_df = add_linewidth(
    merged_accepted_only_impact_df, 1, True, index_pos=None
)

In [None]:
merged_accepted_only_impact_df = add_color(
    merged_accepted_only_impact_df, "#CC0000", False, index_pos=0
)

In [None]:
merged_accepted_only_impact_df = add_linewidth(
    merged_accepted_only_impact_df, 3, False, index_pos=0
)

In [None]:
merged_accepted_only_impact_df = merged_accepted_only_impact_df.sort_values(
    "Sum_Impact", ignore_index=True
)

### <a id='toc1_4_3_'></a>[Summary Table 14](#toc0_)

In [None]:
merged_accepted_only_impact_df[
    "Ratio of MOA Features to MOA+CIVIC per Category"
] = merged_accepted_only_impact_df["Total Number Features"] / (
    merged_accepted_only_impact_df["Total Number Features"]
    + merged_accepted_only_impact_df["Number_CIViC_Variants"]
)

In [None]:
merged_accepted_only_impact_df.to_csv(
    "merged_moa_civic_evidence_analysis_output/merged_accepted_only_impact_df.csv",
    index=False,
)
merged_accepted_only_impact_df

## <a id='toc1_5_'></a>[Building Scatterpie plot](#toc0_)

In [None]:
merged_not_supported_impact_df = pd.merge(
    not_supported_feature_impact_df, not_supported_variant_impact_df, on="Category"
)

### <a id='toc1_5_1_'></a>[Merge aspects of the dataframe (number of evidence items, variants, impact score)](#toc0_)

In [None]:
merged_not_supported_impact_df["Sum Evidence Items"] = (
    merged_not_supported_impact_df["Total Number Evidence Items"]
    + merged_not_supported_impact_df["Total Number Assertions"]
)

In [None]:
merged_not_supported_impact_df["Sum Variants"] = (
    merged_not_supported_impact_df["Total Number Variants"]
    + merged_not_supported_impact_df["Total Number Features"]
)

In [None]:
merged_not_supported_impact_df["Sum Impact Score"] = (
    merged_not_supported_impact_df["CIVIC Total Sum Impact Score"]
    + merged_not_supported_impact_df["MOA Total Sum Impact Score"]
)
merged_not_supported_impact_df

### <a id='toc1_5_2_'></a>[Calculate the ratio of features/variants that come from MOA](#toc0_)

In [None]:
merged_not_supported_impact_df[
    "Ratio of MOA Features to MOA+CIVIC per Category"
] = merged_not_supported_impact_df["Total Number Features"] / (
    merged_not_supported_impact_df["Total Number Features"]
    + merged_not_supported_impact_df["Total Number Variants"]
)
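
For illustration, the ratio is the number of MOA features divided by the combined number of MOA features and CIViC variants in a category. The counts below are made up; only the two count column names are taken from the notebook, and `ratio` is a shortened name used just for this sketch.

```python
import pandas as pd

toy_df = pd.DataFrame(
    {
        "Category": ["Fusion", "Expression", "Transcript"],
        "Total Number Features": [25, 0, 10],  # MOA counts (made up)
        "Total Number Variants": [75, 40, 10],  # CIViC counts (made up)
    }
)
denominator = toy_df["Total Number Features"] + toy_df["Total Number Variants"]
toy_df["ratio"] = toy_df["Total Number Features"] / denominator
print(toy_df["ratio"].tolist())
```

A ratio of 0 means every variant in that category comes from CIViC, and a ratio of 1 means every one comes from MOA.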

In [None]:
merged_not_supported_impact_df.to_csv(
    "merged_moa_civic_evidence_analysis_output/merged_not_supported_impact_df.csv",
    index=False,
)
merged_not_supported_impact_df

In [None]:
trimmed_merged_not_supported_impact_df = merged_not_supported_impact_df[
    [
        "Category",
        "Sum Evidence Items",
        "Sum Variants",
        "Sum Impact Score",
        "Ratio of MOA Features to MOA+CIVIC per Category",
    ]
].copy()
trimmed_merged_not_supported_impact_df

In [None]:
trimmed_merged_not_supported_impact_df.to_csv(
    "merged_moa_civic_evidence_analysis_output/trimmed_merged_not_supported_impact_df.csv",
    index=False,
)

In [None]:
variant_category_list = merged_not_supported_impact_df["Category"]
variant_category_list

In [None]:
ratio_list = merged_not_supported_impact_df[
    "Ratio of MOA Features to MOA+CIVIC per Category"
].to_numpy()

In [None]:
moa_civic_evidence_sum_list = merged_not_supported_impact_df[
    "Sum Evidence Items"
].to_numpy()

In [None]:
moa_civic_variant_sum_list = merged_not_supported_impact_df["Sum Variants"].to_numpy()

In [None]:
moa_civic_impact_score_list = merged_not_supported_impact_df[
    "Sum Impact Score"
].to_numpy()

In [None]:
dict_summary = merged_not_supported_impact_df.to_dict("list")

In [None]:
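cat_to_coords = dict()
for _, row in merged_not_supported_impact_df.iterrows():
    # Positional access via .iloc; plain row[12] on a labelled Series is deprecated in recent pandas
    cat_to_coords[row.iloc[0]] = f"{(row.iloc[12], row.iloc[14])}"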

In [None]:
cat_to_coords_list = list(cat_to_coords.items())
cat_to_coords_list

In [None]:
fig5, ax = plt.subplots(figsize=(25, 15))
legend_element_list = []
colors = [
    "red",
    "darkorange",
    "forestgreen",
    "lawngreen",
    "gold",
    "cyan",
    "deepskyblue",
    "mediumslateblue",
    "blue",
    "pink",
    "deeppink",
    "purple",
]
variant_category = variant_category_list

for i in range(11):
    ratio = ratio_list[i]
    size = moa_civic_variant_sum_list[i] * 20
    cat_coor_label = cat_to_coords_list[i]

    # Wedge b (the CIViC share, unhatched) is always drawn
    x = [0] + np.cos(np.linspace(2 * np.pi * ratio, 2 * np.pi, 100)).tolist()
    y = [0] + np.sin(np.linspace(2 * np.pi * ratio, 2 * np.pi, 100)).tolist()
    marker_b = np.column_stack([x, y])

    if ratio:
        # Wedge a (the MOA share, hatched) is drawn only when the ratio is non-zero
        x = [0] + np.cos(np.linspace(0, 2 * np.pi * ratio, 100)).tolist()
        y = [0] + np.sin(np.linspace(0, 2 * np.pi * ratio, 100)).tolist()
        marker_a = np.column_stack([x, y])

        ax.scatter(
            moa_civic_evidence_sum_list[i],
            moa_civic_impact_score_list[i],
            marker=marker_a,
            s=size,
            facecolor=colors[i],
            edgecolors="black",
            hatch="///////",
        )

    ax.scatter(
        moa_civic_evidence_sum_list[i],
        moa_civic_impact_score_list[i],
        marker=marker_b,
        s=size,
        facecolor=colors[i],
        label=cat_coor_label,
    )
legend_elements = [
    Line2D(
        [0],
        [0],
        marker="o",
        color="w",
        label="Expression (626, 3629.00)",
        markerfacecolor="red",
        markersize=20,
    ),
    Line2D(
        [0],
        [0],
        marker="o",
        color="w",
        label="Epigenetic Modification (24, 285.50)",
        markerfacecolor="darkorange",
        markersize=20,
    ),
    Line2D(
        [0],
        [0],
        marker="o",
        color="w",
        label="Fusion (1239, 5689.25)",
        markerfacecolor="forestgreen",
        markersize=20,
    ),
    Line2D(
        [0],
        [0],
        marker="o",
        color="w",
        label="Protein Consequence (705, 3747.75)",
        markerfacecolor="lawngreen",
        markersize=20,
    ),
    Line2D(
        [0],
        [0],
        marker="o",
        color="w",
        label="Gene Function (347, 1822.50)",
        markerfacecolor="gold",
        markersize=20,
    ),
    Line2D(
        [0],
        [0],
        marker="o",
        color="w",
        label="Rearrangements (218, 945.00)",
        markerfacecolor="cyan",
        markersize=20,
    ),
    Line2D(
        [0],
        [0],
        marker="o",
        color="w",
        label="Copy Number (93, 254.00)",
        markerfacecolor="deepskyblue",
        markersize=20,
    ),
    Line2D(
        [0],
        [0],
        marker="o",
        color="w",
        label="Other  (185, 708.50)",
        markerfacecolor="mediumslateblue",
        markersize=20,
    ),
    Line2D(
        [0],
        [0],
        marker="o",
        color="w",
        label="Genotypes Easy (23, 195.00)",
        markerfacecolor="blue",
        markersize=20,
    ),
    Line2D(
        [0],
        [0],
        marker="o",
        color="w",
        label="Genotypes Compound (7, 117.50)",
        markerfacecolor="pink",
        markersize=20,
    ),
    Line2D(
        [0],
        [0],
        marker="o",
        color="w",
        label="Region Defined (924, 8311.50)",
        markerfacecolor="deeppink",
        markersize=20,
    ),
    Line2D(
        [0],
        [0],
        marker="o",
        color="w",
        label="Transcript (471, 346.50)",
        markerfacecolor="purple",
        markersize=20,
    ),
    Line2D([0], [0], color="w", markerfacecolor="white"),
    mpatches.Circle(
        xy=(0, 0),
        radius=1,
        edgecolor="black",
        facecolor="white",
        hatch="///////",
        label="Variants from MOA",
    ),
    mpatches.Circle(
        xy=(0, 0),
        radius=1,
        edgecolor="black",
        facecolor="white",
        label="Variants from CIViC",
    ),
]
legend_2_elements = [
    Line2D([0], [0], color="w", markerfacecolor="white"),
    Line2D([0], [0], color="w", markerfacecolor="white"),
    Line2D([0], [0], color="w", markerfacecolor="white"),
    Line2D(
        [0],
        [0],
        marker="o",
        color="w",
        label="          300 variants per category",
        markerfacecolor="black",
        markersize=85,
    ),
    Line2D([0], [0], color="k", markerfacecolor="white"),
    Line2D([0], [0], color="w", markerfacecolor="white"),
    Line2D([0], [0], color="w", markerfacecolor="white"),
    Line2D(
        [0],
        [0],
        marker="o",
        color="w",
        label="          200 variants per category",
        markerfacecolor="black",
        markersize=60,
    ),
    Line2D([0], [0], color="k", markerfacecolor="white"),
    Line2D([0], [0], color="w", markerfacecolor="white"),
    Line2D(
        [0],
        [0],
        marker="o",
        color="w",
        label="          50 variants per category",
        markerfacecolor="black",
        markersize=24,
    ),
    Line2D([0], [0], color="w", markerfacecolor="white"),
]

# Create the figure
first_legend = ax.legend(
    handles=legend_elements,
    loc="lower left",
    bbox_to_anchor=(0.0, 0.325, 3, 0.550),
    fontsize=20,
)
# first_legend._legend_box.align = "left"
ax.add_artist(first_legend)
ax.set_xlabel("Number of Evidence Items per Category (MOA & CIViC)", fontsize=25)
ax.set_ylabel("Impact Score per Category (MOA & CIViC)", fontsize=25)
for axis in ["top", "bottom", "left", "right"]:
    ax.spines[axis].set_linewidth(4)
ax.tick_params(width=4)
second_legend = ax.legend(handles=legend_2_elements, loc="lower right", frameon=False)
ax.add_artist(second_legend)
plt.title("Impact Score of Currently Not Supported Variant Categories", fontsize=40)
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
plt.setp(plt.gca().get_legend().get_texts(), fontsize="17")

plt.savefig(
    "../merged_moa_civic/merged_moa_civic_evidence_analysis_output/impact_scatterpie_plot.jpeg",
    dpi=1000,
)


plt.show()
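
Each point in the scatterpie is drawn as two custom markers whose vertices trace a wedge of the unit circle. The sketch below isolates that construction; `wedge_vertices` is a hypothetical helper and the ratio is a made-up value.

```python
import numpy as np


def wedge_vertices(start_frac: float, end_frac: float, n_points: int = 100) -> np.ndarray:
    """Vertices of a unit-circle wedge between two fractions of a full turn (hypothetical helper)."""
    angles = np.linspace(2 * np.pi * start_frac, 2 * np.pi * end_frac, n_points)
    x = [0] + np.cos(angles).tolist()  # the leading (0, 0) anchors the wedge at the centre
    y = [0] + np.sin(angles).tolist()
    return np.column_stack([x, y])


ratio = 0.25  # made-up MOA share of a category
marker_moa = wedge_vertices(0.0, ratio)  # wedge drawn with hatching
marker_civic = wedge_vertices(ratio, 1.0)  # remaining wedge, drawn without hatching
print(marker_moa.shape, marker_civic.shape)
```

Either array can be passed as the `marker` argument of `ax.scatter`, as the plotting cell does with `marker_a` and `marker_b`.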

## <a id='toc1_6_'></a>[Building Parallel Impact Plot](#toc0_)

In [None]:
def fig2rgb_array(fig):
    """Convert a matplotlib figure into a NumPy array
    :param fig: matplotlib figure, e.g. as returned by plt.subplots
    :return: NumPy array of RGBA pixel values with shape (rows, columns, 4)
    """
    fig.canvas.draw()
    buf, (bw, bh) = fig.canvas.print_to_buffer()
    ncols, nrows = fig.canvas.get_width_height()
    return np.frombuffer(buf, dtype=np.uint8).reshape(nrows, ncols, 4)
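
As a hedged sketch of the same idea, the example below renders a small pie chart and reads its pixels back as an array. It assumes a non-interactive Agg backend and uses the canvas method `buffer_rgba` rather than `print_to_buffer`; the figure size, dpi, and pie values are arbitrary choices for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive Agg backend
import matplotlib.pyplot as plt
import numpy as np

# A 1 x 1 inch figure at 100 dpi renders to 100 x 100 pixels
fig, ax = plt.subplots(figsize=(1, 1), dpi=100)
ax.pie([0.25, 0.75], colors=["black", "darkgrey"])

fig.canvas.draw()  # render before reading the pixel buffer
rgba = np.asarray(fig.canvas.buffer_rgba()).copy()  # shape: (rows, columns, 4)
plt.close(fig)
print(rgba.shape, rgba.dtype)
```

The resulting array can be sliced like any other image, for example `rgba[: rgba.shape[0] // 2]` to keep only the top half.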

In [None]:
# Data and options:
pd.set_option("display.width", 1200)
pd.set_option("display.max_columns", 300)
pd.set_option("display.max_rows", 300)

In [None]:
Bezier = False
PlotChange = True
Submitted_Opacity = 0.2
# If you want auto colors, change the #Auto Colors setting section below
Autocolors = False

In [None]:
def plot_impact(df: pd.DataFrame) -> plt.Axes:
    """Plot impact parallel plot

    :param df: Dataframe of Variant Categories with number of variants, evidence items, and impact score
    :return: The last axis of a parallel plot with number of variants, evidence items, and impact score per Not Supported Variant Subcategory; the figure is available through its .figure attribute
    """
    df_byImpact = df.copy(deep=True)

    ratios = df_byImpact["Ratio of MOA Features to MOA+CIVIC per Category"].to_list()

    my_vars = ["Sum_Variants", "Sum_Evidence_Items", "Sum_Impact", "Category"]
    my_vars_names = ["Variants", "Evidence Items", "Impact Score", "Variant Category"]

    # Below are settings for line types, font sizes, or hard-coded colors.

    ratioplots = []
    scrapfig, scrapax = plt.subplots(figsize=[0.5, 0.5])

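    for ratio in ratios:
        # The two wedges sum to 2, so the MOA ratio spans at most the top half of the pie;
        # only that top half is displayed next to each category label further below
        scrapax.pie([ratio, 2.0 - ratio], colors=["black", "darkgrey"])
        data = fig2rgb_array(scrapfig)
        ratioplots.append(data)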

    plt.close()

    # Adapt the data:
    df_clean_Impact = df_byImpact[my_vars + ["Line_Width", "Color"]]
    df_clean_Impact = df_clean_Impact.dropna()
    df_clean_Impact = df_clean_Impact.reset_index(drop=True)

    df_plot = df_clean_Impact[my_vars]

    # Convert categories to numeric:
    ym = []
    dics_vars = []
    for v, var in enumerate(my_vars):
        if df_plot[var].dtype.kind not in ["i", "u", "f"]:
            dic_var = dict([(val, c) for c, val in enumerate(df_plot[var].unique())])
            dics_vars += [dic_var]
            ym += [[dic_var[i] for i in df_plot[var].tolist()]]
        else:
            ym += [df_plot[var].tolist()]
    ym = np.array(ym).T

    # Padding:
    ymins = ym.min(axis=0)
    ymaxs = ym.max(axis=0)
    dys = ymaxs - ymins
    ymins -= dys * 0.05
    ymaxs += dys * 0.05
    dys = ymaxs - ymins

    # Adjust to the main axis:
    zs = np.zeros_like(ym)
    zs[:, 0] = ym[:, 0]
    zs[:, 1:] = (ym[:, 1:] - ymins[1:]) / dys[1:] * dys[0] + ymins[0]

    # Auto Colors - V1.0
    n_levels = len(dics_vars[0])
    my_colors = [
        "#F41E1E",
        "#F4951E",
        "#F4F01E",
        "#4EF41E",
        "#1EF4DC",
        "#1E3CF4",
        "#F41EF3",
    ]
    cmap = LinearSegmentedColormap.from_list("my_palette", my_colors)
    my_palette = [cmap(i / n_levels) for i in np.array(range(n_levels))]

    # Plot:
    fig, host_ax = plt.subplots(figsize=(10, 5), tight_layout=True)

    # Make the axes:
    axes = [host_ax] + [host_ax.twinx() for i in range(ym.shape[1] - 1)]
    dic_count = 0

    for i, ax in enumerate(axes):
        ax.set_ylim(bottom=ymins[i], top=ymaxs[i])
        ax.spines.top.set_visible(False)
        ax.spines.bottom.set_visible(False)
        ax.ticklabel_format(style="plain")
        if ax != host_ax:
            ax.spines.left.set_visible(False)
            ax.yaxis.set_ticks_position("right")
            ax.spines.right.set_position(("axes", i / (ym.shape[1] - 1)))

        if df_plot.iloc[:, i].dtype.kind not in ["i", "u", "f"]:
            dic_var_i = dics_vars[dic_count]
            ax.set_yticks(range(len(dic_var_i)))
            ax.set_yticklabels(
                ["             " + key_val for key_val in dics_vars[dic_count].keys()]
            )

            tick_labels = ax.yaxis.get_ticklabels()

            atickcount = 0
            for atick in tick_labels:
                ib = OffsetImage(
                    ratioplots[atickcount][0 : int(len(ratioplots[atickcount]) / 2)],
                    zoom=0.85,
                )

                ib.image.axes = ax
                ab = AnnotationBbox(
                    ib,
                    (i, tick_labels[atickcount].get_position()[1]),
                    frameon=False,
                    box_alignment=(-0.05, 0.3),
                )
                ax.add_artist(ab)
                atickcount += 1

            dic_count += 1

    ax.spines.right.set_visible(False)

    host_ax.set_xlim(left=0, right=ym.shape[1] - 1)

    host_ax.set_xticks(range(ym.shape[1]))

    host_ax.set_xticklabels(my_vars_names, fontsize=14)

    host_ax.tick_params(axis="x", which="major", pad=7)

    host_ax.set_title("Clinical Impact of Not Supported Variants", fontsize=18)

    # Make the curves:
    host_ax.spines.right.set_visible(False)
    host_ax.xaxis.tick_top()
    host_ax.tick_params(axis="x", which="both", length=0)
    n_axes = ym.shape[1]  # one vertex per plotted variable, not per category
    for j in range(ym.shape[0]):
        if Bezier:
            verts = list(
                zip(
                    np.linspace(0, n_axes - 1, n_axes * 3 - 2, endpoint=True),
                    np.repeat(zs[j, :], 3)[1:-1],
                )
            )
            codes = [mPath.MOVETO] + [mPath.CURVE4 for _ in range(len(verts) - 1)]
        else:
            verts = list(zip(range(n_axes), zs[j, :]))
            codes = [mPath.MOVETO] + [mPath.LINETO for _ in range(len(verts) - 1)]
        mpath = mPath(verts, codes)

        if Autocolors:
            acolor = my_palette[j % len(dics_vars[0])]
        else:
            acolor = df_clean_Impact["Color"].iloc[j]

        patch = mpatches.PathPatch(
            mpath,
            facecolor="none",
            lw=df_clean_Impact["Line_Width"].iloc[j],
            edgecolor=acolor,
        )
        host_ax.add_patch(patch)
    return ax
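
The "Adjust to the main axis" step inside `plot_impact` maps every column linearly onto the value range of the first column, so that one line can cross axes with very different scales. A minimal sketch of that rescaling is shown below with made-up numbers and without the 5% padding used in the function.

```python
import numpy as np

# Made-up values: columns stand in for variants, evidence items, and impact score
ym = np.array(
    [
        [10.0, 100.0, 1000.0],
        [20.0, 300.0, 2000.0],
        [30.0, 200.0, 4000.0],
    ]
)

ymins, ymaxs = ym.min(axis=0), ym.max(axis=0)
dys = ymaxs - ymins

# Rescale columns 1..n onto the range of column 0; column 0 is kept as is
zs = np.zeros_like(ym)
zs[:, 0] = ym[:, 0]
zs[:, 1:] = (ym[:, 1:] - ymins[1:]) / dys[1:] * dys[0] + ymins[0]
print(zs)
```

After rescaling, every column shares the minimum and maximum of the first column, while each twin axis keeps its own tick labels in the original units.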

Only accepted variants

In [None]:
merged_accepted_only_impact_df["Category"] = merged_accepted_only_impact_df[
    "Category"
].str.replace("Variants", "")

In [None]:
merged_accepted_only_impact_plot = plot_impact(merged_accepted_only_impact_df)
merged_accepted_only_impact_plot

In [None]:
merged_accepted_only_impact_plot.figure.savefig(
    "merged_moa_civic_evidence_analysis_output/merged_accepted_only_impact_plot.png"
)

Both accepted and submitted variants

In [None]:
merged_both_impact_df["Category"] = merged_both_impact_df["Category"].str.replace(
    "Variants", ""
)

In [None]:
merged_both_impact_plot = plot_impact(merged_both_impact_df)

In [None]:
merged_both_impact_plot.figure.savefig(
    "merged_moa_civic_evidence_analysis_output/merged_both_impact_plot.png"
)