# **Empowering Healthcare with Symbolic Learning and Knowledge Graph Embeddings**

![Design Pattern](https://github.com/SDM-TIB/HyAI/assets/25593410/42cf771d-b82d-4097-b0f7-0ceb3581b171)

## **Pattern Detection, Analysis and Explanation over P4-LUCAT Knowledge Graph**

>> **Pattern Detection: Unsupervised Learning**
*   Learn embeddings of the entities in the knowledge graph
*   Apply community detection algorithms: SemEP and KMeans
*   Compute the quality of the generated communities
*   Generate radar plots
*   Visualize a 2D PCA projection of the detected communities
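The notebook delegates community detection to the repository's own code, which is not shown here. As a hypothetical, self-contained illustration of the KMeans step only, the sketch below runs plain Lloyd's k-means over a few toy 2-D vectors standing in for learned entity embeddings. The `kmeans` and `_nearest` helpers and the vectors are made up for illustration; this is not the repository's implementation.

```python
import random

def _nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((x - y) ** 2 for x, y in zip(point, centroids[c])))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means: returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        new_labels = [_nearest(p, centroids) for p in points]  # assignment step
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        for c in range(k):  # update step: centroid = mean of its members
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy 2-D vectors standing in for entity embeddings (made up for illustration)
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels = kmeans(embeddings, k=2)
```

Each entity ends up with a cluster index; in the real pipeline the vectors would come from a KGE model such as TransH or RotatE rather than being hand-written.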


**Clone the repository**

```python
!git clone https://github.com/SDM-TIB/SymbolicLearning_KGE.git
%cd SymbolicLearning_KGE/PatternDetection
```

```python
import ComputeCommunities as SemCD
import EvaluationMetric
```

**Clean the build**

```python
!make clean
```

**Build the project**

```python
!make
```

## Run the semEP-node

**Input parameters**

```python
model_list = ['TransH', 'RotatE']
threshold = [35, 36, 37, 64, 65]
kg_name = 'TransformedKG'
target_predicate = 'hasRelapse_Progression'
```

```python
SemCD.run(kg_name=kg_name, target_predicate=target_predicate,
          model_list=model_list, threshold=threshold)
```
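This notebook does not show how `SemCD.run` turns embeddings into the similarity graph that semEP-node partitions, nor what the `threshold` values denote. Assuming, purely for illustration, that each threshold is a percentile cut on pairwise cosine similarity between entity embeddings, the sketch below keeps only the entity pairs at or above that percentile. The `similarity_edges` helper and the percentile interpretation are assumptions, not the repository's confirmed behavior.

```python
import numpy as np

def similarity_edges(embeddings, percentile):
    """Hypothetical sketch: keep entity pairs whose cosine similarity is at or
    above the given percentile of all pairwise similarities."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize each row
    sim = X @ X.T                                      # cosine similarity matrix
    iu = np.triu_indices(len(X), k=1)                  # each unordered pair once
    pair_sims = sim[iu]
    cutoff = np.percentile(pair_sims, percentile)      # assumed meaning of threshold
    return [(int(i), int(j), float(s))
            for i, j, s in zip(iu[0], iu[1], pair_sims) if s >= cutoff]

# Toy embeddings: entities 0/1 point one way, entities 2/3 another
edges = similarity_edges([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], percentile=65)
```

Under this assumption, a higher threshold keeps fewer, more similar pairs, which tends to yield smaller and tighter communities.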

## Quality of the generated communities

Move to the `clusteringMeasures` folder:

```python
%cd clusteringMeasures
```

**Clean the build**

```python
!make clean
```

**Build the project**

```python
!make
```

**Compute Metrics**

```python
import ComputeMetrices
ComputeMetrices.run(model_list, threshold)
```
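The notebook does not list which measures `ComputeMetrices` reports. As a hypothetical illustration of two standard partition-quality measures, the sketch below computes coverage (the fraction of edge weight that falls inside communities) and Newman modularity, $Q = \sum_c \left[ L_c/m - \left(d_c/2m\right)^2 \right]$, where $m$ is the total edge weight, $L_c$ the weight inside community $c$, and $d_c$ the summed degree of its nodes. The function name and input format are assumptions for illustration.

```python
from collections import defaultdict

def coverage_and_modularity(edges, community):
    """edges: iterable of (u, v, weight) for an undirected graph without self-loops.
    community: dict mapping node -> community id."""
    m = 0.0
    intra = defaultdict(float)   # edge weight inside each community
    degree = defaultdict(float)  # summed degree of the nodes in each community
    for u, v, w in edges:
        m += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            intra[community[u]] += w
    coverage = sum(intra.values()) / m
    modularity = sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
    return coverage, modularity

# Two triangles joined by a single bridge edge (c-d), one community per triangle
edges = [('a', 'b', 1), ('b', 'c', 1), ('c', 'a', 1),
         ('d', 'e', 1), ('e', 'f', 1), ('f', 'd', 1), ('c', 'd', 1)]
community = {'a': 0, 'b': 0, 'c': 0, 'd': 1, 'e': 1, 'f': 1}
cov, mod = coverage_and_modularity(edges, community)
```

Here 6 of the 7 edges are intra-community, so coverage is 6/7 and modularity is 2 × (3/7 − 1/4) = 5/14.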

**Generate Radar Plot**

```python
%cd ../
EvaluationMetric.GenerateRadarPlot(model_list, threshold)
```
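A radar plot gives each quality metric its own evenly spaced axis and draws one closed polygon per configuration (model and threshold). The sketch below computes only that geometry, assuming scores already scaled to [0, 1]; the `radar_polygon` helper is hypothetical and is not the implementation of `GenerateRadarPlot`.

```python
import math

def radar_polygon(scores):
    """Return closed (x, y) vertices for a radar plot: one evenly spaced axis per
    metric, with each score (assumed scaled to [0, 1]) used as the radius."""
    n = len(scores)
    angles = [2 * math.pi * i / n for i in range(n)]
    points = [(r * math.cos(a), r * math.sin(a)) for r, a in zip(scores, angles)]
    return points + points[:1]  # repeat the first vertex to close the polygon

# Hypothetical scores for four metrics of one configuration
vertices = radar_polygon([1.0, 1.0, 1.0, 1.0])
```

A larger polygon area means the configuration scores higher across the plotted metrics, which is what makes radar plots convenient for comparing thresholds side by side.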

# Analysis and Explanation over P4-LUCAT Knowledge Graph

>> **Pattern Analysis and Explanation**
*   Visualize a 2D PCA projection of the detected communities
*   Select a target predicate to analyze
*   Examine the distribution of relapse by cluster


```python
%%capture
%cd ../PatternAnalysisExplanation
!pip install PyMuPDF
import PatternAnalysis
import pandas as pd
```

**Visualize a 2D PCA projection of the detected communities**

```python
PatternAnalysis.PCA_projection(kg_name, model_list[0], threshold[1])
PatternAnalysis.PCA_projection(kg_name, model_list[1], threshold[3])
```
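The internals of `PatternAnalysis.PCA_projection` are not shown. A minimal sketch of the underlying projection, assuming the embeddings are rows of a numeric matrix, centers the data and projects it onto the first two right singular vectors obtained with NumPy's SVD; the resulting 2-D points would then be plotted and colored by cluster label. The `pca_2d` helper is hypothetical.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T                     # coordinates on PC1 and PC2

# Random stand-in for a matrix of entity embeddings (50 entities, 5 dimensions)
rng = np.random.default_rng(0)
coords = pca_2d(rng.normal(size=(50, 5)))
```

Because the first component captures the most variance, well-separated communities in embedding space usually remain visibly separated along PC1 and PC2, although a 2D projection can also hide separation that exists in higher dimensions.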

**Identify the clusters associated with the selected `target_predicate`**

```python
target_cls = PatternAnalysis.target_cluster(kg_name, model_list[1], target_predicate, 'SemEP', threshold[3])
display(target_cls.head(), target_cls.shape)
```
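`target_cluster` returns a table that contains at least the `Relapse` and `cluster` columns used in the next steps; how it is assembled is not shown. Conceptually, such a table pairs each patient's cluster assignment with the patient's value for the target predicate. The sketch below does this with a pandas inner join on a hypothetical `patient` key; the identifiers and values are made up for illustration.

```python
import pandas as pd

# Hypothetical inputs: cluster assignments from community detection and the
# Relapse value recorded for each patient (identifiers are made up).
clusters = pd.DataFrame({'patient': ['p1', 'p2', 'p3', 'p4'], 'cluster': [0, 0, 1, 1]})
labels = pd.DataFrame({'patient': ['p1', 'p2', 'p3', 'p4'],
                       'Relapse': ['Relapse', 'No relapse', 'No relapse', 'UnKnown']})

# Inner join on the patient identifier: one row per patient with both columns
target_table = clusters.merge(labels, on='patient', how='inner')
```

An inner join drops patients missing from either side, so in practice it is worth checking that the row count matches the number of clustered patients.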

**Count the patients for each target value (column `Relapse`) and cluster**

```python
df = target_cls[['Relapse', 'cluster']]
# Count patients per (Relapse value, cluster) pair
df_reset = df.groupby(['Relapse', 'cluster']).size().reset_index(name='count_values')
df_reset
```
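To show what the counting step produces, the sketch below applies the same `groupby(...).size().reset_index(name='count_values')` pattern to a small toy table standing in for `target_cls`. The values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for target_cls (values are made up for illustration)
toy = pd.DataFrame({
    'Relapse': ['Relapse', 'Relapse', 'No relapse', 'No relapse', 'No relapse', 'UnKnown'],
    'cluster': [0, 1, 0, 0, 1, 1],
})

# One row per (Relapse, cluster) pair that actually occurs, with its patient count
counts = toy.groupby(['Relapse', 'cluster']).size().reset_index(name='count_values')
```

The result is in long form (one row per pair, sorted by `Relapse` and then `cluster`), and pairs with zero patients do not appear.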

**Normalized clinical records: divide each count by the total number of records in its `Relapse` class**

```python
# Normalize counts within each Relapse class so that each class sums to 1
relapse_classes = ['No relapse', 'Relapse', 'UnKnown']
df_reset = df_reset[df_reset['Relapse'].isin(relapse_classes)].copy()
class_totals = df_reset.groupby('Relapse')['count_values'].transform('sum')
df_reset['count_values'] = df_reset['count_values'] / class_totals
df_reset
```
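The normalization turns each count $n(r, c)$ into the share of relapse class $r$ that falls in cluster $c$, $n(r, c) / \sum_{c'} n(r, c')$, so classes of very different sizes can be compared on the same scale. As a cross-check on toy data (made up for illustration), `pd.crosstab` with `normalize='index'` yields the same proportions in wide form, one row per class. The long form above is what gets passed to the plotting function.

```python
import pandas as pd

# Toy stand-in for target_cls (values are made up for illustration)
toy = pd.DataFrame({
    'Relapse': ['Relapse', 'Relapse', 'No relapse', 'No relapse', 'No relapse', 'UnKnown'],
    'cluster': [0, 1, 0, 0, 1, 1],
})

# Row-normalized contingency table: each Relapse row sums to 1 across clusters
shares = pd.crosstab(toy['Relapse'], toy['cluster'], normalize='index')
```

Unlike the long-form counts, the crosstab fills in zero shares for pairs that do not occur (here, `UnKnown` in cluster 0).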

## **Visualization**

```python
PatternAnalysis.catplot(df_reset, model_list[1])
```