#PhysioNet Challenge 2020 - SNOMED mappings, Dataset by Bjorn @bjoernjostein
https://www.kaggle.com/bjoernjostein/physionet-snomed-mappings

There are two CSV files in this dataset: one describes the unscored diagnoses and the other describes the scored diagnoses.
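A minimal sketch of how the two tables might be stacked into one, with a flag marking which diagnoses are scored. The toy frames stand in for the real CSV files, and the helper name `combine_mappings` and the `scored` column are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

def combine_mappings(scored: pd.DataFrame, unscored: pd.DataFrame) -> pd.DataFrame:
    """Stack the scored and unscored tables, adding a boolean `scored` flag."""
    return pd.concat(
        [scored.assign(scored=True), unscored.assign(scored=False)],
        ignore_index=True,
    )

# Toy stand-ins for the two CSV files (placeholder names and codes, not real data).
toy_scored = pd.DataFrame(
    {"Dx": ["diagnosis A"], "SNOMED CT Code": [111], "Abbreviation": ["DA"]}
)
toy_unscored = pd.DataFrame(
    {"Dx": ["diagnosis B", "diagnosis C"], "SNOMED CT Code": [222, 333], "Abbreviation": ["DB", "DC"]}
)

combined = combine_mappings(toy_scored, toy_unscored)
print(combined)
```

Keeping the flag as a column means later cells could filter with `combined[combined["scored"]]` instead of tracking two separate frames.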

The first three columns give each diagnosis by name, SNOMED CT code, and abbreviation. The last seven give the number of times each diagnosis appears in each of the six source datasets, plus the total across all of them.
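The column layout described above can be sketched as a quick consistency check: does `Total` equal the sum of the six per-dataset counts? The column names are the ones used in this notebook's later cells; the helper `total_matches_sources` and the toy rows are assumptions made for illustration, and the real files may or may not pass this check.

```python
import pandas as pd

# Column names as they appear in this notebook's later cells.
ID_COLUMNS = ["Dx", "SNOMED CT Code", "Abbreviation"]
SOURCE_COLUMNS = ["CPSC", "CPSC-Extra", "StPetersburg", "PTB", "PTB-XL", "Georgia"]

def total_matches_sources(table: pd.DataFrame) -> pd.Series:
    """Return True for each row whose Total equals the sum of the six source counts."""
    return table[SOURCE_COLUMNS].sum(axis=1).eq(table["Total"])

# Toy rows with made-up counts (not taken from the real CSV files).
toy = pd.DataFrame(
    [
        ["diagnosis A", 111, "DA", 1, 2, 3, 4, 5, 6, 21],
        ["diagnosis B", 222, "DB", 0, 0, 1, 0, 2, 0, 4],  # Total is deliberately off by one
    ],
    columns=ID_COLUMNS + SOURCE_COLUMNS + ["Total"],
)

check = total_matches_sources(toy)
print(check.tolist())
```

On the real data, the same function could be called on `df` right after loading, before any columns are renamed.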

This dataset was used in the PhysioNet Challenge 2020 to classify 12-lead ECGs.

REFERENCES

Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 101(23):e215-e220 [Circulation Electronic Pages; http://circ.ahajournals.org/content/101/23/e215.full]; 2000 (June 13). PMID: 10851218; doi: 10.1161/01.CIR.101.23.e215

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTrzZjGy9MkNEl3qa2GkSFqdl6G8HhVc7xkhQ&usqp=CAU)hitconsultant.net

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns #visualization
import matplotlib.pyplot as plt #visualization
%matplotlib inline
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as py
import squarify
plt.style.use('fivethirtyeight')

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

#Coronary heart disease (CHD) is usually diagnosed after a risk assessment and some further tests.

RISK ASSESSMENT

If a GP (General Practitioner) thinks you may be at risk of CHD (coronary heart disease), they may do a risk assessment for cardiovascular disease, heart attack or stroke.

This may be carried out as part of an NHS Health Check.

The GP (General Practitioner) will:

- ask about your medical and family history
- check your blood pressure
- do a blood test to assess your cholesterol level

Before having the cholesterol test, you may be asked not to eat for 12 hours so there's no food in your body that could affect the result.

The GP (General Practitioner) or practice nurse can carry out the blood test. A sample will be taken either using a needle and a syringe or by pricking your finger.

The GP (General Practitioner) will also ask about your lifestyle, how much exercise you do and whether you smoke. All these factors will be considered as part of the diagnosis.
https://www.nhs.uk/conditions/coronary-heart-disease/diagnosis/

In [None]:
nRowsRead = 1000 # set to None to read the whole file
df = pd.read_csv('../input/physionet-snomed-mappings/SNOMED_mappings_unscored.csv', delimiter=';', encoding='utf-8', nrows=nRowsRead)
df.dataframeName = 'SNOMED_mappings_unscored.csv'
nRow, nCol = df.shape
print(f'There are {nRow} rows and {nCol} columns')
df.head()

#SNOMED CT - Systematized Nomenclature of Medicine, Clinical Terms

#Dx - Medical diagnosis (abbreviated Dx or DS) is the process of determining which disease or condition explains a person's symptoms and signs

#CPSC - China Physiological Signal Challenge 2018

#PTB - Physikalisch-Technische Bundesanstalt, the German national metrology institute that provided the PTB and PTB-XL ECG databases

In [None]:
from wordcloud import WordCloud

plt.style.use('fivethirtyeight')
plt.rcParams['figure.figsize'] = (15, 15)
# Join every diagnosis name into one string; str(df["Dx"]) would only pass the truncated Series printout
text = ' '.join(df['Dx'].astype(str))
wordcloud = WordCloud(background_color = 'black', width = 1200,  height = 1200,colormap='Set2', max_words = 100).generate(text)
plt.imshow(wordcloud)
plt.axis('off')
plt.title('SNOMED',fontsize = 20)
plt.show()

In [None]:
#Code by Gabriel Preda
#plt.style.use('dark_background')
def plot_count(feature, title, df, size=1):
    f, ax = plt.subplots(1,1, figsize=(4*size,4))
    total = float(len(df))
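    # Pass the column explicitly as x= (positional data arguments are deprecated in seaborn) and draw on the axes created above
    g = sns.countplot(x=df[feature], order = df[feature].value_counts().index[:20], palette= ('#32a852', '#a84e32', '#3242a8'), ax=ax)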
    g.set_title("Number and percentage of {}".format(title))
    if(size > 2):
        plt.xticks(rotation=90, size=8)
    for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x()+p.get_width()/2.,
                height + 3,
                '{:1.2f}%'.format(100*height/total),
                ha="center") 
    plt.show()

#Further tests

You may be referred for further tests to help confirm CHD (Coronary heart disease). A number of different tests are used to diagnose heart-related problems, including:

- electrocardiogram (ECG)
- exercise stress tests
- X-rays
- echocardiogram
- blood tests
- coronary angiography
- radionuclide tests
- MRI scans
- CT scans

https://www.nhs.uk/conditions/coronary-heart-disease/diagnosis/

In [None]:
plot_count("CPSC", "CPSC", df,4)

In [None]:
plot_count("CPSC-Extra", "CPSC Extra", df,4)

In [None]:
plot_count("StPetersburg", "StPetersburg", df,4)

In [None]:
plot_count("PTB", "PTB", df,4)

In [None]:
plot_count("PTB-XL", "PTB-XL", df,4)

In [None]:
plot_count("Georgia", "Georgia", df,4)

In [None]:
plot_count("Total", "Total", df,4)

In [None]:
df = df.rename(columns={'SNOMED CT Code':'SNOMED', 'CPSC-Extra': 'CPSCX', 'PTB-XL': 'PTBXL'})

In [None]:
#Code by Ashish Gupta https://www.kaggle.com/roydatascience/cost-prediction-exploratory-data-analytics/notebook

f, ax = plt.subplots(ncols=2, figsize=(10,5))
# fill= and bw_method= replace the deprecated shade= and bw= arguments of sns.kdeplot
sns.kdeplot(x=df.SNOMED, color='b', fill=True, ax=ax[0])
sns.kdeplot(x=df.SNOMED, color='r', fill=True, bw_method=100, ax=ax[1])

ax[0].set_title('KDE')
ax[1].set_title('KDE, bandwidth = 100')

plt.show()

In [None]:
#Code by Ashish Gupta https://www.kaggle.com/roydatascience/cost-prediction-exploratory-data-analytics/notebook

f, ax = plt.subplots(ncols=2, figsize=(10,5))
# fill= and bw_method= replace the deprecated shade= and bw= arguments of sns.kdeplot
sns.kdeplot(x=df.Total, color='b', fill=True, ax=ax[0])
sns.kdeplot(x=df.Total, color='r', fill=True, bw_method=100, ax=ax[1])

ax[0].set_title('KDE')
ax[1].set_title('KDE, bandwidth = 100')

plt.show()

In [None]:
#Codes by Marília Prata https://www.kaggle.com/mpwolke/snip-test-bland-altman-analysis

fig=sns.lmplot(x="Total", y="SNOMED",data=df)

#Codes by Soumyadip Ghorai https://www.kaggle.com/soumyadipghorai/indian-food-visualization-3d-west-bengal/notebook

#Add another column with the number of comma-separated terms in each Dx (diagnosis) name

In [None]:
# Count the comma-separated terms in each diagnosis name
df['total_Dx'] = df['Dx'].str.split(',').str.len()

In [None]:
#Save that snippet for another Time
#Abbreviation = []
#for i in range(len(df)) : 
 #   if df.Abbreviation_profile.loc[i] == '-1' : 
  #      Abbreviation.append(df.name.loc[i])
#Abbreviation

In [None]:
Dx_df = df.Total.value_counts().reset_index()
Dx_df.columns = ['Dx Count', 'Total Diagnoses']
fig = px.bar(Dx_df, y= 'Dx Count', x = 'Total Diagnoses',title = 'Total Diagnoses ', color_discrete_sequence = ['#0569ff'])
fig.show()

In [None]:
from collections import Counter

# Split every diagnosis name on commas and count how often each term occurs
Dx_list = [term for name in df['Dx'] for term in name.split(',')]
Dx_dict = dict(Counter(Dx_list))
Dx_dict

In [None]:
Dx_df = pd.DataFrame.from_dict(Dx_dict, orient = 'index')
Dx_df.columns = ['count']
Dx_df.sort_values(by = 'count', ascending = False, inplace = True)

In [None]:
fig = px.bar(Dx_df.head(10), y= 'count',title = 'Ten Dx (Diagnosis)', color_discrete_sequence = ['#7b32a8'],
            labels = {'index': 'Diagnoses', 'count' : 'Count of Dx (Diagnosis)'})
fig.show()

In [None]:
fig = px.scatter_3d(df, x='Dx', y='SNOMED', z='StPetersburg',
                    color='CPSC',
                    hover_data=['PTBXL'],
                    opacity=0.5)
fig.update_layout(title='Coronary Heart Disease Diagnosis')
fig.show()

In [None]:
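# Restrict to numeric columns; text columns such as Dx make df.corr() raise in pandas 2.x (numeric_only needs pandas >= 1.5)
corr = df.corr(numeric_only=True)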
corr.style.background_gradient(cmap = 'coolwarm')

In [None]:
#Code by Olga Belitskaya https://www.kaggle.com/olgabelitskaya/sequential-data/comments
from IPython.display import display,HTML
c1,c2,f1,f2,fs1,fs2=\
'#eb3434','#eb3446','Akronim','Smokum',30,15
def dhtml(string,fontcolor=c1,font=f1,fontsize=fs1):
    display(HTML("""<style>
    @import 'https://fonts.googleapis.com/css?family="""\
    +font+"""&effect=3d-float';</style>
    <h1 class='font-effect-3d-float' style='font-family:"""+\
    font+"""; color:"""+fontcolor+"""; font-size:"""+\
    str(fontsize)+"""px;'>%s</h1>"""%string))
    
    
dhtml('Programming is more than an important practical art. It is also a gigantic undertaking in the foundations of knowledge, Grace Hopper quote' )