<a href="https://www.kaggle.com/code/nagatominori/jp-chronic-kidney-disease-and-preventive-medicine?scriptVersionId=233436097" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

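# Chronic Kidney Disease and Preventive Medicine
This notebook investigates the relationship between a diagnosis of chronic kidney disease (CKD) and a set of related factors.
Based on the results, it considers how healthcare professionals should intervene from the standpoint of preventive medicine.
The analysis uses the chi-square test and logistic regression models.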

## Source
This dataset is based on the synthetic chronic kidney disease dataset published on Kaggle by Rabie El Kharoua.
The original data is available at the following link.
https://www.kaggle.com/datasets/rabieelkharoua/chronic-kidney-disease-dataset-analysis  

The dataset is provided under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/Chronic_Kidney_Dsease_data.csv


# Data Overview

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Read CSV from Kaggle dataset
file_path = '/kaggle/input/Chronic_Kidney_Dsease_data.csv'
df = pd.read_csv(file_path)
df.set_index('PatientID', inplace=True)

# Confirmation of basic information
print(df.info())
print(df.isnull().sum())
print(df.describe())

<class 'pandas.core.frame.DataFrame'>
Index: 1659 entries, 1 to 1659
Data columns (total 53 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Age                            1659 non-null   int64  
 1   Gender                         1659 non-null   int64  
 2   Ethnicity                      1659 non-null   int64  
 3   SocioeconomicStatus            1659 non-null   int64  
 4   EducationLevel                 1659 non-null   int64  
 5   BMI                            1659 non-null   float64
 6   Smoking                        1659 non-null   int64  
 7   AlcoholConsumption             1659 non-null   float64
 8   PhysicalActivity               1659 non-null   float64
 9   DietQuality                    1659 non-null   float64
 10  SleepQuality                   1659 non-null   float64
 11  FamilyHistoryKidneyDisease     1659 non-null   int64  
 12  FamilyHistoryHypertension      1659 non-null   int64 

In [3]:
df['Diagnosis'].value_counts(normalize=True)

Diagnosis
1    0.918626
0    0.081374
Name: proportion, dtype: float64

# Association Between Chronic Kidney Disease and Environmental and Occupational Exposures

In [4]:
# Correlation between Chronic Kidney Disease and HeavyMetalsExposure
ckd_heavymetal = pd.crosstab(df['HeavyMetalsExposure'], df['Diagnosis'])
print(ckd_heavymetal)

chi2, p, dof, expected = stats.chi2_contingency(ckd_heavymetal)

print("Chi-square test results:")
print(f"Chi2 = {chi2:.3f}, p-value = {p:.4f}")
if p < 0.05:
    print("There is a significant association between chronic kidney disease and HeavyMetalsExposure")
else:
    print("No significant association between chronic kidney disease and HeavyMetalsExposure")



Diagnosis              0     1
HeavyMetalsExposure           
0                    130  1456
1                      5    68
Chi-square test results:
Chi2 = 0.037, p-value = 0.8471
No significant association between chronic kidney disease and HeavyMetalsExposure


In [5]:
# Correlation between Chronic Kidney Disease and OccupationalExposureChemicals
ckd_chemical = pd.crosstab(df['OccupationalExposureChemicals'], df['Diagnosis'])
print(ckd_chemical)      


chi2, p, dof, expected = stats.chi2_contingency(ckd_chemical)
print("Chi-square test results:")

print(f"Chi2 = {chi2:.3f}, p-value = {p:.4f}")
if p < 0.05:
    print("There is a significant association between chronic kidney disease and OccupationalExposureChemicals")
else:
    print("No significant association between chronic kidney disease and OccupationalExposureChemicals")

Diagnosis                        0     1
OccupationalExposureChemicals           
0                              119  1369
1                               16   155
Chi-square test results:
Chi2 = 0.219, p-value = 0.6397
No significant association between chronic kidney disease and OccupationalExposureChemicals


In [6]:
# Correlation between Chronic Kidney Disease and WaterQuality
ckd_waterq = pd.crosstab(df['WaterQuality'], df['Diagnosis'])
print(ckd_waterq)      


chi2, p, dof, expected = stats.chi2_contingency(ckd_waterq)
print("Chi-square test results:")

print(f"Chi2 = {chi2:.3f}, p-value = {p:.4f}")
if p < 0.05:
    print("There is a significant association between chronic kidney disease and WaterQuality")
else:
    print("No significant association between chronic kidney disease and WaterQuality")


Diagnosis       0     1
WaterQuality           
0             105  1227
1              30   297
Chi-square test results:
Chi2 = 0.426, p-value = 0.5141
No significant association between chronic kidney disease and WaterQuality


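No statistically significant association with chronic kidney disease was found for heavy metal exposure, occupational exposure to chemicals, or water quality.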
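
The three tests above repeat the same steps, so they can be folded into one helper. The following is a minimal sketch, not the notebook's original code: the function name `chi2_report` is hypothetical, and the contingency tables are typed in by hand from the cross-tabulations printed above rather than read from the CSV. Besides the chi-square statistic, it reports the smallest expected cell count and the sample odds ratio, which help judge whether the chi-square approximation is reasonable and how large the effect is.

```python
import numpy as np
from scipy import stats

# Counts copied from the cross-tabulations printed above
# (rows: exposure 0/1, columns: Diagnosis 0/1).
tables = {
    "HeavyMetalsExposure": np.array([[130, 1456], [5, 68]]),
    "OccupationalExposureChemicals": np.array([[119, 1369], [16, 155]]),
    "WaterQuality": np.array([[105, 1227], [30, 297]]),
}

def chi2_report(table):
    """Hypothetical helper: chi-square test plus two simple diagnostics."""
    chi2, p, dof, expected = stats.chi2_contingency(table)
    # Odds of CKD in the exposed group divided by odds in the unexposed group.
    odds_ratio = (table[1, 1] / table[1, 0]) / (table[0, 1] / table[0, 0])
    return {"chi2": chi2, "p": p, "dof": dof,
            "min_expected": expected.min(), "odds_ratio": odds_ratio}

reports = {name: chi2_report(tab) for name, tab in tables.items()}
for name, r in reports.items():
    print(f"{name}: chi2={r['chi2']:.3f}, p={r['p']:.4f}, "
          f"min expected={r['min_expected']:.1f}, OR={r['odds_ratio']:.2f}")
```

A common rule of thumb is that every expected count should be at least 5. When a cell falls below that, Fisher's exact test (`scipy.stats.fisher_exact`) is a frequently used alternative for 2x2 tables.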

# Relationship Between Chronic Kidney Disease and Lifestyle

In [7]:
# Correlation between Chronic Kidney Disease and BMI
import statsmodels.api as sm

X = sm.add_constant(df['BMI']) 
y = df['Diagnosis']

model = sm.Logit(y, X)
result = model.fit()
print(result.summary())

Optimization terminated successfully.
         Current function value: 0.280586
         Iterations 7
                           Logit Regression Results                           
Dep. Variable:              Diagnosis   No. Observations:                 1659
Model:                          Logit   Df Residuals:                     1657
Method:                           MLE   Df Model:                            1
Date:                Sat, 12 Apr 2025   Pseudo R-squ.:                0.005413
Time:                        09:39:29   Log-Likelihood:                -465.49
converged:                       True   LL-Null:                       -468.03
Covariance Type:            nonrobust   LLR p-value:                   0.02439
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.6722      0.340      4.919      0.000       1.006       2.338
BMI            0.0278      0.

In [8]:
# Correlation between Chronic Kidney Disease and Smoking
ckd_smoking = pd.crosstab(df['Smoking'], df['Diagnosis'])
print(ckd_smoking)      


chi2, p, dof, expected = stats.chi2_contingency(ckd_smoking)
print("Chi-square test results:")

print(f"Chi2 = {chi2:.3f}, p-value = {p:.4f}")
if p < 0.05:
    print("There is a significant association between chronic kidney disease and smoking")
else:
    print("No significant association between chronic kidney disease and smoking")

Diagnosis    0     1
Smoking             
0          101  1072
1           34   452
Chi-square test results:
Chi2 = 0.992, p-value = 0.3193
No significant association between chronic kidney disease and smoking


In [9]:
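# Correlation between Chronic Kidney Disease and AlcoholConsumption
# NOTE: AlcoholConsumption is continuous, so this crosstab has one row per
# distinct value and the chi-square result below is not informative.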
ckd_alcohol = pd.crosstab(df['AlcoholConsumption'], df['Diagnosis'])
print(ckd_alcohol)      


chi2, p, dof, expected = stats.chi2_contingency(ckd_alcohol)
print("Chi-square test results:")

print(f"Chi2 = {chi2:.3f}, p-value = {p:.4f}")
if p < 0.05:
    print("There is a significant association between chronic kidney disease and AlcoholConsumption")
else:
    print("No significant association between chronic kidney disease and AlcoholConsumption")

Diagnosis           0  1
AlcoholConsumption      
0.021740            0  1
0.027360            0  1
0.043682            0  1
0.053363            0  1
0.059254            0  1
...                .. ..
19.950964           0  1
19.959241           0  1
19.981815           0  1
19.986598           0  1
19.992713           0  1

[1659 rows x 2 columns]
Chi-square test results:
Chi2 = 1659.000, p-value = 0.4885
No significant association between chronic kidney disease and AlcoholConsumption
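
Because `AlcoholConsumption` is continuous, the cross-tabulation above has one row per patient, and when every value is unique the chi-square statistic simply equals the number of rows (here 1659), whatever the real association is. One possible fix, sketched here on synthetic data rather than the notebook's CSV, is to bin the variable before testing. The bin edges, the seed, and the generated values are illustrative assumptions, not results from the dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (NOT the Kaggle dataset): a continuous exposure
# on a 0-20 scale and an imbalanced binary outcome generated independently.
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({
    "AlcoholConsumption": rng.uniform(0, 20, size=n),
    "Diagnosis": (rng.random(n) < 0.9).astype(int),
})

# Naive approach: every distinct value becomes its own row,
# so the statistic simply equals the sample size.
raw_table = pd.crosstab(toy["AlcoholConsumption"], toy["Diagnosis"])
chi2_raw, p_raw, dof_raw, _ = stats.chi2_contingency(raw_table)
print(f"raw: rows={raw_table.shape[0]}, chi2={chi2_raw:.1f}, dof={dof_raw}")

# Binned approach: four equal-width bins (the edges are an arbitrary choice).
bins = [0, 5, 10, 15, 20]
toy["AlcoholBin"] = pd.cut(toy["AlcoholConsumption"], bins=bins, include_lowest=True)
binned_table = pd.crosstab(toy["AlcoholBin"], toy["Diagnosis"])
chi2_bin, p_bin, dof_bin, _ = stats.chi2_contingency(binned_table)
print(binned_table)
print(f"binned: chi2={chi2_bin:.3f}, p={p_bin:.4f}, dof={dof_bin}")
```

Fitting a logistic regression on the unbinned variable, as the notebook does for BMI, is another option that avoids choosing bin edges.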


In [10]:
# Correlation between Chronic Kidney Disease and PhysicalActivity
X = sm.add_constant(df['PhysicalActivity']) 
y = df['Diagnosis']

model = sm.Logit(y, X)
result = model.fit()
print(result.summary())

Optimization terminated successfully.
         Current function value: 0.281881
         Iterations 6
                           Logit Regression Results                           
Dep. Variable:              Diagnosis   No. Observations:                 1659
Model:                          Logit   Df Residuals:                     1657
Method:                           MLE   Df Model:                            1
Date:                Sat, 12 Apr 2025   Pseudo R-squ.:               0.0008234
Time:                        09:39:30   Log-Likelihood:                -467.64
converged:                       True   LL-Null:                       -468.03
Covariance Type:            nonrobust   LLR p-value:                    0.3800
                       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------
const                2.5648      0.187     13.741      0.000       2.199       2.931
PhysicalAct

In [11]:
# Correlation between Chronic Kidney Disease and DietQuality
X = sm.add_constant(df['DietQuality']) 
y = df['Diagnosis']

model = sm.Logit(y, X)
result = model.fit()
print(result.summary())

Optimization terminated successfully.
         Current function value: 0.281122
         Iterations 7
                           Logit Regression Results                           
Dep. Variable:              Diagnosis   No. Observations:                 1659
Model:                          Logit   Df Residuals:                     1657
Method:                           MLE   Df Model:                            1
Date:                Sat, 12 Apr 2025   Pseudo R-squ.:                0.003515
Time:                        09:39:30   Log-Likelihood:                -466.38
converged:                       True   LL-Null:                       -468.03
Covariance Type:            nonrobust   LLR p-value:                   0.06970
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const           2.7216      0.193     14.099      0.000       2.343       3.100
DietQuality    -0.0570    

In [12]:
# Correlation between Chronic Kidney Disease and SleepQuality
X = sm.add_constant(df['SleepQuality']) 
y = df['Diagnosis']

model = sm.Logit(y, X)
result = model.fit()
print(result.summary())

Optimization terminated successfully.
         Current function value: 0.281886
         Iterations 6
                           Logit Regression Results                           
Dep. Variable:              Diagnosis   No. Observations:                 1659
Model:                          Logit   Df Residuals:                     1657
Method:                           MLE   Df Model:                            1
Date:                Sat, 12 Apr 2025   Pseudo R-squ.:               0.0008037
Time:                        09:39:30   Log-Likelihood:                -467.65
converged:                       True   LL-Null:                       -468.03
Covariance Type:            nonrobust   LLR p-value:                    0.3857
                   coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------
const            2.1084      0.372      5.663      0.000       1.379       2.838
SleepQuality     0.0458

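BMI appears to be weakly associated with chronic kidney disease (LLR p-value = 0.02439, pseudo R-squared = 0.005413).
No significant association was found between chronic kidney disease and smoking, alcohol consumption, physical activity, diet quality, or sleep quality. The alcohol result should be read with caution, because the chi-square test was applied to a continuous variable.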
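
To make the size of the BMI effect concrete, the coefficients printed in the BMI summary above (`const` = 1.6722, `BMI` = 0.0278) can be turned into an odds ratio and predicted probabilities. This is a minimal sketch using only the standard library; the two BMI values being compared are arbitrary illustrative choices.

```python
import math

# Coefficients copied from the BMI logistic regression summary above.
const = 1.6722
beta_bmi = 0.0278

# Odds ratio per one-unit increase in BMI.
odds_ratio = math.exp(beta_bmi)

def predicted_probability(bmi):
    """Probability of Diagnosis == 1 under the fitted one-variable model."""
    logit = const + beta_bmi * bmi
    return 1.0 / (1.0 + math.exp(-logit))

# Illustrative BMI values (arbitrary choices for comparison).
p_low = predicted_probability(20)
p_high = predicted_probability(35)

print(f"odds ratio per BMI unit: {odds_ratio:.3f}")
print(f"P(CKD | BMI=20) = {p_low:.3f}, P(CKD | BMI=35) = {p_high:.3f}")
```

An odds ratio this close to 1 is consistent with the very small pseudo R-squared reported for the model.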

# Association Between Chronic Kidney Disease and Health Behaviors

In [13]:
# Correlation between Chronic Kidney Disease and MedicalCheckupsFrequency
X = sm.add_constant(df['MedicalCheckupsFrequency']) 
y = df['Diagnosis']

model = sm.Logit(y, X)
result = model.fit()
print(result.summary())

Optimization terminated successfully.
         Current function value: 0.282037
         Iterations 6
                           Logit Regression Results                           
Dep. Variable:              Diagnosis   No. Observations:                 1659
Model:                          Logit   Df Residuals:                     1657
Method:                           MLE   Df Model:                            1
Date:                Sat, 12 Apr 2025   Pseudo R-squ.:               0.0002713
Time:                        09:39:30   Log-Likelihood:                -467.90
converged:                       True   LL-Null:                       -468.03
Covariance Type:            nonrobust   LLR p-value:                    0.6143
                               coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------
const                        2.3453      0.178     13.153      0.000       1.996

In [14]:
# Correlation between Chronic Kidney Disease and MedicationAdherence
X = sm.add_constant(df['MedicationAdherence']) 
y = df['Diagnosis']

model = sm.Logit(y, X)
result = model.fit()
print(result.summary())

Optimization terminated successfully.
         Current function value: 0.282112
         Iterations 6
                           Logit Regression Results                           
Dep. Variable:              Diagnosis   No. Observations:                 1659
Model:                          Logit   Df Residuals:                     1657
Method:                           MLE   Df Model:                            1
Date:                Sat, 12 Apr 2025   Pseudo R-squ.:               4.158e-06
Time:                        09:39:30   Log-Likelihood:                -468.02
converged:                       True   LL-Null:                       -468.03
Covariance Type:            nonrobust   LLR p-value:                    0.9503
                          coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------
const                   2.4142      0.179     13.513      0.000       2.064       2.764
Me

In [15]:
# Correlation between Chronic Kidney Disease and HealthLiteracy
X = sm.add_constant(df['HealthLiteracy']) 
y = df['Diagnosis']

model = sm.Logit(y, X)
result = model.fit()
print(result.summary())

Optimization terminated successfully.
         Current function value: 0.282100
         Iterations 6
                           Logit Regression Results                           
Dep. Variable:              Diagnosis   No. Observations:                 1659
Model:                          Logit   Df Residuals:                     1657
Method:                           MLE   Df Model:                            1
Date:                Sat, 12 Apr 2025   Pseudo R-squ.:               4.766e-05
Time:                        09:39:30   Log-Likelihood:                -468.00
converged:                       True   LL-Null:                       -468.03
Covariance Type:            nonrobust   LLR p-value:                    0.8327
                     coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
const              2.3903      0.182     13.163      0.000       2.034       2.746
HealthLiteracy   

No association was found between chronic kidney disease and medical checkup frequency, medication adherence, or health literacy.

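# Conclusion
In this analysis, BMI showed a weak association with a diagnosis of chronic kidney disease.
To prevent chronic kidney disease, it seems appropriate for healthcare professionals to give guidance on exercise and diet so that patients do not gain weight.
For example, health classes could be held for people with a high BMI, or pamphlets on exercise and diet could be distributed.
In this dataset, the number of people diagnosed with chronic kidney disease differed greatly from the number who were not (about 92% versus 8%), so some related factors may have failed to reach statistical significance in this analysis.
The analysis should therefore be repeated on data in which patients with chronic kidney disease and healthy individuals are equally represented.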
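
One simple way to obtain equal group sizes is to downsample the majority class. The sketch below uses synthetic data with roughly the same imbalance as the notebook's dataset; the generated values, the column contents, and the seed are assumptions for illustration only, and the variable names `toy` and `balanced` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the Kaggle dataset) with about 92% positives.
rng = np.random.default_rng(0)
n = 1000
toy = pd.DataFrame({
    "BMI": rng.uniform(15, 40, size=n),
    "Diagnosis": (rng.random(n) < 0.92).astype(int),
})

# Size of the smaller class decides how many rows to keep per class.
n_minority = toy["Diagnosis"].value_counts().min()

# Draw the same number of rows from each class without replacement.
balanced = pd.concat(
    [group.sample(n=n_minority, random_state=0)
     for _, group in toy.groupby("Diagnosis")]
)

print(toy["Diagnosis"].value_counts())
print(balanced["Diagnosis"].value_counts())
```

Downsampling discards most of the majority-class rows, so it trades statistical power for balance; collecting more non-CKD cases, as the conclusion suggests, avoids that loss.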