
The CHA2DS2-VASc score is a tool used to assess the risk of stroke in individuals with atrial fibrillation.


In [None]:
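def chads_vasc_score(patient):
    """Return the CHA2DS2-VASc score for a patient dict.

    Expected keys: heart_failure, hypertension, diabetes, stroke,
    vascular_disease (booleans), age (years), sex ("female" or other).
    """
    score = 0

    if patient["heart_failure"]:      # C: congestive heart failure
        score += 1
    if patient["hypertension"]:       # H: hypertension
        score += 1
    if patient["age"] >= 75:          # A2: age 75 or older
        score += 2
    elif patient["age"] >= 65:        # A: age 65 to 74
        score += 1
    if patient["diabetes"]:           # D: diabetes
        score += 1
    if patient["stroke"]:             # S2: prior stroke
        score += 2
    if patient["vascular_disease"]:   # V: vascular disease
        score += 1
    if patient["sex"] == "female":    # Sc: sex category
        score += 1

    return score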


Low Risk: Score = 0 (men) or 1 (women) → No treatment needed.

Moderate Risk: Score = 1 (men) or 2 (women) → Consider treatment.

High Risk: Score ≥ 2 (men) or ≥ 3 (women) → Anticoagulation recommended.
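
The thresholds above can be turned into a small lookup. The sketch below is only an illustration of the three tiers as written in these notes; the helper name `risk_category` and its return strings are made up for this example and are not part of the original notebook.

```python
def risk_category(score, sex):
    """Map a CHA2DS2-VASc score to the tier listed in the notes above.

    Hypothetical helper: the thresholds are copied from the notes,
    where women sit one point higher than men in every tier.
    """
    # Remove the point that female sex contributes, so one set of
    # thresholds (0 / 1 / 2 or more) covers both sexes.
    adjusted = score - 1 if sex == "female" else score

    if adjusted <= 0:
        return "low"
    if adjusted == 1:
        return "moderate"
    return "high"


print(risk_category(3, "female"))  # high
print(risk_category(1, "male"))    # moderate
```

Subtracting the sex point first keeps the comparison logic in one place instead of duplicating the thresholds per sex.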

In [None]:
patient = {
    "heart_failure": False,
    "hypertension": True,
    "age": 60,
    "diabetes": False,
    "stroke": False,
    "vascular_disease": True,
    "sex": "female"
}

print(chads_vasc_score(patient))  # prints 3 (hypertension + vascular disease + female sex): high risk


3


The MELD (Model for End-Stage Liver Disease) score is a numerical scale used to assess the severity of chronic liver disease and to prioritize patients for liver transplantation.

MELD Score = 9.57 * ln(creatinine) + 3.78 * ln(bilirubin) + 11.2 * ln(INR) + 6.43

Where:

  ln is the natural logarithm.

  Creatinine is measured in mg/dL.

  Bilirubin is measured in mg/dL.

  INR is the International Normalized Ratio (unitless).


In [None]:
import numpy as np

In [None]:

def liver_disease_mortality(input_creatinine, input_bilirubin, input_inr):
    """
    Calculate the MELD score, which is used to assess mortality risk
    in patients with liver disease.
    Parameters:
        input_creatinine: creatinine in mg/dL
        input_bilirubin: bilirubin in mg/dL
        input_inr: International Normalized Ratio (unitless)
    Returns:
        The MELD score (a score, not a probability).
    """
    # Coefficients of the MELD formula, each divided by 10
    coef_creatinine = 0.957
    coef_bilirubin = 0.378
    coef_inr = 1.12
    intercept = 0.643

    # Natural logarithm of each input
    log_cre = np.log(input_creatinine)
    log_bil = np.log(input_bilirubin)
    log_inr = np.log(input_inr)

    # Weighted sum of the log inputs plus the intercept
    meld_score = (coef_creatinine * log_cre) + (coef_bilirubin * log_bil) + (coef_inr * log_inr) + intercept

    # Multiply by 10 to get the final MELD score
    meld_score = meld_score * 10

    return meld_score

In [None]:

tmp_meld_score = liver_disease_mortality(1.0, 2.0, 1.1)
print(f"The patient's MELD score is: {tmp_meld_score:.2f}")

The patient's MELD score is: 10.12
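
The printed value can be checked by hand against the formula stated earlier. The sketch below evaluates each term separately for the same inputs (creatinine 1.0, bilirubin 2.0, INR 1.1) using only the standard library; the dictionary layout is just a convenience for this illustration.

```python
import math

# Each term of: 9.57*ln(creatinine) + 3.78*ln(bilirubin) + 11.2*ln(INR) + 6.43
terms = {
    "creatinine": 9.57 * math.log(1.0),  # ln(1) = 0, so this term vanishes
    "bilirubin": 3.78 * math.log(2.0),
    "inr": 11.2 * math.log(1.1),
}
meld = sum(terms.values()) + 6.43

for name, value in terms.items():
    print(f"{name}: {value:.3f}")
print(f"MELD: {meld:.2f}")  # MELD: 10.12
```

Because ln(1) = 0, a creatinine of exactly 1.0 mg/dL contributes nothing, and the score here is driven by the intercept plus the bilirubin and INR terms.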


ASCVD stands for Atherosclerotic Cardiovascular Disease. It refers to a group of conditions caused by the buildup of plaque in the arteries, leading to narrowed or blocked blood vessels.

Atherosclerotic Cardiovascular Disease (ASCVD) Risk Estimator Plus. The ascvd function in the next cell takes the following inputs:

    Parameters:
        x_age : Age in years
        x_cho : Total cholesterol (mg/dL)
        x_hdl : HDL cholesterol (mg/dL)
        x_sbp : Systolic blood pressure (mmHg)
        x_smo : Smoking status (1 = smoker, 0 = non-smoker)
        x_dia : Diabetes status (1 = diabetic, 0 = non-diabetic)
        verbose : If True, prints intermediate values

    Returns:
        risk_score : Estimated ASCVD risk, a value between 0 and 1

In [None]:
import numpy as np

def ascvd(x_age,
          x_cho,
          x_hdl,
          x_sbp,
          x_smo,
          x_dia,
          verbose=False
         ):

    # Define the coefficients
    b_age = 17.114
    b_cho = 0.94
    b_hdl = -18.92
    b_age_hdl = 4.475
    b_sbp = 27.82
    b_age_sbp = -6.087
    b_smo = 0.691
    b_dia = 0.874

    # Calculate the sum of the products of inputs and coefficients
    sum_prod = (
        b_age * np.log(x_age) +
        b_cho * np.log(x_cho) +
        b_hdl * np.log(x_hdl) +
        b_age_hdl * np.log(x_age) * np.log(x_hdl) +
        b_sbp * np.log(x_sbp) +
        b_age_sbp * np.log(x_age) * np.log(x_sbp) +
        b_smo * x_smo +
        b_dia * x_dia
    )

    if verbose:
        print(f"np.log(x_age):{np.log(x_age):.2f}")
        print(f"np.log(x_cho):{np.log(x_cho):.2f}")
        print(f"np.log(x_hdl):{np.log(x_hdl):.2f}")
        print(f"np.log(x_age) * np.log(x_hdl):{np.log(x_age) * np.log(x_hdl):.2f}")
        print(f"np.log(x_sbp): {np.log(x_sbp):.2f}")
        print(f"np.log(x_age) * np.log(x_sbp): {np.log(x_age) * np.log(x_sbp):.2f}")
        print(f"sum_prod {sum_prod:.2f}")

    # Calculate Risk Score = 1 - (0.9533 ^ (e ^ (sum_prod - 86.61)))
    risk_score = 1 - np.power(0.9533, np.exp(sum_prod - 86.61))

    return risk_score


In [None]:
tmp_risk_score = ascvd(x_age=55,
                      x_cho=213,
                      x_hdl=50,
                      x_sbp=120,
                      x_smo=0,
                      x_dia=0,
                      verbose=True
                      )
print(f"\npatient's ascvd risk score is {tmp_risk_score:.2f}")

np.log(x_age):4.01
np.log(x_cho):5.36
np.log(x_hdl):3.91
np.log(x_age) * np.log(x_hdl):15.68
np.log(x_sbp): 4.79
np.log(x_age) * np.log(x_sbp): 19.19
sum_prod 86.17

patient's ascvd risk score is 0.03
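
The last step of the function converts the weighted sum into a risk between 0 and 1. The sketch below isolates that step, reusing the constants 0.9533 and 86.61 from the code above. The helper name `risk_from_sum` and the parameter names `baseline` and `center` are made up for this illustration.

```python
import numpy as np


def risk_from_sum(sum_prod, baseline=0.9533, center=86.61):
    """Risk = 1 - baseline ** exp(sum_prod - center).

    Hypothetical helper; the two constants are taken from the ascvd
    function in this notebook, the parameter names are assumptions.
    """
    return 1 - np.power(baseline, np.exp(sum_prod - center))


# When sum_prod equals the centering constant, exp(0) = 1 and the
# risk is simply 1 - 0.9533.
at_center = risk_from_sum(86.61)
# The rounded sum_prod printed in the verbose output above.
example = risk_from_sum(86.17)

print(f"{at_center:.4f}")  # 0.0467
print(f"{example:.2f}")    # 0.03
```

Since the exponent grows with sum_prod and the base is below 1, a larger weighted sum always gives a higher risk.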


In [None]:
import numpy as np
import pandas as pd

In [None]:
def load_data(num_rows):
    """
    Placeholder function for loading data.
    Replace with your actual data loading logic.
    """
    # Create dummy data for demonstration: 5 uniform random features
    # and a random binary target (unseeded, so values change on every run)
    X = pd.DataFrame(np.random.rand(num_rows, 5), columns=[f'feature_{i}' for i in range(5)])
    y = pd.Series(np.random.randint(0, 2, num_rows), name='target')
    return X, y

In [None]:
# Load dataset with 100 rows
X, y = load_data(100)

# Display the first few rows
print("Features (X):")
print(X.head())

print("\nLabels (y):")
print(y.head())

Features (X):
   feature_0  feature_1  feature_2  feature_3  feature_4
0   0.837757   0.402522   0.157880   0.640673   0.254098
1   0.398717   0.945840   0.961916   0.461422   0.708695
2   0.960956   0.765708   0.317410   0.041396   0.361672
3   0.246340   0.770420   0.824126   0.498830   0.810363
4   0.466056   0.356168   0.497639   0.752016   0.611973

Labels (y):
0    0
1    0
2    1
3    0
4    1
Name: target, dtype: int64

In [None]:
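# utils.py: an alternative load_data with clinical-style columns.
# The outputs shown in this notebook come from the placeholder load_data,
# not from this version.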

import numpy as np
import pandas as pd

def load_data(n_samples=100):
    """
    Generates synthetic tabular data for testing purposes.

    Parameters:
    - n_samples: number of samples to generate

    Returns:
    - X: pd.DataFrame of features
    - y: pd.Series of binary labels (0 or 1)
    """
    np.random.seed(42)  # for reproducibility

    X = pd.DataFrame({
        'age': np.random.randint(20, 80, size=n_samples),
        'cholesterol': np.random.randint(150, 300, size=n_samples),
        'hdl': np.random.randint(30, 100, size=n_samples),
        'sbp': np.random.randint(90, 180, size=n_samples),
        'creatinine': np.random.uniform(0.5, 3.0, size=n_samples),
        'bilirubin': np.random.uniform(0.2, 5.0, size=n_samples),
        'inr': np.random.uniform(0.9, 2.5, size=n_samples),
        'smoker': np.random.choice([0, 1], size=n_samples),
        'diabetic': np.random.choice([0, 1], size=n_samples),
    })

    y = pd.Series(np.random.choice([0, 1], size=n_samples), name='label')

    return X, y


In [None]:
X.head()

   feature_0  feature_1  feature_2  feature_3  feature_4
0   0.837757   0.402522   0.157880   0.640673   0.254098
1   0.398717   0.945840   0.961916   0.461422   0.708695
2   0.960956   0.765708   0.317410   0.041396   0.361672
3   0.246340   0.770420   0.824126   0.498830   0.810363
4   0.466056   0.356168   0.497639   0.752016   0.611973


In [None]:
# Call the .mean function of the data frame without choosing an axis
print(f"Pandas: X.mean():\n{X.mean()}")
print()

Pandas: X.mean():
feature_0    0.494141
feature_1    0.547893
feature_2    0.510795
feature_3    0.492006
feature_4    0.542039
dtype: float64



In [None]:
# Call the .mean function of the data frame, choosing axis=0
print(f"Pandas: X.mean(axis=0)\n{X.mean(axis=0)}")

Pandas: X.mean(axis=0)
feature_0    0.494141
feature_1    0.547893
feature_2    0.510795
feature_3    0.492006
feature_4    0.542039
dtype: float64


In [None]:

# Store the data frame data into a numpy array
X_np = np.array(X)

In [None]:
# View the first 2 rows of the numpy array
print(f"First 2 rows of the numpy array:\n{X_np[0:2,:]}")
print()

First 2 rows of the numpy array:
[[0.83775695 0.40252219 0.15788001 0.64067314 0.25409848]
 [0.39871667 0.94583998 0.96191575 0.46142213 0.70869528]]

In [None]:

# Call the .mean function of the numpy array without choosing an axis
print(f"Numpy.ndarray.mean: X_np.mean():\n{X_np.mean()}")
print()
# Call the .mean function of the numpy array, choosing axis=0
print(f"Numpy.ndarray.mean: X_np.mean(axis=0):\n{X_np.mean(axis=0)}")

Numpy.ndarray.mean: X_np.mean():
0.5173746556512572

Numpy.ndarray.mean: X_np.mean(axis=0):
[0.49414063 0.54789289 0.51079451 0.49200596 0.5420393 ]
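
The outputs above differ because the defaults differ: DataFrame.mean() averages each column (the same as axis=0), while ndarray.mean() averages over every element unless an axis is given. A minimal sketch on a made-up 2x2 table (the column names `a` and `b` are illustrative only):

```python
import numpy as np
import pandas as pd

# Tiny made-up table so the means are easy to verify by hand
df = pd.DataFrame({"a": [1.0, 3.0], "b": [2.0, 4.0]})
arr = df.to_numpy()

col_means_pd = df.mean()         # per column: a -> 2.0, b -> 3.0
overall_np = arr.mean()          # all four values: 2.5
col_means_np = arr.mean(axis=0)  # per column: [2.0, 3.0]
row_means_np = arr.mean(axis=1)  # per row: [1.5, 3.5]

print(col_means_pd.to_dict())  # {'a': 2.0, 'b': 3.0}
print(overall_np)              # 2.5
print(col_means_np)            # [2. 3.]
print(row_means_np)            # [1.5 3.5]
```

Passing the axis explicitly in both libraries avoids relying on either default.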


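Comparing per-feature means is a quick sanity check. A mean far from the middle of a feature's range can indicate a skewed distribution, and large differences between feature means signal that the features are on different scales and may need rescaling.

The means are also used directly in normalization, for example when centering each feature by subtracting its mean.

Here all values lie between 0 and 1 and every feature mean is close to 0.5, so the features are already on a common scale, as they would be after Min-Max scaling. (StandardScaler output would instead have a mean near 0 and include negative values.) In this notebook that is expected, because the placeholder load_data draws the features with np.random.rand, which samples uniformly from [0, 1).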
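
As a sketch of the normalization mentioned above, the example below applies mean-centering and Min-Max scaling column by column with plain NumPy. The small array is made up for illustration, and this is one common way to write the two transforms, not the notebook's own method.

```python
import numpy as np

# Made-up data: two features on very different scales
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 60.0]])

# Mean-centering: subtract each column's mean so every column averages 0
centered = X_demo - X_demo.mean(axis=0)

# Min-Max scaling: map each column onto the range [0, 1]
col_min = X_demo.min(axis=0)
col_max = X_demo.max(axis=0)
min_max = (X_demo - col_min) / (col_max - col_min)

print(centered.mean(axis=0))  # [0. 0.]
print(min_max[:, 1])          # [0.  0.2 1. ]
```

Min-Max scaling divides by the column range, so a column whose values are all identical would need separate handling to avoid division by zero.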