In [None]:
import pandas as pd
import numpy as np

Patient Dataset:
*   Columns: Heart Rate (HR), Blood Pressure (BP), and Temperature (Temp).
*   Each row represents one patient's readings for these three measurements.


In [None]:
df_patient = pd.DataFrame({'HR' : [76, 74, 72, 78],
                           'BP' : [126, 120, 118, 136],
                           'Temp': [38, 38, 37.5, 37]}, index=['patient-1', 'patient-2', 'patient-3','patient-4'])
df_patient.head()

,HR,BP,Temp
patient-1,76,126,38.0
patient-2,74,120,38.0
patient-3,72,118,37.5
patient-4,78,136,37.0


In [None]:
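def euclidean_distance(pt1, pt2):
  """Return the Euclidean (L2) distance between two points."""
  return np.linalg.norm(pt1 - pt2)

def normalization(df):
  """Z-score each column: subtract the column mean, divide by the sample std."""
  return (df - df.mean()) / df.std()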

1. Which patient is farthest from the rest?
2. Which two patients are nearest?

In [None]:
df_patient.mean(axis=0)

HR       75.000
BP      125.000
Temp     37.625
dtype: float64

In [None]:
central_patient = df_patient.mean(axis=0)
num_patients = len(df_patient)
# Distance of every patient from the mean ("central") patient
distances = [euclidean_distance(central_patient, df_patient.iloc[i]) for i in range(num_patients)]
# Look up the row label; the raw list position is 0-based and would be off by one
print(f'The patient that is farthest from all is {df_patient.index[np.argmax(distances)]}')

The patient that is farthest from all is patient-4


**The time complexity of the approach above is O(n), where n is the number of patients.**
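
The distances above are computed on raw units, so BP, which has the widest spread, contributes most to the result. The sketch below applies the same z-score formula as the `normalization` helper before measuring each patient's distance from the mean patient. It assumes the original four-patient table; the names `df_norm`, `norm_distances`, and `farthest` are illustrative only.

```python
import numpy as np
import pandas as pd

df_patient = pd.DataFrame(
    {'HR': [76, 74, 72, 78],
     'BP': [126, 120, 118, 136],
     'Temp': [38, 38, 37.5, 37]},
    index=['patient-1', 'patient-2', 'patient-3', 'patient-4'])

# Z-score each column so HR, BP and Temp contribute on a comparable scale
df_norm = (df_patient - df_patient.mean()) / df_patient.std()

# After z-scoring, the mean patient sits at the origin, so the distance
# from the centre is simply the row-wise norm
norm_distances = pd.Series(np.linalg.norm(df_norm.to_numpy(), axis=1),
                           index=df_norm.index)
farthest = norm_distances.idxmax()
print(f'Farthest patient after normalization: {farthest}')
```

Whether to normalize depends on whether a one-unit change in BP should count the same as a one-unit change in Temp; for mixed units, z-scoring is a common default.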

In [None]:
def nearest_distance(df):
  nearest = np.inf
  nearest_pair = ()
  num_patients = len(df)

  # Compare every unordered pair of patients exactly once
  for i in range(num_patients):
    for j in range(i + 1, num_patients):
        euc_dist = euclidean_distance(df.iloc[i], df.iloc[j])

        if euc_dist < nearest:
            nearest = euc_dist
            # Store row labels rather than 0-based positions
            nearest_pair = (df.index[i], df.index[j])
  print(f"The nearest distance is {nearest} between patients {nearest_pair}")

In [None]:
nearest_distance(df_patient)

The nearest distance is 2.8722813232690143 between patients ('patient-2', 'patient-3')


**The time complexity of the approach above is O(n^2), since every pair of patients is compared.**
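
The same pairwise search can also be written with NumPy broadcasting. It is still O(n^2) in the number of patients, but the loops run inside NumPy rather than in Python. This is a minimal sketch assuming the original four-patient table; `X`, `dist`, and `pair` are illustrative names, not part of the notebook.

```python
import numpy as np
import pandas as pd

df_patient = pd.DataFrame(
    {'HR': [76, 74, 72, 78],
     'BP': [126, 120, 118, 136],
     'Temp': [38, 38, 37.5, 37]},
    index=['patient-1', 'patient-2', 'patient-3', 'patient-4'])

X = df_patient.to_numpy(dtype=float)          # shape (n, 3)

# Broadcasting builds all pairwise differences at once: shape (n, n, 3)
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))      # shape (n, n), symmetric

# A patient is trivially at distance 0 from itself, so mask the diagonal
np.fill_diagonal(dist, np.inf)

# Position of the smallest remaining entry, mapped back to row labels
i, j = np.unravel_index(np.argmin(dist), dist.shape)
pair = (df_patient.index[i], df_patient.index[j])
print(f'Nearest pair: {pair}, distance {dist[i, j]}')
```

The intermediate `diff` array needs memory proportional to n^2 times the number of features, so this form suits small to medium tables.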

3. Create a new dummy patient. Which existing patient is closest to it?


In [None]:
df_patient.loc['patient-5'] = {'HR':75,'BP':128,'Temp':37}
df_patient

,HR,BP,Temp
patient-1,76,126,38.0
patient-2,74,120,38.0
patient-3,72,118,37.5
patient-4,78,136,37.0
patient-5,75,128,37.0


In [None]:
dummy = df_patient.loc['patient-5']
# Distances from the dummy patient to every other patient (the dummy itself is excluded)
distances = df_patient.drop(index='patient-5').apply(lambda row: euclidean_distance(dummy, row), axis=1)
print(f'The dummy patient is closest to {distances.idxmin()}')

The dummy patient is closest to patient-1


4. Can you name one practical use of such a simple technique when hundreds of features and thousands of patient records are available?

- Identifying Nearest Patients:
When a new patient arrives, finding their closest matches among past patients can be valuable. How similar patients were treated, and how they responded, can inform predictions or treatment recommendations for the new patient.

- Identifying the Most Distant Patient:
A patient who is far from all others may be an extreme case. Finding such patients helps flag unusual conditions or significant health issues that set them apart from the healthier population.

- Examining the Farthest Patients:
Comparing the patients who are farthest from each other can highlight the differences between healthy and unhealthy individuals. Understanding which characteristics separate the two groups helps determine where a new patient fits.
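
The nearest-patient idea in the first bullet could be sketched at a larger scale as follows. Everything here is an assumption made for illustration: the data is synthetic random noise standing in for real records, the sizes (5000 patients, 100 features) are arbitrary, and `records`, `new_patient`, and `k` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT real patients): 5000 records, 100 features
records = rng.normal(size=(5000, 100))
# Hypothetical new patient, built to resemble record 42 with small noise
new_patient = records[42] + 0.01 * rng.normal(size=100)

# Z-score with statistics from the stored records, then apply the
# same transform to the new patient so both live on the same scale
mean = records.mean(axis=0)
std = records.std(axis=0, ddof=1)
records_z = (records - mean) / std
query_z = (new_patient - mean) / std

# One pass over all records: O(n) distances for a single query
dists = np.linalg.norm(records_z - query_z, axis=1)

# Pick the k smallest distances without fully sorting all n of them
k = 5
nearest_idx = np.argpartition(dists, k)[:k]
nearest_idx = nearest_idx[np.argsort(dists[nearest_idx])]
print('Closest record indices:', nearest_idx)
```

Each query costs one pass over the stored records. For much larger collections, a spatial index or approximate nearest-neighbour method would be the usual next step, which is outside the scope of this notebook.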
