### **Problem: Medical Diagnosis System**

We need to develop a system that diagnoses a specific disease based on patient symptoms. The system should take symptoms as input from a user and output a diagnosis using a trained machine learning model.

### **Solution Design**

**1. Data Preprocessing**

Clean the data by handling missing values and encoding categorical variables such as "Yes" and "No".
Split the data into training and testing sets for model evaluation.
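
The two preprocessing steps above can be sketched as follows. This is a minimal illustration, not the only way to do it: the `raw` records are hypothetical, and treating an unanswered symptom as "No" is an assumption made here purely to show one way of handling missing values.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical raw records; None marks a symptom the patient did not report
raw = pd.DataFrame({
    'Fever': ['Yes', None, 'No', 'Yes', 'No', 'Yes', 'No', 'Yes'],
    'Cough': ['No', 'Yes', 'Yes', None, 'No', 'Yes', 'Yes', 'No'],
    'Diagnosis': ['Influenza', 'Influenza', 'Common Cold', 'Influenza',
                  'Common Cold', 'Influenza', 'Common Cold', 'Common Cold'],
})
symptom_cols = ['Fever', 'Cough']

# Assumption: an unanswered symptom is treated as "No"
filled = raw[symptom_cols].fillna('No')

# Encode Yes/No answers as 1/0
features = filled.apply(lambda col: col.map({'Yes': 1, 'No': 0}))
labels = raw['Diagnosis']

# Hold out 25% of the rows; stratify keeps both classes present in each part
X_tr, X_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)
```

Whether a missing answer should default to "No", be imputed some other way, or cause the row to be dropped depends on the dataset and should be decided with domain input.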

**2. Model Training**

We will use a classification algorithm such as Decision Tree or Random Forest to predict the disease based on symptoms like fever, cough, fatigue, and shortness of breath.
Train the model on the training dataset.
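
For comparison, here is a minimal sketch of the Random Forest alternative mentioned above. The five rows mirror the example data used in the implementation further down, already encoded as 1/0; the names `X_demo`, `y_demo`, and `forest`, and the choice of 50 trees, are illustrative assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Symptoms already encoded as 1 (Yes) / 0 (No)
X_demo = pd.DataFrame({
    'Fever': [1, 1, 0, 1, 0],
    'Cough': [0, 1, 1, 1, 0],
    'Fatigue': [1, 1, 0, 1, 0],
    'Shortness of Breath': [0, 1, 0, 0, 0],
})
y_demo = ['Influenza', 'COVID-19', 'Common Cold', 'Bronchitis', 'Healthy']

# An ensemble of 50 decision trees, each fit on a bootstrap sample of the rows
forest = RandomForestClassifier(n_estimators=50, random_state=42)
forest.fit(X_demo, y_demo)

# Predict for a new patient reporting all four symptoms
new_patient = pd.DataFrame([[1, 1, 1, 1]], columns=X_demo.columns)
predicted = forest.predict(new_patient)[0]
print(predicted)
```

A forest usually generalizes better than a single tree on realistic datasets, at the cost of being harder to inspect rule by rule.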

**3. Model Evaluation**

Use metrics such as accuracy, precision, recall, and F1-score to evaluate how well the model predicts on the test data.
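
To make these metrics concrete, the sketch below computes them on a small set of hand-written labels. The `y_true` and `y_pred` values are made up for illustration and do not come from any model in this notebook.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Made-up true and predicted class labels for five patients
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]

acc = accuracy_score(y_true, y_pred)                    # fraction of correct predictions
prec = precision_score(y_true, y_pred, average='macro')  # mean per-class precision
rec = recall_score(y_true, y_pred, average='macro')      # mean per-class recall
f1_macro = f1_score(y_true, y_pred, average='macro')     # mean per-class F1

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)

print(acc, prec, rec, f1_macro)
print(cm)
```

Here four of five predictions are correct, so accuracy is 0.8. Class 0 has perfect precision but only 50% recall, because one of its two patients was predicted as class 1; the macro averages give each class equal weight regardless of how many samples it has.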

**4. User Interface**

Create an interface that allows a user to input their symptoms and returns a predicted diagnosis.

### **Python Implementation**

The implementation below uses a Decision Tree Classifier. It is a reasonable starting point for this kind of problem because the learned rules can be inspected directly and the model works with binary-encoded categorical features without any scaling.

In [None]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [None]:
# Example data
data = {
    'Fever': ['Yes', 'Yes', 'No', 'Yes', 'No'],
    'Cough': ['No', 'Yes', 'Yes', 'Yes', 'No'],
    'Fatigue': ['Yes', 'Yes', 'No', 'Yes', 'No'],
    'Shortness of Breath': ['No', 'Yes', 'No', 'No', 'No'],
    'Diagnosis': ['Influenza', 'COVID-19', 'Common Cold', 'Bronchitis', 'Healthy']
}

# Create DataFrame
df = pd.DataFrame(data)

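# Preprocess data: encode symptoms as 1/0 and diagnoses as integer class labels
symptom_cols = ['Fever', 'Cough', 'Fatigue', 'Shortness of Breath']
diagnosis_codes = {'Influenza': 0, 'COVID-19': 1, 'Common Cold': 2, 'Bronchitis': 3, 'Healthy': 4}
df_encoded = df[symptom_cols].apply(lambda col: col.map({'Yes': 1, 'No': 0}))
df_encoded['Diagnosis'] = df['Diagnosis'].map(diagnosis_codes)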

# Split the data into features and target
X = df_encoded[['Fever', 'Cough', 'Fatigue', 'Shortness of Breath']]
y = df_encoded['Diagnosis']

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Decision Tree Classifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

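# Evaluate the model
# Note: this toy dataset has one row per class, so the held-out class never appears
# in training and the scores below are not meaningful; use a larger dataset in practice.
# zero_division=0 avoids warnings when a class has no predicted or true samples.
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='macro', zero_division=0)
recall = recall_score(y_test, y_pred, average='macro', zero_division=0)
f1 = f1_score(y_test, y_pred, average='macro', zero_division=0)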

# Display evaluation metrics
print(f'Accuracy: {accuracy}')
print(f'Precision: {precision}')
print(f'Recall: {recall}')
print(f'F1-score: {f1}')

# Function to make predictions based on user input
def diagnose_patient(fever, cough, fatigue, breathlessness):
    # Encode the inputs to match the training data
    # (case-insensitive; anything other than "yes" is treated as "No")
    answers = [fever, cough, fatigue, breathlessness]
    encoded = [1 if str(a).strip().lower() == 'yes' else 0 for a in answers]

    # Reuse the training column names so scikit-learn does not warn about feature names
    symptoms = pd.DataFrame([encoded], columns=X.columns)

    # Predict the disease
    prediction = model.predict(symptoms)

    # Decode the prediction back to the disease name
    diseases = {0: 'Influenza', 1: 'COVID-19', 2: 'Common Cold', 3: 'Bronchitis', 4: 'Healthy'}
    return diseases[prediction[0]]

# Example usage
fever = input("Do you have a fever? (Yes/No): ")
cough = input("Do you have a cough? (Yes/No): ")
fatigue = input("Do you feel fatigued? (Yes/No): ")
breathlessness = input("Do you have shortness of breath? (Yes/No): ")

diagnosis = diagnose_patient(fever, cough, fatigue, breathlessness)
print(f'The diagnosis is: {diagnosis}')

### **Explanation of the Code:**

**1. Data Preprocessing:**

* We build a small example DataFrame and convert the categorical values ('Yes'/'No' for symptoms, disease names for the target) into numeric codes so the model can process them.
* The dataset is split into a training set (80%) and a test set (20%) for evaluating model performance. With the five example rows, this leaves a single row for testing, so a real evaluation needs far more data.

**2. Training:**

* We use a Decision Tree Classifier. This model works well for categorical inputs and provides clear, interpretable results, which is important in medical applications.
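
One way to see the interpretability mentioned above is to print the learned rules with scikit-learn's `export_text`. The sketch below refits a tree on the same five example rows, already encoded as 1/0; the names `X_demo`, `y_demo`, and `tree` are illustrative, and `random_state=0` is set only to make the output repeatable.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# The five example rows, encoded as 1 (Yes) / 0 (No)
X_demo = pd.DataFrame({
    'Fever': [1, 1, 0, 1, 0],
    'Cough': [0, 1, 1, 1, 0],
    'Fatigue': [1, 1, 0, 1, 0],
    'Shortness of Breath': [0, 1, 0, 0, 0],
})
y_demo = ['Influenza', 'COVID-19', 'Common Cold', 'Bronchitis', 'Healthy']

tree = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# Render the fitted tree as indented if/else rules over the symptom columns
rules = export_text(tree, feature_names=list(X_demo.columns))
print(rules)
```

Each printed branch is a threshold test on one symptom (for 1/0 features, `<= 0.50` means "No"), ending in a `class:` line, so a clinician can trace exactly why a given input led to a given diagnosis.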

**3. Evaluation:**

* We evaluate the model's accuracy, precision, recall, and F1-score on the test data. These metrics provide insights into how well the model is diagnosing diseases, especially in cases where the classes are imbalanced.

**4. User Input and Diagnosis:**

* The system asks the user for their symptoms (fever, cough, fatigue, and shortness of breath) and returns the most probable diagnosis based on the trained model.
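
The prompting step can be made more forgiving of how users type their answers. The sketch below is one possible approach; `parse_yes_no` and `ask_symptom` are hypothetical helpers, not part of the code above, and accepting "y"/"n" as shorthand is an assumption.

```python
def parse_yes_no(answer):
    """Return 1 for a yes answer, 0 for a no answer, or None if unrecognized."""
    normalized = answer.strip().lower()
    if normalized in ('yes', 'y'):
        return 1
    if normalized in ('no', 'n'):
        return 0
    return None


def ask_symptom(question, read=input):
    """Repeat the question until the user gives a recognizable Yes/No answer.

    `read` defaults to the built-in input() and can be swapped out for testing.
    """
    while True:
        value = parse_yes_no(read(f"{question} (Yes/No): "))
        if value is not None:
            return value
        print("Please answer Yes or No.")
```

A call such as `ask_symptom("Do you have a fever?")` then returns an already encoded 1 or 0, so a typo or empty answer is re-asked rather than silently counted as "No".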

### **Handling Errors:**

* **Missing Input**: The system could be expanded to handle cases where the user provides incomplete information by asking them to confirm or providing default values for missing data.
* **Unknown Symptom Combinations**: If the input symptoms do not match any of the trained patterns well, the system may make a less reliable prediction. You could mitigate this by outputting a confidence score along with the diagnosis.
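
A confidence score of the kind suggested above can be obtained from `predict_proba`. The following is a rough sketch under stated assumptions: `diagnose_with_confidence` is a hypothetical helper, the 0.6 threshold is arbitrary, and the model is refit here on the five example rows (encoded as 1/0) so the block runs on its own.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# The five example rows, encoded as 1 (Yes) / 0 (No)
X_demo = pd.DataFrame({
    'Fever': [1, 1, 0, 1, 0],
    'Cough': [0, 1, 1, 1, 0],
    'Fatigue': [1, 1, 0, 1, 0],
    'Shortness of Breath': [0, 1, 0, 0, 0],
})
y_demo = ['Influenza', 'COVID-19', 'Common Cold', 'Bronchitis', 'Healthy']
clf = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)


def diagnose_with_confidence(model, symptoms, threshold=0.6):
    """Return (label, confidence); label is 'Uncertain' below the threshold."""
    row = pd.DataFrame([symptoms], columns=X_demo.columns)
    probabilities = model.predict_proba(row)[0]  # one probability per class
    best = probabilities.argmax()
    confidence = float(probabilities[best])
    label = str(model.classes_[best]) if confidence >= threshold else 'Uncertain'
    return label, confidence


label, confidence = diagnose_with_confidence(clf, [1, 1, 1, 1])
print(label, confidence)
```

For a single decision tree, the probability is the class fraction in the leaf that the input falls into, so a fully grown tree often reports 1.0 even for inputs it has never seen. Limiting tree depth or using an ensemble such as a Random Forest tends to give more graded scores.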