<h3>Training a Logistic Regression Model to Predict Diseases from Symptoms</h3>

In [2]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Load CSV data
print("Loading dataset...")
df = pd.read_csv('symptoms.csv')
print(f"Dataset loaded with {df.shape[0]} rows and {df.shape[1]} columns.")

# 2. Handle missing values (Fill missing symptoms with "No symptom")
print("Handling missing values...")
df.fillna('No symptom', inplace=True)

# 3. Combine symptom columns into a single text column with "$" as the separator
print("Combining symptom columns into a single text column...")
df['Symptoms'] = df['Symptoms'] + '$' + df['symptom 1']

# 4. Encode disease names into numeric labels
print("Encoding disease labels...")
label_encoder = LabelEncoder()
df['Disease_Label'] = label_encoder.fit_transform(df['Disease'])

# 5. Define features (X) and labels (y)
X = df['Symptoms']  # Text data (symptoms)
y = df['Disease_Label']  # Labels (numeric encoded diseases)

# 6. TF-IDF Vectorization
print("Vectorizing symptoms using TF-IDF...")
vectorizer = TfidfVectorizer(max_features=5000, stop_words='english')  # Adjust max_features as needed
X_tfidf = vectorizer.fit_transform(X)

# 7. Train Logistic Regression model
print("Training Logistic Regression model...")
model = LogisticRegression(max_iter=1000)  # Increase max_iter if convergence issues occur
model.fit(X_tfidf, y)

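# 8. Evaluate model (optional). This scores the same rows the model was trained on,
#    so the result is an optimistic training accuracy, not a held-out estimate.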
y_pred = model.predict(X_tfidf)
accuracy = accuracy_score(y, y_pred)
print(f"Accuracy on the entire dataset: {accuracy:.4f}")

# Function to look up the recorded symptoms for a given disease
# (a DataFrame lookup; the trained model is not used here)
def get_symptoms_for_disease(disease_name):
    # LabelEncoder.transform raises ValueError for unseen labels, so check first
    if disease_name not in label_encoder.classes_:
        return f"No data found for disease: {disease_name}"

    disease_label = label_encoder.transform([disease_name])[0]  # Get the encoded label
    disease_data = df[df['Disease_Label'] == disease_label]

    # Return the combined symptom text of the first matching row
    return disease_data['Symptoms'].values[0]

# Example of how to use the function
disease_name = 'Malaria'  # Replace with the disease name you're looking for
result = get_symptoms_for_disease(disease_name)

print(result)


Loading dataset...
Dataset loaded with 41 rows and 3 columns.
Handling missing values...
Combining symptom columns into a single text column...
Encoding disease labels...
Vectorizing symptoms using TF-IDF...
Training Logistic Regression model...
Accuracy on the entire dataset: 1.0000
Doctor, I’ve been feeling really unwell. I’ve had chills that make me shiver, and I’ve been vomiting on and off. On top of that, I’ve got a high fever that just doesn’t seem to go away. It’s been really hard to manage, and I’m starting to feel weak. Could you tell me what might be going on?$chills, vomiting, high fever
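
An accuracy of 1.0000 here is measured on the rows the model was fitted on, so it says little about unseen symptom descriptions. The sketch below shows one way a held-out check could look. It assumes several examples per disease are available, which may not hold for `symptoms.csv`; the `toy` DataFrame and its rows are hypothetical stand-ins, not data from the notebook.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data: several symptom descriptions per disease
toy = pd.DataFrame({
    "Disease": ["Malaria"] * 4 + ["Dengue"] * 4,
    "Symptoms": [
        "chills, vomiting, high fever",
        "shivering chills and high fever",
        "high fever with chills and sweating",
        "vomiting, chills, sweating",
        "high fever, nausea, loss of appetite",
        "nausea and loss of appetite with fever",
        "joint pain, nausea, loss of appetite",
        "headache, joint pain, nausea",
    ],
})

# Hold out 25% of the rows, keeping one test row per disease
X_train, X_test, y_train, y_test = train_test_split(
    toy["Symptoms"], toy["Disease"],
    test_size=0.25, stratify=toy["Disease"], random_state=0,
)

# Fit the vectorizer on the training text only, so the test rows stay unseen
vec = TfidfVectorizer()
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(X_train), y_train)

train_acc = accuracy_score(y_train, clf.predict(vec.transform(X_train)))
test_acc = accuracy_score(y_test, clf.predict(vec.transform(X_test)))
print(f"Training accuracy: {train_acc:.4f}")
print(f"Held-out accuracy: {test_acc:.4f}")
```

With only one row per disease a stratified split is not possible, so a check like this would need more labelled descriptions per disease first.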


<h3>Saving the Model</h3>

In [3]:
import joblib

# Save the Logistic Regression model
print("Saving Logistic Regression model...")
joblib.dump(model, 'logistic_regression222_model.pkl')

# Save the TF-IDF vectorizer
print("Saving TF-IDF vectorizer...")
joblib.dump(vectorizer, 'tfidf222_vectorizer.pkl')

# Save the LabelEncoder
print("Saving LabelEncoder...")
joblib.dump(label_encoder, 'label222_encoder.pkl')

# If any other preprocessing steps were done, save those as well
# For example, saving the entire pipeline if used
# joblib.dump(preprocessing_pipeline, 'preprocessing_pipeline.pkl')

print("Model, vectorizer, and label encoder saved successfully.")


Saving Logistic Regression model...
Saving TF-IDF vectorizer...
Saving LabelEncoder...
Model, vectorizer, and label encoder saved successfully.
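
The commented-out line about saving an entire pipeline can be made concrete. The sketch below is an alternative to the three separate files, not what the notebook does: it wraps the vectorizer and classifier in a scikit-learn `Pipeline` and saves that as one artifact. The training texts, the step names, and the file name `symptom_pipeline.pkl` are assumptions chosen for illustration.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical training data (one description per disease)
texts = [
    "chills, vomiting, high fever",
    "high fever, nausea, loss of appetite",
    "abdominal pain, constipation, headache",
]
labels = ["Malaria", "Dengue", "Typhoid"]

# One object that vectorizes raw text and then classifies it
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(texts, labels)

# Round-trip through a single file in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "symptom_pipeline.pkl")
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

same_predictions = list(restored.predict(texts)) == list(pipeline.predict(texts))
print(f"Restored pipeline matches original: {same_predictions}")
```

Because `LogisticRegression` accepts string labels and keeps them in `classes_`, a pipeline trained this way returns disease names directly, and a separate label encoder file is not needed.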


<h3>Testing the Model</h3>

In [1]:
import joblib
import pandas as pd

# Load the saved Logistic Regression model
loaded_model = joblib.load('logistic_regression222_model.pkl')

# Load the saved TF-IDF vectorizer
loaded_vectorizer = joblib.load('tfidf222_vectorizer.pkl')

# Load the saved LabelEncoder so labels match the ones used in training
label_encoder = joblib.load('label222_encoder.pkl')

# Load the CSV dataset for symptoms and diseases (same as training data)
df = pd.read_csv('symptoms.csv')

# Apply the same missing-value handling as in training
df.fillna('No symptom', inplace=True)

# Encode the disease names with the fitted encoder (transform only, no refit)
df['Disease_Label'] = label_encoder.transform(df['Disease'])

# Function to look up the recorded symptoms for a given disease
def get_symptoms_for_disease(disease_name):
    # LabelEncoder.transform raises ValueError for unseen labels, so check first
    if disease_name not in label_encoder.classes_:
        return None

    # Encode the disease name to its numeric label
    disease_label = label_encoder.transform([disease_name])[0]

    # Filter the dataset to the rows for that label
    disease_data = df[df['Disease_Label'] == disease_label]

    # Return the description and the symptom list from the first matching row
    symptoms = disease_data['Symptoms'].values[0]
    symptom_1 = disease_data['symptom 1'].values[0]
    return symptoms, symptom_1

# Example of how to use the function
disease_name = input("Enter the disease name: ").strip()  # User provides the disease name
found = get_symptoms_for_disease(disease_name)

if found is None:
    print(f"No data found for disease: {disease_name}")
else:
    result, symptom_1 = found
    # Print the symptoms associated with the disease
    print(f"Symptoms for {disease_name}: {result}")
    print(f"Additional symptom: {symptom_1}")


Enter the disease name:  Dengue


Symptoms for Dengue: Doctor, I’ve been running a high fever for the past few days, and I’ve been feeling nauseous pretty frequently. On top of that, I’ve lost my appetite completely, and I can’t seem to eat anything without feeling worse. Can you help me understand what might be causing this?
Additional symptom: high fever, nausea, loss of appetite
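
The cell above loads the model and vectorizer but only performs a table lookup, so the classifier itself is never exercised. A minimal sketch of the opposite direction, free-text symptoms in and ranked diseases out, might look like the following. To stay self-contained it trains on three hypothetical rows; in the notebook, `loaded_vectorizer`, `loaded_model`, and the saved label encoder would take the place of `vec`, `clf`, and `encoder`. The helper `predict_diseases` and its `top_k` parameter are illustrative names, not part of the original code.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the trained artifacts
diseases = ["Malaria", "Dengue", "Typhoid"]
symptom_texts = [
    "chills, vomiting, high fever",
    "high fever, nausea, loss of appetite",
    "abdominal pain, constipation, headache",
]

encoder = LabelEncoder()
y = encoder.fit_transform(diseases)
vec = TfidfVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(symptom_texts), y)

def predict_diseases(text, top_k=2):
    """Return the top_k (disease name, probability) pairs for a symptom description."""
    # Vectorize with the already-fitted vectorizer (transform, not fit_transform)
    probs = clf.predict_proba(vec.transform([text]))[0]
    # Indices of the highest probabilities, best first
    top = np.argsort(probs)[::-1][:top_k]
    # Map the model's numeric classes back to disease names
    names = encoder.inverse_transform(clf.classes_[top])
    return [(str(name), float(probs[i])) for name, i in zip(names, top)]

ranked = predict_diseases("I have chills and keep vomiting")
for name, prob in ranked:
    print(f"{name}: {prob:.3f}")
```

Returning probabilities rather than a single label makes it visible when the model is unsure, which is likely with a training set this small.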
