**Download the CSV file from the link below:**
[Click me](https://drive.google.com/file/d/14QfbZcIInmIaCZ75QdNWHh-DZv0HIsSu/view)

In this step, we import the necessary libraries for our machine learning project:

- **`pandas`**: Used for handling and analyzing structured data.
- **`numpy`**: Provides support for large, multi-dimensional arrays and numerical operations.
- **`train_test_split`** *(from sklearn.model_selection)*: Splits the dataset into **training** and **testing** sets.
- **`StandardScaler`** *(from sklearn.preprocessing)*: Standardizes features by removing the mean and scaling to unit variance — this is especially important for algorithms like SVM.
- **`SVC`** *(Support Vector Classifier from sklearn.svm)*: A powerful classifier that works well with both linear and non-linear data.
- **`accuracy_score`** *(from sklearn.metrics)*: Measures how often the classifier correctly predicts labels — a key **performance metric**.


In [8]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

We are downloading the dataset from **Google Drive** using the `gdown` library, which allows file access via shared Drive links.

- The dataset is saved as **`parkisions.csv`**.
- After downloading, we load the file into a **pandas DataFrame** for further processing.

> This step is crucial to make the dataset available locally before beginning any analysis or model training.
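
The share link and the direct-download URL carry the same file ID in different forms. The sketch below shows one way to derive the second from the first, assuming the share link follows the `/file/d/<id>/` pattern seen at the top of this notebook. `drive_direct_url` is a hypothetical helper written for illustration; it is not part of `gdown`.

```python
import re


def drive_direct_url(share_link: str) -> str:
    """Build a direct-download URL from a Google Drive share link.

    Hypothetical helper: assumes the link contains '/file/d/<file_id>'.
    """
    match = re.search(r"/file/d/([\w-]+)", share_link)
    if match is None:
        raise ValueError(f"No file ID found in: {share_link}")
    return f"https://drive.google.com/uc?id={match.group(1)}"


share_link = "https://drive.google.com/file/d/14QfbZcIInmIaCZ75QdNWHh-DZv0HIsSu/view"
url = drive_direct_url(share_link)
print(url)
```

The resulting `url` has the same form as the one built by hand in the download cell.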


In [None]:
# Alternative: load a local copy instead
# df = pd.read_csv('parkinsons.data')

import gdown

# Google Drive file ID (replace with your own file ID if needed)
file_id = "14QfbZcIInmIaCZ75QdNWHh-DZv0HIsSu"
url = f"https://drive.google.com/uc?id={file_id}"

# Local file name for the downloaded dataset
output = "parkisions.csv"

# Download the file
print("Downloading dataset from Google Drive... ⏳")
gdown.download(url, output, quiet=False)
print("Download complete! ✅")

# Load CSV into DataFrame
df = pd.read_csv(output)




Downloading dataset from Google Drive... ⏳


Downloading...
From: https://drive.google.com/uc?id=14QfbZcIInmIaCZ75QdNWHh-DZv0HIsSu
To: d:\kailas\aiml\AI_ML_projects\Parkison_prediction\Suicide_Detection.csv
100%|██████████| 40.7k/40.7k [00:00<00:00, 431kB/s]


Download complete! ✅


In this step, we prepare our dataset for training the machine learning model:

- **`X`** (features): We drop the **`name`** (non-numeric, irrelevant for prediction) and **`status`** (target variable) columns.
- **`y`** (target): This contains the **`status`** column, which indicates whether a person has Parkinson's disease (1) or not (0).

---

###  **Train-Test Split**
We split the dataset into **training** and **testing** sets using `train_test_split()`:

- `test_size=0.2`: Reserves **20%** of the data for testing.
- `random_state=42`: Ensures the results are **reproducible** by setting a seed.

>  This split helps us evaluate how well the model generalizes to unseen data.
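
As an optional variation, `train_test_split` also accepts a `stratify` argument that keeps the class ratio the same in both subsets. The sketch below uses synthetic data (not the Parkinson's dataset) with an assumed 75/25 class balance, purely to show the effect. The notebook's own split does not use `stratify`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 100 rows, 75 positive and 25 negative.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([1] * 75 + [0] * 25)

# stratify=y_demo keeps the 75/25 class ratio in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)    # (80, 3) (20, 3)
print(y_tr.mean(), y_te.mean())  # 0.75 0.75
```

Whether stratification matters here depends on how balanced the `status` column is, which this notebook does not inspect.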


In [None]:
# Features and target
X = df.drop(['name', 'status'], axis=1)
y = df['status']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

We use **`StandardScaler`** to standardize the feature values:

- Standardization transforms the data so that it has a **mean of 0** and a **standard deviation of 1**.
- This step is **especially important for SVMs**, which are sensitive to the scale of input features.

---

###  Steps:
- `scaler.fit_transform(X_train)`: **Fits** the scaler on the training data and then **transforms** it.
- `scaler.transform(X_test)`: Transforms the test data using the same scaler to ensure **consistency**.
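
To make the fit/transform distinction concrete, here is a minimal NumPy sketch of the same arithmetic on made-up numbers. The arrays and variable names are illustrative only. The key point is that the mean and standard deviation come from the training data and are then reused for the test data.

```python
import numpy as np

# Tiny made-up feature matrices (2 features), for illustration only.
X_train_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test_demo = np.array([[2.0, 40.0]])

# "fit": learn per-feature mean and standard deviation from training data only.
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": apply the training statistics to both sets.
X_train_demo_scaled = (X_train_demo - mean) / std
X_test_demo_scaled = (X_test_demo - mean) / std

print(X_train_demo_scaled.mean(axis=0))  # approximately [0, 0]
print(X_train_demo_scaled.std(axis=0))   # approximately [1, 1]
print(X_test_demo_scaled)                # test values need not fall near 0
```

Computing the statistics on the test set instead would leak information about the test data into preprocessing.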



In [None]:
# Standardization
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

We train a **Support Vector Machine (SVM)** classifier using a **linear kernel**, which works well when the classes are linearly separable.

---

###  **Model Training**
- `SVC(kernel='linear')`: Initializes an SVM with a **linear decision boundary**.
- `model.fit(...)`: Trains the SVM using the **scaled training data**.

---

###  **Model Evaluation**
- `model.predict(...)`: Generates predictions on the **scaled test set**.
- `accuracy_score(...)`: Compares predictions to true labels to calculate the **accuracy** of the model.

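>  A high accuracy score on the held-out test set suggests that the model generalizes well, although accuracy alone can be misleading when one class is much more common than the other.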
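
Accuracy is a single number and can hide per-class behavior. The sketch below computes accuracy alongside sensitivity and specificity for made-up labels. `binary_metrics` is a hypothetical helper, and the example assumes both classes appear in `y_true` (otherwise a division by zero occurs).

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity (recall for class 1) and specificity (recall for class 0)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy": float((tp + tn) / len(y_true)),
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
    }


# Made-up labels: 8 positive, 2 negative; the "model" always predicts 1.
y_true_demo = [1] * 8 + [0] * 2
y_pred_demo = [1] * 10
metrics = binary_metrics(y_true_demo, y_pred_demo)
print(metrics)
```

In this toy case the accuracy is 0.8 even though no negative case is identified correctly, which is why a per-class view can be a useful complement.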


In [None]:
# Train SVM model
model = SVC(kernel='linear')
model.fit(X_train_scaled, y_train)

# Evaluate
y_pred = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"SVM Model Accuracy: {accuracy:.4f}")

SVM Model Accuracy: 0.8718


We use the trained SVM model to make a prediction on **custom input data**. Here's the process:

---

### **Steps:**

1. **Define Input Data**:
   - A sample tuple representing the voice measurements of a patient (replace with actual data as needed).
   
2. **Convert to Numpy Array**:
   - `np.asarray(...).reshape(1, -1)`: Converts the input into the correct 2D shape required by the model.

3. **Scale the Input**:
   - `scaler.transform(...)`: Standardizes the new data using the same scaler fitted on the training data.

4. **Predict**:
   - `model.predict(...)`: Uses the trained SVM model to predict if the person has **Parkinson’s disease (1)** or **not (0)**.

>  Make sure the input has the **same number of features** and **order** as the training data.
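
A small guard can catch a wrong feature count before the data reaches the scaler. The sketch below is one possible way to do this. `to_model_input` and `N_FEATURES` are hypothetical names, and the value 22 is taken from the sample input used in this notebook.

```python
import numpy as np

N_FEATURES = 22  # assumed from the 22-value sample input in this notebook


def to_model_input(values, n_features=N_FEATURES):
    """Convert a flat sequence of measurements to shape (1, n_features)."""
    arr = np.asarray(values, dtype=float)
    if arr.ndim != 1 or arr.shape[0] != n_features:
        raise ValueError(f"Expected {n_features} values, got shape {arr.shape}")
    return arr.reshape(1, -1)


sample = to_model_input(range(22))
print(sample.shape)  # (1, 22)
```

The check covers only the number of values; keeping them in the same order as the training columns remains the caller's responsibility.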


In [None]:

# Input data for prediction (replace with your test input)
input_data = (180.97800,200.12500,155.49500,0.00406,0.00002,0.00220,0.00244,
              0.00659,0.03852,0.33100,0.02107,0.02493,0.02877,0.06321,0.02782,
              16.17600,0.583574,0.727747,-5.657899,0.315903,3.098256,0.200423)

# Convert to numpy, reshape, scale, and predict
input_array = np.asarray(input_data).reshape(1, -1)
input_scaled = scaler.transform(input_array)
prediction = model.predict(input_scaled)




We interpret the model's prediction to display a human-readable output:

- If the predicted value is **1**, it means the patient is **positive for Parkinson's disease**.
- If the predicted value is **0**, the patient is **negative** (i.e., no Parkinson's detected).

---

###  **Final Output**
The result is printed clearly for the user:

```text
Prediction result:
Positive for Parkinson's  → if model predicts 1
Negative for Parkinson's  → if model predicts 0
```


In [None]:
# Output prediction
print("\nPrediction result:")
if prediction[0] == 1:
    print("Positive for Parkinson's")
else:
    print("Negative for Parkinson's")


Prediction result:
Positive for Parkinson's


We use the **`joblib`** library to save the trained **SVM model** and the **StandardScaler** object. This allows us to reuse them later without retraining.

---

###  Files Created:
- **`svc_model.pkl`** → Contains the trained SVM model.
- **`scaler.pkl`** → Contains the fitted StandardScaler used for input preprocessing.

>  Saving models is essential for **deployment** or future **inference** without retraining.
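
Loading the files back is the counterpart of saving them. The sketch below does a full round trip with `joblib.dump` and `joblib.load` on a small synthetic model, written to a temporary directory so that it does not touch the notebook's own files. All `demo_*` names and the synthetic data are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small synthetic problem standing in for the real training data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 4))
y_demo = (X_demo[:, 0] > 0).astype(int)

demo_scaler = StandardScaler().fit(X_demo)
demo_model = SVC(kernel="linear").fit(demo_scaler.transform(X_demo), y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "svc_model.pkl")
    scaler_path = os.path.join(tmp_dir, "scaler.pkl")
    joblib.dump(demo_model, model_path)
    joblib.dump(demo_scaler, scaler_path)

    # Later (for example in another script): load both objects back.
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)

original = demo_model.predict(demo_scaler.transform(X_demo))
restored = loaded_model.predict(loaded_scaler.transform(X_demo))
print(np.array_equal(original, restored))  # True
```

The model and the scaler are saved and loaded as a pair, because the model expects inputs scaled with the statistics it was trained on.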


In [None]:
import joblib


joblib.dump(model, 'svc_model.pkl')
joblib.dump(scaler, 'scaler.pkl')

['scaler.pkl']

In this step, we import the necessary libraries to work with **LangChain** and **Google's Generative AI**:

- **`ChatGoogleGenerativeAI`** *(from langchain_google_genai)*: This class allows us to interact with Google's generative AI services.
- **`PromptTemplate`** *(from langchain.prompts)*: Used for creating structured input prompts to send to the model.
- **`LLMChain`** *(from langchain.chains)*: Combines a prompt template with a language model, so that a single call formats the prompt and sends it to the model.
- **`numpy`**: Provides support for numerical operations (if needed for preprocessing or other tasks).
- **`pickle`**: Allows us to serialize and deserialize Python objects, enabling saving and loading of data structures.

>  This setup is important for integrating advanced AI models into our application using **LangChain**.


In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
import numpy as np
import pickle

In this step, we save the trained model and scaler using the **`pickle`** module. This allows us to load them later for predictions without retraining.

---

###  **Steps:**

1. **Save the Model**:
   - The trained **SVM model** is saved using `pickle.dump()`, with the file name **`svc_model.pkl`**.

2. **Standardization and Saving the Scaler**:
   - We fit and transform the **`X_train`** data using **`StandardScaler`** to standardize the features.
   - The scaler is then saved to a file, **`scaler.pkl`**, ensuring we can use the same scaling method for future predictions.

>  We use `'wb'` (write binary mode) to ensure the data is saved correctly.


In [None]:
# Save the trained model
with open('svc_model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Refit the scaler on the training data (same result as the scaler fitted earlier)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # Fit and transform training data

# Save the scaler to a file
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

In this step, we set up the **Google Gemini API** to enable integration with **LangChain**.

---

###  **Steps:**

1. **Set the API Key**:
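   - We set the **Google API key** as the `GOOGLE_API_KEY` environment variable through **`os.environ`**, which is where the Gemini client looks for it.
   - Replace the key string with your own key, and avoid sharing or committing a notebook that contains a real key.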

2. **Initialize the Gemini Model**:
   - `ChatGoogleGenerativeAI(model="gemini-1.5-flash")`: This initializes the **Gemini 1.5 Flash model**, allowing us to interact with Google's generative AI.

>  The **Gemini API** is now set up and ready for use in the LangChain pipeline.
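
One way to keep the key out of the notebook is to read it from the environment and prompt for it only when it is missing. The sketch below illustrates that approach. `load_api_key` is a hypothetical helper, not part of LangChain or any Google library.

```python
import os
from getpass import getpass


def load_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the key from the environment, prompting once if it is not set."""
    key = os.environ.get(var_name)
    if not key:
        # getpass hides the typed value so it does not appear in cell output.
        key = getpass(f"Enter {var_name}: ")
        os.environ[var_name] = key
    return key
```

Calling `load_api_key()` before creating the model leaves `GOOGLE_API_KEY` set for the rest of the session without storing the key in the notebook itself.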


In [None]:
# Set up Gemini API
import os

# Placeholder: replace with your own key and never commit a real key to a shared notebook
os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY"

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

In this step, we create a **prompt template** to guide the AI in extracting specific voice features from a patient's speech message.

---

###  **Steps:**

1. **Prompt Template**:
   - The template is designed to extract **22 voice features** from the patient's speech message, including measurements like **Fo**, **Jitter**, **Shimmer**, and more.
   - The prompt specifies that the model should return only a **comma-separated list** of numeric values without any explanation or additional text.

2. **Voice Features to Extract**:
   - Fo, Fhi, Flo, Jitter(%), Jitter(Abs), RAP, PPQ, DDP, Shimmer, Shimmer(dB),
   - Shimmer:APQ3, APQ5, APQ, DDA, NHR, HNR, RPDE, DFA, spread1, spread2, D2, PPE.

---

###  **Why Use This Template?**
This structured prompt ensures that the AI focuses on extracting the **exact values** needed, making the process **efficient** and **consistent**.
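
The `{sentence}` placeholder is what turns a fixed instruction into a reusable prompt. The sketch below shows the substitution with plain `str.format` on a shortened, made-up template. In the notebook, `PromptTemplate` performs the comparable substitution, so this is only an illustration of the idea.

```python
demo_template = """Extract the values from the message below.

Patient message:
{sentence}
"""

# str.format replaces {sentence} with the supplied text.
filled = demo_template.format(sentence="My average pitch was about 180 Hz.")
print(filled)
```

Any literal braces in such a template would need to be doubled (`{{` and `}}`) so that they are not treated as placeholders.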


In [None]:
# Prompt template to extract values
template = """
You are an assistant that extracts medical voice measurements from casual patient speech.

From the patient's message, extract the following **22 voice features** in this order:
Fo, Fhi, Flo, Jitter(%), Jitter(Abs), RAP, PPQ, DDP, Shimmer, Shimmer(dB),
Shimmer:APQ3, APQ5, APQ, DDA, NHR, HNR, RPDE, DFA, spread1, spread2, D2, PPE.

Only return a comma-separated list of 22 numeric values, no text explanation.

Patient message:
{sentence}
"""


In this step, we configure the **PromptTemplate** and use it in an **LLMChain** to generate the desired output from the Gemini model.

---

###  **Steps:**

1. **Create a PromptTemplate**:
   - **`PromptTemplate`** is initialized with the input variable **`sentence`**, which represents the patient's speech.
   - The **`template`** is the prompt we defined earlier, which outlines the 22 voice features to extract.

2. **Create an LLMChain**:
   - The **`LLMChain`** combines the **Gemini model** (`llm`) and the **prompt template** to form a complete workflow.
   - The `LLMChain` will take the **input sentence**, process it through the prompt, and return the extracted voice features.

>  The setup is now ready to process any patient message and extract the required voice measurements.


In [None]:
prompt = PromptTemplate(
    input_variables=["sentence"],
    template=template,
)

chain = LLMChain(llm=llm, prompt=prompt)





This function **`predict_from_sentence`** processes a patient's speech input, extracts the required voice features, and predicts whether the person has **Parkinson’s disease**.

---

###  **Steps:**

1. **Input Sentence**:
   - The function takes the patient's speech input as a string (**`sentence`**).

2. **Extract Voice Features**:
   - The **`chain.run(sentence)`** method runs the input sentence through the **LLMChain**, which processes the speech and returns a **comma-separated list of 22 extracted values**.

3. **Validate the Extracted Values**:
   - The extracted values are converted into a **list of floats**.
   - The function checks that exactly **22 values** are extracted; if not, it prints a message and returns without making a prediction.

4. **Standardize Input Data**:
   - The values are reshaped and **standardized** using the previously saved **scaler** to match the format expected by the SVM model.

5. **Predict**:
   - The **`model.predict()`** method returns the predicted class label, which the function reports as **Positive** (1) or **Negative** (0).

---

###  **Error Handling**:
- A `try`/`except` block catches errors raised while parsing the values or making the prediction (for example, non-numeric text in the model's response) and prints them.
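
The parsing and validation steps can also be isolated in a small function, which makes them easy to test without calling the language model. The sketch below is one possible version. `parse_feature_values` is a hypothetical helper that raises `ValueError` rather than printing a message.

```python
def parse_feature_values(response: str, expected: int = 22) -> list:
    """Parse a comma-separated string into floats and check how many there are."""
    parts = [p.strip() for p in response.strip().split(",") if p.strip()]
    values = [float(p) for p in parts]  # raises ValueError on non-numeric text
    if len(values) != expected:
        raise ValueError(f"Expected {expected} values, got {len(values)}")
    return values


values = parse_feature_values("199.23, 209.51, 192.09", expected=3)
print(values)  # [199.23, 209.51, 192.09]
```

With the default `expected=22`, the same function would reject a response that contains too few or too many numbers.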


In [None]:
sentence = input()

# --- Function to process input and predict ---
def predict_from_sentence(sentence):
    response = chain.run(sentence)
    print("\n🔹 Extracted Values:\n", response)

    try:
        values = [float(x.strip()) for x in response.split(',')]
        if len(values) != 22:
            print(" Invalid number of values extracted.")
            return

        input_array = np.asarray(values).reshape(1, -1)
        input_scaled = scaler.transform(input_array)
        prediction = model.predict(input_scaled)

        result = "Positive for Parkinson's" if prediction[0] == 1 else "Negative for Parkinson's"
        print("\n Prediction:", result)

    except Exception as e:
        print(" Error during prediction:", e)

predict_from_sentence(sentence)






🔹 Extracted Values:
 199.23,209.51,192.09,0.00241,0.00001,0.00134,0.00138,0.00402,0.01015,0.089,0.00504,0.00641,0.00762,0.01513,0.00167,30.94,0,0.432,0.742,-7.68,0.173,0.0685

 Prediction: Negative for Parkinson's


