In [None]:
import os
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain


# Set API key (placeholder: supply your own key and never commit a real one)
os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY"

### Step 1: Import Required Libraries

We start by importing the necessary Python libraries:
- `os`: For interacting with the operating system (used here to set environment variables).
- `pandas`: For data handling and analysis.
- `RandomForestClassifier`: A machine learning model from scikit-learn.
- `LabelEncoder`: Used for converting categorical variables into numerical format.
- `ChatGoogleGenerativeAI`: From the `langchain_google_genai` package, for connecting to Google's Gemini models.
- `PromptTemplate`, `LLMChain`: From the `langchain` library, for building prompts and chains.

### Step 2: Set the API Key

We set our Google API key as an environment variable through the `os.environ` mapping.
**Note**: Never share your API key publicly or commit it to version control. It should be kept secret for security reasons.
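
One way to keep the key out of the notebook source is to read it from the environment and prompt for it only when it is missing. The sketch below is a minimal illustration; `load_api_key` is a hypothetical helper name, not part of LangChain or the Google SDK.

```python
import os
from getpass import getpass


def load_api_key(name="GOOGLE_API_KEY", env=None, prompt=getpass):
    """Return the key stored under `name`, prompting only if it is missing.

    `load_api_key` is a hypothetical helper written for this example.
    """
    env = os.environ if env is None else env
    key = env.get(name)
    if not key:
        # Ask interactively so the key never appears in the notebook source.
        key = prompt(f"Enter {name}: ")
        env[name] = key
    return key
```

Calling `load_api_key()` in the first cell would then replace the hard-coded assignment.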


### Step 3: Load the Dataset

We load the customer churn dataset using `pandas.read_csv()`.  
- The dataset file is named **`churn_data.csv`**.
- `df.head()` displays the first five rows of the dataset so we can inspect its structure.


In [None]:
# Load data
df = pd.read_csv("churn_data.csv")
df.head()

,CustomerID,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Churn
0,30000000,Male,55,10,106835.58,4,1,0,29489.59,1
1,30000001,Female,44,0,178.49,1,1,0,98065.66,1
2,30000002,Female,48,1,79542.9,4,0,1,114933.25,1
3,30000003,Male,32,6,28704.57,1,0,0,112406.14,1
4,30000004,Male,28,8,164803.36,1,0,0,102067.5,1
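
Before modelling, it can help to confirm that the file contains the columns the later steps rely on. The sketch below inlines the five rows shown above so that it runs without `churn_data.csv`; `missing_columns` and `EXPECTED_COLUMNS` are hypothetical names introduced for illustration.

```python
import io

import pandas as pd

# The five rows displayed above, inlined so the example runs without the CSV file.
sample_csv = """CustomerID,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Churn
30000000,Male,55,10,106835.58,4,1,0,29489.59,1
30000001,Female,44,0,178.49,1,1,0,98065.66,1
30000002,Female,48,1,79542.9,4,0,1,114933.25,1
30000003,Male,32,6,28704.57,1,0,0,112406.14,1
30000004,Male,28,8,164803.36,1,0,0,102067.5,1
"""

EXPECTED_COLUMNS = [
    "CustomerID", "Gender", "Age", "Tenure", "Balance",
    "NumOfProducts", "HasCrCard", "IsActiveMember", "EstimatedSalary", "Churn",
]


def missing_columns(frame, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `frame`."""
    return [col for col in expected if col not in frame.columns]


sample_df = pd.read_csv(io.StringIO(sample_csv))
print("Missing columns:", missing_columns(sample_df))
print("Missing values:", int(sample_df.isna().sum().sum()))
```

The same two checks can be run on `df` directly once the real file is loaded.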


### Step 4: Preprocess the Data

We perform basic preprocessing steps:

- **Label Encoding**: The `Gender` column is converted from categorical values ('Male', 'Female') to integers using `LabelEncoder`, which assigns codes in sorted order of the labels.  
  - Result: Female → 0, Male → 1.

- **Feature and Target Split**:
  - `X`: Contains all features (input variables) except `CustomerID` and the target column `Churn`.
  - `y`: Contains the target variable `Churn`, which we want to predict.
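
The label-encoding step can be checked on a small list. This is a minimal sketch on made-up values, not on the churn dataset: after fitting, `classes_` holds the labels in sorted order, and each label's position is its integer code.

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(["Male", "Female", "Female", "Male"])

# classes_ holds the labels in sorted order; a label's position is its code.
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)         # {'Female': 0, 'Male': 1}
print(codes.tolist())  # [1, 0, 0, 1]
```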


In [26]:
# Step 4: Preprocess
le = LabelEncoder()
df['Gender'] = le.fit_transform(df['Gender'])  # Female: 0, Male: 1

X = df.drop(['CustomerID', 'Churn'], axis=1)
y = df['Churn']

### Step 5: Train the Machine Learning Model

We use the `RandomForestClassifier` from scikit-learn to train a model that predicts customer churn.

- `RandomForestClassifier()`: An ensemble learning method that builds multiple decision trees and combines their votes for more accurate and stable predictions.
- `model.fit(X, y)`: Fits the model on the full dataset (`X` as features, `y` as target). No hold-out set is kept for evaluation in this notebook.


In [None]:
# Step 5: Train model
model = RandomForestClassifier()
model.fit(X, y)
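
The cell above trains on every row, so it gives no estimate of how well the model generalizes. A common way to get one is to hold out part of the data. The sketch below uses synthetic data from `make_classification` as a stand-in for `X` and `y`, and the 80/20 split and `random_state` values are arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for (X, y): 8 numeric features and a binary target.
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=0)

# Keep 20% of the rows aside; stratify so both splits share the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

demo_model = RandomForestClassifier(n_estimators=100, random_state=0)
demo_model.fit(X_train, y_train)

# Accuracy on rows the model has never seen.
test_accuracy = accuracy_score(y_test, demo_model.predict(X_test))
print(f"Held-out accuracy: {test_accuracy:.2f}")
```

With the real data, `X` and `y` from the preprocessing step would take the place of `X_demo` and `y_demo`.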


### Step 6: Set Up LangChain with Gemini

We initialize a Generative AI model using **LangChain's Google Gemini integration**.

- `ChatGoogleGenerativeAI`: A LangChain wrapper that connects to Google's Gemini models.
- `model="gemini-1.5-flash"`: Specifies which Gemini model to use.
- `temperature=0`: Controls randomness in the output. A lower value like `0` means more deterministic and consistent responses.


In [28]:
# LangChain Gemini Setup
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)

### Step 7: Define the Prompt Template for Gemini

We create a prompt template that tells the Gemini model exactly what information to extract from a user sentence.

**The model should extract and return the following fields as JSON**:
- `Gender`: Male or Female
- `Age`: Numeric
- `Tenure`: Number of years as a customer
- `Balance`: Numeric bank balance
- `NumOfProducts`: Number of products the customer uses
- `HasCrCard`: 1 if the customer has a credit card, 0 otherwise
- `IsActiveMember`: 1 if the customer is active, 0 otherwise
- `EstimatedSalary`: Numeric estimated salary

The sentence that needs to be analyzed will be inserted dynamically using `{sentence}`.


In [29]:
template = """
Extract the following fields from the sentence below and return as JSON with exact keys:
- Gender (Male/Female)
- Age (number)
- Tenure (number of years)
- Balance (number)
- NumOfProducts (number)
- HasCrCard (1 if has credit card, 0 otherwise)
- IsActiveMember (1 if active, 0 otherwise)
- EstimatedSalary (number)

Sentence: {sentence}
"""
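
To see what the model will actually receive, the placeholder substitution can be reproduced with plain `str.format`. This is only a stand-in for `PromptTemplate`, on the assumption that a single simple placeholder is filled the same way. The shortened `demo_template` and the `demo_sentence` are illustrative.

```python
# A shortened copy of the template, for illustration only.
demo_template = """Extract the following fields from the sentence below and return as JSON with exact keys:
- Gender (Male/Female)
- Age (number)

Sentence: {sentence}
"""

demo_sentence = "A 29-year-old female with a balance of 120000."

# str.format replaces the {sentence} placeholder with the actual text.
rendered = demo_template.format(sentence=demo_sentence)
print(rendered)
```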


### Step 8: Create the Prompt and LLM Chain

We now create the connection between the input sentence and the Gemini model using LangChain:

- `PromptTemplate`: 
  - Accepts dynamic input (`sentence`) and formats it using our previously defined prompt template.

- `LLMChain`: 
  - Connects the language model (`llm`) with the prompt to form a chain.
  - This chain will be used to send the formatted prompt to Gemini and get structured output.

**User Input Sentence**:  
We define an example sentence describing a customer's details in natural language. The model will extract features from this sentence.


In [30]:
prompt = PromptTemplate(
    input_variables=["sentence"],
    template=template
)

chain = LLMChain(llm=llm, prompt=prompt)

# User input
sentence = "A 29-year-old female with a balance of 120000, 1 product, has a credit card, is active, earns 80000, and has been with us for 5 years."

### Step 9: Process the Sentence with Gemini and Extract JSON Output

We now use the created `LLMChain` to process the user input sentence and extract the relevant fields as a structured JSON object.

- `chain.run(sentence)`: Fills the `{sentence}` placeholder in the prompt template with `sentence` and sends the resulting prompt to the model.
- The model returns the extracted details (Gender, Age, Tenure, etc.) as text containing JSON, which we print for verification. In the run shown here, Gemini wraps the JSON in a Markdown code fence.

**Extracted JSON**: The model returns the data in the specified format, such as:
```json
{
  "Gender": "Female",
  "Age": 29,
  "Tenure": 5,
  "Balance": 120000,
  "NumOfProducts": 1,
  "HasCrCard": 1,
  "IsActiveMember": 1,
  "EstimatedSalary": 80000
}
```


In [31]:
# Gemini processes the sentence
parsed_output = chain.run(sentence)
print("\nExtracted JSON:\n", parsed_output)


Extracted JSON:
 ```json
{
  "Gender": "Female",
  "Age": 29,
  "Tenure": 5,
  "Balance": 120000,
  "NumOfProducts": 1,
  "HasCrCard": 1,
  "IsActiveMember": 1,
  "EstimatedSalary": 80000
}
```


### Step 10: Extract and Parse JSON from Gemini Response

The Gemini response is plain text and may wrap the JSON in a Markdown code fence, so it cannot always be passed straight to `json.loads()`. We use a regular expression (regex) to pull out the JSON string first:

- `re.search(r"\{.*\}", parsed_output, re.DOTALL)`: Searches `parsed_output` for a span that starts with `{` and ends with `}`. Because `.*` is greedy, the match runs from the first `{` to the last `}`.
  - `re.DOTALL`: Allows `.` to match newline characters, so the regex can capture multi-line JSON.
- If a match is found, we extract it using `.group(0)` and parse it with `json.loads()` into a Python dictionary (`user_data`).
- If no match is found, a `ValueError` is raised. If the matched text is not valid JSON, `json.loads()` raises `json.JSONDecodeError`, which is a subclass of `ValueError`.

### Example Output: 
```json
{
  "Gender": "Female",
  "Age": 29,
  "Tenure": 5,
  "Balance": 120000,
  "NumOfProducts": 1,
  "HasCrCard": 1,
  "IsActiveMember": 1,
  "EstimatedSalary": 80000
}
```


In [32]:
import re
import json

# Try to extract JSON using regex
json_match = re.search(r"\{.*\}", parsed_output, re.DOTALL)
if json_match:
    json_str = json_match.group(0)
    user_data = json.loads(json_str)
else:
    raise ValueError("Could not extract valid JSON from Gemini response.")
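
The same logic can be wrapped in a small function so that both failure cases (no braces found, or braces around malformed JSON) produce a clear error. This is a sketch: `extract_json` is a hypothetical helper name, and the sample response is a shortened version of the output shown earlier.

```python
import json
import re


def extract_json(text):
    """Parse the span from the first '{' to the last '}' in `text` as JSON.

    Raises ValueError if no such span exists or if it is not valid JSON.
    `extract_json` is a hypothetical helper written for this example.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in the response.")
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError as err:
        raise ValueError(f"Response contained malformed JSON: {err}") from err


# A response wrapped in a Markdown code fence, as in the Gemini output above.
response = '```json\n{"Gender": "Female", "Age": 29}\n```'
print(extract_json(response))  # {'Gender': 'Female', 'Age': 29}
```

Because the pattern is greedy, a response containing two separate JSON objects would be captured as one span and fail to parse. For this prompt, which asks for a single object, that case is unlikely.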


### Step 11: Map Extracted Data to Feature List and Predict Churn

We convert the extracted data (`user_data`) into a format suitable for the trained model:

- **Mapping to Feature List**:
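  - `Gender`: Maps "male" (case-insensitive) to 1 and any other value to 0, matching the label encoding used in training (Female → 0, Male → 1).
  - `Age`, `Tenure`, `Balance`, `NumOfProducts`, `HasCrCard`, `IsActiveMember`, and `EstimatedSalary`: Each of these values is cast with `int()`, and the list follows the column order of `X`. Note that `int()` truncates any fractional part, for example in `Balance` or `EstimatedSalary`.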

- **Prediction**:
  - The `model.predict([feature_list])` method is used to predict the churn (1 for "Churn", 0 for "No Churn") based on the feature list.
  - The prediction is displayed as either `"Churn"` or `"No Churn"` based on the model's output.

### Example Output:
```text
Churn Prediction: No Churn
```


In [33]:
# Map to feature list
feature_list = [
    1 if user_data["Gender"].lower() == "male" else 0,
    int(user_data["Age"]),
    int(user_data["Tenure"]),
    int(user_data["Balance"]),
    int(user_data["NumOfProducts"]),
    int(user_data["HasCrCard"]),
    int(user_data["IsActiveMember"]),
    int(user_data["EstimatedSalary"])
]

# Predict
prediction = model.predict([feature_list])[0]
print("\nChurn Prediction:", "Churn" if prediction == 1 else "No Churn")


Churn Prediction: No Churn
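
Since the model output is free text, it can be worth validating the parsed dictionary before building the feature row. The sketch below is one possible approach, not the notebook's method. `FEATURE_ORDER` and `build_features` are hypothetical names, the column order is assumed from the `df.head()` output minus `CustomerID` and `Churn`, and numeric fields are cast with `float` rather than `int` so that decimals are preserved.

```python
# Assumed training column order (df.head() columns minus CustomerID and Churn).
FEATURE_ORDER = [
    "Gender", "Age", "Tenure", "Balance",
    "NumOfProducts", "HasCrCard", "IsActiveMember", "EstimatedSalary",
]

# Same codes as the label encoding used in training.
GENDER_CODES = {"female": 0, "male": 1}


def build_features(user_data):
    """Return a feature row in FEATURE_ORDER, raising ValueError on bad input."""
    missing = [key for key in FEATURE_ORDER if key not in user_data]
    if missing:
        raise ValueError(f"Missing fields: {missing}")

    gender = str(user_data["Gender"]).strip().lower()
    if gender not in GENDER_CODES:
        raise ValueError(f"Unexpected Gender value: {user_data['Gender']!r}")

    row = [GENDER_CODES[gender]]
    # float() keeps decimals that int() would truncate.
    row.extend(float(user_data[key]) for key in FEATURE_ORDER[1:])
    return row


example = {
    "Gender": "Female", "Age": 29, "Tenure": 5, "Balance": 120000,
    "NumOfProducts": 1, "HasCrCard": 1, "IsActiveMember": 1,
    "EstimatedSalary": 80000,
}
print(build_features(example))
# [0, 29.0, 5.0, 120000.0, 1.0, 1.0, 1.0, 80000.0]
```

The resulting row could be passed to `model.predict([row])` in the same way as `feature_list`.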


