# Gemini
The Gemini API gives you access to the latest generative AI models from Google.

#### Prerequisites  
Python 3.9+  
A Jupyter installation to run the notebook.

#### For Better Display of Formatted Text and Code in Jupyter Notebooks

The `to_markdown` function prepares model output for display in a Jupyter notebook. It converts `•` bullets into Markdown list items, prefixes every line outside a fenced code block with `> ` so the text renders as a block quote, and leaves fenced code blocks untouched so they still render as code.

In [1]:
from IPython.display import Markdown

def to_markdown(text):
    """Block-quote prose lines, convert bullets, and leave fenced code untouched."""
    def format_lines(lines):
        formatted_lines = []
        in_code_block = False

        for line in lines:
            if line.startswith('```'):
                in_code_block = not in_code_block
                formatted_lines.append(line)
            elif in_code_block:
                formatted_lines.append(line)
            else:
                if line.startswith('•'):
                    line = '  *' + line[1:]
                formatted_lines.append('> ' + line)
        return formatted_lines

    lines = text.split('\n')
    formatted_lines = format_lines(lines)
    return Markdown('\n'.join(formatted_lines))
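
To see the transformation without a notebook, the same line logic can be run as a plain string function. This is only a sketch: `quote_markdown_lines` is a hypothetical helper that mirrors `to_markdown` but returns a string instead of an IPython `Markdown` object, and the sample text is made up.

```python
FENCE = "`" * 3  # a Markdown code fence, built here to keep the example readable

def quote_markdown_lines(text):
    """Hypothetical string-only twin of to_markdown (no IPython needed)."""
    out = []
    in_code_block = False
    for line in text.split('\n'):
        if line.startswith(FENCE):
            # A fence toggles code mode and is passed through unchanged
            in_code_block = not in_code_block
            out.append(line)
        elif in_code_block:
            # Lines inside a code block are left untouched
            out.append(line)
        else:
            # Prose lines: convert a leading bullet, then block-quote the line
            if line.startswith('•'):
                line = '  *' + line[1:]
            out.append('> ' + line)
    return '\n'.join(out)

sample = "Intro\n• first point\n" + FENCE + "python\nprint('hi')\n" + FENCE
formatted = quote_markdown_lines(sample)
print(formatted)
```

The prose lines come back prefixed with `> `, the bullet becomes a `*` list item, and the fenced block is returned as it was.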

### Setup

The Python SDK for the Gemini API is contained in the `google-generativeai` package. Install it using pip:

In [2]:
# !pip install -q -U google-generativeai

### API KEY 

Before you can use the Gemini API, you must obtain an API key. If you don't already have one, create a key with one click in Google AI Studio.  
[Get an API key](https://aistudio.google.com/app/apikey)

In [3]:
import google.generativeai as genai

In [4]:
# Install python-dotenv if not already installed
!pip install python-dotenv



#### Create a file named .env in your project directory. This file will store your API key.

**Example**: `GOOGLE_API_KEY=Put_the_key_here`

In [5]:
import json

# Load the JSON file that holds the API key (note: this cell reads a JSON file rather than a .env file)
with open("d:/k2analytics/datafile/Google_Gemini_API_Key.json", 'r') as file:
    data = json.load(file)

# Extract value for a specific key
GOOGLE_API_KEY = data['GOOGLE_API_KEY']
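
The cell above reads the key from a JSON file, while the earlier text suggests a `.env` file. One way to support both is to look in the environment first and fall back to the JSON file. The sketch below is an assumption, not part of the original notebook: `load_api_key` is a hypothetical helper, and the JSON file is assumed to use the same `GOOGLE_API_KEY` field as above.

```python
import json
import os
from pathlib import Path

def load_api_key(env_var="GOOGLE_API_KEY", json_path=None):
    """Hypothetical helper: prefer the environment, fall back to a JSON file."""
    key = os.environ.get(env_var)
    if key:
        return key
    if json_path is not None and Path(json_path).is_file():
        with open(json_path, "r", encoding="utf-8") as fh:
            # Assumes the JSON file stores the key under the same name
            return json.load(fh)[env_var]
    raise RuntimeError(f"{env_var} not found in the environment or in {json_path!r}")

# If python-dotenv is installed, calling dotenv.load_dotenv() before
# load_api_key() loads the entries of a local .env file into os.environ.
# Example call (the path is a placeholder):
# GOOGLE_API_KEY = load_api_key(json_path="path/to/key.json")
```

Keeping the key out of the notebook itself means the notebook can be shared without exposing it.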


In [6]:
genai.configure(api_key=GOOGLE_API_KEY)

## Models 

The Gemini API offers different models that are optimized for specific use cases. The available Gemini variants are listed on the following page:

[Models](https://ai.google.dev/gemini-api/docs/models/gemini?authuser=1)

In [7]:
model = genai.GenerativeModel('gemini-1.5-pro')

The `generate_content` method can handle a wide variety of use cases, including multi-turn chat and multimodal input, depending on what the underlying model supports. This notebook uses text and images as input and receives text as output.

In [8]:
response = model.generate_content("What is Prompt Engineering")

In [9]:
to_markdown(response.text)

> ## Prompt Engineering: Crafting Instructions for Intelligent Machines
> 
> Prompt engineering is the art and science of crafting effective prompts to elicit desired responses from artificial intelligence (AI) systems, particularly large language models (LLMs) like ChatGPT. It's like giving clear and specific instructions to a highly skilled but literal-minded assistant.
> 
> **Think of it this way:**
> 
> * **AI:** The powerful but directionless engine.
> * **Prompt:** The steering wheel and instructions that guide the engine.
> * **Output:** The destination reached, shaped by the prompt's quality.
> 
> **Why is Prompt Engineering Important?**
> 
> LLMs are trained on massive datasets, making them capable of impressive feats.  However, they need guidance to understand your specific needs and deliver relevant results. This is where prompt engineering comes in. 
> 
> A well-crafted prompt can:
> 
> * **Improve accuracy and relevance:** Get more precise and tailored outputs.
> * **Unlock hidden capabilities:** Explore creative uses of the AI beyond basic tasks.
> * **Control output format:** Generate text, code, lists, and more in your desired style.
> * **Enhance consistency:** Ensure predictable results across similar prompts.
> 
> **Key Elements of Prompt Engineering:**
> 
> * **Clear Intent:** Define your objective clearly. What do you want the AI to achieve?
> * **Context & Background:** Provide relevant information and examples to guide the AI.
> * **Constraints & Formatting:** Specify desired length, style, format, and tone.
> * **Keywords & Phrases:** Use relevant terms that the AI understands.
> * **Iterative Refinement:** Experiment, analyze results, and tweak prompts for improvement.
> 
> **Examples of Prompt Engineering in Action:**
> 
> * **Basic:** "Write a short story about a cat who goes on an adventure."
> * **Advanced:**  "Write a humorous short story, in the style of Douglas Adams, about a cat named Mittens who travels through time in a malfunctioning washing machine. Limit the story to 500 words."
> 
> **The Future of Prompt Engineering:**
> 
> Prompt engineering is rapidly evolving as AI technology advances. New techniques and tools are emerging, making it easier for users to interact with AI effectively. 
> 
> **In a nutshell, prompt engineering is the key to unlocking the full potential of AI and transforming how we interact with machines.** 
> 

## Prompt Engineering for Linear Models 

Linear models are often used for tasks such as regression.  

**Bad Prompt**  
"Explain linear regression."

**Good Prompt**   
"Can you explain the basic concept of linear regression, including its key assumptions and a simple example, for a beginner in statistics?"

**Bad Prompt Example**   
"Make a linear regression model."  

&emsp; *Why it's Bad:*
* Lacks Detail: The prompt is too vague and does not specify any parameters or details about the data, the target variable, or the features.
* No Context: It doesn’t provide any context about the data or the problem the model is supposed to solve.  
* Ambiguous Outcome: It doesn't specify what is expected as an output (e.g., just the model, performance metrics, or predictions).  


**Good Prompt Example**  
"Create a linear regression model using the provided dataset named as 'housePrice' to predict housing prices. The dataset includes features such as square footage, number of bedrooms, and age of the house. Use 'price' as the target variable. After creating the model, evaluate its performance using R-squared and Mean Absolute Error (MAE) metrics. Provide a summary of the model's coefficients and predictions for a sample of test data."

&emsp; *Why it's Good:*
* Specific Details: Clearly specifies the target variable ('price') and the features (square footage, number of bedrooms, age).
* Context Provided: Gives context about the data (housing prices) and the goal of the model (prediction).
* Expected Outcome: Outlines what is expected (model creation, performance evaluation, summary of coefficients, and sample predictions).
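
The criteria above can be turned into a rough pre-flight check on a prompt. This is an illustration only: `missing_prompt_details` and its keyword checklist are hypothetical, and plain substring matching is a crude stand-in for actually reading the prompt.

```python
def missing_prompt_details(prompt, required_terms):
    """Return the checklist labels whose keywords never appear in the prompt."""
    text = prompt.lower()
    return [
        label
        for label, keywords in required_terms.items()
        if not any(keyword.lower() in text for keyword in keywords)
    ]

# Hypothetical checklist loosely mirroring the criteria above
checklist = {
    "target variable": ["target variable"],
    "features": ["features"],
    "evaluation metrics": ["r-squared", "mae", "metrics"],
}

vague = "Make a linear regression model."
detailed = (
    "Create a linear regression model using the provided dataset named as 'housePrice' "
    "to predict housing prices. The dataset includes features such as square footage, "
    "number of bedrooms, and age of the house. Use 'price' as the target variable. "
    "After creating the model, evaluate its performance using R-squared and "
    "Mean Absolute Error (MAE) metrics."
)

print(missing_prompt_details(vague, checklist))     # every label is reported missing
print(missing_prompt_details(detailed, checklist))  # nothing is reported missing
```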

In [10]:
avg_lr_prompt = "Make a linear regression model."                
good_lr_prompt = """Create a linear regression model using the provided dataset named as 'housePrice' to predict housing prices. 
    The dataset includes features such as square footage, number of bedrooms, and age of the house. 
    Use 'price' as the target variable. 
    After creating the model, evaluate its performance using R-squared and Mean Absolute Error (MAE) metrics. 
    Provide a summary of the model's coefficients and predictions for a sample of test data."""


In [11]:
avg_lr_prompt_response = model.generate_content(avg_lr_prompt)
to_markdown(avg_lr_prompt_response.text)  

> I can't make a linear regression model without data! To create a model, I need information about the variables you want to explore. 
> 
> **Here's what I need from you:**
> 
> 1. **Dataset:** A table of data with at least two columns. 
>    * One column should represent the **independent variable (X)**. This is the variable you believe might influence or predict the other variable.
>    * The other column (or columns) should represent the **dependent variable (Y)**. This is the variable you want to predict or understand.
> 
> 2. **Context:** Tell me about your data. 
>    * What do the variables represent? (e.g., age, income, temperature, etc.)
>    * What are you trying to predict or understand? 
>    * Are there any specific relationships you expect to see?
> 
> **Once you provide this information, I can:**
> 
> * **Build a linear regression model:** I can find the line of best fit through your data, represented by an equation in the form of `Y = mX + b`.
> * **Interpret the results:** I can tell you how strong the relationship is between your variables, how well the model fits the data, and what the coefficients (m and b) mean.
> * **Make predictions:** You can give me new values for the independent variable, and I can use the model to predict the corresponding values for the dependent variable.
> 
> **Example:**
> 
> Let's say you want to explore the relationship between the number of hours studied and exam scores. 
> 
> * **Dataset:** You could provide a table with "Hours Studied" as the independent variable and "Exam Score" as the dependent variable.
> * **Context:** You're trying to understand if studying more hours leads to higher exam scores.
> 
> I can then build a linear regression model to analyze this relationship.
> 
> **Please provide your data and context so I can help you create your linear regression model!** 
> 

In [12]:
good_lr_prompt_response = model.generate_content(good_lr_prompt) 
to_markdown(good_lr_prompt_response.text)  

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error

# **1. Load and Prepare the Data**

# Assuming 'housePrice' is a CSV file, replace 'your_file_path.csv'
house_data = pd.read_csv('your_file_path.csv')

# **2. Feature Selection and Data Splitting**
features = ['square_footage', 'bedrooms', 'age'] 
X = house_data[features]
y = house_data['price']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) 

# **3. Model Building**
model = LinearRegression()
model.fit(X_train, y_train)

# **4. Model Evaluation**
predictions = model.predict(X_test)

r2 = r2_score(y_test, predictions)
mae = mean_absolute_error(y_test, predictions)

print(f"R-squared: {r2:.2f}")
print(f"Mean Absolute Error: {mae:.2f}")

# **5. Model Coefficients Interpretation**
coefficients = pd.DataFrame({'Feature': features, 'Coefficient': model.coef_})
print("\nModel Coefficients:\n", coefficients)

# **6. Predictions on Sample Test Data**
sample_data = pd.DataFrame({'square_footage': [1500, 2000], 
                         'bedrooms': [3, 4], 
                         'age': [10, 25]})

sample_predictions = model.predict(sample_data)
print("\nPredictions for Sample Data:\n", sample_predictions)
```
> 
> **Explanation:**
> 
> 1. **Load and Prepare Data:** 
>    - Replace 'your_file_path.csv' with the actual path to your 'housePrice' dataset.
>    - The code assumes your dataset has columns named 'square_footage', 'bedrooms', 'age', and 'price'. Adjust column names if needed.
> 
> 2. **Feature Selection and Data Splitting:**
>    - We select the relevant features for prediction and store them in 'X'. 
>    - The target variable, 'price', is stored in 'y'.
>    - We use `train_test_split` to divide the data into training (80%) and testing (20%) sets.
> 
> 3. **Model Building:**
>    - A `LinearRegression` model is created and fitted (trained) on the training data.
> 
> 4. **Model Evaluation:**
>    - We predict house prices for the test data and calculate the R-squared and Mean Absolute Error (MAE) to assess model performance.
>      - **R-squared:** Measures the proportion of variation in the target variable explained by the model (higher is better).
>      - **MAE:**  Represents the average absolute difference between predicted and actual prices (lower is better).
> 
> 5. **Model Coefficients Interpretation:**
>    - The code displays the coefficients of the linear equation learned by the model. Each coefficient indicates the estimated change in house price for a one-unit change in the corresponding feature, holding other features constant.
> 
> 6. **Predictions on Sample Test Data:**
>    - We create a small 'sample_data' DataFrame to demonstrate how to make predictions on new data using the trained model.
> 
> **Output:**
> 
> - The output will show you the R-squared, MAE, the model's coefficients, and the predicted prices for the sample data.
> 
> **Key Points:**
> 
> - This code provides a basic framework. You might need to handle missing values, explore feature engineering, or try different regression models to improve accuracy depending on your dataset. 
> - Visualizations (e.g., scatter plots, residual plots) are also helpful for understanding model performance and identifying patterns in your data. 
> 
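
The two metrics requested in the prompt can also be computed by hand, which helps when reading the numbers the generated code prints. A minimal pure-Python sketch follows; the function names and the price values are made up for illustration and are not taken from any dataset.

```python
def r_squared(actual, predicted):
    """1 - SS_res / SS_tot: share of the target's variation explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def mean_abs_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up house prices (in thousands), purely for illustration
actual = [200, 250, 300, 350]
predicted = [210, 240, 310, 340]

r2 = r_squared(actual, predicted)
mae = mean_abs_error(actual, predicted)
print(f"R-squared: {r2:.3f}")            # R-squared: 0.968
print(f"Mean Absolute Error: {mae:.1f}") # Mean Absolute Error: 10.0
```

Here every prediction is off by 10, so the MAE is 10, and the squared errors (400 in total) are small relative to the spread of the actual prices (12500), which gives an R-squared close to 1.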

## Build Effective Prompts for Tree-Based Models 

Tree-based models include decision trees, random forests, and gradient boosting machines.

**Bad Prompt Example**   
"Create a tree-based model"

**Good Prompt Example**   
"Create a Random Forest classifier using the provided dataset to predict customer churn. The dataset includes features such as customer tenure, monthly charges, and contract type. Use 'churn' as the target variable. After creating the model, evaluate its performance using accuracy, precision, recall, and F1-score metrics. Additionally, provide the feature importance rankings and visualize the first decision tree in the forest."

In [13]:
prompt1 = "Create a tree-based model."              
prompt2 = """Create a Random Forest classifier using the provided dataset named telecom_churn to predict customer churn. 
    The dataset has the following features: customer_tenure, monthly_charges, and contract_type.
    Split the data in Train - Test proportion of 70:30
    Use K-Fold Cross-Validation factor of 10 while building the model and AUC as model performance measure for cross-validation.
    Use 'churn' as the target variable. 
    After creating the model, evaluate its performance using accuracy, precision, recall, and F1-score metrics. 
    Additionally, provide the feature importance rankings and visualize the first decision tree in the forest."""


In [15]:
treeBasedModel_Response = model.generate_content(prompt1) 
to_markdown(treeBasedModel_Response.text)

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a decision tree classifier
model = DecisionTreeClassifier()

# Train the model on the training data
model.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
```
> 
> **Explanation:**
> 
> 1. **Import necessary libraries:**
>    - `load_iris`: Loads the iris dataset.
>    - `DecisionTreeClassifier`: Imports the Decision Tree classifier.
>    - `train_test_split`: Splits data into training and testing sets.
>    - `accuracy_score`: Calculates the accuracy of the model.
> 
> 2. **Load and prepare data:**
>    - `load_iris()`: Loads the iris dataset.
>    - `X = iris.data`: Assigns features to `X`.
>    - `y = iris.target`: Assigns target variable to `y`.
>    - `train_test_split(X, y, test_size=0.2, random_state=42)`: Splits data into 80% training and 20% testing sets with a fixed random state for reproducibility.
> 
> 3. **Create and train the model:**
>    - `model = DecisionTreeClassifier()`: Creates a Decision Tree classifier object.
>    - `model.fit(X_train, y_train)`: Trains the model using the training features and labels.
> 
> 4. **Make predictions:**
>    - `y_pred = model.predict(X_test)`: Predicts the target variable for the test features.
> 
> 5. **Evaluate the model:**
>    - `accuracy = accuracy_score(y_test, y_pred)`: Calculates the accuracy of the model by comparing predicted and actual labels.
>    - `print(f"Accuracy: {accuracy}")`: Prints the accuracy score.
> 
> This code demonstrates a simple implementation of a Decision Tree model for classification. You can modify the code by adjusting parameters like `criterion`, `max_depth`, and others to tune the model's performance. 
> 

In [16]:
treeBasedModel_Response = model.generate_content(prompt2) 
to_markdown(treeBasedModel_Response.text)

```python
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
from sklearn.model_selection import KFold
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# **1. Load and Prepare the Data**

# Assuming 'telecom_churn.csv' is your dataset file
data = pd.read_csv('telecom_churn.csv') 

# **2. Data Preprocessing**

# One-hot encode categorical features (if any)
data = pd.get_dummies(data, columns=['contract_type'], drop_first=True)

# **3. Split Data into Features and Target**
X = data.drop('churn', axis=1)
y = data['churn']

# **4. Split Data into Training and Testing Sets**
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# **5. Create a Random Forest Classifier**
rf_classifier = RandomForestClassifier(random_state=42)

# **6. K-Fold Cross-Validation**
kfold = KFold(n_splits=10, shuffle=True, random_state=42)
auc_scores = cross_val_score(rf_classifier, X_train, y_train, cv=kfold, scoring='roc_auc')

print("Cross-Validation AUC Scores:", auc_scores)
print("Mean AUC Score:", auc_scores.mean())

# **7. Train the Model**
rf_classifier.fit(X_train, y_train)

# **8. Make Predictions**
y_pred = rf_classifier.predict(X_test)

# **9. Evaluate the Model**
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("\nModel Evaluation Metrics:")
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-Score:", f1)

# **10. Feature Importance**
importances = rf_classifier.feature_importances_
feature_names = X.columns

# Create a DataFrame for feature importance
feature_importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': importances})
feature_importance_df = feature_importance_df.sort_values('Importance', ascending=False)

print("\nFeature Importance Rankings:")
print(feature_importance_df)

# **11. Visualize the First Decision Tree (Optional)**
plt.figure(figsize=(20, 10))
plot_tree(rf_classifier.estimators_[0], feature_names=feature_names, filled=True, rounded=True)
plt.show()

```
> 
> **Explanation:**
> 
> 1. **Load and Prepare Data:** Load your 'telecom_churn.csv' dataset.
> 2. **Data Preprocessing:** Handle categorical features using one-hot encoding.
> 3. **Split Data:** Split into features (X) and the target variable (y), then into training and testing sets.
> 4. **Create Classifier:** Initialize a RandomForestClassifier.
> 5. **K-Fold Cross-Validation:** Perform 10-fold cross-validation using AUC as the evaluation metric.
> 6. **Train Model:** Train the Random Forest model on the training data.
> 7. **Make Predictions:** Use the trained model to predict churn on the test data.
> 8. **Evaluate Model:** Calculate accuracy, precision, recall, and F1-score.
> 9. **Feature Importance:** Get feature importance scores and display them.
> 10. **Visualize Decision Tree (Optional):** Visualize the first decision tree in the forest (can be computationally expensive for large forests).
> 
> **Remember to replace 'telecom_churn.csv' with the actual file name of your dataset.** 
> 
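
The four evaluation metrics named in the prompt all derive from the counts of correct and incorrect predictions. The sketch below computes them in plain Python for a binary target; `classification_metrics` is a hypothetical helper and the labels are invented for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary target."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Invented churn labels (1 = churned), purely for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

scores = classification_metrics(y_true, y_pred)
print(scores)
```

In this example there are 3 true positives, 1 false positive, and 1 false negative, so all four metrics come out to 0.75.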

In [18]:
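def generate_modeling_prompt(model_type, task, dataset_description, features, target_variable, metrics, additional_requirements):
    """Assemble a detailed modeling prompt from its parts (model, task, data, target, metrics, extras)."""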
    prompt = (
        f"Create a {model_type} using the provided dataset to {task}. "
        f"{dataset_description}. "
        f"The dataset includes features such as {', '.join(features)}. "
        f"Use '{target_variable}' as the target variable. "
        f"After creating the model, evaluate its performance using {', '.join(metrics)} metrics. "
        f"{additional_requirements}"
    )
    return prompt

# Example usage
model_type = "Random Forest classifier"
task = "predict customer churn"
dataset_description = "The name of the dataset is df"
features = ["customer tenure", "monthly charges", "contract type"]
target_variable = "churn"
metrics = ["accuracy", "precision", "recall", "F1-score"]
additional_requirements = (
    "Additionally, provide the feature importance rankings and visualize the first decision tree in the forest."
)

TreeBasedPrompt = generate_modeling_prompt(model_type, task, dataset_description, features, target_variable, metrics, additional_requirements)
print(TreeBasedPrompt)

Create a Random Forest classifier using the provided dataset to predict customer churn. The name of the dataset is df. The dataset includes features such as customer tenure, monthly charges, contract type. Use 'churn' as the target variable. After creating the model, evaluate its performance using accuracy, precision, recall, F1-score metrics. Additionally, provide the feature importance rankings and visualize the first decision tree in the forest.
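
The same template can be reused for other model families by changing only the arguments. As a sketch, the snippet below repeats the function so it runs on its own and fills it with the linear regression details from earlier in the notebook; the wording of `dataset_description` is an assumption chosen to match the template.

```python
def generate_modeling_prompt(model_type, task, dataset_description, features,
                             target_variable, metrics, additional_requirements):
    # Same template as the cell above, repeated so this snippet is self-contained
    return (
        f"Create a {model_type} using the provided dataset to {task}. "
        f"{dataset_description}. "
        f"The dataset includes features such as {', '.join(features)}. "
        f"Use '{target_variable}' as the target variable. "
        f"After creating the model, evaluate its performance using {', '.join(metrics)} metrics. "
        f"{additional_requirements}"
    )

lr_prompt = generate_modeling_prompt(
    model_type="linear regression model",
    task="predict housing prices",
    dataset_description="The name of the dataset is housePrice",
    features=["square footage", "number of bedrooms", "age of the house"],
    target_variable="price",
    metrics=["R-squared", "Mean Absolute Error (MAE)"],
    additional_requirements=(
        "Provide a summary of the model's coefficients and predictions for a sample of test data."
    ),
)
print(lr_prompt)
```

Using keyword arguments keeps each part of the prompt labeled at the call site, which makes it harder to swap two arguments by accident.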


In [None]:
TBM_response = model.generate_content(TreeBasedPrompt)
to_markdown(TBM_response.text)

## Use Generative AI to Describe an Image

In [None]:
import PIL.Image

In [None]:
!curl -o image.jpg "https://t0.gstatic.com/licensed-image?q=tbn:ANd9GcQ_Kevbk21QBRy-PgB4kQpS79brbmmEG7m3VOTShAn4PecDU5H5UxrJxE3Dw1JiaG17V88QIol19-3TM2wCHw"
img1 = PIL.Image.open('image.jpg')
img1

In [None]:
response = model.generate_content(img1)
to_markdown(response.text)

In [None]:
!curl -o image.jpg https://images.pexels.com/photos/414612/pexels-photo-414612.jpeg
img2 = PIL.Image.open('image.jpg')
img2

In [None]:
response = model.generate_content(img2)
to_markdown(response.text)

In [None]:
!curl -o image.jpg https://cdn.pixabay.com/photo/2015/04/23/22/00/tree-736885_960_720.jpg
img3 = PIL.Image.open('image.jpg')
img3

In [None]:
response = model.generate_content(img3)
to_markdown(response.text)

### Thank you

##### If you are using LangChain with OpenAI, you may need the following installations:
- pip install langchain
- pip install openai