# **Introduction to Google AI**

Gemini is Google's family of multimodal large language models. It launched in December 2023 as the successor to LaMDA and PaLM.

You can try it through the Gemini chatbot, formerly known as Bard. Google announced Bard in February 2023 as a generative AI chatbot powered by LaMDA. The chatbot later switched to the PaLM model before finally moving to the Gemini model.

Here are a few reasons why you might choose Gemini:
- **Context Window:** In May 2024, Google announced a context window of 2 million tokens for Gemini 1.5 Pro. To put that in perspective, 2 million tokens can hold roughly 2 hours of video, 22 hours of audio, 60K lines of code, or 1.4 million words of text.
- **Multimodal Capabilities:** Works with text, images, and videos.
- **Variety of options:** Available in several variants: Ultra, Pro, Flash, and Nano.
- **Generous free offerings:** Offers a free-to-use option.

**Important Links:**
1. [Gemini Chatbot](https://gemini.google.com/app)

**Dependencies:**
```python
! pip install google-generativeai
```


## **Importing Google Gemini AI**

In [3]:
import google.generativeai as genai

## **Setting the API Key**

In [4]:
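# Read the key with a context manager so the file is closed afterwards
with open("keys/.gemini.txt") as f:
    key = f.read().strip()  # strip() drops a trailing newline, if any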

genai.configure(api_key=key)
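
Reading the key from a file works, but an environment variable keeps it out of the project folder. The sketch below is one possible helper and is not part of the `google-generativeai` library: the function name `load_api_key` and the variable name `GEMINI_API_KEY` are hypothetical, chosen only for illustration. It prefers the environment variable and falls back to the key file used above.

```python
import os
from pathlib import Path


def load_api_key(env_var="GEMINI_API_KEY", key_file="keys/.gemini.txt"):
    """Return the API key from an environment variable, else from a file.

    The helper and the default variable name are illustrative assumptions.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    path = Path(key_file)
    if path.is_file():
        # strip() removes the trailing newline most editors add
        return path.read_text().strip()
    raise RuntimeError(f"No API key found in ${env_var} or {key_file}")


# Usage (assumes `genai` was imported as shown earlier):
# genai.configure(api_key=load_api_key())
```

Raising an error when neither source exists makes a missing key fail early, instead of surfacing later as an authentication error from the API.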

## **Available Models**

In [5]:
for m in genai.list_models():
    print(m.name)

models/chat-bison-001
models/text-bison-001
models/embedding-gecko-001
models/gemini-1.0-pro-latest
models/gemini-1.0-pro
models/gemini-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash
models/embedding-001
models/text-embedding-004
models/aqa
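
Not every model in this list can be used for text generation; some are embedding models, for example. A minimal sketch of filtering the list is shown below, assuming each model object exposes a `supported_generation_methods` list as the SDK's model objects do. The helper name `names_supporting` is hypothetical, and the stand-in objects (including their method lists) are illustrative values so the snippet runs without an API key.

```python
from types import SimpleNamespace


def names_supporting(models, method="generateContent"):
    """Return the names of models whose supported methods include `method`."""
    return [m.name for m in models if method in m.supported_generation_methods]


# Stand-in objects mimicking the shape of the entries from genai.list_models().
demo_models = [
    SimpleNamespace(name="models/gemini-1.5-flash",
                    supported_generation_methods=["generateContent", "countTokens"]),
    SimpleNamespace(name="models/embedding-001",
                    supported_generation_methods=["embedContent"]),
]

print(names_supporting(demo_models))

# With the real API (requires a configured key):
# print(names_supporting(genai.list_models()))
```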


## **Prompting the Gemini Model**

In [6]:
model = genai.GenerativeModel(model_name="gemini-1.5-flash")

user_prompt = """Complete the following:
                In our solar system, Earth is a """

response = model.generate_content(user_prompt)

print(response.text)

In our solar system, Earth is a **planet**. 



In [7]:
user_prompt = """Generate some factual information to complete the following in 2-3 lines:
                In our solar system, Earth is a """

response = model.generate_content(user_prompt)

print(response.text)

In our solar system, Earth is a **rocky planet**, the **third planet from the Sun**, and the **only known planet to harbor life.** 



## **Adding a System Prompt**

**Important Note:** A system prompt can be specified with the `system_instruction` argument. `system_instruction` is not enabled for `models/gemini-pro`.

In [8]:
model = genai.GenerativeModel(model_name="models/gemini-1.5-pro-latest", 
                              system_instruction="""Generate some factual information to complete the user input. 
                              Completion must have maximum 2-3 lines.""")

user_prompt = """In our solar system, Earth is a """

response = model.generate_content(user_prompt)

print(response.text)

In our solar system, Earth is a **rocky planet with a dynamic surface**, home to a vast diversity of life. 



## **Important Parameters**

If you run the above code a few times, you will notice that the output changes across runs. Generative models are **non-deterministic**: because each next token is sampled from a probability distribution, the same input can produce different outputs. This allows for creativity and diversity in the generated outputs, which is useful when generating different creative styles. Parameters such as `temperature`, `top_p`, and `top_k` help control this behavior.

- **candidate_count:** The number of responses generated for a single prompt. The default value is 1. Increasing it generates more text responses and increases resource usage.
- **stop_sequences:** A list of strings that act as stop signs for the model; generation stops when one of them is produced.
- **max_output_tokens:** The maximum number of tokens the model will generate in the response.
- **temperature:** A control knob that influences the randomness of the model's output. A higher value results in more varied and creative responses; a lower value gives more predictable results.
- **top_p:** Ranges over [0.0, 1.0]. This is also known as **nucleus sampling**. The model considers only the smallest set of most probable next tokens whose cumulative probability reaches or exceeds the `top_p` value. A higher value creates a looser threshold, letting the model consider a wider range of options while still prioritizing the most likely ones. A lower `top_p` value creates a stricter threshold, leading to less diverse and more predictable outputs.
- **top_k:** Limits the candidate next tokens to the `k` most probable options in the probability distribution. A lower `k` value restricts the selection to a smaller pool of the most likely tokens, leading to less diverse and more predictable outputs.

Both `top_p` and `top_k` work in conjunction with the `temperature` parameter.
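
To make the interaction between these three parameters concrete, here is a small pure-Python sketch on a toy next-token distribution. The logits are made-up numbers and `next_token_distribution` is a hypothetical helper. The order of the steps (temperature, then top-k, then top-p) is one common convention and is not necessarily how Gemini applies them internally.

```python
import math


def next_token_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return {token: probability} after temperature, top-k and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    # Temperature: divide logits before the softmax; lower values sharpen it.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exp = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exp.values())
    ranked = sorted(((tok, e / total) for tok, e in exp.items()),
                    key=lambda pair: pair[1], reverse=True)
    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept
    # Renormalise so the surviving probabilities sum to 1.
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}


logits = {"planet": 3.0, "rock": 2.0, "star": 1.0, "moon": 0.0}  # made-up scores

print(next_token_distribution(logits))                   # all four tokens remain
print(next_token_distribution(logits, temperature=0.1))  # nearly all mass on "planet"
print(next_token_distribution(logits, top_k=2))          # only "planet" and "rock"
print(next_token_distribution(logits, top_p=0.5))        # only "planet"
```

A token would then be drawn at random from the returned distribution. The fewer tokens survive the filters, and the sharper the distribution, the more predictable the output becomes.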

In [14]:
model = genai.GenerativeModel("gemini-1.5-flash")

# Setting our parameters
custom_config = genai.types.GenerationConfig(max_output_tokens=256, temperature=1.0)

user_prompt = """What is feature selection in data science? Explain in detail."""

# Passing our custom parameters to the generate_content method
response = model.generate_content(user_prompt, generation_config=custom_config)

print(response.text)

## Feature Selection in Data Science: A Comprehensive Explanation

Feature selection is a crucial step in data science, particularly in machine learning. It involves **identifying and choosing the most relevant features (variables) from a dataset to use in model building**.  This process is essential for several reasons:

**Benefits of Feature Selection:**

* **Improved Model Performance:**  By removing irrelevant or redundant features, we reduce noise and complexity in the dataset. This leads to more accurate and efficient models.
* **Reduced Overfitting:**  Overfitting occurs when a model learns the training data too well and performs poorly on unseen data. Feature selection helps prevent overfitting by focusing on the most informative features.
* **Simplified Models:**  Fewer features mean simpler models that are easier to understand, interpret, and deploy.
* **Faster Training and Inference:**  With fewer features, models require less computational resources for training and making 
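
The output above breaks off mid-sentence, which is what you would expect when generation reaches the `max_output_tokens=256` limit. One way to check for this programmatically is to inspect the finish reason of the first candidate. The sketch below assumes the response exposes `candidates[0].finish_reason` with a `name` of `MAX_TOKENS` when the limit is hit. The helper `hit_token_limit` is hypothetical, and the stand-in objects only mimic the response shape so the snippet runs offline.

```python
from types import SimpleNamespace


def hit_token_limit(response):
    """Return True if the first candidate stopped because of the token limit."""
    reason = response.candidates[0].finish_reason
    # Assumed to be an enum with a .name; fall back to str() for plain values.
    name = getattr(reason, "name", str(reason))
    return name == "MAX_TOKENS"


# Stand-in responses (not real API objects).
truncated = SimpleNamespace(
    candidates=[SimpleNamespace(finish_reason=SimpleNamespace(name="MAX_TOKENS"))])
complete = SimpleNamespace(
    candidates=[SimpleNamespace(finish_reason=SimpleNamespace(name="STOP"))])

print(hit_token_limit(truncated))  # True
print(hit_token_limit(complete))   # False

# With a real response from model.generate_content(...):
# if hit_token_limit(response): consider raising max_output_tokens
```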

In [16]:
# Setting our parameters
custom_config = genai.types.GenerationConfig(temperature=0.1, top_p=0.1, top_k=32)

user_prompt = """What is feature selection in data science? Explain in detail."""

# Passing our custom parameters to the generate_content method
response = model.generate_content(user_prompt, generation_config=custom_config)

print(response.text)

## Feature Selection: The Art of Choosing the Right Variables

In data science, feature selection is a crucial process that involves **identifying and selecting the most relevant features (variables) from a dataset** for building a predictive model. It's like choosing the right ingredients for a recipe – the wrong ones can ruin the dish, while the right ones create a masterpiece.

**Why is Feature Selection Important?**

* **Improved Model Performance:** Irrelevant or redundant features can introduce noise and complexity, hindering model accuracy and generalization. Selecting the right features can lead to simpler, more interpretable, and more accurate models.
* **Reduced Overfitting:** Overfitting occurs when a model learns the training data too well, failing to generalize to unseen data. Feature selection helps prevent this by reducing the number of features, thus reducing the model's complexity.
* **Faster Training and Deployment:** Fewer features mean less data to process, leading 