# Automating Market Research with Gemini 1.5, Search, and Google Sheets


### Install the Google Generative AI Library
This cell installs the necessary Python package for interacting with Google's Gemini models. The `--upgrade` flag ensures that we have the latest version.

In [None]:
!pip install --upgrade google-generativeai

### Import necessary libraries and configure API key

This cell imports the modules needed to work with JSON, the Google Generative AI library, and Colab user data. It then retrieves the Gemini API key stored in the Colab environment and configures the `genai` library to use it.

**Please make sure you have stored your API key in the `GEMINI_API_KEY2` user data field in Colab.** You can do this via the Colab interface, or programmatically.

In [1]:
import json
import google.generativeai as genai
from google.ai.generativelanguage_v1beta.types import content
from google.colab import userdata
# Read the API key from Colab user data once and reuse it.
api_key = userdata.get('GEMINI_API_KEY2')

genai.configure(api_key=api_key)
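
If the notebook is also run outside Colab, `google.colab` cannot be imported. The following is a minimal sketch of a fallback, assuming the key is then exported as an environment variable with the same name; the helper `load_gemini_api_key` is hypothetical and not part of any library.

```python
import os


def load_gemini_api_key(name="GEMINI_API_KEY2"):
    """Return the API key from Colab user data, else from an environment variable.

    Hypothetical helper: the environment-variable fallback is an assumption,
    not something the original notebook does.
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
        key = userdata.get(name)
    except ImportError:
        key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"No API key found under the name {name!r}.")
    return key
```

The result could then be passed to `genai.configure(api_key=load_gemini_api_key())` in place of the direct `userdata.get` call.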

### Setting Generation Parameters

This code block defines the parameters that control the model's text generation: the randomness of the output (`temperature`), how the pool of candidate next tokens is restricted (`top_k` keeps only the k most likely tokens, and `top_p` keeps the smallest set of tokens whose cumulative probability reaches p), the maximum length of the generated text (`max_output_tokens`), and the format of the response (`response_mime_type`).

In [None]:
generation_config = {
  "temperature": 1,
  "top_p": 0.95,
  "top_k": 40,
  "max_output_tokens": 8192,
  "response_mime_type": "text/plain",
}
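
To build intuition for how `temperature`, `top_k`, and `top_p` interact, here is a toy sketch of the usual textbook description of these filters. It is an illustration only and makes no claim about how Gemini implements sampling internally; the function `candidate_pool` and the example logits are made up.

```python
import math


def candidate_pool(logits, temperature=1.0, top_k=40, top_p=0.95):
    """Return (token, probability) pairs left after top-k and then top-p filtering.

    Illustrative only; temperature must be greater than zero.
    """
    # Temperature rescales the logits: lower values sharpen the distribution.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, w / total) for tok, w in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # top_k: keep only the k most likely tokens.
    ranked = ranked[:top_k]
    # top_p: keep the smallest prefix whose cumulative probability reaches p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(prob for _, prob in kept)
    return [(tok, prob / norm) for tok, prob in kept]


toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}
pool = candidate_pool(toy_logits, temperature=1.0, top_k=3, top_p=0.8)
print([tok for tok, _ in pool])
```

With these toy values, `top_k=3` discards the least likely token and `top_p=0.8` then stops once the two most likely tokens together pass 80% of the probability mass.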

### Initializing the Gemini model with Google Search retrieval

This code initializes a Gemini generative model. We specify the model name (`gemini-1.5-flash`) and pass the previously defined generation configuration. Crucially, we also equip the model with a Google Search retrieval tool. This allows the model to access and use real-time information from the web when generating its response.

### Making the Gemini model request

The same cell then makes the request to the model. The prompt asks for information about the startup Vercel: the company, its leadership, its products, its funding series level, and its most recent funding round. The prompt is written as several adjacent string literals so it can span multiple lines; Python joins them into one string without inserting spaces, so each literal must carry its own trailing space.

In [None]:
# Choose a Gemini model.
model = genai.GenerativeModel(
  model_name="gemini-1.5-flash",
  generation_config=generation_config,
  tools = [
    genai.protos.Tool(
      google_search_retrieval = genai.protos.GoogleSearchRetrieval()
    ),
  ],
)

# Make the LLM request.
response = model.generate_content("Tell me a little bit about Vercel, the startup. " \
                                  "Make sure to be very detailed, and include information " \
                                  "about the CEO, CTO, and all of the products that they " \
                                  "have created. Also reference which Series level " \
                                  "they are (A, B, C, etc.), and the size and date of " \
                                  "their last funding round.")

print(response.text)

Vercel is a cloud-based platform that simplifies frontend web application development, scaling, and security.  Founded by Guillermo Rauch (CEO), the company's headquarters are in San Francisco, California.  While the provided text doesn't name a CTO, it details several key products and funding information.

**Products:**

* **Next.js:** A popular open-source toolkit for building websites, allowing for cloud-based rendering of visual assets to speed up loading times. It uses React, an open-source library, for building website components.
* **DX Platform:** Vercel's flagship offering, simplifying the deployment of frontend code to production. It includes pre-packaged CI/CD pipelines, staging environments for testing, and the ability to quickly roll back updates if necessary.  It also offers managed website hosting infrastructure and cloud databases.
* **Serverless Storage Solutions:**  This suite includes Vercel KV, Vercel Postgres, Vercel Blob, and Vercel Edge Config, providing various 
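
Because the prompt for this request is assembled from adjacent string literals, it is easy to lose the space between two pieces. A minimal sketch of the pitfall, using shortened, made-up prompt fragments:

```python
# Adjacent string literals are concatenated exactly as written,
# so a missing trailing space fuses two sentences together.
missing_space = ("Tell me about Vercel, the startup."
                 "Make sure to be very detailed.")
with_space = ("Tell me about Vercel, the startup. "
              "Make sure to be very detailed.")

print(missing_space)
print(with_space)
```

Printing the prompt before sending it is a cheap way to catch this kind of mistake.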

### Defining the Response Schema

This code block defines the expected structure of the LLM's response using a schema.  The schema specifies that the response should be a JSON object with a required "company_name" field.

The `company_name` field is itself an object with five required fields: `CEO_name`, `current_valuation`, `last_funding_month_and_year`, `series_level`, and `product_names`. All of them are strings except `product_names`, which is an array of strings. The schema makes the LLM output structured and parsable, and `response_mime_type` is set to `application/json` so that the response is returned as JSON.

In [None]:
generation_config = {
  "temperature": 1,
  "top_p": 0.95,
  "top_k": 40,
  "max_output_tokens": 8192,
  "response_schema": content.Schema(
    type=content.Type.OBJECT,
    required=["company_name"],
    properties={
      "company_name": content.Schema(
        type=content.Type.OBJECT,
        required=["CEO_name", "current_valuation",
                  "last_funding_month_and_year",
                  "series_level", "product_names"],
        properties={
          # List of product names, one string per product.
          "product_names": content.Schema(
            type=content.Type.ARRAY,
            items=content.Schema(type=content.Type.STRING),
          ),
          "CEO_name": content.Schema(type=content.Type.STRING),
          "current_valuation": content.Schema(type=content.Type.STRING),
          "last_funding_month_and_year": content.Schema(type=content.Type.STRING),
          "series_level": content.Schema(type=content.Type.STRING),
        },
      ),
    },
  ),

  "response_mime_type": "application/json",
}

### Instantiating the Generative Model

This code initializes a `GenerativeModel` object from the `genai` library. It specifies the model to use (`gemini-1.5-flash-8b`) and the generation configuration defined in the previous step. This sets up the model for generating content based on the provided schema and parameters.

### Making the LLM Request and Processing the Response

This code makes the request to the language model using `model.generate_content()`. It passes `response.text`, the text of the search-grounded answer from the earlier request, as the prompt, so the second model only has to extract the fields defined in the schema. The reply is then parsed into a Python object with `json.loads()`, re-serialized with indentation using `json.dumps()` for readability, and printed.

In [None]:
model = genai.GenerativeModel(
  model_name="gemini-1.5-flash-8b",
  generation_config=generation_config,
)

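# Make the LLM request, using the search-grounded answer as the prompt.
# A separate variable keeps the original answer intact if this cell is re-run.
structured_response = model.generate_content(response.text)

json_object = json.loads(structured_response.text)

json_formatted_str = json.dumps(json_object, indent=2)

print(json_formatted_str)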

{
  "company_name": {
    "CEO_name": "Guillermo Rauch",
    "current_valuation": "$3.31 billion",
    "last_funding_month_and_year": "May 2024",
    "product_names": [
      "Next.js",
      "DX Platform",
      "Serverless Storage Solutions",
      "v0",
      "Vercel AI SDK"
    ],
    "series_level": "Series E"
  }
}
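
Before writing the result anywhere, it can help to check that the parsed JSON really has the shape the schema asks for. The following is a minimal sketch in plain Python; `parse_company_json` and `REQUIRED_FIELDS` are hypothetical names, and the sample text is a shortened copy of the output above.

```python
import json

# Field names taken from the response schema defined earlier.
REQUIRED_FIELDS = {
    "CEO_name",
    "current_valuation",
    "last_funding_month_and_year",
    "series_level",
    "product_names",
}


def parse_company_json(raw_text):
    """Parse the model's JSON text and return the inner company record.

    Raises ValueError if the text is not valid JSON or lacks expected fields.
    """
    data = json.loads(raw_text)  # json.JSONDecodeError is a ValueError subclass.
    if not isinstance(data, dict) or not isinstance(data.get("company_name"), dict):
        raise ValueError("Expected an object with a 'company_name' object.")
    company = data["company_name"]
    missing = REQUIRED_FIELDS - company.keys()
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    if not isinstance(company["product_names"], list):
        raise ValueError("'product_names' should be a list of strings.")
    return company


sample_text = """
{"company_name": {"CEO_name": "Guillermo Rauch",
                  "current_valuation": "$3.31 billion",
                  "last_funding_month_and_year": "May 2024",
                  "product_names": ["Next.js", "v0"],
                  "series_level": "Series E"}}
"""
company = parse_company_json(sample_text)
print(company["CEO_name"], company["series_level"])
```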


### Authentication and Setup

This section authenticates the Colab user's Google account and sets up the `gspread` library for interacting with Google Sheets.

### Create a New Google Sheet

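This code creates a new Google Sheet titled "Startup market research", opens its first worksheet, flattens the JSON object into two-column rows (field name and value), and writes those rows starting at cell A1.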

In [None]:
from google.colab import auth
auth.authenticate_user()

import gspread
from google.auth import default
creds, _ = default()

gc = gspread.authorize(creds)

sh = gc.create('Startup market research')

# Use the first worksheet of the spreadsheet we just created.
# (Opening by title could pick up an older spreadsheet with the same name.)
worksheet = sh.sheet1

# Flatten JSON data.
data = []
for key, value in json_object["company_name"].items():
    if key == "product_names":  # Handle product_names list separately
        # Join the products into a single string separated by commas
        value = ", ".join(value)
    data.append([key, value])  # Now value is a simple string or the joined string

# Update the worksheet with the flattened data
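# Named arguments avoid depending on the positional order, which differs between gspread versions.
worksheet.update(range_name='A1', values=data)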


{'spreadsheetId': '13F-kjLPG5RVebF8cp7jvBnWzYUbfvCJkdejQIWcrxKc',
 'updatedRange': 'Sheet1!A1:B5',
 'updatedRows': 5,
 'updatedColumns': 2,
 'updatedCells': 10}
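
The flattening step can also be factored into a small function and checked without touching Google Sheets. This is a sketch with a hypothetical helper name, `flatten_company`, run on a shortened copy of the JSON shown earlier:

```python
def flatten_company(record):
    """Turn the company record into [field, value] rows suitable for a sheet."""
    rows = []
    for key, value in record.items():
        if isinstance(value, list):
            # Join list values (such as product names) into one cell.
            value = ", ".join(str(item) for item in value)
        rows.append([key, str(value)])
    return rows


sample_record = {
    "CEO_name": "Guillermo Rauch",
    "current_valuation": "$3.31 billion",
    "last_funding_month_and_year": "May 2024",
    "product_names": ["Next.js", "v0", "Vercel AI SDK"],
    "series_level": "Series E",
}
rows = flatten_company(sample_record)
for row in rows:
    print(row)
```

Five fields produce five rows of two columns each, which matches the `A1:B5` range reported in the update result above.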