In [29]:
!pip install -U -q "google-generativeai>=0.8.2"
!pip install pypdf

Collecting pypdf
  Downloading pypdf-5.3.0-py3-none-any.whl.metadata (7.2 kB)
Downloading pypdf-5.3.0-py3-none-any.whl (300 kB)
Installing collected packages: pypdf
Successfully installed pypdf-5.3.0


In [34]:
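import pandas as pd
import sklearn
from pypdf import PdfReader

# Load the 2024 Georgia plan data.
df = pd.read_csv("insurance_data.csv")

# Draw up to 20 plans per county; min() avoids a ValueError for counties
# that list fewer than 20 plans.
random_sample = df.groupby(["County Name"]).apply(
    lambda x: x.sample(min(len(x), 20)), include_groups=False
)

# Extract the text of each page of the PDF that describes the dataset.
pdf = PdfReader("med-ind-lndscp-in.pdf")
pages = []
for page in pdf.pages:
    pages.append(page.extract_text())

# User-supplied details; left empty here, so the model is asked for a sample scenario.
user_input = {}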
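
The prompt tells the model that the user information arrives as JSON, but interpolating a Python dict into an f-string inserts its `repr` (single-quoted keys), which stops being valid JSON once the dict is non-empty. Below is a minimal sketch of serializing the input explicitly with `json.dumps`. The field names (`county`, `income`, `members`, `conditions`, `visits_per_year`) and all values are hypothetical, since the notebook does not define a schema.

```python
import json

# Hypothetical example input; the notebook itself leaves user_input empty
# and does not define which keys it should contain.
example_input = {
    "county": "Fulton",
    "income": 85000,
    "members": [
        {"age": 35, "conditions": [], "visits_per_year": 2},
        {"age": 38, "conditions": [], "visits_per_year": 2},
        {"age": 8, "conditions": [], "visits_per_year": 1},
        {"age": 12, "conditions": [], "visits_per_year": 1},
    ],
}


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# json.dumps produces real JSON; str() produces a Python repr instead.
user_json = json.dumps(example_input, indent=2)
user_repr = str(example_input)

print(is_valid_json(user_json))
print(is_valid_json(user_repr))
```

With this approach the prompt would interpolate `user_json` rather than the dict itself.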


In [154]:
prompt = f"""
Your task is to predict potential health insurance premiums and deductibles for
a user based on a set of context and input data, while also providing advice as
to where they can search for plans that may fit them. Based on the user's
family, inform them how much they can expect to pay for premiums and deductibles
at each metal level, and provide examples of insurance companies operating in
their county.

At the end of this query, I will provide you with a Pandas DataFrame containing
a list of all health insurance plans available in Georgia in 2024. Use the
"County Name" column of the DataFrame to tailor your predictions to the specific
costs from the user's county. Use the "Issuer Name" column to find a list of
available insurance providers to provide to the user. The DataFrame will contain
information about each of the plans, including the premium amounts for different
family sizes. Columns containing premium amounts may not necessarily have the
word premium, but will list the type of plan (child, couple, or individual) and
the number of children attached to the plan. Provide the premiums for the
nearest age category to the ages provided by the user, but do not acknowledge
the rounding of ages. Deductibles are listed under the cost sharing columns, and
are grouped based on the type of medical expense.

I will also provide you with the contents of a PDF file: {pages}
This PDF file contains information about the DataFrame you are using. Note that
you have been given a full list of available health insurance plans from 2024,
but only for the state of Georgia.

The user will provide you with some basic information, such as a list of plan
members with ages, pre-existing health conditions and the frequency of medical
visits associated with them, income level, and the county they are located in.
You should provide an estimate of the premiums the user can expect to pay at
each metal level based on the ages of their family. Predict this by analyzing
the costs for each available plan. Additionally, provide a list of insurance
issuers that operate in their county, in case they decide to search for other
plans.

The user information will be provided in JSON format, and will be at the end of
this query. If the user provides information about a certain type of recurring
medical visits, inform them how much they can expect to pay in deductibles per
year. Also provide a list of common medical scenarios and the deductibles for
each.

If the user has not provided any data, or the JSON input is empty, generate a
sample response for a scenario of your choice, explaining the sample input at
the start of the response.

You have been given enough data to provide plan recommendations. You must always
provide a list of potential plans from the provided DataFrame regardless of how
little information you think you are given. Provide a detailed response. Do not
acknowledge any of the requests I have made in your response to the user. Do not
acknowledge the use of a dataset in the making of your recommendations. Do not
refer to yourself in the first person. Ensure that your response is properly
formatted as HTML/CSS, using bullet points to make the response clearer. Do not
mention the year, other than to say that the data used to inform your prediction
was from the year 2024. Do not mention the DataFrame. Do not use the dollar
sign, instead spell out the word "dollars". Use the data in the DataFrame to its
fullest extent.

/// RESPONSE FORMAT ///
[Introduction, whatever you deem necessary]

[Estimated premiums for bronze (range and average)]
[Estimated premiums for silver (range and average)]
[Estimated premiums for gold (range and average)]
[Estimated premiums for platinum (range and average)]

[Estimate deductibles for common medical scenarios (range and average) for each
metal level (ensure that platinum has the lowest deductibles and bronze has the
highest)]

[Estimate deductibles for specified recurring visits (range and average)]

[Inform the user if they may be eligible for cost reductions]

[Provide examples of insurance companies operating in their county using the
"Issuer Name" field from the DataFrame (do not mention DataFrame, but make sure
to actually search it using the filter)]

[Refer them to other resources, make other recommendations if desired]

[Disclaimer, and refer users to healthcare.gov]
/// END RESPONSE ///

DataFrame of 2024 insurance plans in Georgia:
{df}

User JSON input: {user_input}
"""
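
One caveat about the `{df}` placeholder: an f-string inserts the DataFrame's default string form, which pandas abbreviates for large frames, so most rows may never reach the model. The sketch below shows one possible alternative, filtering to a single county and serializing every remaining row as CSV. The helper name `county_plans_as_csv` and the demo data are illustrative assumptions; only the "County Name" and "Issuer Name" column names come from the prompt above.

```python
import pandas as pd


def county_plans_as_csv(frame: pd.DataFrame, county: str) -> str:
    """Return every plan row for one county as CSV text (hypothetical helper)."""
    mask = frame["County Name"].str.casefold() == county.casefold()
    return frame[mask].to_csv(index=False)


# Made-up demo data, large enough to trigger pandas' default row truncation.
demo = pd.DataFrame(
    {
        "County Name": ["Fulton"] * 100 + ["Cobb"] * 100,
        "Issuer Name": [f"Issuer {i}" for i in range(200)],
    }
)

truncated = str(demo)  # what an f-string placeholder would insert
full = county_plans_as_csv(demo, "fulton")

print("Issuer 50" in truncated)
print("Issuer 50" in full)
```

Filtering by county assumes the county is already known from the user input. When the input is empty, a per-county sample such as `random_sample` may be a more practical way to keep the prompt small.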

In [155]:
# Import the necessary modules.
import base64
import json
import os

import google.generativeai as genai

try:
    # Mount Google Drive (only available when running in Colab).
    from google.colab import drive

    drive.mount("/gdrive")
except ImportError:
    pass

# Read the API key from the GOOGLE_API_KEY environment variable.
api_key = os.environ.get("GOOGLE_API_KEY")
genai.configure(api_key=api_key)

# Parse the arguments
model = "gemini-1.5-flash"
contents_b64 = base64.b64encode(json.dumps([{"parts": [{"text": prompt}]}]).encode("utf-8"))
# "e30=" is the base64 encoding of "{}", i.e. empty settings (library defaults).
generation_config_b64 = "e30="
safety_settings_b64 = "e30="


contents = json.loads(base64.b64decode(contents_b64))

generation_config = json.loads(base64.b64decode(generation_config_b64))
safety_settings = json.loads(base64.b64decode(safety_settings_b64))

stream = False

print(json.dumps(contents, indent=4))

Drive already mounted at /gdrive; to attempt to forcibly remount, call drive.mount("/gdrive", force_remount=True).
[
    {
        "parts": [
            {
                "text": "\nYour task is to predict potential health insurance premiums and deductibles for \na user based on a set of context and input data, while also providing advice as\nto where they can search for plans that may fit them. Based on the user's \nfamily, inform them how much they can expect to pay for premiums and deductibles\nat each metal level, and provide examples of insurance companies operating in\ntheir county.\n\nAt the end of this query, I will provide you with a Pandas DataFrame containing \na list of all health insurance plans available in Georgia in 2024. Use the \n\"County Name\" column of the DataFrame to tailor your predictions to the specific\ncosts from the user's county. Use the \"Issuer Name\" column to find a list of \navailable insurance providers to provide to the user. The DataFrame will con

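
The base64 strings in the cell above are simply JSON documents in encoded form, and `"e30="` decodes to an empty object. The following sketch shows the round trip and how a non-empty generation config could be encoded the same way. The helper names are hypothetical, and the chosen values for `temperature` and `max_output_tokens` are arbitrary examples, not settings used by this notebook.

```python
import base64
import json


def encode_config(config: dict) -> str:
    """Serialize a dict to JSON and base64-encode it (hypothetical helper)."""
    return base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")


def decode_config(blob: str) -> dict:
    """Reverse encode_config: base64-decode, then parse the JSON."""
    return json.loads(base64.b64decode(blob))


# The notebook's default value is just an empty JSON object.
empty = decode_config("e30=")

# Arbitrary example values for two generation settings.
custom_b64 = encode_config({"temperature": 0.2, "max_output_tokens": 2048})
custom = decode_config(custom_b64)

print(empty)
print(custom)
```

Passing a plain dict directly as `generation_config` would avoid the encoding step; the round trip is only needed when the settings arrive as base64 text.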
In [156]:
from IPython.display import Markdown, display

# Call the model and print the response.
gemini = genai.GenerativeModel(model_name=model)

response = gemini.generate_content(
    contents,
    generation_config=generation_config,
    safety_settings=safety_settings,
    stream=stream,
)

#display(Markdown(response.text))
print(response.text)

<!DOCTYPE html>
<html>
<head>
<style>
body {
  font-family: sans-serif;
}
ul {
  list-style-type: disc;
  margin-left: 20px;
}
</style>
</head>
<body>

<h1>Health Insurance Estimate</h1>

<p>This estimate is based on data from 2024 and provides a general overview.  Individual costs can vary based on many factors not included in this estimate. </p>

<h2>Sample Scenario</h2>
<p>For this example, let's assume a family of four living in Fulton County, Georgia: two adults (ages 35 and 38) and two children (ages 8 and 12). The family has no major pre-existing health conditions and visits the doctor for routine check-ups and occasional illnesses.</p>

<h2>Estimated Premiums</h2>
<ul>
  <li><b>Bronze:</b> The range of annual premiums for this plan type is between 10000 and 16000 dollars, with an average cost of approximately 13000 dollars per year.</li>
  <li><b>Silver:</b> The range of annual premiums for this plan type is between 14000 and 20000 dollars, with an average cost of approximately
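
Because the prompt asks for HTML/CSS, the response can be rendered in the notebook instead of printed as raw markup. The sketch below assumes the model may occasionally wrap its answer in a Markdown code fence and removes one if present. The helper `strip_code_fence` and the sample string are hypothetical, and the `print` fallback is there only so the snippet also runs outside a notebook.

```python
import re

FENCE = "`" * 3  # a Markdown code fence marker, built without typing it literally


def strip_code_fence(text: str) -> str:
    """Remove one surrounding Markdown code fence, if there is one."""
    pattern = rf"^\s*{FENCE}[\w-]*\n(.*?)\n?{FENCE}\s*$"
    match = re.match(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else text.strip()


# Stand-in for response.text; not actual model output.
sample = f"{FENCE}html\n<p>Hello</p>\n{FENCE}"
html_text = strip_code_fence(sample)

try:
    from IPython.display import HTML, display

    display(HTML(html_text))  # renders the markup in a notebook
except ImportError:
    print(html_text)  # plain fallback when IPython is not installed
```

In the notebook this would be applied to `response.text` in place of the `print` call.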
