[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/cookbook/blob/main/gen-ai/google-ai/gemini-2/web-search.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/cookbook/blob/main/gen-ai/google-ai/gemini-2/web-search.ipynb)

# Gemini 2.0 Flash Web Search + Citations

## Setup Instructions

This notebook has been tested both locally with Python `3.12.7` and in Google Colab with Python `3.10.12`.

To run locally, please refer to the [setup instructions with `uv` here](https://github.com/aurelio-labs/agents-course/blob/main/local-setup.md). To run in Google Colab, simply run all cells in the notebook.

## Setup with Google AI

To begin, we need a Google API key, which you can get by setting up an account in [Google AI Studio](https://aistudio.google.com). That is all we need to initialize and start using Gemini 2, but later we will need to upgrade our plan to get access to the Google Search tool.

After you have your account and API key, we initialize our [`google.genai` client](https://github.com/googleapis/python-genai):

In [1]:
import os
from getpass import getpass
from google import genai

# pass your API key here
os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY") or getpass(
    "Enter Google API Key: "
)
# initialize our client
client = genai.Client()

Enter Google API Key: ··········


To generate text with Gemini we call the `generate_content` method. Gemini tends to respond in markdown and is fairly verbose, so we'll use IPython's `Markdown` to display the output.

In [2]:
from IPython.display import Markdown

model_id = "gemini-2.0-flash-exp"

response = client.models.generate_content(
    model=model_id, contents='What is Gemini?'
)
Markdown(response.text)

Gemini refers to a few different things, but most likely you're asking about:

*   **Google Gemini:** This is Google's most advanced large language model (LLM). It's a family of models designed to be multimodal, meaning it can understand and generate content from different types of information like text, images, audio, and video. There are different sizes of Gemini models, including Gemini Ultra, Gemini Pro, and Gemini Nano, each designed for different capabilities and use cases. Gemini powers various Google products and services, including Bard (now Gemini).

Here's a breakdown of the key aspects of Google Gemini:

*   **Multimodal:** Can process and generate text, code, images, audio, and video.
*   **Advanced Reasoning:** Designed to solve complex problems and perform reasoning tasks.
*   **Different Sizes:** Available in Ultra, Pro, and Nano versions for varying performance and device compatibility.
*   **Integration:** Integrated into Google products like Gemini (formerly Bard), Search, and more.

If you're interested in a specific aspect of Gemini, please provide more details, and I can give you a more tailored explanation.


We got a pretty descriptive response. Now let's take a look at tool use in Gemini, starting with the `GoogleSearch` tool.

_Note: to use this tool you **must** upgrade your plan. You can do that by navigating to your Google AI Studio settings and clicking the **API Plan Information** link. For the purposes of this notebook you should be covered by the free allowance, but billing must be enabled in any case._

In [3]:
from google.genai.types import Tool, GoogleSearch

search_tool = Tool(google_search=GoogleSearch())

To use the tool, we must create a [`GenerateContentConfig` object](https://github.com/googleapis/python-genai/blob/cf3c476ea300acc09b514d23b5f1f08c05efce60/google/genai/types.py#L1404). This object configures the request sent to Gemini. It can include a `system_instruction`, `temperature`, `max_output_tokens`, our `response_modalities`, and _very importantly_, our `tools`.

Most of these parameters are optional. The _defaults_ seem to depend on the model being used, although it isn't clear what they are for Gemini 2.0 Flash. Nonetheless, where a default is easy to guess, we have set it explicitly in the `config` object below.

In [4]:
from google.genai.types import GenerateContentConfig

config = GenerateContentConfig(
    system_instruction=(
        "You are a helpful assistant that provides up to date information "
        "to help the user in their research."
    ),
    tools=[search_tool],
    response_modalities=["TEXT"],
    temperature=0.0,  # likely default
    candidate_count=1,  # likely default
    # the following are some other parameters, all optional
    # top_p,
    # top_k,
    # max_output_tokens,
    # stop_sequences,
    # safety_settings,
)
config

GenerateContentConfig(
  candidate_count=1,
  response_modalities=[
    'TEXT',
  ],
  system_instruction='You are a helpful assistant that provides up to date information to help the user in their research.',
  temperature=0.0,
  tools=[
    Tool(
      google_search=GoogleSearch()
    ),
  ]
)

Now that our `config` is set up, we can ask Gemini about the latest news in AI. By default Gemini would not be able to answer this question, but we can get up-to-date information by augmenting Gemini's knowledge with the `GoogleSearch` tool.

In [5]:
from ipywidgets import Dropdown, Textarea, RadioButtons, VBox, Button, Output
from IPython.display import display

# Create input widgets
status_dropdown = Dropdown(
    options=['Agency/Business', 'Freelancer'],
    description='Status:',
    disabled=False,
)

services_textarea = Textarea(
    value='',
    description='Services Offered:',
    disabled=False,
    layout={'width': '500px'}
)

products_textarea = Textarea(
    value='',
    description='Product Functionality (Description):',
    disabled=False,
    layout={'width': '500px'}
)

client_type_radiobuttons = RadioButtons(
    options=['Clients who immediately need help',
             'Clients who are engaging in conversation of such topics',
             'Clients who show no activity but are target businesses'],
    description='Target Client Type:',
    disabled=False
)

timeframe_radiobuttons = RadioButtons(
    options=['1 day ago', '1 week ago', '1 month ago'],
    description='Search Timeframe:',
    disabled=False
)

submit_button = Button(description="Submit")
output_area = Output()

# Display the form
form = VBox([status_dropdown, services_textarea, products_textarea, client_type_radiobuttons, timeframe_radiobuttons, submit_button, output_area])
display(form)

def on_submit_button_clicked(b):
    with output_area:
        output_area.clear_output()
        print("Collecting Information:")
        print(f"Status: {status_dropdown.value}")
        print(f"Services Offered: {services_textarea.value}")
        print(f"Product Functionality: {products_textarea.value}")
        print(f"Target Client Type: {client_type_radiobuttons.value}")
        print(f"Search Timeframe: {timeframe_radiobuttons.value}")

submit_button.on_click(on_submit_button_clicked)

VBox(children=(Dropdown(description='Status:', options=('Agency/Business', 'Freelancer'), value='Agency/Busine…
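
The callback above only prints the form values. One way to pass them on to the model is to format them into the same labelled block that the prompt in the next cell uses. The following is a minimal sketch; `build_user_profile` is a hypothetical helper, not part of the original notebook, and the sample values are illustrative.

```python
def build_user_profile(status, services, product, client_type, timeframe):
    """Format form values into a labelled block, one 'Label: value' line per field."""
    fields = {
        "Status": status,
        "Services Offered": services,
        "Product Functionality": product,
        "Target Client Type": client_type,
        "Search Timeframe": timeframe,
    }
    # trim stray whitespace that text areas tend to collect
    return "\n".join(
        f"{label}: {str(value).strip()}" for label, value in fields.items()
    )


# illustrative values; in the notebook these would come from the widgets' .value
profile = build_user_profile(
    status="Agency/Business",
    services="chatbot creation, enterprise ai ",
    product="leadlens: a lead generator tool",
    client_type="Clients who immediately need help",
    timeframe="1 day ago",
)
print(profile)
```

Inside `on_submit_button_clicked`, the same function could be called with `status_dropdown.value`, `services_textarea.value`, and so on, and the result interpolated into the prompt.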

In [91]:
response = client.models.generate_content(
    model=model_id,
    contents="""Based on this user query: "Status: Agency/Business
                                          Services Offered: chatbot creation, enterprise ai, doc parsing, doc querying, ai, machine learning, data analytics + data science,
                                          Product Functionality: leadlens: it's a fresh lead generator tool
                                          Target Client Type: Clients who immediately need help
                                          Search Timeframe: 1 day ago"
    look up site:linkedin.com/posts and find out which keywords people use when looking for a genai chatbot solution with the intention of collaborating with agencies and freelancers. Return the list of possible terms, so that searching for such keywords on linkedin can be automated to get the right leads. Also return the google search with boolean operators and site:linkedin.com/posts. Focus more on core keywords; I need at least 50 of them. Keep them grounded, such that when searched on google I get valid results as of 2025 that reflect how people post recently. Maybe do a couple of linkedin post searches and see what phrases they use for this kind of job.

    """,
    config=config
)
Markdown(response.text)

Okay, I will find keywords and search terms related to GenAI chatbot solutions, focusing on collaboration with agencies and freelancers on LinkedIn. I'll aim for at least 50 keywords, grounded in how people currently post and search in 2025, and provide a Google search query using boolean operators and the `site:linkedin.com/posts` operator.

I apologize for the error in the previous response. I had a syntax error in my `tool_code` block. Let me correct that and try again.

Okay, based on the search results, here's a list of keywords and search terms people are likely using on LinkedIn in 2025 when looking for GenAI chatbot solutions and seeking collaboration with agencies and freelancers:

**Core Keywords (GenAI Chatbot Focus):**

1.  GenAI Chatbot
2.  Generative AI Chatbot
3.  AI Chatbot
4.  AI Agent
5.  AI Assistant
6.  Conversational AI
7.  Chatbot Development
8.  AI Solutions
9.  LLM (Large Language Model)
10. NLP (Natural Language Processing)
11. RAG (Retrieval-Augmented Generation)
12. AI Automation
13. Intelligent Automation
14. Chatbot Integration
15. AI-Powered Chatbot
16. Custom Chatbot
17. Enterprise Chatbot
18. Virtual Assistant
19. Digital Assistant
20. Chatbot Agency
21. AI Agency
22. GenAI Consulting
23. AI Consulting
24. Chatbot Specialist
25. AI Specialist
26. Chatbot Developer
27. AI Developer
28. AI Engineer
29. GenAI Engineer
30. AI Freelancer
31. GenAI Freelancer
32. AI Expert
33. GenAI Expert
34. AI Solutions Provider
35. Chatbot Solutions Provider
36. AI Consultant
37. Chatbot Consultant
38. AI Services
39. Chatbot Services
40. AI Implementation
41. Chatbot Implementation
42. AI Strategy
43. Chatbot Strategy
44. AI Transformation
45. Chatbot Transformation
46. AI-Driven Solutions
47. Chatbot-Driven Solutions
48. AI-Enabled Chatbot
49. GenAI-Enabled Chatbot
50. AI-First Chatbot

**Keywords related to Collaboration & Hiring:**

51. "Looking for GenAI Chatbot"
52. "Hiring GenAI Chatbot Developer"
53. "Seeking AI Chatbot Agency"
54. "Partner with AI Freelancer"
55. "Need GenAI Chatbot Expert"
56. "Open to AI Chatbot Collaboration"
57. "Find AI Chatbot Consultant"
58. "Looking to Collaborate on GenAI"
59. "Seeking Partnership for AI Chatbot"
60. "GenAI Chatbot Agency Collaboration"
61. "AI Chatbot Freelancer Partnership"
62. "GenAI Chatbot Expert Needed"
63. "AI Chatbot Help Wanted"
64. "GenAI Chatbot Solution Provider"
65. "AI Chatbot Service Provider"
66. "GenAI Chatbot Implementation Partner"
67. "AI Chatbot Integration Specialist"
68. "GenAI Chatbot Project"
69. "AI Chatbot Development Services"
70. "GenAI Chatbot Consulting Services"
71. "AI Chatbot Managed Services"
72. "GenAI Chatbot Support"
73. "AI Chatbot Maintenance"
74. "GenAI Chatbot Optimization"
75. "AI Chatbot Performance Tuning"
76. "GenAI Chatbot Scalability"
77. "AI Chatbot Security"
78. "GenAI Chatbot Compliance"
79. "AI Chatbot Governance"
80. "GenAI Chatbot ROI"
81. "AI Chatbot Cost Reduction"
82. "GenAI Chatbot Efficiency"
83. "AI Chatbot Productivity"
84. "GenAI Chatbot Innovation"
85. "AI Chatbot Future"
86. "GenAI Chatbot Trends"
87. "AI Chatbot Best Practices"
88. "GenAI Chatbot Case Studies"
89. "AI Chatbot Success Stories"
90. "GenAI Chatbot Examples"
91. "AI Chatbot Use Cases"
92. "GenAI Chatbot Applications"
93. "AI Chatbot Solutions for \[Industry]" (e.g., Healthcare, Finance, E-commerce)
94. "GenAI Chatbot for \[Function]" (e.g., Customer Service, Sales, Marketing, HR)
95. "GenAI Chatbot with \[Specific Feature]" (e.g., Voice, Multilingual, Integration)
96. "GenAI Chatbot for \[Specific Platform]" (e.g., Website, WhatsApp, Messenger)
97. "GenAI Chatbot with \[Specific LLM]" (e.g., GPT-4, Gemini, Claude)
98. "GenAI Chatbot with \[Specific Framework]" (e.g., Langchain, Autogen)
99. "GenAI Chatbot with \[Specific Tool]" (e.g., CustomGPT, Botpress)
100. "GenAI Chatbot with \[Specific Integration]" (e.g., CRM, ERP)

**Google Search Query:**

Here's a Google search query you can use with boolean operators and the `site:linkedin.com/posts` operator:

```
(GenAI OR "Generative AI" OR AI OR "Artificial Intelligence") AND Chatbot AND (Agency OR Freelancer OR Consultant OR "Solutions Provider" OR Expert) AND ("looking for" OR hiring OR "seeking" OR "partner with" OR "need help with" OR "open to collaboration") site:linkedin.com/posts
```

**Explanation of the Query:**

*   `(GenAI OR "Generative AI" OR AI OR "Artificial Intelligence")`:  This ensures that the search includes variations of the core technology.
*   `Chatbot`: This focuses the search on chatbot-related posts.
*   `(Agency OR Freelancer OR Consultant OR "Solutions Provider" OR Expert)`: This targets posts where people are looking for external help.
*   `("looking for" OR hiring OR "seeking" OR "partner with" OR "need help with" OR "open to collaboration")`: This captures the intent to collaborate or hire.
*   `site:linkedin.com/posts`: This restricts the search to LinkedIn posts only.

This comprehensive list and the Google search query should provide a strong starting point for automating your lead generation efforts on LinkedIn. Remember to refine these keywords and the search query based on your specific needs and the results you're getting. Good luck!
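
When the `GoogleSearch` tool is used, the response usually carries grounding metadata describing the web sources behind the answer, which is what makes citations possible. The sketch below assumes the attribute layout `candidates[0].grounding_metadata.grounding_chunks[*].web` with `title` and `uri` fields, as exposed by the `google-genai` SDK at the time of writing. `extract_sources` is a hypothetical helper, and a `SimpleNamespace` stand-in replaces a real response so the example runs without an API call.

```python
from types import SimpleNamespace


def extract_sources(response):
    """Return [(title, uri), ...] for web sources on the first candidate."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return []
    metadata = getattr(candidates[0], "grounding_metadata", None)
    chunks = getattr(metadata, "grounding_chunks", None) or []
    sources = []
    for chunk in chunks:
        web = getattr(chunk, "web", None)
        # skip chunks that are not web sources or carry no link
        if web is not None and getattr(web, "uri", None):
            sources.append((getattr(web, "title", None), web.uri))
    return sources


# stand-in object mimicking the assumed response shape (not real API output)
fake_response = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            grounding_metadata=SimpleNamespace(
                grounding_chunks=[
                    SimpleNamespace(
                        web=SimpleNamespace(
                            title="example.com", uri="https://example.com/a"
                        )
                    ),
                    SimpleNamespace(web=None),
                ]
            )
        )
    ]
)
print(extract_sources(fake_response))
```

With a live response, `extract_sources(response)` would be called on the object returned by `client.models.generate_content`. An empty list is returned when no grounding metadata is present.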


In [55]:
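import os
from googleapiclient.discovery import build

# Read credentials from the environment rather than hard-coding them in the notebook.
# The variable names below are assumptions; use whatever names you export.
API_KEY = os.environ["GOOGLE_CSE_API_KEY"]  # Custom Search JSON API key
CSE_ID = os.environ["GOOGLE_CSE_ID"]  # Custom Search Engine ID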
def google_search(query, api_key=API_KEY, cse_id=CSE_ID, num=10, pages=5, date_restrict='d1'):
    """
    Perform a Google Custom Search with optional date restriction.

    date_restrict: str, like "d1" (last day), "w1" (last week), "m1" (last month)
    """
    service = build("customsearch", "v1", developerKey=api_key)
    results = []

    for page in range(pages):
        start_index = page * num + 1
        req = service.cse().list(
            q=query,
            cx=cse_id,
            num=num,
            start=start_index,
            dateRestrict=date_restrict
            # sort="date:r:20250923:20250925"
        )
        res = req.execute()
        items = res.get("items", [])
        for item in items:
            results.append(
                {
                    "title": item["title"],
                    "url": item["link"],
                    "snippet": item.get("snippet", ""),
                }
            )
    return results


q = 'agency OR ai OR llm OR rag OR "machine learning" OR "lead generation" OR automation OR scraping "looking for" site:linkedin.com/posts'
sresults = google_search(q, num=5, pages=1)
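
Hand-written boolean queries are easy to get wrong: `OR` tokens get repeated and multi-word terms get split across operators. A minimal sketch of building the query from lists instead is shown below. `build_search_query` and `DATE_RESTRICT` are hypothetical names introduced here; the `d1`/`w1`/`m1` values follow the `date_restrict` docstring of `google_search`.

```python
def build_search_query(keywords, intent_phrases, site):
    """Join keywords and intent phrases into one boolean Google query."""

    def quote(term):
        term = term.strip()
        # multi-word terms are quoted so they match as a phrase
        return f'"{term}"' if " " in term else term

    def or_group(terms):
        cleaned = [quote(t) for t in terms if t and t.strip()]
        return "(" + " OR ".join(cleaned) + ")" if cleaned else ""

    parts = [or_group(keywords), or_group(intent_phrases), f"site:{site}"]
    return " ".join(p for p in parts if p)


# map the form's timeframe labels to Custom Search dateRestrict values
DATE_RESTRICT = {"1 day ago": "d1", "1 week ago": "w1", "1 month ago": "m1"}

query = build_search_query(
    keywords=["agency", "ai", "llm", "rag", "machine learning", "lead generation"],
    intent_phrases=["looking for"],
    site="linkedin.com/posts",
)
print(query)
```

The result can then be passed on as `google_search(query, date_restrict=DATE_RESTRICT["1 day ago"])`.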


In [54]:
import requests
import json, os

In [58]:
sresults[0]

{'title': '“Opportunities come to those who are looking for them ...',
 'url': 'https://www.linkedin.com/posts/spencerploessl_opportunities-come-to-those-who-are-looking-activity-7376635507109199872-C9QP',
 'snippet': '13 hours ago ... Opportunities come to those who are looking for them. Opportunities come to ... 5 Tips for Hiring an Inbound Marketing Agency that Will Produce a Positive ROI.'}
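
The snippets returned here begin with a relative timestamp such as `13 hours ago ...`. That prefix can be parsed to double-check how recent a result is. This is a sketch that assumes the prefix format seen in the outputs of this notebook; snippets with absolute dates or no timestamp simply yield `None`. `snippet_age` is a hypothetical helper.

```python
import re
from datetime import timedelta

# matches prefixes like "13 hours ago", "1 day ago", "45 minutes ago"
_AGE = re.compile(r"^\s*(\d+)\s+(minute|hour|day)s?\s+ago\b", re.IGNORECASE)
_UNIT = {"minute": "minutes", "hour": "hours", "day": "days"}


def snippet_age(snippet):
    """Return the age stated at the start of a snippet, or None if absent."""
    match = _AGE.match(snippet or "")
    if not match:
        return None
    amount, unit = int(match.group(1)), match.group(2).lower()
    return timedelta(**{_UNIT[unit]: amount})


age = snippet_age(
    "13 hours ago ... Opportunities come to those who are looking for them."
)
print(age)
```

Results could then be filtered with something like `[r for r in sresults if (snippet_age(r["snippet"]) or timedelta.max) <= timedelta(days=1)]`.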

In [89]:
import os
import requests
from bs4 import BeautifulSoup

# browser-like request headers, shared by every request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'DNT': '1',  # Do Not Track request header
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
}

# Read the proxy URL (including any credentials) from the environment
# rather than hard-coding it. PROXY_URL is an assumed variable name.
proxy_url = os.getenv("PROXY_URL")
proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else None

context_items = []
for item in sresults:
    url = item.get('url')
    response = requests.get(url, headers=headers, proxies=proxies, timeout=30)
    soup = BeautifulSoup(response.text, 'lxml')

    # take the first <meta> tag whose content has at least 5 words;
    # reset per URL so a miss never reuses the previous page's text
    content = ''
    for meta in soup.find_all('meta'):
        meta_content = meta.get('content') or ''
        if len(meta_content.split()) >= 5:
            content = meta_content
            break
    context_items.append({'url': url, 'text': content, 'snippet': item.get('snippet'), 'title': item.get('title')})
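
The loop above takes the first `<meta>` tag whose content has enough words, which depends on tag order and may pick up an unrelated tag. An alternative is to look up description-style tags by name. The sketch below assumes pages expose `og:description` or `description`, which is common but not guaranteed. It uses the standard library parser rather than BeautifulSoup so that it runs without extra dependencies, and `MetaCollector` and `extract_description` are hypothetical names.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (key, content) pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("property") or attrs.get("name") or ""
        content = attrs.get("content") or ""
        if content:
            self.metas.append((key.lower(), content))


def extract_description(
    html, preferred=("og:description", "description", "twitter:description")
):
    """Return the first preferred description-style meta content, or ''."""
    collector = MetaCollector()
    collector.feed(html)
    by_key = {}
    for key, content in collector.metas:
        by_key.setdefault(key, content)  # keep the first occurrence of each key
    for key in preferred:
        if key in by_key:
            return by_key[key]
    return ""


sample_html = """
<html><head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Fallback description of the post">
<meta property="og:description" content="We are looking for an AI agency" />
</head><body></body></html>
"""
print(extract_description(sample_html))
```

In the loop, `extract_description(response.text)` could replace the inner `for` over `soup.find_all('meta')`.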


In [92]:
q='site:x.com inurl:status python agency'
sresults=google_search(q, num=10,pages=1)

In [93]:
sresults

[{'title': 'Kyle Wild on X: "Coding agents feel much, much better at Golang ...',
  'url': 'https://x.com/dorkitude/status/1970938047383654711',
  'snippet': '9 hours ago ... Coding agents feel much, much better at Golang than at Typescript/Python. This is anecdotal, but some theories to back it up: 1.'},
 {'title': 'Alexey Grigorev on X: "RT @Al_Grigor: AI Agents crash course - Day ...',
  'url': 'https://x.com/Al_Grigor/status/1970892806580224295',
  'snippet': "12 hours ago ... RT @Al_Grigor: AI Agents crash course - Day 1 (out of 7) completed Outcome: Data Engineering Zoomcamp's repository is ready for processi…"},
 {'title': 'Aadit Sheth on X: "How to build an AI agent that rewrites its own ...',
  'url': 'https://x.com/aaditsh/status/1970955418399859193',
  'snippet': '8 hours ago ... How to build an AI agent that rewrites its own code. A document page titled\xa0...'},
 {'title': 'Jordan Urbs (@jordanurbs): "Claude Code vs Agent Zero You can\'t ...',
  'url': 'https://x.com/jorda

In [97]:
import os
import requests
from bs4 import BeautifulSoup

# browser-like request headers, shared by every request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'DNT': '1',  # Do Not Track request header
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
}

# Read the proxy URL (including any credentials) from the environment
# rather than hard-coding it. PROXY_URL is an assumed variable name.
proxy_url = os.getenv("PROXY_URL")
proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else None

context_items = []
for item in sresults:
    url = item.get('url')
    response = requests.get(url, headers=headers, proxies=proxies, timeout=30)
    soup = BeautifulSoup(response.text, 'lxml')

    # take the first <meta> tag whose content has at least 3 words
    content = ''
    for meta in soup.find_all('meta'):
        meta_content = meta.get('content') or ''
        if len(meta_content.split()) >= 3:
            content = meta_content
            break
    context_items.append({'url': url, 'text': content, 'snippet': item.get('snippet'), 'title': item.get('title')})


In [98]:
context_items

[{'url': 'https://x.com/dorkitude/status/1970938047383654711',
  'text': '',
  'snippet': '9 hours ago ... Coding agents feel much, much better at Golang than at Typescript/Python. This is anecdotal, but some theories to back it up: 1.',
  'title': 'Kyle Wild on X: "Coding agents feel much, much better at Golang ...'},
 {'url': 'https://x.com/Al_Grigor/status/1970892806580224295',
  'text': '',
  'snippet': "12 hours ago ... RT @Al_Grigor: AI Agents crash course - Day 1 (out of 7) completed Outcome: Data Engineering Zoomcamp's repository is ready for processi…",
  'title': 'Alexey Grigorev on X: "RT @Al_Grigor: AI Agents crash course - Day ...'},
 {'url': 'https://x.com/aaditsh/status/1970955418399859193',
  'text': '',
  'snippet': '8 hours ago ... How to build an AI agent that rewrites its own code. A document page titled\xa0...',
  'title': 'Aadit Sheth on X: "How to build an AI agent that rewrites its own ...'},
 {'url': 'https://x.com/jordanurbs/status/1970719105289461918/photo/1'

In [105]:
for text in soup.find_all('div',class_='css-175oi2r'):
  print(text.get_text(strip=True,separator='\n'))

Something went wrong, but don’t fret — let’s give it another shot.
Try again
Some privacy related extensions may cause issues on x.com. Please disable them and try again.
Something went wrong, but don’t fret — let’s give it another shot.
Try again
Some privacy related extensions may cause issues on x.com. Please disable them and try again.

Something went wrong, but don’t fret — let’s give it another shot.
Try again
Some privacy related extensions may cause issues on x.com. Please disable them and try again.
Something went wrong, but don’t fret — let’s give it another shot.
Try again
Some privacy related extensions may cause issues on x.com. Please disable them and try again.
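
As the output above shows, fetching x.com with plain `requests` returns only an error page, so the HTML carries no post text. The search result URL still identifies the post, and the username and post ID can be read from it directly. This is a small sketch based on the URL shapes seen in the results above; `parse_status_url` is a hypothetical helper.

```python
import re

# matches https://x.com/<user>/status/<id> with optional trailing path
_STATUS = re.compile(r"^https?://(?:www\.)?(?:x|twitter)\.com/([^/]+)/status/(\d+)")


def parse_status_url(url):
    """Return {'username', 'tweet_id'} for a status URL, or None if no match."""
    match = _STATUS.match(url or "")
    if not match:
        return None
    return {"username": match.group(1), "tweet_id": match.group(2)}


print(parse_status_url("https://x.com/dorkitude/status/1970938047383654711"))
```

The ID is kept as a string because tools that look up a single post generally take it in that form.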


In [110]:
!pip uninstall -y snscrape
!pip install git+https://github.com/JustAnotherArchivist/snscrape.git@master


Found existing installation: snscrape 0.7.0.20230622
Uninstalling snscrape-0.7.0.20230622:
  Successfully uninstalled snscrape-0.7.0.20230622
Collecting git+https://github.com/JustAnotherArchivist/snscrape.git@master
  Cloning https://github.com/JustAnotherArchivist/snscrape.git (to revision master) to /tmp/pip-req-build-r6pkcij9
  Running command git clone --filter=blob:none --quiet https://github.com/JustAnotherArchivist/snscrape.git /tmp/pip-req-build-r6pkcij9
  Resolved https://github.com/JustAnotherArchivist/snscrape.git to commit 614d4c2029a62d348ca56598f87c425966aaec66
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: snscrape
  Building wheel for snscrape (pyproject.toml) ... done
  Created wheel for snscrape: filename=snscrape-0.7.0.20230622-py3-none-any.whl size=74851 sha256=352a77618a39c8dd552ab88335c46e8

In [1]:
import snscrape.modules.twitter as sntwitter

tweet_id = "1970725228503597430"

# Fetch a single tweet by ID
tweet = sntwitter.TwitterTweetScraper(tweet_id).get_items()
for t in tweet:
    print("ID:", t.id)
    print("Date:", t.date)
    print("Username:", t.user.username)
    print("Display name:", t.user.displayname)
    print("Content:", t.content)
    print("URL:", t.url)
    print("Like count:", t.likeCount)
    print("Retweet count:", t.retweetCount)
    print("Reply count:", t.replyCount)
    print("Language:", t.lang)
    break


AttributeError: 'FileFinder' object has no attribute 'find_module'
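
This `AttributeError` is consistent with running on Python 3.12 or later, where the long-deprecated `find_module()` finder API was removed from `importlib`; the installed snscrape build apparently still relies on it. The following check is offered as a sketch (the function name is illustrative) to confirm whether the current interpreter still provides the method.

```python
import sys
from importlib.machinery import FileFinder


def finder_has_find_module():
    """True if this interpreter's FileFinder still offers find_module()."""
    return hasattr(FileFinder, "find_module")


print(sys.version_info[:2], finder_has_find_module())
```

If this prints `False`, running the import under an older interpreter is one possible workaround, assuming the rest of snscrape still works there.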

In [2]:
import subprocess, json

tweet_id = "1970725228503597430"
result = subprocess.run(
    ["snscrape", "--jsonl", "twitter-tweet", tweet_id],
    capture_output=True, text=True
)
tweet_data = json.loads(result.stdout.strip())
print(tweet_data["content"])


JSONDecodeError: Expecting value: line 1 column 1 (char 0)
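
A `JSONDecodeError` at `line 1 column 1 (char 0)` is what `json.loads` raises on an empty string, which suggests that the command wrote nothing to stdout (for instance because it failed and reported the problem on stderr). Below is a sketch of parsing the output defensively; `parse_jsonl` and `first_record` are hypothetical helpers.

```python
import json


def parse_jsonl(stdout):
    """Parse JSON-lines text into a list of objects, skipping blank lines."""
    records = []
    for line in stdout.splitlines():
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def first_record(stdout, stderr="", returncode=0):
    """Return the first JSON record, or raise with the command's own error text."""
    if returncode != 0 or not stdout.strip():
        raise RuntimeError(
            f"command produced no usable output (exit code {returncode}): "
            f"{stderr.strip()}"
        )
    return parse_jsonl(stdout)[0]


print(first_record('{"content": "hello"}\n'))
```

With the `subprocess.run` call above, this would be used as `first_record(result.stdout, result.stderr, result.returncode)`, so a failed run surfaces the contents of stderr instead of a bare decode error.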