# COGS 108 - Project Proposal

## Authors

**Ali Ahmed**: Background research, Writing - original draft

**Rodayna Alnaggar**: Background research, Data curation

**Tessa Kibbe**: Conceptualization, Writing - review & editing

**Sabine A Sanchez**: Data curation, Methodology

**Maanav R Singh**: Project administration, Writing - review & editing

## Research Question

Has the introduction of Google AI Overviews in 2024 led to a statistically significant shift in the proportions of search behaviors (organic clicks vs. zero-click searches) for informational queries between 2023 and 2025?

## Background and Prior Work


The way we search for information online is going through a huge change right now. For about 30 years, Google and other search engines worked pretty simply: you typed in a question, and they gave you a list of websites to click on. This "Ten Blue Links" model meant that users had to do the work of clicking through different sites, reading them, and figuring out what was true on their own.<a name="cite_ref-1"></a>[<sup>1</sup>](#cite_note-1) But now, with AI being added to search engines, things are different. Instead of just showing you where to find answers, search engines are now trying to give you the answer directly. Researchers call this shift moving from "Information Retrieval" to "Generative Information Retrieval."

This change didn't happen overnight; it built up over several years. In 2015, Google started using something called RankBrain, which was the first time they used deep learning to understand search queries better. Then in 2019, they added BERT (which stands for Bidirectional Encoder Representations from Transformers), which helped Google understand words in the context of a full sentence instead of just individually.<a name="cite_ref-2"></a>[<sup>2</sup>](#cite_note-2) In 2021, they introduced MUM (Multitask Unified Model), and finally in 2023, they launched the Search Generative Experience (SGE), which became "AI Overviews" in 2024. This was the first time Google actually showed AI-generated text at the top of search results instead of just links.<a name="cite_ref-3"></a>[<sup>3</sup>](#cite_note-3)

One major consequence of this shift is what researchers call the "Great Decoupling"—basically, more people are searching than ever, but fewer people are actually clicking through to websites. The numbers are pretty striking: about 60% of Google searches now end without the user clicking on anything (this goes up to 77% on phones).<a name="cite_ref-4"></a>[<sup>4</sup>](#cite_note-4) A study by Seer Interactive found that when an AI Overview shows up, click-through rates for the top search results dropped from 1.41% to just 0.64%—that's more than a 50% decrease.<a name="cite_ref-5"></a>[<sup>5</sup>](#cite_note-5) Another study from Ahrefs found that AI Overviews reduce organic clicks by around 58%, and it's getting worse over time, not better.<a name="cite_ref-6"></a>[<sup>6</sup>](#cite_note-6)

Beyond the economic impact on websites, there are also concerns about whether these AI-generated answers are actually reliable. When you see a list of links, you know you need to evaluate them yourself. But when an AI gives you a direct answer, it can seem more trustworthy than it actually is—researchers call this the "Illusion of Neutrality."<a name="cite_ref-7"></a>[<sup>7</sup>](#cite_note-7) This is a problem because users might accept AI-generated summaries without questioning them the way they would with regular search results.

One specific issue is that AI systems can "hallucinate," which means they sometimes make up information that sounds believable but is completely wrong. There have been cases where AI summaries cited satirical articles as if they were real news, or made up citations that don't exist.<a name="cite_ref-8"></a>[<sup>8</sup>](#cite_note-8) On top of that, research from Stanford has found that AI models like Gemini can have political biases built into them based on the data they were trained on and the people who helped fine-tune them.<a name="cite_ref-9"></a>[<sup>9</sup>](#cite_note-9)

Some researchers have started creating tools to test and evaluate these AI search systems. The C-SEO Bench dataset is a benchmark for testing how different writing styles affect whether content shows up in AI-generated answers.<a name="cite_ref-10"></a>[<sup>10</sup>](#cite_note-10) Salesforce also created something called the Answer Engine Eval framework, which measures things like how well AI answers cover a topic and whether the citations they provide are actually reliable.<a name="cite_ref-11"></a>[<sup>11</sup>](#cite_note-11) However, there haven't been many long-term studies that compare how search behavior and information quality have changed since AI Overviews were introduced. This is the gap we want to explore with our research. 

1. <a name="cite_note-1"></a> [^](#cite_ref-1) Ofcom. (2024). *The Era of Answer Engines: Generative AI's impact on search experiences and online safety*. https://www.ofcom.org.uk/research-and-data/online-research/the-era-of-answer-engines
2. <a name="cite_note-2"></a> [^](#cite_ref-2) Digital Marketing Institute. (2025). *Google Algorithm Updates: What do They Mean for Brands and Marketers?* https://digitalmarketinginstitute.com/blog/google-algorithm-updates-what-do-they-mean-for-brands-and-marketers
3. <a name="cite_note-3"></a> [^](#cite_ref-3) Google. (2024). *AI Features and Your Website*. Google Search Central Documentation. https://developers.google.com/search/docs/appearance/ai-overviews
4. <a name="cite_note-4"></a> [^](#cite_ref-4) The Digital Bloom. (2025). *2025 Organic Traffic Crisis: Zero-Click & AI Impact Analysis Report*. https://thedigitalbloom.com/resources/organic-traffic-ai-impact-analysis
5. <a name="cite_note-5"></a> [^](#cite_ref-5) Seer Interactive. (2025). *AIO Impact on Google CTR: September 2025 Update*. https://www.seerinteractive.com/insights/aio-impact-ctr-update
6. <a name="cite_note-6"></a> [^](#cite_ref-6) Ahrefs. (2025). *Update: AI Overviews Reduce Clicks by 58%*. https://ahrefs.com/blog/ai-overviews-reduce-clicks
7. <a name="cite_note-7"></a> [^](#cite_ref-7) Harvard Kennedy School Misinformation Review. (2024). *New sources of inaccuracy? A conceptual framework for studying AI hallucinations*. https://misinforeview.hks.harvard.edu/article/new-sources-of-inaccuracy
8. <a name="cite_note-8"></a> [^](#cite_ref-8) National Institutes of Health. (2024). *Beware of Artificial Intelligence hallucinations or should we call confabulation?* PMC. https://pmc.ncbi.nlm.nih.gov/articles/ai-hallucinations
9. <a name="cite_note-9"></a> [^](#cite_ref-9) Stanford Graduate School of Business. (2024). *Popular AI Models Show Partisan Bias When Asked to Talk Politics*. https://www.gsb.stanford.edu/insights/popular-ai-models-show-partisan-bias
10. <a name="cite_note-10"></a> [^](#cite_ref-10) Parameter Lab. (2025). *C-SEO Bench: Does Conversational SEO Work?* arXiv/OpenReview. https://huggingface.co/datasets/parameterlab/c-seo-bench
11. <a name="cite_note-11"></a> [^](#cite_ref-11) Salesforce AI Research. (2025). *Answer Engine Eval*. GitHub. https://github.com/SalesforceAIResearch/answer-engine-eval


## Hypothesis


We hypothesize that the rollout of Google AI Overviews in 2024 has caused a statistically significant shift in search behavior for informational queries, characterized by an increase in zero-click search rates and a corresponding decrease in organic click-through rates. We expect this change in proportions to appear specifically in informational queries, rather than in navigational or transactional queries, because informational queries seek answers rather than destinations.

In addition, we hypothesize that AI Overviews increased zero-click searches for informational queries in a nonlinear pattern. Specifically, we expect modest changes immediately after the feature was rolled out, reflecting lagged adoption, followed by a rapid increase as users grew more accustomed to relying on AI-generated summaries and came to see them as more trustworthy than links because they read as tailored responses. This prediction is based on industry research showing that AI Overviews reduce organic clicks by 50-60% (Ahrefs, Seer Interactive) and the broader trend toward zero-click searches documented by SparkToro, suggesting users are increasingly accepting AI-generated answers without clicking through to verify information from original sources.

## Data

### Ideal Dataset

The ideal dataset would include the following variables: **time period** (monthly from 2022-2025), **AI Overview era** (binary, pre/post the mid-2024 AI Overviews rollout), **zero-click rate** (% of searches with no external click), **organic CTR** (% clicking organic results), **query intent** (informational/navigational/transactional), **device type** (mobile/desktop), and **sample size**. We would need approximately 36-48 months of data with at least 12 pre-AIO and 18+ post-AIO monthly observations, each based on 100,000+ queries to ensure statistical reliability.

Ideally, this data would come directly from Google's internal search logs, but since that's not publicly available, the next best source would be large-scale clickstream panels (like Datos or SimilarWeb) that track consenting users via browser extensions. The data would be stored in tabular CSV format with one row per time period × query type × device type, including columns for zero-click rate, organic CTR, and sample size for easy statistical analysis.
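
To make that layout concrete, here is a minimal sketch of the table skeleton using only the Python standard library. The column names are our own placeholders rather than fields from any real clickstream product, and the metric cells are left blank because we do not have access to such a dataset.

```python
import csv
import io
from itertools import product

# Hypothetical column names for the ideal table; not taken from any real dataset.
COLUMNS = ["period", "query_intent", "device",
           "zero_click_rate", "organic_ctr", "sample_size"]
INTENTS = ["informational", "navigational", "transactional"]
DEVICES = ["mobile", "desktop"]


def month_range(start_year, end_year):
    """Return 'YYYY-MM' labels for every month from start_year to end_year inclusive."""
    return [f"{y}-{m:02d}" for y in range(start_year, end_year + 1) for m in range(1, 13)]


def empty_ideal_table(start_year=2022, end_year=2025):
    """Build one row per period x intent x device, with metric cells left blank."""
    rows = []
    for period, intent, device in product(month_range(start_year, end_year), INTENTS, DEVICES):
        rows.append({"period": period, "query_intent": intent, "device": device,
                     "zero_click_rate": "", "organic_ctr": "", "sample_size": ""})
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


table = empty_ideal_table()
csv_text = to_csv(table)
```

With 48 months, three intents, and two device types, the skeleton has 288 rows, which gives a sense of how many cells the ideal data source would need to fill.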

---

### Real Datasets

The tricky part about studying zero-click searches is that the detailed data is owned by companies like Semrush, and it's expensive to access. So instead of trying to measure zero-click rates directly, we're going to use a workaround with two free datasets. The idea is pretty simple: we'll use Google Trends to see how much people are searching for certain topics (the "input"), and Wikipedia Clickstream data to see how many people actually click through from Google to read about those topics (the "output"). If search interest stays the same but clicks go down, that gap tells us people are getting their answers without clicking—which is basically what zero-click means.

**1. Wikipedia Clickstream Data**

This data is free and available on the Wikimedia Dumps website (https://dumps.wikimedia.org/other/clickstream/), where you download the monthly files directly. The files are in TSV format, and we'd grab the ones from 2023 through 2025 to cover our timeline.

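The main variables we care about are **prev** (the referrer: either another Wikipedia article or an aggregated external source such as `other-search`), **curr** (which Wikipedia article they went to), and **n** (how many times that happened). The dumps do not name individual search engines, so we cannot filter for Google alone; instead we filter for `prev = 'other-search'`, which pools all search-engine referrals, and treat it as a proxy for Google traffic. Looking at informational articles like "Quantum_mechanics" or "Climate_change," we can then count how many people clicked from search results to those pages each month. If those numbers drop after AI Overviews launched, that's evidence that fewer people are clicking through from search results.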
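
A minimal sketch of that filtering step, assuming the monthly files are gzip-compressed, header-less TSVs with four columns (`prev`, `curr`, `type`, `n`). As far as we know, the public dumps pool search-engine referrals under `other-search` rather than naming individual engines, so the referrer is left as a parameter. The function names and sample counts below are ours and purely illustrative.

```python
import csv
import gzip
from collections import defaultdict


def search_referred_clicks(rows, articles, referer="other-search"):
    """Sum click counts per article for rows whose referrer matches.

    rows: iterable of [prev, curr, type, n] lists, as in the clickstream TSV.
    articles: set of article titles (underscored, e.g. "Climate_change").
    """
    totals = defaultdict(int)
    for row in rows:
        if len(row) != 4:
            continue  # skip malformed lines
        prev, curr, _link_type, n = row
        if prev == referer and curr in articles:
            totals[curr] += int(n)
    return dict(totals)


def read_clickstream(path):
    """Yield rows from one monthly dump file (gzip-compressed TSV, no header)."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: article titles may contain quote characters.
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


# Toy rows with made-up counts, only to show the shape of the data.
sample = [
    ["other-search", "Climate_change", "external", "1200"],
    ["other-empty", "Climate_change", "external", "300"],
    ["Global_warming", "Climate_change", "link", "45"],
    ["other-search", "Quantum_mechanics", "external", "800"],
]
monthly = search_referred_clicks(sample, {"Climate_change", "Quantum_mechanics"})
```

For a real month, `search_referred_clicks(read_clickstream(path), articles)` would stream a downloaded file without loading it all into memory; repeating that per month gives the time series we need.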

**2. Google Trends Data**

You can get this data from Google Trends (https://trends.google.com/) by downloading CSVs manually, or you can use the Python library `pytrends` to pull it automatically.
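
A rough sketch of the automated route. It assumes `pytrends` is installed and that Google still accepts its requests (it is an unofficial client, so rate limits or breakage are possible). The helper names are ours. Because multi-year requests typically come back at weekly resolution, the sketch also averages values by calendar month so they line up with the monthly clickstream files.

```python
from collections import defaultdict
from statistics import mean


def fetch_interest(keywords, timeframe="2023-01-01 2025-12-31"):
    """Query Google Trends through pytrends; returns a pandas DataFrame.

    Requires `pip install pytrends` and network access. No region filter
    is set, so results are worldwide.
    """
    from pytrends.request import TrendReq  # imported lazily: optional dependency

    client = TrendReq(hl="en-US", tz=360)
    client.build_payload(kw_list=keywords, timeframe=timeframe)
    return client.interest_over_time()


def monthly_mean(observations):
    """Average (date_string, value) pairs into {'YYYY-MM': mean value}."""
    buckets = defaultdict(list)
    for date_string, value in observations:
        buckets[date_string[:7]].append(value)
    return {month: mean(values) for month, values in sorted(buckets.items())}


# Toy weekly values (made up) to show the aggregation step.
weekly = [("2024-05-05", 60), ("2024-05-12", 70), ("2024-06-02", 50)]
by_month = monthly_mean(weekly)
```

Values from separate requests are each scaled to their own peak, so keywords we want to compare directly should be pulled in the same request where possible.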

The key variables are **date**, **keyword** (what people searched for), and **interest_over_time** (a number from 0-100 showing how popular that search term was relative to its own peak). We'll use this to check that people were still searching for the same topics during 2023-2025. If Google Trends shows that interest in "Quantum mechanics" stayed steady but Wikipedia clicks for that article dropped, that pattern is consistent with AI Overviews answering people's questions directly instead of sending them to websites, although on its own it does not rule out other causes.
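
One way to put the two series on a common footing is to index each to a pre-launch baseline and then take their ratio. The sketch below is our own illustrative construction, not an established metric, and the numbers are made up: a ratio near 1 means clicks moved in step with interest, while a ratio below 1 means clicks fell relative to interest.

```python
def index_to_baseline(series, baseline_months):
    """Rescale a {'YYYY-MM': value} series so the baseline-period mean equals 100."""
    baseline = sum(series[m] for m in baseline_months) / len(baseline_months)
    return {month: 100 * value / baseline for month, value in series.items()}


def click_per_interest(clicks, interest, baseline_months):
    """Ratio of indexed clicks to indexed interest, for months present in both."""
    clicks_idx = index_to_baseline(clicks, baseline_months)
    interest_idx = index_to_baseline(interest, baseline_months)
    shared = sorted(set(clicks_idx) & set(interest_idx))
    return {m: clicks_idx[m] / interest_idx[m] for m in shared if interest_idx[m] > 0}


# Made-up numbers for one article: interest is flat while clicks fall.
clicks = {"2024-01": 1000, "2024-02": 1000, "2024-07": 700}
interest = {"2024-01": 50, "2024-02": 50, "2024-07": 50}
ratio = click_per_interest(clicks, interest, baseline_months=["2024-01", "2024-02"])
```

Indexing both series sidesteps the fact that Trends reports relative interest and the clickstream reports raw counts, but it cannot recover changes in absolute search volume.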


## Ethics 

Instructions: Keep the contents of this cell. For each item on the checklist
-  put an X there if you've considered the item
-  IF THE ITEM IS RELEVANT place a short paragraph after the checklist item discussing the issue.
  
Items on this checklist are meant to provoke discussion among good-faith actors who take their ethical responsibilities seriously. Your teams will document these discussions and decisions for posterity using this section.  You don't have to solve these problems, you just have to acknowledge any potential harm no matter how unlikely.

Here is a [list of real world examples](https://deon.drivendata.org/examples/) for each item in the checklist that you can refer to.

[![Deon badge](https://img.shields.io/badge/ethics%20checklist-deon-brightgreen.svg?style=popout-square)](http://deon.drivendata.org/)

### A. Data Collection
 - [X] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?

    >Our datasets present an interesting informed consent situation. While Wikipedia Clickstream and Google Trends data contain no personally identifiable information, the clickstream data originates from real users' browsing behavior — people who navigated from Google to Wikipedia without knowing their behavior would be aggregated and used in research. Wikimedia publishes this data openly under their privacy policy, which users implicitly accept, but true affirmative opt-in consent was never obtained. We consider this an acceptable tradeoff given the data is fully anonymized and aggregated before we ever access it, meaning no individual can be identified. Additionally, there is a reasonable argument that most internet users today have a general awareness that their searches are not private — Google Trends itself is a widely used public tool that many people interact with directly. This awareness has arguably been further reinforced culturally through true crime media, where high-profile cases like Casey Anthony and the Idaho student murders brought mainstream attention to the fact that search histories can be subpoenaed and used as evidence. Whether or not users formally consented, there is broad public understanding that online search behavior leaves a traceable record. Google Trends data raises fewer concerns as it is a deliberately public-facing tool that Google explicitly designed for research and analysis purposes.

 - [X] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?

    >Our analysis is intentionally scoped to English-language searches using English Wikipedia Clickstream data, which allows for a more controlled comparison while acknowledging this limits generalizability to non-English-speaking populations. Within this scope, our most significant source of collection bias stems from using Wikipedia Clickstream as a proxy for zero-click behavior. Wikipedia attracts a specific type of user (generally educated, English-speaking, and seeking encyclopedic information), meaning our analysis inherently reflects only a narrow subset of informational queries. Someone searching "how do I renew my license" or "best pizza near me" will never land on Wikipedia regardless of whether AI Overviews exist, so our measure of zero-click behavior is really only valid for a particular class of curiosity-driven informational queries. Additionally, Google Trends normalizes all data to a scale of 0-100 relative to peak interest rather than showing absolute search volume, meaning if overall search volume grew significantly after AI Overviews launched, Trends would not capture that, potentially masking the true scale of behavioral change. Finally, we must be careful about which Wikipedia articles we select, as certain topics introduce noise that has nothing to do with AI Overviews. For example, a celebrity death, a breaking news event, or a viral moment during 2023-2025 would cause an artificial spike or crash in clicks that would contaminate our results. To mitigate this, we will intentionally exclude any topics that experienced major real-world events during our study period and instead restrict our selection to stable, evergreen topics such as major companies, celebrities who were alive throughout 2023-2025, and well-established historical events. These topics have consistent baseline search interest, making them a much cleaner signal for detecting whether AI Overviews are intercepting traffic.

 - [X] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?

    >Our datasets contain no personally identifiable information at any level. Wikipedia Clickstream data is purely aggregate click counts with no user or location data attached. While Google Trends allows filtering by region, we will not be using location-based filtering in our analysis, meaning we never interact with even aggregate location data. There is no PII exposure risk in this project.

 - [X] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

### B. Data Storage
 - [X] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?
 - [X] **B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed?
 - [X] **B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed?
    >Data files will be deleted from local machines after the project concludes in March 2026.

### C. Analysis
 - [X] **C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)?
    >We did not formally engage with outside stakeholders or domain experts during the development of this project. The perspectives most notably absent are those of anyone who creates content for the web and depends on organic search traffic to reach an audience. This includes bloggers, journalists, writers, photographers, UX designers, and anyone who has built a website, whether through custom development or platforms like Wix or Squarespace. These are the people most directly and economically harmed by the trends we are studying, as reduced organic clicks mean fewer people ever reach their work. SEO professionals who track click-through rates as part of their work would also have been valuable consultants for validating our methodology. We acknowledge these missing perspectives as a limitation of our analysis, and note that our conclusions should be interpreted with the understanding that they were not reviewed by those with firsthand experience of the phenomenon we are studying.

 - [X] **C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)?

    >Our most significant dataset bias is that we are exclusively using English Wikipedia Clickstream data and English-language Google Trends queries, meaning our findings only reflect the behavior of English-speaking users. This is an intentional scope decision rather than an oversight, but it does mean our conclusions cannot be generalized to non-English-speaking populations who may interact with AI Overviews very differently. Confirmation bias is also worth acknowledging — given that our hypothesis already predicts that AI Overviews will reduce organic clicks, we must be careful not to selectively interpret our findings in ways that support that conclusion. We will mitigate this by letting the statistical tests drive our conclusions rather than cherry-picking time periods or topics that support our hypothesis. There are also meaningful confounding variables that could explain drops in Wikipedia clicks that have nothing to do with AI Overviews. One notable example is the growing awareness around source reliability and media literacy — as more people learn about evaluating credible sources, Wikipedia's reputation as an anyone-can-edit platform may lead fewer people to click through to it regardless of whether AI Overviews exist. This cultural shift toward skepticism of Wikipedia as a reliable source could independently reduce click traffic in our study period and be mistakenly attributed to AI Overviews. We will attempt to account for confounding variables by comparing trends across multiple topics and looking for consistent patterns rather than relying on any single data point.

 - [X] **C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data?
    >We are committed to honestly representing our data in all visualizations and summary statistics. This means we will not manipulate axis scales, cherry-pick time ranges, or present results in misleading ways. If our findings do not support our hypothesis, we will report that honestly rather than adjusting our analysis to fit our expectations.

 - [X] **C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis?
 - [X] **C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future?
    >Our entire analysis will be documented and reproducible through our public GitHub repository. All data cleaning, wrangling, and statistical analysis will be conducted in Jupyter notebooks with clear explanations of each step. Anyone should be able to clone our repository, download the same publicly available datasets from Wikipedia Clickstream and Google Trends, and reproduce our exact results.

### D. Modeling
 - [X] **D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory?
 - [X] **D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)?
 - [X] **D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics?
 - [X] **D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed?
 - [X] **D.5 Communicate limitations**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?

### E. Deployment
 - [X] **E.1 Monitoring and evaluation**: Do we have a clear plan to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)?
 - [X] **E.2 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)?
 - [X] **E.3 Roll back**: Is there a way to turn off or roll back the model in production if necessary?
 - [X] **E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed?


## Team Expectations 

* **Communication**: We will communicate primarily through iMessage group chat and meet via FaceTime or in person as needed. All members are expected to respond to messages within 24 hours and notify the group in advance if they cannot attend a scheduled meeting.

* **Equal Contribution**: Each member will contribute equally in terms of effort across all aspects of the project, including research, coding, writing, and editing. We will rotate responsibilities so everyone gains experience in different areas.

* **Tone and Respect**: We will communicate in a blunt but polite manner, using "I statements" when giving feedback (e.g., "I think X might be a problem because Y"). We will assume all criticism is well-intentioned and aimed at improving the project.

* **Task Management**: We will use GitHub to track tasks and deadlines. Tasks will be assigned fairly based on each member's strengths and availability. If someone is struggling with their assigned task, they should notify the group within 48 hours so we can redistribute work if needed.

* **Decision Making**: Major decisions will be made by majority vote. For urgent decisions when someone is unresponsive, the available members can proceed and update the group afterward.

* **Accountability**: If a team member is not meeting expectations, we will first address it directly with them via text message, giving them one week to improve with specific deliverables outlined. The team will work together to find solutions and redistribute tasks as necessary.

* **Deadlines**: We will set internal deadlines 2-3 days before official course deadlines to allow time for review and revision. All members agree to meet these internal deadlines.

## Project Timeline Proposal


**Special Resources/Training Needed:**
We'll need to learn some statistical methods for comparing data before and after AI Overviews launched (like t-tests and chi-square tests), which might go a bit beyond what's covered in class. We'll also work on creating clear visualizations to show trends over time.
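
As a starting point for the chi-square piece, here is a small sketch of a Pearson chi-square test on a 2x2 table (pre/post launch by clicked/zero-click), written with only the standard library so the arithmetic is visible. It assumes we have counts of clicked and zero-click searches for each period, which our free datasets do not provide directly, and the counts shown are hypothetical. No continuity correction is applied.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    table: [[a, b], [c, d]], e.g. rows = pre/post, columns = click/zero-click.
    Returns (chi2, p_value) with 1 degree of freedom, no continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, for illustration only (not real measurements).
pre_post = [[400, 600],   # pre-launch: clicked, zero-click
            [300, 700]]   # post-launch: clicked, zero-click
chi2, p_value = chi_square_2x2(pre_post)
```

In the actual notebook we would more likely call `scipy.stats.chi2_contingency` (with `correction=False` to match this calculation) instead of hand-rolling the test.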


| Meeting Date | Meeting Time | Completed Before Meeting | Discuss at Meeting |
|--------------|--------------|--------------------------|-------------------|
| 1/15 | 12 PM | NA | Determine best form of communication |
| 1/23 | 10 AM | Brainstorm topics/questions (All) | Brainstorm topics/questions (All) |
| 1/26 | 10 AM | Do background research on topic (Ali, Rodayna); Draft ethics considerations (Sabine) | Discuss ideal dataset(s) and ethics; draft project proposal |
| 2/4 | 10 AM | Edit, finalize, and submit proposal (Maanav, Tessa); Search for datasets (All) | Discuss Wrangling and possible analytical approaches; Assign group members to lead each specific part |
| 2/4 | Before 11:59 PM | NA | Turn in Project Proposal |
| 2/6 | Before 11:59 PM | NA | (Optional) Week 5 group progress survey |
| 2/13 | Before 11:59 PM | NA | (Optional) Week 6 group progress survey |
| 2/14 | 6 PM | Import & Wrangle Data (Maanav); Complete initial EDA with 3+ visualizations (Rodayna) | Review/Edit wrangling/EDA; Discuss Analysis Plan; Identify any data quality issues |
| 2/14 | Before 11:59 PM | NA | Turn in Data Checkpoint |
| 2/20 | Before 11:59 PM | NA | (Optional) Week 7 group progress survey |
| 2/23 | 12 PM | Finalize wrangling/EDA (Maanav, Rodayna); Begin statistical analysis comparing pre/post AI Overview periods (Sabine, Tessa) | Discuss/edit Analysis; Review preliminary results; Complete project check-in |
| 2/27 | Before 11:59 PM | NA | (Optional) Week 8 group progress survey |
| 3/2 | 12 PM | Complete hypothesis testing and correlation analysis (Sabine, Tessa); Create initial results visualizations (Rodayna); Review code quality (Maanav) | Review statistical findings; Troubleshoot any analytical issues; Assign final visualization tasks; Discuss interpretation approach |
| 3/4 | Before 11:59 PM | NA | Turn in EDA Checkpoint |
| 3/6 | Before 11:59 PM | NA | (Optional) Week 9 group progress survey |
| 3/9 | 12 PM | Finalize all analysis and visualizations (Sabine, Tessa, Rodayna); Draft results/conclusion/discussion sections (Ali); Polish notebook formatting (Maanav) | Review complete analysis; Edit results and discussion; Outline video presentation; Assign video responsibilities |
| 3/13 | 12 PM | Complete ethics section updates (Sabine); Finalize all written sections (Ali, Tessa); Record individual video segments (All) | Integrate video segments; Final review of complete project; Address any remaining issues |
| 3/13 | Before 11:59 PM | NA | (Optional) Week 10 group progress survey |
| 3/16 | 6 PM | Polish final notebook (Maanav); Complete video editing (Ali, Rodayna); Final proofreading (Tessa) | Final quality check; Submit early for review buffer |
| 3/18 | Before 11:59 PM | NA | Turn in Final Project & Video |