# Google Search Results for Various Geographical Locations

**Author: Eni Mustafaraj**  
**January 2022**

Researchers who audit search engines need to search Google while pretending to be in many different locations, in order to see how the algorithm behaves and how the quality of search results varies across the country.

This notebook uses a CSV file that contains geo-locations for all counties in the United States in the form of latitude and longitude values. These values are used to change the geolocation of the Chrome browser, via Selenium. Once this is done, the search results on Google Search will typically reflect the results for that location.

**Table of Contents**

1. [Get the geolocation data](#sec1)
2. [Randomly pick some locations](#sec2)
3. [Set up the Selenium driver](#sec3)
4. [Perform searches](#sec4)

<a id="sec1"></a>
## 1. Get the geolocation data

This data is in a CSV file associated with the paper "Auditing local news presence on Google News".

In [2]:
import pandas as pd

In [3]:
df = pd.read_csv('county_geocodes_utf.csv') # the file contains latitudes and longitudes of county seats in all 50 states
df.head()

| Unnamed: 0.1 | Unnamed: 0 | sno | State | FIPS | County | CountySeat | Population | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 606 | 607 | IL | 17023 | Clark | Marshall | 16335 | 39.332364 | -87.791687 |
| 1 | 875 | 876 | IA | 19173 | Taylor | Bedford | 6317 | 40.737949 | -94.697108 |
| 2 | 1329 | 1330 | MN | 27031 | Cook | Grand Marais | 5176 | 47.538571 | -90.29019 |
| 3 | 2289 | 2290 | PA | 42091 | Montgomery | Norristown | 799874 | 40.209999 | -75.370201 |
| 4 | 2818 | 2819 | VT | 50025 | Windham | Newfane | 44513 | 42.999143 | -72.716335 |

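Columns named `Unnamed: 0` are what `pd.read_csv` produces for a column whose header cell is empty, which typically happens when a DataFrame was written with `to_csv()` while keeping its index. The sketch below uses a tiny made-up table, not the real file, to show that passing `index_col=0` turns such a column back into the index. Whether this applies to `county_geocodes_utf.csv` depends on how that file was produced, which is an assumption here.

```python
import io

import pandas as pd

# A tiny, made-up table standing in for county_geocodes_utf.csv
original = pd.DataFrame({
    "State": ["IL", "IA"],
    "County": ["Clark", "Taylor"],
    "Latitude": [39.332364, 40.737949],
    "Longitude": [-87.791687, -94.697108],
})

# to_csv() writes the index by default, as a first column with an empty header
buffer = io.StringIO()
original.to_csv(buffer)

# Reading it back naively: the old index reappears as a column named "Unnamed: 0"
buffer.seek(0)
naive = pd.read_csv(buffer)
print(list(naive.columns))     # ['Unnamed: 0', 'State', 'County', 'Latitude', 'Longitude']

# Reading it back with index_col=0: that column becomes the index again
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
print(list(restored.columns))  # ['State', 'County', 'Latitude', 'Longitude']
```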

Let's organize the table a bit better by removing two columns (`sno` and `Unnamed: 0`) and sorting it by state:

In [5]:
dfClean = df.drop(columns=['sno', 'Unnamed: 0'])
dfClean.sort_values('State', inplace=True)
dfClean.head()

| Unnamed: 0 | State | FIPS | County | CountySeat | Population | Latitude | Longitude |
|---|---|---|---|---|---|---|---|
| 2400 | AK | 2150 | Kodiak Island | Kodiak | 13592 | 57.553611 | -153.630911 |
| 1558 | AK | 2275 | Wrangell | Wrangell | 2369 | 56.279121 | -132.040326 |
| 424 | AK | 2290 | Yukon-Koyukuk [4] | Yukon-Koyukuk | 5588 | 65.376131 | -151.576855 |
| 869 | AK | 2270 | Wade Hampton [4] | Wade Hampton | 7459 | 62.283174 | -163.19095 |
| 885 | AK | 2195 | Petersburg [4] | Petersburg | 3815 | 56.639612 | -133.527996 |


<a id="sec2"></a>
## 2. Randomly pick some locations

In [4]:
import random
random.seed(42) # seed the generator so that the same sample is drawn on every run

In [5]:
indices = random.sample(range(dfClean.shape[0]), 10)
indices

[2619, 456, 102, 3037, 1126, 1003, 914, 571, 3016, 419]
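`random.seed(42)` seeds the module-level generator that every call to `random` shares, so the sample is only reproducible if nothing else draws from that generator between the seed and `random.sample`. A minimal alternative, assuming you want the draw isolated from the rest of the notebook, is a dedicated `random.Random` instance. `pick_indices` is a hypothetical helper and `n_rows` is a hypothetical stand-in for `dfClean.shape[0]`; this sketch does not claim to reproduce the exact indices printed above.

```python
import random

def pick_indices(n_rows, k, seed=42):
    """Draw k distinct positions from range(n_rows) with a private generator."""
    rng = random.Random(seed)  # independent of the global random state
    return rng.sample(range(n_rows), k)

n_rows = 3142  # hypothetical row count standing in for dfClean.shape[0]

first = pick_indices(n_rows, 10)
random.random()  # an unrelated draw from the global generator...
second = pick_indices(n_rows, 10)
print(first == second)  # ...does not change the sample: True
```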

In [6]:
ourChoices = dfClean[dfClean.index.isin(indices)]
ourChoices

| Unnamed: 0 | State | FIPS | County | CountySeat | Population | Latitude | Longitude |
|---|---|---|---|---|---|---|---|
| 456 | AL | 1059 | Franklin | Russellville | 31704 | 34.441988 | -87.842815 |
| 3016 | GA | 13177 | Lee | Leesburg | 28298 | 31.818419 | -84.146681 |
| 914 | LA | 22105 | Tangipahoa | Amite | 121097 | 30.621581 | -90.406633 |
| 3037 | MA | 25025 | Suffolk | Boston | 722023 | 42.33196 | -71.020173 |
| 1003 | MI | 26109 | Menominee | Menominee | 24029 | 45.544174 | -87.509892 |
| 419 | ND | 38037 | Grant | Carson | 2394 | 46.357827 | -101.639049 |
| 571 | NY | 36001 | Albany | Albany | 304204 | 42.588271 | -73.974014 |
| 102 | OH | 39017 | Butler | Hamilton | 368130 | 39.439915 | -84.565397 |
| 1126 | TX | 48199 | Hardin | Kountze | 54635 | 30.329612 | -94.393149 |
| 2619 | VA | 51019 | Bedford | Bedford | 68676 | 37.312408 | -79.527947 |
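`random.sample(range(dfClean.shape[0]), 10)` draws *positions*, while `dfClean.index.isin(indices)` matches index *labels*. The two coincide here because `read_csv` labelled the rows 0 to N-1 and neither `drop(columns=...)` nor `sort_values` removes rows. If rows had been filtered out first, some drawn numbers would match no label and fewer than 10 rows would come back. The sketch below shows the pitfall on a made-up frame, along with two alternatives: `iloc` for positional selection, and pandas' own `DataFrame.sample(n=..., random_state=...)`. Note that `DataFrame.sample` uses a different random procedure, so it would not reproduce the ten counties above.

```python
import pandas as pd

# A made-up frame with 20 rows labelled 0..19
frame = pd.DataFrame({"County": [f"County {i}" for i in range(20)]})

# Simulate an earlier filtering step: only the even labels 0, 2, ..., 18 remain
filtered = frame[frame.index % 2 == 0]

# Positions as random.sample(range(filtered.shape[0]), 3) might return them
positions = [1, 3, 4]

# Label matching: of 1, 3, 4 only label 4 still exists, so one row comes back
by_label = filtered[filtered.index.isin(positions)]

# Positional selection: rows at positions 1, 3, 4 (labels 2, 6, 8)
by_position = filtered.iloc[positions]

# pandas' sampler: exactly n rows, reproducible through random_state
by_sample = filtered.sample(n=3, random_state=42)

print(len(by_label), len(by_position), len(by_sample))  # 1 3 3
```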


We will be searching Google by pretending to be in one of these locations.

<a id="sec3"></a>
## 3. Set up the Selenium Driver

I have tested the following code with Chrome version 97. ChromeDriver, whose major version must match that of your Chrome browser, can be downloaded from [here](https://chromedriver.chromium.org/downloads).

In [13]:
import selenium
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.chrome.options import Options
from time import sleep

Let's create an instance of the driver for testing:

In [15]:
from selenium.webdriver.chrome.service import Service

# Set the driver path
driverpath = '../driver/chromedriver'

chrome_options = webdriver.ChromeOptions()
# Allow sites to use geolocation (1 = allow); this is what lets us change the location later
chrome_options.add_experimental_option("prefs", {"profile.default_content_settings.geolocation": 1})

# Create the driver instance (in Selenium 4, the driver path is passed through a Service object)
driver = webdriver.Chrome(service=Service(driverpath),
                          options=chrome_options)

# Search for the phrase 'pizza'
driver.get("https://google.com/search?q=pizza")



If you run this code from Wellesley, the results should show pizza places in Wellesley, as in the screenshot below.

<img src="pizza.png" width="600">

Meanwhile, at the bottom of the results page, Google shows the approximate location it is using, together with an "Update location" link (see screenshot):

<img src="location1.png" width=450>

Our code below uses this link: it first overrides the browser's geolocation with coordinates of our choice, and then has Selenium click "Update location" so that Google picks up the new location.

In [16]:
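from selenium.webdriver.common.by import By

# Coordinates for Albany County, NY
coordDict = {'latitude': 42.588271, 'longitude': -73.974014, 'accuracy': 100}
# Override the browser's geolocation through the Chrome DevTools Protocol
driver.execute_cdp_cmd("Emulation.setGeolocationOverride", coordDict)
# Click the "Update location" control (an element whose tag name is update-location)
driver.find_element(By.CSS_SELECTOR, "update-location").click()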



If we look at the browser, two things have changed:
1. the pizza places listed are different
2. the location shown at the bottom of the page is different as well

See screenshots:

<p><img src="pizza2.png" width=600><img src="location2.png" width=450></p>

If we look up the ZIP code now shown at the bottom of the page, we find that it belongs to the town of Voorheesville, in Albany County, NY.

<img src="zipcode.png" width=650>

Two of the pizzerias listed in the results are in Voorheesville, which indicates that the location override is working.

In [10]:
# Shut down the browser and the driver process (close() would only close the current window)
driver.quit()

Now that we know that this works, we can package everything into a single function.
We want the function to be able to:

1. take different locations
2. take different query phrases
3. save the page in a file (for later processing)

The function below has three parameters, one for each of these needs.

In [11]:
import os
from time import sleep
from urllib.parse import quote_plus
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.common.exceptions import WebDriverException

def search_geolocation(query, coordinatesDict, locationName):
    """
    Search Google for `query` after overriding the browser's location.
    Parameters:
    query - a string that contains the phrase that will be searched
    coordinatesDict - a dictionary with the latitude, longitude, and accuracy
    locationName - a string that is used to name the saved search results page
    """
    # Create a new instance of the driver for every search
    driver = webdriver.Chrome(service=Service(driverpath),
                              options=chrome_options)
    try:
        # Set up the new coordinates
        driver.execute_cdp_cmd("Emulation.setGeolocationOverride",
                               coordinatesDict)

        # Perform the search first: the "Update location" link only appears on a results page
        url = f"https://google.com/search?q={quote_plus(query)}"
        driver.get(url)

        # Find and click the link that updates the location
        try:
            driver.find_element(By.CSS_SELECTOR, "update-location").click()
        except WebDriverException:
            # Sometimes the page has not finished loading, so wait and try again
            sleep(2)
            driver.find_element(By.CSS_SELECTOR, "update-location").click()
        # Wait for the new content to be loaded
        sleep(2)

        # Access the content of the page
        htmlPage = driver.page_source

        # Create a folder named after the query if needed, then save the file
        os.makedirs(query, exist_ok=True)
        with open(f"{query}/{locationName}.html", 'w', encoding='utf-8') as output:
            output.write(htmlPage)
    finally:
        # Shut down the browser even if one of the steps above failed
        driver.quit()
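The `try`/`except` with `sleep(2)` in `search_geolocation` gives the page exactly one extra chance to load. A more general pattern is to keep polling until a call succeeds or a deadline passes. The helper below, `retry_until`, is a hypothetical, library-independent sketch of that idea, not a Selenium function; within Selenium itself, `WebDriverWait` combined with `expected_conditions.element_to_be_clickable` serves the same purpose.

```python
import time

def retry_until(action, timeout=10.0, interval=0.5, exceptions=(Exception,)):
    """Call action() repeatedly until it succeeds or `timeout` seconds pass.

    Returns whatever action() returns. Only errors listed in `exceptions`
    are retried; once the deadline is reached, the last one is re-raised.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return action()
        except exceptions:
            if time.monotonic() + interval > deadline:
                raise  # out of time: surface the most recent failure
            time.sleep(interval)

# Demo with a stand-in for the click that fails twice before succeeding
attempts = []

def flaky_click():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("element not ready yet")
    return "clicked"

result = retry_until(flaky_click, timeout=2.0, interval=0.01)
print(result, len(attempts))  # clicked 3
```

Inside `search_geolocation`, the click would be wrapped in a `lambda` and passed as `action`, ideally with `exceptions` narrowed to Selenium's own exception types so that unrelated bugs are not retried silently.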

Let's test this function:

In [12]:
coordDict = {'latitude': 42.588271, 'longitude': -73.974014, 'accuracy': 100}
query = "pizza"
location="Albany_County"

search_geolocation(query, coordDict, location)

Check the notebook's folder. It should now contain a new directory, "pizza", holding the file "Albany_County.html" with the pizza results we saw earlier in the notebook.
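`search_geolocation` uses the query and the location name directly as folder and file names. Queries such as "supreme court" contain spaces, and location names built from the data can carry stray whitespace: in Section 4, the name for Bedford, VA prints with a trailing space. A small helper like the hypothetical `safe_name` below could normalize such strings before they reach the file system; the exact character rules are an assumption, and the URL should still be built from the original query.

```python
import re

def safe_name(text):
    """Turn a query or location label into a conservative file/folder name.

    Hypothetical helper: trims surrounding whitespace, replaces runs of
    whitespace with underscores, and drops characters outside [A-Za-z0-9_.-].
    """
    text = text.strip()
    text = re.sub(r"\s+", "_", text)
    return re.sub(r"[^A-Za-z0-9_.-]", "", text)

print(safe_name("supreme court"))  # supreme_court
print(safe_name("VA_Bedford "))    # VA_Bedford
print(safe_name("covid tests?"))   # covid_tests
```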

<a id="sec4"></a>
## 4. Perform searches

Now that the code works and we have a function that can be called with various parameters, let's search for a few queries in the random locations we picked above. First, we need to get the data from the dataframe and prepare it as parameters for the search function:

In [13]:
for ind in ourChoices.index:
    row = ourChoices.loc[ind].to_dict()
    fileName = f"{row['State']}_{row['CountySeat']}"
    locationsDct = {'latitude': row['Latitude'], 
                    'longitude': row['Longitude'], 
                    'accuracy': 100}
    print(fileName, locationsDct)

AL_Russellville {'latitude': 34.441988, 'longitude': -87.842815, 'accuracy': 100}
GA_Leesburg {'latitude': 31.818419, 'longitude': -84.146681, 'accuracy': 100}
LA_Amite {'latitude': 30.621581, 'longitude': -90.406633, 'accuracy': 100}
MA_Boston {'latitude': 42.33196, 'longitude': -71.020173, 'accuracy': 100}
MI_Menominee {'latitude': 45.544174, 'longitude': -87.50989200000002, 'accuracy': 100}
ND_Carson {'latitude': 46.357827, 'longitude': -101.639049, 'accuracy': 100}
NY_Albany {'latitude': 42.588271, 'longitude': -73.974014, 'accuracy': 100}
OH_Hamilton {'latitude': 39.439915, 'longitude': -84.56539699999998, 'accuracy': 100}
TX_Kountze {'latitude': 30.329612, 'longitude': -94.393149, 'accuracy': 100}
VA_Bedford  {'latitude': 37.31240800000001, 'longitude': -79.527947, 'accuracy': 100}
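Values such as `-87.50989200000002` in this output are ordinary floating-point representation noise, and nothing checks that the coordinates are in range before they are sent to the browser. The hypothetical helper below (`to_override` is not part of the notebook) validates the ranges and rounds to six decimal places, which for latitude is roughly 0.1 m, far finer than the `accuracy` of 100 used here (the browser's Geolocation API expresses accuracy in meters). The keys mirror the dictionary passed to `Emulation.setGeolocationOverride` above.

```python
def to_override(latitude, longitude, accuracy=100, digits=6):
    """Build a geolocation-override dictionary from raw coordinates.

    Hypothetical helper: checks that the values are valid coordinates and
    rounds them to `digits` decimal places to drop float noise.
    """
    lat, lon = float(latitude), float(longitude)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if accuracy <= 0:
        raise ValueError(f"accuracy must be positive: {accuracy}")
    return {"latitude": round(lat, digits),
            "longitude": round(lon, digits),
            "accuracy": accuracy}

print(to_override(45.544174, -87.50989200000002))
# {'latitude': 45.544174, 'longitude': -87.509892, 'accuracy': 100}
```

In the loop above, `locationsDct = to_override(row['Latitude'], row['Longitude'])` would replace the hand-built dictionary.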


This works fine, so we'll go and perform the searches now.

We'll search for three queries: "supreme court", "covid tests", "vaccine mandate".

In [14]:
for query in ["supreme court", "covid tests", "vaccine mandate"]:
    
    for ind in ourChoices.index:
        row = ourChoices.loc[ind].to_dict()
        fileName = f"{row['State']}_{row['CountySeat']}"
        locationsDct = {'latitude': row['Latitude'], 
                        'longitude': row['Longitude'], 
                        'accuracy': 100}
        
        search_geolocation(query, locationsDct, fileName)
        
        print(query, fileName)

supreme court AL_Russellville
supreme court GA_Leesburg
supreme court LA_Amite
supreme court MA_Boston
supreme court MI_Menominee
supreme court ND_Carson
supreme court NY_Albany
supreme court OH_Hamilton
supreme court TX_Kountze
supreme court VA_Bedford 
covid tests AL_Russellville
covid tests GA_Leesburg
covid tests LA_Amite
covid tests MA_Boston
covid tests MI_Menominee
covid tests ND_Carson
covid tests NY_Albany
covid tests OH_Hamilton
covid tests TX_Kountze
covid tests VA_Bedford 
vaccine mandate AL_Russellville
vaccine mandate GA_Leesburg
vaccine mandate LA_Amite
vaccine mandate MA_Boston
vaccine mandate MI_Menominee
vaccine mandate ND_Carson
vaccine mandate NY_Albany
vaccine mandate OH_Hamilton
vaccine mandate TX_Kountze
vaccine mandate VA_Bedford 
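The loop above opens a fresh browser for each of the 30 query/location pairs, one after another. If a run is interrupted, or if you want to space out the requests, a variant that skips pairs whose HTML file already exists and pauses between searches can help. This is a hypothetical sketch: `run_searches`, its `search` parameter, and the delay values are assumptions. `search` is passed in so that the real `search_geolocation`, or a stand-in as in the demo, can be used; the skip check assumes the `f"{query}/{locationName}.html"` layout that `search_geolocation` writes.

```python
import os
import random
import tempfile
import time

def run_searches(queries, locations, search, min_delay=2.0, max_delay=5.0):
    """Run every query at every location, skipping results already on disk.

    `locations` maps a file name (e.g. "NY_Albany") to a coordinates dict.
    `search(query, coords, name)` is expected to write f"{query}/{name}.html".
    Returns the list of (query, name) pairs that were actually searched.
    """
    done = []
    for query in queries:
        for name, coords in locations.items():
            if os.path.exists(os.path.join(query, f"{name}.html")):
                continue  # saved in an earlier run
            search(query, coords, name)
            done.append((query, name))
            # random pause between searches (an assumed, tunable delay)
            time.sleep(random.uniform(min_delay, max_delay))
    return done

def fake_search(query, coords, name):
    """Stand-in for search_geolocation that just writes a placeholder file."""
    os.makedirs(query, exist_ok=True)
    with open(os.path.join(query, f"{name}.html"), "w", encoding="utf-8") as f:
        f.write(f"<html>{coords}</html>")

locations = {"NY_Albany": {"latitude": 42.588271, "longitude": -73.974014, "accuracy": 100}}
previous_dir = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)  # keep the demo files out of the notebook's folder
    try:
        first_run = run_searches(["pizza"], locations, fake_search, 0, 0)
        second_run = run_searches(["pizza"], locations, fake_search, 0, 0)
    finally:
        os.chdir(previous_dir)

print(first_run, second_run)  # [('pizza', 'NY_Albany')] []
```

In the notebook, `locations` could be built from `ourChoices` with the same `fileName`/`locationsDct` logic as above, and `search_geolocation` passed as `search`.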
