# Geoguessr Image Analysis
## Mason Mines: Up-and-Coming Geoguessr Fan

**Introduction:** Geoguessr is a game in which players are dropped into a random Google Street View location somewhere on the globe and tasked with identifying where they are from their surroundings. Points are awarded based on the distance between the guessed and actual locations (from 0 to 5,000 points, with 5,000 meaning the guess landed essentially on top of the actual location).

**Purpose:** By analyzing Street View data, I hope to better understand global locations and their common attributes. At the very least, I hope to learn more about the locations chosen during games and gain a competitive edge.

Code and analysis were done in JupyterLab, as I believe it is a better medium for showing my design process alongside the code.

In [79]:
import pandas as pd

data = pd.read_csv('../data/coords.csv')

# Display the first few rows of the data
print(data.head())

# Optional: Check basic information about the data
print(data.info())


   20.82488495242425  -98.4995168750031
0          -3.451752         -54.563937
1         -23.496464         -47.460542
2         -16.548678         -72.852778
3         -35.010870         140.064397
4         -14.223667         -43.753704
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 2 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   20.82488495242425  9999 non-null   float64
 1   -98.4995168750031  9999 non-null   float64
dtypes: float64(2)
memory usage: 156.4 KB
None


### Initial Data Analysis:
**Image Data** 9,999 img files (type: .png)

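**Coordinate Data** 9,999 x 2 DataFrame of latitude/longitude pairs (type: float64 x2). Note that `read_csv` treated the first coordinate pair in the file as a header row, so it shows up as the column names above rather than as data.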


## Basic Data Cleaning

Includes assigning proper column names.


In [80]:
# Assign proper column names to the DataFrame
data.columns = ['latitude', 'longitude']
data.head()

    latitude   longitude
0  -3.451752  -54.563937
1 -23.496464  -47.460542
2 -16.548678  -72.852778
3 -35.010870  140.064397
4 -14.223667  -43.753704
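
Because the column names are assigned only after loading, the first coordinate pair in the file (20.82488495242425, -98.4995168750031) was consumed as a header and is not among the 9,999 rows. A minimal sketch of one way to avoid that, assuming `coords.csv` really has no header row, is to pass `header=None` and `names=` to `read_csv`. The in-memory `sample_csv` below is a stand-in for the real file, not the actual data.

```python
import io

import pandas as pd

# Stand-in for ../data/coords.csv: two header-less rows (illustrative sample only)
sample_csv = io.StringIO(
    "20.82488495242425,-98.4995168750031\n"
    "-3.451752,-54.563937\n"
)

# header=None keeps the first line as data; names= labels the columns at load time
coords = pd.read_csv(sample_csv, header=None, names=["latitude", "longitude"])

print(coords.shape)
print(coords.head())
```

If the image files are numbered by CSV row, dropping the first row would also shift every DataFrame index by one relative to the images, so this is worth checking against the dataset.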


## Adding Country Data

Ideally, we would be able to map each coordinate pair to a country, so that we can check whether a guess matches the actual location. We can do this through **reverse geocoding**, which looks up a location's information from its latitude and longitude. I did this with the `reverse_geocode` package.
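
To make the idea concrete, here is a rough sketch of offline reverse geocoding as a nearest-neighbor lookup: find the known place closest to the query point and return its country. This is only a conceptual illustration, not the package's actual implementation. The `PLACES` table (with approximate city coordinates), `haversine_km`, and `nearest_country` are all hypothetical names made up for this example.

```python
import math

# Hypothetical mini gazetteer: (name, country, lat, lon), approximate values
PLACES = [
    ("Santarém", "Brazil", -2.44, -54.71),
    ("Arequipa", "Peru", -16.40, -71.54),
    ("Adelaide", "Australia", -34.93, 138.60),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_country(lat, lon):
    """Return the country of the gazetteer entry closest to (lat, lon)."""
    closest = min(PLACES, key=lambda p: haversine_km(lat, lon, p[2], p[3]))
    return closest[1]


print(nearest_country(-16.548678, -72.852778))
```

A real gazetteer holds far more places and uses a spatial index rather than a linear scan, but the lookup has the same shape.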


In [81]:
import reverse_geocode

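def get_countries(df=data, limit: int = 5):
    """Print the reverse-geocoded record for the first `limit` rows of `df`."""
    data_slice = df.head(limit)
    for index, row in data_slice.iterrows():
        coord = (row['latitude'], row['longitude'])
        print(reverse_geocode.get(coord))

# Pass the limit by keyword: a bare `get_countries(10)` binds 10 to `df`
# and raises AttributeError: 'int' object has no attribute 'head'
get_countries(limit=10)

Applied to the whole dataset and stored in a new column, that looks like the following.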

In [64]:
import reverse_geocode


def add_country_data(data, limit : int = 5):
    """
    Adds country information to the DataFrame based on latitude and longitude.

    Args:
        data (DataFrame): A pandas DataFrame containing 'latitude' and 'longitude' columns.
        limit (int): The number of leading rows to process.

    Returns:
        DataFrame: A new DataFrame with an additional 'country' column.
    """
    # Select the first `limit` rows for processing
    example_data = data.head(limit).copy()

    # Initialize an empty list to store country names
    country_list = []

    # Iterate over the rows of the provided DataFrame to fetch country data
    for index, row in example_data.iterrows():
        coord = (row['latitude'], row['longitude'])  # Extract lat/lon as a tuple
        try:
            country_info = reverse_geocode.get(coord)  # Get country information
            country_list.append(str(country_info['country']))  # Append the country name
        except Exception as e:
            print(f"Error retrieving data for coordinates {coord}: {e}")
            country_list.append(None)  # Append None if there’s an error

    # Add the 'country' column to the DataFrame
    example_data['country'] = country_list

    return example_data

# len(data) is the row count; data.size would be rows * columns
mod_data = add_country_data(data, len(data))
print(mod_data.head())
print(mod_data.info())



    latitude   longitude    country
0  -3.451752  -54.563937     Brazil
1 -23.496464  -47.460542     Brazil
2 -16.548678  -72.852778       Peru
3 -35.010870  140.064397  Australia
4 -14.223667  -43.753704     Brazil
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   latitude   9999 non-null   float64
 1   longitude  9999 non-null   float64
 2   country    9999 non-null   object 
dtypes: float64(2), object(1)
memory usage: 234.5+ KB
None


Note: This could also be done with other reverse-geocoded fields (region, approximate city, etc.), but the accuracy may be lower, especially since not all Google Street View locations are in cities, or in cities the geocoder recognizes.
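
As a sketch of how another field could be pulled in, the helper below takes the lookup function and the field name as parameters. It assumes the lookup returns a dict per coordinate, as `reverse_geocode.get` does with keys such as `country` and `city`. `add_geo_field` and `fake_lookup` are hypothetical names: the stub only stands in for the real package so the example runs on its own.

```python
import pandas as pd


def add_geo_field(df, lookup, field="country"):
    """Return a copy of `df` with a new column holding `field` from each lookup result."""
    out = df.copy()
    out[field] = [
        lookup((lat, lon)).get(field)  # None if the record lacks this field
        for lat, lon in zip(out["latitude"], out["longitude"])
    ]
    return out


# Stand-in lookup so the sketch runs without the package installed;
# in the notebook this would be reverse_geocode.get
def fake_lookup(coord):
    return {"country": "Peru", "city": "Hypothetical City"}


sample = pd.DataFrame({"latitude": [-16.548678], "longitude": [-72.852778]})
with_city = add_geo_field(sample, fake_lookup, field="city")
print(with_city)
```

Iterating with `zip` over the two columns also avoids the per-row overhead of `iterrows`, which can matter when the lookup runs over the full dataset.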

### Adding image data

Next, we want to have a way to access the images from the data we have.

In [75]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg


def filter_by_country(data, target_country="Peru", limit=5):
    """
    Finds the indices of rows where the `country` column matches the target_country
    and displays the corresponding images.

    Args:
        data (DataFrame): A pandas DataFrame with a 'country' column.
        target_country (str): The country to search for.
        limit (int): The maximum number of images to display.
    """
    # Filter the DataFrame to locate rows where the 'country' column matches the target_country
    matching_indices = data.index[data['country'] == target_country].tolist()
    country_instances = 0
    for index in matching_indices:
        if country_instances >= limit:
            break  # Stop if the limit is reached

        img_path = f"data/dataset/{index}.png"
        try:
            # Read and display the image
            img = mpimg.imread(img_path)
            plt.imshow(img)
            plt.axis('off')
            plt.show()

            country_instances += 1  # Increment the counter after successfully showing an image
        except FileNotFoundError:
            print(f"Image not found at path: {img_path}")
        except Exception as e:
            print(f"Error displaying image {img_path}: {e}")

filter_by_country(mod_data, "Peru", 5)

Image not found at path: data/dataset/2.png
Image not found at path: data/dataset/444.png
Image not found at path: data/dataset/553.png
Image not found at path: data/dataset/557.png
Image not found at path: data/dataset/581.png
Image not found at path: data/dataset/586.png
Image not found at path: data/dataset/618.png
Image not found at path: data/dataset/676.png
Image not found at path: data/dataset/707.png
Image not found at path: data/dataset/728.png
Image not found at path: data/dataset/798.png
Image not found at path: data/dataset/901.png
Image not found at path: data/dataset/962.png
Image not found at path: data/dataset/1019.png
Image not found at path: data/dataset/1020.png
Image not found at path: data/dataset/1167.png
Image not found at path: data/dataset/1317.png
Image not found at path: data/dataset/1343.png
Image not found at path: data/dataset/1367.png
Image not found at path: data/dataset/1443.png
Image not found at path: data/dataset/1486.png
Image not found at path: dat
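
Every lookup above failed, which suggests that the relative path `data/dataset/` does not resolve from the notebook's working directory (the CSV was loaded from `../data/`, so the images may live somewhere under `../data/` as well, though that is only a guess). Since the counter only increments on success, the loop also runs through every match instead of stopping at `limit`. A minimal sketch of checking paths up front, with the image directory passed in explicitly, is shown here. `existing_image_paths` is a hypothetical helper, and the demo uses empty placeholder files in a temporary directory rather than the real dataset.

```python
import tempfile
from pathlib import Path


def existing_image_paths(indices, image_dir, limit=5):
    """Return up to `limit` paths of the form <image_dir>/<index>.png that exist on disk."""
    image_dir = Path(image_dir)
    found = []
    for idx in indices:
        path = image_dir / f"{idx}.png"
        if path.is_file():
            found.append(path)
            if len(found) >= limit:
                break  # stop as soon as enough images are located
    return found


# Demo with placeholder files: only 2.png and 553.png exist
with tempfile.TemporaryDirectory() as tmp:
    for i in (2, 553):
        (Path(tmp) / f"{i}.png").touch()
    demo_names = [p.name for p in existing_image_paths([2, 444, 553], tmp)]

print(demo_names)
```

If this returns an empty list for the real directory, the problem is the path itself rather than any individual image.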


## Further analysis: Popularity of Countries

By analyzing the frequency of locations, we can rank countries by how often they appear in the dataset.
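
One simple way to rank countries by frequency is `value_counts` on the `country` column. The sketch below uses a five-row stand-in built from the first rows printed earlier, so the counts are illustrative only. On the real data, the same calls would be applied to `mod_data['country']`.

```python
import pandas as pd

# Stand-in for mod_data: the five countries shown in the earlier output
sample = pd.DataFrame(
    {"country": ["Brazil", "Brazil", "Peru", "Australia", "Brazil"]}
)

# Absolute counts, sorted from most to least frequent
counts = sample["country"].value_counts()

# Same ranking expressed as a share of all rows
share = sample["country"].value_counts(normalize=True)

print(counts)
print(share)
```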
