# Coordinates Checker
In this notebook, we try to resolve issues that may have arisen while downloading GSV images, especially coordinates that went missing during the process.

## Load Checkpoint file

In [None]:
import pandas as pd

# Load the CSV file
file_path = 'image_status_checkpoint.csv' 
df = pd.read_csv(file_path)

# Drop duplicates based on 'X' and 'Y' columns
df1 = df.drop_duplicates(subset=['X', 'Y'])

In [None]:
df1

## Load Original Coordinates File

In [None]:
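# Load the original coordinates from the Excel workbook
file_name = 'data/inhabit.xlsx'
df = pd.read_excel(file_name, sheet_name='all_data')

# Drop duplicates based on 'X' and 'Y' columns
df2 = df.drop_duplicates(subset=['X', 'Y'])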

In [None]:
df2

## Find Coordinates in Original File but Missing in Checkpoint

In [None]:
# Merge with an indicator to find rows in df2 not in df1
merged_df = df2.merge(df1[['X', 'Y']], on=['X', 'Y'], how='left', indicator=True)

# Filter rows that are only in df2
df_not_in_both = merged_df[merged_df['_merge'] == 'left_only'].drop(columns=['_merge'])

# Save or display the resulting DataFrame
df_not_in_both.to_csv('unique_to_df2.csv', index=False)  # Optional: save the results to a file

In [None]:
df_not_in_both
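
The merge above works as an anti-join: with `indicator=True`, pandas adds a `_merge` column that marks each row as `left_only`, `right_only`, or `both`, and keeping only `left_only` leaves the rows of the original file that have no match in the checkpoint. A minimal sketch on made-up data (the `original` and `checkpoint` frames, the `name` column, and all values are illustrative assumptions, not taken from the real files):

```python
import pandas as pd

# Toy data: hypothetical coordinates, not from the real dataset
original = pd.DataFrame(
    {"X": [1.0, 2.0, 3.0], "Y": [10.0, 20.0, 30.0], "name": ["a", "b", "c"]}
)
checkpoint = pd.DataFrame({"X": [1.0, 3.0], "Y": [10.0, 30.0]})

# Left merge keeps every row of `original`; `_merge` records where each row was found
merged = original.merge(
    checkpoint[["X", "Y"]], on=["X", "Y"], how="left", indicator=True
)

# Rows found only on the left side are missing from the checkpoint
missing = merged[merged["_merge"] == "left_only"].drop(columns=["_merge"])
print(missing)
```

Only the pair `(2.0, 20.0)` survives the filter here, because it is the only pair in `original` with no match in `checkpoint`. Note that the match is exact, so coordinates that differ in their trailing decimals between the two files would be reported as missing.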

## Find Coordinates in Checkpoint but Not in Original File

In [None]:
import pandas as pd

# Merge with an indicator to find rows in df1 (checkpoint) not in df2 (original)
merged_df = df2.merge(df1[['X', 'Y']], on=['X', 'Y'], how='right', indicator=True)

# Filter rows that are only in df1; columns that come from df2 are NaN for these rows
df_not_in_both = merged_df[merged_df['_merge'] == 'right_only'].drop(columns=['_merge'])

# Save or display the resulting DataFrame
df_not_in_both.to_csv('unique_to_df1.csv', index=False)  # Optional: save the results to a file

In [None]:
df_not_in_both

## Fix Checkpoint File by Removing Incorrect Entries

In [None]:
import pandas as pd

# Reload the full checkpoint file (duplicates included)
file_path = 'image_status_checkpoint.csv'  # Replace with your actual file path
df = pd.read_csv(file_path)


In [None]:
df

In [None]:
# df_not_in_both must hold the checkpoint-only rows from the previous section
# (the 'right_only' result), not the 'left_only' result computed earlier.
# Filter out rows in `df` whose (X, Y) pair appears in `df_not_in_both`
df_filtered = df[~df.set_index(['X', 'Y']).index.isin(df_not_in_both.set_index(['X', 'Y']).index)]

In [None]:
df_filtered
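
The filter above compares `X` and `Y` together as a pair rather than column by column, which is why both frames are turned into a two-level index first. A small sketch of the same idea on made-up data (the `checkpoint` and `extra` frames, the `status` column, and all values are illustrative assumptions):

```python
import pandas as pd

# Toy data: hypothetical checkpoint rows and a list of pairs to remove
checkpoint = pd.DataFrame(
    {"X": [1.0, 2.0, 9.0], "Y": [10.0, 20.0, 90.0], "status": ["ok", "ok", "ok"]}
)
extra = pd.DataFrame({"X": [9.0], "Y": [90.0]})

# Build (X, Y) keys so that a row matches only when both values match
keys = pd.MultiIndex.from_frame(checkpoint[["X", "Y"]])
bad_keys = pd.MultiIndex.from_frame(extra[["X", "Y"]])

# Boolean array: True where the checkpoint row's pair is in the removal list
mask = keys.isin(bad_keys)

# Keep every row whose pair is not flagged
cleaned = checkpoint[~mask]
print(cleaned)
```

Only the row with the pair `(9.0, 90.0)` is dropped. A row such as `(9.0, 10.0)` would be kept, since matching on one coordinate alone is not enough.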

In [None]:
# Overwrite the checkpoint file with the filtered rows.
# Keep a backup copy of the original file if you may need to undo this step.
df_filtered.to_csv('image_status_checkpoint.csv', index=False)


In [None]:
import pandas as pd

# Reload the checkpoint-only coordinates saved earlier to review them
df = pd.read_csv('unique_to_df1.csv')
df

### Go back to 1.StreetView.ipynb
After running this notebook, use `image_status_checkpoint.csv` as your reference file in **1.StreetView.ipynb**, specifically the filtered version saved at the end of the "Fix Checkpoint File by Removing Incorrect Entries" section.

The filtered file (`image_status_checkpoint.csv` after modification) no longer contains entries whose coordinates do not appear in the original dataset. The entries that remain mark locations that have already been processed, so you only go through new locations without repeating the ones you've already covered.

Steps to Proceed:
1. Use the updated `image_status_checkpoint.csv` file for tracking progress.
2. If you need to verify which locations are missing, check `unique_to_df2.csv` (locations in the original dataset but missing in the checkpoint).
3. If you want to see extra locations in your checkpoint that are not in the original, check `unique_to_df1.csv`.
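
As a quick sanity check before going back, the two comparisons can be summarized in a single pass with an outer merge. The sketch below is only an illustration: `coordinate_report` is a hypothetical helper that is not part of the notebook, and the toy frames stand in for the original dataset and the checkpoint.

```python
import pandas as pd


def coordinate_report(original: pd.DataFrame, checkpoint: pd.DataFrame) -> dict:
    """Count (X, Y) pairs by where they appear (hypothetical helper)."""
    left = original[["X", "Y"]].drop_duplicates()
    right = checkpoint[["X", "Y"]].drop_duplicates()

    # An outer merge keeps pairs from both sides; `_merge` records their origin
    merged = left.merge(right, on=["X", "Y"], how="outer", indicator=True)

    return {
        "missing_in_checkpoint": int((merged["_merge"] == "left_only").sum()),
        "extra_in_checkpoint": int((merged["_merge"] == "right_only").sum()),
        "in_both": int((merged["_merge"] == "both").sum()),
    }


# Toy data: hypothetical coordinates, not from the real files
original = pd.DataFrame({"X": [1.0, 2.0, 3.0], "Y": [10.0, 20.0, 30.0]})
checkpoint = pd.DataFrame({"X": [1.0, 3.0, 9.0], "Y": [10.0, 30.0, 90.0]})

report = coordinate_report(original, checkpoint)
print(report)
```

With the real files, `extra_in_checkpoint` would be expected to drop to 0 once the filtered checkpoint has been saved, while `missing_in_checkpoint` counts the locations that are not yet recorded in the checkpoint.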