# Research on datasets for Assignment B

### B1: Video:

The first part of the final project is a 1-minute movie that explains the central idea/concept you will investigate in your final project. You're making the movie so that the TAs and I can give you feedback, and so that other groups can steal your ideas (and you can steal ideas from them). The movie must contain the following:

- An explanation of the central idea behind your final project, e.g. think about questions such as
    - What is the idea?
    - Which datasets do you need to explore the idea?
    - Why is it interesting?

- A mock-up of the visualization that you wish to build. (Anything is fine here: pen and paper, MS Paint, Inkscape, D3, Midjourney (or any other generative AI tool); anything is OK.)


- Make sure you answer the questions
    - What genre is it? (For genres, see section 4.3 of the Segel and Heer paper.)
    - Why is that genre right for telling the story you want to communicate with the data?
- A walk-through of your preliminary data analysis, addressing
    - What is the total size of your data? (MB, number of rows, number of variables, etc.)
    - What are other properties? (What is the date range? Is it geo-data? If so, include a quick plot of locations, etc.)
    - Show the fundamental distributions of the data (similar to the work we did on SF crime data).

Other than that, there are no constraints. We do appreciate funny/inventive/beautiful movies, although the academic content is most important. Note that we'll display the movie to the entire class.

(The maximum length is 1 minute, but it's OK if the movie is shorter.)
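
The size and date-range questions above can be answered with a few lines of pandas. The following is a minimal sketch, not part of the assignment text: the helper name `summarize_dataset` and the tiny example frame are assumptions made for illustration.

```python
import pandas as pd

def summarize_dataset(df: pd.DataFrame, date_col: str) -> dict:
    """Return basic size and date-range facts for a DataFrame."""
    dates = pd.to_datetime(df[date_col])
    return {
        "rows": len(df),
        "variables": df.shape[1],
        # In-memory size, which can differ from the size of the files on disk
        "size_mb": df.memory_usage(deep=True).sum() / 1e6,
        "first_date": dates.min().date(),
        "last_date": dates.max().date(),
    }

# Tiny illustrative frame; a real call would pass the loaded dataset instead.
example = pd.DataFrame({
    "calendarDate": ["2020-06-18", "2020-06-19", "2020-06-20"],
    "totalSteps": [13987, 12455, 26379],
})
summary = summarize_dataset(example, "calendarDate")
print(summary)
```

For the fundamental distributions, `example.describe()` or a histogram per numeric column would be a natural next step.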

## Garmin dataset

In [13]:
%matplotlib inline
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
from glob import glob
import json
import matplotlib.pyplot as plt
import seaborn as sns
garmin_base_dir = "../files/Garmin_20241403"

## Files in the exported Garmin dataset

It seems that:
- DI-Connect-Aggregator contains daily health data
- DI-Connect-Wellness contains sleep data
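
To get a quick overview of which folders hold the most data, the JSON files can be counted per top-level folder. This is only a sketch using the standard library; the function name `count_json_per_folder` is hypothetical.

```python
from collections import Counter
from pathlib import Path

def count_json_per_folder(base_dir):
    """Count .json files below each immediate subfolder of base_dir."""
    base = Path(base_dir)
    counts = Counter()
    for json_path in base.rglob("*.json"):
        # The first path component below base_dir names the top-level folder
        # (or the file itself, if it sits directly in base_dir).
        top_level = json_path.relative_to(base).parts[0]
        counts[top_level] += 1
    return dict(counts)
```

A call such as `count_json_per_folder(os.path.join(garmin_base_dir, "DI_CONNECT"))` would then return a dictionary mapping folder names to file counts.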

In [14]:
!ls -R {garmin_base_dir}


DI_CONNECT            IT_DEVICE_AND_CONTENT customer_data

../files/Garmin_20241403/DI_CONNECT:
DI-ATP                    DI-Connect-Routing        DI-Connect-Wellness
DI-Connect-Aggregator     DI-Connect-Social         DI-GOLF
DI-Connect-Fitness        DI-Connect-Uploaded-Files
DI-Connect-Metrics        DI-Connect-User

../files/Garmin_20241403/DI_CONNECT/DI-ATP:
Garmin_Coach_Pause_History.json

../files/Garmin_20241403/DI_CONNECT/DI-Connect-Aggregator:
HydrationLogFile_2020-05-14_2020-08-22.json
HydrationLogFile_2020-08-22_2020-11-30.json
HydrationLogFile_2020-11-30_2021-03-10.json
HydrationLogFile_2021-03-10_2021-06-18.json
HydrationLogFile_2021-06-18_2021-09-26.json
HydrationLogFile_2021-09-26_2022-01-04.json
HydrationLogFile_2022-01-04_2022-04-14.json
HydrationLogFile_2022-04-14_2022-07-23.json
HydrationLogFile_2022-07-2

### Commented-out Python code to print the head of all JSON files below

...


<!-- %matplotlib inline
import pandas as pd
import os
import json
from IPython.display import display

# The corrected path
garmin_base_dir = "../files/Garmin_20241403"
di_connect_path = os.path.join(garmin_base_dir, "DI_CONNECT")  # Correcting the folder name

# Adjusting the function to account for JSON structures with multiple entries
def find_and_dance_json(path):
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith('.json'):
                full_path = os.path.join(root, file)
                with open(full_path, 'r') as f:
                    data = json.load(f)
                
                # Checking if the JSON is a list of dictionaries
                if isinstance(data, list) and all(isinstance(item, dict) for item in data):
                    df = pd.DataFrame(data)
                elif isinstance(data, dict):
                    # For a single dictionary, we still convert it into a DataFrame
                    df = pd.DataFrame([data])
                else:
                    # If the structure is different, we might need to adjust the approach
                    print(f"Unexpected JSON structure in {file}")
                    continue
                
                # Displaying the DataFrame
                print(f"DataFrame for {os.path.basename(file)}:")
                display(df.head())

# Time for the adjusted dance!
find_and_dance_json(di_connect_path)
-->

### End of script

## Getting daily health data from DI-Connect-Aggregator


In [18]:
%matplotlib inline
import pandas as pd
import os
import json
from IPython.display import display

pd.set_option('display.max_columns', None)

# The corrected path
garmin_base_dir = "../files/Garmin_20241403"
di_connect_path = os.path.join(garmin_base_dir, "DI_CONNECT", "DI-Connect-Aggregator")  # Correcting the folder name

# Adjusting the function to store each JSON as a separate DataFrame
def find_json(path):
    dfs = {}  # Dictionary to hold DataFrames
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith('.json') and file.startswith('UDS'):
                full_path = os.path.join(root, file)
                with open(full_path, 'r') as f:
                    data = json.load(f)

                # Checking if the JSON is a list of dictionaries
                if isinstance(data, list) and all(isinstance(item, dict) for item in data):
                    df = pd.DataFrame(data)
                elif isinstance(data, dict):
                    # For a single dictionary, we still convert it into a DataFrame
                    df = pd.DataFrame([data])
                else:
                    # If the structure is different, we might need to adjust the approach
                    print(f"Unexpected JSON structure in {file}")
                    continue

                # Storing the DataFrame in the dictionary with the filename as key
                dfs[file] = df

                # Reporting progress (uncomment the display call to preview the first rows)
                print(f"DataFrame for {os.path.basename(file)} loaded.")
                #display(dfs[file].head())

    return dfs

dfs = find_json(di_connect_path)

len(dfs)

#dfs['UDSFile_2022-07-24_2022-11-01.json'].head()


#dfs['UDSFile_2022-07-24_2022-11-01.json'][['calendarDate', 'totalKilocalories', 'activeKilocalories', 'restingCaloriesFromActivity', 'totalSteps', 'moderateIntensityMinutes', 'vigorousIntensityMinutes', 'userIntensityMinutesGoal', 'minHeartRate', 'maxHeartRate', 'restingHeartRate','minAvgHeartRate','maxAvgHeartRate','allDayStress','bodyBattery']]
#dfs['UDSFile_2022-07-24_2022-11-01.json'].at[0,'allDayStress']
#dfs['UDSFile_2022-07-24_2022-11-01.json'].at[0,'bodyBattery']

DataFrame for UDSFile_2012-09-14_2012-12-23.json loaded.
DataFrame for UDSFile_2011-05-03_2011-08-11.json loaded.
DataFrame for UDSFile_2011-08-11_2011-11-19.json loaded.
DataFrame for UDSFile_2022-07-24_2022-11-01.json loaded.
DataFrame for UDSFile_2022-11-01_2023-02-09.json loaded.
DataFrame for UDSFile_2023-08-28_2023-12-06.json loaded.
DataFrame for UDSFile_2005-11-10_2006-02-18.json loaded.
DataFrame for UDSFile_2014-08-15_2014-11-23.json loaded.
DataFrame for UDSFile_2014-01-27_2014-05-07.json loaded.
DataFrame for UDSFile_2021-09-27_2022-01-05.json loaded.
DataFrame for UDSFile_2013-07-11_2013-10-19.json loaded.
DataFrame for UDSFile_2022-01-05_2022-04-15.json loaded.
DataFrame for UDSFile_2021-03-11_2021-06-19.json loaded.
DataFrame for UDSFile_2022-04-15_2022-07-24.json loaded.
DataFrame for UDSFile_2023-12-06_2024-03-15.json loaded.
DataFrame for UDSFile_2020-05-15_2020-08-23.json loaded.
DataFrame for UDSFile_2023-05-20_2023-08-28.json loaded.
DataFrame for UDSFile_2021-06-1

22
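
Several of the loaded files cover years before the watch was in use. Since each file name carries its own date range, the old files could also be skipped before loading. The sketch below assumes the naming pattern `UDSFile_<start>_<end>.json` seen in the output above; the helper names are hypothetical.

```python
import re
from datetime import date

# Pattern assumed from the printed file names: UDSFile_<start>_<end>.json
UDS_PATTERN = re.compile(r"UDSFile_(\d{4}-\d{2}-\d{2})_(\d{4}-\d{2}-\d{2})\.json$")

def uds_date_range(filename):
    """Return (start, end) dates parsed from a UDS file name, or None."""
    match = UDS_PATTERN.search(filename)
    if match is None:
        return None
    start, end = (date.fromisoformat(part) for part in match.groups())
    return start, end

def files_ending_on_or_after(filenames, first_year):
    """Keep files whose end date falls in first_year or later, sorted by name."""
    kept = []
    for name in filenames:
        date_range = uds_date_range(name)
        if date_range is not None and date_range[1].year >= first_year:
            kept.append(name)
    return sorted(kept)

names = [
    "UDSFile_2012-09-14_2012-12-23.json",
    "UDSFile_2005-11-10_2006-02-18.json",
    "UDSFile_2021-09-27_2022-01-05.json",
    "UDSFile_2020-05-15_2020-08-23.json",
]
recent = files_ending_on_or_after(names, 2020)
print(recent)
```

Filtering on the end date keeps a file that starts in 2019 but runs into 2020, so a row-level date filter is still needed afterwards.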

## Combining the 22 JSON files into one DataFrame of health data and removing data from before 2020 (no watch)

In [28]:
import pandas as pd
import os
import json

# Setting up paths and configurations
garmin_base_dir = "../files/Garmin_20241403"
di_connect_path = os.path.join(garmin_base_dir, "DI_CONNECT", "DI-Connect-Aggregator")
columns_of_interest = ['calendarDate', 'totalKilocalories', 'activeKilocalories', 'restingCaloriesFromActivity', 
                       'totalSteps', 'moderateIntensityMinutes', 'vigorousIntensityMinutes', 'userIntensityMinutesGoal', 
                       'minHeartRate', 'maxHeartRate', 'restingHeartRate', 'minAvgHeartRate', 'maxAvgHeartRate',
                       'allDayStress', 'bodyBattery']

# Function to load JSONs and combine them into a filtered DataFrame
def load_and_filter_json(path, start_year=2020):
    all_dfs = []
    for root, _, files in os.walk(path):
        json_files = [f for f in sorted(files) if f.startswith('UDS') and f.endswith('.json')]
        for file in json_files:
            with open(os.path.join(root, file), 'r') as f:
                data = json.load(f)
            df = pd.DataFrame(data) if isinstance(data, list) else pd.DataFrame([data])
            all_dfs.append(df)

    # Combining and filtering the data
    if all_dfs:
        full_df = pd.concat(all_dfs, ignore_index=True)
        full_df['calendarDate'] = pd.to_datetime(full_df['calendarDate'])
        filtered_df = full_df.loc[full_df['calendarDate'].dt.year >= start_year, columns_of_interest]
        return filtered_df
    return pd.DataFrame()  # Return empty DataFrame if no data was loaded

# Apply the function and display the resulting DataFrame
focus_df = load_and_filter_json(di_connect_path)
focus_df.head()


Unnamed: 0,calendarDate,totalKilocalories,activeKilocalories,restingCaloriesFromActivity,totalSteps,moderateIntensityMinutes,vigorousIntensityMinutes,userIntensityMinutesGoal,minHeartRate,maxHeartRate,restingHeartRate,minAvgHeartRate,maxAvgHeartRate,allDayStress,bodyBattery
14,2020-06-18,1923.0,446.0,,13987.0,11.0,0.0,180.0,64.0,128.0,68.0,65.0,121.0,"{'userProfilePK': 86607424, 'calendarDate': '2...","{'userProfilePK': 86607424, 'calendarDate': '2..."
15,2020-06-19,1885.0,408.0,,12455.0,2.0,10.0,180.0,55.0,160.0,64.0,56.0,158.0,"{'userProfilePK': 86607424, 'calendarDate': '2...","{'userProfilePK': 86607424, 'calendarDate': '2..."
16,2020-06-20,2456.0,975.0,,26379.0,20.0,88.0,180.0,53.0,159.0,62.0,54.0,156.0,"{'userProfilePK': 86607424, 'calendarDate': '2...","{'userProfilePK': 86607424, 'calendarDate': '2..."
17,2020-06-21,2202.0,734.0,,12401.0,52.0,4.0,180.0,50.0,139.0,60.0,52.0,134.0,"{'userProfilePK': 86607424, 'calendarDate': '2...","{'userProfilePK': 86607424, 'calendarDate': '2..."
18,2020-06-22,2017.0,549.0,,15256.0,5.0,43.0,180.0,53.0,152.0,60.0,54.0,149.0,"{'userProfilePK': 86607424, 'calendarDate': '2...","{'userProfilePK': 86607424, 'calendarDate': '2..."


### allDayStress and bodyBattery are dictionaries inside each date:

In [17]:
focus_df[['allDayStress']].iat[0,0]['aggregatorList']
#focus_df[['bodyBattery']].iat[0,0]['bodyBatteryStatList']

[{'type': 'TOTAL',
  'averageStressLevel': 40,
  'averageStressLevelIntensity': 35,
  'maxStressLevel': 93,
  'stressIntensityCount': 376,
  'stressOffWristCount': 148,
  'stressTooActiveCount': 253,
  'totalStressCount': 777,
  'totalStressIntensity': -5086,
  'stressDuration': 15480,
  'restDuration': 7080,
  'activityDuration': 15180,
  'uncategorizedDuration': 8880,
  'totalDuration': 46620,
  'lowDuration': 7680,
  'mediumDuration': 7020,
  'highDuration': 780},
 {'type': 'AWAKE',
  'averageStressLevel': 40,
  'averageStressLevelIntensity': 35,
  'maxStressLevel': 93,
  'stressIntensityCount': 369,
  'stressOffWristCount': 119,
  'stressTooActiveCount': 253,
  'totalStressCount': 741,
  'totalStressIntensity': -5035,
  'stressDuration': 15120,
  'restDuration': 7020,
  'activityDuration': 15180,
  'uncategorizedDuration': 7140,
  'totalDuration': 44460,
  'lowDuration': 7320,
  'mediumDuration': 7020,
  'highDuration': 780},
 {'type': 'ASLEEP',
  'averageStressLevel': 31,
  'avera
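
To plot or aggregate these nested values, one entry per day can be pulled out into ordinary columns. The following is a rough sketch that assumes every `allDayStress` value is either a dictionary with an `aggregatorList` key or missing; the helper `stress_summary` and the demo rows are made up for illustration.

```python
import pandas as pd

def stress_summary(all_day_stress, entry_type="TOTAL"):
    """Pick one entry (e.g. TOTAL, AWAKE, ASLEEP) from 'aggregatorList'."""
    if not isinstance(all_day_stress, dict):
        return {}  # days without data may hold NaN instead of a dict
    for entry in all_day_stress.get("aggregatorList", []):
        if entry.get("type") == entry_type:
            return entry
    return {}

# Two made-up days mimicking the nested structure; the second has no data.
demo = pd.DataFrame({
    "calendarDate": ["2020-06-18", "2020-06-19"],
    "allDayStress": [
        {"aggregatorList": [
            {"type": "TOTAL", "averageStressLevel": 40, "maxStressLevel": 93},
            {"type": "ASLEEP", "averageStressLevel": 31, "maxStressLevel": 60},
        ]},
        float("nan"),
    ],
})

# One row of flat stress values per day, aligned with demo by position
totals = pd.DataFrame([stress_summary(value) for value in demo["allDayStress"]])
flat = pd.concat(
    [demo[["calendarDate"]], totals[["averageStressLevel", "maxStressLevel"]]],
    axis=1,
)
print(flat)
```

The same idea would probably carry over to `bodyBattery`, but its inner structure is not shown here, so the key names would need checking first.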

## Getting the wellness sleep data from DI-Connect-Wellness

In [35]:
folder = "DI-Connect-Wellness"
di_connect_path = os.path.join(garmin_base_dir, "DI_CONNECT", folder)

# Function to load JSONs and combine them into a filtered DataFrame
def load_and_filter_json(path, start_year=2020):
    all_dfs = []
    for root, _, files in os.walk(path):
        json_files = [f for f in sorted(files) if f.endswith('sleepData.json')]
        for file in json_files:
            with open(os.path.join(root, file), 'r') as f:
                data = json.load(f)
            df = pd.DataFrame(data) if isinstance(data, list) else pd.DataFrame([data])
            all_dfs.append(df)

    # Combining and filtering the data
    if all_dfs:
        full_df = pd.concat(all_dfs, ignore_index=True)
        full_df['calendarDate'] = pd.to_datetime(full_df['calendarDate'])
        filtered_df = full_df.loc[full_df['calendarDate'].dt.year >= start_year]
        return filtered_df
    return pd.DataFrame()  # Return empty DataFrame if no data was loaded

# Apply the function and display the resulting DataFrame
sleep_df = load_and_filter_json(di_connect_path)
sleep_df


Unnamed: 0,sleepStartTimestampGMT,sleepEndTimestampGMT,calendarDate,sleepWindowConfirmationType,retro,deepSleepSeconds,lightSleepSeconds,remSleepSeconds,awakeSleepSeconds,unmeasurableSeconds,averageRespiration,lowestRespiration,highestRespiration,spo2SleepSummary
0,2020-06-17T21:00:00.0,2020-06-18T04:00:00.0,2020-06-18,UNCONFIRMED,False,,,,,,,,,
1,2020-06-18T21:25:00.0,2020-06-19T03:46:00.0,2020-06-19,ENHANCED_CONFIRMED_FINAL,False,2880.0,15600.0,3720.0,660.0,0.0,16.0,6.0,19.0,
2,2020-06-19T19:58:00.0,2020-06-20T04:14:00.0,2020-06-20,ENHANCED_CONFIRMED_FINAL,False,3660.0,16020.0,8100.0,840.0,1140.0,16.0,7.0,21.0,"{'userProfilePk': 86607424, 'deviceId': 333492..."
3,2020-06-20T21:56:00.0,2020-06-21T04:52:00.0,2020-06-21,ENHANCED_CONFIRMED_FINAL,False,3120.0,15180.0,5640.0,1020.0,0.0,15.0,7.0,18.0,"{'userProfilePk': 86607424, 'deviceId': 333492..."
4,2020-06-21T21:09:00.0,2020-06-22T04:55:00.0,2020-06-22,ENHANCED_CONFIRMED_FINAL,False,3120.0,17400.0,7320.0,120.0,0.0,16.0,6.0,21.0,"{'userProfilePk': 86607424, 'deviceId': 333492..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1359,2024-03-09T22:12:00.0,2024-03-10T07:39:00.0,2024-03-10,ENHANCED_CONFIRMED_FINAL,False,9780.0,18060.0,5880.0,300.0,0.0,16.0,12.0,21.0,
1360,2024-03-10T18:04:00.0,2024-03-11T04:45:00.0,2024-03-11,ENHANCED_CONFIRMED_FINAL,False,4560.0,22500.0,9420.0,1980.0,0.0,16.0,12.0,21.0,
1361,2024-03-11T20:14:00.0,2024-03-12T04:25:00.0,2024-03-12,ENHANCED_CONFIRMED_FINAL,False,2580.0,17880.0,8160.0,840.0,0.0,16.0,12.0,22.0,
1362,2024-03-12T22:03:00.0,2024-03-13T02:10:00.0,2024-03-13,ENHANCED_CONFIRMED,False,5220.0,4680.0,2760.0,240.0,1920.0,16.0,12.0,19.0,
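
The sleep stages are stored in seconds, so a per-night total in hours is a useful derived column. Here is a small sketch using two of the rows shown above; the column names `asleepHours` and `windowHours` are assumptions introduced for this example.

```python
import pandas as pd

STAGE_COLUMNS = ["deepSleepSeconds", "lightSleepSeconds", "remSleepSeconds"]

sleep_demo = pd.DataFrame({
    "sleepStartTimestampGMT": ["2020-06-18T21:25:00.0", "2020-06-19T19:58:00.0"],
    "sleepEndTimestampGMT": ["2020-06-19T03:46:00.0", "2020-06-20T04:14:00.0"],
    "deepSleepSeconds": [2880.0, 3660.0],
    "lightSleepSeconds": [15600.0, 16020.0],
    "remSleepSeconds": [3720.0, 8100.0],
})

# Hours asleep: sum of the three stages. min_count keeps the result NaN
# for nights where a stage was not measured instead of reporting 0 hours.
sleep_demo["asleepHours"] = (
    sleep_demo[STAGE_COLUMNS].sum(axis=1, min_count=len(STAGE_COLUMNS)) / 3600
)

# Length of the whole sleep window, from start to end timestamp
start = pd.to_datetime(sleep_demo["sleepStartTimestampGMT"])
end = pd.to_datetime(sleep_demo["sleepEndTimestampGMT"])
sleep_demo["windowHours"] = (end - start).dt.total_seconds() / 3600

print(sleep_demo[["asleepHours", "windowHours"]])
```

The gap between the two columns corresponds to the awake and unmeasurable seconds in the same rows.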


## Checking the menstrual cycle data from DI-Connect-Wellness as well

In [34]:
folder = "DI-Connect-Wellness"
di_connect_path = os.path.join(garmin_base_dir, "DI_CONNECT", folder)

# Function to load the cycle JSONs and combine them into one DataFrame (no date filtering here; start_year is unused)
def load_and_filter_json(path, start_year=2020):
    all_dfs = []
    for root, _, files in os.walk(path):
        json_files = [f for f in sorted(files) if f.endswith('Cycles.json')]
        for file in json_files:
            with open(os.path.join(root, file), 'r') as f:
                data = json.load(f)
            df = pd.DataFrame(data) if isinstance(data, list) else pd.DataFrame([data])
            all_dfs.append(df)

    # Combining the data
    if all_dfs:
        full_df = pd.concat(all_dfs, ignore_index=True)
        return full_df
    return pd.DataFrame()  # Return empty DataFrame if no data was loaded

# Apply the function and display the resulting DataFrame
mens_df = load_and_filter_json(di_connect_path)
mens_df


Unnamed: 0,userProfilePk,startDate,predictedPeriodLength,actualPeriodLength,predictedCycleLength,actualCycleLength,hormonalContraception,fertileWindowStart,fertileWindowLength,status,reportTimestamp,hasLoggedOvulationDay,initialPredictedPeriodLength,initialPredictedCycleLength,createTimestamp,cycleType,applicableMenstrualCycleLength,applicablePeriodLength
0,86607424,2020-06-28,4,4,25,27.0,NONE,10.0,7.0,PAST,2020-07-27T00:00:00.0,False,4.0,25.0,2020-07-02T22:43:05.239,REGULAR,27,4
1,86607424,2020-07-25,4,4,28,26.0,NONE,9.0,0.0,PAST,2020-07-27T12:05:03.640,False,4.0,28.0,2020-07-27T12:05:03.640,REGULAR,26,4
2,86607424,2020-08-20,4,4,27,27.0,NONE,10.0,0.0,PAST,2020-10-30T06:40:43.844,False,4.0,28.0,2020-08-20T17:09:24.265,REGULAR,27,4
3,86607424,2020-09-16,5,5,27,27.0,NONE,10.0,0.0,PAST,2020-11-29T08:10:20.0,False,4.0,28.0,2020-09-18T06:13:48.736,REGULAR,27,5
4,86607424,2020-10-13,4,4,25,25.0,NONE,8.0,0.0,PAST,2020-11-29T08:10:20.0,False,4.0,24.0,2020-10-25T20:34:32.958,REGULAR,25,4
5,86607424,2020-11-07,4,4,26,19.0,NONE,,,PAST,2020-11-29T08:10:20.0,False,4.0,26.0,2020-11-07T13:30:07.153,REGULAR,19,4
6,86607424,2020-11-26,3,3,25,27.0,NONE,10.0,0.0,PAST,2020-12-29T00:00:00.0,False,4.0,25.0,2020-11-29T08:10:20.0,REGULAR,27,3
7,86607424,2020-12-23,4,4,26,26.0,NONE,9.0,0.0,PAST,2021-01-17T19:01:01.121,False,,,2020-12-29T08:33:05.856,REGULAR,26,4
8,86607424,2021-01-18,4,4,26,25.0,NONE,8.0,0.0,PAST,2021-01-18T16:54:40.318,False,4.0,26.0,2021-01-18T16:54:40.318,REGULAR,25,4
9,86607424,2021-02-12,5,4,26,25.0,NONE,8.0,0.0,PAST,2021-02-15T05:10:38.266,False,,,2021-02-15T05:10:27.615,REGULAR,25,4
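
To relate the cycle data to the daily health or sleep data, each calendar date can be labelled with its day in the current cycle. One possible approach, sketched here with `pd.merge_asof`, matches every date to the most recent `startDate`; the `cycleDay` column and the demo dates on the daily side are assumptions for illustration.

```python
import pandas as pd

# Cycle start dates as they appear in the table above
cycles = pd.DataFrame({
    "startDate": pd.to_datetime(["2020-06-28", "2020-07-25", "2020-08-20"]),
})

# A handful of example days standing in for the daily data
days = pd.DataFrame({
    "calendarDate": pd.to_datetime(
        ["2020-06-27", "2020-06-28", "2020-07-10", "2020-07-25", "2020-08-01"]
    ),
})

# Both frames must be sorted on their keys; 'backward' picks the latest
# start date that is on or before each calendar date.
merged = pd.merge_asof(
    days.sort_values("calendarDate"),
    cycles.sort_values("startDate"),
    left_on="calendarDate",
    right_on="startDate",
    direction="backward",
)

# Day 1 is the start date itself; dates before the first cycle stay NaN
merged["cycleDay"] = (merged["calendarDate"] - merged["startDate"]).dt.days + 1
print(merged)
```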


## GARMIN RUNNING HEATMAP (commented out)

<!-- import requests

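import os

# Credentials are read from environment variables instead of being hardcoded
# in the notebook (the variable names below are placeholders).
my_client_id = os.environ["GARMIN_CLIENT_ID"]
my_client_secret = os.environ["GARMIN_CLIENT_SECRET"]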

# Assuming Garmin has a straightforward OAuth2 flow (check documentation for exact steps)
def get_access_token(client_id, client_secret):
    #auth_url = 'https://connectapi.garmin.com/oauth2/token'
    auth_url = 'https://seminmis.dk'
    response = requests.post(auth_url, data={
        'grant_type': 'client_credentials',
        'client_id': client_id,
        'client_secret': client_secret,
    })
    response_data = response.json()
    return response_data['access_token']

def fetch_activities(access_token):
    activities_url = 'https://connectapi.garmin.com/activities'
    headers = {'Authorization': f'Bearer {access_token}'}
    response = requests.get(activities_url, headers=headers)
    activities = response.json()
    return activities  # This will be a list or dict depending on API

def download_gpx(activity_id, access_token):
    gpx_url = f'https://connectapi.garmin.com/activities/{activity_id}/gpx'
    headers = {'Authorization': f'Bearer {access_token}'}
    response = requests.get(gpx_url, headers=headers)
    gpx_data = response.content  # GPX data as bytes
    with open(f'{activity_id}.gpx', 'wb') as f:
        f.write(gpx_data)

# Uses the Garmin API credentials defined above
access_token = get_access_token(my_client_id, my_client_secret)
activities = fetch_activities(access_token)

for activity in activities:  # You may need to adjust depending on the structure of activities
    activity_id = activity['id']  # Assuming 'id' is part of the activity structure
    download_gpx(activity_id, access_token)
-->


## CATS AND THE CITY

In [7]:
import os
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS
import pandas as pd

def get_decimal_from_dms(dms, ref):
    degrees = dms[0]
    minutes = dms[1] / 60.0
    seconds = dms[2] / 3600.0

    if ref in ['S', 'W']:
        degrees = -degrees
        minutes = -minutes
        seconds = -seconds

    return degrees + minutes + seconds

def extract_gps_info(image_path):
    try:
        image = Image.open(image_path)
        exif_data = {TAGS[key]: value for key, value in image._getexif().items() if key in TAGS and type(value) is not bytes}

        if 'GPSInfo' in exif_data:
            gps_info = {GPSTAGS.get(key, key): value for key, value in exif_data['GPSInfo'].items()}
            lat = get_decimal_from_dms(gps_info['GPSLatitude'], gps_info['GPSLatitudeRef'])
            lon = get_decimal_from_dms(gps_info['GPSLongitude'], gps_info['GPSLongitudeRef'])
            return lat, lon
        else:
            # EXIF data present, but without a GPS block
            return None, None
    except Exception:
        # Unreadable image or no EXIF data at all
        return None, None

def process_folder(folder_path):
    data = {'Filename': [], 'Latitude': [], 'Longitude': [], 'ImagePath': []}
    for filename in os.listdir(folder_path):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg', '.heic')):
            # print(filename)
            lat, lon = extract_gps_info(os.path.join(folder_path, filename))
            if lat is not None and lon is not None:
                data['Filename'].append(filename)
                data['Latitude'].append(lat)
                data['Longitude'].append(lon)
                # Assuming thumbnails are stored in a parallel structure or with a naming convention
                data['ImagePath'].append(filename)
            else:
                print("No GPS data found")
    return pd.DataFrame(data)

# Specify your folder path here
folder_path = '../files/catimg/'
df = process_folder(folder_path)

# To save the DataFrame to a CSV file
df.to_csv('geodata.csv', index=False)

df


Unnamed: 0,Filename,Latitude,Longitude,ImagePath
0,IMG_2399.jpeg,55.703536,12.521419,IMG_2399.jpeg
1,IMG_3731.jpeg,55.717647,12.495931,IMG_3731.jpeg
2,IMG_3789.jpeg,55.648728,12.604731,IMG_3789.jpeg
3,IMG_3459.jpeg,55.623853,12.573617,IMG_3459.jpeg
4,IMG_1608.jpeg,55.135456,15.140861,IMG_1608.jpeg
...,...,...,...,...
100,IMG_7073.jpeg,55.740422,12.459467,IMG_7073.jpeg
101,IMG_3175.jpeg,55.730072,12.523000,IMG_3175.jpeg
102,IMG_2461.jpeg,55.702775,12.523586,IMG_2461.jpeg
103,IMG_7049.jpeg,55.713903,12.488917,IMG_7049.jpeg
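
As a sanity check on the coordinate conversion, here is a small worked example of the degrees/minutes/seconds arithmetic used in `get_decimal_from_dms`. The standalone function `dms_to_decimal` and the input values are made up for illustration; they are not taken from the photos.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    decimal = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern latitudes and western longitudes are negative
    return -decimal if ref in ("S", "W") else decimal

# 55 deg 42' 36" N: 55 + 42/60 + 36/3600 = 55 + 0.7 + 0.01
latitude = dms_to_decimal(55, 42, 36, "N")
# 12 deg 30' 0" E: 12 + 30/60
longitude = dms_to_decimal(12, 30, 0, "E")
print(latitude, longitude)
```

Values around 55.7 and 12.5 fall in the Copenhagen area, which matches most of the rows in the table above.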


In [8]:
import folium

# Assuming 'df' is your DataFrame with the latitude and longitude

# Create a map centered around the average location in your DataFrame
map_center = [df['Latitude'].mean(), df['Longitude'].mean()]
m = folium.Map(location=map_center, zoom_start=12)

# Add markers for each image location
for idx, row in df.iterrows():
    folium.Marker([row['Latitude'], row['Longitude']],
                  popup=f"Filename: {row['Filename']}").add_to(m)

# Display the map in Jupyter Notebook
m


In [9]:
import folium
from folium import IFrame

# Your modified DataFrame now includes 'ImagePath'

map_center = [df['Latitude'].mean(), df['Longitude'].mean()]
m = folium.Map(location=map_center, zoom_start=12)

for idx, row in df.iterrows():
    # Adjust image path as necessary to point to the correct location; might need adjustments if using a notebook
    image_path =  row['ImagePath']
    iframe = IFrame(f'<img src="{folder_path}{image_path}" width="150" height="100">', width=200, height=150)
    popup = folium.Popup(iframe, max_width=300)
    folium.Marker([row['Latitude'], row['Longitude']], popup=popup).add_to(m)

m
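
Relative image paths inside a popup may not resolve once the map is saved or viewed outside the notebook folder. One possible workaround, sketched below with the standard library only, is to embed each image as a base64 data URI. The helper names `image_data_uri` and `popup_html` are hypothetical.

```python
import base64
import mimetypes

def image_data_uri(image_bytes, filename):
    """Encode raw image bytes as a data URI that can be used in an <img> tag."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"  # fallback for unknown extensions
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

def popup_html(image_bytes, filename, width=150):
    """Build the HTML snippet for a popup showing the embedded image."""
    return f'<img src="{image_data_uri(image_bytes, filename)}" width="{width}">'

# Placeholder bytes; a real call would read the file, e.g. open(path, "rb").read()
html = popup_html(b"abc", "IMG_2399.jpeg")
print(html)
```

The resulting string could be used as the popup HTML in place of the path-based `<img>` tag. Embedding full-size photos makes the map file large, so small thumbnails are likely a better fit.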
