# Travel Mapper Notebook

My family and I traveled around the world in 2021/22 and visited a number of countries along the way. In this project, I visualize the airports we passed through on an interactive map built with Folium and show a time beam with the dates we arrived at and departed from each airport. This lets users see where we traveled and how long we stayed in each place. The same approach works for any kind of location, not only airports.

## What I Do

1. **Data Loading**: I load a CSV file (`locations.csv`) containing the airports and departure/arrival timestamps.
2. **Geocoding Locations and Data Preparation**: I use the geopy library to geocode the locations of the airports and obtain their latitude and longitude coordinates. I merge the geocoded coordinates with the original DataFrame and export the augmented data to a new CSV file.
3. **Map Creation, Adding Markers, and Display**: Using Folium, I create an interactive map centered on the mean location of the provided coordinates. I iterate through the travel locations and add markers to the map with popups displaying the location names. The popups are customized to be flexible in width to accommodate varying lengths of text. Finally, I display the interactive map with all the markers.
4. **Creating and Displaying a Time Beam**: I use Plotly to create a time beam visualization that shows the duration of stays at various locations. The time beam provides an intuitive view of the travel timeline.

## How It Works

1. **Importing Libraries**: Import the necessary libraries, including Pandas for data handling, geopy for geocoding, Folium for map visualization, Plotly for creating interactive plots, and pytz for time zone handling.

In [63]:
# Step 1: Import Libraries
import pandas as pd
from geopy.geocoders import Nominatim
import time
import folium
import plotly.express as px
import plotly.graph_objects as go  # needed for go.Figure and go.Bar in the time beam
from IPython.display import display
import pytz

2. **Loading Data**: Read the CSV file containing the travel locations into a DataFrame using Pandas.

In [62]:
# Step 2: Read the CSV file (semicolon-separated)
df = pd.read_csv('data/locations.csv', sep=';')
display(df.head())

# Extract the list of unique locations to geocode
locations = df.location.unique()

| | departure_arrival | timestamp | location |
|---|---|---|---|
| 0 | departure | 08/09/2021 11:45:00 Europe/Berlin | Frankfurt Airport, Germany |
| 1 | arrival | 08/09/2021 13:00:00 Europe/Amsterdam | Amsterdam Airport Schiphol, Netherlands |
| 2 | departure | 08/10/2021 11:50:00 Europe/Amsterdam | Amsterdam Airport Schiphol, Netherlands |
| 3 | arrival | 08/10/2021 17:30:00 America/Aruba | Reina Beatrix Airport, Aruba |
| 4 | departure | 09/10/2021 13:30:00 America/Aruba | Reina Beatrix Airport, Aruba |
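
The loading step can also be tried without the data file. The following is a minimal, dependency-free sketch that uses the standard `csv` module instead of Pandas; the sample text reuses rows from the preview above, and the header layout (no index column) as well as the helper names `read_rows` and `unique_locations` are assumptions made for illustration.

```python
import csv
import io

# Illustrative sample in the assumed layout of data/locations.csv.
# Because the separator is ";", the commas inside location names need no quoting.
SAMPLE = """departure_arrival;timestamp;location
departure;08/09/2021 11:45:00 Europe/Berlin;Frankfurt Airport, Germany
arrival;08/09/2021 13:00:00 Europe/Amsterdam;Amsterdam Airport Schiphol, Netherlands
departure;08/10/2021 11:50:00 Europe/Amsterdam;Amsterdam Airport Schiphol, Netherlands
arrival;08/10/2021 17:30:00 America/Aruba;Reina Beatrix Airport, Aruba
"""


def read_rows(text):
    """Parse semicolon-separated text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def unique_locations(rows):
    """Return location names in first-seen order, without duplicates."""
    return list(dict.fromkeys(row["location"] for row in rows))


sample_rows = read_rows(SAMPLE)
sample_locations = unique_locations(sample_rows)
print(sample_locations)
```

Each airport normally appears twice (once as an arrival, once as a departure), so de-duplicating before geocoding roughly halves the number of lookups.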


3. **Geocoding Locations and Data Preparation**: Use the geopy library to geocode the locations and obtain their latitude and longitude coordinates. Merge the geocoded coordinates with the original DataFrame and export the augmented data to a new CSV file.

In [3]:
# Step 3: Geocode locations using geopy
# Initialize Nominatim geocoder with a unique user agent
geolocator = Nominatim(user_agent="UseYourOwnUserAgentNameHere")

# Function to geocode location
def geocode_location(location):
    try:
        location_obj = geolocator.geocode(location)
        if location_obj:
            return location_obj.latitude, location_obj.longitude
        else:
            return None, None
    except Exception as e:
        print(f"Error geocoding {location}: {e}")
        return None, None

# Function to get coordinates and add to DataFrame
def get_coordinates(locations):
    coordinates = [geocode_location(location) for location in locations]
    df = pd.DataFrame({
        'location': locations,
        'latitude': [coord[0] for coord in coordinates],
        'longitude': [coord[1] for coord in coordinates]
    })

    # Count the locations that could not be geocoded (missing latitude or longitude)
    missing = df[['latitude', 'longitude']].isna().any(axis=1).sum()

    # Drop rows with NaN values in latitude or longitude
    df.dropna(subset=['latitude', 'longitude'], inplace=True)

    # Print the number of rows dropped
    print(f"Number of NaNs dropped: {missing}")
    
    return df

# Get coordinates for the list of locations
coordinates = get_coordinates(locations)

# Merge the coordinates DataFrame with the existing DataFrame
df_coordinates = df.merge(coordinates, on='location', how='left')

# Save the dataframe with coordinates to a CSV file
df_coordinates.to_csv('data/locations_with_coordinates.csv', index=False)

Number of NaNs dropped: 0
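
The public Nominatim service asks clients to send at most one request per second, and the `time` module imported in step 1 is not used in the cell above. One way to space out the lookups is sketched below, under stated assumptions: `make_throttled_lookup` and `fake_geocode` are hypothetical names, and the stand-in geocoder returns made-up coordinates so that the example runs offline. In the notebook, `geocode_location` would take the place of `fake_geocode`. geopy also ships a ready-made `geopy.extra.rate_limiter.RateLimiter` that serves the same purpose.

```python
import time


def make_throttled_lookup(geocode_fn, min_delay_seconds=1.0, sleep=time.sleep):
    """Wrap geocode_fn so repeated names hit a cache and new names are spaced out.

    geocode_fn is any callable that maps a place name to (lat, lon) or (None, None).
    """
    cache = {}
    last_call = [None]  # monotonic time of the previous real lookup

    def lookup(name):
        if name in cache:
            return cache[name]
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                sleep(wait)
        result = geocode_fn(name)
        last_call[0] = time.monotonic()
        cache[name] = result
        return result

    return lookup


# Stand-in geocoder with made-up coordinates (hypothetical, no network access).
calls = []


def fake_geocode(name):
    calls.append(name)
    return (None, None) if name == "Nowhere" else (0.0, 0.0)


lookup = make_throttled_lookup(fake_geocode, min_delay_seconds=0.0)
results = [lookup(name) for name in ["A", "B", "A", "Nowhere"]]
print(results, calls)
```

The second request for `"A"` is served from the cache, so the stand-in geocoder is called only three times.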


4. **Map Creation, Adding Markers, and Display**: Using Folium, create an interactive map centered on the mean location of the provided coordinates. Add markers to the map with popups displaying the location names, customized to be flexible in width. The interactive map is then displayed.

In [58]:
# Step 4: Create a map with Folium
map_center = [df_coordinates['latitude'].mean(), df_coordinates['longitude'].mean()]
map_locations = folium.Map(
    location=map_center, 
    zoom_start=2,
    min_zoom=2,
    max_bounds=True,
    tiles='OpenStreetMap',
    no_wrap=True,
    prefer_canvas=True
)

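# Add one marker per distinct location that has coordinates
marker_rows = df_coordinates.drop_duplicates(subset='location').dropna(subset=['latitude', 'longitude'])
for idx, row in marker_rows.iterrows():
    # Custom HTML content for the popup so its width adapts to the text
    popup_content = f"""
    <div style="min-width: 0px; max-width: 500px;">
        <p>{row['location']}</p>
    </div>
    """
    # Keep the popup's max_width in line with the max-width of the div above
    popup = folium.Popup(popup_content, max_width=500)
    folium.Marker([row['latitude'], row['longitude']], popup=popup).add_to(map_locations)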

# Display the map
map_locations

The map shows that most of the airports we visited are in the Americas, with some in Europe and fewer still in Australia.
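
The map center in step 4 is the mean over all rows of the merged DataFrame, so an airport with both an arrival and a departure row counts twice. The sketch below shows a variant that averages over distinct coordinates only and ignores missing values. It is an illustration rather than the notebook's method: `map_center` is a hypothetical helper and the coordinates are rough, rounded values.

```python
import math
from statistics import fmean


def map_center(points):
    """Mean latitude/longitude over distinct, non-missing (lat, lon) pairs."""

    def is_valid(point):
        return all(v is not None and not math.isnan(v) for v in point)

    distinct = {p for p in points if is_valid(p)}
    if not distinct:
        raise ValueError("no valid coordinates")
    lats, lons = zip(*distinct)
    return [fmean(lats), fmean(lons)]


# Rough, illustrative coordinates; one entry per row of the travel log.
points = [
    (50.0, 8.6),    # Frankfurt (departure)
    (52.3, 4.8),    # Amsterdam (arrival)
    (52.3, 4.8),    # Amsterdam (departure)
    (12.5, -70.0),  # Aruba (arrival)
    (None, None),   # a location the geocoder could not resolve
]
center = map_center(points)
print(center)
```

Whether to weight the center by number of visits is a design choice; at `zoom_start=2` the difference is barely visible.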

5. **Data Preparation**: I prepare the data by calculating the duration of stay at each location.

In [5]:
def create_stays_df(df):
    # Initialize an empty list to store the new rows
    new_data = []

    # Iterate through the DataFrame
    for i in range(len(df)-1):
        if df.iloc[i]['departure_arrival'] == 'arrival' and df.iloc[i+1]['departure_arrival'] == 'departure':
            new_row = {
                'location': df.iloc[i]['location'],
                'arrival_date': df.iloc[i]['timestamp'],
                'departure_date': df.iloc[i+1]['timestamp']
            }
            new_data.append(new_row)

    # Create the new DataFrame
    new_df = pd.DataFrame(new_data)
    return new_df

# Create the new DataFrame
df_stays = create_stays_df(df)

# Split the timestamp into two columns: datetime and timezone
df_stays[['arrival_datetime', 'arrival_timezone']] = df_stays['arrival_date'].str.rsplit(' ', n=1, expand=True)
df_stays[['departure_datetime', 'departure_timezone']] = df_stays['departure_date'].str.rsplit(' ', n=1, expand=True)

# Convert the datetime columns to datetime objects
df_stays['arrival_datetime'] = pd.to_datetime(df_stays['arrival_datetime'])
df_stays['departure_datetime'] = pd.to_datetime(df_stays['departure_datetime'])

# Function to convert the datetime to UTC using the timezone
def convert_to_utc(dt, tz_str):
    tz = pytz.timezone(tz_str)
    dt = tz.localize(dt)
    dt_utc = dt.astimezone(pytz.utc)
    return dt_utc

# Apply the function to convert to UTC
df_stays['arrival_utc'] = df_stays.apply(lambda row: convert_to_utc(row['arrival_datetime'], row['arrival_timezone']), axis=1)
df_stays['departure_utc'] = df_stays.apply(lambda row: convert_to_utc(row['departure_datetime'], row['departure_timezone']), axis=1)

# Prepare data for plotting: drop the timezone info so that
# 'start' and 'end' are plain (naive) datetimes expressed in UTC
df_stays['start'] = df_stays['arrival_utc'].dt.tz_localize(None)
df_stays['end'] = df_stays['departure_utc'].dt.tz_localize(None)

# Compute the duration of each stay as a Timedelta
df_stays['duration'] = df_stays['end'] - df_stays['start']

# String versions of the start and end times
df_stays['start_str'] = df_stays['start'].astype(str)
df_stays['end_str'] = df_stays['end'].astype(str)
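
The pairing and UTC conversion can be checked in isolation on the five rows shown in the data preview. The following is a minimal sketch, not the notebook's implementation: it uses the standard library's `zoneinfo` instead of pytz, it assumes the timestamps are month-first (`MM/DD/YYYY`), which is how `pd.to_datetime` reads this layout by default, and `to_utc` and `build_stays` are hypothetical helper names.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: month-first dates, e.g. 08/09/2021 is August 9, 2021.
FMT = "%m/%d/%Y %H:%M:%S"


def to_utc(stamp):
    """Turn 'MM/DD/YYYY HH:MM:SS Zone/Name' into an aware UTC datetime."""
    naive_text, tz_name = stamp.rsplit(" ", 1)
    local = datetime.strptime(naive_text, FMT).replace(tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc)


def build_stays(rows):
    """Pair each arrival with the departure that directly follows it."""
    stays = []
    for current, following in zip(rows, rows[1:]):
        if (current["departure_arrival"] == "arrival"
                and following["departure_arrival"] == "departure"):
            start = to_utc(current["timestamp"])
            end = to_utc(following["timestamp"])
            stays.append({
                "location": current["location"],
                "start": start,
                "end": end,
                "days": (end - start).total_seconds() / 86400,
            })
    return stays


log = [
    {"departure_arrival": "departure", "timestamp": "08/09/2021 11:45:00 Europe/Berlin",
     "location": "Frankfurt Airport, Germany"},
    {"departure_arrival": "arrival", "timestamp": "08/09/2021 13:00:00 Europe/Amsterdam",
     "location": "Amsterdam Airport Schiphol, Netherlands"},
    {"departure_arrival": "departure", "timestamp": "08/10/2021 11:50:00 Europe/Amsterdam",
     "location": "Amsterdam Airport Schiphol, Netherlands"},
    {"departure_arrival": "arrival", "timestamp": "08/10/2021 17:30:00 America/Aruba",
     "location": "Reina Beatrix Airport, Aruba"},
    {"departure_arrival": "departure", "timestamp": "09/10/2021 13:30:00 America/Aruba",
     "location": "Reina Beatrix Airport, Aruba"},
]
stays = build_stays(log)
for stay in stays:
    print(stay["location"], round(stay["days"], 2))
```

The first row is a departure without a preceding arrival, so it produces no stay; the trip's starting airport therefore does not appear in the time beam.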

6. **Time Beam Creation**: I use Plotly to create a bar chart where each bar represents the duration of stay at a location.

In [64]:
# Create the figure
fig = go.Figure()

# Add traces to the figure
for i, row in df_stays.iterrows():
    duration_seconds = (row['end'] - row['start']).total_seconds()
    fig.add_trace(go.Bar(
        x=[''],
        y=[duration_seconds],
        name=row['location'],
        orientation='v',
        text=row['location'],
        textposition='inside',
        textfont=dict(size=10),
        hoverinfo='text',
        hovertemplate=(
            f"<b>Airport:</b> {row['location']}<br>"
            f"<b>Arrival:</b> {row['start']}<br>"
            f"<b>Departure:</b> {row['end']}<br>"
            "<extra></extra>"
        ),
        constraintext='inside'
    ))
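
Because every trace shares the single x category `''`, stacking places each bar directly after the previous one, so a segment's position on the axis is the running total of the preceding durations. The sketch below reproduces that arithmetic in plain Python; `stacked_offsets` is a hypothetical helper and the durations are illustrative values in seconds.

```python
from itertools import accumulate


def stacked_offsets(durations):
    """Return (start, end) positions of each segment when stacked from zero."""
    ends = list(accumulate(durations))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


# Illustrative stay lengths in seconds: 22 h 50 min, 30 days 20 h, 5 days.
durations_seconds = [82_200, 2_664_000, 432_000]
segments = stacked_offsets(durations_seconds)
print(segments)
```

Only time spent at a location is counted here. Time in the air between a departure and the next arrival is not part of any bar, so positions on the stacked axis can sit slightly earlier than the corresponding calendar offsets.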


7. **Customizing the Visualization**: I compute the tick positions and date labels for the timeline axis, skipping dates that fall within a few days of the previous label so that the labels do not overlap.

In [None]:
# Calculate the total duration in seconds for y-axis range
min_start = df_stays['start'].min()
max_end = df_stays['end'].max()
total_duration = (max_end - min_start).total_seconds()

# Function to avoid overlapping dates
def generate_tick_labels(df, min_start, threshold_days=4):
    tickvals = []
    ticktext = []
    last_label_date = min_start - pd.Timedelta(days=threshold_days)

    for i, row in df.iterrows():
        start_date = row['start']
        if (start_date - last_label_date).days >= threshold_days:
            tickvals.append((start_date - min_start).total_seconds())
            ticktext.append(start_date.strftime('%b %d, %Y'))
            last_label_date = start_date

    return tickvals, ticktext

# Generate tick values and labels avoiding close dates
tickvals, ticktext = generate_tick_labels(df_stays, min_start)
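
The label-thinning rule can be tested without a DataFrame. This is a minimal sketch under stated assumptions: `thin_tick_labels` is a hypothetical stand-in for `generate_tick_labels` that takes a plain list of datetimes, and the dates are made up for illustration.

```python
from datetime import datetime, timedelta


def thin_tick_labels(starts, threshold_days=4):
    """Keep a label only if it is at least threshold_days after the last kept one."""
    if not starts:
        return [], []
    origin = min(starts)
    # Start one threshold before the origin so the first date is always kept.
    last_kept = origin - timedelta(days=threshold_days)
    tickvals, ticktext = [], []
    for start in starts:
        if (start - last_kept).days >= threshold_days:
            tickvals.append((start - origin).total_seconds())
            ticktext.append(start.strftime("%b %d, %Y"))
            last_kept = start
    return tickvals, ticktext


starts = [
    datetime(2021, 8, 9, 11, 0),
    datetime(2021, 8, 10, 21, 30),  # 1 day after the last label: skipped
    datetime(2021, 8, 12, 9, 0),    # 2 days after the last label: skipped
    datetime(2021, 9, 11, 8, 0),    # 32 days after the last label: kept
]
vals, text = thin_tick_labels(starts)
print(vals, text)
```

`timedelta.days` counts whole days only, so a gap of 3 days and 23 hours still counts as 3 days and the label is skipped.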

8. **Displaying the Time Beam**: The time beam is displayed, showing an intuitive timeline of travel durations.

In [65]:
# Update layout to remove the legend and customize the axis labels
fig.update_layout(
    title="Time Beam of Airports",
    yaxis=dict(
        title='Timeline',
        tickvals=tickvals,
        ticktext=ticktext,
        showticklabels=True,
        autorange="reversed",
    ),
    xaxis=dict(
        title='',
        showticklabels=False,
        showgrid=False
    ),
    margin=dict(t=40, b=0),
    showlegend=False,
    barmode='stack',
    height=1000,
    width=600,
    paper_bgcolor='white',
    plot_bgcolor='white'
)

# Show the figure
fig.show()

From the chart, it can be deduced that the longest stays were in the Dominican Republic, Mexico and Canada. There was also a longer stay in Australia, which included some domestic flights to Brisbane and Cairns.

## Conclusion
In this notebook, I demonstrated how to visualize airport locations on an interactive map using Folium and how to create a time beam to represent the duration of stays using Plotly. These visualizations provide an intuitive way to explore travel data. The interactive map allows users to see the geographical spread of travel destinations, while the time beam offers a clear timeline of travel durations. This combination of tools is particularly useful for visualizing longer trips and trips that span several countries.