## ℹ️ Information

* This notebook analyses a traveler trip dataset.
* The aim is to examine how traveler and trip characteristics relate to trip types and travel costs, and to build meaningful visualisations from the data.
* The data includes trip start and end dates, transport and accommodation types, and their costs.
* The analysis walks through cleaning, transforming, and visualising the data with a range of charts.

## Data Loading and First Review

In [1]:
import pandas as pd
import plotly.express as px
import plotly.graph_objs as go
import warnings

warnings.filterwarnings("ignore")

In [2]:
df = pd.read_csv("/kaggle/input/traveler-trip-data/Travel details dataset.csv")
df_copy = df.copy()

In [3]:
df.head()

Unnamed: 0,Trip ID,Destination,Start date,End date,Duration (days),Traveler name,Traveler age,Traveler gender,Traveler nationality,Accommodation type,Accommodation cost,Transportation type,Transportation cost
0,1,"London, UK",5/1/2023,5/8/2023,7.0,John Smith,35.0,Male,American,Hotel,1200,Flight,600
1,2,"Phuket, Thailand",6/15/2023,6/20/2023,5.0,Jane Doe,28.0,Female,Canadian,Resort,800,Flight,500
2,3,"Bali, Indonesia",7/1/2023,7/8/2023,7.0,David Lee,45.0,Male,Korean,Villa,1000,Flight,700
3,4,"New York, USA",8/15/2023,8/29/2023,14.0,Sarah Johnson,29.0,Female,British,Hotel,2000,Flight,1000
4,5,"Tokyo, Japan",9/10/2023,9/17/2023,7.0,Kim Nguyen,26.0,Female,Vietnamese,Airbnb,700,Train,200


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 139 entries, 0 to 138
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Trip ID               139 non-null    int64  
 1   Destination           137 non-null    object 
 2   Start date            137 non-null    object 
 3   End date              137 non-null    object 
 4   Duration (days)       137 non-null    float64
 5   Traveler name         137 non-null    object 
 6   Traveler age          137 non-null    float64
 7   Traveler gender       137 non-null    object 
 8   Traveler nationality  137 non-null    object 
 9   Accommodation type    137 non-null    object 
 10  Accommodation cost    137 non-null    object 
 11  Transportation type   136 non-null    object 
 12  Transportation cost   136 non-null    object 
dtypes: float64(2), int64(1), object(10)
memory usage: 14.2+ KB


In [6]:
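# Executed after the date conversion in cell In [5], which is why the date columns appear as datetimes below.
df.describe().T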

Unnamed: 0,count,mean,min,25%,50%,75%,max,std
Trip ID,139.0,70.0,1.0,35.5,70.0,104.5,139.0,40.269923
Start date,137.0,2023-05-06 01:13:34.598540032,2021-06-15 00:00:00,2022-09-01 00:00:00,2023-06-15 00:00:00,2023-11-20 00:00:00,2025-05-21 00:00:00,
End date,137.0,2023-05-13 14:00:52.554744576,2021-06-20 00:00:00,2022-09-10 00:00:00,2023-06-20 00:00:00,2023-11-30 00:00:00,2025-05-29 00:00:00,
Duration (days),137.0,7.605839,5.0,7.0,7.0,8.0,14.0,1.601276
Traveler age,137.0,33.175182,20.0,28.0,31.0,38.0,60.0,7.145441


## Date Conversions and Cleaning Missing Values

In [5]:
df["Start date"] = pd.to_datetime(df["Start date"])
df["End date"] = pd.to_datetime(df["End date"])
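
A quick sanity check on the conversion: the sketch below parses two rows copied from the `df.head()` output with an explicit month-first format and recomputes the trip length from the dates. The `sample` frame and the names `computed_days` and `duration_matches` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Two rows copied from the df.head() output shown earlier.
sample = pd.DataFrame({
    "Start date": ["5/1/2023", "8/15/2023"],
    "End date": ["5/8/2023", "8/29/2023"],
    "Duration (days)": [7.0, 14.0],
})

# An explicit format removes any ambiguity between month-first and day-first dates.
for col in ["Start date", "End date"]:
    sample[col] = pd.to_datetime(sample[col], format="%m/%d/%Y")

# Recompute the duration from the dates and compare it with the stored column.
computed_days = (sample["End date"] - sample["Start date"]).dt.days
duration_matches = computed_days.eq(sample["Duration (days)"])
print(duration_matches.all())
```

The same comparison could be run on the full `df` to flag rows where the stored duration disagrees with the dates.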

In [7]:
df.isnull().sum()

Trip ID                 0
Destination             2
Start date              2
End date                2
Duration (days)         2
Traveler name           2
Traveler age            2
Traveler gender         2
Traveler nationality    2
Accommodation type      2
Accommodation cost      2
Transportation type     3
Transportation cost     3
dtype: int64

In [8]:
df = df.dropna()
# drop=True discards the old index instead of adding it as an extra "index" column
df = df.reset_index(drop=True)
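
Before dropping rows it can help to look at what is being discarded. The sketch below uses a small toy frame (not the real dataset) to list the incomplete rows and to show the effect of `drop=True`: calling `reset_index()` with no arguments keeps the old index as a new `index` column, while `drop=True` discards it. The names `toy`, `incomplete_rows`, and `cleaned` are illustrative.

```python
import pandas as pd

# Toy frame with the same kind of gaps as the real data.
toy = pd.DataFrame({
    "Trip ID": [1, 2, 3, 4],
    "Destination": ["London, UK", None, "Bali, Indonesia", "Tokyo, Japan"],
    "Transportation type": ["Flight", None, None, "Train"],
})

# Rows with at least one missing value, for inspection before they are dropped.
incomplete_rows = toy[toy.isnull().any(axis=1)]

# Drop the incomplete rows and renumber the index from 0 without keeping the old one.
cleaned = toy.dropna().reset_index(drop=True)

print(incomplete_rows)
print(cleaned)
```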

## Editing Data for Type Conversion

In [9]:
col_list = ["Transportation cost", "Accommodation cost"]

for col in col_list:
    df[col] = df[col].astype(str)
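    # regex=False treats "$" as a literal character; as a regex it would match end-of-string
    df[col] = df[col].str.replace("$", "", regex=False)
    df[col] = df[col].str.replace(" USD", "", regex=False)
    df[col] = df[col].str.replace(",", "", regex=False)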
    df[col] = pd.to_numeric(df[col])
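
To illustrate what the cleaning loop does, the sketch below applies the same three replacements to a handful of made-up cost strings. The sample values are assumptions about the formats present in the raw columns, inferred from the characters the loop strips. Note that in pandas versions before 2.0, `str.replace` treats the pattern as a regular expression by default, so a bare `"$"` matches the end of the string rather than the dollar sign; passing `regex=False` makes the intent explicit.

```python
import pandas as pd

# Made-up examples of the mixed formats the cost columns appear to contain.
raw_costs = pd.Series(["1200", "$1,500", "800 USD", "$2,000 USD"])

# Strip the currency symbol, the unit suffix, and thousands separators as literal text.
cleaned_costs = pd.to_numeric(
    raw_costs.str.replace("$", "", regex=False)
             .str.replace(" USD", "", regex=False)
             .str.replace(",", "", regex=False)
)

print(cleaned_costs.tolist())
```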

In [10]:
df["Transportation type"] = df["Transportation type"].astype(str)

# Grouping of Data

## Grouping by Destination

In [11]:
# Note: "max" on a text column returns the alphabetically last value in each group, not the most frequent one.
group_destination = df.groupby("Destination").agg({"Trip ID":"count", "Duration (days)":"mean", "Traveler age":"mean", "Traveler nationality":"max",
                              "Accommodation type": "max", "Accommodation cost" : "mean", "Transportation type" : "max", "Transportation cost" : "mean"}).reset_index()

## Grouping by Gender

In [12]:
group_gender = df.groupby("Traveler gender").agg({"Trip ID":"count", "Duration (days)":"mean", "Traveler age":"mean", "Traveler nationality":"max",
                              "Accommodation type": "max", "Accommodation cost" : "mean", "Transportation type" : "max", 
                                 "Transportation cost" : "mean"}).reset_index()

## Grouping by Accommodation Type

In [13]:
group_accommodation_type = df.groupby("Accommodation type").agg({"Trip ID":"count", "Duration (days)":"mean", "Traveler age":"mean",
                                        "Accommodation cost" : "mean", "Transportation type" : "max", 
                                        "Transportation cost" : "mean"}).reset_index()

## Grouping by Traveler Nationality

In [14]:
group_nationality = df.groupby("Traveler nationality").agg({"Trip ID":"count", "Duration (days)":"mean", "Traveler age":"mean",
                                        "Accommodation cost" : "mean", "Transportation type" : "max", 
                                        "Transportation cost" : "mean"}).reset_index()

## Grouping by Type of Transport

In [15]:
group_transportation_type = df.groupby("Transportation type").agg({"Trip ID":"count", "Duration (days)":"mean", "Traveler age":"mean",
                                                                   "Accommodation cost" : "mean","Transportation cost" : "mean"}).reset_index()

## Grouping by Travel Start and End Dates

In [16]:
group_start_end_date = df.groupby(["Start date", "End date"]).agg({"Accommodation cost":"mean"}).reset_index()
group_start_end_date = group_start_end_date.sort_values("Accommodation cost", ascending = False)

## Grouping by Destination and Traveller Nationality

In [18]:
group_destination_nationality = df.groupby(["Destination", "Traveler nationality"]).agg({"Accommodation cost":"mean", "Trip ID": "count"}).reset_index()
group_destination_nationality = group_destination_nationality.sort_values("Accommodation cost", ascending = False)
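
Several of the groupings above aggregate text columns with `"max"`, which picks the alphabetically last value in each group. If the most frequent value is what is wanted, one possible alternative is to aggregate with the mode. The sketch below compares the two on toy data; the `most_common` helper and the `toy` frame are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Toy data: "Hotel" is the most frequent type for London, but "Villa" sorts last.
toy = pd.DataFrame({
    "Destination": ["London, UK", "London, UK", "London, UK", "Tokyo, Japan"],
    "Accommodation type": ["Hotel", "Hotel", "Villa", "Airbnb"],
})

def most_common(series):
    # mode() returns all tied values in sorted order; keep the first one.
    return series.mode().iloc[0]

# Named aggregation puts both results side by side for comparison.
summary = toy.groupby("Destination")["Accommodation type"].agg(
    alphabetical_max="max",
    most_common=most_common,
)

print(summary)
```

For London the two columns differ (`Villa` versus `Hotel`), which is the case where `"max"` can be misleading as a summary.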

# Visualisations

## Average Cost by Accommodation Types

In [19]:
group_accommodation_type = group_accommodation_type.sort_values(by='Accommodation cost')

sorted_accommodation_types = group_accommodation_type['Accommodation type'].unique()

fig = px.bar(
    group_accommodation_type,
    y='Accommodation type',
    x='Accommodation cost',
    color='Accommodation cost',
    orientation='h',
    color_continuous_scale='blugrn',
    title='Average Accommodation Cost By Accommodation Type'
)


fig.update_layout(

    title_x=0.5, 
    title_font_size=20,  
    legend=dict(
        font=dict(size=12),  
        bgcolor='rgba(255, 255, 255, 0.5)',  
        bordercolor='rgba(0, 0, 0, 0.5)',  
        borderwidth=2  
    ),
    margin=dict(
        l=50,
        r=50,
        b=50,
        t=80
    ))


fig.show()


## Average Cost by Mode of Transport

In [20]:
base_colors = [
    'rgb(196, 230, 195)',
    'rgb(175, 211, 182)',
    'rgb(154, 192, 170)',
    'rgb(133, 173, 157)',
    'rgb(112, 154, 145)',
    'rgb(91, 135, 133)',
    'rgb(70, 116, 120)',
    'rgb(49, 97, 108)',
    'rgb(29, 79, 96)'
]

base_colors.reverse()

In [21]:
group_transportation_type = group_transportation_type.sort_values(by='Transportation cost', ascending=False)
group_transportation_type["Transportation cost"] = group_transportation_type["Transportation cost"].round(2)
# base_colors = px.colors.sequential.Blugrn


fig = go.Figure(data=[
    go.Pie(
        sort=False,
        labels=group_transportation_type["Transportation type"],
        values=group_transportation_type["Transportation cost"],
        pull=0.09,
        hole=0.3,
        marker=dict(colors=base_colors),
    )
])


fig.update_layout(
    title='Transportation AVG Cost By Transportation Type',
    title_x=0.5,  
    title_font_size=20,
    legend=dict(
        title='Transportation Type',  
        font=dict(size=12),
        bgcolor='rgba(255, 255, 255, 0.5)',
        bordercolor='rgba(0, 0, 0, 0.5)',
        borderwidth=2 
    ),
    margin=dict(
        l=50,
        r=50,
        b=50,
        t=80
    ))

fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=16,
                  marker=dict(line=dict(color='#000000', width=1)))


fig.show()

## Number of Travellers by Mode of Transport

In [37]:
group_transportation_type = group_transportation_type.sort_values(by='Trip ID', ascending=False)


fig = px.scatter(group_transportation_type, 
                 x="Transportation type", 
                 y="Trip ID", 
                 size="Trip ID", 
                 color="Trip ID",
                 hover_name="Trip ID",
                 text="Transportation type",
                 size_max=200,
                 color_continuous_scale='blugrn', 
                 title='Traveler Count By Transportation Type')

fig.update_layout(coloraxis_colorbar=dict(title="Traveler Count"))

fig.show()


## Number of Travellers by Gender

In [23]:
base_colors = [
    'rgb(62, 193, 169)',
    'rgb(252, 19, 240)',
]

In [38]:
group_gender = group_gender.sort_values(by='Trip ID', ascending=False)

# base_colors = px.colors.sequential.Blugrn

fig = go.Figure(data=[
    go.Pie(
        sort=False,
        labels=group_gender["Traveler gender"],
        values=group_gender["Trip ID"],
        pull=0.02,
        hole=0.3,
        marker=dict(colors=base_colors),
    )
])


fig.update_layout(
    title='Traveler Count By Gender',
    title_x=0.5,
    title_font_size=20,
    legend=dict(
        title='Traveler Gender',
        font=dict(size=12),
        bgcolor='rgba(255, 255, 255, 0.5)',  
        bordercolor='rgba(0, 0, 0, 0.5)', 
        borderwidth=2  
    ),
    margin=dict(
        l=50,
        r=50,  
        b=50,  
        t=80  
    ))

fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=16,
                  marker=dict(line=dict(color='#000000', width=1)))

fig.show()

## The Relationship between Accommodation Cost and Traveller Nationality

In [32]:
group_destination_nationality_graph = group_destination_nationality.head(15)
sources = group_destination_nationality_graph["Destination"]
targets = group_destination_nationality_graph["Traveler nationality"]
values = group_destination_nationality_graph["Accommodation cost"]

labels = list(pd.concat([sources, targets]).unique())
source_indices = [labels.index(source) for source in sources]
target_indices = [labels.index(target) for target in targets]


customdata = list(zip(sources, targets))

fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="black", width=0.5),
        label=labels,
        color="blue"  
    ),
    link=dict(
        source=source_indices,
        target=target_indices,
        value=values,
        customdata=customdata,
        color="lightblue",  
        hovertemplate='Destination: %{customdata[0]}<br>Traveler nationality: %{customdata[1]}<br>Accommodation cost: %{value}<extra></extra>',
    )
)])


fig.update_layout(
    title_text="Top 15 Destination and Traveler Nationality Pairs By Accommodation Cost",
    font_size=10,
    font_color='black',
    title_font=dict(size=20, color='darkblue'),
    plot_bgcolor='white',
    paper_bgcolor='white',
    height=600,
    width=800
)

fig.show()
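
The Sankey trace needs integer node indices rather than names, which is what the `labels.index(...)` lookups above provide. A minimal sketch of the same mapping with a dictionary is shown below; it avoids a linear scan per lookup and keeps the labels in first-seen order. The sample `sources` and `targets` are illustrative values, and `label_to_index` is a hypothetical helper name.

```python
# Illustrative link endpoints: destinations on the left, nationalities on the right.
sources = ["London, UK", "Tokyo, Japan", "London, UK"]
targets = ["American", "Vietnamese", "Korean"]

# Unique node labels in first-seen order (dicts preserve insertion order).
labels = list(dict.fromkeys(sources + targets))

# Map each label to its position so every link can refer to nodes by index.
label_to_index = {label: i for i, label in enumerate(labels)}
source_indices = [label_to_index[s] for s in sources]
target_indices = [label_to_index[t] for t in targets]

print(labels)
print(source_indices, target_indices)
```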

## Number of Travellers by Destinations (Radar Graph)

In [47]:
categories = group_destination['Destination'].tolist()
values = group_destination['Trip ID'].tolist()

values += values[:1]
categories += categories[:1]

fig = go.Figure()

fig.add_trace(go.Scatterpolar(
    r=values,
    theta=categories,
    fill='toself',
    name='Traveler Count'
))

fig.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[0, max(values)]
        ),
        angularaxis=dict(
            tickmode='array',
            tickvals=[i for i in range(len(categories)) if i % 2 == 0], 
            ticktext=[categories[i] for i in range(len(categories)) if i % 2 == 0],  
            tickfont=dict(size=10)
        )
    ),
    showlegend=True
)



fig.show()
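
The radar cells repeat the first value and category at the end of each list so that the outline closes back on its starting point. A minimal sketch of that step with toy counts is shown below; it builds new lists instead of extending the originals in place, which avoids appending the closing point twice if the cell is re-run. The counts are made up for illustration.

```python
# Toy category counts (illustrative only).
categories = ["Hotel", "Airbnb", "Villa"]
values = [3, 5, 2]

# Append the first point to the end so the polygon returns to where it started.
closed_categories = categories + categories[:1]
closed_values = values + values[:1]

print(closed_categories)
print(closed_values)
```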


## Number of Travellers by Accommodation Types (Radar Graph)

In [48]:
categories = group_accommodation_type['Accommodation type'].tolist()
values = group_accommodation_type['Trip ID'].tolist()


values += values[:1]
categories += categories[:1]


fig = go.Figure()

fig.add_trace(go.Scatterpolar(
    r=values,
    theta=categories,
    fill='toself',
    name='Traveler Count'
))

fig.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[0, max(values)]
        ),
        angularaxis=dict(
            tickmode='array',
            tickvals=[i for i in range(len(categories)) if i % 2 == 0],
            ticktext=[categories[i] for i in range(len(categories)) if i % 2 == 0],
            tickfont=dict(size=10)
        )
    ),
    showlegend=True
)


fig.show()
