# Danish Tourism Patterns - Analysis Notebook

## 1. Motivation

### What is your dataset?

We analyze Danish tourism patterns using data from several sources, all listed below:

1. **Danish Tourism Statistics** (Statistics Denmark - Statistik Banken)
   - Demographics data: Socioeconomic (FU14), Regional (FU17), Age (FU18): https://www.statistikbanken.dk/statbank5a/SelectVarVal/Define.asp?MainTable=FU14&PLanguage=0&PXSId=0&wsid=cftree

2. **Air Travel Comparisons**
   - World Bank tourism data: https://databank.worldbank.org/id/7ad54403

3. **Global Reference Data**
   - TripAdvisor Restaurants (31 European cities): https://www.kaggle.com/datasets/damienbeneschi/krakow-ta-restaurans-data-raw/code
   - Global Peace Index (GPI): https://www.kaggle.com/datasets/natalyreguerin/global-peace-index-gpi
   - Cost of Living Index: https://www.numbeo.com/cost-of-living/rankings_by_country.jsp
   - Mean Temperature by Country: https://www.kaggle.com/datasets/palinatx/mean-temperature-for-countries-by-year-2014-2022?select=combined_temperature.csv
   - CO2 Emissions: https://flightemissionmap.org/#Copenhagen/55.67,12.57/127/20000

The Danish data covers demographics with detailed stats on who travels and how much they spend. The global data helps understand why Danes choose certain destinations.

### Why did you choose these datasets?

We wanted to answer questions like:
- How has Danish tourism changed over time?
- What factors make destinations popular?
- Do different demographics travel differently?
- More generally, what can we learn about travel, with a focus on Danes?

We combined government statistics with global data to get both local patterns and international context. The idea was to create something that helps people understand Danish travel trends and maybe plan their own trips.

### What was your goal for the end user's experience?

Our goal is for end users to see how Danish tourism has evolved, compare countries on different metrics, explore demographic patterns, and ultimately make informed decisions about where to travel next.
We went for a magazine-style design that guides users through key insights but also lets them explore freely.

## 2. Basic stats

### Data cleaning process

Each visualization needed different cleaning approaches:

**For the bubble chart:**
```python
import pandas as pd
import numpy as np

# Load World Bank tourism data
df = pd.read_csv('../data/bubble_plot.csv')

# Clean currency and numeric fields
df['GDP'] = pd.to_numeric(df['GDP, PPP (current international $) [NY.GDP.MKTP.PP.CD]'].str.replace(',', ''), errors='coerce')
df['Population'] = pd.to_numeric(df['Population, total [SP.POP.TOTL]'], errors='coerce')
df['Departures'] = pd.to_numeric(df['International tourism, number of departures [ST.INT.DPRT]'].str.replace(',', ''), errors='coerce')
df['PerCapita'] = pd.to_numeric(df['International Tourism Departures per capita'], errors='coerce')

# Add continent categorization
continent_mapping = {
    'Denmark': 'Europe', 'Sweden': 'Europe', 'Norway': 'Europe', 
    'United Kingdom': 'Europe', 'Germany': 'Europe', 'France': 'Europe',
    'United States': 'North America', 'Canada': 'North America',
    'China': 'Asia', 'Japan': 'Asia', 'India': 'Asia',
    'Australia': 'Oceania', 'New Zealand': 'Oceania',
    'South Africa': 'Africa', 'Egypt, Arab Rep.': 'Africa',
    'Brazil': 'South America', 'Argentina': 'South America',
    'Saudi Arabia': 'Middle East', 'United Arab Emirates': 'Middle East'
}

df['Continent'] = df['Country Name'].map(continent_mapping).fillna('Other')
```
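The effect of `errors='coerce'` is easy to miss, because unparseable values silently become `NaN`. A minimal sketch of the same cleaning pattern on made-up values (the `".."` placeholder and the numbers are assumptions for illustration, not rows from `bubble_plot.csv`):

```python
import pandas as pd

# Hypothetical raw values: two numbers, one placeholder string, one missing entry
raw = pd.Series(["1,234,567", "89012", "..", None])

# Strip thousands separators, then coerce anything non-numeric to NaN
cleaned = pd.to_numeric(raw.str.replace(",", "", regex=False), errors="coerce")

# Count values lost to coercion, excluding those that were already missing
n_coerced = int(cleaned.isna().sum() - raw.isna().sum())
print(cleaned.tolist())
print(f"{n_coerced} value(s) could not be parsed")
```

Counting the coerced values after each conversion is a cheap way to see how much data a column loses during cleaning.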

**For the choropleth map:**
```python
import folium
import warnings

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore', category=pd.errors.SettingWithCopyWarning)
warnings.filterwarnings('ignore', category=FutureWarning)

# Load all datasets
cost_df = pd.read_csv("../data/CostOfLiving.csv")
gpi_df = pd.read_csv("../data/GPI.csv")
temp_df = pd.read_csv("../data/combined_temperature.csv")
restaurants_df = pd.read_csv("../data/TA_restaurants_europa.csv")
CO2_df = pd.read_csv("../data/CO2Emission.csv")

# Clean and standardize country names
cost_df.columns = cost_df.columns.str.strip()
cost_df['Country'] = cost_df['Country'].str.strip()

# Country name mapping for standardization
country_name_map = {
    'United States': 'United States of America',
    'Russia': 'Russian Federation',
    'South Korea': 'Korea, Republic of',
    'Bosnia And Herzegovina': 'Bosnia and Herzegovina',
    'United Kingdom': 'United Kingdom'
    # ... (extensive mapping)
}
```
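One way such a mapping might be applied and verified is sketched below. The two small tables and their values are invented for illustration; only the idea of mapping names to a standard form comes from the notebook.

```python
import pandas as pd

# Hypothetical miniature versions of two source tables (values are made up)
cost = pd.DataFrame({
    "Country": [" United States", "Russia ", "Denmark"],
    "Cost of Living Index": [70.4, 35.1, 78.6],
})
peace = pd.DataFrame({
    "Country": ["United States of America", "Russian Federation", "Denmark"],
    "GPI": [2.4, 3.1, 1.3],
})

name_map = {"United States": "United States of America", "Russia": "Russian Federation"}

# Trim whitespace, then map to the standard name; unmapped names pass through unchanged
cost["Country"] = cost["Country"].str.strip().replace(name_map)

# An outer merge with indicator=True flags names that still fail to match
merged = cost.merge(peace, on="Country", how="outer", indicator=True)
unmatched = merged.loc[merged["_merge"] != "both", "Country"].tolist()
print(unmatched)
```

An empty `unmatched` list means every country was found in both tables; any name that appears there is a candidate for a new entry in the mapping.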

**For the demographics data:**
```python
# Process socioeconomic spending data
socio_translations = {
    "Gennemsnitshusstand": "Average Household",
    "Selvstændig": "Self-employed",
    "Lønmodtager på højeste niveau": "High Income",
    "Lønmodtager på mellemniveau": "Medium Income",
    "Lønmodtager på grundniveau": "Basic Income",
    "Arbejdsløs": "Unemployed",
    "Uddannelsessøgende": "Student",
    "Pensionist": "Pensioner",
    "Ude af erhverv i øvrigt": "Not in Workforce"
}

socio_df = socioeconomic.copy()
socio_df['Group_EN'] = socio_df['Group'].map(socio_translations)
socio_df['Total'] = socio_df['Packages'] + socio_df['Restaurants'] + socio_df['Accommodation']
```
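Because `.map()` returns `NaN` for any key missing from the dictionary, a misspelled Danish label would silently produce an untranslated group. A small check along these lines can catch that (the label list here is hypothetical and deliberately incomplete):

```python
import pandas as pd

translations = {"Pensionist": "Pensioner", "Arbejdsløs": "Unemployed"}

# Hypothetical labels; the last one is deliberately absent from the dictionary
groups = pd.Series(["Pensionist", "Arbejdsløs", "Selvstændig"])

translated = groups.map(translations)

# Collect every original label that did not receive a translation
unmapped = groups[translated.isna()].unique().tolist()
print(unmapped)
```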

### Dataset statistics

| Visualization | Dataset | Size | Coverage |
|--------------|---------|------|----------|
| Bubble Chart | Tourism data | 5,000+ rows | 1996-2019, 200+ countries |
| Choropleth | Cost/Climate/Safety | 124,000+ rows | Cost of living, rent, groceries and restaurant price indices; Global Peace Index; mean temperature; CO2 emissions; TripAdvisor restaurant information |
| Radar Charts | Regional, socioeconomic and age data | 30 rows | Travel spending by demographic group |

Key findings based on datasets:
- Danish tourism departures per capita grew from 0.957 to 1.563 between 1996 and 2019
- Cost of living has stronger correlation with destination choice than safety
- High income groups spend 3x more on package holidays
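The first finding can be restated as a growth rate. A short sketch of the arithmetic, using only the two per-capita figures quoted above (the annual figure assumes steady compound growth, which is a simplification and not something the data claims):

```python
start, end = 0.957, 1.563      # departures per capita in 1996 and 2019
years_elapsed = 2019 - 1996

# Relative change over the whole period
total_growth = (end - start) / start

# Equivalent constant yearly rate under compound growth (an assumption)
annual_growth = (end / start) ** (1 / years_elapsed) - 1

print(f"Total growth: {total_growth:.1%}")
print(f"Compound annual growth: {annual_growth:.2%}")
```

This puts the total increase at roughly 63% over the 23 years.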

## 3. Data Analysis

### Describe your data analysis and explain what you've learned about the dataset

#### Denmark's Tourism Journey - Bubble Chart

The bubble chart visualization tracks Denmark's tourism evolution compared to other countries:

```python
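import plotly.graph_objects as go

# Build one animation frame per year, with one trace per continent plus Denmark
plotly_frames = []
for year in years:
    frame_data_payload = []
    in_year = df_clean['Time'] == year
    for continent_or_denmark in continent_order_for_traces:
        if continent_or_denmark == 'Denmark':
            # Denmark gets its own trace with a star marker
            current_year_data = df_clean[in_year & df_clean['IsDenmark']]
            symbol = 'star'
        else:
            current_year_data = df_clean[in_year & (df_clean['Continent'] == continent_or_denmark) & (~df_clean['IsDenmark'])]
            symbol = 'circle'
        # Add this group's trace to the frame (column names assumed from the cleaning step)
        frame_data_payload.append(go.Scatter(
            x=current_year_data['GDP'], y=current_year_data['PerCapita'],
            mode='markers', name=continent_or_denmark, marker=dict(symbol=symbol)))
    plotly_frames.append(go.Frame(data=frame_data_payload, name=str(year)))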
```

**Key Insights:**
- Denmark consistently above global average in tourism per capita
- Strong growth correlates with GDP growth (400% increase 1996-2019)
- Nordic countries show similar patterns, but Denmark leads regionally

#### Global Destination Factors - Choropleth Map

An interactive world map compares eight destination factors:

```python
# Define metrics for comparison
metrics = [
    ('Cost of Living Index', 'Cost of Living Index'),
    ('GPI', 'Global Peace Index'),
    ('Rent Index', 'Rent Index'),
    ('Annual_Mean_Temperature', 'Temperature (°C)'),
    ('Groceries Index', 'Groceries Index'),
    ('CO2', 'CO2 Emissions (kg)'),
    ('Restaurant Price Index', 'Restaurant Prices'),
    ('restaurants', 'Restaurant Information')
]

# Create interactive layers with tooltips
for metric in metrics:
    feature_group = folium.FeatureGroup(name=metric[1])
    # Add GeoJSON with dynamic styling
    # Enable metric switching with JavaScript
```

**Key Discoveries:**
- Cost of living is the strongest predictor of destination popularity
- Safety (GPI) has a moderate negative correlation with tourism
- Temperature plays a significant role in summer travel choices

#### Demographic Spending Patterns - Radar Charts

Three radar charts compare spending across demographics:

```python
# Create normalized radar charts
def plot_single_radar_chart(ax, df, chart_title, group_col='Group', display_col=None):
    # One spoke per group; use the translated labels when display_col is given
    groups = df[display_col or group_col].tolist()

    # Calculate angles for radar chart
    angles = np.linspace(0, 2 * np.pi, len(groups), endpoint=False).tolist()
    angles += angles[:1]  # Complete the circle

    # Normalize against the maximum across all three charts so they share one scale
    global_max = max(
        socio_df[['Packages', 'Restaurants', 'Accommodation']].values.max(),
        age_df[['Packages', 'Restaurants', 'Accommodation']].values.max(),
        region_df[['Packages', 'Restaurants', 'Accommodation']].values.max()
    )

    # Plot each spending category
    for category in ['Packages', 'Restaurants', 'Accommodation']:
        values = df[category].tolist()
        values += values[:1]  # Repeat the first value to close the polygon
        normalized = [val / global_max * 100 for val in values]

        ax.fill(angles, normalized, alpha=0.25, color=category_colors[category])
        ax.plot(angles, normalized, linewidth=2, color=category_colors[category], label=category)

    # Label each spoke with its group name and title the chart
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(groups)
    ax.set_title(chart_title)
```
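The choice of a single global maximum matters: it puts all three charts on one 0-100 scale, so a value of 50 means the same amount of spending in every chart. A minimal sketch of that normalization on its own, with two made-up frames standing in for the real ones:

```python
import pandas as pd

categories = ["Packages", "Restaurants", "Accommodation"]

# Hypothetical frames standing in for the real demographic tables (values are made up)
frames = {
    "socio": pd.DataFrame({"Packages": [8000, 2000], "Restaurants": [3000, 1000],
                           "Accommodation": [4000, 1500]}),
    "age": pd.DataFrame({"Packages": [5000, 6000], "Restaurants": [2500, 2000],
                         "Accommodation": [3000, 3500]}),
}

# One maximum across every frame and category, so all charts share a 0-100 scale
global_max = max(frame[categories].to_numpy().max() for frame in frames.values())

normalized = {name: frame[categories] / global_max * 100 for name, frame in frames.items()}
print(normalized["age"])
```

Normalizing each chart by its own maximum instead would make every chart reach 100 and hide differences in spending levels between the charts.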

**Demographic Patterns:**
- Self-employed heavily favor package holidays
- Students spend proportionally more on accommodation
- The regional comparison shows 40% higher spending in Copenhagen
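Statements about proportional spending depend on each category's share of a group's total, not on absolute amounts. A sketch of how such shares could be computed, using invented figures (not values from FU14):

```python
import pandas as pd

# Hypothetical spending figures, made up for illustration
demo = pd.DataFrame({
    "Group_EN": ["Student", "Pensioner"],
    "Packages": [2000, 6000],
    "Restaurants": [1000, 2000],
    "Accommodation": [2000, 2000],
})
categories = ["Packages", "Restaurants", "Accommodation"]

demo["Total"] = demo[categories].sum(axis=1)

# Divide each category by the row total so the shares sum to 1 per group
shares = demo[categories].div(demo["Total"], axis=0)
shares.insert(0, "Group_EN", demo["Group_EN"])
print(shares.round(2))
```

In this toy table both groups spend the same absolute amount on accommodation, but it makes up a larger share of the student budget.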

#### Correlation Analysis - Destination Factors

A correlation matrix shows the relationships between the metrics:

```python
import plotly.graph_objects as go

# Create correlation matrix (Pearson by default)
correlation_data = merged_df[[metric[0] for metric in available_metrics]].copy()
corr_matrix = correlation_data.corr()

# Interactive heatmap
fig_heatmap = go.Figure(data=go.Heatmap(
    z=corr_matrix.values,
    x=[metric[1] for metric in available_metrics],
    y=[metric[1] for metric in available_metrics],
    colorscale='RdBu',
    zmid=0,
    text=np.round(corr_matrix.values, 3),
    texttemplate='%{text}'
))
```

**Correlation Findings:**
- Cost metrics highly correlated (0.9+ between most)
- Cost vs Safety: -0.524 (moderate negative)
- Temperature vs CO2: +0.512 (climate impact)
- Cost vs Temperature: -0.317 (warmer = cheaper)
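Pairwise values like these can be read off the matrix systematically. The sketch below ranks metric pairs by the strength of their correlation; the table and its values are hypothetical, and only the column names are borrowed from the metrics above.

```python
import numpy as np
import pandas as pd

# Hypothetical metric table; the values are made up for illustration
metrics_df = pd.DataFrame({
    "Cost of Living Index": [80, 60, 40, 30, 70],
    "Rent Index": [40, 28, 15, 12, 33],
    "GPI": [1.3, 1.8, 2.4, 2.9, 1.5],
})

corr = metrics_df.corr()  # Pearson correlation by default

# Keep the upper triangle only, so each pair appears once and the diagonal is dropped
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Rank pairs by absolute strength, strongest first
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked.round(3))
```

Sorting by absolute value keeps strong negative relationships near the top instead of burying them at the bottom of the list.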

We also used:
- Correlation analysis to identify destination attractors
- Time series analysis for tourism trends
- Normalization techniques for cross-metric comparison
- Geographic clustering through choropleth visualization
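The list above does not say which normalization was applied for cross-metric comparison. As an illustration only, min-max scaling is one common choice; the sketch below assumes it and uses made-up values.

```python
import pandas as pd

# Hypothetical values on very different scales (made up for illustration)
metrics_df = pd.DataFrame(
    {
        "Cost of Living Index": [30.0, 55.0, 80.0],
        "Annual_Mean_Temperature": [2.0, 11.0, 26.0],
    },
    index=["A", "B", "C"],
)

# Rescale each column to [0, 1]: (x - min) / (max - min)
scaled = (metrics_df - metrics_df.min()) / (metrics_df.max() - metrics_df.min())
print(scaled.round(3))
```

A column with a single constant value would divide by zero here and produce `NaN`, so such columns need separate handling.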


## 4. Genre. Which genre of data story did you use?

For this project, we chose a magazine-style approach with interactive elements. This genre works well because it combines guided storytelling with user exploration, which is perfect for a topic like tourism where people have different interests and questions.


### (Missing) Which tools did you use from each of the 3 categories of Visual Narrative (Figure 7 in Segel and Heer)? Why? And which tools did you use from each of the 3 categories of Narrative Structure (Figure 7 in Segel and Heer)? Why?

## 5. Visualizations

We selected four main visualizations that work together to tell the story of Danish tourism from different angles.

The bubble chart shows Denmark's tourism journey from 1996 to 2019, with GDP on a log scale on the x-axis and tourism departures per capita on the y-axis. The bubble size represents population, and the animation progresses year by year. We marked Denmark with a star to make it easy to follow throughout the animation. This visualization was chosen because it shows Denmark's relative position in global tourism evolution - you can see how we've grown compared to other countries while our economy has developed.

The choropleth map displays eight different metrics that influence destination choices. Users can switch between factors like cost of living, safety, temperature, and CO2 emissions. Each country is color-coded based on the selected metric, and hovering reveals detailed data. We added restaurant markers for major European cities to provide local context. This map enables direct comparison of potential destinations across multiple factors, making the decision-making process transparent for travelers.

For demographic analysis, we created three radar charts comparing spending patterns across socioeconomic groups, age ranges, and regions. All three use normalized scales to ensure fair comparison between categories like package holidays, restaurants, and accommodation. The color coding remains consistent to make patterns easier to spot. These charts reveal how different Danish demographics approach travel spending in an intuitive format that shows relative proportions clearly.

Finally, the correlation heatmap shows relationships between all destination factors. It uses a red-blue color scale where red indicates negative correlation and blue shows positive. All correlation values are displayed numerically for precision. This visualization provides quantitative evidence for what influences destination preferences, supporting data-driven decision making.

These four visualizations complement each other. The bubble chart establishes Denmark's global context and growth story. The choropleth map lets users select destinations based on their preferences. The radar charts reveal demographic insights for personalization. The correlation heatmap quantifies the relationships between factors. Together, they cover Danish tourism from historical, geographic, demographic, and analytical perspectives.

## 6. Discussion. Think critically about your creation

## 7. Contributions. Who did what?

### References
1. Statistics Denmark: https://www.statistikbanken.dk
2. World Bank: https://databank.worldbank.org/id/7ad54403
3. Numbeo: https://www.numbeo.com/cost-of-living
4. Global Peace Index, temperature data and restaurant data: https://www.kaggle.com/
5. CO2 Emissions: https://flightemissionmap.org

TODO: Should the code just be added at the bottom, or how should we handle it?