# Quieter Holiday Destinations Project Report

## INTRODUCTION:
### Aims and objectives of the project
This project aims to address the needs of adventurous travellers seeking holiday destinations that offer a summer feel without the chaos of crowds. Our primary objective is to provide data-driven recommendations for alternative, quieter destinations that match popular tourist hotspots in terms of climate and experience, while also providing practical information on travel logistics.

Specifically, we aim to answer the following questions:

1. Which destinations offer tourist interest but are less crowded than Greece?
2. Which of these quieter destinations have similar weather to Greece?
3. How much will flights cost?

### Roadmap of the report
This report will first outline the background and rationale for our project, followed by detailed specifications and requirements. We will then describe our implementation approach, team roles, and development methodology. The data collection section will outline our data sources, collection methods, and potential challenges. Finally, we will present our findings and conclusions.

## BACKGROUND:
The tourism industry has seen significant growth in recent years, leading to overcrowding in popular destinations. This phenomenon, often referred to as "overtourism," has negative impacts on both the traveller experience and local environments. Our project focuses on helping adventurous female solo travellers who have previously explored guidebook recommendations and now seek authentic, less crowded alternatives.

Our target audience is a mathematically literate but not technically oriented traveller who values concise, clear information that can be easily accessed on mobile devices. She plans to travel in early summer (specifically June 6th), outside school holidays, but still within peak season. She has flexibility in her return date and is open to extended stays if she enjoys a destination.

The project addresses the growing need for sustainable and enjoyable travel experiences that avoid the pitfalls of mass tourism while still providing the climate and cultural experiences travellers seek.

## SPECIFICATIONS AND DESIGN:
### Requirements - technical and non-technical

**Technical Requirements:**

- Use of Pandas, NumPy, matplotlib, seaborn
- Integration with multiple data sources, including tourism statistics databases and weather APIs
- Data cleaning and transformation capabilities for varied data formats
- Visualisation tools to present findings in an accessible, mobile-friendly format
- API integration for real-time flight information and historical weather data
- Version control for collaborative code development

**Non-technical Requirements:**

- Recommendations must be genuinely less crowded alternatives to popular destinations
- Climate data must be comparable to original destinations for the specific travel period
- Flight information must be accurate and relevant to the target audience's profile
- Visualisations must be clear and interpretable without technical expertise
- Report and findings must be concise and readable on mobile devices

### Design and architecture
Our solution was designed as a data pipeline that:
- Collects tourism data to identify less-visited destinations
- Analyses historical weather data to match climate conditions
- Retrieves flight information to provide practical travel logistics
- Processes and transforms this data to generate meaningful comparisons
- Visualises the results in an accessible format

The architecture employs a Jupyter notebook as the primary development and presentation environment, with modular components for each data source and analysis type.

## IMPLEMENTATION AND EXECUTION:
### Development approach and team member roles
We have adopted an Agile methodology using Trello for project management. Each team member contributes across various aspects of the project, leveraging individual strengths while developing new skills. Quality assurance partners review completed work to ensure accuracy and consistency.

Team members have been assigned specific responsibilities based on the project timeline:
- Initial project setup and GitHub repository management: Fi Douglas-Mullett
- Initial Trello kanban board setup and management: Fi Douglas-Mullett
- UNWTO tourism data exploration and processing: Fi Douglas-Mullett
- UNWTO tourism data analysis: Alannah Dowdall
- Weather API integration and analysis: Ellice Price
- SkyScanner API integration: Alannah Dowdall
- CSV dataset creation to identify the top three alternative destinations: Divya Sethi
- Flight data analysis: Zarfishan Rizwan and Claire Cooper
- ReadMe file: Claire Cooper
- Data cleaning and transformation: Fi Douglas-Mullett, Alannah Dowdall, Ellice Price
- Visualisation development: Ellice Price, Claire Cooper, Fi Douglas-Mullett, Zarfishan Rizwan, Divya Sethi and Brinel Ndombele
- Report writing: Claire Cooper
- Code review: Ellice Price, Claire Cooper, Fi Douglas-Mullett, Zarfishan Rizwan

### Tools and libraries
- Project Management: Trello for Agile workflow management
- Version Control: GitHub for code sharing and version tracking
- Development Environment: Jupyter Notebook
- Data Processing: Pandas for data manipulation and cleaning
- Data Analysis: NumPy for numerical analysis
- Visualisation: Matplotlib for creating clear, accessible graphics
- APIs: OpenWeather API and SkyScanner API
- Communication: Slack for team coordination, Discord and Zoom for group meetings

### Implementation process

Our implementation followed the planned data pipeline, with team members contributing to specific components. We began with tourism data collection and cleaning, followed by integration of weather data and flight information, and finally data transformation for analysis.

Key achievements included:

- Successfully cleaning and restructuring UNWTO tourism data into a usable format
- Obtaining a comprehensive database of country coordinates for weather data retrieval
- Developing effective API integration with both OpenWeather and SkyScanner
- Implementing data filtering based on travel safety advisories and tourism levels
- Merging multiple datasets while maintaining data integrity

Throughout implementation, we faced several obstacles requiring adaptive solutions. Initially, we planned to use city-level tourism data, but found only country-level data was consistently available. This required pivoting to a two-step approach: identifying countries with lower tourism rates, then researching specific cities within those countries. Originally, we aimed to find alternatives to popular destinations like Hvar, Bangkok, Rome, Crete, and Dubai. However, as the project progressed, we refined our focus to identify quieter alternatives specifically similar to Greece, to allow for a deeper and more manageable analysis within the project’s timeframe.

Another significant pivot involved our weather data collection. API call limits (1000 calls daily) necessitated strategic planning for data retrieval. Rather than collecting weather for every day of a potential week’s stay, we focused on historical data for June 6th across multiple years to establish reliable averages.
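
A rough budget calculation makes this trade-off concrete. The sketch below is illustrative only: the 1000-call limit comes from this report, while the function names and the assumption of one API call per location per day are ours.

```python
DAILY_CALL_LIMIT = 1000  # free daily allowance quoted in this report


def calls_needed(n_locations: int, n_years: int, days_per_year: int) -> int:
    """Calls required, assuming one call per location per day per year."""
    return n_locations * n_years * days_per_year


def max_locations(n_years: int, days_per_year: int,
                  limit: int = DAILY_CALL_LIMIT) -> int:
    """Largest number of locations that fits within one day's allowance."""
    return limit // (n_years * days_per_year)


single_day = max_locations(n_years=4, days_per_year=1)  # June 6th only
full_week = max_locations(n_years=4, days_per_year=8)   # June 6th to 13th inclusive
print(single_day, full_week)
```

Under these assumptions, a single day over four years fits up to 250 locations into one day's allowance, whereas a full eight-day window fits only 31.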

### Agile development
We employed several Agile practices throughout:

- Iterative development with regular reviews and adjustments
- A Trello board to manage tasks from backlog to completion
- Weekly synchronous meetings to discuss progress and challenges
- Quality assurance partners for code review before merging
- Slack for continuous communication and updates


<img src="images/Trello Kanban board.jpg" alt="Trello Kanban board" width="1500">
<p style="font-style: italic; margin-top: 5px; margin-bottom: 0;">Our Kanban board on Trello in use</p>

### Implementation challenges

- Data Granularity Issues: The UNWTO tourism data provided country-level statistics rather than city-specific information, creating challenges for precise recommendations. We addressed this by developing a filtering system that identified promising countries and then researched cities within those countries.
- API Rate Limitations: The OpenWeather API's limit of 1000 calls per day constrained our ability to gather comprehensive historical weather data. We prioritised countries that passed our initial filters and retrieved only the most relevant data points, namely the mean temperature and mean rainfall per country, rather than aiming for exhaustive coverage.
- Weather Data Precision and Accuracy: The OpenWeather API requires coordinates to retrieve historical data for each country. We merged a dataset published by Google (‘countries.csv’), which gives the geographical centre of every country, with the tourism data to provide these coordinates. We chose country centres rather than capital cities because our solo female traveller may not stay only in the capital, and because we felt that using the same reference point for every country would give a more consistent and representative dataset. We recognise the limitation that weather can vary greatly even within small countries, so a single point is only an approximation.
- Nested JSON Processing: The SkyScanner API returned deeply nested JSON structures that proved challenging to extract and transform. We developed custom flattening functions to reliably extract the necessary flight data.
- Country Name Inconsistencies: Different data sources used varying country name formats (e.g., "USA" vs. "United States"), complicating data merging. We implemented fuzzy matching algorithms to properly align records across datasets.
- Missing Geographical Coordinates: Several countries lacked coordinates in our initial dataset, which were essential for weather data retrieval. Team members manually researched and added these coordinates to ensure comprehensive coverage.
- Mobile Formatting Requirements: Creating visualisations that remain clear and readable on mobile devices presented unique design challenges. We prioritised simplicity and clarity in our chart design, with careful attention to colour contrast and text size.
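
The country-name alignment described in the list above can be sketched with Python's standard `difflib` module. This is a minimal illustration rather than the project's actual code: the alias table, the `match_country` helper, and the 0.8 cutoff are all assumptions. Abbreviations such as "USA" share too few characters with "United States" for string similarity alone, so the sketch checks a small alias table before falling back to fuzzy matching.

```python
import difflib

# Hypothetical alias table for abbreviations that fuzzy matching cannot resolve.
ALIASES = {"usa": "United States", "uk": "United Kingdom"}


def normalise(name: str) -> str:
    """Lower-case, drop full stops, and collapse repeated whitespace."""
    return " ".join(name.lower().replace(".", "").split())


def match_country(name, canonical, cutoff=0.8):
    """Return the canonical spelling for `name`, or None if nothing is close."""
    key = normalise(name)
    if key in ALIASES:
        return ALIASES[key]
    lookup = {normalise(c): c for c in canonical}
    if key in lookup:  # exact match after normalisation
        return lookup[key]
    hits = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


canonical_names = ["United States", "Azerbaijan", "Aruba", "Australia", "Greece"]
print(match_country("U.S.A.", canonical_names))
print(match_country("Azerbaidjan", canonical_names))
print(match_country("Atlantis", canonical_names))
```

Returning `None` for unmatched names, rather than the nearest guess, leaves those records to be reviewed by hand before merging.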

## DATA COLLECTION:
### What information do we need?
To answer our research questions, we required:

1. Tourism statistics to identify popular destinations and their less-visited alternatives
2. Historical weather data for early June to match climate conditions
3. Flight information, including duration and costs from UK airports to potential destinations

### What information is available?
After evaluating multiple data sources, we identified:
1. UNWTO Tourism Statistics Database providing inbound tourism data for countries up to 2022
2. OpenWeather API offering historical weather data by specific location and time
3. SkyScanner API providing flight schedules, durations, and costs from UK airports

The visualisation below contrasts the most popular tourist destinations with the least visited countries, categorising visitors as overnight, day, and cruise passengers. It reinforces our destination filtering rationale by highlighting the tourism popularity patterns that underpin our recommendation process.

<img src="images/tourist_data_plot.png" alt="Tourism Data" width="1500">
<p style="font-style: italic; margin-top: 5px; margin-bottom: 0;">Highest and lowest countries for Tourism</p>

### What is our data source?
We have selected the following data sources based on comprehensive evaluation:

1. UNWTO Tourism Statistics Database
   
- Provides inbound tourism data for all countries
- Contains data up to 2022
- Offers good data integrity with explanatory comments
- Limitation: The dataset reports annual inbound tourism figures per country, rather than monthly figures per city. We had aimed to base our analysis on tourism in June and to suggest specific cities to our solo female traveller. We explored other datasets that could be combined with the UNWTO data to give more precise recommendations, but they lacked consistency or integrity. We therefore used the UNWTO database alone, accepting that our analysis is less precise than initially intended.

2. OpenWeather API
   
- Allows access to historical weather data by specific location and time
- The One Call API 3.0 allows 1000 free calls a day
- Used to analyse weather for June 6th across recent years
- Limitations: Although we filtered our data collection to save API calls, we had initially planned to collect the average temperature and rainfall for a full week in each of four years (e.g. 6th June - 13th June). This would have improved the accuracy of the results by accounting for year-on-year variation. We also aimed to retrieve weather data across the daylight hours of each day in order to identify the daily maximum temperature, the figure most relevant to our traveller. However, collecting data only for 6th June at 12pm across four years already used up the daily call limit. We chose the 12pm (‘afternoon’) reading because it approximates the maximum temperature in most locations. We recognise that this is only an approximation: the time of day at which the temperature peaks varies from location to location, so the 12pm reading does not always capture the maximum.

3. SkyScanner API

- Offers comprehensive flight scheduling information
- Allows parameter specification to match our target audience criteria
- Provides both flight duration and cost information
- Free key allows 100 calls per month through rapidapi.com
- Limitation: With more time, we would have incorporated flight data for all recommended destinations (post-tourism data and travel advice analysis), allowing us to suggest locations with comparable weather and flights under £400. However, API call limitations restricted this approach. Instead, we demonstrated flight comparisons solely for the top three countries identified through tourism and weather data analysis.
- Limitation: SkyScanner's free API restriction (100 calls monthly) meant we could only gather flight information from one UK airport. We selected Manchester Airport because it is a major transport hub and the city's only international airport, which avoids the ambiguity of multi-airport cities like London. We recognise that results may differ for departures from other UK locations.

### How we collected data:

Our data collection involved multiple streams carefully integrated to provide comprehensive information:

**UNWTO Tourism Statistics**:

We accessed the UNWTO Tourism Statistics Database through a downloadable spreadsheet. The initial dataset contained tourism information across multiple years for all countries. Our team worked to transform this into a pandas DataFrame, handling various data consistency issues including missing values, inconsistent formatting across years, and country naming variations. Fi took primary responsibility for cleaning this dataset, creating a structured CSV that served as our foundation for tourism metrics.
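
The kind of reshaping involved can be sketched with pandas. The example below is an assumption-laden illustration, not the project's cleaning code: the column layout (one column per year), the `..` marker for missing values, the placeholder country names, and every number are made up for demonstration.

```python
import pandas as pd

# Made-up wide-format extract: one row per country, one column per year.
raw = pd.DataFrame({
    "Country": [" Country A ", "Country B", "Country C"],
    "2021": ["1,000", "..", "300"],   # ".." stands in for a missing value
    "2022": ["2,000", "150", "400"],
})


def tidy_tourism(raw: pd.DataFrame) -> pd.DataFrame:
    """Reshape to one row per country and year, with numeric arrivals."""
    long = raw.melt(id_vars="Country", var_name="year", value_name="arrivals")
    long["Country"] = long["Country"].str.strip()
    long["year"] = long["year"].astype(int)
    # Remove thousands separators, then coerce non-numeric markers to NaN.
    cleaned = long["arrivals"].astype(str).str.replace(",", "", regex=False)
    long["arrivals"] = pd.to_numeric(cleaned, errors="coerce")
    return long.dropna(subset=["arrivals"]).reset_index(drop=True)


tidy = tidy_tourism(raw)
print(tidy)
```

A long format like this makes it straightforward to filter by year or to merge with other country-level tables.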

**OpenWeather API Integration**:
Ellice developed custom functions to retrieve historical weather data through the OpenWeather API. The implementation required two key components:

1. A geographical database mapping countries to their coordinates, as the API requires latitude/longitude rather than country names
2. Functions to process API responses and calculate average temperatures and rainfall for our target travel dates

The implementation utilized date-based API calls rather than Unix timestamps, simplifying retrieval of historical data. We focused on gathering weather data for June 6th from 2021-2024 to establish reliable averages for comparison. Due to API rate limitations, we prioritized countries that had passed our initial tourism filters, saving our daily allocation of calls for the most relevant destinations.
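
The averaging step can be sketched as follows. This is a simplified illustration under stated assumptions: `fetch` is a hypothetical stand-in for whatever function wraps the real OpenWeather request, the dictionary keys `temp_c` and `rain_mm` are invented, and the canned readings are made-up numbers rather than project data.

```python
from datetime import date
from statistics import mean

YEARS = range(2021, 2025)  # 2021 to 2024 inclusive


def target_dates(month=6, day=6, years=YEARS):
    """ISO date strings for the same calendar day in each year."""
    return [date(year, month, day).isoformat() for year in years]


def average_weather(lat, lon, fetch, dates):
    """Average temperature and rainfall over one reading per date."""
    readings = [fetch(lat, lon, d) for d in dates]
    return {
        "mean_temp_c": round(mean([r["temp_c"] for r in readings]), 2),
        "mean_rain_mm": round(mean([r["rain_mm"] for r in readings]), 2),
    }


def fake_fetch(lat, lon, day):
    """Stub returning made-up readings so the sketch runs offline."""
    canned = {
        "2021-06-06": (24.0, 0.0),
        "2022-06-06": (26.0, 1.0),
        "2023-06-06": (25.0, 0.0),
        "2024-06-06": (27.0, 3.0),
    }
    temp_c, rain_mm = canned[day]
    return {"temp_c": temp_c, "rain_mm": rain_mm}


summary = average_weather(0.0, 0.0, fake_fetch, target_dates())
print(summary)
```

Passing the fetch function in as an argument keeps the averaging logic testable without spending any of the daily API allowance.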

**SkyScanner API Integration**:
Alannah developed robust functions to interact with the SkyScanner API for retrieval of flight information. The implementation faced several challenges:

1. The API returns deeply nested JSON responses requiring careful extraction
2. Results needed filtering to isolate relevant flight options
3. Data transformation was required to arrange flight information in a standardized format

The final implementation allowed us to query flight duration and cost information for specific city pairs on our target travel dates. The team implemented error handling and result validation to ensure reliable data despite API limitations.
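
A generic flattening helper of the kind described above might look like the sketch below. It is not the project's actual function, and the key names in the sample payload are invented for illustration; they should not be read as the real SkyScanner response schema.

```python
def flatten(obj, prefix="", sep="."):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}{sep}{key}" if prefix else str(key)
            flat.update(flatten(value, path, sep))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            path = f"{prefix}{sep}{index}" if prefix else str(index)
            flat.update(flatten(value, path, sep))
    else:
        flat[prefix] = obj  # leaf value: store it under the accumulated path
    return flat


# Hypothetical payload; the structure and field names are assumptions.
sample = {
    "itineraries": [
        {
            "price": {"raw": 350.0, "currency": "GBP"},
            "legs": [{"durationInMinutes": 615, "origin": {"id": "MAN"}}],
        }
    ]
}

rows = [flatten(itinerary) for itinerary in sample["itineraries"]]
print(rows[0])
```

Each flattened dictionary can then become one row of a DataFrame, with the dotted paths as column names.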

The interactive geographic visualisation below maps flight prices from Manchester to the three recommended destinations (Oranjestad, Baku, and Canberra), using circle size to represent average cost and showing the minimum price on hover. The interactive map is only available in the Jupyter Notebook version of the Project Report; the images below are screenshots of its capabilities.

<img src="images/Aruba_Flight_Price.png" alt="Flight Prices A" width="1200">
<img src="images/Azebaijan_Flight_Price.png" alt="Flight Prices B" width="1200">
<img src="images/Austrialia_flight_price.png" alt="Flight Prices C" width="1200">


**Government Travel Advisories**:
Claire researched and compiled UK Foreign Office travel advisories to incorporate safety considerations into our recommendations. This data was manually collected and categorized using Python's pandas library, with destinations assigned numerical safety scores (1-4) based on restriction levels. These scores were integrated into our filtering algorithm, ensuring our final recommendations prioritized traveller safety whilst maintaining compatibility with weather preferences.
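
One way such a filter could combine safety scores with tourism and weather data is sketched below. Several details are assumptions rather than the project's method: we treat a score of 1 as the least restrictive advisory, we measure similarity as the sum of absolute differences in temperature and rainfall, and all countries and figures are placeholders.

```python
def shortlist(candidates, baseline, max_safety=1, top_n=3):
    """Keep safe, quieter candidates and rank them by closeness to the baseline."""
    eligible = [
        c for c in candidates
        if c["safety"] <= max_safety and c["arrivals"] < baseline["arrivals"]
    ]

    def distance(c):
        # Assumed metric: absolute gaps in temperature (C) and rainfall (mm).
        return (abs(c["temp_c"] - baseline["temp_c"])
                + abs(c["rain_mm"] - baseline["rain_mm"]))

    return [c["country"] for c in sorted(eligible, key=distance)[:top_n]]


# Placeholder figures, not project results.
baseline = {"country": "Baseline", "temp_c": 26.0, "rain_mm": 1.0, "arrivals": 1000}
candidates = [
    {"country": "A", "temp_c": 27.0, "rain_mm": 1.0, "arrivals": 100, "safety": 1},
    {"country": "B", "temp_c": 26.0, "rain_mm": 1.5, "arrivals": 200, "safety": 1},
    {"country": "C", "temp_c": 26.0, "rain_mm": 1.0, "arrivals": 50, "safety": 3},
    {"country": "D", "temp_c": 20.0, "rain_mm": 5.0, "arrivals": 300, "safety": 1},
    {"country": "E", "temp_c": 26.0, "rain_mm": 1.0, "arrivals": 5000, "safety": 1},
    {"country": "F", "temp_c": 24.0, "rain_mm": 1.0, "arrivals": 10, "safety": 1},
]

top_three = shortlist(candidates, baseline)
print(top_three)
```

In this toy data, C is excluded on safety and E on visitor numbers even though both match the baseline weather exactly, which shows how the filters take precedence over climate similarity.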

The dual-axis visualisation below compares average June temperatures and tourism volumes across Greece and its recommended alternatives (Aruba, Azerbaijan, and Australia), demonstrating how our recommended destinations offer similar climate conditions with varying visitor numbers.

<img src="images/temperature_vs_tourism.png" alt="Temp_vs_Tourism" width="1200">


**Note on Historical Weather Data Consistency:**

During final checks, we noticed a slight discrepancy between the historical weather data initially collected and the data retrieved during a later re-run, despite using identical latitude and longitude coordinates. The OpenWeather API returned slightly different temperature and rainfall values, which is unexpected for historical data. We suspect this is due to API-side adjustments or periodic updates to the provider's historical datasets. To ensure consistency and reproducibility across our pipeline, we decided not to re-call the API during final runs. The dataset used (Complete_Weather_Data_For_Destinations.csv, since renamed) is based on calls made on 17/04/2025 and 18/04/2025, and all downstream processes, including the identification of our top three destinations, are based on that data.

*Please note:* a later re-run of the finalised_countries notebook (since renamed) with newly retrieved weather data produced a slightly different top three: Armenia, Azerbaijan, and Aruba. However, for traceability and internal consistency, we have retained our original results.
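
Freezing a dataset in this way can be sketched as a small load-or-fetch helper: the API is called only if no saved file exists, and every later run reads the saved file. The helper name, file name, and stub data below are hypothetical and are not taken from the project code.

```python
import csv
import tempfile
from pathlib import Path


def load_or_fetch(cache_path, fetch_rows):
    """Read rows from the cached CSV, calling fetch_rows only if it is missing."""
    path = Path(cache_path)
    if not path.exists():
        rows = fetch_rows()  # only reached on the very first run
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    # Always read back from disk so every run sees identical values.
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


calls = []  # records how many times the (stubbed) API is hit


def fake_fetch():
    """Stub standing in for the real API retrieval; values are made up."""
    calls.append(1)
    return [{"country": "Country A", "mean_temp_c": 25.5}]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "weather_cache.csv"
    first = load_or_fetch(cache, fake_fetch)
    second = load_or_fetch(cache, fake_fetch)

print(len(calls), first == second)
```

Because values are read back from CSV they arrive as strings, so downstream code would need to convert numeric columns before analysis.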

The visualisations below compare flight price and duration for the three recommended destinations (Oranjestad, Baku, and Canberra) alongside Athens as the Greek baseline, highlighting average values for each metric to support the final travel recommendations presented in the conclusion.

<img src="images/flight_price_duration.png" alt="FlightPrices" width="1200">
<img src="images/average_flight_price_duration.png" alt="FlightPrices" width="1200">


## CONCLUSION

Our project set out to provide adventurous travellers with data-driven recommendations for quieter holiday destinations offering comparable experiences to popular tourist hotspots. Through comprehensive analysis of tourism statistics, weather patterns, and flight information, we have developed a methodology that successfully identifies alternative destinations matching our target criteria.

Despite facing difficulties with data collection — particularly limited historical weather data and restricted API access — we have reached our objectives. We recommended three quieter destinations, each selected based on lower tourism volumes and similar average temperatures and rainfall compared to Greece. Through our report, our target audience — the solo female traveller — can now make an informed choice based on the tourism levels, weather graphs, and flight expenses we have provided.

If the project were to be extended, we would:

- Collect weather data for 6–13 June and for additional years before 2021 to improve weather accuracy.
- Collect weather data from multiple coordinates within each country, if more OpenWeather API calls were available, to produce a more representative mean for each country.
- Filter destinations further using expanded flight data from additional API keys.
- Apply machine learning to predict future weather and flight prices for recommended destinations.

We hope our guide empowers solo travellers to find peaceful, confident alternatives for their summer adventures, opening up new horizons beyond the traditional tourist trail.
