# Capstone Group 10 Project
## Data Collection

# Project Overview

The COVID-19 pandemic severely affected the travel industry. International travel has been impacted, and in turn travel companies and travel websites have lost much of their engagement. Post- pandemic a huge economic downturn has also throroughly scraped travel budgets, with [recession fears](https://skift.com/2024/08/05/whats-going-on-with-the-economy-and-how-it-will-impact-the-travel-sector/#:~:text=What%E2%80%99s%20Going%20on%20With%20the%20Economy%3F%20And%20How%20Will%20it%20Impact%20the%20Travel%20Sector%3F) impacting the travel sector

However, there is hope on the horizon for international travel and a time where life is somewhat back to normal. In order to increase engagement in the travel industry and increase excitement about travel opportunities, this Travel WordFinder will be created. With the proliferation of information on the internet and the rising cases of fake SEO ranking of pages on Google and other search engines, it has also become difficult to get quick answers surrounding destinations of travel. This calls for the development of products that can ease travel in the future. 

The Travel WordFinder is a data product that will allow future travelers to get a prediction for their perfect destination with the input of just a few words.

Who will use it? Travel agencies can incorporate such as product into their website whereby they can help clients

 
### Methodology & Data Used

This project will utilize data from 21 countries recommended by Lonely Planet as 'Best Travel Destinations for 2025', which can be found via [this link](https://www.lonelyplanet.com/best-in-travel). The dataset has been compliled by scraping the titles from Lonely Planet's attractions for each of the 21 countries, including the descriptions for each of the attractions. The final dataset includes 2,600 unique text values.

### Steps taken for scraping 
- For each country in the [Best Travel Destinations for 2025](https://www.lonelyplanet.com/best-in-travel), 
- Click on the Learn more about the country link e.g for Cameroon - [Learn More about Cameroon](https://www.lonelyplanet.com/cameroon)
- This will direct you to another page with `Must See Attractions` where you should click on [View More Attractions](https://www.lonelyplanet.com/cameroon/attractions)
- You will see a list of attractions for each country and from here, you can scrape for the titles and descriptions for each. 
- NOTE: Some countries have more than one page result for the attractions while other have only one. You will see in the code below that there are different functions for each (one page result and more than one page result)

The aim will be to model using the attraction name and decsription as the features and the country as the target variable, possibly creating a recommendation system. 

In [726]:
# Import Statements

import requests
from bs4 import BeautifulSoup

import pandas as pd

### Grab the attraction titles for one city

In [727]:
# Use link for Cameroon from Lonely Planet

url = 'https://www.lonelyplanet.com/cameroon/attractions'

html_cameroon = requests.get(url)

cameroon_content = html_cameroon.content
soup = BeautifulSoup(cameroon_content, 'html.parser')

In [728]:
# Print the html code nicely
print(soup.prettify())

<!DOCTYPE html>
<html>
 <head>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <meta content="111537044496" property="fb:app_id"/>
  <meta content="Lonely Planet" property="og:site_name"/>
  <meta content="website" property="og:type"/>
  <meta content="summary_large_image" name="twitter:card"/>
  <meta content="@lonelyplanet" name="twitter:site"/>
  <meta content="15066760" name="twitter:site:id"/>
  <title>
   Must-see attractions Cameroon, West Africa - Lonely Planet
  </title>
  <link href="https://www.lonelyplanet.com/cameroon/attractions" rel="canonical"/>
  <meta content="https://www.lonelyplanet.com/cameroon/attractions" property="og:url"/>
  <meta content="Must-see attractions Cameroon, West Africa - Lonely Planet" name="title"/>
  <meta content="Must-see attractions Cameroon, West Africa - Lonely Planet" property="og:title"/>
  <meta content="Must-see attractions Cameroon, West Africa - Lonely Planet" name="twitter:title"/>
 

In [729]:
soup.find_all("a")

[<a aria-label="Lonely Planet homepage" class="-m-1.5 p-1.5" href="/"><span class="sr-only">Lonely Planet</span><svg aria-hidden="true" class="text-5xl text-blue" fill="currentColor" height="1em" viewbox="0 0 711.06 98.68" width="2em" xmlns="http://www.w3.org/2000/svg"><path d="M219.05 75.21c-10.54 0-16.72-6.28-16.72-16.93V0h15.76v57.32c0 3 1.28 4.48 4.16 4.48h3.83v13.41ZM500.31 75.21c-10.54 0-16.72-6.28-16.72-16.93V0h15.76v57.32c0 3 1.28 4.48 4.16 4.48h3.83v13.41Z"></path><path d="M278.13 21.77h-15.76v29c0 7.45-3.83 10.65-8.94 10.65s-9-3.2-9-10.65v-29h-15.71v29.94c0 14.69 11 23.53 23.08 23.53 3.86 0 9.54-2 10.7-5.71v2.74c0 8.52-6 12.67-12.68 12.67a16.23 16.23 0 0 1-13.95-7.13L225.33 87c6.39 9.37 16.61 11.71 24.07 11.71 17.36 0 28.75-10.22 28.75-27.37V21.77ZM27 47.52c0-17.14 12.25-29.07 28.86-29.07s29 11.93 29 29.07-12.28 29.08-29 29.08S27 64.67 27 47.52Zm41.75 0c0-9.79-5.54-15.12-12.89-15.12s-12.67 5.33-12.67 15.12 5.43 15.23 12.67 15.23 12.89-5.43 12.89-15.23ZM197.26 51.46h-37.6c1.18

In [730]:
# Grab the container with the attraction name inside
container = soup.find('ul', class_="md:grid space-y-14 md:space-y-0 gap-x-6 gap-y-14 md:grid-cols-12 mb-12")
print(container)

<ul class="md:grid space-y-14 md:space-y-0 gap-x-6 gap-y-14 md:grid-cols-12 mb-12"><li class="col-span-1 md:col-span-3 lg:col-span-3"><div class="relative"><article class="relative rounded"><div class="relative flex max-w-full items-center justify-center overflow-hidden rounded relative aspect-square"><img alt="" class="max-w-full w-full h-full object-cover rounded relative aspect-square" data-nimg="1" decoding="async" height="400" loading="lazy" src="https://lonelyplanetstatic.imgix.net/marketing/placeholders/placeholder-attractions.jpg?fit=crop&amp;ar=1%3A1&amp;w=1200&amp;auto=format&amp;q=75" srcset="https://lonelyplanetstatic.imgix.net/marketing/placeholders/placeholder-attractions.jpg?fit=crop&amp;ar=1%3A1&amp;w=640&amp;auto=format&amp;q=75 1x, https://lonelyplanetstatic.imgix.net/marketing/placeholders/placeholder-attractions.jpg?fit=crop&amp;ar=1%3A1&amp;w=1200&amp;auto=format&amp;q=75 2x" style="color:transparent" width="600"/><svg aria-hidden="true" class="absolute top-[50%] l

In [731]:
# Find H2 elements within the container
attractions = container.find_all('li', class_ = "col-span-1 md:col-span-3 lg:col-span-3")
attractions

[<li class="col-span-1 md:col-span-3 lg:col-span-3"><div class="relative"><article class="relative rounded"><div class="relative flex max-w-full items-center justify-center overflow-hidden rounded relative aspect-square"><img alt="" class="max-w-full w-full h-full object-cover rounded relative aspect-square" data-nimg="1" decoding="async" height="400" loading="lazy" src="https://lonelyplanetstatic.imgix.net/marketing/placeholders/placeholder-attractions.jpg?fit=crop&amp;ar=1%3A1&amp;w=1200&amp;auto=format&amp;q=75" srcset="https://lonelyplanetstatic.imgix.net/marketing/placeholders/placeholder-attractions.jpg?fit=crop&amp;ar=1%3A1&amp;w=640&amp;auto=format&amp;q=75 1x, https://lonelyplanetstatic.imgix.net/marketing/placeholders/placeholder-attractions.jpg?fit=crop&amp;ar=1%3A1&amp;w=1200&amp;auto=format&amp;q=75 2x" style="color:transparent" width="600"/><svg aria-hidden="true" class="absolute top-[50%] left-0 right-0 w-24 max-w-[30%] h-auto mx-auto text-white aspect-square transform t

In [732]:
# Grab the attraction title from this span element
if attractions:
    span = attractions[0].find('span', class_= 'heading-05 font-semibold')
    if span:
        print(span.text)

Palais Royal


In [733]:
att = attraction.find('a')
att

<a class="card-link line-clamp-2 w-[80%] md:w-90" href="/cameroon/northern-cameroon/attractions/parc-national-du-waza/a/poi-sig/1300504/1327480"><span class="heading-05 font-semibold">Parc National du Waza</span></a>

In [734]:
# Grab all titles with a for loop and store in a list

final_attractions = []

for attraction in attractions:
    att = attraction.find('a')
    span = att.find('span', class_= 'heading-05 font-semibold')
    if span:
        final_attractions.append(span.text)

final_attractions

['Palais Royal',
 'Chefferie',
 "Fon's Palace",
 'Limbe Wildlife Centre',
 'Botanical Gardens',
 "Parc National de Campo-Ma'an",
 'Chutes de la Lobé',
 "Espace Doual'art",
 'Musée des Arts et Traditions Bamoun',
 'Bandjoun Station',
 'Grande Mosquée',
 'Cathedral',
 'Old Church',
 'Palais du Lamido',
 'Parc National du Waza']

In [735]:
# A for loop to grab all descriptions
final_descriptions = []

for attraction in attractions:
    p_tag = attraction.find('p', class_= 'relative line-clamp-3')
    if p_tag:
        final_descriptions.append(p_tag.text)

final_descriptions

["The must-see attraction is the sultan's palace, home to the 19th sultan of the Bamoun dynasty. It has a fascinating, well-organised museum providing great…",
 'Approached via a ceremonial gate, the compound is centred on a hugely impressive bamboo building, its conical thatched roof supported by wooden pillars…',
 "Just north of Bamenda is the large Tikar community of Bafut, traditionally the most powerful of the Grassfields kingdoms. The fon's (local chief's) palace…",
 'Many zoos in Africa are depressing places, but the Limbe Wildlife Centre is a shining exception. It houses rescued chimpanzees, gorillas, drills and other…',
 "Limbe's Botanical Gardens, the second oldest in Africa, are the home of, among others, cinnamon, nutmeg, mango, ancient cycads and an unnamed tree that…",
 "Campo-Ma'an comprises 7700 sq km of protected biodiverse rainforest, sheltering many wonderful plants and animals, including buffaloes, forest elephants,…",
 "The Chutes de la Lobé are an impressive set o

### Function for pages with only one result per page like Cameroon and Paraguay
There is a slight different in the class information for places with only one page result and those with more than one result. Therefore, the functions used will be different

In [736]:
# Function for pages with only one result per page
def get_attractions_one_page(html_path):
    """
    input html path from TripAdvisor as a string
    returns list of attraction names from that page
    """

    html = requests.get(html_path)
    content = html.content
    soup = BeautifulSoup(content, 'html.parser')
    soup = BeautifulSoup(html.content, 'html.parser')
    
    container = soup.find('ul', class_="md:grid space-y-14 md:space-y-0 gap-x-6 gap-y-14 md:grid-cols-12 mb-12")
    attractions = container.find_all('li', class_ = "col-span-1 md:col-span-3 lg:col-span-3")
    
    final_attractions = []
    for attraction in attractions:
        att = attraction.find('a')
        span = att.find('span', class_= 'heading-05 font-semibold')
        if span:
            final_attractions.append(span.text)
    return final_attractions


### 1. Cameroon

In [737]:
cameroon_html = 'https://www.lonelyplanet.com/cameroon/attractions'

In [738]:
all_attractions_cameroon = get_attractions_one_page(cameroon_html)
print(len(all_attractions_cameroon))
print(all_attractions_cameroon[:5])

15
['Palais Royal', 'Chefferie', "Fon's Palace", 'Limbe Wildlife Centre', 'Botanical Gardens']


In [739]:
def get_all_descriptions_one_page(html_path):
    """
    input html path from TripAdvisor as a string
    returns list of attraction names from that page
    """

    html = requests.get(html_path)
    content = html.content
    soup = BeautifulSoup(content, 'html.parser')
    soup = BeautifulSoup(html.content, 'html.parser')
    
    container = soup.find('ul', class_="md:grid space-y-14 md:space-y-0 gap-x-6 gap-y-14 md:grid-cols-12 mb-12")
    attractions = container.find_all('li', class_ = "col-span-1 md:col-span-3 lg:col-span-3")


    final_descriptions = []
    for attraction in attractions:
        p_tag = attraction.find('p', class_= 'relative line-clamp-3')
        if p_tag:
            final_descriptions.append(p_tag.text)
    return final_descriptions

In [740]:
all_descriptions_cameroon = get_all_descriptions_one_page(cameroon_html)
print(len(all_descriptions_cameroon))
print(all_descriptions_cameroon[:5])

15
["The must-see attraction is the sultan's palace, home to the 19th sultan of the Bamoun dynasty. It has a fascinating, well-organised museum providing great…", 'Approached via a ceremonial gate, the compound is centred on a hugely impressive bamboo building, its conical thatched roof supported by wooden pillars…', "Just north of Bamenda is the large Tikar community of Bafut, traditionally the most powerful of the Grassfields kingdoms. The fon's (local chief's) palace…", 'Many zoos in Africa are depressing places, but the Limbe Wildlife Centre is a shining exception. It houses rescued chimpanzees, gorillas, drills and other…', "Limbe's Botanical Gardens, the second oldest in Africa, are the home of, among others, cinnamon, nutmeg, mango, ancient cycads and an unnamed tree that…"]


In [741]:
cameroon_df = pd.DataFrame(all_attractions_cameroon, columns=['Attraction'])
cameroon_df['Description'] = all_descriptions_cameroon
cameroon_df['Country'] = 'Cameroon'
cameroon_df['Continent'] = 'Africa'
cameroon_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Palais Royal,The must-see attraction is the sultan's palace...,Cameroon,Africa
1,Chefferie,"Approached via a ceremonial gate, the compound...",Cameroon,Africa
2,Fon's Palace,Just north of Bamenda is the large Tikar commu...,Cameroon,Africa
3,Limbe Wildlife Centre,"Many zoos in Africa are depressing places, but...",Cameroon,Africa
4,Botanical Gardens,"Limbe's Botanical Gardens, the second oldest i...",Cameroon,Africa


In [742]:
cameroon_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/cameroon_df.csv')

### 2. Lithuania

### Make a function to grab attraction titles for any destination for places with more than one result per page
There is a slight different in the class information for places with only one page result and those with more than one result. Therefore, the functions used will be different

In [743]:
def get_attractions(html_path):
    """
    input html path from TripAdvisor as a string
    returns list of attraction names from that page
    """


    html = requests.get(html_path)
    content = html.content
    soup = BeautifulSoup(content, 'html.parser')
    soup = BeautifulSoup(html.content, 'html.parser')
    
    container = soup.find('ul', class_="md:grid space-y-14 md:space-y-0 gap-x-6 gap-y-14 md:grid-cols-12")
    attractions = container.find_all('li', class_ = "col-span-1 md:col-span-3 lg:col-span-3")
    final_attractions = []
    for attraction in attractions:
        att = attraction.find('a')
        span = att.find('span', class_= 'heading-05 font-semibold')
        if span:
            final_attractions.append(span.text)
    return final_attractions

In [744]:
lithuania_attractions = get_attractions('https://www.lonelyplanet.com/lithuania/attractions')

lithuania_attractions

['Hill of Crosses',
 'Palace of the Grand Dukes of Lithuania',
 'Vilnius Cathedral',
 'Grūtas Park',
 'Vilnius University',
 'Cold War Museum',
 'Museum of Genocide Victims',
 'Ninth Fort',
 'Cathedral Bell Tower',
 'Nemunas Delta Regional Park',
 'Trakai Castle',
 'Europos Parkas Sculpture Park',
 'Cathedral Square',
 'Amber Museum',
 'Lithuanian Ethnocosmology Museum',
 'Tolerance Centre',
 "Sts Johns' Church",
 'Užupis Art Incubator',
 'MK Čiurlionis National Museum of Art',
 'Rinkuškiai Brewery',
 'Plateliai Manor',
 'MO Museum',
 'Orvydas Garden',
 'Narrow-Gauge Railway Museum',
 'Sugihara House',
 'Museum of Devils',
 'Kiemo Galerija',
 'Choral Synagogue',
 'Mindaugas',
 'Antakalnis Cemetery',
 'Gediminas Castle & Museum',
 'Gates of Dawn',
 "St Casimir's Church",
 'Pažaislis Monastery',
 'Trakai Historical National Park',
 "St Anne's Church",
 'Holocaust Exhibition',
 'Kenessa',
 'National Museum of Lithuania',
 'Presidential Palace']

### Get all attractions from all 5 pages

In [745]:
# Figure out the pattern with page numbers
page_1 = "https://www.lonelyplanet.com/lithuania/attractions"
page_2 = "https://www.lonelyplanet.com/lithuania/attractions?page=2"


In [746]:
# Test pattern (30 attractions per page)
base = 'https://www.lonelyplanet.com/lithuania/attractions'
page = '?page'
end = "=" + str(2)

page_2 = base + page + end
page_2

'https://www.lonelyplanet.com/lithuania/attractions?page=2'

In [747]:
# Create list of all htmls for London
lithuania_htmls = ['https://www.lonelyplanet.com/lithuania/attractions'
]
for i in range(2, 6):
    base = 'https://www.lonelyplanet.com/lithuania/attractions'
    page = '?page'
    end = "=" + str(i)
    html = base + page + end
    lithuania_htmls.append(str(html))
    
print(lithuania_htmls)

['https://www.lonelyplanet.com/lithuania/attractions', 'https://www.lonelyplanet.com/lithuania/attractions?page=2', 'https://www.lonelyplanet.com/lithuania/attractions?page=3', 'https://www.lonelyplanet.com/lithuania/attractions?page=4', 'https://www.lonelyplanet.com/lithuania/attractions?page=5']


In [748]:
# Build another function that loops through the list of urls and grabs titles
def get_all_attractions(html_list):
    """
    input list of strings of html paths from TripAdvisor
    returns list of attraction names from all pages
    """
    final_attractions = []
    for html in html_list:
        html_= requests.get(html)
        # content = html.content
        soup = BeautifulSoup(html_.content, 'html.parser')

        container = soup.find('ul', class_="md:grid space-y-14 md:space-y-0 gap-x-6 gap-y-14 md:grid-cols-12")
        attractions = container.find_all('li', class_ = "col-span-1 md:col-span-3 lg:col-span-3")
        new_attractions = []
        for attraction in attractions:
            att = attraction.find('a')
            span = att.find('span', class_= 'heading-05 font-semibold')
            if span:
                new_attractions.append(span.text)
        final_attractions.extend(new_attractions)
    return final_attractions


In [749]:
# Warning! Takes a few mins to run
all_lithuania_attractions = get_all_attractions(lithuania_htmls)

In [750]:
len(all_lithuania_attractions)

200

In [751]:
all_lithuania_attractions[:5]

['Hill of Crosses',
 'Palace of the Grand Dukes of Lithuania',
 'Vilnius Cathedral',
 'Grūtas Park',
 'Vilnius University']

### Function for descriptions for attractions with more than one page results

In [752]:
def get_all_descriptions(html_list):
    """
    input html path from TripAdvisor as a string
    returns list of attraction names from that page
    """
    final_descriptions = []
    for html in html_list:
        html_= requests.get(html)
        # content = html.content
        soup = BeautifulSoup(html_.content, 'html.parser')

        container = soup.find('ul', class_="md:grid space-y-14 md:space-y-0 gap-x-6 gap-y-14 md:grid-cols-12")
        attractions = container.find_all('li', class_ = "col-span-1 md:col-span-3 lg:col-span-3")

        new_descriptions = []
        
        for attraction in attractions:
            p_tag = attraction.find('p', class_= 'relative line-clamp-3')
            if p_tag:
                new_descriptions.append(p_tag.text)
        final_descriptions.extend(new_descriptions)
    return final_descriptions


In [753]:
all_descriptions_lithuania = get_all_descriptions(lithuania_htmls)
print(len(all_descriptions_lithuania))
print(all_descriptions_lithuania[:5])

200
["Lithuania's fabled Hill of Crosses is a symbol of defiance as much as a pilgrimage site. More than 100,000 crosses have been planted on this low hill,…", 'If you only see one museum in Vilnius, make it this one. On a site that has been settled since the 4th century AD stands the latest in a procession of…', "Stately Vilnius Cathedral, divorced from its freestanding belfry, is a national symbol and the city's most instantly recognisable building. Known in full…", 'With Soviet-era statues of Lenin, Stalin and prominent Lithuanian members of the Communist Party that once dominated Lithuanian towns lining the forest…', 'Founded in 1579 during the Catholic Counter Reformation, Vilnius University was run by Jesuits for two centuries. During the 19th century it became one of…']


In [754]:
# Make into a DataFrame
lithuania_df = pd.DataFrame(all_lithuania_attractions, columns=['Attraction'])
lithuania_df['Description'] = all_descriptions_lithuania
lithuania_df['Country'] = 'Lithuania'
lithuania_df['Continent'] = 'Europe'
lithuania_df.head()


Unnamed: 0,Attraction,Description,Country,Continent
0,Hill of Crosses,Lithuania's fabled Hill of Crosses is a symbol...,Lithuania,Europe
1,Palace of the Grand Dukes of Lithuania,"If you only see one museum in Vilnius, make it...",Lithuania,Europe
2,Vilnius Cathedral,"Stately Vilnius Cathedral, divorced from its f...",Lithuania,Europe
3,Grūtas Park,"With Soviet-era statues of Lenin, Stalin and p...",Lithuania,Europe
4,Vilnius University,Founded in 1579 during the Catholic Counter Re...,Lithuania,Europe


In [755]:
# Save DF to local comp
lithuania_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/lithuania_df.csv')

## Repeat for all top Destinations

### 3. Fiji

In [756]:
fiji_html = 'https://www.lonelyplanet.com/fiji/attractions'

In [757]:
# Create list of all htmls for Paris
fiji_htmls = [fiji_html]
for i in range(2, 3):
    base = 'https://www.lonelyplanet.com/fiji/attractions'
    page = '?page'
    end = "=" + str(i)
    html = base + page + end
    fiji_htmls.append(str(html))
print(fiji_htmls)

['https://www.lonelyplanet.com/fiji/attractions', 'https://www.lonelyplanet.com/fiji/attractions?page=2']


In [758]:
all_attractions_fiji = get_all_attractions(fiji_htmls)
print(len(all_attractions_fiji))
print(all_attractions_fiji[:5])

80
['Colo-i-Suva Forest Park', 'Fiji Museum', 'Suva Municipal Market', 'Sri Siva Subramaniya Swami Temple', 'Mariamma Temple']


In [759]:
all_descriptions_fiji = get_all_descriptions(fiji_htmls)
print(len(all_descriptions_fiji))
print(all_descriptions_fiji[:5])

80
['Colo-i-Suva (pronounced tholo-ee-soo-va) is a 2.5-sq-km oasis of lush rainforest teeming with tropical plants and vivid and melodic bird life. The 6.5km…', 'This museum offers a great journey into Fiji’s historical and cultural and evolution. To enjoy the exhibits in chronological order, start with the…', 'It’s the beating heart of Suva and a great place to spend an hour or so poking around with a camera. The boys with barrows own the lanes and they aren’t…', 'This riotously bright Hindu temple is one of the few places outside India where you can see traditional Dravidian architecture; the wooden carvings of…', "The South Indian fire-walking festival is held here during July or August. Of all Fiji's cultural rituals, the extraordinary art of fire walking is…"]


In [760]:
fiji_df = pd.DataFrame(all_attractions_fiji, columns=['Attraction'])
fiji_df['Description'] = all_descriptions_fiji
fiji_df['Country'] = 'Fiji'
fiji_df['Continent'] = 'Asia'
fiji_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Colo-i-Suva Forest Park,Colo-i-Suva (pronounced tholo-ee-soo-va) is a ...,Fiji,Asia
1,Fiji Museum,This museum offers a great journey into Fiji’s...,Fiji,Asia
2,Suva Municipal Market,It’s the beating heart of Suva and a great pla...,Fiji,Asia
3,Sri Siva Subramaniya Swami Temple,This riotously bright Hindu temple is one of t...,Fiji,Asia
4,Mariamma Temple,The South Indian fire-walking festival is held...,Fiji,Asia


In [761]:
fiji_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/fiji_df.csv')

### 4. Laos

In [762]:
laos_html = 'https://www.lonelyplanet.com/laos/attractions'

In [763]:
# Create list of all htmls for Paris
laos_htmls = [laos_html]
for i in range(2, 4):
    base = 'https://www.lonelyplanet.com/laos/attractions'
    page = '?page'
    end = "=" + str(i)
    html = base + page + end
    laos_htmls.append(str(html))
print(laos_htmls)

['https://www.lonelyplanet.com/laos/attractions', 'https://www.lonelyplanet.com/laos/attractions?page=2', 'https://www.lonelyplanet.com/laos/attractions?page=3']


In [764]:
all_attractions_laos = get_all_attractions(laos_htmls)
print(len(all_attractions_laos))
print(all_attractions_laos[:5])

120
['Wat Xieng Thong', 'Vieng Xai Caves', 'Wat Phu Champasak', 'Phu Si', 'Tat Kuang Si']


In [765]:
all_descriptions_laos = get_all_descriptions(laos_htmls)
print(len(all_descriptions_laos))
print(all_descriptions_laos[:5])

120
["Luang Prabang's best-known monastery is centred on a 1560 sǐm (ordination hall). Its roofs sweep low to the ground and there's a stunning 'tree of life'…", "Joining a truly fascinating 18-point tour is the only way to see Vieng Xai's seven most important war-shelter cave complexes, set in beautiful gardens…", 'Bucolic Wat Phu sits in graceful decrepitude, and while it lacks the arresting enormity of Angkor in Cambodia, given its few visitors and more dramatic…', 'Dominating the old city centre and a favourite with sunset junkies, the 100m-tall Phu Si (prepare your legs for a steep 329-step ascent) is crowned by a…', 'Thirty kilometres southwest of Luang Prabang, Tat Kuang Si is a many-tiered waterfall tumbling over limestone formations into a series of cool, swimmable…']


In [766]:
laos_df = pd.DataFrame(all_attractions_laos, columns=['Attraction'])
laos_df['Description'] = all_descriptions_laos
laos_df['Country'] = 'Laos'
laos_df['Continent'] = 'Asia'
laos_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Wat Xieng Thong,Luang Prabang's best-known monastery is centre...,Laos,Asia
1,Vieng Xai Caves,Joining a truly fascinating 18-point tour is t...,Laos,Asia
2,Wat Phu Champasak,"Bucolic Wat Phu sits in graceful decrepitude, ...",Laos,Asia
3,Phu Si,Dominating the old city centre and a favourite...,Laos,Asia
4,Tat Kuang Si,"Thirty kilometres southwest of Luang Prabang, ...",Laos,Asia


In [767]:
laos_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/laos_df.csv')

### 5. Kazakhstan

In [768]:
kazakhstan_html = 'https://www.lonelyplanet.com/kazakhstan/attractions'

In [769]:
# Create list of all htmls for Paris
kazakhstan_htmls = [kazakhstan_html]
for i in range(2, 4):
    base = 'https://www.lonelyplanet.com/kazakhstan/attractions'
    page = '?page'
    end = "=" + str(i)
    html = base + page + end
    kazakhstan_htmls.append(str(html))
print(kazakhstan_htmls)

['https://www.lonelyplanet.com/kazakhstan/attractions', 'https://www.lonelyplanet.com/kazakhstan/attractions?page=2', 'https://www.lonelyplanet.com/kazakhstan/attractions?page=3']


In [770]:
all_attractions_kazakhstan = get_all_attractions(kazakhstan_htmls)
print(len(all_attractions_kazakhstan))
print(all_attractions_kazakhstan[:5])

120
['Beket-Ata', 'National Museum of the Republic of Kazakhstan', 'Charyn Canyon', 'KarLag Museum', 'Khan Shatyr']


In [771]:
all_descriptions_kazakstan = get_all_descriptions(kazakhstan_htmls)
print(len(all_descriptions_kazakstan))
print(all_descriptions_kazakstan[:5])

120
['Some 285km east of Aktau, Beket-Ata is an important and extremely popular place of pilgrimage for those wishing to visit the underground mosque and final…', 'This huge blue-glass-and-white-marble museum covers the history and culture of Kazakhstan from ancient to modern times. Themed halls comprise interactive…', 'Over millions of years, the swift Charyn (Sharyn) River has carved a truly spectacular 150m- to 300m-deep canyon into the otherwise flat steppe some 200km…', "Housed in the old KarLag headquarters building, this poignant museum walks you through KarLag's role in the Soviet Gulag Archipelago. The repression of…", "Nur-Sultan's most extraordinary building (so far), the Khan Shatyr is a 150m-high, translucent, tentlike structure made of ethylene tetrafluoroethylene …"]


In [772]:
kazakhstan_df = pd.DataFrame(all_attractions_kazakhstan, columns=['Attraction'])
kazakhstan_df['Description'] = all_descriptions_kazakstan
kazakhstan_df['Country'] = 'Kazakhstan'
kazakhstan_df['Continent'] = 'Asia'
kazakhstan_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Beket-Ata,"Some 285km east of Aktau, Beket-Ata is an impo...",Kazakhstan,Asia
1,National Museum of the Republic of Kazakhstan,This huge blue-glass-and-white-marble museum c...,Kazakhstan,Asia
2,Charyn Canyon,"Over millions of years, the swift Charyn (Shar...",Kazakhstan,Asia
3,KarLag Museum,Housed in the old KarLag headquarters building...,Kazakhstan,Asia
4,Khan Shatyr,Nur-Sultan's most extraordinary building (so f...,Kazakhstan,Asia


In [773]:
kazakhstan_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/kazakhstan_df.csv')

### 6. Paraguay

In [774]:
paraguay_html = 'https://www.lonelyplanet.com/paraguay/attractions'

In [775]:
all_attractions_paraguay = get_attractions_one_page(paraguay_html)
print(len(all_attractions_paraguay))
print(all_attractions_paraguay[:5])

37
['Yaguarón Church', 'Trinidad', 'Museo Jesuítica de Santa Fe', 'Panteón de los Héroes', 'Casa de la Independencia']


In [776]:
all_descriptions_paraguay = get_all_descriptions_one_page(paraguay_html)
print(len(all_descriptions_paraguay))
print(all_descriptions_paraguay[:5])

37
['This 18th-century Franciscan church is a landmark of colonial architecture that is not to be missed. The simple design of the exterior, with its separate…', 'A visually striking red-sandstone structure with an ornate style incorporating Roman arches, and strongly featuring the passion-flower motif, signature of…', 'A must-see for those interested in Jesuit history, housing fine examples of religious carving. The indigenous carvers were taught their trade by a Jesuit…', "Asunción's most instantly recognizable building, the imposing Panteón de los Héroes protects the remains of Mariscal Francisco Solano López and other key…", 'The Casa de la Independencia dates from 1772 and is where Paraguay became the first country on the continent to declare its independence in 1811. Rooms…']


In [777]:
paraguay_df = pd.DataFrame(all_attractions_paraguay, columns=['Attraction'])
paraguay_df['Description'] = all_descriptions_paraguay
paraguay_df['Country'] = 'Paraguay'
paraguay_df['Continent'] = 'South America'
paraguay_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Yaguarón Church,This 18th-century Franciscan church is a landm...,Paraguay,South America
1,Trinidad,A visually striking red-sandstone structure wi...,Paraguay,South America
2,Museo Jesuítica de Santa Fe,A must-see for those interested in Jesuit hist...,Paraguay,South America
3,Panteón de los Héroes,Asunción's most instantly recognizable buildin...,Paraguay,South America
4,Casa de la Independencia,The Casa de la Independencia dates from 1772 a...,Paraguay,South America


In [778]:
paraguay_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/paraguay_df.csv')

### 7. Trinidad & Tobago

In [779]:
trin_n_tob_html = 'https://www.lonelyplanet.com/trinidad-and-tobago/attractions'

In [780]:
all_attractions_trin_n_tob = get_attractions_one_page(trin_n_tob_html)
print(len(all_attractions_trin_n_tob))
print(all_attractions_trin_n_tob[:5])

40
['Asa Wright Nature Centre', 'Pitch Lake', 'Pirate’s Bay', 'Corbin Local Wildlife Park', 'Queen’s Park Savannah']


In [781]:
all_descriptions_trin = get_all_descriptions_one_page(trin_n_tob_html)
print(len(all_descriptions_trin))
print(all_descriptions_trin[:5])

40
['A former cocoa and coffee plantation transformed into an 600-hectare nature reserve, this place blows the minds of birdwatchers. Even if you can’t tell a…', "About 25km southwest of San Fernando, and just south of the small town of La Brea, this slowly bubbling black 'lake' is perhaps Trinidad’s greatest oddity…", "Past Charlotteville's pier, a dirt track winds up and around the cliff to concrete steps that descend to Pirate’s Bay, which offers excellent snorkeling…", "Established by hunter turned conservationist Roy Corbin in Tobago's forest-covered interior, just inland of the windward coast's Hope Bay, this nonprofit…", 'Once part of a sugar plantation, formerly home to a racecourse and now the epicenter of the annual Carnival, this public park is encircled by a 3.7km…']


In [782]:
trin_n_tob_df = pd.DataFrame(all_attractions_trin_n_tob, columns=['Attraction'])
trin_n_tob_df['Description'] = all_descriptions_trin
trin_n_tob_df['Country'] = 'Trinidad & Tobago'
trin_n_tob_df['Continent'] = 'South America'
trin_n_tob_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Asa Wright Nature Centre,A former cocoa and coffee plantation transform...,Trinidad & Tobago,South America
1,Pitch Lake,"About 25km southwest of San Fernando, and just...",Trinidad & Tobago,South America
2,Pirate’s Bay,"Past Charlotteville's pier, a dirt track winds...",Trinidad & Tobago,South America
3,Corbin Local Wildlife Park,Established by hunter turned conservationist R...,Trinidad & Tobago,South America
4,Queen’s Park Savannah,"Once part of a sugar plantation, formerly home...",Trinidad & Tobago,South America


In [783]:
trin_n_tob_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/trin_n_tob_df.csv')

### 8. Vanuatu

In [784]:
vanuatu_html = 'https://www.lonelyplanet.com/vanuatu/attractions'

In [785]:
all_attractions_vanuatu = get_attractions_one_page(vanuatu_html)
print(len(all_attractions_vanuatu))
print(all_attractions_vanuatu[:5])

31
['Mele Cascades', 'National Museum of Vanuatu', 'Port Olry', 'Iririki Island', 'Hideaway Island']


In [786]:
all_descriptions_vanuatu = get_all_descriptions_one_page(vanuatu_html)
print(len(all_descriptions_vanuatu))
print(all_descriptions_vanuatu[:5])

31
['This popular and photogenic swimming spot is 10km from Port Vila. A series of clear aquamarine pools terrace up the hillside, culminating in an impressive…', 'This excellent museum, in a soaring traditional building opposite the parliament, has a well-displayed collection of traditional artefacts such as tamtam …', 'At the end of the sealed road you come to Port Olry, a small francophone fishing village with a stunning curve of white-sand beach and eye-watering…', 'Iririki is the green, bungalow-laden island right across from Port Vila’s waterfront; it was closed following Cyclone Pam in 2015 but is expected to…', "Just 100m or so offshore from Mele Beach, Hideaway Island isn't all that hidden but it's one of Vila's favourite spots for snorkelling, diving or just…"]


In [787]:
vanuatu_df = pd.DataFrame(all_attractions_vanuatu, columns=['Attraction'])
vanuatu_df['Description'] = all_descriptions_vanuatu
vanuatu_df['Country'] = 'Vanuatu'
vanuatu_df['Continent'] = 'Oceania'
vanuatu_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Mele Cascades,This popular and photogenic swimming spot is 1...,Vanuatu,Oceania
1,National Museum of Vanuatu,"This excellent museum, in a soaring traditiona...",Vanuatu,Oceania
2,Port Olry,At the end of the sealed road you come to Port...,Vanuatu,Oceania
3,Iririki Island,"Iririki is the green, bungalow-laden island ri...",Vanuatu,Oceania
4,Hideaway Island,"Just 100m or so offshore from Mele Beach, Hide...",Vanuatu,Oceania


In [788]:
vanuatu_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/vanuatu_df.csv')

### 9. Slovakia

In [789]:
slovakia_html = 'https://www.lonelyplanet.com/slovakia/attractions'

In [790]:
# Create list of all htmls for Paris
slovakia_htmls = [slovakia_html]
for i in range(2, 3):
    base = 'https://www.lonelyplanet.com/slovakia/attractions'
    page = '?page'
    end = "=" + str(i)
    html = base + page + end
    slovakia_htmls.append(str(html))
print(slovakia_htmls)

['https://www.lonelyplanet.com/slovakia/attractions', 'https://www.lonelyplanet.com/slovakia/attractions?page=2']


In [791]:
all_attractions_slovakia = get_all_attractions(slovakia_htmls)
print(len(all_attractions_slovakia))
print(all_attractions_slovakia[:5])

80
['Spiš Castle', 'Danubiana Meulensteen Art Museum', 'Orava Castle', 'Slovenský Raj National Park', 'Hlavné Námestie']


In [792]:
all_descriptions_slovakia = get_all_descriptions(slovakia_htmls)
print(len(all_descriptions_slovakia))
print(all_descriptions_slovakia[:5])

80
["Crowning a travertine hill above Spišské Podhradie village, this vast, Unesco-listed fortification is one of Central Europe's biggest castle complexes…", 'The windswept location of this world-class art gallery is as invigorating as the works on display. On a promontory jutting into the Danube, the…', 'The sight of Orava Castle, roosting on a forest-cloaked hilltop, sends a chill down the spine. Vampire aficionados may recognise its stern silhouette from…', "Slovenský Raj's rocky plateaus, primeval forests and interlacing streams form some of Slovakia's most picturesque hiking terrain. Treks often involve…", "The nucleus for Bratislava's history, festivals and chic cafe culture is Hlavné nám (Main Sq). There's architectural finery in almost every direction,…"]


In [793]:
slovakia_df = pd.DataFrame(all_attractions_slovakia, columns=['Attraction'])
slovakia_df['Description'] = all_descriptions_slovakia
slovakia_df['Country'] = 'Slovakia'
slovakia_df['Continent'] = 'Europe'
slovakia_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Spiš Castle,Crowning a travertine hill above Spišské Podhr...,Slovakia,Europe
1,Danubiana Meulensteen Art Museum,The windswept location of this world-class art...,Slovakia,Europe
2,Orava Castle,"The sight of Orava Castle, roosting on a fores...",Slovakia,Europe
3,Slovenský Raj National Park,"Slovenský Raj's rocky plateaus, primeval fores...",Slovakia,Europe
4,Hlavné Námestie,"The nucleus for Bratislava's history, festival...",Slovakia,Europe


In [794]:
slovakia_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/slovakia_df.csv')

### 10. Armenia

In [795]:
armenia_html = 'https://www.lonelyplanet.com/armenia/attractions'

In [796]:
# Create list of all htmls for Paris
armenia_htmls = [armenia_html]
for i in range(2, 3):
    base = 'https://www.lonelyplanet.com/armenia/attractions'
    page = '?page'
    end = "=" + str(i)
    html = base + page + end
    armenia_htmls.append(str(html))
print(armenia_htmls)

['https://www.lonelyplanet.com/armenia/attractions', 'https://www.lonelyplanet.com/armenia/attractions?page=2']


In [797]:
all_attractions_armenia = get_all_attractions(armenia_htmls)
print(len(all_attractions_armenia))
print(all_attractions_armenia[:5])

80
['Armenian Genocide Memorial & Museum', 'History Museum of Armenia', 'Noravank', 'Old Khndzoresk', 'Cafesjian Center for the Arts']


In [798]:
all_descriptions_armenia = get_all_descriptions(armenia_htmls)
print(len(all_descriptions_armenia))
print(all_descriptions_armenia[:5])

80
['Commemorating the massacre of Armenians in the Ottoman Empire from 1915 to 1922, this institution uses photographs, documents, reports and films to…', "Its simply extraordinary collection of Bronze Age artefacts make this museum Armenia's pre-eminent cultural institution and an essential stop on every…", 'Founded by Bishop Hovhannes in 1205 and sensitively renovated in the 1990s, Noravank (New Monastery) is one of the most spectacular sites in Armenia and…', 'Dug into volcanic sandstone on the slopes of Khor Dzor (Deep Gorge), the village of Old Khndzoresk was inhabited as far back as the 13th century. By the…', "Housed in a vast flight of stone steps known as the Cascade, this arts centre is one of the city's major cultural attractions. Originally conceived in the…"]


In [799]:
armenia_df = pd.DataFrame(all_attractions_armenia, columns=['Attraction'])
armenia_df['Description'] = all_descriptions_armenia
armenia_df['Country'] = 'Armenia'
armenia_df['Continent'] = 'Europe'
armenia_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Armenian Genocide Memorial & Museum,Commemorating the massacre of Armenians in the...,Armenia,Europe
1,History Museum of Armenia,Its simply extraordinary collection of Bronze ...,Armenia,Europe
2,Noravank,Founded by Bishop Hovhannes in 1205 and sensit...,Armenia,Europe
3,Old Khndzoresk,Dug into volcanic sandstone on the slopes of K...,Armenia,Europe
4,Cafesjian Center for the Arts,Housed in a vast flight of stone steps known a...,Armenia,Europe


In [800]:
armenia_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/armenia_df.csv')

### 11. South Carolina

In [801]:
sc_html = 'https://www.lonelyplanet.com/usa/the-south/south-carolina/attractions'

In [802]:
all_attractions_sc = get_attractions_one_page(sc_html)
print(len(all_attractions_sc))
print(all_attractions_sc[:5])

40
['Aiken-Rhett House', 'Guardians of Charleston Harbor', 'Old Slave Mart Museum', 'Brookgreen Gardens', 'Heyward-Washington House']


In [803]:
all_descriptions_sc = get_all_descriptions_one_page(sc_html)
print(len(all_descriptions_sc))
print(all_descriptions_sc[:5])

40
['The only surviving urban town-house complex, this 1820 abode gives a fascinating glimpse into antebellum life on a 45-minute self-guided audio tour. The…', 'The first shots of the Civil War rang out at Fort Sumter, on a pentagon-shaped island in the harbor. A Confederate stronghold, this fort was shelled to…', "Formerly called Ryan's Mart, this building once housed an open-air market that auctioned African American men, women and children in the mid-1800s, the…", 'These magical gardens, 16 miles south of Myrtle Beach on Hwy 17S, are home to the largest collection of American sculpture in the country, set amid more…', 'As the name hints, this 1772 Georgian-style town house is kind of a big deal because George Washington rented it for a week, and visitors can stand in…']


In [804]:
sc_df = pd.DataFrame(all_attractions_sc, columns=['Attraction'])
sc_df['Description'] = all_descriptions_sc
sc_df['Country'] = 'U.S.'
sc_df['Continent'] = 'North America'
sc_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Aiken-Rhett House,"The only surviving urban town-house complex, t...",U.S.,North America
1,Guardians of Charleston Harbor,The first shots of the Civil War rang out at F...,U.S.,North America
2,Old Slave Mart Museum,"Formerly called Ryan's Mart, this building onc...",U.S.,North America
3,Brookgreen Gardens,"These magical gardens, 16 miles south of Myrtl...",U.S.,North America
4,Heyward-Washington House,"As the name hints, this 1772 Georgian-style to...",U.S.,North America


In [805]:
sc_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/sc_df.csv')

### 12. The Terai, Nepal

In [806]:
terai_html = 'https://www.lonelyplanet.com/nepal/the-terai-and-mahabharat-range/attractions'

In [807]:
all_attractions_terai = get_attractions_one_page(terai_html)
print(len(all_attractions_terai))
print(all_attractions_terai[:5])

22
['Janaki Mandir', 'Shashwat Dham', 'Daman Mountain Resort View Tower', 'Amar Narayan Mandir', 'Crocodile Breeding Project']


In [808]:
all_descriptions_terai = get_all_descriptions_one_page(terai_html)
print(len(all_descriptions_terai))
print(all_descriptions_terai[:5])

22
["At the heart of Janakpur lies the marble Janaki Mandir, one of the grander pieces of architecture in Nepal, and the city's must-see sight. Built in…", "Burned out on ancient sites? Here's a contemporary take on the temple compound, and it's very impressive. The centrepiece is a Shiva temple that resembles…", 'Some of the best views of the Himalaya in Nepal can be had from the concrete viewing tower inside the Daman Mountain Resort. Unfortunately, the resort has…', 'At the bottom of Asan Tole (the steep road running east from Sitalpati), the Amar Narayan Mandir is a classic three-tiered, pagoda-style wooden temple…', 'A few hundred metres past the park headquarters in Kasara is a crocodile breeding project, where you can see both gharials and marsh muggers up close. The…']


In [809]:
terai_df = pd.DataFrame(all_attractions_terai, columns=['Attraction'])
terai_df['Description'] = all_descriptions_terai
terai_df['Country'] = 'Nepal'
terai_df['Continent']= 'Asia'
terai_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Janaki Mandir,At the heart of Janakpur lies the marble Janak...,Nepal,Asia
1,Shashwat Dham,Burned out on ancient sites? Here's a contempo...,Nepal,Asia
2,Daman Mountain Resort View Tower,Some of the best views of the Himalaya in Nepa...,Nepal,Asia
3,Amar Narayan Mandir,At the bottom of Asan Tole (the steep road run...,Nepal,Asia
4,Crocodile Breeding Project,A few hundred metres past the park headquarter...,Nepal,Asia


In [810]:
terai_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/terai_df.csv')

### 13. Launceston and the Tamar Valley

In [811]:
launceston_html = 'https://www.lonelyplanet.com/australia/tasmania/launceston/attractions'

In [812]:
all_attractions_launceston = get_attractions_one_page(launceston_html)
print(len(all_attractions_launceston))
print(all_attractions_launceston[:5])

22
['Cataract Gorge', 'Queen Victoria Museum', 'Queen Victoria Art Gallery', 'Franklin House', 'Boag’s Brewery']


In [813]:
all_descriptions_launceston = get_all_descriptions_one_page(launceston_html)
print(len(all_descriptions_launceston))
print(all_descriptions_launceston[:5])

22
["At magnificent Cataract Gorge, right at the city centre's edge, the bushland, cliffs and ice-cold South Esk River feel a million miles from town. At First…", 'Inside the restored and reinvented Inveresk railway yards, QVMAG has the usual assembly of dinosaurs and stuffed animals, but they sit alongside historic…', 'Colonial paintings, including works by John Glover, are the pride of the collection at this art gallery in a meticulously restored 19th-century building…', 'A relatively short drive south of the city, Franklin House is one of Launceston’s most fetching Georgian-era homes. Built in 1838 by former convict and…', 'James Boag’s beer has been brewed on William St since 1881. See the amber alchemy in action on 90-minute guided tours, which include a beer and cheese…']


In [814]:
launceston_df = pd.DataFrame(all_attractions_launceston, columns=['Attraction'])

launceston_df['Description'] = all_descriptions_launceston
launceston_df['Country'] = 'Launceston'
launceston_df['Continent']= 'Australia'
launceston_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Cataract Gorge,"At magnificent Cataract Gorge, right at the ci...",Launceston,Australia
1,Queen Victoria Museum,Inside the restored and reinvented Inveresk ra...,Launceston,Australia
2,Queen Victoria Art Gallery,"Colonial paintings, including works by John Gl...",Launceston,Australia
3,Franklin House,"A relatively short drive south of the city, Fr...",Launceston,Australia
4,Boag’s Brewery,James Boag’s beer has been brewed on William S...,Launceston,Australia


In [815]:
launceston_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/launceston_df.csv')

### 14. Bavaria, Germany

In [816]:
bavaria_html = 'https://www.lonelyplanet.com/germany/bavaria/attractions'

In [817]:
# Create list of all htmls for Paris
bavaria_htmls = [bavaria_html]
for i in range(2, 7):
    base = 'https://www.lonelyplanet.com/germany/bavaria/attractions'
    page = '?page'
    end = "=" + str(i)
    html = base + page + end
    bavaria_htmls.append(str(html))
print(bavaria_htmls)

['https://www.lonelyplanet.com/germany/bavaria/attractions', 'https://www.lonelyplanet.com/germany/bavaria/attractions?page=2', 'https://www.lonelyplanet.com/germany/bavaria/attractions?page=3', 'https://www.lonelyplanet.com/germany/bavaria/attractions?page=4', 'https://www.lonelyplanet.com/germany/bavaria/attractions?page=5', 'https://www.lonelyplanet.com/germany/bavaria/attractions?page=6']


In [818]:
all_attractions_bavaria = get_all_attractions(bavaria_htmls)
print(len(all_attractions_bavaria))
print(all_attractions_bavaria[:5])

240
['Schloss Linderhof', 'Schloss Neuschwanstein', 'Schloss Hohenschwangau', 'Zugspitze', 'KZ-Gedenkstätte Dachau']


In [819]:
all_descriptions_bavaria = get_all_descriptions(bavaria_htmls)
print(len(all_descriptions_bavaria))
print(all_descriptions_bavaria[:5])

240
['A pocket-sized trove of weird treasures, Schloss Linderhof was Ludwig II’s smallest but most sumptuous palace, and the only one he lived to see fully…', 'Appearing through the mountaintops like a mirage, Schloss Neuschwanstein was the model for Disney’s Sleeping Beauty castle. King Ludwig II planned this…', 'King Ludwig II grew up at the sun-yellow Schloss Hohenschwangau and later enjoyed summers here until his death in 1886. His father, Maximilian II, built…', 'On good days, views from Germany’s rooftop extend into four countries. The return trip starts in Garmisch aboard a cogwheel train (Zahnradbahn) that chugs…', 'Officially called the KZ-Gedenkstätte Dachau, this was the Nazis’ first concentration camp, built by Heinrich Himmler in March 1933 to house political…']


In [820]:
bavaria_df = pd.DataFrame(all_attractions_bavaria, columns=['Attraction'])

bavaria_df['Description'] = all_descriptions_bavaria
bavaria_df['Country'] = 'Germany'
bavaria_df['Continent'] = 'Europe'
bavaria_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Schloss Linderhof,"A pocket-sized trove of weird treasures, Schlo...",Germany,Europe
1,Schloss Neuschwanstein,Appearing through the mountaintops like a mira...,Germany,Europe
2,Schloss Hohenschwangau,King Ludwig II grew up at the sun-yellow Schlo...,Germany,Europe
3,Zugspitze,"On good days, views from Germany’s rooftop ext...",Germany,Europe
4,KZ-Gedenkstätte Dachau,"Officially called the KZ-Gedenkstätte Dachau, ...",Germany,Europe


In [821]:
bavaria_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/bavaria_df.csv')

### 15. England

In [822]:
england_html = 'https://www.lonelyplanet.com/england/attractions'

In [823]:
# Create list of all htmls for Paris
england_htmls = [england_html]
for i in range(2, 30):
    base = 'https://www.lonelyplanet.com/england/attractions'
    page = '?page'
    end = "=" + str(i)
    html = base + page + end
    england_htmls.append(str(html))
print(england_htmls)

['https://www.lonelyplanet.com/england/attractions', 'https://www.lonelyplanet.com/england/attractions?page=2', 'https://www.lonelyplanet.com/england/attractions?page=3', 'https://www.lonelyplanet.com/england/attractions?page=4', 'https://www.lonelyplanet.com/england/attractions?page=5', 'https://www.lonelyplanet.com/england/attractions?page=6', 'https://www.lonelyplanet.com/england/attractions?page=7', 'https://www.lonelyplanet.com/england/attractions?page=8', 'https://www.lonelyplanet.com/england/attractions?page=9', 'https://www.lonelyplanet.com/england/attractions?page=10', 'https://www.lonelyplanet.com/england/attractions?page=11', 'https://www.lonelyplanet.com/england/attractions?page=12', 'https://www.lonelyplanet.com/england/attractions?page=13', 'https://www.lonelyplanet.com/england/attractions?page=14', 'https://www.lonelyplanet.com/england/attractions?page=15', 'https://www.lonelyplanet.com/england/attractions?page=16', 'https://www.lonelyplanet.com/england/attractions?page=

In [824]:
# Warning! Takes a bit long to load 
all_attractions_england = get_all_attractions(england_htmls)
print(len(all_attractions_england))
print(all_attractions_england[:5])

1160
['Windsor Castle', 'Westminster Abbey', 'Roman Baths', 'Canterbury Cathedral', 'Natural History Museum']


In [825]:
all_descriptions_england = get_all_descriptions(england_htmls)
print(len(all_descriptions_england))
print(all_descriptions_england[:5])

1160
['The world’s largest and oldest continuously occupied fortress, Windsor Castle is a majestic vision of battlements and towers. Used for state occasions, it…', "A splendid mixture of architectural styles, Westminster Abbey is considered the finest example of Early English Gothic. It's not merely a beautiful place…", "Welcome to one of Northern Europe's most significant Roman sites. Today more than a million visitors a year come to see its historic finds, atmospheric…", 'A rich repository of more than 1400 years of Christian history, Canterbury Cathedral is the Church of England’s mother ship, and a truly extraordinary…', 'With its thunderous, animatronic dinosaur, riveting displays about planet earth, outstanding Darwin Centre and architecture straight from a Gothic fairy…']


In [826]:
england_df = pd.DataFrame(all_attractions_england, columns=['Attraction'])
england_df['Description'] = all_descriptions_england
england_df['Country'] = 'England'
england_df['Continent'] = 'Europe'
england_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Windsor Castle,The world’s largest and oldest continuously oc...,England,Europe
1,Westminster Abbey,"A splendid mixture of architectural styles, We...",England,Europe
2,Roman Baths,Welcome to one of Northern Europe's most signi...,England,Europe
3,Canterbury Cathedral,A rich repository of more than 1400 years of C...,England,Europe
4,Natural History Museum,"With its thunderous, animatronic dinosaur, riv...",England,Europe


In [827]:
england_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/england_df.csv')

### 16. Jordan

In [828]:
jordan_html = 'https://www.lonelyplanet.com/jordan/attractions'

In [829]:
# Create list of all htmls for Jordan
jordan_htmls = [jordan_html]
for i in range(2, 5):
    base = 'https://www.lonelyplanet.com/jordan/attractions'
    page = '?page'
    end = "=" + str(i)
    html = base + page + end
    jordan_htmls.append(str(html))
print(jordan_htmls)

['https://www.lonelyplanet.com/jordan/attractions', 'https://www.lonelyplanet.com/jordan/attractions?page=2', 'https://www.lonelyplanet.com/jordan/attractions?page=3', 'https://www.lonelyplanet.com/jordan/attractions?page=4']


In [830]:
all_attractions_jordan = get_all_attractions(jordan_htmls)
print(len(all_attractions_jordan))
print(all_attractions_jordan[:5])

160
['Petra', 'Citadel', 'Shaumari Wildlife Reserve', 'Darat Al Funun', 'Royal Automobile Museum']


In [831]:
all_descriptions_jordan = get_all_descriptions(jordan_htmls)
print(len(all_descriptions_jordan))
print(all_descriptions_jordan[:5])

160
['The spectacular sandstone city of Petra was built in the 3rd century BC by the Nabataeans, who carved palaces, temples, tombs, storerooms and stables from…', 'The area known as the Citadel sits on the highest hill in Amman, Jebel Al Qala’a (about 850m above sea level), and is the site of ancient Rabbath-Ammon…', 'Established in 1975 by the Royal Society for the Conservation of Nature (RSCN), this 22 sq km reserve was created with the aim of reintroducing wildlife…', 'On the hillside to the north of the downtown area, this cultural haven is dedicated to contemporary art. The main building features an excellent art…', "You really don't have to be a car enthusiast to enjoy this museum, which displays more than 70 classic cars and motorbikes from the personal collection of…"]


In [832]:
jordan_df = pd.DataFrame(all_attractions_jordan, columns=['Attraction'])

jordan_df['Description'] = all_descriptions_jordan
jordan_df['Country'] = 'Jordan'
jordan_df['Continent'] = 'Asia'
jordan_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Petra,The spectacular sandstone city of Petra was bu...,Jordan,Asia
1,Citadel,The area known as the Citadel sits on the high...,Jordan,Asia
2,Shaumari Wildlife Reserve,Established in 1975 by the Royal Society for t...,Jordan,Asia
3,Darat Al Funun,On the hillside to the north of the downtown a...,Jordan,Asia
4,Royal Automobile Museum,You really don't have to be a car enthusiast t...,Jordan,Asia


In [833]:
jordan_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/jordan_df.csv')

### 17. Toulouse, France

In [834]:
toulouse_html = 'https://www.lonelyplanet.com/france/toulouse-gers-vallee-du-tarn/attractions'

In [835]:
all_attractions_toulouse= get_attractions_one_page(toulouse_html)
print(len(all_attractions_toulouse))
print(all_attractions_toulouse[:5])

32
['Cité de l’Espace', 'Couvent des Jacobins', 'Place du Capitole', 'Lectoure', 'Musée et Jardin du Canal du Midi']


In [836]:
all_descriptions_toulouse = get_all_descriptions_one_page(toulouse_html)
print(len(all_descriptions_toulouse))
print(all_descriptions_toulouse[:5])

32
["The fantastic space museum on the city's eastern outskirts brings Toulouse's illustrious aeronautical history to life through hands-on exhibits, including…", 'With its palm tree vaulted ceiling, the Couvent des Jacobins is one of Toulouse’s oldest and most recognizable buildings', 'Toulouse’s grandiose main square is the focal point in the heart of France’s "Pink City"', "It's something of a surprise to come across a place of such historical wealth in such a remote part of the Gers département, well away from any major…", 'Understand France’s mightiest man-made waterway, the Unesco World Heritage Canal du Midi, through illuminating exhibitions and short films at this museum…']


In [837]:
toulouse_df = pd.DataFrame(all_attractions_toulouse, columns=['Attraction'])

toulouse_df['Description'] = all_descriptions_toulouse
toulouse_df['Country'] = 'France'
toulouse_df['Continent'] = 'Europe'
toulouse_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Cité de l’Espace,The fantastic space museum on the city's easte...,France,Europe
1,Couvent des Jacobins,"With its palm tree vaulted ceiling, the Couven...",France,Europe
2,Place du Capitole,Toulouse’s grandiose main square is the focal ...,France,Europe
3,Lectoure,It's something of a surprise to come across a ...,France,Europe
4,Musée et Jardin du Canal du Midi,Understand France’s mightiest man-made waterwa...,France,Europe


In [838]:
toulouse_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/toulouse_df.csv')

### 18. Puducherry (Pondicherry), India

In [839]:
india_html = 'https://www.lonelyplanet.com/india/tamil-nadu/puducherry-pondicherry/attractions'

In [840]:
all_attractions_india= get_attractions_one_page(india_html)
print(len(all_attractions_india))
print(all_attractions_india[:5])

13
['Seafront', 'Sri Aurobindo Ashram', 'Puducherry Museum', 'Institut Français de Pondichéry', 'Sri Manakula Vinayagar Temple']


In [841]:
all_descriptions_india = get_all_descriptions_one_page(india_html)
print(len(all_descriptions_india))
print(all_descriptions_india[:5])

13
['Pondy is a seaside town, but that doesn’t make it a beach destination; the city’s sand is a thin strip of dirty brown that slurps into a seawall of jagged…', 'Founded in 1926 by Sri Aurobindo and a French-born woman, ‘the Mother’, this famous spiritual community has about 2000 members in its many departments…', 'Goodness knows how this converted late-18th-century villa keeps its artefacts from disintegrating, considering there’s a whole floor of French-era…', 'This grand 19th-century neoclassical building is also a flourishing research institution devoted to Indian culture, history and ecology. Visitors can…', 'Pondy may have more churches than most Indian towns, but the Hindu faith still reigns supreme. Pilgrims, tourists and the curious get a head pat from the…']


In [842]:
india_df = pd.DataFrame(all_attractions_india, columns=['Attraction'])

india_df['Description'] = all_descriptions_india
india_df['Country'] = 'India'
india_df['Continent'] = 'Asia'
india_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Seafront,"Pondy is a seaside town, but that doesn’t make...",India,Asia
1,Sri Aurobindo Ashram,Founded in 1926 by Sri Aurobindo and a French-...,India,Asia
2,Puducherry Museum,Goodness knows how this converted late-18th-ce...,India,Asia
3,Institut Français de Pondichéry,This grand 19th-century neoclassical building ...,India,Asia
4,Sri Manakula Vinayagar Temple,Pondy may have more churches than most Indian ...,India,Asia


In [843]:
india_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/india_df.csv')

### 19. Chiang Mai, Thailand

In [844]:
thailand_html = 'https://www.lonelyplanet.com/thailand/chiang-mai-province/chiang-mai/attractions'

In [845]:
all_attractions_thailand= get_attractions_one_page(thailand_html)
print(len(all_attractions_thailand))
print(all_attractions_thailand[:5])

40
['Wat Phra That Doi Suthep', 'Wat Chedi Luang', 'Wat Phra Singh', 'Talat Warorot', 'Wat Phan Tao']


In [846]:
all_descriptions_thailand = get_all_descriptions_one_page(thailand_html)
print(len(all_descriptions_thailand))
print(all_descriptions_thailand[:5])

40
["Overlooking the city from its mountain throne, Wat Phra That Doi Suthep is one of northern Thailand's most sacred temples, and its founding legend is…", "Wat Chedi Luang isn't as grand as Wat Phra Singh, but its towering, ruined Lanna-style chedi (built in 1441) is much taller and the sprawling compound…", "Chiang Mai's most revered temple, Wat Phra Singh is dominated by an enormous, mosaic-inlaid wí·hăhn (sanctuary). Its prosperity is plain to see from the…", "Chiang Mai's oldest public market, Warorot (also spelt Waroros) is a great place to connect with the city's Thai soul. Alongside souvenir vendors you'll…", 'Without doubt the most atmospheric wát in the old city, this teak marvel sits in the shadow of Wat Chedi Luang. Set in a compound full of fluttering…']


In [847]:
thailand_df = pd.DataFrame(all_attractions_thailand, columns=['Attraction'])

thailand_df['Description'] = all_descriptions_thailand
thailand_df['Country'] = 'Thailand'
thailand_df['Continent'] = 'Asia'
thailand_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Wat Phra That Doi Suthep,"Overlooking the city from its mountain throne,...",Thailand,Asia
1,Wat Chedi Luang,Wat Chedi Luang isn't as grand as Wat Phra Sin...,Thailand,Asia
2,Wat Phra Singh,"Chiang Mai's most revered temple, Wat Phra Sin...",Thailand,Asia
3,Talat Warorot,"Chiang Mai's oldest public market, Warorot (al...",Thailand,Asia
4,Wat Phan Tao,Without doubt the most atmospheric wát in the ...,Thailand,Asia


In [848]:
india_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/india_df.csv')

### 20. Genoa, Italy

In [849]:
genoa_html = 'https://www.lonelyplanet.com/italy/liguria-piedmont-and-valle-daosta/genoa/attractions'

In [850]:
all_attractions_genoa= get_attractions_one_page(genoa_html)
print(len(all_attractions_genoa))
print(all_attractions_genoa[:5])

28
['Musei di Strada Nuova', 'Palazzo Reale', 'Old City', 'Boccadasse', 'Palazzo Bianco']


In [851]:
all_descriptions_genoa = get_all_descriptions_one_page(genoa_html)
print(len(all_descriptions_genoa))
print(all_descriptions_genoa[:5])


28
['Skirting the northern edge of the old city limits, pedestrianised Via Garibaldi (formerly Strada Nuova) was planned by Galeazzo Alessi in the 16th century…', "If you only get the chance to visit one of the Palazzi dei Rolli (group of palaces belonging to the city's most eminent families), make it this one. A…", 'The heart of medieval Genoa – bounded by ancient city gates Porta dei Vacca and Porta Soprana, and the streets of Via Cairoli, Via Garibaldi and Via XXV…', 'When the sun is shining, do as the Genovese do and decamp for a passeggiata (late afternoon stroll) along the oceanside promenade, Corso Italia, which…', 'Flemish, Spanish and Italian artists feature at Palazzo Bianco, the second of the triumvirate of palazzi that are together known as the Musei di Strada…']


In [852]:
genoa_df = pd.DataFrame(all_attractions_genoa, columns=['Attraction'])

genoa_df['Description'] = all_descriptions_genoa
genoa_df['Country'] = 'Italy'
genoa_df['Continent'] = 'Europe'
genoa_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Musei di Strada Nuova,Skirting the northern edge of the old city lim...,Italy,Europe
1,Palazzo Reale,If you only get the chance to visit one of the...,Italy,Europe
2,Old City,The heart of medieval Genoa – bounded by ancie...,Italy,Europe
3,Boccadasse,"When the sun is shining, do as the Genovese do...",Italy,Europe
4,Palazzo Bianco,"Flemish, Spanish and Italian artists feature a...",Italy,Europe


In [853]:
genoa_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/genoa_df.csv')

### 21. Osaka, Japan

In [854]:
osaka_html = 'https://www.lonelyplanet.com/japan/kansai/osaka/attractions'

In [855]:
all_attractions_osaka= get_attractions_one_page(osaka_html)
print(len(all_attractions_osaka))
print(all_attractions_osaka[:5])

40
['Abeno Harukas', 'Osaka-jō', 'Dōtombori', 'Amerika-Mura', 'National Museum of Ethnology']


In [856]:
all_descriptions_osaka = get_all_descriptions_one_page(osaka_html)
print(len(all_descriptions_osaka))
print(all_descriptions_osaka[:5])

40
["This César Pelli–designed tower, which opened in March 2014, is Japan's tallest building (300m, 60 storeys). The observatory on the 16th floor is free,…", "After unifying Japan in the late 16th century, General Toyotomi Hideyoshi built this castle (1583) as a display of power, using, it's said, the labour of…", "Highly photogenic Dōtombori is the city's liveliest night spot and the centre of the southern part of town. Its name comes from the 400-year-old canal,…", 'West of Midō-suji, Amerika-Mura is a compact enclave of hip, youth-focused and offbeat shops, plus cafes, bars, tattoo and piercing parlours, nightclubs,…', "This ambitious museum showcases the world's cultures, presenting them as the continuous (and tangled) strings that they are. There are plenty of…"]


In [857]:
osaka_df = pd.DataFrame(all_attractions_osaka, columns=['Attraction'])

osaka_df['Description'] = all_descriptions_osaka
osaka_df['Country'] = 'Japan'
osaka_df['Continent'] = 'Asia'

osaka_df.head()

Unnamed: 0,Attraction,Description,Country,Continent
0,Abeno Harukas,"This César Pelli–designed tower, which opened ...",Japan,Asia
1,Osaka-jō,"After unifying Japan in the late 16th century,...",Japan,Asia
2,Dōtombori,Highly photogenic Dōtombori is the city's live...,Japan,Asia
3,Amerika-Mura,"West of Midō-suji, Amerika-Mura is a compact e...",Japan,Asia
4,National Museum of Ethnology,This ambitious museum showcases the world's cu...,Japan,Asia


In [858]:
osaka_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/osaka_df.csv')

## Concatenate all the DFs

In [859]:
countries_list = [cameroon_df, lithuania_df, fiji_df, laos_df, kazakhstan_df, paraguay_df, trin_n_tob_df, vanuatu_df, slovakia_df, armenia_df,
                  sc_df, terai_df, launceston_df, bavaria_df, england_df, jordan_df, toulouse_df, india_df, thailand_df, genoa_df, osaka_df ]
countries_df = pd.concat(countries_list)
countries_df.head()


Unnamed: 0,Attraction,Description,Country,Continent
0,Palais Royal,The must-see attraction is the sultan's palace...,Cameroon,Africa
1,Chefferie,"Approached via a ceremonial gate, the compound...",Cameroon,Africa
2,Fon's Palace,Just north of Bamenda is the large Tikar commu...,Cameroon,Africa
3,Limbe Wildlife Centre,"Many zoos in Africa are depressing places, but...",Cameroon,Africa
4,Botanical Gardens,"Limbe's Botanical Gardens, the second oldest i...",Cameroon,Africa


In [860]:
# See all 21 countries in the DF
countries_df['Country'].unique()

array(['Cameroon', 'Lithuania', 'Fiji', 'Laos', 'Kazakhstan', 'Paraguay',
       'Trinidad & Tobago', 'Vanuatu', 'Slovakia', 'Armenia', 'U.S.',
       'Nepal', 'Launceston', 'Germany', 'England', 'Jordan', 'France',
       'India', 'Thailand', 'Italy', 'Japan'], dtype=object)

In [861]:
# See all continents in the DF
countries_df['Continent'].unique()

array(['Africa', 'Europe', 'Asia', 'South America', 'Oceania',
       'North America', 'Australia'], dtype=object)

In [862]:
countries_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2600 entries, 0 to 39
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Attraction   2600 non-null   object
 1   Description  2600 non-null   object
 2   Country      2600 non-null   object
 3   Continent    2600 non-null   object
dtypes: object(4)
memory usage: 101.6+ KB


In [863]:
# Save to CSV
countries_df.to_csv('/Users/rosew/Desktop/Moringa/phase_5/individual_attractions/best_travel_destinations_for_2025_df.csv')

## Analysis to be continued in [EDA and Modeling Notebooks]()