# Project Explanation and Requirements

▶ Web scraping is an automatic way to retrieve unstructured data from a website and store it in a structured format.

▶ It gives you structured web data from any public website.

▶ General DIY web scraping steps:
```
i. Identify the target website
ii. Collect the URLs of the pages you want to extract data from
iii. Send a request to each URL to get the page's HTML
iv. Use locators (tags, classes, XPath) to find the data in the HTML
v. Save the data as JSON, CSV, or another structured format
```
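The five steps can be sketched in Python as below. This is a minimal illustration, not the notebook's own code: `extract_titles`, `to_csv_text` and `scrape` are hypothetical helpers, the locator (`div.listing_title`) is borrowed from the hotel cells later on, and the fetching function is passed in so the same pipeline works on any HTML source.

```python
import csv
import io

from bs4 import BeautifulSoup


def extract_titles(html, tag="div", css_class="listing_title"):
    """Step iv: use a locator (tag + class) to find the data in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.find_all(tag, class_=css_class)]


def to_csv_text(values, header="title"):
    """Step v: serialise the extracted values as CSV text, one value per row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([header])
    writer.writerows([v] for v in values)
    return buf.getvalue()


def scrape(urls, fetch):
    """Steps ii-v: fetch each collected URL, extract the values, return CSV text."""
    values = []
    for url in urls:
        html = fetch(url)                    # step iii: get the page's HTML
        values.extend(extract_titles(html))  # step iv: apply the locator
    return to_csv_text(values)               # step v: structured output
```

Against a live site, `fetch` could be something like `lambda u: requests.get(u, timeout=30).text`, and the returned text can be written to a file opened with `open("titles.csv", "w", newline="")`.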

# Step 1: Inspect the Page and Fetch Its HTML

```
a. Load the website in your web browser
b. Right-click an element on the page and select "Inspect" (or press F12) to see its HTML
c. Store the website address (URL) in a variable
d. Send a GET request for that URL's HTML to the web server
e. Retrieve the HTML the server sends back and convert it into a BeautifulSoup object
```
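Steps c to e can be sketched as a single function. `fetch_soup` and `HEADERS` are hypothetical names; the header values are the same ones the `HotelScraper` class uses below, and `get` defaults to `requests.get` but can be replaced by a stub when no network is available.

```python
from bs4 import BeautifulSoup

# Browser-like request headers (same values as in HotelScraper)
HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/90.0.4430.212 Safari/537.36"),
    "Accept-Language": "en-US, en;q=0.5",
}


def fetch_soup(url, get=None, timeout=30):
    """Send a GET request for `url` and return the HTML as a BeautifulSoup object."""
    if get is None:
        import requests  # imported lazily so the function can be tested offline
        get = requests.get
    response = get(url, headers=HEADERS, timeout=timeout)  # step d
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return BeautifulSoup(response.text, "html.parser")  # step e
```

Usage: `soup = fetch_soup(url)` where `url` holds the address from step c.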


In [None]:
!pip install selenium -q

In [None]:
!wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -

OK


In [None]:
!add-apt-repository "deb http://dl.google.com/linux/chrome/deb/ stable main"

Hit:1 http://dl.google.com/linux/chrome/deb stable InRelease
Hit:2 https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ InRelease
Get:3 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Hit:4 http://archive.ubuntu.com/ubuntu focal InRelease
Ign:5 https://developer.download.nvidia.com/compute/machine-lear

In [None]:
!apt update

[33m0% [Working][0m            Hit:1 http://dl.google.com/linux/chrome/deb stable InRelease
[33m0% [Connecting to archive.ubuntu.com (185.125.190.39)] [Connecting to security.[0m                                                                               Hit:2 https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ InRelease
[33m0% [Waiting for headers] [Waiting for headers] [Waiting for headers] [Waiting f[0m                                                                               Hit:3 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu focal InRelease
[33m0% [Waiting for headers] [Waiting for headers] [Waiting for headers] [Connectin[0m                                                                               Hit:4 http://security.ubuntu.com/ubuntu focal-security InRelease
[33m0% [Waiting for headers] [Waiting for headers] [Connecting to ppa.launchpad.net[0m                                                                               Hit:5 ht

In [None]:
!apt install google-chrome-stable

Reading package lists... Done
Building dependency tree       
Reading state information... Done
google-chrome-stable is already the newest version (110.0.5481.100-1).
The following package was automatically installed and is no longer required:
  libnvidia-common-510
Use 'apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 23 not upgraded.
[1;33mW: [0mTarget Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list.d/google-chrome.list:3[0m
[1;33mW: [0mTarget Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list.d/google-chrome.list:3[0m


In [None]:
!apt-get install chromium-browser

Reading package lists... Done
Building dependency tree       
Reading state information... Done
chromium-browser is already the newest version (1:85.0.4183.83-0ubuntu0.20.04.2).
The following package was automatically installed and is no longer required:
  libnvidia-common-510
Use 'apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 23 not upgraded.
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list.d/google-chrome.list:3
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list.d/google-chrome.list:3


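## CONNECTING TO A LOCAL RUNTIME

This is an alternative that lets you use Colab as the front end instead of Jupyter Notebook while the code runs on your own machine.

1. In an Anaconda Prompt, install and enable the `jupyter_http_over_ws` extension:
```
pip install jupyter_http_over_ws
jupyter serverextension enable --py jupyter_http_over_ws
```
2. Start the notebook server and allow connections from Colab:
```
jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --port=8888 --NotebookApp.port_retries=0
```
3. In Colab, choose "Connect to a local runtime" and paste the URL (including the token) that the server prints.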

In [None]:
# Import the libraries
import csv 
import time
import requests
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

### Hotels Data Collection

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

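class HotelScraper:
    """Scrapes hotel names, ratings, review counts and prices from a TripAdvisor listing page."""

    def __init__(self, url):
        self.url = url
        # Browser-like request headers; implicit string concatenation keeps the
        # User-Agent on one logical line without embedding indentation spaces
        self.headers = {
            'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                           'AppleWebKit/537.36 (KHTML, like Gecko) '
                           'Chrome/90.0.4430.212 Safari/537.36'),
            'Accept-Language': 'en-US, en;q=0.5',
        }
        self.soup = self.get_page_contents()

    def get_page_contents(self):
        page = requests.get(self.url, headers=self.headers, timeout=30)
        page.raise_for_status()  # fail loudly on 4xx/5xx, e.g. when the request is blocked
        return BeautifulSoup(page.text, 'html.parser')

    def scrape_hotel_data(self):
        hotels = [name.text.strip() for name in self.soup.find_all('div', {'class': 'listing_title'})]
        ratings = [rating.get('alt') for rating in self.soup.find_all('a', {'class': 'ui_bubble_rating'})]
        reviews = [review.text.strip() for review in self.soup.find_all('a', {'class': 'review_count'})]
        prices = [p.text.replace('₹', '').strip() for p in self.soup.find_all('div', {'class': 'price-wrap'})]
        return {'Hotel Names': hotels, 'Ratings': ratings,
                'Number of Reviews': reviews, 'Prices': prices}

    def to_csv(self, file_name):
        # The lists can differ in length (e.g. hotels without a price); wrapping each
        # in a Series pads the shorter columns with NaN at the end instead of raising,
        # so compare the column lengths before trusting that rows line up
        data = self.scrape_hotel_data()
        df = pd.DataFrame({col: pd.Series(values) for col, values in data.items()})
        df.to_csv(file_name, index=False, header=True)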

# Use the class
hotel_scraper = HotelScraper('https://www.tripadvisor.com/Hotels-g293993-a_ufe.true-Mecca_Makkah_Province-Hotels.html')
hotel_scraper.to_csv('hotels.csv')

In [None]:
hotel_scraper = HotelScraper('https://www.tripadvisor.com/Hotels-g293993-a_ufe.true-Mecca_Makkah_Province-Hotels.html')
hotels_data = hotel_scraper.scrape_hotel_data()
print(hotels_data)

{'Hotel Names': ['1. Swissotel Al Maqam Makkah', '2. Fairmont Makkah Clock Royal Tower', '3. Hilton Makkah Convention Hotel', '4. Hilton Suites Makkah', '5. Pullman ZamZam Makkah', '6. Park Inn by Radisson Makkah Aziziyah', '7. Elaf Bakkah Hotel', '8. DoubleTree by Hilton Makkah Jabal Omar', '9. Jabal Omar Conrad Makkah', '10. M Hotel Makkah by Millennium', '11. Jabal Omar Marriott Hotel, Makkah', '12. Jabal Omar Hyatt Regency Makkah', '13. Makkah Hotel', '14. Swissotel Makkah', '15. Makkah Towers', '16. Millennium Makkah Al Naseem', '17. Makarem Umm Al Qura', '18. Four Points by Sheraton Makkah Al Naseem', '19. Park Inn by Radisson Makkah Al Naseem', '20. M Hotel Al Dana Makkah by Millenium', '21. Hotel Al Shohada', '22. Al Shahba Hotel', '23. Makarem Al-Bait Hotel', '24. Jamjoom Ajyad Hotel', '25. Hibatullah Hotel - Managed by AccorHotels', '26. Makarem Mina Hotel', '27. Al Aseel Ajyad', '28. Al Waleed Tower Hotel', '29. Jewar Al Mashaer Hotel', '30. Novotel Thakher Makkah'], 'Rating

In [None]:
!echo %PATH%

C:\Users\sitisuradi\Anaconda3;C:\Users\sitisuradi\Anaconda3\Library\mingw-w64\bin;C:\Users\sitisuradi\Anaconda3\Library\usr\bin;C:\Users\sitisuradi\Anaconda3\Library\bin;C:\Users\sitisuradi\Anaconda3\Scripts;C:\Users\sitisuradi\Anaconda3\bin;C:\Users\sitisuradi\Anaconda3\condabin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files\Intel\WiFi\bin;C:\Program Files\Common Files\Intel\WirelessCommon;C:\Users\sitisuradi\AppData\Local\Microsoft\WindowsApps;.


In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import time

# Location of the local chromedriver executable
CHROMEDRIVER_PATH = r'C:\Users\sitisuradi\Downloads\chromedriver_win32\chromedriver.exe'

def get_driver_object():
    """
    Creates and returns the Selenium webdriver object.

    Returns:
        Chromedriver object: This driver object can be used to simulate the web browser.
    """
    # Creating the Service object to pass the chromedriver executable path to webdriver
    service_object = Service(executable_path=CHROMEDRIVER_PATH)

    # Creating the ChromeOptions object to pass additional arguments to webdriver
    options = webdriver.ChromeOptions()

    # Adding the arguments to the ChromeOptions object
    # options.add_argument("--headless")          # To run Chrome without a GUI
    options.add_argument("start-maximized")       # To start the window maximised
    options.add_argument("--disable-extensions")  # To disable all browser extensions
    options.add_argument("--log-level=3")         # To log only fatal messages (level 3)
    options.add_experimental_option(
        "prefs", {"profile.managed_default_content_settings.images": 2}
    )                                             # To block images so pages load faster

    # Creating the Webdriver object of type Chrome by passing service and options arguments
    driver_object = webdriver.Chrome(service=service_object, options=options)
    return driver_object

get_driver_object()

<selenium.webdriver.chrome.webdriver.WebDriver (session="ae4f92144d49650c0383104ff9f3cd8c")>

In [None]:
SCRAPING_URL = 'https://www.tripadvisor.com/Hotels-g293993-a_ufe.true-Mecca_Makkah_Province-Hotels.html'

def get_website_driver(driver=None, url=SCRAPING_URL):
    """Opens the given URL in a Chrome webdriver.

    Args:
        driver (Chromedriver, optional): Existing driver to reuse. A new one is
            created with get_driver_object() when omitted.
        url (str, optional): URL of the website. Defaults to SCRAPING_URL.

    Returns:
        Chromedriver: The driver where the given URL is opened.
    """
    # Create the driver here rather than in the default argument, which would
    # launch a browser as soon as the function is defined
    if driver is None:
        driver = get_driver_object()
        print("The webdriver is created")

    # Opening the URL with the driver object
    driver.get(url)
    print(f"The URL '{url}' is opened")
    return driver

driver = get_website_driver()

The webdriver is created
The URL 'https://www.tripadvisor.com/Hotels-g293993-a_ufe.true-Mecca_Makkah_Province-Hotels.html' is opened

In [None]:
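def search_hotels(driver):
    """Opens the Hotels page from which the data can be parsed.

    Args:
        driver (Chromedriver): The driver where the URL is opened.

    Returns:
        Chromedriver: The same driver, now showing the search results.
    """
    # NOTE: open_hotels_tab, select_check_in_out_dates and update_button are not
    # defined in this notebook; they must be implemented before this cell can run.

    # Opening the Hotels tab with the given city and waiting for it to load
    open_hotels_tab(driver)
    time.sleep(10)

    # Selecting the check-in and check-out dates
    select_check_in_out_dates(driver)

    # Updating the details
    update_button(driver)
    time.sleep(10)
    return driver

search_hotels(driver)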

In [None]:
def parse_hotels(driver):
    """Parses the web page with BeautifulSoup.

    Args:
        driver (Chromedriver): The driver instance where the hotel details are loaded.

    Returns:
        list: The hotel listing <div> tags found on the page.
    """
    # Getting the HTML page source
    html_source = driver.page_source

    # Creating the BeautifulSoup object with the HTML source
    soup = BeautifulSoup(html_source, "html.parser")

    # Finding all the hotel divs in the BeautifulSoup object
    hotel_tags = soup.find_all("div", {"data-prwidget-name": "meta_hsx_responsive_listing"})
    return hotel_tags

hotel_tags = parse_hotels(driver)
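Collecting names, ratings and reviews as separate page-wide lists (as the cells below do) only stays aligned while every card has every field; one missing rating shifts all later rows. A hedged alternative is to read the fields inside each hotel card that `parse_hotels` finds. `parse_listings` is a hypothetical helper, and it assumes the `listing_title`, `ui_bubble_rating` and `review_count` elements sit inside each `meta_hsx_responsive_listing` div.

```python
from bs4 import BeautifulSoup


def parse_listings(html):
    """Return one dict per hotel card; a missing field becomes None instead of shifting rows."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", {"data-prwidget-name": "meta_hsx_responsive_listing"}):
        # Search only inside this card, so every value belongs to the same hotel
        title = card.find("div", class_="listing_title")
        rating = card.find("a", class_="ui_bubble_rating")
        reviews = card.find("a", class_="review_count")
        rows.append({
            "Hotel Names": title.get_text(strip=True) if title else None,
            "Ratings": rating.get("alt") if rating else None,
            "Number of Reviews": reviews.get_text(strip=True) if reviews else None,
        })
    return rows
```

Usage: `pd.DataFrame(parse_listings(driver.page_source))` gives one row per card, with gaps shown as missing values.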

In [None]:
# click() returns None, so its result is not stored
driver.find_element("xpath",'//*[@id="PERSISTENT_TRIP_SEARCH_BAR"]/div[1]/div/div/div[1]/div[1]/button/div').click()
time.sleep(5)

In [None]:
# Choose the check-in section
# check_in = driver.find_element("xpath",'//div[@data-test-target="picker-CHECKIN"]')
check_in = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//div[@data-test-target="picker-CHECKIN"]')))
check_in.click()
time.sleep(2)

# Choose check out section
# check_out = driver.find_element("xpath",'//div[@data-test-target="picker-CHECKOUT"]')
# check_out = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//div[@data-test-target="picker-CHECKOUT"]')))
# check_out.click()
# check_in.send_keys("2023-03-10")
# check_out.clear()

# MECCA HOTELS

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Location of the local chromedriver executable
CHROMEDRIVER_PATH = r'C:\Users\sitisuradi\Downloads\chromedriver_win32\chromedriver.exe'

chrome_options = Options()
# chrome_options.add_argument("--headless")

# Selenium 4 takes the driver path through a Service object and the options through
# `options=`; the positional path and `chrome_options=` are deprecated
driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH), options=chrome_options)
driver.implicitly_wait(0.5)
driver.maximize_window()



In [None]:
# Navigate to the URL
driver.get('https://www.tripadvisor.com.my/Hotels-g293993-a_ufe.true-Mecca_Makkah_Province-Hotels.html')

In [None]:
from bs4 import BeautifulSoup
import pandas as pd

soup = BeautifulSoup(driver.page_source, 'html.parser')

hotels = []
# TripAdvisor class names change over time; update them if this list comes back empty
for name in soup.findAll('div',{'class':'listing_title'}):
    hotels.append(name.text.strip())
prices = []
for p in soup.findAll('div', {'data-clickpart': 'chevron_price'}):
    prices.append(p.text.replace('MYR','').strip())
ratings = []
for rating in soup.findAll('a',{'class':'ui_bubble_rating'}):
    ratings.append(rating['alt'])
reviews = []
for review in soup.findAll('a',{'class':'review_count'}):
    reviews.append(review.text.strip())

# Create the dictionary.
hotel_dict = {'Hotel Names':hotels,'Ratings':ratings,'Number of Reviews':reviews,'Prices':prices}

# pd.DataFrame.from_dict raises a ValueError if these lengths differ
lengths = [len(hotels), len(prices), len(ratings), len(reviews)]
print(lengths)

df = pd.DataFrame.from_dict(hotel_dict)
df.to_excel('hotels_near_haram.xlsx', index=False)
df

[35, 35, 35, 35]


|   | Hotel Names | Ratings | Number of Reviews | Prices |
|---|---|---|---|---|
| 0 | Sponsored DoubleTree by Hilton Makkah Jab... | 4.5 of 5 bubbles | 258 reviews | RM 1,021 |
| 1 | 1. Swissotel Al Maqam Makkah | 4.5 of 5 bubbles | 1,512 reviews | RM 1,369 |
| 2 | 2. Fairmont Makkah Clock Royal Tower | 4 of 5 bubbles | 4,189 reviews | RM 1,390 |
| 3 | 3. Hilton Makkah Convention Hotel | 4.5 of 5 bubbles | 1,047 reviews | RM 1,056 |
| 4 | 4. Hilton Suites Makkah | 4.5 of 5 bubbles | 2,138 reviews | RM 1,580 |
| 5 | 5. Pullman ZamZam Makkah | 4 of 5 bubbles | 2,572 reviews | RM 1,622 |
| 6 | Sponsored Hilton Suites Makkah | 4.5 of 5 bubbles | 2,138 reviews | RM 1,582 |
| 7 | 6. Park Inn by Radisson Makkah Aziziyah | 4 of 5 bubbles | 254 reviews | RM 346 |
| 8 | 7. Elaf Bakkah Hotel | 4 of 5 bubbles | 140 reviews | RM 234 |
| 9 | 8. Jabal Omar Conrad Makkah | 4.5 of 5 bubbles | 1,666 reviews | RM 1,404 |
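The scraped columns are text (`"4.5 of 5 bubbles"`, `"1,512 reviews"`, `"RM 1,021"`). A possible cleaning step, sketched under the assumption that the columns keep exactly these formats, drops the sponsored rows, splits off the rank and converts the rest to numbers. `clean_hotels` is a hypothetical helper, and it treats prices as whole numbers.

```python
import pandas as pd


def clean_hotels(df):
    """Return a copy of the scraped hotel table with numeric columns.

    Assumes the formats seen above: "1. Name", "4.5 of 5 bubbles",
    "1,512 reviews" and whole-number prices such as "RM 1,021".
    """
    # Sponsored cards repeat hotels that also appear in the ranked list
    out = df[~df["Hotel Names"].str.startswith("Sponsored")].copy()

    # "1. Swissotel Al Maqam Makkah" -> Rank 1, "Swissotel Al Maqam Makkah"
    parts = out["Hotel Names"].str.extract(r"^(\d+)\.\s*(.*)$")
    out["Rank"] = pd.to_numeric(parts[0], errors="coerce")
    out["Hotel Names"] = parts[1].fillna(out["Hotel Names"])

    # "4.5 of 5 bubbles" -> 4.5
    out["Ratings"] = out["Ratings"].str.extract(r"([\d.]+) of 5")[0].astype(float)

    # "1,512 reviews" -> 1512 and "RM 1,021" -> 1021: keep only the digits
    for col in ["Number of Reviews", "Prices"]:
        out[col] = out[col].astype(str).str.replace(r"\D", "", regex=True).astype(int)

    return out.reset_index(drop=True)
```

Usage: `clean_hotels(pd.read_excel('hotels_near_haram.xlsx'))`. Dropping the sponsored rows also removes the duplicate Hilton Suites Makkah entry visible in the table.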


In [None]:
# Close the browser only after the last scraping cell: the next cells still use this driver
# driver.quit()

In [None]:
driver.get('https://www.tripadvisor.com/HotelsNear-g293993-d6881993-Great_Mosque_of_Mecca-Mecca_Makkah_Province.html')

In [None]:
from bs4 import BeautifulSoup
import pandas as pd

soup = BeautifulSoup(driver.page_source, 'html.parser')

hotels = []
# TripAdvisor class names change over time; update them if this list comes back empty
for name in soup.findAll('div',{'class':'listing_title'}):
    hotels.append(name.text.strip())
prices = []
for p in soup.findAll('div', {'data-clickpart': 'chevron_price'}):
    prices.append(p.text.replace('MYR','').strip())
ratings = []
for rating in soup.findAll('a',{'class':'ui_bubble_rating'}):
    ratings.append(rating['alt'])
reviews = []
for review in soup.findAll('a',{'class':'review_count'}):
    reviews.append(review.text.strip())

# Create the dictionary.
hotel_dict = {'Hotel Names':hotels,'Ratings':ratings,'Number of Reviews':reviews,'Prices':prices}

# pd.DataFrame.from_dict raises a ValueError if these lengths differ
lengths = [len(hotels), len(prices), len(ratings), len(reviews)]
print(lengths)

df = pd.DataFrame.from_dict(hotel_dict)
df.to_csv('hotels_near_mosque.csv', index=False)
df

[37, 37, 37, 37]


|   | Hotel Names | Ratings | Number of Reviews | Prices |
|---|---|---|---|---|
| 0 | Sponsored DoubleTree by Hilton Makkah Jab... | 4.5 of 5 bubbles | 254 reviews | 1131 |
| 1 | 1. Hilton Suites Makkah | 4.5 of 5 bubbles | 2,116 reviews | 1949 |
| 2 | 2. Swissotel Al Maqam Makkah | 4.5 of 5 bubbles | 1,485 reviews | 1325 |
| 3 | 3. Makkah Towers | 4.5 of 5 bubbles | 1,928 reviews | 1915 |
| 4 | 4. Pullman ZamZam Makkah | 4 of 5 bubbles | 2,560 reviews | 1808 |
| 5 | 5. Fairmont Makkah Clock Royal Tower | 4 of 5 bubbles | 4,172 reviews | 1848 |
| 6 | Sponsored Hilton Suites Makkah | 4.5 of 5 bubbles | 2,116 reviews | 1950 |
| 7 | 6. Makkah Hotel | 4.5 of 5 bubbles | 1,707 reviews | 1949 |
| 8 | 7. Hilton Makkah Convention Hotel | 4.5 of 5 bubbles | 1,033 reviews | 1325 |
| 9 | 8. InterContinental Dar al Tawhid Makkah, an I... | 4.5 of 5 bubbles | 797 reviews | 1872 |


In [None]:
faq_items = soup.find_all("li", class_="faqItem")

# create a list to hold each faq item as a dictionary
faq_list=[]

for faq_item in faq_items:
  question = faq_item.select_one(".question").text
  answer = faq_item.select_one(".answer").text
  hotels = faq_item.select(".hotels")
  hotel_names = [hotel.select_one("a").text for hotel in hotels]
  hotel_descriptions = [hotel.select_one(".description").text for hotel in hotels]
  # print(question)
  # print(answer)
  # print("Hotel Names:", hotel_names)
  # print("Hotel Descriptions:", hotel_descriptions)
  
  # add a dictionary of faq item data to the faq_list
  faq_list.append({
      'Questions': question,
      'Answer': answer,
      'Hotel names': hotel_names,
      'Description': hotel_descriptions
  })

# create a DataFrame from the list
df = pd.DataFrame(faq_list)

# save the DataFrame to an excel file
df.to_excel('faq_data.xlsx', index=False)
df = pd.read_excel('faq_data.xlsx')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Questions    10 non-null     object
 1   Answer       10 non-null     object
 2   Hotel names  10 non-null     object
 3   Description  10 non-null     object
dtypes: object(4)
memory usage: 448.0+ bytes


# AL MADINAH HOTELS

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Location of the local chromedriver executable
CHROMEDRIVER_PATH = r'C:\Users\sitisuradi\Downloads\chromedriver_win32\chromedriver.exe'

chrome_options = Options()
# chrome_options.add_argument("--headless")

# Selenium 4 takes the driver path through a Service object and the options through
# `options=`; the positional path and `chrome_options=` are deprecated
driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH), options=chrome_options)
driver.implicitly_wait(0.5)
driver.maximize_window()



In [None]:
driver.get('https://www.tripadvisor.com/Hotels-g298551-zff4-Medina_Al_Madinah_Province-Hotels.html')

soup = BeautifulSoup(driver.page_source, 'html.parser')

In [None]:
hotels = []
# TripAdvisor class names change over time; update them if this list comes back empty
for name in soup.findAll('div',{'class':'listing_title'}):
    hotels.append(name.text.strip())
ratings = []
for rating in soup.findAll('a',{'class':'ui_bubble_rating'}):
    ratings.append(rating['alt'])
reviews = []
for review in soup.findAll('a',{'class':'review_count'}):
    reviews.append(review.text.strip())

# Create the dictionary.
hotel_dict = {'Hotel Names':hotels,'Ratings':ratings,'Number of Reviews':reviews}

lengths = [len(hotels), len(ratings), len(reviews)]
print(lengths)

df = pd.DataFrame.from_dict(hotel_dict)
df.to_csv('hotels_family_near_madinah.csv', index=False)
df = pd.read_csv('hotels_family_near_madinah.csv')
df

[32, 32, 32]


|   | Hotel Names | Ratings | Number of Reviews |
|---|---|---|---|
| 0 | Sponsored InterContinental Madinah-Dar Al... | 4.5 of 5 bubbles | 753 reviews |
| 1 | 1. Pullman Zamzam Madina | 4.5 of 5 bubbles | 1,022 reviews |
| 2 | 2. Madinah Hilton | 4.5 of 5 bubbles | 1,252 reviews |
| 3 | 3. InterContinental Madinah-Dar Al Iman, an IH... | 4.5 of 5 bubbles | 753 reviews |
| 4 | 4. Dar Al Taqwa Hotel | 4 of 5 bubbles | 489 reviews |
| 5 | 5. Millennium Madinah Airport | 4 of 5 bubbles | 27 reviews |
| 6 | 6. Millennium Al Aqeeq Hotel Madinah | 3.5 of 5 bubbles | 353 reviews |
| 7 | 7. Marriott Executive Apartments Madinah | 4.5 of 5 bubbles | 9 reviews |
| 8 | 8. Dallah Taibah Hotel | 4.5 of 5 bubbles | 396 reviews |
| 9 | 9. Millennium Taiba Hotel | 4 of 5 bubbles | 478 reviews |


In [None]:
faq_items = soup.find_all("li", class_="faqItem")

# create a list to hold each faq item as a dictionary
faq_list=[]

for faq_item in faq_items:
  question = faq_item.select_one(".question").text
  answer = faq_item.select_one(".answer").text
  hotels = faq_item.select(".hotels")
  hotel_names = [hotel.select_one("a").text for hotel in hotels]
  hotel_descriptions = [hotel.select_one(".description").text for hotel in hotels]
  # print(question)
  # print(answer)
  # print("Hotel Names:", hotel_names)
  # print("Hotel Descriptions:", hotel_descriptions)
  
  # add a dictionary of faq item data to the faq_list
  faq_list.append({
      'Questions': question,
      'Answer': answer,
      'Hotel names': hotel_names,
      'Description': hotel_descriptions
  })

# create a DataFrame from the list
df = pd.DataFrame(faq_list)

# save the DataFrame to an excel file
df.to_excel('faq_data_madinah.xlsx', index=False)
df = pd.read_excel('faq_data_madinah.xlsx')
df

|   | Questions | Answer | Hotel names | Description |
|---|---|---|---|---|
| 0 | What are the best family hotels near Mount Uhud? | Some of the more popular family hotels near Mo... | ['InterContinental Madinah-Dar Al Iman, an IHG... | [' - Traveler rating: 4.5/5', ' - Traveler rat... |
| 1 | Which family hotels are close to Mohammad Bin ... | These family hotels are close to Mohammad Bin ... | ['InterContinental Madinah-Dar Al Iman, an IHG... | [' - Traveler rating: 4.5/5', ' - Traveler rat... |
| 2 | What are the best family hotels in Medina? | Some of the best family hotels in Medina are: | ['InterContinental Madinah-Dar Al Iman, an IHG... | [' - Traveler rating: 4.5/5', ' - Traveler rat... |
| 3 | Which family hotels in Medina offer a gym? | A gym is available to guests at the following ... | ['Crowne Plaza Madinah', 'Millennium Taiba Hot... | [' - Traveler rating: 4.0/5', ' - Traveler rat... |
| 4 | Which family hotels in Medina have rooms with ... | These family hotels in Medina have great views... | ['InterContinental Madinah-Dar Al Iman, an IHG... | [' - Traveler rating: 4.5/5', ' - Traveler rat... |
| 5 | Do any family hotels in Medina offer free brea... | Free breakfast can be enjoyed at the following... | ['Dallah Taibah Hotel', 'Dar Al Taqwa Hotel', ... | [' - Traveler rating: 4.5/5', ' - Traveler rat... |
| 6 | Which family hotels in Medina have free parking? | These family hotels in Medina have free parking: | ['Madinah Hilton', 'Millennium Taiba Hotel', '... | [' - Traveler rating: 4.5/5', ' - Traveler rat... |
| 7 | What are some family hotels in Medina with a 5... | Travelers seeking the ultimate in luxury often... | ['InterContinental Madinah-Dar Al Iman, an IHG... | [' - Traveler rating: 4.5/5', ' - Traveler rat... |
| 8 | What are some popular family hotels in Medina ... | An upscale traveling experience can be enjoyed... | ['Dallah Taibah Hotel', 'Bosphorus Hotel', 'Sa... | [' - Traveler rating: 4.5/5', ' - Traveler rat... |
| 9 | What are some popular family hotels in Medina ... | These 3 star hotels received great reviews fro... | ['Marriott Executive Apartments Madinah', 'Sar... | [' - Traveler rating: 4.5/5', ' - Traveler rat... |
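Saving to Excel turns the list columns into strings such as `"['Dallah Taibah Hotel', ...]"`, which is why the reloaded table shows brackets and quotes. One way to recover the lists, assuming the cells hold Python list literals as above, is `ast.literal_eval`; the hypothetical `faq_hotels_long` then produces one row per question and hotel. Writing the DataFrame with `df.to_json(...)` instead would store the lists as JSON arrays and avoid this step.

```python
import ast

import pandas as pd


def restore_list_columns(df, columns=("Hotel names", "Description")):
    """Turn list columns that Excel stored as text (e.g. "['a', 'b']") back into lists."""
    out = df.copy()
    for col in columns:
        # literal_eval only parses Python literals, so it never executes code from the file
        out[col] = out[col].apply(lambda s: ast.literal_eval(s) if isinstance(s, str) else s)
    return out


def faq_hotels_long(df):
    """One row per (question, hotel), pairing each hotel name with its description."""
    rows = []
    for _, row in df.iterrows():
        for name, desc in zip(row["Hotel names"], row["Description"]):
            rows.append({
                "Questions": row["Questions"],
                "Hotel name": name,
                # " - Traveler rating: 4.5/5" -> "Traveler rating: 4.5/5"
                "Description": desc.strip(" -"),
            })
    return pd.DataFrame(rows)
```

Usage: `faq_hotels_long(restore_list_columns(pd.read_excel('faq_data_madinah.xlsx')))`.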


# ATTRACTIONS IN MEDINA

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Location of the local chromedriver executable
CHROMEDRIVER_PATH = r'C:\Users\sitisuradi\Downloads\chromedriver_win32\chromedriver.exe'

chrome_options = Options()
# chrome_options.add_argument("--headless")

# Selenium 4 takes the driver path through a Service object and the options through
# `options=`; the positional path and `chrome_options=` are deprecated
driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH), options=chrome_options)
driver.implicitly_wait(0.5)
driver.maximize_window()



In [None]:
driver.get('https://www.tripadvisor.com/Attractions-g298551-Activities-a_allAttractions.true-Medina_Al_Madinah_Province.html')

soup = BeautifulSoup(driver.page_source, 'html.parser')

In [None]:
results = soup.findAll('div', {'class': 'XfVdV o AIbhI'})

for result in results:
    print(result.text.strip())

1. Al Masjid an Nabawi
2. Masjid Quba
3. Mount Uhud
4. Masjid Al Qiblatayn
5. Grave Of Hamzah
6. Jannatul Baqi
7. The Seven Mosques
8. Al Noor Mall
9. Al-Madina Museum
10. King Fahd Glorious Quran Printing Complex
11. Mosque of Badr
12. Dar Al Madinah Museum
13. Masjid Al Ghamamah
14. The Beautiful Names of Allah Gallery
15. Madina Media Museum
16. Abu Bakar Masjed
17. Koran Museum
18. Rashed Mall
19. Wadi e Jinn - Al Baida
20. Old Bazaar
21. Masjid Jummah
22. Madinah Dates
23. Amberiye Mosque
24. Clock Tower
25. Sela Mountain
26. Hejaz Railway Museum
27. Al Hamra Fish Restaurant
28. Madinah Art Center
29. Al Rashid Mega Mall
30. The International Fair And Museum Of The Prophet's Biography And Islamic Civilization


In [None]:
base_url = "https://www.tripadvisor.com/Attractions-g298551-Activities-a_allAttractions.true-Medina_Al_Madinah_Province.html"
page_numbers = [0, 30, 60]

In [None]:
attraction_ = []
category_ = []

# Loop over pages
for page_number in page_numbers:
  url = base_url + f"-oa{page_number}"
  driver.get(url)
  soup = BeautifulSoup(driver.page_source, 'html.parser')
  
  # Scrape data from current page
  results = soup.findAll('div', {'class': 'XfVdV o AIbhI'})

  for result in results:
    attraction = result.text.strip()
    attraction_.append(attraction)
  
  time.sleep(1)
  
  # Extract the category type
  att_type = soup.findAll("div", class_="alPVI eNNhq PgLKC tnGGX yzLvM")

  for category in att_type:
    category_type = category.find("div", class_="biGQs _P pZUbB hmDzD").text.strip()
    category_.append(category_type)

  time.sleep(1)

print(attraction_)
print(category_)

['1. Al Masjid an Nabawi', '2. Masjid Quba', '3. Mount Uhud', '4. Masjid Al Qiblatayn', '5. Grave Of Hamzah', '6. Jannatul Baqi', '7. Al Noor Mall', '8. Al-Madina Museum', '9. The Seven Mosques', '10. King Fahd Glorious Quran Printing Complex', '11. Mosque of Badr', '12. Dar Al Madinah Museum', '13. Masjid Al Ghamamah', '14. The Beautiful Names of Allah Gallery', '15. Madina Media Museum', '16. Abu Bakar Masjed', '17. Koran Museum', '18. Rashed Mall', '19. Wadi e Jinn - Al Baida', '20. Old Bazaar', '21. Masjid Jummah', '22. Madinah Dates', '23. Amberiye Mosque', '24. Clock Tower', '25. Sela Mountain', '26. Hejaz Railway Museum', '27. Al Hamra Fish Restaurant', '28. Madinah Art Center', '29. Al Rashid Mega Mall', "30. The International Fair And Museum Of The Prophet's Biography And Islamic Civilization", '31. Al Baqi Cemetery', '32. Uhud Mountain', '33. Taiba Commercial Center', '34. Umrahcabs', '35. Almunawara Gift Shop- Alharam', '36. Marks and Spencer', '37. Qisat Almkan', '38. Tales

In [None]:
attraction_dict = {
  'Attraction places':attraction_,
  'Category type':category_
}

# create a DataFrame from the list
df = pd.DataFrame(attraction_dict)

# save the DataFrame to an excel file
df.to_excel('attraction_place.xlsx', index=False)
df = pd.read_excel('attraction_place.xlsx')
df

|   | Attraction places | Category type |
|---|---|---|
| 0 | 1. Al Masjid an Nabawi | Religious Sites |
| 1 | 2. Masjid Quba | Religious Sites |
| 2 | 3. Mount Uhud | Mountains |
| 3 | 4. Masjid Al Qiblatayn | Religious Sites |
| 4 | 5. Grave Of Hamzah | Cemeteries |
| ... | ... | ... |
| 64 | 65. SEERO tours | Day Trips |
| 65 | 66. Way to Saudi | Multi-day Tours • Taxis & Shuttles |
| 66 | 67. Travel Asia Experience Saudi | Multi-day Tours • Cultural Tours |
| 67 | 68. zayer | Multi-day Tours • Walking Tours |
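Some attractions carry several categories joined by `•`. To count attractions per category, the column can be split and exploded into one row per category. `explode_categories` is a hypothetical helper and assumes `•` is the only separator used.

```python
import pandas as pd


def explode_categories(df, column="Category type", sep="•"):
    """Return one row per (attraction, category) instead of "A • B" strings."""
    out = df.copy()
    # "Multi-day Tours • Taxis & Shuttles" -> ["Multi-day Tours", "Taxis & Shuttles"]
    out[column] = out[column].str.split(sep).apply(
        lambda cats: [c.strip() for c in cats] if isinstance(cats, list) else cats
    )
    # Each list element becomes its own row; the other columns are repeated
    return out.explode(column, ignore_index=True)
```

Usage: `explode_categories(df)["Category type"].value_counts()` counts how many attractions fall under each category.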


## Attractions in Mecca

## Restaurants

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Location of the local chromedriver executable
CHROMEDRIVER_PATH = r'C:\Users\sitisuradi\Downloads\chromedriver_win32\chromedriver.exe'

chrome_options = Options()
# chrome_options.add_argument("--headless")

# Selenium 4 takes the driver path through a Service object and the options through
# `options=`; the positional path and `chrome_options=` are deprecated
driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH), options=chrome_options)
driver.implicitly_wait(0.5)
driver.maximize_window()



### Top Restaurants in Mecca

In [None]:
driver.get('https://www.tripadvisor.com/Restaurants-g293993-Mecca_Makkah_Province.html')

soup = BeautifulSoup(driver.page_source, 'html.parser')

In [None]:
restaurants = []
for diner in soup.findAll("div", class_="tkvCJ u F f Ff K"):
    name = diner.find("span").text.strip()
    restaurants.append(name)

print(restaurants)

['1. The Oasis', '2. Gurkan Sef Steakhouse', '3. Zamzam Lobby Lounge', '4. Al Rehab Restaurant', '5. Al Deyafa Restaurant', '6. Al Ruwad', '7. Seeneez Restaurant & Cafe', '8. Al Shorfa', '9. Feld D saji', '10. Faisalabad Restaurant', '11. Simit Sarayi', '12. Al Bayt Restaurant', '13. Aryana Restaurants', '14. Retaj Al Bayt Restaurant', '15. paradise restaurant', '16. Buffalo Wings & Rings - Macca', '17. Strawberry & Cheese', '18. Shobak', '19. Atyaf Restaurant', '20. Afandim Restaurant', '21. Pizza Hut', '22. Pizza Hut', '23. Al-Qasr Restaurant', '24. Al Majlis Restaurant', '25. Kabsa Hashi', '26. Al Atbaq Restaurant', '27. AL Khairat Restaurant', '28. Pizza Hut', '29. Turkish Almazaq Restaurant', '30. Tayyibah Restaurant']


In [None]:
name_links = []
for link in soup.findAll("a", class_="Lwqic Cj b"):
    href = link.get("href")
    name_links.append(href)

print(name_links)

['/Restaurant_Review-g293993-d10029108-Reviews-The_Oasis-Mecca_Makkah_Province.html', '/Restaurant_Review-g293993-d14783895-Reviews-Gurkan_Sef_Steakhouse-Mecca_Makkah_Province.html', '/Restaurant_Review-g293993-d10157692-Reviews-Zamzam_Lobby_Lounge-Mecca_Makkah_Province.html', '/Restaurant_Review-g293993-d11837185-Reviews-Al_Rehab_Restaurant-Mecca_Makkah_Province.html', '/Restaurant_Review-g293993-d6474009-Reviews-Al_Deyafa_Restaurant-Mecca_Makkah_Province.html', '/Restaurant_Review-g293993-d7747662-Reviews-Al_Ruwad-Mecca_Makkah_Province.html', '/Restaurant_Review-g293993-d24151350-Reviews-Seeneez_Restaurant_Cafe-Mecca_Makkah_Province.html', '/Restaurant_Review-g293993-d6027006-Reviews-Al_Shorfa-Mecca_Makkah_Province.html', '/Restaurant_Review-g293993-d2445936-Reviews-Feld_D_saji-Mecca_Makkah_Province.html', '/Restaurant_Review-g293993-d8670944-Reviews-Faisalabad_Restaurant-Mecca_Makkah_Province.html', '/Restaurant_Review-g293993-d10165043-Reviews-Simit_Sarayi-Mecca_Makkah_Province.htm
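The hrefs are relative, so the loop below prefixes `https://www.tripadvisor.com`. `urllib.parse.urljoin` does the same while also accepting hrefs that are already absolute, and the `-d<digits>-` part of each link looks like a restaurant id that could tell apart the three "Pizza Hut" entries. Both helpers are hypothetical, and the id pattern is inferred from the links above, not from any documented format.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.tripadvisor.com"
# Pattern inferred from the links above, e.g. "...-g293993-d10029108-Reviews-..."
RESTAURANT_ID = re.compile(r"-d(\d+)-")


def absolute_links(hrefs, base=BASE_URL):
    """Resolve each href against the site root; None stays None so positions match `restaurants`."""
    return [urljoin(base, href) if href else None for href in hrefs]


def restaurant_id(href):
    """Return the numeric id after "-d" in a review link, or None if there is none."""
    match = RESTAURANT_ID.search(href or "")
    return match.group(1) if match else None
```

Usage: `ids = [restaurant_id(h) for h in name_links]` gives a key per restaurant that can be stored next to the name.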

In [None]:
location = []
rating_ = []
cuisine_ = []
price_range_ = []

for link in name_links:
  # Navigate to the review page using Selenium
  driver.get("https://www.tripadvisor.com" + link)
  # Pause so the page can render (this also spaces out requests to the server)
  time.sleep(2)

  # Parse the rendered HTML with Beautiful Soup
  soup = BeautifulSoup(driver.page_source, "html.parser")

  # Extract the address of the restaurant
  location_element = soup.find("span", class_="yEWoV")
  location.append(location_element.text.strip() if location_element else "N/A")

  # Extract the rating
  rate = soup.find("span", class_="ZDEqb")
  rating_.append(rate.text.strip() if rate else "N/A")

  # Get the cuisines: find the header div with class="tbUiL b" whose text is
  # "CUISINES", then the div with class="SrqKb" that follows it.
  # Append exactly one value per restaurant so all lists stay the same length.
  cuisines = "N/A"
  cuisine_header = soup.find("div", class_="tbUiL b", string="CUISINES")
  if cuisine_header:
    cuisine_div = cuisine_header.find_next_sibling("div", class_="SrqKb")
    if cuisine_div:
      cuisines = cuisine_div.text.strip()
  cuisine_.append(cuisines)

  # Get the price range in the same way
  price_range = "N/A"
  price_header = soup.find("div", class_="tbUiL b", string="PRICE RANGE")
  if price_header:
    price_div = price_header.find_next_sibling("div", class_="SrqKb")
    if price_div:
      price_range = price_div.text.strip()
  price_range_.append(price_range)
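
The cuisine and price-range lookups follow the same pattern: find a header `<div>` by its text, then take the value `<div>` after it. A hedged sketch of that pattern as one helper, assuming the `tbUiL b` / `SrqKb` class names seen on the page at scraping time (generated class names like these can change without notice). `get_detail` is a hypothetical helper; it always returns a value, so every list gets one entry per restaurant.

```python
from bs4 import BeautifulSoup

def get_detail(soup, label, header_class="tbUiL b", value_class="SrqKb", default="N/A"):
    """Return the text of the value <div> that follows the header <div> whose text is `label`."""
    # class_ with a space matches the exact class attribute string "tbUiL b"
    header = soup.find("div", class_=header_class, string=label)
    if header is None:
        return default
    value = header.find_next_sibling("div", class_=value_class)
    return value.get_text(strip=True) if value else default
```

Inside the loop this would read `cuisine_.append(get_detail(soup, "CUISINES"))` and `price_range_.append(get_detail(soup, "PRICE RANGE"))`.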

In [None]:
restaurant_dict = {
    'Restaurant name':restaurants,
    'Price range':price_range_,
    'Cuisines':cuisine_,
    'Rating':rating_,
    'Review Link':name_links,
    'Location':location
}

# create a DataFrame from the dictionary of lists
df = pd.DataFrame(restaurant_dict)

# save the DataFrame to an Excel file
df.to_excel('top_mecca_restaurant.xlsx', index=False)
df = pd.read_excel('top_mecca_restaurant.xlsx')
df

,Restaurant name,Price range,Cuisines,Rating,Review Link,Location
0,1. The Oasis,MYR 134 - MYR 224,International,4.5,/Restaurant_Review-g293993-d10029108-Reviews-T...,"Ibrahim Al Khalil Street Jabal Omar, Mecca 219..."
1,2. Gurkan Sef Steakhouse,,"Steakhouse, Turkish",4.0,/Restaurant_Review-g293993-d14783895-Reviews-G...,"Ibrahim Al Jaffali Al Awali, Mecca 24372 Saudi..."
2,3. Zamzam Lobby Lounge,MYR 42 - MYR 179,"International, Deli, Arabic, Cafe, Asian",4.5,/Restaurant_Review-g293993-d10157692-Reviews-Z...,"Makkah, Mecca 21955 Saudi Arabia"
3,4. Al Rehab Restaurant,,,4.5,/Restaurant_Review-g293993-d11837185-Reviews-A...,Ibrahim Al Khalil Dar Al Tawhid intercontinent...
4,5. Al Deyafa Restaurant,,"International, Asian, Middle Eastern",4.0,/Restaurant_Review-g293993-d6474009-Reviews-Al...,"King Abdul Aziz Rd, Gate 9601, Abraj Al Bait A..."
5,6. Al Ruwad,,Middle Eastern,4.5,/Restaurant_Review-g293993-d7747662-Reviews-Al...,King Abdul Aziz Endowment Abraj Al Bait Comple...
6,7. Seeneez Restaurant & Cafe,,"American, Pizza, Seafood, Barbecue, Italian",5.0,/Restaurant_Review-g293993-d24151350-Reviews-S...,"Ibrahim Al Joufaili Street, Mecca 24372 Saudi ..."
7,8. Al Shorfa,,"International, Mediterranean, Egyptian",4.5,/Restaurant_Review-g293993-d6027006-Reviews-Al...,"King Abdul Aziz Gate, Abraj Al Bait Complex Al..."
8,9. Feld D saji,,"Asian, Malaysian",4.0,/Restaurant_Review-g293993-d2445936-Reviews-Fe...,"Saffa Tower, Mecca Saudi Arabia"
9,10. Faisalabad Restaurant,,Pakistani,4.0,/Restaurant_Review-g293993-d8670944-Reviews-Fa...,"Hijrah Street, Mecca Saudi Arabia"
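
The scraped names keep TripAdvisor's rank prefix (`1. The Oasis`), and some names repeat (`Pizza Hut` appears at ranks 21, 22 and 28). A small sketch, assuming the prefix is always digits followed by a period and a space, that separates the rank into its own value so duplicate names stay distinguishable; `split_rank` is a hypothetical helper.

```python
import re

# digits, a period, whitespace, then the actual name
RANK_PREFIX = re.compile(r"^\s*(\d+)\.\s+(.*)$")

def split_rank(name):
    """Split '1. The Oasis' into (1, 'The Oasis'); names without a rank give (None, name)."""
    match = RANK_PREFIX.match(name)
    if match is None:
        return None, name.strip()
    return int(match.group(1)), match.group(2).strip()
```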


### Top Restaurants in Medina

In [None]:
driver.get('https://www.tripadvisor.com/Restaurants-g298551-Medina_Al_Madinah_Province.html')

soup = BeautifulSoup(driver.page_source, 'html.parser')

In [None]:
restaurants = []
for diner in soup.findAll("div", class_="tkvCJ u F f Ff K"):
    name = diner.find("span").text.strip()
    restaurants.append(name)

print(restaurants)

time.sleep(1)

[]
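
The Medina listing came back empty (`[]`), which suggests the HTML was read before the JavaScript-rendered results had appeared. One hedged option is to re-read the page a few times until the result is non-empty. The sketch below is a generic, standard-library polling helper; `poll_until` is a hypothetical name, and Selenium's own `WebDriverWait` with `expected_conditions` is the more idiomatic choice when waiting for a specific element.

```python
import time

def poll_until(fetch, is_ready, attempts=5, delay=1.0):
    """Call fetch() until is_ready(result) is true or the attempts run out.

    The last result is returned either way so the caller can still inspect it.
    """
    result = fetch()
    for _ in range(attempts - 1):
        if is_ready(result):
            break
        time.sleep(delay)  # wait before re-reading the page
        result = fetch()
    return result
```

For example, `fetch` could re-parse `driver.page_source` and return the list of restaurant names, with `is_ready=bool` so an empty list triggers another attempt.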


In [None]:
name_links = []
for link in soup.findAll("a", class_="Lwqic Cj b"):
    href = link.get("href")
    name_links.append(href)

print(name_links)

['/Restaurant_Review-g298551-d7856197-Reviews-Arabesque_Restaurant-Medina_Al_Madinah_Province.html', '/Restaurant_Review-g298551-d5521340-Reviews-Al_Baik_Restaurant-Medina_Al_Madinah_Province.html', '/Restaurant_Review-g298551-d3167343-Reviews-Beiruti-Medina_Al_Madinah_Province.html', '/Restaurant_Review-g298551-d2341827-Reviews-Hardee_s-Medina_Al_Madinah_Province.html', '/Restaurant_Review-g298551-d3356872-Reviews-House_of_Donuts-Medina_Al_Madinah_Province.html', '/Restaurant_Review-g298551-d4779096-Reviews-Swiss_House-Medina_Al_Madinah_Province.html', '/Restaurant_Review-g298551-d7346364-Reviews-Hyderabad_House-Medina_Al_Madinah_Province.html', '/Restaurant_Review-g298551-d11734433-Reviews-Tokushi-Medina_Al_Madinah_Province.html', '/Restaurant_Review-g298551-d11824710-Reviews-Town_Pour-Medina_Al_Madinah_Province.html', '/Restaurant_Review-g298551-d8665192-Reviews-Jazz_Lounge-Medina_Al_Madinah_Province.html', '/Restaurant_Review-g298551-d18930337-Reviews-Cacao_and_More-Medina_Al_Madin

In [None]:
location = []
rating_ = []
cuisine_ = []
for link in name_links:
  # Navigate to the review page using Selenium
  driver.get("https://www.tripadvisor.com" + link)
  # Pause so the page can render (this also spaces out requests to the server)
  time.sleep(2)

  # Parse the rendered HTML with Beautiful Soup
  soup = BeautifulSoup(driver.page_source, "html.parser")

  # Extract the address of the restaurant
  location_element = soup.find("span", class_="yEWoV")
  location.append(location_element.text.strip() if location_element else "N/A")

  # Extract the rating
  rate = soup.find("span", class_="ZDEqb")
  rating_.append(rate.text.strip() if rate else "N/A")

  # Get the cuisines: find the header div with class="tbUiL b" whose text is
  # "CUISINES", then the div with class="SrqKb" that follows it.
  # Append exactly one value per restaurant so all lists stay the same length.
  cuisines = "N/A"
  cuisine_header = soup.find("div", class_="tbUiL b", string="CUISINES")
  if cuisine_header:
    cuisine_div = cuisine_header.find_next_sibling("div", class_="SrqKb")
    if cuisine_div:
      cuisines = cuisine_div.text.strip()
  cuisine_.append(cuisines)


print(cuisine_)
print(location)

['Mediterranean, Asian, Middle Eastern', 'Fast Food', 'Lebanese, Middle Eastern, Mediterranean', 'American, Fast Food', 'N/A', 'American', 'Indian, Asian', 'Japanese, Sushi, Asian', 'Pizza, European, British, Italian', 'Cafe, International', 'French, American, Arabic', 'American, British', 'American', 'Chinese, Asian', 'Asian, Pakistani', 'Middle Eastern, Italian, Indian, Cafe, Seafood, Turkish', 'N/A', 'Asian', 'Middle Eastern, Barbecue', 'Indian', 'Middle Eastern', 'Lebanese, Mediterranean', 'Chinese, Indian, Seafood, Asian, Grill, Bangladeshi', 'Pizza', 'American', 'N/A', 'Italian, Sushi, Pizza, Middle Eastern, Eastern European', 'N/A', 'Middle Eastern, Cafe', 'N/A']
['King Fahad Road, 2943 Shaza Madinah Hotel, Medina 41476 Saudi Arabia', 'King Faical Road Near the Haram Ennabawi, Medina Saudi Arabia', 'Abo Bakr al Siddiq Opposite side of Ghomme Shopping Center, Medina Saudi Arabia', 'Near Haram, Medina Saudi Arabia', 'Madinah Province, Medina Saudi Arabia', 'Sultanh St, Medina Saud

In [None]:
restaurant_dict = {
    'Restaurant name':restaurants,
    'Cuisines':cuisine_,
    'Rating':rating_,
    'Review Link':name_links,
    'Location':location
}

# create a DataFrame from the dictionary of lists
df = pd.DataFrame(restaurant_dict)

# save the DataFrame to an Excel file
df.to_excel('top_medina_restaurant.xlsx', index=False)
df = pd.read_excel('top_medina_restaurant.xlsx')
df

,Restaurant name,Cuisines,Rating,Review Link,Location
0,1. Arabesque Restaurant,"Mediterranean, Asian, Middle Eastern",4.5,/Restaurant_Review-g298551-d7856197-Reviews-Ar...,"King Fahad Road, 2943 Shaza Madinah Hotel, Med..."
1,2. Al Baik Restaurant,Fast Food,4.0,/Restaurant_Review-g298551-d5521340-Reviews-Al...,"King Faical Road Near the Haram Ennabawi, Medi..."
2,3. Beiruti,"Lebanese, Middle Eastern, Mediterranean",4.0,/Restaurant_Review-g298551-d3167343-Reviews-Be...,Abo Bakr al Siddiq Opposite side of Ghomme Sho...
3,4. Hardee's,"American, Fast Food",3.5,/Restaurant_Review-g298551-d2341827-Reviews-Ha...,"Near Haram, Medina Saudi Arabia"
4,5. House of Donuts,,4.0,/Restaurant_Review-g298551-d3356872-Reviews-Ho...,"Madinah Province, Medina Saudi Arabia"
5,6. Swiss House,American,4.0,/Restaurant_Review-g298551-d4779096-Reviews-Sw...,"Sultanh St, Medina Saudi Arabia"
6,7. Hyderabad House,"Indian, Asian",4.0,/Restaurant_Review-g298551-d7346364-Reviews-Hy...,"King Abdullah bin Abdul Aziz Road exit 11, Med..."
7,8. Tokushi,"Japanese, Sushi, Asian",4.5,/Restaurant_Review-g298551-d11734433-Reviews-T...,"King Abdullah Branch Road Mudhainib, Medina 42..."
8,9. Town Pour,"Pizza, European, British, Italian",4.5,/Restaurant_Review-g298551-d11824710-Reviews-T...,"Sultana Street, Medina 45879 Saudi Arabia"
9,10. Jazz Lounge,"Cafe, International",4.0,/Restaurant_Review-g298551-d8665192-Reviews-Ja...,"Ali Bin Abi Taleb, Medina Saudi Arabia"
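
`pd.DataFrame` raises a `ValueError` when the lists in the dictionary have different lengths, which happens if a loop skips an `append` or, as with the empty Medina name list, one scrape returns nothing. A small standard-library check, run before building the DataFrame, names the offending columns; `check_column_lengths` is a hypothetical helper.

```python
def check_column_lengths(columns):
    """Return {column: length}, or raise ValueError listing every length if they differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        details = ", ".join(f"{name}={n}" for name, n in lengths.items())
        raise ValueError(f"column lengths differ: {details}")
    return lengths
```

Calling `check_column_lengths(restaurant_dict)` just before `pd.DataFrame(restaurant_dict)` would point straight at the short list.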


In [None]:
# Close the webdriver when finished
driver.quit()

## Mecca: Arriving & Departing

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Local ChromeDriver executable. Selenium 4 takes it through a Service object;
# the positional path and chrome_options= arguments are deprecated.
CHROMEDRIVER_PATH = r'C:\Users\sitisuradi\Downloads\chromedriver_win32\chromedriver.exe'

chrome_options = Options()
# chrome_options.add_argument("--headless")

driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH), options=chrome_options)
driver.implicitly_wait(0.5)
driver.maximize_window()



In [None]:
driver.get('https://www.tripadvisor.com/Travel-g293993-s301/Mecca:Saudi-Arabia:Arriving.And.Departing.html')

soup = BeautifulSoup(driver.page_source, 'html.parser')

In [None]:
logistic = []
article_body = soup.find("div", {"class": "articleBody"})

for p in article_body.find_all("p"):
  attr = p.text
  logistic.append(attr)

logistics_dict = {
    'Info':logistic
}

# create a DataFrame from the dictionary of lists
df = pd.DataFrame(logistics_dict)

# save the DataFrame to an Excel file
df.to_excel('logistics.xlsx', index=False)
df = pd.read_excel('logistics.xlsx')
df

,Info
0,Travellers do not fly directly into Makkah. Yo...
1,The Saudi Arabian Public Transport Company (SA...
2,Taxis are plentiful and you will see drivers h...
3,If you are going to Medina by taxi you could b...


## FAQ Luxury hotels in Medinah

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Local ChromeDriver executable. Selenium 4 takes it through a Service object;
# the positional path and chrome_options= arguments are deprecated.
CHROMEDRIVER_PATH = r'C:\Users\sitisuradi\Downloads\chromedriver_win32\chromedriver.exe'

chrome_options = Options()
# chrome_options.add_argument("--headless")

driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH), options=chrome_options)
driver.implicitly_wait(0.5)
driver.maximize_window()



In [None]:
driver.get('https://www.tripadvisor.com/Hotels-g298551-zff12-Medina_Al_Madinah_Province-Hotels.html')

soup = BeautifulSoup(driver.page_source, 'html.parser')

In [None]:
from selenium.webdriver.common.by import By

# load the page with the existing webdriver; the FAQ items are read from the live DOM
driver.get('https://www.tripadvisor.com/Hotels-g298551-zff12-Medina_Al_Madinah_Province-Hotels.html')

# find all FAQ items
faq_items = driver.find_elements(By.CSS_SELECTOR, '.faqItem')

# create empty list to store data
data = []

# loop through FAQ items and extract data
for item in faq_items:
    question = item.find_element(By.CLASS_NAME, 'question').text
    answer = item.find_element(By.CLASS_NAME, 'answer').text
    hotel_names = [hotel.text for hotel in item.find_elements(By.CLASS_NAME, 'hotels')]
    hotel_ratings = [rating.text for rating in item.find_elements(By.CLASS_NAME, 'description')]
    for name, rating in zip(hotel_names, hotel_ratings):
        data.append({'question': question, 'answer': answer, 'hotel_name': name, 'hotel_rating': rating})

# convert data to pandas DataFrame
df = pd.DataFrame(data)
df

,question,answer,hotel_name,hotel_rating
0,What are the best luxury hotels near Mount Uhud?,Some of the more popular luxury hotels near Mo...,The Oberoi Madina - Traveler rating: 5.0/5,- Traveler rating: 5.0/5
1,What are the best luxury hotels near Mount Uhud?,Some of the more popular luxury hotels near Mo...,"InterContinental Madinah-Dar Al Iman, an IHG H...",- Traveler rating: 4.5/5
2,What are the best luxury hotels near Mount Uhud?,Some of the more popular luxury hotels near Mo...,Madinah Hilton - Traveler rating: 4.5/5,- Traveler rating: 4.5/5
3,Which luxury hotels are close to Mohammad Bin ...,These luxury hotels are close to Mohammad Bin ...,The Oberoi Madina - Traveler rating: 5.0/5,- Traveler rating: 5.0/5
4,Which luxury hotels are close to Mohammad Bin ...,These luxury hotels are close to Mohammad Bin ...,"InterContinental Madinah-Dar Al Iman, an IHG H...",- Traveler rating: 4.5/5
5,Which luxury hotels are close to Mohammad Bin ...,These luxury hotels are close to Mohammad Bin ...,Madinah Hilton - Traveler rating: 4.5/5,- Traveler rating: 4.5/5
6,What are the best luxury hotels in Medina?,Some of the best luxury hotels in Medina are:,The Oberoi Madina - Traveler rating: 5.0/5,- Traveler rating: 5.0/5
7,What are the best luxury hotels in Medina?,Some of the best luxury hotels in Medina are:,"InterContinental Madinah-Dar Al Iman, an IHG H...",- Traveler rating: 4.5/5
8,What are the best luxury hotels in Medina?,Some of the best luxury hotels in Medina are:,Madinah Hilton - Traveler rating: 4.5/5,- Traveler rating: 4.5/5
9,Which luxury hotels in Medina offer a gym?,A gym is available to guests at the following ...,The Oberoi Madina - Traveler rating: 5.0/5,- Traveler rating: 5.0/5
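
In the FAQ table, `hotel_name` still carries the rating suffix (`The Oberoi Madina - Traveler rating: 5.0/5`) and `hotel_rating` holds only the suffix. A regex sketch, assuming that exact `- Traveler rating: X/5` wording, that splits the name from a numeric rating; `split_name_and_rating` is a hypothetical helper.

```python
import re

# " - Traveler rating: 4.5/5" anchored at the end of the string
RATING_SUFFIX = re.compile(r"\s*-\s*Traveler rating:\s*(\d+(?:\.\d+)?)/5\s*$")

def split_name_and_rating(text):
    """Split 'Name - Traveler rating: 5.0/5' into ('Name', 5.0); no suffix gives (text, None)."""
    match = RATING_SUFFIX.search(text)
    if match is None:
        return text.strip(), None
    return text[:match.start()].strip(), float(match.group(1))
```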


## Hotels near Mohammad Bin Abdul Aziz Airport (MED)

In [None]:
driver.get('https://www.tripadvisor.com/HotelsNear-g298551-qMED-Medina_Al_Madinah_Province.html')

soup = BeautifulSoup(driver.page_source, 'html.parser')

In [None]:
dist_div = soup.find('div', {'class': 'distWrapper'})
distance = dist_div.find('b').text.strip()

In [None]:
driver.get('https://www.tripadvisor.com/HotelsNear-g298551-qMED-Medina_Al_Madinah_Province.html')

soup = BeautifulSoup(driver.page_source, 'html.parser')

hotel_name = []
price_ = []
rating_ = []
distance_ = []
street_address = []
locality = []
country_name = []

for hotel in soup.find_all('div', {'class': 'listing'}):
    # Skip sponsored listings
    if hotel.find('span', {'data-sponsored-placement': 'true'}):
        continue

    rating_tag = hotel.find('a', {'class': 'ui_bubble_rating'})
    if rating_tag is not None:
        rating_class = rating_tag.get('class')[1]
        rating = int(rating_class.split('_')[1]) / 10
    else:
        rating = None


    # Extract the listed price and drop the 'MYR ' currency prefix
    price_element = hotel.find('div', {'class': 'price'})
    if price_element is not None:
        price = price_element.text.strip().replace('MYR ', '')
    else:
        price = None


    # Extract the hotel name and the distance shown in the listing
    hotel_ = hotel.find('a', {'class': 'property_title'}).text.strip()
    dist_div = hotel.find('div', {'class': 'distWrapper'})
    dist_b = dist_div.find('b') if dist_div else None
    distance = dist_b.text.strip() if dist_b else None
    address_div = hotel.find('div', {'class': 'address'})

    # Extract street address, locality, and country name
    street = None
    loc = None
    country = None
    spans = address_div.find_all('span') if address_div else []
    for span in spans:
        classes = span.get('class', [])  # spans without a class attribute are skipped safely
        if 'street-address' in classes:
            street = span.text.strip()
        elif 'locality' in classes:
            loc = span.text.strip().split(',')[0]
        elif 'country-name' in classes:
            country = span.text.strip()
    
    # Append information to lists
    hotel_name.append(hotel_)
    price_.append(price)
    rating_.append(rating)
    distance_.append(distance)
    street_address.append(street)
    locality.append(loc)
    country_name.append(country)

airport_hotel_dict = {
    'Hotel name':hotel_name,
    'Prices':price_,
    'Ratings':rating_,
    'Distance from airport':distance_,
    'Street Address':street_address,
    'Locality':locality,
    'Country':country_name
}

df = pd.DataFrame(airport_hotel_dict)
df

df.to_excel('hotel_near_airport.xlsx', index=False)
df = pd.read_excel('hotel_near_airport.xlsx')
df

,Hotel name,Prices,Ratings,Distance from airport,Street Address,Locality,Country
0,1. Shahd Al Madinah Hotel,"MYR 1,935",4.5,8.0 miles,King Fahad Road,Medina 42311,Saudi Arabia
1,"2. InterContinental Dar al Hijra Madinah, an I...","Only 1 left at MYR 1,477",4.5,8.5 miles,King Fahad Street,Medina 41455,Saudi Arabia
2,3. Madinah Hilton,"MYR 2,551",4.5,8.6 miles,King Fahd Rd,Medina 3936,Saudi Arabia
3,4. Millennium Madinah Airport,MYR 690,4.0,0.4 miles,12 Boulevard Haussmann,Medina 42342,Saudi Arabia
4,5. Madinah Movenpick Hotel,"MYR 1,291",4.0,8.8 miles,Abi Sayeed Al Khudri Street,Medina 41441,Saudi Arabia
5,6. Anwar Al Madinah Movenpick Hotel,"MYR 2,009",4.0,8.8 miles,Central Zone,Medina,Saudi Arabia
6,7. Hafawah Suites,MYR 387,4.5,6.9 miles,King Abdullah RD,Medina 42319,Saudi Arabia
7,8. Y Platinum Hotel,"MYR 1,421",3.0,6.9 miles,Jebar Bin Sakhr,Medina 42317,Saudi Arabia
8,9. Ewan Dar Alhejra Hotel,MYR 947,4.5,7.9 miles,King Fahd Road 8144,Medina 42313,Saudi Arabia
9,10. Pullman Zamzam Madina,"MYR 1,962",4.5,8.9 miles,Amr Bin Al Gmoh Street Madina,Medina 41499,Saudi Arabia
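
The `Prices` column still contains text such as `MYR 1,935` and `Only 1 left at MYR 1,477`, even though the loop strips `'MYR '`; one possible cause is a non-breaking space between the currency and the amount (an assumption, not confirmed from the page). A hedged sketch that extracts the amount after `MYR` or `RM` (the prefix shown on tripadvisor.com.my) as a number; `parse_price` is a hypothetical helper. In Python `str` patterns, `\s` also matches the non-breaking space `\xa0`.

```python
import re

# currency prefix, optional whitespace (including \xa0), then an amount like 1,477 or 690.50
AMOUNT = re.compile(r"(?:MYR|RM)\s*(\d[\d,]*(?:\.\d+)?)")

def parse_price(text):
    """Return the last MYR/RM amount in the text as a float, or None if there is none."""
    if not isinstance(text, str):
        return None
    amounts = AMOUNT.findall(text)
    return float(amounts[-1].replace(",", "")) if amounts else None
```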


In [None]:
driver.quit()

## Weather in Mecca and Madinah 

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Local ChromeDriver executable. Selenium 4 takes it through a Service object;
# the positional path and chrome_options= arguments are deprecated.
CHROMEDRIVER_PATH = r'C:\Users\sitisuradi\Downloads\chromedriver_win32\chromedriver.exe'

chrome_options = Options()
# chrome_options.add_argument("--headless")

driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH), options=chrome_options)
driver.implicitly_wait(0.5)
driver.maximize_window()



In [None]:
driver.get("https://championtraveler.com/dates/best-time-to-visit-mecca-sa/")

soup = BeautifulSoup(driver.page_source, 'html.parser')

seasons = []
descriptions = []
h2 = soup.find('h2', {'id': 'bymonth'})
for h4 in h2.find_all_next('h4'):
    season = h4.text.strip()
    description = h4.find_next('p').text.strip()
    seasons.append(season)
    descriptions.append(description)

season_dict = {
  'Season at Mecca & Madinah':seasons,
  'Descriptions':descriptions
}

df = pd.DataFrame(season_dict)
df

df.to_excel('weather.xlsx', index=False)
df = pd.read_excel('weather.xlsx')
df


,Season at Mecca & Madinah,Descriptions
0,Spring (March through May),Humidity and temperatures combine to make this...
1,Summer (June through August),The middle-year months have hot weather with h...
2,Fall (September through November),Fall daily highs range from 110.7°F (43.7°C) a...
3,Winter (December through February),Weather is perfect this time of year in Mecca ...
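
`h2.find_all_next('h4')` keeps going to the end of the page, so any later `<h4>` (for example in a FAQ block further down) would be collected as an extra season. A hedged sketch that stops at the next `<h2>`; it assumes the seasons sit between `<h2 id="bymonth">` and the following `<h2>`, and `h4_sections_under` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

def h4_sections_under(soup, heading_id):
    """Collect (h4 text, first following <p> text) pairs after <h2 id=heading_id>,
    stopping at the next <h2> so later sections of the page are left out."""
    start = soup.find("h2", id=heading_id)
    pairs = []
    if start is None:
        return pairs
    for tag in start.find_all_next(["h2", "h4"]):
        if tag.name == "h2":
            break  # reached the next section
        paragraph = tag.find_next("p")
        pairs.append((tag.get_text(strip=True), paragraph.get_text(strip=True) if paragraph else ""))
    return pairs
```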


In [None]:
driver.quit()

## Hotels within 25 km of the Great Mosque (Ka'ba)

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Local ChromeDriver executable. Selenium 4 takes it through a Service object;
# the positional path and chrome_options= arguments are deprecated.
CHROMEDRIVER_PATH = r'C:\Users\sitisuradi\Downloads\chromedriver_win32\chromedriver.exe'

chrome_options = Options()
# chrome_options.add_argument("--headless")

driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH), options=chrome_options)
driver.implicitly_wait(0.5)
driver.maximize_window()



In [None]:
# Navigate to the URL
driver.get('https://www.tripadvisor.com.my/HotelsNear-g293993-d6881993-Great_Mosque_of_Mecca-Mecca_Makkah_Province.html')

In [None]:
soup = BeautifulSoup(driver.page_source, 'html.parser')

hotel_name = []
price_ = []
rating_ = []
distance_ = []
street_address = []
locality = []
country_name = []

for hotel in soup.find_all('div', {'class': 'listing'}):
    # Skip sponsored listings
    if hotel.find('span', {'data-sponsored-placement': 'true'}):
        continue

    rating_tag = hotel.find('a', {'class': 'ui_bubble_rating'})
    if rating_tag is not None:
        rating_class = rating_tag.get('class')[1]
        rating = int(rating_class.split('_')[1]) / 10
    else:
        rating = None


    # Extract the listed price and drop the 'MYR ' currency prefix
    price_element = hotel.find('div', {'class': 'price'})
    if price_element is not None:
        price = price_element.text.strip().replace('MYR ', '')
    else:
        price = None


    # Extract the hotel name and the distance shown in the listing
    hotel_ = hotel.find('a', {'class': 'property_title'}).text.strip()
    dist_div = hotel.find('div', {'class': 'distWrapper'})
    dist_b = dist_div.find('b') if dist_div else None
    distance = dist_b.text.strip() if dist_b else None
    address_div = hotel.find('div', {'class': 'address'})

    # Extract street address, locality, and country name
    street = None
    loc = None
    country = None
    spans = address_div.find_all('span') if address_div else []
    for span in spans:
        classes = span.get('class', [])  # spans without a class attribute are skipped safely
        if 'street-address' in classes:
            street = span.text.strip()
        elif 'locality' in classes:
            loc = span.text.strip().split(',')[0]
        elif 'country-name' in classes:
            country = span.text.strip()
    
    # Append information to lists
    hotel_name.append(hotel_)
    price_.append(price)
    rating_.append(rating)
    distance_.append(distance)
    street_address.append(street)
    locality.append(loc)
    country_name.append(country)

mosque_hotel_dict = {
    'Hotel name':hotel_name,
    'Prices':price_,
    'Ratings':rating_,
    'Distance from haram':distance_,
    'Street Address':street_address,
    'Locality':locality,
    'Country':country_name
}

df = pd.DataFrame(mosque_hotel_dict)
df

df.to_excel('hotel_near_mosque.xlsx', index=False)
df = pd.read_excel('hotel_near_mosque.xlsx')
df

,Hotel name,Prices,Ratings,Distance from haram,Street Address,Locality,Country
0,1. Al-Ghufran Safwah Hotel,"RM 1,703",4.5,0.5 km,Ajyad Street,Mecca 2581,Saudi Arabia
1,"2. InterContinental Dar al Tawhid Makkah, an I...","RM 2,789",4.5,0.6 km,Ibrahim al Khalil Road,Mecca 21955,Saudi Arabia
2,3. Raffles Makkah Palace,"RM 3,877",4.5,0.6 km,King Abdul Aziz Endowment,Mecca 21955,Saudi Arabia
3,4. Makkah Towers,"RM 2,115",4.5,0.6 km,Ibrahim Al Khalil Street,Mecca 21955,Saudi Arabia
4,5. Swissotel Makkah,"RM 1,783",4.0,0.6 km,Ajyad Street,Mecca 21955,Saudi Arabia
5,6. Moevenpick Hotel & Residences Hajar Tower M...,"RM 2,160",4.5,0.6 km,Abraj Al Bait,Mecca 21955,Saudi Arabia
6,7. Pullman ZamZam Makkah,"RM 2,663",4.0,0.6 km,Abraj Al Bait Complex,Mecca 21955,Saudi Arabia
7,8. Hilton Suites Makkah,"RM 2,116",4.5,0.7 km,Jabal Omar,Mecca 21955,Saudi Arabia
8,9. Swissotel Al Maqam Makkah,"RM 1,726",4.5,0.7 km,"King Abdul Aziz Endowment, Ibrahim Al Khalil S...",Mecca 21955,Saudi Arabia
9,10. Fairmont Makkah Clock Royal Tower,"RM 2,104",4.0,0.7 km,King Abdul Aziz Endowment,Mecca 21955,Saudi Arabia
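
Distances come in miles on tripadvisor.com (the airport table) and in kilometres on tripadvisor.com.my (the mosque table), so the two columns are not directly comparable. A sketch that converts both to kilometres, assuming the text always starts with a number followed by `mile(s)`, `mi` or `km`; `distance_km` is a hypothetical helper.

```python
import re

MILES_TO_KM = 1.609344  # exact definition of the international mile

def distance_km(text):
    """Convert '8.0 miles' or '0.5 km' to kilometres; return None if the format is not recognised."""
    if not isinstance(text, str):
        return None
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*(miles?|mi|km)\b", text)
    if match is None:
        return None
    value = float(match.group(1))
    return value * MILES_TO_KM if match.group(2).startswith("mi") else value
```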


In [None]:
driver.quit()

## Flights

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Local ChromeDriver executable. Selenium 4 takes it through a Service object;
# the positional path and chrome_options= arguments are deprecated.
CHROMEDRIVER_PATH = r'C:\Users\sitisuradi\Downloads\chromedriver_win32\chromedriver.exe'

chrome_options = Options()
# chrome_options.add_argument("--headless")

driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH), options=chrome_options)
driver.implicitly_wait(0.5)
driver.maximize_window()



In [None]:
driver.get('https://www.tripadvisor.com/CheapFlightsSearchResults-g295419-a_airport0.KUL-a_airport1.JED-a_cos.0-a_date0.20230316-a_date1.20230323-a_formImp.17578aca__2D__4b15__2D__47ef__2D__a7a2__2D__c1bd33faa2a1__2E__10112-a_nearby0.yes-a_nearby1.yes-a_nonstop.yes-a_pax0.a-a_travelers.1-Jeddah_Makkah_Province.html')

In [None]:

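# Parse the flight results page that was just loaded
# (without this, `soup` still holds the previous page)
soup = BeautifulSoup(driver.page_source, 'html.parser')

flights = []

# loop through each flight container
for flight in soup.find_all('div', class_='jDeNQ'):
    flight_info = {}

    # extract flight price
    price_tag = flight.find('span', class_='xQJFT')
    flight_info['price'] = price_tag.text.strip() if price_tag else None

    # extract flight details (one entry per leg)
    flight_details = []
    for detail in flight.find_all('div', class_='tJUQv'):
        logo = detail.find('img')
        time_div = detail.find('div', class_='n')
        spans = detail.find_all('span')
        # expect exactly three spans: origin, destination, airline
        if len(spans) != 3:
            continue
        origin, destination, airline = spans
        flight_details.append({
            'airline_logo': logo.get('src') if logo else None,
            'departure_time': time_div.text if time_div else None,
            'origin': origin.text,
            'destination': destination.text,
            'airline': airline.text
        })
    flight_info['details'] = flight_details

    # add flight to list of flights
    flights.append(flight_info)

# Close this browser before the next section starts a new one
driver.quit()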
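
Each entry in `flights` nests a list of legs under `details`, which does not map directly onto a flat table. A standard-library sketch that produces one row per leg (for the KUL to JED search above), repeating the itinerary's price on each row so the result could be passed to `pd.DataFrame` like the other sections; `flatten_flights` is a hypothetical helper, and the keys follow the dictionaries built in the flight cell.

```python
def flatten_flights(flights):
    """Turn [{'price': ..., 'details': [leg, ...]}, ...] into one flat dict per leg."""
    rows = []
    for index, flight in enumerate(flights):
        for leg_number, leg in enumerate(flight.get("details", []), start=1):
            # itinerary/leg numbers identify the row; the leg's own fields are merged in
            rows.append({"itinerary": index, "leg": leg_number, "price": flight.get("price"), **leg})
    return rows
```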

## Forum: Mecca Planning

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Local ChromeDriver executable. Selenium 4 takes it through a Service object;
# the positional path and chrome_options= arguments are deprecated.
CHROMEDRIVER_PATH = r'C:\Users\sitisuradi\Downloads\chromedriver_win32\chromedriver.exe'

chrome_options = Options()
# chrome_options.add_argument("--headless")

driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH), options=chrome_options)
driver.implicitly_wait(0.5)
driver.maximize_window()



In [None]:
driver.get('https://www.tripadvisor.com/ShowTopic-g293993-i4536-k13878039-Trip_Report_DIY_Umrah_Trip_March_2022-Mecca_Makkah_Province.html#113052826')

In [None]:
# Parse the page HTML into a BeautifulSoup object
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Extract forum title
forum_title = soup.find('span',{'class':'topTitleText'}).text

# Extract every non-empty paragraph on the page
infos = []
for info in soup.findAll('p'):
  info_ = info.text.strip()
  if info_:
    infos.append(info_)

print(forum_title)
print(infos)

forum_dict = {
    'Forum title':forum_title,
    'Description':infos
}

df = pd.DataFrame(forum_dict)

df.to_excel('umrah_forum_planning.xlsx', index=False)
df = pd.read_excel('umrah_forum_planning.xlsx')
df

Trip Report DIY Umrah Trip March 2022
['Salam everyone. Planning for a new norm umrah trip can be tough when you have misleading info, on top of the ever-changing rules. I did my DIY Umrah trip in March 2022 and Alhamdulillah for this TripAdvisor community it did help me a lot. So, with little info that I know, I would like to share my trip report to hopefully ease it for my brothers and sisters out there planning to visit Haramain soon InsyaAllah.', '**PLEASE BE WARNED THAT THIS IS A LONG POST!!!', 'Pre-departure:', '1. Vaccination: If you do plan to go for Umrah please make sure you’re fully vaccinated with either Pfizer-BioNTech, Moderna, Oxford-AstraZeneca, Janssen, Sinopharm, Sinovac, Cova Xin, Gamaleya (Sputnik V) or Covovax. Those who received the last dose of the vaccine more than 8 months ago must have the booster dose (3rd dose). I am no information regarding children so please do your search accordingly. No meningitis vaccination is required to perform Umrah if you are on To

,Forum title,Description
0,Trip Report DIY Umrah Trip March 2022,Salam everyone. Planning for a new norm umrah ...
1,Trip Report DIY Umrah Trip March 2022,**PLEASE BE WARNED THAT THIS IS A LONG POST!!!
2,Trip Report DIY Umrah Trip March 2022,Pre-departure:
3,Trip Report DIY Umrah Trip March 2022,1. Vaccination: If you do plan to go for Umrah...
4,Trip Report DIY Umrah Trip March 2022,"2. Visa. I did my Umrah with a Tourist Visa, w..."
...,...,...
78,Trip Report DIY Umrah Trip March 2022,Assalamualaikum.
79,Trip Report DIY Umrah Trip March 2022,"i will be flying to KSA in a week time, taking..."
80,Trip Report DIY Umrah Trip March 2022,Some airlines require PCR though the country d...
81,Trip Report DIY Umrah Trip March 2022,Thanks in advance.


In [None]:
driver.get('https://mypt3.com/travelog-umrah-diy')

In [None]:
# Parse the page HTML into a BeautifulSoup object
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Find all h2 tags
h2_tags = soup.find_all('h2')

# Loop through each h2 tag and find the strong tag within
title_ = []
for h2 in h2_tags:
  strong_tag = h2.find('strong')
  if strong_tag:
    text = strong_tag.text
    title_.append(text)
    print(text)



15 Pengalaman Umrah DIY Backpackers
Tips Umrah Diy Terkini Bila Sampai Kota Mekah
Lagi Panduan Umrah Diy Yang Terkini
Mengapa Buat Umrah Guna Agen Pelancongan Mahal
Cara Dapatkan Visa Umrah Untuk Buat Umrah Sendiri


In [None]:
p_tags = soup.find_all('p', {'style': 'text-align: justify;'})

info_ = []
li_info = []

for p_tag in p_tags:
    text = p_tag.text
    info_.append(text)
    
    # Find <li> tags after <p> tag
    li_tags = p_tag.find_all_next('li', {'style': 'text-align: justify;'})
    
    # Find <li> tags under <ol> tags after <p> tag
    ol_tags = p_tag.find_all_next('ol', recursive=False)
    for ol_tag in ol_tags:
        li_tags += ol_tag.find_all('li', {'style': 'text-align: justify;'})
    
    if li_tags:
        for li_tag in li_tags:
            li_text = li_tag.text
            li_info.append(li_text)

    else:
        li_info.append('NA')

li_info

# dict_ = {
#     info_
# }

['Musim sejuk adalah dari Disember hingga akhir Februari (Kebiasaannya)',
 'Musim ini ada hujan juga sekali sekala, tapi hati-hati takut banjir.Di sini, hujan sekejap pun boleh banjir.',
 'Sijil kahwin kalau pergi dengan suami.',
 'Gambar passport Malaysia berlatarbelakangkan warna putih (Size visa Saudi. Kedai gambar tahu)',
 'Dokumen Passport Malaysia',
 'Kad Vaksin',
 'Makanan di Mekah ada mana-mana. Restoran D Saji milik Felda pun ada.',
 'Nak jalan di kawasan Mekah ini kena rancang\xa0keluar 30 minit sebelum waktu solat sebabnya jalan akan ditutup\xa0dan akan sesak.',
 'Nak pilih hotel perlu berhati-hati. Biarpun 300 meter jarak perjalanan \xa0ke Masjidil Haram,\xa0ini boleh buat jalan 30 minit. (tips sesak) Hahahaha.\xa0Jadi berhati-hatilah.',
 'Waktu nak turun di Jabatan imigresen Malaysia perlu beratur panjang. Oleh itu, kalau nak melepas, lepas lah di kapal terbang. Tandas dia kurang selesa sikit dekat airport dan simpan sikit chocolate dan sebagainya. Sebab hari itu ada yang 
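
The scraped Malay text contains non-breaking spaces (`\xa0`) and uneven spacing. A standard-library sketch that collapses every whitespace run, including `\xa0` (which `\s` matches in Python `str` patterns), into a single space; `clean_text` is a hypothetical helper that could be applied to each entry of `info_` or `li_info`.

```python
import re

def clean_text(text):
    """Collapse runs of whitespace (spaces, newlines, \xa0) to one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()
```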

In [None]:
p_tags = soup.find_all('p', {'style': 'text-align: justify;'})

data = []
for p_tag in p_tags:
  p_text = p_tag.text
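  # Only take a <ul> that directly follows this paragraph; find_next_sibling('ul')
  # jumps ahead to a later list and attaches it to several paragraphs
  next_tag = p_tag.find_next_sibling()
  li_tag = next_tag if next_tag is not None and next_tag.name == 'ul' else None
  if li_tag: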
    li_text = [li.text for li in li_tag.find_all('li')]
    print(li_text)
  else:
    li_text = ['NA']
  data.append({'p_tag': p_text, 'li_info': li_text})

['Tiket murah adalah bermula dari RM 1,800 dan \xa0ke bawah untuk pergi dan balik.', 'Sekiranya tiket bermula dari RM 2,000 dan ke atas, ia adalah mahal.']
['Tiket murah adalah bermula dari RM 1,800 dan \xa0ke bawah untuk pergi dan balik.', 'Sekiranya tiket bermula dari RM 2,000 dan ke atas, ia adalah mahal.']
['Tiket murah adalah bermula dari RM 1,800 dan \xa0ke bawah untuk pergi dan balik.', 'Sekiranya tiket bermula dari RM 2,000 dan ke atas, ia adalah mahal.']
['Tiket murah adalah bermula dari RM 1,800 dan \xa0ke bawah untuk pergi dan balik.', 'Sekiranya tiket bermula dari RM 2,000 dan ke atas, ia adalah mahal.']
['Musim sejuk adalah dari Disember hingga akhir Februari (Kebiasaannya)', 'Musim ini ada hujan juga sekali sekala, tapi hati-hati takut banjir.Di sini, hujan sekejap pun boleh banjir.']
['Sijil kahwin kalau pergi dengan suami.', 'Gambar passport Malaysia berlatarbelakangkan warna putih (Size visa Saudi. Kedai gambar tahu)', 'Dokumen Passport Malaysia', 'Kad Vaksin']
['Sijil

In [None]:
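# The id belongs to a <span> (presumably nested inside the <h2> heading), so the
# paragraph is not its sibling; find_next('p') searches forward in document order.
heading_span = soup.find('span', id='Cara_Dapatkan_Visa_Umrah_Untuk_Buat_Umrah_Sendiri')
if heading_span is not None:
    next_p_element = heading_span.find_next('p')
    if next_p_element is not None:
        print(next_p_element.text)
    else:
        print("No next p element found.")
else:
    print("No heading element found.")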

No next p element found.


## Facebook Scraping 

In [None]:
from facebook_scraper import get_posts