# Dynamic Website Scraping with Selenium: Conquering Infinite Scrolls and Swipes

## Table of Contents

- Overview: #overview
- Key Features: #key-features
- Technologies Used: #technologies-used
- Setup: #setup
- Usage: #usage
- Handling Infinite Scroll: #handling-infinite-scroll
- Simulating Left-Right Swipes: #simulating-left-right-swipes
- Extracting Data: #extracting-data
- Additional Tips: #additional-tips
- Author and License: #author-and-license

## Overview

This project demonstrates how to leverage Selenium to effectively scrape dynamic websites that present challenges like infinite scrolling and swipe-based navigation. It provides practical code examples and guidance for overcoming these common obstacles.

## Key Features

- **Infinite Scroll Handling:** Employs techniques to detect scroll ends and trigger further content loading, ensuring complete data capture.
- **Left-Right Swipe Simulation:** Uses JavaScript execution within Selenium to replicate swiping actions, effectively navigating websites that rely on horizontal swipes.
- **Data Extraction:** Demonstrates methods to extract relevant information from the scraped content, tailoring extraction techniques to the specific website's structure.
- **Clear Code Examples and Explanations:** Provides well-structured code with detailed comments, aiding understanding and adaptability to different scenarios.

## Technologies Used

- Selenium WebDriver
- Python (or your preferred programming language)
- WebDriver for your chosen browser (e.g., ChromeDriver for Chrome)

## Setup

1. Install required libraries:
   ```bash
   pip install selenium
   ```
2. Download the appropriate WebDriver for your browser.

## Usage

(Provide code examples and explanations for:)

- Setting up the WebDriver
- Navigating to the target website
- Identifying elements for scraping
- Handling infinite scroll
- Simulating left-right swipes
- Extracting data
- Saving the extracted data

## Handling Infinite Scroll

(Describe specific techniques used for infinite scroll handling)

## Simulating Left-Right Swipes

(Explain how JavaScript execution is used to simulate swipes)

## Extracting Data

(Detail methods used for data extraction, accounting for website structure)

## Additional Tips

- Adjust wait times and element locators to match the target website's behavior.
- Handle potential errors gracefully (e.g., network issues, website changes).
- Consider using a headless browser for faster execution.
- Respect website terms of service and robots.txt.

## Author and License

Written by Anh Nhat Nguyen

License: This project is licensed under the MIT License.


In [4]:
## import
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as ECondition 
from selenium.webdriver.chrome.service import Service
import json
import time

## create an object of the chrome webdriver
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36'
option = webdriver.ChromeOptions()
# option.add_argument('headless')
option.add_argument("--window-size=1920,1080")
option.add_argument(f'user-agent={USER_AGENT}')
service = Service(executable_path=r'../chromedriver-win64/chromedriver.exe')
driver = webdriver.Chrome(service=service, options=option)


In [5]:
driver.get(url = "https://migo.travel/Destination/vietnam-hanoi")
pagesource = driver.page_source

In [None]:
pagesource

---

### Banner slide section

In [57]:
#banner
banner_imgs_src_list = []
swiper_banner = driver.find_elements(By.XPATH, value='/html/body/div[1]/main/div[2]/div[1]')
banner_imgs = swiper_banner[0].find_elements(By.TAG_NAME, 'img')
for banner_img in banner_imgs:
    banner_imgs_src = banner_img.get_property('src')
    banner_imgs_src_list.append(banner_imgs_src)

In [115]:
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
for i in range(0, 30):
    ActionChains(driver).click_and_hold(swiper_banner[0]).move_by_offset(-200 , -20).release().perform()
    time.sleep(0.5)
ActionChains(driver).send_keys(Keys.ESCAPE).perform()

In [48]:
banner_imgs_src_list

['https://files.migo.travel/20230508/hanoi---hang-rong---do-luu-niem-p0savsud.webp?f=plVDT9mpvkS0W-57KuxEZw',
 'https://files.migo.travel/20230725/destination-hanoi-migo-10-y5bkpgl2.webp?f=XcGpbyY_gUap2QK_1BRffA',
 'https://files.migo.travel/20230508/hanoi---cot-co-ha-noi-t4flesoi.webp?f=Z7qhniZcy0ilGLAIhcTjbQ',
 'https://files.migo.travel/20230725/destination-hanoi-migo-2-gme5lqrf.webp?f=jlIXmSyJ-0CtKgH4Lr9RCA',
 'https://files.migo.travel/20230725/destination-hanoi-migo-8-p2pldi3d.webp?f=D8cj4NSBJ0qCIPwxSXyMvA',
 'https://files.migo.travel/20230725/destination-hanoi-migo-8-wun3iha0.webp?f=rVnXOPUsVUSs5E5rldGh6Q',
 'https://files.migo.travel/20230725/destination-hanoi-migo-9-teaaq4z3.webp?f=ne22EXBl206enycWJI_cTQ',
 'https://files.migo.travel/20230725/destination-hanoi-migo-18-pej5wel4.webp?f=mDIzf-z7iU6qRpjn1MX2Hw',
 'https://files.migo.travel/20230725/destination-hanoi-migo-5-gw2kok0a.webp?f=9i2j5ln7_0CGbKtGawrGSA',
 'https://files.migo.travel/20230725/destination-hanoi-migo-4-5ezym

---

### About section

In [None]:
#about-container
about_container = driver.find_elements(By.XPATH,'/html/body/div[1]/main/div[2]/div[4]')



In [None]:
about_container_papragraph = [p.text for p in about_container[0].find_elements(By.TAG_NAME, 'p')]
about_container_papragraph

In [None]:
about_container_href = [h.get_property('href') for h in about_container[0].find_elements(By.TAG_NAME, 'a')]
about_container_href

In [None]:
about_container_imgs = [h.get_property('src') for h in about_container[0].find_elements(By.TAG_NAME, 'img')]
about_container_imgs

---

### Explore Hanoi section

In [149]:
expore_section = driver.find_elements(By.ID, 'lstExploreEvent')[-1]

In [165]:
list_attraction = expore_section.find_elements(By.CLASS_NAME, 'list-attraction')[0]

In [None]:
list_attraction.text

In [170]:
list_attraction_imgs_src = []
prev_height = driver.execute_script("return arguments[0].scrollHeight",expore_section)
while True:
    # do scrolling
    driver.execute_script("arguments[0].scrollBy(0,arguments[0].scrollHeight)",expore_section)
    time.sleep(0.5)
    new_current_height = driver.execute_script("return arguments[0].scrollHeight",expore_section)
    print(new_current_height)
    if new_current_height - prev_height == 0:
        break
    prev_height = new_current_height

driver.execute_script("arguments[0].scrollIntoView()",list_attraction)
curr_height = 0
while curr_height <= prev_height:
    driver.execute_script("arguments[0].scrollBy(0,500)",expore_section)
    curr_height += 500
    time.sleep(0.05)
    list_attraction_imgs = list_attraction.find_elements(By.TAG_NAME,'img')
    list_attraction_imgs_lst = [i.get_property('src') for i in list_attraction_imgs if str(i.get_property('src')) != ""]
    list_attraction_imgs_src += list_attraction_imgs_lst
    

39696


In [171]:
list_attraction_imgs_src

['https://files.migo.travel/20231218/artemispastryattractionmigo43-d3vokegk.webp?f=9d8x72pPeEOOI1BhrFIfIw',
 'https://files.migo.travel/20231218/357104289_137513002694758_5010721774976859316_n-1b454zwq.webp?f=k2JPW6JHkkS3TnicuRT3Hw',
 'https://files.migo.travel/20231218/kasaya-nhxe0hxe0ngchayvxe0cafeattractionmigo18-mvydtpwy.webp?f=95lG4p1Do06Rpy5rKIVMLA',
 'https://files.migo.travel/20231218/cxe1imxe2mbistrosignatureveganattractionmigo16-jwaftmbu.webp?f=YRDXia0wnUm9pnPA7lMn9w',
 'https://files.migo.travel/20231218/lxe1lx1ed1t-vietnamesecuisine-edrak12o.webp?f=yKSMDGSpH0OIxc5sH5PZoQ',
 'https://files.migo.travel/20231129/phx1ee5ngthxe0nhtraditionalcuisineattractionmigo13-eka3mw2x.webp?f=6H0OWYPX9k674K0JplIRLQ',
 'https://files.migo.travel/20231128/kumihimo-jwmarriotthanoiattractionmigo28-kc4jffz3.webp?f=b-VRPZpuD0y6c3A2q5mXjQ',
 'https://files.migo.travel/20231114/hangdauwatertankattractionhanoimigo8-oy0kwnlj.webp?f=3Lj0tqsFTkOdlx-XkYB-_w',
 'https://files.migo.travel/20231114/hummingb

In [127]:
list_attraction_imgs = list_attraction.find_elements(By.TAG_NAME,'img')
len(list_attraction_imgs)

198

---

### Lastest stories section

In [6]:
while True:
    try:
        loadMoreButton = driver.find_element(By.XPATH,'//*[@id="load-more"]')
        time.sleep(1)
        loadMoreButton.click()
    except Exception as e:
        print(e)
        break

Message: element not interactable
  (Session info: chrome=120.0.6099.217)
Stacktrace:
	GetHandleVerifier [0x00007FF727CE2142+3514994]
	(No symbol) [0x00007FF727900CE2]
	(No symbol) [0x00007FF7277A74C3]
	(No symbol) [0x00007FF7277F2D29]
	(No symbol) [0x00007FF7277E6A0F]
	(No symbol) [0x00007FF727815FEA]
	(No symbol) [0x00007FF7277E63B6]
	(No symbol) [0x00007FF727816490]
	(No symbol) [0x00007FF7278328F6]
	(No symbol) [0x00007FF727815D93]
	(No symbol) [0x00007FF7277E4BDC]
	(No symbol) [0x00007FF7277E5C64]
	GetHandleVerifier [0x00007FF727D0E16B+3695259]
	GetHandleVerifier [0x00007FF727D66737+4057191]
	GetHandleVerifier [0x00007FF727D5E4E3+4023827]
	GetHandleVerifier [0x00007FF727A304F9+689705]
	(No symbol) [0x00007FF72790C048]
	(No symbol) [0x00007FF727908044]
	(No symbol) [0x00007FF7279081C9]
	(No symbol) [0x00007FF7278F88C4]
	BaseThreadInitThunk [0x00007FF9B4EB7344+20]
	RtlUserThreadStart [0x00007FF9B5A626B1+33]



In [11]:
stories_content_rows = driver.find_elements(By.XPATH,'/html/body/div[1]/main/div[2]/div[7]/div[2]')[0]
stories_content_rows

<selenium.webdriver.remote.webelement.WebElement (session="e7a52f40c5d64ba254db1e2c60ff4180", element="803F638E1ABC4431400FFE8BA70BBAEA_element_317")>

In [14]:
stories_content = stories_content_rows.find_elements(By.TAG_NAME,'a')
len(stories_content)

612

In [24]:
substory_text_href = [t.get_attribute('href') for t in stories_content]
len(substory_text_href), substory_text_href

(612,
 ['https://migo.travel/Experience/5-nha-hang-am-thuc-tay-ban-nha-an-tuong-tai-ha-noi',
  'https://migo.travel/Experience/5-nha-hang-am-thuc-tay-ban-nha-an-tuong-tai-ha-noi',
  'https://migo.travel/Experience/5-nha-hang-am-thuc-tay-ban-nha-an-tuong-tai-ha-noi',
  'https://migo.travel/Experience/5-nha-hang-am-thuc-tay-ban-nha-an-tuong-tai-ha-noi',
  'https://migo.travel/Experience/tiec-toi-trong-gu-bistronomy-nha-hang-fine-dining-hang-dau-tai-ha-noi',
  'https://migo.travel/Pillar/FoodAndDrink',
  'https://migo.travel/Experience/tiec-toi-trong-gu-bistronomy-nha-hang-fine-dining-hang-dau-tai-ha-noi',
  'https://migo.travel/Experience/tiec-toi-trong-gu-bistronomy-nha-hang-fine-dining-hang-dau-tai-ha-noi',
  'https://migo.travel/Experience/thuong-thuc-nhung-bua-an-ngon-mieng-voi-nha-hang-am-thuc-phap-tai-ha-noi',
  'https://migo.travel/Pillar/FoodAndDrink',
  'https://migo.travel/Experience/thuong-thuc-nhung-bua-an-ngon-mieng-voi-nha-hang-am-thuc-phap-tai-ha-noi',
  'https://migo.trav

In [26]:
def unique_list(in_list):
    list_set = set(in_list)
    unique_list = (list(list_set))
    return unique_list

In [27]:
substory_text_href_uniq = unique_list(substory_text_href)
len(substory_text_href_uniq)

158

In [25]:
substory_text_headers = " ".join([t.text for t in stories_content]).split("Read more")
len(substory_text_headers),substory_text_headers    

(154,
 ['  5 impressive Spanish cuisine restaurants in Hanoi ',
  '  Food & Drink · Dinner in GU Bistronomy, the leading fine dining restaurant in Hanoi ',
  '  Food & Drink · Enjoy delicious meals with French cuisine restaurants in Hanoi ',
  '  City & Culture · Enjoy the culture the way Hanoians - winter eating Trang Tien ice cream ',
  '  Food & Drinks · Enjoy 10 street foods in winter in Hanoi ',
  '  Food & Drinks · Beautiful Christmas decorations and restaurants in Hanoi ',
  '  Food & Drinks · Delicious Vietnamese rice at Xoi Rice ',
  "  Food & Drinks · Metropole Hanoi's Spice Garden Restaurant Reopens ",
  "  Food & Drinks · What's special about Tanh Split and SMOKE – two restaurants in Hanoi ",
  '  City & Culture · Hanoi nights are attractive with tourism products ',
  '  Food & Drinks · Go for a drink at Kumihimo Bar & Terrace ',
  '  City & Culture · Experience the heritage train, the 120-year-old Gia Lam railway factory ',
  '  City & Culture · Trains running through Hano

In [32]:
stories_content_paragraph_container = driver.find_elements(By.XPATH,'/html/body/div[1]/main/div[2]/div[7]')[0]
stories_content_paragraph = stories_content_paragraph_container.find_elements(By.TAG_NAME,'div')
stories_content_paragraph_text = [t.text for t in stories_content_paragraph]
stories_content_paragraph_text

['Latest Stories from Hanoi',
 'Latest Stories from Hanoi',
 '',
 'Food & Drinks 11/01/2024\n5 impressive Spanish cuisine restaurants in Hanoi\nIn addition to enjoying the delicious and nutritious flavors, diners also enjoy a luxurious and classy restaurant space. Even the most demanding diners, or diners who have never tried Spanish cuisine, will be intrigued.\nRead more\nFood & Drink · 10/01/2024\nDinner in GU Bistronomy, the leading fine dining restaurant in Hanoi\nOn the journey to experience culinary quintessence, GU is a destination for diners with taste, to satisfy their own culinary taste with creative dishes, valuable and rare wine collections and fine dining standard services.\nRead more\nFood & Drink · 04/01/2024\nEnjoy delicious meals with French cuisine restaurants in Hanoi\nEnjoying French cuisine is an art, as it lies not only in the taste of the dish but also in the presentation and space of enjoyment. Let\'s explore the standard French restaurants below with Migo.\nRea

In [34]:
stories_content_imgs_section = stories_content_paragraph_container.find_elements(By.TAG_NAME,'img')
stories_content_imgs = [t.get_attribute('src') for t in stories_content_imgs_section]
stories_content_imgs

['https://files.migo.travel/20230727/spanish-tapas-and-sangria-on-wooden-table-top-view_519793093-t4wncsyk.webp?f=u5NfbHEAv0uPrtD44OX9Uw',
 'https://files.migo.travel/20230727/spanish-tapas-and-sangria-on-wooden-table-top-view_519793093-t4wncsyk.webp?f=u5NfbHEAv0uPrtD44OX9Uw',
 'https://files.migo.travel/20230929/avargu-q3hzipkn.webp?f=JNV3vnSebEObyGr8Xlagug',
 'https://files.migo.travel/20230724/343562324_964238148088815_7477640552227212310_n-5kvbae0l.webp?f=NsGyw7leNUKuyplbil6UNQ',
 'https://files.migo.travel/20231221/ce13-ce4b-4bf1-a946-8913678355e9_slmx-ujfsiewn.webp?f=_eeR0tZW00u0rIK0VAY8Eg',
 'https://files.migo.travel/20231218/banh-troi-tau-quynh-mai-1702367772-1l5c0dey.webp?f=JsSh-ZwTN0qqYmkhGTmbbw',
 'https://files.migo.travel/20231208/407953667_318473287728201_1261982254385967867_n-btq4kv1i.webp?f=Gi_wpr8AGES_ZLzg6MknoQ',
 'https://files.migo.travel/20231208/347443049_631189431885961_3463678550118900056_n-dhe4u0qf.webp?f=UbsRqXzX90qVQwmxLHabrQ',
 'https://files.migo.travel/20

In [35]:
stories_content_vid_section = stories_content_paragraph_container.find_elements(By.TAG_NAME,'iframe')
stories_content_vid = [t.get_attribute('src') for t in stories_content_imgs_section]
stories_content_vid

['https://files.migo.travel/20230727/spanish-tapas-and-sangria-on-wooden-table-top-view_519793093-t4wncsyk.webp?f=u5NfbHEAv0uPrtD44OX9Uw',
 'https://files.migo.travel/20230727/spanish-tapas-and-sangria-on-wooden-table-top-view_519793093-t4wncsyk.webp?f=u5NfbHEAv0uPrtD44OX9Uw',
 'https://files.migo.travel/20230929/avargu-q3hzipkn.webp?f=JNV3vnSebEObyGr8Xlagug',
 'https://files.migo.travel/20230724/343562324_964238148088815_7477640552227212310_n-5kvbae0l.webp?f=NsGyw7leNUKuyplbil6UNQ',
 'https://files.migo.travel/20231221/ce13-ce4b-4bf1-a946-8913678355e9_slmx-ujfsiewn.webp?f=_eeR0tZW00u0rIK0VAY8Eg',
 'https://files.migo.travel/20231218/banh-troi-tau-quynh-mai-1702367772-1l5c0dey.webp?f=JsSh-ZwTN0qqYmkhGTmbbw',
 'https://files.migo.travel/20231208/407953667_318473287728201_1261982254385967867_n-btq4kv1i.webp?f=Gi_wpr8AGES_ZLzg6MknoQ',
 'https://files.migo.travel/20231208/347443049_631189431885961_3463678550118900056_n-dhe4u0qf.webp?f=UbsRqXzX90qVQwmxLHabrQ',
 'https://files.migo.travel/20

---

### Footer section

In [36]:
footer_section = driver.find_elements(By.TAG_NAME, 'footer')[0]

In [38]:
footer_text = footer_section.text
footer_text

'Where your journey begins\nmarketing@migo.travel\nABOUT MIGO\nAbout Us\nTerms & Conditions\nPrivacy Policy\nPOPULAR SITES\nDestinations\nExperiences\nTours\nEvents\nSOCIAL MEDIA\n© 2023 Exploria Vietnam. All rights reserved.'

In [40]:
footer_section_a = footer_section.find_elements(By.TAG_NAME, 'a')
footer_section_a_href = [a.get_attribute('href') for a in footer_section_a]
footer_section_a_href

['https://migo.travel/',
 'mailto:marketing@migo.travel',
 'https://migo.travel/about',
 'https://migo.travel/support/terms',
 'https://migo.travel/support/policy',
 'https://migo.travel/Destinations',
 'https://migo.travel/Pillar',
 'https://migo.travel/Tour',
 'https://migo.travel/Event',
 'https://www.facebook.com/migotravel.vietnam',
 'https://www.instagram.com/migotravel.vietnam/']

---

## Put it all together

In [132]:
## import
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as ECondition 
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
import json
import time

## create an object of the chrome webdriver
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36'
option = webdriver.ChromeOptions()
# option.add_argument('headless') # headless <<- no windows
option.add_argument("--window-size=1920,1080")
option.add_argument(f'user-agent={USER_AGENT}')
service = Service(executable_path=r'../chromedriver-win64/chromedriver.exe')
driver = webdriver.Chrome(service=service, options=option)

target_url = "https://migo.travel/Destination/vietnam-hanoi"

driver.get(url = target_url)
pagesource = driver.page_source

media_src_and_link_list = []
text_content_list = []

time.sleep(3)

## Banner
banner_imgs_src_list = []
swiper_banner = driver.find_elements(By.XPATH, value='/html/body/div[1]/main/div[2]/div[1]')
# Do swipe left
for i in range(0, 30):
    ActionChains(driver).click_and_hold(swiper_banner[0]).move_by_offset(-200 , -20).release().perform()
    time.sleep(0.05)
# Escape the image gallery view
ActionChains(driver).send_keys(Keys.ESCAPE).perform()
# get all banner images source
banner_imgs = swiper_banner[0].find_elements(By.TAG_NAME, 'img')
for banner_img in banner_imgs:
    banner_imgs_src = banner_imgs[0].get_property('src')
    banner_imgs_src_list.append(banner_imgs_src)
# add to media list
media_src_and_link_list += banner_imgs_src_list

## about-container
about_container = driver.find_elements(By.XPATH,'/html/body/div[1]/main/div[2]/div[4]')
about_container_papragraph = [p.text for p in about_container[0].find_elements(By.TAG_NAME, 'p')]
about_container_href = [h.get_property('href') for h in about_container[0].find_elements(By.TAG_NAME, 'a')]
about_container_imgs = [h.get_property('src') for h in about_container[0].find_elements(By.TAG_NAME, 'img')]
text_content_list += about_container_papragraph
media_src_and_link_list += about_container_href
media_src_and_link_list += about_container_imgs

## Eplore section
expore_section = driver.find_elements(By.ID, 'lstExploreEvent')[-1]
list_attraction = expore_section.find_elements(By.CLASS_NAME, 'list-attraction')[0]
list_attraction_text = list_attraction.text
text_content_list.append(list_attraction_text)

list_attraction_imgs_src = []
prev_height = driver.execute_script("return arguments[0].scrollHeight",expore_section)
while True:
    # do scrolling
    driver.execute_script("arguments[0].scrollBy(0,arguments[0].scrollHeight)",expore_section)
    time.sleep(0.5)
    new_current_height = driver.execute_script("return arguments[0].scrollHeight",expore_section)
    print(new_current_height)
    if new_current_height - prev_height == 0:
        break
    prev_height = new_current_height

driver.execute_script("arguments[0].scrollIntoView()",list_attraction)
curr_height = 0
while curr_height <= prev_height:
    driver.execute_script("arguments[0].scrollBy(0,500)",expore_section)
    curr_height += 500
    time.sleep(0.05)
    list_attraction_imgs = list_attraction.find_elements(By.TAG_NAME,'img')
    list_attraction_imgs_lst = [i.get_property('src') for i in list_attraction_imgs if str(i.get_property('src')) != ""]
    list_attraction_imgs_src += list_attraction_imgs_lst

media_src_and_link_list += list_attraction_imgs_src

## Lastest stories
while True:
    try:
        loadMoreButton = driver.find_element(By.XPATH,'//*[@id="load-more"]')
        time.sleep(1)
        loadMoreButton.click()
    except Exception as e:
        break

stories_content_rows = driver.find_elements(By.XPATH,'/html/body/div[1]/main/div[2]/div[7]/div[2]')[0]
stories_content = stories_content_rows.find_elements(By.TAG_NAME,'a')

def unique_list(in_list):
    list_set = set(in_list)
    unique_list = (list(list_set))
    return unique_list

substory_text_href = [t.get_attribute('href') for t in stories_content]
substory_text_href_uniq = unique_list(substory_text_href)

substory_text_headers = " ".join([t.text for t in stories_content]).split("Read more")
substory_text_headers_uniq = unique_list(substory_text_headers)

text_content_list += substory_text_headers
media_src_and_link_list += substory_text_headers_uniq

stories_content_paragraph_container = driver.find_elements(By.XPATH,'/html/body/div[1]/main/div[2]/div[7]')[0]
stories_content_paragraph = stories_content_paragraph_container.find_elements(By.TAG_NAME,'div')
stories_content_paragraph_text = [t.text for t in stories_content_paragraph]
text_content_list += stories_content_paragraph_text

stories_content_imgs_section = stories_content_paragraph_container.find_elements(By.TAG_NAME,'img')
stories_content_imgs = [t.get_attribute('src') for t in stories_content_imgs_section]
media_src_and_link_list += stories_content_imgs

stories_content_vid_section = stories_content_paragraph_container.find_elements(By.TAG_NAME,'iframe')
stories_content_vid = [t.get_attribute('src') for t in stories_content_imgs_section]
media_src_and_link_list += stories_content_vid

## Footer
footer_section = driver.find_elements(By.TAG_NAME, 'footer')[0]
footer_text = footer_section.text
text_content_list.append(footer_text)

footer_section_a = footer_section.find_elements(By.TAG_NAME, 'a')
footer_section_a_href = [a.get_attribute('href') for a in footer_section_a]
media_src_and_link_list += footer_section_a_href

2504


In [133]:
text_content_list

['Hanoi has experienced a long history for more than 1000 years with 36 streets called Old Quarter. Hanoi nowadays is much more different than the past. The ancient city is being invigorated with modern cafes, bar, world-class restaurants and interesting art galleries.',
 '',
 '★ World Cultural Heritage Site Central Sector of the Imperial Citadel of Thang Long - Hanoi (2010)',
 'Best Time To Visit Hanoi',
 'The best time to visit the capital is around March, April when spring flowers bloom, and from August to November when it is autumn with cool and pleasant temperatures.',
 'Transport',
 'Noi Bai International Airport is 45km away from the city center. There are some means of transportation you can choose to get around the city, such as taxi, technology motorbike taxi, bus, or rental motorbikes. You should give a try on cyclo in the Old Quarter to leisurely go sightseeing.',
 '5 fine dining restaurants for the perfect dinner in Hanoi',
 'Late afternoon sunset on the Sky bar in Hanoi O

In [134]:
media_src_and_link_list

['https://files.migo.travel/20230508/hanoi---hang-rong---do-luu-niem-p0savsud.webp?f=plVDT9mpvkS0W-57KuxEZw',
 'https://files.migo.travel/20230508/hanoi---hang-rong---do-luu-niem-p0savsud.webp?f=plVDT9mpvkS0W-57KuxEZw',
 'https://files.migo.travel/20230508/hanoi---hang-rong---do-luu-niem-p0savsud.webp?f=plVDT9mpvkS0W-57KuxEZw',
 'https://files.migo.travel/20230508/hanoi---hang-rong---do-luu-niem-p0savsud.webp?f=plVDT9mpvkS0W-57KuxEZw',
 'https://files.migo.travel/20230508/hanoi---hang-rong---do-luu-niem-p0savsud.webp?f=plVDT9mpvkS0W-57KuxEZw',
 'https://files.migo.travel/20230508/hanoi---hang-rong---do-luu-niem-p0savsud.webp?f=plVDT9mpvkS0W-57KuxEZw',
 'https://files.migo.travel/20230508/hanoi---hang-rong---do-luu-niem-p0savsud.webp?f=plVDT9mpvkS0W-57KuxEZw',
 'https://files.migo.travel/20230508/hanoi---hang-rong---do-luu-niem-p0savsud.webp?f=plVDT9mpvkS0W-57KuxEZw',
 'https://files.migo.travel/20230508/hanoi---hang-rong---do-luu-niem-p0savsud.webp?f=plVDT9mpvkS0W-57KuxEZw',
 'https://