# Web Scraping with Selenium


This notebook demonstrates how to use Selenium to scrape articles related to "Maybank" and "CIMB" from five news portals (Malay Mail, The Edge Market, Business Today, The Star, and NST).
The process involves setting up the necessary libraries, configuring a headless Chrome browser, and, for some portals, scraping multiple pages concurrently.

#### 1. Install Selenium and WebDriver Manager 

These commands install the required libraries. Selenium is used to interact with the website, and webdriver-manager ensures the proper ChromeDriver version is installed automatically.

#### 2. Set Chrome Options
These options configure Chrome to run in headless (background) mode, without showing the browser window, and aim to speed up page loads by disabling images.
#### 3. Initialize WebDriver
This function initializes a Chrome WebDriver instance with the necessary options, including the headless mode.
#### 4. Scrape One Page
This function scrapes a single page of search results for articles matching "Maybank" or "CIMB". It navigates to the page, waits for the articles to load, and extracts the headline, date, and section of each article.
WebDriverWait pauses execution, for up to 15 seconds, until at least one element matching the required selector (.article-item) is present on the page.
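
The idea behind an explicit wait can be illustrated without a browser. The sketch below is a simplified, pure-Python polling loop, not Selenium's actual implementation; `wait_until`, `WaitTimeout`, and `element_present` are hypothetical names introduced only for this illustration.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition is not met in time (hypothetical name)."""


def wait_until(condition, timeout=15.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises WaitTimeout if `timeout` seconds pass without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Stand-in for "the element is present": truthy on the third check.
attempts = {"count": 0}


def element_present():
    attempts["count"] += 1
    return "article-item" if attempts["count"] >= 3 else None


found = wait_until(element_present, timeout=2.0, poll_interval=0.01)
print(found, attempts["count"])
```

The loop returns as soon as the condition holds, which is why an explicit wait is usually faster than a fixed `time.sleep` of the same maximum length.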
#### 5. Handling Extracted Data
The script extracts specific pieces of data for each article: the headline, date, and section.
#### 6. Parallel Scraping of Multiple Pages
ThreadPoolExecutor scrapes multiple pages concurrently, using five worker threads, which shortens the total run time. By default, the code scrapes 285 pages (max_pages=285).
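
As a rough, browser-free illustration of this pattern, the sketch below runs a stand-in scraping function over several pages in a thread pool and records which pages failed so they could be retried. `fake_scrape_page` and `scrape_pages_concurrently` are hypothetical names, and the simulated failure on page 3 is only there to show the error path.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_scrape_page(page_num):
    """Stand-in for a real page scraper: two dummy articles, failure on page 3."""
    if page_num == 3:
        raise RuntimeError("simulated page failure")
    return [{"headline": f"Article {i} on page {page_num}", "page": page_num}
            for i in range(2)]


def scrape_pages_concurrently(scrape_fn, pages, max_workers=5):
    """Run scrape_fn over pages in a thread pool; return (articles, failed_pages)."""
    results = {}
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_page = {executor.submit(scrape_fn, p): p for p in pages}
        for future in as_completed(future_to_page):
            page = future_to_page[future]
            try:
                results[page] = future.result()
            except Exception:
                # Remember the page instead of aborting the whole run.
                failed.append(page)
    # Completion order is not deterministic, so reassemble in page order.
    articles = [a for page in sorted(results) for a in results[page]]
    return articles, sorted(failed)


articles, failed_pages = scrape_pages_concurrently(fake_scrape_page, range(1, 6))
print(len(articles), failed_pages)  # 8 [3]
```

In the notebook's own code, each page function catches its exceptions and returns an empty list, so failed pages are silently dropped; tracking them separately is a design choice of this sketch.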
#### 7. Saving the Data to CSV
After scraping, the script saves the collected articles into a CSV file using pandas. Each article is stored as a row, and columns represent the headline, date, and section.
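
The row-and-column layout can be previewed without pandas. The sketch below uses the standard-library `csv` module instead (a swapped-in technique, not what the notebook does) and writes to an in-memory buffer; `articles_to_csv` is a hypothetical helper and the sample rows are placeholders, not scraped data.

```python
import csv
import io

FIELDNAMES = ["headline", "date", "section"]


def articles_to_csv(articles, fieldnames=FIELDNAMES):
    """Serialize a list of article dicts to CSV text, one article per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(articles)
    return buffer.getvalue()


# Placeholder rows for illustration only.
sample = [
    {"headline": "Placeholder headline, with a comma", "date": "2025-01-01", "section": "Money"},
    {"headline": "Another placeholder", "date": "2025-01-02", "section": ""},
]

csv_text = articles_to_csv(sample)

# Reading the text back shows that the comma inside the headline survives,
# because the writer quotes fields that contain the delimiter.
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]["headline"])
```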

In [3]:
!pip install selenium



In [7]:
!pip install webdriver-manager



#### Web Scraping Malay Mail

In [5]:
import time
import random
from concurrent.futures import ThreadPoolExecutor
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd

# Set up Chrome options
options = Options()
options.add_argument("--headless")  # Remove for visual debugging
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")
options.add_argument("--blink-settings=imagesEnabled=false")  # Skip image loading to speed up pages
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0 Safari/537.36")

# Initialize WebDriver
def initialize_driver():
    return webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

# Scrape one page
def scrape_page(page_num):
    url = f"https://www.malaymail.com/search?query=Maybank&pgno={page_num}"
    driver = initialize_driver()
    articles = []

    print(f"\n🔄 Scraping Page {page_num} - {url}")
    try:
        driver.get(url)

        # Wait for articles to load
        WebDriverWait(driver, 15).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '.article-item'))
        )

        # Extract articles
        containers = driver.find_elements(By.CSS_SELECTOR, '.col-md-3.article-item')
        for container in containers:
            try:
                # Extract headline
                headline = container.find_element(By.CSS_SELECTOR, 'h2.article-title a').text.strip()
                
                # Extract date
                date = container.find_element(By.CSS_SELECTOR, '.article-date').text.strip()
                
                # Extract section (optional)
                section = container.find_element(By.CSS_SELECTOR, '.article-section').text.strip()

                # Append to articles list
                if headline and date:
                    articles.append({
                        'headline': headline,
                        'date': date,
                        'section': section
                    })
            except Exception as e:
                print(f"⚠️ Error extracting article: {e}")
                continue

        time.sleep(random.uniform(1, 2))  # Avoid rate limiting

    except TimeoutException:
        print("❌ Timeout: No articles found on this page.")
        return []
    except Exception as e:
        print(f"❌ Error scraping page {page_num}: {e}")
        return []
    finally:
        driver.quit()

    print(f"✅ Found {len(articles)} articles on Page {page_num}")
    return articles

# Scrape multiple pages in parallel
def scrape_all_pages(max_pages=285):  # Lower this (e.g. to 3) for a quick test run
    all_articles = []

    # Use ThreadPoolExecutor to scrape multiple pages in parallel
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(scrape_page, page) for page in range(1, max_pages + 1)]
        for future in futures:
            articles = future.result()
            all_articles.extend(articles)

    return all_articles

# Run the scraper
all_articles = scrape_all_pages(max_pages=285)  # Scrape pages 1 to 285

# Save to CSV
if all_articles:
    df = pd.DataFrame(all_articles)
    df.to_csv('MAYBANK_MM.csv', index=False)
    print(f"\n✅ Final Result: {len(all_articles)} articles saved to CSV.")
else:
    print("❌ No articles found.")


🔄 Scraping Page 4 - https://www.malaymail.com/search?query=Maybank&pgno=4

🔄 Scraping Page 1 - https://www.malaymail.com/search?query=Maybank&pgno=1

🔄 Scraping Page 2 - https://www.malaymail.com/search?query=Maybank&pgno=2

🔄 Scraping Page 3 - https://www.malaymail.com/search?query=Maybank&pgno=3

🔄 Scraping Page 5 - https://www.malaymail.com/search?query=Maybank&pgno=5
✅ Found 20 articles on Page 4
✅ Found 20 articles on Page 1
✅ Found 20 articles on Page 2
✅ Found 20 articles on Page 3
✅ Found 20 articles on Page 5

🔄 Scraping Page 8 - https://www.malaymail.com/search?query=Maybank&pgno=8

🔄 Scraping Page 7 - https://www.malaymail.com/search?query=Maybank&pgno=7

🔄 Scraping Page 9 - https://www.malaymail.com/search?query=Maybank&pgno=9

🔄 Scraping Page 10 - https://www.malaymail.com/search?query=Maybank&pgno=10

🔄 Scraping Page 6 - https://www.malaymail.com/search?query=Maybank&pgno=6
✅ Found 20 articles on Page 9
✅ Found 20 articles on Page 8
✅ Found 20 articles on Page 6
✅ Found 

#### Web Scraping The Edge Market

In [11]:
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

# Set up Selenium WebDriver with headless mode
options = Options()
options.add_argument("--headless")
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")

# Initialize the driver
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

# Function to get all the articles from a page
def get_articles(page_num):
    # URL pattern with pagination
    url = f'https://theedgemalaysia.com/news-search-results?keywords=Maybank%20&to=2025-06-14&from=2019-01-01&language=english&offset={page_num * 10}'
    driver.get(url)

    # Increase the timeout wait time for content to load
    try:
        print(f"Waiting for page {page_num} to load...")
        WebDriverWait(driver, 40).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '.NewsList_newsListItemHead__dg7eK'))  # Waiting for the headline element
        )
    except Exception as e:
        print(f"Error: Timeout while waiting for page {page_num} to load - {e}")
        return []

    # Wait a little extra to ensure the content is fully loaded
    time.sleep(2)

    articles = []

    # Find all article containers using the correct selectors
    article_containers = driver.find_elements(By.CSS_SELECTOR, '.NewsList_newsListItemWrap__XovMP') 

    if not article_containers:
        print(f"Warning: No articles found on page {page_num}")

    for article in article_containers:
        try:
            # Extract headline and date using the correct CSS selectors
            headline = article.find_element(By.CSS_SELECTOR, '.NewsList_newsListItemHead__dg7eK').text.strip()
            date = article.find_element(By.CSS_SELECTOR, '.NewsList_infoNewsListSub__Ui2_Z').text.strip()
            category = article.find_element(By.CSS_SELECTOR, '.NewsList_newsListTag__TGHJ_').text.strip()

            # Append the data to the list
            articles.append({
                'headline': headline,
                'date': date,
                'category' : category
            })
        except Exception as e:
            print(f"Error extracting data from article on page {page_num}: {e}")
            continue  # Skip articles with missing data

    return articles

# Loop through result pages 0 to 418 (offsets 0, 10, ..., 4180)
all_articles = []
for page in range(419):  # Change to 2 to scrape only the first 2 pages
    print(f"Scraping page {page + 1}...")
    articles = get_articles(page)
    all_articles.extend(articles)

# Check if any articles were scraped
if all_articles:
    # Save the data to a CSV file
    df = pd.DataFrame(all_articles)
    df.to_csv('Maybank_TEM.csv', index=False)
    print("Scraping complete. Data saved to 'Maybank_TEM.csv'.")
else:
    print("No articles found. Please check the selectors and ensure the pages are being scraped correctly.")

# Close the driver
driver.quit()

Scraping page 1...
Waiting for page 0 to load...
Error extracting data from article on page 0: Message: no such element: Unable to locate element: {"method":"css selector","selector":".NewsList_newsListTag__TGHJ_"}
  (Session info: chrome=137.0.7151.104); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
	GetHandleVerifier [0x0xc63783+63299]
	GetHandleVerifier [0x0xc637c4+63364]
	(No symbol) [0x0xa91113]
	(No symbol) [0x0xad987e]
	(No symbol) [0x0xad9c1b]
	(No symbol) [0x0xacefb1]
	(No symbol) [0x0xafe5c4]
	(No symbol) [0x0xaceed4]
	(No symbol) [0x0xafe7f4]
	(No symbol) [0x0xb1fa4a]
	(No symbol) [0x0xafe376]
	(No symbol) [0x0xacd6e0]
	(No symbol) [0x0xace544]
	GetHandleVerifier [0x0xebe073+2531379]
	GetHandleVerifier [0x0xeb9372+2511666]
	GetHandleVerifier [0x0xc89efa+220858]
	GetHandleVerifier [0x0xc7a548+156936]
	GetHandleVerifier [0x0xc80c7d+183357]
	GetHandleVerifier [0x0xc6b6
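
The log above shows an article being skipped because its category tag (.NewsList_newsListTag__TGHJ_) could not be found, even though the headline and date may have been available. One possible way to keep such articles is to treat a field as optional and fall back to a default value. The sketch below illustrates the idea with a stand-in element class so that it runs without a browser; `FakeElement`, `safe_text`, and the sample selectors and texts are hypothetical.

```python
class FakeElement:
    """Minimal stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, text="", children=None):
        self.text = text
        self._children = children or {}

    def find_element(self, by, selector):
        # Selenium raises NoSuchElementException here; LookupError plays that role.
        if selector not in self._children:
            raise LookupError(f"no such element: {selector}")
        return self._children[selector]


def safe_text(container, selector, default="", by="css selector"):
    """Return the stripped text of the first match, or `default` if the lookup fails.

    With real Selenium, narrowing the except clause to NoSuchElementException
    would avoid hiding unrelated errors.
    """
    try:
        return container.find_element(by, selector).text.strip()
    except Exception:
        return default


# Placeholder article that has a headline and date but no category tag.
article = FakeElement(children={
    ".headline": FakeElement("  Bank posts higher profit  "),
    ".date": FakeElement("14 Jun 2025"),
})

record = {
    "headline": safe_text(article, ".headline"),
    "date": safe_text(article, ".date"),
    "category": safe_text(article, ".tag", default="N/A"),
}
print(record)
```

With a helper like this, only a missing required field (for example the headline) would need to cause an article to be skipped.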

#### Web Scraping Business Today

In [17]:
from concurrent.futures import ThreadPoolExecutor

# Set up Selenium WebDriver with headless mode
options = Options()
options.add_argument("--headless")
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")
options.add_argument("--disable-extensions")
options.add_argument("--blink-settings=imagesEnabled=false")  # Disable images to speed up loading

# Initialize the driver (this will be used for each thread)
def initialize_driver():
    return webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

# Function to get all the articles from a page
def get_articles(page_num):
    # URL pattern with pagination (20 articles per page)
    url = f'https://www.businesstoday.com.my/page/{page_num + 1}/?s=Maybank'  # Correct pagination with page={page_num + 1}
    driver = initialize_driver()
    driver.get(url)

    # Increase the timeout wait time for content to load
    try:
        print(f"Waiting for page {page_num + 1} to load... ({url})")
        WebDriverWait(driver, 30).until(  # Reduced wait time to 30 seconds for faster load
            EC.presence_of_element_located((By.CSS_SELECTOR, 'h3.entry-title.td-module-title'))  # Waiting for the headline element
        )
    except Exception as e:
        print(f"Error: Timeout while waiting for page {page_num + 1} to load - {e}")
        driver.quit()
        return []

    # Reduce sleep time
    time.sleep(1)  # Reduced sleep time to 1 second

    articles = []

    # Find all article containers using the correct selectors
    article_containers = driver.find_elements(By.CSS_SELECTOR, 'div.tdb_module_loop.td_module_wrap')  # Adjusted class for article containers

    if not article_containers:
        print(f"Warning: No articles found on page {page_num + 1}")
    else:
        print(f"Found {len(article_containers)} articles on page {page_num + 1}")

    for article in article_containers:
        try:
            # Extract headline using the correct CSS selectors
            headline = article.find_element(By.CSS_SELECTOR, 'h3.entry-title.td-module-title').text.strip()
            # Extract date using the correct CSS selectors
            date = article.find_element(By.CSS_SELECTOR, 'time.entry-date.updated.td-module-date').text.strip()
            category = article.find_element(By.CSS_SELECTOR, 'a.td-post-category').text.strip()

            # Append the data to the list
            articles.append({
                'headline': headline,
                'date': date,
                'category': category
            })
        except Exception as e:
            print(f"Error extracting data from article on page {page_num + 1}: {e}")
            continue  # Skip articles with missing data

    driver.quit()
    return articles

# Function to scrape multiple pages concurrently
def scrape_pages(start_page, end_page):
    all_articles = []
    with ThreadPoolExecutor(max_workers=5) as executor:
        # Run scraping tasks in parallel for pages
        futures = [executor.submit(get_articles, page) for page in range(start_page, end_page)]
        
        for future in futures:
            articles = future.result()
            if articles:
                all_articles.extend(articles)

    return all_articles

# Scrape the first 112 pages in parallel
all_articles = scrape_pages(0, 112)

# Check if any articles were scraped
if all_articles:
    # Save the data to a CSV file
    df = pd.DataFrame(all_articles)
    df.to_csv('Maybank_BT.csv', index=False)
    print("Scraping complete. Data saved to 'Maybank_BT.csv'.")
else:
    print("No articles found. Please check the selectors and ensure the pages are being scraped correctly.")

Waiting for page 5 to load... (https://www.businesstoday.com.my/page/5/?s=Maybank)
Waiting for page 4 to load... (https://www.businesstoday.com.my/page/4/?s=Maybank)
Waiting for page 3 to load... (https://www.businesstoday.com.my/page/3/?s=Maybank)
Waiting for page 2 to load... (https://www.businesstoday.com.my/page/2/?s=Maybank)
Waiting for page 1 to load... (https://www.businesstoday.com.my/page/1/?s=Maybank)
Found 20 articles on page 3
Found 20 articles on page 5
Found 20 articles on page 2
Found 20 articles on page 4
Found 20 articles on page 1
Waiting for page 7 to load... (https://www.businesstoday.com.my/page/7/?s=Maybank)Waiting for page 6 to load... (https://www.businesstoday.com.my/page/6/?s=Maybank)

Waiting for page 9 to load... (https://www.businesstoday.com.my/page/9/?s=Maybank)
Waiting for page 10 to load... (https://www.businesstoday.com.my/page/10/?s=Maybank)
Found 20 articles on page 7
Found 20 articles on page 6
Found 20 articles on page 9
Found 20 articles on page 1

#### Web Scraping The Star

##### 1. Handling Cookie Consent
This block checks whether a cookie consent prompt appears on the page. If it does, the script waits up to 10 seconds for the Accept button to become clickable and clicks it; if no button appears, the script simply carries on. Dismissing the prompt keeps the banner from obstructing the rest of the page.
##### 2. Handling Pagination
The script waits for the Next Page button to appear, then clicks it using JavaScript. This sidesteps UI issues such as overlapping elements.
It keeps scraping page after page until no Next Page button is found; the code in this section sets no maximum page limit.
##### 3. Handling Timeout and Errors
TimeoutException: If the Next Page button is not found within 10 seconds, the script assumes there are no more pages to scrape, and the loop breaks.
Any other exception is caught and printed for debugging purposes.
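
A click-through pagination loop can also be given an optional page cap, which guards against looping forever if a Next Page button keeps appearing. The sketch below shows one way to structure such a loop, assuming the page-specific work is passed in as two callables; `paginate`, `FakeSite`, and the `max_pages` parameter are hypothetical and run here against an in-memory stand-in rather than a browser.

```python
def paginate(scrape_current_page, go_to_next_page, max_pages=None):
    """Collect items page by page until there is no next page or max_pages is hit.

    scrape_current_page() -> list of items on the current page
    go_to_next_page()     -> True if navigation succeeded, False if no next page
    Returns (items, number_of_pages_visited).
    """
    items = []
    page = 1
    while True:
        items.extend(scrape_current_page())
        if max_pages is not None and page >= max_pages:
            break
        if not go_to_next_page():
            break
        page += 1
    return items, page


class FakeSite:
    """In-memory stand-in for a paginated search result (illustration only)."""

    def __init__(self, pages):
        self.pages = pages
        self.index = 0

    def scrape(self):
        return list(self.pages[self.index])

    def next(self):
        if self.index + 1 >= len(self.pages):
            return False
        self.index += 1
        return True


site = FakeSite([["a", "b"], ["c"], ["d", "e"]])
items, pages_visited = paginate(site.scrape, site.next, max_pages=2)
print(items, pages_visited)  # ['a', 'b', 'c'] 2
```

With Selenium, `go_to_next_page` would wrap the wait-and-click step and return False on a TimeoutException.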

In [25]:
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd

# Set up Chrome options
options = Options()
options.add_argument("--headless")  # Remove for visual debugging
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")
options.add_argument("--blink-settings=imagesEnabled=false")  # Skip image loading to speed up pages
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0 Safari/537.36")

# Initialize WebDriver
def initialize_driver():
    return webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

def scrape_all_articles():
    base_url = "https://www.thestar.com.my/search?query=Maybank"
    driver = initialize_driver()
    articles = []
    current_page = 1

    try:
        driver.get(base_url)

        # Accept cookie consent 
        try:
            cookie_btn = WebDriverWait(driver, 10).until(
                EC.element_to_be_clickable((By.XPATH, '//button[contains(text(), "Accept")]'))
            )
            cookie_btn.click()
        except Exception:
            pass

        while True:
            print(f"\n🔄 Scraping Page {current_page}")

            # Wait for articles to load
            WebDriverWait(driver, 30).until(
                EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.queryly_item_container'))
            )

            # Extract articles
            containers = driver.find_elements(By.CSS_SELECTOR, 'div.queryly_item_container')
            for container in containers:
                try:
                    headline = container.find_element(By.CSS_SELECTOR, 'h2.f18 > a').text.strip()
                    date = container.find_element(By.CSS_SELECTOR, '.timestamp').text.strip()
                    category = container.find_element(By.CSS_SELECTOR, '.kicker').text.strip()
                    articles.append({'headline': headline, 'date': date, 'category': category})
                except Exception:
                    continue

            # Print progress
            print(f"✅ Scraped {len(articles)} articles so far (Page {current_page})")

            # Click "Next Page"
            try:
                next_btn = WebDriverWait(driver, 10).until(
                    EC.presence_of_element_located((By.CSS_SELECTOR, 'a.next_btn'))
                )
                driver.execute_script("arguments[0].click();", next_btn)
                current_page += 1
                time.sleep(2)
            except TimeoutException:
                print("❌ No more pages available.")
                break

    except Exception as e:
        print(f"Error during scraping: {e}")
    finally:
        driver.quit()

    return articles

# Run the scraper
all_articles = scrape_all_articles()

# Save to CSV
if all_articles:
    df = pd.DataFrame(all_articles)
    df.to_csv('Maybank_TS.csv', index=False)
    print(f"\n✅ Final Result: {len(all_articles)} articles saved to CSV.")
else:
    print("❌ No articles found.")


🔄 Scraping Page 1
✅ Scraped 20 articles so far (Page 1)

🔄 Scraping Page 2
✅ Scraped 40 articles so far (Page 2)

🔄 Scraping Page 3
✅ Scraped 60 articles so far (Page 3)

🔄 Scraping Page 4
✅ Scraped 80 articles so far (Page 4)

🔄 Scraping Page 5
✅ Scraped 100 articles so far (Page 5)

🔄 Scraping Page 6
✅ Scraped 120 articles so far (Page 6)

🔄 Scraping Page 7
✅ Scraped 140 articles so far (Page 7)

🔄 Scraping Page 8
✅ Scraped 160 articles so far (Page 8)

🔄 Scraping Page 9
✅ Scraped 180 articles so far (Page 9)

🔄 Scraping Page 10
✅ Scraped 200 articles so far (Page 10)

🔄 Scraping Page 11
✅ Scraped 220 articles so far (Page 11)

🔄 Scraping Page 12
✅ Scraped 240 articles so far (Page 12)

🔄 Scraping Page 13
✅ Scraped 260 articles so far (Page 13)

🔄 Scraping Page 14
✅ Scraped 280 articles so far (Page 14)

🔄 Scraping Page 15
✅ Scraped 300 articles so far (Page 15)

🔄 Scraping Page 16
✅ Scraped 320 articles so far (Page 16)

🔄 Scraping Page 17
✅ Scraped 340 articles so far (Page 17)

🔄

#### Web Scraping NST

In [15]:
import time
import random
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd

# Set up Chrome options
options = Options()
options.add_argument("--headless")  # Remove for visual debugging
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")
options.add_argument("--blink-settings=imagesEnabled=false")  # Skip image loading to speed up pages
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0 Safari/537.36")

# Initialize WebDriver
def initialize_driver():
    return webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

# Scrape a single page
def scrape_page(page_num):
    url = f"https://www.nst.com.my/search?keywords=Maybank%20group&page={page_num}"
    driver = initialize_driver()
    articles = []

    print(f"\n🔄 Scraping Page {page_num} - {url}")
    try:
        driver.get(url)

        # Wait for articles to load 
        WebDriverWait(driver, 15).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '.article-teaser'))
        )

        # Check for "No Results" message
        try:
            no_results = driver.find_element(By.CSS_SELECTOR, '.view-empty')
            if no_results.is_displayed():
                print("❌ No more articles found.")
                return []
        except Exception:  # No "No Results" element on the page; continue scraping
            pass

        # Extract articles
        containers = driver.find_elements(By.CSS_SELECTOR, '.article-teaser')
        for container in containers:
            try:
                headline = container.find_element(By.CSS_SELECTOR, 'h6.field-title').text.strip()
                date = container.find_element(By.CSS_SELECTOR, '.created-ago').text.strip()
                category = container.find_element(By.CSS_SELECTOR, '.field-category').text.strip()
                articles.append({'headline': headline, 'date': date, 'category': category})
            except Exception as e:
                print(f"⚠️ Error extracting article: {e}")
                continue

        time.sleep(random.uniform(1, 2))  # Avoid rate limiting

    except TimeoutException:
        print("❌ Timeout: No articles found on this page.")
        return []
    except Exception as e:
        print(f"❌ Error scraping page {page_num}: {e}")
        return []
    finally:
        driver.quit()

    print(f"✅ Found {len(articles)} articles on Page {page_num}")
    return articles

# Scrape all pages until no more articles
def scrape_all_pages():
    all_articles = []
    page_num = 0

    while True:
        articles = scrape_page(page_num)

        if not articles:
            print(f"🛑 Stopping: No articles found on Page {page_num}")
            break

        all_articles.extend(articles)
        page_num += 1

    return all_articles

# Run the scraper
all_articles = scrape_all_pages()

# Save to CSV
if all_articles:
    df = pd.DataFrame(all_articles)
    df.to_csv('MaybankG_NST.csv', index=False)
    print(f"\n✅ Final Result: {len(all_articles)} articles saved to CSV.")
else:
    print("❌ No articles found.")


🔄 Scraping Page 0 - https://www.nst.com.my/search?keywords=Maybank%20group&page=0
⚠️ Error extracting article: Message: no such element: Unable to locate element: {"method":"css selector","selector":"h6.field-title"}
  (Session info: chrome=137.0.7151.104); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
	GetHandleVerifier [0x0xef3783+63299]
	GetHandleVerifier [0x0xef37c4+63364]
	(No symbol) [0x0xd21113]
	(No symbol) [0x0xd6987e]
	(No symbol) [0x0xd69c1b]
	(No symbol) [0x0xd5efb1]
	(No symbol) [0x0xd8e5c4]
	(No symbol) [0x0xd5eed4]
	(No symbol) [0x0xd8e7f4]
	(No symbol) [0x0xdafa4a]
	(No symbol) [0x0xd8e376]
	(No symbol) [0x0xd5d6e0]
	(No symbol) [0x0xd5e544]
	GetHandleVerifier [0x0x114e073+2531379]
	GetHandleVerifier [0x0x1149372+2511666]
	GetHandleVerifier [0x0xf19efa+220858]
	GetHandleVerifier [0x0xf0a548+156936]
	GetHandleVerifier [0x0xf10c7d+183357]
	GetHandleVerifier [0x0

# Scraping Historical Stock Data for CIMB Group and Maybank (2019–2025)

#### 1. Install libraries
Installs yfinance (for fetching stock data) and pandas (for saving the data in CSV format).

#### 2. Define stock tickers
Ticker symbols for CIMB Group (1023.KL) and Maybank (1155.KL) are defined in a dictionary.

#### 3. Set date range
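The code requests stock data from January 1, 2019 up to December 31, 2025. Note that yfinance treats the end date as exclusive, so the last trading day returned is the one before the end date.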

#### 4. Download stock data
The function get_stock_data uses yfinance to download the stock data for each bank over the specified date range.

#### 5. Save to CSV
For each bank, the code fetches the data and saves it into a CSV file. The file is named based on the bank's name and the date range.
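
The notebook hardcodes the years in the file name. As a small sketch of an alternative, the helper below derives them from the date strings, so the name stays correct if the range changes; `build_file_name` is a hypothetical function, not part of the notebook.

```python
from datetime import date


def build_file_name(bank, start, end):
    """Build a CSV file name from the bank name and ISO-format start/end dates."""
    start_year = date.fromisoformat(start).year
    end_year = date.fromisoformat(end).year
    slug = bank.replace(" ", "_").lower()
    return f"stock_prices_{slug}_{start_year}_{end_year}.csv"


print(build_file_name("CIMB Group", "2019-01-01", "2025-12-31"))
# stock_prices_cimb_group_2019_2025.csv
```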

In [17]:
!pip install yfinance pandas

Collecting yfinance
  Downloading yfinance-0.2.63-py2.py3-none-any.whl.metadata (5.8 kB)
Collecting multitasking>=0.0.7 (from yfinance)
  Downloading multitasking-0.0.11-py3-none-any.whl.metadata (5.5 kB)
Collecting peewee>=3.16.2 (from yfinance)
  Downloading peewee-3.18.1.tar.gz (3.0 MB)

In [19]:
import yfinance as yf
import pandas as pd

# Define stock ticker symbols for CIMB Group and Maybank
tickers = {
    'CIMB Group': '1023.KL',
    'Maybank': '1155.KL'
}

# Define the date range for historical data
start_date = '2019-01-01'
end_date = '2025-12-31'

# Function to download stock data
def get_stock_data(ticker, start, end):
    stock = yf.Ticker(ticker)
    data = stock.history(start=start, end=end)
    return data

# Download data for each stock and save it in a separate CSV
for bank, ticker in tickers.items():
    print(f"Downloading data for {bank}...")
    data = get_stock_data(ticker, start_date, end_date)
    
    # Save each stock's data to a separate CSV file
    file_name = f"stock_prices_{bank.replace(' ', '_').lower()}_2019_2025.csv"
    data.to_csv(file_name)
    
    print(f"Data for {bank} saved successfully to '{file_name}'.")

Downloading data for CIMB Group...
Data for CIMB Group saved successfully to 'stock_prices_cimb_group_2019_2025.csv'.
Downloading data for Maybank...
Data for Maybank saved successfully to 'stock_prices_maybank_2019_2025.csv'.
