## Scraping the Most Active Stocks from YahooFinance

[Website](https://finance.yahoo.com/markets/stocks/most-active/)

### Importing necessary Libraries
- pandas for data handling, cleaning, manipulation and analysis.

- selenium automates web browsers and user interaction like clicking buttons or dynamically waiting for items to load. Also for scraping sites that requires JavaScript rendering.
    - Service manages the ChromeDriver service for Selenium to interact with the Chrome browse

    - By provides methods to locate elements on a webpage (e.g., by ID, name, class name, etc.).
    - WebDriverWait explicitly waits for specific conditions to be met before proceeding with browser actions.
    - expected_conditions provides a collection of pre-built conditions for WebDriverWait (e.g., element visibility, clickability).
    - Select simplifies interactions with _select_ HTML elements, like selecting options from dropdowns by visible text, index, or value
- time provides time-related functions like adding delays (e.g., time.sleep()), and working with timestamps, or measuring execution time.
- tqdm for visualizing the progress of loops in data processing or web scraping.
- json for serializing Python objects into JSON format.

In [2]:
import pandas as pd 
import json
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
import pandas as pd
import time

### Scraping the Most Active Stocks data

This webpage contains a table that is rendered with Javascript and dynamic pagination. Selenium enables us to loop through each page by clicking the next button after each iteration.

In [7]:
stock = []

path = "C:/Users/HP/Downloads/chromedriver-win64/chromedriver.exe"

service = Service(path)
driver = webdriver.Chrome(service=service)

driver.set_page_load_timeout(120)

try: 
    url = 'https://finance.yahoo.com/markets/stocks/most-active/?start=0&count=25' 
    driver.get(url) 
    next_button = WebDriverWait(driver,10).until(
            EC.element_to_be_clickable((By.XPATH, "//button[@data-testid='next-page-button']"))
        )

    while next_button.is_enabled()==True: 
        time.sleep(5)

        pager = driver.find_elements(By.TAG_NAME, 'tbody')

        for page in pager:
            rows = page.find_elements(By.TAG_NAME, 'tr')

            for row in rows:
                try:
                    data = row.find_elements(By.TAG_NAME, 'td')

                    row_data = {
                            "Symbol": data[0].text.strip(),
                            "Name": data[1].text.strip(),
                            "Price": data[3].text.strip(),
                            "Change": data[4].text.strip(),
                            "Change %": data[5].text.strip(),
                            "Volume": data[6].text.strip(),
                            "Avg Vol (3M)": data[7].text.strip(),
                            "Market Cap": data[8].text.strip(),
                            "P/E Ratio (TTM)": data[9].text.strip(),
                            "52 Wk Change %": data[10].text.strip()
                        }
                    stock.append(row_data) 

                except Exception as e:
                        print(f"Error parsing table: {e}")
                
        next_button.click()

        time.sleep(5)
except:
    pass

finally:
     driver.quit()

In [8]:
j_path = "Most Active Stocks.json"

with open(j_path, 'w') as file:
    json.dump(stock, file, indent=4)

print(f"Data successfully saved to {j_path}")

Data successfully saved to Most Active Stocks.json


In [9]:
df = pd.DataFrame(stock)
df.to_csv('Most Active Stocks.csv')

In [10]:
print(df.head(), df.tail())

  Symbol                     Name   Price   Change  Change %   Volume  \
0   INTC        Intel Corporation   21.13    +1.46    +7.42%  62.937M   
1   NVDA       NVIDIA Corporation  136.70    +3.13    +2.34%  62.278M   
2   RGTI  Rigetti Computing, Inc.   10.55    -0.69    -6.18%  55.451M   
3   PLUG          Plug Power Inc.  2.5172  -0.2228  -8.1314%  41.424M   
4   TSLA              Tesla, Inc.  428.65   +14.83    +3.58%  27.332M   

  Avg Vol (3M) Market Cap P/E Ratio (TTM) 52 Wk Change %  
0      69.633M    91.344B               -        -59.15%  
1     212.603M     3.352T           53.89        124.52%  
2     114.853M     2.964B               -      1,012.87%  
3      64.795M     2.294B               -          2.24%  
4      93.352M     1.376T          117.80         95.02%      Symbol                     Name   Price   Change  Change %  Volume  \
45    HLN               Haleon plc    9.29    +0.10    +1.03%   6.08M   
46   GRAB    Grab Holdings Limited  4.5700  +0.1200  +2.6936%