## Scraping the Most Active Stocks from YahooFinance

[Website](https://finance.yahoo.com/markets/stocks/most-active/)

### Importing necessary Libraries
- pandas for data handling, cleaning, manipulation and analysis.

- selenium automates web browsers and user interaction like clicking buttons or dynamically waiting for items to load. Also for scraping sites that requires JavaScript rendering.
    - Service manages the ChromeDriver service for Selenium to interact with the Chrome browse

    - By provides methods to locate elements on a webpage (e.g., by ID, name, class name, etc.).
    - WebDriverWait explicitly waits for specific conditions to be met before proceeding with browser actions.
    - expected_conditions provides a collection of pre-built conditions for WebDriverWait (e.g., element visibility, clickability).
    - Select simplifies interactions with _select_ HTML elements, like selecting options from dropdowns by visible text, index, or value
- time provides time-related functions like adding delays (e.g., time.sleep()), and working with timestamps, or measuring execution time.
- tqdm for visualizing the progress of loops in data processing or web scraping.
- json for serializing Python objects into JSON format.

In [None]:
import pandas as pd 
import json
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
import pandas as pd
import time

### Scraping the Most Active Stocks data

This webpage contains a table that is rendered with Javascript and dynamic pagination. Selenium enables us to loop through each page by clicking the next button after each iteration.

In [None]:
stock = []

path = "C:/Users/HP/Downloads/chromedriver-win64/chromedriver.exe"

service = Service(path)
driver = webdriver.Chrome(service=service)

driver.set_page_load_timeout(120)

try: 
    url = 'https://finance.yahoo.com/markets/stocks/most-active/?start=0&count=25' 
    driver.get(url) 
    next_button = WebDriverWait(driver,10).until(
            EC.element_to_be_clickable((By.XPATH, "//button[@data-testid='next-page-button']"))
        )

    while next_button.is_enabled()==True: 
        time.sleep(5)

        pager = driver.find_elements(By.TAG_NAME, 'tbody')

        for page in pager:
            rows = page.find_elements(By.TAG_NAME, 'tr')

            for row in rows:
                try:
                    data = row.find_elements(By.TAG_NAME, 'td')

                    row_data = {
                            "Symbol": data[0].text.strip(),
                            "Name": data[1].text.strip(),
                            "Price": data[3].text.strip(),
                            "Change": data[4].text.strip(),
                            "Change %": data[5].text.strip(),
                            "Volume": data[6].text.strip(),
                            "Avg Vol (3M)": data[7].text.strip(),
                            "Market Cap": data[8].text.strip(),
                            "P/E Ratio (TTM)": data[9].text.strip(),
                            "52 Wk Change %": data[10].text.strip()
                        }
                    stock.append(row_data) 

                except Exception as e:
                        print(f"Error parsing table: {e}")
                
        next_button.click()

        time.sleep(5)
except:
    pass

finally:
     driver.quit()

In [4]:
j_path = "Most Active Stocks.json"

with open(j_path, 'w') as file:
    json.dump(stock, file, indent=4)

print(f"Data successfully saved to {j_path}")

Data successfully saved to Most Active Stocks.json


In [5]:
df = pd.DataFrame(stock)
df.to_csv('Most Active Stocks.csv')

In [6]:
print(df.head(), df.tail())

  Symbol                     Name   Price   Change Change %    Volume  \
0   NVDA       NVIDIA Corporation  116.66    -3.41   -2.84%   363.76M   
1   RGTI  Rigetti Computing, Inc.   13.47    +0.30   +2.28%  130.252M   
2      F       Ford Motor Company    9.89    -0.19   -1.88%  130.799M   
3   TSLA              Tesla, Inc.  383.68   -20.92   -5.17%   92.486M   
4   LCID        Lucid Group, Inc.  2.8000  +0.0400   +1.45%   85.832M   

  Avg Vol (3M) Market Cap P/E Ratio (TTM) 52 Wk Change %  
0     238.753M     2.857T           46.11         73.18%  
1     140.465M     3.772B               -      1,016.10%  
2      60.035M    39.306B           11.24        -13.03%  
3      90.802M     1.234T          189.94        123.46%  
4       88.54M     8.433B               -        -14.29%       Symbol                Name   Price Change Change %  Volume Avg Vol (3M)  \
270    TGT  Target Corporation  134.16  -3.75   -2.72%  5.182M       6.901M   
271   DKNG     DraftKings Inc.   41.39  -0.56   -