# Scrape Screener
## Jupyter notebook that scrapes stock data from Screener.in and writes one Excel workbook per stock

* Sheet 1: Top Ratios 
* Sheet 2: Quarterly Results
* Sheet 3: Profit & Loss
* Sheet 4: Compounded Sales Growth
* Sheet 5: Compounded Profit Growth
* Sheet 6: Stock Price CAGR 
* Sheet 7: Return on Equity
* Sheet 8: Balance Sheet
* Sheet 9: Cash Flows
* Sheet 10: Ratios
* Sheet 11: Shareholding Pattern

### 1.1 Imports: 

In [1]:
import requests as rq
import pandas as pd 
import time
import datetime
from bs4 import BeautifulSoup
import re
import os


### 1.2 Read Stock List: 

In [2]:
df = pd.read_csv("psu-stocks-list.csv")
df.head(3)

|   | Screener Stock Symbol | Url Segment  |
|---|-----------------------|--------------|
| 0 | BANKBARODA            | Consolidated |
| 1 | SBIN                  | Consolidated |
| 2 | BPCL                  | Consolidated |
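
The `Url Segment` column decides which Screener.in page is loaded for each symbol. The mapping can be sketched as below, assuming the comparison should be case-insensitive, since the CSV stores `Consolidated` with a capital C. The helper name `build_screener_url` is hypothetical and not part of the notebook.

```python
BASE_URL = "https://www.screener.in/company/"

def build_screener_url(stock_symbol, url_segment):
    """Return the company page URL, using the consolidated view when requested."""
    # Hypothetical helper: normalise case and whitespace before comparing,
    # so "Consolidated", "consolidated" and " CONSOLIDATED " all match.
    if str(url_segment).strip().lower() == "consolidated":
        return BASE_URL + stock_symbol + "/consolidated/"
    # Anything else (including a missing value) falls back to the default page
    return BASE_URL + stock_symbol + "/"

print(build_screener_url("SBIN", "Consolidated"))
print(build_screener_url("MMTC", ""))
```

Keeping the URL logic in one small function makes the case handling easy to check without hitting the network.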


### 1.3 Scrape and create reports: 

In [3]:
# One output folder per run, named with a DDMMYYYYHHMMSS timestamp
dir_path = "output/"+datetime.datetime.now().strftime("%d%m%Y%H%M%S")+"/"

# exist_ok=True makes a separate existence check unnecessary
os.makedirs(dir_path, exist_ok=True)

for index, row in df.iterrows():
    stock_symbol = row["Screener Stock Symbol"]
    url_segment = row["Url Segment"]
    # The CSV stores "Consolidated" (capital C), so compare case-insensitively
    if str(url_segment).strip().lower()=="consolidated":
        url =  "https://www.screener.in/company/"+stock_symbol+"/consolidated/"
    else:
        url =  "https://www.screener.in/company/"+stock_symbol+"/"
    # print(url)
    print("Loading Page for ",stock_symbol)
    tables = pd.read_html(url)
    time.sleep(1) # Seconds
    # Get all Tables in separate dataframes
    df_quaterly_results         = tables[0] # Quarterly Results
    df_profit_n_loss            = tables[1] # Profit & Loss
    df_compounded_sales_growth  = tables[2] # Compounded Sales Growth
    df_compounded_profit_growth = tables[3] # Compounded Profit Growth
    df_stock_price_cagr         = tables[4] # Stock Price CAGR
    df_return_on_equity         = tables[5] # Return on Equity
    df_balance_sheet            = tables[6] # Balance Sheet
    df_cash_flows               = tables[7] # Cash Flows
    df_ratios                   = tables[8] # Ratios
    df_shareholding_pattern     = tables[9] # Shareholding Pattern
    
    # Cleanup table: Quarterly Results
    df_quaterly_results.rename(columns={'Unnamed: 0':'Quarterly Results'}, inplace=True)
    df_quaterly_results.replace("\u00A0\\+", "", regex=True, inplace=True)  # strip the non-breaking space and "+" after row labels
    
    # Cleanup table: Profit & Loss
    df_profit_n_loss.rename(columns={'Unnamed: 0':'Profit and Loss'}, inplace=True)
    df_profit_n_loss.replace("\u00A0\\+", "", regex=True, inplace=True)  # strip the non-breaking space and "+" after row labels

    # Cleanup table: Compounded Sales Growth
    df_compounded_sales_growth.replace(":", "", regex=True,inplace=True) 
    
    # Cleanup table: Compounded Profit Growth
    df_compounded_profit_growth.replace(":", "", regex=True,inplace=True) 
    
    # Cleanup table: Stock Price CAGR
    df_stock_price_cagr.replace(":", "", regex=True,inplace=True) 
    
    # Cleanup table: Return on Equity
    df_return_on_equity.replace(":", "", regex=True,inplace=True) 
    
    # Cleanup table: Balance Sheet
    df_balance_sheet.rename(columns={'Unnamed: 0':'Balance Sheet'}, inplace=True)
    df_balance_sheet.replace("\u00A0\\+", "", regex=True, inplace=True)  # strip the non-breaking space and "+" after row labels
    
    # Cleanup table: Cash Flows
    df_cash_flows.rename(columns={'Unnamed: 0':'Cash Flows'}, inplace=True)
    df_cash_flows.replace("\u00A0\\+", "", regex=True, inplace=True)  # strip the non-breaking space and "+" after row labels
    
    # Cleanup table: Ratios
    df_ratios.rename(columns={'Unnamed: 0':'Ratios'}, inplace=True)
    
    # Cleanup table: Shareholding Pattern
    df_shareholding_pattern.rename(columns={'Unnamed: 0':'Shareholding Pattern'}, inplace=True)
    df_shareholding_pattern.replace("\u00A0\\+", "", regex=True, inplace=True)  # strip the non-breaking space and "+" after row labels
    
    sheet_names = ["Quarterly Results", "Profit & Loss", "Compounded Sales Growth", "Compounded Profit Growth", 
                   "Stock Price CAGR", "Return on Equity", "Balance Sheet", "Cash Flows", "Ratios", "Shareholding Pattern"]
    dataframes  = [df_quaterly_results, df_profit_n_loss , df_compounded_sales_growth, df_compounded_profit_growth, 
                   df_stock_price_cagr, df_return_on_equity, df_balance_sheet, df_cash_flows, df_ratios, df_shareholding_pattern]
    
    # The context manager saves and closes the workbook, even if a write fails
    with pd.ExcelWriter(dir_path + stock_symbol + ".xlsx", engine='xlsxwriter') as writer:
        for sheet_name, frame in zip(sheet_names, dataframes):
            frame.to_excel(writer, sheet_name=sheet_name, index=False)

print("All Done!")



Loading Page for  BANKBARODA
Loading Page for  SBIN
Loading Page for  BPCL
Loading Page for  MMTC
Loading Page for  RCF
Loading Page for  BEL
Loading Page for  SAIL
Loading Page for  NLCINDIA
Loading Page for  NATIONALUM
Loading Page for  HINDPETRO
Loading Page for  BHEL
Loading Page for  ITI
Loading Page for  MRPL
Loading Page for  HINDCOPPER
Loading Page for  OIL
Loading Page for  POWERGRID
Loading Page for  CANBK
Loading Page for  UCOBANK
Loading Page for  GICRE
Loading Page for  UNIONBANK
Loading Page for  IRCON
Loading Page for  CENTRALBK
Loading Page for  MAHABANK
Loading Page for  BANKINDIA
Loading Page for  COCHINSHIP
Loading Page for  PSB
Loading Page for  IOB
Loading Page for  INDIANB
Loading Page for  ONGC
Loading Page for  PNB
Loading Page for  NTPC
Loading Page for  IOC
Loading Page for  COALINDIA
Loading Page for  LICI
Loading Page for  ENGINERSIN
Loading Page for  HAL
Loading Page for  NMDC
Loading Page for  PFC
Loading Page for  SJVN
Loading Page for  HUDCO
Loading Page
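
The cleanup steps in the scraping cell remove two kinds of residue: a non-breaking space followed by `+` on row labels, and colons in the growth tables. Here is a small standard-library sketch of the same two substitutions on plain strings. The function names and sample labels are illustrative assumptions, not values copied from Screener.in.

```python
import re

def clean_row_label(label):
    # Remove every non-breaking space (U+00A0) that is followed by a literal "+"
    return re.sub("\u00A0\\+", "", label)

def clean_period_label(label):
    # Drop colons, mirroring replace(":", "", regex=True) on the growth tables
    return label.replace(":", "")

print(clean_row_label("Sales\u00A0+"))
print(clean_period_label("10 Years:"))
```

Trying the patterns on single strings first is a quick way to confirm they match before applying them to whole DataFrames.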

## 2.0 Testing Area: 

### 2.1 Fetch List Items: 

In [3]:
stock_symbol = "DMART"
url = "https://www.screener.in/company/"+stock_symbol+"/consolidated/"

response = rq.get(url)
# Parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")
# Find the <ul id="top-ratios"> list. find() does not accept CSS selectors, so use select_one()
ul = soup.select_one("ul#top-ratios")
if ul is not None:
    for li in ul.find_all('li'):
        # Collapse runs of whitespace (including non-breaking spaces) into single spaces
        print(re.sub(r'\s+', ' ', li.text))

 Market Cap ₹ 3,19,278 Cr. 
 Current Price ₹ 4,910 
 High / Low ₹ 5,220 / 3,491 
 Stock P/E 120 
 Book Value ₹ 287 
 Dividend Yield 0.00 % 
 ROCE 19.4 % 
 ROE 14.5 % 
 Face Value ₹ 10.0 
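
The printed items above are still free text. To place them in a "Top Ratios" sheet, each line would need to be split into a name and a value. The sketch below is one possible way to do that, assuming the value always starts at the first `₹`, digit, or minus sign; the `parse_ratio` helper is hypothetical, and names that themselves contain digits would need a different rule.

```python
import re

# Name = everything before the first "₹", digit, or minus sign; value = the rest
RATIO_PATTERN = re.compile(r"^(.*?)\s*([₹\d-].*)$")

def parse_ratio(text):
    """Split one list item into (name, value), or return None if no value is found."""
    # Collapse whitespace first, as the notebook cell does before printing
    cleaned = re.sub(r"\s+", " ", text).strip()
    match = RATIO_PATTERN.match(cleaned)
    if match is None:
        return None
    return match.group(1), match.group(2)

for line in [" Market Cap ₹ 3,19,278 Cr. ", " Stock P/E 120 ", " High / Low ₹ 5,220 / 3,491 "]:
    print(parse_ratio(line))
```

The resulting pairs could then be passed to `pd.DataFrame(..., columns=["Ratio", "Value"])` and written as an additional sheet.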


### 2.2 Test tables: 

In [5]:
stock_symbol = "DMART"
url = "https://www.screener.in/company/"+stock_symbol+"/consolidated/"
print("Loading Page for ",stock_symbol)
tables = pd.read_html(url)
print(len(tables))
# print(tables[9]) # valid indices are 0 to len(tables) - 1, i.e. 0 to 10 here

df_quaterly_results         = tables[0] # Quarterly Results
df_profit_n_loss            = tables[1] # Profit & Loss
df_compounded_sales_growth  = tables[2] # Compounded Sales Growth
df_compounded_profit_growth = tables[3] # Compounded Profit Growth
df_stock_price_cagr         = tables[4] # Stock Price CAGR
df_return_on_equity         = tables[5] # Return on Equity
df_balance_sheet            = tables[6] # Balance Sheet
df_cash_flows               = tables[7] # Cash Flows
df_ratios                   = tables[8] # Ratios
df_shareholding_pattern     = tables[9] # Shareholding Pattern
#df_shareholding_pattern_yoy = tables[10] # Shareholding Pattern

Loading Page for  DMART
11
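
Because the tables are picked by position, a page that returns fewer tables than expected would raise an `IndexError` partway through the assignments. A minimal sketch of a guard that pairs names with tables and fails early is shown below; `label_tables` and `TABLE_NAMES` are hypothetical names, and the order is taken from the assignments above.

```python
TABLE_NAMES = [
    "Quarterly Results", "Profit & Loss", "Compounded Sales Growth",
    "Compounded Profit Growth", "Stock Price CAGR", "Return on Equity",
    "Balance Sheet", "Cash Flows", "Ratios", "Shareholding Pattern",
]

def label_tables(tables, names=TABLE_NAMES):
    """Pair the first len(names) tables with their names, failing loudly if any are missing."""
    if len(tables) < len(names):
        raise ValueError(f"expected at least {len(names)} tables, got {len(tables)}")
    # zip stops at the shorter sequence, so any extra trailing tables are ignored
    return dict(zip(names, tables))

# Stand-in objects instead of real DataFrames, to show the pairing only
labelled = label_tables([f"table {n}" for n in range(11)])
print(labelled["Shareholding Pattern"])
```

With real data, `label_tables(pd.read_html(url))` would return a dictionary that can be iterated directly when writing sheets.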
