## Scraping the 'Fundamentus' website for stock information ##

'Fundamentus' is a useful website for finding financial information on stocks traded on the Brazilian Stock Exchange (B3). The page scraped here (https://www.fundamentus.com.br/resultado.php) contains a table of all traded stocks along with several indicators.

For the purposes of this project, I'll focus on the 50 most traded stocks, ranked by average daily trading volume over the last two months.

A CSV file is also generated to facilitate the analysis later on.
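
As a minimal sketch of what that later analysis could start from, the saved file can be loaded back with `pd.read_csv`. The column names match those written by the scraper; the tickers and numbers below are made-up placeholders (not real market data), held in memory so the snippet runs without the file on disk.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'top_50_stocks.csv'; placeholder values, not real data.
sample_csv = io.StringIO(
    "Ticker,Liquidity_Ratio,PE_Ratio,Dividend_Yield\n"
    "AAAA3,1.50,8.0,5.0\n"
    "BBBB4,0.90,12.0,2.5\n"
)

# In real use this would be pd.read_csv('top_50_stocks.csv').
stocks = pd.read_csv(sample_csv)

# Example analysis step: rank by dividend yield, highest first.
by_yield = stocks.sort_values("Dividend_Yield", ascending=False)
print(by_yield[["Ticker", "Dividend_Yield"]])
```

Because the file is written with `index=False`, no extra index column needs to be dropped when reading it back.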

In [4]:
#required packages
import requests
import pandas as pd
import io

In [5]:
def clean_numeric_column(series):
    """
    Converts Brazilian-formatted number strings from the website to numbers:
    removes dots (thousands separator), replaces the decimal comma with a dot,
    and strips '%'. Values that cannot be parsed become NaN.
    """
    return pd.to_numeric(
        series.astype(str)
              .str.replace('.', '', regex=False)
              .str.replace(',', '.', regex=False)
              .str.replace('%', '', regex=False),
        errors='coerce'
    )

def scrape_top_50_indicators():
    url = "https://www.fundamentus.com.br/resultado.php"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }
    
    try:
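        print("Fetching data from market...")
        response = requests.get(url, headers=headers, timeout=15)
        #Fail early on HTTP errors (e.g. 403) instead of parsing an error page
        response.raise_for_status()
        
        #Reading the HTML table. read_html defaults to thousands=',', which would
        #strip the decimal comma ("10,82" -> 1082) before the cleaning step, so
        #keep the raw strings and let clean_numeric_column do the conversion.
        tables = pd.read_html(io.StringIO(response.text), thousands=None)
        df = tables[0]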
        
        #Mapping Portuguese columns to English
        columns_map = {
            'Papel': 'Ticker',
            'P/L': 'PE_Ratio',
            'Div.Yield': 'Dividend_Yield',
            'Liq. Corr.': 'Liquidity_Ratio',
            'Liq.2meses': 'Avg_Daily_Volume_2m'
        }
        df = df.rename(columns=columns_map)

        #list of columns to clean (including the volume for sorting)
        numeric_cols = ['PE_Ratio', 'Dividend_Yield', 'Liquidity_Ratio', 'Avg_Daily_Volume_2m']
        for col in numeric_cols:
            df[col] = clean_numeric_column(df[col])
        
        #sort by Volume (Highest to Lowest)
        df_sorted = df.sort_values(by='Avg_Daily_Volume_2m', ascending=False)
        
        #Take the TOP 50 most traded companies
        df_top_50 = df_sorted.head(50).copy()
        
        #keeping only final columns
        final_df = df_top_50[['Ticker', 'Liquidity_Ratio', 'PE_Ratio', 'Dividend_Yield']]
        
        #saving to a CSV file
        output_file = 'top_50_stocks.csv'
        final_df.to_csv(output_file, index=False, sep=',', encoding='utf-8')
        
        print(f"Success! '{output_file}' generated.")
        print("\nPreview of the top 5 (sorted by average daily volume; the volume column is not shown):")
        print(final_df.head())
        
        return final_df

    except Exception as e:
        print(f"An error occurred: {e}")
        return None

if __name__ == "__main__":
    scrape_top_50_indicators()

Fetching data from market...
Success! 'top_50_stocks.csv' generated.

Preview of the top 5 (sorted by average daily volume; the volume column is not shown):
    Ticker  Liquidity_Ratio  PE_Ratio  Dividend_Yield
720  VALE3              124    1082.0           10.59
499  PETR4               82     512.0           10.49
687  ITUB4                0    1001.0           11.37
65   AXIA3              192   -2454.0            7.11
618  BBDC4                0     837.0            7.90
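
The conversion rule used by `clean_numeric_column` can be checked on single values. The sketch below is a plain-Python analogue; the helper name `parse_br_number` is hypothetical and not part of the scraper, and the inputs are illustrative strings rather than values taken from the site. It also shows a pitfall: the rule assumes raw website strings, so a value that has already been converted to a number loses its decimal point and comes out 100 times too large. Preview values such as a P/E of 1082.0 look consistent with that kind of double conversion, although that is an inference and has not been checked against the site.

```python
def parse_br_number(text):
    """Parse a Brazilian-formatted number string; return None if parsing fails."""
    cleaned = (
        str(text)
        .replace(".", "")   # drop thousands separators
        .replace(",", ".")  # decimal comma -> decimal point
        .replace("%", "")   # drop the percent sign
        .strip()
    )
    try:
        return float(cleaned)
    except ValueError:
        return None

# Raw strings in the format the HTML table uses (illustrative values)
print(parse_br_number("1.234.567,89"))  # 1234567.89
print(parse_br_number("10,82"))         # 10.82
print(parse_br_number("7,11%"))         # 7.11
print(parse_br_number("-"))             # None

# Pitfall: the input is already a float, so removing "." corrupts it
print(parse_br_number(10.82))           # 1082.0
```

One way to guard against this is to make sure the columns are still strings when they reach the cleaning step, for example by checking `df.dtypes` right after the table is read.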
