# Statistical and Machine Learning Models for Fundamental Data

This notebook is a tool for investors interested in the Brazilian stock market. It combines machine learning techniques and statistical models to analyze the fundamental data of companies listed on the stock exchange. The aim is to support investment decisions by identifying opportunities and flagging risks. It includes interactive visualizations and real-time updates, making it practical for both experienced investors and beginners.

## Initial Setup

### Install Packages

In [1]:
%pip install pandas -q
%pip install plotly -q
%pip install scikit-learn -q

Note: you may need to restart the kernel to use updated packages.


### Import Libraries

In [2]:
import os
from pathlib import Path
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import warnings
warnings.filterwarnings('ignore')

### Set Default File Paths

In [3]:
file_path_scored = str(Path(os.getcwd()).parent.parent.parent / "data/scored_base")
file_path_book = str(Path(os.getcwd()).parent.parent.parent / "data/book")

### Pandas Config

In [4]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

### Load data

In [5]:
df_fundamentals_scored_kmeans = pd.read_csv(file_path_scored + "/fundamentals_scored_clusters.csv")
df_fundamentals_book = pd.read_csv(file_path_book + "/fundamentals_book.csv")

## Insights on Clustering (K-Means)

### Companies and Sectors per Cluster

In [6]:
number_companies_cluster = df_fundamentals_scored_kmeans.groupby('kmeans_cluster')['ticker'].count()
total_companies = number_companies_cluster.sum()
percentage_companies_cluster = (number_companies_cluster / total_companies) * 100
combined_text = [f"Companies: {count} - Percentage: {percent:.2f}%" for count, percent in zip(number_companies_cluster, percentage_companies_cluster)]

fig = go.Figure()
fig.add_trace(go.Bar(x=number_companies_cluster.index, y=number_companies_cluster.values, name='Number of Companies per Cluster', text=combined_text, marker=dict(color='rgb(100, 195, 181)')))
fig.update_traces(textposition='outside')
fig.update_layout(title='Number of Companies per Cluster', xaxis_title='Clusters', yaxis_title='Number of Companies', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Company Clusters Analysis**

- **`Cluster 0`**: **121** companies (**41.58%**)
  - Largest cluster, indicating a common set of characteristics among many companies.

- **`Cluster 1`**: **52** companies (**17.87%**)
  - Smaller cluster, could represent niche markets or specialized company attributes.

- **`Cluster 2`**: **118** companies (**40.55%**)
  - Nearly as large as Cluster 0, suggesting another broad category of companies.

**`Key Points`**:
- `Clusters 0 and 2` dominate the dataset, implying two main types of company profiles.
- `Cluster 1`, being the smallest, may require further investigation to determine its unique traits.
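
The percentages above follow directly from the cluster counts. A minimal sketch of that arithmetic, using only the counts reported in this section (the variable names are illustrative and not part of the notebook):

```python
# Cluster sizes as reported in the analysis above.
cluster_counts = {0: 121, 1: 52, 2: 118}

total = sum(cluster_counts.values())

# Share of all companies that falls in each cluster, in percent.
shares = {c: round(n / total * 100, 2) for c, n in cluster_counts.items()}

for cluster, share in shares.items():
    print(f"Cluster {cluster}: {cluster_counts[cluster]} companies ({share:.2f}%)")
```

The same result could be obtained on the full DataFrame with `value_counts(normalize=True)` on the `kmeans_cluster` column.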


In [7]:
fig = px.treemap(df_fundamentals_scored_kmeans, path=['kmeans_cluster', 'sector', 'ticker'], title='Treemap of Companies in Clusters')
fig.update_layout(title='Companies per Cluster', template='plotly_dark', height = 800)
fig.show()


In [8]:
sectors_cluster = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster', df_fundamentals_scored_kmeans['sector']]).count()
sectors_cluster_reset = sectors_cluster.reset_index()
sectors_cluster_reset = sectors_cluster_reset[['kmeans_cluster', 'sector']]
heatmap_data = pd.crosstab(sectors_cluster_reset['sector'], sectors_cluster_reset['kmeans_cluster'])

text_data = [['' for _ in range(len(heatmap_data.columns))] for _ in range(len(heatmap_data.index))]

for sector_idx, sector in enumerate(heatmap_data.index):
    for cluster_idx, cluster in enumerate(heatmap_data.columns):
        if heatmap_data.loc[sector, cluster] == 0:
            text_data[sector_idx][cluster_idx] = f'The sector {sector} is not within the cluster {cluster}'
        else:
            text_data[sector_idx][cluster_idx] = f'The sector {sector} is within the cluster {cluster}'

fig = go.Figure(data=go.Heatmap(z=heatmap_data, x=heatmap_data.columns, y=heatmap_data.index, colorscale='Viridis', text=text_data, hoverinfo='text', showscale=False))
fig.update_layout(title='Presence of Sectors by Cluster', xaxis_title='Cluster', yaxis_title='Sector', font=dict(color='white'), plot_bgcolor='black', paper_bgcolor='black')
fig.show()

**Sector-Based Cluster Analysis for Listed Companies**

**`Cluster 0: Diverse and Consumer-Oriented`**

- **Largest Cluster**: `121` companies across various consumer-facing and utility sectors.
- **Key Sectors**: Consumer Cyclical, Utilities, Real Estate, and Healthcare.

- **Prominent Companies**:
  - `Magazine Luiza S.A.` (MGLU3.SA) - A major retailer in the Consumer Cyclical sector.
  - `Gol Linhas Aéreas Inteligentes S.A.` (GOLL4.SA) - Leading airline within the Industrials sector.
  - `TOTVS S.A.` (TOTS3.SA) - A significant software company in the Technology sector.

**`Cluster 1: Industrials and Financial Heavyweights`**

- **Smallest Cluster**: `52` companies, with a concentration in Industrials and Financial Services.
- **Key Sectors**: Basic Materials, Energy, and Financial Services.

- **Prominent Companies**:
  - `Vale S.A.` (VALE3.SA) - One of the largest mining companies in the world within the Basic Materials sector.
  - `Petrobras` (PETR3.SA, PETR4.SA) - A global oil leader in the Energy sector.
  - `Banco Bradesco S.A.` (BBDC3.SA, BBDC4.SA) - A top financial institution in the Financial Services sector.
  - `Ambev S.A.` (ABEV3.SA) - The biggest brewer in Latin America under the Consumer Defensive sector.
  - `Itaú Unibanco Holding S.A.` (ITUB4.SA) - A leading banking conglomerate in the Financial Services sector.

**`Cluster 2: Specialized and Emerging Players`**

- **Second-Largest Cluster**: `118` companies, likely grouping firms with niche specialization or an emerging market presence.
- **Key Sectors**: Financial Services, Real Estate, and Consumer Cyclical.

- **Prominent Companies**:
  - `Gafisa S.A.` (GFSA3.SA) - A well-known name in the Real Estate sector.
  - `LOG Commercial Properties` (LOGG3.SA) - Engaged in commercial real estate, indicating growth within the sector.
  - `Triunfo Participações e Investimentos S.A.` (TPIS3.SA) - Operating within the Industrials sector with potential for infrastructure development.
  
**`Summary`**
- **Cluster 0** companies are largely consumer-focused, indicative of Brazil's robust domestic market and service-oriented economy.
- **Cluster 1** includes industrial giants and financial stalwarts, reflecting Brazil's key role in global commodities and financial markets.
- **Cluster 2** suggests a grouping of companies with specialized roles, potential for growth, or those targeting emerging trends in the Financial and Real Estate sectors.
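
The heatmap above only records whether a sector is present in a cluster. To back up statements about "key sectors", it can help to look at how many companies each sector contributes. The sketch below uses a small made-up DataFrame (the tickers and rows are placeholders, not data from the notebook) with the same column names as `df_fundamentals_scored_kmeans`:

```python
import pandas as pd

# Toy stand-in for df_fundamentals_scored_kmeans; all rows are invented.
df = pd.DataFrame({
    "ticker": ["AAA3", "BBB3", "CCC3", "DDD3", "EEE3", "FFF3"],
    "sector": ["Energy", "Energy", "Real Estate", "Real Estate", "Utilities", "Energy"],
    "kmeans_cluster": [1, 1, 2, 0, 0, 0],
})

# Number of companies per sector (rows) and cluster (columns).
sector_counts = pd.crosstab(df["sector"], df["kmeans_cluster"])

# Same table, expressed as each sector's share within its cluster.
sector_share = pd.crosstab(df["sector"], df["kmeans_cluster"], normalize="columns")

print(sector_counts)
print(sector_share.round(2))
```

Sorting each column of `sector_share` would give a ranked list of the dominant sectors per cluster.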


#### Financial Assessment and Profitability

##### Liquidity and Reserves

In [60]:
liquidity_reserves = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['profit_margins', 'operating_margins', 'ebitda', 'gross_profits']].mean()
liquidity_reserves = liquidity_reserves.reset_index()
liquidity_reserves

|   | kmeans_cluster | profit_margins | operating_margins | ebitda | gross_profits |
|---|---|---|---|---|---|
| 0 | 0 | 0.108734 | 0.169274 | 1130676000.0 | 1743571000.0 |
| 1 | 1 | 0.215149 | 0.234825 | 19601040000.0 | 34763570000.0 |
| 2 | 2 | 0.499287 | -0.506219 | 89064530.0 | 531768200.0 |


In [62]:
fig = go.Figure()
fig.add_trace(go.Bar(x=liquidity_reserves['kmeans_cluster'], y=liquidity_reserves['profit_margins']*100, name='Average Profit Margins', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (liquidity_reserves['profit_margins']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Profit Margins by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Profit Margins (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average Profit Margins Analysis by Cluster**

**Profit Margins Overview**

- **`Cluster 0`**: The base cluster with an average profit margin of **10.87%**.
- **`Cluster 1`**: Shows a substantially higher profitability with an average margin of **21.51%**.
- **`Cluster 2`**: Dominates in profitability with an average margin of **49.93%**.

**Comparative Analysis**

- **`Cluster 2`**'s average margin is more than twice that of **Cluster 1** and about 4.6 times that of **Cluster 0**. This may indicate companies with high pricing power, lower costs, or high-margin industries, although an average this high can also be driven by a few outliers or one-off gains.
- **`Cluster 1`** also significantly outperforms **Cluster 0**, suggesting that its companies might be more efficient, operate in more profitable sectors, or benefit from economies of scale.
- **`Cluster 0`**, while having the lowest average margins, may represent companies in competitive or capital-intensive industries with thinner profit margins.

**Industry Implications**

- Given that **`Cluster 2`** contains companies such as `LOG Commercial Properties` (LOGG3.SA), its high margin could be indicative of profitable real estate deals or a favorable property market.
- In **`Cluster 1`**, companies like `Vale S.A.` (VALE3.SA) and `Petrobras` (PETR3.SA, PETR4.SA) reflect the significant profitability potential in the Basic Materials and Energy sectors.
- **`Cluster 0`** includes consumer-focused companies like `Magazine Luiza S.A.` (MGLU3.SA), which could be operating in highly competitive markets with aggressive pricing strategies, leading to lower profit margins.

**Summary**

- **`Cluster 2`** stands out as a high-profit group, potentially benefiting from favorable market conditions or strategic operational efficiencies.
- **`Cluster 1`** represents a middle ground, possibly balancing scale and profitability in their operations.
- **`Cluster 0`** may need to focus on cost optimization or market differentiation to enhance profitability.
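
Cluster averages can be pulled far from the typical company by a handful of extreme values. One way to check this is to report the median next to the mean. The sketch below uses invented margins (not values from the dataset) to show how one very large margin lifts a cluster's mean while its median stays low:

```python
import pandas as pd

# Invented profit margins; cluster 2 contains one extreme value.
df = pd.DataFrame({
    "kmeans_cluster": [0, 0, 0, 2, 2, 2],
    "profit_margins": [0.08, 0.10, 0.12, 0.05, 0.07, 1.40],
})

# Mean, median, and group size per cluster.
margin_stats = (
    df.groupby("kmeans_cluster")["profit_margins"]
      .agg(["mean", "median", "count"])
)

print(margin_stats.round(4))
```

A large gap between mean and median, as in the toy cluster 2, suggests the average should not be read as the margin of a representative company.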


In [66]:
fig = go.Figure()
fig.add_trace(go.Bar(x=liquidity_reserves['kmeans_cluster'], y=liquidity_reserves['operating_margins']*100, name='Average Operating Margins', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (liquidity_reserves['operating_margins']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Operating Margins by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Operating Margins (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average Operating Margins Analysis by Cluster**

**Operating Margins Overview**

- **`Cluster 0`**: Exhibits a solid average operating margin of **16.93%**.
- **`Cluster 1`**: Surpasses Cluster 0 with a higher average operating margin of **23.48%**.
- **`Cluster 2`**: Shows a negative average operating margin of **-50.62%**, indicating operational losses.

**Comparative Analysis**

- **`Cluster 1`**’s operating margin is notably higher than that of **`Cluster 0`**, suggesting more efficient operations or a favorable cost structure within Cluster 1's industries.
- The negative margin of **`Cluster 2`** is a cause for concern as it indicates that companies are spending more to operate than they are earning. This could be due to a variety of reasons, such as aggressive investment in growth, unfavorable market conditions, or inefficient operations.

**Industry Implications**

- The positive margins for **`Clusters 0 and 1`** indicate healthy operational efficiency overall. Companies in these clusters, such as those in the Industrials sector like `RAIL3.SA`, are likely managing their expenses well relative to their revenue.
- The negative margin for **`Cluster 2`** could suggest that companies in this cluster, which may include real estate firms like `LOG Commercial Properties` (LOGG3.SA), are in a growth phase, investing heavily in operations, or they may be affected by external challenges such as market downturns or increased competition.

**Summary**

- **`Cluster 1`** represents an optimal performance model with the highest operating margins, potentially reflecting companies with strong market positions or operational excellence.
- **`Cluster 0`**'s positive margins suggest stable operations but with room for improvement in efficiency or cost management to reach the levels of **`Cluster 1`**.
- **`Cluster 2`** faces significant challenges, with its negative margin highlighting the need for strategic reassessments, operational overhauls, or market repositioning to return to profitability.
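
Before concluding that a whole cluster operates at a loss, it is worth checking which companies drive the negative average. A hedged sketch on made-up data (tickers and margins are placeholders) that lists the lowest operating margins per cluster and the share of companies with a negative margin:

```python
import pandas as pd

# Invented operating margins; one company in cluster 2 has a large loss.
df = pd.DataFrame({
    "ticker": ["AAA3", "BBB3", "CCC3", "DDD3", "EEE3", "FFF3"],
    "kmeans_cluster": [0, 0, 0, 2, 2, 2],
    "operating_margins": [0.15, 0.20, 0.10, 0.12, -2.00, 0.30],
})

# Two lowest operating margins in each cluster.
worst = df.sort_values("operating_margins").groupby("kmeans_cluster").head(2)

# Fraction of companies per cluster with a negative operating margin.
negative_share = (df["operating_margins"] < 0).groupby(df["kmeans_cluster"]).mean()

print(worst)
print(negative_share)
```

In this toy case only one of three companies in cluster 2 is loss-making, yet the cluster mean is negative.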


In [12]:
def format_values(values):
    formatted = []
    for value in values:
        abs_value = abs(value)
        if 1e9 <= abs_value < 1e11:
            formatted.append(f'R${value / 1e9:.2f} B')
        elif abs_value >= 1e11:
            formatted.append(f'R${value / 1e11:.2f} x 100B')
        elif abs_value >= 1e6:
            formatted.append(f'R${value / 1e6:.2f} M')
        else:
            formatted.append(f'R${value:.2f}')
    return formatted

In [13]:
def format_values_x(values):
    formatted = []
    for value in values:
        abs_value = abs(value)
        # Thousands separators are switched from ',' to '.' (Brazilian convention).
        if 1e9 <= abs_value < 1e11:
            formatted_value = f"{round(value / 1e9):,}".replace(',', '.') + " B"
        elif abs_value >= 1e11:
            formatted_value = f"{round(value / 1e11):,}".replace(',', '.') + " x 100B"
        elif abs_value >= 1e6:
            formatted_value = f"{round(value / 1e6):,}".replace(',', '.') + " M"
        elif abs_value >= 1e3:
            # Scale to thousands so the "K" suffix matches the number shown.
            formatted_value = f"{round(value / 1e3):,}".replace(',', '.') + " K"
        else:
            formatted_value = f"{round(value):,}".replace(',', '.')
        formatted.append(formatted_value)
    return formatted
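
As an alternative to the chained `if`/`elif` branches, the thresholds can be kept in a small table. The helper below, `format_brl`, is a hypothetical name and not part of the notebook. It is a sketch that assumes conventional K/M/B/T suffixes rather than the `x 100B` label used above:

```python
def format_brl(value):
    """Format a number as an abbreviated R$ amount (illustrative helper)."""
    # Thresholds are checked from largest to smallest.
    scales = [(1e12, "T"), (1e9, "B"), (1e6, "M"), (1e3, "K")]
    for threshold, suffix in scales:
        if abs(value) >= threshold:
            return f"R${value / threshold:.2f} {suffix}"
    return f"R${value:.2f}"


# Average EBITDA per cluster, as reported in the table earlier in this notebook.
labels = [format_brl(v) for v in (1130676000.0, 19601040000.0, 89064530.0)]
print(labels)  # ['R$1.13 B', 'R$19.60 B', 'R$89.06 M']
```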

In [67]:
formatted_ebitda = format_values(liquidity_reserves['ebitda'])

fig = go.Figure()
fig.add_trace(go.Bar(x=liquidity_reserves['kmeans_cluster'], y=liquidity_reserves['ebitda'], hovertext=formatted_ebitda, name='Average Ebitda', marker=dict(color='rgb(100, 195, 181)'), text=formatted_ebitda))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Ebitda by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Ebitda (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average EBITDA Analysis by Cluster**

**EBITDA Overview**

- **`Cluster 0`**: Posts an average EBITDA of **R$1.13 Billion**, suggesting moderate operational profitability.
- **`Cluster 1`**: Significantly leads with an average EBITDA of **R$19.60 Billion**, indicative of high operational efficiency or a presence in highly profitable sectors.
- **`Cluster 2`**: Has the lowest average EBITDA, at **R$89.06 Million**, potentially reflecting smaller or newer companies.

**Comparative Analysis**

- The EBITDA of **`Cluster 1`** dwarfs that of the other clusters, potentially due to the scale of operations or higher-margin businesses.
- **`Cluster 0`**'s EBITDA reflects steady business performance but shows room for growth or operational improvements to reach the level of Cluster 1.
- The much lower EBITDA in **`Cluster 2`** suggests these companies might be in their nascent stages, specialized niches, or facing operational challenges.

**Sector Implications**

- Companies in **`Cluster 1`** such as `Petrobras` (PETR3.SA, PETR4.SA) may be driving the high average EBITDA, often characteristic of the energy sector's large capital-intensive operations.
- Firms in **`Cluster 0`** might include consumer-facing businesses like `Magazine Luiza S.A.` (MGLU3.SA), indicating solid but comparatively lower profitability sectors.
- Entities in **`Cluster 2`** could involve smaller or early-stage companies, which typically exhibit lower EBITDA in their growth phase.

**Summary**

- The EBITDA figures highlight **Cluster 1** as potentially the most established, with **Cluster 0** occupying a middle ground and **Cluster 2** needing strategic focus to enhance profitability.
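
To put "dwarfs" into numbers, the averages from the table can be compared directly. A small sketch using only the reported average EBITDA values (variable names are illustrative):

```python
# Average EBITDA per cluster in R$, copied from the table above.
avg_ebitda = {0: 1130676000.0, 1: 19601040000.0, 2: 89064530.0}

# How many times larger Cluster 1's average is than each other cluster's.
multiples = {
    cluster: avg_ebitda[1] / value
    for cluster, value in avg_ebitda.items()
    if cluster != 1
}

for cluster, multiple in multiples.items():
    print(f"Cluster 1 average EBITDA is {multiple:.1f}x that of Cluster {cluster}")
```

On these figures, Cluster 1's average is roughly 17 times that of Cluster 0 and roughly 220 times that of Cluster 2.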


In [68]:
formatted_gross_profits = format_values(liquidity_reserves['gross_profits'])

fig = go.Figure()
fig.add_trace(go.Bar(x=liquidity_reserves['kmeans_cluster'], y=liquidity_reserves['gross_profits'], hovertext=formatted_gross_profits, name='Gross Profits', marker=dict(color='rgb(100, 195, 181)'), text=formatted_gross_profits))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Gross Profits by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Gross Profits (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Gross Profits Analysis by Cluster**

**Gross Profit Overview**

- **`Cluster 0`**: Reports an average gross profit of **R$1.74 Billion**, indicating modest profitability relative to the other clusters.
- **`Cluster 1`**: Shows a substantially larger average gross profit of **R$34.76 Billion**, suggesting operations in high-revenue or high-margin sectors.
- **`Cluster 2`**: Has the lowest average gross profit at **R$531.77 Million**, which might reflect smaller company sizes or industries with lower gross profit figures.

**Comparative Analysis**

- The gross profit for **`Cluster 1`** far exceeds that of **`Cluster 0`** and **`Cluster 2`**, indicating that companies in **Cluster 1** may have a larger scale of operations or more profitable product lines.
- **`Cluster 0`**'s gross profit suggests the presence of companies with efficient cost of goods sold (COGS) but potentially lower revenue scales compared to Cluster 1.
- **`Cluster 2`**'s lower gross profit could be due to a variety of factors, such as smaller size, lower sales volume, or higher COGS relative to sales.

**Sector Implications**

- Companies in **`Cluster 1`** might include large-scale enterprises like `Vale S.A.` (VALE3.SA), which typically have substantial gross profits due to the scale and nature of their operations.
- Firms in **`Cluster 0`** could be represented by mid-sized entities with steady profitability.
- **`Cluster 2`** may consist of companies in competitive or emerging sectors where gross profits are not as high due to pricing strategies, market penetration efforts, or investment phases.

**Summary**

- The data points to **`Cluster 1`** as the leader in gross profitability, likely due to economies of scale or a focus on high-margin industries.
- **`Cluster 0`** represents companies with consistent performance, and **`Cluster 2`** may need to focus on increasing sales volume or reducing COGS to improve gross profits.


##### Revenue and Profit Growth

In [16]:
revenue_profit = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['total_revenue', 'earnings_quarterly_growth', 'revenue_growth', 'earnings_growth_rate']].mean()
revenue_profit = revenue_profit.reset_index()
revenue_profit

|   | kmeans_cluster | total_revenue | earnings_quarterly_growth | revenue_growth | earnings_growth_rate |
|---|---|---|---|---|---|
| 0 | 0 | 7158268000.0 | 0.235256 | 0.020512 | 23.52562 |
| 1 | 1 | 77417490000.0 | 1.484365 | -0.004231 | 148.436538 |
| 2 | 2 | 1005542000.0 | 0.107 | 0.081915 | 10.7 |


In [69]:
formatted_revenue_profit = format_values(revenue_profit['total_revenue'])

fig = go.Figure()
fig.add_trace(go.Bar(x=revenue_profit['kmeans_cluster'], y=revenue_profit['total_revenue'], hovertext=formatted_revenue_profit, name='Total Revenue', marker=dict(color='rgb(100, 195, 181)'), text=formatted_revenue_profit))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Total Revenue by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Total Revenue (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Total Revenue Analysis by Cluster**

**Revenue Overview**

- **`Cluster 0`**: Shows a healthy average total revenue of **R$7.16 Billion**.
- **`Cluster 1`**: Dominates with a massive average total revenue of **R$77.42 Billion**.
- **`Cluster 2`**: Displays the smallest average total revenue at **R$1.01 Billion**.

**Comparative Analysis**

- **`Cluster 1`**'s average revenue is nearly eleven times that of **`Cluster 0`** and roughly 77 times that of **`Cluster 2`**, suggesting Cluster 1 companies are likely industry leaders or operate in high-volume sectors.
- **`Cluster 0`** indicates a strong performance, which might be typical for established companies with a solid market presence.
- **`Cluster 2`**'s relatively small revenue suggests it may consist of smaller companies, startups, or those in niche markets.

**Industry Implications**

- The significant revenue in **`Cluster 1`** could be attributed to companies with substantial market share or those operating in lucrative sectors, such as `Vale S.A.` (VALE3.SA) in the mining industry or `Petrobras` (PETR3.SA, PETR4.SA) in oil and gas.
- Entities in **`Cluster 0`** may reflect a diverse set of well-established businesses with consistent sales, like `Magazine Luiza S.A.` (MGLU3.SA) in retail.
- Firms in **`Cluster 2`** might be in earlier stages of growth or in specialized industries with lower sales volumes.

**Summary**

- **`Cluster 1`** appears to be the powerhouse, with revenues suggesting large-scale operations.
- **`Cluster 0`** represents a middle ground, possibly indicating a broad mix of mature companies.
- **`Cluster 2`** shows potential for growth, with current figures suggesting a focus on market entry or niche specialization.
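
Combining the average gross profit and average total revenue reported for each cluster gives a rough implied gross margin. This is a ratio of averages, not an average of company-level margins, so it is only an approximation. The variable names below are illustrative:

```python
# Cluster averages in R$, copied from the two summary tables in this notebook.
avg_gross_profit = {0: 1743571000.0, 1: 34763570000.0, 2: 531768200.0}
avg_total_revenue = {0: 7158268000.0, 1: 77417490000.0, 2: 1005542000.0}

# Implied gross margin = average gross profit / average total revenue.
implied_gross_margin = {
    cluster: avg_gross_profit[cluster] / avg_total_revenue[cluster]
    for cluster in avg_gross_profit
}

for cluster, margin in implied_gross_margin.items():
    print(f"Cluster {cluster}: implied gross margin of about {margin:.0%}")
```

On these figures the smallest cluster by revenue (Cluster 2) has the highest implied gross margin, which is consistent with scale, rather than margin, driving Cluster 1's lead in absolute gross profit.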


In [70]:
fig = go.Figure()
fig.add_trace(go.Bar(x=revenue_profit['kmeans_cluster'], y=revenue_profit['earnings_quarterly_growth']*100, hovertext='earnings_quarterly_growth', name='Earnings Quarterly Growth', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (revenue_profit['earnings_quarterly_growth']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Earnings Quarterly Growth by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Earnings Quarterly Growth (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Earnings Quarterly Growth Analysis by Cluster**

**Growth Rates Overview**

- **`Cluster 0`**: Demonstrates a healthy growth rate in earnings at **23.53%** quarterly.
- **`Cluster 1`**: Exhibits an extraordinary average quarterly growth rate of **148.44%**, suggesting rapid earnings expansion.
- **`Cluster 2`**: Shows a more modest growth rate of **10.70%**, which may indicate a mature or stable market position, or possible headwinds to growth.

**Comparative Analysis**

- **`Cluster 1`**'s earnings growth is significantly higher than the other clusters, likely indicating either a period of major market success, a recovery from previous lows, or growth from acquisitions and expansions.
- The respectable growth of **`Cluster 0`** suggests steady market performance and potentially consistent earnings improvements.
- **`Cluster 2`**'s lower growth rate compared to **`Cluster 0`** and **`Cluster 1`** may reflect a variety of factors, such as market saturation, slower market conditions, or conservative business strategies.

**Implications for Strategy**

- **`Cluster 1`** may consist of companies that have successfully leveraged market trends, new product launches, or other strategic initiatives to boost their earnings significantly.
- Companies in **`Cluster 0`** are possibly employing effective growth strategies that allow for sustainable earnings improvements over time.
- **`Cluster 2`** might benefit from reassessing their growth strategies or might be in sectors that naturally exhibit slower growth rates.

**Summary**

- The stark contrast in quarterly earnings growth between **`Cluster 1`** and the others underscores potentially aggressive growth strategies or favorable market conditions for `Cluster 1` companies.
- **`Cluster 0`** and **`Cluster 2`** show more traditional growth patterns, with `Cluster 0` companies likely capitalizing on solid business practices, whereas `Cluster 2` may need to explore new avenues for growth.
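
Growth rates are unbounded on the upside, so a single company recovering from near-zero earnings can dominate a cluster average. One common check is to cap extreme values before averaging. The sketch below uses invented growth figures and an arbitrary cap of ±100%; both are assumptions for illustration:

```python
import pandas as pd

# Invented earnings growth rates as fractions; the last one is an extreme value.
growth = pd.Series([0.10, 0.15, 0.05, 0.20, 12.0])

# Cap values to the range [-1.0, 1.0], i.e. -100% to +100%.
capped = growth.clip(lower=-1.0, upper=1.0)

print(f"Raw mean:    {growth.mean():.2%}")
print(f"Capped mean: {capped.mean():.2%}")
print(f"Median:      {growth.median():.2%}")
```

If the capped mean falls far below the raw mean, as it does here, the headline average mostly reflects a few companies rather than the cluster as a whole.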


In [71]:
fig = go.Figure()
fig.add_trace(go.Bar(x=revenue_profit['kmeans_cluster'], y=revenue_profit['revenue_growth']*100, hovertext='revenue_growth', name='Revenue Growth', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (revenue_profit['revenue_growth']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Revenue Growth by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Revenue Growth (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Revenue Growth Analysis by Cluster**

**Revenue Growth Overview**

- **`Cluster 0`**: Exhibits a positive revenue growth at **2.05%**, indicating steady business expansion.
- **`Cluster 1`**: Reflects a marginal average revenue contraction of **0.42%** (growth of -0.42%), suggesting slight revenue challenges or stabilization after a period of growth.
- **`Cluster 2`**: Outperforms with a robust average revenue growth of **8.19%**, which could signify aggressive market expansion or entry into new markets.

**Comparative Analysis**

- **`Cluster 2`**'s significant growth rate implies that companies within this cluster might be capturing new market shares or benefiting from innovative product lines or services.
- **`Cluster 0`** represents a stable growth scenario, often seen in well-established markets or companies with consistent performance.
- The contraction in **`Cluster 1`** could indicate market saturation, a cyclical downturn, or the impacts of competitive pressures.

**Strategic Implications**

- Companies in **`Cluster 2`** may be experiencing a phase of strong growth, potentially due to successful strategies or favorable market conditions.
- **`Cluster 0`**'s growth suggests that companies may be maintaining a steady performance, which could be ideal for certain investment strategies.
- For **`Cluster 1`**, the slight decrease in revenue might prompt strategies to rejuvenate growth or optimize operations to improve profitability.

**Summary**

- The data highlights **`Cluster 2`** as a dynamic growth segment, possibly offering opportunities for investment in high-growth potential companies.
- **`Cluster 0`** presents a picture of stability and could be attractive to investors seeking consistent performers.
- **`Cluster 1`** might require careful analysis to understand the factors behind the revenue decrease and to identify any potential for a turnaround.


##### Asset Efficiency and ROI

In [21]:
asset_efficiency = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['total_assets_approx', 'asset_turnover', 'roi', 'return_on_assets', 'return_on_equity', 'roce']].mean()
asset_efficiency = asset_efficiency.reset_index()
asset_efficiency

|   | kmeans_cluster | total_assets_approx | asset_turnover | roi | return_on_assets | return_on_equity | roce |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1347213000.0 | 7335.995546 | 1.50837 | 0.060052 | 0.121749 | 0.057324 |
| 1 | 1 | 48039140000.0 | 7.474149 | 0.734347 | 0.059335 | 0.159957 | 0.132031 |
| 2 | 2 | 1190988000.0 | 90.263563 | 0.99756 | 0.019301 | 0.127807 | 0.048294 |


In [73]:
formatted_asset_efficiency = format_values(asset_efficiency['total_assets_approx'])

fig = go.Figure()
fig.add_trace(go.Bar(x=asset_efficiency['kmeans_cluster'], y=asset_efficiency['total_assets_approx'], hovertext=formatted_asset_efficiency, name='Total Assets Approx', marker=dict(color='rgb(100, 195, 181)'), text=formatted_asset_efficiency))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Total Assets Approx by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Total Assets Approx (R$)', template='plotly_dark', font=dict(color='white'), height=560)
fig.show()

**Total Assets Analysis by Cluster**

**Asset Overview**

- **`Cluster 0`**: Has a modest asset base with average total assets around **R$1.35 Billion**.
- **`Cluster 1`**: Possesses a substantial asset base with average total assets at **R$48.04 Billion**.
- **`Cluster 2`**: Holds average total assets close to **R$1.19 Billion**, slightly lower than Cluster 0.

**Comparative Analysis**

- **`Cluster 1`**'s asset size vastly outpaces that of **Cluster 0** and **Cluster 2**, indicating that companies in Cluster 1 may either be larger in scale or operate in asset-heavy industries.
- The similar asset sizes of **Clusters 0 and 2** suggest they may include smaller companies or those in sectors that require less capital investment.

**Strategic Implications**

- The large average asset total in **Cluster 1** might reflect companies with significant physical assets, like real estate or manufacturing plants, or those that have accumulated assets over a long period of market presence.
- **`Cluster 0`** and **`Cluster 2`** may be more representative of companies that are service-oriented, technology-focused, or simply younger and thus have not yet built up substantial asset bases.

**Summary**

- **`Cluster 1`** stands out as potentially having established companies with considerable asset holdings, which may translate to stability and market power.
- **`Cluster 0`** and **`Cluster 2`** appear to have more modest asset levels, which could align with companies having a lighter asset profile or those in earlier stages of growth.


In [74]:
fig = go.Figure()
fig.add_trace(go.Bar(x=asset_efficiency['kmeans_cluster'], y=asset_efficiency['asset_turnover'], hovertext='asset_turnover', name='Asset Turnover', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}x' for x in (asset_efficiency['asset_turnover'])]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Asset Turnover by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Asset Turnover (x)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Asset Turnover Analysis by Cluster**

**Asset Turnover Overview**

- **`Cluster 0`**: Reports an exceptionally high average asset turnover at **7336.00x**.
- **`Cluster 1`**: Shows a more typical average asset turnover of **7.47x**.
- **`Cluster 2`**: Presents a relatively high average asset turnover at **90.26x**.

**Comparative Analysis**

- The asset turnover for **`Cluster 0`** is unusually high and might indicate data anomalies or companies with minimal assets and very high sales volumes, which is often characteristic of service or digital companies with low capital investment.
- **`Cluster 1`** exhibits an asset turnover rate that is more aligned with industry averages, suggesting efficient use of assets in generating revenue.
- The higher turnover in **`Cluster 2`** could imply that companies there are effectively using their assets to generate sales, possibly indicative of growth-oriented or capital-efficient businesses.

**Implications for Strategy**

- Companies in **`Cluster 0`** may require further investigation to understand the drivers behind the extremely high turnover figure.
- Entities within **`Cluster 1`** could be seen as having a balanced approach to asset utilization and sales generation.
- Firms in **`Cluster 2`** might be employing aggressive strategies to maximize sales with their asset base or operating in sectors that require less capital intensity.

**Summary**

- **`Cluster 0`**'s turnover rate suggests either an outlier scenario or a cluster of companies that are not asset-intensive but have high sales volume.
- **`Cluster 1`**'s figure reflects what might be expected from established companies in traditional industries.
- **`Cluster 2`** indicates a dynamic use of assets, which might appeal to investors looking for capital efficiency.
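
One quick sanity check on the averages above is to compare them with the turnover implied by the cluster-level averages of revenue and assets. The sketch assumes that asset turnover is defined as total revenue divided by total assets and that `total_assets_approx` is the relevant denominator. Neither assumption is confirmed by the notebook.

```python
# Cluster averages copied from the summary tables in this notebook.
avg_total_revenue = {0: 7158268000.0, 1: 77417490000.0, 2: 1005542000.0}
avg_total_assets = {0: 1347213000.0, 1: 48039140000.0, 2: 1190988000.0}
mean_asset_turnover = {0: 7335.995546, 1: 7.474149, 2: 90.263563}

# Turnover implied by the averages (a ratio of averages).
implied_turnover = {
    cluster: avg_total_revenue[cluster] / avg_total_assets[cluster]
    for cluster in avg_total_revenue
}

for cluster in implied_turnover:
    print(
        f"Cluster {cluster}: implied {implied_turnover[cluster]:.2f}x "
        f"vs. reported mean {mean_asset_turnover[cluster]:.2f}x"
    )
```

Under these assumptions, the implied values (about 5.3x, 1.6x, and 0.8x) are far below the reported means, which supports the reading that a few companies with very small asset figures inflate the per-company averages.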


In [75]:
fig = go.Figure()
fig.add_trace(go.Bar(x=asset_efficiency['kmeans_cluster'], y=asset_efficiency['roi']*100, hovertext='roi', name='ROI', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (asset_efficiency['roi']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Return on Investment (ROI) by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Return on Investment (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Return on Investment (ROI) Analysis by Cluster**

**ROI Overview**

- **`Cluster 0`**: Exhibits an extraordinary average ROI at **150.84%**.
- **`Cluster 1`**: Reports a robust average ROI of **73.43%**.
- **`Cluster 2`**: Also shows a high average ROI at **99.76%**.

**Comparative Analysis**

- **`Cluster 0`**'s average ROI is significantly higher than those of **`Cluster 1`** and **`Cluster 2`**, indicating potentially higher profitability or lower investment costs relative to returns.
- **`Cluster 1`** maintains a strong ROI, which may suggest effective capital utilization and a solid return on investments.
- The nearly triple-digit ROI of **`Cluster 2`** suggests efficient investment strategies or high-yield operations.

**Strategic Implications**

- The exceptional ROI for **`Cluster 0`** might require further analysis to validate the sustainability of such high returns and to understand the underlying business models.
- **`Cluster 1`** represents what appears to be well-managed companies achieving considerable returns, possibly indicative of mature and stable operations.
- **`Cluster 2`**'s high ROI could attract investors looking for growth potential and strong return profiles.

**Summary**

- The ROI figures indicate that **`Cluster 0`** may consist of high-growth or high-efficiency companies possibly enjoying competitive advantages or operating in high-margin sectors.
- **`Cluster 1`** and **`Cluster 2`** show impressive ROI percentages, reflecting successful investment and operational strategies.


In [76]:
fig = go.Figure()
fig.add_trace(go.Bar(x=asset_efficiency['kmeans_cluster'], y=asset_efficiency['return_on_assets']*100, hovertext='return_on_assets', name='ROA', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (asset_efficiency['return_on_assets']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Return on Assets (ROA) by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Return on Assets (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Return on Assets (ROA) Analysis by Cluster**

**ROA Overview**

- **`Cluster 0`**: Demonstrates a solid average ROA at **6.01%**.
- **`Cluster 1`**: Shows a comparable average ROA of **5.93%**.
- **`Cluster 2`**: Has a lower average ROA at **1.93%**.

**Comparative Analysis**

- **`Cluster 0`** and **`Cluster 1`** have similar average ROAs, indicating that companies in these clusters are relatively close in their efficiency in generating profits from their assets.
- **`Cluster 2`**'s lower ROA suggests that these companies may not be utilizing their assets as efficiently to generate profit, or they could be in a growth phase investing heavily in assets that have not yet generated proportional profits.

**Strategic Implications**

- The close ROA figures for **`Cluster 0`** and **`Cluster 1`** imply that businesses within these clusters are operating with a similar level of asset efficiency, although the slightly higher ROA in Cluster 0 could indicate a marginal edge in asset utilization or a different mix of assets.
- The significantly lower ROA for **`Cluster 2`** may point to companies that are either asset-heavy with longer-term payback periods or those that need to optimize their asset management to improve profitability.

**Summary**

- The ROA data positions **`Cluster 0`** as potentially having the most effective use of assets to generate profits.
- **`Cluster 1`** is nearly as effective as Cluster 0 in asset utilization for profit generation, potentially indicating sound management practices.
- **`Cluster 2`** appears to have room for improvement in asset utilization or may represent a cluster with a longer-term investment horizon.


In [80]:
fig = go.Figure()
fig.add_trace(go.Bar(x=asset_efficiency['kmeans_cluster'], y=asset_efficiency['return_on_equity']*100, hovertext='return_on_equity', name='Return on Equity', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (asset_efficiency['return_on_equity']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Return on Equity (ROE) by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Return on Equity (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Return on Equity (ROE) Analysis by Cluster**

**ROE Overview**

- **`Cluster 0`**: Reports a respectable average ROE of **12.17%**.
- **`Cluster 1`**: Leads with a robust average ROE of **16.00%**.
- **`Cluster 2`**: Has a competitive average ROE of **12.78%**.

**Comparative Analysis**

- **`Cluster 1`**'s average ROE is the highest, which could indicate that its companies are generating more profit per unit of equity, reflecting efficient equity use or a high-profit-margin industry.
- **`Cluster 0`** and **`Cluster 2`** have relatively close ROE figures, suggesting that they have similar effectiveness in generating profits from shareholders' equity.

**Strategic Implications**

- The high ROE in **`Cluster 1`** might reflect companies with strong profitability relative to their equity, which could be due to high earnings or effective management.
- **`Cluster 0`** and **`Cluster 2`** demonstrating similar ROE percentages may indicate these clusters contain companies with effective but not exceptional equity management or that they are operating in industries with more typical ROE figures.

**Summary**

- **`Cluster 1`**'s superior ROE suggests it is the most effective at generating returns on equity, potentially making it attractive to equity investors.
- **`Cluster 0`** and **`Cluster 2`** display solid performance, with Cluster 2 slightly outperforming Cluster 0 in ROE, hinting at efficient use of shareholder investments.
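
Since ROE can be written as ROA multiplied by the ratio of assets to equity (the DuPont identity), dividing the cluster ROE by the cluster ROA gives a rough read on how much of the ROE comes from leverage. The sketch below uses the averages quoted in the ROA and ROE overviews. It is only indicative: a ratio of cluster averages is not the same as the average of company-level ratios.

```python
# Cluster averages quoted in the ROA and ROE overviews (in percent).
roa = {0: 6.01, 1: 5.93, 2: 1.93}
roe = {0: 12.17, 1: 16.00, 2: 12.78}

# ROE = ROA * (assets / equity), so ROE / ROA approximates the equity multiplier.
implied_multiplier = {cluster: roe[cluster] / roa[cluster] for cluster in roa}

for cluster, multiplier in implied_multiplier.items():
    print(f"Cluster {cluster}: implied assets/equity of about {multiplier:.2f}x")
```

Read this way, Cluster 2 reaches an ROE similar to Cluster 0 despite a much lower ROA, which would be consistent with a larger asset base per unit of equity.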


In [82]:
fig = go.Figure()
fig.add_trace(go.Bar(x=asset_efficiency['kmeans_cluster'], y=asset_efficiency['roce']*100, hovertext='roce', name='Return on Capital Employed', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (asset_efficiency['roce']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Return on Capital Employed (ROCE) by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Return on Capital Employed (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Return on Capital Employed (ROCE) Analysis by Cluster**

**ROCE Overview**

- **`Cluster 0`**: Shows a moderate average ROCE of **5.73%**.
- **`Cluster 1`**: Significantly outperforms with an average ROCE of **13.20%**.
- **`Cluster 2`**: Posts a lower average ROCE of **4.83%**.

**Comparative Analysis**

- The high ROCE of **`Cluster 1`** suggests that companies within this cluster are using their capital very efficiently to generate profits.
- **`Cluster 0`**'s ROCE indicates a reasonable rate of return on capital, which could be consistent with a stable, established business environment.
- The relatively lower ROCE of **`Cluster 2`** may reflect less efficient capital use, which could be due to a number of factors such as heavy investment phases or industries with lower capital turnover.

**Strategic Implications**

- **`Cluster 1`**'s superior ROCE might indicate companies with a strong competitive position or those operating in industries with higher operational efficiency.
- Companies in **`Cluster 0`** appear to be maintaining sound capital utilization, although there may be opportunities to optimize for better returns.
- The ROCE for **`Cluster 2`** suggests these companies could focus on strategies to improve their capital efficiency or may be in a phase of investment that has yet to yield higher returns.

**Summary**

- **`Cluster 1`**'s high ROCE indicates a cluster of potentially attractive companies for investors looking for efficient capital utilization.
- **`Cluster 0`** and **`Cluster 2`** show room for improvement in capital deployment, with Cluster 2, in particular, possibly needing strategic adjustments to enhance returns on capital.
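
For reference, a common textbook definition of ROCE divides operating profit (EBIT) by capital employed, taken as total assets minus current liabilities. The sketch below applies that definition to a hypothetical company. How the `roce` column in this dataset was actually computed is not stated in the notebook, so the formula here is an assumption.

```python
def roce(ebit, total_assets, current_liabilities):
    """ROCE = EBIT / capital employed, where capital employed = total assets - current liabilities."""
    capital_employed = total_assets - current_liabilities
    if capital_employed <= 0:
        raise ValueError("capital employed must be positive for ROCE to be meaningful")
    return ebit / capital_employed

# Hypothetical company: R$132 million EBIT, R$1.5 billion in assets, R$500 million in current liabilities.
example = roce(ebit=132e6, total_assets=1.5e9, current_liabilities=0.5e9)
print(f"ROCE: {example:.2%}")
```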


#### Risk and Investment Performance

##### Investment Risk

In [28]:
investiment_risk = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['beta', 'debt_to_equity', 'price_to_sales_trailing_12_months']].mean()
investiment_risk = investiment_risk.reset_index()
investiment_risk

| | kmeans_cluster | beta | debt_to_equity | price_to_sales_trailing_12_months |
|---|---|---|---|---|
| 0 | 0 | 0.730008 | 0.374829 | 3.25736 |
| 1 | 1 | 0.755692 | 18.915705 | 2.110524 |
| 2 | 2 | 0.753449 | 0.796297 | 3.687917 |


In [83]:
fig = px.bar(investiment_risk, title='Average Beta', x='kmeans_cluster', y='beta', color_discrete_sequence=['rgb(100, 195, 181)'], hover_name='beta', height=500)
fig.update_traces(text=[f'{x:.2f}' for x in (investiment_risk['beta'])], textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='KMeans Cluster', yaxis_title='Average Beta', template = 'plotly_dark')
fig.show()

**Beta Values Analysis by Cluster**

**Beta Overview**

- **`Cluster 0`**: Demonstrates a beta of **0.73**, suggesting lower volatility relative to the market.
- **`Cluster 1`**: Shows a beta of **0.76**, indicating a slightly higher volatility than Cluster 0 but still below the market average.
- **`Cluster 2`**: Has a beta of **0.75**, reflecting volatility close to that of Cluster 1.

**Comparative Analysis**

- All clusters have average beta values below 1, implying that, on average, their stocks move less than proportionally with the broader market.
- The small differences in beta values between the clusters (**0.73**, **0.76**, and **0.75**) suggest that companies within these clusters may have similar risk profiles.

**Implications for Investors**

- Investors seeking lower-risk investments might find companies within these clusters appealing due to their lower relative volatility.
- The similarity in beta values across clusters suggests that, from a volatility standpoint, there is a uniform risk level amongst them.

**Summary**

- The clusters present a consistent picture of below-market volatility, which may appeal to risk-averse investors.
- The slight variation in beta between the clusters is minimal, indicating comparable levels of systematic risk across the clusters.
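
As a reminder of what the metric measures, beta is conventionally estimated as the covariance between a stock's returns and the market's returns, divided by the variance of the market's returns. The sketch below applies that formula to synthetic returns. The notebook does not state which benchmark index or return window the `beta` column was computed against, so the data here is purely illustrative.

```python
import numpy as np

def estimate_beta(asset_returns, market_returns):
    """Beta = Cov(asset, market) / Var(market), using sample statistics."""
    asset = np.asarray(asset_returns, dtype=float)
    market = np.asarray(market_returns, dtype=float)
    cov = np.cov(asset, market, ddof=1)  # 2x2 sample covariance matrix
    return cov[0, 1] / cov[1, 1]

# Synthetic daily returns: the asset moves 0.75 times as much as the market.
rng = np.random.default_rng(0)
market = rng.normal(0.0, 0.01, size=250)
asset = 0.75 * market
beta = estimate_beta(asset, market)
print(f"Estimated beta: {beta:.2f}")
```

A beta below 1 therefore describes sensitivity to market moves (systematic risk). It says nothing about company-specific risk, which can still be high.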


In [85]:
fig = go.Figure()
fig.add_trace(go.Bar(x=investiment_risk['kmeans_cluster'], y=investiment_risk['debt_to_equity']*100, hovertext='debt_to_equity', name='Debt to Equity', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (investiment_risk['debt_to_equity']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Debt to Equity by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Debt to Equity (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Debt to Equity Ratio Analysis by Cluster**

**D/E Ratio Overview**

- **`Cluster 0`**: Exhibits a conservative D/E ratio of **37.48%**, indicating a lower reliance on debt for financing.
- **`Cluster 1`**: Has an extremely high D/E ratio of **1891.57%**, suggesting a heavy dependence on debt relative to equity.
- **`Cluster 2`**: Shows a moderate D/E ratio of **79.63%**, which is higher than Cluster 0 but significantly less than Cluster 1.

**Comparative Analysis**

- The D/E ratio of **`Cluster 1`** is far higher than that of the other clusters, which could indicate aggressive growth strategies financed by debt, or a small number of extremely leveraged companies pulling the average up.
- **`Cluster 0`** has a relatively low D/E ratio, which could imply a more conservative financial strategy with less financial risk.
- **`Cluster 2`**'s D/E ratio suggests a balanced approach to financing, utilizing a mix of debt and equity without overly relying on either.

**Strategic Implications**

- Companies within **`Cluster 1`** may require careful risk assessment due to their high leverage levels, which could amplify financial risks during economic downturns.
- Entities in **`Cluster 0`** may be perceived as more stable investments, especially if they maintain profitability with a lower level of debt.
- Firms in **`Cluster 2`** might be leveraging debt to drive growth while still maintaining a reasonable equity buffer.

**Summary**

- The D/E ratios indicate that **`Cluster 1`** companies may be engaging in high-risk/high-reward strategies that rely heavily on debt financing.
- **`Cluster 0`** suggests a cluster of potentially lower-risk companies with conservative financial practices.
- **`Cluster 2`** presents a middle ground, possibly indicative of companies pursuing growth with a cautious approach to leverage.
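
Because the figures above are cluster means, a single highly leveraged company can dominate the result. A quick way to check is to put the mean next to the median and maximum. The sketch below uses made-up values for one cluster; only the column names `kmeans_cluster` and `debt_to_equity` come from the notebook.

```python
import pandas as pd

# Hypothetical D/E values for five companies in one cluster (one extreme case).
toy = pd.DataFrame({
    "kmeans_cluster": [1, 1, 1, 1, 1],
    "debt_to_equity": [0.5, 0.8, 1.2, 1.5, 90.0],
})

# Mean vs. median: a large gap signals that outliers drive the average.
summary = toy.groupby("kmeans_cluster")["debt_to_equity"].agg(["mean", "median", "max"])
print(summary)
```

Running the same aggregation on `df_fundamentals_scored_kmeans` would show whether Cluster 1's average reflects the typical company or a few extreme balance sheets.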


In [86]:
fig = px.bar(investiment_risk, title='Average Price To Sales Trailing 12 Months', x='kmeans_cluster', y='price_to_sales_trailing_12_months', color_discrete_sequence=['rgb(100, 195, 181)'], hover_name='price_to_sales_trailing_12_months', height=600)
fig.update_traces(text=[f'{x:.2f}x' for x in (investiment_risk['price_to_sales_trailing_12_months'])], textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='Kmeans Cluster', yaxis_title='Average Price To Sales Trailing 12 Months (x)', template = 'plotly_dark')
fig.show()

**Price to Sales Ratio Analysis by Cluster**

**P/S Ratio Overview**

- **`Cluster 0`**: Exhibits a P/S ratio of **3.26x**, suggesting investors may be willing to pay a premium for these companies' sales.
- **`Cluster 1`**: Has a lower P/S ratio of **2.11x**, which could indicate more modestly valued sales.
- **`Cluster 2`**: Shows the highest P/S ratio at **3.69x**, possibly reflecting high growth expectations or a premium on the sales generated by these companies.

**Comparative Analysis**

- The P/S ratios for **`Cluster 0`** and **`Cluster 2`** are higher than that of **`Cluster 1`**, which may suggest that companies in Clusters 0 and 2 are expected to grow faster or have a stronger competitive edge.
- **`Cluster 1`**'s lower P/S ratio might indicate that its companies are priced more cheaply per unit of sales, or it could suggest that the market views these companies as having lower growth prospects compared to the others.

**Implications for Investors**

- A higher P/S ratio in **`Cluster 0`** and **`Cluster 2`** could appeal to growth-oriented investors who are looking for companies that might have higher future revenue potential.
- The relatively lower P/S ratio in **`Cluster 1`** may attract value investors seeking to capitalize on potential market undervaluation.

**Summary**

- **`Cluster 0`** and **`Cluster 2`** present higher valuation multiples on sales, which could indicate higher market expectations for future growth.
- **`Cluster 1`** offers potentially more conservative valuation levels, which might be attractive for investors looking for current value rather than future growth.


##### Market Assessment

In [32]:
market_assessment = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['trailing_pe', 'forward_pe', 'market_cap', 'enterprise_value', 'price_to_book']].mean()
market_assessment = market_assessment.reset_index()
market_assessment

| | kmeans_cluster | trailing_pe | forward_pe | market_cap | enterprise_value | price_to_book |
|---|---|---|---|---|---|---|
| 0 | 0 | 14.075959 | 5.52186 | 5800289000.0 | 9335457000.0 | 4.317683 |
| 1 | 1 | 12.302938 | 8.616403 | 84845650000.0 | 164926800000.0 | 1.872569 |
| 2 | 2 | 9.631109 | 2.205538 | 751471400.0 | 1731520000.0 | 8.262989 |


In [87]:
fig = px.bar(market_assessment, title='Average Price to Earnings Ratio', x='kmeans_cluster', y='trailing_pe', color_discrete_sequence=['rgb(100, 195, 181)'], hover_name='trailing_pe', height=600)
fig.update_traces(text=[f'{x:.2f}x' for x in (market_assessment['trailing_pe'])], textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='Kmeans Cluster', yaxis_title='Price to Earnings Ratio (x)', template = 'plotly_dark')
fig.show()

**Price to Earnings Ratio (P/E) Analysis by Cluster**

**`P/E Ratio Overview`**

- **`Cluster 0`**: Demonstrates a P/E ratio of **14.08x**, suggesting investors may expect higher earnings growth or the stock may be more expensive compared to its earnings.
- **`Cluster 1`**: Reflects a slightly lower P/E ratio of **12.30x**, potentially indicating more moderate growth expectations or a more reasonable stock price relative to earnings.
- **`Cluster 2`**: Presents the lowest P/E ratio at **9.63x**, which could imply that the market views these companies as having lower growth prospects or they are undervalued relative to their earnings.

**`Comparative Analysis`**

- The P/E ratios suggest that **`Cluster 0`** is valued the highest by the market, followed by **`Cluster 1`**, with **`Cluster 2`** being the least valued in terms of earnings.
- A lower P/E in **`Cluster 2`** might attract value investors who believe the market has undervalued these companies' earnings potential.

**`Strategic Implications`**

- Companies in **`Cluster 0`** with the highest P/E ratio might be targeted by investors for their potential growth or because they have stable, proven profitability.
- The relatively lower P/E ratios in **`Clusters 1 and 2`** might suggest that these companies could be undervalued or they may not have the same growth expectations as those in Cluster 0.

**Summary**

- **`Cluster 0`**'s higher P/E ratio indicates a market perception of higher growth or a premium on earnings.
- **`Cluster 1`** and **`Cluster 2`** have lower P/E ratios, possibly indicating that the market sees them as having slower growth prospects or they might represent more attractively priced investment opportunities based on earnings.


In [88]:
fig = px.bar(market_assessment, title='Average Forward Price to Earnings', x='kmeans_cluster', y='forward_pe', color_discrete_sequence=['rgb(100, 195, 181)'], hover_name='forward_pe', height=600)
fig.update_traces(text=[f'{x:.2f}x' for x in (market_assessment['forward_pe'])], textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='Kmeans Cluster', yaxis_title='Forward Price to Earnings (x)', template = 'plotly_dark')
fig.show()

**Forward Price to Earnings (P/E) Ratio Analysis by Cluster**

**`Forward P/E Ratio Overview`**

- **`Cluster 0`**: The forward P/E ratio is **5.52x**, indicating moderate expectations for future earnings.
- **`Cluster 1`**: Has a forward P/E ratio of **8.62x**, suggesting higher market expectations for future earnings growth.
- **`Cluster 2`**: With a forward P/E ratio of **2.21x**, the market may have conservative expectations for future earnings or the cluster may represent undervalued companies.

**`Comparative Analysis`**

- **`Cluster 1`**'s higher forward P/E ratio may indicate optimism regarding future profitability or a premium for expected growth.
- The lower forward P/E ratios in **`Cluster 0`** and **`Cluster 2`** could imply more modest growth expectations or potentially undervalued stocks if the market underestimates their future earnings potential.

**`Implications for Investors`**

- Investors may interpret **`Cluster 1`**'s higher forward P/E as an indicator of either higher risk due to expectations of growth or confidence in these companies' future performance.
- The lower forward P/E ratios for **`Cluster 0`** and **`Cluster 2`** could appeal to value-oriented investors or those looking for companies potentially poised for a re-rating if they deliver on earnings.

**Summary**

- The forward P/E ratios suggest that **`Cluster 1`** companies are viewed by the market as having greater growth potential compared to **`Cluster 0`** and **`Cluster 2`**.
- **`Cluster 0`**'s and **`Cluster 2`**'s lower forward P/E ratios may offer attractive entry points for investors if the actual future earnings surpass market expectations.
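
Trailing and forward P/E share the same price in the numerator, so their ratio approximates the change in earnings per share that the forward estimate implies: forward EPS / trailing EPS = trailing P/E / forward P/E. The sketch below applies this to the cluster averages from the `market_assessment` table. Treat it as a rough cross-check only, since these are averages of company-level ratios and individual companies may differ widely.

```python
import pandas as pd

# Cluster averages copied from the market_assessment table.
pe = pd.DataFrame({
    "kmeans_cluster": [0, 1, 2],
    "trailing_pe": [14.075959, 12.302938, 9.631109],
    "forward_pe": [5.52186, 8.616403, 2.205538],
})

# Same price in both ratios, so trailing_pe / forward_pe ~ forward EPS / trailing EPS.
pe["implied_eps_growth_pct"] = (pe["trailing_pe"] / pe["forward_pe"] - 1) * 100
print(pe.round(2))
```

On this reading, the gap between trailing and forward multiples is widest for Cluster 2 and narrowest for Cluster 1, which is worth keeping in mind when interpreting a low forward P/E as a sign of modest expectations.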


In [89]:
formatted_market_assessment = format_values(market_assessment['market_cap'])

fig = go.Figure()
fig.add_trace(go.Bar(x=market_assessment['kmeans_cluster'], y=market_assessment['market_cap'], hovertext=formatted_market_assessment, name='Market Cap', marker=dict(color='rgb(100, 195, 181)'), text=formatted_market_assessment))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Market Cap by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Market Cap (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Market Capitalization Analysis by Cluster**

**`Market Cap Overview`**

- **`Cluster 0`**: The average market cap is **R$5.80 Billion**, suggesting that it comprises mid-sized companies.
- **`Cluster 1`**: Has a substantially higher average market cap at **R$84.85 Billion**, indicating that this cluster likely contains large-cap companies.
- **`Cluster 2`**: Exhibits a much smaller average market cap of **R$751.47 Million**, which could be indicative of small-cap companies.

**`Comparative Analysis`**

- **`Cluster 1`** stands out with a significantly higher market cap, which may reflect well-established companies with a larger shareholder base and greater overall valuation.
- The market caps for **`Cluster 0`** and **`Cluster 2`** suggest these clusters consist of smaller companies, possibly with less market presence or in earlier stages of growth.

**`Strategic Implications`**

- Investors might consider companies in **`Cluster 1`** as potentially more stable and established, with a corresponding lower risk profile due to their large size.
- Companies in **`Cluster 0`** and particularly **`Cluster 2`** might offer higher growth potential, often associated with mid and small-cap stocks, albeit with potentially higher risk.

**Summary**

- The market cap data paints a clear picture of **`Cluster 1`** as the home of large-cap entities, possibly with significant industry influence and stability.
- **`Cluster 0`** represents mid-sized companies that balance growth potential with established business models.
- **`Cluster 2`** could attract investors interested in small-cap companies, which may have higher growth potential and the ability to adapt quickly to market changes.


In [91]:
formatted_market_assessment = format_values(market_assessment['enterprise_value'])

fig = go.Figure()
fig.add_trace(go.Bar(x=market_assessment['kmeans_cluster'], y=market_assessment['enterprise_value'], hovertext=formatted_market_assessment, name='Enterprise Value', marker=dict(color='rgb(100, 195, 181)'), text=formatted_market_assessment))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Enterprise Value by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Enterprise Value (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average Enterprise Value (EV) Analysis by Cluster**

**`EV Overview`**

- **`Cluster 0`**: The average EV is **R$9.34 Billion**, indicating the total value of mid-sized companies in this cluster.
- **`Cluster 1`**: Has a much larger average EV at **R$164.93 Billion**, reflecting the high value of large companies in this cluster.
- **`Cluster 2`**: Possesses an average EV of **R$1.73 Billion**, which is closer to Cluster 0, suggesting small to mid-sized enterprises.

**`Comparative Analysis`**

- The enormous average EV of **`Cluster 1`** suggests that it contains companies with significant capital structures and market influence.
- **`Cluster 0`** and **`Cluster 2`** have much lower EVs, indicating smaller companies that may be more nimble or in earlier stages of growth.

**`Strategic Implications`**

- The high EV in **`Cluster 1`** could attract investors interested in large, possibly more established companies with substantial market presence.
- **`Cluster 0`** and **`Cluster 2`**, with their lower EVs, might appeal to those seeking investment opportunities in smaller companies, which could offer potential for growth or value discovery.

**`Summary`**

- The difference in EV across the clusters indicates a distinct separation between the large companies in **`Cluster 1`** and the smaller companies in **`Clusters 0 and 2`**.
- This disparity may reflect varying investment profiles, with **`Cluster 1`** representing well-established, capital-heavy companies and **`Clusters 0 and 2`** comprising entities with potentially different growth trajectories or investment risks.
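
Enterprise value is commonly defined as market capitalization plus debt minus cash (with adjustments such as preferred equity and minority interests), so the gap between EV and market cap gives a rough estimate of net debt. The sketch below computes that gap from the cluster averages in the `market_assessment` table. The simplified definition is an assumption, as the notebook does not state how `enterprise_value` was sourced.

```python
import pandas as pd

# Cluster averages copied from the market_assessment table (R$).
ev = pd.DataFrame({
    "kmeans_cluster": [0, 1, 2],
    "market_cap": [5800289000.0, 84845650000.0, 751471400.0],
    "enterprise_value": [9335457000.0, 164926800000.0, 1731520000.0],
})

# Simplified view: EV - market cap ~ net debt; EV / market cap shows its relative weight.
ev["implied_net_debt"] = ev["enterprise_value"] - ev["market_cap"]
ev["ev_to_market_cap"] = ev["enterprise_value"] / ev["market_cap"]
print(ev[["kmeans_cluster", "implied_net_debt", "ev_to_market_cap"]].round(2))
```

An EV well above market cap in every cluster suggests that net debt is a meaningful part of the capital structure across the board, not only among the large companies.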


In [93]:
fig = px.bar(market_assessment, title='Average Price to Book', x='kmeans_cluster', y='price_to_book', color_discrete_sequence=['rgb(100, 195, 181)'], hover_name='price_to_book', height=600)
fig.update_traces(text=[f'{x:.2f}x' for x in (market_assessment['price_to_book'])], textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='Kmeans Cluster', yaxis_title='Average Price to Book (x)', template = 'plotly_dark')
fig.show()

**Price to Book (P/B) Ratio Analysis by Cluster**

**`P/B Ratio Overview`**

- **`Cluster 0`**: The P/B ratio is **4.32x**, suggesting that the market values the companies' net assets at over four times their book value.
- **`Cluster 1`**: Has a P/B ratio of **1.87x**, indicating a lower market valuation relative to book value than Cluster 0.
- **`Cluster 2`**: Exhibits the highest P/B ratio at **8.26x**, indicating significant market valuation compared to the book value of the assets.

**`Comparative Analysis`**

- The high P/B ratio in **`Cluster 2`** suggests that the market may be ascribing a high value to the companies' assets or future profit potential.
- **`Cluster 0`** has a relatively high P/B ratio as well, though not as extreme as `Cluster 2`, which could still indicate expectations of value creation.
- **`Cluster 1`**'s lower P/B ratio might suggest that the market is pricing the companies more conservatively, potentially indicating undervaluation.

**`Implications for Investors`**

- **`Cluster 2`**'s high P/B ratio could be attractive to investors looking for companies with strong intangible assets or market positions that are not fully reflected on the balance sheet.
- **`Cluster 0`**'s P/B ratio, which sits between those of the other two clusters, might appeal to those seeking companies that are reasonably valued by the market but still expected to perform well.
- The lower P/B ratio in **`Cluster 1`** could attract value investors who believe the market has not fully recognized the asset value or potential of these companies.

**`Summary`**

- The P/B ratios reflect varying levels of market expectations and valuation of companies' net assets across the clusters.
- **`Cluster 2`** stands out with the highest valuation, which may indicate higher growth expectations or premium assets, while **`Cluster 1`** may represent more value-oriented investment opportunities.


##### Market History

In [38]:
market_history = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['fifty_two_week_low', 'fifty_two_week_high', 'fifty_day_average', 'two_hundred_day_average']].mean()
market_history = market_history.reset_index()
market_history

| | kmeans_cluster | fifty_two_week_low | fifty_two_week_high | fifty_day_average | two_hundred_day_average |
|---|---|---|---|---|---|
| 0 | 0 | 18.203691 | 30.82196 | 24.288738 | 23.626257 |
| 1 | 1 | 18.275385 | 28.504936 | 23.381559 | 22.734482 |
| 2 | 2 | 12.988917 | 76.680763 | 22.479051 | 38.574618 |


In [94]:
formatted_market_history = format_values(market_history['fifty_two_week_low'])

fig = go.Figure()
fig.add_trace(go.Bar(x=market_history['kmeans_cluster'], y=market_history['fifty_two_week_low'], hovertext=formatted_market_history, name='52 Week Low', marker=dict(color='rgb(100, 195, 181)'), text=formatted_market_history))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average 52 Week Low', xaxis_title='KMeans Cluster', yaxis_title='Average 52 Week Low (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average 52 Week Low Analysis by Cluster**

**`52 Week Low Overview`**

- **`Cluster 0`**: The average 52-week low stands at **R$18.20**, suggesting a certain level of price stability or support over the past year.
- **`Cluster 1`**: Has an almost equivalent average 52-week low of **R$18.28**, indicating a similar stock price floor as Cluster 0.
- **`Cluster 2`**: Exhibits a lower average 52-week low at **R$12.99**, possibly indicating more volatility or a different market valuation.

**`Comparative Analysis`**

- **`Clusters 0 and 1`** have nearly identical 52-week lows, which may suggest these clusters have similar risk profiles or market perceptions regarding their lowest acceptable price levels.
- **`Cluster 2`**'s lower 52-week low could reflect a range of factors, including market sentiment, company-specific news, or broader economic conditions impacting stock prices.

**`Implications for Investors`**

- Investors might view the higher 52-week lows of **`Clusters 0 and 1`** as indicative of a stronger price floor, potentially providing a level of reassurance about the stocks' stability.
- The lower 52-week low in **`Cluster 2`** may appeal to investors looking for potentially undervalued stocks or those with a higher risk tolerance.

**`Summary`**

- The 52-week low figures provide a glimpse into the price history and potential support levels for stocks within each cluster.
- While **`Clusters 0 and 1`** show similar levels of their 52-week lows, **`Cluster 2`** stands out with a lower threshold, which could signal either a buying opportunity or a cautionary flag depending on individual investor analysis.


In [95]:
formatted_market_history = format_values(market_history['fifty_two_week_high'])

fig = go.Figure()
fig.add_trace(go.Bar(x=market_history['kmeans_cluster'], y=market_history['fifty_two_week_high'], hovertext=formatted_market_history, name='52 Week High', marker=dict(color='rgb(100, 195, 181)'), text=formatted_market_history))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average 52 Week High', xaxis_title='KMeans Cluster', yaxis_title='Average 52 Week High (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average 52 Week High Analysis by Cluster**

**`52 Week High Overview`**

- **`Cluster 0`**: The average 52-week high stands at **R$30.82**, indicating the peak price level stocks in this cluster have reached over the past year.
- **`Cluster 1`**: Has a slightly lower average 52-week high of **R$28.50**, suggesting a modest difference in peak price compared to Cluster 0.
- **`Cluster 2`**: Exhibits a much higher average 52-week high at **R$76.68**, pointing to a significant upward price movement for stocks in this cluster.

**`Comparative Analysis`**

- The notable high for **`Cluster 2`** suggests these stocks may have experienced substantial investor interest or positive market conditions driving the prices up.
- **`Cluster 0`** and **`Cluster 1`** show less variation in their 52-week highs, potentially indicating more stable or less volatile price movements throughout the year.

**`Implications for Investors`**

- Investors might consider the high 52-week value in **`Cluster 2`** as a sign of robust performance, possibly driven by strong fundamentals or investor optimism.
- The lower highs in **`Cluster 0`** and **`Cluster 1`** may be perceived as less aggressive growth, which could appeal to investors with a more conservative risk appetite.

**`Summary`**

- The 52-week high data reveals different peaks in stock prices within each cluster, with **`Cluster 2`** standing out for the highest values reached, which could reflect higher volatility or growth potential.
- **`Clusters 0 and 1`** present closer ranges of 52-week highs, indicating possibly more consistency or less dramatic price swings within those clusters.
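
The 52-week low and high are easier to compare across clusters when combined into a single relative range. The sketch below expresses the distance from low to high as a percentage of the low, using the cluster averages from the `market_history` table. Because these are averages of price levels, a few high-priced stocks can dominate a cluster, so the result is indicative only.

```python
import pandas as pd

# Cluster averages copied from the market_history table (R$).
price_range = pd.DataFrame({
    "kmeans_cluster": [0, 1, 2],
    "fifty_two_week_low": [18.203691, 18.275385, 12.988917],
    "fifty_two_week_high": [30.82196, 28.504936, 76.680763],
})

# Width of the 52-week range relative to the low, in percent.
price_range["range_pct_of_low"] = (
    (price_range["fifty_two_week_high"] - price_range["fifty_two_week_low"])
    / price_range["fifty_two_week_low"] * 100
)
print(price_range.round(2))
```

Computing the same ratio per ticker before averaging would give a cleaner picture of how wide the typical company's range is within each cluster.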


In [97]:
formatted_market_history = format_values(market_history['fifty_day_average'])

fig = go.Figure()
fig.add_trace(go.Bar(x=market_history['kmeans_cluster'], y=market_history['fifty_day_average'], hovertext=formatted_market_history, name='50 Days Average', marker=dict(color='rgb(100, 195, 181)'), text=formatted_market_history))
fig.update_traces(textposition='outside')
fig.update_layout(title='50 Days Average', xaxis_title='KMeans Cluster', yaxis_title='Average 50 Days (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**50 Days Average Price Analysis by Cluster**

**`50 Days Average Overview`**

- **`Cluster 0`**: The average price over the past 50 days stands at **R$24.29**, slightly above the cluster's 200-day average of R$23.63, which is consistent with stable or mildly rising prices.
- **`Cluster 1`**: Shows a 50-day average price of **R$23.38**, indicating a slightly lower trading range than Cluster 0.
- **`Cluster 2`**: The average price is at **R$22.48**, which is the lowest among the clusters, potentially indicating a downtrend or lower valuation by the market.

**`Comparative Analysis`**

- The highest 50-day average price in **`Cluster 0`** suggests that stocks in this cluster might be experiencing steady demand or upward price momentum.
- The lower averages in **`Cluster 1`** and **`Cluster 2`** might reflect a more cautious market sentiment or a response to different sector dynamics or economic factors.

**`Implications for Investors`**

- The 50-day average is closely watched by traders and investors as it can signal short-term trends. A higher 50-day average in **`Cluster 0`** might be appealing for those looking for positive momentum.
- Conversely, the lower averages in **`Cluster 1`** and **`Cluster 2`** could attract investors seeking value or a potential price correction opportunity.

**`Summary`**

- The 50-day average price data provides insights into recent stock price trends within each cluster.
- Stocks in **`Cluster 0`** show the highest average, possibly indicating more bullish behavior, while **`Cluster 1`** and **`Cluster 2`** may represent stocks that have seen less price appreciation over the last 50 days.


In [42]:
formatted_market_history = format_values(market_history['two_hundred_day_average'])

fig = go.Figure()
fig.add_trace(go.Bar(x=market_history['kmeans_cluster'], y=market_history['two_hundred_day_average'], hovertext=formatted_market_history, name='200 Days Average', marker=dict(color='rgb(100, 195, 181)'), text=formatted_market_history))
fig.update_traces(textposition='outside')
fig.update_layout(title='200 Days Average', xaxis_title='KMeans Cluster', yaxis_title='Average 200 Days (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**200 Days Average Price Analysis by Cluster**

**`200 Days Average Overview`**

- **`Cluster 0`**: The average price over the past 200 days stands at **R$23.63**, suggesting a level of stability or possibly a long-term uptrend in price.
- **`Cluster 1`**: Shows a 200-day average price of **R$22.73**, indicating a slightly lower trading range than Cluster 0.
- **`Cluster 2`**: The average price is at **R$38.57**, which is the highest among the clusters, potentially indicating a strong uptrend or higher valuation by the market.

**`Comparative Analysis`**

- The highest 200-day average price in **`Cluster 2`** suggests that stocks in this cluster might be experiencing consistent demand or upward price momentum.
- The closer averages in **`Cluster 0`** and **`Cluster 1`** might reflect a more cautious market sentiment or a response to different sector dynamics or economic factors.

**`Implications for Investors`**

- The 200-day average is closely watched by traders and investors as it can signal long-term trends. A higher 200-day average in **`Cluster 2`** might be appealing for those looking for sustained positive momentum.
- Conversely, the similar and lower averages in **`Cluster 0`** and **`Cluster 1`** could attract investors seeking value or a potential price correction opportunity.

**`Summary`**

- The 200-day average price data provides insights into the longer-term stock price trends within each cluster.
- Stocks in **`Cluster 2`** show the highest average, possibly indicating more bullish behavior, while **`Cluster 0`** and **`Cluster 1`** may represent stocks that have seen less price appreciation over the last 200 days.
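
Comparing the two moving averages side by side can make the contrast between clusters easier to see. The sketch below is illustrative only: it hard-codes the per-cluster averages quoted in the 50-day and 200-day analyses above rather than reading `market_history`, and the column name `fifty_to_two_hundred` is an assumption introduced here. Because these are cluster means rather than per-stock price series, the ratio is a rough summary and not a trading signal.

```python
import pandas as pd

# Per-cluster averages as quoted in the two analyses above (R$).
averages = pd.DataFrame({
    "kmeans_cluster": [0, 1, 2],
    "fifty_day_average": [24.29, 23.38, 22.48],
    "two_hundred_day_average": [23.63, 22.73, 38.57],
})

# Ratio > 1: the 50-day mean sits above the 200-day mean (recent strength).
# Ratio < 1: the 50-day mean sits below the 200-day mean (recent weakness).
averages["fifty_to_two_hundred"] = (
    averages["fifty_day_average"] / averages["two_hundred_day_average"]
)

print(averages.round(3))
```

With these inputs, Clusters 0 and 1 come out slightly above 1, while Cluster 2 falls well below 1, which is worth keeping in mind when reading its high 200-day average as "bullish".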

#### Dividend Policy

##### Dividend Payment

In [43]:
dividend_payment = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['dividend_rate', 'trailing_annual_dividend_rate', 'trailing_annual_dividend_yield', 'dividend_payout_ratio']].mean()
dividend_payment = dividend_payment.reset_index()
dividend_payment

| | kmeans_cluster | dividend_rate | trailing_annual_dividend_rate | trailing_annual_dividend_yield | dividend_payout_ratio |
|---|---|---|---|---|---|
| 0 | 0 | 1.800496 | 1.097661 | 0.042841 | 96.602199 |
| 1 | 1 | 1.663462 | 1.30875 | 0.059135 | 197.572295 |
| 2 | 2 | 68.972458 | 0.822729 | 0.018401 | 1342.209178 |


In [98]:
formatted_dividend_rate= format_values(dividend_payment['dividend_rate'])

fig = go.Figure()
fig.add_trace(go.Bar(x=dividend_payment['kmeans_cluster'], y=dividend_payment['dividend_rate'], hovertext=formatted_dividend_rate, name='Dividend Rate', marker=dict(color='rgb(100, 195, 181)'), text=formatted_dividend_rate))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Dividend Rate', xaxis_title='KMeans Cluster', yaxis_title='Average Dividend Rate (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average Dividend Rate Analysis by Cluster**

**`Average Dividend Rate Overview`**

- **`Cluster 0`**: The average dividend rate is **R$1.80**, which might indicate a conservative dividend payout policy or a reinvestment strategy.
- **`Cluster 1`**: With an average dividend rate of **R$1.66**, this cluster is slightly lower than Cluster 0, suggesting a similar financial strategy among these companies.
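- **`Cluster 2`**: The average jumps to **R$68.97**, far above the other clusters. Because this cluster also has the lowest trailing annual dividend rate (R$0.82) and yield (1.84%) in the table above, the mean is more likely inflated by a few extreme values than evidence of a cluster-wide focus on returning income to shareholders.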

**`Comparative Analysis`**

- The stark difference in dividend rates between **`Cluster 2`** and the other clusters could reflect a fundamental difference in company maturity, sector characteristics, or cash flow management.
- **`Cluster 0`** and **`Cluster 1`** have relatively similar dividend rates, which may indicate comparable financial policies or sectoral behaviors.

**`Implications for Investors`**

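- Investors should not treat the high average dividend rate in **`Cluster 2`** as a high yield: the rate is an amount in R$ per share, and this cluster's trailing annual dividend yield (1.84%) is the lowest of the three.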
- Those looking for potential growth companies that may be reinvesting their earnings rather than paying out dividends might lean towards **`Cluster 0`** and **`Cluster 1`**.

**`Summary`**

- The average dividend rate provides an insight into the dividend distribution strategy of the clusters.
- **`Cluster 2`** stands out with a high average dividend rate, but its low trailing rate and yield suggest the figure needs verification before it is used to target income-focused investors; **`Cluster 0`** and **`Cluster 1`** show similar, modest rates.
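
One way to check whether a cluster mean such as R$68.97 is driven by a handful of extreme values is to compare the mean with the median. The sketch below uses a small, made-up sample (the `sample` frame and its values are hypothetical, not taken from `df_fundamentals_scored_kmeans`); the same `groupby(...).agg(...)` call could be pointed at the real `dividend_rate` column.

```python
import pandas as pd

# Hypothetical per-company dividend rates (R$); NOT taken from the dataset.
sample = pd.DataFrame({
    "kmeans_cluster": [0, 0, 0, 2, 2, 2],
    "dividend_rate": [1.5, 1.8, 2.1, 0.5, 0.9, 205.0],
})

# Mean and median per cluster; a large gap between them points to outliers.
summary = sample.groupby("kmeans_cluster")["dividend_rate"].agg(["mean", "median"])
summary["mean_to_median"] = summary["mean"] / summary["median"]

print(summary)
```

In this toy sample a single company pulls the Cluster 2 mean far above its median, while Cluster 0's mean and median agree. A similar gap in the real data would support reporting medians alongside means.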


In [99]:
formatted_trailing_annual_dividend_rate= format_values(dividend_payment['trailing_annual_dividend_rate'])

fig = go.Figure()
fig.add_trace(go.Bar(x=dividend_payment['kmeans_cluster'], y=dividend_payment['trailing_annual_dividend_rate'], hovertext=formatted_trailing_annual_dividend_rate, name='Trailing Annual Dividend Rate', marker=dict(color='rgb(100, 195, 181)'), text=formatted_trailing_annual_dividend_rate))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Trailing Annual Dividend Rate', xaxis_title='KMeans Cluster', yaxis_title='Average Trailing Annual Dividend Rate (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average Trailing Annual Dividend Rate Analysis by Cluster**

**`Average Trailing Annual Dividend Rate Overview`**

- **`Cluster 0`**: The trailing average annual dividend rate is **R$1.10**, which could indicate a moderate yield for investors looking for steady income.
- **`Cluster 1`**: With a slightly higher rate of **R$1.31**, this cluster may have companies with a slightly more generous dividend policy.
- **`Cluster 2`**: Shows a trailing average annual dividend rate of **R$0.82**, the lowest among the clusters, perhaps reflecting a different approach to dividend distributions or reinvestment strategies.

**`Comparative Analysis`**

- **`Cluster 1`** has the highest average trailing annual dividend rate, which could suggest that its companies have more robust dividend policies or higher profitability allowing for such distributions.
- **`Cluster 0`** and **`Cluster 2`** have lower rates, which might indicate these companies prioritize reinvesting their earnings over distributing them as dividends.

**`Implications for Investors`**

- Income-focused investors may find **`Cluster 1`** companies more attractive due to their higher average dividend payouts.
- Growth-oriented investors might look towards **`Cluster 0`** and **`Cluster 2`** for companies that could be reinvesting in their business to fuel future growth.

**`Summary`**

- The trailing annual dividend rate gives an insight into the dividend payout tendencies of the clusters over the past year.
- **`Cluster 1`** presents the highest average, potentially attracting dividend-seeking investors, while **`Cluster 0`** and **`Cluster 2`** might appeal to those looking for companies with a reinvestment-focused financial strategy.


In [100]:
fig = go.Figure()
fig.add_trace(go.Bar(x=dividend_payment['kmeans_cluster'], y=dividend_payment['trailing_annual_dividend_yield'], hovertext='trailing_annual_dividend_yield', name='Trailing Annual Dividend Yield', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (dividend_payment['trailing_annual_dividend_yield']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Trailing Annual Dividend Yield by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Trailing Annual Dividend Yield (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average Trailing Annual Dividend Yield Analysis by Cluster**

**`Average Trailing Annual Dividend Yield Overview`**

- **`Cluster 0`**: The average trailing annual dividend yield is **4.28%**, a moderate dividend return that sits between the other two clusters.
- **`Cluster 1`**: With an average yield of **5.91%**, this cluster tops the list, suggesting a potentially attractive option for income-seeking investors.
- **`Cluster 2`**: Presents a yield of **1.84%**, which is considerably lower than the other clusters, possibly indicating growth-focused companies with lower dividend payouts.

**`Comparative Analysis`**

- **`Cluster 1`** has the highest average dividend yield, which could be indicative of its companies' higher payout ratios or currently undervalued stock prices.
- **`Cluster 0`**'s yield suggests a balanced approach to dividend payments relative to stock price, providing a decent income while potentially retaining earnings for growth.
- The lower yield in **`Cluster 2`** could reflect a strategy of retaining earnings for reinvestment rather than distribution.

**`Implications for Investors`**

- The significant dividend yield in **`Cluster 1`** may appeal to those prioritizing income, especially in a low-interest-rate environment.
- Investors might perceive the yields in **`Cluster 0`** and especially **`Cluster 2`** as indicative of companies with future growth potential, reinvesting their earnings instead of paying high dividends.

**`Summary`**

- The trailing annual dividend yields provide insights into the income-generating potential of the stocks within each cluster.
- **`Cluster 1`** stands out for income investors, while **`Cluster 0`** and **`Cluster 2`** may align with the goals of investors who have a longer-term growth perspective.


In [101]:
fig = go.Figure()
fig.add_trace(go.Bar(x=dividend_payment['kmeans_cluster'], y=dividend_payment['dividend_payout_ratio'], hovertext='dividend_payout_ratio', name='Dividend Payout Ratio', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (dividend_payment['dividend_payout_ratio'])]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Dividend Payout Ratio by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Dividend Payout Ratio (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average Dividend Payout Ratio Analysis by Cluster**

**`Average Dividend Payout Ratio Overview`**

- **`Cluster 0`**: The average dividend payout ratio is **96.60%**, which is relatively high and could indicate that companies are returning most of their earnings to shareholders as dividends.
- **`Cluster 1`**: With a payout ratio of **197.57%**, companies in this cluster are paying out more in dividends than they earn, which could be unsustainable in the long term.
- **`Cluster 2`**: Shows an extraordinarily high payout ratio of **1342.21%**, suggesting that dividends might be funded through debt or cash reserves, or that a few companies with near-zero earnings are distorting the mean. Either way, it calls the sustainability of such payouts into question.

**`Comparative Analysis`**

- **`Cluster 0`** has a high payout ratio, but still within a range that could be considered sustainable if the companies have stable earnings.
- **`Cluster 1`** and **`Cluster 2`** have payout ratios that exceed 100%, which is unusual and might indicate special dividend payments or a period of lower earnings.

**`Implications for Investors`**

- A high payout ratio like in **`Cluster 0`** could be attractive to income investors if it is supported by strong fundamentals and consistent earnings.
- The excessive payout ratios in **`Cluster 1`** and **`Cluster 2`** raise questions about dividend sustainability and may require further investigation into the companies' financial health.

**`Summary`**

- The dividend payout ratio is a key indicator of how much a company pays out in dividends relative to its earnings.
- **`Cluster 0`** suggests a strong focus on returning earnings to shareholders, whereas the unusually high ratios in **`Cluster 1`** and **`Cluster 2`** could be red flags for investors looking for sustainable dividend policies.
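
To see how payout ratios far above 100% can arise, it helps to write the ratio out. The sketch below assumes the usual definition (dividends per share divided by earnings per share, in percent); the function name `payout_ratio` and the per-share figures are hypothetical and do not come from the dataset.

```python
def payout_ratio(dividends_per_share: float, earnings_per_share: float) -> float:
    """Dividend payout ratio in percent: dividends / earnings * 100."""
    if earnings_per_share <= 0:
        # With zero or negative earnings the ratio has no useful meaning.
        raise ValueError("payout ratio is not meaningful for non-positive earnings")
    return dividends_per_share / earnings_per_share * 100


# Hypothetical figures (R$ per share), for illustration only.
covered = payout_ratio(1.00, 1.25)    # dividend covered by earnings
stretched = payout_ratio(1.00, 0.05)  # same dividend, earnings close to zero

print(f"covered: {covered:.1f}%")
print(f"stretched: {stretched:.1f}%")
```

The same dividend produces a very different ratio once earnings shrink, so a cluster average in the hundreds or thousands of percent may reflect a few companies with temporarily low earnings rather than a shared dividend policy.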


#### Financial Health and Capital Structure

##### Liquidity and Reserves

In [103]:
liquidity_reserves = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['total_cash', 'total_cash_per_share', 'book_value']].mean()
liquidity_reserves = liquidity_reserves.reset_index()
liquidity_reserves

| | kmeans_cluster | total_cash | total_cash_per_share | book_value |
|---|---|---|---|---|
| 0 | 0 | 1347213000.0 | 6.527835 | 13.515033 |
| 1 | 1 | 48039140000.0 | 9.765962 | 16.578019 |
| 2 | 2 | 1190988000.0 | 23.64178 | -23.080551 |


In [104]:
formatted_liquidity_reserves= format_values(liquidity_reserves['total_cash'])

fig = go.Figure()
fig.add_trace(go.Bar(x=liquidity_reserves['kmeans_cluster'], y=liquidity_reserves['total_cash'], hovertext=formatted_liquidity_reserves, name='Total Cash', marker=dict(color='rgb(100, 195, 181)'), text=formatted_liquidity_reserves))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Total Cash', xaxis_title='KMeans Cluster', yaxis_title='Average Total Cash (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average Total Cash Analysis by Cluster**

**`Average Total Cash Overview`**

- **`Cluster 0`**: Exhibits an average total cash of **R$1.35 billion**, which could indicate a moderate level of liquidity or reserves for operational and strategic purposes.
- **`Cluster 1`**: Has a significantly higher average total cash of **R$48.04 billion**, suggesting robust liquidity positions and possibly larger companies with more capital-intensive operations.
- **`Cluster 2`**: Shows an average total cash of **R$1.19 billion**, slightly less than Cluster 0, which could reflect a mix of company sizes and capital allocation strategies.

**`Comparative Analysis`**

- The stark difference between **`Cluster 1`** and the others suggests a concentration of wealthier corporations in this cluster, possibly with different industry characteristics or more conservative cash management policies.
- **`Cluster 0`** and **`Cluster 2`**, while closer in cash holdings, might still indicate different risk profiles or stages of business growth.

**`Implications for Investors`**

- Investors may perceive the high cash reserves in **`Cluster 1`** as a sign of strength and a buffer against market volatility or economic downturns.
- The lower cash reserves in **`Cluster 0`** and **`Cluster 2`** might imply either a more aggressive investment approach or a need for careful cash flow management.

**`Summary`**

- Average total cash is a financial metric indicating the amount of cash a company has on hand, which can be crucial for funding operations, growth, and returning value to shareholders.
- Companies in **`Cluster 1`** stand out with substantial cash reserves, possibly offering greater financial stability, while **`Cluster 0`** and **`Cluster 2`** suggest a balance between liquidity and investment opportunities.


In [105]:
formatted_liquidity_reserves= format_values(liquidity_reserves['total_cash_per_share'])

fig = go.Figure()
fig.add_trace(go.Bar(x=liquidity_reserves['kmeans_cluster'], y=liquidity_reserves['total_cash_per_share'], hovertext=formatted_liquidity_reserves, name='Total Cash per Share', marker=dict(color='rgb(100, 195, 181)'), text=formatted_liquidity_reserves))
fig.update_traces(textposition='outside')
fig.update_layout(title='Total Cash per Share', xaxis_title='KMeans Cluster', yaxis_title='Average Total Cash per Share (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Total Cash per Share Analysis by Cluster**

**`Total Cash per Share Overview`**

- **`Cluster 0`**: With an average total cash per share of **R$6.53**, this cluster may reflect companies with moderate liquidity that is available to each share.
- **`Cluster 1`**: Shows a higher average at **R$9.77** per share, suggesting companies with more cash available per share, which can be a positive indicator for investors.
- **`Cluster 2`**: Stands out with **R$23.64** average cash per share, indicating companies that could potentially have substantial cash reserves relative to their share count, possibly hinting at strong financial health or a buildup of reserves for future investments or dividends.

**`Comparative Analysis`**

- **`Cluster 2`** shows a significantly higher average total cash per share, which might indicate these companies are in a better position to weather economic downturns or fund growth without needing to access external capital.
- **`Cluster 0`** and **`Cluster 1`**, while lower, still show that companies have a reasonable amount of cash on hand relative to their shares, which could be reassuring to investors.

**`Implications for Investors`**

- A higher total cash per share in **`Cluster 2`** might appeal to investors looking for companies with a strong cash position, which can be a sign of a well-managed company.
- The lower but still positive figures for **`Cluster 0`** and **`Cluster 1`** suggest that these companies are managing their cash and share distributions effectively, which could be appealing depending on an investor's strategy.

**`Summary`**

- Total cash per share is an important metric that helps investors understand how much cash is available for each share of a company.
- The companies in **`Cluster 2`** demonstrate higher liquidity per share, which may offer a buffer against financial stress or allow for shareholder-friendly actions like dividends or buybacks, while **`Cluster 0`** and **`Cluster 1`** indicate more moderate but still positive cash positions.
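
The gap between total cash and cash per share comes from the share count in the denominator. The sketch below illustrates this with two made-up companies; the function name `cash_per_share` and all figures are hypothetical, chosen only to show that a far larger cash balance can still translate into less cash per share.

```python
def cash_per_share(total_cash: float, shares_outstanding: float) -> float:
    """Cash per share in R$: total cash divided by the share count."""
    return total_cash / shares_outstanding


# Hypothetical companies, for illustration only.
large_cap = cash_per_share(total_cash=48_000_000_000, shares_outstanding=5_000_000_000)
small_cap = cash_per_share(total_cash=1_200_000_000, shares_outstanding=50_000_000)

print(f"large cap: R${large_cap:.2f} per share")
print(f"small cap: R${small_cap:.2f} per share")
```

This is one possible reading of why the cluster with the largest average cash balance does not also lead on cash per share.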


In [106]:
formatted_liquidity_reserves= format_values(liquidity_reserves['book_value'])

fig = go.Figure()
fig.add_trace(go.Bar(x=liquidity_reserves['kmeans_cluster'], y=liquidity_reserves['book_value'], hovertext=formatted_liquidity_reserves, name='Book Value', marker=dict(color='rgb(100, 195, 181)'), text=formatted_liquidity_reserves))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Book Value', xaxis_title='KMeans Cluster', yaxis_title='Average Book Value (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average Book Value Analysis by Cluster**

**`Average Book Value Overview`**

- **`Cluster 0`**: The average book value is **R$13.52**, which could suggest a conservative valuation or stable asset base relative to the market price of the shares.
- **`Cluster 1`**: With an average book value of **R$16.58**, this cluster might indicate companies with a stronger asset base or potentially undervalued stocks if market prices are lagging behind the book value.
- **`Cluster 2`**: Shows a negative average book value at **R$-23.08**, a potential red flag that could suggest issues such as substantial liabilities, declining asset values, or recent losses.

**`Comparative Analysis`**

- The positive average book values in **`Cluster 0`** and **`Cluster 1`** imply that the companies within these clusters have more assets than liabilities on their balance sheets, which is generally a sign of financial stability.
- The negative average book value in **`Cluster 2`** could be concerning, as it may indicate financial distress or a need for a deeper analysis to understand the reasons behind the negative value.

**`Implications for Investors`**

- Investors might be attracted to **`Cluster 1`** for its higher average book value, which could imply that the companies are undervalued or have a solid asset foundation.
- The negative book value in **`Cluster 2`** would typically require cautious assessment from investors, as it may point to companies that are higher risk or in need of turnaround strategies.

**`Summary`**

- Book value is a key metric in assessing a company's net asset value. Positive figures in **`Cluster 0`** and **`Cluster 1`** may indicate healthier financials and a potential safety net for investors.
- In contrast, the negative book value in **`Cluster 2`** suggests that these companies may be overleveraged or experiencing asset devaluation, warranting a closer investigation into their financial practices and market conditions.
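
A negative cluster average does not by itself say how many companies have a negative book value. The sketch below, built on a made-up sample (the `sample` frame is hypothetical and not the real dataset), computes the share of companies with a negative book value in each cluster, which could be applied to the real `book_value` column in the same way.

```python
import pandas as pd

# Hypothetical per-company book values (R$ per share); NOT the real dataset.
sample = pd.DataFrame({
    "kmeans_cluster": [0, 0, 1, 1, 2, 2, 2],
    "book_value": [12.0, 15.0, 16.0, 17.0, 8.0, 5.0, -82.0],
})

# Fraction of companies with a negative book value in each cluster.
negative_share = (
    sample["book_value"].lt(0).groupby(sample["kmeans_cluster"]).mean()
)

# Cluster means, for comparison with the fractions above.
cluster_means = sample.groupby("kmeans_cluster")["book_value"].mean()

print(negative_share)
print(cluster_means)
```

In this toy sample only one of three Cluster 2 companies is negative, yet the cluster mean is negative. If the real data looks similar, the red flag applies to specific companies rather than to the cluster as a whole.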


##### Margins and Leverage

In [52]:
margins_leverage = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['gross_margins', 'ebitda_margins', 'equity']].mean()
margins_leverage = margins_leverage.reset_index()
margins_leverage

| | kmeans_cluster | gross_margins | ebitda_margins | equity |
|---|---|---|---|---|
| 0 | 0 | 0.308575 | 0.737234 | -3492118000.0 |
| 1 | 1 | 0.284312 | 0.243896 | -80504860000.0 |
| 2 | 2 | 0.307552 | 0.05593 | -987932200.0 |


In [107]:
fig = go.Figure()
fig.add_trace(go.Bar(x=margins_leverage['kmeans_cluster'], y=margins_leverage['gross_margins'], hovertext='gross_margins', name='Gross Margins', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (margins_leverage['gross_margins']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Gross Margins by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Gross Margins (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average Gross Margins Analysis by Cluster**

**`Average Gross Margins Overview`**

- **`Cluster 0`**: Demonstrates an average gross margin of **30.86%**, indicative of relatively high profitability and efficient cost management in the production or service delivery processes.
- **`Cluster 1`**: Has a slightly lower average gross margin of **28.43%**, which may suggest either a competitive pricing strategy or higher costs associated with goods sold compared to Cluster 0.
- **`Cluster 2`**: Reports an average gross margin of **30.76%**, comparable to Cluster 0, which may indicate effective cost control or premium pricing strategies.

**`Comparative Analysis`**

- The close margins between **`Cluster 0`** and **`Cluster 2`** can imply similar industry sectors or competitive environments where companies maintain profitability while managing costs effectively.
- **`Cluster 1`**'s lower margin suggests either a different competitive landscape or variance in cost structures, potentially indicating a need for improved efficiency or a different market positioning.

**`Implications for Investors`**

- Investors might view the higher margins in **`Cluster 0`** and **`Cluster 2`** as indicators of strong market positions or superior cost management, which could be a sign of long-term sustainability and profitability.
- The lower margin in **`Cluster 1`** might either be a cause for scrutiny or a signal of investment opportunity if the companies are poised for margin improvement.

**`Summary`**

- Gross margin is a critical measure of a company's financial health: the difference between sales and the cost of goods sold, expressed as a share of sales.
- Companies in **`Cluster 0`** and **`Cluster 2`** show robust margins that suggest effective cost management or strong pricing power, while **`Cluster 1`** may reflect a different set of strategic or operational dynamics.


In [108]:
fig = go.Figure()
fig.add_trace(go.Bar(x=margins_leverage['kmeans_cluster'], y=margins_leverage['ebitda_margins'], hovertext='ebitda_margins', name='Ebitda Margins', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (margins_leverage['ebitda_margins']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Ebitda Margins by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Ebitda Margins (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average EBITDA Margins Analysis by Cluster**

**`Average EBITDA Margins Overview`**

- **`Cluster 0`**: Shows an exceptionally high average EBITDA margin of **73.72%**, suggesting extraordinary operational efficiency or a sector with typically high margins.
- **`Cluster 1`**: Presents an average EBITDA margin of **24.39%**, which is substantially lower than Cluster 0, indicating different industry dynamics or less operational leverage.
- **`Cluster 2`**: Has the lowest average EBITDA margin among the clusters at **5.59%**, which may reflect heavy investment phases, high-cost structures, or industries with lower margins.

**`Comparative Analysis`**

- The significant margin gap between **`Cluster 0`** and the others could be due to Cluster 0 operating in high-margin industries or possessing a competitive advantage in cost management.
- **`Cluster 1`** and **`Cluster 2`** exhibit more typical EBITDA margins, with Cluster 1 being closer to the average industrial benchmark and Cluster 2 possibly facing challenges that are suppressing margins.

**`Implications for Investors`**

- Investors may regard **`Cluster 0`** as highly profitable with a potential for strong cash flow generation, but should also consider the sustainability of such high margins.
- The lower margins in **`Cluster 1`** and **`Cluster 2`** might attract investors looking for companies with potential for margin improvement or those positioned in industries with long-term growth prospects.

**`Summary`**

- EBITDA margin is EBITDA (earnings before interest, taxes, depreciation, and amortization) divided by revenue, and indicates a company's operating profitability.
- **`Cluster 0`** stands out with its high margin, suggesting exceptional profitability, while **`Cluster 1`** and **`Cluster 2`** might represent companies with either steady performance or potential for operational improvements.
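
For reference, the two margins discussed in this subsection can be written as simple ratios. The sketch below assumes the common textbook definitions, with EBITDA approximated as operating income plus depreciation and amortization; the function names and the income-statement figures are hypothetical and not taken from the dataset.

```python
def gross_margin(revenue: float, cost_of_goods_sold: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cost_of_goods_sold) / revenue


def ebitda_margin(revenue: float, operating_income: float,
                  depreciation_amortization: float) -> float:
    """EBITDA margin as a fraction of revenue.

    EBITDA is approximated here as operating income plus D&A.
    """
    return (operating_income + depreciation_amortization) / revenue


# Hypothetical income statement (R$ millions), for illustration only.
gm = gross_margin(revenue=1000.0, cost_of_goods_sold=700.0)
em = ebitda_margin(revenue=1000.0, operating_income=150.0,
                   depreciation_amortization=50.0)

print(f"gross margin: {gm:.2%}")
print(f"EBITDA margin: {em:.2%}")
```

Under these definitions the EBITDA margin is normally below the gross margin, since operating expenses are deducted after gross profit. That makes a cluster average where EBITDA margin exceeds gross margin worth a second look at the underlying data.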


In [109]:
formatted_margins_leverage= format_values(margins_leverage['equity'])

fig = go.Figure()
fig.add_trace(go.Bar(x=margins_leverage['kmeans_cluster'], y=margins_leverage['equity'], hovertext=formatted_margins_leverage, name='Equity', marker=dict(color='rgb(100, 195, 181)'), text=formatted_margins_leverage))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Equity', xaxis_title='KMeans Cluster', yaxis_title='Average Equity (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average Equity Analysis by Cluster**

**`Average Equity Overview`**

- **`Cluster 0`**: Presents a negative average equity value at **R$-3.49 B**, indicating that liabilities may exceed assets, which can be a sign of financial distress or high leverage.
- **`Cluster 1`**: Has the most negative average equity by a wide margin at **R$-80.50 B** (the table above reports -80504860000.0), even though this cluster also holds the largest average cash balance.
- **`Cluster 2`**: Shows an average equity of **R$-987.93 M**, also negative, which can imply financial challenges or a heavy debt load relative to equity.

**`Comparative Analysis`**

- All three clusters have negative average equity, so they differ in magnitude rather than sign: the **`Cluster 1`** average is roughly 23 times more negative than that of **`Cluster 0`** and over 80 times more negative than that of **`Cluster 2`**.
- Because **`Cluster 0`** and **`Cluster 1`** both have positive average book values, the negative equity means may be driven by a few very large companies or by how the `equity` column is signed, so the figures are worth checking before drawing conclusions.

**`Implications for Investors`**

- Negative average equity in every cluster warrants caution: it can point to accumulated losses, high leverage, or capital-intensive businesses, and it calls for company-level analysis.
- The size of the figure in **`Cluster 1`** should not be read as financial strength; investors would need to confirm the underlying balance sheets before relying on it.

**`Summary`**

- Equity represents the residual interest in the assets of a company after deducting liabilities and serves as an indicator of financial health.
- All three clusters show negative average equity, with **`Cluster 1`** the most negative and **`Cluster 2`** the least, which may indicate financial leverage, recent financial difficulties, or a data issue that needs verification.


#### Trading Volume and Activity

##### Trading Volumes

In [56]:
trading_volumes = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['volume', 'average_volume']].mean()
trading_volumes = trading_volumes.reset_index()
trading_volumes

| | kmeans_cluster | volume | average_volume |
|---|---|---|---|
| 0 | 0 | 1469531.0 | 2903112.0 |
| 1 | 1 | 3818669.0 | 10228800.0 |
| 2 | 2 | 257594.1 | 677509.0 |


In [110]:
formatted_trading_volumes= format_values_x(trading_volumes['volume'])


fig = px.bar(trading_volumes, title='Volume', x='kmeans_cluster', y='volume', color_discrete_sequence=['rgb(100, 195, 181)'], hover_name='volume', height=600)
fig.update_traces(text=formatted_trading_volumes, textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='Kmeans Cluster', yaxis_title='Volume', template = 'plotly_dark')
fig.show()

**Trading Volume Analysis by Cluster**

**`Volume Overview`**

- **`Cluster 0`**: Shows a trading volume of **1.47 M**, indicating a moderate level of trading activity and liquidity.
- **`Cluster 1`**: Has a significantly higher trading volume at **3.82 M**, suggesting high liquidity and possibly greater investor interest or larger company size.
- **`Cluster 2`**: The volume is at **257.59 K**, which is lower compared to others, potentially indicating lower liquidity or investor attention.

**`Comparative Analysis`**

- The substantial difference in trading volumes between **`Cluster 1`** and the others could reflect a higher profile or more actively traded companies within this cluster.
- **`Cluster 0`** and **`Cluster 2`** have lower volumes, which could imply more volatility in stock prices or a smaller investor base.

**`Implications for Investors`**

- Investors may find the high volume in **`Cluster 1`** indicative of strong market interest, possibly making these stocks a more reliable choice for trading due to better liquidity.
- The lower volumes in **`Cluster 0`** and **`Cluster 2`** might appeal to investors looking for less mainstream opportunities, but they should be aware of the potential for higher spreads and price volatility.

**`Summary`**

- Trading volume is a key metric in assessing the liquidity and investor interest in stocks.
- **`Cluster 1`**'s high volume may offer more fluid entry and exit points for traders, while **`Cluster 0`** and **`Cluster 2`** might be subject to sharper price movements due to their lower trading volumes.


In [111]:
formatted_trading_volumes= format_values_x(trading_volumes['average_volume'])


fig = px.bar(trading_volumes, title='Average of the mean Volume By Cluster', x='kmeans_cluster', y='average_volume', color_discrete_sequence=['rgb(100, 195, 181)'], hover_name='average_volume', height=600)
fig.update_traces(text=formatted_trading_volumes, textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='Kmeans Cluster', yaxis_title='Average of the mean Volume', template = 'plotly_dark')
fig.show()

**Average of the Mean Trading Volume Analysis by Cluster**

**`Average of the Mean Volume Overview`**

- **`Cluster 0`**: The average trading volume is **3 M**, indicating a moderate level of activity, which may suggest stable investor interest.
- **`Cluster 1`**: Shows a significantly higher average volume at **10 M**, suggesting robust activity and possibly higher liquidity and investor engagement.
- **`Cluster 2`**: Has an average volume of **677.509 K**, which is considerably lower than the other clusters, potentially indicating less investor interest or smaller company size.

**`Comparative Analysis`**

- The disparity between **`Cluster 1`** and the other clusters in terms of average volume could reflect more substantial investor confidence or a larger market capitalization of the companies within that cluster.
- **`Cluster 0`** and especially **`Cluster 2`** have lower volumes, which might be associated with smaller, potentially more volatile stocks or those with lower market caps.

**`Implications for Investors`**

- A higher average volume in **`Cluster 1`** can be appealing to investors looking for liquidity and potentially less volatility due to the ease of entry and exit.
- The smaller volumes in **`Cluster 0`** and **`Cluster 2`** might attract investors seeking niche opportunities or those with a higher risk tolerance, given the potential for more significant price swings.

**`Summary`**

- The average trading volume provides a snapshot of the trading activity and liquidity associated with the stocks within each cluster.
- **`Cluster 1`**'s notably higher average volume may offer more predictable trading patterns, while **`Cluster 0`** and **`Cluster 2`** could be indicative of more speculative or less liquid stocks.
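A per-cluster mean can be pulled upward by a handful of heavily traded tickers, so it can help to look at the median and the company count alongside it. The following is a minimal sketch on made-up data, not the notebook's actual aggregation: it assumes the per-company table has `kmeans_cluster` and `average_volume` columns, and the tickers, values and the name `trading_volumes_sketch` are hypothetical.

```python
import pandas as pd

# Toy stand-in for the per-company table; tickers and volumes are invented.
toy = pd.DataFrame(
    {
        "ticker": ["AAA3", "BBB4", "CCC3", "DDD3", "EEE4", "FFF3", "GGG4"],
        "kmeans_cluster": [0, 0, 1, 1, 1, 2, 2],
        "average_volume": [
            2_000_000, 4_000_000,               # cluster 0
            1_000_000, 2_000_000, 27_000_000,   # cluster 1, one very liquid name
            500_000, 900_000,                   # cluster 2
        ],
    }
)

# Mean, median and company count of average_volume within each cluster.
trading_volumes_sketch = (
    toy.groupby("kmeans_cluster")["average_volume"]
    .agg(["mean", "median", "count"])
    .reset_index()
)
print(trading_volumes_sketch)
```

In this toy data the mean for cluster 1 is driven by a single ticker while its median is much lower, which is the kind of skew worth checking before reading a high cluster average as uniformly high liquidity.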


## TL;DR

**`Cluster Overview`**

**`Cluster 0`**

- **Characteristics**: High market cap and substantial enterprise value, typically indicating mature, established companies.
- **Key Metrics**: Large market cap and enterprise value.
- **Likely Sectors & Companies**: Could include traditional sectors like finance or utilities. Example company: **Banco ABC Brasil S.A. (ABCB4.SA)**, representing the financial services sector with a solid market cap.

**`Cluster 1`**

- **Characteristics**: Exceptional EBITDA margins, indicative of operational efficiency and profitability.
- **Key Metrics**: High EBITDA margins.
- **Likely Sectors & Companies**: Possibly firms in technology or healthcare with high operating leverage, though the cluster is not limited to those sectors. Example: **Rumo S.A. (RAIL3.SA)** in the industrials sector, showcasing strong operational margins.

**`Cluster 2`**

- **Characteristics**: Lower market cap, signifying emerging businesses or high-growth niches.
- **Key Metrics**: Higher beta values, meaning greater sensitivity to market moves and therefore higher risk alongside any growth potential.
- **Likely Sectors & Companies**: Potentially includes tech startups or innovative sectors. An example could be a rapidly growing tech firm or a biotech startup.
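One way to check a cluster overview like the one above against the data is to take the per-cluster median of each key metric and see which cluster leads on which. The sketch below uses invented numbers and hypothetical column names (`market_cap`, `ebitda_margins`, `beta`); the actual column names in `df_fundamentals_scored_kmeans` may differ and would need to be substituted.

```python
import pandas as pd

# Toy data only: values are invented so that each cluster leads on one metric.
toy = pd.DataFrame(
    {
        "kmeans_cluster": [0, 0, 1, 1, 2, 2],
        "market_cap": [50e9, 40e9, 20e9, 15e9, 2e9, 1e9],
        "ebitda_margins": [0.20, 0.25, 0.55, 0.60, 0.10, 0.15],
        "beta": [0.6, 0.8, 0.9, 1.0, 1.4, 1.6],
    }
)

metrics = ["market_cap", "ebitda_margins", "beta"]  # hypothetical column names

# Median of each metric per cluster (medians are less sensitive to outliers).
profile = toy.groupby("kmeans_cluster")[metrics].median()

# For each metric, the cluster label with the highest median.
leaders = profile.idxmax()

print(profile)
print(leaders)
```

If the real data matched the overview, `leaders` would point to **`Cluster 0`** for market cap, **`Cluster 1`** for EBITDA margins and **`Cluster 2`** for beta; a different result would be a reason to revisit the descriptions.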

**`Hypothetical Investment Recommendations`**

- **Risk-Averse Investors**: **`Cluster 0`** is suitable, showcasing stability in companies like **Banco ABC Brasil S.A. (ABCB4.SA)**.
- **Growth-Oriented Investors**: **`Cluster 2`** could be appealing for its growth potential, albeit with higher volatility. Emerging tech or biotech firms would be key examples.
- **Value Investors**: **`Cluster 1`** offers a balance with companies like **Rumo S.A. (RAIL3.SA)**, which demonstrate strong operational efficiency.

**`Conclusive Insights`**

- **`Cluster 0`**: Suits a preference for stable, established sectors, with companies like **Banco ABC Brasil S.A.**
- **`Cluster 1`**: Balances risk with operational efficiency, exemplified by companies like **Rumo S.A.**
- **`Cluster 2`**: Suits a higher risk tolerance, offering growth opportunities in potentially emerging sectors or innovative companies.