# Statistical and Machine Learning Models for Fundamental Data

This notebook is aimed at investors interested in the Brazilian stock market. It combines machine learning techniques and statistical models to analyze the fundamental data of companies listed on the stock exchange. The aim is to support investment decisions by identifying opportunities and mitigating risks. It includes interactive visualizations and real-time updates, making it practical for both experienced investors and beginners.

## Initial Setup

### Install Packages

In [1]:
%pip install pandas -q
%pip install plotly -q
%pip install scikit-learn -q

Note: you may need to restart the kernel to use updated packages.


### Import libs

In [2]:
import os
from pathlib import Path
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import warnings
warnings.filterwarnings('ignore')

### Set Default File Paths

In [3]:
file_path_scored = str(Path(os.getcwd()).parent.parent.parent / "data/scored_base")
file_path_book = str(Path(os.getcwd()).parent.parent.parent / "data/book")
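
The same path logic can be sketched with `pathlib` operators only, which avoids string concatenation when file names are appended later. The `notebook_dir` value below is a hypothetical stand-in for the working directory; the `data/scored_base` and `data/book` folders and the CSV name come from this notebook.

```python
from pathlib import PurePosixPath

# Hypothetical working directory, standing in for os.getcwd().
notebook_dir = PurePosixPath("/project/notebooks/analysis/clusters")

# parents[2] is equivalent to .parent.parent.parent.
project_root = notebook_dir.parents[2]

scored_dir = project_root / "data" / "scored_base"
book_dir = project_root / "data" / "book"

# Joining with "/" keeps separators consistent when adding the file name.
scored_csv = scored_dir / "fundamentals_scored_clusters.csv"
print(scored_csv)
```

In the notebook itself, `Path.cwd()` would take the place of the hypothetical directory.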

### Pandas Config

In [4]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

### Load data

In [5]:
df_fundamentals_scored_kmeans = pd.read_csv(file_path_scored + "/fundamentals_scored_clusters.csv")
df_fundamentals_book = pd.read_csv(file_path_book + "/fundamentals_book.csv")

## Insights on Clustering (K-Means)

### Companies and Sectors per Cluster

In [6]:
number_companies_cluster = df_fundamentals_scored_kmeans.groupby('kmeans_cluster')['ticker'].count()
total_companies = number_companies_cluster.sum()
percentage_companies_cluster = (number_companies_cluster / total_companies) * 100
combined_text = [f"Companies: {count} - Percentage: {percent:.2f}%" for count, percent in zip(number_companies_cluster, percentage_companies_cluster)]

fig = go.Figure()
fig.add_trace(go.Bar(x=number_companies_cluster.index, y=number_companies_cluster.values, name='Number of Companies per Cluster', text=combined_text, marker=dict(color='rgb(100, 195, 181)')))
fig.update_traces(textposition='outside')
fig.update_layout(title='Number of Companies per Cluster', xaxis_title='Clusters', yaxis_title='Number of Companies', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Company Clusters Analysis**

- **`Cluster 0`**: **121** companies (**41.58%**)
  - Largest cluster, indicating a common set of characteristics among many companies.

- **`Cluster 1`**: **52** companies (**17.87%**)
  - Smaller cluster, could represent niche markets or specialized company attributes.

- **`Cluster 2`**: **118** companies (**40.55%**)
  - Nearly as large as Cluster 0, suggesting another broad category of companies.

**`Key Points`**:
- `Clusters 0 and 2` dominate the dataset, implying two main types of company profiles.
- `Cluster 1`, being the smallest, may require further investigation to determine its unique traits.
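
As a quick check of the shares quoted above, the percentages can be recomputed from the three cluster sizes alone. This is a minimal sketch using plain Python; on the full DataFrame, `value_counts(normalize=True)` on the `kmeans_cluster` column would be one way to get the same shares.

```python
# Cluster sizes reported in the analysis above.
cluster_counts = {0: 121, 1: 52, 2: 118}

total = sum(cluster_counts.values())

# Share of each cluster as a percentage of all companies.
shares = {cluster: round(100 * n / total, 2) for cluster, n in cluster_counts.items()}
print(total, shares)
```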


In [7]:
fig = px.treemap(df_fundamentals_scored_kmeans, path=['kmeans_cluster', 'sector', 'ticker'], title='Treemap of Companies in Clusters')
fig.update_layout(title='Companies per Cluster', template='plotly_dark', height = 800)
fig.show()


In [8]:
sectors_cluster = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster', 'sector']).count()
sectors_cluster_reset = sectors_cluster.reset_index()
sectors_cluster_reset = sectors_cluster_reset[['kmeans_cluster', 'sector']]
heatmap_data = pd.crosstab(sectors_cluster_reset['sector'], sectors_cluster_reset['kmeans_cluster'])

text_data = [['' for _ in range(len(heatmap_data.columns))] for _ in range(len(heatmap_data.index))]

for sector_idx, sector in enumerate(heatmap_data.index):
    for cluster_idx, cluster in enumerate(heatmap_data.columns):
        if heatmap_data.loc[sector, cluster] == 0:
            text_data[sector_idx][cluster_idx] = f'The sector {sector} is not within the cluster {cluster}'
        else:
            text_data[sector_idx][cluster_idx] = f'The sector {sector} is within the cluster {cluster}'

fig = go.Figure(data=go.Heatmap(z=heatmap_data, x=heatmap_data.columns, y=heatmap_data.index, colorscale='Viridis', text=text_data, hoverinfo='text', showscale=False))
fig.update_layout(title='Presence of Sectors by Cluster', xaxis_title='Cluster', yaxis_title='Sector', font=dict(color='white'), plot_bgcolor='black', paper_bgcolor='black')
fig.show()
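
The presence matrix behind this heatmap can also be derived directly with `pd.crosstab`, which additionally keeps the company counts per cell. The sketch below uses a small hypothetical sample; the tickers are made up, and only the column names mirror the notebook's DataFrame.

```python
import pandas as pd

# Hypothetical sample; column names mirror the notebook's DataFrame.
sample = pd.DataFrame({
    "ticker": ["AAA3.SA", "BBB3.SA", "CCC4.SA", "DDD3.SA", "EEE3.SA"],
    "sector": ["Energy", "Energy", "Utilities", "Utilities", "Real Estate"],
    "kmeans_cluster": [1, 1, 0, 2, 2],
})

# Rows are sectors, columns are clusters, cells are company counts.
counts = pd.crosstab(sample["sector"], sample["kmeans_cluster"])

# Presence matrix: 1 where a sector has at least one company in the cluster.
presence = (counts > 0).astype(int)
print(presence)
```

Plotting `counts` instead of `presence` would show how concentrated each sector is, not just whether it appears in a cluster.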

**Sector-Based Cluster Analysis for Listed Companies**

**`Cluster 0: Diverse and Consumer-Oriented`**

- **Largest Cluster**: `121` companies across various consumer-facing and utility sectors.
- **Key Sectors**: Consumer Cyclical, Utilities, Real Estate, and Healthcare.

- **Prominent Companies**:
  - `Magazine Luiza S.A.` (MGLU3.SA) - A major retailer in the Consumer Cyclical sector.
  - `Gol Linhas Aéreas Inteligentes S.A.` (GOLL4.SA) - Leading airline within the Industrials sector.
  - `TOTVS S.A.` (TOTS3.SA) - A significant software company in the Technology sector.

**`Cluster 1: Industrials and Financial Heavyweights`**

- **Smallest Cluster**: `52` companies, with a concentration in Industrials and Financial Services.
- **Key Sectors**: Basic Materials, Energy, and Financial Services.

- **Prominent Companies**:
  - `Vale S.A.` (VALE3.SA) - One of the largest mining companies in the world within the Basic Materials sector.
  - `Petrobras` (PETR3.SA, PETR4.SA) - A global oil leader in the Energy sector.
  - `Banco Bradesco S.A.` (BBDC3.SA, BBDC4.SA) - A top financial institution in the Financial Services sector.
  - `Ambev S.A.` (ABEV3.SA) - The biggest brewer in Latin America under the Consumer Defensive sector.
  - `Itaú Unibanco Holding S.A.` (ITUB4.SA) - A leading banking conglomerate in the Financial Services sector.

**`Cluster 2: Specialized and Emerging Players`**

- **Second-Largest Cluster**: `118` companies, possibly reflecting niche specialization or emerging market presence.
- **Key Sectors**: Financial Services, Real Estate, and Consumer Cyclical.

- **Prominent Companies**:
  - `Gafisa S.A.` (GFSA3.SA) - A well-known name in the Real Estate sector.
  - `LOG Commercial Properties` (LOGG3.SA) - Engaged in commercial real estate, indicating growth within the sector.
  - `Triunfo Participações e Investimentos S.A.` (TPIS3.SA) - Operating within the Industrials sector with potential for infrastructure development.
  
**`Summary`**
- **Cluster 0** companies are largely consumer-focused, indicative of Brazil's robust domestic market and service-oriented economy.
- **Cluster 1** includes industrial giants and financial stalwarts, reflecting Brazil's key role in global commodities and financial markets.
- **Cluster 2** suggests a grouping of companies with specialized roles, potential for growth, or those targeting emerging trends in the Financial and Real Estate sectors.


#### Financial Assessment and Profitability

##### Liquidity and Reserves

In [60]:
liquidity_reserves = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['profit_margins', 'operating_margins', 'ebitda', 'gross_profits']].mean()
liquidity_reserves = liquidity_reserves.reset_index()
liquidity_reserves

| | kmeans_cluster | profit_margins | operating_margins | ebitda | gross_profits |
|---|---|---|---|---|---|
| 0 | 0 | 0.108734 | 0.169274 | 1130676000.0 | 1743571000.0 |
| 1 | 1 | 0.215149 | 0.234825 | 19601040000.0 | 34763570000.0 |
| 2 | 2 | 0.499287 | -0.506219 | 89064530.0 | 531768200.0 |


In [62]:
fig = go.Figure()
fig.add_trace(go.Bar(x=liquidity_reserves['kmeans_cluster'], y=liquidity_reserves['profit_margins']*100, name='Average Profit Margins', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (liquidity_reserves['profit_margins']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Profit Margins by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Profit Margins (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average Profit Margins Analysis by Cluster**

**Profit Margins Overview**

- **`Cluster 0`**: The base cluster with an average profit margin of **10.87%**.
- **`Cluster 1`**: Shows a substantially higher profitability with an average margin of **21.51%**.
- **`Cluster 2`**: Dominates in profitability with an average margin of **49.93%**.

**Comparative Analysis**

- **`Cluster 2`**'s average margin is more than twice that of **Cluster 1** (about 2.3x) and roughly 4.6 times that of **Cluster 0**. This suggests that **Cluster 2** may consist of companies with high pricing power, lower costs, or operations in high-margin industries.
- **`Cluster 1`** also significantly outperforms **Cluster 0**, suggesting that its companies might be more efficient, operate in more profitable sectors, or benefit from economies of scale.
- **`Cluster 0`**, while having the lowest average margins, may represent companies in competitive or capital-intensive industries with thinner profit margins.
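
The multiples in this comparison follow directly from the cluster averages in the table. A minimal sketch of the arithmetic, using only the reported `profit_margins` values:

```python
# Average profit margins per cluster, taken from the table above.
profit_margins = {0: 0.108734, 1: 0.215149, 2: 0.499287}

# How many times larger one cluster's average margin is than another's.
ratio_2_vs_1 = profit_margins[2] / profit_margins[1]
ratio_2_vs_0 = profit_margins[2] / profit_margins[0]
ratio_1_vs_0 = profit_margins[1] / profit_margins[0]

print(f"{ratio_2_vs_1:.2f}x, {ratio_2_vs_0:.2f}x, {ratio_1_vs_0:.2f}x")
```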

**Industry Implications**

- Given that **`Cluster 2`** contains companies such as `LOG Commercial Properties` (LOGG3.SA), its high margin could be indicative of profitable real estate deals or a favorable property market.
- In **`Cluster 1`**, companies like `Vale S.A.` (VALE3.SA) and `Petrobras` (PETR3.SA, PETR4.SA) reflect the significant profitability potential in the Basic Materials and Energy sectors.
- **`Cluster 0`** includes consumer-focused companies like `Magazine Luiza S.A.` (MGLU3.SA), which could be operating in highly competitive markets with aggressive pricing strategies, leading to lower profit margins.

**Summary**

- **`Cluster 2`** stands out on average profit margin, although its negative average operating margin (-50.62%) suggests the figure may reflect a few outliers or non-operating gains rather than broad operational efficiency.
- **`Cluster 1`** represents a middle ground, possibly balancing scale and profitability in their operations.
- **`Cluster 0`** may need to focus on cost optimization or market differentiation to enhance profitability.


In [66]:
fig = go.Figure()
fig.add_trace(go.Bar(x=liquidity_reserves['kmeans_cluster'], y=liquidity_reserves['operating_margins']*100, name='Average Operating Margins', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (liquidity_reserves['operating_margins']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Operating Margins by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Operating Margins (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average Operating Margins Analysis by Cluster**

**Operating Margins Overview**

- **`Cluster 0`**: Exhibits a solid average operating margin of **16.93%**.
- **`Cluster 1`**: Surpasses Cluster 0 with a higher average operating margin of **23.48%**.
- **`Cluster 2`**: Shows a negative average operating margin of **-50.62%**, indicating operational losses.

**Comparative Analysis**

- **`Cluster 1`**’s operating margin is notably higher than that of **`Cluster 0`**, suggesting more efficient operations or a favorable cost structure within Cluster 1's industries.
- The negative margin of **`Cluster 2`** is a cause for concern as it indicates that companies are spending more to operate than they are earning. This could be due to a variety of reasons, such as aggressive investment in growth, unfavorable market conditions, or inefficient operations.

**Industry Implications**

- The positive margins for **`Clusters 0 and 1`** indicate healthy operational efficiency overall. Companies in these clusters, such as those in the Industrials sector like `RAIL3.SA`, are likely managing their expenses well relative to their revenue.
- The negative margin for **`Cluster 2`** could suggest that companies in this cluster, which may include real estate firms like `LOG Commercial Properties` (LOGG3.SA), are in a growth phase, investing heavily in operations, or they may be affected by external challenges such as market downturns or increased competition.

**Summary**

- **`Cluster 1`** represents an optimal performance model with the highest operating margins, potentially reflecting companies with strong market positions or operational excellence.
- **`Cluster 0`**'s positive margins suggest stable operations but with room for improvement in efficiency or cost management to reach the levels of **`Cluster 1`**.
- **`Cluster 2`** faces significant challenges, with its negative margin highlighting the need for strategic reassessments, operational overhauls, or market repositioning to return to profitability.
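
Cluster 2 combines a strongly positive average profit margin (49.93%) with a strongly negative average operating margin (-50.62%). One possible explanation is that a few extreme companies dominate the mean. The sketch below compares mean and median on hypothetical margins; the numbers are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical margins for one cluster; the last company is an extreme outlier.
sample = pd.DataFrame({
    "kmeans_cluster": [2, 2, 2, 2, 2],
    "operating_margins": [0.12, 0.08, 0.15, 0.10, -3.00],
})

# The mean is pulled far below zero by one company; the median is not.
summary = sample.groupby("kmeans_cluster")["operating_margins"].agg(["mean", "median"])
print(summary)
```

Running the same `agg(["mean", "median"])` on the real DataFrame would indicate whether the negative average describes the typical Cluster 2 company or only a handful of them.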


In [12]:
def format_values(values):
    formatted = []
    for value in values:
        abs_value = abs(value)
        if 1e9 <= abs_value < 1e11:
            formatted.append(f'R${value / 1e9:.2f} B')
        elif abs_value >= 1e11:
            formatted.append(f'R${value / 1e11:.2f} x 100B')
        elif abs_value >= 1e6:
            formatted.append(f'R${value / 1e6:.2f} M')
        else:
            formatted.append(f'R${value:.2f}')
    return formatted

In [13]:
def format_values_x(values):
    formatted = []
    for value in values:
        abs_value = abs(value)
        if 1e9 <= abs_value < 1e11:
            formatted_value = f"{round(value / 1e9):,}".replace(',', '.') + " B"
        elif abs_value >= 1e11:
            formatted_value = f"{round(value / 1e11):,}".replace(',', '.') + " x 100B"
        elif abs_value >= 1e6:
            formatted_value = f"{round(value / 1e6):,}".replace(',', '.') + " M"
        elif abs_value >= 1e3:
            # Scale to thousands so the "K" suffix matches the displayed number.
            formatted_value = f"{round(value / 1e3):,}".replace(',', '.') + " K"
        else:
            formatted_value = f"{round(value):,}".replace(',', '.')
        formatted.append(formatted_value)
    return formatted
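
To illustrate the labels these helpers produce, here is a compact, self-contained sketch. `format_brl` is a hypothetical single-value variant of `format_values`, not a function from this notebook. It uses the same billion and million thresholds but drops the separate `x 100B` branch, so values of R$100 billion or more are simply shown in billions.

```python
def format_brl(value: float) -> str:
    """Format a value in reais with B/M suffixes (hypothetical helper)."""
    abs_value = abs(value)
    if abs_value >= 1e9:
        return f"R${value / 1e9:.2f} B"
    if abs_value >= 1e6:
        return f"R${value / 1e6:.2f} M"
    return f"R${value:.2f}"

# Average EBITDA per cluster, from the table earlier in this section.
ebitda_means = [1130676000.0, 19601040000.0, 89064530.0]
labels = [format_brl(v) for v in ebitda_means]
print(labels)
```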

In [67]:
formatted_ebitda = format_values(liquidity_reserves['ebitda'])

fig = go.Figure()
fig.add_trace(go.Bar(x=liquidity_reserves['kmeans_cluster'], y=liquidity_reserves['ebitda'], hovertext=formatted_ebitda, name='Average Ebitda', marker=dict(color='rgb(100, 195, 181)'), text=formatted_ebitda))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Ebitda by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Ebitda (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Average EBITDA Analysis by Cluster**

**EBITDA Overview**

- **`Cluster 0`**: Posts an average EBITDA of **R$1.13 Billion**, suggesting moderate operational profitability.
- **`Cluster 1`**: Significantly leads with an average EBITDA of **R$19.60 Billion**, indicative of high operational efficiency or a presence in highly profitable sectors.
- **`Cluster 2`**: Has the lowest average EBITDA, at **R$89.06 Million**, potentially reflecting smaller or newer companies.

**Comparative Analysis**

- The EBITDA of **`Cluster 1`** dwarfs that of the other clusters, potentially due to the scale of operations or higher-margin businesses.
- **`Cluster 0`**'s EBITDA reflects steady business performance but shows room for growth or operational improvements to reach the level of Cluster 1.
- The much lower EBITDA in **`Cluster 2`** suggests these companies might be in their nascent stages, specialized niches, or facing operational challenges.
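
The gap between clusters spans more than two orders of magnitude, which is why the Cluster 2 bar is barely visible on a linear axis. The sketch below quantifies that span from the reported averages. As one option for the chart, Plotly's `fig.update_yaxes(type="log")` would switch it to a logarithmic axis.

```python
import math

# Average EBITDA per cluster, from the table earlier in this section.
ebitda_means = {0: 1130676000.0, 1: 19601040000.0, 2: 89064530.0}

largest = max(ebitda_means.values())
smallest = min(ebitda_means.values())

# Number of powers of ten separating the largest and smallest averages.
span_orders = math.log10(largest / smallest)
print(f"ratio: {largest / smallest:.0f}x, span: {span_orders:.2f} orders of magnitude")
```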

**Sector Implications**

- Companies in **`Cluster 1`** such as `Petrobras` (PETR3.SA, PETR4.SA) may be driving the high average EBITDA, often characteristic of the energy sector's large capital-intensive operations.
- Firms in **`Cluster 0`** might include consumer-facing businesses like `Magazine Luiza S.A.` (MGLU3.SA), indicating solid but comparatively lower profitability sectors.
- Entities in **`Cluster 2`** could involve emerging tech or startup ventures that typically exhibit lower EBITDA in their growth phase.

**Summary**

- The EBITDA figures highlight **Cluster 1** as potentially the most established, with **Cluster 0** occupying a middle ground and **Cluster 2** needing strategic focus to enhance profitability.


In [68]:
formatted_gross_profits = format_values(liquidity_reserves['gross_profits'])

fig = go.Figure()
fig.add_trace(go.Bar(x=liquidity_reserves['kmeans_cluster'], y=liquidity_reserves['gross_profits'], hovertext=formatted_gross_profits, name='Gross Profits', marker=dict(color='rgb(100, 195, 181)'), text=formatted_gross_profits))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Gross Profits by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Gross Profits (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Gross Profits Analysis by Cluster**

**Gross Profit Overview**

- **`Cluster 0`**: Reports an average gross profit of **R$1.74 Billion**, indicating modest profitability relative to the other clusters.
- **`Cluster 1`**: Shows a substantially larger average gross profit of **R$34.76 Billion**, suggesting operations in high-revenue or high-margin sectors.
- **`Cluster 2`**: Has the lowest average gross profit at **R$531.77 Million**, which might reflect smaller company sizes or industries with lower gross profit figures.

**Comparative Analysis**

- The gross profit for **`Cluster 1`** far exceeds that of **`Cluster 0`** and **`Cluster 2`**, indicating that companies in **Cluster 1** may have a larger scale of operations or more profitable product lines.
- **`Cluster 0`**'s gross profit suggests the presence of companies with efficient cost of goods sold (COGS) but potentially lower revenue scales compared to Cluster 1.
- **`Cluster 2`**'s lower gross profit could be due to a variety of factors, such as smaller size, lower sales volume, or higher COGS relative to sales.

**Sector Implications**

- Companies in **`Cluster 1`** might include large-scale enterprises like `Vale S.A.` (VALE3.SA), which typically have substantial gross profits due to the scale and nature of their operations.
- Firms in **`Cluster 0`** could be represented by mid-sized entities with steady profitability.
- **`Cluster 2`** may consist of companies in competitive or emerging sectors where gross profits are not as high due to pricing strategies, market penetration efforts, or investment phases.

**Summary**

- The data points to **`Cluster 1`** as the leader in gross profitability, likely due to economies of scale or a focus on high-margin industries.
- **`Cluster 0`** represents companies with consistent performance, and **`Cluster 2`** may need to focus on increasing sales volume or reducing COGS to improve gross profits.


##### Revenue and Profit Growth

In [16]:
revenue_profit  = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[[ 'total_revenue', 'earnings_quarterly_growth', 'revenue_growth', 'earnings_growth_rate']].mean()
revenue_profit = revenue_profit.reset_index()
revenue_profit

| | kmeans_cluster | total_revenue | earnings_quarterly_growth | revenue_growth | earnings_growth_rate |
|---|---|---|---|---|---|
| 0 | 0 | 7158268000.0 | 0.235256 | 0.020512 | 23.52562 |
| 1 | 1 | 77417490000.0 | 1.484365 | -0.004231 | 148.436538 |
| 2 | 2 | 1005542000.0 | 0.107 | 0.081915 | 10.7 |
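
In this output, `earnings_growth_rate` appears to be `earnings_quarterly_growth` expressed as a percentage, so the two columns carry the same information. A minimal check on the reported averages, allowing for display rounding:

```python
# (cluster, earnings_quarterly_growth, earnings_growth_rate) from the output above.
rows = [
    (0, 0.235256, 23.52562),
    (1, 1.484365, 148.436538),
    (2, 0.107, 10.7),
]

# Largest gap between growth * 100 and the reported rate.
max_gap = max(abs(growth * 100 - rate) for _, growth, rate in rows)

# A gap this small is explained by rounding in the printed table.
is_redundant = max_gap < 1e-3
print(is_redundant)
```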


In [69]:
formatted_revenue_profit = format_values(revenue_profit['total_revenue'])

fig = go.Figure()
fig.add_trace(go.Bar(x=revenue_profit['kmeans_cluster'], y=revenue_profit['total_revenue'], hovertext=formatted_revenue_profit, name='Total Revenue', marker=dict(color='rgb(100, 195, 181)'), text=formatted_revenue_profit))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Total Revenue by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Total Revenue (R$)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Total Revenue Analysis by Cluster**

**Revenue Overview**

- **`Cluster 0`**: Shows a healthy average total revenue of **R$7.16 Billion**.
- **`Cluster 1`**: Dominates with a massive average total revenue of **R$77.42 Billion**.
- **`Cluster 2`**: Displays the smallest average total revenue at **R$1.01 Billion**.

**Comparative Analysis**

- **`Cluster 1`**'s average revenue is more than ten times that of **`Cluster 0`** and about 77 times that of **`Cluster 2`**, suggesting Cluster 1 companies are likely industry leaders or operate in high-volume sectors.
- **`Cluster 0`** indicates a strong performance, which might be typical for established companies with a solid market presence.
- **`Cluster 2`**'s relatively small revenue suggests it may consist of smaller companies, startups, or those in niche markets.

**Industry Implications**

- The significant revenue in **`Cluster 1`** could be attributed to companies with substantial market share or those operating in lucrative sectors, such as `Vale S.A.` (VALE3.SA) in the mining industry or `Petrobras` (PETR3.SA, PETR4.SA) in oil and gas.
- Entities in **`Cluster 0`** may reflect a diverse set of well-established businesses with consistent sales, like `Magazine Luiza S.A.` (MGLU3.SA) in retail.
- Firms in **`Cluster 2`** might be in earlier stages of growth or in specialized industries with lower sales volumes.

**Summary**

- **`Cluster 1`** appears to be the powerhouse, with revenues suggesting large-scale operations.
- **`Cluster 0`** represents a middle ground, possibly indicating a broad mix of mature companies.
- **`Cluster 2`** shows potential for growth, with current figures suggesting a focus on market entry or niche specialization.


In [70]:
fig = go.Figure()
fig.add_trace(go.Bar(x=revenue_profit['kmeans_cluster'], y=revenue_profit['earnings_quarterly_growth']*100, hovertext='earnings_quarterly_growth', name='Earnings Quarterly Growth', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (revenue_profit['earnings_quarterly_growth']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Earnings Quarterly Growth by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Earnings Quarterly Growth (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Earnings Quarterly Growth Analysis by Cluster**

**Growth Rates Overview**

- **`Cluster 0`**: Demonstrates a healthy growth rate in earnings at **23.53%** quarterly.
- **`Cluster 1`**: Exhibits an extraordinary average quarterly growth rate of **148.44%**, suggesting rapid earnings expansion.
- **`Cluster 2`**: Shows a more modest growth rate at **10.70%**, which may indicate a mature or stable market position, or possibly facing headwinds in growth.

**Comparative Analysis**

- **`Cluster 1`**'s earnings growth is significantly higher than the other clusters, likely indicating either a period of major market success, a recovery from previous lows, or growth from acquisitions and expansions.
- The respectable growth of **`Cluster 0`** suggests steady market performance and potentially consistent earnings improvements.
- **`Cluster 2`**'s lower growth rate compared to **`Cluster 0`** and **`Cluster 1`** may reflect a variety of factors, such as market saturation, slower market conditions, or conservative business strategies.
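
The "recovery from previous lows" explanation can be made concrete. The sketch below assumes growth is measured as the relative change against the prior-period figure; the notebook does not state the exact definition behind `earnings_quarterly_growth`. Under that assumption, a single company rebounding from a very small earnings base can dominate a cluster average. The helper and all figures are hypothetical.

```python
def quarterly_growth(previous: float, current: float) -> float:
    """Relative change in earnings (hypothetical helper, not from the notebook)."""
    if previous == 0:
        raise ValueError("growth is undefined when previous earnings are zero")
    return (current - previous) / abs(previous)

# Hypothetical earnings in R$ millions.
steady = quarterly_growth(100.0, 123.5)   # established earnings base
rebound = quarterly_growth(2.0, 50.0)     # recovery from a very low base

# The simple average is dominated by the rebounding company.
average = (steady + rebound) / 2
print(f"{steady:.1%} {rebound:.1%} {average:.1%}")
```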

**Implications for Strategy**

- **`Cluster 1`** may consist of companies that have successfully leveraged market trends, new product launches, or other strategic initiatives to boost their earnings significantly.
- Companies in **`Cluster 0`** are possibly employing effective growth strategies that allow for sustainable earnings improvements over time.
- **`Cluster 2`** might benefit from reassessing their growth strategies or might be in sectors that naturally exhibit slower growth rates.

**Summary**

- The stark contrast in quarterly earnings growth between **`Cluster 1`** and the others underscores potentially aggressive growth strategies or favorable market conditions for `Cluster 1` companies.
- **`Cluster 0`** and **`Cluster 2`** show more traditional growth patterns, with `Cluster 0` companies likely capitalizing on solid business practices, whereas `Cluster 2` may need to explore new avenues for growth.


In [71]:
fig = go.Figure()
fig.add_trace(go.Bar(x=revenue_profit['kmeans_cluster'], y=revenue_profit['revenue_growth']*100, hovertext='revenue_growth', name='Revenue Growth', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (revenue_profit['revenue_growth']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Revenue Growth by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Revenue Growth (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Revenue Growth Analysis by Cluster**

**Revenue Growth Overview**

- **`Cluster 0`**: Exhibits a positive revenue growth at **2.05%**, indicating steady business expansion.
- **`Cluster 1`**: Reflects a marginal average revenue contraction of **-0.42%**, suggesting slight revenue challenges or stabilization after a period of growth.
- **`Cluster 2`**: Outperforms with a robust average revenue growth of **8.19%**, which could signify aggressive market expansion or entry into new markets.

**Comparative Analysis**

- **`Cluster 2`**'s significant growth rate implies that companies within this cluster might be capturing new market shares or benefiting from innovative product lines or services.
- **`Cluster 0`** represents a stable growth scenario, often seen in well-established markets or companies with consistent performance.
- The contraction in **`Cluster 1`** could indicate market saturation, a cyclical downturn, or the impacts of competitive pressures.

**Strategic Implications**

- Companies in **`Cluster 2`** may be experiencing a phase of strong growth, potentially due to successful strategies or favorable market conditions.
- **`Cluster 0`**'s growth suggests that companies may be maintaining a steady performance, which could be ideal for certain investment strategies.
- For **`Cluster 1`**, the slight decrease in revenue might prompt strategies to rejuvenate growth or optimize operations to improve profitability.

**Summary**

- The data highlights **`Cluster 2`** as a dynamic growth segment, possibly offering opportunities for investment in high-growth potential companies.
- **`Cluster 0`** presents a picture of stability and could be attractive to investors seeking consistent performers.
- **`Cluster 1`** might require careful analysis to understand the factors behind the revenue decrease and to identify any potential for a turnaround.


##### Asset Efficiency and ROI

In [21]:
asset_efficiency = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['total_assets_approx', 'asset_turnover', 'roi', 'return_on_assets', 'return_on_equity', 'roce']].mean()
asset_efficiency = asset_efficiency.reset_index()
asset_efficiency

| | kmeans_cluster | total_assets_approx | asset_turnover | roi | return_on_assets | return_on_equity | roce |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1347213000.0 | 7335.995546 | 1.50837 | 0.060052 | 0.121749 | 0.057324 |
| 1 | 1 | 48039140000.0 | 7.474149 | 0.734347 | 0.059335 | 0.159957 | 0.132031 |
| 2 | 2 | 1190988000.0 | 90.263563 | 0.99756 | 0.019301 | 0.127807 | 0.048294 |


In [73]:
formatted_asset_efficiency = format_values(asset_efficiency['total_assets_approx'])

fig = go.Figure()
fig.add_trace(go.Bar(x=asset_efficiency['kmeans_cluster'], y=asset_efficiency['total_assets_approx'], hovertext=formatted_asset_efficiency, name='Total Assets Approx', marker=dict(color='rgb(100, 195, 181)'), text=formatted_asset_efficiency))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Total Assets Approx by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Total Assets Approx (R$)', template='plotly_dark', font=dict(color='white'), height=560)
fig.show()

**Total Assets Analysis by Cluster**

**Asset Overview**

- **`Cluster 0`**: Has a modest asset base with average total assets around **R$1.35 Billion**.
- **`Cluster 1`**: Possesses a substantial asset base with average total assets at **R$48.04 Billion**.
- **`Cluster 2`**: Holds average total assets close to **R$1.19 Billion**, slightly lower than Cluster 0.

**Comparative Analysis**

- **`Cluster 1`**'s asset size vastly outpaces that of **Cluster 0** and **Cluster 2**, indicating that companies in Cluster 1 may either be larger in scale or operate in asset-heavy industries.
- The similar asset sizes of **Clusters 0 and 2** suggest they may include smaller companies or those in sectors that require less capital investment.

**Strategic Implications**

- The large average asset total in **Cluster 1** might reflect companies with significant physical assets, like real estate or manufacturing plants, or those that have accumulated assets over a long period of market presence.
- **`Cluster 0`** and **`Cluster 2`** may be more representative of companies that are service-oriented, technology-focused, or simply younger and thus have not yet built up substantial asset bases.

**Summary**

- **`Cluster 1`** stands out as potentially having established companies with considerable asset holdings, which may translate to stability and market power.
- **`Cluster 0`** and **`Cluster 2`** appear to have more modest asset levels, which could align with companies having a lighter asset profile or those in earlier stages of growth.


In [74]:
fig = go.Figure()
fig.add_trace(go.Bar(x=asset_efficiency['kmeans_cluster'], y=asset_efficiency['asset_turnover'], hovertext='asset_turnover', name='Asset Turnover', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}x' for x in (asset_efficiency['asset_turnover'])]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Asset Turnover by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Asset Turnover (x)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Asset Turnover Analysis by Cluster**

**Asset Turnover Overview**

- **`Cluster 0`**: Reports an exceptionally high average asset turnover at **7336.00x**.
- **`Cluster 1`**: Shows a more typical average asset turnover of **7.47x**.
- **`Cluster 2`**: Presents a relatively high average asset turnover at **90.26x**.

**Comparative Analysis**

- The asset turnover for **`Cluster 0`** is unusually high and might indicate data anomalies or companies with minimal assets and very high sales volumes, which is often characteristic of service or digital companies with low capital investment.
- **`Cluster 1`** exhibits an asset turnover rate that is more aligned with industrial averages, suggesting efficient use of assets in generating revenue.
- The higher turnover in **`Cluster 2`** could imply that companies there are effectively using their assets to generate sales, possibly indicative of growth-oriented or capital-efficient businesses.
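
One way to probe the anomaly is to compare the reported averages of per-company ratios with the ratio of the cluster averages. The sketch below assumes asset turnover is total revenue divided by total assets, with `total_assets_approx` as the denominator; the notebook does not show how `asset_turnover` was derived. All figures come from the tables in this notebook.

```python
# Cluster averages: (total_revenue, total_assets_approx, reported asset_turnover).
cluster_means = {
    0: (7158268000.0, 1347213000.0, 7335.995546),
    1: (77417490000.0, 48039140000.0, 7.474149),
    2: (1005542000.0, 1190988000.0, 90.263563),
}

# Ratio of averages: average revenue divided by average assets.
ratio_of_means = {
    cluster: round(revenue / assets, 2)
    for cluster, (revenue, assets, _) in cluster_means.items()
}

# Compare with the reported average of per-company ratios.
for cluster, (_, _, reported) in cluster_means.items():
    print(cluster, ratio_of_means[cluster], reported)
```

Under this assumption, the ratio of averages stays in the single digits for every cluster. That is consistent with a few companies with very small asset denominators inflating the per-company averages.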

**Implications for Strategy**

- Companies in **`Cluster 0`** may require further investigation to understand the drivers behind the extremely high turnover figure.
- Entities within **`Cluster 1`** could be seen as having a balanced approach to asset utilization and sales generation.
- Firms in **`Cluster 2`** might be employing aggressive strategies to maximize sales with their asset base or operating in sectors that require less capital intensity.

**Summary**

- **`Cluster 0`**'s turnover rate suggests either an outlier scenario or a cluster of companies that are not asset-intensive but have high sales volume.
- **`Cluster 1`**'s figure reflects what might be expected from established companies in traditional industries.
- **`Cluster 2`** indicates a dynamic use of assets, which might appeal to investors looking for capital efficiency.


In [75]:
fig = go.Figure()
fig.add_trace(go.Bar(x=asset_efficiency['kmeans_cluster'], y=asset_efficiency['roi']*100, hovertext='roi', name='ROI', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (asset_efficiency['roi']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Return on Investment (ROI) by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Return on Investment (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Return on Investment (ROI) Analysis by Cluster**

**ROI Overview**

- **`Cluster 0`**: Exhibits an extraordinary average ROI at **150.84%**.
- **`Cluster 1`**: Reports a robust average ROI of **73.43%**.
- **`Cluster 2`**: Also shows a high average ROI at **99.76%**.

**Comparative Analysis**

- **`Cluster 0`**'s average ROI is significantly higher than both **`Cluster 1`** and **`Cluster 2`**, indicating potentially higher profitability or lower investment costs relative to returns.
- **`Cluster 1`** maintains a strong ROI, which may suggest effective capital utilization and a solid return on investments.
- The nearly triple-digit ROI of **`Cluster 2`** suggests efficient investment strategies or high-yield operations.

**Strategic Implications**

- The exceptional ROI for **`Cluster 0`** might require further analysis to validate the sustainability of such high returns and to understand the underlying business models.
- **`Cluster 1`** represents what appears to be well-managed companies achieving considerable returns, possibly indicative of mature and stable operations.
- **`Cluster 2`**'s high ROI could attract investors looking for growth potential and strong return profiles.

**Summary**

- The ROI figures indicate that **`Cluster 0`** may consist of high-growth or high-efficiency companies possibly enjoying competitive advantages or operating in high-margin sectors.
- **`Cluster 1`** and **`Cluster 2`** show impressive ROI percentages, reflecting successful investment and operational strategies.
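
To validate whether ROI averages of this size are sustainable or driven by a handful of companies, it may help to list the tickers whose ROI sits far outside the rest of their cluster. The following is a minimal sketch using the common 1.5 x IQR rule; the tickers, values, and the `flag_iqr_outliers` helper are hypothetical and not taken from the notebook's data.

```python
import pandas as pd

# Hypothetical tickers and ROI values, invented for illustration.
toy = pd.DataFrame({
    "ticker": ["AAA3", "BBB3", "CCC3", "DDD3", "EEE3", "FFF3"],
    "kmeans_cluster": [0, 0, 0, 0, 0, 0],
    "roi": [0.10, 0.12, 0.15, 0.11, 0.13, 8.50],
})


def flag_iqr_outliers(df, column, by="kmeans_cluster", k=1.5):
    """Flag rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR] within their cluster."""
    grouped = df.groupby(by)[column]
    # transform broadcasts each cluster's quartile back onto its rows.
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    return (df[column] < q1 - k * iqr) | (df[column] > q3 + k * iqr)


toy["roi_outlier"] = flag_iqr_outliers(toy, "roi")
outlier_tickers = toy.loc[toy["roi_outlier"], "ticker"].tolist()
print(outlier_tickers)
```

Flagged tickers are candidates for a closer look at their financial statements, not automatic exclusions.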


In [76]:
fig = go.Figure()
fig.add_trace(go.Bar(x=asset_efficiency['kmeans_cluster'], y=asset_efficiency['return_on_assets']*100, hovertext='return_on_assets', name='ROA', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (asset_efficiency['return_on_assets']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Return on Assets (ROA) by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Return on Assets (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Return on Assets (ROA) Analysis by Cluster**

**ROA Overview**

- **`Cluster 0`**: Demonstrates a solid average ROA at **6.01%**.
- **`Cluster 1`**: Shows a comparable average ROA of **5.93%**.
- **`Cluster 2`**: Has a lower average ROA at **1.93%**.

**Comparative Analysis**

- **`Cluster 0`** and **`Cluster 1`** have similar average ROAs, indicating that companies in these clusters are relatively close in their efficiency in generating profits from their assets.
- **`Cluster 2`**'s lower ROA suggests that these companies may not be utilizing their assets as efficiently to generate profit, or they could be in a growth phase investing heavily in assets that have not yet generated proportional profits.

**Strategic Implications**

- The close ROA figures for **`Cluster 0`** and **`Cluster 1`** imply that businesses within these clusters are operating with a similar level of asset efficiency, although the slightly higher ROA in Cluster 0 could indicate a marginal edge in asset utilization or a different mix of assets.
- The significantly lower ROA for **`Cluster 2`** may point to companies that are either asset-heavy with longer-term payback periods or those that need to optimize their asset management to improve profitability.

**Summary**

- The ROA data positions **`Cluster 0`** as potentially having the most effective use of assets to generate profits.
- **`Cluster 1`** is nearly as effective as Cluster 0 in asset utilization for profit generation, potentially indicating sound management practices.
- **`Cluster 2`** appears to have room for improvement in asset utilization or may represent a cluster with a longer-term investment horizon.
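
The cluster figures above are simple averages of each company's ROA, so a small company counts as much as a large one. As a rough sketch of how that choice can matter, the example below compares the equal-weighted mean with an asset-weighted ROA (total net income divided by total assets). The data is invented, and the `net_income` and `total_assets` column names are assumptions.

```python
import pandas as pd

# Hypothetical toy data: column names and values are assumptions for illustration.
toy = pd.DataFrame({
    "kmeans_cluster": [0, 0, 1, 1],
    "net_income": [10.0, 5.0, 2.0, 300.0],
    "total_assets": [100.0, 100.0, 100.0, 10000.0],
})
toy["return_on_assets"] = toy["net_income"] / toy["total_assets"]

# Equal-weighted: average of per-company ratios (what a groupby mean gives).
equal_weighted = toy.groupby("kmeans_cluster")["return_on_assets"].mean()

# Asset-weighted: ratio of cluster totals, so large companies dominate.
sums = toy.groupby("kmeans_cluster")[["net_income", "total_assets"]].sum()
asset_weighted = sums["net_income"] / sums["total_assets"]

comparison = pd.DataFrame({
    "equal_weighted": equal_weighted,
    "asset_weighted": asset_weighted,
})
print(comparison)
```

When the two columns diverge, the cluster mixes companies of very different sizes, and the interpretation of an "average ROA" depends on which weighting is intended.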


In [80]:
fig = go.Figure()
fig.add_trace(go.Bar(x=asset_efficiency['kmeans_cluster'], y=asset_efficiency['return_on_equity']*100, hovertext='return_on_equity', name='Return on Equity', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (asset_efficiency['return_on_equity']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Return on Equity (ROE) by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Return on Equity (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Return on Equity (ROE) Analysis by Cluster**

**ROE Overview**

- **`Cluster 0`**: Reports a respectable average ROE of **12.17%**.
- **`Cluster 1`**: Leads with a robust average ROE of **16.00%**.
- **`Cluster 2`**: Has a competitive average ROE of **12.78%**.

**Comparative Analysis**

- **`Cluster 1`**'s average ROE is the highest, which could indicate that its companies generate more profit per unit of shareholders' equity, reflecting efficient equity use, high-margin industries, or greater financial leverage.
- **`Cluster 0`** and **`Cluster 2`** have relatively close ROE figures, suggesting that they have similar effectiveness in generating profits from shareholders' equity.

**Strategic Implications**

- The high ROE in **`Cluster 1`** might reflect companies with strong profitability relative to their equity, which could be due to high earnings or effective management.
- The similar ROE figures for **`Cluster 0`** and **`Cluster 2`** may indicate companies with effective but not exceptional equity management, or industries where ROE tends to sit near typical levels.

**Summary**

- **`Cluster 1`**'s superior ROE suggests it is the most effective at generating returns on equity, potentially making it attractive to equity investors.
- **`Cluster 0`** and **`Cluster 2`** display solid performance, with Cluster 2 slightly outperforming Cluster 0 in ROE, hinting at efficient use of shareholder investments.
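
Because ROE can be high either through profitable assets or through leverage, it can be useful to split it into ROA and the equity multiplier (total assets divided by equity). The sketch below applies this standard identity to a single hypothetical company; the function name and the input figures are invented for illustration.

```python
def roe_from_components(net_income, total_assets, equity):
    """Split ROE into ROA and the equity multiplier (ROE = ROA * multiplier).

    Assumes equity is positive; with zero or negative equity the ratio
    loses its usual interpretation.
    """
    roa = net_income / total_assets
    equity_multiplier = total_assets / equity
    return roa, equity_multiplier, roa * equity_multiplier


# Hypothetical company: 6 of net income on 100 of assets financed with 40 of equity.
roa, multiplier, roe = roe_from_components(net_income=6.0, total_assets=100.0, equity=40.0)
print(f"ROA={roa:.2%}, equity multiplier={multiplier:.2f}, ROE={roe:.2%}")
```

Two companies with the same ROA can show very different ROE if one carries more debt, which is worth keeping in mind when comparing clusters on ROE alone.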


In [82]:
fig = go.Figure()
fig.add_trace(go.Bar(x=asset_efficiency['kmeans_cluster'], y=asset_efficiency['roce']*100, hovertext='roce', name='Return on Capital Employed', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (asset_efficiency['roce']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Return on Capital Employed (ROCE) by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Return on Capital Employed (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

**Return on Capital Employed (ROCE) Analysis by Cluster**

**ROCE Overview**

- **`Cluster 0`**: Shows a moderate average ROCE of **5.73%**.
- **`Cluster 1`**: Significantly outperforms with an average ROCE of **13.20%**.
- **`Cluster 2`**: Posts a lower average ROCE of **4.83%**.

**Comparative Analysis**

- The high ROCE of **`Cluster 1`** suggests that companies within this cluster are using their capital very efficiently to generate profits.
- **`Cluster 0`**'s ROCE indicates a reasonable rate of return on capital, which could be consistent with a stable, established business environment.
- The relatively lower ROCE of **`Cluster 2`** may reflect less efficient capital use, which could be due to a number of factors such as heavy investment phases or industries with lower capital turnover.

**Strategic Implications**

- **`Cluster 1`**'s superior ROCE might indicate companies with a strong competitive position or those operating in industries with higher operational efficiency.
- Companies in **`Cluster 0`** appear to be maintaining sound capital utilization, although there may be opportunities to optimize for better returns.
- The ROCE for **`Cluster 2`** suggests these companies could focus on strategies to improve their capital efficiency or may be in a phase of investment that has yet to yield higher returns.

**Summary**

- **`Cluster 1`**'s high ROCE indicates a cluster of potentially attractive companies for investors looking for efficient capital utilization.
- **`Cluster 0`** and **`Cluster 2`** show room for improvement in capital deployment, with Cluster 2, in particular, possibly needing strategic adjustments to enhance returns on capital.
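
For reference, ROCE is commonly defined as EBIT divided by capital employed, where capital employed is total assets minus current liabilities. The sketch below uses that common definition; how the `roce` column in this dataset was actually derived is not shown here, so treat the formula and the sample figures as assumptions.

```python
import math


def roce(ebit, total_assets, current_liabilities):
    """Return on capital employed under the common definition
    EBIT / (total assets - current liabilities).

    Returns NaN when capital employed is not positive, since the ratio
    is not meaningful in that case.
    """
    capital_employed = total_assets - current_liabilities
    if capital_employed <= 0:
        return math.nan
    return ebit / capital_employed


# Hypothetical company: EBIT of 13.2 on 150 of assets with 50 of current liabilities.
example_roce = roce(ebit=13.2, total_assets=150.0, current_liabilities=50.0)
print(f"ROCE: {example_roce:.2%}")
```

Guarding against non-positive capital employed keeps a few distressed balance sheets from distorting a cluster average.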


#### Risk and Investment Performance

##### Investment Risk

In [28]:
investiment_risk = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['beta', 'debt_to_equity', 'price_to_sales_trailing_12_months']].mean()
investiment_risk = investiment_risk.reset_index()
investiment_risk

Unnamed: 0,kmeans_cluster,beta,debt_to_equity,price_to_sales_trailing_12_months
0,0,0.730008,0.374829,3.25736
1,1,0.755692,18.915705,2.110524
2,2,0.753449,0.796297,3.687917


In [83]:
fig = px.bar(investiment_risk, title='Average Beta', x='kmeans_cluster', y='beta', color_discrete_sequence=['rgb(100, 195, 181)'], hover_name='beta', height=500)
fig.update_traces(text=[f'{x:.2f}' for x in (investiment_risk['beta'])], textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='KMeans Cluster', yaxis_title='Average Beta', template = 'plotly_dark')
fig.show()

**Beta Values Analysis by Cluster**

**Beta Overview**

- **`Cluster 0`**: Demonstrates a beta of **0.73**, suggesting lower volatility relative to the market.
- **`Cluster 1`**: Shows a beta of **0.76**, indicating a slightly higher volatility than Cluster 0 but still below the market average.
- **`Cluster 2`**: Has a beta of **0.75**, reflecting volatility close to that of Cluster 1.

**Comparative Analysis**

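- All three cluster averages are below 1, implying that these companies are, on average, less sensitive to market movements than the broader market (a beta of 1 means a stock moves in line with the market).
- The small differences between the cluster averages (**0.73**, **0.76**, and **0.75**) suggest that the clusters carry similar systematic risk on average, although individual companies within a cluster may still differ.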

**Implications for Investors**

- Investors seeking lower-risk investments might find companies within these clusters appealing due to their lower relative volatility.
- The similarity in beta values across clusters suggests that, from a volatility standpoint, there is a uniform risk level amongst them.

**Summary**

- The clusters present a consistent picture of below-market volatility, which may appeal to risk-averse investors.
- The slight variation in beta between the clusters is minimal, indicating comparable levels of systematic risk across the clusters.
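
As background for reading these figures, beta is usually estimated as the covariance between a stock's returns and the market's returns divided by the variance of the market's returns. The sketch below illustrates that calculation on synthetic returns; the `estimate_beta` helper and the return series are hypothetical, and the `beta` column in this dataset may have been computed over a different window or benchmark.

```python
import numpy as np


def estimate_beta(stock_returns, market_returns):
    """Estimate beta as cov(stock, market) / var(market) using sample statistics."""
    stock = np.asarray(stock_returns, dtype=float)
    market = np.asarray(market_returns, dtype=float)
    covariance = np.cov(stock, market, ddof=1)[0, 1]
    return covariance / np.var(market, ddof=1)


# Synthetic returns: the stock is built to move 0.75x the market plus a constant.
market = np.array([0.01, -0.02, 0.015, 0.005, -0.01])
stock = 0.75 * market + 0.001
beta = estimate_beta(stock, market)
print(f"Estimated beta: {beta:.2f}")
```

A value below 1, as in all three clusters, means the stock tends to move less than the market in either direction.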


In [85]:
fig = go.Figure()
fig.add_trace(go.Bar(x=investiment_risk['kmeans_cluster'], y=investiment_risk['debt_to_equity']*100, hovertext='debt_to_equity', name='Debt to Equity', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (investiment_risk['debt_to_equity']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Average Debt to Equity by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Debt to Equity (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

In [31]:
fig = px.bar(investiment_risk, title='Average Price To Sales Trailing 12 Months', x='kmeans_cluster', y='price_to_sales_trailing_12_months', color_discrete_sequence=['rgb(100, 195, 181)'], hover_name='price_to_sales_trailing_12_months', height=600)
fig.update_traces(text=[f'{x:.2f}x' for x in (investiment_risk['price_to_sales_trailing_12_months'])], textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='Kmeans Cluster', yaxis_title='Average Price To Sales Trailing 12 Months', template = 'plotly_dark')
fig.show()

##### Market Assessment

In [32]:
market_assessment = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['trailing_pe', 'forward_pe', 'market_cap', 'enterprise_value', 'price_to_book']].mean()
market_assessment = market_assessment.reset_index()
market_assessment

Unnamed: 0,kmeans_cluster,trailing_pe,forward_pe,market_cap,enterprise_value,price_to_book
0,0,14.075959,5.52186,5800289000.0,9335457000.0,4.317683
1,1,12.302938,8.616403,84845650000.0,164926800000.0,1.872569
2,2,9.631109,2.205538,751471400.0,1731520000.0,8.262989


In [33]:
fig = px.bar(market_assessment, title='Trailing P/E', x='kmeans_cluster', y='trailing_pe', color_discrete_sequence=['rgb(100, 195, 181)'], hover_name='trailing_pe', height=600)
fig.update_traces(text=[f'{x:.2f}x' for x in (market_assessment['trailing_pe'])], textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='Kmeans Cluster', yaxis_title='Trailing P/E', template = 'plotly_dark')
fig.show()

In [34]:
fig = px.bar(market_assessment, title='Forward P/E', x='kmeans_cluster', y='forward_pe', color_discrete_sequence=['rgb(100, 195, 181)'], hover_name='forward_pe', height=600)
fig.update_traces(text=[f'{x:.2f}x' for x in (market_assessment['forward_pe'])], textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='Kmeans Cluster', yaxis_title='Forward P/E', template = 'plotly_dark')
fig.show()

In [35]:
formatted_market_assessment = format_values(market_assessment['market_cap'])

fig = go.Figure()
fig.add_trace(go.Bar(x=market_assessment['kmeans_cluster'], y=market_assessment['market_cap'], hovertext=formatted_market_assessment, name='Market Cap', marker=dict(color='rgb(100, 195, 181)'), text=formatted_market_assessment))
fig.update_traces(textposition='outside')
fig.update_layout(title='Market Cap by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Market Cap', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

In [36]:
formatted_market_assessment = format_values(market_assessment['enterprise_value'])

fig = go.Figure()
fig.add_trace(go.Bar(x=market_assessment['kmeans_cluster'], y=market_assessment['enterprise_value'], hovertext=formatted_market_assessment, name='Enterprise Value', marker=dict(color='rgb(100, 195, 181)'), text=formatted_market_assessment))
fig.update_traces(textposition='outside')
fig.update_layout(title='Enterprise Value by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Average Enterprise Value', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

In [37]:
fig = px.bar(market_assessment, title='Price to Book', x='kmeans_cluster', y='price_to_book', color_discrete_sequence=['rgb(100, 195, 181)'], hover_name='price_to_book', height=600)
fig.update_traces(text=[f'{x:.2f}x' for x in (market_assessment['price_to_book'])], textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='Cluster', yaxis_title='Price to Book', template = 'plotly_dark')
fig.show()

##### Market History

In [38]:
market_history = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['fifty_two_week_low', 'fifty_two_week_high', 'fifty_day_average', 'two_hundred_day_average']].mean()
market_history = market_history.reset_index()
market_history

Unnamed: 0,kmeans_cluster,fifty_two_week_low,fifty_two_week_high,fifty_day_average,two_hundred_day_average
0,0,18.203691,30.82196,24.288738,23.626257
1,1,18.275385,28.504936,23.381559,22.734482
2,2,12.988917,76.680763,22.479051,38.574618


In [39]:
formatted_market_history = format_values(market_history['fifty_two_week_low'])

fig = go.Figure()
fig.add_trace(go.Bar(x=market_history['kmeans_cluster'], y=market_history['fifty_two_week_low'], hovertext=formatted_market_history, name='52 Week Low', marker=dict(color='rgb(100, 195, 181)'), text=formatted_market_history))
fig.update_traces(textposition='outside')
fig.update_layout(title='52 Week Low', xaxis_title='KMeans Cluster', yaxis_title='Average 52 Week Low', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

In [40]:
formatted_market_history = format_values(market_history['fifty_two_week_high'])

fig = go.Figure()
fig.add_trace(go.Bar(x=market_history['kmeans_cluster'], y=market_history['fifty_two_week_high'], hovertext=formatted_market_history, name='52 Week High', marker=dict(color='rgb(100, 195, 181)'), text=formatted_market_history))
fig.update_traces(textposition='outside')
fig.update_layout(title='52 Week High', xaxis_title='KMeans Cluster', yaxis_title='Average 52 Week High', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

In [41]:
formatted_market_history = format_values(market_history['fifty_day_average'])

fig = go.Figure()
fig.add_trace(go.Bar(x=market_history['kmeans_cluster'], y=market_history['fifty_day_average'], hovertext=formatted_market_history, name='50 Days Average', marker=dict(color='rgb(100, 195, 181)'), text=formatted_market_history))
fig.update_traces(textposition='outside')
fig.update_layout(title='50 Days Average', xaxis_title='KMeans Cluster', yaxis_title='Average 50 Days', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

In [42]:
formatted_market_history = format_values(market_history['two_hundred_day_average'])

fig = go.Figure()
fig.add_trace(go.Bar(x=market_history['kmeans_cluster'], y=market_history['two_hundred_day_average'], hovertext=formatted_market_history, name='200 Days Average', marker=dict(color='rgb(100, 195, 181)'), text=formatted_market_history))
fig.update_traces(textposition='outside')
fig.update_layout(title='200 Days Average', xaxis_title='KMeans Cluster', yaxis_title='Average 200 Days', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

#### Dividend Policy

##### Dividend Payment

In [43]:
dividend_payment = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['dividend_rate', 'trailing_annual_dividend_rate', 'trailing_annual_dividend_yield', 'dividend_payout_ratio']].mean()
dividend_payment = dividend_payment.reset_index()
dividend_payment

Unnamed: 0,kmeans_cluster,dividend_rate,trailing_annual_dividend_rate,trailing_annual_dividend_yield,dividend_payout_ratio
0,0,1.800496,1.097661,0.042841,96.602199
1,1,1.663462,1.30875,0.059135,197.572295
2,2,68.972458,0.822729,0.018401,1342.209178


In [44]:
formatted_dividend_rate= format_values(dividend_payment['dividend_rate'])

fig = go.Figure()
fig.add_trace(go.Bar(x=dividend_payment['kmeans_cluster'], y=dividend_payment['dividend_rate'], hovertext=formatted_dividend_rate, name='Dividend Rate', marker=dict(color='rgb(100, 195, 181)'), text=formatted_dividend_rate))
fig.update_traces(textposition='outside')
fig.update_layout(title='Dividend Rate', xaxis_title='KMeans Cluster', yaxis_title='Dividend Rate', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

In [45]:
formatted_trailing_annual_dividend_rate = format_values(dividend_payment['trailing_annual_dividend_rate'])

fig = go.Figure()
fig.add_trace(go.Bar(x=dividend_payment['kmeans_cluster'], y=dividend_payment['trailing_annual_dividend_rate'], hovertext=formatted_trailing_annual_dividend_rate, name='Trailing Annual Dividend Rate', marker=dict(color='rgb(100, 195, 181)'), text=formatted_trailing_annual_dividend_rate))
fig.update_traces(textposition='outside')
fig.update_layout(title='Trailing Annual Dividend Rate', xaxis_title='KMeans Cluster', yaxis_title='Trailing Annual Dividend Rate', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

In [46]:
fig = go.Figure()
fig.add_trace(go.Bar(x=dividend_payment['kmeans_cluster'], y=dividend_payment['trailing_annual_dividend_yield']*100, hovertext='trailing_annual_dividend_yield', name='Trailing Annual Dividend Yield', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (dividend_payment['trailing_annual_dividend_yield']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Trailing Annual Dividend Yield by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Trailing Annual Dividend Yield (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

In [47]:
fig = go.Figure()
fig.add_trace(go.Bar(x=dividend_payment['kmeans_cluster'], y=dividend_payment['dividend_payout_ratio'], hovertext='dividend_payout_ratio', name='Dividend Payout Ratio', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (dividend_payment['dividend_payout_ratio'])]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Dividend Payout Ratio by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Dividend Payout Ratio', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

#### Financial Health and Capital Structure

##### Liquidity and Reserves

In [48]:
liquidity_reserves = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['total_cash', 'total_cash_per_share', 'book_value']].mean()
liquidity_reserves = liquidity_reserves.reset_index()
liquidity_reserves

Unnamed: 0,kmeans_cluster,total_cash,total_cash_per_share,book_value
0,0,1347213000.0,6.527835,13.515033
1,1,48039140000.0,9.765962,16.578019
2,2,1190988000.0,23.64178,-23.080551


In [49]:
formatted_liquidity_reserves= format_values(liquidity_reserves['total_cash'])

fig = go.Figure()
fig.add_trace(go.Bar(x=liquidity_reserves['kmeans_cluster'], y=liquidity_reserves['total_cash'], hovertext=formatted_liquidity_reserves, name='Total Cash', marker=dict(color='rgb(100, 195, 181)'), text=formatted_liquidity_reserves))
fig.update_traces(textposition='outside')
fig.update_layout(title='Total Cash', xaxis_title='KMeans Cluster', yaxis_title='Total Cash', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

In [50]:
formatted_liquidity_reserves= format_values(liquidity_reserves['total_cash_per_share'])

fig = go.Figure()
fig.add_trace(go.Bar(x=liquidity_reserves['kmeans_cluster'], y=liquidity_reserves['total_cash_per_share'], hovertext=formatted_liquidity_reserves, name='Total Cash per Share', marker=dict(color='rgb(100, 195, 181)'), text=formatted_liquidity_reserves))
fig.update_traces(textposition='outside')
fig.update_layout(title='Total Cash per Share', xaxis_title='KMeans Cluster', yaxis_title='Total Cash per Share', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

In [51]:
formatted_liquidity_reserves= format_values(liquidity_reserves['book_value'])

fig = go.Figure()
fig.add_trace(go.Bar(x=liquidity_reserves['kmeans_cluster'], y=liquidity_reserves['book_value'], hovertext=formatted_liquidity_reserves, name='Book Value', marker=dict(color='rgb(100, 195, 181)'), text=formatted_liquidity_reserves))
fig.update_traces(textposition='outside')
fig.update_layout(title='Book Value', xaxis_title='KMeans Cluster', yaxis_title='Book Value', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

##### Margins and Leverage

In [52]:
margins_leverage = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['gross_margins', 'ebitda_margins', 'equity']].mean()
margins_leverage = margins_leverage.reset_index()
margins_leverage

Unnamed: 0,kmeans_cluster,gross_margins,ebitda_margins,equity
0,0,0.308575,0.737234,-3492118000.0
1,1,0.284312,0.243896,-80504860000.0
2,2,0.307552,0.05593,-987932200.0


In [53]:
fig = go.Figure()
fig.add_trace(go.Bar(x=margins_leverage['kmeans_cluster'], y=margins_leverage['gross_margins']*100, hovertext='gross_margins', name='Gross Margins', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (margins_leverage['gross_margins']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='Gross Margins by Cluster', xaxis_title='KMeans Cluster', yaxis_title='Gross Margins (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

In [54]:
fig = go.Figure()
fig.add_trace(go.Bar(x=margins_leverage['kmeans_cluster'], y=margins_leverage['ebitda_margins']*100, hovertext='ebitda_margins', name='EBITDA Margins', marker=dict(color='rgb(100, 195, 181)'), text=[f'{x:.2f}%' for x in (margins_leverage['ebitda_margins']*100)]))
fig.update_traces(textposition='outside')
fig.update_layout(title='EBITDA Margins by Cluster', xaxis_title='KMeans Cluster', yaxis_title='EBITDA Margins (%)', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

In [55]:
formatted_margins_leverage= format_values(margins_leverage['equity'])

fig = go.Figure()
fig.add_trace(go.Bar(x=margins_leverage['kmeans_cluster'], y=margins_leverage['equity'], hovertext=formatted_margins_leverage, name='Equity', marker=dict(color='rgb(100, 195, 181)'), text=formatted_margins_leverage))
fig.update_traces(textposition='outside')
fig.update_layout(title='Equity', xaxis_title='KMeans Cluster', yaxis_title='Equity', template='plotly_dark', font=dict(color='white'), height=550)
fig.show()

#### Trading Volume and Activity

##### Trading Volumes

In [56]:
trading_volumes = df_fundamentals_scored_kmeans.groupby(['kmeans_cluster'])[['volume', 'average_volume']].mean()
trading_volumes = trading_volumes.reset_index()
trading_volumes

Unnamed: 0,kmeans_cluster,volume,average_volume
0,0,1469531.0,2903112.0
1,1,3818669.0,10228800.0
2,2,257594.1,677509.0


In [57]:
formatted_trading_volumes= format_values_x(trading_volumes['volume'])


fig = px.bar(trading_volumes, title='Volume', x='kmeans_cluster', y='volume', color_discrete_sequence=['rgb(100, 195, 181)'], hover_name='volume', height=600)
fig.update_traces(text=formatted_trading_volumes, textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='Cluster', yaxis_title='Volume', template = 'plotly_dark')
fig.show()

In [58]:
formatted_trading_volumes= format_values_x(trading_volumes['average_volume'])


fig = px.bar(trading_volumes, title='Average Volume', x='kmeans_cluster', y='average_volume', color_discrete_sequence=['rgb(100, 195, 181)'], hover_name='average_volume', height=600)
fig.update_traces(text=formatted_trading_volumes, textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='Cluster', yaxis_title='Average Volume', template = 'plotly_dark')
fig.show()

## TL;DR