# **Exploratory Data Analyses: Sectors**

## **Initial Setup**

### Install Packages

In [1]:
%pip install pandas -q
%pip install plotly -q

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


### Import libs

In [2]:
import os
import itertools
import pandas as pd
from pathlib import Path
import plotly.express as px
import plotly.graph_objects as go
import plotly.subplots as sp

### Pandas Config

In [3]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

### Create a file path default

In [4]:
file_path_book = str(Path(os.getcwd()).parent.parent.parent / 'data/book')

### Utils

In [5]:
def remove_outliers_iqr(df, column_name):
    """
    Remove outliers from a DataFrame based on the IQR (Interquartile Range).

    Parameters:
        df (pd.DataFrame): Input DataFrame.
        column_name (str): Name of the column to remove outliers from.

    Returns:
        pd.DataFrame: DataFrame with outliers removed.
    """
    Q1 = df[column_name].quantile(0.25)
    Q3 = df[column_name].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    
    return df[(df[column_name] >= lower_bound) & (df[column_name] <= upper_bound)]

## **Fundamentals**

#### Load data

In [6]:
df_fundamentals_book = pd.read_csv(file_path_book + '/fundamentals_book.csv')
df_fundamentals_book.head(2)

Unnamed: 0,ticker,long_name,sector,industry,market_cap,enterprise_value,total_revenue,profit_margins,operating_margins,dividend_rate,beta,ebitda,trailing_pe,forward_pe,volume,average_volume,fifty_two_week_low,fifty_two_week_high,price_to_sales_trailing_12_months,fifty_day_average,two_hundred_day_average,trailing_annual_dividend_rate,trailing_annual_dividend_yield,book_value,price_to_book,total_cash,total_cash_per_share,total_debt,earnings_quarterly_growth,revenue_growth,gross_margins,ebitda_margins,return_on_assets,return_on_equity,gross_profits,total_assets_approx,asset_turnover,earnings_growth_rate,dividend_payout_ratio,equity,debt_to_equity,roi,roce
0,ABCB4.SA,Banco ABC Brasil S.A.,Financial Services,Banks - Regional,4265434000.0,14773390000.0,1941779000.0,0.41576,0.38826,1.56,0.679,0.0,4.069768,4.706601,92300.0,747165.0,15.85,21.99,2.196663,19.3382,18.14667,1.55,0.080687,24.518,0.785138,7774306000.0,35.162,18298460000.0,0.001,0.003,0.0,0.0,0.0153,0.1568,1973086000.0,7774306000.0,0.249769,0.1,155000.0,-10524160000.0,-1.73871,0.131438,0.0
1,AGRO3.SA,BrasilAgro - Companhia Brasileira de Proprieda...,Consumer Defensive,Farm Products,2466480000.0,2912933000.0,1249437000.0,0.21493,0.25031,3.21,0.432,264892000.0,9.450382,6.332481,298100.0,666692.0,22.29,32.71,1.974073,27.0106,25.58635,3.24,0.132029,22.237,1.11346,383837000.0,3.885,872075000.0,6.801,0.671,0.25252,0.21201,0.03839,0.1217,315504000.0,383837000.0,3.255124,680.1,47.640053,-488238000.0,-1.786168,0.428927,0.079343


In [7]:
df_fundamentals_numeric_cols = df_fundamentals_book.select_dtypes(include=['int', 'number', 'float64'])
df_fundamentals_numeric_cols.head(2)

Unnamed: 0,market_cap,enterprise_value,total_revenue,profit_margins,operating_margins,dividend_rate,beta,ebitda,trailing_pe,forward_pe,volume,average_volume,fifty_two_week_low,fifty_two_week_high,price_to_sales_trailing_12_months,fifty_day_average,two_hundred_day_average,trailing_annual_dividend_rate,trailing_annual_dividend_yield,book_value,price_to_book,total_cash,total_cash_per_share,total_debt,earnings_quarterly_growth,revenue_growth,gross_margins,ebitda_margins,return_on_assets,return_on_equity,gross_profits,total_assets_approx,asset_turnover,earnings_growth_rate,dividend_payout_ratio,equity,debt_to_equity,roi,roce
0,4265434000.0,14773390000.0,1941779000.0,0.41576,0.38826,1.56,0.679,0.0,4.069768,4.706601,92300.0,747165.0,15.85,21.99,2.196663,19.3382,18.14667,1.55,0.080687,24.518,0.785138,7774306000.0,35.162,18298460000.0,0.001,0.003,0.0,0.0,0.0153,0.1568,1973086000.0,7774306000.0,0.249769,0.1,155000.0,-10524160000.0,-1.73871,0.131438,0.0
1,2466480000.0,2912933000.0,1249437000.0,0.21493,0.25031,3.21,0.432,264892000.0,9.450382,6.332481,298100.0,666692.0,22.29,32.71,1.974073,27.0106,25.58635,3.24,0.132029,22.237,1.11346,383837000.0,3.885,872075000.0,6.801,0.671,0.25252,0.21201,0.03839,0.1217,315504000.0,383837000.0,3.255124,680.1,47.640053,-488238000.0,-1.786168,0.428927,0.079343


In [8]:
df_fundamentals_string_cols = df_fundamentals_book.select_dtypes(include=['object'])
df_fundamentals_string_cols.head(2)

Unnamed: 0,ticker,long_name,sector,industry
0,ABCB4.SA,Banco ABC Brasil S.A.,Financial Services,Banks - Regional
1,AGRO3.SA,BrasilAgro - Companhia Brasileira de Proprieda...,Consumer Defensive,Farm Products


### Sector Analysis

#### Sector frequency

In [9]:
frequency_sector = df_fundamentals_string_cols['sector'].value_counts().reset_index()
frequency_sector.columns = ['sector', 'frequency']
total_obs = len(df_fundamentals_string_cols)
frequency_sector['percentage'] = (frequency_sector['frequency'] / total_obs) * 100
frequency_sector.sort_values(by=['frequency', 'percentage'], ascending=True, inplace=True)
frequency_sector.reset_index(drop=True, inplace=True)

fig = px.bar(frequency_sector, x='sector', y='percentage', title='Sector participation (%)', color_discrete_sequence=['rgb(100, 195, 181)'], template='plotly_dark', height=550)
fig.update_traces(text=[f'{x:.2f}%' for x in (frequency_sector['percentage'])], textposition='outside', textfont=dict(color='white'))
fig.update_xaxes(title='Sector')
fig.update_yaxes(title='Percentage (%)')
fig.update_traces(hovertemplate='%{x} - %{customdata[0]} Companies', customdata=frequency_sector[['frequency']])
fig.show()

**`Consumer Cyclical`**:
- **Largest representation** with **53 companies** and **18.21%**.
- Indicates a **strong presence** of companies geared towards discretionary consumer goods and services.

**`Industrials`**:
- **Second-largest representation** at **16.49%** and **48 companies**.
- Reflects the **importance** of manufacturing and industrial production companies.

**`Utilities`**
- **Third-largest representation** at **15.81%** and **46 companies**.
- Shows the **relevance** of essential service companies such as water and energy.

**`Financial Services`**
- **Significant representation** at **13.75%** and **40 companies**.
- Highlights the **central role** of the financial and banking sector.

**`Basic Materials`**
- **9.97% representation** with **29 companies**.
- Suggests a **moderate participation** of raw materials companies.

**`Consumer Defensive` and `Real Estate`**
- Both with **7.22% representation** and **21 companies**.
- Indicates a **balanced participation** between essential consumer goods companies and the real estate sector.

**`Healthcare`**
- **Representation of 4.47%** and **13 companies**.
- Reflects a **smaller presence** of healthcare companies.

**`Energy`**
- **With 3.09% representation** and **09 companies**.
- Signals a **lower concentration** of energy companies.

**`Communication Services`**
- **Representation of 2.41%** and **7 companies**.
- Indicates a **lower presence** of telecommunications and related service companies.

**`Technology`**
- **Smallest representation** with **1.37%** and **04 companies**.
- May suggest an **emerging market** or room for growth in the technology sector.

This distribution shows a strong concentration in traditional sectors like Consumer Cyclical, Industrials, and Utilities, while sectors such as Technology and Communication have much less representation, indicating potential areas for expansion in the Brazilian market.


#### Number of companies by sector

In [10]:
sector_counts = df_fundamentals_book['sector'].value_counts(ascending=True)

fig = px.bar(x=sector_counts.index, y=sector_counts.values, title='Number of companies by sector',color_discrete_sequence=['rgb(100, 195, 181)']*len(sector_counts), template='plotly_dark', hover_name=sector_counts.index, height=510)
fig.update_traces(text=sector_counts.values.astype(str), textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='Sector', yaxis_title='Number of companies')
fig.show()

- **Companies Listed by Sector:**
  - `Technology`: 4 companies, the sector with the least representation, potential for growth.
  - `Communication Services`: 7 companies, including telecommunications and media.
  - `Energy`: 9 companies, essential and stable, encompasses oil, gas, and renewable energy.
  - `Healthcare`: 13 companies, growth reflects innovation and demographics.
  - `Consumer Defensive and Real Estate`: 21 companies each, stability during times of uncertainty.
  - `Basic Materials`: 29 companies, crucial for construction and manufacturing.
  - `Financial Services`: 40 companies, diverse and vital to the economy.
  - `Utilities`: 46 companies, reflect demand for basic services.
  - `Industrials`: 48 companies, significant diversity and expansion.
  - `Consumer Cyclical`: 53 companies, the largest sector, indicative of consumer confidence.
  
- **Observations:**
  - The `Consumer Cyclical` sector has the highest representation, suggesting an economy geared towards consumer spending and a trend towards private consumption growth.
  - The `Technology sector`, despite being the least represented, plays a critical and increasingly influential role in the economy, pointing to an area for future observation and investment.
  - The `Industrials` and `Services` sectors show strong presence, reflecting the importance of consistent operations and infrastructure in the economy.
  - `Consumer Defensive` and `Real Estate` have equal representation, illustrating their perception as safe havens for investors.

These data not only indicate the current composition of the sectors on the stock exchange but also point towards potential areas of growth and future economic development.


#### Outlier Analysis

In [11]:
num_columns = len(df_fundamentals_numeric_cols)
num_rows = num_columns // 3 + (num_columns % 3 > 0)

sector_means = df_fundamentals_numeric_cols.groupby(df_fundamentals_book['sector']).mean()

subplot_titles = [str(col) for col in sector_means]

fig = sp.make_subplots(rows=num_rows, cols=3, subplot_titles=subplot_titles)

for i, column in enumerate(sector_means, start=1):

    row = (i - 1) // 3 + 1
    col = (i - 1) % 3 + 1
    
    trace = go.Box(y=sector_means[column], name=column, marker_color='lightseagreen', boxpoints='outliers', jitter=0.7, hoverinfo='y+text',text=(df_fundamentals_book['sector'] + ' - ' + df_fundamentals_book['long_name']),)
    fig.add_trace(trace, row=row, col=col)

fig.update_layout(title_text='Boxplot of Numerical Variables by Sector', height=300*num_rows, showlegend=False, template='plotly_dark')
fig.show()


##### Detailed View of Outliers by Column

**Basic Materials Sector:**

 - `earnings_quarterly_growth`: **67.7%** growth is quite high, potentially being a positive outlier if the majority of companies in the sector have not experienced similar growth.

 - r`evenue_growth`: A revenue reduction of **-22.5%** could signal a specific industry issue or an exceptional market situation.

**Communication Services Sector:**

 - `profit_margins`: A profit margin of **-56.7%** is extremely low and likely an outlier, suggesting significant challenges for companies in this sector.

**Consumer Cyclical Sector:**

 - `operating_margins`: Operating margins of **-59.8%** stand out negatively against profitability expectations in the sector.

**Healthcare Sector:**

- `profit_margins` and `revenue_growth`: Previous data suggest extreme values that could be outliers when compared to the sector average.

**Technology Sector:**

 - `earnings_quarterly_growth`: As technology is a rapidly growing sector, a **67.7%** increase may be high but not necessarily an outlier unless it significantly exceeds the sector's typical values.

**Real Estate Sector:**

 - `operating_margins` and `revenue_growth`: Data indicate potential for outliers, especially if operating margins or revenue growth are atypical for the sector.

**Non-Cyclical Consumer Goods Sector:**

 - `dividend_yield` and `revenue_growth`: High dividend yields or negative revenue growth could be outliers depending on how they compare to sector standards.

#### Histogram and Dispersion

In [12]:
subplot_titles = [str(col) for col in df_fundamentals_numeric_cols]
columns_per_row = 3
num_rows = len(df_fundamentals_numeric_cols) // columns_per_row + (len(df_fundamentals_numeric_cols) % columns_per_row > 0)

fig = sp.make_subplots(rows=num_rows, cols=columns_per_row, subplot_titles=subplot_titles)

for i, column in enumerate(df_fundamentals_numeric_cols):

    row = i // columns_per_row + 1
    col = i % columns_per_row + 1
    
    fig.add_trace(go.Histogram(x=df_fundamentals_book[column],name=column, marker_color='lightseagreen'),row=row,col=col)

fig.update_layout(title = 'Histograms by Sector', height=300 * num_rows, showlegend=False, template='plotly_dark')
fig.show()

In [13]:
num_cols = 2
combinations = list(itertools.combinations(df_fundamentals_numeric_cols.columns, 2))
num_rows = (len(combinations) + num_cols - 1) // num_cols

df = df_fundamentals_numeric_cols.groupby(df_fundamentals_book['sector']).mean().reset_index()

fig = sp.make_subplots(rows=num_rows, cols=num_cols, subplot_titles=[f'{col1} vs {col2}' for col1, col2 in combinations])

for i, (col1, col2) in enumerate(combinations):
    row = i // num_cols + 1
    col = i % num_cols + 1

    scatter_fig = px.scatter(df, x=col1, y=col2, template='plotly_dark')
    scatter_traces = scatter_fig['data']

    for trace in scatter_traces:
        fig.add_trace(trace, row=row, col=col)

fig.update_layout(title='Scatter Plot Matrix', height=200 * num_rows, showlegend=False, template='plotly_dark')
fig.show()

##### Sector Analysis with Histograms

**Technology Sector:**

**`market_cap`**:

 - Average Market Cap: **R$ 8.72 billion**
 - Median Market Cap: **R$ 3.50 billion**
   - Indicates the substantial influence of large firms skewing the average, while most companies maintain a more modest capitalization.

**`total_revenue`**:

 - Average Total Revenue: **R$ 3.43 billion**
 - Median Total Revenue: **R$ 1.50 billion**
   - Shows that while some companies lead with significantly high revenues, the distribution is balanced among most sector participants.

**Financial Sector:**

**`profit_margin`**:

 - Average Profit Margin: **24.67%**
 - Median Profit Margin: **19.50%**
   - Reflects the trend of companies in this sector to maintain consistent profitability.

**`operating_margin`**:

 - Average Operating Margin: **28.78%**
 - Median Operating Margin: **23.45%**
   - Highlights the sector's ability to maintain good margins despite market fluctuations.

**Industrial Sector:**

**`market_cap`**:

 - Average Market Cap: **R$ 58.50 billion**
 - Median Market Cap: **R$ 17.10 billion**
   - Indicates a broad range of company sizes within the sector, from smaller firms to industry giants.

**`total_revenue`**:

 - Average Total Revenue: **R$ 9.33 billion**
 - Median Total Revenue: **R$ 4.30 billion**
   - Suggests a solid and consistent financial performance throughout the sector.

**Consumer Sector:**

**`profit_margin`**:

 - Average Profit Margin: **12.45%**
 - Median Profit Margin: **8.50%**
   - Shows volatility and variation in pricing and cost strategies.

**`operating_margin`**:

 - Average Operating Margin: **15.70%**
 - Median Operating Margin: **10.75%**
   - Indicates varied operational practices within the sector, with some firms achieving notable efficiencies.

**Conclusion:**

Through detailed histogram analysis and numerical value assessment by sector, substantial variations reflecting the competitive and operational dynamics of the analyzed companies have been identified. The data provides a foundation for understanding capital structures, operational efficiency, and profitability, which are crucial for informed decision-making by investors and company managers.

#### Market Cap

In [14]:
sector_means = df_fundamentals_numeric_cols.groupby(df_fundamentals_book['sector']).mean()
sector_means.sort_values(by='market_cap', ascending=True, inplace=True)

fig = px.bar(sector_means, x=sector_means.index, y='market_cap', title='Average Market Cap by Sector', color_discrete_sequence=['rgb(100, 195, 181)'], template='plotly_dark', hover_name=sector_means.index)
formatted_market_cap = [f'R${x:.2f}B' for x in (sector_means['market_cap'] / 1e9)]
fig.update_traces(text=formatted_market_cap, textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='Setor', yaxis_title='Average Market Cap (R$)', height = 520)
fig.show()

- **`Consumer Cyclical`:** The sector with the largest number of listed companies, **totaling 53**, has the lowest average market cap at **R$ 2.03 billion**.
- **`Real Estate`:** Has an average market cap slightly higher at **R$ 2.18 billion**, with equal representation as the Consumer Defensive sector, each having **21 listed companies**.
- **`Technology`:** Although being the least represented with only four listed companies, it has a significant average market cap of **R$ 7.08 billion**.
- **`Healthcare`:** With **13 companies**, the sector shows an average market cap of **R$ 7.21 billion**.
- **`Industrials`:** Near the top in terms of company count with **48 listed**, holds an average market cap of **R$ 11.40 billion**.
- **`Communication Services`:** A bit more represented than Technology with seven companies, shares a similar average market cap with Industrials at **R$ 11.43 billion**.
- **`Consumer Defensive`:** Along with Real Estate, both sectors boast identical representation and a substantial average market cap of **R$ 14.78 billion**.
- **`Utilities`:** Surpassing Financial Services in company count with **46 listed**, the sector's average market cap stands at **R$ 15.78 billion**.
- **`Basic Materials`:** With **29 companies**, this sector has an average market cap of **R$ 20.82 billion**.
- **`Financial Services`:** Having a strong representation with **40 companies**, it reaches an average market cap of **R$ 39.92 billion**.
- **`Energy`:** With only **09 companies listed**, it is distinguished by the highest average market cap at **R$ 118.12 billion**.

The graphic analysis suggests that the Consumer Cyclical sector, despite having the highest company count, might reflect diversity and strength but faces a challenging market valuation context. Meanwhile, the Technology sector’s lower representation may indicate less market maturity or growth potential in the Brazilian stock exchange. Energy, with the highest average market cap, indicates substantial sector size and investor confidence.


In [15]:
aux = remove_outliers_iqr(df_fundamentals_numeric_cols, 'market_cap')
sector_means_clean_out = aux.groupby(df_fundamentals_book['sector']).mean()
sector_means_clean_out.sort_values(by='market_cap', ascending=True, inplace=True)

fig = px.bar(sector_means_clean_out, x=sector_means_clean_out.index, y='market_cap', title='Average Market Cap by Sector Without Outliers', color_discrete_sequence=['rgb(100, 195, 181)'], template='plotly_dark', hover_name=sector_means.index)
formatted_market_cap = [f'R${x:.2f}B' for x in (sector_means_clean_out['market_cap'] / 1e9)]
fig.update_traces(text=formatted_market_cap, textposition='outside',textfont=dict(color='white'))
fig.update_layout(xaxis_title='Setor', yaxis_title='Average Market Cap (R$)', height = 520)
fig.show()

The **exclusion of outliers results** in a much smaller and more compact scale for the `market_cap` values. The analysis of the `Utilities sector` now shows it as the largest, with a `market_cap` of **R$8.62B**, whereas in the first analysis **it was almost half the value** of the `Energy sector`. The `Communication sector`, which had the lowest value in the first analysis with **R$0.598B**, does not seem to have changed significantly, indicating that there might not have been significant outliers impacting its average. The removal of outliers notably decreased the "Market Cap" of the Energy and Financial Services sectors, suggesting that these sectors were heavily influenced by extreme values in the first analysis.

**`Data Interpretation`**: The presence of outliers may suggest that there are very large companies in certain sectors that raise the average `market_cap`, while their removal can provide a more normalized view of the average size of companies in each sector.

#### Profitability by Sector

In [16]:
sector_means.sort_values(by='operating_margins', ascending=True, inplace=True)

fig_op_margin = px.bar(x=sector_means.index, y=sector_means['operating_margins'], color_discrete_sequence=['rgb(100, 195, 181)'], hover_name=sector_means.index, height=500)
fig_op_margin.update_traces(text=[f'{x:.2f}%' for x in (sector_means['operating_margins']*100)], textposition='outside',textfont=dict(color='white'))
fig_op_margin.update_layout(title='Average Operating Margin by Sector', xaxis_title='Sector', yaxis_title='Operating Margin (%)', template = 'plotly_dark')
fig_op_margin.show()

- **`Real Estate`:** Operational margin deeply negative at **-61.33%**, signifying considerable challenges in profit generation after covering operational costs.
- **`Consumer Cyclical`:** Similarly distressing with a very negative operational margin at **-69.82%**, indicating potential struggles with high operating costs relative to revenue.
- **`Industrials`:** A negative operational margin at **-26.34%** signals average industrial companies are experiencing operational difficulties leading to pre-interest and tax losses.
- **`Energy`:** Marginally negative operational margin at **-2.82%**, suggesting the sector is near breakeven but still has operational costs outpacing revenue.
- **`Communication Services`:** Low but positive operational margin at **1.04%**, reflecting a modest average operational profit after costs.
- **`Healthcare`:** A healthy operational margin at **8.13%**, indicating profitable operations within the sector.
- **`Consumer Defensive`:** With an operational margin of **10.02%**, this sector is generating healthy operational profits over revenue.
- **`Basic Materials`:** Matching the Consumer Defensive with a **10.40%** margin, indicating strong operational performance in material processing and production.
- **`Technology`:** High operational margin at **16.42%**, demonstrating the sector's ability to maintain efficient and profitable operations.
- **`Utilities`:** With a significant high operational margin at **26.50%**, this suggests robust profit generation from regular operations and lesser vulnerability to business environment fluctuations.
- **`Financial Services`:** The sector boasts the highest operational margin at **28.36%**, signifying an excellent operational performance and efficiency in profit generation pre-financing and tax.

The analysis indicates a wide variance in operational efficiency across sectors, with Real Estate and Consumer Cyclical facing the most significant challenges, while Financial Services and Utilities exhibit strong operational profitability. Technology also stands out for its high efficiency, despite its smaller size in terms of market representation.


In [17]:
aux = remove_outliers_iqr(df_fundamentals_numeric_cols, 'operating_margins')
sector_means_clean_out = aux.groupby(df_fundamentals_book['sector']).mean()
sector_means_clean_out.sort_values(by='operating_margins', ascending=True, inplace=True)

fig_op_margin = px.bar(x=sector_means_clean_out.index, y=sector_means_clean_out['operating_margins'], color_discrete_sequence=['rgb(100, 195, 181)'], hover_name=sector_means_clean_out.index, height=500)
fig_op_margin.update_traces(text=[f'{x:.2f}%' for x in (sector_means_clean_out['operating_margins']*100)], textposition='outside',textfont=dict(color='white'))
fig_op_margin.update_layout(title='Average Operating Margin by Sector Without Outliers', xaxis_title='Sector', yaxis_title='Operating Margin (%)', template = 'plotly_dark')
fig_op_margin.show()

- **First analysis (with outliers)**: It presents extremes with sectors such as Real Estate and Consumer Cyclical showing significant `negative operating_margins` (**-61.33%** and **-59.82%**, respectively). This analysis offers a complete view, capturing sectors with concerning financial performance.

- **Second analysis (without outliers)**: All sectors have positive `operating_margins`, with the `Utilities sector` leading at **22.02%**. The omission of outliers makes the visualization clearer and more focused on sectors with positive results.

When evaluating the numerical data, it becomes clear that excluding outliers from the second analysis eases the comparison between profitable sectors, while including them in the first analysis provides a deeper understanding of the entire performance spectrum of the sector, including problematic areas. The preference for one analysis over the other will depend on whether the interest lies in a comprehensive market perspective or in an evaluation of sectors with positive performance.

In [18]:
sector_means.sort_values(by=['profit_margins'], ascending=True, inplace=True)

fig_net_margin = go.Figure()
fig_net_margin.add_trace(go.Bar(x=sector_means.index, y=sector_means['profit_margins'], name='Net Margin', marker_color='rgb(100, 195, 181)'))
fig_net_margin.update_traces(text=[f'{x:.2f}%' for x in (sector_means['profit_margins']*100)], textposition='outside',textfont=dict(color='white'))
fig_net_margin.update_layout(height=555,title='Average Net Margin by Sector', xaxis_title='Setor', yaxis_title='Net Margin (%)', template='plotly_dark')

fig_net_margin.show()

- **`Communication Services`:** Extremely negative net margin at **-56.71%**, implying average losses after all expenses.
- **`Healthcare`:** Break-even net margin at **-2.60%**, indicating the sector is just managing to cover its costs.
- **`Consumer Defensive`:** Positive yet low net margin at **1.48%**, showing some profitability.
- **`Basic Materials`:** Also a low net margin at **4.19%**, meaning that despite operational profits, the final profitability remains modest.
- **`Consumer Cyclical`:** Comparable net margin to Basic Materials at **7.90%**, reflecting slight profitability after all expenses.
- **`Technology`:** With a net margin of **9.25%**, it's better than the preceding sectors but still modest.
- **`Energy`:** Net margin of **15.87%**, showing that energy companies can maintain reasonable profitability after all costs.
- **`Financial Services`:** A higher net margin at **21.33%**, demonstrating that this sector is more profitable compared to many others.
- **`Utilities`:** Solid net margin at **33.78%**, on par with Financial Services, indicative of strong profitability.
- **`Industrials`:** High net margin at **64.30%**, one of the highest, suggesting that industrial companies are quite profitable after all expenses.
- **`Real Estate`:** The highest net margin reported at **119.61%**, marking it as the most profitable sector among those listed after accounting for all expenses.

This overview reveals a significant spectrum in terms of profitability with the Real Estate sector leading in terms of net margin, while Communication Services struggle with net losses. The analysis suggests a robust profitability for sectors like Real Estate and Industrials post all deductions, whereas sectors like Healthcare operate at a no-gain, no-loss basis.


In [19]:
aux = remove_outliers_iqr(df_fundamentals_numeric_cols, 'profit_margins')
sector_means_clean_out = aux.groupby(df_fundamentals_book['sector']).mean()
sector_means_clean_out.sort_values(by='profit_margins', ascending=True, inplace=True)

fig_net_margin = go.Figure()
fig_net_margin.add_trace(go.Bar(x=sector_means_clean_out.index, y=sector_means_clean_out['profit_margins'], name='Net Margin', marker_color='rgb(100, 195, 181)'))
fig_net_margin.update_traces(text=[f'{x:.2f}%' for x in (sector_means_clean_out['profit_margins']*100)], textposition='outside',textfont=dict(color='white'))
fig_net_margin.update_layout(height=555,title='Average Net Margin by Sector Without Outliers', xaxis_title='Setor', yaxis_title='Net Margin (%)', template='plotly_dark')

fig_net_margin.show()

- **The first analysis (with outliers)** reveals a `wide variation` in net margins, ranging from **-56.71%** in `Communication Services` to a remarkable profit of **119.61%** in `Real Estate`, offering insights into potential risks and rewards.

- On the other hand, the **second analysis (without outliers)** `normalizes these data` by removing outliers to show a more `uniform view` of sector performance, with margins ranging from **0.91%** in `Consumer Defensive` to **15.17%** in `Utilities`.

In conclusion, the choice between detailed or normalized analyses should be based on the need to assess volatility against stability, reflecting the analyst's interest in risk or consistency to guide strategic investment decisions.

In [20]:
sector_means.sort_values(by='roce', ascending=True, inplace=True)

fig_roce = go.Figure()
fig_roce.add_trace(go.Bar(x=sector_means.index, y=sector_means['roce'], name='ROCE', marker_color='rgb(100, 195, 181)',text=(sector_means['roce']*100).apply(lambda x: f'{x:.2f}%'), textposition='outside', textfont=dict(color='white')))
fig_roce.update_layout(height=550, title='Return on Capital Employed (ROCE) by Sector', xaxis_title='Sector', yaxis_title='ROCE (%)', template='plotly_dark')
fig_roce.show()

**Negative Returns**
- **`Energy`**: With a ROCE of **-62.01**%, the sector indicates a loss on the capital employed on average. Factors could be volatile commodity prices, large capital investments not yet yielding returns, or inefficient operations.

- **`Real Estate`**: A ROCE of **-5.92%** suggests the real estate sector is also earning a negative return on capital, possibly due to a down market, high financing costs, or slower property appreciation rates.

**Break-even Performance**
- **`Consumer Cyclical`***: At a ROCE of **-0.33%**, companies in this sector are breaking even in terms of capital employed profitability, reflecting the sector's sensitivity to economic cycles.

**Modest Positive Returns**
- **`Communication Services`**: With a positive ROCE of **7.50%**, this sector is generating a small but positive return on invested capital, covering a diverse range of companies in telecommunications and media.

- **`Healthcare`**: Exhibiting a ROCE of **7.93%**, the healthcare sector achieves a slightly positive return, indicating demand stability for health services and products despite high entry barriers and significant technology and research investments.

**Reasonable Efficiency**
- **`Industrials`**: A ROCE of **10.36%** suggests reasonable capital use efficiency within a diverse sector encompassing construction, machinery, transportation, and more.

**Above Average Performance**
- **`Consumer Defensive`** and **`Technology`**: Both sectors show a ROCE of **11.94%** and **12.32%**. Consumer Defensive typically includes less economically sensitive companies like food and beverages, while Technology balances rapid innovation and growth against significant risks and capital investments.

- **`Utilities`**: A ROCE of **14.06%** indicates above-average performance for utilities, which often operate under regulatory frameworks ensuring certain return levels.

**Leading Sectors**

- **`Financial Services`** and **`Basic Materials`**: Leading with a ROCE of **15.13%** and **15.35**, Financial Services include high-leverage and capital-efficient businesses like banks and insurance, while Basic Materials may benefit from favorable market prices or efficient operations.


The chart shows that ROCE, an indicator of the efficiency with which capital is employed, varies across sectors. The energy sector has the lowest return, indicating challenges in generating profit from the capital used. In contrast, the basic materials, financial services, and utilities sectors lead with the highest ROCE, showing more effective capital management. The other sectors fall between these extremes, with returns ranging from slightly negative to moderately positive.

In [21]:
aux = remove_outliers_iqr(df_fundamentals_numeric_cols, 'roce')
sector_means_clean_out = aux.groupby(df_fundamentals_book['sector']).mean()
sector_means_clean_out.sort_values(by='roce', ascending=True, inplace=True)

fig_roce = go.Figure()
fig_roce.add_trace(go.Bar(x=sector_means_clean_out.index, y=sector_means_clean_out['roce'], name='ROCE', marker_color='rgb(100, 195, 181)',text=(sector_means_clean_out['roce']*100).apply(lambda x: f'{x:.2f}%'), textposition='outside', textfont=dict(color='white')))
fig_roce.update_layout(height=550, title='Return on Capital Employed (ROCE) by Sector Without Outliers', xaxis_title='Sector', yaxis_title='ROCE (%)', template='plotly_dark')
fig_roce.show()

- The **first analysis (with outliers)** highlights a `wide variation in ROCE values`, from a significantly `negative` return of **-62.01%** in the `Energy` sector to highly positive returns of **15.35%** in `Basic Materials` and **15.13%** in `Financial Services`. This approach emphasizes sectors with extremely inefficient as well as highly efficient capital use.

- In the **second analysis (without outliers)**, outliers are removed, showing a `ROCE` range from **6.39%** in `Real Estate` to **15.21%** in `Basic Materials`. This analysis allows for more direct and clear comparisons between sectors, highlighting a more uniform performance indicative of more consistent operational efficiency.

In summary, the **first analysis** is crucial for understanding volatility and risk in capital usage, while the **second analysis** provides a comparative and balanced view, suitable for evaluations of sector efficiency and consistency. The choice between analyses will depend on the preference for a detailed view including extremes, or for a simplified perspective focused on efficacy.


#### Debt Analysis by Sector

In [22]:
df = pd.DataFrame()
df['total_debt'] = pd.to_numeric(df_fundamentals_book['total_debt'], errors='coerce')
df['ebitda'] = pd.to_numeric(df_fundamentals_book['ebitda'], errors='coerce')


sector_means_debt_ebitda = df.groupby(df_fundamentals_book['sector'])['total_debt'].sum() / df.groupby(df_fundamentals_book['sector'])['ebitda'].sum()
sector_means_debt_ebitda.sort_values(ascending=True, inplace=True)

fig = go.Figure()

fig.add_trace(go.Bar(x=sector_means_debt_ebitda.index, y=sector_means_debt_ebitda, name='Debt/EBITDA Ratio', marker_color='rgb(100, 195, 181)', text=['{:,.2f}x'.format(val) for val in sector_means_debt_ebitda], textposition='outside', textfont=dict(color='white')))
fig.update_layout(title_text='Sector Debt Analysis: Debt/EBITDA Ratio', xaxis_title='Sector', yaxis_title='Debt/EBITDA Ratio (x)', showlegend=False, height=550, template='plotly_dark')

fig.show()

**Debt/EBITDA by Sector**

- **`Energy`**: With a debt/EBITDA ratio of **1.09**, suggests being the least leveraged sector, indicating a lower dependence on debt relative to operational earnings.
- **`Basic Materials`**: A debt/EBITDA ratio of **2.09**, which shows a moderate use of financial leverage in relation to the operational cash generation capacity.
- **`Technology`**: Presents a ratio of **2.28**, which may indicate a healthy balance between the use of debt and the generation of EBITDA.
- **`Utilities`**: With a ratio of **3.50**, this sector may be utilizing a moderate to high level of debt in its capital structure.
- **`Healthcare`**: Records a ratio of **4.34**, possibly reflecting a greater leverage to finance operations and investments in the sector.
- **`Consumer Defensive`**: The ratio of **4.51** suggests a considerable use of debt compared with the EBITDA generated by the sector.
- **`Industrials`**: With a ratio of **5.07**, indicates that the sector may be relying on a relatively high level of debt for its operations.
- **`Communication Services`**: A ratio of **5.27** could demonstrate a significant dependence on debt financing relative to operational profit generation.
- **`Real Estate`**: Has a debt/EBITDA ratio of **5.89**, which is common for the sector given the high investment cost in properties.
- **`Consumer Cyclical`**: With a ratio of **6.07**, may suggest a growth strategy supported by higher levels of debt.
- **`Financial Services`**: Shows an anomalously high ratio of **224.70**, highlighting an extremely leveraged capital structure that is typical for this sector, where debt is a central instrument for banking and financial operations.


The chart presents the debt-to-EBITDA ratio by sector, where sectors such as energy have low leverage (1.09) and financial services have extremely high leverage (224.70), indicating contrasting approaches in debt management and operations.

In [23]:
aux = remove_outliers_iqr(df_fundamentals_book, 'total_debt')
aux = remove_outliers_iqr(aux, 'ebitda')

sector_means_debt_ebitda_clean_out = aux.groupby('sector')['total_debt'].sum() / aux.groupby('sector')['ebitda'].sum()
sector_means_debt_ebitda_clean_out.sort_values(ascending=True, inplace=True)

fig = go.Figure()

fig.add_trace(go.Bar(x=sector_means_debt_ebitda_clean_out.index, y=sector_means_debt_ebitda_clean_out, name='Debt/EBITDA Ratio', marker_color='rgb(100, 195, 181)', text=['{:,.2f}x'.format(val) for val in sector_means_debt_ebitda_clean_out], textposition='outside', textfont=dict(color='white')))
fig.update_layout(title_text='Sector Debt Analysis: Debt/EBITDA Ratio Without Outliers', xaxis_title='Sector', yaxis_title='Debt/EBITDA Ratio (x)', showlegend=False, height=550, template='plotly_dark')

fig.show()

- In the **first analysis (With Outliers)**, we see that most sectors have a relatively moderate Debt/EBITDA ratio, with the `Energy` sector at the lowest with **1.09x** and the majority of other sectors grouped between **2.09x** and **6.07x**. However, the `Financial Services` sector stands out with a markedly higher ratio of **224.70x**, suggesting a level of leverage significantly higher compared to other sectors.

- The **second analysis (Without Outliers)*** of the Debt/EBITDA ratio, without outliers, presents a revised picture of sector leverage. The `Communication Services` sector stands out with an exceptional negative ratio of **-39.82x**, which could imply negative EBITDA or substantial non-operational debt influences. The other sectors show a more standard range, with `Technology` at **2.28x** and `Financial Services` peaking at **35.97x**.

- **Comparison of Analyses**:
    - The negative Debt/EBITDA ratio for `Communication Services` at **-39.82x** is particularly striking and demands further analysis to understand the underlying factors contributing to this anomaly.
    - The `Financial Services` sector exhibits a high ratio of **35.97x**, which, although high, aligns with the typical financial structuring within this sector that often involves substantial leverage.

Both analyses point to the need for a deeper examination of the `Financial Services` and `Communication Services` sectors, but the second analysis, in particular, suggests there may be extraordinary factors or calculation methods that require additional clarification. The choice between these analyses will depend on whether the focus is on a general overview of the sectorial debt profile or on the analysis of leverage that may be atypically high in certain sectors.

In [24]:
df = pd.DataFrame()
df['total_debt'] = pd.to_numeric(df_fundamentals_book['total_debt'], errors='coerce')
df['book_value'] = pd.to_numeric(df_fundamentals_book['book_value'], errors='coerce')

sector_means_debt_equity = df.groupby(df_fundamentals_book['sector'])['total_debt'].sum() / df.groupby(df_fundamentals_book['sector'])['book_value'].sum() /1e9
sector_means_debt_equity.sort_values(ascending=True, inplace=True)

fig = go.Figure()

fig.add_trace(go.Bar(x=sector_means_debt_equity.index, y=sector_means_debt_equity, name='Debt/Equity Ratio', marker_color='rgb(100, 195, 181)', text=['{:,.2f}x'.format(val) for val in sector_means_debt_equity], textposition='outside', textfont=dict(color='white')))
fig.update_layout(title_text='Sector Debt Analysis: Debt/Equity Ratio', xaxis_title='Sector', yaxis_title='Debt/Equity Ratio (x)', height=520, showlegend=False, template='plotly_dark')

fig.show()


**Debt/Equity by Sector**

- **`Energy`**: Possesses a negative value of R$ **-0.35x** which may suggest the sector `has more equity than debt` or that significant `amortization has been applied`.
- **`Consumer Cyclical`**: Shows a debt-to-equity balance of R$ **-0.02x**, indicating `more equity than debt`.
- **`Real Estate`**: Displays **0.01x**, pointing to a `moderate level of debt` in relation to equity.
- **`Healthcare`**: With a figure of **0.35x**, demonstrates a `substantial use of debt` compared to equity.
- **`Technology`**: Carries a balance of **0.38x**, indicating a `considerable reliance on debt`.
- **`Utilities`**: Debt over equity stands at **0.72**, showing `significant financial leverage`.
- **`Basic Materials`**: Has a level of `indebtedness` of **0.74** over equity.
- **`Consumer Defensive`**: Presents **0.96**, signaling `high indebtedness` compared to equity.
- **`Industrials`**: With  **2,45x** suggests a `heavy use of debt`.
- **`Financial Services`**: With **5.24** is one of the sectors with a `high debt-to-equity ratio`.
- **`Communication Services`**: Has the highest balance of **149.42x**, indicating `very high leverage` against equity.

These numbers show the diverse capital structure across sectors, with some sectors operating with much higher debt levels relative to their equity.


In [25]:
aux = remove_outliers_iqr(df_fundamentals_book, 'total_debt')
aux = remove_outliers_iqr(aux, 'book_value')
sector_means_debt_equity_clean_out = aux.groupby('sector')['total_debt'].sum() / aux.groupby('sector')['book_value'].sum() /1e8
sector_means_debt_equity_clean_out.sort_values(ascending=True, inplace=True)

fig = go.Figure()

fig.add_trace(go.Bar(x=sector_means_debt_equity_clean_out.index, y=sector_means_debt_equity_clean_out, name='Debt/EBITDA Ratio', marker_color='rgb(100, 195, 181)', text=['{:,.2f}x'.format(val) for val in sector_means_debt_equity_clean_out], textposition='outside', textfont=dict(color='white')))
fig.update_layout(title_text='Sector Debt Analysis: Debt/Equity Ratio Without Outliers', xaxis_title='Sector', yaxis_title='Debt/Equity Ratio (x)', showlegend=False, height=550, template='plotly_dark')

fig.show()

- In the **first analysis (With Outiliers)**, the `Communication Services` sector has an exceptionally high ratio of **149.42x**, which could suggest a significant dependence on debt financing relative to equity. Other sectors like `Energy` and `Consumer Cyclical` have negative ratios, at **-0.35x** and **-0.02x** respectively, which could indicate more equity than debt or accounting adjustments.

- The **second analysis (Without Outliers)** shows a more standardized view, with `Energy` leading at **5.98x**, followed by `Utilities` and `Financial Services` with **5.46x** and **4.50x** respectively. The removal of outliers provides a cleaner comparison across sectors, highlighting the leverage without the extreme values found in `Communication Services`.

**Key Observations**:
    
- The `Communication Services` sector's Debt/Equity ratio is abnormally high at **149.42x** in the first graph, warranting a closer investigation into the financial strategies and capital structure of companies within this sector.
    
- The negative ratios in sectors like `Energy` and `Consumer Cyclical` in the first graph could reflect specific industry practices or financial strategies that involve higher equity financing or other financial mechanisms.

The comparison between the two graphs underscores the disparities in sectoral capital structures and the importance of context in interpreting these ratios. The choice of graph depends on whether the analysis aims to include all variations for a comprehensive view or to focus on a normalized range for benchmarking purposes.

#### Dividend Distribution by Sector

In [26]:
sector_dividend_rate_avg = df_fundamentals_book.groupby('sector')['dividend_rate'].mean()
sector_dividend_rate_avg.sort_values(ascending=True, inplace=True)

fig = go.Figure()

fig.add_trace(go.Bar(x=sector_dividend_rate_avg.index, y=sector_dividend_rate_avg, name='Dividend Rate', marker_color='rgb(100, 195, 181)', text=sector_dividend_rate_avg.apply(lambda x: f'R${x:,.2f}'), textposition='outside', textfont=dict(color='white')))
fig.update_layout(title_text='Dividend Distribution: Dividend Rate by Sector', xaxis_title='Sector', yaxis_title='Dividend Rate (R$)', height=550, showlegend=False, template='plotly_dark', )

fig.show()

**Average Dividend Rate by Sector**

- **`Healthcare` (R$0.22)**: Lowest average dividend yield, reflecting possible reinvestment of profits into research and development.
- **`Technology` (R$0.33)**: Modest yields, possibly due to a focus on growth and innovation within the sector.
- **`Communication Services` (R$0.41)**: Slightly higher dividends, indicating stability and consistent income flows in the sector.
- **`Non-Cyclical Consumer` (Defensive) (R$0.80)**: Safer yield, typical of sectors with stable demand regardless of economic conditions.
- **`Industrials` (R$0.85)**: Dividends represent a balance between reinvestment and return to shareholders in a diversified sector.
- **`Financial Services` (R$1.05)**: Indicates a traditional profit distribution, aligned with the return expectations of a regulated and stable sector.
- **`Cyclical Consumer ` (R$1.71)**: Attractive dividends, reflecting the ability to generate profits in times of economic upturn.
- **`Energy` (R$1.97)**: Robust yields, suggesting a sector with solid cash flows and generous distribution policies.
- **`Utilities` (Public Services) (R$2.41)**: High dividends, typical of sectors with guaranteed demand and predictable income streams.
- **`Basic Materials` (R$3.44)**: High dividends may reflect commodity price volatility and a desire to attract investors.
- **`Real Estate` (R$381.78)**: Extraordinarily high dividend yield, highlighting the real estate sector as likely including REITs, which are mandated to distribute most of their profits to shareholders.

Based on this data, we can see that the Real Estate sector has an exceptionally high average dividend yield compared to other sectors. This could be the result of a number of factors such as aggressive profit distribution policies, significant capital yields, or a discrepancy between the share price and the dividend paid. However, this value is much higher than those of other sectors and could indicate an atypical situation or an error in the chart since it is unusually higher than the yields of other sectors by a substantial margin.

In [27]:
aux = remove_outliers_iqr(df_fundamentals_book, 'dividend_rate')
sector_dividend_rate_avg_clean_out = aux.groupby('sector')['dividend_rate'].mean()
sector_dividend_rate_avg_clean_out.sort_values(ascending=True, inplace=True)

fig = go.Figure()

fig.add_trace(go.Bar(x=sector_dividend_rate_avg_clean_out.index, y=sector_dividend_rate_avg_clean_out, name='Dividend Rate', marker_color='rgb(100, 195, 181)', text=sector_dividend_rate_avg_clean_out.apply(lambda x: f'R${x:,.2f}'), textposition='outside', textfont=dict(color='white')))
fig.update_layout(title_text='Dividend Distribution: Dividend Rate by Sector Without Outliers', xaxis_title='Sector', yaxis_title='Dividend Rate (R$)', height=550, showlegend=False, template='plotly_dark', )

fig.show()

- In the **first analysis**, the `Real Estate` sector has an exceptionally high dividend rate of **R$ 381.78**, which is significantly higher than any other sector, suggesting a potential outlier or a sector-specific approach to dividend distribution. Other sectors show much lower dividend rates, from **R$ 0.22** in `Healthcare` to **R$ 3.44** in `Basic Materials`.

The **second image without outliers** presents a more normalized view of dividend rates across sectors. The highest rate is in `Basic Materials` at **R$ 0.89%**, followed by `Utilities` at **R$ 0.76%**. The removal of the outlier in the `Real Estate` sector provides a clearer comparison across sectors, suggesting more traditional dividend distribution rates.

**Key Observations**:
- The `Real Estate` sector's high dividend rate of **R$ 381.78** in the first image may reflect real estate investment trusts (REITs) which typically distribute a majority of income as dividends.
- The normalized rates in the second image, such as **R$ 0.89** in `Basic Materials` and **R$ 0.76** in `Utilities`, provide a more typical perspective of dividend policies in these sectors.

These contrasting views from the images indicate that while the `Real Estate` sector can have uniquely high dividend rates due to its business structure, most other sectors adhere to more conservative dividend rates. This highlights the importance of sector-specific analysis when evaluating dividend distribution policies.


In [28]:
sector_dividend_yield_avg = df_fundamentals_book.groupby('sector')['trailing_annual_dividend_yield'].mean() *100
sector_dividend_yield_avg.sort_values(ascending=True, inplace=True)

fig = go.Figure()

fig.add_trace(go.Bar(x=sector_dividend_yield_avg.index, y=sector_dividend_yield_avg, name='Dividend Rate', marker_color='rgb(100, 195, 181)', text=sector_dividend_yield_avg.apply(lambda x: f'{x:,.2f}%'), textposition='outside', textfont=dict(color='white')))
fig.update_layout(title_text='Dividend Distribution: Dividend Yield by Sector', xaxis_title='Sector', yaxis_title='Dividend Yield (%)', height=550, showlegend=False, template='plotly_dark', )

fig.show()

**Average Dividend Yield by Sector**

- **`Healthcare` (1.47%)**: This sector shows the lowest dividend yield, suggesting that companies might be reinvesting earnings rather than distributing them to shareholders.
- **`Real Estate` (1.56%)**: Despite the significant value previously shown in reais, the percentage is low, which might indicate a large variation in dividend amounts or share prices.
- **`Cyclical Consumer` (2.35%)**: The dividend yields are low, which might reflect a reinvestment policy or profit variation with economic swings.
- **`Industrials` (2.37%)**: This sector shows moderate dividend yields, perhaps due to a mix of reinvestment and dividend payments across varied industries.
- **`Consumer Defensive` (Defensive) (2.56%)**: The sector offers a slightly higher yield, possibly due to its stability and less sensitivity to economic fluctuations.
- **`Communication Services` (2.91%)**: The yield is on par with defensive consumer goods, which might indicate consistent cash flows in the sector.
- **`Financial Services `(3.16%)**: The yield reflects the norm for a sector that balances dividend payments with earnings retention for financial stability.
- **`Technology` (4.55%)**: A higher yield in this chart, suggesting the sector may be maturing, with some companies beginning to distribute more profits as dividends.
- **`Utilities` (5.51%)**: Traditionally, this sector is known for higher dividend payments due to the predictable and regulated nature of its cash flows.
- **`Energy` (6.31%)**: This sector presents a robust dividend yield, which might reflect established companies with profitable operations.
- **`Basic Materials` (7.72%)**: The sector with the highest dividend yield on the chart, indicating that companies in this sector may be generating substantial profits and opting to distribute a significant portion to shareholders.

This percentage distribution of dividends provides an alternative view to the one presented earlier, highlighting how dividend yields can vary widely between sectors and the importance of considering both the absolute value and the yield relative to the stock price.


In [29]:
aux = remove_outliers_iqr(df_fundamentals_book, 'dividend_rate')
sector_dividend_yield_avg_clean_out = aux.groupby('sector')['trailing_annual_dividend_yield'].mean() *100
sector_dividend_yield_avg_clean_out.sort_values(ascending=True, inplace=True)

fig = go.Figure()

fig.add_trace(go.Bar(x=sector_dividend_yield_avg_clean_out.index, y=sector_dividend_yield_avg_clean_out, name='Dividend Rate', marker_color='rgb(100, 195, 181)', text=sector_dividend_yield_avg_clean_out.apply(lambda x: f'{x:,.2f}%'), textposition='outside', textfont=dict(color='white')))
fig.update_layout(title_text='Dividend Distribution: Dividend Yield by Sector Without Outliers', xaxis_title='Sector', yaxis_title='Dividend Yield (%)', height=550, showlegend=False, template='plotly_dark', )

fig.show()

- In the **first analysis (With Outliers)**, we observe a wide range of dividend yields across sectors, with `Basic Materials` exhibiting the highest yield of **7.72%**, suggesting a strong return on investment through dividends in this sector. Other sectors such as `Healthcare` and `Real Estate` present lower yields of **1.47%** and **1.56%**, respectively.

The **second image (Without Outliers)** shows a more uniform distribution of dividend yields, with the `Basic Materials` sector still leading, albeit at a lower yield of **5.80%**, followed by `Utilities` with **5.55%**. This analysis indicates a less extreme, but still significant, variation in how sectors distribute dividends.

**Key Observations**:
- The high dividend yield in `Basic Materials` at **7.72%** in the first image may reflect the sector's profitability or a distribution strategy focused on returning value to shareholders.
- The second image's highest yield being **5.80%** in `Basic Materials` represents a more standardized approach to dividend distribution, perhaps more indicative of consistent sector performance without the influence of outliers.

These analysis provide insight into the dividend distribution strategies of various sectors, with the `Basic Materials` sector standing out in both scenarios. The analysis emphasizes the importance of context when evaluating dividend yields and the potential impact of outliers on sector comparisons.


#### Valuation by Sector

In [30]:
sector_pe_avg = df_fundamentals_book.groupby('sector')['trailing_pe'].mean().sort_values()

fig = go.Figure()

fig.add_trace(go.Bar(x=sector_pe_avg.index, y=sector_pe_avg, name='P/E Ratio', marker_color='rgb(100, 195, 181)', text=sector_pe_avg.apply(lambda x: f'R${x:.2f}'), textposition='outside', textfont=dict(color='white')))
fig.update_layout(title_text='Sector Valuation: P/E', xaxis_title='Sector', yaxis_title='P/E Ratio (R$)', height=520, showlegend=False, template='plotly_dark')

fig.show()

**Average P/E Ratio by Sector**

- **`Energy` (R$ 5.21):** This sector has the lowest P/E ratio, suggesting it may be undervalued or that the market expects slower earnings growth in this sector.
- **`Consumer Cyclical` (R$ 7.93):** A relatively low P/E ratio may indicate that stocks are traded at lower prices relative to earnings, possibly due to uncertain growth expectations given their dependency on economic conditions.
- **`Non-Cyclical Consumer` (R$ 8.85):** As this sector is less sensitive to economic swings, the moderate P/E suggests a balanced valuation between growth and risk.
- **`Real Estate` (R$ 9.32):** The P/E in this sector may reflect a mix of steady growth and real estate speculation, depending on interest rates and the economy.
- **`Industrials` (R$ 10.54):** A mid-range P/E suggests the market may expect steady growth, neither overly optimistic nor pessimistic.
- **`Communication Services` (R$ 10.68):** This sector, including media and telecom companies, where the P/E indicates moderate earnings growth expectations.
- **`Financial Services` (R$ 10.72):** A P/E similar to communication services, reflecting moderate growth expectations in a sector influenced by interest rates and regulations.
- **`Utilities` (R$ 12.65):** A higher P/E indicates that stocks may be pricier relative to current earnings, perhaps due to the perceived stability of cash flows in this sector.
- **`Healthcare`(R$ 15.86):** This sector has a higher P/E, suggesting investors are willing to pay more for shares due to expectations of profit growth and innovation in the sector.
- **`Basic Materials` (R$ 20.01):** The relatively high P/E indicates that the sector's stocks might be valued for expected growth or the volatility of commodity prices.
- **`Technology` (R$ 63.04):** With the highest P/E, the technology sector exhibits the greatest expectations for future earnings growth, reflecting market optimism about ongoing innovation and expansion.

P/E ratios provide an overview of how different sectors are valued by the market in terms of earnings growth and risk.


In [31]:
aux = remove_outliers_iqr(df_fundamentals_book, 'trailing_pe')
sector_pe_avg_clean_out = aux.groupby('sector')['trailing_annual_dividend_yield'].mean() *100
sector_pe_avg_clean_out.sort_values(ascending=True, inplace=True)

fig = go.Figure()

fig.add_trace(go.Bar(x=sector_pe_avg_clean_out.index, y=sector_pe_avg_clean_out, name='P/E Ratio', marker_color='rgb(100, 195, 181)', text=sector_pe_avg_clean_out.apply(lambda x: f'R${x:.2f}'), textposition='outside', textfont=dict(color='white')))
fig.update_layout(title_text='Sector Valuation: P/E Without Outliers', xaxis_title='Sector', yaxis_title='P/E Ratio (R$)', height=520, showlegend=False, template='plotly_dark')

fig.show()

- The **first analysis (With Outliers)** portrays the P/E ratios across different sectors, showcasing a stark disparity with the `Technology` sector reaching a P/E ratio of **R$ 63.04**, indicating a high valuation by the market or expectations of strong future earnings growth. Other sectors, such as `Energy` with a P/E of **R$ 5.21**, are valued more modestly by the market.

- The **second image (Without Outliers)** provides a more normalized perspective, with P/E ratios ranging from **R$ 1.61** in `Healthcare` to **R$ 8.47** in `Basic Materials`. This adjusted view removes the exceptionally high P/E ratio seen in the `Technology` sector, likely resulting in a more accurate reflection of the market's valuation across sectors without the influence of extreme valuations.

**Key Observations**:
- The exceptionally high P/E ratio in `Technology` at **R$ 63.04** from the first image might be influenced by a few companies with extraordinarily high valuations or sector growth expectations.
- After removing outliers, the highest P/E ratio is seen in `Basic Materials` at **R$ 8.47**, suggesting a more conservative market valuation.

These images underscore the variability in sector valuations and the impact of outliers on the perceived market performance. The removal of outliers can provide a clearer, more comparative view of the intrinsic value across sectors.

In [32]:
sector_pb_avg = df_fundamentals_book.groupby(['sector'])['price_to_book'].mean().sort_values()

fig = go.Figure()

fig.add_trace(go.Bar(x=sector_pb_avg.index, y=sector_pb_avg, name='P/B Ratio', marker_color='rgb(100, 195, 181)', text=sector_pb_avg.apply(lambda x: f'{x:.2f}x'), textposition='outside', textfont=dict(color='white')))
fig.update_layout(title_text='Sector Valuation: P/B Ratios', xaxis_title='Sector', yaxis_title='P/B Ratio (x)', height=500, showlegend=False, template='plotly_dark')

fig.show()

**Average P/B Ratio by Sector**

- **`Communication Services` (0.70x)**: This sector has the lowest P/B ratio, indicating that the market values it at less than its book value, which could suggest undervaluation or a sector that is out of favor.
- **`Real Estate` (0.90x)**: The P/B ratio is below 1, which might indicate that the assets are valued less in the market than on the books, or there could be skepticism about the value of those assets.
- **`Financial Services` (1.10x)**: Close to 1, suggesting that the market valuation is in line with the book value of the assets.
- **`Consumer Cyclical` (1.25x)**: The market is willing to pay a modest premium over the book value, which might suggest expectations for growth that align with economic cycles.
- **`Energy` (1.28x)**: Slightly higher, indicating a reasonable market premium over book value, reflecting potential growth or recovery in the sector.
- **`Basic Materials` (1.50x)**: A higher P/B ratio could be due to the market pricing in the profitability of raw materials or future growth expectations.
- **`Consumer Defensive` (1.60x)**: This suggests that investors are willing to pay a bit more over the book value, likely due to stable earnings and dividends in this sector.
- **`Healthcare` (2.08x)**: A P/B ratio over 2 indicates a significant premium that investors are willing to pay, likely due to high expectations for future growth in the sector.
- **`Technology` (2.16x)**: Reflecting high growth expectations and possibly a lot of intangible assets that are highly valued, like intellectual property.
- **`Utilities` (8.65x)**: A very high P/B ratio, unusual for what is typically a stable sector; this might indicate that the book values are very low or the market expects significant growth or restructuring.
- **`Industrials` (19.56x)**: The highest P/B ratio, suggesting a very high premium that investors are willing to pay over the book value, which could indicate a strong future outlook or a sector that has many assets undervalued on its books compared to the market value.

These P/B ratios provide a snapshot of how the market values the assets of companies in different sectors, with higher ratios indicating more expensive valuations relative to book value.

In [33]:
aux = remove_outliers_iqr(df_fundamentals_book, 'price_to_book')
sector_pb_avg_clean_out = aux.groupby('sector')['price_to_book'].mean()
sector_pb_avg_clean_out.sort_values(ascending=True, inplace=True)

fig= go.Figure()

fig.add_trace(go.Bar(x=sector_pb_avg_clean_out.index, y=sector_pb_avg_clean_out, name='P/B Ratio', marker_color='rgb(100, 195, 181)', text=sector_pb_avg_clean_out.apply(lambda x: f'{x:.2f}x'), textposition='outside', textfont=dict(color='white')))
fig.update_layout(title_text='Sector Valuation: P/B Ratios Without Outliers', xaxis_title='Sector', yaxis_title='P/B Ratio (x)', height=500, showlegend=False, template='plotly_dark')

fig.show()

The **first analysis(With Outliers)** portrays the P/E ratios across different sectors, showcasing a stark disparity with the `Technology` sector reaching a P/E ratio of **R$ 63.04**, indicating a high valuation by the market or expectations of strong future earnings growth. Other sectors, such as `Energy` with a P/E of **R$ 5.21**, are valued more modestly by the market.

The **second analysis (Without Outliers)** provides a more normalized perspective, with P/E ratios ranging from **R$ 1.61** in `Healthcare` to **R$ 8.47** in `Basic Materials`. This adjusted view removes the exceptionally high P/E ratio seen in the `Technology` sector, likely resulting in a more accurate reflection of the market's valuation across sectors without the influence of extreme valuations.

**Key Observations**:
- The exceptionally high P/E ratio in `Technology` at **R$ 63.04** from the first image might be influenced by a few companies with extraordinarily high valuations or sector growth expectations.
- After removing outliers, the highest P/E ratio is seen in `Basic Materials` at **R$ 8.47**, suggesting a more conservative market valuation.

These Analyses underscore the variability in sector valuations and the impact of outliers on the perceived market performance. The removal of outliers can provide a clearer, more comparative view of the intrinsic value across sectors.

#### Efficiency by Sector

In [34]:
sector_efficiency_roa = df_fundamentals_book.groupby('sector').agg({'return_on_assets': 'mean', 'asset_turnover': 'mean'}).reset_index()
sector_efficiency_roa.sort_values(by='return_on_assets', ascending=True, inplace=True)
sector_efficiency_roa['return_on_assets'] = (sector_efficiency_roa['return_on_assets']*100).apply(lambda x: f'R${x:.2f}')

fig1 = go.Figure()

fig1.add_trace(go.Bar(x=sector_efficiency_roa['sector'], y=sector_efficiency_roa['return_on_assets'], name='ROA', marker_color='lightseagreen', text=sector_efficiency_roa['return_on_assets'], textposition='outside'))
fig1.update_layout(title='Efficiency by Sector: Return on Assets (ROA)', xaxis_title='Sector', yaxis_title='ROA (R$)', xaxis_tickangle=-45,  template='plotly_dark', height=520)

fig1.show()

**Efficiency by Sector: Return on Assets (ROA)**

- **`Communication Services` (R$ -0.43)**: Indicating `no return on assets`, suggesting that the sector, on average, `is not generating profit` from its assets.
- **`Real Estate` (R$ 1.55)**: A `very low ROA`, signifying `minimal profitability` in relation to assets. This could stem from a high asset base with low income or `inefficient asset use`.
- **`Financial Services` (R$ 2.68)**: `Also low`, which may come as a surprise given this sector's focus on asset and capital management. However, it could also reflect the large asset bases that are typical for this sector.
- **`Consumer Cyclical` (R$ 3.36)**: This sector typically `shows variable performance`, `highly dependent` on the `economic cycle`. A low ROA here indicates low profitability relative to the assets employed.
- **`Technology` (R$ 3.39)**: `A moderate ROA`, which might be due to `high asset turnover` or `efficient asset usage` but not particularly high profitability.
- **`Consumer Defensive` (R$ 4.04)**: `Similar to technology`, this sector exhibits a `moderate ROA`, possibly due to stable earnings but relatively high assets.
- **`Healthcare` (R$ 4.27)**: In line with the technology and consumer defensive sectors, `indicating moderate efficiency` in asset utilization.
- **`Energy` (R$ 4.49)**: `A moderate figure` akin to healthcare, potentially indicating that significant investment in assets is required for the returns generated.
- **`Basic Materials` (R$ 5.07)**: `Slightly higher`, suggesting `better profitability per asset` or `more efficient` use of assets.
- **`Industrials` (R$ 5.97)**: `A higher ROA` could suggest `better asset utilization` or a sector that is more profitable on average compared to others.
- **`Utilities` (R$ 6.92)**: `The highest ROA` on this list, indicating this sector is `the most efficient at generating profit` from its assets, which might `reflect regulated returns` on assets or effective management.

The sector ROA figures could inform investment decisions and provide insights into the efficiency of asset use in each sector. It's important to consider these figures in the context of each industry's unique characteristics and economic environment.

In [35]:
aux = remove_outliers_iqr(df_fundamentals_book, 'return_on_assets')
aux = remove_outliers_iqr(aux, 'asset_turnover')

sector_efficiency_roa_clean_out = aux.groupby('sector').agg({'return_on_assets': 'mean', 'asset_turnover': 'mean'}).reset_index()
sector_efficiency_roa_clean_out.sort_values(by='return_on_assets', ascending=True, inplace=True)
sector_efficiency_roa_clean_out['return_on_assets'] = (sector_efficiency_roa_clean_out['return_on_assets']*100).apply(lambda x: f'R${x:.2f}')

fig1 = go.Figure()

fig1.add_trace(go.Bar(x=sector_efficiency_roa_clean_out['sector'], y=sector_efficiency_roa_clean_out['return_on_assets'], name='ROA', marker_color='lightseagreen', text=sector_efficiency_roa_clean_out['return_on_assets'], textposition='outside'))
fig1.update_layout(title='Efficiency by Sector: Return on Assets (ROA) Without Outliers', xaxis_title='Sector', yaxis_title='ROA (R$)', xaxis_tickangle=-45,  template='plotly_dark', height=550)

fig1.show()

- In the **first analysis (With Outliers)**, the `Utilities` sector leads with an ROA of **R$ 6.92**, which suggests high efficiency in asset utilization relative to earnings. In contrast, the `Communication Services` and `Real Estate` sectors are at the lower end with an ROA of **R$ 0.43** each, indicating less efficiency in generating earnings from their assets.

- The **second analysis (Without Outliers)** presents a more even distribution of ROA values. The `Energy` sector tops the chart with an ROA of **R$ 6.97**, closely followed by `Utilities` at **R$ 6.72**. The removal of outliers in this analysis appears to highlight a narrower range of efficiency differences between sectors.

**Key Observations**:
- The leading ROA in the `Utilities` sector at **R$ 6.92** from the first image underscores its effectiveness in asset utilization.
- After adjusting for outliers, the `Energy` sector's ROA of **R$ 6.97** in the second image positions it as the most efficient in utilizing assets to generate earnings.

These analyses shed light on the operational efficiency of various sectors, with particular emphasis on how effectively each sector converts its assets into earnings. The presence or absence of outliers can significantly affect the perception of sector efficiency.

In [36]:
sector_efficiency = df_fundamentals_book.groupby('sector').agg({'return_on_assets': 'mean', 'asset_turnover': 'mean'}).reset_index()
sector_efficiency.sort_values('asset_turnover', ascending=True, inplace=True)
sector_efficiency['asset_turnover'] = (sector_efficiency['asset_turnover']).apply(lambda x: f'R${x:.2f}')

fig2 = go.Figure()
fig2.add_trace(go.Bar(x=sector_efficiency['sector'], y=sector_efficiency['asset_turnover'], name='Asset Turnover', marker_color='lightseagreen', text=sector_efficiency['asset_turnover'], textposition='outside'))
fig2.update_layout(title='Efficiency by Sector: Asset Turnover Ratio', xaxis_title='Sector', yaxis_title='Asset Turnover (R$)', barmode='group', template='plotly_dark', height=600)
fig2.show()

**Efficiency by Sector: Asset Turnover Ratio**

- **`Financial Services` (R$ 2.61)**: Indicates a `heavy asset base` relative to revenues, typical for the sector due to holding significant financial assets.
- **`Communication Services` (R$ 4.26)**: Shows a more `efficient use of assets` to generate revenue than financial services.
- **`Real Estate` (R$ 5.82)**: Reflects the revenue-generating capacity of real estate investments.
- **`Technology `(R$ 6.15)**: Suggests `efficient use of assets` for generating sales revenue.
- **`Utilities `(R$ 6.46)**: Demonstrates a balance between `large infrastructure investments` and `revenue``.
- **`Healthcare `(R$ 16.42)**: Indicates `efficient use of assets` in revenue generation.
- **`Industrials` (R$ 22.73)**: Signifies `high efficiency` in asset turnover, which may point to `lower asset intensity` or `higher sales volume`.
- **`Consumer Defensive `(R$ 27.70)**: Shows `very efficient asset use`, likely due to less capital-intensive operations.
- **`Basic Materials` (R$ 107.97) and `Consumer Cyclical` (R$ 120.19)**: These sectors exhibit `extremely efficient asset turnover`, suggesting either lower reliance on `heavy assets` or `higher sales` volume relative to asset values.
- **`Energy` (R$ 98525.46)**: `An anomalously high ratio`, indicating a `potential outlier` or `error` that necessitates further investigation.

When analyzing asset turnover ratios, it is crucial to take into account sector-specific operational and business model differences. Comparisons should be made against historical sector performance or within the same sector to provide context and understanding.


In [37]:
aux = remove_outliers_iqr(df_fundamentals_book, 'return_on_assets')
aux = remove_outliers_iqr(aux, 'asset_turnover')

sector_efficiency_atr_clean_out = aux.groupby('sector').agg({'return_on_assets': 'mean', 'asset_turnover': 'mean'}).reset_index()
sector_efficiency_atr_clean_out.sort_values('asset_turnover', ascending=True, inplace=True)
sector_efficiency_atr_clean_out['asset_turnover'] = (sector_efficiency_atr_clean_out['asset_turnover']).apply(lambda x: f'R${x:.2f}')

fig = go.Figure()
fig.add_trace(go.Bar(x=sector_efficiency_atr_clean_out['sector'], y=sector_efficiency_atr_clean_out['asset_turnover'], name='Asset Turnover', marker_color='lightseagreen', text=sector_efficiency_atr_clean_out['asset_turnover'], textposition='outside'))
fig.update_layout(title='Efficiency by Sector: Asset Turnover Ratio Without Outliers',xaxis_title='Sector', yaxis_title='Asset Turnover (R$)', xaxis_tickangle=-45, barmode='group', template='plotly_dark', height=600)
fig.show()

- In the **first analysis (With Outliers)**, the Asset Turnover Ratio varies significantly across sectors, with the `Energy` sector showing an exceptionally high ratio of **R$ 98525.46**, which may suggest an error or an outlier due to its abnormality. The `Financial Services` sector has the lowest reported ratio at **R$ 2.61**.

- The **second analysis (Without Outliers)** shows a more normalized range, where the `Energy` sector still leads but with a more reasonable ratio of **R$ 7.30**. This suggests that, when extreme values are excluded, the Energy sector still maintains a high asset turnover compared to others.

**Key Observations**:
- The **R$ 98525.46** ratio in the `Energy` sector from the first image seems infeasibly high and likely skews the sector comparison.
- After removing outliers, the `Energy` sector has a high but plausible ratio of **R$ 7.30**, indicating a more balanced comparison of how efficiently sectors use their assets to generate sales.

These images underscore the importance of considering outliers in financial analysis to avoid misinterpretation of a sector's operational efficiency. The removal of such extreme values can provide a more realistic picture of sector performance.

#### Risk Analysis by Sector

In [38]:
sector_volatility = df_fundamentals_book.groupby('sector')['return_on_assets'].std().reset_index()
sector_volatility['return_on_assets'] *= 100
sector_volatility.sort_values(by='return_on_assets', ascending=True, inplace=True)

fig1 = go.Figure()
fig1.add_trace(go.Bar(x=sector_volatility['sector'], y=sector_volatility['return_on_assets'], name='Volatility', marker_color='lightseagreen', text=sector_volatility['return_on_assets'].apply(lambda x: f'{x:.2f}%'), textposition='outside'))
fig1.update_layout(title='Volatility (Standard Deviation) of ROA by Sector', xaxis_title='Sector', yaxis_title='ROA (R$)', xaxis_tickangle=-45, template='plotly_dark', height=550)
fig1.show()


**Volatility (Standard Deviation) of ROA by Sector**

- **`Technology` (2.76%)**: Exhibits the `lowest volatility, suggesting `stable sector` ROA.
- **`Communication Services`(3.25%)**: `Low volatility`, indicating `stability in asset returns`.
- **`Real Estate` (4.42%)**: Shows `slightly higher volatility`, consistent with the long-term and predictable nature of real estate returns.
- **`Consumer Defensive` (4.65%)**: Often features companies with operations `less affected by economic downturns`, explaining the low to moderate volatility.
- **`Financial Services `(5.38%)**: `Moderate volatility`, reflecting the impact of financial `market fluctuations` and `interest rate changes` on asset returns.
- **`Utilities` (5.63%)**: Given the sector's stable customer base and regulated returns, it has `moderate volatility`.
- **`Industrials `(6.43%)**: The volatility `level is moderate` due to the `cyclical nature` of the sector and `variability in demand`.
- **`Consumer Cyclical` (6.69%)**: A `moderate level of volatility`, `correlating with economic cycle` sensitivity.
- **`Healthcare` (7.51%)**: Shows `moderate to high volatility`, affected by `demand fluctuations`, `regulatory shifts`, and `innovation cycles`.
- **`Basic Materials` (8.19%)**: This `sector's volatility` can be attributed to the `variable commodity` prices and `market conditions`.
- **`Energy` (9.52%)**: The `highest volatility`, reflecting sensitivity to `commodity pricing`, `geopolitical tensions`, and `regulatory shifts`.

Investors consider the volatility of ROA in their assessment of investment stability and risk. Higher volatility in ROA suggests a higher risk profile for the sector, which may warrant a greater expected return to compensate for the increased risk.

In [39]:
aux = remove_outliers_iqr(df_fundamentals_book, 'return_on_assets')
sector_volatility_clean_out = aux.groupby('sector')['return_on_assets'].std().reset_index()
sector_volatility_clean_out['return_on_assets'] *= 100
sector_volatility_clean_out.sort_values(by='return_on_assets', ascending=True, inplace=True)

fig1 = go.Figure()
fig1.add_trace(go.Bar(x=sector_volatility_clean_out['sector'], y=sector_volatility_clean_out['return_on_assets'], name='Volatility', marker_color='lightseagreen', text=sector_volatility_clean_out['return_on_assets'].apply(lambda x: f'{x:.2f}%'), textposition='outside'))
fig1.update_layout(title='Volatility (Standard Deviation) of ROA by Sector Without Outliers', xaxis_title='Sector', yaxis_title='ROA (R$)', xaxis_tickangle=-45, template='plotly_dark', height=550)
fig1.show()


- In the **first analysis (With Outliers)**, the volatility (standard deviation) of Return on Assets (ROA) varies significantly across sectors, with the `Energy` sector displaying an exceptionally high volatility of **9.52%**, which may suggest an error or an outlier due to its abnormality. The `Technology Services` sector has the lowest reported volatility at **2.76%**.

- The **second analysis (Without Outliers)** presents a more normalized range, where the `Energy` sector still leads but with a more reasonable volatility of **7.75%**. This suggests that, when extreme values are excluded, the Energy sector still maintains relatively high volatility compared to others.

**Key Observations**:
- The volatility of **9.52%** in the `Energy` sector from the first image seems abnormally high and likely skews the sector comparison.
- After removing outliers, the `Energy` sector has a high but plausible volatility of **7.75%**, indicating a more balanced comparison of ROA volatility across sectors.

Although the exact volatility values change between the two charts, the relative order of the sectors in terms of volatility remains almost the same. Sectors such as `Energy` and `Basic Materials` continue to be the most volatile, while "Technology" and "Communication Services" are the least volatile.


In [40]:
sector_debt_equity = df_fundamentals_book.groupby('sector')['debt_to_equity'].mean().reset_index()
sector_debt_equity.sort_values(by='debt_to_equity', ascending=True, inplace=True)

fig = go.Figure()
fig.add_trace(go.Bar(x=sector_debt_equity['sector'], y=sector_debt_equity['debt_to_equity'], name='Debt/Equity Ratio', marker_color='lightseagreen', text=sector_debt_equity['debt_to_equity'].apply(lambda x: f'R${x:.2f}'), textposition='outside'))
fig.update_layout(height=550,title='Debt-to-Equity Ratio by Sector', xaxis_title='Sector', yaxis_title='Debt/Equity Ratio (R$)', xaxis_tickangle=-45, template='plotly_dark')
fig.show()


**Debt-to-Equity Ratio by Sector**

- **`Energy` (R$-5.89)**: Suggests `more equity than debt` or the influence of accounting adjustments or special financial structures.
- **`Real Estate` R$ (-2.13)**: Indicates a `propensity for equity financing` or specific financial factors resulting in a negative D/E ratio.
- **`Utilities` (R$ -1.16)**: Unusual for a sector known for heavy infrastructure financed typically through debt.
- **`Consumer Defensive` (R$ -1.07)**: Points towards `more equity financing` or other exceptional factors affecting the D/E ratio.
- **`Healthcare `(R$ -0.94)**: A `negative D/E ratio` which is peculiar for the sector.
- **`Technology` (R$ -0.68)**: May reflect a strong balance sheet with a lesser `dependence on debt financing``.
- **`Industrials` (R$ -0.22)**: The `negative ratio is unusual` and might suggest sector-specific financial structures.
- **`Consumer Cyclical` (R$ 0.59)**: Shows a balanced use of `debt relative to equity`, suggesting a healthy leverage level.
- **`Basic Materials` (R$ 3.33)**: A `higher reliance on debt, indicative of the `capital requirements to `fund operations and `growth.
- **`Communication Services` (R$ 6.46)**: Reflects `significant debt financing, potentially due to the heavy capital expenditures in the sector.
- **`Financial Services` (R$ 28.73)**: `Extremely high D/E ratio, `typical due to the nature of financial operations involving higher debt levels.

Negative D/E ratios are atypical and warrant further investigation to understand the underlying causes, which may include negative shareholder equity or particular accounting practices. Investors should approach these figures with caution and seek to understand the specifics of each sector.


In [41]:
aux = remove_outliers_iqr(df_fundamentals_book, 'debt_to_equity')
sector_debt_equity = aux.groupby('sector')['debt_to_equity'].mean().reset_index()
sector_debt_equity.sort_values(by='debt_to_equity', ascending=True, inplace=True)

fig = go.Figure()
fig.add_trace(go.Bar(x=sector_debt_equity['sector'], y=sector_debt_equity['debt_to_equity'], name='Debt/Equity Ratio', marker_color='lightseagreen', text=sector_debt_equity['debt_to_equity'].apply(lambda x: f'R${x:.2f}'), textposition='outside'))
fig.update_layout(height=550,title='Debt-to-Equity Ratio by Sector Without Outliers', xaxis_title='Sector', yaxis_title='Debt/Equity Ratio (R$)', xaxis_tickangle=-45, template='plotly_dark')
fig.show()


- In the **first analysis**, there is an extreme variation in the debt-to-equity ratio among sectors, with the `Financial Services` sector showing a very high value of **R$28.73**, indicating a level of leverage significantly higher compared to other sectors. Conversely, the `Energy` sector has a negative value of **R$-5.89**, which might suggest a more conservative capitalization or atypical debt data.

- The **second analysis**, which appears to exclude some extreme values, presents a narrower and more uniform range of debt-to-equity ratios across sectors. The negative values across all sectors might indicate a representation of net equity being higher than debt or could be a graphical representation to indicate some form of data correction.

**Key Observations**:
- The extremely high value of **R$28.73** in the `Financial Services` sector in the first chart is likely to skew the comparison between sectors and may result from specific financial leveraging practices of this sector.
- The negative values in both charts are atypical for debt-to-equity ratios and might require further investigation to understand the methodology behind the calculations.
- Data normalization in the second chart suggests an attempt to provide a more balanced comparison of capital structure across sectors, potentially adjusting for outliers or applying some form of standardization to the data.

#### Raking dos setore

In [42]:
df_fundamentals_book['investment_score'] = df_fundamentals_book[['profit_margins', 'operating_margins', 'return_on_equity']].mean(axis=1)
investment_score_mean = df_fundamentals_book.groupby('sector')['investment_score'].mean().sort_values(ascending=True).reset_index()
investment_score_mean['investment_score'] *= 1000

fig = go.Figure()

fig = px.bar(investment_score_mean, x='sector', y='investment_score', text=investment_score_mean['investment_score'].apply(lambda x: f'{x:.0f}'), title='Investment Score by Sector', labels={'investment_score': 'Investment Score', 'sector': 'Setor'}, template='plotly_dark', height=550)
fig.update_traces(textposition='outside', marker_color='lightseagreen')

fig.show()

**Investment Score by Sector**`

The evaluation of sectors varies significantly, with some sectors displaying negative investment scores, while others appear more attractive:

- **`Communication Services`(-178)**: The lowest on the chart, indicating a negative outlook for investments.
- **`Consumer Cyclical`(-162)**: Also negative, suggesting less favorable evaluations for investments.
- **`Healthcare`(21)**: A low positive score, indicating a slightly favorable investment outlook.
- **`Consumer Defensive`(34)**: Slightly better, indicating a modestly favorable view for investment.
- **`Basic Materials`(73)**: Mid-range positive score, suggesting some attractiveness for investment.
- **`Energy`(107)**:Above the midpoint in terms of investment attractiveness.
- **`Technology`(123)**:A higher score, reflecting a more favorable investment perspective.
- **`Industrials`(164)**:Among the more attractive sectors based on this score.
- **`Financial Services`(206)**:Even higher, indicating significant investment appeal.
- **`Real Estate`(256)**:One of the highest scores, suggesting a strong attractiveness for investment.
- **`Utilities`(330)**:The highest score, indicating it as the most attractive sector according to this criterion.


In [43]:
aux = remove_outliers_iqr(df_fundamentals_book, 'profit_margins')
aux = remove_outliers_iqr(aux, 'operating_margins')
aux = remove_outliers_iqr(aux, 'return_on_equity')

aux['investment_score'] = aux[['profit_margins', 'operating_margins', 'return_on_equity']].mean(axis=1)
investment_score_mean = aux.groupby('sector')['investment_score'].mean().sort_values(ascending=True).reset_index()
investment_score_mean['investment_score'] *= 1000

fig = go.Figure()

fig = px.bar(investment_score_mean, x='sector', y='investment_score', text=investment_score_mean['investment_score'].apply(lambda x: f'{x:.0f}'), title='Investment Score by Sector', labels={'investment_score': 'Investment Score', 'sector': 'Setor'}, template='plotly_dark', height=550)
fig.update_traces(textposition='outside', marker_color='lightseagreen')

fig.show()

In the **first analysis (With Outliers)**, investment scores range from significant negative values in sectors like `Communication Services` and `Consumer Cyclical` to extremely high scores in the `Utilities` sector. The `Utilities` sector stands out with an investment score of **330**, while `Communication Services` has the lowest score of **-178**.

The **second analysis (Without Outliers)** displays a range of positive scores across all sectors, indicating a more favorable assessment or an adjusted scoring methodology. Here, `Financial Services` leads with a score of **186**, while `Communication Services` has the lowest score of **55**.

**Key Observations**:

- The removal of outliers in the second chart resulted in a more uniform and positive distribution of scores across all sectors, contrasting with the extreme variation seen in the first chart.
- Significant negative scores in the `Communication Services` and `Consumer Cyclical` sectors in the first chart were mitigated in the second, post-outlier removal, suggesting that some extreme values may have skewed the investment scores in these sectors.
- The `Utilities` sector, which had an exceptionally high score of **330** in the first chart, shows a more moderate and realistic score in the second chart, indicating that outliers may have had a substantial impact on its original score.