# **Exploratory Data Analysis**

* This notebook performs exploratory data analysis (EDA) on the collected data. The goal is to better understand the data we will use to make investment decisions. Run the preceding notebooks first: they collect the data that every step in this notebook depends on.

## **Initial Setup**

### Install Packages

In [1]:
%pip install pandas -q
%pip install plotly -q

Note: you may need to restart the kernel to use updated packages.


### Import libs

In [244]:
import os
import pandas as pd
from pathlib import Path
import plotly.express as px
import plotly.graph_objects as go
import plotly.subplots as sp

### Pandas Config

In [37]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

### Set the Default File Path

In [3]:
file_path_book  = str(Path(os.getcwd()).parent/"data/book")
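Because this notebook depends on files written by the preceding notebooks, it can help to fail early with a clear message when a file is missing. The following is a minimal sketch, assuming the `data/book` layout used above; the helper name `require_file` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def require_file(directory, filename):
    """Return the full path to `filename`, or raise if it has not been generated yet.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    path = Path(directory) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Run the data collection notebooks first."
        )
    return path
```

It could be used as `pd.read_csv(require_file(file_path_book, "fundamentals_book.csv"))`.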

## **Fundamentals**

In [4]:
df_fundamentals_book = pd.read_csv(file_path_book + "/fundamentals_book.csv")
df_fundamentals_book.head(5)

Unnamed: 0,ticker,long_name,sector,industry,market_cap,enterprise_value,total_revenue,profit_margins,operating_margins,dividend_rate,...,earnings_quarterly_growth,revenue_growth,gross_margins,ebitda_margins,return_on_assets,return_on_equity,gross_profits,earnings_growth_rate,dividend_payout_ratio,roi
0,ABCB4.SA,Banco ABC Brasil S.A.,Financial Services,Banks - Regional,4265434000.0,14773390000.0,1941779000.0,0.41576,0.38826,1.56,...,0.001,0.003,0.0,0.0,0.0153,0.1568,1973086000.0,0.1,155000.0,0.131438
1,AGRO3.SA,BrasilAgro - Companhia Brasileira de Proprieda...,Consumer Defensive,Farm Products,2466480000.0,2912933000.0,1249437000.0,0.21493,0.25031,3.21,...,6.801,0.671,0.25252,0.21201,0.03839,0.1217,315504000.0,680.1,47.640053,0.428927
2,RAIL3.SA,Rumo S.A.,Industrials,Railroads,42288820000.0,55243050000.0,10317460000.0,0.07639,0.33544,0.07,...,3.935,0.121,0.34493,0.43834,0.04252,0.05163,3146360000.0,393.5,1.677255,0.186765
3,ALPA3.SA,Alpargatas S.A.,Consumer Cyclical,Footwear & Accessories,5309793000.0,6482982000.0,4022153000.0,-0.05671,-0.06434,0.4,...,0.0,-0.127,0.43246,-5e-05,-0.0091,-0.04153,1968303000.0,0.0,0.0,0.620417
4,ALPA4.SA,Alpargatas S.A.,Consumer Cyclical,Footwear & Accessories,5350758000.0,6395236000.0,4022153000.0,-0.05671,-0.06434,0.43,...,0.0,-0.127,0.43246,-5e-05,-0.0091,-0.04153,1968303000.0,0.0,0.0,0.62893


### Dataset Infos

In [5]:
df_fundamentals_book.shape

(291, 38)

In [6]:
df_fundamentals_book.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 291 entries, 0 to 290
Data columns (total 38 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   ticker                             291 non-null    object 
 1   long_name                          291 non-null    object 
 2   sector                             291 non-null    object 
 3   industry                           291 non-null    object 
 4   market_cap                         291 non-null    float64
 5   enterprise_value                   291 non-null    float64
 6   total_revenue                      291 non-null    float64
 7   profit_margins                     291 non-null    float64
 8   operating_margins                  291 non-null    float64
 9   dividend_rate                      291 non-null    float64
 10  beta                               291 non-null    float64
 11  ebitda                             291 non-null    float64

In [7]:
df_fundamentals_book.describe()

Unnamed: 0,market_cap,enterprise_value,total_revenue,profit_margins,operating_margins,dividend_rate,beta,ebitda,trailing_pe,forward_pe,...,earnings_quarterly_growth,revenue_growth,gross_margins,ebitda_margins,return_on_assets,return_on_equity,gross_profits,earnings_growth_rate,dividend_payout_ratio,roi
count,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,...,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0
mean,17877950000.0,34055330000.0,17218260000.0,0.286119,-0.092924,29.014124,0.744103,4008850000.0,11.956751,4.730074,...,0.406457,0.04099,0.303825,0.372809,0.043399,0.131033,7152668000.0,40.645704,619.736455,1.162924
std,53776190000.0,113855800000.0,57123660000.0,1.559603,1.635946,462.561653,0.426138,24692480000.0,21.802508,8.039842,...,2.792984,0.440188,0.279777,3.718641,0.064184,0.515619,31308200000.0,279.298426,9088.898756,2.947883
min,1257716.0,-6435227000.0,-42424000.0,-2.16207,-14.46084,0.0,-0.19,-1011200000.0,0.0,-10.142858,...,-0.999,-0.983,-0.845,-1.43019,-0.26623,-1.37874,-498165000.0,-99.9,0.0,-8.232338
25%,279944000.0,610828100.0,533344000.0,0.00652,0.03824,0.085,0.464,7221336.0,0.315364,0.0,...,-0.25,-0.14,0.13761,0.023655,0.00854,0.0,9024000.0,-25.0,0.0,0.316042
50%,2451804000.0,4040345000.0,2344667000.0,0.08033,0.1432,0.37,0.68,253517000.0,7.008511,0.0,...,0.0,0.008,0.2703,0.12425,0.03775,0.09427,543362000.0,0.0,0.0,0.630443
75%,9835650000.0,19145860000.0,10597480000.0,0.20718,0.23918,1.255,1.029,1805205000.0,12.451811,7.758718,...,0.0205,0.143,0.448135,0.27106,0.07258,0.183505,3188546000.0,2.05,0.0,1.29337
max,483211900000.0,957440600000.0,581563000000.0,21.75749,1.00816,7891.55,2.014,291087000000.0,220.2381,68.823524,...,27.886,3.683,1.0,63.405792,0.37492,5.29682,334100000000.0,2788.6,155000.0,34.366094


In [8]:
df_fundamentals_book.nunique()

ticker                               290
long_name                            228
sector                                11
industry                              78
market_cap                           290
enterprise_value                     290
total_revenue                        227
profit_margins                       222
operating_margins                    228
dividend_rate                        137
beta                                 211
ebitda                               210
trailing_pe                          222
forward_pe                           148
volume                               197
average_volume                       283
fifty_two_week_low                   272
fifty_two_week_high                  272
price_to_sales_trailing_12_months    290
fifty_day_average                    290
two_hundred_day_average              290
trailing_annual_dividend_rate        125
trailing_annual_dividend_yield       167
book_value                           228
price_to_book   

### Correlations

In [13]:
fundamentals_corr = df_fundamentals_book.select_dtypes(include=["int", "float64"])
fundamentals_corr.columns

Index(['market_cap', 'enterprise_value', 'total_revenue', 'profit_margins',
       'operating_margins', 'dividend_rate', 'beta', 'ebitda', 'trailing_pe',
       'forward_pe', 'volume', 'average_volume', 'fifty_two_week_low',
       'fifty_two_week_high', 'price_to_sales_trailing_12_months',
       'fifty_day_average', 'two_hundred_day_average',
       'trailing_annual_dividend_rate', 'trailing_annual_dividend_yield',
       'book_value', 'price_to_book', 'total_cash', 'total_cash_per_share',
       'total_debt', 'earnings_quarterly_growth', 'revenue_growth',
       'gross_margins', 'ebitda_margins', 'return_on_assets',
       'return_on_equity', 'gross_profits', 'earnings_growth_rate',
       'dividend_payout_ratio', 'roi'],
      dtype='object')

In [50]:
fundamentals_corr = fundamentals_corr.corr()
fundamentals_corr.head()

Unnamed: 0,market_cap,enterprise_value,total_revenue,profit_margins,operating_margins,dividend_rate,beta,ebitda,trailing_pe,forward_pe,volume,average_volume,fifty_two_week_low,fifty_two_week_high,price_to_sales_trailing_12_months,fifty_day_average,two_hundred_day_average,trailing_annual_dividend_rate,trailing_annual_dividend_yield,book_value,price_to_book,total_cash,total_cash_per_share,total_debt,earnings_quarterly_growth,revenue_growth,gross_margins,ebitda_margins,return_on_assets,return_on_equity,gross_profits,earnings_growth_rate,dividend_payout_ratio,roi
market_cap,1.0,0.99352,0.995151,-0.486043,0.282432,-0.544211,0.434085,0.98074,-0.4857,0.07299,0.674715,0.812689,-0.460024,-0.542966,-0.457842,-0.529394,-0.541861,0.073534,0.713831,-0.535116,-0.291563,0.946927,-0.488683,0.96548,-0.20941,-0.264252,-0.668372,-0.403437,-0.010129,-0.242528,0.998928,-0.20941,-0.122413,-0.094169
enterprise_value,0.99352,1.0,0.979461,-0.477923,0.24335,-0.517817,0.442091,0.953759,-0.5024,0.044781,0.656429,0.794691,-0.452061,-0.517409,-0.465208,-0.509402,-0.517245,0.056544,0.666112,-0.511115,-0.283646,0.976992,-0.460577,0.988758,-0.214554,-0.277293,-0.711904,-0.415657,-0.077632,-0.257849,0.990273,-0.214554,-0.101326,-0.074086
total_revenue,0.995151,0.979461,1.0,-0.502994,0.291822,-0.540657,0.448101,0.993553,-0.498818,0.073523,0.68214,0.817622,-0.454748,-0.539198,-0.469913,-0.524688,-0.537953,0.093949,0.741108,-0.531991,-0.283775,0.915481,-0.493368,0.939279,-0.211173,-0.279319,-0.655797,-0.410485,0.007848,-0.255126,0.997416,-0.211173,-0.124045,-0.071783
profit_margins,-0.486043,-0.477923,-0.502994,1.0,-0.677038,0.388493,-0.77944,-0.474408,0.146001,-0.456827,-0.570387,-0.594516,0.56662,0.398807,0.5901,0.463314,0.412971,0.289509,-0.220102,0.417707,-0.0662,-0.448546,0.660711,-0.458939,-0.137644,-0.154737,0.623923,0.422585,0.049439,0.82404,-0.479844,-0.137644,-0.019523,-0.285547
operating_margins,0.282432,0.24335,0.291822,-0.677038,1.0,-0.65456,0.343531,0.30428,0.312514,0.596652,0.36508,0.38908,-0.592914,-0.654944,-0.107935,-0.648954,-0.656107,-0.306032,0.277932,-0.651284,-0.026189,0.172623,-0.801788,0.191538,0.191374,0.579171,-0.079169,0.034325,0.56358,-0.375839,0.267291,0.191374,-0.029061,-0.010527


In [91]:
fig = px.imshow(fundamentals_corr, text_auto=True, aspect="auto", width=3000, height=1000)
fig.show()

#### Scatter Matrix

In [361]:
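# Names of the numeric columns (the correlation matrix is indexed by them).
columns = fundamentals_corr.columns

# Plot the raw observations rather than the correlation coefficients themselves.
fig = px.scatter_matrix(df_fundamentals_book, dimensions=columns, title='Scatter Plots for All Columns')

fig.update_layout(width=5000, height=5000)

fig.show()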


#### <font color="yellow"> **Strong Positive Correlations:** </font> 

**trailing_annual_dividend_rate vs. trailing_annual_dividend_yield**: Correlation of **0.998640**.

* Reason: Both variables are related to annual dividend metrics, with an increase in the dividend rate leading to a higher dividend yield.

* Effect: The high correlation reflects this strong linear relationship.

**dividend_rate vs. trailing_annual_dividend_yield**: Correlation of **0.994015**.

* Reason: Increases in the dividend rate (dividend_rate) tend to result in a higher dividend yield (trailing_annual_dividend_yield).

* Effect: The positive correlation reflects this linear relationship.

**dividend_rate vs. trailing_annual_dividend_rate**: Correlation of **0.994015**.

* Reason: Both variables are directly related to dividend rates.

* Effect: The strong correlation indicates that an increase in the dividend rate (dividend_rate) is directly associated with an increase in the trailing annual dividend rate.

**fifty_two_week_low vs. fifty_two_week_high**: Correlation of **0.988344**.

* Reason: Both variables mark the extremes of the same 52-week price range, so stocks with a higher low also tend to have a higher high.

* Effect: The high correlation reflects this significant positive linear relationship.

**volume vs. fifty_two_week_low**: Correlation of **0.983632**.

* Reason: Increases in trading volume can influence the 52-week low price.

* Effect: The positive correlation reflects the relationship between an increase in trading volume and an increase in the low price.

**total_revenue vs. gross_profits**: Correlation of **0.982982**.

* Reason: Total revenue (total_revenue) and gross profits are directly related to a company's financial performance.

* Effect: The strong correlation shows that an increase in revenue tends to result in higher gross profits.

**total_revenue vs. ebitda**: Correlation of **0.974159**.

* Reason: A company's total revenue (total_revenue) is linked to its earnings before interest, taxes, depreciation, and amortization (EBITDA).

* Effect: The positive correlation reflects the relationship between an increase in revenue and higher EBITDA.

**fifty_two_week_high vs. fifty_day_average**: Correlation of **0.968563**.

* Reason: The 52-week high price and the 50-day average price are related to the recent performance of an asset.

* Effect: The positive correlation indicates that an increase in the 50-day average price can affect the 52-week high price.

**market_cap vs. total_revenue**: Correlation of **0.966357**.

* Reason: A company's market capitalization is related to its financial performance represented by total revenue.

* Effect: Larger companies in terms of market capitalization generally generate more revenue, justifying the positive correlation.

**market_cap vs. enterprise_value**: Correlation of **0.963003**.


 * Reason: A company's market capitalization is related to its enterprise value, considering debt.


 * Effect: The positive correlation indicates that larger companies in terms of market capitalization may also have a higher enterprise value, considering debt.

**fifty_day_average vs. two_hundred_day_average**: Correlation of **0.961965**.


 * Reason: Both moving averages are often used in technical analysis to assess price trends.


 * Effect: The strong correlation suggests that changes in the 50-day average can affect the 200-day average.

**fifty_two_week_high vs. two_hundred_day_average**: Correlation of **0.950189**.


 * Reason: The 52-week high price and the 200-day average are significant indicators in technical analysis.


 * Effect: The correlation shows that the performance of the 52-week high price is linked to the 200-day average.

**ebitda vs. gross_profits**: Correlation of **0.939498**.


 * Reason: Earnings before interest, taxes, depreciation, and amortization (EBITDA) and gross profits are related to a company's profitability.


 * Effect: The positive correlation reflects that an increase in EBITDA generally results in higher gross profits.

**fifty_two_week_low vs. two_hundred_day_average**: Correlation of **0.935803**.


 * Reason: The 52-week low price and the 200-day average are used in technical analysis to identify support and resistance levels.


 * Effect: The positive correlation indicates that the 52-week low price is related to the 200-day average.

**fifty_two_week_low vs. fifty_day_average**: Correlation of **0.933126**.


 * Reason: Both variables are used to assess short-term price trends.


 * Effect: The positive correlation reflects the relationship between the 52-week low price and the 50-day average price.

**dividend_rate vs. trailing_annual_dividend_rate**: Correlation of **0.932857**.


 * Reason: Both variables are related to dividend rates.


 * Effect: The positive correlation suggests that an increase in the dividend rate (dividend_rate) is related to an increase in the trailing annual dividend rate.

**volume vs. fifty_two_week_high**: Correlation of **0.926501**.


 * Reason: Increased trading volume can influence the 52-week high price.


 * Effect: The positive correlation reflects the relationship between an increase in trading volume and an increase in the high price.

**market_cap vs. ebitda**: Correlation of **0.907940**.


 * Reason: A company's market capitalization is related to its earnings before interest, taxes, depreciation, and amortization (EBITDA).


 * Effect: The positive correlation shows that larger companies in terms of market capitalization may also have higher EBITDA.

**total_revenue vs. enterprise_value**: Correlation of **0.881551**.


 * Reason: A company's total revenue (total_revenue) is related to its enterprise value, considering debt.


 * Effect: The positive correlation reflects that companies with higher total revenue may also have a higher enterprise value, considering their financial obligations.

**volume vs. fifty_day_average**: Correlation of **0.852918**.


 * Reason: Increased trading volume can influence the 50-day average price.


 * Effect: The positive correlation shows that an increase in trading volume can affect the 50-day average price.

**average_volume vs. two_hundred_day_average**: Correlation of **0.808796**.


 * Reason: Average trading volume and the 200-day average are used in technical analysis to assess price trends.


 * Effect: The positive correlation suggests that average trading volume is related to the 200-day average.

**total_cash vs. gross_profits**: Correlation of **0.790580**.


 * Reason: Total cash and gross profits are related to a company's liquidity and profitability.


 * Effect: The positive correlation reflects that companies with more cash generally have higher gross profits.

**average_volume vs. fifty_day_average**: Correlation of **0.784489**.


 * Reason: Average trading volume and the 50-day average price are used in technical analysis.


 * Effect: The positive correlation suggests that average trading volume influences the 50-day average price.

**enterprise_value vs. ebitda**: Correlation of **0.775812**.


 * Reason: Enterprise value is related to earnings before interest, taxes, depreciation, and amortization (EBITDA).


 * Effect: The positive correlation indicates that enterprise value is linked to EBITDA.

**average_volume vs. fifty_two_week_low**: Correlation of **0.759226**.


 * Reason: Average trading volume and the 52-week low price are used in technical analysis to assess support and resistance.


 * Effect: The positive correlation suggests that average trading volume is related to the 52-week low price.

**gross_profits vs. gross_margins**: Correlation of **0.755418**.


 * Reason: Gross profits and gross profit margins are related to a company's operational efficiency.


 * Effect: The positive correlation reflects that an increase in gross profits generally results in higher gross profit margins.

**volume vs. two_hundred_day_average**: Correlation of **0.744661**.


 * Reason: Increased trading volume can influence the 200-day average.


 * Effect: The positive correlation suggests that trading volume affects the 200-day average.

**total_revenue vs. total_debt**: Correlation of **0.707458**.


 * Reason: A company's total revenue (total_revenue) is related to its total debt.


 * Effect: The positive correlation indicates that companies with higher total revenue may also have more total debt.

#### <font color="purple"> **Strong Negative Correlations:** </font> 

1. **enterprise_value vs. gross_margins**: Correlation of **-0.711904**.

* Reason: Higher gross margins can lead to lower enterprise values.

* Effect: The negative correlation indicates that companies with higher gross margins tend to have lower enterprise values.

2. **profit_margins vs. beta**: Correlation of **-0.779440**.

* Reason: Higher profit margins can lead to lower beta values.

* Effect: The negative correlation suggests that companies with higher profit margins tend to have lower beta values.

3. **operating_margins vs. total_cash_per_share**: Correlation of **-0.801788**.

* Reason: Companies with lower operating margins may have higher total cash per share.

* Effect: The negative correlation indicates that lower operating margins are associated with higher total cash per share.

4. **beta vs. profit_margins**: Correlation of **-0.779440**.

* Reason: Lower beta values can be associated with higher profit margins.

* Effect: The negative correlation suggests that companies with lower beta values tend to have higher profit margins.

5. **beta vs. fifty_two_week_low**: Correlation of **-0.717153**.

* Reason: Lower beta values are associated with higher 52-week low prices.

* Effect: The negative correlation indicates that companies with lower beta values tend to have higher 52-week low prices.

6. **beta vs. gross_margins**: Correlation of **-0.747430**.

* Reason: Companies with higher gross margins tend to have lower beta values.

* Effect: The negative correlation suggests that higher gross margins are associated with lower beta values.

7. **beta vs. return_on_equity**: Correlation of **-0.820595**.

* Reason: Higher return on equity is associated with lower beta values.

* Effect: The negative correlation indicates that companies with higher return on equity tend to have lower beta values.

8. **fifty_two_week_low vs. beta**: Correlation of **-0.717153**.

* Reason: Lower beta values are associated with higher 52-week low prices.

* Effect: The negative correlation suggests that companies with lower beta values tend to have higher 52-week low prices.

9. **total_cash vs. gross_margins**: Correlation of **-0.767081**.

* Reason: Companies with lower gross margins tend to have higher total cash.

* Effect: The negative correlation indicates that lower gross margins are associated with higher total cash.

10. **total_cash_per_share vs. operating_margins**: Correlation of **-0.801788**.


 * Reason: Companies with lower operating margins tend to have higher total cash per share.


 * Effect: The negative correlation indicates that lower operating margins are associated with higher total cash per share.

11. **total_debt vs. gross_margins**: Correlation of **-0.754986**.


 * Reason: Companies with higher gross margins tend to have lower total debt.


 * Effect: The negative correlation suggests that higher gross margins are associated with lower total debt.

12. **gross_margins vs. enterprise_value**: Correlation of **-0.711904**.


 * Reason: Higher gross margins can lead to lower enterprise values.


 * Effect: The negative correlation indicates that companies with higher gross margins tend to have lower enterprise values.

13. **gross_margins vs. beta**: Correlation of **-0.747430**.


 * Reason: Companies with higher gross margins tend to have lower beta values.


 * Effect: The negative correlation suggests that higher gross margins are associated with lower beta values.

14. **gross_margins vs. total_cash**: Correlation of **-0.767081**.


 * Reason: Companies with lower gross margins tend to have higher total cash.


 * Effect: The negative correlation indicates that lower gross margins are associated with higher total cash.

15. **gross_margins vs. total_debt**: Correlation of **-0.754986**.


 * Reason: Companies with higher gross margins tend to have lower total debt.


 * Effect: The negative correlation suggests that higher gross margins are associated with lower total debt.

16. **return_on_equity vs. beta**: Correlation of **-0.820595**.


 * Reason: Higher return on equity is associated with lower beta values.


 * Effect: The negative correlation indicates that companies with higher return on equity tend to have lower beta values.
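Lists of strongly correlated pairs like the ones above can also be produced programmatically, which avoids listing the same pair twice in mirrored order. The sketch below is one possible approach, not the method used to build the lists in this notebook: the function name `top_correlated_pairs` and the 0.7 threshold are assumptions, and the toy data is made up for illustration.

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(corr, threshold=0.7):
    """List each column pair once, keeping those with |correlation| >= threshold.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    # Keep only the strict upper triangle so (a, b) and (b, a) are not both
    # listed and the diagonal of 1.0 values is dropped.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    pairs = pairs[pairs.abs() >= threshold]
    # Sort from the strongest positive to the strongest negative correlation.
    return pairs.sort_values(ascending=False)


# Toy data for illustration only; not taken from fundamentals_book.csv.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # moves with a
    "c": [4.0, 3.0, 2.0, 1.0],  # moves against a
})
pairs = top_correlated_pairs(toy.corr())
print(pairs)
```

In this notebook the function could be applied to `fundamentals_corr`, the correlation matrix computed earlier.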

### Outliers

In [254]:
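df = df_fundamentals_book.select_dtypes(include=["float64"])
fig = sp.make_subplots(rows=5, cols=8, subplot_titles=df.columns, shared_yaxes=True, horizontal_spacing=0.01, vertical_spacing=0.1)

for row in range(1, 6):
    for col in range(1, 9):
        # Map the (row, col) grid position to a 1-based column index.
        subplot_num = (row - 1) * 8 + col
        # The 5x8 grid may have more cells than there are columns; skip the extras.
        if subplot_num <= len(df.columns):
            col_name = df.columns[subplot_num - 1]
            trace = go.Box(y=df[col_name], name=col_name)
            fig.add_trace(trace, row=row, col=col)

fig.update_layout(title_text="<b>Boxplot: Outliers</b>", height=800, width=2500)

fig.show()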

#### Detailed View of Outliers by Column

**market_cap**: Outliers may indicate companies with market capitalizations significantly larger or smaller than the majority, reflecting the presence of market giants or very small companies.

**enterprise_value**: Outliers may suggest companies with exceptional enterprise values, possibly due to substantial debt or valuable assets.

**total_revenue**: Discrepant values may indicate revenues significantly above or below the average, representing extraordinary variations in financial performance.

**profit_margins and operating_margins**: Outliers in these columns may represent companies with exceptionally high or low profit or operating margins, reflecting notable financial efficiency or inefficiency.

**dividend_rate**: Outliers may indicate the presence or absence of dividends, with extremely high or low values reflecting unusual dividend payment strategies.

**beta**: Outliers may point to stocks with exceptionally high or low betas, reflecting notable volatility or stability relative to the market.

**ebitda**: Discrepant values may represent earnings before interest, taxes, depreciation, and amortization that differ significantly from the average, suggesting extraordinary financial events.

**trailing_pe and forward_pe**: Outliers in these columns may indicate stocks with very high or low price-to-earnings ratios, reflecting possible market deviations.

**volume and average_volume**: Discrepant values may represent stocks traded in volumes much above or below the average, reflecting unusual investor interest.

**fifty_two_week_low and fifty_two_week_high**: Outliers in these columns may indicate stocks that have reached notable price extremes in the past 52 weeks, reflecting volatility or exceptional performance.

**price_to_sales_trailing_12_months**: Discrepant values may suggest stocks priced unusually high or low relative to sales, reflecting market distortions.

**fifty_day_average and two_hundred_day_average**: Outliers in these columns may indicate exceptionally high or low moving averages, reflecting unusual price trends.

**trailing_annual_dividend_rate and trailing_annual_dividend_yield**: Outliers may represent companies paying exceptionally high or low dividends compared to the market average.

**book_value**: Discrepant values may suggest companies with exceptional book values, reflecting the presence of significant assets or high liabilities.

**price_to_book**: Outliers may indicate stocks priced unusually high or low relative to book value.

**total_cash and total_cash_per_share**: Outliers may represent companies with large cash reserves or substantially different cash values compared to the average.

**total_debt**: Discrepant values may suggest substantial debt or the absence of debt, reflecting different financial strategies.

**earnings_quarterly_growth and revenue_growth**: Outliers may indicate exceptional earnings or revenue growth rates, suggesting remarkable performance.

**gross_margins and ebitda_margins**: Discrepant values in these columns may represent significantly high or low gross or EBITDA margins, reflecting operational efficiency or notable inefficiency.

**return_on_assets and return_on_equity**: Outliers may indicate exceptional returns on assets or equity, reflecting remarkable financial performance.

**gross_profits and earnings_growth_rate**: Outliers may represent exceptional gross profits or earnings growth rates.

**dividend_payout_ratio and roi**: Discrepant values in these columns may indicate notable dividend payout policies or returns on investment.


#### Macro View of Outliers in Financial Data

**Company Size:** Outliers in market values, enterprise value, and revenue may reflect large conglomerates in diversified sectors or small startups.

**Financial Efficiency:** Outliers in profit margins and operating margins may indicate exceptionally efficient or inefficient companies in various sectors.

**Dividend Policies:** Outliers in dividend rates may represent distinct dividend payment strategies among companies in different sectors.

**Market Volatility:** Exceptional betas may suggest the unique volatility of stocks in specific sectors.

**Financial Events:** Outliers in metrics such as EBITDA may reflect nonrecurring financial events, such as mergers, acquisitions, or restructurings.

**Investor Interest:** Stock volume and moving averages may be influenced by investor interest in specific sectors.

**Business Cycles:** Sectors with distinct economic cycles may lead to significant variations in financial performance.

**Debt Policies:** Different sectors have varied approaches to debt, resulting in varying debt values.

**Operational Performance:** Outliers in profit growth metrics, margins, and returns may reflect the unique performance of companies in different sectors.
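To put numbers on the outliers discussed above, one option is to count the values that fall outside the common 1.5 × IQR fences for each column. This is a minimal sketch under that assumption; the helper name `iqr_outlier_counts` is hypothetical, the toy data is made up, and the quartiles pandas computes may differ slightly from the ones Plotly uses to draw its boxes.

```python
import pandas as pd


def iqr_outlier_counts(df, k=1.5):
    """Count, per numeric column, the values outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration; k=1.5 is the conventional choice.
    """
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align the per-column fences with the matching columns.
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum().sort_values(ascending=False)


# Toy data for illustration only; not taken from fundamentals_book.csv.
toy = pd.DataFrame({
    "steady": [1.0, 2.0, 3.0, 4.0, 5.0],
    "spiky": [1.0, 2.0, 3.0, 4.0, 100.0],
})
counts = iqr_outlier_counts(toy)
print(counts)
```

Applied to `df_fundamentals_book`, the result would rank the columns by how many companies fall outside the fences.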


### Histograms and Distribution

In [346]:
import plotly.subplots as sp
import plotly.graph_objects as go

df = df_fundamentals_book.select_dtypes(include=["float64"])
fig = sp.make_subplots(rows=5, cols=8, subplot_titles=df.columns, shared_yaxes=True, horizontal_spacing=0.01, vertical_spacing=0.1)

for row in range(1, 6):
    for col in range(1, 9):
        # Map the (row, col) grid position to a 1-based column index.
        subplot_num = (row - 1) * 8 + col
        # Skip grid cells beyond the last column.
        if subplot_num <= len(df.columns):
            col_name = df.columns[subplot_num - 1]
            trace = go.Histogram(x=df[col_name], name=col_name)
            fig.add_trace(trace, row=row, col=col)

fig.update_layout(title_text="<b>Histogram: Distribution</b>", height=800, width=2500)

fig.show()


**market_cap (Market Capitalization):**

- Most companies have a market capitalization between **1.257716e+06** and **4.832232e+10**.

**enterprise_value (Enterprise Value):**

- Most companies have an enterprise value between **-6.435227e+09** and **8.995235e+10**.

**total_revenue (Total Revenue):**

- Most companies have total revenue between **-4.242400e+07** and **5.811812e+10**.

**profit_margins (Profit Margins):**

- Most companies have profit margins between **-2.162070** and **0.229886**.

**operating_margins (Operating Margins):**

- Most companies have operating margins close to **0**.

**dividend_rate (Dividend Rate):**

- Most companies have a low dividend rate (median **0.37**); a few extreme values, up to **7891.55**, stretch the scale.

**beta:**

- Half of the companies have a beta between **0.464** and **1.029**, with a median of **0.68**.

**ebitda:**

- Most companies have an EBITDA between **-1.011200e+09** and **2.910870e+11**.

**trailing_pe (Trailing P/E Ratio):**

- Most companies have a trailing P/E ratio between **0** and **22**.

**forward_pe (Forward P/E Ratio):**

- Most companies have a forward P/E ratio between **-10.142858** and **68.823524**.

**volume:**

- Most companies have trading volumes below **9450820.0**.

**average_volume:**

- Most companies have average trading volumes below **10634197.9**.

**fifty_two_week_low (52-Week Low Price):**

- Most companies have a 52-week low stock price between **0.100** and **24.295**.

**fifty_two_week_high (52-Week High Price):**

- Most companies have a 52-week high stock price between **1.570** and **6530.000**.

**price_to_sales_trailing_12_months (Trailing 12-Month Price/Sales Ratio):**

- Most companies have a Price/Sales ratio between **-1.871929** and **21.706719**.

**fifty_day_average (50-Day Average Price):**

- Most companies have a 50-day average stock price below **71.79748**.

**two_hundred_day_average (200-Day Average Price):**

- Most companies have a 200-day average stock price below **268.05739**.

**trailing_annual_dividend_rate (Trailing Annual Dividend Rate):**

- Most companies have a trailing annual dividend rate below **5.7786**.

**trailing_annual_dividend_yield (Trailing Annual Dividend Yield):**

- Most companies have a trailing annual dividend yield below **0.026773**.

**book_value (Book Value):**

- Most companies have a book value between **-3962.8900** and **21.3875**.

**price_to_book (Price/Book Ratio):**

- Most companies have a Price/Book ratio below **54.66667**.

**total_cash (Total Cash):**

- Most companies have total cash between **6.000** and **3.943251e+10**.

**total_cash_per_share (Total Cash per Share):**

- Most companies have total cash per share below **54.2057**.

**total_debt (Total Debt):**

- Most companies have total debt between **0** and **8.957765e+11**.

**earnings_quarterly_growth (Quarterly Earnings Growth):**

- Most companies have quarterly earnings growth between **-0.9990** and **1.8895**.

**revenue_growth (Revenue Growth):**

- Most companies have revenue growth between **-0.9830** and **3.6830**.

**gross_margins (Gross Margins):**

- Most companies have gross margins close to **0**.

**ebitda_margins (EBITDA Margins):**

- Most companies have EBITDA margins close to **0**.

**return_on_assets (ROA):**

- Most companies have a positive **ROA**.

**return_on_equity (ROE):**

- Most companies have a positive **ROE**.

**gross_profits (Gross Profits):**

- Most companies have gross profits between **-4.981650e+08** and **3.296165e+11**.

**earnings_growth_rate (Earnings Growth Rate):**

- Most companies have an earnings growth rate between **-99.90** and **2788.60**.

**dividend_payout_ratio (Dividend Payout Ratio):**

- Most companies have a dividend payout ratio close to **0**.

**roi (Return on Investment):**

- Most companies have an **ROI** between **-8.232338** and **34.366094**.
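Statements about where "most companies" fall can be backed by quantiles rather than read off the histogram bins. The sketch below computes a central range and median per column; the helper name `central_range` and the choice of the 5th and 95th percentiles are assumptions, and the toy data is made up for illustration.

```python
import pandas as pd


def central_range(df, lower=0.05, upper=0.95):
    """Return the lower quantile, median, and upper quantile of each numeric column.

    Hypothetical helper for illustration; the default bounds are an assumption.
    """
    numeric = df.select_dtypes(include="number")
    # One row per requested quantile, then transpose to one row per column.
    summary = numeric.quantile([lower, 0.5, upper]).T
    summary.columns = ["lower", "median", "upper"]
    return summary


# Toy data for illustration only; not taken from fundamentals_book.csv.
toy = pd.DataFrame({"x": [float(i) for i in range(101)]})
ranges = central_range(toy)
print(ranges)
```

With the defaults, 90% of the values in each column lie between `lower` and `upper`, which is less sensitive to a single extreme company than the minimum and maximum.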


### Sector Analysis

#### Profit Margin Comparison

In [394]:
df = df_fundamentals_book.sort_values(by=['profit_margins'], ascending=True)

# Grouped bars of the three margin metrics for each sector
fig = px.bar(df, x='sector', y=['gross_margins', 'ebitda_margins', 'profit_margins'],
             hover_name="long_name",
             title='Profit Margin Comparison by Sector',
             labels={'value': 'Profit Margin'},
             barmode='group')
fig.show()


#### Market Capitalization Comparison

In [407]:
df = df_fundamentals_book.sort_values(by=["market_cap"], ascending=True)

# Scatter plot of market cap vs. sector
fig = px.scatter(data_frame=df, x="market_cap", y="sector", title="Market Cap Comparison by Sector", hover_name="long_name", color="sector")
fig.show()

#### Total Revenue Comparison

In [418]:
# Scatter plot of total revenue vs. sector

df = df_fundamentals_book.sort_values(by=["total_revenue"], ascending=False)
fig = px.scatter(data_frame=df, y="total_revenue", x="sector", title="Total Revenue Comparison by Sector", color="sector", hover_name="long_name")
fig.show()

#### ROI Comparison

In [424]:
# Scatter plot of ROI vs. sector

df = df_fundamentals_book.sort_values(by=["roi"], ascending=True)
fig = px.scatter(data_frame=df, y="roi", x="sector", title="ROI Comparison by Sector", color="sector", hover_name="long_name")
fig.show()
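The sector charts above can be complemented with a small summary table. The sketch below takes the median of each metric per sector, since medians are less affected by the extreme values seen earlier, and adds the number of companies in each sector. The helper name `sector_summary` is hypothetical and the toy data is made up; only the column names `sector`, `roi`, and `profit_margins` come from the dataset.

```python
import pandas as pd


def sector_summary(df, metrics):
    """Median of each metric per sector, plus the number of companies per sector.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    grouped = df.groupby("sector")
    summary = grouped[metrics].median()
    summary["n_companies"] = grouped.size()
    return summary.sort_values("n_companies", ascending=False)


# Toy data for illustration only; not taken from fundamentals_book.csv.
toy = pd.DataFrame({
    "sector": ["Industrials", "Industrials", "Industrials", "Utilities", "Utilities"],
    "roi": [0.1, 0.2, 0.9, 0.5, 0.7],
    "profit_margins": [0.05, 0.10, 0.15, 0.2, 0.4],
})
summary = sector_summary(toy, ["roi", "profit_margins"])
print(summary)
```

In this notebook it could be called as `sector_summary(df_fundamentals_book, ["roi", "profit_margins", "market_cap"])`.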