# **Exploratory Data Analysis: Dataset**


This Jupyter notebook is the first step in an Exploratory Data Analysis (EDA) of a dataset containing the fundamentals of companies listed on the Brazilian stock exchange. It gives a structured overview of the financial data by examining the characteristics and distribution of the financial indicators through basic descriptive statistics, counts of unique values, and outlier detection. Using `describe()`, `nunique()`, boxplots, correlation matrices, and histograms, the notebook shows the dataset's scope, including variances and patterns in the data, without going into more complex analytical theories or models.

The exploration sets the stage for more in-depth analyses in future notebooks by establishing a clear comprehension of the data's initial state and potential data quality issues.

## Initial Setup

### Install Packages

In [1]:
%pip install pandas -q
%pip install plotly -q

Note: you may need to restart the kernel to use updated packages.


### Import libs

In [2]:
import os
from pathlib import Path
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.subplots as sp

### Pandas Config

In [3]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

### Create a default file path

In [4]:
file_path_book = str(Path(os.getcwd()).parent.parent.parent / "data/book")

## Fundamentals Dataset Infos

### Load data

In [5]:
df_fundamentals_book = pd.read_csv(file_path_book + "/fundamentals_book.csv")
df_fundamentals_book.head(5)

Unnamed: 0,ticker,long_name,sector,industry,market_cap,enterprise_value,total_revenue,profit_margins,operating_margins,dividend_rate,beta,ebitda,trailing_pe,forward_pe,volume,average_volume,fifty_two_week_low,fifty_two_week_high,price_to_sales_trailing_12_months,fifty_day_average,two_hundred_day_average,trailing_annual_dividend_rate,trailing_annual_dividend_yield,book_value,price_to_book,total_cash,total_cash_per_share,total_debt,earnings_quarterly_growth,revenue_growth,gross_margins,ebitda_margins,return_on_assets,return_on_equity,gross_profits,total_assets_approx,asset_turnover,earnings_growth_rate,dividend_payout_ratio,equity,debt_to_equity,roi,roce
0,ABCB4.SA,Banco ABC Brasil S.A.,Financial Services,Banks - Regional,4265434000.0,14773390000.0,1941779000.0,0.41576,0.38826,1.56,0.679,0.0,4.069768,4.706601,92300.0,747165.0,15.85,21.99,2.196663,19.3382,18.14667,1.55,0.080687,24.518,0.785138,7774306000.0,35.162,18298460000.0,0.001,0.003,0.0,0.0,0.0153,0.1568,1973086000.0,7774306000.0,0.249769,0.1,155000.0,-10524160000.0,-1.73871,0.131438,0.0
1,AGRO3.SA,BrasilAgro - Companhia Brasileira de Proprieda...,Consumer Defensive,Farm Products,2466480000.0,2912933000.0,1249437000.0,0.21493,0.25031,3.21,0.432,264892000.0,9.450382,6.332481,298100.0,666692.0,22.29,32.71,1.974073,27.0106,25.58635,3.24,0.132029,22.237,1.11346,383837000.0,3.885,872075000.0,6.801,0.671,0.25252,0.21201,0.03839,0.1217,315504000.0,383837000.0,3.255124,680.1,47.640053,-488238000.0,-1.786168,0.428927,0.079343
2,RAIL3.SA,Rumo S.A.,Industrials,Railroads,42288820000.0,55243050000.0,10317460000.0,0.07639,0.33544,0.07,0.227,4522541000.0,54.309525,21.72381,5733400.0,14644522.0,16.21,24.44,4.098764,22.5852,20.95235,0.066,0.002993,8.334,2.736981,7656040000.0,4.132,21843200000.0,3.935,0.121,0.34493,0.43834,0.04252,0.05163,3146360000.0,7656040000.0,1.347623,393.5,1.677255,-14187160000.0,-1.539646,0.186765,0.070519
3,ALPA3.SA,Alpargatas S.A.,Consumer Cyclical,Footwear & Accessories,5309793000.0,6482982000.0,4022153000.0,-0.05671,-0.06434,0.4,0.571,-198000.0,0.0,0.0,1100.0,3953.0,7.27,17.8,1.320137,8.7146,9.6354,0.0,0.0,7.867,1.008008,414288000.0,0.614,1550341000.0,0.0,-0.127,0.43246,-5e-05,-0.0091,-0.04153,1968303000.0,414288000.0,9.708591,0.0,0.0,-1136053000.0,-1.364673,0.620417,-2.9e-05
4,ALPA4.SA,Alpargatas S.A.,Consumer Cyclical,Footwear & Accessories,5350758000.0,6395236000.0,4022153000.0,-0.05671,-0.06434,0.43,0.571,-198000.0,0.0,14.555555,1132100.0,5605825.0,6.81,22.51,1.330322,8.3228,9.2729,0.0,0.0,7.867,0.99911,414288000.0,0.614,1550341000.0,0.0,-0.127,0.43246,-5e-05,-0.0091,-0.04153,1968303000.0,414288000.0,9.708591,0.0,0.0,-1136053000.0,-1.364673,0.62893,-2.9e-05


In [7]:
# "number" already covers every integer and float dtype, so one selector is enough
df_fundamentals_numeric_cols = df_fundamentals_book.select_dtypes(include="number")
df_fundamentals_numeric_cols.head(5)

Unnamed: 0,market_cap,enterprise_value,total_revenue,profit_margins,operating_margins,dividend_rate,beta,ebitda,trailing_pe,forward_pe,volume,average_volume,fifty_two_week_low,fifty_two_week_high,price_to_sales_trailing_12_months,fifty_day_average,two_hundred_day_average,trailing_annual_dividend_rate,trailing_annual_dividend_yield,book_value,price_to_book,total_cash,total_cash_per_share,total_debt,earnings_quarterly_growth,revenue_growth,gross_margins,ebitda_margins,return_on_assets,return_on_equity,gross_profits,total_assets_approx,asset_turnover,earnings_growth_rate,dividend_payout_ratio,equity,debt_to_equity,roi,roce
0,4265434000.0,14773390000.0,1941779000.0,0.41576,0.38826,1.56,0.679,0.0,4.069768,4.706601,92300.0,747165.0,15.85,21.99,2.196663,19.3382,18.14667,1.55,0.080687,24.518,0.785138,7774306000.0,35.162,18298460000.0,0.001,0.003,0.0,0.0,0.0153,0.1568,1973086000.0,7774306000.0,0.249769,0.1,155000.0,-10524160000.0,-1.73871,0.131438,0.0
1,2466480000.0,2912933000.0,1249437000.0,0.21493,0.25031,3.21,0.432,264892000.0,9.450382,6.332481,298100.0,666692.0,22.29,32.71,1.974073,27.0106,25.58635,3.24,0.132029,22.237,1.11346,383837000.0,3.885,872075000.0,6.801,0.671,0.25252,0.21201,0.03839,0.1217,315504000.0,383837000.0,3.255124,680.1,47.640053,-488238000.0,-1.786168,0.428927,0.079343
2,42288820000.0,55243050000.0,10317460000.0,0.07639,0.33544,0.07,0.227,4522541000.0,54.309525,21.72381,5733400.0,14644522.0,16.21,24.44,4.098764,22.5852,20.95235,0.066,0.002993,8.334,2.736981,7656040000.0,4.132,21843200000.0,3.935,0.121,0.34493,0.43834,0.04252,0.05163,3146360000.0,7656040000.0,1.347623,393.5,1.677255,-14187160000.0,-1.539646,0.186765,0.070519
3,5309793000.0,6482982000.0,4022153000.0,-0.05671,-0.06434,0.4,0.571,-198000.0,0.0,0.0,1100.0,3953.0,7.27,17.8,1.320137,8.7146,9.6354,0.0,0.0,7.867,1.008008,414288000.0,0.614,1550341000.0,0.0,-0.127,0.43246,-5e-05,-0.0091,-0.04153,1968303000.0,414288000.0,9.708591,0.0,0.0,-1136053000.0,-1.364673,0.620417,-2.9e-05
4,5350758000.0,6395236000.0,4022153000.0,-0.05671,-0.06434,0.43,0.571,-198000.0,0.0,14.555555,1132100.0,5605825.0,6.81,22.51,1.330322,8.3228,9.2729,0.0,0.0,7.867,0.99911,414288000.0,0.614,1550341000.0,0.0,-0.127,0.43246,-5e-05,-0.0091,-0.04153,1968303000.0,414288000.0,9.708591,0.0,0.0,-1136053000.0,-1.364673,0.62893,-2.9e-05


### Dataset Infos

#### Data Table

In [7]:
df_fundamentals_book.shape

(291, 43)

- The dataset contains 291 rows, one ticker listing per row, and 43 columns. Each column represents a characteristic of the companies.

In [8]:
df_fundamentals_book.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 291 entries, 0 to 290
Data columns (total 43 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   ticker                             291 non-null    object 
 1   long_name                          291 non-null    object 
 2   sector                             291 non-null    object 
 3   industry                           291 non-null    object 
 4   market_cap                         291 non-null    float64
 5   enterprise_value                   291 non-null    float64
 6   total_revenue                      291 non-null    float64
 7   profit_margins                     291 non-null    float64
 8   operating_margins                  291 non-null    float64
 9   dividend_rate                      291 non-null    float64
 10  beta                               291 non-null    float64
 11  ebitda                             291 non-null    float64

- The dataset contains numerical data typed as float64 and string data typed as objects.

In [9]:
df_fundamentals_book.describe()

Unnamed: 0,market_cap,enterprise_value,total_revenue,profit_margins,operating_margins,dividend_rate,beta,ebitda,trailing_pe,forward_pe,volume,average_volume,fifty_two_week_low,fifty_two_week_high,price_to_sales_trailing_12_months,fifty_day_average,two_hundred_day_average,trailing_annual_dividend_rate,trailing_annual_dividend_yield,book_value,price_to_book,total_cash,total_cash_per_share,total_debt,earnings_quarterly_growth,revenue_growth,gross_margins,ebitda_margins,return_on_assets,return_on_equity,gross_profits,total_assets_approx,asset_turnover,earnings_growth_rate,dividend_payout_ratio,equity,debt_to_equity,roi,roce
count,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0
mean,17877950000.0,34055330000.0,17218260000.0,0.286119,-0.092924,29.014124,0.744103,4008850000.0,11.956751,4.730074,1397870.0,3309691.0,16.10192,49.003587,3.227017,23.392806,29.528437,1.023897,0.035842,-0.777076,5.480571,9627440000.0,14.046144,25865840000.0,0.406457,0.04099,0.303825,0.372809,0.043399,0.131033,7152668000.0,9627440000.0,3088.299716,40.645704,619.736455,-16238400000.0,3.85888,1.162924,0.067012
std,53776190000.0,113855800000.0,57123660000.0,1.559603,1.635946,462.561653,0.426138,24692480000.0,21.802508,8.039842,6385571.0,9698547.0,22.40023,382.659743,16.15204,48.4245,157.775895,3.804884,0.053032,365.198328,39.448676,44073360000.0,47.094185,109287400000.0,2.792984,0.440188,0.279777,3.718641,0.064184,0.515619,31308200000.0,44073360000.0,51974.669936,279.298426,9088.898756,73240770000.0,44.573718,2.947883,0.511509
min,1257716.0,-6435227000.0,-42424000.0,-2.16207,-14.46084,0.0,-0.19,-1011200000.0,0.0,-10.142858,0.0,0.0,0.1,1.57,-1.871929,0.6696,1.0888,0.0,0.0,-3962.89,0.0,6000.0,0.0,0.0,-0.999,-0.983,-0.845,-1.43019,-0.26623,-1.37874,-498165000.0,5929.602,-4.214609,-99.9,0.0,-813710600000.0,-43.154525,-8.232338,-6.593215
25%,279944000.0,610828100.0,533344000.0,0.00652,0.03824,0.085,0.464,7221336.0,0.315364,0.0,200.0,3320.5,4.015,8.94,0.339396,6.0067,6.008975,0.0,0.0,4.655,0.579816,99261000.0,1.095,230108000.0,-0.25,-0.14,0.13761,0.023655,0.00854,0.0,9024000.0,99261010.0,2.403518,-25.0,0.0,-6422202000.0,-1.454222,0.316042,0.012705
50%,2451804000.0,4040345000.0,2344667000.0,0.08033,0.1432,0.37,0.68,253517000.0,7.008511,0.0,29000.0,129168.0,9.08,16.69,0.830885,12.5082,12.0167,0.149,0.014087,9.92,0.968048,686056000.0,3.383,1625717000.0,0.0,0.008,0.2703,0.12425,0.03775,0.09427,543362000.0,686056000.0,5.04258,0.0,0.0,-1021396000.0,-1.18464,0.630443,0.096003
75%,9835650000.0,19145860000.0,10597480000.0,0.20718,0.23918,1.255,1.029,1805205000.0,12.451811,7.758718,635200.0,2251468.0,20.74,33.775,1.885794,26.4887,25.906325,0.754,0.049804,21.277,1.812316,2513393000.0,7.8455,10925280000.0,0.0205,0.143,0.448135,0.27106,0.07258,0.183505,3188546000.0,2513393000.0,9.667814,2.05,0.0,-1298000.0,-1.000198,1.29337,0.163731
max,483211900000.0,957440600000.0,581563000000.0,21.75749,1.00816,7891.55,2.014,291087000000.0,220.2381,68.823524,94508200.0,106342000.0,242.05,6530.0,233.91455,711.9484,2670.7747,57.786,0.267729,4005.665,546.6667,394325000000.0,542.057,895776500000.0,27.886,3.683,1.0,63.405792,0.37492,5.29682,334100000000.0,394325000000.0,886654.94109,2788.6,155000.0,10052270000.0,521.855886,34.366094,2.561582


In [10]:
df_fundamentals_book.nunique()

ticker                               290
long_name                            228
sector                                11
industry                              78
market_cap                           290
enterprise_value                     290
total_revenue                        227
profit_margins                       222
operating_margins                    228
dividend_rate                        137
beta                                 211
ebitda                               210
trailing_pe                          222
forward_pe                           148
volume                               197
average_volume                       283
fifty_two_week_low                   272
fifty_two_week_high                  272
price_to_sales_trailing_12_months    290
fifty_day_average                    290
two_hundred_day_average              290
trailing_annual_dividend_rate        125
trailing_annual_dividend_yield       167
book_value                           228
price_to_book   

- Comparing the unique counts with the 291 rows, some points stand out. There are 290 unique tickers, so one ticker appears twice and should be checked as a possible duplicate row. There are only 228 unique company names, which means that some companies have more than one ticker, for example Petrobras (PETR3.SA, PETR4.SA) and Itaú (ITUB3.SA, ITUB4.SA). Finally, these companies are spread over just 11 sectors and 78 industries.
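The share-class observation can be checked directly rather than inferred from the counts. The following is a minimal sketch on toy data; the helper names `tickers_per_company` and `duplicated_tickers` and the shortened company names are illustrative assumptions, not part of the notebook.

```python
import pandas as pd


def tickers_per_company(df: pd.DataFrame) -> pd.Series:
    """Count the distinct tickers listed under each company name."""
    return (
        df.groupby("long_name")["ticker"]
        .nunique()
        .sort_values(ascending=False)
    )


def duplicated_tickers(df: pd.DataFrame) -> list:
    """Return the tickers that appear on more than one row."""
    repeated = df.loc[df["ticker"].duplicated(keep=False), "ticker"]
    return sorted(repeated.unique().tolist())


# Toy rows (hypothetical): two companies with two share classes each,
# plus one ticker that was accidentally loaded twice.
toy = pd.DataFrame(
    {
        "ticker": ["PETR3.SA", "PETR4.SA", "ITUB3.SA", "ITUB4.SA", "RAIL3.SA", "RAIL3.SA"],
        "long_name": ["Petrobras", "Petrobras", "Itaú", "Itaú", "Rumo S.A.", "Rumo S.A."],
    }
)

counts = tickers_per_company(toy)
dupes = duplicated_tickers(toy)
print(counts)
print(dupes)
```

Applied to `df_fundamentals_book`, the first helper would list the companies with several tickers and the second would name the ticker behind the 290-versus-291 gap.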

#### Coefficient of variation

In [50]:
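# NOTE: this ratio is mean / std, the reciprocal of the conventional
# coefficient of variation (std / mean). The percentages discussed below come from this ratio.
df_fundamentals_book_cvs = ((df_fundamentals_numeric_cols.mean() / df_fundamentals_numeric_cols.std())*100).sort_values(ascending=True).reset_index()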
df_fundamentals_book_cvs.columns = ['column', 'coefficient_variation']

fig = px.bar(df_fundamentals_book_cvs, title='Coefficient of Variation (%)', x='column', y='coefficient_variation', color_discrete_sequence=['rgb(100, 195, 181)'], hover_name='column', height=600)

fig.update_traces(text=[f'{x:.0f}%' for x in df_fundamentals_book_cvs['coefficient_variation']], textposition='outside')
fig.update_layout(xaxis_title='Columns', yaxis_title='Coefficient of Variation (%)', template='plotly_dark', font=dict(color='white'))

fig.show()

The bar chart shows, for each numeric column, the ratio computed in the cell above. That cell divides the mean by the standard deviation, which is the reciprocal of the conventional coefficient of variation (CV = standard deviation / mean). As plotted, a taller bar therefore means less dispersion relative to the mean, not more. Because the dataset has one row per ticker, the dispersion is across companies, not over time. With that in mind:

- **Largest ratios**: `beta` and `gross_margins` have the highest values, at **175%** and **109%**, respectively. These are the columns whose values are most tightly clustered around their mean; the corresponding conventional CVs, taken from the `describe()` output, are about 57% and 92%.

- **Intermediate ratios**: Recomputing mean/std from the `describe()` output, columns such as `fifty_day_average`, `trailing_pe`, `forward_pe`, `return_on_assets`, and `fifty_two_week_low` fall between roughly **48%** and **72%**. Columns dominated by a few very large values, such as `fifty_two_week_high`, `price_to_sales_trailing_12_months`, and `volume`, sit lower, at roughly 13% to 22%.

- **Negative ratios**: `equity`, `operating_margins`, and `book_value` show negative values. This is not a computation error: the `describe()` output shows that the mean of each of these columns is negative, and the sign of the ratio follows the sign of the mean.

In terms of comparative analysis, a high conventional CV (a short bar in this chart) indicates that a metric differs widely from company to company relative to its typical level, so the average of that metric is less representative. Metrics with a low CV are more homogeneous across the sample.

The CV is also unreliable when the mean is close to zero or when a column mixes positive and negative values, as several margin and growth columns here do. For those columns, the boxplots and histograms later in the notebook are a better guide to dispersion.
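For reference, the conventional coefficient of variation divides the standard deviation by the mean, which is the reciprocal of the ratio computed in the cell above. The following is a minimal sketch on toy data; the function name `coefficient_of_variation`, the use of the absolute mean, and the choice to return NaN for a zero mean are assumptions made for illustration.

```python
import pandas as pd


def coefficient_of_variation(df: pd.DataFrame) -> pd.Series:
    """Conventional CV in percent: sample std divided by |mean|, per numeric column."""
    numeric = df.select_dtypes(include="number")
    mean = numeric.mean()
    std = numeric.std()  # sample standard deviation (ddof=1), as in describe()
    cv = std / mean.abs() * 100
    # The CV is undefined when the mean is zero, so mask those columns as NaN
    return cv.where(mean != 0).sort_values(ascending=False)


# Toy data (hypothetical values)
toy = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0],        # mean 2, std 1
        "b": [-10.0, -20.0, -30.0],  # negative mean, same relative spread as "a"
        "c": [-1.0, 0.0, 1.0],       # mean 0, CV undefined
        "d": [10.0, 10.0, 10.0],     # no dispersion
    }
)

cv = coefficient_of_variation(toy)
print(cv)
```

Using the absolute mean keeps the result non-negative, so columns with a negative mean rank by the size of their dispersion instead of falling to the bottom of the chart.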

#### Correlations

In [11]:
fundamentals_corr = df_fundamentals_numeric_cols.corr()
fundamentals_corr.head(5)

Unnamed: 0,market_cap,enterprise_value,total_revenue,profit_margins,operating_margins,dividend_rate,beta,ebitda,trailing_pe,forward_pe,volume,average_volume,fifty_two_week_low,fifty_two_week_high,price_to_sales_trailing_12_months,fifty_day_average,two_hundred_day_average,trailing_annual_dividend_rate,trailing_annual_dividend_yield,book_value,price_to_book,total_cash,total_cash_per_share,total_debt,earnings_quarterly_growth,revenue_growth,gross_margins,ebitda_margins,return_on_assets,return_on_equity,gross_profits,total_assets_approx,asset_turnover,earnings_growth_rate,dividend_payout_ratio,equity,debt_to_equity,roi,roce
market_cap,1.0,0.84367,0.821591,-0.003128,0.079604,-0.018939,0.019453,0.788431,-0.01183,0.08889,0.203687,0.385409,0.088372,-0.011959,-0.015582,0.036404,-0.003787,0.150759,0.296205,0.019246,-0.022667,0.588672,-0.0081,0.631356,0.021919,-0.063475,0.007572,-0.006035,0.14717,0.060959,0.928809,0.588672,-0.01962,0.021919,-0.0091,-0.587849,0.186278,-0.065699,0.047566
enterprise_value,0.84367,1.0,0.675644,-0.010525,0.069891,-0.016805,-0.000411,0.531132,-0.031284,0.040925,0.166592,0.323804,0.066398,-0.011351,-0.023906,0.028148,-0.004601,0.114904,0.224523,0.017487,-0.024948,0.717299,0.010304,0.909718,0.007621,-0.059935,-0.080652,-0.012526,0.053299,0.03739,0.829091,0.717299,-0.017718,0.007621,0.006728,-0.925808,0.007676,-0.066926,0.024682
total_revenue,0.821591,0.675644,1.0,-0.028594,0.057417,-0.016661,0.060641,0.88044,-0.043862,0.067601,0.206989,0.361693,0.052057,-0.012243,-0.040985,0.020596,-0.00769,0.162399,0.334236,0.016996,-0.022205,0.295011,-0.024416,0.408667,0.006072,-0.095971,-0.02999,-0.011383,0.131854,0.026344,0.875694,0.295011,-0.012492,0.006072,-0.01384,-0.432273,0.00708,0.00433,0.068016
profit_margins,-0.003128,-0.010525,-0.028594,1.0,-0.31196,-0.01469,-0.248073,-0.005791,-0.011816,-0.06473,-0.029118,-0.033683,0.173645,-0.006242,0.391527,0.068329,0.010066,0.029786,-0.0263,0.053952,-0.029206,-0.007916,0.541173,-0.011993,-0.003671,0.001915,0.318591,0.133501,-0.051338,0.396223,-0.009257,-0.007916,-0.01764,-0.003671,0.005089,0.013132,0.000882,-0.05434,-0.219884
operating_margins,0.079604,0.069891,0.057417,-0.31196,1.0,-0.072953,0.004847,0.041584,0.097417,0.104125,0.033193,0.057819,0.010835,-0.068402,-0.028166,-0.031043,-0.063439,0.081445,0.147257,-0.04064,0.014484,0.051835,-0.263964,0.055215,0.025746,0.222719,0.214595,0.011734,0.282369,-0.077433,0.056205,0.051835,-0.003603,0.025746,0.020007,-0.051198,0.027138,0.005628,0.219734


In [12]:
color_scale = [[0, 'rgb(150, 245, 231)'], [0.5, 'rgb(100, 195, 181)'], [1, 'rgb(50, 145, 131)']]
fig = px.imshow(fundamentals_corr, text_auto=True, aspect="auto", width=3000, height=1000, template="plotly_dark", title="Heatmap: Correlation between the variables", color_continuous_scale=color_scale)
fig.show()

##### Dispersion Matrix

In [None]:
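# Plot the observations themselves, not the correlation matrix, so each point is one ticker
fig = px.scatter_matrix(df_fundamentals_numeric_cols, dimensions=df_fundamentals_numeric_cols.columns, title='Scatter Plots for All Columns', template="plotly_dark")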
fig.update_traces(marker=dict(color='rgb(100, 195, 181)'))

fig.update_layout(width=4000, height=8000, grid=dict(xgap=0.1, ygap=0.1))

fig.show()

##### Strong positive correlation

- **`total_cash`** & **`total_assets_approx`**: Correlation of **1.0**. In the sample rows and in the `describe()` output the two columns hold identical values, so `total_assets_approx` appears to be derived directly from `total_cash` rather than measured independently. The perfect correlation reflects that redundancy and carries no additional information.

- **`earnings_growth_rate`** & **`earnings_quarterly_growth`**: Correlation of **1.0**. In the sample rows, `earnings_growth_rate` equals `earnings_quarterly_growth` multiplied by 100 (for example, 6.801 and 680.1), so one column is a rescaled copy of the other.

- **`fifty_two_week_high`** & **`dividend_rate`**: Correlation of **0.996772**, a nearly perfect positive relationship. Both columns have extreme maxima in the `describe()` output (6530.0 and 7891.55, against 75th percentiles of 33.775 and 1.255), so this value is probably driven by a few extreme rows and should be rechecked once outliers are handled.

- **`fifty_two_week_high`** & **`two_hundred_day_average`**: Correlation of **0.995925**, indicating that stocks at their annual highs typically also have high two hundred day averages, suggesting sustained stock performance.

- **`dividend_rate`** & **`two_hundred_day_average`**: Correlation of **0.985873**, showing that stocks with higher dividend rates usually have higher long-term average prices, potentially indicating stable, profitable companies.

- **`market_cap`** & **`gross_profits`**: Correlation of **0.928809**, indicating that companies with larger market capitalizations tend to have higher gross profits, suggesting a scale effect in profitability.

- **`fifty_day_average`** & **`two_hundred_day_average`**: Correlation of **0.918297**, suggesting a strong relationship between the short-term and medium-term average stock prices, indicating consistent stock performance.

- **`total_debt`** & **`enterprise_value`**: Correlation of **0.909718**, suggesting that companies with higher debt also tend to have higher enterprise values, which might indicate debt as a significant factor in the overall valuation.

- **`volume`** & **`average_volume`**: Correlation of **0.902085**, indicating that trading volumes are consistent with their average values, suggesting stable trading interest.

- **`ebitda`** & **`gross_profits`**: Correlation of **0.893112**, showing a strong positive relationship, suggesting that operational earnings are closely linked to overall profitability.

- **`total_debt`** & **`total_assets_approx`**: Correlation of **0.884635**, indicating that companies with higher debt levels also tend to have higher total assets, possibly reflecting asset-backed borrowing.

- **`ebitda`** & **`total_revenue`**: Correlation of **0.880440**, suggesting that companies with higher earnings before interest, taxes, depreciation, and amortization also report higher revenues, indicating operational efficiency.

- **`fifty_two_week_high`** & **`fifty_day_average`**: Correlation of **0.879936**, indicating that stocks with a higher fifty-day average tend to reach their annual highs, suggesting positive short-term momentum.

- **`total_revenue`** & **`gross_profits`**: Correlation of **0.875694**, indicating that companies with higher revenues also have higher gross profits, underlining the importance of sales volume in profitability.

- **`market_cap`** & **`enterprise_value`**: Correlation of **0.843670**, suggesting that market capitalization is strongly related to enterprise value, indicating that the market valuation reflects the company's total worth including debt and cash.

- **`dividend_rate`** & **`fifty_day_average`**: Correlation of **0.840190**, showing that stocks with higher dividend rates also tend to have higher short-term average prices, suggesting a reward for consistent performance.

- **`ebitda_margins`** & **`price_to_sales_trailing_12_months`**: Correlation of **0.837484**, indicating that companies with higher EBITDA margins tend to have higher price-to-sales ratios, suggesting that margins are a key factor in sales-based valuations.

- **`gross_profits`** & **`enterprise_value`**: Correlation of **0.829091**, suggesting that companies with higher gross profits also tend to have higher enterprise values, implying that profitability is a key component of company valuation.

- **`total_revenue`** & **`market_cap`**: Correlation of **0.821591**, indicating that companies with higher revenues generally have higher market capitalizations, suggesting that revenue is a primary driver of market valuation.

- **`ebitda`** & **`market_cap`**: Correlation of **0.788431**, showing that companies with higher operational earnings (EBITDA) also tend to have higher market capitalizations, indicating that operational efficiency translates to market value.

- **`fifty_two_week_low`** & **`fifty_day_average`**: Correlation of **0.754229**, suggesting that there is a relationship between the fifty-day average price of a stock and its annual low, which might indicate recovery or decline trends.

- **`enterprise_value`** & **`total_assets_approx`**: Correlation of **0.717299**, indicating that there is a positive relationship between the value of a company including its debt and cash and its total approximate assets.

- **`total_cash`** & **`enterprise_value`**: Correlation of **0.717299**, indicating that companies with higher cash reserves tend to have higher enterprise values, suggesting that liquidity is an important factor in a company's total valuation.
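A list like the one above can be extracted from the correlation matrix instead of being read off the heatmap. The following is a minimal sketch on toy data; the function name `strong_pairs` and the 0.7 threshold are illustrative assumptions, not values taken from the notebook.

```python
import numpy as np
import pandas as pd


def strong_pairs(corr: pd.DataFrame, threshold: float = 0.7) -> pd.DataFrame:
    """List each variable pair once, keeping those with |correlation| >= threshold."""
    # Keep only the strict upper triangle: drops the diagonal and mirrored duplicates
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = (
        corr.where(upper)
        .stack()
        .dropna()
        .rename_axis(["var_a", "var_b"])
        .reset_index(name="corr")
    )
    pairs = pairs[pairs["corr"].abs() >= threshold]
    # Strongest relationships first, regardless of sign
    return pairs.sort_values("corr", key=lambda s: s.abs(), ascending=False).reset_index(drop=True)


# Toy data (hypothetical): y is a rescaled copy of x, z moves against x, w is unrelated
toy = pd.DataFrame(
    {
        "x": [1, 2, 3, 4, 5],
        "y": [2, 4, 6, 8, 10],
        "z": [5, 4, 3, 2, 1],
        "w": [1, -1, 1, -1, 1],
    }
)

result = strong_pairs(toy.corr())
print(result)
```

Calling `strong_pairs(fundamentals_corr)` would return positive and negative pairs together, which also makes duplicated or derived columns (correlation of exactly 1.0) easy to spot.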


##### Strong negative correlation

Only the first five rows of the correlation matrix are printed above, so this list is limited to the negative pairs visible there. All of them involve `equity`, which in the sample rows equals `total_assets_approx` minus `total_debt` (for example, 383837000 - 872075000 = -488238000).

- **`enterprise_value`** & **`equity`**: Correlation of **-0.925808**. Since `equity` falls as `total_debt` rises, this largely mirrors the positive correlation of **0.909718** between `total_debt` and `enterprise_value`.

- **`market_cap`** & **`equity`**: Correlation of **-0.587849**, indicating that larger companies by market capitalization tend to show more negative values in this `equity` column, consistent with their larger absolute debt.

- **`total_revenue`** & **`equity`**: Correlation of **-0.432273**, a moderate rather than strong negative relationship that follows the same pattern.


#### Outliers

In [None]:
df = df_fundamentals_numeric_cols

fig = sp.make_subplots(rows=5, cols=8, subplot_titles=df.columns, shared_yaxes=True, horizontal_spacing=0.01, vertical_spacing=0.1)

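for row in range(1, 6):
    for col in range(1, 9):
        # Map the (row, col) grid position to a column index so every column is drawn once
        subplot_num = (row - 1) * 8 + col
        if subplot_num <= len(df.columns):
            col_name = df.columns[subplot_num - 1]
            trace = go.Box(y=df[col_name], name=col_name, marker=dict(color='rgb(100, 195, 181)'))
            fig.add_trace(trace, row=row, col=col)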

fig.update_layout(title_text="Boxplot: Outliers", height=800, width=2500, template="plotly_dark")
fig.show()

##### Detailed View of Outliers by Column

**`market_cap`**: Outliers may indicate companies with market capitalizations significantly larger or smaller than the majority, reflecting the presence of market giants or very small companies.

**`enterprise_value`**: Outliers may suggest companies with exceptional enterprise values, possibly due to substantial debt or valuable assets.

**`total_revenue`**: Discrepant values may indicate revenues significantly above or below the average, representing extraordinary variations in financial performance.

**`profit_margins` and `operating_margins`**: Outliers in these columns may represent companies with exceptionally high or low profit or operating margins, reflecting notable financial efficiency or inefficiency.

**`dividend_rate`**: Outliers may indicate the presence or absence of dividends, with extremely high or low values reflecting unusual dividend payment strategies.

**`beta`**: Outliers may point to stocks with exceptionally high or low betas, reflecting notable volatility or stability relative to the market.

**`ebitda`**: Discrepant values may represent earnings before interest, taxes, depreciation, and amortization significantly different from the average, suggesting extraordinary financial events.

**`trailing_pe` and `forward_pe`**: Outliers in these columns may indicate stocks with very high or low price-to-earnings ratios, reflecting possible market deviations.

**`volume` and `average_volume`**: Discrepant values may represent stocks traded in volumes much above or below the average, reflecting unusual investor interest.

**`fifty_two_week_low` and `fifty_two_week_high`**: Outliers in these columns may indicate stocks that have reached notable price extremes in the past 52 weeks, reflecting volatility or exceptional performance.

**`price_to_sales_trailing_12_months`**: Discrepant values may suggest stocks priced unusually high or low relative to their sales, reflecting market distortions.

**`fifty_day_average` and `two_hundred_day_average`**: Outliers in these columns may indicate exceptionally high or low moving averages, reflecting unusual price trends.

**`trailing_annual_dividend_rate` and `trailing_annual_dividend_yield`**: Outliers may represent companies paying exceptionally high or low dividends compared to the market average.

**`book_value`**: Discrepant values may suggest companies with exceptional book values, reflecting the presence of significant assets or high liabilities.

**`price_to_book`**: Outliers may indicate stocks priced unusually high or low relative to their book value.

**`total_cash` and `total_cash_per_share`**: Outliers may represent companies with large cash reserves or substantially different cash values compared to the average.

**`total_debt`**: Discrepant values may suggest substantial debt or the absence of debt, reflecting different financial strategies.

**`earnings_quarterly_growth` and `revenue_growth`**: Outliers may indicate exceptional earnings or revenue growth rates, suggesting remarkable performance.

**`gross_margins` and `ebitda_margins`**: Discrepant values in these columns may represent significantly high or low gross or EBITDA margins, reflecting operational efficiency or notable inefficiency.

**`return_on_assets` and `return_on_equity`**: Outliers may indicate exceptional returns on assets or equity, reflecting remarkable financial performance.

**`gross_profits` and `earnings_growth_rate`**: Outliers may represent exceptional gross profits or earnings growth rates.

**`dividend_payout_ratio` and `roi`**: Discrepant values in these columns may indicate notable dividend payout policies or returns on investment.
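To put numbers on the outliers described above, the usual boxplot rule can be applied column by column: a value is flagged when it lies more than 1.5 times the interquartile range beyond the quartiles. The following is a minimal sketch on toy data; the function name `iqr_outlier_counts` is an illustrative assumption, and pandas' default quantile interpolation may differ slightly from the quartile method Plotly uses when drawing the boxes.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count, per numeric column, the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Compare every column against its own fences
    outside = numeric.lt(lower, axis="columns") | numeric.gt(upper, axis="columns")
    return outside.sum().sort_values(ascending=False)


# Toy data (hypothetical values): column "a" has one extreme value, "b" has none
toy = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 100],
        "b": [1, 2, 3, 4, 5],
    }
)

outlier_counts = iqr_outlier_counts(toy)
print(outlier_counts)
```

Sorting the counts makes it easy to see which of the financial columns contribute the most flagged rows before deciding how to treat them.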

##### Macro View of Outliers in Financial Data

**Company Size:** Outliers in market values, enterprise value, and revenue may reflect large conglomerates in diversified sectors or small startups.

**Financial Efficiency:** Outliers in profit margins and operating margins may indicate exceptionally efficient or inefficient companies in various sectors.

**Dividend Policies:** Outliers in dividend rates may represent distinct dividend payment strategies among companies in different sectors.

**Market Volatility:** Exceptional betas may suggest the unique volatility of stocks in specific sectors.

**Financial Events:** Outliers in metrics such as EBITDA may reflect nonrecurring financial events, such as mergers, acquisitions, or restructurings.

**Investor Interest:** Stock volume and moving averages may be influenced by investor interest in specific sectors.

**Business Cycles:** Sectors with distinct economic cycles may lead to significant variations in financial performance.

**Debt Policies:** Different sectors have varied approaches to debt, resulting in varying debt values.

**Operational Performance:** Outliers in profit growth metrics, margins, and returns may reflect the unique performance of companies in different sectors.


#### Histograms and Distribution

In [None]:
import math

import plotly.graph_objects as go
import plotly.subplots as sp

# Keep only the float columns; each one gets its own histogram
df = df_fundamentals_book.select_dtypes(include=["float64"])

# Size the grid from the number of columns instead of hard-coding 5 x 8
n_cols = 8
n_rows = math.ceil(len(df.columns) / n_cols)
fig = sp.make_subplots(rows=n_rows, cols=n_cols, subplot_titles=list(df.columns), shared_yaxes=True, horizontal_spacing=0.01, vertical_spacing=0.1)

# Fill the grid in row-major order: column i goes to (i // n_cols + 1, i % n_cols + 1)
for i, col_name in enumerate(df.columns):
    trace = go.Histogram(x=df[col_name], name=col_name, marker=dict(color="rgb(100, 195, 181)"), showlegend=False)
    fig.add_trace(trace, row=i // n_cols + 1, col=i % n_cols + 1)

fig.update_layout(title_text="Histogram: Distribution", height=800, width=2500, template="plotly_dark")

fig.show()


**`market_cap`:** Most companies have a market capitalization between **1.257716e+06** and **4.832232e+10**.

**`enterprise_value`:** Most companies have an enterprise value between **-6.435227e+09** and **8.995235e+10**.

**`total_revenue`:** Most companies have total revenue between **-4.242400e+07** and **5.811812e+10**.

**`profit_margins`:** Most companies have profit margins between **-2.162070** and **0.229886**.

**`operating_margins`:** Most companies have operating margins close to **0**.

**`dividend_rate`:** Most companies do not pay dividends.

**`beta`:** Most companies have a beta below **0.25**.

**`ebitda`:** Most companies have an EBITDA between **-1.011200e+09** and **2.910870e+11**.

**`trailing_pe`:** Most companies have a P/E ratio between **0** and **22**.

**`forward_pe`:** Most companies have a forward P/E ratio between **-10.142858** and **68.823524**.

**`volume`:** Most companies have trading volumes below **9450820.0**.

**`average_volume`:** Most companies have average trading volumes below **10634197.9**.

**`fifty_two_week_low`:** Most companies have a 52-week low stock price between **0.100** and **24.295**.

**`fifty_two_week_high`:** Most companies have a 52-week high stock price between **1.570** and **6530.000**.

**`price_to_sales_trailing_12_months`:** Most companies have a Price/Sales ratio between **-1.871929** and **21.706719**.

**`fifty_day_average`:** Most companies have a 50-day average stock price below **71.79748**.

**`two_hundred_day_average`:** Most companies have a 200-day average stock price below **268.05739**.

**`trailing_annual_dividend_rate`:** Most companies have a trailing annual dividend rate below **5.7786**.

**`trailing_annual_dividend_yield`:** Most companies have a trailing annual dividend yield below **0.026773**.

**`book_value`:** Most companies have a book value between **-3962.8900** and **21.3875**.

**`price_to_book`:** Most companies have a Price/Book ratio below **54.66667**.

**`total_cash`:** Most companies have total cash between **6.000** and **3.943251e+10**.

**`total_cash_per_share`:** Most companies have total cash per share below **54.2057**.

**`total_debt`:** Most companies have total debt between **0** and **8.957765e+11**.

**`earnings_quarterly_growth`:** Most companies have quarterly earnings growth between **-0.9990** and **1.8895**.

**`revenue_growth`:** Most companies have revenue growth between **-0.9830** and **3.6830**.

**`gross_margins`:** Most companies have gross margins close to **0**.

**`ebitda_margins`:** Most companies have EBITDA margins close to **0**.

**`return_on_assets`:** Most companies have a positive **ROA**.

**`return_on_equity`:** Most companies have a positive **ROE**.

**`gross_profits`:** Most companies have gross profits between **-4.981650e+08** and **3.296165e+11**.

**`earnings_growth_rate`:** Most companies have an earnings growth rate between **-99.90** and **2788.60**.

**`dividend_payout_ratio`:** Most companies have a dividend payout ratio close to **0**.

**`roi`:** Most companies have an **ROI** between **-8.232338** and **34.366094**.
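
Statements of the form "most companies fall between A and B" are easier to defend when they are tied to explicit percentiles rather than read off a histogram. As a hedged sketch, the snippet below reports a central percentile interval and the skewness for every numeric column. The 5th to 95th percentile choice, the helper name `central_range`, and the small sample frame are assumptions made for illustration; `df_fundamentals_book` could be passed in instead.

```python
import pandas as pd


def central_range(df: pd.DataFrame, lower: float = 0.05, upper: float = 0.95) -> pd.DataFrame:
    """Summarise each numeric column by a central percentile interval and its skewness."""
    numeric = df.select_dtypes(include="number")
    # quantile() returns one row per percentile; transpose to one row per column
    summary = numeric.quantile([lower, upper]).T
    summary.columns = ["lower", "upper"]
    summary["skew"] = numeric.skew()
    return summary


# Hypothetical stand-in for the fundamentals data; the values are invented
sample = pd.DataFrame({
    "ticker": ["AAA3", "BBB3", "CCC3", "DDD3", "EEE3"],
    "beta": [0.1, 0.2, 0.2, 0.3, 5.0],
})

summary = central_range(sample)
print(summary)
```

A strongly positive `skew` value signals a long right tail, which is the situation where a min-to-max range says little about where the bulk of the companies sit.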
