# Business case: Understanding profitability in the US financial market

This is a business case prepared for the Statistics Module (Bloque 1) of the Advanced AI concentration.

## Case Description

You were hired as a data scientist in the financial analysis department of an important mutual fund firm. The firm has been doing financial analysis and financial forecasting for several years. Your task is to come up with alternative approaches to descriptive analytics that can point to better forecasting methods in the future.

You have to analyze the historical quarterly financial statements of all US public firms listed on the New York Stock Exchange (NYSE) and NASDAQ. You will receive this dataset in .csv format.

You have to carefully read the data dictionary to understand each variable and the dataset to understand its structure.

## Business Questions

All your data and statistical analysis has to be tailored to answer the following questions:

### General questions:

By industry, what is the composition of US public firms in terms of firm size, sales performance and profitability?

Why are some firms more profitable than others? Which factors/variables from financial statements are related to stock returns?

### Specific questions:
-----

In [35]:
# Data management module
import numpy as np
import pandas as pd
# Visualization modules
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
# Linear regression modules
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from scipy.stats.mstats import winsorize
# Statistics modules
import statsmodels.api as sm
import statsmodels.formula.api as smf

#### Open the .csv files

In [36]:
# https://apradie.com/datos/us2022q2a.csv
us = pd.read_csv('us2022q2a.csv')

# https://apradie.com/datos/usfirms2022.csv
firms = pd.read_csv('usfirms2022.csv')

In [37]:
us.columns

Index(['firm', 'q', 'revenue', 'cogs', 'sgae', 'otheropexp', 'extraincome',
       'finexp', 'incometax', 'totalassets', 'totalliabilities', 'shortdebt',
       'longdebt', 'stockholderequity', 'adjprice', 'originalprice',
       'sharesoutstanding', 'fiscalmonth', 'year', 'cto'],
      dtype='object')

In [38]:
firms.columns

Index(['Ticker', 'Name', 'N', 'Class', 'Country\nof Origin', 'Type of Asset',
       'Sector NAICS\nlevel 1', 'Exchange / Src', 'Sector\nEconomatica',
       'Sector NAICS\nlast available', 'partind'],
      dtype='object')

#### Merge both tables

In [39]:
us_firms = us.merge(firms, left_on='firm', right_on='Ticker')
us_firms.head()

| | firm | q | revenue | cogs | sgae | otheropexp | extraincome | finexp | incometax | totalassets | ... | Name | N | Class | Country\nof Origin | Type of Asset | Sector NAICS\nlevel 1 | Exchange / Src | Sector\nEconomatica | Sector NAICS\nlast available | partind |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A | 2000q1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | Agilent Technologies, Inc | 94 | Com | US | Stock | Manufacturing | NYSE | Electric Electron | Navigational, Measuring, Electromedical, and C... | 0.124 |
| 1 | A | 2000q2 | 2485000.0 | 1261000.0 | 1010000.0 | 0.0 | 42000.0 | 0.0 | 90000.0 | 7321000.0 | ... | Agilent Technologies, Inc | 94 | Com | US | Stock | Manufacturing | NYSE | Electric Electron | Navigational, Measuring, Electromedical, and C... | 0.124 |
| 2 | A | 2000q3 | 2670000.0 | 1369000.0 | 1091000.0 | 0.0 | 28000.0 | 0.0 | 83000.0 | 7827000.0 | ... | Agilent Technologies, Inc | 94 | Com | US | Stock | Manufacturing | NYSE | Electric Electron | Navigational, Measuring, Electromedical, and C... | 0.124 |
| 3 | A | 2000q4 | 3372000.0 | 1732000.0 | 1182000.0 | 0.0 | 10000.0 | 0.0 | 163000.0 | 8425000.0 | ... | Agilent Technologies, Inc | 94 | Com | US | Stock | Manufacturing | NYSE | Electric Electron | Navigational, Measuring, Electromedical, and C... | 0.124 |
| 4 | A | 2001q1 | 2841000.0 | 1449000.0 | 1113000.0 | 0.0 | -6000.0 | 0.0 | 119000.0 | 9208000.0 | ... | Agilent Technologies, Inc | 94 | Com | US | Stock | Manufacturing | NYSE | Electric Electron | Navigational, Measuring, Electromedical, and C... | 0.124 |


#### Compute the variables

Variable calculations:
  - Firm size measures:
    + Book value of the firm = totalassets - totalliabilities
    + Market value = originalprice * sharesoutstanding, where originalprice is the historical (unadjusted) stock price
  - Profit margin measures:
    + Operating profit margin = opm = operating profit / sales = ebit / revenue (revenue = sales)
      * Operating profit = ebit = revenue - cogs - sgae - otheropexp
      * cogs = Cost of Goods Sold, treated here as variable costs
      * sgae = Selling, General and Administrative Expenses, treated here as fixed costs
      * ebit = Earnings Before Interest and Taxes = operating profit
    + Profit margin = net income / sales
      * Net income = ebit - incometax - finexp
      * incometax = the income tax the firm pays to the government
      * finexp = financial expenses, i.e. the interest the firm pays on its loans and other debt
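As a quick check of these formulas, the sketch below recomputes EBIT, the operating profit margin and the profit margin by hand for Agilent (firm A) in 2000q2. It uses the revenue, cogs, sgae, otheropexp, finexp and incometax values from the merged table above. The results should agree with the opm and Profit margin columns that the next cell produces.

```python
# Income-statement values for Agilent (A), 2000q2, taken from the merged table
revenue, cogs, sgae, otheropexp = 2485000.0, 1261000.0, 1010000.0, 0.0
finexp, incometax = 0.0, 90000.0

ebit = revenue - cogs - sgae - otheropexp      # operating profit
opm = ebit / revenue                           # operating profit margin
net_income = ebit - incometax - finexp         # net income as defined above
profit_margin = net_income / revenue

print(ebit, round(opm, 6), net_income, round(profit_margin, 6))
# 214000.0 0.086117 124000.0 0.049899
```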

In [40]:
# Firm size measures
us_firms['Book value of the firms'] = us_firms['totalassets'] - us_firms['totalliabilities']
us_firms['Market value'] = us_firms['originalprice'] * us_firms['sharesoutstanding']

# Profit Margin measures
us_firms['Operating profit'] = us_firms['revenue'] - us_firms['cogs'] - us_firms['sgae'] - us_firms['otheropexp']
# us_firms['Ebit'] = us_firms['revenue'] - us_firms['cogs'] - us_firms['sgae'] - us_firms['otheropexp']
us_firms['opm'] = us_firms['Operating profit'] / us_firms['revenue']
us_firms['Net income'] = us_firms['Operating profit'] - us_firms['incometax'] - us_firms['finexp']
us_firms['Profit margin'] = us_firms['Net income'] / us_firms['revenue']
us_firms.head()

| | firm | q | revenue | cogs | sgae | otheropexp | extraincome | finexp | incometax | totalassets | ... | Exchange / Src | Sector\nEconomatica | Sector NAICS\nlast available | partind | Book value of the firms | Market value | Operating profit | opm | Net income | Profit margin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A | 2000q1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NYSE | Electric Electron | Navigational, Measuring, Electromedical, and C... | 0.124 | NaN | 47008000.0 | NaN | NaN | NaN | NaN |
| 1 | A | 2000q2 | 2485000.0 | 1261000.0 | 1010000.0 | 0.0 | 42000.0 | 0.0 | 90000.0 | 7321000.0 | ... | NYSE | Electric Electron | Navigational, Measuring, Electromedical, and C... | 0.124 | 4642000.0 | 33355060.0 | 214000.0 | 0.086117 | 124000.0 | 0.049899 |
| 2 | A | 2000q3 | 2670000.0 | 1369000.0 | 1091000.0 | 0.0 | 28000.0 | 0.0 | 83000.0 | 7827000.0 | ... | NYSE | Electric Electron | Navigational, Measuring, Electromedical, and C... | 0.124 | 4902000.0 | 22169400.0 | 210000.0 | 0.078652 | 127000.0 | 0.047566 |
| 3 | A | 2000q4 | 3372000.0 | 1732000.0 | 1182000.0 | 0.0 | 10000.0 | 0.0 | 163000.0 | 8425000.0 | ... | NYSE | Electric Electron | Navigational, Measuring, Electromedical, and C... | 0.124 | 5265000.0 | 24986060.0 | 458000.0 | 0.135824 | 295000.0 | 0.087485 |
| 4 | A | 2001q1 | 2841000.0 | 1449000.0 | 1113000.0 | 0.0 | -6000.0 | 0.0 | 119000.0 | 9208000.0 | ... | NYSE | Electric Electron | Navigational, Measuring, Electromedical, and C... | 0.124 | 5541000.0 | 14036530.0 | 279000.0 | 0.098205 | 160000.0 | 0.056318 |


#### Get the most recent quarter

In [41]:
aux = us_firms['q'] == '2022q2'
us_firms2022 = us_firms[aux]
us_firms2022.head()

| | firm | q | revenue | cogs | sgae | otheropexp | extraincome | finexp | incometax | totalassets | ... | Exchange / Src | Sector\nEconomatica | Sector NAICS\nlast available | partind | Book value of the firms | Market value | Operating profit | opm | Net income | Profit margin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 89 | A | 2022q2 | 1607000.0 | 746000.0 | 501000.0 | 0.0 | -7000.0 | 20000.0 | 59000.0 | 10455000.0 | ... | NYSE | Electric Electron | Navigational, Measuring, Electromedical, and C... | 0.124 | 5122000.0 | 35477560.0 | 360000.0 | 0.22402 | 281000.0 | 0.17486 |
| 179 | AA | 2022q2 | 3644000.0 | 2767000.0 | 220000.0 | -75000.0 | 81000.0 | 30000.0 | 234000.0 | 15709000.0 | ... | NYSE | Basic & Fab Metal | Alumina and Aluminum Production and Processing | - | 7292000.0 | 8407171.0 | 732000.0 | 0.200878 | 468000.0 | 0.12843 |
| 269 | AAIC | 2022q2 | 10900.0 | 6374.0 | 0.0 | 0.0 | -3417.0 | 0.0 | 802.0 | 1084755.0 | ... | NYSE | Funds | Other Investment Pools and Funds | - | 213698.0 | 113803.3 | 4526.0 | 0.415229 | 3724.0 | 0.341651 |
| 359 | AAL | 2022q2 | 13422000.0 | 0.0 | 12405000.0 | 0.0 | 25000.0 | 439000.0 | 127000.0 | 67963000.0 | ... | NASDAQ | Transportat Serv | Scheduled Air Transportation | 0.032 | -8422000.0 | 8235848.0 | 1017000.0 | 0.075771 | 451000.0 | 0.033602 |
| 449 | AAME | 2022q2 | 44669.0 | 0.0 | 46784.0 | 0.0 | 0.0 | 0.0 | -436.0 | 379274.0 | ... | NASDAQ | Finance and Insurance | Insurance Carriers | - | 109101.0 | 54463.99 | -2115.0 | -0.047348 | -1679.0 | -0.037588 |
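The most recent quarter does not have to be hard-coded as '2022q2'; it can be read from the data. Below is a minimal sketch on a made-up toy panel. It assumes every value of q follows the 'YYYYqN' pattern seen above (a four-digit year and a one-digit quarter), so plain string ordering is also chronological.

```python
import pandas as pd

# Toy panel with the same 'firm' and 'q' layout as us_firms (values are made up)
panel = pd.DataFrame({
    'firm': ['A', 'A', 'AA', 'AA'],
    'q': ['2021q4', '2022q2', '2022q1', '2022q2'],
    'revenue': [1.0, 2.0, 3.0, 4.0],
})

# 'YYYYqN' strings sort chronologically as plain text, so max() gives the latest quarter
latest_q = panel['q'].max()
latest = panel[panel['q'] == latest_q]
print(latest_q, len(latest))  # 2022q2 2
```

In the notebook, the equivalent filter would be `us_firms[us_firms['q'] == us_firms['q'].max()]`.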


#### __About descriptive statistics:__

##### __Considering the most recent financial quarter of the dataset:__

- __Show how many firms by industry there are in the sample__

In [42]:
firmsbyindustry = firms['Sector NAICS\nlevel 1'].value_counts()
firmsbyindustry

Manufacturing                                                               1567
Finance and Insurance                                                        703
Information                                                                  263
Retail Trade                                                                 152
Professional, Scientific, and Technical Services                             145
Administrative and Support and Waste Management and Remediation Services     133
Mining, Quarrying, and Oil and Gas Extraction                                104
Wholesale Trade                                                               79
Utilities                                                                     77
Transportation and Warehousing                                                69
Accommodation and Food Services                                               69
Real Estate and Rental and Leasing                                            68
Health Care and Social Assis

In [43]:
px.histogram(firms['Sector NAICS\nlevel 1'])

The histogram above shows the different industries that all the firms belong to.
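The counts above come from the `firms` catalog. That means every ticker in usfirms2022.csv is counted, whether or not it reports data for the most recent quarter. To count only the firms present in that quarter, the same `value_counts()` call can be applied to the merged table filtered to the latest quarter (as `us_firms2022` is). Below is a sketch on hypothetical rows; the column names mirror the ones above.

```python
import pandas as pd

sector = 'Sector NAICS\nlevel 1'

# Hypothetical rows: firm 'C' is in the catalog but has no 2022q2 record
catalog = pd.DataFrame({'Ticker': ['A', 'B', 'C'],
                        sector: ['Manufacturing', 'Manufacturing', 'Finance and Insurance']})
data = pd.DataFrame({'firm': ['A', 'B', 'C', 'A', 'B'],
                     'q': ['2022q1', '2022q1', '2022q1', '2022q2', '2022q2']})

merged = data.merge(catalog, left_on='firm', right_on='Ticker')
latest = merged[merged['q'] == merged['q'].max()]

# The catalog counts every ticker; the latest-quarter counts only firms with data in that quarter
print(catalog[sector].value_counts().to_dict())   # {'Manufacturing': 2, 'Finance and Insurance': 1}
print(latest[sector].value_counts().to_dict())    # {'Manufacturing': 2}
```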

- __For each industry (and for all industries), what can you say about the typical firm size in terms of market value and book value? How much do these variables vary within each industry? How is firm size (in market value) distributed?__

#### Market value analysis

In [None]:
px.line(us_firms, x = 'q', y = 'Market value', color = 'Sector NAICS\nlevel 1')

In [None]:
px.histogram(us_firms2022['Market value'])

We also display the remaining summary statistics. The data are highly skewed, so the mean alone does not give us enough information.

In [None]:
us_firms2022['Market value'].describe()
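Market value is heavily right-skewed, so the median describes a "typical" firm better than the mean does, and a log scale makes the histogram readable. Below is a sketch with made-up values. It assumes only strictly positive values are log-transformed; zero, negative or missing values would have to be dropped first.

```python
import numpy as np
import pandas as pd

# Made-up market values: several small firms and one giant
mv = pd.Series([50.0, 80.0, 120.0, 200.0, 300.0, 1_000_000.0])

print(mv.mean())    # pulled far up by the single giant firm
print(mv.median())  # 160.0, the middle of the distribution

# log10 compresses orders of magnitude; only strictly positive values can be logged
log_mv = np.log10(mv[mv > 0])
print(log_mv.round(2).tolist())
```

In the notebook, the equivalent plot would be `px.histogram(np.log10(us_firms2022.loc[us_firms2022['Market value'] > 0, 'Market value']))`.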

#### Book value analysis

In [None]:
px.line(us_firms, x = 'q', y = 'Book value of the firms', color = 'Sector NAICS\nlevel 1')

In [None]:
px.histogram(us_firms2022['Book value of the firms'])

We also display the remaining summary statistics. The data are highly skewed, so the mean alone does not give us enough information.

In [None]:
us_firms2022['Book value of the firms'].describe()

- __Market value__: The line chart shows how the market value of each firm, grouped by industry, evolves over time relative to all the others. Market values have generally held steady. However, several industries increased their value disproportionately, and this skews the mean.

    We then display a histogram to check this skew, and it confirms what we suspected. For that reason we also display additional summary statistics to understand the range of values that occur.

----

- __Book value__: The book value charts show the book value of each firm, by industry, over time. Book values vary widely across industries because several firms have very large debts; when liabilities exceed assets, book value is negative. However, this does not greatly affect the industry's market value, because the other firms in that industry offset it.

    The histogram that follows shows a strong skew, because some firms are worth far more than others. We therefore display the remaining statistics to see how large the differences are.
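The overall `describe()` calls do not answer the question of how size varies *within* each industry. One option is a per-industry summary with the median, the interquartile range (IQR) and the coefficient of variation (CV = standard deviation / mean). Below is a sketch on hypothetical data. The helper `size_summary` is a hypothetical name introduced here. In the notebook, the same code would run on `us_firms2022` with 'Market value' or 'Book value of the firms'.

```python
import pandas as pd

sector = 'Sector NAICS\nlevel 1'

# Hypothetical latest-quarter data
df = pd.DataFrame({
    sector: ['Manufacturing'] * 4 + ['Utilities'] * 3,
    'Market value': [10.0, 20.0, 30.0, 1000.0, 50.0, 60.0, 70.0],
})

def size_summary(s):
    """Hypothetical helper: robust size statistics for one industry."""
    return pd.Series({
        'n': s.count(),
        'median': s.median(),
        'iqr': s.quantile(0.75) - s.quantile(0.25),
        'cv': s.std() / s.mean(),
    })

# One row per industry, one column per statistic
summary = df.groupby(sector)['Market value'].apply(size_summary).unstack()
print(summary)
```

The median and IQR are barely affected by the single 1000.0 outlier in Manufacturing, whereas the CV grows large. That contrast is what a skewed size distribution looks like.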

- __For each industry (and for all industries), what can you say about profit margin of firms? show a) descriptive statistics of profit margin and b) plot(s) to illustrate how profit margin changes across industries.__

In [None]:
us_firms2022['Profit margin'].describe()

In [None]:
# plt.figure() only affects Matplotlib; size the Plotly figure through width/height instead
px.box(us_firms2022, x = 'Sector NAICS\nlevel 1', y= 'Profit margin', width=1200, height=600)

The profit margin output is revealing because it contains 'inf' values. These appear when a firm reports zero revenue (division by zero). Firms with negative revenue do not produce 'inf', but they produce margins with a misleading sign. Both cases distort the statistics shown. Next, we draw a box plot that is meant to show the profit margin of each industry, i.e. how profitable each industry is. Because of the problem just described, however, the chart cannot be read correctly.
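One way to make the profit-margin statistics readable is to exclude firms with zero or negative revenue before summarising, then turn any remaining ±inf into NaN. Excluding those firms is an assumption and not the only valid treatment. Below is a sketch with made-up rows.

```python
import numpy as np
import pandas as pd

sector = 'Sector NAICS\nlevel 1'

# Made-up rows: one firm with zero revenue produces an infinite margin
df = pd.DataFrame({
    sector: ['Manufacturing', 'Manufacturing', 'Utilities', 'Utilities'],
    'revenue': [100.0, 0.0, 200.0, 50.0],
    'Net income': [10.0, -5.0, 30.0, 5.0],
})
df['Profit margin'] = df['Net income'] / df['revenue']   # -5 / 0 gives -inf

# Keep only firms with positive sales and turn any leftover infinities into NaN
valid = df[df['revenue'] > 0].replace([np.inf, -np.inf], np.nan)
print(valid.groupby(sector)['Profit margin'].describe())
```

With the notebook data, the filtered frame could then be passed to `px.box(valid, x='Sector NAICS\nlevel 1', y='Profit margin')`.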

- __Which are the 10 biggest US firms in terms of market value, and how far are they from the typical size of a US firm?__

In [None]:
top10 = us_firms2022.sort_values('Market value', ascending=False).head(10)
top10

Above are the 10 firms with the largest market value. All of them are about 3 orders of magnitude larger than the median firm.

- __Which are the 10 biggest US firms in terms of book value, and how far are they from the typical size of a US firm?__

In [None]:
top10 = us_firms2022.sort_values('Book value of the firms', ascending=False).head(10) # Book value of the firm
top10

Above are the 10 firms with the largest book value. All of them are about 3 orders of magnitude larger than the median firm.
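"How far" can be answered quantitatively by dividing each top-10 value by the median of all firms in the quarter. The log10 of that ratio is the gap in orders of magnitude, which is the comparison made above. Below is a sketch with hypothetical numbers; `gap_to_median` is a hypothetical helper. With the notebook data, `df` would be `us_firms2022` and `col` would be 'Market value' or 'Book value of the firms'. The ratio is only meaningful when the median is positive.

```python
import numpy as np
import pandas as pd

def gap_to_median(df, col, n=10):
    """Hypothetical helper: top-n firms by `col` and their size relative to the median firm."""
    median = df[col].median()
    top = df.nlargest(n, col)[['firm', col]].copy()
    top['times_median'] = top[col] / median
    top['orders_of_magnitude'] = np.log10(top['times_median'])
    return top

# Hypothetical data: the median firm is worth 100
df = pd.DataFrame({'firm': list('ABCDE'),
                   'Market value': [100_000.0, 50.0, 100.0, 200.0, 10.0]})
print(gap_to_median(df, 'Market value', n=2))
```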

##### __Considering the whole history of financial data for all firms:__

- __How can you measure firm profitability that can be used to compare performance among firms of different sizes? Select and justify at least 3 measures and show descriptive statistics__

In [None]:
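# numeric_only=True skips text columns (firm, q, Name, ...), which corr() cannot handle in recent pandas
df_corrs = us_firms.corr(numeric_only=True)["opm"]
df_corrs.sort_values(ascending=False)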

In [None]:
# Distribution of the operating profit margin across firms in the latest quarter
px.histogram(us_firms2022, x='opm')

From the output above, we can see which variables have the strongest correlation with our OPM (operating profit margin), which represents

We can see that the three selected measures adequately describe the nature of profitability in our sample of firms. The charts above show this behavior more representatively and give us a good understanding of how the market behaves.
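The question asks for at least three profitability measures that are comparable across firm sizes. Below is a sketch of three candidate ratios: the operating profit margin and the profit margin defined earlier, plus return on assets (ROA = net income / totalassets). ROA is our own addition and is not part of the original variable list. Each ratio divides a profit figure by a size measure, which is what makes small and large firms comparable. The values are made up.

```python
import numpy as np
import pandas as pd

# Made-up firms of very different sizes
df = pd.DataFrame({
    'firm': ['Small Co', 'Big Co'],
    'revenue': [100.0, 100_000.0],
    'Operating profit': [20.0, 15_000.0],
    'Net income': [10.0, 9_000.0],
    'totalassets': [200.0, 300_000.0],
})

# Each ratio scales a profit measure by a size measure
df['opm'] = df['Operating profit'] / df['revenue']
df['Profit margin'] = df['Net income'] / df['revenue']
df['ROA'] = df['Net income'] / df['totalassets']   # assumed third measure, not in the original list

ratios = df[['opm', 'Profit margin', 'ROA']].replace([np.inf, -np.inf], np.nan)
print(ratios.describe())
```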


__TO BE ANSWERED__ ....

- __Calculate and explain earnings per share deflated by price.__

In [None]:
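# Earnings per share, rescaled by adjprice / originalprice so it is on the same basis as adjprice
us_firms['EPS'] = us_firms['Net income'] / us_firms['sharesoutstanding'] * us_firms['adjprice'] / us_firms['originalprice']
# EPS deflated by price (EPSP); algebraically equal to Net income / (originalprice * sharesoutstanding)
us_firms['EPSP'] = us_firms['EPS'] / us_firms['adjprice']
us_firms[['firm', 'q', 'EPS', 'EPSP']].head()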

Earnings per share are deflated by price in a way that

__TO BE ANSWERED__
....
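Earnings per share deflated by price (EPSP) divides EPS by the share price. The result is the earnings yield, the inverse of the price-to-earnings ratio. Dividing by price puts firms with very different share prices on the same scale: EPSP is the earnings a firm generates per dollar of share price, and it is negative for loss-making firms. Below is a worked example with made-up numbers.

```python
# Made-up firm: net income 8,000, 4,000 shares outstanding, share price 40
net_income, shares, price = 8_000.0, 4_000.0, 40.0

eps = net_income / shares     # earnings per share
epsp = eps / price            # each dollar of share price "buys" 5 cents of earnings
pe_ratio = price / eps        # the price-to-earnings ratio, i.e. 1 / EPSP

print(eps, epsp, pe_ratio)    # 2.0 0.05 20.0
```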

-----

#### __About statistical modeling__

- You have to select a group of firms according to their general industry classification:
    - Manufacturing industries
    - Commercial industries (retail and wholesale)
    - Service industries
    - __Financial services__

In [None]:
# Merge both tables
us_firms = us.merge(firms, left_on='firm', right_on='Ticker')
# Extract Financial services information
us_finance_firms = us_firms[us_firms['Sector NAICS\nlevel 1'].isin(['Finance and Insurance', 'Real Estate and Rental and Leasing'])]
us_finance_firms = us_finance_firms.drop(['Sector\nEconomatica', 'Sector NAICS\nlast available', 'partind', 'N', 'Class'], axis=1)

us_finance_firms.head()

- Using your subset of firms that belong to your industry, which factors (variables) might be related to annual stock returns one quarter in the future? Select at least 3 factors and briefly explain why you think they might be related to stock returns.
    - Do histograms for each factor/variables and interpret them
    - Do plots to visualize the possible relationship each factor might have with the dependent variable.
    - Show descriptive statistics of these factors

```
Notes:
    - Dependent variable => Future stock annual returns (continuously compounded, cc.)
                            1 quarter later (F1R) -> returns shifted one quarter

    Catalog of possible independent variables
    EPSP = EPS / StockPrice
    EPS = NetIncome / sharesoutstanding

    *** SELECT 3 ***
    1) Sales annual growth = (revenue_t / revenue_(t-4)) - 1
    2) Operating profit growth
       Operating profit = EBIT (Earnings Before Interest and Taxes)
    3) Operating profit margin = EBIT / revenue
    4) Book-to-market ratio = book value / market value = (totalassets - totalliabilities) / (originalprice * sharesoutstanding)
    5) Short financial leverage = shortdebt / totalassets
    6) Long financial leverage = longdebt / totalassets
```
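The annual continuously compounded return is the log price change over four quarters, R_t = ln(adjprice_t) - ln(adjprice_(t-4)), computed within each firm. F1R is that return shifted back one quarter, so each row is paired with the annual return observed one quarter later. Below is a sketch on a toy one-firm panel with made-up prices. Like any shift-based approach, it assumes the rows are sorted by firm and quarter and that no quarters are missing.

```python
import numpy as np
import pandas as pd

# Toy panel: one firm, six consecutive quarters, made-up adjusted prices
df = pd.DataFrame({
    'firm': ['X'] * 6,
    'q': ['2020q1', '2020q2', '2020q3', '2020q4', '2021q1', '2021q2'],
    'adjprice': [10.0, 11.0, 12.0, 13.0, 20.0, 22.0],
})
df = df.sort_values(['firm', 'q'])

# Annual cc return: log price now minus log price four quarters earlier (same firm)
df['R'] = np.log(df['adjprice']) - np.log(df.groupby('firm')['adjprice'].shift(4))
# Future return: next quarter's annual return, aligned to the current row
df['F1R'] = df.groupby('firm')['R'].shift(-1)
print(df[['q', 'R', 'F1R']])
```

The first four rows have no return because there is no price four quarters earlier. The last row has no F1R because there is no next quarter.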

In [None]:
# Size and income measures
us_finance_firms['Market value'] = us_finance_firms['originalprice'] * us_finance_firms['sharesoutstanding']
us_finance_firms['Ebit'] = us_finance_firms['revenue'] - us_finance_firms['cogs'] - us_finance_firms['sgae'] - us_finance_firms['otheropexp']
us_finance_firms['Net income'] = us_finance_firms['Ebit'] - us_finance_firms['incometax'] - us_finance_firms['finexp']
us_finance_firms['EPS'] = us_finance_firms['Net income'] / us_finance_firms['sharesoutstanding']

# Selected factors + EPSP
# Parentheses are needed so that book value is divided by the whole market value
us_finance_firms['Book-to-market ratio'] = (us_finance_firms['totalassets'] - us_finance_firms['totalliabilities']) / (us_finance_firms['originalprice'] * us_finance_firms['sharesoutstanding'])
us_finance_firms['Short financial leverage'] = us_finance_firms['shortdebt'] / us_finance_firms['totalassets']
us_finance_firms['Long financial leverage'] = us_finance_firms['longdebt'] / us_finance_firms['totalassets']
us_finance_firms['OPM'] = us_finance_firms['Ebit'] / us_finance_firms['revenue']
us_finance_firms['EPSP'] = us_finance_firms['EPS']  / us_finance_firms['originalprice']
# Annual continuously compounded return: log price change over the last 4 quarters within each firm
us_finance_firms['R'] = np.log(us_finance_firms['adjprice']) - np.log(us_finance_firms.groupby(['firm'])['adjprice'].shift(4))
# Dependent variable: next quarter's annual return (F1R)
us_finance_firms['F1R'] = us_finance_firms.groupby(['firm'])['R'].shift(-1)
us_finance_firms.head()

In [None]:
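us_finance_firms.replace([np.inf, -np.inf], np.nan, inplace=True)
# Drop rows with missing values only in the columns the model uses
model_cols = ['EPS', 'EPSP', 'OPM', 'Book-to-market ratio', 'Short financial leverage',
              'Long financial leverage', 'F1R', 'Market value']
us_finance_firms = us_finance_firms.dropna(subset=model_cols)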

us_finance_firms['EPS'] = winsorize(us_finance_firms['EPS'], limits=[0.0001, 0.02])
us_finance_firms['EPSP'] = winsorize(us_finance_firms['EPSP'], limits=[0.0001, 0.02])
us_finance_firms['F1R'] = winsorize(us_finance_firms['F1R'], limits=[0.0001, 0.02])

us_finance_firms['OPM'] = winsorize(us_finance_firms['OPM'], limits=[0.0001, 0.02])
us_finance_firms['OPM'].plot.hist();

In [None]:
us_finance_firms['Book-to-market ratio'] = winsorize(us_finance_firms['Book-to-market ratio'], limits=[0.0001, 0.02])
us_finance_firms['Book-to-market ratio'].plot.hist();

In [None]:
us_finance_firms['Short financial leverage'] = winsorize(us_finance_firms['Short financial leverage'], limits=[0.0001, 0.02])
us_finance_firms['Short financial leverage'].plot.hist();

In [None]:
us_finance_firms['Long financial leverage'] = winsorize(us_finance_firms['Long financial leverage'], limits=[0.0001, 0.02])
us_finance_firms['Long financial leverage'].plot.hist();
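`scipy.stats.mstats.winsorize` does not delete outliers. Instead, it replaces the lowest fraction `limits[0]` and the highest fraction `limits[1]` of the values with the nearest remaining value. With the default `inclusive` settings, the number of values replaced in each tail is the integer part of fraction × n. So with `limits=[0.0001, 0.02]`, the lower tail is only touched once there are roughly 10,000 observations. Below is a small demonstration on a toy array.

```python
import numpy as np
from scipy.stats.mstats import winsorize

values = np.arange(1, 101, dtype=float)   # 1, 2, ..., 100

# 5% per tail: int(0.05 * 100) = 5 values on each side are replaced
w = winsorize(values, limits=[0.05, 0.05])
print(w.min(), w.max())                   # 6.0 95.0

# The notebook's limits on 100 values: int(0.0001 * 100) = 0, so the lower tail is untouched
w2 = winsorize(values, limits=[0.0001, 0.02])
print(w2.min(), w2.max())                 # 1.0 98.0
```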

- Design and run a multiple regression model to examine whether your selected factors and earnings per share deflated by price can explain/predict annual stock returns. You have to control for industry and firm size. To control for these variables you have to include them as extra independent variables in the model
    - Your independent variables must be in the right scale so that you can compare the values of the variables among different firms of any size
    - For each independent variable you have to check for outliers and do the corresponding adjustments to avoid unreliable results in your regression model
    - You must check for possible multicollinearity problems. Briefly explain what multicollinearity is, then run and interpret the corresponding test

In [None]:
def dense_inclusive_pct(x):
    # Dense rank starting at 0, so the smallest firm maps to 0 and the largest to 100
    r = x.rank(method='dense') - 1
    return r / r.max() * 100

# transform() returns a result aligned with the original rows (apply() may add the group key to the index)
us_finance_firms['percentile'] = us_finance_firms.groupby('q')['Market value'].transform(dense_inclusive_pct).astype(int)

# Small = bottom third, Medium = middle third; big firms are the omitted base category
us_finance_firms["Small"] = (us_finance_firms['percentile'] <= 33).astype(int)
us_finance_firms["Medium"] = ((us_finance_firms['percentile'] <= 66) & (us_finance_firms['percentile'] > 33)).astype(int)

# drop() returns a new DataFrame, so the result must be assigned back
us_finance_firms = us_finance_firms.drop('percentile', axis=1)
us_finance_firms
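The size dummies rank firms within each quarter, so a firm's size category is relative to its peers at the same date. Below is a sketch on a hypothetical two-quarter panel. It uses `groupby(...).transform` so that the percentiles stay aligned with the original rows, and it truncates percentiles to integers with `astype(int)` as the cell above does.

```python
import pandas as pd

def dense_inclusive_pct(x):
    # Dense rank from 0 (smallest) to 100 (largest) within the group
    r = x.rank(method='dense') - 1
    return r / r.max() * 100

# Hypothetical market values for four firms in two quarters
df = pd.DataFrame({
    'q': ['2022q1'] * 4 + ['2022q2'] * 4,
    'Market value': [10.0, 20.0, 30.0, 40.0, 400.0, 300.0, 200.0, 100.0],
})
df['percentile'] = df.groupby('q')['Market value'].transform(dense_inclusive_pct).astype(int)
df['Small'] = (df['percentile'] <= 33).astype(int)
df['Medium'] = ((df['percentile'] > 33) & (df['percentile'] <= 66)).astype(int)
print(df)
```

Because of the integer truncation, 33.3 becomes 33 (Small) and 66.7 becomes 66 (Medium), so the cut-offs are slightly wider than exact thirds.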

In [None]:
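from statsmodels.stats.outliers_influence import variance_inflation_factor

# VIF uses the independent variables only (F1R is the dependent variable, so it is excluded),
# plus a constant so that each auxiliary regression has an intercept
vif = sm.add_constant(us_finance_firms[['EPS', 'OPM', 'EPSP', 'Book-to-market ratio',
                                        'Short financial leverage', 'Long financial leverage',
                                        'Small', 'Medium']])

vif_data = pd.DataFrame()
vif_data["feature"] = vif.columns

# calculating VIF for each feature (the row for 'const' can be ignored)
vif_data["VIF"] = [variance_inflation_factor(vif.values, i) for i in range(len(vif.columns))]

print(vif_data)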

- Interpret your model
    - Interpret the results of each coefficient (beta and their statistical significance)
    - Interpret the R-squared of the model

In [None]:
# Getting x and y parameters for the model prediction
x = us_finance_firms[['EPS', 'OPM', 'EPSP', 'Book-to-market ratio', 'Short financial leverage', 'Long financial leverage', 'Small', 'Medium']]
y = us_finance_firms['F1R']

# Getting information to interpret a model
x = sm.add_constant(x)
results = sm.OLS(y, x, missing="drop").fit()
pred = results.predict(x)

print(results.summary())

Interpretation of the fitted model:

The dependent variable is the continuously compounded annual return one quarter ahead (F1R). The independent variables are:

- EPS (earnings per share)
- OPM (operating profit margin)
- EPSP (EPS deflated by price)
- the book-to-market ratio
- short financial leverage
- long financial leverage
- firm size, i.e. the firm's size category based on market value (Small/Medium dummies)

The regression model predicts continuously compounded annual returns one quarter into the future.

__TO BE COMPLETED__ ...

- Adjustments to your model. If there is one or more independent variables (factors or control variables) that were not significant, drop them from your model. You have to run and interpret your final model.

In [None]:
# Getting x and y parameters for the model prediction
x = us_finance_firms[['EPS', 'OPM', 'EPSP', 'Book-to-market ratio', 'Long financial leverage', 'Small', 'Medium']]
y = us_finance_firms['F1R']

# Getting information to interpret a model
x = sm.add_constant(x)
results = sm.OLS(y, x, missing="drop").fit()
pred = results.predict(x)

print(results.summary())