# Cash Flows, Earnings, and Future Stock Prices
- This notebook illustrates how to examine the correlation between two performance measures (earnings and operating cash flows) and future stock prices.
- The dataset, *compustat_earnings_00_19.csv*, includes net income (*ni*), net operating cash flow (*oancf*), market value (*mkvalt*), and the number of shares outstanding (*csho*) for each firm-year (firm identifier *gvkey*, fiscal year *fyear*) between 2000 and 2019.

Please run the following lines of code and answer questions 1, 2, and 3. Write your explanations in the empty markdown cell below each question, or add new cells and answer there.
- **Each group only has to submit one file.**
- Some of the code here has not been covered in previous classes. Please try your best to figure out what it does by searching online resources, and ask me if you have questions.

**Loading pandas and dataset**

In [None]:
import pandas as pd
pd.set_option('display.float_format', lambda x: '%.3f' % x)
df = pd.read_csv('compustat_earnings_00_19.csv')

**Creating per-share variables**

**Question 1**: Explain the three lines of code below:

(1)

(2)

(3)

In [None]:
df['price'] = df['mkvalt'] / df['csho']

In [None]:
df['ni_ps'] = df['ni'] / df['csho']

In [None]:
df['oancf_ps'] = df['oancf'] / df['csho']
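
As a hedged aside on the divisions above: when the denominator is zero, pandas does not raise an error but returns `inf` (or `NaN` for 0/0). Whether *csho* is ever zero in this dataset is not established here. The sketch below uses a hypothetical three-row frame (`toy`, with made-up values) to show the behavior and one way to count such rows.

```python
import numpy as np
import pandas as pd

# Hypothetical values, not taken from the Compustat file.
toy = pd.DataFrame({'ni': [5.0, 3.0, 0.0], 'csho': [10.0, 0.0, 0.0]})

# Same pattern as the cells above: a per-share measure by element-wise division.
toy['ni_ps'] = toy['ni'] / toy['csho']

# Count rows where the result is inf or NaN (i.e., not a finite number).
n_bad = int((~np.isfinite(toy['ni_ps'])).sum())
print(toy)
print('non-finite rows:', n_bad)
```

Running the same count on `df['price']`, `df['ni_ps']`, and `df['oancf_ps']` would show whether this matters for the real data.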

---

**Merging prices from next year**

**Question 2**: Explain the four lines of code below:

(1)

(2)

(3)

(4)

In [None]:
df_f = df[['gvkey', 'fyear', 'price']]

In [None]:
df_f = df_f.assign(fyear = df_f['fyear'] - 1)

In [None]:
df_f.rename(columns={'price':'price_f'}, inplace=True)

In [None]:
df1 = pd.merge(df, df_f, how = 'inner', on = ['gvkey','fyear'])
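
To see the mechanics of these four steps on a small scale, the sketch below applies them to a hypothetical two-firm panel. The frame names (`toy`, `toy_f`, `toy1`) and all values are made up for illustration and do not come from the Compustat file.

```python
import pandas as pd

# Hypothetical toy panel: two firms, three fiscal years each.
toy = pd.DataFrame({
    'gvkey': [1, 1, 1, 2, 2, 2],
    'fyear': [2000, 2001, 2002, 2000, 2001, 2002],
    'price': [10.0, 11.0, 12.0, 20.0, 21.0, 22.0],
})

# The same four steps as in the cells above, applied to the toy panel.
toy_f = toy[['gvkey', 'fyear', 'price']]
toy_f = toy_f.assign(fyear=toy_f['fyear'] - 1)
toy_f = toy_f.rename(columns={'price': 'price_f'})
toy1 = pd.merge(toy, toy_f, how='inner', on=['gvkey', 'fyear'])

print(toy1)
```

Comparing `price` and `price_f` row by row in the printed output, and noting which firm-years are missing from `toy1`, may help when writing the explanations.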

---

**Excluding outliers**

In [None]:
df2 = df1

In [None]:
lb = (df2['ni_ps'] > df1['ni_ps'].quantile(0.01))
ub = (df2['ni_ps'] < df1['ni_ps'].quantile(0.99))
df2 = df2[lb & ub]
print(df2.shape)

In [None]:
lb = (df2['oancf_ps'] > df1['oancf_ps'].quantile(0.01))
ub = (df2['oancf_ps'] < df1['oancf_ps'].quantile(0.99))
df2 = df2[lb & ub]
print(df2.shape)

In [None]:
lb = (df2['price_f'] > df1['price_f'].quantile(0.01))
ub = (df2['price_f'] < df1['price_f'].quantile(0.99))
df2 = df2[lb & ub]
print(df2.shape)

**Question 3**: The code below is equivalent to the four cells above. Search online to learn how a `for` loop works in Python (e.g., https://beginnersbook.com/2018/01/python-for-loop/), and then explain the code below.

Your answer:

In [None]:
df2 = df1
for col in ['ni_ps', 'oancf_ps', 'price_f']:
    lb = (df2[col] > df1[col].quantile(0.01))
    ub = (df2[col] < df1[col].quantile(0.99))
    df2 = df2[lb & ub]
    print(df2.shape)
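
One detail of the loop is that the cutoffs are always computed from the unfiltered frame while the filter is applied to the progressively smaller one. The sketch below, on hypothetical random data (the names `raw`, `trimmed`, `a`, and `b` are made up), repeats that pattern and compares the result with a single combined mask built from the unfiltered frame. It is an illustration under those assumptions, not a replacement for the loop.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1,000 random rows in two made-up columns.
rng = np.random.default_rng(0)
raw = pd.DataFrame({'a': rng.normal(size=1000), 'b': rng.normal(size=1000)})

# Same pattern as the loop above: cutoffs come from the full frame (raw),
# while the filter is applied to the progressively smaller frame (trimmed).
trimmed = raw
for col in ['a', 'b']:
    lb = trimmed[col] > raw[col].quantile(0.01)
    ub = trimmed[col] < raw[col].quantile(0.99)
    trimmed = trimmed[lb & ub]
    print(col, trimmed.shape)

# Single combined mask: keep rows strictly inside the cutoffs in every column.
cols = ['a', 'b']
lo = raw[cols].quantile(0.01)
hi = raw[cols].quantile(0.99)
mask = (raw[cols].gt(lo, axis=1) & raw[cols].lt(hi, axis=1)).all(axis=1)

# Check whether the two approaches keep exactly the same rows.
print(trimmed.equals(raw[mask]))
```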

**Calculating correlations**

- Using the dataset that excludes outliers

In [None]:
df2[['ni_ps', 'oancf_ps', 'price_f']].corr(method = 'pearson')

- Using the dataset that doesn't exclude outliers

In [None]:
df1[['ni_ps', 'oancf_ps', 'price_f']].corr(method = 'pearson')
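
When comparing the two correlation tables, it may help to remember that the Pearson coefficient can be moved substantially by a few extreme observations. The sketch below illustrates this on hypothetical simulated data (the names `clean` and `with_outlier` and all values are made up). It does not say anything about what the Compustat results look like.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y is positively related to x, plus noise.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)
clean = pd.DataFrame({'x': x, 'y': y})

# Append a single extreme observation that goes against the overall pattern.
extreme = pd.DataFrame({'x': [100.0], 'y': [-100.0]})
with_outlier = pd.concat([clean, extreme], ignore_index=True)

# Pearson correlation with and without the extreme observation.
r_clean = clean['x'].corr(clean['y'], method='pearson')
r_out = with_outlier['x'].corr(with_outlier['y'], method='pearson')
print('without the extreme row:', round(r_clean, 3))
print('with the extreme row:   ', round(r_out, 3))
```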