# Competition 1 #

#### Research Question & Goal ####

What are the determinants of the IPO underpricing phenomenon? Our goal as a group is to identify and understand the determinants that drive IPO underpricing.

### Business Understanding ###

According to Investopedia, underpricing is the listing of an initial public offering (IPO) below its market value. When the offer price of the stock is lower than the price of the first trade, the stock is considered underpriced. This lasts only a short time, as demand for the stock drives the price back up toward its market value.
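The definition above can be made concrete with a small sketch. The function names and the example prices below are hypothetical and chosen only for illustration; the comparison itself follows the definition (offer price below the first trading price).

```python
def first_day_return(offer_price: float, first_day_price: float) -> float:
    """Fractional gain from the offer price to the first-day trading price."""
    if offer_price <= 0:
        raise ValueError("offer_price must be positive")
    return (first_day_price - offer_price) / offer_price


def is_underpriced(offer_price: float, first_day_price: float) -> bool:
    """True when the stock first trades above its offer price."""
    return first_day_price > offer_price


# Hypothetical IPO: offered at 10.00, first trades at 12.50.
print(first_day_return(10.0, 12.5))  # 0.25
print(is_underpriced(10.0, 12.5))    # True
```

In the dataset described below, the two inputs would presumably correspond to the offer price and the first-day trading price columns.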

From the company's standpoint, the goal is to set the initial public offering price as high as possible, which raises the most capital. The quantitative inputs to the offering price come from financial analysis of the company itself: before the IPO, the company is analyzed by its sales, expenses, earnings, and cash flow. Of these, earnings and expected earnings growth are the biggest factors. Marketability within a specific industry and conditions in the general market can also drive an IPO price up or down.

Once the investment bankers or IPO underwriters have determined the IPO price of the company's stock, the company markets the IPO to potential investors the day before the stock is offered publicly. IPOs have historically been viewed as risky investments because little historical data is available on them. The less liquid and less predictable the IPO shares are expected to be, the more likely they are to be underpriced to compensate investors for the assumed risk. Companies also underprice their IPOs to entice more investors to buy the stock and so raise more capital.

With all of this information about initial public offerings, can a few determinants be identified that explain why the underpricing phenomenon exists? The dataset we have been given describes companies and their IPOs, with variables grouped as IPO Offering, IPO Characteristics, Textual Characteristics, Sentiment Characteristics, Target Variables, Control Variables, and IPO Identifiers.

The variables that have been provided are listed below:

- P(IPO) - Offer Price
- P(H) - Price Range Higher Bound
- P(L) - Price Range Lower Bound
- P(1Day) - First Day Trading Price
- C1 - Days
- C2 - Top-Tier Dummy
- C3 - Earnings per Share
- C4 - Prior NASDAQ 15-Day Returns
- C5 - Outstanding Shares
- C6 - Offering Shares
- C7 - Sales
- T1 - Number of Sentences
- T2 - Number of Words
- T3 - Number of Real Words
- T4 - Number of Long Sentences
- T5 - Number of Long Words
- S1 - Number of Positive Words
- S2 - Number of Negative Words
- S3 - Number of Uncertain Words
- Y1 - Pre-IPO Price Revision
- Y2 - Post-IPO Initial Return
- C3' - Positive EPS Dummy
- C5' - Share Overhang
- C6' - Up Revision
- I1 - Ticker
- I2 - Company Name
- I3 - Standard Industry Classifier
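The primed control variables in the list are not columns of the raw file, so they would have to be derived. The sketch below shows one possible derivation for two of them on made-up rows. The rule for C3' follows directly from its name (positive EPS). The formula for C5' (outstanding shares divided by offering shares) is an assumption, since the list above does not define it.

```python
import numpy as np
import pandas as pd

# Toy rows with made-up values; column names follow the variable list above.
toy = pd.DataFrame({
    "C3": [0.47, -0.85, np.nan],                        # earnings per share
    "C5": [30_000_000.0, 20_000_000.0, 50_000_000.0],   # outstanding shares
    "C6": [6_000_000.0, 5_000_000.0, 10_000_000.0],     # offering shares
})

# C3': 1 when EPS is positive, 0 otherwise; rows with missing EPS stay missing.
toy["C3'"] = (toy["C3"] > 0).astype(float).where(toy["C3"].notna())

# C5': ASSUMED definition, outstanding shares divided by offering shares.
toy["C5'"] = toy["C5"] / toy["C6"]

print(toy[["C3'", "C5'"]])
```

Keeping missing EPS as missing, rather than letting the comparison turn it into 0, avoids silently classifying unknown earnings as non-positive.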

## Data Understanding ##

In [30]:
# Importing useful packages
import pandas as pd
import numpy as np

# Read in the .xlsx data file, treating the string "none" as a missing value
data = pd.read_excel("Competition1_raw_data.xlsx", header=0, na_values="none")
df_data = pd.DataFrame(data)

In [44]:
pd.set_option('display.max_rows', 22)
print(df_data.dtypes)

I1          object
I2          object
I3          object
P(IPO)     float64
P(H)       float64
P(L)       float64
P(1Day)    float64
C1         float64
C2         float64
C3         float64
C4         float64
C5         float64
C6         float64
C7         float64
T1         float64
T2         float64
T3         float64
T4         float64
T5         float64
S1         float64
S2         float64
S3         float64
dtype: object


In [39]:
# Convert '-' placeholders to null values
df_data.replace('-', np.nan, inplace=True)
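A placeholder such as '-' can leave a numeric column stored as text. The sketch below uses a made-up three-row frame (not the competition data) to show the replacement followed by an explicit conversion with `pd.to_numeric`. Whether any column of the real file needs this conversion is an assumption; the dtypes printed earlier suggest the numeric columns are already `float64`.

```python
import numpy as np
import pandas as pd

# Toy frame: a numeric column that was read as text because of a '-' entry.
toy = pd.DataFrame({
    "I1": ["AATI", "ABPI", "ACAD"],
    "C7": ["103.8", "-", "331.1"],
})

# Replace the placeholder with NaN, returning a new frame.
cleaned = toy.replace("-", np.nan)

# Coerce the column to numeric; anything unparseable would also become NaN.
cleaned["C7"] = pd.to_numeric(cleaned["C7"], errors="coerce")

print(cleaned.dtypes)
```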

In [46]:
# Count the number of null values in each data column
print(df_data.isnull().sum(axis=0).tolist())

[0, 0, 8, 5, 10, 10, 22, 22, 22, 36, 22, 6, 6, 72, 1, 1, 1, 1, 1, 1, 1, 1]
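The bare list above has to be matched to column names by position. A minimal alternative, shown on a made-up frame, keeps the counts as a labeled Series and lists only the columns that actually contain nulls:

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values, used only to illustrate the labeled output.
toy = pd.DataFrame({
    "I1": ["AATI", "ABPI", "ACAD"],
    "P(IPO)": [10.0, np.nan, 13.5],
    "C7": [np.nan, np.nan, 103.8],
})

# isnull().sum() returns a Series indexed by column name.
null_counts = toy.isnull().sum()

# Keep only columns with at least one null, largest count first.
null_counts = null_counts[null_counts > 0].sort_values(ascending=False)

print(null_counts)
```

Applied to `df_data`, the same two lines would presumably show each count next to its column label.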


In [49]:
df_data.describe()

| | P(IPO) | P(H) | P(L) | P(1Day) | C1 | C2 | C3 | C4 | C5 | C6 | C7 | T1 | T2 | T3 | T4 | T5 | S1 | S2 | S3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 677.0 | 672.0 | 672.0 | 660.0 | 660.0 | 660.0 | 646.0 | 660.0 | 676.0 | 676.0 | 610.0 | 681.0 | 681.0 | 681.0 | 681.0 | 681.0 | 681.0 | 681.0 | 681.0 |
| mean | 13.837666 | 15.48119 | 13.515045 | 25.934766 | 149.728788 | 0.859091 | 1.788904 | 0.007282 | 49357760.0 | 12415190.0 | 500.459962 | 465.634361 | 12758.606461 | 11395.844347 | 294.353891 | 679.220264 | 68.421439 | 120.104258 | 144.759178 |
| std | 6.053731 | 6.653429 | 5.835646 | 73.234948 | 152.817467 | 0.348192 | 162.666532 | 0.033318 | 104376400.0 | 25128550.0 | 1648.337634 | 175.741647 | 5449.644597 | 4839.670179 | 121.532637 | 472.914323 | 39.096525 | 84.828959 | 69.276285 |
| min | 3.0 | 0.0 | 3.0 | 0.0 | 10.0 | 0.0 | -786.239 | -0.162352 | 3693227.0 | 525000.0 | 0.074 | 132.0 | 0.0 | 0.0 | 0.0 | -1.0 | -1.0 | 20.0 | 26.0 |
| 25% | 10.0 | 12.5 | 11.0 | 11.0 | 85.0 | 1.0 | -0.8525 | -0.013927 | 18714170.0 | 5000000.0 | 37.24575 | 351.0 | 9195.0 | 8162.0 | 213.0 | 462.0 | 45.0 | 73.0 | 100.0 |
| 50% | 13.5 | 15.0 | 13.0 | 14.845 | 107.0 | 1.0 | 0.01 | 0.009125 | 27400180.0 | 7398704.0 | 103.833 | 444.0 | 12045.0 | 10785.0 | 279.0 | 624.0 | 60.0 | 100.0 | 134.0 |
| 75% | 17.0 | 17.0 | 15.0 | 20.485 | 155.25 | 1.0 | 0.47 | 0.031571 | 49807860.0 | 12000000.0 | 331.138 | 551.0 | 15241.0 | 13760.0 | 354.0 | 795.0 | 85.0 | 142.0 | 173.0 |
| max | 85.0 | 135.0 | 108.0 | 1159.200562 | 2087.0 | 1.0 | 3864.5 | 0.092896 | 2138085000.0 | 421233600.0 | 30683.0 | 1750.0 | 49056.0 | 43952.0 | 1058.0 | 10277.0 | 309.0 | 944.0 | 883.0 |
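The summary statistics report a minimum of -1 for T5 (number of long words) and S1 (number of positive words), which cannot be genuine counts. One way to handle this, sketched on made-up rows, is to mask negative values in those columns as missing. Treating -1 as a missing-value sentinel is an assumption; the source file does not document its meaning.

```python
import pandas as pd

# Toy rows with made-up values; -1 mimics the suspicious minimum seen above.
toy = pd.DataFrame({
    "T5": [462.0, -1.0, 624.0],
    "S1": [45.0, 60.0, -1.0],
})

count_cols = ["T5", "S1"]  # columns that should hold non-negative counts

# mask() replaces entries where the condition is True with NaN.
toy[count_cols] = toy[count_cols].mask(toy[count_cols] < 0)

print(toy.isnull().sum())
```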


In [50]:
df_data.I1.unique()

array(['AATI', 'ABPI', 'ACAD', 'ACHN', 'ACLI', 'ACOM', 'ACOR', 'ACRX',
       'ACTV', 'ACW', 'ADKU', 'ADLS', 'ADZA', 'AFFY', 'AGAM', 'AH',
       'AHII', 'AIMC', 'AIRV', 'ALGT', 'ALIM', 'ALJ', 'ALLI', 'ALNY',
       'ALTU', 'ALXA', 'AMBA', 'AMIS', 'AMRC', 'ANAC', 'ANFI', 'ANGI',
       'ANGO', 'AONE', 'AOSL', 'APEI', 'APKT', 'ARAY', 'ARBX', 'ARCL',
       'ARCO', 'AREX', 'ARII', 'ARP', 'ARST', 'ARTE', 'ARUN', 'ASPV',
       'ATEC', 'ATHN', 'ATRC', 'AUXL', 'AVAV', 'AVEO', 'AVGO', 'AVNC',
       'AVR', 'AVRX', 'AWAY', 'AWK', 'AYR', 'BAGL', 'BAH', 'BALT', 'BARE',
       'BAS', 'BBBB', 'BBG', 'BBND', 'BBRG', 'BBW', 'BCOV', 'BDAY', 'BDE',
       'BEAT', 'BECN', 'BFAM', 'BFRM', 'BHRT', 'BIOD', 'BIOF', 'BKC',
       'BKRS', 'BLMN', 'BLOG', 'BLSW', 'BLT', 'BMTI', 'BNNY', 'BODY',
       'BOX', 'BPI', 'BRNC', 'BSFT', 'BTRX', 'BUN', 'BV', 'BWLD', 'BWTR',
       'BWY', 'CAB', 'CABG', 'CADX', 'CALD', 'CALL', 'CALX', 'CAP',
       'CARB', 'CATM', 'CBEY', 'CBOU', 'CCO', 'CDL', 'CDM', 'CE', 'CELM',
  