# Competition 1 #

#### Research Question & Goal ####

What are the determinants of the IPO underpricing phenomenon? Our job as a group is to identify and understand the underlying determinants that factor into IPO underpricing.

### Business Understanding ###

According to Investopedia.com, underpricing is the listing of an initial public offering (IPO) below its market value. When the offer price of the stock is lower than the price of the first trade, the stock is considered to be underpriced. This lasts only a short time, as demand for the stock drives the price back up to its market value.
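The definition above can be sketched in a few lines of pandas. This is only an illustration: the column names `offer_price` and `first_day_price` and the toy values are hypothetical, not taken from the competition dataset, and the initial-return formula is the usual percentage change from offer price to first-day price, which we assume here rather than take from the source.

```python
import pandas as pd

# Hypothetical toy values for illustration only (not from the dataset).
ipo = pd.DataFrame({
    "offer_price": [15.0, 20.0, 12.0],
    "first_day_price": [18.0, 19.0, 12.0],
})

# An IPO counts as underpriced when the offer price is lower than
# the price of the first trade.
ipo["underpriced"] = ipo["offer_price"] < ipo["first_day_price"]

# Assumed measure of the size of the underpricing: the relative change
# from the offer price to the first-day price.
ipo["initial_return"] = (
    (ipo["first_day_price"] - ipo["offer_price"]) / ipo["offer_price"]
)

print(ipo)
```

In the provided data, the corresponding columns would presumably be `P(IPO)` and `P(1Day)`, once they have been converted to numeric types.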

From the company's standpoint, the goal is to set the initial public offering price as high as possible, which in turn raises the most capital. The quantitative factors that go into an initial public offering come from financial analysis reports on the company itself. Before the IPO, the company is analyzed by its sales, expenses, earnings, and cash flow. Of these, the company's earnings and expected earnings growth are the biggest factors in the IPO price. Marketability within a specific industry and in the general market can also drive an IPO price up or down.

Once the investment bankers or IPO underwriters determine the IPO price of the company's stock, the day before the stock is offered publicly, the company will market the IPO to potential investors. IPOs are viewed as risky investments because little historical data is available on them. The less liquid and the less predictable the IPO shares are, the more likely they are to be underpriced to compensate for the assumed risk. Companies also underprice their IPOs to entice more investors to buy shares and thereby raise more capital.

With all of this information about initial public offerings, are there a few determinants that can be identified as reasons why the phenomenon of underpricing exists? The dataset that we have been provided contains information about companies and their IPOs, grouped into IPO Offering, IPO Characteristics, Textual Characteristics, Sentiment Characteristics, Target Variables, Control Variables, and IPO Identifiers.

The variables that have been provided are listed below:

- P(IPO) - Offer Price
- P(H) - Price Range Higher Bound
- P(L) - Price Range Lower Bound
- P(1Day) - First Day Trading Price
- C1 - Days
- C2 - Top-Tier Dummy
- C3 - Earnings per Share
- C4 - Prior NASDAQ 15-Day Returns
- C5 - Outstanding Shares
- C6 - Offering Shares
- C7 - Sales
- T1 - Number of Sentences
- T2 - Number of Words
- T3 - Number of Real Words
- T4 - Number of Long Sentences
- T5 - Number of Long Words
- S1 - Number of Positive Words
- S2 - Number of Negative Words
- S3 - Number of Uncertain Words
- Y1 - Pre-IPO Price Revision
- Y2 - Post-IPO Initial Return
- C3' - Positive EPS Dummy
- C5' - Share Overhang
- C6' - Up Revision
- I1 - Ticker
- I2 - Company Name
- I3 - Standard Industry Classifier

## Data Understanding ##

In [1]:
# Importing useful packages
import pandas as pd
import numpy as np

# Read in the .xlsx data file; pd.read_excel already returns a DataFrame
data = pd.read_excel("Competition1_raw_data.xlsx", header=0, na_values='None')
df_data = pd.DataFrame(data)

In [2]:
# Printing Data Types for Initial Analysis
print(data.dtypes)

I1         object
I2         object
I3         object
P(IPO)     object
P(H)       object
P(L)       object
P(1Day)    object
C1         object
C2         object
C3         object
C4         object
C5         object
C6         object
C7         object
T1         object
T2         object
T3         object
T4         object
T5         object
S1         object
S2         object
S3         object
dtype: object


In [8]:
# Printing Initial Descriptive Statistics for the Dataframe
pd.set_option('display.max_columns', 200)
print(data.describe())

          I1               I2    I3  P(IPO)  P(H)  P(L) P(1Day)   C1   C2  \
count    682              682   682     682   682   682     682  682  682   
unique   682              682   202      62    52    48     531  245    3   
top     NPTT  LIFELOCK, INC.   2834      15    16    14       -    -    1   
freq       1                1    76      54   109   105      22   22  567   

         C3   C4   C5       C6   C7   T1   T2   T3   T4   T5   S1   S2   S3  
count   682  682  682      682  682  682  682  682  682  682  682  682  682  
unique  373  471  675      336  611  408  672  657  332  466  146  222  214  
top       -    -    -  5000000    -  411    0    0  256  458   46   77  110  
freq     36   22    6       48   72    7    2    3    6    5   15   11   12  


In [3]:
# Checking Features for "Hidden Missing" Values
data.C1.unique()
data.C2.unique()
data.C3.unique()
data.C4.unique()
data.C5.unique()
data.C6.unique()
data.C7.unique()

array([51.345, 25.936, 7.378, 8.526, 632.298, 197.591, 5.146, '-', 279.6,
       494.008, 0.32, 26.499, 0.074, 166.896, 510.192, 571.192, 363.465,
       170.27, 132.5, 1707.564, 64.606, 0.176, 3.88, 2.23, 94.739,
       345.322, 428.516, 18.643, 257.056, 59.039, 38.434, 68.525, 185.076,
       40.045, 36.12, 52.897, 519.966, 32.068, 3018.118, 46.672, 608.16,
       69.833, 72.503, 74.005, 75.813, 19.157, 8.822, 139.357, 20.719,
       1699, 935.468, 1.9, 167.884, 2214.215, 32.978, 389.962, 4351.218,
       259.295, 311.502, 92.478, 75.436, 176.624, 311.709, 213.672,
       60.169, 37.811, 143.062, 72.992, 559.54, 1070.938, 22.65, 1.94,
       148.224, 3841.264, 24.568, 8.235, 4.517, 117.616, 198.834, 141.873,
       218.29, 21.873, 95.623, 0.367, 14.798, 64.482, 96.107, 12.231,
       918.513, 1392.423, 26.586, 22.488, 232.947, 38.563, 293.605,
       113.311, 160.871, 2447.04, 348.869, 5069, 371.445, 6.616, 1650.652,
       506.305, 8.678, 330.118, 59.261, 207.502, 1239.711, 519.023,
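The output above shows that missing values are encoded as the string `'-'`, which is why every column is read as `object`. A minimal sketch of one way to surface and convert these hidden missing values follows; the small `sample` frame is a hypothetical stand-in for the raw data, and coercing with `pd.to_numeric` is our assumption about a reasonable cleaning step, not necessarily the approach used later in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for two columns of the raw data, using the
# same '-' placeholder for missing values.
sample = pd.DataFrame({
    "C1": [122, "-", 259],
    "C7": [51.345, "-", 279.6],
})

# Count the hidden missing values per column before converting.
hidden_missing = (sample == "-").sum()
print(hidden_missing)

# errors="coerce" turns anything that cannot be parsed as a number
# (here, the '-' placeholder) into NaN.
cleaned = sample.apply(pd.to_numeric, errors="coerce")
print(cleaned.dtypes)
print(cleaned.isna().sum())
```

Counting the placeholders first is a cheap sanity check: after coercion, the number of NaN values per column should match the number of `'-'` entries, otherwise some other unparseable value was silently converted as well.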

In [9]:
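# Converting identifier columns to strings
# (astype returns a new Series, so the result must be assigned back)
data['I1'] = data.I1.astype(str)
data['I2'] = data.I2.astype(str)
data['I3'] = data.I3.astype(str)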

# Printing Data Types for Secondary Analysis


ValueError: Cannot convert non-finite values (NA or inf) to integer
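The error above is what pandas raises when a column containing NaN is cast to a plain integer type, since NumPy integers cannot represent missing values. The sketch below reproduces the problem on a hypothetical series and shows one possible workaround, the nullable `Int64` dtype; choosing `Int64` over dropping or imputing the missing rows is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical column with the '-' placeholder for a missing value.
days = pd.Series([122, "-", 259], name="C1")

# Coerce to numeric: the placeholder becomes NaN and the dtype is float64.
numeric = pd.to_numeric(days, errors="coerce")

# Casting NaN to a plain integer dtype fails with a ValueError.
try:
    numeric.astype(int)
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print("astype(int) failed:", err)

# The nullable integer dtype keeps the missing entry as <NA>.
as_nullable_int = numeric.astype("Int64")
print(as_nullable_int)
```

If later modeling steps need ordinary NumPy integers, the missing entries would have to be dropped or imputed before casting.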