# DSFB course project: predicting IPO market valuation and price change

<center>
!!!  
  
__IMPORTANT NOTE:__  
  
We have extended the final deadline for committing your project repo to the midnight before the day of your final presentation.  
  
!!!  
</center>

## Introduction

An Initial Public Offering (IPO) is the process by which a private company offers shares of common equity stock to public investors in exchange for capital. Those shares can then be publicly traded on a stock exchange. During an IPO, the company hires an investment bank (its "underwriter") to help determine the quantity and price of the equity shares initially offered for sale. The initial shares are typically sold to institutional investors at a fixed price before they are traded on the open market. Because the initial purchasers of the IPO shares and the open market may value the shares differently, there is often a rapid readjustment (up or down) in the price per share, between the price at which the stock is sold to the initial institutional investors (i.e., the offering price) and the price at which the stock equilibrates after some amount of trading on the open market.


## Data

You will be provided with a range of different data for firms conducting an Initial Public Offering (IPO). The data was collected from several commercial data providers (e.g., SDC, COMPUSTAT, CRSP, and VentureExpert) by researchers in the TIS Lab at EPFL. The data is licensed from the original providers and provided to you under the EPFL license strictly for the purpose of completing this project. You may not distribute the data to any other party.

The data to build and train your models is provided to you in the file: __ipo.xlsx__  

The description for each variable is provided to you in the file: __variable_description.xlsx__
  
Both files can be found in the git course repository. Note that the files above are in Microsoft Excel format. You can import data from an Excel file with the *pandas.read_excel()* function and export a DataFrame to an Excel file with the *DataFrame.to_excel()* method.

## Key Variables

##### Offering_Price

The price at which a company sells its shares to investors, set during the bookbuilding process leading up to the IPO.

##### Closing_Price

The price at which shares trade in the open market, measured at the end of the first day of trading.

##### Market_Valuation

The market value of the IPO firm at the end of the first day of public trading. This variable simply equals the Closing Price multiplied by the number of outstanding common equity shares.

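As a quick illustration of this definition, the sketch below recomputes Market_Valuation from its two components; it can also serve as a sanity check on the data. The share-count column name `Shares_Outstanding` is a hypothetical placeholder, since the actual name in __ipo.xlsx__ should be looked up in __variable_description.xlsx__. The formula also shows why Closing_Price cannot be used as a feature: the target is a direct function of it.

```python
import pandas as pd

# HYPOTHETICAL column name: look up the real share-count column in
# variable_description.xlsx before using this.
SHARES_COL = "Shares_Outstanding"


def market_valuation(df: pd.DataFrame, shares_col: str = SHARES_COL) -> pd.Series:
    """Market_Valuation = first-day Closing_Price x outstanding common shares."""
    return df["Closing_Price"] * df[shares_col]


# Toy example (made-up numbers, not taken from ipo.xlsx)
toy = pd.DataFrame({"Closing_Price": [20.0, 35.5], SHARES_COL: [1_000_000, 2_000_000]})
print(market_valuation(toy).tolist())  # [20000000.0, 71000000.0]
```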
##### Risk_Factors

During the IPO disclosure process, the firm must disclose all relevant information about risk factors that might affect its future business performance. This information is contained in the __“Risk Factors”__ section of the IPO prospectus (the "prospectus" is a public document filed with the government, i.e., the U.S. Securities and Exchange Commission (SEC), as part of conducting an IPO). Although some firms might be concerned that disclosing sensitive information could harm them in the capital markets and/or help their competitors, firms are typically advised by legal counsel to be as transparent as possible in order to avoid future litigation (lawsuits) arising from withholding information. Therefore, the text of the Risk Factors section may (or may not) contain useful information for predicting what will happen when the stock begins to trade on the open market. That is for you to determine in Step 2 below!

## Instructions  

Use the training data described above to build the "best" possible model for each step described below. After training and cross-validating your model for each step, use it to predict the appropriate outcome for the "unseen" cases in the file __ipo_to_predict.xlsx__ (also provided in the git course repository). The outcome you are predicting depends on the step you are completing below. Insert your best predictions for 'Market_Valuation_Non_Textual' (Step 1), 'Market_Valuation_All' (Step 2), and 'Price_Change' (Step 3) into __ipo_to_predict.xlsx__ and commit that file with your final project.

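One possible way to write the predictions back is sketched below, assuming each step's predictions are already available as a list or array in the same row order as __ipo_to_predict.xlsx__. The helper names `fill_predictions` and `save_predictions` are hypothetical; the only library calls are standard pandas (`DataFrame.copy`, `DataFrame.to_excel`). `index=False` assumes the file has no meaningful index column, and writing `.xlsx` requires an Excel engine such as openpyxl to be installed.

```python
import pandas as pd

# Output column names as specified in the project instructions.
PREDICTION_COLUMNS = ("Market_Valuation_Non_Textual", "Market_Valuation_All", "Price_Change")


def fill_predictions(to_predict: pd.DataFrame, predictions: dict) -> pd.DataFrame:
    """Return a copy of `to_predict` with one prediction column filled per step."""
    out = to_predict.copy()  # leave the loaded frame untouched
    for column, values in predictions.items():
        if column not in PREDICTION_COLUMNS:
            raise KeyError(f"Unexpected prediction column: {column}")
        if len(values) != len(out):
            raise ValueError(f"{column}: got {len(values)} predictions for {len(out)} rows")
        out[column] = list(values)
    return out


def save_predictions(df: pd.DataFrame, path: str = "ipo_to_predict.xlsx") -> None:
    """Write the filled frame back to Excel (needs an engine such as openpyxl)."""
    df.to_excel(path, index=False)
```

In use, this might look like `to_predict = pd.read_excel("ipo_to_predict.xlsx")` followed by `save_predictions(fill_predictions(to_predict, {"Market_Valuation_Non_Textual": preds_step1}))`, where `preds_step1` stands for your Step 1 model's predictions.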
#### Step 1

Predict the end-of-day market valuation using all of the non-target, non-textual variables (i.e., do not use the 'Risk_Factors' variable or any of the outcome variables observed just after the IPO event, such as Closing_Price).

    Features: all non-target variables, but do not use 'Risk_Factors'
    
    Target: market valuation at the end of the first day of trading  
      
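A minimal sketch of the Step 1 feature selection, assuming the outcome columns are the three named in this document. Any other post-IPO outcome columns in __ipo.xlsx__ are not listed here and would need to be added after checking __variable_description.xlsx__. Keeping only numeric columns is a simplification for the sketch; categorical variables would normally be encoded rather than dropped.

```python
import pandas as pd

# Post-IPO outcome columns named in this document. ASSUMPTION: ipo.xlsx may
# contain further outcome columns; add them after reading variable_description.xlsx.
OUTCOME_COLS = ["Closing_Price", "Market_Valuation", "Price_Change"]
TEXT_COLS = ["Risk_Factors"]


def step1_features(df: pd.DataFrame) -> pd.DataFrame:
    """Drop outcome and text columns; keep numeric columns only (a simplification)."""
    to_drop = [c for c in OUTCOME_COLS + TEXT_COLS if c in df.columns]
    return df.drop(columns=to_drop).select_dtypes(include="number")


def step1_target(df: pd.DataFrame) -> pd.Series:
    """Step 1 target: market valuation at the end of the first trading day."""
    return df["Market_Valuation"]
```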

#### Step 2
  
Predict the end-of-day market valuation using all of the non-target variables. As in Step 1, do not use any of the outcome variables observed just after the IPO event (such as Closing_Price), but you should now also use the 'Risk_Factors' variable, in one way or in multiple ways.

    Features: all non-target variables  
    
    Target: market valuation at the end of the first day of trading  
      
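One way (among many) to bring 'Risk_Factors' into the model is to combine TF-IDF features of the text with the numeric variables in a single scikit-learn pipeline. The sketch below assumes a Ridge regressor, `max_features=5000`, and English stop-word removal, all arbitrary choices to be tuned; it also assumes the numeric columns have no missing values (add an imputer otherwise). Other options include simple text statistics (length, word counts) or sentence embeddings. The helper names are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def fill_missing_text(df: pd.DataFrame, text_col: str = "Risk_Factors") -> pd.DataFrame:
    """TfidfVectorizer cannot handle NaN, so replace missing risk text with ''."""
    out = df.copy()
    out[text_col] = out[text_col].fillna("").astype(str)
    return out


def build_step2_model(numeric_cols: list, text_col: str = "Risk_Factors",
                      alpha: float = 1.0) -> Pipeline:
    """Ridge regression on scaled numeric features plus TF-IDF of the risk text."""
    features = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        # A single column name (a string, not a list) hands TfidfVectorizer
        # the 1-D sequence of documents it expects.
        ("text", TfidfVectorizer(max_features=5000, stop_words="english"), text_col),
    ])
    return Pipeline([("features", features), ("model", Ridge(alpha=alpha))])
```

Because the vectorizer lives inside the pipeline, it is refit on each training fold during cross-validation, so vocabulary and IDF weights from validation folds do not leak into training.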

#### Step 3

Predict whether the closing price is higher than the offering price. If it is, set the binary variable 'Price_Change' to 1; otherwise, set it to 0.

    Features: all non-target variables  
    
    Target: whether the stock price goes up (closing price above offering price), as measured at the end of the first day of trading  
 
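Constructing the Step 3 label from the two price columns could look like the sketch below. Ties (closing price equal to offering price) map to 0, matching the "higher than" definition; rows with a missing price also compare as False and become 0, so they should be handled before labeling. The majority-class rate gives the accuracy of the simplest classification baseline. Function names are hypothetical.

```python
import pandas as pd


def price_change_label(df: pd.DataFrame) -> pd.Series:
    """1 if the first-day Closing_Price exceeds the Offering_Price, else 0.

    Ties and rows with a missing price both come out as 0, so handle missing
    prices before calling this.
    """
    return (df["Closing_Price"] > df["Offering_Price"]).astype(int)


def majority_baseline_accuracy(y: pd.Series) -> float:
    """Accuracy of always predicting the most frequent class."""
    return float(y.value_counts(normalize=True).max())
```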


## Requirements

FOR EACH STEP, we expect your solution to encompass or contain the following:

* preprocess the data and extract appropriate feature variables (this code can be shared across different steps)
* reduce features to your preferred subset
* train, tune and test different predictive models
* compare your models and explain why one particular model (for each step) is the best one
* be sure to compare and mention the performance of your model relative to some "baseline" model
* predict the appropriate outcomes using your final model and insert those predictions into __ipo_to_predict.xlsx__
* discuss what additional tasks might be tried in order to boost performance (but that you did not get to)

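For the baseline comparison, a minimal sketch using scikit-learn: a `DummyRegressor` that always predicts the training-fold mean serves as the baseline and is scored on the same shuffled, seeded folds as the candidate model. R² is an assumed metric here (`scoring` can be changed, e.g., to `"neg_mean_absolute_error"`); for Step 3, `DummyClassifier(strategy="most_frequent")` would play the same role. The demo data are synthetic, not the IPO data, and `compare_to_baseline` is a hypothetical helper.

```python
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score


def compare_to_baseline(model, X, y, seed: int = 42, scoring: str = "r2"):
    """Mean CV score of a predict-the-mean baseline and of `model`, on identical folds."""
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    baseline = DummyRegressor(strategy="mean")
    base_score = cross_val_score(baseline, X, y, cv=cv, scoring=scoring).mean()
    model_score = cross_val_score(model, X, y, cv=cv, scoring=scoring).mean()
    return base_score, model_score


# Synthetic demo (NOT the IPO data): a linear target that Ridge can fit.
X_demo = [[i, i % 7] for i in range(200)]
y_demo = [2.0 * i + (i % 7) for i in range(200)]
base_r2, ridge_r2 = compare_to_baseline(Ridge(), X_demo, y_demo)
print(f"baseline R^2 = {base_r2:.3f}, ridge R^2 = {ridge_r2:.3f}")
```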
## Deliverables

* Deliver a Jupyter notebook with an explanation of your methods, programming code, and results. Explicitly divide your notebook into different parts. Clearly describe which portions are common pre-processing, and which portions are tied to each of the steps described above. 
    

* Fill in __ipo_to_predict.xlsx__ with the predictions of your best model for each step and commit the resulting file.  
    

* We will create a new git repository for your team's project. Commit your final notebook and __ALL FILES__ into that repository so that we can run your code and evaluate all of your work based entirely on your repo.


* Note that you will also present your solution in the final session of the course - **more details to come**. 

## Suggestions

* Presenting your solution as a story is extremely important!   
  

* Document all of your assumptions (e.g. evaluation metric, hyper-parameter values, ...).  


* Make sure your code will run and results are reproducible (fix random seeds, etc.).  


*  Add comments to your blocks of code (and lines of code if needed) for any part of your story/logic that might not be obvious by looking at your code.    


* To speed up experimentation, you might use a small sample of the original dataset for your initial coding. Also try to use all available cores for computation by setting n_jobs=-1 wherever an estimator or search supports it. 


* Try to be creative to improve your predictions, but don't forget that it is also important to explain your line of thinking and reasoning.


* Your final grade is based on the whole process of doing the project and not just based on your results on the unseen data. __The quality and documentation of your Jupyter notebook is very important.__

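The reproducibility and speed suggestions can be combined as in the sketch below: one fixed seed for sampling, cross-validation splits, and the model, plus `n_jobs=-1` in the hyperparameter search. The 10% sample fraction, the random forest, and the parameter grid are illustrative assumptions, not recommendations for this dataset; the helper names are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

SEED = 42  # one seed reused for sampling, CV splits and the model


def dev_sample(df: pd.DataFrame, frac: float = 0.1) -> pd.DataFrame:
    """Reproducible random subset of the rows for fast initial coding."""
    return df.sample(frac=frac, random_state=SEED)


def tune_random_forest(X, y) -> GridSearchCV:
    """Small, seeded grid search; n_jobs=-1 runs the fits on all available cores."""
    search = GridSearchCV(
        RandomForestRegressor(random_state=SEED),
        param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
        cv=KFold(n_splits=5, shuffle=True, random_state=SEED),
        n_jobs=-1,
    )
    return search.fit(X, y)
```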
## Grading

Grading of the project (apart from presentation), is based on the following components. You may skip Step 3 if you do everything up through Step 2 (but you will lose the points associated with Step 3). Likewise, you may skip Steps 2 and 3 if you do everything up through Step 1.

* 20 %  ___ Documentation in your notebook
* 15 %  ___ Code quality / comments
* 15 %  ___ Pre-processing
* 20 %  ___ Step 1
* 20 %  ___ Step 2
* 10 %  ___ Step 3