We ran our scripts on the GPU server. The data is stored on the same server.

Raw data for cases and stock price change can be found at path-

# Main Directory 
# Case Data is inside the directory
# Stock Data is inside the directory

The data after processing and joining can be found at path -

# Main Directory 
# Random Forest for Stock Prediction Data
# Neural Network for Stock Prediction Data
# Neural Network for Firm Embeddings Data

To install the packages needed to run all the scripts, execute the command -

chmod 755

Go to the python shell and execute the command for downloading punkt -
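
A minimal sketch of that step, assuming "punkt" refers to the NLTK Punkt tokenizer models and that NLTK is already installed. The wrapper name `download_punkt` is hypothetical; `nltk.download("punkt")` is the standard NLTK call.

```python
def download_punkt():
    """Download the NLTK Punkt tokenizer models; returns True on success."""
    import nltk  # assumption: NLTK is already installed in this environment
    return nltk.download("punkt")
```

In the Python shell, calling `download_punkt()` (or `nltk.download("punkt")` directly) fetches the models once; later runs reuse the local copy.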


Scripts to process the raw case data-

These files are in the CaseData folder. The data generated by these scripts is combined with the 
stock data to create the final training data for the models. It can be found in the 
/data/WorkData/firmEmbeddings/CaseData/ folder on the server. Run the files in the following order -
1. filterCases.ipynb - Filters the cases in the sentences folder to keep those in categories 6 and 7. 
It uses bb2topic.pkl, bb2genis.pkl and caseid_date.csv. It generates a new folder, Filtered_1, and the 
files filtered.pkl and casedata.pkl. Filtered_1 contains all cases belonging to categories 6 and 7.

2. ngramdataGenerate.ipynb - Filters the bigram pickle files to keep the cases in categories 6 and 7. 
It uses casedata.pkl and [20180208]build_vocab_lemma_pos/phrased/ and creates a new folder, PickleFiles, 
which contains all cases belonging to categories 6 and 7.

3. bigram.ipynb - Creates the final ngramdata.pkl. The code uses id2gram.pkl, casedata.pkl, df-tf.pkl 
and the files from the PickleFiles folder to generate the data.

4. Uses text from Filtered_1 and runs doc2vec algorithm on filtered cases and generate 

5. modeltodata.ipynb - Uses casedata.pkl and doc2vec_2.model. It maps the model vectors to the case 
metadata and creates a visualization of the doc vectors. The code produces the following files: 
docvector.pkl, traindocvector.pkl, testdocvector.pkl, validationdocvector.pkl
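
The category filtering in steps 1 and 2 can be sketched roughly as follows. This is an illustration only: it assumes the topic mapping is a dict from case id to an integer category, which may not match the actual layout of bb2topic.pkl. The helper names `filter_cases`, `save_pickle` and `load_pickle` are hypothetical and do not come from the notebooks.

```python
import pickle

TARGET_CATEGORIES = {6, 7}  # the two categories named in the steps above


def filter_cases(case_to_category, categories=TARGET_CATEGORIES):
    """Keep only the cases whose category is in `categories`.

    Assumption: `case_to_category` maps case id -> integer category.
    """
    return {
        case_id: category
        for case_id, category in case_to_category.items()
        if category in categories
    }


def save_pickle(obj, path):
    """Write `obj` to `path` as a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_pickle(path):
    """Read a pickle file back into a Python object."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

For example, `filter_cases({"c1": 6, "c2": 3, "c3": 7})` keeps only `c1` and `c3`, and the result could then be written out with `save_pickle`.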

Script to process the raw Stock Data -

Run the script in the StockData directory to process the stock data.

Script to join the two data sets -

These files are in the JoiningDataPrep folder.

1. StockAndCaseDataJoined - Joins the case and stock data. This script uses stockData07to13_logdiff_5_0.1.csv 
and the following docvector files: traindocvector.pkl, testdocvector.pkl, validationdocvector.pkl. 
It produces the following files: training_data_CaseCompanyStockChange.pkl, 
testing_data_CaseCompanyStockChange.pkl, validation_data_CaseCompanyStockChange.pkl

2. ProcessJoinedDataForNN.ipynb - processes data for final run and creates val_data_final.pkl, 
train_data_final.pkl, test_data_final.pkl

3. Finaldata_stockPred.ipynb - produces the final stock prediction data for all cases and for 
categories 6 and 7

4. Finaldata_firmEmbed.ipynb - produces the final firm embeddings data for all cases and for 
categories 6 and 7. It uses Company_meta.pkl

5. RankCompany.ipynb - used to create Company_meta_rank.pkl

After running all these scripts, the data for all the models will be copied to the respective
paths mentioned above.
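
As an illustration of the join performed in step 1, here is a minimal sketch. The actual join key used by StockAndCaseDataJoined is not stated here, so the sketch assumes both data sets share `company` and `date` fields. Those field names, the `logdiff` column name, and the function `join_cases_with_stock` are all hypothetical.

```python
def join_cases_with_stock(case_rows, stock_rows):
    """Inner-join case rows with stock rows on (company, date).

    Assumption: every row is a dict; case rows carry "company" and "date",
    stock rows carry "company", "date" and "logdiff".
    """
    # Index the stock rows by the assumed join key for constant-time lookup.
    stock_by_key = {
        (row["company"], row["date"]): row["logdiff"] for row in stock_rows
    }
    joined = []
    for case in case_rows:
        key = (case["company"], case["date"])
        if key in stock_by_key:
            # Copy the case row and attach the matching stock price change.
            joined.append({**case, "stock_change": stock_by_key[key]})
    return joined
```

Cases with no matching stock row are dropped in this sketch. Whether the real script drops or keeps them is not documented above.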

Script to generate models for stock prediction and firm embeddings -

# Change file permissions to run the script
chmod 755

# Run the following command to execute the script -

This script contains three scripts. Their locations on GitHub are -
1. is present in the directory Random_Forest/
2. is present in the directory FirmEmbeddings/ 
3. is present in the directory StockPrediction/

The script will generate the Random Forest model and it will also plot the 
graph for actual vs predicted change in stock price.

After running the script, the predictions on the test data are saved in predictions.txt, in the 
same path as the data. This file, together with actual.txt (present in the same path), is used by 
the notebook StockPrediction/ScatterPlotPredictedvsActual.ipynb to plot the actual versus predicted 
stock price change. The notebook uses absolute paths for these files, so it can be run from 
anywhere on the GPU server.
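
A rough sketch of how the two files could be read and paired for plotting. It assumes each file holds one number per line, which is a guess about the file layout. The helper names `read_floats` and `paired_points` are hypothetical and are not taken from the notebook.

```python
def read_floats(path):
    """Read one float per non-empty line (assumed file layout)."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def paired_points(actual_path, predictions_path):
    """Return (actual, predicted) pairs, one per test example."""
    actual = read_floats(actual_path)
    predicted = read_floats(predictions_path)
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted files differ in length")
    return list(zip(actual, predicted))
```

The resulting pairs can be handed to any scatter-plot routine, for example matplotlib's `plt.scatter`, with actual values on one axis and predicted values on the other.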

After running the script, the firm embeddings matrix is saved in the same path as the data. 
This matrix is used by FirmEmbeddings/VisualizeFirmsEmbeddings.ipynb to visualize the embeddings. 
The notebook contains the t-SNE plots for category 6, category 7 and the combined cases. It also 
visualizes the embeddings against the industries of the firms, the ranking of the firms and the 
states in which they are located. The notebook also contains the cosine similarity plots for the 
two categories - Finance and Manufacturing.
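
For reference, the cosine similarity between two firm embedding vectors can be computed as in the sketch below. It is a plain-Python illustration of the standard formula, not the notebook's code, and the function name `cosine_similarity` is chosen here for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Values close to 1 indicate firms whose embeddings point in nearly the same direction, values near 0 indicate unrelated directions, and values close to -1 indicate opposite directions.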


Embed public firms based on their reaction to legal decisions





