ML approach to predict the direction of stock prices from DTMs

Matin-M/DTM-Predictor

Stock Direction Prediction

Stock Direction Predictor is an ensemble of machine learning models designed to predict the direction of stock prices based on Document Term Matrices (DTMs) constructed from the text of relevant news articles.

Repository Structure

All code and the corresponding evaluations can be found under the Classifiers directory. Each model variant is trained on the same dataset, and its cross-validation (CV) accuracy and classification report are appended at the bottom of the Jupyter notebook file.

All training data is located under Data/TrainingData_new, a directory containing the DTMs that have been preprocessed using TruncatedSVD to reduce their dimensionality.
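
As a rough illustration of that preprocessing step, the sketch below builds a small DTM and reduces it with scikit-learn's TruncatedSVD. The headlines, the use of CountVectorizer, and the choice of two components are assumptions made for the example; they are not taken from the repository's notebooks.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

# Toy headlines standing in for the scraped articles (hypothetical text).
docs = [
    "apple reports record iphone sales",
    "amazon expands cloud business",
    "apple stock falls after weak guidance",
    "amazon shares rise on strong holiday sales",
]

# Build a sparse document-term matrix: one row per document, one column per term.
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)

# Project the DTM onto a small number of latent components.
# n_components (an assumed value here) must be smaller than the number of terms.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(dtm)

print(dtm.shape, "->", reduced.shape)
```

TruncatedSVD accepts sparse input directly, so the DTM never has to be converted to a dense array before reduction.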

Libraries

  • Preprocessing (tokenization, stemming, etc.)
    • spaCy
  • Classifiers
    • scikit-learn
      • LogisticRegression
      • RandomForestClassifier
      • GaussianNB
    • NVIDIA XGBoost
  • Vector/Matrix operations
    • NumPy
    • Pandas

Classifier selection:

  • Logistic Regression
  • XGBoost
  • Random Forest
  • Naive Bayes
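
A minimal sketch of fitting several of these classifiers on the same data is shown below. The synthetic features from make_classification stand in for the SVD-reduced DTMs, and the hyperparameters and hold-out split are assumptions for illustration rather than the settings used in the notebooks. XGBoost is left out so the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for reduced DTM features and up/down labels.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Naive Bayes": GaussianNB(),
    # xgboost.XGBClassifier() exposes the same fit/score interface and could
    # be added here if xgboost is installed.
}

# Fit every model on the same training split and record hold-out accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```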

Evaluation metrics:

Results come from 10-fold cross-validation; class-wise metrics are averaged with weights proportional to each class's support.

  • Accuracy
  • F1-score
  • Precision
  • Recall
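
One way to compute these four metrics under 10-fold cross-validation is scikit-learn's cross_validate with the support-weighted scorer names. The sketch below assumes synthetic data, a single logistic regression model, and stratified, shuffled folds; the notebooks may set this up differently.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for reduced DTM features and up/down labels.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# The "_weighted" scorers average per-class scores weighted by class support.
scoring = ["accuracy", "f1_weighted", "precision_weighted", "recall_weighted"]
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

results = cross_validate(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring=scoring
)

# Report the mean and spread of each metric across the 10 folds.
for metric in scoring:
    fold_scores = results[f"test_{metric}"]
    print(f"{metric}: {fold_scores.mean():.3f} +/- {fold_scores.std():.3f}")
```

Support-weighted recall is mathematically equal to accuracy, so those two rows will always match; precision and F1 are the metrics that add information beyond accuracy.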

Data

  • Scraped news articles related to Apple and Amazon, spanning 2018 to early 2019, in JSON format
  • Stock data for Apple (AAPL) and Amazon (AMZN) in CSV format, collected at intervals of 5 mins, 15 mins, 30 mins, 1 hr, 4 hrs, and 1 day
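
The README does not state how direction labels are derived from the price files. The sketch below shows one plausible approach under stated assumptions: the CSV layout with timestamp and close columns is hypothetical, and so is the rule that labels a bar 1 when the next close is higher and 0 otherwise.

```python
import io

import pandas as pd

# Hypothetical CSV layout; the real files may use different column names.
csv_text = """timestamp,close
2018-06-01 09:30,190.0
2018-06-01 09:35,190.4
2018-06-01 09:40,190.1
2018-06-01 09:45,190.1
2018-06-01 09:50,190.9
"""
prices = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Assumed labeling rule: 1 if the next close is strictly higher, else 0.
next_close = prices["close"].shift(-1)
prices["direction"] = (next_close > prices["close"]).astype(int)

# The last row has no following bar, so its label is undefined and is dropped.
labeled = prices.iloc[:-1]
print(labeled)
```

Under this rule an unchanged close counts as "not up"; a real pipeline might treat flat intervals as a separate class or discard them.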

Authors