A) INTRODUCTION:
In June 2015, debt negotiations between the Greek government and its creditors broke off abruptly. Large market movements as a consequence of political and economic headlines are hardly uncommon; liquid markets are most susceptible to swings when news breaks. Using VIX as a proxy for market volatility, we investigate how macroeconomic headlines affect these changes. Here, we predict equity market value using tweets from major news sources, investment banks and notable economists.
B) PROBLEM STATEMENT:
Twitter provides a plethora of market data. In this project, we extracted around 100,000 tweets from various accounts to predict upward market movements. Using this data, we study how economic news affects the market.
C) TYPE OF MACHINE LEARNING:
This project is a regression problem. Regression is a predictive modelling technique that analyzes the relation between the target (dependent) variable and the independent variables in a dataset.
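As a minimal illustration of what a regression model does, the sketch below fits a straight line to a handful of made-up points using ordinary least squares. The function name `fit_line` and the sample values are assumptions chosen for illustration; they are not taken from the project data.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
```

With these points the fit recovers a slope of 2 and an intercept of 1. With several independent variables the same idea applies, but the coefficients are found by solving a linear system instead of a single ratio.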
METRICS USED: The performance of a regression model is reported as the error in its predictions; these errors summarize, on average, how close the predictions were to their expected values.
The metrics we have used in this project are:
1.) Root Mean Squared Error (RMSE)
2.) Mean Absolute Error (MAE)
3.) R-squared value (r2)
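To make the three metrics concrete, here is a small sketch that computes them from their standard definitions in plain Python. The helper name `regression_metrics` and the sample prices are hypothetical and used only for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for two equal-length lists of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # r2 compares the model's squared error with that of always predicting the mean
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Made-up closing prices and predictions
y_true = [100.0, 102.0, 104.0, 106.0]
y_pred = [101.0, 101.0, 105.0, 105.0]
rmse, mae, r2 = regression_metrics(y_true, y_pred)
```

Every prediction here is off by exactly 1, so RMSE and MAE are both 1, and r2 is 0.8. RMSE and MAE are in the units of the target (price), while r2 is unitless, which is why they are usually reported together.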
EDA includes extracting Twitter data by stock name (Apple, Tesla, Nvidia, PayPal and Microsoft) and cleaning the tweets that were pulled, i.e., removing unnecessary data from them. After cleaning the data, plots were produced against the sentiments, that is, Positive, Negative and Neutral.
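A cleaning step of this kind might look like the sketch below. What counts as "unnecessary data" is an assumption here (URLs, @mentions, #hashtags and stray symbols), and `clean_tweet` is a hypothetical helper, not the project's actual function.

```python
import re


def clean_tweet(text):
    """Strip URLs, mentions, hashtags and stray symbols from a tweet."""
    text = re.sub(r"https?://\S+", "", text)            # drop URLs
    text = re.sub(r"[@#]\w+", "", text)                 # drop @mentions and #hashtags
    text = re.sub(r"[^A-Za-z0-9$%.,!?' ]", " ", text)   # drop other symbols
    text = re.sub(r"\s+", " ", text)                    # collapse whitespace
    return text.strip()


raw = "RT @user: $TSLA is up 5% today! https://t.co/abc #stocks"
cleaned = clean_tweet(raw)
```

Dropping hashtags entirely is a design choice; if hashtags carry sentiment in the data, keeping the word and removing only the `#` may be preferable.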
We have implemented different ML models: Linear Regression, Random Forest Regression and Decision Tree Regressor. We chose Linear Regression for our project because it achieved r2 = 0.99974 and RMSE = 2.65. The plots support this decision. The model was trained and evaluated as follows:
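import math
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

# Features: date parts, stock name (assumed already numerically encoded) and sentiment scores
x = df5[['Year','Month','Day','StockName','Positive','Negative','Neutral']].to_numpy()
y = np.array(df5['Close'])  # target: closing price
tscv = TimeSeriesSplit()  # assumption: the original definition of tscv is not shown
for train_index, test_index in tscv.split(x):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]
    regresor = LinearRegression()
    regresor.fit(x_train, y_train)
    y_pred = regresor.predict(x_test)
    # Error metrics for this fold
    rmse = math.sqrt(metrics.mean_squared_error(y_test, y_pred))
    mae = metrics.mean_absolute_error(y_test, y_pred)
    r2 = metrics.r2_score(y_test, y_pred)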
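The loop above relies on `tscv`, which appears to be a time-ordered splitter in the style of scikit-learn's `TimeSeriesSplit`. The pure-Python sketch below illustrates the idea of such an expanding-window split, where each fold trains only on rows that come before the test rows. The function `expanding_window_splits` is a hypothetical stand-in written for illustration, not scikit-learn's implementation.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) with training rows always before test rows."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        train = list(range(0, start))                  # everything before the test window
        test = list(range(start, start + test_size))   # the next block of rows
        yield train, test


splits = list(expanding_window_splits(n_samples=6, n_splits=2))
# splits == [([0, 1], [2, 3]), ([0, 1, 2, 3], [4, 5])]
```

Splitting this way avoids training on prices from the future of the test window, which a random shuffle would allow. It assumes the rows are already sorted by date.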
We have deployed the model using the Flask framework, an open-source Python library that lets us build web apps for machine learning. It is hosted on Heroku, a container-based Platform as a Service (PaaS), because it is flexible and easy to host on.
Heroku : Visit here
Video : Click here
Anirudh Saxena
Sowmya Prakash