# Machine Learning Model - First Segment Project Deliverable

## Model Plan

1. Prepare the dataframe with columns: tweet text, price previous day, price next day, price_diff
2. Preprocess the tweet text into features (CountVectorizer, TF-IDF)
    1. Classification: predict if it goes up or down (Binomial Naive Bayes)
    2. Regression: predict the actual price difference (Random Forest; XGBoost or LightGBM if time allows)
3. Evaluate algorithms and discuss results
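
As a rough illustration of step 2, the sketch below wires TF-IDF features into a classifier and a regressor with scikit-learn. It is only a sketch under assumptions: the toy tweets and price differences are made up, the `went_up` column name is hypothetical, and `MultinomialNB` stands in for the "Binomial Naive Bayes" named in the plan, since the plan does not say which scikit-learn estimator is meant.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the prepared dataframe.
toy = pd.DataFrame({
    "text": [
        "tesla production record this quarter",
        "great delivery numbers record demand",
        "stock price is too high imo",
        "recall announced production delayed",
    ],
    "close_price_diff": [4.2, 1.3, -7.5, -2.1],
})

# Classification target (assumed name): 1 if the price rose, 0 otherwise.
toy["went_up"] = (toy["close_price_diff"] > 0).astype(int)

# Step 2.1: TF-IDF features feeding a Naive Bayes classifier.
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(toy["text"], toy["went_up"])

# Step 2.2: the same features feeding a Random Forest regressor.
reg = make_pipeline(
    TfidfVectorizer(),
    RandomForestRegressor(n_estimators=10, random_state=0),
)
reg.fit(toy["text"], toy["close_price_diff"])

new_tweet = ["record production and demand"]
predicted_direction = clf.predict(new_tweet)
predicted_diff = reg.predict(new_tweet)
```

Wrapping the vectorizer and the model in one pipeline keeps the vocabulary fitted on training text only, which matters once the data is split into train and test sets.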

### 1. Prepare the dataframe with columns: tweet text, price previous day, price next day, price_diff

In [1]:
# Set-up Imports:

import requests
import pandas as pd

In [2]:
# Import tweet data and keep only the date and text columns
tweets = pd.read_csv('https://raw.githubusercontent.com/angkohtenko/twitter_vs_stocks/kimberly_branch/Data/elon_tweets.csv')
tweets = tweets[['date', 'text']].copy()

# import API key for financialmodelingprep.com
from config import API_key_stocks

url = 'https://financialmodelingprep.com/api/v3/historical-price-full/TSLA?serietype=line&apikey='+ API_key_stocks

# Get data from API for Tesla stocks and reformat it
tesla = requests.get(url).json()
tesla_df = pd.DataFrame.from_dict(data=tesla['historical'])
tesla_df['date'] = pd.to_datetime(tesla_df.date)
tesla_df = tesla_df.set_index('date').resample('D').ffill().reset_index()
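
To make the effect of `resample('D').ffill()` concrete, here is a small sketch on made-up prices (not real TSLA data). The names `prices` and `daily` are hypothetical. It shows how non-trading days are filled with the last available close, so every calendar date has a price to merge against.

```python
import pandas as pd

# Hypothetical closes for a Friday and the following Monday only.
prices = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-08", "2021-01-11"]),
    "close": [880.0, 811.0],
})

# Upsample to one row per calendar day; the weekend rows
# inherit Friday's close through forward fill.
daily = prices.set_index("date").resample("D").ffill().reset_index()
```

In this toy case `daily` has four rows, and Saturday and Sunday both carry Friday's close of 880.0.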

In [3]:
# Convert dates to datetime and compute the previous and next calendar day
from datetime import timedelta

tweets['prev_date'] = pd.to_datetime(tweets.date) - timedelta(days=1)
tweets['next_date'] = pd.to_datetime(tweets.date) + timedelta(days=1)

In [4]:
# Check the shape after dropping rows with NaN
tweets.dropna().shape

(849, 4)

In [5]:
# Merge tweets with the previous-day and next-day closing prices
tweets_price = pd.merge(tweets, tesla_df, how='left', left_on='prev_date', right_on='date', suffixes=('', '_prev'))
tweets_price = pd.merge(tweets_price, tesla_df, how='left', left_on='next_date', right_on='date', suffixes=('', '_next'))
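
The way the `suffixes` arguments name the merged columns is easy to miss, so the following sketch repeats the two merges on a single made-up tweet. All values and the `_toy` names are hypothetical. It suggests that the previous-day close comes out as plain `close` (no other `close` column exists at the first merge), while the next-day close comes out as `close_next`.

```python
import pandas as pd

# One hypothetical tweet with its previous and next calendar day.
tweets_toy = pd.DataFrame({"date": ["2021-01-10"], "text": ["hypothetical tweet"]})
tweets_toy["prev_date"] = pd.to_datetime(tweets_toy["date"]) - pd.Timedelta(days=1)
tweets_toy["next_date"] = pd.to_datetime(tweets_toy["date"]) + pd.Timedelta(days=1)

# Hypothetical daily prices (already forward-filled).
prices_toy = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-09", "2021-01-10", "2021-01-11"]),
    "close": [880.0, 880.0, 811.0],
})

# Same two merges as in the cell above.
merged = pd.merge(tweets_toy, prices_toy, how="left",
                  left_on="prev_date", right_on="date", suffixes=("", "_prev"))
merged = pd.merge(merged, prices_toy, how="left",
                  left_on="next_date", right_on="date", suffixes=("", "_next"))

columns = list(merged.columns)
```

Inspecting `columns` shows `close` and `close_next` but no `close_prev`, so the previous-day column still has to be renamed afterwards.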

In [6]:
# Rename the previous-day close, keep the needed columns, and compute the price difference
tweets_price = tweets_price.rename(columns={'close': 'close_prev'})
tweets_price = tweets_price[['date', 'text', 'close_prev', 'close_next']].copy()
tweets_price['close_price_diff'] = tweets_price['close_next'] - tweets_price['close_prev']
tweets_price.dropna(inplace=True)
tweets_price.shape

(845, 5)