# Tesla: Perception in Social Media and Changes in Stock Prices

### Erin Belles (132452), Bram Poldervaart (785939), & Iris Warnaar (115795)
##### April 14, 2016

## Question

How does public perception of Tesla's new Model 3 correlate with changes in its stock price? 
Our study examines 7 days of data following the announcement of Tesla's new Model 3 (days on which the NASDAQ exchange was open), analysing trending tweets and hashtags on Twitter as a measure of public perception, as well as the company's closing stock price movements on the NASDAQ. 


## Motivation

With the advent of data mining techniques, social media offers easy access to consumer and investor opinions. [Tesla Motors](https://www.teslamotors.com/about) is an innovator in the automotive industry focusing on sustainable luxury electric cars whose mission is to “accelerate the world’s transition to sustainable energy.”  

Tesla is known for data sharing and open access to its patents. [In June 2014](https://www.teslamotors.com/blog/all-our-patent-are-belong-you) Tesla opened up its patents for public use in order to promote the growth of a common technology platform for the advancement of electric cars and alternative energy, with the goal of collectively solving the carbon crisis.  

The company has been [operating at a loss since it went public in 2010](http://www.bloomberg.com/news/articles/2015-03-04/as-tesla-gears-up-for-suv-investors-ask-where-the-profits-are), but CEO Elon Musk is focused on long-term sustainability and profitability. In 2014 alone Tesla lost \$294 million on its \$3.2 billion in revenue.  

Given Tesla's business model, both the public's desire to transition to greener energy and its perception of Tesla are crucially important to the company. We therefore study the link between Twitter activity about Tesla, in particular how positive or negative the top trending Tesla tweets are, and the movements in Tesla's closing stock price.  


## Method

Our social media data was obtained by mining Twitter. We followed the method described in [O'Reilly Media's Mining the Social Web, 2nd Edition](https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition). Our readings were taken on weekdays at 9:00 am EST (3 pm in the UTC+1 time zone), before the [NASDAQ](http://www.nasdaq.com/about/trading-schedule.aspx) opens at 9:30 am. We chose this time to limit simultaneity (reverse causality): real-time movements in the stock price may have some impact on the number of tweets, while public opinion may influence movements in the stock price at the same time. At each reading we retrieved recent tweets containing the hashtag #Tesla (an initial batch of up to 100 tweets plus up to five further batches) and counted the number of times #Tesla, #tesla, #teslamodel3, #Model3, or #model3 were used. The totals were aggregated into one figure. We then retrieved the top 5 trending (most retweeted) tweets with a Tesla hashtag. A dummy variable was created equal to 1 if the majority of these tweets were positive and equal to 0 otherwise.
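The aggregation of hashtag variants into one figure could be sketched as follows. This is a minimal illustration, not the code used for the study: the function name `count_tesla_hashtags` and the sample list are made up, and lower-casing every tag is an assumption that also counts spellings such as `#TESLA`, which are not listed above.

```python
from collections import Counter

# Hashtag variants that count towards the aggregate (compared in lower case).
TRACKED = {"tesla", "teslamodel3", "model3"}

def count_tesla_hashtags(hashtags):
    """Return the total number of tracked hashtags, ignoring case."""
    counts = Counter(tag.lower() for tag in hashtags)
    return sum(counts[tag] for tag in TRACKED)

# Illustrative input only (not data from the study).
sample = ["Tesla", "tesla", "Model3", "model3", "teslamodel3", "ElonMusk"]
print(count_tesla_hashtags(sample))  # 5
```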

The remaining financial data, including the adjusted closing stock prices of [Tesla](http://finance.yahoo.com/q?s=TSLA) and the [NASDAQ Composite index (^IXIC)](http://finance.yahoo.com/q?s=%5EIXIC), were obtained via the [NASDAQ](http://www.nasdaq.com/) and [Yahoo Finance](http://finance.yahoo.com/) websites. 

#### Model Specifications:

$ \Delta \text{Tesla Stock Price}_t = \beta_0 + \beta_1 \Delta \text{NASDAQ Composite Stock Price}_t + \beta_2 \text{Positive Top 5 Tweets}_t + \beta_3 \log(\text{Number of Tesla Hashtags}_t) + u_t $

We first consider this model specification because we postulate that percentage changes in the number of Tesla hashtags may be more relevant than absolute changes. Furthermore, taking the log compresses the scale of the hashtag variable and so reduces its variance. 

$ \Delta \text{Tesla Stock Price}_t = \beta_0 + \beta_1 \Delta \text{NASDAQ Composite Stock Price}_t + \beta_2 \text{Positive Top 5 Tweets}_t + \beta_3 \text{Number of Tesla Hashtags}_t + u_t $

We next consider this second model specification because the absolute number of Tesla hashtags may be more relevant than the percentage change in hashtags. Furthermore, if there is little variation in the number of hashtags in our data sample, then taking the log may flatten out what variation there is. 
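The link between the two hashtag series can be checked with a few lines of Python. The sketch below assumes the natural logarithm and rounding to two decimals; the counts are the daily hashtag totals listed in the descriptive statistics.

```python
import math

# Daily hashtag counts as listed in the descriptive statistics.
hashtag_counts = [297, 268, 239, 226, 254, 246, 254]

# Natural log of each count, rounded to two decimals.
log_counts = [round(math.log(c), 2) for c in hashtag_counts]
print(log_counts)  # [5.69, 5.59, 5.48, 5.42, 5.54, 5.51, 5.54]
```

Under these assumptions the first six values match the log series used as input for Model I. The seventh comes out as 5.54 rather than the 5.36 listed there, which may point to a transcription error in one of the two series.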

In both cases the change in the stock price is used rather than its level, because changes are assumed to be independent across time. Since our data set is already very limited at 7 observations, using a lagged model would further reduce the number of usable observations. 
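A minimal sketch of how daily changes are derived from closing prices is given below. The prices are hypothetical and only serve to show that every differencing (or lagging) step costs one observation.

```python
def first_differences(prices):
    """Return day-over-day changes; the result has one element fewer than the input."""
    return [round(today - yesterday, 2) for yesterday, today in zip(prices, prices[1:])]

# Hypothetical closing prices, for illustration only (not the study's data).
closing = [230.00, 237.50, 246.90, 255.40]
print(first_differences(closing))  # [7.5, 9.4, 8.5]
```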

The NASDAQ composite stock price was chosen since Tesla is traded on this exchange. 

The top 5 tweets mined from Twitter were read by us and categorized as positive, negative, or neutral. Positive Top 5 Tweets is a dummy (Bernoulli) variable taking a value of 1 if the majority of trending Tesla tweets (at least 3 of 5) are positive and 0 otherwise. We chose not to include a negative dummy in the main specifications because it would be highly collinear with the positive dummy (the results were rarely predominantly neutral), and our sample is too small to offset the resulting loss of precision and the higher likelihood of a type II error. In our [robustness check](#Robustness-Check) we included a dummy for negative to check our model specification. 
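The construction of the dummy from the hand-assigned labels can be sketched as follows. The function name `positive_dummy` and the example labels are illustrative assumptions, not part of the original notebook.

```python
def positive_dummy(labels):
    """Return 1 if at least 3 of the 5 labels are 'positive', else 0."""
    if len(labels) != 5:
        raise ValueError("expected exactly five labelled tweets")
    return 1 if labels.count("positive") >= 3 else 0

# Illustrative labels only (not the study's hand-coded data).
print(positive_dummy(["positive", "positive", "neutral", "positive", "negative"]))  # 1
print(positive_dummy(["positive", "neutral", "neutral", "negative", "negative"]))   # 0
```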


## Answer

Our [conclusion](#Conclusion) indicates that, while our results should not be interpreted as causal, there is a statistically significant correlation between whether the top five trending Tesla tweets are mostly positive (${\beta} = 13.86$, 5% level) and the change in the adjusted closing stock price: mostly positive trending tweets coincide with an increase in the stock price of \$13.86. The change in the NASDAQ composite index, which the model controls for, has a coefficient of 0.084 (10% level). The number of Twitter hashtags about Tesla has a coefficient of $-0.148$ in the level specification (reported as x1 in the [Model II](#Model-II) output) and is not statistically significant, so in our sample an additional hashtag does not coincide with a reliable change in the stock price.

## Main Assumptions

* The number of hashtags appearing on Twitter about Tesla is a suitable measure of how attentive to and interested in the news about the release of the Model 3 the public is.  
* Twitter hashtags are not correlated with an omitted variable that also moves stock prices. This is a strong assumption because we might expect hashtags and stock prices to both be correlated with media coverage of Tesla, for example. However, we attempt to limit simultaneity by taking the Twitter readings before the normal NASDAQ market opens.
* The sentiment (positive or negative) of the top 5 trending tweets about Tesla reflects public sentiment about Tesla.  
* Including the NASDAQ Composite index (^IXIC) in the regression will account for many of the macroeconomic forces which would also move the Tesla stock price. This is perhaps a strong assumption since stocks may react heterogeneously to macroeconomic forces.  
* Stock prices follow a [stochastic random walk](http://www.albany.edu/~bd445/Economics_466_Financial_Economics_Slides_Spring_2014/Random_Walk.pdf). Stock prices are therefore highly autocorrelated at the daily level. However, changes in stock prices should not be predictable if the markets are efficient; otherwise it would be highly profitable to invest according to these predictable movements. 
* We test the specification of the number of Tesla-related hashtags to see whether a [log](#Model-I) or [level](#Model-II) specification best fits the data.
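The random walk assumption can be illustrated with a small simulation. This is a sketch on simulated data, not on Tesla prices: the starting level of 100, the standard normal steps and the helper name `lag1_autocorrelation` are all assumptions made for the example.

```python
import random

def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[i] - mean) * (series[i - 1] - mean) for i in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

random.seed(0)  # fixed seed so the run is repeatable
steps = [random.gauss(0, 1) for _ in range(5000)]  # simulated daily changes

prices = []
level = 100.0  # arbitrary starting level
for step in steps:
    level += step
    prices.append(level)

print("levels :", round(lag1_autocorrelation(prices), 3))
print("changes:", round(lag1_autocorrelation(steps), 3))
```

The simulated price levels should show a lag-1 autocorrelation close to 1, while the changes should show one close to 0, which is the reason for modelling changes rather than levels.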

## Twitter Data Mining Code:

#### Code for retrieving the number of Tesla hashtags:

This code mines the number of Tesla hashtags, as specified in the [method](#Method), appearing at the time the command is executed. For the purpose of our study the time chosen was 9 am EST.  

In [None]:
import twitter
import json

# Fill in the credentials of your own Twitter application here.
CONSUMER_KEY = 'YOUR_CONSUMER_KEY'
CONSUMER_SECRET = 'YOUR_CONSUMER_SECRET'
OAUTH_TOKEN = 'YOUR_OAUTH_TOKEN'
OAUTH_TOKEN_SECRET = 'YOUR_OAUTH_TOKEN_SECRET'

auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                           CONSUMER_KEY, CONSUMER_SECRET)

twitter_api = twitter.Twitter(auth=auth)

q = '#Tesla' 

count = 100

# See https://dev.twitter.com/docs/api/1.1/get/search/tweets

search_results = twitter_api.search.tweets(q=q, count=count)

statuses = search_results['statuses']


# Iterate through 5 more batches of results by following the cursor

for _ in range(5):
    print("Length of statuses: %d" % len(statuses))
    try:
        next_results = search_results['search_metadata']['next_results']
    except KeyError:  # No more results when next_results doesn't exist
        break
        
    # Create a dictionary from next_results, which has the following form:
    # ?max_id=313519052523986943&q=NCAA&include_entities=1
    kwargs = dict([ kv.split('=') for kv in next_results[1:].split("&") ])
    
    search_results = twitter_api.search.tweets(**kwargs)
    statuses += search_results['statuses']
    
status_texts = [ status['text'] 
                 for status in statuses ]

screen_names = [ user_mention['screen_name'] 
                 for status in statuses
                     for user_mention in status['entities']['user_mentions'] ]

hashtags = [ hashtag['text'] 
             for status in statuses
                 for hashtag in status['entities']['hashtags'] ]

# Compute a collection of all words from all tweets
words = [ w 
          for t in status_texts 
              for w in t.split() ]

from collections import Counter

for item in [words, screen_names, hashtags]:
    c = Counter(item)

from prettytable import PrettyTable

for label, data in (('Word', words), 
                    ('Screen Name', screen_names), 
                    ('Hashtag', hashtags)):
    pt = PrettyTable(field_names=[label, 'Count']) 
    c = Counter(data)
    [ pt.add_row(kv) for kv in c.most_common()[:10] ]
    pt.align[label], pt.align['Count'] = 'l', 'r' # Set column alignment
    print(pt)

#### Code for mining top 5 retweeted Tesla tweets:

This code retrieves the top 5 most retweeted Tesla tweets. The tweets were then read by us and categorized as positive, negative or neutral. In the first specification we constructed a dummy equal to 1 if at least 3 of the 5 tweets were positive and 0 otherwise. In the robustness check two dummy variables, for positive and negative, were added to the regression with neutral as the reference category. 

In [None]:
retweets = [
            # Store out a tuple of these three values ...
            (status['retweet_count'], 
             status['retweeted_status']['user']['screen_name'],
             status['text']) 
            
            # ... for each status ...
            for status in statuses 
            
            # ... so long as the status meets this condition.
                if 'retweeted_status' in status
           ]

# Slice off the first 5 from the sorted results and display each item in the tuple

pt = PrettyTable(field_names=['Count', 'Screen Name', 'Text'])
[ pt.add_row(row) for row in sorted(retweets, reverse=True)[:5] ]
pt.max_width['Text'] = 50
pt.align= 'l'
print(pt)
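Because every retweet of the same original tweet is returned as its own status, the list above can contain the same tweet several times. A possible variant that keeps one row per original tweet is sketched below. It is not the procedure used in the study; the function name `top_retweeted` and the mock statuses are made up, and the mock dictionaries only mimic the fields read above plus the `id` of the retweeted status.

```python
def top_retweeted(statuses, k=5):
    """Return the k most retweeted original tweets as (count, screen_name, text)."""
    best = {}
    for status in statuses:
        original = status.get("retweeted_status")
        if original is None:
            continue  # not a retweet
        key = original["id"]
        row = (status["retweet_count"], original["user"]["screen_name"], status["text"])
        # Keep one row per original tweet (the one with the highest count seen).
        if key not in best or row[0] > best[key][0]:
            best[key] = row
    return sorted(best.values(), reverse=True)[:k]

# Mock statuses for illustration only.
mock = [
    {"retweet_count": 120, "text": "RT @a: Model 3 looks great",
     "retweeted_status": {"id": 1, "user": {"screen_name": "a"}}},
    {"retweet_count": 120, "text": "RT @a: Model 3 looks great",
     "retweeted_status": {"id": 1, "user": {"screen_name": "a"}}},
    {"retweet_count": 45, "text": "RT @b: Long wait list",
     "retweeted_status": {"id": 2, "user": {"screen_name": "b"}}},
    {"retweet_count": 0, "text": "Just a plain tweet"},
]
for row in top_retweeted(mock):
    print(row)
```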


#### Data for the regression:

This code specifies the data to be used in our regression. The change in the Tesla stock price is the dependent variable and was obtained via [Yahoo Finance](http://finance.yahoo.com/q?s=TSLA). The first independent variable, also obtained via [Yahoo Finance](http://finance.yahoo.com/q?s=%5EIXIC), is the change in the NASDAQ Composite index. 

The remaining data for the independent variables was collected as described above in the [Twitter mining code](#Twitter data mining code). The second independent variable is a dummy describing whether or not the top 5 Tesla tweets were positive or not. The third independent variable is a measure of the log number of Tesla hashtags appearing at the time of data mining. 


In [3]:
import numpy as np
import math
import statsmodels.api as sm

In [4]:
y = [7.82, 9.4, 8.48, 9.95, -8.22, -7.13, -0.15] # Change in stock price
x = [
    [44.69, -22.74, -47.87, 76.79, -72.35, 2.32, -17.29], # Change in NASDAQ Composite Index
    [1, 1, 1, 0, 0, 0, 0], # Dummy measuring reactions = {positive, non-positive} = {1, 0}
    [5.69, 5.59, 5.48, 5.42, 5.54, 5.51, 5.36] # Log no. hashtags
]

In [5]:
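def reg_m(y, x):
    # Build the design matrix for OLS. Each pass of the loop prepends the next
    # regressor, so the columns end up in reverse order: [x[-1], ..., x[0], const].
    # In the summary, x1 is therefore the LAST list in x, and the
    # highest-numbered regressor is x[0].
    ones = np.ones(len(x[0]))
    X = sm.add_constant(np.column_stack((x[0], ones)))
    for ele in x[1:]:
        X = sm.add_constant(np.column_stack((ele, X)))
    results = sm.OLS(y, X).fit()
    return results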

## Descriptive Statistics

#### Calculating the sample means for our variables:

The following code provides sample means for all our variables according to this formula:

$$ \bar{x}=\sum_{i=1}^{n}\frac{x_i}{n} $$

In [1]:
def mean(values):  
    length = len(values)
    total_sum = 0
    for i in range(length):
        total_sum += values [i]
    average = total_sum*1.0/length
    return average
x1 = [44.69, -22.74, -47.87, 76.79, -72.35, 2.32, -17.29] # Change in NASDAQ Composite Index
x2 = [1, 1, 1, 0, 0, 0, 0] # Dummy1 measuring reactions = {positive, non-positive} = {1, 0}
x3 = [297, 268, 239, 226, 254, 246, 254] # no. hashtags
y = [7.82, 9.4, 8.48, 9.95, -8.22, -7.13, -0.15]
m1 = mean(x1)
m2 = mean (x2)
m3 = mean (x3)
m4 = mean (y)
print (m1) 
print (m2)
print (m3)
print (m4)

-5.20714285714
0.428571428571
254.857142857
2.87857142857


#### Calculating standard deviations for our variables:

The following code computes standard deviations for all variables. It divides by $n$ rather than $n-1$, so it returns the population form of the standard deviation:

$$ \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2} $$

In [4]:
import math
def mean(values):  
    length = len(values)
    total_sum = 0
    for i in range(length):
        total_sum += values [i]
    average = total_sum*1.0/length
    return average

def stanDev(values):
    length = len(values)
    m = mean(values)
    total_sum = 0
    for i in range(length):
        total_sum += (values[i]-m)**2
    under_root = total_sum*1.0/length
    return math.sqrt(under_root)

x1 = [44.69, -22.74, -47.87, 76.79, -72.35, 2.32, -17.29] # Change in NASDAQ Composite Index
x2 = [1, 1, 1, 0, 0, 0, 0] # Dummy measuring reactions = {positive, non-positive} = {1, 0}
x3 = [297, 268, 239, 226, 254, 246, 254] # no. hashtags
y = [7.82, 9.4, 8.48, 9.95, -8.22, -7.13, -0.15]
s1 = stanDev(x1)
s2 = stanDev(x2)
s3 = stanDev(x3)
s4 = stanDev(y)
print (s1) 
print (s2)
print (s3)
print (s4)


47.8861646331
0.494871659305
21.0877660636
7.376118663
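The difference between dividing by $n$ and by $n-1$ can be seen with the standard library's `statistics` module. The sketch below uses the change in the Tesla stock price from above; `pstdev` divides by $n$ and reproduces the printed value, while `stdev` divides by $n-1$ and is somewhat larger in a sample of 7.

```python
import statistics

y = [7.82, 9.4, 8.48, 9.95, -8.22, -7.13, -0.15]  # change in Tesla stock price

population_sd = statistics.pstdev(y)  # divides by n, as stanDev does
sample_sd = statistics.stdev(y)       # divides by n - 1

print(round(population_sd, 3))  # 7.376
print(round(sample_sd, 3))
```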


#### Calculating correlation coefficient:

The following code calculates the Pearson correlation coefficient for pairs of variables according to this formula:

$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \sum_{i=1}^{n}(y_i-\bar{y})^2}} $$

In [5]:
def average(x):
    assert len(x) > 0
    return float(sum(x)) / float(len(x))

def pearson_def(x, y):
    assert len(x) == len(y)
    n = len(x)
    assert n > 0
    avg_x = average(x)
    avg_y = average(y)
    diffprod = 0
    xdiff2 = 0
    ydiff2 = 0
    for idx in range(n):
        xdiff = x[idx] - avg_x
        ydiff = y[idx] - avg_y
        diffprod += xdiff * ydiff
        xdiff2 += xdiff * xdiff
        ydiff2 += ydiff * ydiff

    return diffprod / math.sqrt(xdiff2 * ydiff2)

#To find the correlation between change in stock price and the NASDAQ composite index:
r1 = pearson_def([44.69, -22.74, -47.87, 76.79, -72.35, 2.32, -17.29],[7.82, 9.4, 8.48, 9.95, -8.22, -7.13, -0.15])

# To find the correlation between change in stock price and no. #: 
r2 = pearson_def([297, 268, 239, 226, 254, 246, 254],[7.82, 9.4, 8.48, 9.95, -8.22, -7.13, -0.15])

# To find the correlation between no. # and the NASDAQ composite index:
r3 = pearson_def ([44.69, -22.74, -47.87, 76.79, -72.35, 2.32, -17.29],[297, 268, 239, 226, 254, 246, 254])

# To find correlation between the dummy and stock price:
r4 = pearson_def([7.82, 9.4, 8.48, 9.95, -8.22, -7.13, -0.15],[1, 1, 1, 0, 0, 0, 0])

# To find correlation between the dummy and NASDAQ:
r5 = pearson_def([44.69, -22.74, -47.87, 76.79, -72.35, 2.32, -17.29],[1, 1, 1, 0, 0, 0, 0])

# To find correlation between the dummy and no. #:
r6 = pearson_def([297, 268, 239, 226, 254, 246, 254],[1, 1, 1, 0, 0, 0, 0])

# To find correlation between the positive dummy and negative dummy:
r7 = pearson_def([1, 1, 1, 0, 0, 0, 0],[0, 0, 0, 1, 1, 0, 0])

print(r1)
print(r2)
print(r3)
print(r4)
print(r5)
print(r6)
print(r7)

0.477062234041
0.0935218611702
0.0260193854885
0.667835646415
-0.0620835165242
0.539746511305
-0.547722557505


Among the regressors of Models I and II the pairwise correlations are low to moderate: the largest in absolute value is r=0.54, between the positive dummy and the number of hashtags, so there is little reason to suspect severe multicollinearity.
The positive and negative dummy variables (used below in the [Robustness Check](#Robustness-Check)) display a moderate negative correlation of r=-0.5477, which indicates a tendency for high positive values to correspond with low negative values. This makes sense given that the top tweets are categorized as positive, neutral, or negative, so a day cannot have both a positive and a negative majority.

## Graphic Representation

For a user guide to plotly for Python see this [link](https://plot.ly/python/getting-started/). You will need to install software for the graph to display properly in your Python notebook. 

After the software is installed, an interactive graph will appear below. Readers are able to zoom and select or deselect data trends to appear in the graph. It is a useful tool to determine how the variables are correlated and how well the model specifications might fit the data set.

In [6]:
import plotly.tools as tls

tls.embed("https://plot.ly/~eBelles/7/tesla-social-media-and-stock-trading/")

## Results 

### Model I

For the output below the following model is assumed:

$ \Delta \text{Tesla Stock Price}_t = \beta_0 + \beta_1 \Delta \text{NASDAQ Composite Stock Price}_t + \beta_2 \text{Positive Top 5 Tweets}_t + \beta_3 \log(\text{Number of Tesla Hashtags}_t) + u_t $


In [10]:
print(reg_m(y, x).summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.893
Model:                            OLS   Adj. R-squared:                  0.786
Method:                 Least Squares   F-statistic:                     8.342
Date:                Thu, 14 Apr 2016   Prob (F-statistic):             0.0575
Time:                        12:30:54   Log-Likelihood:                -16.100
No. Observations:                   7   AIC:                             40.20
Df Residuals:                       3   BIC:                             39.98
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1           -40.0120     17.963     -2.227      0.1


omni_normtest is not valid with less than 8 observations; 7 samples were given.



The results show some interesting insights. First, the $R^2$ is 0.893, which would indicate that 89.3% of the variance in the change in the stock price is explained by the independent variables, i.e. the change in the NASDAQ Composite index, the positive tweet dummy and the log number of hashtags; the adjusted $R^2$ is 0.786. Note that `reg_m` stacks the regressors in reverse order, so x1 in the output is the last variable passed in, the log number of hashtags. Its coefficient of $-40.01$ seems unreasonably large, but the p-value shows that it is not statistically significant. The number of observations is only 7 because we collected data for only 7 days, which could explain this extreme estimate. The same caveat applies to the coefficient on the dummy variable, ${\beta}_2=15.64$: on days when the trending Tesla tweets are mostly positive, the Tesla stock price goes up by 15.64 dollars relative to other days, and this is significant at the 5% level ($p=0.024$). We may have misspecified the model, so we next try the level specification, using the number of Tesla-related hashtags instead of the log of this number. Results are shown in Model II. 
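With so few observations the adjusted $R^2$ is the more informative figure, because it penalises the number of regressors. A short sketch of the usual formula for a model with an intercept, $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$, reproduces the value in the output above. The function name `adjusted_r2` is ours, and `n_regressors` excludes the intercept.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R-squared for a model with an intercept."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

# Model I: R-squared 0.893 with 7 observations and 3 regressors.
print(round(adjusted_r2(0.893, 7, 3), 3))  # 0.786
```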

### Model II

For the output below the following model is assumed:

$ \Delta \text{Tesla Stock Price}_t = \beta_0 + \beta_1 \Delta \text{NASDAQ Composite Stock Price}_t + \beta_2 \text{Positive Top 5 Tweets}_t + \beta_3 \text{Number of Tesla Hashtags}_t + u_t $

In [11]:
y = [7.82, 9.4, 8.48, 9.95, -8.22, -7.13, -0.15] # Change in stock price
x = [
    [44.69, -22.74, -47.87, 76.79, -72.35, 2.32, -17.29], # Change in NASDAQ Composite Index
    [1, 1, 1, 0, 0, 0, 0], # Dummy measuring reactions = {positive, non-positive} = {1, 0}
    [297, 268, 239, 226, 254, 246, 254] # no. hashtags
]

In [12]:
print(reg_m(y, x).summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.842
Model:                            OLS   Adj. R-squared:                  0.684
Method:                 Least Squares   F-statistic:                     5.321
Date:                Thu, 14 Apr 2016   Prob (F-statistic):              0.102
Time:                        12:30:55   Log-Likelihood:                -17.467
No. Observations:                   7   AIC:                             42.93
Df Residuals:                       3   BIC:                             42.72
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.1478      0.096     -1.545      0.2

Model II shows more reasonable values. Because `reg_m` stacks the regressors in reverse order, x1 in the output is the number of hashtags: its coefficient is $-0.1478$ with $t=-1.545$, which is not statistically significant (the p-value is above 0.2). The dummy for positive tweets also has a more moderate coefficient (${\beta}_2=13.86$) and remains significant at the 5% level. The $R^2$ of 0.842 is slightly lower than in the log specification. It would indicate that 84.2% of the variance in the change in the stock price is explained by the independent variables, i.e. the change in the NASDAQ Composite index, the positive tweet dummy and the number of hashtags.
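Because `reg_m` passes a bare array to statsmodels, the summary labels the regressors x1, x2, x3 in the order in which the columns ended up in the design matrix. A standard-library sketch that keeps a name attached to each column makes it easier to see which coefficient belongs to which variable. The helper names `solve` and `ols` and the column labels are chosen for illustration; the sketch solves the normal equations directly and is not a substitute for statsmodels, since it returns no standard errors or p-values.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(y, columns):
    """Least-squares coefficients of y on the named columns (normal equations)."""
    names = list(columns)
    xtx = [[sum(p * q for p, q in zip(columns[a], columns[b])) for b in names]
           for a in names]
    xty = [sum(p * q for p, q in zip(columns[a], y)) for a in names]
    return dict(zip(names, solve(xtx, xty)))

y = [7.82, 9.4, 8.48, 9.95, -8.22, -7.13, -0.15]  # change in Tesla stock price
columns = {
    "const": [1.0] * 7,
    "nasdaq_change": [44.69, -22.74, -47.87, 76.79, -72.35, 2.32, -17.29],
    "positive": [1, 1, 1, 0, 0, 0, 0],
    "hashtags": [297, 268, 239, 226, 254, 246, 254],
}
for name, value in ols(y, columns).items():
    print(name, round(value, 4))
```

On this data the `hashtags` coefficient comes out at about -0.148, the `positive` coefficient at about 13.86 and the `nasdaq_change` coefficient at about 0.084, so the value reported as x1 in the summary belongs to the hashtag count.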

In the next section we check for the robustness of the model.

### Robustness Check

For the output below the following model is assumed:

$ \Delta \text{Tesla Stock Price}_t = \beta_0 + \beta_1 \Delta \text{NASDAQ Composite Stock Price}_t + \beta_2 \text{Positive Top 5 Tweets}_t + \beta_3 \text{Negative Top 5 Tweets}_t + \beta_4 \text{Number of Tesla Hashtags}_t + u_t $

In [14]:
y = [7.82, 9.4, 8.48, 9.95, -8.22, -7.13, -0.15] # Change in stock price
x = [
    [44.69, -22.74, -47.87, 76.79, -72.35, 2.32, -17.29], # Change in NASDAQ Composite Index
    [1, 1, 1, 0, 0, 0, 0], # Dummy measuring positive reactions = {positive, non-positive} = {1, 0}
    [0, 0, 0, 1, 1, 0, 0], # Dummy measuring negative reactions = {negative, non-negative} = {1,0}
    [297, 268, 239, 226, 254, 246, 254] # no.hashtags
]

In [15]:
def reg_m(y, x):
    ones = np.ones(len(x[0]))
    X = sm.add_constant(np.column_stack((x[0], ones)))
    for ele in x[1:]:
        X = sm.add_constant(np.column_stack((ele, X)))
    results = sm.OLS(y, X).fit()
    return results

In [16]:
print(reg_m(y, x).summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.855
Model:                            OLS   Adj. R-squared:                  0.566
Method:                 Least Squares   F-statistic:                     2.957
Date:                Thu, 14 Apr 2016   Prob (F-statistic):              0.268
Time:                        12:30:56   Log-Likelihood:                -17.153
No. Observations:                   7   AIC:                             44.31
Df Residuals:                       2   BIC:                             44.04
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.1369      0.115     -1.192      0.3

By adding a negative dummy alongside the positive one, 'neutral' becomes the reference category, and we test whether this affects our results. It slightly improves the $R^2$ from 0.842 to 0.855, but the adjusted $R^2$ falls from 0.684 to 0.566, and the model should be tested with more data points to see whether the variable should be added. The extra dummy also changes the coefficient reported as x1 to $-0.1369$; since `reg_m` stacks the regressors in reverse order, this is the coefficient on the number of hashtags. Notice that it becomes even less significant than in the model without the extra dummy variable. 

## Discussion & Limitations

Our research has some obvious limitations. First, the time window for the data is very short: we collected tweet data for only seven days because of the limited time we had. Ideally we would have constructed a large sample and introduced a control group. [Bollen, Mao & Zeng (2011)](http://arxiv.org/pdf/1010.3003.pdf) investigated whether Twitter mood can predict stock market dynamics. They used a collection of 9,853,498 public tweets posted by approximately 2.7 million users. [Figure 1](https://www.technologyreview.com/i/legacy/twitter_feed.png?sw=1180) summarizes their approach. They kept tweets containing particular expressions that indicate a statement about the author's mood and filtered out 'spam' by removing tweets that contain “www.”

As can be seen in [Figure 1](https://www.technologyreview.com/i/legacy/twitter_feed.png?sw=1180), their approach has three phases. The first phase subjects the collection of daily tweets to two mood assessment tools: (1) OpinionFinder, which measures positive vs. negative mood from text content, and (2) GPOMS, which measures 6 different mood dimensions from text content. This results in a total of 7 public mood time series, one generated by OpinionFinder and six generated by GPOMS, each representing a potentially different aspect of the public's mood on a given day. Simultaneously, time series data on the stock market is collected. The big difference with our method is the quality of the dataset: these researchers were able to filter much more precisely, and their dataset was much bigger. 
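The kind of tweet filtering that Bollen, Mao & Zeng describe (keeping mood statements, dropping tweets that contain “www.”) could look roughly like the sketch below. The phrase list is hypothetical and chosen for illustration only; it is not the list used in their paper, and the function name `keep_tweet` and the example tweets are made up.

```python
# Hypothetical mood phrases, chosen for illustration; not the list used by Bollen et al.
MOOD_PHRASES = ("i feel", "i am feeling", "makes me")

def keep_tweet(text):
    """Keep tweets that state a mood and do not contain a 'www.' link."""
    lowered = text.lower()
    if "www." in lowered:
        return False  # treated as spam in this sketch
    return any(phrase in lowered for phrase in MOOD_PHRASES)

# Made-up example tweets.
tweets = [
    "I feel great about the Model 3 reveal",
    "Win a free car at www.example.com, makes me happy",
    "Tesla opens reservations today",
]
print([t for t in tweets if keep_tweet(t)])
```

Only the first example tweet passes both checks: the second contains a link and the third contains no mood phrase.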

A further limitation is the absence of a control group. The control group could be the historical data of the Tesla stock itself. This way we would be able to compare the dynamics around the time of the release of the Model 3 with the dynamics in long periods without a model release, or with the dynamics around the release of another model. Finally, the data used are frequencies of tweets at certain moments. It would increase the quality of the research if we were able to collect data for complete time windows, but because of the way in which our data was mined it would be very labor intensive to collect Twitter data at many points in time. 

Furthermore, the regressions show a warning message which states: "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified. [2] The condition number is large, 3.57e+03. This might indicate that there are strong multicollinearity or other numerical problems." To test for multicollinearity we calculated the correlation coefficients in the Descriptive Statistics section above.
 


## Conclusion

Because our data sample is small and we were unable to account for a number of omitted variables or compare to a control group, we cannot conclude whether the number of hashtags about Tesla or the sentiment of the top 5 tweets has a causal effect on the change in the stock price. The following findings comment only on correlation. 

For the period of April 1-11 [we find](#Model-II) that the change in the closing Tesla stock price is positively related to the change in the NASDAQ index, with a coefficient of 0.084 that is significant at the 10% level: a 1 point increase in the NASDAQ Composite coincides with an 8.4 cent increase in the Tesla stock price. The Tesla stock price increases by \$13.86 if the majority of the top 5 trending tweets is positive, and this was significant at the 5% level. Finally, the number of hashtags related to Tesla has a coefficient of -0.15 (reported as x1 in the output, because `reg_m` stacks the regressors in reverse order), implying that one additional hashtag coincides with a 15 cent decrease in the Tesla stock price, but it is noteworthy that this result was not statistically significant.

Suggestions for future research include expanding the dataset to cover a longer period, adding currently omitted variables such as macroeconomic data or news coverage, including a control group, and running a panel data regression for a number of different companies. Furthermore, since the change in the NASDAQ Composite index is determined simultaneously with the change in the Tesla stock price, our model would have limited predictive power unless it were specified differently. 