
## Introduction

Hello! Thanks for looking over my work!

In this notebook, I try to determine whether the users of the popular subreddit [r/wallstreetbets](https://www.reddit.com/r/wallstreetbets/) can be used as a source for predicting what's going to happen in the markets, whether they can help us understand capital markets better now that more and more people are getting into retail investing, and how right or wrong they are in their investing decisions.

The strategy for examining their short-term moves on the US Equity Market will be to *mine* (download through the Reddit API) their posts and filter out the stock tickers mentioned in them. The sentiment around each ticker is also calculated, using NLP techniques, from the comment in which the ticker is mentioned. The number of times a ticker is mentioned and the overall sentiment around it can be compared with the actual volume and price of the stock, which can help us find a qualitative relation between them (if any).

*So let's get started!*

NOTE: Feel free to add new code cells to have a look at the changes made in the data at any step.

NOTE: Working knowledge of Python, a Reddit account and some familiarity with r/wallstreetbets are required to use and understand this notebook.

~This notebook was made using Google Colab

## Requirements

In this first section, I list and install the libraries used in the project, giving special attention to the ones that aren't particularly common.

>First up is the **PRAW (Python Reddit API Wrapper)** which will allow us to fetch data from reddit. Get started on the docs [here.](https://praw.readthedocs.io/en/latest/index.html)

In [None]:
pip install praw


>I use the *Aho-Corasick* algorithm to efficiently find the stock tickers in a given comment. The algorithm searches for all the tickers in a single pass over the text. As we have around 10k tickers, looping through each one separately would be time-consuming, which is why this algorithm is a better choice than something like *Rabin-Karp*. 
Give [this](https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm) article a read to know why! :)
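
To make the single-pass idea concrete, here is a minimal pure-Python sketch of an Aho-Corasick automaton. It is only an illustration and not the implementation inside *pyahocorasick*; the names `build_automaton` and `find_all` are made up for this example.

```python
from collections import deque

def build_automaton(patterns):
    """Build a tiny Aho-Corasick automaton as three parallel lists."""
    goto = [{}]   # goto[state][char] -> next state (the trie)
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end at this state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    # Breadth-first pass: shallower states get their failure links first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and ch not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(ch, 0)
            # Inherit matches that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_all(text, automaton):
    """Return every pattern occurrence, reading the text exactly once."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        hits.extend(out[state])
    return hits

automaton = build_automaton([' GM ', ' GME ', ' TSLA '])
print(find_all(' I BOUGHT GME AND TSLA ', automaton))
```

The text is scanned once regardless of how many tickers are loaded, which is the property that matters when the pattern list has around 10k entries.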



>The *pyahocorasick* library is used in this project. Get started on the docs [here](https://pyahocorasick.readthedocs.io/en/latest/) and we will install it in the code cell below.

In [None]:
pip install pyahocorasick








>For sentiment analysis, the VADER (*Valence Aware Dictionary and sEntiment Reasoner*) sentiment analyser will be used.

>The *vaderSentiment* library is installed in the code cell below.

In [None]:
pip install vaderSentiment 


>Next, to accomplish the task of getting price data and volume of a particular stock, we install the yfinance API (fetches data from Yahoo! Finance) 

>Get started on the library [here.](https://pypi.org/project/yfinance/)

>The yfinance library is installed in the code cell below.


In [None]:
pip install yfinance




> And with that we finish installing the 'special' libraries that were required. The project also requires *Pandas*, *NumPy*, *Matplotlib* and *Seaborn* for playing with our data. Most systems already have these installed, but for completeness I install them in the code cells below.



In [None]:
pip install pandas



In [None]:
pip install numpy



In [None]:
pip install matplotlib



In [None]:
pip install seaborn



# Fetching and Preparing the Data
>We look through the comments of several "What Are Your Moves Tomorrow?" posts, arrange the relevant parameters in dataframes and set ourselves up to use that info to get insights.

>In order to fetch these posts you will need to generate your own credentials as mentioned in the PRAW documentation. [This](https://www.youtube.com/watch?v=NRgfgtzIhBQ) short video will guide you through the process.

>Apologies for the trouble, but this is the first and last change one needs to make in the code to get it running :)

In [None]:
#Making a Reddit instance (refer to the PRAW documentation)
import praw 
#Setting up credentials (replace the placeholder strings with your own values)
clientID='YOUR_CLIENT_ID'
clientSecret='YOUR_CLIENT_SECRET'
userAgent='wsb_scraper'

reddit=praw.Reddit(client_id=clientID,
                   client_secret=clientSecret,
                   user_agent=userAgent,
                   check_for_async=False)
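
If you would rather not paste credentials into the notebook, one option is to read them from environment variables. The sketch below assumes two variable names, `REDDIT_CLIENT_ID` and `REDDIT_CLIENT_SECRET`, which are hypothetical and not something PRAW itself defines.

```python
import os

def load_reddit_credentials(env=os.environ):
    """Return (client_id, client_secret), or raise a clear error if one is missing."""
    # The two variable names below are assumptions made for this example.
    names = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

The returned pair could then be assigned to `clientID` and `clientSecret` before creating the Reddit instance.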



In [None]:

#In this step, we collect the last 200 posts with 'What Are Your Moves Tomorrow' in the title and arrange their IDs in a dataframe.
#NOTE: Your last 200 posts may be different from mine
import pandas as pd

numberofPosts=200     #Change this to get more/less posts

wallstreetbets=reddit.subreddit('wallstreetbets')
posts=pd.DataFrame(([submission.created,submission.title,submission.id] for submission in wallstreetbets.search('What Are Your Moves Tomorrow',limit=numberofPosts)),columns=['Time','Title','ID'] )

#Sorting the posts by date
posts=posts.sort_values(by='Time',ignore_index=True)



In [None]:
#Defining a function to convert UNIX time to YYYY-MM-DD format

import datetime 
def convert(x):
  #fromtimestamp uses the local time zone of the machine running the notebook
  timestamp=datetime.datetime.fromtimestamp(x)
  return timestamp.strftime('%Y-%m-%d')
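
Because `datetime.fromtimestamp` uses the local time zone, the same post can land on a different date on Colab than on a local machine. A hedged alternative is to pin the conversion to UTC; `convert_utc` is a name made up for this sketch and is not used elsewhere in the notebook.

```python
from datetime import datetime, timezone

def convert_utc(unix_seconds):
    """Convert a UNIX timestamp to a YYYY-MM-DD string, always in UTC."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return moment.strftime('%Y-%m-%d')

print(convert_utc(1605657600))  # 2020-11-18
```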

In [None]:
#Applying the function
posts['Time']=posts['Time'].apply(convert)
posts.head()

| | Time | Title | ID |
|---|---|---|---|
| 0 | 2020-06-01 | What Are Your Moves Tomorrow, June 01, 2020 | gu5xww |
| 1 | 2020-06-09 | What Are Your Moves Tomorrow, June 09, 2020 | gz6uth |
| 2 | 2020-06-10 | What Are Your Moves Tomorrow, June 10, 2020 | gzuxmi |
| 3 | 2020-06-11 | What Are Your Moves Tomorrow, June 11, 2020 | h0iz8p |
| 4 | 2020-06-12 | What Are Your Moves Tomorrow, June 12, 2020 | h16wnv |


It won't be practical for our purposes to fetch data for all 200 posts.
We stick to 25 posts, which cover about a month of trading days (November 18 to December 18, 2020). For that we select 25 consecutive rows from the dataframe posts.

(It takes roughly 1 minute per post on Google Colab, so around 25 minutes in total, but it took more than 43 minutes on my local machine!)

In [None]:
selectedPosts=posts[110:135]   #November 18, 2020 to December 18, 2020
selectedPosts=selectedPosts.reset_index(drop=True)

In [None]:
#To help us replace punctuation in comments with spaces
import string
removelist=string.punctuation
spaces=' '*len(removelist)
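
As a quick illustration of what this translation table does to a comment, here is a small standalone sketch. The helper name `strip_punctuation` is made up for the example; the notebook applies the same mapping inline.

```python
import string

# Map every ASCII punctuation character to a single space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

def strip_punctuation(comment):
    """Replace punctuation with spaces so tickers like '$TSLA,' become ' TSLA '."""
    return comment.translate(PUNCT_TO_SPACE)

print(repr(strip_punctuation("Buying $TSLA, selling GME!")))
```

Replacing with spaces instead of deleting keeps the comment length unchanged and leaves each ticker surrounded by spaces, which the ticker search later relies on.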

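Now we collect the comments of the post for every date using the function *extractor* defined below; with commentlimit=50 this gives over 5k comments per post.
Here 'commentlimit' is passed to PRAW's replace_more, so it caps how many "load more comments" placeholders are expanded rather than the number of comments itself. Setting it to 50 helps skip spam comments with few upvotes and saves a lot of time.

NOTE: You can collect more comments by increasing the value of the 'commentlimit' variable. Set it to None to get all comments.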

In [None]:
commentlimit=50       #Change this to get more/less comments

dataframes=[]
def extractor(postID):
  post=reddit.submission(id=postID)
  post.comments.replace_more(limit=commentlimit)   #Expand up to 'commentlimit' MoreComments placeholders
  #One row per comment: text with punctuation replaced by spaces, upvote score, and a placeholder sentiment of 0
  df=pd.DataFrame(([str(text.body).translate(str.maketrans(removelist,spaces)),text.score,0] for text in (post.comments.list())),columns=['Comment','Upvotes','Sentiment'])
  dataframes.append(df)

selectedPosts['ID'].apply(extractor)


In [None]:
#and we are done...this is how a dataframe of 'comments' looks
dataframes[0]

| | Comment | Upvotes | Sentiment |
|---|---|---|---|
| 0 | Today I was up 58 43 on a 100k account I m ... | 144 | 0 |
| 1 | Guys I think I have a legit addiction I look a... | 108 | 0 |
| 2 | COVID IS CURED I WON THE ELECTION BUY SPY CA... | 238 | 0 |
| 3 | BREAKING NEWS SNP 500 TO BE ADDED TO TESLA STOCK | 225 | 0 |
| 4 | deleted | 75 | 0 |
| ... | ... | ... | ... |
| 5310 | I ll day trade some today then pick up some J... | 2 | 0 |
| 5311 | Humble too Jesus fuck you love yourself | 2 | 0 |
| 5312 | It only took a couple right leaning posts in ... | 3 | 0 |
| 5313 | deleted | 3 | 0 |


# Searching for Tickers

In this section we search for the stocks mentioned in the comments obtained previously.

Please make sure you have the NasdaqTradedCSV file ready to be uploaded.
I have removed some tickers that aren't popular enough for r/wsb, like 'A', because they caused false detections: the code confuses the ticker with a word in a comment typed in all-caps.

(A=Agilent Technologies)

e.g. 'I MADE A HUGE PROFIT ON TSLA!' would detect Agilent Technologies as one of the stocks mentioned in the comment, in addition to Tesla Inc.

The code may also report tickers like IPO as having a high frequency when, in reality, people were just discussing the upcoming IPO of some other stock.

Hence one should be careful about such false detections.


In [None]:
#Loading the CSV file of Nasdaq-traded tickers (includes NYSE-listed stocks)
#Sourced from the NASDAQ website
#If you are running this code locally, pass the full path instead, e.g. pd.read_csv(r'Path where the Tickers file is stored\File name.csv')


Tickers=pd.read_csv('NasdaqTradedCSV.csv')
Tickers=Tickers.iloc[:,:3]                       #Keep only the first three columns
Tickers=Tickers.drop(['Nasdaq Traded'],axis=1)   #Leaves the 'Symbol' column and the one after it


In [None]:
#Setting up the trie for Aho-Corasick search
import ahocorasick

A=ahocorasick.Automaton()
for symbol in Tickers['Symbol'].astype(str):
    A.add_word(' '+symbol+' ',symbol)   #We pad each ticker with spaces so as to avoid detection of tickers which are present as a substring in another ticker.
                                        # eg: 'GM' is in 'GME' but ' GM ' is not in ' GME '
A.make_automaton()
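
The effect of the space padding can be checked with plain string operations. The sketch below uses a made-up helper, `padded_contains`, and also pads the comment itself, which the notebook does not do; without that, a ticker at the very start or end of a comment has no surrounding space and would be missed.

```python
def padded_contains(ticker, comment):
    """Whole-word check: pad both the ticker and the comment with spaces."""
    return f" {ticker} " in f" {comment} "

print(padded_contains("GM", "LONG GME"))    # False
print(padded_contains("GME", "LONG GME"))   # True
print(padded_contains("GME", "GME CALLS"))  # True
```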

In [None]:
#Function to find stocks in comments

import numpy as np

def stockfinder(x):
  #A.iter yields (end_index, value) pairs; the value stored in the automaton is the bare ticker
  #Each comment gets a list of the tickers found in it (an empty list if there are none)
  x['Stocks Mentioned']=[[match[1] for match in A.iter(str(comment))] for comment in x['Comment']]

In [None]:
#Applying the function to our dataframes
for x in dataframes:
  stockfinder(x)

In [None]:
#Making a dataframe to store the frequency of every ticker mentioned
header=pd.concat([pd.Series(['Date']),Tickers['Symbol']])
frequency=pd.DataFrame(columns=list(header))
zeroRow=np.zeros(Tickers.shape[0]+1,dtype=int) #One zero per column, to help initialize rows later


In [None]:
#Code to fill the frequency table
for x in range(len(dataframes)):
  frequency.loc[x]=list(zeroRow)
  frequency.loc[x,'Date']=selectedPosts['Time'][x]   #A single .loc call avoids chained assignment
  for stocks in dataframes[x]['Stocks Mentioned']:
    for j in stocks:
      frequency.loc[x,j]=frequency.loc[x,j]+1

#And we are done!
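
The counting step can also be sketched with `collections.Counter` on plain lists, which may help when checking a single post by hand. The function name and the sample mentions below are made up for illustration.

```python
from collections import Counter

def count_mentions(mentions_per_comment):
    """Count how often each ticker appears across the comments of one post."""
    counts = Counter()
    for tickers in mentions_per_comment:
        counts.update(tickers)
    return counts

# Made-up 'Stocks Mentioned' lists for four comments.
sample = [['TSLA', 'GME'], [], ['TSLA'], ['PLTR', 'TSLA']]
print(count_mentions(sample).most_common(1))  # [('TSLA', 3)]
```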

# Calculating Sentiment

In this section we will use the VADER sentiment analyzer to binarily classify comments as bullish or bearish and calculate the total positive sentiment % around each popular (meme) stock.

In [None]:
#Importing the Vader model and adding some new words (with associated weights) to it to get more accurate results. 

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer=SentimentIntensityAnalyzer()
new_words={'calls':3,'puts':-3,'moon':3,'yolo':2,'buy':3,'sell':-3}
analyzer.lexicon.update(new_words)

In [None]:
#Initializing two dataframes for storing positive sentiment and total possible sentiment
Sentiment=pd.DataFrame(columns=header)
TotalSentiment=pd.DataFrame(columns=header)

In [None]:
#Looping over the selected posts
import math


for x in range(len(dataframes)):
  for y in range(dataframes[x].shape[0]):
    if len(dataframes[x].loc[y,'Stocks Mentioned'])>0:
      dataframes[x].loc[y,'Sentiment']=min(int(analyzer.polarity_scores(dataframes[x].loc[y,'Comment'])['compound']+1.05),1) 
      #Comments with a compound score of about -0.05 or higher count as positive (1), the rest as negative (0).
      #I set this extra bias of 0.05 because wallstreetbets is very bullish in general but uses a lot of cuss-words, which make the analyzer think
      # that the sentiment is negative when it is actually positive. This bias helps offset that to a small extent.

for y in range(len(dataframes)):
  Sentiment.loc[y]=list(zeroRow)
  TotalSentiment.loc[y]=list(zeroRow)
  for i in range(dataframes[y].shape[0]):
    for x in dataframes[y].loc[i,'Stocks Mentioned']:
      TotalSentiment.loc[y,'Date']=selectedPosts['Time'][y]
      Sentiment.loc[y,'Date']=selectedPosts['Time'][y]
      Sentiment.loc[y,x]=Sentiment.loc[y,x]+dataframes[y].loc[i,'Sentiment']*int(math.log(max(2,dataframes[y].loc[i,'Upvotes']+2),2))
      TotalSentiment.loc[y,x]=TotalSentiment.loc[y,x]+int(math.log(max(2,dataframes[y].loc[i,'Upvotes']+2),2))

#Each comment is weighted by the base-2 logarithm of its upvotes (plus 2) rather than by the raw upvote count, which dampens the effect of the hive mentality
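
To see the weighting on its own, here is a small standalone sketch of the same arithmetic on plain numbers. The function names and the sample scores are made up; the formulas mirror the ones used in the loops above.

```python
import math

def to_binary(compound):
    """1 if the compound score is roughly -0.05 or higher, else 0."""
    return min(int(compound + 1.05), 1)

def upvote_weight(upvotes):
    """Integer part of log2(upvotes + 2), never less than 1."""
    return int(math.log(max(2, upvotes + 2), 2))

def positive_share(comments):
    """comments: list of (compound_score, upvotes). Returns the positive weight in percent."""
    positive = sum(to_binary(c) * upvote_weight(u) for c, u in comments)
    total = sum(upvote_weight(u) for _, u in comments)
    return 100 * positive / total if total else 0.0

# Made-up comments: (compound score, upvotes)
print(positive_share([(0.6, 100), (-0.5, 5), (-0.02, -10)]))
```

In this toy case a comment with 100 upvotes weighs 6 while a downvoted one still weighs 1, so popular comments matter more without drowning out everything else.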

# Price and Volume Data of Popular Tickers
The yfinance API will be used to get the opening price, closing price and volume data for the most popular tickers of the chosen time period.

In [None]:
import yfinance as yf

In [None]:
#Selecting Important stocks
#(Ones that were mentioned above a certain threshold number of times in any post in the decided range)

impstocks=[]
threshold=24

for (name,mentions) in frequency.loc[:,'AA':].items():   #.items() replaces .iteritems(), which was removed in pandas 2.0
  if int(mentions.max())>=threshold:
    impstocks.append(name)

In [None]:
#Generating a string 'y' of the important stock tickers, separated by spaces
y=' '.join(impstocks)


In [None]:
#Generating the Prices and Volume Dataframes
#NOTE: Some tickers may have NaN values, e.g. the delisted ones, the ones discussed pre-IPO etc.
#Example-> AirBnB IPO on December 9. No price data before that!

data=yf.download(y,start=selectedPosts['Time'][0],end=selectedPosts['Time'][len(selectedPosts)-1])
Prices=data['Adj Close'].iloc[1:]
Prices.index=Prices.index.strftime('%Y-%m-%d')

Volume=data['Volume'].iloc[1:]
Volume.index = Volume.index.strftime('%Y-%m-%d')


[*********************100%***********************]  63 of 63 completed


In [None]:
#Setting indices
frequency=frequency.iloc[:-1,:]
frequency=frequency.set_index('Date')
frequency=frequency.astype(int)

Sentiment=Sentiment.set_index('Date')
TotalSentiment=TotalSentiment.set_index('Date')

SentimentPercentage=Sentiment/TotalSentiment
SentimentPercentage=SentimentPercentage.astype(float)*100
SentimentPercentage=SentimentPercentage.fillna(0) #In-case there are unexpected NaN values.
SentimentPercentage=SentimentPercentage.iloc[:-1,:]

In [None]:
#Removing unnecessary data for stocks with low frequency count
SentimentPercentage=SentimentPercentage[impstocks]
frequency=frequency[impstocks]

# Visualizing and Analyzing
Let's see what the numbers have to say! I will not use Seaborn or Matplotlib to chart the data here, as making combined charts with multiple different scales can be challenging and difficult to reproduce.

Instead, I will download the required dataframes and use Excel to chart them. Feel free to use the Python libraries if you are a Matplotlib wizard.

In [None]:
#Import some standard visualization tools 

import matplotlib.pyplot as plt
%matplotlib inline 
import seaborn as sns

In [None]:
#Code to change dataframes to csv files
#Can be downloaded from the files section, if the code is being run on Google Colab
#Or the files can be found in the same folder as the .ipynb file.


frequency.to_csv('Frequency.csv')
SentimentPercentage.to_csv('SentimentPercentage.csv')
Prices.to_csv('Prices.csv')
Volume.to_csv('Volume.csv')



In [None]:
#Run this code cell if you wish to parse the dates of the data for use with matplotlib/seaborn
#The results are assigned back so that the parsed dates are actually kept

SentimentPercentage=pd.read_csv('SentimentPercentage.csv',index_col='Date',parse_dates=True)
Prices=pd.read_csv('Prices.csv',index_col='Date',parse_dates=True)
Volume=pd.read_csv('Volume.csv',index_col='Date',parse_dates=True)
frequency=pd.read_csv('Frequency.csv',index_col='Date',parse_dates=True)
frequency

Date,AAL,AAPL,ABNB,AI,AMD,APHA,APXT,ARKG,B,BA,BABA,BB,BFT,BLNK,C,CCL,CHWY,CIIC,CRM,CRSR,CRWD,DASH,DBX,DD,DIS,DKNG,DOCU,FB,FCEL,FDX,GME,IPO,LAZR,LGVW,MARA,MGNI,MRNA,MT,NIO,NKLA,NVDA,PFE,PLTR,PSTH,PTON,QS,RH,RKT,SBE,SLV,SOLO,SPCE,SPY,T,THC,THCB,TLRY,TSLA,UK,VLDR,X,XPEV,ZM
2020-11-19,0,16,1,2,17,0,0,0,2,40,53,1,0,2,9,11,0,115,1,22,0,0,0,18,1,19,0,2,13,1,10,10,0,0,2,0,0,0,89,11,56,1,185,15,8,0,21,15,3,1,4,6,80,5,0,2,1,150,4,0,2,2,49
2020-11-20,1,7,0,0,10,2,0,0,8,9,91,0,0,13,5,7,0,30,0,21,1,0,0,17,1,12,1,2,9,1,16,25,0,0,0,0,5,0,68,11,17,7,101,4,10,0,14,11,17,0,40,8,96,6,0,2,0,112,3,0,3,19,21
2020-11-23,4,21,0,1,7,3,0,0,6,9,160,0,0,11,3,4,1,22,1,10,0,0,1,26,2,13,6,5,5,1,16,24,0,11,1,0,1,0,69,2,13,4,167,7,14,0,14,8,23,1,17,3,47,8,0,0,2,42,4,2,3,10,18
2020-11-24,1,51,1,1,6,10,1,0,5,13,75,0,0,50,5,6,1,76,0,31,1,0,0,32,0,5,1,1,58,0,29,10,0,30,9,0,0,0,134,12,2,0,276,5,6,0,16,36,33,1,11,7,37,6,3,374,1,90,1,4,1,72,11
2020-11-25,6,27,0,2,9,4,2,1,16,7,89,1,0,9,11,6,2,7,0,46,0,0,0,29,4,6,0,3,5,0,18,22,0,15,2,5,0,0,90,28,8,1,507,6,5,0,7,17,7,0,3,1,67,15,9,80,2,140,1,0,3,19,4
2020-11-26,1,10,1,2,5,3,23,1,44,2,36,0,0,1,30,2,0,2,19,39,0,0,0,22,1,7,0,2,2,1,51,39,1,6,1,7,4,0,38,4,4,0,800,3,5,0,14,6,1,2,1,7,30,20,31,0,1,44,2,0,5,6,17
2020-11-27,2,19,0,3,4,6,66,0,40,2,31,0,0,0,16,4,1,7,7,38,0,0,0,19,3,6,0,3,3,0,52,21,0,5,1,9,0,0,48,7,6,0,1365,8,5,0,21,14,1,2,1,9,33,29,21,0,0,75,1,0,9,6,11
2020-11-30,1,25,0,2,5,24,0,1,6,3,66,0,0,1,6,7,0,0,3,53,1,0,0,13,0,9,2,2,4,0,377,14,0,1,1,3,5,0,53,7,4,10,470,14,1,7,9,6,0,4,2,11,42,24,5,0,1,81,3,0,10,8,9
2020-12-01,2,66,0,4,32,77,0,0,6,2,23,0,0,0,12,2,0,0,9,18,0,0,0,25,0,3,3,1,4,0,222,13,0,0,6,1,31,0,69,95,5,16,321,52,6,4,9,8,0,3,1,9,50,11,3,0,3,234,11,0,6,10,39
2020-12-02,3,38,2,0,15,23,0,1,11,2,17,51,0,1,7,4,3,0,51,17,4,0,0,12,2,8,4,5,17,0,121,14,0,4,1,0,9,1,128,32,5,23,364,12,5,5,16,8,0,5,2,24,74,8,4,0,0,122,17,0,3,3,15


Another effective way of studying this data is by looking at the parameters of the individual stocks and comparing them. To achieve this, a function is provided which can allow us to get all the data for a stock of our choice in an organized manner.

In [None]:
#Function to get organized data for a specific ticker 'x'
def DataforDownload(x):
  return pd.DataFrame(data={'Frequency':frequency[x],'Price':Prices[x],'Volume':Volume[x],'Sentiment%':SentimentPercentage[x]})
  

In [None]:
#Using the function to get data for AAPL,TSLA,SPY, BABA and NIO
Apple=DataforDownload('AAPL')
Tesla=DataforDownload('TSLA')
SPY=DataforDownload('SPY')
Alibaba=DataforDownload('BABA')
Nio=DataforDownload('NIO')

Apple.to_csv('Apple.csv')
Tesla.to_csv('Tesla.csv')
SPY.to_csv('SPY.csv')
Alibaba.to_csv('Baba.csv')
Nio.to_csv('Nio.csv')
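
If you want a rough number to go with the charts, one possible check is the Pearson correlation between two columns of a per-ticker table, for example mentions and volume. The sketch below is a plain-Python version with made-up numbers; `pearson` is a hypothetical helper and not part of the notebook, and a correlation on a few weeks of data should be read as a hint at most.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float('nan')  # a constant column has no defined correlation
    return cov / (spread_x * spread_y)

# Made-up daily mentions and volumes, for illustration only.
mentions = [10, 20, 30, 40]
volume = [1.0e6, 2.1e6, 2.9e6, 4.2e6]
print(round(pearson(mentions, volume), 3))
```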
