# Data Integration

This notebook produces the X and y data for use when training and testing the price movement model.

We first use the trained sentiment model (`sentiment.pkl`) to overwrite the sentiment values previously generated by vaderSentiment.

Then, we combine the price data together with the sentiment and reddit data.

We envision our model to be run at the start of every trading day. It will use the accumulated sentiments gathered from Reddit from the market close time of the previous trading day, till the market open time of the current trading day.

The market opens at 1.30pm (GMT), and closes at 8pm (GMT). So, if today is Tuesday, the model should be run at 1.30pm (GMT), and use the sentiments accumulated from the posts on Reddit made between 8pm (GMT) on Monday, and 1.30pm (GMT) today.

To mimic this behaviour, we filter the posts by their timestamps. We convert the times to the America/New York time first.

The model would also provide a prediction of how it projects the price to change from the close price compared to its open price. We calculate and store the percentage change of the prices.

Finally, for each ticker, we saved their combined data in `[ticker]_final.csv`. We also upload the initial results to Firebase so our website can have some preliminary data to work with.

*For security, we omit the API keys used to upload our results to Firebase.*

This notebook should be located in the same directory as the following files:

**From `scrape-reddit.py`**

- 01-updated.csv

- 02-updated.csv

- 03-updated.csv

- 04-updated.csv

- 05-updated.csv

- 06-updated.csv

**From `get-stock-data.ipynb`**

- BB.csv

- AMC.csv

- NOK.csv

- GME.csv

**From `sent-training.ipynb`**

- sentiment.pkl

# 1) Remove NAN values and convert the gmt to us gmt-4

In [3]:
import pandas as pd
from datetime import datetime, timedelta
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import pickle
import os

The 2 different approach to retrieving sentiments. We are using the second approach.

In [5]:
#Approach 1: Vader
analyser = SentimentIntensityAnalyzer()
def sentiment_analyzer_scores(row):
    score = analyser.polarity_scores(row['p'])
    score = float(str(score['compound']))
    if score != 0:
        new_score = score * 2
        if new_score > 0:
          new_score += 0.5
        else:
          new_score -= 0.5
    else:
        new_score = 0
    return int(new_score)

#Approach 2: Self-developed sentiment model and vectorizer
with open("sentiment.pkl", 'rb') as f:
    vectorizer, sentimental_model = pickle.load(f)
def get_sentiment_score(newpost):
  test_post = vectorizer.transform([newpost])
  prediction = sentimental_model.predict(test_post)
  return prediction

In [6]:
df = pd.read_csv('01-updated.csv')
df = df.append(pd.read_csv('02-updated.csv'), ignore_index=True)
df = df.append(pd.read_csv('03-updated.csv'), ignore_index=True)
df = df.append(pd.read_csv('04-updated.csv'), ignore_index=True)
df = df.append(pd.read_csv('05-updated.csv'), ignore_index=True)
df = df.append(pd.read_csv('06-updated.csv'), ignore_index=True)

df.dropna(subset = ["c"], inplace=True)
df.dropna(subset = ["n"], inplace=True)
df.dropna(subset = ["s"], inplace=True)
df.dropna(subset = ["r"], inplace=True)

df

Unnamed: 0.1,Unnamed: 0,a,c,i,t,p,n,bb,amc,nok,gme,any,s,r
0,0,stevenconrad,01-01-2021_00:05:29,ko145e,"gme to 420.69, but only if we make it happen. ...",we all know the short volume far exceeds 100% ...,4.0,False,False,False,True,True,9.0,0.80
1,1,WSBProfitProphet,01-01-2021_00:49:32,ko1ttx,🚀🚀🚀🚀how have we been so fucking blind? gme is ...,"gamestop colors: red, white and black houston ...",0.0,False,False,False,True,True,1.0,1.00
2,2,WSBProfitProphet,01-01-2021_00:56:35,ko1xxb,gme is the rockets 🚀🚀🚀🚀,"gamestop colors: red, white and black houston ...",10.0,False,False,False,True,True,57.0,0.82
3,3,Kitchen-Level,01-01-2021_01:04:23,ko22h8,stop posting gme dd🚀🚀🚀,"okay hear me out, we all already fucking have ...",32.0,False,False,False,True,True,0.0,0.50
4,4,SnooWalruses7854,01-01-2021_01:30:43,ko2h09,gme weird options price action,&amp;#x200b; https://preview.redd.it/tpg1pco3g...,27.0,False,False,False,True,True,22.0,0.78
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
51736,5756,WorkingUniversity134,30-06-2021_22:42:59,ob8g9f,this is why $husa is moving,[$husa](https://stocktwits.com/symbol/husa) to...,1.0,False,True,False,False,True,1.0,1.00
51737,5757,WorkingUniversity134,30-06-2021_22:45:24,ob8hr9,this is why husa is moving,&amp;#x200b; to all paperhands &amp; shorts: s...,1.0,False,True,False,False,True,1.0,1.00
51738,5758,ryanhuntermcb,30-06-2021_22:47:45,ob8j8a,viac: god king dd for all my coked out friends,viac is a massive media company that is sellin...,86.0,False,True,False,True,True,160.0,0.81
51739,5759,nyc_a,30-06-2021_23:33:05,ob9ceg,we are retards non investors we trade y-o-l-o ...,i will explain from ape to apes and please rem...,6.0,False,True,False,True,True,7.0,0.67


In [7]:
def convert_to_datetime(datetime_str):
    truncate = datetime_str[0:6:] + datetime_str[8:18:]
    #convert to datetime object
    datetime_obj = datetime.strptime(truncate, '%d-%m-%y_%H:%M:%S')
    #convert from gmt to gmt-4 for us timing
    datetime_obj -= timedelta(hours=4, minutes=0)
    return  datetime_obj

In [8]:
df['c'] = df['c'].apply(convert_to_datetime)
df = df.sort_values(by='c')

In [9]:
#add sentiments
#df['sentiment'] = df.apply(sentiment_analyzer_scores, axis=1)
df['sentiment'] = df.apply(get_sentiment_score, axis=1)

In [10]:
df.head()

Unnamed: 0.1,Unnamed: 0,a,c,i,t,p,n,bb,amc,nok,gme,any,s,r,sentiment
0,0,stevenconrad,2020-12-31 20:05:02,ko145e,"gme to 420.69, but only if we make it happen. ...",we all know the short volume far exceeds 100% ...,4.0,False,False,False,True,True,9.0,0.8,2
1,1,WSBProfitProphet,2020-12-31 20:49:03,ko1ttx,🚀🚀🚀🚀how have we been so fucking blind? gme is ...,"gamestop colors: red, white and black houston ...",0.0,False,False,False,True,True,1.0,1.0,1
2,2,WSBProfitProphet,2020-12-31 20:56:03,ko1xxb,gme is the rockets 🚀🚀🚀🚀,"gamestop colors: red, white and black houston ...",10.0,False,False,False,True,True,57.0,0.82,1
3,3,Kitchen-Level,2020-12-31 21:04:02,ko22h8,stop posting gme dd🚀🚀🚀,"okay hear me out, we all already fucking have ...",32.0,False,False,False,True,True,0.0,0.5,1
4,4,SnooWalruses7854,2020-12-31 21:30:04,ko2h09,gme weird options price action,&amp;#x200b; https://preview.redd.it/tpg1pco3g...,27.0,False,False,False,True,True,22.0,0.78,-2


In [11]:
df.to_csv('reddit_all.csv')

# 2) Get the labels to predict from the price changes

In [12]:
def get_price_change(row):
    return float((row['Close'] - row['Open'])/row['Open'])

def convert_to_datetime2(datetime_str):
    truncate = datetime_str[0:1:] + datetime_str[3::]
    #convert to datetime object
    datetime_obj = datetime.strptime(truncate, '%y-%m-%d')
    return datetime_obj

In [13]:
bb_price = pd.read_csv('BB.csv')

bb_price['price_change'] = bb_price.apply(get_price_change, axis=1)

bb_price['Date'] = bb_price['Date'].apply(convert_to_datetime2)
bb_price.sort_values(by='Date')

bb_price

bb_price.to_csv('bb_price.csv')

In [14]:
amc_price = pd.read_csv('AMC.csv')

amc_price['price_change'] = amc_price.apply(get_price_change, axis=1)

amc_price['Date'] = amc_price['Date'].apply(convert_to_datetime2)
amc_price.sort_values(by='Date')

amc_price

amc_price.to_csv('amc_price.csv')

In [15]:
nok_price = pd.read_csv('NOK.csv')

nok_price['price_change'] = nok_price.apply(get_price_change, axis=1)

nok_price['Date'] = nok_price['Date'].apply(convert_to_datetime2)
nok_price.sort_values(by='Date')

nok_price

nok_price.to_csv('nok_price.csv')

In [16]:
gme_price = pd.read_csv('GME.csv')

gme_price['price_change'] = gme_price.apply(get_price_change, axis=1)

gme_price['Date'] = gme_price['Date'].apply(convert_to_datetime2)
gme_price.sort_values(by='Date')

gme_price

gme_price.to_csv('gme_price.csv')

# 3) Create the dataset for each day

In [48]:
bb_price = pd.read_csv('bb_price.csv')
amc_price = pd.read_csv('amc_price.csv')
nok_price = pd.read_csv('nok_price.csv')
gme_price = pd.read_csv('gme_price.csv')

In [49]:
def convert_to_datetime3(datetime_str):
    #truncate = datetime_str
    truncate = datetime_str[0:1:] + datetime_str[3::]
    #convert to datetime object
    datetime_obj = datetime.strptime(truncate, '%y-%m-%d %H:%M:%S')
    return datetime_obj

In [50]:
df = pd.read_csv('reddit_all.csv')
df['c'] = df['c'].apply(convert_to_datetime3)

In [51]:
only_bb = df[df['bb']==True]
only_amc = df[df['amc']==True]
only_nok = df[df['nok']==True]
only_gme = df[df['gme']==True]

In [52]:
only_bb

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,a,c,i,t,p,n,bb,amc,nok,gme,any,s,r,sentiment
19,19,19,imm_uol1819,2021-01-01 06:19:05,ko93en,"2021 sales: bby down ~25% from ath, zm down ~5...",they're both dangerously close to their 200 da...,5.0,True,False,False,False,True,9.0,0.75,-1
23,23,23,napkins33e,2021-01-01 13:16:02,koewf7,issues with posting,anyone else having issues with this bot? this ...,0.0,True,False,False,False,True,1.0,1.00,-1
30,30,30,FrogoftheNorth,2021-01-01 17:31:03,kojkmt,$bbby short squeeze plan,here is my take on $bbby 6 days before their q...,21.0,True,False,False,False,True,45.0,0.90,2
35,35,35,powerglide76,2021-01-01 20:48:05,kon5db,you retards know that all reddit awards expire...,i’ve done the calculations [here](imgur.com/9u...,55.0,True,False,False,True,True,224.0,0.87,-2
43,43,43,dukeofmuffinz,2021-01-02 08:59:00,kox3fs,any actually experiences at a gamestop locatio...,"everyone is all over gme in this sub lately, i...",7.0,True,False,False,True,True,8.0,0.79,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
51728,51731,5751,AsparagusRocket,2021-06-30 18:02:04,ob7pf3,diy sentiment investing - beating spy ytd and ...,hey guys! there’ve been a lot of phenomenal se...,1.0,True,True,False,True,True,1.0,1.00,2
51729,51732,5752,Janto_2021,2021-06-30 18:07:01,ob7sj0,a short poem: time to hang up my yolo hat and ...,long story short. although i work at mcdonalds...,14.0,True,False,False,False,True,16.0,0.64,-1
51730,51733,5753,AsparagusRocket,2021-06-30 18:07:02,ob7smo,diy sentiment investing - beating spy ytd and ...,hey guys! there’ve been a lot of phenomenal se...,1.0,True,True,False,True,True,1.0,1.00,2
51731,51734,5754,AsparagusRocket,2021-06-30 18:09:02,ob7u1y,diy sentiment investing - beating spy ytd and ...,hey guys! there’ve been a lot of phenomenal se...,18.0,True,True,False,True,True,89.0,0.88,2


In [53]:
def get_sentiment_count(sen_list):
  positive_count = list(filter(lambda score: score > 0, sen_list))
  negative_count = list(filter(lambda score: score < 0, sen_list))
  neutral_count = list(filter(lambda score: score == 0, sen_list))

  return len(positive_count), len(negative_count), len(neutral_count)

In [54]:
def make_date_list():
    date_list = []
    jan = [1, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 18, 19, 20, 21, 22, 25, 26, 27, 28, 29]
    feb = [1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 15, 16, 17, 18, 19, 22, 23, 24, 25, 26]
    mar = [1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 15, 16, 17, 18, 19, 22, 23, 24, 25, 26, 27, 30, 31]
    apr = [1, 2, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 19, 20, 21, 22, 23, 26, 27, 28, 29, 30]
    may = [3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 17, 18, 19, 20, 21, 24, 25, 26, 27, 28, 31]
    jun = [1, 2, 3, 4, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18, 21, 22, 23, 24, 25, 28, 29, 30]
    months = [jan, feb, mar, apr, may, jun]
    
    for month in range(len(months)):
        for day in months[month]:
            if day < 10:
                day = '0' + str(day)
            datetime_str = '21-0' + str(month+1) + '-'+ str(day)
            datetime_obj = datetime.strptime(datetime_str, '%y-%m-%d')
            date_list.append(datetime_obj)
    return date_list

date_list = make_date_list()
#print(date_list)

In [55]:
def get_final_dataset(date_list, price_df, stock_only):

    date_df = price_df['Date'].tolist()

    accumulated_date_list = []
    accumulated_n_list = []
    accumulated_s_list = []
    accumulated_r_list = []
    accumulated_sen_list = []
    positive_count_list = []
    negative_count_list = []
    neutral_count_list = []
    accumulated_price_change_list = []


    for index in range(1,len(date_list)):
        #to filter and keep only the non-trading hours
        filtered = stock_only[stock_only['c'] < date_list[index] + timedelta(hours=9, minutes=30)]
        filtered = filtered[date_list[index] - timedelta(hours=8, minutes=0) < stock_only['c']]

        if len(filtered) != 0:
          accumulated_n = sum(filtered['n'].tolist()) / len(filtered['n'].tolist())
          accumulated_s = sum(filtered['s'].tolist()) / len(filtered['s'].tolist())
          accumulated_r = sum(filtered['r'].tolist()) / len(filtered['r'].tolist())
          accumulated_sen = sum(filtered['sentiment'].tolist()) / len(filtered['sentiment'].tolist())                
        else:
          accumulated_n = 0
          accumulated_s = 0
          accumulated_r = 0
          accumulated_sen = 0     

        positive_count, negative_count, neutral_count = get_sentiment_count(filtered['sentiment'].tolist())

        if (str(date_list[index])[0:10] in date_df):
            price_change = price_df['price_change'][date_df.index(str(date_list[index])[0:10])]
        else:
            price_change = 0

        accumulated_date_list.append(date_list[index])
        accumulated_n_list.append(accumulated_n)
        accumulated_s_list.append(accumulated_s)
        accumulated_r_list.append(accumulated_r)
        accumulated_sen_list.append(accumulated_sen)
        positive_count_list.append(positive_count)
        negative_count_list.append(negative_count)
        neutral_count_list.append(neutral_count)
        accumulated_price_change_list.append(price_change)


    final = pd.DataFrame(list(zip(accumulated_date_list, 
                                  accumulated_n_list, 
                                  accumulated_s_list, 
                                  accumulated_r_list, 
                                  accumulated_sen_list, 
                                  positive_count_list,
                                  negative_count_list,
                                  neutral_count_list,                                  
                                  accumulated_price_change_list)), 
                            columns =['Date', 
                                      'accumulated_n', 
                                      'accumulated_s', 
                                      'accumulated_r', 
                                      'accumulated_sentiment', 
                                      'positive_count',
                                      'negative_count',
                                      'neutral_count', 
                                      'price_change'])
    return final



In [56]:
bb_final = get_final_dataset(date_list, bb_price, only_bb)
amc_final = get_final_dataset(date_list, amc_price, only_amc)
nok_final = get_final_dataset(date_list, nok_price, only_nok)
gme_final = get_final_dataset(date_list, gme_price, only_gme)

bb_final.to_csv('bb_final.csv')
amc_final.to_csv('amc_final.csv')
nok_final.to_csv('nok_final.csv')
gme_final.to_csv('gme_final.csv')



In [57]:
bb_final

Unnamed: 0,Date,accumulated_n,accumulated_s,accumulated_r,accumulated_sentiment,positive_count,negative_count,neutral_count,price_change
0,2021-01-04,9.000000,1.000000,0.600000,2.000000,1,0,0,-0.017910
1,2021-01-05,126.000000,1449.500000,0.940000,2.000000,2,0,0,0.022659
2,2021-01-06,30.000000,1.500000,0.520000,1.500000,2,0,0,0.000000
3,2021-01-07,142.666667,231.333333,0.826667,2.000000,3,0,0,0.045926
4,2021-01-08,45.666667,36.666667,0.850000,1.000000,2,1,0,0.047091
...,...,...,...,...,...,...,...,...,...
123,2021-06-24,171.076923,50.153846,0.650000,0.307692,6,6,1,-0.041572
124,2021-06-25,86.000000,224.375000,0.913125,1.437500,14,1,1,-0.007371
125,2021-06-28,259.222222,248.333333,0.842222,1.111111,7,2,0,0.061360
126,2021-06-29,299.000000,193.555556,0.874444,1.666667,8,0,1,-0.023144


## 4) Get top sentiments for each day

In [58]:
def get_top_sentiments(df):
    df_sen_sorted = df.sort_values(by='sentiment')
    p_list = df_sen_sorted['p'].tolist()
    s_list = df_sen_sorted['sentiment'].to_list()
    
    if (len(p_list) < 7) or (len(s_list) < 7):
        neg_sen1, neg_sen2, neg_sen3 = {'post': 'Nil' , 'sen_score': 0}, {'post': 'Nil' , 'sen_score': 0}, {'post': 'Nil' , 'sen_score': 0}
        pos_sen1, pos_sen2, pos_sen3 = {'post': 'Nil' , 'sen_score': 0}, {'post': 'Nil' , 'sen_score': 0}, {'post': 'Nil' , 'sen_score': 0}
    else:
        neg_sen1 = {'post': p_list[0] , 'sen_score': s_list[0]}
        neg_sen2 = {'post': p_list[1] , 'sen_score': s_list[1]}
        neg_sen3 = {'post': p_list[2] , 'sen_score': s_list[2]}

        pos_sen1 = {'post': p_list[len(p_list)-1] , 'sen_score': s_list[len(s_list)-1]}
        pos_sen2 = {'post': p_list[len(p_list)-2] , 'sen_score': s_list[len(s_list)-2]}
        pos_sen3 = {'post': p_list[len(p_list)-3] , 'sen_score': s_list[len(s_list)-3]}
    
    return neg_sen1, neg_sen2, neg_sen3, pos_sen1, pos_sen2, pos_sen3

In [61]:
def get_sentiment_df(date_list, price_df, df):

    date_df = price_df['Date'].tolist()

    accumulated_date_list = []
    neg_sen1_list = []
    neg_sen1_score_list = []
    neg_sen2_list = []
    neg_sen2_score_list = []
    neg_sen3_list = []
    neg_sen3_score_list = []
    pos_sen1_list = []
    pos_sen1_score_list = []
    pos_sen2_list = []
    pos_sen2_score_list = []
    pos_sen3_list = []
    pos_sen3_score_list = []


    for index in range(1,len(date_list)):
        #to filter and keep only the latest 1000 before stock market open
        filtered = df[df['c'] < date_list[index] + timedelta(hours=9, minutes=30)]
        filtered = filtered[date_list[index] - timedelta(hours=8, minutes=0) < df['c']]

     
        neg_sen1, neg_sen2, neg_sen3, pos_sen1, pos_sen2, pos_sen3 = get_top_sentiments(filtered)


        accumulated_date_list.append(date_list[index])

        neg_sen1_list.append(neg_sen1['post'])
        neg_sen1_score_list.append(neg_sen1['sen_score'])
        neg_sen2_list.append(neg_sen2['post'])
        neg_sen2_score_list.append(neg_sen2['sen_score'])
        neg_sen3_list.append(neg_sen3['post'])
        neg_sen3_score_list.append(neg_sen3['sen_score'])

        pos_sen1_list.append(pos_sen1['post'])
        pos_sen1_score_list.append(pos_sen1['sen_score'])
        pos_sen2_list.append(pos_sen2['post'])
        pos_sen2_score_list.append(pos_sen2['sen_score'])
        pos_sen3_list.append(pos_sen3['post'])
        pos_sen3_score_list.append(pos_sen3['sen_score'])


    final = pd.DataFrame(list(zip(accumulated_date_list, 
                                  neg_sen1_list,
                                  neg_sen1_score_list,
                                  neg_sen2_list,
                                  neg_sen2_score_list,
                                  neg_sen3_list,
                                  neg_sen3_score_list,
                                  pos_sen1_list,
                                  pos_sen1_score_list,
                                  pos_sen2_list,
                                  pos_sen2_score_list,
                                  pos_sen3_list,
                                  pos_sen3_score_list)), 
                            columns =['Date', 
                                      'neg_sen1', 
                                      'neg_sen1_score', 
                                      'neg_sen2', 
                                      'neg_sen2_score', 
                                      'neg_sen3', 
                                      'neg_sen3_score', 
                                      'pos_sen1', 
                                      'pos_sen1_score', 
                                      'pos_sen2', 
                                      'pos_sen2_score', 
                                      'pos_sen3', 
                                      'pos_sen3_score'])
    return final



In [62]:
df_sen = get_sentiment_df(date_list, bb_price, df)
df_sen.to_csv('df_sen.csv')



In [63]:
df_sen.head()

Unnamed: 0,Date,neg_sen1,neg_sen1_score,neg_sen2,neg_sen2_score,neg_sen3,neg_sen3_score,pos_sen1,pos_sen1_score,pos_sen2,pos_sen2_score,pos_sen3,pos_sen3_score
0,2021-01-04,"on ryan cohen's official youtube, he has a vid...",-2,"alright you fucking autists, listen up. we’re ...",-1,i accidentally posted everything in the title....,0,i work fairly high up in campaign consulting. ...,2,i work fairly high up in campaign consulting. ...,2,according to morningstar research in the middl...,2
1,2021-01-05,&amp;#x200b; *processing img 1h1jfd3g0e961...*...,-2,&amp;#x200b; *processing img cq40a4md1e961...*...,-2,my broker will not let me sell covered calls a...,-1,you all keep saying gme is just a bricks and m...,2,before i get to the main reason of this post i...,2,* 2021 starts off with a bang as vix shoots up...,2
2,2021-01-06,"as we've seen, there's been a lot of fuckery g...",-2,inspired by another post and discussion here o...,-1,so my retard friends i have just stumbled upon...,1,i just want to preface this with saying you gu...,2,the last two major sell-offs we’ve had on gme ...,2,gme gangggg!!! most of you know i've been long...,2
3,2021-01-07,while everyone was drowning in the tendies the...,-1,(*editor's note: it's a wild day for the marke...,-1,[shares/options ~45k](https://imgur.com/galler...,0,meme magic is real and shout out to [u/stonksf...,2,**stocks go bonkers while the dems take georgi...,2,**stocks go bonkers while the dems take georgi...,2
4,2021-01-08,Nil,0,Nil,0,Nil,0,Nil,0,Nil,0,Nil,0


## 5) Upload to firebase

In [64]:
import pyrebase
import time
from datetime import date
from decouple import config


firebase_config = { "apiKey": config('FIREBASE_API_KEY'),
            "authDomain": config('FIREBASE_AUTH_DOMAIN'),
            "databaseURL": config('FIREBASE_DB_URL'),
            "storageBucket": config('FIREBASE_STORAGE_BUCKET') }

firebase = pyrebase.initialize_app(firebase_config)
db = firebase.database()

In [65]:
df_bb = pd.read_csv('bb_final.csv')
df_amc = pd.read_csv('amc_final.csv')
df_nok = pd.read_csv('nok_final.csv')
df_gme = pd.read_csv('gme_final.csv')
df_sen = pd.read_csv('df_sen.csv')

date_list = make_date_list()

In [66]:
df_bb.head()

Unnamed: 0.1,Unnamed: 0,Date,accumulated_n,accumulated_s,accumulated_r,accumulated_sentiment,positive_count,negative_count,neutral_count,price_change
0,0,2021-01-04,9.0,1.0,0.6,2.0,1,0,0,-0.01791
1,1,2021-01-05,126.0,1449.5,0.94,2.0,2,0,0,0.022659
2,2,2021-01-06,30.0,1.5,0.52,1.5,2,0,0,0.0
3,3,2021-01-07,142.666667,231.333333,0.826667,2.0,3,0,0,0.045926
4,4,2021-01-08,45.666667,36.666667,0.85,1.0,2,1,0,0.047091


In [67]:
df_sen.head()

Unnamed: 0.1,Unnamed: 0,Date,neg_sen1,neg_sen1_score,neg_sen2,neg_sen2_score,neg_sen3,neg_sen3_score,pos_sen1,pos_sen1_score,pos_sen2,pos_sen2_score,pos_sen3,pos_sen3_score
0,0,2021-01-04,"on ryan cohen's official youtube, he has a vid...",-2,"alright you fucking autists, listen up. we’re ...",-1,i accidentally posted everything in the title....,0,i work fairly high up in campaign consulting. ...,2,i work fairly high up in campaign consulting. ...,2,according to morningstar research in the middl...,2
1,1,2021-01-05,&amp;#x200b; *processing img 1h1jfd3g0e961...*...,-2,&amp;#x200b; *processing img cq40a4md1e961...*...,-2,my broker will not let me sell covered calls a...,-1,you all keep saying gme is just a bricks and m...,2,before i get to the main reason of this post i...,2,* 2021 starts off with a bang as vix shoots up...,2
2,2,2021-01-06,"as we've seen, there's been a lot of fuckery g...",-2,inspired by another post and discussion here o...,-1,so my retard friends i have just stumbled upon...,1,i just want to preface this with saying you gu...,2,the last two major sell-offs we’ve had on gme ...,2,gme gangggg!!! most of you know i've been long...,2
3,3,2021-01-07,while everyone was drowning in the tendies the...,-1,(*editor's note: it's a wild day for the marke...,-1,[shares/options ~45k](https://imgur.com/galler...,0,meme magic is real and shout out to [u/stonksf...,2,**stocks go bonkers while the dems take georgi...,2,**stocks go bonkers while the dems take georgi...,2
4,4,2021-01-08,Nil,0,Nil,0,Nil,0,Nil,0,Nil,0,Nil,0


In [68]:
bb_dict = df_bb.set_index('Date').T.to_dict('dict')
amc_dict = df_amc.set_index('Date').T.to_dict('dict')
nok_dict = df_nok.set_index('Date').T.to_dict('dict')
gme_dict = df_gme.set_index('Date').T.to_dict('dict')
sen_dict = df_sen.set_index('Date').T.to_dict('dict')

for index in range(len(date_list)):

  try:
    bb_dict_single = bb_dict[str(date_list[index])[:10]]
    update_indicator = db.child(str(date_list[index])[:10]).child("BB").set({
        'pc': bb_dict_single['price_change'],

        'positive_count': bb_dict_single['positive_count'],
        'negative_count': bb_dict_single['negative_count'],
        'neutral_count': bb_dict_single['neutral_count']
    })

    amc_dict_single = amc_dict[str(date_list[index])[:10]]
    update_indicator = db.child(str(date_list[index])[:10]).child("AMC").set({
        'pc': amc_dict_single['price_change'],

        'positive_count': amc_dict_single['positive_count'],
        'negative_count': amc_dict_single['negative_count'],
        'neutral_count': amc_dict_single['neutral_count']
    })

    nok_dict_single = nok_dict[str(date_list[index])[:10]]
    update_indicator = db.child(str(date_list[index])[:10]).child("NOK").set({
        'pc': nok_dict_single['price_change'],

        'positive_count': nok_dict_single['positive_count'],
        'negative_count': nok_dict_single['negative_count'],
        'neutral_count': nok_dict_single['neutral_count']
    })

    gme_dict_single = gme_dict[str(date_list[index])[:10]]
    update_indicator = db.child(str(date_list[index])[:10]).child("GME").set({
        'pc': gme_dict_single['price_change'],

        'positive_count': gme_dict_single['positive_count'],
        'negative_count': gme_dict_single['negative_count'],
        'neutral_count': gme_dict_single['neutral_count']
    })

    sen_dict_single = sen_dict[str(date_list[index])[:10]]
    update_indicator = db.child(str(date_list[index])[:10]).child("sentiments").set({

        'neg_sen1': sen_dict_single['neg_sen1'],
        'neg_sen1_score': sen_dict_single['neg_sen1_score'],
        'neg_sen2': sen_dict_single['neg_sen2'],
        'neg_sen2_score': sen_dict_single['neg_sen2_score'],
        'neg_sen3': sen_dict_single['neg_sen3'],
        'neg_sen3_score': sen_dict_single['neg_sen3_score'],

        'pos_sen1': sen_dict_single['pos_sen1'],
        'pos_sen1_score': sen_dict_single['pos_sen1_score'],
        'pos_sen2': sen_dict_single['pos_sen2'],
        'pos_sen2_score': sen_dict_single['pos_sen2_score'],
        'pos_sen3': sen_dict_single['pos_sen3'],
        'pos_sen3_score': sen_dict_single['pos_sen3_score']
    })
    #print("For date:", str(date_list[index])[:10])
  except:
    print("date not found:", str(date_list[index])[:10])

date not found: 2021-01-01
