# Using scheduled notebook updates on Kaggle to maintain a daily updated stock price and news sentiment



## Introduction

This notebook is fetching daily stock market data of Reliance from [yfinance](https://www.yahoofinanceapi.com/) api and its daily news articles from [mediastack](https://mediastack.com/) api and creating a derived dataset which contains **news sentiment**. This notebook will be scheduled to be run daily using scheduled notebook run.



## About Dataset

**Input** - Input data will be the dataset we created from this notebook output by **keeping data in sync with new notebook versions** which can be find in metadata section of a dataset that was created directly from a Kaggle Notebook Output Files.

**Output** - Notebook will fetch today's stock market data , news articles and its sentiment( positive , negative or neutral) , append all this data with input data and output it to the dataset.
 
Notebook will generate 3 files -

1. reliance_stock_history.csv
2. reliance_news.json
3. reliance_news_sentiment.csv

#### reliance_stock_history.csv 

1. Date - Trading date
2. Open - Open price of day
3. High - Highest price of day
4. Low - Lowest price of day
5. Close - Closing price of day
6. Volume - Amount of asset/security 
7. Dividends - Distribution of stock
8. Stock splits - Shares of stock to its current shareholders


#### reliance_news.json

1. author - author of news article 
2. title - title of news article 
3. description - description of news article
4. url - url of news article
5. source - source of news article
6. image - image of news article
7. category - category of news article
8. language - language of news article
9. country - country name
10. published_at - published date

#### reliance_news_sentiment.csv

1. published_at - published date
2. title - title of news article 
3. description - description of news article
4. url - url of news article
5. sentiment - news sentiment
6. sentiment_score - news sentiment score between 0 to 1


## How to use this notebook
Follow this steps if you want to create a new dataset for some different company using scheduled notebook updates feature on kaggle-

* Click on the **copy & edit** button
* Create api keys of **mediastack api** and add it to secrets (*Add-ons>Secrets* ) with the label **mediastack-token**.
* If you want to create dataset for different company eg. "TESLA" , you can change the variable **company_symbol** to "TSLA" and **search_query** (for news articles) to "tesla" which can be find in **imports** section of the notebook.
* Remove input data
* Go to the Schedule a notebook run with frequency **daily** with start date **today** and save .
* Save & run all or do a quick save with the **always save output** selection selected.
* Navigate to the **data/output** section of the notebook viewer and find your output files that you saved,they should be there in the output section and then click on the **new dataset** button that should be right there next to your files and give some name to your dataset and Make sure you click on the **keep data in sync with new notebook versions**
* Add newly created dataset in your notebook input.
* Now change input file path mentioned in **Imports** section of the notebook and do save & run all.

After following above steps , your notebook will be **updated daily** and dataset will be updated automatically.

## Installations 

In [1]:
# install yfinance
!pip install yfinance

Collecting yfinance
  Downloading yfinance-0.2.52-py2.py3-none-any.whl (108 kB)
[K     |████████████████████████████████| 108 kB 5.2 MB/s eta 0:00:01
[?25hCollecting multitasking>=0.0.7
  Downloading multitasking-0.0.11-py3-none-any.whl (8.5 kB)
Collecting peewee>=3.16.2
  Downloading peewee-3.17.8.tar.gz (948 kB)
[K     |████████████████████████████████| 948 kB 27.8 MB/s eta 0:00:01
[?25h  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h    Preparing wheel metadata ... [?25ldone
[?25hCollecting lxml>=4.9.1
  Downloading lxml-5.3.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.9 MB)
[K     |████████████████████████████████| 4.9 MB 35.2 MB/s eta 0:00:01
[?25hCollecting pytz>=2022.5
  Downloading pytz-2024.2-py2.py3-none-any.whl (508 kB)
[K     |████████████████████████████████| 508 kB 52.0 MB/s eta 0:00:01
[?25hCollecting frozendict>=2.3.4
  Downloading frozendict-2.4.6-cp37-cp37m-manylinux_2_17_x86_6

## Imports  

In [2]:
# import libraries 
import yfinance as yf
import pandas as pd
from kaggle_secrets import UserSecretsClient
import os
import json
import datetime
from datetime import date,timedelta
import warnings
import http.client, urllib.parse
warnings.filterwarnings("ignore")
from transformers import AutoTokenizer, AutoModelForSequenceClassification,pipeline
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)

# company symbol and name
company_symbol="RELIANCE.NS"

#initialise today date
today = str(date.today())
yesterday = str(date.today()- timedelta(days = 1))

# flag variables
news_inserted=False

#secret keys of mediastack 
user_secrets = UserSecretsClient()
mediastack_api_token = user_secrets.get_secret("mediastack-token")

# input file paths
stock_history_file_path='../input/reliancestockandnewsdata/reliance_stock_history.csv'
news_file_path='../input/reliancestockandnewsdata/reliance_news.json'
news_sentiment_file_path='../input/reliancestockandnewsdata/reliance_news_sentiment.csv'

# output file paths
output_stock_history_file_path='./reliance_stock_history.csv'
output_news_file_path='./reliance_news.json'
output_news_sentiment_file_path='./reliance_news_sentiment.csv'

# parameters for mediastack api
search_query='reliance'
conn = http.client.HTTPConnection('api.mediastack.com')
params = urllib.parse.urlencode({
    'keywords': search_query,
    'access_key': mediastack_api_token,
    'sort': 'published_desc',
    'limit': 10,
    'languages': 'en',
    'country': 'in',
    'date': yesterday
    })

## Create stock market history dataset

In [3]:
def create_stock_history_dataset():
    reliance_stock_history=ticker_object.history(period="1d").reset_index()
    return reliance_stock_history

def update_stock_history_dataset():
    reliance_stock_history=pd.read_csv(stock_history_file_path)
    reliance_stock_history.Date=pd.to_datetime(reliance_stock_history.Date, format='%Y/%m/%d')
    today_reliance_stock_data=ticker_object.history(period="1d")
    today_reliance_stock_data=today_reliance_stock_data.reset_index()
    last_stock_date=str(today_reliance_stock_data.loc[0,'Date']).split()[0]
    if last_stock_date == reliance_stock_history['Date'].dt.strftime('%Y-%m-%d')[len(reliance_stock_history)-1]: #if already inserted 
        reliance_stock_history.iloc[-1:,:]=today_reliance_stock_data.iloc[-1].tolist()
    else:
        last_position=len(reliance_stock_history)
        reliance_stock_history.loc[last_position]=today_reliance_stock_data.iloc[-1].tolist()
    return reliance_stock_history

In [4]:
# create stock market history dataset
ticker_object=yf.Ticker(company_symbol)
if os.path.exists(stock_history_file_path)==False:
    reliance_stock_history=create_stock_history_dataset()
else:
    reliance_stock_history=update_stock_history_dataset()


reliance_stock_history.to_csv(output_stock_history_file_path,index=False)

In [5]:
reliance_stock_history

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
0,2021-10-21 00:00:00,2727.399902,2728.0,2603.199951,2622.5,9612230,0.0,0.0
1,2021-10-22 00:00:00,2620.0,2664.899902,2611.5,2627.399902,5086641,0.0,0.0
2,2021-10-25 00:00:00,2680.0,2680.0,2570.0,2601.800049,7934786,0.0,0.0
3,2021-10-26 00:00:00,2617.100098,2668.899902,2603.149902,2661.050049,4498720,0.0,0.0
4,2021-10-27 00:00:00,2652.0,2676.800049,2619.949951,2627.399902,4565815,0.0,0.0
5,2021-10-28 00:00:00,2620.0,2637.949951,2590.5,2598.600098,4562471,0.0,0.0
6,2021-10-29 00:00:00,2596.149902,2596.149902,2501.699951,2536.25,6568539,0.0,0.0
7,2021-11-01 00:00:00,2536.25,2556.0,2494.100098,2537.800049,7144150,0.0,0.0
8,2021-11-02 00:00:00,2545.0,2548.0,2495.25,2500.800049,4874582,0.0,0.0
9,2021-11-03 00:00:00,2506.050049,2520.0,2461.0,2483.600098,5530635,0.0,0.0


## Create news dataset

In [6]:
def create_news_dataset():
    conn.request('GET', '/v1/news?{}'.format(params))
    res = conn.getresponse().read()
    reliance_news=json.loads(res.decode('utf-8'))["data"]
    return reliance_news

def update_news_dataset():
    global news_inserted
    with open(news_file_path,'r') as file:
        reliance_news=json.load(file)
        for news in reliance_news['articles']:
            if news['published_at'].split('T')[0]==yesterday:
                news_inserted=True
                break
        current_reliance_news=None
        if news_inserted==False:
            conn.request('GET', '/v1/news?{}'.format(params))
            res = conn.getresponse().read()
            current_reliance_news=json.loads(res.decode('utf-8'))["data"]
            reliance_news['articles']+=current_reliance_news
        return reliance_news['articles'],current_reliance_news

In [7]:
#create news dataset 
if os.path.exists(news_file_path)==False:
    reliance_news=create_news_dataset()
    current_reliance_news=reliance_news.copy()
else:
    reliance_news,current_reliance_news=update_news_dataset()

with open(output_news_file_path,'w') as file:
    json.dump({"articles":reliance_news},file)

In [8]:
reliance_news

[{'author': None,
  'title': 'Exclusive: Industrialist Nikhil Merchant leads race for acquiring Reliance Naval',
  'description': 'Low-profile Gujarat businessman wants to add Pipavav shipyard facility to his adjacent LNG port.',
  'url': 'https://www.businesstoday.in/latest/corporate/story/exclusive-industrialist-nikhil-merchant-leads-race-for-acquiring-reliance-naval-310061-2021-10-21?utm_source=rssfeed',
  'source': 'businesstoday',
  'image': None,
  'category': 'general',
  'language': 'en',
  'country': 'in',
  'published_at': '2021-10-21T14:40:39+00:00'},
 {'author': 'Reuters',
  'title': 'India’s Reliance gets shareholders’ nod to add Aramco chairman as director',
  'description': 'BENGALURU &#8212; India&#8217;s Reliance Industries Ltd said on Thursday that a required majority of its shareholders have passed a resolution to appoint Saudi Aramco Chairman Yasir Al-Rumayyan as an independent director to the conglomerate&#8217;s board. A little over 98% of the total votes polled o

## Create news sentiment dataset

For sentiment analysis , I am using [huggingface](https://huggingface.co/) FinBERT model.
FinBERT is a pre-trained NLP model to analyze sentiment of financial text. It is built by further training the BERT language model in the finance domain, using a large financial corpus and thereby fine-tuning it for financial sentiment classification. 

Refer [this](https://huggingface.co/ProsusAI/finbert) for more info.

In [9]:
tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

Downloading:   0%|          | 0.00/758 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/112 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/252 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/438M [00:00<?, ?B/s]

In [10]:
def create_news_sentiment_dataset(news_sentiments):
    last_position=len(news_sentiments)
    article_ind=last_position
    title_description=[]
    if current_reliance_news!=None:
        for article in current_reliance_news:
            title_description.append(article['title']+' '+article['description'])
            news_sentiments.at[article_ind,'published_at']=article['published_at']
            news_sentiments.at[article_ind,'title']=article['title']
            news_sentiments.at[article_ind,'description']=article['description']
            news_sentiments.at[article_ind,'url']=article['url']
            article_ind+=1
        news_label_and_scores=classifier(list(title_description))
        labels=[pred['label'] for pred in news_label_and_scores]
        scores=[pred['score'] for pred in news_label_and_scores]
        news_sentiments.at[last_position:,'sentiment']=labels
        news_sentiments.at[last_position:,'sentiment_score']=scores
    
    news_sentiments.to_csv(output_news_sentiment_file_path,index=None)  

In [11]:
# create news sentiment dataset
news_sentiments=None
if os.path.exists(news_sentiment_file_path):
    news_sentiments=pd.read_csv(news_sentiment_file_path,index_col=None)                     
else:
    news_sentiments=pd.DataFrame(columns=['published_at','title','description','url','sentiment','sentiment_score'])
create_news_sentiment_dataset(news_sentiments)

In [12]:
news_sentiments

Unnamed: 0,published_at,title,description,url,sentiment,sentiment_score
0,2021-10-21T14:40:39+00:00,Exclusive: Industrialist Nikhil Merchant leads...,Low-profile Gujarat businessman wants to add P...,https://www.businesstoday.in/latest/corporate/...,neutral,0.711336
1,2021-10-21T13:52:13+00:00,India’s Reliance gets shareholders’ nod to add...,BENGALURU &#8212; India&#8217;s Reliance Indus...,https://financialpost.com/pmn/business-pmn/ind...,neutral,0.800176
2,2021-10-21T12:45:52+00:00,Rogers misses quarterly revenue estimates,Rogers Communications Inc reported third-quart...,https://torontosun.com/business/money-news/rog...,negative,0.973185
3,2021-10-21T12:40:35+00:00,Exclusive: Tycoon Nikhil Merchant leads race f...,Low-profile Gujarat businessman wants to add P...,https://www.businesstoday.in/latest/corporate/...,neutral,0.619371
4,2021-10-21T11:43:53+00:00,BP to Open Fuel Station in India Amid Record P...,BP and Reliance Industries signed a $6 billion...,https://sputniknews.com/20211021/bp-to-open-fu...,positive,0.929962
...,...,...,...,...,...,...
652,2022-02-16T03:47:36+00:00,"Rupee And Bond Update - February 16, 2022: Rel...","Rupee And Bond Update - February 16, 2022: Rel...",https://www.bloombergquint.com/research-report...,neutral,0.945126
653,2025-01-29T13:54:06+00:00,"India launches ₹16,300 Cr Critical Mineral Mis...","India approves ₹16,300 Cr Critical Mineral Mis...",https://www.deccanchronicle.com/news/india-lau...,positive,0.797443
654,2025-01-29T10:05:01+00:00,"Government approves Rs 16,300-crore National C...","Briefing the media after the Cabinet meeting, ...",https://www.zeebiz.com/economy-infra/news-cabi...,positive,0.746947
655,2025-01-29T04:48:00+00:00,US tech stock tumble highlights risk of market...,Turbulence in some of the biggest tech names i...,https://cyprus-mail.com/2025/01/29/us-tech-sto...,negative,0.659351


You will see 3 output files generated **reliance_stock_history.csv** , **reliance_news.json** , **reliance_news_sentiment.csv** after *save & run all*(manually or scheduled), go to your dataset (reliance stock and news data) is updated with new values appended , you can also go to notebook viewer > Data , and see updated dataset (refresh page if not showing latest data)

click [here](https://www.kaggle.com/yashvi/reliancestockandnewsdata) to view the dataset generated by this notebook 

## Final Notes 

So we successfully created our own stock market dataset which is scheduled to be updated daily using **scheduled notebook run** feature on Kaggle.
We can use this generated dataset and study the stock market behaviour and plan buy or sell strategies accordingly, as it is very difficult for an individual to consider all the current and past information for predicting future trend of a stock.