## SENTIMENT ANALYSIS OF FINANCIAL NEWS

We import the required libraries.

In [1]:
import pandas as pd
import numpy as np

from textblob import TextBlob

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

We load the data collected via web scraping.

In [2]:
df = pd.read_csv('BTC-news.csv')

We rename the date column so that it matches the other datasets.

In [3]:
df = df.rename(columns={"begins_at": "Date"})

We convert that same column to datetime so that it has the same format as the other datasets.

In [4]:
df.Date = pd.to_datetime(df.Date, format='%d/%m/%Y')
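
As a small illustration of why the explicit `format` matters, the sketch below parses two made-up dates written in day/month/year order. The sample values are assumptions for the example and are not taken from 'BTC-news.csv'.

```python
import pandas as pd

# Hypothetical sample in day/month/year order, as the format string implies.
raw = pd.Series(["25/02/2018", "01/03/2018"])

parsed = pd.to_datetime(raw, format="%d/%m/%Y")

# With an explicit format, "01/03/2018" is read as 1 March, not 3 January.
print(parsed.dt.strftime("%Y-%m-%d").tolist())
```

Passing the format explicitly avoids any ambiguity between day-first and month-first interpretations.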

We strip noise from the articles (every character that is not a digit, an ASCII letter or whitespace) so that the model can interpret them.

In [5]:
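# regex=True is required: from pandas 2.0 the pattern is otherwise treated as a literal string.
df.articles = df.articles.str.replace(r'[^0-9a-zA-Z\s]', ' ', regex=True).astype('string')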
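
To make the effect of the pattern concrete, here is a minimal sketch using the standard `re` module on a single made-up headline. The sample string is an assumption for illustration. The character class is the same one used in the cell.

```python
import re

# Same character class as in the cleaning cell: anything that is not a digit,
# an ASCII letter or whitespace is replaced by a single space.
pattern = re.compile(r"[^0-9a-zA-Z\s]")

sample = "Bitcoin regains $25K, amid hope!"  # hypothetical headline
cleaned = pattern.sub(" ", sample)
print(repr(cleaned))
```

Each removed character becomes one space, so the length of the text is preserved and runs of spaces can appear where punctuation used to be.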


We check the data types.

In [6]:
df.dtypes

Date        datetime64[ns]
articles            string
dtype: object

We set the date column as the index.

In [7]:
df = df.set_index('Date')

We check for null values and for dates missing from the time horizon.

In [8]:
df.isnull().sum()

articles    0
dtype: int64

In [9]:
pd.date_range(start = '2018-02-25', end = '2023-02-24' ).difference(df.index)

DatetimeIndex([], dtype='datetime64[ns]', freq='D')
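
An empty result means every calendar day in the range is present in the index. The sketch below shows what the same check returns when a day is missing. The three-date index is a made-up example, not the notebook's data.

```python
import pandas as pd

# Hypothetical index with one day (2018-02-27) deliberately left out.
idx = pd.DatetimeIndex(["2018-02-25", "2018-02-26", "2018-02-28"])

expected = pd.date_range(start="2018-02-25", end="2018-02-28")
missing = expected.difference(idx)
print(missing)
```

`difference` returns the dates of the full daily range that do not appear in the index, so any gap in the scraped data would show up here.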

We display the DataFrame.

In [10]:
df

Date,articles
2018-02-25,Original Pizza Day Purchaser Does It Again W...
2018-02-26,Bitcoin Pizza Day 2 How A Lightning Payment...
2018-02-27,Rapper 50 Cent Who Bragged About Owning Bit...
2018-02-28,This Is Who Controls Bitcoin British Man ...
2018-03-01,Bitcoin makes inroads in LA s residential re...
...,...
2023-02-20,Bitcoin regains 25K amid hope record China ...
2023-02-21,Bitcoin active addresses concern analyst d...
2023-02-22,Bitcoin Ethereum Technical Analysis BTC Fa...
2023-02-23,Bitcoin bears attempt to pin BTC price under...


***

### Computing polarity and subjectivity

In [11]:
def Subjectivity(text):
    return TextBlob(text).sentiment.subjectivity


def Polarity(text):
    return TextBlob(text).sentiment.polarity

In [12]:
df['Subjectivity'] = df['articles'].apply(Subjectivity)

df['Polarity'] = df['articles'].apply(Polarity)

***

### Computing the sentiment score

In [13]:
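# Create the analyzer once; building it on every call reloads the VADER lexicon.
sia = SentimentIntensityAnalyzer()

def Sentiment(text):
    # Returns a dict with the 'neg', 'neu', 'pos' and 'compound' scores.
    return sia.polarity_scores(text)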

In [14]:
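compound = []
neg = []
pos = []
neu = []

# Iterate over the texts directly; integer lookups such as df.articles[i]
# are deprecated on a non-integer index (here, a DatetimeIndex).
for text in df.articles:
    scores = Sentiment(text)
    compound.append(scores['compound'])
    neg.append(scores['neg'])
    pos.append(scores['pos'])
    neu.append(scores['neu'])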

In [15]:
df['sentiment'] =  compound
df['negative'] =  neg
df['positive'] =  pos
df['neutral'] =  neu
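
The `sentiment` column holds VADER's compound score, which ranges from -1 to 1. The following sketch is not part of the original notebook. It shows one common way to turn that score into a coarse label, using the ±0.05 cut-offs suggested in the VADER documentation. The helper name `label_from_compound` and the sample scores are assumptions for illustration.

```python
def label_from_compound(score, threshold=0.05):
    """Hypothetical helper: map a VADER compound score to a coarse label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Compound values in the style of the 'sentiment' column; 0.02 is made up.
scores = [0.7788, -0.6597, 0.02]
labels = [label_from_compound(s) for s in scores]
print(labels)
```

The threshold is a convention rather than a fixed rule, so it may need tuning for financial headlines.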

***

Let's look at the final dataset.

In [16]:
df

Date,articles,Subjectivity,Polarity,sentiment,negative,positive,neutral
2018-02-25,Original Pizza Day Purchaser Does It Again W...,0.441667,0.220833,0.7788,0.021,0.087,0.892
2018-02-26,Bitcoin Pizza Day 2 How A Lightning Payment...,0.446667,0.010000,-0.6597,0.080,0.037,0.883
2018-02-27,Rapper 50 Cent Who Bragged About Owning Bit...,0.518506,0.001623,-0.6705,0.073,0.034,0.892
2018-02-28,This Is Who Controls Bitcoin British Man ...,0.459091,0.039394,0.4939,0.038,0.093,0.869
2018-03-01,Bitcoin makes inroads in LA s residential re...,0.335000,-0.083333,-0.1543,0.097,0.081,0.822
...,...,...,...,...,...,...,...
2023-02-20,Bitcoin regains 25K amid hope record China ...,0.575510,0.102041,0.1952,0.113,0.136,0.751
2023-02-21,Bitcoin active addresses concern analyst d...,0.350000,0.083333,0.8225,0.035,0.124,0.841
2023-02-22,Bitcoin Ethereum Technical Analysis BTC Fa...,0.283333,0.166667,-0.1027,0.052,0.040,0.908
2023-02-23,Bitcoin bears attempt to pin BTC price under...,0.334407,0.057197,-0.2382,0.072,0.073,0.855


We check the data types.

In [17]:
df.dtypes

articles         string
Subjectivity    float64
Polarity        float64
sentiment       float64
negative        float64
positive        float64
neutral         float64
dtype: object

We check again for null values and for dates missing from the time horizon.

In [18]:
df.isnull().sum()

articles        0
Subjectivity    0
Polarity        0
sentiment       0
negative        0
positive        0
neutral         0
dtype: int64

In [19]:
pd.date_range(start = '2018-02-25', end = '2023-02-24' ).difference(df.index)

DatetimeIndex([], dtype='datetime64[ns]', freq='D')

We save the final dataset to a CSV file named 'BTC_sentiment.csv'.

In [20]:
df.to_csv('BTC_sentiment.csv', encoding='utf-8')
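
When the file is read back later, the `Date` index is stored as plain text and has to be parsed again. A minimal round-trip sketch follows, assuming a made-up two-row frame and an in-memory buffer in place of the real file.

```python
import io

import pandas as pd

# Hypothetical two-row frame with the same index name as the notebook's data.
df_small = pd.DataFrame(
    {"sentiment": [0.7788, -0.6597]},
    index=pd.DatetimeIndex(["2018-02-25", "2018-02-26"], name="Date"),
)

buffer = io.StringIO()  # in-memory stand-in for 'BTC_sentiment.csv'
df_small.to_csv(buffer)
buffer.seek(0)

# index_col restores the index; parse_dates=True converts it back to datetimes.
restored = pd.read_csv(buffer, index_col="Date", parse_dates=True)
print(restored.index.dtype)
```

Without `parse_dates=True` the index would come back as strings, and date-based operations such as the missing-day check would no longer behave as expected.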