# Introduction to pysententia

[Sententia](https://github.com/71point4/pysententia) provides access to
media sentiment data from the Bureau for Economic Research (BER).

The homepage for the pysententia Python package is at
<https://github.com/71point4/pysententia>.

Install from GitHub.

In [None]:
!pip install git+https://github.com/71point4/pysententia

## Set the API Key

To access the API, first specify the API key provided to you by the BER.

In [2]:
import os

In [4]:
key = os.getenv('SENTENTIA_KEY')
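
Because `os.getenv` returns `None` when the variable is unset, it can help to fail early with a clear message instead of passing an empty key to the API. The following is a minimal sketch; the helper name `load_api_key` is hypothetical and not part of pysententia.

```python
import os


def load_api_key(var_name="SENTENTIA_KEY"):
    """Return the API key stored in the environment variable `var_name`.

    `load_api_key` is a hypothetical helper, not part of pysententia.
    """
    key = os.getenv(var_name)
    # os.getenv returns None when unset; also treat blank values as missing.
    if key is None or not key.strip():
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export it before starting Python."
        )
    return key.strip()
```

With this in place, `key = load_api_key()` either returns a usable string or stops with an explicit error.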

## The API interface

Besides providing sentiment calculations from different word-list
dictionaries, the API provides access to the various ways in which a
sentiment score can be aggregated (within the text and across time):

<img src="docs/figures/aggregations.png" width="1236" style="display: block; margin: auto;" />

-   Sentiment calculation `WITHIN` the article
    -   In the API this is set by the `aggr` parameter
-   Sentiment calculation `ACROSS` a time period
    -   This does not need to be set. The API returns four aggregations:
        -   `mean_sentiment`
        -   `relative`
        -   `absolute`
        -   `sent_log`
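
To make the two levels concrete, here is a minimal sketch using made-up numbers: each article first gets a single score (the within-article step), and those scores are then averaged per month (the across-time step). The article scores, the data layout, and the use of a plain mean are assumptions for illustration only; the API's own `mean_sentiment`, `relative`, `absolute`, and `sent_log` calculations are not reproduced here.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Made-up (published_date, within-article score) pairs for illustration only.
article_scores = [
    (date(2021, 1, 5), 0.20),
    (date(2021, 1, 19), -0.10),
    (date(2021, 2, 2), 0.40),
    (date(2021, 2, 23), 0.00),
]


def mean_by_month(scores):
    """Average article-level scores within each (year, month) period."""
    buckets = defaultdict(list)
    for published, score in scores:
        buckets[(published.year, published.month)].append(score)
    # Sort the periods so the result reads chronologically.
    return {period: mean(values) for period, values in sorted(buckets.items())}


monthly = mean_by_month(article_scores)
```

Here `monthly` maps each `(year, month)` pair to the average of the article scores published in that month.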

# Usage

In [7]:
from pysententia import Sententia

In [None]:
sent = Sententia(key = key)

## Sentiment Index

Get media sentiment index values for a specified combination of media
source, model, topic, dictionary, frequency, and aggregation method.

In [None]:
sent.sent_index(
   source = "businessday",
   model = "model_2021-05-15",
   topic = "global",
   freq = "month",
   dict = "loughran",
   aggr = "sent_logit"
   )


## Count of articles that make up sentiment

Get a count of the number of articles for a specified media source,
model, topic, and frequency of aggregation.

In [None]:
sent.sent_counts(
   source = "all",
   model = "model_2021-05-15",
   topic = "global",
   freq = "week"
   )

## Date polarity

Get a count of the number of positive and negative articles for a
specified model, topic, dictionary, aggregation method, and frequency.

In [None]:
sent.sent_date_polarity(
   source = "all",
   model = "model_2021-05-15",
   topic = "global",
   freq = "week",
   dict = "loughran",
   aggr = "sent_logit"
   )


## Word polarity

Get the top 50 most frequently occurring positive and negative words for
a specified model, topic, dictionary, aggregation method, and frequency.
The timeframe over which these words are selected depends on the
specified frequency (day = 30 days, week = 3 months, month = 6 months).

In [None]:
sent.sent_word_polarity(
   source = "all",
   model = "model_2021-05-15",
   topic = "economy",
   freq = "month",
   dict = "loughran"
   )
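
The frequency-to-timeframe mapping described above (day = 30 days, week = 3 months, month = 6 months) can be sketched as a small lookup. This only illustrates the documented windows: the helpers `months_back` and `lookback_start` are hypothetical, and treating "3 months" and "6 months" as calendar months rather than a fixed number of days is an assumption.

```python
import calendar
from datetime import date, timedelta


def months_back(day, months):
    """Return `day` shifted back by `months` calendar months.

    The day of month is clamped to the last day of the target month.
    """
    index = day.year * 12 + (day.month - 1) - months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def lookback_start(freq, end):
    """Start date of the word-selection window for a given frequency.

    Hypothetical helper; the windows follow the text above.
    """
    if freq == "day":
        return end - timedelta(days=30)
    if freq == "week":
        return months_back(end, 3)
    if freq == "month":
        return months_back(end, 6)
    raise ValueError(f"Unknown freq: {freq!r}")
```

For example, `lookback_start("month", date(2021, 5, 15))` gives the date six calendar months before 15 May 2021.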

# Plotting

In [None]:
import os

import matplotlib.pyplot as plt
from pysententia import Sententia

# Read the API key from the environment rather than hard-coding it.
sent = Sententia(key = os.getenv('SENTENTIA_KEY'))

sent_out = sent.sent_index( 
   source = "businessday", 
   model = "model_2021-05-15", 
   topic = "global", 
   freq = "month", 
   dict = "loughran", 
   aggr = "sent_logit" 
   ) 

out = sent_out[['published_date', 'sent_log']]
out.plot(x = "published_date", y = "sent_log")
# plt.show()

count_out = sent.sent_counts( 
   source = "businessday", 
   model = "model_2021-05-15", 
   topic = "global", 
   freq = "month"
   ) 

out = count_out[['published_date', 'article_count']]
out.plot(x = "published_date", y = "article_count")
# plt.show()
