Drew Lickman

CSCI 4820-001

Project #7

Due 12/3/24

A.I. Disclaimer: Work for this assignment was completed with the aid of artificial intelligence tools and comprehensive documentation of the names of, input provided to, and output obtained from, these tools is included as part of my assignment submission.

---

# Custom NLP Project using 3 Hugging Face Pipelines
### Dr. Sal Barbosa, Department of Computer Science, Middle Tennessee State University
---

# Project Description
This project analyzes the press conference transcripts of the Federal Open Market Committee (FOMC).

The entire notebook takes about 30 minutes to run.

### The Problem:
I chose this project because I believe it is important for people to get a quick and easy-to-understand analysis of the FOMC meetings. The FOMC "reviews economic and financial conditions, determines the appropriate stance of monetary policy, and assesses the risks to its long-run goals of price stability and sustainable economic growth" (https://www.federalreserve.gov/monetarypolicy/fomc.htm)

### The Dataset:
The dataset consists of the FOMC transcripts from each of their meetings. With the help of Claude 3.5 Sonnet (New), I created a web scraper that reads the FOMC website and downloads the PDFs.

### The Solution:
[1.](#web-scraping) Download PDF transcripts from the official FOMC website using `fomc-crawler.py`

[2.](#Conversion) Convert the PDFs to text files with `pdf-to-txt.py`

[3.](#BERT-based-Sentiment-Analysis) Utilize a slightly modified version of tabularisai's robust-sentiment-analysis (distil)BERT-based Sentiment Classification Model `https://huggingface.co/tabularisai/robust-sentiment-analysis` for sentiment analysis

[4.](#Summarization) Summarize each document via pipeline of Falconsai's text_summarization Fine-Tuned T5 Small for Text Summarization Model `https://huggingface.co/Falconsai/text_summarization`

[5.](#Question-Answering) Answer the question "What is the current status of the economy?" from each meeting by using consciousAI's question-answering-roberta-base-s-v2 for Question Answering `https://huggingface.co/consciousAI/question-answering-roberta-base-s-v2`

---

The following pip installs may be necessary to run the web scraper and pdf-to-text converter:

In [12]:
# For Web Scraper and PDF-to-TXT:
!pip install requests tqdm beautifulsoup4 pdfplumber datasets

# If you encounter an error, you may not have Windows Long Path support enabled. 
# You can find information on how to enable this at https://pip.pypa.io/warnings/enable-long-paths
!pip install --upgrade nbformat
!pip install plotly
!pip install transformers
#!pip install nbformat==4.2.0
!pip install ipywidgets




[notice] A new release of pip is available: 24.2 -> 25.0.1
[notice] To update, run: C:\Users\drew1\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\python.exe -m pip install --upgrade pip


Collecting nbformat
  Downloading nbformat-5.10.4-py3-none-any.whl.metadata (3.6 kB)
Downloading nbformat-5.10.4-py3-none-any.whl (78 kB)
Installing collected packages: nbformat
  Attempting uninstall: nbformat
    Found existing installation: nbformat 4.2.0
    Uninstalling nbformat-4.2.0:
      Successfully uninstalled nbformat-4.2.0
Successfully installed nbformat-5.10.4





In [2]:
import os
import re
import nltk
import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim
from datasets import load_dataset
import plotly.graph_objects as go
from   nltk.tokenize import sent_tokenize
from   transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

#nltk.download('punkt') # comment after downloading
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# [Web Scraping](#Project-Description)

To retrieve fresh data, first run `./data/fomc-crawler.py` to download all the FOMC transcript PDFs, then run `./data/pdf-to-txt.py` to convert the PDFs to TXT.

Scrape FOMC Transcripts from https://www.federalreserve.gov/monetarypolicy/fomccalendars.htm

Please wait about 1 to 3 minutes

Code written by Claude 3.5 Sonnet (New)

In [3]:
!python ./data/fomc-crawler.py
# Outputs to ./data/fomc_transcripts

Finding press conference pages...





Found 42 press conference pages.

Gathering transcript PDF links...

Found 40 transcript PDFs to download:
- https://www.federalreserve.gov/mediacenter/files/FOMCpresconf20250129.pdf
- https://www.federalreserve.gov/mediacenter/files/FOMCpresconf20240131.pdf
- https://www.federalreserve.gov/mediacenter/files/FOMCpresconf20240320.pdf
- https://www.federalreserve.gov/mediacenter/files/FOMCpresconf20240501.pdf
- https://www.federalreserve.gov/mediacenter/files/FOMCpresconf20240612.pdf
- https://www.federalreserve.gov/mediacenter/files/fomcpresconf20240731.pdf
- https://www.federalreserve.gov/mediacenter/files/FOMCpresconf20240918.pdf
- https://www.federalreserve.gov/mediacenter/files/FOMCpresconf20241107.pdf
- https://www.federalreserve.gov/mediacenter/files/FOMCpresconf20241218.pdf
- https://www.federalreserve.gov/mediacenter/files/FOMCpresconf20230201.pdf
- https://www.federalreserve.gov/mediacenter/files/FOMCpresconf20230322.pdf
- https://www.federalreserve.gov/mediacenter/files/FOMC

---
# [Conversion](#Project-Description)

Convert PDFs to TXT

Please wait 1 to 3 minutes

Code written by Claude 3.5 Sonnet (New)

--- 

In [4]:
!python ./data/pdf-to-txt.py
# Outputs to ./data/extracted_text

Batch conversion completed successfully!


2025-02-14 17:26:52,950 - INFO - Successfully generated FOMCpresconf20190130.txt
2025-02-14 17:26:54,679 - INFO - Successfully generated FOMCpresconf20190320.txt
2025-02-14 17:26:56,182 - INFO - Successfully generated FOMCpresconf20190501.txt
2025-02-14 17:26:57,769 - INFO - Successfully generated FOMCpresconf20190619.txt
2025-02-14 17:26:59,439 - INFO - Successfully generated FOMCpresconf20190731.txt
2025-02-14 17:27:01,366 - INFO - Successfully generated FOMCpresconf20190918.txt
2025-02-14 17:27:03,237 - INFO - Successfully generated FOMCpresconf20191030.txt
2025-02-14 17:27:05,260 - INFO - Successfully generated FOMCpresconf20191211.txt
2025-02-14 17:27:07,197 - INFO - Successfully generated FOMCpresconf20200129.txt
2025-02-14 17:27:08,941 - INFO - Successfully generated FOMCpresconf20200429.txt
2025-02-14 17:27:11,260 - INFO - Successfully generated FOMCpresconf20200610.txt
2025-02-14 17:27:13,358 - INFO - Successfully generated FOMCpresconf20200729.txt
2025-02-14 17:27:15,684 - IN

In [5]:
# Data directory
TEXT_DIR = "./data/extracted_text" # Local FOMC transcript data as .txt

# Summary directory
SUMMARY_DIR = "./data/summaries"

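# Save text file names and their contents to a dictionary
# Sort case-insensitively so the files stay in chronological order (one transcript PDF is named in lowercase)
txt_fileNames = sorted((txt for txt in os.listdir(TEXT_DIR) if txt.endswith('.txt')), key=str.lower)

# Read each file inside a context manager so the file handle is closed afterwards
txt_data = []
for file in txt_fileNames:
    with open(os.path.join(TEXT_DIR, file), 'r', encoding='utf-8') as f:
        txt_data.append(f.read())

textDict = dict(zip(txt_fileNames, txt_data))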

print(f"{len(txt_fileNames)} documents ready for analysis!")

# If I had more time to fix up the code to get it using datasets I would use this
# From https://www.youtube.com/watch?v=enObIMzyaE4
# transcripts = []
# for t in textDict:
#     transcripts.append({
#         'title': t,
#         'body': textDict[t]
#     })
# import json
# def save_as_jsonl(data, filename):
#     with open(filename, "w") as f:
#         for transcript in data:
#             f.write(json.dumps(transcript) + "\n")
# save_as_jsonl(transcripts, "train.jsonl")
# data_files = {"train": "train.jsonl"}
# dataset = load_dataset("json", data_files=data_files)
# print(dataset)

48 documents ready for analysis!


---
Below is a helper function that splits input text into chunks, because the sentiment analyzer and the summarizer both have limited context sizes.

Written by Claude 3.5 Sonnet (New)

In [6]:
def chunk_text(text, max_chunk_size):
    """
    Split text into chunks based on sentences to respect max token limit.
    Tries to keep sentences together while staying under the token limit.
    """
    sentences = sent_tokenize(text)
    chunks = []
    current_chunk = []
    current_length = 0
    
    for sentence in sentences:
        # Rough approximation of tokens (words + punctuation)
        sentence_length = len(sentence.split())
        
        if current_length + sentence_length > max_chunk_size:
            if current_chunk:  # Save current chunk if it exists
                chunks.append(' '.join(current_chunk))
                current_chunk = [sentence]
                current_length = sentence_length
            else:  # Handle case where single sentence exceeds max_chunk_size
                chunks.append(sentence)
                current_chunk = []
                current_length = 0
        else:
            current_chunk.append(sentence)
            current_length += sentence_length
    
    # Add the last chunk if it exists
    if current_chunk:
        chunks.append(' '.join(current_chunk))
    
    return chunks
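
The greedy packing used by `chunk_text` can be seen on a small input. The sketch below is a simplified variant rather than the notebook's function: `simple_sentence_split` is a hypothetical regex stand-in for NLTK's `sent_tokenize` (so no tokenizer data is needed), and `pack_sentences` is an illustrative name. As in `chunk_text`, length is approximated by counting whitespace-separated words, not model tokens.

```python
import re

def simple_sentence_split(text):
    # Hypothetical stand-in for nltk's sent_tokenize:
    # split after '.', '!' or '?' when followed by whitespace
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def pack_sentences(sentences, max_words):
    # Greedy packing: keep adding sentences to the current chunk
    # until the next sentence would push it over max_words
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        if current and current_len + n_words > max_words:
            chunks.append(' '.join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:  # Flush the final chunk
        chunks.append(' '.join(current))
    return chunks

sample = "Inflation is near 2 percent. The labor market is strong. We will be patient. Thank you."
demo_chunks = pack_sentences(simple_sentence_split(sample), max_words=10)
for c in demo_chunks:
    print(len(c.split()), "words:", c)
```

With a limit of 10 words, the first two sentences (5 words each) fill one chunk and the remaining two sentences form a second chunk. Because words are only a rough proxy for tokens, a chunk can still exceed the model's real token limit, which is why the sentiment code also passes `truncation=True`.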

---
# [BERT-based Sentiment Analysis](#Project-Description)

tabularisai's robust-sentiment-analysis, used via pipeline with two modifications:

- Input is chunked to handle longer texts
- The full probability distribution is returned, rather than just the highest-scoring label

Please wait 2 to 4 minutes
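
A minimal sketch of how per-chunk probabilities can be combined into one document-level distribution: apply softmax to each chunk's logits, then take the element-wise mean with every chunk weighted equally. The logits below are made up for illustration, and `softmax` and `average_distributions` are hypothetical helper names written in plain Python so the example runs without downloading a model.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def average_distributions(distributions):
    # Element-wise mean of equally weighted per-chunk distributions
    n = len(distributions)
    return [sum(col) / n for col in zip(*distributions)]

labels = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]

# Made-up logits for two chunks (illustration only, not model output)
chunk_logits = [
    [0.0, 1.0, 3.0, 2.0, 0.5],  # top label: Neutral
    [0.2, 0.4, 2.5, 2.8, 1.0],  # top label: Positive
]
chunk_probs = [softmax(l) for l in chunk_logits]
doc_probs = average_distributions(chunk_probs)
doc_label = labels[doc_probs.index(max(doc_probs))]

print(doc_label)
for label, p in zip(labels, doc_probs):
    print(f"  {label}: {p * 100:.2f}%")
```

Averaging keeps information that a majority vote over chunk labels would discard: here the two chunks disagree on the top label, and the averaged distribution settles the tie by how confident each chunk was.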

---

In [7]:
model_name = "tabularisai/robust-sentiment-analysis"
sentimentAnalysis = pipeline(model=model_name, device=device) # Standard pipeline (returns only the top label)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)

# Adapted from the example on the model's Hugging Face page; modified to return the full probability distribution
def predict_sentiment(text):
	# Move the tokenized inputs to the same device as the model
	inputs = tokenizer(text.lower(), return_tensors="pt", truncation=True, padding=True, max_length=512).to(device)
	with torch.no_grad():
		outputs = model(**inputs)
	
	probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
	predicted_class = torch.argmax(probabilities, dim=-1).item()
	
	probs_list = probabilities[0].tolist()
	sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}
	
	# Create a dictionary of sentiment labels and their probabilities
	sentiment_probs = {
						sentiment_map[i]: prob
						for i, prob in enumerate(probs_list)
						}

	return {
			'predicted_class': sentiment_map[predicted_class],
			'probabilities': sentiment_probs
			}

# Function written by Claude 3.5 Sonnet (New) to allow the pipeline to handle longer input text
def analyze_long_text(text, max_chunk_size):
	"""
	Analyze sentiment of long text by breaking it into chunks and averaging results.
	"""
	# Clean text
	text = text.replace('\n', ' ').strip()
	
	# Split into chunks using existing chunk_text function
	chunks = chunk_text(text, max_chunk_size)
	
	# Analyze each chunk
	chunk_sentiments = {"Very Negative": 0, "Negative": 0, "Neutral": 0, "Positive": 0, "Very Positive": 0}
	valid_chunks = 0
	
	for chunk in chunks:
		try:
			result = predict_sentiment(chunk) # Uses modified pipeline
			for sentiment, prob in result['probabilities'].items():
				chunk_sentiments[sentiment] += prob
			valid_chunks += 1
		except Exception as e:
			print(f"Error processing chunk: {e}")
			continue
	
	# Average the sentiments
	if valid_chunks > 0:
		for sentiment in chunk_sentiments:
			chunk_sentiments[sentiment] /= valid_chunks
	
	# Determine overall sentiment
	max_sentiment = max(chunk_sentiments.items(), key=lambda x: x[1])
	
	return {
			'predicted_class': max_sentiment[0],
			'probabilities': chunk_sentiments
			}

# Updated sentiment analysis loop
sentimentCount = {"Very Negative": 0, "Negative": 0, "Neutral": 0, "Positive": 0, "Very Positive": 0}
sentimentProbs = {"Very Negative": [], "Negative": [], "Neutral": [], "Positive": [], "Very Positive": []}
for txt in textDict:
    try:
        result = analyze_long_text(textDict[txt], max_chunk_size=256)
        print(f"File: {txt}")
        print(f"Predicted Sentiment: {result['predicted_class']}")
        print("Probability Distribution:")
        for sentiment, prob in result['probabilities'].items():
            print(f"  {sentiment}: {prob * 100:.2f}%")
            sentimentCount[sentiment] += prob 		# Save the probability to get the averages
            sentimentProbs[sentiment].append(prob)	# Save each probability for each sentiment
        print()
    except Exception as e:
        print(f"Error processing {txt}: {e}")



File: FOMCpresconf20190130.txt
Predicted Sentiment: Neutral
Probability Distribution:
  Very Negative: 3.85%
  Negative: 9.57%
  Neutral: 60.05%
  Positive: 18.53%
  Very Positive: 8.01%

File: FOMCpresconf20190320.txt
Predicted Sentiment: Neutral
Probability Distribution:
  Very Negative: 3.87%
  Negative: 9.87%
  Neutral: 55.66%
  Positive: 20.83%
  Very Positive: 9.77%

File: FOMCpresconf20190501.txt
Predicted Sentiment: Neutral
Probability Distribution:
  Very Negative: 3.56%
  Negative: 9.05%
  Neutral: 57.36%
  Positive: 21.39%
  Very Positive: 8.64%

File: FOMCpresconf20190619.txt
Predicted Sentiment: Neutral
Probability Distribution:
  Very Negative: 3.78%
  Negative: 9.37%
  Neutral: 54.29%
  Positive: 22.96%
  Very Positive: 9.60%

File: FOMCpresconf20190731.txt
Predicted Sentiment: Neutral
Probability Distribution:
  Very Negative: 4.34%
  Negative: 10.94%
  Neutral: 54.28%
  Positive: 21.66%
  Very Positive: 8.78%

File: FOMCpresconf20190918.txt
Predicted Sentiment: Neutral

In [8]:
# Print average sentiment confidence
avgSentimentPcts = []
for sentiment in sentimentCount:
	avgPct = round(sentimentCount[sentiment] / len(textDict) * 100, 2)
	avgSentimentPcts.append(avgPct)
	print(f"Average {sentiment}: \t{avgPct:.2f}%")
#print(avgSentimentPcts)

Average Very Negative: 	4.72%
Average Negative: 	10.21%
Average Neutral: 	53.03%
Average Positive: 	21.69%
Average Very Positive: 	10.34%


In [13]:
# Data preparation
sentiments = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]
percentages = avgSentimentPcts
colors = ["#ff4d4d", "#ff8c8c", "#8c8c8c", "#7fbf7f", "#2eb82e"]

# Create the bar chart
barChart = go.Figure(data=[
    go.Bar(
        x=sentiments,
        y=percentages,
        marker_color=colors,
        text=[f'{p}%' for p in percentages],
        textposition='auto',
    )
])

barChart.update_layout(
    title='Average FOMC Sentiment Distribution',
    xaxis_title='Sentiment',
    yaxis_title='Percentage (%)',
    yaxis_range=[0, 100],
    template='plotly_white',
    bargap=0.2
)

barChart.show()

#####

# Create the line chart with 5 different lines for each sentiment
lineChart = go.Figure()

for i, sentiment in enumerate(sentiments):
    lineChart.add_scatter(
        x=list(range(len(sentimentProbs[sentiment]))),
        y=[p * 100 for p in sentimentProbs[sentiment]],
        mode='lines',
        name=sentiment,
        line=dict(color=colors[i])
    )

lineChart.update_layout(
    title='Sentiment Over Time',
    xaxis_title='Time',
    yaxis_title='Percentage (%)',
    template='plotly_white'
)

lineChart.show()

ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

---
# [Summarization](#Project-Description)

Falconsai's text_summarization, used via pipeline.

Modified to chunk longer input texts.

Please wait 14 to 18 minutes

---

In [14]:
summarizer = pipeline(model="Falconsai/text_summarization", device=device)

# Function written by Claude 3.5 Sonnet (New) to allow the pipeline to handle longer input text
def summarize_long_text(text, summarizer, max_length_div, min_length_div, max_chunk_size):
    """
    Summarize long text by breaking it into chunks and combining summaries.
    """
    # Clean text
    text = text.replace('\n', ' ').strip()
    
    # Split into chunks
    chunks = chunk_text(text, max_chunk_size)
    # Note: chunkLen is the number of chunks (not the size of one chunk), so the
    # per-chunk summary length limits below grow with the length of the document
    chunkLen = len(chunks)
    max_length = chunkLen // max_length_div
    min_length = chunkLen // min_length_div

    # Summarize each chunk
    chunk_summaries = []
    for chunk in chunks:
        try:
            result = summarizer(chunk, max_length=max_length, min_length=min_length) # Pipeline from Hugging Face
            chunk_summaries.append(result[0]['summary_text'])
        except Exception as e:
            print(f"Error processing chunk: {e}")
            continue
    
    # Combine the chunk summaries by joining them in order
    # (also covers a single chunk, or no successful summaries at all)
    return ' '.join(chunk_summaries)

counter = 0
total = len(textDict)
os.makedirs(SUMMARY_DIR, exist_ok=True) # Create the output directory once, if needed
for txt in textDict:
    try:
        summary = summarize_long_text(
            text=textDict[txt],
            summarizer=summarizer,
            max_length_div=2, 	# divisor of the chunk count
            min_length_div=4, 	# divisor of the chunk count
            max_chunk_size=256	# Adjust based on model's token limit
        )
        # Write as UTF-8 so non-ASCII characters in the transcripts are saved consistently
        with open(os.path.join(SUMMARY_DIR, txt), "w", encoding="utf-8") as summary_file:
            summary_file.write(f"File: {txt}\nSummary: {summary}\n")
        counter += 1
        print(f"{counter}/{total} files summarized.")
    except Exception as e:
        print(f"Error processing {txt}: {e}")
print(f"Finished outputting all summaries to {SUMMARY_DIR}!")

1/48 files summarized.
2/48 files summarized.
3/48 files summarized.
4/48 files summarized.
5/48 files summarized.
6/48 files summarized.
7/48 files summarized.
8/48 files summarized.
9/48 files summarized.
10/48 files summarized.
11/48 files summarized.
12/48 files summarized.
13/48 files summarized.
14/48 files summarized.
15/48 files summarized.
16/48 files summarized.
17/48 files summarized.
18/48 files summarized.
19/48 files summarized.
20/48 files summarized.
21/48 files summarized.
22/48 files summarized.
23/48 files summarized.
24/48 files summarized.
25/48 files summarized.
26/48 files summarized.
27/48 files summarized.
28/48 files summarized.
29/48 files summarized.
30/48 files summarized.
31/48 files summarized.
32/48 files summarized.
33/48 files summarized.
34/48 files summarized.
35/48 files summarized.
36/48 files summarized.
37/48 files summarized.
38/48 files summarized.
39/48 files summarized.
40/48 files summarized.
41/48 files summarized.
42/48 files summarized.
4

---
List compression rate of summaries

Written by Claude 3.5 Sonnet (New)

Modified by me
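
The compression rate in this section is the share of characters removed by summarization: (original length - summary length) / original length × 100. As a worked check, the sketch below applies that formula to the character counts reported for `FOMCpresconf20190130.txt` in this notebook's output; `compression_percent` is a hypothetical helper name used only for this example.

```python
def compression_percent(original_length, summary_length):
    # Share of characters removed by summarization, as a percentage
    if original_length <= 0:
        raise ValueError("original_length must be positive")
    return (original_length - summary_length) / original_length * 100

# Character counts reported for FOMCpresconf20190130.txt
pct = compression_percent(44387, 2096)
print(f"{pct:.2f}%")  # prints 95.28%
```

Note that each summary file also contains the `File:` and `Summary:` labels, so the measured summary length is slightly larger than the summary text alone.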

In [15]:
# Get list of original and summary files
original_files = [f for f in os.listdir(TEXT_DIR) if f.endswith('.txt')]
summary_files = [f for f in os.listdir(SUMMARY_DIR) if f.endswith('.txt')]

# Initialize a list to store compression results
compression_results = []

# Function to read file content with error handling
def read_file_content(file_path):
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            return file.read()
    except UnicodeDecodeError:
        with open(file_path, 'r', encoding='ISO-8859-1') as file:  # Fallback encoding
            return file.read()

# Compare lengths and calculate compression percentage
for original_file in original_files:
	original_path = os.path.join(TEXT_DIR, original_file)
	summary_path = os.path.join(SUMMARY_DIR, original_file)

	# Check if summary file exists
	if original_file in summary_files:
		original_content = read_file_content(original_path)
		summary_content = read_file_content(summary_path)

		original_length = len(original_content)
		summary_length = len(summary_content)

		# Calculate compression percentage
		compression_percent = ((original_length - summary_length) / original_length) * 100
		compression_results.append({
			"file": original_file,
			"original_length": original_length,
			"summary_length": summary_length,
			"compression_percent": compression_percent
		})

# Display results
for result in compression_results:
    print(f"{result['file']}: {result['original_length']} -> {result['summary_length']} (characters) | Compression: {result['compression_percent']:.2f}%")

FOMCpresconf20190130.txt: 44387 -> 2096 (characters) | Compression: 95.28%
FOMCpresconf20190320.txt: 42111 -> 1827 (characters) | Compression: 95.66%
FOMCpresconf20190501.txt: 37670 -> 1424 (characters) | Compression: 96.22%
FOMCpresconf20190619.txt: 40790 -> 1693 (characters) | Compression: 95.85%
FOMCpresconf20190731.txt: 42756 -> 1916 (characters) | Compression: 95.52%
FOMCpresconf20190918.txt: 49051 -> 2637 (characters) | Compression: 94.62%
FOMCpresconf20191030.txt: 44764 -> 2017 (characters) | Compression: 95.49%
FOMCpresconf20191211.txt: 50162 -> 2634 (characters) | Compression: 94.75%
FOMCpresconf20200129.txt: 52787 -> 2828 (characters) | Compression: 94.64%
FOMCpresconf20200429.txt: 44014 -> 1696 (characters) | Compression: 96.15%
FOMCpresconf20200610.txt: 56502 -> 3267 (characters) | Compression: 94.22%
FOMCpresconf20200729.txt: 54908 -> 2999 (characters) | Compression: 94.54%
FOMCpresconf20200916.txt: 60597 -> 3729 (characters) | Compression: 93.85%
FOMCpresconf20201105.txt:

---
# [Question Answering](#Project-Description)

consciousAI's question-answering-roberta-base-s-v2, used via pipeline.

Ask a question to see how the FOMC's answer changes over time.

Please wait 8 to 12 minutes
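
Each answer is labeled with the meeting date taken from the transcript's file name. The sketch below shows one way to extract that date without depending on the exact length of the file name prefix; `meeting_date` is a hypothetical helper, and it assumes every file name contains a single `YYYYMMDD` run of eight digits, as the names listed in this notebook do.

```python
import re
from datetime import datetime

def meeting_date(file_name):
    # Look for eight consecutive digits (YYYYMMDD) anywhere in the file name
    match = re.search(r"(\d{8})", file_name)
    if match is None:
        raise ValueError(f"No date found in {file_name!r}")
    # strptime also validates the date (e.g. it rejects month 13)
    return datetime.strptime(match.group(1), "%Y%m%d").date()

# Example names following the pattern of the downloaded transcripts
names = ["FOMCpresconf20190130.txt", "fomcpresconf20240731.txt"]
dates = [meeting_date(n) for n in names]
print([d.strftime("%m/%d/%Y") for d in dates])  # prints ['01/30/2019', '07/31/2024']
```

Returning `date` objects instead of string slices also makes it easy to sort the meetings chronologically or to use real dates on a chart axis.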

---

In [16]:
questAns = pipeline(model="consciousAI/question-answering-roberta-base-s-v2", device=device)

# Example Questions:											#Avg Question Quality/Confidence
#question="What is the current status of the economy? "			#57.97%
#question="What is the future of the economy going to be? "		#44.17%
#question="What is the current rate of inflation? " 			#89.51% 	#useful question and high quality rating
#question="What is the status of the stock market? " 			#25.63%
#question="What have been the main economic concerns lately? " 	#65.54%
#question="What are the key decisions being made today? " 		#51.64%
#question="What is the current federal funds rate? "			#73.75%
#question="How long until the quantitative easing ends? "		#48.38%
#question="How much debt is the government in? "				#12.43% 	#useful question but low quality rating
#question="How many Americans are unemployed? "					#66.61%
#question="What is the best news from this meeting? "			#55.04%
#question="What time of day is it? "							#78.1% 		#non-useful question but high quality rating
#question="What color is my underwear? "						#16.05% 	#non-useful question and low quality rating
question="What is the current rate of inflation? Only show me the exact number " #91.02% 	#useful question and high quality rating; the refined prompt results in better quality responses
#question = input("Enter your question: ")
print(question)
print()
scoreArray = []
for file in textDict:
    answer 	= questAns(question=question, context=textDict[file])
    date 	= file[12:20]
    year 	= date[0:4]
    month 	= date[4:6]
    day 	= date[6:8]
    print(f"{month}/{day}/{year}: {round(answer['score'] * 100, 2)}%:\t", end="")
    scoreArray.append(answer['score'])
    answer = re.sub(r'\n', ' ', answer['answer'])
    print(f"{answer}")

npScoreArray= np.array(scoreArray)
mean 		= np.mean(npScoreArray)
variance 	= np.var(npScoreArray)
std_dev 	= np.std(npScoreArray)
std_err 	= std_dev/np.sqrt(len(npScoreArray))

print()
print(f"Mean (confidence): {mean:.2%}")
print(f"Standard Deviation: {std_dev:.2%}")
print(f"Variance: {variance:.2%}")
print(f"Standard Error: {std_err:.2%}")

What is the current rate of inflation? Only show me the exact number 

01/30/2019: 74.11%:	between 2.25 and 2½
03/20/2019: 73.24%:	close to target
05/01/2019: 98.16%:	below 2 percent
06/19/2019: 92.05%:	2 percent
07/31/2019: 95.56%:	2 percent
09/18/2019: 89.23%:	2 percent
10/30/2019: 97.6%:	2.9 percent
12/11/2019: 87.5%:	2 percent
01/29/2020: 95.05%:	2 percent
04/29/2020: 23.84%:	in the third quarter
06/10/2020: 98.26%:	near zero
07/29/2020: 61.17%:	well below our symmetric 2 percent
09/16/2020: 96.99%:	2 percent
11/05/2020: 95.44%:	2 percent
12/16/2020: 95.02%:	2 percent
01/27/2021: 94.32%:	less than 2 percent
03/17/2021: 92.81%:	2 percent
04/28/2021: 94.71%:	below 2 percent
06/16/2021: 97.2%:	8.4 percent
07/28/2021: 88.83%:	above 2 percent
09/22/2021: 95.11%:	2 percent
11/03/2021: 47.81%:	2 percent
12/15/2021: 88.95%:	4.2 percent
01/26/2022: 96.76%:	2 percent
03/16/2022: 99.4%:	4.3 percent
05/04/2022: 92.31%:	2 percent
06/15/2022: 98.56%:	2 percent
07/27/2022: 99.53%:	3.6
09/21/2022: