<a href="https://colab.research.google.com/github/boucher-broderick/Ml_AI/blob/main/Homework_6_RL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task 1

The dataset I've selected consists of CNBC news articles, encompassing a variety of information to represent each article comprehensively. It features essential details like the article's title, giving a snapshot of the content, and the URL, which links directly to the full article on CNBC's website. The publication date and the author's name are also included, offering insight into the timing of the news and crediting the journalist responsible for the piece.

Additionally, the dataset provides the name of the publisher, which is CNBC for every article. Each entry comes with a short description that serves as an executive summary, capturing the key points of the article in a few sentences. The dataset also distinguishes between the 'raw_description' and 'description' fields. Judging from the sample rows below, 'raw_description' contains the article body as HTML, while 'description' contains the same text with the markup removed, giving the dataset its full textual content.

https://data.world/crawlfeeds/cnbc-news-dataset

In [2]:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("../content/cnbc_news_datase.csv")
df.head()
```

| | title | url | published_at | author | publisher | short_description | keywords | header_image | raw_description | description | scraped_at |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Santoli’s Wednesday market notes: Could Septem... | https://www.cnbc.com/2021/09/29/santolis-wedne... | 2021-09-29T17:09:39Z | Michael Santoli | CNBC | This is the daily notebook of Mike Santoli, CN... | cnbc, Premium, Articles, Investment strategy, ... | https://image.cnbcfm.com/api/v1/image/10694960... | `<div class="group"><p><em>This is the daily no...` | This is the daily notebook of Mike Santoli, CN... | 2021-10-30T14:11:23.709372 |
| 1 | My take on the early Brexit winners and losers | https://www.cnbc.com/2016/06/24/ian-bremmers-t... | 2016-06-24T17:50:48Z | | CNBC | This commentary originally ran on Facebook. Bo... | Articles, Politics, Europe News, European Cent... | https://fm.cnbc.com/applications/cnbc.com/reso... | | | 2021-10-30T14:11:23.820139 |
| 2 | Europe&#039;s recovery depends on Renzi&#039;s... | https://www.cnbc.com/2014/03/25/europes-recove... | 2014-03-25T17:29:45Z | | CNBC | In spring, ambitious reforms began in Italy. U... | Articles, Business News, Economy, Europe Econo... | https://fm.cnbc.com/applications/cnbc.com/reso... | | | 2021-10-30T14:11:23.85471 |
| 3 | US Moves Closer to Becoming A Major Shareholde... | https://www.cnbc.com/2009/04/22/us-moves-close... | 2009-04-22T19:49:03Z | Michelle Caruso-Cabrera | CNBC | The US government is increasingly likely to co... | cnbc, Articles, General Motors Co, Business Ne... | https://image.cnbcfm.com/api/v1/image/24947979... | `<div class="group"><p>The US government is inc...` | The US government is increasingly likely to co... | 2021-10-30T14:11:24.261143 |
| 4 | Trump: 'Mission accomplished' on 'perfectly ex... | https://www.cnbc.com/2018/04/14/trump-mission-... | 2018-04-14T14:59:04Z | Javier E. David | CNBC | | cnbc, Articles, George W. Bush, Vladimir Putin... | https://image.cnbcfm.com/api/v1/image/10513177... | `<div class="group"></div>,<div class="group"><...` | President Donald Trump hailed the U.S.-led int... | 2021-10-30T14:11:24.48949 |


We will now begin exploring and cleaning the dataset. Within the dataset we mainly care about two fields: `description` and `short_description`. The description (the article text) is the model's input, and the short description serves as the target summary during training and as the reference when measuring accuracy. We will check for null values and, in the end, keep only 20 records, because fine-tuning BART is RAM-intensive and Google Colab provides a limited amount of memory.

In [3]:
```python
df.head(5)[['title', 'description']]
```

| | title | description |
|---|---|---|
| 0 | Santoli’s Wednesday market notes: Could Septem... | This is the daily notebook of Mike Santoli, CN... |
| 1 | My take on the early Brexit winners and losers | |
| 2 | Europe&#039;s recovery depends on Renzi&#039;s... | |
| 3 | US Moves Closer to Becoming A Major Shareholde... | The US government is increasingly likely to co... |
| 4 | Trump: 'Mission accomplished' on 'perfectly ex... | President Donald Trump hailed the U.S.-led int... |


In [4]:
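```python
# Summarize data quality: missing values per column, duplicate rows, and column types.
# (Bare expressions such as df.isnull().sum() only display when they are the last line
# of a cell, so the results are printed explicitly.)
print("Missing values:\n", df.isnull().sum())
print("\nDuplicate values:\n", df.duplicated().sum())
print("\nData types:\n", df.dtypes)
```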


```
Missing values:
 title                  0
url                    0
published_at           0
author               228
publisher              0
short_description     16
keywords               0
header_image           6
raw_description       31
description           32
scraped_at             0
dtype: int64

Duplicate values:
 0

Data types:
 title                object
url                  object
published_at         object
author               object
publisher            object
short_description    object
keywords             object
header_image         object
raw_description      object
description          object
scraped_at           object
dtype: object
```


In [5]:
```python
df.shape
```

(625, 11)

In [6]:
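```python
# Drop rows with missing values in these columns. Only 'description' (model input) and
# 'short_description' (target summary) are strictly required for training; also filtering
# on 'author' and 'header_image' removes more rows (228 articles have no author).
df = df.dropna(subset=['description', 'raw_description', 'author', 'short_description', 'header_image'])
df.shape
```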


(373, 11)

In [7]:
```python
# Keep only the first 20 articles to stay within Colab's RAM limits
df = df.head(20)
df.shape
```

(20, 11)

In [7]:
```
!pip install accelerate -U
!pip install transformers[torch] -U
```



In [8]:
```python
from huggingface_hub import notebook_login

notebook_login()
```


We'll use the "facebook/bart-base" model to summarize the news articles. First, we'll split our dataset of CNBC news articles into training and test sets, keeping 10% of the data (two of the 20 articles) for testing, and convert both into Hugging Face `Dataset` objects. Next, we'll write a preprocessing function that tokenizes the articles and their short descriptions, truncating the articles to 256 tokens and the summaries to 64 tokens, and stores the tokenized summaries as the labels the model learns to produce.

We're using BART, a model known for its effectiveness in summarization tasks, together with its tokenizer. We'll also set up a data collator (`DataCollatorForSeq2Seq`) to batch the examples and pad the inputs and labels. For training, we'll define the number of epochs, the batch size, when to evaluate the model, and other settings using `TrainingArguments`. Although mixed-precision training (FP16) is available, we won't use it, to avoid compatibility issues with certain hardware.

Finally, we'll start training with the `Trainer` class, passing it the model, the training arguments, the datasets, and the data collator. During training, the model learns to map each news article to its short description.

In [10]:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, Trainer, TrainingArguments, DataCollatorForSeq2Seq
from datasets import Dataset
from sklearn.model_selection import train_test_split

# Load the pretrained BART checkpoint and its tokenizer
model_name = "facebook/bart-base"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Hold out 10% of the articles (2 of 20) for testing
train_df, test_df = train_test_split(df[['description', 'short_description']], test_size=0.1, random_state=42)

train_dataset = Dataset.from_pandas(train_df)
test_dataset = Dataset.from_pandas(test_df)


def preprocess_function(examples):
    # The article text is the model input, truncated to 256 tokens
    model_inputs = tokenizer(examples['description'], max_length=256, truncation=True)
    # The short description is the target summary, truncated to 64 tokens
    labels = tokenizer(text_target=examples['short_description'], max_length=64, truncation=True)
    model_inputs['labels'] = labels['input_ids']
    return model_inputs

train_dataset = train_dataset.map(preprocess_function, batched=True, load_from_cache_file=False)
test_dataset = test_dataset.map(preprocess_function, batched=True, load_from_cache_file=False)

# Pads inputs and labels per batch (padded label positions are set to -100 and ignored by the loss)
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model, return_tensors="pt")

# Note: with 18 training examples this configuration yields only 6 optimizer steps, so the
# 500-step warmup never finishes and the 500-step eval/save intervals are never reached.
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    evaluation_strategy="steps",    # renamed eval_strategy in newer transformers releases
    eval_steps=500,
    save_steps=500,
    gradient_accumulation_steps=4,  # effective batch size: 2 * 4 = 8
    fp16=False,                     # mixed precision disabled for hardware compatibility
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    data_collator=data_collator,
)

trainer.train()
```




Step,Training Loss,Validation Loss


TrainOutput(global_step=6, training_loss=0.37338487307230633, metrics={'train_runtime': 143.9946, 'train_samples_per_second': 0.375, 'train_steps_per_second': 0.042, 'total_flos': 7134631096320.0, 'train_loss': 0.37338487307230633, 'epoch': 2.67})

In [12]:
```
!pip install rouge_score
!pip install evaluate
```

```
Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... done
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24933 sha256=002dec432329164dc0b98087012cb72afdd7cbd7080f4f92cb094ce8240346a5
  Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4
Successfully built rouge_score
Installing collected packages: rouge_score
Successfully installed rouge_score-0.1.2
Collecting evaluate
  Downloading evaluate-0.4.1-py3-none-any.whl (84 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84.1/84.1 kB 2.4 MB/s eta 0:00:00
Collecting responses<0.19 (from evaluate)
  Downloading responses-0.18.0-py3-none-any.whl (38 kB)
Installing collected packages: responses, evaluate
Successfully installed evaluate-0.4
```

In [23]:
```python
import evaluate

bleu_metric = evaluate.load("bleu")
rouge_metric = evaluate.load("rouge")

# The model may still be in training mode after trainer.train(), and generate() does not
# switch modes, so disable dropout explicitly before generating summaries
model.eval()

def generate_summaries(batch):
    # Tokenize the articles as during training (truncated to 256 tokens)
    inputs = tokenizer(batch["description"], padding=True, truncation=True, max_length=256, return_tensors="pt")
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    # Beam search with 5 beams; summaries are capped at 64 tokens, like the training labels
    outputs = model.generate(**inputs, max_length=64, num_beams=5)
    batch["pred_summary"] = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return batch

test_results = test_dataset.map(generate_summaries, batched=True, batch_size=8)

predictions = test_results["pred_summary"]
# Each prediction has one reference, wrapped in a list as BLEU expects
references = [[ref] for ref in test_results["short_description"]]

bleu_result = bleu_metric.compute(predictions=predictions, references=references)
rouge_result = rouge_metric.compute(predictions=predictions, references=references)

print("BLEU score:", bleu_result)
print("ROUGE score:", rouge_result)
```



```
BLEU score: {'bleu': 0.7915386289638935, 'precisions': [0.797979797979798, 0.7938144329896907, 0.7894736842105263, 0.7849462365591398], 'brevity_penalty': 1.0, 'length_ratio': 1.1511627906976745, 'translation_length': 99, 'reference_length': 86}
ROUGE score: {'rouge1': 0.7993527508090614, 'rouge2': 0.7932153871124817, 'rougeL': 0.7993527508090614, 'rougeLsum': 0.7993527508090614}
```


In [30]:
```python
# Compare the reference and generated summaries for both test articles
for i in range(2):
    print("Short Description: ")
    print(references[i])
    print("Generated Short Description: ")
    print(predictions[i] + '\n')
```

```
Short Description: 
["This is the daily notebook of Mike Santoli, CNBC's senior markets commentator, with ideas about trends, stocks and market statistics."]
Generated Short Description: 
This is the daily notebook of Mike Santoli, CNBC's senior markets commentator, with ideas about trends, stocks and market statistics.A muted, inconclusive bounce that has left the indexes fully within yesterday's low-to-high range all morning so far.

Short Description: 
['Apple as a non-confirmation? It started in the middle of February: Transports failed to keep advancing with the Industrials. At the time, the failure was blamed on higher fuel prices; indeed, oil prices hit a 10-month high as we closed out that month.That was the first non-confirmation that got bears going.']
Generated Short Description: 
Apple as a non-confirmation? It started in the middle of February: Transports failed to keep advancing with the Industrials. At the time, the failure was blamed on higher fuel prices; indeed, oil p
```

**Results**


After training, the model reached a BLEU score of about 0.79 on the two held-out test articles, which means the generated summaries share most of their words and phrases with the reference summaries. The modified n-gram precisions are similar from 1-grams through 4-grams (about 0.78 to 0.80), so the overlap is not limited to individual words but extends to runs of up to four words, which matter for coherent and meaningful summaries.

The brevity penalty of 1.0 only means that the generated summaries were not shorter overall than the references: BLEU penalizes output that is too short, but it does not penalize output that is too long. The length ratio of about 1.15 (99 generated tokens versus 86 reference tokens) shows that the generated summaries are somewhat longer than the references; in the first example above, the model reproduces the reference and then continues with another sentence.
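These numbers can be checked against each other. In the `evaluate` library's `bleu` metric, BLEU is the brevity penalty multiplied by the geometric mean of the four modified n-gram precisions, and the penalty applies only when the candidate is shorter than the references. The sketch below recomputes the reported score from the reported components. The precisions correspond to 79/99, 77/97, 75/95, and 73/93 matching n-grams; the denominator drops by two per n-gram order, one for each of the two test summaries. The helper function names are illustrative and are not part of `evaluate`.

```python
import math

# Components copied from the BLEU output reported above
precisions = [0.797979797979798, 0.7938144329896907, 0.7894736842105263, 0.7849462365591398]
translation_length = 99  # total tokens in the generated summaries
reference_length = 86    # total tokens in the reference summaries


def brevity_penalty(candidate_len: int, reference_len: int) -> float:
    """Penalize only candidates that are shorter than the references."""
    if candidate_len > reference_len:
        return 1.0
    return math.exp(1 - reference_len / candidate_len)


def bleu_from_components(precisions, candidate_len, reference_len):
    """BLEU = brevity penalty * exp(mean of log n-gram precisions), uniform weights."""
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty(candidate_len, reference_len) * math.exp(log_mean)


length_ratio = translation_length / reference_length
bp = brevity_penalty(translation_length, reference_length)
bleu = bleu_from_components(precisions, translation_length, reference_length)

print(f"length ratio: {length_ratio:.4f}")  # 1.1512
print(f"brevity penalty: {bp}")             # 1.0
print(f"BLEU: {bleu:.4f}")                  # 0.7915
# A candidate shorter than the references is penalized, e.g. 70 vs 86 tokens:
print(f"penalty for a short candidate: {brevity_penalty(70, 86):.3f}")
```

Because the penalty is one-sided, a value of 1.0 says nothing about whether summaries are too long; extra length only shows up indirectly, as lower n-gram precision.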

The ROUGE scores are similarly high: ROUGE-1, ROUGE-L, and ROUGE-Lsum are about 0.80, and ROUGE-2 is about 0.79. ROUGE-1 measures the overlap of individual words with the reference summaries and ROUGE-2 the overlap of word pairs (bigrams), while ROUGE-L is based on the longest common subsequence, so it rewards generated text that preserves the reference's word order, which is important for the overall structure and flow of the text.
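To make the ROUGE-1 numbers concrete, here is a simplified sketch of the computation: count the words shared by prediction and reference (each word matched at most as often as it appears in both), then combine precision and recall into an F1 score, which is the value the `evaluate` ROUGE metric reports by default. It assumes lowercased whitespace tokenization, which is cruder than the tokenizer in the `rouge_score` package, and the sentences are toy data. It shows why a prediction that reproduces the reference and then keeps going, like the first generated example above, keeps full recall but loses precision.

```python
from collections import Counter


def rouge1(prediction: str, reference: str) -> dict:
    """Simplified ROUGE-1 on lowercased whitespace tokens."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word (clipped matches)
    overlap = sum((pred_counts & ref_counts).values())
    precision = overlap / max(sum(pred_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


reference = "stocks rose on wednesday"
exact = rouge1("stocks rose on wednesday", reference)
longer = rouge1("stocks rose on wednesday as oil prices fell", reference)
print(exact)   # {'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
print(longer)  # precision 0.5, recall 1.0, f1 about 0.667
```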

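Overall, these results indicate that on this small test set the generated summaries closely mirror the reference summaries in wording and structure. They should be interpreted with caution, however: the test set contains only two articles, and in the examples above the model appears to copy the opening of the input article, which in several of the sample rows already matches the short description. A larger test set would be needed to draw firmer conclusions about the model's summarization ability.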

**Parameters**

- **Batch Size:** A per-device batch size of 2 keeps memory use low, which matters on Colab. Small batches give noisier gradient estimates, so larger batch sizes could make training more stable if memory allows.
- **Number of Epochs:** Training ran for 3 epochs. With 18 training examples, a batch size of 2, and 4 gradient accumulation steps, this amounted to only 6 optimizer updates (`global_step=6` in the training output), which is too little training to judge whether more epochs would help or would start to overfit. The number of epochs is a natural parameter to tune.
- **Learning Rate & Warmup Steps:** The learning rate was left at the `TrainingArguments` default of 5e-5, while `warmup_steps` was set explicitly to 500. With the default linear schedule, the learning rate ramps up from zero over the warmup steps; because this run lasted only 6 optimizer steps, it never rose above about 1% of its configured value. Reducing the warmup (for example with `warmup_ratio`) or training for more steps would let the model train at the intended learning rate.
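- **Evaluation Strategy:** Evaluation and checkpointing were configured to run every 500 steps. Because training stopped after 6 steps, neither ever ran, which is consistent with the empty Step/Training Loss/Validation Loss table in the training output. For a dataset this small, evaluating once per epoch (`evaluation_strategy="epoch"`) would be more informative.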
- **Gradient Accumulation Steps:** Set to 4, this accumulates gradients over four batches of 2 before each update, simulating an effective batch size of 8, which helps stabilize the training updates without exceeding memory limits.
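The interaction between these settings can be checked with a little arithmetic. The sketch below approximately reproduces how the Hugging Face `Trainer` derives the number of optimizer steps (micro-batches per epoch divided by the accumulation steps, times the number of epochs) and the learning rate during linear warmup. It assumes 18 training examples (20 articles minus the 10% test split), the default linear schedule, and the default learning rate of 5e-5; the step count it produces agrees with the `global_step=6` in the training output.

```python
import math

# Settings from the training cell; base_lr is the TrainingArguments default (assumed unchanged)
num_train_examples = 18  # 20 articles minus the 10% test split
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_train_epochs = 3
warmup_steps = 500
base_lr = 5e-5

micro_batches_per_epoch = math.ceil(num_train_examples / per_device_train_batch_size)  # 9
updates_per_epoch = max(micro_batches_per_epoch // gradient_accumulation_steps, 1)      # 2
total_updates = math.ceil(num_train_epochs * updates_per_epoch)                         # 6
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps        # 8


def warmup_lr(step: int) -> float:
    """Linear warmup (valid while step < warmup_steps): the LR ramps from 0 toward base_lr."""
    return base_lr * step / max(1, warmup_steps)


# Learning rate used at each optimizer step (the schedule starts at step 0)
lrs = [warmup_lr(step) for step in range(total_updates)]
print(f"optimizer steps: {total_updates}, effective batch size: {effective_batch_size}")
print(f"highest LR reached: {max(lrs):.1e} ({max(lrs) / base_lr:.1%} of base_lr)")
```

With 500 warmup steps and only 6 updates, the learning rate peaks at about 1% of its configured value.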

**LLM**

- **Model Selection:** The choice of BART matters. BART is an encoder-decoder model known for its effectiveness in sequence-to-sequence tasks, including summarization, and is well suited to understanding the context of an article and generating a relevant summary.

- **Model Size:** The "base" variant balances computational efficiency with the capacity to capture complex patterns in the data. Larger models might perform better, but they require more computational resources and, on a dataset this small, could be more prone to overfitting.


# Task 2



An example of formulating a reinforcement learning (RL) problem as a Markov decision process (MDP) is inventory management in retail. In this example, you need to decide how much stock to keep on hand to meet demand without overstocking. The goal is to maximize profits and customer satisfaction by making smart inventory decisions.

**State Space**

The State Space includes everything known about the current situation. This would be:
- Quantity of each product we have
- Predictions about future sales
- Data on previous sales
- Information from suppliers
- Outside factors, such as holidays, that may affect sales

**Action Space**

The Action Space includes the actions that can be taken involving inventory. This would be:
- Ordering more of a certain product
- Sending stock of a product to a different store
- Putting items on sale to get rid of them
- Adjusting prices in general to increase or decrease sales

**Transition Model**

The Transition Model describes how the state changes as a result of the chosen action. Because customer demand is uncertain, the outcome of an action is probabilistic. For example:
- When more stock is ordered, the supplier might charge more or less
- Putting an item on sale may cause it to sell out and go out of stock
- Raising prices may leave excess stock

There are many different outcomes that can happen depending on the season, holidays, and customers.
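A minimal sketch of such a stochastic transition for a single product is shown below. It assumes, hypothetically, that demand per period follows a Poisson distribution and that an order arrives before the period's sales begin; `transition` and its parameters are illustrative names, not part of any library. Calling it repeatedly with the same state and action gives different next states, which is what makes the transition model probabilistic.

```python
import numpy as np


def transition(stock: int, order_qty: int, mean_demand: float, rng: np.random.Generator):
    """Sample the next state of a hypothetical single-product inventory.

    Returns (next_stock, units_sold, unmet_demand). Poisson demand is an
    illustrative assumption, not a claim about real retail data.
    """
    available = stock + order_qty            # assume the order arrives before sales start
    demand = int(rng.poisson(mean_demand))   # random customer demand for the period
    sold = min(available, demand)
    unmet = demand - sold                    # customers turned away because stock ran out
    next_stock = available - sold
    return next_stock, sold, unmet


rng = np.random.default_rng(seed=0)
# Same state and action, different outcomes
for _ in range(3):
    print(transition(stock=5, order_qty=10, mean_demand=12.0, rng=rng))
# A holiday could be modeled by a higher expected demand
print(transition(stock=5, order_qty=10, mean_demand=30.0, rng=rng))
```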

**Rewards**

The reward can be defined fairly directly because there are a few clear goals, each of which can be measured in money. The first is to make as much money as possible. The second is to avoid excess stock: especially at a grocery store, excess food can go bad before it is sold, so the quantity of excess products is penalized by its cost. The last is to avoid running out of a product, since more money could have been made if it had been in stock. That missed revenue can be estimated by looking at the rate at which the item sold on previous days and continuing the trend.
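The three money-based terms can be combined into a single reward per period. The sketch below is one hypothetical way to do it: profit on units sold, minus the cost of excess stock that may spoil, minus the margin lost on unmet demand. In practice `unmet` would be an estimate, for example derived from the item's recent daily sales rate as described above; the function name, parameters, and prices are illustrative.

```python
def inventory_reward(sold: int, leftover: int, unmet: int,
                     price: float, unit_cost: float, spoilage_cost: float) -> float:
    """Hypothetical one-period reward, measured in money."""
    margin = price - unit_cost
    profit = margin * sold                      # goal 1: money made on sales
    excess_penalty = spoilage_cost * leftover   # goal 2: excess stock that may go bad
    stockout_penalty = margin * unmet           # goal 3: sales missed because the item ran out
    return profit - excess_penalty - stockout_penalty


# Example: 10 units sold, 3 left over, 2 customers turned away
print(inventory_reward(sold=10, leftover=3, unmet=2,
                       price=4.0, unit_cost=2.5, spoilage_cost=1.0))  # 9.0
```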

With the current data and these rewards, an algorithm can be developed to find the best quantity of each product to stock, so that you make the most money, avoid excess stock, and don't miss out on sales because an item ran out.
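For a small, fully specified problem, such an algorithm can be as simple as value iteration, a standard dynamic-programming method for solving MDPs. The toy sketch below assumes a single product, a shelf capacity of 10 units, a made-up demand distribution, and a reward built from the same three money terms (sales, excess stock, missed sales); it computes how many units to order at each stock level. All parameters are hypothetical, and real retail settings with large state spaces and unknown demand would typically call for sampling-based RL methods instead.

```python
import numpy as np

# Toy, hypothetical parameters
MAX_STOCK = 10                 # shelf capacity
MAX_ORDER = 5                  # largest order per period
DEMAND = np.arange(7)          # possible demand per period: 0..6 units
DEMAND_P = np.array([0.05, 0.10, 0.20, 0.25, 0.20, 0.12, 0.08])
PRICE, UNIT_COST, SPOILAGE_COST = 4.0, 2.5, 0.5
GAMMA = 0.95                   # discount factor for future rewards


def outcomes(stock, order):
    """Yield (probability, next_stock, reward) for every possible demand value."""
    available = stock + order
    for demand, p in zip(DEMAND, DEMAND_P):
        sold = min(available, demand)
        leftover = available - sold
        unmet = demand - sold
        reward = (PRICE * sold - UNIT_COST * order      # money made this period
                  - SPOILAGE_COST * leftover            # excess stock
                  - (PRICE - UNIT_COST) * unmet)        # missed sales
        yield p, leftover, reward


V = np.zeros(MAX_STOCK + 1)
for _ in range(2000):
    Q = np.full((MAX_STOCK + 1, MAX_ORDER + 1), -np.inf)
    for s in range(MAX_STOCK + 1):
        for a in range(MAX_ORDER + 1):
            if s + a > MAX_STOCK:
                continue  # cannot exceed shelf capacity
            # Bellman optimality backup: expected reward plus discounted next-state value
            Q[s, a] = sum(p * (r + GAMMA * V[s2]) for p, s2, r in outcomes(s, a))
    V_new = Q.max(axis=1)
    converged = np.max(np.abs(V_new - V)) < 1e-8
    V = V_new
    if converged:
        break

policy = Q.argmax(axis=1)  # best order quantity for each stock level
for stock, order in enumerate(policy):
    print(f"stock={stock:2d} -> order {order}")
```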



# Task 3

Trading markets are complex, and trading effectively requires a lot of information. Traditional trading algorithms often can't keep up with the new information coming into the market. Reinforcement learning (RL) could be a way to optimize trading strategies. By training a system on historical price data for a specific asset, such as crypto or tech stocks, the system can learn and develop strategies over time. It receives a positive reward for profitable trades and a negative reward for losing ones, and this process could lead to a strategy that aims for high returns while managing risk. RL works by having an agent (such as a trading bot) interact with its environment (the market): the agent makes trades based on its current strategy, observes the results, and then adjusts its strategy to improve. The goal is to find complex trading strategies that traditional methods might miss.
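The agent-environment loop described above can be sketched in a few lines. The toy environment below is hypothetical and is not TensorTrade's API: the state is the current price and whether the agent holds the asset, the actions are hold, buy, or sell, and the reward is the change in portfolio value over one step, so profitable positions earn positive reward and losing positions negative reward. A learning algorithm such as Q-learning would use these rewards to update its policy; here a fixed buy-and-hold rule stands in for the agent.

```python
import numpy as np

HOLD, BUY, SELL = 0, 1, 2


class ToyTradingEnv:
    """Hypothetical single-asset market that replays a fixed price history."""

    def __init__(self, prices, cash=1000.0):
        self.prices = np.asarray(prices, dtype=float)
        self.start_cash = cash

    def reset(self):
        self.t, self.cash, self.units = 0, self.start_cash, 0.0
        return self._observation()

    def _observation(self):
        return self.prices[self.t], self.units > 0

    def net_worth(self):
        return self.cash + self.units * self.prices[self.t]

    def step(self, action):
        before = self.net_worth()
        price = self.prices[self.t]
        if action == BUY and self.cash > 0:        # invest all available cash
            self.units += self.cash / price
            self.cash = 0.0
        elif action == SELL and self.units > 0:    # close the whole position
            self.cash += self.units * price
            self.units = 0.0
        self.t += 1                                # the market moves to the next price
        reward = self.net_worth() - before         # profit or loss over this step
        done = self.t == len(self.prices) - 1
        return self._observation(), reward, done


env = ToyTradingEnv(prices=[100, 110, 121, 115])
obs, done, total = env.reset(), False, 0.0
while not done:
    action = BUY if not obs[1] else HOLD           # stand-in policy: buy once, then hold
    obs, reward, done = env.step(action)
    total += reward
    print(f"price={obs[0]:.0f} reward={reward:+.1f}")
print(f"total reward: {total:.1f}")                # 150.0
```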

TensorTrade is an open-source framework that uses reinforcement learning to create trading strategies. Using RL can help uncover complex strategies that are non-obvious and difficult to derive with traditional rule-based trading systems. TensorTrade follows a few guiding principles: user-friendliness, modularity, and extensibility. The framework is designed to be modular, allowing users to easily swap out components and test different configurations of data sources, execution models, trading strategies, and reward schemes. Users can define custom trading environments, including market simulations based on historical data, streaming data, or synthetic data generators, which makes it possible to test strategies under various market conditions. The framework also includes tools for analyzing the performance of trading strategies, providing insight into profitability, risk, and other metrics crucial for evaluating a strategy's effectiveness. Lastly, the framework is designed to be highly composable, so it can scale from simple trading strategies on a single CPU to complex investment strategies run on a distributed cluster of HPC machines.

Overall, "TensorTrade is an open source Python framework for building, training, evaluating, and deploying robust trading algorithms using reinforcement learning," and although it is still in its beta phase it offers a promising future for RL trading.

https://github.com/tensortrade-org/tensortrade