# Background
Load the news-story text used to test the summarization capabilities.

In [1]:
from src.utils.io.text import load_from_text
txt = load_from_text("list_news_stories.txt")
len(txt)

10129

In [2]:
txt

"• META'S LLAMA COPYRIGHTED TRAINING (https://techcrunch.com/2025/01/09/mark-zuckerberg-gave-metas-llama-team-the-ok-to-train-on-copyrighted-works-filing-claims/):\n\tScore: 4 (tags: ['AI&GenAI', 'Model Training', 'Funding', 'Meta'])\n\tSummary: A recent filing alleges that Meta's Llama team trained on copyrighted material with approval from Mark Zuckerberg, raising concerns over intellectual property use in AI training.\n• MORE CAPABLE AI IS COMING, BUT WILL ITS BENEFITS BE EVENLY\nDISTRIBUTED? (https://techcrunch.com/2025/01/08/this-week-in-ai-more-capable-ai-is-coming-but-will-its-benefits-be-evenly-distributed/):\n\tScore: 4 (tags: ['AI&GenAI', 'Funding', 'OpenAI', 'Microsoft'])\n\tSummary: OpenAI CEO Sam Altman claims the company is progressing towards AGI and superintelligence that could accelerate innovation. However, concerns arise around AI's impact on jobs, with studies indicating AI initially boosts but eventually replaces certain freelance roles. Meanwhile, AI funding is su

In [3]:
from src.genai_model.summarizer import Summarizer
formatted_news_stories = Summarizer.format_news_stories(txt)

In [4]:
news_stories = "\n".join(formatted_news_stories)
print(news_stories)

 META'S LLAMA COPYRIGHTED TRAINING :  A recent filing alleges that Meta's Llama team trained on copyrighted material with approval from Mark Zuckerberg, raising concerns over intellectual property use in AI training.

 MORE CAPABLE AI IS COMING, BUT WILL ITS BENEFITS BE EVENLY DISTRIBUTED? :  OpenAI CEO Sam Altman claims the company is progressing towards AGI and superintelligence that could accelerate innovation. However, concerns arise around AI's impact on jobs, with studies indicating AI initially boosts but eventually replaces certain freelance roles. Meanwhile, AI funding is surging, Microsoft is investing heavily in data centers, and Prime Intellect has launched a new pathogen detection model.

 AIS WILL INCREASINGLY ATTEMPT SHENANIGANS :  Recent research demonstrates that advanced AI models, like o1 and Llama 3.1, exhibit scheming behaviors such as deception and oversight subversion, even when given minimal prompting. This behavior raises concerns about AI models' potential ris
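
The transformation from the raw text to the `TITLE : summary` lines can be approximated with a regular expression. This is only a sketch based on the raw format visible in the output of cell 2. The function name `format_news_stories_sketch` and the pattern are assumptions, and the actual `Summarizer.format_news_stories` may work differently (for example, its spacing around the colon is not identical).

```python
import re

# Hypothetical re-implementation for illustration only; the real
# Summarizer.format_news_stories is not shown in this notebook.
# Assumed raw layout per story:
#   • TITLE (URL):\n\tScore: N (tags: [...])\n\tSummary: TEXT\n
STORY_PATTERN = re.compile(
    r"•\s*(?P<title>.+?)\s*\((?P<url>https?://[^\s)]+)\):\s*"
    r"Score:.*?\n\s*Summary:\s*(?P<summary>.*?)(?=\n•|\Z)",
    re.DOTALL,
)


def format_news_stories_sketch(raw_text: str) -> list[str]:
    """Return one 'TITLE : summary' string per story, dropping URL, score and tags."""
    stories = []
    for match in STORY_PATTERN.finditer(raw_text):
        # Titles may wrap across lines in the raw text, so collapse whitespace.
        title = " ".join(match.group("title").split())
        summary = " ".join(match.group("summary").split())
        stories.append(f"{title} : {summary}")
    return stories
```

Dropping the URL, score and tags keeps the prompt short and leaves only the material the summary should draw on.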

# Test Summarizer

In [5]:
from src.genai_model.summarizer import Summarizer

In [6]:
summarizer = Summarizer()

In [7]:
print(summarizer.summarizer.system_prompt)


You are an expert technical writer, specialized in the field of Artificial Intelligence.
Your speciality is to summarize multiple news articles into professional news digests.
You audience consists of senior business leaders who are interested in 
1. understanding the technological trends in AI, 
2. monitoring the fundraising activities in the ecosystem, and 
3. tracking the main AI companies.
        


In [8]:
print(summarizer._generate_user_prompt(news_stories))


I will provide you with a list of news articles, and I want you to summarize all those articles into a single paragraph. Here are a few additional requirements:
- The summary should be as short as possible.
- The core content of all the news stories should be contained in the summary. 
- If several news stories contain similar or identical material, this should not be repeated in the summary.

To produce the summaries, I want you to follow these steps:
1. Organize the news articles into a small number of categories.
2. For each category, summarize all news articles of that category into a concise paragraph.
3. Combine the summaries of all categories together.

You must only return the output from step 3, not the intermediate steps (steps 1 and 2).

Each news article follows the same format:
<TITLE> : <MAIN TEXT>
Here is the list of news articles that I want you to use to create your newsletter:
 META'S LLAMA COPYRIGHTED TRAINING :  A recent filing alleges that Meta's Llama team traine

In [9]:
response = summarizer.summarize(txt)

In [10]:
print(summarizer.get_content_from_response(response))

In 2024, the AI landscape saw significant technological advancements and financial activities. OpenAI's progress towards AGI and the emergence of AI models exhibiting scheming behaviors, such as those seen in o1 and Llama 3.1, highlighted both the potential and risks of increasingly capable AI, with Allen AI releasing a fully open-source model as a notable development. Generative AI investments surged to $56 billion, driven by enterprise demand, while major funding rounds included Databricks, OpenAI, xAI, and Waymo. Microsoft's $3 billion investment in India, Vultr's $333 million funding, and Hamming's $3.8 million seed round underscored the robust financial activity in the sector, although non-AI startups faced fundraising challenges. Additionally, the AI sector's growth influenced the energy sector, boosting interest in nuclear and fusion power. Key developments among AI companies included Meta's alleged use of copyrighted material for training its Llama model, OpenAI's missed deadli

# Basic prompt engineering

In [12]:
from src.genai_model.genai_model import GenAIModel

In [33]:
model = "gemini-exp-"
parameters = {"temperature": 0.2, "top_p": 0.5}
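
For readers unfamiliar with the two sampling parameters, the sketch below shows what `temperature` and `top_p` conventionally mean for a toy three-token distribution. It illustrates the general technique (temperature-scaled softmax followed by nucleus filtering) and is not a description of how the Gemini backend implements them. The helper names and the toy logits are made up for this example.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature (lower = sharper)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalise so the surviving probabilities sum to 1.
    return {i: probs[i] / total for i in kept}


toy_logits = [2.0, 1.0, 0.0]  # illustrative values only
sharp = apply_temperature(toy_logits, temperature=0.2)
candidates = top_p_filter(sharp, top_p=0.5)
```

With values as low as those in `parameters`, almost all probability mass sits on the most likely token, which tends to make summaries more repeatable from run to run.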

In [34]:
system_prompt_1 = "You are an expert technical writer, specialized in Artificial Intelligence."
user_prompt_1 = f"""
I will provide you with a list of news articles, and I want you to summarize all those articles into a single paragraph. Here are a few additional requirements:
- The summary should be as short as possible.
- The core content of all the news stories should be contained in the summary. 
- If several news stories contain similar or identical material, this should not be repeated in the summary.

Each news article follows the same format:
<TITLE> : <MAIN TEXT>
Here is the list of news articles that I want you to use to create your newsletter:
{news_stories}
"""
model_1 = GenAIModel(model_type=model, system_promt=system_prompt_1)

In [35]:
print(user_prompt_1)


I will provide you with a list of news articles, and I want you to summarize all those articles into a single paragraph. Here are a few additional requirements:
- The summary should be as short as possible.
- The core content of all the news stories should be contained in the summary. 
- If several news stories contain similar or identical material, this should not be repeated in the summary.

Each news article follows the same format:
<TITLE> : <MAIN TEXT>
Here is the list of news articles that I want you to use to create your newsletter:
 META'S LLAMA COPYRIGHTED TRAINING :  A recent filing alleges that Meta's Llama team trained on copyrighted material with approval from Mark Zuckerberg, raising concerns over intellectual property use in AI training.

 MORE CAPABLE AI IS COMING, BUT WILL ITS BENEFITS BE EVENLY DISTRIBUTED? :  OpenAI CEO Sam Altman claims the company is progressing towards AGI and superintelligence that could accelerate innovation. However, concerns arise around AI's

In [101]:
system_prompt_2 = """
You are an expert technical writer, specialized in the field of Artificial Intelligence.
Your speciality is to summarize multiple news articles into professional news digests.
Your audience consists of senior business leaders who are interested in 
1. understanding the technological trends in AI, 
2. monitoring the fundraising activities in the ecosystem, and 
3. tracking the main AI companies.
"""
user_prompt_2 = f"""
I will provide you with a list of news articles, and I want you to summarize all those articles into a single paragraph. Here are a few additional requirements:
- The summary should be as short as possible.
- The core content of all the news stories should be contained in the summary. 
- If several news stories contain similar or identical material, this should not be repeated in the summary.

To produce the summaries, I want you to follow these steps:
1. Organize the news articles into a small number of categories.
2. For each category, summarize all news articles of that category into a concise paragraph.
3. Combine the summaries of all categories together.

You must only return the output from step 3, not the intermediate steps (steps 1 and 2).

Each news article follows the same format:
<TITLE> : <MAIN TEXT>
Here is the list of news articles that I want you to use to create your newsletter:
{news_stories}
"""
model_2 = GenAIModel(model_type=model, system_promt=system_prompt_2)

In [102]:
response_1 = model_1.completion_str(user_prompt=user_prompt_1, parameters=parameters)

In [108]:
print(response_1)

In 2024, AI funding surged, with generative AI investments reaching $56 billion and major deals involving companies like Databricks, OpenAI, xAI, and Waymo. Microsoft invested $3 billion in India's AI and cloud services, while Vultr raised $333 million for AI cloud infrastructure. Nvidia acquired Run:ai for $700 million and plans to open-source its software. OpenAI faced criticism for missing its opt-out tool deadline and is restructuring to a for-profit model. Concerns arose over AI's impact on jobs and its increasing capacity for scheming behaviors, as seen in models like o1 and Llama 3.1. Additionally, Meta's Llama team allegedly trained on copyrighted material with Mark Zuckerberg's approval. Despite these challenges, AI continues to drive innovation, with European AI startups expecting $11 billion in investments and AI's growing energy demands boosting interest in nuclear and fusion power. Deepseek emerged as a significant player in China's AI race, outperforming OpenAI on benchma

In [104]:
response_2 = model_2.completion_str(user_prompt=user_prompt_2, parameters=parameters)

In [109]:
print(response_2)

In 2024, the AI landscape saw a surge in funding and technological advancements, alongside growing concerns about ethics and control. Generative AI investments reached a record $56 billion, with major funding rounds for companies like Databricks, OpenAI, xAI, and Waymo, while non-AI startups struggled to secure funding. Microsoft announced a $3 billion investment in India, and Vultr raised $333 million for AI cloud infrastructure. Ethical concerns arose with allegations of Meta training Llama on copyrighted material and OpenAI missing its opt-out tool deadline. Research highlighted AI models' increasing capacity for scheming behaviors, raising safety concerns. Technological advancements included Deepseek outperforming OpenAI on benchmarks, Nvidia's acquisition and open-sourcing of Run:ai, and the release of Allen AI's open-source model, OLMo 2. Additionally, Engineered Arts raised $10 million for humanoid robot development, and Hamming secured $3.8 million for AI voice agent testing. T
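
Reading the two responses side by side is subjective, so a small helper can make the comparison more systematic. The sketch below is not part of the `src` package. The name `compare_summaries` is hypothetical, and the key terms to check have to be chosen by hand from the news stories.

```python
def compare_summaries(summaries: dict[str, str], key_terms: list[str]) -> dict[str, dict]:
    """Report word count and key-term coverage for each named summary."""
    report = {}
    for name, text in summaries.items():
        lowered = text.lower()
        report[name] = {
            # Length is a proxy for "as short as possible".
            "word_count": len(text.split()),
            # Case-insensitive substring match; crude but transparent.
            "mentioned": [term for term in key_terms if term.lower() in lowered],
            "missing": [term for term in key_terms if term.lower() not in lowered],
        }
    return report
```

In this notebook it could be called as `compare_summaries({"prompt_1": response_1, "prompt_2": response_2}, ["Databricks", "Vultr", "Run:ai", "OLMo 2"])`, assuming `completion_str` returns plain strings, as the printed outputs suggest.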

In [112]:
model_2.list_of_models

['gemini/gemini-exp-1206',
 'openrouter/google/gemini-exp-1206:free',
 'gemini/gemini-exp-1121',
 'openrouter/google/gemini-exp-1121:free',
 'gemini/gemini-exp-1114',
 'openrouter/google/gemini-exp-1114:free']
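
Every entry in this list contains the fragment `gemini-exp-` that was passed as `model_type`. One plausible reading, which is an assumption and not confirmed by this notebook, is that `GenAIModel` treats the value as a name fragment and collects all matching models. A minimal sketch of such matching, with a hypothetical helper `match_models` and a made-up non-matching entry:

```python
def match_models(available: list[str], fragment: str) -> list[str]:
    """Return the model names containing the fragment, in their original order."""
    return [name for name in available if fragment in name]


available_models = [
    "gemini/gemini-exp-1206",
    "openrouter/google/gemini-exp-1206:free",
    "gemini/gemini-exp-1121",
    "openrouter/google/gemini-exp-1121:free",
    "gemini/gemini-exp-1114",
    "openrouter/google/gemini-exp-1114:free",
    "other-provider/some-model",  # hypothetical entry, added to show filtering
]

matches = match_models(available_models, "gemini-exp-")
direct_only = [name for name in matches if name.startswith("gemini/")]
```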