<a href="https://colab.research.google.com/github/datxander/Kaggle-course-exercises/blob/main/Coursera/OpenAI/challenge_task_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome to the challenge task

You have been hired by a 40-year-old news company called FF-NEWS. They have provided you with a list of their news headlines from March 2004. They are seeking 10 headlines from their newspaper specifically related to `climate change and global warming`.

As an AI Engineer, your task is to utilize the OpenAI text embedding model to identify 10 headlines about climate change and global warming from their archive.

Good luck!🍀

-Ahmad

----
Run the following block to set up the OpenAI API and import the necessary modules.

**Do not forget to upload your apikey.env file into the Google Colab environment, and make sure its name matches the `dotenv_path` passed to `load_dotenv` below.**

In [1]:
!pip install openai python-dotenv

import pandas as pd
import os
from openai import OpenAI
from dotenv import load_dotenv
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Loading API key and organization ID from a dotenv file
load_dotenv(dotenv_path='apikey.env.txt')

# Retrieving API key and organization ID from environment variables
APIKEY = os.getenv("APIKEY")
ORGID = os.getenv("ORGID")

# Creating an instance of the OpenAI client with the provided API key and organization ID
client = OpenAI(
    organization=ORGID,
    api_key=APIKEY,
)

client


<openai.OpenAI at 0x7a177e552500>
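
When `load_dotenv` cannot find the file named in `dotenv_path`, it does not raise an error, so `APIKEY` and `ORGID` can silently end up as `None`. Below is a minimal sketch of a check for that. The helper `missing_settings` and the `example_env` dictionary are hypothetical names introduced only for illustration; they are not part of the original notebook.

```python
import os

def missing_settings(names, env=None):
    """Return the setting names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Stand-in environment for illustration (assumption); in the notebook, omit
# `env` so the check runs against the real process environment.
example_env = {"APIKEY": "sk-example", "ORGID": ""}
missing = missing_settings(["APIKEY", "ORGID"], example_env)
print(missing)  # ['ORGID']
```

In the notebook this could be called as `missing_settings(["APIKEY", "ORGID"])` right after `load_dotenv`, before the client is created.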

Here is the list of news headlines

In [2]:
headlines = [
    "Political leaders convene for summit on climate change",
    "New breakthrough in renewable energy technology announced",
    "Global warming activists rally in major cities worldwide",
    "Sports teams unite to raise awareness about climate crisis",
    "Government announces new policy to combat climate change",
    "Tech giant unveils revolutionary AI-powered gadget",
    "Climate scientists warn of irreversible damage to the planet",
    "Political turmoil grips nation as elections approach",
    "Athletes advocate for environmental conservation efforts",
    "Tech startups compete for funding in Silicon Valley",
    "Rising temperatures threaten biodiversity, experts warn",
    "Sports championship marred by controversy over doping allegations",
    "New legislation aims to reduce carbon emissions",
    "Tech company announces plans for expansion into new markets",
    "Global warming effects felt in Arctic region, scientists say",
    "Political leaders clash over proposed climate change regulations",
    "Team achieves victory in sporting event despite odds",
    "Breakthrough in renewable energy research promises brighter future",
    "Climate summit ends with pledges for carbon neutrality",
    "Athlete breaks world record in thrilling sporting event",
    "Tech industry faces criticism over data privacy concerns",
    "Government launches initiative to promote green technology",
    "Climate change impacts highlighted in new scientific report",
    "Political scandal rocks nation's capital",
    "Sports star announces retirement after illustrious career",
    "Tech company introduces innovative solution to environmental challenges",
    "Global warming exacerbates natural disasters, experts warn",
    "Political candidates debate strategies to address climate crisis",
    "Sports league adopts sustainability measures to reduce carbon footprint",
    "Tech expo showcases latest advancements in artificial intelligence",
    "Climate activists stage protest outside government buildings",
    "Political leaders vow to prioritize climate change action",
    "Athlete overcomes injury to win gold medal",
    "Tech company partners with environmental organizations for conservation projects",
    "Global warming threatens food security, scientists caution",
    "Political campaign focuses on environmental policies",
    "Sports event canceled due to extreme weather conditions",
    "Tech startup revolutionizes transportation industry with electric vehicles",
    "Climate change summit attracts attention from around the world",
    "Political unrest leads to protests in major cities",
    "Sports team celebrates victory with championship parade",
    "Tech industry grapples with cybersecurity challenges",
    "Global warming impacts discussed at international conference",
    "Political leaders face scrutiny over handling of climate crisis",
    "Athlete honored with prestigious award for sportsmanship",
    "Tech conference explores the future of artificial intelligence",
    "Climate activists demand immediate action from world leaders",
    "Political debate intensifies ahead of election day",
    "Sports fans rally behind team in championship match",
    "Tech company accused of monopolistic practices",
    "Global warming solutions proposed at climate change forum",
    "Political summit focuses on diplomatic relations",
    "Athlete achieves personal best in sporting competition",
    "Tech industry leaders testify before congressional committee",
    "Climate change effects seen in rising sea levels, researchers find",
    "Political parties clash over environmental policies",
    "Sports league implements measures to promote diversity and inclusion",
    "Tech startup secures funding for groundbreaking project",
    "Global warming awareness campaign gains momentum",
    "Political leaders spar over economic policies",
    "Athlete inspires youth through community outreach programs",
    "Tech company launches new product to improve quality of life",
    "Climate change activists call for divestment from fossil fuels",
    "Political commentator discusses implications of recent events",
    "Sports tournament draws record-breaking viewership",
    "Tech industry grapples with ethical dilemmas of AI",
    "Global warming impact on wildlife habitats documented in new study",
    "Political rally attracts thousands of supporters",
    "Athlete makes comeback after overcoming adversity",
    "Tech conference showcases cutting-edge innovations",
    "Climate change legislation faces opposition in parliament",
    "Political upheaval leads to government reshuffle",
    "Sports team embarks on goodwill tour to promote peace",
    "Tech company releases annual sustainability report",
    "Global warming awareness raised through art and music festival",
    "Political campaign enters final stretch with heated debates",
    "Athlete named ambassador for youth sports program",
    "Tech industry leaders meet to discuss future trends",
    "Climate change protesters disrupt international summit",
    "Political scandal unfolds with leaked documents",
    "Sports star launches foundation to support underprivileged youth",
    "Tech startup awarded for innovation in environmental sustainability",
    "Global warming impact on agriculture highlighted in report",
    "Political leaders negotiate international trade agreements",
    "Athlete honored with induction into Hall of Fame",
    "Tech company invests in renewable energy research",
    "Climate change task force formed to address urgent issues",
    "Political tensions escalate in region, raising concerns",
    "Sports organization partners with charity for humanitarian efforts",
    "Tech expo features demonstrations of virtual reality technology",
    "Global warming debate intensifies with new scientific findings",
    "Political campaign focuses on grassroots activism",
    "Athlete advocates for gender equality in sports",
    "Tech industry pioneers explore potential of blockchain technology",
    "Climate change summit results in historic agreement",
    "Political leaders reach compromise on controversial legislation",
    "Sports team wins championship with dramatic final play",
    "Tech company launches initiative to bridge digital divide",
    "Global warming effects observed in changing weather patterns",
    "Political candidates engage voters in town hall meetings",
    "Athlete inspires next generation through mentorship program",
    "Tech conference showcases breakthroughs in quantum computing",
    "Climate change action plan receives bipartisan support",
    "Political movement gains momentum with widespread support",
    "Sports star honored with prestigious sportsmanship award",
    "Tech industry leaders advocate for diversity and inclusion initiatives",
    "Global warming impact on coastal communities examined in documentary",
    "Political summit addresses refugee crisis and humanitarian aid",
    "Athlete donates winnings to charity for children's education",
    "Tech startup disrupts industry with innovative business model",
    "Climate change awareness campaign launches on social media",
    "Political leaders engage in diplomatic talks to promote peace",
    "Sports league implements strict anti-doping measures",
    "Tech company pledges to reduce carbon footprint with sustainability initiatives",
    "Global warming research expedition uncovers new insights",
    "Political unrest sparks protests and civil unrest",
    "Athlete breaks barriers as first in their sport",
    "Tech industry leaders collaborate on open-source projects",
    "Climate change documentary wins prestigious film award",
    "Political candidates make final push in campaign rallies",
    "Sports team celebrates victory with parade through city streets",
    "Tech conference addresses cybersecurity threats and solutions",
    "Global warming impact on indigenous communities addressed at UN summit"
]

Use the `text-embedding-3-small` text embedding model to generate a 256-dimensional embedding vector for each headline.

In [5]:
response = client.embeddings.create(
    input=headlines,
    model="text-embedding-3-small",
    dimensions=256,
)


Extract the embedding vectors from the `response`.

In [16]:
vectors = [d.embedding for d in response.data]
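
Before building a dataframe, it can help to confirm that there is one vector per headline and that each vector has the requested length. The following is a sketch under stated assumptions: `check_embeddings` is a hypothetical helper, and the toy data stands in for the real `headlines` and `vectors`.

```python
def check_embeddings(texts, vectors, expected_dim):
    """Raise ValueError if vectors do not line up with texts or have the wrong length."""
    if len(texts) != len(vectors):
        raise ValueError(f"{len(texts)} texts but {len(vectors)} vectors")
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(f"vector {i} has {len(vec)} dimensions, expected {expected_dim}")
    return True

# Toy stand-ins (assumption); the real vectors have 256 dimensions.
toy_texts = ["first headline", "second headline"]
toy_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
ok = check_embeddings(toy_texts, toy_vectors, expected_dim=3)
```

In the notebook the equivalent call would be `check_embeddings(headlines, vectors, expected_dim=256)`.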




Use the embedding model to generate a 256-dimensional embedding vector for the phrase `"global warming and climate change"`. Then extract the embedding vector from the OpenAI response and store it in a variable called `search_vector`.

In [17]:
search_response = client.embeddings.create(
    input="global warming and climate change",
    model="text-embedding-3-small",
    dimensions=256,
)

search_vector = [d.embedding for d in search_response.data]
search_vector

[[0.01729966513812542,
  -0.05895240232348442,
  0.14733244478702545,
  0.006780254654586315,
  -0.06458541750907898,
  0.022969098761677742,
  0.04232044145464897,
  -0.09352745115756989,
  -0.04018378257751465,
  -0.06817889213562012,
  0.0224713534116745,
  -0.005994182080030441,
  -0.15024606883525848,
  -0.026999616995453835,
  -0.14966334402561188,
  0.012892802245914936,
  -0.00882586371153593,
  0.10081151872873306,
  0.02364894561469555,
  0.020868858322501183,
  -0.08061036467552185,
  -0.07196660339832306,
  0.025057198479771614,
  0.15393666923046112,
  -0.1507316678762436,
  0.05113416537642479,
  0.011520969681441784,
  0.05195969343185425,
  0.14014549553394318,
  0.12868522107601166,
  0.04950739070773125,
  -0.06769328564405441,
  -0.03586190193891525,
  -0.0011305483058094978,
  0.06385700404644012,
  0.007593642454594374,
  -0.10236545652151108,
  0.10576468706130981,
  0.04348589479923248,
  -0.03814424201846123,
  -0.05021151900291443,
  -0.10527908056974411,
  0.0
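
Because the input was a single string, `search_response.data` holds one item, so the list comprehension yields a list containing one vector, which is why the output above opens with double brackets. scikit-learn's `cosine_similarity` expects 2D inputs, so this nested form can be passed to it directly. The sketch below illustrates the difference between the nested and flat forms; `fake_response` is a hypothetical stand-in that mimics only the two attributes used here, not the real response object.

```python
from types import SimpleNamespace

# Hypothetical stand-in for an embeddings response (assumption): it only
# provides `.data` and `.embedding`, the attributes the notebook relies on.
fake_response = SimpleNamespace(data=[SimpleNamespace(embedding=[0.1, 0.2, 0.3])])

nested = [d.embedding for d in fake_response.data]  # a list holding one vector
flat = fake_response.data[0].embedding              # the vector itself

print(nested)  # [[0.1, 0.2, 0.3]]
print(flat)    # [0.1, 0.2, 0.3]
```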

Now create a dataframe containing the news headlines and their corresponding 256-dimensional vectors. Your dataframe should have two columns: `vectors` and `headlines`.

In [23]:
df = pd.DataFrame()
df["headlines"] = headlines
df["vectors"] = vectors
df

| | headlines | vectors |
|---|---|---|
| 0 | Political leaders convene for summit on climat... | [-0.011421514675021172, -0.08216778934001923, ... |
| 1 | New breakthrough in renewable energy technolog... | [-0.01936185173690319, -0.03718791902065277, -... |
| 2 | Global warming activists rally in major cities... | [0.114779032766819, 0.06291364133358002, 0.161... |
| 3 | Sports teams unite to raise awareness about cl... | [-0.03605781868100166, -0.041205164045095444, ... |
| 4 | Government announces new policy to combat clim... | [-0.013506997376680374, -0.015020178630948067,... |
| ... | ... | ... |
| 118 | Climate change documentary wins prestigious fi... | [0.02643044851720333, 0.03032756596803665, -0.... |
| 119 | Political candidates make final push in campai... | [0.11007075756788254, -0.005512557923793793, 0... |
| 120 | Sports team celebrates victory with parade thr... | [0.13316281139850616, -0.055330511182546616, -... |
| 121 | Tech conference addresses cybersecurity threat... | [-0.00964539498090744, -0.00943482294678688, 0... |


Use the `cosine similarity` measure to calculate the similarity between `search_vector` and each headline's embedding vector.

Store the result in a new dataframe column called `similarity score`.


In [28]:
df["similarity score"] = df.vectors.apply(lambda x: cosine_similarity([x], search_vector)[0][0])
df.head()

| | headlines | vectors | similarity score |
|---|---|---|---|
| 0 | Political leaders convene for summit on climat... | [-0.011421514675021172, -0.08216778934001923, ... | 0.402262 |
| 1 | New breakthrough in renewable energy technolog... | [-0.01936185173690319, -0.03718791902065277, -... | 0.156668 |
| 2 | Global warming activists rally in major cities... | [0.114779032766819, 0.06291364133358002, 0.161... | 0.482967 |
| 3 | Sports teams unite to raise awareness about cl... | [-0.03605781868100166, -0.041205164045095444, ... | 0.40967 |
| 4 | Government announces new policy to combat clim... | [-0.013506997376680374, -0.015020178630948067,... | 0.423604 |
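
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends only on the angle between them. The sketch below spells that out in plain Python on toy 2-dimensional vectors. The function name `cosine` and the toy data are illustrative assumptions; the notebook itself uses scikit-learn's `cosine_similarity`.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (assumption), chosen so the results are easy to verify by hand.
query = [1.0, 0.0]
docs = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}
scores = {name: cosine(query, vec) for name, vec in docs.items()}
print(scores)
```

Vectors pointing the same way score 1, perpendicular vectors score 0, and opposite vectors score -1, regardless of their lengths.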


Sort the dataframe by the `similarity score` column in descending order and take the 10 headlines most similar to the search phrase.

In [29]:
search_results = df.sort_values(by="similarity score", ascending=False).head(10)
search_results

| | headlines | vectors | similarity score |
|---|---|---|---|
| 98 | Global warming effects observed in changing we... | [0.005853269249200821, 0.03505754470825195, 0.... | 0.618899 |
| 42 | Global warming impacts discussed at internatio... | [0.012115242891013622, -0.006489833816885948, ... | 0.607157 |
| 58 | Global warming awareness campaign gains momentum | [0.07803697139024734, 0.014649798162281513, 0.... | 0.573053 |
| 86 | Climate change task force formed to address ur... | [-0.011689946055412292, 0.005356284324079752, ... | 0.558014 |
| 22 | Climate change impacts highlighted in new scie... | [0.034331392496824265, 0.024213053286075592, 0... | 0.551145 |
| 94 | Climate change summit results in historic agre... | [0.04221179336309433, -0.06409280002117157, 0.... | 0.547517 |
| 14 | Global warming effects felt in Arctic region, ... | [0.05455547198653221, 0.017751263454556465, 0.... | 0.545923 |
| 50 | Global warming solutions proposed at climate c... | [0.009113166481256485, -0.057428840547800064, ... | 0.528128 |
| 38 | Climate change summit attracts attention from ... | [0.01242073718458414, -0.04275553300976753, 0.... | 0.526462 |
| 34 | Global warming threatens food security, scient... | [0.04340847581624985, 0.06926780194044113, 0.1... | 0.524504 |
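
Sorting and taking the first rows is one way to get the best matches. The same top-k selection can be sketched without pandas using `heapq.nlargest` from the standard library. The helper `top_k` and the scored pairs below are made-up illustrations, not values from the notebook.

```python
import heapq

def top_k(scored, k):
    """Return the k (score, text) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])

# Made-up (score, headline) pairs for illustration only (assumption).
toy_scored = [
    (0.20, "sports headline"),
    (0.90, "climate headline"),
    (0.50, "energy headline"),
    (0.10, "tech headline"),
]
best = top_k(toy_scored, k=2)
print(best)  # [(0.9, 'climate headline'), (0.5, 'energy headline')]
```

A fixed `k` always returns that many rows, even when the lowest-ranked ones are only weakly related; filtering on a minimum score is an alternative when the number of relevant headlines is unknown.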
