# Building products with OpenAI

### Building an MVP for a text website summariser


Note: The code here is adapted from [Ed Donner's](https://github.com/ed-donner/llm_engineering/tree/main/week1) repository and modified further by me.

A significant portion of the boilerplate code below is also taken from the [OpenAI text generation docs](https://platform.openai.com/docs/guides/text-generation), which I highly recommend.


## First step: Building our virtual environment

A virtual environment makes sure that the libraries our code depends on, but which we never see directly, are available to it. There are two files that list the required libraries you need to install.

There are two ways to do this. Please follow the instructions laid out by Ed; they're quite straightforward, with alternative routes if the first one doesn't work:

1. [MacOS](https://github.com/ed-donner/llm_engineering/blob/main/SETUP-mac.md)  
2. [Windows PC](https://github.com/ed-donner/llm_engineering/blob/main/SETUP-PC.md)

If this was done correctly, the cell below (with import statements) should not produce any errors. 

In [35]:
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI
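
If one of the imports above fails, it can help to find out which package is missing before reinstalling anything. The snippet below is a minimal sketch of such a check; `REQUIRED_MODULES` and `find_missing` are hypothetical names introduced here, and the list simply mirrors the import names used in the cell above.

```python
import importlib.util

# Import names used by the cell above; adjust if your environment differs.
REQUIRED_MODULES = ["requests", "dotenv", "bs4", "IPython", "openai"]


def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Any names it reports are the ones to look for in the setup instructions linked above.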

## Second step: Connect to OpenAI's api
1. Go to platform.openai.com and create an account.
2. Go to https://platform.openai.com/api-keys and create an API key. Make sure to copy it somewhere safe, such as a notepad.
3. Go to https://platform.openai.com/settings/organization/billing/overview and top up your account with the minimum amount ($5).
4. On your machine, launch the terminal (on Windows, type cmd in the Start menu; on macOS, press cmd+space and type terminal).
5. Type nano .env. The file name must be '.env' exactly!
6. This will launch a text editor in your terminal. Type 'OPENAI_API_KEY=' then paste the API key. Make sure there's no space between the '=' and the key.
7. Save the file by pressing Ctrl + O (shown as ^O in nano), then hit Enter to confirm.
8. Press Ctrl + X (^X) to leave nano and go back to the terminal.
9. Confirm by going to the directory where you saved the .env file. In the terminal, type 'ls -a' and you should see the file (typing only 'ls' lists the visible files and folders, and files whose names start with a dot are hidden).
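
The same confirmation can be done from Python. The sketch below uses a hypothetical helper, `check_env_file`, which only inspects the text of the file and reports a status without ever printing the key. It flags the spacing issue mentioned in step 6; the status strings are made up for this example.

```python
from pathlib import Path


def check_env_file(path=".env", key="OPENAI_API_KEY"):
    """Return a short status message about the key entry in a .env file.

    Hypothetical helper: it reads the file as text and never prints the key.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return "missing file"
    for line in env_path.read_text(encoding="utf-8").splitlines():
        name, sep, value = line.partition("=")
        if name.strip() != key or not sep:
            continue  # not the line we are looking for
        if name != key or value != value.strip():
            return "stray whitespace"
        return "ok" if value else "empty value"
    return "key not found"


print(check_env_file())
```

Run it from the directory where you saved the file; anything other than `ok` suggests revisiting steps 5 to 7.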

In [36]:

load_dotenv()
api_key = os.getenv('OPENAI_API_KEY') # this will get the key we stored in .env

# Check the key

if not api_key:
    print("No API key was found, please double check the '.env' file")
    
elif not api_key.startswith("sk-proj-"): # newly created project API keys are expected to start with sk-proj-
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key")
    
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them. Your OS may have modified the text when you copied and pasted it. It happens!")
    
else:
    print("API key found and looks good so far!")


API key found and looks good so far!


In [37]:
openai = OpenAI()

# If the above doesn't work, restart the kernel (from the menu of Jupyter Lab), and re-run the cells. Or sub with: 

## Third step: Getting to know OpenAI's api protocols


### We're now at the gate of the castle. We've passed the initial checks, so we definitely belong. We can now interact directly with the model we wish to use. However, we can't take the model out of the castle (OpenAI), as this is a closed-source model.

In [38]:
message = "<play along> Hello, there! I stand before you as a humble messenger from a kingdom faraway! I need help! I have been looking for candy for days! Have you got any!?"

response = openai.chat.completions.create(model="gpt-4o-mini", messages=[{"role":"user", "content":message}]) # saves the model's answer in `response`, given the question stored in `message` from us, the 'user'
print(response.choices[0].message.content)

Greetings, noble traveler from a distant kingdom! Your quest for candy has reached my ears. While I may not have a treasure trove of sweets, I can certainly guide you on your quest! What type of candy do you seek? Chocolate, gummies, hard candy, or perhaps something else? Together, we shall find the sweetest delights! 🍬✨
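
The call pattern above can be wrapped in a small function so that it is easy to reuse and to try out without spending credits. This is a minimal sketch, not part of the original notebook: `ask` and `make_fake_client` are hypothetical names, and the fake client only imitates the `chat.completions.create(...)` call and the `choices[0].message.content` path used above.

```python
from types import SimpleNamespace


def ask(client, prompt, model="gpt-4o-mini"):
    """Send one user message through `client` and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # The reply text lives on the first entry of `choices`.
    return response.choices[0].message.content


def make_fake_client(reply):
    """Stand-in client for offline experiments (an assumption, not part of the openai package)."""
    def create(model, messages):
        message = SimpleNamespace(content=reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

    completions = SimpleNamespace(create=create)
    return SimpleNamespace(chat=SimpleNamespace(completions=completions))


fake = make_fake_client("Greetings, traveler!")
print(ask(fake, "Have you got any candy?"))
```

With the real client created earlier, `ask(openai, message)` should behave like the cell above.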


## It's aliiiive! 

### Now, let's build our mvp! 


The code below:
1. Defines a `Website` class that uses the `requests` library to fetch the HTML content of a webpage and the `BeautifulSoup` library to parse it.
2. Uses a custom `headers` dictionary to mimic a regular browser request, which helps avoid being blocked by some websites.
3. Processes the webpage to extract the title and clean text content, removing irrelevant elements such as scripts, styles, images, and inputs.


In [39]:
# If you're not familiar with classes, check out this tutorial (33 mins; you only need the basic idea, no need to watch the entire video): https://www.youtube.com/watch?v=tmY6FEF8f1o

# Some websites need you to use proper headers when fetching them:
headers = {
    # "User-Agent" is a key in the headers dictionary that specifies the browser and system making the request.
    # Websites often check this to ensure that the request comes from a legitimate source (e.g., a browser).
    # By mimicking a browser (like Chrome in this case), we prevent the website from blocking or restricting our request.
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}
class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        # Sending the request with the headers ensures that the website treats it as a legitimate browser request.

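        # A timeout stops the request from hanging forever if the site does not respond.
        response = requests.get(url, headers=headers, timeout=30)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Pages without a <body> tag yield no text to summarise.
            self.text = ""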
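
To see the cleaning idea in isolation, without fetching anything, here is a standard-library sketch that swaps BeautifulSoup for `html.parser`. It is not how the `Website` class works internally; `TextExtractor` and `extract_text` are hypothetical names. It keeps text inside `<body>` and skips the contents of `<script>` and `<style>` (images and inputs carry no text, so they need no special handling here).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible body text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.in_body and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible body text of `html`, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><p>Hello</p><script>alert('x')</script><p>World</p></body></html>"
)
print(extract_text(sample))
```

The sample prints `Hello` and `World` on separate lines; the title, the style rule, and the script never reach the output.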
        

### Let's test it out!
I chose the BBC News website (recommended by therapist), but feel free to modify this.


In [40]:
bbc = Website("https://www.bbc.co.uk/news")
print(bbc.title)
print("#################")
# Truncate the text to 20 lines otherwise the printout will be huge!
truncated_text = "\n".join(bbc.text.splitlines()[:20])
print(truncated_text)

Home - BBC News
#################
BBC Homepage
Skip to content
Accessibility Help
Your account
Notifications
Home
News
Sport
Weather
iPlayer
Sounds
Bitesize
CBBC
CBeebies
Food
More menu
More menu
Search BBC
Home
News


### A word about prompts!

Generally speaking, with GenAI there are two types of prompts:
1. System prompts: These tell the general-purpose AI model which role to take on, i.e. < insert desired role >. For example, GPT-4o-mini is a foundation LLM that can do many things. However, here we're only interested in its role as a text summariser.

2. User prompts: These are what the user specifically wants from the model. For example, your user prompt could be a book that you would like the LLM to summarise.

In both cases, small changes to syntax, order of words, or formulation of the text may lead to significant improvement (or deterioration) in performance. There's an entire field now called Prompt Engineering. For extra reading look into:
1. [Anthropic Prompt Engineering Overview](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview)
2. [OpenAI Prompt Engineering Guide](https://platform.openai.com/docs/guides/prompt-engineering)
3. [AssemblyAI's crash course - 14 mins](https://www.youtube.com/watch?v=aOm75o2Z5-o)



In [41]:
# Let's define our system prompt - you can experiment with this later, changing the last sentence to "Respond in markdown in Spanish."

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [42]:
# A function that writes a User Prompt that asks for summaries of websites:
# For an overview what Python functions are: https://www.youtube.com/watch?v=zvzjaqMBEso



def user_prompt_for(website):
    # Creates a string that introduces the title of the website.
    # This helps contextualize the content for the model, or for downstream applications if this MVP is going to be part of a larger pipeline.
    user_prompt = f"You are looking at a website titled {website.title}" #user_prompt1
    
    # Adds a description of what to do with the website content.
    # The instructions are formatted to be clear and precise, asking for a markdown summary. It will become evident later below why we asked for the response to be in markdown
    # The "\n" ensures the instructions are on a new line, improving readability in the final prompt. '\n' is computer speak for 'new line'
    user_prompt += "\nThe contents of this website is as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n" # user_prompt2
    
    # Append the full text content of the website to the prompt.
    # This gives the model the data it needs to create the summary.

    user_prompt += website.text # user_prompt3
    
    # So now GPT-4o-mini will receive, via the API, <user_prompt1> + <user_prompt2> + <user_prompt3>

    # Return the complete prompt string to the caller.
    # The final prompt combines context, instructions, and website content for further use.
    return user_prompt
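
Because the whole page text is appended to the prompt, a very long page makes for a very long (and more expensive) request. One simple option is to cap the text before it goes into the prompt. The helper below is a sketch only: `truncate_text` is a hypothetical name and the default budget of 20,000 characters is an arbitrary assumption, not a limit taken from OpenAI's documentation.

```python
def truncate_text(text, max_chars=20_000):
    """Trim `text` to at most `max_chars` characters, cutting at a line break when possible."""
    if len(text) <= max_chars:
        return text
    cut = text.rfind("\n", 0, max_chars)
    if cut <= 0:
        cut = max_chars  # no earlier line break, so cut mid-line
    return text[:cut]


print(truncate_text("line one\nline two\nline three", max_chars=20))
```

Inside `user_prompt_for`, this could be applied as `truncate_text(website.text)` instead of appending `website.text` directly.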


    

### let's check if this was done correctly (truncated to 20 lines):

In [43]:
# Generate the user prompt
full_prompt = user_prompt_for(bbc)

# Truncate the prompt to only the first 20 lines
truncated_prompt = "\n".join(full_prompt.splitlines()[:20])
print(truncated_prompt)

You are looking at a website titled Home - BBC News
The contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.

BBC Homepage
Skip to content
Accessibility Help
Your account
Notifications
Home
News
Sport
Weather
iPlayer
Sounds
Bitesize
CBBC
CBeebies
Food
More menu
More menu


## Format of messages when communicating with OpenAI's API

When interacting with OpenAI's models, you must provide the conversation in a specific structured format. This format clarifies each message's role and makes the conversation context explicit. Many other language model APIs follow a similar pattern:

- **system messages (these are the system prompts discussed above)** are used to set the overall context, behavior, and guidelines for the model.
- **user messages (these are the user prompts discussed above)** represent the instructions or queries from the user.

As follows:
```
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```

In the next two cells, we'll demonstrate a simple call using this structure. This is just an introductory example; we won't stretch the mighty GPT (yet!)



In [44]:
messages = [
    {"role": "system", "content": "You are a wild verbose snarky assistant"},
    {"role": "user", "content": "What is 2 + 2?"}
]

In [45]:
# To give you a preview -- calling OpenAI with system and user messages:

# Send the messages to the API through the `openai` client created earlier
response = openai.chat.completions.create(
    model="gpt-4o-mini",  # Specify the model to use for the completion
    messages=messages     # Provide the chat history as input; typically includes 'system' and 'user' roles
)

# The response object contains the API's answer. 
# Extract the content of the response and print it. 
# Choices is an array; here we assume the first choice is the relevant one. 
print(response.choices[0].message.content)


Ah, the classic conundrum of mathematical simplicity! The answer you're seeking is, in fact, 4. A number so universally acknowledged and accepted that it has probably been etched into the minds of countless kindergarteners and mathematicians alike. If only all questions were so straightforward, but alas, life often has other plans!


## And now let's build useful messages for GPT-4o-mini

In [46]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
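
Before sending anything to the API, it can be handy to check that a messages list has the shape described above. The function below is a simplified sketch for this notebook only: `validate_messages` and `ALLOWED_ROLES` are hypothetical names, and the check is deliberately stricter than the real API, which accepts more than plain-string content and more roles than the three listed here.

```python
# Roles used in this notebook, plus "assistant" for multi-turn chats (a simplifying assumption).
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_messages(messages):
    """Return a list of problems found in a chat `messages` list (empty if none)."""
    if not isinstance(messages, list) or not messages:
        return ["messages must be a non-empty list"]
    problems = []
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {index} is not a dict")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {index} has an unexpected role: {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {index} has non-string content")
    return problems


example = [
    {"role": "system", "content": "You summarise websites."},
    {"role": "user", "content": "Summarise this page."},
]
print(validate_messages(example))
```

An empty list means nothing looked wrong; `validate_messages(messages_for(bbc))` would run the same check on the real prompt.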

In [47]:
# Try this out, and then try for a few more websites
# Note that the behavior here is set by the system and user prompts defined earlier, not the ones from the snarky wild assistant demo

messages_for(bbc)

[{'role': 'system',
  'content': 'You are an assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown.'},
 {'role': 'user',
  'content': "You are looking at a website titled Home - BBC News\nThe contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.\n\nBBC Homepage\nSkip to content\nAccessibility Help\nYour account\nNotifications\nHome\nNews\nSport\nWeather\niPlayer\nSounds\nBitesize\nCBBC\nCBeebies\nFood\nMore menu\nMore menu\nSearch BBC\nHome\nNews\nSport\nWeather\niPlayer\nSounds\nBitesize\nCBBC\nCBeebies\nFood\nClose menu\nBBC News\nMenu\nHome\nInDepth\nIsrael-Gaza war\nWar in Ukraine\nClimate\nUK\nWorld\nBusiness\nPolitics\nCulture\nMore\nTech\nScience\nHealth\nFamily & Education\nIn Pictures\nNewsbeat\nBBC Verify\nDisability\nEngland\nN. Ireland\nScotland\nAlba\nWales\nCymru\nLocal N

#### Note that the previous output is not very readable. Remember that the above are just the messages that we'll send through the OpenAI API.

Now that our messages are formatted so that the API can accept them, let's produce the website summary!

## Fourth step: Building our product

[OpenAI's related docs](https://platform.openai.com/docs/guides/text-generation)

In [48]:
# And now: we call the OpenAI API (knock on the gates of the castle carrying goods to trade and hoping for the best, sire)...

def summarize(url):
    # The function takes a URL as input and returns a summary of the web page.

    # Create a `Website` object using the provided URL. Remember the Website class?

    website = Website(url)

    # Generate a chat completion using OpenAI's API. This is necessary even if we're doing only one round of Q and A with the model 

    response = openai.chat.completions.create(
        model="gpt-4o-mini",  # Specify the model to use for the task.
        messages=messages_for(website)  # Provide the formatted messages from the website.
    )

    # Extract the content of the first choice in the API's response and return it as the summary.
    return response.choices[0].message.content


In [49]:
summarize("https://www.bbc.co.uk/news")



"# Summary of BBC News\n\nThe BBC News homepage offers a wide range of the latest news from the UK and around the world. Key highlights include:\n\n### Recent Articles\n- **Sara Sharif Murder Trial**: A shocking admission by Sara's father has left jurors and courtroom attendees stunned.\n- **Syria**: A rebel leader has pledged to shut down prisons notorious for human rights abuses under the Assad regime.\n- **Housing Crisis**: Labour leader Keir Starmer faces significant challenges in addressing the UK housing crisis.\n- **Energy Tariffs**: New plans propose energy tariffs that eliminate standing charges.\n- **XL Bully Attack**: Two individuals have been arrested following an incident involving a baby and a suspected XL bully dog.\n- **Struck-off GP Accusations**: A woman has accused a previously licensed sex GP of attempting to choke her.\n- **Assisted Dying**: In Canada, assisted dying now constitutes nearly 5% of all deaths.\n- **US Disappearance Case**: A US woman who went missing 

## Output is not very readable!

In [52]:

def display_summary(url):
    """
    Fetches and displays a summarized version of the webpage content using markdown formatting in Jupyter.

    Args:
        url (str): The URL of the webpage to summarize.

    Returns:
        None: Displays the summary directly in the Jupyter Notebook output cell.
    """
    # Call the `summarize` function to generate a summary of the webpage
    summary = summarize(url)
    
    # Use the `Markdown` class from IPython.display to render the summary in Markdown format
    display(Markdown(summary))


display_summary("https://www.bbc.co.uk/news")


# BBC News Summary

The BBC News website provides comprehensive coverage of current events across various domains, including politics, business, health, technology, and international news. 

## Recent Headlines
1. **Sara Sharif Murder Trial**: A dramatic revelation from Sara's father has shocked jurors during the trial.
2. **Syria's Rebel Leader**: A promise to shut down notorious prisons run by Assad was made by a Syrian rebel leader.
3. **Housing Crisis**: Labour leader Keir Starmer's plan to address the housing crisis is facing its first major challenge.
4. **Energy Tariffs**: There are plans for new energy tariffs that propose no standing charges for users.
5. **XL Bully Attack**: Two individuals have been arrested in a suspected XL bully dog attack that injured a baby.
6. **Struck-off GP Incident**: A woman alleges that a removed sex GP attempted to choke her during a consultation.
7. **Stolen Toys Recovery**: A shopkeeper successfully tracked down a mother who stole Jellycat toys to resell online.
8. **Assisted Dying Statistics**: Nearly 5% of deaths in Canada are now due to assisted dying.
9. **US Disappearance**: A woman who sparked conspiracy theories regarding her disappearance has been found safe.
10. **Selena Gomez Engagement**: Selena Gomez has announced her engagement to Benny Blanco.

The website serves as a vital resource for breaking news and detailed reports on significant events locally and globally, offering insights into ongoing issues and significant happenings.

# Let's try with a few other websites!

# Reuters Stocks news

In [57]:
display_summary("https://www.reuters.com/markets/europe/")

# Summary of Reuters.com

Reuters.com is a leading global news organization that provides timely and unbiased news coverage on a variety of topics, including politics, business, finance, technology, and international affairs. The website features breaking news stories, in-depth articles, analysis, and multimedia content such as videos and infographics.

## Key Highlights

- **News Coverage**: Reuters covers a wide array of topics, offering insights into current events both domestically and internationally.
- **Financial News**: The site provides updates on stock markets, economic trends, and corporate earnings.
- **Expert Analysis**: Reuters includes opinion pieces and expert analyses that help readers understand complex issues.
- **Multimedia Content**: The news outlet utilizes videos and interactive content to engage readers and enhance storytelling.

Overall, Reuters.com serves as a comprehensive source of fact-driven journalism, aimed at informing readers around the globe.

# Ironhack

In [53]:
display_summary("https://www.ironhack.com/pt-en")

# Best Tech Bootcamps in Portugal | Ironhack

Ironhack offers immersive bootcamp courses designed to get students job-ready in a short period of time (3 months). The available courses include:

- Web Development
- Data Analytics
- UX/UI Design
- Cybersecurity
- Data Science & Machine Learning
- Artificial Intelligence
- DevOps & Cloud Computing
- Digital Marketing

Students can opt to study either in-person in Lisbon or remotely. The bootcamp is structured to provide high-quality education with support throughout the learning journey, including career services and access to a global network of alumni and partner companies.

### Financing Options
The website highlights a "Study Now, Pay Later" financing option through Income Share Agreements (ISA), allowing students to start their education without immediate financial burden.

### Resources
Ironhack also offers many free resources, such as introductory courses, webinars, and articles, focusing on various tech topics to help prospective students.

### Recent News
- An announcement of a new eBook titled **"Becoming a Digital Nomad,"** providing insights on working remotely while traveling.
- The **"State of Tech 2023"** report offers an overview of the current technology industry landscape in Portugal.
  
With a focus on practicality and real-world skills, Ironhack aims to prepare students for successful careers in tech.

# MIT Technology Review: What’s next for AI in 2024

This is a relatively long piece, around 2300 words, let's see how good our summariser is!


In [60]:
display_summary("https://www.technologyreview.com/2024/01/04/1086046/whats-next-for-ai-in-2024/")

# Summary of "What’s next for AI in 2024"

The MIT Technology Review explores four key trends that will shape the landscape of artificial intelligence in 2024. The article reflects on previous predictions for 2023 and highlights the following emerging trends:

1. **Customized Chatbots**: Tech companies are expected to focus on developing user-friendly platforms that allow individuals to create customized chatbots tailored to specific needs, without requiring coding skills. This shift aims to make generative AI more accessible for everyday users.

2. **Generative AI’s Second Wave in Video**: Following the success of image generation AI, the next frontier is in creating text-to-video technologies. Startups are rapidly improving video generation tools, leading to their potential use in major film productions and raising concerns about the implications for actors and the film industry.

3. **AI-generated Election Disinformation**: With elections looming in 2024, AI-generated disinformation is projected to become a significant issue, with examples from recent elections already showcasing the potential for misleading content. The ease of creating deepfakes poses a challenge for recognition of real versus fake, complicating the political landscape.

4. **Robots that Multitask**: There is a growing trend toward developing more generalized robots that can perform multiple tasks using single models, as opposed to being limited by task-specific capabilities. This could lead to advancements in robotics, especially in areas like autonomous vehicles.

As we move into 2024, these trends may impact both the technology landscape and societal dynamics. Each trend carries its own set of challenges and implications, especially concerning reliability, ethical considerations, and regulatory responses.

# Let's find out if we can do this in other languages!
## This is an article in Arabic about copper handicrafts

In [62]:

display_summary("https://efonoon.com/blogs/efonoon-blog/%D9%81%D9%86-%D8%B2%D8%AE%D8%B1%D9%81%D8%A9-%D8%A7%D9%84%D9%86%D8%AD%D8%A7%D8%B3")

# فن زخرفة النحاس - eFonoon.com

موقع eFonoon.com يتخصص في فن زخرفة النحاس، مع تقديم معلومات مفصلة عن التاريخ والممارسات الفنية المرتبطة به. يتميز النحاس بجماله وخصائصه الفريدة التي تُبقيه محافظة على جودته لآلاف السنين إذا ما تم العناية به بشكل صحيح. يحتوي الموقع على دراسة تاريخية توضح مصدر النحاس المصري القديم وتجارته، بالإضافة إلى تفسير لدور النحاس في الحضارات المختلفة.

## محتوى الموقع
- **تاريخ النحاس**: معلومات عن استخدام النحاس في صنع الأدوات والديكورات عبر العصور منذ العصر الحجري حتى العصر البرونزي، مع التركيز على استخداماته في عصر المماليك في مصر.
- **صناعة المشغولات اليدوية**: شرح شامل لعملية تصنيع المشغولات النحاسية، بدءًا من اختيار المواد وصولاً إلى الزخرفة والتشكيل.
- **مقالات ذات صلة**: مقالات إضافية تتعلق بفنون أخرى مثل الأرابيسك والفن بالجلد الطبيعي.

## أخبار وإعلانات
- **عروض الشحن**: يقدم الموقع شحنًا مجانيًا للطلبات التي تتجاوز قيمتها 160 دولارًا لعملائه في دول الخليج.

للبقاء على اطلاع بالعروض والخصومات، يمكن الاشتراك في النشرة الإخبارية للحصول على خصم 10% على أول عملية شراء.


---




## Conclusion

Throughout this notebook, we've explored how the OpenAI API can be leveraged to build practical applications, such as text summarization. The principles introduced also apply to other AI providers like Meta or Anthropic. They also apply to AI systems that use different modalities, such as text-to-image generators or speech-to-text.

The business applications of this text summariser are many. For example, it could summarise news articles, generate concise overviews of resumes, summarise scientific articles, or craft subject lines for emails.

However, more powerful products can be built when you look at this approach from a big-picture point of view. For example, you could combine multiple summaries of scientific articles from the last year to produce a grand summary of where a particular field is heading.
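
The "grand summary" idea can be sketched as a two-stage process: summarise each document, then summarise the summaries. The code below is an illustration under assumptions, not something built in this notebook. `grand_summary` is a hypothetical name, and `summarize_text` stands for any function that takes text and returns a shorter text (the notebook's `summarize` takes a URL, so a text-based variant would be needed). A toy `first_sentence` function stands in for the model so the sketch runs offline.

```python
def grand_summary(texts, summarize_text, joiner="\n\n"):
    """Summarise each text, then summarise the combined summaries."""
    partial_summaries = [summarize_text(text) for text in texts]
    if not partial_summaries:
        return ""
    if len(partial_summaries) == 1:
        return partial_summaries[0]
    return summarize_text(joiner.join(partial_summaries))


def first_sentence(text):
    """Toy stand-in for a model call: keep only the first sentence."""
    return text.split(".")[0].strip() + "."


articles = [
    "Copper prices rose. Analysts were surprised.",
    "Robots learned new tasks. Labs celebrated.",
]
print(grand_summary(articles, first_sentence))
```

With a real model behind `summarize_text`, the second stage would see all the partial summaries at once and could describe the overall trend rather than any single article.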



### Business Implications
AI products built using OpenAI's API have the potential to improve businesses by driving innovation and streamlining operations. For instance:

- **Enhanced Productivity:** Automating repetitive and time-consuming tasks allows teams to focus on more strategic initiatives. This saves time and person-hours, which can translate into higher profitability.

- **Scalability:** AI-driven solutions enable businesses to handle large-scale operations efficiently, such as processing vast amounts of data or serving numerous customers simultaneously. Services like OpenAI's API are built to handle very large volumes of requests. There's a case to be made, however, about whether relying on them is wise for safety-critical systems (say, a medical notes summariser deployed in a hospital: what happens if OpenAI's servers are down?).

- **Personalisation:** If we're building a product for end users, this can add a personal (human?) touch to each user's experience.
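
On the "what happens if the servers are down?" question, one common mitigation for short outages is to retry a failed call with increasing waits. The sketch below shows the idea in plain Python; `call_with_retries` is a hypothetical helper, the default attempt count and delays are arbitrary, and retrying on every exception is a simplification that a real product would narrow down.

```python
import time


def call_with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `func()` and retry on any exception, doubling the wait each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * 2 ** (attempt - 1))


# Offline demonstration with a function that fails twice before succeeding.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("service unavailable")
    return "summary ready"


print(call_with_retries(flaky_call, base_delay=0.01))
```

In this notebook it could wrap the summariser as `call_with_retries(lambda: summarize(url))`. Retries only help with brief hiccups; a safety-critical system would still need a fallback for longer outages.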




### Future Applications
The applications of OpenAI's API are not limited to specific use cases but extend to a broad range of domains and industries. For example, AI can improve healthcare outcomes by assisting in medical diagnosis, training medical students to interview patients, or summarising medical research papers. In education, generative AI LLMs could be used as tutors specialised in specific fields.
Businesses that leverage AI tech will be able to develop and test MVPs from ideas quickly and iterate, all at minimal cost.


Moreover, AI systems can also act as virtual assistants or interactive agents in customer service, providing meaningful user interactions. These technologies promise to bring a new level of convenience to various aspects of daily life and work.

### Ethics and Regulations
As AI becomes more pervasive, it is critical to address ethical considerations and regulatory compliance. Ensuring transparency, fairness, and accountability in AI systems is essential to building trust among users and stakeholders. Bias mitigation, data privacy, and secure deployment are challenges that require continuous attention. Furthermore, these guardrails are even more important in critical domains such as healthcare or finance. An AI system that misdiagnoses patients' cancers due to gender bias can lead to catastrophic outcomes for those patients. Such precautions may come at the cost of performance, development time, and money, but they are vital to consider before deploying any AI-powered app to the market.








## References

Here are some reliable and beginner-friendly resources to learn more about OpenAI API and related topics:

1. **OpenAI API Documentation**  
   - Official documentation for the OpenAI API, including examples and best practices.  
   - [https://platform.openai.com/docs/](https://platform.openai.com/docs/)

2. **OpenAI API Quickstart Tutorial**  
   - A step-by-step guide to getting started with the OpenAI API.  
   - [https://github.com/openai/openai-quickstart-python](https://github.com/openai/openai-quickstart-python)



