# End of week 1 exercise

To demonstrate your familiarity with the OpenAI API and with Ollama, build a tool that takes a technical question
and responds with an explanation. You will be able to use this tool yourself throughout the course!

In [3]:
# imports
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [4]:
# constants
MODEL = 'gpt-4o-mini'
MODEL_LLAMA = 'llama3.2'

In [5]:
# set up environment
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key) > 10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
openai = OpenAI()

API key looks good so far


In [6]:
# here is the question; type over this to ask something new

question = """
Please explain what this code does and why:
yield from {book.get("author") for book in books if book.get("author")}
"""

In [7]:
# A class to represent a scraped web page, plus the prompts used to select relevant links

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

link_system_prompt = "You are provided with a list of links found on the Amazon Web Services landing page for all documentation. \
You are able to decide which of the links would be most relevant to include in an organized product documentation repo, \
such as links to essential services, links to complementary services, microservices, and helpful tips and conceptual guidance.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "product", "function": "website hosting", "used_with": ["a product", "another product"], "url": "https://full.url/goes/here/EC2"},
        {"type": "product", "function": "data storage", "used_with": ["a product", "another product"], "url": "https://another.full.url/RDS"},
        {"type": "product", "function": "serverless", "used_with": ["a product", "another product", "and another product"], "url": "https://another.full.url/lambda"},
        {"type": "article", "function": "troubleshooting", "url": "https://another.full.url/helpful"}
    ]
}
"""
link_system_prompt += "If you encounter additional types, functionality, sub-categories, etc. that I have not considered, add those as fields within the record for the relevant link, as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "product", "function": "website hosting", "used_with": ["a product", "another product"], "url": "https://full.url/goes/here/EC2"},
        {"type": "product", "function": "data storage", "used_with": ["a product", "another product"], "url": "https://another.full.url/RDS"},
        {"type": "product", "function": "serverless", "used_with": ["a product", "another product", "and another product"], "url": "https://another.full.url/lambda"},
        {"type": "article", "function": "troubleshooting", "url": "https://another.full.url/helpful"},
        {"type": "product", "function": "analytics", "used_with": ["a product"], "advantage": "saves time", "disadvantage": "none", "url": "https://another.full.url/firehose"}
    ]
}
"""
link_system_prompt += "Always only return one link per record in the list of links. Never insert links as content in the record categories unless it is for the url data."

def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for AWS products, services, guides and use cases, and respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, or email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt
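
The prompt warns the model that some links may be relative. An alternative (a sketch, not what the notebook does) is to resolve them before building the prompt, using the standard library's `urllib.parse.urljoin`. The helper name `absolutize_links` is hypothetical.

```python
from urllib.parse import urljoin, urlparse


def absolutize_links(base_url, links):
    """Resolve hrefs against base_url; keep unique http(s) URLs in order."""
    resolved = []
    seen = set()
    for href in links:
        absolute = urljoin(base_url, href)
        # Skip non-web schemes such as mailto: or javascript:
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            resolved.append(absolute)
    return resolved


example = absolutize_links(
    "https://docs.aws.amazon.com/",
    ["/lambda/", "ec2/index.html", "mailto:someone@example.com",
     "https://aws.amazon.com/", "/lambda/"],
)
print(example)
```

With this in place, `"\n".join(absolutize_links(website.url, website.links))` could replace the plain join, which would also drop email links before the model sees them.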

In [8]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
      ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
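
The model's reply is not guaranteed to follow the example schema, and `get_all_details` later indexes `link["url"]` directly. A small guard might look like the sketch below. `keep_valid_links` is a hypothetical helper, and the rule "keep only records whose `url` is an absolute http(s) string" is an assumption about what is wanted here.

```python
import json


def keep_valid_links(raw_json):
    """Parse the model's JSON reply and keep only records with a usable URL."""
    data = json.loads(raw_json)
    records = data.get("links", []) if isinstance(data, dict) else []
    valid = []
    for record in records:
        url = record.get("url") if isinstance(record, dict) else None
        # Assumption: only absolute http(s) URLs are worth following
        if isinstance(url, str) and url.startswith(("http://", "https://")):
            valid.append(record)
    return {"links": valid}


reply = json.dumps({
    "links": [
        {"type": "product", "url": "https://docs.aws.amazon.com/lambda/"},
        {"type": "article"},                          # no url at all
        {"type": "product", "url": "/relative/path"}  # not absolute
    ]
})
cleaned = keep_valid_links(reply)
print(cleaned)
```

In `get_links`, `return keep_valid_links(result)` could then stand in for `return json.loads(result)`.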

In [9]:
get_links("https://docs.aws.amazon.com/?nc2=h_ql_doc_do")

{'links': [{'type': 'product',
   'function': 'compute',
   'used_with': ['scalable applications', 'data processing'],
   'url': 'https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html'},
  {'type': 'product',
   'function': 'data storage',
   'used_with': ['applications', 'big data'],
   'url': 'https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html'},
  {'type': 'product',
   'function': 'database',
   'used_with': ['web applications', 'mobile apps'],
   'url': 'https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Welcome.html'},
  {'type': 'product',
   'function': 'serverless',
   'used_with': ['event-driven applications'],
   'url': 'https://docs.aws.amazon.com/lambda/latest/dg/welcome.html'},
  {'type': 'product',
   'function': 'analytics',
   'used_with': ['data analysis', 'business intelligence'],
   'url': 'https://docs.aws.amazon.com/analytics/index.html'},
  {'type': 'article',
   'function': 'best practices',
   'url': 'https://docs.aws.amazo

In [90]:
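def get_all_details(url, result, all_links):
    """
    Recursively gather page contents, starting from url.

    While no more than 15 links have been collected along the current branch,
    ask the model for relevant links and recurse into each of them. Once that
    budget is exceeded, fetch the page itself and append its contents to result.
    """
    if len(all_links) > 15:
        # Link budget exhausted: read this page and stop descending
        result += Website(url).get_contents()
        return result
    else:
        links = get_links(url)
        # This builds a new list, so the count is per branch and is not shared between siblings
        all_links = all_links + links["links"]
        for link in links["links"]:
            print(f'searching for links within {link["url"]}')
            result += get_all_details(link["url"], "", all_links)
        return result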
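
The log further down shows the same page (`EC2_GetStarted.html`) being visited more than once, and each call to `get_links` costs one model request. One possible way to bound the work is a breadth-first walk with a visited set. This is a sketch only, not the notebook's method: `collect_pages`, `fetch_links` and `fetch_text` are hypothetical names, and the two callables are passed in so the logic can be tried without network access.

```python
from collections import deque


def collect_pages(start_url, fetch_links, fetch_text, max_pages=15):
    """Breadth-first walk from start_url, reading each page at most once."""
    visited = set()
    queue = deque([start_url])
    parts = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue  # already read via another path
        visited.add(url)
        parts.append(fetch_text(url))
        for link in fetch_links(url):
            if link not in visited:
                queue.append(link)
    return "".join(parts), visited


# A tiny in-memory "site" standing in for real pages
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["d"], "d": []}
text, seen = collect_pages(
    "a",
    fetch_links=lambda u: graph.get(u, []),
    fetch_text=lambda u: f"[{u}]",
    max_pages=3,
)
print(text, sorted(seen))
```

In this notebook, `fetch_links` could wrap `get_links(url)["links"]` (returning each record's `url`) and `fetch_text` could wrap `Website(url).get_contents()`.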

In [91]:
system_prompt = "You are a software developer trying to determine the best AWS services to use for a small business that wants to launch a test web application for 5 to 10 users. \
You want to create a simple report of the best AWS products to use with their costs and development complexity. Some web app features are unknown so the report will compare and contrast some solutions based on complexity and price. Respond in markdown. \
Include details of service name, how it works with other AWS services, how easily it can scale, and whether there is long term support for the solution."

In [92]:
def get_report_user_prompt(url):
    user_prompt = "You are looking at AWS documentation\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information only to build a short report of the best services to use for a small web app for 5 to 10 test users. Respond in markdown.\n"
    user_prompt += get_all_details(url, "", [])
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    print(user_prompt)
    return user_prompt
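
The slice `user_prompt[:5_000]` can cut the prompt mid-word or mid-line. As a hedged alternative, the sketch below backs up to the last newline before the limit. `truncate_on_line` is a hypothetical helper, and the 5,000-character default is simply the value used above.

```python
def truncate_on_line(text, limit=5_000):
    """Cut text to at most `limit` characters, preferring a line boundary."""
    if len(text) <= limit:
        return text
    # Position of the last newline that falls inside the allowed window
    cut = text.rfind("\n", 0, limit)
    # Fall back to a hard cut when there is no usable newline
    return text[:cut] if cut > 0 else text[:limit]


sample = "aaaa\nbbbb\ncccc"
print(repr(truncate_on_line(sample, limit=12)))
```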

In [93]:
def stream_report(url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_report_user_prompt(url)}
          ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        response = response.replace("```","").replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)
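
One thing to watch in `stream_report`: `replace("markdown", "")` deletes the word "markdown" wherever it appears in the report, not only in a wrapping fence. A minimal sketch of a narrower alternative follows. `strip_outer_fence` is a hypothetical helper, not part of the course code, and it assumes the model wraps the whole reply in a single Markdown fence.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the sample nests cleanly
_OPEN = re.compile(r"\A\s*" + FENCE + r"(?:markdown|md)?[ \t]*\n?")
_CLOSE = re.compile(r"\n?" + FENCE + r"\s*\Z")


def strip_outer_fence(text):
    """Remove one wrapping fence (optionally tagged markdown/md), nothing else."""
    opened = _OPEN.sub("", text, count=1)
    if opened == text:
        return text  # no leading fence, so leave the text untouched
    return _CLOSE.sub("", opened, count=1)


wrapped = FENCE + "markdown\n# Title\nRespond in markdown.\n" + FENCE
print(strip_outer_fence(wrapped))
```

Inside the loop, this could be applied to a copy for display, for example `update_display(Markdown(strip_outer_fence(response)), display_id=display_handle.display_id)`, leaving the accumulated `response` untouched.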

In [94]:
stream_report("https://docs.aws.amazon.com/?nc2=h_ql_doc_do")

searching for links within https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html
searching for links within https://console.aws.amazon.com/ec2/
searching for links within https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html
searching for links within https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/TroubleshootingInstancesConnecting.html
searching for links within https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html
searching for links within https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/tracking-free-tier-usage.html
searching for links within https://docs.aws.amazon.com/ebs/latest/userguide/ebs-creating-volume.html
searching for links within https://docs.aws.amazon.com/systems-manager/latest/userguide/run-command.html
searching for links within https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-launch-tutorials.html
searching for links within https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_G

# AWS Solutions Report for a Small Web Application

This report outlines recommended AWS services for a small web application suited for 5 to 10 users, focusing on cost-effectiveness and development complexity.

## Recommended AWS Services

### 1. **Amazon EC2 (Elastic Compute Cloud)**

- **Overview**: EC2 provides scalable virtual servers in the AWS Cloud, allowing businesses to run applications by launching and connecting to virtual instances.
- **Integration**: Works seamlessly with AWS services like Amazon RDS (for database needs), Amazon S3 (for storage), and AWS Lambda (for serverless applications).
- **Scalability**: Easily scales by adding new instances as demand grows. Can adjust instance types based on performance needs.
- **Cost**: You can potentially operate within the AWS Free Tier for up to 12 months, which provides t2.micro or t3.micro instances at no charge (if eligible). After that, costs depend on usage.
- **Development Complexity**: Moderate; requires understanding of instance configurations, security groups, and networking. Some setup and management are required.
- **Long-Term Support**: Supported by AWS with extensive documentation and tutorials available.

### 2. **Amazon S3 (Simple Storage Service)**

- **Overview**: S3 is an object storage service that offers high durability, scalability, and availability for data storage.
- **Integration**: Integrates well with EC2 for static file hosting, AWS Lambda for serverless applications, and Amazon CloudFront for content delivery.
- **Scalability**: Automatically scales storage as needed without an upper limit.
- **Cost**: Pay as you go pricing model based on storage used, requests made, and data transferred. S3 offers a free tier for the first 5 GB of storage.
- **Development Complexity**: Low; easy to implement with SDKs available for various programming languages.
- **Long-Term Support**: Long-term support with extensive resources and best practices documentation.

### 3. **Amazon RDS (Relational Database Service)**

- **Overview**: A managed service that simplifies setup, operation, and scaling of relational databases like MySQL or PostgreSQL.
- **Integration**: Works smoothly with EC2 instances for web applications that require persistent data storage.
- **Scalability**: Easily scales storage and compute resources as application demands change.
- **Cost**: While not covered in the Free Tier, RDS offers low-cost instance classes suitable for small applications (e.g., db.t3.micro).
- **Development Complexity**: Moderate; requires some understanding of database management, but AWS takes care of backups, patching, and replication concerns.
- **Long-Term Support**: Fully supported by AWS with continuous improvements and robust documentation.

### 4. **AWS Lambda**

- **Overview**: A serverless compute service that runs code in response to events, without provisioning servers.
- **Integration**: Can be connected with various AWS services such as S3, DynamoDB, and API Gateway.
- **Scalability**: Automatic scaling based on the number of requests, ideal for applications with variable workloads.
- **Cost**: Charged based on the number of requests and the duration of code execution. There is a free tier of 1 million requests and 400,000 GB-seconds of compute time monthly.
- **Development Complexity**: Low; ideal for event-driven applications. Minimal management required, focusing on code rather than infrastructure.
- **Long-Term Support**: Continual support from AWS, with evolving features and extensive community and documentation.

## Summary Table

| Service    | Cost                                             | Development Complexity | Scalability | Long-Term Support |
|------------|--------------------------------------------------|------------------------|-------------|-------------------|
| Amazon EC2 | Free Tier (if eligible), otherwise pay-as-you-go | Moderate               | High        | Yes               |
| Amazon S3  | Free Tier (5 GB), pay-as-you-go thereafter       | Low                    | Unlimited   | Yes               |
| Amazon RDS | Low-cost tier, not in Free Tier                  | Moderate               | High        | Yes               |
| AWS Lambda | Free Tier (1M requests), pay-as-you-go           | Low                    | Automatic   | Yes               |

## Conclusion

For a small web application targeting 5 to 10 users, leveraging a combination of **Amazon EC2**, **Amazon S3**, **Amazon RDS**, and **AWS Lambda** provides a balanced architecture that is cost-efficient, scalable, and backed by robust support from AWS.

In [None]:
# Get Llama 3.2 to answer
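
This cell is left as an exercise. One possible sketch is shown below, assuming Ollama is running locally with `llama3.2` pulled and that its OpenAI-compatible endpoint is reachable at `http://localhost:11434/v1`. `answer_with_llama` is a hypothetical helper that accepts any client exposing `chat.completions.create`, and the system prompt text is likewise an assumption.

```python
def answer_with_llama(client, question, model="llama3.2"):
    """Stream a reply from `client` and return the accumulated text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[
            # Hypothetical system prompt; adjust to taste
            {"role": "system", "content": "You are a helpful technical tutor. Explain code clearly."},
            {"role": "user", "content": question},
        ],
        stream=True,
    )
    reply = ""
    for chunk in stream:
        # delta.content can be None on some chunks, so fall back to ""
        reply += chunk.choices[0].delta.content or ""
    return reply
```

With the notebook's imports, a client could be created as `ollama_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")` (the key is a placeholder value), followed by `display(Markdown(answer_with_llama(ollama_client, question)))`.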