A full business solution
BUSINESS CHALLENGE:
Create a product that builds a Brochure for a company to be used for prospective clients, investors and potential recruits.

We will be provided a company name and their primary website.

See the end of this notebook for examples of real-world business applications.

In [4]:
## imports necessary packages 
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [5]:
## Initialize and constants
# check for API key condition

load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key[:8] == 'sk-proj-':
    print("API so far looks good")
else:
    print("There might be a problem with the API key")
Model = 'gpt-4o-mini'
openai = OpenAI()

API so far looks good


In [6]:
## A class to represent webpage
class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """
    url: str
    title: str
    body: str
    links: List[str]
    text: str

    def __init__(self, url):
        self.url = url
        # The 10-second timeout is an arbitrary choice; it keeps a slow site from hanging the notebook
        response = requests.get(url, timeout=10)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Strip tags that carry no readable text
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""

        # Collect every non-empty href on the page
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
            
    

In [7]:
gs = Website("https://gayatri-sivani-susarla.github.io/GSS.portfolio/")
gs.links

['#header',
 '#about',
 '#services',
 '#portfolio',
 '#contact',
 'https://www.udemy.com/certificate/UC-75ec041e-7c57-42ec-bc5a-91e8f49428ce/',
 'https://www.credly.com/users/gayatri-sivani-susarla',
 '#',
 'https://github.com/GAYATRI-SIVANI-SUSARLA/NCAA-March-Madness-Basketball-Tournament-Outcome-Prediction-Model',
 'https://github.com/GAYATRI-SIVANI-SUSARLA/Airbnb_Listing2024_Python_Project',
 'https://github.com/GAYATRI-SIVANI-SUSARLA/Quantum_Search_Algorithm_Weighted_Database',
 'https://github.com/GAYATRI-SIVANI-SUSARLA',
 'https://www.linkedin.com/in/gayatri-sivani-susarla-975856263/',
 'https://github.com/GAYATRI-SIVANI-SUSARLA',
 'GS DS Resume.pdf']

First step: Have GPT-4o-mini figure out which links are relevant
Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".
We will use "one-shot prompting", in which the prompt includes one example of how the model should respond.

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!
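
The relevance judgement is where the model helps most. The mechanical part, turning relative links into absolute ones, can also be done deterministically before the prompt is built. The sketch below is one possible pre-processing step, not part of the original notebook; `resolve_links` is a hypothetical helper name, and it relies only on the standard library's `urllib.parse`.

```python
from urllib.parse import urljoin, urlsplit

def resolve_links(base_url, links):
    """Turn relative links into absolute URLs and drop non-web schemes."""
    resolved = []
    for link in links:
        # urljoin leaves absolute URLs untouched and resolves relative ones
        absolute = urljoin(base_url, link.strip())
        # Keep only http(s) links; this skips mailto:, tel:, javascript: and similar
        if urlsplit(absolute).scheme in ("http", "https"):
            resolved.append(absolute)
    return resolved

example = resolve_links(
    "https://company.com/",
    ["/about", "careers", "#team", "mailto:hi@company.com", "https://other.com/x"],
)
print(example)
```

Passing the resolved list to the model instead of the raw `website.links` would leave it with only the relevance decision to make.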

In [8]:
## system prompt
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [9]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
    ]
}
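
A hand-typed JSON example is easy to get subtly wrong, and the model tends to imitate whatever it is shown. One way to guard against that, sketched here as an assumption rather than the notebook's own approach, is to build the example from a Python dict and serialize it with `json.dumps`. The names `example_response` and `link_system_prompt_example` are illustrative only.

```python
import json

# Hypothetical alternative to the hand-written example in the system prompt
example_response = {
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"},
    ]
}

# json.dumps guarantees the text shown to the model is valid JSON
link_system_prompt_example = (
    "You should respond in JSON as in this example:\n"
    + json.dumps(example_response, indent=4)
)
print(link_system_prompt_example)
```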



In [10]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [11]:
print(get_links_user_prompt(gs))

Here is the list of links on the website of https://gayatri-sivani-susarla.github.io/GSS.portfolio/ - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
#header
#about
#services
#portfolio
#contact
https://www.udemy.com/certificate/UC-75ec041e-7c57-42ec-bc5a-91e8f49428ce/
https://www.credly.com/users/gayatri-sivani-susarla
#
https://github.com/GAYATRI-SIVANI-SUSARLA/NCAA-March-Madness-Basketball-Tournament-Outcome-Prediction-Model
https://github.com/GAYATRI-SIVANI-SUSARLA/Airbnb_Listing2024_Python_Project
https://github.com/GAYATRI-SIVANI-SUSARLA/Quantum_Search_Algorithm_Weighted_Database
https://github.com/GAYATRI-SIVANI-SUSARLA
https://www.linkedin.com/in/gayatri-sivani-susarla-975856263/
https://github.com/GAYATRI-SIVANI-SUSARLA
GS DS Resume.pdf


In [12]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=Model,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
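
The JSON response format asks for syntactically valid JSON, but it does not guarantee the shape requested in the prompt. A defensive parser along the following lines could be used in place of the bare `json.loads` call. This is a minimal sketch; `parse_links_response` is a hypothetical helper, and the fallback label "page" is an arbitrary choice.

```python
import json

def parse_links_response(raw):
    """Parse the model's JSON reply and keep only well-formed link entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    if not isinstance(data, dict):
        return {"links": []}
    entries = data.get("links", [])
    if not isinstance(entries, list):
        return {"links": []}
    valid = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        url = entry.get("url")
        # Keep only entries that carry an absolute http(s) URL
        if isinstance(url, str) and url.startswith(("http://", "https://")):
            valid.append({"type": str(entry.get("type", "page")), "url": url})
    return {"links": valid}

print(parse_links_response('{"links": [{"type": "about page", "url": "https://a.com/about"}]}'))
```

Returning an empty list instead of raising keeps the later loop over `links["links"]` working even when the model's reply is unusable.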

In [13]:
get_links("https://anthropic.com")

{'links': [{'type': 'about page', 'url': 'https://www.anthropic.com/company'},
  {'type': 'careers page', 'url': 'https://www.anthropic.com/careers'},
  {'type': 'team page', 'url': 'https://www.anthropic.com/team'},
  {'type': 'events page', 'url': 'https://www.anthropic.com/events'},
  {'type': 'news page', 'url': 'https://www.anthropic.com/news'}]}

In [29]:
anthropic = Website("https://anthropic.com")
anthropic.links

['#main',
 '#footer',
 'https://www.anthropic.com/',
 'https://www.anthropic.com/claude',
 'https://www.anthropic.com/max',
 'https://www.anthropic.com/team',
 'https://www.anthropic.com/enterprise',
 'https://www.anthropic.com/education',
 'https://www.anthropic.com/pricing',
 'https://claude.ai/download',
 'https://claude.ai/',
 'https://www.anthropic.com/news/claude-character',
 'https://www.anthropic.com/api',
 'https://docs.anthropic.com/',
 'https://www.anthropic.com/pricing#api',
 'https://console.anthropic.com/',
 'https://docs.anthropic.com/en/docs/welcome',
 'https://www.anthropic.com/solutions/agents',
 'https://www.anthropic.com/solutions/coding',
 'https://www.anthropic.com/solutions/customer-support',
 'https://www.anthropic.com/customers',
 'https://www.anthropic.com/research',
 'https://www.anthropic.com/economic-index',
 'https://www.anthropic.com/claude/opus',
 'https://www.anthropic.com/claude/sonnet',
 'https://www.anthropic.com/claude/haiku',
 'https://www.anthropi

Second step: make the brochure!
Assemble all the details into another prompt to GPT-4o-mini.


In [14]:
# Fetch the landing page plus every relevant link and concatenate their contents
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
        

In [16]:
print(get_all_details("https://gayatri-sivani-susarla.github.io/GSS.portfolio/"))

Found links: {'links': [{'type': 'about page', 'url': 'https://gayatri-sivani-susarla.github.io/GSS.portfolio/#about'}, {'type': 'services page', 'url': 'https://gayatri-sivani-susarla.github.io/GSS.portfolio/#services'}, {'type': 'portfolio page', 'url': 'https://gayatri-sivani-susarla.github.io/GSS.portfolio/#portfolio'}, {'type': 'contact page', 'url': 'https://gayatri-sivani-susarla.github.io/GSS.portfolio/#contact'}]}
Landing page:
Webpage Title:
Gayatri Sivani Susarla Portfolio Website
Webpage Contents:
Home
About
Certifications
Projects
Contact
Data Scientist
Hey!, I am
Gayatri Sivani Susarla
.
About Me
I am aspiring Data Scientist pursuing Master's degree in Data Science major at Stony Brook University, New York. I developed passion for Large Language Models and Big Data Analytics in my MS journey, and move forward to collaborate with professionals in these fields. I want to focus on solving real-world challenges, driving impactful results and grow professionally. I am eager to
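
In the "Found links" output above, all four URLs differ from the landing page only by their `#fragment`, so the same page is downloaded again for each of them. One possible way to avoid that is to compare URLs with the fragment removed before fetching. The sketch below assumes the `{"type": ..., "url": ...}` entry shape used in this notebook; `dedupe_pages` is a hypothetical helper, and the example URLs are placeholders.

```python
from urllib.parse import urldefrag

def dedupe_pages(landing_url, link_entries):
    """Drop entries whose URL, minus any #fragment, was already seen."""
    # The landing page itself is fetched separately, so it counts as seen
    seen = {urldefrag(landing_url).url}
    unique = []
    for entry in link_entries:
        page_url = urldefrag(entry["url"]).url
        if page_url in seen:
            continue
        seen.add(page_url)
        unique.append({"type": entry["type"], "url": page_url})
    return unique

entries = [
    {"type": "about page", "url": "https://example.com/site/#about"},
    {"type": "services page", "url": "https://example.com/site/#services"},
    {"type": "blog page", "url": "https://example.com/site/blog#top"},
    {"type": "blog page", "url": "https://example.com/site/blog"},
]
print(dedupe_pages("https://example.com/site/", entries))
```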

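So far the model has been called only once per company: to pick the relevant links. The content of those links is fetched with ordinary HTTP requests in the Website class, not by the model. The second model call comes next, when the brochure is generated.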

In [44]:
# The system prompt defines the task and the tone the LLM should adopt when generating the content
# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."

# Fun way
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."


In [45]:
## user prompt
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt 
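
Slicing at exactly 5,000 characters can cut the prompt off in the middle of a word or line. A gentler variant, sketched here as an assumption rather than what the notebook does, backs up to the last complete line inside the limit. `truncate_at_line` is a hypothetical helper name.

```python
def truncate_at_line(text, limit=5_000):
    """Cut text to at most `limit` characters, ending on a complete line if possible."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    last_newline = cut.rfind("\n")
    # Fall back to a hard cut when the window contains no usable newline
    return cut if last_newline <= 0 else cut[:last_newline]

print(truncate_at_line("abc\ndef\nghi", limit=6))
```

Because the scraped text is newline-separated, this usually costs only a few characters while keeping the final line intact.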

In [46]:
get_brochure_user_prompt("Anthropic", "https://anthropic.com")

Found links: {'links': [{'type': 'homepage', 'url': 'https://www.anthropic.com/'}, {'type': 'about page', 'url': 'https://www.anthropic.com/company'}, {'type': 'careers page', 'url': 'https://www.anthropic.com/careers'}, {'type': 'team page', 'url': 'https://www.anthropic.com/team'}, {'type': 'customers page', 'url': 'https://www.anthropic.com/customers'}, {'type': 'news page', 'url': 'https://www.anthropic.com/news'}, {'type': 'events page', 'url': 'https://www.anthropic.com/events'}, {'type': 'research page', 'url': 'https://www.anthropic.com/research'}, {'type': 'transparency page', 'url': 'https://www.anthropic.com/transparency'}, {'type': 'learn page', 'url': 'https://www.anthropic.com/learn'}]}


'You are looking at a company called: Anthropic\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHome \\ Anthropic\nWebpage Contents:\nSkip to main content\nSkip to footer\nClaude\nChat with Claude\nOverview\nMax plan\nTeam plan\nEnterprise plan\nEducation plan\nExplore pricing\nDownload apps\nClaude log in\nNews\nClaude’s character\nAPI\nBuild with Claude\nAPI\xa0overview\nDeveloper docs\nExplore pricing\nConsole log in\nNews\nLearn how to build with Claude\nSolutions\nCollaborate with Claude\nAI\xa0agents\nCoding\nCustomer support\nCase studies\nHear from our customers\nResearch\nResearch\nOverview\nEconomic Index\nClaude model family\nClaude Opus 4\nClaude Sonnet 4\nClaude Haiku 3.5\nResearch\nClaude’s extended thinking\nCommitments\nInitiatives\nTransparency\nResponsible scaling policy\nTrust center\nSecurity and compliance\nAnnouncement\nISO\xa042001 certi

In [29]:
## creating brochure
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=Model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))
            

In [31]:
create_brochure("GS","https://gayatri-sivani-susarla.github.io/GSS.portfolio/")

Found links: {'links': [{'type': 'about page', 'url': 'https://gayatri-sivani-susarla.github.io/GSS.portfolio/#about'}, {'type': 'services page', 'url': 'https://gayatri-sivani-susarla.github.io/GSS.portfolio/#services'}, {'type': 'portfolio page', 'url': 'https://gayatri-sivani-susarla.github.io/GSS.portfolio/#portfolio'}, {'type': 'contact page', 'url': 'https://gayatri-sivani-susarla.github.io/GSS.portfolio/#contact'}]}


# Brochure: GS - Data Science Portfolio of Gayatri Sivani Susarla

---

## About Me

Welcome to the portfolio of **Gayatri Sivani Susarla**, an aspiring Data Scientist currently pursuing a Master's degree in Data Science at **Stony Brook University, New York**. With a strong passion for **Large Language Models** and **Big Data Analytics**, I aim to solve real-world challenges and contribute significantly to the evolution of the Data Science field.

---

## Skills & Specializations

I possess a diverse skill set tailored for data science and analytics:

- **Programming Languages**: 
  - Python, R, Java, SQL, OCaml, SAPBW
- **AI/ML Libraries**: 
  - PyTorch, TensorFlow, SciKit-Learn
- **Cloud & Data Visualization Tools**: 
  - Snowflake, CloudLab, Azure, PowerBI, SAP Analytics Cloud, Matplotlib, Seaborn, Plotly

---

## Professional Experience

### **Systems Engineer, Infosys**
**July 2022 - December 2023**

- Specialized in **SAPBI** while working with Procter & Gamble.
- Optimized and monitored data pipelines to ensure accurate data flow.
- Managed ETL processes across multiple time zones, enhancing the reliability and performance of data operations.
- Worked with advanced tools and upgraded skills through certifications in **SAP Analytics Cloud**, **Snowflake**, and **Microsoft Azure**.

---

## Education

- **Master's in Data Science**
  - January 2024 - Present
  - Stony Brook University, New York

- **Bachelor in Technology**
  - June 2018 - May 2022
  - Electronics and Instrumentation Engineering, Adikavi Nannaya University, Andhra Pradesh, India

---

## Certifications

- **Mastering Generative AI**: Specialized in Generative AI and Large Language Models.
- **Snowflake Expert**: Skilled in managing and optimizing data workflows and building scalable data pipelines.
- **P&G SAP ABAP Developer**: Certified in building enterprise applications.

---

## Projects

### **NCAA March Madness Outcome Prediction Model**
- Developed machine learning models to predict the outcomes of the collegiate basketball tournaments using Logistic Regression, Random Forest, and other algorithms.

### **Exploratory Data Analytics of Airbnb Listings**
- Conducted EDA on New York Airbnb listings and provided insights to help guests and hosts make informed decisions, utilizing Python libraries like Pandas and Matplotlib.

### **Quantum Search Algorithm on Weighted Databases**
- Implemented an Adaptive-Grover Algorithm using Qiskit, focusing on enhancing the efficiency of quantum search methodologies.

---

## Company Culture

At **GS**, collaboration and continuous learning are at the heart of our mission. We believe in a culture that thrives on innovation, encouraging data exploration and fostering the growth of new scientific skills. We are passionate about pushing the boundaries of data science and are open to new ideas and diverse perspectives.

---

## Careers & Opportunities

We are constantly looking for enthusiastic and driven individuals who aspire to make a significant impact in the field of data science. If you are a forward-thinking professional ready to tackle complex challenges, let's connect!

---

For more information or to connect, visit my portfolio at: [Gayatri Sivani Susarla's Portfolio](#)

---

Thank you for considering GS! Let's explore the world of data together.

# Finally - a minor improvement
With a small adjustment, we can stream the results back from OpenAI as they are generated, giving the familiar typewriter animation.

In [37]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model= Model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
        stream=True
    )
    # Simple version: print each chunk as plain text as it arrives.
    # A stream can only be consumed once, so use either this loop or the Markdown one below, not both.
    # for chunk in stream:
    #     print(chunk.choices[0].delta.content or '', end='')

    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        response = response.replace("```","").replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)
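
The cleanup step in `stream_brochure` removes the word "markdown" wherever it appears, including inside the brochure text itself. A narrower alternative is to strip only a code fence that wraps the whole reply. The sketch below shows that idea together with the accumulate-as-you-go pattern. It runs against a fake stream built from `SimpleNamespace` objects, which merely mimic the `.choices[0].delta.content` shape and are an assumption made for the demo; `strip_wrapping_fence`, `accumulate` and `fake_chunk` are hypothetical names.

```python
from types import SimpleNamespace

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

def strip_wrapping_fence(text):
    """Remove a leading fence line (optionally tagged markdown) and a trailing fence."""
    lines = text.split("\n")
    if lines and lines[0].strip() in (FENCE, FENCE + "markdown"):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)

def accumulate(stream):
    """Yield the cleaned running text after each streamed chunk."""
    response = ""
    for chunk in stream:
        # delta.content can be None on some chunks, hence the `or ""`
        response += chunk.choices[0].delta.content or ""
        yield strip_wrapping_fence(response)

def fake_chunk(content):
    # Stand-in for a streamed chunk; not a real OpenAI object
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])

pieces = [FENCE + "markdown\n", "# Title\n", "Uses markdown tables", None, "\n" + FENCE]
snapshots = list(accumulate(fake_chunk(p) for p in pieces))
print(snapshots[-1])
```

In the notebook, each yielded snapshot could be passed to `update_display(Markdown(...))` in place of the two chained `replace` calls.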

In [47]:
stream_brochure("GS","https://gayatri-sivani-susarla.github.io/GSS.portfolio/")

Found links: {'links': [{'type': 'about page', 'url': 'https://gayatri-sivani-susarla.github.io/GSS.portfolio/#about'}, {'type': 'services page', 'url': 'https://gayatri-sivani-susarla.github.io/GSS.portfolio/#services'}, {'type': 'portfolio page', 'url': 'https://gayatri-sivani-susarla.github.io/GSS.portfolio/#portfolio'}, {'type': 'contact page', 'url': 'https://gayatri-sivani-susarla.github.io/GSS.portfolio/#contact'}]}


# Welcome to GS - Gayatri Sivani Susarla's Data Wonderland!

**Are you ready to explore the magical world of data? 🧙‍♀️✨**  
Take a break from those boring spreadsheets and enter the realm of predictions, patterns, and possibilities with our very own data sorceress, Gayatri Sivani Susarla! 

## Who's Behind The Curtain? (About Me) 🎩

Hey there! I'm Gayatri, your friendly neighborhood aspiring Data Scientist. Currently enrolled at Stony Brook University, I’m crafting my mastery in the art of Data Science, conjuring magic out of large language models and big data analytics. My goal? To solve real-world challenges and create delightful data spells (a.k.a insights)! 

## Skills That Make You Go “Wow!” 🚀

- **Programming Languages:** Python, R, Java, SQL, OCAML, SAPBW (or just "The Avengers of Programming!")
- **AI/ML Libraries:** PyTorch, TensorFlow (no, not the movies, but close!)
- **Cloud and Visualization Tools:** Snowflake, Azure, PowerBI, and everything else that makes your data look pretty! 🌥️✨

## Experience with a Dash of Humor 😂

- **Systems Engineer, Infosys:** Worked with P&G finding ways to optimize data flows—kind of like untangling your earphones, but way more complex! 
- **Master's in Data Science:** Where I learned things like “How to Talk to Databases” and “How to Make Machines Understand Human Language." 

## Career at GS: Join the Fun! 🎉

**Thinking about joining our squad? Here’s what you can expect:**

- **Team Culture:** No pressure, just data passion! We cherish creativity and share a lot of laughs—data jokes are a staple here! (e.g., “Why did the data go to therapy? Because it had too many outliers!”)
- **Diversity:** We believe in team spirit that thrives on diverse backgrounds and perspectives (and maybe the occasional coffee break gossip).

### What Our Customers Say 💬

- **“GS didn’t just analyze our data; they gave it a personality!”**
- **“They predicted our trends faster than my grandma predicts the weather!”**

## Fun Projects that Keep Us Up at Night (Not Just for Data Lovers!) 🌙

- **March Madness Prediction Model:** Because betting on basketball should involve science! 🏀
- **Airbnb Exploratory Data Analysis:** Helping guests and hosts beat the rental game.
- **Quantum Search Algorithm:** Making data analysis feel more like a sci-fi thriller—coming soon to a cinema near you! 🎥

## Want to Connect? Let’s Talk Data! 📞

Ready to dive into the sea of data or have a giggle about data-driven puns? Whether you’re a customer, investor, or potential recruit, we’re only a click away! 

### Contact Us:
- **Email:** yourdatafriend@gsdata.com
- **Website:** [www.gsdata.com](http://www.gsdata.com)

---

**So why wait? Join us at GS for some data fun, and let’s transform numbers into knowledge!** 

_**Warning:** Side effects may include increased curiosity, improved data skills, and spontaneous bursts of laughter!_

In [48]:
stream_brochure("Huggingface","https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'discussion forum', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure: AI-Huggers Unite! 🤗

### Welcome to Hugging Face!
*Where machine learning meets friendly faces. We're not just building AI; we're building a community that hugs back!*

---

### What We Do 🤖
We are **The AI community building the future.** Imagine a place where the coolest kids from the AI neighborhood drop their models, datasets, and applications like they're hot potatoes! We’ve got over **1M models** waiting to be discovered, including some trending gems like:
- **FLUX.1-Kontext-dev**: Not just a cool name, it's the future of context-aware AI!
- **Hunyuan-A13B-Instruct**: Makes sure your AI knows instructions and how to follow them (no more “Huh?” face).
- **Gemma-3n-E4B-it**: Not related to the cleaning product, but it cleans up the competition!

---

### Features That'll Make You Go "Whoa!" 🤩
- **Spaces**: Like temp agencies for AI apps! From generating apps with DeepSeek to editing images with FLUX; we’ve got spaces for every AI learning style.
- **Datasets Galore**: With over **250k datasets**, we’ve got everything from **awesome-chatgpt-prompts** to **OpenScience**.
- **Compute Solutions**: For those who need muscle, deploy models on **optimized endpoints, starting at just $0.60/hour!**

---

### Join the Community! 🤝
More than **50,000 organizations** (yes, even *your mom’s* tech startup) are already signing up for this AI roller-coaster! 🚀 

### Customer Spotlight: 🤓
- **Amazon**, **Google**, **Microsoft** – Sounds like the Avengers of tech? Well, they might just be joining forces to create the next AI superhero with Hugging Face!

---

### Careers: Let’s Hug It Out! 💼
Are you looking for a workplace where every day feels like a hug? Look no further! We’re on the lookout for:
- **AI Enthusiasts**: Bring your passion and creativity!
- **Data Wizards**: Conjure up magic from rows and columns.
- **Community Managers**: Help us keep the hugs flowing!

Apply NOW or you might just get hugged... literally.
#### *(Note: Hugs are not mandatory, but they’re encouraged!)*

---

### How to Get Started 🔧
- **Sign up** to join the fun!
- Check out our **docs** for tons of resources.
- Transform your ideas into reality with a little help from Hugging Face magic dust!

---

### Why Hugging Face?
Because we’re not just a community. We’re a revolution. We’re the friendly faces of AI, and we can’t wait to see what you’ll create!
  
**Join us, and let’s hug the future together! 🤗** 

---

**For more information, follow us on:**
- [GitHub](https://github.com)
- [Twitter](https://twitter.com)
- [LinkedIn](https://linkedin.com)

---

*Remember: Life is too short for cold shoulders, embrace the future with Hugging Face!*