## Rapid Read: Bite-size summaries of your favorite blog articles from top tech firms

> - A product whose value proposition is giving readers a quick idea of the blogs they enjoy
> - The frontier model used for the product is GPT-4o-mini
> - To use the model effectively, we rely on **In-Context** learning
> - In particular, we use One-Shot prompting: the prompt contains a single example of the desired output
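
As a rough illustration of what one-shot prompting means in practice, the sketch below builds a chat `messages` list whose system message carries exactly one worked example of the expected JSON, while the user message carries the real input. `build_one_shot_messages`, `EXAMPLE_OUTPUT` and the `example.com` URLs are hypothetical names for illustration only; the notebook itself builds its prompts by string concatenation.

```python
import json

# The single worked example ("one shot") the model should imitate.
# Hypothetical URL, for illustration only.
EXAMPLE_OUTPUT = {
    "links": [
        {"type": "latest article", "url": "https://example.com/blog/some-article"}
    ]
}


def build_one_shot_messages(instructions: str, user_input: str) -> list[dict]:
    """Return a chat messages list with one embedded example of the output format."""
    system_content = (
        instructions
        + "\nYou should respond in JSON as in this example:\n"
        + json.dumps(EXAMPLE_OUTPUT, indent=4)  # always valid JSON
    )
    return [
        {"role": "system", "content": system_content},
        {"role": "user", "content": user_input},
    ]


messages = build_one_shot_messages(
    "You are provided with a list of links found on a webpage.",
    "Links:\nhttps://example.com/blog/some-article\nhttps://example.com/about",
)
print(messages[0]["content"])
```

Generating the example with `json.dumps` rather than typing it by hand is one way to guarantee the example itself is valid JSON.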

### Imports

In [1]:
# imports
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

### OpenAI client setup and GPT-4o-mini model selection

In [2]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far
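
The check above only accepts keys that start with `sk-proj-`. A slightly more permissive sanity check is sketched below, assuming (not verified here) that OpenAI secret keys in general start with `sk-` and project keys with `sk-proj-`. `check_openai_key` is a hypothetical helper; like the original check, it cannot tell whether a key is actually valid or active.

```python
from typing import Optional


def check_openai_key(key: Optional[str]) -> str:
    """Return a short diagnostic for a key read from the environment (format check only)."""
    if not key:
        return "No API key found"
    if key != key.strip():
        # Stray spaces or newlines often sneak in when copying keys into .env files
        return "API key has leading or trailing whitespace"
    # Assumption: secret keys start with "sk-"; project keys use "sk-proj-"
    if not key.startswith("sk-"):
        return "API key does not start with sk-"
    if len(key) <= 10:
        return "API key looks too short"
    return "API key looks good so far"
```

In the notebook this would be used as `print(check_openai_key(os.getenv('OPENAI_API_KEY')))`.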


#### Class to scrape webpages

In [3]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers, timeout=30)  # timeout in seconds, so a slow site cannot hang the notebook
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')] # code to gather links referred in the page
        self.links = [link for link in links if link]

    def get_contents(self): # describes what the webpage contains
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [7]:
# Testing the class 
blog = Website("https://tech.instacart.com/tagged/data-science")

# printing the contents scraped
print(blog.get_contents())

# printing the links scraped
blog.links

Webpage Title:
Data Science – tech-at-instacart
Webpage Contents:
Homepage
Open in app
Sign in
Get started
Eng
Data Science
Machine Learning
Careers
Tagged in
Data Science
tech-at-instacart
Instacart Engineering
More information
Followers
6.8K
Elsewhere
More, on Medium
Data Science
Monta Shen
in
tech-at-instacart
Jul 24, 2024
Data Science Spotlight: Cracking the SQL Interview at Instacart (LLM Edition)
Read more…
14
1 response
Benjamin Knight
in
tech-at-instacart
Aug 10, 2022
How Hundreds of Retailers Use the Instacart Partner Platform to Drive In-Store Efficiencies
Read more…
21
1 response
Lauren Svensson
in
tech-at-instacart
Jul 26, 2022
Building a Data-Driven company with Anahita Tafvizi, Instacart’s Vice President and Head of Data Science
Read more…
24
Xiaoding Krause
in
tech-at-instacart
Jun 2, 2022
Geo-based AB Testing and Difference-in-Difference Analysis in Instacart Catalog
Introducing the…
Read more…
92
1 response
Benjamin Knight
in
tech-at-instacart
May 18, 2022
Tooling to F

['https://medium.com/',
 'https://rsci.app.link/?%24canonical_url=https%3A%2F%2Fmedium.com/tech-at-instacart%3F~feature=LoMobileNavBar&~channel=ShowCollectionHome&~stage=m2',
 'https://medium.com/m/signin?redirect=https%3A%2F%2Ftech.instacart.com%2Ftagged%2Fdata-science&source=--------------------------nav_reg&operation=login',
 'https://medium.com/m/signin?redirect=https%3A%2F%2Ftech.instacart.com%2Ftagged%2Fdata-science&source=--------------------------nav_reg&operation=register',
 'https://tech.instacart.com?source=logo-lo_2e73a70c752c---587883b5d2ee',
 'https://tech.instacart.com/tagged/software-development',
 'https://tech.instacart.com/tagged/data-science',
 'https://tech.instacart.com/tagged/machine-learning',
 'https://instacart.careers/',
 'https://tech.instacart.com',
 'https://tech.instacart.com/about',
 'mailto:press@instacart.com',
 'https://twitter.com/instacart',
 '//facebook.com/instacart',
 'https://medium.com/tag/data-science',
 'https://tech.instacart.com/@monta.shen

## Getting GPT-4o-mini to extract only relevant links

> - **GPT-4o-mini** should be able to decide which links are relevant and convert any relative links (e.g. "/about") into full URLs
> - To provide **In-Context** learning, we use **One-Shot prompting** to accomplish our goal
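
Resolving links does not strictly require the model: scheme-relative links such as `//facebook.com/instacart` in the scraped list, or a relative path like `/about`, can be turned into full URLs deterministically before the list is sent, leaving GPT-4o-mini only the judgment of which link is the latest article. A minimal sketch using the standard library's `urllib.parse.urljoin`; `normalize_links` is a hypothetical helper, not part of the notebook.

```python
from urllib.parse import urljoin, urlsplit


def normalize_links(page_url: str, links: list[str]) -> list[str]:
    """Resolve links against page_url and keep unique http(s) URLs in their original order."""
    seen = set()
    result = []
    for link in links:
        absolute = urljoin(page_url, link)  # "/about" -> "https://host/about"
        if urlsplit(absolute).scheme not in ("http", "https"):
            continue  # drops mailto:, javascript:, and similar links
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


links = normalize_links(
    "https://tech.instacart.com/tagged/data-science",
    ["/about", "//facebook.com/instacart", "mailto:press@instacart.com",
     "https://tech.instacart.com/about"],
)
print(links)  # ['https://tech.instacart.com/about', 'https://facebook.com/instacart']
```

Passing `normalize_links(website.url, website.links)` instead of `website.links` into the user prompt would also shorten the prompt by removing duplicates.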

### Designing the **System Prompt**

In [8]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links have the latest article.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [9]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links have the latest article.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



### Designing the **User Prompt**

In [17]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please get the first article in the page, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [18]:
print(get_links_user_prompt(blog))

Here is the list of links on the website of https://tech.instacart.com/tagged/data-science - please get the first article in the page, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://medium.com/
https://rsci.app.link/?%24canonical_url=https%3A%2F%2Fmedium.com/tech-at-instacart%3F~feature=LoMobileNavBar&~channel=ShowCollectionHome&~stage=m2
https://medium.com/m/signin?redirect=https%3A%2F%2Ftech.instacart.com%2Ftagged%2Fdata-science&source=--------------------------nav_reg&operation=login
https://medium.com/m/signin?redirect=https%3A%2F%2Ftech.instacart.com%2Ftagged%2Fdata-science&source=--------------------------nav_reg&operation=register
https://tech.instacart.com?source=logo-lo_2e73a70c752c---587883b5d2ee
https://tech.instacart.com/tagged/software-development
https://tech.instacart.com/tagged/data-science
https://tech.instacart.com/tagged/machine-learning
https://instacart.careers/
ht

### Calling the OpenAI API with GPT-4o-mini

In [19]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    # we can ask the API for multiple variations of the response
    # choices[0] gets us the one and only choice as we are restricting to one response currently
    result = response.choices[0].message.content 
    
    return json.loads(result)
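
`response_format={"type": "json_object"}` is meant to make the reply syntactically valid JSON, but it does not enforce the `links` / `type` / `url` shape shown in the one-shot example. A defensive parser is sketched below; `parse_links_response` is a hypothetical helper, and the rule that only absolute `https://` URLs are kept is an assumption chosen to match the user prompt's request for full https URLs.

```python
import json


def parse_links_response(raw: str) -> list[dict]:
    """Parse the model's JSON reply and keep only well-formed link entries."""
    data = json.loads(raw)  # raises json.JSONDecodeError if the reply is not JSON
    entries = data.get("links", []) if isinstance(data, dict) else []
    if not isinstance(entries, list):
        entries = []
    valid = []
    for entry in entries:
        if not isinstance(entry, dict) or "type" not in entry:
            continue
        url = entry.get("url", "")
        # Assumption: only absolute https URLs are usable by Website()
        if isinstance(url, str) and url.startswith("https://"):
            valid.append({"type": str(entry["type"]), "url": url})
    return valid
```

Inside `get_links`, `return {"links": parse_links_response(result)}` would keep the same return shape that `get_all_details` expects.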

In [20]:
instacart = Website("https://tech.instacart.com/tagged/data-science")

In [21]:
instacart.links

['https://medium.com/',
 'https://rsci.app.link/?%24canonical_url=https%3A%2F%2Fmedium.com/tech-at-instacart%3F~feature=LoMobileNavBar&~channel=ShowCollectionHome&~stage=m2',
 'https://medium.com/m/signin?redirect=https%3A%2F%2Ftech.instacart.com%2Ftagged%2Fdata-science&source=--------------------------nav_reg&operation=login',
 'https://medium.com/m/signin?redirect=https%3A%2F%2Ftech.instacart.com%2Ftagged%2Fdata-science&source=--------------------------nav_reg&operation=register',
 'https://tech.instacart.com?source=logo-lo_b5356ef6b68a---587883b5d2ee',
 'https://tech.instacart.com/tagged/software-development',
 'https://tech.instacart.com/tagged/data-science',
 'https://tech.instacart.com/tagged/machine-learning',
 'https://instacart.careers/',
 'https://tech.instacart.com',
 'https://tech.instacart.com/about',
 'mailto:press@instacart.com',
 'https://twitter.com/instacart',
 '//facebook.com/instacart',
 'https://medium.com/tag/data-science',
 'https://tech.instacart.com/@monta.shen

In [22]:
get_links("https://tech.instacart.com/tagged/data-science")

{'links': [{'type': 'latest article',
   'url': 'https://tech.instacart.com/data-science-spotlight-cracking-the-sql-interview-at-instacart-llm-edition-52d04bde474c'}]}
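
The URL the model returns is not stable between runs: here it has no query string, while the call inside `get_all_details` later returns the same article with a `?source=...` suffix. Stripping the query and fragment is one way to make the URL canonical, for example before caching or de-duplicating articles. A minimal sketch, assuming the query string on these Medium-hosted posts only carries tracking data (not true of URLs in general); `canonical_article_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_article_url(url: str) -> str:
    """Drop the query string and fragment so one article maps to one URL."""
    parts = urlsplit(url)
    # Assumption: the query (e.g. ?source=...) is tracking data, safe to discard
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```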

## Designing the output: the latest articles in crisp 60-word summaries

### A second GPT-4o-mini call is needed to collate the information

In [23]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()  # scrape the landing page
    links = get_links(url)  # calls the OpenAI API
    print("Found links:", links)
    for link in links["links"]:  # for each link the model selected,
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()  # scrape its content and append it to 'result'
    return result
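
As written, one failing fetch (for example a network error raised by `requests`) stops the whole summary. A sketch of a more tolerant variant is below; `collect_details` is a hypothetical helper, and the fetcher is passed in as a function so the logic can be exercised without network access. In the notebook it could be called as `collect_details(url, get_links(url)["links"], lambda u: Website(u).get_contents())`.

```python
from typing import Callable


def collect_details(landing_url: str, links: list[dict],
                    fetch: Callable[[str], str]) -> str:
    """Concatenate page contents, noting (instead of raising on) pages that fail to load."""
    result = "Landing page:\n" + fetch(landing_url)
    for link in links:
        try:
            contents = fetch(link["url"])
        except Exception as exc:  # broad on purpose: any fetch error skips just this page
            result += f"\n\n{link['type']}\n[could not fetch {link['url']}: {exc}]\n"
            continue
        result += f"\n\n{link['type']}\n{contents}"
    return result
```

Keeping a short placeholder for the failed page, rather than silently dropping it, lets the summarizing model see that an article was expected but unavailable.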

In [24]:
print(get_all_details("https://tech.instacart.com/tagged/data-science"))

Found links: {'links': [{'type': 'article', 'url': 'https://tech.instacart.com/data-science-spotlight-cracking-the-sql-interview-at-instacart-llm-edition-52d04bde474c?source=---------0-----------------------'}]}
Landing page:
Webpage Title:
Data Science – tech-at-instacart
Webpage Contents:
Homepage
Open in app
Sign in
Get started
Eng
Data Science
Machine Learning
Careers
Tagged in
Data Science
tech-at-instacart
Instacart Engineering
More information
Followers
6.8K
Elsewhere
More, on Medium
Data Science
Monta Shen
in
tech-at-instacart
Jul 24, 2024
Data Science Spotlight: Cracking the SQL Interview at Instacart (LLM Edition)
Read more…
14
1 response
Benjamin Knight
in
tech-at-instacart
Aug 10, 2022
How Hundreds of Retailers Use the Instacart Partner Platform to Drive In-Store Efficiencies
Read more…
21
1 response
Lauren Svensson
in
tech-at-instacart
Jul 26, 2022
Building a Data-Driven company with Anahita Tafvizi, Instacart’s Vice President and Head of Data Science
Read more…
24
Xiaodin

In [None]:
system_prompt = "You are an assistant that analyzes the first three articles of a company blog site. \
Your job is to summarize each of the three articles in 60 words, giving a concise understanding of each article. \
Follow each summary with a Top Take for that article and a suggestion of an application where the article's knowledge \
can be used. Add relevant emoji to match the text."



### Building the user prompt for the blog summary

In [None]:
def get_blog_user_prompt(company_name, url):
    user_prompt = f"This blog belongs to: {company_name}\n"
    user_prompt += f"Here are the contents of its landing page and other relevant pages; use this information to build a short summary of top three articles in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
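
`user_prompt[:5_000]` keeps only the first 5,000 characters, so a long landing page can leave little or no room for the article text that follows it. One alternative is to give each scraped section an equal share of the budget. This is a sketch under that assumption; `budget_sections` is a hypothetical helper that would be applied to the separate page contents (landing page, article) before they are joined.

```python
SEPARATOR = "\n\n"


def budget_sections(sections: list[str], total_chars: int = 5_000) -> str:
    """Truncate each section to an equal share so the joined text fits total_chars."""
    if not sections:
        return ""
    # Reserve room for the separators so the joined result never exceeds the budget
    available = total_chars - len(SEPARATOR) * (len(sections) - 1)
    share = max(available // len(sections), 0)
    return SEPARATOR.join(section[:share] for section in sections)
```

The equal split is the simplest policy; unused budget from a short section is not redistributed here.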

In [None]:
get_blog_user_prompt("InstaCart", "https://tech.instacart.com/tagged/data-science")

Found links: {'links': [{'type': 'latest article', 'url': 'https://tech.instacart.com/data-science-spotlight-cracking-the-sql-interview-at-instacart-llm-edition-52d04bde474c?source=---------0-----------------------'}]}


'This blog belongs to: InstaCart\nHere are the contents of its landing page and other relevant pages; use this information to build a short summary of top three articles in markdown.\nLanding page:\nWebpage Title:\nData Science – tech-at-instacart\nWebpage Contents:\nHomepage\nOpen in app\nSign in\nGet started\nEng\nData Science\nMachine Learning\nCareers\nTagged in\nData Science\ntech-at-instacart\nInstacart Engineering\nMore information\nFollowers\n6.8K\nElsewhere\nMore, on Medium\nData Science\nMonta Shen\nin\ntech-at-instacart\nJul 24, 2024\nData Science Spotlight: Cracking the SQL Interview at Instacart (LLM\xa0Edition)\nRead more…\n14\n1 response\nBenjamin Knight\nin\ntech-at-instacart\nAug 10, 2022\nHow Hundreds of Retailers Use the Instacart Partner Platform to Drive In-Store Efficiencies\nRead more…\n21\n1 response\nLauren Svensson\nin\ntech-at-instacart\nJul 26, 2022\nBuilding a Data-Driven company with Anahita Tafvizi, Instacart’s Vice President and Head of Data\xa0Science\n

### Generating the final product

In [None]:
def create_blog_summary(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_blog_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [None]:
create_blog_summary("InstaCart", "https://tech.instacart.com/tagged/data-science")

Found links: {'links': [{'type': 'latest article', 'url': 'https://tech.instacart.com/data-science-spotlight-cracking-the-sql-interview-at-instacart-llm-edition-52d04bde474c?source=---------0-----------------------'}]}


## Article Summaries

### 1. Data Science Spotlight: Cracking the SQL Interview at Instacart (LLM Edition)
This article highlights the essential skills required for Data Scientists at Instacart, focusing on the SQL interview process. It discusses how candidates must translate business queries into SQL to derive insights and how the advent of Large Language Models (LLMs) can simplify the coding process during interviews. The text advocates for rethinking the interview methodology to better reflect real job experiences.

**Top Take:** Leveraging LLMs can streamline the SQL interview process.  
**Application:** Use LLMs during training sessions to improve programming skills and make coding more efficient. 💻

---

### 2. How Hundreds of Retailers Use the Instacart Partner Platform to Drive In-Store Efficiencies
This article explores how the Instacart Partner Platform assists retailers in enhancing operational efficiencies. It discusses various tools and features that drive in-store sales and optimize inventory management, ultimately improving customer experience. By analyzing real-time data, retailers can make informed decisions, leading to increased profitability and better business practices.

**Top Take:** The Instacart Platform enhances in-store efficiency and sales for retailers.  
**Application:** Retailers can assess their own platforms using Instacart as a blueprint for optimizing operations. 🏬

---

### 3. Building a Data-Driven Company with Anahita Tafvizi, Instacart’s Vice President and Head of Data Science
This article features an interview with Anahita Tafvizi, who shares insights on fostering a data-driven culture at Instacart. It emphasizes the importance of integrating data science into business strategy, encouraging collaboration between teams, and utilizing data analytics for decision-making. Tafvizi discusses real-life initiatives that demonstrate the impact of data-driven approaches on organizational growth.

**Top Take:** A strong data-driven culture enhances collaboration and decision-making effectiveness.  
**Application:** Businesses should invest in data analytics to foster better teamwork and informed strategies. 📊
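
The system prompt asks for 60-word summaries, but nothing checks whether the model complied. A rough check is sketched below, assuming (as in the output above) that each summary is the first paragraph after a `### ` heading; `summary_word_counts` is a hypothetical helper that counts whitespace-separated words.

```python
def summary_word_counts(markdown: str) -> dict[str, int]:
    """Map each '### ' heading to the word count of the first paragraph under it."""
    counts = {}
    current = None
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("### "):
            current = stripped[4:]
        elif current and stripped:
            counts[current] = len(stripped.split())
            current = None  # only the first non-empty line after a heading is counted
    return counts
```

Calling it on the `result` string inside `create_blog_summary` would show how far each summary drifts from the requested 60 words.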

### Adding a typewriter animation by streaming the response

In [None]:
def stream_blog_summary(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_blog_user_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
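        # Strip code fences the model may wrap around its reply; note this also removes the word "markdown" anywhere in the text
        response = response.replace("```","").replace("markdown", "")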
        update_display(Markdown(response), display_id=display_handle.display_id)
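
The `.replace("markdown", "")` call removes that word everywhere, including inside a summary that legitimately mentions Markdown. A narrower alternative is sketched below: it strips only a fence that wraps the whole reply. `strip_markdown_fence` is a hypothetical helper; it is safe to call on the partial text after every chunk, so one option is to keep the raw accumulated text in `response` and pass `Markdown(strip_markdown_fence(response))` to `update_display`.

```python
def strip_markdown_fence(text: str) -> str:
    """Remove a ```markdown (or bare ```) fence wrapping the whole reply, if present."""
    stripped = text.strip()
    if stripped.startswith("```"):
        first_newline = stripped.find("\n")
        # Drop the opening fence line, e.g. "```markdown"; while streaming it may be incomplete
        stripped = stripped[first_newline + 1:] if first_newline != -1 else ""
        if stripped.rstrip().endswith("```"):
            stripped = stripped.rstrip()[:-3]  # drop the closing fence
    return stripped.strip()
```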

In [None]:
stream_blog_summary("InstaCart", "https://tech.instacart.com/tagged/data-science")

Found links: {'links': [{'type': 'latest article', 'url': 'https://tech.instacart.com/data-science-spotlight-cracking-the-sql-interview-at-instacart-llm-edition-52d04bde474c'}]}


### Article 1: Data Science Spotlight: Cracking the SQL Interview at Instacart (LLM Edition)

This article discusses the evolving landscape of SQL interviews for Data Scientists at Instacart. It highlights the blend of analytical and coding skills required, explaining how candidates demonstrate their ability to translate business questions into SQL queries. The article also emphasizes the efficiency brought by Large Language Models (LLMs), which allow candidates to leverage natural language for coding tasks, improving interview outcomes.

**Top Take:** LLMs can significantly streamline the SQL interview process, allowing candidates to demonstrate their skills more effectively.  
**Application Suggestion:** Use LLMs to enhance recruitment processes, enabling candidates to showcase their abilities efficiently during technical interviews. 💻

---

### Article 2: How Hundreds of Retailers Use the Instacart Partner Platform to Drive In-Store Efficiencies

This article outlines the functionalities and benefits of the Instacart Partner Platform for retailers aiming to increase in-store efficiencies. By integrating with Instacart, retailers leverage tools that help in inventory management, customer insights, and improved customer experiences. The collaborative approach with Instacart transforms traditional retail practices, driving greater agility and responsiveness in grocery operations.

**Top Take:** Partnering with Instacart empowers retailers to enhance operational efficiency and customer engagement.  
**Application Suggestion:** Retailers can adopt partner platforms to optimize inventory management and refine customer service strategies. 🛒

---

### Article 3: Building a Data-Driven Company with Anahita Tafvizi, Instacart’s Vice President and Head of Data Science

In this conversation-based article, Anahita Tafvizi shares insights on fostering a data-driven culture at Instacart. She emphasizes the importance of data literacy across all levels of the company, advocating for effective data utilization in decision-making. The article also discusses strategies to develop analytical skills among team members, ensuring that data drives innovation and operational success.

**Top Take:** Cultivating a data-driven culture and literacy is essential for organizational growth and innovation.  
**Application Suggestion:** Companies should invest in training programs that enhance data literacy to improve decision-making at all levels. 📊