<h1>Nutritional Analysis of Convenience Food Options at Seven Eleven</h1>

by David Diaz Aguirre

<p>From onigiri to full bento boxes, Seven Eleven offers an incredibly large variety of food options at its more than 21,000 stores in Japan. This "convenience food" is a staple for those who live busy lives, and so Seven Eleven strives to provide options that are both convenient and tasty. While the customer certainly pays a premium for the convenience, for most, the prices are low enough to be affordable at least once in a while. However, how healthy are these food options really?</p>
<p>Gym-goers in particular are careful about the nutritional value of the foods they eat, and try to minimize the calories and fat they consume while maximizing their protein intake. Because fitness is a large time commitment that has to be balanced against work and other aspects of life, many people with this goal may turn to convenience food to supplement their diets.</p>
<p>In this analysis, I pull the nutritional information for Seven Eleven's food offerings to understand which options may appeal to health-conscious consumers. As a preliminary analysis it is deliberately simple: I compare the caloric content of each item with its protein content. Because gym-goers want more protein for fewer calories, I look for the items with the best ratio of protein to calories.</p>

<h2>Part 1: Scraping Nutritional Data from the Official Seven Eleven Website</h2>

<p>First, I will use the `requests` library in Python to download the individual product pages from Seven Eleven's website, and then I will use the `BeautifulSoup` library together with regular expressions to parse the HTML and extract the nutritional information we are looking for.</p>

<p>At this stage, due to time constraints, I will limit my scraping and analysis to the sandwiches available at Seven Eleven. The full inventory is very large, and analyzing every category would take considerably longer.</p>
<p>As an aside, I have confirmed that scraping the product pages is allowed by Seven Eleven under their guidelines at https://www.sej.co.jp/robots.txt</p>
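<p>A check like this can also be done in code with the standard library's `urllib.robotparser`. The sketch below is only an illustration: the helper name `is_allowed` and the rules in `SAMPLE_RULES` are made up for the example and are not the contents of Seven Eleven's actual robots.txt file.</p>

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made here
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration only; NOT the real sej.co.jp robots.txt
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

UA = "Mozilla/5.0 (Educational Data Project)"
print(is_allowed(SAMPLE_RULES, UA, "https://www.sej.co.jp/products/a/sandwich/"))
print(is_allowed(SAMPLE_RULES, UA, "https://www.sej.co.jp/private/page.html"))
```

<p>To test the live file instead, the same parser offers `set_url()` and `read()`, which download the rules before `can_fetch()` is called.</p>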

In [1]:
# Import the relevant libraries we will use for fetching and parsing the data
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
import re

In [2]:
# Define the base url of the Seven Eleven website
BASE_URL = "https://www.sej.co.jp"

# This is the base URL for the Sandwich category of Seven Eleven's inventory
CATEGORY_BASE = "https://www.sej.co.jp/products/a/sandwich/"

# Declare ourselves to the Seven Eleven server
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Educational Data Project)"
}

In [3]:
# I will gather all the product links into this list
product_links = []

# Loop through pages 1 to 9, which is how many pages of sandwiches there are on the website
for page in range(1, 10):

    # Build the paginated URL and print so I see the progress
    url = f"{CATEGORY_BASE}{page}/l15/"
    print(f"Scraping category page: {url}")

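    # Download the page; the timeout keeps a stalled connection from hanging the loop,
    # and raise_for_status() stops early on an HTTP error instead of parsing an error page
    res = requests.get(url, headers=HEADERS, timeout=10)
    res.raise_for_status()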

    # Parse the HTML so I can search it
    soup = BeautifulSoup(res.text, "html.parser")

    links_found_on_page = 0

    # Look through every <a> tag on the page
    for a in soup.select("a"):

        # Extract the href attribute
        href = a.get("href")

        # Only keep product links that follow the URL format on Seven Eleven's website
        if href and "/products/a/item/" in href:

            # Convert into a full URL
            full_url = BASE_URL + href

            # Add this product link to the list for scraping
            product_links.append(full_url)
            links_found_on_page += 1
    
    print(f"  Found {links_found_on_page} products on this page")

    # Stop scraping if no links are found on the page, in case the code does not work. Otherwise it will stop when it has scraped 9 pages
    if links_found_on_page == 0:
        break

    # Wait 1.5 seconds between iterations so as to not send all HTTP requests in quick succession
    time.sleep(1.5)

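# Deduplicate the links. The per-page counts printed by this loop sum to 266 (8 x 30 + 26),
# but only 133 links are unique, so the same product link evidently appears more than once per page.
# Note that set() does not preserve the original order of the links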
product_links = list(set(product_links))

# There are 133 sandwiches listed on Seven Eleven's website, so I'll know this worked if this prints 133
print(f"TOTAL PRODUCTS FOUND: {len(product_links)}")

Scraping category page: https://www.sej.co.jp/products/a/sandwich/1/l15/
  Found 30 products on this page
Scraping category page: https://www.sej.co.jp/products/a/sandwich/2/l15/
  Found 30 products on this page
Scraping category page: https://www.sej.co.jp/products/a/sandwich/3/l15/
  Found 30 products on this page
Scraping category page: https://www.sej.co.jp/products/a/sandwich/4/l15/
  Found 30 products on this page
Scraping category page: https://www.sej.co.jp/products/a/sandwich/5/l15/
  Found 30 products on this page
Scraping category page: https://www.sej.co.jp/products/a/sandwich/6/l15/
  Found 30 products on this page
Scraping category page: https://www.sej.co.jp/products/a/sandwich/7/l15/
  Found 30 products on this page
Scraping category page: https://www.sej.co.jp/products/a/sandwich/8/l15/
  Found 30 products on this page
Scraping category page: https://www.sej.co.jp/products/a/sandwich/9/l15/
  Found 26 products on this page
TOTAL PRODUCTS FOUND: 133
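
<p>The per-page counts above add up to 266 while 133 unique links remain, so deduplication is doing real work here. A possible variation on the link handling is sketched below: it resolves each `href` with `urllib.parse.urljoin` (which also copes with links that are already absolute) and removes repeats while keeping the order in which they were found. The helper name `normalize_links` and the sample `href` values are hypothetical.</p>

```python
from urllib.parse import urljoin

BASE_URL = "https://www.sej.co.jp"


def normalize_links(hrefs, base=BASE_URL):
    """Resolve product hrefs against base and drop repeats, keeping first-seen order."""
    absolute = [urljoin(base, h) for h in hrefs if h and "/products/a/item/" in h]
    # A dict preserves insertion order, so this deduplicates without shuffling
    return list(dict.fromkeys(absolute))


# Hypothetical hrefs: one item linked twice, one absolute URL, one non-product link
sample = [
    "/products/a/item/052537/",
    "/products/a/item/052537/",
    "https://www.sej.co.jp/products/a/item/053369/",
    "/products/a/sandwich/2/l15/",
    None,
]
print(normalize_links(sample))
```

<p>Keeping a stable order makes reruns easier to compare, since `list(set(...))` may return the links in a different order each time.</p>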


<p>Now that we have all the product pages for all 133 sandwiches that Seven Eleven offers, I will access each page and extract the nutritional data in order to build the dataset.</p>
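
<p>Before running the extraction against the live pages, the regular-expression approach can be tried offline on a single string. The sketch below is an assumption-laden variation, not the notebook's exact code: the sample string is made up to match the label-and-unit format the patterns expect (its numbers are the hot dog's values from the scraped results), the helper name `parse_nutrition` is hypothetical, and accepting a half-width colon as well as the full-width one is a defensive guess.</p>

```python
import re

# Japanese nutrient labels mapped to the column names used in this notebook
NUTRIENT_PATTERNS = {
    "calories": r"熱量[：:]\s*(\d+(?:\.\d+)?)\s*kcal",     # energy
    "protein_g": r"たんぱく質[：:]\s*(\d+(?:\.\d+)?)\s*g",  # protein
    "fat_g": r"脂質[：:]\s*(\d+(?:\.\d+)?)\s*g",            # fat
    "carbs_g": r"炭水化物[：:]\s*(\d+(?:\.\d+)?)\s*g",      # carbohydrate
}


def parse_nutrition(text):
    """Return a dict of floats for each nutrient, or None where no match is found."""
    if not text:
        return {column: None for column in NUTRIENT_PATTERNS}
    result = {}
    for column, pattern in NUTRIENT_PATTERNS.items():
        match = re.search(pattern, text)
        result[column] = float(match.group(1)) if match else None
    return result


# Made-up string in the assumed format; the real page text may contain more fields
sample_text = "熱量：271kcal、たんぱく質：9.5g、脂質：16.2g、炭水化物：22.3g"
print(parse_nutrition(sample_text))
```

<p>The stricter `\d+(?:\.\d+)?` group only matches well-formed numbers, so a stray period cannot reach `float()` and raise an error.</p>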

In [4]:
rows = []   # This will store each final product record

# This helper function will pull the nutritional values using regular expressions
def extract_value(pattern, text):
    if text is None:
        return None
    match = re.search(pattern, text)
    return float(match.group(1)) if match else None


# I will enumerate the product_links list in order to loop through it and use the index, starting from 1, to track the scraping progress
for i, url in enumerate(product_links, start=1):

    print(f"[{i}/{len(product_links)}] Scraping product: {url}")

    # Download the product page (the timeout prevents one stalled request from blocking the run)
    res = requests.get(url, headers=HEADERS, timeout=10)
    soup = BeautifulSoup(res.text, "html.parser")

    # Get the product name from the page header
    name_tag = soup.find("h1")
    product_name = name_tag.text.strip() if name_tag else None

    # The nutrition information is held in a single <td> in an HTML table for each product, so I will hold it here and append to rows later
    nutrition_text = None
    
    # Look for the row labeled 栄養成分 (nutritional information)
    for tr in soup.select("table tr"):
        th = tr.find("th")
        td = tr.find("td")
        
        if th and td and "栄養成分" in th.text:
            nutrition_text = td.text.strip()
            break

    # Extract each nutrient from the combined string. The regular expression `[\d\.]+` captures
    # a run of one or more digits and decimal points, which gives us the nutritional value.
    # Labels: 熱量 = energy, たんぱく質 = protein, 脂質 = fat, 炭水化物 = carbohydrate
    calories = extract_value(r"熱量：([\d\.]+)kcal", nutrition_text)
    protein  = extract_value(r"たんぱく質：([\d\.]+)g", nutrition_text)
    fat      = extract_value(r"脂質：([\d\.]+)g", nutrition_text)
    carbs    = extract_value(r"炭水化物：([\d\.]+)g", nutrition_text)

    # I will extract the prices too. The focus is on nutritional value, but prices may be useful later.
    # The Seven Eleven website keeps the price in a div with class="item_price", inside a nested <p>, in yen.
    price_tag = soup.select_one(".item_price p")
    price = price_tag.text.strip() if price_tag else None

    # Append everything in a structured format, creating a list of dictionaries that we can turn into a DataFrame
    rows.append({
        "store": "7-Eleven",
        "product_name": product_name,
        "calories": calories,
        "protein_g": protein,
        "fat_g": fat,
        "carbs_g": carbs,
        "price_yen": price,
        "url": url
    })

    # Small delay so as to not send too many requests at once
    time.sleep(1.5)

[1/133] Scraping product: https://www.sej.co.jp/products/a/item/052537/
[2/133] Scraping product: https://www.sej.co.jp/products/a/item/053369/
[3/133] Scraping product: https://www.sej.co.jp/products/a/item/052858/
[4/133] Scraping product: https://www.sej.co.jp/products/a/item/053326/
[5/133] Scraping product: https://www.sej.co.jp/products/a/item/053327/
[6/133] Scraping product: https://www.sej.co.jp/products/a/item/052956/
[7/133] Scraping product: https://www.sej.co.jp/products/a/item/053083/
[8/133] Scraping product: https://www.sej.co.jp/products/a/item/052552/
[9/133] Scraping product: https://www.sej.co.jp/products/a/item/052923/
[10/133] Scraping product: https://www.sej.co.jp/products/a/item/053288/
[11/133] Scraping product: https://www.sej.co.jp/products/a/item/053338/
[12/133] Scraping product: https://www.sej.co.jp/products/a/item/053265/
[13/133] Scraping product: https://www.sej.co.jp/products/a/item/052875/
[14/133] Scraping product: https://www.sej.co.jp/products/a/

In [5]:
# Checking that everything got extracted correctly
rows

[{'store': '7-Eleven',
  'product_name': 'ホットドッグ',
  'calories': 271.0,
  'protein_g': 9.5,
  'fat_g': 16.2,
  'carbs_g': 22.3,
  'price_yen': '180円（税込194.40円）',
  'url': 'https://www.sej.co.jp/products/a/item/052537/'},
 {'store': '7-Eleven',
  'product_name': '福島県産あかつきもものホイップサンド',
  'calories': 595.0,
  'protein_g': 6.0,
  'fat_g': 41.6,
  'carbs_g': 49.7,
  'price_yen': '360円（税込388.80円）',
  'url': 'https://www.sej.co.jp/products/a/item/053369/'},
 {'store': '7-Eleven',
  'product_name': 'たれチキンカツサンド',
  'calories': 412.0,
  'protein_g': 18.1,
  'fat_g': 20.1,
  'carbs_g': 40.5,
  'price_yen': '380円（税込410.40円）',
  'url': 'https://www.sej.co.jp/products/a/item/052858/'},
 {'store': '7-Eleven',
  'product_name': 'てりやきソースのチキンバーガー',
  'calories': 411.0,
  'protein_g': 26.4,
  'fat_g': 19.5,
  'carbs_g': 33.5,
  'price_yen': '388円（税込419.04円）',
  'url': 'https://www.sej.co.jp/products/a/item/053326/'},
 {'store': '7-Eleven',
  'product_name': 'てりやきソースのチキンバーガー',
  'calories': 411.0,
  'prote

<p>Now that I have scraped all the pages and extracted the necessary information from them, I can turn it all into a pandas DataFrame for cleaning and analysis.</p>

In [6]:
df = pd.DataFrame(rows)
df.head()

index,store,product_name,calories,protein_g,fat_g,carbs_g,price_yen,url
0,7-Eleven,ホットドッグ,271.0,9.5,16.2,22.3,180円（税込194.40円）,https://www.sej.co.jp/products/a/item/052537/
1,7-Eleven,福島県産あかつきもものホイップサンド,595.0,6.0,41.6,49.7,360円（税込388.80円）,https://www.sej.co.jp/products/a/item/053369/
2,7-Eleven,たれチキンカツサンド,412.0,18.1,20.1,40.5,380円（税込410.40円）,https://www.sej.co.jp/products/a/item/052858/
3,7-Eleven,てりやきソースのチキンバーガー,411.0,26.4,19.5,33.5,388円（税込419.04円）,https://www.sej.co.jp/products/a/item/053326/
4,7-Eleven,てりやきソースのチキンバーガー,411.0,26.4,19.5,33.5,388円（税込419.04円）,https://www.sej.co.jp/products/a/item/053327/


<h2>Part 2: Cleaning and Gaining Insights from the Data</h2>

In [7]:
# First I'll check the data types. I'm going to need to clean the prices for sure
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 133 entries, 0 to 132
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   store         133 non-null    object 
 1   product_name  133 non-null    object 
 2   calories      133 non-null    float64
 3   protein_g     133 non-null    float64
 4   fat_g         133 non-null    float64
 5   carbs_g       133 non-null    float64
 6   price_yen     133 non-null    object 
 7   url           133 non-null    object 
dtypes: float64(4), object(4)
memory usage: 8.4+ KB


In [8]:
# I'll turn the prices into floats by only keeping the pre-tax price
df["price_yen"] = df["price_yen"].str.extract(r"^(\d+)円", expand=False).astype(float)
df["price_yen"].dtype

dtype('float64')
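
The same price strings also carry the tax-included amount, which the step above discards. As a rough sketch of how both numbers could be kept, the hypothetical helper `parse_price` below reads the pre-tax and tax-included prices from one string. Handling thousands separators such as `1,080円` is an assumption on my part, since no price in the sandwich data shown here is that high.

```python
import re

# Pre-tax price, then the tax-included price inside full-width parentheses
PRICE_PATTERN = re.compile(r"^([\d,]+)円（税込([\d,.]+)円）")


def parse_price(text):
    """Return (pre_tax, tax_included) in yen, or (None, None) if the text does not match."""
    match = PRICE_PATTERN.search(text or "")
    if not match:
        return (None, None)
    # Remove any thousands separators before converting to float
    pre_tax, tax_included = (float(g.replace(",", "")) for g in match.groups())
    return (pre_tax, tax_included)


print(parse_price("180円（税込194.40円）"))      # format taken from the scraped rows
print(parse_price("1,080円（税込1,166.40円）"))  # hypothetical price with a comma
```

Applied to the raw strings before they are converted to floats, this would allow the analysis to be repeated with the prices customers actually pay.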

Now, let's compare the sandwiches to find which have the best **protein-to-calorie** and **protein-to-price** ratios.

In [9]:
# Protein efficiency
df["protein_per_100kcal"] = df["protein_g"] / df["calories"] * 100

# Protein per yen
df["protein_per_yen"] = df["protein_g"] / df["price_yen"]

display(df.sort_values("protein_per_100kcal", ascending=False).head(5))
display(df.sort_values("protein_per_yen", ascending=False).head(5))

index,store,product_name,calories,protein_g,fat_g,carbs_g,price_yen,url,protein_per_100kcal,protein_per_yen
132,7-Eleven,たんぱく質が摂れるチキンロール,245.0,28.0,5.4,22.7,390.0,https://www.sej.co.jp/products/a/item/052451/,11.428571,0.071795
55,7-Eleven,たんぱく質が摂れるチキンロール,245.0,28.0,5.4,22.7,390.0,https://www.sej.co.jp/products/a/item/053254/,11.428571,0.071795
89,7-Eleven,たんぱく質が摂れるチキン＆チリ,280.0,25.8,9.6,24.8,400.0,https://www.sej.co.jp/products/a/item/053343/,9.214286,0.0645
63,7-Eleven,スタミナ源たれソースのチキンバーガー,372.0,26.4,16.9,29.4,380.0,https://www.sej.co.jp/products/a/item/053291/,7.096774,0.069474
105,7-Eleven,うま塩ペッパーチキンバーガー,391.0,26.6,20.1,26.8,350.0,https://www.sej.co.jp/products/a/item/053360/,6.803069,0.076


index,store,product_name,calories,protein_g,fat_g,carbs_g,price_yen,url,protein_per_100kcal,protein_per_yen
105,7-Eleven,うま塩ペッパーチキンバーガー,391.0,26.6,20.1,26.8,350.0,https://www.sej.co.jp/products/a/item/053360/,6.803069,0.076
23,7-Eleven,トーストサンドハム＆チーズ,335.0,17.3,11.2,42.1,238.0,https://www.sej.co.jp/products/a/item/051160/,5.164179,0.072689
123,7-Eleven,ブリトーチーズ倍盛り　ハム＆チーズ,441.0,28.0,23.4,31.1,390.0,https://www.sej.co.jp/products/a/item/053054/,6.349206,0.071795
55,7-Eleven,たんぱく質が摂れるチキンロール,245.0,28.0,5.4,22.7,390.0,https://www.sej.co.jp/products/a/item/053254/,11.428571,0.071795
132,7-Eleven,たんぱく質が摂れるチキンロール,245.0,28.0,5.4,22.7,390.0,https://www.sej.co.jp/products/a/item/052451/,11.428571,0.071795


This reveals some interesting facts about our dataset. Firstly, there are clearly duplicates among the sandwiches scraped from the Seven Eleven website. In the top table, the first two rows are the same product with the same values and only the URLs differ. This may be a mistake on Seven Eleven's part, or the products may be listed separately for each region in which they are sold. In any case, removing exact duplicates would make the dataset more precise.
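
One way to do that cleaning is sketched below, working on the list of dictionaries built during scraping. It treats two records as duplicates when every field except the URL matches. The helper name `drop_exact_duplicates` is hypothetical, and keeping the first occurrence is an arbitrary choice, since it is not known which listing is the "right" one.

```python
def drop_exact_duplicates(rows, ignore=("url",)):
    """Keep the first record for each distinct combination of non-ignored fields."""
    seen = set()
    unique = []
    for row in rows:
        # Build a hashable key from every field except the ignored ones
        key = tuple((k, v) for k, v in sorted(row.items()) if k not in ignore)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Three records taken from the tables above (fields trimmed for brevity)
sample_rows = [
    {"product_name": "たんぱく質が摂れるチキンロール", "calories": 245.0, "protein_g": 28.0,
     "price_yen": 390.0, "url": "https://www.sej.co.jp/products/a/item/052451/"},
    {"product_name": "たんぱく質が摂れるチキンロール", "calories": 245.0, "protein_g": 28.0,
     "price_yen": 390.0, "url": "https://www.sej.co.jp/products/a/item/053254/"},
    {"product_name": "たんぱく質が摂れるチキン＆チリ", "calories": 280.0, "protein_g": 25.8,
     "price_yen": 400.0, "url": "https://www.sej.co.jp/products/a/item/053343/"},
]
print(len(drop_exact_duplicates(sample_rows)))
```

On the DataFrame itself, `df.drop_duplicates(subset=[...])` with the non-URL columns listed in `subset` would be the pandas equivalent.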

As the first table shows, the **most calorie-efficient sandwich at Seven Eleven is the Chicken Roll** (たんぱく質が摂れるチキンロール) with **11.4 grams of protein per 100 calories**. At 245 calories for 28 grams of protein, it is an incredibly strong item, although that also seems to drive the price up to 390 yen.

If we look at the second table, we can see that **the most cost-effective item is the Salt and Pepper Chicken Burger** (うま塩ペッパーチキンバーガー). This is due to its good ratio of **26.6 grams of protein for 350 yen**, which is only 1.4 grams of protein less than the Chicken Roll **but 40 yen cheaper**. However, the burger has considerably more calories (391 kcal vs 245 kcal) and fat (20.1 g vs 5.4 g) than the Chicken Roll, so the extra 40 yen may be worth spending for the health-conscious consumer.
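
The trade-off between these two items can be reproduced from their raw values. The short sketch below recomputes the two ratios used in this analysis and adds one extra figure, yen per gram of protein, which is not part of the DataFrame and is included only as an illustration. The helper name `protein_metrics` is hypothetical.

```python
def protein_metrics(protein_g, calories, price_yen):
    """Compute the two ratios used above plus yen per gram of protein."""
    return {
        "protein_per_100kcal": protein_g / calories * 100,
        "protein_per_yen": protein_g / price_yen,
        "yen_per_g_protein": price_yen / protein_g,
    }


# Values taken from the result tables above (pre-tax prices)
chicken_roll = protein_metrics(protein_g=28.0, calories=245.0, price_yen=390.0)
chicken_burger = protein_metrics(protein_g=26.6, calories=391.0, price_yen=350.0)

for name, metrics in (
    ("Chicken Roll", chicken_roll),
    ("Salt and Pepper Chicken Burger", chicken_burger),
):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Seen this way, the burger costs roughly 13.2 yen per gram of protein against roughly 13.9 yen for the Chicken Roll, a small saving compared with the gap in calories per gram of protein.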

I put this small project together in about 3 hours of research, coding, and writing. It is far from what it could be, and I intend to expand on it with time and knowledge, but for now I hope that this is a good display of my foundational skills in data analysis. Thank you for reading.