# Exploratory Analysis of Ingredients
Ingredients are the core of sweetgreen's business. Analyzing them can yield clues about the complexity that takes place farther up sweetgreen's value chain, inside the restaurant.

# Business Questions
* Why are some of the ingredients present but deactivated, whereas other ingredients are missing entirely?
* How is the decision to remove an ingredient made (for example, true blue crab)?
* Given that 63 ingredients need to be available, how are they all prepared fresh every day?

# Technical Questions
* The `description` and `source_location` fields are not populated (always null) in the JSON. Why are they included at all?
* How are design decisions evaluated for technical components?
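One way to check the first technical question is to look for fields that are null in every record. The sketch below assumes the compiled JSON loads as a collection of ingredient dicts like the two previewed further down; `always_null_fields` is a hypothetical helper, and the sample records are copied from that preview with the list-valued ID fields left out.

```python
from typing import Any


def always_null_fields(records: list[dict[str, Any]]) -> list[str]:
    """Return the keys whose value is None in every record (hypothetical helper)."""
    if not records:
        return []
    # Take candidate keys from the first record, keep those that are None everywhere
    candidates = list(records[0].keys())
    return [key for key in candidates if all(record.get(key) is None for record in records)]


# Two records copied from the ingredient preview (list-valued ID fields omitted)
sample = [
    {"id": 8, "name": "shredded kale", "short": "kale", "description": None,
     "source_location": None, "harvest_date": None, "kind": "bases", "active": True},
    {"id": 53, "name": "cilantro", "short": None, "description": None,
     "source_location": None, "harvest_date": None, "kind": "toppings", "active": True},
]

print(always_null_fields(sample))
```

On the full file, `always_null_fields(list(ingredients.values()))` would run the same check over every record; with a DataFrame, `df.columns[df.isna().all()]` is the pandas equivalent.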

# Notes
* 83 ingredients are present
* 63 ingredients are available to order

```python
import sweetgreen as sg
import pandas as pd
import matplotlib.pyplot as plt
```

```python
ingredients = sg.utils.read_json("../data/cleaned/compiled_ingredients.json")
list(ingredients.values())[:2]
```

```
[{'id': 8,
  'name': 'shredded kale',
  'short': 'kale',
  'description': None,
  'source_location': None,
  'harvest_date': None,
  'kind': 'bases',
  'active': True,
  'farm_ids': [4],
  'property_ids': [4669, 4670],
  'asset_ids': [362]},
 {'id': 53,
  'name': 'cilantro',
  'short': None,
  'description': None,
  'source_location': None,
  'harvest_date': None,
  'kind': 'toppings',
  'active': True,
  'farm_ids': [],
  'property_ids': [4679, 4680],
  'asset_ids': [287]}]
```

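```python
# Drop the list-valued ID fields so each ingredient becomes one flat row.
# Build new dicts instead of calling .pop(), so `ingredients` is not mutated.
list_fields = {"property_ids", "asset_ids", "farm_ids"}
flat_ingredients = [
    {key: value for key, value in ingredient.items() if key not in list_fields}
    for ingredient in ingredients.values()
]

df = pd.DataFrame(flat_ingredients)
# Put the most useful columns first and sort alphabetically by name
df = df[["name", "active", "description", "harvest_date", "id", "kind", "short", "source_location"]].sort_values("name")
df.head()
```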


| | name | active | description | harvest_date | id | kind | short | source_location |
|---|---|---|---|---|---|---|---|---|
| 58 | _ | False | | | 151 | | | |
| 17 | apples | True | | | 170 | toppings | | |
| 52 | avocado | True | | | 27 | premiums | | |
| 20 | balsamic roasted squash | True | | | 265 | premiums | balsamic squash | |
| 19 | balsamic vinaigrette | True | | | 10 | dressings | | |
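The same flat frame can also be built in one step: construct the DataFrame from all record dicts, then drop the list-valued columns with `DataFrame.drop`. This is a sketch assuming `ingredients` is a dict whose values are records shaped like the two previewed earlier; `build_ingredient_frame` is a hypothetical helper, and the dict keys in the example are made up. The input dict is left unchanged.

```python
import pandas as pd

# Columns holding lists of related IDs; they do not fit a one-row-per-ingredient table
LIST_FIELDS = ["property_ids", "asset_ids", "farm_ids"]
COLUMNS = ["name", "active", "description", "harvest_date", "id", "kind", "short", "source_location"]


def build_ingredient_frame(ingredients: dict) -> pd.DataFrame:
    """Flatten a dict of ingredient records into a DataFrame sorted by name (hypothetical helper)."""
    # One row per record, with a 0..n-1 index in the dict's iteration order
    frame = pd.DataFrame(list(ingredients.values()))
    # errors="ignore" skips any list column that is absent from every record
    frame = frame.drop(columns=LIST_FIELDS, errors="ignore")
    return frame[COLUMNS].sort_values("name")


# Two records copied from the preview; the keys "8" and "53" are hypothetical
sample = {
    "8": {"id": 8, "name": "shredded kale", "short": "kale", "description": None,
          "source_location": None, "harvest_date": None, "kind": "bases", "active": True,
          "farm_ids": [4], "property_ids": [4669, 4670], "asset_ids": [362]},
    "53": {"id": 53, "name": "cilantro", "short": None, "description": None,
           "source_location": None, "harvest_date": None, "kind": "toppings", "active": True,
           "farm_ids": [], "property_ids": [4679, 4680], "asset_ids": [287]},
}

flat = build_ingredient_frame(sample)
print(flat[["name", "kind", "active"]])
```

Dropping columns after construction keeps the original records intact, which matters if a later cell wants `farm_ids` again.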


```python
df.shape
```

```
(83, 8)
```

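```python
# Temporarily lift pandas' row and column limits so the whole frame is shown
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(df)
```

| | name | active | description | harvest_date | id | kind | short | source_location |
|---|---|---|---|---|---|---|---|---|
| 58 | _ | False | | | 151 | | | |
| 17 | apples | True | | | 170 | toppings | | |
| 52 | avocado | True | | | 27 | premiums | | |
| 20 | balsamic roasted squash | True | | | 265 | premiums | balsamic squash | |
| 19 | balsamic vinaigrette | True | | | 10 | dressings | | |
| 11 | basil | True | | | 49 | toppings | | |
| 8 | blackened chicken thighs | True | | | 259 | premiums | | |
| 5 | blue cheese | True | | | 127 | premiums | | |
| 77 | burrata | False | | | 258 | premiums | | |
| 3 | caesar dressing | True | | | 13 | dressings | caesar | |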


### Inactive Ingredients
Find the ingredients that are present in the data but deactivated (`active` is `False`).

```python
# Keep only rows whose `active` flag is False
df[~df["active"]]
```

| | name | active | description | harvest_date | id | kind | short | source_location |
|---|---|---|---|---|---|---|---|---|
| 58 | _ | False | | | 151 | | | |
| 77 | burrata | False | | | 258 | premiums | | |
| 67 | carrot chili vinaigrette | False | | | 14 | dressings | | |
| 81 | charred tomato vinaigrette | False | | | 257 | dressings | | |
| 76 | citrus shrimp | False | | | 31 | premiums | shrimp | |
| 78 | green goddess ranch dressing | False | | | 253 | dressings | green goddess dressing | |
| 65 | heirloom tomatoes | False | | | 101 | premiums | | |
| 70 | nori furikake | False | | | 93 | dressings | | |
| 73 | organic chickpeas | False | | | 56 | toppings | chickpeas | |
| 72 | organic white cheddar | False | | | 38 | premiums | white cheddar | |
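To see which kinds of ingredient tend to be deactivated, a cross-tabulation of `kind` against `active` is one option. This sketch uses `pd.crosstab` on a small frame built from rows in the tables above; the real `df` could be passed in the same way. `deactivation_by_kind` is a hypothetical helper, and labelling a missing `kind` as `"(none)"` is an assumption made for the `_` placeholder row.

```python
import pandas as pd


def deactivation_by_kind(frame: pd.DataFrame) -> pd.DataFrame:
    """Count active and inactive ingredients per kind (hypothetical helper)."""
    # Give rows without a kind (such as the "_" placeholder) an explicit label
    kinds = frame["kind"].fillna("(none)")
    # Turn the boolean flag into readable column labels
    status = frame["active"].map({True: "active", False: "inactive"})
    # Missing kind/status combinations are filled with 0
    return pd.crosstab(kinds, status)


# A few rows copied from the tables above
sample_df = pd.DataFrame(
    {
        "name": ["_", "burrata", "carrot chili vinaigrette", "citrus shrimp",
                 "organic chickpeas", "apples", "avocado", "balsamic vinaigrette"],
        "kind": [None, "premiums", "dressings", "premiums",
                 "toppings", "toppings", "premiums", "dressings"],
        "active": [False, False, False, False, False, True, True, True],
    }
)

table = deactivation_by_kind(sample_df)
print(table)
```

Calling `deactivation_by_kind(df)` on the full frame would show whether removals cluster in particular kinds, which bears on the business question about deactivated ingredients.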


### Active Ingredients
There are 63 orderable ingredients. Note that not all of them correspond to a single stock keeping unit (SKU): some are composed of several raw ingredients, for example curry yogurt dressing or caramelized onions + leeks.
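The data shown here does not record which orderable items are composites. As a rough, purely heuristic sketch, names joined with `+` or ending in a dressing-style word could be flagged for manual review. The keyword list and the `looks_composite` helper are hypothetical assumptions, not something the source defines, and the heuristic will miss composites such as prepared proteins.

```python
# Hypothetical suffixes suggesting a prepared item made from several raw ingredients
COMPOSITE_SUFFIXES = ("dressing", "vinaigrette")


def looks_composite(name: str) -> bool:
    """Heuristically flag names that probably combine several raw ingredients."""
    normalized = name.strip().lower()
    # Names like "caramelized onions + leeks" join components with a plus sign
    if "+" in normalized:
        return True
    # str.endswith accepts a tuple and matches any of its suffixes
    return normalized.endswith(COMPOSITE_SUFFIXES)


for name in ["caramelized onions + leeks", "curry yogurt dressing", "avocado"]:
    print(f"{name}: {looks_composite(name)}")
```

Applied to the frame, `df.loc[df["active"] & df["name"].map(looks_composite), "name"]` would list the active candidates.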

```python
df[df["active"]].shape
```

```
(63, 8)
```

```python
# Show every active ingredient without pandas' row and column truncation
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(df[df["active"]])
```

| | name | active | description | harvest_date | id | kind | short | source_location |
|---|---|---|---|---|---|---|---|---|
| 17 | apples | True | | | 170 | toppings | | |
| 52 | avocado | True | | | 27 | premiums | | |
| 20 | balsamic roasted squash | True | | | 265 | premiums | balsamic squash | |
| 19 | balsamic vinaigrette | True | | | 10 | dressings | | |
| 11 | basil | True | | | 49 | toppings | | |
| 8 | blackened chicken thighs | True | | | 259 | premiums | | |
| 5 | blue cheese | True | | | 127 | premiums | | |
| 3 | caesar dressing | True | | | 13 | dressings | caesar | |
| 25 | caramelized onions + leeks | True | | | 266 | toppings | | |
| 6 | chopped romaine | True | | | 3 | bases | romaine | |


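```python
# Summary statistics for the numeric `id` column, including its min and max
df.describe()
```

| | id |
|---|---|
| count | 83.0 |
| mean | 110.060241 |
| std | 88.212928 |
| min | 3.0 |
| 25% | 36.5 |
| 50% | 73.0 |
| 75% | 170.5 |
| max | 268.0 |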
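The IDs run from 3 to 268, yet only 83 ingredients are present. Assuming IDs are unique integers, that range could hold 268 - 3 + 1 = 266 IDs, so roughly 183 numbers in it are unused, which may relate to the business question about ingredients that are missing entirely. A small sketch for listing the gaps follows; `missing_ids` is a hypothetical helper.

```python
def missing_ids(ids: list[int]) -> list[int]:
    """Return the integers between min(ids) and max(ids) that do not appear in ids."""
    if not ids:
        return []
    present = set(ids)
    # Walk the full closed range and keep the numbers that are absent
    return [i for i in range(min(present), max(present) + 1) if i not in present]


# Example with a few IDs that appear in the tables above
print(missing_ids([3, 8, 10, 13]))
```

On the full frame, `missing_ids(df["id"].tolist())` would list every unused ID in the observed range.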
