# 2021: Week 7 - Vegan Shopping List

February 17, 2021

Challenge by: Jonathan Allenby

Now that Veganuary has come and gone, we thought it would be interesting to look at some common supermarket products and use Prep to work out whether they are vegan. Some results may surprise you!

For the sake of this analysis, we're treating bee by-products (beeswax, honey, etc.) as non-vegan.

## Inputs

1. A shopping list of products and their ingredients (or allergens when ingredients were not available). I have a child-like palate, so it's mostly full of sweet treats, some of which you'd expect to be vegan and some of which you wouldn't. Everything is commonly found in UK supermarkets, so no specialist shops are required.

<img src='https://1.bp.blogspot.com/-BfYMo--X7R0/X_RxbJVxzVI/AAAAAAAABDs/yNFOPkii-M8oy4jHtebRFhOt_zQJmJWWwCLcBGAsYHQ/s1059/01%2BShoppingListInput.jpg'>

2. Two lists of common non-vegan ingredients and E numbers (source)

<img src='https://1.bp.blogspot.com/-OKXkRvjBrUQ/X_RxbNpC42I/AAAAAAAABDo/dljP44u8uyMkN_5q4Kw8zUSX_zayCuGQACPcBGAYYCw/w640-h90/02%2BNonVeganIngredients.jpg'>

## Requirements

- Input the data
- Prepare the keyword data
    - Add an 'E' in front of every E number.
    - Stack Animal Ingredients and E Numbers on top of each other.
    - Get every ingredient and E number onto separate rows.
- Append the keywords onto the product list.
- Check whether each product contains any non-vegan ingredients.
- Prepare a final shopping list of vegan products.
    - Aggregate the products into vegan and non-vegan.
    - Filter out the non-vegan products.
- Prepare a list explaining why the other products aren't vegan.
    - Keep only non-vegan products.
    - Duplicate the keyword field.
    - Pivot the keywords from rows to columns, using the duplicate as the header.
    - Write a calculation to concatenate all the keywords into a single comma-separated list for each product, e.g. "whey, milk, egg".
- Output the data.
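
The final "concatenate the keywords" requirement can be sketched in pandas, the library used in the notebook cells further down. This is a minimal illustration on made-up data, not the challenge's Prep flow: the product names and the `matches` and `contains` variables are assumptions for the example.

```python
import pandas as pd

# Made-up matches: one row per (product, matched keyword) pair.
matches = pd.DataFrame({
    'Product': ['Crisps A', 'Crisps A', 'Doughnut B'],
    'Keywords': ['milk', 'whey', 'egg'],
})

# Join each product's keywords into a single comma-separated string.
contains = (
    matches.groupby('Product', sort=False)['Keywords']
    .agg(', '.join)
    .reset_index(name='Contains')
)
print(contains)
```

Grouping and joining in one step avoids the duplicate-and-pivot detour, which Prep needs because it has no direct string aggregation across rows.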

## Outputs

- Vegan Shopping List
    - Product
    - Description
    - 20 rows (21 including headers)

<img src='https://1.bp.blogspot.com/-1WRcHfNPDAc/X_SEXlXvagI/AAAAAAAABD8/VmGKs9tDP1s713EKO1jYSO_yr3Kz-7_YwCLcBGAsYHQ/s834/03%2BVeganOutput.jpg'>

- Non Vegan List
    - Product
    - Description
    - Contains
    - 19 rows (20 including headers)

<img src='https://1.bp.blogspot.com/-n90BmuM-xgQ/X_SEe5KydyI/AAAAAAAABEA/uQpiNeQNhCoOT5ntFPgzDZzdbcHQDT8uQCLcBGAsYHQ/s807/04%2BNonVeganOutput.jpg'>


In [34]:
import pandas as pd

In [35]:
# Named input_file rather than input to avoid shadowing the built-in input().
input_file = 'Shopping List and Ingredients.xlsx'
excel_sheets = pd.ExcelFile(input_file).sheet_names
print(excel_sheets)

['Shopping List', 'Keywords']


In [36]:
df1 = pd.read_excel(input_file, sheet_name='Shopping List')
# Lower-case the ingredients so keyword matching is case-insensitive.
df1['Ingredients/Allergens'] = df1['Ingredients/Allergens'].str.lower()
print(df1.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39 entries, 0 to 38
Data columns (total 3 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   Product                39 non-null     object
 1   Description            39 non-null     object
 2   Ingredients/Allergens  39 non-null     object
dtypes: object(3)
memory usage: 1.0+ KB
None


In [37]:
df2 = pd.read_excel(input_file, sheet_name='Keywords')
print(df2.head(5))
print(df2.info())

                                  Animal Ingredients  \
0  Milk, Whey, Honey, Egg, Lactose, Collagen, Ela...   

                                          E Numbers  
0  120, 441, 545, 901, 904, 910, 920, 921, 913, 966  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Animal Ingredients  1 non-null      object
 1   E Numbers           1 non-null      object
dtypes: object(2)
memory usage: 144.0+ bytes
None


In [38]:
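# Add an 'E' in front of every E number.
# regex=True is explicit: pandas 2.0+ treats the pattern as a literal string by default.
df2['E Numbers'] = df2['E Numbers'].str.replace(r'(\d+)', r'E\1', regex=True)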


In [39]:
# Stack Animal Ingredients and E Numbers on top of each other.
df2 = df2.melt(id_vars=None, value_vars=['Animal Ingredients', 'E Numbers'], var_name='pivot_name', value_name='pivot_value')
print(df2)

           pivot_name                                        pivot_value
0  Animal Ingredients  Milk, Whey, Honey, Egg, Lactose, Collagen, Ela...
1           E Numbers  E120, E441, E545, E901, E904, E910, E920, E921...


In [40]:
# Get every ingredient and E number onto separate rows.
# split the pivot_value column into multiple columns based on the comma separator
df_split = df2['pivot_value'].str.split(', ', expand=True)

df_split = df_split.melt(id_vars=None, value_vars=df_split.columns, var_name='pivot_name' , value_name='Keywords')
df_split.drop(columns='pivot_name', inplace=True)

# Remove null value
df_split = df_split.dropna()
df_split['Keywords'] = df_split['Keywords'].str.lower()
print(df_split)

     Keywords
0        milk
1        e120
2        whey
3        e441
4       honey
5        e545
6         egg
7        e901
8     lactose
9        e904
10   collagen
11       e910
12    elastin
13       e920
14    keratin
15       e921
16   gelatine
17       e913
18    gelatin
19       e966
20     pepsin
22  isinglass
24    shellac
26       lard
28      aspic
30    beeswax
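
The gaps in the index above (22, 24, ...) are left over from dropping the nulls created by the column-wise split. As an alternative sketch, the three keyword preparation steps can be chained with `explode`, which puts each list element on its own row without creating nulls. The `keywords_raw` frame below is a shortened stand-in for the Keywords sheet, not the real data.

```python
import pandas as pd

# Shortened stand-in for the Keywords sheet: one row, two comma-separated cells.
keywords_raw = pd.DataFrame({
    'Animal Ingredients': ['Milk, Whey, Honey'],
    'E Numbers': ['120, 441'],
})

# Prefix every number with 'E' (explicit regex=True, see the cell above).
keywords_raw['E Numbers'] = keywords_raw['E Numbers'].str.replace(
    r'(\d+)', r'E\1', regex=True
)

keywords = (
    keywords_raw.melt(value_name='Keywords')['Keywords']  # stack the two columns
    .str.split(', ')         # each cell becomes a list of keywords
    .explode()               # one keyword per row
    .str.lower()
    .reset_index(drop=True)  # clean 0..n-1 index
    .to_frame()
)
print(keywords)
```

The row order differs from the notebook's output (all ingredients first, then all E numbers, rather than interleaved), which does not matter for the later matching step.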


In [41]:
# Append the keywords onto the product list.
df_split['dummy_id'] = 1
df1['dummy_id'] = 1
append_df = pd.merge(left=df_split, right=df1, on='dummy_id', how='inner')
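
The dummy-key merge above is a cross join: every keyword is paired with every product. Assuming pandas 1.2 or later, the same result can be requested directly with `how='cross'`, with no helper column. The two small frames below are made up for the sketch.

```python
import pandas as pd

# Made-up stand-ins for the keyword list and the shopping list.
keywords = pd.DataFrame({'Keywords': ['milk', 'egg']})
products = pd.DataFrame({
    'Product': ['Product A', 'Product B', 'Product C'],
    'Ingredients/Allergens': [
        'wheat flour, milk',
        'potatoes, salt',
        'sugar, egg white',
    ],
})

# Pair every keyword with every product; no dummy_id column is needed.
pairs = keywords.merge(products, how='cross')
print(pairs.shape)  # 2 keywords x 3 products
```

With the real data this would give the same 26 x 39 = 1014 rows, minus the `dummy_id` column.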


In [42]:
# Check whether each product contains any non-vegan ingredients.
append_df['Contains Ingredient'] = append_df.apply(lambda row: 1 if row['Keywords'] in row['Ingredients/Allergens'] else 0, axis=1)
print(append_df.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1014 entries, 0 to 1013
Data columns (total 6 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   Keywords               1014 non-null   object
 1   dummy_id               1014 non-null   int64 
 2   Product                1014 non-null   object
 3   Description            1014 non-null   object
 4   Ingredients/Allergens  1014 non-null   object
 5   Contains Ingredient    1014 non-null   int64 
dtypes: int64(2), object(4)
memory usage: 55.5+ KB
None
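
The `in` test above is a plain substring match, so a keyword also matches inside a longer word. The sketch below compares it with a whole-word match built on `re`. This is a different technique from the one the notebook uses and could change the row counts. The helper name `contains_keyword` and the ingredient strings are made up for illustration.

```python
import re

def contains_keyword(ingredients: str, keyword: str) -> bool:
    """Return True if keyword appears as a whole word in ingredients."""
    pattern = r'\b' + re.escape(keyword) + r'\b'
    return re.search(pattern, ingredients) is not None

# Plain substring test: matches inside a longer word.
substring_hit = 'egg' in 'eggplant puree'

# Whole-word test on the same text, on a genuine match, and on a plural.
whole_word_hit = contains_keyword('eggplant puree', 'egg')
genuine_hit = contains_keyword('sugar, egg white', 'egg')
plural_hit = contains_keyword('free range eggs', 'egg')

print(substring_hit, whole_word_hit, genuine_hit, plural_hit)
```

The trade-off is visible in the last case: a strict word boundary avoids false positives but misses plurals such as "eggs", so the simpler substring test may still be the better fit for this challenge.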


In [43]:
# Prepare a final shopping list of vegan products
vegan_list = append_df.groupby(['Product', 'Description']).agg(max_contain=('Contains Ingredient','max')).reset_index()
vegan_list = vegan_list[vegan_list['max_contain'] == 0]
print(vegan_list.head(5))

                                  Product  \
1      Alpen Light Jaffa Cake Cereal Bars   
4  Belvita Soft Filled Chocolate Biscuits   
5   Cadbury Bourneville Chocolate Fingers   
7       Co-op Bakery 5 Jam Ball Doughnuts   
8  Doritos Chilli Heatwave Tortilla Chips   

                                         Description  max_contain  
1  Mixed cereal bar with orange flavoured fruity ...            0  
4  Soft baked biscuits made with wholegrain cerea...            0  
5  Crisp biscuits covered with dark chocolate (48...            0  
7                                    Jam Doughnut 5s            0  
8                 Chilli Heatwave Flavour Corn Chips            0  
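
The required Vegan Shopping List output has only the Product and Description fields, while `vegan_list` above still carries the `max_contain` helper column. A minimal sketch of the aggregate-then-filter step that also drops the helper, on made-up data (the `pairs`, `flags` and `vegan_output` names are assumptions for the example):

```python
import pandas as pd

# Made-up keyword/product pairs with a 0/1 match flag.
pairs = pd.DataFrame({
    'Product': ['Product A', 'Product A', 'Product B', 'Product B'],
    'Description': ['Biscuit', 'Biscuit', 'Crisps', 'Crisps'],
    'Contains Ingredient': [0, 1, 0, 0],
})

# A product is vegan only if none of its rows matched a keyword,
# i.e. the maximum of its match flags is 0.
flags = (
    pairs.groupby(['Product', 'Description'], as_index=False)['Contains Ingredient']
    .max()
)

# Keep the vegan products and only the two output fields.
vegan_output = flags.loc[flags['Contains Ingredient'] == 0, ['Product', 'Description']]
print(vegan_output)
```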


In [44]:
# Prepare a list explaining why the other products aren't vegan.
# Keep only the keyword/product pairs that matched.
non_vegan_list = append_df[append_df['Contains Ingredient'] == 1]
non_vegan_ingredient = non_vegan_list['Keywords'].unique()

# Reduce to one row per non-vegan product.
non_vegan_list = non_vegan_list[['Product', 'Description', 'Ingredients/Allergens']]
non_vegan_list = non_vegan_list.drop_duplicates()

# Concatenate every matching keyword into a comma-separated list per product.
for index, row in non_vegan_list.iterrows():
    matched = [k for k in non_vegan_ingredient if k in row['Ingredients/Allergens']]
    non_vegan_list.at[index, 'Contains'] = ', '.join(matched)

print(non_vegan_list.head(5))


                                  Product  \
3           Walkers Max Flamin Hot Crisps   
4            Smiths Frazzles Bacon Snacks   
5            Sensations Thai Sweet Chilli   
8              Tesco 5 Pack Jam Doughnuts   
9  Krispy Kreme Original Glazed Doughnuts   

                                         Description  \
3  Fiercely Flamin' Hot Flavour Ridged Potato Crisps   
4                    Crispy Bacon Flavour Corn Snack   
5            Thai Sweet Chilli Flavour Potato Crisps   
8                                   Jam Doughnut 5PK   
9  Bring some light and fluffy joy into your day ...   

                               Ingredients/Allergens             Contains  
3  potatoes, vegetable oils (sunflower, rapeseed,...           milk, whey  
4  maize, rapeseed oil, bacon flavour seasoning [...  milk, whey, lactose  
5  potatoes, vegetable oils (sunflower, rapeseed,...                 milk  
8  wheat flour [wheat flour, calcium carbonate, i...                 milk  
9           