# Make me a drink!

## Problem Statement:

Using a database of existing cocktail recipes, can I train a neural net to return viable new cocktail recipes?

### Risks:

1. I am not proficient with neural nets.
2. The initial dataset is small, and finding recipe websites that allow scraping is proving difficult.
3. Drinks are subjective. I hate Manhattans but they're a classic drink. Quantifying these results could prove difficult.

In [1]:
import requests
import numpy as np
import pandas as pd
from fractions import Fraction
import time

In [2]:
url = 'https://www.thecocktaildb.com/api/json/v2/9973533/search.php?f='

In [3]:
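# The search endpoint filters by first character, so request each digit and letter in turn.

frames = []
for char in '1234567890abcdefghijklmnopqrstuvwxyz':
    response = requests.get(url + char)
    # If 'drinks' comes back as null for a character, pd.DataFrame(None) is just an empty frame.
    frames.append(pd.DataFrame(response.json()['drinks']))
    time.sleep(5)  # pause between requests to go easy on the API
df = pd.concat(frames)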

In [4]:
df['index'] = [i for i in range(len(df))]
df.set_index('index', inplace = True)

In [5]:
pd.set_option('display.max_rows', None)


In [6]:
df.drop(['idDrink', 'strDrinkAlternate', 'strDrinkES', 'strDrinkDE', 'strDrinkFR', 'strDrinkZH-HANS', 'strDrinkZH-HANT', 'strTags', 'strVideo', 'strCategory', 
           'strIBA', 'strAlcoholic', 'strInstructions', 'strInstructionsES', 'strInstructionsDE', 'strInstructionsFR', 'strInstructionsZH-HANS', 'strInstructionsZH-HANT',
          'strDrinkThumb', 'strIngredient7', 'strIngredient8', 'strIngredient9', 'strIngredient10', 'strIngredient11', 'strIngredient12', 'strIngredient13', 'strIngredient14',
         'strIngredient15', 'strMeasure7', 'strMeasure8', 'strMeasure9', 'strMeasure10', 'strMeasure11', 'strMeasure12', 'strMeasure13', 'strMeasure14', 'strMeasure15',
         'strCreativeCommonsConfirmed', 'dateModified'], axis = 1, inplace = True)

In [7]:
df['strIngredient1'] = df['strIngredient1'].replace(['Absolut Kurant', 'Malibu rum', 'Jack Daniels', 'Absolut Citron', 'Blended whiskey', 'Crown Royal', 'Lemon vodka', 'Baileys irish cream', 'Corona', 
                              'Midori melon liquer', 'Bourbon', 'Kahlua', 'Coca-Cola', 'Cointreau', 'gin', 'Bacardi Limon', 'Rye whiskey', 'Blended Scotch', 'Sugar syrup', 'Prosecco', 'Jim Beam',
                             'Godiva liquer', 'Ouzo', 'Johnny Walker', 'Wild Turkey', 'Peachtree schnapps', 'Chambord raspberry liqueur'], 
                             ['Black Currant Vodka', 'Coconut Rum', 'Whiskey', 'Citrus Vodka', 'Whiskey', 'Whiskey', 'Citrus Vodka', 'Irish cream', 'Pilsner', 'Melon Liquer', 'Whiskey', 'Coffee liquer',
                             'Coke', 'Triple Sec', 'Gin', 'Citrus Rum', 'Rye Whiskey', 'Scotch', 'Simple Syrup', 'Champagne', 'Whiskey', 'Chocolate Liquer', 'Sambuca', 'Scotch', 'Whiskey', 
                              'Peach Schnapps', 'Raspberry Liquer'])

In [8]:
df['strIngredient2'] = df['strIngredient2'].replace(['Wild Turkey', 'Johnnie Walker', 'Coca-Cola', '7-Up', 'Baileys irish cream', 'Roses sweetened lime juice', 'Orange Curacao', 'Kahlua', 'Bourbon', 'Midori melon liqueur', 
                              'Green Creme de Menthe', 'Prosecco', 'Sweet and sour', 'Malibu rum', 'White Creme de Menthe', 'Añejo rum', 'Dark Creme de Cacao', 'Bacardi Limon', 'Guinness stout', 'Erin Cream',
                             'Tia maria', 'Godiva liqueur', 'Chambord raspberry liqueur', 'Hot Damn', 'Raspberry Liqueur', 'Absolut Peppar', 'Bitter lemon', 'Absolut Citron', 'Tropicana', 'Bailey',
                             'Lemon-lime soda', 'Jack Daniels', '7-up', 'Coffee liqueur'],
                            ['Whiskey', 'Scotch', 'Coke', 'Sprite', 'Irish cream', 'Lime Juice', 'Triple Sec', 'Coffee liquer', 'Whiskey', 'Melon Liquer', 'Creme de Menthe', 'Champagne', 'Sour mix',
                            'Coconut Rum', 'Creme de Menthe', 'Dark Rum', 'Creme de Cacao', 'Citrus Rum', 'Stout', 'Irish cream', 'Coffee liquer', 'Chocolate liquer', 'Raspberry Liquer', 'Cinnamon Schnapps',
                            'Raspberry Liquer', 'Pepper Vodka', 'Lemon Bitters', 'Citrus Vodka', 'Orange Juice', 'Irish cream', 'Sprite', 'Whiskey', 'Sprite', 'Coffee Liquer'])

In [9]:
df['strIngredient3'] = df['strIngredient3'].replace(['Chambord raspberry liqueur', 'Rumple Minze', 'Jim Beam', 'Baileys irish cream', 'Lime juice cordial', 'Gold tequila', 'Midori melon liqueur', 'Sweet and sour', 'Kahlua', 'Sugar syrup',
                             'Peachtree schnapps', 'Fresh Lime Juice', 'Passoa', 'Surge', 'Godiva liqueur', 'Coca-Cola', 'White rum', 'Blackcurrant cordial', 'Sugar Syrup', 'Pepsi Cola', 'Wild Turkey', 'Rosso Vermouth',
                             'White Creme de Menthe'],
                            ['Raspberry Liquer', 'Peppermint Schnapps', 'Whiskey', 'Irish Cream', 'Lime Juice', 'Tequila', 'Melon Liquer', 'Sour Mix', 'Coffee Liquer', 'Simple Syrup', 'Peach Schnapps',
                            'Lime juice', 'Passion Fruit Liquer', 'Sprite', 'Chocolate Liquer', 'Coke', 'Rum', 'Black Currant Liquer', 'Simple Syrup', 'Coke', 'Bourbon', 'Sweet Vermouth', 'Creme de Menthe'])

In [10]:
df['strIngredient4'] = df['strIngredient4'].replace(['Midori melon liqueur', 'Surge', 'Lemon-lime soda', 'Añejo rum', 'Light rum', 'Sugar syrup', 'Baileys irish cream', 'Guinness stout', 'Coca-Cola', 'Chocolate Sauce'],
                            ['Melon liquer', 'Sprite', 'Sprite', 'Dark Rum', 'Rum', 'Simple Syrup', 'Irish cream', 'Stout', 'Coke', 'Chocolate Syrup'])

In [11]:
df['strIngredient5'] = df['strIngredient5'].replace(['Malibu rum', '7-Up', 'Prosecco', 'Lemon-lime soda', 'Sweet and Sour', 'Bourbon', 'Melon liqueur', 'Islay single malt Scotch'],
                            ['Coconut Rum', 'Sprite', 'Champagne', 'Sprite', 'Sour Mix', 'Whiskey', 'Melon liqeur', 'Scotch'])

In [12]:
df['strIngredient6'] = df['strIngredient6'].replace(['Sugar syrup', 'Lemon-lime soda', 'Chambord raspberry liqueur'],
                            ['Simple Syrup', 'Sprite', 'raspberry liqueur'])

In [13]:
df

| index | strDrink | strGlass | strIngredient1 | strIngredient2 | strIngredient3 | strIngredient4 | strIngredient5 | strIngredient6 | strMeasure1 | strMeasure2 | strMeasure3 | strMeasure4 | strMeasure5 | strMeasure6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 155 Belmont | White wine glass | Dark rum | Light rum | Vodka | Orange juice |  |  | 1 shot | 2 shots | 1 shot | 1 shot |  |  |
| 1 | 1-900-FUK-MEUP | Old-fashioned glass | Black Currant Vodka | Grand Marnier | Raspberry Liquer | Melon liquer | Coconut Rum | Amaretto | 1/2 oz | 1/4 oz | 1/4 oz | 1/4 oz | 1/4 oz | 1/4 oz |
| 2 | 110 in the shade | Beer Glass | Lager | Tequila |  |  |  |  | 16 oz | 1.5 oz |  |  |  |  |
| 3 | 151 Florida Bushwacker | Beer mug | Coconut Rum | Light rum | 151 proof rum | Dark Creme de Cacao | Cointreau | Milk | 1/2 oz | 1/2 oz | 1/2 oz Bacardi | 1 oz | 1 oz | 3 oz |
| 4 | 252 | Shot glass | 151 proof rum | Whiskey |  |  |  |  | 1/2 shot Bacardi | 1/2 shot |  |  |  |  |
| 5 | 24k nightmare | Shot glass | Goldschlager | Jägermeister | Peppermint Schnapps | 151 proof rum |  |  | 1/2 oz | 1/2 oz | 1/2 oz | 1/2 oz Bacardi |  |  |
| 6 | 3 Wise Men | Collins glass | Whiskey | Scotch | Whiskey |  |  |  | 1/3 oz | 1/3 oz | 1/3 oz |  |  |  |
| 7 | 3-Mile Long Island Iced Tea | Collins Glass | Gin | Light rum | Tequila | Triple sec | Vodka | Coca-Cola | 1/2 oz | 1/2 oz | 1/2 oz | 1/2 oz | 1/2 oz | 1/2 oz |
| 8 | 410 Gone | Collins Glass | Peach Vodka | Coke |  |  |  |  | 2-3 oz |  |  |  |  |  |
| 9 | 50/50 | Collins Glass | Vanilla vodka | Grand Marnier | Orange juice |  |  |  | 2 1/2 oz | 1 splash | Fill with |  |  |  |


The ingredients are now renamed where it seemed necessary. From here, I transcribed the recipes by hand into the txt files.
The code below was my first pass at data cleaning and feature engineering, written before I realized that writing out the recipes myself was the better approach.
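
The six per-column loops that follow all apply the same rule: strip the letters, split on whitespace, and sum the remaining tokens as fractions. As a minimal sketch, that rule could live in one helper. The name `parse_measure` is hypothetical, and unlike the `strMeasure1` loop it also strips `(),` for every column, which is an assumption made for simplicity.

```python
from fractions import Fraction
import string

# Characters removed before parsing: all ASCII letters plus '(),' (assumed safe for every column).
_STRIP = {ord(ch): None for ch in string.ascii_letters + '(),'}

def parse_measure(measure):
    """Return a float for strings like '2 1/2 oz'; return anything else unchanged."""
    # Leave NaN/None, ranges such as '2-3 oz', and the '½' glyph for manual handling.
    if not isinstance(measure, str) or '-' in measure or '½' in measure:
        return measure
    tokens = measure.translate(_STRIP).split()
    # An entry with no numeric tokens (e.g. 'Fill with') sums to 0.0.
    return float(sum(Fraction(tok) for tok in tokens))

print(parse_measure('2 1/2 oz'))        # 2.5
print(parse_measure('1/2 oz Bacardi'))  # 0.5
print(parse_measure('2-3 oz'))          # 2-3 oz
```

With a helper like this, each column could be converted with `df[col] = df[col].map(parse_measure)` instead of a hand-written loop.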

In [1]:
#Turn measurement strings into floats.

In [230]:
for i in range(len(df)):
    measure = df.at[i, 'strMeasure1']
    # Skip NaN, ranges such as '2-3 oz', and the '½' glyph; those are fixed by hand further down.
    if isinstance(measure, str) and '-' not in measure and '½' not in measure:
        # Drop letters so only numeric tokens remain, e.g. '2 1/2 oz' -> ['2', '1/2'].
        tokens = measure.translate({ord(y): None for y in 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'}).split()
        df.at[i, 'strMeasure1'] = float(sum(Fraction(s) for s in tokens))

In [231]:
for i in range(len(df)):
    measure = df.at[i, 'strMeasure2']
    # Same parsing as for strMeasure1, with ',' stripped as well.
    if isinstance(measure, str) and '-' not in measure and '½' not in measure:
        tokens = measure.translate({ord(y): None for y in ',abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'}).split()
        df.at[i, 'strMeasure2'] = float(sum(Fraction(s) for s in tokens))

In [232]:
for i in range(len(df)):
    measure = df.at[i, 'strMeasure3']
    # Same parsing as for strMeasure1, with '(),' stripped as well.
    if isinstance(measure, str) and '-' not in measure and '½' not in measure:
        tokens = measure.translate({ord(y): None for y in '(),abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'}).split()
        df.at[i, 'strMeasure3'] = float(sum(Fraction(s) for s in tokens))

In [233]:
for i in range(len(df)):
    measure = df.at[i, 'strMeasure4']
    # Same parsing as for strMeasure1, with '(),' stripped as well.
    if isinstance(measure, str) and '-' not in measure and '½' not in measure:
        tokens = measure.translate({ord(y): None for y in '(),abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'}).split()
        df.at[i, 'strMeasure4'] = float(sum(Fraction(s) for s in tokens))

In [234]:
for i in range(len(df)):
    measure = df.at[i, 'strMeasure5']
    # Same parsing as for strMeasure1, with '(),' stripped as well.
    if isinstance(measure, str) and '-' not in measure and '½' not in measure:
        tokens = measure.translate({ord(y): None for y in '(),abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'}).split()
        df.at[i, 'strMeasure5'] = float(sum(Fraction(s) for s in tokens))

In [235]:
for i in range(len(df)):
    measure = df.at[i, 'strMeasure6']
    # Same parsing as for strMeasure1, with '(),' stripped as well.
    if isinstance(measure, str) and '-' not in measure and '½' not in measure:
        tokens = measure.translate({ord(y): None for y in '(),abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'}).split()
        df.at[i, 'strMeasure6'] = float(sum(Fraction(s) for s in tokens))

In [260]:
for i in range(len(df)):
    if type(df['strMeasure3'][i]) == str:
        print(df['strDrink'][i])

Caipirissima
Frappé
Orangeade
Orange Scented Hot Chocolate
Pisco Sour
Rum Runner
Strawberry Lemonade
Thai Coffee


In [261]:
df[df['strDrink'] == 'Caipirissima']

| index | strDrink | strGlass | strIngredient1 | strIngredient2 | strIngredient3 | strIngredient4 | strIngredient5 | strIngredient6 | strMeasure1 | strMeasure2 | strMeasure3 | strMeasure4 | strMeasure5 | strMeasure6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 209 | Caipirissima | Collins Glass | Lime | Sugar | Rum | Ice |  |  | 2 | 2 | 2-3 oz | 0 |  |  |


In [251]:
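# Assign the result back; an in-place replace on a selected column is not reliable across pandas versions.
df['strMeasure1'] = df['strMeasure1'].replace(['2-3 oz', '1-2 shot', '½', '4-5'],
                                              [2, 2, 1, 4])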

In [None]:
# Each string maps to the value at the same position, so both lists must have the same length.
df['strMeasure2'] = df['strMeasure2'].replace(['6-8 oz', '4-6', '3-4 tsp', '4 oz chopped bittersweet or semi-sweet',
                                               '8-10 oz cold', '3/4-1 cup', '1-2 tblsp'],
                                              [6, 4, 1, 2, 8, 2, 0.25])

In [None]:
df['strMeasure3'].replace([],
                         [])

In [181]:
# A missing measure means the ingredient slot is unused, so count it as 0.
for col in ['strMeasure1', 'strMeasure2', 'strMeasure3', 'strMeasure4', 'strMeasure5', 'strMeasure6']:
    df[col] = df[col].fillna(0)

In [182]:
volumes = []
for i in range(len(df)):
    volumes.append((float(df.loc[i]['strMeasure1']) + float(df.loc[i]['strMeasure2']) + float(df.loc[i]['strMeasure3']) + 
                   float(df.loc[i]['strMeasure4']) + float(df.loc[i]['strMeasure5']) + float(df.loc[i]['strMeasure6'])))

In [184]:
df['Total Volume'] = volumes

In [186]:
df.head(2)

| index | strDrink | strGlass | strIngredient1 | strIngredient2 | strIngredient3 | strIngredient4 | strIngredient5 | strIngredient6 | strMeasure1 | strMeasure2 | strMeasure3 | strMeasure4 | strMeasure5 | strMeasure6 | Total Volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 155 Belmont | White wine glass | Dark rum | Light rum | Vodka | Orange juice |  |  | 4.0 | 2.0 | 1.0 | 1.0 | 0.0 | 0.0 | 8.0 |
| 1 | 1-900-FUK-MEUP | Old-fashioned glass | Black Currant Vodka | Grand Marnier | Raspberry Liquer | Melon liquer | Coconut Rum | Amaretto | 0.5 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 1.75 |


In [205]:
measure1 = []
for i in range(len(df)):
    measure1.append(np.round((df.loc[i]['strMeasure1'] / df.loc[i]['Total Volume']), decimals = 2))
    
df['strMeasure1'] = measure1



In [211]:
measure2 = []
for i in range(len(df)):
    measure2.append(np.round((df.loc[i]['strMeasure2'] / df.loc[i]['Total Volume']), decimals = 2))

df['strMeasure2'] = measure2



In [212]:
measure3 = []
for i in range(len(df)):
    measure3.append(np.round((df.loc[i]['strMeasure3'] / df.loc[i]['Total Volume']), decimals = 2))

df['strMeasure3'] = measure3



In [213]:
measure4 = []
for i in range(len(df)):
    measure4.append(np.round((df.loc[i]['strMeasure4'] / df.loc[i]['Total Volume']), decimals = 2))

df['strMeasure4'] = measure4



In [214]:
measure5 = []
for i in range(len(df)):
    measure5.append(np.round((df.loc[i]['strMeasure5'] / df.loc[i]['Total Volume']), decimals = 2))

df['strMeasure5'] = measure5



In [215]:
measure6 = []
for i in range(len(df)):
    measure6.append(np.round((df.loc[i]['strMeasure6'] / df.loc[i]['Total Volume']), decimals = 2))

df['strMeasure6'] = measure6



In [216]:
df.head(5)

| index | strDrink | strGlass | strIngredient1 | strIngredient2 | strIngredient3 | strIngredient4 | strIngredient5 | strIngredient6 | strMeasure1 | strMeasure2 | strMeasure3 | strMeasure4 | strMeasure5 | strMeasure6 | Total Volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 155 Belmont | White wine glass | Dark rum | Light rum | Vodka | Orange juice |  |  | 0.5 | 0.25 | 0.12 | 0.12 | 0.0 | 0.0 | 8.0 |
| 1 | 1-900-FUK-MEUP | Old-fashioned glass | Black Currant Vodka | Grand Marnier | Raspberry Liquer | Melon liquer | Coconut Rum | Amaretto | 0.29 | 0.14 | 0.14 | 0.14 | 0.14 | 0.14 | 1.75 |
| 2 | 110 in the shade | Beer Glass | Lager | Tequila |  |  |  |  | 0.91 | 0.09 | 0.0 | 0.0 | 0.0 | 0.0 | 17.5 |
| 3 | 151 Florida Bushwacker | Beer mug | Coconut Rum | Light rum | 151 proof rum | Dark Creme de Cacao | Cointreau | Milk | 0.08 | 0.08 | 0.08 | 0.15 | 0.15 | 0.46 | 6.5 |
| 4 | 252 | Shot glass | 151 proof rum | Whiskey |  |  |  |  | 0.5 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |


In [217]:
df.to_csv('./EDA.csv')

The following were my thoughts after my initial EDA. I wanted to leave these in.

# Initial EDA Summary:

1. The recipes all use different units of measurement.
    - we will convert every measure into a percentage of the drink's total measurement.
    - what matters most in the recipe is the ratio, not the actual measures
    
    
2. I'm unsure how to handle specificity of ingredients.
    - we will remove brand names
    - Dark Rum vs Rum vs Light Rum vs 151 Proof Rum?


3. Garnishes will make percentage conversion messy. Need to fix that.
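
On point 2, the per-column replacement lists earlier in the notebook let spellings drift between columns (for example 'Melon Liquer' and 'Melon liquer'). One possible alternative, shown here only as a sketch, is a single case-insensitive lookup shared by all six ingredient columns. `GENERIC_NAMES` and `normalize_ingredient` are hypothetical names, and the handful of entries is copied from the lists above rather than being a complete mapping.

```python
# Brand name (lower-cased) -> generic ingredient; a small, illustrative subset.
GENERIC_NAMES = {
    'malibu rum': 'Coconut Rum',
    'jack daniels': 'Whiskey',
    'coca-cola': 'Coke',
    '7-up': 'Sprite',
    'baileys irish cream': 'Irish cream',
}

def normalize_ingredient(name):
    """Map a brand name to its generic ingredient, ignoring case; pass other values through."""
    if not isinstance(name, str):
        return name  # keep NaN/None for unused ingredient slots
    return GENERIC_NAMES.get(name.strip().lower(), name)

recipe = ['Malibu rum', '7-Up', 'Gin', None]
print([normalize_ingredient(item) for item in recipe])
# ['Coconut Rum', 'Sprite', 'Gin', None]
```

Applied to the DataFrame, this would be one `df[col] = df[col].map(normalize_ingredient)` per ingredient column, and both '7-Up' and '7-up' resolve through the same entry.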

# From here

1. Create a "Total Volume" column that is the sum of all measurements for each drink
2. Convert each float measurement into its share of the Total Volume
3. Start training while continuing the search for more data
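
Steps 1 and 2 can also be written without row-by-row loops. The following is a sketch on a two-row toy frame whose values are taken from the '110 in the shade' and '252' rows; `demo`, `measure_cols`, and `shares` are illustrative names, and it assumes the measure columns are already numeric with missing values filled as 0.

```python
import pandas as pd

measure_cols = [f'strMeasure{n}' for n in range(1, 7)]

# Toy data standing in for the cleaned measure columns.
demo = pd.DataFrame(
    [[16.0, 1.5, 0, 0, 0, 0],
     [0.5, 0.5, 0, 0, 0, 0]],
    columns=measure_cols,
).astype(float)

# Step 1: total volume per drink is the row-wise sum of the six measures.
demo['Total Volume'] = demo[measure_cols].sum(axis=1)

# Step 2: divide every measure by its row's total and round to two decimals.
shares = demo[measure_cols].div(demo['Total Volume'], axis=0).round(2)

print(demo['Total Volume'].tolist())  # [17.5, 1.0]
print(shares.iloc[0].tolist())        # [0.91, 0.09, 0.0, 0.0, 0.0, 0.0]
```

A drink whose total is 0 would produce NaN shares here, so such rows would need to be dropped or handled before training.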