### Project 1
#### Summer 2021
**Authors:** GOAT Team (Estaban Aramayo, Ethan Haley, Claire Meyer, and Tyler Frankenburg)

The [Cocktail DB](https://www.thecocktaildb.com/api.php) is a database of cocktails and ingredients. In this assignment, we use the Cocktail DB's API to build a network of cocktails and ingredients, then use some example data to explore how centrality metrics might help us predict outcomes from it.

##### Loading the Data

Without digging too deeply into the intricacies of the Cocktail DB API, we can start from [this code](https://holypython.com/api-12-cocktail-database/) to grab some example output from the API. It uses two libraries: `requests` to make the API request, and `json` to parse the JSON response. We can then iterate through each cocktail in the output to grab the relevant components.

We can build a search query to pull all cocktail names (`strDrink`), the ingredients for each, and the drink categories (`strCategory`) for each cocktail.

First, get all drinks by the first letter of their name.

In [6]:
import networkx as net
import requests
import json
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

In [7]:
baseUrl = "https://www.thecocktaildb.com/api/json/v1/1/"
letterEndpoint = "search.php?f="

def searchLetter(letter):
    data = requests.get(baseUrl + letterEndpoint + letter)
    return json.loads(data.text)
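
A slightly more defensive variant of `searchLetter` might build the query string explicitly, set a timeout, and treat a `null` drink list as empty. The sketch below is an illustration under assumptions, not the notebook's code: it swaps `requests` for the standard library's `urllib` so the block is self-contained, and the names `build_letter_url`, `fetch_text`, and `drinks_for_letter` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.thecocktaildb.com/api/json/v1/1/"  # same base URL as baseUrl


def build_letter_url(letter):
    """Return the search-by-first-letter URL for one letter."""
    return BASE_URL + "search.php?" + urlencode({"f": letter})


def fetch_text(url, timeout=10):
    """Download a URL and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def drinks_for_letter(letter, fetch=fetch_text):
    """Return the list of drink records for a letter, or [] if there are none."""
    payload = json.loads(fetch(build_letter_url(letter)))
    # The loading loop in this notebook skips letters whose 'drinks' value is empty.
    return payload.get("drinks") or []
```

Passing the download function in as `fetch` keeps the parsing logic testable without a network connection.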

In [8]:
# Build a dataframe
name = []  # drink name
ids = []   # drink ID
cat = []   # drink category
pic = []   # thumbnail url
ingr = []  # ingredients

In [9]:
# helper function to parse ingredients
def ingreds(drinkDict):
    ing = []
    # checks strIngredient1..strIngredient15; most of them are empty/None
    for i in range(1, 16):
        s = "strIngredient" + str(i)
        if not drinkDict[s]:  # first empty field marks the end of the list
            break
        ing.append(drinkDict[s])
    return ing
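
To make the parsing step concrete, here is a small self-contained sketch that runs the same logic on a hand-written record. The `sample` dictionary is a stand-in modeled on the `A1` row shown later in this notebook, not real API output, and `ingredient_list` is a hypothetical name for an equivalent helper.

```python
def ingredient_list(drink, max_fields=15):
    """Collect non-empty strIngredient1..strIngredientN values, in order."""
    found = []
    for i in range(1, max_fields + 1):
        value = drink.get(f"strIngredient{i}")
        if not value:  # None or "" marks the end of the list
            break
        found.append(value)
    return found


# Hand-written stand-in for one API record, modeled on the "A1" row.
sample = {"strDrink": "A1", "strCategory": "Cocktail"}
for i, item in enumerate(["Gin", "Grand Marnier", "Lemon Juice", "Grenadine"], start=1):
    sample[f"strIngredient{i}"] = item
for i in range(5, 16):
    sample[f"strIngredient{i}"] = None

print(ingredient_list(sample))  # ['Gin', 'Grand Marnier', 'Lemon Juice', 'Grenadine']
```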

In [10]:
for letter in 'abcdefghijklmnopqrstuvwxyz':
    drinks = searchLetter(letter)['drinks']
    if not drinks:  # some letters have no drinks
        continue
    for d in drinks:
        name.append(d['strDrink'])
        ids.append(d['idDrink'])
        cat.append(d['strCategory'])
        pic.append(d['strDrinkThumb'])
        ingr.append(ingreds(d))

In [11]:
drinkDF = pd.DataFrame({'name': name,
                       'id': ids,
                       'category': cat,
                       'photoURL': pic,
                       'ingredients': ingr})
drinkDF

Unnamed: 0,name,id,category,photoURL,ingredients
0,A1,17222,Cocktail,https://www.thecocktaildb.com/images/media/dri...,"[Gin, Grand Marnier, Lemon Juice, Grenadine]"
1,ABC,13501,Shot,https://www.thecocktaildb.com/images/media/dri...,"[Amaretto, Baileys irish cream, Cognac]"
2,Ace,17225,Cocktail,https://www.thecocktaildb.com/images/media/dri...,"[Gin, Grenadine, Heavy cream, Milk, Egg White]"
3,Adam,17837,Ordinary Drink,https://www.thecocktaildb.com/images/media/dri...,"[Dark rum, Lemon juice, Grenadine]"
4,AT&T,13938,Ordinary Drink,https://www.thecocktaildb.com/images/media/dri...,"[Absolut Vodka, Gin, Tonic water]"
...,...,...,...,...,...
412,Zima Blaster,17027,Ordinary Drink,https://www.thecocktaildb.com/images/media/dri...,"[Zima, Chambord raspberry liqueur]"
413,Zizi Coin-coin,14594,Punch / Party Drink,https://www.thecocktaildb.com/images/media/dri...,"[Cointreau, Lemon juice, Ice, Lemon]"
414,Zippy's Revenge,14065,Cocktail,https://www.thecocktaildb.com/images/media/dri...,"[Amaretto, Rum, Kool-Aid]"
415,Zimadori Zinger,15801,Punch / Party Drink,https://www.thecocktaildb.com/images/media/dri...,"[Midori melon liqueur, Zima]"


Save the DataFrame so that we don't have to make 26 API calls every time the notebook opens.

In [12]:
drinkDF.to_csv('drinkDF.csv')
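
One caveat when reloading: a CSV cell can only hold text, so each `ingredients` list is written as its string form and comes back as a string unless it is parsed. The sketch below shows that round trip with the standard library only, using an in-memory buffer instead of `drinkDF.csv` and two rows copied from the table above. With pandas, the same idea would be to pass `converters={'ingredients': ast.literal_eval}` to `pd.read_csv`.

```python
import ast
import csv
import io

rows = [
    {"name": "A1", "ingredients": ["Gin", "Grand Marnier", "Lemon Juice", "Grenadine"]},
    {"name": "ABC", "ingredients": ["Amaretto", "Baileys irish cream", "Cognac"]},
]

# Write: each list is stored as its text representation.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "ingredients"])
writer.writeheader()
writer.writerows(rows)

# Read: every cell comes back as a string, so parse the list explicitly.
buffer.seek(0)
reloaded = []
for row in csv.DictReader(buffer):
    raw = row["ingredients"]  # a string such as "['Gin', 'Grand Marnier', ...]"
    row["ingredients"] = ast.literal_eval(raw)
    reloaded.append(row)

print(reloaded[0]["ingredients"][0])  # Gin
```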

##### Make a bipartite graph, with cocktails as one type and ingredients as the other.

In [13]:
from networkx.algorithms import bipartite

drinks = set(drinkDF.name.values)
# note: this rebinds `ingreds`, which until now named the parsing helper
ingreds = set(i for iList in drinkDF.ingredients.values for i in iList)

B = net.Graph()
B.add_nodes_from(drinks, bipartite='Cocktail')
B.add_nodes_from(ingreds, bipartite='Ingredient')

for d in range(len(drinkDF)):
    B.add_node(name[d], category=cat[d])
    for ing in ingr[d]:
        B.add_edge(name[d], ing)

The two bipartite projection graphs are then:

In [14]:
D = bipartite.weighted_projected_graph(B, drinks)
I = bipartite.weighted_projected_graph(B, ingreds)
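
To illustrate what the drink projection contains: two drinks are linked when they share at least one ingredient, and with the default settings of `weighted_projected_graph` the edge weight is the number of shared neighbors. A minimal pure-Python sketch of that idea, using three rows from the table above (the helper name `project_drinks` is hypothetical):

```python
from itertools import combinations

# Three drinks and their ingredients, copied from the table above.
recipes = {
    "A1": ["Gin", "Grand Marnier", "Lemon Juice", "Grenadine"],
    "Ace": ["Gin", "Grenadine", "Heavy cream", "Milk", "Egg White"],
    "Adam": ["Dark rum", "Lemon juice", "Grenadine"],
}


def project_drinks(recipes):
    """Map each pair of drinks to the number of ingredients they share."""
    weights = {}
    for a, b in combinations(sorted(recipes), 2):
        shared = set(recipes[a]) & set(recipes[b])
        if shared:  # no shared ingredient means no edge in the projection
            weights[(a, b)] = len(shared)
    return weights


print(project_drinks(recipes))
# {('A1', 'Ace'): 2, ('A1', 'Adam'): 1, ('Ace', 'Adam'): 1}
```

`A1` and `Adam` share only Grenadine here because `Lemon Juice` and `Lemon juice` are different strings and therefore different nodes.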

In [15]:
D.nodes()

['Lord And Lady',
 'Aviation',
 'Caipirinha',
 'The Laverstoke',
 'Tequila Fizz',
 'Martinez 2',
 'Caipirissima',
 'Avalon',
 'Thai Iced Coffee',
 'Wine Punch',
 'Pysch Vitamin Light',
 'Iced Coffee Fillip',
 'Kir Royale',
 'English Highball',
 'Death in the Afternoon',
 'Egg Nog #4',
 'Gin Daisy',
 'John Collins',
 'Port Wine Cocktail',
 'Hawaiian Cocktail',
 'Mulled Wine',
 'Ipamena',
 'Bellini',
 'Ice Pick',
 'Kentucky Colonel',
 'New York Lemonade',
 'Kentucky B And B',
 'Rum Sour',
 'The Philosopher',
 'New York Sour',
 'Midnight Manx',
 'Campari Beer',
 'Scotch Sour',
 'Gin Sling',
 'Lady Love Fizz',
 'Michelada',
 'Vesper',
 'Blackthorn',
 'Grizzly Bear',
 'Applecar',
 'Rose',
 'Frappé',
 "Quaker's Cocktail",
 'Penicillin',
 'Zima Blaster',
 'Godfather',
 'Jam Donut',
 'Raspberry Cooler',
 'Quick F**K',
 'Jello shots',
 'Gin Toddy',
 'Orgasm',
 'Chocolate Milk',
 'Clove Cocktail',
 'Golden dream',
 'Moranguito',
 'Pina Colada',
 'Cherry Rum',
 'Queen Charlotte',
 'Green Goblin',

##### Calculate Centrality by Category

From the project doc: For each of the nodes in the dataset, calculate degree centrality and eigenvector centrality.
Compare your centrality measures across your categorical groups.
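
For reference, the two measures can be written as follows, using the normalization that `networkx` applies to degree centrality. Here $n$ is the number of nodes in the graph, $A$ its adjacency matrix, and $\lambda$ the largest eigenvalue of $A$; the symbols are our notation, not the project doc's.

$$
C_D(v) = \frac{\deg(v)}{n - 1}, \qquad x_v = \frac{1}{\lambda} \sum_{u} A_{vu}\, x_u
$$

A node with no edges therefore has degree centrality 0, and a node scores high on eigenvector centrality when its neighbors score high.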

In [16]:
drinks_cocktails = drinkDF[(drinkDF['category']=="Cocktail")]
drinks_cocktails.head()

Unnamed: 0,name,id,category,photoURL,ingredients
0,A1,17222,Cocktail,https://www.thecocktaildb.com/images/media/dri...,"[Gin, Grand Marnier, Lemon Juice, Grenadine]"
2,Ace,17225,Cocktail,https://www.thecocktaildb.com/images/media/dri...,"[Gin, Grenadine, Heavy cream, Milk, Egg White]"
12,Addison,17228,Cocktail,https://www.thecocktaildb.com/images/media/dri...,"[Gin, Vermouth]"
16,Aviation,17180,Cocktail,https://www.thecocktaildb.com/images/media/dri...,"[Gin, lemon juice, maraschino liqueur]"
21,Afterglow,12560,Cocktail,https://www.thecocktaildb.com/images/media/dri...,"[Grenadine, Orange juice, Pineapple juice]"


In [17]:
drinks_shots = drinkDF[(drinkDF['category']=="Shot")]
drinks_shots.head()

Unnamed: 0,name,id,category,photoURL,ingredients
1,ABC,13501,Shot,https://www.thecocktaildb.com/images/media/dri...,"[Amaretto, Baileys irish cream, Cognac]"
5,ACID,14610,Shot,https://www.thecocktaildb.com/images/media/dri...,"[151 proof rum, Wild Turkey]"
25,B-53,13332,Shot,https://www.thecocktaildb.com/images/media/dri...,"[Kahlua, Sambuca, Grand Marnier]"
26,B-52,15853,Shot,https://www.thecocktaildb.com/images/media/dri...,"[Baileys irish cream, Grand Marnier, Kahlua]"
29,Big Red,13222,Shot,https://www.thecocktaildb.com/images/media/dri...,"[Irish cream, Goldschlager]"


In [18]:
drinks_c = set(drinks_cocktails.name.values)
ingreds_c = set(i for iList in drinks_cocktails.ingredients.values for i in iList)

dc_graph = net.Graph()

dc_graph.add_nodes_from(drinks_c, bipartite='Cocktail')
dc_graph.add_nodes_from(ingreds_c,bipartite='Ingredient')

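# Note: name, cat and ingr are the full-length lists built while loading,
# so these positions cover the first len(drinks_c) drinks overall,
# not only the rows in drinks_cocktails.
for d in range(len(drinks_c)):
    dc_graph.add_node(name[d], category=cat[d])
    for ing in ingr[d]:
        dc_graph.add_edge(name[d], ing)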
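
If the intent is for the graph to contain edges only for drinks in the filtered frame, one option is to loop over the filtered rows themselves rather than over positions in the full-length `name`, `cat`, and `ingr` lists. The sketch below uses plain dictionaries and four rows from the table above to show the difference; the `rows` list and the `edges_for_category` helper are illustrative, not part of the notebook.

```python
rows = [
    {"name": "A1", "category": "Cocktail",
     "ingredients": ["Gin", "Grand Marnier", "Lemon Juice", "Grenadine"]},
    {"name": "ABC", "category": "Shot",
     "ingredients": ["Amaretto", "Baileys irish cream", "Cognac"]},
    {"name": "Ace", "category": "Cocktail",
     "ingredients": ["Gin", "Grenadine", "Heavy cream", "Milk", "Egg White"]},
    {"name": "Adam", "category": "Ordinary Drink",
     "ingredients": ["Dark rum", "Lemon juice", "Grenadine"]},
]


def edges_for_category(rows, category):
    """Return (drink, ingredient) edges for drinks in one category only."""
    return [
        (row["name"], ingredient)
        for row in rows
        if row["category"] == category
        for ingredient in row["ingredients"]
    ]


cocktail_edges = edges_for_category(rows, "Cocktail")
drinks_with_edges = sorted({drink for drink, _ in cocktail_edges})
print(drinks_with_edges)  # ['A1', 'Ace']

# By contrast, taking the first two positions of the full list picks
# 'A1' and 'ABC', and 'ABC' is a Shot.
first_two = [row["name"] for row in rows[:2]]
print(first_two)  # ['A1', 'ABC']
```

In the notebook itself, the analogous loop might iterate over `zip(drinks_cocktails.name, drinks_cocktails.ingredients)`.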

In [19]:
cocktail_degree = net.degree_centrality(dc_graph)
cocktail_degree = pd.DataFrame.from_dict(cocktail_degree, orient='index').reset_index()
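
As a sanity check on what `degree_centrality` reports, the sketch below recomputes it by hand on a toy graph: each node's degree divided by the number of other nodes. The adjacency dictionary is a made-up example built from one row of the table above plus one isolated node; it is not the notebook's graph.

```python
def degree_centrality(adjacency):
    """Degree of each node divided by (n - 1), where n is the node count."""
    n = len(adjacency)
    if n <= 1:
        return {node: 1.0 for node in adjacency}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}


# Toy graph: one shot and its ingredients, plus one node with no edges.
adjacency = {
    "ABC": {"Amaretto", "Baileys irish cream", "Cognac"},
    "Amaretto": {"ABC"},
    "Baileys irish cream": {"ABC"},
    "Cognac": {"ABC"},
    "Zombie": set(),  # isolated node, so its centrality is 0
}

scores = degree_centrality(adjacency)
print(scores["ABC"])     # 0.75
print(scores["Cognac"])  # 0.25
print(scores["Zombie"])  # 0.0
```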

In [20]:
cocktail_degree.head()

Unnamed: 0,index,0
0,Aviation,0.008043
1,Zombie,0.0
2,The Laverstoke,0.0
3,Dirty Martini,0.013405
4,Tipperary,0.0


In [21]:
drinks_s = set(drinks_shots.name.values)
ingreds_s = set(i for iList in drinks_shots.ingredients.values for i in iList)

ds_graph = net.Graph()

ds_graph.add_nodes_from(drinks_s, bipartite='Cocktail')
ds_graph.add_nodes_from(ingreds_s,bipartite='Ingredient')

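# Note: name, cat and ingr are the full-length lists built while loading,
# so these positions cover the first len(drinks_s) drinks overall,
# not only the rows in drinks_shots.
for d in range(len(drinks_s)):
    ds_graph.add_node(name[d], category=cat[d])
    for ing in ingr[d]:
        ds_graph.add_edge(name[d], ing)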

In [28]:
def sorted_map(dd: dict) -> list:
    """
    Sorts dict items by value (descending), breaking ties by key
    
    :param dd: dictionary with numeric values
    :return list of (key, value) tuples ordered by value
    """
    return sorted(dd.items(), key=lambda x: (-x[1], x[0]))
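
A short usage sketch for this helper, restated here so the block runs on its own. The sample values are degree centralities copied from the cocktail summary reported later in this notebook; note that the result is a list of `(key, value)` pairs, with ties broken alphabetically.

```python
def sorted_map(dd):
    """Return (key, value) pairs ordered by value descending, then by key."""
    return sorted(dd.items(), key=lambda x: (-x[1], x[0]))


# Degree centrality values copied from the cocktail summary in this notebook.
sample = {"Lime": 0.024129, "Vodka": 0.02681, "Gin": 0.048257, "Dry Vermouth": 0.024129}

for node, score in sorted_map(sample):
    print(f"{node:<14}{score:.6f}")
```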

In [22]:
shots_degree = net.degree_centrality(ds_graph)
shots_degree = pd.DataFrame.from_dict(shots_degree, orient='index').reset_index()

In [23]:
shots_degree.head()

Unnamed: 0,index,0
0,Moranguito,0.0
1,Bumble Bee,0.0
2,Fahrenheit 5000,0.0
3,Shot-gun,0.0
4,Flaming Dr. Pepper,0.0


In [72]:
cocktail_eig = net.eigenvector_centrality_numpy(dc_graph)
cocktail_eig = pd.DataFrame.from_dict(cocktail_eig, orient='index').reset_index()

In [73]:
shots_eig = net.eigenvector_centrality_numpy(ds_graph)
shots_eig = pd.DataFrame.from_dict(shots_eig, orient='index').reset_index()
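
To show what eigenvector centrality measures, here is a pure-Python sketch on a toy graph. It uses power iteration rather than the direct eigen-solver behind `eigenvector_centrality_numpy`, so it is a swapped-in technique for illustration only. Each step multiplies by $A + I$ instead of $A$, because plain power iteration can oscillate on bipartite graphs such as a drink-ingredient network. The graph is one shot from the table above and its three ingredients.

```python
import math


def eigenvector_centrality(adjacency, max_iter=200, tol=1e-10):
    """Power iteration on (A + I); scores are scaled to unit Euclidean length."""
    nodes = list(adjacency)
    x = {v: 1.0 for v in nodes}
    for _ in range(max_iter):
        # One multiplication by (A + I): own score plus the neighbors' scores.
        nxt = {v: x[v] + sum(x[u] for u in adjacency[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        nxt = {v: val / norm for v, val in nxt.items()}
        if sum(abs(nxt[v] - x[v]) for v in nodes) < tol:
            return nxt
        x = nxt
    raise RuntimeError("power iteration did not converge")


# Star graph: the shot "ABC" connected to each of its three ingredients.
star = {
    "ABC": {"Amaretto", "Baileys irish cream", "Cognac"},
    "Amaretto": {"ABC"},
    "Baileys irish cream": {"ABC"},
    "Cognac": {"ABC"},
}

scores = eigenvector_centrality(star)
print(round(scores["ABC"], 4))     # 0.7071
print(round(scores["Cognac"], 4))  # 0.4082
```

For a star with three leaves the exact answer is $1/\sqrt{2}$ for the center and $1/\sqrt{6}$ for each leaf, which the sketch reproduces. Scores on the order of `1e-18` in the summary tables of this notebook belong to nodes with no edges and are effectively zero.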

##### Compare Centrality Measures across Categories

Then we can create summary dataframes of both centrality measures, and sort by each to see the top-ranked nodes for cocktails and shots by each centrality measure.
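
The merge used for these summaries is an inner join on the node name: both centrality frames hold their scores in a column labeled `0`, so pandas appends `_x` and `_y` to tell them apart, which is why those labels are renamed. The same join can be sketched without pandas on two small dictionaries; the values are copied from the cocktail summary in this notebook, and `summarize` is a hypothetical helper.

```python
degree = {"Gin": 0.048257, "Lemon juice": 0.032172, "Vodka": 0.02681}
eigen = {"Gin": 0.481058, "Lemon juice": 0.260864, "Vodka": 0.099439}


def summarize(degree, eigen):
    """Inner-join two {node: score} dicts into a list of row dicts."""
    shared = degree.keys() & eigen.keys()
    return [
        {
            "Name": node,
            "Degree Centrality": degree[node],
            "Eigenvector Centrality": eigen[node],
        }
        for node in sorted(shared)
    ]


rows = summarize(degree, eigen)
top_by_eigen = sorted(rows, key=lambda r: r["Eigenvector Centrality"], reverse=True)
print(top_by_eigen[0]["Name"])  # Gin
```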

In [74]:
summary_shots = pd.merge(shots_degree, shots_eig, how = "inner", on = "index")
summary_shots = summary_shots.rename(columns = 
        {"index":"Name","0_x":"Degree Centrality","0_y":"Eigenvector Centrality"})
summary_shots.head()

Unnamed: 0,Name,Degree Centrality,Eigenvector Centrality
0,Moranguito,0.0,3.958161e-18
1,Bumble Bee,0.0,-4.471129e-19
2,Fahrenheit 5000,0.0,-2.017609e-18
3,Shot-gun,0.0,-7.002405999999999e-19
4,Flaming Dr. Pepper,0.0,1.144746e-18


In [75]:
summary_cocktails = pd.merge(cocktail_degree, cocktail_eig, how = "inner", on = "index")
summary_cocktails = summary_cocktails.rename(columns = 
        {"index":"Name","0_x":"Degree Centrality","0_y":"Eigenvector Centrality"})
summary_cocktails.head()

Unnamed: 0,Name,Degree Centrality,Eigenvector Centrality
0,Aviation,0.008043,0.1036203
1,Zombie,0.0,-1.06264e-20
2,The Laverstoke,0.0,8.29038e-18
3,Dirty Martini,0.013405,0.06715205
4,Tipperary,0.0,-1.430504e-18


In [76]:
summary_cocktails.sort_values(by=['Degree Centrality'],ascending=False).head()

Unnamed: 0,Name,Degree Centrality,Eigenvector Centrality
187,Gin,0.048257,0.481058
174,Lemon juice,0.032172,0.260864
109,Vodka,0.02681,0.099439
150,Dry Vermouth,0.024129,0.137925
194,Lime,0.024129,0.07945


In [77]:
summary_cocktails.sort_values(by=['Eigenvector Centrality'],ascending=False).head()

Unnamed: 0,Name,Degree Centrality,Eigenvector Centrality
187,Gin,0.048257,0.481058
174,Lemon juice,0.032172,0.260864
287,Boxcar,0.013405,0.226975
324,Casino,0.013405,0.191902
347,Casino Royale,0.013405,0.18828


In [78]:
summary_shots.sort_values(by=['Degree Centrality'],ascending=False).head()

Unnamed: 0,Name,Degree Centrality,Eigenvector Centrality
89,Gin,0.069182,0.526822
158,Bluebird,0.037736,0.261762
118,Acapulco,0.037736,0.093719
137,Allegheny,0.031447,0.084085
91,Ace,0.031447,0.224822


In [79]:
summary_shots.sort_values(by=['Eigenvector Centrality'],ascending=False).head()

Unnamed: 0,Name,Degree Centrality,Eigenvector Centrality
89,Gin,0.069182,0.526822
152,Boxcar,0.031447,0.291843
158,Bluebird,0.037736,0.261762
91,Ace,0.031447,0.224822
72,Grenadine,0.031447,0.221402


In [86]:
cocktail_degree_mean = summary_cocktails["Degree Centrality"].mean()
cocktail_eig_mean = summary_cocktails["Eigenvector Centrality"].mean()
cocktail_degree_median = summary_cocktails["Degree Centrality"].median()
cocktail_eig_median = summary_cocktails["Eigenvector Centrality"].median()

In [87]:
shot_degree_mean = summary_shots["Degree Centrality"].mean()
shot_eig_mean = summary_shots["Eigenvector Centrality"].mean()
shot_degree_median = summary_shots["Degree Centrality"].median()
shot_eig_median = summary_shots["Eigenvector Centrality"].median()

In [89]:
print("cocktail_degree_mean: ",cocktail_degree_mean)
print("cocktail_eig_mean: ",cocktail_eig_mean)
print("cocktail_degree_median: ",cocktail_degree_median)
print("cocktail_eig_median: ",cocktail_eig_median)
print("shot_degree_mean: ",shot_degree_mean)
print("shot_eig_mean: ",shot_eig_mean)
print("shot_degree_median: ",shot_degree_median)
print("shot_eig_median: ",shot_eig_median)

cocktail_degree_mean:  0.004516064285816692
cocktail_eig_mean:  0.022352580651994344
cocktail_degree_median:  0.002680965147453083
cocktail_eig_median:  0.0015058726209212026
shot_degree_mean:  0.009512578616352203
shot_eig_mean:  0.037545222527177055
shot_degree_median:  0.006289308176100629
shot_eig_median:  0.00432482036442818
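
Because degree centrality is divided by $n - 1$, and the two graphs differ in size, it can help to convert the figures back to raw edge counts before comparing categories. The sketch below does that for the printed values. The node counts (374 and 160) are not stated in the notebook; they are inferred from the printed medians, which equal 1/373 and 1/159, so treat them as assumptions.

```python
def raw_degree(centrality, n_nodes):
    """Undo the 1 / (n - 1) normalization of degree centrality."""
    return centrality * (n_nodes - 1)


# Node counts are inferred from the printed medians (1/373 and 1/159),
# not stated in the notebook, so treat them as assumptions.
COCKTAIL_NODES = 374
SHOT_NODES = 160

reported = {
    "cocktail median": (0.002680965147453083, COCKTAIL_NODES),
    "cocktail mean": (0.004516064285816692, COCKTAIL_NODES),
    "shot median": (0.006289308176100629, SHOT_NODES),
    "shot mean": (0.009512578616352203, SHOT_NODES),
}

for label, (value, n_nodes) in reported.items():
    print(f"{label}: {raw_degree(value, n_nodes):.2f} edges per node")
```

Under those assumptions, the median node in both graphs has exactly one edge, and the mean is roughly 1.7 edges per node for the cocktail graph and 1.5 for the shot graph.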


In [90]:
# to add - ordinary drink category, clean up, and interpretation