## CITS4403 Project Notebook 2
### FooDB data extraction
This notebook shows how FooDB was used to extract the information needed to unify our datasets and to build the ingredient mappings from which the networks are created.

In [None]:
import json
import pandas
import random

random.seed(0)

# Input Files
PATH = "data/foodb_2020_04_07_csv/"
contentsFileName = PATH + "Content.csv"
ingredientFileName = PATH + "Food.csv"
compoundFileName = PATH + "Compound.csv"

# Output Files
adjListFilePath = "data/compounds/compounds.json"
ingredientListFilePath = "data/notebook-3-data/ingredients.json"
nameMapFilePath = "data/notebook-3-data/common-names.json"
CategoryListFilePath = "data/categoryList.json"
CategoryMapFilePath = "data/categoryMap.json"
subCategorySetFilePath = "data/subCategorySet.json"
subCategoryMapFilePath = "data/subCategoryMap.json"
colourMapFilePath = "data/colourMap.json"
catColourMapFilePath = "data/catColourMap.json"
fullColourMapFilePath = "data/fullColourMap.json"

To work with the database, we need to map ingredient IDs to ingredient names.

In [None]:
# Build a lookup from ingredient id to ingredient name
with open(ingredientFileName) as ingredientFile:
    ingredientData = pandas.read_csv(ingredientFile, low_memory=False)
# Pair the two columns directly; this avoids a slow row-by-row iterrows() loop
ingredientMap = dict(zip(ingredientData["id"], ingredientData["name"]))
print(ingredientData)
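
Only the `id` and `name` columns feed this lookup, so the mapping could also be built without loading every column. The following is a minimal sketch rather than the notebook's actual code: it reads a small in-memory CSV with hypothetical rows in place of `Food.csv`, and the names `sample_csv`, `sample_data` and `sample_map` are made up for illustration.

```python
import io
import pandas

# Hypothetical stand-in for Food.csv; the real file has many more columns.
sample_csv = io.StringIO(
    "id,name,food_group,food_subgroup\n"
    "1,Angelica,Herbs and Spices,Herbs\n"
    "2,Savoy cabbage,Vegetables,Cabbages\n"
)

# Read only the two columns needed for the id -> name lookup.
sample_data = pandas.read_csv(sample_csv, usecols=["id", "name"])

# Pair the columns directly instead of iterating row by row.
sample_map = dict(zip(sample_data["id"], sample_data["name"]))
print(sample_map)
```

Later cells in this notebook also read `food_subgroup` from the same table, so restricting columns this way would only suit a cell that needs the lookup alone.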

Similarly, we need to map compound IDs to compound names.

In [None]:
# Build a lookup from compound id to compound name
with open(compoundFileName) as compoundFile:
    compoundData = pandas.read_csv(compoundFile, low_memory=False)
compoundMap = dict(zip(compoundData["id"], compoundData["name"]))
print(compoundData)

Build the list of FooDB subcategories (the `food_subgroup` column) and write it to a JSON file; these subcategories serve as our ingredient categories.

The top-level FooDB categories were not specific enough. For example, they do not differentiate between vegetables and mushrooms, whereas the subcategories do.

In [None]:
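# unique() keeps first-appearance order, so the seeded colours assigned later are
# reproducible; the order of a set of strings can change between interpreter runs.
subCategorySet = ingredientData["food_subgroup"].unique().tolist()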
with open(subCategorySetFilePath, "w") as outSubCategorySetFile:
    json.dump(subCategorySet, outSubCategorySetFile)
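
To illustrate the difference in granularity described above, a sketch along these lines lists the subgroups found under each group. The rows are hypothetical stand-ins rather than real FooDB records, and the `food_group` column name is an assumption about `Food.csv`.

```python
import pandas

# Hypothetical rows; only the column layout is meant to resemble Food.csv.
toy_foods = pandas.DataFrame({
    "name": ["Carrot", "Shiitake", "Oyster mushroom"],
    "food_group": ["Vegetables", "Vegetables", "Vegetables"],
    "food_subgroup": ["Root vegetables", "Mushrooms", "Mushrooms"],
})

# For every group, collect the distinct subgroups that appear under it.
subgroups_per_group = {
    group: sorted(rows["food_subgroup"].unique())
    for group, rows in toy_foods.groupby("food_group")
}
print(subgroups_per_group)
```

A group that maps to several subgroups, as in this toy table, is one where the coarser label would merge ingredients we want to keep apart.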

Create the ingredient-to-subcategory mapping, which is used to determine each ingredient's category.

In [None]:
# Map each ingredient name to its FooDB subcategory
subCategoryMap = dict(zip(ingredientData["name"], ingredientData["food_subgroup"]))
with open(subCategoryMapFilePath, "w") as outSubCategoryMapFile:
    json.dump(subCategoryMap, outSubCategoryMapFile)

Create a subcategory-to-colour map for use in the visualization notebook.

In [None]:
def random_colour():
    # Six random hexadecimal digits form an "#RRGGBB" colour string
    return "#" + "".join(random.choice("0123456789ABCDEF") for _ in range(6))

# One colour per subcategory
colourMap = {subCategory: random_colour() for subCategory in subCategorySet}
with open(catColourMapFilePath, "w") as outCatColourMapFile:
    json.dump(colourMap, outCatColourMapFile)

# Each ingredient inherits the colour of its subcategory
foodColourMap = {name: colourMap[subgroup]
                 for name, subgroup in zip(ingredientData["name"], ingredientData["food_subgroup"])}
with open(colourMapFilePath, "w") as outColourMapFile:
    json.dump(foodColourMap, outColourMapFile)

# Combined map covering both subcategory names and ingredient names
colourMap.update(foodColourMap)
with open(fullColourMapFilePath, "w") as outFullColourMapFile:
    json.dump(colourMap, outFullColourMapFile)
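
Random hex colours can land close together, which makes neighbouring categories hard to tell apart. One alternative, shown here only as a sketch and not what this notebook does, is to space hues evenly around the colour wheel with the standard-library `colorsys` module. The function name `evenly_spaced_colours`, the example labels, and the saturation and value settings are illustrative choices.

```python
import colorsys

def evenly_spaced_colours(labels, saturation=0.65, value=0.9):
    """Return {label: "#RRGGBB"} with hues spread evenly over the colour wheel."""
    labels = sorted(labels)  # fixed order so the result is reproducible
    colours = {}
    for position, label in enumerate(labels):
        hue = position / len(labels)
        red, green, blue = colorsys.hsv_to_rgb(hue, saturation, value)
        colours[label] = "#{:02X}{:02X}{:02X}".format(
            round(red * 255), round(green * 255), round(blue * 255)
        )
    return colours

# Hypothetical subcategory labels
example = evenly_spaced_colours(["Herbs", "Mushrooms", "Cabbages"])
print(example)
```

This approach needs no random seed, but the colour a label receives changes whenever the set of labels changes.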

Create the ingredient-to-compound mapping, which records the compounds found in each ingredient.

**This can take several minutes and generates 150 MB of data.**

In [None]:
# Build an edge list of (ingredient ID, compound name) pairs
with open(contentsFileName) as inFile:
    contentsData = pandas.read_csv(inFile, low_memory=False)
print(contentsData)

edgeList = [(row["food_id"], compoundMap[row["source_id"]])
            for index, row in contentsData.iterrows()
            if row["source_type"] == "Compound" and row["source_id"] in compoundMap]

# Create Adjacency List: ingredient name -> list of compound names
adjList = dict()
for foodId, compoundName in edgeList:
    if foodId not in ingredientMap:
        print("Could not find food in map", (foodId, compoundName))
        continue
    # Start an empty list the first time an ingredient is seen, then append
    adjList.setdefault(ingredientMap[foodId], []).append(compoundName)

# Write Out Adjacency List
with open(adjListFilePath, "w") as outAdjList:
    json.dump(adjList, outAdjList)
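
Iterating over the contents table one row at a time is the slow part of this step. As a sketch of a possibly faster route, the same kind of adjacency list can be assembled with column operations and a `groupby`. The tables below are tiny hypothetical stand-ins, all `toy_` names are made up for illustration, and the result should be checked against the loop version before it replaces anything.

```python
import pandas

# Hypothetical lookups and contents rows
toy_foods = {1: "Food A", 2: "Food B"}
toy_compounds = {10: "Compound X", 11: "Compound Y"}
toy_contents = pandas.DataFrame({
    "food_id": [1, 2, 2, 3, 1],
    "source_id": [10, 10, 11, 11, 99],
    "source_type": ["Compound", "Compound", "Compound", "Compound", "Nutrient"],
})

# Keep rows that reference a known compound and a known food.
known = toy_contents[
    (toy_contents["source_type"] == "Compound")
    & toy_contents["source_id"].isin(list(toy_compounds))
    & toy_contents["food_id"].isin(list(toy_foods))
]

# Translate ids to names, then collect compound names per food.
named = pandas.DataFrame({
    "food": known["food_id"].map(toy_foods),
    "compound": known["source_id"].map(toy_compounds),
})
toy_adj_list = named.groupby("food")["compound"].agg(list).to_dict()
print(toy_adj_list)
```

Unlike the loop, this version drops rows with an unknown food silently, so a separate check would be needed to report them.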

Create a default mapping from common ingredient names to FooDB ingredient names, used for database matching in Notebook 3.

**This can take several minutes**

In [None]:
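# Create default mapping: common name -> FooDB ingredient name
# (if a common name occurs on several rows, the last row wins)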
nameMap = {entry["orig_food_common_name"]:ingredientMap[entry["food_id"]]
           for index, entry in contentsData.iterrows()
           if entry["food_id"] in ingredientMap and entry["source_id"] in compoundMap}
with open(nameMapFilePath, "w") as outNameFile:
    json.dump(nameMap, outNameFile)
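
Because a dict keeps only the last value written for a key, a common name that appears under more than one `food_id` ends up pointing at whichever row came last. The sketch below, using hypothetical rows, shows one way to list such names so they can be reviewed before matching. It is not part of the original pipeline, and `toy_contents`, `foods_per_name` and `ambiguous_names` are illustrative names.

```python
import pandas

# Hypothetical rows with the two Content.csv columns used for the name map
toy_contents = pandas.DataFrame({
    "orig_food_common_name": ["Apple, raw", "Apple, raw", "Kale, raw"],
    "food_id": [1, 2, 3],
})

# Count how many distinct foods each common name is attached to.
foods_per_name = toy_contents.groupby("orig_food_common_name")["food_id"].nunique()

# Names attached to more than one food are ambiguous in a plain dict.
ambiguous_names = sorted(foods_per_name[foods_per_name > 1].index)
print(ambiguous_names)
```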