The code below wrangles the **"Australian Food Composition Database"**: https://www.foodstandards.gov.au/science/monitoringnutrients/afcd/Pages/downloadableexcelfiles.aspx

The database consists of five datasets:

- Food Details.
- Food Nutrient.
- Food Measure.
- Recipes File.
- Food Retention Factor.

The datasets are cleaned and prepared for use on our website, Food Save Hero.

The <u>primary key</u> that links the five datasets is the Public Food Key.
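
As a minimal sketch of how the Public Food Key can link two of the datasets, the example below joins two small hand-made frames. The values are copied from rows displayed later in this notebook; the frame names `measures_demo` and `nutrient_demo` are hypothetical, and the beer row has no match only because it was left out of the toy nutrient frame.

```python
import pandas as pd

# Toy subset of the measures data (weights copied from rows shown in this notebook).
measures_demo = pd.DataFrame({
    "Public Food Key": ["F000996", "F009774", "F009773", "F009766"],
    "Weight in grams": [379.0, 137.0, 164.0, 195.0],
})

# Toy subset of the nutrient data (protein values copied from rows shown in this notebook).
# F000996 is deliberately omitted here to show what an unmatched key looks like.
nutrient_demo = pd.DataFrame({
    "Public Food Key": ["F009774", "F009773", "F009766"],
    "Protein": [1.3, 1.1, 2.2],
})

# A left join keeps every measure row; indicator=True adds a "_merge" column
# that flags keys without a match in the nutrient frame.
merged = measures_demo.merge(
    nutrient_demo, on="Public Food Key", how="left", indicator=True
)
unmatched = merged.loc[merged["_merge"] == "left_only", "Public Food Key"].tolist()

print(merged)
print(unmatched)
```

Checking the `_merge` column after each join is a cheap way to spot keys that are missing from one of the files before the data reaches the website.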

In [127]:
#import required library
import pandas as pd

## 1. Food Measure

In [44]:
#### Data Cleaning 
measures=pd.read_excel('Release 1 - Measures file.xlsx')

In [45]:
measures.shape

(375, 13)

In [46]:
#view the first few rows
measures.head(2)

,Public Food Key,Food Survey ID,Food Name,Measure ID,Quantity,Measure description 1,Measure description 2,Measure description 3,Measure description 4,Weight in grams,Volume in mLs,Derivation,Measure derivation description
0,F000996,29101,"Beer, high alcohol (5% v/v & above)",74551,1,can,,,,379.0,375,Estimated,Assumes 375mL based on FSANZ internet search. ...
1,F000994,29101,"Beer, full strength (alcohol 4-4.9% v/v)",74554,1,can,,,,379.0,375,Estimated,Assumes 375mL based on FSANZ internet search. ...


In [49]:
#drop unwanted columns
measures=measures.drop(['Measure description 2', 'Measure description 3','Measure description 4',
               'Derivation','Measure derivation description'], axis=1)

In [50]:
measures

,Public Food Key,Food Survey ID,Food Name,Measure ID,Quantity,Measure description 1,Weight in grams,Volume in mLs
0,F000996,29101,"Beer, high alcohol (5% v/v & above)",74551,1,can,379.0,375
1,F000994,29101,"Beer, full strength (alcohol 4-4.9% v/v)",74554,1,can,379.0,375
2,F000995,29101,"Beer, full strength (alcohol 4-4.9% v/v), carb...",74972,1,can,379.0,375
3,F001006,29102,"Beer, mid-strength (alcohol 3-3.9% v/v)",74971,1,can,379.0,375
4,F001004,29102,"Beer, light (alcohol 1- <3% v/v)",74558,1,can,379.0,375
...,...,...,...,...,...,...,...,...
370,F009775,24702,"Zucchini, green skin, fresh, unpeeled, raw",74966,1,zucchini,195.0,0
371,F009774,24702,"Zucchini, green skin, fresh, unpeeled, fried, ...",74967,1,zucchini,137.0,0
372,F009773,24702,"Zucchini, green skin, fresh, unpeeled, boiled,...",74968,1,zucchini,164.0,0
373,F009766,24702,"Zucchini, golden, fresh, unpeeled, raw",74969,1,zucchini,195.0,0


#pip install xlrd

## 2. Food Nutrient

This dataset lists foods together with their nutrient values. It can be used to show website users the nutrient information for each food. The final version of the file contains 178 nutrient columns, which can be ranked by importance and relevance to the user when the dataset is in use.
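
One way to apply such a ranking is to reorder the columns so that the identifier columns come first, then the nutrients judged most relevant, then everything else. The sketch below assumes the three identifier columns are `Public Food Key`, `Classification` and `Food Name` (which appears consistent with 181 columns leaving 178 nutrient columns); the `priority` list is hypothetical and not taken from the source.

```python
import pandas as pd

# Toy frame using a few of the real column names; values copied from rows shown in this notebook.
nutrient_demo = pd.DataFrame({
    "Public Food Key": ["F002258", "F002893"],
    "Classification": [31302.0, 31302.0],
    "Food Name": ["Cardamom seed, dried, ground", "Chilli (chili), dried, ground"],
    "Moisture (water)": [8.3, 10.8],
    "Protein": [10.8, 13.4],
    "Nitrogen": [1.72, 2.14],
    "Total Fat": [6.7, 14.3],
})

# Assumed identifier columns (everything else is treated as a nutrient column).
id_cols = ["Public Food Key", "Classification", "Food Name"]

# Hypothetical ranking: nutrients to show first, in order of importance.
priority = ["Protein", "Total Fat"]

nutrient_cols = [c for c in nutrient_demo.columns if c not in id_cols]
rest = [c for c in nutrient_cols if c not in priority]

# Reorder: identifiers, then ranked nutrients, then the remaining nutrients.
ranked = nutrient_demo[id_cols + priority + rest]

print(len(nutrient_cols))
print(list(ranked.columns))
```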


In [108]:
#open the file and specify the required sheet 
nutrient = pd.read_excel('Release 1 - Food nutrient database.xlsx', 
                                sheet_name='All solids & liquids per 100g')

In [112]:
#view the shape for the dataframe 
nutrient.shape

(1535, 252)

In [117]:
nutrient.head(2)

,Public Food Key,Classification,Food Name,"Energy, with dietary fibre","Energy, without dietary fibre",Moisture (water),Protein,Nitrogen,Total Fat,Ash,...,Proline,Unnamed: 243,Serine,Unnamed: 245,Threonine,Unnamed: 247,Tyrosine,Unnamed: 249,Valine,Unnamed: 251
1,F002258,31302.0,"Cardamom seed, dried, ground",1236,1012,8.3,10.8,1.72,6.7,5.8,...,,,,,,,,,,
2,F002893,31302.0,"Chilli (chili), dried, ground",1280,1002,10.8,13.4,2.14,14.3,11.8,...,,,,,,,,,,


In [116]:
#drop the first row (index label 0)
nutrient = nutrient.drop(0)

In [118]:
#drop columns with NaN as all values 
nutrient=nutrient.dropna(axis='columns',how='all')

In [119]:
nutrient

,Public Food Key,Classification,Food Name,"Energy, with dietary fibre","Energy, without dietary fibre",Moisture (water),Protein,Nitrogen,Total Fat,Ash,...,Proline,Unnamed: 243,Serine,Unnamed: 245,Threonine,Unnamed: 247,Tyrosine,Unnamed: 249,Valine,Unnamed: 251
1,F002258,31302.0,"Cardamom seed, dried, ground",1236,1012,8.3,10.8,1.72,6.7,5.8,...,,,,,,,,,,
2,F002893,31302.0,"Chilli (chili), dried, ground",1280,1002,10.8,13.4,2.14,14.3,11.8,...,,,,,,,,,,
3,F002963,31302.0,"Cinnamon, dried, ground",1004,579,10.6,4,0.64,1.2,3.6,...,,,,,,,,,,
4,F002970,31302.0,"Cloves, dried, ground",1389,1118,9.9,6,0.96,13,5.6,...,,,,,,,,,,
5,F003190,31302.0,"Coriander seed, dried, ground",1344,1009,8.9,12.4,1.98,17.8,6,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1530,F009774,24702.0,"Zucchini, green skin, fresh, unpeeled, fried, ...",92,80,92.6,1.3,0.21,0.4,0.6,...,,32,,81,,39,,28,,53
1531,F009773,24702.0,"Zucchini, green skin, fresh, unpeeled, boiled,...",77,66,93.8,1.1,0.17,0.4,0.5,...,,26,,67,,33,,24,,44
1532,F009766,24702.0,"Zucchini, golden, fresh, unpeeled, raw",78,70,94.1,2.2,0.35,0.3,0.6,...,154,54,393,138,191,67,138,49,259,91
1533,F009765,24702.0,"Zucchini, golden, fresh, unpeeled, fried, no a...",111,100,91.6,3.1,0.5,0.4,0.9,...,,77,,198,,96,,69,,130


In [120]:
#drop unnamed columns
#build the list of columns whose names start with 'Unnamed'
dropList = [i for i in nutrient.columns if i.startswith('Unnamed')]

In [121]:
#drop the columns listed in dropList
nutrient.drop(dropList,axis=1,inplace=True)

In [122]:
#view the shape of the dataframe
nutrient.shape

(1534, 181)
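
The three cleaning steps above (drop row 0, drop all-NaN columns, drop the `Unnamed` columns) can also be gathered into one function that returns a new frame instead of modifying the input in place. This is only a sketch: the function name `clean_nutrient` is hypothetical, and the toy frame uses placeholder values, including the content of row 0.

```python
import numpy as np
import pandas as pd

def clean_nutrient(df):
    """Apply the three cleaning steps and return a new frame (no in-place edits)."""
    out = df.drop(index=0)                        # drop the row with index label 0
    out = out.dropna(axis="columns", how="all")   # drop columns that are entirely NaN
    # Keep only columns whose name does not start with 'Unnamed';
    # str() guards against non-string column labels.
    keep = [c for c in out.columns if not str(c).startswith("Unnamed")]
    return out[keep]

# Toy frame with placeholder values; row 0 stands in for the row dropped above.
raw = pd.DataFrame({
    "Public Food Key": [np.nan, "F002258", "F002893"],
    "Protein": ["g", 10.8, 13.4],
    "Unnamed: 2": [np.nan, 1.0, 2.0],       # has values, so dropna alone keeps it
    "Empty": [np.nan, np.nan, np.nan],      # all NaN, removed by dropna
})

cleaned = clean_nutrient(raw)
print(cleaned.shape)
print(list(cleaned.columns))
```

Returning a new frame leaves `raw` untouched, so the cell can be re-run in any order without dropping rows twice.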

In [125]:
nutrient.head(2)

,Public Food Key,Classification,Food Name,"Energy, with dietary fibre","Energy, without dietary fibre",Moisture (water),Protein,Nitrogen,Total Fat,Ash,...,Isoleucine,Leucine,Lysine,Methionine,Phenylalanine,Proline,Serine,Threonine,Tyrosine,Valine
1,F002258,31302.0,"Cardamom seed, dried, ground",1236,1012,8.3,10.8,1.72,6.7,5.8,...,,,,,,,,,,
2,F002893,31302.0,"Chilli (chili), dried, ground",1280,1002,10.8,13.4,2.14,14.3,11.8,...,,,,,,,,,,
