# Exploratory Data Analysis for the Food Nutrients Data

## Description

Clean the FDC datasets fetched via the API, which contain nutritional information about various food products, and explore them.

## Table of Contents

## Results summary

## Imports

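```python
import os
import sys

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Make the project root importable so that the `src` package can be found.
sys.path.append(os.path.abspath('..'))

# Data helpers: load_data, choose_foods, add_food_type, merge_data, save_final_datasets, ...
from src.data.utils import *
```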

## Data source

The data was obtained from [FoodData Central](https://fdc.nal.usda.gov/) (U.S. Department of Agriculture). The database contains nutritional information about various products and is divided into several data types (Foundation Foods, the most recent; SR Legacy; Survey; and Branded Foods). The data was fetched from the SR Legacy data type because it contains more products than Foundation Foods. A more complete description of the database and its contents can be found in the [Foundation Foods documentation](https://fdc.nal.usda.gov/docs/Foundation_Foods_Documentation_Apr2023.pdf).
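How the data was fetched is not shown in this notebook. As a rough sketch, assuming the FDC REST API's `/v1/foods/search` endpoint with its `query`, `dataType`, `pageSize` and `pageNumber` parameters, an SR Legacy search and the flattening of each food's nested nutrient list could look like the code below. The response field names (`foods`, `description`, `foodNutrients`, `nutrientName`, `unitName`, `value`) are assumed, and `build_search_url`, `fetch_foods` and `flatten_foods` are hypothetical helpers, not functions from `src.data.utils`. Real requests need an FDC API key.

```python
import json
import urllib.parse
import urllib.request

# Assumed FDC search endpoint (REST API v1).
SEARCH_URL = "https://api.nal.usda.gov/fdc/v1/foods/search"


def build_search_url(query: str, api_key: str, page_size: int = 200, page_number: int = 1) -> str:
    """Build a search URL restricted to the SR Legacy data type (hypothetical helper)."""
    params = {
        "api_key": api_key,
        "query": query,
        "dataType": "SR Legacy",
        "pageSize": page_size,
        "pageNumber": page_number,
    }
    return f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"


def fetch_foods(query: str, api_key: str) -> dict:
    """Download one page of search results as parsed JSON (requires network access)."""
    with urllib.request.urlopen(build_search_url(query, api_key)) as resp:
        return json.load(resp)


def flatten_foods(response: dict, nutrients: dict[str, str]) -> list[dict]:
    """Turn the nested response into one flat record per food.

    `nutrients` maps an FDC nutrient name to the unit to keep, e.g.
    {"Energy": "KCAL"}, so that a nutrient reported in several units
    (energy in kcal and kJ) yields a single column.
    """
    rows = []
    for food in response.get("foods", []):
        row = {"Description": food.get("description")}
        for nutrient in food.get("foodNutrients", []):
            name, unit = nutrient.get("nutrientName"), nutrient.get("unitName")
            if name in nutrients and unit == nutrients[name]:
                row[name] = nutrient.get("value")
        rows.append(row)
    return rows
```

The records returned by `flatten_foods` can be passed straight to `pd.DataFrame(...)`.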

## Outline of data being explored

There are 8 datasets stored as .csv files in the /FoodDataAnalysis/data/raw/ directory.

Each of them contains data for a particular kind of food (fish, vegetable, fruit, etc.). \
Products in each food category are characterized by 7 variables:
- Energy (kcal)
- Protein (g)
- Fat (g)
- Carbohydrate (g)
- Potassium (mg)
- Calcium (mg)
- Magnesium (mg)
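In the merged data these seven variables appear under their FDC nutrient names (for example "Total lipid (fat)" or "Potassium, K"). A minimal sketch of mapping those names to the short labels with units listed above, using a hypothetical `NUTRIENT_LABELS` dict and `label_nutrients` helper:

```python
import pandas as pd

# FDC nutrient column names -> short labels with units (hypothetical mapping).
NUTRIENT_LABELS = {
    "Energy": "Energy (kcal)",
    "Protein": "Protein (g)",
    "Total lipid (fat)": "Fat (g)",
    "Carbohydrate, by difference": "Carbohydrate (g)",
    "Potassium, K": "Potassium (mg)",
    "Calcium, Ca": "Calcium (mg)",
    "Magnesium, Mg": "Magnesium (mg)",
}


def label_nutrients(df: pd.DataFrame) -> pd.DataFrame:
    """Check that all seven nutrient columns exist, then rename them."""
    missing = set(NUTRIENT_LABELS) - set(df.columns)
    if missing:
        raise KeyError(f"missing nutrient columns: {sorted(missing)}")
    return df.rename(columns=NUTRIENT_LABELS)
```

Failing loudly on a missing column catches a dataset that was fetched with a different nutrient list before it silently propagates into the analysis.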

Datasets ready for analysis are stored in the /FoodDataAnalysis/data/processed/ directory.
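The processed files can be read back as one DataFrame per category. This sketch assumes one CSV per category, named after it (e.g. `vegetable.csv`); that naming and the `load_category_csvs` helper are hypothetical and not the project's `load_data`.

```python
from pathlib import Path

import pandas as pd


def load_category_csvs(directory: str) -> dict[str, pd.DataFrame]:
    """Read every .csv in `directory` into a dict keyed by file name without extension."""
    frames = {}
    for path in sorted(Path(directory).glob("*.csv")):
        # index_col=0 treats the first column as the index, which avoids an extra
        # "Unnamed: 0" column when the file was written with pandas' default index.
        frames[path.stem] = pd.read_csv(path, index_col=0)
    return frames
```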

## Analysis

### Read & clean the data 

```python
data = load_data('../data/raw/')
# Datasets obtained via the API contain a lot of unnecessary information
# (e.g. highly processed foods) that needs to be filtered out.
data = choose_foods(data)
data = add_food_type(data)
food_df = merge_data(data)
# Remove the Food Distribution Program note from descriptions and trim leftover whitespace.
food_df['Description'] = (
    food_df['Description']
    .str.replace(r"\(Includes foods for USDA's Food Distribution Program\)", '', regex=True)
    .str.strip()
)
save_final_datasets(data, '../data/processed/')
```
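The helpers in `src.data.utils` are not shown here. Assuming `data` is a dict mapping category names to DataFrames (it may instead be a list), the tagging, merging and description cleanup could be sketched with the hypothetical functions below; they are illustrations, not the project's `add_food_type` or `merge_data`.

```python
import pandas as pd

# Note appended to some SR Legacy descriptions (matched as a regular expression).
FDP_NOTE = r"\(Includes foods for USDA's Food Distribution Program\)"


def add_category(frames: dict[str, pd.DataFrame]) -> dict[str, pd.DataFrame]:
    """Add a Category column holding each frame's dict key (hypothetical add_food_type)."""
    return {name: df.assign(Category=name) for name, df in frames.items()}


def merge_categories(frames: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Stack all categories into one frame with a fresh 0..n-1 index (hypothetical merge_data)."""
    return pd.concat(frames.values(), ignore_index=True)


def clean_descriptions(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the FDP note removed and surrounding whitespace stripped."""
    out = df.copy()
    out["Description"] = out["Description"].str.replace(FDP_NOTE, "", regex=True).str.strip()
    return out
```

Returning copies keeps the per-category frames unchanged, so they can still be saved separately after the merged frame is cleaned.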

### EDA

| | Description | Energy (kcal) | Protein (g) | Total lipid (fat) (g) | Carbohydrate, by difference (g) | Potassium, K (mg) | Calcium, Ca (mg) | Magnesium, Mg (mg) | Category |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Salsify, (vegetable oyster), raw | 82.0 | 3.30 | 0.20 | 18.60 | 380.0 | 60.0 | 23.0 | vegetable |
| 1 | Alfalfa seeds, sprouted, raw | 23.0 | 3.99 | 0.69 | 2.10 | 79.0 | 32.0 | 27.0 | vegetable |
| 2 | Amaranth leaves, raw | 23.0 | 2.46 | 0.33 | 4.02 | 611.0 | 215.0 | 55.0 | vegetable |
| 3 | Arrowhead, raw | 99.0 | 5.33 | 0.29 | 20.20 | 922.0 | 10.0 | 51.0 | vegetable |
| 4 | Arrowroot, raw | 65.0 | 4.24 | 0.20 | 13.40 | 454.0 | 6.0 | 25.0 | vegetable |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 194 | Winged beans, immature seeds, raw | 49.0 | 6.95 | 0.87 | 4.31 | 223.0 | 84.0 | 34.0 | vegetable |
| 195 | Yam, raw | 118.0 | 1.53 | 0.17 | 27.90 | 816.0 | 17.0 | 21.0 | vegetable |
| 196 | Yambean (jicama), raw | 38.0 | 0.72 | 0.09 | 8.82 | 150.0 | 12.0 | 12.0 | vegetable |
| 197 | Yardlong bean, raw | 47.0 | 2.80 | 0.40 | 8.35 | 240.0 | 50.0 | 44.0 | vegetable |
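A possible first EDA step, sketched under the assumption that `food_df` uses the FDC column names shown in the table: per-category medians and Spearman rank correlations between the nutrients. `NUTRIENTS`, `summarize_by_category` and `nutrient_correlations` are hypothetical names.

```python
import pandas as pd

# Nutrient columns as they appear in the merged DataFrame.
NUTRIENTS = [
    "Energy", "Protein", "Total lipid (fat)", "Carbohydrate, by difference",
    "Potassium, K", "Calcium, Ca", "Magnesium, Mg",
]


def summarize_by_category(food_df: pd.DataFrame) -> pd.DataFrame:
    """Median of each nutrient per food category (one row per category)."""
    return food_df.groupby("Category")[NUTRIENTS].median()


def nutrient_correlations(food_df: pd.DataFrame) -> pd.DataFrame:
    """Spearman rank correlations between nutrients across all products."""
    return food_df[NUTRIENTS].corr(method="spearman")
```

The correlation matrix could then be plotted with `sns.heatmap(nutrient_correlations(food_df), annot=True)`. Medians and rank correlations are used because they are less sensitive to extreme values than means and Pearson correlation.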


## Conclusions