# 🍊 Orange is the new black

* [Visualisation Plotly](#📉-Visualisation-Plotly)
    * [Répartition des jus d'orange par Nutriscore](#Répartition-des-jus-d'orange-par-Nutriscore)
    * [Ratio sucre / kcal](#Ratio-sucre-/-kilocalorie)
    * [Taux de sucre pour 100 gr](#Taux-de-sucre-pour-100gr)
* [Exploration](#🔍-Exploration)
    * [Marques les plus représentées](#Marques-les-plus-représentées)
    * [Les jus d'orange sans sucre ajouté, colorants et conservateurs](#Les-jus-d'orange-sans-sucre-ajouté,-sans-colorants,-sans-conservateurs)
    * [Jus d'orange avec nutriscore A](#Les-jus-d'orange-avec-Nutriscore-A)
    * [Jus d'orange avec le plus de vitamine C](#Les-jus-d'orange-contenant-le-plus-de-vitamine-C)

Import des librairies

In [21]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px

from sklearn.preprocessing import MinMaxScaler

import warnings
warnings.filterwarnings("ignore")

# 📉 Visualisation Plotly

In [108]:
path = '../data/product.csv'

def import_csv(path):
    # import csv file from cleaning notebook
    df = pd.read_csv(path)


In [109]:
df = import_csv(path)

# <b>Répartition des jus d'orange par Nutriscore</b>

In [23]:
# delete product with nutriscore = 0
df_nutriscore = df[df.nutriscore_grade != 0]

In [24]:
grades = df_nutriscore.nutriscore_grade.value_counts()

In [25]:
grades

c    286
b     37
e     18
d     13
a      2
Name: nutriscore_grade, dtype: int64

In [31]:
# visualisation
fig = px.bar(df_nutriscore.nutriscore_grade, labels={
                     "nutriscore_grade": "Nutriscore",
                     "count": "Nombre de produits",
                 },
                title="Répartition des jus d'orange par Nutriscore",
             category_orders={"nutriscore_grade":['a','b','c','d','e']})

fig.show()

# <b>Ratio sucre / kilocalorie</b>

In [32]:
# nouveau dataframe avec nos targets
df_visu1 = df[['sugars_100g', 'energy-kcal_100g']]

In [33]:
# standardisation de sugars_100g et energy-kcal_100g pour obtenir la même échelle
df_visu1 = MinMaxScaler().fit_transform(df_visu1)

In [34]:
# conversion du numpy array en dataframe
df_visu1 = pd.DataFrame(df_visu1, columns=['Sucre', 'Kcal'])

In [35]:
# visualisation
sucre = go.Histogram(
    x=df_visu1.Sucre,
    opacity=0.75,
    name = "Sucre",
    marker=dict(color='rgba(171, 50, 96, 0.6)'))
kcal = go.Histogram(
    x=df_visu1.Kcal,
    opacity=0.75,
    name = "Kcal",
    marker=dict(color='rgba(12, 50, 196, 0.6)'))

data = [sucre, kcal]
layout = go.Layout(barmode='overlay',
                   title='Ratio sucre / kilocalorie',
                   xaxis=dict(title='Ratio sucre / kcal'),
                   yaxis=dict( title='Nombres'))
fig = go.Figure(data=data, layout=layout)
fig.show()

# <b>Taux de sucre pour 100gr</b>

In [36]:
# delete product with sugars_100g = 0
df_sugars = df[df.sugars_100g != 0]

In [37]:
df_sugars['sugars_100g'].value_counts().sort_values(ascending=False)

8.7     79
10.0    51
9.0     42
11.0    23
8.0     17
        ..
7.2      1
10.6     1
6.0      1
10.6     1
7.9      1
Name: sugars_100g, Length: 77, dtype: int64

In [40]:
# create 5 bins starting with 0 up to 50
bins = np.arange(0, 30, 5)

# use pd.cut to create the bins
df['sugars_100g'] = pd.cut(df['sugars_100g'], bins, include_lowest=True)

# pd.cut creates an interval category which is sorted from lowest bin to the greatest bin
df['sugars_100g'].cat.categories

# count the values in each bin. Bins are sorted based on the occurance (from most populated to the least one)
agg = df['sugars_100g'].value_counts()

# sort the values according to the bins (`sort_index`), turn into data frame (`to_frame`) and reset index
agg = agg.sort_index().to_frame().reset_index()

# rename index (containing the bin range to bins)
agg.rename(columns={"index":"bins"}, inplace=True)

# Plotly cannot work with categories index, so we need to turn it into string
agg["bins"] = agg["bins"].astype("str")

agg

TypeError: '<' not supported between instances of 'int' and 'pandas._libs.interval.Interval'

In [41]:
#pie chart
fig = px.pie(agg, values='sugars_100g', names='bins', title="Répartition des produits avec taux de sucre entre 0,0001 et 50 grammes")
fig.show()

- Il y a 83 jus d'orange (soit 16 %) possédant un taux de sucre pour 100gr entre 0 et 5 gr
- La majorité (372 produits soit 72%) possède un taux de sucre pour 100gr  entre 5 et 10 gr

# 🔍 Exploration

# <b>Marques les plus représentées</b>

In [43]:
def marques_df(df):
    # define dataset
    df_brands = df
    # delete product with brands = 0
    df_brands = df_brands[df_brands.brands != 0]

    return df_brands['brands'].value_counts(ascending=False)


In [44]:
marques_df(df)

U                                                  38
Jafaden,Marque Repère                              21
Auchan                                             19
Leader Price                                       18
Paquito                                            13
                                                   ..
Monop' Daily,Monoprix                               1
Auchan,L'oiseau,Auchan Production,Groupe Auchan     1
Naturalia                                           1
Plein Fruit                                         1
Cidou,Casino                                        1
Name: brands, Length: 162, dtype: int64

<b>Les 5 marques de jus d'orange les plus representées dans l'échantillon sont des marques distributeurs : </b>
- U (Super U)
- Jafadan / Marque Repère (Leclerc)
- Auchan
- Leader Price
- Paquito (Intermarché)

<img src="../img/brands_logo.jpg">

# <b>Les jus d'orange sans sucre ajouté, sans colorants, sans conservateurs | BIO <b>

In [64]:
def products_bio(df):
    # define dataset with brands != 0
    df_juice = df[df.labels != 0]

    # isolate all rows with labels "Sans sucre ajouté, Sans colorants, Sans conservateurs"
    df_juice = df_juice[df_juice.labels.str.contains("Sans sucre ajouté, Sans colorants, Sans conservateurs", na=False)]

    return  df_juice

In [65]:
products_bio(df)

Unnamed: 0,product_name,quantity,nutrition-score-fr_100g,nutriscore_score,nutriscore_grade,brands,origins,ingredients_text,countries,labels,...,fat_100g,saturated-fat_100g,carbohydrates_100g,sugars_100g,sodium_100g,additives,vitamin-c_100g,nova_group,pnns_groups_1,pnns_groups_2
46,jus d'orange,50 cl,,,,Leclerc,,oranges,France,"Sans sucre ajouté, Sans colorants, Sans conser...",...,,,,,,,,1.0,Beverages,Fruit juices
73,jus orange sans pulpe,1 l,3.0,3.0,c,Andros,,Jus d'orange,France,"Sans sucre ajouté, Sans colorants, Sans conser...",...,0.1,0.0,9.1,"(5.0, 10.0]",0.004,,,1.0,Beverages,Fruit juices
75,100% pur jus oranges pressées,,5.0,5.0,c,Andros,,,France,"Sans sucre ajouté, Sans colorants, Sans conser...",...,0.1,0.0,11.0,"(10.0, 15.0]",0.0,,,,Beverages,Fruit juices
90,Joker le pur jus orange du matin 1L,1litre,4.0,4.0,c,Joker le pur jus,Brésil,,France,"Sans sucre ajouté, Sans colorants, Sans conser...",...,0.0,0.0,9.8,"(5.0, 10.0]",0.004,,,,Beverages,Fruit juices
135,Pressade pur jus orange,,,,,Pressade,,,France,"Sans sucre ajouté, Sans colorants, Sans conser...",...,0.0,0.0,10.2,"(5.0, 10.0]",,,,,Beverages,Fruit juices
137,Jus d'orange 100% pur jus bio,75 cL,,,,Pressade,,jus d'orange issu de l'agriculture biologique,France,"Bio, Bio européen, AB Agriculture Biologique, ...",...,0.1,,10.2,"(5.0, 10.0]",0.0,,0.02,1.0,Beverages,Fruit juices
138,Lot 6 Recre 100% jus d'orange 20CL,,10.0,10.0,e,,,,France,"Sans sucre ajouté, Sans colorants, Sans conser...",...,0.0,0.0,16.0,"(10.0, 15.0]",0.004,,,,Beverages,Fruit juices
144,Pur jus d'orange,1 l,3.0,3.0,c,U,Hors France,Jus d'orangE 100%.,France,"Peu ou pas de sucre, en:FSC, Sans sucre ajouté...",...,0.1,0.0,11.0,"(5.0, 10.0]",0.0,,0.02,1.0,Beverages,Fruit juices
149,Pur jus d'orange sans pulpe,1 l,4.0,4.0,c,"U Bio, U",Espagne,Jus d'orange sans pulpe*. *ingrédient issu de ...,France,"Bio, Bio européen, AB Agriculture Biologique, ...",...,0.5,0.1,11.0,"(5.0, 10.0]",0.004,,0.02,1.0,Beverages,Sweetened beverages
274,Pur jus d’orange bio,,13.0,13.0,e,Feria,,,France,"Bio, Sans sucre ajouté, Sans colorants, Sans c...",...,0.18,0.0,9.2,"(5.0, 10.0]",0.0,,,,unknown,unknown


<b>13 jus d'orange sans conservateurs, sans colorants, ni sucre ajouté</b>

# Les jus d'orange avec Nutriscore A

In [73]:
def nutriscore_A(df):
    # isolate all rows with nutriscore_grade = A
    df = df_nutriscore[df_nutriscore['nutriscore_grade'].str.contains('a', na=False)]

    return df

In [74]:
nutriscore_A(df)

Unnamed: 0,product_name,quantity,nutrition-score-fr_100g,nutriscore_score,nutriscore_grade,brands,origins,ingredients_text,countries,labels,...,fat_100g,saturated-fat_100g,carbohydrates_100g,sugars_100g,sodium_100g,additives,vitamin-c_100g,nova_group,pnns_groups_1,pnns_groups_2
178,"Pur jus d'orange sans pulpe flash pasteurisé, ...","1,5 l",-4.0,-4.0,a,U,oranges,Jus d'orange 100%,France,"DEMARCHE FLEG METIERS, Recyclé plastique",...,0.5,0.1,8.1,8.1,0.004,,0.02,1.0,unknown,unknown
426,Salade de fruit en jus à l'orange et citron de...,,-3.0,-3.0,a,I.D Fruits,,Fruits en proportion variable 80% (pomme (dont...,France,,...,0.5,0.1,14.0,13.0,0.004,,,4.0,Fruits and vegetables,Fruits


<b>Il y en a que de deux de jus d'orange avec un nutriscore A.</b>

# Les jus d'orange contenant le plus de vitamine C

In [104]:
def vitamine_C(df):
    # delete product with vitamin-c_100g = 0
    df_vit = df[df['vitamin-c_100g'] != 0]
    # delete product vitamin-c_100g > 0.039
    df_vit = df_vit[(df_vit['vitamin-c_100g'] > 0.039) & (df_vit['vitamin-c_100g'] < 24)]

    return df_vit['vitamin-c_100g'].value_counts(ascending=False)

In [105]:
vitamine_C(df)

0.0450    6
0.0400    2
0.0420    1
0.0500    1
0.0448    1
Name: vitamin-c_100g, dtype: int64

<b>Echantillons de 5 jus d'orange possédant entre 0,0420 et 0,0450 grammes de vitamine C pour 100 gr</b>

- 100 % pur jus orange de Franprix
- Premium 100% jus d'orange sanguine de Monoprix	
- Le Pur jus - Sans pulpe Jus d'orange de Joker
- Pur jus de fruit pressé 100% orange de Innocent
- Pur jus d'oranges sanguines pressées de Super U

<img src="../img/C.jpg">

# Les jus d'orange avec nutriscore E

In [38]:
df_E = df

In [39]:
# delete product with nutriscore = 0
df_E = df[df.nutriscore_grade != 0]

In [40]:
# convert sugars_100g to string values
df_E['nutriscore_grade']=df_E['nutriscore_grade'].astype(str)

In [41]:
# isolate all rows with nutriscore_grade = e
df_E = df_E[df_E["nutriscore_grade"].str.contains("e")]

In [42]:
df_E

Unnamed: 0,product_name,quantity,nutrition-score-fr_100g,nutriscore_score,nutriscore_grade,brands,origins,ingredients_text,countries,labels,...,fat_100g,saturated-fat_100g,carbohydrates_100g,sugars_100g,sodium_100g,additives,vitamin-c_100g,nova_group,pnns_groups_1,pnns_groups_2
31,jus d'orange 💯% fruits pressés,1L,12.0,12.0,e,joker,0,A,France,"Nutriscore, Nutriscore C",...,0.0,0.0,8.6,"(5.0, 10.0]",0.005,0.0,0.0,0.0,unknown,unknown
43,Trimm jus d'orange,0,14.0,14.0,e,Trimm,0,Jus d orange Teneur en fruits : 100 % Sans col...,France,"Sans colorants, Sans conservateurs",...,0.2,0.0,11.0,"(5.0, 10.0]",0.0,0.0,0.0,0.0,Beverages,Unsweetened beverages
60,Boisson aux jus d'orange et de citron,"0,2 L",11.0,11.0,e,Fruima,0,"Eau, jus d'orange à base de concentré 18%, suc...",France,"en:FSC, FSC Mix, FSC C105544",...,0.5,0.1,8.2,"(5.0, 10.0]",0.004,0.0,0.0,4.0,Beverages,Sweetened beverages
88,Le pur jus JOKER orange clémentine,0,14.0,14.0,e,joker,0,0,France,0,...,0.0,0.0,10.5,"(10.0, 15.0]",0.0,0.0,0.0,0.0,unknown,unknown
99,Nectar d'orange à base de jus concentré Teneur...,2 l,10.0,10.0,e,Casino,0,Jus d'orange à base de jus d'orange concentré ...,France,"Point Vert, Sans colorants, Sans conservateurs...",...,0.0,0.0,10.0,"(5.0, 10.0]",0.0,0.0,0.0,3.0,Beverages,Sweetened beverages
100,Nectar orange pêche abricot à base de jus et d...,1 l,13.0,13.0,e,Casino,0,Jus et purées de fruits à base de concentrés 5...,France,0,...,0.0,0.0,12.0,"(10.0, 15.0]",0.0,0.0,0.0,3.0,Beverages,Sweetened beverages
105,Boisson aux jus de fruits à base de concentrés...,0,13.0,13.0,e,"Casino,Les Doodingues",0,S. Eau - sucre - jus d'orange à base de concen...,France,0,...,0.0,0.0,10.0,"(5.0, 10.0]",0.0,0.0,0.0,4.0,Beverages,Sweetened beverages
128,Pur jus orange avec pulpe,0,12.0,12.0,e,Paquito,0,0,France,0,...,0.5,0.1,8.4,"(5.0, 10.0]",0.004,0.0,0.0,0.0,unknown,unknown
138,Lot 6 Recre 100% jus d'orange 20CL,0,10.0,10.0,e,0,0,0,France,"Sans sucre ajouté, Sans colorants, Sans conser...",...,0.0,0.0,16.0,"(10.0, 15.0]",0.004,0.0,0.0,0.0,Beverages,Fruit juices
266,Le jus orange carotte,0,11.0,11.0,e,Monoprix,0,0,France,Sans conservateurs,...,0.0,0.0,8.8,"(5.0, 10.0]",0.0,0.0,0.0,0.0,unknown,unknown
