## Briefly comment on the summary statistics table.
The values in the meat (`viande`) column are markedly higher than those in the other columns. In general, manual workers (MA) consume much more milk, especially in families with 4 children or fewer. Managers (CA), for their part, consume more meat, vegetables, fruit and poultry. For all 12 individuals, meat is the most expensive item.

In this table of 12 individuals, the values vary widely. The number of children does not systematically determine consumption: for example, manager CA3 (3 children) consumes more vegetables than manager CA4 (4 children), 843 versus 789. We also note that manager CA5 spends more on items other than bread.
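These comments can be checked directly with pandas. The sketch below is an illustration, not the notebook's own code. It rebuilds only the ten rows printed in the Load Dataset section (the last two individuals are not printed there, so they are left out). It also assumes that the first two letters of each identifier encode the group (MA, EM, CA).

```python
import pandas as pd

# The ten rows printed in the "Load Dataset" section (the last two rows are not shown there)
rows = {
    "MA2": [332, 428, 354, 1437, 526, 247, 427],
    "EM2": [293, 559, 388, 1527, 567, 239, 258],
    "CA2": [372, 767, 562, 1948, 927, 235, 433],
    "MA3": [406, 563, 341, 1507, 544, 324, 407],
    "EM3": [386, 608, 396, 1501, 568, 319, 363],
    "CA3": [438, 843, 689, 2345, 1148, 243, 341],
    "MA4": [534, 660, 367, 1620, 638, 414, 407],
    "EM4": [460, 699, 484, 1856, 762, 400, 416],
    "CA4": [385, 789, 621, 2366, 1149, 304, 282],
    "MA5": [655, 776, 423, 1848, 759, 495, 486],
}
columns = ["pain", "légumes", "fruits", "viande", "volaille", "lait", "vin"]
sample = pd.DataFrame.from_dict(rows, orient="index", columns=columns)

# Per-column summary statistics (count, mean, std, min, quartiles, max)
summary = sample.describe()
print(summary.round(1))

# Column with the largest average value
print(sample.mean().idxmax())

# Assumption: the first two characters of the identifier give the group
group = sample.index.str[:2]
milk_by_group = sample["lait"].groupby(group).mean()
print(milk_by_group.round(1))
```

On these ten rows, `viande` has by far the largest mean, and the MA group has the highest average for `lait`. Both results agree with the comments above. The two missing rows could still shift the group averages.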

### What can be said about the correlations between variables?
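- Meat – poultry (0.982): very strongly correlated.
- Meat – fruit (0.959): strongly correlated.
- Meat – vegetables (0.881): strongly correlated.
- Fruit – poultry (0.926): strongly correlated.
- Vegetables – fruit (0.856): correlated.

All these pairs are positively correlated. Bread and milk are also strongly correlated (0.856). Wine, by contrast, is negatively correlated with vegetables, fruit, meat and poultry (from -0.36 to -0.49) and almost uncorrelated with milk (0.007).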

In [None]:
corr = df.corr()
print(corr)

              pain   légumes    fruits    viande  volaille      lait       vin
pain      1.000000  0.593110  0.196139  0.321269  0.246814  0.855575  0.303761
légumes   0.593110  1.000000  0.856250  0.881081  0.827300  0.662799 -0.356468
fruits    0.196139  0.856250  1.000000  0.959477  0.926396  0.332189 -0.486281
viande    0.321269  0.881081  0.959477  1.000000  0.981688  0.374591 -0.437235
volaille  0.246814  0.827300  0.926396  0.981688  1.000000  0.232489 -0.401854
lait      0.855575  0.662799  0.332189  0.374591  0.232489  1.000000  0.006880
vin       0.303761 -0.356468 -0.486281 -0.437235 -0.401854  0.006880  1.000000
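The strong pairs can also be extracted programmatically instead of read off the matrix by eye. The sketch below copies the correlation matrix printed above, rounded to 4 decimals. It keeps every pair of distinct variables whose |r| reaches a threshold. The helper name `strong_pairs` and the 0.85 threshold are illustrative choices, not part of the original notebook.

```python
from itertools import combinations

import pandas as pd

cols = ["pain", "légumes", "fruits", "viande", "volaille", "lait", "vin"]
# Correlation matrix printed above, rounded to 4 decimals
values = [
    [1.0000, 0.5931, 0.1961, 0.3213, 0.2468, 0.8556, 0.3038],
    [0.5931, 1.0000, 0.8563, 0.8811, 0.8273, 0.6628, -0.3565],
    [0.1961, 0.8563, 1.0000, 0.9595, 0.9264, 0.3322, -0.4863],
    [0.3213, 0.8811, 0.9595, 1.0000, 0.9817, 0.3746, -0.4372],
    [0.2468, 0.8273, 0.9264, 0.9817, 1.0000, 0.2325, -0.4019],
    [0.8556, 0.6628, 0.3322, 0.3746, 0.2325, 1.0000, 0.0069],
    [0.3038, -0.3565, -0.4863, -0.4372, -0.4019, 0.0069, 1.0000],
]
corr = pd.DataFrame(values, index=cols, columns=cols)


def strong_pairs(corr: pd.DataFrame, threshold: float = 0.85):
    """Return (var_a, var_b, r) for distinct pairs with |r| >= threshold, strongest first."""
    # combinations() visits each unordered pair of columns exactly once
    pairs = [
        (a, b, float(corr.loc[a, b]))
        for a, b in combinations(corr.columns, 2)
        if abs(corr.loc[a, b]) >= threshold
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


for a, b, r in strong_pairs(corr):
    print(f"{a} - {b}: {r:.3f}")
```

With the 0.85 threshold, the helper also picks up the bread-milk pair (`pain` - `lait`). Passing a higher value such as `threshold=0.95` keeps only the tightest pairs.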


### Import packages


In [None]:
import numpy as np
import pandas as pd
import sklearn as skl
import matplotlib.pyplot as plt

### Load Dataset


In [None]:
df = pd.read_excel('./Dataset_TP_ACP.xlsx', index_col=0)
df

| IDEN | pain | légumes | fruits | viande | volaille | lait | vin |
|:---|---:|---:|---:|---:|---:|---:|---:|
| MA2 | 332 | 428 | 354 | 1437 | 526 | 247 | 427 |
| EM2 | 293 | 559 | 388 | 1527 | 567 | 239 | 258 |
| CA2 | 372 | 767 | 562 | 1948 | 927 | 235 | 433 |
| MA3 | 406 | 563 | 341 | 1507 | 544 | 324 | 407 |
| EM3 | 386 | 608 | 396 | 1501 | 568 | 319 | 363 |
| CA3 | 438 | 843 | 689 | 2345 | 1148 | 243 | 341 |
| MA4 | 534 | 660 | 367 | 1620 | 638 | 414 | 407 |
| EM4 | 460 | 699 | 484 | 1856 | 762 | 400 | 416 |
| CA4 | 385 | 789 | 621 | 2366 | 1149 | 304 | 282 |
| MA5 | 655 | 776 | 423 | 1848 | 759 | 495 | 486 |


### Create the matrix

In [None]:
M = df.values
M.shape  # (number of individuals, number of variables)

(12, 7)

### Calculate the mean and standard deviation

In [None]:
mean = np.round(np.mean(M, axis=0), 1)  # axis=0: one mean per column (variable)
mean

array([ 446.7,  732. ,  505. , 1886.8,  804. ,  358.2,  368.6])

In [None]:
std_dev = np.round(np.std(M, axis=0), 1)  # axis=0: one standard deviation per column
std_dev

array([102.6, 181.1, 158.1, 378.9, 238.1, 112.1,  68.7])
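One detail is worth checking here. By default, `np.std` divides by n (population standard deviation, `ddof=0`), whereas pandas' `Series.std` / `DataFrame.std` divide by n - 1 (`ddof=1`). Standardized PCA conventionally uses the 1/n version, which is what makes (1/n) ZᵀZ equal to the correlation matrix later on. The sketch below uses the first four `pain` values purely for illustration. It shows that the two conventions differ by a factor of sqrt(n / (n - 1)).

```python
import numpy as np
import pandas as pd

x = np.array([332.0, 293.0, 372.0, 406.0])  # illustrative values only
n = x.size

std_numpy = np.std(x)            # ddof=0: divides by n
std_pandas = pd.Series(x).std()  # ddof=1: divides by n - 1

print(std_numpy, std_pandas)
# The ratio between the two conventions is sqrt(n / (n - 1))
print(np.isclose(std_pandas / std_numpy, np.sqrt(n / (n - 1))))
```

With 12 individuals the gap is small (about 4 %), but it explains why a standardization done with pandas defaults would not reproduce the values above exactly.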

### The data are considered heterogeneous: centering and scaling (standardization)

#### Centering

In [None]:
M_center = np.round(M - mean,2)
M_center


array([[-114.7, -304. , -151. , -449.8, -278. , -111.2,   58.4],
       [-153.7, -173. , -117. , -359.8, -237. , -119.2, -110.6],
       [ -74.7,   35. ,   57. ,   61.2,  123. , -123.2,   64.4],
       [ -40.7, -169. , -164. , -379.8, -260. ,  -34.2,   38.4],
       [ -60.7, -124. , -109. , -385.8, -236. ,  -39.2,   -5.6],
       [  -8.7,  111. ,  184. ,  458.2,  344. , -115.2,  -27.6],
       [  87.3,  -72. , -138. , -266.8, -166. ,   55.8,   38.4],
       [  13.3,  -33. ,  -21. ,  -30.8,  -42. ,   41.8,   47.4],
       [ -61.7,   57. ,  116. ,  479.2,  345. ,  -54.2,  -86.6],
       [ 208.3,   44. ,  -82. ,  -38.8,  -45. ,  136.8,  117.4],
       [ 137.3,  263. ,   43. ,  169.2,   89. ,  159.8,  -49.6],
       [  68.3,  365. ,  382. ,  743.2,  363. ,  202.8,  -84.6]])
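`M - mean` relies on NumPy broadcasting: the vector of column means, of shape (7,), is subtracted from every row of the (12, 7) matrix. After exact centering, each column sums to zero. Because `mean` was rounded to one decimal above, the columns of `M_center` sum only approximately to zero. The minimal check below uses a synthetic 12 × 7 integer matrix as a stand-in for `M`.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.integers(200, 2500, size=(12, 7)).astype(float)  # synthetic stand-in for M

# Exact centering: broadcasting subtracts the (7,) mean vector from each row
M_center = M - M.mean(axis=0)
print(np.allclose(M_center.sum(axis=0), 0.0))

# Centering with means rounded to 1 decimal leaves a small residual mean per column
M_center_rounded = M - np.round(M.mean(axis=0), 1)
print(np.abs(M_center_rounded.mean(axis=0)).max())  # bounded by the rounding step, 0.05
```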

#### Scaling (reduction)

In [None]:
M_center_reduc = np.round((M - mean)/std_dev, 2)
M_center_reduc

array([[-1.12, -1.68, -0.96, -1.19, -1.17, -0.99,  0.85],
       [-1.5 , -0.96, -0.74, -0.95, -1.  , -1.06, -1.61],
       [-0.73,  0.19,  0.36,  0.16,  0.52, -1.1 ,  0.94],
       [-0.4 , -0.93, -1.04, -1.  , -1.09, -0.31,  0.56],
       [-0.59, -0.68, -0.69, -1.02, -0.99, -0.35, -0.08],
       [-0.08,  0.61,  1.16,  1.21,  1.44, -1.03, -0.4 ],
       [ 0.85, -0.4 , -0.87, -0.7 , -0.7 ,  0.5 ,  0.56],
       [ 0.13, -0.18, -0.13, -0.08, -0.18,  0.37,  0.69],
       [-0.6 ,  0.31,  0.73,  1.26,  1.45, -0.48, -1.26],
       [ 2.03,  0.24, -0.52, -0.1 , -0.19,  1.22,  1.71],
       [ 1.34,  1.45,  0.27,  0.45,  0.37,  1.43, -0.72],
       [ 0.67,  2.02,  2.42,  1.96,  1.52,  1.81, -1.23]])
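The same standardization can be written without rounding the intermediate results, which keeps the later steps exact. The sketch below runs on a synthetic stand-in for `M`, and the function name `standardize` is hypothetical. After standardization, each column of Z has mean 0 and population standard deviation 1.

```python
import numpy as np


def standardize(M: np.ndarray) -> np.ndarray:
    """Center each column and divide it by its population standard deviation (ddof=0)."""
    return (M - M.mean(axis=0)) / M.std(axis=0)


rng = np.random.default_rng(1)
M = rng.normal(loc=800, scale=200, size=(12, 7))  # synthetic stand-in for M
Z = standardize(M)

print(np.allclose(Z.mean(axis=0), 0.0), np.allclose(Z.std(axis=0), 1.0))
```

A column with zero variance would make this division fail (division by zero), so a real pipeline might check `M.std(axis=0) > 0` first.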

In [None]:
# Correlation matrix
# By the formula: V = (1/n) Zᵀ Z, with Z = M_center_reduc

V = np.round((1/M_center_reduc.shape[0]) * M_center_reduc.T @ M_center_reduc,2)
V

array([[ 1.  ,  0.6 ,  0.2 ,  0.32,  0.25,  0.86,  0.3 ],
       [ 0.6 ,  1.  ,  0.86,  0.88,  0.83,  0.66, -0.36],
       [ 0.2 ,  0.86,  1.  ,  0.96,  0.93,  0.33, -0.49],
       [ 0.32,  0.88,  0.96,  1.  ,  0.98,  0.38, -0.44],
       [ 0.25,  0.83,  0.93,  0.98,  1.  ,  0.23, -0.4 ],
       [ 0.86,  0.66,  0.33,  0.38,  0.23,  1.  ,  0.01],
       [ 0.3 , -0.36, -0.49, -0.44, -0.4 ,  0.01,  1.  ]])

In [None]:
# Correlation matrix
# Using np.corrcoef (rowvar=False: each column is a variable)
M_corr = np.round(np.corrcoef(M_center_reduc, rowvar=False),2)
M_corr

array([[ 1.  ,  0.59,  0.2 ,  0.32,  0.25,  0.86,  0.3 ],
       [ 0.59,  1.  ,  0.86,  0.88,  0.83,  0.66, -0.36],
       [ 0.2 ,  0.86,  1.  ,  0.96,  0.93,  0.33, -0.49],
       [ 0.32,  0.88,  0.96,  1.  ,  0.98,  0.38, -0.44],
       [ 0.25,  0.83,  0.93,  0.98,  1.  ,  0.23, -0.4 ],
       [ 0.86,  0.66,  0.33,  0.38,  0.23,  1.  ,  0.01],
       [ 0.3 , -0.36, -0.49, -0.44, -0.4 ,  0.01,  1.  ]])
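The two matrices agree except for one entry: 0.6 versus 0.59 for `pain` - `légumes`. The gap is most likely a side effect of rounding the mean, the standard deviation and the standardized matrix before multiplying. Without intermediate rounding, (1/n) ZᵀZ and `np.corrcoef` coincide to machine precision. The sketch below checks this on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(loc=500, scale=150, size=(12, 7))  # synthetic stand-in for M
n = M.shape[0]

Z = (M - M.mean(axis=0)) / M.std(axis=0)  # standardized, no rounding
V = Z.T @ Z / n                           # correlation matrix by the formula

print(np.allclose(V, np.corrcoef(M, rowvar=False)))  # True
print(np.allclose(np.diag(V), 1.0))                  # unit diagonal
```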

In [None]:
# Eigenvalues and eigenvectors of the correlation matrix
valeurs_propres, vecteurs_propres = np.linalg.eig(M_corr)
valeurs_propres, vecteurs_propres = np.round(valeurs_propres,2), np.round(vecteurs_propres,2)
valeurs_propres, vecteurs_propres

(array([ 4.34,  1.83,  0.63,  0.12, -0.  ,  0.02,  0.06]),
 array([[ 0.24,  0.62, -0.01, -0.56,  0.08,  0.49,  0.04],
        [ 0.47,  0.1 , -0.06, -0.02,  0.16, -0.33, -0.8 ],
        [ 0.45, -0.2 ,  0.15,  0.52, -0.1 ,  0.67, -0.08],
        [ 0.46, -0.14,  0.2 , -0.02,  0.68, -0.22,  0.46],
        [ 0.44, -0.2 ,  0.36, -0.33, -0.66, -0.25,  0.18],
        [ 0.28,  0.53, -0.44,  0.46, -0.25, -0.29,  0.32],
        [-0.21,  0.48,  0.78,  0.31,  0.02, -0.13, -0.07]]))
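A correlation matrix is symmetric, so `np.linalg.eigh` is a natural alternative to `np.linalg.eig`. It returns real eigenvalues in ascending order and orthonormal eigenvectors. Working on the unrounded correlation matrix should also avoid the small negative eigenvalue (`-0.`) seen above, which likely comes from rounding `M_corr` to two decimals: a rounded matrix is no longer guaranteed to be positive semi-definite. The sketch below uses a hypothetical helper `sorted_eigh` and synthetic data.

```python
import numpy as np


def sorted_eigh(R: np.ndarray):
    """Eigen-decomposition of a symmetric matrix, sorted by decreasing eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(R)  # ascending order, orthonormal columns
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


rng = np.random.default_rng(3)
M = rng.normal(size=(12, 7))  # synthetic stand-in for the data
R = np.corrcoef(M, rowvar=False)

lam, U = sorted_eigh(R)
print(np.round(lam, 3))
print(np.isclose(lam.sum(), R.shape[0]))  # trace of a correlation matrix = number of variables
```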

In [None]:
# Study of the axes

# Sort the eigenvalues in descending order
idx = valeurs_propres.argsort()[::-1]

valeurs_propres_ordre = valeurs_propres[idx]
valeurs_propres_ordre

array([ 4.34,  1.83,  0.63,  0.12,  0.06,  0.02, -0.  ])

In [None]:
# Total inertia: sum of the eigenvalues (equals 7, the number of standardized variables)
Inertie = float(np.sum(valeurs_propres_ordre))
Inertie

6.999999999999999

In [None]:
# Choice of axis 1: share of the total inertia carried by the first axis
rate_axe1 = float(round(valeurs_propres_ordre[0]/Inertie, 2)) * 100
f"Inertia share, axis 1: {rate_axe1}%"

'Inertia share, axis 1: 62.0%'

In [None]:
# Choice of axis 2: cumulative share of the first two axes
rate_axe2 = float(round((valeurs_propres_ordre[0]+valeurs_propres_ordre[1])/Inertie, 2)) * 100
f"Inertia share, axes 1-2: {rate_axe2}% >>>>> SUFFICIENT"

'Inertia share, axes 1-2: 88.0% >>>>> SUFFICIENT'
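The same choice can be tabulated for every axis. The sketch below reuses the sorted eigenvalues printed above (4.34, 1.83, 0.63, 0.12, 0.06, 0.02, 0) and computes each axis' share of the inertia and the cumulative share. It also applies Kaiser's rule, a common heuristic that the original notebook does not use. The rule keeps the axes whose eigenvalue exceeds 1, which is the average eigenvalue for standardized data. Here it agrees with the two-axis choice. The exact cumulative share of the first two axes is 6.17 / 7, about 88.1 %. The 88.0 % above comes from rounding the ratio to two decimals before multiplying by 100.

```python
import numpy as np

# Sorted eigenvalues printed above (the "-0." is treated as 0)
eigenvalues = np.array([4.34, 1.83, 0.63, 0.12, 0.06, 0.02, 0.0])

inertia = eigenvalues.sum()      # 7 for 7 standardized variables
share = eigenvalues / inertia    # share of inertia per axis
cumulative = np.cumsum(share)    # cumulative share

for k, (s, c) in enumerate(zip(share, cumulative), start=1):
    print(f"Axis {k}: {s:6.1%}  cumulative {c:6.1%}")

# Kaiser's rule (assumption: not part of the original analysis)
n_kaiser = int(np.sum(eigenvalues > 1))
print("Axes kept by Kaiser's rule:", n_kaiser)
```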

In [None]:
vecteurs_propres_ordre = vecteurs_propres[:, idx]
vecteurs_propres_ordre

array([[ 0.24,  0.62, -0.01, -0.56,  0.04,  0.49,  0.08],
       [ 0.47,  0.1 , -0.06, -0.02, -0.8 , -0.33,  0.16],
       [ 0.45, -0.2 ,  0.15,  0.52, -0.08,  0.67, -0.1 ],
       [ 0.46, -0.14,  0.2 , -0.02,  0.46, -0.22,  0.68],
       [ 0.44, -0.2 ,  0.36, -0.33,  0.18, -0.25, -0.66],
       [ 0.28,  0.53, -0.44,  0.46,  0.32, -0.29, -0.25],
       [-0.21,  0.48,  0.78,  0.31, -0.07, -0.13,  0.02]])
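Eigenvectors are defined only up to sign. `np.linalg.eig`, `np.linalg.eigh` or another library may return `u` or `-u` for the same axis, which mirrors the individuals along that axis without changing the analysis. If reproducible signs are wanted, one possible convention is to flip each eigenvector so that its largest-magnitude loading is positive. This convention is an assumption, not something NumPy imposes. The sketch below implements it with a hypothetical helper `fix_signs`.

```python
import numpy as np


def fix_signs(U: np.ndarray) -> np.ndarray:
    """Flip each column so that its largest-magnitude entry is positive."""
    rows = np.argmax(np.abs(U), axis=0)               # row of the largest |loading| per column
    signs = np.sign(U[rows, np.arange(U.shape[1])])
    signs[signs == 0] = 1.0                           # guard against all-zero columns
    return U * signs                                  # broadcast one sign per column


rng = np.random.default_rng(4)
R = np.corrcoef(rng.normal(size=(12, 7)), rowvar=False)
_, U = np.linalg.eigh(R)

# U and -U describe the same axes; both map to the same sign-fixed matrix
print(np.allclose(fix_signs(U), fix_signs(-U)))
```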

In [None]:
# Coordinates of the individuals on the new axis 1
coord_I1 = M_center_reduc.dot(vecteurs_propres_ordre[:,0])
np.round(coord_I1,2)

array([-3.01, -1.98, -0.13, -2.15, -1.76,  1.78, -0.98, -0.27,  1.68,
        0.22,  2.05,  4.53])

In [None]:
# Coordinates of the individuals on the new axis 2
coord_I2 = M_center_reduc.dot(vecteurs_propres_ordre[:,1])
np.round(coord_I2,2)

array([-0.39, -1.88, -0.76,  0.33, -0.18, -1.42,  1.43,  0.66, -1.81,
        2.91,  1.2 , -0.08])
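Instead of one `dot` per axis, all the principal coordinates can be obtained with a single matrix product F = ZU, where the columns of U are the sorted eigenvectors. Two properties make a useful sanity check. First, the variance (with 1/n) of the coordinates on axis k equals the k-th eigenvalue. Second, coordinates on different axes are uncorrelated. The sketch below checks both on synthetic data; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.normal(loc=600, scale=180, size=(12, 7))   # synthetic stand-in for M
n = M.shape[0]

Z = (M - M.mean(axis=0)) / M.std(axis=0)           # standardized data
R = Z.T @ Z / n                                    # correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
lam, U = eigvals[order], eigvecs[:, order]         # sorted by decreasing eigenvalue

F = Z @ U                                          # coordinates on all axes
F2 = F[:, :2]                                      # first two axes (AXE1, AXE2)

# Covariance of the coordinates is diagonal, with the eigenvalues on the diagonal
print(np.allclose(F.T @ F / n, np.diag(lam)))
```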

In [None]:
df2 = pd.DataFrame({
    "AXE1": coord_I1,
    "AXE2": coord_I2
})
df2

|   | AXE1 | AXE2 |
|---:|---:|---:|
| 0 | -3.0083 | -0.3865 |
| 1 | -1.9799 | -1.8796 |
| 2 | -0.1269 | -0.7638 |
| 3 | -2.1451 | 0.3295 |
| 4 | -1.7577 | -0.1789 |
| 5 | 1.7753 | -1.4159 |
| 6 | -0.9831 | 1.4328 |
| 7 | -0.2692 | 0.6631 |
| 8 | 1.678 | -1.8126 |
| 9 | 0.2189 | 2.906 |


In [None]:
# Use the individuals' identifiers as row labels
df2.index = df.index
df2

| IDEN | AXE1 | AXE2 |
|:---|---:|---:|
| MA2 | -3.0083 | -0.3865 |
| EM2 | -1.9799 | -1.8796 |
| CA2 | -0.1269 | -0.7638 |
| MA3 | -2.1451 | 0.3295 |
| EM3 | -1.7577 | -0.1789 |
| CA3 | 1.7753 | -1.4159 |
| MA4 | -0.9831 | 1.4328 |
| EM4 | -0.2692 | 0.6631 |
| CA4 | 1.678 | -1.8126 |
| MA5 | 0.2189 | 2.906 |
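The two steps used here, building the DataFrame and then copying the index, can be merged by passing `index=` to the constructor. The sketch below uses the 2-decimal coordinates of the ten individuals whose labels appear in the output above, since the last two labels are not shown. Sorting along AXE1 then makes the ordering of individuals on the first axis easy to read.

```python
import numpy as np
import pandas as pd

labels = ["MA2", "EM2", "CA2", "MA3", "EM3", "CA3", "MA4", "EM4", "CA4", "MA5"]
coords = np.array([
    [-3.01, -0.39], [-1.98, -1.88], [-0.13, -0.76], [-2.15, 0.33], [-1.76, -0.18],
    [1.78, -1.42], [-0.98, 1.43], [-0.27, 0.66], [1.68, -1.81], [0.22, 2.91],
])

# One step: coordinates, column names and row labels together
df2 = pd.DataFrame(coords, index=pd.Index(labels, name="IDEN"), columns=["AXE1", "AXE2"])

# Individuals ordered along the first axis
print(df2.sort_values("AXE1"))
```

In this excerpt, MA2 and MA3 lie at the negative end of AXE1, and CA4 and CA3 lie at the positive end.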

