# Exploratory Analysis with Python
## Comparison with R

Unlike R, Python is a general-purpose language rather than one designed specifically for data science, so many basic features (dataset manipulation and descriptive statistics) are not built into the language and must be imported from external libraries. The most widely used are *numpy* for array and matrix computation and *pandas* for data frame manipulation.

In [1]:
import numpy as np
import pandas as pd

The results of operations can be printed as output or saved in variables, as follows:

In [2]:
3 + 5

8

In [72]:
12 / 7

1.7142857142857142
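As a small aside, `/` in Python 3 always performs true division and returns a float, even when both operands are integers. The sketch below contrasts it with floor division and the remainder operator, which play the role of `%/%` and `%%` in R. The variable names are illustrative only.

```python
# True division always returns a float in Python 3.
true_quotient = 12 / 7

# Floor division discards the fractional part; % gives the remainder.
floor_quotient = 12 // 7
remainder = 12 % 7

print(true_quotient, floor_quotient, remainder)
```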

In [73]:
result = 3 + 5

In [74]:
result

8

In [75]:
print(result)

8


In [76]:
result = result * 3.1415
print(result)

25.132


You can define lists (the closest built-in counterpart of R vectors) to store several values along a single dimension.

In [2]:
vector = [1, 3, 8, 13]

In [4]:
vector * 3

[1, 3, 8, 13, 1, 3, 8, 13, 1, 3, 8, 13]

Unlike R, base Python does not support arithmetic between a scalar and a list: as shown above, multiplying a list by 3 repeats it three times instead of multiplying each element. For element-wise operations, the list must be converted to a numpy array.

Functions from a package are called using the form library.function:
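To make the library.function form concrete, here is a minimal sketch that calls two numpy functions through the `np` alias. The list `values` and the result names are chosen only for illustration.

```python
import numpy as np  # "np" is the conventional alias for numpy

values = [1, 3, 8, 13]

# Call functions through the alias: library.function(arguments).
mean_value = np.mean(values)
root_values = np.sqrt(values)

print(mean_value)
print(root_values)
```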

In [3]:
vector = np.array(vector)
vector

array([ 1,  3,  8, 13])

In [4]:
vector * 3

array([ 3,  9, 24, 39])
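The same element-wise behaviour applies to other operators and to pairs of arrays of equal length. The following sketch, with made-up array names and a made-up `weights` array, shows this and also the plain-Python alternative based on a list comprehension.

```python
import numpy as np

prices = np.array([1, 3, 8, 13])
weights = np.array([2, 2, 0, 1])  # hypothetical values for illustration

# A scalar is broadcast to every element.
shifted = prices + 10

# Two arrays of the same shape are combined element by element.
weighted = prices * weights

# Plain-Python equivalent of `prices * 3`, using a list comprehension.
tripled_list = [value * 3 for value in [1, 3, 8, 13]]

print(shifted, weighted, tripled_list)
```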

Subsetting works much as in R, but note that indexing starts at 0 rather than 1.

In [6]:
vector[1]

3

In [5]:
vector[0]

1
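A consequence of zero-based indexing is that the last valid index is the length minus one. The sketch below also shows negative indices, which in Python count from the end of the array (in R, by contrast, a negative index drops elements).

```python
import numpy as np

vector = np.array([1, 3, 8, 13])

first = vector[0]               # first element (R: vector[1])
last = vector[len(vector) - 1]  # last valid index is length - 1
also_last = vector[-1]          # negative indices count from the end

print(first, last, also_last)
```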

In addition, when selecting a range, the second value marks the end of the range and is NOT included in the subset (unlike in R, where it is included).

In [7]:
vector[1:3]

array([3, 8])

In [8]:
vector[[False, True, True, False]]


array([3, 8])

In [9]:
vector < 3

array([ True, False, False, False])

In [10]:
vector[vector < 3]

array([1])
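The slicing and masking rules above can be combined. As a minimal sketch on the same four values: open-ended slices, and a mask built from two conditions joined with `&`. The parentheses around each condition are needed because `&` binds more tightly than the comparison operators.

```python
import numpy as np

vector = np.array([1, 3, 8, 13])

# Slice 1:3 selects positions 1 and 2; position 3 is excluded.
middle = vector[1:3]

# Omitting a bound means "from the start" or "to the end".
from_second = vector[1:]

# Combine conditions with & (and) or | (or); parentheses are required.
in_range = vector[(vector > 1) & (vector < 10)]

print(middle, from_second, in_range)
```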

CSV files are loaded with the pandas library by passing the file name; here the file sits in the same folder as the script, so a relative path is enough.

In [12]:
ecommerce = pd.read_csv('ecommerce_small.csv')

In [13]:
ecommerce

| | X | orderid | status | customerid | country | year | month | day | hour | minute | stockid | description | unitprice | quantity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 2.55 | 6 |
| 1 | 2 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 71053 | WHITE METAL LANTERN | 3.39 | 6 |
| 2 | 3 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 84406B | CREAM CUPID HEARTS COAT HANGER | 2.75 | 8 |
| 3 | 4 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 3.39 | 6 |
| 4 | 5 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 3.39 | 6 |
| 5 | 6 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 22752 | SET 7 BABUSHKA NESTING BOXES | 7.65 | 2 |
| 6 | 7 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 21730 | GLASS STAR FROSTED T-LIGHT HOLDER | 4.25 | 6 |
| 7 | 8 | 536366 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 28 | 22633 | HAND WARMER UNION JACK | 1.85 | 6 |
| 8 | 9 | 536366 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 28 | 22632 | HAND WARMER RED POLKA DOT | 1.85 | 6 |
| 9 | 10 | 536367 | shipped | 13047.0 | United Kingdom | 2010 | 12 | 1 | 8 | 34 | 84879 | ASSORTED COLOUR BIRD ORNAMENT | 1.69 | 32 |
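After loading a file it is often useful to check its size and the inferred column types before printing it in full. The sketch below is self-contained: it reads from an in-memory string instead of a file on disk (an assumption made so the example runs anywhere), using a few columns and rows copied from the output above.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; the rows are copied from the output above.
csv_text = """X,orderid,status,unitprice,quantity
1,536365,shipped,2.55,6
2,536365,shipped,3.39,6
3,536365,shipped,2.75,8
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (rows, columns)
print(sample.dtypes)   # type inferred for each column
print(sample.head(2))  # first two rows only
```

With a real file, `pd.read_csv` takes the path in place of the `io.StringIO` object, as in the cell above.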


In [14]:
ecommerce.describe()

| | X | orderid | customerid | year | month | day | hour | minute | unitprice | quantity |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 5000.0 | 5000.0 | 3795.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 |
| mean | 2500.5 | 536593.6628 | 15906.28195 | 2010.0 | 12.0 | 1.3784 | 13.337 | 29.0642 | 3.792314 | 9.1858 |
| std | 1443.520003 | 123.308616 | 1732.068892 | 0.0 | 0.0 | 0.485037 | 2.466789 | 16.206931 | 13.21172 | 144.940788 |
| min | 1.0 | 536365.0 | 12431.0 | 2010.0 | 12.0 | 1.0 | 7.0 | 0.0 | 0.0 | -9360.0 |
| 25% | 1250.75 | 536532.0 | 14606.0 | 2010.0 | 12.0 | 1.0 | 11.0 | 15.0 | 1.25 | 1.0 |
| 50% | 2500.5 | 536592.0 | 15862.0 | 2010.0 | 12.0 | 1.0 | 14.0 | 32.0 | 2.51 | 3.0 |
| 75% | 3750.25 | 536672.0 | 17841.0 | 2010.0 | 12.0 | 2.0 | 15.0 | 42.0 | 4.21 | 10.0 |
| max | 5000.0 | 536836.0 | 18239.0 | 2010.0 | 12.0 | 2.0 | 18.0 | 59.0 | 607.49 | 2880.0 |
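In the summary above, `customerid` has a count of 3795 against 5000 for the other columns. Since `describe()` counts only non-missing entries, this suggests that some customer ids are missing. A minimal sketch of how such gaps can be counted, using a made-up data frame (the values are hypothetical, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing customer id.
toy = pd.DataFrame({
    "customerid": [17850.0, np.nan, 13047.0],
    "quantity": [6, 8, 32],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# describe() reports the non-missing count for each numeric column.
summary = toy.describe()

print(missing_per_column)
print(summary.loc["count"])
```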


To select a column by name:

In [16]:
ecommerce['unitprice']

0         2.55
1         3.39
2         2.75
3         3.39
4         3.39
5         7.65
6         4.25
7         1.85
8         1.85
9         1.69
10        2.10
11        2.10
12        3.75
13        1.65
14        4.25
15        4.95
16        9.95
17        5.95
18        5.95
19        7.95
20        7.95
21        4.25
22        4.95
23        4.95
24        4.95
25        5.95
26        3.75
27        3.75
28        3.75
29        0.85
         ...  
4970      2.95
4971      4.25
4972      2.10
4973      2.10
4974      5.95
4975      6.75
4976      0.85
4977      0.85
4978      0.85
4979      0.85
4980      0.85
4981      0.85
4982      1.65
4983      2.95
4984      1.45
4985      6.75
4986      6.75
4987      8.50
4988      2.95
4989    295.00
4990      1.25
4991      1.25
4992      9.95
4993      9.95
4994      9.95
4995     10.95
4996      2.55
4997      4.95
4998      1.65
4999      0.85
Name: unitprice, Length: 5000, dtype: float64
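Selecting several columns at once works the same way, except that a list of names is passed inside the brackets. The sketch below uses a small made-up data frame whose values are copied from two rows of the dataset shown earlier.

```python
import pandas as pd

toy = pd.DataFrame({
    "description": ["WHITE METAL LANTERN", "HAND WARMER UNION JACK"],
    "unitprice": [3.39, 1.85],
    "quantity": [6, 6],
})

# A single name returns a Series.
prices = toy["unitprice"]

# A list of names returns a DataFrame with those columns, in that order.
price_and_quantity = toy[["unitprice", "quantity"]]

print(type(prices).__name__, type(price_and_quantity).__name__)
```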

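To select a subset of rows. Note that .loc slices by index label and, unlike the positional slices seen earlier, includes the end label: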

In [17]:
ecommerce.loc[0:5]

| | X | orderid | status | customerid | country | year | month | day | hour | minute | stockid | description | unitprice | quantity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 2.55 | 6 |
| 1 | 2 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 71053 | WHITE METAL LANTERN | 3.39 | 6 |
| 2 | 3 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 84406B | CREAM CUPID HEARTS COAT HANGER | 2.75 | 8 |
| 3 | 4 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 3.39 | 6 |
| 4 | 5 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 3.39 | 6 |
| 5 | 6 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 22752 | SET 7 BABUSHKA NESTING BOXES | 7.65 | 2 |
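The output above has six rows (labels 0 through 5) because `.loc` slices by label and includes the end label. Its positional counterpart `.iloc` follows the usual Python rule and excludes the end. A minimal sketch on a made-up single-column data frame (the quantities are copied from the first rows of the dataset):

```python
import pandas as pd

toy = pd.DataFrame({"quantity": [6, 6, 8, 6, 6, 2, 6]})

# .loc slices by label and includes both endpoints: labels 0..5, six rows.
by_label = toy.loc[0:5]

# .iloc slices by position and excludes the end: positions 0..4, five rows.
by_position = toy.iloc[0:5]

print(len(by_label), len(by_position))
```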


To filter rows by the value of a categorical column:

In [20]:
ecommerce[ecommerce['status']=='shipped']

| | X | orderid | status | customerid | country | year | month | day | hour | minute | stockid | description | unitprice | quantity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 2.55 | 6 |
| 1 | 2 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 71053 | WHITE METAL LANTERN | 3.39 | 6 |
| 2 | 3 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 84406B | CREAM CUPID HEARTS COAT HANGER | 2.75 | 8 |
| 3 | 4 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 3.39 | 6 |
| 4 | 5 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 3.39 | 6 |
| 5 | 6 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 22752 | SET 7 BABUSHKA NESTING BOXES | 7.65 | 2 |
| 6 | 7 | 536365 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 26 | 21730 | GLASS STAR FROSTED T-LIGHT HOLDER | 4.25 | 6 |
| 7 | 8 | 536366 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 28 | 22633 | HAND WARMER UNION JACK | 1.85 | 6 |
| 8 | 9 | 536366 | shipped | 17850.0 | United Kingdom | 2010 | 12 | 1 | 8 | 28 | 22632 | HAND WARMER RED POLKA DOT | 1.85 | 6 |
| 9 | 10 | 536367 | shipped | 13047.0 | United Kingdom | 2010 | 12 | 1 | 8 | 34 | 84879 | ASSORTED COLOUR BIRD ORNAMENT | 1.69 | 32 |
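Filters of this kind are boolean masks, so they can be combined just like the numpy masks seen earlier. The sketch below uses a made-up data frame; the "cancelled" status is a hypothetical value added for illustration and may not occur in the real dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    "status": ["shipped", "cancelled", "shipped", "shipped"],  # "cancelled" is hypothetical
    "unitprice": [2.55, 3.39, 7.65, 1.69],
    "quantity": [6, 6, 2, 32],
})

# Rows that are shipped AND cost more than 2 per unit.
shipped_expensive = toy[(toy["status"] == "shipped") & (toy["unitprice"] > 2)]

# isin() keeps rows whose value matches any entry in the list.
selected_status = toy[toy["status"].isin(["shipped", "cancelled"])]

print(shipped_expensive)
print(len(selected_status))
```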


In addition, basic summary statistics can be computed for each numeric column:

In [27]:
ecommerce.unitprice.mean()

3.792314000000056

In [28]:
ecommerce.unitprice.var()

174.54955841708835

In [29]:
ecommerce.unitprice.std()

13.21172049420848

In [30]:
ecommerce.unitprice.min()

0.0

In [31]:
ecommerce['unitprice'].max()

607.49

In [32]:
np.quantile(ecommerce.unitprice, 0.25)

1.25
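pandas also offers these statistics as methods, so several quantiles can be requested in one call. One detail worth knowing: `Series.var()` and `Series.std()` use the sample formula (dividing by n - 1) by default, whereas `np.var` on a plain array divides by n. A minimal sketch on the first ten unit prices from the dataset:

```python
import numpy as np
import pandas as pd

prices = pd.Series([2.55, 3.39, 2.75, 3.39, 3.39, 7.65, 4.25, 1.85, 1.85, 1.69])

median_price = prices.median()

# Several quantiles in one call; the result is indexed by the probabilities.
quartiles = prices.quantile([0.25, 0.5, 0.75])

# pandas: sample variance (n - 1); numpy on an array: population variance (n).
sample_var = prices.var()
population_var = np.var(prices.to_numpy())

print(median_price)
print(quartiles)
print(sample_var, population_var)
```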