# Working with data

## 1. Introduction to NumPy and Pandas

The following tutorial contains examples of using the NumPy and pandas libraries.

__NUMPY__<br>
NumPy stands for Numerical Python.
NumPy is a Python library for working with arrays.
It also provides functions for linear algebra, Fourier transforms, and matrices.
It is an open source project and you can use it freely.
You can easily show that processing with NumPy is faster than processing plain Python lists,
and the difference becomes noticeable with large arrays (processing an array with 1 million items can be about 10 times faster).

__PANDAS__<br>
Pandas is a library for data analysis.
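
The speed claim above can be checked with a small timing sketch. The operation (doubling every element) and the number of repetitions are arbitrary choices made for illustration, and the measured ratio depends on the machine, so treat the printed speed-up as indicative only.

```python
import timeit

import numpy as np

n = 1_000_000
py_list = list(range(n))   # plain Python list
np_array = np.arange(n)    # NumPy array with the same values

# Double every element: a list comprehension versus a vectorized NumPy expression.
list_time = timeit.timeit(lambda: [v * 2 for v in py_list], number=10)
numpy_time = timeit.timeit(lambda: np_array * 2, number=10)

print(f"Python list: {list_time:.4f} s")
print(f"NumPy array: {numpy_time:.4f} s")
print(f"Speed-up   : {list_time / numpy_time:.1f}x")
```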


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### 1.1 Ndarray
The basic data structure in NumPy is a multi-dimensional array object called ndarray. NumPy provides a suite of functions that can efficiently manipulate the elements of an ndarray.

#### 1.1.1 Creating ndarrays

An ndarray can be created from a list, as shown in the examples below. You can create a 1-dimensional array from a flat list, or a multi-dimensional array (2D, 3D, etc.) from nested lists.


In [None]:
oneDim = np.array([1.0,2,3,4,5])   # a 1-dimensional array (vector)
print(oneDim)
print("#Dimensions =", oneDim.ndim)
print("Dimension =", oneDim.shape)
print("Size =", oneDim.size)
print("Array type =", oneDim.dtype, '\n')

twoDim = np.array([[1,2],[3,4],[5,6],[7,8]])  # a two-dimensional array (matrix)
print(twoDim)
print("#Dimensions =", twoDim.ndim)
print("Dimension =", twoDim.shape)
print("Size =", twoDim.size)
print("Array type =", twoDim.dtype, '\n')

[1. 2. 3. 4. 5.]
#Dimensions = 1
Dimension = (5,)
Size = 5
Array type = float64 

[[1 2]
 [3 4]
 [5 6]
 [7 8]]
#Dimensions = 2
Dimension = (4, 2)
Size = 8
Array type = int64 



NumPy also provides built-in functions for creating ndarrays.

In [None]:
print('Array of random numbers from a uniform distribution')
print(np.random.rand(5))      # random numbers from a uniform distribution over [0, 1)

print('\nArray of random numbers from a normal distribution')
print(np.random.randn(5))     # random numbers from a standard normal distribution (mean 0, std 1)

print('\nArray of integers between -10 and 10, with step size of 2')
print(np.arange(-10,10,2))    # similar to range, but returns ndarray instead of list

print('\n2-dimensional array of integers from 0 to 11')
print(np.arange(12).reshape(3,4))  # reshape to a matrix

print('\nArray of values between 0 and 1, split into 10 equally spaced values')
print(np.linspace(0,1,10))    # split interval [0,1] into 10 equally separated values

print('\nArray of values from 10^-3 to 10^3')
print(np.logspace(-3,3,7))    # create ndarray with values from 10^-3 to 10^3

Array of random numbers from a uniform distribution
[0.86859997 0.7021351  0.38467352 0.32070688 0.19128229]

Array of random numbers from a normal distribution
[-2.16129783 -0.31662708 -0.47223878 -1.22669143  1.38513474]

Array of integers between -10 and 10, with step size of 2
[-10  -8  -6  -4  -2   0   2   4   6   8]

2-dimensional array of integers from 0 to 11
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Array of values between 0 and 1, split into 10 equally spaced values
[0.         0.11111111 0.22222222 0.33333333 0.44444444 0.55555556
 0.66666667 0.77777778 0.88888889 1.        ]

Array of values from 10^-3 to 10^3
[1.e-03 1.e-02 1.e-01 1.e+00 1.e+01 1.e+02 1.e+03]
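
Random arrays can also be drawn from a seeded generator, which makes the results reproducible. The sketch below uses `np.random.default_rng`, the generator interface that NumPy's documentation recommends for new code. The seed value 0 and the array sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)     # the seed value 0 is an arbitrary choice

uniform = rng.random(5)                 # uniform over [0, 1)
normal = rng.standard_normal(5)         # standard normal (mean 0, std 1)
integers = rng.integers(0, 10, size=5)  # integers in [0, 10)

print(uniform)
print(normal)
print(integers)

# Re-creating the generator with the same seed reproduces the same numbers.
rng_again = np.random.default_rng(seed=0)
print(np.array_equal(uniform, rng_again.random(5)))
```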


In [None]:
print('A 2 x 3 matrix of zeros')
print(np.zeros((2,3)))        # a matrix of zeros

print('\nA 3 x 2 matrix of ones')
print(np.ones((3,2)))         # a matrix of ones

print('\nA 3 x 3 identity matrix')
print(np.eye(3))     

A 2 x 3 matrix of zeros
[[0. 0. 0.]
 [0. 0. 0.]]

A 3 x 2 matrix of ones
[[1. 1.]
 [1. 1.]
 [1. 1.]]

A 3 x 3 identity matrix
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


#### ?.1.1 Exercise
Create two ndarrays _a_ and _b_, and print an ndarray _c_ where _c_ = _a_ + _b_ <br>
_a_ and _b_ must be 1-dimensional.

In this case, + is the element-wise sum.

In [None]:
#Your code

#### 1.1.2 Accessing elements of an ndarray (indexing and slicing)
To access elements or subsets of elements of an ndarray, use the [] operator.

In [None]:
oneDim = np.array([1.0,2,3,4,5])   # a 1-dimensional array (vector)

#Accessing the first element of oneDim
print('The first element of oneDim is ', oneDim[0])
print('With type ', type(oneDim[0]), '\n')


#Accessing the first 3 elements of oneDim
print('The first 3 elements of oneDim are ', oneDim[0:3])
print('With type ', type(oneDim[0:3]))


The first element of oneDim is  1.0
With type  <class 'numpy.float64'> 

The first 3 elements of oneDim are  [1. 2. 3.]
With type  <class 'numpy.ndarray'>
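
The same [] operator works on multi-dimensional arrays, with one index or slice per dimension. The following sketch uses a small 3 x 4 matrix chosen for illustration. It also shows negative indices and a boolean mask.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)   # [[0 1 2 3], [4 5 6 7], [8 9 10 11]]

print(m[1, 2])       # single element at row 1, column 2 -> 6
print(m[0])          # first row -> [0 1 2 3]
print(m[:, 1])       # second column -> [1 5 9]
print(m[-1, -2:])    # last two elements of the last row -> [10 11]
print(m[m > 8])      # boolean mask: elements greater than 8 -> [ 9 10 11]
```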


#### ?.1.2 Exercise
Create two ndarrays _d_ and _e_ and swap their last two elements.

Assigning a NumPy array to another variable does not copy its values: the new name refers to the same array. Likewise, a slice is a view that shares memory with the original array, so modifying the slice changes the original. To make an independent copy of an ndarray, call the .copy() method explicitly.

In [None]:
x = np.arange(-5,5)
print('Before: x =', x)

y = x[3:5]     # y is a slice, i.e., pointer to a subarray in x
print('        y =', y)
y[:] = 1000    # modifying the value of y will change x
print('After : y =', y)
print('        x =', x, '\n')

z = x[3:5].copy()   # makes a copy of the subarray
print('Before: x =', x)
print('        z =', z)
z[:] = 500          # modifying the value of z will not affect x
print('After : z =', z)
print('        x =', x)

Before: x = [-5 -4 -3 -2 -1  0  1  2  3  4]
        y = [-2 -1]
After : y = [1000 1000]
        x = [  -5   -4   -3 1000 1000    0    1    2    3    4] 

Before: x = [  -5   -4   -3 1000 1000    0    1    2    3    4]
        z = [1000 1000]
After : z = [500 500]
        x = [  -5   -4   -3 1000 1000    0    1    2    3    4]
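
Whether two arrays share the same underlying data can be checked directly. The sketch below uses `np.shares_memory` and the `.base` attribute. The variable names are chosen for illustration. It also shows that indexing with a list of positions returns a copy rather than a view.

```python
import numpy as np

x = np.arange(10)

view = x[3:5]            # basic slicing returns a view
copy = x[3:5].copy()     # explicit copy
fancy = x[[3, 4]]        # indexing with a list of positions returns a copy

print(np.shares_memory(x, view))    # True
print(np.shares_memory(x, copy))    # False
print(np.shares_memory(x, fancy))   # False
print(view.base is x)               # True: the view is backed by x
```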


### 1.2 Introduction to Pandas

Pandas provides two convenient data structures for storing and manipulating data: Series and DataFrame. A Series is similar to a one-dimensional array, whereas a DataFrame is a tabular structure akin to a spreadsheet table.

In this tutorial we will only consider the DataFrame.


In [None]:
from pandas import DataFrame

cars = {'make': ['Ford', 'Honda', 'Toyota', 'Tesla'],
        'model': ['Taurus', 'Accord', 'Camry', 'Model S'],
        'MSRP': [27595, 23570, 23495, 68000]}
carData = DataFrame(cars)            # creating DataFrame from dictionary
carData                              # display the table

     make    model   MSRP
0    Ford   Taurus  27595
1   Honda   Accord  23570
2  Toyota    Camry  23495
3   Tesla  Model S  68000
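
To make the relation between the two structures concrete, the sketch below selects parts of a small DataFrame. The table is a hypothetical toy example that reuses the column names from the cell above. A single column comes back as a Series, while a list of columns or a row filter comes back as a DataFrame.

```python
import pandas as pd

# Hypothetical toy table, reusing the column names from the example above.
cars = pd.DataFrame({'make': ['Honda', 'Toyota', 'Tesla'],
                     'model': ['Accord', 'Camry', 'Model S'],
                     'MSRP': [23570, 23495, 68000]})

prices = cars['MSRP']               # one column is a Series
print(type(prices))
print(prices.max())                 # 68000

print(cars[['make', 'MSRP']])       # a list of column names gives a DataFrame
print(cars.iloc[0])                 # first row by position, returned as a Series
print(cars[cars['MSRP'] > 30000])   # rows that satisfy a condition
```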


A DataFrame can also be created from a NumPy array.


In [None]:
import numpy as np

npdata = np.random.randn(5,3)  # create a 5 by 3 random matrix
columnNames = ['x1','x2','x3']
data = DataFrame(npdata, columns=columnNames)
data

         x1        x2        x3
0  0.494603 -0.022883 -0.336624
1 -0.325229  0.046138 -1.029201
2  1.099771 -0.539464  0.837754
3 -0.677333  0.606175 -0.301533
4 -0.145999 -1.172420 -0.288766


### ?.2.1 Exercise 1: Analyzing Online Store Sales Data

__Objective:__ 
The goal is to analyze and visualize sales and revenue data from an online store.

In [None]:
np.random.seed(0)
months = 'create a list of months'
sales = 'create a list of monthly sales'
revenue = 'create a list of monthly revenue'


In [None]:
data = 'create a dataframe with the np arrays of months, sales, and revenue'
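
One possible way to fill in the placeholders above is sketched here. The column names `Month`, `Sales`, and `Revenue` are the ones used by the plotting cell at the end of the exercise. The month labels, value ranges, and the way revenue is derived from sales are assumptions made only to have some data to work with.

```python
import numpy as np
import pandas as pd

np.random.seed(0)   # same seed as in the cell above, for reproducibility

# Placeholder data: the ranges below are arbitrary assumptions.
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
sales = np.random.randint(100, 500, size=12)      # units sold per month
revenue = sales * np.random.uniform(20, 30, 12)   # units times a random unit price

data = pd.DataFrame({'Month': months, 'Sales': sales, 'Revenue': revenue})
print(data.head())
```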

Calculate and display basic statistics for the data using pandas. For example, you can show the mean, median, and standard deviation of sales and revenue.

TIPS: <br>
👉 data['Sales'].mean() computes the mean of the sales.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html
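
The statistics methods can be tried first on a tiny table whose results are easy to check by hand. The DataFrame `toy` and its values are hypothetical. Note that pandas computes the sample standard deviation (`ddof=1`) by default, while NumPy's `np.std` computes the population standard deviation (`ddof=0`), so the two numbers differ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values, chosen so the results are easy to check by hand.
toy = pd.DataFrame({'Sales': [10, 20, 30, 40]})

print(toy['Sales'].mean())                # 25.0
print(toy['Sales'].median())              # 25.0
print(toy['Sales'].std())                 # sample std (ddof=1), about 12.91
print(np.std(toy['Sales'].to_numpy()))    # population std (ddof=0), about 11.18
```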

In [None]:
sales_mean = 'calculate the mean of the sales'
revenue_median = 'calculate the median of the revenue'
sales_std =  'calculate the standard deviation of the sales'
print(f"Mean Sales: {sales_mean}")
print(f"Median Revenue: {revenue_median}")
print(f"Standard Deviation of Sales: {sales_std}")

Visualize the data using Matplotlib. Create a bar plot to show the monthly sales and a line plot for the monthly revenue.



In [None]:
plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.bar(data['Month'], data['Sales'])
plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Monthly Sales')

plt.subplot(1, 2, 2)
plt.plot(data['Month'], data['Revenue'], marker='o', color='g')
plt.xlabel('Month')
plt.ylabel('Revenue')
plt.title('Monthly Revenue')

plt.tight_layout()
plt.show()
