# Module 03 - Scientific Computing with NumPy

---

#### <a href="linkedin.com/in/tasmim-rahman-adib-403074221">Tasmim Rahman Adib</a>
![numpylogo](../img/numpy.jpeg)

# Lecture 3.5 - Aggregations: Statistics
## Agenda
- Sum
- Mean
- Median
- Minimum
- Maximum
- Range
- Standard Deviation
- Variance
- Quantile
- Correlation Coefficient

In [1]:
#conventional import
import numpy as np

## 3.5.1 Sum 
- Sum of array elements over a given axis.
    - **Syntax:** `np.sum(array); array-wise sum`
    - **Syntax:** `np.sum(array, axis=0); column-wise sum`
    - **Syntax:** `np.sum(array, axis=1); row-wise sum`

In [2]:
# 1D array 
A1 = np.arange(20)
print(A1)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [6]:
A1.ndim  # number of dimensions (axes) of the array; A1 is 1-D

1

In [13]:
# 2D array 
A2 = np.array([[11, 12, 13], [21, 22, 23]])
print(A2)

[[11 12 13]
 [21 22 23]]


In [9]:
# Column-wise sum
np.sum(A2, axis=0)

array([32, 34, 36])

In [14]:
# row-wise sum
np.sum(A2, axis=1)

array([36, 66])

In [10]:
np.sum(A2)

np.int64(102)
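The `axis` argument names the axis that gets collapsed, which is why `axis=0` yields one value per column and `axis=1` one value per row. The short sketch below checks this by looking at the result shapes; the names `col_sums`, `row_sums` and `row_sums_2d` are only illustrative.

```python
import numpy as np

A2 = np.array([[11, 12, 13], [21, 22, 23]])  # shape (2, 3)

# axis=0 collapses the rows, leaving one sum per column.
col_sums = np.sum(A2, axis=0)
# axis=1 collapses the columns, leaving one sum per row.
row_sums = np.sum(A2, axis=1)

print(col_sums.shape)  # (3,)
print(row_sums.shape)  # (2,)

# keepdims=True keeps the collapsed axis with length 1,
# which is handy when the result must broadcast against A2.
row_sums_2d = np.sum(A2, axis=1, keepdims=True)
print(row_sums_2d.shape)  # (2, 1)
```

The same `axis` convention applies to the other aggregation functions in this lecture.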

## 3.5.2 Mean 
- Compute the arithmetic mean along the specified axis.
- Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. `float64` intermediate and return values are used for integer inputs.

    - **Syntax:** `np.mean(array); array-wise mean`
    - **Syntax:** `np.mean(array, axis=1); row-wise mean`
    - **Syntax:** `np.mean(array, axis=0); column-wise mean`

In [15]:
A1

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [16]:
A2

array([[11, 12, 13],
       [21, 22, 23]])

In [19]:
# compute the average of array `A1`
np.mean(A1)

np.float64(9.5)

In [60]:
# mean of 2D array (axis=1, row-wise: one value per row)
np.mean(A2, axis=1)

array([12., 22.])

In [61]:
# mean of 2D array (axis=0, column-wise: one value per column)
np.mean(A2, axis=0)

array([16., 17., 18.])
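As a quick check of the note about `float64` results for integer inputs, the sketch below compares `np.mean` with a manual sum divided by the element count. The variable names `m` and `manual` are illustrative only.

```python
import numpy as np

A1 = np.arange(20)           # integer array 0..19
m = np.mean(A1)

print(A1.dtype.kind)         # i  (an integer dtype)
print(m.dtype)               # float64

# The mean is the sum of the elements divided by their count.
manual = A1.sum() / A1.size
print(m == manual)           # True
```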

## 3.5.3 Median
- Compute the median along the specified axis.
- Returns the median of the array elements.
    
    - **Syntax:** `np.median(array); array-wise median`
    - **Syntax:** `np.median(array, axis=1); row-wise median`
    - **Syntax:** `np.median(array, axis=0); column-wise median`

In [25]:
# compute the median of `A1`
np.median(A1)

np.float64(9.5)

In [37]:
A2

array([[11, 12, 13],
       [21, 22, 23]])

In [58]:
# median of 2D array (axis=1, row-wise: one value per row)
np.median(A2, axis=1)

array([12., 22.])

In [59]:
# median of 2D array (axis=0, column-wise: one value per column)
np.median(A2, axis=0)

array([16., 17., 18.])
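`A1` has an even number of elements, so its median is the average of the two middle values, which is why the result is `9.5` rather than an element of the array. A minimal hand-rolled sketch (the helper name `manual_median` is hypothetical, not a NumPy function):

```python
import numpy as np

def manual_median(values):
    """Median of a 1-D array: middle value, or mean of the two middle values."""
    s = np.sort(values)
    n = s.size
    if n % 2 == 0:
        # Even count: average the two central elements.
        return (s[n // 2 - 1] + s[n // 2]) / 2
    # Odd count: take the single central element.
    return s[n // 2]

A1 = np.arange(20)
print(manual_median(A1))                   # 9.5
print(manual_median(A1) == np.median(A1))  # True
```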

## 3.5.4 Minimum 
- Return the minimum of an array or minimum along an axis.
     
    - **Syntax:** `np.min(array); array-wise min`
    - **Syntax:** `np.min(array, axis=1); row-wise min`
    - **Syntax:** `np.min(array, axis=0); column-wise min`

In [31]:
# minimum value of `A1`
np.min(A1)

np.int64(0)

In [36]:
A2

array([[11, 12, 13],
       [21, 22, 23]])

In [56]:
# minimum value of A2 (axis=1, row-wise: one value per row)
np.min(A2, axis=1)

array([11, 21])

In [57]:
# minimum value of A2 (axis=0, column-wise: one value per column)
np.min(A2, axis=0)

array([11, 12, 13])

## 3.5.5 Maximum
- Return the maximum of an array or maximum along an axis.
     
    - **Syntax:** `np.max(array); array-wise max`
    - **Syntax:** `np.max(array, axis=0); column-wise max`
    - **Syntax:** `np.max(array, axis=1); row-wise max`

In [38]:
# maximum value of `A1`
np.max(A1)

np.int64(19)

In [63]:
# maximum value of A2 (axis=0, column-wise: one value per column)
np.max(A2, axis=0)

array([21, 22, 23])

In [42]:
A2

array([[11, 12, 13],
       [21, 22, 23]])

In [62]:
# maximum value of A2 (axis=1, row-wise: one value per row)
np.max(A2, axis=1)

array([13, 23])

## 3.5.6 Range 
- **Syntax:** `np.max(array) - np.min(array)`

In [43]:
A1.max() 

np.int64(19)

In [44]:
A1.min() 

np.int64(0)

In [45]:
r = np.max(A1) - np.min(A1)
print(r)

19
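NumPy also provides `np.ptp` ("peak to peak"), which computes the same max-minus-min range in one call and accepts an `axis` argument. A short sketch using the arrays from this lecture:

```python
import numpy as np

A1 = np.arange(20)
A2 = np.array([[11, 12, 13], [21, 22, 23]])

# Range of the whole 1-D array.
print(np.ptp(A1))           # 19

# Range of each column (axis=0) and of each row (axis=1).
print(np.ptp(A2, axis=0))   # [10 10 10]
print(np.ptp(A2, axis=1))   # [2 2]
```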


## 3.5.7 Standard Deviation
- Compute the standard deviation along the specified axis.
- Returns the standard deviation, a measure of the spread of a distribution, of the array elements. The standard deviation is computed for the flattened array by default, otherwise over the specified axis.
    - **Syntax:** `np.std(array); array-wise std`
    - **Syntax:** `np.std(array, axis=0); column-wise std`
    - **Syntax:** `np.std(array, axis=1); row-wise std`

In [46]:
# compute the standard deviation of `A1`
np.std(A1)

np.float64(5.766281297335398)

In [47]:
A2

array([[11, 12, 13],
       [21, 22, 23]])

In [66]:
# standard deviation of 2D array (axis=1, row-wise: one value per row)
np.std(A2, axis=1)

array([0.81649658, 0.81649658])

In [67]:
# standard deviation of 2D array (axis=0, column-wise: one value per column)
np.std(A2, axis=0)

array([5., 5., 5.])
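By default `np.std` divides by `N` (the population standard deviation, `ddof=0`). Passing `ddof=1` divides by `N - 1` instead, which gives the sample standard deviation. The sketch below shows both, and that the standard deviation is the square root of the variance; the names `pop_std` and `sample_std` are illustrative.

```python
import numpy as np

A1 = np.arange(20)

pop_std = np.std(A1)             # divides by N (ddof=0, the default)
sample_std = np.std(A1, ddof=1)  # divides by N - 1

# The standard deviation is the square root of the variance.
print(np.isclose(pop_std, np.sqrt(np.var(A1))))                 # True
print(np.isclose(sample_std, np.sqrt(np.var(A1, ddof=1))))      # True

# The sample estimate is always at least as large as the population one.
print(sample_std > pop_std)                                     # True
```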

## 3.5.8 Variance
- Compute the variance along the specified axis.
- Returns the variance of the array elements, a measure of the spread of a
  distribution.  The variance is computed for the flattened array by
  default, otherwise over the specified axis.
    - **Syntax:** `np.var(array); array-wise var`
    - **Syntax:** `np.var(array, axis=0); column-wise var`
    - **Syntax:** `np.var(array, axis=1); row-wise var`

In [68]:
# compute the variance of `A1`
np.var(A1)

np.float64(33.25)

In [69]:
# variance of 2D array (axis=0, column-wise: one value per column)
np.var(A2, axis=0)

array([25., 25., 25.])

In [70]:
# variance of 2D array (axis=1, row-wise: one value per row)
np.var(A2, axis=1)

array([0.66666667, 0.66666667])
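To see what `np.var` computes, the variance can be written out by hand as the mean of the squared deviations from the mean. This is a minimal sketch with illustrative names (`manual_var`, `manual_var_cols`), using NumPy's default of dividing by `N`.

```python
import numpy as np

A1 = np.arange(20)
A2 = np.array([[11, 12, 13], [21, 22, 23]])

# Variance = mean of squared deviations from the mean.
manual_var = np.mean((A1 - A1.mean()) ** 2)
print(manual_var)                      # 33.25
print(manual_var == np.var(A1))        # True

# The same idea per column: subtract each column's mean, then average over axis=0.
manual_var_cols = np.mean((A2 - A2.mean(axis=0)) ** 2, axis=0)
print(manual_var_cols)                 # [25. 25. 25.]
```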

## 3.5.9 Quantile
- Compute the q-th quantile of the data along the specified axis, where `q` is a number (or sequence of numbers) between 0 and 1 inclusive.
    - **Syntax:** `np.quantile(array, q); array-wise quantile`
    - **Syntax:** `np.quantile(array, q, axis=0); column-wise quantile`
    - **Syntax:** `np.quantile(array, q, axis=1); row-wise quantile`

In [71]:
# 25th percentile of `A1`
np.quantile(A1, 0.25)

np.float64(4.75)

In [72]:
# 50th percentile of `A2`(axis=0)
np.quantile(A2, 0.5, axis=0)

array([16., 17., 18.])

In [73]:
# 75th percentile of `A2`(axis=1)
np.quantile(A2, 0.75, axis=1)

array([12.5, 22.5])
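With its default settings `np.quantile` interpolates linearly between neighbouring sorted values, which is where a result such as `4.75` comes from. The sketch below reproduces that by hand for `q = 0.25`; the helper `manual_quantile` is hypothetical and only covers the default linear method on 1-D input.

```python
import numpy as np

def manual_quantile(values, q):
    """Linear-interpolation quantile of a 1-D array (NumPy's default method)."""
    s = np.sort(values)
    pos = q * (s.size - 1)        # fractional index into the sorted data
    lo = int(np.floor(pos))
    hi = int(np.ceil(pos))
    # Interpolate between the two neighbouring sorted values.
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

A1 = np.arange(20)
print(np.isclose(manual_quantile(A1, 0.25), np.quantile(A1, 0.25)))  # True

# np.percentile is the same computation with q given on a 0-100 scale.
print(np.isclose(np.percentile(A1, 25), np.quantile(A1, 0.25)))      # True
```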

## 3.5.10 Correlation Coefficient

In [74]:
# documentation 
np.info(np.corrcoef)

Return Pearson product-moment correlation coefficients.

Please refer to the documentation for `cov` for more detail.  The
relationship between the correlation coefficient matrix, `R`, and the
covariance matrix, `C`, is

.. math:: R_{ij} = \frac{ C_{ij} } { \sqrt{ C_{ii} C_{jj} } }

The values of `R` are between -1 and 1, inclusive.

Parameters
----------
x : array_like
    A 1-D or 2-D array containing multiple variables and observations.
    Each row of `x` represents a variable, and each column a single
    observation of all those variables. Also see `rowvar` below.
y : array_like, optional
    An additional set of variables and observations. `y` has the same
    shape as `x`.
rowvar : bool, optional
    If `rowvar` is True (default), then each row represents a
    variable, with observations in the columns. Otherwise, the relationship
    is transposed: each column represents a variable, while the rows
    contain observations.
bias : _NoValue, optional
    Has no effect, do not u

In [75]:
# compute Correlation Coefficient
np.corrcoef(A2)

array([[1., 1.],
       [1., 1.]])
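Every entry is `1` because the second row of `A2` is the first row plus 10, so the two variables (rows) are perfectly linearly related. The sketch below rebuilds the matrix from the covariance matrix using the formula quoted in the documentation above, and adds a small made-up pair `x`, `y` (not from the lecture) to show a correlation of `-1`.

```python
import numpy as np

A2 = np.array([[11, 12, 13], [21, 22, 23]])

# R_ij = C_ij / sqrt(C_ii * C_jj), with C the covariance matrix.
C = np.cov(A2)
d = np.sqrt(np.diag(C))
R = C / np.outer(d, d)
print(np.allclose(R, np.corrcoef(A2)))   # True

# Illustrative data: y decreases exactly as x increases.
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
r_xy = np.corrcoef(x, y)[0, 1]
print(np.isclose(r_xy, -1.0))            # True
```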

*Copyright &copy; 2024  [Md. Jubayer Hossain](https://hossainlab.github.io/) &  [Center for Bioinformatics Learning Advancement and Systematic Training (cBLAST)](https://www.cblast.du.ac.bd/). All rights reserved*