![Quote](http://darlington.infinityfreeapp.com/images/quote7.png)

# 1. Introduction

Welcome to another lesson in our Python Essentials course! In this session, we will explore the basics of **using Python for data analysis**. Thanks to its simple syntax and powerful libraries, Python is one of the most widely used languages for data analysis. As data analysis has become increasingly important across fields and industries, both individuals and organizations rely on Python to make well-informed decisions based on data. By the end of this lesson, you will know how to install and import NumPy and Matplotlib, and how to create, reshape, index, and summarize NumPy arrays.

# 2. What is Data Analysis?

Data analysis is the systematic process of **extracting valuable insights and knowledge from raw data**. In today's data-driven world, organizations across various sectors such as *business*, *finance*, *healthcare*, *marketing*, and *research* heavily depend on it to make informed decisions. By analyzing data, organizations and individuals can effectively solve real-world problems, identify emerging trends, detect anomalies, make accurate predictions, and optimize their business processes.

# 3. Python Libraries for Data Analysis

Python offers a wide range of libraries, such as **Pandas**, **NumPy**, and **Matplotlib**, for data analysis. These libraries provide powerful tools and functions to efficiently manipulate, clean, visualize, and analyze data.

### 3.2 NumPy

NumPy (**Numerical Python**) is a powerful Python library used for **working with arrays and performing mathematical operations** on them.

#### Using NumPy

**Install NumPy**: Before using NumPy, you need to install it. You can install NumPy using the following command:

In [1]:
# Installing NumPy from a Jupyter notebook cell
# Use the %pip magic (or !pip) to run pip from a cell
%pip install numpy

Note: you may need to restart the kernel to use updated packages.





**Import NumPy**: Once NumPy is installed, you can import it into your Python program using the import statement. It is common practice to import NumPy with the alias **np** for better readability.

In [2]:
# Importing numpy library
import numpy as np

**Performing Operations**: NumPy offers a wide range of functions and methods for efficient numerical operations on arrays, including data loaded from a CSV file. With sales data in an array, for example, totals, averages, and other statistics each take a single call.
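As a minimal sketch of the kind of calculation meant here, assume a small array of daily sales figures. The numbers and the name `sales` are made up for illustration and do not come from the course data.

```python
import numpy as np

# Hypothetical daily sales figures, made up for illustration
sales = np.array([120.0, 98.5, 143.0, 110.25, 130.0])

total = sales.sum()              # sum of all values
average = sales.mean()           # arithmetic mean
best_day = int(sales.argmax())   # 0-based index of the largest value

print(total, average, best_day)
```

Each statistic is a single method call on the array, with no explicit loop.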

### 3.3 Matplotlib

Matplotlib is a **plotting and visualization library** that enables you to create a wide variety of static, animated, and interactive visualizations in Python. With Matplotlib, you can generate plots, histograms, scatterplots, and more to explore and communicate your data effectively.

#### Using Matplotlib

**Install Matplotlib**: Before you can use Matplotlib, you need to install it by running the following command:

In [3]:
# Installing Matplotlib from a Jupyter notebook cell
# Use the %pip magic (or !pip) to run pip from a cell
%pip install matplotlib

Note: you may need to restart the kernel to use updated packages.
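Once Matplotlib is installed, a first plot takes only a few lines. The sketch below is an illustration, not part of the original lesson: the values in `x` and `y` are made up, and the labels are placeholders.

```python
import matplotlib.pyplot as plt

# Made-up values for illustration
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

fig, ax = plt.subplots()        # one figure containing one set of axes
ax.plot(x, y, marker="o")       # line plot with a marker at each point
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("A minimal line plot")
```

In a Jupyter notebook the figure is normally rendered below the cell; in a plain script you would typically call `plt.show()` or `fig.savefig(...)` afterwards.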





### NumPy arrays (ndarray)

Returning to NumPy: at the core of the package is the `ndarray` object. It encapsulates n-dimensional arrays of homogeneous data types, with many operations performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:

- NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.

- The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of different sized elements.

- NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.
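The three differences listed above can be checked directly. This is a small sketch with made-up values; the variable names are chosen only for illustration.

```python
import numpy as np

values = np.array([1, 2, 3])

# Homogeneous dtype: mixing ints and floats upcasts every element to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                 # float64

# Fixed size: "growing" an array returns a new array, the original is unchanged
bigger = np.append(values, 4)
print(values.size, bigger.size)    # 3 4
print(bigger is values)            # False

# Vectorized operations: one expression replaces an explicit Python loop
print(values * 10)                 # [10 20 30]
```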

A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays; though these typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they often output NumPy arrays. In other words, in order to efficiently use much (perhaps even most) of today’s scientific/mathematical Python-based software, just knowing how to use Python’s built-in sequence types is insufficient - one also needs to know how to use NumPy arrays.

In [8]:
import numpy as np

digit = np.array(range(1,11))
digit

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [9]:
digit = np.arange(1, 11)  # arange already returns an ndarray, so no np.array() wrapper is needed
print(digit.shape)
print(digit.ndim)
print(digit)

(10,)
1
[ 1  2  3  4  5  6  7  8  9 10]


### reshape() method

In [10]:
digit = np.arange(1, 11).reshape(2, 5)  # 10 elements arranged as 2 rows x 5 columns
print(digit.shape)
print(digit.ndim)
print(digit)

(2, 5)
2
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]


In [11]:
digit = np.arange(1, 11).reshape(2, 5, 1)  # same 10 elements, now with a third axis of length 1
print(digit.shape)
print(digit.ndim)
print(digit)

(2, 5, 1)
3
[[[ 1]
  [ 2]
  [ 3]
  [ 4]
  [ 5]]

 [[ 6]
  [ 7]
  [ 8]
  [ 9]
  [10]]]
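`reshape()` only works when the new shape holds exactly the same number of elements. The sketch below shows two details the cells above do not cover: passing `-1` so NumPy infers one dimension, and the error raised for an incompatible shape.

```python
import numpy as np

digit = np.arange(1, 11)             # 10 elements

# -1 asks NumPy to infer that dimension from the total size
print(digit.reshape(2, -1).shape)    # (2, 5)
print(digit.reshape(-1, 2).shape)    # (5, 2)

# 3 * 4 = 12 does not match 10 elements, so reshape raises ValueError
try:
    digit.reshape(3, 4)
except ValueError as err:
    print("reshape failed:", err)
```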


In [12]:
digit = np.array(np.arange(1,11),ndmin=3)
print(digit.shape)
print(digit.ndim)
print(digit)

(1, 1, 10)
3
[[[ 1  2  3  4  5  6  7  8  9 10]]]
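The `ndmin` argument prepends axes of length 1 until the array has at least that many dimensions. As a sketch, the same shape can also be reached with `np.newaxis` indexing or with `reshape()`:

```python
import numpy as np

a = np.arange(1, 11)

b = np.array(a, ndmin=3)            # axes of length 1 are prepended
c = a[np.newaxis, np.newaxis, :]    # same shape through explicit indexing
d = a.reshape(1, 1, 10)             # same shape through reshape

print(b.shape, c.shape, d.shape)    # (1, 1, 10) (1, 1, 10) (1, 1, 10)
print(np.array_equal(b, c) and np.array_equal(c, d))   # True
```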


In [32]:
np.array?

Docstring:
array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,
      like=None)

Create an array.

Parameters
----------
object : array_like
    An array, any object exposing the array interface, an object whose
    ``__array__`` method returns an array, or any (nested) sequence.
    If object is a scalar, a 0-dimensional array containing object is
    returned.
dtype : data-type, optional
    The desired data-type for the array. If not given, NumPy will try to use
    a default ``dtype`` that can represent the values (by applying promotion
    rules when necessary.)
copy : bool, optional
    If ``True`` (default), then the array data is copied. If ``None``,
    a copy will only be made if ``__array__`` returns a copy, if obj is
    a nested sequence, or if a copy is needed to satisfy any of the other
    requirements (``dtype``, ``order``, etc.). Note that any copy of
    the data is shallow, i.e., for arrays with object dtype, the new
    array will po

In [5]:
np.__dir__

<function numpy.__dir__()>

In [4]:
dir(np)

['False_',
 'ScalarType',
 'True_',
 '_CopyMode',
 '_NoValue',
 '__NUMPY_SETUP__',
 '__all__',
 '__array_api_version__',
 '__array_namespace_info__',
 '__builtins__',
 '__cached__',
 '__config__',
 '__dir__',
 '__doc__',
 '__expired_attributes__',
 '__file__',
 '__former_attrs__',
 '__future_scalars__',
 '__getattr__',
 '__loader__',
 '__name__',
 '__numpy_submodules__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_array_api_info',
 '_core',
 '_distributor_init',
 '_expired_attrs_2_0',
 '_get_promotion_state',
 '_globals',
 '_int_extended_msg',
 '_mat',
 '_msg',
 '_pyinstaller_hooks_dir',
 '_pytesttester',
 '_set_promotion_state',
 '_specific_msg',
 '_type_info',
 '_typing',
 '_utils',
 'abs',
 'absolute',
 'acos',
 'acosh',
 'add',
 'all',
 'allclose',
 'amax',
 'amin',
 'angle',
 'any',
 'append',
 'apply_along_axis',
 'apply_over_axes',
 'arange',
 'arccos',
 'arccosh',
 'arcsin',
 'arcsinh',
 'arctan',
 'arctan2',
 'arctanh',
 'argmax',
 'argmin',
 'argpartition',
 

In [6]:
len(dir(np))

535

In [13]:
len(dir(digit))

168

In [14]:
digit.mean()

np.float64(5.5)

In [15]:
round(digit.std(),2)

np.float64(2.87)

In [16]:
np.round(digit.std(),2)

np.float64(2.87)

In [17]:
np.median(digit)

np.float64(5.5)
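One detail worth knowing about `std()`: by default NumPy computes the population standard deviation (dividing by N). The sample standard deviation (dividing by N - 1) needs `ddof=1`. A short sketch on the same values 1 to 10:

```python
import numpy as np

digit = np.arange(1, 11)

population_std = digit.std()         # ddof=0 (default): divide by N
sample_std = digit.std(ddof=1)       # ddof=1: divide by N - 1

print(round(float(population_std), 2))   # 2.87
print(round(float(sample_std), 2))       # 3.03
```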

In [18]:
new_array = np.arange(100).reshape(20,5)
new_array

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54],
       [55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64],
       [65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84],
       [85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94],
       [95, 96, 97, 98, 99]])

In [22]:
new = new_array.T

In [23]:
new.shape

(5, 20)

In [21]:
new_array.transpose()

array([[ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,
        80, 85, 90, 95],
       [ 1,  6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56, 61, 66, 71, 76,
        81, 86, 91, 96],
       [ 2,  7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77,
        82, 87, 92, 97],
       [ 3,  8, 13, 18, 23, 28, 33, 38, 43, 48, 53, 58, 63, 68, 73, 78,
        83, 88, 93, 98],
       [ 4,  9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 79,
        84, 89, 94, 99]])
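Both `.T` and `transpose()` return a view of the same data rather than a copy, so writing through the transposed array also changes the original. A small sketch of that behaviour:

```python
import numpy as np

new_array = np.arange(100).reshape(20, 5)
new = new_array.T

print(np.shares_memory(new_array, new))   # True

new[0, 0] = -1              # write through the transposed view
print(new_array[0, 0])      # -1

independent = new_array.T.copy()          # use .copy() when a separate array is needed
print(np.shares_memory(new_array, independent))   # False
```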

In [24]:
new_array.mean(axis=0)

array([47.5, 48.5, 49.5, 50.5, 51.5])

In [25]:
new_array.mean(axis=1)

array([ 2.,  7., 12., 17., 22., 27., 32., 37., 42., 47., 52., 57., 62.,
       67., 72., 77., 82., 87., 92., 97.])
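The `axis` argument names the axis that is collapsed: `axis=0` reduces over the rows and leaves one value per column, while `axis=1` reduces over the columns and leaves one value per row. The sketch below also uses `keepdims=True`, which is not shown in the cells above, to subtract each column's mean.

```python
import numpy as np

new_array = np.arange(100).reshape(20, 5)

col_means = new_array.mean(axis=0)   # collapse the 20 rows: one value per column
row_means = new_array.mean(axis=1)   # collapse the 5 columns: one value per row
print(col_means.shape, row_means.shape)    # (5,) (20,)

# keepdims=True keeps the reduced axis with length 1, giving shape (1, 5),
# which broadcasts against the (20, 5) array
centered = new_array - new_array.mean(axis=0, keepdims=True)
print(np.allclose(centered.mean(axis=0), 0))   # True
```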

In [26]:
new_array[0]

array([0, 1, 2, 3, 4])

In [27]:
new_array[[0,1,2,5,6]]

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34]])

In [29]:
new_array[9,[2,3]]

array([47, 48])

In [28]:
new_array[:10]

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49]])

In [30]:
new_array[:10,0:3]

array([[ 0,  1,  2],
       [ 5,  6,  7],
       [10, 11, 12],
       [15, 16, 17],
       [20, 21, 22],
       [25, 26, 27],
       [30, 31, 32],
       [35, 36, 37],
       [40, 41, 42],
       [45, 46, 47]])
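The indexing styles above differ in one important way: basic slicing returns a view of the original data, while indexing with a list of integers returns a copy. A minimal sketch:

```python
import numpy as np

new_array = np.arange(100).reshape(20, 5)

sliced = new_array[:10, 0:3]            # basic slicing: a view
picked = new_array[[0, 1, 2, 5, 6]]     # integer-list indexing: a copy

print(np.shares_memory(new_array, sliced))   # True
print(np.shares_memory(new_array, picked))   # False

picked[0, 0] = 999          # changes only the copy
print(new_array[0, 0])      # 0
```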

In [31]:
new_array * 2

array([[  0,   2,   4,   6,   8],
       [ 10,  12,  14,  16,  18],
       [ 20,  22,  24,  26,  28],
       [ 30,  32,  34,  36,  38],
       [ 40,  42,  44,  46,  48],
       [ 50,  52,  54,  56,  58],
       [ 60,  62,  64,  66,  68],
       [ 70,  72,  74,  76,  78],
       [ 80,  82,  84,  86,  88],
       [ 90,  92,  94,  96,  98],
       [100, 102, 104, 106, 108],
       [110, 112, 114, 116, 118],
       [120, 122, 124, 126, 128],
       [130, 132, 134, 136, 138],
       [140, 142, 144, 146, 148],
       [150, 152, 154, 156, 158],
       [160, 162, 164, 166, 168],
       [170, 172, 174, 176, 178],
       [180, 182, 184, 186, 188],
       [190, 192, 194, 196, 198]])

In [35]:
lst = [1,2,3,4,5,6,7,8,9,10,11,12]
ls = []
for i in lst:
    ls.append(i*2)

ls

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
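The loop above is what `new_array * 2` does implicitly for every element. As a sketch, the same doubling can be written as a list comprehension or as a single vectorized expression. Note that `*` on a plain list repeats the list instead of multiplying its elements.

```python
import numpy as np

lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

doubled_loop = [i * 2 for i in lst]     # list comprehension
doubled_np = np.array(lst) * 2          # vectorized, no explicit loop

print(doubled_loop == doubled_np.tolist())   # True
print(len(lst * 2))                          # 24: list * 2 repeats the list
```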

In [37]:
new_array

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54],
       [55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64],
       [65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84],
       [85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94],
       [95, 96, 97, 98, 99]])

In [36]:
new_array % 2

array([[0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1]])

In [38]:
new_array % 2 == 0

array([[ True, False,  True, False,  True],
       [False,  True, False,  True, False],
       [ True, False,  True, False,  True],
       [False,  True, False,  True, False],
       [ True, False,  True, False,  True],
       [False,  True, False,  True, False],
       [ True, False,  True, False,  True],
       [False,  True, False,  True, False],
       [ True, False,  True, False,  True],
       [False,  True, False,  True, False],
       [ True, False,  True, False,  True],
       [False,  True, False,  True, False],
       [ True, False,  True, False,  True],
       [False,  True, False,  True, False],
       [ True, False,  True, False,  True],
       [False,  True, False,  True, False],
       [ True, False,  True, False,  True],
       [False,  True, False,  True, False],
       [ True, False,  True, False,  True],
       [False,  True, False,  True, False]])

In [39]:
new_array[new_array % 2 == 0]

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
       34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66,
       68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98])
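Boolean masks can be combined with `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `==` or `>`. The sketch below also shows `np.where`, which keeps the original shape instead of flattening the selection.

```python
import numpy as np

new_array = np.arange(100).reshape(20, 5)

# Even numbers greater than 90
mask = (new_array % 2 == 0) & (new_array > 90)
print(new_array[mask])          # [92 94 96 98]

# np.where chooses between two values element by element and keeps the shape
labels = np.where(new_array % 2 == 0, "even", "odd")
print(labels.shape)             # (20, 5)
print(labels[0])                # ['even' 'odd' 'even' 'odd' 'even']
```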

In [41]:
nm = new_array.sum(axis = 1) < 100

In [42]:
new_array[nm]

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [43]:
nmw = new_array.mean(axis = 0)<50

In [44]:
new_array[:,nmw]

array([[ 0,  1,  2],
       [ 5,  6,  7],
       [10, 11, 12],
       [15, 16, 17],
       [20, 21, 22],
       [25, 26, 27],
       [30, 31, 32],
       [35, 36, 37],
       [40, 41, 42],
       [45, 46, 47],
       [50, 51, 52],
       [55, 56, 57],
       [60, 61, 62],
       [65, 66, 67],
       [70, 71, 72],
       [75, 76, 77],
       [80, 81, 82],
       [85, 86, 87],
       [90, 91, 92],
       [95, 96, 97]])

In [45]:
np.add(new_array,5)

array([[  5,   6,   7,   8,   9],
       [ 10,  11,  12,  13,  14],
       [ 15,  16,  17,  18,  19],
       [ 20,  21,  22,  23,  24],
       [ 25,  26,  27,  28,  29],
       [ 30,  31,  32,  33,  34],
       [ 35,  36,  37,  38,  39],
       [ 40,  41,  42,  43,  44],
       [ 45,  46,  47,  48,  49],
       [ 50,  51,  52,  53,  54],
       [ 55,  56,  57,  58,  59],
       [ 60,  61,  62,  63,  64],
       [ 65,  66,  67,  68,  69],
       [ 70,  71,  72,  73,  74],
       [ 75,  76,  77,  78,  79],
       [ 80,  81,  82,  83,  84],
       [ 85,  86,  87,  88,  89],
       [ 90,  91,  92,  93,  94],
       [ 95,  96,  97,  98,  99],
       [100, 101, 102, 103, 104]])

In [46]:
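# Caution: without an `out=` array, positions where the mask is False are left
# uninitialized, so the 1s in the output below are arbitrary memory contents,
# not computed results.
np.add(new_array,5, where=(new_array % 2 == 0))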

array([[  5,   1,   7,   1,   9],
       [  1,  11,   1,  13,   1],
       [ 15,   1,  17,   1,  19],
       [  1,  21,   1,  23,   1],
       [ 25,   1,  27,   1,  29],
       [  1,  31,   1,  33,   1],
       [ 35,   1,  37,   1,  39],
       [  1,  41,   1,  43,   1],
       [ 45,   1,  47,   1,  49],
       [  1,  51,   1,  53,   1],
       [ 55,   1,  57,   1,  59],
       [  1,  61,   1,  63,   1],
       [ 65,   1,  67,   1,  69],
       [  1,  71,   1,  73,   1],
       [ 75,   1,  77,   1,  79],
       [  1,  81,   1,  83,   1],
       [ 85,   1,  87,   1,  89],
       [  1,  91,   1,  93,   1],
       [ 95,   1,  97,   1,  99],
       [  1, 101,   1, 103,   1]])
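When `where=` is used without `out=`, NumPy leaves the unselected positions of the result uninitialized. A minimal sketch of a more predictable form passes a copy of the input as `out`, so unselected positions keep their original values:

```python
import numpy as np

new_array = np.arange(100).reshape(20, 5)
even = new_array % 2 == 0

# Start from a copy, so positions where the mask is False keep their original value
result = np.add(new_array, 5, out=new_array.copy(), where=even)
print(result[:2])
# [[ 5  1  7  3  9]
#  [ 5 11  7 13  9]]

# np.where gives the same result without an explicit output buffer
print(np.array_equal(result, np.where(even, new_array + 5, new_array)))   # True
```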

In [47]:
np.add?

Call signature:  np.add(*args, **kwargs)
Type:            ufunc
String form:     <ufunc 'add'>
File:            c:\users\colli\appdata\local\programs\python\python310\lib\site-packages\numpy\__init__.py
Docstring:
add(x1, x2, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature])

Add arguments element-wise.

Parameters
----------
x1, x2 : array_like
    The arrays to be added.
    If ``x1.shape != x2.shape``, they must be broadcastable to a common
    shape (which becomes the shape of the output).
out : ndarray, None, or tuple of ndarray and None, optional
    A location into which the result is stored. If provided, it must have
    a shape that the inputs broadcast to. If not provided or None,
    a freshly-allocated array is returned. A tuple (possible only as

Data Analysis and Visualization

Financial Mathematics

### Exercise

You have daily stock prices of a company for the last month, and you want to calculate important statistics like the average price, highest and lowest prices, and the percentage changes day-to-day.

In [48]:
price = np.array([150.5, 152.3, 151.2, 154.1, 153.7, 152.8, 151.0, 149.8, 151.7,
                  152.4, 153.1, 154.7, 155.2, 157.0, 156.5, 158.1, 159.2, 160.0,
                  161.5, 162.7, 161.2, 160.8, 159.6, 158.2, 157.3, 156.1, 157.9,
                  159.0, 158.4, 157.7])

average = np.mean(price)
highest = np.max(price)
lowest = np.min(price)
# Day-to-day change in percent: (today - yesterday) / yesterday * 100
percentage_change = np.diff(price) / price[:-1] * 100


print(f"Average price: {average:.2f}")
print(f"Highest price: {highest}")
print(f"Lowest price: {lowest}")
print(f"Percentage change: {percentage_change}")


Average price: 156.12
Highest price: 162.7
Lowest price: 149.8
Percentage change: [ 1.19601329 -0.7222587   1.91798942 -0.25957171 -0.58555628 -1.17801047
 -0.79470199  1.26835781  0.46143705  0.45931759  1.04506858  0.32320621
  1.15979381 -0.31847134  1.02236422  0.69576218  0.50251256  0.9375
  0.74303406 -0.92194222 -0.24813896 -0.74626866 -0.87719298 -0.56890013
 -0.76287349  1.15310698  0.69664345 -0.37735849 -0.44191919]
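To see what `np.diff(price) / price[:-1] * 100` computes, the sketch below repeats the calculation on the first five prices with explicit slices, then locates the largest daily gain. The names `pct_manual` and `i` are chosen only for this illustration.

```python
import numpy as np

# First five prices from the exercise
price = np.array([150.5, 152.3, 151.2, 154.1, 153.7])

pct = np.diff(price) / price[:-1] * 100

# Same calculation with explicit slices: (today - yesterday) / yesterday * 100
pct_manual = (price[1:] - price[:-1]) / price[:-1] * 100
print(np.allclose(pct, pct_manual))      # True

# pct[i] is the change from day i to day i + 1 (0-based)
i = int(np.argmax(pct))
print(i, round(float(pct[i]), 2))        # 2 1.92
```

The result has one element fewer than `price`, because the first day has no previous day to compare against.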
