![Quote](http://darlington.infinityfreeapp.com/images/quote7.png)

# 1. Introduction

Welcome to another lesson in our Python Essentials course! In this session, we will explore the basics of **using Python for data analysis**. Thanks to its simple syntax and powerful libraries, Python is one of the most widely used languages for data analysis. As data analysis has grown in importance across fields and industries, both individuals and organizations rely on Python to make well-informed decisions based on data. By the end of this lesson, you will know how to install and import the core libraries and how to perform basic array operations with NumPy.

# 2. What is Data Analysis?

Data analysis is the systematic process of **extracting valuable insights and knowledge from raw data**. In today's data-driven world, organizations across various sectors such as *business*, *finance*, *healthcare*, *marketing*, and *research* heavily depend on it to make informed decisions. By analyzing data, organizations and individuals can effectively solve real-world problems, identify emerging trends, detect anomalies, make accurate predictions, and optimize their business processes.

# 3. Python Libraries for Data Analysis

Python offers a wide range of libraries, such as **Pandas**, **NumPy**, and **Matplotlib**, for data analysis. These libraries provide powerful tools and functions to efficiently manipulate, clean, visualize, and analyze data.

### 3.2 NumPy

NumPy (**Numerical Python**) is a powerful Python library used for **working with arrays and performing mathematical operations** on them.

#### Using NumPy

**Install NumPy**: Before using NumPy, you need to install it. You can install NumPy using the following command:

In [1]:
# Installing numpy using Jupyter notebook
!pip install numpy






**Import NumPy**: Once NumPy is installed, you can import it into your Python program using the import statement. It is common practice to import NumPy with the alias **np** for better readability.

In [2]:
# Importing numpy library
import numpy as np

**Performing Operations**: NumPy offers a wide range of functions and methods for efficiently performing various numerical operations on arrays, including data from your CSV file. With NumPy, we can effortlessly analyze our sales data, perform calculations, and tackle a variety of numerical tasks with ease.
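
As a minimal sketch of what such operations look like, the snippet below works on a small, made-up array of daily sales figures. The name `sales`, its values, and the 20% markup are illustrative assumptions, not data from the course's CSV file.

```python
import numpy as np

# Hypothetical daily sales figures (illustrative values only)
sales = np.array([120.0, 95.5, 143.25, 110.0, 99.75])

total = sales.sum()                  # sum of all elements
average = sales.mean()               # arithmetic mean
with_markup = sales * 1.2            # element-wise multiplication, no loop needed
above_avg = sales[sales > average]   # boolean mask keeps values above the mean

print(total, average)
print(with_markup)
print(above_avg)
```

Each operation applies to the whole array at once, which is the main practical difference from working with a plain Python list.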

### 3.3 Matplotlib

Matplotlib is a **plotting and visualization library** that enables you to create a wide variety of static, animated, and interactive visualizations in Python. With Matplotlib, you can generate plots, histograms, scatterplots, and more to explore and communicate your data effectively.
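
To give a first impression, here is a minimal sketch of a line plot. It assumes Matplotlib and NumPy are already installed, and the data (the squares of 1 to 10) is made up purely for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: the integers 1..10 and their squares
x = np.arange(1, 11)
y = x ** 2

fig, ax = plt.subplots()                    # one figure containing one set of axes
ax.plot(x, y, marker="o", label="y = x^2")  # line plot with a marker at each point
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line plot")
ax.legend()
plt.show()                                  # in Jupyter the figure appears below the cell
```

The `fig, ax = plt.subplots()` pattern keeps a handle on the axes object, which makes it easier to add labels or more lines later.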

#### Using Matplotlib

**Install Matplotlib**: Before you can use Matplotlib, you need to install it by running the following command:

In [3]:
# Installing matplotlib using Jupyter notebook
!pip install matplotlib



At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:

- NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.

- The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of different sized elements.

- NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.

A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays; though these typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they often output NumPy arrays. In other words, in order to efficiently use much (perhaps even most) of today’s scientific/mathematical Python-based software, just knowing how to use Python’s built-in sequence types is insufficient - one also needs to know how to use NumPy arrays.
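
The differences listed above can be sketched in a few lines. The variable names and values below are illustrative assumptions chosen for this example.

```python
import numpy as np

values = [1, 2, 3, 4]
arr = np.array(values)

# Homogeneous type: mixing ints and a float upcasts every element to float64
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)

# Fixed size: np.append returns a new array and leaves the original unchanged
bigger = np.append(arr, 5)
print(arr.size, bigger.size)

# Vectorized arithmetic replaces an explicit Python loop
doubled_list = [v * 2 for v in values]
doubled_arr = arr * 2
print(doubled_list, doubled_arr)
```

Because `np.append` allocates a new array on every call, growing an array element by element in a loop is usually slower than building a list first and converting it once.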

In [4]:
import numpy as np

Data Analysis and Visualization

Financial Mathematics

In [6]:
import numpy as np

digit=np.array(range(1,11))
type(digit)


numpy.ndarray

In [9]:
digit = np.arange(1, 11)  # arange already returns an ndarray
print(digit.shape)
print(digit)
print(digit.ndim)

(10,)
[ 1  2  3  4  5  6  7  8  9 10]
1


In [29]:
digit = np.arange(1, 11).reshape(2, 5, 1)  # 2 blocks, 5 rows each, 1 column
print(digit)
print(digit.shape)
print(digit.ndim)

[[[ 1]
  [ 2]
  [ 3]
  [ 4]
  [ 5]]

 [[ 6]
  [ 7]
  [ 8]
  [ 9]
  [10]]]
(2, 5, 1)
3


In [33]:
len(dir(np))

530

In [35]:
len(dir(digit))

168

In [37]:
digit.mean()

np.float64(5.5)

In [42]:
print(np.round(digit.std(),2))

2.87
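
Note that `std()` computes the population standard deviation by default (`ddof=0`). The sketch below contrasts it with the sample standard deviation (`ddof=1`) on the same values 1 to 10. Which of the two is appropriate depends on whether the data is the whole population or a sample from it.

```python
import numpy as np

digit = np.arange(1, 11)

population_std = digit.std()        # divides by n (ddof=0, the default)
sample_std = digit.std(ddof=1)      # divides by n - 1

print(np.round(population_std, 2))  # 2.87
print(np.round(sample_std, 2))      # 3.03
```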


In [45]:
print(np.median(digit))

5.5


In [46]:
np.mean(digit)

np.float64(5.5)

In [47]:
digit.ndim

3

In [56]:
new_array = np.arange(100).reshape(20, 5)
new_array

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54],
       [55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64],
       [65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84],
       [85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94],
       [95, 96, 97, 98, 99]])

In [61]:
new = new_array.T  # .T is shorthand for new_array.transpose()

In [64]:
new.shape

(5, 20)

In [68]:
new_array.mean(axis=1)

array([ 2.,  7., 12., 17., 22., 27., 32., 37., 42., 47., 52., 57., 62.,
       67., 72., 77., 82., 87., 92., 97.])
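
The `axis` argument decides which dimension is collapsed. A minimal sketch on a small 2 x 3 array (an illustrative example, not part of the lesson's data) makes the difference between `axis=0` and `axis=1` easier to see:

```python
import numpy as np

small = np.arange(6).reshape(2, 3)
# [[0 1 2]
#  [3 4 5]]

col_means = small.mean(axis=0)  # collapses the rows: one mean per column
row_means = small.mean(axis=1)  # collapses the columns: one mean per row

print(col_means)  # [1.5 2.5 3.5]
print(row_means)  # [1. 4.]
```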

In [70]:
new_array[:10, 0:3]

array([[ 0,  1,  2],
       [ 5,  6,  7],
       [10, 11, 12],
       [15, 16, 17],
       [20, 21, 22],
       [25, 26, 27],
       [30, 31, 32],
       [35, 36, 37],
       [40, 41, 42],
       [45, 46, 47]])

In [75]:
new_array[new_array%2== 0]

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
       34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66,
       68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98])

In [77]:
nm = new_array.sum(axis=1) < 100  # boolean mask: True for rows whose sum is below 100



In [78]:
new_array[nm]

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
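
Boolean masks can also be combined. The sketch below assumes the same `new_array` as above and uses the element-wise operator `&`. The parentheses around each condition are required because `&` binds more tightly than the comparison operators.

```python
import numpy as np

new_array = np.arange(100).reshape(20, 5)

# Element-wise AND: even values that are also greater than 90
even_and_large = new_array[(new_array % 2 == 0) & (new_array > 90)]
print(even_and_large)  # [92 94 96 98]

# Row mask: rows whose sum lies between 100 and 200 (inclusive)
row_sums = new_array.sum(axis=1)
selected_rows = new_array[(row_sums >= 100) & (row_sums <= 200)]
print(selected_rows)
```

Use `|` for an element-wise OR and `~` to negate a mask. The Python keywords `and`, `or`, and `not` do not work on arrays.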

In [82]:
# Without `out`, positions where the condition is False are left uninitialized,
# so start from a copy of new_array to keep the odd values unchanged
np.add(new_array, 5, out=new_array.copy(), where=(new_array % 2 == 0))

array([[  5,   1,   7,   3,   9],
       [  5,  11,   7,  13,   9],
       [ 15,  11,  17,  13,  19],
       [ 15,  21,  17,  23,  19],
       [ 25,  21,  27,  23,  29],
       [ 25,  31,  27,  33,  29],
       [ 35,  31,  37,  33,  39],
       [ 35,  41,  37,  43,  39],
       [ 45,  41,  47,  43,  49],
       [ 45,  51,  47,  53,  49],
       [ 55,  51,  57,  53,  59],
       [ 55,  61,  57,  63,  59],
       [ 65,  61,  67,  63,  69],
       [ 65,  71,  67,  73,  69],
       [ 75,  71,  77,  73,  79],
       [ 75,  81,  77,  83,  79],
       [ 85,  81,  87,  83,  89],
       [ 85,  91,  87,  93,  89],
       [ 95,  91,  97,  93,  99],
       [ 95, 101,  97, 103,  99]])

### Exercise

You have the daily stock prices of a company for the last month, and you want to calculate key statistics: the average price, the highest and lowest prices, and the day-to-day percentage changes. (Python with NumPy)

In [84]:
[150.5, 152.3, 151.2, 154.1, 153.7, 152.8, 151.0, 149.8, 151.7,
                         152.4, 153.1, 154.7, 155.2, 157.0, 156.5, 158.1, 159.2, 160.0,
                         161.5, 162.7, 161.2, 160.8, 159.6, 158.2, 157.3, 156.1, 157.9,
                         159.0, 158.4, 157.7]

In [89]:
import numpy as np

# Daily stock prices for the last month
prices = np.array([150.5, 152.3, 151.2, 154.1, 153.7, 152.8, 151.0, 149.8, 151.7,
                   152.4, 153.1, 154.7, 155.2, 157.0, 156.5, 158.1, 159.2, 160.0,
                   161.5, 162.7, 161.2, 160.8, 159.6, 158.2, 157.3, 156.1, 157.9,
                   159.0, 158.4, 157.7])

average = np.mean(prices)
highest = np.max(prices)
lowest = np.min(prices)
# Day-to-day change relative to the previous day's price, in percent
percentage_changes = np.diff(prices) / prices[:-1] * 100

print(f"Average price: {average:.2f}")
print(f"Highest price: {highest}")
print(f"Lowest price: {lowest}")
print(f"Percentage changes: {percentage_changes}")

Average price: 156.12
Highest price: 162.7
Lowest price: 149.8
Percentage changes: [ 1.19601329 -0.7222587   1.91798942 -0.25957171 -0.58555628 -1.17801047
 -0.79470199  1.26835781  0.46143705  0.45931759  1.04506858  0.32320621
  1.15979381 -0.31847134  1.02236422  0.69576218  0.50251256  0.9375
  0.74303406 -0.92194222 -0.24813896 -0.74626866 -0.87719298 -0.56890013
 -0.76287349  1.15310698  0.69664345 -0.37735849 -0.44191919]
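
To see what the percentage-change expression does, the sketch below spells it out on the first four prices from the exercise and checks that explicit slicing and `np.diff` give the same result. The name `best_idx` is an illustrative addition, not part of the original solution.

```python
import numpy as np

# First four prices from the exercise data
prices = np.array([150.5, 152.3, 151.2, 154.1])

# Change from day i to day i+1, relative to day i, in percent
pct = (prices[1:] - prices[:-1]) / prices[:-1] * 100

# np.diff(prices) computes the same numerator, prices[1:] - prices[:-1]
same = np.allclose(pct, np.diff(prices) / prices[:-1] * 100)
print(same)  # True

# Index of the largest gain: pct[i] is the move from prices[i] to prices[i + 1]
best_idx = int(np.argmax(pct))
print(best_idx)  # 2
```

Because each change is measured against the previous day, the result always has one element fewer than `prices`.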
