
# NUMPY

## Introduction

Python has many packages for working with data, covering operations such as loading, analysing and storing it. Numpy and Pandas are two of the most commonly used. In this notebook, you will learn some of the basic operations in Numpy, along with file handling in Python.

# Numpy
Numpy is a Python package that provides high-performance multidimensional array objects and tools for numerical computing. It is the core library for scientific computing in Python ([see the full documentation](https://numpy.org/)).

### What is an Array?
Like a Python list, a Numpy array is a data structure that stores multiple values. The main difference between the two is that lists can contain heterogeneous data types (combinations of `str`, `int`, even `list`), whereas Numpy arrays store values of a single data type. Numpy arrays can be thought of as a grid of values and can be multi-dimensional.

Numpy arrays are stored more compactly than Python lists and allow mathematical operations to be vectorized (applied to the whole array at once), which is typically much faster than an explicit Python loop.
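To make the vectorization claim concrete, here is a minimal sketch (the names `values`, `squares_loop`, `squares_vec` and `arr` are illustrative) that squares 100,000 numbers once with a Python loop and once with a single array expression. Both give the same numbers; the array version does the work in compiled code rather than in the Python interpreter. Timings depend on your machine, so no expected times are shown.

```python
import timeit

import numpy as np

values = list(range(100_000))

# Looping construct: Python squares one element at a time
squares_loop = [v * v for v in values]

# Vectorized: one expression applied to the whole array at once
arr = np.array(values, dtype=np.int64)
squares_vec = arr * arr

print(np.array_equal(squares_loop, squares_vec))  # True

# Rough timing comparison (results vary between machines)
loop_time = timeit.timeit(lambda: [v * v for v in values], number=5)
vec_time = timeit.timeit(lambda: arr * arr, number=5)
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```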

### Creating a Numpy Array
If we want to work with any Numpy objects or functions, we need to import the Numpy library. First, though, we want to make sure that our package installer is up to date and that we are working with the latest version of Numpy.

In [None]:
# Ensure the Package Installer is up to date.

!pip install --upgrade pip

Collecting pip
  Downloading pip-25.1.1-py3-none-any.whl.metadata (3.6 kB)
Downloading pip-25.1.1-py3-none-any.whl (1.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 20.6 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.1.2
    Uninstalling pip-24.1.2:
      Successfully uninstalled pip-24.1.2
Successfully installed pip-25.1.1


In [None]:
# Install numpy, or upgrade it to the latest version if it is already installed
!pip install --upgrade numpy



In [1]:
# Import numpy into our workspace

import numpy as np

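Understanding dimensionality: a 0-D value is a single number, a 1-D structure is a sequence of numbers, a 2-D structure is a grid of rows and columns, and a 3-D structure is a stack of such grids.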

In [None]:
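# 0-D: a single value (a scalar)
zero_d = 98

# 1-D: a sequence of values
one_d = [98, 67, 45]

# 2-D: a grid of 4 rows and 3 columns (a list of lists)
two_d = [
    [98, 67, 45],
    [89, 34, 78],
    [45, 43, 56],
    [12, 23, 53]
]

# 3-D: two 4x3 grids stacked together (a list of lists of lists)
three_d = [
    [
        [98, 67, 45],
        [89, 34, 78],
        [45, 43, 56],
        [12, 23, 53]
    ],
    [
        [98, 67, 45],
        [89, 34, 78],
        [45, 43, 56],
        [12, 23, 53]
    ]
]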

To make a Numpy array, use the `np.array()` function: pass it a list and, optionally, the data type of the elements. Let's look at an example:

In [None]:
print([1,3,4])  # a list

[1, 3, 4]


In [None]:
np.array([1,3,4])  #an array

array([1, 3, 4])

In [None]:
# Create our first array

# Creating a 1D array
arr_1d = np.array([1, 2, 3])

print(arr_1d)
print(type(arr_1d))

[1 2 3]
<class 'numpy.ndarray'>


In [None]:
list1 = [1,2,3,4]  # a normal list

print(list1)

[1, 2, 3, 4]


In [None]:
# Create array from an existing list

arr_1d2 = np.array(list1)

print(arr_1d2)
print(type(arr_1d2))

[1 2 3 4]
<class 'numpy.ndarray'>


We can inspect the shape of the array by using its shape attribute. This will return a tuple of integers giving the size of the array along each dimension.

In [None]:
#confirm the dimension of numpy array using arr.shape attribute

arr_1d.shape

(3,)

Seeing just one number in the output of the `shape` attribute tells us we have a 1-D Numpy array.

In [None]:
# Creating a 2D array

arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6]]) #looks like creating a list of lists.
print(arr_2d)
print(type(arr_2d))

[[1 2 3]
 [4 5 6]]
<class 'numpy.ndarray'>


In [None]:
arr_2d.shape

(2, 3)

Seeing 2 numbers in the output of the `shape` attribute tells us we have a 2-D Numpy array.

In [None]:
# 3-D array

# Create a 3-D array with shape (2, 3, 4)
# (2 planes, each containing 3 rows and 4 columns)
arr_3d = np.array([[[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]],

                  [[13, 14, 15, 16],
                   [17, 18, 19, 20],
                   [21, 22, 23, 24]]])

print(arr_3d)

[[[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]]

 [[13 14 15 16]
  [17 18 19 20]
  [21 22 23 24]]]


In [None]:
#check shape
print(arr_3d.shape)

(2, 3, 4)


Seeing 3 numbers in the output of the `shape` attribute tells us we have a 3-D Numpy array.
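The number of entries in `shape` is also available directly through the array's `ndim` attribute. A minimal sketch that rebuilds small arrays so it runs on its own (`np.arange(1, 25).reshape(2, 3, 4)` produces the same values as `arr_3d` above):

```python
import numpy as np

arr_1d = np.array([1, 2, 3])
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
arr_3d = np.arange(1, 25).reshape(2, 3, 4)  # values 1..24 in 2 planes of 3x4

# ndim is the number of dimensions, i.e. len(arr.shape)
for a in (arr_1d, arr_2d, arr_3d):
    print(a.shape, a.ndim)
# (3,) 1
# (2, 3) 2
# (2, 3, 4) 3
```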

## Indexing & Slicing

Recall the indexing, slicing and mutability of lists:

In [None]:
students =["Simon Ogunleye", "Jesselyn Ayanka", "Philip Donatus"]

In [None]:
students[2] #access the 3rd element

'Philip Donatus'

In [None]:
students[2:] #slice containing the last element

['Philip Donatus']

In [None]:
students[2][0] #access the first character of the 3rd element

'P'

In [None]:
students[:2][0] #access the first element in a slice

'Simon Ogunleye'

In [None]:
print(students[:2][0][:4])

Simo


In [None]:
b = ["we", "go", "up"]

#replacing element in a list
b[2] = "down"

print(b)

['we', 'go', 'down']


Indexing and slicing in Numpy arrays work much like they do in other sequences such as lists and tuples.

In [None]:
arr_1d

array([1, 2, 3])

Conventional indexing & Slicing

In [None]:
# Access the 2nd item
arr_1d[1]

np.int64(2)

In [None]:
#slice out an array containing only the first 2 items
arr_1d[:2]

array([1, 2])

In [None]:
arr_1d[-1:]  #negative indexing also works

array([3])

In [None]:
#recall arr_2d
arr_2d

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
arr_2d[0]

array([1, 2, 3])

In [None]:
arr_2d[0][0]

np.int64(1)

In [None]:
arr_2d[0][:2]

array([1, 2])

In [None]:
arr_2d3 = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
arr_2d3

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [None]:
arr_2d3[1:]

array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

#### Slicing

Similar to Python lists, Numpy arrays can be sliced.

Since arrays may be multi-dimensional, you can specify a slice for each dimension of the array, separating the slices for the different dimensions with commas. Any dimensions you leave out at the end are taken in full, which is why `arr_2d3[1:]` above returned complete rows.

For a 2-D array, the first dimension is the row while the second dimension is the column.

    arr[row, column] - for one element

    arr[row_start:row_end, column_start:column_end] - for an array slice, i.e., more than one element (the end index is excluded, as with lists)

Let's look at a few examples:

In [None]:
# row-wise slicing
arr_2d3[:2]

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [None]:
#column-wise slicing
arr_2d3[:,1:3]

array([[ 2,  3],
       [ 6,  7],
       [10, 11]])

In [None]:
arr_2d3

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [None]:
arr_2d3[:2,1:3]

array([[2, 3],
       [6, 7]])

In [None]:
arr_2d3[:2,:2]

array([[1, 2],
       [5, 6]])

In [None]:
arr_2d3[-2:,-2:]

array([[ 7,  8],
       [11, 12]])

In [None]:
arr_2d3[1,1]

In [None]:
arr_2d3

In [None]:
arr_2d3[1,1:3]

In [None]:
arr_2d3[:,0]

In [None]:
arr_2d3[:,0].shape

In [None]:
arr_2d3[:,0:1]

In [None]:
arr_2d3[:,0:1].shape
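The last four cells show an easy-to-miss difference: an integer index drops that dimension, while a slice of length one keeps it. A self-contained sketch using the original values of `arr_2d3`:

```python
import numpy as np

arr_2d3 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

first_col_1d = arr_2d3[:, 0]    # integer index -> the column dimension is dropped
first_col_2d = arr_2d3[:, 0:1]  # slice -> stays 2-D (a column vector)

print(first_col_1d.shape)  # (3,)
print(first_col_2d.shape)  # (3, 1)

# For single elements, arr[row, col] gives the same value as arr[row][col]
print(arr_2d3[1, 1] == arr_2d3[1][1])  # True
```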

In [None]:
std_arr = np.array(students)
std_arr

In [None]:
print(arr_3d.shape, "\n")
print(arr_3d)

In [None]:
# Use indexing to find 8, 16, 24, 5, 17, 22

In [None]:
arr_3d[0,1,3]

In [None]:
array_a = np.array([[1,2,3],[4,5,6]], dtype = np.complex64)

array_a

Reassigning values in an array

In [None]:
#replace a single element
print(arr_1d2)

arr_1d2[1] = 14

print(arr_1d2)

[1 2 3 4]
[ 1 14  3  4]


In [None]:
arr_2d3

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [None]:
#replace a section/slice with a scalar
arr_2d3[:,3] = 30
arr_2d3

array([[ 1,  2,  3, 30],
       [ 5,  6,  7, 30],
       [ 9, 10, 11, 30]])

In [None]:
#replace a section/slice with a list/array
arr_2d3[:,1] = [3, 1, 5]
arr_2d3

array([[ 1,  3,  3, 30],
       [ 5,  1,  7, 30],
       [ 9,  5, 11, 30]])

In [None]:
print(arr_1d2)
arr_1d2[:] = 25
print(arr_1d2)

[1 2 3 4]
[25 25 25 25]


#### Boolean Array Indexing
Boolean array indexing lets you pick out a selection of elements from an array. This type of indexing is often used to select the elements of an array that satisfy a specific condition.

    np.array[condition]

In [None]:
arr_2d3>5

array([[False, False, False,  True],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]])

In [None]:
big_arr_2d3 = arr_2d3[arr_2d3>5]
big_arr_2d3

array([30,  6,  7, 30,  9, 10, 11, 30])
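Conditions can be combined with `&` (and), `|` (or) and `~` (not). Each condition needs its own parentheses, because these operators bind more tightly than comparisons, and Python's `and`/`or` keywords do not work element-wise on arrays. A minimal sketch on a fresh array:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Elements greater than 5 AND even
print(arr[(arr > 5) & (arr % 2 == 0)])  # [ 6  8 10 12]

# Elements less than 3 OR greater than 10
print(arr[(arr < 3) | (arr > 10)])      # [ 1  2 11 12]

# Elements NOT greater than 5
print(arr[~(arr > 5)])                  # [1 2 3 4 5]
```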

#### Modifying Numpy Arrays
We will now look at how to add elements to an array, followed by how to remove elements from an array.

**Adding Elements**

Adding elements can be done with the `np.append()` function, which returns a new array with the values added to the end. If no `axis` is given, the array is flattened to 1-D first.

**Removing Elements**

Deleting elements can be done with the `np.delete()` function, which returns a new array without the elements at the specified indices.

**Reshape an array**

We can reshape an array into a different shape, as long as the total number of elements stays the same. To do so, we use the `np.reshape()` function or the array's `.reshape()` method.

In [None]:
arr_2d

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
# adding an element (the result is a new, flattened 1-D array)
np.append(arr_2d,7)


array([1, 2, 3, 4, 5, 6, 7])
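As the output above shows, `np.append()` flattens the array when no `axis` is given. To keep the 2-D shape, pass `axis=0` (add rows) or `axis=1` (add columns) together with values of a matching shape. A minimal sketch:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Add a row: the new values must be 2-D with 3 columns
with_row = np.append(arr_2d, [[7, 8, 9]], axis=0)
print(with_row.shape)  # (3, 3)

# Add a column: the new values must be 2-D with 2 rows
with_col = np.append(arr_2d, [[10], [20]], axis=1)
print(with_col)
# [[ 1  2  3 10]
#  [ 4  5  6 20]]

# np.append returns a new array; the original is unchanged
print(arr_2d.shape)  # (2, 3)
```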

In [None]:
#removing items from numpy array

arr_2d3

array([[ 1,  2,  3, 30],
       [ 5,  6,  7, 30],
       [ 9, 10, 11, 30]])

`np.delete()` returns a copy of a Numpy array with a slice removed, after specifying the following:

1. the array to modify
2. the index to delete
3. the axis or dimension to remove the slice from.

In [None]:
# delete the 4th column from arr_2d3
np.delete(arr_2d3, 3, 1)

array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11]])

In [None]:
#delete the 2nd row from arr_2d3
np.delete(arr_2d3, 1, 0)

array([[ 1,  2,  3, 30],
       [ 9, 10, 11, 30]])
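Like `np.append()`, `np.delete()` leaves the original array untouched, so assign the result if you want to keep it. It also accepts a list of indices to remove several slices at once. A minimal sketch using a fresh array with the same values as `arr_2d3` in the cells above:

```python
import numpy as np

arr = np.array([[1, 2, 3, 30], [5, 6, 7, 30], [9, 10, 11, 30]])

# Remove the 1st and 3rd columns in one call
trimmed = np.delete(arr, [0, 2], axis=1)
print(trimmed)
# [[ 2 30]
#  [ 6 30]
#  [10 30]]

print(arr.shape)  # (3, 4): the original still has all four columns
```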

## Reshaping

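Arrays can be reshaped using the `array.reshape()` method.

Note that Numpy is big on homogeneity: every element of an array has the same data type. In the next example, the single float `1.1` makes Numpy store every element as a float.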

In [20]:
arr_2d4 =np.array([[1.1, 2, 3],[4, 5, 6],[7,8,9]])
arr_2d4

array([[1.1, 2. , 3. ],
       [4. , 5. , 6. ],
       [7. , 8. , 9. ]])

In [None]:
arr_1d2

array([25, 25, 25, 25])

In [None]:
arr_1d2.shape

(4,)

Note that the product of the numbers passed as the new shape must equal the total number of items in the initial array.
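As a quick check of this rule, a minimal sketch: a 12-element array can become `(3, 4)` or `(2, 6)` but not `(5, 2)`. Passing `-1` for one dimension lets Numpy work out that size from the total.

```python
import numpy as np

arr = np.arange(12)              # 12 elements: 0, 1, ..., 11

print(arr.reshape(3, 4).shape)   # (3, 4), since 3 * 4 == 12
print(arr.reshape(2, -1).shape)  # (2, 6), -1 is inferred as 12 / 2

try:
    arr.reshape(5, 2)            # 5 * 2 == 10, not 12
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```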

In [None]:
# create a 2d array from a 1d array using reshape() method

arr_2d5 = arr_1d2.reshape(4,1)

In [None]:
arr_2d5

array([[25],
       [25],
       [25],
       [25]])

In [None]:
arr_2d6 = arr_1d2.reshape(2,2)
arr_2d6

array([[25, 25],
       [25, 25]])

In [None]:
arr_2d

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
#reshape into a 1-D array with 6 elements
arr_1d4 = arr_2d.reshape(6)

arr_1d4

array([1, 2, 3, 4, 5, 6])

In [None]:
print(arr_2d)
arr_2d7 = arr_2d.reshape(3,2)
print(arr_2d7)

[[1 2 3]
 [4 5 6]]
[[1 2]
 [3 4]
 [5 6]]


In [None]:
arr_2d3

array([[ 1,  3,  3, 30],
       [ 5,  1,  7, 30],
       [ 9,  5, 11, 30]])

In [None]:
arr_3d2 = arr_2d3.reshape(3,2,2)
arr_3d2

array([[[ 1,  3],
        [ 3, 30]],

       [[ 5,  1],
        [ 7, 30]],

       [[ 9,  5],
        [11, 30]]])

## Element-wise operations with numpy

* between array and scalar
* between array and array

### between array and scalar

In [None]:
matrix_B = arr_2d4.copy()
matrix_B

array([[1.1, 2. , 3. ],
       [4. , 5. , 6. ],
       [7. , 8. , 9. ]])

In [None]:
print(matrix_B)

print(matrix_B +10) #add 10 to each element in matrix_B

[[1.1 2.  3. ]
 [4.  5.  6. ]
 [7.  8.  9. ]]
[[11.1 12.  13. ]
 [14.  15.  16. ]
 [17.  18.  19. ]]


In [None]:
print(matrix_B)
print(matrix_B//2) ##floor division of each element in matrix_B by 2

[[1.1 2.  3. ]
 [4.  5.  6. ]
 [7.  8.  9. ]]
[[0. 1. 1.]
 [2. 2. 3.]
 [3. 4. 4.]]


In [None]:
doubled_b = matrix_B*2
doubled_b

array([[ 2.2,  4. ,  6. ],
       [ 8. , 10. , 12. ],
       [14. , 16. , 18. ]])

### between arrays and arrays

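* For element-wise addition, subtraction, multiplication and division, the arrays must have the same shape (or shapes that Numpy can broadcast together, as with an array and a scalar).

* Multiplication with `*` is element-wise: matching elements are multiplied together. It is **not** matrix multiplication (for that, see `np.matmul()`).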
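Arrays of different shapes can still be combined element-wise when Numpy can broadcast them: for example, a 1-D array of 3 elements is stretched across every row of a (3, 3) array. This is the same rule that lets a scalar be added to every element. A minimal sketch:

```python
import numpy as np

matrix = np.array([[3, 4, 2], [5, 7, 8], [1, 0, 9]])
row = np.array([10, 20, 30])

# row has shape (3,), so it is added to every row of matrix (shape (3, 3))
print(matrix + row)
# [[13 24 32]
#  [15 27 38]
#  [11 20 39]]

# Shapes (3, 3) and (2,) cannot be broadcast together
try:
    matrix + np.array([1, 2])
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)
```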

In [None]:
matrix_A = np.array([[3,4,2],[5,7,8],[1,0,9]])
print(matrix_A)
print(matrix_B)

[[3 4 2]
 [5 7 8]
 [1 0 9]]
[[1.1 2.  3. ]
 [4.  5.  6. ]
 [7.  8.  9. ]]


In [None]:
matrix_B - matrix_A

array([[-1.9, -2. ,  1. ],
       [-1. , -2. , -2. ],
       [ 6. ,  8. ,  0. ]])

In [None]:
matrix_B + matrix_A

array([[ 4.1,  6. ,  5. ],
       [ 9. , 12. , 14. ],
       [ 8. ,  8. , 18. ]])

In [None]:
matrix_B * matrix_A

array([[ 3.3,  8. ,  6. ],
       [20. , 35. , 48. ],
       [ 7. ,  0. , 81. ]])

In [None]:
np.multiply(matrix_B, matrix_A)

array([[ 3.3,  8. ,  6. ],
       [20. , 35. , 48. ],
       [ 7. ,  0. , 81. ]])

Further reading:

* dot-multiplication: `np.dot()`
* matrix multiplications: `np.matmul()`
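For reference, `*` multiplies matching elements, whereas `np.matmul()` (also written with the `@` operator) performs true matrix multiplication, summing row-by-column products. A minimal sketch on two small matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A * B)            # element-wise product
# [[ 5 12]
#  [21 32]]

print(np.matmul(A, B))  # matrix product, same as A @ B
# [[19 22]
#  [43 50]]

print(np.dot(A, B))     # for 2-D arrays, np.dot gives the same matrix product
```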

## Data types in numpy

```
* np.int32, np.int16, np.int64
* np.float32, np.float16, np.float64
* np.str_
* np.bool
* np.complex64

```
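Every array records its element type in the `dtype` attribute, and the `.astype()` method returns a converted copy. Converting floats to integers drops the fractional part (truncation toward zero). A minimal sketch:

```python
import numpy as np

floats = np.array([1.7, -2.7, 3.5])
print(floats.dtype)             # float64

ints = floats.astype(np.int32)  # a new array; floats is unchanged
print(ints)                     # [ 1 -2  3]
print(ints.dtype)               # int32
```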


In [None]:
# Create an integer array (the int32 dtype truncates the float 3.5 to 3)
array_a = np.array([[1, 2, 3.5], [4, 5, 6]], np.int32)
array_a
#print("Integer array:\n", array_a)

array([[1, 2, 3],
       [4, 5, 6]], dtype=int32)

In [None]:
# Create a float array
array_b = np.array([[1, 2, 3], [4, 5, 6]], np.float64)
print("float array:", array_b)

float array: [[1. 2. 3.]
 [4. 5. 6.]]


In [None]:
np.array(students, np.str_)

array(['Simon Ogunleye', 'Jesselyn Ayanka'], dtype='<U15')

In [None]:
# Create a string array
array_s = np.array([[1, 2, 3], [4, 5, 6]], np.str_)
print("String array:", array_s)

String array: [['1' '2' '3']
 ['4' '5' '6']]


In [None]:
# Create a boolean array
array_d = np.array([[1, 2, 3], [0, 5, 6]], np.bool)
print("Boolean array:", array_d)

Boolean array: [[ True  True  True]
 [False  True  True]]


In [None]:
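# Create a complex number array
# (np.complex128 exists on every platform; np.complex256 is not available everywhere, e.g. on Windows)
array_c = np.array([[1, 2, 3], [0, 5, 6]], np.complex128)

print("Complex number array:", array_c)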

Complex number array: [[1.+0.j 2.+0.j 3.+0.j]
 [0.+0.j 5.+0.j 6.+0.j]]


## Generate data with numpy

```
* np.arange()
* np.empty()
* 0s and 1s {np.zeros(), np.ones()}
* np.full()
* np.empty_like()
* np.linspace()
* np.random()
```


### np.arange

Creates an array with evenly spaced values within a specified range (similar to Python’s `range()` but returns an array).

In [2]:
#recall the range() function

range(2,10,3)

range(2, 10, 3)

In [3]:
list(range(2,10,3))

[2, 5, 8]

`np.arange()` follows the same rule: it starts at `start`, moves in steps of `step`, and stops before reaching `stop`.

Syntax:

    np.arange(start, stop, step, dtype)


In [9]:
np.arange(5,20,2,np.float32)

array([ 5.,  7.,  9., 11., 13., 15., 17., 19.], dtype=float32)

In [4]:
np.arange(0,10,2)

array([0, 2, 4, 6, 8])

In [5]:
b = np.arange(10,25,3)

In [6]:
b

array([10, 13, 16, 19, 22])

### np.empty()

Creates an array without initializing its values, so it holds whatever happened to be in memory (arbitrary values, not truly random). It is fast, but the values are garbage, so set them explicitly before using them.


In [13]:
# np.empty creates empty array (arbitrary values) of a desired shape

array_empty = np.empty(shape = (4,3), dtype= int)  #must state shape as argument
array_empty

array([[              38844,                   0,                   0],
       [                  0,                   0, 8319683848551211643],
       [3180222411935070754, 4189017755886035488, 7308535291872901920],
       [4189017755886051700, 8027140907415206688, 7018332503360695925]])

### np.zeros & np.ones

* np.zeros() - Creates an array filled with zeros.
* np.ones() - Creates an array filled with ones.

In [15]:
#np.zeros() & np.ones() return arrays of 0s and 1s in a desired shape

arr_of_0s = np.zeros(shape=(4,2))

arr_of_1s = np.ones(shape=(3,4),dtype= int)

print(arr_of_0s)
print(arr_of_1s)

[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]
[[1 1 1 1]
 [1 1 1 1]
 [1 1 1 1]]


### np.full()

* np.full() - Creates an array filled with a specific value.

Syntax:

    np.full(shape=(rows,columns), fill_value=x)

In [18]:
#return a 2x4 array filled with 3s

arr_fl_3s = np.full((2,4),3)
print(arr_fl_3s)

arr_fl_yes = np.full(shape=(3,2),fill_value="Yes!")
print(arr_fl_yes)

[[3 3 3 3]
 [3 3 3 3]]
[['Yes!' 'Yes!']
 ['Yes!' 'Yes!']
 ['Yes!' 'Yes!']]


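### _like functions

`np.empty_like()` returns an uninitialized array (filled with arbitrary values) with the same shape and data type as an existing array.

* `_like()` versions exist for several creation functions: `np.empty_like()`, `np.zeros_like()`, `np.ones_like()` and `np.full_like()`.

* Some creation functions, such as `np.array()` and `np.arange()`, also accept a `like=` argument, but it selects which array library creates the result rather than copying a shape.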

In [19]:
#testing _like
empty_like_yes = np.empty_like(arr_fl_yes)
print(empty_like_yes)  #return empty text arrays

empty_lik_3s = np.empty_like(arr_fl_3s)
print(empty_lik_3s)

[['' '']
 ['' '']
 ['' '']]
[[3 3 3 3]
 [3 3 3 3]]


In [21]:
matrix_B = arr_2d4.copy()
matrix_B

array([[1.1, 2. , 3. ],
       [4. , 5. , 6. ],
       [7. , 8. , 9. ]])

In [22]:
arr_0s_lik_mb = np.zeros_like(matrix_B)
print(arr_0s_lik_mb)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [23]:
# Try on your own.

# np.ones_like

# np.full_like

### np.linspace()

The np.linspace() function in NumPy generates evenly spaced numbers over a specified range.

You choose how many values you get, and both the start (min) and stop (max) values are included. Numpy infers the step size from the number of values you specify:

Syntax:
```
np.linspace(start, stop, num)
```
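When the stop value is included, the inferred step is (stop - start) / (num - 1). `np.linspace()` can return it for you with `retstep=True`, and `endpoint=False` leaves the stop value out instead, which changes the step to (stop - start) / num. A minimal sketch:

```python
import numpy as np

values, step = np.linspace(1, 10, 5, retstep=True)
print(values)
print(step)  # 2.25, i.e. (10 - 1) / (5 - 1)

# Without the endpoint: 4 values from 0 up to (but not including) 1
print(np.linspace(0, 1, 4, endpoint=False))
```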

In [24]:
x_axis = np.linspace(0,1,20)
x_axis

array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
       0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
       0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
       0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ])

In [25]:
y_axis = np.linspace(1,10,5)
y_axis

array([ 1.  ,  3.25,  5.5 ,  7.75, 10.  ])

### np.random()

The `np.random` module generates random values.

np.random.random() - random floats in the half-open interval [0.0, 1.0)

np.random.randint() - random integers from `low` (inclusive) up to `high` (exclusive)

In [35]:
np.random.random() #return a random float btwn 0.0 &1.0

0.6749545923097363

In [39]:
#to make an array we must pass a shape as the size argument.
rand_flt_arr = np.random.random(size=(3,2))
rand_flt_arr

array([[0.64037046, 0.09968804],
       [0.0561214 , 0.23735562],
       [0.25605793, 0.52813231]])

```
np.random.randint(low, high=None, size=None)
```

In [70]:
#return 10 random integers; a single limit is taken as the exclusive max, so values come from [0, 40)
rand_array_1d = np.random.randint(40, size=10)
rand_array_1d

array([ 1, 28, 27,  2, 10, 16, 31, 33, 37,  5])

In [65]:
#to make an array we must pass a size as its argument.
rand_int_arr = np.random.randint(20,size=(3,4))
rand_int_arr

array([[15,  7, 17, 16],
       [ 8, 11, 14,  2],
       [12,  5,  1,  0]])
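Numpy also offers a newer `Generator` interface, created with `np.random.default_rng()`, which the Numpy documentation recommends for new code. Passing a seed makes the "random" values reproducible, which is handy when you want a notebook to give the same results every time it runs. A minimal sketch (the seed `42` is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

floats = rng.random(size=(3, 2))         # floats in [0.0, 1.0)
ints = rng.integers(0, 20, size=(3, 4))  # integers in [0, 20)

# A new generator with the same seed repeats the same sequence
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(floats, rng_again.random(size=(3, 2))))  # True
```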

Convert an array to a list with the `.tolist()` method:

In [71]:
list_frm_1d = rand_array_1d.tolist()
list_frm_1d

[1, 28, 27, 2, 10, 16, 31, 33, 37, 5]

Statistical calculations other than sum, max and min are a bit tedious without using numpy.

In [72]:
print(sum(list_frm_1d))
print(max(list_frm_1d))
print(min(list_frm_1d))

190
37
1


In [74]:
rand_list_frm_2d = rand_int_arr.tolist() # a nested list made from a 2-D array
rand_list_frm_2d

[[15, 7, 17, 16], [8, 11, 14, 2], [12, 5, 1, 0]]

In [77]:
#sum(rand_list_frm_2d)  # raises TypeError: the built-in sum() cannot add up nested lists

In [78]:
array_empty = np.empty(10)
array_zeros = np.zeros(10)
array_ones = np.ones(10)
array_twos = np.full(10, 2)
print(array_empty)
print(array_zeros)
print(array_ones)
print(array_twos)

[8.62033353e-316 0.00000000e+000 0.00000000e+000 0.00000000e+000
 6.82391406e-310 3.21867037e-057 1.14195227e-071 4.51253070e-090
 6.59478547e-042 6.82395501e-310]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[2 2 2 2 2 2 2 2 2 2]


In [79]:
array2d_empty = np.empty((2, 4))
array2d_zeros = np.zeros((2, 4))
array2d_ones = np.ones((2, 4))
array2d_twos = np.full((2, 4), 2)
print(array2d_empty)
print(array2d_zeros)
print(array2d_ones)
print(array2d_twos)

[[8.21623239e-316 0.00000000e+000 7.70200376e+218 8.90005428e-316]
 [0.00000000e+000 0.00000000e+000 9.88131292e-324 0.00000000e+000]]
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[[2 2 2 2]
 [2 2 2 2]]


## Statistics With Numpy

Numpy provides functions for common statistical measures such as the mean, median, standard deviation, variance, percentiles and correlation.

|Function           |Measure                  |
|-------------------|-------------------------|
|np.mean()          |Arithmetic mean          |
|np.median()        |Median value             |
|np.std()           |Standard deviation       |
|np.var()           |Variance                 |
|np.min()/np.max()  |Minimum and Maximum      |
|np.percentile()    |Nth Percentile           |
|np.corrcoef()      |Correlation coefficient  |
|np.sum()           |Total sum                |
|np.cumsum()        |Cumulative sum           |

In [81]:
# Creating a 1D array with an odd number of elements
array1 = np.array([10, 2, 3, 4, 5])
median = np.median(array1)
print("Median:", median)

Median: 4.0


In [82]:
mean = np.mean(array1)
print("Mean:", mean)

Mean: 4.8


In [83]:
var, std = np.var(array1), np.std(array1)
print("Variance:", var)
print("Standard deviation:", std)

Variance: 7.760000000000001
Standard deviation: 2.785677655436824
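`np.var()` and `np.std()` compute the *population* variance and standard deviation by default, dividing by N. For the *sample* versions, which divide by N - 1, pass `ddof=1`. A minimal sketch with the same five values, computing both by hand from the squared deviations:

```python
import numpy as np

array1 = np.array([10, 2, 3, 4, 5])
n = len(array1)
squared_dev = (array1 - array1.mean()) ** 2  # squared distances from the mean (4.8)

pop_var = squared_dev.sum() / n               # what np.var() computes by default (ddof=0)
sample_var = squared_dev.sum() / (n - 1)      # what np.var(..., ddof=1) computes

print(np.isclose(np.var(array1), pop_var))             # True
print(np.isclose(np.var(array1, ddof=1), sample_var))  # True
print(np.std(array1, ddof=1))  # sample standard deviation = sqrt(sample_var)
```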


In [86]:
#Getting the statistical measures using functions

print(rand_int_arr)
print(f"Sum: {np.sum(rand_int_arr)}.")
print(f"Minimum value: {np.min(rand_int_arr)}.")
print(f"Maximum value: {np.max(rand_int_arr)}.")
print(f"Median value: {np.median(rand_int_arr)}.")
print(f"Mean: {np.mean(rand_int_arr)}.")
print(f"Variance: {np.var(rand_int_arr)}.")
print(f"Standard deviation: {np.std(rand_int_arr)}.")
print(f"Cumulative Sum: {np.cumsum(rand_int_arr)}.")

[[15  7 17 16]
 [ 8 11 14  2]
 [12  5  1  0]]
Sum: 108.
Minimum value: 0.
Maximum value: 17.
Median value: 9.5.
Mean: 9.0.
Variance: 33.5.
Standard deviation: 5.787918451395113.
Cumulative Sum: [ 15  22  39  55  63  74  88  90 102 107 108 108].


In [89]:
# Median of each row (axis=1, i.e. across the columns)
median_horizontal = np.median(rand_int_arr, axis=1)
print("Median along horizontal axis:", median_horizontal)

# Median of each column (axis=0, i.e. down the rows)
median_vertical = np.median(rand_int_arr, axis=0)
print("Median along vertical axis:", median_vertical)

Median along horizontal axis: [15.5  9.5  3. ]
Median along vertical axis: [12.  7. 14.  2.]


In [91]:
# calculate the mean of each column (axis=0)
mean_vertical = np.mean(rand_int_arr, axis=0)
print("Along Vertical Axis:",mean_vertical)
# calculate the mean of each row (axis=1)
mean_horizontal = np.mean(rand_int_arr, axis=1)
print("Along Horizontal Axis :",mean_horizontal)

Along Vertical Axis: [11.66666667  7.66666667 10.66666667  6.        ]
Along Horizontal Axis : [13.75  8.75  4.5 ]


In [88]:
# getting the statistics with array methods

print(rand_int_arr)
print(f"Sum: {rand_int_arr.sum()}.")
print(f"Minimum value: {rand_int_arr.min()}.")
print(f"Maximum value: {rand_int_arr.max()}.")
#print(f"Median value: {rand_int_arr.median()}.")  # arrays have no .median() method; use np.median() instead
print(f"Mean: {rand_int_arr.mean()}.")
print(f"Variance: {rand_int_arr.var()}.")
print(f"Standard deviation: {rand_int_arr.std()}.")
print(f"Cumulative Sum: {rand_int_arr.cumsum()}.")

[[15  7 17 16]
 [ 8 11 14  2]
 [12  5  1  0]]
Sum: 108.
Minimum value: 0.
Maximum value: 17.
Mean: 9.0.
Variance: 33.5.
Standard deviation: 5.787918451395113.
Cumulative Sum: [ 15  22  39  55  63  74  88  90 102 107 108 108].


##### Percentiles

In [97]:
# compute the 25th percentile of the array
result1 = np.percentile(rand_int_arr, 25)
print("25th percentile:",result1)

# compute the 75th percentile of the array
result2 = np.percentile(rand_int_arr, 75)
print("75th percentile:",result2)

# compute the 50th percentile i.e. median of the array
result3 = np.percentile(rand_int_arr, 50)
print("50th percentile:",result3)

25th percentile: 4.25
75th percentile: 14.25
50th percentile: 9.5


In [96]:
np.sort(rand_int_arr.reshape(12))

array([ 0,  1,  2,  5,  7,  8, 11, 12, 14, 15, 16, 17])
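By default, `np.percentile()` uses linear interpolation on the sorted values: the q-th percentile sits at position (q / 100) * (N - 1), and when that position falls between two elements the result is interpolated between them. A minimal sketch that reproduces the 25th percentile computed above from the same twelve sorted values:

```python
import numpy as np

sorted_vals = np.array([0, 1, 2, 5, 7, 8, 11, 12, 14, 15, 16, 17])
q = 25

pos = (q / 100) * (len(sorted_vals) - 1)  # 0.25 * 11 = 2.75
lower = int(np.floor(pos))                # index 2 -> value 2
frac = pos - lower                        # 0.75 of the way to the next value

manual = sorted_vals[lower] + frac * (sorted_vals[lower + 1] - sorted_vals[lower])
print(manual)                         # 4.25
print(np.percentile(sorted_vals, q))  # 4.25
```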

##### Covariance

In [100]:
# Data matrix
matrix_A = np.array([[1, 0, 0, 3, 1], [3, 6, 6, 2, 9], [4, 5, 3, 8, 0]])
print(matrix_A)
# Covariance matrix calculation
cov_matrix = np.cov(matrix_A)
print("Covariance matrix of matrix_A:\n", cov_matrix)

[[1 0 0 3 1]
 [3 6 6 2 9]
 [4 5 3 8 0]]
Covariance matrix of matrix_A:
 [[ 1.5 -2.   2. ]
 [-2.   7.7 -7. ]
 [ 2.  -7.   8.5]]
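`np.cov()` treats each **row** as a variable and each column as an observation, and divides by N - 1 (the sample covariance). A minimal sketch that rebuilds the matrix above from its definition: center each row on its mean, then average the products of the deviations:

```python
import numpy as np

matrix_A = np.array([[1, 0, 0, 3, 1], [3, 6, 6, 2, 9], [4, 5, 3, 8, 0]])
n_obs = matrix_A.shape[1]  # 5 observations per variable

centered = matrix_A - matrix_A.mean(axis=1, keepdims=True)  # subtract each row's mean
manual_cov = centered @ centered.T / (n_obs - 1)

print(np.allclose(manual_cov, np.cov(matrix_A)))  # True
print(manual_cov[0, 0])                           # 1.5, the variance of row 0
```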


##### Correlation

| Pearson Correlation Coefficient (r) | Description of Relationship   |
|:------------------------------------|:-----------------------------:|
| r = -1                              | Perfect Negative Correlation  |
| -1 < r < -0.8                       | Strong Negative Correlation   |
| -0.8 < r < -0.5                     | Moderate Negative Correlation |
| -0.5 < r < 0                        | Weak Negative Correlation     |
| r = 0                               | No Linear Correlation         |
| 0 < r < 0.5                         | Weak Positive Correlation     |
| 0.5 < r < 0.8                       | Moderate Positive Correlation |
| 0.8 < r < 1                         | Strong Positive Correlation   |
| r = 1                               | Perfect Positive Correlation  |


<div align="left" style="width: 800px; text-align: left;">
<img src="https://github.com/Explore-AI/Pictures/blob/f3aeedd2c056ddd233301c7186063618c1041140/regression_analysis_notebook/pearson_corr.jpg?raw=True"
     alt="Pearson Correlation"
     style="padding-bottom: 0.5em;"
     width="800"/>
</div>

In [102]:
# Correlation calculation

print(matrix_A)
corr_matrix = np.corrcoef(matrix_A)
print("Correlation of matrix_A:\n", corr_matrix)

[[1 0 0 3 1]
 [3 6 6 2 9]
 [4 5 3 8 0]]
Correlation of matrix_A:
 [[ 1.         -0.58848989  0.56011203]
 [-0.58848989  1.         -0.8652532 ]
 [ 0.56011203 -0.8652532   1.        ]]
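The correlation coefficient is the covariance rescaled by both standard deviations, r = cov(x, y) / (std(x) * std(y)), which is why it always lies between -1 and 1. A minimal sketch that recovers the correlation matrix from the covariance matrix:

```python
import numpy as np

matrix_A = np.array([[1, 0, 0, 3, 1], [3, 6, 6, 2, 9], [4, 5, 3, 8, 0]])

cov = np.cov(matrix_A)
std = np.sqrt(np.diag(cov))             # standard deviation of each row (variable)
corr_manual = cov / np.outer(std, std)  # divide entry (i, j) by std_i * std_j

print(np.allclose(corr_manual, np.corrcoef(matrix_A)))  # True
```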


#### np.nan

`np.nan` ("Not a Number") is used to represent missing values or the results of undefined computations, such as 0/0.

In [113]:
# missing value
miss_arr = np.array([[2,3,4],[5,7,np.nan]])
miss_arr

array([[ 2.,  3.,  4.],
       [ 5.,  7., nan]])

In [112]:
# computation error
np.zeros((3,3))/np.zeros((3,3))



array([[nan, nan, nan],
       [nan, nan, nan],
       [nan, nan, nan]])

In [115]:
miss_arr.mean()

np.float64(nan)
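Because any arithmetic involving `nan` produces `nan`, a single missing value spoils the ordinary `mean()`. Numpy's nan-aware functions, such as `np.nanmean()`, `np.nanmedian()` and `np.nansum()`, skip the missing entries instead. A minimal sketch on a fresh copy of `miss_arr`:

```python
import numpy as np

miss_arr = np.array([[2, 3, 4], [5, 7, np.nan]])

print(np.mean(miss_arr))     # nan
print(np.nanmean(miss_arr))  # 4.2, the mean of the five non-missing values
print(np.nansum(miss_arr))   # 21.0
```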

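Check for the presence of `np.nan` in an array with `np.isnan()`. Comparing with `== np.nan` does not work, because `nan` is not equal to anything, not even itself.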

In [116]:
np.isnan(miss_arr)

array([[False, False, False],
       [False, False,  True]])

In [117]:
np.isnan(miss_arr).any()

np.True_

In [118]:
np.isnan(matrix_A)

array([[False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False]])

In [119]:
np.isnan(matrix_A).any()

np.False_

In [120]:
# find and replace nan by reassigning a value

miss_arr[np.isnan(miss_arr)] = 700
miss_arr

array([[  2.,   3.,   4.],
       [  5.,   7., 700.]])

# FILE HANDLING

In Python, the `open()` function is used to open a file and return a file object.

Syntax:

    file_object = open(file_name, mode)

The most common modes when opening a file into a python file object are:

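* 'r': Read mode (default); the file must already exist

* 'w': Write mode; creates the file, or erases its content if it already exists

* 'a': Append mode; creates the file if needed and adds new content to the end

* 'x': Create mode; creates a new file and fails if the file already exists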
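The difference between the modes is easiest to see side by side. A minimal sketch that works inside a temporary folder created with Python's `tempfile` module, so nothing in your working directory is touched (the file name `modes_demo.txt` is arbitrary). It uses the `with` statement, covered under "Context Manager", to close each file automatically:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "modes_demo.txt")  # arbitrary file name

    with open(path, "x") as f:  # 'x': create a new file
        f.write("first line\n")

    try:
        open(path, "x")         # 'x' again: the file now exists
        refused = False
    except FileExistsError:
        refused = True

    with open(path, "a") as f:  # 'a': add to the end
        f.write("second line\n")
    with open(path) as f:       # 'r' is the default mode
        after_append = f.read()

    with open(path, "w") as f:  # 'w': erase, then write
        f.write("fresh start\n")
    with open(path) as f:
        content = f.read()

print(refused)             # True
print(repr(after_append))  # 'first line\nsecond line\n'
print(repr(content))       # 'fresh start\n'
```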

## Create mode

In [None]:
#create a file in working directory
#The "x" argument creates a new file in the directory we are working on
new_file = open("Jess-story.txt", mode="x")

In [None]:
#create a file somewhere else
new_file2 = open("/content/sample_data/jess-story2.txt","x")

Mount your Google Drive in Colab (for example with `from google.colab import drive` followed by `drive.mount('/content/drive')`) and create a text file in it.

In [None]:
# creating a text file in my colab notebooks folder of my google drive.

file_1 = open("/content/drive/MyDrive/Colab Notebooks/myfile.txt", "x")

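Remember to close each file with the `.close()` method once you are done with it. This is good practice: it frees the file and makes sure everything you wrote is saved to disk.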

Syntax:

    file.close() #closes the file

In [None]:
new_file.close()
new_file2.close()
file_1.close()

## Writing, Appending & Reading

* Write mode uses the "w" argument when opening a file, to write new content into it; any old content is erased.

* Append mode uses the "a" argument when opening a file, to **add** new content to the end of it.

* Read mode uses the "r" argument when opening a file, to **view** its content.

### Write mode

In [None]:
#open the file ready to write new content.
file_1 = open("/content/drive/MyDrive/Colab Notebooks/myfile.txt", "w")

file_1.write("Good content!\n")  #actually write in the content.
file_1.write("We are writing some cool new stuff.")
file_1.close() #close the file.

### Read mode

In [None]:
#open the file in read mode:
file_1 = open("/content/drive/MyDrive/Colab Notebooks/myfile.txt", "r")
print(file_1.read()) #show the content of the file we opened
file_1.close() #close the file

### Append mode

In [None]:
# using the 'a' argument, add more stuff to an existing file.

#open the file ready to add more content.
file_1 = open("/content/drive/MyDrive/Colab Notebooks/myfile.txt", "a")
file_1.write("\n\nNow the file has more content!\n")
file_1.write("We are writing more stuff")
file_1.close() #close it so the new content is saved before reading

#open and read the file after the appending:
file_1 = open("/content/drive/MyDrive/Colab Notebooks/myfile.txt", "r")
print(file_1.read())
file_1.close()

In [None]:
# Overwrite existing content using the 'w' argument,
# which deletes everything that existed before and replaces it with the new input

f = open("/content/drive/MyDrive/Colab Notebooks/myfile.txt", "w")
f.write("Replacement content Alert!\n")
f.write("We are writing new cooler stuff.")
f.close() #close it so the new content is saved before reading

#open and read the file after overwriting:
f = open("/content/drive/MyDrive/Colab Notebooks/myfile.txt", "r")
print(f.read())
f.close()

## Context Manager

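* The `with` keyword opens a file as a context manager: the file is closed automatically when the indented block ends, so there is no need to call `.close()`.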


In [None]:
with open("Jess-story.txt", mode="a") as J:
  J.write("Jesselyn is fasting today.\nMake she pray for us well well.")

In [None]:
with open("Jess-story.txt", mode="r") as Js:
  print(Js.read())
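The advantage of `with` is that the file is closed for you as soon as the indented block ends, even if an error is raised inside it. A minimal sketch that checks the file object's `closed` attribute; it writes to a throwaway temporary file (the name `story.txt` is arbitrary) so it runs outside Colab too:

```python
import os
import tempfile

# a throwaway location so the example runs anywhere
path = os.path.join(tempfile.mkdtemp(), "story.txt")

with open(path, mode="w") as story:
    story.write("Written inside a with block.\n")
    print(story.closed)  # False: the file is open inside the block

print(story.closed)      # True: closed automatically when the block ended
```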

## Importing data with numpy

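* numpy.genfromtxt() - loads tabular text data and can handle missing values (filling them with `nan` for float data)
* numpy.loadtxt() - simpler, but expects every value to be present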
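Both functions need to be told to skip a text header row (`skip_header` for `genfromtxt()`, `skiprows` for `loadtxt()`), and only `genfromtxt()` copes with a missing field. A minimal sketch using a small in-memory CSV, where `io.StringIO` stands in for a real file and the column names are made up:

```python
import io

import numpy as np

csv_text = "height,weight\n1.6,60\n1.8,\n1.7,72\n"

# genfromtxt: the empty weight field in the second row becomes nan
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(data)
print(np.isnan(data).sum())  # 1 missing value

# loadtxt works when every value is present
complete = "height,weight\n1.6,60\n1.7,72\n"
data2 = np.loadtxt(io.StringIO(complete), delimiter=",", skiprows=1)
print(data2.shape)           # (2, 2)
```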

In [None]:
import numpy as np
data = np.genfromtxt('/content/sample_data/california_housing_test.csv', delimiter=',', skip_header=1)  # skip the text header row

data

In [None]:
#data2 = np.loadtxt('/content/sample_data/california_housing_test.csv', delimiter=',', skiprows=1)  # without skiprows, loadtxt fails on the text header row

#data2

## JSON

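* json.load(): accepts a file object. It reads JSON data from the file and converts it into a Python object (a dictionary when the JSON holds an object, a list when it holds an array).

* json.loads(): accepts a JSON string and converts it into a Python object in the same way.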

In [None]:
import json
with open('/content/sample_data/anscombe.json') as f:
    data3 = json.load(f)

In [None]:
data3

In [None]:
g = '''{"name": "John Smith", "age": 30, "address": {

  "street": "123 Main St",
  "city": "Anytown",
  "state": "CA",
  "zip": "12345"},
"phone_numbers": [
  "555-555-5555",
  "555-555-5556"
  ]
}'''

In [None]:
data4 = json.loads(g)
data4
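Going the other way, `json.dumps()` turns a Python object back into a JSON string (and `json.dump()` writes it to a file object). A minimal sketch of a round trip with a small, made-up record:

```python
import json

record = {"name": "Ada", "languages": ["Python", "C"], "active": True}

text = json.dumps(record, indent=2)  # dict -> JSON string (True becomes true)
print(text)

print(json.loads(text) == record)    # True: parsing restores the same dict
```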

Checkpoint:

In [None]:
import numpy as np

file_path = "Loan_prediction_dataset.csv"

# Alternatively, genfromtxt also accepts an open file object:
# with open(file_path, "r") as file:
#     loan = np.genfromtxt(file, delimiter=",", skip_header=1)

loan = np.genfromtxt(file_path, delimiter=",",skip_header=1)
loan

In [None]:
loan_amount = loan[:,8] #get the loan amount column
loan_amount = loan_amount[~(np.isnan(loan_amount))] #remove nan values from it
loan_amount

In [None]:
print(f"The average loan amount is {np.mean(loan_amount)},\n"
      f"the median loan amount is {np.median(loan_amount)}\n"
      f"and the standard deviation of the loan amounts is {np.std(loan_amount)}.")

In [None]:
import csv


In [None]:
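with open('/content/sample_data/california_housing_test.csv') as m:
    reader = csv.reader(m)  # each row is returned as a list of strings

    # print the header row and the first three data rows
    for count, row in enumerate(reader, start=1):
        if count > 4:
            break
        print(row)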
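`csv.reader()` yields each row as a list of strings, so the header is simply the first row. `csv.DictReader()` instead uses the header to return each row as a dictionary keyed by column name. A minimal sketch on an in-memory CSV, where `io.StringIO` stands in for a file and the names and scores are made up:

```python
import csv
import io

csv_text = "name,score\nAmina,81\nTunde,67\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))

print(rows[0]["name"])  # Amina

# Values are read as strings, so convert them before doing arithmetic
total = sum(int(row["score"]) for row in rows)
print(total)            # 148
```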