<h1 style="text-align: center"> Basic Python for Machine Learning </h1>
<h1 style="text-align: center"> (Part 2)</h1>

In this second part, we will work with the CSV file format (https://en.wikipedia.org/wiki/Comma-separated_values). You first need to understand a few basics of this format, e.g. the file header, comments, delimiters and quoting. We will use a dataset to demonstrate how to prepare data for machine learning. The data is freely available from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets.php).

# 1. The NumPy Library

NumPy is the **fundamental package for scientific computing in Python**. It provides support for **large multi-dimensional arrays** as well as **high-level mathematical functions** that operate on these arrays. (You can experiment with deep learning using this library, but NumPy is not the best choice for it.) Nevertheless, most scientific libraries rely on NumPy conventions and APIs, so it is important to have some knowledge of it.

For more details about NumPy, please refer to the official documentation available at https://numpy.org

To start, we first need to import NumPy in Python and check its version.

---



numpy: 1.18.5
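
A minimal sketch of the import and version check, assuming the conventional `np` alias (the printed version depends on your installation):

```python
import numpy as np

# Print the installed NumPy version; the exact value depends on the environment.
print("numpy:", np.__version__)
```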


## 1.2. The ndarray class

The fundamental class of NumPy is ndarray. It represents a table of items, with the following constraints:

• It is multidimensional (1d, 2d, 3d, ..., nd),

• It is homogeneous, i.e. all items inside the table must have the same type.

NumPy provides the foundational data structures and operations for SciPy. These ndarrays are efficient and easy to define and manipulate.

In [None]:
# define an array


array([1, 2, 3])

In [None]:
# create a multi-dimensional array.


array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
# Type of a


numpy.ndarray

In [None]:
#Check the shape (rows and columns of the array).


(2, 3)

In [None]:
# 'Rank' as mentioned in the NumPy docs, i.e. the number of dimensions


2

In [None]:
# Total number of items


6

In [None]:
# Item type


dtype('int64')

In [None]:
#Actual data of the table


<memory at 0x7f0b86f83558>

In [None]:
#Create an evenly spaced array between 1 and 30 with a difference of 2.


array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29])

In [None]:
#Reshape the above array into a desired shape.


array([[ 1,  3,  5],
       [ 7,  9, 11],
       [13, 15, 17],
       [19, 21, 23],
       [25, 27, 29]])
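
The outputs above can be reproduced with a few lines. The following is a minimal sketch, assuming the 2-D array is called `a` (as the "Type of a" comment suggests); the names `v`, `n` and `m` are hypothetical.

```python
import numpy as np

v = np.array([1, 2, 3])                 # 1-D array built from a list
a = np.array([[1, 2, 3], [4, 5, 6]])    # 2-D array built from nested lists

print(type(a))   # the ndarray class
print(a.shape)   # (2, 3): 2 rows, 3 columns
print(a.ndim)    # 2: number of dimensions
print(a.size)    # 6: total number of items
print(a.dtype)   # integer type, platform dependent (e.g. int64)
print(a.data)    # memory buffer holding the actual data

n = np.arange(1, 30, 2)   # start=1, stop=30 (excluded), step=2
m = n.reshape(5, 3)       # same 15 values arranged as 5 rows x 3 columns
print(m)
```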

In [None]:
# Create an array with all elements as ones.


array([[1., 1.],
       [1., 1.]])

In [None]:
#Create an array filled with zeros. 


array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [None]:
#Create a diagonal matrix with diagonal values = 1


array([[1., 0.],
       [0., 1.]])

In [None]:
#Extract only diagonal values from an array.


array([1., 1.])

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]])
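
A minimal sketch of the constructors used in the cells above; all variable names are hypothetical.

```python
import numpy as np

ones = np.ones((2, 2))         # shape is passed as a single tuple
zeros = np.zeros((3, 3))
identity = np.eye(2)           # 2x2 matrix with ones on the diagonal
diagonal = np.diag(identity)   # extracts the diagonal of a 2-D array
rect = np.eye(3, 4)            # eye also accepts a non-square shape

print(ones, zeros, identity, diagonal, rect, sep="\n")
```

Note that `np.diag` works both ways: given a 2-D array it extracts the diagonal, and given a 1-D array it builds a diagonal matrix.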

In [None]:
#Generate evenly spaced values over the interval from 1 to 5.
#(Take a minute here to understand the difference between ‘linspace’ and ‘arange’)


array([1.        , 1.21052632, 1.42105263, 1.63157895, 1.84210526,
       2.05263158, 2.26315789, 2.47368421, 2.68421053, 2.89473684,
       3.10526316, 3.31578947, 3.52631579, 3.73684211, 3.94736842,
       4.15789474, 4.36842105, 4.57894737, 4.78947368, 5.        ])

In [None]:
#Generate values evenly spaced on a log scale, here from 10^1 to 10^5.


array([1.00000000e+01, 1.62377674e+01, 2.63665090e+01, 4.28133240e+01,
       6.95192796e+01, 1.12883789e+02, 1.83298071e+02, 2.97635144e+02,
       4.83293024e+02, 7.84759970e+02, 1.27427499e+03, 2.06913808e+03,
       3.35981829e+03, 5.45559478e+03, 8.85866790e+03, 1.43844989e+04,
       2.33572147e+04, 3.79269019e+04, 6.15848211e+04, 1.00000000e+05])

In [None]:
#Now, change the shape of the array in place (‘resize’ function changes the shape of the array in place, 
#unlike ‘reshape’)


array([[1.        , 1.21052632, 1.42105263, 1.63157895],
       [1.84210526, 2.05263158, 2.26315789, 2.47368421],
       [2.68421053, 2.89473684, 3.10526316, 3.31578947],
       [3.52631579, 3.73684211, 3.94736842, 4.15789474],
       [4.36842105, 4.57894737, 4.78947368, 5.        ]])
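
A minimal sketch contrasting `linspace`, `arange` and `logspace`, followed by an in-place `resize`. The variable names are hypothetical, and the arguments are inferred from the printed outputs (20 values in each case).

```python
import numpy as np

lin = np.linspace(1, 5, 20)   # 20 values from 1 to 5, both ends included
log = np.logspace(1, 5, 20)   # 20 values from 10**1 to 10**5, evenly spaced on a log scale
ar = np.arange(1, 5, 1)       # fixed step of 1, stop value 5 excluded

# resize modifies the array itself and returns None,
# whereas reshape returns a new array object and leaves the original shape alone.
lin.resize(5, 4)
print(lin.shape)
```

In short, `linspace` takes the number of points and includes the end point, while `arange` takes the step and excludes the stop value.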

In [None]:
#Create an array by repeating a list.


array([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])

In [None]:
#Now, repeat each element of the array n times using the repeat function.


array([1, 1, 1, 2, 2, 2, 3, 3, 3])
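
A minimal sketch of the two kinds of repetition. Using `np.tile` for the first one is an assumption (repeating a Python list with `*` before converting it would give the same result); the variable names are hypothetical.

```python
import numpy as np

base = np.array([1, 2, 3])

tiled = np.tile(base, 5)        # the whole sequence repeated 5 times
repeated = np.repeat(base, 3)   # each element repeated 3 times

print(tiled)
print(repeated)
```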

In [None]:
#Generate arrays of desired shape filled with random values between 0 and 1.


array([[0.19376032, 0.61983444, 0.86120886, 0.76441249],
       [0.67200117, 0.56163187, 0.99061394, 0.7036518 ],
       [0.38836475, 0.413436  , 0.07480116, 0.39627932]])

In [None]:
# Normally distributed random values. Careful: the shape is given dimension by dimension as separate arguments, not as one tuple

array([[-0.05878962, -0.7430083 ,  1.04349352, -0.02944567],
       [-1.44532637,  2.42504766, -0.70464273, -0.32784839],
       [-1.09365777,  1.36035024,  0.36635167,  0.64439498]])

In [None]:
#Stack the two arrays created above vertically.


array([[ 0.19376032,  0.61983444,  0.86120886,  0.76441249],
       [ 0.67200117,  0.56163187,  0.99061394,  0.7036518 ],
       [ 0.38836475,  0.413436  ,  0.07480116,  0.39627932],
       [-0.05878962, -0.7430083 ,  1.04349352, -0.02944567],
       [-1.44532637,  2.42504766, -0.70464273, -0.32784839],
       [-1.09365777,  1.36035024,  0.36635167,  0.64439498]])

In [None]:
# Stack the two arrays created above horizontally.


array([[ 0.19376032,  0.61983444,  0.86120886,  0.76441249, -0.05878962,
        -0.7430083 ,  1.04349352, -0.02944567],
       [ 0.67200117,  0.56163187,  0.99061394,  0.7036518 , -1.44532637,
         2.42504766, -0.70464273, -0.32784839],
       [ 0.38836475,  0.413436  ,  0.07480116,  0.39627932, -1.09365777,
         1.36035024,  0.36635167,  0.64439498]])
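
A minimal sketch of random array creation and stacking. The names `u` and `g` are hypothetical, and the values differ on every run since no seed is set.

```python
import numpy as np

u = np.random.rand(3, 4)    # uniform values in [0, 1)
g = np.random.randn(3, 4)   # values drawn from the standard normal distribution
# Both functions take the shape as separate arguments: rand(3, 4), not rand((3, 4)).

vertical = np.vstack([u, g])     # rows of g appended below rows of u
horizontal = np.hstack([u, g])   # columns of g appended to the right of u

print(vertical.shape, horizontal.shape)
```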

## 1.3. Operations

In [None]:
# Randomly create two NumPy arrays


[[0.25339678 0.09479027]
 [0.54647621 0.76720242]]
[[0.16977469 0.26604882]
 [0.97836458 0.62856282]]


In [None]:
#element-wise addition.


array([[0.42317147, 0.3608391 ],
       [1.52484079, 1.39576524]])

In [None]:
#Element wise subtraction.


array([[ 0.08362209, -0.17125855],
       [-0.43188837,  0.1386396 ]])

In [None]:
#Element wise multiplication 


array([[0.04302036, 0.02521884],
       [0.53465297, 0.48223491]])

In [None]:
#Raise each element to the power 2.


array([[0.06420993, 0.0089852 ],
       [0.29863625, 0.58859955]])

In [None]:
# Dot product (matrix product) of the two arrays k and l.


array([[0.13575981, 0.12699756],
       [0.8433815 , 0.62762427]])

In [None]:
# Transpose of k.


array([[0.25339678, 0.54647621],
       [0.09479027, 0.76720242]])

In [None]:
#datatype of elements in the array.


dtype('float64')

In [None]:
#Change the datatype of the array.


dtype('float32')
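
A minimal sketch of the operations above, using the names `k` and `l` mentioned in the dot-product comment; the other names are hypothetical.

```python
import numpy as np

k = np.random.rand(2, 2)
l = np.random.rand(2, 2)

added = k + l        # element-wise addition
subtracted = k - l   # element-wise subtraction
multiplied = k * l   # element-wise multiplication (not the matrix product)
squared = k ** 2     # each element raised to the power 2
product = k.dot(l)   # matrix product, equivalent to k @ l
transposed = k.T     # transpose

print(k.dtype)               # float64
k32 = k.astype(np.float32)   # astype returns a converted copy; k itself is unchanged
print(k32.dtype)             # float32
```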

In [None]:
#Some mathematical functions on an array, starting with the sum of its elements.


22

In [None]:
#Maximum of the elements of an array.


10

In [None]:
#Mean of the elements of the array


4.4

In [None]:
#Now, let’s retrieve the index of the maximum value of the array.


2

In [None]:
d.argmin()

0
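
A minimal sketch of these reductions. The original contents of `d` are not shown, so the values below are a hypothetical choice that happens to give the same results as the outputs above.

```python
import numpy as np

# Hypothetical data, chosen only to be consistent with the printed results.
d = np.array([1, 4, 10, 2, 5])

print(d.sum())      # sum of all elements
print(d.max())      # largest element
print(d.mean())     # arithmetic mean
print(d.argmax())   # index of the largest element
print(d.argmin())   # index of the smallest element
```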

In [None]:
#Create an array consisting of square of first ten whole numbers.


array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [None]:
# Randomly create two NumPy arrays

[[ 0.10748167 -0.60594817 -1.27936755]
 [-0.48475664  1.25499446  0.51523127]
 [ 0.3037174   0.12753081 -0.86024197]
 [-1.26656489  0.441959   -1.13760457]]
[[ 0.23727124  0.01564153  0.62970145]
 [-0.32637164  0.87695142  2.34180192]
 [-1.06530851  0.73760767  0.5987742 ]
 [ 0.76896762  0.13973785 -0.24542938]]


If you want to compute an extremum along a particular axis, specify the axis argument. As with indexing, this reduces the number of dimensions of the array. To keep the same number of dimensions, set the keepdims argument to True.

array([-1.27936755, -0.48475664, -0.86024197, -1.26656489])

array([[-1.27936755],
       [-0.48475664],
       [-0.86024197],
       [-1.26656489]])
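
A minimal sketch of a reduction along an axis, with and without `keepdims`; the array `x` and the other names are hypothetical.

```python
import numpy as np

x = np.random.randn(4, 3)

row_min = x.min(axis=1)                      # one minimum per row, shape (4,)
row_min_2d = x.min(axis=1, keepdims=True)    # same values, shape (4, 1)
col_max = x.max(axis=0)                      # one maximum per column, shape (3,)

# Keeping the dimension makes the result directly broadcastable against x.
shifted = x - row_min_2d

print(row_min.shape, row_min_2d.shape, col_max.shape)
```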

## 1.4. Indexation and Slicing

In [None]:
#Create an array consisting of square of first ten whole numbers.


array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [None]:
# First item


0

In [None]:
#Access values in the above array using index.


4

In [None]:
# Last item


81

In [None]:
# From item 2 to item 5 (excluded !)


array([ 4,  9, 16])

In [None]:
#Shorthand form: an omitted start or stop defaults to the beginning or the end
# First 3 items


array([0, 1, 4])

In [None]:
# Starting from the 4th item


array([ 9, 16, 25, 36, 49, 64, 81])

In [None]:
# All items


array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [None]:
# With a step
#d[start:stop:stepsize]


array([ 4, 16, 36])

In [None]:
#Reverse


array([81, 64, 49, 36, 25, 16,  9,  4,  1,  0])

In [None]:
#Select values from array greater than 20.


array([25, 36, 49, 64, 81])
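
A minimal sketch of the 1-D indexing and slicing expressions that correspond to the outputs above. The stop value in the stepped slice is an assumption; any stop of 7 or 8 selects the same items.

```python
import numpy as np

d = np.arange(10) ** 2   # squares of 0..9

print(d[0])       # first item
print(d[2])       # item at index 2
print(d[-1])      # last item
print(d[2:5])     # indices 2, 3, 4 (index 5 is excluded)
print(d[:3])      # first 3 items
print(d[3:])      # from the 4th item to the end
print(d[:])       # all items
print(d[2:8:2])   # start:stop:step, i.e. indices 2, 4, 6
print(d[::-1])    # reversed
print(d[d > 20])  # boolean mask: only the values greater than 20
```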

In [None]:
#Create a multidimensional array


array([[[0.26744754, 0.35799365, 0.92456232, 0.58302221, 0.73529411],
        [0.80666624, 0.88009258, 0.78993131, 0.82257413, 0.53298369],
        [0.76044335, 0.18397122, 0.92055304, 0.24047867, 0.73837399],
        [0.24355433, 0.32023589, 0.98776208, 0.97059701, 0.44095053]],

       [[0.40067613, 0.56284328, 0.27716801, 0.83958955, 0.88463481],
        [0.57432366, 0.22583783, 0.13214541, 0.90067054, 0.92375607],
        [0.12403723, 0.48170806, 0.18131214, 0.58541257, 0.79568205],
        [0.72829766, 0.424752  , 0.3694815 , 0.00231792, 0.92803375]],

       [[0.6999519 , 0.01979782, 0.00153465, 0.49607416, 0.22267532],
        [0.84745035, 0.76871248, 0.8679842 , 0.07389645, 0.0275281 ],
        [0.29804692, 0.39706664, 0.40913779, 0.1865013 , 0.6087326 ],
        [0.57975328, 0.59707089, 0.22821822, 0.85169569, 0.463421  ]]])

In [None]:
# shape

(3, 4, 5)

In [None]:
# First item on each axis


0.2674475422219845

In [None]:
#Access the second block (index 1 on the first axis) and its third row (index 2)


array([0.12403723, 0.48170806, 0.18131214, 0.58541257, 0.79568205])

In [None]:
#With a slice and an ellipsis


array([[0.78993131, 0.82257413, 0.53298369],
       [0.13214541, 0.90067054, 0.92375607],
       [0.8679842 , 0.07389645, 0.0275281 ]])

In [None]:
# Access the 2nd item on the first axis and items 3 to 7 on the second (only two exist here, since a slice is clipped at the end of the axis). Note that numbering starts at 0.


array([[0.12403723, 0.48170806, 0.18131214, 0.58541257, 0.79568205],
       [0.72829766, 0.424752  , 0.3694815 , 0.00231792, 0.92803375]])

In [None]:
#Select the first two items along the first axis and all but the last along the second


array([[[0.26744754, 0.35799365, 0.92456232, 0.58302221, 0.73529411],
        [0.80666624, 0.88009258, 0.78993131, 0.82257413, 0.53298369],
        [0.76044335, 0.18397122, 0.92055304, 0.24047867, 0.73837399]],

       [[0.40067613, 0.56284328, 0.27716801, 0.83958955, 0.88463481],
        [0.57432366, 0.22583783, 0.13214541, 0.90067054, 0.92375607],
        [0.12403723, 0.48170806, 0.18131214, 0.58541257, 0.79568205]]])

In [None]:
# a[2] is equivalent to a[2,:,:]


array([[0.6999519 , 0.01979782, 0.00153465, 0.49607416, 0.22267532],
       [0.84745035, 0.76871248, 0.8679842 , 0.07389645, 0.0275281 ],
       [0.29804692, 0.39706664, 0.40913779, 0.1865013 , 0.6087326 ],
       [0.57975328, 0.59707089, 0.22821822, 0.85169569, 0.463421  ]])

In [None]:
# An ellipsis stands for as many ':' as needed: c[1, ..., 2] is equivalent to c[1, :, :, 2] on a 4-D array


array([[[[ 0.01092484,  0.38591024,  0.34569849],
         [-1.08145383, -0.66540738, -1.0007731 ]],

        [[ 1.69101448,  1.22981859,  0.35244832],
         [ 0.63552323, -0.09029121,  0.31642548]]],


       [[[ 0.56990119,  0.28983554, -0.08689175],
         [-0.54836915, -0.38713517,  0.89257086]],

        [[-1.29572469, -0.86750909, -0.17548642],
         [ 0.70761295, -1.3329843 ,  0.85623445]]]])

In [None]:
c[1, ..., 2]

array([[-0.08689175,  0.89257086],
       [-0.17548642,  0.85623445]])

In [None]:
c[1, :, :, 2]

array([[-0.08689175,  0.89257086],
       [-0.17548642,  0.85623445]])
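
A minimal sketch of the multidimensional indexing above. The exact expressions are inferred from the shapes and values of the printed outputs, so treat them as assumptions; the intermediate names are hypothetical.

```python
import numpy as np

a = np.random.rand(3, 4, 5)

first = a[0, 0, 0]      # a single scalar
row = a[1, 2]           # shape (5,): same as a[1, 2, :]
part = a[..., 1, 2:]    # shape (3, 3): row 1 of every block, columns 2 onward
clipped = a[1, 2:7]     # shape (2, 5): the slice stops at the end of the axis
head = a[:2, :-1]       # shape (2, 3, 5)
block = a[2]            # shape (4, 5): same as a[2, :, :]

c = np.random.randn(2, 2, 2, 3)
# The ellipsis expands to the two missing ':' on this 4-D array.
same = np.array_equal(c[1, ..., 2], c[1, :, :, 2])
print(same)
```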

In [None]:
d = np.random.randn(4, 3)
d

array([[-0.269     ,  1.61725172, -0.21322409],
       [ 0.0098456 ,  0.16965464, -1.44931971],
       [-0.75817911, -1.64369599, -0.50108824],
       [-0.44023007,  0.41824238, -1.38978736]])

In [None]:
e = d[:, 0] 
e

array([-0.269     ,  0.0098456 , -0.75817911, -0.44023007])

In [None]:
# e has shape (4,), not (4, 1)


(4,)

In [None]:
e = d[0, :]
e

array([-0.269     ,  1.61725172, -0.21322409])

In [None]:
# e has shape (3,), not (1, 3)


(3,)

In [None]:
# In contrast, using a slice instead of an index preserves the number of dimensions


array([[-0.269     ,  1.61725172, -0.21322409]])

In [None]:
f.shape

(1, 3)
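
A minimal sketch of the difference between indexing and slicing with respect to the number of dimensions. The names `d` and `f` follow the cells above; `col`, `row` and `col_2d` are hypothetical.

```python
import numpy as np

d = np.random.randn(4, 3)

col = d[:, 0]       # integer index: the axis disappears, shape (4,)
row = d[0, :]       # shape (3,)
f = d[0:1, :]       # slice of length 1: the axis is kept, shape (1, 3)
col_2d = d[:, 0:1]  # shape (4, 1)

print(col.shape, row.shape, f.shape, col_2d.shape)
```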

## 1.5. Assignment

In [None]:
#Assignment is performed with the = operator. A single item or a sub-array can be targeted.


array([[1, 2, 3],
       [4, 5, 6]])

array([[10,  2,  3],
       [ 4,  5,  6]])

array([[10,  1,  1],
       [ 4,  1,  1]])

In [None]:
#Take care! The dtype is determined at instantiation and cannot be changed afterwards.

In [None]:
#1.175 will be downcast to an integer before assignment


array([[10,  1,  1],
       [ 1,  1,  1]])

In [None]:
#Arrays can be reshaped by the resize method. That’s an in-place operation:


array([[10,  1],
       [ 1,  1],
       [ 1,  1]])
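
A minimal sketch of the assignments above, reconstructed from the printed outputs; the targeted positions are assumptions consistent with those outputs.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

a[0, 0] = 10      # assign a single item
a[:, 1:] = 1      # assign a sub-array: the scalar is broadcast to every targeted item

a[1, 0] = 1.175   # a holds integers, so the value is truncated to 1
print(a.dtype)    # still an integer type

a.resize(3, 2)    # in-place change of shape
print(a)
```

To store 1.175 without truncation, one option is to convert the array first, e.g. with `a.astype(float)`, which returns a new array.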

## 1.6. References, views and copies

If a and b reference the same ndarray, every operation on a also applies to b: they share both data and metadata. If c is a view of a, they share the same data but not the metadata; for example, their shapes can be modified separately. However, if we change the first element of c, the first element of a also changes. If d is a copy of a, both data and metadata are separate.

In [None]:
a = np.random.randn(4, 3)
a

array([[-0.75285817, -0.83421125, -0.4489713 ],
       [-1.41187743,  0.76100036,  0.76552791],
       [ 0.67015368, -0.05031783, -0.26206876],
       [ 1.89567617, -1.65713055,  0.46353272]])

In [None]:
# b is a reference to a


array([[ 1.        , -0.83421125, -0.4489713 ],
       [-1.41187743,  0.76100036,  0.76552791],
       [ 0.67015368, -0.05031783, -0.26206876],
       [ 1.89567617, -1.65713055,  0.46353272]])

In [None]:
#c is a view of a


array([[ 1.        , -0.83421125, -0.4489713 , -1.41187743],
       [ 0.76100036,  0.76552791,  0.67015368, -0.05031783],
       [-0.26206876,  1.89567617, -1.65713055,  0.46353272]])

In [None]:
# Shape of a is not affected


array([[ 1.        , -0.83421125, -0.4489713 ],
       [-1.41187743,  0.76100036,  0.76552791],
       [ 0.67015368, -0.05031783, -0.26206876],
       [ 1.89567617, -1.65713055,  0.46353272]])

In [None]:
# But if we modify the last element of c, the last element of a is changed


array([[ 1.        , -0.83421125, -0.4489713 ],
       [-1.41187743,  0.76100036,  0.76552791],
       [ 0.67015368, -0.05031783, -0.26206876],
       [ 1.89567617, -1.65713055,  0.        ]])

In [None]:
# d is a copy of a


array([[ 1.        , -0.83421125, -0.4489713 ],
       [-1.41187743,  0.76100036,  0.76552791],
       [ 0.67015368, -0.05031783, -0.26206876],
       [ 1.89567617, -1.65713055,  0.        ]])

array([[ 3.        , -0.83421125, -0.4489713 ],
       [-1.41187743,  0.76100036,  0.76552791],
       [ 0.67015368, -0.05031783, -0.26206876],
       [ 1.89567617, -1.65713055,  0.        ]])

In [None]:
# a was not modified by the assignment on d


array([[ 1.        , -0.83421125, -0.4489713 ],
       [-1.41187743,  0.76100036,  0.76552791],
       [ 0.67015368, -0.05031783, -0.26206876],
       [ 1.89567617, -1.65713055,  0.        ]])
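
A minimal sketch of the three situations, using the same names `a`, `b`, `c` and `d`. Creating the view with `reshape` is an assumption; for a contiguous array such as this one, it returns a view that shares the data of `a`.

```python
import numpy as np

a = np.random.randn(4, 3)

b = a                   # b is another name for the same ndarray object
b[0, 0] = 1.0           # so a[0, 0] is now 1.0 as well

c = a.reshape(3, 4)     # c is a view: own shape, shared data
c[-1, -1] = 0.0         # also changes a[-1, -1]

d = a.copy()            # d owns separate data
d[0, 0] = 3.0           # a[0, 0] is left untouched

print(b is a, np.shares_memory(a, c), np.shares_memory(a, d))
```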

* ndarray.resize(new_shape, refcheck=True): resize in place
* ndarray.reshape(shape, order='C'): return an array with a new shape (a view whenever possible)
* ndarray.ravel(order='C'): return a flattened array (a view whenever possible)
* ndarray.flatten(order='C'): return a flattened copy
* numpy.concatenate((a1, a2, ...), axis=0): return a concatenation of arrays along an existing axis
* numpy.stack((a1, a2, ...), axis=0): return a stack of arrays along a new axis
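
A minimal sketch comparing these functions on a small contiguous array; all names are hypothetical.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

r = a.ravel()     # flattened view here, because a is contiguous
f = a.flatten()   # flattened copy, always

print(np.shares_memory(a, r))   # True
print(np.shares_memory(a, f))   # False

joined = np.concatenate((a, a), axis=0)   # existing axis grows: shape (4, 3)
stacked = np.stack((a, a), axis=0)        # new leading axis: shape (2, 2, 3)
print(joined.shape, stacked.shape)
```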

## 1.7. Saving and loading data

Load a npy or npz file: 

* numpy.load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII')


Load a txt file: 
* numpy.loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0)



Save ONE array into a npy file: 

* numpy.save(file, arr, allow_pickle=True, fix_imports=True)



Save many arrays into an npz file:

* numpy.savez(file, *args, **kwds) 

Save ONE array into a txt file: 

* numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ')
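
A minimal round-trip sketch of these functions. The file names and the keyword names `first` and `second` are hypothetical, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile

import numpy as np

a = np.arange(6, dtype=float).reshape(2, 3)
b = np.ones(4)

with tempfile.TemporaryDirectory() as tmp:
    # One array in a binary .npy file
    npy_path = os.path.join(tmp, "a.npy")
    np.save(npy_path, a)
    a_loaded = np.load(npy_path)

    # Several arrays in one .npz archive, retrieved by keyword
    npz_path = os.path.join(tmp, "arrays.npz")
    np.savez(npz_path, first=a, second=b)
    with np.load(npz_path) as archive:
        first = archive["first"]
        second = archive["second"]

    # One array in a text file
    txt_path = os.path.join(tmp, "a.txt")
    np.savetxt(txt_path, a, delimiter=",")
    a_txt = np.loadtxt(txt_path, delimiter=",")

print(a_loaded.shape, first.shape, second.shape, a_txt.shape)
```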


## 1.8. CSV reading 

The Python standard library provides the csv module and its reader() function, which can be used to load CSV files. Once loaded, you can convert the data to a NumPy array and use it for machine learning.




In [None]:
# Load CSV Using Python


(151,)


You can also load your CSV data with NumPy, using the numpy.loadtxt() or numpy.genfromtxt() functions. By default, these functions assume that there is no header row and that all data has the same format.



In [None]:
# Load CSV using NumPy




(150, 5)
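
A minimal sketch of both loading approaches. The dataset file used above is not shown, so this example writes a tiny hypothetical CSV file (three rows, five numeric columns, no header) to a temporary directory and reads it back.

```python
import csv
import os
import tempfile

import numpy as np

# Hypothetical sample rows, for illustration only.
rows = ["5.1,3.5,1.4,0.2,0", "4.9,3.0,1.4,0.2,0", "6.3,3.3,6.0,2.5,2"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "w", newline="") as f:
        f.write("\n".join(rows) + "\n")

    # 1) Standard library: csv.reader yields each row as a list of strings
    with open(path, newline="") as f:
        data_csv = np.array(list(csv.reader(f))).astype(float)

    # 2) NumPy: loadtxt parses the numbers directly
    data_np = np.loadtxt(path, delimiter=",")

print(data_csv.shape, data_np.shape)
```

If a file has a header row, `numpy.loadtxt` can skip it with `skiprows=1`.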


## 1.9. Your turn

Try to answer each of the following questions with a small snippet of code.

1. How do you reverse a vector (1d array)?
2. How do you keep dimensions consistent when slicing a matrix (2d array)?
3. How do you create a (5,5) array with random values and find its extreme values?
4. With the help of broadcasting, how do you produce a matrix A where A[i,j] = 2i + j? (no for loop allowed)
5. A is a (4,4) int array. How can you change the last element of A to 1.5 without losing any information?