# Data Science Process

The data science (DS) process consists of 5 main steps:
1. **Obtain**. Gather the required data. It can be of any kind (text, images, audio, HTML, etc.) and therefore in any format, e.g. CSV, JPEG, JSON, or HTML.
2. **Scrub**. Check the dataset for errors, blanks, duplicates, etc., and clean it.
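
As a rough illustration of the Scrub step, the sketch below drops rows with blank values and exact duplicates. The records, the field names, and the `scrub` helper are all hypothetical and only meant to show the idea; real projects typically rely on a dedicated data library for this.

```python
# Hypothetical raw records; None marks a blank value.
raw_rows = [
    {"id": 1, "city": "Paris", "temp_c": 21.5},
    {"id": 2, "city": "Oslo", "temp_c": None},   # blank measurement
    {"id": 1, "city": "Paris", "temp_c": 21.5},  # exact duplicate of the first row
    {"id": 3, "city": "Lima", "temp_c": 19.0},
]


def scrub(rows):
    """Drop rows with blank values and exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for row in rows:
        # Skip rows that contain at least one blank value.
        if any(value is None for value in row.values()):
            continue
        # Build a hashable fingerprint of the row to detect duplicates.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


clean_rows = scrub(raw_rows)
print(clean_rows)
```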

In [1]:
import numpy as np

np.random.seed(42)

Linear algebra works with scalars, vectors, and matrices.
Let's see each of these below.

In [6]:
# a scalar is a single value, i.e. a 0-dimensional array
scalar = np.array(5)

print(scalar)
print(scalar.ndim)

5
0


In [9]:
# a vector is a 1-dimensional array of values
vector = np.arange(1, 6)

print(vector)
print(vector.ndim)

[1 2 3 4 5]
1


In [18]:
# a matrix is a 2-dimensional array
matrix = np.random.randint(1, 11, 9).reshape((3, 3)).astype(np.int8)

print(matrix)
print(matrix.ndim)

[[3 5 3]
 [7 5 9]
 [7 2 4]]
2


In [43]:
# arrays can also have 3, 4, or more dimensions
matrix = np.random.randint(1, 11, 9 * 4 * 4).reshape(4, 4, 3, 3)  # a 4-dimensional array

print(matrix)
print(matrix.ndim)

[[[[ 6  3  8]
   [ 9  6  7]
   [ 1  5  5]]

  [[10  4  6]
   [ 7  9  1]
   [ 6  7  3]]

  [[ 8  5  9]
   [ 5  9 10]
   [ 4  9  5]]

  [[10  9  3]
   [ 4  9 10]
   [ 6  2  7]]]


 [[[ 2  2  1]
   [ 3  6  8]
   [ 9  7  4]]

  [[ 6  5  8]
   [ 1  1  6]
   [10  2  5]]

  [[ 1  7  1]
   [ 4  9  1]
   [ 4  6  6]]

  [[ 8  1  7]
   [ 1  7  3]
   [ 8  3  1]]]


 [[[ 8  3  2]
   [ 1  3  2]
   [ 2  6  1]]

  [[ 2  6  3]
   [ 4  1  8]
   [ 7  7  3]]

  [[10 10  3]
   [ 2  1  7]
   [ 7  2  7]]

  [[ 8  1  9]
   [ 7  3  8]
   [ 7  8  6]]]


 [[[ 9  5  3]
   [ 3  1  8]
   [ 2  3  8]]

  [[ 8  4  4]
   [ 5  6  5]
   [ 7  2  7]]

  [[10  7  1]
   [ 7  5  3]
   [ 3  4  2]]

  [[ 2  1  5]
   [ 4 10  2]
   [ 3  7  7]]]]
4
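
To make the structure of such an array easier to inspect, here is a small sketch that uses sequential values instead of random ones, so the results can be checked by hand. The name `tensor` is chosen for illustration only.

```python
import numpy as np

# Build a 4-dimensional array holding the values 0..143 in order.
tensor = np.arange(4 * 4 * 3 * 3).reshape(4, 4, 3, 3)

print(tensor.ndim)   # number of axes
print(tensor.shape)  # length of each axis
print(tensor.size)   # total number of elements

# Indexing the first two axes selects one 3x3 matrix.
block = tensor[1, 2]
print(block.shape)

# Indexing all four axes selects a single element.
element = tensor[1, 2, 0, 1]
print(element)
```

The array can be read as a 4 x 4 grid of 3 x 3 matrices, which is why two indices return a matrix and four indices return a single value.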


### Operations on Scalars and Vectors

In [44]:
# we can add scalars and vectors
scalar = np.array(5)
vector = np.arange(1, 11)

print(f"Scalar: {scalar}")
print(f"Vector: {vector}")
print(f"Adding a scalar to a vector: {scalar + vector}")

Scalar: 5
Vector: [ 1  2  3  4  5  6  7  8  9 10]
Adding a scalar to a vector: [ 6  7  8  9 10 11 12 13 14 15]


In [45]:
# we can also subtract scalars from vectors
print(f"Subtracting scalars from vectors: {vector - scalar}")
print(f"Subtracting vectors from scalars: {scalar - vector}")

Subtracting scalars from vectors: [-4 -3 -2 -1  0  1  2  3  4  5]
Subtracting vectors from scalars: [ 4  3  2  1  0 -1 -2 -3 -4 -5]


In [47]:
# we can also multiply and divide them
print(f"Scalar multiplied by vector: {scalar * vector}")
print(f"Scalar divided by vector: {scalar / vector}")
print(f"Vector divided by scalar: {vector / scalar}")

Scalar multiplied by vector: [ 5 10 15 20 25 30 35 40 45 50]
Scalar divided by vector: [5.         2.5        1.66666667 1.25       1.         0.83333333
 0.71428571 0.625      0.55555556 0.5       ]
Vector divided by scalar: [0.2 0.4 0.6 0.8 1.  1.2 1.4 1.6 1.8 2. ]


In [48]:
# the same math operations work element-wise between two vectors of the same length
vector1 = np.arange(1, 11)
vector2 = np.arange(31, 41)

print(f"Vector addition: {vector1 + vector2}")
print(f"Vector subtraction: {vector1 - vector2}")
print(f"Vector multiplication: {vector1 * vector2}")
print(f"Vector division: {vector1 / vector2}")

Vector addition: [32 34 36 38 40 42 44 46 48 50]
Vector subtraction: [-30 -30 -30 -30 -30 -30 -30 -30 -30 -30]
Vector multiplication: [ 31  64  99 136 175 216 259 304 351 400]
Vector division: [0.03225806 0.0625     0.09090909 0.11764706 0.14285714 0.16666667
 0.18918919 0.21052632 0.23076923 0.25      ]
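
The operations above work because the shapes are compatible: a scalar is stretched to match the vector (NumPy calls this broadcasting), and two vectors of equal length are combined element by element. The sketch below, with illustrative variable names, shows what happens when the lengths differ.

```python
import numpy as np

short = np.arange(1, 4)    # shape (3,)
longer = np.arange(1, 11)  # shape (10,)

# Adding a scalar behaves like adding a vector filled with that scalar.
same_result = np.array_equal(5 + longer, np.full(10, 5) + longer)
print(same_result)

# Vectors of different lengths cannot be combined element-wise.
try:
    short + longer
    compatible = True
except ValueError as error:
    compatible = False
    print(f"Cannot combine shapes {short.shape} and {longer.shape}: {error}")
```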


### Dot product

The dot product of two vectors of equal length is the sum of the products of their corresponding elements.

In [53]:
array1 = np.arange(1, 11)
array2 = np.arange(31, 41)

print(f"Array 1: {array1}")
print(f"Array 2: {array2}")
print(f"Element-wise product of arrays 1 and 2: {array1 * array2}")  # the sum of all the elements is 2035
print(f"Dot product of arrays 1 and 2: {np.dot(array1, array2)}")

Array 1: [ 1  2  3  4  5  6  7  8  9 10]
Array 2: [31 32 33 34 35 36 37 38 39 40]
Element-wise product of arrays 1 and 2: [ 31  64  99 136 175 216 259 304 351 400]
Dot product of arrays 1 and 2: 2035
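
As a cross-check, the sketch below computes the same dot product in several equivalent ways: a plain Python loop, the sum of the element-wise product, `np.dot`, and the `@` operator. The variable names are illustrative.

```python
import numpy as np

array1 = np.arange(1, 11)
array2 = np.arange(31, 41)

# Multiply corresponding elements and add up the products by hand.
manual = sum(int(a) * int(b) for a, b in zip(array1, array2))

# The same computation using NumPy's element-wise product and sum.
via_sum = int((array1 * array2).sum())

# NumPy's built-in dot product, as a function and as an operator.
via_dot = int(np.dot(array1, array2))
via_operator = int(array1 @ array2)

print(manual, via_sum, via_dot, via_operator)
```

All four values should agree with the result printed above.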
