### A. Getting Started With Python & Data Analysis
#### Data is the core of data science, so scoping and collecting the right data for a project is crucial to achieving the required results. A complete data science pipeline involves:
#### 1. Data Scoping
#### 2. Data Review
#### 3. Feature engineering 
#### 4. Feature Review 
#### 5. Model Selection and review 
#### 6. Model Evaluation and Insights
#### 7. Interaction Production 
#### 8. Feedback

#### Conducting Exploratory Data Analysis (EDA) on the cleaned data, using visualisations and statistical methods, gives quick insight into the patterns and relationships between features in the dataset. Modelling involves using statistical and machine learning methods to classify and cluster the processed data and to create predictive models. Several evaluation methods are used to compare the performance of these models and to improve them continuously before a final model is selected.

#### For the most part, the data science pipeline is not a linear process; it is an iterative one.

#### Data can come in different forms, such as CSV, JSON and Excel files or databases. Python is well suited to processing and wrangling most of these formats. Commonly used libraries include NumPy, Pandas, Matplotlib, Scikit-Learn and TensorFlow.
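
#### As a minimal sketch of reading two of these formats, the example below parses a small CSV string and a small JSON string using only Python's standard library. The sample records and the variable names are hypothetical and exist only for illustration; real projects would usually read from files or a database instead.

```python
import csv
import io
import json

# Hypothetical sample data, held in memory instead of on disk
csv_text = "name,score\nada,90\ngrace,85\n"
json_text = '[{"name": "ada", "score": 90}, {"name": "grace", "score": 85}]'

# csv.DictReader yields one dict per row; every value is read as a string
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# json.loads keeps the JSON types, so the scores are already integers
json_rows = json.loads(json_text)

# Convert the CSV strings to integers so both sources are comparable
csv_scores = [int(row["score"]) for row in csv_rows]
json_scores = [row["score"] for row in json_rows]

print(csv_scores)
print(json_scores)
```

#### Note that CSV carries no type information, which is why the scores need an explicit conversion there but not for JSON.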

#### Jupyter Notebook is an interactive web environment that supports many programming languages, including Python and R, and allows code to be combined with explanatory text, images and visualisations.

### B. Introduction to NumPy & Creating Arrays

#### NumPy is a library whose basic data structure, the ndarray, is used to handle arrays and matrices. A NumPy array is a grid of values that all share the same data type, most commonly integers or floats. These arrays can also be created from Python lists.
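
#### To illustrate the "same data type" rule, here is a small sketch showing what happens when a list mixes integers and floats: NumPy upcasts everything to a single common type. The variable names are chosen for this example only.

```python
import numpy as np

ints = np.array([1, 2, 3])                    # all integers: integer dtype
mixed = np.array([1, 2.5, 3])                 # int/float mix: upcast to float
as_float = np.array([1, 2, 3], dtype=float)   # dtype can also be requested explicitly

print(ints.dtype.kind)   # 'i' means a signed integer type
print(mixed.dtype)
print(mixed)
print(as_float.dtype)
```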

In [2]:
# Importing the NumPy library
import numpy as np

arr = [1, 2, 3, 4]  # Create a simple list and assign it to a variable

print(arr)
print(type(arr))

[1, 2, 3, 4]
<class 'list'>


In [3]:
# Converting the list arr to an array
a = np.array(arr)

print(type(a)) # The type of the object, which is a NumPy array

print(a.shape) # The shape of the array, which is (4,)

print(a.dtype) # The type of data in the array, which is an integer type (the exact width, e.g. int32 or int64, depends on the platform)

print(a.ndim) # The number of dimensions, which is 1

<class 'numpy.ndarray'>
(4,)
int32
1


In [4]:
# Let's create a two-dimensional array
b = np.array([[1,2,3,4],[5,6,7,8]])

print(b.shape) # The shape of the array, where the first dimension has 2 elements and the second has 4

print(b.ndim) # The number of dimensions of the array

(2, 4)
2


#### NumPy also has built-in functions for initializing arrays, including empty(), zeros(), ones(), full() and random.random().

In [5]:
zero_array = np.zeros(5) #Takes the number of zeros as an argument 

print(zero_array)

[0. 0. 0. 0. 0.]


In [6]:
empty_array = np.empty([2,2]) # Takes a shape (an integer or a sequence of integers); the values are uninitialized, so they are arbitrary

print(empty_array)

[[-1.10380189e-282  2.52625418e+286]
 [ 6.65259379e-301  2.21764760e-301]]


In [7]:
one_array = np.ones([2,3])

one_array2 = np.ones(5)

print(one_array)
print('-----------')
print(one_array2)

[[1. 1. 1.]
 [1. 1. 1.]]
-----------
[1. 1. 1. 1. 1.]


In [8]:
np.full((2,2),10)

array([[10, 10],
       [10, 10]])

In [9]:
np.full((2,2),[1,2])

array([[1, 2],
       [1, 2]])

In [10]:
np.random.random((2,3))

array([[0.12813459, 0.25392002, 0.31635468],
       [0.49618729, 0.77583433, 0.46724096]])
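
#### Because np.random.random() returns different values on every run, the output above will not match on another machine. As a hedged sketch of one way to get repeatable values, the example below uses NumPy's Generator API with a fixed seed. The seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

seed = 42  # arbitrary illustrative value

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(seed)
rng_b = np.random.default_rng(seed)

sample_a = rng_a.random((2, 3))   # floats in the half-open interval [0, 1)
sample_b = rng_b.random((2, 3))

print(sample_a.shape)
print(np.array_equal(sample_a, sample_b))
```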

### C. Interoperability of Arrays and Scalars

#### Arrays support batch arithmetic: an operator is applied element-wise to arrays of the same shape. Similarly, a scalar is propagated element-wise across an array. For arrays with different shapes, a direct element-wise operation is not possible; NumPy handles this by broadcasting, provided that each pair of corresponding dimensions (compared from the last dimension backwards) is either equal or one of them is 1.
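
#### The broadcasting rule can be sketched as follows. The arrays row and col are hypothetical examples added here: one has shape (3,) and the other has shape (2, 1), and both are compatible with a (2, 3) array. The last step shows a shape that cannot be broadcast.

```python
import numpy as np

c = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])   # shape (2, 3)
row = np.array([10.0, 20.0, 30.0])                 # shape (3,)
col = np.array([[100.0], [200.0]])                 # shape (2, 1)

plus_row = c + row   # row is stretched across both rows of c
plus_col = c + col   # col is stretched across all three columns of c

print(plus_row)
print(plus_col)

# Shapes (2, 3) and (2,) do not match in the last dimension (3 vs 2),
# and neither of those sizes is 1, so NumPy raises a ValueError
try:
    c + np.array([1.0, 2.0])
    incompatible = False
except ValueError:
    incompatible = True

print(incompatible)
```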

In [11]:
c = np.array([[1.0,2.0,3.0],[4.0,5.0,6.0]])
d = np.array([[2.0,4.0,8.0],[1.0,3.0,6.0]])

print(c)
print(d)

[[1. 2. 3.]
 [4. 5. 6.]]
[[2. 4. 8.]
 [1. 3. 6.]]


In [12]:
c+d #Addition operator 

array([[ 3.,  6., 11.],
       [ 5.,  8., 12.]])

In [13]:
c-d # subtraction operator

array([[-1., -2., -5.],
       [ 3.,  2.,  0.]])

In [14]:
d/5 #Divide

array([[0.4, 0.8, 1.6],
       [0.2, 0.6, 1.2]])

In [15]:
c**2 #Power

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

### D. Indexing With Arrays & Using Arrays for Data Processing

In [16]:
a[2]

3

In [17]:
b[0,0] # Go to row 0 and return the element at index 0

1

In [18]:
b[1,2] # Go to row 1 and return the element at index 2 (the third element)

7

#### Array Slicing 

In [20]:
d

array([[2., 4., 8.],
       [1., 3., 6.]])

In [21]:
d[1, :2]

array([1., 3.])

In [23]:
e=np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[11,12,13,14]])
e

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [11, 12, 13, 14]])

In [24]:
e[:3, :2] # Rows at index 0 to 2 and, within them, the elements at index 0 to 1

array([[ 1,  2],
       [ 5,  6],
       [ 9, 10]])
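
#### One detail worth knowing about slices like the one above: basic slicing returns a view of the original array, not a copy, so writing to the slice also changes the original. The sketch below demonstrates this; the names view and independent are chosen for this example only.

```python
import numpy as np

e = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [11, 12, 13, 14]])

view = e[:3, :2]      # basic slicing returns a view onto e's data
view[0, 0] = 100      # this also changes e[0, 0]

independent = e[:3, :2].copy()   # .copy() makes a separate array
independent[0, 1] = -1           # e is not affected by this

print(e[0, 0])
print(e[0, 1])
print(np.shares_memory(view, e))
print(np.shares_memory(independent, e))
```

#### Use .copy() whenever the slice needs to be modified without touching the source array.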

In [25]:
e.sum() # sum of all elements

128

In [26]:
e[1].sum()

26

In [27]:
e[1].mean() # Mean of the elements in the row at index 1

6.5

In [28]:
e.std()  # Standard deviation of all elements (population standard deviation by default)

4.0
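
#### As a worked check of the result above, the population standard deviation is the square root of the mean squared deviation from the mean. The sketch below computes it step by step and compares it with e.std(). It also shows the sample standard deviation via ddof=1, which divides by n - 1 instead of n. The intermediate variable names are illustrative.

```python
import numpy as np

e = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [11, 12, 13, 14]])

mean = e.mean()                        # 128 / 16 = 8.0
variance = ((e - mean) ** 2).mean()    # 256 / 16 = 16.0
manual_std = np.sqrt(variance)         # sqrt(16.0) = 4.0

sample_std = e.std(ddof=1)             # divides by n - 1 = 15 instead of n = 16

print(mean, variance, manual_std)
print(sample_std)
```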

In [29]:
e[1].min() # Returns the minimum value of the row at index 1

5
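
#### Instead of indexing one row at a time, the same aggregations can be applied along an axis of the whole array. This is a small sketch using the axis argument: axis=0 collapses the rows (one result per column) and axis=1 collapses the columns (one result per row).

```python
import numpy as np

e = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [11, 12, 13, 14]])

col_sums = e.sum(axis=0)   # one sum per column
row_sums = e.sum(axis=1)   # one sum per row
row_mins = e.min(axis=1)   # smallest value in each row

print(col_sums)
print(row_sums)
print(row_mins)
```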

In [31]:
np.corrcoef(e)  # Pearson correlation between the rows of e; read more on correlation and how it works

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
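
#### Every entry above is 1 because np.corrcoef treats each row as a variable, and each row of e increases in equal steps, so every pair of rows is perfectly linearly related. The sketch below uses hypothetical vectors to show other values of the Pearson coefficient, and checks np.corrcoef against a hand-written helper (pearson is a name introduced here, not a NumPy function).

```python
import numpy as np

def pearson(u, v):
    """Pearson correlation computed from deviations about the mean."""
    du = u - u.mean()
    dv = v - v.mean()
    return (du * dv).sum() / np.sqrt((du ** 2).sum() * (dv ** 2).sum())

x = np.array([1.0, 2.0, 3.0, 4.0])
y_up = 2 * x + 1                          # rises linearly with x
y_down = -x                               # falls linearly with x
y_mixed = np.array([2.0, 1.0, 4.0, 3.0])  # only partly follows x

# np.corrcoef returns a 2x2 matrix; entry [0, 1] is the x-vs-y coefficient
r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]
r_mixed = np.corrcoef(x, y_mixed)[0, 1]

print(r_up, r_down, r_mixed)
print(pearson(x, y_mixed))
```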

### File input and output with Arrays 