## Data Science Bootcamp

### Table of contents:
* [Import of libraries](#0)
* [Exercise 221](#1)
* [Exercise 222](#2)
* [Exercise 223](#3)
* [Exercise 224](#4)
* [Exercise 225](#5)
* [Exercise 226](#6)
* [Exercise 227](#7)
* [Exercise 228](#8)
* [Exercise 229](#9)
* [Exercise 230](#10)

### <a name='0'></a> Import of libraries

In [74]:
import numpy as np
np.random.seed(20)

np.__version__

'1.23.3'

### Solving systems of equations
Let's consider the system of equations $U$:  
$${\displaystyle \mathrm {U} \colon {\begin{cases}{\begin{matrix}a_{11}x_{1}&+&a_{12}x_{2}&+&\dots &+&a_{1n}x_{n}&=b_{1},\\a_{21}x_{1}&+&a_{22}x_{2}&+&\dots &+&a_{2n}x_{n}&=b_{2},\\\vdots &&\vdots &&\ddots &&\vdots &\vdots \\a_{m1}x_{1}&+&a_{m2}x_{2}&+&\dots &+&a_{mn}x_{n}&=b_{m}.\end{matrix}}\end{cases}}.}$$

Using matrices we can present it as follows:

$${\begin{bmatrix}a_{11}&a_{12}&\dots &a_{1n}\\a_{21}&a_{22}&\dots &a_{2n}\\\vdots &\vdots &\ddots &\vdots \\a_{m1}&a_{m2}&\dots &a_{mn}\end{bmatrix}}{\begin{bmatrix}x_{1}\\x_{2}\\\vdots \\x_{n}\end{bmatrix}}={\begin{bmatrix}b_{1}\\b_{2}\\\vdots \\b_{m}\end{bmatrix}}$$
Or, in compact form:
$$\mathbf {AX} =\mathbf {B}$$

Where:   
$A = {\begin{bmatrix}a_{11}&a_{12}&\dots &a_{1n}\\a_{21}&a_{22}&\dots &a_{2n}\\\vdots &\vdots &\ddots &\vdots \\a_{m1}&a_{m2}&\dots &a_{mn}\end{bmatrix}}$ - coefficient matrix

${\displaystyle \mathbf {B} =[b_{1},b_{2},\dots ,b_{m}]^{T}}$ - column vector of constant terms (right-hand sides)  
${\mathbf  X}=[x_{1},x_{2},\dots ,x_{n}]^{T}$ - column vector of unknowns  

If the system matrix $ A $ is square, the system is determinate (has exactly one solution) if and only if $ A $ is invertible. In that case:
$$\mathbf {AX} =\mathbf {B}$$
$${\displaystyle \mathbf {A} ^{-1}\mathbf {AX} =\mathbf {A} ^{-1}\mathbf {B}} $$
$$\mathbf {X} =\mathbf {A} ^{-1}\mathbf {B} .$$
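
The invertibility condition can be checked numerically before solving. The following is a minimal sketch; the matrices `A_regular` and `A_singular` are illustrative and do not appear in the exercises.

```python
import numpy as np

# Illustrative matrices (hypothetical, not taken from the exercises)
A_regular = np.array([[2.0, 4.0], [1.0, -1.0]])   # determinant is -6, so it is invertible
A_singular = np.array([[1.0, 2.0], [2.0, 4.0]])   # second row is twice the first row

# A square n x n matrix is invertible exactly when its rank equals n
rank_regular = np.linalg.matrix_rank(A_regular)
rank_singular = np.linalg.matrix_rank(A_singular)
print(rank_regular, rank_singular)

# np.linalg.inv raises LinAlgError when the matrix is singular
try:
    np.linalg.inv(A_singular)
    singular_detected = False
except np.linalg.LinAlgError:
    singular_detected = True
print(singular_detected)
```

For a singular matrix the system has either no solution or infinitely many, so $\mathbf {X} =\mathbf {A} ^{-1}\mathbf {B}$ cannot be used.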

#### Example:
Let's consider the system of equations:
$$\begin{cases}2x + 4y = 10 \\ x - y = -1 \end{cases}$$  
The solution is a pair of numbers:
$$\begin{cases}x = 1 \\ y = 2 \end{cases}$$  
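
As a quick check, the example system can be solved with NumPy. This is a sketch; the names `A_example`, `B_example` and `X_example` are chosen here only to avoid clashing with the variables used in the exercises.

```python
import numpy as np

# Coefficients and right-hand sides of: 2x + 4y = 10, x - y = -1
A_example = np.array([[2, 4], [1, -1]])
B_example = np.array([10, -1])

# Solve A X = B without forming the inverse explicitly
X_example = np.linalg.solve(A_example, B_example)
print(X_example)

# Substitute the solution back into the system
print(np.allclose(A_example @ X_example, B_example))
```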


### <a name='1'></a> Exercise 221
Solve the following system of equations using the numpy library:
$$\begin{cases}5x - 3y = 21 \\ x - 2y = 7 \end{cases}$$


In [75]:
# enter solution here

A = np.array([[5, -3], [1, -2]])
B = np.array([[21], [7]])

print(A)
print(B)


[[ 5 -3]
 [ 1 -2]]
[[21]
 [ 7]]


In [76]:
X = np.linalg.inv(A).dot(B)

X

array([[ 3.],
       [-2.]])

In [77]:
X = np.linalg.solve(A, B)
X

array([[ 3.],
       [-2.]])

Check the solution.

__Tip:__ Use the _np.allclose()_ function.

In [78]:
# enter solution here

np.allclose(np.dot(A, X), B)

True
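
The tip suggests `np.allclose()` rather than `==` because the solution is computed in floating-point arithmetic, where tiny rounding errors are normal. A small sketch of the difference, using plain numbers that are not part of the exercise:

```python
import numpy as np

a = 0.1 + 0.2

# Exact comparison fails: 0.1, 0.2 and 0.3 have no exact binary representation
exact_equal = (a == 0.3)

# np.allclose compares within a tolerance (defaults: rtol=1e-05, atol=1e-08)
close_equal = np.allclose(a, 0.3)

print(exact_equal, close_equal)
```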

### <a name='2'></a> Exercise 222
Solve the following system of equations using the numpy library:

$$\begin{cases}x + y + z = 1 \\ 2x + y + 5z = 0 \\ x - y = z \end{cases}$$

In [79]:
# enter solution here

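# The third equation x - y = z is rewritten as x - y - z = 0
A = np.array([[1, 1, 1], [2, 1, 5], [1, -1, -1]])
B = np.array([1, 0, 0])

# B is one-dimensional, so no transpose is needed
X = np.linalg.solve(A, B)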

X


array([ 0.5  ,  0.875, -0.375])

Check the solution.

__Tip:__ Use the _np.allclose()_ function.

In [80]:
# enter solution here

# X and B are one-dimensional, so transposing them has no effect
np.allclose(A.dot(X), B)

True
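
Exercise 221 passes the right-hand side as a column of shape `(2, 1)`, while this exercise passes a one-dimensional array. The sketch below illustrates how `np.linalg.solve` treats the two forms; the names `b_flat` and `b_col` are introduced here for illustration only.

```python
import numpy as np

A = np.array([[1, 1, 1], [2, 1, 5], [1, -1, -1]])
b_flat = np.array([1, 0, 0])       # shape (3,)
b_col = b_flat.reshape(-1, 1)      # shape (3, 1), an explicit column vector

# Transposing a 1-D array returns an array of the same shape
print(b_flat.T.shape)

# The shape of the solution follows the shape of the right-hand side
x_flat = np.linalg.solve(A, b_flat)
x_col = np.linalg.solve(A, b_col)
print(x_flat.shape, x_col.shape)
```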

### <a name='3'></a> Exercise 223

Load the popular IRIS dataset into the _data_row_ variable using the _scikit-learn_ library.

In [81]:
# enter solution here

from sklearn.datasets import load_iris

data_row = load_iris()

data_row


{'data': array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
  

Display the description of the IRIS dataset.

In [82]:
# enter solution here

print(data_row['DESCR'])

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :

Display the variable (feature) names in the dataset.

In [83]:
# enter solution here

print(data_row['feature_names'])

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


Display the class names in the dataset.

In [84]:
# enter solution here

print(data_row['target_names'])

['setosa' 'versicolor' 'virginica']
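
The object returned by `load_iris()` is a dictionary-like `Bunch`, which is why the cells above index it with keys such as `'DESCR'`. A short sketch of other ways to reach the same data; the names `iris`, `X_iris` and `y_iris` are illustrative.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Available keys of the Bunch container
print(sorted(iris.keys()))

# Attribute access is equivalent to key access
print(iris.data.shape, iris["data"].shape)

# return_X_y=True skips the container and returns (data, target) directly
X_iris, y_iris = load_iris(return_X_y=True)
print(X_iris.shape, y_iris.shape)
```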


### <a name='4'></a> Exercise 224
Assign the feature data (a NumPy array) from the IRIS dataset to the _data_ variable.


In [85]:
# enter solution here

data = data_row['data']

Assign the target variable values from the IRIS set to the _target_ variable.

In [86]:
# enter solution here

target = data_row['target']

Display the shape of the _data_ and _target_ variables.

In [87]:
# enter solution here

print(data.shape)
print(target.shape)

(150, 4)
(150,)
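
Beyond the shapes, it can be useful to confirm the class balance stated in the dataset description (50 samples in each of three classes). A minimal sketch that reloads the data so it runs on its own:

```python
import numpy as np
from sklearn.datasets import load_iris

data, target = load_iris(return_X_y=True)

# Count how many samples belong to each class label
classes, counts = np.unique(target, return_counts=True)
print(classes, counts)
```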


### <a name='5'></a> Exercise 225
Split the data into training (_data_train_, _target_train_) and test (_data_test_, _target_test_) sets. The test set should contain 30% of the samples.

In [88]:
# enter solution here

from sklearn.model_selection import train_test_split

data_train, data_test, target_train, target_test = train_test_split(
    data, target, test_size=0.3
)


Display the shape of each set:
* _data_train_
* _target_train_
* _data_test_
* _target_test_

In [89]:
# enter solution here

names = ["data_train", "data_test", "target_train", "target_test"]
arrays = [data_train, data_test, target_train, target_test]

# Print the shape of each NumPy array next to its name
for title, arr in zip(names, arrays):
    print(f"{title} shape: {arr.shape}")


data_train shape: (105, 4)
data_test shape: (45, 4)
target_train shape: (105,)
target_test shape: (45,)
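
The split above is purely random, so the classes need not be equally represented in the test set (the classification report at the end of this notebook shows supports of 13, 18 and 14). As a hedged variation, the sketch below uses the `stratify` and `random_state` parameters of `train_test_split`; the value `random_state=20` and the `_s` suffixed names are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data, target = load_iris(return_X_y=True)

# stratify=target keeps the class proportions equal in both subsets;
# random_state makes the split reproducible (20 is an arbitrary choice)
data_train_s, data_test_s, target_train_s, target_test_s = train_test_split(
    data, target, test_size=0.3, random_state=20, stratify=target
)

print(data_train_s.shape, data_test_s.shape)
print(np.bincount(target_train_s), np.bincount(target_test_s))
```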


### <a name='6'></a> Exercise 226
Build a logistic regression model (with default parameter values) using the scikit-learn library and IRIS data.

In [100]:
# enter solution here

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()


Fit the model to the training data.

In [101]:
# enter solution here

model.fit(data_train, target_train)
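
After fitting, the learned parameters can be inspected through attributes of the estimator. The sketch below is self-contained and therefore repeats the loading and splitting steps; `clf`, `random_state=20` and `max_iter=1000` are assumptions of this example (the exercise itself uses the defaults, which may emit a convergence warning on unscaled data).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data, target = load_iris(return_X_y=True)
data_train, data_test, target_train, target_test = train_test_split(
    data, target, test_size=0.3, random_state=20
)

# A higher iteration limit than the default, chosen only for this sketch
clf = LogisticRegression(max_iter=1000)
clf.fit(data_train, target_train)

print(clf.classes_)          # class labels seen during fitting
print(clf.coef_.shape)       # one row per class, one column per feature
print(clf.intercept_.shape)  # one intercept per class
```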


### <a name='7'></a> Exercise 227
Evaluate the model on the training set.

In [102]:
# enter solution here

model.score(data_train, target_train)


0.9714285714285714

Evaluate the model on the test set.

In [103]:
# enter solution here

model.score(data_test, target_test)


0.9333333333333333
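
For a classifier, `score()` returns the mean accuracy, so the values above correspond to 102 of 105 training samples and 42 of 45 test samples classified correctly. The following sketch shows the equivalence on a freshly fitted model; `clf`, `random_state=20` and `max_iter=1000` are assumptions of this example, so its accuracy may differ from the one printed above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

data, target = load_iris(return_X_y=True)
data_train, data_test, target_train, target_test = train_test_split(
    data, target, test_size=0.3, random_state=20
)
clf = LogisticRegression(max_iter=1000).fit(data_train, target_train)

pred = clf.predict(data_test)

acc_score = clf.score(data_test, target_test)    # built-in scoring
acc_manual = np.mean(pred == target_test)        # fraction of correct predictions
acc_metric = accuracy_score(target_test, pred)   # metric function from sklearn.metrics

print(acc_score, acc_manual, acc_metric)
```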

### <a name='8'></a> Exercise 228
Use the model to predict the classes of the test data and assign the result to the _target\_pred_ variable.

In [104]:
# enter solution here

target_pred = model.predict(data_test)

Display the _target\_pred_ variable.

In [105]:
# enter solution here

target_pred

array([0, 1, 1, 2, 1, 1, 2, 0, 2, 0, 2, 1, 1, 0, 0, 2, 0, 1, 2, 1, 1, 2,
       2, 0, 1, 1, 1, 0, 2, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 2, 1, 2, 0, 1,
       1])

Display the _target\_test_ variable.

In [106]:
# enter solution here

target_test

array([0, 1, 1, 2, 1, 1, 2, 0, 2, 0, 2, 1, 2, 0, 0, 2, 0, 1, 2, 1, 1, 2,
       2, 0, 1, 1, 1, 0, 2, 2, 1, 1, 0, 0, 0, 2, 1, 0, 1, 2, 1, 2, 0, 1,
       1])
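
Comparing the two arrays by eye is error-prone, so the misclassified positions can be located programmatically. In this sketch the arrays are copied from the two outputs above under the illustrative names `pred_shown` and `test_shown`.

```python
import numpy as np

# Values copied from the target_pred output above
pred_shown = np.array([0, 1, 1, 2, 1, 1, 2, 0, 2, 0, 2, 1, 1, 0, 0, 2, 0, 1, 2, 1, 1, 2,
                       2, 0, 1, 1, 1, 0, 2, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 2, 1, 2, 0, 1,
                       1])
# Values copied from the target_test output above
test_shown = np.array([0, 1, 1, 2, 1, 1, 2, 0, 2, 0, 2, 1, 2, 0, 0, 2, 0, 1, 2, 1, 1, 2,
                       2, 0, 1, 1, 1, 0, 2, 2, 1, 1, 0, 0, 0, 2, 1, 0, 1, 2, 1, 2, 0, 1,
                       1])

# Indices where the prediction differs from the true label
wrong_idx = np.flatnonzero(pred_shown != test_shown)
print(wrong_idx)

# True and predicted labels at those positions
print(test_shown[wrong_idx], pred_shown[wrong_idx])

# Fraction of correct predictions
print(np.mean(pred_shown == test_shown))
```

All three errors are samples of class 2 predicted as class 1, which agrees with the test accuracy of 42/45.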

### <a name='9'></a> Exercise 229
Calculate the confusion matrix.

In [107]:
# enter solution here

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(target_test, target_pred)
cm


array([[13,  0,  0],
       [ 0, 18,  0],
       [ 0,  3, 11]], dtype=int64)
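
In scikit-learn's convention, rows of the confusion matrix are true classes and columns are predicted classes. Per-class precision, recall and F1 can be derived from it directly, as in this sketch; the matrix is copied from the output above under the illustrative name `cm_shown`.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class)
cm_shown = np.array([[13, 0, 0],
                     [0, 18, 0],
                     [0, 3, 11]])

true_positives = np.diag(cm_shown)

# Column sums count how often each class was predicted
precision = true_positives / cm_shown.sum(axis=0)
# Row sums count how often each class actually occurred (the support)
recall = true_positives / cm_shown.sum(axis=1)
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm_shown.sum()

print(np.round(precision, 2))
print(np.round(recall, 2))
print(np.round(f1, 2))
print(round(accuracy, 2))
```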

### <a name='10'></a> Exercise 230
Display the classification report.

In [108]:
# enter solution here

from sklearn.metrics import classification_report

print(classification_report(target_test, target_pred))


              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       0.86      1.00      0.92        18
           2       1.00      0.79      0.88        14

    accuracy                           0.93        45
   macro avg       0.95      0.93      0.93        45
weighted avg       0.94      0.93      0.93        45
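
The report can also be returned as a dictionary and labelled with class names through the `output_dict` and `target_names` parameters of `classification_report`. In the sketch below the labels `y_true` and `y_pred` are reconstructed from the confusion matrix of Exercise 229 rather than taken from the fitted model, which is an assumption made to keep the example self-contained.

```python
import numpy as np
from sklearn.metrics import classification_report

# Labels reconstructed from the confusion matrix: supports 13, 18 and 14
y_true = np.repeat([0, 1, 2], [13, 18, 14])
y_pred = y_true.copy()
y_pred[-3:] = 1  # three class-2 samples predicted as class 1

report = classification_report(
    y_true,
    y_pred,
    target_names=["setosa", "versicolor", "virginica"],
    output_dict=True,
)

print(report["accuracy"])
print(report["virginica"]["recall"])
print(report["macro avg"]["precision"])     # unweighted mean over classes
print(report["weighted avg"]["precision"])  # mean weighted by support
```

The macro average treats each class equally, while the weighted average weights each class by its support, which explains the small difference between the last two rows of the report.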

