# Introduction to notebooks, python, numpy, pandas and scikit-learn

Notebooks, such as this one, consist of cells containing programming code that you can run interactively. Other cells, such as the one you are reading now, contain text and can even include images.

**Exercise:**
* Select the next cell with the mouse and run it by clicking the ▶️ ("play") button above
* Change the message to be printed and run the cell again, this time by using the keyboard shortcut Shift-Enter

In [None]:
print("Hello, world")

## Python

This course uses the Python 3 programming language. Let's try some more Python code:

In [None]:
x = 2
y = 3
print('x =', x)
print('y =', y)

In [None]:
y += x
print('y =', y)

**Exercise:** what happens if you run the previous cell again (the one with `y += x`). Why?

## NumPy

[NumPy](https://numpy.org/) is the fundamental package for scientific computing with Python. It provides a data type, the NumPy array, which is much more efficient for numerical calculations than Python's own list data type.
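
A quick way to see how arrays differ from lists: arithmetic operators act element-wise on NumPy arrays, whereas on a Python list `*` means repetition. This is only a small illustration of the behaviour; it does not measure speed.

```python
import numpy as np

values = [1, 2, 3]
print(values * 2)              # list repetition: [1, 2, 3, 1, 2, 3]

arr = np.array(values)
print(arr * 2)                 # element-wise multiplication: [2 4 6]
print(arr + arr)               # element-wise addition: [2 4 6]
print(np.sqrt(arr))            # NumPy functions apply to every element

# With a plain list the same result needs an explicit loop (here a list comprehension)
doubled = [v * 2 for v in values]
print(doubled)                 # [2, 4, 6]
```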

In [None]:
import numpy as np

Let's create a small NumPy array and print its data type and shape (size):

In [None]:
a = np.array((1, 2, 3, 4))
print(a)
print(a.dtype)
print(a.shape)


The shape is shown as the tuple `(4,)`, which means that the array has a single dimension of size 4.
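
The shape tuple has one entry per dimension, and `ndim` gives the number of dimensions. As a small sketch, `reshape` rearranges the same four elements into other shapes:

```python
import numpy as np

a = np.array((1, 2, 3, 4))
print(a.shape, a.ndim)        # (4,) 1

m = a.reshape(2, 2)           # the same 4 elements as 2 rows x 2 columns
print(m.shape, m.ndim)        # (2, 2) 2

col = a.reshape(-1, 1)        # -1 lets NumPy infer that dimension: 4 rows, 1 column
print(col.shape)              # (4, 1)
```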

We can also specify the data type ourselves by giving the `dtype` parameter:

In [None]:
a = np.array((1, 2, 3, 4), dtype=float)  # the type can be specified explicitly
print(a)
print(a.dtype)
print(a.size)  # total number of elements
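
The data type matters because it determines how values are stored. A minimal illustration: `astype` returns a converted copy, and converting floats to integers truncates towards zero instead of rounding, which also happens when a float is assigned into an integer array.

```python
import numpy as np

ints = np.array((1, 2, 3, 4))        # an integer dtype is inferred from the values
floats = ints.astype(float)          # astype returns a converted copy
print(floats.dtype)                  # float64

# Converting floats to int truncates towards zero rather than rounding
truncated = np.array((1.7, -1.7)).astype(int)
print(truncated)                     # [ 1 -1]

# Assigning a float into an integer array truncates it as well
ints[0] = 9.9
print(ints)                          # [9 2 3 4]
```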

Multidimensional arrays can also be created. For example, here is a 2×3 array (a matrix):

In [None]:
b = np.array([[1,2,3], [4,5,6]])
print(b)
print(b.shape)

NumPy has many convenience functions for creating arrays, for example to create a 3x3 array initialized to zeros:

In [None]:
c = np.zeros((3, 3), dtype=int)
print(c)
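
A few other commonly used creation functions, shown as a short sketch (all of these are standard NumPy functions):

```python
import numpy as np

ones = np.ones((2, 3))              # 2x3 array of 1.0 (default dtype is float64)
evens = np.arange(0, 10, 2)         # like range(): start, stop (excluded), step
points = np.linspace(0, 1, 5)       # 5 evenly spaced points, both ends included
identity = np.eye(3, dtype=int)     # 3x3 identity matrix
sevens = np.full((2, 2), 7)         # 2x2 array filled with the value 7

print(ones)
print(evens)                        # [0 2 4 6 8]
print(points)
print(identity)
print(sevens)
```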

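You can access single elements or slices (subsets) with the normal Python indexing and slicing syntax. Indices start from zero, and a slice such as `a[1:3]` includes index 1 but excludes index 3: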

In [None]:
print(a)
print(a[1])
print(a[1:3])
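
Indexing extends naturally to more dimensions, with one index or slice per dimension separated by commas. The sketch below also shows one important difference from Python lists: a NumPy slice is a view of the original data, not a copy.

```python
import numpy as np

a = np.array((1.0, 2.0, 3.0, 4.0))
print(a[-1])          # negative indices count from the end: 4.0
print(a[::2])         # every second element: [1. 3.]

b = np.array([[1, 2, 3], [4, 5, 6]])
print(b[0, 2])        # row 0, column 2: 3
print(b[:, 1])        # column 1 of all rows: [2 5]
print(b[1, :])        # the whole second row: [4 5 6]

# A slice is a view: modifying it also modifies the original array
row = b[1, :]
row[0] = 40
print(b)              # b[1, 0] is now 40
```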

NumPy provides linear algebra operations for matrix and vector products, eigenvalue problems and linear systems. NumPy is typically built against optimized BLAS libraries, which makes these operations efficient.

In [None]:
A = np.array(((2, 1), (1, 3)))
B = np.array(((-2, 4.2), (4.2, 6)))
C = np.dot(A, B) # matrix-matrix product
w, v = np.linalg.eig(A) # eigenvalues in w, eigenvectors in v
b = np.array((1, 2))
x = np.linalg.solve(C, b) # Solve Cx = b
print(np.dot(C, x)) # np.dot calculates also matrix-vector and vector-vector products
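
The results above can be checked numerically. This sketch reuses the same matrices and uses the `@` operator, which for 2-D arrays gives the same result as `np.dot`. `np.allclose` is used because floating-point results are rarely bit-for-bit identical.

```python
import numpy as np

A = np.array(((2, 1), (1, 3)))
B = np.array(((-2, 4.2), (4.2, 6)))

C = A @ B                                   # matrix-matrix product
print(np.allclose(C, np.dot(A, B)))         # True

# Each column v[:, i] is an eigenvector with eigenvalue w[i], i.e. A v_i = w_i v_i
w, v = np.linalg.eig(A)
for i in range(len(w)):
    print(np.allclose(A @ v[:, i], w[i] * v[:, i]))   # True

# solve() is generally preferred over computing an explicit inverse
b = np.array((1, 2))
x = np.linalg.solve(C, b)
print(np.allclose(C @ x, b))                # True
```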

Further reading:

* [NumPy Quickstart tutorial](https://numpy.org/doc/stable/user/quickstart.html)
* [NumPy: the absolute basics for beginners](https://numpy.org/doc/stable/user/absolute_beginners.html)
* [NumPy Reference](https://numpy.org/doc/stable/reference/index.html)

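## Simple plotting with Matplotlib

[Matplotlib](https://matplotlib.org/) is a widely used plotting library for Python. In Jupyter, the `%matplotlib inline` command makes plots appear directly below the cell that creates them.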

In [None]:
import matplotlib.pyplot as plt

In [None]:
%matplotlib inline

In [None]:
x = np.linspace(-np.pi, np.pi, 100)
y = np.sin(x)
plt.plot(x, y)
plt.title('A simple plot')
plt.xlabel('time (s)')
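
Besides the `plt.*` functions, Matplotlib has an object-oriented interface where you work with explicit `Figure` and `Axes` objects, which is convenient when a plot has several elements. A minimal sketch drawing two curves with a legend:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 100)

fig, ax = plt.subplots()                      # explicit Figure and Axes objects
ax.plot(x, np.sin(x), label='sin(x)')
ax.plot(x, np.cos(x), label='cos(x)', linestyle='--')
ax.set_title('Sine and cosine')
ax.set_xlabel('x')
ax.set_ylabel('value')
ax.legend()                                   # uses the label= strings above
plt.show()
```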

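## Pandas

[Pandas](https://pandas.pydata.org/) is a library for working with tabular data. Its central data structure, the `DataFrame`, is a table with labelled rows and columns, where each column can have its own data type.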

In [None]:
import pandas as pd
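
A small sketch of basic `DataFrame` operations. The table is made up for illustration only: selecting a column, filtering rows with a boolean condition, and computing a grouped mean.

```python
import pandas as pd

# A small, made-up table (illustrative values only)
df = pd.DataFrame({
    'length': [5.1, 4.9, 6.3, 5.8],
    'width': [3.5, 3.0, 3.3, 2.7],
    'kind': ['a', 'a', 'b', 'b'],
})

print(df.dtypes)                       # one dtype per column
print(df['length'])                    # a single column is a Series
print(df[df['length'] > 5.0])          # boolean filtering keeps matching rows
means = df.groupby('kind')['length'].mean()
print(means)                           # mean length per kind
```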


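## Scikit-learn

[Scikit-learn](https://scikit-learn.org/) is a Python library for machine learning. In the following cells we download the Iris dataset, train a logistic regression classifier on part of it, and evaluate the classifier on the rest.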

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [None]:
# Download the Iris dataset; the leading ! runs the line as a shell command
!wget https://a3s.fi/mldata/iris.csv

In [None]:
data = pd.read_csv('iris.csv')

In [None]:
data.dtypes

In [None]:
data

In [None]:
X = data.iloc[:, :4]  # features: all rows, first four columns
X

In [None]:
y = data['class']  # labels: the 'class' column
y
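
The two cells above select data in two different ways: `iloc` selects by position, while `data['class']` selects a column by name. The toy table below is hypothetical and only mimics the assumed layout of `iris.csv` (feature columns first, label column last):

```python
import pandas as pd

# Hypothetical toy table: feature columns first, label column last
df = pd.DataFrame({
    'f1': [1.0, 2.0, 3.0],
    'f2': [4.0, 5.0, 6.0],
    'label': ['x', 'y', 'x'],
})

X = df.iloc[:, :2]          # position-based: all rows, first two columns -> DataFrame
y = df['label']             # name-based: a single column -> Series
print(type(X).__name__, X.shape)    # DataFrame (3, 2)
print(type(y).__name__, y.shape)    # Series (3,)

# The same features selected by dropping the label column by name
X_by_name = df.drop(columns='label')
print(X.equals(X_by_name))          # True
```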

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)  # hold out 30% of the rows for testing
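
Without a fixed seed, `train_test_split` shuffles differently on every run, so results change slightly each time. A sketch of two commonly used parameters: `random_state` makes the split reproducible, and `stratify` keeps the class proportions equal in both parts. To stay self-contained it uses scikit-learn's bundled copy of Iris (`load_iris`) instead of the downloaded CSV.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# load_iris ships with scikit-learn, so no download is needed here
X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    random_state=42,   # fixed seed: the same split every time the cell is run
    stratify=y,        # same class proportions in the training and test sets
)
print(X_train.shape, X_test.shape)      # (105, 4) (45, 4)
print(np.bincount(y_test))              # [15 15 15]
```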

In [None]:
clf = LogisticRegression(C=1.0)  # C is the inverse of the regularization strength
clf.fit(X_train, y_train)

In [None]:
y_predicted = clf.predict(X_test)
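
Besides a single label per row, a logistic regression classifier can also report a probability for each class with `predict_proba`. A self-contained sketch using the bundled Iris data; `max_iter=1000` is an assumption added here to avoid a possible convergence warning:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# max_iter raised from the default to avoid a possible convergence warning
clf = LogisticRegression(C=1.0, max_iter=1000)
clf.fit(X_train, y_train)

y_predicted = clf.predict(X_test)        # one class label per test row
proba = clf.predict_proba(X_test)        # one probability per class per row
print(clf.classes_)                      # the column order of proba
print(proba[:3].round(3))

# predict() returns the class with the highest probability
print((clf.classes_[proba.argmax(axis=1)] == y_predicted).all())   # True
```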

In [None]:
print(classification_report(y_test, y_predicted))
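
The `accuracy_score` and `confusion_matrix` functions imported earlier give a more compact summary than the full report. In the confusion matrix, rows correspond to true classes and columns to predicted classes, so correct predictions lie on the diagonal. A self-contained sketch on the bundled Iris data:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_predicted = clf.predict(X_test)

acc = accuracy_score(y_test, y_predicted)     # fraction of correct predictions
# labels= fixes the row/column order to the classifier's classes
cm = confusion_matrix(y_test, y_predicted, labels=clf.classes_)
print(acc)
print(cm)

# Correct predictions are on the diagonal, so accuracy = trace / total
print(cm.trace() / cm.sum())
```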

In [None]:
X_test