# Data Wrangling. Basics of working with vector data and visualization. NumPy

## Example tasks

Let's look at an example solution to one of the problems on the Kaggle website and try to analyze it.

https://www.kaggle.com/madhulekha/a-comprehensive-solution-to-your-first-ml-problem

As you can see, data exploration and feature generation take up a large part of the solution. That is normal.

## NumPy

Let's briefly recall why we use NumPy instead of built-in Python lists.

In [1]:
import numpy as np

In [2]:
%timeit [i**2 for i in range(1000)]

1000 loops, best of 5: 288 µs per loop


In [3]:
%timeit np.arange(1000)**2

The slowest run took 2453.22 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 3.16 µs per loop


The difference is significant: in this run the NumPy version is roughly 90 times faster (288 µs vs. 3.16 µs per loop).
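
Outside a notebook, the same comparison can be reproduced with the standard `timeit` module. This is a minimal sketch: the function names `squares_list` and `squares_numpy` are made up for illustration, and the absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 1000

def squares_list():
    # Pure Python: square each number in a list comprehension
    return [i**2 for i in range(n)]

def squares_numpy():
    # NumPy: square the whole array in one vectorized operation
    return np.arange(n)**2

# Total time for 1000 calls of each function, in seconds
list_time = timeit.timeit(squares_list, number=1000)
numpy_time = timeit.timeit(squares_numpy, number=1000)

print(f"list comprehension: {list_time:.4f} s")
print(f"NumPy:              {numpy_time:.4f} s")
```

Both functions produce the same values; only the time needed to compute them differs.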

http://www.numpy.org

### Cheat sheet

NumPy has a large set of standard functions that help you implement even the most complex operations. To avoid wasting time writing your own functions, it is useful to keep a cheat sheet of the main ones at hand.

In [None]:
from IPython.display import Image
Image('numpy.png') 

In [None]:
# A Jupyter trick: type np. and press Tab to list the available functions;
# Shift+Tab on a name shows its documentation
# np.

In [None]:
a = np.array([1,3,4,5])
a

Check the number of dimensions of the array:

In [None]:
a.ndim

In [None]:
# TASK: create a two-dimensional/three-dimensional array
# Your code is here

In [None]:
b = np.array([[1,3,4,5],[1,3,4,5]])
b
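
Alongside 'ndim', the 'shape' attribute shows the size along each dimension. A minimal sketch, assuming illustrative names ('vector', 'matrix', 'tensor') that are not part of the original notebook:

```python
import numpy as np

vector = np.array([1, 3, 4, 5])                    # 1-D array
matrix = np.array([[1, 3, 4, 5], [1, 3, 4, 5]])    # 2-D array: 2 rows, 4 columns
tensor = np.zeros((2, 3, 4))                       # 3-D array filled with zeros

for name, arr in [("vector", vector), ("matrix", matrix), ("tensor", tensor)]:
    # ndim is the number of axes; shape gives the length of each axis
    print(name, "ndim =", arr.ndim, "shape =", arr.shape)
```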

### Basic NumPy Data Types

In [4]:
import numpy as np

In [5]:
a = np.array([1, 2, 3])
a.dtype

dtype('int64')

A decimal point after a number makes it a float, so the array gets the 'float64' data type:

In [6]:
b = np.array([1., 2., 3.])
b.dtype

dtype('float64')

In [7]:
c = np.array(["1", 2, 3])
c.dtype

dtype('<U21')

In [8]:
c[1]

'2'
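
The result above follows from a general rule: a 'NumPy' array holds a single data type, so mixed input is converted to a common one. A short sketch of this behaviour, with variable names chosen here for illustration:

```python
import numpy as np

# Integers mixed with a float are all stored as floats
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)        # float64

# Integers mixed with a string are all stored as strings
mixed_strings = np.array(["1", 2, 3])
print(mixed_strings[1])           # 2 (as a string)

# astype converts the strings back to integers
as_ints = mixed_strings.astype(int)
print(as_ints.sum())              # 6
```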

In [9]:
a = np.ones((3, 3))
a

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

Other data types:

- Complex numbers

In [11]:
d = np.array([1+2j, 3+4j, 5+6*1j])
d.dtype

dtype('complex128')

- Booleans

In [12]:
e = np.array([True, False, False, True])
e.dtype

dtype('bool')

- Strings

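For strings, memory is allocated 'greedily': every element gets as many characters as the longest string in the array.
For example, an array whose longest string has 7 letters gets the data type 'U7'.
Mixing strings with other Python objects, such as the tuple in the cell below, prevents this: 'NumPy' falls back to the generic object type 'O'.

In [13]:
# dtype=object is stated explicitly: recent NumPy versions refuse to build
# an array from such mixed (ragged) input without it
f = np.array([(1, 23, 4), 'Hello', 'Hallo'], dtype=object)
f.dtype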

dtype('O')
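
To see the fixed-width string type directly, here is a minimal sketch with an array that contains only strings. The words and the name 'words' are made up for illustration.

```python
import numpy as np

# The longest string, 'Goodbye', has 7 letters, so every element gets 7 characters
words = np.array(['Hello', 'Hallo', 'Goodbye'])
print(words.dtype)        # <U7 on little-endian machines

# Each character takes 4 bytes, so one element occupies 7 * 4 bytes
print(words.itemsize)     # 28

# A longer string assigned later is silently cut to 7 characters
words[0] = 'Extraordinary'
print(words[0])           # Extraor
```

The silent truncation is the practical consequence to keep in mind when writing into an existing string array.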

### Multiplying matrices and vectors

In [None]:
a = np.array([[1, 0], [0, 1]])
b = np.array([[4, 1], [2, 2]])
r1 = np.dot(a, b)
r2 = a.dot(b)

print("Matrix A:\n", a)
print("Matrix B:\n", b)
print("Result of multiplication using the function np.dot:\n", r1)
print("Result of multiplication using the method a.dot:\n", r2)

In [None]:
# The @ operator gives the same matrix product as np.dot for 2-D arrays
a @ b

Matrices in 'NumPy' can also be multiplied by vectors:

In [None]:
c = np.array([1, 2])
r3 = b.dot(c)

print("Matrix:\n", b)
print("Vector:\n", c)
print("Result of multiplication:\n", r3)
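
It helps to check the shapes involved in this product. A minimal sketch, reusing the values of 'b' and 'c' from the cell above; the names 'c_column' and 'r_column' are introduced here for illustration:

```python
import numpy as np

b = np.array([[4, 1], [2, 2]])
c = np.array([1, 2])            # 1-D vector, shape (2,)

# A matrix times a 1-D vector gives a 1-D vector
r3 = b @ c
print(r3, r3.shape)             # [6 6] (2,)

# Reshaping the vector into an explicit column keeps the result two-dimensional
c_column = c.reshape(-1, 1)     # shape (2, 1)
r_column = b @ c_column
print(r_column.shape)           # (2, 1)
```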

__Note:__ the __'*'__ operator performs element-wise (coordinate-wise) multiplication on matrices, not matrix multiplication!

In [None]:
r = a * b

print("Matrix A:\n", a)
print("Matrix B:\n", b)
print("The result of element-wise multiplication with the * operator:\n", r)
a
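
Because 'a' is the identity matrix in the cell above, the contrast is easier to see with two matrices that have no zero entries. A small sketch with made-up values ('m1' and 'm2' are illustrative names):

```python
import numpy as np

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])

# Element-wise product: each entry is multiplied by the entry in the same position
elementwise = m1 * m2
print(elementwise)
# [[ 5 12]
#  [21 32]]

# Matrix product: rows of m1 are combined with columns of m2
matrix_product = m1 @ m2
print(matrix_product)
# [[19 22]
#  [43 50]]
```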

For more about matrix multiplication in 'NumPy', see the [documentation](http://docs.scipy.org/doc/numpy-1.10.0/reference/routines.linalg.html#matrix-and-vector-products).