# Getting Started with NumPy
### <p style="color:Tomato">Learn the basics of NumPy while working with alcohol consumption data</p>

#### <p style="color:Gray">world_alcohol.csv</p>
* a comma-separated values (CSV) dataset
* when reading it, specify the delimiter using the delimiter parameter of numpy.genfromtxt()

In [2]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

In [3]:
import numpy as np

In [6]:
vector = np.array([5, 10, 15, 20])
vector
matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])
matrix

array([ 5, 10, 15, 20])

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

In [7]:
vector = np.array([10, 20, 30])
vector
matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

array([10, 20, 30])

#### <p style="color:Gray">ndarray.shape</p><hr>
The shape attribute returns a tuple giving the number of elements along each dimension of the array.

In [8]:
vector = np.array([1, 2, 3, 4])
print(vector.shape)

(4,)


> This tuple indicates that the array vector has one dimension, with length 4, which matches our intuition that vector has 4 elements.

In [9]:
matrix = np.array([[5, 10, 15], [20, 25, 30]])
print(matrix.shape)

(2, 3)


> The above code results in the tuple (2, 3), indicating that matrix has 2 rows and 3 columns.

In [11]:
vector_shape = vector.shape
matrix_shape = matrix.shape
print(vector_shape)
print(matrix_shape)

(4,)
(2, 3)
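As a small complement to the cells above, the sketch below prints shape next to two related attributes, ndim and size. These two attributes are not used elsewhere in this notebook; they are standard ndarray attributes and are shown here only to make the difference between "length per dimension" and "total number of elements" concrete.

```python
import numpy as np

vector = np.array([1, 2, 3, 4])
matrix = np.array([[5, 10, 15], [20, 25, 30]])

for name, arr in (("vector", vector), ("matrix", matrix)):
    # shape: length along each dimension
    # ndim:  number of dimensions (the length of the shape tuple)
    # size:  total number of elements (the product of the shape entries)
    print(name, arr.shape, arr.ndim, arr.size)
```

For matrix, the shape (2, 3) gives 2 dimensions and 2 * 3 = 6 elements in total.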


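#### <p style="color:Gray">numpy.genfromtxt()</p><hr>
Reads a dataset from a text file into a NumPy array. With the default float data type, any value that cannot be converted to a number is read as nan.<br/>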


In [15]:
nfl = np.genfromtxt("nfl.csv", delimiter=",")
print(nfl[:10])

[[             nan              nan              nan              nan]
 [  2.00900000e+03   1.00000000e+00              nan              nan]
 [  2.00900000e+03   1.00000000e+00              nan              nan]
 [  2.00900000e+03   1.00000000e+00              nan              nan]
 [  2.00900000e+03   1.00000000e+00              nan              nan]
 [  2.00900000e+03   1.00000000e+00              nan              nan]
 [  2.00900000e+03   1.00000000e+00              nan              nan]
 [  2.00900000e+03   1.00000000e+00              nan              nan]
 [  2.00900000e+03   1.00000000e+00              nan              nan]
 [  2.00900000e+03   1.00000000e+00              nan              nan]]


In [16]:
world_alcohol = np.genfromtxt('world_alcohol.csv', delimiter=",")
print(type(world_alcohol))
print(world_alcohol[:10])

<class 'numpy.ndarray'>
[[             nan              nan              nan              nan
               nan]
 [  1.98600000e+03              nan              nan              nan
    0.00000000e+00]
 [  1.98600000e+03              nan              nan              nan
    5.00000000e-01]
 [  1.98500000e+03              nan              nan              nan
    1.62000000e+00]
 [  1.98600000e+03              nan              nan              nan
    4.27000000e+00]
 [  1.98700000e+03              nan              nan              nan
    1.98000000e+00]
 [  1.98700000e+03              nan              nan              nan
    0.00000000e+00]
 [  1.98700000e+03              nan              nan              nan
    1.30000000e-01]
 [  1.98500000e+03              nan              nan              nan
    3.90000000e-01]
 [  1.98600000e+03              nan              nan              nan
    1.55000000e+00]]
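The nan entries in the outputs above come from text that cannot be parsed as a float, including the header row. The following is a minimal sketch of that behaviour. It uses a small in-memory CSV (the csv_text string and its three rows are made up for illustration and are not taken from world_alcohol.csv) so that it runs without any file on disk.

```python
import io
import numpy as np

# Hypothetical three-line CSV: one header row and two data rows.
csv_text = "Year,Country,Litres\n1986,Vietnam,0.0\n1985,Ireland,1.62\n"

# Default dtype is float, so every text field becomes nan.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(data.shape)   # header row is included in the result
print(data[0])      # the header row: all nan
print(data[1])      # the country name in the middle column is nan

# skip_header=1 drops the first line before parsing.
no_header = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(no_header.shape)
```

Skipping the header removes the all-nan first row, but the text column still comes back as nan because it is being read as float.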


#### <p style="color:Gray">dtype</p><hr>
NumPy automatically chooses an appropriate data type when reading in data or converting lists to arrays. Use the dtype attribute to check the data type of a NumPy array.<br/>

In [17]:
numbers = np.array([1, 2, 3, 4])
numbers.dtype

dtype('int32')

In [19]:
world_alcohol_dtype = world_alcohol.dtype
world_alcohol_dtype

dtype('float64')
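To illustrate how the data type is inferred from the input, here is a short sketch with three made-up lists. The exact integer type (int32 above) depends on the platform and NumPy version, so the example checks the general kind of the type rather than its width.

```python
import numpy as np

ints = np.array([1, 2, 3, 4])
mixed = np.array([1, 2.5, 3])      # one float is enough to make the array float
words = np.array(["a", "bc"])      # strings give a Unicode string dtype

print(ints.dtype, ints.dtype.kind)     # kind 'i' means signed integer
print(mixed.dtype, mixed.dtype.kind)   # kind 'f' means floating point
print(words.dtype, words.dtype.kind)   # kind 'U' means Unicode string
```

Because an array holds a single data type, NumPy picks one that can represent every element, which is why the mixed list ends up as float64.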

### <p style="color:Tomato">How to deal with missing data</p>
#### <p style="color:Gray">NaN</p><hr>
nan, which stands for "not a number", is a special floating-point value used to represent missing values.<br/>
When NumPy can't convert a value to a numeric data type such as float, it stores nan in its place.
* header
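As a rough sketch of how nan values can be detected and worked around, the example below uses a small made-up array rather than the dataset itself. numpy.isnan and numpy.nanmean are standard NumPy functions, but they are not used elsewhere in this notebook, so treat this as one possible approach.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0, np.nan])

# nan is never equal to anything, including itself,
# so use np.isnan instead of == to find missing entries.
print(np.nan == np.nan)
missing = np.isnan(values)
print(missing)

# Ordinary aggregates propagate nan; filter first or use the nan-aware version.
print(values.mean())
print(values[~missing].mean())
print(np.nanmean(values))
```

Both the filtered mean and np.nanmean ignore the two missing entries and average only 1.0 and 3.0.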

