# [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)

## Key Dependencies:

1. IPython and Jupyter
2. NumPy: efficient storage and manipulation of dense data arrays
3. Pandas: provides the DataFrame for efficient storage and manipulation of labeled/columnar data
4. Matplotlib: flexible range of data visualizations
5. Scikit-Learn: efficient and clean implementations of important and established machine learning algorithms

## Installation
Install Miniconda or Anaconda, then install the packages used here:
`conda install numpy pandas scikit-learn matplotlib seaborn jupyter`

## Documentation
In IPython, append these to an object's name:  
`?`: shows its documentation  
`??`: shows its source code (when available)
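Outside IPython, roughly the same information is available from plain Python: an object's docstring (what `?` displays) lives in its `__doc__` attribute, and the standard-library `inspect.getsource` returns the source code (what `??` displays) for objects implemented in Python. A minimal sketch, using `json.dumps` purely as an example of a pure-Python function:

```python
import inspect
import json

# Roughly what `json.dumps?` shows: the docstring
doc = json.dumps.__doc__
print(doc.splitlines()[0])

# Roughly what `json.dumps??` shows: the source code.
# inspect.getsource raises TypeError for functions implemented in C
# (e.g. the built-in len), where no Python source exists.
source = inspect.getsource(json.dumps)
print(source.splitlines()[0])
```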

## NumPy

```python
# check the version of numpy
import numpy
numpy.__version__
```

```
'1.12.1'
```

```python
# import as an alias
import numpy as np
```

```python
# check namespace via autocomplete
# np.<TAB>
```

```python
# check documentation
np?
```

### Data Types in Python
- Python is dynamically typed.
- This means any kind of data can be assigned to any variable.

```python
# switch x from an integer to a string
x = 4
print(x)

x = "four"
print(x)
```

```
4
four
```


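Integers:
- A Python integer is more than a raw number: in CPython, the variable is actually a pointer to a C structure that holds the value along with extra information such as its type and reference count.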
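One way to see this overhead is to compare the size CPython reports for a small integer object with the 8 bytes a raw 64-bit integer needs. The exact numbers from `sys.getsizeof` are CPython- and version-specific, so this sketch only relies on the object being larger, not on a particular byte count:

```python
import sys
import numpy as np

# Size of the full Python integer object (value plus type pointer, refcount, etc.)
py_int_size = sys.getsizeof(1)
# Size of one raw fixed-width 64-bit integer, as NumPy would store it
raw_int_size = np.dtype(np.int64).itemsize

print("Python int object:", py_int_size, "bytes")
print("raw int64 value:", raw_int_size, "bytes")
```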

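Lists:
- Each element of a Python list is a complete Python object, so a single list can hold items of different types.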

```python
L = list(range(10))
L
```

```
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

```python
type(L[0])
```

```
int
```

```python
L2 = [str(c) for c in L]
L2
```

```
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
```

```python
type(L2[0])
```

```
str
```

```python
L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]
```

```
[bool, str, float, int]
```

Fixed-type arrays:
- Python's built-in `array` module stores dense arrays of a single type.

```python
import array
L = list(range(10))
A = array.array('i', L)
A
```

```
array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```

Here, `'i'` is a type code indicating that the contents are (signed) integers. NumPy's `ndarray` also stores its data in a fixed-type buffer, but adds efficient operations on that data.
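A minimal sketch of that difference: both `array.array` and a NumPy `ndarray` store their values compactly with one fixed type, but only the `ndarray` supports whole-array arithmetic. With `array.array`, the loop has to be written in Python:

```python
import array
import numpy as np

values = list(range(10))

# Python's array module: compact, fixed-type storage, but no vectorized math
a = array.array('i', values)
doubled_list = [v * 2 for v in a]  # explicit Python-level loop

# NumPy ndarray: fixed-type storage plus whole-array operations
arr = np.array(values)
doubled_arr = arr * 2  # the loop runs inside NumPy's compiled code

print(doubled_list)
print(doubled_arr)
```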

### Creating arrays from Python lists

```python
# integer arrays:
np.array([1, 4, 2, 5, 3])
```

```
array([1, 4, 2, 5, 3])
```

```python
# mixed types are upcast to a uniform data type (here, floating point)
np.array([3.14, 4, 2, 3])
```

```
array([ 3.14,  4.  ,  2.  ,  3.  ])
```
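To check which type NumPy chose, inspect the array's `dtype`. A short sketch of the upcasting rule for a few mixed inputs (exact integer and string widths, such as `int64` vs `int32`, depend on the platform, so those cases are best compared by dtype *kind*):

```python
import numpy as np

ints = np.array([1, 2, 3])             # all integers -> integer dtype
mixed_num = np.array([3.14, 4, 2, 3])  # int + float -> float64
mixed_bool = np.array([True, 2])       # bool + int -> integer dtype
mixed_str = np.array([1, "a"])         # number + string -> unicode string dtype

print(ints.dtype, mixed_num.dtype, mixed_bool.dtype, mixed_str.dtype)
```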

```python
# explicitly set the data type with dtype
np.array([1, 2, 3, 4], dtype='float32')
```

```
array([ 1.,  2.,  3.,  4.], dtype=float32)
```
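Choosing the `dtype` also fixes how much memory each element takes. For example, `float32` uses 4 bytes per element versus 8 for `float64`, at the cost of precision. A small sketch comparing the two:

```python
import numpy as np

a32 = np.array([1, 2, 3, 4], dtype='float32')
a64 = np.array([1, 2, 3, 4], dtype='float64')

print(a32.itemsize, a64.itemsize)  # bytes per element
print(a32.nbytes, a64.nbytes)      # bytes for the whole array
```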

```python
# multidimensional arrays:
# a list of lists produces a two-dimensional array
np.array([range(i, i + 3) for i in [2, 4, 6]])
```

```
array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])
```

### Creating arrays from scratch
- For larger arrays, it is more efficient to create them directly with NumPy's built-in routines.

```python
# create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
```

```python
# create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)
```

```
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])
```

```python
# create a 3x5 array filled with 3.14
np.full([3, 5], 3.14)
```

```
array([[ 3.14,  3.14,  3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14,  3.14,  3.14]])
```

```python
# create a linear sequence array
# starting at 0, stopping before 20, and stepping by 2
# (this works like the built-in range() function)
np.arange(0, 20, 2)
```

```
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
```

```python
# create an array of 5 values, evenly spaced between 0 and 1
np.linspace(0, 1, 5)
```

```
array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])
```
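The two functions treat the end point differently: `np.arange` excludes its stop value, while `np.linspace` includes it by default (`endpoint=True`). For non-integer steps the NumPy documentation recommends `linspace`, since floating-point rounding can make the length of an `arange` result hard to predict. A small sketch comparing them:

```python
import numpy as np

stepped = np.arange(0, 1, 0.25)                  # step size given; 1 is excluded
counted = np.linspace(0, 1, 5)                   # number of points given; 1 is included
open_end = np.linspace(0, 1, 4, endpoint=False)  # half-open range, like arange

print(stepped)
print(counted)
print(open_end)
```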

```python
# create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random([3, 3])
```

```
array([[ 0.40313467,  0.98115062,  0.47114332],
       [ 0.18809273,  0.54643531,  0.93810403],
       [ 0.03606709,  0.47803966,  0.64296618]])
```

```python
# create an array of 3x3 normally distributed random values
# with a mean of 0 and standard deviation of 1
np.random.normal(0, 1, (3, 3))
```

```
array([[-0.59195007,  1.32116139,  0.45499338],
       [-0.86947637, -1.67532737, -0.77642893],
       [ 0.9945531 , -0.21690609,  0.71891321]])
```

```python
# create a 3x3 array of random integers in the interval [0, 10)
# (the upper bound 10 is excluded)
np.random.randint(0, 10, (3, 3))
```

```
array([[5, 6, 6],
       [9, 0, 7],
       [5, 4, 9]])
```
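The `np.random.*` functions above use NumPy's legacy global generator. Since NumPy 1.17, the documentation recommends creating a `Generator` with `np.random.default_rng` instead. A sketch of the same three calls with it (the seed value 42 is arbitrary, chosen only for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded Generator

uniform = rng.random((3, 3))            # uniform values in [0, 1)
normal = rng.normal(0, 1, (3, 3))       # mean 0, standard deviation 1
integers = rng.integers(0, 10, (3, 3))  # integers in [0, 10); high is excluded by default

print(uniform)
print(normal)
print(integers)
```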

```python
# create a 3x3 identity matrix
np.eye(3)
```

```
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
```

```python
# create an uninitialized array of three values (float64 by default)
# the values are whatever happens to already be in that memory location
np.empty(3)
```

```
array([ 1.,  1.,  1.])
```
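Because `np.empty` skips initialization, its contents are arbitrary; it is only safe when every element is written before being read. A minimal sketch that fills such an array completely:

```python
import numpy as np

n = 5
squares = np.empty(n)           # contents are arbitrary until assigned
squares[:] = np.arange(n) ** 2  # overwrite every element before use

print(squares)
```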

### NumPy Standard Data Types
To specify the data type, use either a string (e.g. `'int16'`) or the associated NumPy object (e.g. `np.int16`).
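For example, these two calls request the same 16-bit integer type, once by string and once by NumPy object:

```python
import numpy as np

by_string = np.zeros(10, dtype='int16')
by_object = np.zeros(10, dtype=np.int16)

print(by_string.dtype, by_object.dtype)
```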

### Basics of NumPy Arrays
- Let's look at how to:
    - access data and subarrays
    - split
    - reshape
    - join
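As a brief preview of the last three operations, here is a sketch using `np.split`, `reshape`, and `np.concatenate` on a small array:

```python
import numpy as np

x = np.arange(6)                 # [0 1 2 3 4 5]
left, right = np.split(x, [3])   # split: before index 3 and from index 3 on
grid = x.reshape((2, 3))         # reshape: same 6 values as 2 rows x 3 columns
joined = np.concatenate([x, x])  # join: 12 values end to end

print(left, right)
print(grid)
print(joined)
```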

#### Array attributes

```python
import numpy as np
np.random.seed(0) # for reproducibility

x1 = np.random.randint(10, size=6) # 1d array
x2 = np.random.randint(10, size=(3, 4)) # 2d array
x3 = np.random.randint(10, size=(3, 4, 5)) # 3d array
```

```python
# each array has ndim (number of dimensions), shape (size of each dimension),
# size (total number of elements), and dtype (data type of the elements)
print("x3 ndim:", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size:", x3.size)
print("x3 dtype:", x3.dtype)
```

```
x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
x3 dtype: int64
```


```python
# we can also get itemsize (size in bytes of each element)
# and nbytes (total size in bytes of the array)
print("x3 itemsize:", x3.itemsize, "bytes")
print("x3 nbytes:", x3.nbytes, "bytes")
```

```
x3 itemsize: 8 bytes
x3 nbytes: 480 bytes
```
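These attributes are related: `ndim` is the length of `shape`, `size` is the product of the entries in `shape`, and `nbytes` equals `itemsize * size` (for example, 60 elements × 8 bytes = 480 bytes). A quick check, rebuilding `x3` the same seeded way as above; the item size itself can differ by platform, so only the relationships are tested:

```python
import numpy as np

np.random.seed(0)
x3 = np.random.randint(10, size=(3, 4, 5))

print(x3.ndim == len(x3.shape))            # number of dimensions
print(x3.size == np.prod(x3.shape))        # 3 * 4 * 5 = 60 elements
print(x3.nbytes == x3.itemsize * x3.size)  # total bytes
```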
