# Why Python for data analysis and machine learning?
There are many reasons to use Python for data science. It is a more recent arrival in the data science ecosystem than R or SAS, but it is now used just as frequently for analysis as either of them. A good foundation in Python, R, and SAS should be a *must* for **every data scientist** and machine learning enthusiast.

In this course, Python gives us an open source way to do machine learning that runs on just about any machine. So let's start by looking at the NumPy and pandas packages for analyzing data.

With that in mind, let's go over the following:
- NumPy matrices
- Simple operations on arrays and matrices
- Indexing with NumPy
- pandas for tabular data
- Representing categorical data (discussion point)

In [None]:
import sys
import numpy as np

print(sys.version)
print(np.__version__)

In [None]:
x = np.random.rand(5,3)
x

In [None]:
x.shape

In [None]:
x.dtype

In [None]:
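y = np.random.rand(3,4)
# `*` multiplies element-wise, so shapes (5,3) and (3,4) are incompatible
try:
    z = x*y
except ValueError as err:
    print(err)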

In [None]:
# np.dot asks for matrix multiplication explicitly
z = np.dot(x,y)
z

In [None]:
# or we can use the overloaded matrix multiplication operator
z = x @ y
z
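
To make the difference between the two kinds of product concrete, here is a minimal sketch. The arrays `a`, `b`, and `c` and the seeded generator are assumptions introduced only for this illustration; they are not part of the cells above.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator (assumption, for repeatability)
a = rng.random((5, 3))
b = rng.random((3, 4))
c = rng.random((5, 3))

# `*` works entry by entry, so both operands need compatible shapes
elementwise = a * c              # (5, 3) * (5, 3) -> (5, 3)

# `@` is the matrix product: inner dimensions must match (3 and 3 here)
matrix_product = a @ b           # (5, 3) @ (3, 4) -> (5, 4)

print(elementwise.shape)
print(matrix_product.shape)

# for 2-D arrays, np.dot and @ give the same result
same = np.allclose(matrix_product, np.dot(a, b))
print(same)
```

The shapes are the thing to watch: an element-wise product keeps the shape of its operands, while a matrix product takes its row count from the left operand and its column count from the right one.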

# Indexing

In [None]:
x1 = np.array([[1,2,3],
               [4,5,6],
               [7,8,9]])
x1

In [None]:
for row in range(x1.shape[0]):
    print(x1[row,1])

In [None]:

x1[:,1]

In [None]:
x1[:,1]>3

In [None]:
# boolean mask indexing: keep the rows whose second column is greater than 3
x1[ x1[:,1]>3 ]
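
A row mask can also be combined with a column index in the same pair of brackets. The sketch below rebuilds the same 3x3 matrix under the hypothetical name `m` so that it runs on its own.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# one boolean per row: is the value in column 1 greater than 3?
row_mask = m[:, 1] > 3

rows = m[row_mask]             # the matching rows, all columns
first_col = m[row_mask, 0]     # column 0 of the matching rows only

print(row_mask)
print(rows)
print(first_col)
```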

In [None]:
x2 = np.array(range(10))
x2

In [None]:
x2.shape

In [None]:
idx = x2>5
idx

In [None]:
x2[idx]

In [None]:
x2[x2>5]
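
Masks can be combined before indexing. This is a small sketch, using a hypothetical array `values` with the same contents as `x2`, of the element-wise operators `&`, `|`, and `~`. Each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`.

```python
import numpy as np

values = np.arange(10)   # 0 through 9, same contents as x2

middle = values[(values > 2) & (values < 7)]   # both conditions hold
edges = values[(values < 2) | (values > 7)]    # either condition holds
not_big = values[~(values > 5)]                # negated condition

print(middle)
print(edges)
print(not_big)
```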

# Named columns
So what if we have a matrix of data where each row is one observation and each column holds the values of one feature?

In [None]:
col_names = ['temperature','time','day']
data = np.array([[64,2100,1],
                 [50,2200,4],
                 [48,2300,3],
                 [34,0,   2],
                 [30,100, 5]])
data

In [None]:
data2 = data[data[:,1]>1500]
data2
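
With a plain NumPy array, the column names live in a separate list, so the name has to be translated to a position by hand. A minimal sketch of that bookkeeping follows; the variable names `time_col` and `late` are hypothetical.

```python
import numpy as np

col_names = ['temperature', 'time', 'day']
data = np.array([[64, 2100, 1],
                 [50, 2200, 4],
                 [48, 2300, 3],
                 [34, 0,    2],
                 [30, 100,  5]])

# look up the column position from its name instead of hard-coding 1
time_col = col_names.index('time')

late = data[data[:, time_col] > 1500]
print(time_col)
print(late)
```

This works, but nothing ties `col_names` to `data`: reordering the columns of one without the other silently selects the wrong feature.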

In [None]:
# pandas to the rescue
import pandas as pd

df = pd.DataFrame(data,columns=col_names)
df

In [None]:
df[df.time>1500]
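
Row filters and column selection can be combined with `.loc`. The sketch below builds its own copy of the table under the hypothetical name `df_demo`; the threshold of 49 on `temperature` is an arbitrary value chosen for illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({'temperature': [64, 50, 48, 34, 30],
                        'time': [2100, 2200, 2300, 0, 100],
                        'day': [1, 4, 3, 2, 5]})

# rows where both conditions hold, keeping only two of the columns
mask = (df_demo['time'] > 1500) & (df_demo['temperature'] > 49)
subset = df_demo.loc[mask, ['temperature', 'day']]

print(subset)
```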

In [None]:
# let's get a summary of the columns and their dtypes
df.info()

In [None]:
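# cast to object so the integer column can hold strings, then assign with .loc
df['day'] = df['day'].astype(object)
df.loc[df['day'] == 1, 'day'] = 'Mon'
df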

In [None]:
# there is almost always a more efficient built-in pandas function
df['day'] = df['day'].replace(to_replace=list(range(7)),
                              value=['Su','Mon','Tues','Wed','Th','Fri','Sat'])
df

In [None]:
# notice that `day` no longer has an integer dtype: it now holds strings
# (shown as `object` in most pandas versions), which is not the pandas `category` dtype
df.info()

In [None]:
# one hot encoding example
pd.get_dummies(df.day)
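
As a starting point for the discussion on representing categorical data, here is a hedged sketch of one alternative: declaring the full set of day labels with `pd.Categorical` before encoding. The names `day_labels`, `days`, `days_cat`, and `encoded` are hypothetical, and the series simply repeats the five day labels used above.

```python
import pandas as pd

day_labels = ['Su', 'Mon', 'Tues', 'Wed', 'Th', 'Fri', 'Sat']
days = pd.Series(['Mon', 'Th', 'Wed', 'Tues', 'Fri'])

# an explicit category list records labels that never occur in the data
days_cat = pd.Series(pd.Categorical(days, categories=day_labels))

# one column per declared category, including the unobserved 'Su' and 'Sat'
encoded = pd.get_dummies(days_cat)

print(days_cat.dtype)
print(list(encoded.columns))
print(encoded.shape)
```

Encoding the plain string column instead would give only five columns, one for each day that appears in the data, which can be a problem when training and test sets contain different days.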