# Test Your Software Installation

If this is your first time using a Jupyter notebook, please follow along with me in the class video. If you already know what you are doing, go ahead and run the cells to make sure everything works on your system.

In [None]:
import numpy as np
import pandas as pd
from sklearn import datasets
import matplotlib.pyplot as plt

%matplotlib inline

###  No errors yet? Your software is installed!
If you don't have any errors from running the cell above, then you are set.

# Let's plot a sine wave with numpy and matplotlib

You can adjust the range of the wave by playing around with the `np.arange()` parameters. The syntax is `(start, stop, step)`, just like normal Python slicing, except that the step can be a decimal amount.

In [None]:
a = np.sin(np.arange(0,10,.1))
plt.plot(a)
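
To see what `np.arange()` actually produces, here is a small sketch. The names `whole_steps`, `decimal_steps`, and `x` are only for illustration and are not used elsewhere in the notebook.

```python
import numpy as np

# start=0, stop=5, step=1: the stop value itself is not included
whole_steps = np.arange(0, 5, 1)
print(whole_steps)        # [0 1 2 3 4]

# a fractional step works too, which plain Python range() does not allow
decimal_steps = np.arange(0, 1, 0.25)
print(decimal_steps)      # [0.   0.25 0.5  0.75]

# the same call used for the sine wave above
x = np.arange(0, 10, .1)
print(len(x))             # 100
```

So the sine wave above is drawn from 100 evenly spaced points between 0 and 10.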

# We can dress up our plot a bit with extra attributes

In [None]:
plt.plot(a)
plt.title("This is a Sine Wave")
plt.xlabel("These are the values of X")
plt.ylabel("Y LABEL!!")

If we want to control the size of the plot, one way is to set it _before_ we make the plot.

In [None]:
# figsize controls the size of the plot as (width, height) in inches; we set it when creating the plot's figure object

plt.figure(figsize=(10,5))
plt.plot(a)
plt.title("This is a Sine Wave")
plt.xlabel("These are the values of X")
plt.ylabel("Y LABEL!!")
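
Another way to set the size, sketched here as an alternative rather than the required approach, is to create the figure and axes together with `plt.subplots()` and pass `figsize` there. The `Agg` backend line is an assumption so the sketch also runs outside a notebook; inside Jupyter you can drop it.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

a = np.sin(np.arange(0, 10, .1))

# figsize is (width, height) in inches and is set when the figure is created
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(a)
ax.set_title("This is a Sine Wave")
ax.set_xlabel("These are the values of X")
ax.set_ylabel("Y LABEL!!")

# an existing figure can also be resized afterwards
fig.set_size_inches(8, 4)
print(fig.get_size_inches())
```

Working with the `fig` and `ax` objects directly tends to be easier to follow once a notebook has more than one plot in it.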

## Finally let's plot a few other things on the same plot.


In [None]:
plt.figure(figsize=(10,5))
b = np.cos(np.arange(0,10,.1))
plt.plot(a, label = "sine", marker = 'o' ) # a is the sine wave; add a label so we can create a legend, and set a marker
plt.plot(b, label = 'cosine', linewidth = 4, color = 'purple') # add b (the cosine) by simply plotting it as well; make it thicker and purple
plt.title("This is a Sine AND a Cosine Wave")
plt.xlabel("These are the values of X")
plt.ylabel("Y LABEL!!")
plt.legend()

# Ok, let's load up some data

For now we'll just use some default datasets that come packaged with Scikit-Learn. Later on we can look into getting our own data and loading it up.

In [None]:
iris = datasets.load_iris()

The data comes back as a special scikit-learn datatype called `Bunch`. Don't worry much about it; you will only use it while you are learning. Normally we load our data from outside into pandas DataFrames or NumPy arrays (or both).

In [None]:
type(iris)
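
As a rough sketch of how a `Bunch` behaves: it works like a dictionary whose keys can also be read as attributes. The exact set of keys can differ between scikit-learn versions, so the code only checks for a few that the rest of this notebook relies on.

```python
from sklearn import datasets

iris = datasets.load_iris()

# list the keys available on this Bunch (varies by scikit-learn version)
print(list(iris.keys()))

# dictionary-style and attribute-style access return the same object
same_names = iris["feature_names"] == iris.feature_names
print(same_names)

# the class names that the integer targets refer to
print(list(iris.target_names))
```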

In [None]:
iris.feature_names

In [None]:
iris.target

In [None]:
type(iris.data)

Scikit-Learn also stores its data in NumPy `ndarray`s. This is a very common way to store numeric tabular data. We will have to learn how to work with these soon. For now I'll just show you a couple of basic slicing operations; they work pretty much how you would think.

In [None]:
# check the "shape" of the array: it gives the size along each dimension, and its length is the number of dimensions (tensor rank for the math inclined)
iris.data.shape

This means we have 150 rows and 4 columns. The dataset has 150 samples, and each sample has 4 descriptive "features". We can look at it really easily: either `print()` it or, because we are in an interactive session, just run it on the last line of the cell.

In [None]:
iris.data
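
To make the rows-and-columns reading of `shape` concrete, here is a minimal sketch on a tiny hand-made array. The name `toy` and its values are made up for illustration and stand in for a dataset with 3 samples and 2 features.

```python
import numpy as np

# a small 2-D array: 3 rows (samples) and 2 columns (features)
toy = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [5.0, 6.0]])

# shape is a tuple with one entry per dimension, so it can be unpacked
n_rows, n_cols = toy.shape
print(toy.shape)        # (3, 2)
print(toy.ndim)         # 2
print(n_rows, n_cols)   # 3 2
```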

In [None]:
# first row: a single integer index selects along the first dimension (rows)
iris.data[0]

In [None]:
# one column: separate the dimensions with a comma
# [:,:] --> [first_dim, second_dim]
iris.data[:,0]

In [None]:
# rows 2-4, columns 2-3: [start:stop] with start inclusive, stop exclusive (like Python)
iris.data[2:5,2:4]
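
The same three slicing patterns are easier to check by eye on a small array. This is a sketch on a made-up 4 x 3 array called `grid`, filled with 0 through 11 so each value shows where it came from.

```python
import numpy as np

# 4 rows x 3 columns:
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]
grid = np.arange(12).reshape(4, 3)

first_row = grid[0]        # single index picks a row: [0 1 2]
first_col = grid[:, 0]     # all rows, column 0: [0 3 6 9]
block = grid[1:3, 1:3]     # rows 1-2 and columns 1-2: [[4 5] [7 8]]

print(first_row)
print(first_col)
print(block)
```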

# Let's look at some data we can "look" at

In [None]:
digits = datasets.load_digits()

In [None]:
digits.data[0].shape

In [None]:
plt.imshow(digits.data[0].reshape(8,8), cmap='gray')
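
Each digit is stored as a flat row of 64 numbers, which is why the cell above calls `reshape(8,8)` before plotting. A minimal sketch of that step, using a made-up array `flat` in place of a real digit:

```python
import numpy as np

# hypothetical stand-in for one flattened 8x8 image (values 0..63)
flat = np.arange(64)
print(flat.shape)          # (64,)

# reshape fills the new array row by row, so the first 8 values become row 0
image = flat.reshape(8, 8)
print(image.shape)         # (8, 8)
print(image[1, 0])         # 8, the first value of the second row

# ravel() flattens it back to the original 64 values
restored = image.ravel()
print(np.array_equal(restored, flat))
```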

In [None]:
digits.data.shape

In [None]:
fig, axes = plt.subplots(2, 10, figsize=(12, 2))

for i, ax in enumerate(axes.flat):
    ax.imshow(digits.data[i].reshape(8,8), cmap='binary', interpolation='nearest')
    ax.text(0.05, 0.05, str(digits.target[i]),
            transform=ax.transAxes, color='green')
    ax.set_xticks([])
    ax.set_yticks([])

In [None]:
# note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version
boston = datasets.load_boston()

In [None]:
boston.feature_names

In [None]:
housing = pd.DataFrame(boston.data, columns = boston.feature_names)

In [None]:
housing.head()

In [None]:
housing.describe()

In [None]:
housing.tail()

In [None]:
housing.shape

In [None]:
housing_targets = pd.Series(boston.target)

In [None]:
housing_targets

In [None]:
housing_targets.describe()