In [None]:
import numpy as np

# Setting a random seed is useful for reproducibility. Make sure you use the seed value you are asked to use (or no seed at all).
np.random.seed(0)
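
As a small illustration of what seeding buys you, the sketch below resets the seed and draws the same numbers twice. The variable names `first` and `second` are made up for this example.

```python
import numpy as np

np.random.seed(0)
first = np.random.random(3)   # three random numbers after seeding

np.random.seed(0)             # reset the generator to the same seed
second = np.random.random(3)  # the same three numbers again

print(np.array_equal(first, second))  # True
```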

## Loading Data

For our first example we'll use the famous iris data set. It is included in the HW-1 zip file.

In [None]:
path_to_file = 'data/iris.txt'   # This is where my copy sits; update it to your path.
iris = np.genfromtxt(path_to_file, delimiter=None)  # Load the txt file (delimiter=None splits on whitespace)

This loads the iris data set into a 2D NumPy array (basically a table). With such an array you can, among other things,
check its size, get a row or column, and slice it.
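
If you want to try the loading call without the file, the sketch below feeds `np.genfromtxt` an in-memory string instead. The three rows are hand-typed for illustration, and the whitespace-separated, five-column layout is an assumption about the file format, not a copy of `iris.txt`.

```python
import io
import numpy as np

# Illustrative rows only: four measurements followed by a class label.
text = "5.1 3.5 1.4 0.2 0\n4.9 3.0 1.4 0.2 0\n6.3 3.3 6.0 2.5 2\n"

# delimiter=None tells genfromtxt to split each line on whitespace.
demo = np.genfromtxt(io.StringIO(text), delimiter=None)

print(demo.shape)  # (3, 5)
print(demo.dtype)  # float64
```

Every value, including the class label, ends up as a float, because a NumPy array holds a single data type.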

### Show the entire array 

In [None]:
iris

### Size

In [None]:
iris.shape

### Get (or set) item/row/column

In [None]:
iris[0, 3]  # Get the element on the first row and 4th column.

In [None]:
iris[0, :] # Or iris[0]. Getting the 0'th row

In [None]:
iris[:, 0]  # Get the 0'th column

### Slicing

In [None]:
iris[:, 0:2]  # Or iris[:, :2] -- Get the first two columns.

In [None]:
iris[0:3, :]  # Or iris[:3, :] -- Get the first 3 rows

### Using negative indices

In [None]:
# You can also use negative indices, both for getting an item/row/column and for slicing.
# Note: a notebook cell only displays the value of its last expression.
iris[-1, :]   # Get the last row
iris[-2:, :]  # Get the last two rows
iris[:, :-1]  # All columns except the last (the first four)

### Indexing with lists/arrays

We can access non-contiguous groups of rows or columns by using lists or arrays.

In [None]:
iris[[1, 3, 5], :] # Get rows 1, 3, and 5 (np.array([1, 3, 5]) works as the index too)
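
Here is a minimal, self-contained sketch of list and array indexing on a small stand-in array. The name `table` and its contents are hypothetical and unrelated to the iris data.

```python
import numpy as np

table = np.arange(24).reshape(6, 4)     # hypothetical 6x4 table holding 0..23

rows = table[[1, 3, 5], :]              # index rows with a Python list
same_rows = table[np.array([1, 3, 5])]  # index rows with an integer array
cols = table[:, [0, 3]]                 # pick the first and last columns

print(rows)
print(np.array_equal(rows, same_rows))  # True
print(cols.shape)                       # (6, 2)
```

Unlike a plain slice, indexing with a list or array returns a copy, so modifying `rows` leaves `table` unchanged.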

### Logical indexing

We can perform logical operations on arrays and use the resulting boolean arrays to index into the array.

In [None]:
arr = np.random.random(10) # Create a length-10 vector of random numbers in [0, 1)
arr

In [None]:
arr > 0.5 # Check if each entry is > 0.5

In [None]:
arr[arr > 0.5] # Find all entries > 0.5
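
Conditions can also be combined. The sketch below uses a small hand-written array (the values are made up so the result is easy to check) and the element-wise `&` operator.

```python
import numpy as np

values = np.array([0.1, 0.6, 0.4, 0.9, 0.5])

# Element-wise AND; the parentheses around each comparison are required.
in_range = (values > 0.3) & (values < 0.8)

print(values[in_range])       # [0.6 0.4 0.5]
print(np.sum(values > 0.5))   # 2, because each True counts as 1
```

Use `|` for element-wise OR and `~` for NOT. Python's `and`, `or`, and `not` do not work element-wise on arrays.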

## Plotting the Data

First things first: if you are using a Jupyter notebook, you need to enable inline plotting. Just run the next code cell.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

Let's try plotting a simple function (y = x^2) with matplotlib using a line plot.

We first need to define the x and y coordinates of the points along the function that we want to plot.

In [None]:
x = np.arange(-10, 10, 0.01) # Similar to Python's range: equally spaced values from -10 up to (but not including) 10, in steps of 0.01

In [None]:
x

In [None]:
x.shape
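
One detail worth knowing is that `np.arange` leaves out the stop value. When you would rather fix the number of points and include both endpoints, `np.linspace` is a common alternative. The sketch below compares the two on a tiny range; the names `a` and `b` are made up for this example.

```python
import numpy as np

a = np.arange(-1, 1, 0.5)   # step of 0.5, stop value excluded
b = np.linspace(-1, 1, 5)   # exactly 5 points, stop value included

print(a)  # [-1.  -0.5  0.   0.5]
print(b)  # [-1.  -0.5  0.   0.5  1. ]
```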

In [None]:
y = x ** 2 # Square every value in x

In [None]:
y

We can now plot the points that we just defined.

In [None]:
plt.plot(x, y)
plt.show() 

Let's try plotting the first two columns of the iris data using the first one as the x-axis and the second as the y-axis.

In [None]:
plt.plot(iris[:, 0], iris[:, 1])
plt.show() 

This plot doesn't make much sense. A line plot connects the points in the order they appear in the array, which is not meaningful for this data. A scatter plot is more appropriate in this case.

In [None]:
plt.scatter(iris[:, 0], iris[:, 1])
plt.show()

Now let's draw the scatter plot using the same two columns and color each observation by its class (which you will have to do in your assignment).

In [None]:
colors = ['blue', 'green', 'red']

for i, c in enumerate(np.unique(iris[:, -1])):
    mask = iris[:, -1] == c  # Finding the right points
    plt.scatter(iris[mask, 0], iris[mask, 1], s=120, c=colors[i], alpha=0.75, label='class %d' % i)

plt.legend()
plt.show()
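
The loop above relies on `np.unique` to find the class labels and on a boolean mask to pick out the rows of each class. The sketch below shows both steps on a small hand-written label vector; `labels` and `features` are hypothetical stand-ins, not the iris data.

```python
import numpy as np

labels = np.array([0, 0, 1, 2, 2, 2])                 # made-up class labels
features = np.array([1.0, 1.5, 3.0, 5.0, 5.5, 6.0])   # made-up feature values

# np.unique returns the sorted distinct labels; return_counts adds their frequencies.
classes, counts = np.unique(labels, return_counts=True)
print(classes)  # [0 1 2]
print(counts)   # [2 1 3]

for c in classes:
    mask = labels == c            # True where the observation belongs to class c
    print(c, features[mask])      # the feature values of that class only
```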

Another plot type that you will need to use is the histogram.

In [None]:
normal_values = np.random.randn(500) # Draw 500 values from a standard normal distribution
plt.hist(normal_values) # Plot the histogram of these values
plt.show()
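
To see the numbers behind a histogram, you can compute the bin counts directly with `np.histogram`. The sketch below uses 10 bins, which is assumed here to match the default of `plt.hist`; the seed is set only so that the example is repeatable.

```python
import numpy as np

np.random.seed(0)
samples = np.random.randn(500)  # 500 draws from a standard normal distribution

# counts[i] is the number of samples falling between edges[i] and edges[i + 1].
counts, edges = np.histogram(samples, bins=10)

print(counts.sum())  # 500, since every sample lands in exactly one bin
print(len(edges))    # 11, since 10 bins need 11 edges
```

Passing a different `bins` value (for example `plt.hist(normal_values, bins=30)`) changes how finely the data is grouped.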