## Jupyter Pyplot Numpy Problem Sheet
###### Solution by Ervin Mamutov - github.com/imervin

#### Citation
[1] https://en.wikipedia.org/wiki/Iris_flower_data_set

[2] https://en.wikipedia.org/wiki/Multivariate_statistics

[3] https://en.wikipedia.org/wiki/Linear_discriminant_analysis


### Fisher's Iris Dataset
Fisher's Iris Dataset is a multivariate dataset of Iris flower measurements, where multivariate means that more than one outcome variable is analysed at a time [2]. The British statistician Ronald Fisher introduced it in his 1936 paper *The use of multiple measurements in taxonomic problems* as an example of linear discriminant analysis [1]. Linear discriminant analysis is a method used in statistics, pattern recognition and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events [3].

### What is the dataset?
The dataset consists of 50 samples from each of three species of Iris (setosa, versicolor and virginica). Four features were measured for each sample: the length and width of the sepals and petals, in centimeters. [1]

### What is the dataset used for?
The dataset was used to develop a linear discriminant model to distinguish the species from each other. [1]
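
To make the idea of a linear discriminant concrete, here is a minimal NumPy sketch of Fisher's two-class discriminant. It is an illustration under stated assumptions, not Fisher's original computation: it uses only two features (sepal length and petal length) and ten hand-typed rows that are meant to mirror the first five setosa and first five versicolor samples, so treat the numbers as illustrative. The helper names `fisher_direction` and `predict_is_versicolor` are hypothetical.

```python
import numpy as np

# Illustrative rows: (sepal length, petal length) in cm.
# Hand-typed for this sketch; assumed to mirror the first five samples of each species.
setosa = np.array([[5.1, 1.4], [4.9, 1.4], [4.7, 1.3], [4.6, 1.5], [5.0, 1.4]])
versicolor = np.array([[7.0, 4.7], [6.4, 4.5], [6.9, 4.9], [5.5, 4.0], [6.5, 4.6]])


def fisher_direction(class_a, class_b):
    """Return the weight vector w = Sw^-1 (mean_b - mean_a)."""
    mean_a = class_a.mean(axis=0)
    mean_b = class_b.mean(axis=0)
    centered_a = class_a - mean_a
    centered_b = class_b - mean_b
    # Within-class scatter: sum of the scatter matrices of both classes.
    within_scatter = centered_a.T @ centered_a + centered_b.T @ centered_b
    return np.linalg.solve(within_scatter, mean_b - mean_a)


w = fisher_direction(setosa, versicolor)

# Decision threshold: projection of the midpoint between the two class means.
threshold = 0.5 * (setosa.mean(axis=0) + versicolor.mean(axis=0)) @ w


def predict_is_versicolor(samples):
    """Project samples onto w and compare against the midpoint threshold."""
    return samples @ w > threshold


setosa_pred = predict_is_versicolor(setosa)
versicolor_pred = predict_is_versicolor(versicolor)
print("Weights:", w)
print("Setosa flagged as versicolor:", setosa_pred)
print("Versicolor flagged as versicolor:", versicolor_pred)
```

The weight vector `w` is the "linear combination of features": each sample is reduced to the single number `sample @ w`, and the two species end up on opposite sides of the threshold.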


### Get and load the data.
I have downloaded a CSV file of Fisher's Iris Dataset. The next step is to work out what each column in the file represents.

The attribute information below comes from https://archive.ics.uci.edu/ml/datasets/iris.

| First              | Second            | Third              | Fourth            | Fifth |
| :----------------- |:------------------| :------------------|:------------------| :-----|
| sepal length in cm | sepal width in cm | petal length in cm | petal width in cm | Class |

Now that I know the structure of the file and what each column represents, I can load the data into NumPy arrays.
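
As a small illustration of that five-column layout, the sketch below parses four rows from an in-memory string instead of the real file. The string `sample_csv` is a hypothetical stand-in for the CSV, and the `U15` (unicode) string format is my choice for this example. It then counts the samples per class with `np.unique`; on the full dataset this check should report 50 samples for each of the three species.

```python
import io

import numpy as np

# Hypothetical stand-in for the CSV file: four rows in the same five-column layout.
sample_csv = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

# One named field per column: four floats and a string of up to 15 characters.
iris_dtype = {
    "names": ("sepal_L", "sepal_W", "petal_L", "petal_W", "iris_class"),
    "formats": ("float", "float", "float", "float", "U15"),
}

# Without unpack=True, loadtxt returns one structured array with a record per row.
data = np.loadtxt(sample_csv, delimiter=",", dtype=iris_dtype)

# Fields are accessed by name; count how many rows belong to each class.
species, counts = np.unique(data["iris_class"], return_counts=True)
for name, count in zip(species, counts):
    print(name, count)
```

Accessing fields by name (`data["sepal_L"]`) is an alternative to unpacking into five separate arrays; both approaches read the same columns.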

In [6]:
# Import numpy
import numpy as np

# Adapted code from - https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt
# Read the file column by column into 5 arrays: sepal length, sepal width,
# petal length, petal width and iris class.
# delimiter="," because it's a CSV file. The first 4 fields are floats and the
# last one is a 15-byte string ('S15'), long enough for "Iris-versicolor".
# usecols selects the columns to read; unpack=True returns one array per field.
# Passing the file name lets loadtxt open and close the file itself.
sepal_L, sepal_W, petal_L, petal_W, iris_class = np.loadtxt(
    "IRIS_dataset.csv",
    delimiter=",",
    dtype={'names': ('sepal_L', 'sepal_W', 'petal_L', 'petal_W', 'iris_class'),
           'formats': ('float', 'float', 'float', 'float', 'S15')},
    usecols=(0, 1, 2, 3, 4),
    unpack=True)
print("Sepal Lengths:\n", sepal_L, "\nSepal Widths:\n", sepal_W,
      "\nPetal Lengths:\n", petal_L, "\nPetal Widths:\n", petal_W, "\n")

Sepal Lengths:
 [ 5.1  4.9  4.7  4.6  5.   5.4  4.6  5.   4.4  4.9  5.4  4.8  4.8  4.3  5.8
  5.7  5.4  5.1  5.7  5.1  5.4  5.1  4.6  5.1  4.8  5.   5.   5.2  5.2  4.7
  4.8  5.4  5.2  5.5  4.9  5.   5.5  4.9  4.4  5.1  5.   4.5  4.4  5.   5.1
  4.8  5.1  4.6  5.3  5.   7.   6.4  6.9  5.5  6.5  5.7  6.3  4.9  6.6  5.2
  5.   5.9  6.   6.1  5.6  6.7  5.6  5.8  6.2  5.6  5.9  6.1  6.3  6.1  6.4
  6.6  6.8  6.7  6.   5.7  5.5  5.5  5.8  6.   5.4  6.   6.7  6.3  5.6  5.5
  5.5  6.1  5.8  5.   5.6  5.7  5.7  6.2  5.1  5.7  6.3  5.8  7.1  6.3  6.5
  7.6  4.9  7.3  6.7  7.2  6.5  6.4  6.8  5.7  5.8  6.4  6.5  7.7  7.7  6.
  6.9  5.6  7.7  6.3  6.7  7.2  6.2  6.1  6.4  7.2  7.4  7.9  6.4  6.3  6.1
  7.7  6.3  6.4  6.   6.9  6.7  6.9  5.8  6.8  6.7  6.7  6.3  6.5  6.2  5.9]
