### Introduction to the iris data set
This is perhaps the best-known data set in the statistical sciences.

Fisher's paper is a classic in the field and is referenced frequently to this day. 

The data set contains 3 classes of 50 instances each (150 rows in total), where each class refers to a species of iris plant.

One class is linearly separable from the other 2; the latter are NOT linearly separable from each other. 

#### Attribute Information:
Variables

1. sepal length in cm 
2. sepal width in cm 
3. petal length in cm 
4. petal width in cm 
5. class (Species) 
    - Iris Setosa 
    - Iris Versicolor
    - Iris Virginica

In [4]:
import pandas as pd
import numpy as np
# import seaborn as sb

In [5]:
iris = pd.read_csv("iris.csv")

### Iris Data Set

- 4 numeric variables
- 1 categorical variable

Notice that pandas shows a row number for each case, and that the numbering starts at 0:
 - Python is zero-indexed
 - Other languages, such as R, are 1-indexed
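As a small illustration of zero-based indexing, the sketch below builds a three-row frame by hand (the values are the first three rows of the data; the variable name `sample` is only for this example) and looks up rows by position.

```python
import pandas as pd

# Three rows typed in by hand for illustration; "sample" is a hypothetical name.
sample = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7],
    "Sepal.Width": [3.5, 3.0, 3.2],
    "Species": ["setosa", "setosa", "setosa"],
})

# The default index starts at 0, so the first row sits at position 0
# and the last of three rows sits at position 2.
first_row = sample.iloc[0]
last_row = sample.iloc[len(sample) - 1]

print(list(sample.index))         # [0, 1, 2]
print(first_row["Sepal.Length"])  # 5.1
print(last_row["Sepal.Length"])   # 4.7
```

In R the same three rows would be numbered 1 to 3.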

In [6]:
# First Ten Rows
iris.head(10)

,Unnamed: 0,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
0,1,5.1,3.5,1.4,0.2,setosa
1,2,4.9,3.0,1.4,0.2,setosa
2,3,4.7,3.2,1.3,0.2,setosa
3,4,4.6,3.1,1.5,0.2,setosa
4,5,5.0,3.6,1.4,0.2,setosa
5,6,5.4,3.9,1.7,0.4,setosa
6,7,4.6,3.4,1.4,0.3,setosa
7,8,5.0,3.4,1.5,0.2,setosa
8,9,4.4,2.9,1.4,0.2,setosa
9,10,4.9,3.1,1.5,0.1,setosa


In [7]:
list(iris)

['Unnamed: 0',
 'Sepal.Length',
 'Sepal.Width',
 'Petal.Length',
 'Petal.Width',
 'Species']
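A column called `Unnamed: 0` usually means the CSV was saved together with its row index, so the first header cell is empty. The sketch below assumes that is the case here. The in-memory text `csv_text` is a hypothetical stand-in for `iris.csv`, not the real file. It shows how `index_col=0` makes pandas treat that column as the index instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for iris.csv: the first header cell is empty,
# which is what DataFrame.to_csv() writes when the index is saved.
csv_text = (
    ",Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species\n"
    "1,5.1,3.5,1.4,0.2,setosa\n"
    "2,4.9,3.0,1.4,0.2,setosa\n"
)

# Default read: the unnamed first column becomes a data column.
default_read = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column is used as the row index instead.
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_read.columns))
print(list(indexed_read.columns))
print(list(indexed_read.index))
```

With `index_col=0` the row labels run from 1 rather than 0, because they come from the file.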

In [8]:
iris = iris.rename(columns={"Unnamed: 0" : "Case"})

In [9]:
iris.head(5)

,Case,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
0,1,5.1,3.5,1.4,0.2,setosa
1,2,4.9,3.0,1.4,0.2,setosa
2,3,4.7,3.2,1.3,0.2,setosa
3,4,4.6,3.1,1.5,0.2,setosa
4,5,5.0,3.6,1.4,0.2,setosa


In [10]:
# drop() returns a new DataFrame; iris itself is left unchanged
iris.drop(["Case"], axis=1)

,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
5,5.4,3.9,1.7,0.4,setosa
6,4.6,3.4,1.4,0.3,setosa
7,5.0,3.4,1.5,0.2,setosa
8,4.4,2.9,1.4,0.2,setosa
9,4.9,3.1,1.5,0.1,setosa


In [11]:
# "Case" is still present because the result of drop() above was not kept
iris.head(3)

,Case,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
0,1,5.1,3.5,1.4,0.2,setosa
1,2,4.9,3.0,1.4,0.2,setosa
2,3,4.7,3.2,1.3,0.2,setosa
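The output above still contains `Case` because `drop()` returns a new DataFrame by default and leaves the original alone. A minimal sketch of that behaviour, using a hand-built frame (`sample` and `dropped` are hypothetical names for this example):

```python
import pandas as pd

# Hand-built frame for illustration only.
sample = pd.DataFrame({
    "Case": [1, 2, 3],
    "Sepal.Length": [5.1, 4.9, 4.7],
})

# drop() hands back a copy without the column...
dropped = sample.drop(columns=["Case"])

# ...while the original frame keeps all of its columns.
columns_after_drop = list(sample.columns)
print(columns_after_drop)     # ['Case', 'Sepal.Length']
print(list(dropped.columns))  # ['Sepal.Length']

# Reassigning the result is one way to make the change stick.
sample = sample.drop(columns=["Case"])
print(list(sample.columns))   # ['Sepal.Length']
```

Reassignment is an alternative to `inplace=True`. Both leave `sample` without the column.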


In [12]:
# inplace=True modifies iris itself instead of returning a new DataFrame
iris.drop(["Case"], axis=1, inplace=True)

In [13]:
iris.head(3)

,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa


In [14]:
SepalRatio = iris["Sepal.Length"] / iris["Sepal.Width"]

In [15]:
type(SepalRatio)

pandas.core.series.Series

In [16]:
iris["Sepal.Ratio"] = SepalRatio

In [17]:
iris.head(2)

,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species,Sepal.Ratio
0,5.1,3.5,1.4,0.2,setosa,1.457143
1,4.9,3.0,1.4,0.2,setosa,1.633333


In [18]:
iris["Petal.Ratio"] = iris["Petal.Length"] / iris["Petal.Width"]

In [19]:
iris.head(2)

,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species,Sepal.Ratio,Petal.Ratio
0,5.1,3.5,1.4,0.2,setosa,1.457143,7
1,4.9,3.0,1.4,0.2,setosa,1.633333,7
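Both ratio columns can also be added in one step with `DataFrame.assign`, which returns a new frame rather than changing the existing one. This is a sketch on a two-row, hand-built frame (`sample` and `with_ratios` are hypothetical names). Because the column names contain dots, they are passed through a dictionary rather than as keyword arguments.

```python
import pandas as pd

# Two rows typed in by hand for illustration.
sample = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9],
    "Sepal.Width": [3.5, 3.0],
    "Petal.Length": [1.4, 1.4],
    "Petal.Width": [0.2, 0.2],
})

# Names such as "Sepal.Ratio" are not valid Python identifiers,
# so unpack a dict into assign() instead of writing keyword arguments.
with_ratios = sample.assign(**{
    "Sepal.Ratio": sample["Sepal.Length"] / sample["Sepal.Width"],
    "Petal.Ratio": sample["Petal.Length"] / sample["Petal.Width"],
})

print(with_ratios.round(6))
print(list(sample.columns))  # the original frame has no ratio columns
```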


In [20]:
iris.to_csv("newiris.csv")
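By default `to_csv()` writes the row index as an extra first column with no header. When the file is read back, that column becomes `Unnamed: 0`. The sketch below writes to in-memory buffers instead of a real file (an assumption made only to keep it self-contained) and compares the default with `index=False`.

```python
import io
import pandas as pd

# Hand-built frame for illustration only.
sample = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9],
    "Species": ["setosa", "setosa"],
})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
sample.to_csv(with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
sample.to_csv(without_index, index=False)

header_with_index = with_index.getvalue().splitlines()[0]
header_without_index = without_index.getvalue().splitlines()[0]
print(header_with_index)     # ,Sepal.Length,Species
print(header_without_index)  # Sepal.Length,Species

# Reading the index-free text back gives the original columns.
round_trip = pd.read_csv(io.StringIO(without_index.getvalue()))
print(list(round_trip.columns))
```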

In [21]:
import numpy as np


In [22]:
np.mean(iris["Sepal.Length"])

5.843333333333335

In [23]:
np.std(iris["Sepal.Length"])  # population standard deviation (ddof=0 by default)

0.8253012917851409
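`np.std` divides by n by default (`ddof=0`), whereas the pandas `Series.std()` method divides by n - 1 (`ddof=1`), so the two give slightly different answers on the same data. A small sketch using the first ten sepal lengths, typed in by hand (all variable names are for this example only):

```python
import numpy as np
import pandas as pd

# First ten sepal lengths, entered by hand for illustration.
sepal_length = pd.Series([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
values = sepal_length.to_numpy()

population_std = np.std(values)         # ddof=0: divide by n
sample_std_np = np.std(values, ddof=1)  # ddof=1: divide by n - 1
sample_std_pd = sepal_length.std()      # pandas default is ddof=1

print(population_std)
print(sample_std_np)
print(sample_std_pd)
```

The population figure is always the smaller of the two, and the gap shrinks as the number of rows grows.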

In [24]:
np.amax(iris["Sepal.Length"])

7.9

In [25]:
# Divide by the maximum so the largest value becomes 1.0
iris["Sepal.Length"] / np.amax(iris["Sepal.Length"])

0      0.645570
1      0.620253
2      0.594937
3      0.582278
4      0.632911
5      0.683544
6      0.582278
7      0.632911
8      0.556962
9      0.620253
10     0.683544
11     0.607595
12     0.607595
13     0.544304
14     0.734177
15     0.721519
16     0.683544
17     0.645570
18     0.721519
19     0.645570
20     0.683544
21     0.645570
22     0.582278
23     0.645570
24     0.607595
25     0.632911
26     0.632911
27     0.658228
28     0.658228
29     0.594937
         ...   
120    0.873418
121    0.708861
122    0.974684
123    0.797468
124    0.848101
125    0.911392
126    0.784810
127    0.772152
128    0.810127
129    0.911392
130    0.936709
131    1.000000
132    0.810127
133    0.797468
134    0.772152
135    0.974684
136    0.797468
137    0.810127
138    0.759494
139    0.873418
140    0.848101
141    0.873418
142    0.734177
143    0.860759
144    0.848101
145    0.848101
146    0.797468
147    0.822785
148    0.784810
149    0.746835
Name: Sepal.Length, dtyp