# Python: playing with Iris data

In this notebook we will build our Python skills by playing with the Iris data.

### Instructions

- Please read the entire notebook carefully.
- Problems 1-10 are shown in code cells below.  Each problem begins with #@.
- Do not forget about the difference between an 'expression' and a 'statement'.
- Do not make changes outside the problem cells.
- Run your code from top to bottom before submitting.
- Do not modify the file name.
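
As a brief illustration of the expression/statement distinction mentioned above, here is a minimal sketch. The list `sample` is made-up data for illustration, not part of the Iris set. An expression evaluates to a value, and the notebook displays that value when the expression is the last line of a cell. An assignment is a statement and displays nothing, while calling `print` shows text as a side effect and returns `None`.

```python
sample = [1.0, 2.5, 4.0]  # hypothetical data, for illustration only

# len(sample) is an expression: it evaluates to a value.
# The assignment that stores it is a statement and displays nothing.
count = len(sample)

# print() displays text as a side effect; the call itself returns None.
result = print(count)

print(result is None)
```

This is why the problems that ask for an expression expect the bare expression as the last line of the cell rather than a call to `print`.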

In [13]:
# code in this cell from: 
# https://stackoverflow.com/questions/27934885/how-to-hide-code-from-cells-in-ipython-notebook-visualized-with-nbviewer
from IPython.display import HTML

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to display/hide the code."></form>''')

### Iris data

We will work with some random samples from the Iris data set.  

We will use a list for each feature of the data, and a list for the Iris classes.

In [14]:
# if you think of our data as a table, these are the columns of the table
sepal_length = [5.8, 6.0, 5.5, 7.3, 5.0, 6.3, 5.0, 6.7, 6.8, 6.1]
sepal_width  = [2.8, 2.2, 4.2, 2.9, 3.4, 3.3, 3.5, 3.1, 2.8, 2.8]
petal_length = [5.1, 4.0, 1.4, 6.3, 1.5, 6.0, 1.3, 4.7, 4.8, 4.0]
petal_width  = [2.4, 1.0, 0.2, 1.8, 0.2, 2.5, 0.3, 1.5, 1.4, 1.3]

# species for each Iris
species = ['virginica', 'versicolor', 'setosa', 'virginica', 'setosa',
       'virginica', 'setosa', 'versicolor', 'versicolor', 'versicolor']

In [15]:
# collect information about the first two flowers in the data
features = [sepal_length, sepal_width, petal_length, petal_width]
iris_0 = [ f[0] for f in features ]
iris_1 = [ f[1] for f in features ]
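
The two comprehensions above pick one value from each column. As a hedged sketch, the same idea can be extended to every flower at once with `zip(*features)`, which turns the four columns into ten rows. The name `rows` is introduced here for illustration only and is not used elsewhere in the notebook.

```python
# Columns of the table, copied from the data cell above.
sepal_length = [5.8, 6.0, 5.5, 7.3, 5.0, 6.3, 5.0, 6.7, 6.8, 6.1]
sepal_width  = [2.8, 2.2, 4.2, 2.9, 3.4, 3.3, 3.5, 3.1, 2.8, 2.8]
petal_length = [5.1, 4.0, 1.4, 6.3, 1.5, 6.0, 1.3, 4.7, 4.8, 4.0]
petal_width  = [2.4, 1.0, 0.2, 1.8, 0.2, 2.5, 0.3, 1.5, 1.4, 1.3]
features = [sepal_length, sepal_width, petal_length, petal_width]

# zip(*features) yields one tuple per flower; convert each tuple to a list.
rows = [list(row) for row in zip(*features)]

print(rows[0])    # same values as iris_0
print(len(rows))  # one row per flower
```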

In [16]:
#@ 1  How many flowers in our sample?
#  10
#  Write an expression to compute the length of list sepal_length.  Do not use a print statement!

# YOUR CODE HERE
len(sepal_length)


10

In [18]:
#@ 2  What's the biggest petal width value?
#  2.5
#  Write an expression to compute the maximum petal_width value.

# YOUR CODE HERE
max(petal_width)

2.5

In [19]:
#@ 3  What is the range of petal lengths?
#  6.3 - 1.3 = 5.0
#  Write an expression to compute the difference between the maximum value 
#  in petal_length and the minimum value in petal_length.

# YOUR CODE HERE
max(petal_length) - min(petal_length)

5.0

In [20]:
#@ 4  What's the average petal length value?
# ( 5.1 + 4.0 + 1.4 + 6.3 + 1.5 + 6.0 + 1.3 + 4.7 + 4.8 + 4.0) / 10 == 3.91
#  Write an expression to compute the average value of petal_length.  
#  Your answer should be one line.

# YOUR CODE HERE
sum(petal_length)/len(petal_length)

3.91

In [21]:
#@ 5  What are the feature values for the first flower in the data set?
# 5.8, 2.8, 5.1, 2.4
#  Print the feature values for the first iris in the data set.
# YOUR CODE HERE
print(iris_0)

[5.8, 2.8, 5.1, 2.4]


In [22]:
#@ 6  How much variation is there in petal lengths?
#
#  To measure the amount of variation in a list of numbers, we can
#  compute how "far away" the numbers are from the average.
# 
#  Write a function 'variation(x)' that will return the sum of the 
#  squared differences between the items in x and the average value of x.
#  For example, the average value of [1,3,5] is 3, so 
#  variation([1,3,5]) should be 8, because (1-3)**2 + (3-3)**2 + (5-3)**2 is 8

def variation(x):
    """ Return the sum of the squared differences between
    the values in x and the average value of x.
    """
    
    # YOUR CODE HERE
    # Don't forget the return statement

    total = 0
    aver = sum(x) / len(x)

    for num in x:
        total += (num - aver) ** 2

    return total

What is the variation in petal length?

In [23]:
print('Variation in petal length: {:.2f}'.format(variation(petal_length)))

Variation in petal length: 31.85
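
For reference, the quantity returned by `variation` is a sum of squared deviations, so dividing it by the number of items gives the population variance. The sketch below checks that relationship against `statistics.pvariance` from the standard library. It redefines `variation` in a compact form so the block runs on its own; that form is an assumed equivalent, not the required answer.

```python
import math
import statistics

petal_length = [5.1, 4.0, 1.4, 6.3, 1.5, 6.0, 1.3, 4.7, 4.8, 4.0]

def variation(x):
    """Sum of squared differences from the mean (compact equivalent)."""
    mean = sum(x) / len(x)
    return sum((value - mean) ** 2 for value in x)

# Dividing the sum of squared deviations by n gives the population variance.
population_variance = variation(petal_length) / len(petal_length)

# Compare with the standard library, allowing for floating-point rounding.
matches = math.isclose(population_variance, statistics.pvariance(petal_length))
print(matches)
```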


In [24]:
#@ 7  What are the average values for each of the features?
# 6.05, 3.1, 3.91, 1.26
#  Write an expression that will give a list containing the average value for each of the four features.  
#  Hint: use variable 'features', which is defined in an earlier cell.
#  Your answer should be only one line.  Hint: use a list comprehension.

# YOUR CODE HERE
[round(sum(f) / len(f), 2) for f in features]

[6.05, 3.1, 3.91, 1.26]


In [25]:
#@ 8  The species names are pretty long.  
#
# Write an expression that will give a list containing the 
#  first two letters of every string in list 'species'.
#  Again, your answer should be only one line.

# YOUR CODE HERE
[name[:2] for name in species]

['vi', 've', 'se', 'vi', 'se', 'vi', 'se', 've', 've', 've']


In [26]:
#@ 9  Do certain Iris species have larger petals than other species?  
# 12.24, 4.0, 0.28, 11.34, 0.3, 15.0, 0.39, 7.05, 6.72, 5.2
#  Write an expression that will give the petal width times petal length for each flower.
#  Your answer should be only one line.

# YOUR CODE HERE
[round(length * width, 2) for length, width in zip(petal_length, petal_width)]

[12.24, 4.0, 0.28, 11.34, 0.3, 15.0, 0.39, 7.05, 6.72, 5.2]


In [27]:
#@ 10 Does the product of petal length times petal width give us a good way 
#  to figure out the species of a flower?
#  What do you think?  You will need to look at the species corresponding to each
#  value in the list above.
#  (Enter your thoughts in the text cell below.)


I think multiplying petal length by petal width does not give us a good way to figure out a flower's species. Looking at the numbers from the question above, some of the values are very similar to one another, like 0.28, 0.3, and 0.39. That similarity could lead you to the wrong species. If you had a flower whose species you didn't know, multiplied its petal length and width, and tried to match the result to the numbers above, you might get the wrong species.
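
To look at the species next to each value, as the problem suggests, one option is to group the products by species label. The sketch below is one hedged way to do that; `products_by_species` is a name introduced here for illustration. With only ten flowers, any pattern it shows is suggestive at best.

```python
petal_length = [5.1, 4.0, 1.4, 6.3, 1.5, 6.0, 1.3, 4.7, 4.8, 4.0]
petal_width  = [2.4, 1.0, 0.2, 1.8, 0.2, 2.5, 0.3, 1.5, 1.4, 1.3]
species = ['virginica', 'versicolor', 'setosa', 'virginica', 'setosa',
           'virginica', 'setosa', 'versicolor', 'versicolor', 'versicolor']

# Group each length-times-width product under its species label.
products_by_species = {}
for name, length, width in zip(species, petal_length, petal_width):
    products_by_species.setdefault(name, []).append(round(length * width, 2))

# Print one line per species, in alphabetical order.
for name, products in sorted(products_by_species.items()):
    print(name, products)
```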