## Exercise on wine dataset


Let's start by following the general workflow we use when moving data into a DataFrame:

* Importing Pandas
* Reading data into the DataFrame
* Getting a general sense of the data

For this part, work through the following tasks:

1. Select the first 10 rows of the `chlorides` column. 
2. Select the last 10 rows of the `chlorides` column. 
3. Select columns `chlorides` **and** `density`. 
4. Select all rows where the `chlorides` value is less than 0.10. 
5. Now select all the rows where the `chlorides` value is greater than the column's mean (try **not** to use a hard-coded value for the mean, but instead a method).
6. Select all those rows where the `pH` is greater than 3.0 and less than 3.5.
7. Further filter the results from 6 to grab only those rows that have a `residual sugar` less than 2.0.

If you'd like some extra practice, try answering each question in more than one way; remember, data can often be selected in several different ways. In this context, "selecting" simply means displaying the result on screen.
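
As a minimal sketch of what "more than one way" can look like, the block below selects the same values through several routes. The `toy` frame is a hypothetical stand-in with made-up rows, not the wine data, and the variable names are chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy frame standing in for the wine data
toy = pd.DataFrame({
    "chlorides": [0.076, 0.098, 0.092, 0.075],
    "density": [0.9978, 0.9968, 0.9970, 0.9980],
})

# Single brackets return a Series, double brackets return a DataFrame
as_series = toy["chlorides"]
as_frame = toy[["chlorides"]]

# Attribute access works for column names that are valid Python identifiers
via_attribute = toy.chlorides

# head(2) and a positional slice give the same first two rows
first_head = as_series.head(2)
first_slice = as_series.iloc[:2]

print(type(as_series).__name__, type(as_frame).__name__)
print(first_head.equals(first_slice))
```

Double brackets are handy when the result should keep its table shape, for example before selecting a second column alongside the first.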

In [1]:
import pandas as pd

In [2]:
winequality_red = pd.read_csv('../data/winequality-red.csv', sep=";")
winequality_red

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.700,0.00,1.9,0.076,11.0,34.0,0.99780,3.51,0.56,9.4,5
1,7.8,0.880,0.00,2.6,0.098,25.0,67.0,0.99680,3.20,0.68,9.8,5
2,7.8,0.760,0.04,2.3,0.092,15.0,54.0,0.99700,3.26,0.65,9.8,5
3,11.2,0.280,0.56,1.9,0.075,17.0,60.0,0.99800,3.16,0.58,9.8,6
4,7.4,0.700,0.00,1.9,0.076,11.0,34.0,0.99780,3.51,0.56,9.4,5
...,...,...,...,...,...,...,...,...,...,...,...,...
1594,6.2,0.600,0.08,2.0,0.090,32.0,44.0,0.99490,3.45,0.58,10.5,5
1595,5.9,0.550,0.10,2.2,0.062,39.0,51.0,0.99512,3.52,0.76,11.2,6
1596,6.3,0.510,0.13,2.3,0.076,29.0,40.0,0.99574,3.42,0.75,11.0,6
1597,5.9,0.645,0.12,2.0,0.075,32.0,44.0,0.99547,3.57,0.71,10.2,5


In [3]:
#1. Select the first 10 rows of the `chlorides` column. 

winequality_red[['chlorides']].head(10)

Unnamed: 0,chlorides
0,0.076
1,0.098
2,0.092
3,0.075
4,0.076
5,0.075
6,0.069
7,0.065
8,0.073
9,0.071


In [4]:
#2. Select the last 10 rows of the `chlorides` column. 

winequality_red[['chlorides']].tail(10)

Unnamed: 0,chlorides
1589,0.073
1590,0.077
1591,0.089
1592,0.076
1593,0.068
1594,0.09
1595,0.062
1596,0.076
1597,0.075
1598,0.067


In [5]:
#3. Select columns `chlorides` and `density`. 

winequality_red[['chlorides', 'density']]

Unnamed: 0,chlorides,density
0,0.076,0.99780
1,0.098,0.99680
2,0.092,0.99700
3,0.075,0.99800
4,0.076,0.99780
...,...,...
1594,0.090,0.99490
1595,0.062,0.99512
1596,0.076,0.99574
1597,0.075,0.99547


In [6]:
winequality_red.loc[200:250, ['chlorides', 'density']]

Unnamed: 0,chlorides,density
200,0.056,0.99695
201,0.097,0.9975
202,0.075,0.99545
203,0.088,0.9961
204,0.089,0.99615
205,0.095,0.9994
206,0.095,0.9994
207,0.069,0.99625
208,0.1,0.9966
209,0.054,0.998


In [7]:
winequality_red[['chlorides', 'density']].iloc[200:251]

Unnamed: 0,chlorides,density
200,0.056,0.99695
201,0.097,0.9975
202,0.075,0.99545
203,0.088,0.9961
204,0.089,0.99615
205,0.095,0.9994
206,0.095,0.9994
207,0.069,0.99625
208,0.1,0.9966
209,0.054,0.998
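
The two cells above use different end values (`250` with `.loc`, `251` with `.iloc`) because `.loc` slices by label and includes the end label, while `.iloc` slices by position and excludes the end position. The sketch below illustrates this on a hypothetical toy frame whose labels start at 200, so labels and positions deliberately differ.

```python
import pandas as pd

# Hypothetical toy frame; the index labels start at 200 on purpose
toy = pd.DataFrame(
    {"chlorides": [0.056, 0.097, 0.075, 0.088, 0.089]},
    index=[200, 201, 202, 203, 204],
)

by_label = toy.loc[200:202]    # labels 200, 201, 202 (end label included)
by_position = toy.iloc[0:3]    # positions 0, 1, 2 (end position excluded)

# Positions 200..250 do not exist in a 5-row frame, so this slice is empty
out_of_range = toy.iloc[200:251]

print(by_label.equals(by_position))
print(len(out_of_range))
```

In the wine DataFrame the default index runs from 0 upwards, so labels and positions coincide and only the end-point rule differs.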


In [8]:
#4. Select all rows where the `chlorides` value is less than 0.10.

winequality_red[winequality_red['chlorides'] < 0.10]

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.700,0.00,1.9,0.076,11.0,34.0,0.99780,3.51,0.56,9.4,5
1,7.8,0.880,0.00,2.6,0.098,25.0,67.0,0.99680,3.20,0.68,9.8,5
2,7.8,0.760,0.04,2.3,0.092,15.0,54.0,0.99700,3.26,0.65,9.8,5
3,11.2,0.280,0.56,1.9,0.075,17.0,60.0,0.99800,3.16,0.58,9.8,6
4,7.4,0.700,0.00,1.9,0.076,11.0,34.0,0.99780,3.51,0.56,9.4,5
...,...,...,...,...,...,...,...,...,...,...,...,...
1594,6.2,0.600,0.08,2.0,0.090,32.0,44.0,0.99490,3.45,0.58,10.5,5
1595,5.9,0.550,0.10,2.2,0.062,39.0,51.0,0.99512,3.52,0.76,11.2,6
1596,6.3,0.510,0.13,2.3,0.076,29.0,40.0,0.99574,3.42,0.75,11.0,6
1597,5.9,0.645,0.12,2.0,0.075,32.0,44.0,0.99547,3.57,0.71,10.2,5


In [9]:
#5. Now select all the rows where the `chlorides` value is greater than the column's mean
#(try not to use a hard-coded value for the mean, but instead a method.)

winequality_red[winequality_red['chlorides'] > winequality_red['chlorides'].mean()]

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
1,7.8,0.880,0.00,2.6,0.098,25.0,67.0,0.99680,3.20,0.68,9.8,5
2,7.8,0.760,0.04,2.3,0.092,15.0,54.0,0.99700,3.26,0.65,9.8,5
10,6.7,0.580,0.08,1.8,0.097,15.0,65.0,0.99590,3.28,0.54,9.2,5
12,5.6,0.615,0.00,1.6,0.089,16.0,59.0,0.99430,3.58,0.52,9.9,5
13,7.8,0.610,0.29,1.6,0.114,9.0,29.0,0.99740,3.26,1.56,9.1,5
...,...,...,...,...,...,...,...,...,...,...,...,...
1558,6.9,0.630,0.33,6.7,0.235,66.0,115.0,0.99787,3.22,0.56,9.5,5
1570,6.4,0.360,0.53,2.2,0.230,19.0,35.0,0.99340,3.37,0.93,12.4,6
1578,6.8,0.670,0.15,1.8,0.118,13.0,20.0,0.99540,3.42,0.67,11.3,6
1591,5.4,0.740,0.09,1.7,0.089,16.0,26.0,0.99402,3.67,0.56,11.6,6


In [10]:
winequality_red['chlorides'].mean()

0.08746654158849279

In [11]:
winequality_red.chlorides.mean()

0.08746654158849279

In [12]:
winequality_red.query('chlorides > chlorides.mean()')

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
1,7.8,0.880,0.00,2.6,0.098,25.0,67.0,0.99680,3.20,0.68,9.8,5
2,7.8,0.760,0.04,2.3,0.092,15.0,54.0,0.99700,3.26,0.65,9.8,5
10,6.7,0.580,0.08,1.8,0.097,15.0,65.0,0.99590,3.28,0.54,9.2,5
12,5.6,0.615,0.00,1.6,0.089,16.0,59.0,0.99430,3.58,0.52,9.9,5
13,7.8,0.610,0.29,1.6,0.114,9.0,29.0,0.99740,3.26,1.56,9.1,5
...,...,...,...,...,...,...,...,...,...,...,...,...
1558,6.9,0.630,0.33,6.7,0.235,66.0,115.0,0.99787,3.22,0.56,9.5,5
1570,6.4,0.360,0.53,2.2,0.230,19.0,35.0,0.99340,3.37,0.93,12.4,6
1578,6.8,0.670,0.15,1.8,0.118,13.0,20.0,0.99540,3.42,0.67,11.3,6
1591,5.4,0.740,0.09,1.7,0.089,16.0,26.0,0.99402,3.67,0.56,11.6,6
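
The boolean mask and the `query` string above express the same filter. As a small sketch on hypothetical toy values, the block below also shows a third variant, where the mean is computed first and passed into `query` with the `@` prefix for local variables.

```python
import pandas as pd

# Hypothetical toy values, not taken from the wine data
toy = pd.DataFrame({"chlorides": [0.05, 0.10, 0.20, 0.07]})

mean_value = toy["chlorides"].mean()

by_mask = toy[toy["chlorides"] > mean_value]             # boolean mask
by_query = toy.query("chlorides > chlorides.mean()")     # mean computed inside the query
by_variable = toy.query("chlorides > @mean_value")       # '@' refers to a Python variable

print(by_mask.equals(by_query) and by_mask.equals(by_variable))
```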


In [13]:
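#6. Select all those rows where the `pH` is greater than 3.0 and less than 3.5. 
# between() includes both endpoints by default; inclusive='neither' makes both bounds strict

winequality_red[winequality_red['pH'].between(3.0, 3.5, inclusive='neither')]

# don't use between with time!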

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
1,7.8,0.88,0.00,2.6,0.098,25.0,67.0,0.99680,3.20,0.68,9.8,5
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.99700,3.26,0.65,9.8,5
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.99800,3.16,0.58,9.8,6
6,7.9,0.60,0.06,1.6,0.069,15.0,59.0,0.99640,3.30,0.46,9.4,5
7,7.3,0.65,0.00,1.2,0.065,15.0,21.0,0.99460,3.39,0.47,10.0,7
...,...,...,...,...,...,...,...,...,...,...,...,...
1592,6.3,0.51,0.13,2.3,0.076,29.0,40.0,0.99574,3.42,0.75,11.0,6
1593,6.8,0.62,0.08,1.9,0.068,28.0,38.0,0.99651,3.42,0.82,9.5,6
1594,6.2,0.60,0.08,2.0,0.090,32.0,44.0,0.99490,3.45,0.58,10.5,5
1596,6.3,0.51,0.13,2.3,0.076,29.0,40.0,0.99574,3.42,0.75,11.0,6
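
One detail worth checking with `between` is how the endpoints are treated: by default both bounds are included, whereas the task asks for values strictly greater than 3.0 and strictly less than 3.5. The sketch below compares the variants on a hypothetical toy Series; it assumes a pandas version (1.3 or later) in which `inclusive` accepts the string `"neither"`.

```python
import pandas as pd

# Hypothetical pH values that sit exactly on and around the bounds
ph = pd.Series([3.0, 3.2, 3.5, 3.6], name="pH")

closed = ph.between(3.0, 3.5)                        # default: both ends included
strict = ph.between(3.0, 3.5, inclusive="neither")   # both ends excluded
manual = (ph > 3.0) & (ph < 3.5)                     # explicit comparisons

print(closed.tolist())
print(strict.tolist())
print(strict.equals(manual))
```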


In [14]:
#7. Further filter the results from 6 to grab only those rows that have a `residual sugar` less than 2.0.
#Tip: Use backticks (``) to mask column names with spaces in query string.

winequality_red[winequality_red['pH'].between(3.0, 3.5, inclusive='neither') & (winequality_red['residual sugar'] < 2.0)]

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.99800,3.16,0.58,9.8,6
6,7.9,0.60,0.06,1.6,0.069,15.0,59.0,0.99640,3.30,0.46,9.4,5
7,7.3,0.65,0.00,1.2,0.065,15.0,21.0,0.99460,3.39,0.47,10.0,7
10,6.7,0.58,0.08,1.8,0.097,15.0,65.0,0.99590,3.28,0.54,9.2,5
13,7.8,0.61,0.29,1.6,0.114,9.0,29.0,0.99740,3.26,1.56,9.1,5
...,...,...,...,...,...,...,...,...,...,...,...,...
1569,6.2,0.51,0.14,1.9,0.056,15.0,34.0,0.99396,3.48,0.57,11.5,6
1576,8.0,0.30,0.63,1.6,0.081,16.0,29.0,0.99588,3.30,0.78,10.8,6
1578,6.8,0.67,0.15,1.8,0.118,13.0,20.0,0.99540,3.42,0.67,11.3,6
1590,6.3,0.55,0.15,1.8,0.077,26.0,35.0,0.99314,3.32,0.82,11.6,6


In [15]:
winequality_red[winequality_red['pH'].between(3.0, 3.5, inclusive='neither')].query('`residual sugar` < 2.0')

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.99800,3.16,0.58,9.8,6
6,7.9,0.60,0.06,1.6,0.069,15.0,59.0,0.99640,3.30,0.46,9.4,5
7,7.3,0.65,0.00,1.2,0.065,15.0,21.0,0.99460,3.39,0.47,10.0,7
10,6.7,0.58,0.08,1.8,0.097,15.0,65.0,0.99590,3.28,0.54,9.2,5
13,7.8,0.61,0.29,1.6,0.114,9.0,29.0,0.99740,3.26,1.56,9.1,5
...,...,...,...,...,...,...,...,...,...,...,...,...
1569,6.2,0.51,0.14,1.9,0.056,15.0,34.0,0.99396,3.48,0.57,11.5,6
1576,8.0,0.30,0.63,1.6,0.081,16.0,29.0,0.99588,3.30,0.78,10.8,6
1578,6.8,0.67,0.15,1.8,0.118,13.0,20.0,0.99540,3.42,0.67,11.3,6
1590,6.3,0.55,0.15,1.8,0.077,26.0,35.0,0.99314,3.32,0.82,11.6,6
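
Both conditions can also be written in a single `query` string. The following is a sketch on a hypothetical toy frame; the backticks are only needed for `residual sugar`, because the column name contains a space.

```python
import pandas as pd

# Hypothetical toy rows, not taken from the wine data
toy = pd.DataFrame({
    "pH": [3.1, 3.4, 3.6, 3.2],
    "residual sugar": [1.8, 2.5, 1.5, 1.9],
})

# Boolean masks: each condition needs its own parentheses around it
by_mask = toy[(toy["pH"] > 3.0) & (toy["pH"] < 3.5) & (toy["residual sugar"] < 2.0)]

# Same filter as one query string
by_query = toy.query("pH > 3.0 and pH < 3.5 and `residual sugar` < 2.0")

print(by_query)
```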


## Exercise on iris dataset

![IRIS, https://github.com/simonava5/fishers-iris-data](../images/iris.png)

Having worked through a notebook with a lot of new input, it is time to apply it on your own.
For this purpose we will use one of the most widely used real-life datasets: the Iris dataset, which contains measurements of iris flowers. Let's learn a little more about the dataset by taking a closer look at it.

In [16]:
# import pandas
import pandas as pd

In [17]:
# load the data
iris = pd.read_csv('../../Data/iris.csv')
iris

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,Iris-virginica
146,6.3,2.5,5.0,1.9,Iris-virginica
147,6.5,3.0,5.2,2.0,Iris-virginica
148,6.2,3.4,5.4,2.3,Iris-virginica


In [18]:
iris.describe()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
count,150.0,150.0,150.0,150.0
mean,5.843333,3.054,3.758667,1.198667
std,0.828066,0.433594,1.76442,0.763161
min,4.3,2.0,1.0,0.1
25%,5.1,2.8,1.6,0.3
50%,5.8,3.0,4.35,1.3
75%,6.4,3.3,5.1,1.8
max,7.9,4.4,6.9,2.5


1. Let us first have a look at the head of the table, and also at the last 10 rows.

In [19]:
iris.head(10)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
5,5.4,3.9,1.7,0.4,Iris-setosa
6,4.6,3.4,1.4,0.3,Iris-setosa
7,5.0,3.4,1.5,0.2,Iris-setosa
8,4.4,2.9,1.4,0.2,Iris-setosa
9,4.9,3.1,1.5,0.1,Iris-setosa


In [20]:
iris.tail(10)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
140,6.7,3.1,5.6,2.4,Iris-virginica
141,6.9,3.1,5.1,2.3,Iris-virginica
142,5.8,2.7,5.1,1.9,Iris-virginica
143,6.8,3.2,5.9,2.3,Iris-virginica
144,6.7,3.3,5.7,2.5,Iris-virginica
145,6.7,3.0,5.2,2.3,Iris-virginica
146,6.3,2.5,5.0,1.9,Iris-virginica
147,6.5,3.0,5.2,2.0,Iris-virginica
148,6.2,3.4,5.4,2.3,Iris-virginica
149,5.9,3.0,5.1,1.8,Iris-virginica


2. How many irises are in the data set?


In [21]:
# 150 different entries, so also 150 irises
len(iris)

150

In [22]:
iris.nunique()

sepal_length    35
sepal_width     23
petal_length    43
petal_width     22
species          3
dtype: int64

In [23]:
iris.shape

(150, 5)

8. Calculate the basic descriptive statistics for all columns of the iris data set using a single command.

In [24]:
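# iris.describe() (run above) is the single command that returns the descriptive statistics;
# iris.info() complements it with the dtype and non-null count of each column
iris.info()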

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
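
By default `describe()` summarises only the numeric columns, which is why `species` is missing from its output. A possible way to cover all columns in one call is `include="all"`, sketched here on a hypothetical three-row toy frame.

```python
import pandas as pd

# Hypothetical toy frame with one numeric and one text column
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "species": ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

numeric_stats = toy.describe()            # numeric columns only
all_stats = toy.describe(include="all")   # adds unique/top/freq for text columns

print(numeric_stats.columns.tolist())
print(all_stats.loc[["unique", "top", "freq"], "species"])
```

Statistics that do not apply to a column, such as the mean of `species`, appear as `NaN` in the combined table.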


12. Add the sum of the sepal width and length as a new column to your data frame.

In [25]:
iris['sepal_sum'] = iris['sepal_length'] + iris['sepal_width']
iris

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,sepal_sum
0,5.1,3.5,1.4,0.2,Iris-setosa,8.6
1,4.9,3.0,1.4,0.2,Iris-setosa,7.9
2,4.7,3.2,1.3,0.2,Iris-setosa,7.9
3,4.6,3.1,1.5,0.2,Iris-setosa,7.7
4,5.0,3.6,1.4,0.2,Iris-setosa,8.6
...,...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,Iris-virginica,9.7
146,6.3,2.5,5.0,1.9,Iris-virginica,8.8
147,6.5,3.0,5.2,2.0,Iris-virginica,9.5
148,6.2,3.4,5.4,2.3,Iris-virginica,9.6


In [26]:
# DataFrame.insert lets you choose the position of the new column (it modifies the DataFrame in place)

iris.insert(2, 'sepal_total', iris['sepal_length'] + iris['sepal_width'])
iris

Unnamed: 0,sepal_length,sepal_width,sepal_total,petal_length,petal_width,species,sepal_sum
0,5.1,3.5,8.6,1.4,0.2,Iris-setosa,8.6
1,4.9,3.0,7.9,1.4,0.2,Iris-setosa,7.9
2,4.7,3.2,7.9,1.3,0.2,Iris-setosa,7.9
3,4.6,3.1,7.7,1.5,0.2,Iris-setosa,7.7
4,5.0,3.6,8.6,1.4,0.2,Iris-setosa,8.6
...,...,...,...,...,...,...,...
145,6.7,3.0,9.7,5.2,2.3,Iris-virginica,9.7
146,6.3,2.5,8.8,5.0,1.9,Iris-virginica,8.8
147,6.5,3.0,9.5,5.2,2.0,Iris-virginica,9.5
148,6.2,3.4,9.6,5.4,2.3,Iris-virginica,9.6


In [27]:
# reordering the columns

iris["something"] = iris['sepal_length'] + iris['sepal_width']

iris_1 = iris[['sepal_length', 'sepal_width', 'something', 'petal_length', 'petal_width', 'species']]
iris_1

Unnamed: 0,sepal_length,sepal_width,something,petal_length,petal_width,species
0,5.1,3.5,8.6,1.4,0.2,Iris-setosa
1,4.9,3.0,7.9,1.4,0.2,Iris-setosa
2,4.7,3.2,7.9,1.3,0.2,Iris-setosa
3,4.6,3.1,7.7,1.5,0.2,Iris-setosa
4,5.0,3.6,8.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...,...
145,6.7,3.0,9.7,5.2,2.3,Iris-virginica
146,6.3,2.5,8.8,5.0,1.9,Iris-virginica
147,6.5,3.0,9.5,5.2,2.0,Iris-virginica
148,6.2,3.4,9.6,5.4,2.3,Iris-virginica
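
The three approaches above differ in whether they change the existing DataFrame. As a sketch on a hypothetical two-row toy frame: `assign` returns a new frame and leaves the original untouched, while `insert` modifies the frame in place and raises a `ValueError` if the column already exists, which matters when a notebook cell is run twice.

```python
import pandas as pd

# Hypothetical toy frame, not the full iris data
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9],
    "sepal_width": [3.5, 3.0],
    "species": ["Iris-setosa", "Iris-setosa"],
})

# assign() returns a copy with the extra column appended at the end
with_sum = toy.assign(sepal_sum=toy["sepal_length"] + toy["sepal_width"])
untouched_columns = toy.columns.tolist()

# Selecting with a list of names reorders the columns
ordered = with_sum[["sepal_length", "sepal_width", "sepal_sum", "species"]]

# insert() changes toy itself and places the column at position 2
toy.insert(2, "sepal_sum", toy["sepal_length"] + toy["sepal_width"])

# Inserting the same column name again fails
try:
    toy.insert(2, "sepal_sum", 0)
    second_insert_failed = False
except ValueError:
    second_insert_failed = True

print(ordered.columns.tolist())
print(second_insert_failed)
```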


18. Create a new column with a rough estimate of petal area by multiplying petal length and width together.

In [28]:
iris['petal_area'] = iris['petal_length'] * iris['petal_width']
iris

Unnamed: 0,sepal_length,sepal_width,sepal_total,petal_length,petal_width,species,sepal_sum,something,petal_area
0,5.1,3.5,8.6,1.4,0.2,Iris-setosa,8.6,8.6,0.28
1,4.9,3.0,7.9,1.4,0.2,Iris-setosa,7.9,7.9,0.28
2,4.7,3.2,7.9,1.3,0.2,Iris-setosa,7.9,7.9,0.26
3,4.6,3.1,7.7,1.5,0.2,Iris-setosa,7.7,7.7,0.30
4,5.0,3.6,8.6,1.4,0.2,Iris-setosa,8.6,8.6,0.28
...,...,...,...,...,...,...,...,...,...
145,6.7,3.0,9.7,5.2,2.3,Iris-virginica,9.7,9.7,11.96
146,6.3,2.5,8.8,5.0,1.9,Iris-virginica,8.8,8.8,9.50
147,6.5,3.0,9.5,5.2,2.0,Iris-virginica,9.5,9.5,10.40
148,6.2,3.4,9.6,5.4,2.3,Iris-virginica,9.6,9.6,12.42


19. Create a new dataframe with petal areas greater than $1\,\text{cm}^2$.

In [29]:
iris_great_petal = iris[iris['petal_area'] > 1]
iris_great_petal

Unnamed: 0,sepal_length,sepal_width,sepal_total,petal_length,petal_width,species,sepal_sum,something,petal_area
50,7.0,3.2,10.2,4.7,1.4,Iris-versicolor,10.2,10.2,6.58
51,6.4,3.2,9.6,4.5,1.5,Iris-versicolor,9.6,9.6,6.75
52,6.9,3.1,10.0,4.9,1.5,Iris-versicolor,10.0,10.0,7.35
53,5.5,2.3,7.8,4.0,1.3,Iris-versicolor,7.8,7.8,5.20
54,6.5,2.8,9.3,4.6,1.5,Iris-versicolor,9.3,9.3,6.90
...,...,...,...,...,...,...,...,...,...
145,6.7,3.0,9.7,5.2,2.3,Iris-virginica,9.7,9.7,11.96
146,6.3,2.5,8.8,5.0,1.9,Iris-virginica,8.8,8.8,9.50
147,6.5,3.0,9.5,5.2,2.0,Iris-virginica,9.5,9.5,10.40
148,6.2,3.4,9.6,5.4,2.3,Iris-virginica,9.6,9.6,12.42


In [30]:
iris

Unnamed: 0,sepal_length,sepal_width,sepal_total,petal_length,petal_width,species,sepal_sum,something,petal_area
0,5.1,3.5,8.6,1.4,0.2,Iris-setosa,8.6,8.6,0.28
1,4.9,3.0,7.9,1.4,0.2,Iris-setosa,7.9,7.9,0.28
2,4.7,3.2,7.9,1.3,0.2,Iris-setosa,7.9,7.9,0.26
3,4.6,3.1,7.7,1.5,0.2,Iris-setosa,7.7,7.7,0.30
4,5.0,3.6,8.6,1.4,0.2,Iris-setosa,8.6,8.6,0.28
...,...,...,...,...,...,...,...,...,...
145,6.7,3.0,9.7,5.2,2.3,Iris-virginica,9.7,9.7,11.96
146,6.3,2.5,8.8,5.0,1.9,Iris-virginica,8.8,8.8,9.50
147,6.5,3.0,9.5,5.2,2.0,Iris-virginica,9.5,9.5,10.40
148,6.2,3.4,9.6,5.4,2.3,Iris-virginica,9.6,9.6,12.42


In [34]:
id(iris['sepal_length']) # id() returns the object's identity (its memory address in CPython), so comparing ids shows whether two names refer to the same object

4543456688
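
Comparing identities is mostly useful for telling an alias apart from an independent copy. The sketch below uses a hypothetical toy frame; `alias` and `independent` are illustrative names only.

```python
import pandas as pd

# Hypothetical toy frame
toy = pd.DataFrame({"petal_area": [0.28, 6.58, 12.42]})

alias = toy                # second name for the same object
independent = toy.copy()   # new object with its own data

same_object = alias is toy            # equivalent to id(alias) == id(toy)
copy_is_same = independent is toy

# Changing the copy leaves the original untouched
independent.loc[0, "petal_area"] = 99.0
original_value = toy.loc[0, "petal_area"]

print(same_object, copy_is_same, original_value)
```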

20. Using the original unfiltered dataframe, create 3 new dataframes, each containing only irises of a single species: 'Iris-setosa', 'Iris-versicolor' or 'Iris-virginica'.

In [31]:
iris_setosa = iris[iris['species'] == 'Iris-setosa']
iris_setosa

Unnamed: 0,sepal_length,sepal_width,sepal_total,petal_length,petal_width,species,sepal_sum,something,petal_area
0,5.1,3.5,8.6,1.4,0.2,Iris-setosa,8.6,8.6,0.28
1,4.9,3.0,7.9,1.4,0.2,Iris-setosa,7.9,7.9,0.28
2,4.7,3.2,7.9,1.3,0.2,Iris-setosa,7.9,7.9,0.26
3,4.6,3.1,7.7,1.5,0.2,Iris-setosa,7.7,7.7,0.3
4,5.0,3.6,8.6,1.4,0.2,Iris-setosa,8.6,8.6,0.28
5,5.4,3.9,9.3,1.7,0.4,Iris-setosa,9.3,9.3,0.68
6,4.6,3.4,8.0,1.4,0.3,Iris-setosa,8.0,8.0,0.42
7,5.0,3.4,8.4,1.5,0.2,Iris-setosa,8.4,8.4,0.3
8,4.4,2.9,7.3,1.4,0.2,Iris-setosa,7.3,7.3,0.28
9,4.9,3.1,8.0,1.5,0.1,Iris-setosa,8.0,8.0,0.15


In [32]:
iris_versicolor = iris[iris['species'] == 'Iris-versicolor']
iris_versicolor

Unnamed: 0,sepal_length,sepal_width,sepal_total,petal_length,petal_width,species,sepal_sum,something,petal_area
50,7.0,3.2,10.2,4.7,1.4,Iris-versicolor,10.2,10.2,6.58
51,6.4,3.2,9.6,4.5,1.5,Iris-versicolor,9.6,9.6,6.75
52,6.9,3.1,10.0,4.9,1.5,Iris-versicolor,10.0,10.0,7.35
53,5.5,2.3,7.8,4.0,1.3,Iris-versicolor,7.8,7.8,5.2
54,6.5,2.8,9.3,4.6,1.5,Iris-versicolor,9.3,9.3,6.9
55,5.7,2.8,8.5,4.5,1.3,Iris-versicolor,8.5,8.5,5.85
56,6.3,3.3,9.6,4.7,1.6,Iris-versicolor,9.6,9.6,7.52
57,4.9,2.4,7.3,3.3,1.0,Iris-versicolor,7.3,7.3,3.3
58,6.6,2.9,9.5,4.6,1.3,Iris-versicolor,9.5,9.5,5.98
59,5.2,2.7,7.9,3.9,1.4,Iris-versicolor,7.9,7.9,5.46


In [33]:
iris_virginica = iris[iris['species'] == 'Iris-virginica']
iris_virginica

Unnamed: 0,sepal_length,sepal_width,sepal_total,petal_length,petal_width,species,sepal_sum,something,petal_area
100,6.3,3.3,9.6,6.0,2.5,Iris-virginica,9.6,9.6,15.0
101,5.8,2.7,8.5,5.1,1.9,Iris-virginica,8.5,8.5,9.69
102,7.1,3.0,10.1,5.9,2.1,Iris-virginica,10.1,10.1,12.39
103,6.3,2.9,9.2,5.6,1.8,Iris-virginica,9.2,9.2,10.08
104,6.5,3.0,9.5,5.8,2.2,Iris-virginica,9.5,9.5,12.76
105,7.6,3.0,10.6,6.6,2.1,Iris-virginica,10.6,10.6,13.86
106,4.9,2.5,7.4,4.5,1.7,Iris-virginica,7.4,7.4,7.65
107,7.3,2.9,10.2,6.3,1.8,Iris-virginica,10.2,10.2,11.34
108,6.7,2.5,9.2,5.8,1.8,Iris-virginica,9.2,9.2,10.44
109,7.2,3.6,10.8,6.1,2.5,Iris-virginica,10.8,10.8,15.25
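
Writing one filter per species works well for three species. An alternative sketch, shown on a hypothetical four-row toy frame, lets `groupby` do the splitting and collects the pieces in a dictionary keyed by species name (`by_species` is an illustrative name).

```python
import pandas as pd

# Hypothetical toy frame, not the full iris data
toy = pd.DataFrame({
    "species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"],
    "petal_width": [0.2, 1.4, 2.5, 0.3],
})

# Iterating over a groupby yields (group name, sub-DataFrame) pairs
by_species = {name: group for name, group in toy.groupby("species")}

setosa_from_groupby = by_species["Iris-setosa"]
setosa_from_mask = toy[toy["species"] == "Iris-setosa"]

print(sorted(by_species))
print(setosa_from_groupby.equals(setosa_from_mask))
```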
