# Let's create some pandas Series and DataFrames!
In this notebook we will see how to create pandas Series and DataFrames and how to access their elements.

## Exercise 1 - Create and access a Pandas Series
Create a pandas Series with:
 - the following values: [1, 3, 2, 7, 5, 4, 9]
 - indexed with the following fruit names: ["apple", "orange", "cherry", "pear", "pineapple", "mango", "banana"]

Then print:
 - the first three elements of the series
 - all the values associated with "pear" or "mango"
 - all the index labels associated with a value greater than 5

In [12]:
import pandas as pd
values = [1, 3, 2, 7, 5, 4, 9] 
fruits = ["apple", "orange", "cherry", "pear", "pineapple", "mango", "banana"]
series = pd.Series(values, index=fruits)
print(series)

# first three elements of the series
print("\nFirst three elements of the series:")
print(series.iloc[:3]) # slicing
# print(series.loc[:"cherry"]) # equivalent here: label slices include the stop label

# values associated to index "pear" or "mango"
print("\nValues associated to index 'pear' or 'mango':")
print(series.loc[["pear", "mango"]].values) # fancy indexing on the index (.loc)

# indexes associated to a value greater than 5
print("\nIndexes associated to a value greater than 5:")
print(series.loc[series > 5].index) # masking on the entire series


apple        1
orange       3
cherry       2
pear         7
pineapple    5
mango        4
banana       9
dtype: int64

First three elements of the series:
apple     1
orange    3
cherry    2
dtype: int64

Values associated to index 'pear' or 'mango':
[7 4]

Indexes associated to a value greater than 5:
Index(['pear', 'banana'], dtype='object')
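
The two ways of taking the first three elements shown in the cell above differ in how they treat the end of the slice. A minimal sketch of that difference, rebuilding the same series so that it runs on its own (the names `by_position` and `by_label` are introduced here only for illustration):

```python
import pandas as pd

series = pd.Series(
    [1, 3, 2, 7, 5, 4, 9],
    index=["apple", "orange", "cherry", "pear", "pineapple", "mango", "banana"],
)

# Positional slice: the stop position (3) is excluded, as in plain Python.
by_position = series.iloc[:3]

# Label slice: the stop label ("cherry") is included.
by_label = series.loc[:"cherry"]

# Both select apple, orange and cherry.
print(by_position.equals(by_label))
```

Label slices include the stop label because, in general, there is no obvious "next label" that could be passed as an exclusive bound.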


In [8]:
type(series['pear'])

numpy.int64
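
Selecting a single label returns one element rather than a Series, which is why the type above is a NumPy scalar. The sketch below contrasts that with passing a list of labels. It uses a two-element series for brevity, and the names `scalar` and `subset` are illustrative only:

```python
import numpy as np
import pandas as pd

series = pd.Series([7, 4], index=["pear", "mango"])

scalar = series["pear"]      # single label: one element, a NumPy integer scalar
subset = series[["pear"]]    # list of labels: a Series, even with a single label

print(type(scalar).__name__)
print(type(subset).__name__)

# .item() converts the NumPy scalar to a built-in Python int when needed.
print(scalar.item())
```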

## Exercise 2 - Create and access a Pandas Dataframe
Given the following input table with 13 samples and 4 attributes:

```
[[5.1, 3.5, 1, 0.2],
[4.3, 3. , 1, 0.1],
[5. , 0. , 1, 0.4],
[5.1, 3.4, 2, 0.2],
[7.0, 3.2, 1, 0.2],
[6.9, 3.1, 3, 1.5],
[6.7, 3.1, 1, 2. ],
[6. , 2.9, 2, 1.5],
[6.1, 3. , 2, 1.4],
[6.5, 3. , 3, 2.2],
[7.7, 3.8, 3, 2.2],
[7.4, 2.8, 1, 1.9],
[6.8, 3.2, 1, 2.3]]
```

and the following column names:
['height', 'width', 'intensity', 'weight']

Compute the following:
- Create a pandas dataframe with the given data and column names
- Add a new composite feature, 'area' = 'width' * 'height'
- Retrieve the elements associated with an odd index
- Retrieve the elements with area > 20
- Retrieve the 'height' and 'weight' of the elements with area > 20
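
"Odd index" can mean either an odd index label or an odd position. With the default integer index the two coincide, but they diverge as soon as the index is not 0, 1, 2, and so on. A minimal sketch on a hypothetical toy table (not the exercise data; the names `toy`, `odd_labels` and `odd_positions` are illustrative):

```python
import pandas as pd

# Toy data with a non-default integer index.
toy = pd.DataFrame({"width": [1.0, 2.0, 3.0, 4.0]}, index=[10, 11, 13, 14])

# Boolean mask on the labels: keeps rows whose index label is odd.
odd_labels = toy.loc[toy.index % 2 == 1]

# Positional slice: keeps rows at positions 1, 3, ...
odd_positions = toy.iloc[1::2]

print(list(odd_labels.index))     # [11, 13]
print(list(odd_positions.index))  # [11, 14]
```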





In [14]:
import numpy as np

# Input table (13 samples x 4 attributes)
X = np.array([[5.1, 3.5, 1, 0.2],
             [4.3, 3. , 1, 0.1],
             [5. , 0. , 1, 0.4],
             [5.1, 3.4, 2, 0.2],
             [7.0, 3.2, 1, 0.2],
             [6.9, 3.1, 3, 1.5],
             [6.7, 3.1, 1, 2. ],
             [6. , 2.9, 2, 1.5],
             [6.1, 3. , 2, 1.4],
             [6.5, 3. , 3, 2.2],
             [7.7, 3.8, 3, 2.2],
             [7.4, 2.8, 1, 1.9],
             [6.8, 3.2, 1, 2.3]]
            )
# Column names
columns = ['height','width','intensity','weight']

# Create a pandas dataframe with the given data
df = pd.DataFrame(X, columns=columns)
print(df)

# Add a new composite feature, 'area' = 'width' * 'height'
df['area'] = df['width'] * df['height']

print("Dataframe:")
print(df)

# Retrieve the elements associated with an odd index
print("\nElements associated to an odd index:")
print(df.loc[df.index % 2 == 1]) # masking on the index labels
# print(df.iloc[1::2]) # positional alternative; equivalent here because the index is the default 0..n-1

# Retrieve the samples with area > 20
print("\nElements samples with area > 20:")
mask = df['area'] > 20
print(df.loc[mask])
# print(df[df['area'] > 20]) # also works for reading; prefer .loc when you also select columns or assign

# Retrieve the "height" and "weight" of the elements with area > 20
print("\nHeight and weight of samples with area > 20:")
print(df.loc[mask, ["height", "weight"]])


    height  width  intensity  weight
0      5.1    3.5        1.0     0.2
1      4.3    3.0        1.0     0.1
2      5.0    0.0        1.0     0.4
3      5.1    3.4        2.0     0.2
4      7.0    3.2        1.0     0.2
5      6.9    3.1        3.0     1.5
6      6.7    3.1        1.0     2.0
7      6.0    2.9        2.0     1.5
8      6.1    3.0        2.0     1.4
9      6.5    3.0        3.0     2.2
10     7.7    3.8        3.0     2.2
11     7.4    2.8        1.0     1.9
12     6.8    3.2        1.0     2.3
Dataframe:
    height  width  intensity  weight   area
0      5.1    3.5        1.0     0.2  17.85
1      4.3    3.0        1.0     0.1  12.90
2      5.0    0.0        1.0     0.4   0.00
3      5.1    3.4        2.0     0.2  17.34
4      7.0    3.2        1.0     0.2  22.40
5      6.9    3.1        3.0     1.5  21.39
6      6.7    3.1        1.0     2.0  20.77
7      6.0    2.9        2.0     1.5  17.40
8      6.1    3.0        2.0     1.4  18.30
9      6.5    3.0        3.0   