# Session 11: Introduction to Pandas and data manipulation

`pandas` is a Python library built on top of `NumPy`. It lets us analyze and manipulate tabular data.

We can use `pandas` by importing it:
```Python
import pandas
```
Or, more commonly, by importing it under its well-known alias `pd`:
```Python
import pandas as pd
```

In [1]:
# importing pandas as its shorter alias: pd
import pandas as pd

Now that we have loaded `pandas` we can start using it. First, let's look at some of its main classes and functionality.

## Pandas series

`pd.Series` is a pandas object that holds one-dimensional data in an array-like structure.

Each element in a `pd.Series` has a label assigned to it; together, these labels form the `index`.

A `pd.Series` can be created from lists, dictionaries, `NumPy` arrays, and similar objects, and can contain integers, floats, strings, booleans, datetimes, and more.

Let's first create a series without specifying any labels, and then a series whose elements are labelled with the `index` `["elem1", "elem2", "elem3", "a"]`.

In [2]:
# without specifying the index, pandas assigns the default labels 0, 1, 2, ...

s = pd.Series(
    [1, 2, 3, "a"]
)

s[0]

1

In [4]:
type(s.index)

pandas.core.indexes.range.RangeIndex

In [3]:
s.values

array([1, 2, 3, 'a'], dtype=object)

In [5]:
pd.Series({"a": 1, "b": 2})

a    1
b    2
dtype: int64

In [3]:
# specifying the index explicitly

s = pd.Series([1, 2, 3, 4], index=["elem1", "elem2", "elem3", "a"])

s

elem1    1
elem2    2
elem3    3
a        4
dtype: int64

In [7]:
s.index

Index(['elem1', 'elem2', 'elem3', 'a'], dtype='object')

### Series elements

A `pd.Series` has two main attributes: `values` and `index`.

We can extract the values contained in a series, or the index that labels those values.

In [8]:
print(s.values)
print(type(s.values))

[1 2 3 4]
<class 'numpy.ndarray'>


In [9]:
s.dtype

dtype('int64')

In [10]:
list(s.values)

[1, 2, 3, 4]

In [11]:
print(s.index)
print(type(s.index))

Index(['elem1', 'elem2', 'elem3', 'a'], dtype='object')
<class 'pandas.core.indexes.base.Index'>


As we can see, the `values` are a `NumPy` array, whereas the `index` is a `pandas.Index` object.

When `index` is not specified, `pandas` uses the integers 0, 1, ..., N-1 as labels, where N is the number of elements.

In [12]:
# `s` has string labels at this point, so we use `.iloc` to access by position
s.iloc[0]

1
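To make the difference between labels and positions concrete, here is a minimal sketch. The variable names (`s_default`, `s_labeled`, and so on) are only illustrative and do not appear elsewhere in this session.

```Python
import pandas as pd

# default index: the labels are the positions 0 .. N-1
s_default = pd.Series([10, 20, 30])

# custom string labels
s_labeled = pd.Series([10, 20, 30], index=["elem1", "elem2", "elem3"])

first_by_label = s_labeled.loc["elem1"]   # access by label
first_by_position = s_labeled.iloc[0]     # access by position
default_labels = list(s_default.index)    # labels of the default index
```

Using `.loc` for labels and `.iloc` for positions avoids any ambiguity about what an integer inside `[]` means.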

### Creating `pd.Series`

We can pass any of the following objects as the argument of `pd.Series`:

* A `list` or a `tuple`
* An `np.array`
* A `dict`
* A `range()`

In [13]:
pd.Series([
    {"a":1}, {"a":1}, {"a":1}
])

0    {'a': 1}
1    {'a': 1}
2    {'a': 1}
dtype: object

In [4]:
import numpy as np

# from list
s_list = pd.Series(["1", "2", "3"])

# from tuple
s_tuple = pd.Series(("1", "2", "3"))

# from np.array
s_array = pd.Series(np.array((1, 2, 3)))

# from dict
s_dict = pd.Series({"x": 1, "y": 2, "z": 3, })

# from range
s_range = pd.Series(range(5))

print(f"Series from list: {s_list}")
print(f"Series from tuple: {s_tuple}")
print(f"Series from array: {s_array}")
print(f"Series from dict: {s_dict}")
print(f"Series from range: {s_range}")

Series from list: 0    1
1    2
2    3
dtype: object
Series from tuple: 0    1
1    2
2    3
dtype: object
Series from array: 0    1
1    2
2    3
dtype: int32
Series from dict: x    1
y    2
z    3
dtype: int64
Series from range: 0    0
1    1
2    2
3    3
4    4
dtype: int64


### `pd.Series` basic properties and methods

In [15]:
s = pd.Series([1, 2, 3, 4])

In [16]:
s

0    1
1    2
2    3
3    4
dtype: int64

In [17]:
# length: `len(s)`

len(s)

4

In [18]:
# shape: `s.shape`
s.shape

(4,)

In [19]:
len(s)==s.shape[0]

True

In [20]:
# type of elements

s = pd.Series([1, 2, "a"])

s.dtype

dtype('O')

In [21]:
a = pd.Series(["a", 2.0, 1])

a

0      a
1    2.0
2      1
dtype: object

In [22]:
type(a[1])

float

In [5]:
my_dict = {
    "a": 1,
    "b": 2
}

type(my_dict.get("c"))

NoneType

In [6]:
my_dict.get('a')

1

In [25]:
my_dict = {"b": 1, "a": 2}

my_dict.get("g")

In [26]:
# selecting a certain element according to the index: `get()`
ser = pd.Series({"a": 1, "b": 2, "c": 3})

ser.get("b")

2
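Like `dict.get()`, `Series.get()` returns a default value instead of raising an error when the label is missing. The sketch below assumes the same small series as above; the names `found`, `missing`, `fallback` and `raised` are only illustrative.

```Python
import pandas as pd

ser = pd.Series({"a": 1, "b": 2, "c": 3})

found = ser.get("b")                 # label exists: returns its value
missing = ser.get("z")               # label missing: returns None
fallback = ser.get("z", default=0)   # label missing: returns the given default

# square brackets raise a KeyError for a missing label
try:
    ser["z"]
    raised = False
except KeyError:
    raised = True
```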

## Pandas DataFrames

If we put several series together as columns, we get a `pd.DataFrame`. These objects are table-like structures in which each row is identified by its `index` label and each column by its column name.

A `pd.DataFrame` can be thought of as a `NumPy`-style matrix of `n` rows and `m` columns with labels for each row and column, although each column keeps its own data type.
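As a small illustration of how a DataFrame relates to `NumPy` arrays, the sketch below compares the per-column data types with the single array returned by `to_numpy()`. The name `df_mixed` and its contents are made up for this example.

```Python
import pandas as pd

df_mixed = pd.DataFrame({"num": [1, 2, 3], "txt": ["a", "b", "c"]})

# each column has its own dtype
column_dtypes = df_mixed.dtypes

# converting the whole table to one array forces a common dtype,
# which is `object` when numbers and strings are mixed
as_array = df_mixed.to_numpy()
array_shape = as_array.shape
```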

### How to create a `pd.DataFrame`

We can pass the following objects as the argument of `pd.DataFrame`:

* A dict of `pd.Series`: the keys become the column names
* A list of dicts: the keys become the column names
* A dict of lists: the keys become the column names
...

In [7]:
# from dict of pd.Series

series1 = pd.Series([1, 2, 3, 4])
series2 = pd.Series([2, 3, 4, 5])

df = pd.DataFrame(
    {
        "var1": series1,
        "var2": series2
    }
)

df

|   | var1 | var2 |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 2 | 3 |
| 2 | 3 | 4 |
| 3 | 4 | 5 |


In [28]:
df = pd.DataFrame(
    {
        "dani": [1, 2, 3],
        "pedro": ["a", "b", "c"]
    }
)

df

|   | dani | pedro |
|---|---|---|
| 0 | 1 | a |
| 1 | 2 | b |
| 2 | 3 | c |


In [29]:
df.values

array([[1, 'a'],
       [2, 'b'],
       [3, 'c']], dtype=object)

In [30]:
list(df.index)

[0, 1, 2]

In [31]:
df.columns

Index(['dani', 'pedro'], dtype='object')

In [32]:
# from list of dicts

list_of_dicts = [
    {"Name": "Daniel", "Age": None, "Furry": False, "Height": 178},
    {"Name": "Churro", "Age": None, "Furry": True, "Height": 60},
    {"Age": None, "Furry": False, "Height": 40},
]

pd.DataFrame(list_of_dicts)

|   | Name | Age | Furry | Height |
|---|---|---|---|---|
| 0 | Daniel | None | False | 178 |
| 1 | Churro | None | True | 60 |
| 2 | NaN | None | False | 40 |


In [33]:
# from dict of lists
dict_lists = {
    "var1": ["Good", "Average", "Bad"],
    "var2": [32, 6, 1],
    "var3": [False, True, None],
    "var4": [178, 60, 40]
}

pd.DataFrame(dict_lists)

|   | var1 | var2 | var3 | var4 |
|---|---|---|---|---|
| 0 | Good | 32 | False | 178 |
| 1 | Average | 6 | True | 60 |
| 2 | Bad | 1 | None | 40 |


### `pd.DataFrame` basic properties and methods

In [34]:
df = pd.DataFrame({
    "col_float": [1.0, 2.3, 5.66],
    "col_int": [1, 2, 3],
    "col_string": ["abc", "abc", "ghi"],
    "col_boolean": [True, True, False]
})

df

|   | col_float | col_int | col_string | col_boolean |
|---|---|---|---|---|
| 0 | 1.0 | 1 | abc | True |
| 1 | 2.3 | 2 | abc | True |
| 2 | 5.66 | 3 | ghi | False |


#### Values:

`df.values` is the actual information contained and labelled in our dataframe.

It returns an `np.array`: a two-dimensional array in which each inner array holds one row of the table.

In [47]:
# values: np.array
# a 2-D np.array is displayed as a list of lists in which each sublist holds one row of the matrix

df.values

array([[1.0, 1, 'abc', True],
       [2.3, 2, 'abc', True],
       [5.66, 3, 'ghi', False]], dtype=object)

#### Index:

We can see the index of our DataFrame through the `index` attribute. It returns an `Index` object (here a `RangeIndex`) that we can convert to a list with `list()`, for example.

We can also replace this index with any column we want using `df.set_index()`.

By default, the column used as the new index is removed from the values. If we pass the argument `drop=False`, the column is kept, so we have it both as the index AND in the values.
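The effect of `drop` can be checked with a tiny DataFrame. This is only a sketch; `df_demo`, `dropped` and `kept` are illustrative names.

```Python
import pandas as pd

df_demo = pd.DataFrame({"key": ["x", "y"], "value": [1, 2]})

# default (drop=True): "key" becomes the index and leaves the columns
dropped = df_demo.set_index("key")

# drop=False: "key" becomes the index and also stays as a column
kept = df_demo.set_index("key", drop=False)

dropped_columns = list(dropped.columns)
kept_columns = list(kept.columns)
```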

In [48]:
print(df.index)
print(list(df.index))

RangeIndex(start=0, stop=3, step=1)
[0, 1, 2]


In [49]:
df

|   | col_float | col_int | col_string | col_boolean |
|---|---|---|---|---|
| 0 | 1.0 | 1 | abc | True |
| 1 | 2.3 | 2 | abc | True |
| 2 | 5.66 | 3 | ghi | False |


In [50]:
# we can use an existing column as new index
df_new_index = df.set_index("col_string")

# print(df_new_index.index)

df_new_index.loc["abc", :]

| col_string | col_float | col_int | col_boolean |
|---|---|---|---|
| abc | 1.0 | 1 | True |
| abc | 2.3 | 2 | True |


We can reset our index to the original (0, 1, ...) with `df.reset_index()`

In [51]:
df_new_index = df_new_index.reset_index()

In [52]:
df_new_index

|   | col_string | col_float | col_int | col_boolean |
|---|---|---|---|---|
| 0 | abc | 1.0 | 1 | True |
| 1 | abc | 2.3 | 2 | True |
| 2 | ghi | 5.66 | 3 | False |


In [53]:
df_new_index.reset_index(inplace=True)

df_new_index

|   | index | col_string | col_float | col_int | col_boolean |
|---|---|---|---|---|---|
| 0 | 0 | abc | 1.0 | 1 | True |
| 1 | 1 | abc | 2.3 | 2 | True |
| 2 | 2 | ghi | 5.66 | 3 | False |


#### Columns and their names:

`df.columns` returns the labels attached to the columns, that is, their names.

We can change them by assigning a new list to `df.columns`, or by using `df.rename()` with the argument `columns` set to a dict whose keys are the current column names and whose values are the new names.

In [54]:
# get the columns names
df.columns

Index(['col_float', 'col_int', 'col_string', 'col_boolean'], dtype='object')

In [55]:
df

|   | col_float | col_int | col_string | col_boolean |
|---|---|---|---|---|
| 0 | 1.0 | 1 | abc | True |
| 1 | 2.3 | 2 | abc | True |
| 2 | 5.66 | 3 | ghi | False |


In [56]:
df.columns = [colname.replace("col", "type") for colname in df.columns]

df

|   | type_float | type_int | type_string | type_boolean |
|---|---|---|---|---|
| 0 | 1.0 | 1 | abc | True |
| 1 | 2.3 | 2 | abc | True |
| 2 | 5.66 | 3 | ghi | False |


In [57]:
# updating df.columns by mutating the df.columns info directly
df.columns = ["type_" + colname.split("_")[1] for colname in df.columns]

df

|   | type_float | type_int | type_string | type_boolean |
|---|---|---|---|---|
| 0 | 1.0 | 1 | abc | True |
| 1 | 2.3 | 2 | abc | True |
| 2 | 5.66 | 3 | ghi | False |


In [58]:
df = df.rename(columns={
    "type_float": "asdfqasdf"
})

df

|   | asdfqasdf | type_int | type_string | type_boolean |
|---|---|---|---|---|
| 0 | 1.0 | 1 | abc | True |
| 1 | 2.3 | 2 | abc | True |
| 2 | 5.66 | 3 | ghi | False |


In [59]:
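# "type_float" was already renamed to "asdfqasdf" above, so that entry
# matches no column and is silently ignored
df.rename(
    columns={
        "type_float": "col01",
        "type_int": "col02",
        "type_string": "col03",
        "type_boolean": "col04",
    },
    inplace=True
)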

df

|   | asdfqasdf | col02 | col03 | col04 |
|---|---|---|---|---|
| 0 | 1.0 | 1 | abc | True |
| 1 | 2.3 | 2 | abc | True |
| 2 | 5.66 | 3 | ghi | False |


In [60]:
df

|   | asdfqasdf | col02 | col03 | col04 |
|---|---|---|---|---|
| 0 | 1.0 | 1 | abc | True |
| 1 | 2.3 | 2 | abc | True |
| 2 | 5.66 | 3 | ghi | False |


In [61]:
# using df.rename(columns={old_col: new_col})
# rename returns a new DataFrame by default: to keep the change we either
# pass `inplace=True` or assign the result back to a variable (df = df.rename(...))
# note: neither "type_float" nor "type_int" exists any more, so nothing changes here

df.rename(columns={
    "type_float": "type_num_float",
    "type_int": "type_num_int"
}, inplace=True)

In [62]:
df

|   | asdfqasdf | col02 | col03 | col04 |
|---|---|---|---|---|
| 0 | 1.0 | 1 | abc | True |
| 1 | 2.3 | 2 | abc | True |
| 2 | 5.66 | 3 | ghi | False |
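By default, `rename` quietly skips any key that does not match an existing column, which makes typos easy to miss. The sketch below, built on a made-up `df_demo`, shows that behavior and how the `errors="raise"` argument turns a missing label into an error.

```Python
import pandas as pd

df_demo = pd.DataFrame({"type_int": [1, 2], "type_string": ["a", "b"]})

# "type_float" is not a column, so that mapping entry is ignored
renamed = df_demo.rename(columns={"type_float": "col01", "type_int": "col02"})
renamed_columns = list(renamed.columns)

# errors="raise" reports labels that are not found
try:
    df_demo.rename(columns={"type_float": "col01"}, errors="raise")
    raised = False
except KeyError:
    raised = True
```

Since `inplace` was not used, `df_demo` itself keeps its original column names.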


#### Describe 

`df.describe()` returns summary statistics for the numeric columns, which is very useful for Exploratory Data Analysis.

In [63]:
df.describe()

|   | asdfqasdf | col02 |
|---|---|---|
| count | 3.0 | 3.0 |
| mean | 2.986667 | 2.0 |
| std | 2.40469 | 1.0 |
| min | 1.0 | 1.0 |
| 25% | 1.65 | 1.5 |
| 50% | 2.3 | 2.0 |
| 75% | 3.98 | 2.5 |
| max | 5.66 | 3.0 |


#### Transpose

Since DataFrames are two-dimensional, like matrices, we can transpose them with `df.T`.

In [64]:
df

|   | asdfqasdf | col02 | col03 | col04 |
|---|---|---|---|---|
| 0 | 1.0 | 1 | abc | True |
| 1 | 2.3 | 2 | abc | True |
| 2 | 5.66 | 3 | ghi | False |


In [65]:
df.T

|   | 0 | 1 | 2 |
|---|---|---|---|
| asdfqasdf | 1.0 | 2.3 | 5.66 |
| col02 | 1 | 2 | 3 |
| col03 | abc | abc | ghi |
| col04 | True | True | False |


## Indexing and slicing

We can create subsets of our `pandas` objects in different ways:

* `df.loc[label_row_start:label_row_end, label_col_start:label_col_end]` to slice by label
* `df.iloc[pos_row_start:pos_row_end, pos_col_start:pos_col_end]` to slice by position
* Good old []

In [66]:
a = [1, 2, 3, 4]

a[1:3]

[2, 3]

In [5]:
df = pd.DataFrame({
    "col_float": [1.0, 2.3, 5.66, 9.99],
    "col_int": [1, 2, 3, 4],
    "col_string": ["abc", "def", "ghi", "jkl"],
    "col_boolean": [True, True, False, True]
})

# set `col_string` as index
df = df.set_index("col_string")

df

| col_string | col_float | col_int | col_boolean |
|---|---|---|---|
| abc | 1.0 | 1 | True |
| def | 2.3 | 2 | True |
| ghi | 5.66 | 3 | False |
| jkl | 9.99 | 4 | True |


Now we have string labels for rows (`"abc", "def", "ghi", "jkl"`) and columns (`"col_float", "col_int", "col_boolean"`).

### Slicing based on single values as arguments

* `loc[row_label, column_label]`
* `iloc[row_position, column_position]`
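When both arguments are single values, the result is a single cell. The sketch below rebuilds a reduced version of the DataFrame used in this section under the illustrative name `df_demo`.

```Python
import pandas as pd

df_demo = pd.DataFrame(
    {"col_float": [1.0, 2.3, 5.66, 9.99], "col_int": [1, 2, 3, 4]},
    index=["abc", "def", "ghi", "jkl"],
)

# row labelled "def", column labelled "col_int"
by_label = df_demo.loc["def", "col_int"]

# second row and second column (positions start at 0)
by_position = df_demo.iloc[1, 1]
```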

In [6]:
df

| col_string | col_float | col_int | col_boolean |
|---|---|---|---|
| abc | 1.0 | 1 | True |
| def | 2.3 | 2 | True |
| ghi | 5.66 | 3 | False |
| jkl | 9.99 | 4 | True |


In [7]:
# all the rows of a single column, selected by label: using loc
df.loc[:, "col_int"] # all the rows but just the column called `col_int`

col_string
abc    1
def    2
ghi    3
jkl    4
Name: col_int, dtype: int64

In [8]:
df.loc["ghi", :]

col_float       5.66
col_int            3
col_boolean    False
Name: ghi, dtype: object

In [9]:
# all the rows of the third column (position 2): using iloc
df.iloc[:, 2]

col_string
abc     True
def     True
ghi    False
jkl     True
Name: col_boolean, dtype: bool

We can also get an entire row or an entire column by using `:`.

In [10]:
# getting the 3rd row using loc
df.loc["ghi", :]

col_float       5.66
col_int            3
col_boolean    False
Name: ghi, dtype: object

In [11]:
# getting the 3rd row using iloc
df.iloc[2, :]

col_float       5.66
col_int            3
col_boolean    False
Name: ghi, dtype: object

In [12]:
# to get the whole column we can use `:` in the rows position

# with loc
df.loc[:, "col_float"]

col_string
abc    1.00
def    2.30
ghi    5.66
jkl    9.99
Name: col_float, dtype: float64

In [13]:
# with iloc
df.iloc[:, 0]

col_string
abc    1.00
def    2.30
ghi    5.66
jkl    9.99
Name: col_float, dtype: float64

For columns, we can do all this with just square brackets `[]` and the name of the column:

In [14]:
df_copy = df.copy()

In [15]:
df_copy.loc[:, "col_float"] = df_copy.loc[:, "col_float"] * 2

df_copy

| col_string | col_float | col_int | col_boolean |
|---|---|---|---|
| abc | 2.0 | 1 | True |
| def | 4.6 | 2 | True |
| ghi | 11.32 | 3 | False |
| jkl | 19.98 | 4 | True |


In [16]:
df_copy['c'] = df_copy['col_float'].map(lambda x: x**2)

df_copy

| col_string | col_float | col_int | col_boolean | c |
|---|---|---|---|---|
| abc | 2.0 | 1 | True | 4.0 |
| def | 4.6 | 2 | True | 21.16 |
| ghi | 11.32 | 3 | False | 128.1424 |
| jkl | 19.98 | 4 | True | 399.2004 |


In [17]:
df["col_int"] # extracting a single column as a series

col_string
abc    1
def    2
ghi    3
jkl    4
Name: col_int, dtype: int64

In [18]:
df[["col_int"]] # extracting a single column as a dataframe

| col_string | col_int |
|---|---|
| abc | 1 |
| def | 2 |
| ghi | 3 |
| jkl | 4 |


In [19]:
df[["col_float", "col_int"]]

| col_string | col_float | col_int |
|---|---|---|
| abc | 1.0 | 1 |
| def | 2.3 | 2 |
| ghi | 5.66 | 3 |
| jkl | 9.99 | 4 |


### Slicing based on several values: lists or ranges.

* Using ranges of values:
  * Using `loc`: the final label **WILL BE INCLUDED**
    * `df.loc[ini_row_label:end_row_label, ini_col_label:end_col_label]`
  * Using `iloc`: the final position is excluded, as in regular Python slicing
    * `df.iloc[ini_row_position:end_row_position, ini_col_position:end_col_position]`

* Using lists of values:
  * Using `loc`:
    * `df.loc[[row_labels_to_include], [col_labels_to_include]]`
  * Using `iloc`:
    * `df.iloc[[row_positions_to_include], [col_positions_to_include]]`

In [20]:
df

| col_string | col_float | col_int | col_boolean |
|---|---|---|---|
| abc | 1.0 | 1 | True |
| def | 2.3 | 2 | True |
| ghi | 5.66 | 3 | False |
| jkl | 9.99 | 4 | True |


In [21]:
# get the last 3 rows for the last 2 columns
# using ranges and loc
df.loc["def":"jkl", "col_int": "col_boolean"]

| col_string | col_int | col_boolean |
|---|---|---|
| def | 2 | True |
| ghi | 3 | False |
| jkl | 4 | True |


In [22]:
# get the last 3 rows for the last 2 columns
# using ranges and iloc
df.iloc[-3:, -2:]


| col_string | col_int | col_boolean |
|---|---|---|
| def | 2 | True |
| ghi | 3 | False |
| jkl | 4 | True |


In [23]:
# get the 2nd and 4th row for the 2nd column
# using lists and loc
df.loc[["def", "jkl"], "col_int"]

col_string
def    2
jkl    4
Name: col_int, dtype: int64

In [24]:
# get the 2nd and 4th row for the 2nd column
# using lists and iloc
df.iloc[[1, 3], 1]

col_string
def    2
jkl    4
Name: col_int, dtype: int64
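The difference between the two kinds of range is easy to overlook: a label slice with `loc` includes its end, while a position slice with `iloc` does not. A minimal sketch, using a made-up `df_demo` with the same row labels as above:

```Python
import pandas as pd

df_demo = pd.DataFrame(
    {"col_int": [1, 2, 3, 4]},
    index=["abc", "def", "ghi", "jkl"],
)

# label slice: the end label "ghi" is included
label_slice = df_demo.loc["abc":"ghi"]

# position slice: the end position 2 is excluded
position_slice = df_demo.iloc[0:2]

label_rows = list(label_slice.index)
position_rows = list(position_slice.index)
```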

### Slicing based on conditions

Before diving into it, let's see what `pandas` returns when we perform logical operations on a series.

In [25]:
# create series
s = pd.Series([1, 2, 3, 4, 5])

s

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [26]:
s > 3 # mask

0    False
1    False
2    False
3     True
4     True
dtype: bool

In [29]:
s[s>3] # applying mask to series

3    4
4    5
dtype: int64

In [31]:
# create series
s = pd.Series([1, 2, 3, 4, 5])

# condition
s[s>3]

3    4
4    5
dtype: int64

In [32]:
s[(s > 3) & (s % 2 == 0)] # combine conditions

3    4
dtype: int64

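### Equivalences between native Python and pandas logical operators

Python - Pandas

* `and` - `&`
* `or` - `|`
* `not` - `~`

A logical operation on a series returns another `pd.Series` filled with `True`/`False` according to whether or not the condition was met for each element. When combining conditions, each one must be wrapped in parentheses, because `&`, `|` and `~` bind more tightly than comparisons such as `>` or `==`.

We can use this to *filter* a series, by placing the condition between brackets.

```Python
series[condition]  # for example: series[series > 3]
```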

In [33]:
s = pd.Series(range(26))

s[(s % 5 == 0) & (s > 12)]

15    15
20    20
25    25
dtype: int64

In [34]:
s[(s % 5 == 0) | (s % 7 == 0)]

0      0
5      5
7      7
10    10
14    14
15    15
20    20
21    21
25    25
dtype: int64
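The examples above cover `&` and `|`. The sketch below adds `~` (not) and shows what happens when the native `and` is used on series by mistake. The variable names are only illustrative.

```Python
import pandas as pd

s = pd.Series(range(10))

# ~ negates a mask: keep the values that are NOT even
not_even = s[~(s % 2 == 0)]

# ~ can be combined with & and |
combined = s[(s > 2) & ~(s % 3 == 0)]

# the native `and` does not work element by element on series
try:
    (s > 2) and (s < 5)
    raised = False
except ValueError:
    raised = True
```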

We can extend this behavior to DataFrames: we can filter the rows of our df based on logical conditions on the columns.

In [35]:
# defining the dataset
df = pd.DataFrame({
    "col_a": [1, 2, 3, 4],
    "col_b": [2, 4, 6, 8],
    "col_c": [3, 6, 9, 12],
    "col_d": [4, 8, 12, 16]
})

df

|   | col_a | col_b | col_c | col_d |
|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 |
| 1 | 2 | 4 | 6 | 8 |
| 2 | 3 | 6 | 9 | 12 |
| 3 | 4 | 8 | 12 | 16 |


### Filter rows that have `col_c` greater than or equal to 9

In SQL:

select
    col_a,
    col_b
from df
where col_c >= 9

In [40]:
# filter rows that have `col_c` greater than or equal to 9, keeping only `col_a` and `col_b`
df[df["col_c"] >= 9][["col_a", "col_b"]]

|   | col_a | col_b |
|---|---|---|
| 2 | 3 | 6 |
| 3 | 4 | 8 |
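The same filter and column selection can be written in a single step with `loc`, passing the mask as the row selector and the list of columns as the column selector. This is a sketch of an equivalent form, not a replacement for the version above; `df_demo`, `chained` and `single_step` are illustrative names.

```Python
import pandas as pd

df_demo = pd.DataFrame({
    "col_a": [1, 2, 3, 4],
    "col_b": [2, 4, 6, 8],
    "col_c": [3, 6, 9, 12],
})

# two steps: filter the rows, then select the columns
chained = df_demo[df_demo["col_c"] >= 9][["col_a", "col_b"]]

# one step: rows and columns in the same `loc` call
single_step = df_demo.loc[df_demo["col_c"] >= 9, ["col_a", "col_b"]]

same_result = chained.equals(single_step)
```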


We can combine conditions in a single instruction:

In [41]:
"""
select *
from df
where col_a % 2 = 0 and col_d > 8
"""

# rows with even values of `col_a` AND values of `col_d` greater than 8
df[
    (df["col_a"] % 2 == 0) &
    (df["col_d"] > 8)
]

|   | col_a | col_b | col_c | col_d |
|---|---|---|---|---|
| 3 | 4 | 8 | 12 | 16 |


In [42]:
# select all the columns whose name ends with "d"

df[[col for col in df.columns if col.endswith("d")]]

|   | col_d |
|---|---|
| 0 | 4 |
| 1 | 8 |
| 2 | 12 |
| 3 | 16 |


## Adding data to a `pd.DataFrame`

* Adding columns:
```Python
df["new_column"] = data_to_include
```

In [43]:
# create df
df_sport = pd.DataFrame({
    "sport": ["football", "basketball", "rugby"],
    "round_ball": [True, True, False],
    "is_cool": [False, True, True]
})

df_sport

|   | sport | round_ball | is_cool |
|---|---|---|---|
| 0 | football | True | False |
| 1 | basketball | True | True |
| 2 | rugby | False | True |


In [None]:
# add a new column called players_per_team
df_sport["players_per_team"] = [11, 5, 15]

# print df
df_sport

|   | sport | round_ball | is_cool | players_per_team |
|---|---|---|---|---|
| 0 | football | True | False | 11 |
| 1 | basketball | True | True | 5 |
| 2 | rugby | False | True | 15 |
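When a column is added from a list, the list needs one value per row. The sketch below, on a reduced and made-up `df_demo`, also shows a column computed from an existing one; `players_on_field` is a hypothetical column name.

```Python
import pandas as pd

df_demo = pd.DataFrame({"sport": ["football", "basketball", "rugby"]})

# a list must contain exactly one value per row
df_demo["players_per_team"] = [11, 5, 15]

# a new column can also be computed from existing columns
df_demo["players_on_field"] = df_demo["players_per_team"] * 2

# a list of the wrong length is rejected
try:
    df_demo["too_short"] = [1, 2]
    raised = False
except ValueError:
    raised = True
```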


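* Adding rows: `df.append()` was deprecated in pandas 1.4 and removed in pandas 2.0, so we build a one-row DataFrame and use `pd.concat` instead:
```Python
new_row = pd.DataFrame([{col_1: data_1, col_2: data_2, ..., col_n: data_n}])
df = pd.concat([df, new_row], ignore_index=True)
```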

In [51]:
# define new row
am_football = [{
    "sport": "american football",
    "round_ball": False,
    "is_cool": False,
    "players_per_team": 11
}]

new_df = pd.DataFrame(am_football)

# add new row for american football using concat
# concat stacks rows with axis=0 (the default)
# or places columns side by side with axis=1
# note: the original row labels are kept, so the label 0 appears twice
df = pd.concat([df_sport, new_df])

# print df
df

|   | sport | round_ball | is_cool | players_per_team |
|---|---|---|---|---|
| 0 | football | True | False | 11 |
| 1 | basketball | True | True | 5 |
| 2 | rugby | False | True | 15 |
| 0 | american football | False | False | 11 |
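Because `pd.concat` keeps the original row labels by default, the result can contain repeated labels. A minimal sketch of the difference that `ignore_index=True` makes, using small made-up frames named `top` and `extra`:

```Python
import pandas as pd

top = pd.DataFrame({"sport": ["football", "rugby"], "players_per_team": [11, 15]})
extra = pd.DataFrame([{"sport": "american football", "players_per_team": 11}])

# default: each frame keeps its own labels, so 0 is repeated
stacked = pd.concat([top, extra])

# ignore_index=True: the rows are renumbered from 0
renumbered = pd.concat([top, extra], ignore_index=True)

stacked_labels = list(stacked.index)
renumbered_labels = list(renumbered.index)
```

Repeated labels are allowed, but `df.loc[0]` then returns every row carrying that label, which is rarely what we want.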


In [52]:
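# selecting the four columns by position
# `.loc` slices by label, so `df.loc[:, 0:3]` raises a TypeError here
# because the column labels are strings; `.iloc` slices by position

df_fixed = df.iloc[:, 0:4].copy()

df_fixed.columns = ["sport", "round_ball", "is_cool", "players_per_team"]

df_fixed

|   | sport | round_ball | is_cool | players_per_team |
|---|---|---|---|---|
| 0 | football | True | False | 11 |
| 1 | basketball | True | True | 5 |
| 2 | rugby | False | True | 15 |
| 0 | american football | False | False | 11 |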

## Practice

### 1. Create the following DataFrame

Save it as the variable `dataset`

| name   | age | type   | is_furry | likes_cats |
|--------|-----|--------|----------|------------|
| dani   | 36  | human  | False    | False      |
| churro | 11  | dog    | True     | False      |
| plant  | 1   | plant  | False    | True       |
| cup    | 1   | object | False    | True       |

In [8]:
dataset = pd.DataFrame({
    'name': ['dani', 'churro', 'plant', 'cup'],
    'age': [36, 11, -1, 1],
    'type': ['human', 'dog', 'dead', 'object'],
    'is_furry': [False, True, False, False],
    'likes_cats': [False, False, True, True]
})

dataset

|   | name | age | type | is_furry | likes_cats |
|---|---|---|---|---|---|
| 0 | dani | 36 | human | False | False |
| 1 | churro | 11 | dog | True | False |
| 2 | plant | -1 | dead | False | True |
| 3 | cup | 1 | object | False | True |


### 2. Return the value of `is_furry` for the element with `index = 2`

In [9]:
is_furry = dataset.loc[2, 'is_furry']

is_furry

False

### 3. Add a new column containing the length of the name in characters

In [10]:
dataset['name_length'] = [len(name) for name in dataset['name']]

dataset

|   | name | age | type | is_furry | likes_cats | name_length |
|---|---|---|---|---|---|---|
| 0 | dani | 36 | human | False | False | 4 |
| 1 | churro | 11 | dog | True | False | 6 |
| 2 | plant | -1 | dead | False | True | 5 |
| 3 | cup | 1 | object | False | True | 3 |


### 4. Add a new column named `logical_op` containing the following logical operation:

```Python
is_furry and likes_cats
```

Keep in mind that logical operators in pandas are not the ones we know (`not`, `and`, `or`) but rather (`~`, `&`, `|`).

Source: https://jakevdp.github.io/PythonDataScienceHandbook/02.06-boolean-arrays-and-masks.html#Boolean-operators

In [62]:
dataset['logical_op'] = dataset['is_furry'] & dataset['likes_cats']

dataset

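|   | name | age | type | is_furry | likes_cats | name_length | logical_op |
|---|---|---|---|---|---|---|---|
| 0 | dani | 36 | human | False | False | 4 | False |
| 1 | churro | 11 | dog | True | False | 6 | False |
| 2 | plant | -1 | dead | False | True | 5 | False |
| 3 | cup | 1 | object | False | True | 3 | False |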


### 5. Change the `index` from the default to:

`element_1, element_2, element_3, element_4`

In [11]:
elements = [
    f'element_{i + 1}' for i in dataset.index
]

dataset.index = elements

dataset

|   | name | age | type | is_furry | likes_cats | name_length |
|---|---|---|---|---|---|---|
| element_1 | dani | 36 | human | False | False | 4 |
| element_2 | churro | 11 | dog | True | False | 6 |
| element_3 | plant | -1 | dead | False | True | 5 |
| element_4 | cup | 1 | object | False | True | 3 |


In [17]:
dataset.index = ['el1', 'el2', 'el3', 'el4']

dataset

|   | name | age | type | is_furry | likes_cats | name_length |
|---|---|---|---|---|---|---|
| el1 | dani | 36 | human | False | False | 4 |
| el2 | churro | 11 | dog | True | False | 6 |
| el3 | plant | -1 | dead | False | True | 5 |
| el4 | cup | 1 | object | False | True | 3 |


### 6. Create a new dataframe called `non_furry` that contains the rows with `is_furry = False`:

In [74]:
non_furry = dataset[~dataset['is_furry']]  # `~` negates the boolean mask

non_furry

|   | name | age | type | is_furry | likes_cats | name_length | logical_op |
|---|---|---|---|---|---|---|---|
| element_1 | dani | 36 | human | False | False | 4 | False |
| element_3 | plant | -1 | dead | False | True | 5 | False |
| element_4 | cup | 1 | object | False | True | 3 | False |


In [18]:
dataset['not_furry'] = dataset['is_furry'] == False

dataset

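|   | name | age | type | is_furry | likes_cats | name_length | not_furry |
|---|---|---|---|---|---|---|---|
| el1 | dani | 36 | human | False | False | 4 | True |
| el2 | churro | 11 | dog | True | False | 6 | False |
| el3 | plant | -1 | dead | False | True | 5 | True |
| el4 | cup | 1 | object | False | True | 3 | True |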
