![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Working with Data

We will use the [pandas](https://pandas.pydata.org/) library to work with datasets in a format similar to spreadsheets or tables. `▶Run` the code cell below to import the `pandas` library using the short form `pd`. We will then load and display a [CSV file](https://en.wikipedia.org/wiki/Comma-separated_values) of data related to *hypothetical* pets for adoption from [Bootstrap](https://www.bootstrapworld.org/materials/data-science/).

In [None]:
import pandas as pd
pets = pd.read_csv('data/pets.csv')
pets

Now that we have the data in a [dataframe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html), we can see that there are 31 rows (the `index` starts at `0`) and 8 columns.
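The number of rows and columns can also be read with code instead of counting by eye. The sketch below uses a small made-up dataframe called `sample` (a hypothetical stand-in, not the pets data) so that it runs without the CSV file.

```python
import pandas as pd

# Hypothetical three-row table standing in for the pets data
sample = pd.DataFrame({
    'Name': ['Ada', 'Bo', 'Cy'],
    'Species': ['cat', 'dog', 'dog'],
    'Legs': [4, 4, 4],
})

# .shape is a (rows, columns) tuple
n_rows, n_columns = sample.shape
print(n_rows, 'rows and', n_columns, 'columns')

# len() of a dataframe is its number of rows
print(len(sample))
```

The same `.shape` attribute should work on `pets` once it has been loaded.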

Let's use a `for` loop to print the column names.

In [None]:
for column in pets.columns:
    print(column)

We can select (and display) just one column.

In [None]:
pets['Name']

Or we can select multiple columns by putting their names in a list.

In [None]:
pets[['Name', 'Species']]
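Single and double square brackets return different kinds of objects, which explains why the two outputs look different. A minimal sketch, again using a made-up `sample` dataframe (hypothetical names and values):

```python
import pandas as pd

# Hypothetical data for illustration only
sample = pd.DataFrame({
    'Name': ['Ada', 'Bo', 'Cy'],
    'Species': ['cat', 'dog', 'dog'],
    'Legs': [4, 4, 4],
})

one_column = sample['Name']                # a single column label
two_columns = sample[['Name', 'Species']]  # a list of column labels

print(type(one_column).__name__)   # Series
print(type(two_columns).__name__)  # DataFrame
```

A `Series` is a single labelled column, while a `DataFrame` is a table that can hold many columns.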

The methods `.head()` and `.tail()` will display the top or bottom rows.

In [None]:
pets.tail(3)

This also works with column selection.

In [None]:
pets[['Name', 'Weight (lbs)']].head(4)
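Both `.head()` and `.tail()` can also be called without a number, in which case they return five rows. A small sketch with a made-up dataframe named `numbers` (hypothetical, used only to make the row counts easy to check):

```python
import pandas as pd

# Hypothetical ten-row table: the values 0 to 9
numbers = pd.DataFrame({'n': range(10)})

first_rows = numbers.head()   # no argument, so the first 5 rows
last_two = numbers.tail(2)    # the last 2 rows

print(first_rows)
print(last_two)
```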

---

### Exercise

In the cell below, use code to display the `'Name'`, `'Legs'` and `'Time to Adoption (weeks)'` columns. Remember that you can copy and paste the column names to avoid mistakes with spelling or capitalization.

---

### Selecting data from columns by condition

You can filter the data, for example to show only `dog`s.

*Note that the `==` sign means "check if it is equal to" rather than assigning a value to a variable.*

In [None]:
condition = pets['Species'] == 'dog'
pets[condition]
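It can help to look at what `condition` actually holds: one `True` or `False` value per row. Rows marked `True` are kept by the filter. The sketch below uses a made-up `sample` dataframe (hypothetical values) to show this.

```python
import pandas as pd

# Hypothetical data for illustration only
sample = pd.DataFrame({
    'Name': ['Ada', 'Bo', 'Cy'],
    'Species': ['cat', 'dog', 'dog'],
})

condition = sample['Species'] == 'dog'
print(condition)        # one True/False value for each row

# True counts as 1, so the sum is the number of matching rows
print(condition.sum())

dogs = sample[condition]  # keeps only the rows where condition is True
print(dogs)
```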

We can also use `!=`, which means "not equal to".

In [None]:
condition = pets['Species'] != 'dog'
pets[condition]

It's also possible to check if the value is in a list.

In [None]:
list_of_species = ["lizard","rabbit", "tarantula"]
pets[pets["Species"].isin(list_of_species)]

We can also use multiple conditions with **and** (`&`) or **or** (`|`). We need to include `()` around each condition.

In [None]:
pets[ (pets["Gender"]=="female") & (pets["Age (years)"]>3) ]

In [None]:
pets[ (pets["Fixed"]==True) | (pets["Legs"]>4) ]
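An alternative to writing each condition inline is to store it in a variable first and then combine the variables. This is only a sketch of one possible style, using a made-up `sample` dataframe and hypothetical variable names (`is_dog`, `is_older`).

```python
import pandas as pd

# Hypothetical data for illustration only
sample = pd.DataFrame({
    'Name': ['Ada', 'Bo', 'Cy', 'Di'],
    'Species': ['cat', 'dog', 'dog', 'tarantula'],
    'Age (years)': [2, 5, 1, 3],
})

# Each condition is a column of True/False values
is_dog = sample['Species'] == 'dog'
is_older = sample['Age (years)'] > 2

older_dogs = sample[is_dog & is_older]    # both conditions must be True
dog_or_older = sample[is_dog | is_older]  # at least one must be True

print(older_dogs)
print(dog_or_older)
```

Because each condition is already a complete value, no extra parentheses are needed when combining them.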

---

### Exercise

In the cell below, get a subset of the data where `'Fixed'` is equal to `True` and `'Time to Adoption (weeks)'` is less than `5`.

---

### Sorting data

We can sort the dataframe, for example by the `'Age (years)'` column.

In [None]:
pets.sort_values('Age (years)')

The default is to sort `ascending`, but we can instead sort in descending order.

In [None]:
pets.sort_values('Age (years)', ascending=False)

Or we can sort by two columns, first by age and then by time to adoption.

In [None]:
pets.sort_values(['Age (years)', 'Time to Adoption (weeks)'])
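When sorting by more than one column, `ascending` can also be a list with one value per column. The sketch below, which uses a made-up `sample` dataframe (hypothetical values), sorts youngest first and, within the same age, longest time to adoption first. It also shows that `sort_values` returns a new dataframe and leaves the original in its existing order.

```python
import pandas as pd

# Hypothetical data for illustration only
sample = pd.DataFrame({
    'Name': ['Ada', 'Bo', 'Cy', 'Di'],
    'Age (years)': [2, 5, 2, 3],
    'Time to Adoption (weeks)': [4, 1, 9, 6],
})

# Age ascending, then time to adoption descending
sorted_sample = sample.sort_values(
    ['Age (years)', 'Time to Adoption (weeks)'],
    ascending=[True, False],
)

print(sorted_sample)
print(sample)  # the original dataframe keeps its order
```

To keep the sorted result, assign it to a variable as done here with `sorted_sample`.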

### Adding a new column using an existing one

In Canada, most pet food instructions are based on the pet's mass in kilograms.

We can create a new column, `Mass (kg)`, by dividing the `Weight (lbs)` column by $2.205$.

In [None]:
pets['Mass (kg)'] = pets['Weight (lbs)'] / 2.205
pets.head()
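The division usually produces many decimal places. If shorter values are easier to read, one option is to round the result with `.round()`. A minimal sketch using a made-up `sample` dataframe (hypothetical weights):

```python
import pandas as pd

# Hypothetical data for illustration only
sample = pd.DataFrame({
    'Name': ['Ada', 'Bo'],
    'Weight (lbs)': [11.0, 44.1],
})

# Convert pounds to kilograms, then round to one decimal place
sample['Mass (kg)'] = (sample['Weight (lbs)'] / 2.205).round(1)
print(sample)
```

The calculation is applied to every row at once, so no loop is needed.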

---

### Exercise

Create a new column called `Time to Adoption (days)` by multiplying (`*`) the `Time to Adoption (weeks)` column by `7`.

---

<span style="color:#663399">Your **assignment** is to *copy* the code that you wrote for the exercises in this notebook, and *paste* it into a document to hand in. As well, give an example of why each of these calculations might be useful.</span>

<span style="color:#FF6633">An **optional advanced challenge** is to create a new column that is the sum of the two columns in https://raw.githubusercontent.com/callysto/data-files/main/data-science-and-artificial-intelligence/datasaurus.csv and list the first five values in that column.</span>

---

Now that you have some experience with dataframes, the [next notebook](03-statistics.ipynb) will help you find some statistics.

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)