![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Basics of Python

This notebook will provide the basics of Python in Jupyter and an introduction to DataFrames.

To enter code in a Jupyter notebook we are going to use **code cells**.

### To create a new Code cell:

At the top left, click on the plus sign (`+`) next to the save (`💾`) button.

*New cells are code cells by default, but you can use the dropdown menu at the top middle of the page to change a cell to `Markdown`, which makes it a text cell.*

### To run a Code cell:

Select the cell, then click the `▶Run` button at the top near the stop (`◼`) button, or press `Ctrl-Enter`.

In [None]:
# Variables are defined with an equals sign (=)

my_variable = 10             # You cannot put spaces in variable names. 
other_variable = "some text" # variables need not be numbers!

# Print will output our variables below the cell
print(my_variable, other_variable)

Variables are shared between cells: once a cell that defines them has been run, other cells can use them. You can also print words and sentences directly.

In [None]:
print(my_variable, other_variable)
print("We can print text directly in quotes")
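Variables can hold different kinds of values, as the comment "variables need not be numbers!" hints. Below is a small sketch using new, made-up variable names (so it does not change `my_variable` or `other_variable`) that uses Python's built-in `type()` to show what kind of value each one holds.

```python
# Hypothetical variables, separate from the ones defined above
count = 10
label = "some text"
ratio = 2.5

# type() reports what kind of value a variable holds
print(type(count))  # <class 'int'>   (whole number)
print(type(label))  # <class 'str'>   (text, called a "string")
print(type(ratio))  # <class 'float'> (decimal number)
```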

You can also do mathematical operations in Python.

In [None]:
x = 5
y = 10

add = x + y

subtract = x - y

multiply = x * y

divide = x / y

print(add, subtract, multiply, divide)
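Python has a few more arithmetic operators beyond the four above. This sketch reuses the same `x` and `y` values to show exponentiation (`**`), floor division (`//`) and remainder (`%`). It also shows that `/` always gives a decimal number, even when the division comes out even.

```python
x = 5
y = 10

power = x ** 2        # x raised to the power 2 -> 25
floor_div = y // 3    # divide, then round down to a whole number -> 3
remainder = y % 3     # what is left over after floor division -> 1
even_divide = y / x   # regular division always gives a float -> 2.0

print(power, floor_div, remainder, even_divide)
```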

---
### Exercise 1

1. In the cell below, assign variable **z** to your name and run the cell. 
2. In the cell below, write a comment on the same line where you define z. Run the cell to make sure the comment is not changing anything.
---


In [None]:
z = "your name here"

print(z, "is loving Python!")

# Basics of DataFrames and pandas

A **DataFrame** is a two-dimensional data structure, similar to a table or a spreadsheet.

Python has a library of pre-defined functions for working with DataFrames, called **pandas**.

To read a file in CSV format, use the `read_csv()` function. It can read a local file or a file at an internet address (URL).

In [None]:
# load the "pandas" library under short name "pd"
import pandas as pd

# a CSV file of data about hypothetical pets for adoption, from https://www.bootstrapworld.org/materials/data-science/
pets = pd.read_csv('pets.csv')
# this would also work:
#url = "https://swift-yeg.cloud.cybera.ca:8080/v1/AUTH_233e84cd313945c992b4b585f7b9125d/callysto-open-data/pets.csv"
#pets = pd.read_csv(url)

# print data on the screen
pets
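`read_csv()` treats its input like a file, so it can also read CSV text held in memory by wrapping it in `io.StringIO`. This is handy for trying things out with a few rows. The rows below are made up for illustration and are not the real `pets.csv` data; only the column names follow the ones used in this notebook.

```python
import io
import pandas as pd

# A few hypothetical rows written as CSV text (NOT the real pets.csv data)
csv_text = """Species,Gender,Age (years),Fixed,Legs,Weight (lbs),Time to Adoption (weeks)
dog,female,4,True,4,40,3
cat,male,2,False,4,9,6
lizard,female,1,False,4,0.5,10
"""

# read_csv accepts a path, a URL, or a file-like object such as StringIO
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)

# dtypes shows the type pandas chose for each column (True/False become bool)
print(sample.dtypes)
```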

## Basic operations with DataFrames  

In [None]:
# shape shows the number of rows and columns
pets.shape
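`shape` is a pair of numbers (a tuple) in the order (rows, columns), so you can unpack it into two variables. A minimal sketch on a small made-up DataFrame standing in for `pets`:

```python
import pandas as pd

# Hypothetical stand-in for pets (not the real data)
sample = pd.DataFrame({
    "Species": ["dog", "cat", "rabbit"],
    "Legs": [4, 4, 4],
})

# shape is (rows, columns), so it can be unpacked into two variables
n_rows, n_cols = sample.shape
print("rows:", n_rows, "columns:", n_cols)

# len() of a DataFrame gives just the number of rows
print(len(sample))
```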

### Select columns by name

In [None]:
#Getting column names
pets.columns

In [None]:
# Select one column (double brackets give back a DataFrame)
pets[['Species']]
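The double brackets `[[...]]` pass a list of column names, so you get back a DataFrame (a table) even when the list has only one name. With single brackets you get a **Series**, which is a single column of values. A short sketch on made-up data:

```python
import pandas as pd

# Hypothetical data, not the real pets.csv
sample = pd.DataFrame({"Species": ["dog", "cat"], "Gender": ["female", "male"]})

one_col_frame = sample[["Species"]]   # list of names -> DataFrame
one_col_series = sample["Species"]    # single name   -> Series

print(type(one_col_frame).__name__)   # DataFrame
print(type(one_col_series).__name__)  # Series
```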

In [None]:
# Selecting multiple columns
pets[['Species','Gender']]

In [None]:
# Select the first 5 rows
# try changing to head(10) or head(2)
pets.head()
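`head()` takes an optional number of rows (5 if you leave it out), and `tail()` works the same way from the end of the DataFrame. A quick sketch on made-up rows:

```python
import pandas as pd

# Hypothetical data, not the real pets.csv
sample = pd.DataFrame({"Species": ["dog", "cat", "rabbit", "lizard", "tarantula", "dog"]})

first_two = sample.head(2)  # first 2 rows
last_two = sample.tail(2)   # last 2 rows
print(first_two)
print(last_two)
```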

---
### Exercise 2

1. In the cell below, uncomment the code (remove the `#` sign)
2. Change "column1", "column2", and "column3" to "Species", "Fixed", and "Time to Adoption (weeks)" to get these 3 columns

---

In [None]:
#pets[["column1","column2","column3"]]

### Add a new column using an existing one

You can create a new column, `Mass (kg)`, by dividing the `Weight (lbs)` column by $2.205$, since one kilogram is about 2.205 pounds.

In [None]:

pets['Mass (kg)'] = pets['Weight (lbs)']/2.205

#look at the DataFrame again, now with the new column
pets
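The division is applied to every value in the column at once, with no loop needed. As a sketch on made-up weights, you can also tidy the result with `.round()`; rounding to 1 decimal place is just a choice for readability here.

```python
import pandas as pd

# Hypothetical weights, not the real pets.csv
sample = pd.DataFrame({"Weight (lbs)": [10.0, 44.1]})

# Divide every value in the column by 2.205 in one step
sample["Mass (kg)"] = sample["Weight (lbs)"] / 2.205

# Round the new column to 1 decimal place
sample["Mass (kg)"] = sample["Mass (kg)"].round(1)
print(sample)
```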

### Select data from columns by condition

You can filter the data, for example to show only dogs, using `pets["Column"] == "something"`.

Note that the `==` sign means "check if it is equal to" rather than assigning a value to a variable.

In [None]:
condition = pets["Species"]=="dog"
pets[condition]
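It can help to look at `condition` itself: it holds one `True` or `False` for every row, and indexing the DataFrame with it keeps only the rows marked `True`. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical data, not the real pets.csv
sample = pd.DataFrame({"Species": ["dog", "cat", "dog"], "Legs": [4, 4, 4]})

condition = sample["Species"] == "dog"
print(condition.tolist())  # one True/False per row: [True, False, True]

# Keep only the rows where the condition is True
dogs = sample[condition]
print(len(dogs))           # 2
```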

#### Other examples of conditions:

Not equal to using `!=`

In [None]:
pets[pets["Species"] != "dog"]

Check if the value is in a list

In [None]:
list_of_species = ["lizard","rabbit", "tarantula"]
pets[pets["Species"].isin(list_of_species)]
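`.isin()` is a shorter way of writing several "equal to" checks joined with "or" (`|`). This sketch on made-up rows checks that both versions keep the same rows:

```python
import pandas as pd

# Hypothetical data, not the real pets.csv
sample = pd.DataFrame({"Species": ["dog", "lizard", "cat", "rabbit"]})

list_of_species = ["lizard", "rabbit", "tarantula"]
with_isin = sample[sample["Species"].isin(list_of_species)]

# The same filter written with "or" (|), one comparison per value
with_or = sample[(sample["Species"] == "lizard")
                 | (sample["Species"] == "rabbit")
                 | (sample["Species"] == "tarantula")]

print(with_isin.equals(with_or))  # True
```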

Two conditions with "and" using `&`

In [None]:
pets[ (pets["Gender"]=="female") & (pets["Age (years)"]>3) ]  # two conditions with and

Two conditions with "or" using `|`

In [None]:
pets[ (pets["Fixed"]==True) | (pets["Legs"]>4) ]
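Each comparison above sits in its own parentheses because Python applies `&` and `|` before `==` and `>`; leaving the parentheses out usually raises an error. There is also `~`, which flips a condition so it means "not". A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical data, not the real pets.csv
sample = pd.DataFrame({
    "Fixed": [True, False, True],
    "Legs": [4, 8, 0],
})

# Parentheses around each comparison before combining with |
fixed_or_many_legs = sample[(sample["Fixed"] == True) | (sample["Legs"] > 4)]

# ~ turns True into False and False into True ("not fixed")
not_fixed = sample[~sample["Fixed"]]

print(len(fixed_or_many_legs))  # 3
print(len(not_fixed))           # 1
```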

---
### Exercise 3

1. Change the cell below to get a subset of the data where "Fixed" is equal to True and "Time to Adoption (weeks)" is less than 5

---

In [None]:
condition5 = (pets[""]== 0) | (pets[""] == 0)
pets[condition5]

### Sorting

You can sort the DataFrame by the values in a column, for example by the `Age (years)` column.

In [None]:
pets.sort_values("Age (years)")

The default is to sort in ascending order (smallest first), but you can sort in descending order instead with `ascending=False`.

In [None]:
pets.sort_values("Age (years)", ascending=False)

You can also sort by two columns: first by age, then, for pets of the same age, by time to adoption.

In [None]:
pets.sort_values(["Age (years)","Time to Adoption (weeks)"])
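When sorting by several columns, `ascending` can also be a list with one `True`/`False` per column. In this sketch on made-up rows, ages go from youngest to oldest, and pets of the same age are listed from longest to shortest wait:

```python
import pandas as pd

# Hypothetical data, not the real pets.csv
sample = pd.DataFrame({
    "Age (years)": [3, 1, 3],
    "Time to Adoption (weeks)": [2, 5, 8],
})

# Age ascending; within the same age, time to adoption descending
result = sample.sort_values(
    ["Age (years)", "Time to Adoption (weeks)"],
    ascending=[True, False],
)
print(result)
```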

### Grouping and calculating summaries on groups

You can split the data into groups based on the values in a column. For example, grouping by the `Fixed` column splits the rows into two groups, `True` (fixed) and `False` (not fixed).

You can then calculate the mean (average) of every numeric column for each group using `.mean()`.

In [None]:
# numeric_only=True skips text columns, which have no mean
pets.groupby("Fixed").mean(numeric_only=True)
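If you only need the summary for one column, you can pick that column after grouping. The sketch below, on made-up rows, also shows `numeric_only=True`: text columns such as `Species` have no mean, and pandas 2.0 and later raise an error on them unless they are skipped this way.

```python
import pandas as pd

# Hypothetical data, not the real pets.csv
sample = pd.DataFrame({
    "Species": ["dog", "cat", "dog", "cat"],
    "Fixed": [True, False, True, True],
    "Age (years)": [2, 4, 6, 1],
})

# Mean age for each group (False first, then True)
mean_age = sample.groupby("Fixed")["Age (years)"].mean()
print(mean_age)

# Mean of every numeric column; the text column "Species" is skipped
all_means = sample.groupby("Fixed").mean(numeric_only=True)
print(all_means)
```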

Other operations you can do on groups include `.min()`, `.max()`, and `.sum()`.

You can also do multiple operations at once using the `.agg()` method.

In [None]:
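# mean only works on numbers, so keep the numeric columns before grouping
pets.select_dtypes(include="number").groupby("Legs").agg(["mean","max"])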
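Instead of running the same operations on every column, `.agg()` also accepts a dictionary that says which operation (or list of operations) to run on each column. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical data, not the real pets.csv
sample = pd.DataFrame({
    "Legs": [4, 4, 8],
    "Age (years)": [2, 6, 1],
    "Weight (lbs)": [40.0, 10.0, 0.1],
})

# Mean and max of age, but only the total weight, for each number of legs
summary = sample.groupby("Legs").agg({
    "Age (years)": ["mean", "max"],
    "Weight (lbs)": "sum",
})
print(summary)
```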

---
### Exercise 4

1. Modify the cell below to calculate **max()** for every column grouped by "Species"
---

In [None]:
pets.groupby("Age (years)").mean(numeric_only=True)

### Calculating number of rows by group

You can use the `.size()` method to count the number of rows in each group.

In [None]:
pets.groupby("Gender").size()
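For counting rows by the values of a single column, `value_counts()` gives the same numbers as `groupby(...).size()`, but sorted from most to least common instead of by group name. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical data, not the real pets.csv
sample = pd.DataFrame({"Gender": ["female", "male", "female", "female"]})

# Rows per group, ordered by group name
by_group = sample.groupby("Gender").size()
print(by_group)

# Same counts, ordered from most to least common
counts = sample["Gender"].value_counts()
print(counts)
```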

---
### Exercise 5

1. Calculate the number of rows grouped by column "Species"
---

In [None]:
pets

Additional resources for pandas and DataFrames can be found [here](https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python) and [here](https://www.kaggle.com/grroverpr/pandas-cheatsheet).

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)