![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Basics of Python
This notebook will provide the basics of Python and an introduction to DataFrames.

To enter code in a Jupyter notebook we are going to use **Code cells**.

### To create a new Code cell:
#### If you are using Callysto Hub:
At the top left, click on the plus sign (`+`) next to the save (`💾`) button.

*New cells are code cells by default. To turn a cell into a text cell, use the dropdown menu at the top middle of the page to change its type to `Markdown`.*

#### If you are using Colab:
Click on `+Code` in the top left corner (or between cells) to create a new Code cell.

### To run a Code cell:
#### Callysto Hub:
Click the `⇥ Run` button at the top near the stop (`◼`) button, or hit `Ctrl-Enter`.

#### Colab:
Click the play button (`▶`) on the left of the selected cell, or hit `Ctrl-Enter`.

In [None]:
# Anything in a code cell after a pound sign is a comment! 
# You can type anything here and it will not be executed 

In [None]:
# Variables are defined with an equals sign (=)

my_variable = 10             # You cannot put spaces in variable names. 
other_variable = "some text" # variables need not be numbers!

# Print will output our variables below the cell
print(my_variable, other_variable)
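
Since variables can hold numbers or text, it can help to ask Python what kind of value a variable holds. This short sketch uses the built-in `type()` function; the variable names and values are just examples.

```python
# type() reports what kind of value a variable holds
number = 10          # a whole number (int)
text = "some text"   # text in quotes (str)
decimal = 2.5        # a number with a decimal point (float)

print(type(number), type(text), type(decimal))
```

Running this prints `<class 'int'> <class 'str'> <class 'float'>`.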

In [None]:
# Variables are shared between cells. You can also print words and sentences directly.
print(my_variable, other_variable, "We can print text directly in quotes")

In [None]:
# You can do mathematical operations in Python

x = 5
y = 10

add = x + y
subtract = x - y
multiply = x * y
divide = x / y

print(add, subtract, multiply, divide)
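
Python has a few more arithmetic operators beyond the four above. As a small illustration (the numbers are arbitrary, and new names `a` and `b` are used so `x` and `y` are not overwritten), here are floor division, remainder, and exponent:

```python
a = 10
b = 3

floor_divide = a // b   # division rounded down to a whole number: 3
remainder = a % b       # what is left over after dividing: 1
power = a ** 2          # exponent, a squared: 100

print(floor_divide, remainder, power)
```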

---
### Exercise 1

1. In the cell below, assign your name to the variable **z** and run the cell. 
2. In the cell below, write a comment on the same line where you define z. Run the cell to make sure the comment does not change anything.
---


In [None]:
## Enter your code in the line below
z = "your name here"

##

print(z, "is loving Python!")

# Basics of DataFrames and pandas

A **DataFrame** is a two-dimensional data structure, similar to a table or a spreadsheet.

In Python there is a library of pre-defined functions to work with DataFrames called **pandas**.

In [None]:
#load the "pandas" library under the short name "pd"
import pandas as pd

To read a file in CSV format, use the **read_csv()** function. It can read a file stored on your computer or a file at a URL.

In [None]:
# CSV file with data about hypothetical pets available for adoption
# from https://www.bootstrapworld.org/materials/data-science/
#url = './pets.csv'
url = "https://swift-yeg.cloud.cybera.ca:8080/v1/AUTH_233e84cd313945c992b4b585f7b9125d/callysto-open-data/pets.csv"

#read csv file from url and save it as dataframe
pets = pd.read_csv(url)

#print on the screen
pets

In [None]:
#shape shows number of rows and number of columns
pets.shape
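
If you want to experiment with `read_csv()` and `shape` without internet access, `read_csv()` also accepts a file-like object. This sketch wraps a tiny CSV string in Python's `io.StringIO`; the rows are invented for illustration and are not the real pets data.

```python
import io
import pandas as pd

# A tiny, made-up CSV held in a string (not the real pets data)
csv_text = """Name,Species,Age (years)
Rex,dog,3
Tom,cat,5
"""

# read_csv accepts a file path, a URL, or a file-like object such as StringIO
small = pd.read_csv(io.StringIO(csv_text))

print(small)
print(small.shape)   # (number of rows, number of columns)
```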

## Basic operations with DataFrames  

### Select rows/columns by name and index

In [None]:
#Getting column names
pets.columns

In [None]:
#Selecting one column
pets[['Species']]
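
The double square brackets above matter. With one pair of brackets pandas returns a **Series** (a single column of values); with two pairs it returns a **DataFrame** that happens to have one column. A small sketch using made-up rows:

```python
import pandas as pd

# Made-up example rows (not the real pets data)
df = pd.DataFrame({"Species": ["dog", "cat"], "Gender": ["male", "female"]})

one_bracket = df["Species"]     # a Series
two_brackets = df[["Species"]]  # a DataFrame with one column

print(type(one_bracket).__name__, type(two_brackets).__name__)
```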

In [None]:
#Selecting multiple columns
pets[['Species','Gender']]  

In [None]:
#Selecting first 5 rows
#try changing to head(10) or head(2)
pets.head()

In [None]:
#Getting the index (row names) - note that row names start at 0
pets.index.tolist()

In [None]:
#Selecting one row
pets.iloc[[2]]
#(it's row 3, remember row numbers start at zero)

In [None]:
#Selecting multiple rows (positions 2 and 5, which are the 3rd and 6th rows):
pets.iloc[[2,5]]

In [None]:
#Selecting rows and columns:
pets[['Species','Gender']].iloc[[2,5]]
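
`iloc` selects rows by their *position*. pandas also has `loc`, which selects by index *label* and can take rows and columns in one step. The two only differ when the labels are not 0, 1, 2, ..., so this sketch gives some made-up rows text labels to show the difference:

```python
import pandas as pd

# Made-up rows with text labels instead of the default 0, 1, 2, ... index
df = pd.DataFrame(
    {"Species": ["dog", "cat", "rabbit"], "Legs": [4, 4, 4]},
    index=["a", "b", "c"],
)

by_position = df.iloc[[0, 2]]             # 1st and 3rd rows (positions count from 0)
by_label = df.loc[["a", "c"]]             # the same rows, looked up by label
subset = df.loc[["a", "c"], ["Species"]]  # rows and columns in a single step

print(by_position.equals(by_label))
print(subset)
```

For the pets DataFrame, whose labels are the default 0, 1, 2, ..., `pets.loc[[2, 5], ['Species', 'Gender']]` should pick the same cells as the cell above.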

---
### Exercise 2

1. In the cell below, uncomment the code
2. Change "column1", "column2", and "column3" to "Species", "Fixed", and "Time to Adoption (weeks)" to get these 3 columns

---

In [None]:
#pets[["column1","column2","column3"]]

### Add a new column using an existing one

In [None]:
#create a new column (mass in kilograms) by dividing "Weight (lbs)" column by 2.205
pets['Mass (kg)'] = pets['Weight (lbs)']/2.205

#look at the very last column 'Mass (kg)'
pets
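
Dividing a column by a number divides every row at once. If the long decimals are distracting, `round()` can trim them. A minimal sketch on made-up weights (chosen so the answers come out even):

```python
import pandas as pd

# Made-up weights in pounds (not the real pets data)
df = pd.DataFrame({"Weight (lbs)": [22.05, 4.41]})

# Every row is divided by 2.205 (pounds per kilogram), then rounded to 2 decimals
df["Mass (kg)"] = (df["Weight (lbs)"] / 2.205).round(2)

print(df)
```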

### Select specific rows by condition

In [None]:
#create a condition: for example Species is dog
condition = pets["Species"]=="dog" #note: == means "is equal to", while a single = assigns a value

condition #it shows True for rows where species is equal to dog

In [None]:
#select only rows where condition is True (all dogs)
pets[condition]
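
To see the two steps on something small enough to check by eye, here is a sketch with three made-up rows: the condition is a list of True/False values, one per row, and using it inside square brackets keeps only the True rows.

```python
import pandas as pd

# Three made-up rows (not the real pets data)
df = pd.DataFrame({"Species": ["dog", "cat", "dog"], "Legs": [4, 4, 3]})

condition = df["Species"] == "dog"   # one True/False value per row
print(condition.tolist())            # [True, False, True]

dogs = df[condition]                 # keeps only the rows marked True
print(len(dogs))                     # 2
```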

In [None]:
#other examples of conditions:

#Not equal
condition1 = pets["Species"]!="dog"

#equal to any one of the values in the list
condition2 = pets["Species"].isin(["lizard","rabbit", "tarantula"])

#Multiple conditions: "and" (Gender is "female" and age is greater than 3)
condition3 = (pets["Gender"]=="female") & (pets["Age (years)"]>3)

#Multiple conditions: "or" (Fixed is True or Legs is greater than 4)
condition4 = (pets["Fixed"]==True) | (pets["Legs"]>4)
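
Two details about combining conditions: each comparison needs its own parentheses, because `&` and `|` are evaluated before `==` and `>` in Python, and `~` flips True and False ("not"). A sketch on made-up rows:

```python
import pandas as pd

# Made-up rows (not the real pets data)
df = pd.DataFrame({
    "Gender": ["female", "male", "female"],
    "Age (years)": [5, 4, 2],
})

# Parentheses around each comparison are required when using & or |
older_females = df[(df["Gender"] == "female") & (df["Age (years)"] > 3)]

# ~ reverses the condition, so this keeps rows that are NOT female
not_female = df[~(df["Gender"] == "female")]

print(len(older_females), len(not_female))
```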

---
### Exercise 3

1. Change the cell below to get a subset of the data where "Fixed" is equal to True and "Time to Adoption (weeks)" is less than 5

---

In [None]:
#change the conditions here
condition5 = (pets["Fixed"] == True) | (pets["Time to Adoption (weeks)"] < 5)

pets[condition5]

### Sorting 

In [None]:
#sorting by age - note the ascending parameter, try changing it to True
pets.sort_values("Age (years)", ascending=False)

In [None]:
#sort by two columns - first by age and then by time to adoption
pets.sort_values(["Age (years)","Time to Adoption (weeks)"])
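
When sorting by two columns, `ascending` can also be a list with one True/False per column. In this sketch on made-up rows, the oldest pets come first and ties on age are broken by the shortest adoption time:

```python
import pandas as pd

# Made-up rows (not the real pets data)
df = pd.DataFrame({
    "Age (years)": [3, 5, 3],
    "Time to Adoption (weeks)": [8, 2, 1],
})

# Age from high to low, then time to adoption from low to high
result = df.sort_values(
    ["Age (years)", "Time to Adoption (weeks)"],
    ascending=[False, True],
)
print(result)
```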

### Grouping and calculating summaries on groups

In [None]:
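#split data into groups based on all unique values in "Fixed" column
#first group is not fixed (False), second group is fixed (True)

#calculate average (mean) for every numeric column for both groups
#numeric_only=True skips text columns such as "Species", which have no average
pets.groupby("Fixed").mean(numeric_only=True)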
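
Grouping splits the rows by each unique value, calculates a summary for each group, and puts the results back together, with the group values becoming the row labels. A sketch with three made-up rows, small enough to check by hand:

```python
import pandas as pd

# Made-up rows (not the real pets data)
df = pd.DataFrame({
    "Fixed": [True, False, True],
    "Species": ["dog", "cat", "cat"],
    "Age (years)": [2, 6, 4],
})

# False group: ages [6] -> 6.0; True group: ages [2, 4] -> 3.0
means = df.groupby("Fixed").mean(numeric_only=True)
print(means)
```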

In [None]:
#other operations you can do on groups include
# min(), max(), and sum(), for example: pets.groupby("Fixed").max()

In [None]:
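#you can do multiple operations at once using the agg() function
#select_dtypes("number") keeps only numeric columns, since "mean" fails on text
pets.select_dtypes("number").groupby("Legs").agg(["mean","max"])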
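
`agg()` also accepts a dictionary, which lets you choose a different summary for each column. A sketch on made-up rows:

```python
import pandas as pd

# Made-up rows (not the real pets data)
df = pd.DataFrame({
    "Legs": [4, 4, 8],
    "Age (years)": [2, 6, 1],
    "Weight (lbs)": [30.0, 10.0, 0.5],
})

# Average age, but heaviest weight, for each number of legs
summary = df.groupby("Legs").agg({"Age (years)": "mean", "Weight (lbs)": "max"})
print(summary)
```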

---
### Exercise 4

1. Modify the cell below to calculate **max()** for every column grouped by "Species"
---

In [None]:
#modify this cell
pets.groupby("Species").max()

### Calculating number of rows by group

In [None]:
#using size() function to calculate number of rows by group
row_counts = pets.groupby("Gender").size()

#turn the groups back into a column and store the row counts in a new column "Count"
row_counts = row_counts.reset_index(name="Count")

row_counts
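
For counting rows in a single column, `value_counts()` is a shortcut that gives the same numbers, sorted from most to least common. A sketch comparing the two on made-up rows:

```python
import pandas as pd

# Made-up rows (not the real pets data)
df = pd.DataFrame({"Gender": ["female", "male", "female"]})

# The approach from the cell above: size() then reset_index()
counts = df.groupby("Gender").size().reset_index(name="Count")

# Shortcut: value_counts() on a single column
quick = df["Gender"].value_counts()

print(counts)
print(quick)
```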

---
### Exercise 5

1. Calculate the number of rows grouped by column "Species"
---

In [None]:
#using size() function to calculate number of rows by group
row_counts = pets.groupby("Species").size()

#turn the groups back into a column and store the row counts in a new column "Count"
row_counts = row_counts.reset_index(name="Count")

row_counts

Additional resources for pandas and DataFrames can be found [here](https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python) and 
[here](https://www.kaggle.com/grroverpr/pandas-cheatsheet).

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)