
# GEOG59950 Programming for Geographical Information Analysis: Core Skills <a class="tocSkip">

 #### Contact: F.L.Pontin@leeds.ac.uk <a class="tocSkip">

# Exercise 2: Getting started

- Data types

## Using Jupyter notebooks

Welcome to coding in Python! This course is designed to introduce you to the basics of coding in Python and then get you to apply your newfound coding skills to carry out some spatial data analytics! 
This session is written in a Jupyter Notebook. A Jupyter Notebook ["is an open source web application that you can use to create and share documents that contain live code, equations, visualizations, and text"](https://realpython.com/jupyter-notebook-introduction/). 

A Jupyter Notebook is made up of cells (the grey boxes). These cells can contain text (known as markdown), code or visualisations/images, to name a few. 
To select a cell, click anywhere in the box and a large blue box will appear around it. To edit a cell, double-click it and the surrounding box will turn green, as shown below.

![image.png](../visualisations/screenshots/blue_box.PNG)

![image.png](../visualisations/screenshots/green_box.png)

During the teaching session you will read through the Jupyter Notebook, running sections of code yourself to learn key data analysis skills. There will also be places to edit and write your own code within the notebook.

* ##### <font color='orchid'>Instructions and tasks for you to complete are in purple</font> 
    
* ##### Where you have to write your own code, answers are provided at the end of the section 

#### How to run a cell of code
To run code, select the cell you want to run (the blue box will appear around the selected cell) and then EITHER:

1) press CTRL + ENTER on your keyboard (SHIFT + ENTER also runs the cell and then moves on to the next one)

OR

2) click the play button in the Jupyter Notebook menu above 

![image.png](attachment:image.png)

### How do I know if I have run the cell?
If the cell of code has been run, a number will appear in square brackets by the cell e.g. <code>In [1]: or In [12]:</code>

The numbering refers to the order in which you have run the cells.

An un-run cell of code has an empty set of square brackets by the cell i.e. <code>In []:</code>

### How do I add a new cell?<a id='new_cell'></a>

To add a new cell, select a cell (so it is surrounded by a blue box) and type 'b' to add a cell below or 'a' to add a cell above.

To delete a cell select it and double tap 'd'.

## Hello World

Following coding tradition, the first thing we are going to program is printing the words "Hello World".

The <code>print() </code> function prints the specified message to your screen.

The speech marks <code>" "</code> around Hello World let Python know you are typing text (also known as a string).

 <font color='orchid'> <b>Run the code below</b></font>

In [None]:
print("Hello World")
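Python accepts either double or single speech marks around a string, as long as the opening and closing marks match. A minimal sketch to try out:

```python
# double and single speech marks both create a string
print("Hello World")
print('Hello World')

# the two strings are identical, so this comparison prints True
print("Hello World" == 'Hello World')
```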

## Basic maths

Let's do some basic maths 
<br><font color='orchid'> <b>Run the code below</b></font>

In [None]:
2+2

If we print 2+2 we get the same thing
<br> <font color='orchid'> <b>Run the code below</b></font>

In [None]:
print(2+2)

If we <code>print("2+2")</code>, however, we get the characters 2+2, as the speech marks tell Python we are typing text and not numbers


<br> <font color='orchid'> <b>Run the code below</b></font> and see for yourself

In [None]:
print("2+2")
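One way to confirm the difference is to ask Python for the type of each value using the built-in <code>type()</code> function. The sketch below uses the illustrative names <code>as_number</code> and <code>as_text</code>, which are not part of the exercise.

```python
# without speech marks, 2+2 is worked out as a sum
as_number = 2+2

# inside speech marks, 2+2 is stored as text (a string)
as_text = "2+2"

# print each value alongside its data type
print(as_number, type(as_number))
print(as_text, type(as_text))
```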


## Assigning values to variables

We can also assign the sum 2+2 to a variable. 

In this case we have named the variable <font color='blue'>answer</font>. We can now type <font color='blue'>answer</font> instead of 2+2 every time.

<br> <font color='orchid'> <b>To assign 2+2 to the variable answer run the code below</b></font>

In [None]:
answer = 2+2
print(answer)

We can also add our created variable to a new equation
<br> <font color='orchid'> <b>To add 5 to the variable answer run the code below</b></font>

In [None]:
# we can also add our created variable to a new equation
answer + 5

NOTE: <code>#</code> When a hash symbol is added to a line of code, the rest of that line is not treated as code.
This can be used to add comments to your code so you know what it is doing (especially useful when you come back to it later on!)
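As a small sketch, the cell below places one comment on its own line and others at the end of a line of code. It also shows that working out <code>answer + 5</code> does not change <code>answer</code> itself. The name <code>bigger_answer</code> is only an illustrative choice.

```python
# a comment on its own line
answer = 2+2

bigger_answer = answer + 5  # a comment can also follow code on the same line

print(answer)         # answer still holds the result of 2+2
print(bigger_answer)  # the new sum is stored under a separate name
```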

## Lists

Remember, lists are an ordered collection of data items, defined by square brackets <code>[]</code>

<br> <font color='orchid'> <b>Look back at the lecture notes and create a list called fruit with the following data items: 'apple', 'pear', 'banana', 'strawberry' (note these are strings)</b></font>

In [None]:
# Type code here

We can check that you have created a list using the <code>type</code> function

<br> <font color='orchid'> <b>Enter <code> type(fruit) </code> into the cell below</b></font>

In [None]:
# run this code


### Lists and basic maths

We can also create a list of numbers.  <br> <font color='orchid'> <b>Run the code below </b></font>. What happens when we try to multiply the list by 3?

In [None]:
numbers = [1,3,5,7,9,11]
numbers*3

### List comprehension

To multiply each item in the list (and create a new list of the results) we need to select each value <code>i</code> in the list:
<code>[i*3 for i in numbers]</code>

I.e. for each element <code>i</code> in the list named <code>numbers</code>, multiply that element by 3

<br> <font color='orchid'> <b> Run the code to see</b></font>

In [None]:
# multiply element by 3, repeat for every element in the list 'numbers'
new_numbers = [i*3 for i in numbers]

# print the new list
print(new_numbers)

<div class="alert alert-block alert-warning">

### A very quick introduction to for loops

Making a new list from an existing list in a single line, as we did above, is known as a list comprehension. An alternative is a for loop, as shown below.

We will come back to these later, so for now just <font color='orchid'> <b> run the code and read the comments explaining what each step does. </b></font>

In [None]:
# create a new empty list
my_new_list = []

# for element in the list 'numbers': 
for i in numbers:
    # multiply element by 3 and append the result to the new empty list        
    my_new_list.append(i * 3)

# print the new list
print(my_new_list)
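If you want to convince yourself that the two approaches give the same result, a quick check along these lines should work. The list <code>numbers</code> is redefined here so the cell runs on its own, and <code>from_comprehension</code> and <code>from_loop</code> are illustrative names rather than part of the exercise.

```python
numbers = [1,3,5,7,9,11]

# list comprehension version
from_comprehension = [i*3 for i in numbers]

# for loop version
from_loop = []
for i in numbers:
    from_loop.append(i * 3)

# == compares the two lists item by item
print(from_comprehension == from_loop)
```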

We can also multiply by another defined variable. For example we previously defined the variable <code>answer</code> (<code>answer = 2+2</code>)

In [None]:
# multiply element by variable 'answer', repeat for every element in the list 'numbers'
[i*answer for i in numbers]

Commonly you will see <code>i</code> and <code>j</code> used to refer to elements in a list. However, this is just convention and you could use anything you wanted to refer to the elements in the list e.g. <code>elephants</code>

(Though typing <code>i</code> is a lot quicker)

In [None]:
# multiply element by 3, repeat for every element in the list 'numbers'
[elephants*3 for elephants in numbers]

## First look at data frames

### .head() and .tail() functions
To understand how to explore data frames and different data types in Python we are going to use a set of data about passengers on the Titanic. This is an example dataset built into the seaborn Python package. 

We will go into detail about reading in data and loading packages in the next exercise, <font color='orchid'> <b>for now run the cell of code below. </b></font>

Note the <code>.head()</code> function shows the top 5 lines of the data frame. 

In [None]:
# Import the seaborn package
import seaborn as sns

# load the titanic example dataset and save it as a dataframe named titanic
titanic = sns.load_dataset('titanic')

# look at the first 5 rows of the dataframe
titanic.head()

Note NaN denotes a cell containing no data - a null cell

<font color='orchid'> <b> Try entering and running <code>titanic.tail()</code> instead of <code>.head()</code> in the cell below. </b></font> What view of the dataframe do you think you are now seeing? 

In [None]:
# enter the instructed code here



    
<I> A quick description of the titanic data variables:
- <b>survived:</b>    If the passenger survived
- <b>PassengerId:</b> Unique Id of a passenger. 
- <b>pclass:</b>    Ticket class
- <b>sex:</b>   Sex     
- <b>age:</b>   Age in years     
- <b>sibsp:</b>    Number of siblings / spouses aboard the Titanic     
- <b>parch:</b>   Number of parents / children aboard the Titanic     
- <b>ticket:</b>   Ticket number     
- <b>fare:</b>   Cost of the passenger fare     
- <b>cabin:</b>  Cabin number     
- <b>embarked:</b>    Port of Embarkation</I> 

Note: this list follows the original Titanic passenger records, so a few names differ from the seaborn version loaded above (which, for example, has no PassengerId, ticket or cabin columns).

### Data frame columns

We might just want to get a list of the columns in the dataframe to give us a quick idea of what data we have present. To do this we can type <code>.columns</code> after the name of the dataframe.

<font color='orchid'> <b> Enter <code>titanic.columns</code> in the cell below. </b></font> The columns listed should be the same as the columns in the <code>.head()</code> view of the dataframe. 


In [None]:
# enter the code here


#### Referring to a column in a dataframe

If we want to select a single column of the dataframe we can also do that.

There are several ways to refer to a column in a pandas dataframe. 

The easiest way is by putting the name of the column in square brackets and speech marks <code>[" "]</code> after the name of the dataframe.
e.g. <code><font color='blue'>dataframe_name</font>["<font color='blue'>column_name</font>"]</code>
    
<font color='orchid'> <b> Try to select the 'fare' column from the titanic dataframe. </b></font>

In [None]:
# enter the code here


*Note: To save space only a snapshot of the column is shown and not all the rows

### Data frame index
When we type <code> dataframe_name["column_name"]</code> we get the values of the column but we also see the index (the row names). In this case the rows are just numbered 0 to 890, but these could be other values such as passenger names.

Python indexing starts at 0 not 1. So the first row is row 0 and the first column column 0. 
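The same counting rule applies to ordinary Python lists, which makes it easy to try out. A minimal sketch, using a made-up list called <code>passengers</code> that is not part of the titanic data:

```python
# a made-up list, used only to illustrate counting from 0
passengers = ['Ann', 'Ben', 'Cat']

print(passengers[0])    # the first item is at position 0
print(passengers[2])    # the third (and last) item is at position 2
print(len(passengers))  # three items in total, numbered 0, 1 and 2
```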

<code>.index</code> works the same as <code>.columns</code> but this time shows the row names. 

<font color='orchid'> <b> Run <code>titanic.index</code> in the cell below </b></font> 

In [None]:
# enter the code here


We can see the index starts at 0, stops at 891 and increases by a step of 1 for each row. The stop value itself is not included, so the 891 entries are numbered 0 to 890.

### Data frame shape
To get the number of rows and columns of a dataframe we can use <code>.shape</code>

<font color='orchid'> <b> Run <code>titanic.shape</code> in the cell below </b></font>

In [None]:
# enter the code here
titanic.shape
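<code>.shape</code> gives the number of rows first and the number of columns second. The sketch below builds a tiny made-up dataframe with pandas (named <code>toy</code>, not part of the exercise data) so the result is easy to check by eye. The names <code>n_rows</code> and <code>n_columns</code> are illustrative.

```python
import pandas as pd

# a tiny made-up dataframe with 3 rows and 2 columns
toy = pd.DataFrame({
    'name': ['Ann', 'Ben', 'Cat'],
    'fare': [7.25, 71.28, 8.05],
})

# .shape holds two values: rows first, then columns
n_rows, n_columns = toy.shape
print(n_rows, n_columns)
```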

## Data Types

### Data types recap

As we covered in the lecture there are different types of data: 

<b>Objects:</b> also known as strings, or written characters/text in plain English, e.g. Hello World

<b>Integers:</b> Whole numbers e.g. 2, 57 or 109567835

<b>Floats:</b> A number with a decimal place e.g. 2.34534, 5.5 or 1.0

<b>Boolean:</b> True or False data type
<br>

<b>Datetime:</b> Values that are either a date, time or both e.g. 2019-10-31 09:26:03.478039 (9:26 am on Halloween 2019)

<b>Category:</b> A finite list of text values e.g. London, Paris, Berlin, Rome (there are a finite number of capital cities)

Learn more about Python data types using this Real Python [online resource](https://realpython.com/python-data-types/)
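To see some of these types in plain Python, the built-in <code>type()</code> function can be applied to a few example values. This is only a sketch: the list <code>examples</code> is made up for illustration, and plain Python names some types a little differently from a dataframe (for example, text is reported as <code>str</code>).

```python
from datetime import datetime

# one made-up example value for each of several data types
examples = ["Hello World", 57, 2.34534, True, datetime(2019, 10, 31, 9, 26, 3)]

for value in examples:
    # type() reports the data type Python has given to each value
    print(value, type(value))
```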
<br>
<br>
<br>

### Checking the data type

We can check the data type of each column in a dataframe using <code>.dtypes</code> 
<br> <font color='orchid'> <b>Run the code below</b></font> and have a look at the data type of each of the columns. Are they all as you expected?

In [None]:
titanic.dtypes

<code>.info()</code> gives us slightly more information including: 
- data types: (<code>.dtypes</code>)
- non-null counts: number of rows containing non-null values
- memory usage: how much computer memory the table uses (useful to know when a large table might slow down or stop your code running)

If a column has fewer non-null values than the total number of rows this indicates that data might be missing. 
<br> <font color='orchid'> <b>Run the code below</b></font>

In [None]:
titanic.info()
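If you would rather count the missing values directly, pandas dataframes also offer <code>.isna()</code>, which is not covered elsewhere in this exercise. The sketch below uses a tiny made-up dataframe (<code>toy</code>) rather than the titanic data, and <code>missing_per_column</code> is an illustrative name.

```python
import pandas as pd

# a tiny made-up dataframe with one missing age
toy = pd.DataFrame({
    'name': ['Ann', 'Ben', 'Cat'],
    'age': [22.0, None, 35.0],
})

# isna() marks missing cells as True; sum() then counts them in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)
```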

## Extra tasks

If you have time at the end of the session, run the code below to load the penguins dataset and answer the following questions using your newfound coding skills. 
    
- How many columns and rows does the penguins data frame have?
- What are the data types of the different columns in the penguins dataframe?
- Is there any data obviously missing from the dataframe?
- Can you use markdown to produce a list of the column names and variable types for the penguins data set, as I have done above for the titanic data set?

Work through the following:
- https://www.programiz.com/python-programming/comments
- https://www.programiz.com/python-programming/variables-constants-literals
- https://www.programiz.com/python-programming/variables-datatypes 
- https://www.programiz.com/python-programming/operators

In [None]:
# load the penguins example dataset and save it as a dataframe named penguins
penguins = sns.load_dataset('penguins')

In [None]:
# How many columns and rows does the penguins data frame have?
penguins

In [None]:
# What are the data types of the different columns in the penguins dataframe?


In [None]:
# Is there any data obviously missing from the dataframe?


Use the following code to get a list of other datasets you can explore:
<code>sns.get_dataset_names()</code>. Can you produce a summary of what the data is showing and identify where data is missing? 

You might need to google the dataset to understand the variable names. You might want to create a markdown data summary similar to the one I created for the titanic dataset:

    
<I> A Quick description of the titanic data variables:
- <b>survival:</b>    If the passenger survived
- <b>PassengerId:</b> Unique Id of a passenger. 
- <b>pclass:</b>    Ticket class
- <b>sex:</b>   Sex     
- <b>Age:</b>   Age in years     
- <b>sibsp:</b>    Number of siblings / spouses aboard the Titanic     
- <b>parch:</b>   Number of parents / children aboard the Titanic     
- <b>ticket:</b>   Ticket number     
- <b>fare:</b>   Cost of the passenger fare     
- <b>cabin:</b>  Cabin number     
- <b>embarked:</b>    Port of Embarkation</I> 

In [None]:
sns.get_dataset_names()


## Answers 
Answers to the sections where you enter your own code. 

<b>Lists <font color='orchid'> Create a list called fruit with the following data items: 'apple', 'pear', 'banana', 'strawberry' (note these are strings)</font></b>
 
 ![fruit.png](attachment:fruit.png)   

<b> Lists and basic maths<font color='orchid'> What happens when we try to multiply the list by 3? </font> </b> <br>
The list repeats 3 times:
![number_list.png](attachment:number_list.png)
    
<b>Data frame columns <font color='orchid'> Enter titanic.columns in the cell below. </font> </b>
![titanic_columns.png](attachment:titanic_columns.png)
    
    
<b>Referring to a column in a dataframe <font color='orchid'> Select the 'fare' column from the titanic dataframe.</font> </b>
    
![titanic_fare.png](attachment:titanic_fare.png)
    
    

*Answers are screenshots of code so cannot be copied and pasted. Type out the code yourself if you get stuck. 