# Homework Notes - Pre-Week 2: Introduction to Python, data types, iterations, conditions

## Introduction

This section is for those who are not familiar with Python, or those who need a refresher in Python, especially in the Jupyter Notebook environment.

![P1](picture/P1.png)

Before starting this week's Introduction to Python, please ensure that your Jupyter Notebook environment is set up and ready to go.  "Ready to go" here means that you have run at least one code cell, such as <span style="color:red">print('Hello Jupyter')</span>, and tried a few Markdown cells using syntax such as <span style="color:red"># Header</span> and <span style="color:red">[HWU](https://www.hw.ac.uk)</span>.  If you are already familiar with Python, you can skip to the "Python Libraries and Useful Data Science Libraries" section.

## Python Primitive (Basic) Data Types

Why do we introduce the basic data types when we start to learn a programming language?  Simply put, a data type tells the computer how to interpret a value.  For a data scientist, understanding data types ensures that the data collected and used is in the appropriate format and that each value is represented as the program expects.

There are 5 basic data types in Python, which are:  
* **Integers**: Whole numbers, represented exactly.  In Python 3 an integer can be arbitrarily large, limited only by available memory, unlike languages that fix integers to a set number of bits.  (As an aside, integers can also be written in octal and hexadecimal notation.)  
* **Floating-Point Numbers**: Real numbers, stored approximately, so there is a trade-off with precision.  
* **Complex Numbers**: Numbers with a real and an imaginary component, which cannot be represented using a single real number.  
* **Strings**: A sequence of characters, for example a phrase or word.  
* **Boolean**: Represents two possible (binary) values, usually interpreted as True or False.  
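As a quick illustration of the five types listed above, the sketch below assigns one value of each type and asks Python what it is.  The variable names (`whole`, `real`, `z`, `word`, `flag`) are made up for this example and are not used elsewhere in these notes.

```python
# One hypothetical value per basic type; the names are illustrative only.
whole = 42        # int
real = 3.14       # float
z = 2 + 3j        # complex: real part 2, imaginary part 3
word = "data"     # str
flag = True       # bool

# type(value).__name__ gives the short name of each value's type.
for value in (whole, real, z, word, flag):
    print(value, type(value).__name__)

# Integers can also be written in hexadecimal (0x) or octal (0o) notation.
print(0xFF, 0o17)          # 255 15

# The floating-point precision trade-off: 0.1 + 0.2 is not exactly 0.3.
print(0.1 + 0.2 == 0.3)    # False
```

Running this in a notebook cell is a useful check that your environment behaves as described.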

We are mainly concerned with Boolean, Integers, Floating-Point and Strings.  Let us start with the primitive data types. In programming, we name a storage area and call it a variable. For example, we may have a variable called <span style="color:red">num</span> and we want to assign it the integer <span style="color:red">4</span>.  
<span style="color:red">num = 4</span>  
We have now stored the integer <span style="color:red">4</span> in a storage location (in the computer's memory) that we called <span style="color:red">num</span>.  We can check its type by using the function <span style="color:red">type</span>. (A function in programming is similar to a mathematical function: it is a sequence of instructions that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed.)  
<span style="color:red">type(num)</span>  
If we run the command above, the output should tell us that it is an <span style="color:red">int</span> (which stands for integer).  To ensure that we are all on the same page, your Jupyter Notebook should look something like the following:

![P2](picture/P2.png)

To illustrate the importance of understanding the data types, we will do a simple arithmetic addition by typing <span style="color:red">num + 5</span>.  This should return the value <span style="color:red">9</span>.

Let's then assign a real number (floating-point number) to a new variable named <span style="color:red">fp</span>, i.e., <span style="color:red">fp = 4.5</span>.  We then ask the computer to add the string <span style="color:red">"some"</span> to it.  It will give us an error (a <span style="color:red">TypeError</span>), because a number and a string cannot be added together.  Before you conclude that the <span style="color:red">'+'</span> sign is only for arithmetic addition, note that it is also used for "adding" two strings together, known as concatenation of strings.  You can do this using <span style="color:red">"some" + "thing"</span>  to get  <span style="color:red">'something'</span>.
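The two behaviours of `+` described above can be tried side by side.  The sketch below is only an illustration: it catches the error with `try`/`except` (not covered in these notes) so that the cell keeps running, and it uses `str()` as one possible way to make the combination work.

```python
fp = 4.5

# '+' between two strings concatenates them.
joined = "some" + "thing"
print(joined)              # something

# '+' between a float and a string is not defined, so Python raises a TypeError.
try:
    fp + "some"
except TypeError as err:
    print("TypeError:", err)

# Converting the number to a string first makes the intent explicit.
print(str(fp) + "some")    # 4.5some
```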

However, Python does allow integers and floating-point numbers to be mixed in arithmetic. Python is a dynamically typed language, so types are checked when the code runs; when the interpreter sees both numeric types in one expression, it implicitly converts the narrower type (in this case the integer) to the wider one (in this case the floating-point number) before performing the operation, so the result is a floating-point number.  
<span style="color:red">4.5 + 5</span>  
or  
<span style="color:red">num + fp</span>  
This should give you an introduction to data types and how they play a role in handling data.  By now, you should have noticed the In and Out numbering on the left side of your Jupyter Notebook.  If you haven't figured out what it means, come to class and ask.  We will also have fewer illustrations in the Canvas lecture and tutorial notes from here forward, as we expect you to be able to follow the notes without them (and execute the code on your own in your Jupyter Notebook).
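To make the mixing of integers and floating-point numbers concrete, the following sketch reuses the variables <span style="color:red">num</span> and <span style="color:red">fp</span> from above.  The name `total` is introduced here only for illustration.

```python
num = 4
fp = 4.5

# Adding an int and a float gives a float.
total = num + fp
print(total, type(total).__name__)              # 8.5 float

# The operands themselves keep their original types.
print(type(num).__name__, type(fp).__name__)    # int float

# Going the other way needs an explicit conversion; int() drops the fractional part.
print(int(fp))                                  # 4
```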

## Python Composite Data Types

In data science, we need to understand the following composite data types (composite in that they contain primitive data types internally).  Generally, <span style="color:red">list</span> and <span style="color:red">dict</span> (dictionary) are the most useful for data science.  We have composite data types in order to store data in a commonly agreed upon data structure, which has certain useful functions built into it.  This means that we do not need to create a data structure each time we want to do some processing.

* List - A data structure in Python (and also common in many other programming languages) that is changeable (in programming, we call this mutable), and is an ordered sequence of items. The term item here means the value that is contained inside the list data structure. Lists (in Python) are defined by having the items between square brackets <span style="color:red">[ ]</span>, i.e., <span style="color:red">[3,4,6,2,3]</span> is a list of 5 integer items.

* Dictionary - A data structure in Python that stores items in key-value pairs. A key is an identifier for the value stored, and the value is the item associated with that key. Do note that dictionaries in Python are mutable, and since Python 3.7 they preserve the order in which keys were inserted (earlier versions did not guarantee any order).  Dictionaries are created using curly brackets (also known as braces) <span style="color:red">{ }</span>. The key-value pairs within the dictionary are separated by commas (<span style="color:red">,</span>), and within the key-value pairs, the key and value are separated by a colon (<span style="color:red">:</span>), e.g., <span style="color:red">{1:'apple', 2:'ball', 3:'carrot', 4:'doll'}</span>.  You may find this useful as a database format for data science purposes. (As an aside, there are databases such as MongoDB that store data in this format and this is also commonly represented in data exchange formats such as JSON).
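A short sketch of both composite types in use, based on the example values above.  The variable names `numbers` and `words`, and the extra entries added to them, are assumptions made for this illustration.

```python
# A list is an ordered, mutable sequence of items.
numbers = [3, 4, 6, 2, 3]
numbers[0] = 10         # replace the first item in place (indexing starts at 0)
numbers.append(7)       # add an item at the end
print(numbers)          # [10, 4, 6, 2, 3, 7]
print(len(numbers))     # 6

# A dictionary maps keys to values.
words = {1: 'apple', 2: 'ball', 3: 'carrot', 4: 'doll'}
print(words[2])         # ball
words[5] = 'egg'        # add a new key-value pair (hypothetical entry)
words[1] = 'ant'        # overwrite the value stored under key 1
print(words)
```

Both structures are changed in place here, which is what "mutable" means in practice.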

You can read up on tuple and set composite data types.

## Python Iterations and Conditions

Now that we have introduced the data types that are commonly used in Python (we will introduce more, such as DataFrame later), we shall proceed to review (for most of  you anyway) programming. Programming can simply be considered as a sequence of instructions where there will be decisions to be made (conditions) and repetitive instructions (iterations).  When you think about it, this is similar to a (sad) person's working life where it's iterative on a daily basis until death; containing waking up, washing up, brushing teeth, getting dressed, having breakfast, catching the bus and so on.  Within it, there are decisions to be made, what to wear for the day, what to eat for breakfast and so on.  Hence, programming is not that much different from your daily life (even if you don't have a sad life!).  Let's look at conditions and iterations:

### Python Conditions

Simply put, the conditional statement <span style="color:red">P→Q</span> means that Q is true whenever P is true.  Python supports the usual mathematical conditions:  
* Equality: <span style="color:red">a == b</span> (do note that in most programming languages, we use the double '=' because a single '=' is usually an assignment)  
* Not Equals: <span style="color:red">a != b</span> (the exclamation mark '!' is sometimes referred to as "bang" or "pling")
* Greater than: <span style="color:red">a > b</span>, greater than or equal to: <span style="color:red">a >= b</span>  
* Less than: <span style="color:red">a < b</span>, less than or equal to: <span style="color:red">a <= b</span>
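Each of the comparisons listed above can be evaluated directly in a notebook cell.  In this small sketch, the values of `a` and `b` are arbitrary and `is_smaller` is a name introduced only for the example.

```python
a = 10
b = 20

print(a == b)   # False
print(a != b)   # True
print(a > b)    # False
print(a >= b)   # False
print(a < b)    # True
print(a <= b)   # True

# The result of a comparison is itself a value that can be stored in a variable.
is_smaller = a < b
print(type(is_smaller).__name__)   # bool
```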

A condition will result in a <span style="color:red">True</span> or <span style="color:red">False</span>, and we call this a <span style="color:red">boolean</span>.  For conditional statements, in Python we use the <span style="color:red">if</span> statement, where if a condition is met, a sequence of instructions is executed; otherwise (<span style="color:red">else</span>), <u>optionally</u>, another sequence of instructions is executed.  For example:

<span style="color:red">a = 10</span>  
<span style="color:red">b = 20</span>  
<span style="color:red">if b >= a:</span>  
<p style="margin-left: 40px;"><span style="color:red">print("b is greater than or equal to a")</span></p>


#### Indentation (Important for new Python programmers!)

The sharp-eyed reader will notice two things.  The first is the use of the colon (:) at the end of the if statement.  This signifies the end of the condition, which matters because we can also combine multiple conditions (in a Boolean algebra manner).  For example:

<span style="color:red">a = 10</span>  
<span style="color:red">b = 20</span>  
<span style="color:red">if b > a or b==a:</span>  
<p style="margin-left: 40px;"><span style="color:red">print("b is greater than or equal to a")</span></p>

The second is that the last line, <span style="color:red">print("b is greater than or equal to a")</span>, is <u>indented</u>.  Python relies on this indentation to define the scope of the instructions in the code.  It will become clearer as we look at the optional "<span style="color:red">else</span>" and "<span style="color:red">elif</span>" statements. Let's extend our sample code for this.

<span style="color:red">a = 10</span>  
<span style="color:red">b = 20</span>  
<span style="color:red">if b > a or b==a:</span>  
<p style="margin-left: 40px;"><span style="color:red">print("condition met")</span></p>  
<p style="margin-left: 40px;"><span style="color:red">print("b is greater than or equal to a")</span></p>  
<span style="color:red">else:</span>  
<p style="margin-left: 40px;"><span style="color:red">print("condition not met")</span></p>  
<p style="margin-left: 40px;"><span style="color:red">print("b is less than a")</span></p>  
<span style="color:red">print("outside the if or else scope")</span>  

The condition was met, and hence the 2 lines indented after the <span style="color:red">if</span> statement will execute, as they are within the scope of the <span style="color:red">if</span> statement, but the lines in the scope of the <span style="color:red">else</span> statement will not.  Do note that the last line will execute either way, as it is in the scope of the overall code.  We will leave you to explore the <span style="color:red">elif</span> statement.
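As a starting point for that exploration, here is a minimal sketch of <span style="color:red">elif</span> (short for "else if"), which lets us test a second condition only when the first one fails.  The variable `message` is introduced purely for this example.

```python
a = 10
b = 20

if b > a:
    message = "b is greater than a"
elif b == a:
    # Checked only when the first condition is False.
    message = "b is equal to a"
else:
    # Reached only when none of the conditions above is True.
    message = "b is less than a"

print(message)   # b is greater than a
```

Try changing the values of `a` and `b` to see each of the three branches run.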

### Python Iterations

There are 2 primitive ways to achieve iteration in Python: the while statement and the for statement.  Similarly to the if statement, a while statement executes a set of instructions (within its scope, i.e., indented) if a condition is met, but it then repeats them until the condition is no longer met (an if statement does not repeat). E.g.,

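<span style="color:red">i = 1</span>  
<span style="color:red">while i <= 10:</span>  
<p style="margin-left: 40px;"><span style="color:red">print(i)</span></p>
<p style="margin-left: 40px;"><span style="color:red">i += 1</span></p>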

The above will print the value of i 10 times, from 1 to 10, because i is incremented by 1 on each pass. We will leave further exploration of the while statement to you. The for statement in Python is used to iterate over a sequence of items, for example the items in a list, a dictionary, or a string (such an object is called an iterable).  Using the for loop, we can then repeat the execution of instructions for each item in the iterable.

For example, over a list of children's alphabet words (their A B Cs),

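<span style="color:red">alphabet = ["apple", "ball", "cat", "door"]</span>  
<span style="color:red">for x in alphabet:</span>  
<p style="margin-left: 40px;"><span style="color:red">print(x)</span></p>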

As a string type is a sequence of characters, the for loop can also iterate over it, e.g.,

<span style="color:red">for x in "apple":</span>  
<p style="margin-left: 40px;"><span style="color:red">print(x)</span></p>
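The for statement was also said to work over a dictionary.  The sketch below shows one way this might look, together with the built-in `range()`, which is a common way to count with a for loop.  The dictionary `words` and the collecting lists are assumptions made for this example.

```python
words = {1: "apple", 2: "ball", 3: "cat"}

# Iterating over a dictionary directly yields its keys.
keys_seen = []
for key in words:
    keys_seen.append(key)
print(keys_seen)            # [1, 2, 3]

# .items() yields (key, value) pairs, unpacked here into two names.
for key, value in words.items():
    print(key, value)

# range(1, 11) produces 1 up to and including 10, like the earlier while loop.
counted = []
for i in range(1, 11):
    counted.append(i)
print(counted)
```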

We have given a refresher for Python programming and there will be other programming keywords and methods to learn.  Among the interesting ones would be pass, break, continue and how to create Python functions.  We will leave that for now and move on with more Data Science related coding.
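For those who want a preview of the keywords just mentioned, here is a brief sketch.  It reuses the alphabet list from earlier; the function name `shout` and the collecting lists are hypothetical and exist only for this example.

```python
alphabet = ["apple", "ball", "cat", "door"]

# continue skips the rest of the current pass and moves on to the next item.
kept = []
for x in alphabet:
    if x == "ball":
        continue
    kept.append(x)
print(kept)               # ['apple', 'cat', 'door']

# break leaves the loop entirely.
before_cat = []
for x in alphabet:
    if x == "cat":
        break
    before_cat.append(x)
print(before_cat)         # ['apple', 'ball']

# pass is a placeholder where a statement is required but nothing should happen.
for x in alphabet:
    pass

# def creates a function; shout is a hypothetical name.
def shout(word):
    return word.upper() + "!"

print(shout("apple"))     # APPLE!
```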

## My code part

### Introduction

In [1]:
print("Hello World")

Hello World


# Header 1
## Header 2
### Header 3
#### Header 4
not a header

### Python Primitive (Basic) Data Types

In [15]:
num=4
type(num)

int

In [7]:
print(num+5)

9


In [18]:
fp=4.5
type(fp)

float

In [None]:
#fp=fp+"some"  # left commented out; uncommented, it raises the error below

TypeError: unsupported operand type(s) for +: 'float' and 'str'

In [11]:
print("some"+"thing")

something


In [20]:
print(num+fp)

8.5
