# Python for Data Analytics - week 1

Acknowledgement: This notebook is based on open teaching materials from the World Bank, https://github.com/worldbank

### Getting started in Colab:
* Make sure you're signed into your Google account. Click connect in the top right corner.

#### Preliminary exercise - get to know Colab

Notebooks comprise two types of cells:
* _Code cells._ These contain executable commands in Python.
* _Text cells._ These include plain text, or you can use [markdown](https://commonmark.org/help/) to add formatting.

__EXERCISE:__ Spend a couple of minutes learning to navigate Colab. Perform the following:
 * Add a new code cell, first by point-and-click method, then using the keyboard shortcut.
 * Write your first program: print("hello world!")
 * Run the program two ways: using CTRL-ENTER and SHIFT-ENTER. Note the difference in which cell is selected.
 * Delete the code cell when you're finished with it.

#### Keyboard shortcuts

Action | Colab Shortcut
---|---
Execute current cell | `<CTRL-ENTER>`
Execute current cell and move to next cell | `<SHIFT-ENTER>`
Insert cell above | `<CTRL-M> <A>`
Append cell below | `<CTRL-M> <B>`
Convert cell to code | `<CTRL-M> <Y>`
Convert cell to Markdown | `<CTRL-M> <M>`
Delete cell | `<CTRL-M> <D>`
Autocomplete | `<TAB>`
Go from edit to "command" mode | `<ESC>`
Go from "command" to edit mode | `<ENTER>`
<p align="center"><b>Note:</b> On OS X use `<COMMAND>` instead of `<CTRL>`</p>

### 1. Variables and math in Python

#### 1.1 Math operators

In [1]:
# add two integers
2 + 2

4

In [2]:
# multiply two integers
2 * 2

4

In [3]:
# spaces don't matter here, but keep them consistent (PEP8 good practice)
2*3   +   10

16

In [4]:
# divide two integers
6 / 3

2.0

In [5]:
# raise 2 to the 4th power
2 ** 4

16

In [6]:
# the modulo operator (%) returns the remainder after division. Useful to check divisibility (among other things)
10 % 3

1

| Symbol | Task Performed |
|----|---|
| +  | Addition |
| -  | Subtraction |
| /  | Division |
| *  | Multiplication |
| **  | To the power of |
| %  | Modulo (remainder) |
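
As a quick check of the table above, the sketch below evaluates each operator once and then shows the usual precedence rules (powers first, then multiplication and division, then addition and subtraction). The variable names are illustrative only.

```python
# one expression per operator from the table
total = 7 + 3         # addition -> 10
difference = 7 - 3    # subtraction -> 4
quotient = 7 / 2      # division always returns a float -> 3.5
product = 7 * 3       # multiplication -> 21
power = 2 ** 5        # 2 to the power of 5 -> 32
remainder = 7 % 3     # remainder after dividing 7 by 3 -> 1

# precedence: ** binds tighter than *, which binds tighter than +
mixed = 2 + 3 * 2 ** 2        # 2 + (3 * 4) -> 14
grouped = (2 + 3) * 2 ** 2    # parentheses change the order -> 20

print(total, difference, quotient, product, power, remainder)
print(mixed, grouped)
```

When in doubt about the order of evaluation, add parentheses: they cost nothing and make the intent obvious.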

#### 1.2 Working with variables

In [7]:
# variables, such as x here, contain values and their values can vary
x = 5

In [8]:
# to inspect a value, just call it
x

5

In [9]:
# you can perform calculations on variables
x + 3

8

In [10]:
# what's the value of x now?
x

5

In [11]:
# to update the value of a variable, you have to do assignment again
x = x + 3

In [12]:
# now what's the value of x?
x

8

In [13]:
# create a new variable y from a calculation involving x
y = x + 2
y

10

In [14]:
# to modify a variable in place through addition or subtraction, use the shorthand += or -=
x += 10
x

18

In [15]:
# calling two variables only displays the last one
x
y

10

In [16]:
# use the print() function to output value(s) to the console
print(x)
print(y)

18
10


In [17]:
# separate two values by commas to output on the same line
print(x,y)

18 10


In [18]:
# you can also print the output of an expression
print(x * y)

180


NOTE: Use valid variable names!
* Variable names can contain letters, numbers, and the underscore character.
* You can't begin variable names with a digit, or use any of Python's _reserved words_ (e.g. False, None, else, class, ...).
* Avoid reusing the names of built-in functions such as list or zip. Python allows it, but the built-in is then hidden behind your variable.
* Don't use a space in the middle of a variable name.

| result | variable name |
|----|----|
| Valid | my_float, xyz_123, zip_code |
| Error! | my float, 123_xyz, class |
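
If you are unsure whether a name is allowed, you can ask Python. The sketch below is one possible check, using the string method `isidentifier()` and the standard library's `keyword.iskeyword()`; the list of candidate names and the `results` dictionary are made up for illustration.

```python
import keyword

# candidate names to test (illustrative only)
candidates = ['my_float', 'xyz_123', 'zip_code', 'my float', '123_xyz', 'class', 'zip']

results = {}
for name in candidates:
    # usable as a variable name: a valid identifier that is not a reserved word
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, '->', results[name])
```

Note that `zip` passes this check: it is the name of a built-in function rather than a reserved word, so assigning to it is legal but hides the built-in.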

#### 1.3 Getting help and using tab complete

In [19]:
# get IPython help on an object by putting ? after its name
len?

In [21]:
# use tab complete to fill in the rest of statements, functions, methods
prin

NameError: name 'prin' is not defined

In [22]:
# also use it to complete variables or functions that you defined yourself
name_of_course = "Python for Data Analytics"

In [None]:
name_of_cou

### 2. Basic data types: int, float, string, Boolean
These object types are the most basic building blocks when handling data in Python. Note that Python is an object-oriented language. Each object has a type, which determines what can be done with it. For instance, an object of type _int_ can be added to another _int_.

In statically typed languages like C++, the programmer has to declare the type of any variable before using it. By contrast, Python will **infer the type of variable you want** at run-time. It does this from the way you write the value: for example, whether it contains a decimal point, or is surrounded by quote marks or brackets. This keeps the syntax much more 'natural' - but take care to learn the rules your Python interpreter applies.
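
A small sketch of what this means in practice: the same variable name can be pointed at values of different types, and Python works out the type from how each value is written. The name `value` is just a placeholder.

```python
value = 10            # digits only -> int
type_1 = type(value)

value = 10.0          # a decimal point -> float
type_2 = type(value)

value = '10'          # quote marks -> str
type_3 = type(value)

value = [10]          # square brackets -> list
type_4 = type(value)

print(type_1, type_2, type_3, type_4)
```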

In [None]:
# integers are whole numbers
x = 10
type(x)

In [None]:
# floats are floating point (or decimal) numbers
y = 4.25
type(y)

In [None]:
# strings are sequences of characters, denoted by single or double quotes
course_name = 'Python for Data Science'
type(course_name)

In [None]:
# the possible values for a Boolean are True or False

my_enrollment_status = True
type(my_enrollment_status)

In [None]:
# use isinstance to check an object's type (answer is a Boolean)
isinstance(course_name, int)

#### 2.1. Manipulating strings

In [None]:
# this is a string. It can be assigned to a variable. 
mystring = 'I am a string. Humans can interpret me easily'

In [None]:
# we can print this string as follows:
print(mystring)

In [None]:
# Data types are defined as classes. Classes have methods attached to them, which you can access with dot notation.
# Example: strings have a method '.split()' that returns a list of component parts. 

mystring.split('.')

In [None]:
# we can use this to access just part of the string
split_up_list = mystring.split('.')
split_up_list[1]

In [None]:
# strings are sequences - we can pull out single characters or chunks by indexing and slicing:
print(mystring[0])
print(mystring[3])

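In [None]:
# move the start of the slice forward one character at a time (the stop stays at 15)
for q in range(0, 15):
    print(mystring[q:15])

In [None]:
# move the end of the slice back from the end of the string
# note: when q is 0 the slice is [15:-0], i.e. [15:0], which is empty
for q in range(0, 15):
    print(mystring[15:-q])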
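
The two loops above rely on slice notation, `mystring[start:stop]`, where `start` is included and `stop` is not. The sketch below spells out a few common patterns on the same string; the variable names on the left are illustrative.

```python
mystring = 'I am a string. Humans can interpret me easily'

first_chunk = mystring[0:4]      # positions 0, 1, 2, 3 -> 'I am'
one_word = mystring[7:13]        # positions 7 to 12 -> 'string'
from_15 = mystring[15:]          # leave out stop to run to the end
last_six = mystring[-6:]         # negative positions count back from the end
every_other = mystring[7:13:2]   # an optional third value is the step -> 'srn'
empty = mystring[15:-0]          # -0 is just 0, so stop comes before start

print(first_chunk)
print(one_word)
print(from_15)
print(last_six)
print(every_other)
print(repr(empty))
```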

In [None]:
# the .replace() method is handy too. This operation can be chained for entertainment value:
print(mystring)

new_string = mystring.replace("I am","Nicholas is").replace("string", "human").replace("Humans", "Other humans").replace("me","him")

print(new_string)
print(new_string.replace('Nicholas', 'Charles').replace('easily','from time to time'))

In [None]:
# strings can be added (concatenated) together
add_chunk = '. I love strings'
mystring + add_chunk

In [None]:
# ... but not subtracted
subtract_bit = 'easily.'
mystring - subtract_bit

In [None]:
# the backslash is special in strings - it is called an escape character. 
# It does a number of different things depending on the next letter:
print('using \n generates a new line!')

In [None]:
print('using \t generates a tab!')

In [None]:
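# you can tell Python to treat backslashes literally by adding 'r' to the start of a string (a "raw" string).
# without the r, Python tries to read \U below as the start of an escape sequence and raises a SyntaxError: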
string_will_fail = 'C:\Users\charl\Documents\CE\RAM\OneDrive_1_3-6-2019\QXN\RN'

In [None]:
string_will_work = r'C:\Users\charl\Documents\CE\RAM\OneDrive_1_3-6-2019\QXN\RN'

In [None]:
# the traditional way to format strings with variables is through % notation
# (%d formats an integer; %f would print 10.000000)
fave_number = 10
print('my favourite number is: %d' % fave_number)

In [None]:
import time 
the_time = time.ctime()
print('the date and time is currently: %s' % the_time)

In [23]:
# be aware of the letter after the first percent sign - it changes the nature of the string formatting:
q = 22 / 7

print('%d' % q)    # d: decimal integer (the fraction is dropped)
print('%f' % q)    # f: float
print('%e' % q)    # e: exponential
print('%s' % q)    # s: string

3
3.142857
3.142857e+00
3.142857142857143


In [27]:
# a more recent and easy-to-use way to print variables is with f-strings (Python 3.6+)
# to keep your outputs neat, you might want to limit the decimal places

print(f"My output: {q}")
print(f"My output: {q:.3f}")

My output: 3.142857142857143
My output: 3.143
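
For comparison, here is a sketch of the same output produced in the three styles you are likely to meet in other people's code: % notation, the `.format()` method, and an f-string. The variable names are illustrative; all three use the same `.3f` format code to limit the result to three decimal places.

```python
q = 22 / 7

percent_style = 'My output: %.3f' % q           # % notation
format_style = 'My output: {:.3f}'.format(q)    # .format() method
fstring_style = f'My output: {q:.3f}'           # f-string

print(percent_style)
print(format_style)
print(fstring_style)
```

All three lines print `My output: 3.143`; which style to use is mostly a matter of readability.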


#### 2.2 Converting between types
Often you need to convert variables to other types, especially to make them work together. Use the _int()_, _str()_ or _float()_ functions to convert to these data types.

In [None]:
# sometimes Python will change a variable's data type for you. Take this variable:
my_salary = 500000
print("Variable my_salary has value {} and type {}".format(my_salary, type(my_salary)))

In [None]:
# now divide it by another integer. The result is necessarily a float:
daily_rate = my_salary / 365
print("Variable daily_rate has value {} and type {}".format(daily_rate, type(daily_rate)))

In [None]:
# changing a float to an integer lops off everything after the decimal place
int(daily_rate)

In [None]:
# rounding to the nearest whole number gives 1370 instead
round(daily_rate)

In [None]:
# you can't concatenate a string and an integer

address = "1808 H ST NW, DC"
WB_zip = 20037

address + " " + WB_zip

In [None]:
# instead, change the integer to a string first
WB_zip = str(WB_zip)
type(WB_zip)

In [None]:
# does it work now?
address + " " + WB_zip
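
To pull the conversion rules together, the sketch below shows `int()`, `round()`, `float()` and `str()` side by side. The values are arbitrary examples. Two details are worth knowing: `int()` cuts towards zero rather than rounding down, and `round()` sends exact halves to the nearest even number.

```python
truncated_pos = int(3.9)        # 3: everything after the decimal point is dropped
truncated_neg = int(-3.9)       # -3: truncation goes towards zero
rounded = round(3.9)            # 4
half_down = round(2.5)          # 2: ties go to the nearest even number
half_up = round(3.5)            # 4

from_text_int = int('42') + 1       # text to integer, then arithmetic -> 43
from_text_float = float('3.5') * 2  # text to float -> 7.0
to_text = str(20037) + '-0001'      # integer to text, then concatenation

print(truncated_pos, truncated_neg, rounded, half_down, half_up)
print(from_text_int, from_text_float, to_text)
```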

### 3. Data types: lists, tuples and dictionaries
Ints, floats and strings are the most basic data types (think of them as atoms). Next, we'll look at lists, tuples and dictionaries: containers that hold other data and let you combine it in more complex ways (think molecules rather than atoms).

|Data structure | Properties| Syntax|
|----|----|----|
|List | Ordered, mutable sequence | mylist = [1,2,3] |
|Tuple | Ordered, immutable sequence | mytuple = (1,2,3) |
|Set | Unordered collection of unique values | myset = {1,2,3} |
|Dictionary | Mutable set of key, value pairs | mydict = {'first_value':1, 'second_value':2} |
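
The properties in the table can be checked directly. The following is a minimal sketch with made-up values; it uses `try` / `except` (covered in section 4) to catch the error Python raises when you try to change a tuple.

```python
# list: ordered and mutable, so an item can be replaced in place
mylist = [1, 2, 3]
mylist[0] = 99

# tuple: ordered but immutable, so the same operation raises a TypeError
mytuple = (1, 2, 3)
try:
    mytuple[0] = 99
except TypeError as err:
    tuple_error = str(err)

# set: written with curly braces; duplicates are dropped
myset = {1, 2, 2, 3, 3, 3}

# dictionary: values are looked up (and added) by key
mydict = {'first_value': 1, 'second_value': 2}
mydict['third_value'] = 3

print(mylist)
print(tuple_error)
print(myset)
print(mydict)
```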


#### 3.1 Manipulating lists
Lists are ordered sequences denoted by square brackets. They're helpful when your data has an order, and may need to be changed in place. You can put strings, floats, integers, or any of Python's more complex data types into a list.

In [None]:
# define a list with []
weekdays = ['monday','tuesday','wednesday','thursday','friday']
weekdays

In [None]:
# get an item using [offset]
weekdays[3]

In [None]:
# check the type of items in a list
type(weekdays[3])

In [None]:
# change an item using mylist[offset]
weekdays[3] = 'thursday - remember Python class!'
weekdays

In [None]:
# slicing: extract items by offset range
weekdays[2:4]

In [None]:
# add an item to a list with append()
weekdays.append('saturday')
weekdays

In [None]:
# test for a value in your list
'saturday' in weekdays

In [None]:
# use .remove() to clean up the weekdays list

weekdays.remove('saturday')
weekdays

In [None]:
# concatenate two lists
odds = [1,3,5]
evens = [2,4,6]
all_nums = odds + evens
all_nums

In [None]:
# Lists, like other data types, have methods associated with them. These are accessed through dot notation.
# Use tab complete to find helpful methods! 

all_nums.sor
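
One method that tab complete will suggest here is `.sort()`. The sketch below contrasts it with the built-in `sorted()` function, since the two are easy to confuse: `.sort()` reorders the list in place and returns `None`, whereas `sorted()` leaves the original alone and returns a new list. The variable names are illustrative.

```python
all_nums = [1, 3, 5, 2, 4, 6]

ordered_copy = sorted(all_nums)   # new sorted list; all_nums is unchanged
snapshot = list(all_nums)         # copy of the list before sorting in place

result = all_nums.sort()          # sorts in place and returns None
all_nums.reverse()                # also in place: largest first

print(snapshot)
print(ordered_copy)
print(result)
print(all_nums)
```

A common slip is writing `all_nums = all_nums.sort()`, which replaces the list with `None`.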

### 4. Intro to logic and control flow

Definition of **control flow**:
* In a simple script, program execution starts at the top and executes each instruction in order. 
* **Control flow** statements can cause the execution to loop and skip instructions based on conditions.

#### 4.1 Loops and iterables
Definition: an **iterable** is an object capable of returning its members one at a time. Strings, lists and dictionaries are all iterables.

A **for loop** runs a block of code repeatedly "for" each item in an iterable. End the declaration with : and indent the subsidiary code.

In [None]:
for color in ['red','green','blue']:
    print("I love " + color)

In [None]:
# or characters in a string
for letter in 'abcd':
    print(letter.upper())

In [None]:
# the range() function produces a handy sequence of integers to loop over (here 0, 1, 2, 3, 4)
for n in range(5):
    print("I ate {} donuts".format(n + 1))
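
Dictionaries were listed among the iterables above, so here is a short sketch of looping over one. Looping over a dictionary directly yields its keys; the `.items()` method yields key and value together. The `prices` data is invented for the example.

```python
prices = {'donut': 1.5, 'coffee': 3.0}   # hypothetical data

# looping over the dictionary itself gives the keys
keys_seen = []
for item in prices:
    keys_seen.append(item)

# .items() gives (key, value) pairs
lines = []
for item, price in prices.items():
    lines.append("{} costs {}".format(item, price))

print(keys_seen)
print(lines)
```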

In [None]:
# a while loop repeats a block of code for as long as its condition remains True

a = 0
while a < 5:
    print('I am small {}'.format(a))
    a += 1

In [None]:
# try-except is one basic method for error handling

my_list = [5,6,'Sally',10]
for obj in my_list:
    try:
        print('{}'.format(obj + 1))
    except TypeError:  # raised when we try to add 1 to a string
        print("I am not a number, I am a free woman!")
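
Naming the exception you expect, rather than catching everything, keeps unrelated bugs visible. As a further sketch of the same pattern, the example below converts text to numbers and catches the `ValueError` that `int()` raises for text that is not a number. The list contents and variable names are made up.

```python
raw_values = ['5', '6', 'Sally', '10']   # hypothetical input, all text

numbers = []
skipped = []
for text in raw_values:
    try:
        numbers.append(int(text))   # works for '5', '6' and '10'
    except ValueError:              # int('Sally') raises ValueError
        skipped.append(text)

print(numbers)
print(skipped)
```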

#### 4.2 Logic operators

We test conditions using logic operators.

| Symbol | Task Performed |
|----|---|
| == | equal to |
| !=  | not equal to |
| < | less than |
| <= | less than or equal to |
| > | greater than |
| >= | greater than or equal to |

In [None]:
# NOTE: We assign values to variables using '='
a = 5
b = 7

In [None]:
# But compare them using '=='
a == b

In [None]:
# Test whether a does not equal b
a != b

In [None]:
# Logic expressions evaluate to True or False (datatype: Boolean)

test = b > a

test

In [None]:
type(test)
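
Comparisons can also be combined. The sketch below uses Python's keywords `and`, `or` and `not`, which are not in the table above but work on the same Boolean results; it also shows that comparisons can be chained. The variable names are illustrative.

```python
a = 5
b = 7

both = a < b and b < 10     # True only if both comparisons are True
either = a > b or b == 7    # True if at least one comparison is True
negated = not a == b        # flips the result of a == b
chained = 0 < a < b         # shorthand for (0 < a) and (a < b)

print(both, either, negated, chained)
```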

#### 4.3 Conditional statements with if

My pet Python is a vegetarian. She will test whether variable 'food' is 'burger', 'chicken' or 'veg', then decide whether to eat.

Do this with 'if', 'elif' (else if), and 'else'.

In [2]:
food = 'veg'

In [3]:
if food == 'veg':
    print('yum')
elif food == 'chicken':
    print('hmm maybe')
elif food == 'burger':
    print('no thanks')
else:
    pass

yum


NOTE: Here's how the structure works:
* start with an 'if' statement, specifying the logical test to apply
* make sure your 'if' statement ends with :
* **indent the conditional code block.** Whatever code should be executed if the condition is true, indent it by the same amount on every line (four spaces is the PEP 8 convention).
* test additional conditions using 'elif', and handle all remaining cases with 'else'.
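
One consequence of this structure is that Python checks the conditions from top to bottom and runs only the first branch that is true. The sketch below illustrates this with a made-up temperature example: 35 is greater than both 30 and 20, but only the first matching branch runs.

```python
temperature = 35   # hypothetical reading

if temperature > 30:
    label = 'hot'
elif temperature > 20:    # also true for 35, but never reached
    label = 'warm'
else:
    label = 'cold'

print(label)
```

Because of this, the order of the tests matters: putting the `> 20` test first would label every reading above 20 as 'warm'.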

#### 4.4 Testing conditions inside a loop
Combining loops with logic allows you to build more sophisticated code structures:

In [None]:
days = ['Mon','Tue','Wed','Thu','Fri','Sat','Sun']

for day in days:
    if day == 'Sat':
        location = '--> Beach!'
    elif day == 'Sun':
        location = '--> My sofa!'
    else:
        location = '--> MC5-215B'
    print(day, location)

In [None]:
# EXAMPLE 2: is your pet allowed?

authorized_pets = ['small dog', 'cat', 'hamster','budgerigar']

print("Welcome to Nick's Apartment Block.")
my_pet = input("Type your pet's breed to see if it's accepted: ")

if my_pet in authorized_pets:
    print("Congratulations, your {} is welcome here!".format(my_pet))
else:
    print("Sorry your {} is NOT ACCEPTED".format(my_pet))

#### --> This concludes our tour of data structures and brief introduction to control flow. We'll go into more depth on structures like dictionaries next week, and cover control flow in more detail.

# Python for Data Analytics - Week 1
### Lab session
Work through the following exercises. Get through as many as you can in the time allotted.
The table below lists additional Python functionality that may help you.

| Task | Python function |
|----|----|
| Import the code library 'random' | import random |
| Generate a random integer between a and b (inclusive) | random.randint(a, b) |
| Prompt the user for some text | my_variable = input() |
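
A brief sketch of how these pieces behave, assuming nothing beyond the standard library. `random.randint(a, b)` can return either end point, and `input()` always returns text, so a typed number has to be converted with `int()` before you can compare it. To keep the example runnable without a prompt, the string `typed_text` stands in for the result of `input()`.

```python
import random

random.seed(1)                   # optional: makes the "random" result repeatable
roll = random.randint(1, 6)      # an integer from 1 to 6, both included

typed_text = '4'                 # stands in for: typed_text = input()
guess = int(typed_text)          # convert the text to an integer before comparing

print(roll)
print(type(typed_text), type(guess))
```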


#### EXERCISE 1: Too many donuts?
*Objective: handle strings and integers, test conditions*

Write a program that will:
1. Define a variable 'donuts_eaten'.
2. Print a string of the form 'Donuts consumed: < count >'. However if 10 or more donuts were eaten, then use the word 'many' instead of the actual count.

In [5]:
# YOUR CODE HERE:

donuts_eaten = 12
if donuts_eaten >= 10:
    print("Donuts consumed: " + "many")
else:
    print("Donuts consumed: " + str(donuts_eaten))

Donuts consumed: many


#### EXERCISE 2a: Age calculator
*Objective: Define variables, do basic math, convert data types*
    
Write a program that will:
1. Define a variable 'birth_year'
2. Define a variable 'current_year'
3. Calculate the person's age from these two
4. Print the output in format ("You are x years old.")

In [3]:
# YOUR CODE HERE:

birth_year = 
current_year = 

#### EXERCISE 2b: Premier League salaries
*Objective: Define variables, do math, use conditional logic.*

1. Define two variables, player_name (a string), and annual_salary (an integer).
2. Assume there are 125 days in the football season. Print a neat string saying your player's name and how much they earn (a) per month; and (b) per day of the season. 
3. Now define a Boolean variable championship_winners.
4. Test whether championship_winners is True, and if so, print your output again but with a 33% bonus.

**BONUS POINTS**: limit the salary to two decimal places when printing it.

In [4]:
# YOUR CODE HERE:

player_name = 'Ronaldo'
annual_salary = 33500000
days_in_season = 125



#### EXERCISE 3: Dog's dinner
*Objective: Practice string manipulation*

1. Define two strings, a and b.
2. Your program should return a single string with a and b separated by a space, except swap the first 2 chars of each string.
* eg. 'dog', 'dinner' -> 'dig donner'


In [None]:
# YOUR CODE HERE:

#### EXERCISE 4: Higher or lower?
*Objective: Test conditions, use a counter.*

Write a program to:
1. Generate a random number (kept secret from the user).
2. Prompt the user "Guess a number" and record their input.
3. Tell them if they were correct, too high, or too low.

BONUS POINTS: limit the number of guesses to 5.

In [5]:
# YOUR CODE HERE:

#### EXERCISE 5: FizzBuzz
*Objective: Get a job at Amazon! (they use it in software engineer interviews)*

Write a script to:

* Print out the numbers from 1 to 20, but replace a number with 'Fizz' if it is divisible by 3, 'Buzz' if it is divisible by 5, and 'FizzBuzz' if it is divisible by both 3 and 5.

Hint: the 'mod' operator, denoted %, is used to check divisibility. Example: 10 % 2 == 0.

In [None]:
# YOUR CODE HERE: