# Programming for Data Science (Python)

### In this Notebook, we will learn about
    a. Variables
    b. Different data types
    c. Operations associated with each data type

In [1]:
# This code appears in every demonstration Notebook.
# By default, when you run a cell, only the output of its last expression is shown.
# This code makes the output of every expression in a cell show.
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Variables
### 1. Define a variable

In [None]:
# You can think of variables as placeholders, or boxes. Each box can hold a value, which can be changed anywhere in the program.
# Variables are defined with an assignment statement.
# Such a statement starts with a variable name, followed by an equal sign and the value you want to give to the variable
# For example:
myName = "Ling" # string
myAge = 60 # number
myInterests = ['art', 'music', 'books'] # list
myDemo = {'name': "Ling", 'age': 60, 'degree': 'PhD'} # dictionary

In [None]:
# The assigned value can be the output of a function. For example:
myName = input("Please tell me your name: ")
print("Hello", myName)

In [None]:
# When a variable is assigned a new value, the old value is forgotten.
# What is the output of the following code?
myAge = 60
myAge = 45
print(myAge)

### 2. Variable names
     a. Variable names can only contain letters, numbers and "_", and cannot start with a number

     b. Case matters

     c. Don't use generic names, and never reuse the names of built-in functions

In [None]:
# Variable names can only contain letters, numbers and "_", and cannot start with a number
# Are these valid variable names?

# 7up = 'drink' # invalid: a variable name cannot start with a number
# var%var = 7 # invalid: a variable name cannot have characters other than letters, numbers and "_"
# _time_status_ = 'over' # This is a valid variable name.

In [None]:
# Case matters
currentbalance = 1000
CurrentBalance = 900
print(currentbalance)
currentbalance != CurrentBalance
# Therefore, being consistent with naming conventions helps. For multiple-word names, capitalize each word from the second one on:
# currentBalance, myAge, startTime, etc.

In [None]:
# Avoid generic variable names such as "variable", "function", or "number".
# Do not reuse the names of built-in functions such as int. The new variable hides the function, which causes problems.
'''
int = 78 # the variable name is int.
myAge = input("Your age: ")
print(int(myAge))
'''
# Meaningful names make your program easier to understand and maintain.
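
The problem with reusing a built-in name can be reproduced safely inside a function, as in the minimal sketch below. The function names `parseAgeShadowed` and `parseAge` and the variable `ageLimit` are hypothetical and used only for illustration.

```python
# Hypothetical helper names, used only for illustration.
def parseAgeShadowed(text):
    int = 78  # this local variable hides the built-in int() inside the function
    try:
        return int(text)  # tries to "call" the number 78
    except TypeError as err:
        return "TypeError: " + str(err)

def parseAge(text):
    ageLimit = 78  # a descriptive name leaves int() untouched
    return min(int(text), ageLimit)

shadowed = parseAgeShadowed("60")
clean = parseAge("60")
print(shadowed)
print(clean)
```

The first call reports a TypeError because `int` no longer refers to the conversion function; the second call works as expected.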

## Data Types
    a. Type checking is done at runtime
    b. No mixing of incompatible types in one operation

In [None]:
# Each variable has a type
# All type checking is done at runtime: dynamically typed
# No need to declare a variable or give it a type before use
# What will be the output of the following code?
x = 2
x = 'changed to 3'
print(x)

In [None]:
# To find out the data type of any variable, we can use the type() function
type(myDemo) # returns the datatype of the variable
print(myDemo)

In [None]:
# Operations cannot mix incompatible data types, such as adding a number to a string.
# For example:
x = 60
y = 'sixty'
x+y # raises a TypeError
# Recall our first program. Suppose we tried to print the hello message with the updated age in it:
newAge = myAge + 1
print("You will be " + newAge + " next year.") # This raises a TypeError because we are adding strings and an integer.

In [None]:
# Sometimes type conversion is necessary
# For example, in our first program we cast newAge into a string using a type casting function
str(newAge)
# Then we can put it together with the strings.
print("You will be " + str(newAge) + " next year.")
# We have a list of type casting functions. Please refer to the slides.
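
Conversion is needed in both directions. The sketch below assumes the age arrives as text (as it would from `input()`), converts it to a number for the arithmetic, and converts back to text for the message. The variable names are illustrative.

```python
ageText = "60"              # input() always returns a string like this one
newAge = int(ageText) + 1   # str -> int before doing arithmetic
message = "You will be " + str(newAge) + " next year."  # int -> str before concatenation
messageF = f"You will be {newAge} next year."           # an f-string converts for us
price = float("3.14")       # str -> float
print(message)
print(messageF)
print(price)
```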

### Number
    a. The errors in floats
    b. Arithmetic operations

In [None]:
# A number is a numeric value;
# Three main types: integer, float and complex

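# The precision of floats
# Floats are stored in binary, so many decimal fractions (0.1, 0.2, 0.3, ...) are only approximated.
# Sometimes the rounding error shows up in the result, and sometimes it cancels out.
0.1+0.2 # the error is visible here: 0.30000000000000004
0.1+0.3
0.2+0.3

# If you need 0.1+0.2 to compare equal to 0.3, how can you solve the problem?
0.1 + 0.2 == 0.3 # False
round(0.1+0.2, 2)
# Rounding to a fixed number of decimal places hides the error, so round(0.1+0.2, 2) == 0.3 is True.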
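
Besides `round()`, the standard library offers two other common ways to deal with float error. This is a minimal sketch: `math.isclose()` compares with a tolerance, and `decimal.Decimal` does exact decimal arithmetic when it is built from strings.

```python
import math
from decimal import Decimal

total = 0.1 + 0.2
exactEqual = total == 0.3            # False: the binary rounding error is visible
closeEnough = math.isclose(total, 0.3)  # True: equal within a small tolerance
decimalEqual = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")  # True: exact decimal arithmetic
print(exactEqual, closeEnough, decimalEqual)
```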

In [None]:
# Arithmetic operations are intuitive operations for numbers.
# multiply, power, division, remainder, floor quotient, etc.
8*5
8/5
8//5
8%5
-8//5

# This formula is always true: x == (x // y) * y + (x % y)
(8//5)*5 + (8%5)

# Please refer to the table in the slides for more operators.
# What does divmod(x,y) do?
divmod(8, 5)
# The function returns two values at the same time: the floor quotient and the remainder.
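
The identity also holds for negative numbers, because `//` rounds down (toward negative infinity) and `%` takes the sign of the divisor. The following sketch checks a few sign combinations; the sample pairs are chosen only for illustration.

```python
pairs = [(8, 5), (-8, 5), (8, -5), (-8, -5)]
results = {}
for x, y in pairs:
    quotient, remainder = divmod(x, y)  # same as (x // y, x % y)
    results[(x, y)] = (quotient, remainder)
    # The identity from above holds for every pair
    assert x == quotient * y + remainder
    print(x, y, quotient, remainder)
```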

In [None]:
# Sometimes arithmetic operations can be very helpful.
# Practice: There is flight data. The departure time is recorded as 1450 for 14:50.
# How can we get the hour and the minutes from this type of departure time record?
depTime = 1450
hour = depTime // 100 # floor division drops the last two digits: 14
minutes = depTime % 100 # the remainder keeps the last two digits: 50
print(hour, minutes)
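
The same split can be done in one step with `divmod()`. The sketch below wraps it in a hypothetical helper, `formatDepTime`, and assumes times are stored as integers in HHMM form, so that 905 means 09:05.

```python
def formatDepTime(depTime):
    """Turn an integer like 1450 into the text '14:50' (assumes HHMM encoding)."""
    hour, minutes = divmod(depTime, 100)
    # :02d pads single digits with a leading zero
    return f"{hour:02d}:{minutes:02d}"

afternoon = formatDepTime(1450)
morning = formatDepTime(905)
print(afternoon, morning)
```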

### String
    a. Definition: quotes
    b. Escape "\"
    c. Concatenation

In [None]:
# A string is a sequence of characters.
# Use single quotes or double quotes to define a string. It is suggested to stick with one style throughout a program.
print('Python is powerful and easy to learn.')
print("Data science is amazing.")
# Sometimes we use both to avoid confusion. For example, how to print: Let's learn Python
# print('Let's learn Python') # SyntaxError: the apostrophe ends the string early
print("Let's learn Python")
# Triple quotes hold multiple lines of text; we often use them to write long comments
'''
For there is always light
If only we are brave enough to see it
If only we are brave enough to be it
'''

In [None]:
# The escape sign '\' tells Python that the character after it 'escapes' its default meaning/function and
# takes on a literal or special meaning instead.

# For example: How to print "Let's learn Python"?
print('Let\'s learn python')
# The escape '\' in front of the single quote means the quote 'escapes' its default function as the string delimiter
# and becomes a regular character, so it can be printed.

# How to print several lines of text?
print('Hello\nworld!\nNew day!')
# '\n' has a special meaning: start a new line. You can find a table of these escape characters in the slides.

# How to print a file directory with backslash '\'?
print('C:\new\numbers') # wrong: each '\n' is read as a new line
print('C:\\new\\numbers') 
# The default meaning of '\' is the escape sign. To turn it back into a normal '\' sign, we put an escape in front of it: '\\'
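
For paths with many backslashes, a raw string is a common alternative to doubling each one. In this short sketch, the `r` prefix tells Python to leave the backslashes as they are.

```python
escaped = 'C:\\new\\numbers'  # every backslash is escaped by hand
raw = r'C:\new\numbers'       # raw string: backslashes are kept literally
sameText = escaped == raw
print(raw)
print(sameText)
```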

In [None]:
# Concatenation
# Use "+" to put strings together.
print(myName + ' is ' + str(myAge) + '.')

# Use '*' to repeat
print("Hello "*5)

### Boolean type
    a. Boolean values: True/False
    b. Values defined as False in Python; Use bool() function to find True/False value
    c. Logic operators: and, or, not


In [None]:
# An expression may be evaluated to be simply True/False.
# For example, comparisons between numbers
3 == 4 # double equal sign means being equal to. Single equal sign is for variable assignment.
3 != 4 # not equal to
3 > 4

# Strings are compared character by character, from left to right.
# For instance, 'Apple' vs. 'apple'. We compare 'A' with 'a' first.
# The comparison between strings is based on the Unicode value of each character. Uppercase letters come before lowercase letters.
# 'A' is smaller than 'a', so 'Apple' < 'apple'.
'Apple' < 'apple'
# To find out the unicode value of each character, we can use ord() function.
ord('A')
ord('a')

# When there are digits in the string, the comparison works the same way. For example,
'12' < '5'
# We compare '1' with '5'. The Unicode value of '1' is smaller than that of '5', so '12' < '5' is True.
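
This matters whenever numbers are stored as text. The sketch below uses a made-up list, `readings`, to compare the default string ordering with a numeric ordering obtained by converting each item with `int`.

```python
readings = ['12', '5', '100']  # numbers stored as strings (illustrative data)

asText = sorted(readings)              # compared character by character
asNumbers = sorted(readings, key=int)  # compared by numeric value
largestText = max(readings)
largestNumber = max(readings, key=int)
print(asText)
print(asNumbers)
print(largestText, largestNumber)
```

Sorting as text puts '100' first and '5' last, which is rarely what a numeric column needs.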

In [None]:
# Some values in Python are defined as False: None, False, integer 0, float 0.0, complex 0j, and empty containers ('', [], (), {}, set())
# bool() evaluates whether a value or expression is True or False
bool(0)
bool(1)
bool([])
bool('Apple' < 'apple')
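
A quick way to get a feel for these rules is to run `bool()` over a collection of values. The lists below are illustrative; note that a non-empty string or list is True even when its content looks "empty", such as '0' or [0].

```python
falsyValues = [None, False, 0, 0.0, 0j, '', [], (), {}, set()]
truthyValues = ['0', ' ', [0], [[]], -1, 0.1]

allFalsy = all(not bool(value) for value in falsyValues)
allTruthy = all(bool(value) for value in truthyValues)
print(allFalsy, allTruthy)
```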

In [None]:
# The operators for Boolean type are logic operators: and, or, not
x = 8
y = 0
bool(x) and bool(y) # True and False --> False
bool(x) or bool(y) # True or False --> True
not y

In [None]:
# When the three operators appear in one expression, they are evaluated in a fixed order. That is operator precedence.
# The order is: not, and, or
# For example, what is the output of the following expression?
not True and False or True and False

# We can combine expressions together using logic operators.
crazy = 3>5 or '12'>'5' or not 0 == 0.0
crazier = 3>5 and '12'>'5' and not 0 == 0.0

print("It is a crazy world: ", crazy)
print("It is a crazier world: ", crazier)
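
Adding parentheses makes the evaluation order explicit without changing the result. The sketch below rewrites the expressions from the cell above with the grouping that Python applies on its own. Comparisons such as `==` are evaluated before `not`.

```python
implicit = not True and False or True and False
explicit = ((not True) and False) or (True and False)

# not 0 == 0.0 means not (0 == 0.0), because == binds tighter than not
crazy = (3 > 5) or ('12' > '5') or (not (0 == 0.0))
crazier = (3 > 5) and ('12' > '5') and (not (0 == 0.0))
print(implicit, explicit, crazy, crazier)
```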

### List
    a. Create and access a list
    b. Slicing
    c. Make changes: add or remove items
    d. List comprehension
    e. Common functions for sequences: mutable vs. immutable

In [2]:
# A list is a powerful Python data type.
# It is a container that holds comma-separated values (i.e., items or elements) between square brackets.
# The elements can be of the same or different data types.

# A number list
myList1 = [0, 1, 1, 2, 3, 5, 8, 13]
# A string list
myList2 = ['apple', 'banana', 'cherry', 'dragonfruit', 'fig', 'grape']
# A mixed list: an integer, a float and two strings
myList3 = [1, 'apple', 3.14, 'pie']
# A nested list
myList4 = ['10', 'fruits', ['apple', 'banana', 'cherry', 'dragonfruit']]
# An empty list; You can initialize a list with a set of brackets.
myList5 = []

In [None]:
# A list is a sequence, so the position of elements matters.
[1, 2, 3] != [3, 2, 1]

# The elements are not necessarily unique.
['apple', 'banana', 'cherry', 'dragonfruit', 'apple', 'banana'] # is a valid list

In [3]:
# Each element is associated with an index, which indicates the position of the element.
# The index starts from 0. The index of the first element is 0, and the sequence goes on as 0, 1, 2...
# We can refer to elements using their indices.
# For example, how to print 'banana' in myList2?
print(myList2[1])

# We can use negative indices as well. They count from the end: the index of the last item is -1, and the sequence goes on as -1, -2, -3...
# to print 'banana'
print(myList2[-5])

# To refer to elements in the nested list, we can use a two-layer index. For example, 'cherry' from myList4 can be accessed as
myList4[2][2]
# The first 2 refers to the third item of the big list, the sublist of fruits;
# The second 2 refers to the third item within the sublist, 'cherry'

banana
banana


'cherry'
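
Positive and negative indices are two ways of naming the same positions: index `-k` refers to the same item as index `len(list) - k`. The sketch below uses a copy of the fruit list and also shows that an index outside the list raises an IndexError.

```python
fruits = ['apple', 'banana', 'cherry', 'dragonfruit', 'fig', 'grape']

lastItem = fruits[-1]
samePosition = fruits[-5] == fruits[len(fruits) - 5]  # both are 'banana'

try:
    fruits[6]  # valid indices are 0..5 and -6..-1
    outOfRange = False
except IndexError:
    outOfRange = True
print(lastItem, samePosition, outOfRange)
```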

In [4]:
# Sometimes we need to access more than one element, a subset of the list. 
# We can use 'slicing' to get them. Slicing operations in Python are notably powerful and elegant.

# We can slice a list by giving a range of indices. The slice starts at the starting index (left) and
# stops just before the ending index (right), i.e., at index (right)-1.
# For example, to slice 'apple' and 'banana' from myList2
myList2[0:2]
myList2[:2] # If starting index is 0, we can omit it.

# To slice 'banana', 'cherry' and 'dragonfruit'
myList2[1:4]
# If the ending index is the end of the list, you can omit it.
myList2[2:]

# Negative indices work the same way. The left index is included, but the right index is not.
myList2[-5:-2]

['apple', 'banana']

['apple', 'banana']

['banana', 'cherry', 'dragonfruit']

['cherry', 'dragonfruit', 'fig', 'grape']

['banana', 'cherry', 'dragonfruit']

In [5]:
# The slicing range [:] can take one more value, in the format [start:stop:step]
# The third number is the step: how far we move along the range each time. 1 means one by one, 2 means every second item

# For example,
myList2[::2]
# returns a sublist that picks every second item, starting from the first item.
# Omitting the first two numbers means picking from the whole list.
# You may pick from any sublist.
myList2[1:5:2]

['apple', 'cherry', 'fig']

['banana', 'dragonfruit']
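
A few more slicing behaviours are worth knowing. This sketch, on a copy of the fruit list, shows a negative step, a slice that runs past the end of the list, and a full slice used to make a copy.

```python
fruits = ['apple', 'banana', 'cherry', 'dragonfruit', 'fig', 'grape']

backwards = fruits[::-1]       # a negative step walks from the end to the start
everyOtherBack = fruits[::-2]
pastTheEnd = fruits[4:100]     # slices never raise IndexError; they stop at the end
copyOfFruits = fruits[:]       # a full slice builds a new list with the same items
print(backwards)
print(everyOtherBack)
print(pastTheEnd)
print(copyOfFruits == fruits, copyOfFruits is fruits)
```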

In [6]:
# Is myList2[0] the same as myList2[0:1]?
myList2[0] # a string
myList2[0:1] # a sublist

'apple'

['apple']

In [None]:
# A list is mutable, which means it can be edited. You may make changes to a list in many ways.

# To change the value of an element: direct re-assignment using the index
myList2[2] = 'cantaloupe'
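
By contrast, a string is immutable: its characters can be read by index but not reassigned. The sketch below uses illustrative names and shows the difference, along with the usual workaround of building a new string.

```python
fruits = ['apple', 'banana', 'cherry']
fruits[2] = 'cantaloupe'      # lists are mutable: in-place change works

word = 'apple'
try:
    word[0] = 'A'             # strings are immutable: this raises TypeError
    stringChanged = True
except TypeError:
    stringChanged = False

newWord = 'A' + word[1:]      # build a new string instead
print(fruits, stringChanged, newWord)
```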

In [7]:
# To add items to a list: append(), extend(), insert()
# append() adds an item to the end of the list
myList2.append('honeydew')
myList2

['apple', 'banana', 'cherry', 'dragonfruit', 'fig', 'grape', 'honeydew']

In [8]:
# If the item is a list, it will still be added as one item. For example:
myList2.append(['jackfruit', 'kiwi'])
myList2

['apple',
 'banana',
 'cherry',
 'dragonfruit',
 'fig',
 'grape',
 'honeydew',
 ['jackfruit', 'kiwi']]

In [9]:
# To add item by item, we need to use extend()
# extend() takes an iterable object and adds its items to the list one at a time
myList2.extend(['jackfruit', 'kiwi'])
myList2

['apple',
 'banana',
 'cherry',
 'dragonfruit',
 'fig',
 'grape',
 'honeydew',
 ['jackfruit', 'kiwi'],
 'jackfruit',
 'kiwi']

In [10]:
# A string is also an iterable object, so extend() will extend the list by adding one character at a time.
myList2.extend('xyz')
myList2

['apple',
 'banana',
 'cherry',
 'dragonfruit',
 'fig',
 'grape',
 'honeydew',
 ['jackfruit', 'kiwi'],
 'jackfruit',
 'kiwi',
 'x',
 'y',
 'z']
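
The difference between `append()` and `extend()` is easiest to see side by side on a small list. In this sketch (with made-up items), wrapping a string in a list is one way to add it with `extend()` as a whole item instead of character by character.

```python
basket = ['apple']
basket.append(['kiwi', 'lime'])   # the whole list becomes ONE nested item
basket.extend(['kiwi', 'lime'])   # each item is added separately
basket.extend(['xyz'])            # a one-item list keeps the string whole
print(basket)
print(len(basket))
```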

In [11]:
# A string is a sequence too, so its characters can also be accessed by index.
myStr1 = 'apple'
myStr1[1]

'p'

In [12]:
# insert() inserts an item at a specified position.
myList2.insert(4, 'eggplant')
myList2

['apple',
 'banana',
 'cherry',
 'dragonfruit',
 'eggplant',
 'fig',
 'grape',
 'honeydew',
 ['jackfruit', 'kiwi'],
 'jackfruit',
 'kiwi',
 'x',
 'y',
 'z']

In [13]:
# Another way to add more items to a list is concatenation, i.e., combining two lists into a new list using "+"
drinkList = ['coffee', 'tea']
snackList = myList2 + drinkList
snackList

['apple',
 'banana',
 'cherry',
 'dragonfruit',
 'eggplant',
 'fig',
 'grape',
 'honeydew',
 ['jackfruit', 'kiwi'],
 'jackfruit',
 'kiwi',
 'x',
 'y',
 'z',
 'coffee',
 'tea']

In [14]:
# To delete an item: del, remove(), pop()
# del deletes an item by index.
# Check our snackList. Let's delete the sublist first
del snackList[8]
snackList

['apple',
 'banana',
 'cherry',
 'dragonfruit',
 'eggplant',
 'fig',
 'grape',
 'honeydew',
 'jackfruit',
 'kiwi',
 'x',
 'y',
 'z',
 'coffee',
 'tea']

In [15]:
# The remove() method takes a single item and deletes the first matching value in the list
snackList.remove('x')
snackList.remove('y')
snackList.remove('z')
snackList

['apple',
 'banana',
 'cherry',
 'dragonfruit',
 'eggplant',
 'fig',
 'grape',
 'honeydew',
 'jackfruit',
 'kiwi',
 'coffee',
 'tea']

In [16]:
# The pop() method removes the element at the specified position and returns it, so the notebook displays the removed item.
snackList.pop(4)
snackList

'eggplant'

['apple',
 'banana',
 'cherry',
 'dragonfruit',
 'fig',
 'grape',
 'honeydew',
 'jackfruit',
 'kiwi',
 'coffee',
 'tea']
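
Two details of these methods are easy to miss: `remove()` deletes only the first match and raises a ValueError when the value is absent, and `pop()` without an argument removes the last item. The list below is a small illustrative example.

```python
snacks = ['apple', 'banana', 'apple', 'tea']

snacks.remove('apple')        # only the FIRST 'apple' is deleted
afterRemove = list(snacks)

lastItem = snacks.pop()       # no index given: removes and returns the last item

try:
    snacks.remove('orange')   # 'orange' is not in the list
    removedMissing = True
except ValueError:
    removedMissing = False
print(afterRemove, lastItem, snacks, removedMissing)
```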

In [18]:
# Other useful methods: sort() sorts the list in place; reverse=True sorts it in descending order
snackList.sort()
snackList

['apple',
 'banana',
 'cherry',
 'coffee',
 'dragonfruit',
 'fig',
 'grape',
 'honeydew',
 'jackfruit',
 'kiwi',
 'tea']

In [19]:
snackList.sort(reverse = True)
snackList

['tea',
 'kiwi',
 'jackfruit',
 'honeydew',
 'grape',
 'fig',
 'dragonfruit',
 'coffee',
 'cherry',
 'banana',
 'apple']
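
Because `sort()` changes the list itself and returns None, it is often contrasted with the built-in `sorted()`, which leaves the original untouched and returns a new list. A minimal sketch with a short illustrative list:

```python
snacks = ['tea', 'apple', 'kiwi']

ordered = sorted(snacks)                 # new list; snacks is unchanged so far
unchanged = list(snacks)

returned = snacks.sort(reverse=True)     # sorts in place and returns None
print(ordered)
print(unchanged)
print(snacks, returned)
```

Writing `snacks = snacks.sort()` is therefore a common mistake: it replaces the list with None.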

In [20]:
# reverse(): Reverse the list
snackList.reverse()
snackList

['apple',
 'banana',
 'cherry',
 'coffee',
 'dragonfruit',
 'fig',
 'grape',
 'honeydew',
 'jackfruit',
 'kiwi',
 'tea']

In [21]:
# count(): Count the number of occurrences
snackList.count('apple')

1

In [22]:
# index(): get the index number of an item
snackList.index('fig')

5

In [23]:
# len(), min(), max()
len(snackList)
min(snackList)
max(snackList)

11

'apple'

'tea'

In [27]:
# in, not in: return True/False based on whether an item is in the list
# For example,
'apple' in snackList
'apple' not in snackList
'orange' in snackList
'orange' not in snackList

True

False

False

True
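
The `in` test pairs well with `index()`, which raises a ValueError when the item is missing. The sketch below defines a hypothetical helper, `findPosition`, that checks membership first and returns -1 for a missing item. The -1 convention is an assumption made for this example.

```python
def findPosition(items, target):
    """Return the index of target in items, or -1 if it is not there."""
    if target in items:
        return items.index(target)
    return -1

snacks = ['apple', 'banana', 'fig']
figPosition = findPosition(snacks, 'fig')
orangePosition = findPosition(snacks, 'orange')
print(figPosition, orangePosition)
```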