# Introduction to Python Data Analytics
# Part 1. Python Basics

Author: Kang P. Lee <br>
References: 
- Python Programming by en.wikibooks.org (https://en.wikibooks.org/wiki/Python_Programming)
- Data Wrangling with Python by Katharine Jarmul, Jacqueline Kazil (http://shop.oreilly.com/product/0636920032861.do)
- The Python Tutorial by Python Software Foundation (https://docs.python.org/3/tutorial/)
- The Python Standard Library by Python Software Foundation (https://docs.python.org/3/library/)

## ▪ Run a Cell

In [1]:
print("Hello, world!")

Hello, world!


## ▪ Numbers

- Integers (int)
- Floating-point numbers (float)
- Complex numbers (complex)

Refer to https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex

In [2]:
x = 5
print(x, type(x))

5 <class 'int'>


In [3]:
x = 3.141592
print(x, type(x))

3.141592 <class 'float'>


In [4]:
x = 5 + 2j
print(x, type(x))

(5+2j) <class 'complex'>


You don't have to declare a variable's type. Python is dynamically typed: the type belongs to the value, and type() reports the type of whatever value the variable currently refers to.
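Here is a minimal sketch of dynamic typing. The same name can be rebound to values of different types, and type() always reports the type of the current value. The name x is only an example.

```python
x = 5
print(x, type(x))      # 5 <class 'int'>

x = "five"             # rebinding x to a string is allowed
print(x, type(x))      # five <class 'str'>

x = 5.0                # and then to a float
print(x, type(x))      # 5.0 <class 'float'>
```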

## ▪ Strings

Refer to https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str

In [5]:
a = "hello"
print(a, type(a))

hello <class 'str'>


String literals can be enclosed in matching single quotes (') or double quotes ("); either is fine. 

In [6]:
a = "She said, \"How are you?\""
print(a)

She said, "How are you?"


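If you need double quotes inside a string literal that is itself enclosed in double quotes, put a backslash escape character (\\) before each inner double quote. Without the backslashes, Python ends the string at the second double quote and reports a SyntaxError, as in the next cell.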

In [7]:
a = "She said, "How are you?""
print(a)

SyntaxError: invalid syntax (<ipython-input-7-bcd93f8c235f>, line 1)

In [8]:
a = 'She said, "How are you?"'
print(a)

She said, "How are you?"


If the string contains double quotes, you can use single quotes around the string without using a backslash, and vice versa.

In [9]:
a = "I'm a boy"
print(a)

I'm a boy


In [10]:
b = "1"
print(b, type(b))

1 <class 'str'>


In [11]:
1 == "1"

False

In [12]:
# len(obj, /)
# Return the number of items in a container.

a = "How are you?"
len(a)

12

### String Additions and Multiplications

In [13]:
a = "hello"
b = "world"
a + b

'helloworld'

String addition is the same as string concatenation: + joins the two strings exactly as they are, without inserting a space.

In [14]:
a * 3

'hellohellohello'

In [15]:
a * b

TypeError: can't multiply sequence by non-int of type 'str'
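A related pitfall: both operands of + must be strings, so adding a number to a string raises a TypeError; convert the number with str() first. Repetition, by contrast, works with an int on either side of *. The try/except below is used only so the sketch keeps running after the error, and the variable names are illustrative.

```python
age = 20

try:
    message = "Age: " + age        # str + int is not allowed
except TypeError as err:
    print("TypeError:", err)

message = "Age: " + str(age)       # convert the int to a str first
print(message)                     # Age: 20

print("-" * 10)                    # ----------
print(3 * "ab")                    # ababab
```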

### Containment

In [16]:
a = "hello"
b = "hell"
print(b in a)
print(a in b)

True
False


For strings, the 'in' operator returns True if the left operand occurs as a substring of the right operand.
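A few more substring checks, as a short sketch: the match must be contiguous and is case-sensitive, and 'not in' is simply the negation of 'in'.

```python
text = "Data Science Institute!"

print("Science" in text)      # True: substring match
print("science" in text)      # False: matching is case-sensitive
print("Sci Inst" in text)     # False: the characters must be contiguous
print("xyz" not in text)      # True: 'not in' negates 'in'
```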

### Indexing and Slicing

A Python string is a sequence of characters, which means it can be indexed and sliced.

In [17]:
a = "Data_Science_Institute!"
a

'Data_Science_Institute!'

In [18]:
for index, character in enumerate(a):    # enumerate() yields (index, character) pairs
    print(index, "\t", character)

0 	 D
1 	 a
2 	 t
3 	 a
4 	 _
5 	 S
6 	 c
7 	 i
8 	 e
9 	 n
10 	 c
11 	 e
12 	 _
13 	 I
14 	 n
15 	 s
16 	 t
17 	 i
18 	 t
19 	 u
20 	 t
21 	 e
22 	 !


In [19]:
a[0]

'D'

In [20]:
a[22]

'!'

In [21]:
a[23]

IndexError: string index out of range

In [22]:
a[-1]

'!'

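Python also supports negative indices, which count backwards from the end of the sequence: a[-1] is the last character, a[-2] the second to last, and in general a[-k] is the same as a[len(a) - k].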

In [23]:
for index, character in enumerate(a):    # enumerate() yields (index, character) pairs
    print(index, "\t", index - len(a), "\t", character)

0 	 -23 	 D
1 	 -22 	 a
2 	 -21 	 t
3 	 -20 	 a
4 	 -19 	 _
5 	 -18 	 S
6 	 -17 	 c
7 	 -16 	 i
8 	 -15 	 e
9 	 -14 	 n
10 	 -13 	 c
11 	 -12 	 e
12 	 -11 	 _
13 	 -10 	 I
14 	 -9 	 n
15 	 -8 	 s
16 	 -7 	 t
17 	 -6 	 i
18 	 -5 	 t
19 	 -4 	 u
20 	 -3 	 t
21 	 -2 	 e
22 	 -1 	 !
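The table suggests a simple rule: a negative index -k refers to the same position as len(a) - k. A quick check of that rule, where the loop variable k is illustrative:

```python
a = "Data_Science_Institute!"

# a[-k] and a[len(a) - k] refer to the same character for k = 1 .. len(a).
for k in range(1, len(a) + 1):
    assert a[-k] == a[len(a) - k]

print(a[-1], a[-2], a[-len(a)])   # ! e D
```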


In [24]:
a[0:4]

'Data'

Note that s[i:j] returns the substring that starts at s[i] and ends at s[j-1]; the character s[j] is not included, so the slice contains j - i characters.
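Excluding the end index has two handy consequences, sketched below with illustrative indices: the length of a[i:j] is exactly j - i, and two slices that share a split point k rebuild the original string with no overlap or gap.

```python
a = "Data_Science_Institute!"
i, j = 5, 12

piece = a[i:j]
print(piece)                             # Science
print(len(piece) == j - i)               # True

k = 4
print(a[0:k] + a[k:len(a)] == a)         # True: no character is lost or repeated
```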

In [25]:
a[:4]

'Data'

You can omit the start index when the slice starts at the beginning of the string; a[:4] is the same as a[0:4].

In [26]:
a[4:22]

'_Science_Institute'

In [27]:
a[4:]

'_Science_Institute!'

You can omit the end index when the slice runs to the end of the string.

In [28]:
a[:]

'Data_Science_Institute!'

Omitting both indices gives a slice of the whole string, from beginning to end.

In [29]:
a[-10:]

'Institute!'

In [30]:
a

'Data_Science_Institute!'

Note that the original string a hasn't changed at all. Indexing and slicing return a new string and leave the original untouched; Python strings are immutable, so they can never be modified in place.

In [31]:
a = a[:4]
a

'Data'

If you want to keep the changed version, re-assign the new string to the original variable, as in the cell above.
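Because strings are immutable, assigning to an index of a string raises a TypeError; the usual approach is to build a new string from slices. A short sketch (the try/except only keeps the example running after the error):

```python
a = "Data_Science_Institute!"

try:
    a[0] = "d"                    # item assignment is not allowed on strings
except TypeError as err:
    print("TypeError:", err)

b = "d" + a[1:]                   # build a new string instead
print(b)                          # data_Science_Institute!
print(a)                          # Data_Science_Institute! (unchanged)
```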

### String Methods

In [32]:
a = "DaTa ScIeNcE InStItUtE!"
a

'DaTa ScIeNcE InStItUtE!'

In [33]:
# S.upper() 
# Return a copy of S converted to uppercase.

a.upper()

'DATA SCIENCE INSTITUTE!'

In [34]:
# S.lower() 
# Return a copy of the string S converted to lowercase.

a.lower()

'data science institute!'

In [35]:
# S.count(sub[, start[, end]])
# Return the number of non-overlapping occurrences of substring sub in string S[start:end].

a.count("S")

2

Strings and string methods in Python are case-sensitive: "S" and "s" are different characters, so a.count("S") does not count lowercase s.
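To count regardless of case, one common approach is to normalize the case first, for example with lower(). The sample sentence is made up for illustration.

```python
text = "She sells sea shells"

print(text.count("s"))            # 5: the capital S in "She" is not counted
print(text.count("S"))            # 1
print(text.lower().count("s"))    # 6: case-insensitive count
```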

In [36]:
a = "\tData Science Institute!\nWelcome!"     # \t: tab, \n: new line
print(a)

	Data Science Institute!
Welcome!


In [37]:
# S.strip([chars]) 
# Return a copy of the string S with leading and trailing whitespace removed.

a = "\tData Science Institute    \n"
print(a.strip())

Data Science Institute


In [38]:
# S.lstrip([chars])
# Return a copy of the string S with leading whitespace removed.

print(a.lstrip())

Data Science Institute    



In [39]:
# S.rstrip([chars])
# Return a copy of the string S with trailing whitespace removed.

print(a.rstrip())

	Data Science Institute


In [40]:
a = "Data Science Institute;"
print(a.rstrip(";"))                          # Pass chars to strip characters other than whitespace.

Data Science Institute
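The chars argument of strip(), lstrip(), and rstrip() is treated as a set of characters, not as a substring: any combination of those characters is removed from the ends until a character outside the set is reached, and characters in the middle are never touched. A small sketch:

```python
url = "www.example.com"
print(url.strip("cmowz."))        # example: listed characters removed from both ends

tag = "xxhixyx"
print(tag.strip("xy"))            # hi: the order of the characters in chars is irrelevant

code = "--A-B--"
print(code.strip("-"))            # A-B: the inner '-' is kept
```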


In [41]:
# S.join(iterable)
# Return a string which is the concatenation of the strings in iterable.
# The separator between elements is S.

seq = ["a", "b", "c"]
"+".join(seq)

'a+b+c'
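join() works as the counterpart of split(): splitting on an explicit separator and joining with that same separator gives back the original string. Every element passed to join() must be a string, so numbers need converting first; map(str, ...) applies str() to each element. A short sketch:

```python
name = "Data_Science_Institute"
parts = name.split("_")
print(parts)                          # ['Data', 'Science', 'Institute']
print("_".join(parts) == name)        # True: join reverses split on the same separator
print(" ".join(parts))                # Data Science Institute

numbers = [1, 2, 3]
print(", ".join(map(str, numbers)))   # 1, 2, 3
```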

In [42]:
# S.find(sub[, start[, end]])
# Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end].

a = "Data Science Institute!"
a.find("t")

2

In [43]:
a = "Data Science Institute!"
a.find("z")

-1

In [44]:
# S.index(sub[, start[, end]])
# Like S.find() but raise ValueError when the substring is not found.

a = "Data Science Institute!"
a.index("t")

2

In [45]:
a = "Data Science Institute!"
a.index("z")

ValueError: substring not found
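One caution about find(): it signals "not found" with -1, and -1 is also a valid index (the last character). Using its result directly as an index can therefore return a wrong character silently. A sketch of checking the result first:

```python
a = "Data Science Institute!"

pos = a.find("z")
print(pos)                 # -1
print(a[pos])              # ! (the last character, not an error)

if pos != -1:              # check before using pos as an index
    print("found at", pos)
else:
    print("not found")

print("z" in a)            # False: 'in' is clearer for a yes/no question
```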

In [46]:
# S.replace(old, new[, count])
# Return a copy of S with all occurrences of substring old replaced by new.

a = "Data Science Institute!"
a.replace(" ", "_")

'Data_Science_Institute!'

In [47]:
# S.split(sep=None, maxsplit=-1)
# Return a list of the words in S, using sep as the delimiter string.

a = "Data Science Institute!"
a.split()

['Data', 'Science', 'Institute!']

In [48]:
a = "Data_Science_Institute!"
a.split("_")                           # It can take a separator argument.

['Data', 'Science', 'Institute!']

## Exercises

Suppose you have a string object <i>a</i>. 

In [49]:
a = "I'm learning Python data analytics."

Print out the first character.

In [50]:
# Your answer here
a[0]

'I'

Print out the last character.

In [51]:
# Your answer here


Print out the first three characters. 

In [52]:
# Your answer here


Print out the last ten characters.

In [53]:
# Your answer here


Print out only the <i>Python</i> part from the string.

In [54]:
# Your answer here


Print out the lower-case version of the string. 

In [55]:
# Your answer here


Remove the trailing period in the string and then split it into a list of words, or tokens. 

In [56]:
# Your answer here
a.rstrip(".").split()

["I'm", 'learning', 'Python', 'data', 'analytics']

Count the number of <i>'yt'</i>s in the string. 

In [57]:
# Your answer here


Remove all spaces from the string. (Hint: use replace.)

In [58]:
# Your answer here


## ▪ Lists

Refer to https://docs.python.org/3/library/stdtypes.html#lists

### List Creation

In [59]:
l1 = []
print(l1, type(l1))

[] <class 'list'>


In [60]:
l1 = list()
print(l1, type(l1))

[] <class 'list'>


In [61]:
l2 = [1, 2, 3]
l2

[1, 2, 3]

In [62]:
l3 = ["a", "b", "c"]
l3

['a', 'b', 'c']

In [63]:
l4 = [1, 2, 3, "a", "b", "c"]
l4

[1, 2, 3, 'a', 'b', 'c']

List elements don't have to be of the same type.

### Length of a List

In [64]:
len(l4)

6

The 'len' function is a built-in function that returns the number of items in any container, such as a list, string, dictionary, or set.

In [65]:
l4.len()

AttributeError: 'list' object has no attribute 'len'

A common mistake is to call len as a method. Lists have no 'len' method, so l4.len() raises an AttributeError.

### List Creation Shortcuts

In [66]:
[0, 1, 2] * 5

[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
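One caution with the * shortcut: it repeats references to the same objects. Multiplying a list that contains a list therefore produces rows that all point to one inner list. A sketch, where the name grid is illustrative:

```python
grid = [[0, 0, 0]] * 3
grid[0][0] = 1
print(grid)        # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]: all rows are the same list

grid = []          # build each row separately to get independent rows
for _ in range(3):
    grid.append([0, 0, 0])
grid[0][0] = 1
print(grid)        # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```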

### Combining Lists

In [67]:
l1 = [1, 2, 3]
l2 = ["a", "b", "c"]
l1 + l2

[1, 2, 3, 'a', 'b', 'c']

In [68]:
# L.extend(iterable) 
# Extend list by appending elements from the iterable.

l1.extend(l2)
l1

[1, 2, 3, 'a', 'b', 'c']

l1 + l2 returns a new list and leaves both operands unchanged, whereas l1.extend(l2) modifies l1 in place by appending the elements of l2.
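The difference can be seen by inspecting l1 after each operation. Note also that extend(), like most methods that modify a list in place, returns None rather than the list.

```python
l1 = [1, 2, 3]
l2 = ["a", "b", "c"]

combined = l1 + l2
print(combined)        # [1, 2, 3, 'a', 'b', 'c']
print(l1)              # [1, 2, 3]: unchanged by +

result = l1.extend(l2)
print(l1)              # [1, 2, 3, 'a', 'b', 'c']: modified in place
print(result)          # None
```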

In [69]:
# L.append(object) 
# Append object to end.

l = [1, 2, 3]
l.append(4)
l

[1, 2, 3, 4]

In [70]:
l.append([5, 6])
l

[1, 2, 3, 4, [5, 6]]

Here [5, 6] was added as a single element (a nested list); its items were not merged into l. The 'append' method always adds exactly one element to the end of a list; use 'extend' to add the items of another list individually.
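Comparing the resulting lengths makes the difference between append() and extend() concrete:

```python
items = [1, 2, 3]
items.append([4, 5])
print(items, len(items))     # [1, 2, 3, [4, 5]] 4: one new element

items = [1, 2, 3]
items.extend([4, 5])
print(items, len(items))     # [1, 2, 3, 4, 5] 5: two new elements
```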

### Indexing and Slicing of Lists

In [71]:
l = [1, 2, 3, "a", "b", "c"]
l[:3]

[1, 2, 3]
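Lists are indexed and sliced just like strings, including negative indices. Unlike strings, lists are mutable, so you can assign to an index or even to a slice. A short sketch:

```python
l = [1, 2, 3, "a", "b", "c"]

print(l[0], l[-1])      # 1 c
print(l[3:])            # ['a', 'b', 'c']

l[0] = 100              # allowed for lists (a string would raise TypeError)
print(l)                # [100, 2, 3, 'a', 'b', 'c']

l[3:] = ["x"]           # replace a slice with a shorter list
print(l)                # [100, 2, 3, 'x']
```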

### Sorting Lists

In [72]:
# L.sort(key=None, reverse=False) 
# Sort list.

l = [1, 6, 3, 4, 2, 5]
l.sort()                    # Sort a list in ascending order.
l

[1, 2, 3, 4, 5, 6]

In [73]:
l = [1, 6, 3, 4, 2, 5]
l.sort(reverse=True)        # Sort a list in descending order.
l

[6, 5, 4, 3, 2, 1]

In [74]:
l = ["a", "c", "b", 3, 1, 2]
l.sort()
l

TypeError: '<' not supported between instances of 'int' and 'str'

If you try to sort a list whose elements cannot be compared with each other, such as a mix of int and str, Python raises a TypeError.

In [75]:
# sorted(iterable, key=None, reverse=False)
# Return a new list containing all items from the iterable in ascending order.

l = [1, 6, 3, 4, 2, 5]
sorted(l)

[1, 2, 3, 4, 5, 6]

Python also has a built-in function sorted(), which works like the method sort() except that it returns a new sorted list and leaves the original unchanged. sort(), in contrast, sorts the list in place and returns None.
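The sketch below makes the difference visible. It also shows a common mistake to avoid (l = l.sort() would set l to None) and that sorted() accepts any iterable, such as a string.

```python
l = [1, 6, 3, 4, 2, 5]

new_list = sorted(l)
print(new_list)           # [1, 2, 3, 4, 5, 6]
print(l)                  # [1, 6, 3, 4, 2, 5]: the original is untouched

result = l.sort()
print(result)             # None: sort() works in place
print(l)                  # [1, 2, 3, 4, 5, 6]

print(sorted("python"))   # ['h', 'n', 'o', 'p', 't', 'y']
```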

In [76]:
sorted(l, reverse=True)

[6, 5, 4, 3, 2, 1]
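Both sort() and sorted() also take the key parameter shown in their signatures: a function applied to each element whose result is used for comparison. The sketch below uses str.lower for a case-insensitive order, len to sort by length (ties keep their original order), and str so that a mixed list of numbers and strings becomes comparable by string form while keeping the original elements.

```python
words = ["banana", "apple", "Cherry"]

print(sorted(words))                  # ['Cherry', 'apple', 'banana']: uppercase sorts first
print(sorted(words, key=str.lower))   # ['apple', 'banana', 'Cherry']
print(sorted(words, key=len))         # ['apple', 'banana', 'Cherry']

mixed = ["a", "c", "b", 3, 1, 2]
mixed.sort(key=str)                   # compare str(element) instead of the element
print(mixed)                          # [1, 2, 3, 'a', 'b', 'c']
```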

### Iteration

In [77]:
l = ["a", "b", "c", "d", "e"]

In [78]:
for item in l:
    print(item)               # Instead of print(), you can do whatever you want with each item.

a
b
c
d
e


In [79]:
for i in range(len(l)):       # range(n) yields the integers 0, 1, ..., n-1
    print(l[i])

a
b
c
d
e
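In Python 3, range() returns a lazy range object rather than a list; wrap it in list() to see its values. When you need both the position and the item, the built-in enumerate() is the usual alternative to range(len(...)).

```python
l = ["a", "b", "c", "d", "e"]

r = range(len(l))
print(r)               # range(0, 5)
print(list(r))         # [0, 1, 2, 3, 4]

for i, item in enumerate(l):   # enumerate() yields (index, item) pairs
    print(i, item)             # 0 a, then 1 b, and so on
```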


### Removing

In [80]:
# L.pop([index]) 
# Remove and return item at index (default last).
# Raises IndexError if list is empty or index is out of range.

l = [1, 2, 3, 4, 5]
l.pop()           # Remove the last item
l

[1, 2, 3, 4]

In [81]:
l.pop(0)          # Remove the first item.
l

[2, 3, 4]

In [82]:
# L.remove(value) 
# Remove first occurrence of value.
# Raises ValueError if the value is not present.

l.remove(3)       # Remove the first occurrence of the value 3
l

[2, 4]
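pop() also returns the item it removes, which is handy when you need the value. Since remove() raises ValueError for a missing value, a simple guard is to check membership first. A sketch with an illustrative missing value:

```python
l = [1, 2, 3, 4, 5]

last = l.pop()         # pop() returns the removed item
print(last, l)         # 5 [1, 2, 3, 4]

value = 10
if value in l:         # avoid ValueError for a missing value
    l.remove(value)
else:
    print(value, "is not in the list")
print(l)               # [1, 2, 3, 4]
```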

### Aggregates

In [83]:
l = [1, 2, 3, 4, 5]
min(l)

1

In [84]:
max(l)

5

In [85]:
sum(l)

15

In [86]:
avg = sum(l) / len(l)
avg

3.0
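These functions behave differently on an empty list: sum() returns 0, min() and max() raise ValueError unless you pass a default, and the average above would fail with ZeroDivisionError. A hedged sketch of a guarded average; the helper name mean_or_none is hypothetical.

```python
def mean_or_none(values):
    """Return the arithmetic mean of values, or None if values is empty."""
    if len(values) == 0:
        return None
    return sum(values) / len(values)

print(mean_or_none([1, 2, 3, 4, 5]))   # 3.0
print(mean_or_none([]))                # None

print(sum([]))                         # 0
print(max([], default=None))           # None instead of ValueError
```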

### Containment

In [87]:
l = [1, 2, 3, 4, 5]
3 in l

True

In [88]:
"a" in l

False

### List Comprehensions

In [89]:
l1 = [1, 2, 3]

l2 = []                  # Create a new list l2 by multiplying each element of l1 by 10.
for item in l1:
    l2.append(item * 10)
l2

[10, 20, 30]

In [90]:
l2 = [item * 10 for item in l1]
l2

[10, 20, 30]

A list comprehension describes the new list in a single expression: [expression for item in iterable] evaluates the expression for each item and collects the results in a new list.
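A comprehension can also end with an if clause that keeps only the items satisfying a condition. The sketch below builds the same list both ways so the correspondence is visible; the names are illustrative.

```python
numbers = [1, 2, 3, 4, 5, 6]

result = []                      # loop version: even numbers, times 10
for n in numbers:
    if n % 2 == 0:
        result.append(n * 10)

# [expression for item in iterable if condition]
result2 = [n * 10 for n in numbers if n % 2 == 0]

print(result)                    # [20, 40, 60]
print(result == result2)         # True
```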

In [91]:
l1 = ["1", "2", "3"]
l2 = ["a", "b", "c"]

l3 = []                  # Create a new list l3 by concatenating all pairs of elements from l1 & l2.
for item1 in l1:         
    for item2 in l2:
        l3.append(item1 + item2)
l3

['1a', '1b', '1c', '2a', '2b', '2c', '3a', '3b', '3c']

In [92]:
l3 = [item1 + item2 for item1 in l1 for item2 in l2]
l3

['1a', '1b', '1c', '2a', '2b', '2c', '3a', '3b', '3c']

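List comprehensions are not only shorter but usually also somewhat faster than an equivalent for loop that calls append(), because they run with less per-item overhead. The exact gain depends on the workload and the Python version, so measure it before relying on it.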
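To check the speed difference on your own machine, the standard-library timeit module can time both versions. This is a sketch: the function names and data size are illustrative, and the printed timings depend on your hardware and Python version, so no particular speedup is implied.

```python
import timeit

data = list(range(10_000))

def with_loop():
    result = []
    for item in data:
        result.append(item * 10)
    return result

def with_comprehension():
    return [item * 10 for item in data]

# Both versions must produce the same list before their speed is compared.
assert with_loop() == with_comprehension()

# timeit.timeit accepts a callable and runs it `number` times.
loop_time = timeit.timeit(with_loop, number=200)
comp_time = timeit.timeit(with_comprehension, number=200)
print(f"for loop:           {loop_time:.3f} s")
print(f"list comprehension: {comp_time:.3f} s")
```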

## Exercises

Suppose you have a list object like below.

In [93]:
a = ["dogs", "cats", "birds", "tigers", "lions", "foxes"]

Add a new element <i>elephants</i> to the list. 

In [94]:
# Your answer here


Print out the number of elements in the list. 

In [95]:
# Your answer here


Print out each item as a sentence: "<i>I love dogs.</i>", "<i>I love cats.</i>", and so on. (Hint: use a for loop for iteration and + for string concatenation.)

In [96]:
# Your answer here


Sort the list in descending alphabetical order. 

In [97]:
# Your answer here


Using a list comprehension, create a new list that contains the elements of <i>a</i> in ascending alphabetical order.

In [98]:
# Your answer here


## ▪ Dictionaries

Refer to https://docs.python.org/3/library/stdtypes.html#mapping-types-dict

In [99]:
d = {}
print(d, type(d))

{} <class 'dict'>


In [100]:
d = dict()
print(d, type(d))

{} <class 'dict'>


In [101]:
buildings = {"CPHB": "College of Public Health Building", "UCC": "University Capitol Center"}
buildings

{'CPHB': 'College of Public Health Building',
 'UCC': 'University Capitol Center'}

In [102]:
buildings["CPHB"]

'College of Public Health Building'

In [103]:
buildings["UCC"]

'University Capitol Center'

Note that a key is looked up with square brackets, not parentheses. A dictionary is not a function, so buildings("CPHB") would raise a TypeError.

In [104]:
buildings["IMU"]

KeyError: 'IMU'
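To look up a key that may be missing without raising a KeyError, the dict method get() returns None, or a default you supply, instead. A common use is counting occurrences; the word list below is made up for illustration.

```python
buildings = {"CPHB": "College of Public Health Building",
             "UCC": "University Capitol Center"}

print(buildings.get("CPHB"))              # College of Public Health Building
print(buildings.get("IMU"))               # None instead of KeyError
print(buildings.get("IMU", "unknown"))    # unknown

words = ["data", "science", "data", "python", "data"]
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1    # start at 0 for new keys
print(counts)                             # {'data': 3, 'science': 1, 'python': 1}
```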

In [105]:
buildings.keys()

dict_keys(['CPHB', 'UCC'])

In [106]:
buildings.values()

dict_values(['College of Public Health Building', 'University Capitol Center'])

When designing a dictionary, decide which item should be the key and which the value based on how you will look things up: lookups always go from key to value. Keys must be unique, while values may repeat.
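For example, buildings maps a building code to its full name, which suits looking up names by code. If you instead need the code for a given name, one option is to build the reverse mapping by iterating over items(), which yields (key, value) pairs. This sketch assumes the values are unique; otherwise later entries would overwrite earlier ones.

```python
buildings = {"CPHB": "College of Public Health Building",
             "UCC": "University Capitol Center"}

codes = {}
for code, name in buildings.items():   # items() yields (key, value) pairs
    codes[name] = code                  # the value becomes the key

print(codes["University Capitol Center"])   # UCC
```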

In [107]:
buildings["IMU"] = "Iowa Memorial Union"
buildings

{'CPHB': 'College of Public Health Building',
 'UCC': 'University Capitol Center',
 'IMU': 'Iowa Memorial Union'}

In [108]:
"IMU" in buildings

True

In [109]:
"PBB" in buildings

False

In [110]:
len(buildings)

3

## Exercises

Manually create a dictionary <i>ages</i> with the names of your family members being the keys and their ages being the values.

In [111]:
# Your answer here


Check if <i>Spiderman</i> is in the dictionary.

In [112]:
# Your answer here


Add a new name-age pair ("Spiderman", 20) to the dictionary.

In [113]:
# Your answer here


Get the age of <i>Spiderman</i> from the dictionary.

In [114]:
# Your answer here


## ▪ Sets

Refer to https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset

In [115]:
s = set()
print(s, type(s))

set() <class 'set'>


In [116]:
s = {"cat", "dog", "bird"}
print(s, type(s))

{'dog', 'cat', 'bird'} <class 'set'>


In [117]:
# Add an element to a set.

s.add("fish")
s

{'bird', 'cat', 'dog', 'fish'}

In [118]:
s.add("fish")
s

{'bird', 'cat', 'dog', 'fish'}

Sets do not allow duplicate values: adding an element that is already present leaves the set unchanged.
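Because duplicates are dropped, converting a list to a set is a quick way to find its distinct values, for instance to count unique entries. Sets are unordered, so wrap the result in sorted() when you need a predictable order.

```python
animals = ["dog", "cat", "dog", "bird", "cat", "dog"]

unique = set(animals)
print(len(animals), len(unique))   # 6 3
print(sorted(unique))              # ['bird', 'cat', 'dog']
```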

In [119]:
# Remove an element from a set; it must be a member.
# If the element is not a member, raise a KeyError.

s.remove("cat")
s

{'bird', 'dog', 'fish'}

In [120]:
# Update a set with the union of itself and others.

s.update({"elephant", "horse", "whale"})
s

{'bird', 'dog', 'elephant', 'fish', 'horse', 'whale'}

The 'add' method adds a single element to a set, while the 'update' method adds every element of the iterable(s) it is given.
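Since update() iterates over its argument, passing a single string adds each of its characters rather than the whole word; use add() for one string. update() also accepts several iterables at once. A short sketch:

```python
s = {"dog"}
s.update("cow")              # a string is an iterable of characters
print(sorted(s))             # ['c', 'dog', 'o', 'w']

s = {"dog"}
s.add("cow")                 # the whole string becomes one element
print(sorted(s))             # ['cow', 'dog']

s.update(["cat", "fox"], {"owl"})
print(sorted(s))             # ['cat', 'cow', 'dog', 'fox', 'owl']
```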

In [121]:
"dog" in s

True

In [122]:
"cow" in s

False

In [123]:
for item in s:
    print(item)

dog
whale
elephant
horse
bird
fish


### Set Operations - Union

In [124]:
s1 = {1, 2, 3, 4, 5}
s2 = {1, 3, 5, 7, 9}

In [125]:
s1 | s2                         # vertical bar as a union operator

{1, 2, 3, 4, 5, 7, 9}

In [126]:
# Return the union of sets as a new set.

s1.union(s2)

{1, 2, 3, 4, 5, 7, 9}

### Set Operations - Intersection

In [127]:
s1 & s2                         # ampersand as an intersection operator

{1, 3, 5}

In [128]:
# Return the intersection of two sets as a new set.

s1.intersection(s2)

{1, 3, 5}

### Set Operations - Difference

In [129]:
s1 - s2

{2, 4}

In [130]:
# Return the difference of two or more sets as a new set.

s1.difference(s2)

{2, 4}

### Set Operations - Symmetric Difference

In [131]:
s1 ^ s2

{2, 4, 7, 9}

In [132]:
# Return the symmetric difference of two sets as a new set.

s1.symmetric_difference(s2)

{2, 4, 7, 9}
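The four operations are related: the symmetric difference equals the union minus the intersection, and also the two one-sided differences combined. A quick check with the sets above (results are passed through sorted() because set printing order is not guaranteed):

```python
s1 = {1, 2, 3, 4, 5}
s2 = {1, 3, 5, 7, 9}

via_union = (s1 | s2) - (s1 & s2)    # in either set, but not in both
via_diffs = (s1 - s2) | (s2 - s1)    # in s1 only, or in s2 only

print(sorted(via_union))                     # [2, 4, 7, 9]
print(via_union == via_diffs == (s1 ^ s2))   # True

print(sorted(s2 - s1))               # [7, 9]: difference is not symmetric
```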

### Set Operations on Multiple Sets

In [133]:
s1 = {1, 2, 3, 4, 5}
s2 = {1, 3, 5, 7, 9}
s3 = {2, 4, 6, 8, 10}

In [134]:
s1 | s2 | s3

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

In [135]:
set.union(s1, s2, s3)

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

## Exercises

Manually create a set <i>s1</i> of even numbers between 1 and 20.

In [136]:
# Your answer here


Manually create a set <i>s2</i> of all the multiples of 3 between 1 and 20.

In [137]:
# Your answer here


Get the union of the two sets.

In [138]:
# Your answer here


Get the intersection of the two sets.

In [139]:
# Your answer here


## ▪ Operators

In [140]:
(1 + 2) * 3

9

Python supports the usual arithmetic operators and follows the standard order of operations; parentheses are evaluated first.

### Powers

In [141]:
2 ** 10      # 2 to the power of 10

1024
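When several operators are combined, precedence matters: ** binds more tightly than *, and also more tightly than a unary minus on its left, and it groups from right to left. A few cases worth checking:

```python
print(1 + 2 * 3)       # 7: * before +
print((1 + 2) * 3)     # 9: parentheses first
print(2 ** 3 ** 2)     # 512: evaluated as 2 ** (3 ** 2)
print(-2 ** 2)         # -4: evaluated as -(2 ** 2)
```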

### Division

In [142]:
5 / 2    # true division

2.5

In [143]:
5 // 2   # floor division

2

In [144]:
5 % 2    # remainder (modulo)

1
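Floor division rounds toward negative infinity, not toward zero, which matters for negative numbers. The remainder takes the sign of the divisor, so (a // b) * b + a % b == a always holds, and divmod() returns both results at once. A short sketch:

```python
print(-5 // 2)         # -3: floor of -2.5
print(-5 % 2)          # 1: the remainder has the sign of the divisor
print(divmod(5, 2))    # (2, 1)

# The identity linking // and % holds for any ints a and b (b != 0).
for a in (5, -5):
    for b in (2, -2):
        assert (a // b) * b + a % b == a
```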

### Type Conversion

In [145]:
a = 1.0
print(a, type(a))

1.0 <class 'float'>


In [146]:
b = int(a)
print(b, type(b))

1 <class 'int'>


In [147]:
c = str(b)
print(c, type(c))

1 <class 'str'>


In [148]:
d = float(c)
print(d, type(d))

1.0 <class 'float'>
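A few conversions that commonly trip people up: int() truncates a float toward zero rather than rounding, and int() rejects a string that looks like a float, raising ValueError. This is also why 1 == "1" evaluated to False earlier: convert first, then compare. (The try/except only keeps the sketch running.)

```python
print(int(3.9))            # 3: truncates toward zero
print(int(-3.9))           # -3
print(round(3.9))          # 4: use round() to round instead

print(int(float("3.5")))   # 3: convert "3.5" to float first

try:
    int("3.5")             # rejected by int()
except ValueError as err:
    print("ValueError:", err)

print(1 == int("1"))       # True once both sides are ints
```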


### Negation

In [149]:
x = 1
-x

-1

### Comparisons

Refer to https://docs.python.org/3/library/stdtypes.html#comparisons

In [150]:
1 == 1

True

Do not confuse the '==' (equality) operator with the '=' (assignment) operator.

In [151]:
1 != 1

False

In [152]:
x = None     # 'is' tests identity; its idiomatic use is comparing with None
x is None

True

In [153]:
x is not None

False
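'is' and 'is not' test identity, that is, whether two names refer to the very same object, while '==' and '!=' compare values. Two separately created lists can be equal without being the same object. In practice, use == for values and reserve is for singletons such as None; recent Python versions even emit a SyntaxWarning for is with a literal, as in an expression like 1 is 1.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a                  # c is another name for the same list object

print(a == b)          # True: same values
print(a is b)          # False: two distinct objects
print(a is c)          # True: the same object

c.append(4)            # a change made through c ...
print(a)               # [1, 2, 3, 4] ... is visible through a
```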

### Augmented Assignment

In [154]:
x = 2
x += 1    # the same as x = x + 1
x

3

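There is no x++ or x-- in Python; use x += 1 or x -= 1 instead. (++x is legal, but it merely applies the unary plus operator twice and leaves x unchanged.)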

In [155]:
x = 2
x -= 1    # the same as x = x - 1
x

1

In [156]:
x = 3
x *= 2    # the same as x = x * 2
x

6

In [157]:
x = 4
x /= 2    # the same as x = x / 2
x

2.0

In [158]:
x = 4
x **= 2    # the same as x = x ** 2
x

16

### Boolean Operations

Refer to https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not

In [159]:
p = True
q = False

In [160]:
p and q

False

In [161]:
p or q

True

In [162]:
not p

False
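The results of and, or, and not for every combination of p and q can be tabulated with two nested loops. Both and and or also short-circuit: and skips its right operand when the left one is False, and or skips it when the left one is True.

```python
rows = []
for p in (True, False):
    for q in (True, False):
        rows.append((p, q, p and q, p or q, not p))
        print(p, q, p and q, p or q, not p)
```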

## ▪ Modules

In [163]:
math.sqrt(9)

NameError: name 'math' is not defined

In [164]:
import math
math.sqrt(9)

3.0

In [165]:
import numpy as np
import pandas as pd

In [166]:
from sklearn import svm 

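External modules such as <i>numpy</i>, <i>pandas</i>, and <i>sklearn</i> are not part of the standard library and must be installed in advance, typically from the command line (OS shell) with the <i>pip</i> command rather than with Python code, e.g. pip install numpy pandas scikit-learn. Note that <i>sklearn</i> is installed under the package name <i>scikit-learn</i>.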
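Besides import module and import module as alias, you can import specific names from a module. To check from inside Python whether a package is installed, the standard-library function importlib.util.find_spec() returns None when a module cannot be found. In this sketch the mapping from import name to pip package name is an assumption covering only the three packages above, and the printed result depends on your environment.

```python
import importlib.util
from math import sqrt          # import one name; no "math." prefix needed

print(sqrt(9))                 # 3.0

# import name -> pip package name (assumed mapping for these three packages)
packages = {"numpy": "numpy", "pandas": "pandas", "sklearn": "scikit-learn"}

for module_name, pip_name in packages.items():
    if importlib.util.find_spec(module_name) is None:
        print(module_name, "is not installed; try: pip install", pip_name)
    else:
        print(module_name, "is installed")
```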