# STA160 Data Science 2021 Spring

## References

* Python for Data Analysis
* Python Data Science Handbook
* Hands-On Machine Learning with Scikit-Learn & TensorFlow

## Preparation

The primary programming language for this class is **Python 3**. Python 2 is not a good substitute, so make sure you have Python 3.

If your operating system is Windows or OS X, the best way to set up Python 3 is to get Anaconda. Anaconda is a Python distribution designed specifically for scientific computing. Anaconda includes many of the packages we'll use this quarter. You can find Anaconda at: <https://www.anaconda.com/download/>

If your operating system is Linux, the best way to set up Python 3 is through your distribution's package manager. You'll also need to install Jupyter this way.

## Jupyter Notebooks

This document is called a Jupyter notebook. Jupyter notebooks enable you to write text and code in the same document, and to run the code to display the results. Jupyter notebooks are very popular in the data science community and are convenient for data analysis.

# Part 1: Python Basics

## Number and String

In [1]:
x = 1    # int
y = 2.8  # float
x+y

3.8

In [2]:
print(type(x))
print(type(y))
print(type(x+y))

<class 'int'>
<class 'float'>
<class 'float'>


String literals in Python are surrounded by either single or double quotation marks. For example, "Python" and 'Python' are the same.

Strings in Python are immutable sequences of Unicode characters.
Square brackets can be used to access elements of the string. Elements are numbered starting from 0, not 1 (unlike R).

In [3]:
s='Python'
print(s[0])  # select the first character
print(s[1])  # select the second character
print(s[-1]) # select the last character
print(s[-2]) # select the second-to-last character

P
y
n
o


You can get a slice (multiple elements) of a string with the : operator.
The start index is included, but the stop index is not (unlike R).

In [4]:
s[0:4]   # from s[0] to s[3]

'Pyth'

In [5]:
s[1:]    # from s[1] to end

'ython'

In [6]:
s[:4]    # from s[0] to s[3]

'Pyth'

In [7]:
s[:]     # from s[0] to end

'Python'

In [8]:
s[0:5:2]  # from s[0] to s[4] with increment 2, so returns s[0] s[2] s[4]

'Pto'
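
The start, stop, and step can also be omitted or negative. The short sketch below is an added illustration (the variable names `reversed_s`, `every_other`, and `last_three` are chosen here, not part of the original notebook) of a few common patterns built on the same slicing rule.

```python
s = 'Python'

# A negative step walks the string from right to left.
reversed_s = s[::-1]   # every character, in reverse order
every_other = s[::2]   # s[0], s[2], s[4]
last_three = s[-3:]    # from the third-to-last character to the end

print(reversed_s)   # nohtyP
print(every_other)  # Pto
print(last_three)   # hon
```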

In [9]:
len(s) # returns the length of a string

6

In [10]:
s.lower() # returns the string in lower case

'python'

In [11]:
s.upper() # returns the string in upper case

'PYTHON'

In [12]:
s.replace('hon','orch') # replaces a string with another string

'Pytorch'

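Strings are immutable, so you cannot assign to a single character. For example, s[0]='A' raises the following error:

TypeError: 'str' object does not support item assignment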

In [13]:
try:
    s[0]='A'
    print("finish")
except TypeError:
    print("The type of input is wrong!")

The type of input is wrong!
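
Since a string cannot be changed in place, one common workaround is to build a new string from pieces of the old one. This is a minimal sketch; the name `t` is just an illustrative variable.

```python
s = 'Python'

# Combine a new first character with a slice of the original string.
t = 'J' + s[1:]

print(t)  # Jython
print(s)  # Python (the original string is unchanged)
```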


Concatenate two strings: String1 + String2

In [14]:
s='Python'
print('Hi ' + s + '!')
print(s[0]+s[-1])

Hi Python!
Pn


__Casting__ converts a value to a type you specify.

In [15]:
float(1)  # from int to float

1.0

In [16]:
int(1.0)  # from float to int

1

In [17]:
str(1)    # from int to string

'1'

In [18]:
int('1')  # from string to int

1

In [19]:
int('P')  # Error

ValueError: invalid literal for int() with base 10: 'P'
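
When the input might not be a valid number, the conversion can be wrapped in try/except so the program keeps running. The helper `to_int` below is a hypothetical function written for this example, not a built-in.

```python
def to_int(text, default=None):
    """Return int(text), or default if text is not a valid integer literal."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for strings such as 'P'
        return default

print(to_int('1'))     # 1
print(to_int('P'))     # None
print(to_int('P', 0))  # 0
```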

## List, Tuple, Set and Dictionary
There are four collection data types in the Python programming language:

__List__ is a collection which is ordered and changeable. Allows duplicate members.

__Tuple__ is a collection which is ordered and unchangeable. Allows duplicate members.

__Set__ is a collection which is unordered and unindexed. No duplicate members.

__Dictionary__ is a collection of key-value pairs which is changeable and indexed by key. It preserves insertion order as of Python 3.7. No duplicate keys.
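
As a quick side-by-side comparison, the sketch below builds each of the four collection types from the same values. The variable names are illustrative only.

```python
items = [3, 1, 3, 2]

as_list = list(items)           # keeps order and duplicates, can be modified
as_tuple = tuple(items)         # keeps order and duplicates, cannot be modified
as_set = set(items)             # drops duplicates, has no positions
as_dict = dict.fromkeys(items)  # keys are unique; each value defaults to None

print(as_list)       # [3, 1, 3, 2]
print(as_tuple)      # (3, 1, 3, 2)
print(len(as_set))   # 3
print(len(as_dict))  # 3
```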

### List

In [20]:
L=[1,'a',3.0]  # list can have different types 
print(L)

[1, 'a', 3.0]


In [21]:
L[1]        # selection

'a'

In [22]:
L[0:3]      # slicing

[1, 'a', 3.0]

In [23]:
L.append(6) # append(): add an element at the end of the list
print(L)

[1, 'a', 3.0, 6]


In [24]:
L.extend([1,2,3,4]) # extend(): add the elements of a list to the end of the current list
print(L)

[1, 'a', 3.0, 6, 1, 2, 3, 4]


In [25]:
L=[1,'a',3]
L.insert(2,'b') # insert() adds an element at the specified position
print(L)

[1, 'a', 'b', 3]


In [26]:
print(L)      
L.pop(1)      # pop(): removes the element at the specified position
print(L)      # remove L[1]

[1, 'a', 'b', 3]
[1, 'b', 3]


In [27]:
print(L)
L.pop()       # remove L[-1]
print(L)

[1, 'b', 3]
[1, 'b']


In [28]:
L=[1,5,6,3,2,7,8,3]
L.remove(3)  # remove(): removes the first occurrence of the element with the specified value
print(L)     # remove the first value 3

[1, 5, 6, 2, 7, 8, 3]


In [29]:
print(L)
L.reverse()  # reverse(): reverses the order of the list
print(L)

[1, 5, 6, 2, 7, 8, 3]
[3, 8, 7, 2, 6, 5, 1]


In [30]:
L.sort()              # sort(): sorts the list ascending by default
print(L)

L.sort(reverse=True)  # sort descending
print(L)           

[1, 2, 3, 5, 6, 7, 8]
[8, 7, 6, 5, 3, 2, 1]
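
Note that sort() changes the list in place and returns None. If you want to keep the original list, the built-in sorted() function returns a new list instead, as this small sketch shows.

```python
L = [3, 8, 7, 2, 6, 5, 1]

ascending = sorted(L)                 # new list, smallest first
descending = sorted(L, reverse=True)  # new list, largest first

print(ascending)   # [1, 2, 3, 5, 6, 7, 8]
print(descending)  # [8, 7, 6, 5, 3, 2, 1]
print(L)           # [3, 8, 7, 2, 6, 5, 1] (unchanged so far)

result = L.sort()  # sorts L in place
print(result)      # None
```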


### Tuple (an unchangeable list)
Tuples have no methods that modify them, such as append, extend, pop, or remove.

In [31]:
T=(1,3,5,6)
print(T)
print(type(T))

(1, 3, 5, 6)
<class 'tuple'>


In [32]:
T[0:2]

(1, 3)

In [33]:
T[0]=0  # Error

TypeError: 'tuple' object does not support item assignment

### Set

In [34]:
S={1,1,2,5,7,7}
print(S)
print(type(S))

{1, 2, 5, 7}
<class 'set'>


In [35]:
S.add(9) # add(): add one element to a set
print(S)

{1, 2, 5, 7, 9}


In [36]:
S.remove(2) # remove(): remove an element from a set; it must be a member
print(S)

{1, 5, 7, 9}


In [37]:
S1={1,3,4,5}
S2={3,4,6,7}
print(S1.union(S2)) # Union
print(S1.intersection(S2)) # Intersection

{1, 3, 4, 5, 6, 7}
{3, 4}
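
Sets also support operators for the same operations, plus difference and symmetric difference. The sketch below reuses S1 and S2 from above; the result names are illustrative.

```python
S1 = {1, 3, 4, 5}
S2 = {3, 4, 6, 7}

union = S1 | S2           # same as S1.union(S2)
common = S1 & S2          # same as S1.intersection(S2)
only_in_s1 = S1 - S2      # elements of S1 that are not in S2
in_exactly_one = S1 ^ S2  # elements in one set but not both

print(sorted(only_in_s1))      # [1, 5]
print(sorted(in_exactly_one))  # [1, 5, 6, 7]
print(3 in S1)                 # True (membership test)
```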


### Dictionary

A dictionary is a container for {key: value} pairs. You can use any __hashable__ type (such as a number, string, or tuple) as a key and __any__ type as a value.
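
One caveat on keys: they need to be hashable, which in practice rules out mutable containers such as lists. The sketch below is an added illustration with a made-up key and value.

```python
D = {}
D[('STA', 160)] = 'Data Science'  # a tuple is hashable, so it works as a key

try:
    D[['STA', 160]] = 'Data Science'  # a list is mutable and cannot be a key
except TypeError as err:
    print('TypeError:', err)

print(D)  # {('STA', 160): 'Data Science'}
```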

Three ways to define a dictionary:

In [46]:
# 1
D={1:'STA' , 'Hi':160 , 2.0:'2021'}
print(D)

{1: 'STA', 'Hi': 160, 2.0: '2021'}


In [47]:
D['Hi']

160

In [48]:
# 2
dict(A='STA', B=160, C='2021')

{'A': 'STA', 'B': 160, 'C': '2021'}

In [49]:
# 3
D={}
D[1]='STA'
D['Hi']=160
D[2.0]='2021'
print(D)

{1: 'STA', 'Hi': 160, 2.0: '2021'}


In [50]:
D.items()

dict_items([(1, 'STA'), ('Hi', 160), (2.0, '2021')])

In [51]:
D.keys()

dict_keys([1, 'Hi', 2.0])

In [52]:
D.values()

dict_values(['STA', 160, '2021'])

In [55]:
for x,y in D.items():
    print(x,y)

1 STA
Hi 160
2.0 2021


In [56]:
for x in D.values():
    print(x)
print('finish')

STA
160
2021
finish


In [57]:
a=[str(x) for x in D.values()]
print(a)

['STA', '160', '2021']


From __List__ to __String__

In [145]:
' '.join(a)

'STA 160 2021'

In [146]:
b=[1,2,3]
', '.join(map(str,b)) # map() to convert each item in the list to a string then join them

'1, 2, 3'

From __String__ to __List__

In [147]:
list('Python')

['P', 'y', 't', 'h', 'o', 'n']

From __List__ to __Set__

In [148]:
set([1,2,3])

{1, 2, 3}

From __Set__ to __List__

In [149]:
list({1,2,3})

[1, 2, 3]

From __Set__ to __Tuple__

In [150]:
tuple({1,2,3})

(1, 2, 3)

From __Tuple__ to __List__

In [151]:
list((1,2,3))

[1, 2, 3]

### Inserting values into strings

You can use the string format() method to create new strings with inserted values.
The curly braces show where the inserted value should go.

In [7]:
"Month {}, Year {}.".format('April', 2020)

'Month April, Year 2020.'

For % operator formatting, you mark where each inserted value should go with a % character followed by a format specifier, which says how the value should be inserted.

In [8]:
# Notice the %s marker to insert a string, and the %d marker to insert an integer.
"Month %s, Year %d." % ('April', 2020)

'Month April, Year 2020.'
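
A third option, available in Python 3.6 and later, is the f-string, which places variable names directly inside the curly braces. This is an added sketch; `month`, `year`, and `text` are illustrative names.

```python
month = 'April'
year = 2020

# Prefix the string with f and put expressions inside the braces.
text = f"Month {month}, Year {year}."
print(text)  # Month April, Year 2020.

# A format specifier can follow a colon inside the braces.
print(f"{3.14159:.2f}")  # 3.14
```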

## Iterators

Strings, lists, tuples, sets, and dictionaries are all iterable objects: containers you can get an iterator from. Passing any of them to the built-in iter() function returns an iterator, and next() retrieves its elements one at a time.

In [37]:
it=iter([1,2,3])
for x in it:
    print(x)

1
2
3


In [46]:
it=iter({'a':1,'b':2,'c':3})
print(next(it))
print(next(it))
print(next(it))

a
b
c
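
Once an iterator has no elements left, a further next() call raises StopIteration, which is how a for loop knows when to stop. The sketch below shows this, along with the optional default argument of next(); the variable names are illustrative.

```python
it = iter([1, 2])
first = next(it)
second = next(it)
print(first, second)  # 1 2

try:
    next(it)  # nothing left, so this raises StopIteration
except StopIteration:
    print('exhausted')

# A second argument to next() is returned instead of raising.
fallback = next(it, 'default')
print(fallback)  # default
```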


In [152]:
# iterator: pair each element with its index, starting from 0
enumerate('Python')

<enumerate at 0x10dccf2d0>

In [153]:
list(enumerate('Python')) # convert to list

[(0, 'P'), (1, 'y'), (2, 't'), (3, 'h'), (4, 'o'), (5, 'n')]

In [154]:
dict(enumerate('Python')) # convert to dictionary

{0: 'P', 1: 'y', 2: 't', 3: 'h', 4: 'o', 5: 'n'}

In [155]:
# iterator: element-wise pairs
zip([1,2,3],['a','b','c'])

<zip at 0x10dd47d48>

In [156]:
list(zip([1,2,3],['a','b'])) # convert to list

[(1, 'a'), (2, 'b')]

In [157]:
dict(zip([1,2,3],['a','b'])) # convert to dictionary

{1: 'a', 2: 'b'}

In [158]:
t=zip('abc','ABC')
[ u+l for l,u in t]

['Aa', 'Bb', 'Cc']
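
In practice, enumerate() and zip() are most often used directly in a for loop rather than converted to a list first. A minimal sketch, with illustrative lists `names` and `values`:

```python
names = ['a', 'b', 'c']
values = [1, 2, 3]

# enumerate() yields (index, element) pairs.
for i, name in enumerate(names):
    print(i, name)

# zip() yields one element from each input at a time.
for name, value in zip(names, values):
    print(name, value)

pairs = dict(zip(names, values))
print(pairs)  # {'a': 1, 'b': 2, 'c': 3}
```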

## For, While loops

In [26]:
L=[1,3,4]
# record L squared in result
result=[]
for i in range(len(L)):
    result.append(L[i]**2)
print(result)

[1, 9, 16]
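
The same result can be written more compactly by looping over the elements directly in a list comprehension, as in this sketch. The filtered variant `odd_squares` is an extra illustration.

```python
L = [1, 3, 4]

# Loop over the elements themselves instead of over their indices.
result = [x**2 for x in L]
print(result)  # [1, 9, 16]

# An optional if clause keeps only the elements that pass a test.
odd_squares = [x**2 for x in L if x % 2 == 1]
print(odd_squares)  # [1, 9]
```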


In [48]:
# range() function
print(list(range(3)))
print(list(range(1,4)))

[0, 1, 2]
[1, 2, 3]


In [49]:
M=[[1,2,3],[4,5,6],[7,8,9]]

for i in range(len(M)):
    for j in range(len(M[0])):
        if M[i][j]==5:
            M[i][j]*=10
        else:
            M[i][j]+=1
print(M)

[[2, 3, 4], [5, 50, 7], [8, 9, 10]]


In [50]:
x=10
addup=0
while x>0:
    print(addup)
    addup+=x
    x-=1
print("The total sum is: ",addup)

0
10
19
27
34
40
45
49
52
54
The total sum is:  55
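
The loop above adds the integers from 10 down to 1. As a cross-check, the built-in sum() and range() functions give the same total in one line; `total` is an illustrative name.

```python
# range(1, 11) produces 1, 2, ..., 10 (the stop value is excluded).
total = sum(range(1, 11))
print(total)  # 55
```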


In [None]:
# x -= 1 is shorthand for x = x - 1

### break and continue

In [58]:
i = 0
while i < 6:
    i += 1
    print(i)
    if i == 3:
        break
    print('finish an iteration')

1
finish an iteration
2
finish an iteration
3


In [59]:
i = 0
while i < 6:
    i += 1
    print(i)
    if i == 3:
        continue # skip the rest and do next iteration
    print('finish an iteration')

1
finish an iteration
2
finish an iteration
3
4
finish an iteration
5
finish an iteration
6
finish an iteration


## Functions

In [1]:
def f(x):
    return 2*x+1

In [2]:
f(1)

3
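
Functions can also take default argument values and a docstring that describes what they do. The function `linear` and its parameter names below are hypothetical, written as a generalization of f for this example.

```python
def linear(x, slope=2, intercept=1):
    """Return slope * x + intercept."""
    return slope * x + intercept

print(linear(1))                  # 3 (defaults reproduce f above)
print(linear(1, slope=3))         # 4
print(linear(1, 3, intercept=0))  # 3
```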