# Learn the Basics

### Assignment operator

- In Python, several assignments can be done simultaneously --> to swap 2 numbers there is **NO NEED** for a temp variable (unlike in C)

In [2]:
a = 1
b = 2
a, b = b, a
print(a, b)

2 1


### String Formatting

- The "%" operator is used to format a set of variables enclosed in a "tuple" (a fixed size list), together with a format string
- Arguments are substituted in the same order as they appear in the tuple
  ```Python
  num1, num2 = 5, 10
  print("Number 1 = %d, Number 2 = %d" % (num1, num2))
  ```
- If a value's type does not match its format specifier (e.g. a string passed to `%d`) --> a `TypeError` is raised
- Any object which is not a string can be formatted using the %s operator as well --> That object will be formatted as a string
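A small sketch of the `%` operator with a few common specifiers. The variable names are illustrative, and `%.2f` (2 decimal places) goes slightly beyond the bullets above as an extra example.

```python
# Values are substituted in tuple order
name = "John"
age = 23
greeting = "%s is %d years old." % (name, age)
print(greeting)  # John is 23 years old.

# %s accepts non-string objects and formats them as strings
mylist = [1, 2, 3]
list_text = "A list: %s" % mylist
print(list_text)  # A list: [1, 2, 3]

# %.2f limits a float to 2 decimal places
price_text = "Price: %.2f" % 3.14159
print(price_text)  # Price: 3.14

# A type mismatch, e.g. a string for %d, raises TypeError
try:
    "%d" % "not a number"
except TypeError as err:
    print("TypeError:", err)
```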

### Basic String Operations

- `.index("<substring>")`: returns the location (index of the first character) of the first occurrence of a substring; raises `ValueError` if it is not found
- Strings also support "slicing" with the syntax `[<start>:<stop>:<step>]`: from index `start` up to, but not including, index `stop` (default step = 1)
- `.upper()` and `.lower()` methods: return a new string converted to upper or lower case
- `.startswith("<string>")` and `.endswith("<string>")` methods: return a bool indicating whether the string starts or ends with the given string
- `.split("<delimiter>")`: splits the string into a list of substrings separated by `<delimiter>` (the delimiter itself is not included in the results)
  - Default (no argument): splits on any run of whitespace

In [None]:
astring = "Hello world!"
afewwords = astring.split(" ")
print(afewwords)
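The string methods listed above can be tried on the same `astring`. This is a short illustrative sketch. The variable names are made up, and the reversing slice `[::-1]` is an extra example of a negative step.

```python
astring = "Hello world!"

# .index returns the position of the first character of the first match
first_o = astring.index("o")
print(first_o)                      # 4

# Slicing: characters from index 3 up to (not including) index 7
middle = astring[3:7]
print(middle)                       # lo w

# A negative step walks backwards, here reversing the whole string
reversed_str = astring[::-1]
print(reversed_str)                 # !dlrow olleH

print(astring.upper())              # HELLO WORLD!
print(astring.startswith("Hello"))  # True
print(astring.endswith("asdf"))     # False

# With no argument, split() breaks on any run of whitespace
print("a  b\tc".split())            # ['a', 'b', 'c']
```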

### Conditions

- Unlike the double equals operator "==", which compares the values of two variables, the "is" operator checks whether both variables refer to the same object (instance)

In [None]:
# Difference between is and ==
x = [1,2,3]
y = [1,2,3]
print(x == y) # Prints out True
print(x is y) # Prints out False
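To see when `is` returns True, give the same list a second name. This is a minimal sketch, and `z` is an illustrative name. Because both names point to one object, a change made through one name is visible through the other.

```python
x = [1, 2, 3]
y = [1, 2, 3]
z = x  # z is another name for the same list object as x

print(x == y)  # True: same values
print(x is y)  # False: two different list objects
print(x is z)  # True: both names refer to one object

# Mutating through z is visible via x, because they share one object
z.append(4)
print(x)       # [1, 2, 3, 4]
print(y)       # [1, 2, 3]
```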

### Loop

- Unlike C/C++, Python loops can have an **else** clause
- When the loop condition of a "for" or "while" statement becomes false --> the code in the "else" block is executed (only once, when the loop ends normally)
- If a **break** statement is executed inside the loop, the "else" block is ***skipped***
- If a **continue** statement is executed inside the loop, the "else" block is still ***executed***

In [None]:
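count = 0
while count < 4:
    print(count)
    count += 1
else:
    # Runs once, when count < 4 becomes false
    print("else block: count value reached %d" % count)

# continue skips multiples of 3; the break at i = 8 also skips the else block
for i in range(1, 10):
    if i % 3 == 0:
        continue
    if i % 8 == 0:
        break
    print(i)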
else:
    print("else block: count value is %d" % (i))
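A common use of `for ... else` is a search loop. In this sketch, `break` means "found a divisor" and the `else` block handles the "not found" case. The prime search and the limit of 20 are illustrative choices.

```python
# Find primes below 20
primes = []
for n in range(2, 20):
    for divisor in range(2, n):
        if n % divisor == 0:
            break          # found a divisor -> n is not prime, else is skipped
    else:
        primes.append(n)   # the inner loop ended without break -> n is prime

print(primes)  # [2, 3, 5, 7, 11, 13, 17, 19]
```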

### Dictionaries

- A dictionary stores `key`: `value` pairs; values are accessed by key instead of by index
- Each value stored in a dictionary can be accessed using its key
- Dictionaries cannot be iterated by index: iterating over a dictionary directly yields its keys, and the `.items()` method yields `(key, value)` pairs
- To remove an entry, use the `del` statement or the `.pop(<key>)` method

In [None]:
# Iterating over dictionaries
phonebook = {"John" : 938477566,"Jack" : 938377264,"Jill" : 947662781}
for name, number in phonebook.items():
    print("Phone number of %s is %d" % (name, number))

# Remove using del
del phonebook["John"]
print(phonebook)

# Remove using pop
phonebook.pop("Jack")
print(phonebook)

- To get the value corresponding to a key --> use the `.get(<key>)` method (returns `None`, or a given default, instead of raising `KeyError` when the key is missing)
- To look at the contents of a dictionary, use
  - `keys()` method: returns a ***view*** of all the keys
  - `values()` method: returns a ***view*** of all the values
  - `items()` method: returns a ***view*** of all the items, each as a ***(key, value) tuple***
  > These are **views of the dictionary**, not copies: any change made to the dictionary is reflected in them

In [5]:
mydict = {
    'Mom':'Giang',
    'Dad':'Long',
    'Born':2002,
    }

x=mydict.keys()
y=mydict.values()
z=mydict.items()
print(x)
print(y)
print(z)

print("\nAfter change:")
mydict['Sex'] = 'Female'
print(x)
print(y)
print(z)

dict_keys(['Mom', 'Dad', 'Born'])
dict_values(['Giang', 'Long', 2002])
dict_items([('Mom', 'Giang'), ('Dad', 'Long'), ('Born', 2002)])

After change:
dict_keys(['Mom', 'Dad', 'Born', 'Sex'])
dict_values(['Giang', 'Long', 2002, 'Female'])
dict_items([('Mom', 'Giang'), ('Dad', 'Long'), ('Born', 2002), ('Sex', 'Female')])
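The difference between `[]` and `.get()` shows up only when a key is missing. This sketch reuses part of the phonebook from above, and the default value `0` is an arbitrary choice.

```python
phonebook = {"John": 938477566, "Jill": 947662781}

# Square brackets raise KeyError for a missing key
try:
    phonebook["Jack"]
except KeyError:
    print("Jack is not in the phonebook")

# .get returns None (or a supplied default) instead of raising
print(phonebook.get("Jack"))     # None
print(phonebook.get("Jack", 0))  # 0
print(phonebook.get("John"))     # 938477566

# Iterating over the dict directly yields its keys (in insertion order)
print(list(phonebook))           # ['John', 'Jill']
```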


### Modules and Packages

- The Python interpreter looks for **modules** in:
  - the directory of the script being run (or the current directory)
  - the built-in modules
  - the directories listed in the `PYTHONPATH` environment variable (used to specify additional directories)
  - any directory added with `sys.path.append` (call it ***before*** the `import`)
- `dir` function: lists the names (functions, classes, variables) defined in a module
- **Package** = a directory containing multiple modules (and possibly sub-packages)
- A regular Python package ***MUST*** contain an `__init__.py` file (since Python 3.3, "namespace packages" can omit it), which is used to:
  - Indicate that the directory it is in is a Python package
  - Decide which modules the package exports as the API, while keeping other modules internal  (by overriding the `__all__` variable)
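The sketch below builds a throwaway package in a temporary directory, adds that directory to `sys.path` before importing, and inspects a module with `dir`. The names `demo_pkg`, `helpers` and `greet` are hypothetical. Note that `__all__` controls only what `from demo_pkg import *` exports.

```python
import importlib
import os
import sys
import tempfile

# Create a hypothetical package "demo_pkg" containing one module "helpers"
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write('__all__ = ["helpers"]\n')  # exported by "from demo_pkg import *"
with open(os.path.join(pkg_dir, "helpers.py"), "w") as f:
    f.write("def greet(name):\n    return 'Hello, ' + name\n")

# The directory must be on sys.path BEFORE the import runs
sys.path.append(root)
importlib.invalidate_caches()  # make sure the freshly written files are seen

from demo_pkg import helpers

print(helpers.greet("Linh"))    # Hello, Linh
print("greet" in dir(helpers))  # True
```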

# Data Science Tutorials

### Numpy Arrays

- Key advantages: fast, easy to work with, and able to perform calculations across entire arrays
- Numpy arrays can be created from a Python list with `np.array`
- Normal arithmetic operators act element-wise on arrays
- Arrays support subsetting (e.g. with a boolean mask)

In [2]:
import numpy as np

# Create 2 new lists
height = [1.87,  1.87, 1.82, 1.91, 1.90, 1.85]
weight = [81.65, 97.52, 95.25, 92.98, 86.18, 88.45]

# Create 2 numpy arrays from lists
np_height = np.array(height)
np_weight = np.array(weight)
print(np_height)
print(np_weight)

# Element-wise operations
bmi = np_weight / np_height ** 2

# Print the result
print(type(bmi))
print(bmi)

# Subsetting with a boolean mask --> Ex: print elements of bmi that are > 26
print(bmi[bmi > 26])

[1.87 1.87 1.82 1.91 1.9  1.85]
[81.65 97.52 95.25 92.98 86.18 88.45]
<class 'numpy.ndarray'>
[23.34925219 27.88755755 28.75558507 25.48723993 23.87257618 25.84368152]
[27.88755755 28.75558507]
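The same operator can behave differently on a list and on an array. In this sketch (the two sample heights are taken from above), `*` repeats a Python list but multiplies each element of a NumPy array. Comparisons also work element-wise and produce the boolean mask used for subsetting.

```python
import numpy as np

height_list = [1.87, 1.82]
np_height = np.array(height_list)

# On a plain Python list, * repeats the list
print(height_list * 2)   # [1.87, 1.82, 1.87, 1.82]

# On a NumPy array, * multiplies every element
doubled = np_height * 2
print(doubled)

# Comparisons are element-wise too and give a boolean mask for subsetting
mask = np_height > 1.85
print(mask)
print(np_height[mask])
```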


### Pandas Basics

- Pandas DataFrames: store and manipulate tabular data in rows of observations and columns of variables
- To create a DataFrame: from a dictionary, or by importing a `.csv` file with `pd.read_csv` (by default, the first line is the header)
  - `index_col`: sets which column is used as the index of the DataFrame
- Every DataFrame has an index (row labels), shown as its first column
  - Default: integers, starting from 0
  - Can be modified by assigning to the `.index` attribute
- Use `[]` with a header name to select a **column** (single brackets return a Pandas Series, double brackets return a Pandas DataFrame)
- Use a slice inside `[]` (e.g. `data[0:4]`) to select **rows**
- Use `loc` and `iloc` to select data (based on both columns and rows)
  - `loc` is ***label-based*** --> pass the ***names*** of the rows or columns; it also accepts boolean arrays
  - `iloc` is ***integer-position-based*** --> pass the ***integer positions*** of the rows or columns, e.g. to locate a single cell of the data set

In [1]:
import pandas as pd

# Named brics_dict to avoid shadowing the built-in dict type
brics_dict = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
       "capital": ["Brasilia", "Moscow", "New Dehli", "Beijing", "Pretoria"],
       "area": [8.516, 17.10, 3.286, 9.597, 1.221],
       "population": [200.4, 143.5, 1252, 1357, 52.98] }

brics = pd.DataFrame(brics_dict)
print(brics)

print('\n')

# index_col=0: use the first column of the CSV file as the index
data = pd.read_csv('sample_data.csv', index_col = 0)
print(data)

        country    capital    area  population
0        Brazil   Brasilia   8.516      200.40
1        Russia     Moscow  17.100      143.50
2         India  New Dehli   3.286     1252.00
3         China    Beijing   9.597     1357.00
4  South Africa   Pretoria   1.221       52.98


   day  month  year   name
1   16     12     2   Linh
2   13      9     2   Hieu
3   14     10     2  Duong
4   12     12     2   Kiet
5    2      6    76  Giang
6   12      8    80    Hue


In [2]:
# Set the index for brics
brics.index = ["BR", "RU", "IN", "CH", "SA"]

# Print out brics with new index values
print(brics)

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Dehli   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98


In [3]:
# Print out day column as Pandas Series
print(data['day'])

# Print out day column as Pandas DataFrame
print(data[['day']])

# Print out DataFrame with day and year columns
print(data[['day', 'year']])

1    16
2    13
3    14
4    12
5     2
6    12
Name: day, dtype: int64
   day
1   16
2   13
3   14
4   12
5    2
6   12
   day  year
1   16     2
2   13     2
3   14     2
4   12     2
5    2    76
6   12    80


In [4]:
# Print out first 4 observations
print(data[0:4])

# Print out the second and third observations (positions 1 and 2)
print(data[1:3])

   day  month  year   name
1   16     12     2   Linh
2   13      9     2   Hieu
3   14     10     2  Duong
4   12     12     2   Kiet
   day  month  year   name
2   13      9     2   Hieu
3   14     10     2  Duong


In [5]:
# Print out observations born in year 02 (boolean selection with loc)
print(data.loc[data['year']==2])

   day  month  year   name
1   16     12     2   Linh
2   13      9     2   Hieu
3   14     10     2  Duong
4   12     12     2   Kiet


In [6]:
print(data.loc[:,['year', 'name']])

   year   name
1     2   Linh
2     2   Hieu
3     2  Duong
4     2   Kiet
5    76  Giang
6    80    Hue


In [7]:
# Select the value at row position 2, column position 1
print(data.iloc[2, 1])
print('\n')
print(data.loc[[1, 2]])

10


   day  month  year  name
1   16     12     2  Linh
2   13      9     2  Hieu
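The `loc`/`iloc` difference is easiest to see with string row labels. This sketch uses a trimmed copy of the brics data, so it does not depend on `sample_data.csv`. The same cell is reached by label with `loc` and by position with `iloc`.

```python
import pandas as pd

# A trimmed copy of the brics data with string row labels
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"],
     "area": [8.516, 17.10, 3.286]},
    index=["BR", "RU", "IN"],
)

# loc: select by label (row "RU", column "area")
area_ru = brics.loc["RU", "area"]
# iloc: select by integer position (row 1, column 1) -> the same cell
area_ru_pos = brics.iloc[1, 1]
print(area_ru, area_ru_pos)  # 17.1 17.1

# loc also accepts a boolean mask for the rows
large = brics.loc[brics["area"] > 5, "country"]
print(list(large))           # ['Brazil', 'Russia']
```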


# Advanced Tutorials

### Generators

- A generator is a function that returns an iterator, producing a set of items one at a time --> used to create iterators
- A generator usually contains a ***loop*** (`for` or `while` loop)
  - When the items are iterated over, the generator's code runs
  - Each time the code reaches a `yield` statement, it yields (produces) a new element and pauses until the next element is requested

In [7]:
import random

def lottery():
    # yields 6 numbers between 1 and 40
    for i in range(6):
        yield random.randint(1, 40)

    # yields a 7th number between 1 and 15
    yield random.randint(1, 15)

# lottery() returns a generator object, not a set, list, ...
print(lottery())

print('\n')

for random_number in lottery():
    print("And the next number is... %d!" % random_number)

<generator object lottery at 0x7f1ef0558f20>


And the next number is... 7!
And the next number is... 33!
And the next number is... 21!
And the next number is... 34!
And the next number is... 9!
And the next number is... 1!
And the next number is... 10!
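A generator produces values lazily, and it can be consumed only once. The hypothetical `countdown` generator below makes both points visible with `next()` and `list()`.

```python
def countdown(n):
    # Yields n, n-1, ..., 1; the body only runs when a value is requested
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(next(gen))  # 3
print(list(gen))  # [2, 1]  (the remaining values)
print(list(gen))  # []      (the generator is now exhausted)
```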


### List Comprehensions

- Create a new list based on another iterable object, in a single, readable line
  --> No need to write an explicit loop

In [None]:
# Create a list containing the length of each word in a string (except for "the")
sentence = "the quick brown fox jumps over the lazy dog"
words = sentence.split()
word_lengths = [len(word) for word in words if word != "the"]
print(words)
print(word_lengths)

In [10]:
# Create a list from a tuple
numbers = (34.6, -203.4, 44.9, 68.3, -12.2, 44.6, 12.7)
newlist = [int(x) for x in numbers if x>0]
print(newlist)

[34, 44, 68, 44, 12]


In [17]:
# Create a list from a generator
def fib():
    # Yields the first 10 Fibonacci numbers
    a, b = 1, 1
    for _ in range(10):
        yield a
        a, b = b, a + b

for number in fib():
    print(number)

print('\n')

# Keep only the Fibonacci numbers divisible by 3
multiples_of_3 = [x for x in fib() if x % 3 == 0]
print(multiples_of_3)

1
1
2
3
5
8
13
21
34
55


[3, 21]
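A comprehension is shorthand for a loop that builds a list. This sketch writes the word-length example from above both ways and checks that they agree. The general shape is `[expression for item in iterable if condition]`.

```python
sentence = "the quick brown fox jumps over the lazy dog"
words = sentence.split()

# Explicit loop version
lengths_loop = []
for word in words:
    if word != "the":
        lengths_loop.append(len(word))

# Equivalent list comprehension
lengths_comp = [len(word) for word in words if word != "the"]

print(lengths_loop == lengths_comp)  # True
print(lengths_comp)                  # [5, 5, 3, 5, 4, 4, 3]
```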


### Lambda functions

- Lambda functions = inline functions defined at the ***same place*** they are used --> often used for a function that is needed only once
  - No need to declare a function somewhere else and revisit the code
  - No need to give it a name
- We define a lambda function using the keyword `lambda`, with syntax:
  > `your_function_name = lambda inputs : output`

In [1]:
l = [2, 4, 7, 3, 14, 19]
# Define the lambda once, outside the loop; it returns True for odd numbers
is_odd = lambda x: x % 2 == 1
for i in l:
    print(is_odd(i))

False
False
True
True
False
True


- We can also define a lambda function and call it immediately as an anonymous function, ***without assigning it to a variable***

In [2]:
print((lambda x: x*x)(5))

25
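A typical place for a one-off lambda is the `key` argument of `sorted` or the first argument of `filter`. The word list in this sketch is made up. The number list is the one used above.

```python
words = ["banana", "kiwi", "apple", "fig"]

# Sort by word length: the lambda is written right where it is needed
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['fig', 'kiwi', 'apple', 'banana']

# Keep only the odd numbers
odds = list(filter(lambda x: x % 2 == 1, [2, 4, 7, 3, 14, 19]))
print(odds)       # [7, 3, 19]
```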


### Multiple Function Arguments

#### Arbitrary Argument - *args

- We can declare functions that receive a ***variable number of arguments*** (unknown in advance) --> add `*` before the parameter name
- For example, in the code below the `therest` parameter is a ***tuple***, which receives all arguments given to the `foo` function after the first 3 arguments

In [3]:
def foo(first, second, third, *therest):
    print("First: %s" %(first))
    print("Second: %s" %(second))
    print("Third: %s" %(third))
    print("And all the rest... %s" %(list(therest)))

foo(1, 2, 3, 4, 5)

First: 1
Second: 2
Third: 3
And all the rest... [4, 5]
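The hypothetical `total` function below accepts any number of positional arguments. It also shows the reverse direction: a `*` at the call site unpacks a list into separate arguments.

```python
def total(*numbers):
    # numbers is a tuple holding every positional argument passed in
    return sum(numbers)

print(total())         # 0
print(total(1, 2, 3))  # 6

# * at the call site unpacks a list into separate arguments
values = [4, 5, 6]
print(total(*values))  # 15
```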


#### Keyword Argument - kwargs

- We can pass arguments to a function ***by keyword***, with the `key = value` syntax
  > The order of the argument does not matter

In [7]:
def say_hello(name, nation):
    print("Hello " + name + " from " + nation)

say_hello(nation="Vietnam", name="Linh")

Hello Linh from Vietnam


#### Arbitrary Keyword Argument - **kwargs

- If you do not know how many ***keyword arguments*** will be passed into the function --> add `**` before the parameter name
- The function will receive a ***dictionary*** of arguments and can access its items
  > The order of the arguments does not matter
- If the function accesses a keyword argument with `[]` but the calling code does not pass it --> a `KeyError` is raised (`.get()` can supply a default instead)

In [1]:
def my_function(**kid):
    # kid is a dict, here {"fname": "Tobias", "lname": "Refsnes"}
    print("His last name is " + kid["lname"])

my_function(fname = "Tobias", lname = "Refsnes")

His last name is Refsnes


In [8]:
# foo counts the extra positional arguments; bar checks the keyword argument magicnumber
def foo(a, b, c, *extra):
    return len(extra)

def bar(a, b, c, **extra):
    return extra.get("magicnumber") == 7

# Test code
print(foo(1, 2, 3, 4))
print(bar(1, 2, 3, magicnumber=6))
print(bar(1, 2, 3, magicnumber=7))

1
False
True
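Using `.get()` with a default avoids the `KeyError` described above when a keyword is missing. In this sketch, `describe` is a hypothetical function. A `**` at the call site unpacks a dictionary into keyword arguments.

```python
def describe(**info):
    # info is a dict holding every keyword argument passed in;
    # .get with a default avoids KeyError when "lname" is not passed
    lname = info.get("lname", "unknown")
    return "Last name: " + lname

print(describe(fname="Tobias", lname="Refsnes"))  # Last name: Refsnes
print(describe(fname="Tobias"))                   # Last name: unknown

# ** at the call site unpacks a dict into keyword arguments
person = {"fname": "Tobias", "lname": "Refsnes"}
print(describe(**person))                         # Last name: Refsnes
```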


### Regular Expressions