# Python Crash Course 1

- Modules
- Functions
- Data types  
  
(This notebook is translated and edited from https://hcid-courses.github.io/datajournalism-2019/schedule.html)

In [None]:
print("Hello, World!")

Hello, World!


## The Zen of Python

In [None]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


## Whitespace Formatting

- Python uses indentation to delimit blocks.
- You must indent the body of for, if, while, try-except, def, and similar statements.
- The number of spaces is up to you, as long as it is consistent within a block.
- By convention (PEP 8), almost everyone uses 4 spaces, so let's do the same.

In [None]:
for i in [1, 2, 3, 4, 5]:
    print(i)                     # first line in "for i" block
    for j in [1, 2, 3, 4, 5]:
        print(j)                # first line in "for j" block
        print(i + j)             # last line in "for j" block
    print(i)                     # last line in "for i" block
print("done looping")

1
1
2
2
3
3
4
4
5
5
6
1
2
1
3
2
4
3
5
4
6
5
7
2
3
1
4
2
5
3
6
4
7
5
8
3
4
1
5
2
6
3
7
4
8
5
9
4
5
1
6
2
7
3
8
4
9
5
10
5
done looping


- Whitespace (indentation) marks where a block starts and ends.
- However, whitespace inside parentheses and brackets is ignored, so an expression can continue across several lines.

In [None]:
long_winded_computation = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 +
                           13 + 14 + 15 + 16 + 17 + 18 + 19 + 20)

- The result is the same either way, so lay out long expressions in whatever way is easiest to read.

In [None]:
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

easier_to_read_list_of_lists = [ [1, 2, 3],
                                 [4, 5, 6],
                                 [7, 8, 9] ]

list_of_lists == easier_to_read_list_of_lists

True

## Commenting Code

- To add an explanation to your code, or to temporarily disable part of it, use #.
- Shortcut: Ctrl + /

In [None]:
# This is a comment in Python
print("Hello World") # This is also a comment in Python

"""
This is an example of a multiline
comment that spans multiple lines...
"""

Hello World


'\nThis is an example of a multiline\ncomment that spans multiple lines...\n'

In [None]:
for i in range(5):
    m = a + b
    print(i)

NameError: name 'a' is not defined

In [None]:
for i in range(5):
    # m = a + b
    print(i)

0
1
2
3
4


## Modules

- No language has every feature built in.
- Extra functionality therefore comes from libraries, some bundled with Python and some written by other people. In Python, these are called modules.
- Modules extend the functionality that Python provides.

In [None]:
import re  # This module provides regular expressions (regex)

my_regex = re.compile("[0-9]+")
string_to_search = "I got 100 dollars first month, 200 next month, and 300 this month. What a deal!"
match = my_regex.findall(string_to_search)

match

['100', '200', '300']

In [None]:
import matplotlib.pyplot as plt # "as" gives a long module name a short alias

In [None]:
import pandas
print(pandas.DataFrame())

Empty DataFrame
Columns: []
Index: []


In [None]:
import pandas as pd
print(pd.DataFrame())

Empty DataFrame
Columns: []
Index: []


- You can choose to import a subset of variables and/or functions from a module.

In [None]:
from collections import defaultdict, Counter
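
As a small illustration of a name imported this way, here is a minimal sketch using defaultdict, which supplies a default value for keys that do not exist yet. The sample data and variable names (words, by_first_letter) are made up for this example.

```python
from collections import defaultdict

# Hypothetical sample data, for illustration only
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# A missing key starts out as an empty list, so no existence check is needed
by_first_letter = defaultdict(list)
for word in words:
    by_first_letter[word[0]].append(word)

print(dict(by_first_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```

Importing only the names you need keeps the call sites short: defaultdict(list) instead of collections.defaultdict(list).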

## Assigning Values

- In Python, you assign a value to a variable with =, as shown below.
- Unlike many other languages, Python does not require you to declare a variable's data type.
- Assignment does not copy. When one variable (e) is assigned from another (d), both names refer to the same object, so changing the list through e also changes what d shows.

In [None]:
a = "my string"
b = 1027
c = 1027.0
d = [1, 2, 3]
e = d
e.append(4)


print(a)
print(b)
print(c)
print(d)
print(e)

my string
1027
1027.0
[1, 2, 3, 4]
[1, 2, 3, 4]


In [None]:
type(a), type(b), type(c), type(d), type(e)

(str, int, float, list, list)
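
To check whether two names refer to the same object, the is operator may help. The sketch below reuses d and e from above and adds a third, hypothetical list f with equal contents for comparison.

```python
d = [1, 2, 3]
e = d            # e is another name for the same list object
f = [1, 2, 3]    # f is a separate list that happens to hold equal values

print(e is d)    # True: same object
print(f is d)    # False: different objects
print(f == d)    # True: equal contents
```

In short, == compares values while is compares identity.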

- To create a separate list that copies the values of the original, use one of the following approaches.

In [None]:
# 1. [:]
orig = [1, 2, 3]
dupe = orig[:]

print("original: ", orig)
print("duplicated: ", dupe)
dupe.append(4)

print("original: ", orig)
print("duplicated: ", dupe)

original:  [1, 2, 3]
duplicated:  [1, 2, 3]
original:  [1, 2, 3]
duplicated:  [1, 2, 3, 4]


In [None]:
# 2. copy

from copy import copy
orig = [1, 2, 3]
dupe = copy(orig)

print("original: ", orig)
print("duplicated: ", dupe)
dupe.append(4)

print("original: ", orig)
print("duplicated: ", dupe)

original:  [1, 2, 3]
duplicated:  [1, 2, 3]
original:  [1, 2, 3]
duplicated:  [1, 2, 3, 4]
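
Both approaches above make a shallow copy: the outer list is new, but any nested lists are still shared. The following sketch, which assumes a nested list as input, contrasts copy with deepcopy from the same module.

```python
from copy import copy, deepcopy

orig = [[1, 2], [3, 4]]
shallow = copy(orig)      # new outer list, but the inner lists are shared
deep = deepcopy(orig)     # the inner lists are copied as well

shallow[0].append(99)     # also visible through orig
deep[1].append(77)        # does not affect orig

print("original:", orig)      # original: [[1, 2, 99], [3, 4]]
print("shallow: ", shallow)   # shallow:  [[1, 2, 99], [3, 4]]
print("deep:    ", deep)      # deep:     [[1, 2], [3, 4, 77]]
```

For flat lists of numbers or strings, a shallow copy is usually enough.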


## Interactive Programming

- To receive and process user input, use the **input()** function. It always returns a string.

In [None]:
name = input("Hello there, and what's your name?")
print("Nice to meet you, " + name + ".")

Hello there, and what's your name? Chur


Nice to meet you, Chur.
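
Because input() returns text, a number typed by the user has to be converted before doing arithmetic. The sketch below uses a hypothetical helper, years_until_100, and passes the text in directly so that it runs without waiting for a keyboard.

```python
def years_until_100(age_text):
    """Convert the text typed by a user into an int and return 100 - age."""
    age = int(age_text)   # raises ValueError if the text is not a whole number
    return 100 - age

# In a notebook you might call: years_until_100(input("How old are you? "))
print(years_until_100("30"))   # 70
```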


## Arithmetic

- Python provides the basic arithmetic operators. For more complex calculations, modules like *math* are imported and used.

In [None]:
print(1 + 2)
print(1 - 2)
print(5 * 3)
print(5 / 3)

3
-1
15
1.6666666666666667


- If print(5 / 3) prints 1, you are using Python 2, where / between two integers performs integer division.
- Please consider upgrading to Python 3.
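
If you do want integer division in Python 3, the // operator provides it. The short sketch below also shows exponentiation and one function from the math module mentioned above; the variable names are only for illustration.

```python
import math

quotient = 5 // 3        # floor division
negative = -5 // 3       # floors toward negative infinity, not toward zero
power = 2 ** 10          # exponentiation
root = math.sqrt(16)     # square root from the math module

print(quotient)   # 1
print(negative)   # -2
print(power)      # 1024
print(root)       # 4.0
```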

- The % operator calculates the remainder.

In [None]:
print(5 % 3)
print(7 % 2)

2
1


- The remainder is used very often when coding.
- As a counter increases, its remainder cycles through a fixed set of values, which lets you control how often something happens.

In [None]:
for i in range(21):
    flag = i % 3
    if flag == 0:
        print(i, "-", flag, ": Hello")
    elif flag == 1:
        print(i, "-", flag, ": Python")
    else:
        print(i, "-", flag, ": World!")

0 - 0 : Hello
1 - 1 : Python
2 - 2 : World!
3 - 0 : Hello
4 - 1 : Python
5 - 2 : World!
6 - 0 : Hello
7 - 1 : Python
8 - 2 : World!
9 - 0 : Hello
10 - 1 : Python
11 - 2 : World!
12 - 0 : Hello
13 - 1 : Python
14 - 2 : World!
15 - 0 : Hello
16 - 1 : Python
17 - 2 : World!
18 - 0 : Hello
19 - 1 : Python
20 - 2 : World!


## Functions

- Functions let you reuse the same piece of code.
- A function may or may not take parameters.

In [None]:
def hello():
    print("Hello, my friend.")

hello()

Hello, my friend.


In [None]:
def hello(name="my friend"):
    print("Hello, " + name)

hello("Emilio")
hello()

Hello, Emilio
Hello, my friend


In [None]:
def double(x):
    """this is where you put an optional docstring
    that explains what the function does.
    for example, this function multiplies its input by 2"""
    return x * 2

double(4)

8

In [None]:
def subtract(a=0, b=0):
    return a - b

print(subtract(5, 4))
print(subtract(0, 8))
print(subtract(b=5))
print(subtract())

1
-8
-5
0


## Strings

- Strings are declared using either single quotes or double quotes.

In [None]:
single_quoted_string = 'data science'
double_quoted_string = "data science"

single_quoted_string == double_quoted_string

True

- To use a quotation mark within a string, you can either mix single and double quotes or use an escape character.

In [None]:
string = 'I'm a python.'
print(string)

SyntaxError: unterminated string literal (detected at line 1) (1179448193.py, line 1)

In [None]:
string = "I'm a python."
print(string)

I'm a python.


In [None]:
string = 'I\'m a python.'
print(string)

I'm a python.


- \t : tab
- \n : newline (\r is a carriage return)

(Ref.) https://docs.python.org/2.0/ref/strings.html

In [None]:
tabstring1 = "I'm\t\tpython."
tabstring2 = "This is the list of food I like:\n\t* Rat\n\t* Rabbit\n\t* Data"
print(tabstring1)
print(tabstring2)

I'm		python.
This is the list of food I like:
	* Rat
	* Rabbit
	* Data


- Strings support the + operator (concatenation) and the * operator (repetition). Adding a string and a number raises a TypeError.


In [None]:
print("I like " + "Python.")
print("python " * 3)
print("python " + 3)

I like Python.
python python python 


TypeError: can only concatenate str (not "int") to str

In [None]:
print("python " + str(3)) # convert 3 to a string first

python 3
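
An alternative to converting with str() is to let Python format the value for you. The sketch below shows f-strings and str.format, assuming two made-up variables, language and count.

```python
count = 3
language = "python"

with_fstring = f"{language} {count}"            # values are converted automatically
with_format = "{} {}".format(language, count)   # older, equivalent style
rounded = f"{5 / 3:.2f}"                        # a format spec: two decimal places

print(with_fstring)   # python 3
print(with_format)    # python 3
print(rounded)        # 1.67
```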


- To create a multi-line string, use triple quotes ("""), the same syntax that was used earlier for multi-line comments.


In [None]:
multi_line_string = """This is the first line.
and this is the second line
    and this is the third line"""
print(multi_line_string)

This is the first line.
and this is the second line
    and this is the third line


- To access elements within a string, you use an index just like with a list.
- To explicitly convert to a list, you use the list() function.
- (The usage of a list is discussed in more detail in the section below.)

In [None]:
myString = 'hello python'
print(myString[0])
print(myString[4])
print(myString[:3])
print(myString[3:])
print(myString[3:5])
list(myString)

h
o
hel
lo python
lo


['h', 'e', 'l', 'l', 'o', ' ', 'p', 'y', 't', 'h', 'o', 'n']
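
Indexing also accepts negative positions and a step, and strings cannot be modified in place. The following is a small sketch of those three points; the variable names are illustrative.

```python
my_string = "hello python"

last_char = my_string[-1]       # negative indices count from the end
last_word = my_string[-6:]      # the final six characters
every_other = my_string[::2]    # a step of 2 skips every second character
reversed_text = my_string[::-1] # a step of -1 walks backwards

print(last_char)       # n
print(last_word)       # python
print(every_other)     # hlopto
print(reversed_text)   # nohtyp olleh

# Strings are immutable: item assignment raises TypeError,
# so build a new string instead.
try:
    my_string[0] = "H"
except TypeError:
    capitalized = "H" + my_string[1:]

print(capitalized)     # Hello python
```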

## Date and Time

- The datetime module in Python's standard library supports the datetime, date, and time types.
- As the name suggests, the datetime type stores both date and time information, and it is the most commonly used of the three.



In [None]:
from datetime import datetime, date, time

# Year, Month, Day, Hour, Minute, Second
dt = datetime(2023, 8, 21, 12, 34, 56)

print(dt)

2023-08-21 12:34:56


In [None]:
print(dt.day)
print(dt.month)
print(dt.minute)

21
8
34


In [None]:
print(dt.date())
print(dt.time())

2023-08-21
12:34:56


- The strftime method converts a datetime into a string. The format codes (such as %m, %d, %Y) control what the output looks like.
- (Ref.) http://strftime.org/

In [None]:
dt.strftime("%m/%d/%Y %H:%M (%A)")

'08/21/2023 12:34 (Monday)'
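
The reverse direction, from string to datetime, uses datetime.strptime with the same format codes. A minimal sketch, assuming the input text follows the month/day/year layout shown above:

```python
from datetime import datetime

text = "08/21/2023 12:34"
parsed = datetime.strptime(text, "%m/%d/%Y %H:%M")   # string -> datetime

print(parsed)                        # 2023-08-21 12:34:00
print(parsed.strftime("%Y-%m-%d"))   # 2023-08-21
```

If the text does not match the format, strptime raises a ValueError.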

In [None]:
dt2 = datetime.now()
dt2

# the large final number is the microsecond component

datetime.datetime(2023, 8, 16, 12, 37, 32, 294472)

In [None]:
delta = dt2 - dt

print(delta)

print(f"{delta.days} days")
print(f"{delta.seconds} seconds")

-5 days, 0:02:36.294472
-5 days
156 seconds
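
The output above can look surprising: a negative timedelta is stored as a negative number of days plus a positive number of seconds. The sketch below repeats the calculation with two fixed datetimes (microseconds dropped for simplicity) and uses total_seconds() to get a single signed number.

```python
from datetime import datetime, timedelta

start = datetime(2023, 8, 21, 12, 34, 56)
end = datetime(2023, 8, 16, 12, 37, 32)

delta = end - start
print(delta.days, delta.seconds)    # -5 156
print(delta.total_seconds())        # -431844.0

# timedelta objects can also be added to a datetime
one_week_later = start + timedelta(days=7)
print(one_week_later)               # 2023-08-28 12:34:56
```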


## Exceptions

- Errors are a routine part of programming, especially when operating on data.
- There are various types of errors, and by default the program terminates when one occurs.
- To handle an error and keep the program running, you use exception handling (try-except).
- For example, 0/0 raises a ZeroDivisionError and the program terminates, but if you handle the exception, the program continues after the error is processed.
- Let's check the following code.




In [None]:
print(0/0)
print("Finished.")

ZeroDivisionError: division by zero

In [None]:
try:
    print(0/0)
except:  # a bare except catches every error, which can hide bugs
    print("What is happening?")

print("Finished.")

What is happening?
Finished.


In [None]:
try:
    print(0/0)
except ZeroDivisionError:
    print("What is happening?")

print("Finished.")

What is happening?
Finished.


In [None]:
try:
    print(0/0)
except ZeroDivisionError as e:
    print(e)

print("Finished.")

division by zero
Finished.
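
A try statement can also carry else and finally clauses. The sketch below wraps the division in a hypothetical helper, safe_divide, to show when each clause runs.

```python
def safe_divide(a, b):
    """Return a / b, or None when b is zero."""
    try:
        result = a / b
    except ZeroDivisionError as e:
        print("Could not divide:", e)
        return None
    else:
        # runs only when no exception was raised
        return result
    finally:
        # runs in every case, even after a return
        print("Done with", a, "/", b)

print(safe_divide(6, 3))
print(safe_divide(1, 0))
```

The first call returns 2.0 and the second returns None; in both cases the finally message is printed before the function returns.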


## Lists

- The most basic data structure in Python is the list, a simple ordered collection.
- It is similar to an array in other languages, but it comes with more built-in operations for inserting, deleting, and merging data.
- In statically typed languages you usually have to specify the element type when creating a list or array. In dynamically typed languages such as Python, you do not.




In [None]:
integer_list = [1, 2, 3]
print(integer_list)

[1, 2, 3]


In [None]:
# Different types of data can be placed in a single list.
heterogeneous_list = ["string", 0.1, True]
print(heterogeneous_list)

['string', 0.1, True]


In [None]:
# A list can contain another list.
list_of_lists = [ integer_list, heterogeneous_list, [] ]
print(list_of_lists)

[[1, 2, 3], ['string', 0.1, True], []]


In [None]:
list_length = len(integer_list)   # equals 3
list_sum = sum(integer_list)      # equals 6
print(list_length, list_sum)

3 6


- To store and retrieve data in a List, an index is used. **The index starts from 0.**
- To repeat: **the index starts from 0.**




In [None]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]     # is the list [0, 1, ..., 9]
print(x)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [None]:
a = x[0]         # equals 0, lists are 0-indexed
b = x[1]         # equals 1
c = x[-1]        # equals 9, last element
d = x[-2]        # equals 8, next-to-last element
print(a, b, c, d)

0 1 9 8


In [None]:
x[0] = -1         # now x is [-1, 1, 2, 3, ..., 9]
x[-2] = 100       # now x is [-1, 1, 2, ..., 7, 100, 9]
print(x)

[-1, 1, 2, 3, 4, 5, 6, 7, 100, 9]


- You can slice a list using square brackets.




In [None]:
first_three = x[:3]     # [-1, 1, 2]
three_to_end = x[3:]    # [3, 4, ..., 7, 100, 9]
one_to_four = x[1:5]    # [1, 2, 3, 4]
last_three = x[-3:]     # [7, 100, 9]
without_first_and_last = x[1:-1]   # [1, 2, ..., 7, 100]
copy_of_x = x[:]        # [-1, 1, 2, ..., 7, 100, 9]
print(first_three, three_to_end, one_to_four, last_three, without_first_and_last, copy_of_x)

[-1, 1, 2] [3, 4, 5, 6, 7, 100, 9] [1, 2, 3, 4] [7, 100, 9] [1, 2, 3, 4, 5, 6, 7, 100] [-1, 1, 2, 3, 4, 5, 6, 7, 100, 9]


- To combine two lists, use the + operator or the .extend method. The .append method (used earlier) adds a single element to the end.




In [None]:
x = [1,2,3]
x.extend([4, 5, 6])    # x is now [1,2,3,4,5,6]
print(x)

[1, 2, 3, 4, 5, 6]


In [None]:
x = [1,2,3]
y = x + [4, 5, 6]      # y is [1, 2, 3, 4, 5, 6]; x is unchanged
print(y)

[1, 2, 3, 4, 5, 6]


In [None]:
x = [1, 2, 3]
x.append(0)
print(x)

[1, 2, 3, 0]
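
The difference between the two methods is easiest to see when the argument is itself a list. A short sketch:

```python
x = [1, 2, 3]
x.append([4, 5])     # the whole list becomes one element
print(x)             # [1, 2, 3, [4, 5]]

y = [1, 2, 3]
y.extend([4, 5])     # each item is added separately
print(y)             # [1, 2, 3, 4, 5]
```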


- You can determine whether a specific element is included in a list by using the 'in' operator.




In [None]:
print(1 in [1, 2, 3])  # True
print(0 in [1, 2, 3])  # False

True
False


- You can create a sequence of consecutive numbers using the range() function.
- However, since the sequence created by range() is not a list, you must convert it using the list() function if you want to use it like a list.




In [None]:
a = range(10)
print(a)

range(0, 10)


In [None]:
b = list(a)
print(b)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
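
range() also accepts a start value and a step. The sketch below shows the three common forms; the variable names are only for illustration.

```python
from_two = list(range(2, 8))        # start, stop (stop is excluded)
by_three = list(range(0, 10, 3))    # start, stop, step
countdown = list(range(5, 0, -1))   # a negative step counts down

print(from_two)    # [2, 3, 4, 5, 6, 7]
print(by_three)    # [0, 3, 6, 9]
print(countdown)   # [5, 4, 3, 2, 1]
```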


## Tuples

- A tuple is an immutable sequence that behaves much like a list.
- You can therefore read data from it in the same way as from a list.
- The difference between a tuple and a list is that you cannot modify the contents of a tuple.
- Tuples can be created with or without parentheses. Lists use square brackets [].






In [None]:
my_list = [1, 2]
my_tuple1 = (1, 2)
my_tuple2 = 1, 2

print(my_list)
print(my_tuple1)
print(my_tuple2)

my_tuple1 == my_tuple2

[1, 2]
(1, 2)
(1, 2)


True

In [None]:
my_list[1] = 3
print(my_list)

[1, 3]


In [None]:
my_tuple1[1] = 3
print(my_tuple1)

TypeError: 'tuple' object does not support item assignment

- Running the above code results in a TypeError. Handling the exception lets the program deal with the error and continue without terminating.




In [None]:
try:
    my_tuple1[1] = 3
except TypeError:
    print("You cannot modify an element of a tuple.")

print(my_tuple1)

You cannot modify an element of a tuple.
(1, 2)


- Tuples are also used as function return values.
- In many languages a function returns a single value. In Python, a function can return several values at once by packing them into a tuple.




In [None]:
def sum_and_product(x, y):
    return (x + y), (x * y)

sp = sum_and_product(2, 3)
print(sp)

s, p = sum_and_product(5, 10)
print(s, p)

(5, 6)
15 50


- In the example above, s, p = sum_and_product(5, 10) is an instance of multiple assignment.
- Tuples and Lists allow you to assign elements to multiple variables simultaneously.




In [None]:
x, y = (1, 2)
a, b = [4, 3]
print(x, y, a, b)

1 2 4 3
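
When the number of elements is not known in advance, a starred name can collect the leftovers. A minimal sketch with made-up values:

```python
first, *rest = [1, 2, 3, 4]
print(first, rest)           # 1 [2, 3, 4]

head, *middle, tail = (10, 20, 30, 40, 50)
print(head, middle, tail)    # 10 [20, 30, 40] 50
```

Without a starred name, the number of variables on the left must match the number of elements exactly, otherwise Python raises a ValueError.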


- When programming, there are often instances where the data contained in variables needs to be swapped.
- In Python, this operation can be performed as below:

In [None]:
# pythonic way
a, b = 10, 5
print(a, b)
a, b = b, a
print(a, b)

10 5
5 10


## Dictionaries

- Dictionaries are one of the most frequently used data types in data processing.
- While lists and tuples store data by position, dictionaries store data as key-value pairs.
- Key-value pairs are a familiar way to record data. For example:
  
    Name: Emilio Ferrara  
    Course: COMM-557

- Here, "Name" and "Course" are keys, and the values after the colon are the corresponding values.



In [None]:
empty_dict1 = {}
empty_dict2 = dict()
grades = { "Joel":80, "Tim":90 }

print(empty_dict1)
print(empty_dict2)
print(grades)

{}
{}
{'Joel': 80, 'Tim': 90}


- In Python, dictionaries are collections that store key-value pairs.
- Unlike lists and tuples, where you access elements by their index (an integer), dictionaries let you access values using keys, which can be any hashable type, such as strings, numbers, or tuples.
- String keys are case-sensitive, so 'Name' and 'name' are treated as different keys.





In [None]:
joel_grade = grades["Joel"]
joel_grade = grades["joel"]

KeyError: 'joel'

In [None]:
try:
    kate_grade = grades["Kate"]
except KeyError:
    print("No Key found.")

No Key found.


- You can also proceed by first checking whether the key exists.




In [None]:
print("Joel" in grades)
print("Kate" in grades)

True
False


- Instead of catching the exception, you can first check whether the key exists with an if-else statement.
- Depending on the situation, this may be preferable to using an exception.
- (In practice this check is very common; it serves the same purpose as exception handling.)




In [None]:
if "Kate" in grades:
    kate_grade = grades["Kate"]
else:
    print("Kate's grade is not in the list.")

Kate's grade is not in the list.


- Dictionaries in Python also provide a method called get().
- The advantage of using get() is that there is no need for exception handling.
- If the key is missing, get() returns a default value that you can specify (None if not specified).

In [None]:
print(grades.get("Joel"))
print(grades.get("Kate"))
print(grades.get("Kate", 0))
print(grades.get("Kate", "nokey"))

80
None
0
nokey


- To assign a value to a specific key in a dictionary, you use square brackets [].




In [None]:
grades["Tim"] = 95
grades["Kate"] = 100
grades["Joel"] = 85
print(grades)

{'Joel': 85, 'Tim': 95, 'Kate': 100}


- Going forward, when collecting data, you will mostly handle it in the form of a Dictionary.
- For example, after collecting data from Twitter in JSON format, you would store it in the following manner.




In [None]:
tweet = {
    "user" : "joelgrus",
    "text" : "Data Science is Awesome",
    "retweet_count" : 100,
    "hashtags" : ["#data", "#science", "#datascience", "#awesome", "#yolo"]
}

- The collected data is processed in the following manner.




In [None]:
tweet_keys = tweet.keys()
print(tweet_keys)

dict_keys(['user', 'text', 'retweet_count', 'hashtags'])


In [None]:
tweet_values = tweet.values()
print(tweet_values)

dict_values(['joelgrus', 'Data Science is Awesome', 100, ['#data', '#science', '#datascience', '#awesome', '#yolo']])


In [None]:
tweet_items = tweet.items()
print(tweet_items)

dict_items([('user', 'joelgrus'), ('text', 'Data Science is Awesome'), ('retweet_count', 100), ('hashtags', ['#data', '#science', '#datascience', '#awesome', '#yolo'])])


In [None]:
print("user" in tweet)
print("Data Science is Awesome" in tweet_values)
print("Science" in tweet_values)

True
True
False
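
In practice, the keys and values are usually processed together in a loop. The sketch below uses a shortened copy of the tweet dictionary and unpacks each (key, value) pair from items().

```python
tweet = {
    "user": "joelgrus",
    "text": "Data Science is Awesome",
    "retweet_count": 100,
}

# items() yields (key, value) tuples, which can be unpacked in the loop header
for key, value in tweet.items():
    print(key, "->", value)

# The view objects can be turned into ordinary lists when needed
key_list = list(tweet.keys())
print(key_list)   # ['user', 'text', 'retweet_count']
```

Dictionaries preserve insertion order (guaranteed since Python 3.7), so the loop visits the keys in the order they were written.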


- Although it is not the most concise approach, you can use a dictionary to count how often each value appears in a list with the following code:

In [None]:
document = ["a", "b", "c", "a", "a", "c", "a", "c", "a", "a", "a", "b", "c", "b"]
word_counts = {}
for word in document:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1

print(word_counts)

{'a': 7, 'b': 3, 'c': 4}


- The collections module in Python's standard library provides the Counter class, which makes counting much easier.




In [None]:
from collections import Counter
word_counts = Counter(document)
print(word_counts)
print(word_counts['c'], word_counts['a'])

Counter({'a': 7, 'c': 4, 'b': 3})
4 7
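
Counter also offers a most_common() method, and it returns 0 for items it has never seen. A short sketch using the same document list:

```python
from collections import Counter

document = ["a", "b", "c", "a", "a", "c", "a", "c", "a", "a", "a", "b", "c", "b"]
word_counts = Counter(document)

top_two = word_counts.most_common(2)   # the two most frequent items
print(top_two)                         # [('a', 7), ('c', 4)]
print(word_counts["z"])                # 0, a missing item counts as zero
```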


## Sets

- A set is a collection that **does not allow duplicates** and **has no order** among its elements.




In [None]:
s = set()
s.add(1)
s.add(2)
s.add(2)
print(s)

{1, 2}


In [None]:
my_set = set([1,2,3])

my_set.add(4)
print(my_set)

my_set.update([5,6,7,8,9,10])
print(my_set)

my_set.remove(10)
print(my_set)

{1, 2, 3, 4}
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
{1, 2, 3, 4, 5, 6, 7, 8, 9}
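
Note that remove() raises a KeyError when the element is not in the set, while discard() silently does nothing. A minimal sketch of the difference:

```python
my_set = {1, 2, 3}

my_set.discard(10)       # no error even though 10 is absent

try:
    my_set.remove(10)    # remove() raises KeyError for a missing element
except KeyError:
    print("10 is not in the set.")

print(my_set)            # {1, 2, 3}
```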


- A set has no order, so you cannot access its elements by index.
- To access an element by position, first convert the set to a list or a tuple. (To test membership, the 'in' operator works directly on the set.)




In [None]:
my_set = set([1,2,3])
my_list = list(my_set)
my_tuple = tuple(my_set)

print(my_list)
print(my_list[0])
print(my_tuple)
print(my_tuple[0])

[1, 2, 3]
1
(1, 2, 3)
1


In [None]:
set1 = set([1,2,3])
set2 = set([2,4,5,6])
set3 = set() # empty set

# Intersection set (A ∩ B)
print(set1 & set2)
print(set1.intersection(set2))

# Union set (A ∪ B)
print(set1 | set2)
print(set1.union(set2))

# Difference (A − B)
print(set1 - set2)
print(set1.difference(set2))
print(set2.difference(set1))

# Symmetric difference (A ∪ B) - (A ∩ B)
print(set1 ^ set2)

{2}
{2}
{1, 2, 3, 4, 5, 6}
{1, 2, 3, 4, 5, 6}
{1, 3}
{1, 3}
{4, 5, 6}
{1, 3, 4, 5, 6}
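
A common everyday use of sets is removing duplicates from a list. The sketch below uses a made-up list of hashtags and also shows a subset test with <=.

```python
hashtags = ["#data", "#science", "#data", "#yolo", "#science"]

unique_hashtags = set(hashtags)                # duplicates are dropped
print(len(hashtags), len(unique_hashtags))     # 5 3

# sorted() returns a list, which gives the result a predictable order
sorted_hashtags = sorted(unique_hashtags)
print(sorted_hashtags)                         # ['#data', '#science', '#yolo']

# <= tests whether the left set is a subset of the right one
print({"#data"} <= unique_hashtags)            # True
```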
