Reference: [DataDaft](https://www.youtube.com/watch?v=2_6O39UdFi0&list=PLiC1doDIe9rCYWmH9wIEYEXXaJ4KAi3jc)

# **Basic math operations**

In [1]:
1 + 2 # 3
5 - 3 # 2
3 + 4 * 5 # 23
((2 + 3) * 3) **2 # 225
100 % 75 # 25

25

In [2]:
import math # Load the math module

In [3]:
# math.log() takes the natural logarithm of its argument:

math.log(2.7182)

0.9999698965391098

In [4]:
# Add a second argument to specify the log base:

math.log(100, 10)     # Take the log base 10 of 100

2.0

In [5]:
# math.exp() raises e to the power of its argument:

math.exp(10)

22026.465794806718

In [6]:
# Use math.sqrt() to take the square root of a number:

math.sqrt(64)

8.0

In [7]:
# Use abs() to get the absolute value of a number. Note that abs() is a built-in Python function

abs(-30)

30

In [8]:
math.pi   # Get the constant pi

3.141592653589793

## Rounding Numbers

In [9]:
# Use round() to round a number to the nearest whole number:

round(22.5483218)

23

In [10]:
# Add a second argument to round to a specified decimal place

round(22.5483218, 1) # round to 1 decimal place

22.5

In [11]:
# Enter a negative number to round to the left of the decimal

round(22.5483218, -1) # round to the 10's place

20.0

In [12]:
# Round down to the nearest whole number with math.floor()

math.floor(22.5)

22

In [13]:
# Round up with math.ceil()

math.ceil(22.2)

23

# **Basic Data Types**

## Integers

In [14]:
type(12)

int

In [15]:
# Check if 12 is an instance of type "int"

isinstance(12, int)

True

In [16]:
1/3 # A third is not a whole number

0.3333333333333333

In [17]:
type(1/3) # The type of the result is not an int

float

# Float

Floating point numbers (floats) are **numbers with decimal values**.

In [18]:
type(1.0)

float

In [19]:
isinstance(0.333333, float)

True

# Arithmetic operations

In [20]:
5 + 1.0

6.0

In [21]:
int(6.0)

6

In [22]:
float(6)

6.0

Floats can also take on a few special values: **Inf**, **-Inf** and **NaN**.
* Inf and -Inf stand for infinity and negative infinity respectively
* NaN stands for "not a number", which is sometimes used as a placeholder for missing or erroneous numerical values

In [23]:
type(float("Inf"))

float

In [24]:
type(float("NaN"))

float
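
As a minimal sketch of how these special values behave (the variable names are illustrative and not part of the original lesson): infinity compares greater than any finite float, while NaN is not equal to anything, including itself, so `math.isnan()` is the dependable way to detect it.

```python
import math

pos_inf = float("inf")        # float() accepts "inf" in any letter case
nan = float("nan")

print(pos_inf > 1e308)        # True: infinity exceeds any finite float
print(-pos_inf < -1e308)      # True: negative infinity is below any finite float
print(nan == nan)             # False: NaN never equals anything, even itself
print(math.isnan(nan))        # True: use math.isnan() to test for NaN
print(math.isinf(pos_inf))    # True: use math.isinf() to test for infinity
```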

Note: Python contains a third, uncommon numeric data type "complex" which is used to store complex numbers.
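
For completeness, a brief sketch of what a complex number looks like in Python (the variable name `c` is just an example):

```python
c = 3 + 4j               # The suffix j marks the imaginary part

print(type(c))           # <class 'complex'>
print(c.real, c.imag)    # 3.0 4.0
print(abs(c))            # 5.0 (the magnitude of the complex number)
```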

# **Booleans**

Booleans or "bools" are **True/False** values that result from logical statements.

In [25]:
type(True)

bool

In [26]:
type(False)

bool

In [27]:
isinstance(False, bool)

True

In [28]:
# Use > and < for greater than and less than:

20 > 10

True

In [29]:
# Use >= and <= for greater than or equal and less than or equal:

20 >= 20

True

In [30]:
# Use == (two equal signs in a row) to check equality:

30 == 30

True

In [31]:
40 == 40.0  # Equivalent ints and floats are considered equal

True

In [32]:
# Use != to check inequality (!= as "not equal to")

15 != 20

True

In [33]:
# Use the keyword "not" for negation:

not False

True

In [34]:
# Use the keyword "and" for logical and:

(20 > 15) and (10 > 14)

False

In [35]:
# Use the keyword "or" for logical or:

(10 >14) or (20 > 15)

True

In [36]:
# Without parentheses, "not" is evaluated first, then "and", then "or":

20 > 15 or 10 > 14 and not True

True

In [37]:
((20 > 15) or (10 > 14)) and (not True)

False

You can convert numbers into boolean values using the **bool()** function. All numbers other than 0 convert to True:

In [38]:
bool(1)

True

In [39]:
bool(3)

True

In [40]:
bool(0)

False

# **Strings**

Text data is known as a string or "**str**".

In [41]:
type("car")

str

In [42]:
type('car')

str

# **None**

In Python, "**None**" is a special data type that is often used to represent a missing value.

For example, if you define a function that does not return anything (does not give you back some resulting value), it will return "None" by default.

In [43]:
type(None)

NoneType

In [44]:
# Define a function that prints the input but returns nothing

def my_function(x):
    print(x)

my_function("hello") is None  # my_function returns None, so this is True

hello


True

# **Variables**

In Python, assign variables using "**=**".

In [45]:
x = 10
y = "This is a string"
z = 144**0.5 == 12

print(x)
print(y)
print(z)

10
This is a string
True


In [46]:
p = 23
print(p)

23


In [47]:
x + z + p

34

In [48]:
n = m = 4
print(n)
print(m)

4
4


In [49]:
# Below is a method of extracting variables from a comma separated sequence, also known as "tuple unpacking"

x, y, z = (10, 20, 30)

print(x)
print(y)
print(z)

10
20
30


In [50]:
# You also can swap the values of two variables using a similar syntax:

(x, y) = (y, x)

print(x)
print(y)

20
10


In [51]:
x = "HELLO"   # Create a new string
y = x         # Assign y the same object as x
y = y.lower() # Assign y the result of y.lower()

print(x)
print(y)

HELLO
hello


In [52]:
x = [1, 2, 3]    # Create a new list
y = x            # Assign y the same object as x
y.append(4)      # Add 4 to the end of list y

print(x)
print(y)

[1, 2, 3, 4]
[1, 2, 3, 4]


# **Lists**

Construct a list with a comma-separated sequence of objects within **square brackets** "[ ]":

# List Basics

In [53]:
my_list = ["Lesson", 5, "Python", True]

print(my_list)

['Lesson', 5, 'Python', True]


Alternatively, you can construct a list by passing some other iterable into the list() function.

In [54]:
second_list = list("Lists of Python")  # Create a list from a string

print(second_list)

['L', 'i', 's', 't', 's', ' ', 'o', 'f', ' ', 'P', 'y', 't', 'h', 'o', 'n']


In [55]:
# A list with no content is known as the empty list:

empty_list = []

print(empty_list)

[]


You can add a new item to an existing list with the **list.append()** function:

In [56]:
empty_list.append("It is now not empty")

print(empty_list)

['It is now not empty']


Remove a matching item from a list with the **list.remove()** function:

In [57]:
my_list.remove(5)

print(my_list)

['Lesson', 'Python', True]


Join two lists together with the **+** operator:

In [58]:
combined_list = my_list + empty_list

print(combined_list)

['Lesson', 'Python', True, 'It is now not empty']


You can also add a sequence to the end of an existing list with the **list.extend()** function:

In [59]:
combined_list = my_list
combined_list.extend(empty_list)

print(combined_list)

['Lesson', 'Python', True, 'It is now not empty']


Check the length, maximum, minimum and sum of a list with the **len(), max(), min(), sum()** functions, respectively:

In [60]:
num_list = [1, 3, 3, 5, 5, 5, 7, 9]
print(len(num_list))                 # Check the length
print(max(num_list))                 # Check the max
print(min(num_list))                 # Check the min
print(sum(num_list))                 # Check the sum
print(sum(num_list)/len(num_list))   # Check the mean

8
9
1
38
4.75


Note: Python does not have a built-in function to calculate the mean, but libraries such as **NumPy** provide one.
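
Besides NumPy, the standard library's `statistics` module also offers a mean function, so no extra installation is needed. A minimal sketch using the same list as above:

```python
import statistics

num_list = [1, 3, 3, 5, 5, 5, 7, 9]

print(statistics.mean(num_list))     # 4.75
print(statistics.median(num_list))   # 5.0
```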

You can check whether a list contains a certain object with the "**in**" keyword:

In [61]:
1 in num_list

True

Add the keyword "**not**" to test whether a list does not contain an object:

In [62]:
1 not in num_list

False

Count the occurrences of an object within a list using the **list.count()** function:

In [63]:
num_list.count(3)

2

Other common list functions include **list.sort()** and **list.reverse()**:

In [64]:
new_list = [1, 5, 8, 2, 4, 8, 6]     # Make a new list

new_list.reverse()                   # Reverse the list
print("Reversed list:", new_list)

new_list.sort()                      # Sort the list
print("Sorted list:", new_list)

Reversed list: [6, 8, 4, 2, 8, 5, 1]
Sorted list: [1, 2, 4, 5, 6, 8, 8]


# **Lists Indexing and Slicing**

In [65]:
another_list = ["Hello", "this", "is", "a", "practice"]

print(another_list[0])
print(another_list[4])

Hello
practice


In [66]:
print(another_list[-1])
print(another_list[-3])

practice
is


Supplying an index outside of a list's range will result in an **IndexError**:

In [67]:
print(another_list[6])

IndexError: list index out of range

If your list contains other indexed objects, you can supply additional indexes to get items contained within the nested objects:

In [None]:
nested_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(nested_list[0][2])
print(nested_list[1][1])
print(nested_list[2][1])

You can take a slice (sequential subset) of a list using the syntax **[start:stop:step]** where:
* **start** and **stop** are the starting and ending indexes for the slice
* **step** controls how frequently you sample values along the slice (default = 1, meaning you take every value in the range, starting at **start** and going up to but not including **stop**)

In [None]:
my_slice = another_list[1:3]    # Slice index 1 and 2
print(my_slice)

In [None]:
# Slice the entire list but use step size 2 to get every other item:

my_slice = another_list[0:6:2]
print(my_slice)

You can leave the starting or ending index blank to slice from the beginning or up to the end of the list respectively:

In [None]:
slice1 = another_list[:3]    # Slice everything up to, but not including, index 3
print(slice1)

In [None]:
slice2 = another_list[3:]    # Slice everything from index 3 to the end
print(slice2)

If you provide a negative number as the step, the slice steps backward:

In [None]:
# Take a slice starting at index 4, stepping backward and stopping before index 2

my_slice = another_list[4:2:-1]
print(my_slice)

If you do not provide a starting or ending index, you get a slice of the entire list:

In [None]:
my_slice = another_list[:]   # This slice operation copies the list
print(my_slice)

Using a step of -1 without a starting or ending index slices the list in reverse, providing a shorthand to reverse a list:

In [None]:
my_slice = another_list[::-1]  # This slice operation reverses the list
print(my_slice)

You can use indexing to **change the values within a list** or **delete items in a list**:

In [None]:
another_list[3] = "basic"   # Set the value at index 3 to "basic"
print(another_list)

del another_list[3]         # Delete the item at index 3
print(another_list)
print(another_list)

You can also remove items from a list using the **list.pop()** function.

pop() removes the final item in a list and returns it:

In [None]:
next_item = another_list.pop()

print(next_item)
print(another_list)

Notice that the list resizes itself dynamically as you **delete** or **add** items. Appending items to a list and removing items from the end of a list with **list.pop()** are very fast operations. Deleting or inserting items at the front of a list or within its body is much slower, because every item after that position has to shift.
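
The speed difference can be observed with a rough timing sketch like the one below. The helper function names and the list size are arbitrary choices for illustration, and the actual timings depend on your machine, so no expected numbers are shown.

```python
import timeit

def build_with_append(n):
    items = []
    for i in range(n):
        items.append(i)        # Add to the end: no existing items move
    return items

def build_with_insert_front(n):
    items = []
    for i in range(n):
        items.insert(0, i)     # Add to the front: every existing item shifts
    return items

n = 5_000   # Arbitrary size chosen for illustration
append_time = timeit.timeit(lambda: build_with_append(n), number=3)
insert_time = timeit.timeit(lambda: build_with_insert_front(n), number=3)

print(f"append:    {append_time:.4f} s")
print(f"insert(0): {insert_time:.4f} s")
```

The gap between the two timings tends to widen as `n` grows.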

# Copying Lists

As shown earlier, slicing an entire list with the [:] indexing operation makes a copy of it.

You can also copy a list using the **list.copy()** function:

In [None]:
list1 = [1, 2, 3]        # Make a list
list2 = list1.copy()     # Copy the list
list1.append(4)          # Add an item to list 1

print("List1:", list1)
print("List2:", list2)

The copy was not affected by the append operation we performed on the original list.

The copy function (and slicing an entire list with [:]) creates what is known as **a shallow copy**, which makes a new list where each list element refers to the object at the same position (index) in the original list.

In [None]:
list1 = [1, 2, 3]                      # Make a new list
list2 = ["List within a list", list1]  # Nest it in another list
list3 = list2.copy()                   # Shallow copy list2

print("Before appending to list1:")
print("List2:", list2)
print("List3:", list3, "\n")

list1.append(4)                        # Add an item to list1
print("After appending to list1:")
print("List2:", list2)
print("List3:", list3)

Because list3 is a shallow copy of list2, the second element of list2 and the second element of list3 both refer to list1. Thus, when we append a new value to list1, the change shows up in both list2 and list3.
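
One way to confirm this sharing, sketched here with the `is` operator (which tests whether two names refer to the very same object):

```python
list1 = [1, 2, 3]
list2 = ["List within a list", list1]
list3 = list2.copy()            # Shallow copy

print(list3 is list2)           # False: the outer lists are separate objects
print(list3[1] is list1)        # True: the nested list is shared

list2.append("extra")           # Changing the outer list does not affect the copy
print(len(list2), len(list3))   # 3 2
```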

When you are working with nested lists, you have to make a **deep copy** if you want to truly copy the nested objects in the original and avoid this behavior of shallow copies.

You can make a deep copy using the **deepcopy()** function in the copy library:

In [None]:
import copy                            # Load the copy module

list1 = [1, 2, 3]                      # Make a new list
list2 = ["List within a list", list1]  # Nest it in another list
list3 = copy.deepcopy(list2)           # Deep copy list2

print("Before appending to list1:")
print(list1)
print("List2:", list2)
print("List3:", list3, "\n")

list1.append(4)                        # Add an item to list1
print("After appending to list1:")
print(list1)
print("List2:", list2)
print("List3:", list3)

# **Tuples and Strings**

# Tuples

Tuples are an immutable sequence data type commonly used to hold short collections of related data.

For instance, if you wanted to store latitude and longitude coordinates for cities, tuples might be a good choice because the values are related and not likely to change.

Like lists, tuples can store objects of different types.

Construct a tuple with a comma-separated sequence of objects within **parentheses** "( )":

In [None]:
my_tuple = (1, 3, 5)
print(my_tuple)

Alternatively, you can construct a tuple by passing an iterable into the **tuple()** function:

In [None]:
my_list = [2, 3, 1, 5, 4]

another_tuple = tuple(my_list)
another_tuple

Tuples generally support the same indexing and slicing operations as lists and they also support some of the same functions, with the caveat that **tuples cannot be changed after they are created**.

This means we can do things like find the length, max or min of a tuple, but **we cannot append new values to them** or **remove values from them**:

In [None]:
another_tuple[2]    # You can index into tuples

In [None]:
another_tuple[2:4]   # You can slice tuples

In [None]:
# You can use common sequence functions on tuples:

print(len(another_tuple))
print(min(another_tuple))
print(max(another_tuple))
print(sum(another_tuple))
print(sum(another_tuple)/len(another_tuple))

In [None]:
another_tuple.append(2)   # You cannot append to a tuple

In [None]:
del another_tuple[1]   # You cannot delete from a tuple
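
Both of the cells above raise an error. As a sketch, the errors can be caught with `try`/`except` to see which exception each operation produces, and a "changed" tuple can be obtained by building a new one (the name `longer_tuple` is illustrative):

```python
another_tuple = (2, 3, 1, 5, 4)

try:
    another_tuple.append(2)
except AttributeError as error:
    print("AttributeError:", error)   # Tuples have no append() method

try:
    del another_tuple[1]
except TypeError as error:
    print("TypeError:", error)        # Tuples do not support item deletion

# To "change" a tuple, build a new one instead:
longer_tuple = another_tuple + (2,)
print(longer_tuple)                   # (2, 3, 1, 5, 4, 2)
```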

# String

Strings are technically sequences: immutable sequences of text characters.

As sequences, they support indexing operations where the first character of a string is index 0. This means we can get individual letters or slices of letters with indexing:

In [None]:
my_string = "Basics Python for Data Analysis"
my_string[5]   # Get the character at index 5

In [None]:
my_string[4::]   # Slice from index 4 to the end

In [None]:
my_string[::-1]   # Reverse the string

In [None]:
len(my_string)

In [None]:
my_string.count("a")   # Count the a's in the string

Because strings are immutable, **you cannot change a string itself**.

Every time you transform a string with a function, Python makes a new string object rather than altering the original string that exists in your computer's memory.
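
A short sketch of what this means in practice (the variable name `lowered` is illustrative): assigning to a character position fails, and a string function leaves the original untouched.

```python
my_string = "Basics Python for Data Analysis"

try:
    my_string[0] = "b"               # Item assignment is not allowed on strings
except TypeError as error:
    print("TypeError:", error)

lowered = my_string.lower()          # Returns a new string
print(lowered)                       # basics python for data analysis
print(my_string)                     # Basics Python for Data Analysis (unchanged)
```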

Strings have many associated functions. Some basic string functions include:

In [None]:
# str.lower()

my_string.lower()   # Make all characters lowercase

In [None]:
# str.upper()

my_string.upper()   # Make all characters uppercase

In [None]:
# str.title()

my_string.title()   # Capitalize the first letter of each word

Find the index of the first occurrence of a substring within a string using **str.find()**.

If the substring does not appear, find() returns -1:

In [None]:
my_string.find("C")

In [None]:
my_string.find("c")

Find and replace a target substring within a string using **str.replace()**:

In [None]:
my_string.replace("for", "4")   # Substring "for" is replaced by "4"

Split a string into a list of substrings based on a given separating character with **str.split()**:

In [None]:
my_string.split()   # str.split() splits on spaces by default

In [None]:
my_string.split("a")   # Supply a substring to split on

Split a multi-line string into a list of lines using **str.splitlines()**:

In [None]:
multiline_string = """Basics Python
for
Data Analysis"""

multiline_string.splitlines()

Strip leading and trailing characters from both ends of a string with **str.strip()**:

In [None]:
# str.strip() removes whitespace by default

"            strip white space!         ".strip()

Override the default by supplying a string containing all characters you would like to strip as an argument to the function:

In [None]:
"xXxBuyNOWxxXXxXxXXXXx".strip("xX")

You can strip characters from the left or right sides only with **str.lstrip()** and **str.rstrip()** respectively.
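
For example, reusing the string from the previous cell (the variable name `messy` is illustrative):

```python
messy = "xXxBuyNOWxxXXxXxXXXXx"

print(messy.lstrip("xX"))   # BuyNOWxxXXxXxXXXXx (left side stripped only)
print(messy.rstrip("xX"))   # xXxBuyNOW (right side stripped only)
```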

To join or concatenate two strings together, you can use the **plus** (+) operator:

In [None]:
"Basics " + "Python " + "for " + "Data " + "Analysis"

Convert a list of strings into a single string separated by a given delimiter with **str.join()**:

In [None]:
" ".join(["Basics", "Python", "for", "Data", "Analysis"])

In [None]:
name = "Jim"
age = 20
city = "Tokyo"
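
As a sketch of why plain concatenation gets awkward with values like these (this cell is an illustration rather than part of the original lesson): every non-string value has to be converted with `str()` first, and the spacing has to be managed by hand.

```python
name = "Jim"
age = 20
city = "Tokyo"

# age is an int, so it must be converted with str() before concatenating;
# adding an int directly to a string would raise a TypeError
sentence = "My name is " + name + ", I am " + str(age) + " years old and live in " + city
print(sentence)   # My name is Jim, I am 20 years old and live in Tokyo
```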

For string operations that combine several values like these, it is preferable to use the **str.format()** function or formatted strings.

str.format() takes in a template string with **curly braces** "{ }" as placeholders for values you provide to the function as the arguments. The arguments are then filled into the appropriate placeholders in the string:

In [None]:
template_string = "My name is {}, I am {} years old and live in {}"

template_string.format(name, age, city)

Formatted strings, or **f-strings** for short, are an alternative method for string formatting, available since Python 3.6.

F-strings are strings prefixed with "f" (or "F") that allow you to insert existing variables into a string by name by placing them within curly braces:

In [None]:
# Remaking template_string using an f-string

template_string = (f"My name is {name}, I am {age} years old and live in {city}")
print(template_string)

*Note: **lists, tuples and strings appear everywhere in Python code**, so it is essential to understand the basics of how they work before we can start using Python for data analysis.*

# **Dictionaries and Sets**

# Dictionaries

A dictionary (dict) is an object that maps a set of named indexes called **keys** to a set of corresponding values.

Dicts are mutable, so you can add and remove keys and their associated values. **A dictionary's keys must be immutable objects, such as ints, strings or tuples**, but the values can be anything.

Create a dictionary with a comma-separated list of **key: value pairs** within **curly braces** "{ }":

In [None]:
my_dict = {"name": "Jim",
          "age": 20,
          "city": "Tokyo"}

print(my_dict)

Since Python 3.7, a dictionary keeps its items in the order they were inserted; in older versions the printed order could differ from the order in which you defined them. Either way, you index into a dictionary using keys rather than numeric indexes:

In [None]:
my_dict["name"]

Add new items to an existing dictionary with the following syntax:

In [None]:
my_dict["new_key"] = "new_value"

print(my_dict)

Delete an existing item with **del**:

In [None]:
del my_dict["new_key"]

print(my_dict)

Check the number of items in a dictionary with **len()**:

In [None]:
len(my_dict)

Check whether a certain key exists with "**in**":

In [None]:
"name" in my_dict

You can access
* *all the keys* with **.keys()**
* *all the values* with **.values()**
* *all the key: value pairs of a dictionary* with **.items()**

In [None]:
my_dict.keys()

In [None]:
my_dict.values()

In [None]:
my_dict.items()

Real-world data often comes in the form of tables of rows and columns, where each column specifies a different data feature like name or age and each row represents an individual record.

We can encode this sort of tabular data in a dictionary by **assigning each column label a key and then storing the column values as a list**.

Consider the following data:

| name    | age  | city   |
| :---    | :--- | :---   |
| Jim     | 20   | Tokyo  |
| Shindi  | 24   | Berlin |
| Thomas  | 28   | London |


We can store this data in a dictionary like so:

In [None]:
my_table_dict = {"name": ["Jim", "Shindi", "Thomas"],
                "age": [20, 24, 28],
                "city": ["Tokyo", "Berlin", "London"]}

Certain data formats like XML and JSON have a **non-tabular, nested structure**.

Python dictionaries can contain other dictionaries, so they can mirror this sort of nested structure, providing a convenient interface for working with these sorts of data formats in Python.
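
A minimal sketch of that mirroring, using the standard library's `json` module. The JSON text below is hypothetical, invented only for this example:

```python
import json

# Hypothetical JSON text, invented for this example
json_text = '{"name": "Jim", "address": {"city": "Tokyo", "country": "Japan"}, "languages": ["Python", "R"]}'

record = json.loads(json_text)        # Parse JSON into nested dicts and lists

print(type(record))                   # <class 'dict'>
print(record["address"]["city"])      # Tokyo
print(record["languages"][0])         # Python
```

Each additional pair of square brackets steps one level deeper into the nested structure.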

# Sets

Sets are **unordered, mutable collections of immutable objects that cannot contain duplicates**.

Sets are useful for **storing and performing operations on data** where **each value is unique**.

Create a set with a comma-separated sequence of values within **curly braces** "{ }":

In [None]:
my_set = {1, 2, 3, 4, 5, 6, 7}

type(my_set)

Add and remove items from a set with **.add()** and **.remove()** respectively:

In [None]:
my_set.add(8)

print(my_set)

In [None]:
my_set.remove(4)

print(my_set)

Sets do not support indexing, but they do support basic collection functions like **len(), min(), max() and sum()**.
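
A short sketch of both points. The set `number_set` is defined here only so the example is self-contained:

```python
number_set = {1, 2, 3, 5, 6, 7, 8}   # Illustrative set

print(len(number_set))   # 7
print(min(number_set))   # 1
print(max(number_set))   # 8
print(sum(number_set))   # 32

try:
    number_set[0]        # Sets have no positions, so indexing fails
except TypeError as error:
    print("TypeError:", error)
```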

You can also check membership and non-membership as usual:

In [None]:
6 in my_set

In [None]:
4 in my_set

One of the main purposes of sets is to perform set operations that compare or combine different sets. Python sets support many common mathematical set operations like union, intersection, difference and checking whether one set is a subset of another. Because sets never hold duplicates, a value shared by both sets appears only once in the result:

In [None]:
set1 = {1, 3, 5, 7}
set2 = {1, 2, 3, 4}

set1.union(set2)    # Get the union of two sets

In [None]:
set1.intersection(set2)

In [None]:
set2.difference(set1)

In [None]:
{2, 4}.issubset(set2)   # Check whether {2, 4} is a subset of set2
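
These methods also have operator equivalents. A brief sketch comparing the two forms on the same sets:

```python
set1 = {1, 3, 5, 7}
set2 = {1, 2, 3, 4}

print((set1 | set2) == set1.union(set2))          # True: | is union
print((set1 & set2) == set1.intersection(set2))   # True: & is intersection
print((set2 - set1) == set2.difference(set1))     # True: - is difference
print({2, 4} <= set2)                             # True: <= tests for a subset
```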

You can convert a list into a set using the **set()** function.

Converting a list to a set drops any duplicate elements in the list. This can be a useful way to strip unwanted duplicate items or count the number of unique elements in a list. It can also be useful to convert a list to a set if you plan to look up items repeatedly, since membership lookups are faster with sets than lists.

In [None]:
my_list = [1, 1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6]

set(my_list)

In [None]:
my_list = [1, 1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6]

list(set(my_list))
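
Two related idioms, sketched with a small illustrative list: counting unique values with `len(set(...))`, and removing duplicates while keeping first-seen order. The second relies on dictionary keys preserving insertion order (Python 3.7+), since a set does not promise any particular order.

```python
my_list = [3, 1, 3, 2, 1, 2, 2]        # Illustrative list with duplicates

print(len(set(my_list)))               # 3 unique values

# dict.fromkeys() keeps each value once, in the order first seen
print(list(dict.fromkeys(my_list)))    # [3, 1, 2]
```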