<img src="https://www.python.org/static/community_logos/python-powered-w-200x80.png" style="float: left; margin: 20px; height: 55px">

# Python Basics - Data Types

_Author: Alfred Zou_

---

### Python Introduction

This is a quick introduction to coding in Python. It explores the eight basic data types. Once we understand the inherent properties of each type, we can understand how to manipulate it.

# 8 Basic Data Types
---
* We need to be careful in selecting the correct datatype for the type of information we want to store. This will affect how we can manipulate and interact with the data.
* Datatypes can be visualised as buckets that store a certain type of information
* The 8 Basic Data Types:
    * Integer: stores whole numbers
    * Float: stores numbers with decimal points
    * Boolean: stores True or False
    * String: stores text as a sequence of characters
    * List: stores a collection of values
    * Tuple: stores a collection of values. Tuples cannot be changed
    * Set: stores a collection of unique values
    * Dictionary: stores a collection of key and value pairs

### Variable Assignment
* Firstly, what is a variable in Python? A variable is a name that refers to a value. Almost any name can be reassigned, as Python has no built-in concept of a constant
* Unlike many other programming languages, we don't need to give Python a datatype when we create a variable or assign it a value
* In Python, when creating a variable or assigning it a value, we always place the variable on the left and the value on the right

In [53]:
# An example of left assignment
a = 3
b = 4
b = a
b

3

In [54]:
# We can also assign multiple variables at the same time
c,d = 5,6
c,d

(5, 6)
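
Because Python works out the datatype from the value, the same name can later point to a value of a different type. A minimal sketch of this, where the names `value`, `first_type` and `second_type` are only illustrative:

```python
# Python infers the datatype from the assigned value
value = 3
first_type = type(value)     # int

# The same name can be rebound to a value of another datatype
value = "three"
second_type = type(value)    # str

# Multiple assignment also gives a short way to swap two variables
c, d = 5, 6
c, d = d, c
print(first_type, second_type, c, d)
```

The right-hand side is evaluated in full before any name on the left is assigned, which is why the swap works without a temporary variable.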

### Numerical Datatypes - Integers and Floats

* There are two ways to store numerical data
* Integers are for whole numbers while floats are for numbers with decimal points
* Use int() & float() functions to convert between the datatypes
* Dividing with `/` always returns a float, even when both numbers are integers

In [55]:
# An integer can be a negative number, as long as it's a whole number
random_int = -5
type(random_int)

int

In [56]:
random_float = 1.99999999
type(random_float)

float

In [57]:
# Turning a float into an integer drops everything after the decimal point; it does not round
float_to_int = int(random_float)
print(float_to_int)

1


In [58]:
int_to_float = float(random_int)
print(int_to_float)

-5.0


In [59]:
# Even when dividing two integers, the output is always a float
print(5/1)
type(5/1)

5.0


float
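
The difference between truncating and rounding is easiest to see with a negative number. The sketch below compares `int()` with the built-in `round()` and with `math.floor()`, and is only meant as an illustration of the conversions described above:

```python
import math

# int() cuts off the decimals (truncates toward zero); it never rounds
print(int(1.99999999))     # 1
print(int(-1.9))           # -1

# round() goes to the nearest whole number, math.floor() always goes down
print(round(1.99999999))   # 2
print(math.floor(-1.9))    # -2

# "/" always returns a float, while "//" keeps two integers as an integer
print(10 / 2, 10 // 2)     # 5.0 5
```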

### Numerical Operations

Numerical operators include `+`, `-`, `*`, `/`, `**`, `//`, `%`

`**` is the power operator  
`/` is division, which always returns a float, even for two integers  
`//` is floor division. It returns the quotient, or the number of whole times the divisor goes into the number, e.g. `11 // 2` = 5. The remainder is captured by the modulus  
`%` is the modulus, or the remainder. A number is divisible by another when the modulus is 0, e.g. a number is even when `% 2` gives 0 and odd when `% 2` gives 1

In [60]:
# 5 to the power of 3
5**3

125

In [61]:
# How many times does 7 go into 574?
# Is 574 divisible by 7? If the modulus is 0 then it is
print(574//7)
print(574%7)

82
0


In [62]:
# When updating a variable based on its own value, you can write it in two ways
a = 23
b = 23
a = a + 5
print(a)
b += 5
print(b)

28
28
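
The quotient and the modulus belong together: multiplying the quotient by the divisor and adding the remainder gives back the original number. A small sketch, where `is_even` is a hypothetical helper written for this example:

```python
dividend, divisor = 11, 2

quotient = dividend // divisor     # 5
remainder = dividend % divisor     # 1

# divmod() returns the quotient and the remainder in one call
print(divmod(dividend, divisor))   # (5, 1)

# The two results always rebuild the original number
print(quotient * divisor + remainder == dividend)   # True

# Even/odd check using the modulus
def is_even(number):
    return number % 2 == 0

print(is_even(574), is_even(7))    # True False
```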


### Booleans 

* There are two boolean values: True and False
* Booleans are the end result of comparison or boolean operators, which are used in logic. This will be explained later on
* Most values convert to True
* Empty values, zero and None convert to False

In [63]:
# True Booleans
print(bool(1),bool("as"),bool(["as","sfd"]))
# False Booleans
print(bool(),bool(None),bool(0),bool(""),bool({}),bool(set()),bool(()))

True True True
False False False False False False False
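
As a preview of the comparison and boolean operators mentioned above, the sketch below shows booleans appearing as the result of an expression. The names `basket` and `status` are made up for the example:

```python
# Comparison operators return booleans
print(3 > 2, 3 == 4, "a" != "b")   # True False True

# Boolean operators combine them
print(3 > 2 and 3 == 4)            # False
print(3 > 2 or 3 == 4)             # True
print(not 3 == 4)                  # True

# An empty collection counts as False, which gives a short emptiness check
basket = []
if not basket:
    status = "basket is empty"
print(status)                      # basket is empty
```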


### Collections
* Collections hold a group of values. They include strings, lists, tuples, sets and dictionaries
* Each collection type has its own data type properties, which will be explained later on

### Strings

* Strings hold text. They are a collection of characters
* They are iterable, immutable, ordered and can be indexed. These properties will be explained later on

In [64]:
# Create strings with 'string', "string", '''string''', """string"""
print('spam eggs',"234",'''342''',"""24""")
print(type('spam eggs'),type("234"),type('''342'''),type("""24"""))

spam eggs 234 342 24
<class 'str'> <class 'str'> <class 'str'> <class 'str'>


In [65]:
# Quotation marks inside a string are a problem
# Use the escape character \ or wrap the string in triple ''' or """ to write a string containing ' and "
print('He said, \"I\'m a dog"' )
print('''He said, "I'm a dog"''')
print("""He said, "I'm a dog""") # The string ends at the first """, so the final " mark is lost; use the '''string''' format instead

He said, "I'm a dog"
He said, "I'm a dog"
He said, "I'm a dog


In [66]:
# For multi-line text use \n
print("line 1\nline 2")

line 1
line 2


In [67]:
# Strings can go across multiple lines with """""" or ''''''
print(""" 
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
""") 

 
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to



In [68]:
# Windows machines use \ in file paths, while Linux-based machines and Macs use /
# We need a raw string r"" so that the backslashes are not read as escape sequences
# A common use is to reference a folder of csv files, from which we can pull out csv files as needed
# pathlib's Path is used because it works on both Macs and PCs

from pathlib import Path
csv_path = Path(r"C:\Users\Draciel\Dropbox\General Assembly\Pre-work\Python\csv")
file_path_1 = csv_path / "google_returns.csv"
file_path_2 = csv_path / "female_names.csv"
print(file_path_1)
print(file_path_2)

import pandas as pd
df1 = pd.read_csv(file_path_1)
df2 = pd.read_csv(file_path_2)
print(df1.head())
print(df2.head())

C:\Users\Draciel\Dropbox\General Assembly\Pre-work\Python\csv\google_returns.csv
C:\Users\Draciel\Dropbox\General Assembly\Pre-work\Python\csv\female_names.csv
         Date    Return
0  08/19/2014  0.008073
1  08/18/2014  0.015136
2  08/15/2014 -0.002036
3  08/14/2014 -0.000226
4  08/13/2014  0.021413
   1      Emma
0  2    Olivia
1  3       Ava
2  4  Isabella
3  5    Sophia
4  6       Mia
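
To see why the raw string matters, compare the same text with and without the `r` prefix. The path below is hypothetical and chosen only because `\t` and `\n` are escape sequences; `PureWindowsPath` is used here instead of `Path` so the sketch behaves the same on any operating system:

```python
from pathlib import PureWindowsPath

# Hypothetical path, used only to show the escape-sequence problem
cooked = "C:\temp\new"    # \t becomes a tab and \n becomes a newline
raw = r"C:\temp\new"      # backslashes are kept exactly as written

print(len(cooked), len(raw))         # 9 11
print("\t" in cooked, "\t" in raw)   # True False

# PureWindowsPath parses Windows-style paths without touching the file system
path = PureWindowsPath(raw)
print(path.name, path.parent.name)   # new temp
```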


In [69]:
# Strings can be concatenated by adding them together and repeated by multiplying by an integer
"ba" + 2*"na"

'banana'

#### String Formatting

* There are multiple ways to format strings with variables. I have listed the two best methods
* `.format()` is one option, but I prefer f-strings as they are easier to read

In [63]:
name = 'Karl'
age = 28
perc = 0.45673
print("{} is {} years old, which is {:.2%}".format(name,age,perc))

Karl is 28 years old, which is 45.67%


In [38]:
print(f"{name} is {age} years old, which is {perc:.2%}")

Karl is 28 years old, which is 45.67%


Common format specifiers, written after the `:` inside the curly braces:

* `s` string
* `d` integer
* `f` float
* `.#f` float with # number of decimal points
* `.#%` float shown as a percentage with # number of decimal points
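
The format specifiers can be tried one at a time. In an f-string or a `.format()` call they go after a colon inside the curly braces, as in `{perc:.2%}` above. A short sketch with made-up values:

```python
price = 1234.5678
share = 0.45673
count = 42

print(f"{price:.2f}")     # 1234.57   (float with 2 decimal points)
print(f"{share:.1%}")     # 45.7%     (percentage with 1 decimal point)
print(f"{count:d}")       # 42        (integer)
print(f"{price:,.1f}")    # 1,234.6   (thousands separator)

# The same specifiers work with .format()
label = "{:>8}|".format("Karl")   # right-aligned in 8 characters
print(label)
```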

### Lists 

* Collection of variables
* They are iterable, mutable, ordered and can be indexed. These properties will be explained later on

In [72]:
random_list = [5,10,3]
print(random_list)

[5, 10, 3]


### Tuples 

* Collection of variables
* They are iterable, immutable, ordered and can be indexed. These properties will be explained later on
* They also take less memory than lists

In [73]:
random_tuple = (1,2,3,"sadsad",3.234243)
print(random_tuple)

(1, 2, 3, 'sadsad', 3.234243)
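
Two tuple behaviours are worth a quick sketch: unpacking values by position, and the memory claim above. The byte counts reported by `sys.getsizeof` depend on the Python version and implementation (CPython is assumed here), so only the comparison is shown, not exact numbers:

```python
import sys

# A tuple can be unpacked into separate variables by position
point = (44.23423, -131.11)
latitude, longitude = point
print(latitude, longitude)

# Compare the size of a tuple and a list holding the same values
tuple_size = sys.getsizeof((1, 2, 3))
list_size = sys.getsizeof([1, 2, 3])
print(tuple_size < list_size)
```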


### Sets

* Collection of unique variables
* They are iterable, mutable, unordered and cannot be indexed. These properties will be explained later on
* Sets do not contain duplicate elements
* Important for membership testing, to see if an element exists in a set

In [74]:
random_set = {1,1,3}
print(random_set)

{1, 3}


In [75]:
1 in random_set

True
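
A common use of these properties is removing duplicates from a list and then testing membership. The sketch below uses an invented list of visitor names:

```python
visits = ["anna", "ben", "anna", "cara", "ben"]

# Converting to a set drops the duplicates
unique_visitors = set(visits)
print(len(visits), len(unique_visitors))   # 5 3

# Membership testing
print("cara" in unique_visitors)           # True
print("dave" in unique_visitors)           # False

# Adding an element that already exists changes nothing
unique_visitors.add("anna")
print(len(unique_visitors))                # 3

# Sets also support intersection (&) and union (|)
common = {1, 2, 3} & {2, 3, 4}
combined = {1, 2} | {2, 3}
print(sorted(common), sorted(combined))    # [2, 3] [1, 2, 3]
```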

### Dictionaries

* Collection of key:value pairs
* The keys are unique and do not repeat
* They are iterable, mutable and indexed by key rather than by position. Since Python 3.7 they keep insertion order, but they cannot be sliced. These properties will be explained later on

In [2]:
random_dictionary = {"user_id":209,"message":"D5 B4 G6","language":"english","datetime":"12309213092139","location":(44.23423,-131.11)}
print(random_dictionary)

{'user_id': 209, 'message': 'D5 B4 G6', 'language': 'english', 'datetime': '12309213092139', 'location': (44.23423, -131.11)}


In [4]:
# .get() uses an input key to get the associated value. If the key is missing it returns a default value (None unless one is given)
print(random_dictionary.get("user_id"))
print(random_dictionary.get("star_sign","no star sign found"))

209
no star sign found
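
The sketch below shows what unique keys mean in practice: assigning to an existing key overwrites its value, while assigning to a new key adds a pair. It also contrasts `.get()` with square brackets for a missing key. The `profile` dictionary is made up for the example:

```python
profile = {"user_id": 209, "language": "english"}

# A new key adds a pair; an existing key is overwritten
profile["star_sign"] = "leo"
profile["language"] = "french"
print(profile)   # {'user_id': 209, 'language': 'french', 'star_sign': 'leo'}

# Square brackets raise a KeyError for a missing key, .get() does not
try:
    profile["age"]
except KeyError as error:
    print("missing key:", error)
print(profile.get("age"))      # None

# keys() lists the keys in insertion order (Python 3.7+)
print(list(profile.keys()))    # ['user_id', 'language', 'star_sign']
```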


# Datatype Manipulation
---
### Datatype Properties

##### Iterable
* All collections are iterable. The elements within strings, lists, tuples, sets and dictionaries can be iterated on

##### Mutable vs Immutable
* Mutable means the object can be changed in place. Lists, sets and dictionaries are mutable, e.g. we can add new entries to a dictionary without assigning a new dictionary
* Immutable means the object cannot be changed in place. Strings and tuples are immutable. You cannot change a string or tuple, but you can replace it by assigning a new string or tuple to the variable

##### Ordered
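* Strings, lists and tuples are ordered by position. Sets aren't ordered. Dictionaries keep insertion order since Python 3.7, but have no positions to index or slice by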

##### Indexable
* Sets aren't indexable as there is no order
* Strings, lists and tuples are indexable by order
* Dictionaries can be indexed by calling the key, returning the respective value

##### Indexable and Mutable
* Only lists and dictionaries are both indexable and mutable. This means their values can be overwritten by assigning a new value to an index or key

##### Indexable and Ordered
* Strings, lists and tuples can be sliced

Datatype Property|String | List | Tuple | Set | Dictionary
---|---|---|---|---|---
Iterable|O|O|O|O|O
Mutable: Ability to add new elements||O||O|O
Immutable|O||O||
Ordered|O|O|O||insertion order only
Indexable|O|O|O||by key
Indexable and Mutable: Variable assignment||O|||by key
Indexable and Ordered: Slicing|O|O|O||
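
Most of the table can be approximated in code by checking a value against the abstract base classes in `collections.abc`. This is only a sketch: `describe` is a hypothetical helper, and treating `Sequence` as "sliceable" and `Mapping` as "indexed by key" is an assumption that holds for the five built-in types covered here:

```python
from collections.abc import (
    Iterable, Mapping, MutableMapping, MutableSequence, MutableSet, Sequence,
)

def describe(value):
    """Return some of the table's properties for one value."""
    return {
        "iterable": isinstance(value, Iterable),
        "mutable": isinstance(value, (MutableSequence, MutableSet, MutableMapping)),
        "sliceable": isinstance(value, Sequence),
        "indexed by key": isinstance(value, Mapping),
    }

for sample in ["elephant", [5, 10, 3], (1, 2, 3), {1, 3}, {"user_id": 209}]:
    print(type(sample).__name__, describe(sample))
```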

### Functions & Methods Vs. Selection

##### Functions vs Methods

* How do we operate on these data types? We apply functions and methods
* Functions are called using the `function(data)` format
* For example, `print(random_string_variable)` calls one of Python's built-in functions. Users can define their own functions, but I will discuss that later
* Methods are called using the `data.method()` format and are specific to the data type
* For example, `random_list.append()` calls one of Python's built-in methods. The `.append()` method only applies to the list data type. Users can define their own data types, called classes, and give them methods of their own

##### Functions & Methods Vs. Selection

* Functions and methods manipulate data. They are called with ()
* Indexing and slicing are used to select data. They are written with []
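
The three formats side by side, using an illustrative list called `animals`:

```python
animals = ["zebra", "ant", "elephant"]

# Function: function(data)
print(len(animals))       # 3
print(sorted(animals))    # ['ant', 'elephant', 'zebra']

# Method: data.method()
animals.append("bee")     # changes the list itself
print(animals)            # ['zebra', 'ant', 'elephant', 'bee']

# Selection: data[index] or data[start:stop]
print(animals[0])         # zebra
print(animals[1:3])       # ['ant', 'elephant']
```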

##### Empty Data

In [88]:
# Creating empty data is useful before for and while loops, which can then fill it
empty_list = []
empty_set = set()
empty_dictionary = {}
empty_tuple = ()
empty_string = ""
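
A typical pattern is to start with an empty list and let a loop fill it. The sketch below does this once with a for loop and once with a while loop, using invented numbers:

```python
numbers = [4, 7, 10, 13]

# Filled by a for loop: keep only the even numbers
even_numbers = []
for number in numbers:
    if number % 2 == 0:
        even_numbers.append(number)

# Filled by a while loop: count down from 3
countdown = []
current = 3
while current > 0:
    countdown.append(current)
    current -= 1

print(even_numbers, countdown)   # [4, 10] [3, 2, 1]

# {} creates an empty dictionary, so an empty set needs set()
print(type({}), type(set()))     # <class 'dict'> <class 'set'>
```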

#### Datatype Properties Demonstration

In [77]:
random_string = "elephant"
print(random_string)
print(random_list)
print(random_set)
print(random_dictionary)
print(random_tuple)

elephant
[5, 10, 3]
{1, 3}
{'user_id': 209, 'message': 'D5 B4 G6', 'language': 'english', 'datetime': '12309213092139', 'location': (44.23423, -131.11)}
(1, 2, 3, 'sadsad', 3.234243)


##### Iterable

In [78]:
# The elements within strings, lists, tuples, sets and dictionaries can be iterated on
# Iterating on a dictionary goes through its keys
def iteration(x):
    elements = []
    for i in x:
        elements.append(i)
    print(elements)
iteration(random_string)
iteration(random_list)
iteration(random_tuple)
iteration(random_set)
iteration(random_dictionary)

['e', 'l', 'e', 'p', 'h', 'a', 'n', 't']
[5, 10, 3]
[1, 2, 3, 'sadsad', 3.234243]
[1, 3]
['user_id', 'message', 'language', 'datetime', 'location']


##### Mutable

In [79]:
# Mutable objects have methods that change them in place
# Lists, sets and dictionaries are mutable
random_list = [5,10,3]
print(random_list)
random_list.sort()
print(random_list)

[5, 10, 3]
[3, 5, 10]
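
The contrast with immutable types shows up in what a call returns. A method that changes a mutable object in place usually returns `None`, while a string method has to return a new string. A short sketch with illustrative names:

```python
scores = [5, 10, 3]

# sorted() returns a new list and leaves the original alone
ordered = sorted(scores)
print(scores, ordered)    # [5, 10, 3] [3, 5, 10]

# .sort() changes the list in place and returns None
result = scores.sort()
print(scores, result)     # [3, 5, 10] None

# Strings are immutable, so string methods return a new string
word = "elephant"
shout = word.upper()
print(word, shout)        # elephant ELEPHANT
```

Writing `scores = scores.sort()` is a common slip: it replaces the list with `None`.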


##### Variable creation issue for mutable collections

In [8]:
# Python does not store values inside variables
# A variable is just a reference to an object that holds the value
# When we create variable a, we point it to a list object containing the numbers 1, 2 and 3
# When we create variable b, we point it to the same object that a points to
# When we change that object through a and then retrieve b, we find its value changed too, even though we never changed b directly
a = [1,2,3]
b = a
print(id(a) == id(b))
a.append(4)
print(a)
print(b)
id(a) == id(b)

True
[1, 2, 3, 4]
[1, 2, 3, 4]


True

In [81]:
# The workaround is to create a shallow copy
# If a list, set or dictionary contains nested lists, sets or dictionaries, use copy.deepcopy() to copy the nested objects as well
a = [1,2,3]
b = a[:] # or a.copy()
print(id(a) == id(b))
a.append(4)
print(a)
print(b)
id(a) == id(b)

False
[1, 2, 3, 4]
[1, 2, 3]


False
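
The difference between a shallow and a deep copy only appears with nested collections. In the sketch below the shallow copy gets a new outer list but still shares the inner lists, while `copy.deepcopy` from the standard library copies everything:

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested.copy()         # new outer list, same inner lists
deep = copy.deepcopy(nested)    # new outer list and new inner lists

# Change one of the inner lists through the original
nested[0].append(99)

print(nested)     # [[1, 2, 99], [3, 4]]
print(shallow)    # [[1, 2, 99], [3, 4]]
print(deep)       # [[1, 2], [3, 4]]
```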

##### Indexable

In [82]:
# Strings, lists, dictionaries and tuples are indexable
print(random_string[2])
print(random_list[2])
print(random_dictionary['user_id'])
print(random_tuple[2])

e
10
209
3


##### Variable assignment: indexable and mutable

In [83]:
# Strings, lists, dictionaries and tuples are indexable
# Lists and dictionaries are also mutable
# So only list indexes and dictionary keys can be assigned new values
random_list[2]=5
random_dictionary['user_id']=5
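
Trying the same assignment on a string fails, because strings are indexable but not mutable. The usual way around this is to build a new object and point the variable to it. A minimal sketch with illustrative values:

```python
word = "elephant"

# Assigning to a string index raises a TypeError
try:
    word[0] = "E"
except TypeError as error:
    print("TypeError:", error)

# Build a new string instead and reassign the variable
word = "E" + word[1:]
print(word)     # Elephant

# For a tuple: convert to a list, change it, convert back
point = (1, 2, 3)
as_list = list(point)
as_list[2] = 30
point = tuple(as_list)
print(point)    # (1, 2, 30)
```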

##### Sliceable: indexable and ordered

In [84]:
# Strings, lists and tuples are sliceable, as they are ordered and indexable
print(random_string[:2])
print(random_list[:2])
print(random_tuple[:2])

el
[3, 5]
(1, 2)


In [85]:
# A slice can also skip indexes by giving a step, written [start:stop:step]
list_a = [1,2,3,4,5,6,7,8,9,10]

print(list_a[::2])

[1, 3, 5, 7, 9]


In [86]:
# A step of -1 reverses the order
list_a = [1,2,3,4,5,6,7,8,9,10]

print(list_a[::-1])

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
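
The full slice form is `[start:stop:step]`, where the stop index is not included and negative numbers count from the end. The same syntax works on strings and tuples. A few more cases as a sketch:

```python
list_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(list_a[2:8:3])         # [3, 6]
print(list_a[-3:])           # [8, 9, 10]
print(list_a[:-3])           # [1, 2, 3, 4, 5, 6, 7]

# Strings and tuples slice the same way
print("elephant"[::-1])      # tnahpele
print((1, 2, 3, 4)[1::2])    # (2, 4)
```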
