<div>
<img src="../images/mdsi_logo.png" width="200"/>
</div>

> Author: Dr. Jody-Ann S. Jones, Data Steward
# About Python in eight lines
Python is an object-oriented, general-purpose, dynamically typed programming language that emphasizes readability and simplicity.

Python has a style guide, PEP 8 (a Python Enhancement Proposal), that provides best practices on how to write clean, efficient, and readable code: https://peps.python.org/pep-0008/.

Python code consists of expressions and statements.

An expression is a piece of code that produces a value.

A statement is an instruction that the Python interpreter can execute.

A statement is comprised of variables and operators.

Variables can take on a number of data types.

Operators perform some type of computation or logical evaluation on a value or variable.

Enjoy reading this notebook for more details on these important Python concepts.

> **<span style="color: blue;">PRO TIP: Whitespace is very important in Python! Indentation defines which lines belong to a block of code.</span>**

# Variables
- Variable types do not have to be declared.
- Variables cannot start with numbers (but it's cool if a number falls within the variable name).
- Reserved keywords cannot be used as variables.
- Variables cannot contain dashes.
- Variables cannot contain symbols other than the underscore (_).
- Variables cannot contain spaces.
- It is best practice to use snake_case for naming of your variables.
- Constants are typically written in all UPPERCASE.
- Functions are typically written in snake_case as well; CamelCase (CapWords) is reserved for class names.
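
The naming rules above can be checked programmatically. The following is a minimal sketch that relies on the standard library's `str.isidentifier()` and `keyword.iskeyword()`; the helper name `is_valid_variable_name` is hypothetical and introduced only for illustration.

```python
import keyword


def is_valid_variable_name(name):
    """Return True if `name` is a legal, non-reserved Python identifier."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_valid_variable_name("pay_check"))   # True: snake_case is fine
print(is_valid_variable_name("pay_check2"))  # True: a number may appear after the first character
print(is_valid_variable_name("2pay_check"))  # False: starts with a number
print(is_valid_variable_name("pay-check"))   # False: contains a dash
print(is_valid_variable_name("pay check"))   # False: contains a space
print(is_valid_variable_name("class"))       # False: reserved keyword
```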

In [2]:
# Example of a variable

pay_check = 100

In [3]:
# This is a statement

monthly_rent = pay_check/3

In [4]:
# This is an expression

pay_check/3

33.333333333333336

# Operators
In order to perform various operations on values and variables, Python makes use of the following operators:

| Operator      | Function |
| :----------- | ----------- |
|+      | Addition       |
|-   | Subtraction       |
|*      | Multiplication       |
|/   | Division (always returns a float)       |
|//   | Floor Division (rounds the result down; returns an integer when both operands are integers)       |
|**   | Exponent      |
|%   | Modulo (returns the remainder of the division of two numbers)      |
|<   | Less than     |
|>   | Greater than      |
|<=   | Less than or equal to      |
|>=   | Greater than or equal to      |
|!=   | Not equal to (tests inequality)     |
|==   | Equal to (tests equality)    |
|=   | Assign a value to a variable       |
|and   | Returns True only if BOTH statements are true      |
|or   | Returns True if at least ONE statement is true      |
|not   | Reverses the result, returns False if the result of the statement is true       |
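
To see several of these operators side by side, here is a short sketch. The variable names (`a`, `b`, `is_larger`, `is_equal`) are chosen purely for illustration.

```python
a = 7
b = 2

print(a / b)    # 3.5  (true division returns a float)
print(a // b)   # 3    (floor division rounds down)
print(a % b)    # 1    (remainder of 7 divided by 2)
print(a ** b)   # 49   (7 to the power of 2)
print(-7 // 2)  # -4   (rounds toward negative infinity, not toward zero)

is_larger = a > b   # True
is_equal = a == b   # False

print(is_larger and not is_equal)  # True: both sides are true
print(is_larger or is_equal)       # True: at least one side is true
```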

# Augmented Assignment Operators
| Operator      | Function |
| :----------- | ----------- |
|+=      | Add to the variable and assign the result back to it      |
|-=   | Subtract from the variable and assign the result back to it      |
|*=      | Multiply the variable and assign the result back to it       |
|/=   | Divide the variable and assign the result back to it       |
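
As a brief illustration of the table above, each augmented operator is shorthand for "compute, then reassign". The variable `balance` is a made-up example.

```python
balance = 100

balance += 50   # same as balance = balance + 50  -> 150
balance -= 30   # same as balance = balance - 30  -> 120
balance *= 2    # same as balance = balance * 2   -> 240
balance /= 4    # same as balance = balance / 4   -> 60.0 (division returns a float)

print(balance)  # 60.0
```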


For a more comprehensive list of Python operators, please see https://www.w3schools.com/python/python_operators.asp.

# Data Types
The Python language consists of several data types. The most common data types that you will encounter are represented in the table below:

| Data Type     | Description                           | Example                                 | Mutable?  |
|---------------|---------------------------------------|-----------------------------------------|-----------|
| integers      | Whole numbers                         | 1, 2, 3                                 | immutable |
| floats        | Numbers with decimal places           | 1.0, 2.012, 3.333333                    | immutable |
| strings       | Ordered sequence of characters        | "Hi, I am Jody."                        | immutable |
| lists         | Ordered sequence of objects           | ["oats", "raisins", "tea"]              | mutable   |
| dictionaries  | Collection of key:value pairs (insertion-ordered since Python 3.7) | {"John": 98, "Susan": 75, "Derrick": 100, "Brenda": 50} | mutable |
| tuples        | Ordered, immutable sequence of objects | (22, 18, 16, 25)                       | immutable |
| sets          | Unordered collection of unique elements | {1, 2, 3, 4, 3, 2} (stored as {1, 2, 3, 4}) | mutable |
| booleans      | Truth values, typically the result of a comparison or logical test | True, False | immutable |

**Here are some important points to remember about variables in general:**
- Python is a dynamically typed language. This means you do not need to declare your variable type, and you can easily reassign variables (and variable types) at any point in your program.
- If a number never needs a fractional part, store it as an integer rather than a float: integer arithmetic is exact, whereas floats can introduce small rounding errors.
- A string is an ordered sequence of characters (emphasis on ordered, hint: they can be iterated over).
- Strings are immutable.
- You cannot reassign parts of a string. But you can create something brand new.
- Lists are mutable.
- Tuples are immutable.
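
Two of the points above, dynamic typing and string immutability, can be sketched as follows. The variable names are illustrative only.

```python
# Dynamic typing: the same name can be rebound to a value of a different type.
value = 42
print(type(value).__name__)  # int
value = "forty-two"
print(type(value).__name__)  # str

# String immutability: assigning to a single position raises a TypeError.
greeting = "Hello"
try:
    greeting[0] = "J"
except TypeError as error:
    print(f"Cannot modify a string in place: {error}")

# Building a brand new string from pieces of the old one works fine.
new_greeting = "J" + greeting[1:]
print(new_greeting)  # Jello
```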


## Integers

- The Python 'int' data type is a fundamental aspect of the language, especially important for data analysis.

- It represents whole numbers, positive or negative, without a decimal point.

- Dividing integers with / always returns a float value; use // for floor division.

- You can also convert integers to floats by using the float() function.

- You can also convert an integer to a string by using the str() function.

> The following cell shows the integer data type in action:

In [1]:
# Basic operations
a = 10
b = 3
total = a + b  # Named total to avoid shadowing the built-in sum()
difference = a - b
product = a * b
quotient = a / b
floor_division = a // b
remainder = a % b
power = a ** b

# Type conversions
num_str = str(a)  # Converts to string
str_num = int("123")  # Converts to integer

# Large integer example
large_int = 2 ** 1000  # Demonstrates Python's large integer capability


## Floats

- Floats are another crucial data type for data analysis.
- They represent real numbers using floating point representation.
- Values can be rounded to the nearest value using the round() function, where the number of decimal places can be specified as the second argument.
- You can convert a float to an integer by using the int() function, which truncates the decimal part.
- You can also convert a float to a string using the str() function.

> The following cell shows the float data type in action:

In [2]:
# Basic operations
x = 3.14
y = 2.5
total = x + y  # Named total to avoid shadowing the built-in sum()
difference = x - y
product = x * y
quotient = x / y
power = x ** y

# Type conversions
float_to_int = int(x)  # Truncates to 3
int_to_float = float(5)  # Converts to 5.0
str_to_float = float("2.71")  # Converts string to float

# Handling precision
import math
rounded_value = round(x, 2)  # Rounds to 3.14
sqrt = math.sqrt(x)  # Square root of x

# Example of precision issue
result = 0.1 + 0.2  # May not be exactly 0.3 due to floating-point arithmetic
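
The precision issue in the cell above means floats should rarely be compared with `==`. One common approach, sketched here, is to compare with a tolerance using `math.isclose()` from the standard library.

```python
import math

result = 0.1 + 0.2

print(result == 0.3)              # False: result is 0.30000000000000004
print(math.isclose(result, 0.3))  # True: equal within a small tolerance
print(round(result, 2))           # 0.3: rounding for display
```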

## Strings

- The Python string data type is essential to know when dealing with textual data. In relation to data analysis, this will be particularly helpful when working with NLP and/or LLM models.
- A string is a sequence of characters. It is used to store and represent text-based data.
- Strings in Python are immutable, meaning that once they are created, they cannot be changed.
- A string can be created by using either single quotes or double quotes. Multi-line strings can be created using triple quotes.
- Strings can be concatenated (combined) by using the + operator. For large-scale operations, you can also use '.join()' instead of the '+' for greater efficiency.
- Strings can be repeated by using the * operator.
- Strings can be indexed and sliced, i.e. characters can be accessed via an index, and you can slice whole strings into smaller substrings.

Common String Methods include:
- Case Conversion: Methods like .lower(), .upper(), .capitalize().
- Searching: .find(), .index(), .startswith(), .endswith().
- Modification: .strip(), .split(), .replace().
- Joining: .join() for concatenating a sequence of strings.
- Format: .format() or f-strings (e.g., f"{variable}") for string formatting.
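
A few of the methods listed above can be chained into a small text-cleaning sketch. The sample text and variable names are made up for illustration.

```python
raw = "  data science, python, pandas  "

cleaned = raw.strip()                         # Remove leading/trailing whitespace
parts = cleaned.split(", ")                   # ['data science', 'python', 'pandas']
titled = [part.capitalize() for part in parts]
joined = " | ".join(titled)                   # 'Data science | Python | Pandas'
replaced = joined.replace("Pandas", "NumPy")  # Returns a new string

print(joined)
print(joined.startswith("Data"))  # True
print(joined.find("Python"))      # 15: index where "Python" begins
print(replaced)                   # Data science | Python | NumPy
```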

> The following cell shows the string data type in action:

In [None]:
# String creation
simple_string = "Hello, World!"
multi_line_string = """This is
a multi-line
string"""

# String operations
concatenated = "Hello, " + "World!"
repeated = "Echo! " * 3
substring = simple_string[7:12]  # 'World'
for char in simple_string:
    print(char)

# String methods
modified = simple_string.lower()
found_index = simple_string.find("World")

# Formatting
formatted_string = f"{simple_string} It's {2023}!"

# Unicode and encoding
unicode_string = "こんにちは"
encoded = unicode_string.encode('utf-8')
decoded = encoded.decode('utf-8')

## Lists
- The Python list data type is a versatile and widely used data structure. It is important in data analysis for its ability to store sequences of elements.
- A list is an ordered collection of items. Lists can contain items of any data type, including other lists.
- Lists are mutable, meaning that their elements can be changed after the list is created.
- Lists are created by using square brackets, [], with items separated by commas.
- Lists can be generated dynamically using list comprehensions or functions like list().
- Lists can be indexed and sliced (just like strings).
- Lists can be combined using '+'.
- Lists can be repeated using '*'.
- Lists can be iterated through by using loops such as 'for item in list'.

**Common List Methods**
- Adding Items: .append(item) to add to the end, .insert(index, item) to insert at a position.
- Removing Items: .remove(item) removes first occurrence, .pop([index]) removes and returns an item at index.
- Sorting and Reversing: .sort() for in-place sorting, .reverse() to reverse the list.
- Searching: .index(item) to find the index of the first occurrence of an item.
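
The methods above can be traced step by step in this sketch; the comments show the state of the list after each call. The `groceries` list is an illustrative example.

```python
groceries = ["oats", "raisins", "tea"]

groceries.append("milk")      # ['oats', 'raisins', 'tea', 'milk']
groceries.insert(1, "honey")  # ['oats', 'honey', 'raisins', 'tea', 'milk']
groceries.remove("tea")       # ['oats', 'honey', 'raisins', 'milk']
last_item = groceries.pop()   # Returns 'milk'; list is ['oats', 'honey', 'raisins']
position = groceries.index("raisins")  # 2

groceries.sort()              # ['honey', 'oats', 'raisins'] (sorted in place)
groceries.reverse()           # ['raisins', 'oats', 'honey'] (reversed in place)

print(groceries, last_item, position)
```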

**List Comprehensions**

- Concise Syntax: Provides a concise way to create lists based on existing lists or other iterables.
- Example: `[x**2 for x in range(10)]` creates a list of square numbers.

> The following cell shows the list data type in action:

In [6]:
# Creating a list of integers
my_list = [1, 2, 3, 4, 5]

# Operations
element = my_list[3]  # Accessing an element
my_list.append(6)  # Adding an element
sliced_list = my_list[1:4]  # Slicing

# Sorting the list
my_list.sort()  # Sorting the list

# List comprehension
squares = [x ** 2 for x in range(10)]

# Copying
import copy
shallow_copy = copy.copy(my_list)
deep_copy = copy.deepcopy(my_list)

## Tuples
- A tuple is an ordered, immutable sequence of objects. This is an important data structure when working with values that you don't want to accidentally change.
- Unlike lists, you cannot add or remove elements, and you cannot re-assign values.
- However, similar to lists, you can mix object types.
- Tuples are created by using parentheses '()'.
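
A short sketch of what immutability means in practice: in-place assignment fails, but building a new tuple from an existing one works. The `ages` values are illustrative.

```python
ages = (22, 18, 16, 25)

# Re-assigning an element raises a TypeError.
try:
    ages[0] = 30
except TypeError as error:
    print(f"Tuples are immutable: {error}")

# Tuples can mix object types, just like lists.
mixed = (1, "two", 3.0)

# "Adding" an element really creates a new tuple; the original is unchanged.
updated_ages = ages + (30,)
print(ages)          # (22, 18, 16, 25)
print(updated_ages)  # (22, 18, 16, 25, 30)
```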

**Operations**
- Indexing and Slicing: Access elements by index (tuple[index]), slice sub-tuples (tuple[start:end]).
- Concatenation: Combine tuples using +.
- Repetition: Repeat tuples using *.
- Iteration: Iterate through items using loops like for item in tuple:.

**Unpacking**
- Simultaneous Assignment: Tuples allow for the unpacking of values into variables, like x, y, z = my_tuple.
- Extended Unpacking: In Python 3, extended unpacking can be used with a * to grab excess items.

**Common Operations**
- Count and Index: .count(value) to count occurrences, .index(value) to find the index of the first occurrence.


> The following cell shows the tuple data type in action:


In [7]:
# Creating a tuple
my_tuple = (1, 2, 3, "Python")

# Single element tuple
single_element_tuple = (4,)

# Operations
element = my_tuple[2]  # Accessing an element
concatenated_tuple = my_tuple + (5, 6)  # Concatenation

# Unpacking
a, b, c, d = my_tuple

# Extended unpacking (Python 3)
first, *rest = my_tuple

# Common methods
count = my_tuple.count(1)  # Counting occurrences of 1
index = my_tuple.index("Python")  # Finding the index of "Python"

## Sets

- The set data structure is an unordered collection of unique items. Sets are used to store distinct values and support mathematical set operations like union, intersection, difference, etc.
- It is important to remember that sets do not preserve order, nor do they store duplicate values.
- Sets are mutable, meaning their elements can be changed, but the elements themselves must be immutable.
- Sets are created by using curly braces, {} with items separated by commas. An empty set is created using 'set()'. You can also use the set() command to convert lists, tuples, etc. to sets.
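
The creation rules above have one common pitfall, sketched here: empty curly braces create a dictionary, not a set. The `readings` list is an illustrative example.

```python
# Converting a list to a set drops the duplicates.
readings = [1, 2, 3, 4, 3, 2]
unique_readings = set(readings)
print(len(unique_readings))  # 4

# An empty set must be created with set(); {} creates an empty dictionary.
empty_set = set()
empty_braces = {}
print(type(empty_set).__name__)     # set
print(type(empty_braces).__name__)  # dict
```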

**Operations**
- Adding Items: .add(item) to add a single element, .update([items]) to add multiple elements.
- Removing Items: .remove(item) removes an element (raises an error if not present), .discard(item) removes an element without raising an error.
- Membership Testing: Use in to test for membership.
- Iteration: Iterate through set items using loops like for item in set:.

**Set Operations**
- Union: set1 | set2 or set1.union(set2) to get all elements in either set.
- Intersection: set1 & set2 or set1.intersection(set2) to get elements common to both sets.
- Difference: set1 - set2 or set1.difference(set2) to get elements in set1 not in set2.
- Symmetric Difference: set1 ^ set2 or set1.symmetric_difference(set2) for elements in either set but not in both.

> The following cell shows the set data structure in action:

In [None]:
# Creating a set
my_set = {1, 2, 3, "Python"}

# Adding elements
my_set.add(4)
my_set.update([5, 6])

# Removing elements
my_set.discard(6)
my_set.remove(5)

# Set operations
another_set = {3, 4, 5}
union_set = my_set | another_set
intersection_set = my_set & another_set
difference_set = my_set - another_set
symmetric_difference_set = my_set ^ another_set

# Iteration
for item in my_set:
    print(item)


## Dictionaries

- A dictionary in Python is a collection of key-value pairs, used to map keys to values. Since Python 3.7, dictionaries preserve the order in which keys were inserted.
- Dictionaries are mutable, which means you can add, remove, or modify key-value pairs after the dictionary is created.
- Dictionaries are created using curly braces {} with key-value pairs separated by commas, and keys and values separated by colons.
- They can also be created dynamically using the dict() constructor with lists or tuples of key-value pairs.

***Operations***
- Accessing Values: Access values using keys (dict[key]), get 'None' or a default value if a key is not found with dict.get(key, default).
- Adding/Updating Items: Add or update key-value pairs using dict[key] = value.
- Removing Items: Remove items with del dict[key], or use .pop(key) to remove and return the value.

***Key Characteristics***
- Uniqueness of Keys: Each key must be unique in a dictionary. Assigning a value to an existing key will overwrite the old value.
- Immutability of Keys: Keys must be of an immutable data type (e.g., strings, numbers, tuples with immutable elements).
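
Both key characteristics can be demonstrated in a few lines. This is a sketch with made-up data; the variable names are illustrative.

```python
scores = {"John": 98, "Susan": 75}

# Assigning to an existing key overwrites the value instead of adding a second entry.
scores["John"] = 100
print(scores)       # {'John': 100, 'Susan': 75}
print(len(scores))  # 2

# A tuple of immutable values is a valid key.
locations = {(40.7, -74.0): "New York"}
print(locations[(40.7, -74.0)])  # New York

# A list is mutable, so using one as a key raises a TypeError.
try:
    invalid = {[40.7, -74.0]: "New York"}
except TypeError as error:
    print(f"Lists cannot be dictionary keys: {error}")
```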

***Iteration and Views***
- Iterating: Iterate over keys, values, or key-value pairs using dict.keys(), dict.values(), and dict.items().
- Dictionary Views: These methods return 'view' objects that reflect the current state of the dictionary.

***Common Methods***
- .keys(): Returns a view of the dictionary's keys.
- .values(): Returns a view of the dictionary's values.
- .items(): Returns a view of the dictionary's key-value pairs.
- .update(other_dict): Updates the dictionary with elements from another dictionary or iterable.
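
The following sketch shows what "view" means in practice: a view created before a change still reflects the dictionary afterwards. It also shows `.update()` both overwriting and adding keys. The `inventory` data is illustrative.

```python
inventory = {"oats": 3, "tea": 1}

# The view is created once...
keys_view = inventory.keys()

# ...and automatically reflects later changes to the dictionary.
inventory["raisins"] = 5
print(list(keys_view))  # ['oats', 'tea', 'raisins']

# update() overwrites existing keys and adds new ones.
inventory.update({"tea": 10, "milk": 2})
print(inventory)  # {'oats': 3, 'tea': 10, 'raisins': 5, 'milk': 2}
```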

> The following cell shows the dictionary data structure in action:

In [None]:
# Creating a dictionary
my_dict = {"name": "Alice", "age": 30, "location": "Wonderland"}

# Accessing and modifying
age = my_dict["age"]  # Accessing value
my_dict["age"] = 31   # Updating value
my_dict["email"] = "alice@example.com"  # Adding new key-value pair

# Removing items
email = my_dict.pop("email")  # Removes and returns the value
del my_dict["location"]       # Removes key-value pair

# Iteration
for key in my_dict.keys():
    print(key)
for value in my_dict.values():
    print(value)
for key, value in my_dict.items():
    print(key, value)

# Using get()
profession = my_dict.get("profession", "Unknown")


# Control Flow
Control flow allows you to direct the order in which you would like specific actions in your program to be performed. Mastering control flow will allow you to create programs that can make decisions, repeat actions, and handle varying situations. There are six main ways you can control the flow of your program. These include: 1) if statements, 2) for loops, 3) while loops, 4) break and continue statements, 5) try and except blocks and 6) pass statements. The following cells will illustrate how each of these is implemented.

## if statements
if statements allow your program to execute certain code only if a particular condition is true. You can expand this with elif (short for 'else if') and else to handle multiple conditions.

In [3]:
data_category = 'numeric'

if data_category == 'numeric':
    print("Perform numerical analysis.")
elif data_category == 'textual':
    print("Perform text analysis.")
else:
    print("Unknown data category.")

Perform numerical analysis.


## for loops
for loops are used for iterating over a sequence or other iterable (like a list, tuple, dictionary, set, or string). Combined with range(), they are also useful for performing an action a fixed number of times.

In [2]:
dataset = [10, 20, 30, 40, 50]

for data_point in dataset:
    processed_value = data_point * 2  # Example processing
    print(f"Processed Data: {processed_value}")

Processed Data: 20
Processed Data: 40
Processed Data: 60
Processed Data: 80
Processed Data: 100
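
The section above also mentions range() and iterating over dictionaries. A minimal sketch of both follows; the variable names and data are illustrative.

```python
# range(3) produces 0, 1, 2, so the loop body runs exactly three times.
attempts = []
for attempt in range(3):
    attempts.append(attempt)
print(attempts)  # [0, 1, 2]

# Iterating over a dictionary yields its keys.
scores = {"John": 98, "Susan": 75}
for name in scores:
    print(name, scores[name])
```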


## while loops
A while loop repeats as long as a specified condition is true. It’s crucial to ensure the loop has a breaking condition to avoid an infinite loop.

In [4]:
threshold = 100
current_sum = 0
numbers = [10, 20, 30, 40, 50]

i = 0
while current_sum < threshold and i < len(numbers):
    current_sum += numbers[i]
    i += 1
print(f"Sum reached: {current_sum}")

Sum reached: 100


## break and continue
- break is used to exit a loop prematurely.
- continue skips the current iteration and continues with the next iteration.
- These are useful for more complex control flows within loops.

In [5]:
data_points = [1, -2, 3, -4, 5]

for point in data_points:
    if point < 0:
        continue  # Skip negative numbers
    if point > 4:
        break     # Stop the loop if the number is greater than 4
    print(point)

1
3


## try and except blocks
Python uses try-except blocks for error and exception handling.
This allows the program to respond appropriately to errors, rather than crashing.

In [6]:
data = "100"

try:
    print(int(data) / 0)
except ZeroDivisionError:
    print("Division by zero error.")
except ValueError:
    print("Invalid input for conversion.")

Division by zero error.
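
The cell above only triggers the ZeroDivisionError branch. The sketch below wraps the same logic in a hypothetical helper, `safe_divide`, so that the success path and both except branches can be exercised.

```python
def safe_divide(numerator_text, denominator_text):
    """Convert two strings to integers and divide them, reporting errors as messages."""
    try:
        return int(numerator_text) / int(denominator_text)
    except ZeroDivisionError:
        return "Division by zero error."
    except ValueError:
        return "Invalid input for conversion."


print(safe_divide("100", "4"))  # 25.0
print(safe_divide("100", "0"))  # Division by zero error.
print(safe_divide("abc", "4"))  # Invalid input for conversion.
```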


## pass statements
pass is a null statement in Python. It's often used as a placeholder for future code.
When the pass statement is executed, nothing happens, but it avoids a syntax error when empty code is not allowed.

In [7]:
for data in range(5):
    # Future implementation of data processing
    pass

# Functions

- Whenever you need to re-use a particular aspect of a code several times in your program, it is best to use a function. A function allows you to organize your code better and reuse blocks of code more easily.
- In Python, you create a function by using the def keyword, followed by the name of the function, a pair of parentheses (), and a colon.
- Inside the function body, you write the instructions that you would like that specific block of code to perform.

> For example:

In [2]:
def HelloWorld():
    print("Hello World!")

After you create (or define) a function, you can call it anywhere in your program by typing the name of the function followed by ():

> For example:

In [3]:
HelloWorld()

Hello World!


You can use f-strings in functions in order to customize a greeting.

> For example:

In [10]:
def HelloWorld(name):
    print(f"Hello {name}!")

In [9]:
HelloWorld("Jody")

Hello Jody!


A function may also take parameters, perform some computation on these parameters and return a result.

> For example:

In [4]:
def add_two_numbers(a, b):
    return a + b

In [6]:
add_two_numbers(454, 10)

464

# Classes
A class in Python is like a blueprint for creating objects. An object is an instance of a class. Classes encapsulate data and functions that operate on that data. The data is held in attributes, and functions are represented as methods within the class.

>**Basic Structure of a Class**

**Defining a Class:** Use the class keyword to define a class. Class names typically follow the CapitalizedWords convention.

**Initialization Method:** The `__init__` method is called when an instance (object) of the class is created. It's used to initialize the attributes of the instance.

**Attributes and Methods:** Attributes are variables that belong to a class. Methods are functions defined within a class and can alter the object's state.

>**Creating an Object**

**Instantiation:** Creating an object of a class is known as instantiation.

**Constructor:** The constructor (`__init__` method) sets up the initial state of the object.

>**Inheritance**

**Extending Classes:** Inheritance allows a class to inherit attributes and methods from another class.

**Parent and Child Classes:** The class being inherited from is called the parent class, and the class that inherits is the child class.

>**Encapsulation**

**Private and Public:** Encapsulation involves keeping the internal representation of an object hidden from the outside. In Python, there's no strict enforcement of private and public members, only a convention: prefix attributes or methods with a single underscore (`_`) to mark them as 'protected', or with double underscores (`__`) for a stronger 'private' convention, which triggers name mangling.

>**Polymorphism**

**Method Overriding:** Polymorphism in OOP allows methods to have the same name but behave differently in different contexts (e.g., in different classes).

> N.B. In Python, not everything needs to be a class. Sometimes, a simple function or a module is more appropriate.

An example of a simple class is illustrated in the cell below:

In [3]:
class Employee:
    def __init__(self, name, position):
        self.name = name
        self.position = position

    def display_info(self):
        return f"Employee Name: {self.name}, Position: {self.position}"

# Creating an instance of the class
emp = Employee("Alice", "Data Scientist")
print(emp.display_info())

Employee Name: Alice, Position: Data Scientist
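
Inheritance, method overriding, and the underscore convention described earlier can be sketched by extending the Employee example. The `Manager` class, the `team_size` attribute, and the `_salary_band` attribute are hypothetical additions for illustration only.

```python
class Employee:
    def __init__(self, name, position):
        self.name = name
        self.position = position
        self._salary_band = "B"  # Leading underscore: internal by convention, not enforced

    def display_info(self):
        return f"Employee Name: {self.name}, Position: {self.position}"


class Manager(Employee):  # Manager is the child class, Employee the parent class
    def __init__(self, name, team_size):
        super().__init__(name, "Manager")  # Reuse the parent's initialization
        self.team_size = team_size

    def display_info(self):  # Overrides the parent's method
        base_info = super().display_info()
        return f"{base_info}, Team Size: {self.team_size}"


# Polymorphism: the same method call behaves differently for each class.
staff = [Employee("Alice", "Data Scientist"), Manager("Bob", 4)]
for person in staff:
    print(person.display_info())
```

The loop prints Alice's details in the original format, while Bob's line also includes the team size, because each object uses its own class's version of display_info().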


## Should I use a Function or a Class?
Deciding whether to create a class or a function in Python depends on the nature of the problem you're solving and the structure of your code. Here are some key considerations to help you make that decision:

### When to Use a Function:

1. **Single, Isolated Task**: If your task is straightforward and can be encapsulated in a single, isolated operation, a function is typically sufficient. Functions are ideal for tasks that take inputs, process them, and return outputs without needing to maintain state or track data across multiple calls.

2. **Statelessness**: Functions are stateless, meaning they don't retain information between calls unless explicitly designed to do so (like using global variables, which is generally discouraged). If you don't need to track state, a function is a good choice.

3. **Reusability and Modularity**: If you have a piece of code that you need to reuse multiple times in different contexts, encapsulating it in a function can make your code more modular and maintainable.

### When to Use a Class:

1. **Managing State**: If you need to maintain and update the state over time, a class is the way to go. Objects of a class can hold state and have behaviors (methods) that modify that state.

2. **Grouping Related Data and Functions**: Classes allow you to logically group related data (attributes) and functions (methods) that operate on that data. This encapsulation makes code easier to understand and manage.

3. **Inheritance and Polymorphism**: If you're implementing a system where you need to create objects with similar yet distinct behaviors, classes are beneficial. They allow for inheritance and polymorphism, enabling you to create a base class with shared functionality and derived classes with specialized behaviors.

4. **Creating Complex Data Structures**: When building complex data structures (like trees, graphs, etc.), using classes to represent nodes or other elements can make the implementation more intuitive and maintainable.

5. **Modeling Real-world Entities**: Classes are ideal for modeling real-world entities and relationships in your code, especially when those entities have both data and behaviors (methods) associated with them.

### Practical Example: Data Science Context

- **Use a Function** for something like a statistical calculation, where you pass in data and get a result back without needing to track anything between calls (e.g., a function to calculate the mean of a dataset).

- **Use a Class** for building a machine learning model where you need to maintain the state (like the model parameters), and provide different functionalities (like fit, predict, update model parameters). The class can encapsulate all these aspects in a single, coherent entity.
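
As a rough sketch of this distinction, compare a stateless `mean` function with a class that keeps a running mean between calls. The class name `RunningMean` and its `update` method are hypothetical, standing in for the more complex model class described above.

```python
def mean(values):
    """Stateless: everything needed is passed in, and nothing is remembered."""
    return sum(values) / len(values)


class RunningMean:
    """Stateful: remembers the count and total between calls."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count


print(mean([10, 20, 30]))  # 20.0

tracker = RunningMean()
tracker.update(10)         # 10.0
tracker.update(20)         # 15.0
print(tracker.update(30))  # 20.0
```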

### Conclusion

The choice between a class and a function boils down to the scope and nature of the problem. Use functions for stateless, single-purpose tasks, and use classes when you need to model entities with state and behavior or when you need inheritance and polymorphism. In many real-world applications, you'll find a mix of both, with classes leveraging functions for specific tasks and functions using classes to manage and manipulate complex data.

# Special Topics

## List Comprehensions

## Lambda Functions

## Decorators

## Generators
Generators are a special type of iterator in Python, used to iterate over sequences of data without needing to create and store the entire sequence in memory at once. They are particularly useful for working with large datasets or streams of data where you want to process **<font color="blue">one item at a time</font>**.

**yield Keyword:** The core of a generator is the yield statement. When a function contains at least one yield statement, it becomes a generator. Instead of returning a value and exiting like a regular function, a generator yields a value and pauses its state. It resumes from there when the next value is requested.

**Iteration:** Generators are iterable, so you can use them in loops or anywhere an iterator is accepted.

**Stateful:** Unlike regular functions, generators maintain their state between iterations. This means they remember where they left off last.

An example of a simple generator is illustrated in the cell below:

In [1]:
def simple_number_generator(max_value):  # max_value avoids shadowing the built-in max()
    n = 1
    while n <= max_value:
        yield n
        n += 1

# Using the generator
for number in simple_number_generator(5):
    print(number)


1
2
3
4
5
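
To make the pause-and-resume behaviour visible, the generator can also be advanced manually with the built-in `next()`. This sketch redefines the generator so that it is self-contained; the parameter name `max_value` is an illustrative choice.

```python
def simple_number_generator(max_value):
    n = 1
    while n <= max_value:
        yield n  # Pause here and hand back n
        n += 1   # Resume here on the next request


numbers = simple_number_generator(2)

first = next(numbers)   # Runs until the first yield
second = next(numbers)  # Resumes after the yield and loops once more
print(first, second)    # 1 2

# When the function body finishes, next() raises StopIteration.
try:
    next(numbers)
except StopIteration:
    print("Generator is exhausted.")
```

A for loop handles StopIteration automatically, which is why the loop in the earlier cell simply ends after the last value.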


### An example with a generator expression
Generator expressions are a concise way to create generators. They are similar to list comprehensions but with parentheses instead of square brackets.

In [2]:
squares = (x*x for x in range(10))

# Using the generator expression
for square in squares:
    print(square)


0
1
4
9
16
25
36
49
64
81
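
Two practical consequences of generator expressions are sketched below: the generator object stays small regardless of how many items it will produce, and it can only be consumed once. The exact byte sizes reported by `sys.getsizeof()` vary by Python version, so only the comparison is shown.

```python
import sys

squares_list = [x * x for x in range(10_000)]  # All 10,000 values stored in memory
squares_gen = (x * x for x in range(10_000))   # Values produced one at a time

# The generator object is much smaller than the fully built list.
print(sys.getsizeof(squares_gen) < sys.getsizeof(squares_list))  # True

# A generator can be consumed only once.
first_total = sum(squares_gen)
second_total = sum(squares_gen)  # Nothing left to iterate over
print(first_total == sum(squares_list))  # True
print(second_total)                      # 0
```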


## Map, Filter, Reduce