<div class='bar_title'></div>

*Introduction to Data Science*

# Python Basics

Gunther Gust<br>
Chair for Enterprise AI

Winter Semester 24/25

<img src='https://github.com/GuntherGust/tds2_data/blob/main/images/d3.png?raw=true' style='width:20%; float:left;' />

# Learning Objectives for today

At the end of today's lecture, you...
- have familiarized yourself with __programming basics__ in Python, including:
    - Data types in Python
    - Variables
    - Collections of items/variables
    - Functions
    - Control flow statements
        - If-else 
        - Loops
- have used the basics to solve small __programming exercises__
- know how to __use Jupyter notebooks__


## Sources and recommended further reading

We use material from the following sources:

### Books
- Molin, S. (2021). [Hands-on data analysis with pandas: A Python data science handbook for data collection, wrangling, analysis, and visualization](https://github.com/stefmolin/Hands-On-Data-Analysis-with-Pandas-2nd-edition?tab=readme-ov-file) (2nd ed.). Packt Publishing.
- VanderPlas, J. (2017). [Python data science handbook: Essential tools for working with data](https://jakevdp.github.io/PythonDataScienceHandbook/) O'Reilly Media.
- McKinney, W. (2022). [Python for data analysis: Data wrangling with pandas, NumPy, and Jupyter](https://wesmckinney.com/book/) (3rd ed.). O'Reilly Media.

### Online courses
- __Datacamp:__ [Behind this link](https://www.datacamp.com/groups/shared_links/f2bd0645df4ccb93be4dc108c49f5e054873674776dbec89d0b9b344f322824f) we will provide useful practice exercises on datacamp.com
- __Udemy:__ Pierian Data. (n.d.). *Python for data science and machine learning bootcamp* [Online course](https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/) 




<img src="images/practice_programming.jpg" alt="Your Image" style="width:50%">

# Agenda

- ## Data types in Python
- ## Variables
- ## Collections of items/variables
- ## Functions
- ## Control flow statements

In [5]:

import sys # imports Python's sys module, 
# which provides access to some variables and functions 
# that interact with the Python runtime environment.

import os.path    # provides functions for handling and manipulating file paths

sys.path.append(os.path.abspath('../'))  # adds the parent directory to the module search path

In [96]:
!pip freeze >> requirements.txt

# pip freeze: lists all installed Python packages in the current 
# environment, along with their versions.
#>> requirements.txt: This part appends (>>) the output of pip freeze 
# to a file named requirements.txt. The requirements.txt file is commonly 
# used in Python projects to list dependencies, which can then be installed 
# in another environment using pip install -r requirements.txt.

## Basic data types
### Numbers
Numbers in Python can be represented as integers (e.g. `5`) or floats (e.g. `5.0`). We can perform operations on them:

In [1]:
5 + 6

11

In [2]:
2.5 / 3

0.8333333333333334
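Beyond `+` and `/`, Python has a few more arithmetic operators that are worth knowing. The following is a minimal sketch with illustrative values and variable names that do not appear elsewhere in this notebook:

```python
# A few more arithmetic operators (values chosen for illustration only)
quotient = 7 / 2         # true division always returns a float
floor_quotient = 7 // 2  # floor division rounds down to a whole number
remainder = 7 % 2        # modulo returns the remainder of the division
power = 2 ** 3           # exponentiation

print(quotient, floor_quotient, remainder, power)  # 3.5 3 1 8
```

Note that `/` returns a float even when both operands are integers, whereas `//` keeps the result an integer for integer operands.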

### Booleans

We can check for equality giving us a Boolean:

In [3]:
5 == 6

False

In [4]:
5 < 6

True

These statements can be combined with logical operators: `not`, `and`, `or`

In [5]:
(5 < 6) and not (5 == 6)

True

In [6]:
False or True

True

In [7]:
True or False

True

In [8]:
'hi' == 'bye'

False

In [9]:
(1 > 2) or (2 < 3)

True

### Strings
Using strings, we can handle text in Python. These values must be surrounded by quotes: single (`'...'`) is the standard, but double (`"..."`) works as well:

In [10]:
'hello'

'hello'

We can also perform operations on strings. For example, we can see how long a string is with `len()`:

In [11]:
len('hello')

5

We can select parts of the string by specifying the **index**. Note that in Python the first character is at index 0:

In [12]:
'hello'[0]

'h'

We can concatenate strings with `+`:

In [13]:
'hello' + ' ' + 'world'

# concatenate = chain together

'hello world'

We can check if characters are in the string with the `in` operator:

In [None]:
print('h' in 'hello')
# The in operator in Python is used to check for membership, 
# meaning it tests whether a certain value exists within a 
# container, such as a string, list, tuple, dictionary, or set. 
# Example String
sentence = "Hello, world!"
print("world" in sentence)  # Output: True
print("Python" in sentence)  # Output: False

# Example with a list
fruits = ["apple", "banana", "cherry"]
print("banana" in fruits)  # Output: True
print("grape" in fruits)   # Output: False

# Example with a tuple
colors = ("red", "green", "blue")
print("green" in colors)   # Output: True

# Example with a set
numbers = {1, 2, 3, 4, 5}
print(3 in numbers)        # Output: True
print(6 in numbers)        # Output: False

There is also a built-in function for splitting strings.

In [21]:
s = 'Hello world'
s.split()

# The split() method in Python is used to divide a string 
# into a list of substrings based on a specified delimiter(separator). 
# By default, split() separates the string by whitespace, 
# but you can specify any other delimiter.

['Hello', 'world']
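As the comment in the cell above mentions, `split()` also accepts a custom delimiter. Here is a small sketch of that, together with `join()`, which does the reverse. The example string and variable names are made up for illustration:

```python
csv_line = 'apple,banana,cherry'   # hypothetical example string

# Split on commas instead of whitespace
parts = csv_line.split(',')
print(parts)      # ['apple', 'banana', 'cherry']

# join() glues a list of strings back together with a separator
rejoined = ' | '.join(parts)
print(rejoined)   # apple | banana | cherry
```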

## Variables
Notice that typing text without quotes causes an error. Errors in Python attempt to clue us in to what went wrong with our code. In this case, we have a `NameError` exception, which tells us that the name `hello` is not defined. This means that [the Python interpreter](https://docs.python.org/3/tutorial/interpreter.html) looked for a **variable** named `hello`, but it didn't find one.

In [22]:
hello

NameError: name 'hello' is not defined

Variables give us a way to store values of any data type. We define a variable using the `variable_name = value` syntax:

In [24]:
#In Python, you create a variable simply by choosing a name 
# and using the = (assignment) operator to set its value.
x = 5
y = 7
x + y
# Variable names in Python must follow a few rules:
# Must start with a letter(a-z, A-Z) 
# or an underscore (_), but not with a number.
# Can only contain letters, numbers, and underscores.
# Are case-sensitive, meaning myVar and myvar would be 
# two different variables.
# Should be descriptive to make code readable, like 
# student_age instead of x

12

The variable name cannot contain spaces; we usually use `_` instead. The best variable names are descriptive ones:

In [25]:
lecture_title = 'Intro to Data Science'

Variables can be any data type. We can check which one it is with `type()`, which is a **function** (more on that later):

In [26]:
type(x)

int

In [27]:
type(lecture_title)

str

If we need to see the value of a variable, we can print it using the `print()` function:

In [28]:
print(lecture_title)

Intro to Data Science


## Control questions


### 1. What is printed in the following code?

In [30]:
num1 = 4
num2 = 5.5

sum_a = num1 + num2 
sum_b = int(num1) + int(num2)

print(sum_a)
print(sum_b)

9.5
9


### 2. What value is stored in  `result`?

In [34]:
x = "1.5"
y = input("Please enter a number:  ") #Entry: 10
result = y + x

print(result)

101.5
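The result is `101.5` because `input()` always returns a string, so `+` concatenates the two strings instead of adding numbers. The sketch below reproduces this without user interaction; the variable `y` is set by hand here and merely stands in for what `input()` would return:

```python
x = "1.5"
y = "10"   # stands in for the string input() returns when the user types 10

as_text = y + x                  # both are strings, so + concatenates
as_number = float(y) + float(x)  # convert first to get numeric addition

print(as_text)    # 101.5
print(as_number)  # 11.5
```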


## Collections of Items

### Lists
We can store a collection of items in a list:

In [None]:
['hello', ' ', 'world']

The list can be stored in a variable. Note that the items in the list can be of different types:

In [37]:
my_list = ['hello', 3.8, True, 'Python']
type(my_list)

list

We can see how many elements are in the list with `len()`:

In [38]:
len(my_list)

4

We can also use the `in` operator to check if a value is in the list:

In [39]:
'world' in my_list

False

We can select items in the list just as we did with strings, by providing the index to select:

In [40]:
my_list[1]

3.8

Python also allows us to use negative values, so we can easily select the last one:

In [41]:
my_list[-1]

'Python'

Another powerful feature of lists (and strings) is **slicing**. We can grab the middle 2 elements in the list:

In [42]:
my_list[1:3]

#Slicing is a powerful feature in Python that allows you to extract 
#a portion of a list or string. It provides a way to access a subset 
# of elements by specifying a range.
#   sequence[start:stop:step]
# start: The index where the slice starts (inclusive).
# stop: The index where the slice ends (exclusive).
# step: The interval between each index in the slice (optional).
#If you omit the start, (my_list[:3]) it defaults to 0
#If you omit the stop (my_list[1:]), it goes to the end of the list
# You can specify a step to skip elements (my_list[::2] leaves out every second value of the list)

[3.8, True]

... or every other one:

In [44]:
my_list[::2]

['hello', True]

We can even select the list in reverse:

In [43]:
my_list[::-1]

['Python', True, 3.8, 'hello']

Note: This syntax is `[start:stop:step]`, where the selection is inclusive of the start index but exclusive of the stop index. If `step` isn't provided, it is 1. For a positive step, a missing `start` defaults to `0` and a missing `stop` defaults to the number of elements (4, in our case); this works because the `stop` is exclusive. For a negative step such as `[::-1]`, the defaults are flipped so that the slice runs from the last element back to the first.
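Since slicing works on strings as well as lists, the same syntax can be tried on a word. This is a small illustrative sketch; the variable names are not used elsewhere in the notebook:

```python
word = 'Python'

middle = word[1:4]          # characters at index 1, 2 and 3
first_two = word[:2]        # start omitted, so the slice begins at index 0
every_second = word[::2]    # step of 2 skips every other character
backwards = word[::-1]      # negative step reverses the string

print(middle, first_two, every_second, backwards)  # yth Py Pto nohtyP
```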

In order to add an element to an existing list, we can use the `append` method:

In [45]:
my_list.append('new element')
my_list

['hello', 3.8, True, 'Python', 'new element']

### Tuples
Tuples are similar to lists; however, they can't be modified after creation, i.e., they are **immutable**. Instead of square brackets, we use parentheses to create tuples:

In [46]:
my_tuple = ('a', 5)
type(my_tuple)

tuple

In [47]:
my_tuple[0]

'a'

Immutable objects can't be modified:

In [48]:
my_tuple[0] = 'b'

TypeError: 'tuple' object does not support item assignment
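If a changed tuple is needed, one option is to build a new tuple from the old one. The sketch below shows this, along with unpacking a tuple into separate variables. Names such as `new_tuple`, `letter` and `number` are chosen for illustration:

```python
my_tuple = ('a', 5)

# Build a new tuple instead of modifying the existing one
new_tuple = ('b',) + my_tuple[1:]
print(new_tuple)   # ('b', 5)
print(my_tuple)    # ('a', 5): the original is unchanged

# Unpacking assigns each element to its own variable
letter, number = new_tuple
print(letter, number)   # b 5
```

Note the trailing comma in `('b',)`: without it, Python would treat `('b')` as a plain string in parentheses.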

### Dictionaries
We can store mappings of key-value pairs using dictionaries:

In [49]:
shopping_list = {
    'veggies': ['spinach', 'kale', 'beets'],
    'fruits': 'bananas',
    'meat': 0    
}
type(shopping_list)

dict

To access the values associated with a specific key, we use the square bracket notation again:

In [50]:
shopping_list['veggies']

['spinach', 'kale', 'beets']

We can extract all of the keys with `keys()`:

In [51]:
shopping_list.keys()

dict_keys(['veggies', 'fruits', 'meat'])

We can extract all of the values with `values()`:

In [52]:
shopping_list.values()

dict_values([['spinach', 'kale', 'beets'], 'bananas', 0])

Finally, we can call `items()` to get back (key, value) pairs:

In [53]:
shopping_list.items()

dict_items([('veggies', ['spinach', 'kale', 'beets']), ('fruits', 'bananas'), ('meat', 0)])
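Dictionaries are mutable, so entries can be added or changed after creation. The following sketch assumes the same `shopping_list` as above; the keys `'dairy'` and `'snacks'` are made up for illustration:

```python
shopping_list = {
    'veggies': ['spinach', 'kale', 'beets'],
    'fruits': 'bananas',
    'meat': 0
}

shopping_list['dairy'] = 'milk'   # a new key adds an entry
shopping_list['meat'] = 2         # an existing key overwrites the old value

print('dairy' in shopping_list)   # True: `in` checks the keys

# get() returns a default instead of raising a KeyError for a missing key
snacks = shopping_list.get('snacks', 'nothing planned')
print(snacks)                     # nothing planned
```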

### Sets
A set is a collection of unique items; a common use is to remove duplicates from a list. Sets are also written with curly braces, but notice there is no key-value mapping:

In [54]:
my_set = {1, 1, 2, 'a'}
type(my_set)

set

How many items are in this set?

In [55]:
len(my_set)

3

We put in 4 items but the set only has 3 because duplicates are removed:

In [56]:
my_set

{1, 2, 'a'}

We can check if a value is in the set:

In [57]:
2 in my_set

True
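The use case mentioned at the start of this section, removing duplicates from a list, can be sketched as follows. The data is hypothetical:

```python
visitors = ['Ana', 'Ben', 'Ana', 'Cem', 'Ben']   # hypothetical data

unique_visitors = set(visitors)   # converting to a set drops duplicates
print(len(visitors), len(unique_visitors))   # 5 3

# Sets have no guaranteed order, so sort when a stable order is needed
unique_sorted = sorted(unique_visitors)
print(unique_sorted)   # ['Ana', 'Ben', 'Cem']
```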

## Atomic and Reference Data Types
- **Atomic types** (e.g., integers, floats) are the basic, immutable data types. Their values cannot be changed in place; any modification creates a new object in memory.
- **Reference types** (e.g., lists, dictionaries) are mutable: their contents can be modified in place without creating a new object. Because several variables can refer to the same object, modifying it affects every reference to that object.

In [58]:
a = 1000
print(f"Original 'a': {a}, ID: {id(a)}")

a = a + 5
print(f"Modified 'a': {a}, ID: {id(a)}")

#f-strings
#The f allows you to include variables and expressions directly 
#within the string by using curly braces {}. Python evaluates 
#the expressions inside the braces and substitutes them into the string.
#{id(a)} 
# is replaced with the identity of a (obtained using the id() function; 
# in CPython this is the object's memory address).

Original 'a': 1000, ID: 126516212100656
Modified 'a': 1005, ID: 126516213399376


## In-Place Operations
Variables are references to objects and can be modified in-place if they are mutable (if they are reference types). This is particularly important for memory efficiency, especially when working with large datasets. In-place operations modify the data directly without creating a new object.

In [59]:
my_dict = {'a': 1, 'b': 2, 'c': 5}
print(f"'my_dict': {my_dict}, ID: {id(my_dict)}")

'my_dict': {'a': 1, 'b': 2, 'c': 5}, ID: 126516211278528
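To see that an in-place modification really keeps the same object, we can compare the ID before and after a change. This is a minimal sketch using a separate, hypothetical dictionary called `scores` so that `my_dict` stays untouched:

```python
scores = {'a': 1, 'b': 2}   # hypothetical dictionary
id_before = id(scores)

scores['c'] = 5     # add a new entry in place
scores['a'] += 10   # change an existing entry in place
id_after = id(scores)

print(scores)                 # {'a': 11, 'b': 2, 'c': 5}
print(id_before == id_after)  # True: still the same object
```

Compare this with the integer example above, where `a = a + 5` produced a different ID.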


In [61]:
my_dict2 = my_dict
my_dict2['c'] = 3

print(my_dict['c'])


3


## Control question: What is the value of `my_dict['c']`?
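The value is 3: `my_dict2 = my_dict` does not copy the dictionary, it only gives the same object a second name. If an independent dictionary is wanted, one option is the `copy()` method. The sketch below uses hypothetical variable names to contrast the two cases:

```python
original = {'a': 1, 'b': 2, 'c': 5}

alias = original                # second name for the same object
independent = original.copy()   # shallow copy: a new dictionary object

alias['c'] = 3          # also visible through `original`
independent['c'] = 99   # does not affect `original`

print(original['c'])            # 3
print(alias is original)        # True
print(independent is original)  # False
```

`copy()` makes a shallow copy, so nested mutable values (such as a list stored inside the dictionary) would still be shared between the two dictionaries.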

## Functions
We can define functions to package up our code for reuse. We have already seen some functions: `len()`, `type()`, and `print()`. They are all functions that take **arguments**. Note that functions don't need to accept arguments, in which case they are called without passing in anything (e.g. `print()` versus `print(my_string)`). 

*Aside: we can also create lists, sets, dictionaries, and tuples with functions: `list()`, `set()`, `dict()`, and `tuple()`*

### Defining functions
We use the `def` keyword to define functions. Let's create a function called `add()` with 2 parameters, `x` and `y`, which will be the names the code in the function will use to refer to the arguments we pass in when calling it:

In [62]:
def add(x, y):
    """This is a docstring. It is used to explain how the code works and is optional (but encouraged)."""
    # this is a comment; it allows us to annotate the code
    print('Performing addition')
    return x + y

Once we run the code above, our function is ready to use:

In [63]:
type(add)

function

Let's add some numbers:

In [64]:
add(1, 2)

Performing addition


3

### Return values
We can store the result in a variable for later:

In [65]:
result = add(1, 2)

Performing addition


Notice the print statement wasn't captured in `result`. This variable will only have what the function **returns**. This is what the `return` line in the function definition did:

In [66]:
result

3

Note that functions don't have to return anything. Consider `print()`:

In [67]:
print_result = print('hello world')

hello world


If we take a look at what we got back, we see it is a `NoneType` object:

In [68]:
type(print_result)

NoneType

In Python, the value `None` represents null values. We can check if our variable *is* `None`:

In [69]:
print_result is None

True

*Warning: make sure to use comparison operators (e.g. >, >=, <, <=, ==, !=) to compare to values other than `None`.*

### Function arguments

*Note that function arguments can be anything, even other functions. We will see several examples of this in the text.* 

The function we defined requires arguments. If we don't provide them all, it will cause an error:

In [70]:
add(1)

TypeError: add() missing 1 required positional argument: 'y'

We can use `help()` to check what arguments the function needs (notice the docstring ends up here):

In [71]:
help(add)

Help on function add in module __main__:

add(x, y)
    This is a docstring. It is used to explain how the code works and is optional (but encouraged).
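The note at the start of this section says that arguments can even be other functions. Here is a small sketch of what that can look like. `apply_twice` and `add_three` are hypothetical helpers written for this example, while `map()` and `len()` are Python built-ins:

```python
def apply_twice(func, value):
    """Call func on value, then call func again on the result."""
    return func(func(value))

def add_three(x):
    """Return x increased by 3."""
    return x + 3

# The function add_three itself is passed as an argument (no parentheses)
result = apply_twice(add_three, 10)
print(result)    # 16

# Built-ins accept functions too: map() applies len to every list element
lengths = list(map(len, ['hello', 'hi']))
print(lengths)   # [5, 2]
```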



## Control Flow Statements
Sometimes we want to vary the path the code takes based on some criteria. For this we have `if`, `elif`, and `else`. We can use `if` on its own:

In [72]:
def make_positive(x):
    """Returns a positive x"""
    if x < 0:
        x *= -1 # this is short for x = -1*x
    return x

Calling this function with negative input causes the code under the `if` statement to run:

In [73]:
make_positive(-1)

1

Calling this function with positive input skips the code under the `if` statement, keeping the number positive:

In [74]:
make_positive(2)

2

Sometimes we need an `else` statement as well:

In [75]:
def add_or_subtract(operation, x, y):
    if operation == 'add':
        return x + y
    else:
        return x - y

This triggers the code under the `if` statement:

In [76]:
add_or_subtract('add', 1, 2)

3

Since the Boolean check in the `if` statement was `False`, this triggers the code under the `else` statement:

In [77]:
add_or_subtract('subtract', 1, 2)

-1

For more complicated logic, we can also use `elif`. We can have any number of `elif` statements. Optionally, we can include `else`.

In [78]:
def calculate(operation, x, y):
    if operation == 'add':
        return x + y
    elif operation == 'subtract':
        return x - y
    elif operation == 'multiply':
        return x * y
    elif operation == 'division':
        return x / y
    else:
        print("This case hasn't been handled")

The code keeps checking the conditions in the `if` statements from top to bottom until it finds `multiply`:

In [79]:
calculate('multiply', 3, 4)

12

The code keeps checking the conditions in the `if` statements from top to bottom until it hits the `else` statement:

In [80]:
calculate('power', 3, 4)

This case hasn't been handled
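One detail worth noticing: the `else` branch only prints a message and has no `return`, so the function returns `None` in that case. The sketch below uses a shortened copy of `calculate()` to make this visible:

```python
def calculate(operation, x, y):
    """Shortened copy of the function above, for illustration."""
    if operation == 'add':
        return x + y
    elif operation == 'subtract':
        return x - y
    else:
        print("This case hasn't been handled")

outcome = calculate('power', 3, 4)   # prints the message
print(outcome is None)               # True: nothing was returned
```

Code that uses the result should therefore be prepared to receive `None` for unsupported operations.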


## Exercise

### Task: Create a Grading System for Student Performance

__Scenario:__ You are tasked with creating a simple grading system for a school. The school follows a system where students are graded based on their total score in three subjects: Math, Science, and English. The total score is out of 300, with each subject being out of 100. Based on the total score, students are assigned a letter grade.

You need to write a Python program that defines functions to compute the total score, calculate the average, and assign the appropriate grade based on certain conditions.


#### Requirements:

#### 1. Create a function `calculate_total`:

- This function should accept three parameters: `math_score`, `science_score`, and `english_score`.
- It should calculate and return the total score by adding these three values together.

In [81]:
# type your code here...
# We need a function called calculate_total that takes three parameters: 
# math_score, science_score, and english_score. This function will return 
# the total score by adding these three scores together.
def calculate_total(math_score, science_score, english_score):
    total_score = math_score + science_score + english_score
    return total_score # return sends the calculated total score back 
                       # to where the function was called.

#### 2. Create a function `calculate_average`:

- This function should accept the total score as a parameter.
- It should calculate the average by dividing the total score by 3.
- Return the average value.

In [82]:
# defines a function that takes the total score as its parameter.
def calculate_average(total_score): 
    average_score = total_score / 3
    return average_score

#### 3. Create a function `assign_grade`

This function should accept the **average score** as a parameter.

Based on the average score, assign a letter grade using the following conditions:

- **Grade A**: Average >= 90
- **Grade B**: Average >= 80 but less than 90
- **Grade C**: Average >= 70 but less than 80
- **Grade D**: Average >= 60 but less than 70
- **Grade F**: Average < 60

Return the corresponding grade as a string.


In [83]:
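# Conditions are checked from top to bottom, so each elif
# only needs to test the lower bound of its grade range.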
def assign_grade(average_score):
    if average_score >= 90:
        return 'A'
    elif average_score >= 80:
        return 'B'
    elif average_score >= 70:
        return 'C'
    elif average_score >= 60:
        return 'D'
    else:
        return 'F'

#### 4. Create a main function to bring everything together

In the `main` function:

- Prompt the user to input the scores for **Math**, **Science**, and **English**.
    - Hint: to query an input from the user, you need to use the `input()` function, e.g. `input("Please enter Math score: ")`
- Call the `calculate_total` function to get the total score.
- Call the `calculate_average` function to compute the average score.
- Call the `assign_grade` function to assign a grade based on the average.
- Finally, print the total score, the average, and the assigned grade.


In [85]:
def main():
    # Input scores
    # The input() function captures user input, 
    # which is converted to a float for decimal support.
    math_score = float(input("Enter Math score: "))
    science_score = float(input("Enter Science score: "))
    english_score = float(input("Enter English score: "))
    # Calculate total score
    total_score = calculate_total(math_score, science_score, english_score)
    print(f"Total Score: {total_score}")
    # Calculate average score
    average_score = calculate_average(total_score)
    print(f"Average Score: {average_score:.2f}")  # Formatted to 2 decimal places
    # Assign grade
    grade = assign_grade(average_score)
    print(f"Letter Grade: {grade}")

# Call the main function to run the program
# This block ensures that the main() function runs only when the script is executed directly.
if __name__ == "__main__":
    main()

Total Score: 160.0
Average Score: 53.33
Letter Grade: F


#### 5. Run your program and test it with some sample inputs

In [78]:
# type your code here... Run the code from step 4 and test it with different values
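Besides typing scores by hand, the three functions can also be checked directly with a list of sample scores. The following is one possible sketch, not the required solution. It repeats compact versions of the functions so that it runs on its own, and the sample values are made up:

```python
def calculate_total(math_score, science_score, english_score):
    return math_score + science_score + english_score

def calculate_average(total_score):
    return total_score / 3

def assign_grade(average_score):
    if average_score >= 90:
        return 'A'
    elif average_score >= 80:
        return 'B'
    elif average_score >= 70:
        return 'C'
    elif average_score >= 60:
        return 'D'
    else:
        return 'F'

# Hypothetical sample inputs: (math, science, english)
sample_inputs = [(95, 90, 92), (80, 85, 78), (50, 60, 50), (60, 60, 60)]

grades = []
for math_score, science_score, english_score in sample_inputs:
    total = calculate_total(math_score, science_score, english_score)
    average = calculate_average(total)
    grade = assign_grade(average)
    grades.append(grade)
    print(f"Total: {total}, Average: {average:.2f}, Grade: {grade}")

print(grades)   # ['A', 'B', 'F', 'D']
```

The last sample sits exactly on a boundary (average 60), which is a useful kind of value to test.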

#### 6. Handle invalid inputs (optional, for advanced students)

If the user enters a score that is not between **0 and 100**:

- Display an error message indicating that the score is invalid.
- Prompt the user to enter a valid score within the range of 0 to 100.
- Continue prompting the user until a valid score is entered.

In [87]:
# while True: creates an infinite loop. This loop keeps running 
# until a valid score is entered and returned.
# It allows the program to repeatedly prompt the user if they enter an invalid score, 
# ensuring that the user can only proceed when they enter a valid value.

# try block: 
# attempts to execute the code that might cause an error, in this case 
# converting the input to a float.
# except ValueError block: 
# catches exceptions specifically caused by invalid inputs 
# (such as entering letters or symbols instead of a number).
def get_valid_score(subject):
    while True:
        try:
            score = float(input(f"Enter {subject} score (0-100): "))
            if 0 <= score <= 100:
                return score
            else:
                print("Error: Score must be between 0 and 100. Please try again.")
        except ValueError:
            print("Error: Invalid input. Please enter a numeric value.")
            #If a ValueError occurs, the program prints an error message 
            #("Error: Invalid input. Please enter a numeric value.") and 
            # goes back to the start of the loop, asking the user for input again.
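Because `get_valid_score()` waits for keyboard input, it is awkward to test automatically. One possible variation, shown here only as a sketch, separates the checking from the prompting. `parse_score` is a hypothetical helper that is not part of the exercise:

```python
def parse_score(text):
    """Return the score as a float if text is a number from 0 to 100, else None."""
    try:
        score = float(text)
    except ValueError:
        return None          # not a number at all
    if 0 <= score <= 100:
        return score
    return None              # a number, but outside the allowed range

# Try a few hypothetical entries without needing input()
results = []
for raw in ['85', 'abc', '150', '99.5']:
    results.append(parse_score(raw))
    print(raw, '->', parse_score(raw))

print(results)   # [85.0, None, None, 99.5]
```

A prompting loop like the one above could then call such a helper and keep asking until the result is not `None`.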

## Loops
### `while` loops
With `while` loops, we can keep running code until some stopping condition is met:

In [86]:
done = False
value = 2
while not done:
    print('Still going...', value)
    value *= 2
    if value > 10:
        done = True

Still going... 2
Still going... 4
Still going... 8


Note that this can also be written more compactly by moving the condition into the `while` statement:

In [88]:
value = 2
while value < 10:
    print('Still going...', value)
    value *= 2

Still going... 2
Still going... 4
Still going... 8


### `for` loops
With `for` loops, we can run our code *for each* element in a collection:

In [89]:
for i in [0,1,2,3,4]:
    print(i)

0
1
2
3
4


We can use `for` loops with lists, tuples, sets, and dictionaries as well:

In [90]:
for element in my_list:
    print(element)

hello
3.8
True
Python
new element


In [91]:
for key, value in shopping_list.items():
    print('For', key, 'we need to buy', value)

For veggies we need to buy ['spinach', 'kale', 'beets']
For fruits we need to buy bananas
For meat we need to buy 0


The `range` function can be useful to iterate over a collection of numbers following a certain pattern:

In [92]:
for i in range(5):
    print(i)

0
1
2
3
4


In [93]:
for i in range(2, 7, 2):
    print(i)

2
4
6


It is particularly useful if you want to iterate over a list of items and need the position (index) of each item, since `range(len(...))` produces exactly as many indices as the list has elements.

In [94]:
programming_languages = ["Python", "JavaScript", "Java", "C++"]

programming_languages_length = len(programming_languages)

for i in range(programming_languages_length):
  print("Programming language No. " + str(i) + ": " + programming_languages[i] + ": Hello World")

Programming language No. 0: Python: Hello World
Programming language No. 1: JavaScript: Hello World
Programming language No. 2: Java: Hello World
Programming language No. 3: C++: Hello World
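An alternative to `range(len(...))` is the built-in `enumerate()`, which yields the index and the item together. The sketch below produces the same output as the cell above; the list `lines` is added only so the result can be inspected:

```python
programming_languages = ["Python", "JavaScript", "Java", "C++"]

lines = []
for i, language in enumerate(programming_languages):
    # i is the index, language is the item at that index
    line = f"Programming language No. {i}: {language}: Hello World"
    lines.append(line)
    print(line)
```

This avoids both the manual indexing with `programming_languages[i]` and the `str(i)` conversion.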


With `for` loops, we don't have to worry about checking if we have reached the stopping condition. Conversely, `while` loops can cause infinite loops if we don't remember to update variables.

### List comprehension

In [95]:
x = [1,2,3,4]

Imagine we want to square each element in x. With a for loop, it would look like this:

In [96]:
out = []
for item in x:
    out.append(item**2)
print(out)

[1, 4, 9, 16]


A list comprehension makes this shorter (and usually faster) while producing the same result.

In [97]:
[item**2 for item in x]

[1, 4, 9, 16]
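A list comprehension can use any expression and can also filter elements with an `if` clause. The following is a short sketch with illustrative variable names:

```python
x = [1, 2, 3, 4]

# A different expression: double each element instead of squaring it
doubled = [item * 2 for item in x]
print(doubled)        # [2, 4, 6, 8]

# An if clause keeps only the elements that satisfy the condition
even_squares = [item**2 for item in x if item % 2 == 0]
print(even_squares)   # [4, 16]
```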

### Break and continue

We can use `break` in order to exit a loop early. This can be useful to avoid unnecessary iterations.

In [98]:
for num in range(10):
    if num == 5:
        break
    print(num)

0
1
2
3
4


Note here the importance of the order of code:

In [99]:
for num in range(10):
    print(num)
    if num == 5:
        break

0
1
2
3
4
5


`continue`, on the other hand, is used to skip to the next iteration:

In [100]:
for num in range(10):
    if num % 2 == 0:
        continue
    print(num)

1
3
5
7
9


Using break and continue too much can make your code harder to read. It's better to write clear conditions for loops when you can. In more complex loops, break and continue might cause mistakes, so it's often safer to rewrite the code to make it simpler. In any case, it is always advised to use code comments.
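As an illustration of writing clearer loop conditions, the `break` and `continue` examples above could be rewritten so that `range()` itself describes which numbers are visited. This is one possible sketch; the list names are chosen for illustration:

```python
# Same numbers as the first break example: stop before 5
first_numbers = []
for num in range(5):
    first_numbers.append(num)
print(first_numbers)   # [0, 1, 2, 3, 4]

# Same numbers as the continue example: start at 1 and step by 2
odd_numbers = []
for num in range(1, 10, 2):
    odd_numbers.append(num)
print(odd_numbers)     # [1, 3, 5, 7, 9]
```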

## Imports
We have been working with the portion of Python that is available without importing additional functionality. The Python standard library that comes with the install of Python is broken up into several **modules**, but we often only need a few. We can import whatever we need: a module in the standard library, a 3rd-party library, or code that we wrote. This is done with an `import` statement:

In [101]:
import math

print(math.pi)

3.141592653589793


If we only need a small piece from that module, we can do the following instead:

In [102]:
from math import pi

print(pi)

3.141592653589793


*Warning: anything you import is added to the namespace, so if you create a new variable/function/etc. with the same name it will overwrite the previous value. For this reason, we have to be careful with variable names e.g. if you name something `sum`, you won't be able to add using the `sum()` built-in function anymore. Using notebooks or an IDE will help you avoid these issues with syntax highlighting.* 
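The overwriting described in the warning can be shown with `pi`. In this minimal sketch, the value `3` is an arbitrary stand-in for an accidental reassignment:

```python
import math
from math import pi

print(pi)        # 3.141592653589793

pi = 3           # accidentally reusing the name overwrites the imported value
print(pi)        # 3

# The module itself is unaffected, so math.pi still holds the real value
print(math.pi)   # 3.141592653589793
```

Accessing values through the module name (`math.pi`) makes this kind of accidental overwrite less likely.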

## Summary

- Data types in Python
- Variables
- Collections of items/variables
- Functions
- Control flow statements
   - If-else 
   - Loops

<img src='images/d3.png' style='width:80%; float:left;' />