<div class='bar_title'></div>

*Introduction to Data Science*

# 1 Python Basics - Part a)

Gunther Gust<br>
Chair for Enterprise AI

Winter Semester 25/26

<img src='https://github.com/GuntherGust/tds2_data/blob/main/images/d3.png?raw=true' style='width:20%; float:left;' />

# Learning Objectives for today

At the end of today's lecture, you...
- have familiarized yourself with __programming basics__ in Python, including:
    - Data types in python
    - Variables
    - Collections of items/variables
    - Functions
    - Control flow statements
        - If-else 
        - Loops
- have used the basics to solve small __programming exercises__
- know how to __use Jupyter notebooks__


## Sources and recommended further reading

We use material from the following sources:

### Books
- Molin, S. (2021). [Hands-on data analysis with pandas: A Python data science handbook for data collection, wrangling, analysis, and visualization](https://github.com/stefmolin/Hands-On-Data-Analysis-with-Pandas-2nd-edition?tab=readme-ov-file) (2nd ed.). Packt Publishing.
- VanderPlas, J. (2017). [Python data science handbook: Essential tools for working with data](https://jakevdp.github.io/PythonDataScienceHandbook/). O'Reilly Media.
- McKinney, W. (2022). [Python for data analysis: Data wrangling with pandas, NumPy, and Jupyter](https://wesmckinney.com/book/) (3rd ed.). O'Reilly Media.

### Online courses
- __Datacamp:__ On Wuecampus we have made useful practice exercises on datacamp.com available. Use them for practice! 
- __Udemy:__ Pierian Data. (n.d.). *Python for data science and machine learning bootcamp* [Online course](https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/) 




<img src="https://raw.githubusercontent.com/vhaus63/ids_data/main/practice_programming.jpg" alt="Your Image" style="width:50%">


# Agenda

- ## Data types in python
- ## Variables
- ## Collections of items/variables
- ## Functions
- ## Control flow statements

In [None]:
import sys
import os.path
sys.path.append(os.path.abspath('../'))

## Basic data types
### Numbers
Numbers in Python can be represented as integers (e.g. `5`) or floats (e.g. `5.0`). We can perform operations on them:

In [None]:
5 + 6

In [None]:
2.5 / 3
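Besides `+` and `/`, Python offers further arithmetic operators. The following cell is a short sketch of the most common ones (not an exhaustive list); the variable names are chosen for illustration only:

```python
# A few more arithmetic operators (not an exhaustive list)
true_div = 7 / 2       # "true" division always returns a float -> 3.5
floor_div = 7 // 2     # floor division rounds down to a whole number -> 3
remainder = 7 % 2      # modulo: the remainder of the division -> 1
power = 2 ** 3         # exponentiation -> 8
mixed = 5 + 5.0        # mixing an int and a float gives a float -> 10.0

print(true_div, floor_div, remainder, power, mixed)
print(type(floor_div), type(mixed))
```

Note that `/` returns a float even when the division has no remainder (e.g. `6 / 2` gives `3.0`).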

### Booleans

We can compare values, which gives us a Boolean (`True` or `False`):

In [None]:
5 == 6

In [None]:
5 < 6

These comparisons can be combined with the logical operators `not`, `and`, and `or`:

In [None]:
(5 < 6) and not (5 == 6)

In [None]:
False or True

In [None]:
'hi' == 'bye'

In [None]:
(1 > 2) or (2 < 3)
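The result of a comparison is an ordinary value of type `bool`, so it can be stored in a variable like any other value. This cell is a small sketch that also shows the remaining comparison operators; the variable names are illustrative:

```python
# The result of a comparison is itself a value of type bool
is_smaller = 5 < 6
print(is_smaller, type(is_smaller))   # True <class 'bool'>

# Further comparison operators
not_equal = 5 != 6       # "not equal" -> True
at_least = 5 >= 5        # "greater than or equal" -> True
at_most = 5 <= 4         # "less than or equal" -> False
print(not_equal, at_least, at_most)
```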

### Strings
Using strings, we can handle text in Python. String values must be surrounded by quotes: single quotes (`'...'`) are common, but double quotes (`"..."`) work just as well:

In [None]:
'hello'

We can also perform operations on strings. For example, we can get the length of a string with `len()`:

In [None]:
len('hello')

We can select parts of the string by specifying the **index**. Note that in Python the first character is at index 0:

In [None]:
'hello'[0]
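Because counting starts at 0, the last character of a string sits at index `len(...) - 1`. A short sketch with an illustrative variable `word`:

```python
word = 'hello'
first = word[0]               # index 0 is the first character -> 'h'
second = word[1]              # -> 'e'
last = word[len(word) - 1]    # the last valid index is len(word) - 1 -> 'o'
print(first, second, last)    # h e o
```

Asking for `word[len(word)]` (here `word[5]`) would raise an `IndexError`, because that index is one past the end.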

We can concatenate strings with `+`:

In [None]:
'hello' + ' ' + 'world'

We can check whether a substring is contained in a string with the `in` operator:

In [None]:
'ho' in 'hello'

Strings also have a built-in `split()` method, which splits a string into a list of parts (by default, at whitespace):

In [None]:
s = 'Hello world'
s.split()
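`split()` can also split at a custom separator, and the string method `join()` does the reverse. The cell below sketches these together with a few other common string methods; note that string methods return a *new* string and leave the original unchanged:

```python
s = 'Hello world'

words = s.split()                 # split at whitespace -> ['Hello', 'world']
parts = 'a,b,c'.split(',')        # split at a custom separator -> ['a', 'b', 'c']
rejoined = ' '.join(words)        # join() is the reverse of split() -> 'Hello world'
print(words, parts, rejoined)

# A few more string methods; each returns a new string, s itself is unchanged
print(s.upper())                    # HELLO WORLD
print(s.lower())                    # hello world
print(s.replace('world', 'Python')) # Hello Python
```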

## Variables
Typing text without quotes (e.g. `hello`) causes an error. Errors in Python attempt to clue us in to what went wrong with our code. In this case, we get a `NameError` exception, which tells us that the name `hello` is not defined. This means that [the Python interpreter](https://docs.python.org/3/tutorial/interpreter.html) looked for a **variable** named `hello`, but didn't find one. Once we assign a value to `hello`, the name is defined:

In [None]:
hello = "hello"
hello

Variables give us a way to store values of any data type under a name. We define a variable using the `variable_name = value` syntax:

In [None]:
x = 5
y = 7
x + y
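A variable can be reassigned at any time, and the new value may even be computed from the old one. A minimal sketch with an illustrative variable `counter`:

```python
counter = 5
counter = counter + 1   # the right-hand side is evaluated first, then stored -> 6
counter += 1            # shorthand for counter = counter + 1 -> 7
print(counter)          # 7
```

The shorthand forms `-=`, `*=` and `/=` work the same way for the other arithmetic operators.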

The variable name cannot contain spaces; we usually use `_` instead. The best variable names are descriptive ones:

In [None]:
lecture_title = 'Intro to Data Science'

Variables can hold values of any data type. We can check which type a value has with `type()`, which is a **function** (more on that later):

In [None]:
type(x)

In [None]:
type(lecture_title)
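Values can be converted from one type to another with the built-in functions `int()`, `float()` and `str()`. This matters, for example, when a number arrives as text. A short sketch with illustrative variable names:

```python
number_text = '10'
print(type(number_text))     # <class 'str'>: quotes make it a string, even if it looks like a number

as_int = int(number_text)    # str -> int: 10
as_float = float('2.5')      # str -> float: 2.5
as_str = str(42)             # int -> str: '42'
print(as_int + 5, as_float, as_str)   # 15 2.5 42
```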

If we need to see the value of a variable, we can print it using the `print()` function:

In [None]:
print(lecture_title)

Or we can use some fancier string formatting with an `f-string` (formatted string literal) in order to combine text and variables:

In [None]:
print(f'The lecture title is {lecture_title}.')
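Inside the curly braces of an f-string, we can write any expression, not just a variable name, and add a format specifier after a colon. For instance, `:.2f` formats a number as a float with two decimal places. The values below are made up for illustration:

```python
price = 3.14159
quantity = 3
message = f'{quantity} items cost {price * quantity:.2f} EUR.'
print(message)   # 3 items cost 9.42 EUR.
```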

## Control questions


### 1. What is printed in the following code?

In [None]:
num1 = 4
num2 = 5.5

sum_a = num1 + num2
sum_b = int(num1) + int(num2)

print(sum_a)
print(sum_b)

### 2. What value is stored in `result`?

In [None]:
x = "1"
y = input("Please enter a number:  ")  # assume the user enters: 10
result = y + x

## Collections of Items

### Lists
We can store a collection of items in a list:

In [None]:
['hello', ' ', 'world']

The list can be stored in a variable. Note that the items in the list can be of different types:

In [None]:
my_list = ['hello', 3.8, True, 'Python']
type(my_list)

We can see how many elements are in the list with `len()`:

In [None]:
len(my_list)

We can also use the `in` operator to check if a value is in the list:

In [None]:
'world' in my_list

We can select items in the list just as we did with strings, by providing the index to select:

In [None]:
my_list[0]

Python also allows negative indices, which count from the end of the list, so we can easily select the last element:

In [None]:
my_list[-1]

Another powerful feature of lists (and strings) is **slicing**. We can grab the middle 2 elements in the list:

In [None]:
my_list[1:3]

... or every other one:

In [None]:
my_list[::2]

We can even select the list in reverse:

In [None]:
my_list[::-1]

Note: The syntax is `[start:stop:step]`, where the selection includes the start index but excludes the stop index. With a positive step, a missing `start` defaults to `0` and a missing `stop` defaults to the number of elements (4, in our case); this works because `stop` is exclusive. If `step` isn't provided, it is 1. With a negative step, such as `-1` above, the defaults are reversed: the slice starts at the last element and runs back to the first.
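To see the start/stop/step rules at work, here is a small sketch on a list of consecutive numbers (the list `numbers` is illustrative), so that each element equals its index:

```python
numbers = [0, 1, 2, 3, 4, 5]

print(numbers[1:4])    # [1, 2, 3]  (the stop index 4 is excluded)
print(numbers[:3])     # [0, 1, 2]  (start defaults to 0)
print(numbers[3:])     # [3, 4, 5]  (stop defaults to len(numbers))
print(numbers[::2])    # [0, 2, 4]  (every second element)
print(numbers[-2:])    # [4, 5]     (the last two elements)
print(numbers[::-1])   # [5, 4, 3, 2, 1, 0]
print('hello'[1:4])    # ell        (slicing works on strings, too)
```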

In order to add an element to an existing list, we can use the `append` method:

In [None]:
my_list.append('new element')
my_list
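`append` is only one of several ways to modify a list in place. The sketch below, using an illustrative list `fruits`, shows a few other common list methods and assignment by index:

```python
fruits = ['apple', 'banana']
fruits.append('cherry')     # add to the end      -> ['apple', 'banana', 'cherry']
fruits.insert(0, 'kiwi')    # insert at index 0   -> ['kiwi', 'apple', 'banana', 'cherry']
fruits[1] = 'mango'         # replace by index    -> ['kiwi', 'mango', 'banana', 'cherry']
fruits.remove('banana')     # remove a value      -> ['kiwi', 'mango', 'cherry']
last = fruits.pop()         # remove and return the last element -> 'cherry'
print(fruits, last)         # ['kiwi', 'mango'] cherry
```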

### Tuples
Tuples are similar to lists; however, they can't be modified after creation, i.e., they are **immutable**. Instead of square brackets, we use parentheses to create tuples:

In [None]:
my_tuple = ('a', 5)
type(my_tuple)

In [None]:
my_tuple[0]

Trying to modify an immutable object raises a `TypeError`:

In [None]:
my_tuple[0] = 'b'
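Since a tuple can't be changed in place, the usual approach is to build a new tuple instead. Tuples can also be "unpacked" into separate variables. A short sketch (the names `new_tuple`, `letter` and `number` are illustrative):

```python
my_tuple = ('a', 5)

# Instead of modifying a tuple, we build a new one.
# ('b',) is a one-element tuple: the trailing comma is required.
new_tuple = ('b',) + my_tuple[1:]
print(new_tuple)        # ('b', 5)

# Tuples can be "unpacked" into separate variables
letter, number = my_tuple
print(letter, number)   # a 5
```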

### Dictionaries
We can store mappings of key-value pairs using dictionaries:

In [None]:
shopping_list = {
    'veggies': ['spinach', 'kale', 'beets'],
    'fruits': 'bananas',
    'meat': 0    
}
type(shopping_list)

To access the value associated with a specific key, we use square bracket notation again. Since the value stored under `'veggies'` is a list, we can index into it right away:

In [None]:
shopping_list['veggies'][-1] 

We can extract all of the keys with `keys()`:

In [None]:
shopping_list.keys()

We can extract all of the values with `values()`:

In [None]:
shopping_list.values()

Finally, we can call `items()` to get back the (key, value) pairs:

In [None]:
shopping_list.items()
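Dictionaries are mutable: we can update values, add new keys and remove existing ones. Note that `in` checks the keys, not the values, and that looking up a missing key with square brackets raises a `KeyError`; the `get()` method returns a default value instead. A sketch with the shopping list from above (the added keys are illustrative):

```python
shopping_list = {
    'veggies': ['spinach', 'kale', 'beets'],
    'fruits': 'bananas',
    'meat': 0
}

shopping_list['fruits'] = 'apples'   # update the value of an existing key
shopping_list['drinks'] = 'water'    # a new key adds a new key-value pair
del shopping_list['meat']            # remove a key together with its value

print('veggies' in shopping_list)    # True: `in` checks the keys...
print('apples' in shopping_list)     # False: ...not the values
print(shopping_list.get('bread', 'not on the list'))  # missing key -> default value
print(shopping_list)
```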

### Sets
A set is a collection of unique items; a common use is removing duplicates from a list. Sets are also written with curly braces, but notice that there is no key-value mapping:

In [None]:
my_set = {1, 1, 2, 'a'}
type(my_set)

How many items are in this set?

In [None]:
len(my_set)

We put in 4 items but the set only has 3 because duplicates are removed:

In [None]:
my_set

We can check if a value is in the set:

In [None]:
2 in my_set
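The sketch below shows the duplicate-removal use case and the basic set operations. Two caveats: sets have no fixed order (so we sort them for display), and `{}` creates an empty *dictionary*, so an empty set must be written `set()`. The variable names are illustrative:

```python
names = ['Ann', 'Bob', 'Ann', 'Cid', 'Bob']
unique_names = set(names)          # duplicates are removed
print(len(unique_names))           # 3
print(sorted(unique_names))        # ['Ann', 'Bob', 'Cid'] (sets have no order, so we sort for display)

set_a = {1, 2, 3}
set_b = {2, 3, 4}
print(set_a | set_b)   # union: elements in either set -> {1, 2, 3, 4}
print(set_a & set_b)   # intersection: elements in both sets -> {2, 3}
print(set_a - set_b)   # difference: elements only in set_a -> {1}

empty_set = set()      # note: {} creates an empty dictionary, not a set
print(type(empty_set), type({}))   # <class 'set'> <class 'dict'>
```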

## Atomic and Reference Data Types


The difference between them is a __very common__ source of confusion and errors. Consider this example:

In [None]:
a = 100
b = a

In [None]:
a = 101
print(b)

In [None]:
my_dict = {'a': 1, 'b': 2, 'c': 5}
my_dict2 = my_dict

In [None]:
my_dict2['c'] = 3

print(my_dict)

<img src="https://raw.githubusercontent.com/vhaus63/ids_data/main/atom_vs_reference.png" alt="Your Image" style="width:40%">


- **Atomic types** (e.g., integers, floats, strings, booleans) are immutable: their value cannot be changed in place. Any "modification", such as `a = 101`, creates a new object and points the variable `a` to it, so other variables that referred to the old value (like `b`) are unaffected.
- **Reference types** (e.g., lists, dictionaries, sets) are mutable: their contents can be modified in place without creating a new object. Assigning such an object to another variable (`my_dict2 = my_dict`) copies only __a reference to the data's memory location__, so a modification made through one variable is visible through all variables referring to that object.

(Strictly speaking, every Python variable holds a reference to an object. What makes the practical difference is whether that object is mutable.)
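The `is` operator checks whether two variables refer to the very same object (not just equal values). This sketch replays the two examples from above and uses `is` to make the difference visible:

```python
a = 100
b = a
print(b is a)      # True: after b = a, both names refer to the same int object
a = 101            # rebinds a to a new int object; the object 100 itself is unchanged
print(b is a, b)   # False 100

my_dict = {'a': 1, 'b': 2, 'c': 5}
my_dict2 = my_dict
my_dict2['c'] = 3                          # modifies the single, shared dict in place
print(my_dict2 is my_dict, my_dict['c'])   # True 3
```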

In [None]:
my_dict3 = {'a': 1, 'b': 2, 'c': 5}
my_dict4 = my_dict3.copy()  # creates a (shallow) copy: a new dict object, not just a new reference

my_dict4['c'] = 3

print(my_dict3)
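`dict.copy()` creates a *shallow* copy: the outer dictionary is new, but mutable values stored inside it (such as lists) are still shared with the original. For nested data, the standard library function `copy.deepcopy()` copies the inner objects as well. A sketch with an illustrative nested dictionary:

```python
import copy

original = {'veggies': ['spinach', 'kale'], 'fruits': 'bananas'}
shallow = original.copy()        # new outer dict, but the inner list is shared
deep = copy.deepcopy(original)   # copies the nested list as well

shallow['veggies'].append('beets')   # modifies the list shared by original and shallow
shallow['fruits'] = 'apples'         # replaces a value only in the shallow copy

print(original)   # {'veggies': ['spinach', 'kale', 'beets'], 'fruits': 'bananas'}
print(deep)       # {'veggies': ['spinach', 'kale'], 'fruits': 'bananas'}
```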

## Summary

- Data types in python
- Variables
- Collections of items/variables

## Next lecture

- Functions
- Control flow statements
    - If-else
    - Loops


## Let's do a Mentimeter!

<img src='https://raw.githubusercontent.com/vhaus63/ids_data/main/d3.png' style='width:80%; float:left;' />