# 01 – Introduction to Python

Authored by: *Fredrik Svensson, Oliver Scott*  
Edited by: *Florion Peni*

You can use the links in the [Contents](#contents) section to navigate the notebook.

This notebook provides a brief and general introduction to how programming with Python works. If you are already familiar with the basics, you may wish to skip ahead to [`02_rdkit_introduction.ipynb`](02_rdkit_introduction.ipynb).

⚠️ **Please run the code cells in order**. Skipping cells may result in errors due to missing variables or imports.

### Notes

This notebook works in both:
- **Local JupyterLab installations** (e.g. via Anaconda)
- **Google Colab** (run in your browser with no prior setup required)

File handling instructions (e.g. loading from disk) may differ depending on your environment.

### Python

**Python** is a popular general-purpose programming language. It is especially common in scientific computing due to its readability and rich ecosystem of libraries.

### JupyterLab / Google Colab

**JupyterLab** is an open-source interface for running and sharing notebooks with code and text.  
**Google Colab** is a hosted Jupyter notebook service that runs in your browser and can provide free access to GPUs and TPUs.

> *When working in Colab, any files saved in the runtime (e.g. downloaded datasets or generated outputs) are temporary and will be deleted when you close the tab or your session times out.*  
You can view runtime files by clicking the folder icon in the left-hand toolbar. If you are working with important files, make sure to download them to your local machine or save them to Google Drive before ending your session.

> If you make edits to this notebook and would like to keep them, save your own copy by clicking **File > Save a copy in Drive** in the top menu. This ensures your work is preserved even after you close the runtime.

### Extra resources

This is a concise introduction. If you would like to learn more, see the following:

* [RealPython](https://realpython.com/) - Free Python tutorials.
* [Codecademy](https://www.codecademy.com/learn/learn-python-3) - Python lessons.
* [Cheat-Sheets](https://ehmatthes.github.io/pcc_2e/cheat_sheets/cheat_sheets/) - Python reference sheets.

## Contents

* [Writing and running code](#writing-and-running-code)
* [Comments](#comments)
* [Variables and data types](#variables-and-data-types)
* [Operators](#operators)
* [Imports](#imports)
* [Errors](#errors)
* [Functions](#functions)
* [Control flow](#control-flow)
* [Reading files](#reading-files)
* [Discussion](#discussion)

---

## Writing and running code

You can write and run Python code inside **code cells**.

Try running the cell below. First, select it by clicking on it. Then:
- Click the run/play button located inside the cell (in Colab) or in the toolbar above (in JupyterLab), or
- Use these keyboard shortcuts:
  - Windows: `Ctrl` + `Enter`
  - macOS: `⌘ Command` + `Enter`

> When you first run code on this notebook in Google Colab, you may see a message like:  
*“This notebook was not authored by Google. It may request access to your data…”*  
This is expected for notebooks loaded from GitHub. Click the *“Run anyway”* button when prompted to start running the code.


In [None]:
print("This is a code cell.")

----

## Comments

To enhance readability of code, programmers use comments. These have no effect on the running of the programme, but are important to make your code understandable.

```python
# We use the hash symbol to define a comment
```

Comments will be used in the code blocks to help you understand what is going on.

In [None]:
# This is a comment
# Notice that this cell outputs nothing when run

Comments make it clear what specific sections of code are doing.

In [None]:
# The line below will print the sum of two numbers
print(10 + 12)

----

## Variables and data types

Variables are one of the most important components of a programming language. They are used to store information, giving us a shorthand name for referring to potentially large amounts of data.

We can assign data to variables using the `=` symbol.

```python
variable = data
```

Once we have assigned data to a variable we can then access that data using the variable name.

When naming a variable it is best practice to give it a name which describes the data it holds. We use underscores `_` to make variable names easier to read. Blank spaces cannot be used.

```python
my_long_variable_name
```

Also note that Python contains keywords that **cannot** be used as variable names. This is because they have a reserved meaning in the language, and using one as a name raises a `SyntaxError`. In Jupyter and Colab you can recognise a keyword because it is automatically highlighted. [Here](https://www.w3schools.com/python/python_ref_keywords.asp) is a list of reserved keywords.

Note: *Python is a dynamically typed language and therefore the type of a variable does not need to be specified. If you are familiar with statically typed languages (e.g. C, C++, Java) this may seem a little unusual.*
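If you are unsure whether a name is reserved, one way to check is the standard-library `keyword` module. The snippet below is a minimal sketch. It uses an `import` statement, which is covered in the [Imports](#imports) section, and the variable names are chosen only for illustration.

```python
import keyword

# `keyword.iskeyword` returns True if the given string is a reserved keyword
is_for_reserved = keyword.iskeyword('for')
is_my_string_reserved = keyword.iskeyword('my_string')

print(is_for_reserved)        # True: `for` cannot be used as a variable name
print(is_my_string_reserved)  # False: `my_string` is a valid variable name
```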

In [None]:
# In this line we assign the text "Hello, World!" to the variable `my_string`
my_string = "Hello, World!"

# We can print the contents of the variable using the print function
print(my_string)

# We can also create aliases to the same data
my_string_alias = my_string

# Another way to assign the text to two different variables
my_string = my_string_alias = "Hello, World"

**Python** has several built-in data types/objects which are useful to a programmer.

In this session we will look at:

- strings
- numerical types
- Booleans
- collections

Note: *This is only an introduction. There is much more you can do with these data types. Check out these [Cheat-Sheets](https://ehmatthes.github.io/pcc_2e/cheat_sheets/cheat_sheets/) for an easy-to-use reference.*

### Strings

Strings in Python hold text data (`str`).

Strings can be enclosed with either single or double quotation marks (`' '` or `" "`). Multi-line strings can be defined using triple quotes (`""" """`). The latter can be useful for making long comments or documenting code.

In [None]:
# Double quote syntax
string_one = "This is a string."
print(string_one)

# Single quote syntax
string_two = 'This is also a string.'
print(string_two)

# Triple quote syntax
string_three = """This is also another string but it
can span multiple lines."""
print(string_three)

### Numerical or scalar types

Python has three built-in numerical types.

**Integer** `int`:

- An integer is a whole number, positive or negative, without decimals, and of unlimited length.

**Float** `float`:

- A "floating point number" is a number, positive or negative, containing one or more decimals.

**Complex** `complex`:

- Complex numbers are written with a "j" as the imaginary part:

In [None]:
x = 42    # Integer
y = 42.0  # Float
z = 1j    # Complex

print(x)
print(y)
print(z)

# To verify the type of variable we can use the `type()` function
print(type(x))
print(type(y))
print(type(z))

Floats can also be formatted in scientific notation with an "e" to indicate the power of 10.

In [None]:
big_float = 1e9
small_float = -82.7e10  # Negative number

print(big_float)
print(small_float)

### Type conversion

You can convert from one type to another using the namesake functions:

```python
int()
float()
complex()
```

This conversion is called *casting*.

Note: *Casting is useful for more than just numerical types. We will see examples of this later.*

In [None]:
x = 1    # Integer
y = 2.8  # Float

# Convert from integer to float
a = float(x)

# Convert from float to integer (the decimal part is discarded, not rounded)
b = int(y)

print(x, "->", a)
print(y, "->", b)

# We can verify that the type has changed
print(type(a))
print(type(b))

Complex numbers cannot be converted into other numerical types, although `float` and `int` can be converted to `complex`.
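The short sketch below illustrates both directions. It uses a `try`/`except` block, which this notebook does not cover in detail, only to show the error without stopping the cell. The variable names are illustrative.

```python
z = complex(3)  # Convert an integer to a complex number
print(z)        # (3+0j)

# The real and imaginary parts are stored as floats
print(z.real, z.imag)  # 3.0 0.0

# Converting the other way is not allowed and raises a TypeError
try:
    int(z)
except TypeError:
    print('A complex number cannot be converted with int()')
```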

### Booleans

Booleans `bool` represent either of two values: `True` or `False`.

In programming you often need to know if an expression is `True` or `False`. When you compare two values, the expression is evaluated and Python returns the Boolean answer.

Note: *We will learn more about comparisons in the ["Operators"](#Operators) section. Booleans are also essential for [control flow](#Control-flow)*.

In [None]:
print(10 > 9)   # Greater than `>`
print(10 == 9)  # Equal to `==`
print(10 < 9)   # Less than `<`

# We can also assign Booleans to variables
true = True
false = False

# Verify that the type is indeed Boolean
print(type(true))
print(type(false))

We can also convert Booleans into and from multiple types.

**Try to guess the output of the cell below before running it.**

Hint: *Think binary!*

In [None]:
true_int = int(True)    # Convert the Boolean `True` to an integer
false_int = int(False)  # Convert the Boolean `False` to an integer

# What will these lines print?
print(true_int)
print(false_int)

### Collections

There are four core collection data types in Python:

- **Lists**; `list` is a collection which is **ordered** and **changeable**. Allows duplicate members.
- **Tuples**; `tuple` is a collection which is **ordered** and **unchangeable**. Allows duplicate members.
- **Sets**; `set` is a collection which is **unordered** and **unindexed**. No duplicate members.
- **Dictionaries**; `dict` is a collection of key-value pairs which is **changeable** and **indexed** by key. It preserves insertion order (since Python 3.7). No duplicate keys.

When choosing a collection type, it is useful to understand the properties of that type.

Note: *Sometimes you will see the terms changeable and unchangeable written as mutable and immutable, respectively.*

#### Lists
A `list` is **ordered** and **changeable**.

In Python, lists are written with square brackets:

```python
my_list = ['hello', 'John', 22]
```

A list can contain any type of Python object.

In [None]:
# Define and print a list containing colours
colour_list = ['red', 'green', 'blue', 'yellow', 'white', 'black']
print(colour_list)

We can use **indexes** to access items in a `list`.

In Python, indexes start at **0**, so index 0 refers to the **1st** item in the list.

We can also use **negative indexes** to access items from the end of the list, i.e. -1 refers to the last item in the list and -2 refers to the item before the last item in the list.

![PythonIndexing](https://railsware.com/blog/wp-content/uploads/2018/10/positive-indexes.png)

[Image source](https://railsware.com/blog/python-for-machine-learning-indexing-and-slicing-for-lists-tuples-strings-and-other-sequential-types/)

To use indexes we utilise a square bracket syntax:

```python
my_list[0]
```

The above example will access the first item in the list.

**Try to guess the output of the cell below before running it!**

In [None]:
print(colour_list[0])
print(colour_list[1])
print(colour_list[3])
print(colour_list[-1])
print(colour_list[-3])

# We can also change a value using indexing (the index must already exist)
colour_list[4] = 'purple'

# Verify the change
print(colour_list)

# What do you think this will output?
print(colour_list[0:3])

The `[n:m]` syntax generates a slice of the data, where `n` and `m` are numerical indexes. The slice starts at index `n` and stops just before index `m`, so the item at index `m` is not included.
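As a small illustration of slicing, the sketch below uses a separate, made-up list called `letters` so that it does not depend on `colour_list`. Omitting `n` or `m` is also allowed and means "from the start" or "to the end", respectively.

```python
letters = ['a', 'b', 'c', 'd', 'e']  # Illustrative list

first_three = letters[0:3]  # Indexes 0, 1 and 2 (index 3 is excluded)
middle = letters[1:4]       # Indexes 1, 2 and 3
from_start = letters[:2]    # Omitting n starts at the beginning
to_end = letters[3:]        # Omitting m runs to the end of the list

print(first_three)  # ['a', 'b', 'c']
print(middle)       # ['b', 'c', 'd']
print(from_start)   # ['a', 'b']
print(to_end)       # ['d', 'e']
```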

There are numerous ways to add and remove items from a list. For now we will only mention one of these: the `.append()` method.

```python
my_list.append('item to append')
```

This method adds an item to the end of a list. To see other methods check out the [Cheat-Sheet](https://github.com/ehmatthes/pcc_2e/releases/download/v1.0.1/beginners_python_cheat_sheet_pcc_lists.pdf) later.

In [None]:
# Add an item to the list
colour_list.append('pink')

# Verify that 'pink' was added to the end of the list
print(colour_list)

#### Tuples

A `tuple` is **ordered** and **unchangeable**.

In Python, tuples are written with round brackets.

```python
my_tuple = ('John', 'Doe', 22)
```

- Like lists, tuples can contain strings and any other type of Python object.
- Like lists, we can also use indexing to access items in the tuple.
- Unlike lists, we cannot change, add, or remove items in a tuple.

#### Exercise 1: Working with tuples

Given the tuple `xyz` below, complete the following tasks:

1. Extract and print the value of `x`.
2. Extract and print the value of `y`.
3. Extract and print the value of `z`.
4. Print the values of both `x` and `y` at the same time.

Hint: *Use indexing to access the elements of the tuple.*

In [None]:
xyz = (3.01, 1.23, -1.22)  # Define a tuple holding three-dimensional coordinates

# Add your solution code here

Remember that tuples are **unchangeable** so trying to change items will give us an error message.

Error messages in Python are designed to be understandable and relatively easy to read.

Uncomment and run the code in the cell below.

In [None]:
# xyz[1] = 2.90

To create a tuple with only one item, you have to add a comma after the item, otherwise Python will not recognise it as such.

In [None]:
my_tuple = (1,)     # This will assign a tuple
my_not_tuple = (1)  # This will NOT assign a tuple

print(type(my_tuple))
print(type(my_not_tuple))

#### Sets

A `set` is **unordered** and **unindexed**. Sets do not contain duplicate items and are defined with curly brackets.

```python
my_set = {'John', 'Doe', 22}
```

- Sets can contain strings and any **hashable** Python object.
- Like lists, items can be added and removed from a set.
- Unlike lists and tuples, sets cannot be indexed as they have no order.
- Unlike lists and tuples, sets do not hold duplicate items.

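*For a Python object to be* **hashable**, *its value must not change during its lifetime; ordering is not required. In practice, the built-in unchangeable types are hashable. Therefore, we can add a tuple (provided its items are themselves hashable) to a set but not a list.*   
Numerical data types and strings are also **unchangeable** so we can add these to sets as well.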
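The following sketch shows the difference in practice. The set name and its contents are made up for illustration, and the `try`/`except` block is used only so that the error is printed rather than stopping the cell.

```python
points = set()          # Start with an empty set
points.add((1.0, 2.0))  # A tuple of floats is hashable, so this works
points.add('origin')    # Strings are hashable too

# A list is changeable and therefore not hashable
try:
    points.add([3.0, 4.0])
except TypeError as error:
    print('TypeError:', error)

print(len(points))  # 2: only the tuple and the string were added
```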

In [None]:
# Define a set holding the names of fruits
fruit_set = {'cherry', 'banana', 'apple'}
print(fruit_set)

# We can add items to the set using the `.add()` method
fruit_set.add('mango')
print(fruit_set)

# Duplicates are not allowed in sets
fruit_set.add('apple')
print(fruit_set)

# We can remove items using the `.remove()` method
fruit_set.remove('cherry')
print(fruit_set)

The `.intersection()` and `.union()` methods are also useful.

- The intersection of two sets is a new set that contains all of the elements that are in both sets.
- The union of two sets is a new set that contains all of the elements that are in at least one of the two sets.

Try to work out the output before you run the cell below.

In [None]:
a = {1, 2, 3}  # Define a set of numbers
b = {3, 4, 5}  # Define another set of numbers

# Compute the intersection
intersection = a.intersection(b)
print(intersection)

# Compute the union
union = a.union(b)
print(union)

#### Dictionaries

A `dict` is **changeable** and **indexed** by key. Since Python 3.7, it also preserves the order in which items were inserted.

In Python, dictionaries are defined with curly brackets. They have **keys** and **values**.

```python
my_dict = {'my_key_1': 1, 'my_key_2': 2}
```

- Like lists and tuples, dictionaries can contain strings and any Python object. Keys, however, must be **hashable**.
- Like lists and sets, items can be added and removed from a dictionary.
- Like lists and tuples, dictionaries can be **indexed** although the syntax is slightly different.
- Dictionaries cannot contain duplicate keys but can contain duplicate values.

In [None]:
# Define a dictionary describing a user
user_dict = {
    'name': 'John',
    'last_name': 'Doe',
    'age': 22
}
print(user_dict)

# We can index the dictionary using keys
age_value = user_dict['age']  # Get the age of the user
print(age_value)

# We can add items using a similar syntax
user_dict['bought'] = ['apple', 'mango']
print(user_dict)

# We can also change values easily
user_dict['age'] = 23
print(user_dict)

Dictionaries are a very versatile data type. To learn more about the utilities of dictionaries download the dictionary [Cheat-Sheet](https://github.com/ehmatthes/pcc_2e/releases/download/v1.0.1/beginners_python_cheat_sheet_pcc_dictionaries.pdf).
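To illustrate the rule about duplicate keys, here is a minimal sketch using a made-up dictionary of atom counts. If the same key is written twice, the later value replaces the earlier one, and keys are kept in the order in which they were first inserted (Python 3.7 and later).

```python
# Illustrative data: the key 'C' appears twice in the definition
atom_counts = {'C': 6, 'H': 6, 'C': 2}
print(atom_counts)  # {'C': 2, 'H': 6}

# Adding a new key places it after the existing ones
atom_counts['O'] = 1
keys_in_order = list(atom_counts)
print(keys_in_order)  # ['C', 'H', 'O']
```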

----

## Operators

Operators are used to perform operations on variables and values.

Python provides multiple types of operators which can be used with multiple data types. Here, we will look at:

- arithmetic operators,
- assignment operators,
- and comparison operators.

### Arithmetic operators

They are used with numeric values to perform common mathematical operations. Most of these are relatively intuitive.

| Operator | Name           | Example |
|:---------|:---------------|:--------|
| `+`      | Addition       | x + y   |
| `-`      | Subtraction    | x - y   |
| `*`      | Multiplication | x * y   |
| `/`      | Division       | x / y   |
| `%`      | Modulus        | x % y   |
| `**`     | Exponentiation | x ** y  |
| `//`     | Floor division | x // y  |

In [None]:
x = 10  # Define an integer 10
y = 2   # Define an integer 2

print(x + y)   # Addition
print(x - y)   # Subtraction
print(x * y)   # Multiplication
print(x / y)   # Division
print(x % y)   # Modulus (remainder)
print(x ** y)  # Exponentiation

y = 3          # Redefine y so we can see the effect of floor division
print(x / y)   # Division
print(x // y)  # Floor division

Note: *Some arithmetic operators can also be used with other data types like strings. Maybe you could try and see what happens if you add two strings or multiply a string by an integer.*

### Assignment operators

Assignment operators are used to assign values to variables.

We already know the `=` assignment operator, though there are multiple others.

| Operator  | Example        | Equivalent |
|:----------|:---------------|:-----------|
| `=`       | x = 3          | x = 3      |
| `+=`      | x += 3         | x = x + 3  |
| `-=`      | x -= 3         | x = x - 3  |
| `*=`      | x \*= 3        | x = x * 3  |
| `/=`      | x /= 3         | x = x / 3  |
| `%=`      | x %= 3         | x = x % 3  |
| `**=`     | x \*\*= 3      | x = x ** 3 |
| `//=`     | x //= 3        | x = x // 3 |

Notice that numeric types are **unchangeable** so if we want to change the number assigned to a variable we need to reassign it like this:

```python
x = 3      # We have assigned 3 to x
x + 3      # Here x has not changed
x = x + 3  # Here we have reassigned x to x + 3 (or 6)
```

Using assignment operators is a simpler way of writing the same expression:

```python
x += 3  # Here we have reassigned x to x + 3 (or 6)
```
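The less common assignment operators follow the same pattern. The sketch below chains a few of them on one illustrative variable so that you can follow how the value changes step by step.

```python
x = 17

x %= 5    # Same as x = x % 5 (remainder of 17 / 5)
print(x)  # 2

x **= 3   # Same as x = x ** 3
print(x)  # 8

x //= 3   # Same as x = x // 3 (floor division)
print(x)  # 2

x /= 4    # Same as x = x / 4 (true division always gives a float)
print(x)  # 0.5
```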

Try some other assignment operators to get used to the syntax.

In [None]:
x = 5  # Assign 5 to x
x * 2  # No change to x

# Let's verify that nothing has changed
print(x)

# Now we will try an assignment operator
x *= 2
print(x)  # Notice that x is now 10

# We use the other syntax when we want to store the answer in a different variable
y = x / 2
print(y)

### Comparison operators

Comparison operators are used to **compare** values. They return the **Booleans** `True` or `False`. These operators are useful for controlling the flow of a programme. We will learn more about this in the [Control flow](#Control-flow) section.

| Operator | Name                       | Example |
|:---------|:---------------------------|:--------|
| `==`     | Equal                      | x == y  |
| `!=`     | Not equal                  | x != y  |
| `>`      | Greater than               | x > y   |
| `<`      | Less than                  | x < y   |
| `>=`     | Greater than or equal to   | x >= y  |
| `<=`     | Less than or equal to      | x <= y  |

Notice how the equal operator is `==` rather than `=`. Why do you think this is the case?

Hint: *Remember the assignment operators.*

**Before running the cell below try to work out what the output will be.**

In [None]:
x = 15
y = 5.5

print(x == y)
print(x != y)
print(x > y)
print(x < y)
print(x >= y)
print(x <= y)

Note: *Comparison operators are not limited to numerical types. Try comparing some strings.*

----

## Imports

In Python, we can use the `import` keyword to include and utilise external code. Python comes with many pre-installed modules and packages, and additional third-party packages can be installed from within a notebook via `!pip install`. You can also create your own as needed.

>The Anaconda distribution of Python includes many packages relevant to data science, machine learning, and scientific computing. Additional packages can be installed via `!conda`, which efficiently manages dependencies and environments.

A **library** is a collection of modules and/or packages bundled together to provide specific functionality. They may include multiple packages and modules or even other dependencies.
* A **package** is a collection of Python modules.
    * A **module** is a single Python file. It may contain multiple functions and classes.

For example, Python comes with a module called `time`. We can use it to print the current time.

In [None]:
import time  # Import the module

current_time = time.ctime()                  # Get the current time
print('The current time is:', current_time)  # Print the current time

Sometimes we may only need a specific **module** or **function** from a **package** or **module**.

To achieve this, we can use the `from <MODULE/PACKAGE> import <FUNCTION/MODULE>` syntax.

Once code is imported, there is no need to repeat the `import` command. The imported module, function or package remains available to all cells executed afterward. Re-importing is unnecessary and will not have any effect beyond the initial import.

Note: *The outputs of cells that use the above and below importing methods will be identical, aside from the exact time of execution.*

In [None]:
# Import the `ctime` function from the `time` module
from time import ctime

current_time = ctime()
print('The current time is:', current_time)
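The same syntax can import a module from a package. As a small sketch, the standard library's `urllib` package contains a module called `parse`; the string being encoded is made up for illustration.

```python
# `urllib` is a package and `parse` is one of the modules inside it
from urllib import parse

# `parse.quote` percent-encodes characters that are not safe in a URL
encoded = parse.quote('hello world')
print(encoded)  # hello%20world
```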

In Jupyter, you can press `Shift` + `Tab` while your cursor is inside a function's parentheses to view its documentation.

In Colab, you can highlight the function or place your cursor on it and hover to see the docstring or use the tooltip popup.

Try it after running the cell below:

In [None]:
import math

math.exp(10)  # Use the shortcut when this line is selected

----

## Errors

Python error messages are usually descriptive: the last line names the error type and gives a short explanation, which helps you work out where you went wrong. Execute the cells below; they contain examples of code that will raise errors.

In [None]:
10 / 0

In [None]:
int('hello')  # Will raise a ValueError as 'hello' does not represent a number

Note: *However, we can convert a string that represents a whole number (e.g. `'10'`) into an integer.*

In [None]:
number = '10'
print(type(number))

number = int(number)  # Reassign `number` to the integer version of its value
print(type(number))

Python has many types of errors to help the programmer debug their code. If you are interested, you can click [here](https://www.tutorialspoint.com/python/python_exceptions.htm) to see a list of all the different error types.
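Although it goes slightly beyond this introduction, it can help to see that the error types named in these messages are ordinary Python objects. The sketch below uses a `try`/`except` block to catch the two errors from the cells above and print their type names, instead of letting them stop the cell.

```python
try:
    result = 10 / 0
except ZeroDivisionError as error:
    result = None  # No sensible result exists, so we store None
    print('Caught:', type(error).__name__)

try:
    number = int('hello')
except ValueError as error:
    number = None
    print('Caught:', type(error).__name__)
```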

----

## Functions

In Python, we can create blocks of reusable code that can be called when needed. Programmers often use functions to prevent rewriting multiple lines of code or to break up complex processes, making the code much easier to read and debug.

We can define functions using the keyword `def`. The lines after the colon `:` must be indented. The indent is usually four spaces (the style recommended by PEP 8) or a `Tab`, and it must be consistent within a block.

```python
def my_function():
    # Do something here
    return
```

To call a function, use the function name followed by parentheses.

```python
my_function()
```

- Code within a function executes only when it is called.
- Variables defined within a function cannot be accessed outside of the function body unless explicitly returned.
- In Jupyter, after a function has been defined it can be used by any cell below.
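The point about variables being local to a function can be sketched as follows. The function and variable names are hypothetical, and the example assumes that no variable called `local_width` has been defined elsewhere in the notebook.

```python
def compute_area():
    local_width = 3   # These two variables exist only inside the function
    local_height = 4
    return local_width * local_height

area = compute_area()  # The returned value is available outside the function
print(area)            # 12

# The local variable itself is not visible here
try:
    print(local_width)
except NameError as error:
    print('NameError:', error)
```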

The cell block below demonstrates the syntax for defining and calling a function.

In [None]:
def tell_the_time():
    """
    This function returns the current time.

    Indentation is crucial in Python to define the scope of the code.
    Ensure you use consistent spacing to avoid errors.

    The `return` keyword specifies the value that the function outputs
    when it is called.
    """
    import time  # Importing the time module within the function scope
    the_time = time.ctime()
    return the_time

# Call the function and save its output to the variable `current_time`
current_time = tell_the_time()

# Print the result
print("The current time is:", current_time)

Functions can also take **arguments**/**parameters**. These are specified inside the parentheses. You can add as many as you want; just separate them with commas.
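As a brief sketch of a function with more than one parameter, the hypothetical `describe_atom` function below takes two arguments and combines them into a string.

```python
def describe_atom(symbol, atomic_number):
    """Return a short description built from the two arguments."""
    return '{} has atomic number {}'.format(symbol, atomic_number)

# Arguments are matched to parameters by position
description = describe_atom('C', 6)
print(description)  # C has atomic number 6
```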

#### Exercise 2: Personalised greeting

Complete the following tasks:

1. Try to change the code in the cell so that it prints your name instead.
2. Try to change the code to print a different greeting.

In [None]:
def greet(name):
    """
    This function generates a personalised greeting message.

    Parameters:
    -----------
    name : str
        The name to include in the greeting message.

    Example:
    --------
    To call the function and create a greeting:
    `greet('Emily')`

    Inside the function, the argument `name` is used to dynamically
    generate the greeting. This makes the function reusable with
    different inputs.

    Note:
    -----
    The `format` method is used to replace `{}` in the string with the
    value of the `name` argument.
    """
    greeting = 'Hello, {}!'.format(name)
    return greeting

# Example usage of the function
our_name = 'John Doe'
message = greet(our_name)  # Call the function and save the output

# Print the personalised greeting
print(message)

Learn more about functions by downloading this [Cheat-Sheet](https://github.com/ehmatthes/pcc_2e/releases/download/v1.0.1/beginners_python_cheat_sheet_pcc_functions.pdf).

----

## Control flow

Python also offers a number of ways to control how and what code is executed.

**Conditionals:**

```python
if ... else
```

**Loops:**

```python
for ... while
```

### Conditionals

We can combine comparison operators with `if` statements to control how code is executed:

```python
if CONDITION:
    # do something
```
    
We can chain multiple comparisons together using `elif` and `else`, giving us even more control over the execution of code:

```python
if CONDITION:
    # do something
elif CONDITION:
    # do something
else:
    # do something
```
    
`elif` is shorthand for 'else if'. When using `elif`, if a condition above it evaluates to `True`, then none of the `elif` conditions below it are evaluated. If those conditions were written as separate `if` statements instead, each of them would still be evaluated.
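The difference between `elif` and a second `if` can be sketched as follows. The lists are used only to record which branches ran, and all names are illustrative.

```python
x = 10

# Chained with `elif`: only the first condition that is True runs
elif_messages = []
if x > 5:
    elif_messages.append('greater than 5')
elif x > 1:
    elif_messages.append('greater than 1')

# Separate `if` statements: every condition is checked independently
if_messages = []
if x > 5:
    if_messages.append('greater than 5')
if x > 1:
    if_messages.append('greater than 1')

print(elif_messages)  # ['greater than 5']
print(if_messages)    # ['greater than 5', 'greater than 1']
```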

Try adjusting the code below to change the output.


In [None]:
x = 100
y = 2

if x > y:
    print('x is greater than y')

Again, try adjusting the code below to change the output.

In [None]:
x = 100
y = 200

if x == y:
    print('x is equal to y')
else:
    print('x is not equal to y')

Try changing the numbers to change the output.

In [None]:
x = 200
y = 33

if x < y:
    print("x is less than y")
elif x == y:
    print("x and y are equal")
else:
    print("x is greater than y")

### Loops

Python provides two primary types of loops: `for` loops and `while` loops.

A `for` loop is used to iterate over a sequence, such as a list, tuple, dictionary, set, or string... essentially any iterable object.

Using a `for` loop, you can execute a block of code once for each item in the sequence, making it a powerful tool for repetitive tasks.

```python
sequence = [1, 2, 3, 4]
for x in sequence:
    # do something
```

In [None]:
shopping_list = ['apples', 'cereal', 'beer', 'milk']  # Define a sequence of items

"""
As we iterate through the list, each item is evaluated individually.
After the evaluation of an item is complete we move to the next item
automatically.

In this case, the loop will assign each item in the list to the variable
`item`, so that we can use it in our code.

Here, we use a conditional to control how we process each item.
"""

for item in shopping_list:
    if item == 'beer' or item == 'milk':
        print('We need to buy:', item, '(Drink)')
    else:
        print('We need to buy:', item, '(Food)')
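
A `for` loop works the same way with other iterable types. The sketch below, which uses made-up data, loops over a string (one character at a time) and over a dictionary (one key at a time).

```python
# Looping over a string yields its characters
characters = []
for character in 'CCO':
    characters.append(character)
print(characters)  # ['C', 'C', 'O']

# Looping over a dictionary yields its keys
atom_masses = {'C': 12.011, 'O': 15.999}  # Approximate atomic masses
symbols = []
for symbol in atom_masses:
    symbols.append(symbol)
    print(symbol, atom_masses[symbol])
```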

With a `while` loop we can execute a set of statements as long as the stated condition evaluates to `True`.

In [None]:
i = 1  # We will start at 1

"""
Here we evaluate code until a condition is met.
In this case we are waiting until i > 6.

Remember we need to increment i. We use the assignment
operator here. If we forget to increment i, the loop
will run forever!

If this were to happen you could interrupt the kernel
by using ⌘/Ctrl+M+I (Colab) or pressing I twice (Jupyter)
to stop code execution.
"""

while i <= 6:
    print(i)
    i += 1

Check out the `while` loop [Cheat-Sheet](https://github.com/ehmatthes/pcc_2e/releases/download/v1.0.1/beginners_python_cheat_sheet_pcc_if_while.pdf).

----

## Reading files

In many Python workflows, especially in data science and cheminformatics, you will often need to load data from external files for further analysis or processing.

A clean and safe way to open files in Python is by using a `with` block:

```python
with open('path_to_file') as f:
    # do something with the file
```

* `with` ensures the file is properly closed after reading, even if an error occurs.
* `'path_to_file'` is the location of the file on your system.
* `as f` assigns the file object to the variable `f`, which you can then use to access its content.
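
As a self-contained sketch of the `with` pattern, the example below writes a small file into a temporary directory and reads it back, so it does not depend on any existing path. The file name and contents are made up, and opening a file in write mode (`'w'`) is not otherwise covered in this notebook.

```python
import os
import tempfile

# The temporary directory and its contents are removed when the block ends
with tempfile.TemporaryDirectory() as temp_dir:
    example_path = os.path.join(temp_dir, 'example.txt')  # Hypothetical file

    # Write two lines to the file
    with open(example_path, 'w') as f:
        f.write('first line\nsecond line\n')

    # Read the file back as a list of lines
    with open(example_path) as f:
        lines = f.readlines()

    file_closed = f.closed  # True: we are outside the inner `with` block

print(file_closed)
print(lines)
```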

Within the tutorial folder, we have included a subdirectory named `data` that contains a sample text file: `lorem_ipsum.txt`. Its relative path is:

```
data/lorem_ipsum.txt
```

However, file handling differs slightly between **JupyterLab** (running locally) and **Google Colab** (running in the cloud). To make this notebook work in both environments, the code below:

* Automatically detects whether the notebook is running in Colab
* Downloads the file from GitHub if necessary (in Colab)
* Then reads and prints the file contents


In [None]:
import os

# Check if running on Colab
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

# Set file path based on environment
if IN_COLAB:
    file_path = 'data/lorem_ipsum.txt'
else:
    file_path = '../data/lorem_ipsum.txt'

# If running in Colab, download the file
if IN_COLAB and not os.path.exists(file_path):
    os.makedirs('data', exist_ok=True)
    !wget -q https://raw.githubusercontent.com/MEDC0080/RDKitTutorial/main/data/lorem_ipsum.txt -O {file_path}

# Open the file and read its content
with open(file_path) as f:
    content = f.read()

# The file is now closed as we are outside the `with` block
print(f.closed)  # Should print `True` if the file is closed
print("\n" + "File content: " + content)  # The "\n" adds a new line

To check out more about reading files download this [Cheat-Sheet](https://github.com/ehmatthes/pcc_2e/releases/download/v1.0.1/beginners_python_cheat_sheet_pcc_files_exceptions.pdf).

----

## Discussion

That concludes the Python introduction.

Feel free to add more code cells and experiment with the concepts you have learnt.

You should now know enough to move on to the **RDKit-focused content** in [`02_rdkit_introduction.ipynb`](02_rdkit_introduction.ipynb). Use this notebook as a reference if you need to refresh any of the concepts covered here.

If you want to learn more, see the external resources linked at the beginning of this notebook. You can click [here](#Contents) to go back to the top.