## Notebook 1 - Getting Started with Python & Google Colab

[Python](https://en.wikipedia.org/wiki/Python_(programming_language)) is a general-purpose programming language. It exists in many versions because it is continually developed and improved over time.

The easiest way to start with almost any computing language is the canonical [Hello World](https://en.wikipedia.org/wiki/%22Hello,_World!%22_program) code.

In [1]:
print("Hello World!")

Hello World!


Libraries add extra functionality to Python - like a box of tools. We start by learning how to import and use one.

To see which version of Python we're using, let's import the "sys" library, which contains the system tools we need, and then ask it for the Python _version_. Version numbers have three parts (major, minor, and micro), the same kind of thing you will recognise from iOS or Android.

In [2]:
import sys
sys.version   # Note that any text in a code cell that starts with the # character (hash character) is a comment, and will not be executed.

'3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0]'
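
The three numbers are also available separately, which is easier than picking them out of the text above. A minimal sketch, assuming we only care about the first three parts; the 3.8 threshold is an arbitrary example value, not a requirement of this course:

```python
import sys

# sys.version_info behaves like a tuple: (major, minor, micro, releaselevel, serial)
major, minor, micro = sys.version_info[:3]
print("major:", major, "minor:", minor, "micro:", micro)

# Tuples compare element by element, so this asks "is this at least Python 3.8?"
# (3.8 is only an example threshold)
is_recent = sys.version_info >= (3, 8)
print("Python 3.8 or newer?", is_recent)
```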

You will find **+ Code** and **+ Text** buttons at the top-left of the notebook, directly under the View, Insert, and Runtime menus.

**+ Code** adds an executable code cell. **+ Text** adds a descriptive text box, useful for notes and explanations. I use both text cells and code comments to go a bit deeper into what the code is doing, and _why_.

Run code by clicking the ▶ "play" icon or hitting **Shift-Enter**.

In [3]:
# We can use the print function to print an informative diagnostic message, telling us which version of Python we are running.

print("We are running Python", sys.version_info.major,"point",sys.version_info.minor, "revision", sys.version_info.micro)

We are running Python 3 point 12 revision 12


In [4]:
# We can import only the "date" class from another library, "datetime", which gives us access to the current system date, to improve our diagnostic message

from datetime import date
print("[", date.today(), "] We are running Python ", sys.version_info.major, ".", sys.version_info.minor, ".", sys.version_info.micro, sep="")   # sep="" stops print adding spaces between items (without it we'd get "3 . 12 . 12" not "3.12.12")

[2025-12-10] We are running Python 3.12.12
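
Another way to build the same message is an f-string, which lets us write the values directly inside the text. This is a sketch of an alternative, not a replacement for the cell above; the names `v` and `message` are just illustrative:

```python
import sys
from datetime import date

# An f-string evaluates each expression inside {} and inserts the result into the text,
# so no sep="" is needed to control the spacing.
v = sys.version_info
message = f"[{date.today()}] We are running Python {v.major}.{v.minor}.{v.micro}"
print(message)
```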


A program that _only_ prints "Hello World!" is not _particularly_ useful. Let's modify the code to ask for your name and then greet you by name.

In [5]:
usr_name = input("What is your name? ")   # Ask the user for their name
print("Hello ", usr_name, "!", sep="")    # Print the greeting, formatted neatly

What is your name? 
Hello !


In [6]:
# What is this 'usr_name' token? usr_name is a variable, which we use to store the data entered by the user; you can check its data type with the following:

type(usr_name)

str

Of course, if we typed in "World" as our input, we would get the same output as in the very first example. Let's see if the user is trying to do that.

In [13]:
usr_name = input("What is your name? ")   # Ask the user for their name
if usr_name == "World":
    print("A great way to start coding is by writing a Hello", usr_name, "program!")  # Indented: what to do if the condition is true. If the user typed "World" → give a special message (unlikely a person has that name)
else:
    print("Hello ", usr_name, "!", sep="")                                            # Indented: what to do if the condition is false → give the normal greeting

What is your name? Iain
Hello Iain!


The **if** statement above is a _conditional_. It lets program execution branch depending on whether a condition is true or false, i.e. depending on the result of a decision.

Note that the blocks controlled by these flow-control statements must be indented with white space - either a tab or several spaces (usually 2 or 4). You can use either; just be consistent!

Conditionals will come up again later in machine learning models such as decision trees, which are built from chains of branching decisions.
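
To give a feel for that, here is a toy "decision tree" written by hand as nested conditionals. It is only a sketch: the variables, the 15-degree threshold, and the suggestions are all made up for illustration.

```python
temperature_c = 10    # made-up example values; try changing them
is_raining = True

if is_raining:                    # first decision
    if temperature_c < 15:        # second decision, only reached when it is raining
        suggestion = "raincoat"
    else:
        suggestion = "umbrella"
else:
    if temperature_c < 15:        # second decision, only reached when it is dry
        suggestion = "jumper"
    else:
        suggestion = "t-shirt"

print("Suggested:", suggestion)
```

Notice how the indentation shows which decision each line belongs to.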

Let me show you some other ways to control program flow. We can have a loop:

In [8]:
for i in range(1,6):                     # The range is specified from the start (which is included) to the *terminating* condition (which is not), i.e. start=1, stop=6 (not included, so the last we see is 5), step=+1
  print(i)
print("Once I caught a fish alive\n")    # "\n" is the NEWLINE symbol; you will see it inserts a blank line

for i in range(6,11):                    # i.e. start=6, stop=11 (not included, so the last we see is 10), step=+1
  print(i)
print("Then I let it go again")


1
2
3
4
5
Once I caught a fish alive

6
7
8
9
10
Then I let it go again
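
If you want to check what a range contains without writing a loop, you can collect its numbers with list(). A small sketch; the variable names are only for illustration:

```python
# list() gathers every number the range would produce, so we can see them all at once.
first_verse = list(range(1, 6))       # start=1, stop=6 (not included)
second_verse = list(range(6, 11))     # start=6, stop=11 (not included)
every_other = list(range(0, 10, 2))   # a third number sets the step

print(first_verse)
print(second_verse)
print(every_other)
```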


In [9]:
for i in range(10,0,-1):                                   # looping down from 10 to 1 by increments of -1 i.e. range(start=10, stop=0 (not included), step=-1)
  print(i)
print("Blast-off! The rocket has cleared the tower.")      # (Fun fact: Python is NOT used for real rocket launch software! But it's great for getting up to speed on loops)

10
9
8
7
6
5
4
3
2
1
Blast-off! The rocket has cleared the tower.


**Try it yourself:**
1. Modify the countdown to start at 5 instead of 10.
2. Print "Ignition!" when i == 3.

In [10]:
# EMPTY CELL FOR YOUR WORK

Finally, let's just have a look at what type of variable the index variable i is ...

In [11]:
type(i)

int

Note that it's an int, not a str. Hold that thought for a moment, though: it's important and we will definitely return to it.
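
As a quick preview of why the difference matters, here is a sketch showing that the same + sign does different things to text and to numbers. The variable names are made up for this example:

```python
text_five = "5"      # a str: the character 5
number_five = 5      # an int: the number 5

joined = text_five + text_five         # + joins two strings together
added = number_five + number_five      # + adds two numbers
converted = int(text_five) + number_five   # int() turns text made of digits into a number

print(joined, type(joined))
print(added, type(added))
print(converted, type(converted))
```

This matters because input() always gives you a str, even when the user types digits.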

These loops are over a known number of iterations. What if we don't know in advance when we should exit out of a loop?

The while construct allows us to handle these scenarios. Run the code below and give it your name, a friend's name, and a few other names. Finally, please enter the name [Godot](https://en.wikipedia.org/wiki/Waiting_for_Godot).

In [12]:
usr_name = None   # None is a placeholder meaning "no value yet"; the variable exists, but holds nothing useful.
                  # Try checking the type() of usr_name just after setting it to None
while usr_name != "Godot":
  usr_name = input("What is your name? ")   # Ask the user for their name
  if usr_name != "Godot":
    print("Hello ", usr_name, ". We're waiting for Godot.", sep="")
print("Hello ", usr_name, ". It must surely be tomorrow by now!", sep="")

What is your name? Iain
Hello Iain. We're waiting for Godot.
What is your name? Godot
Hello Godot. It must surely be tomorrow by now!
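
The same pattern can be tried without typing anything. In this sketch a made-up list of names stands in for the user's answers, and the loop still has no idea in advance when Godot will turn up:

```python
# A made-up list standing in for what a user might type, one entry per prompt.
arrivals = ["Vladimir", "Estragon", "Pozzo", "Godot", "Lucky"]

position = 0       # which entry we read next
visitor = None     # nobody has arrived yet
while visitor != "Godot":
    visitor = arrivals[position]   # read the next name
    position += 1                  # move on to the following entry
    if visitor != "Godot":
        print("Hello ", visitor, ". We're waiting for Godot.", sep="")

print("Godot arrived after", position - 1, "other visitors.")
```

The loop stops as soon as the condition becomes false, so the last name in the list is never read.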


A small note on strings. Remember: text values in Python have type str, short for string.

Fun fact: they're called this because they are a sequence of characters threaded together just like beads on a string.

"ABCDE" is just like -Ⓐ-Ⓑ-Ⓒ-Ⓓ-Ⓔ- when you visualise it like that.

You can use either single quotes 'hello' or double quotes "hello" to write string literals (strings that are constants). Why might you need to change between them? To answer that, look at - and run - this code.

In [15]:
usr_name = input("What is your name? ")   # Ask the user for their name
print("Hello ", usr_name, "!", sep="")    # Print the greeting, formatted neatly
print('I just spoke with a person called "', usr_name, '" and I read ', usr_name, "'s text!", sep="")   # acknowledge that the machine read the user's text

What is your name? Iain
Hello Iain!
I just spoke with a person called "Iain" and I read Iain's text!


Sometimes you might need a string literal that contains an apostrophe (single quote) or a double-quote character. To write such a string, enclose it in the _other_ type of quote.
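
A short sketch of both cases, plus a third option (a backslash in front of the quote) that the cell above does not use. The variable names are only for illustration:

```python
apostrophe = "It's a string"       # double quotes outside, apostrophe inside
quotation = 'She said "hello"'     # single quotes outside, double quotes inside
escaped = 'It\'s a string'         # a backslash lets a quote sit inside matching quotes

print(apostrophe)
print(quotation)
print(apostrophe == escaped)       # both spell exactly the same characters
```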

Indulge me now with a little Shakespearean reference, to introduce Boolean algebra:

In [16]:
to_be = True                     # try changing this to False

print(to_be or not to_be  # Aye,
      == True)
# and that is the rub.


True


**Try it yourself:** Copy the code above into the cell below, changing to_be to False. What happens to the output? Why?

In [17]:
# EMPTY CELL FOR YOUR WORK

This code illustrates a fundamental rule of Boolean logic: a Boolean value is either True or False. Python never allows a Boolean to be “somewhere in between.”
This is known as the [Law of Excluded Middle](https://en.wikipedia.org/wiki/Law_of_excluded_middle).
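
We can check the rule for both possible values at once. This sketch also spells out how Python groups the Shakespearean expression; the names `original`, `explicit`, and `results` are made up for the example:

```python
results = []
for to_be in (False, True):
    # == is evaluated before not, and not before or,
    # so Python reads the first line as the second one.
    original = to_be or not to_be == True
    explicit = to_be or (not (to_be == True))
    results.append((to_be, original, explicit))
    print("to_be =", to_be, "->", original, explicit)
```

Whichever value to_be has, one side of the or is True, so the whole expression is True.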

The OR operator

In [18]:
print("X","Y", "X∨Y\n-------")
for left_condition in range(False, True+1):
  for right_condition in range(False, True+1):
    print(str(bool(left_condition))[0], str(bool(right_condition))[0],'' ,str(bool(left_condition or right_condition))[0])


X Y X∨Y
-------
F F  F
F T  T
T F  T
T T  T


The AND operator

In [19]:
print("X","Y", "X∧Y\n-------")
for left_condition in range(False, True+1):
  for right_condition in range(False, True+1):
    print(str(bool(left_condition))[0], str(bool(right_condition))[0],'' ,str(bool(left_condition and right_condition))[0])


X Y X∧Y
-------
F F  F
F T  F
T F  F
T T  T


The NOT operator

In [20]:
print("X", "¬X\n----")
for condition in range(False, True+1):
  print(str(bool(condition))[0], '', str(not condition)[0])

X ¬X
----
F  T
T  F
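
The three cells above build their loops with range(False, True+1), which works because Python treats False as 0 and True as 1. A sketch of a more direct alternative is to loop over the two Boolean values themselves and collect all three operators in one table; `values` and `rows` are illustrative names:

```python
values = (False, True)

rows = []
print("X Y X∨Y X∧Y ¬X")
for x in values:
    for y in values:
        row = (x, y, x or y, x and y, not x)
        rows.append(row)
        # Print only the first letter (T or F) of each value in the row
        print(*(str(value)[0] for value in row))
```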


So our Shakespearean example is just saying that if X is the variable 'to_be' and Y is 'not to_be', then Y is always the opposite of X, so _only_ the following two rows in the OR truth table can apply
```
X Y X∨Y
-------
F T  T
T F  T
```
Either way, the result _must_ evaluate to True.

Congratulations! You have now mastered Boolean algebra and formal logic.

**However**, haven't you wondered - *what machine* is actually running this code?

It's not your laptop. It's running in the ☁ (cloud).

Let me show you.

In [51]:
import sys, platform, os, psutil, shutil

process = psutil.Process(os.getpid())

print("=== ENVIRONMENT SUMMARY ===")
print("Python:", sys.version.split()[0])
print("Executable:", sys.executable)
print("OS:", platform.system(), platform.release())
print("Architecture:", platform.machine())

print("\nCPU cores:", os.cpu_count())
print("CPU usage: ", psutil.cpu_percent(interval=1), "%", sep="")

# Process CPU % is relative to one core; values >100% mean multiple cores in use
print("Process CPU: ", process.cpu_percent(interval=1), "%", sep="")

vm = psutil.virtual_memory()
print("RAM total: ", round(vm.total / 1e9, 1),"GB", sep="")
mem = process.memory_info()
print("Process RAM: ", round(mem.rss / 1e9, 1),"GB", sep="")

total, used, free = shutil.disk_usage(".")
print("Disk usage (current filesystem): ",
      round(total / 1e9, 1), "GB total, ",
      round(used / 1e9, 1), "GB used, ",
      round(free / 1e9, 1), "GB free", sep="")

=== ENVIRONMENT SUMMARY ===
Python: 3.12.12
Executable: /usr/bin/python3
OS: Linux 6.6.105+
Architecture: x86_64

CPU cores: 2
CPU usage: 3.5%
Process CPU: 0.0%
RAM total: 13.6GB
Process RAM: 0.2GB
Disk usage (current filesystem): 115.7GB total, 41.1GB used, 74.6GB free


That's not your laptop, is it?
It's the virtual machine, running in the cloud, at a Google data center. Let's see if we can find some more information about it.

In [52]:
import requests

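def get_metadata(path):
    # Ask the Google Cloud metadata server about this VM; return None if it cannot be reached
    url = f"http://metadata.google.internal/computeMetadata/v1/{path}"
    headers = {"Metadata-Flavor": "Google"}   # the metadata server requires this header
    try:
        return requests.get(url, headers=headers, timeout=2).text
    except requests.RequestException:          # e.g. no connection, or no answer within 2 seconds
        return None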

vm_name = get_metadata("instance/name")
zone = get_metadata("instance/zone")
project_id = get_metadata("project/project-id")
machine_type = get_metadata("instance/machine-type")

print("VM name:", vm_name)
print("Zone:", zone)
print("Region:", zone.rsplit("/", 1)[-1].rsplit("-", 1)[0] if zone else None)
print("Machine type:", machine_type)
print("Project ID:", project_id)

VM name: 
Zone: 
Region: None
Machine type: 
Project ID: 


We can't. Why? ***Cloud opacity***.

In shared cloud environments, you may not be able to discover **exactly where** your code is running — and that’s intentional. Professional systems are designed this way to balance security, privacy, and flexibility.

This abstraction means you rarely need to worry about where code runs — only that it runs correctly and reproducibly.

That allows you to focus on writing and reasoning about code rather than managing machines.

Very possibly, that will be code that makes data analysis and machine learning work at scale.
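
Had the metadata server answered, the Region line in the cell above would have worked out the region from the zone. Here is a minimal sketch of that step, assuming the zone comes back as a path-like string ending in the zone name; the project number and zone below are invented example values:

```python
# Hypothetical example value: this project number and zone are invented for illustration.
sample_zone = "projects/123456789012/zones/us-central1-a"

zone_name = sample_zone.rsplit("/", 1)[-1]   # keep the text after the last "/"
region = zone_name.rsplit("-", 1)[0]         # keep the text before the last "-"

print("Zone:", zone_name)
print("Region:", region)
```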

So we know some features of the virtual machine (VM) running our code.

*What about the laptop you are using right now?*

You now recognise that they are not, and cannot be, the same machine. (You can certainly run Python on your own machine, but that is out of scope for this course.)

Try this code cell. Does its output look more like your own machine?

In [54]:
from google.colab import output

output.eval_js("""
({
  userAgent: navigator.userAgent,
  platform: navigator.platform,
  cores: navigator.hardwareConcurrency,
  memoryGB: navigator.deviceMemory
})
""")

{'userAgent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36',
 'platform': 'Win32',
 'cores': 8,
 'memoryGB': 8}

Some of the fields might be accurate with regard to your local machine, some might not.

The reasons for this are beyond the scope of this course.

But look how far you have come, from coding and running the canonical Hello World app!

Please continue to Workbook 2.ipynb