# Thinking about programming

Research computing and data science are an integral part of research. Whether it is putting together a short script, managing big data or developing a full-blown software package, programming cannot be avoided. As the title suggests, this notebook contains short musings about what it is that we do when we program.

Before the fun starts, let's sit back and consider the big picture.

## What you have learned before this summer school

Ideally, you have completed the Coursera course "Introduction to Python for Researchers" by Chris Cooling. We'll build on this content. 

This course has been carefully designed to give you the skills that most researchers need. For example, it uses the VS Code environment, which is a good choice for developing research projects, and it introduces good software engineering practices, such as testing, from the start.

By now, you should be familiar with most variable types and their scope, and have an idea of how different types are stored in computer memory. You know how to test conditions with Boolean operators and how to write an if statement. You have been introduced to types that can hold multiple values at the same time, such as strings and lists. You know how to iterate through these collections and have practised manipulating them.
Early on, you also learned to write your own functions, which make your code modular and reusable.

The exercises (especially the final one) allowed you to practise these concepts and, importantly, made you think about how to design and structure a solution to a somewhat more complicated problem. You may have noticed that this is almost a separate skill. Once you are relatively comfortable with Python syntax, you will find that clean and efficient code design gradually becomes your focus. With that in mind, it is worth pausing to think about programming in general.



***

## Thinking computationally

So, before we dive into Python syntax, let's think about what it means to program. 

Everyone has a different problem that they would like to address. Regardless of the problem, you need to decompose it into smaller, manageable steps (similar to giving someone directions). When you design the steps, keep in mind how they map onto the "tools" that a programming language offers. When you face a large programming project, "modularity" comes into play. Broadly speaking, modularity is about deciding how to divide your solution into meaningful parts (or modules) so that the code can be reused and built on.

You will have the opportunity to tackle various problems this week, and I encourage you to view this course as a programmatic or computational thinking course that happens to be taught in Python, rather than just a Python course.

***

## Building blocks of programming languages

### Terms

A line of code typically holds one **statement**: a unit of code that has an effect, for example, calculating a value and assigning it to a variable, ```a = b + 2```. Statements contain **expressions**, pieces of code that evaluate to a value, for example, the addition of two numbers, ```b + 2```. Expressions can contain values (```2```), variables (```b```) and **operators** (```+```). Which operators are allowed depends on the type of the value or variable. For example, numeric operators include addition.
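Python's standard ```ast``` module can make the statement/expression distinction visible: it parses source code into a tree of statement and expression nodes without running it. The sketch below parses the statement used above, then shows that the same operator behaves differently, or is not allowed at all, depending on the type of its operands.

```python
import ast

# Parse the statement "a = b + 2" into a syntax tree (nothing is executed).
tree = ast.parse("a = b + 2")
statement = tree.body[0]      # the single statement in the source
expression = statement.value  # the expression on the right-hand side

print(type(statement).__name__)      # Assign: an assignment statement
print(type(expression).__name__)     # BinOp: an expression with a binary operator
print(type(expression.op).__name__)  # Add: the + operator

# The meaning of an operator depends on the types of its operands.
print(2 + 2)          # numbers: addition
print("ab" + "cd")    # strings: concatenation
try:
    "ab" - "cd"       # strings do not support subtraction
except TypeError as err:
    print("TypeError:", err)
```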

### Programming languages ecosystem

It is safe to say that most programming languages are designed using the same conceptual building blocks. Of course, syntax and details may vary, but the basic idea stays the same. This is good news! Once you have learned to program with one language, the others will become easier. Your focus will shift to the particular advantages and packages that each language offers instead.

Each language has advantages and disadvantages. Each has its own traditional community and tooling. For example, at Imperial, you'll find R users predominantly in Biological Sciences. However, it may not be well known that there is also an R community in the Department of Mathematics and the Business School. This is not that surprising once you realise that R is, at its core, a statistical language.

Let's take stock of the basic building blocks (or grammar, if you will):

* variables
* functions
* conditionals
* loops
* comments
* memory management


***

**Variables**. Imagine you wrote a program that converts degrees Celsius to Fahrenheit. If the program were able to operate only on a single value, it would not be practical at all. Your program becomes significantly more useful by introducing a variable that can take on different values in your computation. The program can now convert any value you need!

```python
# This statement converts only one value
print("100 degrees Celsius is", (100 * 9/5) + 32, "Fahrenheit")
```

```
100 degrees Celsius is 212.0 Fahrenheit
```


```python
# A variable lets us process any value
celsius = 100
print(celsius, "degrees Celsius is", (celsius * 9/5) + 32, "Fahrenheit")

celsius = 10
print(celsius, "degrees Celsius is", (celsius * 9/5) + 32, "Fahrenheit")
```

```
100 degrees Celsius is 212.0 Fahrenheit
10 degrees Celsius is 50.0 Fahrenheit
```


**Functions** let you run multiple statements (or lines of code) under one name. Most of us know what ```print("Hello world!")``` means. By calling the ```print``` function, you invoke a sequence of statements that results in the obligatory "Hello world!" on your screen. Functions help to keep your code concise and reusable. Identifying sequences of statements that are often used together and turning them into a function almost always improves your code. Python and Python modules offer a wide range of functions, and you can also define your own.

```python
def celsius_to_fahrenheit(celsius):
    return (celsius * 9/5) + 32

print(celsius_to_fahrenheit(100))
print(celsius_to_fahrenheit(10))
```

```
212.0
50.0
```
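To see the payoff of turning statements that are often used together into a function, here is a sketch that bundles the conversion and the ```print``` call repeated in the variable example into one helper. The name ```report``` is hypothetical, chosen for illustration.

```python
def celsius_to_fahrenheit(celsius):
    return (celsius * 9/5) + 32

def report(celsius):
    # Hypothetical helper: wraps the conversion and the print statement
    # that always appeared together, so both run with a single call.
    print(celsius, "degrees Celsius is", celsius_to_fahrenheit(celsius), "Fahrenheit")

report(100)  # 100 degrees Celsius is 212.0 Fahrenheit
report(10)   # 10 degrees Celsius is 50.0 Fahrenheit
```

If the conversion formula ever needed to change, it would now change in one place only.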


**Conditionals**. If your code contains no conditionals or loops, it will always run line by line from top to bottom. Often, however, one needs to make decisions "on the fly"; for example, depending on the result of a calculation inside the program.

Let's think back to converting Celsius to Fahrenheit values. Imagine that you would like the program to tell the user whether the value is at or above the boiling point of water. Conditionals enable you to do just this: different lines of code are executed for different values.



```python
# A function that checks whether the Fahrenheit value is at or above the boiling point of water
def celsius_to_fahrenheit_boiling_point(celsius):
    fahrenheit = (celsius * 9/5) + 32
    # Water boils at 212 °F (at standard atmospheric pressure)
    if fahrenheit >= 212:
        message = f"Fahrenheit value {fahrenheit} is at or above the boiling point."
    else:
        message = f"Fahrenheit value {fahrenheit} is below the boiling point."
    return message

print(celsius_to_fahrenheit_boiling_point(100))
print(celsius_to_fahrenheit_boiling_point(10))
```

```
Fahrenheit value 212.0 is at or above the boiling point.
Fahrenheit value 50.0 is below the boiling point.
```
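When there are more than two cases, ```elif``` branches let you chain conditions: Python checks them from top to bottom and runs only the first branch whose condition is true. The sketch below assumes water at standard atmospheric pressure and, for simplicity, ignores the fact that two phases can coexist at exactly 0 and 100 degrees Celsius; the function name ```water_state``` is made up for the example.

```python
def water_state(celsius):
    # Simplified assumption: standard atmospheric pressure, with exactly
    # 0 °C counted as ice and exactly 100 °C counted as steam.
    if celsius <= 0:
        return "solid (ice)"
    elif celsius < 100:
        return "liquid"
    else:
        return "gas (steam)"

print(water_state(-5))   # solid (ice)
print(water_state(25))   # liquid
print(water_state(100))  # gas (steam)
```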


A **loop** repeats the statements in its "body". Different types of loops suit different tasks. ```for``` loops are good for repeating a sequence of statements a predetermined number of times; for example, once for each item in a list. ```while``` loops are good for repeating code as long as a certain condition holds true.

```python
# A function that converts a list of Celsius values to Fahrenheit
def celsius_to_fahrenheit_list(celsius_values):
    fahrenheit_values = []
    for c in celsius_values:
        fahrenheit_values.append((c * 9/5) + 32)
    return fahrenheit_values

# Example usage:
celsius_values = [0, 10, 25, 100]
fahrenheit_values = celsius_to_fahrenheit_list(celsius_values)
print(fahrenheit_values)
```

```
[32.0, 50.0, 77.0, 212.0]
```
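A ```while``` loop fits when you do not know in advance how many repetitions are needed. This sketch keeps increasing a Celsius value one degree at a time until its Fahrenheit equivalent exceeds 100 °F; the threshold is arbitrary and chosen only for illustration.

```python
# Find the smallest whole number of degrees Celsius whose Fahrenheit
# equivalent is above 100 °F (an arbitrary threshold for illustration).
celsius = 0
while (celsius * 9/5) + 32 <= 100:
    celsius += 1  # the condition is re-checked before every repetition

print(celsius, "degrees Celsius is", (celsius * 9/5) + 32, "Fahrenheit")
```

The loop stops at 38 °C (100.4 °F). A ```for``` loop would need the number of steps up front, which is exactly what we are trying to find.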


**Comments** usually do not appear in lists of basic building blocks, but they definitely should. Comments are text in your code that is not executed. They are there to help you, or whoever uses the code next, understand it. A good practice is to explain why a line of code is there rather than what it does. In Python, a function can also be documented with a **docstring**: a string placed as the first statement of the function body that describes what the function does and how to use it.

```python
# Refactored function with docstring
def celsius_to_fahrenheit_list(celsius_values):
    """
    Convert a list of Celsius temperatures to Fahrenheit.
    Parameters:
        celsius_values (list of float or int): List of temperatures in Celsius.
    Returns:
        list of float: Corresponding temperatures in Fahrenheit.
    """
    fahrenheit_values = []
    for c in celsius_values:
        fahrenheit_values.append((c * 9/5) + 32)
    return fahrenheit_values
```
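To make the "why, not what" advice concrete, compare the two comments in this sketch. The thermometer accuracy used as the rationale is hypothetical, invented for the example. The last line shows one practical difference between comments and docstrings: Python discards comments, but keeps a function's docstring in its ```__doc__``` attribute, which is also what ```help()``` displays.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from Celsius to Fahrenheit."""
    return (celsius * 9/5) + 32

reading = 36.6  # degrees Celsius

# Bad ("what"): round to one decimal place.
# Good ("why"): the thermometer is only accurate to 0.1 degrees (hypothetical),
# so more decimal places would suggest precision we do not have.
reported = round(celsius_to_fahrenheit(reading), 1)
print(reported)  # 97.9

# Docstrings are kept at runtime; comments are not.
print(celsius_to_fahrenheit.__doc__)
```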

**Memory management**. This point is also my addition. Your programming journey will benefit from having a mental model of how memory is managed while a program runs. When a program runs, each variable has to be stored in memory (RAM). The details vary between languages, and it is a good idea to learn how the language you use handles them. For example, knowing how a matrix is laid out in memory will enable you to speed up your code by accessing its elements in the right order. Efficient memory use is at the heart of many optimisation techniques. You may think you are a long way from having to do something like this... The reality is that we are dealing with an explosion of available data, and cutting-edge science often runs into scalability issues.
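Here is a minimal sketch of what "the right order" means, in plain Python. It assumes row-major storage (also called C order), where a matrix is stored as one flat block with each row placed right after the previous one; this is the default in C and NumPy, whereas Fortran, MATLAB and R use column-major order. A plain Python list stores references to separate number objects rather than the numbers themselves, so the sketch only illustrates the access pattern, not the speed-up, which appears with contiguous arrays.

```python
# A 3 x 4 matrix stored row by row in one flat block (row-major order):
# element (row, col) lives at position row * n_cols + col.
n_rows, n_cols = 3, 4

def position(row, col):
    return row * n_cols + col

# Row by row: consecutive positions, i.e. neighbouring memory locations.
row_by_row = [position(r, c) for r in range(n_rows) for c in range(n_cols)]

# Column by column: every step jumps n_cols positions ahead.
col_by_col = [position(r, c) for c in range(n_cols) for r in range(n_rows)]

print(row_by_row)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(col_by_col)  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```

Processors fetch memory in chunks and cache them, so visiting neighbouring locations, as in the first traversal, usually makes better use of the cache for row-major data.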

```python
# Example: Check if two objects are the same in memory using 'is'

a = [1, 2, 3]
b = a
c = [1, 2, 3]

print(a is b)  # True, because b refers to the same object as a
print(a is c)  # False, because c is a different object with the same contents as a
```

```
True
False
```
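The difference between ```a is b``` and ```a is c``` matters as soon as you change an object. Because ```b``` is just a second name for the list that ```a``` refers to, a change made through ```b``` is visible through ```a```, whereas an independent copy is unaffected. A short sketch:

```python
a = [1, 2, 3]
b = a           # b refers to the same list object as a
c = a.copy()    # c refers to a new list with the same contents

print(a == c, a is c)  # True False: equal contents, different objects

b.append(4)     # modify the list through the name b
print(a)        # [1, 2, 3, 4]: a sees the change, since it is the same object
print(c)        # [1, 2, 3]: the copy is unaffected
```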


## Debugging

Debugging refers to finding and fixing errors, or "bugs". The moment you start programming, you will encounter errors. This is completely normal and should not discourage you from embracing programming. The idea is to accept that errors are inevitable and to adopt a level-headed attitude; emotions are not likely to help here.

In principle, one can distinguish three types of errors:

### Syntax errors 
The code does not follow the syntax rules of the language. These errors plague us all, but beginners tend to be more prone to them. If a program contains a syntax error, it refuses to run and produces an error message. It is a very good idea to read these messages closely and get used to decoding them.
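You can trigger a syntax error without stopping your session by asking Python to compile source code stored in a string, using the built-in ```compile``` function. The exact wording of the message depends on your Python version, so no output is shown here.

```python
# Source code with a missing closing parenthesis.
source = 'print("Hello world!"'

error = None
try:
    # compile() checks the syntax without running the code.
    compile(source, "<example>", "exec")
except SyntaxError as err:
    error = err  # keep a reference; 'err' is deleted after the except block
    # err.msg describes the problem; err.lineno says where it was found.
    print("Syntax error on line", err.lineno, ":", err.msg)
```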

### Runtime errors
The program is syntactically correct and starts running. If, however, it reaches an instruction that cannot be completed, it stops with an error message. For example, the program tries to use a variable that has not been assigned a value.
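Both examples in this sketch are syntactically valid, so Python starts running them, and each fails only when it reaches the problematic instruction. The first uses a variable that was never assigned (the name ```undefined_temperature``` is hypothetical); the second divides by zero when a function meant to average readings receives an empty list. The ```try```/```except``` blocks catch the errors so that the sketch runs to the end.

```python
errors = []  # records which errors occurred

# 1. Using a variable that has not been assigned a value raises NameError.
try:
    print(undefined_temperature)
except NameError as err:
    errors.append(type(err).__name__)
    print("NameError:", err)

# 2. Dividing by zero raises ZeroDivisionError.
def average(values):
    return sum(values) / len(values)

try:
    average([])  # an empty list, so len(values) is 0
except ZeroDivisionError as err:
    errors.append(type(err).__name__)
    print("ZeroDivisionError:", err)
```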

### Semantic errors
These are the worst! The program runs and produces a result, BUT the code contains a logical flaw (a flaw in the "meaning" of the code), so the result is wrong. This brings us to testing. Good programming practice involves writing tests for each meaningful piece of code. These tests "exercise" the code on typical values and extreme cases to uncover logical flaws.
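Here is a sketch of how a test can expose a semantic error. The buggy function below runs without complaint but inverts the conversion factor. A check on a single value, 0 °C, would pass for both versions; checking several known reference points catches the flaw. The helper name ```check_known_values``` is made up for this example; in practice, test frameworks such as pytest automate running checks like these.

```python
def celsius_to_fahrenheit(celsius):
    return (celsius * 9/5) + 32

def celsius_to_fahrenheit_buggy(celsius):
    # Semantic error: 5/9 instead of 9/5. The code runs, but the result is wrong.
    return (celsius * 5/9) + 32

def check_known_values(convert):
    # Reference points: water freezes at 0 °C (32 °F) and boils at 100 °C (212 °F);
    # -40 is the same temperature on both scales.
    assert convert(0) == 32
    assert convert(100) == 212
    assert convert(-40) == -40

check_known_values(celsius_to_fahrenheit)  # passes silently

try:
    check_known_values(celsius_to_fahrenheit_buggy)
except AssertionError:
    print("Test failed: the buggy version gives", celsius_to_fahrenheit_buggy(100), "for 100 °C")
```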


***

## Food for thought - natural and formal languages

Have you ever compared programming languages with spoken languages? The following paraphrases Allen B. Downey.

Programming languages are languages. So, how do they compare to spoken languages? They are formal languages. Other examples of formal languages include mathematical notation and chemical formulas. Formal languages are unambiguous and highly literal (each line means exactly what it says).

Natural languages (spoken languages) are the opposite: they are highly ambiguous and often not literal at all. For example, when we say that someone has "bats in the belfry", we mean that they are acting eccentrically, not that they keep bats. Natural languages are also highly redundant: there are many ways to express one idea. High redundancy makes up for high ambiguity.

Programming languages achieve their literalness through strict syntax and well-defined rules of interpretation. Each line of code must follow the language syntax and has a single interpretation. Remember, the computer does exactly what you tell it!
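A small illustration of this literalness, assuming a hypothetical program that accepts temperatures only in Celsius ("C") or Fahrenheit ("F"): a condition phrased the way we would say it in English means something else to Python.

```python
unit = "K"  # kelvin, which our hypothetical program does not support

# An English speaker might write "if the unit is C or F" like this.
# Python reads it literally as (unit == "C") or "F", and a non-empty
# string such as "F" counts as true, so the check always passes.
naive_check = unit == "C" or "F"
print(bool(naive_check))   # True, even though unit is "K"

# The intended meaning has to be spelled out completely.
correct_check = unit == "C" or unit == "F"
print(correct_check)       # False
```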


***

## Future directions 

Beyond programming, additional knowledge and skills may be required to complete the computational aspects of your projects. 

We recommend that you review [RCDS courses](https://www.imperial.ac.uk/early-career-researcher-institute/learning-and-development/courses-by-programme/research-computing-and-data-science/) for topics that may be useful. 

Below, I list a few resources that should be on your radar:

* Working with data and software at Imperial - check out this [page](https://www.imperial.ac.uk/early-career-researcher-institute/learning-and-development/courses-by-programme/research-computing-and-data-science/software-data/) for a comprehensive overview.

* Many research problems require long computations or big data processing. If you need to scale up, check out the [Research Computing Service](https://www.imperial.ac.uk/admin-services/ict/self-service/research-support/rcs/), which looks after Imperial's high performance computing cluster. Remember that you will need some knowledge of the Linux command line to take advantage of it.

* Are you ready to learn from realistic exemplars? Check out our repository of real projects developed by doctoral students for their postgraduate colleagues - [ReCoDE](https://imperialcollegelondon.github.io/ReCoDE-home/).


### Acknowledgement:
Many of the ideas in this notebook were derived from [Think Python by Allen B. Downey](https://allendowney.github.io/ThinkPython/).