<img src="./intro_images/MIE.PNG" alt="notebook banner image" width="100%" align="left" />

<table style="float:right;">
    <tr>
        <td>                      
            <div style="text-align: right"><a href="https://alandavies.netlify.com" target="_blank">Dr Alan Davies</a></div>
            <div style="text-align: right">Senior Lecturer Health Data Science</div>
            <div style="text-align: right">University of Manchester</div>
         </td>
         <td>
             <img src="./intro_images/alan.PNG" alt="Alan Davies Photo" width="30%" />
         </td>
     </tr>
</table>

# 2.0 Variables and strings
****

#### About this Notebook
This notebook introduces the concept of <code>variables</code> that can be used to store data that vary in the programs that we write.

<div class="alert alert-block alert-warning"><b>Learning Objectives:</b> 
<br/> At the end of this notebook you will be able to:
    
- Explore the different types of variable available in Python 

- Practice storing and accessing data in variables of different types
    
- Explore the storage, access and manipulation of textual data

</div> 

<a id="top"></a>

<b>Table of contents</b><br>

2.1 [Variable names](#varnames)

2.2 [Working with strings](#workingstrings)

2.3 [Changing a variable's type](#changingtype)

Most programs receive data from some input, which the program then manipulates to produce a result or output. This input can be very diverse. For example, it may be a keypress in a video game to move a character on the screen, a list of payroll numbers or some data from a Martian probe. In all cases we need to store this data somewhere. In computer programs we use <code>variables</code> to store data. The name variable implies that the thing being stored may vary. Let's look at some examples.

In [1]:
x = 1
weight_kg = 10.56
my_name = "David Smith"
price = 6
quit = True

In the example above we have 5 variables called <code>x</code>, <code>weight_kg</code>, <code>my_name</code>, <code>price</code> and <code>quit</code>. There are 4 main types of data:<br>
<ul>
<li><b>Integers</b> - whole number values</li>
<li><b>Floating point</b> numbers - numbers with a point (period/dot) in them e.g. 10.56</li>
<li><b>Strings</b> - contain text</li>
<li><b>Boolean</b> - True or False values</li>
</ul>
Python is able to work out which <code>type</code> a variable is based on the value stored within. For example, it knows <code>price</code> is an <code>integer</code> because it contains an integer (whole number) value <code>6</code>.

<div class="alert alert-success">
<b>Note:</b> We use the term <code>floating point</code> because the position of the point is not fixed: it can "float" to wherever it is needed among the digits. The point also doesn't necessarily represent a decimal point (base 10). It could be binary (base 2), octal (base 8) or hexadecimal (base 16) to name a few.
</div>

The equals operator <code>=</code> is used for variable assignment. It stores the value on the right of the equals sign under the label on the left. For example, <code>x = 1</code> means put <code>1</code> into a label called <code>x</code>. We can see what is inside a variable (what value it contains) by using the print function and passing in the variable name. We will talk more about functions and passing variables to them later. For now we will just use <code>print()</code> to display values.

In [1]:
weight_kg = 10.56
print(weight_kg)

10.56
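
Because a variable is only a label for a value, assigning to the same name again replaces what it holds. The following is a minimal sketch of that idea; the names <code>heart_rate</code> and <code>resting_rate</code> and their values are made up for illustration.

```python
# heart_rate and resting_rate are hypothetical names used only for this illustration
heart_rate = 72             # store the integer 72 under the label heart_rate
print(heart_rate)

heart_rate = 88             # assigning again replaces the stored value
print(heart_rate)

resting_rate = heart_rate   # copy the current value (88) into a second label
heart_rate = 95             # changing heart_rate does not change resting_rate
print(resting_rate, heart_rate)
```

The last line should show that <code>resting_rate</code> kept the value it was given, even though <code>heart_rate</code> changed afterwards.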


Note that if we put a value between quotes, e.g. <code>"Hello world"</code>, it is a <code>string</code> (text value), in this case a <code>string literal</code>. If we don't use quotation marks, Python treats the word as a variable name and outputs the value contained in that variable instead.

Another way of looking at a variable is as a box in which you store some data, with a meaningful label that helps you organise and find that data, e.g. <code>weight_kg = 10.56</code>

<img src="./intro_images/box.PNG" width="300" />

<div class=accessibility>
<b>Accessibility:</b> An image of a box that weighs 10.56 kg
</div>

We can use the label when we want to retrieve that value later for some computation or other processing. We can also see what type a variable is by using the <code>type</code> function. For <code>weight_kg</code> the type is shown as <code>float</code> because the value has a point in it.

In [2]:
weight_kg = 10.56
type(weight_kg)

float

<div class="alert alert-block alert-info">
<b>Task 1:</b>
<br> 
1. What type do you think the variables above (<code>price</code> and <code>my_name</code>) are?<br> 
2. Use the <code>type()</code> function to check in the cells below.
</div>

In [4]:
type(price)

int

In [5]:
type(my_name)

str

<a id="varnames"></a>
#### 2.1 Variable names

In the maths domain variables tend to be labelled with a single letter such as <code>i</code>, <code>x</code> and <code>j</code>. In programming we can afford to use longer and more descriptive labels that better describe the value they hold, e.g. <code>weight_kg</code>, which suggests it might contain some data on weight measured in kilograms. The convention in Python is to use <code>snake case</code>. This is where words are written in lower case and separated by an underscore, e.g. <code>data_file_loader</code>. Other languages such as Java commonly use <code>camel case</code>, where new words (apart from the first word) are capitalised like the humps on a camel's back (e.g. <code>dataFileLoader</code>). There are a few restrictions on how we can name a variable in Python. These include:
<ul>
<li>The first character cannot be a number</li>
<li>The name can't be the same as an existing Python keyword (more about this later)</li>
</ul>
Variables can start with an underscore, contain letters and numbers and be any length. Case is important though. A variable named <code>my_name</code> is not the same as one called <code>My_name</code>. In this case you would have made (declared) 2 separate variables.

<div class="alert alert-block alert-info">
<b>Task 2:</b>
<br> 
1. Which of these are legal variable names? <code>_accounts</code>, <code>1005_accounts</code> and <code>my_accounts</code><br> 
2. Use the cells below to check and assign a value to each variable i.e. <code>_accounts = 10</code>.
</div>

In [7]:
_accounts = 10

In [8]:
1005_accounts = 10

SyntaxError: invalid token (<ipython-input-8-9b65dcd07650>, line 1)

In [9]:
my_accounts = 10
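
The naming rules can also be checked without causing an error. The sketch below uses the string method <code>isidentifier()</code> and the standard library <code>keyword</code> module; the candidate names are hypothetical and are written as strings so that Python does not try to run them as code.

```python
import keyword

# The names below are made-up examples, stored as strings so they can be tested safely
starts_with_letter = "patient_2".isidentifier()      # letters, digits and underscore are fine
starts_with_digit = "2nd_patient".isidentifier()     # a leading digit is not allowed
starts_with_underscore = "_temp".isidentifier()      # a leading underscore is allowed
print(starts_with_letter, starts_with_digit, starts_with_underscore)

# "class" passes the character rules but is reserved by Python as a keyword
passes_character_rules = "class".isidentifier()
is_reserved = keyword.iskeyword("class")
print(passes_character_rules, is_reserved)
```

A name is only usable as a variable if it passes the character rules and is not a reserved keyword.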

<a id="workingstrings"></a>
#### 2.2 Working with strings

Strings refer to textual data in a programming context. A string is made up of a sequence of characters. In Python strings are defined using either single or double quotes.

In [6]:
"This is a string."

'This is a string.'

In [7]:
'So is this.'

'So is this.'

Strings can also be joined together (<code>concatenated</code>) using the plus operator.

In [8]:
"This string can be " + "joined to that string."

'This string can be joined to that string.'

There are some useful ways of interacting with strings in Python. Let's say we had a string:

In [11]:
my_string = "This is a text string."

We can use the <code>len()</code> function to see the length of the string (how many characters in the string).

In [12]:
len(my_string)

22

<div class="alert alert-success">
<b>Note:</b> This count includes the spaces. These spaces are known as <code>white space</code>.
</div>

To access certain characters in a string, you just need to specify the position of the character in the string (starting from 0). For example, to access the letter <code>i</code> in the word <code>This</code> we would write:

In [13]:
my_string[2]

'i'

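If we want to retrieve the whole word, we can provide a start (0) and end (4) position separated by a colon. The character at the end position is not included, so this returns the characters at positions 0 to 3.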

In [14]:
my_string[0:4]

'This'

This is called string <code>slicing</code>. Here are some further examples:

In [15]:
print(my_string[:])
print(my_string[-1])
print(my_string[3:-5])

This is a text string.
.
s is a text st


This is useful for processing textual data.
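
As a further illustration of how the start and end positions behave, here is a small sketch on a different, made-up string (<code>word</code> is a hypothetical variable). It shows that leaving out the start or end means "from the beginning" or "to the end", and that the end position itself is excluded.

```python
word = "haemoglobin"      # hypothetical example string

first_part = word[:5]     # positions 0 to 4; position 5 is not included
second_part = word[5:]    # from position 5 to the end
last_three = word[-3:]    # negative positions count back from the end

print(first_part)
print(second_part)
print(last_three)

# Because the end position is excluded, the two slices rebuild the original exactly
print(first_part + second_part == word)
```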

<div class="alert alert-block alert-info">
<b>Task 3:</b>
<br> Using string slicing, print the word <code>text</code> from <code>my_string</code>.
</div>

In [16]:
print(my_string[10:14])

text


It is important to also realise that numbers enclosed in quotes are strings and not numbers. 

In [14]:
x = "123"
print(x)
print(type(x))

123
<class 'str'>


<div class="alert alert-success">
<b>Note:</b> This can catch you out when doing things like asking for user input. This input may look like a number but may in fact be a string.
</div>

In the next cell we try to add the number 4 to our string, which causes an error. We get around this by using <code>type casting</code> to change the integer 4 into the string "4". We can then add it to the string. We discuss this in more detail in section 2.3.

In [15]:
x + 4

TypeError: must be str, not int

In [16]:
x + str(4)

'1234'

Alternatively we could turn the string <code>x</code> into an integer and add the number 4 to it.

In [17]:
int(x) + 4

127

You can display strings with variables in several ways including using a comma or the percentage operator: 

In [2]:
name = "Claire"

In [3]:
print("This is to say hi to",name)

This is to say hi to Claire


You can also use the percentage operator as a placeholder.

In [4]:
print("Hi %s nice to meet you" % name)

Hi Claire nice to meet you


The letter after the percent is related to the type of variable you want to print. The <code>%s</code> is for <code>string</code>. Some of the more commonly used letters can be seen below.

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg .tg-kiyi{font-weight:bold;border-color:inherit;text-align:left}
.tg .tg-fymr{font-weight:bold;border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-xldj{border-color:inherit;text-align:left}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-kiyi">Letter</th>
    <th class="tg-kiyi">Variable type</th>
  </tr>
  <tr>
    <td class="tg-xldj">d, i</td>   
    <td class="tg-0pky">Integer</td>
  </tr>
   <tr>
    <td class="tg-xldj">s</td>   
    <td class="tg-0pky">String</td>
  </tr>
   <tr>
    <td class="tg-xldj">f, g</td>   
    <td class="tg-0pky">Float</td>
  </tr>
</table>
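
To show the letters from the table in use, here is a minimal sketch with made-up values (<code>patient_count</code> and <code>mean_weight_kg</code> are hypothetical names). When more than one value is inserted, the values are placed in parentheses after the <code>%</code>, in the same order as the placeholders.

```python
patient_count = 12        # hypothetical values for illustration
mean_weight_kg = 70.456

# %d inserts an integer; %.1f inserts a float shown to 1 decimal place
summary = "%d patients, mean weight %.1f kg" % (patient_count, mean_weight_kg)
print(summary)
```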

Finally we can use an <code>f-string</code>, which is a newer option available in Python version 3.6 and above. You simply precede the string with an <code>f</code> and then place any variables between braces <code>{ }</code>.

In [5]:
print(f"Hi {name} nice to meet you")

Hi Claire nice to meet you
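
The braces in an f-string can hold more than a plain variable name. The sketch below, which reuses <code>name</code> and <code>weight_kg</code> with the values from earlier cells, shows an expression inside the braces and a format specification after a colon.

```python
name = "Claire"
weight_kg = 10.56

# Any expression can go between the braces, not only a variable name
greeting = f"{name} has {len(name)} letters in her name"
print(greeting)

# A format specification after a colon controls how the value is displayed
label = f"Weight: {weight_kg:.1f} kg"
print(label)
```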


Another useful thing to be able to do is to find a word within a string. Let's say we were looking for a certain keyword in a string of text, such as <code>syncope</code> (a temporary loss of consciousness) in a longer string of text.

In [20]:
presenting_complaint = "A 68 year old male complained of feeling feint followed by an episode of syncope and headache."

In [21]:
presenting_complaint.find("syncope")

73

This returns the index at which the word starts if it is found, or <code>-1</code> if it is not, e.g.

In [22]:
presenting_complaint.find("stroke")

-1
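
One thing to watch for is that <code>find()</code> is case sensitive. The following sketch uses a made-up sentence (<code>note</code> is a hypothetical variable) and the <code>lower()</code> function, which returns a lower case copy of a string, to search regardless of capitalisation.

```python
note = "Patient reports Syncope after standing."   # hypothetical text

exact_match = note.find("syncope")             # case sensitive, so "Syncope" is not matched
lowered_match = note.lower().find("syncope")   # search a lower case copy instead

print(exact_match)
print(lowered_match)

# The index returned by find() can be used as the start of a slice
matched_word = note[lowered_match:lowered_match + len("syncope")]
print(matched_word)
```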

You can also split a string up based on a character called a <code>delimiter</code>. In this case we can split the string by spaces. We could also split by comma or other character if relevant.

In [23]:
words = presenting_complaint.split(" ")
print(words)

['A', '68', 'year', 'old', 'male', 'complained', 'of', 'feeling', 'feint', 'followed', 'by', 'an', 'episode', 'of', 'syncope', 'and', 'headache.']


<div class="alert alert-success">
<b>Note:</b> This splits items and stores them as elements (items) in a <code>list</code> separated by commas. We will cover lists a little later.
</div>

The opposite of the <code>split()</code> function is the <code>join()</code> function which we can use to put the string back together again. 

In [25]:
joined_str = " ".join(words)
print(joined_str)

A 68 year old male complained of feeling feint followed by an episode of syncope and headache.


You can also add a character in between when joining. Let's say you had some data you wanted to separate with dashes or with no separation at all, for example DNA.

In [26]:
x = ["ctga", "ccta", "aact"]
x

['ctga', 'ccta', 'aact']

In [27]:
data_joined_dash = "-".join(x)
print(data_joined_dash)

ctga-ccta-aact


In [28]:
data_joined_nospace = "".join(x)
print(data_joined_nospace)

ctgacctaaact


In [29]:
comma_string = "name, dob, pmh, social_history, lab_results, next_of_kin"

<div class="alert alert-block alert-info">
<b>Task 4:</b>
<br> Using the string split function, split the string <code>comma_string</code> by comma.
</div>

In [None]:
words = comma_string.split(",")
print(words)
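
Splitting on a comma alone keeps the space that follows each comma as part of the next item. One possible way around this, sketched below, is to use the comma and the space together as the delimiter.

```python
comma_string = "name, dob, pmh, social_history, lab_results, next_of_kin"

by_comma = comma_string.split(",")          # each item after the first starts with a space
by_comma_space = comma_string.split(", ")   # comma plus space treated as the delimiter

print(by_comma)
print(by_comma_space)
```

This only works cleanly when every comma is followed by exactly one space, which is an assumption about the data.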

We can also replace a word or words in a string with other words. Let's say we wanted to replace <code>syncope</code> with <code>LOC</code> for Loss Of Consciousness because we think this is a more recognized word. Here we use the <code>replace</code> function and put the word we want to replace first followed by a comma and then the new word.

In [30]:
presenting_complaint.replace("syncope", "LOC")

'A 68 year old male complained of feeling feint followed by an episode of LOC and headache.'
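
Note that <code>replace</code> does not change the original string; it returns a new one. To keep the result you need to store it in a variable. A minimal sketch, using a shortened made-up string (<code>complaint</code> and <code>updated_complaint</code> are hypothetical names):

```python
complaint = "An episode of syncope and headache."   # shortened, hypothetical example

updated_complaint = complaint.replace("syncope", "LOC")   # store the new string

print(complaint)           # the original is unchanged
print(updated_complaint)   # the stored result contains the replacement
```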

Other useful string functions include:

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg .tg-kiyi{font-weight:bold;border-color:inherit;text-align:left}
.tg .tg-fymr{font-weight:bold;border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-xldj{border-color:inherit;text-align:left}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-kiyi">Function</th>
    <th class="tg-kiyi">Description</th>
  </tr>
  <tr>
    <td class="tg-xldj">lower()</td>   
    <td class="tg-0pky">Changes text to lower case</td>
  </tr>
  <tr>
    <td class="tg-xldj">upper()</td>   
    <td class="tg-0pky">Changes text to upper case (capitals)</td>
  </tr>
  <tr>
    <td class="tg-xldj">isalpha()</td>   
    <td class="tg-0pky">Checks if text contains only letters</td>
  </tr>
  <tr>
    <td class="tg-xldj">isdigit()</td>   
    <td class="tg-0pky">Checks if text contains only digits</td>
  </tr>
  <tr>
    <td class="tg-xldj">isspace()</td>   
    <td class="tg-0pky">Checks if text contains only white space</td>
  </tr>
  <tr>
    <td class="tg-xldj">startswith()</td>   
    <td class="tg-0pky">Looks for a string at the start of another</td>
  </tr>
  <tr>
    <td class="tg-xldj">endswith()</td>   
    <td class="tg-0pky">Looks for a string at the end of another</td>
  </tr>
</table>
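
Here is a short sketch of these functions in use on made-up values (<code>drug</code> and <code>dose</code> are hypothetical variables). It also uses <code>endswith()</code>, the counterpart of <code>startswith()</code> that checks the end of a string.

```python
drug = "Aspirin"     # hypothetical example values
dose = "75"

print(drug.lower())             # lower case copy of the text
print(drug.upper())             # upper case copy of the text
print(drug.isalpha())           # True: only letters
print(dose.isdigit())           # True: only digits
print("75mg".isdigit())         # False: contains letters as well as digits
print("   ".isspace())          # True: only white space
print(drug.startswith("Asp"))   # True: the text begins with "Asp"
print(drug.endswith("rin"))     # True: the text ends with "rin"
```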

There is also a set of <code>escape characters</code> that can be used inside strings. For example <code>\t</code> for <code>tab</code>.

In [31]:
print("This is \t a tab")

This is 	 a tab


In [32]:
print("This is a \n newline")

This is a 
 newline


In [33]:
print("This is a \n\r newline and carriage return (like an old typewriter)")

This is a 
 newline and carriage return (like an old typewriter)
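
Escape characters are also how quotation marks and backslashes themselves are placed inside a string. The sketch below shows a few common cases; the file path is made up for illustration, and the <code>r</code> prefix (a raw string) is an alternative that tells Python to leave backslashes as they are.

```python
quoted = "She said \"hello\""         # \" puts a double quote inside a double-quoted string
apostrophe = 'It\'s a string'         # \' does the same for single quotes
escaped_path = "C:\\data\\notes.txt"  # \\ stands for a single backslash (hypothetical path)
raw_path = r"C:\data\notes.txt"       # a raw string leaves backslashes alone

print(quoted)
print(apostrophe)
print(escaped_path)
print(raw_path)
```

The last two lines print the same text, since both strings contain single backslashes.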


We can use <code>in</code> and <code>not in</code> to see if a word or substring (a string within a string) is present. For example, to see if the word <code>chest</code> is in the string below:

In [35]:
pc = "86 year old female with crushing central chest pain radiating down left arm."

In [36]:
"chest" in pc

True

In [37]:
"chest pain" in pc

True

In [38]:
"kidney pain" in pc

False

In [39]:
"kidney pain" not in pc

True

<a id="changingtype"></a>
#### 2.3 Changing a variable's type

To store and process the values contained inside variables you may need to change their type from time to time (called <code>type casting</code>). For example, when storing a phone number, we might want to store this as text rather than as an integer. There are several functions in Python for altering a variable's type, including <code>int()</code>, <code>float()</code> and <code>str()</code>. 

In [40]:
pi = 3.141592
type(pi)

float

In [41]:
pi = str(pi)
type(pi)

str

In the example above, we declare a variable called <code>pi</code> and give it the value <code>3.141592</code> to represent the Greek letter pi ($\pi$), the ratio of a circle's circumference to its diameter. When we use the <code>type</code> function we can see that it is a <code>float</code>. Next we use the <code>str()</code> function to convert it into a string and overwrite the existing value. Now when we view the type it is a <code>string (str)</code>.

<div class="alert alert-block alert-info">
<b>Task 5:</b>
<br> 1. Cast the variable <code>pi</code> back into a <code>float</code> and then into an <code>integer</code>. Finally print its contents and its type.
<br> 2. What value would you expect to see?
</div>

In [48]:
pi = float(pi)
pi = int(pi)
print(pi)
type(pi)

Or, combining the steps:

In [45]:
pi = int(float(pi))
print(pi)
type(pi)
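
Two details of casting are worth seeing side by side: <code>int()</code> drops the fractional part rather than rounding, and it cannot read a string that contains a decimal point directly. The sketch below uses a made-up value (<code>reading</code> is a hypothetical variable) and a <code>try</code>/<code>except</code> block, which has not been introduced yet and is used here only to show the error without stopping the cell.

```python
reading = "3.99"     # hypothetical value, e.g. text typed in by a user

as_float = float(reading)    # the string "3.99" becomes the float 3.99
truncated = int(as_float)    # int() drops the fractional part
rounded = round(as_float)    # round() goes to the nearest whole number

print(as_float, truncated, rounded)

# int() cannot convert a string containing a decimal point in one step
try:
    int(reading)
except ValueError as error:
    print("Could not convert:", error)
```

This is why the task above goes through <code>float()</code> first before calling <code>int()</code>.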

In the next notebook we will take a look at how to add comments to code to document the code base and keywords used in the Python programming language. 

### Notebook details
<br>
<i>Notebook created by <strong>Dr. Alan Davies</strong>.
<br>
&copy; Alan Davies 2021</i>

## Notes:

In [2]:
#This cell maintains the accessibility of the notebook content.
from IPython.display import HTML
def css_styling():
    # Read the custom stylesheet and close the file once it has been read
    with open("./styles/custom.css", "r") as css_file:
        styles = css_file.read()
    return HTML(styles)
css_styling()