# Tutorial 2: Variables
-----------------------------------------------------------------

## Overview: Variables
This section looks at working with **variables**.
- Variables are how a program maintains its **state** or memory.<br> 
- State is the value of a program environment at a given point in time (or runtime).<br> 
- Programs check their state variables and make decisions based upon their values or simply display or use them in some fashion.<br> 
- Variables have some special properties (discussed later) which determine how they are to be used and what kinds of values they store.<br> 
- They, along with commands, are a fundamental building block for any program.

The goal is to familiarize you with specific topics without going off on too many tangents. Python, like any computer language, is complex with lots of moving parts. It can be a little overwhelming at first, but don't worry, we will do our best to fill in the gaps as quickly and completely as we can.

<div class="alert alert-block alert-warning"> <b>Attention:</b> This tutorial is for novices. </div>

## Learning Objectives
After this session, you will be able to:
- Recognize Python's rules for naming and using variables.
- Recognize that variables can hold different data types and that Python is dynamically typed (you don’t need to declare the type of a variable).
- Use basic string functions.
- Capture user input in a variable.

## Prerequisites
- Submodule 1- Tutorial 1: Python Overview
  
## Getting Started
As will be true in every tutorial, please "run" the next code box to install the needed packages (in this case, for the quizzes).

In [None]:
%pip install jupyterquiz
from jupyterquiz import display_quiz

# Variables
In Python, variables are used <u>to store data values</u> that can be <u>referenced and manipulated</u> in a program.

A variable acts as a labeled container for a value, allowing you to reuse and manage data efficiently. For example, you can assign a number, string, or any other data type to a variable, and then use the variable's name to access or modify the value.
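
As a minimal sketch of the labeled-container idea (the names `gene_name` and `copy_number` and their values are made up for illustration):

```python
# Assign values to names; no type declaration is needed
gene_name = "IGF1R"   # a string (hypothetical example value)
copy_number = 2       # an integer

# Use the names to read the stored values
print(gene_name, copy_number)

# Modify a value by assigning a new one to the same name
copy_number = copy_number + 1
print(copy_number)

# The same name can later hold a different type (dynamic typing)
copy_number = "three"
print(type(copy_number))
```
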

Variables come in lots of different shapes and sizes, including
- Numbers
- Characters
- Strings
- Arrays
- Lists
- Objects and more

## Python is case-sensitive

This means capitalization matters in a name, whether it is a variable, module, method, class or function.<br>
Not all languages are this way, so be mindful of your variables in particular, so you don't create a nasty "bug" or program error that is hard to detect.
<br><br>
For example, "var" and "Var" are not the same reference.
<br><br>
There are naming standards for different languages. Python has one called the <b>PEP 8 Style Guide</b>. You can find it here: https://peps.python.org/pep-0008/
<br>
We will try to stay consistent with the PEP guide in this course. But, when you start writing code, don't worry too much about the particular "style", rather try to be consistent throughout so your readers know what to expect.
<br>
<br>
Try the following code:

In [None]:
# The style guide says variables should be all lowercase with words separated by the underscore
my_var = "Python is case-sensitive"

Now see what happens when you "miscase" a variable.

In [None]:
print(my_Var)	# throws an error, the case is different from the declaration

Note that the capital "V" instead of the lower case "v" causes Python to consider this as a different variable from the one you declared.
<br>
<br>
Python looks up the name among the variables it knows, can't find it, and raises a `NameError`.
<br> 
<br>
Now this looks pretty easy to fix here, but imagine trying to scan a 15,000 line script to find a wonky variable.
<br>
<br>
It gets worse: if you *assign* a value to the miscased name, Python raises no error at all. It quietly creates a new variable with that wonky name, which can muck up your program in ways that are difficult to track down.
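
A small sketch of that silent failure, using assumed variable names chosen only for this illustration:

```python
# Assumed example names, for illustration only
total_reads = 100

# A typo in the case does NOT raise an error on assignment:
# Python quietly creates a second, separate variable
total_Reads = 250

print(total_reads)   # still the original value
print(total_Reads)   # the accidental new variable
```

Both names now exist side by side, which is why consistent naming helps.
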
<br>
<br>
That's why good software engineering practices are important and good to adopt early so you don't spend a lot of wasted time and effort trying to find that one careless mistake that is ruining months of coding frenzy.
<br>
<br>
The correct code is in the next cell.

In [None]:
print(my_var)	# This is OK, because we have defined a variable my_var, but not my_Var

## State

Let's take a closer look at the _concept_ of variables before exploring the types. 
Close your eyes for a moment and consider your "state". Are you hungry? Bored? Cold? Tired?
<br>
Suppose somebody asked you “How do you feel?” What would you say?
“I’m tired. I want to go back to bed.” Whatever you say, your answer (should) provide an indication of your <b>state</b>.

So why are we interested in state information? *State information* allows us to make informed, as opposed to arbitrary, decisions.
<br>
Stop when the light is red and go when it is green, instead of stopping or going whenever you feel like it.
<br>
<b>I’m hungry</b> is a state that indicates I should go find something to eat.
- However, other states may countermand that, such as I'm on a diet or busy.

Virtually all programs rely on state information to make decisions.
<br>
<br>
If you have ever looked at a flowchart, you might find this thing called a decision diamond: 
<br>
![image.png](attachment:15ac042f-3186-41cd-917b-5047b43ded62.png)
<br>
At the center of the diamond is usually state information upon which you will base a yes/no decision/action.
<br>
<br>
So, in order to actually have state information, we need to store it.
There are various ways to store things, but for now we’ll focus on computer memory and VARIABLES.
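
The traffic-light and hunger examples above can be sketched in code. This is only an illustration: the variable names and values are invented, and it uses an `if` statement, which this tutorial has not introduced yet.

```python
# State is stored in variables (example values chosen for illustration)
light_color = "red"
hungry = True
on_a_diet = False

# The decision diamond: check the state, then choose an action
if light_color == "red":
    action = "stop"
else:
    action = "go"
print(action)

# One state can countermand another
if hungry and not on_a_diet:
    plan = "find something to eat"
else:
    plan = "keep working"
print(plan)
```
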


## Numerical Variables

Python supports several numeric data types:
* Integers (int): Whole numbers, positive or negative, with unlimited precision.
* Floating-Point Numbers (float): Numbers with decimal points.
* Complex Numbers (complex): Numbers with a real and imaginary part, represented as a + bj.
* Decimals (decimal.Decimal): For precise decimal arithmetic, often used in financial applications.
* Booleans (bool): A subtype of integers, representing True (1) or False (0).
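
A short sketch of the numeric types listed above; the variable names and values are arbitrary examples:

```python
from decimal import Decimal

whole = 42                 # int
fraction = 3.14            # float
z = 2 + 3j                 # complex
price = Decimal("0.10")    # decimal.Decimal, created from a string
flag = True                # bool

for value in (whole, fraction, z, price, flag):
    print(value, type(value).__name__)

# bool is a subtype of int, so True behaves like 1 in arithmetic
print(isinstance(flag, int), flag + flag)

# Floats are binary approximations; Decimal keeps exact decimal digits
print(0.1 + 0.2 == 0.3)
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```
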
<br>
<br>
Python can do math with integers and dynamically creates a floating-point value when needed; for example, dividing two integers with / always gives a float (see below).

In [None]:
#Use some simple math operators
x=1         #an integer
y=3         #another integer
x/y         #a floating point number

Each variable type has certain rules and limited methods that can be done on such a variable. It is not logical to ask for the absolute value of a letter or a word, but an absolute value function is useful for numbers.
<br>
Number variables support many operations. The table below lists common ones along with their Python code.
<br>

<br>
<table>
<thead>
<tr><th>Arithmetic</th><th>Comparison</th><th>Mathematical Functions</th></tr>
</thead>
<tbody>
<tr><td>Addition +</td><td>Equal to ==</td><td>Absolute value abs(x)</td></tr>
<tr><td>Subtraction -</td><td>Not equal to !=</td><td>Round x to n decimal places round(x, n)</td></tr>
<tr><td>Multiplication *</td><td>Greater than &gt;</td><td>x raised to the power of y pow(x, y)</td></tr>
<tr><td>Division /</td><td>Less than &lt;</td><td>Maximum or minimum value max(x1, x2, ...) min(x1, x2, ...)</td></tr>
<tr><td>Remainder (modulus) %</td><td>Greater than or equal to &gt;=</td><td>Sum of the items in an iterable sum(iterable)</td></tr>
<tr><td>Exponentiation **</td><td>Less than or equal to &lt;=</td><td>Trig functions (after import math): math.sin(x), math.cos(x)</td></tr>
</tbody>
</table>
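
A few of the table entries in action, as a minimal sketch with arbitrary example values:

```python
a = 17
b = 5

print(a % b)            # remainder after division
print(a ** 2)           # exponentiation
print(pow(a, 2))        # same result as a ** 2
print(a >= b)           # comparisons return True or False
print(abs(-7.5))        # absolute value
print(max(a, b), min(a, b))
print(sum([a, b, 3]))   # sum of the items in a list
```
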

Use the next box to try out math functions.

In [None]:
#This demonstrates using some of those tools. You should explore them all! 
#Typically, only one answer, to the "last" question, will appear when you "run" a python box, 
#so you need to remove the # to see different results

round(100/7, 3) # the 2nd piece of information is the number of decimals to round to
#x/y==3  #x & y were defined above. == asks "does x/y equal 3?"
#5**2

Some common math functions must be imported from the math module, which is the next module we want to take a brief glimpse of.

Python provides the math module for numeric functions such as log and square root (sqrt) and lots of others.

You can find documentation for this and many more modules on the Python web site: https://docs.python.org/

In [None]:
import math
y = 10
print(math.sqrt(y))

## Strings

<p style="background:blue;color:white;font-family:arial"> <font size="4">Strings are very common in  bioinformatics!</font> </p>

Strings are a series of letters and/or numbers which you define within quotes.

A short list of the kinds of string data in bioinformatics would include:
<br>

* DNA sequence: "ATCGTACG" stored in a variable named "dna_sequence"
* RNA sequence: "AUGCAUG" stored in a variable called "rna_transcript"
* Protein sequence: "MKKLDFE" stored in a variable named "protein_sequence"
* Protein RefSeq accession code: "XP_011519818.1" stored in a variable named "prot_ID"
* FASTA description: "insulin-like growth factor 1 receptor isoform X6 Homo sapiens" stored in a variable "Description"
<br>


### Basic string functions
There are a few common and useful manipulations for string variables, shown here for a variable called var1. Some are built-in functions that take the string as an argument (such as len(var1)) and some are methods called on the string itself (such as var1.upper()).
<br>
<table>
<thead><tr><th>Basic String Functions</th><th>Result</th></tr>
</thead>
<tbody>
<tr><td> len(var1) </td><td> Returns the length of a string</td></tr>
<tr><td> var1.upper() </td><td> Converts a string to uppercase</td></tr>
<tr><td> var1.lower() </td><td> Converts a string to lowercase</td></tr>
<tr><td> var1.find(sub) </td><td> Returns the index of the first occurrence of sub in a string or -1 if not found</td></tr>
<tr><td> var1.split(sep) </td><td> Splits a string into a list of substrings based on a separator</td></tr>
<tr><td> var1.strip()</td><td> Removes leading and trailing whitespace (including newlines) from a string, for example from each line of a multiline FASTA</td></tr>
<tr><td> + </td><td> Joins two strings together, aka concatenates them</td></tr>
</tbody>
</table>
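
A brief sketch of the table entries not shown in the cell below. The `header` string is a made-up FASTA-style line used only for illustration:

```python
header = ">XP_011519818.1 example record\n"   # made-up FASTA-style header line

clean = header.strip()          # remove the trailing newline
print(len(header), len(clean))  # the stripped copy is one character shorter
print(clean.find("example"))    # index where the substring starts
print(clean.find("zebra"))      # -1 because the substring is absent
print(clean.split(" "))         # list of words
print(clean.lower())
print(clean + " | checked")     # concatenation with +
```
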

Try using a few of these tools on the provided strings or write your own.

In [None]:
# Strings
dna_sequence="atcgtacgc"
var1 = "This is a string."
var2 = 'This is also a string.'
print(dna_sequence.upper())
print(var1.split("s"))

In [None]:
var3 = """This is a 
multiline string
"""
print(var3)

It is quite common to fashion output in the form of a string:


In [None]:
print(var1 + " But, not a very interesting one.")

This is a simple pair of examples. Later we will look at some formatting options.
For now, let's turn our attention to indexing strings.

# Indexing and Iterating

Indexing and iterating in Python both involve accessing elements of a data structure like lists or arrays, but they serve different purposes. **Indexing** retrieves specific elements directly using their position, enabling targeted access (e.g., `var[2]` gives the third element of var). **Iterating**, on the other hand, sequentially accesses *each* element, often using loops, making it ideal for processing multiple elements but less efficient for isolated lookups. While indexing is precise and instantaneous, iterating provides a convenient way to handle collections without explicitly managing indices. We will iterate in the next tutorial.

## Indexing Notation & Numbering

It is likely that you would want to know WHERE in a long string (say, an RNA transcript) one could find the start codon. This position is the INDEX.
<br>

Python, like some other languages such as C++, is a "zero index" language. That means that the first position in the string is at position zero.
Python comes with special notation for indexing: put the position in square brackets after the variable name, or use a colon inside the brackets to collect a range of characters.
<br>
- Because the first position in a string is index 0, the last position of a string of length n is index n-1.
- The END of a string can be referenced with negative numbers: [-1] is the last character and [-3] is the third-to-last. One way to picture negative indexes is to think of the string as wrapping around, so counting backward from position 0 lands on the end.
- To index more than one element, use the notation \[ start : stop ]. The character at start is included; the character at stop is not.
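
The notation above can be sketched as follows. The `rna` string is an invented transcript, not real data:

```python
rna = "GGAUGCCUAA"   # made-up transcript for illustration

print(rna[0])       # first character (index 0)
print(rna[-1])      # last character
print(rna[-3:])     # last three characters
print(rna[2:5])     # characters at indexes 2, 3 and 4 (stop is excluded)

start = rna.find("AUG")        # index of the start codon
print(start, rna[start:start + 3])
```
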

In [None]:
print(dna_sequence)
print(dna_sequence[-1])

<div class="alert alert-block alert-info"> <b>Tip:</b> Try this: Find the index (position) of "T" in var1 (which is "This is a string.") and print the last character of a string, which is at position -1. Can you try other tasks? The "answers" are in the second code box below. </div>


In [None]:
#Try here

In [None]:
# Print a specific character using its index
x=var1.find("T")
print(x)
dna_sequence.find("g")  #how many indices will this return?

In [None]:
# Print a chosen set of characters from a new variable that you create
new_var = ""  # put your own string between the quotes

# Print characters by index

Strings have some other special properties in Python.
<br>
<br>
Strings are also sequence objects, so they are iterable. However, individual characters cannot be replaced in place. This means strings are "immutable".

In [None]:
# Iterating over a string
for i in var1:
    print(i)

In [None]:
# Produces a TypeError because strings are immutable and cannot be changed in place
dna_sequence[2] = "c"
print(dna_sequence)

In [None]:
# Although you could reassign the whole variable value
dna_sequence = "ATGCCGATT"
print(dna_sequence)
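
Because strings are immutable, "editing" one really means building a new string. A minimal sketch, using a separate example variable `dna` so the cells above are unaffected:

```python
dna = "atcgtacgc"

# Build a new string from slices instead of changing one in place
edited = dna[:2] + "G" + dna[3:]
print(edited)

# replace() also returns a new string; the original is untouched
converted = dna.replace("t", "u")
print(converted)
print(dna)
```
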

### Test your knowledge
Run the next code box (with the Jupyter Quiz) to test your understanding of string variables

In [None]:
from jupyterquiz import display_quiz
str_qz = "PythonQuizQuestions/stringVarQuiz.json"
display_quiz(str_qz)

# Variable Type conversion

There are times when you may need to convert a variable from one type to another.
<br>
<br>
For example, you might want to print a person's age like this: "Sarah is *age* years old" where *age* happens to be a number.
<br>
<br>
To do this, use the str() function to convert a number to a string. To go the other way, use int() (or float()) to convert a string to a number:
<div class="alert alert-block alert-info"> <b>Tip:</b> Try this: remove the str() around x in the "Sarah is" statement to see what happens if you treat an integer like a string WITHOUT converting it. </div>

In [None]:
# Type conversion: numeric to string
letters= "ten"
x = 10
y=3*x
z=3*letters
print("Sarah is " + str(x) + " years old.") # Try using x without the str(): it raises a TypeError because + cannot join a string and a number
# An f-string is the more modern way to mix text and numerical variables: f'text {variable} more text'
print(f'Sarah is {x} years old')              
print(y)
print(z)
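
The cell above converts a number to a string. The opposite direction, string to number, might look like this (the variable names and values are arbitrary examples):

```python
count_text = "12"      # digits stored as a string
ratio_text = "0.25"

count = int(count_text)      # string -> integer
ratio = float(ratio_text)    # string -> float

print(count + 1)             # arithmetic now works
print(count_text + "1")      # on the string, + concatenates instead
print(ratio * 2)
```
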

# Working with Escape Characters

While we are on the subject of strings, occasionally you will run into special problems such as:
- How to display characters which you can't see (like tabs and line breaks)
- How to display a quotation mark
<br>

For this, we use the escape character "\" (backslash).

<table>
<thead>
<tr><th>Escape Sequence</th><th>meaning</th></tr>
</thead>
<tbody>
<tr><td>\\\</td><td>backslash (\\)</td></tr>
<tr><td>\\'</td><td>single quote</td></tr>
<tr><td>\\"</td><td>double quote</td></tr>
<tr><td>\\n</td><td>ASCII linefeed (LF)</td></tr>
<tr><td>\\t</td><td>ASCII horizontal tab (TAB)</td></tr>    
</tbody>
</table>
<br>
The official Python documentation for use of the escape character: [link](https://docs.python.org/3/reference/lexical_analysis.html#literals)
<br>
Try running the examples below — they will help you deal with the requirement in **Windows** to use backslashes (`\`) when specifying subfolders.

In [None]:
# Windows Path Example
print("C:\Bioinformatics\Datasets")

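See the problem? The backslash is necessary for the Windows path, but it is also an escape character. This particular path happens to print correctly because `\B` and `\D` are not recognized escape sequences (recent Python versions warn about them), but a folder name beginning with `n` or `t` would turn into a newline or a tab.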
- One way to solve this is proper use of the escape character:

In [None]:
# A better way to input Windows paths
print("C:\\Data\\project")
print("C:\\Bioinformatics\\Datasets")

Another way is to use the "r" prefix, which stands for raw, like so:

In [None]:
print(r"C:\Bioinformatics\Datasets")
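
The escape sequences in the table above, and the reason raw strings matter for paths, can be sketched as follows. The folder name `new_data` is hypothetical and chosen only because it starts with "n":

```python
print("Gene\tLength")               # \t inserts a tab
print("line one\nline two")         # \n starts a new line
print("She said \"hello\"")         # \" keeps the quote inside the string
print('It\'s a backslash: \\')      # \' and \\

# A hypothetical folder name starting with "n" shows the path problem
risky = "C:\new_data"     # \n becomes a newline character
safe = r"C:\new_data"     # raw string keeps the backslash
print(len(risky), len(safe))
```

The raw string is one character longer because it keeps both the backslash and the letter n.
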

### Operators

Operators allow us to perform operations on variables.
<br>
We can add, divide, concatenate, assign values and many more things.
<br>
<br>
It’s important to understand that operators are "polymorphic". That is, they behave differently depending upon what it is they are operating on.
<br>
For example, the **+** operator adds numbers but concatenates strings.
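
A quick sketch of that polymorphism, with arbitrary example values:

```python
total = 2 + 3              # numbers: addition
joined = "AT" + "GC"       # strings: concatenation
product = 3 * 4            # numbers: multiplication
repeated = "AT" * 3        # string and integer: repetition

print(total, joined, product, repeated)
```
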

# Capturing User Input

On many occasions you will want to ask your user for input from the keyboard, which becomes the value of a variable.
<br>
<br>
We can prompt the user to enter a value, store it in a variable, and then use it in the program, like this:

In [None]:
# User Input: Note the input comes back as a string.
player1 = input("What is your name, player? ")
print("Welcome,", player1)

# Square a Number. Again, even though we are asking for a number,
# the input will still be a string, so we have to convert it using what is called a "cast"
# (the same kind of type conversion used earlier with str()).
# In this case, we are casting the string to a float (floating point number).
num1 = input("Enter the number you would like me to square: ")
num2 = float(num1)
print("The result is", num2 * num2)

# Test Your Knowledge
Now it’s your turn to apply what you have learned. 

A common unit of bioinformatics data is a FASTA sequence record. It looks like this (a protein example):

```
>crab_anapl ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).
MDITIHNPLIRRPLFSWLAPSRIFDQIFGEHLQESELLPASPSLSPFLM
```

The first line starts with a > followed by the name.
The second (and subsequent) line(s) hold the DNA, RNA, or protein sequence.

Write Python code to do the following:
<br>
<br>
<ul> 
<li>Prompt for input "FASTA sequence name: "
<li>Prompt for input "What is the DNA sequence? (use ATGC or atgc) of at least 10 bases: "
<li>Output "The FASTA sequence is: " followed by > (since > starts all FASTA sequences), then string together the sequence name, "\n" to get a line break, and the DNA sequence.
<li>Print just the last letter of the DNA sequence (or any position(s) you choose).
</ul>

The solution is at the end of this Jupyter notebook, after the wrap up.

In [None]:
#Write your own code here

### Test Your Knowledge

Take the following quiz to check your coding knowledge.

In [None]:
from jupyterquiz import display_quiz
lesson2qz = "PythonQuizQuestions/fmquiz.json"
display_quiz(lesson2qz)

# Conclusion 

By now, you should have a basic grasp of some of the ways that variables can hold information and be used in Python scripts.
<br>
With that foundation, we will look at more advanced [data structures](./Submodule_1_Tutorial3_DataStructures.ipynb).

## Clean up
Remember to shut down your Jupyter Notebook instance when you are done for the day to avoid unnecessary charges. You can do this by stopping the notebook instance from the Cloud console. 

In [None]:
# Here is the solution for the above exercise:

# Prompt for the FASTA sequence name
seq_name = input("FASTA sequence name: ")

# Prompt for the DNA sequence
DNA_seq = input("What is the DNA sequence? (use ATGC or atgc) of at least 10 bases: ")

# Output the FASTA record: "The FASTA sequence is: " followed by ">",
# the sequence name, a newline, and the sequence
print("The FASTA sequence is: \n>" + seq_name + "\n" + DNA_seq)

# Extra credit: print the last letter of the sequence
print("The last letter in the FASTA sequence is: " + DNA_seq[-1])
