# Programming Basics in Python

In this course, and in data science in general, we'll predominantly use a programming language called "Python". Python is a flexible and popular programming language, and it is the most common language used in data science. In data science we most often use Python code wrapped inside a notebook file.

## Python, Notebooks, Code, Cells, and More

Python is a bit different from other languages you may have seen if you've programmed before, such as C/C++ or Java, in that it is "interpreted" rather than compiled. Rather than taking the code we write and translating it into "machine code" in one step (compilation), the Python code is executed on the fly, step by step. This allows the Python code to run in what we are using here: a "notebook" file. A notebook ".ipynb" file is basically like a webpage with Python code and the output of that code embedded in it. Notebook files are the most common tool used to hold and execute our code in data science for a few reasons:
<ul>
<li> We can look at data, code, and results all in one place, without leaving the page. This is useful as it helps us to explore and experiment quickly. </li>
<li> We can run individual parts of our code independently without needing to wait for it all to be processed. </li>
<li> We can make changes to our code easily and run multiple trials. </li>
<li> We can present our output easily, and add context to help make it clear to others. </li>
</ul>

In short, in data science we are generally trying to gain insight into some data. The notebook format fits well into that process, as it gives us an easy way to document the process we went through and display the results. 

### Blocks in Notebooks

Our notebooks are made up of blocks (also called cells) of two kinds: markdown (text) and code. The markdown cells are cells like this one; they are basically small HTML web pages. The cell below is a code cell. We can click the play button beside it to run that block of code, and it will execute on its own, showing any output it generates below. If there's an error in our code, it'll print that out below as well; when things run successfully, there will be a little green check and a time counter. 

In [57]:
print("hello world")

hello world


### Markdown Blocks

What you're reading right now is a markdown block. Markdown is a simple way to format text; it's basically a simplified version of HTML. There is a guide to markdown at https://www.markdownguide.org/basic-syntax/ if you want more detail. In notebooks we normally use the markdown cells to integrate context into our code. These markdown cells can also contain images, links, and other media.

To edit one of these cells, double-click on it and the source code will appear.

## Variables

Variables are a fundamental concept in programming. In short, a variable is an object (more on that later) that holds some value - a number, a string (text), a list, etc...

We can create a variable by giving it a name, then saying what is in it. Of note, text values need to be in quotation marks, while numbers, variable names, and most other things are typed as-is, without quotes. 

### Naming Variables

Variables can be named almost anything you want, but a few guidelines will make them easier to manage:
<ul>
<li> Names should have some meaning - we want to be able to identify what a variable is, especially if others ever look at our code.</li>
<li> It's common to shorten or abbreviate variable names, but it is probably a bad convention outside of extremely common abbreviations from the outside world like rpm or b2c; even then, it's probably better to spell it out more. Modern coding tools, even the more basic non-AI based autocomplete type ones, are great at automatically filling in full variable names, so the advantage of being lazy is quickly disappearing. </li>
<li> Some variable names have <i>default</i> usages that you'll get used to; it is a good idea not to reuse these names for other things. A few of the really common ones are: 
    <ul>
    <li> X - features, or input values for predictive models.</li>
    <li> Y - target, or what we want to predict in a predictive model.</li>
    <li> Single letters (i, n, m, k, etc...) - often indexes or counters for position or looping. This will make more sense after the next ~3 workbooks.</li>
    <li> df - a dataframe, the table-like structure from the pandas library that holds a dataset.</li>
    </ul>
</li>
<li> There are several reserved keywords, which have a special meaning in Python and can't be used as variable names. Things like True/False (capitalized), def, if, for, etc...</li>
<li> Names that come from other libraries (again, this will make more sense after a few workbooks) aren't banned, but overlapping names with functions that exist elsewhere and may or may not be imported into your code, can be confusing. For example, don't try to use names like pi or sqrt. </li>
</ul>
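
As a rough sketch of the guidelines above, the standard-library `keyword` module can tell us whether a word is reserved. The variable names used here (`engine_revolutions_per_minute`, `erpm`, `circle_area`) are made up purely for illustration.

```python
import keyword
from math import pi

# Reserved keywords cannot be used as variable names.
print(keyword.iskeyword("def"))     # True
print(keyword.iskeyword("height"))  # False, so "height" is a legal name

# A descriptive name (hypothetical example) is easy to understand later...
engine_revolutions_per_minute = 3200
# ...while a cryptic abbreviation of the same thing is not.
erpm = 3200

# Names from libraries are not reserved, so nothing stops us reusing them.
circle_area = pi * 2 ** 2   # uses the real value of pi from the math library
pi = 3                      # allowed, but "pi" no longer means math.pi
print(circle_area, pi)
```

The last two lines are the confusing overlap the list warns about: the code runs without complaint, but anything after it that uses `pi` silently gets 3.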

In [58]:
name = "akeem"
i = 0
height = 72.5

my_numbers = [i,height, 0]

In [59]:
name

'akeem'

### Output

In notebook files we get our output directly under each code block. In general, whatever the last line outputs, or returns (if anything), will be printed below the code block. If we want to print specific things, we can "wrap" our values in a print() function, which takes whatever we give it and forces it to print. We can also see that each print statement terminates its own line by default. If we want to manipulate our text to be fancy and pretty, we need to do a little work to it. For now, we won't really worry about that much. 

<b>Note:</b> this is also a simple example of something we'll use a lot later, a function. The print function takes in some value as an input, and prints whatever that is as output. 

In [60]:
print(name)
print("is this many inches tall:", height)

akeem
is this many inches tall: 72.5
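
If you are curious what that "little work" might look like, here is a minimal sketch. It uses the `end` argument of print() and an f-string, neither of which is covered in this workbook, so treat it as a preview rather than something you need right now.

```python
name = "akeem"
height = 72.5

# By default print() puts a space between values and ends with a newline.
print(name, "is this many inches tall:", height)

# end=" " replaces the newline, so the next print continues on the same line.
print(name, end=" ")
print("is", height, "inches tall")

# An f-string builds the whole sentence as one string before printing it.
sentence = f"{name} is {height} inches tall"
print(sentence)
```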


### Variable Types

Variables in Python are <i>dynamically typed</i> (though optional type hints are changing this a little), meaning that a variable can hold any type of value and change between them. This is in contrast to other languages like C or Java, where each variable has a predefined type. We can list out some examples of different variable types here. Don't worry too much about memorizing them if they are foreign to you; they'll become second nature after using them for a while. Some common types we'll deal with are:
<ul>
<li> Integer - a whole number (-2,-1,0,1,2...)</li>
<li> Float - a decimal number (3.14, -23.2345, 0.04)</li>
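<li> Character - a single letter, digit, or punctuation mark (a, 3, #, '). Python has no separate character type; a single character is just a string of length 1.</li>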
<li> String - text, a "string" of characters ("my desk is cold", "bob", "apple")</li>
<li> Boolean - a True/False value. <b>Note:</b> True/False is normally directly mapped to 1/0; the two sets of values are often used interchangeably and you should probably assume that they'll be interpreted interchangeably if you use them.</li>
<li> List - a series of several variables. [12.4, 25.3, 34, 11] </li>
<li> Tuple - a "bundle" of two or more other variables into one package. (49.2, 137.6) </li>
<li> Dictionary - a collection of key/value pairs.</li>
</ul>

We can ask a variable what type it is. This becomes more useful as your programs get more complex, as you might be manipulating values and end up with an unexpected type. We can use the type() function as a debugging tool to help us figure out what is going on.

In [61]:
print(type(name))
print(type(i))

<class 'str'>
<class 'int'>
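
To see the rest of the types from the list, here is a short sketch that makes one value of each and asks for its type. The variable names and values are just examples.

```python
whole_number = 42
decimal_number = 3.14
single_character = "a"
sentence = "my desk is cold"
is_cold = True
readings = [12.4, 25.3, 34, 11]
coordinates = (49.2, 137.6)
person = {"name": "akeem", "height": 72.5}

print(type(whole_number))      # <class 'int'>
print(type(decimal_number))    # <class 'float'>
print(type(single_character))  # <class 'str'> (a single character is still a string)
print(type(sentence))          # <class 'str'>
print(type(is_cold))           # <class 'bool'>
print(type(readings))          # <class 'list'>
print(type(coordinates))       # <class 'tuple'>
print(type(person))            # <class 'dict'>

# True and False behave like 1 and 0 when used in arithmetic.
print(True + True)             # 2
```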


### Changing Variables

Variables are automatically created when they are first assigned a value ("declared" or instantiated), as in the 'name = "akeem"' type statements above. We can also change them by either assigning a new value, or applying some action (function). 

In [62]:
print(i)
i = (i * 8) / 4
print(i)

0
0.0


#### Changing Types

Note that if we change what is in a variable, we can change its type. This is something that we need to be careful of as there is no inherent rule that stops you from changing someone's "bank account balance" into "Orange" in your code. 

In [63]:
testBool = True
testTuple = (1,2)
testChar = 'a'

In [64]:
#Change an existing variable
print(type(testBool))
testBool = "Apple"
print(type(testBool))

<class 'bool'>
<class 'str'>


## Simple Variable Manipulation

Our basic variable types are pretty intuitive to use, in most cases. We can do basic math with numbers, we can do basic string manipulation with text, we can build lists of other variables, and we can do some logical operations. 

### Math

We can do basic math with numbers, and we can use the standard operators: +, -, *, /, %, **. We can also use parentheses to control the order of operations. This is all pretty standard and similar to how you'd use a calculator or Excel. A few things that might be new are:
<ul>
<li> % - this is the modulus operator, it returns the remainder of a division operation. 5 % 2 = 1, 10 % 3 = 1, 10 % 2 = 0. One common place this comes up is if we need to do something to "every 5th name", or calculating how many crates are needed to hold a bunch of boxes. </li>
<li> ** - this is the exponent operator, it raises the first number to the power of the second. 2 ** 3 = 8, 3 ** 2 = 9, 2 ** 4 = 16. </li>
<li> += - this is a shorthand operator, it adds the value on the right to the value on the left, and stores the result in the variable on the left. This is the same as saying x = x + 1, but is a bit shorter. </li>
</ul>

In [65]:
#Math examples
i += 5

In [66]:
i = i + 5
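
Here is a small sketch of the operators from the list above, using made-up numbers of boxes and crates. The `//` (floor division) operator is not in the list; it is included here only because it pairs naturally with `%`.

```python
boxes = 23
boxes_per_crate = 5

# Floor division gives the number of completely full crates...
full_crates = boxes // boxes_per_crate
# ...and modulus gives the boxes left over after filling them.
leftover_boxes = boxes % boxes_per_crate
print(full_crates, leftover_boxes)    # 4 3

# Exponents
print(2 ** 3)                         # 8
print(3 ** 2)                         # 9

# Parentheses control the order of operations.
without_parentheses = 2 + 3 * 4       # multiplication happens first
with_parentheses = (2 + 3) * 4        # addition happens first
print(without_parentheses, with_parentheses)   # 14 20

# += adds to the variable and stores the result back in it.
counter = 0
counter += 5
counter += 5
print(counter)                        # 10
```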

### Strings

Strings hold text and we commonly want to manipulate that text. Some simple things that we can do with Strings are:
<ul>
<li> Concatenate - we can add two strings together to make a longer string. "Hello" + "World" = "HelloWorld" </li>
<li> Split - we can split a string into a list of strings based on a delimiter. "Hello World".split(" ") = ["Hello", "World"] </li>
<li> Replace - we can replace a substring with another substring. "Hello World".replace("World", "Bob") = "Hello Bob" </li>
<li> Find - we can find the index of a substring. "Hello World".find("World") = 6 </li>
<li> Upper/Lower - we can change the case of a string. "Hello World".upper() = "HELLO WORLD" </li>
<li> Newline - we can add a newline character to a string. "Hello\nWorld" = "Hello <br> World" </li>
</ul> 

In [67]:
# String Manipulation
greeting1 = "Hello"
greeting2 = "Everyone"

greeting = greeting1 + " " + greeting2
greeting

'Hello Everyone'
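
The other string operations from the list can be tried the same way. This is a minimal sketch using the same "Hello World" text as the list above.

```python
greeting = "Hello World"

# Split into a list of words, using a space as the delimiter.
words = greeting.split(" ")
print(words)                          # ['Hello', 'World']

# Replace one substring with another.
replaced = greeting.replace("World", "Bob")
print(replaced)                       # Hello Bob

# Find the position where a substring starts (counting from 0).
position = greeting.find("World")
print(position)                       # 6

# Change the case.
print(greeting.upper())               # HELLO WORLD
print(greeting.lower())               # hello world

# \n inside a string starts a new line when printed.
print("Hello\nWorld")

# None of these change the original string; each one returns a new string.
print(greeting)                       # Hello World
```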

## Exercise

Let's stop and try some simple variable creation and manipulation exercises. Try to:
<ul>
<li> Create a variable called "name" and set it to your name. </li>
<li> Create a variable called "age" and set it to your age. </li>
<li> Create a variable called "height" and set it to your height in inches. </li>
<li> Create a statement that prints out your name, age, and height in a sentence. </li>
<li> Add a new block that updates the name, age, and height to someone else's values and print it again. </li>
</ul>

In [68]:
# Create code.

## Objects and Viewing Variables

One concept that we'll address in more detail later, but I want to introduce early, is the idea of an object, in the context of object-oriented programming. Read this now; if everything doesn't make sense, that's fine, we'll revisit it soon. This is a concept which is pretty simple... once it clicks for you. When you first start programming it will take a bit for it to click, and that "bit" is really variable from person to person. In short, the idea of object-oriented programming is that we create "objects" in the memory of our program, and the bulk of the program is those objects interacting with each other and being acted upon. For example, we can think of a simple version of a university and a system to track grades, like a simplified Moodle. There would be objects representing, roughly, each noun that we care about - student, class, class-offering, registration, instructor, etc... Each of these objects would be made up of a few things:
<ul>
<li> Attributes - these are variables that hold information about the object. For example, the student object would have an attribute for name and birthdate, the class-offering object would have attributes for location and time. </li>
<li> Methods - these are functions that the object can "do" or have done to it. For example, the class-offering object may have a "reschedule" function that updates that offering's time and place after checking for conflicts, or it may have a "close course" function that finalizes the grades and marks things as read-only. </li>
<li> Self - one important aspect is that objects are "aware" and can refer to themselves. There is an automatic "self" prefix that an object uses to refer to and act on its own attributes and methods. For example, a class-offering object can use self.description to update the attribute of the offering's description, or it can call its self.reschedule() function to change when or where it is.</li>
</ul>

A program is basically built up from creating instances, or individual examples, of the different types of objects that we need to model our scenario, then using the different methods to perform whichever actions need to be done. For a school, a bunch of students, courses, classes, and registrations are created with all their attributes, then their methods to register, withdraw, schedule, grade, etc... are called to go through the process of doing school. 

If it helps, we can kind of visualize the objects physically. We have a template of a generic student, class, and offering, and when one is created we instantiate, or create a new instance of, the object from that template. These objects float around and interact with each other - a student object calls on a class-offering object's "registerStudent" function to create a registration object when they click on the web button to register. That function takes the student's info and the "self" info about the offering, checks to make sure there are no space or schedule conflicts, and calls on the constructor (a function that creates a new object) of the registration class to make that new registration object connecting the student and the class, then stores that new registration object in the bucket with all the others. 
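
To make this a bit more concrete, here is a very simplified sketch of what a class-offering object could look like in Python. Everything in it (the `ClassOffering` name, its attributes, and the `register_student` method) is hypothetical and invented for illustration. It also just stores student names in a list rather than creating separate registration objects, to keep it short.

```python
class ClassOffering:
    """A hypothetical course offering, used only to illustrate the idea."""

    def __init__(self, description, location, capacity):
        # Attributes: information the object holds about itself.
        self.description = description
        self.location = location
        self.capacity = capacity
        self.registered_students = []

    def reschedule(self, new_location):
        # A method uses "self" to change the object's own attributes.
        self.location = new_location

    def register_student(self, student_name):
        # Only register the student if there is still space in the offering.
        if len(self.registered_students) < self.capacity:
            self.registered_students.append(student_name)
            return True
        return False


# Instantiate one offering from the template (the class).
stats_offering = ClassOffering("Intro to Statistics", "Room 101", capacity=2)

stats_offering.reschedule("Room 204")
print(stats_offering.location)                       # Room 204

print(stats_offering.register_student("akeem"))      # True
print(stats_offering.register_student("bob"))        # True
print(stats_offering.register_student("carol"))      # False, the offering is full
print(stats_offering.registered_students)            # ['akeem', 'bob']
```

Notice that the code outside the class never touches the list of students directly; it asks the offering to do the registration, and the offering checks its own capacity.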

As we make simpler variables like strings and numbers, the same process is occurring - though for the built-in data types it is kind of obscured for convenience. Every time we make a string variable we are actually calling on the string class to make a new string object, and when we add to it we are calling on its functions to do the work on itself. We just don't notice because the language allows us to use strings without effort, because they are so common. One of the most common programming languages, C++, did not have built-in strings, at least when I learned it in school - you would have to either import a library, or use the default of an array of characters. Python is way easier. 
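
We can peek behind the curtain a little. In this sketch, `__add__` is the real (normally hidden) method that Python calls when we use `+`; you would almost never call it directly, it is shown here only to illustrate that strings and numbers are objects with their own methods.

```python
word = "hello"

# A string is an object of the str class, with its own methods.
print(type(word))                     # <class 'str'>
print(word.upper())                   # HELLO

# The + operator is shorthand for calling the object's own __add__ method.
print(word + " world")                # hello world
print(word.__add__(" world"))         # hello world

# Numbers work the same way.
number = 7
print(number + 3)                     # 10
print(number.__add__(3))              # 10

# Calling the class directly creates a new object of that type.
height_as_text = str(72.5)
print(height_as_text, type(height_as_text))
```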

### Viewing Current Objects

We can view a list of the currently existing objects in the program's memory at any time by clicking the "Variables" button on the toolbar of VS Code. We get a list of all the objects that have been created to this point and their values. For simple things like integers, strings, and even lists we'll see the actual value. For complex objects like a student in our fake school there will be lots of data and objects in objects in objects (e.g. a student could contain "address" objects, which in turn contain "postal code*" objects, which in turn contain strings), so the "value" column becomes less readable. These objects are all of the things that our program "knows", and (mostly) each line of code either adds to or does something to one of the objects. Programming is just doing the correct action to the correct one of the objects in that variables list. That scales up to larger lists with larger numbers of items, but the concept is the same. 

In my experience learning how to code, really understanding this conceptually and being able to relate it to actual programming was a big step in code really making sense to me. 

\* You may say, "the postal code is just a string, why make another type of object for it?" This is a good example of smart design to make things easier. If you had a normal string, you'd probably want to check if that was valid as a postal code, so you'd write a function to check it. If there was a postal code object, it would only allow valid postal codes to be created as it would check itself - this would mean that you never miss something in verifying a postal code and any changes are automatic. There could even be the ability to have the object check itself vs the street address provided to verify accuracy. We want to centralize actions in one location as much as possible almost all the time, which means we want to have objects that can "handle themselves" or do most things you may need to do with an object internally. Someone shouldn't be updating a postal code by grabbing the student object's "postal code" text field and typing in a new value; the update should happen by accessing the "postal code" object, asking it to run the "update" function with the new postal code as an input, which then makes sure that everything is ok and does all the work. No one needs to know anything about a postal code to use it, this makes it portable to other programs, and rather than having a million implementations of everything leading to inconsistencies and errors, we can have one object that does a thing, check that it works, then everyone can just borrow it. This is what we do when we import the common packages that we'll need for data science - pandas, sklearn, keras, etc... - there is a standardized implementation of "accuracy" that we know is correct, and we can just call it instead of calculating it. 
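
As a rough sketch of the postal code idea, here is a hypothetical `PostalCode` class that checks itself. It assumes the Canadian "A1A 1A1" layout and uses a simplified pattern (real postal codes have extra rules about which letters are allowed), so it is an illustration of the design, not a real validator.

```python
import re


class PostalCode:
    """Hypothetical postal code object that only ever holds a valid-looking code."""

    # Simplified pattern: letter-digit-letter, optional space, digit-letter-digit.
    PATTERN = re.compile(r"^[A-Z]\d[A-Z] ?\d[A-Z]\d$")

    def __init__(self, code):
        self.code = None
        self.update(code)

    def update(self, new_code):
        # Tidy up the input, then refuse anything that does not match the pattern.
        cleaned = new_code.strip().upper()
        if not self.PATTERN.match(cleaned):
            raise ValueError(f"Not a valid postal code: {new_code!r}")
        self.code = cleaned


home = PostalCode("t5j 2r7")
print(home.code)                      # T5J 2R7

# A bad update is rejected by the object itself, and the old value is kept.
try:
    home.update("12345")
except ValueError as error:
    print(error)                      # Not a valid postal code: '12345'

print(home.code)                      # T5J 2R7
```

Anyone using `PostalCode` gets the checking for free, and if the rules change they only need to be fixed in one place.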

## Lists and Data Structures

In most programs we want to have lots of variables which allow us to do some complex things. In programming speak that leads us to the idea of <b>data structures</b>, or containers that hold many variables at once. Lists are the main data structure in Python that we can use to hold other variables. Lists are ordered, meaning that the order of the items in the list is important. We can access items in a list by their index, which is their position in the list. We can also add items to a list, remove items from a list, and do other things to manipulate the list. In Python, lists are denoted by square brackets [] and items are separated by commas. We can reference a specific item by using its index in a square bracket after the list name - e.g. my_list[3] will return the 4th item in the list, as the 4th item is at position 3. 



In [69]:
my_stuff = [name, i, height, "this measures my height"]
my_stuff

['akeem', 10.0, 72.5, 'this measures my height']

### List Basics

Lists can contain a limitless number of items, of any type. Manipulating lists requires the idea of the position, or index, of an item in the list. This is how we can refer directly to one item from our list. To get a specific item we can use square brackets and the index, or position number, to specify one item from the list. 

Some common list operations that we can do are:
<ul>
<li> Append - we can add an item to the end of a list. my_list.append(5) </li>
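<li> Pop - we can remove the item at a given position (index) from a list, or the last item if no position is given. my_list.pop(5) removes the item at index 5. </li>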
<li> Sort - we can sort a list. my_list.sort() </li>
<li> Reverse - we can reverse a list. my_list.reverse() </li>
<li> Length - we can get the length of a list. len(my_list) </li>
</ul>
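
Here is a quick sketch of these operations on a small example list of numbers (the values are arbitrary):

```python
my_list = [34, 12.4, 25.3, 11]

my_list.append(5)             # add 5 to the end
print(my_list)                # [34, 12.4, 25.3, 11, 5]

removed = my_list.pop(1)      # remove (and hand back) the item at index 1
print(removed)                # 12.4
print(my_list)                # [34, 25.3, 11, 5]

my_list.sort()                # sort in place, smallest to largest
print(my_list)                # [5, 11, 25.3, 34]

my_list.reverse()             # reverse the current order in place
print(my_list)                # [34, 25.3, 11, 5]

print(len(my_list))           # 4
```

Note that sort() and reverse() change the list itself rather than giving back a new list.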

### Indexing and Length

Most data structures that have positions, like our list, are "0 indexed", which just means that the first position is position 0. This is in contrast to something that is "1 indexed", where the first item is item #1, like you might count in normal life. This isn't universal, but it is close to it, so the first item is (pretty much) always item #0. Extending from this, the last item will be "length - 1", as our list of 4 items has things in positions 0, 1, 2, and 3. We can ask for the length using the len() function. 

<b>Note:</b> there are other ways to get each item from a list that will be used more later as they are more efficient. The idea of accessing things via an index is transferable to lots of programming-ish topics and is really critical. 

In [70]:
print(my_numbers)
my_numbers.append(1)
print(my_numbers)
my_numbers.pop(1)
print(my_numbers)

[0, 72.5, 0]
[0, 72.5, 0, 1]
[0, 0, 1]


In [71]:
print(my_stuff[0])
print(my_numbers[2])

akeem
1


In [72]:
my_length = len(my_stuff)
print(my_length)
print(my_stuff[my_length - 1])

4
this measures my height
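
A couple of related tricks, sketched on a copy of the same list: negative indexes count from the end, a slice grabs a range of positions, and asking for a position that doesn't exist causes an error. The try/except used to catch that error is not something we've covered yet; it is only here so the example keeps running.

```python
my_stuff = ["akeem", 10.0, 72.5, "this measures my height"]

# The last item is at position "length - 1"...
last_index = len(my_stuff) - 1
print(my_stuff[last_index])           # this measures my height

# ...and Python also lets us count backwards from the end with -1.
print(my_stuff[-1])                   # this measures my height

# A slice [start:stop] returns the items from start up to, but not including, stop.
middle_items = my_stuff[1:3]
print(middle_items)                   # [10.0, 72.5]

# There is no position 4 in a list of 4 items, so this raises an IndexError.
try:
    my_stuff[4]
except IndexError as error:
    print("Index 4 does not exist:", error)
```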


### Tuples

Tuples are a data structure that is similar to a list, but with a few key differences. Tuples are denoted by parentheses () instead of square brackets. The main difference is that tuples are immutable, meaning that once they are created they cannot be changed. This is useful for things like coordinates, where we want to make sure that the x and y values don't change.

Tuples are often used when we want to "package" a few values together, especially when we want to return multiple values from a function. Accessing values from a tuple is the same as a list, we just use the index to get the value. 

In [73]:
my_tuple = ("akeem", "semper")
print(my_tuple[0])
print(my_tuple[1])

print(my_tuple[0:2])
print(my_tuple[0]+ " " + my_tuple[1])

akeem
semper
('akeem', 'semper')
akeem semper
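
The two ideas above, immutability and packaging several return values, can be sketched like this. The `min_and_max` function is a made-up example, and the function and try/except syntax will be covered properly in later workbooks.

```python
location = (53.4, 135.2)

# Trying to change one value inside a tuple raises a TypeError.
try:
    location[0] = 51.4
except TypeError as error:
    print("Tuples cannot be changed:", error)

# To "change" a tuple we build a new one from the old values instead.
new_location = (location[0] - 2, location[1])
print(new_location)


# A function can hand back several values at once by packaging them in a tuple.
def min_and_max(numbers):
    return min(numbers), max(numbers)


result = min_and_max([12.4, 25.3, 34, 11])
print(result)                         # (11, 34)

# The tuple can be unpacked straight into separate variables.
lowest, highest = result
print(lowest, highest)                # 11 34
```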


## Exercise

Let's stop and try some simple list and tuple creation and manipulation exercises. Try to:
<ul>
<li> Create a list called "exercise_list" of 5 numbers. </li>
<li> Create a tuple of the approximate latitude/longitude of where you are, add it to the end of the list. </li>
<li> Change the first item of the list to be the name of where you're located, e.g. "Edmonton". </li>
<li> Run the next block to print items from the list. </li>
<li> Update the latitude to be 2 degrees farther south. </li>
<li> Multiply the second and third items of the list. </li>
<li> Multiply the fourth and fifth items of the list. </li>
<li> Remove the original four numbers that you used for the multiplications above, and replace those 4 items with 1 item in the list - the sum of the two products. This should leave you with a 3 item list. </li>
<li> Run the final cell to print the list. </li>
</ul>

In [74]:
### Start Here
exercise_list = [2,4,6,8,10]
my_loc = (53.4, 135.2)
exercise_list.append(my_loc)
exercise_list[0] = "Edmonton"


In [75]:
exercise_list[5][0]

53.4

In [76]:
### Run this cell to check part 1
# str(non string thing to print)
print("Location: " + exercise_list[0] + " X:" + str( exercise_list[5][0] ) + " Y:" + str(exercise_list[5][1]))

Location: Edmonton X:53.4 Y:135.2


In [77]:
exercise_list

['Edmonton', 4, 6, 8, 10, (53.4, 135.2)]

In [78]:
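### Do more stuff
# Tuples can't be changed in place, so build a new one 2 degrees farther south.
tmp_lat = my_loc[0]-2
tmp_lon = my_loc[1]
new_loc = (tmp_lat, tmp_lon)
exercise_list[5] = new_loc

# Multiply the two pairs of numbers before removing them from the list.
tmp1 = exercise_list[1]*exercise_list[2]
tmp2 = exercise_list[3]*exercise_list[4]
# Each pop(1) removes whatever is currently at index 1, so four pops remove all four numbers.
exercise_list.pop(1)
exercise_list.pop(1)
exercise_list.pop(1)
exercise_list.pop(1)
# Put the sum of the two products where the numbers used to be.
exercise_list.insert(1, tmp1+tmp2)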


In [79]:
### Run this cell to check part 2
print("Location: " + exercise_list[0])
print("Coordinates: " + str(exercise_list[2]))
print("My number:" + str(exercise_list[1]))

Location: Edmonton
Coordinates: (51.4, 135.2)
My number:104
