<h1>Welcome to Python and Pizza! 🐍🍕</h1>

<p>For this workshop we'll be using the Google Colab platform to deliver content.
Please follow along with the presenter as we navigate through each section. The layout will make it easy for members to run and test their own code. </p>
<br></br>

<h3>What will we cover in this workshop?</h3>
<p>We will teach programming in Python at a beginner level, then discuss data manipulation with Pandas and data visualisation tools such as Plotly.</p>


<img src="https://files.realpython.com/media/Newbie_Watermarked.a9319218252a.jpg"></img>

# Why learn Python?

Python provides a beginner-friendly introduction to problem solving with computers.
There are over 100,000 publicly available libraries, so chances are that something already exists that can help with your problem. Learning Python will stand you in good stead with many companies, as it has become an essential part of a modern problem-solving toolkit.

1.  It's fun!

2.  Practical applications
- Web development
- AI and Machine Learning
- Data Analytics and Visualisation
- Games!!

3.  Real world usage
- NASA
- Intel
- Pixar
- Netflix
- Facebook
- Tesla

4. It's versatile
- Complete both small and complex tasks

5. High-demand skillset



# Basics Part 1: How to use Google Colab?

First let's introduce the code cells seen below. Each cell can contain a separate sequence of code (logical instructions), which can be run by clicking the "Run Cell" button or pressing Ctrl+Enter over a cell. 
<br></br>
*Important Notes:*

1.   Any output is shown below the cell after pressing Run.
2.   If the code is faulty, an error message will say where the error occurred.
3.   Cells can be run separately, but they are all part of the same program: a variable defined in one cell can be used in any cell that is run after it.
4.   After running a cell, a number appears in the square brackets [ ], indicating the order in which the cells were run.



In [None]:
# This is an empty code cell. It doesn't contain any code, just this text.

# Basics Part 2: Python Fundamentals

##Variables
<p>Variables are useful to store information throughout the lifetime of our program. <br> They can be given a name and assigned values:
</p>

In [None]:
a = 10         

<p>If we run the above cell, what we have done is given the variable 'a' a value of 10. <br> As a sanity check, let's print out the value of 'a'.</p>

In [None]:
print(a)

10


<p>Running the above cell should display the number 10 below the cell. <br> The print() function is a useful tool for checking variable values, debugging, and displaying information to the programmer.</p>

##Variable Types

Another thing to remember about variables is that they also have a 'type'.
<br> The 'type' of a is currently an int (short for integer), which can be any whole number.
<br></br> 
**Common variable types**:

<pre>
*   int       (whole numbers)
*   float     (numbers with decimals)
*   str       (strings: sequences of characters)
*   bool      (booleans: either True or False)
</pre>

<br> There are more types which are used to hold more than one value at a time, such as:

<pre>
*   list        (stores an ordered sequence of indexed items)
*   dict        (dictionary: stores key-value pairs)
*   tuple       (an ordered group of values that cannot be changed once created)
</pre>

<br>
<p>Below is a brief example of how to define these variable types:</p>

In [None]:
a = 10              #int
a = 10.5            #float
a = "hello world"   #string   -   must be wrapped in quotes, either '' or ""
a = True            #boolean

<p>Note that we reused the variable name 'a'. 

<br>This is extremely cursed code. But it proves a simple point. In Python, you can change a variable's type without any issues. Here we change it from type int, to float, to string and then to boolean.

<br>There's one other thing "wrong" with this code. The name 'a' is not very good as it can be easy to forget what its purpose is. We use naming conventions in Python to help programmers easily tell what the code is supposed to do.

<br>The following example uses better coding practices, and shows how to define the rest of the variable types:

In [None]:
# list of ints
intList = [0, 1, 2, 3]      

# dictionary pairing names (strings) to ages (ints)
agesDict = {'Bob': 15, 'Boba Fett': 20, 'Bobbinson': 99, 'Bobert': 150}      

# a tuple containing booleans
booleanValues = (True, False, True)

# these types can be initialised empty, like so
# note the use of different brackets to define a different type
intList = []
agesDict = {}
booleanValues = ()
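
To check which type a value currently has, Python's built-in `type()` function can be called on it. The sketch below is only an illustration: the variable names and values are made up for this example.

```python
# Illustrative values (hypothetical names), one per type discussed above
studentAge = 10
pizzaPrice = 10.5
greeting = "hello world"
isHungry = True
toppings = ["cheese", "ham"]
agesDict = {"Bob": 15}
point = (3, 4)

print(type(studentAge))   # <class 'int'>
print(type(pizzaPrice))   # <class 'float'>
print(type(greeting))     # <class 'str'>
print(type(isHungry))     # <class 'bool'>
print(type(toppings))     # <class 'list'>
print(type(agesDict))     # <class 'dict'>
print(type(point))        # <class 'tuple'>
```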

Each of these collection types has its own methods for adding and removing elements.

One example is shown below:



In [None]:
intList = [1, 2, 3]
intList.append(5)     # The list.append() method adds the new element to the end of the list
print(intList)        # Run this code to see the change

num = intList.pop()   # The list.pop() method removes the last item of the list and returns it
print(num)            # What will this print?

[1, 2, 3, 5]
5
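
Dictionaries and tuples behave a little differently from lists. As a rough sketch (the names and values are only for illustration), a dictionary gains an entry when you assign to a new key and loses one with `dict.pop()`, while a tuple cannot be changed at all once it has been created:

```python
agesDict = {"Bob": 15, "Bobert": 150}

agesDict["Bobbinson"] = 99         # assigning to a new key adds a key-value pair
removedAge = agesDict.pop("Bob")   # dict.pop(key) removes the key and returns its value
print(agesDict)                    # {'Bobert': 150, 'Bobbinson': 99}
print(removedAge)                  # 15

booleanValues = (True, False, True)
# Tuples have no append() or pop(), so to "add" an element
# we build a new, longer tuple instead.
longerValues = booleanValues + (False,)
print(longerValues)                # (True, False, True, False)
```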


We can also do something called 'indexing' to fetch a single element: lists are indexed by position, and dictionaries are indexed by key. Note that in most programming languages, including Python, positions are counted beginning from 0.

<br>For example:

In [None]:
letterList = ["a", "b", "c"]   # "a" has index 0
                               # "b" has index 1
                               # "c" has index 2

var = letterList[0]
print(var)                     # What will this print? Run the code and see!

var = letterList[2]
print(var)                     # What will this print? Run the code and see!


# Dictionaries are indexed into using key values
dict1 = {"hello": "world", "URC": "Is cool", "EWB": "Is also cool"}   
result = dict1["hello"]
print(result)

a
c
world
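
A few related tools are handy when indexing. The sketch below (with made-up example data) uses the built-in `len()` function, a negative index, and the dictionary method `get()`, which avoids an error when a key is missing:

```python
letters = ["a", "b", "c"]

print(len(letters))        # 3, the number of elements in the list
print(letters[-1])         # c, negative indexes count back from the end

ages = {"Bob": 15}
print(ages.get("Bob"))     # 15
print(ages.get("Alice"))   # None, because there is no "Alice" key
# Writing ages["Alice"] instead would stop the program with a KeyError
```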


##Operations

<p>Mathematical operations are commonly used with variables in programming.</p>

![picture1](https://drive.google.com/uc?export=view&id=1lL9g17nAEsA4RfLrcdkHOTQPUIltYgpQ)
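
The standard arithmetic operators can be tried out directly in a code cell. This is a minimal sketch with arbitrary values for `x` and `y`:

```python
x = 7
y = 2

print(x + y)    # 9     addition
print(x - y)    # 5     subtraction
print(x * y)    # 14    multiplication
print(x / y)    # 3.5   division (always gives a float)
print(x // y)   # 3     floor division (rounds down to a whole number)
print(x % y)    # 1     modulo (the remainder after division)
print(x ** y)   # 49    exponent (x to the power of y)
```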

##Functions

Functions are a useful part of any program. They take an input, process the input, and then return an output. This is similar to mathematical functions of the form f(x).

<br>
<p>In Python we declare a function using the following format:</p>


*   `def` keyword to say we are defining a function
*   Here `myFunction` is the name of our function
*   `myFunction` takes no inputs, as indicated by the empty brackets `()`
*   A `return` statement gives the function's output. Here the `return` statement is empty, so the function gives back nothing (Python uses the special value `None` for this)




In [None]:
def myFunction():
    return

The next function is more intricate, and takes an input. We call these inputs parameters.

<br>Note that it also performs the operation `val1 + val1`.
The function then returns the result of that operation. The return statement also exits the function.



In [None]:
def double(val1):
    result = val1 + val1
    return result

# equivalent way of writing
def double2(val2):
    return val2 + val2

After running the above code cell, the function will be defined. You can then use it in the format shown below. We "call" the function `double` with an input of 10. The return value of the function is then assigned to the variable `number`.

In [None]:
number = double(10)
print(number)           # expect 20

20
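
Functions can also take more than one parameter, and a function without any `return` statement gives back the special value `None`. The two functions below, `add` and `greet`, are hypothetical examples written only to illustrate this:

```python
def add(val1, val2):
    # Two parameters, separated by a comma
    return val1 + val2

def greet(name):
    # No return statement, so Python returns None automatically
    print("Hello,", name)

total = add(3, 4)
print(total)            # 7

result = greet("Bob")   # prints: Hello, Bob
print(result)           # None
```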


##Conditional Statements

Conditional statements help build the logical flow of our program. Python evaluates each condition to either `True` or `False` (note that these are boolean values).

We can either use symbols to write conditional statements:
![picture2](https://drive.google.com/uc?export=view&id=1DDZHO5I1HQJMlHzUUzRj_hQaknNlzMu0)

Or we can use keywords such as the following:
![picture3](https://drive.google.com/uc?export=view&id=1PiH34TynEAId_ka9T9fECZuamyw4ZZ2D)
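
As a quick sketch of both styles, the cell below tries out Python's comparison symbols and the keywords `and`, `or` and `not` on an arbitrary example value:

```python
num = 12

# Comparison symbols
print(num == 12)    # True    equal to
print(num != 12)    # False   not equal to
print(num > 10)     # True    greater than
print(num <= 10)    # False   less than or equal to

# Keywords that combine or flip conditions
print(num > 10 and num < 20)    # True    both sides are True
print(num < 10 or num > 100)    # False   neither side is True
print(not num > 10)             # False   the opposite of True
```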

<br>Another useful feature of Python is the `if` and `else` statement. We use if-else statements to change what the code will do depending on what conditions are present. Say for example we want to divide a number by 2 if it is above 10, otherwise we want to multiply it by 2. The required code would look like this:



In [None]:
def myFunction(num):
    if num > 10:                # Here our condition is whether num > 10.
        return num / 2          # If the condition evaluates to True, this line is run, exiting the function.
    else:
        return num * 2          # The else branch is run when the condition evaluates to False.
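
To see an if-else statement in action, we can call such a function with a few different inputs. The sketch below repeats the same logic under the hypothetical name `halveOrDouble`, and adds a second made-up function, `sizeOf`, showing the `elif` keyword for checking more than two cases:

```python
def halveOrDouble(num):
    if num > 10:
        return num / 2
    else:
        return num * 2

print(halveOrDouble(20))   # 10.0  (20 is above 10, so it is halved; / always gives a float)
print(halveOrDouble(4))    # 8     (4 is not above 10, so it is doubled)
print(halveOrDouble(10))   # 20    (10 is not above 10 either)

def sizeOf(num):
    if num > 10:
        return "big"
    elif num > 5:          # only checked when the first condition was False
        return "medium"
    else:
        return "small"

print(sizeOf(7))           # medium
```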

##Loops

Loops allow us to iterate through a sequence, which could be a list, tuple, range of numbers or more. They are another important feature of many programming languages as they allow us to easily represent repetitive tasks.

<br>Python has two kinds of loop: the `for` loop and the `while` loop. For this workshop we will only cover the `for` loop, as it is the most commonly used.

<br>To iterate through a sequence, use the following syntax:

In [None]:
numList = [0, 1, 2, 6, 7, 8]

for num in numList:
    print(num)          # This will print each number on a separate line

0
1
2
6
7
8


We use a variable to represent the current element, in this case `num`. This variable is updated for each iteration of the loop. We also select a sequence of elements to iterate through. In this case we chose a list.

<br>We will now see how to use the built-in `range()` function to iterate through a range of numbers. Note that `range(5)` starts at 0 and stops before 5:

In [None]:
for i in range(5):
    print(i)

0
1
2
3
4
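
`range()` can also be given a start value and a step size, and a loop can build up a result as it goes. A small sketch with arbitrary numbers:

```python
for i in range(2, 5):       # start at 2, stop before 5
    print(i)                # prints 2, 3, 4

for i in range(0, 10, 3):   # start at 0, stop before 10, go up in steps of 3
    print(i)                # prints 0, 3, 6, 9

total = 0
for i in range(1, 5):       # 1, 2, 3, 4
    total = total + i       # add each number to the running total
print(total)                # 10
```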


# Data Manipulation: Using Pandas

##What is Pandas?

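Pandas is an open-source Python library containing data manipulation and analysis tools. It is built around two data structures that we will meet below: the one-dimensional Series and the two-dimensional DataFrame.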

<br>To start using pandas, we must first install it. On your own computer, this would be done by running a command in the terminal: `pip install pandas`.
However, the necessary modules are already installed in Google Colab, so there is no need to do that here.

<br>To make use of the library, we must import it into our code. This is done like so:

In [2]:
import pandas as pd

Typically these import statements are included at the beginning of our code. Since we can run code cells sequentially in Google Colab, running the above import will allow us access to library functions anywhere else within this document.

<br>Also note that we import `pandas` under the nickname `pd`. This nickname will come up whenever we use the library, and helps us shorten the code a little.

<br>Moving on, let us look at the first `pandas` object you can create, called a Series:

In [None]:
myList = [5, 6, 7, 8]
mySeries = pd.Series(myList)


mySeries      # in a notebook, the last expression in a cell is displayed automatically

0    5
1    6
2    7
3    8
dtype: int64

A **Series** is a one-dimensional labeled array able to store any data type. The axis labels are collectively referred to as the **index**.

<br> Note that in the above code cell, our index just contains the numbers 0 to 3. We can rearrange the Series to follow a different index using `reindex()`, like so:

In [None]:
indexes = [4, 3, 2, 1, 0]
mySeries = mySeries.reindex(indexes)

mySeries

4    NaN
3    8.0
2    7.0
1    6.0
0    5.0
dtype: float64

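By default, labels in the new index that have no corresponding record in the current Series are given the value `NaN`. This stands for "Not a Number", which pandas uses to mark a missing value. Because `NaN` is a float, the other values are converted from `int64` to `float64` as well.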
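
Index labels do not have to be numbers. As a minimal sketch (the labels and values here are made up), a Series can be created with its own labels through the `index` argument, and `reindex()` accepts a `fill_value` to use instead of `NaN` for labels that are missing:

```python
import pandas as pd

# A Series with string labels instead of the default 0, 1, 2
scores = pd.Series([5, 6, 7], index=["a", "b", "c"])
print(scores["b"])      # 6

# "z" is not in the original Series, so it gets the fill value 0
filled = scores.reindex(["c", "b", "a", "z"], fill_value=0)
print(filled)
```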

Another important data structure in pandas is the `DataFrame`. It's similar to a spreadsheet in Excel. Unlike a Series, it can have more than one column, and each column has its own label. Let's see an example below:

In [None]:
# Let's manually create a dataframe
dataThing = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=dataThing)

df  # Google Colab does this cool thing where it formats your data nicely:

   col1  col2
0     1     3
1     2     4


Speaking of Excel, we can also import Excel spreadsheets and put them into DataFrames. We do this through .csv files, which Excel can export. CSV stands for "comma-separated values", a way of storing a spreadsheet as a text file.

<br>We'll use the `pandas.read_csv()` function to load our data. Here is an example of reading a CSV using the file `test_data.csv`:

In [51]:
df = pd.read_csv('test_data.csv')  

df

    students  test1  test2  test3
0      Bobby     90     80     70
1  Bobbinson      0      2      4
2     Bobert     50     50     50
3   Bobemily      0     50    100
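
If you don't have `test_data.csv` to hand, the sketch below shows what a CSV looks like as plain text. It assumes the file's contents match the table above (the real file may differ), and uses `io.StringIO` from Python's standard library so that `read_csv()` can read from a string instead of a file:

```python
import io
import pandas as pd

# Assumed contents of test_data.csv, reconstructed from the table above
csvText = """students,test1,test2,test3
Bobby,90,80,70
Bobbinson,0,2,4
Bobert,50,50,50
Bobemily,0,50,100
"""

df = pd.read_csv(io.StringIO(csvText))   # the first line becomes the column labels
print(df.shape)                          # (4, 4): 4 rows and 4 columns
print(list(df.columns))                  # ['students', 'test1', 'test2', 'test3']
```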


Let's index using the column of names instead:

In [21]:
df.set_index('students', inplace=True)

df

           test1  test2  test3
students
Bobby         90     80     70
Bobbinson      0      2      4
Bobert        50     50     50
Bobemily       0     50    100
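
One benefit of indexing by name is that rows can then be looked up by name using `.loc`. A small sketch, using a cut-down table with two of the students above so that it runs on its own:

```python
import pandas as pd

# A small stand-in for the table above, indexed by student name
df = pd.DataFrame(
    {"test1": [90, 0], "test2": [80, 2], "test3": [70, 4]},
    index=pd.Index(["Bobby", "Bobbinson"], name="students"),
)

print(df.loc["Bobby"])            # the whole row for Bobby
print(df.loc["Bobby", "test2"])   # 80, a single cell: row label, then column label
print(df["test3"])                # a whole column, selected by its label
```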


Let's say we want to add a new column holding the average of each row. This will give us the average score for each student across all tests. This can be done like so:

In [37]:
averagesList = df.mean(axis=1)

df["averages"] = averagesList

df

           test1  test2  test3  averages
students
Bobby         90     80     70      80.0
Bobbinson      0      2      4       2.0
Bobert        50     50     50      50.0
Bobemily       0     50    100      50.0


Note that we specified that the axis should be 1. With an axis of 1, the mean is taken across the columns, giving one result per row (here, one average per student). With an axis of 0, the mean is taken down the rows, giving one result per column. So to find the average score for each test, we use an axis of 0. Because `df` now also has an `averages` column, that column is averaged too. Let's see what this looks like:

In [42]:
test_avgs = df.mean(axis=0)
test_avgs.name = "test_avgs"   # name the Series so that the output below is labelled

test_avgs

test1       35.0
test2       45.5
test3       56.0
averages    45.5
Name: test_avgs, dtype: float64
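
The difference between the two axes is easiest to see on a tiny table. This sketch uses a made-up two-student, two-test DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"test1": [90, 0], "test2": [80, 2]},
    index=["Bobby", "Bobbinson"],
)

rowMeans = df.mean(axis=1)   # one value per row:    Bobby 85.0, Bobbinson 1.0
colMeans = df.mean(axis=0)   # one value per column: test1 45.0, test2 41.0

print(rowMeans)
print(colMeans)
```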

##Data Visualisation

In [46]:
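import pandas as pd
pd.options.plotting.backend = "plotly"   # make .plot() draw charts with Plotly

# Use the student names as the index so that only the test scores are averaged
df = pd.read_csv('test_data.csv', index_col='students')
df["averages"] = df.mean(axis=1)         # average score per student

test_avgs = df.mean(axis=0)              # average of each column
fig = test_avgs.plot(kind="bar")         # one bar per column
fig.show()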