# Introduction to Python <a name="top"></a>

Welcome to the Introduction to Python briefing.

Topics:
0. [General Python Info](#generalInfo)
1. [Printing](#printing)
2. [Variables](#variables)
3. [Loops](#loops)
4. [Functions](#functions)
5. [Data Reading](#read)
6. [Data Manipulation](#manip)
7. [Data Output](#write)
8. [Scipy/Numpy](#extra)

## Installation.

We need to install Python and Visual Studio Code for this demo. Jupyter Notebook and Git are recommended but not required.

### Required:

#### 1) Clone this repository: 
Click on the big green button and select "Download as Zip". This will download all the files you need for this class.


#### 2) Python
To install Python, you will need to visit this website and download the version that matches your operating system. You can find the installer at the bottom of the page.
<a href="https://www.python.org/downloads/release/python-382/">https://www.python.org/downloads/release/python-382/</a>


Once downloaded, double click on "Install Certificates". This will open a terminal window, but don't worry.


#### 3) Visual Studio Code
We will be writing our command line program in Visual Studio Code. This is a free IDE that allows programming in a variety of languages.
<a href="https://code.visualstudio.com">https://code.visualstudio.com</a>

Once you have installed Visual Studio Code, open it and click on the box-like icon on the left-hand side of the window. This opens the list of extensions that Visual Studio Code has installed. We want to install the Python linter. It is labeled "Python" and is published by Microsoft. Just click the green install button and you will be good to go. A linter is basically a spell checker for programming: it tells you when you have made a mistake that will cause the program to fail to run.

If you have any issues with this step, we will go over it at the start of class, so don't worry too much.


### Optional:

The following two programs are generally useful, but we won't be working with them directly in this class. If you are interested, feel free to pick them up.

#### 1) Jupyter Notebook
Jupyter Notebook is a program that lets you format your code in a way that makes it more accessible to non-programmers. You can embed runnable Python code with accompanying text to make a presentable product.

To install it, open a terminal and run:

```pip3 install jupyter```

To start the notebook run:

```jupyter notebook```

#### 2) Git
Git is basically Google Docs for programmers. If you plan to work collaboratively on just about any programming project, you will be coordinating work through Git. The basics aren't too difficult to learn, and it is very useful for version control.

https://git-scm.com/book/en/v2/Getting-Started-Installing-Git

You can clone from my repository for this project at:

https://github.com/cthummel/IntroductionToPython

This can be done on the command line with:

```git clone https://github.com/cthummel/IntroductionToPython```

or by selecting the big green button and downloading the zip.

## 0. General Python Info <a name="generalInfo"></a>

First we will load some packages that will be useful later on. Typically you put all of your package imports at the top of the file. Here we install NumPy and SciPy and then import NumPy, a package that has many useful data structures and commands.

In [2]:
!python3 -m pip install numpy scipy

import numpy as np
#import scipy
import sys, math, random



Next let's cover commenting. Comments are easy in Python. Everything after a # is ignored until the end of that line.

In [3]:
#This line has been commented out.
print("After this, the following bits have been commented out.") #See you dont get the rest of this text.
#or this.
#or even this.

After this, the following bits have been commented out.


Comments are useful for your own bookkeeping about what functions or variables do within your program.

Lines in Python do not need a trailing semicolon. That said, whitespace is important. Each new line is treated as a new statement, so a statement normally ends where its line ends. Furthermore, as we will see later with if/else, for, and while statements, the indentation of the lines that follow determines which code belongs to the block.

In [None]:
if 2 + 2 == 4:
    print("Santa is real.")
else:
    print("I cant believe math doesnt work.")
    
for word in ["This", "is", "an", "array", "of", "words."]:
    print(word)
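
To see why the indentation matters, here is a minimal sketch (the list and the variable names are made up for illustration). Moving a line into or out of the indented block changes how often it runs.

```python
# Indentation decides which lines belong to the loop body.
total = 0
for n in [1, 2, 3]:
    total += n           # indented: runs once for every item
print("Sum:", total)     # not indented: runs once, after the loop ends
```

If the `print` line were indented to match `total += n`, it would run three times instead of once.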

## 1. Printing: <a name="printing"></a>
To start with, let's do the classic "Hello World" example. Below is a code block written in Python. If you select the cell and click "run" at the top, or press Shift+Enter, it will run the code.


In [None]:
print("Hello World")

The print() command outputs everything within it to the console. This is an easy way to check the contents of variables or keep track of where you are in your program. Leaving lots of print commands in your code is generally considered bad practice, but they are useful as an initial debugging technique.

In [None]:
print(3)
print("This works?")
print("You", "can", "have", "multiple", "items")
print([1,2,3,4])

#Feel free to add your own print commands below.
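
As a small extension of the examples above, the sketch below shows a few options that `print` accepts, plus an f-string for mixing variables into text. The variable names are made up for illustration.

```python
name = "World"
count = 3

# sep controls what goes between items (a single space by default).
print("Hello", name, sep=", ")

# end controls what is printed after the last item (a newline by default).
print("Loading", end="...")
print("done")

# f-strings let you drop variables straight into the text.
message = f"{name} has {count} items"
print(message)
```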




## 2. Variables: <a name="variables"></a>

Typically we won't be handling hard-coded values. Instead we will want to use variables to hold data that we assign dynamically.

Variables are pretty easy to handle in Python. Since it is a dynamically typed language, types are determined at runtime rather than declared in advance. That means when you name a variable, you don't need to tell the program what type of data it holds. Python works it out from the value you assign.

In [None]:
variables = 3
are = 4.0
easy = "This is more than one word..."
butPleaseUseGoodNamingSchemes = 'a'

a = 10
b = 15
c = a * b

d, e, f = 20, 25, 30

print(c, math.log(c), easy, butPleaseUseGoodNamingSchemes)
print(d, e, f, d + e)



#As a quick side note, ^ is bitwise XOR, not "raised to the power of". Use ** for powers.
print(a, a^2, a * a, a**2)
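
To illustrate the two points above, here is a minimal sketch (names are made up for illustration). It checks the type Python has inferred for a variable, then compares `^` with `**`.

```python
# The same name can hold different types; Python tracks the type of the value.
value = 3
kind_before = type(value).__name__
value = "three"
kind_after = type(value).__name__
print(kind_before, kind_after)

# ^ is bitwise XOR, ** is exponentiation.
a = 10
xor_result = a ^ 2       # 0b1010 XOR 0b0010
power_result = a ** 2
print(xor_result, power_result)
```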

## 3. Loops: <a name="loops"></a>

Much of the work you will do in Python involves some sort of repetitive process. For example, if you wanted to calculate the mean of a data set, you would need to sum all the values and then divide by the number of values. To do that we would want to loop through the data set.

In [None]:
example_data_set = [29, 63, 74, 13, 44, 18, 53, 8, 68, 92]

for x in example_data_set:
    print(x)

In [None]:
i = 0
while i < 10:
    #We can use if statements if we want to do different things within the loop.
    if i == 9:
        print("Yikes we are at 9")
    elif i == 5:
        i += 1
        continue
    else:
        print(i)
        
    #Always occurs.
    i += 1
    

In [None]:
summation = 0
for x in example_data_set:
    summation += x
    
#To check our work let's divide the sum by the length of the list. (In Python 3, / always does true division, so no conversion is needed.)
print("Our work:", summation / len(example_data_set))

#We can check our work using the numpy mean function
print("Numpy results:", np.mean(example_data_set))
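
For comparison, the same mean can be computed without writing the loop by hand. This is a sketch using the built-in `sum` and the standard library's `statistics.mean`. The list is a copy of the one above under a different name.

```python
import statistics

example_values = [29, 63, 74, 13, 44, 18, 53, 8, 68, 92]

# sum() adds up the items, len() counts them.
builtin_mean = sum(example_values) / len(example_values)

# The standard library also ships a ready-made mean function.
stats_mean = statistics.mean(example_values)

print(builtin_mean, stats_mean)
```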



The most important thing to keep in mind with loops is the looping condition. You need to make sure that it fits your needs as well as having a clear end point. A poorly made loop can run infinitely or break before you want it to.
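
The sketch below illustrates both halves of that advice with made-up values: a `for` loop that stops early with `break`, and a `while` loop whose body moves toward its exit condition on every pass.

```python
example_values = [29, 63, 74, 13, 44]

# enumerate gives the position and the value together.
first_small = None
for index, value in enumerate(example_values):
    if value < 20:
        first_small = (index, value)
        break               # stop as soon as we have an answer

# A while loop needs something inside it that moves toward the exit condition.
countdown = 3
steps = 0
while countdown > 0:
    countdown -= 1          # without this line the loop would never end
    steps += 1

print(first_small, steps)
```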

## 4. Functions <a name="functions"></a>

Often we will want to run the same bit of code multiple times in multiple places. We can write our own functions to do this. The following is a simple function that takes no input and just prints a statement.

In [None]:
def sayHello():
    print("Dont tell me what to do.")
    
sayHello()

In general functions will take parameters and may or may not return objects/values to the caller. Below are some examples.

In [None]:
def mean(data):
    result = 0
    for item in data:
        result += item
    #In Python 3, / is true division, so multiplying by 1.0 is not needed.
    return result / len(data)

def raiseToThePower(number, power):
    return number ** power

def giveMeLotsOfStuff():
    return 1, 2, 3, "Hi", 5, 6

temp_array = [1,2,3,4,5,6,7,8,9,10]
print(mean(temp_array))

#Parameters by default fill in order but, if you want, you can specify which parameter you want to fill.
print(raiseToThePower(5,3))
print(raiseToThePower(power=3, number=5))

#You can return multiple items.
a, b, c, d, e, f = giveMeLotsOfStuff()
print("Multiple Items:", a, b, c, d, e, f)
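
Two related features are worth a quick sketch: default parameter values, and the fact that multiple return values come back packed in a tuple. The function names here are hypothetical variants of the ones above, not part of the original class code.

```python
def raiseToThePowerWithDefault(number, power=2):
    """Return number raised to power; power defaults to 2 if omitted."""
    return number ** power

squared = raiseToThePowerWithDefault(5)           # uses the default power
cubed = raiseToThePowerWithDefault(5, power=3)    # overrides the default

def giveMeTwoThings():
    return "Hi", 5

# If you do not unpack them, multiple return values arrive as a single tuple.
things = giveMeTwoThings()

print(squared, cubed, things)
```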

## 5. Data Reading <a name="read"></a>

This section puts together what we have covered so far. We want to write a function that takes a file path and returns the data set stored in that file.

In [None]:
def readData(filename):
    header = ""
    dataset = []
    
    with open(filename) as f:
        #Grab the header at the top of file for later.
        header = np.array(f.readline().strip().split(","))
        
        #Get each row from the file.
        for line in f:
            row = line.strip().split(",")
            dataset.append(row)
            
    return header, np.array(dataset)

header, dataSet = readData("Data/risk_factors_cervical_cancer.csv")
print(dataSet[0])


rowCount, colCount = np.shape(dataSet)
print(rowCount, colCount)
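
Splitting on commas works when no field contains a comma itself. As a hedged alternative, the sketch below uses the standard library's `csv` module, which also handles quoted fields. The sample text is made up and stands in for a real file so the example runs on its own.

```python
import csv
import io

# Made-up sample text standing in for a file; note the comma inside the quotes.
sample_text = 'Age,Note\n18,"smoker, former"\n15,?\n'

reader = csv.reader(io.StringIO(sample_text))
sample_header = next(reader)             # the first row is the header
sample_rows = [row for row in reader]    # the remaining rows are the data

print(sample_header)
print(sample_rows)
```

With a real file you would pass the open file object to `csv.reader` in place of `io.StringIO(sample_text)`.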

## 6. Data Manipulation <a name="manip"></a>

Now that we have read in the data set, let's fix it up a bit. Right now each entry is stored as a string rather than a number. We also need to decide what to do with the missing data, which shows up in the file as "?".

In [None]:
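def convertToFloat(data):
    #Read the dimensions from the array itself instead of relying on global variables.
    rows, cols = data.shape
    for i in range(rows):
        for j in range(cols):
            if data[i,j] == "?":
                #Mark missing values with -1.
                data[i,j] = -1
    #np.float was removed in NumPy 1.24; the built-in float works in every version.
    return data.astype(float)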

    
dataSet = convertToFloat(dataSet)
print(dataSet[0])
print(dataSet[0, 0])
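
The nested loops above can also be written as a single array operation. This is a sketch on a tiny made-up array, not the class data set, using `np.where` to swap every "?" for "-1" before converting.

```python
import numpy as np

# Made-up string data with "?" marking the missing entries.
raw = np.array([["18", "4.0", "?"],
                ["15", "?", "1.0"]])

# raw == "?" builds a True/False mask; np.where picks "-1" wherever it is True.
cleaned = np.where(raw == "?", "-1", raw).astype(float)

print(cleaned)
```

Unlike the loop version, this leaves the original `raw` array untouched and returns a new one.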

Now let's check that missing data. If we find a lot of -1s in a column, that column is missing a lot of data.

In [None]:
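def checkMissingData(data):
    #Read the dimensions from the array itself instead of relying on global variables.
    rows, cols = data.shape
    missingFraction = np.zeros(cols)
    for j in range(cols):
        for i in range(rows):
            if data[i,j] == -1:
                missingFraction[j] += 1
    #Divide the counts by the number of rows to get the fraction missing per column.
    return missingFraction / rows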

print(np.around(checkMissingData(dataSet), 3))
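
The same per-column check can be sketched without explicit loops. The array below is made up for illustration. Comparing it to -1 gives a True/False array, and taking the mean down each column gives the fraction of True values.

```python
import numpy as np

# Made-up numeric data where -1 marks a missing value.
sample = np.array([[18.0, -1.0,  1.0],
                   [15.0, -1.0, -1.0],
                   [34.0,  2.0, -1.0],
                   [52.0, -1.0,  0.0]])

# axis=0 averages down the rows, giving one number per column.
missing_fraction = (sample == -1).mean(axis=0)

print(missing_fraction)
```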

So we can see that we have two columns (26 and 27) with a high percentage of missing values. Let's remove those two columns. Keep in mind that we also need to remove the corresponding headers, or the header and the data will be out of sync.

In [None]:
#np.delete(array, indices to delete, axis) where axis 0 means rows and axis 1 means columns.
dataSet = np.delete(dataSet, [26, 27], 1)
header = np.delete(header, [26, 27], 0)
print(dataSet[0])
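
To make the `axis` argument concrete, here is a minimal sketch on a small made-up array and header. Note that `np.delete` returns a new array rather than changing the original, which is why the result is assigned back to a name.

```python
import numpy as np

sample = np.arange(12).reshape(3, 4)          # 3 rows, 4 columns
sample_header = np.array(["a", "b", "c", "d"])
drop = [1, 3]                                 # column indices to remove

# axis=1 removes columns; the header is 1-D, so the same indices are removed along axis 0.
trimmed = np.delete(sample, drop, axis=1)
trimmed_header = np.delete(sample_header, drop, axis=0)

print(trimmed_header)
print(trimmed)
```

Using one shared list of indices for both calls keeps the header and the data in sync.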