# A quick tour of Python to understand basic Programming Structures


Spring 2017 - Prof. Foster Provost

Teaching Assistant: Maria L Zamora Maass


***

This notebook shows examples of Python built-in functions, packages, and programming structures useful for Data Science and Business Analytics. It is a modified version of a notebook by [Rob Moakler](https://github.com/rmoakler/learning-data-science/blob/master/Spring%202016/Hands-on/Module%201%20-%20Python%20and%20IPython%20Notebooks/IPython%20Notebook%20Tour.ipynb).

IPython notebooks are made up of cells. There are two basic types of cells in an IPython notebook: text cells for comments, and code cells for commands. You can edit a cell by double-clicking on it. You can return it to display mode (that is, run the cell) by pressing the "Play" (▶) button, and you can stop a running cell with the "square" (◼︎) button. All the tasks for cells can be found in the toolbar.



## Text cells

To write text in a cell we must select the cell and go to the toolbar to change it from "code" to "markdown". Now, you can write and do text formatting:

- The hash (number sign) \# is used for titles
- Single \*asterisks\* or \_underscores\_ to emphasize things: _example_. 
- Double **asterisks** to make things bold. 
- Square Brackets [ ] are for links and images
- Also, HTML code is allowed. Some resources can be found in [HTML w3schools](http://www.w3schools.com/html/html_examples.asp) <p style="color:red;">This is a text with HTML code.</p>

- And you can write math with $\LaTeX$ (this is a typesetting system for the production of scientific documentation https://www.latex-project.org/): This can be achieved by wrapping it in dollar signs, $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. If you don't know how to write a symbol, you can go to [Detexify](http://detexify.kirelabs.org/classify.html).





If you are ever stuck, just Google **"Markdown syntax"**, since the formatting language used here is called Markdown.

## Code cells

Now, we will see a code cell (this is the default format of any cell). Here, we simply type any Python command and then click "Play". When we play a cell, the code in it is executed and it returns what we are asking for. Some of this information can be "remembered" as long as we keep this window open (session running). Code cells will always start with "`In [ ]:`". 

For example, in the following cell, I want it to remember that the **_VARIABLE_** called "x" is just the sum of two given numbers and I also want it to print the sum. Please select it and press the ▶ button to run it.

In [1]:

x = 5 + 5
print "The value of the 'x' variable is " + str(x) + "."


The value of the 'x' variable is 10.


Instead of having to manually go and click the "Play" button every time, you can also run a cell with your keyboard. Just press **Ctrl + Enter** or **Shift + Enter**. Experiment with both of those to see what the difference is.

We typically read IPython notebooks from **top to bottom**. This means that if a cell relies on a variable or function that was created earlier in the notebook, you must run that earlier cell first to make it available in later cells (_we cannot use "x" in another cell unless we have run this one first_)!

_Note_: The number in the "In [#]:" label increases by one every time you run a code cell.

## Python commands 

### 1. Variables, operations and data types
Variables are used to store data. 
This data can be of a variety of types: 

- Integer numbers
- Floating-point numbers (decimal numbers)
- Strings

Let's create three variables, one of each type, with three different names:

In [2]:
some_integer = 5
some_float = 7.12
some_string = "Student"

We can print out these variables. Remember we should run the previous cell first!

In [3]:
print some_integer
print some_float
print some_string

5
7.12
Student
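
If you are unsure what type a variable holds, the built-in `type()` function can tell you. The sketch below recreates the three variables so that it runs on its own. It calls `print` with parentheses so that it also works in Python 3 (the cells in this notebook use the Python 2 `print` statement). The names `integer_type`, `float_type` and `string_type` are only illustrative.

```python
some_integer = 5
some_float = 7.12
some_string = "Student"

# type() returns the type of the value stored in a variable
integer_type = type(some_integer)
float_type = type(some_float)
string_type = type(some_string)

print(integer_type)
print(float_type)
print(string_type)
```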


What if I want to print some text and then some numbers? One easy way is to join the pieces with `+`, but joining text this way will always **want** string data. 

If you have data that is not a string (like an integer or float), you can **convert** it to a string: 

In [4]:
print "My integer is " + str(some_integer) + "."
print "My float converted into integer is " + str( int(some_float) ) + "."

My integer is 5.
My float converted into integer is 7.
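
As an alternative to joining strings with `+`, the string method `format()` fills `{}` placeholders and does the conversion for you. This is a minimal sketch, not the approach used in the rest of this notebook. It also shows that `int()` cuts off the decimal part instead of rounding. The values `7.99` and the names `message`, `truncated` and `rounded` are made up for illustration, and `print` is called with parentheses so the code also runs in Python 3.

```python
some_integer = 5
some_float = 7.12

# Each {} placeholder is replaced, in order, by an argument of format()
message = "My integer is {} and my float is {}.".format(some_integer, some_float)
print(message)

# int() drops the decimal part; round() goes to the nearest whole number
truncated = int(7.99)
rounded = round(7.99)
print(truncated)
print(rounded)
```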


What else can we do with our variables? We can do basic math: **operations**.

In [5]:
print "sum " + str( some_integer + some_float )
print "multiplication " + str ( some_integer * some_float )
print "quotient " + str( some_integer / some_float )
print "power " + str( 10**some_integer )

sum 12.12
multiplication 35.6
quotient 0.702247191011
power 100000
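
One caveat about the quotient: in Python 2, which this notebook uses, `/` between two integers drops the decimal part, while in Python 3 it returns a decimal number. The sketch below sticks to operators that behave the same in both versions. The variable names are illustrative, and `print` uses parentheses so the code runs in either version.

```python
# // is floor division and % is the remainder; both work the same in Python 2 and 3
whole_part = 7 // 2
remainder = 7 % 2

# Making one operand a float gives a decimal result in either version
exact_quotient = 7 / 2.0

print(whole_part)
print(remainder)
print(exact_quotient)
```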


We can store this as a new variable and print it:

In [6]:
my_sum = some_integer + some_float
print "Sum variable: " + str( my_sum )

Sum variable: 12.12


There are also other **data structures**:

- Lists (or arrays)
- Dictionaries
- Sets


In [7]:
some_list = [0,0,1,2,3,3,4.5,7.6]
some_dictionary = {'student1': '(929)-000-0000', 'student2': '(917)-000-0000', 'student3': '(470)-000-0000'}
some_set = set( [1,2,4,4,5,5] )

print "This is a list:  " + str(some_list)
print "This is a dictionary:  " + str( some_dictionary )
print "This is a set:  " + str( some_set )

This is a list:  [0, 0, 1, 2, 3, 3, 4.5, 7.6]
This is a dictionary:  {'student3': '(470)-000-0000', 'student2': '(917)-000-0000', 'student1': '(929)-000-0000'}
This is a set:  set([1, 2, 4, 5])


How can we use  **individual** elements? 

In Python (and many other languages), we count the elements of a _list_ starting from zero! To get the first item we look at position 0:


In [8]:
print some_list[0]

0


Adding things to the list is as easy as appending them,

In [9]:
some_list.append(5)
print some_list

[0, 0, 1, 2, 3, 3, 4.5, 7.6, 5]
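
Besides picking a single element, lists support negative indexes and slices. The following sketch recreates the list as it looks after the append so that it runs on its own. The names `last_item`, `first_three` and `from_third_on` are illustrative, and `print` is written with parentheses so the code also runs in Python 3.

```python
some_list = [0, 0, 1, 2, 3, 3, 4.5, 7.6, 5]

last_item = some_list[-1]      # negative indexes count from the end
first_three = some_list[0:3]   # slice from index 0 up to, but not including, 3
from_third_on = some_list[2:]  # slice from index 2 to the end

print(last_item)
print(first_three)
print(from_third_on)
```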


How can we call an element (**VALUE**) of a _dictionary_ ?  Use the **"KEY"** !! 

In [10]:
print some_dictionary['student1']

(929)-000-0000
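
Dictionaries can also grow, and it is often useful to check whether a key exists before asking for it. A small sketch, assuming a shortened copy of the dictionary above; the keys `'student4'` and `'student9'` and the extra phone number are made up for illustration. `print` uses parentheses so the code also runs in Python 3.

```python
some_dictionary = {'student1': '(929)-000-0000', 'student2': '(917)-000-0000'}

# Assigning to a new key adds an entry ('student4' and its number are made up)
some_dictionary['student4'] = '(212)-000-0000'

# "in" checks whether a key exists; asking for a missing key with [] raises a KeyError
has_student4 = 'student4' in some_dictionary
has_student9 = 'student9' in some_dictionary

# get() returns a default value instead of raising an error when the key is missing
missing_number = some_dictionary.get('student9', 'unknown')

print(has_student4)
print(has_student9)
print(missing_number)
```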


### 2. Create functions

Functions allow us to do predefined operations, or in other words, to do encapsulation of procedures. If we knew we had to do some operation many times, and wanted to save a bit of time, instead of writing the same code many times, we could define functions. 

For example, consider having to calculate the area of a circle.

In [11]:
def area_of_a_circle(radius):
    area = 3.1416 * radius * radius
    return area

In [12]:

circle_area = area_of_a_circle(5)
print "Area of a circle with radius 5 is: " + str( circle_area)


Area of a circle with radius 5 is: 78.54


Can you see what is going on here? My function that I helpfully named `"area_of_a_circle"` takes one **argument** that we will call radius. It then uses this radius to get the area and then *returns* it. Now, whenever I want to get the area of some circle, I simply call `area_of_a_circle()` and place the radius in the middle of the parentheses.
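
The function above approximates pi as 3.1416. If more precision is needed, one option is the constant `math.pi` from Python's standard `math` module. The function name `area_of_a_circle_precise` is hypothetical and only serves to keep it apart from the original function; `print` uses parentheses so the code also runs in Python 3.

```python
import math

def area_of_a_circle_precise(radius):
    # math.pi holds pi to the full precision of a Python float
    return math.pi * radius ** 2

precise_area = area_of_a_circle_precise(5)
print("Area of a circle with radius 5 is: " + str(precise_area))
```

The result differs from 78.54 only after the third decimal place, so for this course either version is fine.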

### 3. Loops / iterations

We will be doing a lot of repetitive things in Python. This doesn't mean we need to do a ton of copy and pasting, though. We can use **loops** to make this easy. For example, if we wanted to square each number from 1 to 5,

In [13]:
for number in [1, 2, 3, 4, 5]:
    print number * number

1
4
9
16
25
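
Typing out the list of numbers works for five values, but not for five hundred. A common variation, sketched here, lets `range()` generate the numbers and stores the results in a list instead of only printing them. The name `squares` is illustrative, and `print` uses parentheses so the code also runs in Python 3.

```python
squares = []
# range(1, 6) yields 1, 2, 3, 4, 5; the end value 6 is not included
for number in range(1, 6):
    squares.append(number * number)

print(squares)
```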


Let's use the function we defined before. Remember that this function can only be used in **this notebook** 

( unless we write it in a **"script"** file, but we'll see that later!! ):

In [14]:
for number in [1, 2, 3, 4, 5]:
    print "Area of circle with radius " + str(number) + " is: " + str( area_of_a_circle(number) )

Area of circle with radius 1 is: 3.1416
Area of circle with radius 2 is: 12.5664
Area of circle with radius 3 is: 28.2744
Area of circle with radius 4 is: 50.2656
Area of circle with radius 5 is: 78.54


### 4. Conditionals and comparisons

Sometimes we want to check something before deciding what to do next. For example,

In [15]:
def is_best_prof(name):
    if name == "Foster":
        return "Yes!"
    else:
        return "No!"

In [16]:
print is_best_prof("Foster")

Yes!


In [17]:
print is_best_prof("John")

No!


As we can see here, we made a **comparison** of names with the "equal" operator (==).

Other comparisons:

- strictly less than  < 
- less than or equal  <=
- strictly greater than  >
- greater than or equal  >=
- not equal  !=
- object identity  "is"
- negated object identity "is not"
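
The difference between equality (`==`) and identity (`is`) is easiest to see with two lists that hold the same values. This is a minimal sketch with made-up variable names; `print` uses parentheses so the code also runs in Python 3.

```python
list_a = [1, 2, 3]
list_b = [1, 2, 3]   # a separate list with the same contents
list_c = list_a      # a second name for the very same list

same_value = (list_a == list_b)      # True: the contents are equal
same_object = (list_a is list_b)     # False: they are two distinct objects
same_object_c = (list_a is list_c)   # True: both names point to one object

print(same_value)
print(same_object)
print(same_object_c)
```

In practice, `==` is what you want when comparing values such as numbers and strings.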

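What if we want to check more than one condition? 
We can combine comparisons with **boolean operations**:

- "and": true only when both conditions are true
- "or": true when at least one condition is true

The symbols "&" and "|" are the **bitwise** versions of these operators. They give the same result on True/False values, but they are applied before comparisons, so each comparison must be wrapped in parentheses (as in the function below).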

Let's see if you can guess my age with this function!!


In [18]:
def is_my_age(age_argument):
    if age_argument < 21:
        return "Of course not!"
    elif (age_argument >= 21) & (age_argument <= 25):
        return "Maybe.."
    elif age_argument > 30:
        return "Don't even think about it!"
    # Ages 26 to 30 match none of the branches, so the function returns None for them

In [19]:
print is_my_age(10)

Of course not!


In [20]:
print is_my_age(23)

Maybe..


In [21]:
print is_my_age(40)

Don't even think about it!
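
The range check in the middle branch of `is_my_age` can be written in other ways. The sketch below shows the keyword `and`, a chained comparison, and what goes wrong when `&` is used without parentheses. The helper names `is_in_range` and `is_in_range_chained` are hypothetical and not part of the original notebook; `print` uses parentheses so the code also runs in Python 3.

```python
def is_in_range(age_argument):
    # "and" is Python's logical operator for combining two conditions
    return age_argument >= 21 and age_argument <= 25

def is_in_range_chained(age_argument):
    # Python also allows chaining comparisons, which reads like the math notation
    return 21 <= age_argument <= 25

# Without parentheses, & is applied before the comparisons, so this line
# is read as 40 >= (21 & 40) <= 25, which is not the intended range check
unparenthesized = 40 >= 21 & 40 <= 25

print(is_in_range(23))
print(is_in_range(40))
print(is_in_range_chained(23))
print(unparenthesized)
```

The last line prints `True` even though 40 is not between 21 and 25, which is why `and` is usually the safer choice for combining conditions.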


### 5. Packages and built-in functions

Python has a ton of packages that make doing complicated stuff very easy. We won't discuss how to install packages, or give a detailed list of what packages exist, but we will give a brief description of how they are used. An easy way to see why packages are useful is to think: "**Python packages give us access to MANY functions!**".

These are pre-defined functions that will make our life easier!! Python also has built-in functions that need no package at all (e.g., the function `str()` that we used to convert numbers into strings).

In this class we will use four packages very frequently: `pandas`, `sklearn`, `matplotlib`, and `numpy`:

- **`pandas`** is a data manipulation package. It lets you store data in data frames. More on this next class.
- **`sklearn`** is a machine learning and data science package. It lets you do fairly complicated machine learning tasks, such as running regressions and building classification models, with only a few lines of code!
- **`matplotlib`** lets you make nice looking plots.
- **`numpy`** (pronounced num-pie) is used for doing "math stuff" such as more complex math operations (e.g., square roots, exponents, logs) and gives you matrix operation abilities.

If it's confusing as to why this is useful, don't worry. As we use them throughout the semester, their usefulness will become apparent.

To make the contents of a package useful, you need to import it:

In [22]:
import pandas
import sklearn
import matplotlib
import numpy

Sometimes you will want to use short names for packages. This has just become the norm now, so we will often be doing it so that we fit in with all the professional programmers.

In [23]:
import pandas as pd
import numpy as np
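
The three common forms of `import` can be compared side by side. The sketch below uses Python's standard `math` module instead of numpy so that it runs without any extra installation; the alias `m` is an arbitrary choice made for this illustration. `print` uses parentheses so the code also runs in Python 3.

```python
import math                 # use as math.sqrt(...)
import math as m            # use as m.sqrt(...); "m" is an arbitrary short name
from math import sqrt       # use as sqrt(...), with no prefix

root_full = math.sqrt(25)
root_alias = m.sqrt(25)
root_direct = sqrt(25)

print(root_full)
print(root_alias)
print(root_direct)
```

All three calls reach the same function; the forms only differ in how much you have to type and how clear it is where a function comes from.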

We can now use some package specific things. For example, numpy has a function called `sqrt()` which will give us the square root of a number. Since it is part of numpy, we need to tell Python that's where it is by using a dot.

In the following cell you can also see how to write **comments** in your code (professional programmers write comments to help somebody else understand their code). You should always write commands and procedures so that they are understandable and straightforward.

In [24]:

# In this part of the code I am using numpy (np) functions

print "Square root: " + str ( np.sqrt(25) )
print "Maximum element of our previous list: " + str( np.max(some_list) )

# In this part of the code I am using python functions

print "Number of elements in our previous list: " + str( len(some_list) )
print "Sum of elements in our previous list: " + str( sum(some_list) )
print "Range of 5 numbers (remember we start with 0): " + str( range(5) )



Square root: 5.0
Maximum element of our previous list: 7.6
Number of elements in our previous list: 9
Sum of elements in our previous list: 26.1
Range of 5 numbers (remember we start with 0): [0, 1, 2, 3, 4]


What about **pandas** ?? The basic aspect of this package is the concept of **DATAFRAMES**. 

A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is generally the most commonly used pandas object. Along with the data, you can optionally pass index (row labels) and columns (column labels) arguments. If you pass an index and / or columns, you are guaranteeing the index and / or columns of the resulting DataFrame. [More details here](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe).

This is how it looks:


In [25]:
list1 = ['studentA',22,'(929)-000-000']
list2 = ['studentB',27,'(646)-000-000']
list3 = ['studentC',30,'(917)-000-000']

pd.DataFrame([list1,list2,list3],columns=['Name','Age','Mobile'])


| | Name | Age | Mobile |
|---|---|---|---|
| 0 | studentA | 22 | (929)-000-000 |
| 1 | studentB | 27 | (646)-000-000 |
| 2 | studentC | 30 | (917)-000-000 |
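
Once the data is in a DataFrame, columns can be selected by their label and rows can be filtered by a condition. This is a small sketch that assumes pandas is installed; the variable names `students`, `ages`, `average_age` and `older_students` and the age threshold of 25 are made up for illustration. `print` uses parentheses so the code also runs in Python 3.

```python
import pandas as pd

students = pd.DataFrame(
    [['studentA', 22, '(929)-000-000'],
     ['studentB', 27, '(646)-000-000'],
     ['studentC', 30, '(917)-000-000']],
    columns=['Name', 'Age', 'Mobile'])

# Square brackets with a column label select that column
ages = students['Age']
average_age = ages.mean()

# A comparison on a column gives True/False per row, which can filter the rows
older_students = students[students['Age'] > 25]

print(average_age)
print(older_students)
```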


We'll see more and more functions during the semester and you can always look for them (remember, Google is your best friend)!!

#### 5.1. Auto complete for packages

One of the most useful things about IPython notebook is its tab completion. 

Try this: click just after `sqrt(` in the cell below and press `Shift + Tab` 4 times, slowly

In [None]:
np.sqrt(

I find this amazingly useful. I think of this as "the more confused I am, the more times I should press Shift+Tab". Nothing bad will happen if you tab complete 12 times.

Okay, let's try tab completion for function names! Just hit `Tab` when typing below to get suggestions.

In [None]:
np.sq

This is super useful when you forget the names of everything!
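
The same information is also available from code, which helps when you are not working in a notebook. The sketch below uses the built-in `dir()` function and a function's `__doc__` attribute. It relies on the standard `math` module instead of numpy so that it runs without extra packages; the variable names are illustrative, and `print` uses parentheses so the code also runs in Python 3.

```python
import math

# dir() lists the names a package offers, similar to what Tab suggests
names_starting_with_sq = [name for name in dir(math) if name.startswith('sq')]
print(names_starting_with_sq)

# The docstring is the help text that Shift + Tab displays
sqrt_help = math.sqrt.__doc__
print(sqrt_help)
```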

## 6. Help, help, and more help!

- [Codecademy's Python Course](https://www.codecademy.com/learn/python). Working through this class will give you a _great_ foundation for Python.
- [Dive Into Python](http://www.diveintopython.net/toc/index.html) online book. Working your way from chapter 1 through chapter 5 would put you in a great place!
- [Python for Data Analysis](https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython-ebook/dp/B009NLMB8Q/ref=mt_kindle?_encoding=UTF8&me=) was the book that Prof. Foster suggested to me when I was taking this course. You can take a look at the chapters: Preliminaries, Introductory Examples (e.g. "Counting Time Zones with pandas"), IPython (pages 46 to 62) and, especially, pandas.


If you are ever stuck just remember: it is normal. This is actually how professional programmers work every day. Google is your best friend, and websites such as Stackoverflow.com have an answer to almost any programming question!


## 7. Hands-on

To master your newfound knowledge of Python, you should try these hands-on examples. 

Your homework will be in a similar format to this section.

**1\. Create one list of 5 fruits and another one with 5 colors**

**2\. Go through each fruit (first list) and print out the name of the fruit with one color from the second list**

(don't worry, it doesn't have to be the color of the fruit!)

Example of what you should print:  _apple is purple_

**3\. Add two new fruits to your list with a _BUILT-IN_ function**

( Look for the function with the **TAB** hint! )

**4\. Use the list of fruits and sort the names (put them in alphabetical order)**

( Hint: Numpy has a great function for that!)

**5\. Create a new empty list called "count_letters". Go through your list of fruits, and for each one, add an entry to that new list (count_letters) with the number of letters in the fruit's name.**

Example of what you should print: _apple is 5 letters_


**6\. Make a function called `one_more_change` that takes a list (input_list) and returns a new list (output_list) where each element of the original will be increased by 1 and divided by 2.**

In [None]:
def one_more_change(input_list):
    output_list = []   # What should we do?
    
    return output_list

**7\. Use the previous function to change the values of a list of 10 random numbers.**