# Business Analytics - Unit 02

## Lab 01 - Python Basics

### Introduction

The purpose of this lab is to introduce the basics of Python and the IPython notebook to non-programmers and new business analysts. After completing this lab the student should be able to:

1. Work comfortably with the IPython Notebook
2. Write output to the screen
3. Read and write text files
4. Use basic programming control structures
5. Use standard Python data structures

### 1. IPython Introduction

*Please see the [unit 02 slides](https://docs.google.com/presentation/d/1SGOeJent24JNuDxn5RuQRXK81shfn1sxbMxsymnNkD0/edit?usp=sharing) for a description of the IPython user interface.*

IPython notebooks consist of a series of blocks containing markdown text, executable blocks of Python code, and output blocks (the output of the code you've executed). You can choose between these different kinds of blocks in the little window above. Executable blocks are "Code" and annotations are "Markdown." Business analysts use IPython notebooks to share and document the code in their data projects.

**Side Note:** *If you want to put a text comment in your code block, simply preface the line with the '#' character and Python will not try to execute that line.*
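
As a small illustration of the side note, the sketch below shows a comment on its own line and a comment after a statement. The variable name `greeting` is only an example.

```python
# This whole line is a comment, so Python does not execute it.
greeting = 'Hello'  # A comment can also follow code on the same line.
print(greeting)
```

Only the assignment and the `print` call are executed; everything after a `#` is ignored.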

### 2. Printing to the screen

This example will demonstrate how to execute Python code that will print to your screen. It will also help you understand how to use IPython and help build "muscle memory".

To execute the block, just place your cursor in the code block and click the run button in the tool bar above.

In [2]:
print('Hello, World!')

Hello, World!


See how simple that was! The output should have printed the text between the quote marks below the code block. 

You can also print variables from your code block:

In [3]:
message = 'Hello, World!'
print(message)

Hello, World!


Finally, let's look at combining the two types of print statements we used:

In [4]:
print('To you I say:', message)

To you I say: Hello, World!


Notice that Python automatically added the space between the quoted text string and the variable message. This example only works if you ran the code block before it, which assigns 'Hello, World!' to the *message* variable; otherwise you would get an error!
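
The space comes from the default separator that `print` places between its arguments. A minimal sketch of how that separator can be changed, or avoided by building the text first (the name `combined` is just for illustration):

```python
message = 'Hello, World!'

# Default behaviour: print() joins its arguments with a single space.
print('To you I say:', message)

# The optional sep argument replaces that space with another separator.
print('To you I say:', message, sep=' >>> ')

# An f-string builds the full text first, so the spacing is written explicitly.
combined = f'To you I say: {message}'
print(combined)
```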

### 3. Reading from and Writing to a file

#### Writing to a file

In [7]:
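with open('output/test_output_file.txt', 'w') as g:
    # '\n' ends each line; write() does not add a line break on its own
    g.write('I am writing to the test_output_file.txt\n')
    g.write('If I can read this my write operation is working!\n')
# the with block closes the file automatically, so g.close() is not needed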

In this example we specified the file location with a relative path. The string 'output/test_output_file.txt' says: starting from the current working directory, go to the output directory and create a file called "test_output_file.txt".

This is a simple example. In more complex examples you can write processed data to a file for later use.
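
A minimal sketch of writing processed data to a file. The sales figures and the file name `sales_summary.txt` are made up for illustration, and the file is written to a temporary directory so the sketch runs without an existing output directory.

```python
import os
import tempfile

# Hypothetical sales figures, used only for illustration.
daily_sales = [120, 95, 143]

# Write to a temporary directory so the sketch runs anywhere.
output_path = os.path.join(tempfile.mkdtemp(), 'sales_summary.txt')

with open(output_path, 'w') as g:
    for day_number, amount in enumerate(daily_sales, start=1):
        # '\n' ends each line; write() does not add it for us.
        g.write(f'Day {day_number}: {amount}\n')
    g.write(f'Total: {sum(daily_sales)}\n')

# Read the file back to confirm what was written.
with open(output_path, 'r') as g:
    written_text = g.read()
print(written_text)
```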

You'll notice that the 'write' statements are indented. Python is really particular about indented code. If you remove the indent you will get an error message. Python uses indentation to indicate a control structure; in this example everything indented under the 'with open' statement runs while the file is open, and Python closes the file when the indented block ends.
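
The following sketch shows how indentation decides which lines belong to the 'with open' block. It checks the file's `closed` attribute inside and after the block; the file name `indent_demo.txt` and the temporary directory are assumptions made so the sketch is self-contained.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'indent_demo.txt')

with open(demo_path, 'w') as g:
    # Indented: these lines are part of the with block, so the file is open.
    g.write('Written inside the block\n')
    open_inside_block = not g.closed

# Not indented: the with block has ended and Python has closed the file.
closed_after_block = g.closed
print(open_inside_block, closed_after_block)
```

Both values are `True`: the file is open for the indented lines and closed once the indentation ends.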

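*Please note:* A copy of this file was placed on your PC when you cloned the git repo. Opening a file in 'w' (write) mode replaces any existing file of the same name, so rerunning the code block simply overwrites it. If you receive an error indicating that the file or directory does not exist, check that the output directory is present in your working directory.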

#### Reading from a file

In [8]:
with open('input/test_input_file.txt', 'r') as g:
    for line in g:
        # each line still ends with its own line break, so print() shows a blank line after it
        print(line)

This is the first line of the text file

This is the second line of the text file

This is the third line of the text file

This is the fourth line of the text file

I hope that you enjoy Business Analytics!



After executing this code you should see each line of the test_input_file printed in a result block in your notebook. This is a simple example. In more complex cases you can save the content of the file to variables for processing later.
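
As a sketch of saving the content for later processing, the block below collects the lines of a file in a list. It first creates its own small stand-in file (`sample_input.txt`, a made-up name) so that it does not depend on the lab's input directory.

```python
import os
import tempfile

# Create a small stand-in input file so the sketch is self-contained.
input_path = os.path.join(tempfile.mkdtemp(), 'sample_input.txt')
with open(input_path, 'w') as g:
    g.write('first line\nsecond line\nthird line\n')

lines = []
with open(input_path, 'r') as g:
    for line in g:
        # Each line read from a file still ends with '\n'; strip it off.
        lines.append(line.rstrip('\n'))

print(lines)
print('Number of lines:', len(lines))
```

Stripping the trailing line break also avoids the blank line that appears when a line is printed directly.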

### 4. Basic programming control structures

A control structure is a block of programming that analyzes variables and chooses a direction in which to go based on given parameters. The term flow control details the direction the program takes. It is the basic decision-making process in computing. For our purposes the focus will be on the most widely used statements.

#### if statement

In [11]:
var = 100
if var == 100:
    print("1 - Got a true expression value", var)

1 - Got a true expression value 100


The **if** statement allows you to check whether an expression is **TRUE**. If the expression is FALSE, the indented block is skipped and nothing happens.
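
A short sketch contrasting a condition that is true with one that is false. The list `messages` is only there to record which blocks ran.

```python
var = 100
messages = []

if var == 100:
    messages.append('var is 100')  # runs, because the test is True

if var == 200:
    messages.append('var is 200')  # skipped, because the test is False

print(messages)
```

Only the first message is recorded; the second block is passed over without any error.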

#### elif statement

In [10]:
var = 100
if var == 200:
    print("1 - Got a true expression value", var)
elif var == 150:
    print("2 - Got a true expression value", var)
elif var == 100:
    print("3 - Got a true expression value", var)

3 - Got a true expression value 100


The *elif* statement allows you to check multiple expressions for *TRUE* and execute a block of code as soon as one of the conditions evaluates to *TRUE*. If none of the expressions are TRUE then execution ends without anything happening. 
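
One consequence worth seeing in code: only the first condition that is true runs, even if a later one would also be true. A minimal sketch, using a made-up `score` value and grade boundaries:

```python
score = 85  # hypothetical value for illustration
grade = 'not assigned'

if score >= 90:
    grade = 'A'
elif score >= 80:
    grade = 'B'  # first True condition, so this branch runs
elif score >= 70:
    grade = 'C'  # would also be True for 85, but is never checked

print(grade)
```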

#### else statement

In [15]:
var = 100
if var == 200:
    print("1 - Got a true expression value", var)
elif var == 300:
    print("2 - Got a true expression value", var)
else:
    print("0 - No true expression value", var)

0 - No true expression value 100


An else statement can be combined with an if statement and elif statements. An else statement contains the block of code that executes if the conditional expressions in the if statement and every elif statement resolve to 0 or a FALSE value.
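
The sketch below illustrates what "0 or a FALSE value" means in practice: Python treats 0 and empty text as false, so those values fall through to the else block. The test values are arbitrary examples.

```python
branches = []

for value in [100, 0, 'text', '']:
    if value:
        branches.append('if')    # non-zero numbers and non-empty text count as true
    else:
        branches.append('else')  # 0 and '' count as false

print(branches)
```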

#### for loop

In [18]:
spanish_days_of_week = ['Lunes', 'Martes', 'Miércoles', 'Jueves', 'Viernes', 'Sábado', 'Domingo']
for day in spanish_days_of_week:
    print('The current day is', day)

The current day is Lunes
The current day is Martes
The current day is Miércoles
The current day is Jueves
The current day is Viernes
The current day is Sábado
The current day is Domingo


A for loop is a control flow statement for specifying iteration, which allows code to be executed repeatedly.
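
Two further sketches of iteration: repeating a calculation over a range of numbers, and looping over a list while keeping track of the position. The names `total` and `numbered` are illustrative.

```python
# Add up the numbers 1 to 5; range(1, 6) stops before 6.
total = 0
for number in range(1, 6):
    total = total + number
print(total)

# enumerate() supplies a counter alongside each item in the list.
days = ['Lunes', 'Martes', 'Miércoles']
numbered = []
for position, day in enumerate(days, start=1):
    numbered.append(f'{position}. {day}')
print(numbered)
```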

### 5. Python data structures

pandas is a Python library for data manipulation and analysis. It offers data structures and operations for manipulating numerical tables and time series. 

pandas is well suited for many different kinds of data:

* Tabular data, as in an SQL table or Excel spreadsheet
* Ordered and unordered time series data
* Arbitrary matrix data with row and column labels
* Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure.

The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. 
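
A minimal sketch of the two structures, built by hand from made-up ticket counts (the values and the column name `TICKETS` are assumptions for illustration, not taken from the lab data):

```python
import pandas as pd

# A Series is one labeled column of values.
ticket_counts = pd.Series([12, 7, 9], index=['MONDAY', 'TUESDAY', 'WEDNESDAY'])

# A DataFrame is a table: several columns that share one index.
tickets = pd.DataFrame({
    'DAY_OF_WEEK': ['MONDAY', 'TUESDAY', 'WEDNESDAY'],
    'TICKETS': [12, 7, 9],
})

print(ticket_counts['TUESDAY'])   # look up a value by its label
print(tickets.shape)              # (number of rows, number of columns)
print(type(tickets['TICKETS']))   # one column of a DataFrame is a Series
```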

In [1]:
import pandas as pd
df = pd.read_csv('input/parking.csv')

The first step is to load the pandas library. This is done by importing the library using the *import* keyword. In the previous code we gave the library the local alias *pd* when importing it; this allows us to call functions from the library with the short prefix *pd*.

To load the source data from a .csv file into a *dataframe* we call the *read_csv* function from the pandas library, passing the function the location of the source data.

We can inspect the data in the dataframe by calling the head() method on the dataframe. It displays the first 5 records.
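
The same steps can be tried without a file on disk. This sketch assumes a few made-up rows of CSV text held in memory (`csv_text` is hypothetical data, not the contents of parking.csv) and shows that head() also accepts the number of rows to display.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = """DAY_OF_WEEK,ISSUE_TIME,RP_PLATE_STATE
MONDAY,900,DC
TUESDAY,1034,MD
TUESDAY,1547,VA
WEDNESDAY,1210,DC
THURSDAY,1223,DC
THURSDAY,1849,MD
FRIDAY,815,VA
"""

# read_csv accepts a file path or a file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

first_rows = sample.head()   # first 5 rows by default
first_two = sample.head(2)   # or ask for a specific number of rows
print(first_rows)
print(first_two)
```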

In [2]:
df.head()

| | X | Y | OBJECTID | ROWID_ | DAY_OF_WEEK | HOLIDAY | WEEK_OF_YEAR | MONTH_OF_YEAR | ISSUE_TIME | VIOLATION_CODE | VIOLATION_DESCRIPTION | LOCATION | RP_PLATE_STATE | BODY_STYLE | ADDRESS_ID | STREETSEGID | XCOORD | YCOORD | TICKET_ISSUE_DATE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -76.994964 | 38.899564 | 32443136 | 7814262 | TUESDAY | 0 | 49 | 12 | 1034.0 | P002 | STAND OR PARK IN ALLEY | 800 BLOCK 8TH ST NE WEST SIDE | MD | VA | 802662 | 2041.0 | 400437 | 136856 | 2015-12-01T00:00:00.000Z |
| 1 | -77.066881 | 38.903706 | 32443137 | 7814263 | TUESDAY | 0 | 49 | 12 | 1547.0 | P039 | PARK AT EXPIRED METER | 3400 BLOCK WATER ST NW SOUTH SIDE | VA | 4D | 805850 | 5005.0 | 394199 | 137318 | 2015-12-01T00:00:00.000Z |
| 2 | -77.007824 | 38.95292 | 32443138 | 7814264 | TUESDAY | 0 | 49 | 12 | 1925.0 | P003 | RESIDENTIAL PERMIT PKING BEYOND LIMIT W/O PERMIT | 5200 BLOCK FORT TOTTEN DR NE WEST * | MD | UT | 802277 | 1890.0 | 399322 | 142779 | 2015-12-01T00:00:00.000Z |
| 3 | -77.032409 | 38.902172 | 32443139 | 7814265 | TUESDAY | 0 | 49 | 12 | 2041.0 | P281 | FAIL TO DISPLAY A MULTISPACE METER RECEIPT | 1400 K ST NW NORTH SIDE | VA | 4D | 240269 | | 397189 | 137146 | 2015-12-01T00:00:00.000Z |
| 4 | -77.035568 | 38.902531 | 32443140 | 7814266 | TUESDAY | 0 | 49 | 12 | 838.0 | P031 | UNAUTHORIZED VEHICLE IN LOADING ZONE | 1600 BLOCK K ST NW SOUTH SIDE | MD | VA | 801818 | 1512.0 | 396915 | 137186 | 2015-12-01T00:00:00.000Z |


pandas also allows us to query the data to filter down the results.

In [3]:
df.loc[df['DAY_OF_WEEK'] == 'THURSDAY'].head()

| | X | Y | OBJECTID | ROWID_ | DAY_OF_WEEK | HOLIDAY | WEEK_OF_YEAR | MONTH_OF_YEAR | ISSUE_TIME | VIOLATION_CODE | VIOLATION_DESCRIPTION | LOCATION | RP_PLATE_STATE | BODY_STYLE | ADDRESS_ID | STREETSEGID | XCOORD | YCOORD | TICKET_ISSUE_DATE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11513 | -77.015681 | 38.902526 | 32939470 | 8516313 | THURSDAY | 0 | 49 | 12 | 1223.0 | P281 | FAIL TO DISPLAY A MULTISPACE METER RECEIPT | 400 BLOCK K ST NW SOUTH SIDE | DC | 4D | 815026 | 12954.0 | 398640 | 137185 | 2015-12-03T00:00:00.000Z |
| 11514 | -77.046641 | 38.910202 | 32939471 | 8516314 | THURSDAY | 0 | 49 | 12 | 1849.0 | P199 | PARK IN A DESIGNATED ENTRANCE | 1600 BLOCK 21ST ST NW WEST SIDE | DC | 4D | 814335 | 12352.0 | 395955 | 138038 | 2015-12-03T00:00:00.000Z |
| 11515 | -77.02757 | 38.908578 | 32939472 | 8516315 | THURSDAY | 0 | 49 | 12 | 1626.0 | P012 | DISOBEYING OFFICIAL SIGN | 1200 BLOCK O ST NW NORTH SIDE | DE | 4D | 811790 | 10114.0 | 397609 | 137857 | 2015-12-03T00:00:00.000Z |
| 11516 | -77.02757 | 38.908578 | 32939473 | 8516316 | THURSDAY | 0 | 49 | 12 | 1629.0 | P055 | NO PARKING ANYTIME | 1200 BLOCK O ST NW SOUTH SIDE | DC | PU | 811790 | 10114.0 | 397609 | 137857 | 2015-12-03T00:00:00.000Z |
| 11517 | -77.009063 | 38.895618 | 32939474 | 8516317 | THURSDAY | 0 | 49 | 12 | 1744.0 | P259 | NO STOPPING OR STANDING IN PM RUSH HOUR ZONE | 500 BLOCK N CAPITOL ST NE EAST SIDE | MD | 4D | 812697 | 10918.0 | 399214 | 136418 | 2015-12-03T00:00:00.000Z |


To filter results using the values in a labeled column, you can use the .loc indexer on the dataframe.
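
To make the filtering step visible, this sketch splits it into two parts on a small hand-made dataframe (the four rows are illustrative, not taken from parking.csv): first the comparison, then the .loc lookup.

```python
import pandas as pd

tickets = pd.DataFrame({
    'DAY_OF_WEEK': ['TUESDAY', 'THURSDAY', 'TUESDAY', 'THURSDAY'],
    'VIOLATION_CODE': ['P002', 'P281', 'P039', 'P199'],
})

# Step 1: the comparison produces one True/False value per row.
is_thursday = tickets['DAY_OF_WEEK'] == 'THURSDAY'
print(is_thursday.tolist())

# Step 2: .loc keeps only the rows where the value is True.
thursday_rows = tickets.loc[is_thursday]
print(thursday_rows)
```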

.loc[] is primarily label based, but may also be used with a boolean array.

Allowed inputs are:

* A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index).
* A list or array of labels, e.g. ['a', 'b', 'c'].
* A slice object with labels, e.g. 'a':'f' (note that contrary to usual python slices, both the start and the stop are included!).
* A boolean array.
* A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above)

In the example above we passed a boolean array: the comparison df['DAY_OF_WEEK'] == 'THURSDAY' produces one TRUE or FALSE value per row, and .loc returns only the rows where the value is TRUE.
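
The other kinds of input listed above can be sketched on a small dataframe whose index uses letter labels. The dataframe `prices` and its values are assumptions made for illustration.

```python
import pandas as pd

prices = pd.DataFrame(
    {'price': [10, 20, 30, 40]},
    index=['a', 'b', 'c', 'd'],
)

# A single row label (here combined with a column label).
single = prices.loc['b', 'price']

# A list of labels.
several = prices.loc[['a', 'c']]

# A slice of labels: unlike usual Python slices, 'd' is included.
sliced = prices.loc['b':'d']

# A callable that receives the dataframe and returns a boolean array.
by_callable = prices.loc[lambda frame: frame['price'] > 25]

print(single)
print(list(several.index), list(sliced.index), list(by_callable.index))
```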

pandas also allows us to return specific columns:

In [4]:
df[['DAY_OF_WEEK', 'WEEK_OF_YEAR', 'MONTH_OF_YEAR', 'ISSUE_TIME']].head()

| | DAY_OF_WEEK | WEEK_OF_YEAR | MONTH_OF_YEAR | ISSUE_TIME |
|---|---|---|---|---|
| 0 | TUESDAY | 49 | 12 | 1034.0 |
| 1 | TUESDAY | 49 | 12 | 1547.0 |
| 2 | TUESDAY | 49 | 12 | 1925.0 |
| 3 | TUESDAY | 49 | 12 | 2041.0 |
| 4 | TUESDAY | 49 | 12 | 838.0 |
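
The double square brackets in the column selection matter: the inner pair is a Python list of column names. A minimal sketch of the difference between single and double brackets, using a small hand-made dataframe (the rows are illustrative):

```python
import pandas as pd

tickets = pd.DataFrame({
    'DAY_OF_WEEK': ['TUESDAY', 'THURSDAY'],
    'ISSUE_TIME': [1034.0, 1223.0],
    'RP_PLATE_STATE': ['MD', 'DC'],
})

# Single brackets with one name return a Series (one column of values).
one_column = tickets['DAY_OF_WEEK']

# Double brackets pass a list of names and return a DataFrame,
# even when the list holds only one name.
as_table = tickets[['DAY_OF_WEEK']]
two_columns = tickets[['DAY_OF_WEEK', 'ISSUE_TIME']]

print(type(one_column))
print(type(as_table))
print(list(two_columns.columns))
```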
