<center>  <a class="anchor" id="top"></a>
<img src="nsf.png" alt="Drawing" style="Width:  900px;"/>
</center>

# Table of Contents
1. [Setup](#setup)
1. [Magics](#magics)
1. [Widgets](#widgets)
1. [LaTeX](#latex)    
1. [Strings](#strings)
1. [Lists](#list)
1. [Tuples](#tuples)
1. [Dictionaries](#dictionaries)
1. [Custom Functions](#custom-functions)
1. [Ternary Operators](#ternary-operators)
1. [List Comprehensions](#list-comprehensions)
1. [Regular Expressions](#regular-expressions)
1. [Pandas](#pandas)
1. [Real-world Example](#rwe)
1. [Resources](#resources)

## Setup <a class="anchor" id="setup"></a>
Below are the packages and functions that we will use in this notebook. The first cell uses pip to install the ***haversine*** package because it is not one of the packages pre-installed in Azure Notebooks. The leading "!" tells Jupyter Notebook to run the line as a shell command rather than as Python code. In the second cell, the first line imports the entire pandas package under the alias ***pd***, the next line imports ***YouTubeVideo*** from the ***display*** module of the ***IPython*** package, and the last line imports the ***haversine*** function from the ***haversine*** package.

In [None]:
!pip install haversine

In [None]:
import pandas as pd
from IPython.display import YouTubeVideo
from haversine import haversine

In the cell below we import the contents of the EmployeeList and Movie-Ratings CSV files into data frames.

In [None]:
dfEmployeeList = pd.read_csv("EmployeeList.csv")
dfMovieRatings = pd.read_csv("Movie-Ratings.csv")

[Back to Table of Contents](#top)

## Magics <a class="anchor" id="magics"></a>

Magic commands are designed to succinctly solve various common problems in standard data analysis. Magic commands come in two flavors: ***line magics***, which are denoted by a single % prefix and operate on a single line of input, and ***cell magics***, which are denoted by a double %% prefix and operate on multiple lines of input. (Jake VanderPlas November 2016. ***Python Data Science Handbook***. Retrieved from https://bit.ly/2KzrSZk)
    
The first two cells below contain informational magics, the third cell contains a line magic that times a single line of code, and the fourth cell contains a cell magic that times all of the code in its cell.

In [None]:
%magic

In [None]:
%lsmagic

In [None]:
%timeit L = [n ** 2 for n in range(500)]

In [None]:
%%timeit
L = []
for n in range(500):
    L.append(n ** 2)
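
Magics only exist inside IPython and Jupyter. In a plain Python script, the standard-library ***timeit*** module offers a similar measurement. The sketch below is an illustrative equivalent of the two timing cells above; the function names are made up for this example, and the timings will differ from machine to machine.

```python
import timeit


def squares_comprehension():
    # Same work as the %timeit line: build the list in one expression
    return [n ** 2 for n in range(500)]


def squares_loop():
    # Same work as the %%timeit cell: build the list with repeated append calls
    result = []
    for n in range(500):
        result.append(n ** 2)
    return result


# Call each function 1,000 times and record the total elapsed seconds
comprehension_seconds = timeit.timeit(squares_comprehension, number=1000)
loop_seconds = timeit.timeit(squares_loop, number=1000)
print(f"comprehension: {comprehension_seconds:.4f} s, loop: {loop_seconds:.4f} s")
```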

[Back to Table of Contents](#top)

## YouTube video about Widgets <a class="anchor" id="widgets"></a>

Widgets are eventful python objects that have a representation in the browser, often as a control like a slider, textbox, etc.  (Jupyter Widgets. ***Simple Widget Introduction***. Retrieved from https://bit.ly/2YowKdr)

Below is a YouTube video that does a great job at introducing Jupyter widgets.

In [None]:
YouTubeVideo('6SHnmho7zCs')

[Back to Table of Contents](#top)

## LaTeX <a class="anchor" id="latex"></a>
LaTeX is a high-quality typesetting system that includes features designed for the production of scientific documentation. Below is a YouTube video that goes into depth about LaTeX.

In [None]:
YouTubeVideo('BVlXXbBvzVo')
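
As a small illustration, Jupyter renders LaTeX that is written between dollar signs in a Markdown cell. The snippet below is a sketch that typesets the haversine formula, which is the calculation behind the ***haversine*** package imported in the Setup section. The symbols are the conventional ones rather than names taken from this notebook: r is the radius of the Earth, φ is latitude, and λ is longitude, both in radians.

```latex
$$
d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)
$$
```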

[Back to Table of Contents](#top)

## Strings <a class="anchor" id="strings"></a>

A string can be defined as a sequence of characters. The article ***String in Python*** contains a more detailed explanation and can be found at https://bit.ly/33b302k. The cells below contain examples of how you can slice strings.

We create a variable and assign it the value of "atlanta".

In [None]:
city = 'atlanta'

We use the ***len*** function to get the length of the ***city*** variable

In [None]:
len(city)

We subset the first 3 characters of the city variable. Note that the first position is inclusive but the second position is exclusive

In [None]:
city[0:3]

We subset the first 3 characters with an implicit 0 for the first position

In [None]:
city[:3]

We subset the city variable from the beginning up to, but not including, the character that is 4 positions from the end.

In [None]:
city[:-4]

This is a more complicated subset. The ***0:4*** part selects the first 4 characters of the city variable, and the ***2*** is the step, which tells Python to keep every other character from that selection.

In [None]:
city[0:4:2]

We subset the city variable starting at the character that is in the fifth position from the end and grabbing all of the characters through the end of the variable.

In [None]:
city[-5:]

We concatenate "hot" with the subset that we created above.

In [None]:
"hot" + city[-5:]

We test to see if the string "atl" is in the city variable

In [None]:
"atl" in city 

We test to see if the string "chi" is not in the city variable.

In [None]:
"chi" not in city

[Back to Table of Contents](#top)

## Lists <a class="anchor" id="list"></a>

Lists are just like the arrays, declared in other languages. Lists need not be homogeneous always which makes it a most powerful tool in Python. A single list may contain DataTypes like Integers, Strings, as well as Objects. Lists are also very useful for implementing stacks and queues. Lists are mutable, and hence, they can be altered even after their creation. (GeeksForGeeks. ***Python List***. Retrieved from https://bit.ly/2XnnvZz)

Below is a simple example of how to create an empty list, how to add elements to a list, how to subset a list, and then how to sort a list.

In [None]:
x = []

In [None]:
x.append("Jill")
x.append("Jane")
x.append("John")
x.append("Joan")
x.append("James")
x

In [None]:
x[0:3]

In [None]:
x.sort()
x

In [None]:
x.sort(reverse = True)
x
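
The description above mentions that lists can hold mixed types, can be changed after creation, and can serve as stacks. The cell below is a minimal sketch of those three points; the variable names are made up for this illustration, and only the stack case is shown.

```python
# A single list can mix integers, strings, and floats
mixed = [1, "two", 3.0]

# Lists are mutable: an element can be replaced in place
mixed[1] = 2

# Using a list as a stack: append pushes an item, pop removes the most recent one
stack = []
stack.append("Jill")
stack.append("Jane")
stack.append("John")
last_in = stack.pop()  # the last item pushed is the first one removed

print(mixed, stack, last_in)
```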

[Back to Table of Contents](#top)

## Tuples <a class="anchor" id="tuples"></a>

Tuple is a collection of Python objects much like a list. The sequence of values stored in a tuple can be of any type, and they are indexed by integers. The important difference between a list and a tuple is that tuples are immutable. Also, Tuples are hashable whereas lists are not.  (GeeksForGeeks. ***Python Tuples***. Retrieved from https://bit.ly/2YLJLgn)

In [None]:
x = (33.7721, -84.3902)
x
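
The two properties mentioned above, immutability and hashability, can be checked directly. The cell below is a small sketch; the variable names are made up for this illustration.

```python
point = (33.7721, -84.3902)

# Tuples are immutable: assigning to an element raises a TypeError
try:
    point[0] = 0.0
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Tuples are hashable, so they can be used as dictionary keys
labels = {point: "starting location"}

# Lists are not hashable, so the same attempt with a list raises a TypeError
try:
    bad_labels = {[33.7721, -84.3902]: "starting location"}
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

print(tuple_changed, list_is_hashable, labels[point])
```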

[Back to Table of Contents](#top)

## Dictionaries <a class="anchor" id="dictionaries"></a>

Dictionary in Python is an unordered collection of data values, used to store data values like a map, which unlike other Data Types that hold only single value as an element, Dictionary holds ***key:value*** pair. Key value is provided in the dictionary to make it more optimized. Each ***key-value*** pair in a Dictionary is separated by a colon :, whereas each key is separated by a ‘comma’.  (GeeksForGeeks. ***Python Dictionary***. Retrieved from https://bit.ly/2OFgMXO) Note that since Python 3.7, dictionaries preserve the order in which their keys were inserted.
        
Below is a dictionary that contains the top 5 scorers in the NBA for the 2018-2019 season. We illustrate some of the methods that are available to dictionaries and also show how you can use the ***in*** operator to test whether a given key exists in your dictionary.

In [None]:
top_scorers = {"Harden":2818, "George":2159, "Walker":2102, "Beal":2099, "Lillard":2067}

In [None]:
len(top_scorers)

In [None]:
top_scorers.keys()

In [None]:
top_scorers.values()

In [None]:
top_scorers.items()

In [None]:
scorer = "Curry"
scorer in top_scorers

In [None]:
scorer = "Harden"
scorer in top_scorers

In [None]:
scorer = "Curry"
scorer not in top_scorers

In [None]:
scorer = "Harden"
scorer not in top_scorers
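
Beyond membership tests, a few other common dictionary operations are worth seeing side by side. The cell below is a brief sketch that repeats the dictionary from above so that it runs on its own; the variable names other than ***top_scorers*** are made up for this illustration.

```python
top_scorers = {"Harden": 2818, "George": 2159, "Walker": 2102, "Beal": 2099, "Lillard": 2067}

# Look up a value by its key
harden_points = top_scorers["Harden"]

# get() returns a default value instead of raising KeyError when the key is missing
curry_points = top_scorers.get("Curry", 0)

# Loop over the key-value pairs
for player, points in top_scorers.items():
    print(f"{player}: {points}")

# Find the key that has the largest value
leader = max(top_scorers, key=top_scorers.get)
```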

[Back to Table of Contents](#top)

## Custom Functions <a class="anchor" id="custom-functions"></a>

Below is a very simple example of a custom function. The point is to show how easy it is to construct a simple function.

In [None]:
def MultiplyByTwo(x):
    return x * 2

In [None]:
MultiplyByTwo(100)

[Back to Table of Contents](#top)

## Ternary Operators <a class="anchor" id="ternary-operators"></a>

Ternary operators also known as conditional expressions are operators that evaluate something based on a condition being true or false. It was added to Python in version 2.5. It simply allows to test a condition in a single line replacing the multiline if-else making the code compact. (GeeksForGeeks. ***Ternary Operator in Python***. Retrieved from https://bit.ly/2YwywcE)

The example below starts off by creating two variables, x and y, and assigns the values 10 and 20 to them respectively. In the next cell a ternary operator is used to set the value of the max_value variable. The ternary operator returns the value of whichever variable holds the larger number.

In [None]:
x, y = 10, 20

In [None]:
max_value = x if x > y else y
max_value

The example below shows a more traditional method. Notice how succinct the code is in the ternary example compared to the example below.

In [None]:
if x > y:
    max_value = x
else:
    max_value = y

max_value

[Back to Table of Contents](#top)

## List Comprehensions <a class="anchor" id="list-comprehensions"></a>

List comprehensions provide a concise way to create lists. It consists of brackets containing an expression followed by a ***for*** clause, then zero or more ***for*** or ***if*** clauses. The expressions can be anything, meaning you can put in all kinds of objects in lists. (PythonForBeginners. ***List Comprehensions in Python***. Retrieved from https://bit.ly/2Q8zFi4)

The first cell shows an example of using a list comprehension to square all of the numbers in the ***numbers*** list. The second cell shows how you can use a list comprehension with an ***if*** clause to determine which elements you want to include in your new list.

In [None]:
numbers = [2, 3, 4, 5]
squares = [number ** 2 for number in numbers]
squares

In [None]:
names = ['ryan', 'edward', 'wade']
upper_case = [name.upper() for name in names if len(name) == 4]
upper_case
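
The definition above allows more than one ***for*** clause in a single comprehension. The cell below is a short sketch of that case next to a filtered one; the lists and names are made up for this illustration.

```python
sizes = ["S", "M"]
colors = ["red", "blue"]

# Two for clauses behave like nested loops: the first clause is the outer loop
combos = [(size, color) for size in sizes for color in colors]

# A for clause followed by an if clause keeps only the matching items
even_squares = [n ** 2 for n in range(6) if n % 2 == 0]

print(combos)
print(even_squares)
```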

[Back to Table of Contents](#top)

## YouTube video about Regular Expressions <a class="anchor" id="regular-expressions"></a>

A regular expression is a sequence of characters that defines a search pattern. The video below is a great introduction to regular expressions in Python.

In [None]:
YouTubeVideo('kWyoYtvJpe4')
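
For a quick taste before watching the video, the cell below is a minimal sketch that uses Python's built-in ***re*** module. The sample text and phone numbers are made up for this illustration.

```python
import re

text = "Call 404-555-0100 or 770-555-0199 before 5pm."

# Pattern: three digits, a dash, three digits, a dash, four digits
phone_pattern = re.compile(r"\d{3}-\d{3}-\d{4}")

# findall returns every match as a list of strings
all_numbers = phone_pattern.findall(text)

# search returns a match object for the first match, or None if there is none
first_match = phone_pattern.search(text)

# sub replaces every match with the given text
masked = phone_pattern.sub("XXX-XXX-XXXX", text)

print(all_numbers)
print(masked)
```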

[Back to Table of Contents](#top)

## Intro to Pandas <a class="anchor" id="pandas"></a>

Many methods are available to you when you load your data into a pandas data frame. Below are examples of a few of them. 

In [None]:
dfMovieRatings.head()

In [None]:
dfMovieRatings.columns

In [None]:
dfMovieRatings.shape

In [None]:
dfMovieRatings.Genre.unique()

In [None]:
TotalBudget = dfMovieRatings["Budget (million $)"].sum()
TotalBudget

Pandas offers succinct but powerful methods to subset rows and columns from your data frame. In this section we will focus on two methods in particular, the ***loc*** and ***iloc*** methods. The ***loc*** method subsets rows and columns based on labels and the ***iloc*** method subsets rows and columns based on integer position. We will start with examples of the loc method. Each cell below is preceded by a description of what it does.

In the cell below we are returning the values of all of the columns in the first row. The output is returned as a pandas series because we are only returning one row.

In [None]:
FirstRow = dfMovieRatings.loc[0, :]
FirstRow

In the cell below we are returning all of the columns for the rows labeled 0 through 3. Because a ***loc*** slice includes both endpoints, this returns 4 rows, not 3.

In [None]:
SubsetRows = dfMovieRatings.loc[0:3, :]
SubsetRows

In the cell below we are returning the "Film" and "Genre" columns for all rows.

In [None]:
SubsetColumnsUsingList = dfMovieRatings.loc[:,["Film", "Genre"]]
SubsetColumnsUsingList.head()

In the cell below we are returning the columns from the "Genre" column through the "Year of release" column for all of the rows.

In [None]:
SubsetColumnsBetweenTwoColumns = dfMovieRatings.loc[:, "Genre":"Year of release"]
SubsetColumnsBetweenTwoColumns.head()

In the cell below we are returning all of the columns where the rows in the "Genre" column are equal to "Comedy"

In [None]:
SubsetRowsBasedOnBooleanCondition = dfMovieRatings.loc[dfMovieRatings.Genre == 'Comedy', :]
SubsetRowsBasedOnBooleanCondition.head()

In the cell below we are returning all of the columns where the rows in the "Genre" column are equal to "Comedy" or "Action" 

In [None]:
SubsetRowsBasedOnMultipleBooleanCondition = dfMovieRatings.loc[(dfMovieRatings.Genre == 'Comedy') | (dfMovieRatings.Genre == 'Action'), :]
SubsetRowsBasedOnMultipleBooleanCondition.head()

In the cells below we are subsetting rows and columns using the ***iloc*** method. This method subsets rows and columns based on integer position. Here are descriptions of what is going on in each cell:

In the cell below we are returning all of the columns in the first 3 rows of the data frame. Note that, unlike the loc method, the end position of an iloc slice is not inclusive. So in this example we grab the rows starting at row zero but stopping before row 3, which results in a data set of 3 rows.


In [None]:
SubsetRowsOneThroughThree = dfMovieRatings.iloc[0:3,:]
SubsetRowsOneThroughThree

In the cell below we are returning all of the columns in the last 3 rows.

In [None]:
SubsetLastThreeRows = dfMovieRatings.iloc[-3:,:]
SubsetLastThreeRows

In the cell below we are returning the first, third, and fifth columns for all rows. Remember that Python is zero-based, so the first position is 0 and not 1.

In [None]:
SubsetColumnsOneThreeFive = dfMovieRatings.iloc[:,[0,2,4]]
SubsetColumnsOneThreeFive.head()

In the cell below we are returning the first three columns (positions 0 through 2) for all rows.

In [None]:
SubsetColumnsOneThroughThree = dfMovieRatings.iloc[:,0:3]
SubsetColumnsOneThroughThree.head()
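
The difference between label-based and position-based slicing is easiest to see on a tiny data frame. The cell below is a sketch that uses made-up data, not the Movie-Ratings file; all of the names in it are made up for this illustration.

```python
import pandas as pd

# Toy data frame, made up for this illustration
toy = pd.DataFrame({
    "Film": ["A", "B", "C", "D", "E"],
    "Genre": ["Comedy", "Action", "Drama", "Comedy", "Action"],
})

# loc slices by label and includes the end label: labels 0, 1, 2, 3
by_label = toy.loc[0:3, :]

# iloc slices by position and excludes the end position: positions 0, 1, 2
by_position = toy.iloc[0:3, :]

# With a non-numeric index the two approaches are clearly different
by_film = toy.set_index("Film")
label_slice = by_film.loc["B":"D", :]    # rows labeled B, C, and D
position_slice = by_film.iloc[1:3, :]    # second and third rows only

print(len(by_label), len(by_position))
```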

You can also use pandas to group rows together in your data frame and apply aggregation functions to them. Below is a simple ***groupby*** example where we group the data frame by the ***Genre*** and ***Year of release*** columns and then sum up the ***Budget (million $)*** column for each distinct combination of that grouping. 

The code in the first cell below groups the rows by ***Genre*** and ***Year of release*** then sums the ***Budget (million $)*** column based on that grouping. That operation returns a pandas ***Series***, not a data frame like the one it is based on. The code in the next cell shows that the series has an index built from the ***Genre*** and ***Year of release*** columns. We want to remove that index and convert the series to a regular data frame. We can accomplish that with the code in the last cell via the ***reset_index*** method, which converts the index values to columns and creates a new sequential index.


In [None]:
# The grouping columns become the index of the result (the pandas default)
GroupbyExample = dfMovieRatings.groupby(["Genre", "Year of release"])['Budget (million $)'].sum()
print(GroupbyExample.head())

In [None]:
list(GroupbyExample.index.values)[:6]

In [None]:
GroupbyExample = GroupbyExample.reset_index()
GroupbyExample.head(6)
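
A grouped sum can come back in two shapes, depending on the ***as_index*** parameter of ***groupby***. The cell below is a sketch on made-up data, not the Movie-Ratings file. Note that ***as_index*** expects the boolean False, not the string "False".

```python
import pandas as pd

# Toy data, made up for this illustration
toy = pd.DataFrame({
    "Genre": ["Comedy", "Comedy", "Action", "Action"],
    "Year of release": [2009, 2009, 2009, 2010],
    "Budget (million $)": [10, 20, 30, 40],
})

# Default behavior: the grouping columns become a MultiIndex on a Series
as_series = toy.groupby(["Genre", "Year of release"])["Budget (million $)"].sum()

# as_index=False keeps the grouping columns as ordinary columns of a data frame,
# so no reset_index call is needed afterwards
as_frame = toy.groupby(["Genre", "Year of release"], as_index=False)["Budget (million $)"].sum()

print(as_series)
print(as_frame)
```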

[Back to Table of Contents](#top)

## Real World Example <a class="anchor" id="rwe"></a>

We will conclude with a real-world example. We will create a function that uses the ***haversine*** function from the ***haversine*** library to calculate the distance between two geographical points. The ***haversine*** function requires two parameters: a tuple that contains the latitude and longitude of the starting location and another tuple that contains the latitude and longitude of the ending location. We need to reshape the dfEmployeeList data in order to be able to use the ***haversine*** function. In the example below we do that using a custom function. The first cell below simply lists the columns of dfEmployeeList. Let's unpack what is going on in the two cells after it, starting with the last cell, which calls the custom function, and then explaining the custom function itself. Here are the steps:

1. In the last cell below we call the custom function, useHaversine, using a lambda function. A lambda function is an anonymous function in Python that is defined using the keyword ***lambda***. Lambda functions can be applied to columns or to the entire data frame to create calculated columns. In the example below we apply a lambda function to each row of the dfEmployeeList data frame. We do so by calling the ***apply*** method on dfEmployeeList and setting the axis parameter to 1. The name ***row*** is used to reference the row in our lambda function.
1. The lambda function passes the entire row to the useHaversine function. The useHaversine function will take the row passed to it and extract the information that it needs.
1. The useHaversine function takes the row that was passed to it and builds a tuple of the latitude and longitude of the starting location then does the same for the ending location. It uses those tuples as the first and second arguments of the ***haversine*** function from the ***haversine*** library. The ***unit*** parameter is set to "mi" because we want the return value to be in miles.



In [None]:
dfEmployeeList.columns

In [None]:
def useHaversine(row):
    location_one = (row["lat_EmployeeAddress"], row["lon_EmployeeAddress"])
    location_two = (row["lat_TerminalAddress"], row["lon_TerminalAddress"])
    return haversine(location_one, location_two, unit="mi")

In [None]:
dfEmployeeList["Haversine Function"] = dfEmployeeList.apply(lambda row: useHaversine(row), axis=1)
dfEmployeeList
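
To make the distance calculation less of a black box, the cell below is a rough sketch of the haversine formula written with only the standard library. It is not the implementation used by the ***haversine*** package; the function name and the Earth radius constant are assumptions made for this illustration, so results may differ slightly from the package's output.

```python
import math

# Approximate mean radius of the Earth in miles (an assumption of this sketch)
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(point_one, point_two):
    """Great-circle distance in miles between two (latitude, longitude) tuples in degrees."""
    lat1, lon1 = map(math.radians, point_one)
    lat2, lon2 = map(math.radians, point_two)
    half_dlat = (lat2 - lat1) / 2
    half_dlon = (lon2 - lon1) / 2
    a = math.sin(half_dlat) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(half_dlon) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Two points one degree of latitude apart are roughly 69 miles from each other
one_degree = haversine_miles((33.0, -84.0), (34.0, -84.0))
print(round(one_degree, 1))
```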

[Back to Table of Contents](#top)

## Resources <a class="anchor" id="resources"></a>

Below are some good resources that cover important topics in Python from a data analytics standpoint. 

* Python for Data Analysis Notebook = https://notebooks.azure.com/wesm/projects/python-for-data-analysis
* Conda Cheat Sheet = https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf
* PEP explained = https://www.python.org/dev/peps/pep-0001/#what-is-a-pep
* PEP8 = https://pep8.org/
* GitHub tutorial = https://guides.github.com/activities/hello-world/
* Jupyter Lab Tutorial = https://buildmedia.readthedocs.org/media/pdf/jupyterlab/stable/jupyterlab.pdf
* Basic Command Prompt commands = https://www.digitalcitizen.life/command-prompt-how-use-basic-commands
* Python Data Structures = http://thomas-cokelaer.info/tutorials/python/data_structures.html
* Data School YouTube Channel = https://www.youtube.com/channel/UCnVzApLJE2ljPZSeQylSEyg

[Back to Table of Contents](#top)