# A review of Python

<div class="alert alert-block alert-danger">
<b>Check the Kernel you are using:</b> Before we get started, if you are running this on HiPerGator, double-check the kernel in use. This is shown in the top right of the window and should look like: <img src="images/kernel.python310.png" alt="Image showing that the notebook is using the Python 3.10 kernel" style="float:right">
</div>

This notebook has been adapted from the [UF Data Science and Informatics](http://www.dsiufl.org/) [Python 0 workshop](https://github.com/dsiufl/Python-Workshops). 

**This notebook has placeholders for code to be entered during class.** If you want the completed version, see the [Intro_to_Python.ipynb file](Intro_to_Python.ipynb). Try typing the code in during class, though: it is slower, but you will likely remember more.


## About UF DSI--The original creators of this content [<img src="images/UF_DSI_logo.png" alt="UF DSI Logo" style="width: 200px;float:right"/>](http://www.dsiufl.org/)

> We are a multi- and interdisciplinary student organization dedicated to promoting Data Science here at the University of Florida. We are partnered with the UF Informatics Institute, whose aim is to foster informatics research and education.

## What is Python?

Python is an easy-to-use, robust, **Object-Oriented** programming language, and many new software applications are built with Python for that reason. It is used in many areas of computer science, such as software engineering, digital arts, cybersecurity, and of course Data Science!

This workshop introduces the basics of Python, as well as Data Science and Visualization in Python. Because the language is so broad, many topics remain for you to explore on your own; here we teach you the essential skills.


<div class="alert alert-block alert-info">
    <b>Object Oriented Programming (or OOP)</b> is a popular programming paradigm. Classes, and the objects created from them, offer a level of abstraction that simplifies coding by bundling data together with the methods that operate on that data. Here are some resources for more information on OOP:
    <ul>
        <li><a href="https://www.freecodecamp.org/news/object-oriented-programming-concepts-21bb035f7260/">How to explain object-oriented programming concepts to a 6-year-old</a></li>
        <li><a href="https://github.com/comptoolsres/Jupyter_content/blob/main/py4e_ch14_ObjectOrientedProgramming.ipynb">A Jupyter notebook with my notes on OOP for my Computational Tools class</a></li>
    </ul>
</div>
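As a minimal sketch of the idea, a class bundles data with the methods that operate on it. The `Team` class below is a made-up example for illustration only and is not part of this workshop's code:

```python
class Team:
    """Hypothetical example class: data (name, wins) bundled with a method that updates it."""

    def __init__(self, name):
        self.name = name   # data stored on each object
        self.wins = 0

    def record_win(self):
        """A method operates on the object's own data through `self`."""
        self.wins += 1


gators = Team("Gators")          # create an object (an instance) of the Team class
gators.record_win()
gators.record_win()
print(gators.name, gators.wins)  # Gators 2
```

Each object made from `Team` keeps its own `name` and `wins`, so a second team would start with zero wins.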

## Variables and Types

#### Calculator

Python can be used as a calculator. Press <code>Shift+Enter</code> to run the current code cell, so you don't have to click Run every time.

In [1]:
# Addition and Subtraction
(3+2) - 8

-3

In [2]:
# Multiplication and Division
(3*2)/3

2.0

In [3]:
# Exponentiation
6**2

36
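Two more arithmetic operators are worth knowing early: floor division `//` and the remainder (modulo) operator `%`. The modulo operator is a handy way to test whether a number is even (`n % 2 == 0`). A short illustration:

```python
print(7 / 2)    # 3.5  true division always returns a float
print(7 // 2)   # 3    floor division: divide, then round down
print(7 % 2)    # 1    modulo: the remainder after division
print(-7 // 2)  # -4   "round down" means toward negative infinity
```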

Variables can be given alphanumeric names beginning with an underscore or letter.  **Variable types do not have to be declared and are inferred at run time.**

<div class="alert alert-block alert-info">
<b>Note:</b> Python is what is referred to as "dynamically typed": values do have types, but a variable's type is inferred from the data it holds at any given time, and it can change when the variable is reassigned. This has advantages and disadvantages. Speed is one disadvantage: before any operation on a variable can run, Python must first check the type of the value it holds and then choose the appropriate operation.</div>
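A quick way to see dynamic typing in action is to reassign one name to values of different types and check `type()` each time (the name `x` is just an illustrative choice):

```python
x = 42
print(type(x))   # <class 'int'>

x = "forty-two"  # the same name can later refer to a value of another type
print(type(x))   # <class 'str'>

x = 4.2
print(type(x))   # <class 'float'>
```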

In [4]:
a=1
print(type(a))

<class 'int'>


In [7]:
b = 2.5
print(type(b))

<class 'float'>


Strings can be declared with either single or double quotes.

In [4]:
c1 = "Go "
c2 = "Gators"
c3 = c1 + c2
print(c3)
print(type(c3))

Go Gators
<class 'str'>
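The cell above uses only double quotes. As a small sketch, single and double quotes produce identical strings, and using one style lets you put the other inside a string without escaping it:

```python
single = 'Gators'
double = "Gators"
print(single == double)           # True: both quote styles make the same str

# Use the other quote style to put quotes inside a string
print("It's game day")            # It's game day
print('Coach said "Go Gators!"')  # Coach said "Go Gators!"
```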


### Exercise 1

Some things to try:
 * Practice making some variables of different types. 
 * See what happens when you add, subtract, divide integers and floating point numbers. 
 * Practice printing and formatting output.

In [6]:
# Integers 
x1 = 4
x2 = 3
# Floats
y1 = 2.5
y2 = 1.5
# Strings
z1 = "Hello"
z2 = "world"
# Add two int variables
x = x1+x2
print(x)
# Add two float variables
y=y1+y2
print(y)
# Add an int and a float
xy=x1+y1
print(xy)
# Divide an int by a float
x_y=x2/y2
print(x_y)

7
4.0
6.5
2.0


In [9]:
# %load snippets/Ex_01.variables.py
# Integers 
a=10
b=6
print(f"a is {a} \nb is {b}\n")

# Floats
x=7.5
y=9.
print(f"x is {x} \ny is {y}\n")

# Strings
name='matt'
number='three'
print(f"Your name is {name} and you have {number} fingers.\n")

# Add two int variables
c=a+b
print(f"{a} + {b} = {c}\n")

# Add two float variables
z=x+y
print(f"{x} + {y} = {z}\n")

# Add an int and a float
j=a+y
print(f"{a} + {y} = {j} \nAnd j is type: {type(j)}\n")

# Divide an int by a float
k=a/y
print(f"{a} / {y} = {k} \nAnd k is type: {type(k)}.")
print("Note that in Python 2, dividing two ints (e.g. 10 / 6) would give 1; in Python 3, / always returns a float.\n")


a is 10 
b is 6

x is 7.5 
y is 9.0

Your name is matt and you have three fingers.

10 + 6 = 16

7.5 + 9.0 = 16.5

10 + 9.0 = 19.0 
And j is type: <class 'float'>

10 / 9.0 = 1.1111111111111112 
And k is type: <class 'float'>.
Note that in Python 2, dividing two ints (e.g. 10 / 6) would give 1; in Python 3, / always returns a float.
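The solutions above print full-precision floats such as 1.1111111111111112. For the "formatting output" part of the exercise, format specifiers placed after a colon inside f-string braces control how a value is displayed. A few common ones, as an illustration:

```python
k = 10 / 9.0
name = "Gators"

print(f"{k}")              # 1.1111111111111112  (default: full precision)
print(f"{k:.3f}")          # 1.111  (3 digits after the decimal point)
print(f"{1892160000:,}")   # 1,892,160,000  (thousands separator)
print(f"{name:>10}|")      # right-align name in a 10-character field
```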



## Modules and Import

By convention, Python scripts are saved with a `.py` file extension. A module is little more than a Python script: its functions, variables, and class definitions can be imported into other scripts or notebooks.

The core Python language is intentionally kept small; additional functionality is loaded from modules as needed.

Modules, whether they come from Python's standard library (like <code>math</code>) or are installed separately (like Pandas, used later in this notebook), are brought into your program with the <code>import</code> statement.
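To see that a module really is just a `.py` file, here is a minimal sketch that writes a tiny module to a temporary folder and imports it. The module name `gator_utils` and its contents are hypothetical, chosen only for illustration. Normally you would simply save the file next to your notebook and write `import gator_utils`.

```python
import sys
import tempfile
from pathlib import Path

# Write a tiny module, gator_utils.py (a hypothetical name), into a temporary folder
module_dir = tempfile.mkdtemp()
Path(module_dir, "gator_utils.py").write_text(
    'SCHOOL = "UF"\n'
    "\n"
    "def cheer(team):\n"
    '    return f"Go {team}!"\n'
)

# Python looks for modules in the folders listed in sys.path; a file next to
# your notebook is found automatically, but this temporary folder must be added
sys.path.insert(0, module_dir)

import gator_utils  # the module's name is the file name without .py

print(gator_utils.cheer("Gators"))  # Go Gators!
print(gator_utils.SCHOOL)           # UF
```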

In [None]:
# To use math, we must import it
import math
print(cos(0))  # does not work: the module name was not specified


NameError: name 'cos' is not defined


Whoops. Importing the <code>math</code> module gives us access to all of its functions, but we must call them with the module name as a prefix:

In [13]:
print(math.cos(0))

1.0


Alternatively, you can use the <code>from</code> keyword to import a specific name directly:

In [None]:
from math import cos
print(cos(0))
# we only imported cos, not the pi constant
# a name can also be given an alias, e.g. from math import cos as cosine

1.0


Using <code>from</code> with <code>*</code>, we can import everything from the math module.

<div class="alert alert-block alert-warning">
    <b>Disclaimer:</b> many Pythonistas discourage using <code>from ____ import *</code> because it makes it unclear where names come from and can silently overwrite names you already have (namespace conflicts). <b>Only import what you need</b>.
</div>


In [15]:
from math import *
# now we don't need the math. prefix, but this is lazy and not recommended
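Here is one concrete namespace conflict, as an illustration: the <code>math</code> module defines its own <code>pow()</code>, so a wildcard import quietly replaces Python's built-in <code>pow()</code>, and the result type changes:

```python
print(pow(2, 3))    # 8    the built-in pow() keeps ints as ints

from math import *  # math defines its own pow(), which now hides the built-in

print(pow(2, 3))    # 8.0  math.pow() always returns a float
```

Nothing warns you that <code>pow</code> was replaced, which is why importing only the names you need is safer.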

## Strings
As you may expect, Python has powerful, full-featured string handling built in.

### Substrings
Substrings can be extracted from Python strings using bracket (slice) syntax:

In [3]:
mystring = "Go Gators, Come on Gators, Get up and go!"
print(mystring)

Go Gators, Come on Gators, Get up and go!


In [4]:
print(mystring[11:25])

Come on Gators


**Python uses 0-based indexing.** Generally, whenever you form a range of values in Python, the first argument (start) is inclusive and the second (stop) is exclusive; for example, <code>mystring[11:25]</code> returns characters 11 through 24.
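A quick check of the start-inclusive, stop-exclusive rule on a shorter string (the variable `team` is just an illustrative name):

```python
team = "Gators"
print(team[0])                       # G    index 0 is the first character
print(team[0:3])                     # Gat  indices 0, 1, 2 (stop index 3 is excluded)
print(len(team[1:4]))                # 3    a slice [start:stop] has stop - start characters
print(team[0:3] + team[3:] == team)  # True: splitting at the same index loses nothing
```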

You can omit the first or second argument

In [21]:
   # all characters before the 9th index
print(mystring[:9])

Go Gators


In [22]:
   # all characters at or after the 27th
print(mystring[27:])

Get up and go!


In [23]:
   # you can even omit both arguments
print(mystring[:])

Go Gators, Come on Gators, Get up and go!


Using negative values, you can count positions backward from the end of the string:

In [24]:
print(mystring[-3:-1])

go
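Negative index -1 is the last character, so a negative index is shorthand for counting back from <code>len(mystring)</code>. A small illustration using the same string:

```python
mystring = "Go Gators, Come on Gators, Get up and go!"
print(mystring[-1])                                 # !    the last character
print(mystring[-3:])                                # go!  the last three characters
print(mystring[-1] == mystring[len(mystring) - 1])  # True
```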


#### Exercise 2

Write the code to print the substring Gators. You can use either occurrence of the word in the string.

In [5]:
# Add your code here

print(mystring[3:9])

Gators


In [28]:
# %load snippets/Ex_02.substring_Gators.py
print(mystring[3:9])

Gators


### String Functions
Here are some more useful string functions


#### lower and upper

<div class="alert alert-block alert-info">
    
<b>Note:</b> Referring back to the idea of OOP (classes and objects): there is a string class, <code>str</code>, and <code>mystring</code> is an object made from that class. The class has many methods; <code>.lower()</code> and <code>.upper()</code> are two of them and can be applied to the <code>mystring</code> object. Each returns a new string and leaves <code>mystring</code> unchanged, because Python strings are immutable.
    
</div>

In [6]:
print(mystring.lower())
print(mystring.upper())

go gators, come on gators, get up and go!
GO GATORS, COME ON GATORS, GET UP AND GO!
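To make the class/object relationship concrete, here is a short sketch with a separate, illustrative string `team`. Calling a method returns a new string object, and the original string is left unchanged because strings are immutable:

```python
team = "Go Gators"
print(type(team))     # <class 'str'>: team is an object of the str class
shout = team.upper()  # a method call returns a NEW string
print(shout)          # GO GATORS
print(team)           # Go Gators: the original object is unchanged
```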

#### find

In [7]:
  # returns the index of the first occurrence of Gators
mystring.find('Gators')

3

If the substring is not found, <code>find</code> returns -1:

In [37]:
  # no Seminoles here
mystring.find('Seminoles')

-1
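<code>find</code> also accepts an optional start position, which is one way to locate the second "Gators". If you only need a yes/no answer, the <code>in</code> operator is simpler. An illustration:

```python
mystring = "Go Gators, Come on Gators, Get up and go!"
first = mystring.find("Gators")              # 3
second = mystring.find("Gators", first + 1)  # 19: start searching after the first match
print(first, second)
print("Gators" in mystring)                  # True
print("Seminoles" in mystring)               # False
```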

#### split

In [38]:
   # returns a list of strings, split on whitespace by default
mystring.split()

['Go', 'Gators,', 'Come', 'on', 'Gators,', 'Get', 'up', 'and', 'go!']

In [41]:
   # you can also define the separator
mystring.split(',')

['Go Gators', ' Come on Gators', ' Get up and go!']

#### join

The <code>join</code> method is useful for building a string from a list or other iterable of strings. Call <code>join</code> on the desired separator:

In [44]:
print(' '.join(['Go','Gators!']))

Go Gators!
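<code>split</code> and <code>join</code> are roughly inverses when the same separator is used. The sketch below splits a comma-separated line (an illustrative string, not read from a real file) and joins it back. Note that <code>join</code> only accepts strings, so numbers must be converted first:

```python
line = "rank,discipline,salary"
fields = line.split(",")
print(fields)                    # ['rank', 'discipline', 'salary']
print(" | ".join(fields))        # rank | discipline | salary
print(",".join(fields) == line)  # True: joining with the same separator undoes split

# join only accepts strings, so convert numbers first
numbers = [1, 2, 3]
print("-".join(str(n) for n in numbers))  # 1-2-3
```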


For more information on string functions:

https://docs.python.org/3.8/library/stdtypes.html#string-methods

### Exercise 3

Reading documentation is an important skill. There are many string methods. Have a look over the [String Methods section of the Python documentation](https://docs.python.org/3.8/library/stdtypes.html#string-methods) and implement two new methods (e.g., try the `.replace()` method or the `.swapcase()` method).

In [9]:
mystring.swapcase()

'gO gATORS, cOME ON gATORS, gET UP AND GO!'

In [13]:
mystring.replace("Gator","Crocodile")

'Go Crocodiles, Come on Crocodiles, Get up and go!'
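A few more documented string methods, as further examples. Note that <code>replace</code> takes an optional third argument that limits how many occurrences are replaced:

```python
mystring = "Go Gators, Come on Gators, Get up and go!"
print(mystring.count("Gators"))                # 2
print(mystring.startswith("Go"))               # True
print(mystring.replace("Gators", "Crocs", 1))  # only the first occurrence is replaced
print("  padded  ".strip())                    # padded (leading/trailing spaces removed)
```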

## Lists
The core Python language does not have traditional C-style arrays with a fixed size and a fixed element type. Instead, lists are used; they can grow and shrink and can contain a mix of any type.

Lists are created with square brackets []:

In [26]:
    # Note that a list can have multiple variable types
mylist = [1,2,3,4,'five']
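List items are accessed with the same 0-based indexing and slicing rules as strings. Unlike strings, lists are mutable, so an item can be replaced in place. A short illustration with a separate list, `scores`, so `mylist` is not affected:

```python
scores = [10, 20, 30, 40, 50]
print(scores[0])    # 10
print(scores[-1])   # 50
print(scores[1:3])  # [20, 30]  same slicing rules as strings

scores[0] = 99      # lists are mutable: items can be replaced
print(scores)       # [99, 20, 30, 40, 50]
```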

In [27]:
   # add an item to the end of the list
mylist.append(6.0)
print(mylist)

[1, 2, 3, 4, 'five', 6.0]


In [28]:
   # insert the number 7 at index 6
mylist.insert(6,7)
print(mylist)

[1, 2, 3, 4, 'five', 6.0, 7]


In [29]:
   # removes the first matching occurrence
mylist.remove('five')
print(mylist)

[1, 2, 3, 4, 6.0, 7]


In [None]:
   # by default, pop() removes the last item in the list and returns it
popped = mylist.pop()
print(popped)
print(mylist)

7
[1, 2, 3, 4, 6.0]


In [31]:
  # returns the number of items in a container such as a list or a string
print(len(mylist))

5


In [None]:
# default list sorting. When more complex objects are in the list, arguments can be used to customize how to sort
mylist.sort()
#Sorted by numeric variables in this instance
print(mylist)

[1, 2, 3, 4, 6.0]
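The "arguments" mentioned in the comment above are <code>key=</code> and <code>reverse=</code>. Here is a small sketch with an illustrative list of words. It also shows the built-in <code>sorted()</code>, which returns a new list instead of sorting in place, and what happens with types that cannot be compared:

```python
words = ["banana", "apple", "Cherry"]

# sorted() returns a NEW sorted list and leaves the original alone
print(sorted(words))                         # ['Cherry', 'apple', 'banana']: uppercase sorts first
print(sorted(words, key=str.lower))          # ['apple', 'banana', 'Cherry']: compare lowercased copies
print(sorted(words, key=len, reverse=True))  # ['banana', 'Cherry', 'apple']: longest first
print(words)                                 # ['banana', 'apple', 'Cherry']: unchanged

# list.sort() accepts the same key= and reverse= arguments but sorts in place
words.sort(key=str.lower)
print(words)                                 # ['apple', 'banana', 'Cherry']

# Mixing types that cannot be compared raises an error in Python 3
try:
    [1, "five"].sort()
except TypeError as err:
    print("TypeError:", err)
```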


In [33]:
  # reverse the list
mylist.reverse()
print(mylist)

[6.0, 4, 3, 2, 1]


For more information on Lists:

https://docs.python.org/3.8/tutorial/datastructures.html#more-on-lists

## Conditionals
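Python supports the standard if / else-if / else conditional, written with the keywords <code>if</code>, <code>elif</code>, and <code>else</code>. REMEMBER TO INDENT: the indentation marks which lines belong to each branch.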

In [34]:
a = 1; b = 2
if a < b:
    print('a is less than b')
elif a == b:
    print('a is equal to b')
else:
    print('a is greater than b')

a is less than b
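Conditions can be combined with the keywords <code>and</code>, <code>or</code>, and <code>not</code>. These come in handy for Exercise 4. An illustration reusing the same values of <code>a</code> and <code>b</code>:

```python
a, b = 1, 2

if a < b and b < 3:   # and: both conditions must be True
    print("a < b and b < 3")

if a > b or b == 2:   # or: at least one condition must be True
    print("a > b or b == 2")

if not a == b:        # not: flips True to False and vice versa
    print("a is not equal to b")
```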


## Loops
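Python has two kinds of loops: <code>for</code> loops, which step through the items of a sequence (what other languages call a "foreach" loop), and <code>while</code> loops, which repeat as long as a condition is True.
### For (counting)
Traditional counting loops are written in Python by combining the <code>for</code> keyword with the <code>range</code> function: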

In [35]:
 # with one argument, range produces integers from 0 to 9
for x in range(10):
    print(x)

0
1
2
3
4
5
6
7
8
9


In [38]:
   # with three arguments, range starts at 1 and counts in steps of 3, stopping before it reaches 13
for z in range(1, 13, 3):
    print(z)

1
4
7
10


### Foreach
As it turns out, counting loops are just foreach loops in Python. The <code>range</code> function returns a range object, a sequence of integers that <code>for ... in</code> iterates over. The same syntax works with any other iterable type:
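You can inspect a range object directly to see that it is a sequence of integers produced on demand rather than a stored list. An illustration:

```python
r = range(1, 13, 3)
print(r)        # range(1, 13, 3): a range object, not a list
print(list(r))  # [1, 4, 7, 10]: converting it shows the values
print(len(r))   # 4
print(7 in r)   # True
```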

In [40]:
 # iterate over a list of strings
grocery_list = ['juice', 'tomatoes', 'potatoes', 'bananas']
for item in grocery_list:
    print(item)

juice
tomatoes
potatoes
bananas
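The <code>while</code> loop mentioned above is not used elsewhere in this notebook. Here is a minimal sketch: it repeats its body as long as the condition is True, so the body must eventually change something that makes the condition False.

```python
count = 0
while count < 3:                # repeat as long as the condition is True
    print("count is", count)
    count += 1                  # without this update, the loop would never end
print("done, count is", count)  # done, count is 3
```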


### Exercise 4

Let's combine conditionals and for loops. Write code to go through the integers from 0 to 10 and print "Odd" for odd numbers, "Even" for even numbers, "Less than 5" for numbers less than 5, "Equals 5" for 5, and "Greater than 5" for numbers greater than 5.


In [51]:
# Your code here
for x in range(11):
    if x%2==0:
        if x<5:
            print(x,'Even and Less than 5')
        elif x==5:
            print(x,'Even and Equals 5')
        else:
            print(x,'Even and More than 5')
    else:
        if x<5:
            print(x,'Odd and Less than 5')
        elif x==5:
            print(x,'Odd and Equals 5')
        else:
            print(x,'Odd and More than 5')

0 Even and Less than 5
1 Odd and Less than 5
2 Even and Less than 5
3 Odd and Less than 5
4 Even and Less than 5
5 Odd and Equals 5
6 Even and More than 5
7 Odd and More than 5
8 Even and More than 5
9 Odd and More than 5
10 Even and More than 5


In [53]:
# %load snippets/Ex_04.loop_conditional.py
for num in range(11):
    if (num % 2) == 0:
        print(f"{num} is even.")
    else:
        print(f"{num} is odd.")

    if num < 5:
        print(f"{num} is less than 5.")
    elif num == 5:
        print(f"{num} equals 5")
    else:
        print(f"{num} is greater than 5.")

0 is even.
0 is less than 5.
1 is odd.
1 is less than 5.
2 is even.
2 is less than 5.
3 is odd.
3 is less than 5.
4 is even.
4 is less than 5.
5 is odd.
5 equals 5
6 is even.
6 is greater than 5.
7 is odd.
7 is greater than 5.
8 is even.
8 is greater than 5.
9 is odd.
9 is greater than 5.
10 is even.
10 is greater than 5.


## Functions

Functions are an important part of good coding. They break your code into functional elements, making it easier to write, read, and maintain. This is one step of **abstraction**: it lets you focus on the relevant portions of code (e.g. we don't normally care *how* the `.upper()` method works, but if we do, we could look at the code). Functions also help reduce repeated blocks of code.

A function may or may not take input values and may or may not return output values. 

Functions need to be defined before they can be used. The definition syntax is the following:

In [55]:
def add(a, b):
    return a + b
add(1, 3)

4

In [59]:
def player(name, number):  # a function with two arguments
    # the f-string converts number to a string automatically
    return f"Player {name} wears number {number}"
player('Michael Jordan', 25)


'Player Michael Jordan wears number 25'

Functions can have optional arguments if a default value is provided in the function signature:

In [64]:
def player(name, number, team = 'Chicago Bulls'):
    return f"Player {name} wears number {number} and plays for the {team}"
player('Michael Jordan', 25)


'Player Michael Jordan wears number 25 and plays for the Chicago Bulls'

In [65]:
   # supplying all three arguments
player('Michael Jordan', 25, 'NYC Giants')

'Player Michael Jordan wears number 25 and plays for the NYC Giants'

Python functions can also be called with named (keyword) arguments instead of positional ones; the order of named arguments does not matter:

In [None]:
player(number=24, name='Michael Jordan', team='NYC Giants')
# the names must match the parameter names in the function definition

'Player Michael Jordan wears number 24 and plays for the NYC Giants'

### return
A Python function can return any number of values: `return a, b` packs them into a tuple. A function without a `return` statement returns `None`.
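A short sketch of returning several values. The helper `min_max` is a hypothetical example, not a built-in. The returned tuple can be kept whole or unpacked into separate variables:

```python
def min_max(values):
    """Return the smallest and largest item (a hypothetical helper)."""
    return min(values), max(values)  # two values are packed into a tuple


result = min_max([4, 1, 9])
print(result)                    # (1, 9)
low, high = min_max([4, 1, 9])   # unpack the tuple into two variables
print(low, high)                 # 1 9


def no_return():
    pass


print(no_return())               # None: a function without return gives back None
```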

### Exercise 5

Write a function that takes a number of years, calculates and prints how many days, hours, minutes, and seconds that corresponds to, and returns the number of seconds.

In [71]:
def year2secs(years):
    days=years*365
    hours=days*24
    minutes=hours*60
    seconds=minutes*60
    print(f"There are {days} days, {hours} hours, {minutes} minutes and {seconds} seconds in {years} years")
    return seconds

year2secs(60)

There are 21900 days, 525600 hours, 31536000 minutes and 1892160000 seconds in 60 years


1892160000

In [70]:
# %load snippets/Ex_05.yearConvert_function.py
def yearConvert(years):
    '''Takes years, prints the equivalent days, hours, minutes, and seconds, and returns seconds.'''

    # Use a try:/except: to catch non-numeric values of years
    try:
        days=365*float(years)
    except (TypeError, ValueError):
        print(f"Expecting a number for years, got {years}, a {type(years)}.")
        return None  # stop here; days was never computed

    hours=24*days
    minutes=60*hours
    seconds=60*minutes

    print(f"{years} is:")
    print(f"   {days} days")
    print(f"   {hours} hours")
    print(f"   {minutes} minutes")
    print(f"   {seconds} seconds")

    return seconds

yearConvert(60)

60 is:
   21900.0 days
   525600.0 hours
   31536000.0 minutes
   1892160000.0 seconds


1892160000.0

# Data Science Tutorial

<div class="alert alert-block alert-info">
<b>Note:</b> We will return to look more at Pandas later, but this tutorial is a good introduction.</div>


Now that we've covered some Python basics, we will begin a tutorial going through many tasks a data scientist may perform. We will obtain real-world data and go through the process of auditing, analyzing, visualizing, and building classifiers from the data.

We will use a dataset of selected professor salaries, downloaded from a collection of datasets made available by Vincent Arel-Bundock called [Rdatasets](https://vincentarelbundock.github.io/Rdatasets/). The specific file can be downloaded using this link: 
https://vincentarelbundock.github.io/Rdatasets/csv/carData/Salaries.csv

## Obtaining the Data

[`Pandas`](https://pandas.pydata.org/) is a powerful module for tabular data--much of the data we deal with!

Using the `Pandas` library, we can easily import data from a web link or from a file on our computer (given its file path). In this case we will give it a link.

In the code below, note that it is common to use aliases when importing modules to provide shorter names to reference them. In this case, Pandas is normally imported as `pd`. You could call it whatever you want, but it is best to stick with convention.

In [None]:
import pandas as pd # import the module and alias it as pd

salary_data = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/carData/Salaries.csv')
salary_data.head() # show the first few rows of the data

Let's take a look at some simple statistics for the **yrs.since.phd** column.

<code>salary_data.mean(numeric_only=True).round()</code> will take the mean of each numeric column (skipping any missing `nan`, "not a number", values), round the results, and return a Series indexed by the column names of the original dataframe. The <code>numeric_only=True</code> argument skips text columns such as <code>rank</code>, which recent versions of pandas would otherwise refuse to average.

Passed to <code>fillna</code>, this result can be used to replace all missing values with the mean of each column. In this tutorial, however, we will not use this method: filling many missing values with the mean would distort (artificially shrink) our standard deviations.
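Since the code cells for this part are filled in during class, here is a hedged sketch of these calls on a tiny DataFrame that reuses three column names from Salaries.csv. The values are invented for illustration and are not from the real dataset:

```python
import pandas as pd

# A tiny made-up DataFrame; the values are NOT from Salaries.csv
toy = pd.DataFrame({
    "rank": ["Prof", "AsstProf", "AssocProf", "Prof"],
    "yrs.since.phd": [20, 5, None, 35],      # None becomes NaN (a missing value)
    "salary": [120000, 80000, 100000, None],
})

# Simple statistics for one column (a Series); missing values are skipped
print(toy["yrs.since.phd"].describe())

# Mean of each numeric column, rounded; numeric_only=True skips the text column "rank"
col_means = toy.mean(numeric_only=True).round()
print(col_means)

# fillna() with a Series fills each column's gaps with the matching column's value
filled = toy.fillna(col_means)
print(filled)
```

With the real data, you would call the same methods on <code>salary_data</code> instead of <code>toy</code>.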

#### Check Unique Values

Structurally, Pandas DataFrames are a collection of Series objects sharing a common index. In general, the Series and DataFrame objects share a large number of methods, with some behavioral differences. In other words, whatever computation you can do on a single column can generally be applied to the entire DataFrame.

Now we can use the DataFrame version of <code>describe</code> to get an overview of all of our data:
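As a sketch of the Series/DataFrame parallel, including the unique values this heading refers to, here is another small DataFrame with invented values (not taken from Salaries.csv):

```python
import pandas as pd

# Another small made-up DataFrame (values invented for illustration only)
toy = pd.DataFrame({
    "rank": ["Prof", "AsstProf", "Prof", "AssocProf", "Prof"],
    "sex": ["Male", "Female", "Male", "Male", "Female"],
    "salary": [120000, 80000, 130000, 95000, 125000],
})

# Series methods work on a single column...
print(toy["rank"].unique())         # distinct values, in order of first appearance
print(toy["rank"].value_counts())   # how often each value occurs

# ...and many have DataFrame versions that work on every column at once
print(toy.nunique())                # number of distinct values per column
print(toy.describe())               # numeric columns only, by default
print(toy.describe(include="all"))  # text columns too (count, unique, top, freq)
```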

## Visualizing the Data
Another important tool in the data scientist's toolbox is the ability to create visualizations from data.  Visualizing data is often the most logical place to start getting a deeper intuition of the data.  This intuition will shape and drive your analysis.

Even more important than visualizing data for your own benefit, it is often the job of the data scientist to use the data to tell a story. Creating illustrative visuals that succinctly convey an idea is one of the best ways to tell that story, especially to stakeholders with less technical skill sets.

We'll be using the plotting library matplotlib. It is one of the most widely used Python plotting libraries, and other packages are built on top of it (like a library called seaborn) to make your plots more attractive and easier to create.
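To show the layer seaborn builds on, here is a hedged sketch of a histogram drawn with matplotlib alone. The list of years is invented for illustration and is not taken from Salaries.csv:

```python
import matplotlib.pyplot as plt

# Invented example values, NOT taken from Salaries.csv
years_since_phd = [3, 5, 8, 12, 15, 19, 22, 25, 31, 40]

fig, ax = plt.subplots(figsize=(8, 5))  # a figure with one set of axes
counts, bin_edges, patches = ax.hist(years_since_phd, bins=5)
ax.set_xlabel("yrs.since.phd")
ax.set_ylabel("number of professors")
ax.set_title("A histogram drawn with matplotlib alone")
plt.show()
```

<code>ax.hist</code> returns the count in each bin, the bin edges, and the drawn bars. Seaborn wraps this kind of call with nicer defaults.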

We'll start by doing a bit of setup.

In [None]:
#importing matplotlib library with an alias as well as the seaborn library
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style = 'darkgrid', color_codes = True)   # my personal style preferences

# hack to make seaborn plots bigger on jupyter notebooks
def setPlt():
    f, ax = plt.subplots(figsize = (13,9))
    sns.despine(f, left = True, bottom = True)

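Let's go ahead and start with a histogram of the years since the professors got their Ph.D., using seaborn's <code>distplot()</code> function. (In seaborn 0.11 and later, <code>distplot()</code> is deprecated in favor of <code>histplot()</code>, so you may see a warning.)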

In [None]:
# create our first plot, a histogram of years since Ph.D.

setPlt()
hist = sns.distplot(salary_data['yrs.since.phd'])

Visualization is all about asking questions of the data. One thing we could be curious about is how pay changes the longer people have had their Ph.D. We can make a scatterplot of exactly that using seaborn's <code>scatterplot()</code> function.

In [None]:
setPlt()
scatplt = sns.scatterplot(x = 'yrs.since.phd', y = 'salary', data = salary_data)

If the above does not work (older versions of seaborn do not have a <code>scatterplot()</code> function), please follow these steps:

**Mac**

- Open Terminal, type <code>conda remove seaborn</code>, and press Enter.
- Then type <code>conda install seaborn==0.9.0</code>.

**Windows**

- Open Anaconda Prompt (press the Windows key and type "Anaconda Prompt").
- Type <code>conda remove seaborn</code>.
- Then type <code>conda install seaborn==0.9.0</code>.

You may need to restart Anaconda afterward.

It seems that some people who have had their Ph.D. for a while still don't get paid a lot. Does the same hold true for how long they've worked? What about their position?

In [None]:
setPlt()
sns.scatterplot(x = 'yrs.service', y = 'salary' ,data = salary_data)

We can also color our graph fairly easily. Let's color the points of the years-since-Ph.D. vs. salary plot by rank to see how salary is distributed across positions.

In [None]:
#colored scatter plot
setPlt()
sns.scatterplot(x = 'yrs.since.phd', y = 'salary', hue = 'rank', data = salary_data)

## Summary

So far in our three-part Python series (check out DSI's [Python 1](https://github.com/dsiufl/Python-Workshops/tree/master/Python1) and [Python 2](https://github.com/dsiufl/Python-Workshops/tree/master/Python2) modules for more), we've learned about variables, data structures, functions, and graphing. While we have introduced these topics in the context of data science with Python, they are central to programming in any language and in any context.


### Data Science in a Nutshell
We believe that data science has the potential to revolutionize the way we understand our world. Anyone can learn the tools of Data Science in order to ensure success. Our goal is to give you these tools and create a community of data scientists here at UF.

We hope you enjoyed the workshop and look forward to seeing you soon!

#### Visit our website if you want to get involved with DSI:  http://www.dsiufl.org

# Thank You!

In [32]:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def introduction(self):
        print("Hello my name is " + self.name)

In [33]:
P = Person("Matt",25)
P.introduction()

Hello my name is Matt
