## Lighthouse Labs
### W01D2 Programming in Python for DS
Instructor: Socorro Dominguez  
June 22, 2021

**Agenda:**

* Housekeeping Guidelines
* Introduction to Python Programming Language
    * History
    * Syntax
    * Basic Data Types
* Python for Data Science
* Popular IDEs

## Housekeeping

- Mics muted (to avoid background noises)
- Cameras on
- Unmute yourself whenever you want to ask something.
    - You can ask me questions. I can ask you questions too. :)
    

- We will take a 10 min break after 45/50 minutes of class.

[Download this notebook](https://downgit.github.io/#/home?url=https://github.com/sedv8808/LighthouseLabs/blob/main/W01D2/W01D2_PPDS.ipynb)

# What is Python?

- Python is a widely used general-purpose, high-level programming language. 
- First released by Guido van Rossum in 1991; its development is now managed by the Python Software Foundation.
- Developed to allow programmers to express concepts in fewer lines of code.

- Python is a general-purpose, object-oriented programming language, which means that it can model real-world entities. It is also dynamically typed because it carries out type-checking at runtime.

- The distinctive feature of Python is that it is an interpreted language (no need to compile).
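A minimal sketch of what dynamic typing looks like in practice. The variable name `value` is only illustrative; the point is that the same name can be rebound to objects of different types, and a type problem shows up only when the offending line runs.

```python
# The same name can refer to objects of different types over time
value = 10
first_type = type(value).__name__     # 'int'

value = "ten"
second_type = type(value).__name__    # 'str'

print(first_type, second_type)

# Type errors surface only when the offending line actually runs
try:
    value + 1                         # str + int is not allowed
except TypeError as error:
    print("TypeError:", error)
```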

We will be working with Python 3, which was released in December 2008.

 
![image](https://d2h0cx97tjks2p.cloudfront.net/blogs/wp-content/uploads/sites/2/2017/12/Features-of-python-01.jpg)

### Python History

Python was first introduced by <b>Guido van Rossum</b> in 1991 at the National Research Institute for Mathematics and Computer Science (CWI) in the Netherlands.
[source:https://www.journaldev.com/34415/history-of-python-programming-language]

#### Python Versions
* In 1994, Python 1.0 was released with new features like lambda, map, filter, and reduce.

* Python 2.0 added new features such as list comprehensions and a garbage collection system.

* On December 3, 2008, Python 3.0 (also called "Py3K") was released. It was designed to rectify fundamental design flaws of the language.
[source:https://www.javatpoint.com/python-history]

### Fun Facts

* Guido van Rossum named it after the comedy show [Monty Python's Flying Circus](https://www.netflix.com/ca/title/70213238).
* Python has become one of the most popular programming languages in the world. This makes a career in Python a great choice.
* Python has just turned 30, but it still has that X factor: 
    * Google users have searched for Python much more than they have searched for Kim Kardashian, Donald Trump, Tom Cruise etc.

### Why Python for Data Science?
* Simple programming language to pick up, from a syntax point of view. 

* Python also has an active community with a vast selection of libraries and resources.

* Professionals working with data science applications don’t want to be bogged down with complicated programming requirements. They want to use programming languages like Python and Ruby to perform tasks hassle-free.

[source:https://www.cbtnuggets.com/blog/technology/data/why-data-scientists-love-python]

## What is Jupyter?

- For this Bootcamp, we will mostly be using Python via [Jupyter](https://jupyter.org/index.html)

- You can think of Python like a car’s engine, while Jupyter is like a car’s dashboard

  - Python is the programming language that runs computations
  - Jupyter is an integrated development environment (IDE) that provides an interface by adding convenient features and tools

![engine](img/python_jupyter.png)

## Jupyter Notebooks

- Code, plots, formatted text, equations, etc. in a single document
- Run Python code interactively
- Also supports R, Julia, Perl, and over 100 other languages (and counting!)

- Notebooks are great for exploration and for documenting your workflow
- Many options for sharing notebooks in human readable format:
  - Share online with [nbviewer.jupyter.org](http://nbviewer.jupyter.org/)
  - If you use GitHub, any notebooks you upload are automatically rendered on the site
  - Convert to HTML, PDF, etc. with [nbconvert](https://nbconvert.readthedocs.io/en/latest/)

## Working with Notebooks

A notebook consists of a series of "cells":
- **Code cells**: execute snippets of code and display the output
- **Markdown cells**: formatted text, equations, images, and more

By default, a new cell is always a code cell.

## Python Data Science Ecosystem

The Python libraries for data science are developed and maintained by external "3rd party" development teams
- Python core + 3rd party libraries = **ecosystem** 
- To install and manage 3rd party libraries, you need to use a package manager such as `conda` (which comes with Anaconda/Miniconda)

Some of the libraries in the Python data science ecosystem:

![ecosystem_big](img/ecosystem_big.png)

From [The Unexpected Effectiveness of Python in Science](https://speakerdeck.com/jakevdp/the-unexpected-effectiveness-of-python-in-science) (Jake VanderPlas)

During the Bootcamp, we will be working with the `pandas`, `numpy`, `seaborn`, `matplotlib`, `plotly`, `sklearn` and `keras` libraries.
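As a rough sketch, you can check which of these libraries are already installed in your environment without importing them. The list and dictionary names below are illustrative, and the check uses only the standard library's `importlib`.

```python
from importlib.util import find_spec

# Import names of the libraries listed above (scikit-learn is imported as "sklearn")
libraries = ["pandas", "numpy", "seaborn", "matplotlib", "plotly", "sklearn", "keras"]

# find_spec returns None when a top-level package cannot be found
availability = {name: find_spec(name) is not None for name in libraries}

for name, installed in availability.items():
    print(f"{name:12s} {'installed' if installed else 'missing'}")
```

Anything reported as missing can then be installed with your package manager.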

## 2. Basic Syntax and Data Types

### Built in Data Types - Values and Objects
We will be working with values, which are pieces of data that a computer program works with, such as a number or text.
We will assign a lot of these values to objects (variables) with the assignment operator `=`.
Every value belongs to a data type.

Here are some data types built-in to the Python language:

* Integers - `int`
* Floating-point numbers - `float` - NaN belongs to this group
* Strings - `str`
* Booleans - `bool` - two values: True and False.
* Lists - `list`
* Tuples - `tuple`
* Sets - `set`
* Dictionaries - `dict`
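A quick sketch with one example value per type from the list above. The values themselves are arbitrary; `type(value).__name__` gives the short type name as a string.

```python
# One example value for each built-in type listed above
examples = [
    46,                    # int
    3.14,                  # float
    float("nan"),          # float (NaN is a float value)
    "Mushu",               # str
    True,                  # bool
    [1, 2, 3],             # list
    (1, 2, 3),             # tuple
    {1, 2, 3},             # set
    {"name": "Mushu"},     # dict
]

type_names = []
for value in examples:
    type_names.append(type(value).__name__)

print(type_names)
```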


In [1]:
length = 46

In [2]:
string = "Mulan's dragon's name is Mushu"

In [3]:
type(length)

int

In [7]:
type(string)

str

### String "Verbs"

There are several methods/verbs to transform strings or extract information from them.

In [4]:
len(string)

30

In [5]:
string.upper()

"MULAN'S DRAGON'S NAME IS MUSHU"

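A few more string methods, sketched on the same example string. The replacement word `"friend"` is just an illustration. Note that these methods return new values and leave the original string unchanged.

```python
string = "Mulan's dragon's name is Mushu"

lowered = string.lower()                      # all lowercase
replaced = string.replace("dragon", "friend") # swap one substring for another
count_a = string.count("a")                   # occurrences of the letter "a"
starts = string.startswith("Mulan")           # True or False

print(lowered)
print(replaced)
print(count_a, starts)
```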
### Numerical Data Types and Casting

In [10]:
age = 6 
type(age)

int

In [11]:
age = 6.0 
type(age)

float

**IMPORTANT**
Something that you will notice in Pandas dataframes are NaN values.

This stands for **Not A Number**, and it is a special value used to represent missing data in pandas.

It is considered a **float** numeric value.
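A small sketch of how NaN behaves, using plain Python rather than pandas. The variable name `missing` is illustrative.

```python
import math

missing = float("nan")        # a NaN value created without pandas

print(type(missing))          # the type is float
print(missing == missing)     # NaN is not equal to anything, even itself
print(math.isnan(missing))    # use isnan() to test for NaN instead of ==
```

Because NaN never compares equal to itself, checks such as `x == float("nan")` do not work; use a dedicated test like `math.isnan()`.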

**Casting**

- `int` to `float`:

In [12]:
float(4)

4.0

- `int` to `str`

In [13]:
length = 40
length_string = str(length)
print(length_string)
type(length_string)

40


str

- `float` to `int`

In [14]:
int(4.99)

4
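Casting also works from strings, and it can fail. The sketch below uses illustrative variable names and shows that `int()` truncates rather than rounds, and that text which is not a number raises a `ValueError`.

```python
# Strings that look like numbers can be cast to numeric types
count = int("42")
price = float("3.5")

# int() drops the decimal part; round() goes to the nearest integer
truncated = int(4.99)
rounded = round(4.99)
print(count, price, truncated, rounded)

# Casting text that is not a number raises a ValueError
try:
    int("forty")
except ValueError as error:
    print("ValueError:", error)
```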

### Python data structures: Lists, Tuples, Dictionaries and Sets

### Lists

Similar to how a string is a sequence of characters in order, a list is a sequence of elements with a particular order.

Lists can be identified by their square brackets.

The elements in a list can be any objects, and they don’t all need to have the same type.

In [17]:
string_list = string.split()
string_list

["Mulan's", "dragon's", 'name', 'is', 'Mushu']

By using the .split() verb, we can get a list from a string.

##### Slicing

We can slice lists by elements.

We slice with `[]`; the start is inclusive, and the end is exclusive.

So `string_list[1:3]` fetches the elements at indices 1 and 2, but not 3.

Because indexing starts at 0, these are the 2nd and 3rd elements in the list.

In interval notation, the range is [start, end).

In [18]:
string_list[1:3]

["dragon's", 'name']
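A few more slicing patterns, sketched on the same list. The variable names on the left are illustrative.

```python
string_list = ["Mulan's", "dragon's", 'name', 'is', 'Mushu']

first_two = string_list[:2]      # start omitted: from the beginning
from_third = string_list[2:]     # end omitted: through the last element
last_item = string_list[-1]      # a negative index counts from the end
every_other = string_list[::2]   # a third number sets the step

print(first_two)
print(from_third)
print(last_item)
print(every_other)
```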

#### List Comprehension

We can manipulate a whole list using list comprehension:

In [19]:
new_list = [0,3,4,5,7,8,3,3,7,9,5,5]

In [20]:
[x+5 for x in new_list]

[5, 8, 9, 10, 12, 13, 8, 8, 12, 14, 10, 10]
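A list comprehension can also filter elements with an `if` clause. The sketch below also shows the `x+5` comprehension from above written as an ordinary loop, to make clear what the one-liner does; `evens` and `plus_five` are illustrative names.

```python
new_list = [0, 3, 4, 5, 7, 8, 3, 3, 7, 9, 5, 5]

# Keep only the even numbers
evens = [x for x in new_list if x % 2 == 0]

# The same transformation as [x+5 for x in new_list], written as a loop
plus_five = []
for x in new_list:
    plus_five.append(x + 5)

print(evens)
print(plus_five)
```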

### Tuples
Tuples are a data structure very similar to lists but with two main differences:

- They are represented with parentheses instead of square brackets, and
- They are immutable

In [15]:
my_tuple = ('I', None,  'do', 1, False)
my_tuple

('I', None, 'do', 1, False)
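A short sketch of what "immutable" means: changing an element works on a list but raises a `TypeError` on a tuple. The names `my_list` and `tuple_changed` are illustrative.

```python
my_tuple = ('I', None, 'do', 1, False)
my_list = list(my_tuple)     # a list copy of the same elements

my_list[0] = 'We'            # lists can be changed in place

try:
    my_tuple[0] = 'We'       # tuples cannot
    tuple_changed = True
except TypeError as error:
    tuple_changed = False
    print("TypeError:", error)

print(my_list)
print(my_tuple)
```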

### Sets

Sets are a data structure in which:
- The values are unordered, meaning there is no element 0 and element 1, and
- The values are unique, meaning there are no duplicate entries.
- Sets are made with curly brackets.

In [16]:
my_set = {2, 1.0, 'apple', 1.0, 'apple'}
my_set

{1.0, 2, 'apple'}
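Sets are handy for removing duplicates and for membership tests. A minimal sketch with made-up example data:

```python
fruits = ['apple', 'pear', 'apple', 'kiwi', 'pear']

unique_fruits = set(fruits)           # duplicates are dropped
has_kiwi = 'kiwi' in unique_fruits    # membership test

a = {1, 2, 3}
b = {3, 4}
union = a | b                         # values in either set
intersection = a & b                  # values in both sets

print(len(unique_fruits), has_kiwi)
print(union, intersection)
```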

### Dictionaries
Dictionaries are collections of key-value pairs. Each key is unique and is used to look up its value. Since Python 3.7, dictionaries also preserve insertion order.

In [6]:
account_details = {'Name':'Jack Sparrow','Account_Type':'Checking','Branch':'North York','Age':32}
account_details

{'Name': 'Jack Sparrow',
 'Account_Type': 'Checking',
 'Branch': 'North York',
 'Age': 32}

In [7]:
account_details['Account_Type']

'Checking'

In [8]:
account_details.keys()

dict_keys(['Name', 'Account_Type', 'Branch', 'Age'])

In [10]:
account_details.items()

dict_items([('Name', 'Jack Sparrow'), ('Account_Type', 'Checking'), ('Branch', 'North York'), ('Age', 32)])

#### Why do we need dictionaries if we have lists?

Dictionaries associate each value with a label (key), whereas a list only has a positional index. In a list, you retrieve an element by its position; in a dictionary, you retrieve a value by its key, regardless of position.
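A short sketch of common dictionary operations on the account example. The `'Balance'` and `'City'` keys are hypothetical additions for illustration only.

```python
account_details = {'Name': 'Jack Sparrow', 'Account_Type': 'Checking',
                   'Branch': 'North York', 'Age': 32}

account_details['Age'] = 33            # update an existing key
account_details['Balance'] = 100.0     # add a new key (hypothetical field)

# .get() returns a default instead of raising KeyError for a missing key
city = account_details.get('City', 'unknown')

# Loop over keys and values together
for key, value in account_details.items():
    print(key, '->', value)
```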

### Summary

| Data Structure | Preserves order | Mutable | Symbol | Can contain duplicates | Can be sliced |
|---------|------|------|------|------|------|
| str | ✓ | ☓ | '' , "" | ✓ | ✓ |
| list | ✓ | ✓ | [] | ✓ | ✓ |
| tuple | ✓ | ☓ | () | ✓ | ✓ |
| set | ☓ | ✓ | {} | ☓ | ☓ |
| dict | ✓ | ✓ | {} | ☓ (keys), ✓ (values) | ☓ |

### Errors and Exceptions Handling

In [7]:
def div_func(a,b):
    '''Return the result of dividing a by b.'''
    result = a/b
    return result

In [8]:
div_func(5,2.0)

2.5

In [9]:
div_func(5,0.0)

ZeroDivisionError: float division by zero

In [10]:
def div_func(a,b):
    try:
        result = a/b
    except ZeroDivisionError:
        print('b is zero and division is not possible')
        return
    return result

In [11]:
div_func(5,-5)

-1.0

In [12]:
div_func(5,0.0)

b is zero and division is not possible


In [13]:
def count_letter_in_word(input_word='gulp'):
    return len(input_word)

In [14]:
count_letter_in_word(5)

TypeError: object of type 'int' has no len()

In [11]:
def count_letter_in_word(input_word='gulp'):
    if not isinstance(input_word, str):
        raise TypeError('incorrect input type. Please provide a string input')
    return len(input_word)

In [34]:
count_letter_in_word('horse')

5

In [16]:
count_letter_in_word(5)

TypeError: incorrect input type. Please provide a string input

In [12]:
def count_letter_in_word_new(input_word='gulp'):
    result=0
    try:
        result=len(input_word)
    except Exception as e:
        print(e)
    return result

In [37]:
count_letter_in_word_new(7)

object of type 'int' has no len()


0
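Catching a specific exception is usually safer than catching `Exception`, because unexpected errors are not silently hidden. The sketch below uses a hypothetical `safe_div` function with one `except` branch per expected problem and an `else` branch that runs only when nothing went wrong.

```python
def safe_div(a, b):
    '''Return a / b, or None when the division cannot be carried out.'''
    try:
        result = a / b
    except ZeroDivisionError:
        print('b is zero and division is not possible')
        return None
    except TypeError:
        print('both inputs must be numbers')
        return None
    else:
        # Runs only when no exception was raised
        return result


print(safe_div(5, 2))
print(safe_div(5, 0))
print(safe_div(5, 'a'))
```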

### Handling Date and Time in Python

**Get current date time**

In [17]:
# import datetime class from datetime module
from datetime import datetime

# get current date and time
datetime_object = datetime.now()
print(datetime_object)
print('Type :- ',type(datetime_object))

2021-04-27 11:44:37.511317
Type :-  <class 'datetime.datetime'>


In [18]:
my_string = '2018-11-3'

# Create date object in given time format yyyy-mm-dd
my_date = datetime.strptime(my_string, "%Y-%m-%d")

print(my_date)
print('Type: ',type(my_date))

2018-11-03 00:00:00
Type:  <class 'datetime.datetime'>


In [49]:
print('Month: ', my_date.month) # get the month from the date
print('Year: ', my_date.year) # get the year from the date

Month:  11
Year:  2018


#### Measuring Time Span with Timedelta Objects

In [13]:
#import datetime
from datetime import datetime, timedelta
# get current time
now = datetime.now()
print ("Today's date: ", str(now))

Today's date:  2021-05-25 12:01:46.292124


In [15]:
#add 15 days to current date
future_date_after_15days = now + timedelta(days = 15)
print('Date after 15 days: ', future_date_after_15days)

#subtract 2 weeks from current date
two_weeks_ago = now - timedelta(weeks = 2)
print('Date two weeks ago: ', two_weeks_ago)
print('two_weeks_ago object type: ', type(two_weeks_ago))

Date after 15 days:  2021-06-09 12:01:46.292124
Date two weeks ago:  2021-05-11 12:01:46.292124
two_weeks_ago object type:  <class 'datetime.datetime'>


#### Find the Difference Between Two Dates and Times

In [16]:
# import date class from datetime module
from datetime import date
# Create two dates
date1 = date(2008, 8, 18)
date2 = date(2008, 8, 10)

# Difference between two dates
delta = date2 - date1
print("Difference: ", delta.days)
print('delta object type: ', type(delta))

Difference:  -8
delta object type:  <class 'datetime.timedelta'>
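Subtracting two `datetime` objects also gives a `timedelta`, which then carries the time of day as well. A minimal sketch with made-up start and end times:

```python
from datetime import datetime

start = datetime(2008, 8, 10, 9, 0)
end = datetime(2008, 8, 18, 21, 30)

elapsed = end - start                        # a timedelta object
hours = elapsed.total_seconds() / 3600       # whole span expressed in hours

print("Days: ", elapsed.days)
print("Hours: ", hours)
```

Note that `.days` holds only the whole-day part of the span, while `.total_seconds()` covers the full duration.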


**Accessing certain date attributes**

In [53]:
date1.month

8

#### Formatting Dates: More on <code>strftime()</code> and <code>strptime()</code>

<code>strptime()</code> is the method we used before, and you’ll recall that it can turn a date and time that’s formatted as a text string into a datetime object, in the following format:

<code>datetime.strptime(string, format)</code>

Note that it takes two arguments:

<code>string</code> − the time in string format that we want to convert

<code>format</code> − the specific formatting of the time in the string, so that <code>strptime()</code> can parse it correctly

In [54]:
# import datetime
from datetime import datetime
date_string = "1 August, 2019"

# format date
date_object = datetime.strptime(date_string, "%d %B, %Y")

print("date_object: ", date_object)

date_object:  2019-08-01 00:00:00


#### Below is a list of directives used in <code>strftime()</code> and their meanings

| Directive | Meaning |
|-----------|---------|
| %a | Abbreviated weekday name: Sun, Mon, …, Sat (en_US) |
| %A | Full weekday name: Sunday, Monday, …, Saturday (en_US) |
| %w | Weekday as a decimal number: 0 (Sunday), 1, …, 6 (Saturday) |
| %d | Day of the month as a zero-padded decimal number: 01, 02, …, 31 |
| %b | Abbreviated month name: Jan, Feb, …, Dec (en_US) |
| %B | Full month name: January, February, …, December (en_US) |
| %m | Month as a zero-padded decimal number: 01, 02, …, 12 |
| %y | Year without century as a zero-padded decimal number: 00, 01, …, 99 |
| %Y | Year with century: 1970, 1980, etc. |
| %H | Hour (24-hour clock) as a zero-padded decimal number: 00, 01, …, 23 |
| %I | Hour (12-hour clock) as a zero-padded decimal number: 01, 02, …, 12 |
| %p | AM or PM (en_US) |
| %M | Minute as a zero-padded decimal number: 00, 01, …, 59 |
| %S | Second as a zero-padded decimal number: 00, 01, …, 59 |
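`strftime()` goes the other way from `strptime()`: it turns a datetime object into a formatted string. A minimal sketch using a fixed, made-up date so the result is reproducible; month names and AM/PM assume the default English locale.

```python
from datetime import datetime

# A fixed date and time so the output does not depend on when this runs
moment = datetime(2019, 8, 1, 14, 30, 0)

iso_style = moment.strftime("%Y-%m-%d")      # year-month-day
long_style = moment.strftime("%d %B, %Y")    # day, full month name, year
clock_12h = moment.strftime("%I:%M %p")      # 12-hour clock with AM/PM

print(iso_style)
print(long_style)
print(clock_12h)

# Parsing the formatted string with the same directives recovers the date
round_trip = datetime.strptime(long_style, "%d %B, %Y")
print(round_trip.date() == moment.date())
```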


### IDEs for Python and DS 

![](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python+IDEs/patolino-pernalonga-python-ide2.gif)

Finding an IDE that works for you is important. 

If you are already familiar with an IDE, use it. 

Some previous LHL students have reported being happy with Visual Studio and others are happy with Jupyter Notebooks.

### Python challenge

#### Goal:
Make a two-player Rock-Paper-Scissors game. Ask each player for a play (using input), compare the plays, print a message of congratulations to the winner, and ask if the players want to start a new game.

#### Rules:

* Rock beats scissors
* Scissors beats paper
* Paper beats rock

#### Concepts Used

* Loops (for, while, etc.)
* if else

#### Function Template

* def compare(user1_answer='rock',user2_answer='paper')

#### Error Handling
* If the user_answer input is anything other than 'rock','paper' or 'scissors' return a string stating 'invalid input'

#### Libraries needed
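* sys - only needed if you want to stop the program with `sys.exit()`. User input is read with the built-in `input()` function, which needs no import.

#### Example on how to read user input with <code>input()</code> is below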

In [56]:
import sys

age = input("What's your age?")
days_old = float(age)*365
print('you are '+ str(days_old)+' days old.')

What's your age?10
you are 3650.0 days old.


#### Basic if else

In [57]:
age_years = int(age)  # convert once instead of in every condition
if 30 <= age_years <= 100:
    print('you are an adult')
elif 15 <= age_years < 30:
    print('you are a young adult')
elif 0 <= age_years < 15:
    print('you are a kid')
else:
    print('I do not understand the input!')

you are a kid


#### Code Solution for challenge

In [58]:
user1 = input("What's your name?")
user2 = input("And your name?")

What's your name?A
And your name?B


In [61]:
user1_answer = input("%s, do you want to choose rock, paper or scissors?" % user1)
user2_answer = input("%s, do you want to choose rock, paper or scissors?" % user2)

A, do you want to choose rock, paper or scissors?
B, do you want to choose rock, paper or scissors?


In [1]:
def compare(u1, u2):
    '''Return the result message for one round of Rock-Paper-Scissors.'''
    valid_plays = ('rock', 'paper', 'scissors')
    # Validate both plays first so a bad second input is not scored as a loss
    if u1 not in valid_plays or u2 not in valid_plays:
        return("Invalid input! You have not entered rock, paper or scissors, try again.")
    if u1 == u2:
        return("It's a tie!")
    elif u1 == 'rock':
        if u2 == 'scissors':
            return("Rock wins!")
        else:
            return("Paper wins!")
    elif u1 == 'scissors':
        if u2 == 'paper':
            return("Scissors win!")
        else:
            return("Rock wins!")
    else:  # u1 == 'paper'
        if u2 == 'rock':
            return("Paper wins!")
        else:
            return("Scissors win!")

In [None]:
print(compare(user1_answer, user2_answer))
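An alternative sketch of the comparison, using a dictionary instead of nested `if` statements. The names `BEATS` and `compare_with_lookup`, and the short return strings, are illustrative choices and not part of the challenge template.

```python
# Each key beats the value it maps to
BEATS = {'rock': 'scissors', 'scissors': 'paper', 'paper': 'rock'}


def compare_with_lookup(u1, u2):
    '''Return 'tie', 'player 1', 'player 2' or 'invalid input'.'''
    if u1 not in BEATS or u2 not in BEATS:
        return 'invalid input'
    if u1 == u2:
        return 'tie'
    # Player 1 wins when their play beats player 2's play
    return 'player 1' if BEATS[u1] == u2 else 'player 2'


print(compare_with_lookup('rock', 'scissors'))
print(compare_with_lookup('paper', 'scissors'))
print(compare_with_lookup('rock', 'lizard'))
```

With this layout the rules live in one place, so changing them means editing only the dictionary.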

## Cooking Challenge

Alberto is making spaghetti tonight and he needs to make sure that if he doesn't have enough of the ingredients in his pantry, he adds them to his shopping list.

- For each item in the recipe, check if the ingredient is in Alberto's pantry.

- If the recipe ingredient is in the pantry, check if the recipe requires more of the ingredient than what Alberto has in storage. If so, add the name and the quantity he needs to purchase as key-value pairs in the dictionary shopping_list.

- If the recipe item is not in the pantry, add the ingredient and the quantity as **key-value pairs in the dictionary** shopping_list.

In [1]:
pantry = {'pasta': 3, 'garlic': 4,'sauce': 2,
          'basil': 2, 'salt': 3, 'olive oil': 3,
          'rice': 3, 'bread': 3, 'peanut butter': 1,
          'flour': 1, 'eggs': 1, 'onions': 1, 'mushrooms': 3,
          'broccoli': 2, 'butter': 2,'pickles': 6, 'milk': 2,
          'chia seeds': 5}

meal_recipe = {'pasta': 2, 'garlic': 2, 'sauce': 3,
          'basil': 4, 'salt': 1, 'pepper': 2,
          'olive oil': 2, 'onions': 2, 'mushrooms': 6}


shopping_list = dict()

In [64]:
shopping_list

{}

In [4]:
for key, value in pantry.items():
    print(key, value)

pasta 3
garlic 4
sauce 2
basil 2
salt 3
olive oil 3
rice 3
bread 3
peanut butter 1
flour 1
eggs 1
onions 1
mushrooms 3
broccoli 2
butter 2
pickles 6
milk 2
chia seeds 5


In [12]:
for ingredient in pantry:
    print("first loop", ingredient)
    for ingredient2 in meal_recipe:
        print("2", ingredient2)

first loop pasta
2 pasta
2 garlic
2 sauce
2 basil
2 salt
2 pepper
2 olive oil
2 onions
2 mushrooms
first loop garlic
2 pasta
2 garlic
2 sauce
2 basil
2 salt
2 pepper
2 olive oil
2 onions
2 mushrooms
first loop sauce
2 pasta
2 garlic
2 sauce
2 basil
2 salt
2 pepper
2 olive oil
2 onions
2 mushrooms
first loop basil
2 pasta
2 garlic
2 sauce
2 basil
2 salt
2 pepper
2 olive oil
2 onions
2 mushrooms
first loop salt
2 pasta
2 garlic
2 sauce
2 basil
2 salt
2 pepper
2 olive oil
2 onions
2 mushrooms
first loop olive oil
2 pasta
2 garlic
2 sauce
2 basil
2 salt
2 pepper
2 olive oil
2 onions
2 mushrooms
first loop rice
2 pasta
2 garlic
2 sauce
2 basil
2 salt
2 pepper
2 olive oil
2 onions
2 mushrooms
first loop bread
2 pasta
2 garlic
2 sauce
2 basil
2 salt
2 pepper
2 olive oil
2 onions
2 mushrooms
first loop peanut butter
2 pasta
2 garlic
2 sauce
2 basil
2 salt
2 pepper
2 olive oil
2 onions
2 mushrooms
first loop flour
2 pasta
2 garlic
2 sauce
2 basil
2 salt
2 pepper
2 olive oil
2 onions
2 mushrooms

In [6]:
for i in pantry:
    print(pantry[i])

3
4
2
2
3
3
3
3
1
1
1
1
3
2
2
6
2
5


In [8]:
pantry['pasta'] - meal_recipe['pasta']

1
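One possible way to put these pieces together is sketched below. It assumes that "the quantity he needs to purchase" means the shortfall (recipe amount minus pantry amount) for ingredients Alberto already has, and the full recipe amount for ingredients he does not have. The name `shortfall` is illustrative.

```python
pantry = {'pasta': 3, 'garlic': 4, 'sauce': 2,
          'basil': 2, 'salt': 3, 'olive oil': 3,
          'rice': 3, 'bread': 3, 'peanut butter': 1,
          'flour': 1, 'eggs': 1, 'onions': 1, 'mushrooms': 3,
          'broccoli': 2, 'butter': 2, 'pickles': 6, 'milk': 2,
          'chia seeds': 5}

meal_recipe = {'pasta': 2, 'garlic': 2, 'sauce': 3,
               'basil': 4, 'salt': 1, 'pepper': 2,
               'olive oil': 2, 'onions': 2, 'mushrooms': 6}

shopping_list = dict()

# Only the recipe needs to be looped over; the pantry is looked up by key
for ingredient, needed in meal_recipe.items():
    if ingredient in pantry:
        shortfall = needed - pantry[ingredient]
        if shortfall > 0:
            # Assumption: buy only the missing amount
            shopping_list[ingredient] = shortfall
    else:
        # Not in the pantry at all: buy the full recipe amount
        shopping_list[ingredient] = needed

print(shopping_list)
```

A single loop over `meal_recipe` is enough here because dictionary lookups by key replace the nested loop over both dictionaries.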

## Additional Resources
[Python Documentation](https://docs.python.org/3/contents.html)  
[O'Reilly Python DS Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)  
[UBC Programming in Python for DS](https://prog-learn.mds.ubc.ca/en/)