# Python


## Intro to Python

*Computer programming* is the process of giving instructions to a computer to perform an action or set of actions. Computer programming is done using a *programming language*--the words and symbols we use to write instructions for computers to follow.

Data professionals use Python to analyze data faster and more efficiently because it supports every phase of the data workflow: exploring, cleaning, and visualizing data, and building machine learning models.

Python, R, Java, and C++ are four of the most commonly used programming languages for data analysis. The following table compares them using five considerations: speed, accessibility, variables, data science focus, and programming paradigm.
- Speed: Compile time, runtime, hardware, installed dependencies, and code efficiency all contribute to the speed of a program's execution.
- Accessibility: Refers to how easy the programming language is to learn and use.
- Variables: How a language handles variable types affects both speed and flexibility. Languages with *static variables* (i.e., statically typed) fix each variable's type before the program runs, usually at compile time. Languages with *dynamic variables* (i.e., dynamically typed) determine types while the program is running, so a variable can refer to values of different types over its lifetime.
- Data science focus: Some programming languages have individual characteristics that better serve tasks in data analysis.
- Programming paradigm: *Object-oriented* programming languages are modeled around data objects. *Functional* programming languages are modeled around functions. *Imperative* languages are modeled around code statements that can alter the state of the program itself.


| Feature | Python | R | Java | C++ |
| --- | --- | --- | --- | --- |
| Speed | Slower | Depends on configuration and add-ons | Faster | Very Fast |
| Accessibility | Easy to learn | Complex | Easy to learn | Complex |
| Variables | Dynamic | Dynamic | Static | Static |
| Data science focus | Machine learning and automated analysis | Exploratory data analysis and building extensive statistical libraries | Used across projects with open-source assets | Not as widely used but very powerful implementations | 
| Programming paradigm | Object-oriented | Functional | Object-oriented | Multi-paradigm (imperative & object-oriented) |
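
To make the paradigm row more concrete, here is a minimal sketch of one small task (summing the squares of a list) written in three styles. Although the table lists Python as object-oriented, Python code can be written in all three styles. The data and the names `total_imperative`, `total_functional`, and `SquareSummer` are illustrative assumptions, not part of any library.

```python
# Illustrative data (hypothetical values)
numbers = [1, 2, 3, 4]

# Imperative style: statements update program state step by step
total_imperative = 0
for n in numbers:
    total_imperative += n ** 2

# Functional style: compose functions without mutating state
total_functional = sum(map(lambda n: n ** 2, numbers))

# Object-oriented style: bundle the data and the behavior in a class
class SquareSummer:
    def __init__(self, values):
        self.values = values

    def total(self):
        return sum(v ** 2 for v in self.values)

total_oo = SquareSummer(numbers).total()

print(total_imperative, total_functional, total_oo)  # 30 30 30
```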

<a name="jupyter-notebooks"></a>
## Jupyter Notebooks

*Jupyter notebooks* are open-source web applications for creating and sharing documents containing live code, mathematical formulas, visualizations, and text.

Jupyter notebooks are partitioned into *cells*--modular code input or output fields.

Learn more about Jupyter notebooks and the Jupyter project online: [docs.jupyter.org](https://docs.jupyter.org/en/latest/).

## Object-Oriented Programming

Object-oriented programming is a programming system that is based around objects, which can contain both data and code that manipulates that data.

An *object* is an instance of a class and a fundamental building block of Python. A *class* is an object's data type that bundles data and functionality together.

For example, assigning a string to a variable creates an object of the *string* class, which lets us use string methods such as `swapcase`, `replace`, and `split`.

In [3]:
# Assign a string to a variable and check its type
magic = 'HOCUS POCUS'
print(type(magic))

<class 'str'>


In [4]:
# Use swapcase() string method to convert from caps to lowercase
magic = 'HOCUS POCUS'
magic = magic.swapcase()
magic

'hocus pocus'

In [5]:
# Use replace() string method to replace some letters with other letters
magic = magic.replace('cus', 'key')
magic

'hokey pokey'

In [6]:
# Use split() string method to split the string into 2 strings
magic = magic.split()
magic

['hokey', 'pokey']

`swapcase`, `replace`, and `split` are examples of *methods*. A method is a function that belongs to a class and typically performs an action or operation.

Methods and attributes in a class are accessed using *dot notation*.
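
As a quick illustration of dot notation, the sketch below calls a method (with parentheses) and reads attributes (without parentheses). The variable names are illustrative only.

```python
word = 'hocus'

# Method access: object.method(arguments)
shout = word.upper()

# Attribute access: object.attribute (no parentheses)
z = 3 + 4j
real_part = z.real
imag_part = z.imag

print(shout)      # HOCUS
print(real_part)  # 3.0
print(imag_part)  # 4.0
```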

The core Python classes include:
- Integers
- Floats
- Strings
- Booleans
- Lists
- Dictionaries
- Tuples
- Sets
- Frozensets
- Functions
- Ranges
- None
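
One way to see these classes in practice is to create a value of each and ask Python for its type. This is a small sketch; the sample values and the helper `double` are arbitrary choices made for illustration.

```python
def double(x):
    # A tiny user-defined function, used only as a sample value
    return x * 2

examples = [
    7,                   # integer
    2.5,                 # float
    'hokey',             # string
    True,                # Boolean
    [1, 2, 3],           # list
    {'moons': 1},        # dictionary
    (1, 2),              # tuple
    {1, 2},              # set
    frozenset({1, 2}),   # frozenset
    double,              # function
    range(3),            # range
    None,                # None
]

# type(value).__name__ gives the class name as a string
type_names = [type(value).__name__ for value in examples]
print(type_names)
```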

An *attribute* is a value associated with an object or class that is referenced by name using dot notation.

For example, a Pandas DataFrame has attributes called `shape` and `columns`.

In [8]:
pip install pandas

Collecting pandas
  Downloading pandas-2.0.2-cp39-cp39-macosx_11_0_arm64.whl (10.9 MB)
Collecting tzdata>=2022.1
  Downloading tzdata-2023.3-py2.py3-none-any.whl (341 kB)
Installing collected packages: tzdata, pandas
Successfully installed pandas-2.0.2 tzdata-2023.3
Note: you may need to restart the kernel to use updated packages.


In [9]:
# Set-up cell to create the `planets` dataframe
# (This cell was not shown in the instructional video.)
import pandas as pd
data = [['Mercury', 2440, 0], ['Venus', 6052, 0], ['Earth', 6371, 1],
        ['Mars', 3390, 2], ['Jupiter', 69911, 80], ['Saturn', 58232, 83],
        ['Uranus', 25362, 27], ['Neptune', 24622, 14]]

cols = ['Planet', 'radius_km', 'moons']

planets = pd.DataFrame(data, columns=cols)

In [10]:
# Display the `planets` dataframe
planets

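|  | Planet | radius_km | moons |
| --- | --- | --- | --- |
| 0 | Mercury | 2440 | 0 |
| 1 | Venus | 6052 | 0 |
| 2 | Earth | 6371 | 1 |
| 3 | Mars | 3390 | 2 |
| 4 | Jupiter | 69911 | 80 |
| 5 | Saturn | 58232 | 83 |
| 6 | Uranus | 25362 | 27 |
| 7 | Neptune | 24622 | 14 |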


In [11]:
# Use shape dataframe attribute to check number of rows and columns
planets.shape

(8, 3)

In [12]:
# Use columns dataframe attribute to check column names
planets.columns

Index(['Planet', 'radius_km', 'moons'], dtype='object')

Python lets you define your own classes, each with their own special attributes and methods.

For example, suppose we want to build a Spaceship class to be reused later. A class is like a blueprint for all things that share characteristics and behaviors. In this case, the class is Spaceship. There can be all different kinds of spaceships. They can have different names and different purposes. Whenever you create an object of a given class, you’re creating an instance of that class.

In [14]:
class Spaceship:
    
    # class attribute
    tractor_beam = 'off'
    
    # class constructor--called whenever a new instance of the class is created
    def __init__(self, name, kind):
        self.name = name
        self.kind = kind
        self.speed = None
    
    # instance methods
    def warp(self, warp):
        self.speed = warp
        print(f'Warp {warp}, engage!')
        
    def tractor(self):
        if self.tractor_beam == 'off':
            self.tractor_beam = 'on'
            print('Tractor beam on.')
        else:
            self.tractor_beam = 'off'
            print('Tractor beam off.')

To create an instance of a `Spaceship`, we need to supply a name and kind. Then, we can use the methods and attributes of the instance.

In [16]:
# Create an instance of the Spaceship class (i.e. "instantiate")
ship = Spaceship('Mockingbird','rescue frigate')

# Check ship's name
print(ship.name)

# Check what kind of ship it is
print(ship.kind)

# Check tractor beam status
print(ship.tractor_beam)

# Set warp speed
ship.warp(7)

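# Check speed (nothing is displayed here: a notebook only echoes a bare
# expression when it is the last line of the cell)
ship.speed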

# Toggle tractor beam
ship.tractor()

# Check tractor beam status
print(ship.tractor_beam)

Mockingbird
rescue frigate
off
Warp 7, engage!
Tractor beam on.
on
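
One detail worth noting in the `tractor` method: assigning to `self.tractor_beam` sets an attribute on that one instance rather than changing the class attribute, so other ships keep the class default. The sketch below uses a trimmed-down, hypothetical `Ship` class (not the `Spaceship` class above) to show this behavior.

```python
class Ship:
    # Class attribute: shared default for every instance
    tractor_beam = 'off'

    def tractor(self):
        # Assigning through self creates or updates an instance attribute
        self.tractor_beam = 'on' if self.tractor_beam == 'off' else 'off'

ship_a = Ship()
ship_b = Ship()

# Toggle the beam on one ship only
ship_a.tractor()

print(ship_a.tractor_beam)  # on
print(ship_b.tractor_beam)  # off
print(Ship.tractor_beam)    # off
```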


## Variables and data types

Variables can store values of any data type. A data type is an attribute that describes a piece of data based on its values, its programming language, or the operations it can perform.

*Assignment* means the process of storing a value in a variable. An *expression* is a combination of numbers, symbols, or other variables that produces a result when evaluated.

Python is *dynamically-typed*. This means that variables can point to objects of any type.

*Naming restrictions* are rules built into the language that must be followed. When naming variables, programmers must adhere to all naming restrictions.

- *Keywords* must be avoided when naming variables. Keywords are special words that are reserved for a specific purpose and can only be used for that purpose (e.g., `for`, `in`, `if`, `else`).
- Avoid using the names of built-in functions as variable names (e.g., `str`, `print`). Doing so is allowed, but it shadows the built-in function in that scope.
- Only include letters, numbers, and underscores. You cannot use special characters or whitespace. Variable names must start with a letter or an underscore.

Variable names are case-sensitive.
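
These restrictions can be checked programmatically. The sketch below combines the string method `isidentifier()` with the standard-library `keyword` module; the helper `is_valid_name` and the candidate names are hypothetical, chosen only for illustration. Note that `isidentifier()` also accepts non-ASCII letters, so it is slightly more permissive than "letters, numbers, and underscores" might suggest.

```python
import keyword

candidates = ['max_age', '_hidden', '2fast', 'player name', 'for', 'Age']

def is_valid_name(name):
    # A usable variable name is a valid identifier that is not a keyword
    return name.isidentifier() and not keyword.iskeyword(name)

for name in candidates:
    print(name, is_valid_name(name))

# Case sensitivity: these are two separate variables
age = 34
Age = 99
print(age, Age)  # 34 99
```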

There are some best-practice naming conventions to help make code readable and maintainable:
- Descriptive names are better than cryptic abbreviations because they help other programmers (and you) read and interpret your code.
- Variable names and function names should be written in snake_case, which means that all letters are lowercase and words are separated using an underscore. 

See [PEP 8 Style Guide for Python](https://peps.python.org/pep-0008/) to review other style tips.

In [1]:
# Assign a list containing players' ages
age_list = [34, 25, 23, 19, 29]

In [7]:
# Find the maximum age and assign to `max_age` variable
max_age = max(age_list)
max_age

34

In [3]:
# Convert `max_age` to a string
max_age = str(max_age)
max_age

'34'

In [4]:
# Reassign the value of `max_age`
max_age = 'ninety-nine'
max_age

'ninety-nine'

In [9]:
# Find the maximum age and assign to `max_age` variable
max_age = max(age_list)
# Find the minimum age and assign to `min_age` variable
min_age = min(age_list)

# Subtract `min_age` from `max_age`
max_age - min_age

15

Python is able to manipulate variables using operations in expressions.

In [10]:
# Addition of 2 ints
print(7+8)

15


Similar to integers, Python can add (i.e., concatenate) two strings together.

In [11]:
# Addition of 2 strings
print("hello " + "world")

hello world


However, Python cannot add a string and an integer together.

In [12]:
# You cannot add a string to an integer
print(7+"8")

TypeError: unsupported operand type(s) for +: 'int' and 'str'

The built-in `type` function can be used to determine the type of a variable.

In [13]:
# The type() function checks the data type of an object
type("A")

str

In [14]:
# The type() function checks the data type of an object
type(2)

int

In [None]:
# The type() function checks the data type of an object
type(2.5)

float

Python will implicitly convert the result of an expression to the appropriate data type. In this example, the result of adding an integer and a float together is a float.

In [15]:
# Implicit conversion
print(1 + 2.5)

3.5
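
A few more cases of implicit conversion may help clarify when the result type changes. This is a small sketch using arbitrary values.

```python
# int + float -> float
a = 1 + 2.5

# True division of two ints always returns a float
b = 6 / 3

# Floor division of two ints stays an int
c = 7 // 2

# bool is a subclass of int, so True behaves like 1 in arithmetic
d = True + 1

print(type(a).__name__, a)  # float 3.5
print(type(b).__name__, b)  # float 2.0
print(type(c).__name__, c)  # int 3
print(type(d).__name__, d)  # int 2
```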


Programmers can also explicitly convert a value from one type to another. In this example, the result of `2 + 2` is converted into a string.

In [16]:
# Explicit conversion (the str() function converts a number to a string)
print("2 + 2 = " + str(2 + 2))

2 + 2 = 4
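
Explicit conversion also works in the other direction, which is one way to resolve the earlier `7 + "8"` error. The sketch below uses the built-in `int()` and `float()` functions with arbitrary sample values, and shows that a string that does not look like a number raises a `ValueError`.

```python
# String to integer, after which arithmetic works
total = 7 + int("8")

# String to float
price = float("2.50")

# Float to integer truncates toward zero (it does not round)
whole = int(3.99)

# Not every string can be converted; int() raises ValueError
try:
    int("eight")
    converted = True
except ValueError:
    converted = False

print(total, price, whole, converted)  # 15 2.5 3 False
```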


## Functions

Functions and methods are very similar, but there are a few key differences. Methods are a specific type of function. They are functions that belong to a class. 
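
The difference shows up in how each is called. In this brief sketch, `len()` is a built-in function that takes the object as an argument, while `upper()` is a method reached through the object with dot notation.

```python
magic = 'hocus pocus'

# Function: called on its own, with the object passed as an argument
length = len(magic)

# Method: called on the object using dot notation
upper_magic = magic.upper()

# The same method can be reached through the class, passing the object explicitly
also_upper = str.upper(magic)

print(length)                     # 11
print(upper_magic)                # HOCUS POCUS
print(upper_magic == also_upper)  # True
```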

To learn more about functions, check out the [Functions reference guide](./content/Functions.pdf) in the content directory.

There are many functions built into Python, like `print()` and `type()`.

In [1]:
# The print() function can print text to the screen
print('Black dove, where will you go?')

Black dove, where will you go?


In [2]:
# The type() function returns an object's data type
number = 15

type(number)

int

In [3]:
# The str() function converts an object into a string
number = str(number)

type(number)

str

You can also define custom functions that accept parameters--like the function below called `greeting()`.

In [4]:
# Define a function
def greeting(name):

    print('Welcome, ' + name + '!')
    print('You are part of the team!')

greeting('Rebecca')

Welcome, Rebecca!
You are part of the team!


It is best practice to define a function for code that may need to be repeated many times, like calculating the area of a triangle. Once the logic is defined inside a function, the function can be called to calculate the area of many different triangles.

In [5]:
# Define a function to calculate area of triangle
def area_triangle(base, height):
    return base * height / 2

In [6]:
# Use the function to assign new variables and perform calculations
area_a = area_triangle(5, 4)
area_b = area_triangle(7, 3)
total_area = area_a + area_b
total_area

20.5

In [7]:
# Define a function that converts hours, minutes, and seconds to total seconds
def get_seconds(hours, minutes, seconds):
    total_seconds = 3600*hours + 60*minutes + seconds
    return total_seconds

In [8]:
# Use the function to return a result
get_seconds(16, 45, 20)

60320
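
Arguments can be passed by position, as above, or by keyword. The sketch below redefines `get_seconds()` with default values of 0, which is an addition made for this illustration and not part of the original function.

```python
def get_seconds(hours=0, minutes=0, seconds=0):
    # Same arithmetic as above; the defaults of 0 are an assumption for this sketch
    return 3600 * hours + 60 * minutes + seconds

# Positional arguments: order matters
positional = get_seconds(16, 45, 20)

# Keyword arguments: order does not matter
by_keyword = get_seconds(seconds=20, hours=16, minutes=45)

# Defaults let the caller omit arguments
only_minutes = get_seconds(minutes=5)

print(positional, by_keyword, only_minutes)  # 60320 60320 300
```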

## Docstrings

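A *docstring* is a string literal placed as the first statement in a function body to document what the function does, its parameters, and its return value. Docstrings are usually written as multi-line strings enclosed in triple quotes.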

In [9]:
def seed_calculator(fountain_side, grass_width):
    """
    Calculate number of kilograms of grass seed needed for
    a border around a square fountain.

        Parameters:
            fountain_side (num): length of 1 side of fountain in meters
            grass_width (num): width of grass border in meters

        Returns:
            seed (float): amount of seed (kg) needed for grass border
    """
    # Area of fountain
    fountain_area = fountain_side**2
    # Total area
    total_area = (fountain_side + 2 * grass_width)**2
    # Area of grass border
    grass_area = total_area - fountain_area
    # Amount of seed needed (35 g/sq.m)
    seed = grass_area * 35
    # Convert to kg
    seed = seed / 1000

    return seed

In [10]:
seed_calculator(12, 2)

3.92
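
A docstring is stored on the function object, so it can be read back at runtime. The sketch below adds a docstring to the earlier `area_triangle()` function (the docstring wording is an assumption for illustration) and retrieves it with the `__doc__` attribute and with `inspect.getdoc()`, which removes the leading indentation. In an interactive session, `help(area_triangle)` displays the same text.

```python
import inspect

def area_triangle(base, height):
    """
    Calculate the area of a triangle.

        Parameters:
            base (num): length of the base
            height (num): height measured perpendicular to the base

        Returns:
            area (float): area of the triangle
    """
    return base * height / 2

# Raw docstring, exactly as written in the source
raw_doc = area_triangle.__doc__

# Cleaned docstring with uniform indentation removed
clean_doc = inspect.getdoc(area_triangle)

print(clean_doc)
```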

## Comparators

In [27]:
# > checks for greater than
print(10>1)

True


In [13]:
# == checks for equality
print("cat" == "dog")

False


In [14]:
# != checks for inequality
print(1 != 2)

True


In [15]:
# Some operators cannot be used between different data types
print(1 < "1")

TypeError: '<' not supported between instances of 'int' and 'str'

In [16]:
# Letters that occur earlier in the alphabet evaluate to less than letters from later in the alphabet
# BOTH sides of an `and` statement must be true to return True
print("Yellow" > "Cyan" and "Brown" > "Magenta")

False


In [17]:
# An `or` statement will return True if EITHER side evaluates to True
print(25 > 50 or 1 != 2)

True


In [18]:
# `not` reverses Boolean evaluation of what follows it
print(not 42 == "Answer")

True
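
String comparison follows Unicode code points rather than strict alphabetical order, so capitalization matters: every uppercase ASCII letter sorts before every lowercase one. The sketch below illustrates this with arbitrary sample values, along with chained comparisons and equality checks across types.

```python
# Uppercase letters have lower code points than lowercase letters
print(ord('Z'), ord('a'))   # 90 97
print('Zebra' < 'apple')    # True

# Normalizing case first gives alphabetical order regardless of capitalization
print('Zebra'.lower() < 'apple'.lower())  # False

# Comparisons can be chained
age = 25
in_range = 18 <= age < 65
print(in_range)  # True

# == between different types is allowed and simply evaluates to False
mixed_equal = (42 == 'Answer')
print(mixed_equal)  # False
```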


## Conditional Statements

In [20]:
# Define a function that checks validity of username based on length
def hint_username(username):
    if len(username) < 8:
        print("Invalid username. Must be at least 8 characters long.")
    else:
        print("Valid username.")

In [21]:
# Define a function that uses modulo to check if a number is even
def is_even(number):
    if number % 2 == 0:
        return True
    return False

In [22]:
is_even(19)

False

In [23]:
is_even(20)

True
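
Because the comparison `number % 2 == 0` already evaluates to a Boolean, the same function can be written without an `if` statement. The sketch below shows that shorter form, plus a conditional expression in a hypothetical helper named `parity()`.

```python
def is_even(number):
    # The comparison already evaluates to True or False
    return number % 2 == 0

def parity(number):
    # Conditional expression: value_if_true if condition else value_if_false
    return 'even' if is_even(number) else 'odd'

print(is_even(19), is_even(20))  # False True
print(parity(19), parity(20))    # odd even
```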

In [24]:
# Define a function that checks validity of username based on length
def hint_username(username):
    if len(username) < 8:
        print("Invalid username. Must be at least 8 characters long.")
    elif len(username) > 15:
        print("Invalid username. Cannot exceed 15 characters.")
    else:
        print("Valid username.")

In [25]:
hint_username("ljñkljfñklasdjflkñadjglk{a")

Invalid username. Cannot exceed 15 characters.
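
Returning the message instead of printing it makes a function like this easier to reuse and test. The sketch below is a hypothetical variant named `check_username()`; the `min_len` and `max_len` parameters and the sample usernames are assumptions added for illustration.

```python
def check_username(username, min_len=8, max_len=15):
    # Hypothetical variant of hint_username() that returns the message
    if len(username) < min_len:
        return f"Invalid username. Must be at least {min_len} characters long."
    elif len(username) > max_len:
        return f"Invalid username. Cannot exceed {max_len} characters."
    else:
        return "Valid username."

# Try one name that is too short, one valid name, and one that is too long
for name in ["ada", "grace_hopper", "a_very_long_username"]:
    print(name, "->", check_username(name))
```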
