# Workshop 1: Introduction to Python 
This workshop is modified from examples from [Computational and Inferential Thinking: The Foundations of Data Science](https://inferentialthinking.com/chapters/intro.html).

Python is a high-level, interpreted programming language known for its simplicity and readability. It is widely used in various domains, including web development, data analysis, text analysis, artificial intelligence, scientific computing, and more. Python's versatility and extensive library ecosystem make it a popular choice for both beginners and experienced developers.

## Applications of Python
| **Category** | **Libraries/Tools** | **Example Use Case**|
| -------------|---------------------|---------------------|
| Web Development | Django, Flask | Build a blog, dashboard, or interactive visualization|
| Data Science | pandas, NumPy, Matplotlib | Analyze, clean, and visualize quantitative or qualitative data|
| Automation | os, shutil, sched | Rename a batch of files |
| Text Analysis | spaCy, NLTK | Sentiment analysis on reviews |
| Mapping | GeoPandas, Folium, geopy | Plot locations on an interactive map |
| Data Management | pandas, `csv`, `xml`, `json` | Clean and standardize metadata for a digital collection |
| Webscraping | BeautifulSoup, Scrapy | Collect daily weather data |
| Machine Learning | scikit-learn, TensorFlow | Classify emails as spam|
| Data Processing/Cleaning | pandas, regex, `csv` | Clean historical census data for analysis |
| Education |  Jupyter | Provide introductory material in an easy format |


## Data Types

As with human languages, programming languages follow grammatical rules that govern how the language is used.

Data types are the kinds of values you can use in Python. Every value in Python has a type, and unlike in many other languages, you do not have to declare these types explicitly. Because of that, you have to keep track of the types yourself as you go.

Each type has its own set of functions and expressions that can be performed on it.
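
As a minimal sketch of what "every value has a type" means, the built-in `type()` function reports the type Python has inferred. The variable names and values below are illustrative only and are not part of the workshop data.

```python
# Illustrative values; Python works out each type on its own.
demo_count = 18            # int
demo_length = 12.2         # float
demo_species = "gentoo"    # str
demo_tagged = True         # bool

print(type(demo_count))
print(type(demo_length))
print(type(demo_species))
print(type(demo_tagged))

# A name can later be given a value of a different type,
# which is why it pays to keep track of types as you go.
demo_count = "eighteen"
print(type(demo_count))
```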

### Basic Data Types
| Type | Example | Description |
|------|---------|-------------|
| `int`| 42, -2, 3244235 | Whole numbers (positive or negative) |
| `float` | 3.193727, 0.55, -2423.34243 | Decimal numbers |
| `str` | "hello", "hello, world", "h" | Text (string of characters) | 
| `bool` | True, False | Boolean values (logic, conditionals) | 

Example of discrete and continuous observations. 

Discrete:
![Penguin with hatchlings](img/penguin_clutch.jpg)

Continuous:
<img src="img/penguin_bill.png" alt="Penguin Bill Measurements" width="300"/>

**Data Type 1:** Integers (Ints) - Whole numbers either positive or negative. 

In [None]:
## Now let's go through some examples of these data types 

## number of penguins - ints
18 
10
324234


### Variable Names 

Instead of computing the same thing again and again, we often give names to the values we compute. This works much like setting a variable equal to something in algebra. We do this with assignment statements, which set a name equal to an expression using an equal sign (=). Let's try some examples.

You will always see the variable name on the left side of the equal sign and the value or expression on the right side. 

` VARIABLE NAME = EXPRESSION OR VALUE`


In [None]:
num_of_penguins = 18

### Expressions

We can combine our data types or values to create more complex expressions. 

In [None]:
3 * 4

In [None]:
num_of_penguins_on_island_one = 10 + 18 + 22 
print(num_of_penguins_on_island_one)

We can also do other types of mathematics: 

| Operation | Symbol | Example | Output |
|------|---------|-------------|--------|
| Addition | `+` | `1 + 1`| `2`|
| Subtraction |  `-` | `2 - 1` | `1`|
| Multiplication | `*` | `2 * 2` | `4`|
| Division | `/` | `3 / 1` | `3.0`|
| Division (floor) | `//` | `10 // 3`| `3`|
| Remainder | `%` | `10 % 3` | `1`|
| Exponent | `**` | ` 3 ** 2` | `9`| 
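
The operators in the table can be tried side by side. This is a small sketch with made-up numbers (`eggs` and `nests` are hypothetical names); it mainly highlights that `/` always gives a float while `//` and `%` keep integers as integers.

```python
eggs = 10
nests = 3

print(eggs / nests)    # true division, the result is a float
print(eggs // nests)   # floor division: 3
print(eggs % nests)    # remainder: 1
print(3 / 1)           # 3.0, a float even though both inputs are ints
print(3 ** 2)          # exponent: 9
```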

**Data Type 2:** Floats - Decimal numbers either positive or negative. 

In [None]:
##  bill sizes and averages - floats 
average_penguins_per_species_island_one = num_of_penguins_on_island_one / 3.0
print(average_penguins_per_species_island_one)

12.2 

bill_length_mm = 12.2
bill_depth_mm = 5.1

Let's try combining variable names with what you know about expressions and numbers. 


**Try this yourself!** <br /> Let’s combine what we know about variable names, expressions, and int/floats to calculate the adjusted body mass of a penguin after a simulated feeding period. Suppose you observe a penguin that weighs 4,200 grams. Based on typical feeding patterns, you expect the penguin to gain 8% of its body mass over a week. After gaining weight, the penguin is tagged, and the tagging equipment adds an additional 150 grams.

First, calculate the weight gain. Then add the equipment weight to get the total adjusted body mass.

In [None]:
## Try this yourself by setting the penguin's weight, calculating the weight gain, and then calculating the adjusted body mass.
## Print this out in both grams and kg (1 kg = 1000 g)

penguin_weight = ...
weight_gain = ...
equipment_weight = ...
adjusted_body_mass = ...

print(adjusted_body_mass, weight_gain)
print("Adjusted body mass:", adjusted_body_mass, "grams or ", adjusted_body_mass / 1000, "kg")

### Strings

So far we've just talked about numbers, but what about data in text form? A piece of text represented in a computer is called a *string*. A string can be a single character, a word, a sentence, or even an entire corpus of material. Let's look at some examples of strings. 

**Data Type 3:** Strings - a series of one or more characters of text. This can even include numbers when they are enclosed in quotes. The quotation marks can be single ('') or double ("").

In [None]:
'hello, world'

In [None]:
'h'

There are lots of interesting things we can do with strings - we'll talk about some of the more complicated ones in a minute - but one of the easiest things to do with strings is to combine them. We can do this with the addition operator (+).

In [None]:
'data' + 'science'
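
Combining with `+` only works when both sides are strings. The sketch below is one way to see this: it catches the error Python raises for a string plus a number, then converts the number with `str()` first. The `try`/`except` syntax has not been introduced in this workshop and is used here only so the cell keeps running; `label` is an illustrative name.

```python
# Adding a string and an int raises a TypeError.
try:
    "data" + 6
except TypeError as error:
    print("TypeError:", error)

# Converting the number to a string first makes the addition valid.
label = "data" + str(6)
print(label)  # data6
```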

In [None]:
## this raises a TypeError because a string and a number cannot be added together
"data" + 6

## Functions

Let's say we're doing the same thing again and again. We can create a function, which is essentially a bit of code that performs some task when we call it. What does it mean to call a function? Every function has a name, and we use that name when we want to invoke it. Several functions are built into the Python language, including print(), max(), min(), type(), len(), and many more. Let's try out a couple of these:

In [None]:
'Today is Monday'
print('Hello, world')
'Today is Monday'

In [None]:
len('abcdefghijklmnopqrstuvwxyz')

In [None]:
max(1, 2, 3, 4)

In [None]:
newNumber = int('5')
newNumber + 10
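
The cells above call functions that Python already provides. Writing your own uses the `def` keyword. The sketch below is only an illustration: `grams_to_kg` is a hypothetical helper, not something defined elsewhere in this workshop.

```python
def grams_to_kg(grams):
    """Convert a mass in grams to kilograms."""
    return grams / 1000

# Calling the function by name runs the code inside it.
print(grams_to_kg(4200))  # 4.2
```

Once defined, the function can be called as many times as needed with different values.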

There are also several string-specific functions, called methods, such as .upper() and .replace(). These are called on a string using a dot (.), for example string.upper(). Let's look at some examples.

In [None]:
word = 'data'
word.upper()

In [None]:
word
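
String methods return a new string and leave the original untouched. Here is a short sketch that also tries `.replace()`; the name `title` and its text are illustrative only.

```python
title = "data science"

shouted = title.upper()                     # new string in uppercase
swapped = title.replace("data", "penguin")  # new string with one word replaced

print(shouted)  # DATA SCIENCE
print(swapped)  # penguin science
print(title)    # data science (the original is unchanged)
```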

**Try this yourself!** <br /> Given the heights of three of the highest-scoring WNBA players (Diana Taurasi, Tamika Catchings, Tina Thompson), write an expression that computes the smallest difference between any two of the three heights. Your expression shouldn't have any numbers in it, only function calls and the names `diane`, `tamika`, and `tinaT`. Give the value of your expression the name `min_height_difference`.

In [None]:
# The three players' heights, in meters:
diane = 1.83   # Diana Taurasi is 6'0"
tamika = 1.91  # Tamika Catchings is 6'3"
tinaT = 1.88   # Tina Thompson is 6'2"

# We'd like to look at all 3 pairs of heights, compute the absolute
# difference between each pair, and then find the smallest of those
# 3 absolute differences.  This is left to you!  If you're stuck,
# try computing the value for each step of the process. 
min_height_difference = ...
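
If the built-in functions `abs()` and `min()` are unfamiliar, this hint sketches how they behave. The values are made up and unrelated to the exercise, so the answer is still left to you.

```python
# Hypothetical dive depths in meters, not the players' heights.
shallow_dive = 3
deep_dive = 8

# abs() gives the size of a difference regardless of order.
print(abs(shallow_dive - deep_dive))  # 5
print(abs(deep_dive - shallow_dive))  # 5

# min() returns the smallest of its arguments.
print(min(7, 2, 5))  # 2
```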

Let's go back to our penguin example and look at some more string functions. 

In [None]:
num_penguins = 18

In [None]:
## We can also convert numbers to strings
num_penguins_str = str(num_penguins)
print(num_penguins_str, type(num_penguins_str))
print(num_penguins, type(num_penguins))

In [None]:
## Now let's look more at strings 

In [None]:
## we can do a couple different things with strings, like finding their length
species = "gentoo"
species_length = len(species)
print(species_length)

In [None]:
## or we can add strings together
region = "Antarctica" 
island = "Dream"
full_location = region + " - " + island
full_title = "The " + species + " penguin from " + full_location
print(full_title)


In [None]:
## we can also strip characters from strings, select specific words or characters, and convert them to uppercase or lowercase
species = "  Gentoo Penguin  x"
stripped_species = species.strip('x').strip()
print(species)
print(stripped_species)

In [None]:
last_word = full_location.split()[-1]
print(last_word)

In [None]:
uppercase_location = full_location.upper()
print(uppercase_location)

In [None]:

lowercase_location = full_location.lower()
print(lowercase_location)

### Collection Data Types
| Type | Example | Description |
|------|---------|-------------|
| `list`| `[1, 2, 3]` or `[99, 'hello', False]` | Ordered, changeable sequence |
| `dict` | `{"key":"value"}` or `{"Ada":['Librarian', 42, "Carpenter"]}` | Key-value pairs |
| `tuple` | `(1, 2, 3, 3)` | Ordered, unchangeable sequence | 
| `set` | `{1, 2, 3}` | Unordered collection of unique values | 
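
Tuples and sets from the table can be sketched briefly as follows. The coordinates and the list of sightings are invented for illustration.

```python
# A tuple groups values that should not change (hypothetical coordinates).
station_location = (-64.77, -64.05)
print(station_location[0])

# A set keeps only unique values, so duplicates disappear.
sightings = ["Adelie", "Gentoo", "Adelie", "Chinstrap", "Gentoo"]
unique_species = set(sightings)

print(len(sightings))               # 5
print(len(unique_species))          # 3
print("Gentoo" in unique_species)   # True
```

Note that `{}` on its own creates an empty dictionary; an empty set is written `set()`.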


**Data Type 4:** List - an ordered, mutable collection of items. Uses square brackets [ ]

In [None]:
## Now let's look at lists, which are a way to store multiple items in a single variable
islands = ["Torgersen", 
           "Biscoe", 
           "Dream", 
           "Torgersen", 
           "Anvers"]
print(islands)
print(type(islands))


We can access items in lists by identifying their "index" which is just the position of the item in the list. 

Unlike some other languages (such as R and MATLAB), Python starts counting at 0. So the first item in the list is at index `0`.

In [None]:
## we can select specific items from a list using their index (Python always starts counting at 0)
first_island = islands[0]
print(first_island)

In [None]:
## an easy way to get the last item is to use a negative index
last_island = islands[-1]
print(last_island)

In [None]:
## we can also add items to a list using append()
islands.append("New Island")
print(islands)


In [None]:
## or we can remove items using remove() (only the first matching item is removed)
islands.remove("Torgersen")
print(islands)


In [None]:
## we can also sort lists
islands.sort()
print(islands)

**Data Type 5:** Dictionary - Mutable collection of items stored with key-value pairs. Uses curly brackets { }

In [None]:
## now let's look at dictionaries, which are a way to store key-value pairs

penguin_info = {
    "id": 1,
    "species": "Gentoo",
    "island": "Biscoe",
    "weight": 4.5,
    "bill_length": 0.2,
    "bill_depth": 0.05
    
}
print(penguin_info)

print(type(penguin_info))



In [None]:
## we can access specific values in a dictionary using their keys
species = penguin_info["species"]
print(species)


In [None]:
## we can also add new key-value pairs to a dictionary
penguin_info["age"] = 3
print(penguin_info)


In [None]:
## or we can remove key-value pairs using the del keyword
del penguin_info["weight"]
print(penguin_info)

In [None]:
## we can have dictionaries within dictionaries, which is useful for more complex data structures
summer_population = {
    1: {
        "id": 1,
        "species": "Gentoo",
        "island": "Biscoe",
        "weight": 4.5,
        "bill_length": 0.2,
        "bill_depth": 0.05
    },
    22: {
        "id": 22,
        "species": "Adelie",
        "island": "Torgersen",
        "weight": 3.8,
        "bill_length": 0.18,
        "bill_depth": 0.04
    }
}
print(summer_population)


In [None]:
## we can access specific penguins in the collection using their keys
penguin_id = 22
penguin_details = summer_population[penguin_id]
print(penguin_details)

## we can also access specific values within the nested dictionary
penguin_weight = penguin_details["weight"]
print(penguin_weight)

## parentheses make sure the two lengths are added before dividing by 2
average_bill_length = (summer_population[1]["bill_length"] + summer_population[22]["bill_length"]) / 2
print(average_bill_length)
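
When averaging, keep in mind that division happens before addition unless parentheses say otherwise. A small sketch with made-up values (the dictionary `demo_lengths` is hypothetical) shows the difference:

```python
demo_lengths = {"first": 0.2, "second": 0.18}

# Without parentheses only the second value is divided by 2.
without_parentheses = demo_lengths["first"] + demo_lengths["second"] / 2

# With parentheses the sum is divided by 2, which is the average.
with_parentheses = (demo_lengths["first"] + demo_lengths["second"]) / 2

print(without_parentheses)
print(with_parentheses)
```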

## Loops and Conditionals 

### Conditionals or Comparisons
Sometimes we want to compare different variables or expressions. This could be to find matching amounts or text, or to identify when a specific number rises above a certain threshold. We can do this using comparison statements.

**Data Type 6:** Booleans - represent either True or False often in logic or conditional statements

In [None]:
1 > 5

In [None]:
5 > 1

In [None]:
## let's look at some examples of using conditional statements 

penguin_1_weight = summer_population[1]["weight"]
penguin_22_weight = summer_population[22]["weight"]


## print each comparison so that both results are shown, not just the last one
print(penguin_1_weight > penguin_22_weight)

print(penguin_1_weight == penguin_22_weight)

### Methods of comparison 

| Comparison | Operator | 
|------|---------|
| Less than | `<` |
| Greater than | `>` | 
| Less than or equal to | `<=` | 
| Greater than or equal to | `>=` |
| Equal | `==` |
| Not equal | `!=` |

In [None]:
## We can combine if statements with conditional operators to help move through the code 

if penguin_1_weight > penguin_22_weight:
    print("Penguin 1 is heavier than Penguin 22")
elif penguin_1_weight < penguin_22_weight:
    print("Penguin 1 is lighter than Penguin 22")
else:
    print("Penguin 1 and Penguin 22 have the same weight")



In [None]:
## we can also combine multiple conditions together using keywords like 'and', 'or', and 'not' 
if penguin_1_weight > 3.7 and penguin_22_weight > 3.7:
    print("Both penguins are heavier than the average weight.")
else:
    print("At least one penguin is not heavier than the average weight.")
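
The keywords `and`, `or`, and `not` each produce a Boolean. This sketch uses invented weights and an invented threshold to show all three:

```python
# Hypothetical weights in kg and a hypothetical threshold.
weight_a = 4.5
weight_b = 3.8
threshold = 4.0

print(weight_a > threshold and weight_b > threshold)  # False: only one is heavier
print(weight_a > threshold or weight_b > threshold)   # True: at least one is heavier
print(not weight_a > threshold)                       # False: weight_a is heavier
```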

### Loops (Automation)

Sometimes we will want to repeat the exact same task again and again. In Python, loops allow us to automate this kind of repetition efficiently. The two most common types of loops are *for* and *while*. A for loop is often used to iterate over a sequence such as a list, executing the same code for each element. A while loop continues to run as long as a specified condition remains true.

In [None]:
## The final piece is to look at loops, which allow us to iterate over our collection data types

## Let's start with a simple for loop to iterate over a list of islands
for island in islands:
    print("Island:", island)

In [None]:
## we can also use the range function to iterate over a list by the index values 
for i in range(len(islands)):
    print(f"Island {i + 1}: {islands[i]}")


In [None]:
## now let's combine loops and conditional statements
for penguin_id, details in summer_population.items():
    if details["weight"] > 3.5:
        print(f"Penguin {penguin_id} is larger than 3.5 kg and was found in {details['island']}")
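
The cells above all use for loops. A while loop, which keeps running as long as its condition is true, could look like the sketch below. The feeding scenario and variable names are invented for illustration.

```python
# Hypothetical scenario: a penguin eats 2 fish per feeding until none remain.
remaining_fish = 5
feedings = 0

while remaining_fish > 0:
    remaining_fish = remaining_fish - 2
    feedings = feedings + 1

print(feedings)  # 3
```

The loop stops as soon as `remaining_fish > 0` becomes false, so the condition must eventually change or the loop will never end.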