# Introduction to Python

V.F. Scalfani, University of Alabama Libraries

Fall 2024

## Anticipated Length of Workshop

1 hour

## Notes and Additional Resources

Code in the notebook is MIT licensed. You can find a copy of the license in the [UALIB_Workshops Repository](https://github.com/UA-Libraries-Research-Data-Services/UALIB_Workshops/blob/master/LICENSE).

Content in this workshop was adapted from previously offered Python workshops from UA Libraries: https://github.com/UA-Libraries-Research-Data-Services/UALIB_Workshops

In addition to the workshop content, we recommend starting with the following Python resources to learn more:

1. https://github.com/jakevdp/WhirlwindTourOfPython

2. http://swcarpentry.github.io/python-novice-gapminder/

3. https://docs.python.org/3/tutorial/index.html

4. Python Crash Course: a hands-on, project-based introduction to programming, by Eric Matthes: [Scout Link](http://libdata.lib.ua.edu/login?url=https://search.ebscohost.com/login.aspx?direct=true&db=cat00456a&AN=ua.8611906&site=eds-live&scope=site)


## What is the purpose of this workshop?

Here is what we will cover:

1. Setting up Python
2. Getting help in Python
3. Python syntax and variables
4. Indexing and accessing data
5. Functions
6. Conditional statements
7. Loops
8. Data I/O
9. Very brief intro to data analysis and plotting

## 1. Setup

1. Go to https://www.python.org/
2. Download the latest Python installer for Windows
3. Run the Python .exe installer, be sure to select "Add python.exe to PATH", then choose Install Now
4. Open VS Code and install the Microsoft Python and Jupyter extensions via the Extensions Tab.
5. Open a VS Code terminal (View > Terminal) and type `pip install matplotlib`. This will install the Matplotlib library (a Python plotting package: https://matplotlib.org/) that we will use later in the workshop.

**Note: Installing Python libraries system-wide via pip is generally not the best approach, but it works for our introductory lesson today. Moreover, the installation is temporary on the library computers.**

As you develop Python code and your skills, it's often necessary or convenient to have different Python environments for different projects. There are several good solutions available to manage Python dependencies. Our preferred method is to use the conda package manager (e.g., via miniforge). See our previous workshop on "Introduction to Conda" for more information: https://github.com/UA-Libraries-Research-Data-Services/UALIB_Workshops. Another option is Python virtual environments: https://docs.python.org/3/library/venv.html

### Check Installation

Note that this file is an interactive Jupyter notebook; we can run our code within this file!

See: https://jupyter.org/

In [None]:
# First let's make sure everything is working with Python
# Note that a line starting with `#` is a comment in Python
import sys
print(sys.version)
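
If several Python installations or environments are present, it can help to confirm which one the notebook is using. The following is a minimal sketch using only the standard library. It assumes a `venv`-style virtual environment, where `sys.prefix` differs from `sys.base_prefix`; conda environments are laid out differently and may not be detected by this check.

```python
import sys

# Path of the Python interpreter that is running this code
print("Interpreter:", sys.executable)

# In a venv-style virtual environment, sys.prefix points at the environment,
# while sys.base_prefix points at the installation it was created from.
in_virtual_env = sys.prefix != sys.base_prefix
print("Running inside a venv-style environment:", in_virtual_env)
```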

In [None]:
print("Hello World!")

## 2. Getting help in Python

### Web Documentation

We recommend starting out with using the online web-based documentation for Python: https://docs.python.org/3/. See also the Library Reference section for built-in Python functions: https://docs.python.org/3/library/index.html

### help() function 

For additional tips, see: https://jakevdp.github.io/PythonDataScienceHandbook/01.01-help-and-documentation.html

In [None]:
# If you know the name of the function, use the help() function 
# to display the docstring
help(sorted)

In [None]:
# Help also works on variables (we will talk more about that later!)
a = [6,3,0,5]
help(a)

### dir() function

In [None]:
# The Python dir() function is useful for exploring modules
# https://stackoverflow.com/questions/139180/how-to-list-all-functions-in-a-python-module
# this prints a list of available functions and variables

# example with time module
import time  # In Python, you will often need to import libraries before using them.
dir(time)
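
The output of `dir()` usually includes internal names that start with an underscore. As a small sketch, one way to narrow the list to the public names is to filter it; the variable name `public_names` is only an illustration.

```python
import time

# Keep only the names that do not start with an underscore
public_names = [name for name in dir(time) if not name.startswith("_")]
print(public_names)
```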

In [None]:
# now we can use help() to get more information about a particular function or method
help(time.sleep)

In [None]:
# Try it
time.sleep(5)

## 3. Python syntax and variables 

Some content in this section adapted and inspired from: https://github.com/jakevdp/WhirlwindTourOfPython

### Simple Variables

In [None]:
# An integer
a = 5
print(a)

In [None]:
type(a)

In [None]:
# Floating points
b = 1.825
print(b)

In [None]:
type(b)

In [None]:
# a string
s1 = "Thanks for coming to my workshop!"
print(s1)

In [None]:
type(s1)

In [None]:
s2 = 'single quotes work too'
print(s2)

In [None]:
# Sometimes it is necessary to use double quotes
s3 = "Like when there is a single quote ' within a string"
print(s3)
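
Double quotes are not the only way to handle a quote character inside a string. The sketch below shows two alternatives, escaping with a backslash and using triple quotes; the variable names `s4` and `s5` are made up for this example.

```python
# A backslash escapes a quote character inside a string
s4 = 'It\'s also possible to escape a single quote'
print(s4)

# Triple quotes allow both kinds of quotes in one string
s5 = """She said "it's fine" and left."""
print(s5)
```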

### Compound Variables

Two common compound variables are lists and dictionaries. For others, see: https://docs.python.org/3/library/index.html


In [None]:
# create a list with numbers
nums = [1, 3, 7, 11]
print(nums)

In [None]:
type(nums)

In [None]:
# create a list with strings
flowers = ["rose", "tulip", "carnation", "marigold"]
print(flowers)

In [None]:
# Lists can mix types
mixed = [1, "rose", 3, "tulip"]
print(mixed)

In [None]:
# We can even do lists within lists
flowers_list = [[1, 3, 7, 11],["rose", "tulip", "carnation", "marigold"]]
print(flowers_list)

In [None]:
# We often prefer dictionaries over lists,
# as it can be easier to keep track of variables and indices.
# Note that the dictionary keys must be unique.

flowers_dict = {
    "rose": 1,
    "tulip": 3,
    "carnation": 7,
    "marigold": 11
}

In [None]:
flowers_dict
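
To illustrate what "keys must be unique" means in practice, here is a short sketch with a made-up dictionary, `duplicate_keys`. Python does not raise an error when a key is repeated; it keeps only the last value assigned to that key.

```python
# Repeating a key in a dictionary literal is not an error;
# the last value assigned to that key is the one that is kept.
duplicate_keys = {"rose": 1, "tulip": 3, "rose": 99}
print(duplicate_keys)       # {'rose': 99, 'tulip': 3}
print(len(duplicate_keys))  # 2
```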

In [None]:
# Or something like this works well too
# If you want more defined key-value pairs
# top level keys should be unique

flowers_entries_dict = {'Entry 1': {'type': 'rose', 'num': 1},
 'Entry 2': {'type': 'tulip', 'num': 3},
 'Entry 3': {'type': 'carnation', 'num': 7},
 'Entry 4': {'type': 'marigold', 'num': 11}}

flowers_entries_dict

## 4. Indexing and accessing data

In [None]:
# Python indexing starts at 0 from left to right. 
# When indexing from right to left, the indexing starts at -1.
print(nums)
print(nums[0])
print(nums[-1])
print(nums[0:2]) # a slice
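
Slices follow the pattern `start:stop:step`, where the item at `stop` is not included and any of the three parts can be left out. A few more examples on the same `nums` list, as a sketch:

```python
nums = [1, 3, 7, 11]

print(nums[1:])    # from index 1 to the end: [3, 7, 11]
print(nums[:2])    # from the start up to, but not including, index 2: [1, 3]
print(nums[-2:])   # the last two items: [7, 11]
print(nums[::2])   # every second item: [1, 7]
```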

In [None]:
# When indexing a list of lists,
# we need to go two or more levels deep
print(flowers_list)
print(flowers_list[0])
print(flowers_list[1])
print(flowers_list[0][0])
print(flowers_list[1][3])

In [None]:
# Access dictionary data using keys
print(flowers_dict)
print(flowers_dict["tulip"])
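
Accessing a key that does not exist raises a `KeyError`. The sketch below shows a few common ways to handle a missing key; the key `"daisy"` is a made-up example, and `try`/`except` is not otherwise covered in this workshop.

```python
flowers_dict = {"rose": 1, "tulip": 3, "carnation": 7, "marigold": 11}

# Square brackets raise a KeyError if the key does not exist
try:
    print(flowers_dict["daisy"])
except KeyError:
    print("daisy is not a key in flowers_dict")

# .get() returns None, or a default you supply, instead of raising an error
print(flowers_dict.get("daisy"))     # None
print(flowers_dict.get("daisy", 0))  # 0
print(flowers_dict.get("tulip", 0))  # 3

# 'in' checks whether a key exists
print("tulip" in flowers_dict)       # True
```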

In [None]:
# We could also cast the dictionary into a list
print(list(flowers_dict.keys())[1])
print(list(flowers_dict.values())[1])

In [None]:
# Example with a nested dictionary (2 key-value pairs per entry)
print(flowers_entries_dict)
print(flowers_entries_dict["Entry 3"])
print(flowers_entries_dict["Entry 3"]["type"])
print(flowers_entries_dict["Entry 3"]["num"])

In [None]:
print(list(flowers_entries_dict.keys())[2])
print(list(flowers_entries_dict.values())[2])
print(list(flowers_entries_dict.values())[2]["type"])
print(list(flowers_entries_dict.values())[2]["num"])

## 5. Functions

https://nbviewer.org/github/jakevdp/WhirlwindTourOfPython/blob/master/08-Defining-Functions.ipynb

### Using Existing Functions



In [None]:
# functions are called with parentheses
print("Hello World!")

In [None]:
# functions can be applied directly to objects: "methods"
myList = [23, 1, 45, 9]
myList.reverse() # () calls the reverse method with no arguments
print(myList)
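
Some list methods, such as `reverse()` and `sort()`, change the list in place and return `None`, while built-in functions such as `sorted()` return a new list. A brief sketch of the difference, using the made-up names `my_list`, `new_list`, and `result`:

```python
my_list = [23, 1, 45, 9]

# sorted() is a function: it returns a new list and leaves the original alone
new_list = sorted(my_list)
print(new_list)   # [1, 9, 23, 45]
print(my_list)    # [23, 1, 45, 9]

# .sort() is a method: it changes the list in place and returns None
result = my_list.sort()
print(result)     # None
print(my_list)    # [1, 9, 23, 45]
```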

In [None]:
# use a function from within a module
import math
math.sqrt(9)

In [None]:
# or
from math import sqrt
sqrt(9)

### Define Custom Functions

Python functions are defined using the `def` statement. A general Python syntax format for a function looks like this:

```python

def function_name():
    do something

```
or

```python

def function_name(param1, param2, ...):
    do something
```

In [None]:
# functions do not need inputs
def print_flowers():
    """ Prints a list of common flowers
        There are no inputs.
    """
    print("rose")
    print("tulip")
    print("carnation")
    print("marigold")

In [None]:
# call the function
print_flowers()

In [None]:
# Use return to output a variable
def get_name(item):
    """ returns only name of an item before a hyphen"""
    split_item = item.split('-')
    return split_item[0]

In [None]:
my_item = 'rose-12345'
get_name(my_item)
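
Functions can take more than one parameter, and parameters can have default values. The function `get_part` below is a hypothetical variation on `get_name`, written only to illustrate this.

```python
def get_part(item, position=0, separator='-'):
    """Return one part of an item string after splitting on a separator."""
    parts = item.split(separator)
    return parts[position]

print(get_part('rose-12345'))                # rose
print(get_part('rose-12345', 1))             # 12345
print(get_part('tulip_678', separator='_'))  # tulip
```

Arguments can be passed by position (`1`) or by name (`separator='_'`); any parameter left out falls back to its default.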

## 6. Conditional statements

https://nbviewer.org/github/jakevdp/WhirlwindTourOfPython/blob/master/07-Control-Flow-Statements.ipynb

A simplified general Python syntax for conditional statements is as follows:

```
if expression1:
  do something1
elif expression2:
  do something2
else:
  do something3

```

### if


Use an `if` statement to make a choice and determine the direction of code execution. Start the line of code with `if` followed by the condition, then end with a colon, `:`. Conditional statements are often tested with comparison operators (e.g., >) or sequence operations (e.g., x in s):

https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not

https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range
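
Before placing a condition inside an `if` statement, it can be helpful to see that the condition itself evaluates to `True` or `False`. A minimal sketch:

```python
flower = 'hydrangea'

# Comparison operators produce a boolean (True or False)
print(len(flower))       # 9
print(len(flower) > 5)   # True
print(len(flower) == 5)  # False

# The 'in' operator tests membership in a sequence such as a string or list
print('u' in flower)                    # False
print(flower in ['rose', 'hydrangea'])  # True
```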

In [None]:
# if statement with condition met
flower = 'hydrangea'

# check length of the string
if len(flower) > 5:
  print(flower, 'has more than 5 characters')

In [None]:
# if statement with condition not met
flower = 'lily'

if len(flower) > 5:
  print(flower, 'has more than 5 characters')

### else

In the above example, the condition is not met, so nothing happens. We can add an `else` statement to provide an alternative path of code execution.

In [None]:
# add an else
flower = 'lily'

if len(flower) > 5:
  print(flower, 'has more than 5 characters')
else:
  print(flower, 'has less than 5 characters')

### elif

Additional conditional tests can be added before else with the elif statement (else if).

In [None]:
# for example, what if we want to test len(flower) == 5
flower = 'tulip'

if len(flower) > 5:
  print(flower, 'has more than 5 characters')
elif len(flower) == 5:
   print(flower, 'has 5 characters') 
else:
  print(flower, 'has less than 5 characters')

In [None]:
# caution, the if-elif-else sequence stops when the first one is true
flower = 'tulip'

if len(flower) > 5:
  print(flower, 'has more than 5 characters')
elif len(flower) == 5:
   print(flower, 'has 5 characters')
elif 'u' in flower:
   print(flower, 'contains the character u') 
else:
  print(flower, 'has less than 5 characters')

In [None]:
# One solution: combine conditions with the boolean operator `and`

flower = 'tulip'

if len(flower) > 5:
  print(flower, 'has more than 5 characters')
elif len(flower) == 5 and 'u' in flower:
   print(flower, 'has 5 characters and contains the character u')
else:
  print(flower, 'has less than 5 characters')

In [None]:
# Alternative with all ifs
flower = 'tulip'

if len(flower) > 5:
  print(flower, 'has more than 5 characters')
if 'u' in flower:
   print(flower, 'contains the character u')
if len(flower) == 5:
   print(flower, 'has 5 characters')  
if len(flower) < 5:
  print(flower, 'has less than 5 characters')

## 7. Loops

https://github.com/jakevdp/WhirlwindTourOfPython/blob/master/07-Control-Flow-Statements.ipynb

If we wanted to print a series of statements, we could do this one at a time, but it is inefficient:

In [None]:
print("rose")
print("tulip")
print("carnation")
print("marigold")

`for` loops allow repeated execution of code on a known collection of values such as a range of numbers or a list. A general syntax example is as follows:

```python
for item in items:
  do something

```

`while` loops are another type of loop. They are useful when you need to repeat code for as long as a condition holds, or when you do not know the number of iterations in advance.
We will not cover these today.

```python
while condition:
  do something

```
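
For reference only, here is a minimal sketch of a `while` loop that counts down from 3. The variable `count` is a made-up example.

```python
# Count down from 3; the loop ends once the condition becomes False
count = 3
while count > 0:
    print(count)
    count = count - 1  # without this line the loop would never end

print("done")
```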

In [None]:
flowers = ["rose", "tulip", "carnation", "marigold"]
print(flowers)

In [None]:
# 1. Method one where we access list flowers directly
for flower in flowers:
    print(flower)

In [None]:
# 2. Method two, use a range to access idxs
for idx in range(len(flowers)):
    print(idx, flowers[idx])

In [None]:
# 3. Method 3, use enumerate
for idx, flower in enumerate(flowers):
    print(idx, flower)

In [None]:
# We can also loop through lists of lists, like this:
flowers = [["rose", 1], ["tulip", 5], ["carnation", 30], ["marigold", 50]]

In [None]:
# 1. Direct
for flower, num in flowers:
    print(flower, num)

In [None]:
# 2. range
for idx in range(len(flowers)):
    print(idx, flowers[idx][0], flowers[idx][1])

In [None]:
# 3. enumerate
for idx, (flower, num) in enumerate(flowers):
    print(idx, flower, num)
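
When the data are stored as two separate lists of the same length, as in the earlier `flowers_list` example, the built-in `zip()` function is one way to loop over both at once. A short sketch, with list names chosen for illustration:

```python
nums = [1, 3, 7, 11]
names = ["rose", "tulip", "carnation", "marigold"]

# zip pairs up the items at the same position in each list
for name, num in zip(names, nums):
    print(name, num)

# zip can also be used to build a dictionary from the two lists
flowers_from_lists = dict(zip(names, nums))
print(flowers_from_lists)
```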

In [None]:
# It is sometimes necessary and useful to use more than one loop
flowers = [["rose", 1], ["tulip", 5], ["carnation", 30], ["marigold", 50]]

for flower_data in flowers:
    #print(flower_data)
    for item in flower_data:
        print(item)

In [None]:
# Let's look at how to loop through a dictionary:
flowers_dict = {
    "rose": 1,
    "tulip": 3,
    "carnation": 7,
    "marigold": 11
}

In [None]:
for key in flowers_dict.keys():
    print(key)

In [None]:
for value in flowers_dict.values():
    print(value)

In [None]:
# Both at same time
for key,value in flowers_dict.items():
    print(key, value)

In [None]:
# another example

flowers_entries_dict = {'Entry 1': {'type': 'rose', 'num': 1},
 'Entry 2': {'type': 'tulip', 'num': 3},
 'Entry 3': {'type': 'carnation', 'num': 7},
 'Entry 4': {'type': 'marigold', 'num': 11}}

for key, value in flowers_entries_dict.items():
    print(key, value)

In [None]:
# If you want to access value elements
for key, value in flowers_entries_dict.items():
    print(key, value['type'], value['num'])

## 8. Data I/O

### Loading Data

Loading tabular data into python lists or dictionaries is very useful. We can use the built-in csv module: https://docs.python.org/3/library/csv.html

We will use the Iris dataset from here: https://archive.ics.uci.edu/dataset/53/iris, which is licensed as CC-BY 4.0: https://creativecommons.org/licenses/by/4.0/legalcode

Select Download, then extract the zip and copy the bezdekIris.data file into the same directory as this notebook.

In [None]:
# import the data
import csv

iris_data = []
with open('bezdekIris.data', 'r') as infile:
    reader = csv.reader(infile, delimiter=',')

    for idx,row in enumerate(reader): # this lets us add an index or line number
        if row: # append only non empty rows
           # append to list, but add the word sample and an index
           iris_data.append(["Sample " + str(idx+1)] + row)

In [None]:
iris_data[0:10]

In [None]:
# Alternatively let's use a dictionary
# Attribute information is listed in iris.names file
iris_data_dict = {}
col_names = ['sepal length in cm', 'sepal width in cm', 'petal length in cm', 'petal width in cm', 'class']
with open('bezdekIris.data', 'r') as infile:
    reader = csv.DictReader(infile, delimiter=',', fieldnames=col_names)

    for idx,row in enumerate(reader):
        iris_data_dict["Sample " + str(idx+1)] = row

In [None]:
#list(iris_data_dict.items())[0:5]
iris_data_dict
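
Note that the csv module reads every value as a string, so numeric columns need to be converted before doing any math. The sketch below shows the conversion on two sample rows in the same layout as `bezdekIris.data`. It uses `io.StringIO` as a stand-in for the file so that it runs on its own, and the names `sample_text` and `rows` are illustrative.

```python
import csv
import io

# A few rows in the same layout as bezdekIris.data (illustrative sample only)
sample_text = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"

rows = []
for row in csv.reader(io.StringIO(sample_text)):
    # the first four columns are measurements, the last is the class label
    measurements = [float(value) for value in row[:4]]
    rows.append(measurements + [row[4]])

print(rows[0])
print(type(rows[0][0]))
```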

### Writing Data

Let's go ahead and write the modified data. We can use the same csv module.

In [None]:
# For the iris_data list, which is a list of lists
with open('bezdekIris_modified.data', 'w', newline='') as outfile:
    writer = csv.writer(outfile, delimiter='\t')

    for row in iris_data:
        writer.writerow(row)

In [None]:
# For the iris_data_dict
with open('bezdekIris_modified_dict.data', 'w', newline='') as outfile:
    writer = csv.writer(outfile, delimiter='\t')

    # If you want to add header
    #header = ['sample'] # first column
    #for key in list(iris_data_dict.values())[0]:
    #    header.append(key)
    #writer.writerow(header)

    # write the data
    for key, sub_dict in iris_data_dict.items():
        row = [key] + list(sub_dict.values())
        writer.writerow(row)
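
An alternative for dictionary data is `csv.DictWriter`, which writes a header row and matches values to columns by key. This is a sketch under a few assumptions: the `records` dictionary is made up and only shaped like `iris_data_dict`, and `io.StringIO` stands in for a real output file.

```python
import csv
import io

# Two made-up records shaped like the values in iris_data_dict
records = {
    "Sample 1": {"sepal length in cm": "5.1", "class": "Iris-setosa"},
    "Sample 2": {"sepal length in cm": "4.9", "class": "Iris-setosa"},
}

outfile = io.StringIO()  # stands in for a file opened with open(..., 'w', newline='')
fieldnames = ["sample", "sepal length in cm", "class"]
writer = csv.DictWriter(outfile, fieldnames=fieldnames, delimiter='\t')
writer.writeheader()

for key, sub_dict in records.items():
    # merge the sample name into the row dictionary before writing
    writer.writerow({"sample": key, **sub_dict})

print(outfile.getvalue())
```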

## 9. Very brief intro to data analysis and plotting

We can use the Python statistics module to get some basic summary statistics of the Iris dataset:

https://docs.python.org/3/library/statistics.html


In [None]:
# We can use our existing iris_data_dict variable
iris_data_dict

In [None]:
# first let's create a list of the different variables/features from the iris_data_dict
sepal_lengths = []
for sample in iris_data_dict.values():
    sepal_lengths.append(float(sample['sepal length in cm']))

In [None]:
sepal_lengths

In [None]:
# Another way to do this in Python is on one line like this:
sepal_lengths = [float(sample['sepal length in cm']) for sample in iris_data_dict.values()]
sepal_widths = [float(sample['sepal width in cm']) for sample in iris_data_dict.values()]
petal_lengths = [float(sample['petal length in cm']) for sample in iris_data_dict.values()]
petal_widths = [float(sample['petal width in cm']) for sample in iris_data_dict.values()]

In [None]:
# Now import the statistics library
import statistics

# Create function to print summary statistics
def summary_stats(data):
    print("Mean: " + str(round(statistics.mean(data), 2)))
    print("Median: " + str(round(statistics.median(data), 2)))
    print("Min: " + str(round(min(data), 2)))
    print("Max: " + str(round(max(data), 2)))
    print()

# Print summary statistics for each feature

print("Sepal Length:")
summary_stats(sepal_lengths)

print("Sepal Width:")
summary_stats(sepal_widths)

print("Petal Length:")
summary_stats(petal_lengths)

print("Petal Width:")
summary_stats(petal_widths)
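
The statistics module also provides measures of spread. As a sketch, the example below computes the population and sample standard deviation on a small made-up list, `values`, so that it runs without the Iris data; the same calls should work on lists such as `sepal_lengths`.

```python
import statistics

# A small made-up list so the example runs on its own
values = [2, 4, 4, 4, 5, 5, 7, 9]

print("Mean:", statistics.mean(values))
print("Population standard deviation:", statistics.pstdev(values))
print("Sample standard deviation:", round(statistics.stdev(values), 2))
```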

We can use the Matplotlib library to create some basic visualizations of the data:

https://matplotlib.org/

https://matplotlib.org/stable/tutorials/index

In [None]:
import matplotlib.pyplot as plt

# Scatter Plot - Sepal Width vs Sepal Length
# https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html
plt.figure(figsize=(6, 4))
plt.scatter(sepal_lengths, sepal_widths)
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.show()

In [None]:
# We can get a bit fancier
# Define a color for each class

colors = {'Iris-setosa': 'indigo', 'Iris-versicolor': 'orange', 'Iris-virginica': 'teal'}
class_names = [sample['class'] for sample in iris_data_dict.values()]
#print(class_names)

In [None]:
class_colors = [colors[class_name] for class_name in class_names]
#print(class_colors)
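
Before plotting by class, it can be useful to check how many samples belong to each class. One possible approach is `collections.Counter` from the standard library. The list `example_classes` below is a short made-up stand-in for `class_names`.

```python
from collections import Counter

# A short made-up list standing in for class_names
example_classes = ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-virginica']

# Counter tallies how many times each value appears
class_counts = Counter(example_classes)
for class_name, count in class_counts.items():
    print(class_name, count)
```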

In [None]:
# Create Scatter Plot with class colors - Sepal Length vs Sepal Width
plt.figure(figsize=(6, 4))
plt.scatter(sepal_lengths, sepal_widths, c=class_colors)
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.show()

In [None]:
# Add a legend (key) for the classes
# Scatter Plot - Sepal Length vs Sepal Width
plt.figure(figsize=(6, 4))
plt.scatter(sepal_lengths, sepal_widths, c=class_colors)
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')

for class_label, color in colors.items():
    plt.scatter([], [], color=color, label=class_label)
plt.legend()
plt.show()

In [None]:
# Try another one!
# Scatter Plot - Petal Length vs Petal Width
plt.figure(figsize=(6, 4))
plt.scatter(petal_lengths, petal_widths, c=class_colors)
plt.xlabel('Petal Length (cm)')
plt.ylabel('Petal Width (cm)')
plt.show()

In [None]:
# Plot all combinations
# self-study!
# Adapted from ChatGPT 4o

import matplotlib.pyplot as plt

# Define feature pairs for scatter plots
feature_pairs = [
    ('Sepal Length', sepal_lengths, 'Sepal Width', sepal_widths),
    ('Sepal Length', sepal_lengths, 'Petal Length', petal_lengths),
    ('Sepal Length', sepal_lengths, 'Petal Width', petal_widths),
    ('Sepal Width', sepal_widths, 'Petal Length', petal_lengths),
    ('Sepal Width', sepal_widths, 'Petal Width', petal_widths),
    ('Petal Length', petal_lengths, 'Petal Width', petal_widths)
]

# Set up a 2x3 grid for subplots
fig, axes = plt.subplots(2, 3, figsize=(12, 6))  # Adjust the figsize as needed

# Flatten axes for easy iteration
axes = axes.flatten()

# Plot each feature pair in the grid
for i, (x_label, x_data, y_label, y_data) in enumerate(feature_pairs):
    axes[i].scatter(x_data, y_data, c=class_colors)
    axes[i].set_xlabel(f'{x_label} (cm)')
    axes[i].set_ylabel(f'{y_label} (cm)')

# Add legend at the bottom
handles = [
    plt.Line2D([0], [0], marker='o', color='w', markerfacecolor=color, markersize=10, label=cls)
    for cls, color in colors.items()
]
fig.legend(handles=handles, loc='lower center', ncol=3, bbox_to_anchor=(0.5, -0.05))

# Adjust layout to make space for the legend
plt.tight_layout()
plt.subplots_adjust(bottom=0.1)
plt.show()