# Introduction to Programming with Python - Part 2

## Setup

With this Google Colaboratory (Colab) notebook open, click the "Copy to Drive" button that appears in the menu bar. The notebook will then be attached to your own user account, so you can edit it in any way you like -- you can even take notes directly in the notebook.

## Instructors

- Ashley Evans Bandy
- Claire Cahoon
- Walt Gurley
- Natalia Lopez

## Learning objectives

By the end of our workshop today, we hope you'll build on basic Python syntax to understand control flow and reading and writing files. With these in hand, you'll know enough to write and apply basic scripts and explore other features of the language.  

## Today's Topics
- Comparison operators
- If statements
- For loops
- Reading and writing files

If you would like to learn more about data analysis using Python, we will explore these topics in our Python open labs starting next week.

For a reference to previous workshop content, you can access a version of the previous workshop materials with all example and exercise code completed in the [Introduction to Programming with Python - Part 1 filled notebook](https://colab.research.google.com/github/NCSU-Libraries/data-viz-workshops/blob/master/Introduction_to_Programming_with_Python/Introduction_to_Programming_with_Python_1_filled.ipynb).


## Zoom etiquette

Please make sure that your mic is muted during the workshop.

## Questions during the workshop

Please feel free to ask questions in the Zoom chat throughout the workshop.

We have a second instructor who will be monitoring the Zoom chat. They will answer questions as they are able and will collect questions whose answers might help everyone, to be answered at the end of the workshop.

## Jupyter Notebooks and Google Colaboratory

We will be using Google Colab in today's workshop. Colab is a hosted version of Jupyter notebooks, which are a way to write and run Python code interactively. If you would like to know more about Colaboratory or how to use this tool, you can visit the [Welcome Notebook](https://colab.research.google.com/notebooks/welcome.ipynb).

If you'd like to install a Python distribution locally, though, we're happy to help. Feel free to [get help from our graduate consultants](https://www.lib.ncsu.edu/dxl) or [schedule an appointment with Libraries staff](https://go.ncsu.edu/dvs-request).

## Control flow 

Control flow refers to statements that allow you to change the path of your program. Instead of executing each line of code in order, you can make decisions or move to different lines of code depending on different conditions.

To learn more about control flow:

- ["Control Flow" chapter of *A Byte of Python*](https://python.swaroopch.com/functions.html)
- [Python documentation on control flow tools](https://docs.python.org/3/tutorial/controlflow.html)

### If Statements

"The if statement is used to check a condition: if the condition is true, we run a block of statements (called the if-block), else we process another block of statements (called the else-block). The else clause is optional." (From the "If Statements" section of the ["Control Flow" chapter of *A Byte of Python*](https://python.swaroopch.com/functions.html))

#### Comparison Operators

Comparison operators are used to compare two values and return a value of `True` or `False`, a boolean data type.

The following are Python comparison operators: `==`, `!=`, `>`, `<`, `>=`, `<=`
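As a minimal sketch of how these operators behave, the comparisons below each produce a boolean. The variable names and values are made up for illustration.

```python
# Each comparison evaluates to a boolean: True or False
is_equal = 3 == 3               # equal to
is_different = "cat" != "dog"   # not equal to
is_greater = 2 > 5              # greater than
is_at_most = 4 <= 4             # less than or equal to

print(is_equal, is_different, is_greater, is_at_most)
print(type(is_equal))
```

Note that `==` compares two values, while a single `=` assigns a value to a variable.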

In [None]:
# Compare numbers


In [None]:
# Compare strings


In [None]:
# Compare other expressions using multiple data types


**Try it yourself:** What is the result of this comparison?

In [None]:
len([1, 2]) == 2

#### If, Else if, Else (`if`, `elif`, `else`)
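Before filling in the cells below, here is a small sketch of the `if`, `elif`, `else` pattern. The variable `score` and its thresholds are hypothetical and not part of the workshop data.

```python
# Hypothetical value chosen only for illustration
score = 72

if score >= 90:
    grade = "excellent"
elif score >= 60:
    # Checked only when the first condition is False
    grade = "passing"
else:
    # Runs when none of the conditions above are True
    grade = "needs work"

print(grade)
```

Python checks the conditions from top to bottom and runs only the first block whose condition is true.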

In [None]:
# If statements (conditional execution)


# Print "Hi Bob!" if cat_name is "Bob"


# Add an else statement if the condition isn't met


In [None]:
# You can use if, else if ("elif"), and else to specify more than one condition


In [None]:
# You can also use logical operators (and, or, not) to evaluate expressions.


In [None]:
# Greater than or less than can be used as part of an if statement 
cat_ages = {
    "Sally": 5, 
    "Bob": 16,
    "Bill": 1,
    "Carla": 10,
    "Pete": 3
}

# Pick a cat name from the cat_ages dictionary


# Print a message based on the age range of a cat


#### Activity: Use `if` statements to create personalized messages (10 min)

Write an if statement (`if`, `elif`, `else`) that includes a logical operator (`and`, `or`, `not`) to generate a message for members of the *International Cats of Mystery* service (member information is contained in the dictionaries below). The language of the message depends on the following criteria:

If someone has been a member for three years or more, and they have more than one subscription, then send them this message:

> **[member name]**, as a loyal member, you are eligible for a special offer.

Otherwise, send them this message:

> **[member name]**, thank you for your membership. Are you subscribed to all of our channels?

TIP: Note that the `joined` key in the member information dictionary contains the date that a member joined, not the number of membership years.

In [None]:
# Dictionaries containing individual member information for members of the
# International Cats of Mystery service
member_info_1 = {"name": "Bill", 
                 "joined": 2015, 
                 "address 1": "12309 Scratch Tree Lane", 
                 "address 2": "Apt. 1", 
                 "city": "Beverly Hills", 
                 "state": "CA", 
                 "zip code":"90210", 
                 "subscriptions" : ['Little Known Hiss-tories podcast']}
member_info_2 = {"name": "Pete", 
                 "joined": 2018, 
                 "address 1": "501 Grumpy Street", 
                 "address 2": "", 
                 "city": "Long Beach", 
                 "state": "California", 
                 "zip code": "90803",
                 "subscriptions" : ['Leisure Cats magazine','Little Known Hiss-tories podcast']}
member_info_3 = {"name": "Carla", 
                 "joined": 2006, 
                 "address 1": "2120 Curious Court", 
                 "address 2": "unit B", 
                 "city": "Raleigh", 
                 "state": "North Carolina", 
                 "zip code": "27610", 
                 "subscriptions" : ['email newsletter', 'Cat Nips subscription box']}

In [None]:
# Check a member's years of membership and number of subscriptions and print a
# message accordingly


#### Bonus Activity: If Statement

If you have additional time after completing the previous activity, try this bonus challenge. Write an if statement similar to the one above, but consider how you might include an additional condition. The language of the message depends on the following revised criteria:

If someone has been a member for three years or more, and they have more than one subscription, then send them this message:

> **[member name]**, as a loyal member, you are eligible for a special offer.

If someone has been a member for three years or more, they have more than one subscription, and they live in California, then send them this message instead:

> **[member name]**, thank you for being a loyal member.

Otherwise, send them this message:

> **[member name]**, thank you for your membership. Are you subscribed to all of our channels?

TIP: While not covered previously, you can nest if statements, with one inside another, as a way to write this code. Try looking up how to nest if statements and note how the indentation might change.
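As a rough sketch of what nesting looks like, the example below puts one if statement inside another. The variables `age` and `has_ticket` are hypothetical and unrelated to the activity data.

```python
# Hypothetical values, unrelated to the member dictionaries
age = 12
has_ticket = True

if age >= 10:
    # The inner if is only checked when the outer condition is True,
    # and its blocks are indented one level further
    if has_ticket:
        result = "Welcome aboard."
    else:
        result = "Please buy a ticket."
else:
    result = "Sorry, you must be at least 10."

print(result)
```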

In [None]:
# Check a member's years of membership, number of subscriptions, and state and
# print a message accordingly


### For loops

For loops iterate over a sequence of objects and perform some operation on each one.

For example, a "For loop" can be used to go through a list of names, and evaluate whether each name is uppercase or not.

- [Refer to the "For loop" section in "Control Flow" chapter of *A Byte of Python*](https://python.swaroopch.com/functions.html)
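The uppercase example described above might be sketched as follows. The list `names` is made up for illustration, and `str.isupper()` is one possible way to run the check.

```python
# Hypothetical list of names for illustration
names = ["BOB", "Sally", "PETE"]
results = []

for name in names:
    # str.isupper() is True when every cased character is uppercase
    if name.isupper():
        results.append(name + " is uppercase")
    else:
        results.append(name + " is not uppercase")

for line in results:
    print(line)
```

The indented block under `for` runs once for each item in the list, with `name` set to the current item.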

In [None]:
# Iterate over a range
# The range function returns a sequence of numbers. You can specify the length, among other things.
# Here, the range gives 6 numbers, starting at zero.


In [None]:
# Iterate over a string


In [None]:
# For loops let you iterate over a list or other iterable object
cat_names = ["Bob", "Bill", "Pete", "Carla", "Sally"]

# Loop over the members of cat_names to print out the element and its length


In [None]:
# You can combine types of control flow


### Activity: Test for correct member state names using a `for` loop and `if` statements (10 min)

We have combined the individual membership information for each member of the *International Cats of Mystery* service into the list `members`. We need to determine if the `state` address listing for each member is formatted correctly as a two letter abbreviation (e.g., North Dakota should be listed as ND).

Create a for loop to iterate over the list of members to test if each member's `state` address listing is formatted correctly as a two letter abbreviation. If it is correctly formatted, print out:

>  State name correct for **[member name]**

If it is not correctly formatted, print:

> Fix **[member state]** for **[member name]**

In [None]:
# Combine the member info from the first activity into a list
members = [member_info_1, member_info_2, member_info_3]
members

In [None]:
# Loop over the member info contained in the list "members" to test if state
# listing is correctly formatted


### Control flow with functions

In [None]:
# Using a function in a for loop


## 5 Minute Break

**ADVANCED: List Comprehensions**

List comprehensions are a "pythonic" way of building lists in a compact manner.
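To make the idea concrete, here is a small sketch that builds the same list with a for loop and with a list comprehension. The list `values` is hypothetical.

```python
# Hypothetical list of numbers
values = [1, 2, 3, 4]

# Building a list with a for loop
doubled_loop = []
for n in values:
    doubled_loop.append(n * 2)

# The same result with a list comprehension
doubled = [n * 2 for n in values]

# A comprehension can also filter items with a condition
evens = [n for n in values if n % 2 == 0]

print(doubled, evens)
```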

In [None]:
# A list of numbers


# A function to add 1 to a number


# Create a new list applying the function add_one() to the values in nums


In [None]:
# Using conditional statements in list comprehension
cat_names

# Create a new list of cat_names that are lower case and longer than 3 characters


## Reading and writing files

Working with comma-separated and similar data files will be covered in a later workshop. It's worthwhile, however, to see how to read and write data or text to and from a file. We'll start with writing some text to a file, then explore how to read it.

### Write to a file

In [None]:
# Sample text to write to a file
sample_text = """
Cats and kittens everywhere,
Hundreds of cats,
Thousands of cats,
Millions and billions and trillions of cats.
"""

# Write the sample text to a file (test different open modes)

    
# You might sometimes see an older pattern:
# f = open('lorem.txt', 'w')
# f.write(sample_text)
# f.close()

You can check the "Files" tab in the column at left now to find the output file.
Note that you must click the "REFRESH" button to see it.
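As a sketch of the `with` pattern that is generally preferred over the older `open()`/`close()` pattern, the example below writes to a file with different open modes. The file name `example_cats.txt` and its text are assumptions chosen for illustration.

```python
# Hypothetical file name and text for illustration
file_name = "example_cats.txt"
text = "Hundreds of cats,\nThousands of cats,\n"

# Mode "w" creates the file, or overwrites it if it already exists
with open(file_name, "w") as f:
    f.write(text)

# Mode "a" appends to the end of the existing file
with open(file_name, "a") as f:
    f.write("Millions of cats.\n")

# Mode "r" (the default) opens the file for reading
with open(file_name, "r") as f:
    contents = f.read()

print(contents)
```

The `with` statement closes the file automatically when its block ends, even if an error occurs inside the block.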

### Read from a file

In [None]:
# Read the newly created file.


In [None]:
# We can also read the file line by line


#### Parsing data read from a file

External data we read into our application is often not formatted for our desired manipulations or analyses. We often have to parse the data from a file into a structure that is appropriate for manipulation or analysis.
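A minimal sketch of this kind of parsing, assuming a small comma-separated file that the example creates itself (the file name `example_pets.txt` and its contents are hypothetical):

```python
# Hypothetical file name and contents for illustration
file_name = "example_pets.txt"
with open(file_name, "w") as f:
    f.write("Bob,16\nSally,5\n")

parsed = []
with open(file_name) as f:
    # Iterating over a file object yields one line at a time
    for line in f:
        # strip() removes the trailing newline;
        # split(",") breaks the line into a list of values
        parsed.append(line.strip().split(","))

print(parsed)
```

Every value comes back as a string, so numbers such as the ages here would still need to be converted before doing arithmetic with them.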

In [None]:
# Create an empty list to store the data we will parse


# Read in the cat_text file line by line


# Print out the resulting list


### Activity: Parse a text file containing data separated by semicolons (10 min)

We have membership information for 25 members of the *International Cats of Mystery* service stored in the file `international_cats_of_mystery.txt`. The file is formatted as follows:

1. Each line is separated by a newline character (`\n`)

1. The first line in the file contains the data variable names (e.g., Name, Joined, Address 1,...)

1. Each subsequent line contains the data variable values for one member (e.g., Bill, 2015, 12309 Scratch Tree Lane,...)

1. Each variable name or value is separated by semicolons (`;`)

Use a file parsing method to parse the `international_cats_of_mystery.txt` file to create a list that contains lists of values from each line in the file. The first two items in this list should be:

```python
[
  ['Name', 'Joined', 'Address 1', 'Address 2', 'City', 'State', 'Zip Code', 'Subscriptions'],
  ['Bill', '2015', '12309 Scratch Tree Lane', 'Apt. 1', 'Beverly Hills', 'CA', '90210', 'Little Known Hiss-tories podcast'],
  ...
]
```

In [None]:
# Fetch the text file for the activity
!curl https://raw.githubusercontent.com/NCSU-Libraries/data-viz-workshops/master/Introduction_to_Programming_with_Python/International_cats_of_mystery_US_chapter.txt -o international_cats_of_mystery.txt

In [None]:
# Parse the data in the "international_cats_of_mystery.txt" file


## Bonus activity: Generate personalized messages using data from a file

This activity incorporates concepts we have covered in both *Introduction to Programming with Python* workshops. Use the skills you have developed through your work on these materials as well as references to additional resources we provide to complete this extended activity.

For this activity we have membership information for 25 members of the *International Cats of Mystery* service stored in the file `international_cats_of_mystery.txt`. The file is formatted as follows:

1. Each line is separated by a newline character (`\n`)

1. The first line in the file contains the data variable names (e.g., Name, Joined, Address 1,...)

1. Each subsequent line contains the data variable values for one member (e.g., Bill, 2015, 12309 Scratch Tree Lane,...)

1. Each variable name or value is separated by semicolons (`;`)

Our job is to read in the `international_cats_of_mystery.txt` file, parse the data, and format the data to create a list of dictionaries containing membership information that match the format:

```python
{'address 1': '12309 Scratch Tree Lane',
  'address 2': 'Apt. 1',
  'city': 'Beverly Hills',
  'joined': 2015,
  'name': 'Bill',
  'state': 'CA',
  'subscriptions': ['Little Known Hiss-tories podcast'],
  'zip code': '90210'}
```

Once we have the data prepared we will generate the following personalized message for each member:

> *Hello **[member name]**, Thank you for being a valued member of International Cats of Mystery for the last **[member's years of membership]** years. We have an exclusive discount on our **[member's top subscription with the service]**.*

This is similar to the final activity in *Introduction to Programming with Python 1* except this time we will use a `for` loop to automate this process.

In [None]:
# Fetch the text file for the final activity
!curl https://raw.githubusercontent.com/NCSU-Libraries/data-viz-workshops/master/Introduction_to_Programming_with_Python/International_cats_of_mystery_US_chapter.txt -o international_cats_of_mystery.txt


In [None]:
# Read whole text file to preview contents


### Read and parse data line by line and store in a list

Read the membership information data file line by line, parse the string (remove the newline character and split based on semicolon), and store each line as a list in the empty list `data_from_file`.

Tip: Remember the list method `append()` to append new items to a list

In [None]:
# Empty list to store data


# Read in text file line by line


# Print data_from_file


### Create member info dictionaries

We now have a list containing one list per line from the file that we read (stored in `data_from_file`). We need to loop through the `data_from_file` list and use the variable names and values to create dictionaries containing each member's information.

To do this, we will loop over the lists containing data in `data_from_file` (`data_from_file[1:]`; remember that the first list item contains the variable names) and create an empty dictionary for each one. We then use another loop over a range of values the length of `data_from_file[0]` to fill that dictionary with variable names (`data_from_file[0][i]`) and values (`data[i]`) as key, value pairs.

```python
member_info = []

for data in data_from_file[1:]:
    data_dict = {}
    for i in range(len(data_from_file[0])):
        data_dict[data_from_file[0][i]] = data[i]
    member_info.append(data_dict)
```

In [None]:
# Create an empty list to store the dictionaries


# Loop over each data list from the parsed data


# Print out the list of dictionaries


The following cell uses a list comprehension to perform the same operations as the previous cell. This is included to show a comparison of two methods for generating the same results. The list comprehension uses the `zip()` function to create an iterable of paired tuples of variable names (from `data_from_file[0]`) and variable values from each member data row in `data_from_file` (i.e., `data_from_file[1:]`). The `dict()` function is then used to construct a dictionary from the output of `zip()`.
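Before filling in that cell, here is a small sketch of the `zip()` and `dict()` pattern on made-up data. The list `rows` is a hypothetical stand-in for `data_from_file`.

```python
# Hypothetical parsed data: the first row holds the variable names
rows = [
    ["Name", "Joined"],
    ["Bob", "2015"],
    ["Sally", "2018"],
]

# zip() pairs each name with the value in the same position;
# dict() turns those pairs into a dictionary
records = [dict(zip(rows[0], row)) for row in rows[1:]]

print(records)
```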

In [None]:
# For list items from index 1 in data_from_file create a dictionary of variable
# names (data_from_file[0]) and values as key, value pairs


### Convert the `Joined` key value from a string to an integer

When we read in our file, all of the data was parsed as a string. We must convert any data that should be in another data type accordingly.
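The conversion can be sketched on a single made-up record as follows. The dictionary `record` is hypothetical.

```python
# Hypothetical record with the year stored as a string
record = {"Name": "Bob", "Joined": "2015"}

print(type(record["Joined"]))

# int() converts a string of digits into an integer
record["Joined"] = int(record["Joined"])

print(type(record["Joined"]))
```

Once the value is an integer, it can be used in arithmetic, such as subtracting it from the current year.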

In [None]:
# Test that value stored in 'Joined' is a string

# Convert the 'Joined' value to an integer for each member


### Fix state names using a defined function

Some of the state names in our member info list are not formatted correctly as two letter abbreviations. We will create a function to fix the incorrectly formatted state names.
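One possible shape for such a function is sketched below using a dictionary lookup. The lookup table `abbreviations` and the function name `to_abbreviation` are hypothetical and are not the names used in the workshop cells.

```python
# Hypothetical lookup table and function name for illustration
abbreviations = {"North Dakota": "ND", "Ohio": "OH"}

def to_abbreviation(state):
    """Return the two-letter abbreviation for a state name when one is known."""
    # dict.get() returns its second argument when the key is missing,
    # so values that are already abbreviations pass through unchanged
    return abbreviations.get(state, state)

print(to_abbreviation("North Dakota"))
print(to_abbreviation("OH"))
```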

In [None]:
# State abbreviations lookup
state_abbreviations = {
    'California': 'CA',
    'New York': 'NY',
    'North Carolina': 'NC',
    'Texas': 'TX'
}

# Fix state names function


In [None]:
# Run the fix_state_format for each member in member_info


### Convert `Subscriptions` string to a list

Finally, as our file was parsed as a string, the values contained in the `Subscriptions` key are not formatted as a list. We must convert the current string form to a list using the string function `split()`.
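As a minimal sketch of `split()`, the example below breaks a string into a list. The comma delimiter is an assumption, so preview the file to confirm which character separates the subscriptions.

```python
# Hypothetical string; the comma delimiter is an assumption
subscriptions = "Leisure Cats magazine,email newsletter"

# split() returns a list of the pieces between each delimiter
subscription_list = subscriptions.split(",")

print(subscription_list)
```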

In [None]:
# Convert the Subscriptions key value for each member to a list


### Use a `for` loop to generate a personalized message for each member

Use the code cells below to write two functions that produce a personalized email with the provided membership information and then loop through the `member_info` list to produce this message for all members.

Tips:

- Define a function that calculates how many years someone has been a member
- Define a second function that:
  1. gets the member name,
  1. calls the previous function to calculate their years of membership,
  1. gets their most popular subscription,
  1. and formats this information into a personalized message.

- The year a member joined is an integer. Remember, you can only concatenate strings (recall the `str()` function)

- Subscriptions are a list. Each member's subscriptions are listed from most popular to least popular.
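The last two tips can be sketched on made-up values as follows. All names here are hypothetical, and `current_year` is fixed by assumption so the sketch is reproducible.

```python
# Hypothetical values for illustration
joined = 2015
current_year = 2020  # assumed fixed value, not read from the system clock

years = current_year - joined

# str() converts the integer so it can be concatenated with strings
message = "Member for " + str(years) + " years."
print(message)

# List items are accessed by index; index 0 is the first item
topics = ["first topic", "second topic"]
print(topics[0])
```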

In [None]:
# Define a function that calculates membership years


For reference, here is the message format again:
> *Hello **[member name]**, Thank you for being a valued member of International Cats of Mystery for the last **[member's years of membership]** years. We have an exclusive discount on our **[member's top subscription with the service]**.*

In [None]:
# Define a function that creates the personalized message


In [None]:
# Loop through the "member_info" list to produce the personalized message for
# each member


## Further resources and topics

### Filled version of this notebook

[Introduction to Programming with Python - Part 2 filled notebook](https://colab.research.google.com/github/NCSU-Libraries/data-viz-workshops/blob/master/Introduction_to_Programming_with_Python/Introduction_to_Programming_with_Python_2_filled.ipynb) - a version of this notebook with all code filled in for the guided activity and exercises.

### Resources

- [A Byte of Python](https://python.swaroopch.com/) is a great intro book and reference for Python
- [Official Python documentation and tutorials](https://docs.python.org/3/)
- [Real Python](https://realpython.com/) contains a lot of different tutorials at different levels
- [LinkedIn Learning](https://www.lynda.com/Python-training-tutorials/415-0.html) is free with NC State accounts and contains several video series for learning Python
- [Dataquest](https://www.dataquest.io/) is a free then paid series of courses with an emphasis on data science

### Topics

- Other data structures: sets, tuples
- Libraries, packages, and pip
- Virtual environments
- Text editors and local execution environments
- The object-oriented paradigm in Python: classes, methods

### Installing Python 

There are quite a few ways to install Python on your own computer, including the [official Python downloads](https://www.python.org/downloads/) and the very popular data-science focused [Anaconda Python distribution](https://www.anaconda.com/products/individual). Depending on your operating system, how you want to write code, and what type of projects you might work on, there are other approaches as well, such as using [pyenv](https://github.com/pyenv/pyenv) and [poetry](https://python-poetry.org/). If you're not sure which approach to take, feel free to get in touch and we'll talk through options and help you get set up. 

### Popular editors for Python

Today we've been writing and running code in Google Colab, which is one particular version of Jupyter Notebooks. Depending on your projects and what you're working on, you may want to write your code in a text editor. While there are many options, if you're just getting started we recommend [Visual Studio Code](https://code.visualstudio.com/) for any operating system but are happy to talk through other editors.


## Evaluation survey
Please take a minute to answer these questions. Your feedback helps us improve future workshops.

https://go.ncsu.edu/dvs-eval

## Credits

This workshop was developed by Scott Bailey, Ashley Evans Bandy, Claire Cahoon, Walt Gurley, and Natalia Lopez from the NC State University Libraries. Materials are based on workshops by Scott Bailey, Vincent Tompkins, Javier de la Rosa, Peter Broadwell, and Simon Wiles.