<a href="https://colab.research.google.com/github/columbia-data-club/meetings/blob/main/2023/january-26-intro-to-python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![A blue background with the Python logo and the words Data Club on it](https://raw.githubusercontent.com/columbia-data-club/meetings/main/assets/images/data-club-python.png)

## Introduction to Python

January 26, 2023

by [Moacir P. de Sá Pereira](https://moacir.com) for the [Columbia Data Club](https://github.com/columbia-data-club/)

This notebook accompanies a ~70-minute presentation that introduces Python to people who are new to both Python and programming.

* _Think Python 2e_ by Allen Downey, [available for free download](https://greenteapress.com/wp/think-python-2e/), is a good introduction to programming in general via Python.


## Introductory Caveats

* There is _no such thing_ as “learning Python.” Good beginner Python instruction comes in two flavors:
  * Specific instruction on a library or module to solve a specific task (this is what we do in most Data Club meetings)
  * General instruction to teach insights into programming
* “Learning Python” in a vacuum will likely feel frustrating and unsatisfying. Better to have a specific task you want to complete with computation in general.
* Abstractly, programming languages are not different from each other. Almost all of them are “[Turing-complete](https://en.wikipedia.org/wiki/Turing_completeness),” meaning they can all be used to solve the same set of problems. As such, skills you learn on any programming language will carry over into Python.
* Nevertheless there are two good reasons to start your programming career with Python:
  1. Python is allegedly easy and has rather human-readable code
  2. Python has a rich library of additional modules that you can use to [analyze data](https://pandas.pydata.org/) and [geospatial data](http://geopandas.org), [explore open quantum systems](https://qutip.org/), [transcribe audio interviews](https://alphacephei.com/vosk/), [make maps](https://python-visualization.github.io/folium/), [model weather](https://unidata.github.io/MetPy/latest/userguide/startingguide.html), [design games](https://www.pygame.org/news), [encrypt data](https://cryptography.io/en/latest), [analyze a large corpus of text](https://spacy.io/), [research family histories](https://github.com/gramps-project/gramps), [build websites](https://www.djangoproject.com/), [control embedded systems and the Internet of Things](https://projects.raspberrypi.org/en/projects), [edit movies](https://zulko.github.io/moviepy/), [scrape the web](https://www.crummy.com/software/BeautifulSoup/bs4/doc/), [simulate ocean dynamics](https://xgcm.readthedocs.io/en/latest/), [generate algorithmic art](https://spin.atomicobject.com/2021/12/14/generative-art-zero-random/),
and [study the stars](https://www.astropy.org/). For starters. We will use none of these today.

## Python Resources

Because our time today is so short, it's important to emphasize that there are many resources both on campus and online for people taking their first steps programming. 

- Columbia
  - [**Research Data Services**](https://library.columbia.edu/services/research-data-services/) This is me and my colleagues. We provide various research data-related consultation services for free to Columbia researchers. That’s why I’m here. Reach out to us by emailing data@library.columbia.edu
  - [**Data Club**](https://library.columbia.edu/services/research-data-services/data-club.html). RDS hosts Data Club, which meets a few times every semester to learn more about data, usually with Python. [Join the email list](https://listserv.cuit.columbia.edu/scripts/wa.exe?SUBED1=CUL-DATA-CLUB&A=1)
  - [**Foundations for Research Computing**](https://rcfoundations.research.columbia.edu/) Foundations hosts two-day intensive workshops on Python, R, and Git a few times each year. The material is based on [Software Carpentry’s “Plotting and Programming with Python”](https://swcarpentry.github.io/python-novice-gapminder/), and you can learn on your own that way, too.
  - “[**Computing in Context**](https://www.cs.columbia.edu/2016/computing-in-context-a-computer-science-course-for-liberal-arts-majors-expands-with-new-sipa-track/).” This is a course for undergraduates and SIPA students. It is typically taught in the Fall. Check [the Directory of Courses](https://doc.search.columbia.edu/classes/computing+in+context?instr=&name=&days=&semes=&hour=&moi=) for availability.
- Online
  - “[Plotting and Programming in Python](https://swcarpentry.github.io/python-novice-gapminder/).” Take the intensive Foundations Python workshop at your own speed.
  - [_Think Python 2e_](https://greenteapress.com/wp/think-python-2e/) is a free textbook for Python that approaches the language thoughtfully and for learners with no programming experience
  - [Women who Code Python](https://www.womenwhocode.com/python) is an online forum with webinars and networking opportunities for women who use Python.
  - [Data Umbrella](https://dataumbrella.substack.com/) is a community for underrepresented persons in data science. While not Python-specific, they often co-host events with [NYC PyLadies](https://www.meetup.com/NYC-PyLadies/events/), which is Python-specific.

## What Is This Webpage You Are Using?

This is an interactive notebook. It lets you write prose (as I have done so far) and also code. That is, this notebook is made up of a bunch of blocks (called “cells”) going down the page. Each cell can hold either text, like this one, or code, where you can type in Python. This provides a very flexible environment for you as you learn, because you can experiment inside a code cell and slowly iterate through your project.

It also solves a common problem regarding using Python on a personal computer; installing it can be cumbersome, but these interactive notebooks let you Python away within your web browser.

[Project Jupyter](https://en.wikipedia.org/wiki/Project_Jupyter) is the gold standard for interactive notebooks, but we are using Google's interpretation of Jupyter notebooks through Google's Colaboratory. Notebooks are not limited to Python. [Observable notebooks](http://www.observablehq.com), for example, work with JavaScript.

Interactive notebooks are especially helpful for [Exploratory Data Analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis), because none of the changes are permanent, yet you have a space where you can move quickly to understand your data.

So let's start programming!

In [None]:
# This text is a comment. Anything showing up after a "#" sign is ignored
# by the computer. Colab also colors comments in dark green, so you know
# that any text in dark green will be ignored.

# Code cells like this one have a little play button beside them. You press
# that to execute the code in the cell. Alternatively, you can type
# Shift+Enter/Return to execute the code. Despite all these comments, this
# cell has only one line of code, and it is the print statement below.

# The print() function prints whatever is inside the parentheses to the
# screen, but keep in mind that the text we want printed is surrounded by
# quote marks. Whether you use single or double quotes doesn't matter, but
# you should be consistent.

print("Hello from a code cell!")

I left the code cell empty above so you can try typing comments in it and testing what happens with the `print()` function. Of course you can also edit the code cell above it. Or this text cell.

## (Data) Types in Python

There is no limit to how we can describe and imagine things in Python, but everything has the same basic building blocks. We'll look at these today:

 * Numbers
 * Strings
 * Booleans
 * Functions
 * Lists
 * Dictionaries
 * Classes

### Numbers

Numbers are, well, numbers. Under the hood, Python distinguishes between integers and non-integers (called "floats"), but you can use them pretty interchangeably. Try out some basic math in the code cell below. Type an expression inside a `print()` function.

In [None]:
# As you can see, Colab colors numbers dark green like comments.

print(7 * 6)

Like everything we're going to be working with today, numbers can be assigned to variables. Variables can be named anything, but idiomatic Python ("Pythonic") practice is to use descriptive variable names made up of lowercase words separated by an underscore.

In [None]:
# Let's create a variable, "my_number", and assign the value 7 to it:

my_number = 7
print(my_number * 3)

Variables, once defined in a cell (that is, once the cell is executed), are available in subsequent cells.

In [None]:
# my_number was set above.

print(my_number + 2)
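
The difference between integers and floats mentioned above mostly shows up when you divide. Here is a small sketch; the variable names are made up for illustration and are not used elsewhere in this notebook.

```python
# Integers and floats are different types, even though they mix freely.
whole_number = 7
decimal_number = 7.0

print(type(whole_number))    # <class 'int'>
print(type(decimal_number))  # <class 'float'>

# "/" always produces a float; "//" rounds down to a whole number.
regular_division = whole_number / 2
floor_division = whole_number // 2

print(regular_division)  # 3.5
print(floor_division)    # 3

# Mixing an integer and a float gives a float.
mixed_sum = whole_number + 0.5
print(mixed_sum)  # 7.5
```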

### Strings

Strings are characters (letters, numbers, punctuation, emoji) surrounded by quotes. We used a string in the first code cell in this notebook. You can use some basic math on strings, too, but the results might be a bit unexpected. 

In [None]:
# Strings are represented in red in Colab.

my_string = "Hello, how are you? 😅"

print(my_string * 3)
print(my_string + "2")
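
To make the "unexpected" results a little less surprising: with strings, `*` means repetition and `+` means concatenation (gluing together). A short sketch with a made-up variable, `short_string`, also shows how to convert between strings and numbers when you need real arithmetic.

```python
short_string = "ab"

repeated = short_string * 3   # repetition: the string three times in a row
joined = short_string + "2"   # concatenation: "2" here is a string, not a number

print(repeated)  # ababab
print(joined)    # ab2

# Adding a string and a number directly raises a TypeError, so convert first.
number_as_text = str(2)       # the number 2 becomes the string "2"
text_as_number = int("2")     # the string "2" becomes the number 2

print(short_string + number_as_text)  # ab2
print(text_as_number + 2)             # 4
```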

As of a few years ago, strings in Python can also interpolate the values of variables if they are "f-strings."



In [None]:
# Note the color of the f in the f-string and of the text in the braces.
my_name = "Moacir"
my_greeting = f"Hello, {my_name}"
my_non_f_string_greeting = "Hello, {my_name}"


print(my_greeting)
print(my_non_f_string_greeting)

### Booleans

There are two values for Booleans, `True` and `False` (capitals matter). We might not assign these values to variables, but we do rely on them to do comparisons and to evaluate conditional statements.

In [None]:
# The "==" means "is equal to" and is distinct from "=",
# which is used to assign values to variables.

print(f"`my_number` is {my_number}")
print(f"Is `my_number` equal to 2? {my_number == 2}")
print(f"Is `my_number` less than or equal to 100? {my_number <= 100}")
print(f'Is `my_number` equal to "a number"? {my_number == "a number"}')
# Note I switched to single quotes above. Why?
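
Comparisons produce Booleans, and Booleans can be combined with `and`, `or`, and `not`. The sketch below uses a made-up variable, `sample_number`, and ends with one possible answer to the quote-mark question: a string wrapped in single quotes can contain double quotes without any extra work.

```python
sample_number = 7

is_small = sample_number < 10      # True
is_even = sample_number % 2 == 0   # False, because 7 leaves a remainder

both = is_small and is_even        # True only if both sides are True
either = is_small or is_even       # True if at least one side is True

print(both)         # False
print(either)       # True
print(not is_even)  # True

# Single quotes on the outside let us use double quotes on the inside.
quoted_string = 'She said "hello"'
# The alternative is to escape each inner double quote with a backslash.
escaped_string = "She said \"hello\""

print(quoted_string == escaped_string)  # True
```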

### Functions

Functions are where the action starts to happen. Functions often expect something to be passed to them, one or more parameters. They then take these parameters and do something with them. I'm vague here because that's the point. These are expressive features!

Functions are also a place where we can talk about how whitespace (tabs and spaces) matters in Python. Everything inside the function has to be indented with tabs or spaces. The first line that's back at the normal margin is outside of the function.

We've already used the `print()` function several times. It is built into Python. Similarly, the `len()` function can tell us the length of a name. But we can also create our own.

In [None]:
print(f"`my_name` is {len(my_name)} letters long")

def how_long_is_a_word(word):
  # Note the two-space indent!
  print(f'Your word, "{word}," is {len(word)} letters long')

# Now we "call" the function and pass in an argument.
how_long_is_a_word("a word")

Conditional statements (`if` statements) and loops also use the same kind of indented syntax. We'll get to loops later, but let's incorporate a lot of what we have learned into a tip calculator.

In [None]:
def tip_calculator(total, good_service = True):
  # We are passing in two parameters, "total" and "good_service".
  # We also create a default value for "good_service"
  if good_service:
    total_with_tip = 1.2 * total
  else: 
    total_with_tip = 1.18 * total
  
  # Blank lines don't matter.

  return total_with_tip # this is what the function outputs

# Now we're outside the function and we can call it:

print(f"You owe ${tip_calculator(20, good_service = False)}")
print(f"You owe ${tip_calculator(20)}")
print(f"You owe ${tip_calculator(20, True)}")

# We can use variables and assign return values to variables!
my_total = 40
my_service = 20 > 10
my_total_with_tip = tip_calculator(my_total, good_service = my_service)
print(f"I owe ${my_total_with_tip}")
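
Floats can produce long trails of decimals in results like these, which looks odd for money. One way to tidy that up is a format specifier inside the f-string's braces. This is only a sketch: `tip_with_rate` is a hypothetical variation on the tip calculator that takes the tip rate as a number instead of a Boolean.

```python
def tip_with_rate(total, tip_rate=0.2):
  # Hypothetical variation: the tip rate is a parameter with a default value.
  return total * (1 + tip_rate)

bill = tip_with_rate(20, tip_rate=0.18)

# ":.2f" means "show this number as a float with two decimal places".
formatted_bill = f"You owe ${bill:.2f}"
print(formatted_bill)  # You owe $23.60

# The default tip rate is used when we leave the argument out.
default_bill = f"You owe ${tip_with_rate(20):.2f}"
print(default_bill)    # You owe $24.00
```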

### Lists

So far, we've been working with single things, like a single number or a single string. However, we often want to work with many things at once. One way of arranging information is with a list. A list is what it sounds like: a list of things, separated by commas, inside brackets (`[]`).

Lists can have different types of things inside (numbers, strings, other lists, etc.) or all the same kind of thing. Lists are iterable, so you can do the same transformation on every element in the list.

In a list, order matters, and you can access each member of the list with its index value. The tricky thing about lists, though, is that the first member of the list has an index value of `0`, not `1`. As such, I tend to talk about the "zeroth" member and then the "oneth" and "twoth" members, to underscore the difference.

In [None]:
# Lists are lists of things surrounded by brackets

my_list = [my_name, 2, "tree", 82.4]
my_list_of_numbers = [3, 5, 2, 8, 29, 3.2]


print(my_list[0]) # get the zeroth member of the list.
print(my_list_of_numbers[-1]) # get the last member of the list.


In [None]:
# Lists have lengths

print(len(my_list))
print(len(my_list_of_numbers))
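
Because counting from zero takes some getting used to, here is a small sketch of indexing, slicing, and appending. The list `letters` is made up for illustration.

```python
letters = ["a", "b", "c", "d", "e"]

zeroth = letters[0]    # the first member has index 0
oneth = letters[1]     # the second member has index 1
last = letters[-1]     # negative indexes count from the end

print(zeroth, oneth, last)  # a b e

# A slice takes a range of members: start at index 1, stop before index 4.
middle = letters[1:4]
print(middle)  # ['b', 'c', 'd']

# Lists can grow. append() adds a new member to the end.
letters.append("f")
print(len(letters))  # 6
```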

In [None]:
# You can iterate over a list using a `for` loop
for element in my_list_of_numbers: # Note the colon! We've got to indent
  # "element" is an arbitrary variable that refers to the current list
  # member in the loop
  print(element * 2.1)

In [None]:
# You can also iterate over lists using the very Pythonic 
# "list comprehension" syntax. This is hard to understand
# at first, but it allows for very pithy yet expressive code

my_list_of_squared_numbers = [n * n for n in my_list_of_numbers]
# above, "n" is doing the same work as "element" in the previous cell

print(my_list_of_squared_numbers)
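
If the list comprehension feels opaque, it can help to see it next to the equivalent `for` loop. The sketch below uses a made-up list, `sample_numbers`, and also shows the optional `if` at the end of a comprehension, which filters members.

```python
sample_numbers = [3, 5, 2, 8, 29]

# The long way: start with an empty list and append inside a loop.
squares_from_loop = []
for n in sample_numbers:
  squares_from_loop.append(n * n)

# The same result as a list comprehension.
squares_from_comprehension = [n * n for n in sample_numbers]

print(squares_from_loop)           # [9, 25, 4, 64, 841]
print(squares_from_comprehension)  # [9, 25, 4, 64, 841]

# An "if" at the end keeps only the members that pass the test.
even_squares = [n * n for n in sample_numbers if n % 2 == 0]
print(even_squares)  # [4, 64]
```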

### Dictionaries

Dictionaries are objects that have properties. You define them by surrounding a list of properties with braces (`{}`). Properties take the form of a string that's the key, then a colon, and then the value. The value can be anything.

You can then refer to a property of a dictionary with bracket syntax, like with a list, but using the property key instead of the index.

In [None]:
# Let's define four dictionaries. They may be familiar.

raphael = {
    "name": "Raphael",
    "aliases": ["The Green Defender"],
    "weapons": ["Sai", "Trident", "Laser Gun"],
    "occupation": "Cool But Rude",
    "height": 1.55,
    "weight": 67,
    "bandana": "red"
}
leonardo = {
    "name": "Leonardo",
    "aliases": ["D'Artagnan", "The Leo-meister", "Leo"],
    "weapons": ["Ninjato blades", "Ninja Stars", "Sword", "Twin Katanas"],
    "occupation": "Turtle Leader",
    "height": 1.55,
    "weight": 70,
    "bandana": "blue"    
}
donatello = {
    "name": "Donatello",
    "aliases": ["Donny", "The Donster"],
    "weapons": ["Bō staff", "Spear", "Laser Gun"],
    "occupation": "Inventor (does machines)",
    "height": 1.45,
    "weight": 66,
    "bandana": "purple"    
}
michelangelo = {
    "name": "Michelangelo",
    "aliases": ["Mikey", "Michael J. Angelo"],
    "weapons": ["Nunchaku", "Manriki-gusari", "Turtle Line", "Laser Gun"],
    "occupation": "Pizza Delivery Boy",
    "height": 1.52,
    "weight": 68,
    "bandana": "orange"  
}

# Now let's put all the dictionaries into a list.
turtles = [donatello, leonardo, michelangelo, raphael]

In [None]:
# Now we can iterate over the list and pull out specific properties

for turtle in turtles:
  print(f"{turtle['name']} is {turtle['height']}m tall, and his primary weapon is {turtle['weapons'][0].lower()}.")
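
Dictionaries can also be changed after they are created, and there are a few handy ways to read from them. A minimal sketch, using a made-up `book` dictionary rather than the turtles:

```python
book = {"title": "Think Python", "edition": 2}

# Add a new property by assigning to a key that doesn't exist yet.
book["author"] = "Allen Downey"

# Asking for a missing key with brackets raises a KeyError.
# get() returns a fallback value instead.
publisher = book.get("publisher", "unknown")
print(publisher)  # unknown

# "in" checks whether a key exists.
print("title" in book)      # True
print("publisher" in book)  # False

# items() lets us loop over keys and values together.
for key, value in book.items():
  print(f"{key}: {value}")
```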

### Classes

Classes are a further step in reasoning about things in Python. In the dictionaries example, we had the same basic structure for each of the four dictionaries, as each referred to a different ninja turtle. We can abstract that structure out and create a class specifically for ninja turtles. Then we can add additional functionality for the ninja turtle class.

Classes are defined with the `class` keyword and typically use PascalCasing. They also have a few built-in functions, called "dunders" (because their names begin and end with double underscores), to help describe the class better. If a class has a function, we usually call that function a method.

In [None]:
class NinjaTurtle:
  # The "__init__" dunder is run in the background whenever we make a new
  # instance of the class
  def __init__(self, name, aliases, weapons, occupation, height, weight, bandana):
    self.name = name # note indent!
    self.aliases = aliases
    self.weapons = weapons
    self.occupation = occupation
    self.height = height
    self.weight = weight
    self.bandana = bandana
  # End of indent for "__init__"

  # Let's create a method that gives the turtle's height in inches.
  def height_in_inches(self):
    return self.height * 39.37008

  # And let's create a method that gives the turtle's BMI.
  def bmi(self):
    return self.weight / (self.height * self.height)

# "self" refers to the class instance itself, but we don't have to include
# it when calling these method functions.
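
The role of `self` can be easier to see in a smaller class. In this sketch, `Rectangle` is a hypothetical class that has nothing to do with the turtles. Calling a method on an instance passes that instance in as `self` automatically, which is why we never write it ourselves.

```python
class Rectangle:
  # Hypothetical class, only here to illustrate "self".
  def __init__(self, width, height):
    self.width = width
    self.height = height

  def area(self):
    return self.width * self.height

my_rectangle = Rectangle(2, 3)

# The usual way: Python fills in "self" with my_rectangle for us.
area_via_instance = my_rectangle.area()

# The same call spelled out, passing the instance in by hand.
area_via_class = Rectangle.area(my_rectangle)

print(area_via_instance)  # 6
print(area_via_class)     # 6
```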

In [None]:
# Let's recreate the turtles, but as NinjaTurtles, not dictionaries.

# Order of parameters matters!
raphael_as_ninja_turtle = NinjaTurtle("Raphael", ["The Green Defender"], 
                      ["Sai", "Trident", "Laser Gun"], 
                      "Cool But Rude", 1.55, 67, "red")

# Or we can use keyword arguments:
leonardo_as_ninja_turtle = NinjaTurtle(name="Leonardo", 
                       aliases=["D'Artagnan", "The Leo-meister", "Leo"],
                       weapons=["Ninjato blades", "Ninja Stars", "Sword", "Twin Katanas"],
                       occupation="Turtle Leader",
                       height=1.55, weight=70, bandana="blue")   

donatello_as_ninja_turtle = NinjaTurtle("Donatello", ["Donny", "The Donster"],
                        ["Bō staff", "Spear", "Laser Gun"],
                        "Inventor (does machines)", 1.45, 66, "purple")    
michelangelo_as_ninja_turtle = NinjaTurtle("Michelangelo", ["Mikey", "Michael J. Angelo"], 
                           ["Nunchaku", "Manriki-gusari", "Turtle Line", "Laser Gun"], 
                           "Pizza Delivery Boy", 1.52, 68, "orange")

# Now we can put our NinjaTurtles in the "turtles" list and iterate over them.
turtles = [donatello_as_ninja_turtle, leonardo_as_ninja_turtle, michelangelo_as_ninja_turtle, raphael_as_ninja_turtle]

In [None]:
# What are the turtles' BMIs?

for turtle in turtles:
  print(f"{turtle.name}'s BMI is {turtle.bmi()}")

In [None]:
# We can also reproduce the loop we did above, with more elegant syntax.

for turtle in turtles:
  print(f"{turtle.name} is {turtle.height_in_inches()} in. tall, and his primary weapon is {turtle.weapons[0].lower()}.")

In [None]:
# Note what happens when we try to "print()" a NinjaTurtle:

print(donatello)
print(donatello_as_ninja_turtle)

In [None]:
# Neither is a great experience, but we can make use of dunder methods,
# docstrings, and subclasses to improve on the results.
# 
# This is just a demonstration! We're getting advanced and tickling
# very specific Pythonic itches.

class DocumentedNinjaTurtle(NinjaTurtle):
  """A helpful class for managing ninja turtles"""
  def __str__(self):
    return self.name
  
donatello_as_documented_ninja_turtle = DocumentedNinjaTurtle("Donatello", 
                                                             ["Donny", "The Donster"],
                                                             ["Bō staff", "Spear", "Laser Gun"],
                                                             "Inventor (does machines)", 
                                                             1.45, 66, "purple")

print(donatello_as_documented_ninja_turtle)
print(donatello_as_documented_ninja_turtle.__doc__)

## Writing to Files

Finally, let's quickly look at writing to files. Even though with Colab you are inside the web browser, there's still a file explorer on the left side you can use to upload or download files. This is one way of reading data into your notebook, but for this exercise, we'll write our height and weapons report out to a file we can download and share with Shredder.

In [None]:
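# "a" opens the file in append mode, so re-running this cell adds the lines
# again. Use "w" instead to overwrite the file each time.
with open("height_and_weapons_secrets.txt", "a") as file: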
  for turtle in turtles:
    # "\n" is the symbol to create a new line in Python.
    file.write(f"{turtle.name} is {turtle.height_in_inches()} in. tall, and his primary weapon is {turtle.weapons[0].lower()}.\n")
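
Reading a file back in works much the same way as writing one. The sketch below writes two lines to a made-up file, `example_report.txt`, and then reads them back. It uses `"w"` (write) mode, which replaces the file's contents each time, so running it twice does not duplicate lines.

```python
lines_to_write = ["first line", "second line"]

# "w" mode creates the file, or overwrites it if it already exists.
with open("example_report.txt", "w") as file:
  for line in lines_to_write:
    file.write(f"{line}\n")

# Without a mode, open() defaults to reading.
with open("example_report.txt") as file:
  # splitlines() breaks the text into a list and drops the "\n" characters.
  lines_read = file.read().splitlines()

print(lines_read)  # ['first line', 'second line']
```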

## Conclusion

There's a lot we didn't do today. We didn't use a single `import` statement, meaning we only used built-in Python code. There's a lot we skipped, most notably error checking and dealing with errors. Python is pretty good at describing errors, but it's annoying to write code that causes them intentionally. Better to have them arise organically in a group of beginners. 

At intermediate levels, you can start [handling the errors](https://docs.python.org/3/tutorial/errors.html). If we tried to create a `NinjaTurtle` with a height of "🥰," Python would let us, but when we tried to calculate that turtle's BMI, the program would crash, because you can't multiply "🥰" times itself. You also can't convert "🥰" to inches. A more robust class definition would check to make sure that the value given for height is a number, for example, and raise an exception that alerts us that "A `NinjaTurtle`'s `height` must be represented by a number."
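
As a rough sketch of what that more robust definition might look like, here is a pared-down class that checks the height when a turtle is created. `CheckedNinjaTurtle` is a hypothetical name, and it keeps only three of the original properties to stay short. The `try`/`except` at the end shows one way to handle the error instead of letting the program crash.

```python
class CheckedNinjaTurtle:
  """A pared-down, hypothetical turtle class that validates its height."""
  def __init__(self, name, height, weight):
    # isinstance() checks whether a value is one of the given types.
    if not isinstance(height, (int, float)):
      raise TypeError("A NinjaTurtle's height must be represented by a number.")
    self.name = name
    self.height = height
    self.weight = weight

  def bmi(self):
    return self.weight / (self.height * self.height)

# A valid turtle works as before.
checked_donatello = CheckedNinjaTurtle("Donatello", 1.45, 66)
print(f"{checked_donatello.bmi():.2f}")  # 31.39

# An invalid height is caught right away, with a helpful message.
try:
  CheckedNinjaTurtle("Michelangelo", "🥰", 68)
except TypeError as error:
  error_message = str(error)
  print(f"Could not create turtle: {error_message}")
```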

Hopefully this workshop has shown you that Python is accessible, especially if you like puzzles and problem solving. You can also get started doing things with Python with little training, picking things up as you go and improving every time you sit down to code.

At the same time, it's important to underscore that, outside of list comprehensions and dunder methods, everything here is quickly portable to other languages. It's the concepts that matter, and it's the concepts that will occupy your imagination.

In [None]:
# See?

"🥰" * "🥰"

In [None]:
crashing_michelangelo = NinjaTurtle("Michelangelo", ["Mikey", "Michael J. Angelo"], 
                           ["Nunchaku", "Manriki-gusari", "Turtle Line", "Laser Gun"], 
                           "Pizza Delivery Boy", "🥰", 68, "orange")
print(crashing_michelangelo.height_in_inches())