<a href="https://colab.research.google.com/github/columbia-data-club/meetings/blob/main/WIS/2023/1_1_Introduction_to_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![A blue background with the Python logo and the words Data Club on it](https://raw.githubusercontent.com/columbia-data-club/meetings/main/assets/images/data-club-python.png)

## Introduction to Python

December 1, 2023

by [Moacir P. de Sá Pereira](https://moacir.com) for [Women in STEM @ SIPA](https://sipa.campusgroups.com/wis/home/), modified from [Columbia Data Club](https://github.com/columbia-data-club/) notebooks.

This notebook accompanies a ~70-minute presentation that introduces Python to people who are new to both Python and programming.


## Python Resources



Because our time today is so short, it's important to emphasize that there are many resources both on campus and online for people taking their first steps programming.

- Columbia
  - [**Research Data Services**](https://library.columbia.edu/services/research-data-services/) This is me and my colleagues. We provide various research data-related consultation services for free to Columbia researchers. That’s why I’m here. Reach out to us by emailing data@library.columbia.edu
  - [**Data Club**](https://library.columbia.edu/services/research-data-services/data-club.html). RDS hosts Data Club, which meets a few times every semester to learn more about data, usually with Python. [Join the email list](https://listserv.cuit.columbia.edu/scripts/wa.exe?SUBED1=CUL-DATA-CLUB&A=1)
  - [**Foundations for Research Computing**](https://rcfoundations.research.columbia.edu/) Foundations hosts two-day intensive workshops on Python, R, and Git a few times each year. The material is based on [Software Carpentry’s “Plotting and Programming with Python”](https://swcarpentry.github.io/python-novice-gapminder/), and you can learn on your own that way, too.
  - “[**Computing in Context**](https://www.cs.columbia.edu/2016/computing-in-context-a-computer-science-course-for-liberal-arts-majors-expands-with-new-sipa-track/).” This is a course for undergraduates and SIPA students. It is typically taught in the Fall. Check [the Directory of Courses](https://doc.search.columbia.edu/classes/computing+in+context?instr=&name=&days=&semes=&hour=&moi=) for availability.
- Online
  - “[Plotting and Programming in Python](https://swcarpentry.github.io/python-novice-gapminder/).” Take the intensive Foundations Python workshop at your own speed.
  - [_Think Python 2e_](https://greenteapress.com/wp/think-python-2e/) is a free Python textbook that approaches the language thoughtfully and is written for learners with no programming experience.
  - [Women who Code Python](https://www.womenwhocode.com/python) is an online forum with webinars and networking opportunities for women who use Python.
  - [Data Umbrella](https://dataumbrella.substack.com/) is a community for underrepresented persons in data science. While not Python-specific, they often co-host events with [NYC PyLadies](https://www.meetup.com/NYC-PyLadies/events/), which is Python-specific.

## What Is This Webpage You Are Using?



This is an interactive notebook. It lets you write prose (as I have done so far) and also code. That is, this notebook is made up of a bunch of blocks (called “cells”) going down the page. Each cell can hold either text, like this one, or code, where you can type in Python. This provides a very flexible environment for you as you learn, because you can experiment inside a code cell and slowly iterate through your project.

It also solves a common problem regarding using Python on a personal computer; installing it can be cumbersome, but these interactive notebooks let you Python away within your web browser.

[Project Jupyter](https://en.wikipedia.org/wiki/Project_Jupyter) is the gold standard for interactive notebooks, but we are using Google's interpretation of Jupyter notebooks through Google's Colaboratory. Notebooks are not limited to Python. [Observable notebooks](http://www.observablehq.com), for example, work with JavaScript.

Interactive notebooks are especially helpful for [Exploratory Data Analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis), because none of the changes are permanent, yet you have a space where you can move quickly to understand your data.

So let's start programming!

In [None]:
# This text is a comment. Anything showing up after a "#" sign is ignored
# by the computer. Colab also colors comments in dark green, so you know
# that any text in dark green will be ignored.

# Code cells like this one have a little play button beside them. You press
# that to execute the code in the cell. Alternatively, you can type
# Shift+Enter/Return to execute the code. Despite all these comments, this
# cell has only one line of code, and it is the print statement below.

# The print() function prints whatever is inside the parentheses to the
# screen, but keep in mind that the text we want printed is surrounded by
# quote marks. Whether you use single or double quotes doesn't matter, but
# you should be consistent.

print("Hello from a code cell!")

I left the code cell empty above so you can try typing comments in it and testing what happens with the `print()` function. Of course you can also edit the code cell above it. Or this text cell.

## (Data) Types in Python

There is no limit to how we can describe and imagine things in Python, but everything has the same basic building blocks. We'll look at these today:

 * Numbers
 * Strings
 * Booleans
 * Functions
 * Lists
 * Dictionaries

### Numbers

Numbers are, well, numbers. Under the hood, Python distinguishes between integers and non-integers (called "floats"), but you can use them pretty interchangeably. Try out some basic math in the code cell below. Type an expression inside a `print()` function.

In [1]:
# As you can see, Colab colors numbers dark green like comments.

print(7 * 6)

42
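
To make the integer/float distinction mentioned above a little more concrete, here is a minimal sketch. The variable names (`whole`, `half`, `floor_half`) are made up for this illustration and are not used elsewhere in the notebook.

```python
whole = 7            # an integer (int)
half = 7 / 2         # "/" always produces a float
floor_half = 7 // 2  # "//" divides and rounds down to a whole number

print(type(whole))        # <class 'int'>
print(half)               # 3.5
print(floor_half)         # 3
print(type(whole * 1.0))  # <class 'float'>
```

Mixing an integer with a float gives a float, which is why the two can usually be used interchangeably.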


Like everything we're going to be working with today, numbers can be assigned to variables. Variables can be named almost anything (a name cannot contain spaces or start with a digit), but idiomatic Python ("Pythonic") practice is to use descriptive variable names made up of lowercase words separated by underscores.

In [2]:
# Let's create a variable, "my_number", and assign the value 7 to it:

my_number = 7
print(my_number * 3)

21


Variables, once defined in a cell (that is, once the cell is executed), are available in subsequent cells.

In [3]:
# my_number was set above.

print(my_number + 2)

9


### Strings

Strings are characters (letters, numbers, punctuation, emoji) surrounded by quotes. We used a string in the first code cell in this notebook. You can use some basic math on strings, too, but the results might be a bit unexpected.

In [4]:
# Strings are represented in red in Colab.

my_string = "Hello, how are you? 😅"

print(my_string * 3)
print(my_string + "2")

Hello, how are you? 😅Hello, how are you? 😅Hello, how are you? 😅
Hello, how are you? 😅2
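
Notice that the cell above adds the string `"2"`, not the number `2`. As a rough sketch of why: `+` only joins strings to other strings, so a number has to be converted first. The names `label` and `count` below are invented for this example.

```python
label = "tree"
count = 2

# str() turns a number into a string so it can be joined with "+".
print(label + str(count))  # tree2

# int() goes the other way, turning a string of digits into a number.
print(int("40") + count)   # 42

# "*" with a string and an integer repeats the string.
print(label * count)       # treetree
```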


Since Python 3.6, strings can also interpolate the values of variables if they are "f-strings," which are written with an `f` directly before the opening quote.



In [5]:
# Note the color of the f in the f-string and of the text in the braces.
my_name = "Moacir"
my_greeting = f"Hello, {my_name}"
my_non_f_string_greeting = "Hello, {my_name}"


print(my_greeting)
print(my_non_f_string_greeting)

Hello, Moacir
Hello, {my_name}


### Booleans

There are two values for Booleans, `True` and `False` (capitals matter). We might not assign these values to variables, but we do rely on them to do comparisons and to evaluate conditional statements.

In [6]:
# The "==" means "is equal to" and is distinct from "=",
# which is used to assign values to variables.

print(f"`my_number` is {my_number}")
print(f"Is `my_number` equal to 2? {my_number == 2}")
print(f"Is `my_number` less than or equal to 100? {my_number <= 100}")
print(f'Is `my_number` equal to "a number"? {my_number == "a number"}')
# Note I switched to single quotes above. Why?

`my_number` is 7
Is `my_number` equal to 2? False
Is `my_number` less than or equal to 100? True
Is `my_number` equal to "a number"? False
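
Comparisons like the ones above produce Booleans, and Booleans can be combined with `and`, `or`, and `not`. The following is only a sketch; the temperature value and the thresholds are made up for illustration.

```python
temperature = 72  # hypothetical reading, in degrees Fahrenheit

is_warm = temperature > 65  # a comparison produces True or False
is_hot = temperature > 85

print(is_warm)             # True
print(is_warm and is_hot)  # False, because both sides must be True
print(is_warm or is_hot)   # True, because at least one side is True
print(not is_hot)          # True, because "not" flips the value
```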


### Functions



Functions are where the action starts to happen. Functions often expect something to be passed to them, one or more parameters. They then take these parameters and do something with them. I'm vague here because that's the point. These are expressive features!

Functions are also a place where we can talk about how whitespace (tabs and spaces) matters in Python. Everything inside the function has to be indented consistently with tabs or spaces. The first line that's back at the normal margin exists outside of the function.

We've already used the `print()` function several times. It is built into Python. Similarly, the `len()` function can tell us the length of a string, such as a name. But we can also create our own.

In [7]:
print(f"`my_name` is {len(my_name)} letters long")

def how_long_is_a_word(word):
  # Note the two-space indent!
  print(f'Your word, "{word}," is {len(word)} letters long')

# Now we call the function and pass in an argument.
how_long_is_a_word("a word")

`my_name` is 6 letters long
Your word, "a word," is 6 letters long
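
As a small sketch of the indentation rule described earlier, the hypothetical function below (`describe_word` is not part of the original notebook) shows which lines belong to the function and which do not. It also hands its result back with `return` rather than printing it.

```python
def describe_word(word):
  # These lines are indented, so they belong to the function.
  shouted = word.upper()
  return f"{shouted} has {len(word)} characters"

# This line is back at the left margin, so it is outside the function.
description = describe_word("python")
print(description)  # PYTHON has 6 characters
```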


Conditional statements (`if` statements) and loops also use the same kind of indented syntax. We'll get to loops later, but let's incorporate a lot of what we have learned into a tip calculator.

In [8]:
def tip_calculator(total, good_service = True):
  # We are passing in two parameters, "total" and "good_service".
  # We also create a default value for "good_service"
  if good_service:
    total_with_tip = 1.2 * total
  else:
    total_with_tip = 1.18 * total

  # Blank lines don't matter.

  return total_with_tip # this is what the function outputs

# Now we're outside the function and we can call it:

print(f"You owe ${tip_calculator(20, good_service = False)}")
print(f"You owe ${tip_calculator(20)}")
print(f"You owe ${tip_calculator(20, True)}")

# We can use variables and assign return values to variables!
my_total = 40
my_service = 20 > 10
my_total_with_tip = tip_calculator(my_total, good_service = my_service)
print(f"I owe ${my_total_with_tip}")

You owe $23.599999999999998
You owe $24.0
You owe $24.0
I owe $48.0
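
The first result above, `$23.599999999999998`, is a side effect of how computers store floats. One common way to tidy this up, sketched here with a compact copy of the tip calculator, is to add a format specifier such as `:.2f` inside the f-string braces, which displays the number with two decimal places.

```python
def tip_calculator(total, good_service=True):
  # Same rates as the cell above: 20% for good service, 18% otherwise.
  if good_service:
    return 1.2 * total
  return 1.18 * total

amount = tip_calculator(20, good_service=False)

print(amount)                    # 23.599999999999998
print(f"You owe ${amount:.2f}")  # You owe $23.60
print(round(amount, 2))          # 23.6
```

The format specifier only changes how the number is displayed, while `round()` returns a new, rounded number.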


### Lists



So far, we've been working with single things, like a single number or a single string. However, we often want to work with many things at once. One way of arranging information is with a list. A list is what it sounds like: a list of things, separated by commas, inside brackets (`[]`).

Lists can have different types of things inside (numbers, strings, other lists, etc.) or all the same kind of thing. Lists are iterable, so you can do the same transformation on every element in the list.

In a list, order matters, and you can access each member of the list with its index value. The tricky thing about lists, though, is that the first member of the list has an index value of `0`, not `1`. As such, I tend to talk about the "zeroth" member and then the "oneth" and "twoth" members, to underscore the difference.

In [9]:
# Lists are lists of things surrounded by brackets

my_list = [my_name, 2, "tree", 82.4]
my_list_of_numbers = [3, 5, 2, 8, 29, 3.2]


print(my_list[0]) # get the zeroth member of the list.
print(my_list_of_numbers[-1]) # get the last member of the list.


Moacir
3.2
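
To practice the zero-based counting described above, here is a short sketch with a made-up list (`fruits` is not used elsewhere in the notebook). It also shows `append()`, a built-in list method that adds a member to the end of a list.

```python
fruits = ["apple", "banana", "cherry", "date"]

print(fruits[0])                # apple  (the "zeroth" member)
print(fruits[1])                # banana (the "oneth" member)
print(fruits[-1])               # date   (negative indexes count from the end)
print(fruits[len(fruits) - 1])  # date   (the last index is the length minus 1)

fruits.append("elderberry")     # add a new member to the end
print(len(fruits))              # 5
```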


In [10]:
# Lists have lengths

print(len(my_list))
print(len(my_list_of_numbers))

4
6


In [11]:
# You can iterate over a list using a `for` loop
for element in my_list_of_numbers: # Note the colon! We've got to indent
  # "element" is an arbitrary variable that refers to the current list
  # member in the loop
  print(element * 2.1)

6.300000000000001
10.5
4.2
16.8
60.900000000000006
6.720000000000001


In [12]:
# You can also iterate over lists using the very Pythonic
# "list comprehension" syntax. This is hard to understand
# at first, but it allows for very pithy yet expressive code

my_list_of_squared_numbers = [n * n for n in my_list_of_numbers]
# above, "n" is doing the same work as "element" in the previous cell

print(my_list_of_squared_numbers)

[9, 25, 4, 64, 841, 10.240000000000002]
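
If the list comprehension feels opaque, it may help to see it next to the `for` loop it replaces. This is a sketch with a shorter, made-up list; both approaches build the same result.

```python
numbers = [3, 5, 2, 8]

# The long way: start with an empty list and append inside a for loop.
squares_from_loop = []
for n in numbers:
  squares_from_loop.append(n * n)

# The short way: a list comprehension does the same work in one line.
squares_from_comprehension = [n * n for n in numbers]

print(squares_from_loop)                                # [9, 25, 4, 64]
print(squares_from_loop == squares_from_comprehension)  # True
```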


### Dictionaries

Dictionaries are objects that have properties. You define them by surrounding a list of properties with braces (`{}`). Each property takes the form of a key (usually a string), then a colon, and then the value. The value can be anything, including a list or another dictionary.

You can then refer to a property of a dictionary with bracket syntax, like with a list, but using the property key instead of the index.
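
Before turning to a larger example, here is a minimal sketch of the bracket syntax. The `course` dictionary and its values are invented for illustration.

```python
course = {
  "name": "Introduction to Python",
  "minutes": 70,
  "topics": ["numbers", "strings", "lists"]
}

print(course["name"])       # Introduction to Python
print(course["topics"][0])  # numbers (a list stored inside the dictionary)

course["minutes"] = 75      # change the value of an existing property
course["level"] = "beginner"  # add a brand-new property

print(course["minutes"])    # 75
print(len(course))          # 4 (the number of properties)
```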

In [21]:
# split() breaks a string into a list wherever the separator (here, ", ")
# appears. The resulting list is reused as an "areas" value in the next cell.
a = "education, labor markets, economic empowerment and entrepreneurship with a gender lens, gender-based violence interventions, design and evaluation of education, social protection, entrepreneurship and income generation programs, survey design, econometric techniques with mixed methods, Big Data analysis with machine learning, applied research and consulting for governments, foundations, international and grassroots organizations"
a.split(", ")

['education',
 'labor markets',
 'economic empowerment and entrepreneurship with a gender lens',
 'gender-based violence interventions',
 'design and evaluation of education',
 'social protection',
 'entrepreneurship and income generation programs',
 'survey design',
 'econometric techniques with mixed methods',
 'Big Data analysis with machine learning',
 'applied research and consulting for governments',
 'foundations',
 'international and grassroots organizations']

In [23]:
# Let's define four dictionaries.

mitts = {
  "name": "Tamar Mitts",
  "title": "Assistant Professor of International and Public Affairs",
  "areas": [],
  "on_leave": True
}

ohalloran = {
  "name": "Sharyn O'Halloran",
  "title": "George Blumenthal Professor of Political Economics and Professor of International and Public Affairs",
  "areas": ["Financial regulation and risk analysis", "big data and computational social sciences"],
  "on_leave": True
}

oliver = {
  "name": "Nuria Oliver",
  "title": "Adjunct Professor of International and Public Affairs",
  "areas": ["Artificial Intelligence", "Data Science for Social Good", "Human-computer Interaction"],
  "on_leave": False
}

martinez_restrepo = {
  "name": "Susana Martínez-Restrepo",
  "title": "Adjunct Assistant Professor of International and Public Affairs",
  "areas": ['education', 'labor markets', 'economic empowerment and entrepreneurship with a gender lens',
    'gender-based violence interventions', 'design and evaluation of education',
    'social protection', 'entrepreneurship and income generation programs',
    'survey design', 'econometric techniques with mixed methods',
    'Big Data analysis with machine learning',
    'applied research and consulting for governments', 'foundations',
    'international and grassroots organizations'],
  "on_leave": False
}

# Now let's put all the dictionaries into a list.
faculty = [martinez_restrepo, oliver, ohalloran, mitts]


In [26]:
# Now we can iterate over the list and pull out specific properties

for professor in faculty:
  print(f"{professor['name']} is {professor['title']}.")

Susana Martínez-Restrepo is Adjunct Assistant Professor of International and Public Affairs.
Nuria Oliver is Adjunct Professor of International and Public Affairs.
Sharyn O'Halloran is George Blumenthal Professor of Political Economics and Professor of International and Public Affairs.
Tamar Mitts is Assistant Professor of International and Public Affairs.


It is this ability to iterate over a list of dictionaries that really begins to unlock the power of computation. Say we want to see which professors are not on leave this year.

In [27]:
for professor in faculty:
  if not professor["on_leave"]:
    print(professor["name"])

Susana Martínez-Restrepo
Nuria Oliver


_Terser!_

In [28]:
print([professor["name"] for professor in faculty if not professor["on_leave"]])

['Susana Martínez-Restrepo', 'Nuria Oliver']


Say we want to find out which professors work on data, but we also want to know if they are on leave.

In [29]:
for professor in faculty:
  # Assume the professor does not work on data until we find evidence.
  works_on_data = False
  for interest in professor["areas"]:
    # "in" checks whether one string appears inside another.
    if ("data" in interest) or ("Data" in interest):
      works_on_data = True
  if works_on_data:
    leave_status = "is not"
    if professor["on_leave"]:
      leave_status = "is"
    print(f"{professor['name']} works on data and {leave_status} on leave this year.")


Susana Martínez-Restrepo works on data and is not on leave this year.
Nuria Oliver works on data and is not on leave this year.
Sharyn O'Halloran works on data and is on leave this year.


Let's write form letters to professors who work on data and are not on leave!

In [33]:
for professor in faculty:
  works_on_data = False
  for interest in professor["areas"]:
    if ("data" in interest) or ("Data" in interest):
      works_on_data = True
  if works_on_data and not professor["on_leave"]:
    # Split the name on spaces and take the last piece as the last name.
    lastname = professor["name"].split(" ")[-1]
    print(f"Dear Professor {lastname},\n\nI am a very good student and want to learn data.\n\nThank you.\n\n---")


Dear Professor Martínez-Restrepo,

I am a very good student and want to learn data.

Thank you.

---
Dear Professor Oliver,

I am a very good student and want to learn data.

Thank you.

---


## Conclusion

There's a lot we didn't do in this module. We didn't use a single `import` statement, meaning we only used built-in Python code. There's a lot we skipped, most notably error checking and dealing with errors. Python is pretty good at describing errors, but it's annoying to write code that causes them intentionally. Better to have them arise organically in a group of beginners.

At intermediate levels, you can start [handling the errors](https://docs.python.org/3/tutorial/errors.html).
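
As a small taste of what handling errors can look like, here is a sketch using Python's `try` and `except` keywords. The function `to_number` is a hypothetical helper written for this illustration. It relies on the fact that the built-in `int()` raises a `ValueError` when given text that is not a whole number.

```python
def to_number(text):
  try:
    # int() converts strings like "40" into integers.
    return int(text)
  except ValueError:
    # If the conversion fails, return None instead of stopping the program.
    return None

print(to_number("40"))     # 40
print(to_number("forty"))  # None
```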

Hopefully this introduction has shown you that Python is accessible, especially if you like puzzles and problem solving. You can also get started doing things with Python with little training, picking things up as you go and improving every time you sit down to code.

At the same time, it's important to underscore that, outside of list comprehensions and other things like dunder methods, everything here is quickly portable to other languages. It's the concepts that matter, and it's the concepts that will occupy your imagination.