# Worksheet 25: Introduction to Python

Jupyter notebooks are a convenient way to organize our code alongside explanatory text!

Python is a versatile programming language used widely in data science, web development, automation, and more.

In this notebook, we will discuss:
- Basic Python syntax and operations.
- Introduction to NumPy for numerical computing.
- Introduction to Pandas for data structures.

### 1. Set up

In [13]:
# IMPORTANT
# Running this chunk lets you have multiple outputs from a single chunk; run it first!
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

We will use some packages today but will load them as we learn!

### 2. Basics in Python

Let's first talk about different types of objects and assignments:

In [14]:
# Create an object and print
two = 2
print(two)

2


In [15]:
# Create two objects at once and print
one, four = two - 1, two + two
print(one)
print(four)

1
4


In [16]:
# Create a list of numbers
one2five = [1,2,3,4,5]
print(one2five)

[1, 2, 3, 4, 5]


In [17]:
# Create a list of numbers from 1 to 7, by 2
x = list(range(1,7,2))
print(x)
# anything you notice?

[1, 3, 5]
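One thing to notice above is that 7 is missing: `range(start, stop, step)` includes `start` but stops before `stop`. A minimal sketch comparing two calls (the names `up_to_5` and `up_to_7` are only for illustration):

```python
# range(start, stop, step) includes start but excludes stop
up_to_5 = list(range(1, 7, 2))  # 7 is the stop value, so it is left out
up_to_7 = list(range(1, 8, 2))  # raising stop to 8 lets 7 in

print(up_to_5)  # [1, 3, 5]
print(up_to_7)  # [1, 3, 5, 7]
```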


Python also has some basic built-in functions:

In [18]:
# Some functions that can be convenient
len(x)
type(x)
min(x)
max(x)

3

list

1

5

In [19]:
# What about the mean?
mean(x)

NameError: name 'mean' is not defined
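The error above occurs because base Python has no built-in `mean` function. As a rough sketch, here are two ways to get a mean without installing anything, assuming we are happy to use the standard-library `statistics` module (the names `manual_mean` and `library_mean` are illustrative):

```python
from statistics import mean  # part of the standard library, but not a built-in

x = list(range(1, 7, 2))  # same list as above: [1, 3, 5]

# Option 1: compute the mean by hand from built-in functions
manual_mean = sum(x) / len(x)

# Option 2: use the standard-library function
library_mean = mean(x)

print(manual_mean)
print(library_mean)
```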

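We will work with five main data types: integers (`int`), floats (`float`), strings (`str`), Booleans (`bool`), and lists (`list`):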

In [None]:
age_in_years = 35
type(age_in_years)

int

In [None]:
previous_evals = 4.4
type(previous_evals)

float

In [None]:
course = "SDS 322E"
type(course)

str

In [None]:
enjoyed_course = True
type(enjoyed_course)

bool

In [20]:
my_instructor = [age_in_years, previous_evals, course, enjoyed_course]
type(my_instructor)

list

Indexing in Python is done using `[]`. Because Python begins counting at 0, the first element is at position 0. In general, the nth element is located at position n−1.

In [21]:
my_instructor[0]

35

#### Try it! Index the course in `my_instructor`.

In [28]:
# Write code here
my_instructor[2]

'SDS 322E'
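To make the counting rule concrete, here is a small sketch using a copy of `my_instructor`. It also shows two things not covered above, offered only as an aside: negative indices count back from the end, and slices include the start position but exclude the stop position.

```python
my_instructor = [35, 4.4, "SDS 322E", True]

first = my_instructor[0]        # 1st element lives at position 0
third = my_instructor[2]        # 3rd element lives at position 3 - 1 = 2
last = my_instructor[-1]        # negative indices count back from the end
first_two = my_instructor[0:2]  # positions 0 and 1; position 2 is excluded

print(first, third, last, first_two)
```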

### 3. Working with NumPy

The package `NumPy` has many built-in functions that we can use when working with data structures. When importing a package, we usually give it a short name. Then we will call functions from this package using the abbreviation:

In [23]:
# Import a new package
import numpy as np

In [24]:
# Create an array (vector)
a = np.array(range(10)) # array is from the numpy package

# take a look
a

# versus
print(a)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

[0 1 2 3 4 5 6 7 8 9]


In [25]:
# Operations on elements of arrays
a*2 # multiply by 2
a**2 # square

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
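These element-wise operations are a feature of NumPy arrays, not of plain lists. A short sketch of the difference (the variable names are illustrative): multiplying a list by 2 repeats it, while multiplying an array by 2 doubles each element.

```python
import numpy as np

numbers_list = [0, 1, 2]
numbers_array = np.array(numbers_list)

doubled_list = numbers_list * 2    # repeats the list: [0, 1, 2, 0, 1, 2]
doubled_array = numbers_array * 2  # multiplies each element: [0 2 4]

print(doubled_list)
print(doubled_array)
```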

In [26]:
# Common operations on arrays
# sum of elements
np.sum(a)

# mean of elements
np.mean(a)

# standard deviation of the elements
np.std(a)

np.int64(45)

np.float64(4.5)

np.float64(2.8722813232690143)
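Note that `np.std` divides by n by default (the population standard deviation). If the sample standard deviation (dividing by n − 1) is what you want, the `ddof` argument controls this. A minimal sketch, with `population_sd` and `sample_sd` as illustrative names:

```python
import numpy as np

a = np.array(range(10))

population_sd = np.std(a)      # default ddof=0: divide by n
sample_sd = np.std(a, ddof=1)  # ddof=1: divide by n - 1

print(population_sd)
print(sample_sd)
```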

#### Try it! In a team of 4, create an array called `cook` that contains `True` or `False` values to represent whether each member cooked for Thanksgiving. Find the proportion of members who did cook.

In [31]:
# Write code here
cook = np.array([False, True, False, False])
float(np.mean(cook))

0.25
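Taking the mean works here because NumPy treats `True` as 1 and `False` as 0, so the mean of a Boolean array is the proportion of `True` values. A small sketch spelling this out (`n_cooked` and `proportion` are illustrative names):

```python
import numpy as np

cook = np.array([False, True, False, False])

n_cooked = int(np.sum(cook))       # True counts as 1, False as 0
proportion = float(np.mean(cook))  # sum divided by the number of elements

print(n_cooked, "out of", len(cook))
print(proportion)
```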

### 4. Working with pandas

Objects in `pandas` can be thought of as enhanced versions of `numpy` structured arrays in which the rows and columns are identified with labels rather than integer indices.

In [32]:
# Import a new package
import pandas as pd

In [33]:
# Create a data frame
df = pd.DataFrame([{'a': 4.4, 'b': True}, {'a': 3, 'b': False}])

# look at the data frame
df

# Index the data frame
df['a']

|   | a | b |
|---|---|---|
| 0 | 4.4 | True |
| 1 | 3.0 | False |


0    4.4
1    3.0
Name: a, dtype: float64
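Since rows and columns are identified by labels, there are a few common ways to pull values out of a data frame. The sketch below rebuilds the same `df` and shows three of them; the names `col_a`, `col_a_frame`, and `first_a` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([{'a': 4.4, 'b': True}, {'a': 3, 'b': False}])

col_a = df['a']           # one column label returns a Series
col_a_frame = df[['a']]   # a list of column labels returns a DataFrame
first_a = df.loc[0, 'a']  # one cell, by row label and column label

print(type(col_a).__name__)
print(type(col_a_frame).__name__)
print(first_a)
```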

#### Try it! In a team of 4, create a data frame called `thanks`, including: a column called `name` with the name of each member, a column called `cook` to indicate if that member cooked or not for Thanksgiving, and another column called `sleep` with each member's longest time of sleep during the break.

In [None]:
# Write code here
thanks = pd.DataFrame([{'name': 'Connor', 'cook': False}, 
                       {'name': 'Jeff', 'cook': True}, 
                       {'name': 'Bill', 'cook': False}, 
                       {'name': 'Joe', 'cook': False}])
thanks

|   | name | cook |
|---|------|------|
| 0 | Connor | False |
| 1 | Jeff | True |
| 2 | Bill | False |
| 3 | Joe | False |
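The exercise also asks for a `sleep` column. One way to add a column to an existing data frame is to assign a list with one value per row. The sketch below uses a placeholder team: the member names and the hours of sleep are invented for illustration and are not real responses. It builds the frame from a dictionary of columns, which is an alternative to the list-of-rows style used above.

```python
import pandas as pd

# Placeholder team: names and values are invented for illustration only
thanks = pd.DataFrame({
    'name': ['Member 1', 'Member 2', 'Member 3', 'Member 4'],
    'cook': [False, True, False, False],
})

# Add a new column by assigning a list with one value per row
thanks['sleep'] = [8.0, 9.5, 7.0, 10.0]  # hypothetical hours of sleep

prop_cooked = thanks['cook'].mean()    # proportion of members who cooked
longest_sleep = thanks['sleep'].max()  # longest sleep in the team

print(thanks)
print(prop_cooked, longest_sleep)
```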


Next, we will learn more about pandas through data wrangling.