# Intro to Data Science - Python Basics

This Jupyter Notebook accompanies the Data Science lectures for the UChicago Pathways in Economics program for high school students. In this notebook we will cover the basics of programming in Python.
The topics include:

- Variable assignment
- Commenting
- Importing packages, NumPy
- Calculator functions
- Booleans
- Functions
- Pandas, Matplotlib

These lectures are inspired by the QuantEcon DataScience notebooks: https://datascience.quantecon.org/

In [2]:
x = 5

## 1. Variable Assignment

Variable assignment associates a value with a variable. We can assign many different kinds of values.

Assign your name to the variable `myname`.

In [3]:
print(x)

5


In [4]:
# assign the value `5` to the variable `x`
x = 5

In [5]:
# print out the value associated with `x`
print(x)

5


In [6]:
myname = "Alex"

In [7]:
print(myname)

Alex


In [8]:
myname # write the variable name here to output the value

'Alex'

We can overwrite a variable by assigning it a new value.

In [9]:
myname = "Lebron James"

In [10]:
# Print again to see that the value has changed
print(myname)

Lebron James
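
Overwriting also works when the new value depends on the old one. The short sketch below uses a made-up variable, `score`, which is not part of the lecture. Python first evaluates the right-hand side with the old value and then stores the result under the same name.

```python
# `score` is a hypothetical variable used only for this illustration.
score = 10
print(score)

# The right-hand side is computed first (10 + 5), then stored back in `score`.
score = score + 5
print(score)

# A variable can also be overwritten with a different kind of value.
score = "fifteen"
print(score)
```

The same pattern, `variable = variable + something`, shows up again later when we add up numbers in a loop.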


### Comments

Code comments are short notes that you leave for future readers of your code (usually yourself). 

Comments explain what the code does.

In [11]:
firstname = "Lebron"     # Assign the variable firstname
lastname  = "James"      # Assign the variable lastname

# Combine the firstname, a space, and lastname 
name = firstname + " " + lastname

# Print out the full name
print(name)

Lebron James


### Quiz 1
---
1. Assign your first name to the variable `firstname`.
2. Assign your last name to the variable `lastname`.
3. Combine your first name, a space, and your last name into one variable `name`.
4. Use the function `len` to find out the number of characters in the variable `name` (the space counts too).
    - The way you use `len` is similar to the way we used `print`.
5. Assign the result of part (4) to a variable named `num_letters`.

### Example Answers 1

In [12]:
num_letters = len(name)
print(num_letters)

12


## 2. Packages

Packages (also called libraries) are collections of tools bundled together. Some are massive open-source projects maintained by many developers.

- `numpy` is a package with many tools for working with vectors and matrices. 

- `pandas` is a package for data manipulation and analysis.

- `matplotlib` is a library for making plots and data visualization.
---

Load packages using the `import` statement.


In [14]:
import numpy as np

Access functions and objects from the package using the following syntax. 

`package.function`

In [16]:
np.log(98)        # Access the function log from the package numpy.

4.584967478670572
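
The same `package.function` syntax works for any package, and it also gives access to objects such as constants. As a small sketch, the example below uses `math`, a package that ships with Python. Using `math` here is our choice for illustration and is not part of the lecture, which uses `numpy`.

```python
import math         # load the package under its own name
import math as m    # load the same package under a shorter alias

print(math.pi)        # an object (a constant) stored in the package
print(math.sqrt(16))  # a function from the package
print(m.sqrt(16))     # the same function, reached through the alias
```

The alias after `as` is only a nickname. `np` for `numpy` is a common convention, but any valid name would work.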

### Python can also function as a calculator

In [23]:
# Assign variables
a = 4
b = 2

print(a)
print(b)

4
2


In [18]:
# Do arithmetic
print("a + b is", a + b) # addition
print("a - b is", a - b) # subtraction
print("a * b is", a * b) # multiplication
print("a / b is", a / b) # division
print("a ** b is", a**b) # exponent

print("\nPython follows PEMDAS.\n")

out1 = (a + b) * a
out2 = a + (b * a)

print("out1 = ", out1)
print("out2 = ", out2)

a + b is 6
a - b is 2
a * b is 8
a / b is 2.0
a ** b is 16

Python follows PEMDAS.

out1 =  24
out2 =  12
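
To see the order of operations at work without parentheses, here is a short sketch that reuses `a = 4` and `b = 2`. Exponents are evaluated first, then multiplication and division, then addition and subtraction.

```python
a = 4
b = 2

# Exponent first (4 ** 2 = 16), then multiplication (2 * 16 = 32), then addition.
print(a + b * a ** b)      # 36

# Parentheses change the order: (4 + 2) * 16.
print((a + b) * a ** b)    # 96

# Division with / always gives a float, even when the result is a whole number.
print(a / b)               # 2.0
print(type(a / b))
```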


### Quiz 2
---
#### Part A
1. Load the package `time`.
2. Use Google to find the function from `time` that returns the local time.
3. Call that function. The output should look like the example below.

>time.struct_time(tm_year=2021, tm_mon=2, tm_mday=5, tm_hour=19,  tm_min=41, tm_sec=59, tm_wday=4, tm_yday=36, tm_isdst=0)

---
#### Part B (math)
We're going to verify the "trick" that the percent difference $\frac{x-y}{x}$ between two numbers close to 1 is well approximated by the difference between the logs of the two numbers, $\log(x) - \log(y)$.

Let, 

$$ x = 1.05, \quad y = 1.02 $$

4. Assign the above values to $x$ and $y$ respectively. 
5. Assign $z_1 = \frac{x-y}{x}$.

6. Google how to compute logarithms using `numpy`.
7. Assign $z_2 = \log(x) - \log(y)$.
8. Google how to compute absolute value using `numpy`.
9. Compute $\text{error} = |z_1 - z_2|$.
10. The error should be around $0.0004$.

### Example Answers 2

In [22]:
# Part A
import time

print(time.localtime())

time.struct_time(tm_year=2021, tm_mon=2, tm_mday=21, tm_hour=16, tm_min=34, tm_sec=8, tm_wday=6, tm_yday=52, tm_isdst=0)

In [26]:
# Part B
x = 1.05
y = 1.02 

z1 = (x - y) / x
z2 = np.log(x) - np.log(y)

error = np.abs(z1 - z2)
print(error)

0.0004161083018237241
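
One way to see why the error is so small (this is our own sketch, not part of the lecture): write $d = \frac{x-y}{x}$. Then $\log(x) - \log(y) = -\log(1 - d) = d + \frac{d^2}{2} + \frac{d^3}{3} + \dots$, so the gap between the two measures is roughly $\frac{d^2}{2}$, which is tiny when $d$ is small. The check below uses `math.log` from Python's built-in `math` package instead of `np.log`; the variable names `d`, `log_diff`, and `leading_term` are ours.

```python
import math

x = 1.05
y = 1.02

d = (x - y) / x                        # percent difference (z1 in the quiz)
log_diff = math.log(x) - math.log(y)   # log difference (z2 in the quiz)

error = abs(d - log_diff)              # the gap we measured in the quiz
leading_term = d ** 2 / 2              # first term of the series for the gap

print(error)
print(leading_term)
```

Both printed numbers should be close to $0.0004$. The small difference between them comes from the higher-order terms $\frac{d^3}{3} + \dots$ that the leading term leaves out.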


## 3. Booleans

There are different kinds of variables. So far we've worked with

- **strings** (words)
- **floats** (numbers with decimals)
- **integers** (numbers without decimals)


Now we're going to work with Boolean variables. 

A Boolean variable is either `True` or `False`.

In [25]:
x = True
print(x)
print(type(x))

True
<class 'bool'>


In [27]:
y = False
print(y)

False


In [29]:
x = 5
print(type(x))

x = True
print(type(x))

<class 'int'>
<class 'bool'>


Boolean values are the result of **variable comparison**.

In [35]:
a = 10.5
b = 4.0
c = 4.0 

print(b == c)

# print("a > b", "is", a > b)   # greater than
# print("a < b", "is", a < b)   # less than 
# print("a == b", "is", a == b) # check for equality
# print("a >= b", "is", a >= b) # greater than or equal to
# print("a <= b", "is", a <= b) # less than or equal to

True
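
Here is a short sketch of a few more comparisons. The variable `is_bigger` is a name we made up for illustration. Note that a single `=` assigns a value, while a double `==` checks whether two values are equal.

```python
a = 10.5
b = 4.0

is_bigger = a > b        # the result of a comparison can be stored in a variable
print(is_bigger)         # True
print(type(is_bigger))   # <class 'bool'>

print(a != b)            # True, because != means "not equal to"
print(3 == 3.0)          # True, an integer and a float can be equal in value
```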


**Multiple comparisons** can be combined using `and` and `or`.

In [44]:
a = True
b = False

In [45]:
a and b # this is true only if both a and b are true

False


In [42]:
a or b # this is true if either a or b is true

True
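
The words `and`, `or`, and `not` can join comparisons together. The sketch below uses a made-up variable, `age`, to check whether a number falls inside a range.

```python
age = 16   # hypothetical value used only for this illustration

# `and` is True only when both comparisons are True.
is_teen = age >= 13 and age <= 19
print(is_teen)                  # True

# `not` flips a Boolean.
print(not is_teen)              # False

# `or` is True when at least one comparison is True.
print(age < 13 or age > 19)     # False
```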

## 4. Functions

Functions take inputs and return outputs. We've already worked with a few functions so far. 

- `print()`
- `np.log()`
- `+`
- `-`
- `len()`

We can write our own functions. 

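Every function definition starts with `def`. A `return` statement sends the result back to whoever called the function; a function without `return` gives back `None`.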

In [46]:
def add2(x):
    '''This is a simple function that adds 2 to any input.'''
    result = x + 2
    return result

In [47]:
add2(10)

# print('Add 2 to 5 = ', add2(5))

12
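
Functions can take more than one input. As a sketch, the hypothetical function `percent_diff` below wraps the percent difference formula from Quiz 2, and the hypothetical function `shout` shows what happens when a function has no `return` statement.

```python
def percent_diff(x, y):
    """Return the percent difference between x and y, relative to x."""
    return (x - y) / x


def shout(word):
    """Print the word in capital letters. There is no return statement."""
    print(word.upper())


z1 = percent_diff(1.05, 1.02)
print(z1)

result = shout("hello")   # prints HELLO
print(result)             # None, because shout does not return anything
```

Printing inside a function shows a value on the screen, but only `return` hands the value back so it can be stored in a variable.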

### Lists
Lists are collections of items.

Each item can be of any type. 

In [49]:
mylist = [5, "Alex", 6.3]
mylist

[5, 'Alex', 6.3]

Access items in a list using brackets.

This is called indexing.

- Python starts counting at zero.

In [50]:
mylist[0] # first element

5

In [51]:
print(mylist[1]) # second element

print(mylist[2]) # 3rd element

print(mylist[-1]) # -1 means last element


Alex
6.3
6.3
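
Because counting starts at zero, the last valid index is always one less than the length of the list. The sketch below checks this with `len` and also shows that indexing can be used to replace an item.

```python
mylist = [5, "Alex", 6.3]

n = len(mylist)
print(n)                # 3

# The last item sits at index n - 1, which is the same item as index -1.
print(mylist[n - 1])    # 6.3

# Indexing on the left of = replaces an item in the list.
mylist[1] = "Lebron"
print(mylist)           # [5, 'Lebron', 6.3]

# mylist[3] would raise an IndexError because the list has only 3 items.
```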


### Quiz 4
---
1. Make a list, `list1` with the numbers 1, 4, 7, 10, and 200.
2. Write a function which computes the mean of `list1`.
    - Remember $\text{mean} = \frac{\text{sum}}{\text{number of elements}}$
3. Google the `numpy` function which computes the mean.
4. Check the results of your function against the numpy function.

### Example Answers 4

In [151]:
list1 = [1, 4, 7, 10, 200]

In [152]:
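def get_avg(inputlist):
    '''Compute the mean of a list with exactly 5 elements.'''

    # Add all 5 numbers together (this only works for a 5-element list)
    listsum = inputlist[0] + inputlist[1] + inputlist[2] + inputlist[3] + inputlist[4]

    # Divide by the number of elements (5 here) to compute the average
    avg = listsum / len(inputlist)
    return avg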

In [153]:
get_avg(list1)

44.4

In [154]:
np.mean(list1)

44.4

## 5. For Loops

A for loop goes through a list and performs an operation on each element.

In [155]:
for item in mylist:
    print(item)

5
Alex
6.3


In [24]:
for ii in range(10):
    # range(10) produces the integers 0 through 9
    print('ii squared = ', ii ** 2)

ii squared =  0
ii squared =  1
ii squared =  4
ii squared =  9
ii squared =  16
ii squared =  25
ii squared =  36
ii squared =  49
ii squared =  64
ii squared =  81
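
To look at the numbers that `range(10)` produces, we can convert it to a list. The sketch below also uses a loop to keep a running total, with the made-up variable names `numbers` and `total`.

```python
# Converting the range to a list shows every number it produces.
numbers = list(range(10))
print(numbers)    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Keep a running total of the squares, one loop step at a time.
total = 0
for ii in range(10):
    total = total + ii ** 2

print(total)      # 285
```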


### Quiz 5
---
1. Write a new function that will compute the mean of `list1` using a for loop.
2. Check that we get the same answer as before. 
3. Now define a new list, `list2 = [1000, 2000, 3000]`.
4. Use your new function to compute the average of `list2`.

### Example Answers 5

In [157]:
list2 = [1000, 2000, 3000]

In [158]:
def get_avg_new(inputlist):
    
    tally = 0
    
    # Add the numbers together using a for-loop
    for number in inputlist:
        tally = tally + number
                          
    # Divide by the number of elements
    avg = tally / len(inputlist)
    return avg
    

In [159]:
get_avg_new(list2)

2000.0
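
For step 2 of the quiz, a minimal sketch of the check might look like the following. It repeats the loop-based function so the cell runs on its own, and it compares against Python's built-in `sum` rather than `np.mean`; that substitution is ours.

```python
def get_avg_new(inputlist):
    """Compute the mean of a list of any length using a for loop."""
    tally = 0
    for number in inputlist:
        tally = tally + number
    return tally / len(inputlist)


list1 = [1, 4, 7, 10, 200]
list2 = [1000, 2000, 3000]

print(get_avg_new(list1))   # 44.4, the same answer as before
print(get_avg_new(list2))   # 2000.0

# Cross-check against the built-in sum function.
print(get_avg_new(list1) == sum(list1) / len(list1))   # True
```

Unlike a function that adds up five items by hand, the loop version works for a list of any length (as long as it is not empty).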