In [1]:
# Import statements 
from datascience import *
import numpy as np

# Intro 

This notebook is intended to be paired with the [Data 8 Coding Guide](https://drive.google.com/file/d/19ydn1pUwNQkpudAkMcW5btHLg1NDRFqv/view), and demonstrates some of the basic Python tools used in Data 8, as well as common coding pitfalls. To allow you to run the whole notebook at once, `try` and `except` statements have been added around code that errors; you are not responsible for understanding these statements. In addition to this notebook, another great resource for walking through code is [Python Tutor](http://www.pythontutor.com/). 

# Data Types
### Examples of Data Types
The cells below show three different varieties of single values. 

In [2]:
# Example of boolean.
True

True

In [3]:
# Example of string.
"Hello Data 8!"

'Hello Data 8!'

In [4]:
# Example of numbers (integer and floating point).
1, 1.0

(1, 1.0)

Single values can be combined together in an array by using the `make_array` function, or calling functions like `np.arange`.

In [5]:
# Creating an array of strings with make_array.
make_array("string 1", "string 2", "string 3") 

array(['string 1', 'string 2', 'string 3'], dtype='<U8')

In [6]:
# Creating an array of 10 numbers with np.arange.
np.arange(1, 11)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

Finally, we can create Tables by reading in a dataset from a .csv file or creating an empty Table with `Table()` and adding columns using `with_columns`. 

In [7]:
# Creating a Table with textbook chapters
Table().with_columns(
    "Textbook Chapter Number",
    np.arange(1, 4),
    "Topic",
    make_array("Data Science", "Causality and Experiments", "Programming in Python")
)

Textbook Chapter Number,Topic
1,Data Science
2,Causality and Experiments
3,Programming in Python


### Common Mistakes with Data Types

In [8]:
# You cannot use array operations on a Table, or Table operations on an array. 
# For example, calling .num_rows on an array will error.
try:
    make_array(1, 2, 3).num_rows
except Exception as e:
    print("Encountered error!", e.__repr__())

Encountered error! AttributeError("'numpy.ndarray' object has no attribute 'num_rows'")


In [9]:
# Here is how to fix the above error: 
len(make_array(1, 2, 3))

3

In [10]:
# Remember that the output of .select is another Table and not an array, even if only one column is specified.  
Table().with_columns(
    "Textbook Chapter Number",
    np.arange(1, 4), 
    "Topic",
    make_array("Data Science", "Causality and Experiments", "Programming in Python")
).select("Topic")

Topic
Data Science
Causality and Experiments
Programming in Python


In [11]:
# To get an array from a Table, use .column. 
Table().with_columns(
    "Textbook Chapter Number",
    np.arange(1, 4), 
    "Topic",
    make_array("Data Science", "Causality and Experiments", "Programming in Python")
).column("Topic")

array(['Data Science', 'Causality and Experiments',
       'Programming in Python'], dtype='<U25')

In [12]:
# Be wary of combining incompatible data types. 
try:
    "1" + 2
except Exception as e:
    print("Encountered error!", e.__repr__())

Encountered error! TypeError('can only concatenate str (not "int") to str')
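
One way to resolve this kind of mismatch is to convert one of the values explicitly so that both sides have the same type. The sketch below uses Python's built-in `int` and `str` functions; the names `numeric_sum` and `joined_text` are illustrative and not part of the original notebook.

```python
# Convert the string to an integer before adding.
numeric_sum = int("1") + 2

# Or convert the integer to a string before concatenating.
joined_text = "1" + str(2)

print(numeric_sum)
print(joined_text)
```

Which conversion is appropriate depends on whether you want arithmetic or text concatenation.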


# Understanding Names
Names (or variables) allow us to keep track of a specific value. We assign values to variables with the `=` operator; the name of the variable is always on the left-hand side, while the value is on the right-hand side. The right-hand side is always evaluated before the result is assigned to the name on the left-hand side. 
### Examples of Name Usage

In [13]:
# Assign x to 10, then reassign x to x + 20. 
x = 10
x = x + 20
x

30

We evaluate expressions that contain names by replacing the name with its value. In the example below, we replace `fahrenheit_temps` on the second line with an array of `[60, 70, 80]` after the assignment on the first line. 

In [14]:
# Convert Fahrenheit temperatures into Celsius temperatures.
fahrenheit_temps = make_array(60, 70, 80)
celsius_temps = 5 / 9 * (fahrenheit_temps - 32)
celsius_temps

array([15.55555556, 21.11111111, 26.66666667])
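
Because the right-hand side is evaluated at the moment of assignment, a name stores a value rather than a formula. The sketch below uses hypothetical single-number names (`fahrenheit`, `celsius`) instead of the arrays above, and suggests that reassigning `fahrenheit` later does not change `celsius`.

```python
fahrenheit = 50
celsius = 5 / 9 * (fahrenheit - 32)  # computed once, using fahrenheit = 50

fahrenheit = 212  # reassigning fahrenheit does not recompute celsius
print(celsius)    # still the Celsius value for 50 degrees Fahrenheit
```

To get an updated Celsius value, the conversion line would need to be run again after the reassignment.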

In [15]:
# Note that variables are able to remember their value. Here's x that we saw earlier:
x

30

### Common Mistakes with Names

In [16]:
# Don't mix up names and strings. Quotes denote that a piece of text is a string, 
# whereas leaving out quotes will instruct Python to evaluate the text as a name.
# Here's fahrenheit_temps as a string: 
"fahrenheit_temps"

'fahrenheit_temps'

In [17]:
# This is different from fahrenheit_temps as a name:
fahrenheit_temps

array([60, 70, 80], dtype=int64)

In [18]:
# Be wary of this distinction when you access columns in Tables.
# We assign chapters to be the table of textbook chapters and column_name to be "Topic".
# We are able to perform a .select operation using this name:
chapters = Table().with_columns(
    "Textbook Chapter Number",
    np.arange(1, 4), 
    "Topic",
    make_array("Data Science", "Causality and Experiments", "Programming in Python")
)
column_name = "Topic"
chapters.select(column_name)

Topic
Data Science
Causality and Experiments
Programming in Python


In [19]:
# However, if we instead used "column_name", this will result in an error.
try:
    chapters.select("column_name")
except Exception as e:
    print("Encountered error!", e.__repr__())

Encountered error! ValueError('The column "column_name" is not in the table. The table contains these columns: Textbook Chapter Number, Topic')


In [20]:
# If you see a NameError, it indicates that you've probably made a typo somewhere. 
# Make sure to double-check your spelling if you see this error.
try:
    fahrenheit_temp
except Exception as e:
    print("Encountered error!", e.__repr__())

Encountered error! NameError("name 'fahrenheit_temp' is not defined")


# Functions
Functions allow us to write a single piece of code that can be reused on multiple inputs.
### Examples of Functions

In [21]:
# Functions can take in 0 or more arguments as input. Here's an example of a function that takes 2 numbers and adds them together: 
def add(x, y):
    return x + y
add(3, 4)

7

In [22]:
# One of the main benefits of functions is that they can be called multiple times on different inputs. 
add(1, 2)

3

In [23]:
# Arguments passed into functions can also be names. 
a = 8
b = 3
add(a, b)

11

### Common Mistakes with Creating Functions

In [24]:
# Make sure you don't hard code names in function bodies by referring to variables outside the function body.
# If you create a function that takes in arguments, you should always use them. 
# Here's an example of a buggy adding function that refers to the names a and b from the previous cell instead of the arguments.
# As a result, the output will always be a + b regardless of what arguments are passed in. 
def buggyAdder(x, y):
    return a + b
buggyAdder(1, 2)

11
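
A corrected version might look like the sketch below. The name `fixed_adder` is hypothetical; the point is only that the body refers to its own arguments `x` and `y`, so the outside names `a` and `b` no longer affect the result.

```python
a = 8
b = 3

def fixed_adder(x, y):
    # Use the arguments that were passed in, not outside names.
    return x + y

result = fixed_adder(1, 2)
print(result)
```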

In [25]:
# Always verify that you pass in the correct data types to your functions. 
# Here's an example of a data type mismatch caused by passing a string and an integer into the add function.
try:
    add("1", 2)
except Exception as e:
    print("Encountered error!", e.__repr__())

Encountered error! TypeError('can only concatenate str (not "int") to str')


In [26]:
# Make sure you pass the correct number of arguments into a function. 
# Here's what happens when we try to pass only one argument into our add function. 
try:
    add(1)
except Exception as e:
    print("Encountered error!", e.__repr__())

Encountered error! TypeError("add() missing 1 required positional argument: 'y'")


In [27]:
# Variables created inside functions only exist inside the function. 
# Trying to refer to them outside of the function will result in an error. 
# The function below adds "Hello " as a prefix to a string s that's passed in. 
# The function contains a string variable prefix that's assigned to "Hello ".
def greet(s):
    prefix = "Hello "
    return prefix + s
greet("Cool Cats and Kittens")

'Hello Cool Cats and Kittens'

In [28]:
# Here's what happens when you try to access the prefix variable outside of the function:
try: 
    print(prefix)
except Exception as e:
    print("Encountered error!", e.__repr__())

Encountered error! NameError("name 'prefix' is not defined")
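
A related consequence, sketched below under the assumption that a name `prefix` also exists outside the function, is that assigning to `prefix` inside the function body leaves the outside `prefix` untouched. The values here are illustrative only.

```python
prefix = "Goodbye "  # a name defined outside the function

def greet(s):
    prefix = "Hello "  # a separate name that exists only inside greet
    return prefix + s

greeting = greet("Data 8")
print(greeting)
print(prefix)  # the outside prefix keeps its original value
```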


In [29]:
# If you forget a return statement, your function won't be able to pass values outside the function. 
# Here's an example of a function that calculates the mean of an array, but is missing a return statement. 
# You'll notice that there's no output for this cell. 
def mean(array):
    sum(array) / len(array)
mean(np.arange(5))
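
The reason nothing is displayed is that a Python function without a `return` statement returns the special value `None`. The minimal sketch below uses a plain list and the hypothetical name `mean_without_return` to make that visible.

```python
def mean_without_return(values):
    # The mean is computed here but never returned.
    sum(values) / len(values)

result = mean_without_return([0, 1, 2, 3, 4])
print(result is None)
```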

In [30]:
# Finally, make sure you store the output or result of the function, or else all the work that the function did will be lost. 
# In the following code snippet, we fix the mean function and want to store and print the value of the mean of an array. 
# However, since we forget to assign the output of the function to a name, the result is discarded and the dummy value is printed instead.
def mean(array):
    return sum(array) / len(array)
meanOfArray = 0 # Dummy value 
mean(make_array(4, 5, 6))
print("The mean of the array is: ", meanOfArray)

The mean of the array is:  0
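
A possible fix is to assign the function's output to a name and then use that name. The sketch below uses a plain list rather than an array, and the name `mean_of_array` is illustrative.

```python
def mean(values):
    return sum(values) / len(values)

# Store the returned value so it can be used later.
mean_of_array = mean([4, 5, 6])
print("The mean of the array is:", mean_of_array)
```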


# Iteration
Iteration allows you to repeat a block of code a specified number of times.
### Examples of Iteration

In [31]:
# Basic for loop that prints out each element of an array. 
# Note that at the ith iteration, element is assigned to np.arange(10, 20).item(i). 
for element in np.arange(10, 20):
    print(element)

10
11
12
13
14
15
16
17
18
19


There are two common paradigms for iteration. In the first paradigm, you don't care about what index you're on, and you only want to repeat something a specified number of times. For example, the following code creates an array of 10 zeroes. Note that the iterating variable is called `unused`, and is never referred to in the body of the loop. 

In [32]:
# The following piece of code creates an array of 10 zeroes.
zeroes = make_array() 
for unused in np.arange(10):
    zeroes = np.append(zeroes, 0)
zeroes

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

On the other hand, if you do care about the index, you want to incorporate the index variable in the body of your for loop. 

In [33]:
# The following piece of code counts the number of even numbers in the numbers from 1 to 10.
# It prints each number, as well as whether the number is even or odd. 
numEvens = 0
for number in np.arange(1, 11):
    if (number % 2 == 0):
        print(number, "Even number found!")
        numEvens = numEvens + 1
    else:
        print(number, "Odd number found!")
print("Count of even numbers:", numEvens)

1 Odd number found!
2 Even number found!
3 Odd number found!
4 Even number found!
5 Odd number found!
6 Even number found!
7 Odd number found!
8 Even number found!
9 Odd number found!
10 Even number found!
Count of even numbers: 5
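
When only the final count is needed, the same result can often be obtained without a loop by applying a comparison to the whole array. The sketch below is an alternative to the loop above, not a replacement for it, and assumes NumPy is available; the names `numbers` and `num_evens` are illustrative.

```python
import numpy as np

numbers = np.arange(1, 11)

# numbers % 2 == 0 is an array of booleans, one per number;
# np.count_nonzero counts how many of them are True.
num_evens = np.count_nonzero(numbers % 2 == 0)
print("Count of even numbers:", num_evens)
```

The loop remains the better choice when each iteration needs to do extra work, such as printing a message per number.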
