# DS 3000
Day 0

- motivating clear communication in DS
- admin/syllabus
- data collection/get to know you
- jupyter
    - markdown
    - gotchas (when in doubt: restart & run all)
    
    
- python lightning review (brush up on our skills & pick up a few new tricks)
    - types (tuple, lists, dict, floats & ints, string)
        - tuple unpacking
        - list indexing
        - string formatting & operations
    - if statement & comparison operators
    - iteration (for & while loops)
        - iterating through dict
    - functions
        - default arguments
        - multiple return values (or is it?) ... tuple unpacking

# Motivating Clear Communication in DS

Let's do a little preliminary data collection with a point.

# Admin/Syllabus

Take a look at the course Canvas page ([1:35 pm class](https://northeastern.instructure.com/courses/119304)) ([3:25 pm class](https://northeastern.instructure.com/courses/119299)) together.

Be sure to mention:
- Labs are graded on completion/effort not correctness and are due @ the end of each class we have them
    * You'll do a very quick Lab 1 **Today!**
- First homework will be assigned by the start of next class
- Sometime in the next two weeks, be sure to stop by office hours
- Our first visitor, a Senior Research Data Scientist at Meta, is scheduled for Sep. 23
    

# An Example of Data Collection/Getting to Know You

Please fill out this [Google Form](https://forms.gle/wCJDVij1u1kxmjVr9) (the link is also on Canvas) as a way of providing me some information about yourself (and to provide us with a data set we can use for some examples going forward!). We'll have another instance later in the class where you can share something about yourself with your classmates.

**Notes** 

- when we use the data set later, all names/identifiers will be removed, but if you are uncomfortable providing any of the information, you may leave your response blank.
- this will count as your attendance point for today (we will use Qwickly for the first time next week)


# Jupyter Notebooks

Jupyter contains two types of cells (shown in these blue / green rectangles):
- markdown
    - markdown is a simple text/document formatting language
- python cells
    - a python interpreter is running in the background with all python variables / functions etc
    
By merging both, Jupyter provides a 'living' document which includes:
- results of analysis
- method of how analysis was done (the code)
- the ability to easily tweak a few things and poke around in an analysis


    
# Installing Jupyter Notebook  
  
In the terminal type:

`pip install notebook`

Then to run Jupyter Notebook in the browser, type in the terminal:

`jupyter notebook`

**Note**: make sure the notebook file `.ipynb` is in the appropriate folder.

# Navigating Jupyter

- selecting a cell
- changing cell type
- running a cell
    - for markdown cell: renders text
    - for python cell: runs the code
- add a cell
- remove a cell


# The Jupyter-Python Gotcha

The state of variables and functions may depend on previous cells which have been modified or deleted:

In [1]:
def scale_it(x):
    return 4 * x

# note

In [2]:
scale_it(5)

20

This can be problematic as `.ipynb` files are saved with the outputs of each cell!

Mitigate the issue by:
- observing the index (idx) in `In [idx]` and `Out [idx]`

Best practice:

- Give a fresh `Kernel>Restart & Run All`
    - before sharing
    - when debugging

**Note**: this is required of all your submissions for this class
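
To make the gotcha concrete, here is a minimal sketch of the same situation in one plain script. Each function definition stands in for a cell that was edited and re-run; the names `old_result` and `new_result` are made up for illustration.

```python
# first version of the function, as if defined in an early cell
def scale_it(x):
    return 2 * x

# a later cell computes something with that first version
old_result = scale_it(5)

# the defining cell is edited and re-run: the name now points to new code
def scale_it(x):
    return 4 * x

new_result = scale_it(5)

# old_result is NOT recomputed automatically, it still reflects the old code
print(old_result, new_result)
```

A fresh restart & run all would recompute both values with the current definition, so they would agree.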


# Jupyter Output

In [3]:
# by default jupyter echoes the value of the final line if it is an expression
# (assignments, like the ones below, produce no output)
x = 3
y = x + 5

In [4]:
# you can suppress it with ;
x + 10;

In [5]:
# jupyter reproduces anything printed to the command line
print('hey, does this work?')
print('how about this?');

hey, does this work?
how about this?


# Markdown Rundown

Outside of class, spend 5-10 minutes with [this markdown guide](https://www.markdownguide.org/basic-syntax/)

## Headings

more #'s yield smaller headings

# one #
## two #
### three #

## Lists

here is a list of things I love:
- baseball
- python
- open source software

## Links
you can link to a website, like [this one](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet), which contains a more complete markdown reference (it was used to write this quick markdown guide; many examples are taken from it)

## Images

![alt text](http://dangerouslyirrelevant.org/wp-content/uploads/2016/03/2015-Gallup-Student-Poll-1-3.jpg "Logo Title Text 1")

## Tables

| Car Repair                         | Cost ($) | Prob | Salted Roads? |
|------------------------------------|----------|------|---------------|
| None                               | 0        | .9   | No            |
| Oxygen sensor replacement          | 250      | .01  | No            |
| Under car rust repair              | 1000     | .02  | Yes           |
| Timing Belt Replacement            | 750      | .03  | No            |
| Fuel cap replacement or tightening | 25       | .03  | No            |
| rusted muffler repair              | 250      | .01  | Yes           |

Tables can be tough to generate by hand. Go ahead and use a [table generator](https://www.tablesgenerator.com/markdown_tables) online to save yourself some time.

## Block quote
    
> This is a blockquote
    
## Python code for display (not for running)  
  

```python
import numpy as np
rng = np.random.default_rng(seed=0)
```

## Latex Math

$$ \sum_{i=0}^n a_i = \frac{a_0 + a_n}{2} (n + 1) $$
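
The identity above holds when the $a_i$ form an arithmetic sequence (a constant step between consecutive terms). A quick numeric check, using a made-up sequence (`a_0`, `step` and `n` below are arbitrary illustration values):

```python
# hypothetical arithmetic sequence: a_i = a_0 + i * step
a_0, step, n = 3, 2, 10
terms = [a_0 + i * step for i in range(n + 1)]

# left side: add up every term directly
direct_sum = sum(terms)

# right side: average of first and last term, times the number of terms
formula_sum = (terms[0] + terms[-1]) / 2 * (n + 1)

print(direct_sum, formula_sum)
```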

# Lab 1 (Due @ the end of class via Canvas)

Introduce yourself to your neighbors by writing a quick markdown biography of yourself in a Jupyter notebook.  Be sure to use:
- 2 different heading levels
- a list
- a link to some website
- an image
    - avoid pictures of yourself please
    - link to something available online, see example above

You're welcome to be funny, this is really an excuse to get warmed up with jupyter and markdown and meet each other.  

When you're done, swap laptops with a classmate who will add their own silly positive review, praising and encouraging whatever you've shared.

Please be mindful:
- everything you share should make all classmates feel safe and welcome
- your response should be positive, take the moment to make somebody else smile and feel good :)

# Python lightning review

- brush up our skills 
- pick up a few new tricks

This is not intended to be an introduction to these topics, but a quick refresher.  If you want a more complete set of DS3000 relevant review topics, see `py_review.ipynb` on the course Canvas site.

Also note:
- this review will move faster than when we cover new material
- please interrupt me (raise your hand or just speak up!), questions make class much more fun and tailored to your needs

# Types
- ints / floats
- tuple
    - an immutable sequence of objects
- list
    - a mutable sequence of objects
    - sorting
- dict
    - a mutable mapping between objects
- strings
    - python has a great library of methods for strings

## Tuples

In [6]:
# objects needn't be the same type
my_tuple = 1, 2, 3, 'six'

# you can use parentheses if you want
my_tuple = (1, 2, 3, 'six')

my_tuple[0]

1

In [7]:
# tuples are immutable, we can't change the items inside them
my_tuple[0] = 'some new object'

TypeError: 'tuple' object does not support item assignment
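
Tuple unpacking comes up again with functions, so here is a small sketch of it alongside the immutability error above. The tuple `point` and the other variable names are made up for illustration.

```python
point = (3, 4, 'label')

# unpacking: one variable per item, assigned by position
x, y, name = point

# swapping two variables is really packing and unpacking a tuple
x, y = y, x

# a starred name collects "everything else" into a list
first, *rest = point

# trying to change an item raises TypeError, which we can catch
try:
    point[0] = 99
except TypeError as err:
    message = str(err)

print(x, y, name, first, rest)
print(message)
```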

## Lists

In [None]:
# cast my_tuple into a new list object
my_list = list(my_tuple)

# lists are mutable (may be changed)
my_list = [3, 1, 4, 1, 5, 9]

# slice out the first two items
my_list[:2]

In [None]:
# you can use negative indexing to count backwards from end
my_list[-1] = 'a'
my_list

In [None]:
# you can sort a list
my_list = [3, 1, 4, 1, 5, 9]
sorted(my_list)

In [None]:
# you can sort a list (backwards)
sorted(my_list, reverse=True)

In [None]:
# add an item
my_list.append('dont forget me')
my_list
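
One detail worth a sketch: `sorted()` builds a new list and leaves the original alone, while the list method `.sort()` sorts in place and returns `None`. The lists below are made up for illustration.

```python
numbers = [3, 1, 4, 1, 5, 9]

# sorted() returns a new list, the original is untouched
new_list = sorted(numbers)
before_sort = numbers.copy()

# .sort() changes the list itself and returns None
result = numbers.sort()

# key= sorts by a computed value, here the length of each string
words = ['pear', 'fig', 'banana']
by_length = sorted(words, key=len)

print(new_list, before_sort, numbers, result, by_length)
```

A common slip is writing `numbers = numbers.sort()`, which replaces the list with `None`.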

## Dictionaries

A real life dictionary assigns a definition (value) to every word (key).

Python dictionaries assign a (not necessarily unique) value to every key.  
(and they're not sorted like real dictionaries!)

In [None]:
# stores favorite numbers of some people
# keys are 'eric', 'qi', ...
# values are 17, 7, 3, 1
favorite_number_dict = {'eric':  17, 'qi': 7, 'lynne': 3, 'tamrat': 1}
favorite_number_dict

In [None]:
# what's tamrat's favorite number?
favorite_number_dict['tamrat']

In [None]:
# keys
'a' == ('a', 'b', 'c')

In [None]:
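# keys must be immutable (hashable), so a tuple works as a key
# (a list such as ['a', 'b', 'c'] would raise TypeError: unhashable type)
tuple_key_dict = {('a', 'b', 'c'): 123}

In [None]:
tuple_key_dict[('a', 'b', 'c')]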

In [None]:
# notice that each key maps to exactly one value.  
# some values may be repeated among keys.  

# we shouldn't store the numbers as keys and the names as values:
# two people could share a favorite number, and a key can only hold one value
# (remember the injective/surjective requirements for an inverse to exist from CS1800?)
problematic_fav_number_dict = {17: 'eric', 7: 'qi', 3: 'lynne', 1: 'tamrat'}
problematic_fav_number_dict
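
Here is a sketch of what goes wrong when two people share a favorite number. The extra entry `'sam'` is hypothetical, added only to create a repeated value.

```python
# 'sam' is a made-up entry who shares qi's favorite number
fav_num_by_name = {'eric': 17, 'qi': 7, 'lynne': 3, 'tamrat': 1, 'sam': 7}

# naive inversion: a later name silently overwrites an earlier one
name_by_fav_num = {num: name for name, num in fav_num_by_name.items()}

# one way around it: map each number to a list of names
names_by_fav_num = {}
for name, num in fav_num_by_name.items():
    names_by_fav_num.setdefault(num, []).append(name)

print(name_by_fav_num)
print(names_by_fav_num)
```

The naive version ends up with four entries instead of five, since `'qi'` is lost.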

# Strings

Python has awesome [string manipulation methods](https://docs.python.org/3/library/stdtypes.html#string-methods), we'll highlight a few useful ones here.  Worth a few minutes to familiarize yourself with the link.

(tip: handling file paths?  use [pathlib](https://docs.python.org/3/library/pathlib.html) instead of treating them as strings)

In [None]:
# string formatting (putting data into a string)
# see https://docs.python.org/3/library/string.html#formatspec for other ways to format besides .1f below
name = 'eric'
fav_num = 17
greeting_str = f'hi {name}, I heard your favorite number is about {fav_num:.1f}'

print(greeting_str)
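
A few other format specs from that link, as a quick sketch (the values are arbitrary):

```python
value = 1234.5678

two_decimals = f'{value:.2f}'      # fixed point, 2 digits after the decimal
with_commas = f'{value:,.1f}'      # thousands separator, 1 digit after the decimal
as_percent = f'{0.256:.1%}'        # multiply by 100 and add a % sign
right_aligned = f'{42:>6}'         # pad on the left to a width of 6
zero_padded = f'{17:03d}'          # integer, zero padded to a width of 3

print(two_decimals, with_commas, as_percent, right_aligned, zero_padded)
```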

In [None]:
some_string = 'hello python world!'

In [None]:
# replaces all occurrences of one string with another
some_string.replace('python', 'ds3000')

In [None]:
# splits a string on all occurrences of 'o'
some_string.split('o')

In [None]:
# splits a url into its pieces on every '/'
url = 'https://www.some-website.com/this-section/this-subsection/file_<useful-thing>_gibberish-here-too.html'
url.split('/')

In [None]:
url.split('/')[5]

In [None]:
# pull out the "useful thing" from the url
# assumes
# 1st '_' is to the immediate left of "useful thing"
# 2nd '_' is to the immediate right of "useful thing"
s_useful_thing = url.split('/')[5].split('_')[1]
s_useful_thing
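
The opposite of `split` is `join`, which glues a list of strings together with a separator. A small sketch, reusing the same made-up url:

```python
url = 'https://www.some-website.com/this-section/this-subsection/file_<useful-thing>_gibberish-here-too.html'

# split into pieces, then join them back with the same separator
parts = url.split('/')
rebuilt = '/'.join(parts)

# join works with any separator string
sentence = ' '.join(['hello', 'python', 'world!'])

print(rebuilt == url)
print(sentence)
```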

# Control Flow (If statements)

In [None]:
x = 3
if x < 10:
    print('x is smaller than 10')
else:
    print('x is not smaller than 10')
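
A slightly fuller sketch of comparison operators with `elif`. The helper `describe` and its `threshold` argument are made up for illustration.

```python
def describe(x, threshold=10):
    """ compare x to a threshold (hypothetical helper) """
    if x < threshold:
        return 'smaller'
    elif x == threshold:
        return 'equal'
    else:
        return 'larger'

# comparisons can be chained, just like in math notation
x = 3
is_single_digit = 0 <= x < 10

print(describe(3), describe(10), describe(11), is_single_digit)
```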

# Iteration (loops)

In [None]:
# looping by index
for idx in range(5):
    print(idx)

In [None]:
# break: you can stop a loop early if you want
for idx in range(5):
    print(idx)
    if idx > 2:
        # break immediately stops this iteration and
        # leaves the loop
        break

In [None]:
# continue: skip a particular iteration
for idx in range(5):
    if idx == 2:
        # continue statement ends this iteration of loop
        # unlike break, it continues to loop (3, 4 printed below too!)
        continue
    print(idx)

In [None]:
# looping over lists
month_list = ['jan', 'feb', 'mar', 'apr', 'may']
for month in month_list:
    print(month)

In [None]:
# looping over tuples
month_tuple = 'jan', 'feb', 'mar', 'apr', 'may'
for month in month_tuple:
    print(month)

In [None]:
# looping over (key, value) pairs of dictionary
fav_num_dict = {'eric': 17, 'qi': 7, 'tamrat': 1, 'lynne': 3}
for name, fav_num in fav_num_dict.items():
    print(f"{name}'s favorite number is {fav_num}")

In [None]:
# looping over just the keys
fav_num_dict = {'eric': 17, 'qi': 7, 'tamrat': 1, 'lynne': 3}
for name in fav_num_dict.keys():
    print(name)

In [None]:
# looping over just the values
fav_num_dict = {'eric': 17, 'qi': 7, 'tamrat': 1, 'lynne': 3}
for fav_num in fav_num_dict.values():
    print(fav_num)
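
Two more loop patterns, as a sketch: a `while` loop, for when you don't know the number of iterations up front, and the built-in `enumerate`, for when you want both the index and the item. The starting value of 100 is arbitrary.

```python
# while: keep halving until we reach 1, counting the steps
value = 100
count = 0
while value > 1:
    value = value // 2
    count += 1

# enumerate: get (index, item) pairs without indexing by hand
month_list = ['jan', 'feb', 'mar']
numbered = []
for idx, month in enumerate(month_list, start=1):
    numbered.append(f'{idx}: {month}')

print(count, numbered)
```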

# Functions

In [None]:
def double_it(x):
    """ double the value of x 
    
    Args:
        x (float): some input
        
    Returns:
        out (float): twice the input
    """
    return x * 2

In [None]:
double_it(2)

In [None]:
def apply_exponent(x, exp=3):
    """ raise x to the power exp 
    
    Args:
        x (float): some input
        exp (float): exponent (default=3)
        
    Returns:
        out (float): x to the power exp
    """
    return x ** exp

In [None]:
# rely on default exp=3
apply_exponent(2)

In [None]:
# explicitly pass another exponent
# input arguments (2, 10) are assigned to variables (x, exp)
# by position
apply_exponent(2, 10)

In [None]:
# We can pass by name too
apply_exponent(x=2, exp=5)

In [None]:
# this is helpful since we needn't worry about the order of args in the fxn definition
# (super helpful with many function inputs, can be confusing to remember order of 5+ items!)
apply_exponent(exp=5, x=2)
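
Positional and named arguments can be mixed, as long as the positional ones come first. A small sketch, with the function repeated so the block runs on its own:

```python
def apply_exponent(x, exp=3):
    """ raise x to the power exp """
    return x ** exp

# positional x, named exp
mixed = apply_exponent(2, exp=4)

# passing x both by position and by name is an error
try:
    apply_exponent(2, x=5)
except TypeError as err:
    error_message = str(err)

print(mixed)
print(error_message)
```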

In [None]:
# functions can return multiple values
def nonsense_fxn(some_list, some_int, some_float):
    """ a nonsense function
    
    Args:
        some_list (list): a list
        some_int (int): integer
        some_float (float): float
        
    Returns:
        list_truncate (list): some_list, truncated to first some_int
            items
        float_scaled (float): some_float, multiplied by some_int
    """
    # truncate list
    list_truncate = some_list[:some_int]
    
    # scale float
    float_scaled = some_float * some_int
    
    return list_truncate, float_scaled

In [None]:
some_list = ['a', 'b', 'c', 'd', 'e']
some_int = 3
some_float = 3.14159

lt_output, fs_output = nonsense_fxn(some_list, some_int, some_float)

lt_output

In [None]:
fs_output

In [None]:
# what's going on under the hood:
# function returns a tuple
tuple_out = nonsense_fxn(some_list, some_int, some_float)

# tuple is being unpacked into its elements
lt_output, fs_output = tuple_out

tuple_out
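
The same pattern in a smaller sketch. The helper `min_max` is made up for illustration; `divmod` is a Python built-in that returns a tuple in the same way.

```python
def min_max(values):
    """ return the smallest and largest item (hypothetical helper) """
    return min(values), max(values)

# without unpacking we just get a tuple back
result = min_max([3, 1, 4, 1, 5, 9])

# unpack into two names
smallest, biggest = result

# by convention, _ marks a value we don't plan to use
_, only_biggest = min_max([3, 1, 4, 1, 5, 9])

# built-ins do this too: divmod returns (quotient, remainder)
quotient, remainder = divmod(17, 5)

print(result, smallest, biggest, only_biggest, quotient, remainder)
```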