# Dealing with Data Spring 2020 – Class 1

## February 13th, 2020

---

## Agenda:

1. Introductions
2. Laptop Setup
3. Break
4. First Steps
5. Group Project Overview & Group Formation

## Course Goals: 

- Empower you with hands-on knowledge of `Python` and `SQL`,
- Enable you to implement (basic) Databases and (small) Data Pipelines,
- Enrich your resume with a practical data application project, and, 
- Expose you to Big Data concepts and technologies. 

## We Will _Not_ Cover:

- R,
- Python's 'Pandas' library, 
- Data analysis techniques, 
- Data visualization techniques, 
- Data science / machine learning algorithms, 
- Statistics

## You're Best Suited for this Class if: 

1. You have the time to invest in practicing your programming at home, 
2. You don't have any prior experience with a programming language. 

## Grading

- 50% individual homework assignments
- 10% in-class participation 
- 10% team-member ratings for your group project
- 30% group project 

## Homework

- 7 assignments over the course's duration (mix of coding and short essays) 
- Your 5 best *completed* assignments will count towards your final grade
- Late assignments are subject to a 3% penalty each day; after 1 week you will receive a 0

---

---

# Laptop Setup

## NYU Classes

Make sure you are able to access our course page on NYU Classes. This is where I will be posting all of our course content (notebooks, datasets, etc.). This is also where all of your assignments will be posted, graded, and returned. 

## Colab

In order to standardize the way we all use Jupyter Notebook (described below) we are going to use Google's Colab (https://colab.research.google.com/notebooks/welcome.ipynb) 

Think of Colab as renting a computer that you use through your web browser (I recommend Chrome). This matters because, in order to save your work, you need to download it from Colab onto your own computer (machine) and re-upload it the next time you'd like to work with it. 

For instance, if I want to open today's class notebook in Colab, I: 

<br>

1. **Will go to https://colab.research.google.com/notebooks/welcome.ipynb**

<br> 

2. **Will click "file" > "upload notebook"** 

<div> 
    <img src="attachment:Screen%20Shot%202019-10-23%20at%201.54.00%20PM.png" width=600  />
</div>

3. **Note that a Jupyter Notebook will always have a .ipynb extension. In Colab I can click "Upload" then "Choose File" and upload the notebook (that I have downloaded from NYU Classes) to Colab.**

<div> 
    <img src="attachment:Screen%20Shot%202019-10-23%20at%201.54.49%20PM.png" width=600  />
</div>

4. **Once I click "Open" ...**

<div> 
    <img src="attachment:Screen%20Shot%202019-10-23%20at%201.55.06%20PM.png" width=600  />
</div>

5. **Colab should open the notebook, and voila!**

<div> 
    <img src="attachment:Screen%20Shot%202019-10-23%20at%201.55.27%20PM.png" width=600  />
</div>

---

---

# So, What is Programming? 

- Simple, written instructions that tell computers what to do, 
- These instructions (building blocks) can be used to build something more complex, useful, and beautiful. 

---

---

# A few basics regarding Jupyter Notebooks (formerly IPython Notebooks)

<br>

1. To execute a cell, hit Shift+Enter <br><br>

2. If you want to execute a cell and add an additional, blank cell below it, hit Option+Enter <br><br>

3. On the top of the window you will see this cell says 'Markdown' - this means you can type as you would in any word processor. To execute any code in Python, you must ensure the cell is in 'Code' (you can ignore the other two options for now). 

    NB: You can use this cheat sheet in order to do things in Markdown like make headers or use bold/italics: https://guides.github.com/pdfs/markdown-cheatsheet-online.pdf <br><br>

4. If you make a mistake and, for instance, start a 'while' loop with no ending and your computer gets stuck, you can click the square next to the 'Code/Markdown' dropdown menu above, which will interrupt the kernel (that is, stop the running cell). <br><br>

5. When a cell is running, there will be a little '*' to its left. This means it's working on executing the code in the cell (the more complex the code, the longer it will take to execute). Once it is done, it will display a number. The numbers are really just there to show you the order in which you executed your cells, and nothing more. They can largely be ignored.

<br>

## Now, for a bit of background on Python! 

<br>

Python is one of many programming languages, and just like other languages, it has its pros and cons. 

For more on Python, they have a handy site: https://www.python.org/about/gettingstarted/, but for now just know that it is what's called an object-oriented, high-level programming language (read as: versatile and fairly basic). 

<br>

Python was released more than 25 years ago, in 1991, by Guido van Rossum. In his own words:

_"...In December 1989, I was looking for a "hobby" programming project that would keep me occupied during the week around Christmas. My office ... would be closed, but I had a home computer, and not much else on my hands. I decided to write an interpreter for the new scripting language I had been thinking about lately: a descendant of ABC that would appeal to Unix/C hackers. I chose Python as a working title for the project, being in a slightly irreverent mood (and a big fan of Monty Python's Flying Circus)."_

<br>


Finally, and perhaps most importantly, if you ever have any questions, you can ask me, or visit https://stackoverflow.com/, quite possibly the most useful tool on the internet. Consider it the Google of coding questions. Input your search query (e.g., 'Convert string to integer') and you'll get hundreds if not thousands of answers!

<br>

P.S. You are most likely running a version of Python 3, such as 3.6.4 (to check which version you are running, open your terminal and type `python --version`). Unfortunately, with each new update to Python there are some quirky changes. For instance, in Python 2, to print something you would say: 

    print "Hello, my name is Alex" 

Whereas in Python 3, you say: 

    print("Hello, my name is Alex") 

It may seem trivial, but it's anything but when you can't figure out why the code you've spent all night writing won't execute. 
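If you are working in a notebook rather than a terminal, the version can also be checked from inside a 'Code' cell. The snippet below is a minimal sketch using Python's standard `sys` module; the variable names `major` and `minor` are only for illustration.

```python
import sys

# sys.version_info stores the version as separate numbers
major = sys.version_info.major
minor = sys.version_info.minor

print("Full version string:", sys.version)
print("Major version:", major)
print("Minor version:", minor)
```

If the major version printed is 3, the `print("...")` form with parentheses is the one to use.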

P.P.S. Perhaps most important of all, if you are in a 'Code' cell and want to type something non-code, just put a '#' before it (demonstrated below). 

In [0]:
print("Hello World")

---

# Exercise 1: Print your own message below:

In [0]:
# print your own message here

---

# Comments

You will have noticed in the cell above the line 
`# print your own message here` 

This is a _comment in the code_.

Comments are notes in your source code that aren't executed when your code is run. These are useful for reminding yourself what your code does, and for notifying others of your intentions. 

In [0]:
# a comment

# we use comments to write down things that we want to remember
# or instructions for other users that will read/use our code

# anything after the # is ignored by python.

print("I could have written code like this.")  # the comment after is ignored

You can also use a comment to "disable" or "comment out" a piece of code

In [0]:
# print("This won't run.")

# Multiline comments 

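Python has single-line comments, which start with `#`. For longer notes, programmers often use a string wrapped in triple quotes as a multiline comment. When such a string is the first thing inside a function, class, or file, it is called a _docstring_, because it is used to write documentation for the code. 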

In [0]:
# this is a single line comment
# this is a second single line comment

print("trying out some comments")

"""This code is used to print 
a message of your choice. Notice that 
this message that is included in the triple double
quotes will not execute."""

print("let's see if that worked...")
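To see why triple-quoted text is associated with documentation, here is a small sketch. It uses a function, which we have not covered yet, and the name `greet` is made up for this example. The point is only that a triple-quoted string placed first inside a function is stored by Python and can be read back, while a `#` comment is discarded.

```python
def greet(name):
    """Return a short greeting for the given name."""
    # this is a regular comment; Python ignores it completely
    return "Hello, " + name


message = greet("Alex")
print(message)

# the docstring is kept by Python and can be read back
print(greet.__doc__)
```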

---

# Markdown 

In notebooks, you can simply double click on a piece of text to edit it. To leave edit mode and display the formatted text again, press "Run". Markdown is a very simple language for formatting text, and you can read further instructions by going to "Help : Markdown" from the menu, or checking the [online examples](http://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html).

Below, we will see a few examples. (Double click on each cell below to see how it looks in Markdown).

# Big Header 
## Smaller Header
### A little smaller header
#### Getting smaller and smaller
##### Very very small header
###### I do not even know if this is a header anymore

If you want to write code, you can use the backtick characters: `print("Hello World")`

Or you can use triple backticks, and can get color coding by specifying the language that you use.
```python
print("Hello World")
```

```html
<html>
    <body>Hello World!</body>
</html>
```

* You can also create bulleted lists: 
* ...
* Learn Python
* ...
* Millions!

And ordered lists:
1. Learn Python
2. Learn SQL
3. ....
4. Millions!

---

# Expressions, Data Types, and Variables

We will start first by dealing with numbers and see how we can do various math and logical operations with Python. Here are a few operators that we will be using:

* `+` addition, add two numbers
* `-` subtraction and negation
* `*` multiplication, multiply two numbers
* `/` division, divide the first number by the second, with the outcome being a decimal (floating point) number
* `//` integer division, divide the first number by the second and round the outcome down to a whole number
* `%` modulo/division remainder, what is the remainder when the first number is divided by the second?
* `**` power, raise the first number to the power given by the second number

In [0]:
print("addition:")
print(1 + 1)
print(1.5 + 1)

In [0]:
print("subtraction:")
print(1 - 1)
print(1 - 1.0)

In [0]:
print("negation:")
print(-5)

In [0]:
print("multiplication:")
print(3 * 3)  # integer
print(3 * 3.0)  # floating point

In [0]:
print("division:")
print(4 / 2)
print(5 / 2)

In [0]:
print("integer division:")
print(4 // 2)
print(5 // 2)

In [0]:
print("modulo:")  # modulo is a fancy term for remainder of a division
print(4 % 2)
print(5 % 2)
print(5.5 % 2)
print(75 % 4)
print(10 % 3)

In [0]:
print("power:")
print(10**3)
print(2**5)
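The `//` and `%` operators belong together: one gives the whole-number part of a division and the other gives what is left over. The sketch below uses the illustrative variable names `a`, `b`, `quotient`, and `remainder` to show that the two pieces rebuild the original number. It also shows one caveat: with negative numbers, `//` rounds down, while `int()` simply drops the decimal part.

```python
a = 17
b = 5

quotient = a // b    # how many whole times b fits into a
remainder = a % b    # what is left over

print(quotient, remainder)       # 3 2
print(quotient * b + remainder)  # 17, the original number

# with negative numbers the two approaches differ
print(-7 // 2)      # -4, rounds down
print(int(-7 / 2))  # -3, drops the decimal part
```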

---

# Exercise 2

Assume that you go to a restaurant, and you order $50 worth of food. Then you need to add the NY Sales Tax (8.875%) and add a tip (say, 20%). Write down the calculation that will print the total cost of the meal.

In [0]:
# tip calculated on the pre-tax amount

In [0]:
# tip calculated on the after-tax amount

# Solution

In [0]:
print("The total cost of the meal, with tip on the pre-tax amount is:")
print(50 + 50 * 8.875 / 100 + 50 * 20 / 100)

In [0]:
print("The total cost of the meal, with tip on the post-tax amount is:")
print(50 + 50 * 8.875 / 100 + (50 + 50 * 8.875 / 100) * 20 / 100)
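The same calculation can be easier to read when each number gets a name. This is only a sketch of an alternative style, not a required solution. The variable names (`food`, `tax_rate`, `tip_rate`, and so on) are chosen for this example.

```python
food = 50
tax_rate = 8.875 / 100
tip_rate = 20 / 100

tax = food * tax_rate

# tip calculated on the pre-tax amount
total_tip_pre_tax = food + tax + food * tip_rate

# tip calculated on the after-tax amount
total_tip_post_tax = (food + tax) + (food + tax) * tip_rate

print("Tip on pre-tax amount:", total_tip_pre_tax)
print("Tip on post-tax amount:", total_tip_post_tax)
```

If the tax or tip rate changes, only one line needs to be edited.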

---

# Exercise 3


You have a stock that closed at \$550 on Monday, and then closed at \$560 on Tuesday. Calculate its daily return: the daily return is defined as the difference in the closing prices, divided by the closing price the day before.

In [0]:
# solution here

# Solution

In [0]:
print("The daily return is:")
print((560 - 550) / 550)

In [0]:
# if we want to show a percentage, we can modify our code and multiply by 100:
print("The daily return is:")
print(100 * (560 - 550) / 550)

---

# Exercise 4



Assume that someone's height is 5 ft and 9 inches. Convert that to centimeters. Remember that one foot is 30.48 centimeters, and one inch is 2.54 centimeters.

In [0]:
# solution here

# Solution

In [0]:
print("The height in centimeters is:")
print(5 * 30.48 + 9 * 2.54)

---

# Exercise 5



Write a Python program to compute the future value of a deposit with a principal amount of \$10,000 and an interest rate of 3%, after 5 years. Remember that the value is $Principal \times (1+interest)^{years}$, and you will need to use the power operator (`**`) for this calculation.

In [0]:
# solution here

# Solution

In [0]:
print("The value of the deposit will be:")
print(10000 * (1 + 3 / 100)**5)
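To see what the power operator is doing here, the sketch below compares it with multiplying by the growth factor once per year. The variable names are chosen for this example. Both results are rounded to cents with `round(...)` because decimal arithmetic on a computer can differ in the last few digits.

```python
principal = 10000
rate = 3 / 100
years = 5

# using the power operator
with_power = principal * (1 + rate) ** years

# multiplying by (1 + rate) once for each of the 5 years
step_by_step = principal * (1 + rate) * (1 + rate) * (1 + rate) * (1 + rate) * (1 + rate)

print(round(with_power, 2))
print(round(step_by_step, 2))
```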

---

# Exercise 6

Assume that someone's height is 180 centimeters. Convert that height into feet and inches. Remember that one foot is 30.48 centimeters, and one inch is 2.54 centimeters. You will need to use the modulo operator (%) for this conversion. Also use the `int(...)` function to get the integer part of a division. Optionally, you can also use the `round(...)` function to get the rounded integer number from a decimal.

In [0]:
# solution here

# Solution

In [0]:
print("180 cm is")
print(180 // 30.48)
print("feet and ")
print(180 % 30.48 / 2.54)
print("inches")

In [0]:
# let’s see how we can print the message in one line
print("180 cm is", 180 // 30.48, "feet and ", 180 % 30.48 / 2.54, "inches")

In [0]:
# Let's use int() function to make the output look a bit nicer.
print("180 cm is", int(180 / 30.48), "feet and", int(180 % 30.48 / 2.54), "inches")

In [0]:
# Let's use the functions int() and round()
print("180 cm is", int(180 / 30.48), "feet and", round(180 % 30.48 / 2.54, 1), "inches")
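As an optional alternative, Python has a built-in function `divmod(a, b)` that returns the result of `a // b` and `a % b` together. The sketch below repeats the conversion with it. The variable names `height_cm`, `feet`, `leftover_cm`, and `inches` are chosen for this example.

```python
height_cm = 180

# divmod gives the whole number of feet and the leftover centimeters in one step
feet, leftover_cm = divmod(height_cm, 30.48)
inches = leftover_cm / 2.54

print(height_cm, "cm is", int(feet), "feet and", round(inches, 1), "inches")
```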

---

# Data Types

Notice in the examples below, the different operations result in different types of outcomes.

In [0]:
print(1 + 1) # this is an integer

In [0]:
print(1.0 + 1) # this is an integer added to a decimal/floating point

In [0]:
print("1" + "1") # this is a string/text 

In [0]:
print(100 * 2)

In [0]:
print(100 * "2")

We will learn more about the different types of variables later, but for now, remember that you can use the `type` command to find out the type of an expression:

In [0]:
type(1)

In [0]:
type(3.14)

In [0]:
type("Hello")

In [0]:
type("1")

In [0]:
type("3.14")

In [0]:
type(1+1)

In [0]:
type("1" + "1")

In [0]:
type(10 * "2")

**Lesson**: Notice the different data types above: `str` (string), `int` (integer), `float` (decimal numbers). One thing that you will notice is that the outcome of the various operators (`+`, `*`, etc) can be different based on the data types involved.
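When a value has the wrong type for what you want to do, the built-in functions `int(...)`, `float(...)`, and `str(...)` convert between types. Here is a short sketch, with variable names chosen for this example:

```python
text_number = "7"

as_int = int(text_number)   # the string "7" becomes the integer 7
as_float = float("3.14")    # the string "3.14" becomes the decimal 3.14
as_text = str(42)           # the integer 42 becomes the string "42"

print(as_int + 1)           # 8, numbers are added
print(text_number + "1")    # 71, strings are joined together
print(type(as_int), type(as_float), type(as_text))
```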

---

## Exercise 7 

What will the following program print out? Figure it out _before_ running the program.

In [0]:
x = 41
x = x + 1
print(x)

In [0]:
x = "41"
x = x + "1"
print(x)

---

# A Note About Stack Overflow 

Consider [Stack Overflow](https://stackoverflow.com/) your go-to (other than myself and your TA) for any questions you may have. It is a life-saver for even the most experienced programmer. To ask the best possible question on Stack Overflow: 

- Explain clearly what you're trying to do,
- Identify the version of software you're using,
- Provide an error message,
- Explain what you've already tried to make it work

# Recommended Readings

- [Paul Ford's "What is Code"](https://www.bloomberg.com/graphics/2015-paul-ford-what-is-code/)