# Python Fundamentals: Introduction to Jupyter and Python

* * * 
<div class="alert alert-success">  
    
### Learning Objectives 
    
* Run and edit Jupyter Notebooks.
* Assign and calculate with variables in Python.
* Get help when dealing with error messages. 
</div>

### Icons Used in This Notebook
🔔 **Question**: A quick question to help you understand what's going on.<br>
🥊 **Challenge**: Interactive exercise. We'll work through these in the workshop!<br>
💡 **Tip**: How to do something a bit more efficiently or effectively.<br>
🎬 **Demo**: Showing off something more advanced – so you know what Python can be used for!<br>

### Sections
1. [This Workshop](#this)
2. [Working With Jupyter Notebooks](#jupyter)
3. [Variables in Python](#variables)
4. [Calculating With Variables](#calculate)
5. [Demo: Working With Data Frames](#demo)

<a id='this'></a>

# This Workshop

Python is a general-purpose, powerful, and high-level programming language. It can be used for many tasks, including building websites and software, automating tasks, conducting data analyses, and more.

The best way to learn how to code is to do something useful, so this introduction to Python is built around **data analysis**.

The data we will be using in this workshop comes from [Gapminder](https://www.gapminder.org/), an independent educational non-profit. The dataset contains data for 142 countries, with values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007. It looks like this:

<img src="../img/gap_ex.png" alt="gapminder_data" width="500"/>

Imagine you're a data scientist wanting to perform an exploratory data analysis on this dataset, using the basics of Python. By the end of this workshop series, you'll be able to do so.

<a id='jupyter'></a>
# Working With Jupyter

We use JupyterLab as our interface for coding. The document you are looking at is called a **Notebook**: it lets us combine all aspects of a data project in one place, making it easier to show the entire process of a project to your intended audience! 

In Jupyter Notebook documents, code and text are divided into cells which can each be run separately.

Jupyter Notebooks consist of **code cells** and **Markdown cells**. You can change the type of a cell with the dropdown menu at the top of the Notebook, where it says "Markdown" or "Code".

## Code Cells

The cell below is a code cell. Press **Shift + Enter** on it to run the code and advance to the next cell.<br>

In [None]:
print('Welcome to D-Lab!')

The previous cell contains a **statement**. This is an instruction that the Python interpreter can execute.

`print()` is a **function**. A function is like a little program that performs an action on some value or data. You can often identify a function because of its trailing round parentheses `()`. The `print()` function just prints out whatever you put in between the parentheses.
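As a small sketch of what `print()` can do: if you give it several values separated by commas, it prints them on one line with a single space between them (that is the default separator). The strings and the number here are only examples.

```python
# Several values separated by commas are printed on one line,
# with a space inserted between each pair of values.
print('Welcome', 'to', 'D-Lab!')

# Values don't have to be text: numbers work too.
print('The answer is', 42)
```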

## 🥊 Challenge 1: My First Print Statement 

Write your own `print` statement in the code cell below. Follow the syntax of the example above, and change the text in the quotation marks.

In [None]:
# Write your own awesome print statement here!


🔔 **Question:** What does the code cell below do?

In [None]:
a = 1 + 2
print(a)

## 🥊 Challenge 2: Executing Cells Multiple Times

Try using **Shift + Enter** to run the following cell three times. What is the output?

In [None]:
a = a + 1
print(a)

💡 **Tip:** In Jupyter, you don't always need a print statement. Jupyter automatically displays the result of the **last expression** in a cell. To check the value a variable holds, type the variable name on the last line of a cell and run it. Let's try it:

In [None]:
a

## Don't Forget the Parentheses

Below, we use a function name without parentheses. What does the output say? 

In [None]:
print

💡 **Tip**: The output refers to the function object stored in memory; the function itself has not run. To actually run a function, you must call it with `()`.
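Here is the same difference in a short sketch. `type(print)` only inspects the function object without running it, while adding parentheses calls the function.

```python
# The inner `print` (no parentheses) is only inspected, not run;
# the outer print() is called to show what kind of object it is.
print(type(print))

# With parentheses, the function is called and does its job.
print('Called with parentheses!')
```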

## Markdown Cells

This cell is written in **Markdown**, a simple markup language. We use it to narrate the workshop and provide context. 

Markdown has its own syntax, which is fairly straightforward. Here's a [cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) if you want to know more. You can also double-click on any of the Markdown cells in the Notebook to see how they are made.

**💡 Tip:** Double click this text cell to see the Markdown code rendering the output. Press **Shift + Enter** to go back to the formatted text.

### 💡 Tip: Command Mode and Edit Mode 

If you want to create new empty cells, you can use the `+` button at the top of the Notebook. 

Enter **Edit Mode** by pressing **Enter** or using the mouse to click on a cell's editor area. Edit Mode allows you to edit cells.<br>
Enter **Command Mode** by pressing **Esc** or using the mouse to click outside a cell's editor area. Command Mode allows you to use shortcut keys:

* **a**: Add cell above  
* **b**: Add cell below  
* **dd**: Delete cell (also click `Edit -> Undo Cell Operation` if needed)
* **m**: Convert cell to Markdown  
* **y**: Convert cell to Code  

## The Kernel

One of the menus in Jupyter is "Kernel". The kernel is the computational engine that executes the code contained in a Jupyter Notebook. Each time you run a code cell, the kernel executes its code and keeps the results (such as the variables you create) in memory.
 
💡 **Tip**: The kernel remembers the effects of every cell it has executed (for example, the variables you assigned), **even if that code is no longer displayed in the Notebook**. Deleting a cell or converting it to Markdown does not undo it once it has been run! This can cause a lot of confusion.

### Restarting the Kernel

To clear your session in a Jupyter Notebook, use `Kernel -> Restart Kernel` in the menu. The kernel is the program that actually runs your code, so restarting it is like opening the Notebook for the first time. **All of the variables you set are lost.**

First, run the cell below. What is the output?

In [None]:
mystring = 'I am just a string.'

print(mystring)

Now use `Kernel -> Restart Kernel` in the menu! Then run the code below. What happens?

In [None]:
print(mystring)

Note that the error message tells you where the error happened (with an arrow, no less!). It is telling us that `mystring` is not defined, since we just reset the kernel.

💡 **Tip**: The number next to each code cell is a **counter**. It gives you a record of what order you ran your cells in. To reset the counter, you have to reset the kernel.

## Jupyter Autocomplete

Jupyter Notebooks also allow for tab completion, just like many command line interpreters and text editors. If you begin typing the name of something (e.g., a variable, a file in the current directory, etc.) that already exists, you can simply hit **Tab** and Jupyter will autocomplete it for you. If there is more than one possibility, it will show them to you and you can choose from there. For example:

In [None]:
test_me = 1
test_me_2 = 2

🔔 **Question:** In the cell below, try typing `te` and see what happens when you press **Tab**! What are you seeing?

In [None]:
# YOUR CODE HERE



## Exiting Jupyter

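When you shut down a Jupyter Notebook, all of the values stored in its kernel are lost. Your code, however, stays in the Notebook file, so save it (for example, with **Ctrl + S**, or **Cmd + S** on macOS) and you can come back to it later.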

First go to `File -> Close and Shutdown Notebook` in order to shut down the Notebook you are using and close its window. Once all Notebooks are shut down, you can shut down the entire Jupyter server by closing Anaconda Navigator. You may get a warning dialog box alerting you that Jupyter is still running. Just click **Quit** to shut everything down.

<a id='variables'></a>

# Variables in Python

You may have noticed our usage of the equals sign, `=`, in previous cells to create "placeholders" for certain values. We were creating **variables**.

*   Variables are containers for values that we want to refer to again later in the code.
*   In Python, the `=` symbol assigns the value on the right to the name on the left.
*   A variable is created when a value is first assigned to it. Whenever you use the variable afterwards, it refers to whatever value it currently holds.
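Here is a minimal sketch of reassignment using made-up years: each time you use a variable, Python substitutes the value the variable holds at that moment.

```python
year = 1952
print(year)            # prints 1952

year = 2007            # reassigning replaces the old value
print(year)            # prints 2007

next_year = year + 5   # uses the value `year` holds right now
print(next_year)       # prints 2012
```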

Here's Python code that assigns a year to the variable `year` and a country name, in quotation marks, to the variable `country`.

In [None]:
year = 1972
country = 'Afghanistan'

# Printing our new variables
print(year)
print(country)

💡 **Tip:** To see which variables we have assigned, use the magic commands `%who` and `%whos`. Magic commands are Jupyter-specific: [read about all of them here](https://ipython.readthedocs.io/en/stable/interactive/magics.html). There are a lot of really useful ones!

In [None]:
# This is a magic command
%whos

💡 **Tip**: If your kernel crashes (or if you reset it), you will **lose the variables** that have been set!

## Any Comments?

You might have seen a pound sign `#` at the beginning of a line in the code cell above. This marks a **comment**: Python ignores everything from the `#` to the end of that line, so it won't run.
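Here is a small example showing both a full-line comment and a comment at the end of a line of code. The variable name and value are only illustrations.

```python
# This whole line is a comment, so Python skips it.
life_exp = 28.801   # Text after `#` on a code line is ignored too.
# life_exp = 100    <- commented out, so this reassignment never runs
print(life_exp)
```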

🔔 **Question:** Try running the cell below, then comment out the last two lines and run it again. What changes?


In [None]:
print(1 + 1)

This line should be commented
this_variable_too = 'a' * 'b'

## Indenting

Consistent indentation is essential in Python. Python treats the whitespace at the start of a line as meaningful and uses it to work out how your code is structured. So only add spaces or indents at the start of a line where Python expects them; otherwise, you'll run into an error.
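To show where Python *does* expect an indent, here is a sketch that uses an `if` statement. Conditionals come later in the series, so just focus on the indentation: the line ending in a colon opens a block, and the indented line below it belongs to that block. The variable names are hypothetical.

```python
year = 2007
message = 'in or before 2000'

# The colon opens a block; the indented line belongs to it.
if year > 2000:
    message = 'after 2000'   # runs only because year > 2000

print(message)   # not indented, so it sits outside the `if` block
```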

## 🥊 Challenge 3: Indenting

Read the error message down below. What is the error type? How can we fix it?

In [None]:
right_indent = 1
    wrong_indent = 'abc'

## Naming Variables

Variable names ***must*** follow a few rules:

* They cannot start with a digit.
* They cannot contain spaces, quotation marks, or other punctuation.
* They *may* contain an underscore (typically used to separate words in long variable names).
    
Ignoring these rules will result in an error in Python. 
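If you want Python itself to check a name, here is one option, sketched with the standard library: the string method `str.isidentifier()` checks the character rules above, and `keyword.iskeyword()` catches the handful of words Python reserves for itself, such as `class` and `for`. You won't need this in the workshop, and the helper name `is_valid_name` is made up for this example.

```python
import keyword


def is_valid_name(name):
    """Return True if `name` can be used as a Python variable name (hypothetical helper)."""
    # isidentifier(): no leading digit, no spaces or punctuation, underscores allowed.
    # iskeyword(): reserved words such as 'class' are not allowed either.
    return name.isidentifier() and not keyword.iskeyword(name)


for name in ['gdp_per_capita', 'gdp per capita', '1st_country', 'life-exp', 'class']:
    print(repr(name), is_valid_name(name))
```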

## 🥊 Challenge 4: Debugging Variable Names

The following two blocks of code include variable names that cause an error. For each block of code, consider the following questions:
1. Which rule is being broken? Can you find this information in the error message?
2. What guidelines aren't being followed? 
3. How would you change the code?

In [None]:
country-1 = 'Zimbabwe'
2b = 'Africa'

print(country-1, 'is a country in', 2b)

In [None]:
avg_Age = 30.332
anotherVariable = 28.801
print(avg_age - another_variable)

## Debugging

You've seen two types of errors by now: `SyntaxError` (you wrote something Python can't parse; the `IndentationError` from Challenge 3 is a kind of `SyntaxError`) and `NameError` (the variable, function, or module you're using hasn't been defined). There are many other errors which we will go over later. For now, just remember: <span style="color:red">**error messages are your friend!**</span>

When you want to try and debug an error, think of the following:

1. **Read the errors!** Especially the end of the error message. It gives you a summary about what went wrong, and in which line the error is found. 
2. **Check your syntax.** You might just be spelling something wrong.
3. **Read the documentation.** You might just be using a function in a wrong way. Get into the habit of reading documentation. It looks daunting at first but will get easier over time. 
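As a sketch of reading an error, the code below deliberately misspells a variable name and captures the error's type and message. These are the same two pieces you see on the last line of a traceback. The `try`/`except` structure is used here only so the code doesn't stop at the error; you don't need it yet. The variable names are hypothetical.

```python
population = 60000

try:
    print(Population)   # typo: capital P, so this name was never defined
except NameError as err:
    # type(err).__name__ is the error type; str(err) is the message.
    error_summary = f'{type(err).__name__}: {err}'

print(error_summary)
```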

💡 **Tip:** When you're programming, most of your time will be spent debugging, looking stuff up (like program-specific syntax, [documentation](https://github.com/dlab-berkeley/python-intensive/blob/master/glossary.md#documentation) for packages, useful functions, etc.), or testing. Relatively little time is actually spent typing out the code.

## 🥊 Challenge 5: What The...

What does the following error seem to tell you? Google the error and see if you can fix it!

In [1]:
print('something went wrong)

SyntaxError: EOL while scanning string literal (2323373726.py, line 1)

<a id='calculate'></a>
# Calculating With Variables

*   The key feature of variables is that we can use them just as if they were values.
*   **Operators** are special symbols, such as `+`, `-`, `*`, and `/`, that tell Python to perform calculations. Jupyter highlights them in color.
*   **Functions** are named pieces of code that perform operations on values. Python has built-in functions like `print()` that are always available to use.

Let's check out some common operations below. 

🔔 **Question:** What outputs do you expect for each line below? Note what values get substituted in for the variables in each operation. 

In [None]:
year = 1952
pop = 60000
gdpPercap = 241
gdpTotal = 14460000

# Addition
year = year + 3
print('Addition:', year)

# Subtraction
year = year - 3
print('Subtraction:', year)

# Multiplication
print('Multiplication:', pop * gdpPercap)

# Division
print('Division:', gdpTotal / pop)

💡 **Tip**: As you can see above, comments can be useful as reminders to yourself when writing code. Feel free to add comments to the code cells in these Notebooks!
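Here are a few more operators, sketched with the population and GDP values from the cell above. Note that `/` always produces a decimal number (a `float`), even when the division comes out even. Python also has `//` (floor division), `**` (exponent), and `%` (remainder).

```python
pop = 60000
gdpPercap = 241
gdpTotal = pop * gdpPercap   # multiplication: 14460000

print(gdpTotal / pop)    # 241.0: `/` always returns a float
print(gdpTotal // pop)   # 241: `//` rounds down to a whole number
print(2 ** 3)            # 8: `**` raises to a power
print(7 % 3)             # 1: `%` gives the remainder of a division
```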

## 🥊 Challenge 6: Swapping Values

Let's say we have two variables and we want to swap their values. 

Does the following method accomplish the goal?

💡 **Tip**: What are the values of `start` and `end` at the end of the cell?

In [None]:
start = 1997
end = 1952

start = end
end = start
print(start, end)

Using a third temporary variable (you could call it `temp`), swap the values of `start` and `end`, so that `start = 1952` and `end = 1997`.

In [None]:
start = 1997
end = 1952

# YOUR CODE HERE


This is a common technique for swapping the values of two variables. In practice, though, we often create new variables instead of overwriting the existing ones.
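Here is one possible solution to the challenge: save one value in `temp` before overwriting it. Python also lets you swap two variables in a single line using tuple unpacking, shown in the last lines, which here swap the values back.

```python
start = 1997
end = 1952

# Swap using a temporary variable so neither value is lost.
temp = start   # keep a copy of 1997
start = end    # start is now 1952
end = temp     # end gets the saved 1997
print(start, end)   # prints 1952 1997

# Python's one-line idiom (tuple unpacking) needs no `temp`.
# Here it swaps the values back.
start, end = end, start
print(start, end)   # prints 1997 1952
```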

<a id='demo'></a>

# 🎬 Demo: Working With Data Frames

To cap off this session, here's a demo to see what reproducible data science with Python looks like.
Just run the code cells below, and don't worry if you don't understand everything!

We'll be using a `pandas` data frame to store and manipulate the data - you'll learn more about `pandas` in the next workshop!

Let's have a look at the data:

In [None]:
import pandas as pd

# Reading in a comma-separated values (CSV) file
df = pd.read_csv('../data/gapminder-FiveYearData.csv')
df.head()

Let's focus on the relation between GDP per capita (`gdpPercap`) and life expectancy (`lifeExp`). Using `pandas`, we can:

1. Check whether there's a correlation between the two variables.
2. Plot that correlation.
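If you're curious about what `corr()` returns: by default, `pandas`' `Series.corr()` computes the Pearson correlation coefficient. The pure-Python sketch below computes the same statistic by hand on a few made-up numbers (not the Gapminder data), so you can see what the single number means. Values near 1 indicate a strong positive linear relationship, and values near -1 indicate a strong negative one. The function name `pearson` is hypothetical.

```python
import math

# Made-up toy values, NOT taken from the Gapminder file.
gdp = [1000, 2000, 3000, 4000]
life = [50.0, 55.0, 60.0, 65.0]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style sum: how x and y move together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each list around its own mean.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print(pearson(gdp, life))   # 1.0: these toy points lie exactly on a rising line
```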

In [None]:
# Getting the correlation
df['lifeExp'].corr(df['gdpPercap'])

In [None]:
# Plotting
df.plot(kind='scatter', x='lifeExp', y='gdpPercap');

<div class="alert alert-success">

## ❗ Key Points

* Jupyter has Markdown and code cells. Change the type of a cell in the dropdown box at the top of your screen.
* Use `variable = value` to assign a value to a variable in order to record it in memory.
* `print()` is a built-in function. Functions can be recognized by their trailing parentheses. 
* Use `print(variable)` to display the value of `variable`.
* Use `# some kind of explanation` to add comments to your code.
* In the menu bar, go to `Kernel -> Restart Kernel` to restart Python. All your assigned variables will be lost.
     
</div>