# Python Fundamentals: Introduction to Jupyter and Python

* * * 
<div class="alert alert-success">  
    
### Learning Objectives 

* Understand the aims of this workshop.    
* Work with Jupyter Notebooks.
* Use variables to do calculations in Python.
* Read error messages to fix your code. 
</div>

### Icons Used in This Notebook
🔔 **Question**: A quick question to help you understand what's going on.<br>
🥊 **Challenge**: Interactive exercise. We'll work through these in the workshop!<br>
💡 **Tip**: How to do something a bit more efficiently or effectively.<br>
⚠️ **Warning:** Heads-up about tricky stuff or common mistakes.<br>
📝 **Poll:** A Zoom poll to help you learn!<br>
🎬 **Demo**: Showing off something more advanced – so you know what Python can be used for!<br>

### Sections
1. [This Workshop](#this)
2. [Working With Jupyter Notebooks](#jupyter)
3. [Variables in Python](#variables)
4. [Demo: Working With Data Frames](#demo)

<a id='this'></a>

# This Workshop

This workshop is called Data Science <i>for</i> Social Justice. Our goal is to build programmatic and data science skills, grounded in a politics of justice, to better inform our advocacy as machine learning and AI tools are increasingly used in society.

Even if you’re new to programming, the ability to collect and analyze data independently means you no longer have to take people in power at their word when they tell you what "the data" says. You can find data, evaluate it, analyze it, and present your own findings. This will help you decide who you can trust, and help you find a voice in speaking truth to power. At the same time, you can speak with greater authority when data science and machine learning products are used improperly in society, because you yourself have developed the technical skills to interact with these products on a deeper level.

So, we begin by building a foundation in Python.

## What is Python?

Python is a general-purpose programming language. It can be used for many tasks, including building websites and software, automating tasks, data analysis, and more.

📝 **Poll**: Have you used Python or another programming language before? What have you used it for?

This introduction to Python is built around **data analysis**, although many of these skills will be useful for other applications of Python.

The dataset we will be using in this workshop comes from the social media platform [Reddit](https://www.reddit.com). We will specifically use data from a subreddit, or a sub-community in Reddit, called [r/amitheasshole](https://reddit.com/r/AmItheAsshole). 

The subreddit describes itself as "A catharsis for the frustrated moral philosopher in all of us, and a place to finally find out if you were wrong in an argument that's been bothering you. Tell us about any non-violent conflict you have experienced; give us both sides of the story, and find out if you're right, or you're the asshole."

We've scraped many of the comments from this subreddit, and placed them in a file that looks something like this:

<img src="../../images/aita_ex.png" alt="aita_data_example" width="800"/>

Imagine you're a data scientist wanting to perform an exploratory data analysis on this dataset, using the basics of Python. By the end of this workshop, you'll be doing just that.

<a id='jupyter'></a>
# Working With Jupyter Notebooks

We use Jupyter as our interface for coding. The document you are looking at is called a **Jupyter Notebook**: it allows us to add code, computational output, visualizations and images, along with explanatory text in a single document. 

You can use Jupyter Notebooks for all sorts of data science tasks: from data cleaning and transformation, to exploratory data analysis and visualization, to machine learning, and more.

💡 **Tip**: You can create a new Notebook of your own by clicking on `File -> New -> Notebook`, or by clicking the blue `+` button.

## Two Types of Cells

In Jupyter Notebook documents, code and text are divided into cells which can each be run separately. There are two types of cells: **Markdown cells** and **code cells**. 

## Markdown Cells

This cell is written in **Markdown**, a language that you can use to add formatting elements to plain text – things like headings, bulleted lists, and URLs.

Markdown has its own syntax, which is fairly straightforward. Here's a [cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) if you want to know more. You can also double-click on any of the Markdown cells in the Notebook to see how they are made.

Let's double-click this text cell to see the Markdown code that's rendering this cell. Press **Shift + Enter** to go back to the formatted text.

If you want to create new empty cells, you can use the `+` button at the top of the Notebook. You can change the cell type at the top of the Notebook as well, where it says "Markdown" or "Code".

## Command Mode and Edit Mode 

Jupyter has two modes: Edit mode and Command mode. Edit mode allows you to type into the cells like a normal text editor. Command mode allows you to edit the notebook as a whole, but not type into individual cells.

- Enter **Edit Mode** by pressing **Enter** or double-clicking on a cell’s editor area.
- Enter **Command Mode** by pressing **Esc** or clicking outside a cell's editor area. In command mode you can use shortcuts to work with cells:
    - `c`: copy cell.
    - `v`: paste cell.
    - `d, d` (press key twice): delete cell.
    - `a`: insert cell above.
    - `b`: insert cell below.

## Code Cells

The cell below is a code cell. Press **Shift + Enter** on it to run the code and advance to the next cell.<br>

In [None]:
print('Welcome to D-Lab!')

`print()` is a **function**. A function is like a little program that performs an action on some value or data. You can identify a function through its trailing round parentheses `()`. The `print()` function just prints out whatever you put in between the parentheses.
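
As a small illustration of the description above, here is a sketch of a few function calls. The text values are made up for this example, and `len()` is another built-in Python function that this notebook does not otherwise use; it appears here only to show that other functions follow the same "name plus parentheses" pattern.

```python
# print() displays whatever is placed between the parentheses
print('Hello, world!')

# Several values can be passed at once, separated by commas
print('The answer is', 42)

# len() is another built-in function: it returns the length of a text value
print(len('Hello'))
```

Running this should display the greeting, then `The answer is 42`, then `5`.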

## 🥊 Challenge 1: Printing

Write your own call to the `print()` function in the code cell below. Follow the syntax of the example above, and change the text in the quotation marks.

In [None]:
# YOUR CODE HERE


<a id='variables'></a>

# Variables in Python

You can use Python as a calculator:

In [None]:
10 + 4 * 2
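
When a calculation mixes operators, Python follows the usual order of operations: multiplication comes before addition unless parentheses say otherwise. The numbers below are arbitrary and chosen only for illustration.

```python
# Multiplication is evaluated before addition
print(2 + 3 * 4)    # prints 14

# Parentheses change the order of evaluation
print((2 + 3) * 4)  # prints 20
```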

This is cool, but how can we save the result of this calculation?

In Python, we can assign a value to a **variable**, using the equals sign `=`. You can think of a variable as a "placeholder" or symbolic name for some value.

For example, we can assign the result of the above calculation to a variable called `a`.

📝 **Poll:** What do you think the output of the following code cell will be?

In [None]:
a = 10 + 4 * 2
print(a)

From now on, whenever you refer to `a`, Python will substitute the value we assigned to it. 
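
Here is a minimal sketch of what that substitution means in practice. The variable names `n_example` and `doubled` are hypothetical and chosen so that they do not overwrite `a`.

```python
n_example = 12           # hypothetical value, for illustration only
doubled = n_example * 2  # Python substitutes 12 for n_example here
print(doubled)           # prints 24

# Reassigning n_example later does not change doubled,
# because doubled stored the result (24), not the formula
n_example = 20
print(doubled)           # still prints 24
```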

💡 **Tip**: In Jupyter, you don't always need to use `print` explicitly. When you want to check what value a certain variable holds, you can just type the variable name and run the cell. Let's try it:

In [None]:
a

## 🥊 Challenge 2: Executing Cells Multiple Times

Try using **Shift + Enter** to run the following cell three times. What is the output? Can you explain what is happening?

In [None]:
a = a + 1
print(a)

## Calculating With Variables

The key feature of variables is that we can use them just as if they were values. Let's check out some common operations below. 

🔔 **Question:** What outputs do you expect to be printed in the cell below? 

In [None]:
n_comments_post1 = 1000
n_comments_post2 = 250
n_posts = 355
n_posts_removed = 24
avg_score = 25.2
n_users = 583

# Addition
print(n_comments_post1 + n_comments_post2)

# Subtraction
print(n_posts - n_posts_removed)

# Multiplication
print(avg_score * n_posts)

# Division
print(n_posts / n_users)

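You might have noticed the pound signs `#` in the code cell above. These mark **comments**: Python ignores everything from the `#` to the end of that line, so that text won't run as code. Try it in the cell below by adding a `#` in front of the lines that should not run.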

In [None]:
print(1 + 1)

This line should be commented
this_variable_too = 'a' * 'b'
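
For comparison, here is a short sketch of the two common ways comments are written. The variable `n_replies` is a made-up name for this example.

```python
# A full-line comment: Python skips this entire line
n_replies = 3  # An inline comment: only the text after the # is skipped

# Placing a # in front of code "comments it out", so it is not run:
# n_replies = n_replies / 0

print(n_replies)  # prints 3
```

Commenting out a line is a handy way to switch code off temporarily without deleting it.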

## Naming Variables

Variable names **must** follow a few rules:

* They cannot start with a digit.
* They cannot contain spaces, quotation marks, or other punctuation.
* They *may* contain an underscore.
* They are case-sensitive (`sub_reddit` is not the same as `Sub_reddit`).

Ignoring these rules will result in an error in Python. 
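
One way to check these rules without triggering an error is the string method `isidentifier()`, which reports whether a piece of text would be accepted as a name. This is only a sketch: the candidate names are made up, and the method checks the spelling rules only (it does not tell you whether a name is a good choice).

```python
# Hypothetical candidate names, written as text so Python can check them
candidate_names = ['n_posts', 'nPosts2', '2nd_post', 'n posts', 'n-posts']

for name in candidate_names:
    # True means the text follows the naming rules, False means it breaks one
    print(name, '->', name.isidentifier())

# Names are case-sensitive: these are two different variables
score = 1
Score = 2
print(score, Score)  # prints 1 2
```

The first two names pass; the others fail because they start with a digit, contain a space, or contain punctuation.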

## 🥊 Challenge 3: Debugging Variable Names

The following block of code includes variable names that cause an error. Consider the following questions:
1. Which rule is being broken? Can you find this information in the error message?
2. How would you change the code?

In [None]:
subreddit.1 = 'AskHistorians'
platform = 'Reddit'

print(subreddit.1, 'is a subreddit in', platform)

## Debugging

You've seen two types of errors by now: `SyntaxError` (you're writing something wrong) and `NameError` (the variable, function, or module you're calling doesn't exist). There are many other errors. **Don't be daunted by them!**

When you want to try and debug an error, think of the following:

1. **Read the errors.** Pay special attention to the end of the error message: it summarizes what went wrong and points to the line where the error was found. 
2. **Check your syntax.** You might just be spelling something wrong.
3. **Look for help.** You might just be using a function in a wrong way. Get into the habit of reading documentation and finding help online. We'll be doing this in the next workshops.
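
To make the first step concrete, the sketch below triggers a `NameError` on purpose and displays its type and message. It relies on `try` / `except`, which this notebook does not cover, so treat it as an illustration rather than something you need to write yet. The name `undefined_variable` is deliberately never assigned.

```python
try:
    # This name was never assigned, so Python raises a NameError
    print(undefined_variable)
except NameError as error:
    error_type = type(error).__name__  # the kind of error
    error_message = str(error)         # the summary found at the end of the error output
    print(error_type + ':', error_message)
```

The printed line has two parts: the error type, and a short summary naming the variable that Python could not find.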

## 🥊 Challenge 4: What The...

What does the following error tell you?

In [None]:
print 'something went wrong'

When you're programming, most of your time will be spent debugging, looking stuff up, or testing. Relatively little time is actually spent typing out the code.

## The Kernel

The **kernel** is the computational engine that executes the code contained in a Jupyter Notebook. Each time you run a code block, the kernel processes that block, executes the code, and keeps a record of what was run.
 
⚠️ **Warning**: Jupyter remembers every line of code it has executed, **even if that code is no longer displayed in the Notebook**. Deleting a line of code or changing its cell to Markdown does not remove it from the kernel's memory if it has already been run! This can cause a lot of confusion.

### Restarting the Kernel

To clear your session in a Jupyter Notebook, use `Kernel -> Restart` in the menu. The kernel is basically the program actually running the code, so if you reset the kernel, it's as if you just opened up the Notebook for the first time. **All of the variables you set are lost.**

🔔 **Question:** Run the cell below. What is the output?

In [None]:
mystring = 'I am just a string.'

mystring

Now use `Kernel -> Restart` in the menu! Then run the code below. What happens?

In [None]:
mystring

Note that the error message tells you where the error happened (with an arrow, no less!). It is telling us that `mystring` is not defined, since we just reset the kernel.

If you encounter problems like these, you should restart your kernel and rerun all cells in order.

💡 **Tip:** To see which variables we have assigned, you can use the magic command `%whos`. Magic commands are Jupyter-specific: [read about all of them here](https://ipython.readthedocs.io/en/stable/interactive/magics.html). There are a lot of really useful ones!

In [None]:
# This is a magic command
%whos

## 🥊 Challenge 5: Swapping Values

Let's say we have two variables, `start` (assigned `2017`) and `end` (assigned `2023`), and we want to swap their values, so that `start` ends up as `2023` and `end` ends up as `2017`. 

🔔 **Question**: Does the following method accomplish the goal?

In [None]:
start = 2017
end = 2023

start = end
end = start

print(start, end)

Using a third, temporary variable (you could call it `temp`), swap the values of the two variables, so that `start = 2023` and `end = 2017`.

In [None]:
start = 2017
end = 2023

# YOUR CODE HERE
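
Once you have tried the `temp` approach yourself, it may be useful to know that Python also allows a one-line swap that needs no third variable. This is a different technique from the one the challenge asks for, shown here only for comparison. The names `first_year` and `last_year` are hypothetical, so that they do not interfere with your own solution.

```python
first_year = 2017
last_year = 2023

# The right-hand side is evaluated first, then both names are assigned at once
first_year, last_year = last_year, first_year

print(first_year, last_year)  # prints 2023 2017
```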


<a id='demo'></a>

# 🎬 Demo: Working With Data Frames

To cap off this session, here's a demo to see what reproducible data science with Python and Jupyter looks like.

We'll be using a Pandas data frame to store and manipulate the data. Don't worry if you don't understand this yet – you'll learn more about `pandas` in Notebook 3!

First, let's have a look at the data:

In [None]:
import pandas as pd

# Reading in a comma-separated values file
df = pd.read_csv('../../data/aita_top_submissions.csv')
df.head()

Next, we will select only the rows in our data where the submission author wrote a long enough post (more than 20 characters), and assign those to a new variable. We'll then calculate the proportion of posts assigned to each "flair":

In [None]:
# Subsetting a data frame
df_posts = df[df['selftext'].str.len() > 20]
flair_counts = df_posts.value_counts('flair_text', normalize=True)
flair_counts.head(5)
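
If you are curious what the cell above computes, here is a rough plain-Python sketch of the same two steps (filtering by text length, then turning counts into proportions). It is not how `pandas` works internally. The posts and flair labels below are made up for illustration and are not taken from the real dataset.

```python
from collections import Counter

# Made-up example posts (hypothetical, not from the real dataset)
toy_posts = [
    {'selftext': 'I refused to lend my car to my roommate.', 'flair_text': 'Flair A'},
    {'selftext': 'Too short', 'flair_text': 'Flair A'},
    {'selftext': 'I skipped a family dinner to finish my work.', 'flair_text': 'Flair B'},
    {'selftext': 'I asked my neighbor to turn the music down.', 'flair_text': 'Flair A'},
    {'selftext': 'We argued about who should wash the dishes.', 'flair_text': 'Flair C'},
]

# Keep only the posts whose text is longer than 20 characters
long_posts = [post for post in toy_posts if len(post['selftext']) > 20]

# Count the posts per flair, then divide by the total to get proportions
counts = Counter(post['flair_text'] for post in long_posts)
proportions = {flair: n / len(long_posts) for flair, n in counts.most_common()}

print(proportions)
```

With this toy data, four posts pass the length filter, so "Flair A" gets a proportion of 0.5 and the other two flairs get 0.25 each.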

Finally, we'll plot the top 5 most common flairs in a bar plot, showing the proportion of posts assigned each flair:

In [None]:
import matplotlib.pyplot as plt
# Create bar plot
flair_counts.head(5).plot(kind='bar')
# Change xticks
plt.xticks(rotation=15, horizontalalignment='right')
# Change labels
plt.xlabel('Flair Text', fontsize=15)
plt.ylabel('Proportion', fontsize=15)
plt.show()

<div class="alert alert-success">

## ❗ Key Points

* Jupyter has Markdown and code cells.
* `print()` is a function. Functions can be recognized by their trailing parentheses. 
* Use `variable = value` to assign a value to a variable in order to store it in memory.
* Use `print(variable)` to display the value of `variable`.
* In the menu bar, go to `Kernel -> Restart Kernel` to restart Python. All your assigned variables will be lost.
     
</div>