<a href="https://colab.research.google.com/github/brenngraham/python-tutorials/blob/main/Intro_to_Python_and_Pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro to Python
If you are interested in learning a programming language, Python is one of the best options since it is highly flexible, relatively easy to read, and extensively used in industry, government, and academia.

If you're going to be using Python extensively, you should probably set up an IDE (Integrated Development Environment) on your computer. I would recommend [PyCharm](https://www.jetbrains.com/help/pycharm/installation-guide.html) as a good free option.

But, for the purposes of the Economics Society ML tutorial, we'll be doing all of the coding in the free version of Google Colab.

In Colab there are two kinds of cells:
- text cells (like this one)
- code cells (like the one below)

**To get started, let's write a simple "Hello World" program:**

In [None]:
# Script begins

print("Hello World!")

Hello World!


So, in the program above, the first line is a comment. All comments in Python start with a "#" and are not executed by the computer. They are a way for programmers to leave notes in their code to explain what's going on, what they need to work on in the future, and so on.

The second line of code calls the print() function, which (just as the name implies) prints something out to the user. In this case, we are printing a string (a kind of data in Python made up of a series of characters).

**Now, let's look at how variables work:**

In [None]:
# Declare/initialize a variable to hold an integer (int)
x = 2
# Print the variable out to the user
print(x)

# Declare/initialize a variable to hold a decimal number (float)
y = 5.1
# Print the variable out to the user
print(y)

# Declare/initialize a variable to hold a string (str)
myString = "helloworld"
print(myString)

2
5.1
helloworld


A variable is basically just a way to store a value in your program and give it a name. Variables come in many varieties, including some more complicated kinds called data structures. But the three most important for you to know are integers, floats, and strings.
- **Integer:** Abbreviated as "int", an integer in programming is defined the same way as it is in math: a whole number with no fractional part, which can be positive, negative, or zero.
- **Float:** Short for "floating-point number" (some other languages call this type a "double"), a float basically corresponds to a decimal in math and can be positive, negative, or zero. (Note: these variables cannot hold an infinite number of digits, so infinitely repeating numbers, very large numbers, and very small numbers may not be represented with perfect accuracy.)
- **String:** A string is made up of a sequence of characters, which can include letters, numbers, symbols, and spaces. It is usually denoted by quotation marks.
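
The precision note above is easy to see in practice. The following is a small sketch (the variable name `total` is just for illustration) showing that adding two decimals can give a result that is very slightly off, and how `math.isclose` from the standard library can be used to compare decimals safely:

```python
import math

# 0.1 and 0.2 cannot be stored exactly in binary, so the sum is slightly off
total = 0.1 + 0.2
print(total)                     # prints 0.30000000000000004

# A direct equality check therefore fails
print(total == 0.3)              # prints False

# math.isclose compares with a small tolerance instead
print(math.isclose(total, 0.3))  # prints True
```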

One of the (usually) nice parts of Python is that it automatically detects what type your variables should be, so you don't need to explicitly specify the type.
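
If you are curious which type Python picked, the built-in `type()` function will tell you. Here is a short sketch (the variable names are made up for this example). Note that Python reports decimal numbers as `float`:

```python
# type() reports the type Python inferred for each value
count = 2
price = 5.1
label = "helloworld"

print(type(count))   # <class 'int'>
print(type(price))   # <class 'float'>
print(type(label))   # <class 'str'>

# The same name can later be given a value of a different type
count = "two"
print(type(count))   # <class 'str'>
```

That last step is why the automatic detection is only "usually" nice: nothing stops a variable from quietly changing type partway through a program.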

You can also perform mathematical operations on the numeric variables, as shown below.

In [None]:
# An example of division in Python
x = 4
y = 12
z = y / x
print(z)

3.0
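
Division is only one of the arithmetic operators. As a quick reference, here is a sketch of the most common ones, using two example values chosen for illustration:

```python
a = 12
b = 5

print(a + b)    # addition: 17
print(a - b)    # subtraction: 7
print(a * b)    # multiplication: 60
print(a / b)    # division, always gives a decimal: 2.4
print(a // b)   # floor division, drops the remainder: 2
print(a % b)    # modulo, the remainder of the division: 2
print(a ** 2)   # exponentiation: 144
```

Notice that `/` gives a decimal even when the answer is a whole number, which is why the cell above printed 3.0 rather than 3.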


**But sometimes, you want to get values from the user which you don't know in advance.**

In that case, you need to collect input from the user.

In [None]:
# Prompt the user for input and store the result in the variable name
name = input("Enter your name: ")

# Greet the user
print("Hello ", name)

Enter your name: Bob
Hello  Bob
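
One thing to watch out for: `input()` always hands back a string, even if the user types digits. The sketch below uses a fixed string in place of a real prompt (so it runs without waiting for typing), and the names `raw_age` and `age` are just illustrative:

```python
# Stand-in for: raw_age = input("Enter your age: ")
raw_age = "21"

# The value is text, not a number
print(type(raw_age))        # <class 'str'>

# Convert to an integer before doing arithmetic
age = int(raw_age)
print(age + 1)              # 22

# int() raises a ValueError for text that is not a whole number
try:
    int("twenty")
except ValueError:
    print("That was not a whole number")
```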


**How about if you need the program to do different things depending on what the values are?**

This is called "conditional logic".

In [None]:
# Initialize the variable
num1 = int(input("Enter a number: "))

# Check if the number is greater than 50
if num1 > 50:
  print("Num1 is big")
# Check if the number is greater than 35
elif num1 > 35:
  print("Num1 is medium")
# If none of the previous conditions were true, do this
else:
  print("Num1 is small")

Enter a number: 2
Num1 is small


There are three basic types of conditional statements:
- **if:** The code within an if statement (indicated by the level of indentation) is executed if and only if the condition is true.
- **elif (else if):** The code within an elif statement is executed if the condition for that statement is true and all previous conditions were false.
- **else:** The code within an else statement is executed if all of the previous conditions were false.
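
To see how the three pieces fit together, it helps to try a value that sits right on a boundary. This sketch reuses the thresholds from the cell above but sets the number directly instead of asking for input (the variable `label` is introduced only for this example):

```python
num1 = 50

# Conditions are checked from top to bottom, and only the first match runs
if num1 > 50:
    label = "big"
elif num1 > 35:
    label = "medium"
else:
    label = "small"

print(label)   # medium, because 50 is not greater than 50
```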

**How exactly do those relational operators work?**

There are a variety of relational operators you can use in Python, most of which you are probably already familiar with from math.

All of these operators compare two values and evaluate to either True or False.

In [None]:
a = 10
b = 20

# Less than: returns true if a is less than b, false otherwise.
print(a < b)

# Greater than: returns true if a is greater than b, false otherwise
print(a > b)

# Equal to: returns true if a is equal to b, false otherwise
print(a == b)

# Less than or equal to: returns true if a is less than or equal to b, false
# otherwise
print(a <= b)

# Greater than or equal to: returns true if a is greater than or equal to b,
# false otherwise
print(a >= b)

# Not equal to: returns true if a is not equal to b, false otherwise
print(a != b)

True
False
False
True
False
True
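
A few extra details about comparisons are worth knowing. This is a short sketch with an example value (`score` is a made-up name):

```python
score = 72

# A double == compares two values, while a single = assigns a value
print(score == 72)         # True

# Comparisons can be chained, just like in math notation
print(50 < score <= 100)   # True

# Strings can be compared too, and upper/lower case matters
print("econ" == "Econ")    # False
```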


**What about logical operators?**

Logical operators are used to evaluate if certain logical conditions are met based on combinations of true and/or false conditions.

In [None]:
a = True
b = False

# And: returns true if both a and b are true, false otherwise
print(a and b)

# Or: returns true if either a or b (or both) are true, false otherwise
print(a or b)

# Not: returns the opposite of a
print(not a)

False
True
False
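
In practice, logical operators are most often used to combine relational comparisons. Here is a sketch using two made-up variables, `age` and `has_id`:

```python
age = 20
has_id = True

# Both conditions must be true
print(age >= 18 and has_id)      # True

# At least one condition must be true
print(age < 18 or not has_id)    # False

# "and" is evaluated before "or", so this reads as True or (False and False)
print(True or False and False)   # True
```

When in doubt about the order, adding parentheses makes the intent clear to both Python and the reader.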


**What if you need to do something multiple times?**

If you would like to repeat a process multiple times, you need to use some sort of loop. There are two basic options. The first of these is a for loop:

In [None]:
# A for loop which iterates from 0 to 4
for step in range(5):
  print(step)


0
1
2
3
4


This kind of loop takes the variable named after the keyword "for" and sets it equal to each element of the object specified after the keyword "in", one at a time. In this case, that object is the sequence of integers from 0 up to, but not including, 5 (which is produced by the range() function).
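
The object after "in" does not have to come from `range(5)`. As a sketch, `range()` also accepts a start, stop, and step, and a for loop can walk through a list directly (the list `subjects` is made up for this example):

```python
# range(start, stop, step): count from 2 up to (not including) 11 in steps of 3
for step in range(2, 11, 3):
    print(step)            # prints 2, 5, 8 (one per line)

# A for loop can walk through any sequence, such as a list of strings
subjects = ["micro", "macro", "metrics"]
for subject in subjects:
    print(subject)

# list() shows every value a range will produce
print(list(range(5)))      # [0, 1, 2, 3, 4]
```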

The other option is a while loop:

In [None]:
# Initialize the variable count
count = 0

# Execute the code over and over until count >= 5
while count < 5:
  print("Count: ", count)
  # Increment the value of count
  count += 1

Count:  0
Count:  1
Count:  2
Count:  3
Count:  4


The while loop above does basically the same thing as the for loop we looked at. But, you can also make a while loop which executes an unpredictable number of times:

In [None]:
num = 0

while num != 10:
  print("The number is: ", num)
  num = int(input("Enter a new number or 10 to quit: "))


The number is:  0
Enter a new number or 10 to quit: 5
The number is:  5
Enter a new number or 10 to quit: 7
The number is:  7
Enter a new number or 10 to quit: 9
The number is:  9
Enter a new number or 10 to quit: 11
The number is:  11
Enter a new number or 10 to quit: 13
The number is:  13
Enter a new number or 10 to quit: 10
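
A while loop is also handy when the stopping point depends on a calculation rather than on what the user types. The sketch below counts how many years it takes a balance to double; the starting balance and the 5% interest rate are made-up numbers for illustration:

```python
balance = 100.0
years = 0

# Keep adding 5% interest until the balance has at least doubled
while balance < 200.0:
    balance *= 1.05
    years += 1

print(years)   # 15
```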


**My program is getting kind of big... how do I divide it into different sections?**

Instead of writing your program in one large block, you can break it into chunks called functions. Each function performs a smaller task, and you can string them together to accomplish your goal. This can make your code easier to read, write, and understand.

Functions can have inputs and outputs of almost any data type.

In [None]:
# Defining a function with an input "n"
def factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    # A return statement tells the computer what value should be sent back to
    # wherever the function was called from.
    return result

# This calls the "factorial" function and stores whatever gets returned in the
# variable "result".
result = factorial(5)

# Print the result to the user.
print("Factorial of 5 is:", result)


Factorial of 5 is: 120
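
Functions can also take more than one input, and an input can be given a default value that is used when the caller leaves it out. The function below is a hypothetical example written for this tutorial (`describe_growth` and its parameters are not from any library). It also uses an f-string, which lets you drop values into a string inside curly braces:

```python
def describe_growth(start, end, label="GDP"):
    """Return a sentence describing the percent change from start to end."""
    percent_change = (end - start) / start * 100
    # :.1f rounds the number to one decimal place when it is printed
    return f"{label} changed by {percent_change:.1f}%"

# Uses the default label
print(describe_growth(200, 210))                # GDP changed by 5.0%

# Overrides the default label
print(describe_growth(50, 45, label="Price"))   # Price changed by -10.0%
```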


You can also use functions to incorporate **recursion** into your code. This is when a function calls itself. It can get complicated to understand, but it is often the most natural way to express a solution (though not always the fastest one to run).

Specifically, it is useful if you have a task that can be broken down into smaller, similar tasks. Instead of solving each small task independently, you solve one small task and delegate the rest to the same function. This process continues until the smallest, simplest task is solved directly, and then the solutions are combined to solve the original, larger problem.

A classic example is calculating the factorial of a number. To find the factorial of 5, you can break it down into multiplying 5 by the factorial of 4, which is then broken down into multiplying 4 by the factorial of 3, and so on, until you reach a case simple enough to answer directly, such as the factorial of 1 (which is 1). Then, you combine these results to get the factorial of 5: 5 * 4 * 3 * 2 * 1. The code below uses the factorial of 0 (also defined as 1) as its stopping point, so it works for an input of 0 too.

Recursion can be a powerful and elegant way to solve certain problems, but it's important to define a base case (the simplest problem that can be solved directly) to prevent infinite recursion and ensure that the function eventually terminates.

In [None]:
# Defining a recursive function with an input "n"
def factorial(n):
    # This is the "base" condition where the recursion stops.
    if n == 0:
        return 1
    # This is where the function calls itself, making it recursive.
    else:
        return n * factorial(n - 1)

# Example usage:
result = factorial(5)

# Print the result to the user.
print("Factorial of 5 is:", result)


Factorial of 5 is: 120
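
Another common illustration of recursion is the Fibonacci sequence, where each number is the sum of the two before it. The sketch below is a deliberately simple version (the function name `fibonacci` is our own), and it has two base cases rather than one:

```python
def fibonacci(n):
    # Base cases: the first two Fibonacci numbers are 0 and 1
    if n < 2:
        return n
    # Recursive case: add the two numbers that come before this one
    return fibonacci(n - 1) + fibonacci(n - 2)

for i in range(8):
    print(fibonacci(i))   # prints 0, 1, 1, 2, 3, 5, 8, 13 (one per line)
```

This version recalculates the same values many times, so it gets slow for large inputs. It is a good reminder that a recursive solution can be easy to read without being the quickest to run.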


# Intro to Pandas
Pandas is an open-source library in Python which is very convenient for working with data.

(Fun fact: Pandas was created in 2008 by a guy named Wes McKinney who was working at AQR Capital Management)

**What are the advantages of Pandas?**

- Fast and efficient
- Can pull together data from different files easily
- Flexibly reshapes and pivots data sets
- Can handle time series

**Setting up Pandas**

To install Pandas on your computer manually, run the command "pip install pandas" (without the quotation marks) in a terminal or command prompt. It does not need to be run from any particular folder, as long as pip is on your system path.

But, since we're using Colab, we can just import it. A lot of IDEs (including PyCharm) will also automate this process for you.

In [None]:
# Import the Pandas library and give it the nickname pd
import pandas as pd

You don't have to use the nickname (also called an alias), but it makes it a bit shorter to write out each time.

**Pandas data structures**

**Series:** a one-dimensional labeled array which can hold any type of data. It has axis labels called indexes and basically acts like a column in an Excel sheet. You can import data from a variety of file types (CSV, Excel, SQL, etc.) into a series.

Creating a series:

In [None]:
import numpy as np

# Create an empty series (giving an explicit dtype avoids a warning in some
# Pandas versions)
ser = pd.Series(dtype="float64")
print("Empty Pandas series: ", ser)

# Create a simple array
data = np.array(['e', 'c', 'o', 'n'])

# Put the array into a series
ser = pd.Series(data)

# The \n here is a "newline" character which tells the computer to put a line
# break wherever it is placed in the output.
print("Pandas series with data: \n", ser)

Empty Pandas series:  Series([], dtype: float64)
Pandas series with data: 
 0    e
1    c
2    o
3    n
dtype: object
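
The labels on a series do not have to be 0, 1, 2, and so on. As a sketch, here is a series with custom labels, along with a few ways to pull values out of it (the items and prices are made-up data):

```python
import pandas as pd

# Build a series with custom labels instead of the default 0, 1, 2, ...
prices = pd.Series([1.50, 0.25, 3.00], index=["coffee", "gum", "bagel"])

print(prices["gum"])     # look up one value by its label: 0.25
print(prices.iloc[0])    # look up one value by its position: 1.5
print(prices.mean())     # average of all the values
```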


**Data frame:** a two-dimensional tabular data structure (rows and columns) which can change size and hold a wide variety of data types at once.

Creating a data frame:

In [None]:
# Call the DataFrame constructor with no arguments to make an empty data frame
df = pd.DataFrame()
print(df)

# Making a list of strings
lst = ['Econ', 'is', 'the', 'best', 'major']

# Turn the list into a DataFrame
df = pd.DataFrame(lst)
print(df)

Empty DataFrame
Columns: []
Index: []
       0
0   Econ
1     is
2    the
3   best
4  major
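
Most real data frames have several named columns. One common way to build one is from a dictionary, which pairs each column name with a list of values. The sketch below uses made-up numbers for three hypothetical countries and shows how to add a calculated column and filter rows:

```python
import pandas as pd

# Each dictionary key becomes a column name; each list becomes that column
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "gdp": [100, 250, 80],
    "population": [10, 20, 5],
})

# Create a new column from existing ones
df["gdp_per_capita"] = df["gdp"] / df["population"]

# Keep only the rows that satisfy a condition
rich = df[df["gdp_per_capita"] > 11]

print(df)
print(rich["country"].tolist())   # ['B', 'C']
```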


# Resources for additional exploration

[Google's Python Class](https://developers.google.com/edu/python)

[Introduction to Computer Science and Programming in Python (YouTube/MIT OCW)](https://youtube.com/playlist?list=PLUl4u3cNGP63WbdFxL8giv4yhgdMGaZNA&si=pFWYe08E0h0hxVs0)

[Pandas user guide](https://pandas.pydata.org/docs/user_guide/10min.html)

[Codecademy: Introduction to Pandas and NumPy](https://www.codecademy.com/article/introduction-to-numpy-and-pandas)