# A Quick and Dirty Intro to Python

In this notebook you'll work with many of the most commonly used tools in Python. We'll begin with a brief overview of the history of Python and how to set up your Python environment, and then declare your first variables! We'll talk about the different data types and some of the more advanced data structures, as well as foundational syntax such as for loops, conditionals, and keywords. Then we'll move on to NumPy and play around with the new data structure it offers, exploring what NumPy is used for and why it shows up in so many data science projects. Finally, we'll go over pandas and how to use it to represent your data in a convenient way. A rough guiding list of topics can be seen below:
* Python Basics
    * Historical background
    * Variable Declarations
    * Indentation
    * Conditionals
    * Loops
    * Keywords
    * Data types
* NumPy Basics (https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf)
    * Multidimensional Arrays
    * Extended Operators
    * Array Manipulation
    * Saving/Loading Arrays
    * Memory Mapping
    * Data Types
* Pandas Basics
    * Dataframes
    * Pre-processing
    * Feature Engineering
    * Using NumPy Arrays
    * CSV Conversion
    


## Python Basics

### Data Types

#### Numbers

Let's start off with defining data types. These are what ultimately drive a programming language, and tend to vary between languages. One broad category of data types is that of numbers. In Python, we have 3 types of numbers: integer (`int`), floating-point (`float`), and complex (`complex`).

In [9]:
a= 3 # Integer/int

print(a, "is of type", type(a))

a= 3.1 # Floating-point/float

print(a, "is of type", type(a))

a= 3.1+4j # Complex/complex

print(a, "is of type", type(a))

3 is of type <class 'int'>
3.1 is of type <class 'float'>
(3.1+4j) is of type <class 'complex'>


**Remark:** Notice how we never declared the *type* of the variable `a`, and that Python had no issue putting a float in `a` even though it previously held an integer. This is because Python is a **dynamically typed** language: a type belongs to the value, not to the variable name.

**Remark:** Notice that we don't include semicolons (`;`) at the end of our lines. In Python, semicolons are allowed but *not* required, and most people omit them entirely.

We won't work much with complex numbers in this workshop, since most people taking it will rarely need them. From here on we'll focus on floats and integers. Let's see some basic operations on these numbers.

In [50]:
a=2
b=3.5
c=6

print(a+b) # Basic addition. If one argument is a float, even if the other is an integer, the result gets "promoted" to a float

print(c/a) # Float ("true") division. Even when a evenly divides c and both are integers, 
           # c/a is still a float for the sake of maintaining precision

print(b/c) # The output is a float with finite precision (no infinitely repeating decimals) 

print(b//c) # Floor division. Divides and then rounds down; the result is still a float because b is a float

print(c/a) # Once more: with two integer arguments, float division still outputs a float

print(c//a) # But floor division of two integers results in an integer!

5.5
3.0
0.5833333333333334
0.0
3.0
3
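
"Rounds down" deserves one extra look, because it matters for negative numbers. The short sketch below (the variable names are just for illustration) shows that `//` rounds toward negative infinity, which is not the same as chopping off the decimal part:

```python
# Floor division always rounds toward negative infinity.
pos = 7 // 2      # 3.5 rounds down to 3
neg = -7 // 2     # -3.5 rounds down to -4, not -3
mixed = 7.0 // 2  # a float argument gives a float result: 3.0

# For comparison, int() truncates toward zero instead.
truncated = int(-7 / 2)  # -3.5 truncates to -3

print(pos, neg, mixed, truncated)
```

So if you want "drop the decimals" behaviour for negative values, `int(x / y)` and `x // y` will not agree.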


Aside from numbers, we also have a data type known as a boolean:

In [8]:
a=True # Boolean True

print(a, "is of type", type(a))

a=False # Boolean False

print(a, "is of type", type(a))

True is of type <class 'bool'>
False is of type <class 'bool'>


#### Sequences

Next we have the "sequence" data types. These are incredibly versatile and will appear quite frequently, so take note! There are three of them: string (`str`), `list`, and `tuple`. All three are ordered collections of elements. A string is restricted to individual characters, while lists and tuples can hold elements of any type. To access an element, you refer to its "index", which is just the element's position within the collection. The first element is at index 0, the second at index 1, and so on.

In [55]:
a="This is a String" # String/str

print(a[0]) # Access element at index 0
print(a[1]) # Access element at index 1
print(a[2]) # Access element at index 2
print(a[3]) # Access element at index 3

print(a, "of type", type(a))

a='This is also a String' # String/str

print(a, "of type", type(a))

a='''This is still a String''' # String/str

print(a, "of type", type(a))

a=[1,2+1j,3.14,"Four"] # List/list

print() # New line
print(a[0]) # Access element at index 0
print(a[1]) # Access element at index 1
print(a[2]) # Access element at index 2
print(a[3]) # Access element at index 3

print(a, "is of type", type(a))

a=("One",2.71,3+4j,4,5,6) # Tuple/tuple

print()
print(a[0]) # Access element at index 0
print(a[1]) # Access element at index 1
print(a[2]) # Access element at index 2
print(a[3]) # Access element at index 3

print(a, "is of type", type(a))

T
h
i
s
This is a String of type <class 'str'>
This is also a String of type <class 'str'>
This is still a String of type <class 'str'>

1
(2+1j)
3.14
Four
[1, (2+1j), 3.14, 'Four'] is of type <class 'list'>

One
2.71
(3+4j)
4
('One', 2.71, (3+4j), 4, 5, 6) is of type <class 'tuple'>


The last data type we'll cover is the dictionary (`dict`). Unlike a sequence, it is not indexed by position: it stores `key:value` pairs, and you look up a value by its key.

In [36]:
a={0:"Zero",1:"One", 3:"Three", "Four":4} # Dictionary/dict

print(a, "is of type", type(a))

print(a[0]) # Access value with KEY 0

print(a[1]) # Access value with KEY 1

#print(a[2]) # Access value with KEY 2, but no such key exists! (raises KeyError)

print(a[3]) # Access value with KEY 3

#print(a["3"]) # Access value with KEY "3", but no such key exists! The string "3" is not the integer 3

print(a["Four"]) # Access value with KEY "Four"

#print(a["four"]) # Access value with KEY "four", but no such key exists! Keys are case-sensitive

{0: 'Zero', 1: 'One', 3: 'Three', 'Four': 4} is of type <class 'dict'>
Zero
One
Three
4
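
The commented-out lookups above would each raise a `KeyError`. As a rough sketch of how you might deal with keys that may or may not exist, here are three common options using the same dictionary (renamed `numbers` here purely for illustration):

```python
numbers = {0: "Zero", 1: "One", 3: "Three", "Four": 4}

# Option 1: check for the key first with `in`.
has_two = 2 in numbers            # False, there is no key 2
has_str_three = "3" in numbers    # False, the key is the integer 3, not "3"

# Option 2: use .get(), which returns a fallback instead of raising.
fallback = numbers.get(2, "missing")

# Option 3: attempt the lookup and catch the KeyError.
try:
    numbers["four"]               # keys are case-sensitive
except KeyError as err:
    missing_key = err.args[0]     # the key that was not found

print(has_two, has_str_three, fallback, missing_key)
```

Which option reads best depends on the situation; `.get()` is handy when a sensible default exists.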


**Aside:** Data types have a property known as *mutability* or its opposite, *immutability*. It's important to remember that a value is simply encoded bits occupying memory somewhere in our machine. A data type is *mutable* if a value of that type can be modified after being created, that is, we can change its contents mid-process without changing which object we refer to. It is *immutable* if it cannot be modified after being created, so any "change" actually produces a new object (the example makes this clearer). The following are *immutable* data types: numbers (including booleans), strings, tuples, and frozensets. Lists and dictionaries, by contrast, are *mutable*. In this workshop we won't really touch frozensets, but it's worth noting that they, true to their name, are frozen and *immutable*.

In [32]:
#Example of immutability

x=10 # Initializing x with 10

print("x:",id(x)) # Printing x's ID

y=x # Setting y=x

print("y:",id(y)) # Printing y's ID. Since y=x, y refers to the same object as x

y=y+1 # y+1 creates a new integer object, and y now refers to it

print("y+1:",id(y)) # Since y no longer refers to the same object as x, its ID changes

print("x:",id(x)) # x is unaffected (original ID for comparison)

x: 140732641809504
y: 140732641809504
y+1: 140732641809536
x: 140732641809504
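
For contrast, here is a small sketch of the mutable case using a list. The names (`same_before`, `tuple_error`, and so on) are only illustrative. Because a list can be modified in place, changing it through `y` also changes what `x` sees, and the ID never changes:

```python
# Example of mutability

x = [1, 2, 3]  # Lists are mutable
y = x          # y refers to the same list object as x

same_before = id(x) == id(y)  # True: one object, two names

y.append(4)    # Modify the list in place through y

same_after = id(x) == id(y)   # Still True: no new object was created
print(x)       # x sees the change too, since it is the same list

# Tuples are immutable, so in-place modification is rejected.
t = (1, 2, 3)
tuple_error = False
try:
    t[0] = 99
except TypeError:
    tuple_error = True

print(same_before, same_after, tuple_error)
```

If you want an independent copy of a list rather than a second name for the same one, `y = x.copy()` is one way to get it.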


### Control Flow

#### Logic

Conditionals and logic form the building blocks for more complex, and generally more useful, programs. We've already mentioned the booleans `True` and `False`, but how do we use them? To manipulate them, we introduce the *logical operators*: `not`, `and`, and `or`. If you've used other programming languages, it may be a bit weird to see the *logical operators* written as words rather than a mess of symbols like `!`, `&&`, and `||`, but Python was designed to be as readable as possible, and hence opted for words instead.

In [39]:
a= True
b= False

print(a and b) # True and False is False

print(a or b) # True or False is True

print(not a) # Not True is False

print(not b) # Not False is True

False
True
False
True


You aren't restricted to working with pure booleans either. Python has what are known as *comparison operators*, which can be used in all sorts of situations and evaluate to booleans. Let's see them coded up.

In [58]:
a=3
b=7
c=3

print(a==c) # A simple equality. Note that it uses 2 "=" signs to differentiate it from assignment

print(a > 3) # Strictly greater than inequality. Note 3 is not strictly greater than 3 

print(a >= 3) # Greater than or equal to. 3 is not strictly greater than 3, but is equal to 3

print(b < a) # Strictly less than inequality

print(a <= 3) # Less than or equal to.

print(a!=3) # Not equal to
 

True
False
True
False
True
False
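
Since comparisons evaluate to booleans, they can be combined with the logical operators from earlier. The following is a minimal sketch (the variable names are made up for illustration) that also shows Python's chained comparison shorthand:

```python
a = 3
b = 7

# Combine two comparisons with `and`.
in_range = a > 0 and a < 10   # True

# Python also lets you chain comparisons, which means the same thing here.
chained = 0 < a < 10          # True

# `or` needs only one side to be True.
either = a > 5 or b > 5       # True, because b > 5

# `not` flips the result of the whole parenthesised expression.
neither = not (a > 5 or b > 5)  # False

print(in_range, chained, either, neither)
```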


#### Conditionals

To harness these logical instruments, we use conditional code blocks such as `if`, `else`, and `elif`. Let's jump in!

In [70]:
a=3
b=7
c=3

if a==c: 
    print("Apparently a=c. Who would've thought?")

if a>3:
    print("Well this really shouldn't be printed")
elif a>2:
    print("Yeah hopefully this gets printed, or else we have an issue")
    
if b==c:
    print("If this gets printed either you messed with the values or this is broken")
elif b<=c:
    print("This also is wrong. Just wrong.")
else:
    print("I mean nothing else worked, so this is our default!")

Apparently a=c. Who would've thought?
Yeah hopefully this gets printed, or else we have an issue
I mean nothing else worked, so this is our default!


Now, if/elif/else are great and all, but they aren't the best solution in every situation. Suppose we wanted to output the name of month number `MONTH`. We could use a lot of if/elif statements...

In [76]:
MONTH=3

if MONTH==1:
    print("Jan")
elif MONTH==2:
    print("Feb")
elif MONTH==3:
    print("Mar")
#
#
#
#
#
#
#
elif(MONTH==12):
    print("Dec")
else:
    print("Some default text")

Mar


That's messy. There's a lot of overhead for a relatively simple task, but how could we do it better? Well, a dictionary! You may initially want to use a list or tuple, and that's a good instinct, but they're not as flexible as dictionaries. For one, our months start at 1 and end at 12, while a list or tuple would be indexed from 0 to 11. This is actually very simple to get around (we could just index by `MONTH-1`), but what if we had the opposite problem: given the name of a month, find its position in the year? Lists and tuples can only be indexed by integer position, not by a string, so they wouldn't work. Dictionaries, however, have no such limitation! In general, if you are going to have many conditionals that simply compare one variable against a fixed set of values, it may be worth switching to a dictionary.

In [77]:
switcher = {
        1: "January",
        2: "February",
        3: "March",
        4: "April",
        5: "May",
        6: "June",
        7: "July",
        8: "August",
        9: "September",
        10: "October",
        11: "November",
        12: "December"
    }

switcher[MONTH]

'March'
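
Two loose ends from the discussion above can be sketched in a few lines: the `else` default from the if/elif version, and the reverse lookup from month name to number. The names `month_names`, `month_name`, and `month_numbers` are hypothetical helpers introduced only for this example:

```python
month_names = {
    1: "January", 2: "February", 3: "March", 4: "April",
    5: "May", 6: "June", 7: "July", 8: "August",
    9: "September", 10: "October", 11: "November", 12: "December",
}

def month_name(month):
    # .get() plays the role of the final `else` branch:
    # it returns the default when the key is not in the dictionary.
    return month_names.get(month, "Some default text")

# Reverse lookup: swap keys and values with a dictionary comprehension.
month_numbers = {name: number for number, name in month_names.items()}

print(month_name(3))           # a valid month
print(month_name(13))          # falls back to the default
print(month_numbers["March"])  # name -> number
```

Plain indexing such as `month_names[13]` would raise a `KeyError`, so `.get()` is the gentler choice whenever an invalid value is possible.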