# Learning Python - Day 3

## Outline:

- Errors and how to read them
- Concept of Modules
- NumPy/pandas
    - Really basic/high level coverage


# Error!

## What is an error?

- An error is a mistake in your code that causes it to not run
- There are many different types of errors
- Errors are your friend! They tell you what is wrong with your code
- You will encounter errors. Many, many errors. So many errors...
    - Everyone does

## How to read an error

- Read the error from the bottom up
- The last line is the most important: it tells you what type of error it is and gives a short message
- The lines just above it tell you where the error is (the file or cell, the line number, and the offending code)
- In longer error reports, the lines further up show the chain of calls that led to the error
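To make the bottom-up reading concrete, here is a minimal sketch that triggers an error on purpose, captures the report as text, and pulls out the last line. The `divide` function is a hypothetical helper invented for this illustration; `traceback.format_exc()` is part of the standard library.

```python
import traceback


def divide(a, b):
    # Hypothetical helper: fails when b is 0.
    return a / b


try:
    divide(1, 0)
except ZeroDivisionError:
    # Capture the full error report as a string instead of crashing.
    report = traceback.format_exc()

lines = report.strip().splitlines()

print(report)
# The last line holds the error type and message.
print("Error type and message:", lines[-1])
# The lines above it name the file, line number, and code involved.
print("Report starts with:", lines[0])
```

Reading `lines[-1]` first and then working upward mirrors the advice in the list above.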

## Example errors

In [2]:
prant("hello world")

NameError: name 'prant' is not defined

In [3]:
x = 1
y = 2
z = (x + y

SyntaxError: unexpected EOF while parsing (<ipython-input-3-69a4f124dfba>, line 3)

In [5]:
print(math.pi())

NameError: name 'math' is not defined
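For comparison, here is a sketch of how the three cells above could be fixed, one fix per error. Note that after importing `math`, writing `math.pi()` would still fail, because `pi` is a value rather than a function, so the parentheses are dropped here.

```python
import math  # fixes: NameError: name 'math' is not defined

print("hello world")  # 'prant' was a typo for 'print'

x = 1
y = 2
z = (x + y)  # the closing parenthesis was missing
print(z)

print(math.pi)  # pi is a number, not a function, so no parentheses
```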

# Python Modules

A module in Python is simply a file containing Python definitions and statements (similar to a library in R). Putting code into modules is useful because you can import that functionality into your script or IPython session. This generally adds the functionality you need for a specific task.

## What a module provides

Modules provide functions, objects, and classes that you can use in your code. For example, the `math` module provides the `sqrt()` function, which computes the square root of a number, and the `pi` object, which holds the value of pi. The `datetime` module provides the `datetime` class, which you can use to create datetime objects.
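A short sketch of the three kinds of things mentioned above: a function (`math.sqrt`), an object (`math.pi`), and a class (`datetime.datetime`). The numbers and the date are arbitrary values chosen for illustration.

```python
import datetime
import math

# A function provided by a module.
root = math.sqrt(16)

# An object (here, a plain number) provided by a module.
circle_area = math.pi * 2 ** 2

# A class provided by a module, used to create a new object.
moment = datetime.datetime(2024, 1, 15, 9, 30)

print(root)
print(circle_area)
print(moment.year, moment.month, moment.day)
```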



## How to Import a Module

To use a module in your code, you first need to import it (just like an R library). To import a module, use the `import` statement at the top of your code. You can also use the `from...import` statement to import specific attributes or functions from a given module.

In [1]:
import math



Now you can use the functions and variables defined in the `math` module:



In [2]:
print(math.pi)  # prints: 3.141592653589793

3.141592653589793
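The `from...import` form mentioned earlier can be sketched the same way. It brings specific names into your code so you can use them without the `math.` prefix. The `import ... as` line shows a related form that gives the module a shorter alias; the alias `m` is an arbitrary choice for this example.

```python
from math import pi, sqrt  # import only the names you need
import math as m           # import the whole module under an alias

print(pi)            # no 'math.' prefix needed
print(sqrt(9))
print(m.floor(2.7))  # the alias stands in for the module name
```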




# NumPy

NumPy is a Python library for working with arrays. It also provides functions for linear algebra, Fourier transforms, and matrices.

## How to Import NumPy



In [3]:
import numpy as np



## Creating a NumPy Array



In [87]:
arr = np.array([1, 2, 3, 4, 5])
arr2d = np.array([range(10), range(10,20)])
arr3d = np.array(range(3**3)).reshape((3,3,3))
print("A 1 dimensional array:\n", arr, "\n") 
print("A 2 dimensional array:\n", arr2d, "\n")
print("A 3 dimensional array:\n", arr3d, "\n")

A 1 dimensional array:
 [1 2 3 4 5] 

A 2 dimensional array:
 [[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]] 

A 3 dimensional array:
 [[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]] 



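Multidimensional NumPy arrays are indexed with a single tuple of indices, one per dimension (for example `arr2d[1, 3]`), rather than by chaining separate indexing steps (`arr2d[1][3]`).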

In [88]:
print(arr[2]) 
print(arr2d[1, 3])
print(arr3d[0, 2, 1])

3
13
7
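As a sketch of what tuple indexing allows, the example below rebuilds the same `arr2d` and `arr3d` arrays and mixes single indices with slices inside one pair of brackets. The variable names for the results are made up for this illustration.

```python
import numpy as np

arr2d = np.array([range(10), range(10, 20)])
arr3d = np.array(range(3**3)).reshape((3, 3, 3))

# Tuple indexing and chained indexing reach the same element.
same_value = arr2d[1, 3] == arr2d[1][3]

# Each position in the tuple can also be a slice.
row_slice = arr2d[0, 2:5]       # row 0, columns 2 to 4
first_column = arr2d[:, 0]      # every row, column 0
corner_values = arr3d[:, 0, 0]  # first value of each 3x3 block

print(same_value)
print(row_slice)
print(first_column)
print(corner_values)
```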


In [55]:
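linear = np.linspace(0.5, 5, num=10)  # 10 evenly spaced values from 0.5 to 5
logarithmic = np.logspace(0, 5, num=6, base=10)  # 10**0 up to 10**5
geometric = np.geomspace(1, 100000, num=6)  # 6 values from 1 to 100000, evenly spaced on a log scale

print(linear)
print(logarithmic)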
print(geometric)

[0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5 5. ]
[1.e+00 1.e+01 1.e+02 1.e+03 1.e+04 1.e+05]
[1.e+00 1.e+01 1.e+02 1.e+03 1.e+04 1.e+05]




## Array Operations

Note the difference compared to normal lists: arithmetic on arrays is applied element by element, whereas `+` on lists joins them end to end.

In [56]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

print(arr1 + arr2)
print(arr1 + 10)
print(arr2 ** 0)

[5 7 9]
[11 12 13]
[1 1 1]
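To see the contrast with lists directly, this sketch runs the same operators on plain lists and on arrays built from them. The names `list1` and `list2` are introduced here only for the comparison.

```python
import numpy as np

list1 = [1, 2, 3]
list2 = [4, 5, 6]
arr1 = np.array(list1)
arr2 = np.array(list2)

print(list1 + list2)  # lists: joined end to end -> [1, 2, 3, 4, 5, 6]
print(arr1 + arr2)    # arrays: added element by element -> [5 7 9]

print(list1 * 2)      # lists: repeated -> [1, 2, 3, 1, 2, 3]
print(arr1 * 2)       # arrays: each element doubled -> [2 4 6]

# list1 + 10 would raise a TypeError, while arr1 + 10 adds 10 to every element.
```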


In [89]:
print(arr)
print(arr < 3)
mask = arr < 3
print(arr[mask])
arr[mask] = arr[mask] * 5
print(arr)

[1 2 3 4 5]
[ True  True False False False]
[1 2]
[ 5 10  3  4  5]


# Pandas

Pandas is a library used for data manipulation and analysis. It is used to read in data and store it as tables (DataFrames) or named lists (Series). It builds on NumPy arrays by adding names and labeled indices.

- Series == named 1-dimensional NumPy array
- DataFrame == named 2-dimensional NumPy array, or a collection of Series that share an index

It can best be compared to either a spreadsheet (Excel) or to data frames or tibbles in R.
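The relationship between Series, DataFrames, and NumPy arrays described above can be sketched as follows. The fruit counts are illustrative values, not real data.

```python
import pandas as pd

# Two named, one-dimensional Series that share the default index 0..3.
apples = pd.Series([3, 2, 0, 1], name="apples")
oranges = pd.Series([0, 3, 7, 2], name="oranges")

# The values underneath a Series are a NumPy array.
print(apples.to_numpy())

# A DataFrame can be assembled from Series that share an index.
fruit = pd.DataFrame({"apples": apples, "oranges": oranges})
print(fruit)

# Pulling one column back out gives a Series again.
print(type(fruit["apples"]))
```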

## Functionality that Pandas provides

- Reading and writing data
- Selecting subsets of data
- Calculating across rows and columns
- Finding and filling missing data
- Applying functions to data
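Each item in the list above maps to a small piece of pandas code. The sketch below walks through them on a hypothetical toy table (`toy`) with one missing value. It uses an in-memory string instead of a real file so it runs without any data on disk.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value (NaN).
toy = pd.DataFrame({
    "apples": [3.0, 2.0, np.nan, 1.0],
    "oranges": [0, 3, 7, 2],
})

# Reading and writing data: round-trip through CSV text held in memory.
csv_text = toy.to_csv(index=False)
toy_again = pd.read_csv(io.StringIO(csv_text))

# Selecting subsets of data: rows where more than one orange was bought.
many_oranges = toy[toy["oranges"] > 1]

# Calculating across rows and columns (missing values are skipped).
column_totals = toy.sum()     # one total per column
row_totals = toy.sum(axis=1)  # one total per row

# Finding and filling missing data.
n_missing = toy["apples"].isna().sum()
filled = toy.fillna(0)

# Applying functions to data: double every value in one column.
doubled = filled["apples"].apply(lambda value: value * 2)

print(many_oranges)
print(column_totals)
print(n_missing)
print(doubled)
```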

## How to Import Pandas

In [59]:
import pandas as pd



## Creating a Pandas DataFrame Manually



In [60]:
data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2]
}

purchases = pd.DataFrame(data)

print(purchases)

   apples  oranges
0       3        0
1       2        3
2       0        7
3       1        2




## Reading Data from CSV File



In [64]:
df = pd.read_csv('titanic.csv')

print(df.head())

print(type(df))  # prints: <class 'pandas.core.frame.DataFrame'>

df.head()

   Survived  Pclass                                               Name  \
0         0       3                             Mr. Owen Harris Braund   
1         1       1  Mrs. John Bradley (Florence Briggs Thayer) Cum...   
2         1       3                              Miss. Laina Heikkinen   
3         1       1        Mrs. Jacques Heath (Lily May Peel) Futrelle   
4         0       3                            Mr. William Henry Allen   

      Sex   Age  Siblings/Spouses Aboard  Parents/Children Aboard     Fare  
0    male  22.0                        1                        0   7.2500  
1  female  38.0                        1                        0  71.2833  
2  female  26.0                        0                        0   7.9250  
3  female  35.0                        1                        0  53.1000  
4    male  35.0                        0                        0   8.0500  
<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
0,0,3,Mr. Owen Harris Braund,male,22.0,1,0,7.25
1,1,1,Mrs. John Bradley (Florence Briggs Thayer) Cum...,female,38.0,1,0,71.2833
2,1,3,Miss. Laina Heikkinen,female,26.0,0,0,7.925
3,1,1,Mrs. Jacques Heath (Lily May Peel) Futrelle,female,35.0,1,0,53.1
4,0,3,Mr. William Henry Allen,male,35.0,0,0,8.05


## Manipulate pandas dataframes

### Selecting Data

Most operations are not "in place" by default: the original DataFrame isn't changed, so you need to capture the output, either by reassigning the old variable or by making a new one.

In [30]:
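df.head(10)  # the first 10 rows

df.drop('Fare', axis=1)  # every column except 'Fare' (df itself is unchanged)

df.sort_values('Age', ascending=False)  # rows sorted by age, oldest first

df.iloc[[0, 1, 2, 3, 4, 5], [0, 3, 5]]  # rows 0-5 and columns 0, 3, 5, selected by position

df.sort_values('Age', ascending=False).iloc[0:10]  # the 10 oldest passengers; only this last result is displayed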

Unnamed: 0,Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
627,1,1,Mr. Algernon Henry Wilson Barkworth,male,80.0,0,0,30.0
847,0,3,Mr. Johan Svensson,male,74.0,0,0,7.775
490,0,1,Mr. Ramon Artagaveytia,male,71.0,0,0,49.5042
95,0,1,Mr. George B Goldschmidt,male,71.0,0,0,34.6542
115,0,3,Mr. Patrick Connors,male,70.5,0,0,7.75
669,0,2,Mr. Henry Michael Mitchell,male,70.0,0,0,10.5
741,0,1,Capt. Edward Gifford Crosby,male,70.0,1,1,71.0
535,0,3,Mr. Samuel Beard Risien,male,69.0,0,0,14.5
33,0,2,Mr. Edward H Wheadon,male,66.0,0,0,10.5
508,0,3,Mr. James Webber,male,66.0,0,0,8.05
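The point about operations not being in place can be sketched with a small hypothetical table. The first `drop` call throws its result away, so nothing changes. The later calls capture the result in a new variable or reassign the old one.

```python
import pandas as pd

# Hypothetical toy data for illustration.
purchases = pd.DataFrame({"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]})

# The result is not captured, so 'purchases' is unchanged.
purchases.drop("oranges", axis=1)
print(purchases.columns.tolist())  # still ['apples', 'oranges']

# Capture the output in a new variable...
apples_only = purchases.drop("oranges", axis=1)
print(apples_only.columns.tolist())  # ['apples']

# ...or overwrite the old variable.
purchases = purchases.sort_values("apples")
print(purchases["apples"].tolist())  # [0, 1, 2, 3]
```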


Like NumPy arrays, DataFrames are indexed with a tuple. Unlike NumPy arrays, you should go through the `.loc[]` indexer rather than indexing the DataFrame directly. Values are accessed as `{row label}, {column label}`.

In [95]:
passenger_8586_fare = df.loc[[85, 86], ["Name", "Fare"]]
print(passenger_8586_fare)
seniors = df.loc[df["Age"] > 65, "Name"]
seniors

                            Name    Fare
85         Mr. William Neal Ford  34.375
86  Mr. Selman Francis Slocovski   8.050


33                    Mr. Edward H Wheadon
95                Mr. George B Goldschmidt
115                    Mr. Patrick Connors
490                 Mr. Ramon Artagaveytia
508                       Mr. James Webber
535                Mr. Samuel Beard Risien
627    Mr. Algernon Henry Wilson Barkworth
669             Mr. Henry Michael Mitchell
741            Capt. Edward Gifford Crosby
847                     Mr. Johan Svensson
Name: Name, dtype: object
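Because `.loc[]` works with labels while `.iloc[]` works with positions, the two can differ whenever the row labels are not simply 0, 1, 2, and so on. The sketch below uses a hypothetical toy table (`fruit`) with row labels 10, 20, 30 to show the difference.

```python
import pandas as pd

# Hypothetical toy data whose row labels are not 0, 1, 2.
fruit = pd.DataFrame(
    {"apples": [3, 2, 0], "oranges": [0, 3, 7]},
    index=[10, 20, 30],
)

# .loc uses labels: the row labeled 20, the column labeled "oranges".
by_label = fruit.loc[20, "oranges"]

# .iloc uses positions: the second row, the second column.
by_position = fruit.iloc[1, 1]

# A boolean condition can stand in for the row labels.
has_apples = fruit.loc[fruit["apples"] > 0, ["apples"]]

print(by_label, by_position)
print(has_apples)
```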

Putting the labels in the wrong order (columns first, rows second) doesn't work and raises a `KeyError`:

In [96]:
df.loc[["Name", "Fare"], [85, 86]]

KeyError: "None of [Index(['Name', 'Fare'], dtype='object')] are in the [index]"

## A very brief introduction to installing modules:

We have both numpy and pandas pre-installed for you, but if you need to use a module you don't have, you may first need to install it using pip.

You can also install modules using conda.

We will go into more detail about how to install modules on Friday.


In [None]:
# You don't need to actually run this. This is just an example.
# pip install numpy pandas