# Python Lists

Unlike **int**, **bool**, etc., a list is a `compound data type`, i.e. **you can group values** together:

    a = "is"
    b = "nice"
    my_list = ["my", "list", a, b]

In [1]:
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# Create list areas
areas = [hall, kit, liv, bed, bath]

# Print areas
print(areas)

[11.25, 18.0, 20.0, 10.75, 9.5]


### Create list with different types

A list can contain any **Python type**. 

Although not common, a list can also contain a mix of Python types including strings, floats, booleans, etc.

In [1]:
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# Adapt list areas
areas = ["hallway", hall, "kitchen", kit, "living room", liv, "bedroom", bed, "bathroom", bath, True]

# Print areas
print(areas)

['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0, 'bedroom', 10.75, 'bathroom', 9.5, True]


### Select the valid list

A list **can contain** any Python type, and a list is itself a Python type.

That means that a list can also contain a list!

        my_list = [el1, el2, el3]

### List of lists

As a data scientist, you'll often be dealing with a lot of data, and it will **make sense to group some of this data**.

Instead of creating a flat list containing strings and floats, representing the names and areas of the rooms in your house, `you can create a list of lists`. The script below can already give you an idea.

Don't get confused here: "hallway" is a string, while hall is a variable that represents the float 11.25 you specified earlier.

In [4]:
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# house information as list of lists
house = [["hallway", hall],
         ["kitchen", kit],
         ["living room", liv],
         ["bedroom", bed],
         ["bathroom", bath]]

# Print out house
print(house)

# Print out the type of house
print(type(house))

[['hallway', 11.25], ['kitchen', 18.0], ['living room', 20.0], ['bedroom', 10.75], ['bathroom', 9.5]]
<class 'list'>


### Subset and conquer

Subsetting Python lists is a piece of cake. 

Indexing starts at 0.

The code sample below creates a list x and then selects "b" from it.

Negative indexes count from the end of the list, so -1 refers to the last element.

    x = ["a", "b", "c", "d"]
    x[1]
    x[-3] # same result!
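
As a small sketch of how negative and positive indexes relate (the loop and the variable names `last` and `second` are illustrative, not part of the original notes): index `-k` refers to the same element as index `len(x) - k`.

```python
x = ["a", "b", "c", "d"]

# A negative index -k counts from the end: x[-k] is x[len(x) - k]
for k in range(1, len(x) + 1):
    assert x[-k] == x[len(x) - k]

last = x[-1]      # "d", the last element
second = x[-3]    # "b", the same element as x[1]
print(last, second)
```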

In [5]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Print out second element from areas
print(areas[1])

# Print out last element from areas
print(areas[-1])

# Print out the area of the living room
print(areas[5])

11.25
9.5
20.0


### Subset and calculate

Subsetting means selecting a single value from a list.

After you've extracted values from a list, you can use them to perform additional calculations. 

Take this example, where the second and fourth elements of a list x are extracted.

The resulting strings are **pasted together** using the + operator:

        x = ["a", "b", "c", "d"]
        print(x[1] + x[3])

In [6]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Sum of kitchen and bedroom area: eat_sleep_area
eat_sleep_area = areas[3] + areas[7]

# Print the variable eat_sleep_area
print(eat_sleep_area)

28.75


### Slicing

Slicing is selecting **multiple elements** from your list.

Syntax:

    my_list[start:end]
    
The **start index will be included**, while **the end index is not**.

The code sample below shows an example. A list containing "b" and "c", the elements at indexes 1 and 2, is selected from a list x:

    x = ["a", "b", "c", "d"]
    x[1:3]
    
    # result: ['b', 'c']
    
The elements with index 1 and 2 are included, while the element with index 3 is not.

In [1]:
x = ["a", "b", "c", "d"]
y = x[1:3]
y

['b', 'c']
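
The following sketch restates the inclusive-start, exclusive-end rule in code. The names `start` and `end` are illustrative; the length check holds when both indexes lie within the list. It also shows that a slice is a new list, so changing it does not touch the original.

```python
x = ["a", "b", "c", "d"]

start, end = 1, 3
y = x[start:end]                # elements at indexes 1 and 2

print(y)                        # ['b', 'c']
print(len(y) == end - start)    # True

# A slice is a new list, so changing it leaves x untouched
y[0] = "z"
print(x)                        # ['a', 'b', 'c', 'd']
```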

### Slicing and dicing (2)

It's also possible **not to specify** indexes. 

If you don't specify the start index, Python will start your slice at the beginning of the list.

If you `don't specify the end index`, the slice `will go all the way to the last element` of your list.

In [2]:
x = ["a", "b", "c", "d"]
print(x[:2])
print(x[2:])    
print(x[:])    

['a', 'b']
['c', 'd']
['a', 'b', 'c', 'd']


### Subsetting lists of lists

`To subset lists of lists`, you can use the same technique as before: `square brackets`.

For the list x below, x[2] results in a list, which you can subset again by adding another pair of square brackets.

The first pair of square brackets accesses the `outer list`, the second the `inner` one.

In [3]:
x = [["a", "b", "c"],
     ["d", "e", "f"],
     ["g", "h", "i"]]
print(x[2][0])
print(x[2][:2])
print(x[2])

g
['g', 'h']
['g', 'h', 'i']
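
Applied to the house data from earlier, double subsetting might look like the sketch below. The shortened `house` list and the variable names are illustrative, and the last step uses a list comprehension, which these notes have not introduced.

```python
house = [["hallway", 11.25],
         ["kitchen", 18.0],
         ["living room", 20.0]]

kitchen = house[1]            # the inner list ['kitchen', 18.0]
kitchen_name = house[1][0]    # 'kitchen'
kitchen_area = house[1][1]    # 18.0

# Collect the second item of every inner list
all_areas = [room[1] for room in house]

print(kitchen, kitchen_name, kitchen_area)
print(all_areas)              # [11.25, 18.0, 20.0]
```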


### Replace list elements

To replace a list element:

1. subset the list and 

2. assign new values to the subset. 

You can select single elements or you can change entire list slices at once.

In [5]:
x = ["a", "b", "c", "d"]

x[1] = "r"
print(x)
x[2:] = ["s", "t"]
print(x)

['a', 'r', 'c', 'd']
['a', 'r', 's', 't']


### Extend a list

Extending a list means adding elements to it.

You can use the **+ operator**, which pastes two lists together into a new list.

In [6]:
x = ["a", "b", "c", "d"]
y = x + ["e", "f"]

print(y)

['a', 'b', 'c', 'd', 'e', 'f']
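
One detail worth checking for yourself: `+` builds a new list and leaves the original untouched. A minimal sketch:

```python
x = ["a", "b", "c", "d"]
y = x + ["e", "f"]    # builds a new list

print(x)              # ['a', 'b', 'c', 'd'] (unchanged)
print(y)              # ['a', 'b', 'c', 'd', 'e', 'f']
print(y is x)         # False: y is a different list object
```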


### Delete list elements

To `remove elements` from your list, use the `del` statement (the parentheses in `del(x[1])` are optional).

    x = ["a", "b", "c", "d"]
    del(x[1])
    
`Pay attention here:` after deleting an element, the indexes of the elements that follow it shift down by one automatically!

In [7]:
x = ["a", "b", "c", "d"]
del(x[1])
print(x)

['a', 'c', 'd']
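
The index shift can be seen in a short sketch. Deleting a slice, shown in the second step, is an addition to what the notes cover.

```python
x = ["a", "b", "c", "d"]

del x[1]          # removes "b"
print(x[1])       # c: the element "c" has moved into index 1

del x[1:]         # a slice removes everything from index 1 onwards
print(x)          # ['a']
```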


### Copying a list

To copy a list, we can use either of two techniques:

**1. using the list() function**

    x_copy = list(x)

**2. using the .copy() method**

    x_copy = x.copy()

In [12]:
# Wrong way: plain assignment does not copy the list
# Create list x
x = [1, 2, 3]

# Create x_copy (just another name for the same list)
x_copy = x

# Modify the copy
x_copy[0] = 5.0

# Print x
print(x)

# the original list is affected too

[5.0, 2, 3]


In [13]:
# Create list x
x = [1,2,3]

# Create x_copy
x_copy = list(x)

# Modify the copy
x_copy[0] = 5.0

# Print x
print(x)

# the original list remains unaffected

[1, 2, 3]


In [14]:
# Create list x
x = ['a','e','i','o']

# Create x_copy
x_copy = x.copy()

# Modify the copy
x_copy[0] = 5.0

# Print x
print(x)

# the original list remains unaffected

['a', 'e', 'i', 'o']
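
The difference between the three cells above can be checked with the `is` operator, which tests whether two names refer to the same object. This is a sketch with illustrative variable names; the full slice `x[:]` is a third way to copy that the notes do not list.

```python
x = [1, 2, 3]

alias = x            # no copy: a second name for the same list
copy_a = list(x)     # new list with the same elements
copy_b = x.copy()    # same effect as list(x)
copy_c = x[:]        # a full slice also creates a copy

print(alias is x)                               # True
print(copy_a is x, copy_b is x, copy_c is x)    # False False False

alias[0] = 5.0
print(x)             # [5.0, 2, 3]
print(copy_a)        # [1, 2, 3]
```

All three techniques make shallow copies: in a list of lists, the inner lists are still shared between the original and the copy.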


# Functions

### Familiar functions

Out of the box, Python offers a bunch of `built-in functions`. You already know two such functions: `print()` and `type()`. 

You've also used the functions `str()`, `int()`, `bool()` and `float()` to switch between data types. 


The general recipe for calling functions and saving the result to a variable is:

    output = function_name(input)
    
Note that this is only useful when the function returns a value; otherwise `output` is `None`.
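
To make the point about return values concrete, here is a small sketch comparing `len()`, which returns a value, with `print()`, which only displays text. The variable names are illustrative.

```python
numbers = [3, 1, 2]

length = len(numbers)     # len() returns an int
shown = print(numbers)    # print() displays the list but returns None

print(length)             # 3
print(shown)              # None
```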

In [7]:
# Create variables var1 and var2
var1 = [1, 2, 3, 4]
var2 = True

# Print out type of var1
print(type(var1))

x = len(var1)
# Print out length of var1
print(x)
print(type(x))

# Convert var2 to an integer: out2
out2 = int(var2)
print(out2)

<class 'list'>
4
<class 'int'>
1


### Help!

Maybe you already know the name of a Python function, but you still have to figure out how to use it. 

To ask for information about a function, use either:

1. the help() function: **help(max)**, or

2. a ? before the function name (in IPython or Jupyter only): **?max**

In [15]:
?max

### Multiple arguments

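In a function's documentation, square brackets around an argument mean that the argument is **optional**.

### **Default Values**

Arguments with default values are also optional: we don't need to specify them explicitly.

e.g. in the signature `sorted(iterable, key=None, reverse=False)`: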

`key=None` means that if you don't specify the key argument, it will be None. 

`reverse=False` means that if you don't specify the reverse argument, it will be False.

In [14]:
# Create lists first and second
first = [11.25, 18.0, 20.0]
second = [10.75, 9.50]

# Paste together first and second: full
full = first + second

# Sort full in descending order: full_sorted
full_sorted = sorted(full, reverse=True)

# Print out full_sorted
print(full_sorted)

[20.0, 18.0, 11.25, 10.75, 9.5]
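
A short sketch of how the defaults of `sorted()` behave. The `words` list and the use of `key=len` are illustrative additions, not part of the original exercise.

```python
full = [11.25, 18.0, 20.0, 10.75, 9.50]

ascending = sorted(full)                   # reverse defaults to False
descending = sorted(full, reverse=True)    # override the default

words = ["kitchen", "hall", "bathroom"]
by_length = sorted(words, key=len)         # key=len sorts by string length

print(ascending)     # [9.5, 10.75, 11.25, 18.0, 20.0]
print(descending)    # [20.0, 18.0, 11.25, 10.75, 9.5]
print(by_length)     # ['hall', 'kitchen', 'bathroom']
```

Note that `sorted()` returns a new list; `full` itself keeps its original order.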


### String Methods

Note that each data type has its own **associated methods!**

Strings come with **a bunch of methods**, e.g.

    .upper()
    .count()
    .lower()

In [15]:
# string to experiment with: place
place = "poolhouse"

# Use upper() on place: place_up
place_up = place.upper()

# Print out place and place_up
print(place)
print(place_up)

# Print out the number of o's in place
print(place.count('o'))

poolhouse
POOLHOUSE
3


### List Methods

Lists, floats, integers and booleans are also types that come packaged with a bunch of useful methods. 

e.g. list methods:

    index(), to get the index of the first element of a list that matches its input,
    count(), to get the number of times an element appears in a list,
    append(), that adds an element to the list it is called on,
    remove(), that removes the first element of a list that matches the input, and
    reverse(), that reverses the order of the elements in the list it is called on.

In [16]:
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Print out the index of the element 20.0
print(areas.index(20.0))

# Print out how often 9.50 appears in areas
print(areas.count(9.50))

2
1
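
The cell above only exercises `index()` and `count()`. The sketch below tries the other three methods, which change the list in place. The two appended values are hypothetical areas chosen for illustration.

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

areas.append(24.5)     # add a hypothetical poolhouse area at the end
areas.append(15.45)    # add a hypothetical garage area
print(areas)           # [11.25, 18.0, 20.0, 10.75, 9.5, 24.5, 15.45]

areas.remove(18.0)     # remove the first element equal to 18.0
areas.reverse()        # reverse the list in place
print(areas)           # [15.45, 24.5, 9.5, 10.75, 20.0, 11.25]
```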


### Q. What is a Module?

Consider a module to be the same as a code library: a file containing a set of functions you want to include in your application.

https://www.w3schools.com/python/python_modules.asp

### Create a Module

To create a module, just save the code you want in a file with the file extension .py.

In [17]:
# Save this code in a file named mymodule.py

def greeting(name):
    print("Hello, " + name)

### Use a Module
Now we can use the module we just created, by using the import statement:

In [19]:
import mymodule

mymodule.greeting("Jonathan")

Hello, Jonathan
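
To try this without creating a file by hand, the sketch below writes a module into a temporary folder and imports it from there. The module name `demo_module` and the choice to have `greeting()` return the text instead of printing it are assumptions made for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source code of a hypothetical module; the name demo_module is made up here
source = 'def greeting(name):\n    return "Hello, " + name\n'

with tempfile.TemporaryDirectory() as folder:
    Path(folder, "demo_module.py").write_text(source)
    sys.path.insert(0, folder)        # let import find the new file
    importlib.invalidate_caches()     # make sure the new file is noticed
    try:
        demo_module = importlib.import_module("demo_module")
    finally:
        sys.path.remove(folder)       # restore the search path

message = demo_module.greeting("Jonathan")
print(message)    # Hello, Jonathan
```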


### Packages in Python

A module is a single Python file, and a package is a collection of Python modules.

E.g. from the math module (part of Python's standard library) we can import pi, a floating-point approximation of π.

In [22]:
# Definition of radius
r = 0.43

# Import the math module
import math
# Calculate C
C = 2 * math.pi * r

print(math.pi)
# Calculate A
A = math.pi * (r ** 2)

# Build printout
print("Circumference: " + str(C))
print("Area: " + str(A))

3.141592653589793
Circumference: 2.701769682087222
Area: 0.5808804816487527


### Selective import

General imports, like import math, import the entire module.

However, we usually need only a few things from it; in that case we can selectively import just the names we need.

e.g. import only pi from the math module:

    from math import pi

In [23]:
# Definition of radius
r = 192500

# Import pi, radians and sqrt from the math module
from math import pi, radians, sqrt

# Travel distance of Moon over 12 degrees. Store in dist.
dist = r * radians(12)

root = sqrt(dist)
print(root)
# Print out dist
print(dist)

200.79119931179505
40317.10572106901
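
As a quick check that both import styles reach the same objects, consider the sketch below. The comparison with `math.isclose()` is an addition for illustration.

```python
import math
from math import pi, radians

# Both names refer to the same value; only the way to reach it differs
print(math.pi == pi)                       # True

# radians() converts degrees to radians: 180 degrees is pi radians
print(math.isclose(radians(180), pi))      # True

r = 192500
dist = r * radians(12)    # same calculation as in the cell above
print(dist)
```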


# NumPy and NumPy Arrays

NumPy is a powerful package for data science.

The list baseball defined below represents the heights of some baseball players in centimeters.

The code then creates a NumPy array from it.

In [27]:
# Create list baseball
baseball = [180, 215, 210, 210, 188, 176, 209, 200]

# Import the numpy package as np
import numpy as np

# Create a numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out type of np_baseball
print(type(np_baseball))

<class 'numpy.ndarray'>


### Generate data

Arguments for np.random.normal(), in order:

1. distribution mean

2. distribution standard deviation

3. number of samples

In [31]:
import numpy as np

height = np.round(np.random.normal(1.8, 0.2, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)
np_city = np.column_stack((height, weight))

print(height[0:10])
print(weight[0:10])

[1.72 1.82 1.95 1.54 1.78 1.87 2.08 2.02 1.6  2.28]
[56.08 76.71 68.71 63.73 63.94 81.57 38.09 54.14 96.17 28.4 ]


In [32]:
print(np_city[0:10])

[[ 1.72 56.08]
 [ 1.82 76.71]
 [ 1.95 68.71]
 [ 1.54 63.73]
 [ 1.78 63.94]
 [ 1.87 81.57]
 [ 2.08 38.09]
 [ 2.02 54.14]
 [ 1.6  96.17]
 [ 2.28 28.4 ]]
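
Because the samples are random, the numbers above change on every run. The sketch below uses a seeded generator, `np.random.default_rng` with an arbitrary seed of 42 (both are assumptions, not what the notes use), to get repeatable samples and to check that the sample statistics land close to the requested mean and standard deviation.

```python
import numpy as np

rng = np.random.default_rng(seed=42)    # seeded generator (an assumption)

# Same three arguments: mean, standard deviation, number of samples
height = np.round(rng.normal(1.8, 0.2, 5000), 2)

print(height.shape)                       # (5000,)
print(abs(height.mean() - 1.8) < 0.02)    # sample mean is close to 1.8
print(abs(height.std() - 0.2) < 0.02)     # sample std is close to 0.2
```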


In [42]:
?np.random.normal

### Meters to centimeters

In [35]:
# height is a NumPy array (created above), so * works element-wise

# Import numpy
import numpy as np

# Convert height to height_cm
height_cm = height * 100

# Print height_cm
print(height_cm)

[172. 182. 195. ... 171. 175. 173.]


### BMI

In [36]:
# Import numpy
import numpy as np


# Calculate the BMI: bmi
bmi = weight / (height**2)

# Print out bmi
print(bmi)

[18.95619254 23.15843497 18.06969099 ... 23.27895763 20.48326531
 16.74629958]


### Subsetting Numpy array

We can subset NumPy arrays the same way we subset regular lists.

In [57]:
x = [4 , 9 , 6, 3, 1]

x[1]
    
import numpy as np
    
y = np.array(x)
y[1]

9

### Slicing numpy array

In [58]:
import numpy as np

# slicing works the same way as for lists
xx = np.array([1, 2, 3])
xx[:2]

array([1, 2])

With NumPy, we can also subset using boolean NumPy arrays:

In [44]:
import numpy as np
x = [4 , 9 , 6, 3, 1]

y = np.array(x)

high = y > 5  # returns a boolean array
print(high)

print(y[high])

[False  True  True False False]
[9 6]


In [38]:
import numpy as np

# Create the light array
light = bmi < 21

# Print out light
print(light)

[ True False  True ... False  True  True]


In [39]:
# Print out all BMIs that are below 21
print(bmi[light])

[18.95619254 18.06969099 20.18053276 ...  9.83792419 20.48326531
 16.74629958]


### NumPy Side properties

First, NumPy arrays cannot contain elements with different types, but lists can.

If you try to build such an array, some of the elements' types are changed to end up with a homogeneous array. This is known as `type coercion`.

Second, the typical arithmetic operators, such as +, -, * are element-wise operations for NumPy arrays but not for lists.

In [47]:
# type coercion
import numpy as np

np.array([True, 1, 2]) + np.array([3, 4, False])

# booleans are coerced to integers (True -> 1, False -> 0)

array([4, 5, 2])

In [54]:
# numpy arithmetic

x = [1, 2, 3]
y = [2, 2, 2]

# for lists, + concatenates; there is no element-wise operation
print(x + y)

import numpy as np

xx = np.array(x)
yy = np.array(y)

xx + yy  # element-wise addition

[1, 2, 3, 2, 2, 2]


array([3, 4, 5])
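
Both side effects can be inspected directly. In this sketch the array contents are arbitrary illustrative values; `dtype` reports the single element type NumPy settled on.

```python
import numpy as np

mixed = np.array([1, 2.5, True])    # int, float and bool together
print(mixed.tolist())               # [1.0, 2.5, 1.0]
print(mixed.dtype)                  # float64

with_text = np.array([1.0, "is", True])
print(with_text.dtype.kind)         # U: every element became a string

# Lists repeat under *, arrays multiply element-wise
repeated = [1, 2, 3] * 2
doubled = np.array([1, 2, 3]) * 2
print(repeated)                     # [1, 2, 3, 1, 2, 3]
print(doubled.tolist())             # [2, 4, 6]
```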

### 2D Numpy Arrays

In [59]:
# Create baseball, a list of lists
baseball = [[180, 78.4],
            [215, 102.7],
            [210, 98.5],
            [188, 75.2]]

# Import numpy
import numpy as np

# Create a 2D numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out the type of np_baseball
print(type(np_baseball))

# Print out the shape of np_baseball
print(np_baseball.shape)

print(np_baseball)

<class 'numpy.ndarray'>
(4, 2)
[[180.   78.4]
 [215.  102.7]
 [210.   98.5]
 [188.   75.2]]


### Subsetting 2D NumPy Arrays

Subsetting a column from a regular Python list of lists is a real pain.

For 2D NumPy arrays, however, it's pretty intuitive!

In [63]:
x = [["a", "b"], ["c", "d"]]
[x[0][0], x[1][0]]

['a', 'c']

In [67]:
# using numpy
import numpy as np
np_x = np.array(x)
np_x

array([['a', 'b'],
       ['c', 'd']], dtype='<U1')

In [69]:
np_x[:,0] # selects all values of the first column

array(['a', 'c'], dtype='<U1')
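
The same `[row, column]` notation works on the numeric baseball data from the previous section. A minimal sketch, with illustrative variable names:

```python
import numpy as np

np_baseball = np.array([[180, 78.4],
                        [215, 102.7],
                        [210, 98.5],
                        [188, 75.2]])

second_row = np_baseball[1, :]    # all columns of row 1
weights = np_baseball[:, 1]       # all rows of column 1
one_value = np_baseball[2, 0]     # row 2, column 0

print(second_row.tolist())        # [215.0, 102.7]
print(weights.tolist())           # [78.4, 102.7, 98.5, 75.2]

# np_baseball[2][0] reaches the same element in two steps
print(one_value == np_baseball[2][0])    # True
```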

### 2D Arithmetic

In [72]:
import numpy as np
np_mat = np.array([[1, 2],[3, 4],[5, 6]])
np_mat * 2

array([[ 2,  4],
       [ 6,  8],
       [10, 12]])

### Numpy: Basic Statistics

One of the big reasons why NumPy arrays are more popular than plain lists for data analysis is that we can easily perform statistical calculations on them!

### Average versus median

In [75]:
import numpy as np
x = [1, 4, 8, 10, 12]
print(np.mean(x))
print(np.median(x))

7.0
8.0


In [77]:
import numpy as np
# Create np_height_in from the first column of np_baseball
np_height_in = np.array(np_baseball[:, 0])

# Print out the mean of np_height_in
print(np.mean(np_height_in))

# Print out the median of np_height_in
print(np.median(np_height_in))

198.25
199.0


### Exploring data with NumPy statistics

In [78]:
# np_baseball is available

# Import numpy
import numpy as np

# Print mean height (first column)
avg = np.mean(np_baseball[:, 0])
print("Average: " + str(avg))

# Print median height
med = np.median(np_baseball[:, 0])
print("Median: " + str(med))

# Print out the standard deviation of height
stddev = np.std(np_baseball[:, 0])
print("Standard Deviation: " + str(stddev))

# Print out correlation between first and second column
corr = np.corrcoef(np_baseball[:, 0], np_baseball[:, 1])
print("Correlation: " + str(corr))

Average: 198.25
Median: 199.0
Standard Deviation: 14.635146053251399
Correlation: [[1.         0.95865738]
 [0.95865738 1.        ]]
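
`np.corrcoef()` returns a 2x2 correlation matrix, not a single number. If only the height-weight correlation is needed, it can be read from an off-diagonal entry, as in this sketch (the names `corr_matrix` and `corr` are illustrative):

```python
import numpy as np

np_baseball = np.array([[180, 78.4],
                        [215, 102.7],
                        [210, 98.5],
                        [188, 75.2]])

corr_matrix = np.corrcoef(np_baseball[:, 0], np_baseball[:, 1])

# The diagonal holds each column's correlation with itself (always 1);
# the off-diagonal entries hold the height-weight correlation.
corr = corr_matrix[0, 1]
print(round(corr, 4))    # 0.9587
```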
