## Python Workshop 2

Last time, we covered the absolute basics of Python: the essential data types and the built-in functions. This week we build on those basics.

**We will cover:** 
- Functions 
- Dictionaries
- NumPy Basics
- Pandas Basics

### Functions in Python

+ Define a function using the def keyword
+ Enclose the function's arguments in ()
+ Follow the () with :
+ Indent the body of the function

In [1]:
# Defining the function
def exampleFunction():
    print("Hello, World!")
    
# Calling the function
exampleFunction()

Hello, World!


Notice that a function does not need a return statement. This one simply prints its message.
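As a small sketch of what happens when the return statement is left out: the call still evaluates to a value, namely `None`. The name `greet` is made up for this illustration.

```python
# A function with no return statement implicitly returns None
def greet():
    print("Hello, World!")

result = greet()       # prints the message as a side effect
print(result is None)  # the call itself evaluates to None
```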

In [2]:
# Example of a function with two arguments
def addTwoNumbers(a, b):
    return a + b

addTwoNumbers(9, 10)  # 19

19

We can also use the lambda keyword to write small one-line functions.

In [3]:
add = lambda x, y : x + y
add(9, 10)

19
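A lambda behaves like a function defined with def; it is just limited to a single expression. The sketch below compares the two and shows one common use, passing a lambda as a sort key. The names `add_def`, `add_lambda`, and `words` are made up for illustration.

```python
# The same addition written two ways
def add_def(x, y):
    return x + y

add_lambda = lambda x, y: x + y

# Lambdas are handy as throwaway arguments, for example a sort key
words = ["banana", "fig", "cherry"]
by_length = sorted(words, key=lambda w: len(w))
print(by_length)
```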

In [4]:
def findEvens(list0):
    return [element for element in list0 if element % 2 == 0]

list1 = [1, 1, 1, 1, 1, 3, 4, 3, 3, 5, 5, 6, 6, 6, 6, 8]

findEvens(list1)

[4, 6, 6, 6, 6, 8]

In [5]:
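# Find the first occurrence of a value and replace it (the list is modified in place)
def findAndReplace(list0, value=None, replacement=None):
    # Only replace when both arguments are given and the value is present;
    # checking "is not None" lets 0 be used as a value or a replacement
    if value is not None and replacement is not None and value in list0:
        index = list0.index(value)
        list0[index] = replacement

    return list0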
    
findAndReplace(list1, 4, 6)

[1, 1, 1, 1, 1, 3, 6, 3, 3, 5, 5, 6, 6, 6, 6, 8]

In [6]:
# Find and replace all the even elements with something
def findAndReplaceEven(list0, replacement):
    to_replace = findEvens(list0)
    
    while to_replace:
        curr = to_replace.pop(0)
        list0 = findAndReplace(list0, curr, replacement)
    
    return list0

findAndReplaceEven(list1, "Something")

[1,
 1,
 1,
 1,
 1,
 3,
 'Something',
 3,
 3,
 5,
 5,
 'Something',
 'Something',
 'Something',
 'Something',
 'Something']
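Note that the functions above change the list they receive, so `list1` itself is modified after these calls. As a hedged alternative, the sketch below builds a new list with a comprehension and leaves the input untouched. The name `replace_evens` and the sample list are assumptions made for this example.

```python
def replace_evens(values, replacement):
    # Build a new list; the input list is left untouched
    return [replacement if v % 2 == 0 else v for v in values]

original = [1, 3, 4, 5, 6, 8]
replaced = replace_evens(original, "Something")
print(replaced)
print(original)
```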

### Dictionaries
Dictionaries in Python are a type of data structure that stores data in key-value pairs. They are similar to real-world dictionaries, where you look up a word (the key) to find its definition (the value).

Key Features of Dictionaries:
+ Key-Value Pairs: Each item in a dictionary has a key and a corresponding value.
+ Unique Keys: Keys in a dictionary must be unique and hashable, which in practice means immutable types (like strings, numbers, or tuples of these).
+ Mutable: You can add, remove, or change key-value pairs after the dictionary is created.
+ Insertion-Ordered: Since Python 3.7, dictionaries keep their items in the order they were inserted. They are still accessed by key, not by position.
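The key and mutability rules above can be sketched with a small dictionary. The `menu` dictionary and its values are made up for this illustration.

```python
menu = {"Thin": 10, "Deep-Dish": 12}

menu["Stuffed"] = 14      # add a new key-value pair
menu["Thin"] = 11         # change the value of an existing key
del menu["Deep-Dish"]     # remove a pair

# Tuples are immutable, so they work as keys
menu[("Thin", "Large")] = 16

# Lists are mutable, so using one as a key raises TypeError
try:
    menu[["Thin", "Large"]] = 16
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(menu)
print(list_key_allowed)
```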

In [7]:
pizzas = {"Original":
          {"Toppings": ["Pepperoni", "Peppers", "Sausage"],
           "Crusts": ["Thin", "Deep-Dish"], 
           "Sizes": [10, 12, 14, 16]},
          "Vegetarian":
          {"Toppings":["Mushroom", "Onion"],
           "Crusts": ["Thin", "Deep-Dish"], 
           "Sizes": [12, 14, 16]},
          }

In [8]:
pizzas["Original"]

{'Toppings': ['Pepperoni', 'Peppers', 'Sausage'],
 'Crusts': ['Thin', 'Deep-Dish'],
 'Sizes': [10, 12, 14, 16]}

In [9]:
pizzas["Original"]["Toppings"]

['Pepperoni', 'Peppers', 'Sausage']

In [10]:
pizzas.keys()

dict_keys(['Original', 'Vegetarian'])

In [11]:
pizzas["Original"].keys()

dict_keys(['Toppings', 'Crusts', 'Sizes'])

In [12]:
list(pizzas.keys())

['Original', 'Vegetarian']

In [13]:
pizzas["Prices"] = {"10": 10, "12": 12, "14": 14, "16": 16}

In [14]:
pizzas

{'Original': {'Toppings': ['Pepperoni', 'Peppers', 'Sausage'],
  'Crusts': ['Thin', 'Deep-Dish'],
  'Sizes': [10, 12, 14, 16]},
 'Vegetarian': {'Toppings': ['Mushroom', 'Onion'],
  'Crusts': ['Thin', 'Deep-Dish'],
  'Sizes': [12, 14, 16]},
 'Prices': {'10': 10, '12': 12, '14': 14, '16': 16}}

In [15]:
pizzas["Prices"]["16"] = 20
pizzas["Prices"]

{'10': 10, '12': 12, '14': 14, '16': 20}

In [16]:
pizzas["Prices"].update({"16": 21})
pizzas["Prices"]

{'10': 10, '12': 12, '14': 14, '16': 21}

In [17]:
pizzas

{'Original': {'Toppings': ['Pepperoni', 'Peppers', 'Sausage'],
  'Crusts': ['Thin', 'Deep-Dish'],
  'Sizes': [10, 12, 14, 16]},
 'Vegetarian': {'Toppings': ['Mushroom', 'Onion'],
  'Crusts': ['Thin', 'Deep-Dish'],
  'Sizes': [12, 14, 16]},
 'Prices': {'10': 10, '12': 12, '14': 14, '16': 21}}

In [18]:
for i in list(pizzas["Prices"]):
    pizzas["Prices"].update({i: pizzas["Prices"][i] + 1})

pizzas["Prices"]

{'10': 11, '12': 13, '14': 15, '16': 22}

In [19]:
list(pizzas["Prices"])

['10', '12', '14', '16']
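The price update loop above can also be written with `.items()`, which yields each key together with its value. This is a sketch on a standalone `prices` dictionary (a copy of the values from before the loop), not a change to `pizzas`.

```python
prices = {"10": 10, "12": 12, "14": 14, "16": 21}

# Iterate over key-value pairs directly
for size, price in prices.items():
    print(size, price)

# Build a new dictionary with every price raised by 1
raised = {size: price + 1 for size, price in prices.items()}
print(raised)
```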

### NumPy

NumPy (Numerical Python) is a powerful library in Python widely used for numerical computing, data manipulation, and scientific computing. It provides a high-performance multidimensional array object and tools for working with these arrays, making it a cornerstone of the data science and machine learning ecosystem.
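A minimal sketch of what the array object buys you: arithmetic applies to every element at once, where a plain list needs a loop or comprehension. The variable names are made up for illustration.

```python
import numpy as np

values = [1, 2, 3, 4]
doubled_list = [v * 2 for v in values]   # plain Python needs a loop or comprehension

arr = np.array(values)
doubled_arr = arr * 2                    # NumPy applies the operation to every element
print(doubled_arr)
```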

In [20]:
import numpy as np

In [21]:
zeros = np.zeros(15)
zeros

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [22]:
zeros.reshape(5, 3)

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [23]:
ones = np.ones(15).reshape(5,3)
ones

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [24]:
numbers = np.arange(15).reshape(3, 5)
numbers

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [25]:
numbers[2][1] == numbers[2, 1] # Which number are these indices referring to?

True
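Both forms index row 2 first and column 1 second. The sketch below rebuilds the same array and also pulls out a whole row and a whole column with the comma syntax.

```python
import numpy as np

numbers = np.arange(15).reshape(3, 5)

element = numbers[2, 1]   # row index 2, column index 1
row = numbers[2]          # the whole third row
column = numbers[:, 1]    # the second column of every row
print(element, row, column)
```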

In [26]:
numbers.transpose() # We can also transpose a matrix

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

In [27]:
np.random.seed(42)

randoms = [np.random.randint(1, 101) for i in range(100)]
randoms

[52,
 93,
 15,
 72,
 61,
 21,
 83,
 87,
 75,
 75,
 88,
 100,
 24,
 3,
 22,
 53,
 2,
 88,
 30,
 38,
 2,
 64,
 60,
 21,
 33,
 76,
 58,
 22,
 89,
 49,
 91,
 59,
 42,
 92,
 60,
 80,
 15,
 62,
 62,
 47,
 62,
 51,
 55,
 64,
 3,
 51,
 7,
 21,
 73,
 39,
 18,
 4,
 89,
 60,
 14,
 9,
 90,
 53,
 2,
 84,
 92,
 60,
 71,
 44,
 8,
 47,
 35,
 78,
 81,
 36,
 50,
 4,
 2,
 6,
 54,
 4,
 54,
 93,
 63,
 18,
 90,
 44,
 34,
 74,
 62,
 100,
 14,
 95,
 48,
 15,
 72,
 78,
 87,
 62,
 40,
 85,
 80,
 82,
 53,
 24]

In [28]:
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x[1:7:2]

array([1, 3, 5])

In [29]:
print(x[-2:10])
print(x[-3:3:-1])

[8 9]
[7 6 5 4]


In [30]:
y = np.array([2])
x += y
x

array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
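The addition above works because NumPy broadcasts the one-element array across every element of `x`. A short sketch of the same idea, extended to a 2-D case; `grid` and `offsets` are made-up names for this example.

```python
import numpy as np

x = np.arange(10)
shifted = x + np.array([2])       # the single value is added to every element

grid = np.arange(6).reshape(2, 3)
offsets = np.array([10, 20, 30])  # shape (3,) matches the last axis of grid
shifted_grid = grid + offsets     # offsets are added to each row
print(shifted_grid)
```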

In [31]:
x = np.arange(35).reshape(5, 7)
b = x > 20
print(b)
x[b[:, 5], 1:3]

[[False False False False False False False]
 [False False False False False False False]
 [False False False False False False False]
 [ True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True]]


array([[22, 23],
       [29, 30]])
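The last expression packs several steps into one line. Broken apart, it might read as follows; `mask`, `row_mask`, `selected`, and `flat` are names introduced here for illustration.

```python
import numpy as np

x = np.arange(35).reshape(5, 7)
mask = x > 20

row_mask = mask[:, 5]          # column 5 of the mask: one boolean per row
selected = x[row_mask, 1:3]    # keep rows where row_mask is True, columns 1 and 2
flat = x[mask]                 # a full-shape mask returns a flat 1-D array
print(row_mask)
print(selected)
```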

### Pandas

Pandas leverages NumPy's powerful array-processing capabilities to provide even more advanced data manipulation tools, especially for working with structured data like tables, time series, and data frames.

In [32]:
import pandas as pd

In [33]:
# Creating a NumPy array
data = np.array([[1, 2, 3], [4, 5, 6]])

# Creating a Pandas DataFrame from the NumPy array
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
print(df)

   A  B  C
0  1  2  3
1  4  5  6


In [34]:
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


In [35]:
# Creating a DataFrame from a list of lists
data = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago']
]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


In [36]:
# Creating a DataFrame from a list of dictionaries
data = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'}
]
df = pd.DataFrame(data)
print(df)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


In [37]:
# Selecting columns from an existing DataFrame
df_existing = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
df_new = df_existing[['A', 'B']]
print(df_new)

   A  B
0  1  4
1  2  5
2  3  6


In [38]:
df_existing.iloc[0]

A    1
B    4
C    7
Name: 0, dtype: int64

In [39]:
df_existing.loc[0]

A    1
B    4
C    7
Name: 0, dtype: int64
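`iloc` and `loc` return the same row here only because the default index labels happen to match the row positions. A sketch with a custom index shows the difference; `df_labeled` and its index values are made up for this example.

```python
import pandas as pd

df_labeled = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6]},
    index=[10, 20, 30],  # labels no longer match positions
)

by_position = df_labeled.iloc[0]   # first row, whatever its label
by_label = df_labeled.loc[10]      # row whose index label is 10
print(by_position.equals(by_label))
```

With this index, `df_labeled.loc[0]` would raise a `KeyError`, because no row carries the label 0.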

In [40]:
df.set_index(["City"], inplace=True, drop=False)

In [41]:
df

                Name  Age         City
City
New York       Alice   25     New York
Los Angeles      Bob   30  Los Angeles
Chicago      Charlie   35      Chicago


In [42]:
df.loc["New York"]

Name       Alice
Age           25
City    New York
Name: New York, dtype: object

In [43]:
df = df.drop("City", axis=1)

In [44]:
df.reset_index(inplace=True)

In [45]:
df

          City     Name  Age
0     New York    Alice   25
1  Los Angeles      Bob   30
2      Chicago  Charlie   35


In [46]:
# Creating a DataFrame from a CSV file
df = pd.read_csv('bearAttacks.csv')

In [47]:
df.head()

|   | Name | age | gender | Date | Month | Year | Type | Location | Description | Type of bear | Hunter | Grizzly | Hikers | Only one killed | Latitude | Longitude |
|---|------|-----|--------|------|-------|------|------|----------|-------------|--------------|--------|---------|--------|-----------------|----------|-----------|
| 0 | Mary Porterfield | 3.0 | female | 19/05/1901 | May | 1901 | Wild | Job, West Virginia | The children were gathering flowers near their... | Black bear | 0 | 0 | 0 | 0 | 38.864277 | -79.556998 |
| 1 | Wilie Porterfield | 5.0 | male | 19/05/1901 | May | 1901 | Wild | Job, West Virginia | The children were gathering flowers near their... | Black bear | 0 | 0 | 0 | 0 | 38.864277 | -79.556998 |
| 2 | Henry Porterfield | 7.0 | male | 19/05/1901 | May | 1901 | Wild | Job, West Virginia | The children were gathering flowers near their... | Black bear | 0 | 0 | 0 | 0 | 38.864277 | -79.556998 |
| 3 | John Dicht | 18.0 | male | 24/11/1906 | Nov | 1906 | Wild | Elk County, Pennsylvania | Thinking the bear was dead, Dicht began skinni... | Black bear | 0 | 0 | 0 | 1 | 41.437362 | -78.626009 |
| 4 | Baby Laird | 1.0 | NaN | 05/10/1908 | Oct | 1908 | Captive | Tucson, Arizona | After a bear escaped from a cage at Elysian Gr... | Black bear | 0 | 0 | 0 | 1 | 32.222876 | -110.974847 |


In [48]:
df.isna().sum()

Name                0
 age                2
gender              1
Date                0
Month               0
Year                0
Type                0
Location            0
Description         0
Type of bear        0
Hunter              0
Grizzly             0
Hikers              0
Only one killed     0
Latitude           46
Longitude          46
dtype: int64

In [49]:
df_dropped = df.dropna()
df_dropped.info()

<class 'pandas.core.frame.DataFrame'>
Index: 118 entries, 0 to 165
Data columns (total 16 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             118 non-null    object 
 1    age             118 non-null    float64
 2   gender           118 non-null    object 
 3   Date             118 non-null    object 
 4   Month            118 non-null    object 
 5   Year             118 non-null    int64  
 6   Type             118 non-null    object 
 7   Location         118 non-null    object 
 8   Description      118 non-null    object 
 9   Type of bear     118 non-null    object 
 10  Hunter           118 non-null    int64  
 11  Grizzly          118 non-null    int64  
 12  Hikers           118 non-null    int64  
 13  Only one killed  118 non-null    int64  
 14  Latitude         118 non-null    float64
 15  Longitude        118 non-null    float64
dtypes: float64(3), int64(5), object(8)
memory usage: 15.7+ KB


In [50]:
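# Assign the result back instead of calling fillna(inplace=True) on a selected column,
# which newer pandas versions warn about and may not apply to df
df["Latitude"] = df["Latitude"].fillna("?")
df["Longitude"] = df["Longitude"].fillna("?")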

df_dropped = df.dropna()
df_dropped.info()

<class 'pandas.core.frame.DataFrame'>
Index: 163 entries, 0 to 165
Data columns (total 16 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             163 non-null    object 
 1    age             163 non-null    float64
 2   gender           163 non-null    object 
 3   Date             163 non-null    object 
 4   Month            163 non-null    object 
 5   Year             163 non-null    int64  
 6   Type             163 non-null    object 
 7   Location         163 non-null    object 
 8   Description      163 non-null    object 
 9   Type of bear     163 non-null    object 
 10  Hunter           163 non-null    int64  
 11  Grizzly          163 non-null    int64  
 12  Hikers           163 non-null    int64  
 13  Only one killed  163 non-null    int64  
 14  Latitude         163 non-null    object 
 15  Longitude        163 non-null    object 
dtypes: float64(1), int64(5), object(10)
memory usage: 21.6+ KB
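Notice in the output above that filling with "?" turned Latitude and Longitude from float64 into object columns. If the goal is only to keep rows whose coordinates are missing, one possible alternative is `dropna(subset=...)`, which checks just the columns you name. The sketch below uses a tiny made-up DataFrame, not the bear attack data.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only
toy = pd.DataFrame({
    "age": [3.0, np.nan, 7.0],
    "Latitude": [38.86, np.nan, np.nan],
})

# Drop rows only when "age" is missing; missing coordinates are tolerated
kept = toy.dropna(subset=["age"])

# Filling with a string means the column can no longer stay float64
filled = toy["Latitude"].fillna("?")

print(kept)
print(filled.dtype)
```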
