<a href="https://colab.research.google.com/github/AndreaReid/BigData/blob/main/BigData_Python_Extended.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GGE-5405/6505 Big Data

# **Intro to Python - Extended**



---



## Python Classes & Objects

Complete the following tutorial: https://www.w3schools.com/python/python_classes.asp
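
As a warm-up before the tutorial, here is a minimal sketch of a class with a constructor and one method. The `Station` class, its attributes, and the coordinates are hypothetical and are not taken from the tutorial.

```python
# Hypothetical example class (not from the tutorial)
class Station:
    """A named point with a latitude and longitude."""

    def __init__(self, name, lat, lon):
        # __init__ runs when the object is created and stores its attributes
        self.name = name
        self.lat = lat
        self.lon = lon

    def describe(self):
        # methods receive the object itself as the first argument (self)
        return f"{self.name} ({self.lat}, {self.lon})"


s = Station("Fredericton", 45.96, -66.64)
print(s.describe())
```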

## Create a module

Refresher: https://www.w3schools.com/python/python_modules.asp

Add the following code to a file called "mymodule.py":

```
def greeting(name):
  print("Hello, " + name)
```
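
If you would rather create the file from Python than from an editor, one possible approach is sketched below. It assumes you want `mymodule.py` in the current working directory, and it overwrites any existing file with that name.

```python
from pathlib import Path

# the same two lines of code shown above, stored as a string
source = 'def greeting(name):\n  print("Hello, " + name)\n'

# write the string to mymodule.py in the current working directory
Path("mymodule.py").write_text(source)

# read it back to confirm the contents
print(Path("mymodule.py").read_text())
```

In a notebook, the IPython cell magic `%%writefile mymodule.py` is another way to do the same job.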

In [None]:
# check to make sure module is in current directory
!ls

In [None]:
# check contents of module from command line
!cat mymodule.py 

In [None]:
# import module from current workspace
import mymodule

In [None]:
# call the function from mymodule
mymodule.greeting("Kai")

In [None]:
# create an alias
import mymodule as m
m.greeting("Kai")

In [None]:
# import just greeting function from module
from mymodule import greeting
greeting("Kai")

## Let's explore some useful modules and libraries!



### JSON



What is a Python dictionary? See https://www.w3schools.com/python/python_dictionaries.asp


Complete the following JSON tutorial: https://www.w3schools.com/python/python_json.asp

Let's try it out:

In [None]:
# create a python dictionary
person1 = {
  "name": "John",
  "age": 36,
  "country": "Norway"
}
type(person1)

In [None]:
import json

# convert to json
myjson = json.dumps(person1, indent=4)
print(type(myjson))

In [None]:
myjson
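
The conversion also works in the other direction. As a small sketch, `json.loads()` parses a JSON string back into a Python dictionary; the variable names `text` and `restored` are chosen here for illustration.

```python
import json

person = {"name": "John", "age": 36, "country": "Norway"}

text = json.dumps(person)      # Python dict -> JSON string
restored = json.loads(text)    # JSON string -> Python dict

print(type(text), type(restored))
print(restored == person)      # the round trip preserves the data
```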

In [None]:
# write to file
with open("person1.json", "w") as f:
  f.write(myjson)

In [None]:
# check contents
!cat person1.json

In [None]:
# open the json file; the with-block closes it automatically
with open("/content/sample_data/anscombe.json") as f:
  # create python list of dictionaries
  data = json.load(f)

print(type(data))

In [None]:
# extract first item from the list
dict1 = data[0]
print(type(dict1))

In [None]:
print(dict1.keys())

### CSV

Info about the module here: https://docs.python.org/3/library/csv.html#

In [None]:
# Use a shell command to get first 4 lines of the file
!head -4 /content/sample_data/california_housing_test.csv

In [None]:
# use csv module to read csv file
import csv
with open('sample_data/california_housing_test.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(', '.join(row))
        #print(row)

In [None]:
# use the DictReader class to access each row like a dictionary, keyed by column name
with open('sample_data/california_housing_test.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['longitude'], row['latitude'])


In [None]:
# row still holds the last row read by the loop above
print(row)
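
Note that the `csv` module returns every value as a string, so numeric columns need to be converted before doing any arithmetic. The sketch below uses a small in-memory sample (hypothetical values, laid out like the housing file) rather than the real file.

```python
import csv
import io

# hypothetical two-row sample with the same column names as the housing file
sample = "longitude,latitude\n-122.05,37.37\n-118.30,34.26\n"

# io.StringIO lets the csv module read a string as if it were a file
reader = csv.DictReader(io.StringIO(sample))

# convert each string value to float while reading
coords = [(float(r["longitude"]), float(r["latitude"])) for r in reader]
print(coords)
```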

In [None]:
# write new csv file
with open('names.csv', 'w', newline='') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
    writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
    writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})

In [None]:
# we can use shell commands to zip files
!zip names.zip names.csv

In [None]:
# and unzip files (-o overwrites the existing names.csv without prompting)
!unzip -o names.zip
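
The same thing can be done without shell commands by using the standard-library `zipfile` module. This is a minimal sketch that works in a temporary directory with a small hypothetical CSV, so it does not touch the files created above.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # create a small csv file to compress (hypothetical contents)
    csv_path = Path(tmp) / "names.csv"
    csv_path.write_text("first_name,last_name\nBaked,Beans\n")

    # zip: write the csv into a new archive
    zip_path = Path(tmp) / "names.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(csv_path, arcname="names.csv")

    # unzip: list the archive members and read one back into memory
    with zipfile.ZipFile(zip_path) as zf:
        members = zf.namelist()
        content = zf.read("names.csv").decode("utf-8")

print(members)
print(content)
```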

### NumPy

NumPy (Numerical Python) is a Python library that provides efficient functions and methods for working with arrays. For numerical work on large arrays, its vectorized operations are typically much faster than equivalent loops over Python lists.

Learn more: https://www.w3schools.com/python/numpy/default.asp

Documentation: https://numpy.org/doc/stable/reference/arrays.html
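
To illustrate what "working with arrays" means in practice, here is a small sketch comparing a list comprehension with the equivalent NumPy operation. The values are arbitrary; the point is that NumPy applies the arithmetic to every element at once, without an explicit loop.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# plain Python: loop over the list element by element
doubled_list = [v * 2 for v in values]

# NumPy: the multiplication is applied to the whole array
doubled_array = np.array(values) * 2

print(doubled_list)
print(doubled_array)
```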

In the example below, we can create a NumPy array object (ndarray) using the array() function. We can create 0D, 1D, 2D, and 3D arrays.

In [None]:
import numpy as np

# 0-D
a = np.array(42)

# 1-D
b = np.array([1, 2, 3, 4, 5, 6])

# 2-D
c = np.array([[1, 2, 3], [4, 5, 6]])

# 3-D
d = np.array([[[1.1, 1.2, 1.3], [1.4, 1.5, 1.6]], [[2.1, 2.2, 2.3], [2.4, 2.5, 2.6]]])

# use ndim attribute to check number of dimensions
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

In [None]:
# print the arrays
print("------0D------")
print(a)
print("")
print("------1D------")
print(b)
print("")
print("------2D------")
print(c)
print("")
print("------3D------")
print(d)

In [None]:
# Extract from 1-D 
print(f"First element in 1D array: {b[0]}")

# Extract from 2-D (index order: [row,col])
print(f"Second element in first row in 2D array: {c[0,1]}")

# Extract from 3-D (index order: [block,row,col])
print(f"Second element in 1st row of 2nd block in 3D array: {d[1,0,1]}")

In [None]:
# Check the shape of each array
print(a.shape) # 0D - returns an empty tuple: ()
print(b.shape) # 1D - returns (number of elements,)
print(c.shape) # 2D - returns (rows, cols)
print(d.shape) # 3D - returns (blocks, rows, cols)

In [None]:
# reshape 1D to 2D
print(b.reshape(2,3))

In [None]:
# flatten 3D to 1D
print(d.reshape(-1))
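
The `-1` used above tells NumPy to work out that dimension from the array's size. As a brief sketch, it can also be combined with a fixed dimension, and NumPy raises a `ValueError` when the sizes do not divide evenly.

```python
import numpy as np

b = np.array([1, 2, 3, 4, 5, 6])

# fix one dimension and let NumPy infer the other
two_rows = b.reshape(2, -1)   # 2 rows, columns inferred
two_cols = b.reshape(-1, 2)   # 2 columns, rows inferred
print(two_rows.shape)
print(two_cols.shape)

# 6 elements cannot be split into 4 equal rows, so this fails
try:
    b.reshape(4, -1)
except ValueError as err:
    print("reshape failed:", err)
```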

In [None]:
# Create an array using arange()
# arange() generates evenly spaced numbers within a specified range
help(np.arange)

In [None]:
# create an array with arange()
arr = np.arange(10).reshape(5,2)
print(arr)
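
`arange()` also accepts a start, a stop, and a step. In the sketch below the stop value is excluded from the result, just as with Python's built-in `range()`; the variable names are illustrative.

```python
import numpy as np

# start=2, stop=10 (excluded), step=2
evens = np.arange(2, 10, 2)
print(evens)

# the step can be a float
quarters = np.arange(0, 1, 0.25)
print(quarters)
```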

In [None]:
# Example: create a function to rescale an array

def rescale(input_array):
    L = np.min(input_array)
    H = np.max(input_array)
    output_array = (input_array - L) / (H - L) # normalize the array between 0 and 1
    return output_array

arr_rescale = rescale(arr)
arr_rescale
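
One caveat: if every value in the array is the same, `H - L` is zero and the division is undefined. The sketch below is a hypothetical variant, `rescale_safe()`, that returns zeros in that case; returning zeros is an assumption made for illustration, and other choices may suit your data better.

```python
import numpy as np

def rescale_safe(input_array):
    """Hypothetical variant of rescale() that handles constant arrays."""
    low = np.min(input_array)
    high = np.max(input_array)
    if high == low:
        # all values identical: there is no range to normalize over
        return np.zeros_like(input_array, dtype=float)
    return (input_array - low) / (high - low)

scaled = rescale_safe(np.arange(10).reshape(5, 2))
constant = rescale_safe(np.array([7, 7, 7]))
print(scaled)
print(constant)
```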