<img src = "https://img.betapage.co/images/77640967-77641456.png" height=50% width = 50%>

In [12]:
import numpy as np

# Introduction to NumPy

"Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays. This library provides you with an array data structure that holds some benefits over Python lists, such as: being more compact, faster access in reading and writing items, being more convenient and more efficient."


# What is a NumPy array?

"The central feature of NumPy is the array object class. Arrays are similar to lists in Python, except that every element of an array must be of the same type, typically a numeric type like float or int. Arrays make operations with large amounts of numeric data very fast and are generally much more efficient than lists."

LINK: https://engineering.ucsb.edu/~shell/che210d/numpy.pdf

<img src = "http://community.datacamp.com.s3.amazonaws.com/community/production/ckeditor_assets/pictures/332/content_arrays-axes.png">

# NumPy Array Syntax
The function np.array takes the object to be converted into an array (here, a list) and an optional second argument, dtype, that sets the type of every element. If dtype is omitted, NumPy infers the type from the data.

In [3]:
#List to be converted
lst = [1,2,3,4,5,6,7,8,9]

arr = np.array(lst)
arr

array([1, 2, 3, 4, 5, 6, 7, 8, 9])
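
The second argument to np.array, the element type, is passed as dtype. The sketch below is only an illustration of that argument; the names arr_default and arr_float are made up for this example.

```python
import numpy as np

lst = [1, 2, 3]

# Without dtype, NumPy infers an integer type from the data
arr_default = np.array(lst)

# With dtype=float, every element is stored as a float
arr_float = np.array(lst, dtype=float)

print(arr_default.dtype.kind)  # i (an integer type; the exact width depends on the platform)
print(arr_float)               # [1. 2. 3.]
```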

Array elements are accessed, sliced, and manipulated just like lists.

In [4]:
#Slice from index 2 to the end
arr[2:]

array([3, 4, 5, 6, 7, 8, 9])

In [5]:
#manipulate item at index 0
arr[0] = 10
arr

array([10,  2,  3,  4,  5,  6,  7,  8,  9])
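
One difference from lists is worth knowing when manipulating slices: a list slice is a copy, while a basic array slice is a view onto the same data. A minimal sketch, with variable names chosen only for this example:

```python
import numpy as np

lst = [1, 2, 3, 4, 5]
arr = np.array(lst)

# Slicing a list copies the selected items, so the original list is untouched
lst_slice = lst[1:3]
lst_slice[0] = 99

# Slicing an array returns a view, so writing to it changes the original array
arr_slice = arr[1:3]
arr_slice[0] = 99

print(lst)  # [1, 2, 3, 4, 5]
print(arr)  # [ 1 99  3  4  5]
```

If an independent copy is needed, arr[1:3].copy() provides one.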


<b>* Why can't we simply use a Python list for these scientific computations?</b>

# Python List VS NumPy Array

"Arrays and lists are both used in Python to store data, but they don't serve exactly the same purposes. They both can be used to store any data type (real numbers, strings, etc), and they both can be indexed and iterated through, but the similarities between the two don't go much further. The main difference between a list and an array is the functions that you can perform to them. For example, you can divide an array by 3, and each number in the array will be divided by 3 and the result will be printed if you request it. If you try to divide a list by 3, Python will tell you that it can't be done, and an error will be thrown."


In [6]:
lst = [3,6,9,12,15,18,12]
#lst/3
lst2= [i/3 for i in lst]
lst2

[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 4.0]

In [7]:
arr = np.array([3,6,9,12,15,18,12])
arr/3

array([1., 2., 3., 4., 5., 6., 4.])

Arrays can be multidimensional. Unlike lists, different axes are accessed using commas inside bracket notation. Here is an example with a two-dimensional array (i.e., a matrix):

In [8]:
lst1 = [1,2,3,4,5]
lst2 = [5,6,7,8,9]
arr = np.array([lst1,lst2])
arr

array([[1, 2, 3, 4, 5],
       [5, 6, 7, 8, 9]])

In [9]:
arr/3

array([[0.33333333, 0.66666667, 1.        , 1.33333333, 1.66666667],
       [1.66666667, 2.        , 2.33333333, 2.66666667, 3.        ]])

In [10]:
lst_lst = [lst1,lst2]
lst_lst

[[1, 2, 3, 4, 5], [5, 6, 7, 8, 9]]

In [11]:
lst_lst/3

TypeError: unsupported operand type(s) for /: 'list' and 'int'
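
For comparison, dividing a nested list element by element needs one comprehension level per dimension. The sketch below shows that the result matches the single array expression; the names divided_lists and divided_array are made up for this example.

```python
import numpy as np

lst_lst = [[1, 2, 3, 4, 5], [5, 6, 7, 8, 9]]

# Plain Python: one comprehension level per dimension
divided_lists = [[x / 3 for x in row] for row in lst_lst]

# NumPy: a single vectorized expression
divided_array = np.array(lst_lst) / 3

print(np.allclose(divided_lists, divided_array))  # True
```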

# Indexing Arrays VS Lists

In [None]:
arr

In [None]:
arr[0][1]

In [None]:
lst_lst

In [None]:
#lst_lst[0,1]
lst_lst[0][1]

In [None]:
lst_lst[0][1]

In [None]:
arr[-1]

In [None]:
lst_lst[-1]

In [None]:
arr
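
The cells above can be summarized in one sketch: chained indexing works on both structures, while comma indexing works on arrays only. The flag comma_index_failed is a name made up for this example.

```python
import numpy as np

lst_lst = [[1, 2, 3, 4, 5], [5, 6, 7, 8, 9]]
arr = np.array(lst_lst)

print(arr[0][1])   # 2, chained indexing works on arrays and lists
print(arr[0, 1])   # 2, comma indexing works on arrays only
print(arr[-1])     # [5 6 7 8 9], negative indices count from the end

# A list rejects a tuple index with a TypeError
comma_index_failed = False
try:
    lst_lst[0, 1]
except TypeError:
    comma_index_failed = True

print(comma_index_failed)  # True
```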

<h3> How to index a multidimensional array? </h3><br>
The individual elements of arrays can be accessed in the same way as for lists.

<img src = "http://www.scipy-lectures.org/_images/numpy_indexing.png" height = 60% width = 60%>

In [None]:
list_2d = [[0,1,2,3,4,5],
           [10,11,12,13,14,15],
           [20,21,22,23,24,25],
           [30,31,32,33,34,35],
           [40,41,42,43,44,45],
           [50,51,52,53,54,55]]

In [None]:
array_2d = np.array(list_2d)
print(array_2d)
array_2d.shape

t1=array_2d.shape
print(t1[0])
print(len(t1))

In [None]:
print(array_2d[0,3:5])

In [None]:
print(array_2d[4:,4:])

In [None]:
print(array_2d[:,2])

In [None]:
print(array_2d[2::2,::2]) # step by 2
# 2::2 for 1st dimension
# ::2 for 2nd dimension
# [start index]:[stop index]:[step]
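
As a worked reference for the four slices above, the sketch below rebuilds the same 6x6 array and annotates each result. It constructs the array with a comprehension instead of typing the values; that shortcut is specific to this example.

```python
import numpy as np

# Same values as list_2d above: row r, column c holds 10*r + c
array_2d = np.array([[10 * r + c for c in range(6)] for r in range(6)])

print(array_2d[0, 3:5])      # [3 4]
print(array_2d[4:, 4:])      # [[44 45]
                             #  [54 55]]
print(array_2d[:, 2])        # [ 2 12 22 32 42 52]
print(array_2d[2::2, ::2])   # [[20 22 24]
                             #  [40 42 44]]
```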

In [None]:
# adding new column to numpy array

In [None]:
calc = array_2d[:,5] * 1.05
print(calc)

new_column = np.array(calc)
print(new_column)

In [None]:
new_array_2d = np.column_stack((array_2d,new_column))
print(new_array_2d)

# Changing Array to different DataType

In [None]:
arr = arr.tolist()
arr

In [None]:
type(arr)

In [None]:
arr = np.array(arr)
arr

In [None]:
type(arr)

In [None]:
arr.shape
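
The cells above convert between an array and a nested list. If the goal is instead to change the element type while keeping an array, astype is one option. This is a sketch under that assumption, with names chosen only for the example.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [5, 6, 7, 8, 9]])

as_list = arr.tolist()          # nested Python list
back = np.array(as_list)        # back to an ndarray
as_float = arr.astype(float)    # new array, same shape, float elements

print(type(as_list).__name__)   # list
print(type(back).__name__)      # ndarray
print(as_float.dtype)           # float64
```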

# Change Array Shape

<img src = "https://www.safaribooksonline.com/library/view/python-for-data/9781449323592/httpatomoreillycomsourceoreillyimages1346880.png" height = 50% width = 30% style = display.left> 

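Transposed versions of arrays can also be generated. For a two-dimensional array, transposing switches the two axes, so rows become columns. Where possible, NumPy returns a view of the original data rather than a copy: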

In [None]:
arr

In [None]:
arr.shape

In [None]:
arr.transpose()

In [None]:
arr.transpose().shape

In [None]:
arr.reshape((5,2))

Make a multidimensional array into a one-dimensional array:

In [None]:
arr.shape

In [None]:
arr.flatten()

In [None]:
arr.flatten().shape
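
Transposing and reshaping can produce the same shape with different contents, which is easy to miss when the cells are run one at a time. A small sketch using the 2x5 array from earlier; the names transposed, reshaped, and flat are made up for this example.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [5, 6, 7, 8, 9]])

transposed = arr.transpose()     # rows become columns
reshaped = arr.reshape((5, 2))   # same element order, regrouped into 5 rows
flat = arr.flatten()             # one-dimensional copy

print(transposed.shape, reshaped.shape, flat.shape)  # (5, 2) (5, 2) (10,)
print(transposed[0])  # [1 5]
print(reshaped[0])    # [1 2]
```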

# Create New Array (Specific)

NumPy also provides many functions to create arrays.

Creates an array of all zeros with a specified shape.

In [None]:
#1-Dimensional
np.zeros(10)

In [None]:
#2-Dimensional
np.zeros((2,2), int)

Creates an array of all ones with a specified shape.

In [None]:
#1-Dimensional
np.ones(10, int)

In [None]:
#2-Dimensional
np.ones((2,2))

Creates a constant array (specified number) with a specified shape.

In [None]:
#1-Dimensional
np.full(10,7)

In [None]:
#2-Dimensional
np.full((2, 2), 7)

Creates an array of a specified shape with random values.

In [None]:
#1-Dimensional
np.random.random(10)

In [None]:
#2-Dimensional
np.random.random((2,2))

Creates an array of evenly spaced values. With a single argument n, np.arange(n) yields the integers 0 through n-1.

In [None]:
#1-Dimensional
np.arange(10)

Creates an array with a specified "start", "stop", and number of values, evenly spaced. Unlike np.arange, the stop value is included by default.

In [None]:
#1-Dimensional
np.linspace(1, 10, num = 20)
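
The two functions differ in what the caller fixes: np.arange fixes the step, np.linspace fixes the number of values. A short sketch with values chosen only for illustration:

```python
import numpy as np

stepped = np.arange(2, 10, 2)        # start, stop (excluded), step
spaced = np.linspace(0, 1, num=5)    # start, stop (included), number of values

print(stepped)  # [2 4 6 8]
print(spaced)   # [0.   0.25 0.5  0.75 1.  ]
```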

Creates an identity matrix (array) of a given size. The examples below build a 10x10 one.

An identity matrix is a square matrix having 1s on the main diagonal, and 0s everywhere else. These are called identity matrices because, when you multiply them with a compatible matrix, you get back the same matrix.
http://www.sparknotes.com/math/algebra2/matrices/section3.rhtml

In [None]:
#2-Dimensional
np.eye(10)

OR

In [None]:
#2-Dimensional
np.identity(10)
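
The defining property of the identity matrix can be checked directly. The sketch below uses a 2x2 matrix m, a name made up for this example, and the @ operator for matrix multiplication.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])
identity = np.eye(2)

# @ is matrix multiplication; multiplying by the identity leaves m unchanged
product = identity @ m

print(np.array_equal(product, m))                  # True
print(np.array_equal(np.eye(2), np.identity(2)))   # True
```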

# Math Functions using NumPy

"As such, it probably won’t surprise you that you can just use +, -, *, / or % to add, subtract, multiply, divide or calculate the remainder of two (or more) arrays. However, a big part of why NumPy is so handy, is because it also has functions to do this. The equivalent functions of the operations that you have seen just now are, respectively, np.add(), np.subtract(), np.multiply(), np.divide() and np.remainder()."

https://www.datacamp.com/community/tutorials/python-numpy-tutorial

In [None]:
arr = np.ones((10,10))
arr

In [None]:
np.add(arr,2)

In [None]:
#OR
arr + 2

In [None]:
np.multiply(arr,2)

In [None]:
#OR
arr*2

In [None]:
np.subtract(arr,1)

In [None]:
#OR
arr -1 

In [None]:
np.divide(arr,2)

In [None]:
#OR
arr/2

In [None]:
np.remainder(arr,1)

In [None]:
#OR
arr % 1

In [None]:
arr.sum()

In [None]:
arr.min()

In [None]:
arr.max()

In [None]:
arr.mean()
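
The cells above leave their outputs blank, so here is a compact sketch on a smaller 2x3 array showing that the operator and function forms agree, and how an axis argument changes a reduction such as sum.

```python
import numpy as np

arr = np.ones((2, 3))

# Operator form and function form give the same result
print(np.array_equal(np.add(arr, 2), arr + 2))        # True
print(np.array_equal(np.remainder(arr, 1), arr % 1))  # True

print(arr.sum())        # 6.0
print(arr.sum(axis=0))  # [2. 2. 2.]  one total per column
print(arr.sum(axis=1))  # [3. 3.]     one total per row
```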

# <font color = magenta> NumPy Problem 1 </font>
<font color = magenta>
Create the three arrays displayed in the image, below.

<img src = "https://i.stack.imgur.com/ojnFF.jpg">

In [13]:
#Array 1
arr1=np.array([[4,6,4],[1,1,8],[0,7,5],[5,3,3],[8,9,5]])
arr1


array([[4, 6, 4],
       [1, 1, 8],
       [0, 7, 5],
       [5, 3, 3],
       [8, 9, 5]])

In [14]:
#Array 2
arr2=np.array([[8,8,4],[3,4,4],[0,0,9],[3,7,3],[3,4,7]])
arr2

array([[8, 8, 4],
       [3, 4, 4],
       [0, 0, 9],
       [3, 7, 3],
       [3, 4, 7]])

In [15]:
#Array 3
arr3=np.array([[9,5,4],[7,7,3],[9,5,9],[8,7,8],[5,8,8]])
arr3

array([[9, 5, 4],
       [7, 7, 3],
       [9, 5, 9],
       [8, 7, 8],
       [5, 8, 8]])

# <font color = magenta> NumPy Problem 2 </font>
<font color = magenta>
Create a multidimensional array of your dimension choice and fill it with random values (not filled manually).

In [16]:
arr4=np.random.randint(100,size=(3,5))
arr4

array([[49, 94, 13, 24,  4],
       [ 4, 34, 59, 94, 21],
       [94, 88, 49, 46, 95]])

Find the min and max values of your array.

In [17]:
minv1=arr4.min()
maxv1=arr4.max()
print("min value of my array is: " + str(minv1))
print("max value of my array is: " + str(maxv1))

min value of my array is: 4
max value of my array is: 95


# <font color = magenta> NumPy Problem 3 </font>
<font color = magenta>
Create another multidimensional array of your dimension choice and fill it with random values (not filled manually). Find the max value of your new array and replace it with your min value. Find the min value and replace it in your array with the max value.

In [18]:
arr5=np.random.randint(100,size=(5,8))
#arr5=np.array([[1,2,3,4,5],[5,6,7,8,8],[8,3,1,7,6]])
print(arr5)
ar5maxv=arr5.max()
ar5minv=arr5.min()

print("\nmin value of the array is: " + str(ar5minv))
print("max value of the array is: " + str(ar5maxv))

# Build both masks before modifying the array so the two replacements don't interfere
minmask1=(arr5==ar5minv)
maxmask1=(arr5==ar5maxv)

# Boolean masks assign to every matching position at once
arr5[minmask1]=ar5maxv
arr5[maxmask1]=ar5minv

print("\nAfter replace min value with max, and max with min value:")
print(arr5)

[[29 94 77 81 71 43 91 25]
 [18 45 65  9 52 72 73 34]
 [80 93 46 40 30 29 45 79]
 [77 36 64 14 42 66 82 54]
 [48 80 62 43  2 11 27 60]]

min value of the array is: 2
max value of the array is: 94

After replace min value with max, and max with min value:
[[29  2 77 81 71 43 91 25]
 [18 45 65  9 52 72 73 34]
 [80 93 46 40 30 29 45 79]
 [77 36 64 14 42 66 82 54]
 [48 80 62 43 94 11 27 60]]


# <font color = magenta> NumPy Problem 4 </font>

Create a random vector of size 10 and sort it.

In [19]:
ranvec1=np.random.randint(0,50,size=10)
ranvec1

array([14, 47, 28, 38,  8, 40,  8, 44,  1, 29])
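
The cell above creates the random vector but stops before the sorting step the problem asks for. One way to finish it is sketched below, reusing the values from the output above so the result is reproducible; sorted_vec is a name made up for this example.

```python
import numpy as np

# Values copied from the output above
ranvec1 = np.array([14, 47, 28, 38, 8, 40, 8, 44, 1, 29])

sorted_vec = np.sort(ranvec1)   # returns a sorted copy, ranvec1 is unchanged
print(sorted_vec)               # [ 1  8  8 14 28 29 38 40 44 47]

ranvec1.sort()                  # sorts in place instead
print(np.array_equal(ranvec1, sorted_vec))  # True
```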

# <font color = magenta> NumPy Problem 5 </font>

<font color = magenta>
How to swap two rows of an array?

In [20]:
arr51=np.random.randint(0,100,size=(5,8))
print(arr51)
rowtemp=arr51[1].copy()
arr51[1]=arr51[3]
arr51[3]=rowtemp
print("\n\nafter swap 2nd row with 4th row:")
print(arr51)


[[51  6  3 68 48 98 98 16]
 [58 24 11 83 18  2 35 66]
 [41  1 24  6 78 82 98 37]
 [ 6 39 42 80 22 45 98 22]
 [43 60  4  7 52  2 86 53]]


after swap 2nd row with 4th row:
[[51  6  3 68 48 98 98 16]
 [ 6 39 42 80 22 45 98 22]
 [41  1 24  6 78 82 98 37]
 [58 24 11 83 18  2 35 66]
 [43 60  4  7 52  2 86 53]]
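
An alternative to the temporary-copy approach above is to swap with index lists. This is a sketch on a small deterministic array, not the method used in the cell above.

```python
import numpy as np

arr = np.arange(12).reshape((4, 3))

# Indexing with a list returns a copy, so the right-hand side is
# fully evaluated before any row is overwritten
arr[[1, 3]] = arr[[3, 1]]

print(arr)
# [[ 0  1  2]
#  [ 9 10 11]
#  [ 6  7  8]
#  [ 3  4  5]]
```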


# NumPy with Bay Area housing data set

In [21]:
def read_file_housing(filename):
    # "with" closes both files on exit, so the output is fully written before it is read back
    with open(filename,"r") as file_open, open("fixed-housing-data.csv","w") as fixed_file:
        for line in file_open:
            if "HomeID" in line: # skip the header row
                continue
            line_no_newline = line.rstrip()
            line1 = line_no_newline.replace("84085","94085") #Ex9
            line2 = line1.replace("84087","94087") #Ex9
            line3 = line2.replace("85014","95014") #Ex9
            line4 = line3.replace("85051","95051") #Ex9
            line5 = line4.replace("l","1") #Ex11 -- Car_Garage
            line_split = line5.split(",")
            if (int(line_split[5]) < 100): #Ex10 -- School_API
                line_split[5] = int(line_split[5]) * 10
            else:
                line_split[5] = int(line_split[5])
            line_split = [str(x) for x in line_split]
            myString = ",".join(line_split) + "\n"
            fixed_file.write(myString)

In [22]:
read_file_housing("bayarea_home_prices.csv")

In [23]:
import numpy as np

In [None]:
"""
0 = HomeID
1 = HomeAge
2 = HomeSqft
3 = LotSize
4 = BedRooms
5 = HighSchoolAPI
6 = ProxFwy
7 = CarGarage
8 = ZipCode
9 = HomePriceK
"""

In [24]:
housing = np.loadtxt("fixed-housing-data.csv",
                          dtype=int,
                          delimiter=",")

In [None]:
print(housing[0:2])

In [None]:
print(housing.shape)

In [None]:
# home prices
print(housing[:,9])

In [None]:
print(housing[:,9] + 10)

In [None]:
print(housing.sum(axis=0)) # sum by columns

In [None]:
print(housing.sum(axis=1)) # sum by rows

In [None]:
homes_94085 = (housing[:,8] == 94085)

In [None]:
print(homes_94085)

In [None]:
data_94085 = housing[homes_94085] # keep only the rows where the mask is True
#print(data_94085)

In [None]:
sum_price_94085 = data_94085[:,9].sum()

In [None]:
average_94085 = sum_price_94085/len(data_94085) # divide by the actual row count instead of a hard-coded 25
print(average_94085)
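
The filter-then-aggregate pattern above can be tried without the housing file. The sketch below uses a hypothetical four-row data set (zip code and price only), so the numbers are made up and not taken from the real data.

```python
import numpy as np

# Hypothetical mini data set: column 0 = zip code, column 1 = price in $K
homes = np.array([[94085, 900],
                  [94087, 1150],
                  [94085, 880],
                  [94087, 1170]])

mask = homes[:, 0] == 94085   # boolean array, one entry per row
prices = homes[mask, 1]       # prices for the matching rows only

print(mask)           # [ True False  True False]
print(prices.mean())  # 890.0
```

Using mean() avoids having to know the number of matching rows in advance.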

# NumPy Problem 6
### Calculate average price in each zip code: 94085, 94087, 95014, 95051
### Calculate minimum and max price in each zip code: 94085, 94087, 95014, 95051
### Calculate standard deviation of price in each zip code: 94085, 94087, 95014, 95051

In [25]:
# Your code here
zipCodes=np.unique(housing[:,8])
print("Zip Code, Avg Price(K), Min. Price(K), Max. Price(K), Standard deviation of Price (K):")
print("======================================================================================")
for zc in zipCodes:
    tempHzcRef=(housing[:,8]==zc)
    tempHzcDt=housing[tempHzcRef,][:,:]
    tempzcpDt=tempHzcDt[:,9]
    avgp=tempzcpDt.sum()/len(tempzcpDt)
    minp=tempzcpDt.min()
    maxp=tempzcpDt.max()
    zchstd=np.std(tempzcpDt)
    print("%s,       %8.2f,        %5d,        %5d,             %5.2f" %(zc,avgp,minp,maxp,zchstd))


Zip Code, Avg Price(K), Min. Price(K), Max. Price(K), Standard deviation of Price (K):
94085,         885.96,          809,          934,             33.71
94087,        1151.48,         1103,         1190,             27.57
95014,        1263.32,         1194,         1336,             37.74
95051,        1023.20,          942,         1097,             46.04


In [27]:
h1 = housing[housing[:,5].argsort()] # by school_api ascending
print(h1)

[[   65    14  1617  8394     2   850     2     0 94087  1138]
 [   73    25  1302  8668     3   850     4     2 95014  1240]
 [   23    15  1828  6956     3   851     4     3 94085   916]
 [   20    13  1358  6819     2   851     3     2 94085   859]
 [   79    17  1373  8953     2   851     2     0 94087  1190]
 [   77    17  1881  8921     3   852     2     0 95014  1194]
 [   19    10  1246  6810     2   853     4     3 95051   942]
 [   32    18  1866  7181     2   854     2     3 95051  1049]
 [   95    13  1582  9339     3   856     3     0 95014  1267]
 [   26    12  1500  7025     2   856     4     2 94085   934]
 [   99    19  1880  9470     3   857     3     3 95014  1269]
 [   53    23  1289  7873     2   857     3     0 95051  1074]
 [   67    24  1947  8502     2   857     4     0 94087  1179]
 [  100    11  1691  9476     4   857     4     0 95014  1250]
 [   14    25  1974  6547     2   857     4     3 94085   865]
 [   44    11  1415  7541     3   859     4     0 95051

In [28]:
h2 = housing[housing[:,5].argsort()[::-1]] # by school_api descending
print(h2)

[[   38    22  1724  7339     3   975     3     3 95051  1038]
 [   35    12  1943  7249     2   974     2     0 95051  1030]
 [   27    13  1836  7027     2   966     3     3 94085   914]
 [   17    23  1464  6773     3   965     4     2 94085   882]
 [   69    21  1575  8579     2   962     4     3 94087  1128]
 [   37    13  1874  7333     3   960     3     2 95051  1044]
 [   45    15  1249  7609     3   960     2     2 95051  1000]
 [    2    10  1563  6085     2   959     4     3 94085   861]
 [    4    14  1215  6129     3   959     4     2 94085   809]
 [   33    11  1953  7199     3   959     3     2 95051  1042]
 [    7    13  1947  6183     3   959     3     1 94085   843]
 [   76    12  1947  8882     3   954     3     2 94087  1173]
 [   59    22  1559  8096     2   953     2     3 95051  1080]
 [   57    11  1927  7983     3   950     3     1 94087  1116]
 [   10    24  1933  6276     2   950     4     1 94085   885]
 [   50    19  1836  7803     3   949     3     0 95051
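
The argsort pattern used in the two cells above is easier to follow on a tiny table. The data below is hypothetical (an id column and a score column) and uses distinct scores so the order is unambiguous.

```python
import numpy as np

# Hypothetical rows: column 0 = id, column 1 = score
table = np.array([[1, 30],
                  [2, 10],
                  [3, 20]])

order = table[:, 1].argsort()      # row indices that would sort column 1

print(order)                       # [1 2 0]
print(table[order][:, 0])          # [2 3 1]  ids by ascending score
print(table[order[::-1]][:, 0])    # [1 3 2]  ids by descending score
```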

# NumPy Problem 7
### Find top-2 listings by School API for all zipcodes

In [29]:
# Your code here
print("Top-2 listings by School API:")
print(h2[0])
print(h2[1])
print("\n")
zipCodes=np.unique(housing[:,8])

print("Zip Code, Top-2 listings by School API:")
print("=============================================================")
for zc in zipCodes:
    tempzcRef=(housing[:,8]==zc)
    tempzcDt=housing[tempzcRef,][:,:]
    sortedDT=tempzcDt[tempzcDt[:,5].argsort()[::-1]]
    print("%s :" %zc)
    print(sortedDT[0])
    print(sortedDT[1])
    print("\n")


Top-2 listings by School API:
[   38    22  1724  7339     3   975     3     3 95051  1038]
[   35    12  1943  7249     2   974     2     0 95051  1030]


Zip Code, Top-2 listings by School API:
94085 :
[   27    13  1836  7027     2   966     3     3 94085   914]
[   17    23  1464  6773     3   965     4     2 94085   882]


94087 :
[   69    21  1575  8579     2   962     4     3 94087  1128]
[   76    12  1947  8882     3   954     3     2 94087  1173]


95014 :
[   97    10  1645  9352     4   942     3     3 95014  1336]
[   93    25  1298  9309     3   942     3     0 95014  1269]


95051 :
[   38    22  1724  7339     3   975     3     3 95051  1038]
[   35    12  1943  7249     2   974     2     0 95051  1030]




# NumPy Problem 8
### Prices are expected to go up by 4% next year.
### Add another column with predicted prices

In [30]:
# Your code here
housing2=np.insert(housing, 10, [housing[:,9]*1.04],axis=1)
print(housing2)

[[    1    24  1757 ... 94085   894   929]
 [    2    10  1563 ... 94085   861   895]
 [    3    14  1344 ... 94085   831   864]
 ...
 [   98    21  1312 ... 95014  1284  1335]
 [   99    19  1880 ... 95014  1269  1319]
 [  100    11  1691 ... 95014  1250  1300]]
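
Because housing is an integer array, the predicted prices in the output above are truncated (894 * 1.04 = 929.76 is stored as 929). If rounding to the nearest integer is preferred, one possible variation is sketched below, using the first three prices from that output. The names truncated, rounded, and with_prediction are made up for this example.

```python
import numpy as np

# First three prices from the output above, in $K
prices = np.array([894, 861, 831])

truncated = (prices * 1.04).astype(int)        # drops the fractional part
rounded = np.rint(prices * 1.04).astype(int)   # rounds to the nearest integer

print(truncated)  # [929 895 864]
print(rounded)    # [930 895 864]

# column_stack appends the new column without needing an insert position
with_prediction = np.column_stack((prices, rounded))
print(with_prediction.shape)  # (3, 2)
```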


# NumPy Problem 9
### Sort the matrix based on HomeID. Save the updated numpy matrix with added column in Problem 8 to a file.

In [31]:
# Your code here
housing3=housing2[housing2[:,0].argsort()]
np.savetxt("housing_with_inflation_rate.csv", housing3, fmt="%d", delimiter=",")
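
To check that a saved file reads back unchanged, a round trip through np.loadtxt is one option. The sketch below uses a small made-up array and a temporary directory, so it does not touch the housing files; the file name example.csv is hypothetical.

```python
import os
import tempfile

import numpy as np

data = np.array([[3, 30], [1, 10], [2, 20]])
sorted_data = data[data[:, 0].argsort()]   # sort rows by the first column

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.csv")   # hypothetical file name
    np.savetxt(path, sorted_data, fmt="%d", delimiter=",")
    loaded = np.loadtxt(path, dtype=int, delimiter=",")

print(np.array_equal(loaded, sorted_data))  # True
```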

# <font color = magenta> NumPy Problem 10 </font>

Write a function that takes a long string containing multiple words. Print the same string, except with the words in backwards order. 

<i>HINT: Use <b>YOUR_STRING<code>.split()</code></b> function<br></i>

In [32]:
# myString= "This is Lynbrook High school."
# output to : School High Lynbrook is this.
# first letter of first word is Uppercase
# period at the end of last word
myString = "We do not worry about grade in this class."
arrmystr=myString.split()
mn=len(arrmystr)
arrmystr[0]=arrmystr[0].lower()+"."
arrmystr[mn-1]=arrmystr[mn-1].replace(".","")
tmpstr1=arrmystr[mn-1]
tmpstr1=tmpstr1[0].upper()+tmpstr1[1:]
arrmystr[mn-1]=tmpstr1
arrmystr2=arrmystr[::-1]
myString2=" ".join(arrmystr2)

print(myString2)


Class this in grade about worry not do we.
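
The problem asks for a function, while the cell above works on a single string inline. One way to wrap the same steps is sketched below; the name reverse_words is made up for this example, and it assumes the sentence ends with a single period.

```python
def reverse_words(sentence):
    """Return the sentence with its words in reverse order.

    Assumes the input ends with a period; the first word is lowercased
    and the new first word is capitalized, as in the cell above.
    """
    words = sentence.rstrip(".").split()
    if not words:
        return ""
    words[0] = words[0].lower()                     # old first word moves to the end
    words.reverse()
    words[0] = words[0][0].upper() + words[0][1:]   # capitalize the new first word
    return " ".join(words) + "."


result = reverse_words("We do not worry about grade in this class.")
print(result)  # Class this in grade about worry not do we.
```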
