## NumPy

NumPy, short for Numerical Python, is a library built around a multidimensional array object and a collection of routines for processing those arrays. With NumPy, mathematical and logical operations can be applied to whole arrays at once. This tutorial covers the basics of NumPy, including array creation, data types, reshaping, and the main kinds of indexing. An introduction to Matplotlib is also provided. Each topic is explained through examples.

Comparison between a Python list and a NumPy array

https://www.geeksforgeeks.org/python-lists-vs-numpy-arrays/
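
As a brief illustration of the comparison linked above, the sketch below contrasts how `*` and `+` behave on a list and on an array. The variable names are illustrative only.

```python
import numpy as np

python_list = [1, 2, 3]
numpy_arr = np.array(python_list)

# On a list, * repeats the sequence and + concatenates
doubled_list = python_list * 2      # [1, 2, 3, 1, 2, 3]
joined_list = python_list + [10]    # [1, 2, 3, 10]

# On an array, the same operators work element by element
doubled_arr = numpy_arr * 2         # array([2, 4, 6])
shifted_arr = numpy_arr + 10        # array([11, 12, 13])

print(doubled_list, joined_list)
print(doubled_arr, shifted_arr)
```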

Learning resources 

https://www.youtube.com/watch?v=QUT1VHiLmmI

https://www.tutorialspoint.com/numpy/index.htm

In [1]:
import numpy as np

In [3]:
python_list = [1, 2, 3]
numpy_arr = np.array(python_list)
numpy_arr

array([1, 2, 3])

In [4]:
a = np.array([1, 2, 3])
print(a)
print(type(a))

[1 2 3]
<class 'numpy.ndarray'>


In [5]:
# 2D-array
b = np.array([
    [1, 3, 4],
    [-2, 1, -1],
    [0, 4, 5]
])
print(b)

[[ 1  3  4]
 [-2  1 -1]
 [ 0  4  5]]


In [6]:
b.dtype

dtype('int64')

In [7]:
c = np.array([
    [1, 3, 4],
    [-2, 1, -1],
    [0, 4, 5]
], dtype=np.float16)

In [8]:
c.dtype

dtype('float16')

In [9]:
c

array([[ 1.,  3.,  4.],
       [-2.,  1., -1.],
       [ 0.,  4.,  5.]], dtype=float16)
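
Besides passing `dtype` at creation time, an existing array can be converted with `astype`. The following is a small sketch with illustrative variable names; the default integer dtype depends on the platform, so it is not shown as an expected output.

```python
import numpy as np

ints = np.array([1, 2, 3])

# astype returns a converted copy; the original array keeps its dtype
as_float32 = ints.astype(np.float32)
print(as_float32.dtype)        # float32

# Converting floats to integers drops the fractional part (truncates toward zero)
floats = np.array([1.7, -2.3, 0.5])
as_int = floats.astype(np.int64)
print(as_int)                  # [ 1 -2  0]

# itemsize is the number of bytes used per element
print(np.zeros(3, dtype=np.float16).itemsize)   # 2
print(np.zeros(3, dtype=np.float64).itemsize)   # 8
```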

In [37]:
np.random.random((2,2,2))

array([[[0.79599262, 0.3613093 ],
        [0.50962542, 0.46094019]],

       [[0.23563043, 0.4175063 ],
        [0.28896607, 0.77603755]]])

In [38]:
# shape
d = np.random.random((30, 5, 3))
d.shape

(30, 5, 3)

In [39]:
print(a.shape)
print(b.shape)
print(c.shape)

(3,)
(3, 3)
(3, 3)


In [41]:
a.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [45]:
a.reshape((1, 1, 3))

array([[[1, 2, 3]]])

In [46]:
a.reshape((3,1,1))

array([[[1]],

       [[2]],

       [[3]]])

In [47]:
b.reshape((-1))

array([ 1,  3,  4, -2,  1, -1,  0,  4,  5])
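
The `-1` used above asks NumPy to infer that dimension from the total number of elements. A minimal sketch, using a made-up 12-element array:

```python
import numpy as np

x = np.arange(12)                 # 12 elements: 0, 1, ..., 11

rows_fixed = x.reshape((3, -1))   # NumPy infers 4 columns
cols_fixed = x.reshape((-1, 2))   # NumPy infers 6 rows
print(rows_fixed.shape)           # (3, 4)
print(cols_fixed.shape)           # (6, 2)

# The total number of elements must stay the same
failed = False
try:
    x.reshape((5, -1))            # 12 is not divisible by 5
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```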

In [51]:
e = np.random.randint(1, 10, (3,2))
print(e.shape)
print(e)
e = e.reshape((2,3))
print(e.shape)
print(e)

(3, 2)
[[3 9]
 [5 6]
 [2 3]]
(2, 3)
[[3 9 5]
 [6 2 3]]


In [49]:
e.reshape((3,2,1))

array([[[3],
        [9]],

       [[5],
        [6]],

       [[2],
        [3]]])

In [None]:
# indexing and slicing
print(a[0])
print(a[1])
print(a[2])

In [None]:
a[:2]

In [None]:
a[1:]

In [None]:
print(b[0,0])
print(b[1,0])
print(b[2,2])

In [None]:
b[:, :2]

In [None]:
b[1:, :2]

In [None]:
b[:, 0]
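
One detail worth knowing about the slices above: a basic slice is a view onto the original array, not a copy. The sketch below illustrates this with a small array; `view` and `independent` are illustrative names.

```python
import numpy as np

a = np.array([1, 2, 3])

view = a[:2]                 # basic slice: shares memory with a
view[0] = 100
print(a)                     # [100   2   3]

independent = a[1:].copy()   # explicit copy: changes stay local
independent[0] = -1
print(a)                     # [100   2   3]

print(a[-1])                 # 3, negative indices count from the end
```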

In [None]:
# concatenation
a = np.random.rand(6, 3)
print(a)
print(a.shape)
b = np.random.rand(11, 3)
print(b)
print(b.shape)

In [None]:
c = np.concatenate((a, b), axis=0)
print(c)
print(c.shape)

In [None]:
a = np.random.rand(10,3)
print(a)
print(a.shape)
b = np.random.rand(10, 5)
print(b.shape)
c = np.concatenate((a,b), axis=1)
print(c)
print(c.shape)
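
In both concatenations above, every dimension except the one given by `axis` has to match. A minimal sketch with fixed shapes (the arrays are filled with zeros and ones only to keep the example deterministic):

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((4, 3))
c = np.ones((2, 5))

rows = np.concatenate((a, b), axis=0)   # same number of columns: stack rows
cols = np.concatenate((a, c), axis=1)   # same number of rows: stack columns
print(rows.shape)                       # (6, 3)
print(cols.shape)                       # (2, 8)

# Mismatched shapes along the other axis raise a ValueError
mismatch = False
try:
    np.concatenate((a, c), axis=0)      # 3 columns vs 5 columns
except ValueError as err:
    mismatch = True
    print("concatenate failed:", err)
```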

In [None]:
# arithmetic
a = np.array([1, 2, 3])
2 * a

In [None]:
print(a.mean())
print(a.min())
print(a.max())
print(np.argmax(a))

In [None]:
a = np.array([
    [1, 4, 5],
    [2, 3, 1]
])
print(a)

In [None]:
a.mean(axis=0)

In [None]:
a.mean(axis=1)

In [None]:
a.mean()
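
The `axis` argument names the dimension that is collapsed by the reduction. The sketch below repeats the array from the cells above and shows the resulting shapes.

```python
import numpy as np

a = np.array([
    [1, 4, 5],
    [2, 3, 1]
])

col_means = a.mean(axis=0)   # collapse the rows: one mean per column
row_means = a.mean(axis=1)   # collapse the columns: one mean per row
print(col_means)             # [1.5 3.5 3. ]
print(row_means.shape)       # (2,)

# keepdims=True keeps the reduced axis with length 1
kept = a.mean(axis=1, keepdims=True)
print(kept.shape)            # (2, 1)
```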

In [None]:
a = np.array([
    [1, 2, 3],
    [3, 2, 1],
    [1, 2, 3]
])
b = np.array([
    [5, 4, 3],
    [3, 2, 1],
    [1, 2, 3]
])
a + b
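
Element-wise arithmetic also works between arrays of different but compatible shapes, which NumPy calls broadcasting. A small sketch follows; `row` and `col` are illustrative names.

```python
import numpy as np

a = np.array([
    [1, 2, 3],
    [3, 2, 1],
    [1, 2, 3]
])

row = np.array([10, 20, 30])            # shape (3,): added to every row
col = np.array([[100], [200], [300]])   # shape (3, 1): added to every column

plus_row = a + row
plus_col = a + col
print(plus_row)
print(plus_col)

elementwise = a * a   # element-wise product
matmul = a @ a        # matrix multiplication
print(elementwise)
print(matmul)
```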

In [None]:
# comparison
a = np.array([1, 2, 3, 4, 5, 6, 7])
# produces mask array
a > 3

In [None]:
# can filter using mask array 
a[a > 3]
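
Mask arrays can be combined with `&` (and), `|` (or), and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A minimal sketch:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])

even_and_large = a[(a > 3) & (a % 2 == 0)]
extremes = a[(a < 2) | (a > 6)]
not_large = a[~(a > 3)]
print(even_and_large)   # [4 6]
print(extremes)         # [1 7]
print(not_large)        # [1 2 3]

# np.where keeps values where the mask is True and substitutes 0 elsewhere
replaced = np.where(a > 3, a, 0)
print(replaced)         # [0 0 0 4 5 6 7]
```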

## Pandas

Pandas provides easy-to-use data structures and data analysis tools for the Python programming language.

In [None]:
import pandas as pd

In [None]:
# pandas series
series = pd.Series([1, 2, 3, 4, 5])
series

In [None]:
series[0]

In [None]:
series.values  # the underlying data as a NumPy array

In [None]:
series > 3

In [None]:
series[series > 3]

In [None]:
# pandas dataframe
data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve'],
    'Age': [30, 25, 35, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'San Francisco']
}

# Create a DataFrame from the dictionary
people_df = pd.DataFrame(data)
people_df

In [None]:
# changing index from numbers 0 - 3 to name column
people_df.set_index("Name", inplace=True)
people_df

In [None]:
# display the first n rows (n defaults to 5)
people_df.head()

In [None]:
people_df.tail()

In [None]:
# column selection
people_df["Age"]

In [None]:
people_df["City"]

In [None]:
# item selection
people_df["Age"]["John"]
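
A single value can also be read in one step with `.loc`, `.at`, or `.iloc`, which avoids the two chained lookups used above. The sketch below rebuilds the small DataFrame so that it runs on its own.

```python
import pandas as pd

people_df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve'],
    'Age': [30, 25, 35, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'San Francisco']
}).set_index("Name")

# Label-based: row label first, then column label
print(people_df.loc["John", "Age"])    # 30
# .at is an accessor for exactly one value
print(people_df.at["John", "Age"])     # 30
# Position-based: row 0, column 0
print(people_df.iloc[0, 0])            # 30

# Assigning through .loc updates the DataFrame itself
people_df.loc["John", "Age"] = 31
print(people_df.loc["John", "Age"])    # 31
```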

In [None]:
print(people_df.index)
print(people_df.columns)

In [None]:
# delete rows and columns
series = pd.Series([1, 2, 3, 4, 5], index=list("abcde"))
print(series)
series.drop("d", inplace=True)
series

In [None]:
people_df.drop("Eve", inplace=True)
print(people_df)

In [None]:
people_df.drop("Age", axis=1, inplace=True)
people_df
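
Without `inplace=True`, `drop` leaves the original object untouched and returns a new one. A minimal sketch on a made-up three-row DataFrame, using the `index=` and `columns=` keywords instead of `axis`:

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [30, 25, 35], "City": ["New York", "Los Angeles", "Chicago"]},
    index=["John", "Alice", "Bob"],
)

without_bob = df.drop(index="Bob")      # drop a row by label
without_age = df.drop(columns="Age")    # drop a column by name

print(df.shape)            # (3, 2), the original is unchanged
print(without_bob.shape)   # (2, 2)
print(without_age.shape)   # (3, 1)
```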

In [None]:
# slicing and filtering
# Define lists for each column
names = ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Helen', 'Isabel', 'Jack']
ages = [28, 35, 42, 30, 22, 39, 45, 29, 32, 37]
genders = ['Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Female', 'Female', 'Male']
cities = ['New York', 'Los Angeles', 'Chicago', 'San Francisco', 'Boston', 'Miami', 'Dallas', 'Seattle', 'Denver', 'Atlanta']
occupations = ['Engineer', 'Teacher', 'Doctor', 'Artist', 'Student', 'Lawyer', 'Nurse', 'Chef', 'Software Developer', 'Salesperson']
incomes = [60000, 45000, 90000, 35000, 20000, 75000, 55000, 62000, 80000, 40000]
marital_statuses = ['Married', 'Single', 'Divorced', 'Single', 'Married', 'Married', 'Single', 'Single', 'Married', 'Divorced']

# Create a DataFrame from the lists
people_df = pd.DataFrame({
    'Name': names,
    'Age': ages,
    'Gender': genders,
    'City': cities,
    'Occupation': occupations,
    'Income': incomes,
    'Marital Status': marital_statuses
})

# Display the DataFrame
people_df

In [None]:
# use Name as the index and keep it as a regular column too (drop=False)
people_df.set_index("Name", drop=False, inplace=True)
people_df

In [None]:
# select multiple columns
people_df[["Name", "City"]]

In [None]:
people_df[["Name", "Age", "Occupation"]]

In [None]:
# label-based row selection
people_df.loc[["Alice", "Eve", "Jack"]]

In [None]:
# label-based row and column selection
people_df.loc[["Alice", "Eve", "Jack"], ["Age", "Occupation", "Marital Status"]]

In [None]:
# position-based column selection
people_df.iloc[:, [0, 3, 5]]

In [None]:
# position-based row selection
people_df.iloc[[0, 4, 9], :]

In [None]:
# position-based row and column selection
people_df.iloc[[0, 4, 9], [0, 3, 5]]

In [None]:
# filtering
people_df

In [None]:
bool_mask = people_df["Age"] >= 39
print(bool_mask)

In [None]:
people_df[bool_mask]
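
Several conditions can be combined into one mask. The sketch below uses a five-row subset of the data above so that it runs on its own; the variable names are illustrative.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Frank", "Grace"],
    "Age": [28, 35, 42, 39, 45],
    "Gender": ["Female", "Male", "Female", "Male", "Female"],
})

# Combine conditions with & (and), | (or), ~ (not); parenthesise each one
older_women = people[(people["Age"] >= 39) & (people["Gender"] == "Female")]
print(older_women["Name"].tolist())     # ['Charlie', 'Grace']

# isin tests membership in a list of values
selected = people[people["Name"].isin(["Bob", "Frank"])]
print(selected["Age"].tolist())         # [35, 39]

# query expresses the same kind of filter as a string
young = people.query("Age < 30")
print(young["Name"].tolist())           # ['Alice']
```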

In [None]:
# arithmetic with a DataFrame column
people_df["Age"] -= 2
people_df

In [None]:
people_df.describe()

### Loading CSV files with pandas

In [None]:
ecommerce_df = pd.read_csv("./ecommerce_customer_data_large.csv", sep=",")
ecommerce_df.head(10)
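
To try `read_csv` without the data file, the text can be supplied from memory. In the sketch below the CSV rows are made up for illustration; only the column names are borrowed from this notebook.

```python
import io
import pandas as pd

# Made-up sample data standing in for a CSV file on disk
csv_text = """Product Category,Product Price,Quantity
Books,12.5,2
Electronics,199.0,1
Books,8.0,3
"""

df = pd.read_csv(io.StringIO(csv_text), sep=",")
print(df.shape)      # (3, 3)
print(df.dtypes)     # column types inferred by pandas

# usecols limits which columns are loaded
prices = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Product Category", "Product Price"],
)
print(prices.columns.tolist())   # ['Product Category', 'Product Price']
```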

In [None]:
# group rows by a column, e.g. Product Category; .groups maps each category to its row labels
groups = ecommerce_df.groupby("Product Category").groups
groups

In [None]:
print(groups.keys())

In [None]:
# select the rows in the Books group; .groups stores index labels, so use .loc
ecommerce_df.loc[groups["Books"], :]
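
A group can also be retrieved or summarised directly from the groupby object. The sketch below uses a small made-up DataFrame with two of the column names from this notebook.

```python
import pandas as pd

# Made-up rows for illustration
orders = pd.DataFrame({
    "Product Category": ["Books", "Electronics", "Books", "Home"],
    "Product Price": [10.0, 200.0, 20.0, 50.0],
})

grouped = orders.groupby("Product Category")

books = grouped.get_group("Books")      # all rows in the Books group
print(len(books))                       # 2

mean_price = grouped["Product Price"].mean()   # one value per category
print(mean_price["Books"])              # 15.0

sizes = grouped.size().to_dict()        # number of rows per category
print(sizes)
```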

In [None]:
# missing values
ecommerce_df.isna().any()
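
`isna().any()` reports whether a column has any missing value; `isna().sum()` counts them. The sketch below uses a small made-up DataFrame and also shows that `fillna` and `dropna` return new objects rather than modifying the original.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration
df = pd.DataFrame({
    "Quantity": [1, 2, 3, 4],
    "Returns": [0.0, np.nan, 1.0, np.nan],
})

missing_counts = df.isna().sum().to_dict()
print(missing_counts)               # {'Quantity': 0, 'Returns': 2}

filled = df.fillna(0.0)    # new DataFrame with NaN replaced by 0.0
dropped = df.dropna()      # new DataFrame without rows that contain NaN

print(filled["Returns"].tolist())   # [0.0, 0.0, 1.0, 0.0]
print(len(dropped))                 # 2
print(df["Returns"].isna().sum())   # 2, df itself is unchanged
```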

In [None]:
ecommerce_df.describe()

In [None]:
# converting a pandas DataFrame to a NumPy array
interests = ecommerce_df[["Product Price", "Quantity", "Total Purchase Amount", "Customer Age", "Returns"]]
interests

In [None]:
# fillna returns a new DataFrame, so assign the result back
interests = interests.fillna(0.0)
interests

In [None]:
interests_array = interests.values
print(type(interests_array))
interests_array