# 2.1.1 NumPy

**Support for large, multi-dimensional arrays and matrices, and a large collection of high-level mathematical functions to operate on these arrays.**

In [1]:
import numpy as np

**ndarray object: an n-dimensional array of homogeneous (same) data types, with many operations performed in compiled code for performance**

- fixed size
- same data type for every element
- much more efficient mathematical operations than built-in data types like list
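As a minimal sketch of the last point, the snippet below doubles every element of a small list and of the equivalent ndarray. The variable names are illustrative only.

```python
import numpy as np

values = [1, 2, 3, 4]

# With a plain list, elementwise math needs an explicit loop or comprehension.
doubled_list = [v * 2 for v in values]

# With an ndarray, the same operation is a single vectorized expression
# that runs in compiled code.
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)  # [2, 4, 6, 8]
print(doubled_arr)   # [2 4 6 8]
```

Note that `values * 2` on the list itself would repeat the list rather than double its elements.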

**numpy.dtype**

- intc (same as a C integer) and intp (used for indexing)
- int8, int16, int32, int64
- uint8, uint16, uint32, uint64
- float16, float32, float64
- complex64, complex128
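A short sketch of how a dtype can be chosen at creation time and changed afterwards. The arrays here are made up for illustration.

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int8)
large = np.array([1, 2, 3], dtype=np.float64)

# itemsize is the number of bytes used per element.
print(small.dtype, small.itemsize)  # int8 1
print(large.dtype, large.itemsize)  # float64 8

# astype returns a converted copy; the original array keeps its dtype.
as_float = small.astype(np.float32)
print(as_float.dtype)  # float32
```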

# NumPy Arrays

**Create a numpy array**

- Conversion from other Python structures (lists, tuples)
- Built-in NumPy array creation (arange, ones, zeros, etc.)
- Reading arrays from a file
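The third option can be sketched with `np.loadtxt`. To keep the example self-contained, an in-memory `io.StringIO` buffer with hypothetical contents stands in for a real file on disk.

```python
import io
import numpy as np

# Hypothetical file contents; io.StringIO stands in for an open file.
csv_text = io.StringIO("1,2,3\n4,5,6\n")

# loadtxt parses delimited text into a float array by default.
from_file = np.loadtxt(csv_text, delimiter=",")
print(from_file)
# [[1. 2. 3.]
#  [4. 5. 6.]]

# Conversion from a tuple works the same way as conversion from a list.
from_tuple = np.array((2, 3, 1, 0))
```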

In [2]:
np.array([2,3,1,0])

array([2, 3, 1, 0])

In [3]:
# numpy.zeros(shape, dtype=float, order='C', *, like=None):
#    - returns a new array of the given shape and type, filled with zeros
#    - shape is an int or a tuple of ints, e.g. (rows, columns)

# 5 rows, 5 columns
np.zeros((5,5)) 

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [4]:
# np.ones(shape):
#    - returns a new array of the given shape, filled with ones

np.ones((5,2))

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [5]:
# np.arange(stop):
#    - returns an array of all integers from 0 up to, but not including, stop

np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [6]:
# descending range: start at 15, stop before 5, step -1

np.arange(15,5, -1)

array([15, 14, 13, 12, 11, 10,  9,  8,  7,  6])
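The cell above builds a descending range directly. To reverse an array that already exists, a negative-step slice or `np.flip` can be used, as in this small sketch (the array names are illustrative).

```python
import numpy as np

ascending = np.arange(6, 16)        # 6, 7, ..., 15

# A slice with step -1 walks the array backwards without copying the data.
reversed_slice = ascending[::-1]

# np.flip gives the same result for a 1-D array.
reversed_flip = np.flip(ascending)

print(np.array_equal(reversed_slice, np.arange(15, 5, -1)))  # True
```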

In [7]:
# numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
#    - Returns num evenly spaced samples, calculated over the interval [start, stop].
#    - endpoint=True (the default) means the stop value is included.
#    - Setting endpoint=False excludes the stop value.

np.linspace(1, 10, 4)

array([ 1.,  4.,  7., 10.])
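A brief sketch of the `endpoint` and `retstep` parameters from the signature above, reusing the same start, stop, and num values.

```python
import numpy as np

# Default: the stop value 10 is the last sample (values 1, 4, 7, 10).
closed = np.linspace(1, 10, 4)

# endpoint=False: stop is excluded, so the spacing shrinks to (10 - 1) / 4
# (values 1, 3.25, 5.5, 7.75).
half_open = np.linspace(1, 10, 4, endpoint=False)

# retstep=True also returns the spacing between samples.
samples, step = np.linspace(1, 10, 4, retstep=True)
print(step)  # 3.0
```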

In [8]:
# Generates a random float in the half-open interval [0.0, 1.0). The value changes with each run.

np.random.random()

0.4336858314708091

In [9]:
# np.random.default_rng() is the recommended constructor for the random number Generator class

random_obj = np.random.default_rng(seed=None)

# With seed=None, the output changes on every run.

random_obj.random()

0.3726395758649137

In [10]:
# With a fixed seed, the output is the same on every run.

random_obj = np.random.default_rng(seed=42)
random_obj.random()

0.7739560485559633
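The effect of a fixed seed can be checked directly: two generators built with the same seed produce identical draws. A minimal sketch, with `rng_a` and `rng_b` as illustrative names:

```python
import numpy as np

rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

# Same seed, same sequence of draws.
draws_a = rng_a.random(3)
draws_b = rng_b.random(3)
print(np.array_equal(draws_a, draws_b))  # True

# The Generator also offers other draws, e.g. integers (high is exclusive).
dice = rng_a.integers(low=1, high=7, size=5)
```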

In [11]:
# numpy.reshape(a, newshape, order='C')
#    - Gives a new shape to an array without changing its data.

print('Original:\n', np.arange(9))
print()
print('After using reshape:\n', np.arange(9).reshape(3,3))

Original:
 [0 1 2 3 4 5 6 7 8]

After using reshape:
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
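Two details of `reshape` that are easy to miss, sketched under the assumption of a 12-element array: one dimension may be given as -1 so NumPy infers it, and the new shape must account for exactly the same number of elements.

```python
import numpy as np

flat = np.arange(12)

# -1 asks NumPy to infer that dimension from the array size: 12 / 3 = 4.
grid = flat.reshape(3, -1)
print(grid.shape)  # (3, 4)

# A shape that does not match the element count raises ValueError.
try:
    flat.reshape(5, 5)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print('reshape failed:', err)
```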


In [12]:
x = np.arange(2,10)
print(x)
x[-1]

[2 3 4 5 6 7 8 9]


9

In [13]:
x.shape = (2,4)
print('Array:\n', x, '\n')
print('x[-1]: ', x[-1])
print('x[1,3]: ', x[1,3])

Array:
 [[2 3 4 5]
 [6 7 8 9]] 

x[-1]:  [6 7 8 9]
x[1,3]:  9
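Beyond single elements and whole rows, slices can select columns and sub-blocks. A small sketch on the same 2x4 array:

```python
import numpy as np

x = np.arange(2, 10).reshape(2, 4)

first_row = x[0]         # [2 3 4 5]
last_column = x[:, -1]   # [5 9]
block = x[:, 1:3]        # columns 1 and 2 of every row: [[3 4], [7 8]]

print(first_row, last_column)
print(block)
```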


In [14]:
a = np.arange(1,11)
b = np.arange(12,22)
a+b

array([13, 15, 17, 19, 21, 23, 25, 27, 29, 31])
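Addition above is applied element by element, and the other arithmetic operators behave the same way. The sketch below also shows a scalar being applied to every element.

```python
import numpy as np

a = np.arange(1, 11)
b = np.arange(12, 22)

# * is elementwise multiplication, not a matrix product.
product = a * b

# A scalar is broadcast across the whole array.
shifted = a + 100

print(product)
print(shifted)
```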

In [15]:
# numpy.dot(a, b, out=None)
#    - Dot product of two arrays; for 2-D arrays this is matrix multiplication

a = np.arange(1,11).reshape(2,5)
b = np.arange(12,22).reshape(5,2)
result = np.dot(a,b)
result

array([[260, 275],
       [660, 700]])
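For 2-D arrays, the `@` operator gives the same result as `np.dot`. The sketch below also shows that operand order matters, since the inner dimensions must match.

```python
import numpy as np

a = np.arange(1, 11).reshape(2, 5)
b = np.arange(12, 22).reshape(5, 2)

via_dot = np.dot(a, b)
via_operator = a @ b
print(np.array_equal(via_dot, via_operator))  # True

# (2x5) @ (5x2) gives 2x2, while (5x2) @ (2x5) gives 5x5.
print((b @ a).shape)  # (5, 5)
```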

In [16]:
# numpy.transpose(a, axes=None)
#    - Returns an array with its axes transposed (for a 2-D array, rows and columns are swapped)

result.transpose()

array([[260, 660],
       [275, 700]])

In [17]:
# linalg.inv(a)
#    - Compute the multiplicative inverse of a matrix

np.linalg.inv(result)

array([[ 1.4 , -0.55],
       [-1.32,  0.52]])
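One way to sanity-check an inverse is to multiply it by the original matrix and compare against the identity. Because the arithmetic is floating point, the comparison below uses `np.allclose` rather than exact equality.

```python
import numpy as np

result = np.array([[260, 275],
                   [660, 700]])
inverse = np.linalg.inv(result)

# A matrix times its inverse should be (approximately) the identity.
print(np.allclose(result @ inverse, np.eye(2)))  # True

# A matrix is invertible only when its determinant is nonzero;
# here it is 260*700 - 275*660 = 500, up to rounding.
determinant = np.linalg.det(result)
```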

# SciPy

- built on the NumPy library
- various tools and functions for solving common problems in scientific computing

**Examples:**
- Fourier transforms (scipy.fft; scipy.fftpack is the legacy interface)
- Multidimensional image processing (scipy.ndimage)
- Spatial data structures and algorithms (scipy.spatial)

# Continuing with Pandas

In [18]:
import pandas as pd

In [19]:
WORLD_DATA_PATH = '/Users/christopherreid/My Drive (christopherreid@arizona.edu)/Classes/6. Summer 2023/CSC 380 - Principles of Data Science/Lecture Slides/Lecture 3.1/spotify-top-50/data/spotify-streaming-top-50-world.csv'

In [20]:
world_df = pd.read_csv(WORLD_DATA_PATH)

FileNotFoundError: [Errno 2] No such file or directory: '/Users/christopherreid/My Drive (christopherreid@arizona.edu)/Classes/6. Summer 2023/CSC 380 - Principles of Data Science/Lecture Slides/Lecture 3.1/spotify-top-50/data/spotify-streaming-top-50-world.csv'

In [None]:
world_df.sample(1)

# Q: What is the time range of the dataset?

In [None]:
#type(world_df['date'].dtype)
world_df['date'].dtype #numpy object

In [None]:
#type(world_df['date'])
type(world_df['date'][0])

In [None]:
# pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=False, format=None, exact=_NoDefault.no_default, unit=None, infer_datetime_format=_NoDefault.no_default, origin='unix', cache=True)
#    - Converts the argument to datetime
#    - Here, converts the 'date' column to a datetime dtype

world_df['date'] = pd.to_datetime(world_df['date'])

**Q: What time range does this dataset's top-50 record cover?**

**Assume the dataset has a record for every day.**

In [None]:
world_df['date'].max(), world_df['date'].min()
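The same steps can be tried without the CSV file. The sketch below uses a small, made-up DataFrame (`sample_df`, with hypothetical dates) in place of `world_df`, converts the column, computes the range, and tests the "every day" assumption by comparing against a complete daily calendar.

```python
import pandas as pd

# Hypothetical stand-in for world_df: only a 'date' column, stored as strings.
sample_df = pd.DataFrame({'date': ['2023-05-01', '2023-05-02', '2023-05-04']})
sample_df['date'] = pd.to_datetime(sample_df['date'])

start, end = sample_df['date'].min(), sample_df['date'].max()
span_days = (end - start).days + 1   # inclusive number of calendar days
print(start, end, span_days)

# Days in the full calendar that never appear in the data.
expected = pd.date_range(start, end, freq='D')
missing = expected.difference(pd.DatetimeIndex(sample_df['date']))
print(list(missing))
```

If `missing` is empty, the data covers every day between the minimum and maximum dates.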