*In NumPy we'll find:*
- ndarray, an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities
- Mathematical functions for fast operations on entire arrays of data without having to write loops
- Tools for reading/writing array data to disk and working with memory-mapped files
- Linear algebra, random number generation, and Fourier transform capabilities
- A C API for connecting NumPy with libraries written in C, C++, or Fortran
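
To make the "without having to write loops" point concrete, here is a minimal sketch that computes the same squares once with an explicit Python loop and once with a single array expression. The variable names are made up for this illustration.

```python
import numpy as np

values = np.arange(5)  # array([0, 1, 2, 3, 4])

# Explicit Python loop over the elements
squares_loop = [int(v) ** 2 for v in values]

# The same computation as one array expression, no loop needed
squares_vec = values ** 2

print(squares_loop)  # [0, 1, 4, 9, 16]
print(squares_vec)   # [ 0  1  4  9 16]
```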

In [2]:
import numpy as np
import sys

# Python list
py_list = [1, 2, 3, 4, 5, 6]
# getsizeof counts only the list object (header plus pointer array), not the ints it refers to
print(f"Python list size: {sys.getsizeof(py_list)} bytes")
# Size of a single Python int object
print(f"Memory per element (Python list): {sys.getsizeof(py_list[0])} bytes")

# NumPy array with the same six values, so the comparison is like for like
np_array = np.array([1, 2, 3, 4, 5, 6])
print(f"NumPy array size: {np_array.nbytes} bytes")  # Size of the raw data buffer


Python list size: 104 bytes
Memory per element (Python list): 28 bytes
NumPy array size: 48 bytes
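
Note that `sys.getsizeof` on a list measures only the container, not the integer objects it points to. The sketch below adds the two together for a rough total and compares it with the array's data buffer. It is an approximation: CPython shares small integer objects between containers, and `nbytes` leaves out the small fixed overhead of the ndarray object itself. The `dtype=np.int64` argument is set explicitly here so the result does not depend on the platform default.

```python
import sys
import numpy as np

py_list = [1, 2, 3, 4, 5, 6]
np_array = np.array(py_list, dtype=np.int64)

# Container (pointer array) plus each int object it refers to
list_total = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# Raw data buffer: 6 elements * 8 bytes each
array_data = np_array.nbytes

print(f"List including elements (approx.): {list_total} bytes")
print(f"NumPy data buffer: {array_data} bytes")
```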


### The NumPy ndarray

In [3]:
import numpy as np
data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])
data

array([[ 1.5, -0.1,  3. ],
       [ 0. , -3. ,  6.5]])

In [5]:
# scalar multiplication
data * 10

array([[ 15.,  -1.,  30.],
       [  0., -30.,  65.]])

In [6]:
# element-wise (vectorized) addition
data + data

array([[ 3. , -0.2,  6. ],
       [ 0. , -6. , 13. ]])

In [9]:
# every array has a "shape", which is a tuple indicating the size of each dimension
data.shape

(2, 3)

In [11]:
# NumPy arrays are homogeneous (all elements share one type); "dtype" lets us check that type
data.dtype

dtype('float64')

In [12]:
# creating ndarray
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

In [13]:
# nested sequences, like a list of equal-length lists, will be converted into a multidimensional array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [15]:
# since we created the NumPy array from a list of lists, it has two dimensions
# Let's check that with "ndim"
print(f"Number of dimensions in our array: {arr2.ndim}")
print(f"Tuple describing the shape of our array: {arr2.shape}")

Number of dimensions in our array: 2
Tuple describing the shape of our array: (2, 4)


Unless explicitly specified, numpy.array tries to infer a good data type for the array that it creates. The data type is stored in a special dtype metadata object.

In [18]:
print(arr1.dtype)
print(arr2.dtype)

float64
int64


In [19]:
# in addition to numpy.array there are other functions for creating new arrays
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [20]:
np.zeros((3,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [22]:
?np.empty
np.empty((2,3,2))

array([[[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]]])

It’s not safe to assume that numpy.empty will return an array of all zeros. This function returns uninitialized memory and thus may contain nonzero "garbage" values. You should use this function only if you intend to populate the new array with data.
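
A minimal sketch of the intended usage pattern: allocate with `numpy.empty`, then write every element before reading anything back. The name `buffer` and the fill values are made up for this example.

```python
import numpy as np

# Allocate without initializing; the contents are arbitrary at this point
buffer = np.empty((2, 3))

# Populate every element before reading from the array
for i in range(buffer.shape[0]):
    buffer[i, :] = i + 1

print(buffer)
```

If you actually need zeros, `np.zeros` is the safer choice; `np.empty` only saves the cost of initialization.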

In [26]:
# numpy.arange is an array-valued version of the built-in Python range function
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
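
Like `range`, `numpy.arange` also accepts start, stop, and step arguments. Unlike `range`, it accepts non-integer steps too. A short sketch with made-up values:

```python
import numpy as np

# start, stop (exclusive), step: same semantics as range(2, 10, 3)
int_steps = np.arange(2, 10, 3)
print(int_steps)    # [2 5 8]

# a fractional step, which the built-in range does not allow
float_steps = np.arange(0, 1, 0.25)
print(float_steps)  # [0.   0.25 0.5  0.75]
```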

## Working with JSON

In [27]:
import json

In [28]:
person = '{"name": "Alice", "Languages": ["english", "french"]}'

In [30]:
person_dict = json.loads(person)

In [31]:
person_dict

{'name': 'Alice', 'Languages': ['english', 'french']}

In [37]:
# we can use the "load" function to read JSON from an existing file
with open('/Users/dorotamisztal-poleszczuk/Desktop/Michał chwilowy/Books/Python_for_Data_Analysis/Chapter_4/person.json', 'r') as f:
    data = json.load(f)

data

{'name': 'Alice', 'languages': ['english', 'french']}

In [38]:
# we can convert a Python dict to a JSON string using the "dumps" function
import json
person_dict = {"name": "Bob",
              "age": 12,
              "children": None}

person_json = json.dumps(person_dict)
person_json

'{"name": "Bob", "age": 12, "children": null}'

In [39]:
# to write JSON to a file we can use the "dump" function
import json

person_dict = {"name": "Bob",
              "languages": ["English", "French"],
              "married": True,
              "age": 32
              }

with open("person.txt", "w") as json_file:
    json.dump(person_dict, json_file)
    


In [40]:
# we've created a new "person.txt" file in our filesystem
ls

Numpy Arrays and Vectorized Computations.ipynb
person.json
person.txt


In [42]:
# pretty-printing JSON
import json

person_string = '{"name": "Bob", "languages": "English", "numbers": [2, 1.6, null]}'

# parse the JSON string into a Python dict
person_dictionary = json.loads(person_string)

# serialize it back to a JSON string with indentation and sorted keys
print(json.dumps(person_dictionary, indent=4, sort_keys=True))

{
    "languages": "English",
    "name": "Bob",
    "numbers": [
        2,
        1.6,
        null
    ]
}
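
The four functions used above pair up: `dumps`/`loads` work with strings, `dump`/`load` work with file objects. The sketch below round-trips a dict through a file to show that the data survives unchanged. It writes into a temporary directory, and the file name is hypothetical.

```python
import json
import os
import tempfile

person = {"name": "Bob", "languages": ["English", "French"], "children": None}

# Use a temporary directory so the example leaves nothing behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "person.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        json.dump(person, f)        # Python None is written as JSON null
    with open(path, "r", encoding="utf-8") as f:
        restored = json.load(f)     # JSON null is read back as None

print(restored == person)  # True
```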


# Data Types for ndarrays

In [43]:
# the data type, or dtype, is a special object containing the information (metadata) the ndarray needs to interpret a chunk of memory as a particular type of data
import numpy as np
arr1 = np.array([1,2,3], dtype=np.float64)
arr2 = np.array([1,2,3], dtype=np.int32)

print(arr1.dtype)
print(arr2.dtype)

float64
int32


In [44]:
# type conversion
arr = np.array([1,2,3,4,5])
print(arr.dtype)

# convert to float64
floating_arr = arr.astype(np.float64)
print(floating_arr.dtype)

int64
float64


Signed vs. unsigned integers:
 - Signed integers: can store both positive and negative numbers (e.g. int8 can hold values from -128 to 127).
 - Unsigned integers: can store only non-negative numbers (e.g. uint8 can hold values from 0 to 255).
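
These limits do not need to be memorized: `numpy.iinfo` reports them for any integer dtype. A short sketch (the `ranges` dict is just a helper for this example):

```python
import numpy as np

ranges = {}
for dtype in (np.int8, np.uint8, np.int16, np.uint16):
    info = np.iinfo(dtype)
    name = np.dtype(dtype).name
    ranges[name] = (int(info.min), int(info.max))
    print(f"{name}: {info.min} to {info.max}")
```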

In [50]:
# In the previous example, integers were cast to floating point
# If we cast floating-point numbers to an integer data type, the decimal part is truncated

arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
print(arr)
arr = arr.astype(np.int32)
print(arr)
print(arr.dtype)

[ 3.7 -1.2 -2.6  0.5 12.9 10.1]
[ 3 -1 -2  0 12 10]
int32


In [54]:
# when we have an array of strings representing numbers, we can use astype to convert them to numeric form
# np.bytes_ replaces np.string_, which was removed in NumPy 2.0
numeric_strings = np.array(["1.25", "-9.6", "42"], dtype=np.bytes_)
print(numeric_strings.dtype)

|S4


In [57]:
# if the cast fails (for example, a string that cannot be converted to float64), a ValueError is raised
numeric_strings.astype(np.float64)

array([ 1.25, -9.6 , 42.  ])
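
To see the failure case, the sketch below tries to convert an array that contains a non-numeric string and catches the resulting `ValueError`. The input values are made up for this illustration.

```python
import numpy as np

bad_strings = np.array(["1.25", "not a number"])  # hypothetical input

failed = False
try:
    bad_strings.astype(np.float64)
except ValueError as exc:
    failed = True
    print(f"Conversion failed: {exc}")
```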

In [59]:
# we can also pass another array's "dtype" attribute
int_array = np.arange(10)
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)

int_array = int_array.astype(calibers.dtype)
print(int_array.dtype)

float64


In [60]:
# There are shorthand type code strings you can also use to refer to a dtype
zeros_uint32 = np.zeros(8, dtype="u4")
zeros_uint32

array([0, 0, 0, 0, 0, 0, 0, 0], dtype=uint32)
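
The type codes combine a kind letter (`i` signed integer, `u` unsigned integer, `f` floating point) with the element size in bytes. A small sketch that resolves a few of them to their full names:

```python
import numpy as np

names = {}
for code in ("i2", "u4", "f4", "f8"):
    dt = np.dtype(code)
    names[code] = dt.name
    print(f"{code!r} -> {dt.name} ({dt.itemsize} bytes per element)")
```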

# Arithmetic with NumPy arrays

In [62]:
# arrays allow us to express batch operations on data without writing any for loops 
# NumPy users call this vectorization
arr = np.array([[1., 2., 3.], [4.,5.,6.]])
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [63]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [64]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

In [65]:
# arithmetic operations with scalars
1/arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [67]:
# comparisons between arrays of the same size yield Boolean arrays
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
print(arr2)

arr2 > arr



[[ 0.  4.  1.]
 [ 7.  2. 12.]]


array([[False,  True, False],
       [ True, False,  True]])

In [69]:
# Evaluating operations between differently sized arrays is called broadcasting and will be discussed later
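
As a brief preview only, here is a minimal broadcasting sketch: a one-dimensional array of length 3 is added to each row of a (2, 3) array. The `row` array is made up for this example.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])  # shape (2, 3)
row = np.array([10., 20., 30.])               # shape (3,)

# The 1-D array is applied to each row of arr in turn
result = arr + row
print(result)
```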

## Basic Indexing and Slicing

An important first distinction from Python's built-in lists is that array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array.
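
The difference is easiest to see side by side. In this sketch (variable names are made up), writing through an array slice changes the source array, while writing to a list slice does not. Calling `.copy()` on the slice gives an independent array when that is what you want.

```python
import numpy as np

arr = np.arange(10)
arr_slice = arr[5:8]       # a view on arr, no data copied
arr_slice[1] = 12345       # also changes arr[6]
print(arr)

py_list = list(range(10))
list_slice = py_list[5:8]  # a new, independent list
list_slice[1] = 12345      # py_list is unaffected
print(py_list)

arr_copy = arr[5:8].copy() # explicit copy, detached from arr
arr_copy[0] = -1           # arr[5] stays 5
```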