# Tutorial Series — Data Analysis with Python - Full Course for Beginners (Numpy, Pandas, Matplotlib, Seaborn)
Source: https://www.youtube.com/watch?v=r-uOLxNrNk8

## About Numpy (practical starts at 1:29:48)
- NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
- Various other libraries like Pandas, Matplotlib, and Scikit-learn are built on top of this amazing library.
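- It is also very efficient: arrays are stored in contiguous memory with a fixed data type, and most operations run in compiled code rather than in the Python interpreter.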

In [4]:
import sys
import numpy as np

In [5]:
np.__version__

'1.24.3'

## Numpy vs List

In Python, NumPy arrays and standard lists are both used for storing collections of data. However, they have different characteristics and are suited for different use cases. Here are the key differences:

Performance:

- NumPy Arrays: They are more efficient for numerical operations because they are stored in contiguous memory locations and have a fixed data type, which allows for vectorized operations (operating on entire arrays).
- Lists: They are less efficient for numerical computations because they are implemented as arrays of pointers to objects, which can be of different types. This diversity of types means operations typically involve more overhead.
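To make the performance point concrete, here is a rough sketch that doubles every element once with a list comprehension and once with a single vectorized expression. The names `numbers_list`, `numbers_array` and the size `n` are made up for this illustration, and the timings depend entirely on your machine, so no expected numbers are shown.

```python
import timeit
import numpy as np

n = 100_000  # arbitrary size chosen for this sketch
numbers_list = list(range(n))
numbers_array = np.arange(n)

# Element-wise doubling: Python-level loop vs. one vectorized operation
doubled_list = [x * 2 for x in numbers_list]
doubled_array = numbers_array * 2

# Both approaches produce the same values
assert doubled_list == doubled_array.tolist()

# Time 20 repetitions of each approach
list_seconds = timeit.timeit(lambda: [x * 2 for x in numbers_list], number=20)
array_seconds = timeit.timeit(lambda: numbers_array * 2, number=20)
print(f"list comprehension: {list_seconds:.4f} s")
print(f"NumPy vectorized:   {array_seconds:.4f} s")
```

On most setups the vectorized version should come out noticeably faster, but treat the printed numbers as indicative only.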

Memory Usage:

- NumPy Arrays: More memory-efficient for large arrays of numerical data, as they have a fixed data type, reducing memory overhead.
- Lists: Can consume more memory for large data sets, especially when the data is of a uniform type, because of the overhead of individual object storage.
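A quick way to see the memory difference is to compare `sys.getsizeof` on a list (plus its elements) with the `nbytes` attribute of an array. This is only an approximation for the list side, because CPython shares some small integer objects; the names `values_list` and `values_array` are invented for this sketch.

```python
import sys
import numpy as np

n = 1_000
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)

# The list stores pointers; every int is a separate Python object on top of that.
# (Approximate: CPython caches small ints, so some objects are shared.)
list_bytes = sys.getsizeof(values_list) + sum(sys.getsizeof(v) for v in values_list)

# The array stores the raw 8-byte integers back to back
array_bytes = values_array.nbytes  # 1000 elements * 8 bytes = 8000

print("list  (approx.):", list_bytes, "bytes")
print("array (data):   ", array_bytes, "bytes")
```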

Functionality:

- NumPy Arrays: Come with a vast array of functions and methods optimized for numerical computations, including mathematical functions, statistical operations, and linear algebra routines.
- Lists: Provide general-purpose functionalities, such as adding or removing elements, but lack the specialized mathematical and statistical methods inherent to NumPy.
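As a small illustration of the built-in numerical functionality, the sketch below computes a few statistics on an array and the mean of the same data held in a list. The variable names (`scores`, `scores_list`, `list_mean`) are assumptions made for this example.

```python
import numpy as np

scores = np.array([10, 20, 30, 40])

print(scores.mean())    # arithmetic mean -> 25.0
print(scores.std())     # population standard deviation
print(np.sqrt(scores))  # element-wise square root
print(scores @ scores)  # dot product of the vector with itself -> 3000

# A plain list has no such methods, so the mean is computed by hand
scores_list = [10, 20, 30, 40]
list_mean = sum(scores_list) / len(scores_list)
print(list_mean)        # 25.0
```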

Flexibility:

- NumPy Arrays: Generally less flexible in terms of storing different data types; they require all elements to be of the same data type.
- Lists: Highly flexible, as they can store elements of different types, including nested lists.
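The single-data-type rule can be observed directly: when the inputs have mixed types, NumPy converts them all to one common type. A minimal sketch, with variable names chosen only for illustration:

```python
import numpy as np

mixed_list = [1, 2.5, "three"]       # a list keeps each element's own type
int_and_float = np.array([1, 2.5])   # the int is upcast to float64
int_and_str = np.array([1, "a"])     # everything becomes a string

print([type(v).__name__ for v in mixed_list])  # ['int', 'float', 'str']
print(int_and_float.dtype)                     # float64
print(int_and_str.dtype.kind)                  # U (unicode string)
print(int_and_str[0])                          # 1, now stored as the string '1'
```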

Use Cases:
- NumPy Arrays: Preferred for scientific computing, data analysis, machine learning, and any domain where large-scale numerical data processing is required.
- Lists: Ideal for general-purpose programming when you need a flexible container to store a collection of mixed data types.

In summary, if your work is heavily numerical or requires high performance and efficiency (such as in data science or numerical simulations), NumPy arrays are generally the better choice. For general programming tasks where such optimizations are not critical, and you need to store a varied collection of elements, Python lists are more suitable.

In [6]:
# Create a NumPy array by passing in a list; in practice the list usually comes from a text file or another external source
my_list = [10,20,30,40]
print(my_list)
my_array = np.array([10,20,30,40])
my_array

[10, 20, 30, 40]


array([10, 20, 30, 40])

## Similarities Between Python Lists and NumPy Arrays

**Accessing Indexes**

In [7]:
my_list[0]

10

In [8]:
my_array[0]

10

**List Slicing Example**

In [9]:
my_list[1:3]

[20, 30]

In [10]:
my_array[1:3]

array([20, 30])
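The slicing syntax is the same, but there is one behavioral difference worth knowing: slicing a list produces a copy, while basic slicing of a NumPy array produces a view onto the same data. The sketch below redefines `my_list` and `my_array` so it runs on its own; `list_slice`, `array_slice` and `independent` are names invented for the example.

```python
import numpy as np

my_list = [10, 20, 30, 40]
my_array = np.array([10, 20, 30, 40])

list_slice = my_list[1:3]    # a new, independent list
array_slice = my_array[1:3]  # a view onto my_array's data

list_slice[0] = 99
array_slice[0] = 99

print(my_list)   # unchanged: [10, 20, 30, 40]
print(my_array)  # modified through the view: [10 99 30 40]

# Use .copy() when an independent array is needed
independent = my_array[1:3].copy()
independent[0] = -1
print(my_array)  # still [10 99 30 40]
```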

## Differences Between Python Lists and NumPy Arrays

**Accessing multiple elements**

In [11]:
my_list[0], my_list[1], my_list[2] # this will still work for list

(10, 20, 30)

In [12]:
my_array[0], my_array[1], my_array[2] # this will work for numpy arrays too

(10, 20, 30)

**Multi-indexing in NumPy (not supported by lists): the output is a NumPy array holding the values at the first three indices**

In [13]:
x = my_array[[0, 1, 2]] # access indices 0, 1, and 2 of my_array in one step
x # the result is itself a NumPy array


array([10, 20, 30])
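For contrast, the sketch below shows that a plain list rejects a list of indices, while a NumPy array accepts indices in any order (with repeats) as well as a boolean mask. It redefines `my_list` and `my_array` to be self-contained; `picked` and `large` are names made up for this example.

```python
import numpy as np

my_list = [10, 20, 30, 40]
my_array = np.array(my_list)

# A list only accepts an integer or a slice as its index
try:
    my_list[[0, 1, 2]]
except TypeError as err:
    print("list:", err)

# An array accepts a list of indices, in any order and with repeats
picked = my_array[[3, 0, 0]]
print(picked)  # [40 10 10]

# A boolean mask selects the elements where the condition is True
large = my_array[my_array > 15]
print(large)   # [20 30 40]
```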

## Types in Numpy

In [14]:
# Use array.dtype to see the data type of your NumPy array
my_array.dtype # int32 rather than int64 even on a 64-bit PC: on Windows, NumPy versions before 2.0 (this notebook uses 1.24.3) default to the C long type, which is 32 bits wide. See https://stackoverflow.com/questions/36278590/numpy-array-dtype-is-coming-as-int32-by-default-in-a-windows-10-64-bit-machine

dtype('int32')

**Create an array and specify it to be an int8 array (i.e. store each element in 8 bits, which uses less memory but limits values to the range -128 to 127)**

In [15]:
my_int8_array = np.array([1, 2, 3], dtype=np.int8)
my_int8_array

array([1, 2, 3], dtype=int8)

In [16]:
my_int8_array.dtype

dtype('int8')
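The trade-off of a narrow integer type can be checked with `itemsize`, `nbytes` and `np.iinfo`. The sketch below also shows that array arithmetic wraps around silently when a result leaves the int8 range. The names `small`, `wide`, `limits`, `edge` and `wrapped` are assumptions made for this illustration.

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int8)
wide = np.array([1, 2, 3], dtype=np.int64)

print(small.itemsize, small.nbytes)  # 1 byte per element, 3 bytes in total
print(wide.itemsize, wide.nbytes)    # 8 bytes per element, 24 bytes in total

# The representable range of int8
limits = np.iinfo(np.int8)
print(limits.min, limits.max)        # -128 127

# Going past the maximum wraps around instead of raising an error
edge = np.array([127], dtype=np.int8)
wrapped = edge + np.int8(1)
print(wrapped)                       # [-128]
```

A smaller dtype is worth choosing only when you are sure every value, including intermediate results, fits in its range.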

**Create array and specify it to be a float array**

In [17]:
my_float_array = np.array([1, 2, 3], dtype=np.float64)
my_float_array

array([1., 2., 3.])

In [18]:
my_float_array.dtype

dtype('float64')
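Besides setting `dtype` at creation time, an existing array can be converted with `astype`, which returns a new array and leaves the original untouched. A brief sketch, where `as_float` and `truncated` are names invented here:

```python
import numpy as np

my_array = np.array([10, 20, 30, 40])

# astype returns a converted copy; my_array keeps its integer dtype
as_float = my_array.astype(np.float64)
print(as_float)        # [10. 20. 30. 40.]
print(my_array.dtype)  # still an integer type

# Converting float -> int drops the fractional part (truncates toward zero)
truncated = np.array([1.7, -1.7]).astype(np.int64)
print(truncated)       # [ 1 -1]
```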

**You can store strings and even objects, but NumPy is rarely used this way; it is typically used for numerical processing**

**— Store String**

In [19]:
my_string_array = np.array(['a', 'b', 'c'])
my_string_array

array(['a', 'b', 'c'], dtype='<U1')

In [20]:
my_string_array.dtype

dtype('<U1')
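The `<U1` dtype reads as: `<` for little-endian byte order, `U` for Unicode string, and `1` for a maximum length of one character. The width is fixed per array, which the sketch below illustrates; `letters` and `words` are names made up for the example.

```python
import numpy as np

letters = np.array(["a", "b", "c"])
words = np.array(["a", "bcd"])  # the width follows the longest element

print(letters.dtype.kind, letters.dtype.itemsize)  # U 4  (4 bytes per character)
print(words.dtype.kind, words.dtype.itemsize)      # U 12 (room for 3 characters)

# Assigning a longer string silently truncates it to the fixed width
letters[0] = "hello"
print(letters[0])  # h
```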

**— Store Object**

In [21]:
class Dog():
    def __init__(self, name):
        self.name = name

my_dawg = Dog("Aaron")
my_dawg2 = Dog("Bob")

my_object_array = np.array([my_dawg, my_dawg2])
my_object_array

array([<__main__.Dog object at 0x00000193C7F41A10>,
       <__main__.Dog object at 0x00000193C7F38350>], dtype=object)

In [22]:
my_object_array.dtype

dtype('O')
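An object array holds references to ordinary Python objects, so working with it usually falls back to regular Python iteration rather than vectorized operations. A minimal sketch that redefines the `Dog` class so it runs on its own; `dogs` and `names` are names chosen for this illustration.

```python
import numpy as np

class Dog:
    def __init__(self, name):
        self.name = name

dogs = np.array([Dog("Aaron"), Dog("Bob")])
print(dogs.dtype)  # object
print(dogs.shape)  # (2,)

# Attributes are read with ordinary Python iteration, one object at a time
names = [dog.name for dog in dogs]
print(names)       # ['Aaron', 'Bob']
```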

## Dimensions and Shapes

**Creating a 2 x 3 (2 rows, 3 columns) array**

In [23]:
multi_dimension_array = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
multi_dimension_array

array([[1, 2, 3],
       [4, 5, 6]])

**See the shape to know how many rows by how many columns**

In [24]:
multi_dimension_array.shape

(2, 3)

**See the number of dimensions (axes) of the array**

In [25]:
multi_dimension_array.ndim

2

**See the size of the array (i.e. how many elements it has)**

In [26]:
multi_dimension_array.size

6
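The three attributes are related: `ndim` is the length of `shape`, and `size` is the product of the entries in `shape`. The sketch below checks this and shows `reshape`, which rearranges the same elements into a different shape. The names `grid`, `reshaped` and `cube` are assumptions for this example.

```python
import numpy as np

grid = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

rows, cols = grid.shape
print(grid.ndim == len(grid.shape))  # True
print(grid.size == rows * cols)      # True

# Index with [row, column]
print(grid[1, 2])                    # 6

# reshape keeps the same 6 elements but arranges them as 3 rows x 2 columns
reshaped = grid.reshape(3, 2)
print(reshaped.shape)                # (3, 2)

# A 3-dimensional example: 2 blocks of 3 rows x 4 columns
cube = np.arange(24).reshape(2, 3, 4)
print(cube.ndim, cube.size)          # 3 24
```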

## End of NumPy Tutorial (I stopped around 1:35:37 because I didn't need the rest of the NumPy practical)