# Introduction to NumPy & Pandas

# NumPy
NumPy is a Python library for working with large, multi-dimensional arrays. It also provides a large collection of mathematical functions that operate on these arrays. Some uses of NumPy include:
1. Scientific domains of statistical computing, image processing, and mathematical analysis
2. Data Science for ETL, Exploratory Data Analysis, and Modeling
3. Machine Learning
4. Data Visualization

Let's install NumPy and then import it as `np`.

In [8]:
pip install numpy 

Collecting numpy
  Downloading numpy-2.2.5-cp313-cp313-win_amd64.whl.metadata (60 kB)
Downloading numpy-2.2.5-cp313-cp313-win_amd64.whl (12.6 MB)

In [11]:
import numpy as np

## NumPy Array
Let's create an array using the function `np.array` and pass it three values in square brackets. NumPy arrays may look similar to the Python lists that we discussed in the last chapter. The difference is that NumPy arrays are generally faster for numerical operations and more memory efficient, which makes it much easier to process the large data sets that are commonly used in machine learning models.

In [10]:
np.array([0,1,2])

array([0, 1, 2])

With the `np.array` function, we can also create a matrix of values. Let's create `array_1` with two groups of numbers, each written within its own pair of square brackets. When we print `array_1`, we can see the two groups listed as rows.

In [14]:
# Creating a matrix
array_1 = np.array([[1, 2, 3], [4, 5, 6]])
print(array_1)

[[1 2 3]
 [4 5 6]]


We can assign an array with other types of data as well. For example, let's create `array_2` with a group of strings between two groups of numbers, and `array_3` with a group that mixes a string and numbers.

The result shows that a single string value turns every value in the array into a string, even if you did not put quotation marks around the numbers. This is different from Python lists, where you can have different types of data in one list.

In [17]:
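# NumPy arrays can also hold strings; the numbers are converted to strings too.
array_2 = np.array([[1, 2, 3], ['hi', 'hello', 'bonjour'], [5, 6, 7]])
print(array_2)

# A single string in a row is enough to convert the whole array.
array_3 = np.array([[1, 2, 3], ['hi', 'hello', 'bonjour'], ['salut', 6, 7]])
print(array_3)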

[['1' '2' '3']
 ['hi' 'hello' 'bonjour']
 ['5' '6' '7']]
[['1' '2' '3']
 ['hi' 'hello' 'bonjour']
 ['salut' '6' '7']]
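
To see why the numbers print with quotation marks, it can help to inspect the array's `dtype`. The sketch below is an illustration, not part of the original notebook; the names `mixed_list`, `list_types`, and `mixed_array` are made up for this example.

```python
import numpy as np

# A Python list keeps each element's original type.
mixed_list = [1, 'hi', 3]  # hypothetical example data
list_types = [type(item).__name__ for item in mixed_list]
print(list_types)  # ['int', 'str', 'int']

# A NumPy array stores one data type for all of its elements, so the
# integers are converted to strings when a string is present.
mixed_array = np.array(mixed_list)
print(mixed_array.dtype.kind)  # 'U' means Unicode string
print(mixed_array)
```

The `dtype.kind` value of `'U'` indicates that every element, including the former integers, is now stored as a string.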


## Operations with NumPy Arrays
Once we create a NumPy array, we can use `np.shape` to see its dimensions. This is an important function because when we start working with bigger and more complex data sets, we often need to know their shape. Let's look at the shape of `array_3` that we created earlier. The result tells us this array has 3 rows and 3 columns.

In [18]:
np.shape(array_3)

(3, 3)

To call a particular cell within an array, we use the row and column index to identify the location of the value. Indexing starts at 0, so row index 2 and column index 2 refer to the third row and the third column. Let's call that item in `array_2` and we get the string '7'.

In [20]:
# Calling a particular cell within an array: [row, column]
# array_3[1, 0]
array_2[2, 2]


np.str_('7')

In [26]:
# Retrieve the entire second row (index 1)
array_1[1]

array([4, 5, 6])

Now let's look at how to find the minimum or maximum value of an array. Still using `array_1` as an example, we can apply the `min()` and `max()` methods to find the smallest and the largest values in the array.

In [27]:
print(array_1.min())
print(array_1.max())

1
6


# Pandas

The second Python package that we will explore is Pandas. Pandas is an open source Python package, built on top of NumPy, that is widely used for data science and analysis. It offers powerful data structures that help analyze and manipulate data.

Pandas automates tasks that are time consuming and repetitive. Some uses of Pandas include:

 1. Data cleaning - Systematically clean dirty data
 2. Loading and saving data - Easily import data from an external source and export to your local computer
 3. Filling data - Systematically fill in missing data
 4. Joining data - Merge datasets together
 5. Statistical analysis - Run statistical analysis on datasets easily

In [30]:
pip install pandas

Collecting pandas
  Downloading pandas-2.2.3-cp313-cp313-win_amd64.whl.metadata (19 kB)
Collecting pytz>=2020.1 (from pandas)
  Downloading pytz-2025.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas)
  Downloading tzdata-2025.2-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading pandas-2.2.3-cp313-cp313-win_amd64.whl (11.5 MB)

In [31]:

import pandas as pd

## Pandas Series
The first type of data structure in Pandas is called a Series. A Series is a one-dimensional array with index labels. It can hold different types of data, and it can be made from lists, dictionaries, and NumPy arrays.

Series are useful for organizing simple data so that it can be quickly digested.

Consider the code below for different types of data that will be used to create a series.

In [None]:
markers = ['a','b','c'] 
list_1 = [12,24,36]
array_1 = np.array([15,30,45])
dict_1 = {'d':20,'e':40,'f':60}

We will first turn `list_1` into a series. Please note that Python is case sensitive, so `Series` needs to be written with a capital S.

In [None]:
# Applying list into a series
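
The cell above is empty in this copy of the notebook. A minimal sketch of what it might look like, assuming `list_1` as defined earlier (repeated here so the block runs on its own); the name `series_from_list` is made up for this example.

```python
import pandas as pd

list_1 = [12, 24, 36]

# Build a Series from a list. Note the capital S in pd.Series.
# Since no index is given, pandas assigns the default labels 0, 1, 2.
series_from_list = pd.Series(list_1)  # hypothetical variable name
print(series_from_list)
```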


We see that pandas automatically assigned the index labels (0, 1, 2) to the list when the series was created. We can change the index of a series by passing the `markers` list we defined above as the `index` argument.

In [None]:
# Changing the index of a series
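
This cell is also empty in this copy. One way to set the index, sketched under the assumption that `markers` and `list_1` are defined as earlier, is to pass `markers` through the `index` parameter of `pd.Series`. The name `series_with_markers` is made up for this example.

```python
import pandas as pd

markers = ['a', 'b', 'c']
list_1 = [12, 24, 36]

# Use the markers list as the index labels instead of the default 0, 1, 2.
series_with_markers = pd.Series(list_1, index=markers)  # hypothetical name
print(series_with_markers)
```

The index list needs to have the same length as the data, otherwise pandas raises an error.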


We can make a series the same way with an array and a dictionary. Using a dictionary is slightly different, because a dictionary already has its own keys, which become the index labels. The data is passed in the same way, but there is no need to supply an index.

In [None]:
# Apply array into a series
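
As a sketch of the array case, assuming the `array_1` and `markers` values defined earlier in this section (the name `series_from_array` is made up for this example):

```python
import numpy as np
import pandas as pd

markers = ['a', 'b', 'c']
array_1 = np.array([15, 30, 45])

# A NumPy array is passed to pd.Series the same way as a list.
series_from_array = pd.Series(array_1, index=markers)  # hypothetical name
print(series_from_array)
```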


In [None]:
# Apply dictionary into a series
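
For the dictionary case, a possible version of the missing cell is sketched here, assuming `dict_1` as defined earlier. The name `series_from_dict` is made up for this example.

```python
import pandas as pd

dict_1 = {'d': 20, 'e': 40, 'f': 60}

# The dictionary keys become the index labels, so no index argument is needed.
series_from_dict = pd.Series(dict_1)  # hypothetical variable name
print(series_from_dict)

# The dtype attribute reports the data type of the stored values.
print(series_from_dict.dtype)
```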


We can see that pandas shows the data type of the series, which in this case is an integer type. A series can hold not just integers, but many other types of data objects.

## Operations with Pandas Series
Similar to a dictionary, you can use the index of a series to easily look up values. Consider the following gift shop data. We can access a value by its index label, just as we would with a dictionary key. If we want to see the sales of magnets, we just need to type the series name and put the label for magnets in square brackets.

In [None]:
# Gather a value by the index
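
The gift shop data is not included in this copy of the notebook. The sketch below uses invented sales figures purely for illustration; only the series name `Gift_shop_salesQ1` and the `magnets` label come from the text, while the other labels, the numbers, and the name `magnet_sales` are assumptions.

```python
import pandas as pd

# Hypothetical Q1 sales figures, invented for illustration only.
Gift_shop_salesQ1 = pd.Series({'magnets': 120, 'postcards': 85, 'mugs': 40})

# Look up a value by its index label, as with a dictionary key.
magnet_sales = Gift_shop_salesQ1['magnets']
print(magnet_sales)
```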


One powerful use of a series is performing series operations. Consider the following Q2 data that has different revenue numbers. We can add the two series (`Gift_shop_salesQ1`, `Gift_shop_salesQ2`) together because they have the same index, which gives the revenue for the first half of the year.

However, when you combine series that have different indices, only the index labels shared by all of the series return a value, and the rest are null (`NaN`). Since `Gift_shop_salesQ3` introduces new index labels, the labels that are not present in all three series are left as null.
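
The original series values are not shown in this copy, so the sketch below uses invented figures to illustrate both cases. The series names come from the text; the labels, the numbers, and the names `first_half` and `three_quarters` are assumptions.

```python
import pandas as pd

# Hypothetical revenue figures, invented for illustration only.
Gift_shop_salesQ1 = pd.Series({'magnets': 120, 'postcards': 85, 'mugs': 40})
Gift_shop_salesQ2 = pd.Series({'magnets': 150, 'postcards': 90, 'mugs': 55})
# Q3 has no 'mugs' entry and introduces a new 'keychains' label.
Gift_shop_salesQ3 = pd.Series({'magnets': 130, 'postcards': 70, 'keychains': 25})

# Same index in both series: values are added label by label.
first_half = Gift_shop_salesQ1 + Gift_shop_salesQ2
print(first_half)

# Different indices: labels missing from any series become NaN.
three_quarters = Gift_shop_salesQ1 + Gift_shop_salesQ2 + Gift_shop_salesQ3
print(three_quarters)
```

With this sample data, `mugs` and `keychains` come back as `NaN` in the second result because each is missing from at least one of the three series.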

## Pandas DataFrame

By far the most frequently used Pandas data structure is the DataFrame. A DataFrame is a 2-dimensional data structure that contains rows and columns of data, similar to an Excel table.

DataFrames allow us to leverage the power of pandas in ways mentioned earlier in the lesson, such as to clean the data, manipulate the data, and perform statistical analysis.

There are multiple ways of creating a DataFrame. The first method that we will look at is creating it with an array.

There are no labels on the rows and columns of this NumPy array. To prepare for the DataFrame, we'll create a list called `rows` with five characters to label the five rows, and another list called `columns` with three characters to label the three columns.

We will now generate a DataFrame called dataframe_1, which will contain the information from the `Frames` array as well as the row and column names.
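
The `Frames` array and the label lists are not shown in this copy of the notebook. A minimal sketch of the steps described above, assuming a 5x3 array of placeholder values and made-up label characters (only the names `Frames`, `rows`, `columns`, and `dataframe_1` come from the text):

```python
import numpy as np
import pandas as pd

# Hypothetical 5x3 array; the values 1..15 are placeholders.
Frames = np.arange(1, 16).reshape(5, 3)

# Hypothetical labels: five characters for rows, three for columns.
rows = ['a', 'b', 'c', 'd', 'e']
columns = ['x', 'y', 'z']

# Combine the array with its row and column labels.
dataframe_1 = pd.DataFrame(Frames, index=rows, columns=columns)
print(dataframe_1)
```

The `index` parameter labels the rows and the `columns` parameter labels the columns, so each list must match the corresponding dimension of the array.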

Similar to a pandas series, DataFrames can store various types of data, such as integers, strings, lists, etc.

We will now create a DataFrame with a dictionary. 
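
The dictionary used in the original lesson is not included here. As an illustration only, the sketch below builds a DataFrame from an invented dictionary in which each key becomes a column name and each list becomes that column's values. The names `shop_data` and `dataframe_2` and all of the values are assumptions.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
shop_data = {
    'item': ['magnets', 'postcards', 'mugs'],
    'units_sold': [120, 85, 40],
}

# Each dictionary key becomes a column; rows get the default index 0, 1, 2.
dataframe_2 = pd.DataFrame(shop_data)
print(dataframe_2)
```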