# Introduction to Numpy and Pandas

In this notebook, we explore two core Python libraries for data analysis and manipulation: Numpy and Pandas. The two serve different purposes but are closely connected, since Pandas is built on top of Numpy. Numpy provides fast mathematical operations on arrays and matrices, while Pandas provides high-performance, easy-to-use labelled data structures and data analysis tools.

# Importing the Numpy and Pandas Libraries

Before using these libraries, we must import them. By convention they are imported under the aliases `np` and `pd`, which gives each a short handle:

In [1]:
import numpy as np
import pandas as pd

After this, every Numpy and Pandas function is available through the short prefixes `np` and `pd`, for example `np.array()` or `pd.Series()`.

# Numpy Array

The Numpy array, also known as `ndarray`, is the core of the Numpy library. It is a grid of values, all of the same type, indexed by a tuple of nonnegative integers (indexing starts at 0). Arrays are useful for performing mathematical and logical operations on many values at once.

Here's how to create a Numpy array from a Python list and inspect its type:

In [2]:
my_list = [1, 2, 3, 4]
my_array = np.array(my_list)

# Check the type
print(type(my_array)) # Outputs: <class 'numpy.ndarray'>

<class 'numpy.ndarray'>
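
To make the "same type" and "tuple of integers" points concrete, here is a minimal sketch. The names `mixed` and `grid` are illustrative only and do not appear elsewhere in this notebook.

```python
import numpy as np

# Mixing ints and a float: Numpy converts every element to one common dtype
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)   # float64

# A 2D array is indexed by a tuple (row, column), counting from 0
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(grid.shape)    # (2, 3)
print(grid[1, 2])    # 6 (second row, third column)
```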


Numpy also provides a range of functions that generate arrays directly, including `arange()`, `linspace()`, `logspace()`, `zeros()`, and `ones()`.

In [3]:
# Generate the 10 integers from 0 (inclusive) to 10 (exclusive)
np_array = np.arange(10)
print(np_array)  # Outputs: [0 1 2 3 4 5 6 7 8 9]

[0 1 2 3 4 5 6 7 8 9]
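
The other generator functions listed above follow the same pattern. The sketch below shows one call to each; the argument values are arbitrary choices for illustration.

```python
import numpy as np

# Three zeros (float64 by default)
zeros = np.zeros(3)

# A 2x2 array of ones; the shape is passed as a tuple
ones = np.ones((2, 2))

# 5 evenly spaced values from 0 to 1, both endpoints included:
# 0.0, 0.25, 0.5, 0.75, 1.0
lin = np.linspace(0, 1, 5)

# 3 values spaced evenly on a log scale from 10**0 to 10**2:
# 1.0, 10.0, 100.0
log = np.logspace(0, 2, 3)

print(zeros.shape, ones.shape, lin.shape, log.shape)
```

Note that `linspace()` takes the number of points and includes the end value, whereas `arange()` takes a step size and excludes the end value.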


# Mathematical Operations with Numpy Arrays

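One of the key features of Numpy arrays is the ability to perform vectorized operations: an operation is applied to every element of an array at once, without an explicit Python loop. Arrays also support broadcasting, which lets an operation combine operands of different shapes, such as an array and a single number, by stretching the smaller operand to match the larger one. Together these allow efficient mathematical manipulation of arrays.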

For example, we can easily multiply each element of an array by 2:

In [4]:
my_array = np.array([1,2,3,4])
new_array = my_array * 2
print(new_array)  # Outputs: [2 4 6 8]

[2 4 6 8]
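
Multiplying by the scalar `2` is the simplest case of broadcasting. As a further sketch, the example below adds a 1D array to each row of a 2D array; `matrix`, `row`, `doubled`, and `shifted` are illustrative names, not part of the original notebook.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)

# Vectorized: element-wise addition of two arrays with the same shape
doubled = matrix + matrix

# Broadcasting: row is stretched across both rows of matrix
shifted = matrix + row
print(shifted)   # rows are 11, 22, 33 and 14, 25, 36
```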


# Numpy Array Indexing and Slicing

Numpy offers several ways to index into arrays. Like Python lists, Numpy arrays are zero-indexed and can be sliced with the `start:stop` syntax, where `stop` is excluded.

In [5]:
my_array = np.array([1, 2, 3, 4, 5])

# Get a value at an index
print(my_array[1])  # Outputs: 2

# Slice
print(my_array[1:3])  # Outputs: [2 3]

2
[2 3]
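
Beyond single indices and basic slices, a few other indexing styles are commonly used. The following is a short sketch of them, not an exhaustive list; the variable names are illustrative.

```python
import numpy as np

my_array = np.array([1, 2, 3, 4, 5])

last = my_array[-1]             # negative index counts from the end: 5
every_other = my_array[::2]     # slice with a step: 1, 3, 5
large = my_array[my_array > 2]  # boolean mask: 3, 4, 5
picked = my_array[[0, 3]]       # list of indices: 1, 4

# Unlike a list slice, a basic Numpy slice is a view on the same data,
# so writing through the slice changes the original array
view = my_array[1:3]
view[0] = 99
print(my_array[1])  # 99
```

If an independent copy is needed, call `.copy()` on the slice before modifying it.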


# Numpy Array Methods

Numpy arrays have many useful built-in methods, such as `argmax()`, `argmin()`, `max()`, `min()`, and `sum()`.

In [6]:
my_array = np.array([1, 2, 3, 4, 5])

# Find index of max value
print(my_array.argmax())  # Outputs: 4

# Sum of array elements
print(my_array.sum())  # Outputs: 15

4
15
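
The remaining methods named above work the same way. The sketch below uses a different, unsorted array (chosen only for illustration) so that the results are less obvious, and shows the `axis` argument on a 2D array.

```python
import numpy as np

values = np.array([3, 1, 4, 1, 5])

print(values.argmin())  # 1 (index of the first occurrence of the minimum)
print(values.min())     # 1
print(values.max())     # 5

# On a 2D array, axis=0 reduces down the rows (one result per column)
# and axis=1 reduces across the columns (one result per row)
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
column_sums = grid.sum(axis=0)  # 5, 7, 9
row_sums = grid.sum(axis=1)     # 6, 15
```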


# Pandas Series

A Series is a one-dimensional labelled array that can hold any data type. It is similar to a Numpy array, but each value is paired with a label; together, the labels form the index.

Here's how we can create a Pandas Series.

In [7]:
my_series = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
print(my_series)

a    1
b    2
c    3
d    4
dtype: int64
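
The index is what distinguishes a Series from a plain array: values can be looked up by label as well as by position. A minimal sketch of both, reusing the same Series:

```python
import pandas as pd

my_series = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Look up by label
print(my_series['b'])       # 2

# Look up by integer position
print(my_series.iloc[0])    # 1

# Label-based slices include the end label, unlike positional slices
print(my_series.loc['b':'c'].tolist())  # [2, 3]

# Arithmetic is vectorized, as with Numpy arrays, and keeps the index
doubled = my_series * 2
print(doubled['d'])         # 8
```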


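# Creating a DataFrame from Multiple Pandas Series

You can join two or more Pandas Series into a DataFrame with the `pd.concat()` function. Passing `axis=1` places each Series in its own column, with rows matched by index label. Because the Series below are unnamed, the resulting columns are labelled `0` and `1`.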

In [8]:
series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([5, 6, 7, 8], index=['a', 'b', 'c', 'd'])

df = pd.concat([series1, series2], axis=1)
print(df)

   0  1
a  1  5
b  2  6
c  3  7
d  4  8
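
The default column labels `0` and `1` are rarely what you want. As a sketch, the `keys` argument of `pd.concat()` names the columns, and a Series with a different index shows how rows are matched by label. The column names `first`, `second`, and `third`, and the extra `series3`, are made up for illustration.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([5, 6, 7, 8], index=['a', 'b', 'c', 'd'])

# keys= supplies the column names when concatenating along axis=1
df = pd.concat([series1, series2], axis=1, keys=['first', 'second'])
print(list(df.columns))   # ['first', 'second']
print(df.loc['b', 'second'])  # 6

# Rows are matched by index label; labels missing from one Series
# are filled with NaN in that Series' column
series3 = pd.Series([9, 10, 11, 12], index=['b', 'c', 'd', 'e'])
aligned = pd.concat([series1, series3], axis=1, keys=['first', 'third'])
print(aligned.shape)      # (5, 2)
```

In `aligned`, row `a` has no value in the `third` column and row `e` has none in the `first` column, so both cells hold `NaN`.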


Together, Numpy and Pandas cover a wide range of data analysis tasks, from mathematical computation to statistical analysis. Both libraries contain many more features and methods than those discussed above, so consult their documentation for more advanced functionality.